Zero Variance Markov Chain Monte Carlo for Bayesian Estimators
Supplementary material

Antonietta Mira · Reza Solgi · Daniele Imparato

Received: date / Accepted: date

A. Mira: Swiss Finance Institute, University of Lugano, via Buffi 13, CH-6904 Lugano, Switzerland. E-mail: antonietta.mira@usi.ch
R. Solgi: Swiss Finance Institute, University of Lugano, via Buffi 13, CH-6904 Lugano, Switzerland. E-mail: reza.solgi@usi.ch
D. Imparato: Department of Economics, University of Insubria, via Monte Generoso 71, 21100 Varese, Italy. E-mail: daniele.imparato@uninsubria.it

Appendix A: Probit model

Mathematical formulation

Let $y_i$ be Bernoulli random variables: $y_i \mid x_i \sim B(1, p_i)$, $p_i = \Phi(x_i^T\beta)$, where $\beta \in \mathbb{R}^d$ is the vector of parameters of the model and $\Phi$ is the c.d.f. of a standard normal distribution. The likelihood function is

$$l(\beta \mid y, x) \propto \prod_{i=1}^n \left[\Phi(x_i^T\beta)\right]^{y_i}\left[1-\Phi(x_i^T\beta)\right]^{1-y_i}.$$

As can be seen by inspection, the likelihood function is invariant under the transformation $(x_i, y_i) \to (-x_i, 1-y_i)$. Therefore, for the sake of simplicity, in the rest of the example we assume $y_i = 1$ for every $i$, so that the likelihood simplifies to

$$l(\beta \mid y, x) \propto \prod_{i=1}^n \Phi(x_i^T\beta).$$

This formula shows that the contribution of $x_i = 0$ is just a constant, $\Phi(x_i^T\beta) = \Phi(0) = 1/2$; therefore, without loss of generality, we assume $x_i \neq 0$ for all $i$.
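The invariance of the probit likelihood under $(x_i, y_i) \to (-x_i, 1-y_i)$ is easy to verify numerically. Below is a minimal sketch (the function names are ours, not from the paper); $\Phi$ is computed via the complementary error function, and the identity $1-\Phi(t)=\Phi(-t)$ handles the $y_i=0$ terms:

```python
import math

def log_Phi(t):
    # log of the standard normal c.d.f.: Phi(t) = erfc(-t/sqrt(2)) / 2
    return math.log(0.5 * math.erfc(-t / math.sqrt(2.0)))

def probit_loglik(beta, X, y):
    """Probit log-likelihood:
    sum_i [ y_i * log Phi(x_i'beta) + (1 - y_i) * log Phi(-x_i'beta) ]."""
    ll = 0.0
    for xi, yi in zip(X, y):
        t = sum(xk * bk for xk, bk in zip(xi, beta))
        ll += log_Phi(t) if yi == 1 else log_Phi(-t)
    return ll
```

Flipping every pair $(x_i, y_i)$ to $(-x_i, 1-y_i)$ leaves the value unchanged, which is what justifies assuming $y_i = 1$ throughout the appendix.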
Using flat priors, the posterior of the model is proportional to the likelihood, and the Bayesian estimator of each parameter $\beta_k$ is the expected value of $f_k(\beta) = \beta_k$ under $\pi$, $k = 1, \dots, d$. Using the Schrödinger-type Hamiltonian $H$ and $\psi_k(\beta) = P_k(\beta)\sqrt{\pi(\beta)}$ as the trial functions, where $P_k(\beta) = \sum_{j=1}^d a_{j,k}\beta_j$ is a first-degree polynomial, one gets

$$\tilde f_k(\beta) = f_k(\beta) + \frac{H\psi_k(\beta)}{\sqrt{\pi(\beta \mid y, x)}} = f_k(\beta) + \sum_{j=1}^d a_{j,k}\, z_j,$$

where, for $j = 1, 2, \dots, d$,

$$z_j = -\frac{1}{2}\sum_{i=1}^n \frac{x_{ij}\,\phi(x_i^T\beta)}{\Phi(x_i^T\beta)},$$

because of the assumption $y_i = 1$ for every $i$.

Central limit theorem

In the following, it is supposed that $P$ is a linear polynomial. In the Probit model, the ZV-MCMC estimators obey a CLT if the $z_j$ have finite $2+\delta$ moment under $\pi$, for some $\delta > 0$:

$$E_\pi\left[|z_j|^{2+\delta}\right] = c_1\, E_\pi\left[\left|\sum_{i=1}^n \frac{x_{ij}\,\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}\right|^{2+\delta}\right] = c_1 c_2 \int \left|\sum_{i=1}^n \frac{x_{ij}\,\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}\right|^{2+\delta} \prod_{i=1}^n \Phi(x_i^T\beta)\, d\beta < \infty,$$

where $c_1 = (1/2)^{2+\delta}$ and $c_2$ is the normalizing constant of $\pi$, the target posterior. Define

$$K_1(\beta) = \left|\sum_{i=1}^n \frac{x_{ij}\,\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}\right|^{2+\delta}, \qquad K_2(\beta) = \prod_{i=1}^n \Phi(x_i^T\beta), \qquad K(\beta) = K_1(\beta)\,K_2(\beta),$$

and therefore

$$E_\pi\left[|z_j|^{2+\delta}\right] = c \int K_1(\beta)\,K_2(\beta)\, d\beta,$$

where $c = c_1 c_2$. Before studying the convergence of this integral, the following property of the likelihood of the probit model is needed.
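For concreteness, the control variates $z_j$ above can be evaluated directly at a posterior draw. A minimal sketch (our function name, not the authors' code; $\Phi$ is computed via `erfc` so the ratio is stable in the left tail):

```python
import math

def probit_control_variates(beta, X):
    """ZV control variates for the probit model with all y_i = 1:
    z_j = -(1/2) * sum_i x_ij * phi(x_i'beta) / Phi(x_i'beta)."""
    d = len(beta)
    z = [0.0] * d
    for xi in X:
        t = sum(xk * bk for xk, bk in zip(xi, beta))
        phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        Phi = 0.5 * math.erfc(-t / math.sqrt(2.0))  # stable for t << 0
        w = phi / Phi
        for j in range(d):
            z[j] -= 0.5 * xi[j] * w
    return z
```

In practice $z_j$ is evaluated at every MCMC draw, and the coefficients $a_{j,k}$ of the first-degree polynomial are then chosen to minimize the variance of $f_k + \sum_j a_{j,k} z_j$.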
Proposition 1 Existence and uniqueness of the MLE implies that, for any $\beta_0 \in \mathbb{R}^d \setminus \{0\}$, there exists $i$ such that $x_i^T\beta_0 < 0$.

Proof (by contradiction). Uniqueness of the MLE implies that $x^T x$ is full rank, that is, there is no $\beta_0 \neq 0$ orthogonal to all observations $x_i$. This can be seen by contradiction: singularity of $x^T x$ implies the existence of a non-zero $\beta_0$ orthogonal to all observations $x_i$. This fact, in turn, implies $l(\beta \mid x, y) = l(\beta + c\beta_0 \mid x, y)$ and, therefore, $l(\cdot \mid x, y)$ does not have a unique global maximum. Next, assume there exists some $\beta_0 \neq 0$ such that, for every $i$, $x_i^T\beta_0 \geq 0$. Then $\beta_0$ is a direction of recession for the negative log-likelihood function $-\sum_{i=1}^n \ln\Phi(x_i^T\beta)$, which is a proper closed convex function. This implies that this function does not have a non-empty bounded minimum set [1], which means that the MLE does not exist. □

Now, rewrite $\int K(\beta)\, d\beta$ in hyper-spherical coordinates through the bijective transformation $(\rho, \theta_1, \dots, \theta_{d-1}) := F^{-1}(\beta)$, where $F$ is defined as

$$\beta_1 = \rho\cos\theta_1, \qquad \beta_l = \rho\cos\theta_l \prod_{m=1}^{l-1}\sin\theta_m \quad (l = 2, \dots, d-1), \qquad \beta_d = \rho \prod_{m=1}^{d-1}\sin\theta_m,$$

for $\theta \in \Theta := \{0 \le \theta_i \le \pi,\ i = 1, \dots, d-2;\ 0 \le \theta_{d-1} < 2\pi\}$ and $\rho > 0$. One gets

$$\int K(\beta)\, d\beta = \int_\Theta \int_0^{+\infty} K(F(\rho,\theta))\,\rho^{d-1} \prod_{j=1}^{d-2}\sin^{d-1-j}\theta_j \, d\rho\, d\theta \le \int_\Theta \int_0^{+\infty} K(F(\rho,\theta))\,\rho^{d-1}\, d\rho\, d\theta := \int_\Theta A(\theta)\, d\theta.$$

Observe that the integrand is well defined for any $(\rho, \theta)$ in the domain of integration, so it is enough to study its asymptotic behaviour as $\rho$ goes to infinity, for $\theta \in \Theta$. First, analyze

$$K_1(F(\rho,\theta)) = \left|\sum_{i=1}^n \frac{x_{ij}\,\phi(\|x_i\|\rho\lambda_i(\theta))}{\Phi(\|x_i\|\rho\lambda_i(\theta))}\right|^{2+\delta},$$

where, for any $i$, $\lambda_i$ is a suitable function of the angles $\theta$ such that $\lambda_i \in [-1, 1]$, which takes into account the sign of the scalar product in the original coordinate system. For any $i$, as $\rho \to \infty$:

if $\lambda_i < 0$, then $\dfrac{x_{ij}\,\phi(\|x_i\|\rho\lambda_i)}{\Phi(\|x_i\|\rho\lambda_i)} \sim O(\rho)$;
if $\lambda_i > 0$, then $\dfrac{x_{ij}\,\phi(\|x_i\|\rho\lambda_i)}{\Phi(\|x_i\|\rho\lambda_i)} \sim O(\phi(\lambda_i\rho))$;

if $\lambda_i = 0$, then $\dfrac{x_{ij}\,\phi(0)}{\Phi(0)} = x_{ij}\sqrt{2/\pi} = O(1)$.

Therefore

$$\sum_{i=1}^n \frac{x_{ij}\,\phi(\|x_i\|\rho\lambda_i)}{\Phi(\|x_i\|\rho\lambda_i)} \sim O(\rho)$$

and, for any $\theta \in \Theta$,

$$K_1(F(\rho,\theta)) \sim O(\rho^{2+\delta}).$$

Now, focus on $K_2(F(\rho,\theta)) = \prod_{i=1}^n \Phi(\|x_i\|\rho\lambda_i(\theta))$; the existence of the MLE for the probit model implies that, for any $\theta \in \Theta$, there exists some $l$, $1 \le l \le n$, such that $\lambda_l(\theta) < 0$, and therefore

$$K_2(F(\rho,\theta)) < \Phi(\|x_l\|\rho\lambda_l) \sim O\!\left(\frac{\phi(\lambda_l\rho)}{\rho}\right).$$

Putting these results together leads to

$$K(F(\rho,\theta)) = K_1(F(\rho,\theta))\,K_2(F(\rho,\theta)) \sim O\!\left(\rho^{1+\delta}\,\phi(\lambda_l(\theta)\rho)\right),$$

so that, for any $\theta \in \Theta$,

$$K(F(\rho,\theta))\,\rho^{d-1} \sim O\!\left(\rho^{\delta+d}\,\phi(\lambda_l(\theta)\rho)\right), \qquad \rho \to +\infty.$$

Therefore, whatever the value of $\theta \in \Theta$, the integrand converges to zero fast enough as $\rho \to +\infty$, so that $A(\theta)$ is finite. This concludes the proof.

Note 1. In [2] it is shown that the existence of the posterior under flat priors for the probit and logit models is equivalent to the existence and finiteness of the MLE. This ensures that the posterior is well defined in our context. In order to verify the existence of the posterior mean, we can use a simplified version of the proof given above: it suffices to show that $\int |\beta_j| \prod_{i=1}^n \Phi(x_i^T\beta)\, d\beta < +\infty$, that is, to run the argument with $K_1(\beta) = |\beta_j| \sim O(\rho)$. Therefore, a weaker version of the proof given above can be employed.

Note 2. In the proof given above we have used flat priors; although this assumption simplifies the proof, a very similar argument applies to non-flat priors. Assume the prior is $\pi_0(\beta)$. Under this assumption the posterior is

$$\pi(\beta) \propto \pi_0(\beta)\, l(\beta \mid y, x) = \pi_0(\beta) \prod_{i=1}^n \Phi(x_i^T\beta)$$

and the control variates are

$$z_j = -\frac{1}{2}\left(\frac{\partial \ln \pi_0(\beta)}{\partial \beta_j} + \sum_{i=1}^n \frac{x_{ij}\,\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}\right).$$

Therefore we need to prove that the $(2+\delta)$-th moment of $z_j$ under $\pi(\beta)$ is finite. A sufficient condition for this is the finiteness of the $(2+\delta)$-th moments of $\frac{\partial \ln \pi_0(\beta)}{\partial \beta_j}$ and of $\sum_{i=1}^n \frac{x_{ij}\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}$ under $\pi(\beta)$. If we assume $\pi_0(\beta)$ is bounded above, the latter is a trivial consequence of the proof given above for flat priors. Therefore we only need to prove the finiteness of the integral

$$\int \left|\frac{\partial \ln \pi_0(\beta)}{\partial \beta_j}\right|^{2+\delta} \pi_0(\beta) \prod_{i=1}^n \Phi(x_i^T\beta)\, d\beta.$$
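The three asymptotic regimes of the inverse Mills ratio $\phi/\Phi$ used in the case analysis above are easy to check numerically. A small sketch (our function name; $\Phi$ is computed via `erfc` so the left tail does not underflow to zero):

```python
import math

def mills(t):
    # inverse Mills ratio phi(t) / Phi(t)
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-t / math.sqrt(2.0))  # avoids 1 + erf(t) underflow
    return phi / Phi

# lambda_i < 0 regime: the ratio grows linearly, phi(t)/Phi(t) ~ |t|
print(mills(-10.0))   # close to 10
# lambda_i = 0 regime: the constant 2*phi(0) = sqrt(2/pi)
print(mills(0.0))
# lambda_i > 0 regime: exponentially small, O(phi(t))
print(mills(10.0))
```

This is why the product $K_1 K_2$ decays: the at-most-polynomial growth of $K_1$ is beaten by the Gaussian factor hidden in $K_2$.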
Again, if we assume the prior is bounded from above, a sufficient condition for the existence of this integral is the existence of the following integral:

$$\int \left|\frac{\partial \ln \pi_0(\beta)}{\partial \beta_j}\right|^{2+\delta} \prod_{i=1}^n \Phi(x_i^T\beta)\, d\beta.$$

A proof very similar to the one given above shows that this integral is finite for common choices of the prior $\pi_0(\beta)$, such as the Normal or Student's t distribution.

Appendix B: Logit model

Mathematical formulation

In the same setting as the probit model, let $p_i = \frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}$, where $\beta \in \mathbb{R}^d$ is the vector of parameters of the model. The likelihood function is

$$l(\beta \mid y, x) \propto \prod_{i=1}^n \left(\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}\right)^{y_i} \left(\frac{1}{1+\exp(x_i^T\beta)}\right)^{1-y_i}. \qquad (3)$$

By inspection, it is easy to verify that the likelihood function is invariant under the transformation $(x_i, y_i) \to (-x_i, 1-y_i)$. Therefore, for the sake of simplicity, in the sequel we assume $y_i = 0$ for every $i$, so that the likelihood simplifies to

$$l(\beta \mid y, x) \propto \prod_{i=1}^n \frac{1}{1+\exp(x_i^T\beta)}.$$

The contribution of $x_i = 0$ to the likelihood is just a constant; therefore, without loss of generality, it is assumed that $x_i \neq 0$ for all $i$.

Using flat priors, the posterior distribution is proportional to (3), and the Bayesian estimator of each parameter $\beta_k$ is the expected value of $f_k(\beta) = \beta_k$ under $\pi$, $k = 1, \dots, d$. Using the same pair of operator $H$ and trial function $\psi_k$ as in Appendix A, the control variates are

$$z_j = \frac{1}{2}\sum_{i=1}^n x_{ij}\,\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}, \qquad j = 1, 2, \dots, d.$$

Central limit theorem

As for the probit model, the ZV-MCMC estimators obey a CLT if the control variates $z_j$ have finite $2+\delta$ moment under $\pi$, for some $\delta > 0$:

$$E_\pi\left[|z_j|^{2+\delta}\right] = c_1\, E_\pi\left[\left|\sum_{i=1}^n x_{ij}\,\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}\right|^{2+\delta}\right] = c_1 c_2 \int \left|\sum_{i=1}^n x_{ij}\,\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}\right|^{2+\delta} \prod_{i=1}^n \frac{1}{1+\exp(x_i^T\beta)}\, d\beta,$$
where $c_1 = (1/2)^{2+\delta}$ and $c_2$ is the normalizing constant of $\pi$. The finiteness of the integral is, indeed, trivial. Observe that the function $e^y/(1+e^y)$ is bounded by $1$; therefore,

$$E_\pi\left[|z_j|^{2+\delta}\right] \le c_1 \left(\sum_{i=1}^n |x_{ij}|\right)^{2+\delta} < \infty.$$

As for the probit model, note that it can easily be shown that the posterior means exist under flat priors. Moreover, a very similar proof (see Note 2 of Appendix A) can be used under a Normal or Student's t prior distribution.

Appendix C: GARCH model

Mathematical formulation

We assume that the returns are conditionally normally distributed, $r_t \mid \mathcal{F}_{t-1} \sim N(0, h_t)$, where $h_t$ is a predictable ($\mathcal{F}_{t-1}$-measurable) process:

$$h_t = \omega_1 + \omega_3 h_{t-1} + \omega_2 r_{t-1}^2,$$

where $\omega_1 > 0$, $\omega_2 \ge 0$, and $\omega_3 \ge 0$. Let $r = (r_1, \dots, r_T)$ be the observed time series. The likelihood function is equal to

$$l(\omega_1, \omega_2, \omega_3 \mid r) \propto \prod_{t=1}^T h_t^{-1/2} \exp\left(-\frac{r_t^2}{2h_t}\right),$$

and, using independent truncated Normal priors for the parameters, the posterior is

$$\pi(\omega_1, \omega_2, \omega_3 \mid r) \propto \exp\left[-\frac{1}{2}\left(\frac{\omega_1^2}{\sigma_{\omega_1}^2} + \frac{\omega_2^2}{\sigma_{\omega_2}^2} + \frac{\omega_3^2}{\sigma_{\omega_3}^2}\right)\right] \prod_{t=1}^T h_t^{-1/2} \exp\left(-\frac{r_t^2}{2h_t}\right).$$

Therefore, the control variates when the trial function is a first-degree polynomial are $z_i = -\frac{1}{2}\frac{\partial \ln \pi}{\partial \omega_i}$, with

$$\frac{\partial \ln \pi}{\partial \omega_i} = -\frac{\omega_i}{\sigma_{\omega_i}^2} - \frac{1}{2}\sum_{t=1}^T \frac{1}{h_t}\frac{\partial h_t}{\partial \omega_i} + \frac{1}{2}\sum_{t=1}^T \frac{r_t^2}{h_t^2}\frac{\partial h_t}{\partial \omega_i}, \qquad i = 1, 2, 3,$$

where

$$\frac{\partial h_t}{\partial \omega_1} = \frac{1-\omega_3^t}{1-\omega_3}, \qquad \frac{\partial h_t}{\partial \omega_2} = r_{t-1}^2 + \omega_3 \frac{\partial h_{t-1}}{\partial \omega_2}\, I_{\{t>1\}}, \qquad \frac{\partial h_t}{\partial \omega_3} = h_{t-1} + \omega_3 \frac{\partial h_{t-1}}{\partial \omega_3}\, I_{\{t>1\}}.$$
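The recursions above are immediate to implement. A minimal sketch (the function name is ours; $h_0$ is treated as a fixed known constant, so all derivatives start at zero, and the closed form $(1-\omega_3^t)/(1-\omega_3)$ can be used to check $\partial h_t/\partial \omega_1$):

```python
def garch_h_and_grads(w1, w2, w3, r, h0):
    """Compute h_t = w1 + w3*h_{t-1} + w2*r_{t-1}^2 for t = 1..T, together
    with the recursions for the partial derivatives dh_t/dw_i."""
    h, d1, d2, d3 = [h0], [0.0], [0.0], [0.0]
    for t in range(1, len(r) + 1):
        h.append(w1 + w3 * h[t - 1] + w2 * r[t - 1] ** 2)
        d1.append(1.0 + w3 * d1[t - 1])            # dh_t/dw1
        d2.append(r[t - 1] ** 2 + w3 * d2[t - 1])  # dh_t/dw2
        d3.append(h[t - 1] + w3 * d3[t - 1])       # dh_t/dw3
    return h, d1, d2, d3
```

Plugging these values into the sums above gives $\partial \ln\pi/\partial\omega_i$, and hence the control variates $z_i$, at each MCMC draw.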
Central limit theorem

In order to prove the CLT for the ZV-MCMC estimator in the Garch model, we need

$$\frac{\partial \ln \pi}{\partial \omega_1},\ \frac{\partial \ln \pi}{\partial \omega_2},\ \frac{\partial \ln \pi}{\partial \omega_3} \in L^{2+\delta}(\pi). \qquad (4)$$

To this end, $h_t$ and its partial derivatives should be expressed as functions of $h_0$ and $r$:

$$h_t = \omega_1 \sum_{k=0}^{t-1} \omega_3^k + \omega_3^t h_0 + \omega_2 \sum_{k=1}^{t} \omega_3^{k-1} r_{t-k}^2,$$

$$\frac{\partial h_t}{\partial \omega_1} = \frac{1-\omega_3^t}{1-\omega_3}, \qquad \frac{\partial h_t}{\partial \omega_2} = r_{t-1}^2 + \sum_{j=0}^{t-2} \omega_3^{t-1-j}\, r_j^2\; I_{\{t>1\}}, \qquad \frac{\partial h_t}{\partial \omega_3} = h_{t-1} + \sum_{j=0}^{t-2} \omega_3^{t-1-j}\, h_j\; I_{\{t>1\}}.$$

Next, moving to spherical coordinates, the integral in (4) can be written as

$$\int_{[0,\pi/2]^2} \int_0^{\infty} K_j(\rho; \theta, \phi)\, d\rho\, d\theta\, d\phi := \int_{[0,\pi/2]^2} A_j(\theta, \phi)\, d\theta\, d\phi,$$

where, for $j = 1, 2, 3$, $K_j(\cdot; \theta, \phi) = |W_j|^{2+\delta}\, W$, with $W_j$ equal to $\partial \ln\pi/\partial\omega_j$ evaluated at $(\omega_1, \omega_2, \omega_3) = (\rho\cos\theta\sin\phi,\ \rho\sin\theta\sin\phi,\ \rho\cos\phi)$:

$$W_j = -\frac{\omega_j}{\sigma_{\omega_j}^2} - \frac{1}{2}\sum_{t=1}^T \frac{1}{h_t}\frac{\partial h_t}{\partial \omega_j} + \frac{1}{2}\sum_{t=1}^T \frac{r_t^2}{h_t^2}\frac{\partial h_t}{\partial \omega_j},$$

$$W = \exp\left[-\frac{\rho^2}{2}\left(\frac{\cos^2\theta\sin^2\phi}{\sigma_{\omega_1}^2} + \frac{\sin^2\theta\sin^2\phi}{\sigma_{\omega_2}^2} + \frac{\cos^2\phi}{\sigma_{\omega_3}^2}\right)\right] \prod_{t=1}^T h_t^{-1/2}\exp\left(-\frac{r_t^2}{2h_t}\right), \qquad (5)$$

and

$$h_t = \rho\cos\theta\sin\phi \sum_{k=0}^{t-1} (\rho\cos\phi)^k + h_0\,(\rho\cos\phi)^t + \rho\sin\theta\sin\phi \sum_{k=1}^{t} r_{t-k}^2\,(\rho\cos\phi)^{k-1}.$$
The aim is to prove that, for any $(\theta, \phi) \in [0,\pi/2]^2$ and for any $j$, $A_j(\theta,\phi)$ is finite. To this end, the convergence of $A_j$ for any $(\theta,\phi)$ should be discussed. Let us first study the proper domain of $K_j(\cdot;\theta,\phi)$. Observe that $K_j(\cdot;\theta,\phi)$ is not defined whenever $h_t = 0$ and, if $j = 1$ and $\phi \neq \pi/2$, also for $\rho = 1/\cos\phi$. However, the discontinuity of $K_1$ at this point is removable, so that the domain of $K_1$ can be extended by continuity to $\rho = 1/\cos\phi$. Since $h_t = 0$ if and only if $\rho = 0$, it can be concluded that, for any $j$ and for any $(\theta,\phi) \in [0,\pi/2]^2$, the proper domain of $K_j(\cdot;\theta,\phi)$ is $(0, +\infty)$.

Fixing the values of $\theta$ and $\phi$, let us study the limits of $K_j$ when $\rho \to 0$ and $\rho \to +\infty$. Observe that, whatever the values of $\theta$ and $\phi$, the $W_j$'s are rational functions of $\rho$. Therefore, for any $j$, $|W_j|^{2+\delta}$ cannot grow towards infinity more than polynomially at the boundary of the domain. On the other hand, $W$ goes to zero at an exponential rate both when $\rho \to 0$ and when $\rho \to +\infty$, for any $\theta$ and $\phi$. This is sufficient to conclude that, for any $(\theta,\phi) \in [0,\pi/2]^2$, the integral $A_j$ is finite for $j = 1, 2, 3$; therefore, condition (4) holds and the ZV estimators for the GARCH model obey a CLT.

Appendix D: Unbiasedness

In this Appendix, explicit computations are presented which were omitted in Section 5. Moreover, it is proved that all the ZV-MCMC estimators discussed in Section 8 are unbiased.

Following the same notation as in Section 5, equation (9) follows because

$$\int_\Omega H\psi\,\sqrt{\pi}\,d\beta = \int_\Omega V\psi\sqrt{\pi}\,d\beta - \frac{1}{2}\int_\Omega \Delta\psi\,\sqrt{\pi}\,d\beta$$
$$= \int_\Omega V\psi\sqrt{\pi}\,d\beta - \frac{1}{2}\int_{\partial\Omega} \sqrt{\pi}\,\nabla\psi\cdot n\, d\sigma + \frac{1}{2}\int_\Omega \nabla\psi\cdot\nabla\sqrt{\pi}\,d\beta$$
$$= \int_\Omega V\psi\sqrt{\pi}\,d\beta - \frac{1}{2}\int_{\partial\Omega} \sqrt{\pi}\,\nabla\psi\cdot n\, d\sigma + \frac{1}{2}\int_{\partial\Omega} \psi\,\nabla\sqrt{\pi}\cdot n\, d\sigma - \frac{1}{2}\int_\Omega \psi\,\Delta\sqrt{\pi}\,d\beta$$
$$= \int_\Omega (H\sqrt{\pi})\,\psi\,d\beta + \frac{1}{2}\int_{\partial\Omega} \left[\psi\,\nabla\sqrt{\pi} - \sqrt{\pi}\,\nabla\psi\right]\cdot n\, d\sigma$$
$$= \frac{1}{2}\int_{\partial\Omega} \left[\psi\,\nabla\sqrt{\pi} - \sqrt{\pi}\,\nabla\psi\right]\cdot n\, d\sigma,$$

since $H\sqrt{\pi} = 0$. Therefore, $\int_\Omega H\psi\,\sqrt{\pi}\,d\beta = 0$ if $\psi\,\nabla\sqrt{\pi} = \sqrt{\pi}\,\nabla\psi$ on $\partial\Omega$. Now, let $\psi = P\sqrt{\pi}$. Then

$$\sqrt{\pi}\,\nabla\psi = \pi\,\nabla P + P\,\sqrt{\pi}\,\nabla\sqrt{\pi},$$
so that $\int_\Omega H\psi\,\sqrt{\pi}\,d\beta = 0$ if

$$\frac{\partial P(x)}{\partial x_j}\,\pi(x) = 0, \qquad x \in \partial\Omega,\ j = 1, \dots, d.$$

When $\pi$ has unbounded support, following the previous computations, integrating over the bounded set $B_r$ and taking the limit for $r \to \infty$, one gets

$$\int H\psi\,\sqrt{\pi}\,d\beta = -\frac{1}{2}\lim_{r\to+\infty}\int_{\partial B_r} \pi\,\nabla P\cdot n\, d\sigma. \qquad (6)$$

Therefore, unbiasedness in the unbounded case is reached if the limit appearing on the right-hand side of (6) is zero. Now, the unbiasedness of the ZV-MCMC estimators exploited in Section 8 is discussed. To this end, condition (6) should be verified. Let $B_\rho$ be a hypersphere of radius $\rho$ and let $n := \beta/\rho$ be its normal versor. Then, for linear $P$, (6) equals zero if, for any $j = 1, \dots, d$,

$$\lim_{\rho\to+\infty} \frac{1}{\rho}\int_{\partial B_\rho} \pi(\beta)\,\beta_j\, dS = 0. \qquad (7)$$

The Probit model is considered first. By using the same notation as in Appendix A, the integral in (7) is proportional to

$$\lim_{\rho\to+\infty} \frac{1}{\rho}\int_\Theta K_2(F(\rho,\theta))\,\rho^{d}\, d\theta. \qquad (8)$$

Note that, because of the asymptotic bound on $K_2$ obtained in Appendix A, there exist $\rho_0$ and $M$ such that

$$K_2(F(\rho,\theta))\,\rho^{d-1} \le M\,\phi(\lambda_l(\theta)\rho)\,\rho^{d-1} \le M\,\phi(\lambda_l(\theta)\rho_0)\,\rho_0^{d-1} := G(\theta), \qquad \rho \ge \rho_0.$$

Since $G(\theta) \in L^1(\Theta)$, by the dominated convergence theorem a sufficient condition for unbiasedness is

$$\lim_{\rho\to+\infty} K_2(F(\rho,\theta))\,\rho^{d-1} = 0, \qquad (9)$$

which is true for the Probit model, because of the exponential decay of $K_2$ established in Appendix A.

We now consider the Logit model, for which it is easy to prove that Proposition 1 holds. As done for the Probit model, one can write

$$E_\pi\left[|z_j|^{2+\delta}\right] \propto \int K_1(\beta)\,K_2(\beta)\, d\beta,$$

where

$$K_1(\beta) = \left|\sum_{i=1}^n x_{ij}\,\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}\right|^{2+\delta}, \qquad K_2(\beta) = \prod_{i=1}^n \frac{1}{1+\exp(x_i^T\beta)}.$$
By using the hyper-spherical change of variables introduced in Appendix A, we get

$$E_\pi\left[|z_j|^{2+\delta}\right] \propto \int_\Theta \int_0^{\infty} K_1(F(\rho,\theta))\,K_2(F(\rho,\theta))\,\rho^{d-1}\, d\rho\, d\theta,$$

and, as for the Probit model, we must verify Equation (8). Now analyze $K_2(F(\rho,\theta))$; for any $\theta$, the existence of the MLE implies the existence of some $l$, $1 \le l \le n$, such that $\lambda_l(\theta) > 0$, and therefore

$$K_2(F(\rho,\theta)) = \prod_{i=1}^n \frac{1}{1+\exp(\|x_i\|\rho\lambda_i)} < \frac{1}{1+\exp(\|x_l\|\rho\lambda_l)} \sim O\!\left(\exp(-\rho\lambda_l)\right). \qquad (10)$$

Therefore, there exist $\rho_0$ and $M$ such that

$$K_2(F(\rho,\theta))\,\rho^{d-1} \le M\exp(-\lambda_l(\theta)\rho)\,\rho^{d-1} \le M\exp(-\lambda_l(\theta)\rho_0)\,\rho_0^{d-1} := G(\theta), \qquad \rho \ge \rho_0,$$

where $G(\theta) \in L^1(\Theta)$. These computations allow us to use Equation (9) as a sufficient condition for unbiasedness, and its proof becomes trivial.

Finally, consider the GARCH model. In this case, $B_\rho$ is the portion of a sphere of radius $\rho$ contained in the positive orthant. Then the limit

$$\lim_{\rho\to+\infty}\frac{1}{\rho}\int_{[0,\pi/2]^2} W(F(\rho,\theta,\phi))\,\rho^{3}\, d\theta\, d\phi,$$

where $W$ was defined in (5), should be discussed. Again, an application of the dominated convergence theorem leads to the simpler condition

$$\lim_{\rho\to+\infty} W(F(\rho,\theta,\phi))\,\rho^{2} = 0,$$

which is true, since $W$ decays at an exponential rate.

References

1. Rockafellar, R.: Convex Analysis, pp. 64-65. Princeton University Press (1970)
2. Speckman, P.L., Lee, J., Sun, D.: Existence of the MLE and propriety of posteriors for a general multinomial choice model. Statistica Sinica 19, 731-748 (2009)