Noname manuscript No. (will be inserted by the editor)

Zero Variance Markov Chain Monte Carlo for Bayesian Estimators
Supplementary material

Antonietta Mira · Reza Solgi · Daniele Imparato

Received: date / Accepted: date

A. Mira: Swiss Finance Institute, University of Lugano, via Buffi 13, CH-6904 Lugano, Switzerland. E-mail: antonietta.mira@usi.ch
R. Solgi: Swiss Finance Institute, University of Lugano, via Buffi 13, CH-6904 Lugano, Switzerland. E-mail: reza.solgi@usi.ch
D. Imparato: Department of Economics, University of Insubria, via Monte Generoso 71, 21100 Varese, Italy. E-mail: daniele.imparato@uninsubria.it

Appendix A: Probit model

Mathematical formulation

Let the $y_i$ be Bernoulli r.v.'s: $y_i \mid x_i \sim \mathcal{B}(1, p_i)$, $p_i = \Phi(x_i^T\beta)$, where $\beta \in \mathbb{R}^d$ is the vector of parameters of the model and $\Phi$ is the c.d.f. of a standard normal distribution. The likelihood function is

$$l(\beta \mid y, x) \propto \prod_{i=1}^n \left[\Phi(x_i^T\beta)\right]^{y_i}\left[1 - \Phi(x_i^T\beta)\right]^{1-y_i}.$$

As can be seen by inspection, the likelihood function is invariant under the transformation $(x_i, y_i) \mapsto (-x_i, 1-y_i)$. Therefore, for the sake of simplicity, in the rest of the example we assume $y_i = 1$ for all $i$, so that the likelihood simplifies to

$$l(\beta \mid y, x) \propto \prod_{i=1}^n \Phi(x_i^T\beta).$$

This formula shows that the contribution of an observation $x_i = 0$ is just a constant, $\Phi(x_i^T\beta) = \Phi(0) = 1/2$; therefore, without loss of generality, we assume $x_i \neq 0$ for all $i$.
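The reduction to $y_i = 1$ rests entirely on the invariance above. The following minimal Python sketch (ours, not part of the paper; data and names are made up) checks it numerically:

```python
# Numerical check of l(beta | y, x) = l(beta | 1 - y, -x) for the probit model.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, d = 20, 3
x = rng.normal(size=(n, d))          # design matrix, rows x_i
y = rng.integers(0, 2, size=n)       # Bernoulli responses
beta = rng.normal(size=d)

def probit_loglik(beta, y, x):
    eta = x @ beta
    # uses 1 - Phi(t) = Phi(-t); logcdf avoids underflow for large |eta|
    return np.sum(y * norm.logcdf(eta) + (1 - y) * norm.logcdf(-eta))

assert np.isclose(probit_loglik(beta, y, x), probit_loglik(beta, 1 - y, -x))
```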

Using flat priors, the posterior of the model is proportional to the likelihood, and the Bayesian estimator of each parameter $\beta_k$ is the expected value of $f_k(\beta) = \beta_k$ under $\pi$, $k = 1, \dots, d$. Using the Schrödinger-type Hamiltonian $H$ and $\psi_k(\beta) = P_k(\beta)\sqrt{\pi(\beta)}$ as trial functions, where $P_k(\beta) = \sum_{j=1}^d a_{j,k}\beta_j$ is a first-degree polynomial, one gets

$$\tilde f_k(\beta) = f_k(\beta) + \frac{H\psi_k(\beta)}{\sqrt{\pi(\beta \mid y, x)}} = f_k(\beta) + \sum_{j=1}^d a_{j,k}\, z_j,$$

where, for $j = 1, \dots, d$,

$$z_j = -\frac{1}{2}\sum_{i=1}^n x_{ij}\,\frac{\phi(x_i^T\beta)}{\Phi(x_i^T\beta)},$$

because of the assumption $y_i = 1$ for all $i$.
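In practice the coefficients $a_{j,k}$ are fitted from the MCMC output by minimising the variance of the corrected estimator, which gives the usual least-squares control-variate solution $a = -\mathrm{Var}(z)^{-1}\mathrm{Cov}(z, f)$. The sketch below (ours, not code from the paper; function and variable names are assumptions) assembles the ZV-corrected posterior mean for the probit case with flat prior and $y_i = 1$:

```python
# ZV-MCMC correction for the probit posterior: z_j = -(1/2) sum_i x_ij phi/Phi.
import numpy as np
from scipy.stats import norm

def zv_estimate(beta_draws, x):
    """beta_draws: (M, d) MCMC draws; x: (n, d) design matrix."""
    eta = beta_draws @ x.T                              # (M, n) linear predictors
    ratio = np.exp(norm.logpdf(eta) - norm.logcdf(eta)) # stable phi/Phi
    z = -0.5 * ratio @ x                                # (M, d) control variates
    # optimal linear coefficients: a = -Cov(z, z)^{-1} Cov(z, f)
    zc = z - z.mean(axis=0)
    fc = beta_draws - beta_draws.mean(axis=0)
    a = -np.linalg.solve(zc.T @ zc, zc.T @ fc)          # (d, d), column k for f_k
    return beta_draws.mean(axis=0) + (z @ a).mean(axis=0)
```

The same draws are used both to estimate $a$ and to average the corrected function; this is consistent as the chain length grows, though independent batches can be used if one prefers.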

Central limit theorem

In the following, it is supposed that $P$ is a linear polynomial. In the probit model, the ZV-MCMC estimators obey a CLT if the $z_j$ have finite $2+\delta$ moment under $\pi$, for some $\delta > 0$:

$$E_\pi\left[|z_j|^{2+\delta}\right] = c_1\, E_\pi\left[\left|\sum_{i=1}^n x_{ij}\frac{\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}\right|^{2+\delta}\right] = c_1 c_2 \int \left|\sum_{i=1}^n x_{ij}\frac{\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}\right|^{2+\delta} \prod_{i=1}^n \Phi(x_i^T\beta)\, d\beta < \infty,$$

where $c_1 = (1/2)^{2+\delta}$ and $c_2$ is the normalizing constant of $\pi$ (the target posterior). Define

$$K_1(\beta) = \left|\sum_{i=1}^n x_{ij}\frac{\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}\right|^{2+\delta}, \qquad K_2(\beta) = \prod_{i=1}^n \Phi(x_i^T\beta), \qquad K(\beta) = K_1(\beta)K_2(\beta),$$

and therefore

$$E_\pi\left[|z_j|^{2+\delta}\right] = c \int K_1(\beta)K_2(\beta)\, d\beta,$$

where $c = c_1 c_2$. Before studying the convergence of this integral, the following property of the likelihood of the probit model is needed.

Proposition 1. Existence and uniqueness of the MLE imply that, for any $\beta_0 \neq 0$, there exists $i$ such that $x_i^T\beta_0 < 0$.

Proof (by contradiction). Uniqueness of the MLE implies that $x^Tx$ is full rank, that is, there is no $\beta_0 \neq 0$ orthogonal to all observations $x_i$. This can be seen by contradiction: singularity of $x^Tx$ implies the existence of a non-zero $\beta_0$ orthogonal to all observations $x_i$. This fact, in turn, implies $l(\beta \mid x, y) = l(\beta + c\beta_0 \mid x, y)$ and, therefore, $l(\cdot \mid x, y)$ does not have a unique global maximum. Next, assume there exists some $\beta_0 \neq 0$ such that, for all $i$, $x_i^T\beta_0 \geq 0$. Then $\beta_0$ is a direction of recession for the negative log-likelihood $-\sum_{i=1}^n \ln\Phi(x_i^T\beta)$, which is a proper closed convex function. This implies that this function does not have a non-empty bounded minimum set [1], which means that the MLE does not exist.

Now, rewrite $\int K(\beta)\,d\beta$ in hyper-spherical coordinates through the bijective transformation $(\rho, \theta_1, \dots, \theta_{d-1}) := F^{-1}(\beta)$, where $F$ is defined as

$$\beta_1 = \rho\cos(\theta_1), \qquad \beta_l = \rho\cos(\theta_l)\prod_{m=1}^{l-1}\sin(\theta_m), \quad l = 2, \dots, d-1, \qquad \beta_d = \rho\prod_{m=1}^{d-1}\sin(\theta_m),$$

for $\theta \in \Theta := \{0 \leq \theta_i \leq \pi,\ i = 1, \dots, d-2;\ 0 \leq \theta_{d-1} < 2\pi\}$ and $\rho > 0$. One gets

$$\int K(\beta)\,d\beta = \int_\Theta \int_0^{+\infty} K(F(\rho,\theta))\,\rho^{d-1}\prod_{j=1}^{d-2}\sin^{d-1-j}(\theta_j)\,d\rho\,d\theta \leq \int_\Theta\int_0^{+\infty} K(F(\rho,\theta))\,\rho^{d-1}\,d\rho\,d\theta := \int_\Theta A(\theta)\,d\theta. \tag{1}$$

Observe that the integrand is well defined for any $(\rho, \theta)$ on the domain of integration, so it is enough to study its asymptotic behaviour as $\rho$ goes to infinity, for $\theta \in \Theta$.
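For concreteness, here is a small numerical sketch (ours) of the map $F$; the only property used in the proof is that $\|F(\rho,\theta)\| = \rho$, so each fixed $\theta$ traces a ray along which the integrand is studied as $\rho \to \infty$:

```python
# Hyper-spherical map F: (rho, theta_1..theta_{d-1}) -> beta in R^d,
# plus a quick check that ||F(rho, theta)|| = rho for any angles.
import numpy as np

def F(rho, theta):
    d = len(theta) + 1
    beta = np.empty(d)
    s = 1.0                                  # running product of sines
    for l in range(d - 1):
        beta[l] = rho * np.cos(theta[l]) * s
        s *= np.sin(theta[l])
    beta[d - 1] = rho * s
    return beta

rng = np.random.default_rng(1)
theta = rng.uniform(0, np.pi, size=4)        # d = 5
assert np.isclose(np.linalg.norm(F(2.5, theta)), 2.5)
```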

First, analyze

$$K_1(F(\rho,\theta)) = \left|\sum_{i=1}^n x_{ij}\,\frac{\phi(\|x_i\|\rho\lambda_i(\theta))}{\Phi(\|x_i\|\rho\lambda_i(\theta))}\right|^{2+\delta},$$

where, for each $i$, $\lambda_i$ is a suitable function of the angles $\theta$ such that $\lambda_i \in [-1, 1]$, which takes into account the sign of the scalar product $x_i^T\beta$ in the original coordinate system. For each $i$, as $\rho \to \infty$:

- if $\lambda_i < 0$, then $x_{ij}\,\phi(\|x_i\|\rho\lambda_i)/\Phi(\|x_i\|\rho\lambda_i) = O(\rho)$;
- if $\lambda_i > 0$, then $x_{ij}\,\phi(\|x_i\|\rho\lambda_i)/\Phi(\|x_i\|\rho\lambda_i) = O(\phi(\lambda_i\rho))$;
- if $\lambda_i = 0$, then $x_{ij}\,\phi(0)/\Phi(0) = x_{ij}\sqrt{2/\pi} = O(1)$.

Therefore

$$\left|\sum_{i=1}^n x_{ij}\,\frac{\phi(\|x_i\|\rho\lambda_i)}{\Phi(\|x_i\|\rho\lambda_i)}\right| = O(\rho)$$

and, for any $\theta \in \Theta$, $K_1(F(\rho,\theta)) = O(\rho^{2+\delta})$.

Now, focus on $K_2(F(\rho,\theta)) = \prod_{i=1}^n \Phi(\|x_i\|\rho\lambda_i(\theta))$; the existence of the MLE for the probit model implies, by Proposition 1, that for any $\theta \in \Theta$ there exists some $l = l(\theta)$, $1 \leq l \leq n$, such that $\lambda_l(\theta) < 0$, and therefore

$$K_2(F(\rho,\theta)) < \Phi(\|x_l\|\rho\lambda_l) = O\!\left(\frac{\phi(\lambda_l\rho)}{\rho}\right). \tag{2}$$

Putting these results together leads to

$$K(F(\rho,\theta)) = K_1(F(\rho,\theta))\,K_2(F(\rho,\theta)) = O\!\left(\rho^{1+\delta}\,\phi(\lambda_{l(\theta)}\rho)\right),$$

so that, for any $\theta \in \Theta$,

$$K(F(\rho,\theta))\,\rho^{d-1} = O\!\left(\rho^{\delta+d}\,\phi(\lambda_{l(\theta)}\rho)\right), \qquad \rho \to +\infty.$$

Therefore, whatever the value of $\theta \in \Theta$, the integrand converges to zero rapidly enough as $\rho \to +\infty$. This concludes the proof.
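A quick numerical illustration (ours) of the three regimes of the inverse Mills ratio $\phi(t)/\Phi(t)$ used in the case analysis above:

```python
# phi(t)/Phi(t): ~ |t| as t -> -inf, sqrt(2/pi) at 0, exponentially small as t -> +inf.
import numpy as np
from scipy.stats import norm

for t in [-30.0, -10.0, 0.0, 10.0]:
    print(t, norm.pdf(t) / norm.cdf(t))
# -30.0  ~ 30.03    (grows linearly: the O(rho) regime)
# -10.0  ~ 10.10
#   0.0  ~ 0.7979   (= sqrt(2/pi), the O(1) regime)
#  10.0  ~ 7.7e-23  (the O(phi(t)) regime)
```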

Note. In [2] it is shown that the existence of the posterior under flat priors for the probit and logit models is equivalent to the existence and finiteness of the MLE. This ensures that the posterior is well defined in our context. In order to verify the existence of the posterior mean, a simplified version of the proof given above can be used: it suffices to show that $\int |\beta_j| \prod_{i=1}^n \Phi(x_i^T\beta)\, d\beta < +\infty$, that is, to run the argument with $K_1(\beta) = |\beta_j| = O(\rho)$. Therefore, a weaker version of the proof given above can be employed.

Note. The proof given above uses flat priors; although this assumption simplifies the computations, a very similar proof applies to non-flat priors. Assume the prior is $\pi_0(\beta)$. Under this assumption the posterior is

$$\pi(\beta) \propto \pi_0(\beta)\, l(\beta \mid y, x) \propto \pi_0(\beta)\prod_{i=1}^n \Phi(x_i^T\beta)$$

and the control variates are

$$z_j = -\frac{1}{2}\left[\frac{\partial \ln \pi_0(\beta)}{\partial \beta_j} + \sum_{i=1}^n x_{ij}\,\frac{\phi(x_i^T\beta)}{\Phi(x_i^T\beta)}\right].$$

Therefore, we need to prove that the $(2+\delta)$-th moment of $z_j$ under $\pi(\beta)$ is finite. A sufficient condition for this is the finiteness of the $(2+\delta)$-th moments of $\partial\ln\pi_0(\beta)/\partial\beta_j$ and of $\sum_i x_{ij}\phi(x_i^T\beta)/\Phi(x_i^T\beta)$ under $\pi(\beta)$. If we assume that $\pi_0(\beta)$ is bounded above, the latter is a straightforward consequence of the proof given above for flat priors. Therefore, we only need to prove the finiteness of the integral

$$\int \left|\frac{\partial\ln\pi_0(\beta)}{\partial\beta_j}\right|^{2+\delta} \pi_0(\beta)\prod_{i=1}^n\Phi(x_i^T\beta)\,d\beta.$$

Again, if the prior is bounded from above, a sufficient condition for the existence of this integral is the existence of

$$\int \left|\frac{\partial\ln\pi_0(\beta)}{\partial\beta_j}\right|^{2+\delta}\prod_{i=1}^n\Phi(x_i^T\beta)\,d\beta.$$

A proof very similar to the one given above shows that this integral is finite for common choices of the prior $\pi_0(\beta)$, such as the Normal and Student's t distributions.

Appendix B: Logit model

Mathematical formulation

In the same setting as the probit model, let $p_i = \exp(x_i^T\beta)/(1+\exp(x_i^T\beta))$, where $\beta \in \mathbb{R}^d$ is the vector of parameters of the model. The likelihood function is

$$l(\beta \mid y, x) \propto \prod_{i=1}^n \left[\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}\right]^{y_i}\left[\frac{1}{1+\exp(x_i^T\beta)}\right]^{1-y_i}. \tag{3}$$

By inspection, it is easy to verify that the likelihood function is invariant under the transformation $(x_i, y_i)\mapsto(-x_i, 1-y_i)$. Therefore, for the sake of simplicity, in the sequel we assume $y_i = 0$ for all $i$, so that the likelihood simplifies to

$$l(\beta \mid y, x) \propto \prod_{i=1}^n \frac{1}{1+\exp(x_i^T\beta)}.$$

The contribution of an observation $x_i = 0$ to the likelihood is just a constant; therefore, without loss of generality, it is assumed that $x_i \neq 0$ for all $i$.

Using flat priors, the posterior distribution is proportional to (3), and the Bayesian estimator of each parameter $\beta_k$ is the expected value of $f_k(\beta) = \beta_k$ under $\pi$, $k = 1,\dots,d$. Using the same pair of operator $H$ and trial function $\psi_k$ as in Appendix A, the control variates are

$$z_j = \frac{1}{2}\sum_{i=1}^n x_{ij}\,\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}, \qquad j = 1,\dots,d.$$

Central limit theorem

As for the probit model, the ZV-MCMC estimators obey a CLT if the control variates $z_j$ have finite $2+\delta$ moment under $\pi$, for some $\delta>0$:

$$E_\pi\left[|z_j|^{2+\delta}\right] = c_1\, E_\pi\left[\left|\sum_{i=1}^n x_{ij}\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}\right|^{2+\delta}\right] = c_1c_2\int\left|\sum_{i=1}^n x_{ij}\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}\right|^{2+\delta}\prod_{i=1}^n\frac{1}{1+\exp(x_i^T\beta)}\,d\beta,$$

where $c_1 = (1/2)^{2+\delta}$ and $c_2$ is the normalizing constant of $\pi$. The finiteness of the integral is, indeed, trivial. Observe that the function $e^y/(1+e^y)$ is bounded by 1; therefore,

$$E_\pi\left[|z_j|^{2+\delta}\right] \leq \left(\frac{1}{2}\sum_{i=1}^n |x_{ij}|\right)^{2+\delta} < \infty.$$

As for the probit model, note that it can easily be shown that the posterior means exist under flat priors. Moreover, a very similar proof (see the Notes of Appendix A) can be used under a Normal or Student's t prior distribution.
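The boundedness argument is easy to see in code. A minimal sketch (ours; names and the $y_i = 0$, flat-prior setting follow the simplification above):

```python
# Logit control variates with y_i = 0 and a flat prior:
# z_j = (1/2) sum_i x_ij * sigmoid(x_i^T beta), bounded by (1/2) sum_i |x_ij|,
# which is the key fact behind the "trivial" CLT above.
import numpy as np

def logit_control_variates(beta_draws, x):
    """beta_draws: (M, d) posterior draws; x: (n, d) design matrix."""
    sig = 1.0 / (1.0 + np.exp(-(beta_draws @ x.T)))   # sigmoid(x_i^T beta)
    z = 0.5 * sig @ x                                 # (M, d) control variates
    bound = 0.5 * np.abs(x).sum(axis=0)               # (1/2) sum_i |x_ij|
    assert np.all(np.abs(z) <= bound + 1e-12)
    return z
```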

Appendix C: GARCH model

Mathematical formulation

We assume that the returns are conditionally normally distributed, $r_t \mid \mathcal{F}_{t-1} \sim N(0, h_t)$, where $h_t$ is a predictable ($\mathcal{F}_{t-1}$-measurable) process:

$$h_t = \omega_1 + \omega_3 h_{t-1} + \omega_2 r_{t-1}^2,$$

where $\omega_1 > 0$, $\omega_2 \geq 0$ and $\omega_3 \geq 0$. Let $r = (r_1, \dots, r_T)$ be the observed time series. The likelihood function is

$$l(\omega_1,\omega_2,\omega_3 \mid r) \propto \prod_{t=1}^T h_t^{-1/2}\exp\!\left(-\frac{r_t^2}{2h_t}\right),$$

and, using independent truncated Normal priors for the parameters, the posterior is (on the constraint region $\omega_1>0$, $\omega_2\geq0$, $\omega_3\geq0$)

$$\pi(\omega_1,\omega_2,\omega_3 \mid r) \propto \exp\!\left[-\frac{1}{2}\left(\frac{\omega_1^2}{\sigma_{\omega_1}^2}+\frac{\omega_2^2}{\sigma_{\omega_2}^2}+\frac{\omega_3^2}{\sigma_{\omega_3}^2}\right)\right]\prod_{t=1}^T h_t^{-1/2}\exp\!\left(-\frac{r_t^2}{2h_t}\right).$$

Therefore, the control variates when the trial function is a first-degree polynomial are $z_i = -\frac{1}{2}\,\partial\ln\pi/\partial\omega_i$, with

$$\frac{\partial\ln\pi}{\partial\omega_i} = -\frac{\omega_i}{\sigma_{\omega_i}^2} - \frac{1}{2}\sum_{t=1}^T\frac{1}{h_t}\frac{\partial h_t}{\partial\omega_i} + \frac{1}{2}\sum_{t=1}^T\frac{r_t^2}{h_t^2}\frac{\partial h_t}{\partial\omega_i}, \qquad i = 1,2,3,$$

where

$$\frac{\partial h_t}{\partial\omega_1} = \frac{1-\omega_3^t}{1-\omega_3}, \qquad \frac{\partial h_t}{\partial\omega_2} = r_{t-1}^2 + \omega_3\frac{\partial h_{t-1}}{\partial\omega_2}\,I_{\{t>1\}}, \qquad \frac{\partial h_t}{\partial\omega_3} = h_{t-1} + \omega_3\frac{\partial h_{t-1}}{\partial\omega_3}\,I_{\{t>1\}}.$$
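These recursions are straightforward to evaluate in a single pass. A sketch (ours; variable names and the initialisation convention are assumptions) of the computation feeding the GARCH control variates:

```python
# GARCH(1,1) recursions for h_t and dh_t/dw_i, t = 1..T.
import numpy as np

def garch_h_and_grads(omega, r_lag, h0):
    """omega = (w1, w2, w3); r_lag = (r_0, ..., r_{T-1}) lagged returns;
    returns h_t and dh_t/dw_i computed by the recursions above."""
    w1, w2, w3 = omega
    T = len(r_lag)
    h, dh = np.empty(T), np.empty((T, 3))
    h_prev, dh_prev = h0, np.zeros(3)
    for t in range(T):
        h[t] = w1 + w2 * r_lag[t] ** 2 + w3 * h_prev
        dh[t, 0] = 1.0 + w3 * dh_prev[0]               # dh_t/dw1
        dh[t, 1] = r_lag[t] ** 2 + w3 * dh_prev[1]     # dh_t/dw2
        dh[t, 2] = h_prev + w3 * dh_prev[2]            # dh_t/dw3
        h_prev, dh_prev = h[t], dh[t]
    return h, dh
```

Plugging `h` and `dh` into the displayed score formula then gives $\partial\ln\pi/\partial\omega_i$, and hence $z_i$, at each MCMC draw.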

Central limit theorem

In order to prove the CLT for the ZV-MCMC estimators in the GARCH model, we need

$$\frac{\partial\ln\pi}{\partial\omega_1},\ \frac{\partial\ln\pi}{\partial\omega_2},\ \frac{\partial\ln\pi}{\partial\omega_3} \in L^{2+\delta}(\pi). \tag{4}$$

To this end, $h_t$ and its partial derivatives should be expressed as functions of $h_0$ and $r$:

$$h_t = \omega_1\sum_{k=1}^t\omega_3^{k-1} + \omega_3^t h_0 + \omega_2\sum_{k=1}^t\omega_3^{k-1}r_{t-k}^2,$$

$$\frac{\partial h_t}{\partial\omega_1} = \frac{1-\omega_3^t}{1-\omega_3}, \qquad \frac{\partial h_t}{\partial\omega_2} = \sum_{j=0}^{t-1}\omega_3^{t-1-j}r_j^2, \qquad \frac{\partial h_t}{\partial\omega_3} = \sum_{j=0}^{t-1}\omega_3^{t-1-j}h_j.$$

Next, moving to spherical coordinates $(\omega_1,\omega_2,\omega_3) = (\rho\cos\theta\sin\varphi,\ \rho\sin\theta\sin\varphi,\ \rho\cos\varphi)$, the integral in (4) can be written as

$$\int_{[0,\pi/2]^2}\int_0^{+\infty} K_j(\rho;\theta,\varphi)\,d\rho\,d\theta\,d\varphi := \int_{[0,\pi/2]^2} A_j(\theta,\varphi)\,d\theta\,d\varphi,$$

where, for $j = 1,2,3$, $K_j(\cdot;\theta,\varphi) = |W_j|^{2+\delta}\,W$, with

$$W_1 = -\frac{\rho\cos\theta\sin\varphi}{\sigma_{\omega_1}^2} + \frac{1}{2}\sum_{t=1}^T\left(\frac{r_t^2}{h_t^2}-\frac{1}{h_t}\right)\frac{1-\rho^t\cos^t\varphi}{1-\rho\cos\varphi},$$

$$W_2 = -\frac{\rho\sin\theta\sin\varphi}{\sigma_{\omega_2}^2} + \frac{1}{2}\sum_{t=1}^T\left(\frac{r_t^2}{h_t^2}-\frac{1}{h_t}\right)\sum_{j=0}^{t-1}(\rho\cos\varphi)^{t-1-j}r_j^2,$$

$$W_3 = -\frac{\rho\cos\varphi}{\sigma_{\omega_3}^2} + \frac{1}{2}\sum_{t=1}^T\left(\frac{r_t^2}{h_t^2}-\frac{1}{h_t}\right)\sum_{j=0}^{t-1}(\rho\cos\varphi)^{t-1-j}h_j,$$

and

$$W = \exp\!\left[-\frac{\rho^2}{2}\left(\frac{\cos^2\theta\sin^2\varphi}{\sigma_{\omega_1}^2}+\frac{\sin^2\theta\sin^2\varphi}{\sigma_{\omega_2}^2}+\frac{\cos^2\varphi}{\sigma_{\omega_3}^2}\right)\right]\prod_{t=1}^T h_t^{-1/2}\,\exp\!\left(-\frac{1}{2}\sum_{t=1}^T\frac{r_t^2}{h_t}\right), \tag{5}$$

where, in these coordinates,

$$h_t = \rho\cos\theta\sin\varphi\sum_{k=1}^t(\rho\cos\varphi)^{k-1} + h_0\,(\rho\cos\varphi)^t + \rho\sin\theta\sin\varphi\sum_{k=1}^t r_{t-k}^2\,(\rho\cos\varphi)^{k-1}.$$
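As a sanity check (ours, toy numbers), the closed-form expression for $h_t$ above agrees with the defining recursion:

```python
# Closed-form h_t vs. the GARCH recursion, for a few t.
import numpy as np

rng = np.random.default_rng(2)
w1, w2, w3, h0 = 0.1, 0.2, 0.7, 1.0
r_lag = rng.normal(size=6)                 # r_0, ..., r_5 feeding h_1, ..., h_6

h_rec, h_prev = [], h0
for rt in r_lag:                           # h_t = w1 + w2 r_{t-1}^2 + w3 h_{t-1}
    h_prev = w1 + w2 * rt ** 2 + w3 * h_prev
    h_rec.append(h_prev)

for t in range(1, 7):
    k = np.arange(1, t + 1)
    h_closed = (w1 * np.sum(w3 ** (k - 1)) + w3 ** t * h0
                + w2 * np.sum(w3 ** (k - 1) * r_lag[t - k] ** 2))
    assert np.isclose(h_rec[t - 1], h_closed)
```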

The aim is to prove that, for any $(\theta,\varphi) \in [0,\pi/2]^2$ and for any $j$, $A_j(\theta,\varphi)$ is finite. To this end, the convergence of $A_j$ for any $(\theta,\varphi)$ should be discussed. Let us first determine the proper domain of $K_j(\cdot;\theta,\varphi)$. Observe that $K_j(\cdot;\theta,\varphi)$ is not defined whenever $h_t = 0$ and, if $j = 1$ and $\varphi \neq \pi/2$, also for $\rho = 1/\cos\varphi$. However, the discontinuity of $K_1$ at this point is removable, so that the domain of $K_1$ can be extended by continuity to $\rho = 1/\cos\varphi$. Since $h_t = 0$ if and only if $\rho = 0$, it can be concluded that, for any $j$ and for any $(\theta,\varphi)\in[0,\pi/2]^2$, the proper domain of $K_j(\cdot;\theta,\varphi)$ is $(0,+\infty)$.

Fixing the values of $\theta$ and $\varphi$, let us study the limits of $K_j$ as $\rho\to0$ and $\rho\to+\infty$. Observe that, whatever the values of $\theta$ and $\varphi$, the $W_j$ are rational functions of $\rho$. Therefore, for any $j$, $|W_j|^{2+\delta}$ cannot grow towards infinity more than polynomially at the boundary of the domain. On the other hand, $W$ goes to zero at an exponential rate both as $\rho\to0$ and as $\rho\to+\infty$, for any $\theta$ and $\varphi$. This is sufficient to conclude that, for any $(\theta,\varphi)\in[0,\pi/2]^2$, the integral $A_j$ is finite for $j=1,2,3$; therefore, condition (4) holds and the ZV estimators for the GARCH model obey a CLT.

Appendix D: Unbiasedness

In this Appendix, explicit computations that were omitted in Section 5 are presented. Moreover, it is proved that all the ZV-MCMC estimators discussed in Section 8 are unbiased. Following the same notation as in Section 5, equation (9) follows because

$$E_\pi\!\left[\frac{H\psi}{\sqrt{\pi}}\right] = \int_\Omega (H\psi)\sqrt{\pi} = \int_\Omega\left(-\frac{1}{2}\Delta\psi + V\psi\right)\sqrt{\pi}$$
$$= \int_\Omega V\psi\sqrt{\pi} - \frac{1}{2}\int_{\partial\Omega}\sqrt{\pi}\,\nabla\psi\cdot n\,d\sigma + \frac{1}{2}\int_\Omega\nabla\psi\cdot\nabla\sqrt{\pi}$$
$$= \int_\Omega V\psi\sqrt{\pi} - \frac{1}{2}\int_{\partial\Omega}\sqrt{\pi}\,\nabla\psi\cdot n\,d\sigma + \frac{1}{2}\int_{\partial\Omega}\psi\,\nabla\sqrt{\pi}\cdot n\,d\sigma - \frac{1}{2}\int_\Omega\psi\,\Delta\sqrt{\pi}$$
$$= \int_\Omega (H\sqrt{\pi})\,\psi + \frac{1}{2}\int_{\partial\Omega}\left[\psi\,\nabla\sqrt{\pi} - \sqrt{\pi}\,\nabla\psi\right]\cdot n\,d\sigma$$
$$= \frac{1}{2}\int_{\partial\Omega}\left[\psi\,\nabla\sqrt{\pi} - \sqrt{\pi}\,\nabla\psi\right]\cdot n\,d\sigma,$$

since $H\sqrt{\pi} = 0$. Then, $E_\pi[H\psi/\sqrt{\pi}] = 0$ if $\psi\,\nabla\sqrt{\pi} = \sqrt{\pi}\,\nabla\psi$ on $\partial\Omega$. Now, let $\psi = P\sqrt{\pi}$. Then

$$\nabla\psi = \sqrt{\pi}\,\nabla P + P\,\nabla\sqrt{\pi},$$

so that $\psi\,\nabla\sqrt{\pi} - \sqrt{\pi}\,\nabla\psi = -\pi\,\nabla P$, and $E_\pi[H\psi/\sqrt{\pi}] = 0$ if

$$\frac{\partial P(x)}{\partial x_j}\,\pi(x) = 0, \qquad \forall x\in\partial\Omega,\ j = 1,\dots,d.$$
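The algebraic step $\psi\,\nabla\sqrt{\pi} - \sqrt{\pi}\,\nabla\psi = -\pi\,\nabla P$ can be verified symbolically. A small sketch (ours) with SymPy, for generic $P$ and $\pi$ in two variables:

```python
# Symbolic check: with psi = P*sqrt(pi),
# psi*grad(sqrt(pi)) - sqrt(pi)*grad(psi) = -pi*grad(P), componentwise.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
P = sp.Function('P')(x1, x2)
pi = sp.Function('pi')(x1, x2)   # target density, implicitly positive

sqrt_pi = sp.sqrt(pi)
psi = P * sqrt_pi
for v in (x1, x2):
    lhs = psi * sp.diff(sqrt_pi, v) - sqrt_pi * sp.diff(psi, v)
    assert sp.simplify(lhs + pi * sp.diff(P, v)) == 0
```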

When $\pi$ has unbounded support, following the previous computations, integrating over the bounded set $B_r$ and taking the limit as $r\to\infty$, one gets

$$E_\pi\!\left[\frac{H\psi}{\sqrt{\pi}}\right] = -\frac{1}{2}\lim_{r\to+\infty}\int_{\partial B_r}\pi\,\nabla P\cdot n\,d\sigma. \tag{6}$$

Therefore, unbiasedness in the unbounded case is reached if the limit appearing on the right-hand side of (6) is zero.

Now, the unbiasedness of the ZV-MCMC estimators exploited in Section 8 is discussed. To this end, condition (6) should be verified. Let $B_\rho$ be a hypersphere of radius $\rho$ and let $n := \beta/\rho$ be its outward unit normal. Then, for linear $P$, (6) equals zero if, for any $j = 1,\dots,d$,

$$\lim_{\rho\to+\infty}\frac{1}{\rho}\int_{\partial B_\rho}\pi(\beta)\,\beta_j\,d\sigma = 0. \tag{7}$$

The probit model is considered first. Using the same notation as in Appendix A, the integral in (7) is proportional to

$$\lim_{\rho\to+\infty}\int_\Theta K_2(F(\rho,\theta))\,\rho^{d-1}\,d\theta. \tag{8}$$

Note that, because of (2), there exist $\rho_0$ and $M$ such that

$$K_2(F(\rho,\theta))\,\rho^{d-1} \leq M\,\phi(\lambda_{l(\theta)}\rho)\,\rho^{d-2} \leq M\,\phi(\lambda_{l(\theta)}\rho_0)\,\rho_0^{d-2} := G(\theta), \qquad \forall\rho\geq\rho_0.$$

Since $G(\theta)\in L^1(\Theta)$, by the dominated convergence theorem a sufficient condition for unbiasedness is

$$\lim_{\rho\to+\infty} K_2(F(\rho,\theta))\,\rho^{d-1} = 0, \tag{9}$$

which is true, because of (2), for the probit model.
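A numeric illustration (ours, toy data) of condition (9) for the probit model: the surface mass $K_2(F(\rho,\theta))\,\rho^{d-1}$, averaged over directions, dies out as $\rho$ grows. For typical random draws the $x_i$ positively span $\mathbb{R}^2$, so along every direction some $x_i^T\beta < 0$ and the product of c.d.f.'s decays like $\phi$:

```python
# Decay of the spherical-surface integrand for a toy probit posterior (d = 2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(size=(10, 2))                       # toy design, rows x_i
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
U = np.stack([np.cos(theta), np.sin(theta)])       # unit directions (2, 200)

for rho in [1.0, 5.0, 10.0, 20.0]:
    K2 = np.prod(norm.cdf(rho * x @ U), axis=0)    # prod_i Phi(x_i^T beta)
    print(rho, rho ** (2 - 1) * 2 * np.pi * np.mean(K2))
# the printed values decrease rapidly towards zero as rho grows
```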

We now consider the logit model, for which it is easy to prove that Proposition 1 holds. As done for the probit model, one can write

$$E_\pi\left[|z_j|^{2+\delta}\right] \propto \int K_1(\beta)K_2(\beta)\,d\beta,$$

where

$$K_1(\beta) = \left|\sum_{i=1}^n x_{ij}\,\frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}\right|^{2+\delta}, \qquad K_2(\beta) = \prod_{i=1}^n\frac{1}{1+\exp(x_i^T\beta)}.$$

By using the hyper-spherical change of variables in (1), we get

$$E_\pi\left[|z_j|^{2+\delta}\right] \propto \int_\Theta\int_0^{+\infty} K_1(F(\rho,\theta))\,K_2(F(\rho,\theta))\,\rho^{d-1}\,d\rho\,d\theta,$$

and, as for the probit model, we must verify Equation (8). Now analyze $K_2(F(\rho,\theta))$; for any $\theta$, the existence of the MLE implies the existence of some $l = l(\theta)$, $1\leq l\leq n$, such that $\lambda_l(\theta) > 0$, and therefore

$$K_2(F(\rho,\theta)) = \prod_{i=1}^n\frac{1}{1+\exp(\|x_i\|\rho\lambda_i)} < \frac{1}{1+\exp(\|x_l\|\rho\lambda_l)} = O\!\left(\exp(-\rho\lambda_l)\right). \tag{10}$$

Therefore, there exist $\rho_0$ and $M$ such that

$$K_2(F(\rho,\theta))\,\rho^{d-1} \leq M\exp(-\lambda_{l(\theta)}\rho)\,\rho^{d-1} \leq M\exp(-\lambda_{l(\theta)}\rho_0)\,\rho_0^{d-1} := G(\theta), \qquad \forall\rho\geq\rho_0,$$

where $G(\theta)\in L^1(\Theta)$. These computations allow us to use Equation (9) as a sufficient condition for unbiasedness, and its proof becomes trivial.

Finally, consider the GARCH model. In this case, $B_\rho$ is the portion of a sphere of radius $\rho$ contained in the positive orthant. Then, the limit

$$\lim_{\rho\to+\infty}\frac{1}{\rho}\int_{[0,\pi/2]^2} W(F(\rho,\theta,\varphi))\,\rho^{d-1}\,d\theta\,d\varphi,$$

where $W$ was defined in (5), should be discussed. Again, an application of the dominated convergence theorem leads to the simpler condition

$$\lim_{\rho\to+\infty} W(F(\rho,\theta,\varphi))\,\rho^{d-2} = 0 \qquad (\text{here } d = 3),$$

which is true, since $W$ decays at an exponential rate.

References

1. Rockafellar, R.T.: Convex Analysis, pp. 64–65. Princeton University Press (1970)
2. Speckman, P.L., Lee, J., Sun, D.: Existence of the MLE and propriety of posteriors for a general multinomial choice model. Statistica Sinica 19, 731–748 (2009)