Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence


Robert L. Wolpert
Institute of Statistics and Decision Sciences
Duke University, Durham NC 27708-0251
Revised April

Abstract

In the Bayesian paradigm the marginal probability density function at the observed data vector is the key ingredient needed to compute Bayes factors and posterior probabilities of models and hypotheses. Although Markov chain Monte Carlo methods have simplified many calculations needed for the practical application of Bayesian methods, the problem of evaluating this marginal probability remains difficult. Newton and Raftery discovered that the harmonic mean of the likelihood function along an MCMC stream converges almost surely, but very slowly, to the required marginal probability density. In this paper examples are shown to illustrate that these harmonic means converge in distribution to a one-sided stable law with index between one and two. Methods are proposed and illustrated for evaluating the required marginal probability density of the data from the limiting stable distribution, offering a dramatic acceleration in convergence over existing methods.

Key words: Bayes factors, domain of attraction, harmonic mean, Markov chain Monte Carlo, model mixing.

Supported by U.S. E.P.A. grant CR847--. Available at URL: ftp://ftp.isds.duke.edu/pub/workingpapers/-.ps

1 Introduction

In the Bayesian paradigm hypotheses are tested and models are selected by evaluating their posterior probabilities or, when opinions differ about what the hypotheses' or models' prior probabilities should be, Bayes factors are used to represent the relative support offered by the data for one hypothesis or model against another. In Bayesian model mixing, predictions and inference are made amid uncertainty about the underlying model by weighting different models' predictions by their posterior probabilities. In all these cases a key ingredient is the marginal probability density function $f_m(x)$ at the observed data vector $x$, for each model $m$ under consideration. Although Markov chain Monte Carlo methods have broadened dramatically the class of problems that can be solved numerically using Bayesian methods, the problem of evaluating $f_m(x)$ remains difficult.

Newton and Raftery (1994) discovered that the sample means of the inverse likelihood function $1/f_m(x \mid \theta_i)$ along an MCMC stream $\{\theta_i\}$ will converge almost surely to the inverse $1/f_m(x)$, so that $f_m(x)$ itself can be approximated by the harmonic mean of the likelihoods $f_m(x \mid \theta_i)$; unfortunately, as they also discovered, the partial sums $S_n$ of the $1/f_m(x \mid \theta_i)$ often do not obey a Central Limit Theorem, and in those cases $S_n/n$ does not converge quickly. Many others have since observed the same phenomenon. In this paper examples are shown to illustrate that $1/f_m(x \mid \theta_i)$ will lie in the domain of attraction of a one-sided stable law of index $\alpha > 1$. Only in problems with precise prior information and diffuse likelihoods is $\alpha = 2$, where the Central Limit Theorem applies and the $S_n$ have a limiting Gaussian distribution, with sample means converging at rate $n^{-1/2}$; with more sample information (or less prior information) the limit law is stable of index close to one, with slow convergence at rate $n^{-(1 - 1/\alpha)}$.

2 Stable Laws

Early in the twentieth century Paul Lévy (1925) proved that the only possible limiting distributions for recentered and rescaled partial sums $S_n = \sum_{j \le n} Y_j$ of independent identically-distributed random variables are the stable laws $(S_n - a_n)/b_n \Rightarrow Z$, with characteristic functions (see Cheng and Liu, 1997)

$$ E[e^{i\omega Z}] = \exp\Bigl( i\delta\omega - \gamma^{\alpha}|\omega|^{\alpha} + i\beta\tan\tfrac{\pi\alpha}{2}\,\bigl\{ \gamma^{\alpha}|\omega|^{\alpha}\,\mathrm{sgn}\,\omega - \gamma\omega \bigr\} \Bigr) \qquad (1) $$

for some index $\alpha \in (0,1) \cup (1,2]$, skewness $\beta$, scale $\gamma > 0$, and location $-\infty < \delta < \infty$ (a similar formula holds for $\alpha = 1$).
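For readers who wish to experiment numerically, the following minimal sketch (an illustrative addition, not part of the original paper) evaluates the characteristic function in the continuous parameterization (1) as reconstructed above; the function name and the use of NumPy are the editor's assumptions.

```python
import numpy as np

def stable_cf(omega, alpha, beta, gamma, delta):
    """Characteristic function E[exp(i*omega*Z)] of the stable law in the
    continuous parameterization of equation (1), for alpha != 1.
    (Illustrative sketch; vectorization and naming are assumptions.)"""
    omega = np.asarray(omega, dtype=float)
    t = np.tan(np.pi * alpha / 2.0)
    exponent = (1j * delta * omega
                - gamma**alpha * np.abs(omega)**alpha
                + 1j * beta * t * (gamma**alpha * np.abs(omega)**alpha * np.sign(omega)
                                   - gamma * omega))
    return np.exp(exponent)

# For 1 < alpha < 2 the mean is E[Z] = delta - beta*gamma*tan(pi*alpha/2);
# a crude finite-difference check (agreement is only approximate, since the
# |omega|^alpha term contributes an O(h^(alpha-1)) error at step size h):
alpha, beta, gamma, delta = 1.5, 1.0, 1.0, 0.0
h = 1e-6
mean_fd = np.imag(np.log(stable_cf(h, alpha, beta, gamma, delta))) / h
print(mean_fd, delta - beta * gamma * np.tan(np.pi * alpha / 2))
```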

For $1 < \alpha < 2$ and $\beta = 1$, the cases of interest to us below, $Z$ is integrable and the characteristic function can be written in Lévy-Khinchine form as

$$ E[e^{i\omega Z}] = \exp\Bigl( i\delta\omega + C \int_0^\infty \bigl\{ e^{i\omega u} - 1 - i\omega u \bigr\}\, u^{-\alpha - 1}\, du \Bigr) \qquad (2) $$

with $C = 2\gamma^{\alpha}\sin(\tfrac{\pi\alpha}{2})\,\Gamma(\alpha + 1)/\pi$. We will face the problem of estimating the mean

$$ E[Z] = \delta - \beta\gamma\tan\tfrac{\pi\alpha}{2} \approx \delta + \frac{2\beta\gamma}{\pi(\alpha - 1)} \qquad (\text{the approximation holding for } \alpha \text{ near one}) $$

from data. This location/scale family has a density function $\tfrac{1}{\gamma}\, f_{\alpha,\beta}\bigl(\tfrac{x - \delta}{\gamma}\bigr)$, but only in a few special cases is $f_{\alpha,\beta}(x)$ known in closed form; still it and its distribution function can be evaluated numerically by inverting the Fourier transform,

$$ f_{\alpha,\beta}(x) = \frac{1}{2\pi} \int_{\mathbb R} e^{-i\omega x - |\omega|^{\alpha} + i\beta\tan\frac{\pi\alpha}{2}[|\omega|^{\alpha}\mathrm{sgn}\,\omega - \omega]}\, d\omega
= \frac{1}{\pi} \int_0^\infty e^{-\omega^{\alpha}} \cos\bigl( \omega x - \beta\tan\tfrac{\pi\alpha}{2}[\omega^{\alpha} - \omega] \bigr)\, d\omega $$
$$ F_{\alpha,\beta}(x) = \tfrac12 + \frac{1}{\pi} \int_0^\infty e^{-\omega^{\alpha}} \sin\bigl( \omega x - \beta\tan\tfrac{\pi\alpha}{2}[\omega^{\alpha} - \omega] \bigr)\, \frac{d\omega}{\omega}. \qquad (3) $$

If the distribution of the $\{Y_j\}$ has finite variance (or tail probabilities that fall off fast enough that $y^2\, P[|Y_j| > y] \to 0$) the central limit theorem applies and the limit must be normal. The limit will be stable of index $\alpha < 2$ if instead $P[|Y_j| > y] \sim k\, y^{-\alpha}$ as $y \to \infty$ for some $0 < \alpha < 2$; it will be one-sided stable if also $P[Y_j < -y]/P[Y_j > y] \to 0$, whereupon $\beta = 1$ and the density function is given by

$$ f_{\alpha,1,\gamma,\delta}(x) = \frac{1}{\gamma\pi} \int_0^\infty e^{-\omega^{\alpha}} \cos\Bigl( \frac{\omega(x - \delta)}{\gamma} - \tan\tfrac{\pi\alpha}{2}\,(\omega^{\alpha} - \omega) \Bigr)\, d\omega \qquad (4) $$
$$ \phantom{f_{\alpha,1,\gamma,\delta}(x)} = \frac{1}{\gamma\pi} \int_0^\infty e^{-\omega} \cos\Bigl( \frac{\omega(x - \delta)}{\gamma} + \frac{2}{\pi}\,\omega\log\omega \Bigr)\, d\omega \quad\text{if } \alpha = 1, \qquad (5) $$

with approximate median and quartiles given by the standard quantiles $F_{\alpha,1}^{-1}(.50)$, $F_{\alpha,1}^{-1}(.25)$, and $F_{\alpha,1}^{-1}(.75)$, rescaled by $\gamma$ and shifted by $\delta$.
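As an illustration of how (4) can be evaluated in practice, here is a small numerical sketch (an editorial addition, not the paper's code) that integrates (4) with SciPy; the function name and integration settings are assumptions.

```python
import numpy as np
from scipy.integrate import quad

def onesided_stable_pdf(x, alpha, gamma=1.0, delta=0.0):
    """Density f_{alpha,1,gamma,delta}(x) of the fully-skewed (beta = 1) stable
    law, computed by numerically inverting the Fourier transform as in (4).
    Illustrative sketch; assumes 1 < alpha < 2."""
    t = np.tan(np.pi * alpha / 2.0)

    def integrand(w):
        return np.exp(-w**alpha) * np.cos(w * (x - delta) / gamma - t * (w**alpha - w))

    val, _ = quad(integrand, 0.0, np.inf, limit=200)
    return val / (gamma * np.pi)

# Example: evaluate the standard (gamma = 1, delta = 0) density on a grid.
xs = np.linspace(-2.0, 6.0, 9)
print([round(onesided_stable_pdf(x, alpha=1.5), 4) for x in xs])
```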

For example, here are the densities for two such indices, one with $\alpha$ close to one and one with $\alpha = 1.5$, each with unit scale $\gamma = 1$ and zero location $\delta = 0$:

[Figure: densities $f(x)$ of the fully-skewed stable laws $\mathrm{St}(\alpha, 1, 1, 0)$ for the two indices, plotted against $x$; in each panel the median (M) and the mean (mu) are marked.]

Each has the median (near $x = 0$) and the mean (further to the right) labeled (with M and mu, respectively) and indicated with short vertical strokes; evidently the right tails are quite heavy, more so for the smaller $\alpha$, making it difficult to estimate $\delta$ or $E[Y_j]$ from sample averages.

In principle it would be possible to estimate $\alpha$, $\gamma$ and $\delta$ by constructing averages $Z_i$ of sufficiently many of the $Y_j$ that the $\{Z_i\}$ are approximately independent one-sided stable variables of index $\alpha$ and center $\delta$, then applying nonlinear optimization methods to minimize the negative log likelihood

$$ -\log L(\alpha, \gamma, \delta) = -\sum_{i=1}^{m} \log f_{\alpha,1,\gamma,\delta}(z_i), $$

evaluated by applying numerical integration to (4); this computation of the maximum likelihood estimators $\hat\alpha$, $\hat\gamma$, $\hat\delta$ entails an enormous computational burden for sample sizes $m$ large enough to provide clear evidence about $\delta$, since the stable distribution has no nontrivial sufficient statistics or simple formulas for the MLEs.

Instead we follow the prescription of McCulloch (1986), based on the distributional quantiles $x^{\alpha}_p$ of the standard fully-skewed stable distribution of index $\alpha$, satisfying $P[X \le x^{\alpha}_p] = p$, and on the sample quantiles $\hat x_p$ from the data, for $p \in \{.05, .25, .50, .75, .95\}$:

Find the index $\alpha$ for which
$$ \hat\nu_\alpha \equiv \frac{\hat x_{.95} - \hat x_{.05}}{\hat x_{.75} - \hat x_{.25}} = \nu_\alpha \equiv \frac{x^{\alpha}_{.95} - x^{\alpha}_{.05}}{x^{\alpha}_{.75} - x^{\alpha}_{.25}}; $$
note that the quantity $\nu_\alpha$ depends only on the kurtosis (and hence on $\alpha$) but not on either scale or location (hence not on $\gamma$ or $\delta$).

For this $\alpha$, find the scale $\gamma = (\hat x_{.75} - \hat x_{.25})/(x^{\alpha}_{.75} - x^{\alpha}_{.25})$ for which
$$ \hat\nu_\gamma \equiv \frac{\hat x_{.75} - \hat x_{.25}}{\gamma} = \nu_\gamma \equiv x^{\alpha}_{.75} - x^{\alpha}_{.25}; $$
note that the interquartile range $\hat x_{.75} - \hat x_{.25}$ depends only on the kurtosis and scale (hence on $\alpha$ and $\gamma$) but not on location (hence not on $\delta$).

For this $\alpha$ and $\gamma$, find the location $\delta = \hat x_{.50} - \gamma\, x^{\alpha}_{.50}$ for which
$$ \hat\nu_\delta \equiv \frac{\delta - \hat x_{.50}}{\gamma} = \nu_\delta \equiv -x^{\alpha}_{.50}. $$

Thus $E[Y] = \delta - \beta\gamma\tan\tfrac{\pi\alpha}{2}$ may be estimated from the sample quantiles. A partial table of the necessary indices (along with $\zeta \equiv \nu_\delta + \tan\tfrac{\pi\alpha}{2}$) is given in McCulloch (1986).
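The three quantile-matching steps above lend themselves to a short program. The sketch below is an editorial illustration (not the paper's code): the standard quantiles $x^{\alpha}_p$ are obtained here by numerically inverting the distribution function in (3), the first step is resolved by a simple (and slow) grid search, and the grid range, quantile brackets, and function names are all assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def std_cdf(x, alpha):
    """Distribution function F_{alpha,1}(x) of the standard fully-skewed stable
    law, computed from the Fourier-inversion formula (3). (Editorial sketch.)"""
    t = np.tan(np.pi * alpha / 2.0)
    integrand = lambda w: np.exp(-w**alpha) * np.sin(w * x - t * (w**alpha - w)) / w
    val, _ = quad(integrand, 1e-12, np.inf, limit=200)  # avoid w = 0 exactly
    return 0.5 + val / np.pi

def std_quantile(alpha, p, lo=-30.0, hi=100.0):
    """Standard quantile x^alpha_p, found by inverting std_cdf on an assumed bracket."""
    return brentq(lambda x: std_cdf(x, alpha) - p, lo, hi)

def mcculloch_fit(y, alphas=np.linspace(1.05, 1.95, 19)):
    """Quantile-matching estimates of (alpha, gamma, delta) and of the mean
    E[Y] = delta - gamma*tan(pi*alpha/2), following the prescription above."""
    q = {p: np.quantile(y, p) for p in (.05, .25, .50, .75, .95)}
    nu_hat = (q[.95] - q[.05]) / (q[.75] - q[.25])

    # Step 1: index alpha from the ratio of quantile spreads (grid search).
    def nu_alpha(a):
        return ((std_quantile(a, .95) - std_quantile(a, .05))
                / (std_quantile(a, .75) - std_quantile(a, .25)))
    alpha = min(alphas, key=lambda a: abs(nu_alpha(a) - nu_hat))

    # Step 2: scale gamma from the interquartile range.
    gamma = (q[.75] - q[.25]) / (std_quantile(alpha, .75) - std_quantile(alpha, .25))

    # Step 3: location delta from the median.
    delta = q[.50] - gamma * std_quantile(alpha, .50)

    return alpha, gamma, delta, delta - gamma * np.tan(np.pi * alpha / 2)
```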

2.1 Example 1

Let $X_j \sim \mathrm{Ga}(a, \lambda)$ be independent draws from a Gamma distribution with shape parameter $a$ and rate parameter $\lambda$, and set $Y_j \equiv \exp(X_j)$; then $Y_j$ satisfies

$$ E[Y_j^p] = (1 - p/\lambda)^{-a} < \infty \quad\text{if } p < \lambda; $$
$$ P[Y_j > y] = P[\lambda X_j > \lambda\log y] = \Gamma(a, \lambda\log y)/\Gamma(a) \sim (\lambda\log y)^{a-1} \exp(-\lambda\log y)\,[1 + O(1/\lambda\log y)]/\Gamma(a) \approx k\, y^{-\lambda} \quad\text{as } y \to \infty, $$

where $\Gamma(a, x)$ denotes the incomplete Gamma function (Abramowitz and Stegun, 1964, 6.5.3). If $\lambda > 2$ then $Y_j$ has finite variance and lies in the normal domain of attraction, while for $\lambda < 2$ the limit is one-sided stable of index $\alpha = \lambda$.

2.2 Example 2

Now let $Z_j$ be independent normally-distributed random variables with mean $\mu \in \mathbb R$ and variance $V > 0$ (write $\sigma \equiv \sqrt V$), and set $Y_j = \exp(c\, Z_j^2)$. If $\mu = 0$ then $c\, Z_j^2 \sim cV\chi^2_1$ is Gamma distributed with $a = 1/2$ and $\lambda = 1/2cV$, so for $V > 1/4c$ the limiting distribution is again one-sided stable of index $\alpha = 1/2cV$; even for $\mu \ne 0$ the same limit follows from the calculation

$$ P[Y_j > y] = P\bigl[ |Z_j| > \sqrt{(\log y)/c}\, \bigr] \qquad \bigl(\text{set } \eta \equiv \sqrt{(\log y)/c}\,\bigr) $$
$$ = \Phi\Bigl( -\frac{\eta - \mu}{\sigma} \Bigr) + \Phi\Bigl( -\frac{\eta + \mu}{\sigma} \Bigr) \sim \frac{\sigma}{\sqrt{2\pi}} \Bigl[ \frac{\exp\bigl(-(\eta + \mu)^2/2V\bigr)}{\eta + \mu} + \frac{\exp\bigl(-(\eta - \mu)^2/2V\bigr)}{\eta - \mu} \Bigr] \approx k\, e^{-\eta^2/2V} = k\, y^{-1/2cV} \quad\text{as } y \to \infty, $$

where $\Phi(z)$ is the cumulative distribution function for the standard normal distribution (Abramowitz and Stegun, 1964, 26.2.3), so again $Y_j$ lies in the domain of attraction of the one-sided stable distribution of index $\alpha = 1/2cV$.

2.3 Example 3 (Main Example)

Now let $X_k \overset{iid}{\sim} \mathrm{No}(\theta, \sigma^2)$ with known variance $\sigma^2 > 0$ but uncertain mean $\theta$. Two models are entertained: $M_0$, under which $\theta \sim \mathrm{No}(\mu_0, \tau_0^2)$, and $M_1$, under which $\theta \sim \mathrm{No}(\mu_1, \tau_1^2)$ (the point null hypothesis with $\tau_0 = 0$ is included). Thus the joint and marginal densities for the sufficient statistic $\bar x$ from a vector of $n$ observations $\{X_k\}$ under model $m \in \{0, 1\}$ are:

$$ \pi_m(\theta, \bar x) = (2\pi\sigma^2/n)^{-1/2} (2\pi\tau_m^2)^{-1/2}\, e^{-n(\bar x - \theta)^2/2\sigma^2 - (\theta - \mu_m)^2/2\tau_m^2} $$
$$ f_m(\bar x) = [2\pi(\sigma^2/n + \tau_m^2)]^{-1/2}\, e^{-(\bar x - \mu_m)^2/2(\sigma^2/n + \tau_m^2)} $$

and the posterior probability of model $M_0$ and the posterior odds against $M_1$, under prior probabilities $\pi[M_0] = \pi_0$ and $\pi[M_1] = \pi_1$, are

$$ P[M_0 \mid \bar x] = \frac{\pi_0 f_0(\bar x)}{\pi_0 f_0(\bar x) + \pi_1 f_1(\bar x)} \qquad\qquad \frac{P[M_0 \mid \bar x]}{P[M_1 \mid \bar x]} = \frac{\pi_0}{\pi_1}\,\frac{f_0(\bar x)}{f_1(\bar x)}. $$

Thus the key for making inference and for computing the Bayes factor $B = f_0(\bar x)/f_1(\bar x)$ is computing the marginal density function at the observed data point $\bar x$,

$$ f(\bar x) = [2\pi(\sigma^2/n + \tau^2)]^{-1/2}\, e^{-(\bar x - \mu)^2/2(\sigma^2/n + \tau^2)}. $$
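Since the marginal density in this conjugate normal example is available in closed form, it makes a convenient test bed. The short sketch below (an editorial illustration; the numerical values are made up) evaluates $f_m(\bar x)$ and the Bayes factor directly from the formulas above.

```python
import numpy as np

def marginal_density(xbar, n, sigma2, mu, tau2):
    """Closed-form marginal density f_m(xbar) for the conjugate normal model:
    xbar | theta ~ No(theta, sigma2/n), theta ~ No(mu, tau2)."""
    v = sigma2 / n + tau2
    return np.exp(-(xbar - mu)**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

# Hypothetical numbers, chosen only to illustrate the computation:
n, sigma2, xbar = 25, 1.0, 0.4
f0 = marginal_density(xbar, n, sigma2, mu=0.0, tau2=0.0)   # point null, tau0 = 0
f1 = marginal_density(xbar, n, sigma2, mu=0.0, tau2=1.0)   # diffuse alternative
print("Bayes factor B = f0/f1 =", f0 / f1)
```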

At equilibrium any Markov chain Monte Carlo (MCMC) procedure will generate random variables $\theta_j \sim \pi(\theta \mid \bar x)$ from the posterior distribution $\pi(\theta \mid \bar x) = \pi(\theta, \bar x)/f(\bar x)$, so for any posterior-integrable function $g(\theta)$ the ergodic theorem ensures

$$ E[g(\theta) \mid \bar x] = \frac{\int g(\theta)\, f(\bar x \mid \theta)\, \pi(\theta)\, d\theta}{f(\bar x)} = \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} g(\theta_j). $$

Newton and Raftery (1994) reasoned that the posterior mean of the random variables $Y_j \equiv 1/f(\bar x \mid \theta_j)$ would be

$$ E\bigl[ 1/f(\bar x \mid \theta) \,\big|\, \bar x \bigr] = \frac{\int \pi(\theta)\, d\theta}{f(\bar x)} = \frac{1}{f(\bar x)}, $$

leading to the Harmonic Mean Estimator

$$ f(\bar x) = \lim_{n\to\infty} \frac{1}{\bar Y_n} = \lim_{n\to\infty} \frac{n}{\sum_{j=1}^{n} 1/f(\bar x \mid \theta_j)}. \qquad (6) $$

For any proper prior the limit in (6) converges almost surely. It is our goal to show that the convergence can be very slow. Indeed,

$$ Y_j = 1/f(\bar x \mid \theta_j) = (2\pi\sigma^2/n)^{1/2}\, e^{\,n(\bar x - \theta_j)^2/2\sigma^2} = (2\pi\sigma^2/n)^{1/2} \exp\Bigl( \frac{n}{2\sigma^2}(\theta_j - \bar x)^2 \Bigr) $$

is the same as Example 2 above, with $c = n/2\sigma^2$ and $V$ the conditional variance of $\theta_j$ given $\bar x$, $V = (n/\sigma^2 + 1/\tau^2)^{-1}$, so the limiting distribution is normal and the central limit theorem applies if

$$ \alpha = \frac{1}{2cV} = \frac{n/\sigma^2 + 1/\tau^2}{n/\sigma^2} = 1 + \frac{\sigma^2}{n\tau^2} $$

exceeds 2, i.e., if the prior variance $\tau^2$ is less than the sampling variance $\sigma^2/n$. Otherwise, if $\sigma^2/n < \tau^2$, then the limiting distribution of $S_n$ (the partial sums of the $Y_j$ along the MCMC stream) is one-sided stable with index $\alpha \in (1, 2)$. As the sample size $n$ increases, so that the data contain substantially more information than the prior, we are driven inexorably to the stable limit, with index $\alpha$ just slightly above one. Since $E[Y_j] = 1/f(\bar x)$, Lévy's limit theorem asserts that

$$ \frac{S_n - n/f(\bar x)}{n^{1/\alpha}} \Rightarrow Z \qquad (7) $$

converges in distribution to a stable law of index $\alpha = 1 + \sigma^2/n\tau^2$, $\beta = 1$, and mean zero, whence $S_n/n \approx 1/f(\bar x) + Z\, n^{1/\alpha - 1}$; evidently $S_n/n \to 1/f(\bar x)$ as $n \to \infty$, but the convergence is only at rate $n^{-\epsilon}$ for $\epsilon = 1 - 1/\alpha = (1 + n\tau^2/\sigma^2)^{-1}$, and moreover the errors have thick-tailed distributions with infinite moments of all orders $p \ge \alpha$.
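To see the slow convergence concretely, one can simulate Example 3 end to end: draw $\theta_j$ from the known posterior, form the harmonic-mean partial sums, and compare $n/S_n$ with the exact $f(\bar x)$. The sketch below is an editorial illustration with made-up parameter values, not code from the paper; for simplicity it uses independent posterior draws in place of an MCMC stream.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up settings for Example 3: known sigma2, prior No(mu, tau2), data mean xbar.
n_obs, sigma2, mu, tau2, xbar = 50, 1.0, 0.0, 1.0, 0.3

# Exact marginal density and posterior for the conjugate model.
v_marg = sigma2 / n_obs + tau2
f_exact = np.exp(-(xbar - mu)**2 / (2 * v_marg)) / np.sqrt(2 * np.pi * v_marg)
V_post = 1.0 / (n_obs / sigma2 + 1.0 / tau2)              # posterior variance
m_post = V_post * (n_obs * xbar / sigma2 + mu / tau2)     # posterior mean

# Tail index of Y_j = 1/f(xbar | theta_j): alpha = 1 + sigma2/(n_obs * tau2).
alpha = 1.0 + sigma2 / (n_obs * tau2)
print("stable index alpha =", alpha)

# "Ideal MCMC": independent draws from the posterior (a best case for mixing).
n_mcmc = 10**6
theta = rng.normal(m_post, np.sqrt(V_post), size=n_mcmc)
Y = np.sqrt(2 * np.pi * sigma2 / n_obs) * np.exp(n_obs * (xbar - theta)**2 / (2 * sigma2))

# Harmonic mean estimate of f(xbar) for growing stream lengths.
for m in (10**3, 10**4, 10**5, 10**6):
    print(m, m / Y[:m].sum(), "vs exact", f_exact)
```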

2.4 Example 4 (Bernstein/von Mises)

Under suitable regularity conditions every posterior distribution is asymptotically normally distributed, and every likelihood function is asymptotically normal, so the stable limiting behaviour of the preceding section can be expected in nearly all efforts to apply the Harmonic Mean Estimator to compute Bayes factors for large sample sizes and relatively vague prior information.

3 Improving the Estimate

Instead of estimating $1/f(\bar x) \approx S_n/n$ directly from ergodic averages, we may try to estimate the parameters $\alpha$, $\gamma_n$, $\delta_n$ for the fully-skewed stable $\bar Y_n = S_n/n$ using the quantile-based method of McCulloch (1986) or the recent Bayesian approach of [check citation], whereupon we can estimate

$$ 1/f(\bar x) = E[\bar Y_n] = \delta_n - \gamma_n \tan\tfrac{\pi\alpha}{2}. $$
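A minimal end-to-end sketch of this idea (an editorial assembly, not the paper's procedure): form batch means from non-overlapping blocks of the stream, which by the discussion in Section 2 are approximately independent one-sided stable variables, fit $(\alpha, \gamma, \delta)$ to the batch means by the quantile method (for instance with the mcculloch_fit routine sketched above), and report $\delta - \gamma\tan(\pi\alpha/2)$ as the estimate of $1/f(\bar x)$. The function names, the batching scheme, and the number of batches are all assumptions.

```python
import numpy as np

def stable_hme(Y, n_batches=200, fit=None):
    """Accelerated harmonic-mean estimate of f(xbar) from a stream Y_j = 1/f(xbar|theta_j).

    Editorial sketch: split the stream into batch means (approximately one-sided
    stable), fit (alpha, gamma, delta) by quantile matching, and use
    E[batch mean] = delta - gamma*tan(pi*alpha/2) as the estimate of 1/f(xbar).
    `fit` is assumed to behave like the mcculloch_fit sketch above, returning
    (alpha, gamma, delta, mean)."""
    Y = np.asarray(Y, dtype=float)
    batch = len(Y) // n_batches
    Z = Y[: batch * n_batches].reshape(n_batches, batch).mean(axis=1)
    alpha, gamma, delta, mean_est = fit(Z)
    return 1.0 / mean_est        # estimate of f(xbar)

# Usage (assuming Y from the Example 3 simulation and mcculloch_fit from above):
#     f_hat = stable_hme(Y, n_batches=200, fit=mcculloch_fit)
```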

References

Abramowitz, M. and Stegun, I. A., eds. (1964) Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables, vol. 55 of Applied Mathematics Series. Washington, D.C.: National Bureau of Standards.

Cheng, R. C. H. and Liu, W. B. (1997) A continuous representation of the family of stable law distributions. Journal of the Royal Statistical Society, Series B (Methodological), 59, 137-145.

Lévy, P. (1925) Calcul des Probabilités. Paris: Gauthier-Villars.

McCulloch, J. H. (1986) Simple consistent estimators of stable distribution parameters. Communications in Statistics (Part B) Simulation and Computation, 15, 1109-1136.

Newton, M. A. and Raftery, A. E. (1994) Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B (Methodological), 56, 3-48.