
MONTE CARLO METHODS

Hedibert Freitas Lopes
The University of Chicago Booth School of Business
5807 South Woodlawn Avenue, Chicago, IL 60637
http://faculty.chicagobooth.edu/hedibert.lopes
hlopes@chicagobooth.edu

Outline
1 A bit of history
2 Monte Carlo integration
3 Monte Carlo via importance sampling
4 Rejection sampling
5 Sampling importance resampling (SIR)
6 Metropolis-Hastings algorithms
7 Simulated annealing
8 Gibbs sampler

A bit of history. In the 1940s and 1950s, Stan Ulam realized that computers could be used to answer questions of neutron diffusion and mathematical physics. He contacted John von Neumann, and together they developed many algorithms (importance sampling, rejection sampling, etc.). In the 1940s, Nick Metropolis and Klari von Neumann designed new controls for the state-of-the-art computer of the day (ENIAC). Metropolis and Ulam (1949) The Monte Carlo method. Journal of the American Statistical Association. Metropolis et al. (1953) Equation of state calculations by fast computing machines. Journal of Chemical Physics.

We introduce several Monte Carlo methods for integrating and/or sampling from nontrivial densities:
Simple Monte Carlo integration
Monte Carlo via importance sampling (IS)
Rejection sampling
Sampling importance resampling (SIR)
Iterative sampling: Metropolis-Hastings algorithms, simulated annealing and the Gibbs sampler
Based on the book by Gamerman and Lopes (2006).

A few references:
Importance sampling (Geweke, 1989)
Rejection sampling (Gilks and Wild, 1992)
SIR (Smith and Gelfand, 1992)
Metropolis-Hastings algorithm (Hastings, 1970)
Simulated annealing (Metropolis et al., 1953)
Gibbs sampler (Gelfand and Smith, 1990)

Two main tasks:
1 Compute high-dimensional integrals: $E_\pi[h(\theta)] = \int h(\theta)\pi(\theta)\,d\theta$.
2 Obtain a sample $\{\theta_1,\ldots,\theta_n\}$ from $\pi(\theta)$ when only a sample $\{\tilde\theta_1,\ldots,\tilde\theta_m\}$ from $q(\theta)$ is available. Here $q(\theta)$ is known as the proposal/auxiliary density.

Bayes via Monte Carlo. Monte Carlo methods appear frequently, but not exclusively, in modern Bayesian statistics. Posterior and predictive densities are hard to sample from:
Posterior: $\pi(\theta) = f(x \mid \theta)\,p(\theta)/f(x)$
Predictive: $f(x) = \int f(x \mid \theta)\,p(\theta)\,d\theta$
Other important integrals and/or functionals of the posterior and predictive densities are:
Posterior modes: $\max_\theta \pi(\theta)$;
Posterior moments: $E_\pi[g(\theta)]$;
Density estimation: $\hat\pi(g(\theta))$;
Bayes factors: $f(x \mid M_0)/f(x \mid M_1)$;
Decisions: $\max_d \int U(d,\theta)\pi(\theta)\,d\theta$.

The objective here is to compute moments $E_\pi[h(\theta)] = \int h(\theta)\pi(\theta)\,d\theta$. If $\theta_1,\ldots,\theta_n$ is a random sample from $\pi(\cdot)$, then $\bar{h}_{mc} = \frac{1}{n}\sum_{i=1}^n h(\theta_i) \rightarrow E_\pi[h(\theta)]$ as $n \rightarrow \infty$. If, additionally, $E_\pi[h^2(\theta)] < \infty$, then $V_\pi[\bar{h}_{mc}] = \frac{1}{n}\int \{h(\theta) - E_\pi[h(\theta)]\}^2 \pi(\theta)\,d\theta$, and $v_{mc} = \frac{1}{n^2}\sum_{i=1}^n (h(\theta_i) - \bar{h}_{mc})^2$ estimates $V_\pi[\bar{h}_{mc}]$ consistently as $n \rightarrow \infty$.

Example i. The objective here is to compute $p = \int_0^1 [\cos(50\theta) + \sin(20\theta)]^2\,d\theta$, by noticing that this integral can be rewritten as $E_\pi[h(\theta)] = \int h(\theta)\pi(\theta)\,d\theta$, where $h(\theta) = [\cos(50\theta) + \sin(20\theta)]^2$ and $\pi(\theta) = 1$ is the density of a $U(0,1)$. Therefore $\hat{p} = \frac{1}{n}\sum_{i=1}^n h(\theta_i)$, where $\theta_1,\ldots,\theta_n$ are i.i.d. $U(0,1)$. The true value is approximately 0.965.
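A minimal Python sketch of this estimator (not part of the original slides), assuming NumPy; the sample size and seed are arbitrary, and the Monte Carlo standard error from the previous slide is included:

import numpy as np

rng = np.random.default_rng(42)
n = 100_000
theta = rng.uniform(0.0, 1.0, size=n)             # theta_1, ..., theta_n i.i.d. U(0,1)
h = (np.cos(50*theta) + np.sin(20*theta))**2      # h(theta_i)
p_hat = h.mean()                                  # Monte Carlo estimate of the integral
se = h.std(ddof=1) / np.sqrt(n)                   # estimated Monte Carlo standard error
print(p_hat, se)                                  # p_hat should be close to 0.965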

[Figure: left, the integrand h(θ) for θ ∈ (0,1); right, the estimate p̂ against sample size (log10 scale).]

The objective is still the same, i.e. to compute $E_\pi[h(\theta)] = \int h(\theta)\pi(\theta)\,d\theta$, by noticing that $E_\pi[h(\theta)] = \int \frac{h(\theta)\pi(\theta)}{q(\theta)}\,q(\theta)\,d\theta$, where $q(\cdot)$ is an importance function.

If $\theta_1,\ldots,\theta_n$ is a random sample from $q(\cdot)$, then $\bar{h}_{is} = \frac{1}{n}\sum_{i=1}^n \frac{h(\theta_i)\pi(\theta_i)}{q(\theta_i)} \rightarrow E_\pi[h(\theta)]$ as $n \rightarrow \infty$. Ideally, $q(\cdot)$ should be as close as possible to $h(\cdot)\pi(\cdot)$, and easy to sample from.
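A minimal sketch of the importance sampling estimator $\bar{h}_{is}$, assuming NumPy and SciPy; the function name importance_sampling and the Student-t illustration below are illustrative choices, not from the slides:

import numpy as np
from scipy import stats

def importance_sampling(h, pi_pdf, q_pdf, q_sample, n, seed=None):
    """Estimate E_pi[h(theta)] using draws from the importance function q."""
    rng = np.random.default_rng(seed)
    theta = q_sample(n, rng)                  # theta_1, ..., theta_n ~ q
    w = pi_pdf(theta) / q_pdf(theta)          # importance weights pi(theta_i)/q(theta_i)
    return np.mean(h(theta) * w)

# Hypothetical usage: E[theta^2] under a N(0,1) target, Student-t(3) importance function.
est = importance_sampling(
    h=lambda t: t**2,
    pi_pdf=stats.norm.pdf,
    q_pdf=lambda t: stats.t.pdf(t, df=3),
    q_sample=lambda n, rng: stats.t.rvs(df=3, size=n, random_state=rng),
    n=100_000,
    seed=0,
)
print(est)   # should be close to 1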

Example ii. The objective here is to estimate $p = \Pr(\theta > 2) = \int_2^\infty \frac{1}{\pi(1+\theta^2)}\,d\theta = 0.1475836$, where $\theta$ is a standard Cauchy random variable. A natural estimator of $p$ is $\hat{p}_1 = \frac{1}{n}\sum_{i=1}^n I\{\theta_i \in (2,\infty)\}$, where $\theta_1,\ldots,\theta_n \sim \text{Cauchy}(0,1)$.

A more elaborate estimator, based on the change of variables $u = 1/\theta$, is $\hat{p}_2 = \frac{1}{n}\sum_{i=1}^n \frac{1}{2\pi[1+u_i^2]}$, where $u_1,\ldots,u_n \sim U(0,1/2)$.

The true value is p = 0.147584.

n          p̂_1        p̂_2          v_1^{1/2}   v_2^{1/2}
100        0.100000    0.1467304    0.030000    0.001004
1000       0.137000    0.1475540    0.010873    0.000305
10000      0.148500    0.1477151    0.003556    0.000098
100000     0.149100    0.1475591    0.001126    0.000031
1000000    0.147711    0.1475870    0.000355    0.000010

With only n = 1,000 draws, $\hat{p}_2$ has roughly the same precision as $\hat{p}_1$ based on 1000n = 1,000,000 draws, i.e. three orders of magnitude more.
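A short Python sketch comparing the two estimators (an illustration, not the code behind the table), assuming NumPy:

import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# p_hat_1: indicator estimator, theta_i ~ Cauchy(0,1)
theta = rng.standard_cauchy(n)
p1 = np.mean(theta > 2)

# p_hat_2: change-of-variables estimator, u_i ~ U(0, 1/2)
u = rng.uniform(0.0, 0.5, size=n)
p2 = np.mean(1.0 / (2*np.pi*(1 + u**2)))

print(p1, p2)   # both should be near 0.147584, with p2 far less variable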

The objective is to draw from a target density $\pi(\theta) = c_\pi \tilde{\pi}(\theta)$ when only draws from an auxiliary density $q(\theta) = c_q \tilde{q}(\theta)$ are available, for normalizing constants $c_\pi$ and $c_q$. If there exists a constant $A < \infty$ such that $0 \le \frac{\tilde{\pi}(\theta)}{A\,\tilde{q}(\theta)} \le 1$ for all $\theta$, then $q(\theta)$ becomes a blanketing density, or envelope, and $A$ the envelope constant.

[Figures: the blanketing density A·q(θ) enveloping the target π(θ); a bad draw with π/(Aq) = 0.41; a good draw with π/(Aq) = 0.89; and the acceptance probability π(θ)/(Aq(θ)) as a function of θ.]

Algorithm (drawing from $\pi(\theta)$):
1 Draw $\theta^*$ from $q(\cdot)$;
2 Draw $u$ from $U(0,1)$;
3 Accept $\theta^*$ if $u \le \frac{\tilde{\pi}(\theta^*)}{A\,\tilde{q}(\theta^*)}$;
4 Repeat steps 1, 2 and 3 until $n$ draws are accepted.
The normalizing constants $c_\pi$ and $c_q$ are not needed. The theoretical acceptance rate is $c_q/(A c_\pi)$: the smaller the $A$, the larger the acceptance rate.
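A minimal Python sketch of these four steps, assuming NumPy; the function name accept_reject is an illustrative choice, and pi_tilde and q_tilde may be unnormalized as long as A satisfies pi_tilde(theta) <= A*q_tilde(theta) for all theta:

import numpy as np

def accept_reject(pi_tilde, q_tilde, q_sample, A, n, seed=None):
    """Return n draws from the target via the accept-reject steps above."""
    rng = np.random.default_rng(seed)
    draws = []
    while len(draws) < n:
        theta = q_sample(rng)                              # step 1: theta* ~ q
        u = rng.uniform()                                  # step 2: u ~ U(0,1)
        if u <= pi_tilde(theta) / (A * q_tilde(theta)):    # step 3: accept theta*?
            draws.append(theta)
    return np.array(draws)                                 # step 4: stop at n accepted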

Example iii. Enveloping the standard normal density $\pi(\theta) = \frac{1}{\sqrt{2\pi}}\exp\{-0.5\theta^2\}$ by a Cauchy density $q_C(\theta) = 1/(\pi(1+\theta^2))$, or a uniform density $q_U(\theta) = 0.05$ for $\theta \in (-10, 10)$.
Bad proposal: the maximum of $\pi(\theta)/q_U(\theta)$ is roughly $A_U = 7.98$ for $\theta \in (-10,10)$; the theoretical acceptance rate is 12.53%.
Good proposal: the maximum of $\pi(\theta)/q_C(\theta)$ equals $A_C = \sqrt{2\pi/e} \approx 1.53$; the theoretical acceptance rate is 65.35%.

[Figure: accepted draws under the uniform proposal (left) and the Cauchy proposal (right).]
Empirical rates: 0.1265 (Uniform) and 0.6483 (Cauchy).
Theoretical rates: 0.1253 (Uniform) and 0.6535 (Cauchy).
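A vectorized Python check of this example (an illustration, assuming NumPy and SciPy); it proposes a large batch from each density and reports the empirical acceptance rates, which should land near the theoretical rates above:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100_000
target_pdf = stats.norm.pdf                        # standard normal target

# Uniform(-10, 10) proposal, envelope constant A_U = pi(0)/0.05 ~ 7.98
theta_u = rng.uniform(-10, 10, size=n)
A_U = stats.norm.pdf(0.0) / 0.05
acc_u = rng.uniform(size=n) <= target_pdf(theta_u) / (A_U * 0.05)

# Cauchy(0, 1) proposal, envelope constant A_C = sqrt(2*pi/e)
theta_c = rng.standard_cauchy(size=n)
A_C = np.sqrt(2*np.pi/np.e)
acc_c = rng.uniform(size=n) <= target_pdf(theta_c) / (A_C * stats.cauchy.pdf(theta_c))

print(acc_u.mean(), acc_c.mean())                  # roughly 0.13 and 0.66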

Sampling importance resampling (SIR): no need to rely on the existence of A!
Algorithm:
1 Draw $\theta_1^*,\ldots,\theta_n^*$ from $q(\cdot)$;
2 Compute (unnormalized) weights $\omega_i = \tilde{\pi}(\theta_i^*)/\tilde{q}(\theta_i^*)$, $i = 1,\ldots,n$;
3 Sample $\theta$ from $\{\theta_1^*,\ldots,\theta_n^*\}$ such that $\Pr(\theta = \theta_i^*) \propto \omega_i$, $i = 1,\ldots,n$;
4 Repeat step 3 $m$ times.
Rule of thumb: $n/m = 20$. Ideally, $\omega_i = 1/n$ and $\text{Var}(\omega) = 0$.
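A compact Python sketch of these steps, assuming NumPy; the name sir and its interface are illustrative choices, not from the slides:

import numpy as np

def sir(pi_tilde, q_tilde, q_sample, n, m, seed=None):
    """Sampling importance resampling: m approximate draws from the target."""
    rng = np.random.default_rng(seed)
    theta = q_sample(n, rng)                           # step 1: theta*_1, ..., theta*_n ~ q
    w = pi_tilde(theta) / q_tilde(theta)               # step 2: unnormalized weights
    p = w / w.sum()                                    # resampling probabilities
    idx = rng.choice(n, size=m, replace=True, p=p)     # steps 3-4: resample m times
    return theta[idx], w

# Rule of thumb from the slide: n/m = 20, e.g. n = 20_000 proposals for m = 1_000 draws.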

Example iii revisited (SIR).
[Figure: SIR draws under the uniform proposal (left) and the Cauchy proposal (right).]
Fraction of redraws: 0.391 (Uniform) and 0.1335 (Cauchy).
Variance of weights: 4.675 (Uniform) and 0.332 (Cauchy).

Example iv. Assume that we are interested in sampling from $\pi(\theta) = \alpha_1 p_N(\theta; \mu_1, \Sigma_1) + \alpha_2 p_N(\theta; \mu_2, \Sigma_2) + \alpha_3 p_N(\theta; \mu_3, \Sigma_3)$, where $p_N(\cdot; \mu, \Sigma)$ is the density of a bivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. The mean vectors are $\mu_1 = (1, 4)$, $\mu_2 = (4, 2)$ and $\mu_3 = (6.5, 2)$; the covariance matrices are $\Sigma_1 = \begin{pmatrix} 1.0 & 0.9 \\ 0.9 & 1.0 \end{pmatrix}$ and $\Sigma_2 = \Sigma_3 = \begin{pmatrix} 1.0 & 0.5 \\ 0.5 & 1.0 \end{pmatrix}$; the weights are $\alpha_1 = \alpha_2 = \alpha_3 = 1/3$.

[Figures: contour and perspective plots of the target density π(θ).]

Proposal: $q(\theta) \equiv N(\mu, \Sigma)$ with $\mu = \mu_2 = (4, 2)$ and $\Sigma = 9\begin{pmatrix} 1.0 & 0.25 \\ 0.25 & 1.0 \end{pmatrix}$.
[Figure: contour plot of the proposal density q(θ).]

[Figure: rejection-sampling draws plotted over the target contours.]
Acceptance rate: 9.91% of n = 10,000 draws.
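A Python sketch of rejection sampling for this mixture, assuming NumPy and SciPy (an illustration, not the code behind the slides). The envelope constant is not reported, so here A is approximated numerically on a grid with a small safety margin; since the acceptance rate depends directly on how tight A is, it need not reproduce the 9.91% above:

import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(3)
mus = [np.array([1.0, 4.0]), np.array([4.0, 2.0]), np.array([6.5, 2.0])]
S1 = np.array([[1.0, 0.9], [0.9, 1.0]])
S2 = np.array([[1.0, 0.5], [0.5, 1.0]])
covs = [S1, S2, S2]
alphas = [1/3, 1/3, 1/3]

def target_pdf(x):
    """Mixture of three bivariate normals, evaluated at the rows of x."""
    return sum(a * mvn(m, S).pdf(x) for a, m, S in zip(alphas, mus, covs))

q = mvn(mean=[4.0, 2.0], cov=9*np.array([[1.0, 0.25], [0.25, 1.0]]))

# Envelope constant A, approximated by maximizing the ratio over a grid
g1, g2 = np.meshgrid(np.linspace(-5, 15, 300), np.linspace(-10, 15, 300))
grid = np.column_stack([g1.ravel(), g2.ravel()])
A = 1.05 * np.max(target_pdf(grid) / q.pdf(grid))   # 5% safety margin

n = 10_000
proposals = q.rvs(size=n, random_state=rng)
accept = rng.uniform(size=n) <= target_pdf(proposals) / (A * q.pdf(proposals))
draws = proposals[accept]
print(accept.mean())   # empirical acceptance rate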

[Figure: SIR draws plotted over the target contours.]
Fraction of redraws: 29.45% of (n = 10,000, m = 2,000).

Rejection sampling & SIR.
[Figure: rejection-sampling draws (left) and SIR draws (right) from the target π(θ).]

Example v. Let us now assume that $\pi(\theta) = \alpha_1 p_N(\theta; \mu_1, \Sigma_1) + \alpha_3 p_N(\theta; \mu_3, \Sigma_3)$, where the mean vectors are $\mu_1 = (1, 4)$ and $\mu_3 = (6.5, 2)$, the covariance matrices are $\Sigma_1 = \begin{pmatrix} 1.0 & 0.9 \\ 0.9 & 1.0 \end{pmatrix}$ and $\Sigma_3 = \begin{pmatrix} 1.0 & 0.5 \\ 0.5 & 1.0 \end{pmatrix}$, and the weights are $\alpha_1 = 1/3$ and $\alpha_3 = 2/3$.

[Figures: contour and perspective plots of the target density π(θ), and a contour plot of the proposal q(θ).]

[Figure: rejection-sampling draws plotted over the target contours.]
Acceptance rate: 10.1% of n = 10,000 draws.

[Figure: SIR draws plotted over the target contours.]
Fraction of redraws: 37.15% of (n = 10,000, m = 2,000).

Rejection sampling & SIR.
[Figure: rejection-sampling draws (left) and SIR draws (right) from the target π(θ).]

1 Metropolis and Ulam (1949) The Monte Carlo method. JASA, 44, 335-341.
2 Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953) Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087-1092.
3 Hastings (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97-109.
4 Peskun (1973) Optimum Monte-Carlo sampling using Markov chains. Biometrika, 60, 607-612.
5 Besag (1974) Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36, 192-236.
6 Geman and Geman (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
7 Jennison (1993) Discussion of the meeting on Gibbs sampling and other Markov chain Monte Carlo methods. JRSS-B, 55, 54-56.
8 Kirkpatrick, Gelatt and Vecchi (1983) Optimization by simulated annealing. Science, 220, 671-680.
9 Pearl (1987) Evidential reasoning using stochastic simulation. Artificial Intelligence, 32, 245-257.
10 Tanner and Wong (1987) The calculation of posterior distributions by data augmentation. JASA, 82, 528-550.
11 Geweke (1989) Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57, 1317-1339.
12 Gelfand and Smith (1990) Sampling-based approaches to calculating marginal densities. JASA, 85, 398-409.
13 Casella and George (1992) Explaining the Gibbs sampler. The American Statistician, 46, 167-174.
14 Smith and Gelfand (1992) Bayesian statistics without tears: a sampling-resampling perspective. The American Statistician, 46, 84-88.
15 Gilks and Wild (1992) Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337-348.
16 Chib and Greenberg (1995) Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327-335.
17 Dongarra and Sullivan (2000) Guest editors' introduction: The top 10 algorithms. Computing in Science and Engineering, 2, 22-23.