Markov Chain Monte Carlo

Markov Chain Monte Carlo

Recall: To compute the expectation $E(h(Y))$ we use the approximation
\[
E(h(Y)) \approx \frac{1}{n} \sum_{t=1}^{n} h(Y^{(t)}) \quad\text{with } Y^{(1)}, \ldots, Y^{(n)} \sim f(y).
\]
Thus our aim is to sample $Y^{(1)}, \ldots, Y^{(n)}$ from $f(y)$.

Problem: Independent sampling from $f(y)$ may be difficult.

Markov chain Monte Carlo (MCMC) approach:
- Generate a Markov chain $\{Y^{(t)}\}$ with stationary distribution $f(y)$.
- Early iterations $Y^{(1)}, \ldots, Y^{(m)}$ reflect the starting value $Y^{(0)}$. These iterations are called burn-in.
- After the burn-in, we say the chain has converged.
- Omit the burn-in from averages:
\[
\frac{1}{n-m} \sum_{t=m+1}^{n} h(Y^{(t)})
\]

[Figure: trace plot of a chain, with the early burn-in phase marked off from the subsequent stationary phase.]

How do we construct a Markov chain $\{Y^{(t)}\}$ which has stationary distribution $f(y)$?
- Gibbs sampler
- Metropolis-Hastings algorithm (Metropolis et al. 1953; Hastings 1970)
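As a minimal illustration of this recipe (not from the original slides; the chain, the starting value, and all constants are arbitrary choices), the following R sketch runs a Gaussian AR(1) chain, whose stationary distribution is $N(0, 1/(1-a^2))$, from a remote starting value and discards the burn-in before averaging:

set.seed(1)
n <- 5000; m <- 500          # chain length and burn-in (arbitrary choices)
a <- 0.9                     # AR coefficient; stationary variance is 1/(1-a^2)
Y <- numeric(n)
Y[1] <- 50                   # starting value far from the stationary region
for (t in 2:n) Y[t] <- a*Y[t-1] + rnorm(1)   # one step of the Markov chain
mean(Y[(m+1):n])             # estimate of E(Y) with the burn-in omitted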

Gibbs Sampler

Let $Y = (Y_1, \ldots, Y_d)$ be $d$-dimensional with $d \ge 2$ and distribution $f(y)$. The full conditional distribution of $Y_i$ given the other components is
\[
f(y_i \mid y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_d)
= \frac{f(y_1, \ldots, y_{i-1}, y_i, y_{i+1}, \ldots, y_d)}{\int f(y_1, \ldots, y_{i-1}, y_i, y_{i+1}, \ldots, y_d) \, dy_i}.
\]

Gibbs sampling: sample, or update, the components in turn:
\[
\begin{aligned}
Y_1^{(t+1)} &\sim f(y_1 \mid Y_2^{(t)}, Y_3^{(t)}, \ldots, Y_d^{(t)}) \\
Y_2^{(t+1)} &\sim f(y_2 \mid Y_1^{(t+1)}, Y_3^{(t)}, \ldots, Y_d^{(t)}) \\
Y_3^{(t+1)} &\sim f(y_3 \mid Y_1^{(t+1)}, Y_2^{(t+1)}, Y_4^{(t)}, \ldots, Y_d^{(t)}) \\
&\;\;\vdots \\
Y_d^{(t+1)} &\sim f(y_d \mid Y_1^{(t+1)}, Y_2^{(t+1)}, \ldots, Y_{d-1}^{(t+1)})
\end{aligned}
\]
Always use the most recent values (a generic code sketch follows below).

[Figure: in two dimensions, the sample path of the Gibbs sampler moves in axis-parallel steps between the successive points $Y^{(1)}, \ldots, Y^{(7)}$.]
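In code, one full sweep of the sampler can be written generically. The following R sketch assumes a hypothetical helper r.full.conditional(i, y) (not part of the slides) that draws from $f(y_i \mid y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_d)$:

gibbs.sweep <- function(y, r.full.conditional) {
  d <- length(y)
  for (i in 1:d) {
    # overwrite component i with a draw from its full conditional,
    # so that later components condition on the most recent values
    y[i] <- r.full.conditional(i, y)
  }
  y
}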

Gibbs Sampler

Detailed balance for the Gibbs sampler: For simplicity, let $Y = (Y_1, Y_2)^T$. Then the update at time $t+1$ is obtained from the previous $Y^{(t)}$ in two steps:
\[
Y_1^{(t+1)} \sim p(y_1 \mid Y_2^{(t)}), \qquad Y_2^{(t+1)} \sim p(y_2 \mid Y_1^{(t+1)}).
\]
Accordingly the transition matrix $P(y, y') = P(Y^{(t+1)} = y' \mid Y^{(t)} = y)$ can be factorized into two separate transition matrices,
\[
P(y, y') = P_1(y, \tilde y) \, P_2(\tilde y, y'),
\]
where $\tilde y = (y_1', y_2)^T$ is the intermediate result after the first step. Obviously we have $P_1(y, \tilde y) = p(y_1' \mid y_2)$ and $P_2(\tilde y, y') = p(y_2' \mid y_1')$. Note that for any $y, y'$, we have $P_1(y, y') = 0$ if $y_2 \neq y_2'$ and $P_2(y, y') = 0$ if $y_1 \neq y_1'$.

According to the detailed balance condition for time-dependent Markov chains, it suffices to show detailed balance for each of the two transition matrices. For any states $y, y'$ such that $y_2 = y_2'$,
\[
p(y) P_1(y, y') = p(y_1, y_2)\, p(y_1' \mid y_2)
= p(y_1 \mid y_2)\, p(y_2)\, p(y_1' \mid y_2)
= p(y_1 \mid y_2)\, p(y_1', y_2)
= P_1(y', y)\, p(y'),
\]
while for $y, y'$ with $y_2 \neq y_2'$ the equation is trivially fulfilled (both sides are zero). Similarly we obtain for $y, y'$ such that $y_1 = y_1'$
\[
p(y) P_2(y, y') = p(y_1, y_2)\, p(y_2' \mid y_1)
= p(y_2 \mid y_1)\, p(y_1)\, p(y_2' \mid y_1)
= p(y_2 \mid y_1)\, p(y_1, y_2')
= P_2(y', y)\, p(y'),
\]
while for $y, y'$ with $y_1 \neq y_1'$ the equation trivially holds. Altogether this shows that $p(y)$ is indeed the stationary distribution of the Gibbs sampler.

Note that for the combined kernel we get
\[
p(y) P(y, y') = p(y) P_1(y, \tilde y) P_2(\tilde y, y')
= p(y') P_2(y', \tilde y) P_1(\tilde y, y)
\neq p(y') P(y', y)
\]
in general, so the two-step kernel itself does not satisfy detailed balance.

Explanation: Markov chains $\{Y^{(t)}\}$ which satisfy the detailed balance equation are called time-reversible, since it can be shown that $P(Y^{(t+1)} = y' \mid Y^{(t)} = y) = P(Y^{(t)} = y' \mid Y^{(t+1)} = y)$. For the above Gibbs sampler, to go back in time we have to update the two components in reverse order: first $Y_2$ and then $Y_1$.
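For a discrete toy distribution these identities can be checked numerically. The following R sketch (the joint pmf is an arbitrary choice, not from the slides) builds the two transition matrices $P_1$ and $P_2$ for a distribution on a $2 \times 2$ grid and verifies that each satisfies detailed balance, that $p$ is stationary for the combined kernel $P = P_1 P_2$, and that $P$ itself is not reversible:

p <- matrix(c(0.1, 0.2, 0.3, 0.4), 2, 2)     # toy joint pmf p[y1, y2]
states <- expand.grid(y1 = 1:2, y2 = 1:2)    # the four states of (Y1, Y2)
P1 <- matrix(0, 4, 4); P2 <- matrix(0, 4, 4)
for (i in 1:4) for (j in 1:4) {
  yi <- states[i, ]; yj <- states[j, ]
  if (yi$y2 == yj$y2)                        # P1 updates y1, keeps y2 fixed
    P1[i, j] <- p[yj$y1, yj$y2] / sum(p[, yj$y2])
  if (yi$y1 == yj$y1)                        # P2 updates y2, keeps y1 fixed
    P2[i, j] <- p[yj$y1, yj$y2] / sum(p[yj$y1, ])
}
P  <- P1 %*% P2                              # one full Gibbs sweep
pv <- p[cbind(states$y1, states$y2)]         # p as a vector over the states
max(abs(pv %*% P - pv))                      # ~ 0: p is stationary for P
isSymmetric(diag(pv) %*% P1)                 # TRUE: P1 satisfies detailed balance
isSymmetric(diag(pv) %*% P2)                 # TRUE: P2 satisfies detailed balance
isSymmetric(diag(pv) %*% P)                  # FALSE: combined kernel is not reversible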

Gibbs Sampler

Example: Bayesian inference for a univariate normal sample.

Consider normally distributed observations $Y = (Y_1, \ldots, Y_n)^T$ with
\[
Y_i \overset{iid}{\sim} N(\mu, \sigma^2).
\]
Likelihood function:
\[
f(Y \mid \mu, \sigma^2) \propto \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2\right)
\]
Prior distribution (noninformative prior):
\[
\pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2}
\]
Posterior distribution:
\[
\pi(\mu, \sigma^2 \mid Y) \propto \left(\frac{1}{\sigma^2}\right)^{n/2+1} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2\right)
\]
Define $\tau = 1/\sigma^2$. Then we can show that
\[
\pi(\mu \mid \sigma^2, Y) = N\!\left(\bar Y, \sigma^2/n\right), \qquad
\pi(\tau \mid \mu, Y) = \Gamma\!\left(\frac{n}{2},\; \frac{1}{2} \sum_{i=1}^{n} (Y_i - \mu)^2\right).
\]
Gibbs sampler:
\[
\mu^{(t+1)} \sim N\!\left(\bar Y, (n\,\tau^{(t)})^{-1}\right), \qquad
\tau^{(t+1)} \sim \Gamma\!\left(\frac{n}{2},\; \frac{1}{2} \sum_{i=1}^{n} (Y_i - \mu^{(t+1)})^2\right)
\]
with $\sigma^{2\,(t+1)} = 1/\tau^{(t+1)}$.

Gibbs Sampler

Implementation in R (some numeric constants were garbled in the transcript; the values below are plausible reconstructions):

n <- 20                          # sample size (reconstructed value)
Y <- rnorm(n, 0, 2)              # simulated normal data (reconstructed parameters)
MC <- 2; N <- 1000               # run MC = 2 chains of length N = 1000
p <- rep(0, 2*MC*N)              # allocate memory for results
dim(p) <- c(2, MC, N)
for (j in (1:MC)) {              # loop over chains
  p2 <- rgamma(1, n/2, 1/2)      # starting value for tau
  for (i in (1:N)) {             # Gibbs iterations
    p1 <- rnorm(1, mean(Y), sqrt(1/(p2*n)))   # update mu
    p2 <- rgamma(1, n/2, sum((Y-p1)^2)/2)     # update tau
    p[1,j,i] <- p1               # save results
    p[2,j,i] <- p2
  }
}

Results: Bayesian inference for a univariate normal sample.

[Figure: two runs of the Gibbs sampler: trace plots and autocorrelation functions (ACF) for µ and τ.]

[Figure: marginal and joint posterior distributions of µ and τ, based on the MCMC draws.]

Markov Chain Monte Carlo

Example: Bivariate normal distribution.

Let $Y = (Y_1, Y_2)^T$ be normally distributed with mean $\mu = (0, 0)^T$ and covariance matrix
\[
\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.
\]
The conditional distributions are
\[
Y_1 \mid Y_2 \sim N(\rho\, Y_2,\; 1 - \rho^2), \qquad
Y_2 \mid Y_1 \sim N(\rho\, Y_1,\; 1 - \rho^2).
\]
Thus the steps of the Gibbs sampler are
\[
Y_1^{(t+1)} \sim N(\rho\, Y_2^{(t)},\; 1 - \rho^2), \qquad
Y_2^{(t+1)} \sim N(\rho\, Y_1^{(t+1)},\; 1 - \rho^2).
\]
Note: We can obtain an independent sample $Y = (Y_1, Y_2)^T$ by
\[
Y_1 \sim N(0, 1), \qquad
Y_2 \mid Y_1 \sim N(\rho\, Y_1,\; 1 - \rho^2).
\]
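A minimal R sketch of both samplers (the correlation, chain length, and starting value are arbitrary choices); note that rnorm expects a standard deviation, so the conditional variance $1 - \rho^2$ enters as sqrt(1 - rho^2):

rho <- 0.9; N <- 1000                        # correlation and sample size (arbitrary)
Y <- matrix(0, N, 2)                         # storage for the Gibbs draws
y2 <- 0                                      # starting value for Y2
for (t in 1:N) {
  y1 <- rnorm(1, rho*y2, sqrt(1 - rho^2))    # draw Y1 | Y2
  y2 <- rnorm(1, rho*y1, sqrt(1 - rho^2))    # draw Y2 | Y1
  Y[t, ] <- c(y1, y2)
}
y1 <- rnorm(N)                               # independent sample: Y1 ~ N(0, 1)
y2 <- rnorm(N, rho*y1, sqrt(1 - rho^2))      # then Y2 | Y1
Yind <- cbind(y1, y2)                        # independent draws for comparison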

Markov Chain Monte Carlo

Comparison of MCMC and independent draws:

[Figure: scatter plots of successive blocks of Gibbs-sampler iterations (including the burn-in) next to independent samples of increasing size; as the number of iterations grows, the MCMC draws fill out the target distribution just as the independent samples do.]

Markov Chain Monte Carlo

Convergence diagnostics:
- Plot the chain for each quantity of interest.
- Plot the autocorrelation function (ACF)
\[
\rho_i(h) = \operatorname{corr}\big(Y_i^{(t)}, Y_i^{(t+h)}\big),
\]
which measures the correlation of values $h$ lags apart. Slow decay of the ACF indicates slow convergence and bad mixing. The ACF can also be used to find an (approximately) independent subsample.
- Run multiple, independent chains (e.g. 3-10):
  - several long runs (Gelman and Rubin 1992) give an indication of convergence and a sense of statistical security;
  - one very long run (Geyer 1992) reaches parts other schemes cannot reach.
  Widely dispersed starting values are particularly helpful to detect slow convergence.

[Figure: trace plot and ACF of µ1 across iterations, illustrating convergence.]

If not satisfied, try some other diagnostics (see the literature); a short sketch of the standard diagnostics in R follows below.
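A minimal sketch of these diagnostics in R, applied to the array p from the normal example above; the coda package is one standard choice (its use here is an assumption, any comparable package will do):

library(coda)                          # MCMC diagnostics (assumed available)
mu.chains <- mcmc.list(lapply(1:MC, function(j) mcmc(p[1, j, ])))
plot(mu.chains)                        # trace and density plots for each chain
acf(p[1, 1, ])                         # autocorrelation function of mu, chain 1
gelman.diag(mu.chains)                 # Gelman-Rubin diagnostic; values near 1 indicate convergence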

Markov Chain Monte Carlo

Note: Even after the chain has reached convergence, it might not yet be good enough for estimating $E(h(Y))$.

[Figure: trace plot of a converged but slowly moving chain over its first 1000 iterations.]

Problem: The chain should show good mixing (frequent transitions between states), so run the chain for a longer period.

[Figure: the same chain over many more iterations, together with the resulting histogram: with a longer run, the chain traverses the whole state space.]

Monte Carlo error

Suppose we want to estimate $E(h(Y))$ by
\[
\hat h = \frac{1}{N} \sum_{t=1}^{N} h(Y^{(t)}) \quad\text{with } Y^{(t)} \sim f(y).
\]
The error of the approximation (the Monte Carlo error) is $\operatorname{var}(\hat h)$.

Estimation of the Monte Carlo error: Let $\{Y^{(i,t)}\}$ be $I$ independent Markov chains. Then $\operatorname{var}(\hat h)$ can be estimated by
\[
\frac{1}{I(I-1)} \sum_{i=1}^{I} \big(\hat h^{(i)} - \hat h\big)^2,
\]
where $\hat h^{(i)}$ is the MCMC estimate based on the $i$th chain and $\hat h$ is the average of the $\hat h^{(i)}$ (the overall estimate).
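A minimal sketch of this estimator in R, reusing the array p and the MC chains from the normal example above to estimate the Monte Carlo error of the posterior-mean estimate of µ; with only a handful of chains the variance estimate is itself crude, so in practice one would use more or longer chains:

h.i   <- sapply(1:MC, function(j) mean(p[1, j, ]))  # per-chain estimates h^(i)
h.bar <- mean(h.i)                                  # overall estimate
mc.var <- sum((h.i - h.bar)^2) / (MC*(MC - 1))      # estimated var(h.bar)
sqrt(mc.var)                                        # Monte Carlo standard error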