Sequential Monte Carlo Methods

Similar documents
Sequential Monte Carlo Methods

Sequential Monte Carlo Methods

Sequential Monte Carlo Methods (for DSGE Models)

Bayesian Computations for DSGE Models

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Sequential Monte Carlo Methods (for DSGE Models)

Sequential Monte Carlo Sampling for DSGE Models

NBER WORKING PAPER SERIES SEQUENTIAL MONTE CARLO SAMPLING FOR DSGE MODELS. Edward P. Herbst Frank Schorfheide

WORKING PAPER NO SEQUENTIAL MONTE CARLO SAMPLING FOR DSGE MODELS. Edward Herbst Federal Reserve Board

Computational statistics

Sequential Monte Carlo samplers for Bayesian DSGE models

Sequential Monte Carlo Samplers for Applications in High Dimensions

The Metropolis-Hastings Algorithm. June 8, 2012

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

Estimation of DSGE Models under Diffuse Priors and Data- Driven Identification Constraints. Markku Lanne and Jani Luoto

Sequential Monte Carlo samplers for Bayesian DSGE models

arxiv: v1 [stat.co] 23 Apr 2018

Introduction to Bayesian Inference

17 : Markov Chain Monte Carlo

MCMC algorithms for fitting Bayesian models

Bayesian Estimation with Sparse Grids

Auxiliary Particle Methods

Introduction to Bayesian Computation

Learning the hyper-parameters. Luca Martino

Tutorial on ABC Algorithms

MONTE CARLO METHODS. Hedibert Freitas Lopes

SMC 2 : an efficient algorithm for sequential analysis of state-space models

Sequential Bayesian Updating

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013

ASYMPTOTICALLY INDEPENDENT MARKOV SAMPLING: A NEW MARKOV CHAIN MONTE CARLO SCHEME FOR BAYESIAN INFERENCE

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

Bayesian Phylogenetics:

Controlled sequential Monte Carlo

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

DSGE Model Forecasting

Reminder of some Markov Chain properties:

CSC 2541: Bayesian Methods for Machine Learning

Lecture 8: Bayesian Estimation of Parameters in State Space Models

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, )

Bayesian Inference and MCMC

On the conjugacy of off-line and on-line Sequential Monte Carlo Samplers. Working Paper Research. by Arnaud Dufays. September 2014 No 263

MCMC and Gibbs Sampling. Kayhan Batmanghelich

Computer Practical: Metropolis-Hastings-based MCMC

ESTIMATION of a DSGE MODEL

STA 4273H: Statistical Machine Learning

Sequential Monte Carlo samplers

Principles of Bayesian Inference

TEORIA BAYESIANA Ralph S. Silva

Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

Monte Carlo in Bayesian Statistics

arxiv: v1 [stat.co] 1 Jun 2015

Monte Carlo methods for sampling-based Stochastic Optimization

Markov Chain Monte Carlo methods

Session 5B: A worked example EGARCH model

Bayesian Sequential Design under Model Uncertainty using Sequential Monte Carlo

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Markov chain Monte Carlo

Model comparison. Christopher A. Sims Princeton University October 18, 2016

Pseudo-marginal MCMC methods for inference in latent variable models

State-Space Methods for Inferring Spike Trains from Calcium Imaging

Lecture 4: Dynamic models

Tailored Randomized-block MCMC Methods with Application to DSGE Models


Computer Intensive Methods in Mathematical Statistics

Markov Chain Monte Carlo, Numerical Integration

Bayesian Methods for Machine Learning

Introduction to Markov Chain Monte Carlo & Gibbs Sampling

Computer intensive statistical methods

Inference in state-space models with multiple paths from conditional SMC

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007

Sequential Monte Carlo Methods for Bayesian Computation

Piecewise Linear Continuous Approximations and Filtering for DSGE Models with Occasionally-Binding Constraints

Sequentially Adaptive Bayesian Learning Algorithms for Inference and Optimization

An introduction to Sequential Monte Carlo

Session 3A: Markov chain Monte Carlo (MCMC)

New Insights into History Matching via Sequential Monte Carlo

Sequentially Adaptive Bayesian Learning Algorithms for Inference and Optimization

Kernel adaptive Sequential Monte Carlo

CS281A/Stat241A Lecture 22

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

A Review of Pseudo-Marginal Markov Chain Monte Carlo

Point, Interval, and Density Forecast Evaluation of Linear versus Nonlinear DSGE Models

Introduction to Probabilistic Machine Learning

Estimation and Inference on Dynamic Panel Data Models with Stochastic Volatility

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

LECTURE 15 Markov chain Monte Carlo

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31

Nested Sampling. Brendon J. Brewer. brewer/ Department of Statistics The University of Auckland

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

On Markov chain Monte Carlo methods for tall data

INTRODUCTION TO BAYESIAN STATISTICS

Negative Association, Ordering and Convergence of Resampling Methods

David Giles Bayesian Econometrics

Lecture 7 and 8: Markov Chain Monte Carlo

Transcription:

University of Pennsylvania Bradley Visitor Lectures October 23, 2017

Introduction Unfortunately, standard MCMC can be inaccurate, especially in medium and large-scale DSGE models: disentangling importance of internal versus external propagation mechanism; determining the relative importance of shocks. Now, we use sequential Monte Carlo (SMC) (more precisely, sequential importance sampling) instead: Better suited to handle irregular and multimodal posteriors associated with large DSGE models. Algorithms can be easily parallelized. SMC = Importance Sampling on Steriods. We build on Theoretical work: Chopin (2004); Del Moral, Doucet, Jasra (2006) Applied work: Creal (2007); Durham and Geweke (2011, 2012)

Importance Sampling Approximate π( ) by using a different, tractable density g(θ) that is easy to sample from. For more general problems, posterior density may be non-normalized. So we write π(θ) = p(y θ)p(θ) p(y ) = f (θ) f (θ)dθ. Importance sampling is based on the identity The ratio E π [h(θ)] = h(θ)π(θ)dθ = w(θ) = f (θ) g(θ) f (θ) h(θ) Θ Θ g(θ) g(θ)dθ f (θ) g(θ) g(θ)dθ. is called the (unnormalized) importance weight.

Importance Sampling 1 For i = 1 to N, draw θ i iid g(θ) and compute the unnormalized importance weights w i = w(θ i ) = f (θi ) g(θ i ). 2 Compute the normalized importance weights W i = 1 N w i N i=1 w i. An approximation of E π [h(θ)] is given by h N = 1 N N W i h(θ i ). i=1

Illustration If θ i s are draws from g( ) then E π [h] 1 N N i=1 h(θi )w(θ i ) 1 N N i=1 w(θi ), w(θ) = f (θ) g(θ). 0.40 0.35 f 0.30 g 1 0.25 g 0.20 2 0.15 0.10 0.05 0.00 6 4 2 0 2 4 6 weights f/g 1 f/g 2 6 4 2 0 2 4 6

Accuracy Since we are generating iid draws from g(θ), it s fairly straightforward to derive a CLT: It can be shown that N( hn E π [h]) = N ( 0, Ω(h) ), where Ω(h) = V g [(π/g)(h E π [h])]. Using a crude approximation (see, e.g., Liu (2008)), we can factorize Ω(h) as follows: Ω(h) V π [h] ( V g [π/g] + 1 ). The approximation highlights that the larger the variance of the importance weights, the less accurate the Monte Carlo approximation relative to the accuracy that could be achieved with an iid sample from the posterior. Users often monitor ESS = N V π[h] Ω(h) N 1 + V g [π/g].

From Importance Sampling to Sequential Importance Sampling In general, it s hard to construct a good proposal density g(θ), especially if the posterior has several peaks and valleys. Idea - Part 1: it might be easier to find a proposal density for π n (θ) = [p(y θ)]φn p(θ) = f n(θ). [p(y θ)] φ np(θ)dθ Z n at least if φ n is close to zero. Idea - Part 2: We can try to turn a proposal density for π n into a proposal density for π n+1 and iterate, letting φ n φ N = 1.

Illustration A stylized state-space model: [ ] [ θ1 y t = [1 1]s t, s t = 2 0 1 (1 θ1 2) θ 1θ 2 (1 θ1 2) s t 1 + 0 Innovation: ɛ t iidn(0, 1). Prior: uniform on the square 0 θ 1 1 and 0 θ 2 1. ] ɛ t. Simulate T = 200 observations given θ = [0.45, 0.45], which is observationally equivalent to θ = [0.89, 0.22]

Illustration: Tempered Posteriors of θ 1 5 4 3 2 1 0 50 40 30 n 20 10 0.0 0.2 0.4 0.6 θ 1 0.8 1.0 π n (θ) = [p(y p(θ) θ)]φn = f ( ) λ n(θ) n, φ [p(y θ)] φ np(θ)dθ n = Z n N φ

Illustration: Posterior Draws θ 2 1.0 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 θ 1

SMC Algorithm: A Graphical Illustration 10 5 0 5 10 C S M C S M C S M φ 0 φ 1 φ 2 φ 3 π n (θ) is represented by a swarm of particles {θ i n, W i n} N i=1 : h n,n = 1 N N i=1 W i nh(θ i n) a.s. E πn [h(θ n )]. C is Correction; S is Selection; and M is Mutation.

SMC Algorithm 1 Initialization. (φ 0 = 0). Draw the initial particles from the prior: θ1 i iid p(θ) and W i 1 = 1, i = 1,..., N. 2 Recursion. For n = 1,..., N φ, 1 Correction. Reweight the particles from stage n 1 by defining the incremental weights w i n = [p(y θ i n 1)] φn φ n 1 (1) and the normalized weights W n i w i = nwn 1 i 1 N N i=1 w, i = 1,..., N. (2) nw i n 1 i An approximation of E πn [h(θ)] is given by h n,n = 1 N 2 Selection. N i=1 W i nh(θ i n 1). (3)

SMC Algorithm 1 Initialization. 2 Recursion. For n = 1,..., N φ, 1 Correction. 2 Selection. (Optional Resampling) Let {ˆθ} N i=1 denote N iid draws from a multinomial distribution characterized by support points and weights {θ i n 1, W i n} N i=1 and set W i n = 1. An approximation of E πn [h(θ)] is given by ĥ n,n = 1 N N Wnh(ˆθ i n). i i=1 (4) 3 Mutation. Propagate the particles {ˆθ i, W i n} via N MH steps of a MH algorithm with transition density θ i n K n(θ n ˆθ i n; ζ n) and stationary distribution π n(θ). An approximation of E πn [h(θ)] is given by h n,n = 1 N N h(θn)w i n. i i=1 (5)

Remarks Correction Step: reweight particles from iteration n 1 to create importance sampling approximation of E πn [h(θ)] Selection Step: the resampling of the particles... (good) equalizes the particle weights and thereby increases accuracy of subsequent importance sampling approximations; (not good) adds a bit of noise to the MC approximation. Mutation Step: adapts particles to posterior π n(θ); imagine we don t do it: then we would be using draws from prior p(θ) to approximate posterior π(θ), which can t be good! 5 4 3 2 1 0 50 40 30 n 20 10 0.0 1.0 0.8 0.6 0.4 0.2 θ1

Theoretical Properties Goal: strong law of large numbers (SLLN) and central limit theorem (CLT) as N for every iteration n = 1,..., N φ. Regularity conditions: proper prior; bounded likelihood function; 2 + δ posterior moments of h(θ). Idea of proof (Chopin, 2004): proceed recursively Initialization: SLLN and CLT for iid random variables because we sample from prior. Assume that n 1 approximation (with normalized weights) yields N ( 1 N Show that N ( 1 N ) N h(θn 1)W i n 1 i E πn 1 [h(θ)] = N ( 0, Ω ) n 1(h) i=1 ) N h(θn)w i n i E πn [h(θ)] = N ( 0, Ω ) n(h) i=1

More on Transition Kernel in Mutation Step Transition kernel K n (θ ˆθ n 1 ; ζ n ): generated by running M steps of a Metropolis-Hastings algorithm. Lessons from DSGE model MCMC: blocking of parameters can reduces persistence of Markov chain; mixture proposal density avoids getting stuck. Blocking: Partition the parameter vector θ n into N blocks equally sized blocks, denoted by θ n,b, b = 1,..., N blocks. (We generate the blocks for n = 1,..., N φ randomly prior to running the SMC algorithm.) Example: random walk proposal density: ( ) ϑ b (θn,b,m 1, i θn, b,m, i Σ n,b) N θn,b,m 1, i cnσ 2 n,b.

Adaptive Choice of ζ n = (Σ n, c n ) Infeasible adaption: Let Σ n = V πn [θ]. Adjust scaling factor according to c n = c n 1f ( 1 R n 1(ζ n 1) ), where R n 1( ) is population rejection rate from iteration n 1 and e16(x 0.25) f (x) = 0.95 + 0.10 1 + e. 16(x 0.25) Feasible adaption use output from stage n 1 to replace ζ n by ˆζ n : Use particle approximations of E πn [θ] and V πn [θ] based on {θ i n 1, W i n} N i=1. Use actual rejection rate from stage n 1 to calculate ĉ n = ĉ n 1f ( ˆRn 1(ˆζ n 1) ). Result: under suitable regularity conditions replacing ζ n by ˆζ n where n(ˆζ n ζ n ) = O p (1) does not affect the asymptotic variance of the MC approximation.

Adaption of SMC Algorithm for Stylized State-Space Model 1.0 Acceptance Rate 0.5 0.0 1.50 Scale Parameter c 0.75 0.00 1000 Effective Sample Size 500 0 n 10 20 30 40 50 Notes: The dashed line in the top panel indicates the target acceptance rate of 0.25.

More on Resampling So far, we have used multinomial resampling. It s fairly intuitive and it is straightforward to obtain a CLT. But: multinominal resampling is not particularly efficient. The book contains a section on alternative resampling schemes (stratified resampling, residual resampling...) These alternative techniques are designed to achieve a variance reduction. Most resampling algorithms are not parallelizable because they rely on the normalized particle weights.

Application 1: Small Scale New Keynesian Model We will take a look at the effect of various tuning choices on accuracy: Tempering schedule λ: λ = 1 is linear, λ > 1 is convex. Number of stages N φ versus number of particles N. Number of blocks in mutation step versus number of particles.

Effect of λ on Inefficiency Factors InEff N [ θ] 10 4 10 3 10 2 10 1 10 0 1 2 3 4 5 6 7 8 λ Notes: The figure depicts hairs of InEff N [ θ] as function of λ. The inefficiency factors are computed based on N run = 50 runs of the SMC algorithm. Each hair corresponds to a DSGE model parameter.

Number of Stages N φ vs Number of Particles N 10 1 10 0 10 1 10 2 10 3 ρ g σ r ρ r ρ z σ g κ σ z π (A) ψ 1 γ (Q) ψ 2 r (A) τ N φ = 400, N = 250 N φ = 200, N = 500 N φ = 100, N = 1000 N φ = 50, N = 2000 N φ = 25, N = 4000 Notes: Plot of V[ θ]/v π [θ] for a specific configuration of the SMC algorithm. The inefficiency factors are computed based on N run = 50 runs of the SMC algorithm. N blocks = 1, λ = 2, N MH = 1.

Number of blocks N blocks in Mutation Step vs Number of Particles N 10 0 10 1 10 2 10 3 ρ g σ g σ z σ r ρ r κ ρ z π (A) ψ 1 τ γ (Q) r (A) ψ 2 Nblocks = 4, N = 250 Nblocks = 2, N = 500 Nblocks = 1, N = 1000 Notes: Plot of V[ θ]/v π [θ] for a specific configuration of the SMC algorithm. The inefficiency factors are computed based on N run = 50 runs of the SMC algorithm. N φ = 100, λ = 2, N MH = 1.

A Few Words on Posterior Model Probabilities Posterior model probabilities π i,t = π i,0 p(y 1:T M i ) M j=1 π j,0p(y 1:T M j ) where p(y 1:T M i ) = p(y 1:T θ (i), M i )p(θ (i) M i )dθ (i) For any model: ln p(y 1:T M i ) = T t=1 ln p(y t θ (i), Y 1:t 1, M i )p(θ (i) Y 1:t 1, M i )dθ (i) Marginal data density p(y 1:T M i ) arises as a by-product of SMC.

Marginal Likelihood Approximation Recall w i n = [p(y θ i n 1 )]φn φn 1. Then 1 N N w nw i n 1 i i=1 = p φn 1 (Y θ)p(θ) [p(y θ)] φn φn 1 dθ p φ n 1(Y θ)p(θ)dθ p(y θ) φ n p(θ)dθ p(y θ) φ n 1p(θ)dθ Thus, N φ ( ) 1 N w i N nwn 1 i n=1 i=1 p(y θ)p(θ)dθ.

SMC Marginal Data Density Estimates N φ = 100 N φ = 400 N Mean(ln ˆp(Y )) SD(ln ˆp(Y )) Mean(ln ˆp(Y )) SD(ln ˆp(Y )) 500-352.19 (3.18) -346.12 (0.20) 1,000-349.19 (1.98) -346.17 (0.14) 2,000-348.57 (1.65) -346.16 (0.12) 4,000-347.74 (0.92) -346.16 (0.07) Notes: Table shows mean and standard deviation of log marginal data density estimates as a function of the number of particles N computed over N run = 50 runs of the SMC sampler with N blocks = 4, λ = 2, and N MH = 1.

Application 2: Estimation of Smets and Wouters (2007) Model Benchmark macro model, has been estimated many (many) times. Core of many larger-scale models. 36 estimated parameters. RWMH: 10 million draws (5 million discarded); SMC: 500 stages with 12,000 particles. We run the RWM (using a particular version of a parallelized MCMC) and the SMC algorithm on 24 processors for the same amount of time. We estimate the SW model twenty times using RWM and SMC and get essentially identical results.

Application 2: Estimation of Smets and Wouters (2007) Model More interesting question: how does quality of posterior simulators change as one makes the priors more diffuse? Replace Beta by Uniform distributions; increase variances of parameters with Gamma and Normal prior by factor of 3.

SW Model with DIFFUSE Prior: Estimation stability RWMH (black) versus SMC (red) 4 3 2 1 0 1 2 3 4 l ι w µ p µ w ρ w ξ w r π

A Measure of Effective Number of Draws Suppose we could generate iid N eff ( Ê π [θ] approx N E π [θ], 1 N eff V π [θ] draws from posterior, then ). We can measure the variance of Ê π [θ] by running SMC and RWM algorithm repeatedly. Then, N eff V π[θ] V [ Ê π [θ] ]

Effective Number of Draws SMC RWMH Parameter Mean STD(Mean) N eff Mean STD(Mean) N eff σ l 3.06 0.04 1058 3.04 0.15 60 l -0.06 0.07 732-0.01 0.16 177 ι p 0.11 0.00 637 0.12 0.02 19 h 0.70 0.00 522 0.69 0.03 5 Φ 1.71 0.01 514 1.69 0.04 10 r π 2.78 0.02 507 2.76 0.03 159 ρ b 0.19 0.01 440 0.21 0.08 3 ϕ 8.12 0.16 266 7.98 1.03 6 σ p 0.14 0.00 126 0.15 0.04 1 ξ p 0.72 0.01 91 0.73 0.03 5 ι w 0.73 0.02 87 0.72 0.03 36 µ p 0.77 0.02 77 0.80 0.10 3 ρ w 0.69 0.04 49 0.69 0.09 11 µ w 0.63 0.05 49 0.63 0.09 11 ξ w 0.93 0.01 43 0.93 0.02 8

A Closer Look at the Posterior: Two Modes Parameter Mode 1 Mode 2 ξ w 0.844 0.962 ι w 0.812 0.918 ρ w 0.997 0.394 µ w 0.978 0.267 Log Posterior -804.14-803.51 Mode 1 implies that wage persistence is driven by extremely exogenous persistent wage markup shocks. Mode 2 implies that wage persistence is driven by endogenous amplification of shocks through the wage Calvo and indexation parameter. SMC is able to capture the two modes.

A Closer Look at the Posterior: Internal ξ w versus External ρ w Propagation 1 0.9 0.8 0.7 0.6 ρ w 0.5 0.4 0.3 0.2 0.1 0 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 ξ w

Stability of Posterior Computations: RWMH (black) versus SMC (red) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 P (ξ w > ρ w) P (ρ w > µ w) P (ξ w > µ w) P (ξ p > ρ p) P (ρ p > µ p) P (ξ p > µ p)

Marginal Data Density Bayesian model evaluation is based on marginal data density p n (Y ) = [p(y θ)] φn p(θ)dθ. Recall that marginal data density is a by-product of the importance sampler: ˆp n (Y ) = Ẑn = 1 N N i=1 W i n. Algorithm, Method MDD Estimate Standard Deviation SMC, Particle Estimate -873.46 0.24 RWMH, Harmonic Mean -874.56 1.20