Sequential Monte Carlo Methods


University of Pennsylvania EABCN Training School May 10, 2016

Introduction

Unfortunately, standard MCMC can be inaccurate, especially in medium- and large-scale DSGE models: disentangling the importance of internal versus external propagation mechanisms; determining the relative importance of shocks.

Previously: modify MCMC algorithms to overcome weaknesses: blocking of parameters; tailoring of (mixture) proposal densities.

Now we use sequential Monte Carlo (SMC), more precisely sequential importance sampling, instead: it is better suited to handle the irregular and multimodal posteriors associated with large DSGE models, and the algorithms can be easily parallelized. SMC = importance sampling on steroids.

We build on theoretical work: Chopin (2004); Del Moral, Doucet, Jasra (2006). And on applied work: Creal (2007); Durham and Geweke (2011, 2012).

Importance Sampling

Approximate π(·) using a different, tractable density g(θ) that is easy to sample from. For more general problems the posterior density may be non-normalized, so we write

π(θ) = p(Y|θ)p(θ)/p(Y) = f(θ) / ∫ f(θ)dθ.

Importance sampling is based on the identity

E_π[h(θ)] = ∫_Θ h(θ)π(θ)dθ = [ ∫_Θ h(θ) (f(θ)/g(θ)) g(θ)dθ ] / [ ∫_Θ (f(θ)/g(θ)) g(θ)dθ ].

The ratio w(θ) = f(θ)/g(θ) is called the (unnormalized) importance weight.

Importance Sampling

1. For i = 1 to N, draw θ^i iid from g(θ) and compute the unnormalized importance weights w^i = w(θ^i) = f(θ^i)/g(θ^i).

2. Compute the normalized importance weights W^i = w^i / [ (1/N) Σ_{i=1}^N w^i ].

An approximation of E_π[h(θ)] is given by h̄_N = (1/N) Σ_{i=1}^N W^i h(θ^i).
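
As a concrete illustration of the two steps above, here is a minimal self-normalized importance sampler in Python (a sketch; the target, proposal, and function names are illustrative choices, not from the source):

```python
import numpy as np

def importance_sampling(h, f, g_sample, g_pdf, N, rng):
    """Self-normalized importance sampling approximation of E_pi[h],
    where pi(theta) is proportional to the (non-normalized) density f."""
    theta = g_sample(N, rng)          # step 1: iid draws from proposal g
    w = f(theta) / g_pdf(theta)       # unnormalized weights w^i = f/g
    W = w / w.mean()                  # normalized weights W^i (mean one)
    return np.mean(W * h(theta))      # h_bar_N = (1/N) sum W^i h(theta^i)

# Toy check: non-normalized target exp(-(x-1)^2/2), i.e. N(1,1); proposal N(0,4).
rng = np.random.default_rng(0)
f = lambda x: np.exp(-0.5 * (x - 1.0) ** 2)
g_pdf = lambda x: np.exp(-0.125 * x ** 2) / np.sqrt(8 * np.pi)
g_sample = lambda N, rng: 2.0 * rng.standard_normal(N)
posterior_mean = importance_sampling(lambda x: x, f, g_sample, g_pdf, 200_000, rng)
```

With a proposal wider than the target, the weights have finite variance and `posterior_mean` settles near the true posterior mean of 1.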

Illustration

If the θ^i are draws from g(·), then

E_π[h] ≈ [ (1/N) Σ_{i=1}^N h(θ^i) w(θ^i) ] / [ (1/N) Σ_{i=1}^N w(θ^i) ], w(θ) = f(θ)/g(θ).

[Figure: target density f and two proposal densities g_1, g_2 (left); the corresponding importance weights f/g_1 and f/g_2 (right).]

Accuracy

Since we are generating iid draws from g(θ), it is fairly straightforward to derive a CLT. It can be shown that

√N (h̄_N − E_π[h]) ⇒ N(0, Ω(h)), where Ω(h) = V_g[(π/g)(h − E_π[h])].

Using a crude approximation (see, e.g., Liu (2008)), we can factorize Ω(h) as follows:

Ω(h) ≈ V_π[h] ( V_g[π/g] + 1 ).

The approximation highlights that the larger the variance of the importance weights, the less accurate the Monte Carlo approximation, relative to the accuracy that could be achieved with an iid sample from the posterior. Users often monitor the effective sample size

ESS = N V_π[h]/Ω(h) ≈ N / (1 + V_g[π/g]).
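
The ESS formula above can be computed directly from the unnormalized weights; a small helper (illustrative code, not from the source):

```python
import numpy as np

def ess(w):
    """Effective sample size N / (1 + V_g[pi/g]), computed from the
    unnormalized importance weights as (sum w)^2 / sum w^2."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Equal weights recover the full sample size; a single dominant weight
# collapses the effective sample size toward one.
ess_equal = ess([1.0, 1.0, 1.0, 1.0])        # 4.0
ess_skewed = ess([100.0, 1e-6, 1e-6, 1e-6])  # close to 1
```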

From Importance Sampling to Sequential Importance Sampling

In general it is hard to construct a good proposal density g(θ), especially if the posterior has several peaks and valleys.

Idea, Part 1: it might be easier to find a proposal density for the tempered posterior

π_n(θ) = [p(Y|θ)]^{φ_n} p(θ) / ∫ [p(Y|θ)]^{φ_n} p(θ)dθ = f_n(θ)/Z_n,

at least if φ_n is close to zero.

Idea, Part 2: we can try to turn a proposal density for π_n into a proposal density for π_{n+1} and iterate, letting φ_n ↑ φ_{N_φ} = 1.

Illustration

Our state-space model:

y_t = [1 1] s_t,

s_t = [ θ_1²                0
        (1−θ_1²)θ_1θ_2   (1−θ_1²) ] s_{t−1} + [1 0]' ɛ_t.

Innovation: ɛ_t ~ iid N(0, 1). Prior: uniform on the square 0 ≤ θ_1 ≤ 1 and 0 ≤ θ_2 ≤ 1.

We simulate T = 200 observations given θ = [0.45, 0.45], which is observationally equivalent to θ = [0.89, 0.22].
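
To make the illustration reproducible, the model can be simulated in a few lines. This is a sketch assuming the transition matrix Φ(θ) = [[θ_1², 0], [(1−θ_1²)θ_1θ_2, (1−θ_1²)]] and shock loading [1 0]', as best reconstructed from the slide:

```python
import numpy as np

def simulate(theta, T, rng):
    """Simulate y_t = [1 1] s_t with
    s_t = Phi(theta) s_{t-1} + [1 0]' eps_t, eps_t ~ iid N(0,1)."""
    th1, th2 = theta
    Phi = np.array([[th1 ** 2, 0.0],
                    [(1.0 - th1 ** 2) * th1 * th2, 1.0 - th1 ** 2]])
    s = np.zeros(2)
    y = np.empty(T)
    for t in range(T):
        s = Phi @ s + np.array([1.0, 0.0]) * rng.standard_normal()
        y[t] = s[0] + s[1]
    return y

rng = np.random.default_rng(0)
y = simulate([0.45, 0.45], T=200, rng=rng)
```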

Illustration: Tempered Posteriors of θ_1

π_n(θ) = [p(Y|θ)]^{φ_n} p(θ) / ∫ [p(Y|θ)]^{φ_n} p(θ)dθ = f_n(θ)/Z_n, φ_n = (n/N_φ)^λ.

[Figure: tempered posterior densities of θ_1 across the stages n = 1, …, N_φ.]

Illustration: Posterior Draws

[Figure: posterior draws in the (θ_1, θ_2) unit square, concentrating around the two observationally equivalent parameter values.]

SMC Algorithm: A Graphical Illustration

[Figure: particle swarm evolving through repeated C, S, M steps across the tempering stages φ_0, φ_1, φ_2, φ_3.]

π_n(θ) is represented by a swarm of particles {θ_n^i, W_n^i}_{i=1}^N:

h̄_{n,N} = (1/N) Σ_{i=1}^N W_n^i h(θ_n^i) → E_{π_n}[h(θ_n)] a.s.

C is Correction; S is Selection; and M is Mutation.

SMC Algorithm

1. Initialization (φ_0 = 0). Draw the initial particles from the prior: θ_1^i ~ iid p(θ) and W_1^i = 1, i = 1, …, N.

2. Recursion. For n = 1, …, N_φ:

2.1 Correction. Reweight the particles from stage n−1 by defining the incremental weights

w̃_n^i = [p(Y|θ_{n−1}^i)]^{φ_n − φ_{n−1}}   (1)

and the normalized weights

W̃_n^i = w̃_n^i W_{n−1}^i / [ (1/N) Σ_{i=1}^N w̃_n^i W_{n−1}^i ], i = 1, …, N.   (2)

An approximation of E_{π_n}[h(θ)] is given by

h̃_{n,N} = (1/N) Σ_{i=1}^N W̃_n^i h(θ_{n−1}^i).   (3)

2.2 Selection.

SMC Algorithm

1. Initialization.

2. Recursion. For n = 1, …, N_φ:

2.1 Correction.

2.2 Selection (optional resampling). Let {θ̂^i}_{i=1}^N denote N iid draws from a multinomial distribution characterized by support points and weights {θ_{n−1}^i, W̃_n^i}_{i=1}^N, and set W_n^i = 1. An approximation of E_{π_n}[h(θ)] is given by

ĥ_{n,N} = (1/N) Σ_{i=1}^N W_n^i h(θ̂^i).   (4)

2.3 Mutation. Propagate the particles {θ̂^i, W_n^i} via N_MH steps of an MH algorithm with transition density θ_n^i ~ K_n(θ_n | θ̂_n^i; ζ_n) and stationary distribution π_n(θ). An approximation of E_{π_n}[h(θ)] is given by

h̄_{n,N} = (1/N) Σ_{i=1}^N h(θ_n^i) W_n^i.   (5)
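
The three steps can be assembled into a compact likelihood-tempering SMC sampler. The sketch below works on a one-dimensional toy problem (Gaussian prior and likelihood, hypothetical choices for illustration), uses adaptive resampling at an ESS threshold, and a single-block random-walk mutation; it is a minimal sketch, not the authors' implementation:

```python
import numpy as np

def smc(log_lik, prior_sample, log_prior, N=1000, N_phi=50, lam=2.0,
        n_mh=1, c=0.5, seed=0):
    """Likelihood-tempering SMC: correction, selection, mutation.
    Returns particles, weights (mean one), and the log marginal data density."""
    rng = np.random.default_rng(seed)
    phi = (np.arange(N_phi + 1) / N_phi) ** lam   # tempering schedule
    theta = prior_sample(N, rng)                  # phi_0 = 0: prior draws
    W = np.ones(N)
    ll = log_lik(theta)                           # cached likelihood values
    log_mdd = 0.0
    for n in range(1, N_phi + 1):
        # Correction: incremental weights [p(Y|theta)]^(phi_n - phi_{n-1})
        d = (phi[n] - phi[n - 1]) * ll
        m = d.max()                               # numerical stabilization
        w = np.exp(d - m) * W
        log_mdd += m + np.log(w.mean())           # log of (1/N) sum w~ W
        W = w / w.mean()
        # Selection: resample when the effective sample size drops below N/2
        if N / (1.0 + W.var()) < N / 2.0:
            idx = rng.choice(N, size=N, p=W / N)
            theta, ll, W = theta[idx], ll[idx], np.ones(N)
        # Mutation: random-walk MH steps with stationary distribution pi_n
        sig = np.sqrt(np.average((theta - np.average(theta, weights=W)) ** 2,
                                 weights=W))
        for _ in range(n_mh):
            prop = theta + c * sig * rng.standard_normal(N)
            ll_prop = log_lik(prop)
            log_a = phi[n] * (ll_prop - ll) + log_prior(prop) - log_prior(theta)
            acc = np.log(rng.random(N)) < log_a
            theta = np.where(acc, prop, theta)
            ll = np.where(acc, ll_prop, ll)
    return theta, W, log_mdd

# Conjugate toy check: prior N(0,1), one observation y = 1 with N(theta,1)
# likelihood, so the posterior is N(0.5, 0.5) and
# log p(y) = log N(1; 0, 2) = -0.5*log(4*pi) - 0.25.
y = 1.0
log_lik = lambda th: -0.5 * np.log(2 * np.pi) - 0.5 * (y - th) ** 2
log_prior = lambda th: -0.5 * np.log(2 * np.pi) - 0.5 * th ** 2
prior_sample = lambda N, rng: rng.standard_normal(N)
theta, W, log_mdd = smc(log_lik, prior_sample, log_prior, N=2000)
post_mean = np.average(theta, weights=W)
```

The running `log_mdd` accumulates the log of the stage-n averages of incremental weights; this is the marginal data density by-product discussed later in the deck.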

Remarks

Correction step: reweight particles from iteration n−1 to create an importance sampling approximation of E_{π_n}[h(θ)].

Selection step: the resampling of the particles (good) equalizes the particle weights and thereby increases the accuracy of subsequent importance sampling approximations; (not good) adds a bit of noise to the MC approximation.

Mutation step: adapts the particles to the posterior π_n(θ). Imagine we did not do it: then we would be using draws from the prior p(θ) to approximate the posterior π(θ), which can't be good!

[Figure: tempered posterior densities of θ_1 across the stages n, repeated from above.]

Theoretical Properties

Goal: a strong law of large numbers (SLLN) and a central limit theorem (CLT) as N → ∞ for every iteration n = 1, …, N_φ.

Regularity conditions: proper prior; bounded likelihood function; 2 + δ posterior moments of h(θ).

Idea of proof (Chopin, 2004): proceed recursively.

Initialization: SLLN and CLT for iid random variables, because we sample from the prior.

Assume that the n−1 approximation (with normalized weights) yields

√N ( (1/N) Σ_{i=1}^N h(θ_{n−1}^i) W_{n−1}^i − E_{π_{n−1}}[h(θ)] ) ⇒ N(0, Ω_{n−1}(h)).

Show that

√N ( (1/N) Σ_{i=1}^N h(θ_n^i) W_n^i − E_{π_n}[h(θ)] ) ⇒ N(0, Ω_n(h)).

Theoretical Properties: Correction Step

Suppose that the n−1 approximation (with normalized weights) yields

√N ( (1/N) Σ_{i=1}^N h(θ_{n−1}^i) W_{n−1}^i − E_{π_{n−1}}[h(θ)] ) ⇒ N(0, Ω_{n−1}(h)).

Then

√N ( [ (1/N) Σ_{i=1}^N h(θ_{n−1}^i) [p(Y|θ_{n−1}^i)]^{φ_n−φ_{n−1}} W_{n−1}^i ] / [ (1/N) Σ_{i=1}^N [p(Y|θ_{n−1}^i)]^{φ_n−φ_{n−1}} W_{n−1}^i ] − E_{π_n}[h(θ)] ) ⇒ N(0, Ω̃_n(h)),

where

Ω̃_n(h) = Ω_{n−1}( v_{n−1}(θ)(h − E_{π_n}[h]) ), v_{n−1}(θ) = [p(Y|θ)]^{φ_n−φ_{n−1}} Z_{n−1}/Z_n.

This step relies on likelihood evaluations from iteration n−1 that are already stored in memory.

Theoretical Properties: Selection / Resampling

After resampling by drawing from an iid multinomial distribution, we obtain

√N ( (1/N) Σ_{i=1}^N h(θ̂^i) W_n^i − E_{π_n}[h] ) ⇒ N(0, ˆΩ_n(h)), where ˆΩ_n(h) = Ω̃_n(h) + V_{π_n}[h].

Disadvantage of resampling: it adds noise. Advantage of resampling: it equalizes the particle weights, reducing the variance of v_n(θ) in

Ω̃_{n+1}(h) = ˆΩ_n( v_n(θ)(h − E_{π_{n+1}}[h]) ).

Theoretical Properties: Mutation

We are using the Markov transition kernel K_n(θ|θ̂) to transform the draws θ̂_n^i into draws θ_n^i. To preserve the distribution of the θ̂_n^i, it has to be the case that

π_n(θ) = ∫ K_n(θ|θ̂) π_n(θ̂) dθ̂.

It can be shown that the overall asymptotic variance after the mutation is the sum of:

(i) the variance of the approximation of the conditional mean E_{K_n(·|θ_{n−1})}[h(θ)], which is given by ˆΩ_n( E_{K_n(·|θ_{n−1})}[h(θ)] );

(ii) a weighted average of the conditional variance V_{K_n(·|θ_{n−1})}[h(θ)]:

∫ W_{n−1}(θ_{n−1}) v_{n−1}(θ_{n−1}) V_{K_n(·|θ_{n−1})}[h(θ)] π_{n−1}(θ_{n−1}) dθ_{n−1}.

This step is easily parallelizable.

More on the Transition Kernel in the Mutation Step

Transition kernel K_n(θ|θ̂_{n−1}; ζ_n): generated by running M steps of a Metropolis-Hastings algorithm.

Lessons from DSGE model MCMC: blocking of parameters can reduce the persistence of the Markov chain; a mixture proposal density avoids getting stuck.

Blocking: partition the parameter vector θ_n into N_blocks equally sized blocks, denoted θ_{n,b}, b = 1, …, N_blocks. (We generate the blocks for n = 1, …, N_φ randomly prior to running the SMC algorithm.)

Example, random walk proposal density:

ϑ_b | (θ_{n,b,m−1}^i, θ_{n,−b,m}^i, Σ_{n,b}) ~ N( θ_{n,b,m−1}^i, c_n² Σ_{n,b} ).

Adaptive Choice of ζ_n = (Σ_n, c_n)

Infeasible adaptation: let Σ_n = V_{π_n}[θ]. Adjust the scaling factor according to

c_n = c_{n−1} f( 1 − R_{n−1}(ζ_{n−1}) ),

where R_{n−1}(·) is the population rejection rate from iteration n−1 and

f(x) = 0.95 + 0.10 e^{16(x−0.25)} / (1 + e^{16(x−0.25)}).

Feasible adaptation uses output from stage n−1 to replace ζ_n by ˆζ_n: use particle approximations of E_{π_n}[θ] and V_{π_n}[θ] based on {θ_{n−1}^i, W̃_n^i}_{i=1}^N; use the actual rejection rate from stage n−1 to calculate

ĉ_n = ĉ_{n−1} f( 1 − ˆR_{n−1}(ˆζ_{n−1}) ).

Result: under suitable regularity conditions, replacing ζ_n by ˆζ_n, where √N(ˆζ_n − ζ_n) = O_p(1), does not affect the asymptotic variance of the MC approximation.
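
The adaptation rule is easy to implement; below is the scaling function f and one update step (illustrative code; the numbers in the usage line are hypothetical):

```python
import numpy as np

def f(x):
    """f(x) = 0.95 + 0.10 * e^{16(x-0.25)} / (1 + e^{16(x-0.25)}).
    At the target acceptance rate x = 0.25, f(x) = 1 and c is unchanged;
    higher acceptance rates inflate c, lower rates shrink it."""
    z = np.exp(16.0 * (x - 0.25))
    return 0.95 + 0.10 * z / (1.0 + z)

# One feasible-adaptation step: scale c from the observed acceptance rate
# (one minus the rejection rate) of the previous stage.
c_prev, accept_rate = 0.5, 0.40
c_next = c_prev * f(accept_rate)
```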

Adaptation of SMC Algorithm for Stylized State-Space Model

[Figure: three panels plotting the acceptance rate, the scale parameter c, and the effective sample size against the stage n = 1, …, 50.]

Notes: The dashed line in the top panel indicates the target acceptance rate of 0.25.

Convergence of SMC Approximation for Stylized State-Space Model

[Figure: N V[θ̄_1] and N V[θ̄_2] plotted against the number of particles N, from N = 10,000 to 40,000, with error bands.]

Notes: The figure shows N V[θ̄_j] for each parameter as a function of the number of particles N. V[θ̄_j] is computed based on N_run = 1,000 runs of the SMC algorithm with N_φ = 100. The width of the bands is (2 × 1.96) √(3/N_run) (N V[θ̄_j]).

More on Resampling

So far we have used multinomial resampling. It is fairly intuitive, and it is straightforward to obtain a CLT. But multinomial resampling is not particularly efficient.

The book contains a section on alternative resampling schemes (stratified resampling, residual resampling, …). These alternative techniques are designed to achieve a variance reduction.

Most resampling algorithms are not parallelizable because they rely on the normalized particle weights.
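
As an example of a lower-variance alternative, systematic resampling uses a single uniform draw and evenly spaced positions inverted through the weight CDF (an illustrative sketch; the book's treatment of the alternatives is more complete):

```python
import numpy as np

def systematic_resample(W, rng):
    """Systematic resampling: one uniform draw u, positions (u + i)/N,
    inverted through the CDF of the normalized weights W (summing to one)."""
    N = len(W)
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(W), positions)

rng = np.random.default_rng(1)
W = np.array([0.5, 0.25, 0.125, 0.125])
idx = systematic_resample(W, rng)
counts = np.bincount(idx, minlength=4)
```

With these weights, particle 0 is copied exactly twice and particle 1 exactly once, whatever the uniform draw: the offspring counts can differ from N·W_i by at most one, which is the variance reduction relative to multinomial resampling.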

Application 1: Small Scale New Keynesian Model We will take a look at the effect of various tuning choices on accuracy: Tempering schedule λ: λ = 1 is linear, λ > 1 is convex. Number of stages N φ versus number of particles N. Number of blocks in mutation step versus number of particles.

Effect of λ on Inefficiency Factors InEff_N[θ̄]

[Figure: InEff_N[θ̄] on a log scale as a function of λ = 1, …, 8.]

Notes: The figure depicts hairs of InEff_N[θ̄] as a function of λ. The inefficiency factors are computed based on N_run = 50 runs of the SMC algorithm. Each hair corresponds to a DSGE model parameter.

Number of Stages N_φ vs Number of Particles N

[Figure: inefficiency factors V[θ̄]/V_π[θ] on a log scale for the parameters ρ_g, σ_r, ρ_r, ρ_z, σ_g, κ, σ_z, π^(A), ψ_1, γ^(Q), ψ_2, r^(A), τ under the configurations (N_φ = 400, N = 250), (N_φ = 200, N = 500), (N_φ = 100, N = 1000), (N_φ = 50, N = 2000), (N_φ = 25, N = 4000).]

Notes: Plot of V[θ̄]/V_π[θ]. The inefficiency factors are computed based on N_run = 50 runs of the SMC algorithm. N_blocks = 1, λ = 2, N_MH = 1.

Number of Blocks N_blocks in Mutation Step vs Number of Particles N

[Figure: inefficiency factors V[θ̄]/V_π[θ] on a log scale for the parameters ρ_g, σ_g, σ_z, σ_r, ρ_r, κ, ρ_z, π^(A), ψ_1, τ, γ^(Q), r^(A), ψ_2 under the configurations (N_blocks = 4, N = 250), (N_blocks = 2, N = 500), (N_blocks = 1, N = 1000).]

Notes: Plot of V[θ̄]/V_π[θ]. The inefficiency factors are computed based on N_run = 50 runs of the SMC algorithm. N_φ = 100, λ = 2, N_MH = 1.

A Few Words on Posterior Model Probabilities

Posterior model probabilities:

π_{i,T} = π_{i,0} p(Y_{1:T}|M_i) / Σ_{j=1}^M π_{j,0} p(Y_{1:T}|M_j),

where

p(Y_{1:T}|M_i) = ∫ p(Y_{1:T}|θ^(i), M_i) p(θ^(i)|M_i) dθ^(i).

For any model:

ln p(Y_{1:T}|M_i) = Σ_{t=1}^T ln ∫ p(y_t|θ^(i), Y_{1:t−1}, M_i) p(θ^(i)|Y_{1:t−1}, M_i) dθ^(i).

The marginal data density p(Y_{1:T}|M_i) arises as a by-product of SMC.

Marginal Likelihood Approximation

Recall w̃_n^i = [p(Y|θ_{n−1}^i)]^{φ_n − φ_{n−1}}. Then

(1/N) Σ_{i=1}^N w̃_n^i W_{n−1}^i ≈ ∫ [p(Y|θ)]^{φ_n−φ_{n−1}} [p(Y|θ)]^{φ_{n−1}} p(θ) dθ / ∫ [p(Y|θ)]^{φ_{n−1}} p(θ) dθ
= ∫ [p(Y|θ)]^{φ_n} p(θ) dθ / ∫ [p(Y|θ)]^{φ_{n−1}} p(θ) dθ.

Thus, the product telescopes:

Π_{n=1}^{N_φ} [ (1/N) Σ_{i=1}^N w̃_n^i W_{n−1}^i ] ≈ ∫ p(Y|θ) p(θ) dθ.
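
The stage-n ratio can be checked numerically in a toy Gaussian model where Z_φ = ∫ [p(y|θ)]^φ p(θ)dθ has a closed form (all choices below are illustrative): drawing from the φ₀-tempered posterior and averaging [p(y|θ)]^{φ₁−φ₀} should recover Z_{φ₁}/Z_{φ₀}.

```python
import numpy as np

rng = np.random.default_rng(0)
y, N = 1.0, 200_000

def log_Z(phi):
    # Closed form of log ∫ N(y; theta, 1)^phi N(theta; 0, 1) dtheta
    return (-0.5 * phi * np.log(2 * np.pi) - 0.5 * np.log(1 + phi)
            - 0.5 * phi * y ** 2 / (1 + phi))

phi0, phi1 = 0.4, 0.7
# Exact draws from the phi0-tempered posterior, which is
# N(phi0*y/(1+phi0), 1/(1+phi0)) in this conjugate toy model.
theta = phi0 * y / (1 + phi0) + rng.standard_normal(N) / np.sqrt(1 + phi0)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y - theta) ** 2
ratio_hat = np.exp((phi1 - phi0) * log_lik).mean()   # (1/N) sum of w~_n^i
ratio_exact = np.exp(log_Z(phi1) - log_Z(phi0))      # Z_n / Z_{n-1}
```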

SMC Marginal Data Density Estimates

                 N_φ = 100                      N_φ = 400
N       Mean(ln p̂(Y))  SD(ln p̂(Y))    Mean(ln p̂(Y))  SD(ln p̂(Y))
500       -352.19        (3.18)          -346.12        (0.20)
1,000     -349.19        (1.98)          -346.17        (0.14)
2,000     -348.57        (1.65)          -346.16        (0.12)
4,000     -347.74        (0.92)          -346.16        (0.07)

Notes: The table shows the mean and standard deviation of log marginal data density estimates as a function of the number of particles N, computed over N_run = 50 runs of the SMC sampler with N_blocks = 4, λ = 2, and N_MH = 1.

Application 2: Estimation of Smets and Wouters (2007) Model

Benchmark macro model; it has been estimated many (many) times and forms the core of many larger-scale models. 36 estimated parameters.

RWMH: 10 million draws (5 million discarded); SMC: 500 stages with 12,000 particles. We run the RWM (using a particular parallelized version of MCMC) and the SMC algorithm on 24 processors for the same amount of time.

We estimate the SW model twenty times using RWM and SMC and get essentially identical results.

Application 2: Estimation of Smets and Wouters (2007) Model More interesting question: how does quality of posterior simulators change as one makes the priors more diffuse? Replace Beta by Uniform distributions; increase variances of parameters with Gamma and Normal prior by factor of 3.

SW Model with DIFFUSE Prior: Estimation Stability, RWM (black) versus SMC (red)

[Figure: posterior estimates of l, ι_w, µ_p, µ_w, ρ_w, ξ_w, and r_π across repeated runs of each algorithm.]

A Measure of Effective Number of Draws

Suppose we could generate N_eff iid draws from the posterior; then

Ê_π[θ] ≈ N( E_π[θ], V_π[θ]/N_eff ).

We can measure the variance of Ê_π[θ] by running the SMC and RWM algorithms repeatedly. Then

N_eff ≈ V_π[θ] / V[ Ê_π[θ] ].
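
This measure is straightforward to compute from repeated runs; an illustrative sketch with an iid sanity check (the run counts and sizes are hypothetical):

```python
import numpy as np

def n_eff(posterior_var, run_means):
    """Effective number of draws: V_pi[theta] / Var(E_hat[theta]),
    where run_means collects the posterior-mean estimate from each run."""
    return posterior_var / np.var(run_means, ddof=1)

# iid sanity check: each "run" averages M iid N(0,1) draws, so the
# effective number of draws should come out near M.
rng = np.random.default_rng(0)
M, runs = 500, 2000
run_means = rng.standard_normal((runs, M)).mean(axis=1)
n_hat = n_eff(1.0, run_means)
```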

Effective Number of Draws

                   SMC                          RWMH
Parameter   Mean   STD(Mean)  N_eff     Mean   STD(Mean)  N_eff
σ_l          3.06    0.04      1058      3.04    0.15        60
l           -0.06    0.07       732     -0.01    0.16       177
ι_p          0.11    0.00       637      0.12    0.02        19
h            0.70    0.00       522      0.69    0.03         5
Φ            1.71    0.01       514      1.69    0.04        10
r_π          2.78    0.02       507      2.76    0.03       159
ρ_b          0.19    0.01       440      0.21    0.08         3
ϕ            8.12    0.16       266      7.98    1.03         6
σ_p          0.14    0.00       126      0.15    0.04         1
ξ_p          0.72    0.01        91      0.73    0.03         5
ι_w          0.73    0.02        87      0.72    0.03        36
µ_p          0.77    0.02        77      0.80    0.10         3
ρ_w          0.69    0.04        49      0.69    0.09        11
µ_w          0.63    0.05        49      0.63    0.09        11
ξ_w          0.93    0.01        43      0.93    0.02         8

A Closer Look at the Posterior: Two Modes

Parameter       Mode 1    Mode 2
ξ_w             0.844     0.962
ι_w             0.812     0.918
ρ_w             0.997     0.394
µ_w             0.978     0.267
Log Posterior  -804.14   -803.51

Mode 1 implies that wage persistence is driven by extremely persistent exogenous wage markup shocks. Mode 2 implies that wage persistence is driven by endogenous amplification of shocks through the wage Calvo and indexation parameters. SMC is able to capture both modes.

A Closer Look at the Posterior: Internal (ξ_w) versus External (ρ_w) Propagation

[Figure: scatter plot of posterior draws in the (ξ_w, ρ_w) plane, with ξ_w between 0.6 and 1 and ρ_w between 0 and 1, showing the two modes.]

Stability of Posterior Computations: RWM (black) versus SMC (red)

[Figure: estimates of P(ξ_w > ρ_w), P(ρ_w > µ_w), P(ξ_w > µ_w), P(ξ_p > ρ_p), P(ρ_p > µ_p), and P(ξ_p > µ_p) across repeated runs of each algorithm.]

Marginal Data Density

Bayesian model evaluation is based on the marginal data density

p_n(Y) = ∫ [p(Y|θ)]^{φ_n} p(θ) dθ.

Recall that the marginal data density is a by-product of the importance sampler:

p̂_n(Y) = Ẑ_n = Π_{m=1}^n [ (1/N) Σ_{i=1}^N w̃_m^i W_{m−1}^i ].

Algorithm, Method          MDD Estimate   Standard Deviation
SMC, Particle Estimate       -873.46          0.24
RWM, Harmonic Mean           -874.56          1.20

Application 3: News Shock Model of Schmitt-Grohe and Uribe (2012)

Real business cycle model; investment adjustment costs; variable capacity utilization; Jaimovich-Rebelo (2009) preferences; permanent and stationary neutral and investment-specific technology shocks, government spending, wage mark-up, and preference shocks.

Data: real GDP, consumption, investment, government spending, hours, TFP, and price-of-investment growth rates from 1955:Q2 to 2006:Q4.

Key Model Features

News shocks: each of the 7 exogenous processes has two anticipated components:

z_t = ρ z_{t−1} + ɛ⁰_t + ɛ⁴_{t−4} + ɛ⁸_{t−8},

where ɛ⁰_t is unanticipated, ɛ⁴_{t−4} is anticipated 4 quarters ahead, and ɛ⁸_{t−8} is anticipated 8 quarters ahead.

Jaimovich-Rebelo preferences: households have CRRA preferences over the consumption bundle V_t:

V_t = C_t − bC_{t−1} − ψ h_t^θ S_t,

where S_t is a geometric average of current and past habit-adjusted consumption levels:

S_t = (C_t − bC_{t−1})^γ (S_{t−1})^{1−γ}.

γ → 0: GHH preferences; the wealth effect on labor is small. γ → 1: standard preferences.
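
The anticipated-shock structure is easy to simulate; a minimal sketch (the parameter values are illustrative, not SGU's estimates):

```python
import numpy as np

def simulate_news(rho, sig0, sig4, sig8, T, rng):
    """z_t = rho*z_{t-1} + eps0_t + eps4_{t-4} + eps8_{t-8}: an AR(1)
    driven by an unanticipated shock plus 4- and 8-quarter-ahead news."""
    e0 = sig0 * rng.standard_normal(T)
    e4 = sig4 * rng.standard_normal(T)   # announced at t, realized at t+4
    e8 = sig8 * rng.standard_normal(T)   # announced at t, realized at t+8
    z = np.zeros(T)
    for t in range(1, T):
        z[t] = rho * z[t - 1] + e0[t]
        if t >= 4:
            z[t] += e4[t - 4]
        if t >= 8:
            z[t] += e8[t - 8]
    return z

rng = np.random.default_rng(7)
z = simulate_news(rho=0.9, sig0=1.0, sig4=0.5, sig8=0.5, T=300, rng=rng)
```

Agents observe the news innovations when they are announced, so anticipated components shift expectations several quarters before they move z_t.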

Effective Number of Draws

                   SMC                          RWMH
Parameter   Mean   STD(Mean)  N_eff     Mean   STD(Mean)  N_eff
σ⁴_zi        3.14    0.04      4190      3.08    0.24       108
σ⁸_g         0.41    0.01      1830      0.41    0.03       108
σ⁸_zi        5.59    0.09      1124      5.68    0.30       102
θ            4.13    0.02       671      4.14    0.05       146
σ⁰_zi       12.27    0.09       640     12.36    0.09       616
σ⁰_µ         1.04    0.04       626      0.92    0.04       625
σ⁰_g         0.62    0.01       609      0.60    0.03       111
κ            9.32    0.05       578      9.33    0.09       208
σ⁴_ζ         2.43    0.09       406      2.44    0.04      2066
σ⁰_ζ         3.82    0.10       384      3.80    0.22        73
σ⁸_ζ         2.65    0.11       335      2.62    0.18       126
σ⁴_µ         4.26    0.24        49      4.33    0.49        12
σ⁸_µ         1.36    0.24        46      1.34    0.49        11

Histograms of Wage Markup Process Parameters: RWM (black) versus SMC (red)

[Figure: four panels with histograms of ρ_µ, σ⁰_µ, σ⁴_µ, and σ⁸_µ.]

Histogram of Preference Parameter γ: RWM (black) versus SMC (red)

[Figure: histograms of γ on [0, 1] from the two algorithms.]

A value of γ around 0.6 implies that the importance of anticipated shocks for hours is substantially diminished.

An Alternative Prior for γ

SGU use a uniform prior for γ. We now change the prior to Beta(2, 1) (its density is a straight line increasing from 0 to 1).

Histogram of Preference Parameter γ, Alternative Prior: RWM (black) versus SMC (red)

[Figure: histograms of γ on [0, 1] under the Beta(2, 1) prior.]

Anticipated-Shocks Variance Shares for Hours: RWM (black) versus SMC (red)

[Figure: two panels with histograms of the anticipated-shock variance share for hours under the SGU prior (left) and the Beta(2, 1) prior (right).]