Bayesian Inference for DSGE Models. Lawrence J. Christiano


Outline

- State space/observer form: convenient for model estimation and many other things.
- Bayesian inference: Bayes' rule, Monte Carlo integration, the MCMC algorithm, and the Laplace approximation.

State Space/Observer Form

- Compact summary of the model, and of the mapping between the model and the data used in the analysis.
- Typically, data are available in log form, so the following is useful. If $x$ is the steady state of $x_t$:
$$\hat{x}_t \equiv \frac{x_t - x}{x}, \qquad \frac{x_t}{x} = 1 + \hat{x}_t, \qquad \log\left(\frac{x_t}{x}\right) = \log\left(1 + \hat{x}_t\right) \approx \hat{x}_t.$$
- Suppose we have a model solution in hand:[1]
$$z_t = A z_{t-1} + B s_t, \qquad s_t = P s_{t-1} + \epsilon_t, \qquad E\epsilon_t\epsilon_t' = D.$$

[1] Notation taken from the solution lecture notes, http://faculty.wcas.northwestern.edu/~lchrist/course/Korea_2012/lecture_on_solving_rev.pdf

State Space/Observer Form

- Suppose we have a model in which the date $t$ endogenous variables are capital, $K_{t+1}$, and labor, $N_t$:
$$z_t = \begin{pmatrix} \hat{K}_{t+1} \\ \hat{N}_t \end{pmatrix}, \qquad s_t = \hat{\varepsilon}_t, \qquad \epsilon_t = e_t.$$
- Data may include variables in $z_t$ and/or other variables. For example, suppose the available data are $N_t$ and GDP, $y_t$, and the production function in the model is
$$y_t = \varepsilon_t K_t^{\alpha} N_t^{1-\alpha},$$
so that
$$\hat{y}_t = \hat{\varepsilon}_t + \alpha \hat{K}_t + (1-\alpha)\hat{N}_t = \begin{pmatrix} 0 & 1-\alpha \end{pmatrix} z_t + \begin{pmatrix} \alpha & 0 \end{pmatrix} z_{t-1} + s_t.$$
- From the properties of $\hat{y}_t$ and $\hat{N}_t$:
$$Y_t^{data} = \begin{pmatrix} \log y_t \\ \log N_t \end{pmatrix} = \begin{pmatrix} \log y \\ \log N \end{pmatrix} + \begin{pmatrix} \hat{y}_t \\ \hat{N}_t \end{pmatrix}.$$

State Space/Observer Form

- Model prediction for the data:
$$Y_t^{data} = \begin{pmatrix} \log y \\ \log N \end{pmatrix} + \begin{pmatrix} \hat{y}_t \\ \hat{N}_t \end{pmatrix}
= \begin{pmatrix} \log y \\ \log N \end{pmatrix} + \begin{pmatrix} 0 & 1-\alpha \\ 0 & 1 \end{pmatrix} z_t + \begin{pmatrix} \alpha & 0 \\ 0 & 0 \end{pmatrix} z_{t-1} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} s_t
= a + H\xi_t,$$
where
$$a = \begin{pmatrix} \log y \\ \log N \end{pmatrix}, \qquad \xi_t = \begin{pmatrix} z_t \\ z_{t-1} \\ \hat{\varepsilon}_t \end{pmatrix}, \qquad H = \begin{pmatrix} 0 & 1-\alpha & \alpha & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}.$$
- The Observer Equation may include measurement error, $w_t$:
$$Y_t^{data} = a + H\xi_t + w_t, \qquad Ew_tw_t' = R.$$
- Semantics: $\xi_t$ is the state of the system (do not confuse it with the economic state, $(K_t, \varepsilon_t)$!).

State Space/Observer Form

- Law of motion of the state, $\xi_t$ (the state-space equation):
$$\xi_t = F\xi_{t-1} + u_t, \qquad Eu_tu_t' = Q,$$
$$\begin{pmatrix} z_{t+1} \\ z_t \\ s_{t+1} \end{pmatrix} = \begin{pmatrix} A & 0 & BP \\ I & 0 & 0 \\ 0 & 0 & P \end{pmatrix}\begin{pmatrix} z_t \\ z_{t-1} \\ s_t \end{pmatrix} + \begin{pmatrix} B \\ 0 \\ I \end{pmatrix}\epsilon_{t+1},$$
$$u_t = \begin{pmatrix} B \\ 0 \\ I \end{pmatrix}\epsilon_t, \qquad Q = \begin{pmatrix} BDB' & 0 & BD \\ 0 & 0 & 0 \\ DB' & 0 & D \end{pmatrix}, \qquad F = \begin{pmatrix} A & 0 & BP \\ I & 0 & 0 \\ 0 & 0 & P \end{pmatrix}.$$

State Space/Observer Form

$$\xi_t = F\xi_{t-1} + u_t, \qquad Eu_tu_t' = Q,$$
$$Y_t^{data} = a + H\xi_t + w_t, \qquad Ew_tw_t' = R.$$
- These objects can be constructed from the model parameters, $\theta = (\beta, \delta, \ldots)$, so that $F = F(\theta)$, $Q = Q(\theta)$, $a = a(\theta)$, $H = H(\theta)$, $R = R(\theta)$.
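To make the mapping $\theta \mapsto (F, Q, a, H, R)$ concrete, here is a minimal Python/numpy sketch that assembles the matrices for the two-variable example above. The function name and all numerical values are illustrative assumptions, not part of the lecture notes.

```python
# Minimal sketch: assemble F, Q, a, H, R for the two-variable example above,
# given a model solution z_t = A z_{t-1} + B s_t, s_t = P s_{t-1} + eps_t,
# E[eps eps'] = D. All numerical values are illustrative placeholders.
import numpy as np

def state_space(A, B, P, D, alpha, log_y_ss, log_N_ss, R=None):
    n_z, n_s = A.shape[0], P.shape[0]
    n_xi = 2 * n_z + n_s                      # xi_t = (z_t', z_{t-1}', s_t')'
    F = np.zeros((n_xi, n_xi))
    F[:n_z, :n_z] = A                         # z_t = A z_{t-1} + B P s_{t-1} + B eps_t
    F[:n_z, 2 * n_z:] = B @ P
    F[n_z:2 * n_z, :n_z] = np.eye(n_z)
    F[2 * n_z:, 2 * n_z:] = P
    U = np.vstack([B, np.zeros((n_z, n_s)), np.eye(n_s)])   # u_t = U eps_t
    Q = U @ D @ U.T
    # Observer equation for (log y_t, log N_t), using the H derived above.
    a = np.array([log_y_ss, log_N_ss])
    H = np.array([[0.0, 1 - alpha, alpha, 0.0, 1.0],
                  [0.0, 1.0,       0.0,   0.0, 0.0]])
    R = np.zeros((2, 2)) if R is None else R  # measurement-error variance
    return F, Q, a, H, R

# Illustrative values only:
A = np.array([[0.9, 0.1], [0.3, 0.0]])
B = np.array([[0.5], [0.8]])
P = np.array([[0.95]])
D = np.array([[0.01 ** 2]])
F, Q, a, H, R = state_space(A, B, P, D, alpha=0.36, log_y_ss=0.0, log_N_ss=0.0)
```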

Uses of State Space/Observer Form

- Estimation of $\theta$ and forecasting $\xi_t$ and $Y_t^{data}$.
- Can take into account situations in which the data represent a mixture of quarterly, monthly, and daily observations.
- Data-rich estimation: could include several data measures (e.g., employment based on surveys of establishments and on surveys of households) of a single model concept.
- Useful for solving the following forecasting problems:
  - Filtering (mainly of technical interest in computing the likelihood function):
$$P\left[\xi_t \mid Y_{t-1}^{data}, Y_{t-2}^{data}, \ldots, Y_1^{data}\right], \quad t = 1, 2, \ldots, T.$$
  - Smoothing:
$$P\left[\xi_t \mid Y_T^{data}, \ldots, Y_1^{data}\right], \quad t = 1, 2, \ldots, T.$$
  - Example: the real rate of interest and the output gap can be recovered from $\xi_t$ using a simple New Keynesian model.
- Useful for deriving a model's implications for vector autoregressions.

Quick Review of Probability Theory

- Two random variables, $x \in \{x_1, x_2\}$ and $y \in \{y_1, y_2\}$.
- Joint distribution, $p(x, y)$:
$$\begin{array}{c|cc} & x_1 & x_2 \\ \hline y_1 & p_{11} = 0.05 & p_{12} = 0.40 \\ y_2 & p_{21} = 0.35 & p_{22} = 0.20 \end{array}$$
where $p_{ij}$ is the probability that $y = y_i$ and $x = x_j$.
- Restriction:
$$\int_{x,y} p(x, y)\,dx\,dy = 1.$$

Quick Review of Probability Theory

- Joint distribution, $p(x, y)$:
$$\begin{array}{c|cc} & x_1 & x_2 \\ \hline y_1 & p_{11} = 0.05 & p_{12} = 0.40 \\ y_2 & p_{21} = 0.35 & p_{22} = 0.20 \end{array}$$
- Marginal distribution of $x$, $p(x)$: the probabilities of the various values of $x$ without reference to the value of $y$:
$$p(x) = \begin{cases} p_{11} + p_{21} = 0.40 & x = x_1 \\ p_{12} + p_{22} = 0.60 & x = x_2 \end{cases}$$
or
$$p(x) = \int_y p(x, y)\,dy.$$

Quick Review of Probability Theory

- Joint distribution, $p(x, y)$:
$$\begin{array}{c|cc} & x_1 & x_2 \\ \hline y_1 & p_{11} = 0.05 & p_{12} = 0.40 \\ y_2 & p_{21} = 0.35 & p_{22} = 0.20 \end{array}$$
- Conditional distribution of $x$ given $y$, $p(x \mid y)$: the probability of $x$ given that the value of $y$ is known:
$$p(x \mid y_1) = \begin{cases} p(x_1 \mid y_1) = \dfrac{p_{11}}{p_{11} + p_{12}} = \dfrac{p_{11}}{p(y_1)} = \dfrac{0.05}{0.45} = 0.11 \\[2ex] p(x_2 \mid y_1) = \dfrac{p_{12}}{p_{11} + p_{12}} = \dfrac{p_{12}}{p(y_1)} = \dfrac{0.40}{0.45} = 0.89 \end{cases}$$
or
$$p(x \mid y) = \frac{p(x, y)}{p(y)}.$$

Quick Review of Probability Theory

- Joint distribution, $p(x, y)$, with marginals:
$$\begin{array}{c|cc|c} & x_1 & x_2 & \\ \hline y_1 & 0.05 & 0.40 & p(y_1) = 0.45 \\ y_2 & 0.35 & 0.20 & p(y_2) = 0.55 \\ \hline & p(x_1) = 0.40 & p(x_2) = 0.60 & \end{array}$$
- Mode of the joint distribution (in the example):
$$\arg\max_{x,y}\, p(x, y) = (x_2, y_1).$$
- Modes of the marginal distributions:
$$\arg\max_x\, p(x) = x_2, \qquad \arg\max_y\, p(y) = y_2.$$
- Note: the mode of a marginal distribution and the mode of the joint distribution are conceptually different.
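The 2x2 example above can be checked directly. A minimal numpy sketch (the variable names are my own) that reproduces the marginals, the conditional $p(x \mid y_1)$, and the two notions of mode:

```python
# Minimal sketch reproducing the 2x2 example above:
# rows index y (y1, y2), columns index x (x1, x2).
import numpy as np

p_xy = np.array([[0.05, 0.40],    # y1 row
                 [0.35, 0.20]])   # y2 row
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=0)            # marginal of x: [0.40, 0.60]
p_y = p_xy.sum(axis=1)            # marginal of y: [0.45, 0.55]
p_x_given_y1 = p_xy[0] / p_y[0]   # conditional p(x | y1): [0.11..., 0.88...]

joint_mode = np.unravel_index(p_xy.argmax(), p_xy.shape)   # (y1, x2)
marginal_modes = (p_x.argmax(), p_y.argmax())              # (x2, y2)
print(p_x, p_y, p_x_given_y1, joint_mode, marginal_modes)
```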

Maximum Likelihood Estimation

- State space/observer system:
$$\xi_{t+1} = F\xi_t + u_{t+1}, \qquad Eu_tu_t' = Q,$$
$$Y_t^{data} = a_0 + H\xi_t + w_t, \qquad Ew_tw_t' = R.$$
- The reduced-form parameters, $(F, Q, a_0, H, R)$, are functions of $\theta$.
- Choose $\theta$ to maximize the likelihood, $p(Y^{data} \mid \theta)$:
$$p(Y^{data} \mid \theta) = p(Y_1^{data}, \ldots, Y_T^{data} \mid \theta)
= p(Y_1^{data} \mid \theta)\, p(Y_2^{data} \mid Y_1^{data}, \theta) \cdots p(Y_t^{data} \mid Y_{t-1}^{data}, \ldots, Y_1^{data}, \theta) \cdots p(Y_T^{data} \mid Y_{T-1}^{data}, \ldots, Y_1^{data}, \theta),$$
where each conditional density is computed using the Kalman filter.
- The Kalman filter is straightforward (see, e.g., Hamilton's textbook).
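As a rough illustration of how the Kalman filter delivers this prediction-error decomposition, here is a minimal Python sketch. It assumes a time-invariant, stationary system initialized at the unconditional distribution of the state; it is not Christiano's code, and a production implementation would follow the standard references (e.g., Hamilton 1994).

```python
# Minimal sketch of the Kalman-filter likelihood for the state space/observer
# form above (Gaussian shocks, stable F assumed).
import numpy as np

def kalman_log_likelihood(Y, F, Q, a0, H, R):
    """Y: T x n_y array of observations; returns log p(Y | theta)."""
    n_xi = F.shape[0]
    xi = np.zeros(n_xi)                               # prior mean of xi_1
    # Unconditional state variance solves P = F P F' + Q.
    P = np.linalg.solve(np.eye(n_xi ** 2) - np.kron(F, F),
                        Q.ravel()).reshape(n_xi, n_xi)
    loglik = 0.0
    for y in Y:
        y_hat = a0 + H @ xi                           # forecast of Y_t given past
        Omega = H @ P @ H.T + R                       # forecast-error variance
        err = y - y_hat
        Omega_inv = np.linalg.inv(Omega)
        loglik += -0.5 * (len(y) * np.log(2 * np.pi)
                          + np.log(np.linalg.det(Omega))
                          + err @ Omega_inv @ err)
        K = F @ P @ H.T @ Omega_inv                   # Kalman gain
        xi = F @ xi + K @ err                         # predict next state
        P = F @ P @ F.T + Q - K @ Omega @ K.T
    return loglik
```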

Bayesian Inference

- Bayesian inference is about describing the mapping from prior beliefs about $\theta$, summarized in $p(\theta)$, to new posterior beliefs in the light of observing the data, $Y^{data}$.
- General property of probabilities:
$$p(Y^{data}, \theta) = \begin{cases} p(Y^{data} \mid \theta)\,p(\theta) \\ p(\theta \mid Y^{data})\,p(Y^{data}) \end{cases}$$
which implies Bayes' rule:
$$p(\theta \mid Y^{data}) = \frac{p(Y^{data} \mid \theta)\,p(\theta)}{p(Y^{data})},$$
the mapping from prior to posterior induced by $Y^{data}$.

Bayesian Inference

- Report features of the posterior distribution, $p(\theta \mid Y^{data})$.
- The value of $\theta$ that maximizes $p(\theta \mid Y^{data})$: the mode of the posterior distribution.
- Compare the marginal prior, $p(\theta_i)$, with the marginal posterior of individual elements of $\theta$, $g(\theta_i \mid Y^{data})$:
$$g(\theta_i \mid Y^{data}) = \int_{\theta_{j \ne i}} p(\theta \mid Y^{data})\,d\theta_{j \ne i} \quad \text{(multiple integration!!)}$$
- Probability intervals about the mode of $\theta$ ("Bayesian confidence intervals") require $g(\theta_i \mid Y^{data})$.
- Marginal likelihood for assessing model fit:
$$p(Y^{data}) = \int_{\theta} p(Y^{data} \mid \theta)\,p(\theta)\,d\theta \quad \text{(multiple integration)}$$

Monte Carlo Integration: Simple Example

- Much of Bayesian inference is about multiple integration.
- Numerical methods for multiple integration:
  - Quadrature integration (example: approximating the integral as the sum of the areas of triangles beneath the integrand).
  - Monte Carlo integration: uses a random number generator.
- Example of Monte Carlo integration: suppose you want to evaluate
$$\int_a^b f(x)\,dx, \qquad -\infty < a < b < \infty.$$
- Select a density function $g(x)$ for $x \in [a, b]$ and note:
$$\int_a^b f(x)\,dx = \int_a^b \frac{f(x)}{g(x)}\,g(x)\,dx = E\,\frac{f(x)}{g(x)},$$
where $E$ is the expectation operator, given $g(x)$.

Monte Carlo Integration: Simple Example

- Previous result: can express an integral as an expectation relative to an (arbitrary, subject to obvious regularity conditions) density function.
- Use the law of large numbers (LLN) to approximate the expectation.
  - Step 1: draw $x_i$ independently from the density $g$, for $i = 1, \ldots, M$.
  - Step 2: evaluate $f(x_i)/g(x_i)$ and compute
$$\mu_M \equiv \frac{1}{M}\sum_{i=1}^M \frac{f(x_i)}{g(x_i)} \;\xrightarrow[M \to \infty]{}\; E\,\frac{f(x)}{g(x)}.$$
- Exercise: consider an integral for which you have an analytic solution available, e.g., $\int_0^1 x^2\,dx$. Evaluate the accuracy of the Monte Carlo method using various distributions on $[0, 1]$, like the uniform or the Beta (a worked sketch appears after the next slide).

Monte Carlo Integration: Simple Example

- Standard classical sampling theory applies.
- Independence of $f(x_i)/g(x_i)$ over $i$ implies
$$\operatorname{var}\left(\frac{1}{M}\sum_{i=1}^M \frac{f(x_i)}{g(x_i)}\right) = \frac{v_M}{M}, \qquad v_M \equiv \operatorname{var}\left(\frac{f(x_i)}{g(x_i)}\right) \approx \frac{1}{M}\sum_{i=1}^M \left[\frac{f(x_i)}{g(x_i)} - \mu_M\right]^2.$$
- Central Limit Theorem: the estimate of $\int_a^b f(x)\,dx$ is a realization from a Normal distribution with mean estimated by $\mu_M$ and variance $v_M/M$.
- With 95% probability,
$$\mu_M - 1.96\sqrt{\frac{v_M}{M}} \le \int_a^b f(x)\,dx \le \mu_M + 1.96\sqrt{\frac{v_M}{M}}.$$
- Pick $g$ to minimize the variance of $f(x_i)/g(x_i)$, and $M$ to minimize (subject to computing cost) $v_M/M$.
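A minimal Python sketch of the exercise above: estimate $\int_0^1 x^2\,dx = 1/3$ by Monte Carlo under two choices of $g$ (the uniform and a Beta(2,2) are my illustrative choices), reporting the 95% interval $\mu_M \pm 1.96\sqrt{v_M/M}$.

```python
# Minimal sketch of the Monte Carlo integration exercise: the true value is 1/3.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
M = 100_000
f = lambda x: x ** 2

for name, dist in [("uniform", stats.uniform(0, 1)), ("Beta(2,2)", stats.beta(2, 2))]:
    x = dist.rvs(size=M, random_state=rng)
    ratio = f(x) / dist.pdf(x)               # f(x_i) / g(x_i)
    mu_M = ratio.mean()
    v_M = ratio.var()
    half = 1.96 * np.sqrt(v_M / M)
    print(f"{name}: estimate {mu_M:.5f}, 95% interval [{mu_M - half:.5f}, {mu_M + half:.5f}]")
```

Comparing the interval widths across the two densities illustrates the last point: a $g$ that tracks $f$ more closely lowers $v_M$ for the same $M$.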

Markov Chain Monte Carlo (MCMC) Algorithms

- Among the top 10 algorithms "with the greatest influence on the development and practice of science and engineering in the 20th century".
  - Reference: January/February 2000 issue of Computing in Science & Engineering, a joint publication of the American Institute of Physics and the IEEE Computer Society.
- Developed in 1946 by John von Neumann, Stan Ulam, and Nick Metropolis (see http://www.siam.org/pdf/news/637.pdf).

MCMC Algorithm: Overview

- Compute a sequence, $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(M)}$, of values of the $N \times 1$ vector of model parameters in such a way that
$$\lim_{M \to \infty} \operatorname{Frequency}\left[\theta^{(i)} \text{ close to } \theta\right] = p(\theta \mid Y^{data}).$$
- Use $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(M)}$ to obtain an approximation for:
  - $E\theta$ and $\operatorname{Var}(\theta)$ under the posterior distribution, $p(\theta \mid Y^{data})$;
  - $g(\theta_i \mid Y^{data}) = \int_{\theta_{j \ne i}} p(\theta \mid Y^{data})\,d\theta_{j \ne i}$;
  - $p(Y^{data}) = \int_{\theta} p(Y^{data} \mid \theta)\,p(\theta)\,d\theta$;
  - the posterior distribution of any function of $\theta$, $f(\theta)$ (e.g., impulse response functions, second moments).
- MCMC is also useful for computing the posterior mode, $\arg\max_{\theta} p(\theta \mid Y^{data})$.

MCMC Algorithm: Setting Up

- Let $G(\theta)$ denote the log of the posterior distribution (excluding an additive constant):
$$G(\theta) = \log p(Y^{data} \mid \theta) + \log p(\theta).$$
- Compute the posterior mode:
$$\theta^* = \arg\max_{\theta} G(\theta).$$
- Compute the positive definite matrix $V$:
$$V \equiv \left[-\frac{\partial^2 G(\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}_{\theta = \theta^*}$$
- Later, we will see that $V$ is a rough estimate of the variance-covariance matrix of $\theta$ under the posterior distribution.
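A minimal sketch of this setup step, assuming user-supplied functions log_likelihood(theta) and log_prior(theta) (for example, the Kalman-filter likelihood sketched earlier plus a chosen prior). The optimizer and the finite-difference Hessian are my own illustrative choices, not a prescribed procedure.

```python
# Minimal sketch: maximize G(theta) numerically and approximate V as the
# inverse of the negative Hessian of G at the mode, via central differences.
import numpy as np
from scipy.optimize import minimize

def posterior_mode_and_V(log_likelihood, log_prior, theta0, h=1e-4):
    G = lambda th: log_likelihood(th) + log_prior(th)
    res = minimize(lambda th: -G(th), theta0, method="BFGS")
    theta_star = res.x
    n = len(theta_star)
    Hess = np.zeros((n, n))
    for i in range(n):                       # finite-difference Hessian of G
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            Hess[i, j] = (G(theta_star + ei + ej) - G(theta_star + ei - ej)
                          - G(theta_star - ei + ej) + G(theta_star - ei - ej)) / (4 * h ** 2)
    V = np.linalg.inv(-Hess)                 # rough posterior variance estimate
    return theta_star, V
```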

MCMC Algorithm: Metropolis-Hastings

- $\theta^{(1)} = \theta^*$.
- To compute $\theta^{(r)}$, for $r > 1$:
  - Step 1: select an $N \times 1$ candidate $\theta^{(r)}$, $x$, by drawing $x$ from the jump distribution
$$\theta^{(r-1)} + k\,N(0, V), \qquad k \text{ a scalar}.$$
  - Step 2: compute the scalar $\lambda$:
$$\lambda = \frac{p(Y^{data} \mid x)\,p(x)}{p(Y^{data} \mid \theta^{(r-1)})\,p(\theta^{(r-1)})}.$$
  - Step 3: compute $\theta^{(r)}$:
$$\theta^{(r)} = \begin{cases} \theta^{(r-1)} & \text{if } u > \lambda \\ x & \text{if } u < \lambda \end{cases}, \qquad u \text{ a realization from uniform } [0, 1].$$
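A minimal Python sketch of these three steps (random-walk Metropolis-Hastings), assuming a user-supplied log_posterior(theta) = log p(Y^data|theta) + log p(theta) and the pair (theta_star, V) from the previous step. Working in logs avoids numerical overflow; accepting when log u < log lambda is equivalent to u < lambda.

```python
# Minimal sketch of the random-walk Metropolis-Hastings steps above.
import numpy as np

def metropolis_hastings(log_posterior, theta_star, V, k=0.3, M=10_000, seed=0):
    rng = np.random.default_rng(seed)
    chol_V = np.linalg.cholesky(V)
    draws = np.empty((M, len(theta_star)))
    draws[0] = theta_star                                 # theta^(1) = theta*
    logp_curr = log_posterior(theta_star)
    accepted = 0
    for r in range(1, M):
        # Step 1: candidate x drawn from theta^(r-1) + k N(0, V).
        x = draws[r - 1] + k * chol_V @ rng.standard_normal(len(theta_star))
        logp_prop = log_posterior(x)
        # Steps 2-3: accept with probability lambda (done in logs).
        if np.log(rng.uniform()) < logp_prop - logp_curr:
            draws[r], logp_curr = x, logp_prop            # accept candidate
            accepted += 1
        else:
            draws[r] = draws[r - 1]                       # keep previous draw
    print("acceptance rate:", accepted / (M - 1))         # aim for roughly 0.23
    return draws
```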

Practical Issues

- What is a sensible value for $k$? Set $k$ so that you accept (i.e., $\theta^{(r)} = x$) in step 3 of the MCMC algorithm roughly 23 percent of the time.
- What value of $M$ should you set?
  - You want convergence, in the sense that if $M$ is increased further, the econometric results do not change substantially.
  - In practice, $M = 10{,}000$ (a small value) up to $M = 1{,}000{,}000$.
  - Large $M$ is time-consuming. Could use the Laplace approximation (after checking its accuracy) in the initial phases of a research project; more on the Laplace approximation below.
- Burn-in: in practice, some initial $\theta^{(i)}$'s are discarded to minimize the impact of initial conditions on the results.
- Multiple chains: may promote efficiency; they increase independence among the $\theta^{(i)}$'s, and MCMC can be run using parallel computing (Dynare can do this).

MCMC Algorithm: Why Does it Work?

- The proposition that MCMC works may be surprising.
  - Whether or not it works does not depend on the details, i.e., on precisely how you choose the jump distribution (of course, you had better use $k > 0$ and $V$ positive definite).
  - Proof: see, e.g., Robert, C. P. (2001), The Bayesian Choice, Second Edition, New York: Springer-Verlag.
  - The details may matter by improving the efficiency of the MCMC algorithm, i.e., by influencing what value of $M$ you need.
- Some intuition: the sequence $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(M)}$ is relatively heavily populated by $\theta$'s that have high probability and relatively lightly populated by low-probability $\theta$'s.
- Additional intuition can be obtained by positing a simple scalar distribution and using MATLAB to verify that MCMC approximates it well (see, e.g., question 2 in assignment 9).

Why a Low Acceptance Rate is Desirable

MCMC Algorithm: Using the Results

- To approximate the marginal posterior distribution, $g(\theta_i \mid Y^{data})$, of $\theta_i$, compute and display the histogram of $\theta_i^{(1)}, \theta_i^{(2)}, \ldots, \theta_i^{(M)}$, for $i = 1, \ldots, N$.
- Other objects of interest: the mean and variance of the posterior distribution of $\theta$:
$$E\theta \approx \bar{\theta} \equiv \frac{1}{M}\sum_{j=1}^M \theta^{(j)}, \qquad \operatorname{Var}(\theta) \approx \frac{1}{M}\sum_{j=1}^M \left[\theta^{(j)} - \bar{\theta}\right]\left[\theta^{(j)} - \bar{\theta}\right]'.$$

MCMC Algorithm: Using the Results

- More complicated objects of interest: impulse response functions, model second moments, forecasts, Kalman-smoothed estimates of the real rate, the natural rate, etc.
- All of these can be represented as non-linear functions of the model parameters, i.e., $f(\theta)$.
- Can approximate the distribution of $f(\theta)$ using $f(\theta^{(1)}), \ldots, f(\theta^{(M)})$:
$$Ef(\theta) \approx \bar{f} \equiv \frac{1}{M}\sum_{i=1}^M f\big(\theta^{(i)}\big), \qquad \operatorname{Var}\big(f(\theta)\big) \approx \frac{1}{M}\sum_{i=1}^M \left[f\big(\theta^{(i)}\big) - \bar{f}\right]\left[f\big(\theta^{(i)}\big) - \bar{f}\right]'.$$
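A minimal sketch of these calculations, assuming draws is the M x N array returned by the Metropolis-Hastings sketch above and f is any user-supplied function of theta (e.g., an impulse-response calculation). The burn-in fraction is an illustrative choice.

```python
# Minimal sketch: posterior means, covariances, and percentile bands for f(theta)
# computed directly from the MCMC draws.
import numpy as np

def posterior_summary(draws, f=None, burn_in=0.2):
    kept = draws[int(burn_in * len(draws)):]              # drop initial draws
    out = {"mean": kept.mean(axis=0),                     # approx E[theta | Y]
           "cov": np.cov(kept, rowvar=False)}             # approx Var(theta | Y)
    if f is not None:
        fvals = np.array([f(th) for th in kept])          # draws of f(theta^(j))
        out["f_mean"] = fvals.mean(axis=0)
        out["f_bands"] = np.percentile(fvals, [5, 95], axis=0)
    return out
```

Histograms of individual columns of the kept draws (e.g., with np.histogram) approximate the marginal posteriors $g(\theta_i \mid Y^{data})$ described on the previous slide.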

MCMC: Remaining Issues

- In addition to the first and second moments already discussed, we would also like to have the marginal likelihood of the data.
- The marginal likelihood is a Bayesian measure of model fit.

MCMC Algorithm: the Marginal Likelihood

- Consider the following sample average:
$$\frac{1}{M}\sum_{j=1}^M \frac{h\big(\theta^{(j)}\big)}{p\big(Y^{data} \mid \theta^{(j)}\big)\,p\big(\theta^{(j)}\big)},$$
where $h(\theta)$ is an arbitrary density function over the $N$-dimensional variable $\theta$.
- By the law of large numbers,
$$\frac{1}{M}\sum_{j=1}^M \frac{h\big(\theta^{(j)}\big)}{p\big(Y^{data} \mid \theta^{(j)}\big)\,p\big(\theta^{(j)}\big)} \;\xrightarrow[M \to \infty]{}\; E\left(\frac{h(\theta)}{p(Y^{data} \mid \theta)\,p(\theta)}\right).$$

MCMC Algorithm: the Marginal Likelihood

$$\frac{1}{M}\sum_{j=1}^M \frac{h\big(\theta^{(j)}\big)}{p\big(Y^{data} \mid \theta^{(j)}\big)\,p\big(\theta^{(j)}\big)} \;\xrightarrow[M \to \infty]{}\; E\left(\frac{h(\theta)}{p(Y^{data} \mid \theta)\,p(\theta)}\right)
= \int_{\theta} \frac{h(\theta)}{p(Y^{data} \mid \theta)\,p(\theta)}\,\frac{p(Y^{data} \mid \theta)\,p(\theta)}{p(Y^{data})}\,d\theta = \frac{1}{p(Y^{data})}.$$
- When $h(\theta) = p(\theta)$: the harmonic mean estimator of the marginal likelihood.
- Ideally, we want an $h$ such that the variance of $h\big(\theta^{(j)}\big)\big/\big[p\big(Y^{data} \mid \theta^{(j)}\big)\,p\big(\theta^{(j)}\big)\big]$ is small (recall the earlier discussion of Monte Carlo integration). More on this below.

Laplace Approximation to the Posterior Distribution

- In practice, the MCMC algorithm is very time-intensive.
- The Laplace approximation is easy to compute, and in many cases it provides a quick and dirty approximation that is quite good.
- Let $\theta \in R^N$ denote the $N$-dimensional vector of parameters and, as before,
$$G(\theta) \equiv \log p(Y^{data} \mid \theta)\,p(\theta),$$
where $p(Y^{data} \mid \theta)$ is the likelihood of the data, $p(\theta)$ is the prior on the parameters, and $\theta^*$ is the maximizer of $G(\theta)$ (i.e., the mode).

Laplace Approximation

- Second-order Taylor series expansion of $G(\theta) \equiv \log\left[p(Y^{data} \mid \theta)\,p(\theta)\right]$ about $\theta = \theta^*$:
$$G(\theta) \approx G(\theta^*) + G_{\theta}(\theta^*)(\theta - \theta^*) - \frac{1}{2}(\theta - \theta^*)'\,G_{\theta\theta}(\theta^*)\,(\theta - \theta^*),$$
where
$$G_{\theta\theta}(\theta^*) = -\left.\frac{\partial^2 \log p(Y^{data} \mid \theta)\,p(\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta = \theta^*}.$$
- Interior optimality of $\theta^*$ implies:
$$G_{\theta}(\theta^*) = 0, \qquad G_{\theta\theta}(\theta^*) \text{ positive definite}.$$
- Then:
$$p(Y^{data} \mid \theta)\,p(\theta) \approx p(Y^{data} \mid \theta^*)\,p(\theta^*)\exp\left\{-\frac{1}{2}(\theta - \theta^*)'\,G_{\theta\theta}(\theta^*)\,(\theta - \theta^*)\right\}.$$

Laplace Approximation to the Posterior Distribution

- Property of the Normal distribution:
$$\int \frac{1}{(2\pi)^{N/2}}\left|G_{\theta\theta}(\theta^*)\right|^{\frac{1}{2}}\exp\left\{-\frac{1}{2}(\theta - \theta^*)'\,G_{\theta\theta}(\theta^*)\,(\theta - \theta^*)\right\}d\theta = 1.$$
- Then,
$$\int p(Y^{data} \mid \theta)\,p(\theta)\,d\theta \approx p(Y^{data} \mid \theta^*)\,p(\theta^*)\int \exp\left\{-\frac{1}{2}(\theta - \theta^*)'\,G_{\theta\theta}(\theta^*)\,(\theta - \theta^*)\right\}d\theta
= p(Y^{data} \mid \theta^*)\,p(\theta^*)\,\frac{(2\pi)^{N/2}}{\left|G_{\theta\theta}(\theta^*)\right|^{\frac{1}{2}}}.$$

Laplace Approximation

- Conclude:
$$p(Y^{data}) \approx p(Y^{data} \mid \theta^*)\,p(\theta^*)\,\frac{(2\pi)^{N/2}}{\left|G_{\theta\theta}(\theta^*)\right|^{\frac{1}{2}}}.$$
- Laplace approximation to the posterior distribution:
$$\frac{p(Y^{data} \mid \theta)\,p(\theta)}{p(Y^{data})} \approx \frac{1}{(2\pi)^{N/2}}\left|G_{\theta\theta}(\theta^*)\right|^{\frac{1}{2}}\exp\left\{-\frac{1}{2}(\theta - \theta^*)'\,G_{\theta\theta}(\theta^*)\,(\theta - \theta^*)\right\}.$$
- So the posterior of $\theta_i$ (i.e., $g(\theta_i \mid Y^{data})$) is approximately
$$\theta_i \sim N\left(\theta_i^*, \left[G_{\theta\theta}(\theta^*)^{-1}\right]_{ii}\right).$$
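In logs, the concluding formula is easy to evaluate once $\theta^*$ and $G_{\theta\theta}(\theta^*)$ are in hand (e.g., from the mode-finding sketch earlier, where $G_{\theta\theta}(\theta^*) = V^{-1}$). A minimal sketch, with the function name my own:

```python
# Minimal sketch of the Laplace formula above:
# log p(Y) ~ G(theta*) + (N/2) log(2 pi) - (1/2) log |G_thth(theta*)|.
import numpy as np

def laplace_log_marginal_likelihood(log_posterior_kernel, theta_star, G_thth):
    """log_posterior_kernel(theta) = log[p(Y^data|theta) p(theta)] = G(theta)."""
    N = len(theta_star)
    sign, logdet = np.linalg.slogdet(G_thth)
    assert sign > 0, "G_thth must be positive definite"
    return log_posterior_kernel(theta_star) + 0.5 * N * np.log(2 * np.pi) - 0.5 * logdet
```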

Modified Harmonic Mean Estimator of the Marginal Likelihood

- Harmonic mean estimator of the marginal likelihood, $p(Y^{data})$:
$$\left[\frac{1}{M}\sum_{j=1}^M \frac{h\big(\theta^{(j)}\big)}{p\big(Y^{data} \mid \theta^{(j)}\big)\,p\big(\theta^{(j)}\big)}\right]^{-1},$$
with $h(\theta)$ set to $p(\theta)$. In this case, the marginal likelihood is the harmonic mean of the likelihood, evaluated at the values of $\theta$ generated by the MCMC algorithm.
- Problem: the variance of the object being averaged is likely to be high, requiring a high $M$ for accuracy.
- When $h(\theta)$ is instead equated to the Laplace approximation of the posterior distribution, $h(\theta)$ is approximately proportional to $p\big(Y^{data} \mid \theta^{(j)}\big)\,p\big(\theta^{(j)}\big)$, so that the variance of the variable being averaged in the last expression is low.
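A minimal sketch of both versions, in logs for numerical stability. The function and variable names are my own; draws are the MCMC draws, log_kernel(theta) = log[p(Y^data|theta) p(theta)], and log_h is either the log prior (plain harmonic mean) or the log of the Laplace normal density (the modified version discussed above).

```python
# Minimal sketch of the (modified) harmonic mean estimator of log p(Y^data).
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def harmonic_mean_log_ml(draws, log_kernel, log_h):
    # 1/p(Y) ~ (1/M) sum_j h(theta_j) / [p(Y|theta_j) p(theta_j)], done in logs.
    terms = np.array([log_h(th) - log_kernel(th) for th in draws])
    return -(logsumexp(terms) - np.log(len(draws)))

# Example weighting densities (theta_star, G_thth, log_prior from earlier sketches):
# log_h_plain   = log_prior
# log_h_laplace = lambda th: multivariate_normal.logpdf(
#     th, theta_star, np.linalg.inv(G_thth))
```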

The Marginal Likelihood and Model Comparison

- Suppose we have two models, Model 1 and Model 2.
  - Compute $p(Y^{data} \mid \text{Model 1})$ and $p(Y^{data} \mid \text{Model 2})$.
  - Suppose $p(Y^{data} \mid \text{Model 1}) > p(Y^{data} \mid \text{Model 2})$. Then the posterior odds on Model 1 are higher than on Model 2: "Model 1 fits better than Model 2."
- Can use this to compare two different models, or to evaluate the contribution to fit of various model features: habit persistence, adjustment costs, etc.
- For an application of this and the other methods in these notes, see Smets and Wouters, AER 2007.