Bayesian Estimation of DSGE Models
Stéphane Adjemian
Université du Maine, GAINS & CEPREMAP
stephane.adjemian@univ-lemans.fr
http://www.dynare.org/stepan
June 28, 2011
Bayesian paradigm (motivations)
Bayesian estimation of DSGE models with Dynare.
1. Data are not informative enough...
2. DSGE models are misspecified.
3. Model comparison.
Prior elicitation.
Efficiency issues.
Bayesian paradigm (basics)
A model defines a joint probability distribution, parameterized by θ, over a sample of variables (the likelihood):
p(Y_T | θ)   (1)
We assume that our prior information about the parameters can be summarized by a joint probability density function. Let the prior density be p_0(θ). The posterior distribution is given by Bayes' theorem:
p_1(θ | Y_T) = p_0(θ) p(Y_T | θ) / p(Y_T)   (2)
Bayesian paradigm (basics, cont'd)
The denominator is the marginal density of the sample, defined by
p(Y_T) = ∫_Θ p_0(θ) p(Y_T | θ) dθ   (3)
a weighted mean of the sample's conditional densities over all possible values of the parameters.
The posterior density is proportional to the product of the prior density and the density of the sample:
p_1(θ | Y_T) ∝ p_0(θ) p(Y_T | θ)
That's all we need for any inference about θ!
The prior density deforms the shape of the likelihood!
A simple example (I)
Data Generating Process: y_t = µ + ε_t, where ε_t ~ iid N(0,1) is a gaussian white noise.
Let Y_T ≡ (y_1, ..., y_T). The likelihood is given by:
p(Y_T | µ) = (2π)^(-T/2) exp{ -½ Σ_{t=1}^T (y_t - µ)² }
And the ML estimator of µ is:
µ̂_ML,T = (1/T) Σ_{t=1}^T y_t ≡ ȳ
A simple example (II)
Note that the variance of this estimator is a simple function of the sample size:
V[µ̂_ML,T] = 1/T
Noting that:
Σ_{t=1}^T (y_t - µ)² = νs² + T(µ̂ - µ)²
with ν = T - 1 and s² = (T - 1)^(-1) Σ_{t=1}^T (y_t - µ̂)², the likelihood can be equivalently written as:
p(Y_T | µ) = (2π)^(-T/2) exp{ -½ (νs² + T(µ̂ - µ)²) }
The two statistics s² and µ̂ sum up the sample information.
A simple example (II, bis)
Σ_{t=1}^T (y_t - µ)² = Σ_{t=1}^T ([y_t - µ̂] + [µ̂ - µ])²
= Σ_{t=1}^T (y_t - µ̂)² + Σ_{t=1}^T (µ̂ - µ)² + 2(µ̂ - µ) (Σ_{t=1}^T y_t - Tµ̂)
= νs² + T(µ̂ - µ)²
The last term cancels out by definition of the sample mean.
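A quick numerical check of this decomposition (a minimal sketch; the sample below is simulated and the evaluation point µ = 0.3 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
y = 1.0 + rng.standard_normal(50)     # simulated sample, true mean 1.0
mu = 0.3                              # arbitrary evaluation point

mu_hat = y.mean()
nu, s2 = y.size - 1, y.var(ddof=1)    # s² uses the (T - 1) divisor, as above

lhs = ((y - mu) ** 2).sum()
rhs = nu * s2 + y.size * (mu_hat - mu) ** 2
print(np.isclose(lhs, rhs))           # True: the cross term vanishes
```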
A simple example (III)
Let our prior be a gaussian distribution with expectation µ_0 and variance σ_µ². The posterior density is defined, up to a constant, by:
p(µ | Y_T) ∝ (2πσ_µ²)^(-1/2) exp{ -(µ - µ_0)² / (2σ_µ²) } × (2π)^(-T/2) exp{ -½ (νs² + T(µ̂ - µ)²) }
where the missing constant (the denominator) is the marginal density, which does not depend on µ. We also have:
p(µ | Y_T) ∝ exp{ -½ ( T(µ̂ - µ)² + (1/σ_µ²)(µ - µ_0)² ) }
A simple example (IV)
A(µ) = T(µ̂ - µ)² + (1/σ_µ²)(µ - µ_0)²
= T(µ² + µ̂² - 2µµ̂) + (1/σ_µ²)(µ² + µ_0² - 2µµ_0)
= (T + 1/σ_µ²) µ² - 2µ (Tµ̂ + µ_0/σ_µ²) + (Tµ̂² + µ_0²/σ_µ²)
= (T + 1/σ_µ²) [ µ - (Tµ̂ + µ_0/σ_µ²) / (T + 1/σ_µ²) ]² + (Tµ̂² + µ_0²/σ_µ²) - (Tµ̂ + µ_0/σ_µ²)² / (T + 1/σ_µ²)
The last two terms do not depend on µ and are absorbed into the normalizing constant.
A simple example (V)
Finally we have:
p(µ | Y_T) ∝ exp{ -½ (T + 1/σ_µ²) [ µ - (Tµ̂ + µ_0/σ_µ²) / (T + 1/σ_µ²) ]² }
Up to a constant, this is a gaussian density with (posterior) expectation:
E[µ] = (Tµ̂ + µ_0/σ_µ²) / (T + 1/σ_µ²)
and (posterior) variance:
V[µ] = 1 / (T + 1/σ_µ²)
A simple example (VI, The bridge)
The posterior mean is a convex combination of the prior mean and the ML estimate.
If σ_µ² → ∞ (no prior information), then E[µ] → µ̂ (the ML estimate).
If σ_µ² → 0 (calibration), then E[µ] → µ_0.
If σ_µ² < ∞, then the variance of the ML estimator is greater than the posterior variance.
Not so simple if the model is non-linear in the estimated parameters...
Asymptotic (gaussian) approximation.
Simulation based approach (MCMC, Metropolis-Hastings, ...).
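As a quick illustration of the bridge, here is a minimal Python sketch (not part of the original slides) computing the posterior mean and variance for the gaussian-mean example; the sample is simulated and the prior values µ_0 = 0, σ_µ² = 0.5 are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sample from the DGP y_t = mu + e_t, e_t ~ N(0,1); true mu is illustrative
T, mu_true = 50, 1.0
y = mu_true + rng.standard_normal(T)

# Prior mu ~ N(mu0, sig2_mu); these hyperparameter values are assumptions
mu0, sig2_mu = 0.0, 0.5

mu_ml = y.mean()                       # ML estimator (sample mean)
post_var = 1.0 / (T + 1.0 / sig2_mu)   # V[mu | Y_T] = 1 / (T + 1/sig2_mu)
post_mean = post_var * (T * mu_ml + mu0 / sig2_mu)   # convex combination

print(f"ML: {mu_ml:.4f}  posterior mean: {post_mean:.4f}  posterior var: {post_var:.5f}")
```

Shrinking σ_µ² pulls the posterior mean toward µ_0; inflating it recovers the ML estimate, as stated above.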
Bayesian paradigm (II, Model Comparison)
Comparison of marginal densities of the (same) data across models: p(Y_T | I) measures the fit of model I.
Suppose we have a prior distribution over models A, B, ...: p(A), p(B), ...
Again, using Bayes' theorem we can compute the posterior distribution over models:
p(I | Y_T) = p(I) p(Y_T | I) / Σ_{I'} p(I') p(Y_T | I')
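A hedged sketch of this computation in Python, working in logs to avoid underflow (the log marginal densities below are made-up numbers, not estimates from any actual model):

```python
import numpy as np

def posterior_model_probs(log_marginal_densities, prior_probs):
    """Posterior probabilities over models from log marginal data densities."""
    logp = np.log(prior_probs) + np.asarray(log_marginal_densities)
    logp -= logp.max()      # guard against overflow before exponentiating
    p = np.exp(logp)
    return p / p.sum()

# Hypothetical log marginal densities for two models A and B, with equal priors
print(posterior_model_probs([-1339.0, -1341.3], [0.5, 0.5]))
```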
Estimation of DSGE models (I, Reduced form)
Compute the steady state of the model (a system of non-linear recurrence equations).
Compute a linear approximation of the model.
Solve the linearized model:
y_t - ȳ(θ) = T(θ) (y_{t-1} - ȳ(θ)) + R(θ) ε_t   (4)
where y_t - ȳ(θ) is n×1, T(θ) is n×n, R(θ) is n×q and ε_t is q×1; n is the number of endogenous variables, q is the number of structural innovations.
The reduced-form model is non-linear w.r.t. the deep parameters.
We do not observe all the endogenous variables.
Estimation of DSGE models (II, SSM)
Let y*_t be a subset of y_t gathering p observed variables. To bring the model to the data, we use a state-space representation:
y*_t = Z(ŷ_t + ȳ(θ)) + η_t   (5a, measurement equation)
ŷ_t = T(θ) ŷ_{t-1} + R(θ) ε_t   (5b, state equation)
where ŷ_t = y_t - ȳ(θ).
Equation (5b) is the reduced form of the DSGE model.
Equation (5a) selects a subset of the endogenous variables; Z is a p×n matrix filled with zeros and ones.
Estimation of DSGE models (III, Likelihood) a
Let Y*_T = {y*_1, y*_2, ..., y*_T} be the sample.
Let ψ be the vector of parameters to be estimated (θ and the covariance matrices of ε and η).
The likelihood, that is, the density of Y*_T conditionally on the parameters, is given by:
L(ψ; Y*_T) = p(Y*_T | ψ) = p(y*_0 | ψ) Π_{t=1}^T p(y*_t | Y*_{t-1}, ψ)   (6)
To evaluate the likelihood we need to specify the marginal density p(y*_0 | ψ) (or p(y_0 | ψ)) and the conditional density p(y*_t | Y*_{t-1}, ψ).
Estimation of DSGE models (III, Likelihood) b
The state-space model (5) describes the evolution of the distribution of the endogenous variables.
The distribution of the initial condition (y_0) is set equal to the ergodic distribution of the stochastic difference equation (so that the distribution of y_t is time invariant).
Because we consider a linear(ized) reduced-form model and the disturbances are assumed to be gaussian (say ε ~ N(0,Σ)), the initial (ergodic) distribution is also gaussian:
y_0 ~ N(E[y_t], V[y_t])
Unit roots ⇒ diffuse Kalman filter.
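Under stationarity the ergodic moments solve E[y_t] = ȳ(θ) and V = T(θ) V T(θ)' + R(θ)ΣR(θ)', a discrete Lyapunov equation. A minimal Python sketch, with made-up matrices T, R, Σ standing in for an actual model's reduced form:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative T(theta), R(theta), Sigma for a 2-variable, 1-shock model (assumed values)
T = np.array([[0.9, 0.1],
              [0.0, 0.5]])
R = np.array([[1.0],
              [0.3]])
Sigma = np.array([[0.01]])

# Ergodic covariance V solves V = T V T' + R Sigma R' (stationarity requires
# the eigenvalues of T to lie inside the unit circle)
V = solve_discrete_lyapunov(T, R @ Sigma @ R.T)
print(V)
```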
Estimation (III, Likelihood) c
Evaluation of the density of y*_t | Y*_{t-1} is not trivial, because y*_t also depends on unobserved endogenous variables. The following identity can be used:
p(y*_t | Y*_{t-1}, ψ) = ∫_Λ p(y*_t | y_t, ψ) p(y_t | Y*_{t-1}, ψ) dy_t   (7)
The density of y*_t | Y*_{t-1} is the mean of the density of y*_t | y_t weighted by the density of y_t | Y*_{t-1}.
The first conditional density is given by the measurement equation (5a).
A Kalman filter is used to evaluate the density of the latent variables (y_t) conditional on the sample up to time t-1 (Y*_{t-1}) [⇒ predictive density].
Estimation (III, Likelihood) d
The Kalman filter can be seen as a bayesian recursive estimation routine:
p(y_t | Y*_{t-1}, ψ) = ∫_Λ p(y_t | y_{t-1}, ψ) p(y_{t-1} | Y*_{t-1}, ψ) dy_{t-1}   (8a)
p(y_t | Y*_t, ψ) = p(y*_t | y_t, ψ) p(y_t | Y*_{t-1}, ψ) / ∫_Λ p(y*_t | y_t, ψ) p(y_t | Y*_{t-1}, ψ) dy_t   (8b)
Equation (8a) says that the predictive density of the latent variables is the mean of the density of y_t | y_{t-1}, given by the state equation (5b), weighted by the density of y_{t-1} conditional on Y*_{t-1} (given by (8b)).
The update equation (8b) is an application of Bayes' theorem: how to update our knowledge about the latent variables when new information (data) becomes available.
Estimation (III, Likelihood) e
p(y_t | Y*_t, ψ) = p(y*_t | y_t, ψ) p(y_t | Y*_{t-1}, ψ) / ∫_Λ p(y*_t | y_t, ψ) p(y_t | Y*_{t-1}, ψ) dy_t
p(y_t | Y*_{t-1}, ψ) is the a priori density of the latent variables at time t.
p(y*_t | y_t, ψ) is the density of the observation at time t knowing the state and the parameters (this density is obtained from the measurement equation (5a)) ⇒ the likelihood associated with y*_t.
∫_Λ p(y*_t | y_t, ψ) p(y_t | Y*_{t-1}, ψ) dy_t is the marginal density of the new information.
Estimation (III, Likelihood) f
The linear gaussian Kalman filter recursion is given by:
v_t = y*_t - Z(ŷ_t + ȳ(θ))
F_t = Z P_t Z' + V[η]
K_t = T(θ) P_t Z' F_t^{-1}
ŷ_{t+1} = T(θ) ŷ_t + K_t v_t
P_{t+1} = T(θ) P_t (T(θ) - K_t Z)' + R(θ) Σ R(θ)'
for t = 1, ..., T, with ŷ_0 and P_0 given.
Finally, the log-likelihood is:
ln L(ψ | Y*_T) = -(Tp/2) ln(2π) - ½ Σ_{t=1}^T ln|F_t| - ½ Σ_{t=1}^T v_t' F_t^{-1} v_t
where p is the number of observed variables.
References: Harvey, Hamilton.
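The recursion above translates almost line for line into code. A minimal Python sketch, assuming all system matrices (T, R, Z, Σ, V[η], ȳ) and the initial conditions are supplied by the caller; this is an illustration, not Dynare's implementation:

```python
import numpy as np

def kalman_loglik(ystar, T, R, Z, Sigma, V_eta, ybar, yhat0, P0):
    """Log-likelihood of the state-space model (5a)-(5b) via the Kalman filter.

    ystar: (nobs, p) array of observations; T, R, Z, Sigma, V_eta, ybar, yhat0,
    P0 are the system matrices and initial conditions of the slide's notation.
    """
    nobs, p = ystar.shape
    yhat, P = yhat0.copy(), P0.copy()
    loglik = -0.5 * nobs * p * np.log(2.0 * np.pi)
    for t in range(nobs):
        v = ystar[t] - Z @ (yhat + ybar)              # prediction error
        F = Z @ P @ Z.T + V_eta                       # its covariance
        Finv = np.linalg.inv(F)
        K = T @ P @ Z.T @ Finv                        # Kalman gain
        loglik -= 0.5 * (np.linalg.slogdet(F)[1] + v @ Finv @ v)
        yhat = T @ yhat + K @ v                       # state update
        P = T @ P @ (T - K @ Z).T + R @ Sigma @ R.T   # covariance update
    return loglik
```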
Simulations for exact posterior analysis
Noting that:
E[φ(ψ)] = ∫_Ψ φ(ψ) p_1(ψ | Y*_T) dψ
we can use the empirical mean of (φ(ψ^(1)), φ(ψ^(2)), ..., φ(ψ^(n))), where the ψ^(i) are draws from the posterior distribution, to evaluate the expectation of φ(ψ). The approximation error goes to zero as n → ∞.
We need to simulate draws from the posterior distribution ⇒ Metropolis-Hastings.
We build a stochastic recurrence whose limiting distribution is the posterior distribution.
Simulations (Metropolis-Hastings) a
1. Choose a starting point ψ^(0) and run a loop over 2-3-4.
2. Draw a proposal ψ* from a jumping distribution J(ψ* | ψ^(t-1)) = N(ψ^(t-1), c·Ω_m).
3. Compute the acceptance ratio:
r = p_1(ψ* | Y*_T) / p_1(ψ^(t-1) | Y*_T) = K(ψ* | Y*_T) / K(ψ^(t-1) | Y*_T)
4. Finally, set ψ^(t) = ψ* with probability min(r, 1), and ψ^(t) = ψ^(t-1) otherwise.
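A minimal Python sketch of steps 1-4, written for a generic log posterior kernel and a proposal covariance c·Ω as in step 2; the toy kernel in the usage lines (a standard bivariate normal) is purely illustrative:

```python
import numpy as np

def metropolis_hastings(log_kernel, psi0, c, Omega, n_draws, rng):
    """Random-walk Metropolis-Hastings targeting the posterior kernel K(psi | Y_T)."""
    chol = np.linalg.cholesky(c * Omega)    # scale factor c times proposal covariance
    draws = np.empty((n_draws, psi0.size))
    psi, logk = psi0, log_kernel(psi0)
    accepted = 0
    for i in range(n_draws):
        prop = psi + chol @ rng.standard_normal(psi.size)   # jumping distribution
        logk_prop = log_kernel(prop)
        if np.log(rng.uniform()) < logk_prop - logk:        # accept w.p. min(r, 1)
            psi, logk = prop, logk_prop
            accepted += 1
        draws[i] = psi
    return draws, accepted / n_draws

# Usage on a toy log kernel (standard bivariate normal target, hypothetical):
rng = np.random.default_rng(0)
draws, rate = metropolis_hastings(lambda x: -0.5 * x @ x, np.zeros(2),
                                  0.5, np.eye(2), 5000, rng)
print(rate, draws.mean(axis=0))
```

Posterior expectations E[φ(ψ)] are then approximated by averaging φ over the stored draws, as on the previous slide.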
Simulations (Metropolis-Hastings) b
[Figure: the posterior kernel, showing the kernel value K(θ°) at the initial draw θ°.]
Simulations (Metropolis-Hastings) c
[Figure: the proposal θ* is accepted: θ¹ = θ* and K(θ¹) = K(θ*).]
Simulations (Metropolis-Hastings) d
[Figure: a new proposal θ* with a lower kernel value K(θ*) than K(θ¹); it is accepted only with probability r.]
Simulations (Metropolis-Hastings) e
How should we choose the scale factor c (the variance of the jumping distribution)? The acceptance rate should be strictly positive, but not too high.
How many draws? Convergence has to be assessed...
Parallel Markov chains ⇒ pooled moments have to be close to within-chain moments (see the sketch below).
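A sketch of the pooled-versus-within comparison for a single parameter across parallel chains (a simplified stand-in, in the spirit of the Brooks and Gelman style diagnostics that Dynare reports, not a reproduction of them):

```python
import numpy as np

def pooled_vs_within(chains):
    """Compare pooled and within-chain second moments across parallel MH chains.

    chains: array of shape (n_chains, n_draws) for one parameter; a large gap
    between the two numbers suggests the chains have not converged yet.
    """
    within = chains.var(axis=1, ddof=1).mean()   # average within-chain variance
    pooled = chains.reshape(-1).var(ddof=1)      # variance of all draws pooled
    return pooled, within
```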
Dynare syntax (I)
var A B C;
varexo E;
parameters a b c d e f;

model(linear);
A=A(+1)-b/e*(B-C(+1)+A(+1)-A);
C=f*A+(1-d)*C(-1);
...
end;
Dynare syntax (II)
estimated_params;
stderr e, uniform_pdf, , , 0, 1;
...
a, normal_pdf, 1.5, 0.25;
b, gamma_pdf, 1, 2;
c, gamma_pdf, 2, 2, 1;
...
end;

varobs pie r y rw;

estimation(datafile=dataraba, first_obs=10, ..., mh_jscale=0.5);
Prior Elicitation
The results may depend heavily on our choice of the prior density or of the parametrization of the model (though not asymptotically). How do we choose the prior?
Subjective choice (data driven or theoretical); example: the Calvo parameter in the Phillips curve.
Objective choice; example: the (optimized) Minnesota prior for VARs (Phillips, 1996).
Robustness of the results must be evaluated:
Try different parametrizations.
Use more general prior densities.
Uninformative priors.
Prior Elicitation (parametrization of the model) a
Estimation of the Phillips curve:
π_t = β E_t[π_{t+1}] + ((1-ξ_p)(1-βξ_p)/ξ_p) ((σ_c + σ_l) y_t + τ_t)
ξ_p is the (Calvo) probability (for an intermediate firm) of being able to optimally choose its price at time t. With probability 1-ξ_p the price is indexed on past inflation and/or steady-state inflation.
Let α_p ≡ 1/(1-ξ_p) be the expected period length during which a firm will not optimally adjust its price.
Let λ = (1-ξ_p)(1-βξ_p)/ξ_p be the slope of the Phillips curve.
Suppose that β, σ_c and σ_l are known.
Prior Elicitation (parametrization of the model) b
The prior may be defined on ξ_p, on α_p, or on the slope λ.
Say we choose a uniform prior for the Calvo probability: ξ_p ~ U[.51, .99]. The prior mean is .75 (so that the implied value for α_p is 4 quarters). This prior is often thought of as a non-informative prior...
An alternative would be to choose a uniform prior for α_p:
α_p ~ U[1/(1-.51), 1/(1-.99)]
These two priors are very different!
Prior Elicitation (parametrization of the model) c
[Figure: the two implied prior densities compared.]
The prior on α_p is much more informative than the prior on ξ_p.
Prior Elicitation (parametrization of the model) d
[Figure: implied prior density of ξ_p = 1 - 1/α_p when the prior density of α_p is uniform; the mass piles up near ξ_p = 1.]
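The figure can be reproduced by simulation. A short Python sketch drawing α_p from the uniform prior above and mapping the draws through ξ_p = 1 - 1/α_p:

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform prior on the expected duration alpha_p, over the range implied by
# xi_p in [.51, .99]: alpha_p in [1/(1-.51), 1/(1-.99)], roughly [2.04, 100]
alpha_p = rng.uniform(1 / (1 - 0.51), 1 / (1 - 0.99), size=100_000)
xi_p = 1.0 - 1.0 / alpha_p            # implied Calvo probability

# The mass piles up near xi_p = 1: the uniform prior on alpha_p is very
# informative about xi_p (compare with the flat prior xi_p ~ U[.51, .99])
print(np.quantile(xi_p, [0.1, 0.5, 0.9]))
```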
Prior Elicitation (more general prior densities)
Robustness of the results may be evaluated by considering a more general prior density.
For instance, in our simple example we could assume a Student-t prior density for µ instead of a gaussian density.
Prior Elicitation (flat prior)
If a parameter, say µ, can take values between -∞ and ∞, the flat prior is a uniform density between -∞ and ∞.
If a parameter, say σ, can take values between 0 and ∞, the flat prior is a uniform density between -∞ and ∞ for log σ:
p_0(log σ) ∝ 1 ⇔ p_0(σ) ∝ 1/σ
(by the change of variables, p_0(σ) = p_0(log σ) |d log σ/dσ| ∝ 1/σ) ⇒ invariance.
Why is this prior non-informative?
∫ p_0(µ) dµ is not defined! ⇒ Improper prior.
Practical implications for DSGE estimation.
Prior Elicitation (non-informative prior)
An alternative, proposed by Jeffreys, is to use the Fisher information matrix:
p_0(ψ) ∝ |I(ψ)|^(1/2)
with
I(ψ) = E[ (∂ ln p(Y_T | ψ)/∂ψ) (∂ ln p(Y_T | ψ)/∂ψ)' ]
The idea is to mimic the information in the data... Automatic choice of the prior.
Invariance to any continuous transformation of the parameters. In our simple example, I(µ) = T does not depend on µ, so the Jeffreys prior for µ is flat.
Very different results (compared to the flat prior) ⇒ unit root controversy.
Effective prior mass
Dynare excludes parameter values such that the steady state does not exist, or such that the Blanchard-Kahn (BK) conditions are not satisfied. ⇒ The effective prior mass can be less than 1.
Comparison of marginal densities of the data is not informative if the prior mass is not invariant across models.
The estimation of the posterior mode is more difficult if the effective prior mass is less than 1.
Efficiency (I)
If possible, use the option use_dll (needs a compiler, e.g. gcc).
Do not let Dynare compute the steady state! Even if there is no closed-form solution for the steady state, the static model can be concentrated and reduced to a small non-linear system of equations, which can be solved with a standard Newton algorithm in the steadystate file. This numerical part can be done in a mex routine called by the steadystate file.
Alternative initialization of the Kalman filter (lik_init=4).
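To fix ideas, here is a hedged sketch of the "concentrate and solve" idea in Python (in practice this logic would sit in a MATLAB steadystate file or a mex routine); the model, a textbook RBC Euler equation in the capital stock, and the parameter values are assumptions for illustration only:

```python
import numpy as np
from scipy.optimize import fsolve

# Illustrative deep parameters (assumed values)
alpha, beta, delta = 0.33, 0.99, 0.025

def concentrated_static(k):
    # Suppose the static model concentrates to one equation in k, the
    # steady-state Euler equation: 1 = beta * (alpha * k^(alpha-1) + 1 - delta)
    return 1.0 - beta * (alpha * k ** (alpha - 1.0) + 1.0 - delta)

k_ss = fsolve(concentrated_static, x0=10.0)[0]
print(k_ss)   # the remaining steady-state values then follow in closed form
```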
Efficiency (II)
                     Fixed point of the   Fixed point of the
                     state equation       Riccati equation
Execution time       7.28                 3.31
Likelihood           -1339.034            -1338.915
Marginal density     1296.336             1296.203
α                    0.3526               0.3527
β                    0.9937               0.9937
γ_a                  0.0039               0.0039
γ_m                  1.0118               1.0118
ρ                    0.6769               0.6738
ψ                    0.6508               0.6508
δ                    0.0087               0.0087
σ_ε_a                0.0146               0.0146
σ_ε_m                0.0043               0.0043

Table 1: Estimation of fs2000. In percentage terms, the maximum deviation between the two columns is less than 0.45.