
1 Dynamic Macro: Bayesian Estimation. Petr Sedláček, Bonn University, Summer. 1 / 114

2 Overall plan Motivation Week 1: Use of computational tools, simple DSGE model Tools necessary to solve models and a solution method Week 2: function approximation and numerical integration Week 3: theory of perturbation (1st and higher-order) Tools necessary for, and principles of, estimation Week 4: Kalman filter and Maximum Likelihood estimation Week 5: principles of Bayesian estimation 2 / 114

3 Plan for today Bayesian estimation: the basic ideas extra information over ML: priors main challenge: evaluating the posterior Markov Chain Monte Carlo (MCMC) practical issues: acceptance rate, diagnostics implementation in Dynare 3 / 114

4 Frequentist vs. Bayesian views Bayes rule Bayesian estimation: basic concepts 4 / 114

5 Frequentist vs. Bayesian views Bayes rule Frequentist vs. Bayesian views Frequentist view: parameters are fixed, but unknown; the likelihood is a sampling distribution for the data; the realization of the observables Y^T is just one of many possible realizations from L(Y^T | Ψ); inferences about Ψ are based on probabilities of particular Y^T for given Ψ 5 / 114

6 Frequentist vs. Bayesian views Bayes rule Frequentist vs. Bayesian views Bayesian view: observations, not parameters, are taken as given; the parameters Ψ are viewed as random; inference about Ψ is based on probabilities of Ψ conditional on the data Y^T, i.e. P(Ψ | Y^T); the probabilistic view of Ψ enables incorporation of prior beliefs. Sims (2007): Bayesian inference is a way of thinking, not a basket of methods 6 / 114

7 Frequentist vs. Bayesian views Bayes rule Bayes rule / 114

8 Frequentist vs. Bayesian views Bayes rule Bayes rule The joint density of the data and parameters is P(Y^T, Ψ) = L(Y^T | Ψ) P(Ψ), or equivalently P(Y^T, Ψ) = P(Ψ | Y^T) P(Y^T). From the above we get Bayes rule: P(Ψ | Y^T) = L(Y^T | Ψ) P(Ψ) / P(Y^T) 8 / 114

9 Frequentist vs. Bayesian views Bayes rule Elements of Bayes rule what we're interested in, the posterior distribution: P(Ψ | Y^T); the likelihood of the data: L(Y^T | Ψ); our prior about the parameters: P(Ψ); the probability of the data: P(Y^T); for the distribution of Ψ, P(Y^T) is just a constant, so P(Ψ | Y^T) ∝ L(Y^T | Ψ) P(Ψ) 9 / 114

10 Frequentist vs. Bayesian views Bayes rule What is the challenge? getting the posterior is typically not such a big deal; the problem is that we often want to know more: conditional expected values of a function of the posterior, like the mean, variance, mode etc. 10 / 114

11 Frequentist vs. Bayesian views Bayes rule What is the challenge? E[g(Ψ)] = ∫ g(Ψ) P(Ψ | Y^T) dΨ / ∫ P(Ψ | Y^T) dΨ; E[g(Ψ)] is the weighted average of g(Ψ), where the weights are determined by the data (likelihood) and the prior 11 / 114

12 Frequentist vs. Bayesian views Bayes rule What is the challenge? we need to be able to evaluate the integral! Special/Simple case: we are able to draw Ψ from P(Ψ | Y^T) and can evaluate the integral via Monte Carlo integration; you won't be lucky enough to experience this case 12 / 114
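To make this special case concrete, here is a minimal Monte Carlo integration sketch (not from the slides), in which the "posterior" is, purely for illustration, a Gamma distribution we can sample from directly:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: pretend the posterior P(Psi | Y^T) is a Gamma(2, 0.5)
# distribution that we can sample from directly.
draws = rng.gamma(shape=2.0, scale=0.5, size=100_000)

# Monte Carlo estimate of E[g(Psi)] for, e.g., g(Psi) = Psi^2
g = draws**2
print("MC estimate of E[Psi^2]:", g.mean())
# numerical standard error of the MC estimate (i.i.d. draws)
print("numerical s.e.:", g.std(ddof=1) / np.sqrt(len(g)))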

13 Frequentist vs. Bayesian views Bayes rule Our situation: we can calculate P(Ψ | Y^T), but we cannot draw from it. Solutions: numerical integration; Markov Chain Monte Carlo (MCMC) integration. What is the standard? although numerical integration is fast and accurate, its computational burden rises exponentially with dimension, so it is suited for low-dimension problems; use MCMC methods 13 / 114

14 14 / 114

15 Idea of priors summarize prior information: previous studies, data not used in estimation, pre-sample data, other countries etc.; don't be too restrictive; more on prior selection in extensions 15 / 114

16 Most commonly used distributions: normal; beta, support [0, 1], persistence parameters; (inverted-) gamma, support (0, ∞), volatility parameters; uniform 16 / 114

17 Prior predictive analysis check whether priors make sense use the prior as the posterior steady state? impulse response functions? 17 / 114

18 Some terminology Jeffreys prior: a non-informative prior; improper vs. proper priors: an improper prior is non-integrable (the integral is ∞); important to have proper distributions for model comparison 18 / 114

19 Some terminology (natural) conjugate priors family of prior distributions after multiplication with the likelihood produce a posterior of the same family Minnesota (Litterman) prior used in VARs for distribution of lags 19 / 114

20 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm 20 / 114

21 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Starting point Aim is to be able to calculate something like E[g(Ψ)] = ∫ g(Ψ) P(Ψ | Y^T) dΨ / ∫ P(Ψ | Y^T) dΨ; we know how to calculate P(Ψ | Y^T) but we cannot draw from it; the system is too large for numerical integration 21 / 114

22 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Principle of posterior evaluation We cannot draw from the target distribution, but 1. can draw from a different, stand-in, distribution 2. can evaluate both stand-in and target distributions 3. comparing the two, we can re-weigh the draw cleverly 22 / 114

23 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Principle of posterior evaluation the above procedure is the idea of importance sampling; MCMC methods are effectively a version of importance sampling in which traveling through the parameter space and/or the acceptance probability is more sophisticated 23 / 114

24 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm A few simple examples Problem: we want to simulate x, where x comes from a truncated normal with mean µ and variance σ², and a < x < b. Solution: 1. draw y from N(µ, σ²); 2a. if y ∈ (a, b) then keep the draw (accept) and go back to 1; 2b. otherwise discard the draw (reject) and go back to 1 24 / 114

25 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm A few simple examples Problem: want to draw x from F(x), but we cannot; we can sample from G(x) and f(x) ≤ c g(x) for all x. Solution: 1. sample y from G(y); 2. accept the draw with probability f(y) / (c g(y)) and go back to 1. Note: the acceptance rate is higher for lower c; the optimal c is c = sup_x f(x)/g(x); the Metropolis-Hastings sampler (MCMC) is a generalization 25 / 114
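A small sketch of this accept/reject scheme under illustrative assumptions: the target f is taken to be a Beta(2, 5) density and the proposal g a uniform on [0, 1], with c approximated by the maximum of f/g on a grid.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

f = stats.beta(2, 5).pdf          # target density f(x), illustrative choice
g = stats.uniform(0, 1).pdf       # proposal density g(x), easy to sample from
grid = np.linspace(0, 1, 10_001)
c = (f(grid) / g(grid)).max()     # approximate c = sup_x f(x)/g(x)

samples = []
while len(samples) < 10_000:
    y = rng.uniform(0, 1)                   # 1. sample y from G
    if rng.uniform() < f(y) / (c * g(y)):   # 2. accept with prob f(y)/(c g(y))
        samples.append(y)

print("theoretical acceptance rate 1/c:", 1 / c)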

26 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Importance sampling Main idea very similar to the previous example: cannot draw from P(Ψ | Y^T) but can draw from H(Ψ); be smart in reweighting (accepting) the draws 26 / 114

27 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Importance sampling E[g(Ψ)] = ∫ g(Ψ) [P(Ψ | Y^T)/h(Ψ)] h(Ψ) dΨ / ∫ [P(Ψ | Y^T)/h(Ψ)] h(Ψ) dΨ = ∫ g(Ψ) ω(Ψ) h(Ψ) dΨ / ∫ ω(Ψ) h(Ψ) dΨ, where ω(Ψ) = P(Ψ | Y^T)/h(Ψ) 27 / 114

28 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Importance sampling Approximate the integral using MC integration: E[g(Ψ)] ≈ Σ_{m=1}^{M} ω(Ψ^(m)) g(Ψ^(m)) / Σ_{m=1}^{M} ω(Ψ^(m)), where M is the number of draws from the importance function h(Ψ) 28 / 114
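A sketch of this importance-sampling estimator under illustrative assumptions: the unnormalized target kernel is a standard normal restricted to positive values (standing in for a posterior kernel we can evaluate but not sample from), and the importance function h is a fat-tailed Student-t.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def target_kernel(psi):
    # unnormalized target P(Psi | Y^T): standard normal truncated to psi > 0
    # (illustrative stand-in for a posterior kernel we can evaluate)
    return np.where(psi > 0, np.exp(-0.5 * psi**2), 0.0)

M = 200_000
psi = rng.standard_t(3, size=M)            # draws from fat-tailed h(.)
w = target_kernel(psi) / stats.t.pdf(psi, df=3)   # importance weights omega(Psi)

g = psi**2                                 # example g(Psi)
print("E[g(Psi)] approx:", np.sum(w * g) / np.sum(w))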

29 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Importance sampling How to best choose h(.)? we'd like h(.) to have fatter tails compared to f(.); the normal distribution has rather thin tails and is often not a good importance function 29 / 114

30 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Before we move on 3 doors, behind one of them is a car pick one I will open one of the remaining two without the car you can choose to stick with your choice or switch who stays and who switches? 30 / 114

31 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Some preliminaries for MCMC Markov property: if for all k ≥ 1 and all t, P(x_{t+1} | x_t, x_{t-1}, ..., x_{t-k}) = P(x_{t+1} | x_t). Transition kernel: K(x, y) = P(x_{t+1} = y | x_t = x) for x, y ∈ X, where X is the sample space 31 / 114

32 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Main idea behind MCMC methods as before, we'd like to sample from P(Ψ | Y^T), but we cannot; MCMC methods provide a way to create a Markov chain transition kernel (K) for Ψ that has the invariant density P(Ψ | Y^T); given K, simulate the Markov chain (iterating P' = KP) starting from some initial values P(Ψ_0); (eventually) the distribution of the Markov chain converges to P(Ψ | Y^T) 32 / 114

33 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Main idea behind MCMC methods a principle of constructing such kernels Metropolis (-Hastings) algorithm (MH) the Gibbs sampler is a special case 33 / 114

34 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Gibbs algorithm special case of the MH algorithm applies when can sample from each conditional distribution again, this will rarely be applicable in our case 34 / 114

35 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Gibbs algorithm instead of draws of Ψ from P(Ψ | Y^T): partition Ψ into k blocks; sample each block from P(Ψ_j | Y^T, Ψ_{-j}) for j = 1, ..., k; iterate until convergence 35 / 114

36 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Gibbs sampling Iterations (k = 2): initiate the sample with Ψ_0, then iterate according to: Ψ^1_{i+1} ~ P(Ψ^1 | Y^T, Ψ^2_i) and Ψ^2_{i+1} ~ P(Ψ^2 | Y^T, Ψ^1_{i+1}); one can prove that the above converges to P(Ψ | Y^T); discard the first B draws to eliminate the influence of Ψ_0 36 / 114
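A minimal Gibbs-sampler sketch for an illustrative two-block case (not the DSGE setting): Ψ = (Ψ^1, Ψ^2) is bivariate normal with correlation ρ, so both conditional distributions are known normals we can draw from.

import numpy as np

rng = np.random.default_rng(3)
rho = 0.8                 # illustrative correlation between the two blocks
n_draws, burn = 20_000, 2_000

psi1, psi2 = 0.0, 0.0     # Psi_0: initial values
chain = np.empty((n_draws, 2))
for i in range(n_draws):
    # Psi^1_{i+1} ~ P(Psi^1 | Y^T, Psi^2_i): here N(rho*psi2, 1 - rho^2)
    psi1 = rng.normal(rho * psi2, np.sqrt(1 - rho**2))
    # Psi^2_{i+1} ~ P(Psi^2 | Y^T, Psi^1_{i+1})
    psi2 = rng.normal(rho * psi1, np.sqrt(1 - rho**2))
    chain[i] = psi1, psi2

kept = chain[burn:]       # discard the first B draws
print("posterior means:", kept.mean(axis=0))
print("correlation between blocks:", np.corrcoef(kept.T)[0, 1])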

37 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Gibbs sampling once the Markov chain has converged, proceed as if we could sample directly: E[g(Ψ)] ≈ (1/m) Σ_{i=1}^{m} g(Ψ_i) 37 / 114

38 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Gibbs sampling however, the draws are serially correlated, so standard errors are higher: σ(E[g(Ψ)]) = [ (1/m) ( σ²_0 + 2 Σ_{l=1}^{m-1} ((m-l)/m) γ_l ) ]^{1/2}, where σ²_0 is the variance of g(Ψ) and γ_l is the l-th order autocovariance of g(Ψ) 38 / 114
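The corrected numerical standard error can be computed from the sample autocovariances of the draws; the sketch below truncates the sum at a bandwidth L, which is a practical assumption on our part rather than part of the formula above, and uses an AR(1) series as illustrative serially correlated draws.

import numpy as np

def mc_standard_error(g, L=None):
    # numerical s.e. of the mean of serially correlated draws g(Psi_i):
    # sqrt( (1/m) * ( gamma_0 + 2 * sum_l ((m-l)/m) * gamma_l ) ),
    # with the sum truncated at bandwidth L (a practical choice, not from the slides)
    g = np.asarray(g, dtype=float)
    m = len(g)
    if L is None:
        L = int(round(m ** (1 / 3)))        # rule-of-thumb bandwidth (assumption)
    dev = g - g.mean()
    gamma = [np.dot(dev[: m - l], dev[l:]) / m for l in range(L + 1)]
    long_run = gamma[0] + 2.0 * sum((1 - l / m) * gamma[l] for l in range(1, L + 1))
    return np.sqrt(long_run / m)

# illustrative serially correlated draws: an AR(1) process
rng = np.random.default_rng(4)
g = np.empty(5_000)
g[0] = 0.0
for t in range(1, len(g)):
    g[t] = 0.9 * g[t - 1] + rng.normal()

print("naive i.i.d. s.e.        :", g.std(ddof=1) / np.sqrt(len(g)))
print("autocorrelation-corrected:", mc_standard_error(g))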

39 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Metropolis-Hastings algorithm Main idea same as with importance sampling: 1. draw from a stand-in distribution h(Ψ; θ), where θ explicitly shows the parameters of the stand-in distribution, e.g. mean (µ_h) and variance (σ²_h); 2. accept/reject based on probability q(Ψ_{i+1} | Ψ_i); 3. go back to 1: 3a. the stand-in density does not change (independence MH), 3b. the mean of the stand-in adjusts (random walk MH); one can show convergence to the target distribution 39 / 114

40 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Acceptance probability Metropolis q(Ψ_{i+1} | Ψ_i) = min[ 1, P(Ψ_{i+1} | Y^T) / P(Ψ_i | Y^T) ], where Ψ_{i+1} is the new candidate draw from the stand-in distribution; if P(Ψ_{i+1} | Y^T) is high relative to P(Ψ_i | Y^T), the probability of Ψ_{i+1} is relatively high and we should accept 40 / 114

41 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Acceptance probability Metropolis-Hastings q(Ψ_{i+1} | Ψ_i) = min[ 1, (P(Ψ_{i+1} | Y^T) / P(Ψ_i | Y^T)) × (h(Ψ_i; θ) / h(Ψ_{i+1}; θ)) ]; scale down by the relative likelihood in the stand-in density: a more common draw from the stand-in gets less weight, so q(Ψ_{i+1} | Ψ_i) is lowered 41 / 114

42 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Acceptance probability Metropolis-Hastings q(Ψ_{i+1} | Ψ_i) = min[ 1, (P(Ψ_{i+1} | Y^T) / P(Ψ_i | Y^T)) × (h(Ψ_i; θ) / h(Ψ_{i+1}; θ)) ]; P(Ψ_{i+1} | Y^T)/h(Ψ_{i+1}; θ) high → high probability of Ψ_{i+1} in the target distribution → should accept → higher q(Ψ_{i+1} | Ψ_i); P(Ψ_i | Y^T)/h(Ψ_i; θ) high → lower q(Ψ_{i+1} | Ψ_i): the last draw was already in a likely part of the parameter space, so force the algorithm to explore less likely areas 42 / 114

43 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Updating the stand-in density Independence chain variant stand-in distribution does not change it is independent across Monte Carlo replications this is also the case in importance-sampling 43 / 114

44 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Updating the stand-in density Random walk variant candidate draws are obtained according to Ψ_{i+1} = Ψ_i + ε_{i+1}, with ε_i from a symmetric density around 0 with variance σ²_h; as if the mean of the stand-in density adjusts with each accepted draw: in θ, µ_h = Ψ_i 44 / 114
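A compact random-walk Metropolis sketch under illustrative assumptions: a correlated bivariate normal kernel stands in for the log-posterior log P(Y^T | Ψ) + log P(Ψ), and the proposal standard deviation σ_h is an arbitrary illustrative choice.

import numpy as np

rng = np.random.default_rng(5)

def log_posterior_kernel(psi):
    # stands in for log L(Y^T | Psi) + log P(Psi); illustrative bivariate normal
    cov_inv = np.linalg.inv(np.array([[1.0, 0.6], [0.6, 1.0]]))
    return -0.5 * psi @ cov_inv @ psi

n_draws, sigma_h = 50_000, 1.0        # sigma_h: proposal std dev (illustrative)
psi = np.zeros(2)                     # start at the (here known) mode
chain, accepted = np.empty((n_draws, 2)), 0
lp = log_posterior_kernel(psi)

for i in range(n_draws):
    cand = psi + rng.normal(0.0, sigma_h, size=2)   # Psi_{i+1} = Psi_i + eps
    lp_cand = log_posterior_kernel(cand)
    # Metropolis acceptance: min(1, P(cand | Y^T) / P(psi | Y^T)) via log difference
    if np.log(rng.uniform()) < lp_cand - lp:
        psi, lp = cand, lp_cand
        accepted += 1
    chain[i] = psi

print("acceptance rate:", accepted / n_draws)
print("posterior mean estimate:", chain[5_000:].mean(axis=0))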

45 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Summary of MCMC with MH algorithm 1. maximize the log-posterior log P(Y^T | Ψ) + log P(Ψ); this yields the posterior mode Ψ̂; 2. draw from a stand-in distribution h(Ψ; θ), which should have fatter tails than the posterior; 3. accept/reject based on probability q(Ψ_{i+1} | Ψ_i), Metropolis vs. Metropolis-Hastings specification; 4. go back to 2: adjust (random walk variant) or do not adjust (independence variant) the stand-in distribution 45 / 114

46 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Summary of MCMC with MH algorithm evaluation of the likelihood (step 1 and 3) requires computation of the steady state solution of the model constructing the likelihood function (via the Kalman filter) 46 / 114

47 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Choice of stand-in density stand-in should have fatter tails variance parameter important for acceptance rate optimal acceptance rates: around 0.44 for estimation of 1 parameter around 0.23 for estimation of more than 5 parameters 47 / 114

48 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Choice of stand-in density often, the stand-in is N(Ψ̂, c² Σ_Ψ), where Ψ̂ is the posterior mode and Σ_Ψ is the inverse (negative) Hessian at the mode; tip: start with c = 2.4/√d, where d is the number of estimated parameters, and increase (decrease) c if the acceptance rate is too high (low) 48 / 114

49 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Convergence statistics theory says that distribution will converge to target when does this happen? diagnostic tests sequence of draws should be from the invariant distribution moments should not change within/between sequences 49 / 114

50 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Brooks and Gelman statistics I draws and J sequences: W = 1/(J(I−1)) Σ_{j=1}^{J} Σ_{i=1}^{I} (Ψ_{i,j} − Ψ̄_j)², B = I/(J−1) Σ_{j=1}^{J} (Ψ̄_j − Ψ̄)²; B/I: estimate of the variance of the mean across sequences; W: estimate of the average variance within sequences 50 / 114

51 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Brooks and Gelman statistics Combine the two measures of variance: V = ((I−1)/I) W + B/I; as the length of the simulation I increases, we want these statistics to settle down 51 / 114
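A short sketch of these within/between statistics for J parallel sequences of length I, where dispersed-start AR(1) chains stand in for MCMC output:

import numpy as np

def brooks_gelman(chains):
    # chains: array of shape (J, I) holding J sequences of I draws each
    J, I = chains.shape
    seq_means = chains.mean(axis=1)
    W = np.sum((chains - seq_means[:, None]) ** 2) / (J * (I - 1))   # within
    B = I * np.sum((seq_means - seq_means.mean()) ** 2) / (J - 1)    # between
    V = (I - 1) / I * W + B / I        # pooled variance estimate
    return W, B, V

# illustrative "MCMC output": J AR(1) sequences started from dispersed points
rng = np.random.default_rng(6)
J, I = 4, 10_000
chains = np.empty((J, I))
chains[:, 0] = rng.normal(0.0, 5.0, size=J)
for t in range(1, I):
    chains[:, t] = 0.9 * chains[:, t - 1] + rng.normal(size=J)

W, B, V = brooks_gelman(chains)
print("W =", W, " B/I =", B / I, " V =", V)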

52 Importance sampling Markov Chain Monte Carlo Gibbs algorithm Metropolis-Hastings algorithm Practical issues with MH algorithm Geweke statistic partition a sequence into 3 subsets s = {I, II, III}; compute means (Ψ̄_s) and standard errors (σ^s_Ψ); the s.e.'s must be corrected for serial correlation; then, under convergence, CD is distributed N(0, 1): CD = (Ψ̄_I − Ψ̄_III) / √((σ^I_Ψ)² + (σ^III_Ψ)²) 52 / 114
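A sketch of the Geweke diagnostic on a single simulated sequence; the first-10%/last-50% split and the fixed autocovariance bandwidth are illustrative choices, not taken from the slides.

import numpy as np

def nse(g, L=50):
    # serial-correlation-corrected numerical s.e. of the mean (bandwidth L is an assumption)
    g = np.asarray(g, dtype=float)
    m = len(g)
    dev = g - g.mean()
    gamma = [np.dot(dev[: m - l], dev[l:]) / m for l in range(L + 1)]
    return np.sqrt((gamma[0] + 2 * sum(gamma[1:])) / m)

def geweke_cd(draws, first=0.1, last=0.5):
    # compare the mean of an early subset with the mean of a late subset
    a = draws[: int(first * len(draws))]
    b = draws[-int(last * len(draws)):]
    return (a.mean() - b.mean()) / np.sqrt(nse(a) ** 2 + nse(b) ** 2)

# illustrative chain: AR(1) draws as a stand-in for one MCMC sequence
rng = np.random.default_rng(7)
x = np.empty(20_000)
x[0] = 0.0
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()

print("Geweke CD (approx N(0,1) under convergence):", geweke_cd(x))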

53 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison 53 / 114

54 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison Bayesian vs. frequentist inference Bayesian inference cannot use frequentist principles t-test, F-test, LR-test etc. they have a frequentist justification of repeated sampling instead, there are two common Bayesian principles: Highest Posterior Density (HPD) interval Bayes factors (posterior odds) 54 / 114

55 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison Highest posterior density intervals A 100(1−α)% posterior interval for Ψ is given by P(b_L < Ψ < b_U) = ∫_{b_L}^{b_U} P(Ψ | Y^T) dΨ = 1 − α; there exist many such intervals; the HPD interval is the smallest one of them 55 / 114
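A sketch of how the HPD interval can be approximated from posterior draws: among all intervals spanning a fraction 1−α of the sorted draws, keep the shortest one (the skewed Gamma "posterior" is purely illustrative).

import numpy as np

def hpd_interval(draws, alpha=0.10):
    # smallest interval containing a fraction (1 - alpha) of the sorted draws
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    k = int(np.floor((1 - alpha) * n))
    widths = sorted_draws[k:] - sorted_draws[: n - k]
    j = np.argmin(widths)                  # shortest of all candidate intervals
    return sorted_draws[j], sorted_draws[j + k]

# illustrative skewed "posterior" draws
rng = np.random.default_rng(8)
draws = rng.gamma(shape=2.0, scale=1.0, size=100_000)

lo, hi = hpd_interval(draws, alpha=0.10)
eq = np.quantile(draws, [0.05, 0.95])      # equal-tailed interval, for comparison
print("90% HPD interval         :", lo, hi)
print("90% equal-tailed interval:", eq)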

56 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison HPD tests the HPD test amounts to checking whether Ψ_i ∈ HPD_{1−α}; this is an informal way of comparing nested models, i.e. different parameter values; Bayesians can also compare non-nested models, more on this below 56 / 114

57 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison Bayes factors B = [P(Y^T | Ψ_1) P(Ψ_1)] / [P(Y^T | Ψ_2) P(Ψ_2)], where Ψ_1 and Ψ_2 are two different sets of parameter values; if B > 1, Ψ_1 is a posteriori more likely than Ψ_2 57 / 114

58 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison Model comparison posterior densities can be used to evaluate conditional probabilities of particular parameter values conditional probabilities of different model specifications use Bayes factors (posterior odds ratio) to compare models advantage is that all models are treated symmetrically there is no null model compared to an alternative 58 / 114

59 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison Model comparison B_{A,B} = [P_A(Y^T | Ψ_A) P_A(Ψ_A)] / [P_B(Y^T | Ψ_B) P_B(Ψ_B)]; it is also possible to assign priors to models; the posterior odds ratio is then PO_{A,B} = P(A | Y^T) / P(B | Y^T) = B_{A,B} × P(A)/P(B) 59 / 114

60 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison Model comparison the Bayes factor is related to the Bayesian information criterion (BIC): B_{A,B} ≈ [P_A(Y^T | Ψ̂_A) / P_B(Y^T | Ψ̂_B)] T^{(k_B − k_A)/2}, where the RHS is the BIC approximation, Ψ̂_i denote the ML estimates of the parameters and k_i denote the number of parameters; important to use proper priors: if not, one always prefers the model with fewer parameters 60 / 114

61 Bayesian vs. frequentist inference Highest posterior density intervals Bayes factors Model comparison How much information is in the Bayes factor? Kass and Raftery (1995): if the value of B_{A,B} is between 1 and 3, barely worth mentioning; between 3 and 20, positive evidence; between 20 and 150, strong evidence; over 150, very strong evidence 61 / 114

62 Preliminaries and steady state Estimation command Decomposition Output Example 62 / 114

63 Preliminaries and steady state Estimation command Decomposition Output Example Preliminaries setup is the same as with ML estimation always a good idea to solve model first some parameter values are likely to remain calibrated 63 / 114

64 Preliminaries and steady state Estimation command Decomposition Output Example Dynare: initialization initialize as usual: var c, k, z, y; varexo e; parameters beta, rho, alpha, nu, delta, sigma; set parameter values that are not estimated: alpha = 0.36; rho = 0.95; beta = 0.99; nu = 1; delta = 0.025; 64 / 114

65 Preliminaries and steady state Estimation command Decomposition Output Example Dynare: setting it up after the model part and the specification of the steady state, tell Dynare which parameters it should estimate: estimated_params; stderr e, inv_gamma_pdf, 0.01, inf; end; the above tells Dynare to estimate σ, the st. error of the productivity disturbance; the prior distribution is an inverted gamma, the prior mean is 0.01 and the prior st. error is infinite 65 / 114

66 Preliminaries and steady state Estimation command Decomposition Output Example Dynare: steady state the steady state is calculated for many different values of Ψ! solve for the steady state yourself (linearizing makes it easier); give the exact steady state to Dynare for the initial values; option to provide your own function that calculates the steady state: modfilename_steadystate.m or a steady_state_model; block 66 / 114

67 Preliminaries and steady state Estimation command Decomposition Output Example Dynare: estimation then also tell Dynare which are the observable variables: varobs y; estimation(options); options include: specify the data file for estimation: datafile=data; number of MH sequences: mh_nblocks; number of MH replications: mh_replic; parameter of the stand-in distribution variance (c): mh_jscale; variance of the initial draw: mh_init_scale; first observation (default first): first_obs; sample size (default all): nobs; many more! 67 / 114

68 Preliminaries and steady state Estimation command Decomposition Output Example Dynare: decomposition decompose endogenous variables into the contribution of shocks; possible also after stoch_simul; shock_decomposition(options) variables; options include e.g. parameter_set: use calibrated values: parameter_set=calibration; use prior/posterior mode: parameter_set=prior_mode / parameter_set=posterior_mode; variables specifies for which variables to run the decomposition 68 / 114

69 Preliminaries and steady state Estimation command Decomposition Output Example Dynare: output RESULTS FROM POSTERIOR MAXIMIZATION: most important is the mode, other output is based on normality assumptions (typically violated); when Dynare gets to the MCMC part it shows: which MCMC sequence you are in, which fraction has been completed, and the acceptance rate: adjust mh_jscale appropriately; remember that a low acceptance rate means the algorithm travels through a larger part of the Ψ domain 69 / 114

70 Preliminaries and steady state Estimation command Decomposition Output Example Dynare: plots priors; MCMC diagnostics; prior and posterior densities; shocks implied at the mode; observables and corresponding implied values 70 / 114

71 Preliminaries and steady state Estimation command Decomposition Output Example Estimating the neoclassical growth model use the neoclassical growth model as the data generating process; 265 observations of output; use Bayesian estimation to estimate σ, and then σ, ρ, δ, α 71 / 114

72 Preliminaries and steady state Estimation command Decomposition Output Example Estimating the neoclassical growth model Easy case: estimated_params; stderr e, inv_gamma_pdf, 0.01, inf; end; varobs y; estimation(datafile=y, mh_nblocks=1, mh_replic=10000, mh_jscale=3, mh_init_scale=12) c, k, y; 72 / 114

73 Preliminaries and steady state Estimation command Decomposition Output Example MCMC prior plots, easy case: [figure: prior density plot for SE_e] 73 / 114

74 Preliminaries and steady state Estimation command Decomposition Output Example Shocks, easy case: [figure: implied shock e] 74 / 114

75 Preliminaries and steady state Estimation command Decomposition Output Example Observables and implied values, easy case: [figure: observable y and corresponding implied values] 75 / 114

76 Preliminaries and steady state Estimation command Decomposition Output Example Posterior density plots, easy case: [figure: posterior density of SE_e] 76 / 114

77 Preliminaries and steady state Estimation command Decomposition Output Example Printed results - easy case Posterior mode: (0.0004) Average acceptance rate: 37.7% Diagnostic statistics (Geweke): p-values on equality of means in sub-samples (no taper) 0.33 (4% taper) 0.38 (8% taper) etc. Posterior mean and HPD interval: ( ) 77 / 114

78 Preliminaries and steady state Estimation command Decomposition Output Example What we did today Basic concept of Bayesian estimation priors evaluating the posterior Markov Chain Monte Carlo (MCMC) practical issues acceptance rate, diagnostics implementation in Dynare 78 / 114

79 Preliminaries and steady state Estimation command Decomposition Output Example What we did in the first half of course Motivation Week 1: Use of computational tools, simple DSGE model Tools necessary to solve models and a solution method Week 2: function approximation and numerical integration Week 3: theory of perturbation (1st and higher-order) Tools necessary for, and principles of, estimation Week 4: Kalman filter and Maximum Likelihood estimation Week 5: principles of Bayesian estimation 79 / 114

80 Preliminaries and steady state Estimation command Decomposition Output Example 80 / 114

81 Trends More on priors Alternatives Trends Problem: methodology works for stationary environments data has trends not clear which trend the model represents? 81 / 114

82 Trends More on priors Alternatives Trends [figure: output deviations from trend (%) under alternative detrending methods: HP(1600), HP(10^5), linear, quadratic, BP(6,32)] 82 / 114

83 Trends More on priors Alternatives Trends we could build in a trend within the model e.g. productivity is trending stationarize non-stationary variables within the model i.e. inspect variables relative to productivity however, not clear that data satisfies balanced growth 83 / 114

84 Trends More on priors Alternatives Trends [figure: Great ratios, real and nominal c/y and i/y, quarterly data (axis ticks at 1962, 1975, 1987, 2000)] 84 / 114

85 Trends More on priors Alternatives Trends Solutions: use differenced data highlights high-frequency movements (measurement error) detrend prior to estimation 85 / 114

86 Trends More on priors Alternatives Estimation on detrended data use e.g. a quadratic trend: y_t = a_0 + a_1 t + a_2 t² + u_t; each variable can have its own trend; using the HP or Band Pass filter: y^{obs,filtered}_t = B(L) y^{obs}_t, where B(L) is a 2-sided filter! this creates artificial serial correlation in the filtered data; apply the filter also to model data 86 / 114
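A sketch of removing a quadratic trend from an observed series prior to estimation (the simulated trending series is illustrative); an HP or band-pass filter would take the place of the polynomial regression here.

import numpy as np

rng = np.random.default_rng(9)

# illustrative observed series: quadratic trend plus a persistent cycle
T = 200
t = np.arange(T, dtype=float)
cycle = np.empty(T)
cycle[0] = 0.0
for s in range(1, T):
    cycle[s] = 0.8 * cycle[s - 1] + rng.normal(scale=0.5)
y_obs = 1.0 + 0.05 * t + 0.0004 * t**2 + cycle

# fit y_t = a0 + a1*t + a2*t^2 + u_t by OLS and keep the residual u_t
X = np.column_stack([np.ones(T), t, t**2])
a_hat, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
y_detrended = y_obs - X @ a_hat           # series passed to the estimation step

print("estimated trend coefficients:", a_hat)
print("std dev of detrended series :", y_detrended.std())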

87 Trends More on priors Alternatives Estimation on detrended data the above implies that the model is fitted to low(er) frequencies only; Canova (2010) points out that the above can lead to: underestimated volatility of shocks; overestimated persistence of shocks; less perceived noise, so decision rules imply higher predictability; substitution and income effects may be distorted; due to the above, he proposes to estimate flexible trend specifications within the model 87 / 114

88 Trends More on priors Alternatives More on selecting priors what we've described is based on selecting (independent) priors about deep parameters; however, often we have priors about observables; moreover, reasonable independent priors may imply rather unreasonable properties of the model; solutions proposed in the literature: Del Negro, Schorfheide (2008); Andrle, Benes (2013); Jarocinsky, Marcet (2013) 88 / 114

89 Trends More on priors Alternatives Del Negro, Schorfheide (2008) more guidance for eliciting priors; three main issues with (independent) priors about deep parameters: they may lead to probability mass on unrealistic properties of the model; most exogenous shock processes are latent, i.e. difficult to form priors about; priors are often transferred to different models 89 / 114

90 Trends More on priors Alternatives Del Negro, Schorfheide (2008) they group parameters into three categories: those determining the steady state those determining exogenous shocks those determining the endogenous propagation mechanism 90 / 114

91 Trends More on priors Alternatives Del Negro, Schorfheide (2008) Parameters related to steady state relationships discount rate, depreciation, returns to scale, inflation target etc.; let S_D(Ψ_ss) be a vector of steady state relationships depending on a set of parameters Ψ_ss; then Ŝ = S_D(Ψ_ss) + η are measurements of those relationships with measurement error η; Ŝ has a probabilistic interpretation and therefore, using Bayes rule, one can write P(Ψ_ss | Ŝ) ∝ L(Ŝ | Ψ_ss) P(Ψ_ss); allows for overidentification 91 / 114

92 Trends More on priors Alternatives Del Negro, Schorfheide (2008) Exogenous processes volatility and persistence parameters; use implied moments of endogenous variables to back out priors; the above is done given values for Ψ_ss and Ψ_endo; valid for a particular model and should not be directly transferred across models 92 / 114

93 Trends More on priors Alternatives Del Negro, Schorfheide (2008) Endogenous propagation mechanisms price rigidity, labor supply elasticity etc.; one could use a similar principle as above; the authors suggest independent priors because researchers often have a relatively good idea; note that the joint prior induces non-trivial non-linear relationships between parameters; the joint prior becomes P(Ψ | Ŝ) ∝ L(Ŝ | Ψ_ss) P(Ψ_ss) P(Ψ_endo); requires an additional step in the MCMC algorithm 93 / 114

94 Trends More on priors Alternatives Andrle, Benes (2013) Andrle and Benes do not distinguish between groups of parameters their system priors are priors about concepts such as impulse response functions conditional correlations etc. 94 / 114

95 Trends More on priors Alternatives Andrle, Benes (2013) even sensible individual-parameter priors can lead to unintended properties of the aggregate model independence of priors can lead to substantial mass on such parameter regions call for careful prior-predictive analysis: IRFs, second moments... compare with posterior results is it the data or the model driving the results? 95 / 114

96 Trends More on priors Alternatives Andrle, Benes (2013) Candidates for system priors: steady states sensible values in levels or growth rates (un-)conditional moments cross-correlations (conditional on shocks) impulse response properties peak impacts, duration, horizon of monetary policy effectiveness etc. 96 / 114

97 Trends More on priors Alternatives Andrle, Benes (2013) Implementation: use Bayes rule again; specify model properties you care about, Z = h(Ψ); these can be characterized by a probabilistic model Z ~ D(Z_s), where D(Z_s) is a distribution function and Z_s are parameters of that function (hyper-parameters); its likelihood function (the system prior): P(Z_s | Ψ, h); composite joint prior: P(Ψ | Z_s, h) ∝ P(Z_s | Ψ, h) P(Ψ) 97 / 114

98 Trends More on priors Alternatives Andrle, Benes (2013) The posterior becomes P(Ψ | Y^T, Z_s) ∝ L(Y^T | Ψ) P(Z_s | Ψ, h) P(Ψ); evaluation is in principle the same as before, with the use of MCMC methods; the additional step in evaluating the system prior slows things down: one has to run MCMC on the prior (with the likelihood switched off) and then on the posterior 98 / 114

99 Trends More on priors Alternatives Jarocinsky, Marcet (2013) similar ideas as above, but in the context of Bayesian VARs; their point is that widely used priors about parameters can lead to behavior of observables that is counterfactual; it is always a good idea to do prior-predictive analysis of your model! 99 / 114

100 Trends More on priors Alternatives Alternatives to Bayesian estimation Maximum likelihood calibration GMM SMM & indirect inference 100 / 114

101 Trends More on priors Alternatives Maximum likelihood we've seen it yesterday; conceptually different from Bayesian estimation; the tools it requires are part of Bayesian estimation 101 / 114

102 Trends More on priors Alternatives Calibration widespread methodology at least since Kydland and Prescott (1982); prior to this, the state of the art was systems of simultaneous equations; those were viewed as true statistical models to be estimated 102 / 114

103 Trends More on priors Alternatives Calibration although calibration is also an empirical exercise it lacks the probabilistic interpretation the constraint is that the model mimics (a priori identified) features in the data Kydland and Prescott (1996): It is important to emphasize that the parameter values selected are not the ones that provide the best fit in some statistical sense. 103 / 114

104 Trends More on priors Alternatives Calibration Parameters are pinned down by a selection of real-world features long-run averages (labor share, hours worked) micro studies (preference parameters) certain business cycle properties of the data (shock parameters) etc. 104 / 114

105 Trends More on priors Alternatives Calibration compare different features of the data to model predictions; closely related to moment-matching (estimating models); however, calibration lacks the statistical formality; the above is a strong source of criticism of calibration: no formal rules on selecting the dimensions to which the model is fit; no formal rules for comparing alternatives, as models are necessarily misspecified; note that the last point does not hold for Bayesian model comparison 105 / 114

106 Trends More on priors Alternatives Matching moments (GMM, SMM, II) idea similar to calibration: a set of moments (features) of the data used to parameterize model a different set of moments used to judge the performance of model matching moments adds statistical rigor estimation hypothesis testing 106 / 114

107 Trends More on priors Alternatives Matching moments (GMM, SMM, II) as with calibration, moment matching is based on a selection of moments; often referred to as limited-information procedures, since the full range of statistical implications is contained in the model's likelihood function; disadvantages of limited-information procedures: potential loss of efficiency, inference potentially sensitive to the selected moments; advantages of limited-information procedures: no need to make distributional assumptions 107 / 114

108 Trends More on priors Alternatives Generalized method of moments attributed to Hansen (1982), generalization, asymptotic properties the main idea is to use orthogonality conditions (e.g. first-order-conditions) E[f (x t, Ψ)] = 0 x t is a vector of variables Ψ are model parameters 108 / 114

109 Trends More on priors Alternatives Generalized method of moments pick Ψ s.t. the sample analogs of the orthogonality conditions, g(X, Ψ) = (1/T) Σ_t f(x_t, Ψ), either hold exactly (exactly identified case: number of parameters = number of moment conditions) or are as close to zero as possible (overidentified case: number of parameters < number of moment conditions) 109 / 114

110 Trends More on priors Alternatives Generalized method of moments in the over-identified case: min_Ψ g(X, Ψ)' Ω g(X, Ψ), where Ω is a weighting matrix; the optimal weighting matrix is the inverse of the var-covar matrix of g(X, Ψ) 110 / 114
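A toy GMM sketch under illustrative assumptions: the "model" parameters are just a mean and variance, matched to three moment conditions so the system is overidentified, and the weighting matrix Ω is the identity rather than the optimal one.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
x = rng.normal(loc=1.5, scale=2.0, size=5_000)     # illustrative data x_t

def g_bar(psi, x):
    # sample analogs g(X, Psi) = (1/T) sum_t f(x_t, Psi) for three moment conditions
    mu, sigma2 = psi
    f = np.column_stack([
        x - mu,                      # E[x_t - mu] = 0
        (x - mu) ** 2 - sigma2,      # E[(x_t - mu)^2 - sigma^2] = 0
        (x - mu) ** 3,               # E[(x_t - mu)^3] = 0 (overidentifying condition)
    ])
    return f.mean(axis=0)

def objective(psi, x, W=np.eye(3)):
    g = g_bar(psi, x)
    return g @ W @ g                 # min_Psi g' Omega g (here Omega = I)

res = minimize(objective, x0=np.array([0.0, 1.0]), args=(x,), method="Nelder-Mead")
print("GMM estimates (mu, sigma^2):", res.x)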

111 Trends More on priors Alternatives Simulated method of moments in some cases the orthogonality conditions cannot be assessed analytically moment-matching estimation based on simulations retains asymptotic properties of GMM 111 / 114

112 Trends More on priors Alternatives Simulated method of moments let z_t be model variables corresponding to the data x_t; let the empirical targets be summarized by h(x_t); SMM estimation is based on E[h(x_t)] = E[h(z_t, Ψ)], i.e. f(x_t, Ψ) = h(x_t) − h(z_t, Ψ) 112 / 114

113 Trends More on priors Alternatives Indirect inference based on reduced-form models main idea is to use structural model to interpret reduced-form results can simulated data from a structural model replicate a reduced-form estimate using real-world data? i.e. it is a moment-matching exercise moments are clearly defined by prior reduced-form analysis 113 / 114

114 Trends More on priors Alternatives Indirect inference let δ be a vector of reduced-form estimates δ(x t ) are those in the data and δ(z t, Ψ) are those from the model pick Ψ s.t. δ(x t ) = δ(z t, Ψ) 114 / 114


More information

Estimating Deep Parameters: GMM and SMM

Estimating Deep Parameters: GMM and SMM Estimating Deep Parameters: GMM and SMM 1 Parameterizing a Model Calibration Choose parameters from micro or other related macro studies (e.g. coeffi cient of relative risk aversion is 2). SMM with weighting

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω ECO 513 Spring 2015 TAKEHOME FINAL EXAM (1) Suppose the univariate stochastic process y is ARMA(2,2) of the following form: y t = 1.6974y t 1.9604y t 2 + ε t 1.6628ε t 1 +.9216ε t 2, (1) where ε is i.i.d.

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics Bayesian phylogenetics the one true tree? the methods we ve learned so far try to get a single tree that best describes the data however, they admit that they don t search everywhere, and that it is difficult

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

Bayesian Computations for DSGE Models

Bayesian Computations for DSGE Models Bayesian Computations for DSGE Models Frank Schorfheide University of Pennsylvania, PIER, CEPR, and NBER October 23, 2017 This Lecture is Based on Bayesian Estimation of DSGE Models Edward P. Herbst &

More information

Dynamics of Real GDP Per Capita Growth

Dynamics of Real GDP Per Capita Growth Dynamics of Real GDP Per Capita Growth Daniel Neuhoff Humboldt-Universität zu Berlin CRC 649 June 2015 Introduction Research question: Do the dynamics of the univariate time series of per capita GDP differ

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY ECO 513 Fall 2008 MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY SIMS@PRINCETON.EDU 1. MODEL COMPARISON AS ESTIMATING A DISCRETE PARAMETER Data Y, models 1 and 2, parameter vectors θ 1, θ 2.

More information

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem?

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem? Who was Bayes? Bayesian Phylogenetics Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison October 6, 2011 The Reverand Thomas Bayes was born in London in 1702. He was the

More information

Bayesian Phylogenetics

Bayesian Phylogenetics Bayesian Phylogenetics Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison October 6, 2011 Bayesian Phylogenetics 1 / 27 Who was Bayes? The Reverand Thomas Bayes was born

More information

System Priors for Econometric Time Series

System Priors for Econometric Time Series WP/16/231 System Priors for Econometric Time Series by Michal Andrle and Miroslav Plašil IMF Working Papers describe research in progress by the author(s) and are published to elicit comments and to encourage

More information

Bayesian model selection for computer model validation via mixture model estimation

Bayesian model selection for computer model validation via mixture model estimation Bayesian model selection for computer model validation via mixture model estimation Kaniav Kamary ATER, CNAM Joint work with É. Parent, P. Barbillon, M. Keller and N. Bousquet Outline Computer model validation

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revised on April 24, 2017 Today we are going to learn... 1 Markov Chains

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

1 Teaching notes on structural VARs.

1 Teaching notes on structural VARs. Bent E. Sørensen February 22, 2007 1 Teaching notes on structural VARs. 1.1 Vector MA models: 1.1.1 Probability theory The simplest (to analyze, estimation is a different matter) time series models are

More information

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p GMM and SMM Some useful references: 1. Hansen, L. 1982. Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p. 1029-54. 2. Lee, B.S. and B. Ingram. 1991 Simulation estimation

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Federal Reserve Bank of New York Staff Reports

Federal Reserve Bank of New York Staff Reports Federal Reserve Bank of New York Staff Reports Forming Priors for DSGE Models (and How It Affects the Assessment of Nominal Rigidities) Marco Del Negro Frank Schorfheide Staff Report no. 32 March 28 This

More information

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling 1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information