Time-Varying VARs
Wouter J. Den Haan
London School of Economics
© Wouter J. Den Haan
Overview
- Gibbs sampler: general idea
- Probit regression application
- Inverted Wishart distribution
- Drawing from a multivariate Normal in Matlab
- Time-varying VAR: model specification
- Gibbs sampler for the time-varying VAR
Gibbs Sampler
Suppose $x$, $y$, and $z$ are distributed according to $f(x, y, z)$.
Suppose that drawing $x$, $y$, and $z$ jointly from $f(x, y, z)$ is difficult, but
- you can draw $x$ from $f(x \mid y, z)$,
- you can draw $y$ from $f(y \mid x, z)$, and
- you can draw $z$ from $f(z \mid x, y)$.
Gibbs Sampler - how it works
Start with $y_0$, $z_0$.
- Draw $x_1$ from $f(x \mid y_0, z_0)$
- Draw $y_1$ from $f(y \mid x_1, z_0)$
- Draw $z_1$ from $f(z \mid x_1, y_1)$
- Draw $x_2$ from $f(x \mid y_1, z_1)$
- ...
$(x_i, y_i, z_i)$ is one draw from the joint density $f(x, y, z)$.
Although the series are constructed recursively, they are not time series.
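A minimal Matlab sketch of this recursion; the conditional samplers draw_x, draw_y, and draw_z are hypothetical placeholders for problem-specific code:

    % Minimal Gibbs sampler skeleton. draw_x, draw_y, draw_z are hypothetical
    % functions that sample from the conditionals f(x|y,z), f(y|x,z), f(z|x,y).
    N      = 10000;              % total number of Gibbs iterations
    burnin = 1000;               % initial draws to discard (burn-in period)
    x = 0; y = 0; z = 0;         % arbitrary starting values
    draws = zeros(N, 3);
    for i = 1:N
        x = draw_x(y, z);        % draw from f(x | y, z)
        y = draw_y(x, z);        % draw from f(y | x, z)
        z = draw_z(x, y);        % draw from f(z | x, y)
        draws(i, :) = [x, y, z];
    end
    draws = draws(burnin+1:end, :);   % keep only post-burn-in draws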
Gibbs Sampler - convergence
The idea is that this sequence converges to a sequence drawn from $f(x, y, z)$.
Since convergence is not immediate, you have to discard the beginning of the sequence (the burn-in period).
See Casella and George (1992) for a discussion of why and when this works.
Gibbs Sampler - probit regression
This example is from Lancaster (2004).
$y_i$ is the $i$-th observation of a binary variable, i.e., $y_i \in \{0, 1\}$.
$y_i^*$ is an unobservable (latent) variable given by
$y_i^* = x_i'\beta + \varepsilon_i$, $\varepsilon_i \sim N(0, 1)$
$y_i = 1$ if $y_i^* \geq 0$ and $y_i = 0$ otherwise.
Probit regression
Parameters: $\beta$ and $y^* = [y_1^*, y_2^*, \ldots, y_n^*]$
Data: $X = [x_1, x_2, \ldots, x_n]$, $Y = \{y_i, x_i\}_{i=1}^n$
Objective: get $p(\beta \mid Y)$, i.e., the distribution of $\beta$ given $Y$.
With the Gibbs sampler we can get a sequence of observations for $(\beta, y^*)$ distributed according to $p(\beta, y^* \mid Y)$, from which we can get $p(\beta \mid Y)$.
Probit - Gibbs sampler step 1
We need to draw from $p(\beta \mid y^*, Y)$.
Given $y^*$ and $X$:
$\beta \sim N\left((X'X)^{-1}X'y^*, \; (X'X)^{-1}\right)$,
since the standard deviation of $\varepsilon_i$ is known and equal to 1.
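A sketch of this step, assuming X is the n-by-k regressor matrix and ystar the n-by-1 vector of latent values (variable names are illustrative):

    % Draw beta from N((X'X)^{-1} X' ystar, (X'X)^{-1}).
    XX      = X' * X;
    betahat = XX \ (X' * ystar);                   % posterior mean
    C       = chol(inv(XX));                       % upper triangular, C'*C = inv(X'X)
    beta    = betahat + C' * randn(size(X, 2), 1); % one draw from the posterior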
Probit - Gibbs sampler step 2
We need to draw from $p(y^* \mid \beta, Y)$.
Since the $y_i^*$'s are independent, we can do this separately for each $i$:
$y_i^* \sim N_{>0}(x_i'\beta, 1)$ if $y_i = 1$
$y_i^* \sim N_{<0}(x_i'\beta, 1)$ if $y_i = 0$,
where
$N_{>0}(\cdot)$ is a Normal distribution truncated on the left at 0
$N_{<0}(\cdot)$ is a Normal distribution truncated on the right at 0
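One way to draw from these truncated Normals is the inverse-CDF method; a sketch (norminv and normcdf are Statistics Toolbox functions):

    % Draw ystar(i) from N(xb, 1) truncated at 0, via the inverse-CDF method.
    xb = X(i, :) * beta;                     % conditional mean x_i' * beta
    u  = rand;
    if y(i) == 1                             % truncated on the left at 0
        ystar(i) = xb + norminv(normcdf(-xb) + u * (1 - normcdf(-xb)));
    else                                     % truncated on the right at 0
        ystar(i) = xb + norminv(u * normcdf(-xb));
    end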
Wishart distribution
A generalization of the Chi-square distribution to more variables.
$X$: $n \times p$ matrix; each row drawn from $N_p(0, \Sigma)$, where $\Sigma$ is the $p \times p$ variance-covariance matrix.
$W = X'X \sim W_p(\Sigma, n)$, i.e., the $p$-dimensional Wishart with scale matrix $\Sigma$ and degrees of freedom $n$.
You get the Chi-square if $p = 1$ and $\Sigma = 1$.
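This construction translates directly into a way of generating a Wishart draw; a sketch:

    % One draw from the Wishart W_p(Sigma, n), directly from the definition.
    C = chol(Sigma);         % upper triangular, C'*C = Sigma
    X = randn(n, p) * C;     % each row of X is a draw from N_p(0, Sigma)
    W = X' * X;              % W ~ W_p(Sigma, n)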
Inverse Wishart distribution
If $W$ has a Wishart distribution with parameters $\Sigma$ and $n$, then $W^{-1}$ has an inverse Wishart with scale matrix $\Sigma^{-1}$ and degrees of freedom $n$.
!!! In the assignment, the input of the Matlab Inverse Wishart function is $\Sigma$, not $\Sigma^{-1}$.
Inverse Wishart in Bayesian statistics
Data: $x_t$ is a $p \times 1$ vector of i.i.d. random observations with distribution $N(0, V)$.
Prior of $V$: $p(V) = IW(\bar{V}^{-1}, n)$
Posterior of $V$: $p(V \mid X^T) = IW(W^{-1}, n + T)$
$W = \bar{V} + V_T$
$V_T = \sum_{t=1}^{T} x_t x_t'$
Note that $V_T$ is like a sum of squares.
Multivariate normal in Matlab
$x_t$ is a $p \times 1$ vector and we want $x_t \sim N(0, \Sigma)$.
C = chol(Sigma)
Thus $C$ is an upper-triangular matrix and $C'C = \Sigma$.
$e_t$ is a $p \times 1$ vector with draws from $N(0, I_p)$.
$E[C'e_t e_t'C] = C'C = \Sigma$
Thus, $C'e_t$ is a $p \times 1$ vector with draws from $N(0, \Sigma)$.
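In code, a sketch:

    % Draw x_t ~ N(0, Sigma) using the Cholesky factor.
    C   = chol(Sigma);     % upper triangular, C'*C = Sigma
    e_t = randn(p, 1);     % e_t ~ N(0, I_p)
    x_t = C' * e_t;        % x_t ~ N(0, Sigma)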
Time-varying VARs - intro
Idea: capture changes in the model specification in a flexible way.
The analysis here is based on Cogley and Sargent (2001), CS.
Model specification
$y_t = X_t'\theta_t + \varepsilon_t$
$X_t = [1, y_{t-1}, y_{t-2}, \ldots, y_{t-p}]'$
$\theta_t = \theta_{t-1} + v_t$
$\varepsilon_t \sim N(0, R)$
$v_t \sim N(0, Q)$
Model specification
$E\left[ \begin{pmatrix} \varepsilon_t \\ v_t \end{pmatrix} \begin{pmatrix} \varepsilon_t \\ v_t \end{pmatrix}' \right] = V = \begin{pmatrix} R & C' \\ C & Q \end{pmatrix}$
$\theta_t$: the "parameters"
$R$, $C$, and $Q$: the "hyperparameters"
Model specification - details
Simplifying assumptions:
- CS impose that $\theta_t$ is such that $y_t$ would be stationary if $\theta_{t+\tau} = \theta_t$ for all $\tau \geq 0$. This stationarity requirement is left out for transparency.
- $C = 0$.
Notation
$Y^T = [y_1, \ldots, y_T]$
$\theta^T = [\theta_0, \theta_1, \ldots, \theta_T]$
Priors
Prior for the initial condition: $\theta_0 \sim N(\bar{\theta}, \bar{P})$
Prior for the hyperparameters: $p(V) = IW(\bar{V}^{-1}, T_0)$
$\bar{\theta}$, $\bar{P}$, $\bar{V}$, $T_0$ are taken as given.
Posterior
The posterior is given by $p(\theta^T, V \mid Y^T)$.
We can use the Gibbs sampler if we can draw from $p(\theta^T \mid Y^T, V)$ and from $p(V \mid Y^T, \theta^T)$.
Stationarity
CS exclude draws of $\theta_t$ for which the DGP of $y_t$ is nonstationary: $p(\theta_t)$ is the density without imposing stationarity and $f(\theta_t)$ is the density with stationarity imposed.
This restriction is ignored in these slides.
Gibbs part I: Posterior of theta given V
Since $f(A, B) = f(A \mid B)\, f(B)$, we have
$p(\theta^T \mid Y^T, V) = f(\theta^T \mid Y^T, V)$
$= f(\theta^{T-1} \mid \theta_T, Y^T, V)\, f(\theta_T \mid Y^T, V)$
$= f(\theta^{T-2} \mid \theta_T, \theta_{T-1}, Y^T, V)\, f(\theta_{T-1} \mid \theta_T, Y^T, V)\, f(\theta_T \mid Y^T, V)$
$= f(\theta^{T-3} \mid \theta_T, \theta_{T-1}, \theta_{T-2}, Y^T, V)\, f(\theta_{T-2} \mid \theta_T, \theta_{T-1}, Y^T, V)\, f(\theta_{T-1} \mid \theta_T, Y^T, V)\, f(\theta_T \mid Y^T, V)$
Posterior of theta given V
Since $\theta_t = \theta_{t-1} + v_t$, $\theta_{t+\tau}$ has no predictive power for $\theta_{t-1}$ for any $\tau \geq 1$, given $Y^T$ and $\theta_t$. Thus
$f(\theta^{T-2} \mid \theta_T, \theta_{T-1}, Y^T, V) = f(\theta^{T-2} \mid \theta_{T-1}, Y^T, V)$
$f(\theta^{T-3} \mid \theta_T, \theta_{T-1}, \theta_{T-2}, Y^T, V) = f(\theta^{T-3} \mid \theta_{T-2}, Y^T, V)$
etc.
Posterior of theta given V
Combining gives
$p(\theta^T \mid Y^T, V) = f(\theta_T \mid Y^T, V) \prod_{t=1}^{T-1} f(\theta_t \mid \theta_{t+1}, Y^T, V)$
All the densities are Gaussian $\Rightarrow$ if we know the means and the variances, then we can draw from $p(\theta^T \mid Y^T, V)$.
Posterior of theta given V
We need to find the means and variances of
$f(\theta_T \mid Y^T, V)$ and $f(\theta_t \mid \theta_{t+1}, Y^T, V)$
Notation:
$\theta_{t|t} = E(\theta_t \mid Y^t, V)$
$P_{t|t-1} = \mathrm{VAR}(\theta_t \mid Y^{t-1}, V)$
$P_{t|t} = \mathrm{VAR}(\theta_t \mid Y^t, V)$
$\theta_{t|t+1} = E(\theta_t \mid \theta_{t+1}, Y^t, V) = E(\theta_t \mid \theta_{t+1}, Y^T, V)$
$P_{t|t+1} = \mathrm{VAR}(\theta_t \mid \theta_{t+1}, Y^t, V) = \mathrm{VAR}(\theta_t \mid \theta_{t+1}, Y^T, V)$
Posterior of theta given V
First, use the Kalman filter to go forward; start with $\theta_{0|0}$ and $P_{0|0}$.
Next, go backwards to get draws for $\theta_t$ given $\theta_{t+1}$.
Posterior of theta given V
Kalman filter part:
$y_t = X_t'\theta_t + \varepsilon_t$
$X_t = [1, y_{t-1}, y_{t-2}, \ldots, y_{t-p}]'$
$\theta_t = \theta_{t-1} + v_t$
$\varepsilon_t \sim N(0, R)$
$v_t \sim N(0, Q)$
Here:
- the $p + 1$ elements of $X_t$ are the known (time-varying) coefficients of the state-space representation
- the elements of $\theta_t$ are the unobserved underlying state variables
Posterior of theta given V
Kalman filter in the first period:
$P_{1|0} = P_{0|0} + Q$
$K_1 = P_{1|0} X_1 (X_1' P_{1|0} X_1 + R)^{-1}$
$\theta_{1|1} = \theta_{0|0} + K_1 (y_1 - X_1'\theta_{0|0})$
and then iterate:
$P_{t|t-1} = P_{t-1|t-1} + Q$
$K_t = P_{t|t-1} X_t (X_t' P_{t|t-1} X_t + R)^{-1}$
$\theta_{t|t} = \theta_{t-1|t-1} + K_t (y_t - X_t'\theta_{t-1|t-1})$
$P_{t|t} = P_{t|t-1} - K_t X_t' P_{t|t-1}$
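A sketch of the forward recursion, using the array conventions of the next slide (TH(:,t+1) = $\theta_{t|t}$, Pe(:,:,t+1) = $P_{t|t}$, Po(:,:,t) = $P_{t|t-1}$); the inputs th0, P00, y, X, Q, R are assumed names, $y_t$ is treated as a scalar so R is a scalar:

    % Forward Kalman filter (a sketch). Assumed inputs: y (T-by-1), X
    % (T-by-(p+1), row t = [1, y_{t-1}, ..., y_{t-p}]), th0, P00, Q, R.
    TH(:, 1)    = th0;      % theta_{0|0}
    Pe(:, :, 1) = P00;      % P_{0|0}
    for t = 1:T
        x = X(t, :)';                                              % regressors, column
        Po(:, :, t)   = Pe(:, :, t) + Q;                           % P_{t|t-1}
        K             = Po(:, :, t) * x / (x' * Po(:, :, t) * x + R);  % gain K_t
        TH(:, t+1)    = TH(:, t) + K * (y(t) - x' * TH(:, t));    % theta_{t|t}
        Pe(:, :, t+1) = Po(:, :, t) - K * x' * Po(:, :, t);       % P_{t|t}
    end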
Posterior of theta given V
In the Kalman filter part of the assignment:
TH(:,1) = $\theta_0$
TH(:,t+1) = $\theta_{t|t}$
Pe(:,:,t) = $P_{t-1|t-1}$
Po(:,:,t) = $P_{t|t-1}$
and we go up to
TH(:,T+1) = $\theta_{T|T}$
Pe(:,:,T+1) = $P_{T|T}$
Po(:,:,T) = $P_{T|T-1}$
Posterior of theta given V
Distribution of the terminal state:
$f(\theta_T \mid Y^T, V) = N(\theta_{T|T}, P_{T|T})$
From this we get a draw for $\theta_T$.
Posterior of theta given V
Draws for $\theta_{T-1}, \theta_{T-2}, \ldots, \theta_1$ are obtained recursively from
$f(\theta_t \mid \theta_{t+1}, Y^T, V) = N(\theta_{t|t+1}, P_{t|t+1})$
$\theta_{t|t+1} = \theta_{t|t} + P_{t|t} P_{t+1|t}^{-1} (\theta_{t+1} - \theta_{t|t})$
$P_{t|t+1} = P_{t|t} - P_{t|t} P_{t+1|t}^{-1} P_{t|t}$
The terms needed to calculate $\theta_{t|t+1}$ and $P_{t|t+1}$ are generated by the Kalman filter (that is, from going forward) and the standard projection formulas (note that the covariance of $\theta_{t+1}$ and $\theta_t$ is the variance of $\theta_t$).
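A sketch of the backward step, continuing the conventions above; the array `draw` (with draw(:,t+1) holding the draw of $\theta_t$) and np = p + 1 are illustrative names, and the loop also draws $\theta_0$ with the same formula so that $v_1 = \theta_1 - \theta_0$ can be computed later:

    % Backward sampling (a sketch). draw(:,t+1) holds the draw of theta_t.
    draw = zeros(np, T+1);                                            % np = p + 1
    draw(:, T+1) = TH(:, T+1) + chol(Pe(:, :, T+1))' * randn(np, 1);  % theta_T
    for t = T-1:-1:0
        G  = Pe(:, :, t+1) / Po(:, :, t+1);                 % P_{t|t} * inv(P_{t+1|t})
        m  = TH(:, t+1) + G * (draw(:, t+2) - TH(:, t+1));  % theta_{t|t+1}
        Pb = Pe(:, :, t+1) - G * Pe(:, :, t+1);             % P_{t|t+1}
        draw(:, t+1) = m + chol(Pb)' * randn(np, 1);        % draw of theta_t
    end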
Details for previous slide
$E[y \mid x] = \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x)$
$\Rightarrow$
$E[\theta_t \mid \theta_{t+1}] = E[\theta_t] + \Sigma_{\theta_t \theta_{t+1}} \Sigma_{\theta_{t+1}\theta_{t+1}}^{-1} (\theta_{t+1} - E[\theta_{t+1}])$
$= E[\theta_t] + \Sigma_{\theta_t \theta_t} \Sigma_{\theta_{t+1}\theta_{t+1}}^{-1} (\theta_{t+1} - E[\theta_{t+1}])$
$\Rightarrow$
$\theta_{t|t+1} = \theta_{t|t} + P_{t|t} P_{t+1|t}^{-1} (\theta_{t+1} - E[\theta_t + v_t])$
$= \theta_{t|t} + P_{t|t} P_{t+1|t}^{-1} (\theta_{t+1} - E[\theta_t])$
$= \theta_{t|t} + P_{t|t} P_{t+1|t}^{-1} (\theta_{t+1} - \theta_{t|t})$
(The dependence on $Y^t$ and $V$ is suppressed to simplify notation.)
Posterior of theta given V
In the backward part of the assignment we draw from TH(t-1|t). In the for loop below, t goes from high to low. At a particular t:
1. TH(:,t+1) is a random draw from a normal that has already been determined (either in this loop or, for T, above)
2. TH(:,t) on the RHS of the mean equation is equal to theta_{t-1|t-1}
3. TH(:,t) that we end up with is a random draw for theta_{t-1} conditional on knowing theta in the next period
Why go forward & backward?
The Kalman filter gives us $E(\theta_t \mid Y^t, V)$ and $\mathrm{VAR}(\theta_t \mid Y^t, V)$. With this information, we can also obtain draws for $\theta_t$.
However, we need draws from $f(\theta^T \mid Y^T, V)$, not from $f(\theta^T \mid Y^t, V)$. The analysis above showed how to get draws from $f(\theta^T \mid Y^T, V)$ recursively by going backward.
Relation to Kalman Smoother
The Kalman smoother also goes backwards and resembles the procedure here. However, there is a difference.
- The Kalman smoother computes the mean and variance of $f(\theta_t \mid Y^T, V)$.
- We need the mean and variance of $f(\theta_t \mid \theta_{t+1}, Y^T, V)$.
Since $f(\theta_t \mid \theta_{t+1}, Y^T, V) = f(\theta_t \mid \theta_{t+1}, Y^t, V)$, we can calculate these from the Kalman filter without using the Kalman smoother.
Gibbs part II: Posterior of V given theta
The next step is to draw from the posterior given $Y^T$ and $\theta^T$, that is, to get a draw from $p(V \mid Y^T, \theta^T)$.
The posterior combines the prior and information from the data $\Rightarrow$ in each Gibbs iteration the prior is the same but the data set (i.e., $\theta^T$) is different.
Gibbs part II: Posterior of V given theta
Given $Y^T$ and $\theta^T$, we can calculate $\varepsilon_t$ and $v_t$. Both have mean zero and a Normal distribution. Thus
$p(V \mid Y^T, \theta^T) = IW(V_1^{-1}, T_1)$
$T_1 = T_0 + T$
$V_1 = \bar{V} + V_T$
$V_T = \sum_{t=1}^{T} \begin{pmatrix} \varepsilon_t \\ v_t \end{pmatrix} \begin{pmatrix} \varepsilon_t \\ v_t \end{pmatrix}'$
!!! Note that $\bar{V}$, $V_T$, and $V_1$ are like sums of squares, whereas $V$ (and $R$ & $Q$) are like a sum of squares divided by the number of observations (same notation as in CS).
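A sketch of this step, reusing the illustrative names `draw` and `np` from the backward-sampling sketch; Vbar ($\bar{V}$) and T0 are the prior inputs, and the final inverse-Wishart draw is left to whichever sampler the course provides, since its parameterization must be checked (see the earlier slide):

    % Posterior of V given theta (a sketch). draw(:,t+1) holds theta_t.
    VT = zeros(1 + np, 1 + np);
    for t = 1:T
        eps_t = y(t) - X(t, :) * draw(:, t+1);   % epsilon_t = y_t - X_t' theta_t
        v_t   = draw(:, t+1) - draw(:, t);       % v_t = theta_t - theta_{t-1}
        u     = [eps_t; v_t];
        VT    = VT + u * u';                     % sum-of-squares matrix V_T
    end
    V1 = Vbar + VT;                              % posterior scale V_1
    T1 = T0 + T;                                 % posterior degrees of freedom
    % Draw V from IW(V1^{-1}, T1); check the parameterization of the
    % inverse-Wishart sampler you use (see the note on the earlier slide).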
References
Casella, G., and E. I. George, 1992, "Explaining the Gibbs Sampler," The American Statistician, 46, 167-174.
Cogley, T., and T. J. Sargent, 2001, "Evolving Post-World War II U.S. Inflation Dynamics," NBER Macroeconomics Annual 2001, volume 16, 331-338.
De Wind, J., 2014, "Accounting for time-varying and nonlinear relationships in macroeconomic models," dissertation, University of Amsterdam. Available at http://dare.uva.nl/document/523663.
Lancaster, T., 2004, An Introduction to Modern Bayesian Econometrics, Blackwell Publishing. (A very nice textbook covering lots of material in an understandable way.)