Open Economy Macroeconomics: Theory, methods and applications Lecture 4: The state space representation and the Kalman Filter Hernán D. Seoane UC3M January, 2016
Today's lecture: State space representation; The Kalman Filter
Today's lecture: Some references
Hamilton (2000), Ch. 13
Bauer, Haltom and Rubio-Ramirez (2003), "Using the Kalman Filter to Smooth the Shocks of a Dynamic Stochastic General Equilibrium Model"
Bauer, Haltom and Rubio-Ramirez (2005), "Smoothing the Shocks of a Dynamic Stochastic General Equilibrium Model"
Ljungqvist and Sargent (2012), Advanced Macroeconomic Theory, Ch. 2
Kim and Nelson (1999), State-Space Models with Regime-Switching, Ch. 2 and Ch. 3
State space representation
x_{t+1} = F x_t + v_{t+1}
y_t = H' x_t + w_t
where y_t is the vector of variables observed at t and x_t is the vector of unobserved variables, the state vector. F and H are coefficient matrices of the required dimensions. The first equation is the state (or transition) equation; the second is the measurement (or observation) equation.
State Space Representation
v_t and w_t are uncorrelated, normally distributed white noise vectors, with
E(v_t v_t') = Q
E(w_t w_t') = R
State Space Representation
The representation is not unique. Suppose B is a non-singular square matrix conformable with F. Define x*_t = B x_t, F* = B F B^{-1} and H*' = H' B^{-1}. Then
x*_{t+1} = F* x*_t + B v_{t+1}
y_t = H*' x*_t + w_t
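The non-uniqueness above can be checked numerically. The following is a minimal NumPy sketch (all parameter values are made-up illustrative choices, not from the slides): it simulates the original system and the transformed system side by side with the same shocks and confirms they generate identical observables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative system matrices (hypothetical values)
F = np.array([[0.9, 0.1],
              [0.0, 0.5]])
H = np.array([1.0, 2.0])                  # row vector H'
B = np.array([[2.0, 1.0],
              [0.0, 1.0]])               # any non-singular matrix
Binv = np.linalg.inv(B)

Fs = B @ F @ Binv                         # F*  = B F B^{-1}
Hs = H @ Binv                             # H*' = H' B^{-1}

x = np.array([1.0, -1.0])                 # original state
xs = B @ x                                # transformed state x* = B x
y_orig, y_trans = [], []
for _ in range(50):
    v = rng.standard_normal(2)
    x = F @ x + v                         # x_{t+1}  = F x_t + v_{t+1}
    xs = Fs @ xs + B @ v                  # x*_{t+1} = F* x*_t + B v_{t+1}
    y_orig.append(H @ x)                  # y_t = H' x_t
    y_trans.append(Hs @ xs)               # y_t = H*' x*_t
```

Both observable paths coincide up to floating-point error, illustrating that the two representations are observationally equivalent.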
State Space Representation
Example: consider the AR(2) process
y_t = ρ_1 y_{t-1} + ρ_2 y_{t-2} + w_t
Define x_t = [y_t  y_{t-1}]'. Then the transition equation is
x_t = [ ρ_1  ρ_2 ; 1  0 ] x_{t-1} + [ 1 ; 0 ] w_t
and the measurement equation is
y_t = [1  0] x_t
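The AR(2) example can be verified with a short simulation. This is a sketch in NumPy (the values of ρ_1 and ρ_2 are arbitrary illustrative picks inside the stationary region): it generates the same shock sequence through the direct AR(2) recursion and through the state-space form, and the two paths coincide.

```python
import numpy as np

# Hypothetical stationary AR(2) parameters (not from the slides)
rho1, rho2 = 0.5, 0.3
T = 200

rng = np.random.default_rng(0)
w = rng.standard_normal(T)

# Direct recursion: y_t = rho1*y_{t-1} + rho2*y_{t-2} + w_t
y = np.zeros(T)
for t in range(2, T):
    y[t] = rho1 * y[t - 1] + rho2 * y[t - 2] + w[t]

# State-space form with x_t = [y_t, y_{t-1}]'
F = np.array([[rho1, rho2],
              [1.0,  0.0]])
G = np.array([1.0, 0.0])          # loads w_t into the first state
H = np.array([1.0, 0.0])          # measurement: y_t = [1 0] x_t

x = np.zeros(2)                   # x_1 = [y_1, y_0]' = 0
y_ss = np.zeros(T)
for t in range(2, T):
    x = F @ x + G * w[t]          # transition equation
    y_ss[t] = H @ x               # measurement equation
```

The assertion that `y` and `y_ss` agree is a direct check that the state-space representation reproduces the AR(2) process.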
Some preliminary stuff
Suppose we want to forecast based on conditional expectations: forecast the value of Y_{t+1} based on variables X_t. Suppose we restrict the forecast to be a linear function, α' X_t, and that we can find α such that the forecast error Y_{t+1} - α' X_t is uncorrelated with X_t. Then α' X_t is called the linear projection of Y_{t+1} on X_t.
Some notation
Let x_{t+1|t} = Ê(x_{t+1} | y^t) be the linear projection of x_{t+1} on y^t (the history of observables up to t) and a constant.
Let y_{t+1|t} = Ê(y_{t+1} | y^t) = H' x_{t+1|t} be the linear projection of y_{t+1} on y^t and a constant.
Let P_{t+1|t} = E[(x_{t+1} - x_{t+1|t})(x_{t+1} - x_{t+1|t})'] be the mean squared error of the forecast of x_{t+1}.
Let Σ_{t+1|t} = E[(y_{t+1} - y_{t+1|t})(y_{t+1} - y_{t+1|t})'] = H' P_{t+1|t} H + R be the mean squared error of the forecast of y_{t+1}.
How does it work?
The Kalman Filter starts from an assumed initial state condition. Suppose we have x_{t|t-1} and y_{t|t-1}. When we observe y_t, we update the inference to x_{t|t}. With x_{t|t} we can compute x_{t+1|t} = F x_{t|t} and y_{t+1|t} = H' x_{t+1|t}. At that point the information set has been fully updated and we just wait for the new observation y_{t+1}.
Forecasting y_t
Suppose we have x_{t|t-1} and P_{t|t-1}, and we observe a new realization of the data, y_t. We want to use this new information to obtain x_{t+1|t} and P_{t+1|t}. Let's first find the forecast of y_t: ŷ_{t|t-1} = Ê(y_t | y^{t-1}). Note that Ê(y_t | x_t) = H' x_t, so ŷ_{t|t-1} = H' Ê(x_t | y^{t-1}) = H' x_{t|t-1}. Since x_{t|t-1} is known from the previous iteration, the forecast of y_t follows immediately.
Forecasting y_t
The error of this forecast is
y_t - ŷ_{t|t-1} = H' x_t + w_t - H' x_{t|t-1} = H' (x_t - x_{t|t-1}) + w_t
with MSE
E[(y_t - ŷ_{t|t-1})(y_t - ŷ_{t|t-1})'] = E[H' (x_t - x_{t|t-1})(x_t - x_{t|t-1})' H] + E[w_t w_t']
or
E[(y_t - ŷ_{t|t-1})(y_t - ŷ_{t|t-1})'] = H' P_{t|t-1} H + R
Updating the inference about x_t
The inference about x_t is updated on the basis of the evidence in y_t to produce x_{t|t} = Ê(x_t | y_t, y^{t-1}) = Ê(x_t | y^t). This follows from the formula for updating a linear projection:
x_{t|t} = x_{t|t-1} + E[(x_t - x_{t|t-1})(y_t - ŷ_{t|t-1})'] {E[(y_t - ŷ_{t|t-1})(y_t - ŷ_{t|t-1})']}^{-1} (y_t - ŷ_{t|t-1})
x_{t|t} = x_{t|t-1} + E[(x_t - x_{t|t-1})(H' (x_t - x_{t|t-1}) + w_t)'] {E[(y_t - ŷ_{t|t-1})(y_t - ŷ_{t|t-1})']}^{-1} (y_t - ŷ_{t|t-1})
x_{t|t} = x_{t|t-1} + E[(x_t - x_{t|t-1})(x_t - x_{t|t-1})'] H {E[(y_t - ŷ_{t|t-1})(y_t - ŷ_{t|t-1})']}^{-1} (y_t - ŷ_{t|t-1})
x_{t|t} = x_{t|t-1} + P_{t|t-1} H {H' P_{t|t-1} H + R}^{-1} (y_t - H' x_{t|t-1})
The MSE associated with this update is
P_{t|t} = P_{t|t-1} - P_{t|t-1} H (H' P_{t|t-1} H + R)^{-1} H' P_{t|t-1}
Producing a forecast of x_{t+1}
Now we want to forecast x_{t+1|t} = Ê(x_{t+1} | y^t):
x_{t+1|t} = Ê(x_{t+1} | y^t) = F Ê(x_t | y^t) + Ê(v_{t+1} | y^t) = F x_{t|t} + 0
Plugging in our previous findings,
x_{t+1|t} = F ( x_{t|t-1} + P_{t|t-1} H {H' P_{t|t-1} H + R}^{-1} (y_t - H' x_{t|t-1}) )
x_{t+1|t} = F x_{t|t-1} + F P_{t|t-1} H {H' P_{t|t-1} H + R}^{-1} (y_t - H' x_{t|t-1})
Define K_t = F P_{t|t-1} H (H' P_{t|t-1} H + R)^{-1}
How does it work?
Hence x_{t+1|t} = F x_{t|t-1} + K_t (y_t - H' x_{t|t-1}). This is an updating equation after observing y_t. K_t is the Kalman gain and determines how much weight is allocated to the new information; it is the K_t that minimizes the mean squared forecast error. The corresponding MSE recursion is P_{t+1|t} = F P_{t|t} F' + Q.
The algorithm
Given x_{t|t-1} and P_{t|t-1} and observation y_t, the Kalman Filter algorithm is as follows:
y_{t|t-1} = H' x_{t|t-1}
Σ_{t|t-1} = H' P_{t|t-1} H + R
x_{t|t} = x_{t|t-1} + P_{t|t-1} H [H' P_{t|t-1} H + R]^{-1} (y_t - y_{t|t-1})
P_{t|t} = P_{t|t-1} - P_{t|t-1} H [H' P_{t|t-1} H + R]^{-1} H' P_{t|t-1}
x_{t+1|t} = F x_{t|t}
P_{t+1|t} = F P_{t|t} F' + Q
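The six steps above map line by line into code. The following is a minimal NumPy sketch of the recursion (the function name `kalman_filter` and the interface are my choices, not from the slides; H is stored so that y_t = H' x_t, matching the notation here):

```python
import numpy as np

def kalman_filter(y, F, H, Q, R, x0, P0):
    """Run the Kalman Filter recursion of the slide.
    y: (T, n) observations; H: (k, n) so that y_t = H' x_t + w_t;
    x0, P0: x_{1|0} and P_{1|0}."""
    T = y.shape[0]
    x_pred, P_pred = x0.copy(), P0.copy()
    x_filt, P_filt = [], []
    for t in range(T):
        y_hat = H.T @ x_pred                       # y_{t|t-1} = H' x_{t|t-1}
        Sigma = H.T @ P_pred @ H + R               # Sigma_{t|t-1}
        gain = P_pred @ H @ np.linalg.inv(Sigma)   # P_{t|t-1} H Sigma^{-1}
        x_upd = x_pred + gain @ (y[t] - y_hat)     # x_{t|t}
        P_upd = P_pred - gain @ H.T @ P_pred       # P_{t|t}
        x_filt.append(x_upd)
        P_filt.append(P_upd)
        x_pred = F @ x_upd                         # x_{t+1|t} = F x_{t|t}
        P_pred = F @ P_upd @ F.T + Q               # P_{t+1|t} = F P_{t|t} F' + Q
    return np.array(x_filt), np.array(P_filt)
```

For a stationary system the sequence P_{t|t} converges, which is why the filter is often initialized at the steady-state P discussed below.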
Intuition about K_t
Recall K_t = F P_{t|t-1} H (H' P_{t|t-1} H + R)^{-1}. Rewrite it as
K_t = F P_{t|t-1} H (Σ_{t|t-1})^{-1}
If our forecast x_{t|t-1} was very uncertain, i.e. P_{t|t-1} is large, then K_t is large, which means we put a lot of weight on the new information.
Note that, intuitively, we start from an initial condition and then use the observables to update our forecast of the unobserved variables. Where does the initial condition come from? Where do we start the system?
We focus on stationary processes and initialize the algorithm at the steady state:
x_{1|0} = x̄
P_{1|0} = P̄
where
x̄ = F x̄
P̄ = F P̄ F' + Q
The second expression is a discrete Lyapunov equation and can be solved iteratively or using Kronecker products.
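The Kronecker-product solution uses the vec identity vec(F P̄ F') = (F ⊗ F) vec(P̄), so vec(P̄) = (I - F ⊗ F)^{-1} vec(Q). A minimal NumPy sketch (function name and test matrices are illustrative; valid when all eigenvalues of F lie inside the unit circle):

```python
import numpy as np

def lyapunov_kron(F, Q):
    """Solve P = F P F' + Q via vec(P) = (I - F kron F)^{-1} vec(Q).
    Uses column-major (Fortran-order) vectorization to match the vec identity."""
    k = F.shape[0]
    vecP = np.linalg.solve(np.eye(k * k) - np.kron(F, F),
                           Q.flatten(order="F"))
    return vecP.reshape((k, k), order="F")
```

The solution can be verified by substituting it back into the Lyapunov equation; for large systems an iterative doubling scheme is usually cheaper than forming the k² x k² Kronecker matrix.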
So far
We now have a way of recovering filtered estimates of the unobserved components conditional on the data up to t. We can try to do better than that: when running the Kalman Filter we already have the whole sequence of observables up to period T. We can therefore try to recover smoothed estimates of the unobserved variables x^T = {x_t}_{t=1}^T by computing their value conditional on the whole sample, y^T.
So far
We are looking for x_{t|T} = Ê(x_t | y^T). This procedure is called smoothing, and we do it using the Kalman Smoother. The inputs for the Kalman Smoother are all obtained from the Kalman Filter.
The Kalman Smoother
Suppose we know x_{t+1}. Using the formula for updating linear projections,
Ê(x_t | x_{t+1}, y^t) = x_{t|t} + E[(x_t - x_{t|t})(x_{t+1} - x_{t+1|t})'] P_{t+1|t}^{-1} (x_{t+1} - x_{t+1|t})
Here
E[(x_t - x_{t|t})(x_{t+1} - x_{t+1|t})'] = E[(x_t - x_{t|t})(F x_t + v_{t+1} - F x_{t|t})']
Since the projection error x_t - x_{t|t} is uncorrelated with v_{t+1}, this equals
E[(x_t - x_{t|t})(x_t - x_{t|t})'] F' = P_{t|t} F'
Hence
Ê(x_t | x_{t+1}, y^t) = x_{t|t} + J_t (x_{t+1} - x_{t+1|t}), with J_t = P_{t|t} F' P_{t+1|t}^{-1}
The Kalman Smoother
Now, this linear projection Ê(x_t | x_{t+1}, y^t) is the same as Ê(x_t | x_{t+1}, y^T). This is true because
y_{t+j} = H' ( F^{j-1} x_{t+1} + F^{j-2} v_{t+2} + ... + v_{t+j} ) + w_{t+j}
for all j ≥ 1, and the error x_t - Ê(x_t | x_{t+1}, y^t) is uncorrelated with x_{t+1} (by definition of a linear projection) and with v_{t+2}, ..., v_{t+j} and w_{t+j} (by our maintained assumptions). Once we know x_{t+1}, the additional data contain no further information: the error x_t - Ê(x_t | x_{t+1}, y^t) is uncorrelated with y_{t+j} for all j > 0. Therefore
Ê(x_t | x_{t+1}, y^T) = Ê(x_t | x_{t+1}, y^t) = x_{t|t} + J_t (x_{t+1} - x_{t+1|t})
The Kalman Smoother
Finally, integrating out x_{t+1},
Ê(x_t | y^T) = Ê[ Ê(x_t | x_{t+1}, y^T) | y^T ] = x_{t|t} + J_t ( Ê(x_{t+1} | y^T) - x_{t+1|t} ) = x_{t|t} + J_t (x_{t+1|T} - x_{t+1|t})
Algorithm
Run the Kalman Filter and keep {x_{t|t}}_{t=1}^T, {x_{t+1|t}}_{t=0}^{T-1}, {P_{t|t}}_{t=1}^T and {P_{t+1|t}}_{t=0}^{T-1}. Note that the last entry of {x_{t|t}}_{t=1}^T is x_{T|T}, which starts the recursion. We have all the information to form J_t = P_{t|t} F' P_{t+1|t}^{-1}, which we use in
x_{t|T} = x_{t|t} + J_t (x_{t+1|T} - x_{t+1|t})
to obtain x_{t|T}. Finally, we iterate backwards from t = T-1 down to t = 1.
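The forward filter pass plus the backward recursion can be sketched in one self-contained function (names and interface are my illustrative choices; the forward loop repeats the filter of the previous slides so the example runs on its own):

```python
import numpy as np

def kalman_smoother(y, F, H, Q, R, x0, P0):
    """Filter forward, then smooth backward with
    x_{t|T} = x_{t|t} + J_t (x_{t+1|T} - x_{t+1|t}),  J_t = P_{t|t} F' P_{t+1|t}^{-1}."""
    T = y.shape[0]
    x_pred, P_pred = [x0], [P0]          # x_{t|t-1}, P_{t|t-1}
    x_filt, P_filt = [], []              # x_{t|t},   P_{t|t}
    for t in range(T):                   # forward pass: the Kalman Filter
        Sigma = H.T @ P_pred[t] @ H + R
        gain = P_pred[t] @ H @ np.linalg.inv(Sigma)
        x_filt.append(x_pred[t] + gain @ (y[t] - H.T @ x_pred[t]))
        P_filt.append(P_pred[t] - gain @ H.T @ P_pred[t])
        x_pred.append(F @ x_filt[t])
        P_pred.append(F @ P_filt[t] @ F.T + Q)
    x_sm = [None] * T
    x_sm[-1] = x_filt[-1]                # x_{T|T} starts the backward recursion
    for t in range(T - 2, -1, -1):
        J = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t + 1])
        x_sm[t] = x_filt[t] + J @ (x_sm[t + 1] - x_pred[t + 1])
    return np.array(x_sm)
```

As a sanity check, when the state is observed exactly (H = I and R = 0) the filtered state equals the data, P_{t|t} = 0 forces J_t = 0, and the smoothed path coincides with the observations.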
Implications
Given that the innovations are Gaussian and the system is linear,
[ x_t ; y_t ] | y^{t-1} ~ N( [ x_{t|t-1} ; y_{t|t-1} ], [ P_{t|t-1}, P_{t|t-1} H ; H' P_{t|t-1}, H' P_{t|t-1} H + R ] )
This implies that x_t | y^t ~ N(x_{t|t}, P_{t|t}). A consequence of this is that y_t | y^{t-1} is also normally distributed, with mean y_{t|t-1} and variance Σ_{t|t-1}.
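Because y_t | y^{t-1} ~ N(y_{t|t-1}, Σ_{t|t-1}), the filter delivers the Gaussian log-likelihood as a by-product: the prediction-error decomposition sums the log densities of the one-step-ahead forecast errors. A minimal NumPy sketch (the function name `kalman_loglik` is my choice; this is the standard decomposition, not a formula stated on the slides):

```python
import numpy as np

def kalman_loglik(y, F, H, Q, R, x0, P0):
    """Gaussian log-likelihood of y via the prediction-error decomposition:
    sum_t log N(y_t; y_{t|t-1}, Sigma_{t|t-1}), computed inside the filter."""
    T, n = y.shape
    x_pred, P_pred = x0.copy(), P0.copy()
    ll = 0.0
    for t in range(T):
        e = y[t] - H.T @ x_pred                    # prediction error
        Sigma = H.T @ P_pred @ H + R               # its variance
        ll += -0.5 * (n * np.log(2 * np.pi)
                      + np.log(np.linalg.det(Sigma))
                      + e @ np.linalg.solve(Sigma, e))
        gain = P_pred @ H @ np.linalg.inv(Sigma)   # update step of the filter
        x_upd = x_pred + gain @ e
        P_upd = P_pred - gain @ H.T @ P_pred
        x_pred = F @ x_upd
        P_pred = F @ P_upd @ F.T + Q
    return ll
```

This is what makes the Kalman Filter the workhorse for likelihood-based estimation of linearized DSGE models: maximizing (or sampling) this function over the structural parameters that generate F, H, Q and R.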