Filtering and Likelihood Inference

Filtering and Likelihood Inference. Jesús Fernández-Villaverde, University of Pennsylvania. July 10, 2011.

Motivation
Filtering, smoothing, and forecasting problems are pervasive in economics. Examples:
1. Macroeconomics: evaluating the likelihood of DSGE models.
2. Microeconomics: structural models of individual choice with unobserved heterogeneity.
3. Finance: time-varying variance of asset returns.
However, filtering is a complicated endeavor with no simple and exact algorithm.

Environment I
Discrete time t ∈ {1, 2, ...}. Why discrete time?
1. Economic data are discrete.
2. Easier math.
Comparison with continuous time:
1. Discretize observables.
2. More involved math (stochastic calculus), but often we have extremely powerful results.

Environment II
States S_t. We will focus on continuous state spaces. Comparison with discrete states:
1. Markov-switching models.
2. Jumps versus continuous changes.
The initial state S_0 is either known or comes from p(S_0; γ). Properties of p(S_0; γ)? Stationarity?

State Space Representations
Transition equation: S_t = f(S_{t-1}, W_t; γ)
Measurement equation: Y_t = g(S_t, V_t; γ)
f and g are measurable functions. Interpretation. Modelling origin. Note the Markov structure.

Shocks
{W_t} and {V_t} are independent of each other. {W_t} is known as process noise and {V_t} as measurement noise. W_t and V_t have zero mean. No assumptions on the distributions beyond that. Often, we assume that the variance of W_t is given by Q_t and the variance of V_t by R_t.

DSGE Models and State Space Representations
We have the solution of a DSGE model:
S_t = P_1 S_{t-1} + P_2 Z_t
Y_t = R_1 S_{t-1} + R_2 Z_t
This has nearly the same form as
S_t = f(S_{t-1}, W_t; γ)
Y_t = g(S_t, V_t; γ)
We only need to be careful with:
1. Rewriting the measurement equation in terms of S_t instead of S_{t-1}.
2. How we partition Z_t into W_t and V_t.
Later, we will present an example.

Generalizations I
We can accommodate many generalizations by playing with the state definition:
1. Serial correlation of shocks.
2. Contemporaneous correlation of shocks.
3. Time-changing state space equations.
Often, even infinite histories (for example, in a dynamic game) can be tracked by a Lagrangian multiplier.

Generalizations II
However, some generalizations can be tricky to accommodate. Take the model:
S_t = f(S_{t-1}, W_t; γ)
Y_t = g(S_t, V_t, Y_{t-1}; γ)
Y_t will be an infinite-memory process.

Conditional Densities
From S_t = f(S_{t-1}, W_t; γ), we can compute p(S_t | S_{t-1}; γ). From Y_t = g(S_t, V_t; γ), we can compute p(Y_t | S_t; γ). From both equations together, we have
Y_t = g(f(S_{t-1}, W_t; γ), V_t; γ)
and hence we can compute p(Y_t | S_{t-1}; γ).

Filtering, Smoothing, and Forecasting
Filtering: we are concerned with what we have learned up to the current observation. Smoothing: we are concerned with what we learn from the full sample. Forecasting: we are concerned with future realizations.

Goal of Filtering I
Compute the conditional densities p(S_t | y^{t-1}; γ) and p(S_t | y^t; γ). Why?
1. It allows probability statements regarding the situation of the system.
2. Compute conditional moments: means s_{t|t} and s_{t|t-1}, and variances P_{t|t} and P_{t|t-1}.
3. Other functions of the states. Examples of interest.
Theoretical point: do the conditional densities exist?

Goals of Filtering II
Evaluate the likelihood function of the observables y^T at parameter values γ: p(y^T; γ). Given the Markov structure of our state space representation:
p(y^T; γ) = ∏_{t=1}^{T} p(y_t | y^{t-1}; γ)
Then:
p(y^T; γ) = [∫ p(y_1 | s_1; γ) p(s_1; γ) ds_1] ∏_{t=2}^{T} ∫ p(y_t | s_t; γ) p(s_t | y^{t-1}; γ) ds_t
Hence, knowledge of {p(S_t | y^{t-1}; γ)}_{t=1}^{T} and p(S_1; γ) allows the evaluation of the likelihood of the model.

Two Fundamental Tools
1. Chapman-Kolmogorov equation:
p(S_t | y^{t-1}; γ) = ∫ p(S_t | S_{t-1}; γ) p(S_{t-1} | y^{t-1}; γ) dS_{t-1}
2. Bayes' theorem:
p(S_t | y^t; γ) = p(y_t | S_t; γ) p(S_t | y^{t-1}; γ) / p(y_t | y^{t-1}; γ)
where:
p(y_t | y^{t-1}; γ) = ∫ p(y_t | S_t; γ) p(S_t | y^{t-1}; γ) dS_t

Interpretation
All filtering problems have two steps: prediction and update.
1. The Chapman-Kolmogorov equation is the one-step-ahead predictor.
2. Bayes' theorem updates the conditional density of the states given the new observation.
We can think of these two equations as operators that map measures into measures.

Recursion for the Conditional Distribution
Combining the Chapman-Kolmogorov equation and Bayes' theorem:
p(S_t | y^t; γ) = p(y_t | S_t; γ) ∫ p(S_t | s_{t-1}; γ) p(s_{t-1} | y^{t-1}; γ) ds_{t-1} / ∫ p(y_t | s_t; γ) [∫ p(s_t | s_{t-1}; γ) p(s_{t-1} | y^{t-1}; γ) ds_{t-1}] ds_t
To initiate the recursion, we only need a value for s_0 or p(S_0; γ). Applying the Chapman-Kolmogorov equation once more, we get {p(S_t | y^{t-1}; γ)}_{t=1}^{T} to evaluate the likelihood function.

Initial Conditions I
From the previous discussion, we know that we need a value for s_1 or p(S_1; γ). Stationary models: ergodic distribution. Non-stationary models: more complicated. Importance of transformations. Initialization in the case of the Kalman filter. Forgetting conditions. Non-contraction properties of the Bayes operator.

Smoothing
We are interested in the distribution of the state conditional on all the observations: p(S_t | y^T; γ) and p(y_t | y^T; γ). We compute:
p(S_t | y^T; γ) = p(S_t | y^t; γ) ∫ [p(S_{t+1} | y^T; γ) p(S_{t+1} | S_t; γ) / p(S_{t+1} | y^t; γ)] dS_{t+1}
a backward recursion that we initialize with p(S_T | y^T; γ), {p(S_t | y^t; γ)}_{t=1}^{T}, and {p(S_t | y^{t-1}; γ)}_{t=1}^{T} obtained from filtering.

Forecasting
Applying the Chapman-Kolmogorov equation recursively, we can get p(S_{t+j} | y^t; γ), j ≥ 1. Integrating recursively:
p(y_{l+1} | y^l; γ) = ∫ p(y_{l+1} | s_{l+1}; γ) p(s_{l+1} | y^l; γ) ds_{l+1}
from t+1 to t+j, we get p(y_{t+j} | y^t; γ). Clearly, smoothing and forecasting require solving the filtering problem first!

Problem of Filtering
We have the recursion
p(S_t | y^t; γ) = p(y_t | S_t; γ) ∫ p(S_t | s_{t-1}; γ) p(s_{t-1} | y^{t-1}; γ) ds_{t-1} / ∫ p(y_t | s_t; γ) [∫ p(s_t | s_{t-1}; γ) p(s_{t-1} | y^{t-1}; γ) ds_{t-1}] ds_t
A lot of complicated, high-dimensional integrals (plus the one involved in the likelihood). In general, we do not have closed-form solutions for them. The recursion translates, spreads, and deforms (TSD) the conditional densities in ways that make it impossible to fit them within any known parametric family.

Exception
There is one exception: the linear and Gaussian case. Why? Because if the system is linear and Gaussian, all the conditional probabilities are also Gaussian. Linear and Gaussian state space models translate and spread the conditional distributions, but they do not deform them. For Gaussian distributions, we only need to track the mean and variance (sufficient statistics). The Kalman filter accomplishes this goal efficiently.

Linear Gaussian Case
Consider the following system:
Transition equation: s_t = F s_{t-1} + G ω_t, ω_t ~ N(0, Q)
Measurement equation: y_t = H s_t + υ_t, υ_t ~ N(0, R)
Assume we want to write the likelihood function of y^T = {y_t}_{t=1}^{T}.
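As a point of reference, here is a minimal numpy sketch that simulates data from this linear Gaussian state space; the particular matrices at the bottom are illustrative assumptions (an AR(1) state observed with noise), not values from the slides, and they can later be fed to a Kalman filter.

```python
import numpy as np

def simulate_lgss(F, G, H, Q, R, T, s0, seed=0):
    """Simulate y_1, ..., y_T from s_t = F s_{t-1} + G w_t, y_t = H s_t + v_t."""
    rng = np.random.default_rng(seed)
    n_y = H.shape[0]
    s, ys = s0, np.zeros((T, n_y))
    for t in range(T):
        s = F @ s + G @ rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
        ys[t] = H @ s + rng.multivariate_normal(np.zeros(n_y), R)
    return ys

# Illustrative example (assumed values): scalar AR(1) state observed with noise
F = np.array([[0.9]]); G = np.array([[1.0]]); H = np.array([[1.0]])
Q = np.array([[0.5**2]]); R = np.array([[0.3**2]])
y = simulate_lgss(F, G, H, Q, R, T=200, s0=np.zeros(1))
```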

The State Space Representation Is Not Unique
Take the previous state space representation. Let B be a non-singular square matrix conforming with F. Then, if s*_t = B s_t, F* = B F B^{-1}, G* = B G, and H* = H B^{-1}, we can write a new, equivalent representation:
Transition equation: s*_t = F* s*_{t-1} + G* ω_t, ω_t ~ N(0, Q)
Measurement equation: y_t = H* s*_t + υ_t, υ_t ~ N(0, R)

Example I
AR(2) process: y_t = ρ_1 y_{t-1} + ρ_2 y_{t-2} + σ_υ υ_t, υ_t ~ N(0, 1). The model is apparently not Markovian. However, it is trivial to write it in state space form. In fact, we have many different state space forms.

Example I (continued)
State Space Representation I:
[ y_t ; ρ_2 y_{t-1} ] = [ ρ_1  1 ; ρ_2  0 ] [ y_{t-1} ; ρ_2 y_{t-2} ] + [ σ_υ ; 0 ] υ_t
y_t = [ 1  0 ] [ y_t ; ρ_2 y_{t-1} ]
State Space Representation II:
[ y_t ; y_{t-1} ] = [ ρ_1  ρ_2 ; 1  0 ] [ y_{t-1} ; y_{t-2} ] + [ σ_υ ; 0 ] υ_t
y_t = [ 1  0 ] [ y_t ; y_{t-1} ]
The rotation B = [ 1  0 ; 0  ρ_2 ] applied to the second system yields the first one.
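A minimal numpy sketch of the rotation argument; the values ρ_1 = 0.5, ρ_2 = 0.3, σ_υ = 1 are illustrative assumptions, not numbers from the slides.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from the slides)
rho1, rho2, sigma_v = 0.5, 0.3, 1.0

# Representation II (companion form): state = (y_t, y_{t-1})'
F2 = np.array([[rho1, rho2],
               [1.0,  0.0]])
G2 = np.array([[sigma_v],
               [0.0]])
H2 = np.array([[1.0, 0.0]])

# Rotation B = diag(1, rho2) maps Representation II into Representation I
B = np.diag([1.0, rho2])
F1 = B @ F2 @ np.linalg.inv(B)   # should equal [[rho1, 1], [rho2, 0]]
G1 = B @ G2                      # should equal [[sigma_v], [0]]
H1 = H2 @ np.linalg.inv(B)       # should equal [[1, 0]]
print(F1, G1, H1)
```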

Example II
MA(1) process: y_t = υ_t + θ υ_{t-1}, υ_t ~ N(0, σ²_υ), and E[υ_t υ_s] = 0 for s ≠ t.
State Space Representation I:
[ y_t ; θυ_t ] = [ 0  1 ; 0  0 ] [ y_{t-1} ; θυ_{t-1} ] + [ 1 ; θ ] υ_t
y_t = [ 1  0 ] [ y_t ; θυ_t ]
State Space Representation II:
s_t = υ_{t-1}
y_t = θ s_t + υ_t
Again, both representations are equivalent!

Example III
Now we explore a different issue. Random walk plus drift process:
y_t = y_{t-1} + β + σ_υ υ_t, υ_t ~ N(0, 1)
This is even more interesting: we have a unit root and a constant parameter (the drift).
State Space Representation:
[ y_t ; β ] = [ 1  1 ; 0  1 ] [ y_{t-1} ; β ] + [ σ_υ ; 0 ] υ_t
y_t = [ 1  0 ] [ y_t ; β ]

Some Conditions on the State Space Representation
We only consider stable systems. A system is stable if, for any initial state s_0, the vector of states s_t converges to some unique s*. A necessary and sufficient condition for the system to be stable is that |λ_i(F)| < 1 for all i, where λ_i(F) stands for an eigenvalue of F.

Introducing the Kalman Filter
Developed by Kalman and Bucy. Wide application in science. Basic idea. Prediction, smoothing, and control. Different derivations.

Some Definitions
Definition. Let s_{t|t-1} = E(s_t | y^{t-1}) be the best linear predictor of s_t given the history of observables until t-1, i.e., y^{t-1}.
Definition. Let y_{t|t-1} = E(y_t | y^{t-1}) = H s_{t|t-1} be the best linear predictor of y_t given the history of observables until t-1, i.e., y^{t-1}.
Definition. Let s_{t|t} = E(s_t | y^t) be the best linear predictor of s_t given the history of observables until t, i.e., y^t.

What Is the Kalman Filter Trying to Do?
Assume we have s_{t|t-1} and y_{t|t-1}. We observe a new y_t. We need to obtain s_{t|t}. Note that s_{t+1|t} = F s_{t|t} and y_{t+1|t} = H s_{t+1|t}, so we can go back to the first step and wait for y_{t+1}. Therefore, the key question is how to obtain s_{t|t} from s_{t|t-1} and y_t.

A Minimization Approach to the Kalman Filter
Assume we use the following equation to get s_{t|t} from y_t and s_{t|t-1}:
s_{t|t} = s_{t|t-1} + K_t (y_t − y_{t|t-1}) = s_{t|t-1} + K_t (y_t − H s_{t|t-1})
This formula will have a probabilistic justification (to follow). K_t is called the Kalman filter gain and it measures how much we update s_{t|t-1} as a function of our error in predicting y_t. The question is how to find the optimal K_t. The Kalman filter is about how to build K_t such that we optimally update s_{t|t} from s_{t|t-1} and y_t. How do we find the optimal K_t?

Some Additional Definitions
Definition. Let Σ_{t|t-1} ≡ E[(s_t − s_{t|t-1})(s_t − s_{t|t-1})' | y^{t-1}] be the prediction error variance-covariance matrix of s_t given the history of observables until t-1, i.e., y^{t-1}.
Definition. Let Ω_{t|t-1} ≡ E[(y_t − y_{t|t-1})(y_t − y_{t|t-1})' | y^{t-1}] be the prediction error variance-covariance matrix of y_t given the history of observables until t-1, i.e., y^{t-1}.
Definition. Let Σ_{t|t} ≡ E[(s_t − s_{t|t})(s_t − s_{t|t})' | y^t] be the prediction error variance-covariance matrix of s_t given the history of observables until t, i.e., y^t.

The Kalman Filter Algorithm I
Given Σ_{t|t-1}, y_t, and s_{t|t-1}, we can now set up the Kalman filter algorithm. Given Σ_{t|t-1}, we compute:
Ω_{t|t-1} ≡ E[(y_t − y_{t|t-1})(y_t − y_{t|t-1})' | y^{t-1}]
= E[(H(s_t − s_{t|t-1}) + υ_t)(H(s_t − s_{t|t-1}) + υ_t)' | y^{t-1}]
= H Σ_{t|t-1} H' + R
since the cross terms between s_t − s_{t|t-1} and υ_t have zero expectation.

The Kalman Filter Algorithm II
Given Σ_{t|t-1}, we compute:
E[(y_t − y_{t|t-1})(s_t − s_{t|t-1})' | y^{t-1}] = E[(H(s_t − s_{t|t-1}) + υ_t)(s_t − s_{t|t-1})' | y^{t-1}] = H Σ_{t|t-1}
Given Σ_{t|t-1}, we compute:
K_t = Σ_{t|t-1} H' (H Σ_{t|t-1} H' + R)^{-1}
Given Σ_{t|t-1}, s_{t|t-1}, K_t, and y_t, we compute:
s_{t|t} = s_{t|t-1} + K_t (y_t − H s_{t|t-1})

Finding the Optimal Gain
We want the K_t that minimizes Σ_{t|t}. Thus:
K_t = Σ_{t|t-1} H' (H Σ_{t|t-1} H' + R)^{-1}
with the optimal update of s_{t|t} given y_t and s_{t|t-1} being:
s_{t|t} = s_{t|t-1} + K_t (y_t − H s_{t|t-1})
Intuition: note that we can rewrite K_t as K_t = Σ_{t|t-1} H' Ω_{t|t-1}^{-1}.
1. If we made a big mistake forecasting s_{t|t-1} using past information (Σ_{t|t-1} large), we give a lot of weight to the new information (K_t large).
2. If the new information is noise (R large), we give a lot of weight to the old prediction (K_t small).

Example
Assume the following model in state space form:
Transition equation: s_t = µ + ω_t, ω_t ~ N(0, σ²_ω)
Measurement equation: y_t = s_t + υ_t, υ_t ~ N(0, σ²_υ)
Let σ²_υ = q σ²_ω. Then, if Σ_{1|0} = σ²_ω (s_1 is drawn from the ergodic distribution of s_t):
K_1 = σ²_ω (σ²_ω (1 + q))^{-1} = 1/(1 + q)
Therefore, the bigger σ²_υ relative to σ²_ω (the bigger q), the lower K_1 and the less we trust y_1.

The Kalman Filter Algorithm III
Given Σ_{t|t-1}, s_{t|t-1}, K_t, and y_t, we compute:
Σ_{t|t} ≡ E[(s_t − s_{t|t})(s_t − s_{t|t})' | y^t]
= E[((s_t − s_{t|t-1}) − K_t(y_t − H s_{t|t-1}))((s_t − s_{t|t-1}) − K_t(y_t − H s_{t|t-1}))' | y^t]
= Σ_{t|t-1} − K_t H Σ_{t|t-1}
where we used s_t − s_{t|t} = (s_t − s_{t|t-1}) − K_t(y_t − H s_{t|t-1}).

The Kalman Filter Algorithm IV
Given Σ_{t|t}, we compute:
Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'
Given s_{t|t}, we compute:
1. s_{t+1|t} = F s_{t|t}
2. y_{t+1|t} = H s_{t+1|t}
Therefore, from s_{t|t-1}, Σ_{t|t-1}, and y_t we compute s_{t|t} and Σ_{t|t}. We also compute y_{t|t-1} and Ω_{t|t-1} to help (later) to calculate the likelihood function of y^T = {y_t}_{t=1}^{T}.

The Kalman Filter Algorithm: A Review
We start with s_{t|t-1} and Σ_{t|t-1}. Then, we observe y_t and compute:
Ω_{t|t-1} = H Σ_{t|t-1} H' + R
y_{t|t-1} = H s_{t|t-1}
K_t = Σ_{t|t-1} H' (H Σ_{t|t-1} H' + R)^{-1}
Σ_{t|t} = Σ_{t|t-1} − K_t H Σ_{t|t-1}
s_{t|t} = s_{t|t-1} + K_t (y_t − H s_{t|t-1})
Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'
s_{t+1|t} = F s_{t|t}
We finish with s_{t+1|t} and Σ_{t+1|t}.
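A minimal numpy sketch of one pass of this recursion; the function name kalman_step and its interface are illustrative, not from the slides.

```python
import numpy as np

def kalman_step(y_t, s_pred, Sigma_pred, F, G, H, Q, R):
    """One Kalman filter iteration: from (s_{t|t-1}, Sigma_{t|t-1}) and y_t
    to (s_{t|t}, Sigma_{t|t}) and the next-period predictions."""
    # Prediction of the observable and its variance
    y_pred = H @ s_pred                          # y_{t|t-1} = H s_{t|t-1}
    Omega  = H @ Sigma_pred @ H.T + R            # Omega_{t|t-1}
    # Kalman gain
    K = Sigma_pred @ H.T @ np.linalg.inv(Omega)  # K_t
    # Update
    s_filt     = s_pred + K @ (y_t - y_pred)         # s_{t|t}
    Sigma_filt = Sigma_pred - K @ H @ Sigma_pred     # Sigma_{t|t}
    # Next-period prediction
    s_next     = F @ s_filt                          # s_{t+1|t}
    Sigma_next = F @ Sigma_filt @ F.T + G @ Q @ G.T  # Sigma_{t+1|t}
    return s_filt, Sigma_filt, s_next, Sigma_next, y_pred, Omega
```

Iterating this step over t = 1, ..., T and accumulating the Gaussian log density of each prediction error (next slide) gives the likelihood.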

Writing the Likelihood Function
Likelihood function of y^T = {y_t}_{t=1}^{T}:
log p(y^T | F, G, H, Q, R) = Σ_{t=1}^{T} log p(y_t | y^{t-1}; F, G, H, Q, R)
= − Σ_{t=1}^{T} [ (N/2) log 2π + (1/2) log |Ω_{t|t-1}| + (1/2) ς_t' Ω_{t|t-1}^{-1} ς_t ]
where ς_t = y_t − y_{t|t-1} = y_t − H s_{t|t-1} is white noise and Ω_{t|t-1} = H Σ_{t|t-1} H' + R.
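A small numpy helper consistent with the formula above: it computes the period-t contribution to the log likelihood from the prediction error ς_t and its variance Ω_{t|t-1} (the function name is illustrative; y_pred and Omega can be taken from a filter pass such as the one sketched before).

```python
import numpy as np

def loglik_contribution(y_t, y_pred, Omega):
    """Period-t Gaussian log-likelihood term:
    -N/2 log 2*pi - 1/2 log|Omega_{t|t-1}| - 1/2 zeta' Omega^{-1} zeta."""
    zeta = y_t - y_pred                       # prediction error
    N = y_t.shape[0]
    _, logdet = np.linalg.slogdet(Omega)
    quad = zeta @ np.linalg.solve(Omega, zeta)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)
```

Summing these terms over t = 1, ..., T gives log p(y^T | F, G, H, Q, R).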

Initial Conditions for the Kalman Filter
An important step in the Kalman filter is to set the initial conditions s_{1|0} and Σ_{1|0}. Where do they come from? Since we only consider stable systems, the standard approach is to set:
s_{1|0} = s*
Σ_{1|0} = Σ*
where s* solves s* = F s* and Σ* solves Σ* = F Σ* F' + G Q G'. How do we find Σ*?
vec(Σ*) = [I − F ⊗ F]^{-1} vec(G Q G')
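A minimal numpy/scipy sketch computing Σ* both from the vec formula above and from a discrete Lyapunov solver as a cross-check; the matrices F, G, Q are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative stable system (assumed values, not from the slides)
F = np.array([[0.9, 0.1],
              [0.0, 0.5]])
G = np.eye(2)
Q = np.diag([0.04, 0.01])

# Sigma* via the vec formula: vec(Sigma*) = (I - F kron F)^{-1} vec(G Q G')
n = F.shape[0]
vec_Sigma = np.linalg.solve(np.eye(n * n) - np.kron(F, F),
                            (G @ Q @ G.T).reshape(-1, order="F"))
Sigma_star = vec_Sigma.reshape(n, n, order="F")

# Cross-check with the Lyapunov solver for Sigma = F Sigma F' + G Q G'
Sigma_check = solve_discrete_lyapunov(F, G @ Q @ G.T)
assert np.allclose(Sigma_star, Sigma_check)
```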

Initial Conditions for the Kalman Filter II
Under the following conditions:
1. The system is stable, i.e., all eigenvalues of F are strictly less than one in absolute value.
2. G Q G' and R are positive semi-definite and symmetric.
3. Σ_{1|0} is positive semi-definite and symmetric.
Then Σ_{t+1|t} → Σ*.

Remarks
1. There are more general theorems than the one just described.
2. Those theorems cover non-stable systems.
3. Since we are going to work with stable systems, the former theorem is enough.
4. The last theorem gives us a way to find Σ*, since Σ_{t+1|t} → Σ* for any Σ_{1|0} we start with.

The Kalman Filter and DSGE Models
Basic real business cycle model:
max E_0 Σ_{t=0}^{∞} β^t [log c_t + ψ log(1 − l_t)]
Equilibrium conditions:
1/c_t = β E_t [(1/c_{t+1}) (α k_{t+1}^{α−1} (e^{z_{t+1}} l_{t+1})^{1−α} + 1 − δ)]
ψ c_t / (1 − l_t) = (1 − α) k_t^α (e^{z_t} l_t)^{1−α} / l_t
c_t + k_{t+1} = k_t^α (e^{z_t} l_t)^{1−α} + (1 − δ) k_t
z_t = ρ z_{t-1} + σ ε_t, ε_t ~ N(0, 1)

The Kalman Filter and Linearized DSGE Models
We loglinearize (or linearize) the equilibrium conditions around the steady state. Alternative: the particle filter. We assume that we have data on:
1. log output_t
2. log l_t
3. log c_t
subject to a linearly additive measurement error V_t = (v_{1,t}, v_{2,t}, v_{3,t})'. Why measurement error? Stochastic singularity. Degrees of freedom in the measurement equation.

Policy Functions
We need to write the model in state space form. Remember that a loglinear solution has the form:
k̂_{t+1} = p_1 k̂_t + p_2 z_t
and
output̂_t = q_1 k̂_t + q_2 z_t
l̂_t = r_1 k̂_t + r_2 z_t
ĉ_t = u_1 k̂_t + u_2 z_t
where hats denote log deviations from the steady state.

Writing the Likelihood Function
Transition equation (s_t = F s_{t-1} + G ε_t):
[ 1 ; k̂_t ; z_t ] = [ 1  0  0 ; 0  p_1  p_2 ; 0  0  ρ ] [ 1 ; k̂_{t-1} ; z_{t-1} ] + [ 0 ; 0 ; σ ] ε_t
Measurement equation (y_t = H s_t + υ_t):
[ log output_t ; log l_t ; log c_t ] = [ log y  q_1  q_2 ; log l  r_1  r_2 ; log c  u_1  u_2 ] [ 1 ; k̂_t ; z_t ] + [ v_{1,t} ; v_{2,t} ; v_{3,t} ]
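A small numpy sketch assembling F, G, H, Q, R from the policy-function coefficients; every numerical value below is a placeholder assumption (in practice the coefficients come from the loglinear solution of the model).

```python
import numpy as np

# Placeholder policy-function coefficients and steady-state logs (assumed)
p1, p2 = 0.95, 0.10          # capital policy
q1, q2 = 0.30, 1.10          # output policy
r1, r2 = -0.20, 0.70         # labor policy
u1, u2 = 0.50, 0.40          # consumption policy
rho, sigma = 0.95, 0.007     # technology process
log_y, log_l, log_c = 0.5, -1.1, 0.2   # steady-state logs

# State s_t = (1, k_hat_t, z_t)'
F = np.array([[1.0, 0.0, 0.0],
              [0.0, p1,  p2 ],
              [0.0, 0.0, rho]])
G = np.array([[0.0], [0.0], [sigma]])
Q = np.array([[1.0]])                  # epsilon_t ~ N(0, 1)

# Observables: (log output_t, log l_t, log c_t)'
H = np.array([[log_y, q1, q2],
              [log_l, r1, r2],
              [log_c, u1, u2]])
R = np.diag([1e-4, 1e-4, 1e-4])        # measurement-error variances (assumed)
```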

The Solution to the Model in State Space Form
Now, with y^T, F, G, H, Q, and R as defined before, we can use the Riccati equations to evaluate the likelihood function:
log p(y^T | γ) = log p(y^T | F, G, H, Q, R)
where γ = {α, β, ρ, ψ, δ, σ}. Cross-equation restrictions are implied by the equilibrium solution. With the likelihood, we can do inference!

Nonlinear Filtering
Different approaches.
Deterministic filtering:
1. Kalman family.
2. Grid-based filtering.
Simulation filtering:
1. MCMC.
2. Sequential Monte Carlo.

Kalman Family of Filters
Use the ideas of Kalman filtering in nonlinear and/or non-Gaussian filtering problems. Non-optimal filters. Different implementations:
1. Extended Kalman filter.
2. Iterated extended Kalman filter.
3. Second-order extended Kalman filter.
4. Unscented Kalman filter.

The Extended Kalman Filter
The EKF is historically the first descendant of the Kalman filter. The EKF deals with nonlinearities by taking a first-order approximation to the system and applying the Kalman filter to this approximation. Non-Gaussianities are ignored.

Algorithm
Given s_{t-1|t-1}, the prediction is s_{t|t-1} = f(s_{t-1|t-1}, 0; γ). Then:
P_{t|t-1} = Q_{t-1} + F_t P_{t-1|t-1} F_t'
where F_t = ∂f(S_{t-1}, W_t; γ)/∂S_{t-1} evaluated at S_{t-1} = s_{t-1|t-1}, W_t = 0.
The Kalman gain K_t is:
K_t = P_{t|t-1} G_t' (G_t P_{t|t-1} G_t' + R_t)^{-1}
where G_t = ∂g(S_t, V_t; γ)/∂S_t evaluated at S_t = s_{t|t-1}, V_t = 0.
Then:
s_{t|t} = s_{t|t-1} + K_t (y_t − g(s_{t|t-1}, 0; γ))
P_{t|t} = P_{t|t-1} − K_t G_t P_{t|t-1}
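A minimal numpy sketch of one EKF iteration following the algorithm above; the interface (user-supplied f, g and Jacobian functions df_ds, dg_ds) is an illustrative assumption.

```python
import numpy as np

def ekf_step(y_t, s_filt_prev, P_filt_prev, f, g, df_ds, dg_ds, Q, R, gamma):
    """One extended Kalman filter iteration. f, g are the transition and
    measurement functions; df_ds, dg_ds return their Jacobians w.r.t. the state."""
    # Prediction step (shocks set to their zero mean)
    s_pred = f(s_filt_prev, np.zeros(Q.shape[0]), gamma)
    F_t = df_ds(s_filt_prev, gamma)                 # Jacobian at s_{t-1|t-1}, W_t = 0
    P_pred = Q + F_t @ P_filt_prev @ F_t.T
    # Update step
    G_t = dg_ds(s_pred, gamma)                      # Jacobian at s_{t|t-1}, V_t = 0
    K_t = P_pred @ G_t.T @ np.linalg.inv(G_t @ P_pred @ G_t.T + R)
    s_filt = s_pred + K_t @ (y_t - g(s_pred, np.zeros(R.shape[0]), gamma))
    P_filt = P_pred - K_t @ G_t @ P_pred
    return s_filt, P_filt
```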

Problems of the EKF
1. It ignores the non-Gaussianities of W_t and V_t.
2. It ignores the non-Gaussianities of the states' distribution.
3. Approximation error is incurred by the linearization.
4. Biased estimates of the mean and variance.
5. We need to compute Jacobians and Hessians.
As the sample size grows, those errors accumulate and the filter diverges.

Iterated Extended Kalman Filter I
Compute s_{t|t-1} and P_{t|t-1} as in the EKF. Setting s_{t|t}^0 = s_{t|t-1}, iterate N times on:
K_t^i = P_{t|t-1} G_t^{i'} (G_t^i P_{t|t-1} G_t^{i'} + R_t)^{-1}
where
G_t^i = ∂g(S_t, V_t; γ)/∂S_t evaluated at S_t = s_{t|t}^{i-1}, V_t = 0
and
s_{t|t}^i = s_{t|t-1} + K_t^i (y_t − g(s_{t|t-1}, 0; γ))
Why are we iterating? How many times? Then:
s_{t|t} = s_{t|t}^N
P_{t|t} = P_{t|t-1} − K_t^N G_t^N P_{t|t-1}

Second-Order Extended Kalman Filter
We keep the second-order terms of the Taylor expansion of the transition and measurement equations. Theoretically, it is less biased than the EKF. Messy algebra. In practice, not much improvement.

Unscented Kalman Filter I
A recent proposal by Julier and Uhlmann (1996). Based around the unscented transform. A set of sigma points is selected to preserve some properties of the conditional distribution (for example, the first two moments). Then, those points are transformed and the properties of the new conditional distribution are computed. The UKF computes the conditional mean and variance accurately up to a third-order approximation if the shocks W_t and V_t are Gaussian, and up to a second-order approximation if they are not. The sigma points are chosen deterministically, not by simulation as in a Monte Carlo method. The UKF has the advantage over the EKF that no Jacobians or Hessians are required, objects that may be difficult to compute.

New State Variable
We modify the state space by creating a new augmented state variable:
S_t^a = [S_t, W_t, V_t]
that includes the pure state space and the two random variables W_t and V_t. We initialize the filter with
s_{0|0}^a = E(S_0^a) = (E(S_0), 0, 0)'
P_{0|0}^a = [ P_{0|0}  0  0 ; 0  Q_0  0 ; 0  0  R_0 ]

Sigma Points
Let L be the dimension of the augmented state variable. At each t − 1, we calculate the 2L + 1 sigma points:
S_{0,t-1|t-1} = s_{t-1|t-1}
S_{i,t-1|t-1} = s_{t-1|t-1} − ((L + λ) P_{t-1|t-1})^{0.5}_i  for i = 1, ..., L
S_{i,t-1|t-1} = s_{t-1|t-1} + ((L + λ) P_{t-1|t-1})^{0.5}_{i−L}  for i = L + 1, ..., 2L
where (·)^{0.5}_i denotes the i-th column of a matrix square root.

Parameters
λ = α²(L + κ) − L is a scaling parameter. α determines the spread of the sigma points and must belong to the unit interval. κ is a secondary parameter usually set equal to zero. Notation for the components of each sigma point:
S_i = [S_i^s, S_i^w, S_i^v] for i = 0, ..., 2L

Weights
Weights for each sigma point:
W_0^m = λ/(L + λ)
W_0^c = λ/(L + λ) + (1 − α² + β)
W_i^m = W_i^c = 1/(2(L + λ)) for i = 1, ..., 2L
β incorporates knowledge regarding the conditional distributions. For Gaussian distributions, β = 2 is optimal.

Algorithm I: Prediction of States
We propagate the pure-state block of each sigma point through the transition equation:
S_{i,t|t-1}^s = f(S_{i,t-1|t-1}^s, S_{i,t-1|t-1}^w; γ)
Weighted state:
s_{t|t-1} = Σ_{i=0}^{2L} W_i^m S_{i,t|t-1}^s
Weighted variance:
P_{t|t-1} = Σ_{i=0}^{2L} W_i^c (S_{i,t|t-1}^s − s_{t|t-1})(S_{i,t|t-1}^s − s_{t|t-1})'

Algorithm II: Prediction of Observables
Predicted sigma observables:
Y_{i,t|t-1} = g(S_{i,t|t-1}^s, S_{i,t|t-1}^v; γ)
Predicted observable:
y_{t|t-1} = Σ_{i=0}^{2L} W_i^m Y_{i,t|t-1}

Algorithm III: Update
Variance-covariance matrices:
P_{yy,t} = Σ_{i=0}^{2L} W_i^c (Y_{i,t|t-1} − y_{t|t-1})(Y_{i,t|t-1} − y_{t|t-1})'
P_{xy,t} = Σ_{i=0}^{2L} W_i^c (S_{i,t|t-1}^s − s_{t|t-1})(Y_{i,t|t-1} − y_{t|t-1})'
Kalman gain:
K_t = P_{xy,t} P_{yy,t}^{-1}

Algorithm IV: Update
Update of the state:
s_{t|t} = s_{t|t-1} + K_t (y_t − y_{t|t-1})
Update of the variance:
P_{t|t} = P_{t|t-1} − K_t P_{yy,t} K_t'
Finally, the augmented covariance is rebuilt as:
P_{t|t}^a = [ P_{t|t}  0  0 ; 0  Q_t  0 ; 0  0  R_t ]
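A compact numpy/scipy sketch of one pass of Algorithms I-IV on the augmented state. The function signature, the use of a symmetric matrix square root, and the default values for α, β, κ are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm

def ukf_step(y_t, s_filt, P_filt, f, g, Q, R, gamma, alpha=0.9, beta=2.0, kappa=0.0):
    """One unscented Kalman filter iteration on the augmented state [S, W, V]."""
    n_s, n_w, n_v = s_filt.size, Q.shape[0], R.shape[0]
    L = n_s + n_w + n_v
    lam = alpha**2 * (L + kappa) - L

    # Augmented mean and covariance: diag(P, Q, R)
    s_a = np.concatenate([s_filt, np.zeros(n_w), np.zeros(n_v)])
    P_a = np.zeros((L, L))
    P_a[:n_s, :n_s] = P_filt
    P_a[n_s:n_s + n_w, n_s:n_s + n_w] = Q
    P_a[n_s + n_w:, n_s + n_w:] = R

    # Sigma points (center, minus, plus) and weights
    root = np.real(sqrtm((L + lam) * P_a))
    sigma = np.vstack([s_a, s_a - root, s_a + root])      # (2L+1, L)
    Wm = np.full(2 * L + 1, 1.0 / (2 * (L + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1 - alpha**2 + beta)

    # Prediction of states and observables
    S_pred = np.array([f(p[:n_s], p[n_s:n_s + n_w], gamma) for p in sigma])
    s_pred = Wm @ S_pred
    P_pred = sum(Wc[i] * np.outer(S_pred[i] - s_pred, S_pred[i] - s_pred)
                 for i in range(2 * L + 1))
    Y_pred = np.array([g(S_pred[i], sigma[i, n_s + n_w:], gamma)
                       for i in range(2 * L + 1)])
    y_pred = Wm @ Y_pred

    # Update
    P_yy = sum(Wc[i] * np.outer(Y_pred[i] - y_pred, Y_pred[i] - y_pred)
               for i in range(2 * L + 1))
    P_xy = sum(Wc[i] * np.outer(S_pred[i] - s_pred, Y_pred[i] - y_pred)
               for i in range(2 * L + 1))
    K = P_xy @ np.linalg.inv(P_yy)
    s_new = s_pred + K @ (y_t - y_pred)
    P_new = P_pred - K @ P_yy @ K.T
    return s_new, P_new
```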

Grid-Based Filtering
Remember that we have the recursion
p(s_t | y^t; γ) = p(y_t | s_t; γ) ∫ p(s_t | s_{t-1}; γ) p(s_{t-1} | y^{t-1}; γ) ds_{t-1} / ∫ p(y_t | s_t; γ) [∫ p(s_t | s_{t-1}; γ) p(s_{t-1} | y^{t-1}; γ) ds_{t-1}] ds_t
This recursion requires the evaluation of three integrals. This suggests the possibility of addressing the problem by computing those integrals with a deterministic procedure, such as a grid method. Kitagawa (1987) and Kramer and Sorenson (1988).

Grid-Based Filtering I
We divide the state space into N cells with center points {s_t^i : i = 1, ..., N}. We substitute the exact conditional densities with discrete densities that put all the mass at the points {s_t^i}_{i=1}^{N}. We denote by δ(x) a Dirac function with mass at 0.

Grid-Based Filtering II
Then, the approximated distributions and weights are:
p(s_t | y^{t-1}; γ) ≈ Σ_{i=1}^{N} ω_{t|t-1}^i δ(s_t − s_t^i)
p(s_t | y^t; γ) ≈ Σ_{i=1}^{N} ω_{t|t}^i δ(s_t − s_t^i)
ω_{t|t-1}^i = Σ_{j=1}^{N} ω_{t-1|t-1}^j p(s_t^i | s_{t-1}^j; γ)
ω_{t|t}^i = ω_{t|t-1}^i p(y_t | s_t^i; γ) / Σ_{j=1}^{N} ω_{t|t-1}^j p(y_t | s_t^j; γ)

Approximated Recursion
p(s_t | y^t; γ) ≈ Σ_{i=1}^{N} [ (Σ_{j=1}^{N} ω_{t-1|t-1}^j p(s_t^i | s_{t-1}^j; γ)) p(y_t | s_t^i; γ) / Σ_{k=1}^{N} (Σ_{j=1}^{N} ω_{t-1|t-1}^j p(s_t^k | s_{t-1}^j; γ)) p(y_t | s_t^k; γ) ] δ(s_t − s_t^i)
Compare with
p(s_t | y^t; γ) = p(y_t | s_t; γ) ∫ p(s_t | s_{t-1}; γ) p(s_{t-1} | y^{t-1}; γ) ds_{t-1} / ∫ p(y_t | s_t; γ) [∫ p(s_t | s_{t-1}; γ) p(s_{t-1} | y^{t-1}; γ) ds_{t-1}] ds_t
given that p(s_{t-1} | y^{t-1}; γ) ≈ Σ_{i=1}^{N} ω_{t-1|t-1}^i δ(s_{t-1} − s_{t-1}^i).
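A minimal numpy/scipy sketch of the grid recursion for a scalar state. The model below (a Gaussian AR(1) state observed with Gaussian noise), the grid, and all parameter values are illustrative assumptions used only to make the weight updates concrete.

```python
import numpy as np
from scipy.stats import norm

# Illustrative scalar model: s_t = rho*s_{t-1} + w_t, y_t = s_t + v_t
rho, sigma_w, sigma_v = 0.9, 0.5, 0.3
grid = np.linspace(-5.0, 5.0, 200)             # cell center points s^i
weights = np.full(grid.size, 1.0 / grid.size)  # omega_{0|0}: uniform prior

def grid_filter_step(y_t, weights):
    # Prediction: omega_{t|t-1}^i = sum_j omega_{t-1|t-1}^j p(s^i | s^j)
    trans = norm.pdf(grid[:, None], loc=rho * grid[None, :], scale=sigma_w)
    w_pred = trans @ weights
    # Update: reweight by the measurement density p(y_t | s^i) and normalize
    w_filt = w_pred * norm.pdf(y_t, loc=grid, scale=sigma_v)
    return w_filt / w_filt.sum()

weights = grid_filter_step(0.7, weights)       # one step with an example y_t
print((grid * weights).sum())                  # filtered mean E[s_t | y^t]
```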

Problems
Grid filters require a constant readjustment to small changes in the model or its parameter values. They are too computationally expensive to be of any practical benefit beyond very low dimensions. Grid points are fixed ex ante and the results are very dependent on that choice. Can we overcome those difficulties and preserve the idea of integration? Yes, through Monte Carlo integration.

Particle Filtering
Remember:
1. Transition equation: S_t = f(S_{t-1}, W_t; γ)
2. Measurement equation: Y_t = g(S_t, V_t; γ)
Some assumptions:
1. We can partition {W_t} into two independent sequences {W_{1,t}} and {W_{2,t}}, such that W_t = (W_{1,t}, W_{2,t}) and dim(W_{2,t}) + dim(V_t) ≥ dim(Y_t).
2. We can always evaluate the conditional densities p(y_t | W_1^t, y^{t-1}, S_0; γ).
3. The model assigns positive probability to the data.

Rewriting the Likelihood Function
Evaluate the likelihood function of a sequence of realizations of the observable, y^T, at a particular parameter value γ: p(y^T; γ). We factorize it as:
p(y^T; γ) = ∏_{t=1}^{T} p(y_t | y^{t-1}; γ)
= ∏_{t=1}^{T} ∫∫ p(y_t | W_1^t, y^{t-1}, S_0; γ) p(W_1^t, S_0 | y^{t-1}; γ) dW_1^t dS_0

A Law of Large Numbers
If {{s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^{N}}_{t=1}^{T} are N i.i.d. draws from {p(W_1^t, S_0 | y^{t-1}; γ)}_{t=1}^{T}, then:
p(y^T; γ) ≈ ∏_{t=1}^{T} (1/N) Σ_{i=1}^{N} p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)
The problem of evaluating the likelihood is equivalent to the problem of drawing from {p(W_1^t, S_0 | y^{t-1}; γ)}_{t=1}^{T}.

Introducing Particles
{s_0^{t-1,i}, w_1^{t-1,i}}_{i=1}^{N} are N i.i.d. draws from p(W_1^{t-1}, S_0 | y^{t-1}; γ). Each (s_0^{t-1,i}, w_1^{t-1,i}) is a particle, and {s_0^{t-1,i}, w_1^{t-1,i}}_{i=1}^{N} a swarm of particles.
{s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^{N} are N i.i.d. draws from p(W_1^t, S_0 | y^{t-1}; γ). Each (s_0^{t|t-1,i}, w_1^{t|t-1,i}) is a proposed particle, and {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^{N} a swarm of proposed particles.
Weights:
q_t^i = p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ) / Σ_{i=1}^{N} p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

A Proposition
Let {s̃_0^i, w̃_1^i}_{i=1}^{N} be a draw with replacement from {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^{N} with probabilities q_t^i. Then {s̃_0^i, w̃_1^i}_{i=1}^{N} is a draw from p(W_1^t, S_0 | y^t; γ).
Importance of the proposition:
1. It shows how a draw {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^{N} from p(W_1^t, S_0 | y^{t-1}; γ) can be used to draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^{N} from p(W_1^t, S_0 | y^t; γ).
2. With a draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^{N} from p(W_1^t, S_0 | y^t; γ), we can use p(W_{1,t+1}; γ) to get a draw {s_0^{t+1|t,i}, w_1^{t+1|t,i}}_{i=1}^{N} and iterate the procedure.

Sequential Monte Carlo
Step 0, Initialization: Set t = 1 and p(W_1^{t-1}, S_0 | y^{t-1}; γ) = p(S_0; γ).
Step 1, Prediction: Sample N values {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^{N} from the density p(W_1^t, S_0 | y^{t-1}; γ) = p(W_{1,t}; γ) p(W_1^{t-1}, S_0 | y^{t-1}; γ).
Step 2, Weighting: Assign to each draw (s_0^{t|t-1,i}, w_1^{t|t-1,i}) the weight q_t^i.
Step 3, Sampling: Draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^{N} with replacement from {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^{N} with probabilities {q_t^i}_{i=1}^{N}. If t < T, set t = t + 1 and go to Step 1. Otherwise, go to Step 4.
Step 4, Likelihood: Use {{s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^{N}}_{t=1}^{T} to compute:
p(y^T; γ) ≈ ∏_{t=1}^{T} (1/N) Σ_{i=1}^{N} p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

A Trivial Application
How do we evaluate the likelihood function p(y^T | α, β, σ) of the nonlinear, non-Gaussian process:
s_t = α + β s_{t-1}/(1 + s_{t-1}) + w_t
y_t = s_t + v_t
where w_t ~ N(0, σ) and v_t ~ t(2), given some observables y^T = {y_t}_{t=1}^{T} and s_0?

A Trivial Application (continued)
1. Let s_0^{0,i} = s_0 for all i.
2. Generate N i.i.d. draws {s_0^{1|0,i}, w^{1|0,i}}_{i=1}^{N}, with each w^{1|0,i} drawn from N(0, σ).
3. Evaluate p(y_1 | w^{1|0,i}, y^0, s_0^{1|0,i}; γ) = p_{t(2)}(y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i})).
4. Evaluate the relative weights q_1^i = p_{t(2)}(y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i})) / Σ_{i=1}^{N} p_{t(2)}(y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i})).
5. Resample with replacement N values of {s_0^{1|0,i}, w^{1|0,i}}_{i=1}^{N} with relative weights q_1^i. Call those sampled values {s_0^{1,i}, w^{1,i}}_{i=1}^{N}.
6. Repeat steps 2-5 for each period until the end of the sample.
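A minimal numpy/scipy sketch of the bootstrap particle filter just described, specialized to this model; the function name, the default N, and treating σ as a standard deviation are illustrative assumptions.

```python
import numpy as np
from scipy.stats import t as student_t

def particle_filter_loglik(y, alpha, beta, sigma, s0, N=10_000, seed=0):
    """Bootstrap particle filter for s_t = alpha + beta*s_{t-1}/(1+s_{t-1}) + w_t,
    y_t = s_t + v_t, with w_t ~ N(0, sigma) and v_t ~ t(2).
    Returns an estimate of log p(y^T | alpha, beta, sigma)."""
    rng = np.random.default_rng(seed)
    s = np.full(N, s0)              # every particle starts at the known s_0
    loglik = 0.0
    for y_t in y:
        # Prediction: propagate each particle with a fresh draw of w_t
        w = rng.normal(0.0, sigma, size=N)
        s_prop = alpha + beta * s / (1.0 + s) + w
        # Weighting: t(2) measurement density at the prediction error
        weights = student_t.pdf(y_t - s_prop, df=2)
        loglik += np.log(weights.mean())
        # Resampling with replacement proportional to the weights
        idx = rng.choice(N, size=N, p=weights / weights.sum())
        s = s_prop[idx]
    return loglik
```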

A Law of Large Numbers
A law of large numbers delivers:
p(y_1 | y^0, α, β, σ) ≈ (1/N) Σ_{i=1}^{N} p(y_1 | w^{1|0,i}, y^0, s_0^{1|0,i})
and consequently:
p(y^T | α, β, σ) ≈ ∏_{t=1}^{T} (1/N) Σ_{i=1}^{N} p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i})