State-Space Models: Initialization, Estimation and Smoothing of the Kalman Filter
Initialization of the Kalman Filter

The Kalman filter shows how to update past predictors and the corresponding prediction error variances when a new observation becomes available, that is, how to move from time (t-1) to time t. Application of the Kalman filter therefore requires an initial predictor $\hat\beta_0$ and the corresponding covariance matrix, $P_0 = E[(\hat\beta_0 - \beta_0)(\hat\beta_0 - \beta_0)']$. Several efficient procedures have been proposed in the literature, but since under certain regularity conditions and for sufficiently long series the current predictors do not depend on the initialization process, simple procedures can usually be applied.
Initialization procedure

A simple initialization procedure is as follows (see the sketch after this list):

1- Initialize the nonstationary components $\beta_{0,j}$ of the state vector by any value (say, $\beta_{0,j} = 0$) and a very large variance, $P_{0,j} = E(\hat\beta_{0,j} - \beta_{0,j})^2 = M$ (large relative to the magnitude of the series).

2- Initialize the stationary components by their (unconditional) mean and variance.

The use of step 1 corresponds to the use of a noninformative (diffuse) prior distribution for the corresponding parameters under a Bayesian paradigm.
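A minimal sketch of this procedure in Python; the function name, the flag convention, and the default value M = 1e7 are illustrative assumptions, not part of the source:

```python
import numpy as np

def initialize_filter(nonstationary, stat_means, stat_vars, M=1e7):
    """Return (beta0_hat, P0) for a state vector whose j-th component is
    nonstationary when nonstationary[j] is True.

    Nonstationary components get an arbitrary value (0) and a huge
    variance M (a diffuse prior); stationary components get their
    unconditional mean and variance."""
    p = len(nonstationary)
    beta0 = np.zeros(p)
    P0 = np.zeros((p, p))
    k = 0  # index into the stationary means/variances
    for j, diffuse in enumerate(nonstationary):
        if diffuse:
            P0[j, j] = M          # "large relative to the series"
        else:
            beta0[j] = stat_means[k]
            P0[j, j] = stat_vars[k]
            k += 1
    return beta0, P0
```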
Example of Initialization

Consider the BSM, but suppose that the observed series $\{y_t,\ t = 1, 2, \dots\}$ consists of survey estimators and hence is subject to sampling errors: $y_t = Y_t + e_t$, where $Y_t$ defines the corresponding population value and $e_t$ is the sampling error, assumed to follow an AR(2) process. The state-space model is therefore:

Observation Equation: $y_t = L_t + S_t + e_t + \varepsilon_t$

Transition Equation:
$$L_t = L_{t-1} + R_{t-1} + \eta_{1t}; \qquad R_t = R_{t-1} + \eta_{2t},$$
$$S_t = -S_{t-1} - S_{t-2} - S_{t-3} + \eta_{3t} \quad \text{(or trigonometric model)},$$
$$e_t = \phi_1 e_{t-1} + \phi_2 e_{t-2} + \eta_{4t}.$$
State-space representation of the extended BSM (quarterly series)

Let $\beta_t = (L_t, R_t, S_t, S_{t-1}, S_{t-2}, e_t, e_{t-1})'$.

Observation Equation: $y_t = (1, 0, 1, 0, 0, 1, 0)\,\beta_t + \varepsilon_t$

Transition Equation:
$$
\begin{pmatrix} L_t \\ R_t \\ S_t \\ S_{t-1} \\ S_{t-2} \\ e_t \\ e_{t-1} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & -1 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \phi_1 & \phi_2 \\
0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} L_{t-1} \\ R_{t-1} \\ S_{t-1} \\ S_{t-2} \\ S_{t-3} \\ e_{t-1} \\ e_{t-2} \end{pmatrix}
+
\begin{pmatrix} \eta_{1t} \\ \eta_{2t} \\ \eta_{3t} \\ 0 \\ 0 \\ \eta_{4t} \\ 0 \end{pmatrix},
$$
i.e., $\beta_t = T\beta_{t-1} + \eta_t$, with
$$Q = \mathrm{Cov}(\eta_t) = \mathrm{diag}(q_1, q_2, q_3, 0, 0, q_4, 0).$$
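As an illustration, a sketch of how the system matrices of this representation could be built; the function name and argument names (phi1, phi2, q1..q4) are our own:

```python
import numpy as np

def extended_bsm_matrices(phi1, phi2, q1, q2, q3, q4):
    """System matrices of the extended BSM with AR(2) sampling error,
    state = (L, R, S_t, S_{t-1}, S_{t-2}, e_t, e_{t-1})."""
    X = np.array([1., 0., 1., 0., 0., 1., 0.])   # observation vector
    T = np.zeros((7, 7))                         # transition matrix
    T[0, 0] = T[0, 1] = 1.                       # L_t = L_{t-1} + R_{t-1}
    T[1, 1] = 1.                                 # R_t = R_{t-1}
    T[2, 2:5] = -1.                              # S_t = -S_{t-1}-S_{t-2}-S_{t-3}
    T[3, 2] = T[4, 3] = 1.                       # shift the lagged seasonals
    T[5, 5], T[5, 6] = phi1, phi2                # e_t follows an AR(2)
    T[6, 5] = 1.                                 # shift the lagged error
    Q = np.diag([q1, q2, q3, 0., 0., q4, 0.])    # state noise covariance
    return X, T, Q
```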
Possible initialization for the extended BSM

$\beta_t = (L_t, R_t, S_t, S_{t-1}, S_{t-2}, e_t, e_{t-1})'$. Except for $e_t$ and $e_{t-1}$, the other 5 components of $\beta_t$ are nonstationary. Also, as we know by now,
$$E(e_t) = 0, \qquad Var(e_t) = V_e = \frac{q_4}{1 - \rho_1\phi_1 - \rho_2\phi_2}.$$
Hence: $\hat\beta_0 = (0, 0, 0, 0, 0, 0, 0)'$, and
$$P_0 = \mathrm{diag}(M, M, M, M, M, V_e, V_e),$$
where M is a large number.
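A sketch of this initialization, assuming the AR(2) autocorrelations $\rho_1, \rho_2$ are obtained from the Yule-Walker equations ($\rho_1 = \phi_1/(1-\phi_2)$, $\rho_2 = \phi_2 + \phi_1\rho_1$); the function names and the default M are illustrative:

```python
import numpy as np

def ar2_variance(phi1, phi2, q4):
    """V_e = q4 / (1 - rho1*phi1 - rho2*phi2), with the AR(2)
    autocorrelations taken from the Yule-Walker equations."""
    rho1 = phi1 / (1. - phi2)
    rho2 = phi2 + phi1 * rho1
    return q4 / (1. - rho1 * phi1 - rho2 * phi2)

def initial_moments(phi1, phi2, q4, M=1e7):
    """Zero initial state; diffuse variance M for the five nonstationary
    components; stationary variance V_e for e_0 and e_{-1}."""
    Ve = ar2_variance(phi1, phi2, q4)
    beta0 = np.zeros(7)
    P0 = np.diag([M, M, M, M, M, Ve, Ve])
    return beta0, P0
```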
Estimation of hyper-parameters

So far we assumed that the matrices $X_t$, $T_t$, $\Sigma_t$ and $Q_t$ are known. In practice, the elements of $\Sigma_t$ and $Q_t$ are usually unknown, and possibly also some of the elements of the transition matrix $T_t$. (In the previous example, $T$ contains $\phi_1$ and $\phi_2$.) Denote by $\lambda$ the vector of all the unknown parameters, often referred to as hyper-parameters. Under the classical (frequentist) approach, these parameters are estimated from the observed series using Maximum Likelihood (ML) or other standard techniques. Unlike in classical statistics, the observations $\{Y_t,\ t = 1, \dots, N\}$ are not independent, and hence the likelihood is not a simple product of the densities of the individual observations.
Prediction Error Decomposition

Denote $Y_0 = \beta_0$ and let $Y_{(t)} = (Y_t, Y_{t-1}, \dots, Y_0)$ denote the vector of observations up to and including time t. The following decomposition follows from repeated application of Bayes' theorem, where $f(\cdot)$ denotes the corresponding probability density functions (pdf):
$$
f[Y_{(N)}] = f(Y_N, Y_{N-1}, \dots, Y_1, Y_0)
= f[Y_N \mid Y_{(N-1)}]\, f[Y_{(N-1)}]
= f[Y_N \mid Y_{(N-1)}]\, f[Y_{N-1} \mid Y_{(N-2)}]\, f[Y_{(N-2)}]
= \dots = \prod_{t=2}^{N} f[Y_t \mid Y_{(t-1)}] \cdot f[Y_1 \mid \beta_0]\, f(\beta_0) \qquad (\beta_0 = Y_0).
$$
Assuming that $\varepsilon_t$ and $\eta_t$ are normal, and likewise for $\beta_0$, each conditional distribution $f[Y_t \mid Y_{(t-1)}]$ is also normal:
$$Y_t \mid Y_{(t-1)} \sim N(\hat Y_{t|t-1}, F_t), \qquad F_t = (X_t P_{t|t-1} X_t' + \Sigma_t) = Var(Y_t - \hat Y_{t|t-1}).$$
Estimation of hyper-parameters (cont.)

$$f[Y_{(N)}] = \prod_{t=2}^{N} f[Y_t \mid Y_{(t-1)}] \cdot f[Y_1 \mid \beta_0]\, f(\beta_0); \qquad Y_t \mid Y_{(t-1)} \sim N(\hat Y_{t|t-1}, F_t),$$
$$F_t = (X_t P_{t|t-1} X_t' + \Sigma_t) = Var(Y_t - \hat Y_{t|t-1}) = Var(e_t).$$

The log-likelihood therefore has the form
$$\log[L(\lambda; Y_{(N)})] = -\frac{nN}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{N}\log|F_t| - \frac{1}{2}\sum_{t=1}^{N} e_t' F_t^{-1} e_t + \log f(\beta_0) \qquad [n = \dim(Y_t)].$$

We need to specify $f(\beta_0)$, which amounts to the initialization of the Kalman filter.

Possibilities:

1- Fix $\beta_0$ and add the unknown elements of $\beta_0$ to the vector $\lambda$. This means that we drop the term $\log f(\beta_0)$ from the likelihood. If $\beta_0$ is of high dimension [$p = \dim(\beta_0)$ is large], we end up with many more unknown parameters.
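A minimal sketch of evaluating this log-likelihood by the prediction error decomposition: run the Kalman filter forward and accumulate $\log|F_t|$ and $e_t' F_t^{-1} e_t$ (in the spirit of possibility 1, with $\beta_0$ treated as fixed, so the $\log f(\beta_0)$ term is dropped). Time-invariant system matrices are assumed, and all names are our own:

```python
import numpy as np

def kalman_loglik(y, X, T, Q, Sigma, beta0, P0):
    """y: (N, n) observations; X: (n, p); T: (p, p); Q: (p, p);
    Sigma: (n, n) observation noise covariance."""
    beta, P = beta0, P0
    loglik = 0.0
    n = y.shape[1]
    for yt in y:
        # one-step prediction of the state
        beta_pred = T @ beta
        P_pred = T @ P @ T.T + Q
        # innovation e_t and its variance F_t
        e = yt - X @ beta_pred
        F = X @ P_pred @ X.T + Sigma
        Finv = np.linalg.inv(F)
        loglik += -0.5 * (n * np.log(2 * np.pi)
                          + np.log(np.linalg.det(F))
                          + e @ Finv @ e)
        # measurement update (filtering step)
        K = P_pred @ X.T @ Finv
        beta = beta_pred + K @ e
        P = P_pred - K @ X @ P_pred
    return loglik
```

In practice this function would be handed to a numerical optimizer and maximized over the elements of $\lambda$, as discussed below.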
Initialization of the likelihood (cont.)

2- Regress $Y_1, \dots, Y_m$ as a function of $\beta_m$ to obtain $\hat\beta_m$ and $P_m$, and base the likelihood on $\prod_{t=m+1}^{N} f[Y_t \mid Y_{(t-1)}]$. The value of m is set to the smallest value that yields an estimator with a proper covariance matrix $P_m$.

Example: If $n > p$, we can write
$$\hat\beta_1 = (X_1'\Sigma_1^{-1}X_1)^{-1} X_1'\Sigma_1^{-1} Y_1, \qquad P_1 = (X_1'\Sigma_1^{-1}X_1)^{-1}$$
($\Sigma_1$ is unknown), and base the likelihood on $\prod_{t=2}^{N} f[Y_t \mid Y_{(t-1)}]$. If $n \le p$ we need to sacrifice more observations.

3- Use the initialization procedure discussed before (and assume a distribution for $\beta_0$). Here again we need to sacrifice the first m observations until we get an estimator $\hat\beta_m$ with a proper covariance matrix $P_m$.
Maximum likelihood estimation (cont.)

Having established the log-likelihood, it is then maximized with respect to the elements contained in $\lambda$ (the hyper-parameters). The maximization process also yields the asymptotic covariance matrix of the estimators via the inverse of the information matrix.
Example: MLE for AR(1) models

Consider the AR(1) model $Z_t = \phi Z_{t-1} + \varepsilon_t$, where $Z_t = (Y_t - \mu)$:
$$E(Z_t \mid Z_{t-1}) = \phi Z_{t-1}, \qquad Var(Z_t \mid Z_{t-1}) = \sigma^2 = Var(\varepsilon_t).$$
We assume that $\varepsilon_t \sim N(0, \sigma^2)$. Then
$$f(Z_N, \dots, Z_1) = \prod_{t=2}^{N} f(Z_t \mid Z_{t-1})\, f(Z_1) = (2\pi\sigma^2)^{-\frac{N-1}{2}} \exp\Big[-\frac{1}{2\sigma^2}\sum_{t=2}^{N}(Z_t - \phi Z_{t-1})^2\Big]\, f(Z_1).$$
Hence,
$$\log(L) = -\frac{N-1}{2}\log(2\pi) - \frac{N-1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{N}(Z_t - \phi Z_{t-1})^2 + \log[f(Z_1)].$$
We need to specify $f(Z_1)$.
Specification of $f(Z_1)$ for the AR(1) model

Possibilities:

1- Fix $Z_1$ and maximize the likelihood ignoring the last term ($f(Z_1)$ = const.).

2- Assume $Z_1 \sim N[0, \sigma_Z^2]$, where $\sigma_Z^2 = \sigma^2/(1-\phi^2)$, and add
$$\log[f(Z_1)] = -\frac{1}{2}\log(2\pi\sigma_Z^2) - \frac{Z_1^2}{2\sigma_Z^2}$$
to the likelihood (implemented in the sketch below).

If we substitute $(Y_t - \mu)$ for $Z_t$ everywhere in the likelihood, we can also maximize with respect to $\mu$.
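A sketch of option 2: exact ML for the AR(1) model including the stationary density of $Z_1$, maximizing over $(\mu, \phi, \sigma^2)$. The parametrization through $\log\sigma^2$ (to keep the variance positive) and the simulated data are our own choices:

```python
import numpy as np
from scipy.optimize import minimize

def ar1_negloglik(params, y):
    """Negative exact AR(1) log-likelihood in (mu, phi, log sigma^2)."""
    mu, phi, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)
    z = y - mu
    n = len(z)
    # conditional Gaussian terms for t = 2..N
    resid = z[1:] - phi * z[:-1]
    ll = (-0.5 * (n - 1) * np.log(2 * np.pi * sigma2)
          - 0.5 * np.sum(resid**2) / sigma2)
    # stationary density of Z_1: N(0, sigma^2 / (1 - phi^2))
    var1 = sigma2 / (1. - phi**2)
    ll += -0.5 * np.log(2 * np.pi * var1) - 0.5 * z[0]**2 / var1
    return -ll

# usage sketch on simulated data (mu = 5, phi = 0.7, sigma = 1)
rng = np.random.default_rng(0)
y = 5. + np.zeros(500)
for t in range(1, 500):
    y[t] = 5. + 0.7 * (y[t - 1] - 5.) + rng.normal(scale=1.0)
fit = minimize(ar1_negloglik, x0=[np.mean(y), 0.0, 0.0], args=(y,),
               bounds=[(None, None), (-0.99, 0.99), (None, None)])
mu_hat, phi_hat, sigma2_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
```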
Smoothing

We considered so far the prediction of the state vectors (i.e., the computation of $\hat\beta_{t+k|t-1}$, $k = 0, 1, \dots$), and the updating of the current state vector as new data become available, i.e., the computation of $\hat\beta_{t|t} = \hat\beta_t$. Having new data $Y_t$, we may want to modify the previous predictors $\hat\beta_{t-1}, \hat\beta_{t-2}, \dots$, which is known as smoothing. For example, modify the estimates of the trend and seasonal effects produced in a previous year, which in X-11 terminology is called revision.

Note: for the last time point N, $\hat\beta_{N|N} = \hat\beta_N$.
Smoothing of the previous state vector

We would like to modify (smooth) the estimator $\hat\beta_{t-1}$, computed from the observations $Y_1, \dots, Y_{t-1}$, when a new observation $Y_t$ becomes available. We again have two independent predictors for $\beta_{t-1}$:
$$\hat\beta_{t-1} = \beta_{t-1} + u_{t-1}^*; \qquad E(u_{t-1}^* u_{t-1}^{*\prime}) = P_{t-1},$$
$$Y_t = X_t\beta_t + \varepsilon_t = X_t T_t \beta_{t-1} + X_t\eta_t + \varepsilon_t = X_t T_t \beta_{t-1} + v_t^*; \qquad E(v_t^* v_t^{*\prime}) = X_t Q_t X_t' + \Sigma_t.$$
$u_{t-1}^*$ and $v_t^*$ are independent, given $\beta_{t-1}$.
Smoothing of the previous state vector (cont.)

Set up the regression model
$$\begin{pmatrix} \hat\beta_{t-1} \\ Y_t \end{pmatrix} = \begin{pmatrix} I \\ X_t T_t \end{pmatrix}\beta_{t-1} + \begin{pmatrix} u_{t-1}^* \\ v_t^* \end{pmatrix}, \quad \text{where}\quad Var\begin{pmatrix} u_{t-1}^* \\ v_t^* \end{pmatrix} = \begin{pmatrix} P_{t-1} & 0 \\ 0 & X_t Q_t X_t' + \Sigma_t \end{pmatrix} = V.$$

The generalized least squares (GLS) predictor is
$$\hat\beta_{t-1|t} = \Big([I,\, T_t' X_t']\, V^{-1} \begin{pmatrix} I \\ X_t T_t \end{pmatrix}\Big)^{-1} [I,\, T_t' X_t']\, V^{-1} \begin{pmatrix} \hat\beta_{t-1} \\ Y_t \end{pmatrix} = \hat\beta_{t-1} + P_{t-1} T_t' P_{t|t-1}^{*-1} (\hat\beta_t - \hat\beta_{t|t-1});$$
$$P_{t|t-1}^* = P_{t|t-1} = T_t P_{t-1} T_t' + Q_t.$$

The predictor $\hat\beta_{t-1}$ is updated based on the prediction error $(\hat\beta_t - T_t\hat\beta_{t-1}) = (\hat\beta_t - \hat\beta_{t|t-1})$.
Prediction Error Variance

$$P_{t-1|t} = E[(\hat\beta_{t-1|t} - \beta_{t-1})(\hat\beta_{t-1|t} - \beta_{t-1})'] = P_{t-1} - P_{t-1} T_t' P_{t|t-1}^{*-1}(P_{t|t-1}^* - P_t) P_{t|t-1}^{*-1} T_t P_{t-1} \le P_{t-1}.$$

$P_{t-1|t} \le P_{t-1}$ means that $Var(a'\hat\beta_{t-1|t}) \le Var(a'\hat\beta_{t-1})$ for all vectors $a \ne 0$. As expected, smoothing reduces the variance of the predictors.

Recursive smoothing algorithm:
$$\hat\beta_{t-1|N} = \hat\beta_{t-1} + P_{t-1} T_t' P_{t|t-1}^{*-1}(\hat\beta_{t|N} - \hat\beta_{t|t-1}); \qquad P_{t|t-1}^* = P_{t|t-1} = T_t P_{t-1} T_t' + Q_t,$$
$$P_{t-1|N} = E[(\hat\beta_{t-1|N} - \beta_{t-1})(\hat\beta_{t-1|N} - \beta_{t-1})'] = P_{t-1} - P_{t-1} T_t' P_{t|t-1}^{*-1}(P_{t|t-1}^* - P_{t|N}) P_{t|t-1}^{*-1} T_t P_{t-1}.$$

The algorithm starts with $\hat\beta_{N|N} = \hat\beta_N$, $P_{N|N} = P_N$.
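A sketch of this recursive (fixed-interval) smoothing algorithm for time-invariant $T$ and $Q$, written in the equivalent gain form $A_{t-1} = P_{t-1} T' P_{t|t-1}^{*-1}$. It assumes the filtered estimates $\hat\beta_{t|t}$ and $P_{t|t}$ were stored during the forward Kalman pass; all names are illustrative:

```python
import numpy as np

def fixed_interval_smoother(beta_filt, P_filt, T, Q):
    """Backward recursion from t = N down to t = 1, starting from the
    filtered values at the last time point."""
    N = len(beta_filt)
    beta_sm = [None] * N
    P_sm = [None] * N
    beta_sm[-1], P_sm[-1] = beta_filt[-1], P_filt[-1]   # start at t = N
    for t in range(N - 2, -1, -1):
        P_pred = T @ P_filt[t] @ T.T + Q                # P*_{t+1|t}
        A = P_filt[t] @ T.T @ np.linalg.inv(P_pred)     # smoothing gain
        beta_sm[t] = beta_filt[t] + A @ (beta_sm[t + 1] - T @ beta_filt[t])
        P_sm[t] = P_filt[t] + A @ (P_sm[t + 1] - P_pred) @ A.T
    return beta_sm, P_sm
```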
Example: Random walk + noise

$$y_t = \theta_t + \varepsilon_t, \quad Var(\varepsilon_t) = \sigma^2 \quad (X_t = 1);$$
$$\theta_t = \theta_{t-1} + \eta_t, \quad Var(\eta_t) = q \quad (T_t = 1).$$

For this model, $p_t^* = p_t/(p_t + q)$, and
$$\hat\theta_{t|N} = (1 - p_t^*)\,\hat\theta_t + p_t^*\,\hat\theta_{t+1|N}.$$

As $t \to \infty$, $p_t \to$ const. $= p$ and $p_t^* \to p/(p + q) \equiv c$, so
$$\hat\theta_{t|N} = (1 - c)\,\hat\theta_t + c\,\hat\theta_{t+1|N} = \hat\theta_t + c\,(\hat\theta_{t+1|N} - \hat\theta_t).$$
(See exercise.)
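A quick numerical check of the steady-state claim: iterate the filtering variance recursion $p_t = (p_{t-1} + q)\,\sigma^2/(p_{t-1} + q + \sigma^2)$ for this model until $p_t$ converges, then form $c = p/(p + q)$. The values of $\sigma^2$ and $q$ are illustrative:

```python
# steady-state smoothing weight for the random walk + noise model
sigma2, q = 1.0, 0.5
p = 1e7                                        # diffuse start
for _ in range(200):
    p_pred = p + q                             # prediction variance p_{t|t-1}
    p = p_pred * sigma2 / (p_pred + sigma2)    # updated variance p_t
c = p / (p + q)
print("steady-state p:", p, " smoothing weight c:", c)
# smoothed estimate: theta_sm[t] = (1 - c) * theta_filt[t] + c * theta_sm[t + 1]
```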