Lecture 6: State Space Model and Kalman Filter
Bus 490, Time Series Analysis, Mr. R. Tsay

A state space model consists of two equations:

  S_{t+1} = F S_t + G e_{t+1},   (1)
  Z_t = H S_t + ε_t,             (2)

where S_t is a state vector of dimension m, Z_t is the observed time series, F, G, H are matrices of parameters, and {e_t} and {ε_t} are iid random vectors satisfying E(e_t) = 0, E(ε_t) = 0, Cov(e_t) = Q, Cov(ε_t) = R, with {e_t} and {ε_t} independent of each other. In the engineering literature, the state vector denotes the unobservable vector that describes the status of the system. Thus, the state vector can be thought of as a vector that contains the necessary information to predict future observations (i.e., to produce minimum mean squared error forecasts).

Remark: In some applications, a state-space model is written as

  S_{t+1} = F S_t + G e_t,   Z_t = H S_t + ε_t.

Is there any difference between the two parameterizations?

In this course, Z_t is a scalar and F, G, H are constants. A general state space model in fact allows for vector time series and time-varying parameters. Also, the independence requirement between e_t and ε_t can be relaxed so that e_{t+1} and ε_t are correlated.

To appreciate the above state space model, we first consider its relation with ARMA models. The basic relations are: (a) an ARMA model can be put into a state space form in infinitely many ways; (b) for a given state space model in (1)-(2), there is an ARMA model.

A. State space model to ARMA model: The key here is the Cayley-Hamilton theorem, which says that for any m x m matrix F with characteristic equation

  c(λ) = |F - λI| = λ^m + α_1 λ^{m-1} + α_2 λ^{m-2} + ... + α_{m-1} λ + α_m = 0,

we have c(F) = 0. In other words, the matrix F satisfies its own characteristic equation, i.e.,

  F^m + α_1 F^{m-1} + α_2 F^{m-2} + ... + α_{m-1} F + α_m I = 0.
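The Cayley-Hamilton theorem is easy to check numerically. The sketch below is my addition, not part of the original notes; the 2 x 2 matrix F is an arbitrary illustrative choice. It computes the characteristic polynomial of F and verifies that c(F) is the zero matrix.

```python
import numpy as np

# Hypothetical 2x2 matrix F, chosen only for illustration.
F = np.array([[1.2, -0.5],
              [1.0,  0.0]])

# np.poly(F) returns the coefficients [1, a1, a2] of the monic
# characteristic polynomial c(lambda) = lambda^2 + a1*lambda + a2
# whose roots are the eigenvalues of F.
coeffs = np.poly(F)

# Evaluate c(F) = F^2 + a1*F + a2*I; by Cayley-Hamilton it is zero.
cF = np.linalg.matrix_power(F, 2) + coeffs[1] * F + coeffs[2] * np.eye(2)
print(np.allclose(cF, 0))  # True
```

The same check works for any square matrix of any dimension, which is exactly the fact exploited in the derivation of equation (3) below.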
Next, from the state transition equation, we have

  S_t     = S_t
  S_{t+1} = F S_t + G e_{t+1}
  S_{t+2} = F^2 S_t + F G e_{t+1} + G e_{t+2}
  S_{t+3} = F^3 S_t + F^2 G e_{t+1} + F G e_{t+2} + G e_{t+3}
  ...
  S_{t+m} = F^m S_t + F^{m-1} G e_{t+1} + ... + F G e_{t+m-1} + G e_{t+m}.

Multiplying the above equations by α_m, α_{m-1}, ..., α_1, 1, respectively, and summing, we obtain

  S_{t+m} + α_1 S_{t+m-1} + ... + α_{m-1} S_{t+1} + α_m S_t = G e_{t+m} + β_1 e_{t+m-1} + ... + β_{m-1} e_{t+1}.   (3)

In the above, we have used the fact that c(F) = 0. Finally, two cases are possible. First, assume that there is no observational noise, i.e., ε_t = 0 for all t in (2). Then, multiplying equation (3) by H from the left and using Z_t = H S_t, we have

  Z_{t+m} + α_1 Z_{t+m-1} + ... + α_{m-1} Z_{t+1} + α_m Z_t = a_{t+m} - θ_1 a_{t+m-1} - ... - θ_{m-1} a_{t+1},

where a_t = H G e_t. This is an ARMA(m, m-1) model. The second possibility is that there is observational noise. Then, the same argument gives

  (1 + α_1 B + ... + α_m B^m)(Z_{t+m} - ε_{t+m}) = (1 - θ_1 B - ... - θ_{m-1} B^{m-1}) a_{t+m}.

By combining ε_t with a_t, the above equation becomes an ARMA(m, m) model.

B. ARMA model to state space model: We begin the discussion with some simple examples. Three general approaches will be given later.

Example 1: Consider the AR(2) model

  Z_t = φ_1 Z_{t-1} + φ_2 Z_{t-2} + a_t.

For such an AR(2) process, to compute the forecasts, we need Z_t and Z_{t-1}. Therefore, it is easily seen that

  [Z_{t+1}]   [φ_1  φ_2] [Z_t    ]   [1]
  [Z_t    ] = [ 1    0 ] [Z_{t-1}] + [0] a_{t+1}

and Z_t = [1, 0] S_t,
where S_t = (Z_t, Z_{t-1})' and there is no observational noise.

Example 2: Consider the MA(2) model

  Z_t = a_t - θ_1 a_{t-1} - θ_2 a_{t-2}.

Method 1: Let S_t = (a_{t-1}, a_{t-2})'. Then

  [a_t    ]   [0  0] [a_{t-1}]   [1]
  [a_{t-1}] = [1  0] [a_{t-2}] + [0] a_t,

  Z_t = [-θ_1, -θ_2] S_t + a_t.

Here the innovation a_t shows up in both the state transition equation and the observation equation. The state vector is of dimension 2.

Method 2: For an MA(2) model, we have Z_{t+1|t} = -θ_1 a_t - θ_2 a_{t-1} and Z_{t+2|t} = -θ_2 a_t. Let

  S_t = (Z_t, -θ_1 a_t - θ_2 a_{t-1}, -θ_2 a_t)'.

Then,

            [0  1  0]       [  1  ]
  S_{t+1} = [0  0  1] S_t + [-θ_1 ] a_{t+1}
            [0  0  0]       [-θ_2 ]

and

  Z_t = [1, 0, 0] S_t.

Here the state vector is of dimension 3, but there is no observational noise.

Exercise: Generalize the above result to an MA(q) model.

Next, we consider three general approaches.

Akaike's approach: For an ARMA(p, q) process, let m = max{p, q+1}, φ_i = 0 for i > p, and θ_j = 0 for j > q. Define

  S_t = (Z_t, Z_{t+1|t}, Z_{t+2|t}, ..., Z_{t+m-1|t})',

where Z_{t+l|t} is the conditional expectation of Z_{t+l} given Ψ_t = {Z_t, Z_{t-1}, ...}. By using the updating equation of forecasts (recall what we discussed before)

  Z_{t+1}(l-1) = Z_t(l) + ψ_{l-1} a_{t+1},

it is easy to show that

  S_{t+1} = F S_t + G a_{t+1},
  Z_t = [1, 0, ..., 0] S_t,
where

      [  0     1     0   ...   0  ]        [   1     ]
      [  0     0     1   ...   0  ]        [  ψ_1    ]
  F = [ ...                   ... ],   G = [  ψ_2    ]
      [  0     0     0   ...   1  ]        [  ...    ]
      [ φ_m φ_{m-1} ...  φ_2  φ_1 ]        [ ψ_{m-1} ]

The matrix F is called a companion matrix of the polynomial 1 - φ_1 B - ... - φ_m B^m.

Aoki's Method: This is a two-step procedure. First, consider the MA(q) part. Letting W_t = a_t - θ_1 a_{t-1} - ... - θ_q a_{t-q}, we have

  [a_t      ]   [0  0  ...  0  0] [a_{t-1}]   [ 1 ]
  [a_{t-1}  ]   [1  0  ...  0  0] [a_{t-2}]   [ 0 ]
  [   ...   ] = [      ...      ] [  ...  ] + [...] a_t,
  [a_{t-q+1}]   [0  0  ...  1  0] [a_{t-q}]   [ 0 ]

  W_t = [-θ_1, -θ_2, ..., -θ_q] S_t + a_t.

In the next step, we use the usual AR(p) format

  Z_t = φ_1 Z_{t-1} + ... + φ_p Z_{t-p} + W_t.

Consequently, define the state vector as

  S_t = (Z_{t-1}, Z_{t-2}, ..., Z_{t-p}, a_{t-1}, ..., a_{t-q})'.

Then, we have

  S_{t+1} = F S_t + G a_t,

where the first row of F is (φ_1, ..., φ_p, -θ_1, ..., -θ_q), the remaining rows simply shift the Z-block and the a-block down by one position:

      [ φ_1 ... φ_p  -θ_1 ... -θ_q ]        [ 1 ]
      [  1  ...  0     0  ...   0  ]        [ 0 ]
      [      ...           ...     ]        [...]
  F = [  0  ..1  0     0  ...   0  ],   G = [ 0 ]
      [  0  ...  0     0  ...   0  ]        [ 1 ]
      [  0  ...  0     1  ...   0  ]        [ 0 ]
      [      ...           ...     ]        [...]
      [  0  ...  0     0  ..1   0  ]        [ 0 ]

with the two 1's in G occurring in positions 1 and p+1, and

  Z_t = [φ_1, ..., φ_p, -θ_1, ..., -θ_q] S_t + a_t.

Third approach: The third method is used by some authors, e.g., Harvey and his associates. Consider an ARMA(p, q) model

  Z_t = Σ_{i=1}^p φ_i Z_{t-i} + a_t - Σ_{j=1}^q θ_j a_{t-j}.
Let m = max{p, q}. Define φ_i = 0 for i > p and θ_j = 0 for j > q. The model can be written as

  Z_t = Σ_{i=1}^m φ_i Z_{t-i} + a_t - Σ_{i=1}^m θ_i a_{t-i}.

Using ψ(B) = θ(B)/φ(B), we can obtain the ψ-weights of the model by equating coefficients of B^j in the equation

  (1 - θ_1 B - ... - θ_m B^m) = (1 - φ_1 B - ... - φ_m B^m)(ψ_0 + ψ_1 B + ... + ψ_m B^m + ...),

where ψ_0 = 1. In particular, considering the coefficient of B^m, we have

  -θ_m = -φ_m ψ_0 - φ_{m-1} ψ_1 - ... - φ_1 ψ_{m-1} + ψ_m.

Consequently,

  ψ_m = Σ_{i=1}^m φ_i ψ_{m-i} - θ_m.   (4)

Next, from the ψ-weight representation we obtain

  Z_{t+m-i} = a_{t+m-i} + ψ_1 a_{t+m-i-1} + ψ_2 a_{t+m-i-2} + ...,

so that

  Z_{t+m-i|t}   = ψ_{m-i} a_t + ψ_{m-i+1} a_{t-1} + ψ_{m-i+2} a_{t-2} + ...,
  Z_{t+m-i|t-1} = ψ_{m-i+1} a_{t-1} + ψ_{m-i+2} a_{t-2} + ....

Consequently,

  Z_{t+m-i|t} = Z_{t+m-i|t-1} + ψ_{m-i} a_t,   m - i > 0.   (5)

We are ready to set up a state space model. Define

  S_t = (Z_{t|t-1}, Z_{t+1|t-1}, ..., Z_{t+m-1|t-1})'.

Using Z_t = Z_{t|t-1} + a_t, the observational equation is

  Z_t = [1, 0, ..., 0] S_t + a_t.

The state-transition equation can be obtained from Equations (5) and (4). First, for the first m - 1 elements of S_{t+1}, Equation (5) applies. Second, for the last element of S_{t+1}, the model implies

  Z_{t+m|t} = Σ_{i=1}^m φ_i Z_{t+m-i|t} - θ_m a_t.

Using Equation (5), we have

  Z_{t+m|t} = Σ_{i=1}^m φ_i (Z_{t+m-i|t-1} + ψ_{m-i} a_t) - θ_m a_t
            = Σ_{i=1}^m φ_i Z_{t+m-i|t-1} + (Σ_{i=1}^m φ_i ψ_{m-i} - θ_m) a_t
            = Σ_{i=1}^m φ_i Z_{t+m-i|t-1} + ψ_m a_t,
where the last equality uses Equation (4). Consequently, the state-transition equation is

  S_{t+1} = F S_t + G a_t,

where

      [  0     1     0   ...   0  ]        [ ψ_1 ]
      [  0     0     1   ...   0  ]        [ ψ_2 ]
  F = [ ...                   ... ],   G = [ ψ_3 ]
      [  0     0     0   ...   1  ]        [ ... ]
      [ φ_m φ_{m-1} ...  φ_2  φ_1 ]        [ ψ_m ]

Note that for this third state space model, the dimension of the state vector is m = max{p, q}, which may be lower than that of Akaike's approach. However, the innovations to both the state-transition and observational equations are the same a_t.

Kalman Filter

The Kalman filter is a set of recursive equations that allows us to update the information in a state space model. It basically decomposes an observation into a conditional mean and a predictive residual, sequentially. Thus, it has wide applications in statistical analysis. The simplest way to derive the Kalman recursion is to use the normality assumption. It should be pointed out, however, that the recursion is a result of the least squares principle (or projection), not of normality. Thus, the recursion continues to hold in the non-normal case; the only difference is that the solution obtained is then optimal only within the class of linear solutions. With normality, the solution is optimal among all possible solutions (linear and nonlinear).

Under normality, we have two results. First, a normal prior plus a normal likelihood results in a normal posterior. Second, if the random vectors (X, Y) are jointly normal,

  [X]       [µ_x]   [Σ_xx  Σ_xy]
  [Y] ~ N ( [µ_y] , [Σ_yx  Σ_yy] ),

then the conditional distribution of X given Y = y is normal:

  X | Y = y ~ N( µ_x + Σ_xy Σ_yy^{-1} (y - µ_y),  Σ_xx - Σ_xy Σ_yy^{-1} Σ_yx ).

Using these two results, we are ready to derive the Kalman filter. In what follows, let P_{t+j|t} be the conditional covariance matrix of S_{t+j} given {Z_t, Z_{t-1}, ...} for j >= 0, and let S_{t+j|t} be the conditional mean of S_{t+j} given {Z_t, Z_{t-1}, ...}. First, by the state space model, we have

  S_{t+1|t} = F S_{t|t}   (6)
  Z_{t+1|t} = H S_{t+1|t}           (7)
  P_{t+1|t} = F P_{t|t} F' + G Q G' (8)
  V_{t+1|t} = H P_{t+1|t} H' + R    (9)
  C_{t+1|t} = H P_{t+1|t}           (10)

where V_{t+1|t} is the conditional variance of Z_{t+1} given {Z_t, Z_{t-1}, ...} and C_{t+1|t} denotes the conditional covariance between Z_{t+1} and S_{t+1}. Next, consider the joint conditional distribution of S_{t+1} and Z_{t+1}. The above results give

  [S_{t+1}]         [S_{t+1|t}]   [ P_{t+1|t}     P_{t+1|t} H'       ]
  [Z_{t+1}] | t ~ N([Z_{t+1|t}] , [ H P_{t+1|t}   H P_{t+1|t} H' + R ]).

Finally, when Z_{t+1} becomes available, we may use the property of normality to update the distribution of S_{t+1}. More specifically,

  S_{t+1|t+1} = S_{t+1|t} + P_{t+1|t} H' [H P_{t+1|t} H' + R]^{-1} (Z_{t+1} - Z_{t+1|t})   (11)

and

  P_{t+1|t+1} = P_{t+1|t} - P_{t+1|t} H' [H P_{t+1|t} H' + R]^{-1} H P_{t+1|t}.   (12)

Obviously,

  r_{t+1|t} = Z_{t+1} - Z_{t+1|t} = Z_{t+1} - H S_{t+1|t}

is the predictive residual for time point t+1. The updating equation (11) says that when the predictive residual r_{t+1|t} is non-zero, there is new information about the system, so the state vector should be modified. The contribution of r_{t+1|t} to the state vector, of course, needs to be weighted by the variance of r_{t+1|t} and the conditional covariance matrix of S_{t+1}.

In summary, the Kalman filter consists of the following equations:

  Prediction: (6), (7), (8) and (9)
  Updating: (11) and (12)

In practice, one starts with initial prior information S_{0|0} and P_{0|0}, then predicts Z_{1|0} and V_{1|0}. Once the observation Z_1 is available, one uses the updating equations to compute S_{1|1} and P_{1|1}, which in turn serve as the prior for the next observation. This is the Kalman recursion.

Applications of Kalman Filter

The Kalman filter has many applications. They are often classified as filtering, prediction, and smoothing. Let F_t be the information available at time t, i.e., F_t = σ-field{Z_t, Z_{t-1}, ...}.

  Filtering: make inference on S_t given F_t.
  Prediction: draw inference about S_{t+h} with h > 0, given F_t.
  Smoothing: make inference about S_t given the full data F_T, where T >= t is the sample size.

We shall briefly discuss some of these applications. A good reference is the relevant chapter of Tsay (2005, 2nd ed.).
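As a numerical check of Example 1 in Section B, the sketch below is my own illustration (not part of the original notes; the coefficients φ_1 = 0.6, φ_2 = 0.3 are an arbitrary stationary choice). It generates an AR(2) series directly and through the state space form S_{t+1} = F S_t + G a_{t+1}, Z_t = [1, 0] S_t, and confirms that the two constructions produce the same series.

```python
import numpy as np

# Arbitrary stationary AR(2) coefficients, for illustration only.
phi1, phi2 = 0.6, 0.3
n = 200
rng = np.random.default_rng(0)
a = rng.standard_normal(n)  # innovations a_t

# Direct AR(2) recursion: Z_t = phi1*Z_{t-1} + phi2*Z_{t-2} + a_t.
Z = np.zeros(n)
for t in range(2, n):
    Z[t] = phi1 * Z[t - 1] + phi2 * Z[t - 2] + a[t]

# State space form of Example 1 with S_t = (Z_t, Z_{t-1})'.
F = np.array([[phi1, phi2],
              [1.0,  0.0]])
G = np.array([1.0, 0.0])
H = np.array([1.0, 0.0])

S = np.zeros(2)          # start from S = (0, 0)', matching Z[0] = Z[1] = 0
Z_ss = np.zeros(n)
for t in range(2, n):
    S = F @ S + G * a[t]  # state transition: S_{t} = F S_{t-1} + G a_t
    Z_ss[t] = H @ S       # observation: Z_t = H S_t (no observational noise)

print(np.allclose(Z, Z_ss))  # True
```

The same pattern extends directly to the Akaike, Aoki, and Harvey representations: only F, G, H and the state vector change.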
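The prediction equations (6)-(9) and updating equations (11)-(12) can also be sketched in a few lines of code. The example below is my own minimal illustration, not code from the notes: it uses a scalar local-level model with F = G = H = 1 and arbitrary variances Q = 0.5, R = 1, so all matrix products reduce to scalar arithmetic.

```python
import numpy as np

# Scalar state space model: S_{t+1} = S_t + e_{t+1}, Z_t = S_t + eps_t,
# i.e. F = G = H = 1, with arbitrary illustrative variances Q and R.
F, G, H = 1.0, 1.0, 1.0
Q, R = 0.5, 1.0

rng = np.random.default_rng(42)
n = 100
S_true = np.cumsum(rng.normal(0.0, np.sqrt(Q), n))  # random-walk state
Z = S_true + rng.normal(0.0, np.sqrt(R), n)         # noisy observations

# Kalman recursion, starting from a vague prior S_{0|0} = 0, P_{0|0} = 10.
S_filt, P = 0.0, 10.0
S_hist = np.zeros(n)
for t in range(n):
    # Prediction: equations (6), (8), (7), (9).
    S_pred = F * S_filt             # S_{t+1|t} = F S_{t|t}
    P_pred = F * P * F + G * Q * G  # P_{t+1|t} = F P_{t|t} F' + G Q G'
    Z_pred = H * S_pred             # Z_{t+1|t} = H S_{t+1|t}
    V = H * P_pred * H + R          # V_{t+1|t} = H P_{t+1|t} H' + R
    # Updating: equations (11), (12), driven by the predictive residual.
    r = Z[t] - Z_pred               # r_{t+1|t} = Z_{t+1} - Z_{t+1|t}
    K = P_pred * H / V              # Kalman gain P_{t+1|t} H' V^{-1}
    S_filt = S_pred + K * r
    P = P_pred - K * H * P_pred
    S_hist[t] = S_filt

# The filtered state should track the true state more closely than the
# raw observations do (on average), since the filter pools information.
err_filter = np.mean((S_hist - S_true) ** 2)
err_obs = np.mean((Z - S_true) ** 2)
print(err_filter < err_obs)
```

Note how the updated quantities S_{t|t} and P_{t|t} from one step become the prior for the next step, exactly as described in the summary above.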