ECE531 Lecture 11: Dynamic Parameter Estimation: Kalman-Bucy Filter


ECE531 Lecture 11: Dynamic Parameter Estimation: Kalman-Bucy Filter. D. Richard Brown III, Worcester Polytechnic Institute, 09-Apr-2009.

The Kalman Water Filter (from IEEE Control Systems Magazine, Feb. 2006: http://ieeexplore.ieee.org/iel5/37/33354/01580172.pdf)

Dynamic MMSE State Estimation (Filtering). (Block diagram: the input U[n] and the current state X[n] drive the state update block F[n], G[n] to produce X[n+1], which passes through a unit delay to become X[n]; the state is mapped through H[n] to Z[n] and corrupted by the noise V[n] to produce the observation Y[n].) For now, we will focus on the filtering problem, i.e., estimating X[l] given the observations Y[0], ..., Y[l]. We know that the MMSE state estimator is X̂[l|l] := E{X[l] | Y_0^l}, where Y_0^l := {Y[0], ..., Y[l]}. The goal here is to leverage the restrictions we placed on the dynamical model in Lecture 10 to derive a computationally efficient MMSE estimator.

MMSE State Prediction of X[0] given no observations. Recall that we have assumed a Gaussian-distributed initial state X[0] ~ N(m[0], Σ[0]), where m[0] and Σ[0] are known. If you have no observations yet, what is the best estimator of the initial state? Prior to receiving the first observation, the MMSE estimate (prediction) of the initial state X[0] is simply X̂[0|-1] := E{X[0] | no observations} = m[0]. The error covariance matrix of this MMSE estimator (predictor) is then Σ[0|-1] := E{(X̂[0|-1] - X[0])(X̂[0|-1] - X[0])^T} = E{(m[0] - X[0])(m[0] - X[0])^T} = Σ[0].

Review: Conditional Mean of Jointly Gaussian Vectors. Given Y = HX + V with X ~ N(m, Σ), V ~ N(0, R), and X and V independent, we would like to compute the conditional mean E[X | Y]. We can use the conditional pdf of jointly Gaussian random vectors to write E[X | Y = y] = E[X] + cov(X, Y)[cov(Y, Y)]^{-1}(y - E[Y]). It is easy to show that E[X] = m, E[Y] = Hm, cov(X, Y) = ΣH^T, and cov(Y, Y) = HΣH^T + R. Hence, E[X | Y = y] = m + ΣH^T[HΣH^T + R]^{-1}(y - Hm).
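To make the formula concrete, here is a minimal numerical sketch of this conditional-mean computation in NumPy; the values of m, Σ, H, and R are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Minimal sketch of the linear Gaussian conditional mean
# E[X | Y = y] = m + Sigma H^T (H Sigma H^T + R)^{-1} (y - H m).
# The numbers below are illustrative assumptions, not from the lecture.
rng = np.random.default_rng(0)

m = np.array([1.0, -0.5])            # prior mean of X
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])       # prior covariance of X
H = np.array([[1.0, 0.0]])           # observation matrix (scalar measurement)
R = np.array([[0.1]])                # measurement noise covariance

def conditional_mean(y, m, Sigma, H, R):
    S = H @ Sigma @ H.T + R                    # cov(Y, Y)
    K = Sigma @ H.T @ np.linalg.inv(S)         # cov(X, Y) cov(Y, Y)^{-1}
    return m + K @ (y - H @ m)

# Draw one (X, Y) pair and compute the MMSE estimate of X from Y.
x = rng.multivariate_normal(m, Sigma)
v = rng.multivariate_normal(np.zeros(1), R)
y = H @ x + v
print("x =", x, " E[X|Y=y] =", conditional_mean(y, m, Sigma, H, R))
```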

MMSE State Estimate of X[0] given Y[0] (part 1). At time n = 0, we receive the observation Y[0] = H[0]X[0] + V[0], where X[0] ~ N(m[0], Σ[0]), V[0] ~ N(0, R[0]), and H[0] is known. Note that, under the restrictions in Lecture 10, this is just the standard linear Gaussian model. We use our prior result to find the MMSE estimate of X[0] given Y[0]: X̂[0|0] := E{X[0] | Y[0]} = m[0] + Σ[0]H^T[0](H[0]Σ[0]H^T[0] + R[0])^{-1}(Y[0] - H[0]m[0]) = X̂[0|-1] + Σ[0|-1]H^T[0](H[0]Σ[0|-1]H^T[0] + R[0])^{-1}(Y[0] - H[0]X̂[0|-1]).

MMSE State Estimate of X[0] given Y[0] (part 2). If we define K[0] := cov(X[0], Y[0])[cov(Y[0], Y[0])]^{-1} = Σ[0|-1]H^T[0](H[0]Σ[0|-1]H^T[0] + R[0])^{-1}, we can write X̂[0|0] = X̂[0|-1] + K[0](Y[0] - H[0]X̂[0|-1]). Remarks: The term Ỹ[0|-1] := Y[0] - H[0]X̂[0|-1] is sometimes called the innovation. It is the error between the MMSE prediction of Y[0] and the actual observation Y[0]. Substituting for Y[0], note that the innovation can be written as Ỹ[0|-1] = H[0]X[0] + V[0] - H[0]X̂[0|-1] = H[0](X[0] - X̂[0|-1]) + V[0].

Kalman Gain. The matrix K[0] := cov(X[0], Y[0])[cov(Y[0], Y[0])]^{-1} is sometimes called the Kalman gain. Both covariances are unconditional here. Rewriting the first covariance term in the Kalman gain as cov(X[0], Y[0]) = E{(X[0] - X̂[0|-1])(Y[0] - H[0]X̂[0|-1])^T} and the second term as cov(Y[0], Y[0]) = E{(Y[0] - H[0]X̂[0|-1])(Y[0] - H[0]X̂[0|-1])^T}, we can see that the Kalman gain (in a simplistic scalar sense) is proportional to the covariance between the state prediction error and the innovation, and inversely proportional to the innovation variance. Does this make sense in the context of our state estimation equation X̂[0|0] = X̂[0|-1] + K[0](Y[0] - H[0]X̂[0|-1])?

Additional Interpretation. Note that K[0] is only a function of the error covariance matrix of the initial state prediction Σ[0|-1], the known observation matrix H[0], and the known measurement noise covariance R[0]. Given K[0] and H[0], the state estimate X̂[0|0] is only a function of the previous state prediction X̂[0|-1] and the current observation Y[0]. What can we say about Σ[0|0], the error covariance matrix of the state estimate X̂[0|0] (conditioned on the observation Y[0])?

MMSE State Estimate of X[0] given Y[0]. The error covariance matrix of the MMSE estimator X̂[0|0] can be computed as Σ[0|0] := E{(X̂[0|0] - X[0])(X̂[0|0] - X[0])^T | Y[0]} = cov{X[0] | Y[0]}. We can use the standard result for jointly Gaussian random vectors to write Σ[0|0] = Σ[0] - Σ[0]H^T[0](H[0]Σ[0]H^T[0] + R[0])^{-1}H[0]Σ[0] = Σ[0|-1] - K[0]H[0]Σ[0|-1], where we used our definitions for Σ[0|-1] and K[0] in the last equality. Note that, given K[0] and H[0], the error covariance matrix of the MMSE state estimator after the observation Y[0] is only a function of the error covariance matrix of the prediction Σ[0|-1].

Remarks. What we have done so far: 1. Predicted the first state with no observations: X̂[0|-1]. 2. Computed the error covariance matrix of this prediction: Σ[0|-1]. 3. Received the observation Y[0]. 4. Estimated the first state given the observation: X̂[0|0]. 5. Computed the error covariance matrix of this estimate: Σ[0|0]. Interesting observations: The estimate X̂[0|0] is expressed in terms of the prediction X̂[0|-1] and the observation Y[0]. The estimate error covariance matrix Σ[0|0] is expressed in terms of the prediction error covariance matrix Σ[0|-1]. The goal here is to develop a general recursion. Our first step will be to see if the prediction for the next state (and its ECM) can be expressed in terms of the estimate (and its ECM) of the previous state.

State Update from X[l] to X[n+1]. From our linear dynamical model, we have X[n+1] = F[n]X[n] + G[n]U[n] = F[n](F[n-1]X[n-1] + G[n-1]U[n-1]) + G[n]U[n] = etc. For n+1 > l, we can repeat this process to write X[n+1] = F_n^l X[l] + ∑_{j=l}^{n} F_n^{j+1} G[j]U[j], where F_n^l := ∏_{k=0}^{n-l} F[n-k] and, in general, F_n^t := F[n]F[n-1]···F[t] for t ≤ n and F_n^t := I for t > n.

General Expression for MMSE State Prediction. For n+1 > l, we can use our prior result to write X̂[n+1|l] := E{X[n+1] | Y_0^l} = (model) E{F_n^l X[l] + ∑_{j=l}^{n} F_n^{j+1} G[j]U[j] | Y_0^l} = (linearity) F_n^l E{X[l] | Y_0^l} + ∑_{j=l}^{n} F_n^{j+1} G[j] E{U[j] | Y_0^l} = (irrelevance) F_n^l X̂[l|l] + ∑_{j=l}^{n} F_n^{j+1} G[j] E{U[j]} = F_n^l X̂[l|l], where the last equality follows because the inputs U[j] are zero mean. Note that the one-step predictor can be expressed as X̂[l+1|l] = F[l]X̂[l|l].

Conditional ECM of the MMSE State Predictor. Σ[n+1|l] := cov{X[n+1] | Y_0^l} = (model) cov{F_n^l X[l] + ∑_{j=l}^{n} F_n^{j+1} G[j]U[j] | Y_0^l} = (independence) cov{F_n^l X[l] | Y_0^l} + cov{∑_{j=l}^{n} F_n^{j+1} G[j]U[j] | Y_0^l} = (irrelevance) cov{F_n^l X[l] | Y_0^l} + ∑_{j=l}^{n} F_n^{j+1} G[j] cov{U[j]} G^T[j] (F_n^{j+1})^T = F_n^l cov{X[l] | Y_0^l} (F_n^l)^T + ∑_{j=l}^{n} F_n^{j+1} G[j]Q[j]G^T[j](F_n^{j+1})^T = F_n^l Σ[l|l] (F_n^l)^T + ∑_{j=l}^{n} F_n^{j+1} G[j]Q[j]G^T[j](F_n^{j+1})^T. For one-step prediction: Σ[l+1|l] = F[l]Σ[l|l]F^T[l] + G[l]Q[l]G^T[l].

Summary of Main Results. MMSE state estimator (only shown so far for the case l = 0): X̂[l|l] = X̂[l|l-1] + K[l](Y[l] - H[l]X̂[l|l-1]). One-step MMSE state predictor: X̂[l+1|l] = F[l]X̂[l|l]. ECM of the MMSE state estimator (only shown so far for the case l = 0): Σ[l|l] = Σ[l|l-1] - K[l]H[l]Σ[l|l-1]. ECM of the MMSE one-step state predictor: Σ[l+1|l] = F[l]Σ[l|l]F^T[l] + G[l]Q[l]G^T[l].

Induction: MMSE State Estimator for Arbitrary l. Assume that X̂[l|l] = X̂[l|l-1] + K[l](Y[l] - H[l]X̂[l|l-1]) and Σ[l|l] = Σ[l|l-1] - K[l]H[l]Σ[l|l-1] are true for some value of l. We want to show that these expressions are also true for l+1, using the fact that the prediction equations have already been shown to be true for any l. Fact (not too difficult to prove): The innovation Ỹ[l+1|l] := Y[l+1] - H[l+1]X̂[l+1|l] is a zero-mean Gaussian random vector uncorrelated with Y[0], ..., Y[l].

Useful Result 0. Assume that X, Y, and Z are jointly Gaussian and that Y and Z are uncorrelated with each other. Denote by C the covariance matrix between the subscripted random variables. We can use the conditional mean result for jointly Gaussian random vectors to write E{X | Y, Z} = E{X} + [C_XY  C_XZ] [C_YY  C_YZ; C_ZY  C_ZZ]^{-1} [Y - E{Y}; Z - E{Z}]. Under our assumption that Y and Z are uncorrelated, both C_YZ and C_ZY are equal to zero. The matrix inverse is then easy to compute and we have the useful result E{X | Y, Z} = E{X} + C_XY C_YY^{-1}(Y - E{Y}) + C_XZ C_ZZ^{-1}(Z - E{Z}) = E{X | Y} + C_XZ C_ZZ^{-1}(Z - E{Z}).

Induction: Conditional Mean. X̂[l+1|l+1] := E{X[l+1] | Y_0^{l+1}} = E{X[l+1] | Y_0^l, Y[l+1]} = E{X[l+1] | Y_0^l, Ỹ[l+1|l]} = E{X[l+1] | Y_0^l} + cov(X[l+1], Ỹ[l+1|l])[cov(Ỹ[l+1|l], Ỹ[l+1|l])]^{-1} Ỹ[l+1|l] = X̂[l+1|l] + cov(X[l+1], Ỹ[l+1|l])[cov(Ỹ[l+1|l], Ỹ[l+1|l])]^{-1} Ỹ[l+1|l], where we have used Useful Result 0 and the fact that E{Ỹ[l+1|l]} = 0 to write the second-to-last expression.

Induction: Conditional Mean. Defining K[l+1] := cov(X[l+1], Ỹ[l+1|l])[cov(Ỹ[l+1|l], Ỹ[l+1|l])]^{-1} and substituting for the innovation Ỹ[l+1|l] := Y[l+1] - H[l+1]X̂[l+1|l], we get the result we wanted: X̂[l+1|l+1] = X̂[l+1|l] + K[l+1](Y[l+1] - H[l+1]X̂[l+1|l]).

Induction: Error Covariance Matrix. The final step is to show Σ[l+1|l+1] = Σ[l+1|l] - K[l+1]H[l+1]Σ[l+1|l]. To see this, start from the definition Σ[l+1|l+1] = E{(X̂[l+1|l+1] - X[l+1])(X̂[l+1|l+1] - X[l+1])^T} and substitute our result X̂[l+1|l+1] = X̂[l+1|l] + K[l+1](Y[l+1] - H[l+1]X̂[l+1|l]). After expanding the outer product, there are four expectations that need to be computed...

Useful Result Number 1. cov(X[l+1], Ỹ[l+1|l]) = E{(X[l+1] - X̂[l+1|l])(Y[l+1] - H[l+1]X̂[l+1|l])^T} = E{(X[l+1] - X̂[l+1|l])(H[l+1](X[l+1] - X̂[l+1|l]) + V[l+1])^T} = E{(X[l+1] - X̂[l+1|l])(X[l+1] - X̂[l+1|l])^T} H^T[l+1] = Σ[l+1|l]H^T[l+1].

Useful Result Number 2. cov(Ỹ[l+1|l], Ỹ[l+1|l]) = E{(Y[l+1] - H[l+1]X̂[l+1|l])(Y[l+1] - H[l+1]X̂[l+1|l])^T} = E{(H[l+1](X[l+1] - X̂[l+1|l]) + V[l+1])(H[l+1](X[l+1] - X̂[l+1|l]) + V[l+1])^T} = H[l+1] E{(X[l+1] - X̂[l+1|l])(X[l+1] - X̂[l+1|l])^T} H^T[l+1] + R[l+1] = H[l+1]Σ[l+1|l]H^T[l+1] + R[l+1].

Expanding the outer product gives four terms: E{(X[l+1] - X̂[l+1|l])(X[l+1] - X̂[l+1|l])^T} - K[l+1]E{(Y[l+1] - H[l+1]X̂[l+1|l])(X[l+1] - X̂[l+1|l])^T} - E{(X[l+1] - X̂[l+1|l])(Y[l+1] - H[l+1]X̂[l+1|l])^T}K^T[l+1] + K[l+1]E{(Y[l+1] - H[l+1]X̂[l+1|l])(Y[l+1] - H[l+1]X̂[l+1|l])^T}K^T[l+1]. The first term is simply Σ[l+1|l]. The third term was solved in Useful Result Number 1 and is -Σ[l+1|l]H^T[l+1]K^T[l+1]. Inspection of K^T[l+1] reveals that the third term is a symmetric matrix; hence, the second term is equal to the third term. The fourth term was solved in Useful Result Number 2...

From Useful Result Number 2, we can write the fourth term as K[l+1](H[l+1]Σ[l+1|l]H^T[l+1] + R[l+1])K^T[l+1]. This can be simplified a bit since K^T[l+1] = (H[l+1]Σ[l+1|l]H^T[l+1] + R[l+1])^{-1}H[l+1]Σ[l+1|l]. Substituting for the K^T[l+1] at the end of the expression, we can write the fourth term as K[l+1]H[l+1]Σ[l+1|l], which is the same as the second and third terms, except for a sign change. Putting it all together, we have Σ[l+1|l+1] = Σ[l+1|l] - K[l+1]H[l+1]Σ[l+1|l], which is the result we wanted to show.

The Discrete-Time Kalman-Bucy Filter (1961). Theorem: Under the squared-error cost assignment, the linear system model, and the white Gaussian input, noise, and initial state assumptions discussed in Lecture 10, the optimal estimates for the current state (filtering) and the next state (prediction) are given recursively as X̂[l|l] = X̂[l|l-1] + K[l](Y[l] - H[l]X̂[l|l-1]) for l = 0, 1, ... and X̂[l+1|l] = F[l]X̂[l|l] for l = 0, 1, ..., with the initialization X̂[0|-1] = m[0], and where the matrix K[l] = Σ[l|l-1]H^T[l](H[l]Σ[l|l-1]H^T[l] + R[l])^{-1} with Σ[l|l-1] := cov{X[l] | Y[0], ..., Y[l-1]} and R[l] := cov{V[l]}.

Kalman Filter: Summary of General Recursion. Initialization (predictions): X̂[0|-1] = m[0], Σ[0|-1] = Σ[0]. Recursion, beginning with l = 0: K[l] = Σ[l|l-1]H^T[l](H[l]Σ[l|l-1]H^T[l] + R[l])^{-1}; X̂[l|l] = X̂[l|l-1] + K[l](Y[l] - H[l]X̂[l|l-1]); Σ[l|l] = Σ[l|l-1] - K[l]H[l]Σ[l|l-1]; X̂[l+1|l] = F[l]X̂[l|l]; Σ[l+1|l] = F[l]Σ[l|l]F^T[l] + G[l]Q[l]G^T[l].
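The recursion above maps almost line-for-line onto code. Below is a minimal NumPy sketch (the function name and interface are my own, and the model matrices are taken as time-invariant for simplicity); it is a sketch of the recursion, not the lecture's reference implementation.

```python
import numpy as np

def kalman_filter(Y, F, G, H, Q, R, m0, Sigma0):
    """Minimal sketch of the Kalman filter recursion from this lecture.

    Y is a sequence of observations Y[0..L-1]; F, G, H, Q, R are the
    (time-invariant, for simplicity) model matrices; m0 and Sigma0 are the
    mean and covariance of the initial state X[0].
    """
    x_pred, P_pred = m0.copy(), Sigma0.copy()       # X̂[0|-1], Σ[0|-1]
    estimates, predictions = [], []
    for y in Y:
        # Kalman gain: K[l] = Σ[l|l-1] H^T (H Σ[l|l-1] H^T + R)^{-1}
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        # Measurement update: X̂[l|l], Σ[l|l]
        x_est = x_pred + K @ (y - H @ x_pred)
        P_est = P_pred - K @ H @ P_pred
        # Time update (one-step prediction): X̂[l+1|l], Σ[l+1|l]
        x_pred = F @ x_est
        P_pred = F @ P_est @ F.T + G @ Q @ G.T
        estimates.append((x_est, P_est))
        predictions.append((x_pred, P_pred))
    return estimates, predictions
```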

The Kalman Filter. (Block diagram: the innovation Y[l] - H[l]X̂[l|l-1] is scaled by the gain K[l] and added to the prediction X̂[l|l-1] to form X̂[l|l]; multiplying by F[l] gives X̂[l+1|l], which passes through a unit delay to become the next prediction X̂[l|l-1].)

The Kalman Filter: Time-Invariant Case. (Same block diagram, with constant F and H.) K[l] = Σ[l|l-1]H^T(HΣ[l|l-1]H^T + R)^{-1}; X̂[l|l] = X̂[l|l-1] + K[l](Y[l] - HX̂[l|l-1]); Σ[l|l] = Σ[l|l-1] - K[l]HΣ[l|l-1]; X̂[l+1|l] = FX̂[l|l]; Σ[l+1|l] = FΣ[l|l]F^T + GQG^T. Even though F, G, H, Q, and R are time-invariant, K[l] is still time-varying.

Example: One-Dimensional Motion. Our one-dimensional motion system: the state is X[l] = [x[l], v[l]]^T (position, velocity), and the input is U[l] = a[l], i.i.d. N(0, σ_a²) (acceleration). The state update equation is X[l+1] = F X[l] + G U[l] with F = [1 T; 0 1] and G = [0; T], where T is the sampling time. Suppose we observe the position of the particle in noise. Then Y[n] = H X[n] + V[n] with H = [1 0], where V[l] is i.i.d. N(0, σ_v²) and independent of U[l].
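As a usage sketch, this motion model can be fed to the kalman_filter function from the earlier code block. The parameter values T = 0.1, σ_a² = 1, and σ_v² = 0.1 match the example figures that follow; the prior m[0] and Σ[0], the random seed, and the simulation length are my own assumptions.

```python
import numpy as np

# One-dimensional motion model: T = 0.1, sigma_a^2 = 1, sigma_v^2 = 0.1 (from the example).
T, sigma_a2, sigma_v2 = 0.1, 1.0, 0.1
F = np.array([[1.0, T], [0.0, 1.0]])
G = np.array([[0.0], [T]])
H = np.array([[1.0, 0.0]])
Q = np.array([[sigma_a2]])
R = np.array([[sigma_v2]])
m0 = np.zeros(2)
Sigma0 = np.eye(2)          # assumed initial-state covariance (not given on the slide)

rng = np.random.default_rng(1)
L = 200
x = rng.multivariate_normal(m0, Sigma0)
observations, truth = [], []
for _ in range(L):
    truth.append(x)
    observations.append(H @ x + rng.normal(0.0, np.sqrt(sigma_v2), size=1))
    x = F @ x + G @ rng.normal(0.0, np.sqrt(sigma_a2), size=1)

# kalman_filter is the sketch defined after the recursion summary above.
estimates, _ = kalman_filter(observations, F, G, H, Q, R, m0, Sigma0)
position_err = np.array([est[0][0] - xt[0] for est, xt in zip(estimates, truth)])
print("average squared position error:", np.mean(position_err ** 2))
```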

Example: One-Dimensional Motion. (Figure: left panel shows the true, predicted, and estimated state trajectories in the position-velocity plane; right panel shows squared error versus time. T = 0.1, σ_a² = 1, and σ_v² = 0.1 in this example.)

Example: One-Dimensional Motion, Kalman Gain. (Figure: left panel shows the state trajectory in the position-velocity plane; right panel shows the position and velocity components of the Kalman gain versus time. T = 0.1, σ_a² = 1, and σ_v² = 0.1 in this example.)

Example: One-Dimensional Motion, MSE. (Figure: mean squared error versus time for the true, predicted, and estimated states. T = 0.1, σ_a² = 1, σ_v² = 0.1; averaged over 5000 runs.)

Computational Requirements: Batch MMSE Estimation. Suppose we've received observations Y[0], ..., Y[l] and we wish to estimate the state X[l] from these observations. If we just did this as a batch operation, what is the MMSE estimate of X[l]? X̂[l|l] = E{X[l] | Y[0], ..., Y[l]} = E{X[l]} + cov{X[l], Y_0^l}[cov{Y_0^l, Y_0^l}]^{-1}(Y_0^l - E{Y_0^l}), where Y_0^l := [Y[0], ..., Y[l]] is the stacked observation vector. Recall that, for each l, Y[l] ∈ R^k. What are the dimensions of the matrix inverse that we have to compute here?

Computational Requirements: Kalman Filter. Note that the Kalman filter recursion also has a matrix inverse: K[l] = Σ[l|l-1]H^T[l](H[l]Σ[l|l-1]H^T[l] + R[l])^{-1}. What are the dimensions of the matrix inverse that we have to compute here?

Computational Requirements: Kalman Filter. Note that the error covariance matrix and Kalman gain updates K[l] = Σ[l|l-1]H^T[l](H[l]Σ[l|l-1]H^T[l] + R[l])^{-1}, Σ[l|l] = Σ[l|l-1] - K[l]H[l]Σ[l|l-1], and Σ[l+1|l] = F[l]Σ[l|l]F^T[l] + G[l]Q[l]G^T[l] do not depend on the observations. This means that, if F[l], G[l], H[l], Q[l], and R[l] are known in advance, e.g., they are time-invariant, the error covariance matrices (prediction and estimation) as well as the Kalman gain can be computed in advance. Pre-computation of K[l], Σ[l|l], and Σ[l+1|l] makes the real-time computational requirements of the Kalman filter very modest. State estimate: one matrix-vector product and one vector addition. State prediction: one matrix-vector product.
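A minimal sketch of this offline/online split, assuming a time-invariant model (the function and variable names are mine): the gain and covariance sequences are computed once with no data, and the real-time loop then touches only the state vector.

```python
import numpy as np

def precompute_gains(F, G, H, Q, R, Sigma0, L):
    """Offline pass: K[l], Σ[l|l], Σ[l+1|l] depend only on the model, not on the data."""
    P_pred = Sigma0.copy()
    gains = []
    for _ in range(L):
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        P_est = P_pred - K @ H @ P_pred
        P_pred = F @ P_est @ F.T + G @ Q @ G.T
        gains.append(K)
    return gains

def run_filter_online(Y, F, H, gains, m0):
    """Online pass: only matrix-vector products and additions per observation."""
    x_pred = m0.copy()
    estimates = []
    for y, K in zip(Y, gains):
        x_est = x_pred + K @ (y - H @ x_pred)   # measurement update
        x_pred = F @ x_est                      # time update
        estimates.append(x_est)
    return estimates
```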

Static State Estimation with the Kalman Filter. Consider the special case F[l] ≡ I and G[l] ≡ 0. In this case, we have a static parameter estimation problem since the state does not change over time. It should be clear that X[l] ≡ X[0] = X. What does the Kalman filter do in this scenario? Let's look at the recursion with F[l] ≡ I and G[l] ≡ 0 (for simplicity, we also assume that H and R are time-invariant): K[l] = Σ[l|l-1]H^T(HΣ[l|l-1]H^T + R)^{-1}; X̂[l|l] = X̂[l|l-1] + K[l](Y[l] - HX̂[l|l-1]); Σ[l|l] = Σ[l|l-1] - K[l]HΣ[l|l-1]; X̂[l+1|l] = X̂[l|l]; Σ[l+1|l] = Σ[l|l]. Both predictions are equal to the last estimates (this should make sense).

Static State Estimation with the Kalman Filter. Since the predictions aren't necessary anymore, the notation and the recursion can be simplified to K[l] = Σ[l-1]H^T(HΣ[l-1]H^T + R)^{-1}; X̂[l] = X̂[l-1] + K[l](Y[l] - HX̂[l-1]); Σ[l] = Σ[l-1] - K[l]HΣ[l-1]. What is the Kalman filter doing here? It is performing sequential MMSE estimation of the static parameter X. That is, given observations Y[0], Y[1], ..., the Kalman filter is a computationally efficient way to sequentially refine the MMSE estimate of an unknown static parameter (state) with each new observation.
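Here is a minimal sketch of this simplified recursion for a static parameter; H = I and σ_v² = 0.1 mirror the example on the next slide, while the prior mean and covariance, the dimension, and the seed are assumptions of mine.

```python
import numpy as np

# Sequential MMSE estimation of a static parameter X with the simplified recursion.
# H = I and sigma_v^2 = 0.1 follow the example slide; the prior is an assumption.
rng = np.random.default_rng(2)
d = 2
H = np.eye(d)
R = 0.1 * np.eye(d)
x_true = rng.multivariate_normal(np.zeros(d), np.eye(d))   # unknown static parameter

x_hat = np.zeros(d)          # prior mean m[0] (assumed)
P = np.eye(d)                # prior covariance Σ[0|-1] (assumed)
for l in range(25):
    y = H @ x_true + rng.multivariate_normal(np.zeros(d), R)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_hat = x_hat + K @ (y - H @ x_hat)      # X̂[l] = X̂[l-1] + K[l](Y[l] - H X̂[l-1])
    P = P - K @ H @ P                        # Σ[l] = Σ[l-1] - K[l] H Σ[l-1]

print("true:", x_true, " estimate after 25 observations:", x_hat)
```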

Example: Sequential MMSE Estimation of a Static Parameter. (Figure: left panel shows the true and estimated parameter in the (X_0, X_1) plane; right panel shows squared error versus time. σ_v² = 0.1 and H = I in this example.)

Example: Sequential MMSE Estimation of a Static Parameter, MSE. (Figure: mean squared error of the estimate versus time. σ_v² = 0.1 and H = I; averaged over 5000 runs.)

Extension 1: Non-White Input Sequence. Suppose the input sequence to our discrete-time dynamical system model is zero-mean and wide-sense stationary, but not white, i.e., cov{U[k], U[l]} = Q[k-l]. This causes the state of the system to no longer be a Markov sequence, so a lot of our earlier analysis is no longer valid. What can we do? The basic idea is to write an equivalent discrete-time dynamical system model with a white input signal and an augmented state. Then the usual Kalman filter can be applied.

Non-White Input Sequence Example. Suppose we have our one-dimensional motion model again, but this time the Gaussian acceleration input has autocorrelation Q[k-l] = cov{a[k], a[l]} = σ_a² α^{|k-l|} for α ∈ (0, 1). Note that this covariance can be achieved by the difference equation a[l+1] = α a[l] + W[l], where W[l] is a zero-mean stationary white Gaussian sequence with var{W[l]} = σ_w² = (1 - α²)σ_a². The new system has input U[l] = W[l] and state X[l] = [x[l], v[l], a[l]]^T. What do the matrices F and G look like?
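One possible answer to the slide's closing question, worked out here rather than stated in the lecture: the position and velocity rows keep their original form, and a third row implements a[l+1] = α a[l] + W[l]. The numeric values of T, α, and σ_a² below are illustrative assumptions.

```python
import numpy as np

# Augmented-state model for the colored (AR(1)) acceleration input.
# This is my working of the slide's question; T, alpha, sigma_a2 are illustrative.
T, alpha, sigma_a2 = 0.1, 0.9, 1.0

F_aug = np.array([[1.0, T,   0.0  ],    # x[l+1] = x[l] + T v[l]
                  [0.0, 1.0, T    ],    # v[l+1] = v[l] + T a[l]
                  [0.0, 0.0, alpha]])   # a[l+1] = alpha a[l] + W[l]
G_aug = np.array([[0.0],
                  [0.0],
                  [1.0]])               # the white input W[l] enters only the a-state
Q_aug = np.array([[(1 - alpha**2) * sigma_a2]])   # var{W[l]} = (1 - alpha^2) sigma_a^2
H_aug = np.array([[1.0, 0.0, 0.0]])     # we still observe position only
```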

Extension 2: Deterministic Input Component. Sometimes the state update equation for the discrete-time dynamical model is specified as X[l+1] = F[l]X[l] + G[l]U[l] + s[l], where s[l] is a known sequence. How does this affect the Kalman filter recursion? Intuitively, the presence of a known sequence in the state update equation shouldn't degrade the performance of the MMSE estimator. It is not too difficult to show that the only change to the Kalman filter recursion is in the prediction step: X̂[l+1|l] = F[l]X̂[l|l] + s[l]. The error covariance matrix of the prediction is unchanged. Why? Hence the expressions for the Kalman gain, the state estimate, and the error covariance matrix of the estimate remain the same. The details are left as an exercise (Poor, Chapter V.2).
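In code, this extension is a one-term change to the time update. A minimal sketch (the function shape and names are mine, not from the lecture):

```python
import numpy as np

def kalman_step_with_known_input(y, s, x_pred, P_pred, F, G, H, Q, R):
    """One Kalman iteration when X[l+1] = F X[l] + G U[l] + s[l] with s[l] known.

    Only the state prediction gains the '+ s' term; the gain and both
    covariance updates are exactly as in the standard recursion.
    """
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_est = x_pred + K @ (y - H @ x_pred)
    P_est = P_pred - K @ H @ P_pred
    x_next_pred = F @ x_est + s                    # the only change
    P_next_pred = F @ P_est @ F.T + G @ Q @ G.T
    return x_est, P_est, x_next_pred, P_next_pred
```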

Extension 3: Correlated Input and Measurement Noise. Suppose now that the current input to the dynamical system and the current measurement noise are correlated, i.e., cov{U[k], V[l]} = C for k = l and 0 otherwise. Again, a lot of our earlier analysis is no longer valid. What can we do? The basic idea here is to transform the dynamical system into an equivalent dynamical system with uncorrelated input and measurement noise sequences. After deriving the new white input sequence Ũ[l] (uncorrelated with V[l] and with covariance Q - CR^{-1}C^T), you will end up with a new state update equation X[l+1] = (F - CR^{-1}H)X[l] + GŨ[l] + CR^{-1}s[l], where CR^{-1}s[l] is a known input (see Extension 2). The details are left as an exercise (Poor, Chapter V.4).

Types of State Prediction: fixed interval, fixed point, and fixed lead.

Fixed Interval Prediction. Given observations Y[0], ..., Y[L-1], we want to predict the states X[L+j] for j = 0, 1, ... in a computationally efficient manner. Earlier, we derived a general expression for MMSE prediction of the state X[n+1] given observations Y[0], ..., Y[l]: X̂[n+1|l] = F_n^l X̂[l|l] (1), where F_n^t := F[n]F[n-1]···F[t] for t ≤ n and F_n^t := I for t > n. This result can be written recursively for our problem as X̂[L+j|L-1] = F_{L+j-1}^{L-1} X̂[L-1|L-1] = F[L+j-1] X̂[L+j-1|L-1]. The associated error covariance matrix recursion can be computed similarly using our earlier results.
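A minimal sketch of this fixed-interval prediction recursion, assuming a time-invariant F, G, and Q (names are mine); it propagates the prediction and its error covariance forward from the last filtered estimate without using any new observations.

```python
import numpy as np

def fixed_interval_predict(x_LL, P_LL, F, G, Q, num_steps):
    """Predict X[L+j], j = 0..num_steps-1, from the filtered estimate at time L-1.

    x_LL, P_LL are X̂[L-1|L-1] and Σ[L-1|L-1]; no new observations are used.
    """
    x, P = x_LL.copy(), P_LL.copy()
    out = []
    for _ in range(num_steps):
        x = F @ x                              # X̂[L+j|L-1] = F X̂[L+j-1|L-1]
        P = F @ P @ F.T + G @ Q @ G.T          # Σ[L+j|L-1] = F Σ F^T + G Q G^T
        out.append((x.copy(), P.copy()))
    return out
```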

Fixed Point Prediction. Given observations Y[0], ..., Y[l], we want to predict the state X[L] in a computationally efficient manner for l = 0, ..., L-1. We can use the standard Kalman recursion to rewrite (1) as X̂[L|l] = F_{L-1}^l X̂[l|l] = F_{L-1}^l {X̂[l|l-1] + K[l]Ỹ[l|l-1]} = F_{L-1}^l {F[l-1]X̂[l-1|l-1] + K[l]Ỹ[l|l-1]} = F_{L-1}^{l-1} X̂[l-1|l-1] + F_{L-1}^l K[l]Ỹ[l|l-1] = X̂[L|l-1] + F_{L-1}^l K[l]Ỹ[l|l-1], which gives a recursion (in l) for X̂[L|l]. Note that it is assumed that K[l] and the innovations Ỹ[l|l-1] were computed as part of the standard Kalman filter recursion. A similar procedure can be followed to derive the recursion for the ECM.

Fixed Lead Prediction. Given observations Y[0], ..., Y[l], we want to estimate the state X[l+L] in a computationally efficient manner for l = 0, 1, .... For a fixed lead of L samples, the recursion can be derived as X̂[l+L|l] = F[l+L-1]X̂[l+L-1|l-1] + F_{l+L-1}^l K[l]Ỹ[l|l-1]. The error covariance matrix recursion can be computed by propagating the ECM from time l+1 to time l+L using the standard Kalman filter ECM recursion Σ[l+k|l] = F[l+k-1]Σ[l+k-1|l]F^T[l+k-1] + G[l+k-1]Q[l+k-1]G^T[l+k-1], starting at k = 1 and recursing up to k = L.

Types of State Smoothing: fixed interval, fixed point, and fixed lag.

Fixed Interval Smoothing. Given observations Y[0], ..., Y[L-1], we want to estimate (smooth) the state X[l] for l < L-1 in a computationally efficient manner. We assume that the standard Kalman estimates X̂[l|l] and predictions X̂[l|l-1] and the associated error covariance matrices for l = 0, 1, ..., L-1 have already been computed, but now we want to go back and smooth our earlier estimates in light of the full set of observations Y[0], ..., Y[L-1]. The (backward) recursion then specifies X̂[l|L-1] in terms of X̂[l+1|L-1] for l = L-2, L-3, ..., 0. We will also need a backward recursion for the error covariance matrix. The main idea behind the derivation: 1. Set up the joint pdf of the states X[l] and X[l+1], conditioned on the observations Y[0], ..., Y[L-1]. This conditional pdf will, of course, be Gaussian. 2. Write an expression for the conditional mean at time l in terms of the conditional mean at time l+1 (backward recursion).

Fixed Interval Smoothing: The Main Results. The backward state recursion can be derived as X̂[l|L-1] = X̂[l|l] + C[l]{X̂[l+1|L-1] - X̂[l+1|l]}, where the term in braces is the smoothing residual and C[l] = Σ[l|l]F^T[l]Σ^{-1}[l+1|l] is the smoother gain. The backward recursion for the smoothed state error covariance matrix can also be derived as Σ[l|L-1] = Σ[l|l] + C[l]{Σ[l+1|L-1] - Σ[l+1|l]}C^T[l]. The backward recursions for other types of smoothing can also be derived similarly.
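A minimal sketch of this backward pass (often called the Rauch-Tung-Striebel smoother), assuming the forward Kalman filter has already stored the filtered and one-step-predicted quantities and that F is time-invariant; the data layout and names are my own.

```python
import numpy as np

def fixed_interval_smoother(x_filt, P_filt, x_pred, P_pred, F):
    """Backward smoothing pass over l = L-2, ..., 0.

    x_filt[l], P_filt[l] : X̂[l|l], Σ[l|l] from the forward Kalman filter.
    x_pred[l], P_pred[l] : X̂[l+1|l], Σ[l+1|l] (one-step predictions).
    Returns the smoothed estimates X̂[l|L-1] and covariances Σ[l|L-1].
    """
    L = len(x_filt)
    x_s, P_s = [None] * L, [None] * L
    x_s[L - 1], P_s[L - 1] = x_filt[L - 1], P_filt[L - 1]
    for l in range(L - 2, -1, -1):
        C = P_filt[l] @ F.T @ np.linalg.inv(P_pred[l])      # smoother gain C[l]
        x_s[l] = x_filt[l] + C @ (x_s[l + 1] - x_pred[l])   # smoothing residual term
        P_s[l] = P_filt[l] + C @ (P_s[l + 1] - P_pred[l]) @ C.T
    return x_s, P_s
```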

Conclusions. The discrete-time Kalman-Bucy filter is often called the workhorse of estimation: it is computationally efficient, with no loss of optimality in the linear Gaussian case. We will see that the discrete-time Kalman-Bucy filter is also optimum in dynamic (or static) MMSE parameter estimation problems among the class of linear MMSE estimators, even if the noise is non-Gaussian. Even more extensions: the continuous-time Kalman-Bucy filter; the extended Kalman filter (nonlinear but differentiable state update, input, and output functions); the unscented Kalman filter (also allows for nonlinear functions, with better convergence properties than the EKF in some cases); etc.