Kalman Filtering. Namrata Vaswani. March 29, Kalman Filter as a causal MMSE estimator

Kalman Filtering Namrata Vaswani March 29, 2018 Notes are based on Vincent Poor s book. 1 Kalman Filter as a causal MMSE estimator Consider the following state space model (signal and observation model). t = H t X t + W t, W t N (0, R) (1) X t = F t X t 1 + U t, U t N (0, Q) (2) where X 0, {U t, t = 1,... }, {W t, t = 0,... } are mutually independent and X 0 N (0, Σ 0 ). Recall that Cov[X E[(X E[X )(X E[X ) T. Define ˆX t s E[X t 0:s Σ t s Cov[X t 0:s = E[(X t ˆX t s )( ) T 0:s (3) Thus ˆX t s is the MMSE estimator of X t from observations 0:s and Σ t s is its error covariance conditioned in 0:s. We claim that that ˆX t t and ˆX t t 1 satisfy the following recursion (Kalman filter). ˆX t t 1 = F t ˆXt 1 t 1 Σ t t 1 = F t Σ t 1 t 1 F T t + Q, with initialization, ˆX0 1 = 0, Σ 0 1 = Σ 0. K t = Σ t t 1 H T t (H t Σ t t 1 H T t + R) 1 ˆX t t = ˆX t t 1 + K t ( t H t ˆXt t 1 ) Σ t t = [I K t H t Σ t t 1 (4)

1.1 Conditional Gaussian Distribution In the proof we will need the following result for jointly Gaussian random variables. If X, have the joint PDF [ then X N ( [ µ µ X, [ Σ Σ X Σ X Σ X E[X = µ X + Σ X Σ 1 ( µ ) Cov[X = Σ X Σ X Σ 1 Σ X, (5) Proof: One way to prove this is to write out the expression for the conditional PDF and use the block matrix inversion lemma. A shorter and nicer proof is as follows. The idea is to define a r.v. Z that is a linear function of X and and is such that Cov(Z, ) = 0. Because it is a linear function, Z and are also jointly Gaussian and hence cov = 0 will imply independence. Then if one writes of X as a linear function of Z and, getting the above quantities becomes very easy because E[f(Z) = E[f(Z). 1. Let Z = X + B. We want Cov(Z, ) = 0. But notice that Cov(Z, ) = E[(X µ X )( µ ) T +BE[( µ )( µ ) T = Σ X +BΣ. Thus if we let B = Σ X Σ 1 we will get Cov(Z, ) = 0 and so using joint-guassianity Z and are independent. 2. Thus, Z = X Σ X Σ 1 and so X = Z +Σ X Σ 1. Also, Z and are independent. Thus, E[X = E[Z + Σ X Σ 1 ) = µ X Σ X Σ 1 µ + Σ X Σ 1 (6) and since Cov(Z ) = Cov(Z), Cov( ) = 0 and Cov(Z, B ) = E[(Z µ Z ) ( µ ) T = E[(Z µ Z )( µ ) T = 0, we get Cov[X = Cov(Z B, Z B ) where B := Σ X Σ 1 = Cov(Z ) + Cov(B ) Cov(Z, B ) Cov(B, Z ) = Cov(Z) = Σ X + Σ X Σ 1 Σ Σ 1 Σ X Σ X Σ 1 Σ X Σ X Σ 1 Σ X = Σ X Σ X Σ 1 Σ X (7) 1.2 Proof of KF as MMSE estimator 1. Use induction. Base case for t = 0 follows directly from the signal model. Assume that (4) holds at t 1.

2. To compute ˆX t t 1, take E[ 0:t 1 on the signal model, (2). Then use the fact that U t is independent of 0:t 1 = f(x 0, U 0:t 1, W 0:t 1 ) to show that E[U t 0:t 1 = E[U t = 0 (since U t is zero mean). 3. To compute Σ t t 1, take Cov[ 0:t 1 on the signal model, (2). Show that the crossterms which contain E[(X t 1 ˆX t 1 t 1 )U T t 0:t 1 or its transpose are zero using the following approach. (a) Let Z X t 1 ˆX t 1 t 1. (b) It is easy to see that both Z and 0:t 1 are functions of X 0, U 0:t 1, W 0:t 1, i.e. {Z, 0:t 1 } = f((x 0, U 0:t 1, W 0:t 1 ). Thus, U t is independent of {Z, 0:t 1 } (using the model assumption). (c) U independent of {Z, } implies (i) U independent of ; (ii) U independent of Z given (conditionally independent). Proof: U independent of {Z, } implies that f(y, u) = f(z, y, u)dz = f(z, y)f(u)dz = f(y)f(u). Thus (i) follows. To show (ii), notice that f(z, u y) = f(z y)f(u z, y) by chain rule. U independent of {Z, } implies that f(u z, y) = f(u). Also, (i) implies that f(u) = f(u y). Thus, f(z, u y) = f(z y)f(u y) and thus (ii) holds. (d) Thus U t, Z are conditionally independent given 0:t 1. (e) Thus the cross term, E[ZU T t 0:t 1 = E[Z 0:t 1 E[U t 0:t 1 T = E[Z 0:t 1 E[U t T = 0 and the same holds for its transpose. The second equality follows because U t is independent of 0:t 1 while the third follows because U t is zero mean. 4. For the update step, first derive the expression for the joint pdf of X t, t conditioned on 0:t 1, f(x t, y t y 0:t 1 ), and then use the conditional mean and covariance formula for joint Gaussians given in (5), to obtain the expression for ˆX t t and Σ t t. To obtain the joint pdf expression, use the following approach. (a) Since the three random variables are jointly Gaussian, f(x t, y t y 0:t 1 ) will also be Gaussian. Thus we only need expressions for its mean and covariance. Clearly, [ [ f(x t, y t y 0:t 1 ) = N ( Σ X = Σ t t 1, x t y t ; [ ˆXt t 1 H ˆX t t 1, Σ X Σ T X Σ X Σ X ), where Σ X = Cov[X t, t 0:t 1 = E[(X t ˆX t t 1 )(H(X t ˆX t t 1 ) + W t ) T 0:t 1 = Σ t t 1 H T + cross 1 = Σ t t 1 H T + 0, Σ = Cov[ t 0:t 1 = HΣ t t 1 H T + R + cross 2 + cross T 2 = HΣ t t 1 H T + R + 0 + 0

(b) To show that cross 1 = 0 and cross 2 = 0, use the fact that (X t ˆX t t 1 ) and W t are conditionally independent given 0:t 1 ; W t is independent of 0:t 1 and W t is zero mean. These can be shown in a fashion analogous to step 3. This finishes the proof of the induction step and thus of the result. A few important things to notice are as follows. 1. Since the expressions for Σ t t 1 and Σ t t do not depend on 0:t 1, thus these are also equal to the unconditional error covariances, i.e. Σ t t 1 E[(X t ˆX t t 1 )( ) T 0:t 1 = E[(X t ˆX t t 1 )( ) T and similarly, Σ t t E[(X t ˆX t t )( ) T 0:t = E[(X t ˆX t t )( ) T 2. It can be shown that the innovations, Z t t H ˆX t t 1, are pairwise uncorrelated. Since they are jointly Gaussian, this means they are mutually independent. (a) To show uncorrelated-ness one needs to use E[E[Z Z 1, Z 2 Z 1 = E[Z Z 1 (law of iterated expectations applied to Z = Z Z 1 ). (b) Consider an s < t, E[Z t Z T s = E[E[Z t Z T s 0:s = E[E[Z t 0:s Z T s = E[E[( t H ˆX t t 1 ) 0:s Z T s = E[(E[ t 0:s E[H ˆX t t 1 0:s )Z T s = E[(H ˆX t s E[H ˆX t t 1 0:s )Z T s = E[(H ˆX t s H ˆX t s )Z T s = 0 (8) We used E[E[Z Z 1, Z 2 Z 1 = E[Z Z 1 to show that E[H ˆX t t 1 0:s = H ˆX t s. Here Z X t, Z 1 0:s, Z 2 s+1:t. 3. Another expression for K t : 1.3 Kalman filter with control input Consider a state space model of the form t = HX t + r( 1, 2,... t 1 ) + W t, W t N (0, R) X t = F X t 1 + q( 1, 2,... t 1 ) + GU t, U t N (0, Q) with X 0, {U t } t=1, {W t } t=0 being mutually independent and X 0 N (0, Σ 0 ). The above is a state space model, but with a nonzero feedback control input in both equations. To derive the Kalman recursion for this model, use the exact same procedure

outlined above. Since at time t, the control inputs are functions of 0:t 1, when we take E[ 0:t 1 of either the signal model or the observation model, r and q just get pulled out as constants. Thus the expressions for the error covariances does not change at all. 2 Kalman filter as a causal linear MMSE estimator Consider the state space model of (1), (2), but with the difference that X 0, U t, W t s are no longer Gaussian, but are just some zero mean random variables with the given covariances. Also, instead of being mutually independent, they are only pairwise uncorrelated. For this model, the Kalman filter of (4) is the causal linear MMSE estimator, i.e. ˆXt t 1 is the linear MMSE of X t from 0:t 1, ˆXt t is the linear MMSE of X t from 0:t, and the unconditional error covariances, E[(X t ˆX t t 1 )(.) T = Σ t t 1 and E[(X t ˆX t t )(.) T = Σ t t. We show the above using the orthogonality principle: ˆX is the linear MMSE of X from observations 0:n 1 if and only if Outline of Proof. E[(X ˆX) = 0 and E[(X ˆX) T l = 0, l = 0, 1,... n 1, (9) 1. Use induction. The base case is easy, since you are using no observations. Assume that our result holds for t 1, i.e. E[(X t 1 ˆX t 1 t 1 )(.) T = Σ t 1 t 1. 2. The expression for ˆX t t 1 is given in (4). Thus, E[(X t ˆX t t 1 ) T l ˆXt 1 t 1 is linear MMSE of X t 1 from 0:t 1 and = F E[(X t 1 ˆX t 1 t 1 )l T + F E[U t l T = 0 + 0, l = 0, 1,... t 1(10) The first term is zero because of the induction hypothesis and the orthogonality principle. The second term is zero because of uncorrelated-ness of U t and l (follows because l is a linear function of X 0, U 1:l, W 0:l ). Also, E[(X t ˆX t t 1 ) = F E[X t 1 ˆX t 1 t 1 + E[U t = 0 (11) This follows by induction assumption and orthogonality principle and since U t is zero mean. Thus by the orthogonality principle, ˆXt t 1 is the L-MMSE of X t from 0:t 1. 3. It is easy to show that E[(X t ˆX t t 1 )(.) T = F Σ t 1 t 1 F T + Q + 0 + 0 = Σ t t 1 by showing that the cross-terms are zero. The cross-terms are of the form E[(X t 1 ˆX t 1 t 1 )U T t (or its transpose). They are zero since (X t 1 ˆX t 1 t 1 ) is a linear function of X 0, U 1:t 1, W 0:t 1, all of which are uncorrelated with U t, and U t is zero mean.

4. The expression for ˆX t t is given in (4). Using the expression for t, X t ˆX t t = X t [(I K t H) ˆX t t 1 + K t t = (I K t H)(X t ˆX t t 1 ) + K t W t (12) Thus, E[(X t ˆX t t ) T l = (I K t H)E[(X t ˆX t t 1 )l T + K t E[W t l T (13) For l = 0, 1,... t 1, one can use the previous step; uncorrelated-ness of W t and 0:t 1 and W t being zero mean, to show that the above is zero. Consider l = t. E[(X t ˆX t t )t T = (I K t H)E[(X t ˆX t t 1 )t T K t E[W t t T = (I K t H)E[(X t ˆX t t 1 )(HX t + W t ) T K t E[W t W T t = (I K t H)Σ t t 1 H T K t R = 0 (14) (a) The last equality follows from the expression for K t. (b) The second-last one follows because i. E[(X t ˆX t t 1 )X T t = E[(X t ˆX t t 1 )(X t ˆX t t 1 + ˆX t t 1 ) T = Σ t t 1 +E[(X t ˆX t t 1 ) ˆX T t t 1. ii. E[(X t ˆX t t 1 ) ˆX T t t 1 = E[E[(X t ˆX t t 1 ) 0:t 1 ˆX T t t 1 = E[0. ˆX T t t 1 = 0 iii. W t and (X t ˆX t t 1 ) are uncorrelated and W t is zero mean. (c) The third-last one follows using (1) and the fact that W t and X t are uncorrelated and W t is zero mean. Also, using (12), E[(X t ˆX t t ) = (I K t H)E[X t ˆX t t 1 + K t E[W t = 0 (15) The first term is zero by prediction step claim and orthogonality principle; the second term is zero since W t is zero mean. Thus by orthogonality principle, ˆXt t is L-MMSE of X t from 0:t. 5. To get the expression for E[(X t ˆX t t )(.) T, expand it out using the expression for ˆX t t from (4) and then simplify it using (1) and the expression for K t. (a) E[(X t ˆX t t )(.) T = (I K t H)E[(X t ˆX t t 1 )(.) T (I K t H) T + K t E[W t W T t K T t + cross 3 + cross T 3 = (I K t H)Σ t t 1 (I K t H) T + K t RK T t + 0 + 0. (b) The cross term, cross 3, contains E[(X t ˆX t t 1 )W T t which can be shown to be zero since W t and (X t ˆX t t 1 ) are uncorrelated and W t is zero mean. (c) Use the expression for K t and simplify to show that (I K t H)Σ t t 1 (I K t H) T + K t RK T t = (I K t H)Σ t t 1 : homework problem.