4 Derivations of the Discrete-Time Kalman Filter

Technion, Israel Institute of Technology, Department of Electrical Engineering
Estimation and Identification in Dynamical Systems (048825)
Lecture Notes, Fall 2009, Prof. N. Shimkin

We derive here the basic equations of the Kalman filter (KF) for discrete-time linear systems. We consider several derivations under different assumptions and viewpoints: for the Gaussian case, the KF is the optimal (MMSE) state estimator; in the non-Gaussian case, the KF is derived as the best linear (LMMSE) state estimator. We also provide a deterministic (least-squares) interpretation. We start by describing the basic state-space model.

4.1 The Stochastic State-Space Model

A discrete-time, linear, time-varying state-space system is given by:

    x_{k+1} = F_k x_k + G_k w_k    (state evolution equation)
    z_k = H_k x_k + v_k            (measurement equation)

for k ≥ 0 (say), and initial condition x_0. Here:
- F_k, G_k, H_k are known matrices;
- x_k ∈ R^n is the state vector;
- w_k ∈ R^{n_w} is the state noise;
- z_k ∈ R^m is the observation vector;
- v_k is the observation noise.

The initial condition x_0 is usually a random variable. The noise sequences (w_k, v_k) and the initial condition x_0 are stochastic processes with known statistics.

The Markovian model

Recall that a stochastic process {X_k} is a Markov process if

    p(X_{k+1} | X_k, X_{k-1}, ...) = p(X_{k+1} | X_k).

For the state x_k to be Markovian, we need the following assumption.

Assumption A1: The state-noise process {w_k} is white in the strict sense, namely all w_k's are independent of each other. Furthermore, this process is independent of x_0.

The following is then a simple exercise:

Proposition: Under A1, the state process {x_k, k ≥ 0} is Markov.

Note: Linearity is not essential: the Markov property follows from A1 also for the nonlinear state equation x_{k+1} = f(x_k, w_k). The measurement process z_k is usually not Markov.

The pdf of the state can (in principle) be computed recursively via the following (Chapman-Kolmogorov) equation:

    p(x_{k+1}) = ∫ p(x_{k+1} | x_k) p(x_k) dx_k,

where p(x_{k+1} | x_k) is determined by p(w_k).

The Gaussian model

Assume that the noise sequences {w_k}, {v_k} and the initial condition x_0 are jointly Gaussian. It easily follows that the processes {x_k} and {z_k} are (jointly) Gaussian as well. If, in addition, A1 is satisfied (namely {w_k} is white and independent of x_0), then x_k is a Markov process. This model is often called the Gauss-Markov model.

Second-Order Model

We often assume that only the first- and second-order statistics of the noise are known. Consider our linear system:

    x_{k+1} = F_k x_k + G_k w_k,    k ≥ 0
    z_k = H_k x_k + v_k,

under the following assumptions:
- w_k is a zero-mean white noise: E(w_k) = 0, cov(w_k, w_l) = Q_k δ_{kl};
- v_k is a zero-mean white noise: E(v_k) = 0, cov(v_k, v_l) = R_k δ_{kl};
- cov(w_k, v_l) = 0: uncorrelated noises;
- x_0 is uncorrelated with the other noise sequences; denote \bar{x}_0 = E(x_0), cov(x_0) = P_0.

We refer to this model as the standard second-order model.

It is sometimes useful to allow correlation between v_k and w_k:

    cov(w_k, v_l) ≜ E(w_k v_l^T) = S_k δ_{kl}.

This gives the second-order model with correlated noise. A short-hand notation for the above correlations:

    cov( (w_k; v_k; x_0), (w_l; v_l; x_0) ) = [ Q_k δ_{kl}    S_k δ_{kl}    0
                                                S_k^T δ_{kl}  R_k δ_{kl}    0
                                                0             0             P_0 ].

Note that the Gauss-Markov model is a special case of this model.
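As a concrete illustration, the following Python/NumPy sketch simulates a time-invariant instance of the standard second-order model with Gaussian noises. The specific matrices are illustrative choices, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (time-invariant) system matrices
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition F_k
G = np.array([[0.5], [1.0]])             # noise input matrix G_k
H = np.array([[1.0, 0.0]])               # measurement matrix H_k
Q = np.array([[0.1]])                    # state-noise covariance Q_k
R = np.array([[1.0]])                    # measurement-noise covariance R_k
x_bar0 = np.array([0.0, 1.0])            # E(x_0)
P0 = np.eye(2)                           # cov(x_0)

def simulate(N):
    """Generate sample states x_k and measurements z_k for k = 0, ..., N-1."""
    x = rng.multivariate_normal(x_bar0, P0)
    xs, zs = [], []
    for _ in range(N):
        z = H @ x + rng.multivariate_normal(np.zeros(1), R)   # z_k = H x_k + v_k
        xs.append(x); zs.append(z)
        w = rng.multivariate_normal(np.zeros(1), Q)            # white state noise
        x = F @ x + G @ w                                      # x_{k+1} = F x_k + G w_k
    return np.array(xs), np.array(zs)

xs, zs = simulate(50)
```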

Mean and covariance propagation

For the standard second-order model, we easily obtain recursive formulas for the mean and covariance of the state.

The mean obviously satisfies:

    \bar{x}_{k+1} = F_k \bar{x}_k + G_k E(w_k) = F_k \bar{x}_k.

Consider next the covariance P_k = E((x_k - \bar{x}_k)(x_k - \bar{x}_k)^T). Note that

    x_{k+1} - \bar{x}_{k+1} = F_k (x_k - \bar{x}_k) + G_k w_k,

and w_k and x_k are uncorrelated (why?). Therefore

    P_{k+1} = F_k P_k F_k^T + G_k Q_k G_k^T.

This equation is in the form of a Lyapunov difference equation.

Since z_k = H_k x_k + v_k, it is now easy to compute its covariance, and also the joint covariances of (x_k, z_k).

In the Gaussian case, the pdf of x_k is completely specified by the mean and covariance: x_k ~ N(\bar{x}_k, P_k).
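A minimal NumPy sketch of these two recursions (time-invariant F, G, Q assumed for brevity; the example matrices are illustrative, not from the notes):

```python
import numpy as np

def propagate_moments(F, G, Q, x_bar0, P0, N):
    """Propagate the prior mean and covariance of the state:
       x_bar_{k+1} = F x_bar_k,  P_{k+1} = F P_k F^T + G Q G^T  (Lyapunov recursion)."""
    x_bar, P = x_bar0.copy(), P0.copy()
    means, covs = [x_bar], [P]
    for _ in range(N):
        x_bar = F @ x_bar
        P = F @ P @ F.T + G @ Q @ G.T
        means.append(x_bar)
        covs.append(P)
    return means, covs

# Example usage with illustrative matrices
F = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.array([[0.5], [1.0]])
Q = np.array([[0.1]])
means, covs = propagate_moments(F, G, Q, np.array([0.0, 1.0]), np.eye(2), N=10)
```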

4.2 The KF for the Gaussian Case

Consider the linear Gaussian (or Gauss-Markov) model

    x_{k+1} = F_k x_k + G_k w_k,    k ≥ 0
    z_k = H_k x_k + v_k,

where:
- {w_k} and {v_k} are independent, zero-mean Gaussian white processes with covariances E(v_k v_l^T) = R_k δ_{kl}, E(w_k w_l^T) = Q_k δ_{kl};
- the initial state x_0 is a Gaussian RV, independent of the noise processes, with x_0 ~ N(\bar{x}_0, P_0).

Let Z_k = (z_0, ..., z_k). Our goal is to compute recursively the following optimal (MMSE) estimator of x_k:

    \hat{x}_k^+ ≡ \hat{x}_{k|k} = E(x_k | Z_k).

Also define the one-step predictor of x_k:

    \hat{x}_k^- ≡ \hat{x}_{k|k-1} = E(x_k | Z_{k-1}),

and the respective covariance matrices:

    P_k^+ ≡ P_{k|k}   = E{ (x_k - \hat{x}_k^+)(x_k - \hat{x}_k^+)^T | Z_k }
    P_k^- ≡ P_{k|k-1} = E{ (x_k - \hat{x}_k^-)(x_k - \hat{x}_k^-)^T | Z_{k-1} }.

Note that P_k^+ (and similarly P_k^-) can be viewed in two ways:

(i) It is the covariance matrix of the (posterior) estimation error e_k = x_k - \hat{x}_k^+. In particular, MMSE = trace(P_k^+).

(ii) It is the covariance matrix of the conditional RV (x_k | Z_k), namely an RV with distribution p(x_k | Z_k) (since \hat{x}_k^+ is its mean).

Finally, denote P_0^- = P_0, \hat{x}_0^- = \bar{x}_0.

Recall the formulas for conditioned Gaussian vectors: if x and z are jointly Gaussian, then p_{x|z} ~ N(m, Σ), with

    m = m_x + Σ_{xz} Σ_{zz}^{-1} (z - m_z),
    Σ = Σ_{xx} - Σ_{xz} Σ_{zz}^{-1} Σ_{zx}.

The same formulas hold when everything is conditioned, in addition, on another random vector. According to the terminology above, we say in this case that the conditional RV (x | z) is Gaussian.

Proposition: For the model above, all random processes (noises, x_k, z_k) are jointly Gaussian.

Proof: All can be expressed as linear combinations of the noise sequences, which are jointly Gaussian (why?).

It follows that (x_k | Z_m) is Gaussian (for any k, m). In particular:

    (x_k | Z_k) ~ N(\hat{x}_k^+, P_k^+),    (x_k | Z_{k-1}) ~ N(\hat{x}_k^-, P_k^-).

Filter Derivation

Suppose, at time k, that (\hat{x}_k^-, P_k^-) is given. We shall compute (\hat{x}_k^+, P_k^+) and (\hat{x}_{k+1}^-, P_{k+1}^-), using the following two steps.

Measurement update step: Since z_k = H_k x_k + v_k, the conditional vector ((x_k; z_k) | Z_{k-1}) is Gaussian, with mean and covariance

    mean: ( \hat{x}_k^-, H_k \hat{x}_k^- ),    covariance: [ P_k^-        P_k^- H_k^T
                                                             H_k P_k^-    M_k          ],

where M_k = H_k P_k^- H_k^T + R_k.

To compute (x_k | Z_k) = (x_k | z_k, Z_{k-1}), we apply the above formula for the conditional expectation of Gaussian RVs, with everything pre-conditioned on Z_{k-1}. It follows that (x_k | Z_k) is Gaussian, with mean and covariance

    \hat{x}_k^+ = E(x_k | Z_k) = \hat{x}_k^- + P_k^- H_k^T M_k^{-1} (z_k - H_k \hat{x}_k^-),
    P_k^+ = cov(x_k | Z_k) = P_k^- - P_k^- H_k^T M_k^{-1} H_k P_k^-.

Time update step: Recall that x_{k+1} = F_k x_k + G_k w_k. Further, x_k and w_k are independent given Z_k (why?). Therefore,

    \hat{x}_{k+1}^- = E(x_{k+1} | Z_k) = F_k \hat{x}_k^+,
    P_{k+1}^- = cov(x_{k+1} | Z_k) = F_k P_k^+ F_k^T + G_k Q_k G_k^T.
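These two steps translate directly into code. The following minimal NumPy sketch (time-invariant notation, illustrative only) implements one measurement update followed by one time update:

```python
import numpy as np

def measurement_update(x_pred, P_pred, z, H, R):
    """Condition the Gaussian (x_k | Z_{k-1}) ~ N(x_pred, P_pred) on z_k = H x_k + v_k."""
    M = H @ P_pred @ H.T + R                 # innovations covariance M_k
    K = P_pred @ H.T @ np.linalg.inv(M)      # P_k^- H^T M_k^{-1}
    x_post = x_pred + K @ (z - H @ x_pred)   # posterior mean \hat{x}_k^+
    P_post = P_pred - K @ H @ P_pred         # posterior covariance P_k^+
    return x_post, P_post

def time_update(x_post, P_post, F, G, Q):
    """Propagate the posterior through x_{k+1} = F x_k + G w_k."""
    x_pred = F @ x_post                      # \hat{x}_{k+1}^-
    P_pred = F @ P_post @ F.T + G @ Q @ G.T  # P_{k+1}^-
    return x_pred, P_pred
```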

Remarks:

1. The KF computes both the estimate \hat{x}_k^+ and its MSE/covariance P_k^+ (and similarly for \hat{x}_k^-). Note that the covariance computation is needed as part of the estimator computation. However, it is also of independent importance, as it assigns a measure of uncertainty (or confidence) to the estimate.

2. It is remarkable that the conditional covariance matrices P_k^+ and P_k^- do not depend on the measurements {z_k}. They can therefore be computed in advance, given the system matrices and the noise covariances.

3. As usual in the Gaussian case, P_k^+ is also the unconditional error covariance:

    P_k^+ = cov(x_k - \hat{x}_k^+) = E[(x_k - \hat{x}_k^+)(x_k - \hat{x}_k^+)^T].

In the non-Gaussian case, the unconditional covariance will play the central role as we compute the LMMSE estimator.

4. Suppose we need to estimate some s_k = C x_k. Then the optimal estimate is \hat{s}_k = E(s_k | Z_k) = C \hat{x}_k^+.

5. The output prediction error

    \tilde{z}_k = z_k - H_k \hat{x}_k^- ≡ z_k - E(z_k | Z_{k-1})

is called the innovation, and {\tilde{z}_k} is the important innovations process. Note that M_k = H_k P_k^- H_k^T + R_k is just the covariance of \tilde{z}_k.

4.3 Best Linear Estimator: Innovations Approach

a. Linear Estimators

Recall that the best linear (or LMMSE) estimator of x given y is an estimator of the form \hat{x} = A y + b which minimizes the mean square error E(||x - \hat{x}||^2). It is given by

    \hat{x} = m_x + Σ_{xy} Σ_{yy}^{-1} (y - m_y),

where Σ_{xy} and Σ_{yy} are the corresponding covariance matrices. It easily follows that \hat{x} is unbiased, E(\hat{x}) = m_x, and the corresponding (minimal) error covariance is

    cov(x - \hat{x}) = E(x - \hat{x})(x - \hat{x})^T = Σ_{xx} - Σ_{xy} Σ_{yy}^{-1} Σ_{xy}^T.

We shall find it convenient to denote this estimator by \hat{x} = E^L(x | y). Note that this is not the standard conditional expectation.

Recall further the orthogonality principle:

    E( (x - E^L(x | y)) L(y) ) = 0    for any linear function L(y) of y.

The following property will be most useful. It follows simply by using y = (y_1; y_2) in the formulas above: suppose cov(y_1, y_2) = 0. Then

    E^L(x | y_1, y_2) = E^L(x | y_1) + [ E^L(x | y_2) - E(x) ].

Furthermore,

    cov(x - E^L(x | y_1, y_2)) = ( Σ_{xx} - Σ_{x y_1} Σ_{y_1 y_1}^{-1} Σ_{x y_1}^T ) - Σ_{x y_2} Σ_{y_2 y_2}^{-1} Σ_{x y_2}^T.
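As a small numerical illustration of these formulas (the joint covariance below is an arbitrary illustrative choice), the LMMSE estimator and the orthogonality principle can be checked directly on synthetic data:

```python
import numpy as np

def lmmse(m_x, m_y, S_xx, S_xy, S_yy, y):
    """Best linear (LMMSE) estimate E^L(x|y) for one observation y, plus its error covariance."""
    A = S_xy @ np.linalg.inv(S_yy)
    x_hat = m_x + A @ (y - m_y)
    err_cov = S_xx - A @ S_xy.T      # Sigma_xx - Sigma_xy Sigma_yy^{-1} Sigma_xy^T
    return x_hat, err_cov

# Numerical check of the orthogonality principle on synthetic zero-mean Gaussian data
rng = np.random.default_rng(1)
S = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.5]])      # joint covariance of (x, y_1, y_2)
samples = rng.multivariate_normal(np.zeros(3), S, size=200_000)
x_s, y_s = samples[:, 0], samples[:, 1:]

x_hat0, err_cov = lmmse(np.zeros(1), np.zeros(2), S[:1, :1], S[:1, 1:], S[1:, 1:], y_s[0])

# Vectorized estimates for all samples (zero means): E^L(x|y) = Sigma_xy Sigma_yy^{-1} y
A = S[:1, 1:] @ np.linalg.inv(S[1:, 1:])
err = x_s - (A @ y_s.T).ravel()
print(err @ y_s / len(y_s))          # empirical E[(x - x_hat) y^T], approximately zero
```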

b. The innovations process

Consider a discrete-time stochastic process {z_k}_{k≥0}. The (wide-sense) innovations process is defined as

    \tilde{z}_k = z_k - E^L(z_k | Z_{k-1}),

where Z_{k-1} = (z_0; ...; z_{k-1}). The innovation RV \tilde{z}_k may be regarded as containing only the new statistical information which is not already in Z_{k-1}.

The following properties follow directly from those of the best linear estimator:
(1) E(\tilde{z}_k) = 0, and E(\tilde{z}_k Z_{k-1}^T) = 0.
(2) \tilde{z}_k is a linear function of Z_k.
(3) Thus, cov(\tilde{z}_k, \tilde{z}_l) = E(\tilde{z}_k \tilde{z}_l^T) = 0 for k ≠ l.

This implies that the innovations process is a zero-mean white noise process.

Denote \tilde{Z}_k = (\tilde{z}_0; ...; \tilde{z}_k). It is easily verified that Z_k and \tilde{Z}_k are linear functions of each other. This implies that E^L(x | Z_k) = E^L(x | \tilde{Z}_k) for any RV x. It follows that (taking E(x) = 0 for simplicity):

    E^L(x | Z_k) = E^L(x | \tilde{Z}_k) = E^L(x | \tilde{Z}_{k-1}) + E^L(x | \tilde{z}_k) = Σ_{l=0}^{k} E^L(x | \tilde{z}_l).

c. Derivation of the KF equations

We proceed to derive the Kalman filter as the best linear estimator for our linear, non-Gaussian model. We slightly generalize the model treated so far by allowing correlation between the state noise and the measurement noise. Thus, we consider the model

    x_{k+1} = F_k x_k + G_k w_k,    k ≥ 0
    z_k = H_k x_k + v_k,

with (w_k; v_k) a zero-mean white noise sequence with covariance

    E( (w_k; v_k) (w_l^T, v_l^T) ) = [ Q_k    S_k
                                       S_k^T  R_k ] δ_{kl}.

x_0 has mean \bar{x}_0 and covariance P_0, and is uncorrelated with the noise sequence.

We use here the following notation:

    Z_k = (z_0; ...; z_k)
    \hat{x}_{k|k-1} = E^L(x_k | Z_{k-1}),    \hat{x}_{k|k} = E^L(x_k | Z_k)
    \tilde{x}_{k|k-1} = x_k - \hat{x}_{k|k-1},    \tilde{x}_{k|k} = x_k - \hat{x}_{k|k}
    P_{k|k-1} = cov(\tilde{x}_{k|k-1}),    P_{k|k} = cov(\tilde{x}_{k|k}),

and define the innovations process

    \tilde{z}_k = z_k - E^L(z_k | Z_{k-1}) = z_k - H_k \hat{x}_{k|k-1}.

Note that \tilde{z}_k = H_k \tilde{x}_{k|k-1} + v_k.

Measurement update: From our previous discussion of linear estimation and innovations,

    \hat{x}_{k|k} = E^L(x_k | Z_k) = E^L(x_k | \tilde{Z}_k) = E^L(x_k | Z_{k-1}) + E^L(x_k | \tilde{z}_k) - E(x_k).

This relation is the basis for the innovations approach. The rest follows essentially by direct computations, and some use of the orthogonality principle.

First,

    E^L(x_k | \tilde{z}_k) - E(x_k) = cov(x_k, \tilde{z}_k) cov(\tilde{z}_k)^{-1} \tilde{z}_k.

The two covariances are next computed:

    cov(x_k, \tilde{z}_k) = cov(x_k, H_k \tilde{x}_{k|k-1} + v_k) = P_{k|k-1} H_k^T,

where E(x_k \tilde{x}_{k|k-1}^T) = P_{k|k-1} follows by orthogonality, and we also used the fact that v_k and x_k are not correlated. Similarly,

    cov(\tilde{z}_k) = cov(H_k \tilde{x}_{k|k-1} + v_k) = H_k P_{k|k-1} H_k^T + R_k ≡ M_k.

By substituting in the estimator expression we obtain

    \hat{x}_{k|k} = \hat{x}_{k|k-1} + P_{k|k-1} H_k^T M_k^{-1} \tilde{z}_k.

Time update: This step is less trivial than before, due to the correlation between v_k and w_k. We have

    \hat{x}_{k+1|k} = E^L(x_{k+1} | Z_k) = E^L(F_k x_k + G_k w_k | Z_k) = F_k \hat{x}_{k|k} + G_k E^L(w_k | \tilde{z}_k).

In the last equation we used E^L(w_k | Z_{k-1}) = 0, since w_k is uncorrelated with Z_{k-1}. Thus

    \hat{x}_{k+1|k} = F_k \hat{x}_{k|k} + G_k E(w_k \tilde{z}_k^T) cov(\tilde{z}_k)^{-1} \tilde{z}_k
                    = F_k \hat{x}_{k|k} + G_k S_k M_k^{-1} \tilde{z}_k,

where E(w_k \tilde{z}_k^T) = E(w_k v_k^T) = S_k follows from \tilde{z}_k = H_k \tilde{x}_{k|k-1} + v_k.

Combined update: Combining the measurement and time updates, we obtain the one-step update for \hat{x}_{k|k-1}:

    \hat{x}_{k+1|k} = F_k \hat{x}_{k|k-1} + K_k \tilde{z}_k,

where

    K_k = (F_k P_{k|k-1} H_k^T + G_k S_k) M_k^{-1}
    \tilde{z}_k = z_k - H_k \hat{x}_{k|k-1}
    M_k = H_k P_{k|k-1} H_k^T + R_k.

Covariance update: The relation between P_{k|k} and P_{k|k-1} is exactly as before. The recursion for P_{k+1|k} is most conveniently obtained in terms of P_{k|k-1} directly. From the previous relations we obtain

    \tilde{x}_{k+1|k} = (F_k - K_k H_k) \tilde{x}_{k|k-1} + G_k w_k - K_k v_k.

Since \tilde{x}_{k|k-1} is uncorrelated with w_k and v_k,

    P_{k+1|k} = (F_k - K_k H_k) P_{k|k-1} (F_k - K_k H_k)^T + G_k Q_k G_k^T
                + K_k R_k K_k^T - (G_k S_k K_k^T + K_k S_k^T G_k^T).

This completes the filter equations for this case.
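A minimal NumPy sketch of this one-step predictor recursion, including the correlated-noise term S_k (time-invariant, illustrative notation; all matrices are assumed inputs):

```python
import numpy as np

def predictor_step(x_pred, P_pred, z, F, G, H, Q, R, S):
    """One-step KF predictor with correlated state/measurement noise:
       x_{k+1|k} = F x_{k|k-1} + K_k (z_k - H x_{k|k-1})."""
    M = H @ P_pred @ H.T + R                          # innovations covariance M_k
    K = (F @ P_pred @ H.T + G @ S) @ np.linalg.inv(M)
    z_tilde = z - H @ x_pred                          # innovation
    x_next = F @ x_pred + K @ z_tilde
    A = F - K @ H
    P_next = (A @ P_pred @ A.T + G @ Q @ G.T + K @ R @ K.T
              - G @ S @ K.T - K @ S.T @ G.T)
    return x_next, P_next
```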

Addendum: A Hilbert space interpretation

The definitions and results concerning linear estimators can be nicely interpreted in terms of a Hilbert space formulation. Consider for simplicity all RVs in this section to have zero mean.

Recall that a Hilbert space is a (complete) inner-product space. That is, it is a linear vector space V, with a real-valued inner product operation <v_1, v_2> which is bi-linear, symmetric, and non-degenerate (<v, v> = 0 iff v = 0). (Completeness means that every Cauchy sequence has a limit.) The derived norm is defined by ||v||^2 = <v, v>.

The following facts are standard:

1. A subspace S is a linearly-closed subset of V. Alternatively, it is the linear span of some set of vectors {v_α}.

2. The orthogonal projection Π_S v of a vector v onto the subspace S is the closest element to v in S, i.e., the vector \hat{v} ∈ S which minimizes ||v - \hat{v}||. Such a vector exists and is unique, and satisfies (v - Π_S v) ⊥ S, i.e., <v - Π_S v, s> = 0 for all s ∈ S.

3. If S = span{s_1, ..., s_k}, then Π_S v = Σ_{i=1}^{k} α_i s_i, where

    [α_1, ..., α_k] = [<v, s_1>, ..., <v, s_k>] [<s_i, s_j>]_{i,j=1..k}^{-1}.

4. If S = S_1 ⊕ S_2 (S is the direct sum of two orthogonal subspaces S_1 and S_2), then Π_S v = Π_{S_1} v + Π_{S_2} v. If {s_1, ..., s_k} is an orthogonal basis of S, then

    Π_S v = Σ_{i=1}^{k} <v, s_i> <s_i, s_i>^{-1} s_i.

5. Given a set of (independent) vectors {v_1, v_2, ...}, the following Gram-Schmidt procedure provides an orthogonal basis:

    \tilde{v}_k = v_k - Π_{span{v_1, ..., v_{k-1}}} v_k
                = v_k - Σ_{i=1}^{k-1} <v_k, \tilde{v}_i> <\tilde{v}_i, \tilde{v}_i>^{-1} \tilde{v}_i.

We can fit the previous results on linear estimation into this framework by noting the following correspondence:

- Our Hilbert space is the space of all zero-mean random variables x (on a given probability space) which are square-integrable: E(x^2) < ∞. The inner product is defined as <x, y> = E(xy).
- The optimal linear estimator E^L(x_k | Z_k), with Z_k = (z_0, ..., z_k), is the orthogonal projection of the vector x_k onto the subspace spanned by Z_k. (If x_k is vector-valued, we simply consider the projection of each element separately.)
- The innovations process {\tilde{z}_k} is an orthogonalized version of {z_k}.

The Hilbert space formulation provides a nice insight, and can also provide useful technical results, especially in the continuous-time case. However, we shall not go deeper into this topic.
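To make the correspondence concrete, here is a small sketch (with synthetic, illustrative data) that runs Gram-Schmidt with the covariance inner product <x, y> = E(xy), approximated by sample averages; the orthogonalized outputs play the role of empirical innovations:

```python
import numpy as np

def gram_schmidt(vectors, inner):
    """Orthogonalize 'vectors' (a list) with respect to the inner product 'inner'."""
    basis = []
    for v in vectors:
        v_tilde = v.copy()
        for b in basis:
            v_tilde = v_tilde - (inner(v_tilde, b) / inner(b, b)) * b
        basis.append(v_tilde)
    return basis

# Random variables represented by zero-mean samples; <x, y> = E(xy) ~ sample average
rng = np.random.default_rng(2)
z = rng.standard_normal((3, 100_000))
z[1] += 0.5 * z[0]
z[2] += 0.3 * z[1]                        # make the coordinates correlated
inner = lambda x, y: float(np.mean(x * y))
z_tilde = gram_schmidt(list(z), inner)    # empirical "innovations" of z_0, z_1, z_2
# Gram matrix of the orthogonalized variables -- approximately diagonal
print(np.round([[inner(a, b) for b in z_tilde] for a in z_tilde], 3))
```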

4.4 The Kalman Filter as a Least-Squares Problem

Consider the following deterministic optimization problem.

Cost function (to be minimized):

    J_k = (1/2) (x_0 - \bar{x}_0)^T P_0^{-1} (x_0 - \bar{x}_0)
          + (1/2) Σ_{l=0}^{k} (z_l - H_l x_l)^T R_l^{-1} (z_l - H_l x_l)
          + (1/2) Σ_{l=0}^{k-1} w_l^T Q_l^{-1} w_l.

Constraints:    x_{l+1} = F_l x_l + G_l w_l,    l = 0, 1, ..., k-1.

Variables:      x_0, ..., x_k;  w_0, ..., w_{k-1}.

Here \bar{x}_0 and {z_l} are given vectors, and P_0, R_l, Q_l are symmetric positive-definite matrices.

Let (x_0^{(k)}, ..., x_k^{(k)}) denote the optimal solution of this problem. We claim that x_k^{(k)} can be computed exactly as \hat{x}_{k|k} in the corresponding KF problem.

This claim can be established by writing explicitly the least-squares solution for k-1 and k, and manipulating the matrix expressions. We will take here a quicker route, using the Gaussian insight.

Theorem: The minimizing solution (x_0^{(k)}, ..., x_k^{(k)}) of the above LS problem is the maximizer, with respect to (x_0, ..., x_k), of the conditional probability (that is, it is the MAP estimator)

    p(x_0, ..., x_k | Z_k),

related to the Gaussian model

    x_{k+1} = F_k x_k + G_k w_k,    x_0 ~ N(\bar{x}_0, P_0),
    z_k = H_k x_k + v_k,    w_k ~ N(0, Q_k),    v_k ~ N(0, R_k),

with w_k, v_k white and independent of x_0.

Proof: Write down the distribution p(x_0, ..., x_k | Z_k). Immediate.

Consequence: Since for Gaussian RVs MAP = MMSE, the (x_0, ..., x_k)^{(k)} are equal to the corresponding conditional means. In particular, x_k^{(k)} = \hat{x}_k^+.

Remark: The above theorem (but not the last consequence) holds true even for the nonlinear model x_{k+1} = F_k(x_k) + G_k w_k.
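The equivalence can be checked numerically. The sketch below (all matrices and values are illustrative choices, not from the notes) solves the batch least-squares problem over the variables (x_0, w_0, ..., w_{k-1}), with the states eliminated through the constraints, and compares x_k^{(k)} with the KF estimate \hat{x}_{k|k}:

```python
import numpy as np
from numpy.linalg import inv, cholesky, lstsq

rng = np.random.default_rng(3)
# Illustrative time-invariant model
F = np.array([[1.0, 1.0], [0.0, 1.0]]); G = np.array([[0.5], [1.0]])
H = np.array([[1.0, 0.0]]); Q = np.array([[0.1]]); R = np.array([[0.5]])
x_bar0 = np.zeros(2); P0 = np.eye(2)
n, nw, m, k = 2, 1, 1, 20

# Simulate measurements z_0, ..., z_k
x = rng.multivariate_normal(x_bar0, P0); zs = []
for _ in range(k + 1):
    zs.append(H @ x + rng.multivariate_normal(np.zeros(m), R))
    x = F @ x + G @ rng.multivariate_normal(np.zeros(nw), Q)

# Batch least squares over theta = (x_0, w_0, ..., w_{k-1});  x_l = Phi[l] @ theta
d = n + k * nw
E = [np.zeros((nw, d)) for _ in range(k)]
for l in range(k):
    E[l][:, n + l * nw : n + (l + 1) * nw] = np.eye(nw)   # w_l = E[l] @ theta
Phi = [np.hstack([np.eye(n), np.zeros((n, k * nw))])]
for l in range(k):
    Phi.append(F @ Phi[l] + G @ E[l])                     # constraint x_{l+1} = F x_l + G w_l

def w_sqrt(S):            # W with W^T W = S^{-1}, so ||W r||^2 = r^T S^{-1} r
    return cholesky(inv(S)).T

rows = [w_sqrt(P0) @ Phi[0]];  rhs = [w_sqrt(P0) @ x_bar0]
for l in range(k + 1):
    rows.append(w_sqrt(R) @ H @ Phi[l]); rhs.append(w_sqrt(R) @ zs[l])
for l in range(k):
    rows.append(w_sqrt(Q) @ E[l]);       rhs.append(np.zeros(nw))
theta, *_ = lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
x_k_ls = Phi[k] @ theta                                   # x_k^{(k)} from the LS problem

# Kalman filter over the same data
x_pred, P_pred = x_bar0.copy(), P0.copy()
for l in range(k + 1):
    K = P_pred @ H.T @ inv(H @ P_pred @ H.T + R)
    x_post = x_pred + K @ (zs[l] - H @ x_pred)
    P_post = P_pred - K @ H @ P_pred
    x_pred, P_pred = F @ x_post, F @ P_post @ F.T + G @ Q @ G.T

print(np.max(np.abs(x_k_ls - x_post)))   # should be numerically negligible
```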

4.5 KF Equations: Basic Versions

a. The basic equations

Initial conditions:

    \hat{x}_0^- = \bar{x}_0 = E(x_0),    P_0^- = P_0 = cov(x_0).

Measurement update:

    \hat{x}_k^+ = \hat{x}_k^- + K_k (z_k - H_k \hat{x}_k^-)
    P_k^+ = P_k^- - K_k H_k P_k^-,

where K_k is the Kalman gain matrix:

    K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R_k)^{-1}.

Time update:

    \hat{x}_{k+1}^- = F_k \hat{x}_k^+  [+ B_k u_k]
    P_{k+1}^- = F_k P_k^+ F_k^T + G_k Q_k G_k^T.

b. One-step iterations

The two-step equations may obviously be combined into a one-step update which computes \hat{x}_{k+1}^+ from \hat{x}_k^+ (or \hat{x}_{k+1}^- from \hat{x}_k^-). For example,

    \hat{x}_{k+1}^- = F_k \hat{x}_k^- + F_k K_k (z_k - H_k \hat{x}_k^-)
    P_{k+1}^- = F_k (P_k^- - K_k H_k P_k^-) F_k^T + G_k Q_k G_k^T.

L_k = F_k K_k is also known as the Kalman gain. The iterative equation for P_k^- is called the (discrete-time, time-varying) matrix Riccati equation.
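For reference, a compact NumPy loop implementing the basic equations above (time-invariant matrices, no control input; a sketch rather than a production implementation):

```python
import numpy as np

def kalman_filter(zs, F, G, H, Q, R, x0_bar, P0):
    """Run the basic KF equations over a measurement sequence zs[0..N-1].
       Returns the filtered estimates x_hat_plus[k] and covariances P_plus[k]."""
    x_pred, P_pred = x0_bar.copy(), P0.copy()      # x_0^-, P_0^-
    x_plus, P_plus = [], []
    for z in zs:
        # Measurement update
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Kalman gain K_k
        x_post = x_pred + K @ (z - H @ x_pred)
        P_post = P_pred - K @ H @ P_pred
        x_plus.append(x_post); P_plus.append(P_post)
        # Time update
        x_pred = F @ x_post                        # (+ B_k u_k if there is a control input)
        P_pred = F @ P_post @ F.T + G @ Q @ G.T
    return x_plus, P_plus
```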

c. Other important quantities

The measurement prediction, the innovations process, and the innovations covariance are given by

    \hat{z}_k = E(z_k | Z_{k-1}) = H_k \hat{x}_k^-  [+ I_k u_k]
    \tilde{z}_k = z_k - \hat{z}_k = H_k \tilde{x}_k^- + v_k
    M_k = cov(\tilde{z}_k) = H_k P_k^- H_k^T + R_k.

d. Alternative forms for the covariance update

The measurement update for the (optimal) covariance P_k may be expressed in the following equivalent formulas:

    P_k^+ = P_k^- - K_k H_k P_k^-
          = (I - K_k H_k) P_k^-
          = P_k^- - P_k^- H_k^T M_k^{-1} H_k P_k^-
          = P_k^- - K_k M_k K_k^T.

We mention two alternative forms:

1. The Joseph form: Noting that

    x_k - \hat{x}_k^+ = (I - K_k H_k)(x_k - \hat{x}_k^-) - K_k v_k,

it follows immediately that

    P_k^+ = (I - K_k H_k) P_k^- (I - K_k H_k)^T + K_k R_k K_k^T.

This form may be more computationally expensive, but has the following advantages:
- It holds for any gain K_k (not just the optimal one) that is used in the estimator equation \hat{x}_k^+ = \hat{x}_k^- + K_k \tilde{z}_k.
- Numerically, it is guaranteed to preserve positive-definiteness (P_k^+ > 0).

2. The information form:

    (P_k^+)^{-1} = (P_k^-)^{-1} + H_k^T R_k^{-1} H_k.

The equivalence may be obtained via the useful Matrix Inversion Lemma:

    (A + BCD)^{-1} = A^{-1} - A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1},

where A, C are square nonsingular matrices (possibly of different sizes). P^{-1} is called the information matrix. It forms the basis for the information filter, which only computes the inverse covariances.
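These equivalent update forms can be compared numerically; the sketch below (with arbitrary illustrative values) computes the standard, Joseph, and information-form updates for a single measurement step and checks that they agree for the optimal gain:

```python
import numpy as np
from numpy.linalg import inv

# Illustrative quantities for one measurement update
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])      # P_k^-
H = np.array([[1.0, 0.0]]); R = np.array([[0.5]])

M = H @ P_pred @ H.T + R                          # innovations covariance M_k
K = P_pred @ H.T @ inv(M)                         # optimal Kalman gain

P_standard = P_pred - K @ H @ P_pred
P_joseph = (np.eye(2) - K @ H) @ P_pred @ (np.eye(2) - K @ H).T + K @ R @ K.T
P_information = inv(inv(P_pred) + H.T @ inv(R) @ H)

print(np.allclose(P_standard, P_joseph), np.allclose(P_standard, P_information))
```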

e. Relation to deterministic observers

The one-step recursion for \hat{x}_k^- is similar in form to the algebraic state observer from control theory. Given a (deterministic) system

    x_{k+1} = F_k x_k + B_k u_k
    z_k = H_k x_k,

a state observer is defined by

    \hat{x}_{k+1} = F_k \hat{x}_k + B_k u_k + L_k (z_k - H_k \hat{x}_k),

where the L_k are gain matrices to be chosen, with the goal of obtaining \tilde{x}_k ≡ x_k - \hat{x}_k → 0 as k → ∞.

Since \tilde{x}_{k+1} = (F_k - L_k H_k) \tilde{x}_k, we need to choose L_k so that the linear system defined by A_k = (F_k - L_k H_k) is asymptotically stable. This is possible when the original system is detectable. The Kalman gain automatically satisfies this stability requirement (whenever the detectability condition is satisfied).
