A Matrix Theoretic Derivation of the Kalman Filter

Size: px

Start display at page:

Download "A Matrix Theoretic Derivation of the Kalman Filter"

Helena Cross
5 years ago
Views:

1 A Matrix Theoretic Derivation of the Kalman Filter 4 September 2008 Abstract This paper presents a matrix-theoretic derivation of the Kalman filter that is accessible to students with a strong grounding in matrix theory and multi-variable calculus. 1 Introduction The Kalman filter is a data analysis method used in a wide range of engineering and applied mathematics problems. Standard derivations of the filter mae use of probability theory and can be difficult to follow for those not well-versed in probabilistic techniques and notation. In this paper, we present a derivation of the Kalman filter that is accessible to undergraduates with a strong grounding in matrix theory and multi-variable calculus. With this development, the Kalman filter can be presented as a part of a senior-level, undergraduate matrix theory or numerical analysis course. Of ey interest to us are stochastic linear systems of the form b = Ax + e, (1) where b R m is measured data; A R m n is a nown observation matrix; x R n is the unnown parameter vector to be estimated; e R m is a zero-mean Gaussian random vector; and x R n is the unnown vector that is to be estimated. The standard technique for estimating x, nown as least squares estimation, was developed by Gauss in his study of planetary motion [3]. The extension of least squares estimation to the case when the unnown x is also assumed to be a Gaussian random vector is nown as minimum variance estimation [6]. In the study of time varying phenomena, it is natural to generalize (1) as follows: x = M x 1 + E, (2) b = A x + e, (3) where equation (3) is analogous to (1); and in (2), M R n is the nown evolution matrix, x 1 is a Gaussian random vector, and E R n is a zero-mean Gaussian random vector. The Kalman filter [4, 7] is the extension of minimum variance (and hence least squares) estimation to the problem of sequentially estimating {x 1, x 2,...} given data {b 1, b 2,...} in (2), (3). For the interested reader, a discussion of the progression of ideas from Gauss to Kalman is the subject of the excellent paper [5]. Mathematics Subject Classifications: 15A99, 65C60. 1

2 This paper is organized as follows. First, in Section 2, we present the basic statistical definitions and results that we will need in our later discussion. Then in Section 3, we derive the minimum variance estimator of x given model (1). We then derive the Kalman filter in Section 4, using the minimum variance theory applied to model (2), (3). Finally, we present an equivalent formulation of the Kalman filter, which we call the variational Kalman filter, as well as the extended Kalman filter for the case when (2), (3) contain nonlinear evolution and/or observation operators. 2 Statistical Preliminaries Let x = (x 1,..., x n ) T be a random vector with E(x i ) the mean of x i and E((x i µ i ) 2 ), where µ i = E(x i ), its variance. The mean of x is then defined E(x) = (E(x 1 ),..., E(x n )) T, while the n n covariance matrix of x is defined [cov(x)] ij = E((x i µ i )(x j µ j )), 1 i, j n. Note that the diagonal of cov(x) contains the variances of x 1,..., x n, while the off diagonal elements contain the covariance values. Thus if x i and x j are independent [cov(x)] ij = 0 for i j. The n m cross correlation matrix of the random n-vector x and m-vector y, which we will denote Γ xy, is defined Γ xy = E(xy T ), (4) where [E(xy T )] ij = E(x i y j ). If x and y are independent, then Γ xy is the zero matrix. Furthermore, E(x) = 0 implies Γ xx = cov(x). (5) Finally, given an m n matrix A and a random n-vector x, it is not difficult to show that cov(ax) = Acov(x)A T. (6) We end these preliminary comments with the probability density function of primary interest to us in this paper, the Gaussian distribution. If b is an n 1 Gaussian random vector, then its probability density function has the form 1 p b (b; µ, C) = ( (2π)n det(c) exp 1 ) 2 (b µ)t C 1 (b µ), (7) where µ R n, C is an n n symmetric positive definite matrix, and det( ) denotes matrix determinant. Then E(b) = µ and cov(b) = C. We will use the notation b N(µ, C) in this case. For more details on introductory mathematical statistics, see one of many introductory mathematics statistics texts, including [2]. 3 Minimum Variance Estimation First, we consider model (1). When x is assumed to be deterministic, it is a standard exercise to show that the least squares estimator is given by x ls = (A T A) 1 A T b. 2

3 However, we are interested in the case in which x N(0, C φ ) with C φ a symmetric positive definite matrix. We assume, furthermore, that x and e are independent random variables. The problem of estimating φ is statistical, however as we will see, the minimum variance estimator can be the solution of a linear system of equations. We now define the minimum variance estimator of φ. Definition 1. Suppose x and b are random vectors defined on the same probability space whose components have finite expected squares. The minimum variance estimator of x from b is given by where = ˆBb, ˆB = arg min E ( Bb x 2 ) B R n n 2. We now state and prove a theorem that gives us the minimum variance estimator in an elegant closed form. Theorem 1. If Γ bb (defined in (4)) is invertible, then the minimum variance estimator of x from b is given by = (Γ φb Γ bb 1 )b. Proof. First, we note that E( Bb x 2 ) = trace ( E[(Bb x)(bb x) T ] ), = trace ( BE[bb T ]B T BE[bx T ] E[xb T ]B T + E[xx T ] ). Then, using the distributive property of the trace function and the identity we see that de( Bb x 2 )/db = 0 when which establishes the result. ( ) T d d db trace(bt C) = db trace(bc) = C, ˆB = Γ φb Γ bb 1, In the context of (1), and given our assumptions stated above, we can obtain a more concrete form for the minimum variance estimator. In particular, we note that since x and e are assumed to be independent, Γ φe = Γ eφ = 0. Hence, using (1), we obtain Similarly, Γ φb = E[x(Ax + e) T ], = Γ φφ A T. Γ bb = E[(Ax + e)(ax + e) T ], = AΓ φφ A T + Γ ee. 3

4 Thus, since Γ φφ = C φ and Γ ee = C e, the minimum variance estimator has the form = ˆBb = C φ A T (AC φ A T + C e ) 1 b, We note, in passing, that (8) can also be expressed as { = arg min Ax b 2 x C 1 e = (A T C 1 e A + C 1 φ ) 1 A T C 1 e b. (8) + x 2 C 1 φ }, (9) def where arg min denotes argument of the minimum and x C = x T Cx. This establishes a clear connection between minimum variance estimation and generalized Tihonov regularization [6]. Note in particular that if C e = σ1i 2 and C φ = σ2i, 2 problem (9) can be equivalently expressed as { = arg min Ax b (σ 2 x 1/σ2) x 2 2 } 2, which has classical Tihonov form. Note that this is also equivalent to the maximum a posteriori (MAP) estimator. 4 The Kalman Filter In the previous section, we considered the stationary linear model (1), but suppose that what we wish to estimate (x above) changes in time. More specifically, suppose our model now has the form (2), (3). Equation (2) is the equation of evolution for x with M the n n linear evolution matrix, and E N(0, C E ). In equation (3), b denotes the m 1 observed data, A the m n linear observation matrix, and e N(0, C e ). In both equations, denotes the time index. The problem is to estimate x at time from b and an estimate 1 of the state at time 1. We assume 1 N(x 1, C est 1 ). To facilitate a more straightforward application of the result of Theorem 1, we rewrite (2), (3). First, define x a = M 1 (10) z = x x a, (11) r = b A x a. (12) Then, subtracting (10) from (2) and A x a from both sides of (3), and dropping the dependence for notational simplicity, we obtain the stochastic linear equations z = M(x ) + E, (13) r = Az + e. (14) The minimum variance estimator of z from r given (13), (14) is then given, via Theorem 1, by z est = Γ zr Γ rr 1 r. 4

5 We assume that x is independent of E, and that z = x x a is independent of e. We note, furthermore, that by assumption both of these random vectors have zero mean. Then, from (4), (5), (6), (13) and (14), we obtain Γ zz = MC est M T + C E def = C a, (15) Γ zr = C a A T, Γ rr = AC a A T + C e. where C est and C a are the covariance matrices for and x a respectively. minimum variance estimator of z is given by Thus, finally, the z est = C a A T (AC a A T + C e ) 1 r, (16) From (16) and (11) we then immediately obtain the Kalman Filter estimate of x given by where + = x a + H(b Ax a ). (17) H = C a A T (AC a A T + C e ) 1 (18) is nown as the Kalman Gain matrix. Finally, in order to compute the covariance of +, we note that by (17) and (3), + = (I HA)x a + He + HAx, where x is the true state. Given our assumptions and using (6), the covariance then taes the form C est + = (I HA)C a (I HA) T + HC e H T, which can be rewritten, using the identity HC e H T = (I HA)C a A T H T, in the simplified form C est + = C a HAC a. (19) Incorporating the dependence again leads directly to the Kalman filter iteration. The Kalman Filter Algorithm Step 0: Select initial guess 0 and covariance C est 0, and set = 1. Step 1: Compute the evolution model estimate and covariance: A. Compute x a = M 1 ; B. Compute C a = M C est 1 MT + C E := C a. Step 2: Compute the Kalman filter estimate and covariance: A. Compute the Kalman Gain H = C a AT (A C a AT + C e) 1 ; B. Compute the estimate = x a + H (b A x a ); C. Compute the estimate covariance C est = C a H A C a. Step 3: Update := + 1 and return to Step 1. 5

6 4.1 A Variational Formulation of the Kalman Filter As in the stationary case (see (8), (9)), we can rewrite equation (16) in the form z est = (AC 1 e A T + (C a ) 1 ) 1 A T C 1 e r, which, yields, using (11), the Kalman filter estimate + = x a + [A T C 1 e = arg min x A + (C a ) 1 ] 1 A T C 1 e (b Ax a ), { l(x) def = 1 2 (b Ax)T C 1 e (b Ax) (x xa ) T (C a ) 1 (x x a ) It can be shown using a Taylor series argument that }. + = x a 2 l(x a ) 1 l(x a ), (20) where l and 2 l denote the gradient and Hessian of l respectively, and are given by By the matrix inversion lemma, we have l(x) = A T C 1 e (b Ax) + (C a ) 1 (x x a ), 2 l(x) = A T C 1 e A + (C a ) 1. (A T C 1 e A + (C a ) 1 ) 1 = C a C a A T (A T C a A T + C e ) 1 AC a. Then from equations (18) and (19), we obtain the very useful fact that This allows us to define an iteration equivalent to that given above. The Variational Kalman Filter Algorithm C est + = 2 l(x) 1. (21) Step 0: Select initial guess 0 and covariance C est 0, and set = 1. Step 1: Compute the evolution model estimate and covariance: A. Compute x a = M 1 ; B. Compute C a = M C est MT + C E := C a. Step 2: Compute the Kalman filter estimate and covariance: A. Compute the estimate = arg min x l(x); C. Compute the estimate covariance C est = 2 l(x) 1. Step 3: Update := + 1 and return to Step 1. A natural question is, what is the use of this equivalent formulation of the Kalman filter? Theoretically there is no benefit gained in using the variational Kalman filter if the estimate and its covariance are computed exactly. However, with the variational approach, the filter estimate, and even its covariance [1], can be computed approximately using an iterative minimization method. This is particularly important for large-scale problems where the exact Kalman filter is prohibitively expensive to compute. 6

7 4.2 The Extended Kalman Filter The extended Kalman filter (EKF) is the extension of the Kalman filter when (2), (3) are replaced by x = M(x 1 ) + E, (22) b = A(x ) + e, (23) where M and A are (possibly) nonlinear functions. EKF is obtained by the following simple modification of either of the above algorithms: in Step 1, A use, instead, x a = M(xest ), and define where x denotes the Jacobian. 5 Conclusions 1 ) M = M(xest x, and A = A(xa ) x, (24) We have presented a derivation of the Kalman filter that utilizes matrix analysis techniques as well as the Bayesian statistical approach of minimum variance estimation. In addition, we presented an equivalent variational formulation of the Kalman filter, as well as the extended Kalman filter for nonlinear problems. References [1] A. Auvinen, J. Bardsley, H. Haario, and T. Kauranne, Large-Scale Kalman Filtering Using the Limited Memory BFGS Method, submitted. [2] Peter Bicel and Kjell Dosum, Mathematical Statistics: Basic Ideas and Selected Topics, Vol. 1, Prentice Hall, New Jersey, [3] K. G. Gauss, Theory of Motion of the Heavenly Bodies, New Yor: Dover, [4] Rudolph E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Transactions of the ASME Journal of Basic Engineering, 82, Series D, pp , [5] H. W. Sorenson, Least Squares Estimation: From Gauss to Kalman, IEEE Spectrum, vol. 7, pp , July [6] Curtis R. Vogel, Computational Methods for Inverse Problems, SIAM [7] Greg Welch and Gary Bishop, An Introduction to the Kalman Filter, UNC Chapel Hill, Dept. of Computer Science Tech. Report, TR

Kalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q

Kalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Kalman Filter Kalman Filter Predict: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Update: K = P k k 1 Hk T (H k P k k 1 Hk T + R) 1 x k k = x k k 1 + K(z k H k x k k 1 ) P k k =(I