LECTURE NOTES
Kalman Filtering with Newton's Method
JEFFREY HUMPHERYS and JEREMY WEST
Digital Object Identifier 10.1109/MCS.2010.938485

The Kalman filter is arguably one of the most notable algorithms of the 20th century [1]. In this article, we derive the Kalman filter using Newton's method for root finding. We show that the one-step Kalman filter is given by a single iteration of Newton's method on the gradient of a quadratic objective function, with a judiciously chosen initial guess. This derivation is different from those found in standard texts [2]-[6], since it provides a more general framework for recursive state estimation. Although not presented here, this approach can also be used to derive the extended Kalman filter for nonlinear systems.

BACKGROUND

Linear Estimation
Suppose that data are generated by the linear model

$b = Ax + e,$   (1)

where $A$ is a known $m \times n$ matrix of rank $n$, $e$ is an $m$-dimensional random variable with zero mean and known positive-definite covariance $Q = E[ee^T]$, denoted $Q > 0$, and $b \in \mathbb{R}^m$ represents known, but inexact, measurements with errors given by $e$. The vector $x \in \mathbb{R}^n$ is the set of parameters to be estimated. The estimator $\hat{x}$ is linear if $\hat{x} = Kb$ for some $n \times m$ matrix $K$. We say that the estimator $\hat{x}$ is unbiased if $E[\hat{x}] = x$. Since $E[\hat{x}] = E[Kb] = E[K(Ax + e)] = KAx + KE[e] = KAx$, it follows that the linear estimator $\hat{x} = Kb$ is unbiased if and only if $KA = I$. When $m > n$, many choices of $K$ satisfy this condition. We want to find the mean-squared-error-minimizing linear unbiased estimator, or, in other words, the minimizer of the quantity $E[\|\hat{x} - x\|^2]$ over all matrices $K$ that satisfy the constraint $KA = I$. In "What Is the Best Linear Unbiased Estimator?", we prove the Gauss-Markov theorem, which states that the optimal $K$ is given by $K = (A^T Q^{-1} A)^{-1} A^T Q^{-1}$, and thus the corresponding estimator is given by

$\hat{x} = (A^T Q^{-1} A)^{-1} A^T Q^{-1} b,$   (2)

whose covariance is given by

$P = E[(\hat{x} - x)(\hat{x} - x)^T] = (A^T Q^{-1} A)^{-1}.$   (3)

In addition, the Gauss-Markov theorem states that every other linear unbiased estimator of $x$ has larger covariance than (3). Thus we call (2) the best linear unbiased estimator of (1).

Weighted Least Squares
It is not a coincidence that the estimator in (2) has the same form as the solution of the weighted least squares problem

$\hat{x} = \arg\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|b - Ax\|^2_W,$   (4)

where $W > 0$ is a problem-specific weighting matrix and $\|\cdot\|_W$ denotes the weighted 2-norm defined by $\|x\|^2_W \triangleq x^T W x$. The factor $1/2$ is not necessary but makes calculations involving the derivative simpler; scaling the objective function by a constant does not affect the minimizer. Therefore, consider the positive-definite quadratic objective function

$J(x) = \tfrac{1}{2}\|b - Ax\|^2_W = \tfrac{1}{2} x^T A^T W A x - x^T A^T W b + \tfrac{1}{2} b^T W b.$   (5)

Its minimizer $\hat{x}$ satisfies $\nabla J(\hat{x}) = 0$, which yields the normal equations

$A^T W A \hat{x} = A^T W b.$   (6)

Since $A$ is assumed to have full column rank, the weighted Gramian $A^T W A$ is nonsingular, and thus (6) has the solution

$\hat{x} = (A^T W A)^{-1} A^T W b.$   (7)

Hence, comparing (2) with (7), we see that the best linear unbiased estimator of (1) is found by solving the weighted least squares problem (4) with weight $W = Q^{-1} > 0$. Also, from (3), the inverse weighted Gramian $P = (A^T Q^{-1} A)^{-1}$ is the covariance of this estimator.
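To make the connection between (2) and (7) concrete, the following is a minimal numerical sketch, assuming NumPy; the dimensions, random data, and variable names are illustrative and not taken from the article. It forms the weighted least squares estimator with weight $W = Q^{-1}$ by solving the normal equations (6), and computes the covariance (3).

```python
# Minimal sketch of (2)/(7): the best linear unbiased estimator as a weighted
# least squares solution with W = Q^{-1}. Data and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))            # known m x n model matrix, rank n
x_true = rng.standard_normal(n)            # parameters to be estimated
L = rng.standard_normal((m, m))
Q = L @ L.T + m * np.eye(m)                # known positive-definite noise covariance
e = rng.multivariate_normal(np.zeros(m), Q)
b = A @ x_true + e                         # noisy measurements, model (1)

W = np.linalg.inv(Q)                       # weight W = Q^{-1}
# Normal equations (6): solve (A^T W A) x_hat = A^T W b rather than forming an inverse.
x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
P = np.linalg.inv(A.T @ W @ A)             # estimator covariance, equation (3)

print("estimate:", x_hat)
print("covariance diagonal:", np.diag(P))
```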

What Is the Best Linear Unbiased Estimator?
Suppose that data are generated by the linear model $b = Ax + e$, where $A$ is a known $m \times n$ matrix of rank $n$, and $e$ is an $m$-dimensional random variable with zero mean and covariance $Q > 0$. Recall that a linear estimator $\hat{x} = Kb$ of $x$ is unbiased if and only if $KA = I$. If $n < m$, then there are many possible choices of $K$. We want the matrix $K$ that minimizes the mean-squared error $E[\|\hat{x} - x\|^2]$. Note, however, that

$\|\hat{x} - x\|^2 = \|Kb - x\|^2 = \|KAx + Ke - x\|^2 = \|Ke\|^2 = e^T K^T K e.$

Moreover, since $e^T K^T K e$ is a scalar, we have $e^T K^T K e = \mathrm{tr}(e^T K^T K e) = \mathrm{tr}(K e e^T K^T)$. It follows, by the linearity of expectation, that

$E[\|\hat{x} - x\|^2] = \mathrm{tr}(K E[ee^T] K^T) = \mathrm{tr}(KQK^T).$

Thus the mean-squared-error-minimizing linear unbiased estimator is found by minimizing $\mathrm{tr}(KQK^T)$ subject to the constraint $KA = I$, where $K \in \mathbb{R}^{n \times m}$. This is a convex optimization problem, that is, the objective is a convex function and the feasible set is a convex set. Hence, to find the unique minimizer, it suffices to find the unique critical point of the Lagrangian

$L(K, \lambda) = \mathrm{tr}(KQK^T) - \mathrm{tr}(\lambda^T (KA - I)),$

where the matrix Lagrange multiplier $\lambda \in \mathbb{R}^{n \times n}$ corresponds to the $n^2$ constraints $KA = I$. Thus the minimum occurs when

$0 = \nabla_K L(K, \lambda) = \frac{\partial}{\partial K}\,\mathrm{tr}(KQK^T - \lambda^T(KA - I)) = KQ^T + KQ - \lambda A^T = 2KQ - \lambda A^T,$

that is, when $K = \tfrac{1}{2}\lambda A^T Q^{-1}$. Since $KA = I$, we have $\lambda = 2(A^T Q^{-1} A)^{-1}$. Therefore, $K = (A^T Q^{-1} A)^{-1} A^T Q^{-1}$, and the optimal estimator is given by $\hat{x} = (A^T Q^{-1} A)^{-1} A^T Q^{-1} b$. To compute the covariance of the estimator, we expand $\hat{x}$ as

$\hat{x} = (A^T Q^{-1} A)^{-1} A^T Q^{-1}(Ax + e) = x + (A^T Q^{-1} A)^{-1} A^T Q^{-1} e.$

Thus the covariance is given by

$P = E[(\hat{x} - x)(\hat{x} - x)^T] = (A^T Q^{-1} A)^{-1} A^T Q^{-1} E[ee^T] Q^{-1} A (A^T Q^{-1} A)^{-1} = (A^T Q^{-1} A)^{-1}.$

We now show that every linear unbiased estimator other than $\hat{x}$ produces a larger covariance. If $\hat{x}_L = Lb$ is a linear unbiased estimator of the linear model, then, since $LA = I = KA$, there exists a matrix $D$ such that $DA = 0$ and $L = K + D$. The covariance of $\hat{x}_L$ is given by

$E[(\hat{x}_L - x)(\hat{x}_L - x)^T] = E[(K + D)ee^T(K^T + D^T)] = (K + D)Q(K^T + D^T) = KQK^T + DQD^T + KQD^T + (KQD^T)^T.$

Note, however, that $KQD^T = (A^T Q^{-1} A)^{-1} A^T Q^{-1} Q D^T = (A^T Q^{-1} A)^{-1} (DA)^T = 0$. Thus

$E[(\hat{x}_L - x)(\hat{x}_L - x)^T] = KQK^T + DQD^T \ge KQK^T.$

Therefore, $\hat{x}$ is the best linear unbiased estimator of $x$.

Newton's Method
Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be a smooth function. Assume that $f(\hat{x}) = 0$ and that the Jacobian matrix $Df(x)$ is nonsingular in a neighborhood of $\hat{x}$. If the initial guess $x_0$ is sufficiently close to $\hat{x}$, then Newton's method, which is given by

$x_{k+1} = x_k - Df(x_k)^{-1} f(x_k),$   (8)

produces a sequence $\{x_k\}_{k=0}^{\infty}$ that converges quadratically to $\hat{x}$, that is, there exists a constant $c > 0$ such that, for every positive integer $k$, $\|x_{k+1} - \hat{x}\| \le c\,\|x_k - \hat{x}\|^2$.

Newton's method is widely used in optimization problems since the local extrema of an objective function $J(x)$ can be obtained by finding roots of its gradient $f(x) \triangleq \nabla J(x)$. Hence, from (8) we have

$x_{k+1} = x_k - D^2 J(x_k)^{-1} \nabla J(x_k),$   (9)

where $D^2 J(x_k)$ denotes the Hessian of $J$ evaluated at $x_k$. This iterative process converges, likewise at a quadratic rate, to the isolated local minimizer $\hat{x}$ of $J$ whenever the starting point $x_0$ is sufficiently close to $\hat{x}$ and the Hessian $D^2 J(x)$ is positive definite in a neighborhood of $\hat{x}$.

In practice, we do not invert the Hessian $D^2 J(x_k)$ when computing (9). Indeed, by a factor of roughly two, we can more efficiently compute the Newton update by solving the linear system $D^2 J(x_k)\,y_k = -\nabla J(x_k)$, for example by Gaussian elimination, and then setting $x_{k+1} = x_k + y_k$; see [7] for details.
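As a concrete illustration of (9) and the practical note above, the following is a small sketch, assuming NumPy; the quartic test objective, tolerance, and iteration cap are illustrative choices. Each step is computed by solving the linear system $D^2 J(x_k)\,y_k = -\nabla J(x_k)$ rather than by inverting the Hessian.

```python
# Sketch of the Newton iteration (9) for minimizing a smooth objective J,
# solving D2J(x_k) y_k = -grad J(x_k) instead of inverting the Hessian.
# The test objective J(x) = 0.25*(x^T x)^2 + 0.5*x^T x is an illustrative choice.
import numpy as np

def grad_J(x):
    # gradient of J(x) = 0.25*(x^T x)^2 + 0.5*x^T x
    return (x @ x) * x + x

def hess_J(x):
    # Hessian of the same objective: (x^T x) I + 2 x x^T + I, positive definite everywhere
    n = x.size
    return (x @ x) * np.eye(n) + 2.0 * np.outer(x, x) + np.eye(n)

x = np.array([2.0, -1.0, 0.5])          # starting point
for _ in range(20):
    g = grad_J(x)
    if np.linalg.norm(g) < 1e-12:        # stop when the gradient (the root of f = grad J) is tiny
        break
    y = np.linalg.solve(hess_J(x), -g)   # Newton step from the linear system
    x = x + y

print("minimizer estimate:", x)          # converges quadratically to the origin here
```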

To find the minimizer of a positive-definite quadratic form (5), it is not necessary to use an iterative scheme, such as Newton's method, since the gradient is affine in $x$ and the unique minimizer can be found by solving the normal equations (6) directly. However, positive-definite quadratic forms have the special property that Newton's method (9) converges in a single step for each starting point $x \in \mathbb{R}^n$, that is, for all $x \in \mathbb{R}^n$, the minimizer $\hat{x}$ satisfies

$\hat{x} = x - (D^2 J(x))^{-1} \nabla J(x).$   (10)

This observation provides a key insight used to derive the Kalman filter below. Note also that the Hessian of the quadratic form (5) is $D^2 J(x) = A^T W A$, which is constant in $x$. Thus, throughout, we denote the Hessian as $D^2 J$, dropping the explicit dependence on $x$.

Recursive Least Squares
We now consider the least squares problem (4) in the case of recursive implementation. At each time $k$, we seek the best linear unbiased estimator $\hat{x}_k$ of $x$ for the linear model

$\mathbf{b}_k = \mathbf{A}_k x + \mathbf{e}_k,$   (11)

where

$\mathbf{b}_k = \begin{bmatrix} b_1 \\ \vdots \\ b_k \end{bmatrix}, \quad \mathbf{A}_k = \begin{bmatrix} A_1 \\ \vdots \\ A_k \end{bmatrix}, \quad \text{and} \quad \mathbf{e}_k = \begin{bmatrix} v_1 \\ \vdots \\ v_k \end{bmatrix}.$

We assume, for each $j = 1, \ldots, k$, that the noise terms $v_j$ have zero mean and are uncorrelated, with $E[v_j v_j^T] = R_j > 0$. We also assume that $A_1$ has full column rank, and thus each $\mathbf{A}_k$ has full column rank. The estimate is found by solving the weighted least squares problem

$\hat{x}_k = \arg\min_{x} \tfrac{1}{2}\|\mathbf{b}_k - \mathbf{A}_k x\|^2_{\mathbf{W}_k},$   (12)

where the weight is given by the inverse covariance $\mathbf{W}_k = E[\mathbf{e}_k \mathbf{e}_k^T]^{-1} = \mathrm{diag}(R_1, \ldots, R_k)^{-1}$. As time marches forward, the number of rows of the linear system increases, thus altering the least squares solution. We now show that it is possible to use the least squares solution $\hat{x}_{k-1}$ at time $k-1$ to efficiently compute the least squares solution $\hat{x}_k$ at time $k$. This process is the recursive least squares algorithm. We begin by rewriting (12) as

$J_k(x) = \tfrac{1}{2}\sum_{i=1}^{k} \|b_i - A_i x\|^2_{R_i^{-1}}.$   (13)

The positive-definite quadratic form (13) can be expressed recursively as

$J_k(x) = J_{k-1}(x) + \tfrac{1}{2}\|b_k - A_k x\|^2_{R_k^{-1}}.$

The gradient and Hessian of $J_k$ are given by

$\nabla J_k(x) = \nabla J_{k-1}(x) + A_k^T R_k^{-1}(A_k x - b_k)$

and

$D^2 J_k = D^2 J_{k-1} + A_k^T R_k^{-1} A_k,$   (14)

respectively. Since $A_1$ has full column rank, we see that $D^2 J_1 > 0$. From (14), it follows that $D^2 J_k > 0$ for every positive integer $k$, and hence from (10) the minimizer of (13) becomes

$\hat{x}_k = x - (D^2 J_k)^{-1}\bigl(\nabla J_{k-1}(x) + A_k^T R_k^{-1}(A_k x - b_k)\bigr),$   (15)

where the starting point $x \in \mathbb{R}^n$ can be arbitrarily chosen. Since $\nabla J_{k-1}(\hat{x}_{k-1}) = 0$, we set $x = \hat{x}_{k-1}$ in (15). Thus (15) becomes

$\hat{x}_k = \hat{x}_{k-1} - K_k A_k^T R_k^{-1}(A_k \hat{x}_{k-1} - b_k),$

where $K_k \triangleq (D^2 J_k)^{-1}$ is the inverse of the Hessian of $J_k$ and, from (3), represents the covariance of the estimate. Observing from (14) that $K_k^{-1} = K_{k-1}^{-1} + A_k^T R_k^{-1} A_k$ and using Lemma 1 in "Inversion Lemmata," we have

$K_k = (K_{k-1}^{-1} + A_k^T R_k^{-1} A_k)^{-1} = K_{k-1} - K_{k-1} A_k^T (R_k + A_k K_{k-1} A_k^T)^{-1} A_k K_{k-1}.$

Thus, to summarize, the recursive least squares method is

$K_k = K_{k-1} - K_{k-1} A_k^T (R_k + A_k K_{k-1} A_k^T)^{-1} A_k K_{k-1},$
$\hat{x}_k = \hat{x}_{k-1} - K_k A_k^T R_k^{-1}(A_k \hat{x}_{k-1} - b_k).$

At each time $k$, this algorithm gives the best linear unbiased estimator of (11), along with its covariance.
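The two-line recursion above can be implemented directly. The following sketch, assuming NumPy, updates the covariance $K_k$ and the estimate $\hat{x}_k$ as new blocks of rows arrive and compares the result with the true parameters; the simulated data, dimensions, and initialization from a first batch block are illustrative choices.

```python
# Sketch of the recursive least squares update summarized above, assuming NumPy.
# Simulated data, dimensions, and the initial batch step are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n, p, steps = 3, 2, 50                     # p rows arrive at each step
x_true = rng.standard_normal(n)

A1 = rng.standard_normal((p + n, n))       # first block chosen tall so it has full column rank
R1 = np.eye(p + n)
b1 = A1 @ x_true + rng.multivariate_normal(np.zeros(p + n), R1)

K = np.linalg.inv(A1.T @ np.linalg.inv(R1) @ A1)   # K_1 = (A_1^T R_1^{-1} A_1)^{-1}
x_hat = K @ A1.T @ np.linalg.inv(R1) @ b1          # batch solution (7) for the first block

for _ in range(steps):
    Ak = rng.standard_normal((p, n))
    Rk = 0.1 * np.eye(p)
    bk = Ak @ x_true + rng.multivariate_normal(np.zeros(p), Rk)
    # Covariance update: K_k = K_{k-1} - K_{k-1} A_k^T (R_k + A_k K_{k-1} A_k^T)^{-1} A_k K_{k-1}
    S = Rk + Ak @ K @ Ak.T                 # only a small p x p system must be solved
    K = K - K @ Ak.T @ np.linalg.solve(S, Ak @ K)
    # Estimate update: x_hat_k = x_hat_{k-1} - K_k A_k^T R_k^{-1} (A_k x_hat_{k-1} - b_k)
    x_hat = x_hat - K @ Ak.T @ np.linalg.solve(Rk, Ak @ x_hat - bk)

print("recursive estimate:", x_hat)
print("true parameters:   ", x_true)
```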

The recursive least squares algorithm allows for efficient updating of the weighted least squares solution. Indeed, if $p$ rows are added during the $k$th step, then the expression $R_k + A_k K_{k-1} A_k^T$ is a $p \times p$ matrix, which is small in size compared to the weighted Gramian $\mathbf{A}_k^T \mathbf{W}_k \mathbf{A}_k$, which is needed, say, if we compute the solution directly using (7).

STATE-SPACE MODELS AND STATE ESTIMATION
Consider the stochastic discrete-time linear dynamic system

$x_{k+1} = F_k x_k + G_k u_k + w_k,$   (16)
$y_k = H_k x_k + v_k,$   (17)

where the variables represent the state $x_k \in \mathbb{R}^n$, the input $u_k \in \mathbb{R}^p$, and the output $y_k \in \mathbb{R}^q$. The terms $w_k$ and $v_k$ are noise processes, which are assumed to have mean zero and be mutually uncorrelated, with known covariances $Q_k > 0$ and $R_k > 0$, respectively. We assume that the system has the stochastic initial state $x_0 = \mu_0 + w_{-1}$, where $\mu_0$ is the mean and $w_{-1}$ is a zero-mean random variable with covariance $E[w_{-1} w_{-1}^T] = Q_{-1} > 0$.

State Estimation
We formulate the state estimation problem, given $m$ known observations $y_1, \ldots, y_m$ and $k$ known inputs $u_0, \ldots, u_{k-1}$, where $k \ge m$. To find the best linear unbiased estimator of the states $x_1, \ldots, x_k$, we begin by writing (16)-(17) as the large linear estimation problem

$\mu_0 = x_0 - w_{-1},$
$G_0 u_0 = x_1 - F_0 x_0 - w_0,$
$y_1 = H_1 x_1 + v_1,$
$\;\;\vdots$
$G_{m-1} u_{m-1} = x_m - F_{m-1} x_{m-1} - w_{m-1},$
$y_m = H_m x_m + v_m,$
$G_m u_m = x_{m+1} - F_m x_m - w_m,$
$\;\;\vdots$
$G_{k-1} u_{k-1} = x_k - F_{k-1} x_{k-1} - w_{k-1}.$

Note that the known measurements, namely, the inputs and outputs, are on the left, whereas the unknown states are on the right, together with the noise terms. The parameters to be estimated in this linear estimation problem are the states $x_0, x_1, \ldots, x_k$, which we write as $z_k = [x_0^T \; \cdots \; x_k^T]^T \in \mathbb{R}^{(k+1)n}$. Then the linear system takes the form

$b_{k|m} = A_{k|m} z_k + e_{k|m},$

where

$e_{k|m} = [\,w_{-1}^T \;\; w_0^T \;\; v_1^T \;\; \cdots \;\; w_{m-1}^T \;\; v_m^T \;\; w_m^T \;\; \cdots \;\; w_{k-1}^T\,]^T$

is a zero-mean random variable whose inverse covariance is the positive-definite block-diagonal matrix

$W_{k|m} = \mathrm{diag}(Q_{-1}, Q_0, R_1, \ldots, Q_{m-1}, R_m, Q_m, \ldots, Q_{k-1})^{-1}.$

Inversion Lemmata
We provide some technical lemmata that are used throughout the article. The first lemma is the Sherman-Morrison-Woodbury formula, which shows how to invert an additively updated matrix when we know the inverse of the original matrix. The second inversion lemma tells us how to invert block matrices. These statements can be verified by direct computation; see also [8, pp. 18-19].

LEMMA 1 (SHERMAN-MORRISON-WOODBURY)
Let $A$ and $C$ be square matrices and $B$ and $D$ be given so that the sum $A + BCD$ is nonsingular. If $A$, $C$, and $C^{-1} + DA^{-1}B$ are also nonsingular, then

$(A + BCD)^{-1} = A^{-1} - A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}.$

LEMMA 2 (SCHUR)
Let $M$ be a square matrix with block form

$M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}.$

If $A$, $D$, $A - BD^{-1}C$, and $D - CA^{-1}B$ are nonsingular, then

$M^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\ -D^{-1}C(A - BD^{-1}C)^{-1} & (D - CA^{-1}B)^{-1} \end{bmatrix}.$
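Both lemmata are easy to sanity-check numerically. The sketch below, assuming NumPy, verifies the Sherman-Morrison-Woodbury identity of Lemma 1 and the bottom-right block of the block inverse in Lemma 2, which is the block used for $P_k$ later in the derivation; the random symmetric positive-definite test matrices are illustrative choices that guarantee all of the required inverses exist.

```python
# Numerical sanity check of Lemma 1 (Sherman-Morrison-Woodbury) and of the
# bottom-right block of Lemma 2 (Schur), assuming NumPy. The random symmetric
# positive-definite matrices are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)

def spd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)         # symmetric positive definite by construction

# Lemma 1: (A + B C D)^{-1} = A^{-1} - A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}
A, C = spd(4), spd(2)
B = rng.standard_normal((4, 2))
D = B.T
lhs = np.linalg.inv(A + B @ C @ D)
rhs = np.linalg.inv(A) - np.linalg.inv(A) @ B @ np.linalg.inv(
    np.linalg.inv(C) + D @ np.linalg.inv(A) @ B) @ D @ np.linalg.inv(A)
print("Lemma 1 max error:", np.abs(lhs - rhs).max())

# Lemma 2: the bottom-right block of M^{-1} equals (D - C A^{-1} B)^{-1}
M = spd(5)                                 # all Schur complements of an SPD matrix exist
A2, B2 = M[:3, :3], M[:3, 3:]
C2, D2 = M[3:, :3], M[3:, 3:]
bottom_right = np.linalg.inv(M)[3:, 3:]
schur = np.linalg.inv(D2 - C2 @ np.linalg.inv(A2) @ B2)
print("Lemma 2 max error:", np.abs(bottom_right - schur).max())
```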

We observe that each column of $A_{k|m}$ is a pivot column. Hence, it follows that $A_{k|m}$ has full column rank, and thus the weighted Gramian $A_{k|m}^T W_{k|m} A_{k|m}$ is nonsingular. Thus, the best linear unbiased estimator $\hat{z}_{k|m}$ of $z_k$ is given by the weighted least squares solution

$\hat{z}_{k|m} = (A_{k|m}^T W_{k|m} A_{k|m})^{-1} A_{k|m}^T W_{k|m} b_{k|m}.$

Since the state estimation problem is cast as a linear model, we can equivalently solve the estimation problem by minimizing the positive-definite objective function

$J_{k|m}(z_k) = \tfrac{1}{2}\|A_{k|m} z_k - b_{k|m}\|^2_{W_{k|m}}.$   (18)

We can find its minimizer $\hat{z}_{k|m}$ by performing a single iteration of Newton's method, given by

$\hat{z}_{k|m} = z - (A_{k|m}^T W_{k|m} A_{k|m})^{-1} A_{k|m}^T W_{k|m}(A_{k|m} z - b_{k|m}),$   (19)

where $z \in \mathbb{R}^{(k+1)n}$ is arbitrary. In the following, we set $m = k$ and choose a canonical $z$ that simplifies (19), thus leaving us with the Kalman filter.

Kalman Derivation with Newton's Method
Consider the state estimation problem in the case $m = k$, that is, the number of observations equals the number of inputs. It is of particular interest in applications to determine the current state $x_k$ given the observations $y_1, \ldots, y_k$ and inputs $u_0, \ldots, u_{k-1}$. The Kalman filter gives the best linear unbiased estimator $\hat{x}_{k|k}$ of $x_k$ in terms of the previous state estimator $\hat{x}_{k-1|k-1}$ and the latest data $u_{k-1}$ and $y_k$ up to that point in time. We begin by rewriting (18) as

$J_{k|m}(z_k) = \tfrac{1}{2}\|x_0 - \mu_0\|^2_{Q_{-1}^{-1}} + \tfrac{1}{2}\sum_{i=1}^{m}\|y_i - H_i x_i\|^2_{R_i^{-1}} + \tfrac{1}{2}\sum_{i=1}^{k}\|x_i - F_{i-1} x_{i-1} - G_{i-1} u_{i-1}\|^2_{Q_{i-1}^{-1}}.$   (20)

For convenience, we denote $\hat{z}_{k|k}$ as $\hat{z}_k$, $\hat{x}_{k|k}$ as $\hat{x}_k$, and $J_{k|k}$ as $J_k$. For $m = k$, (20) can be expressed recursively as

$J_k(z_k) = J_{k-1}(z_{k-1}) + \tfrac{1}{2}\|y_k - H_k x_k\|^2_{R_k^{-1}} + \tfrac{1}{2}\|x_k - \overline{F}_{k-1} z_{k-1} - G_{k-1} u_{k-1}\|^2_{Q_{k-1}^{-1}},$

where $\overline{F}_{k-1} = [\,0 \; \cdots \; 0 \; F_{k-1}\,]$. The gradient and Hessian of $J_k$ are given by

$\nabla J_k(z_k) = \begin{bmatrix} \nabla J_{k-1}(z_{k-1}) + \overline{F}_{k-1}^T Q_{k-1}^{-1}(\overline{F}_{k-1} z_{k-1} - x_k + G_{k-1} u_{k-1}) \\[2pt] -Q_{k-1}^{-1}(\overline{F}_{k-1} z_{k-1} - x_k + G_{k-1} u_{k-1}) + H_k^T R_k^{-1}(H_k x_k - y_k) \end{bmatrix}$   (21)

and

$D^2 J_k = \begin{bmatrix} D^2 J_{k-1} + \overline{F}_{k-1}^T Q_{k-1}^{-1} \overline{F}_{k-1} & -\overline{F}_{k-1}^T Q_{k-1}^{-1} \\[2pt] -Q_{k-1}^{-1} \overline{F}_{k-1} & Q_{k-1}^{-1} + H_k^T R_k^{-1} H_k \end{bmatrix},$   (22)

respectively. Since $D^2 J_0 = Q_{-1}^{-1} > 0$, it follows inductively that $D^2 J_k > 0$ for every positive integer $k$. The proof follows from the observation that

$z_k^T D^2 J_k z_k = z_{k-1}^T D^2 J_{k-1} z_{k-1} + \|\overline{F}_{k-1} z_{k-1} - x_k\|^2_{Q_{k-1}^{-1}} + \|H_k x_k\|^2_{R_k^{-1}},$

and thus the right-hand side is nonnegative, being zero only if $z_{k-1} = 0$ and $x_k = 0$, or, equivalently, $z_k = 0$. From one iteration of Newton's method, we have that

$\hat{z}_k = z_k - (D^2 J_k)^{-1} \nabla J_k(z_k)$   (23)

for all $z_k \in \mathbb{R}^{(k+1)n}$. Since $\nabla J_{k-1}(\hat{z}_{k-1}) = 0$ and $\overline{F}_{k-1} \hat{z}_{k-1} = F_{k-1} \hat{x}_{k-1}$, we set

$z_k = \begin{bmatrix} \hat{z}_{k-1} \\ F_{k-1} \hat{x}_{k-1} + G_{k-1} u_{k-1} \end{bmatrix}.$

Thus,

$\nabla J_k(z_k) = \begin{bmatrix} 0 \\ H_k^T R_k^{-1}\bigl[H_k(F_{k-1} \hat{x}_{k-1} + G_{k-1} u_{k-1}) - y_k\bigr] \end{bmatrix},$

and the bottom row of (23) becomes

$\hat{x}_k = F_{k-1} \hat{x}_{k-1} + G_{k-1} u_{k-1} - P_k H_k^T R_k^{-1}\bigl[H_k(F_{k-1} \hat{x}_{k-1} + G_{k-1} u_{k-1}) - y_k\bigr],$

where $P_k$ is the bottom-right block of the inverse Hessian $(D^2 J_k)^{-1}$, which, from Lemma 2 in "Inversion Lemmata," is given by

$P_k = \bigl(Q_{k-1}^{-1} + H_k^T R_k^{-1} H_k - Q_{k-1}^{-1} \overline{F}_{k-1}\,(D^2 J_{k-1} + \overline{F}_{k-1}^T Q_{k-1}^{-1} \overline{F}_{k-1})^{-1}\,\overline{F}_{k-1}^T Q_{k-1}^{-1}\bigr)^{-1}$
$\phantom{P_k} = \bigl[(Q_{k-1} + \overline{F}_{k-1} (D^2 J_{k-1})^{-1} \overline{F}_{k-1}^T)^{-1} + H_k^T R_k^{-1} H_k\bigr]^{-1}$
$\phantom{P_k} = \bigl[(Q_{k-1} + F_{k-1} P_{k-1} F_{k-1}^T)^{-1} + H_k^T R_k^{-1} H_k\bigr]^{-1}.$
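The update for $\hat{x}_k$ together with the recursion for $P_k$ just derived (and summarized in the next paragraph) is already a complete algorithm. The following sketch, assuming NumPy, runs this one-step form on data simulated from (16)-(17); the constant-velocity tracking model, noise levels, and horizon are illustrative choices, not taken from the article.

```python
# Sketch of the one-step Kalman filter derived above, assuming NumPy.
# The constant-velocity model, noise levels, and horizon are illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
dt, steps = 0.1, 100
F = np.array([[1.0, dt], [0.0, 1.0]])      # F_{k-1}: constant-velocity dynamics
G = np.array([[0.5 * dt**2], [dt]])        # G_{k-1}: acceleration input
H = np.array([[1.0, 0.0]])                 # H_k: only the position is measured
Q = 1e-3 * np.eye(2)                       # process noise covariance Q_{k-1}
R = 1e-2 * np.eye(1)                       # measurement noise covariance R_k

mu0 = np.array([0.0, 1.0])
P = 0.1 * np.eye(2)                        # P_0 = Q_{-1}
x_hat = mu0.copy()                         # x_hat_0 = mu_0
x = mu0 + rng.multivariate_normal(np.zeros(2), P)

for k in range(steps):
    u = np.array([np.sin(0.1 * k)])        # known input u_{k-1}
    # simulate the true system (16)-(17)
    x = F @ x + G @ u + rng.multivariate_normal(np.zeros(2), Q)
    y = H @ x + rng.multivariate_normal(np.zeros(1), R)
    # covariance update: P_k = [(Q + F P F^T)^{-1} + H^T R^{-1} H]^{-1}
    P = np.linalg.inv(np.linalg.inv(Q + F @ P @ F.T) + H.T @ np.linalg.solve(R, H))
    # state update: x_hat_k = F x_hat + G u - P_k H^T R^{-1} [H (F x_hat + G u) - y]
    x_pred = F @ x_hat + G @ u
    x_hat = x_pred - P @ H.T @ np.linalg.solve(R, H @ x_pred - y)

print("final true state:    ", x)
print("final state estimate:", x_hat)
```

Note that applying Lemma 1 to the covariance update above recovers the more familiar predict-then-correct form $P_k = M_k - M_k H_k^T (R_k + H_k M_k H_k^T)^{-1} H_k M_k$, where $M_k = Q_{k-1} + F_{k-1} P_{k-1} F_{k-1}^T$ is the predicted covariance.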

Note that $P_k$ is the covariance of the estimator $\hat{x}_k$. To summarize, we have the recursive estimator for $x_k$ given by

$P_k = \bigl[(Q_{k-1} + F_{k-1} P_{k-1} F_{k-1}^T)^{-1} + H_k^T R_k^{-1} H_k\bigr]^{-1},$
$\hat{x}_k = F_{k-1} \hat{x}_{k-1} + G_{k-1} u_{k-1} - P_k H_k^T R_k^{-1}\bigl[H_k(F_{k-1} \hat{x}_{k-1} + G_{k-1} u_{k-1}) - y_k\bigr],$

where $\hat{x}_0 = \mu_0$ and $P_0 = Q_{-1}$. This recursive estimator is the one-step Kalman filter.

CONCLUSIONS
There are a few key observations in the Newton-method derivation of the Kalman filter. The first is that the state estimation problem for the linear system (16)-(17) is a potentially large linear least squares estimation problem. The second observation is that the least squares estimation problem can be solved by minimizing a positive-definite quadratic functional. The third point is that a positive-definite quadratic functional can be minimized with a single iteration of Newton's method for any starting point. The final observation is that, by judiciously choosing the starting point to be the solution of the previous state estimate, several terms cancel and the Newton update simplifies to the one-step Kalman filter. We remark that this approach to state estimation can be generalized to include predictive estimates of future states and smoothed estimates of previous states. We can also use Newton's method to derive the extended Kalman filter.

ACKNOWLEDGMENTS
The authors thank Randall W. Beard, Magnus Egerstedt, C. Shane Reese, and H. Dennis Tolley for useful conversations. This work was supported in part by the National Science Foundation, Division of Mathematical Sciences, grant numbers CAREER 0847074 and CSUMS 063938.

AUTHOR INFORMATION
Jeffrey Humpherys received the Ph.D. in mathematics from Indiana University in 2002 and was an Arnold Ross Assistant Professor at The Ohio State University from 2002 to 2005. Since 2005, he has been at Brigham Young University, where he is currently an associate professor of mathematics. His interests are in applied mathematical analysis and numerical computation.

Jeremy West received B.S. degrees in mathematics and computer science and the M.S. in mathematics in 2009, all from Brigham Young University. He is currently a Ph.D. student in mathematics at the University of Michigan. His interests are in discrete mathematics.

REFERENCES
[1] J. L. Casti, Five More Golden Rules: Knots, Codes, Chaos, and Other Great Theories of 20th-Century Mathematics. New York: Wiley, 2000.
[2] M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB, 2nd ed. New York: Wiley, 2001.
[3] D. Simon, Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches. New York: Wiley-Interscience, 2006.
[4] T. Kailath, A. Sayed, and B. Hassibi, Linear Estimation. Englewood Cliffs, NJ: Prentice-Hall, 2000.
[5] R. Brown and P. Hwang, Introduction to Random Signals and Applied Kalman Filtering. New York: Wiley, 1997.
[6] A. Gelb, Applied Optimal Estimation. Cambridge, MA: MIT Press, 1974.
[7] C. T. Kelley, Solving Nonlinear Equations with Newton's Method. Philadelphia, PA: SIAM, 2003.
[8] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1990.