Kalman Filter and Parameter Identification
Florian Herzog, 2013
Continuous-time Kalman Filter

In this chapter, we shall use stochastic processes with independent increments w_1(.) and w_2(.) at the input and the output, respectively, of a dynamic system. We shall switch back and forth between the mathematically precise description of these (normalized) Brownian motions by their increments and the sloppy description by the corresponding white noises \dot{v}(.) and \dot{r}(.), respectively, which is preferred in engineering circles:

Q^{1/2}(t)\,dw_1(t) = [\dot{v}(t) - u(t)]\,dt    (1)
R^{1/2}(t)\,dw_2(t) = [\dot{r}(t) - r(t)]\,dt,    (2)

where Q^{1/2}(t) and R^{1/2}(t) are the volatility parameters. We assume a linear dynamic system whose inputs u(t) and measurements y(t) are disturbed by noise.

Stochastic Systems, 2013
Continuous-time Kalman Filter

Consider the following linear time-varying dynamic system of order n which is driven by the m-vector-valued white noise \dot{v}(.). Its initial state x(t_0) is a random vector \xi, and its p-vector-valued output y(.) is corrupted by the additive white noise \dot{r}(.).

System description in the mathematically precise form:

dx(t) = A(t)x(t)\,dt + B(t)\,dv(t) = [A(t)x(t) + B(t)u(t)]\,dt + B(t)Q^{1/2}(t)\,dw_1(t)    (3)
x(t_0) = \xi    (4)
y(t)\,dt = C(t)x(t)\,dt + dr(t) = [C(t)x(t) + r(t)]\,dt + R^{1/2}(t)\,dw_2(t)    (5)
Continuous-time Kalman Filter

System description in the engineer's form:

\dot{x}(t) = A(t)x(t) + B(t)\dot{v}(t)    (7)
x(t_0) = \xi    (8)
y(t) = C(t)x(t) + \dot{r}(t),    (9)

where

\xi : N(x_0, \Sigma_0)    (10)
\dot{v}(t) : N(u(t), Q(t))    (11)
\dot{r}(t) : N(r(t), R(t)).    (12)

Note that in the last two lines, we have deleted the factor 1/dt which ought to multiply Q(t) and R(t).
Continuous-time Kalman Filter

The filtering problem is stated as follows: find the filter with the state vector \hat{x} which is optimal in the following sense:

- The state estimate is bias-free: E\{x(t) - \hat{x}(t)\} \equiv 0.
- The error of the state estimate has infimal covariance matrix: \Sigma_{opt}(t) \le \Sigma_{subopt}(t).
Continuous-time Kalman Filter

With the above-mentioned assumptions that \xi, \dot{v}, and \dot{r} are mutually independent and Gaussian, the optimal filter is the following linear dynamic system:

\dot{\hat{x}}(t) = A(t)\hat{x}(t) + B(t)u(t) + H(t)[y(t) - r(t) - C(t)\hat{x}(t)]
               = [A(t) - H(t)C(t)]\hat{x}(t) + B(t)u(t) + H(t)[y(t) - r(t)]    (13)
\hat{x}(t_0) = x_0    (14)

with

H(t) = \Sigma(t)C^T(t)R^{-1}(t),    (15)

where \Sigma(t) is the covariance matrix of the error x(t) - \hat{x}(t) of the state estimate \hat{x}(t).
Continuous-time Kalman Filter

The covariance matrix \Sigma(t) of the error x(t) - \hat{x}(t) of the state estimate \hat{x}(t) satisfies the following matrix Riccati differential equation:

\dot{\Sigma}(t) = A(t)\Sigma(t) + \Sigma(t)A^T(t) - \Sigma(t)C^T(t)R^{-1}(t)C(t)\Sigma(t) + B(t)Q(t)B^T(t)    (16)
\Sigma(t_0) = \Sigma_0.    (17)

The estimation error e(t) = x(t) - \hat{x}(t) satisfies the differential equation

\dot{e} = \dot{x} - \dot{\hat{x}}
       = Ax + Bu + B(\dot{v} - u) - [A - HC]\hat{x} - Bu - HCx - H(\dot{r} - r)
       = [A - HC]e + B(\dot{v} - u) - H(\dot{r} - r).    (18)
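For a concrete second-order system, the Riccati equation (16) can be integrated numerically until it reaches its stationary solution. The following is a minimal sketch with forward-Euler steps; the matrices A, B, C, Q, R and the step count are illustrative values, not taken from the text:

```python
import numpy as np

# Illustrative time-invariant system (made-up values)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[1.0]])              # input-noise intensity
R = np.array([[0.1]])              # measurement-noise intensity
Sigma = np.eye(2)                  # Sigma(t0) = Sigma_0, eq. (17)

def riccati_rhs(S):
    """Right-hand side of the matrix Riccati ODE (16)."""
    return A @ S + S @ A.T - S @ C.T @ np.linalg.inv(R) @ C @ S + B @ Q @ B.T

dt = 1e-3
for _ in range(20000):             # integrate to (near) steady state
    Sigma = Sigma + dt * riccati_rhs(Sigma)
    Sigma = 0.5 * (Sigma + Sigma.T)    # re-symmetrize against round-off

H = Sigma @ C.T @ np.linalg.inv(R)     # stationary Kalman gain, eq. (15)
```

At convergence the right-hand side of (16) is (numerically) zero, and H is the stationary limit of the gain (15), which (21) shows to be the infimizing choice.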
Continuous-time Kalman Filter

The covariance matrix \Sigma(t) of the state estimation error e(t) is governed by the matrix differential equation

\dot{\Sigma} = [A - HC]\Sigma + \Sigma[A - HC]^T + BQB^T + HRH^T    (19)
            = A\Sigma + \Sigma A^T + BQB^T - \Sigma C^T R^{-1} C\Sigma
              + [H - \Sigma C^T R^{-1}] R [H - \Sigma C^T R^{-1}]^T
            \ge A\Sigma + \Sigma A^T + BQB^T - \Sigma C^T R^{-1} C\Sigma.    (20)

Obviously, \Sigma(t) is infimized at all times t for

H(t) = \Sigma(t)C^T(t)R^{-1}(t).    (21)
Continuous-time Kalman Filter

Given (19) for the covariance matrix \Sigma(t):

\dot{\Sigma} = [A - HC]\Sigma + \Sigma[A - HC]^T + BQB^T + HRH^T.

Its first differential with respect to H, at some value of H, with increment dH can be formulated as follows:

d\dot{\Sigma}(H, dH) = \frac{\partial \dot{\Sigma}(H)}{\partial H}\,dH
                   = -dH\,C\Sigma - \Sigma C^T dH^T + dH\,R H^T + H R\,dH^T
                   = U\{[HR - \Sigma C^T]\,dH^T\},    (22)

where (.)^T is the matrix transposing operator and U is the operator which adds the transposed matrix to its matrix argument.
Continuous-time Kalman Filter

Consider the following stochastic system of first order with the random initial condition \xi : N(x_0, \Sigma_0) and the uncorrelated white noises \dot{v} : N(0, Q) and \dot{r} : N(0, R):

\dot{x}(t) = a x(t) + b[u(t) + \dot{v}(t)]
x(0) = \xi
y(t) = x(t) + \dot{r}(t).

Find the time-invariant Kalman filter! The resulting time-invariant Kalman filter is described by the following equations:

\dot{\hat{x}}(t) = -\sqrt{a^2 + b^2 Q/R}\,\hat{x}(t) + b u(t) + \left(a + \sqrt{a^2 + b^2 Q/R}\right) y(t)
\hat{x}(0) = x_0.
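The stationary covariance behind this filter follows from setting \dot{\Sigma} = 0 in (16), which for the scalar case gives 0 = 2aS + b^2 Q - S^2/R. A quick numerical check of the resulting gain and closed-loop pole, with made-up parameter values:

```python
import math

# Illustrative parameter values (not from the text)
a, b, Q, R = -1.0, 2.0, 0.5, 0.1

# Positive root of the stationary scalar Riccati equation
# 0 = 2*a*S + b^2*Q - S^2/R:
S = R * (a + math.sqrt(a**2 + b**2 * Q / R))

# Time-invariant gain H = S/R, eq. (15), and closed-loop pole a - H
H = S / R
pole = a - H       # equals -sqrt(a^2 + b^2*Q/R), matching the filter above
```

The check confirms that H = a + sqrt(a^2 + b^2 Q/R) and that the filter's dynamics matrix is a - H = -sqrt(a^2 + b^2 Q/R).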
Continuous/discrete-time Kalman Filter

The continuous-time measurement (9), which is corrupted by the white noise error \dot{r}(t) with infinite covariance,

y(t) = C(t)x(t) + \dot{r}(t),

must be replaced by an averaged discrete-time measurement

y_k = C_k x(t_k) + \tilde{r}_k    (23)
\tilde{r}_k : N(r_k, R_k)    (24)
Cov(\tilde{r}_i, \tilde{r}_j) = R_i \delta_{ij}.    (25)
Continuous/discrete-time Kalman Filter

At any sampling time t_k, we denote the state estimate by \hat{x}(t_k|t_{k-1}) and the error covariance matrix by \Sigma(t_k|t_{k-1}) before the new measurement y_k has been processed, and by \hat{x}(t_k|t_k) and \Sigma(t_k|t_k) after the new measurement y_k has been processed. As a consequence, the continuous-time/discrete-time Kalman filter alternatingly performs update steps at the sampling times t_k, processing the latest measurement information y_k, and open-loop extrapolation steps between the times t_k and t_{k+1}, with no measurement information available:
Continuous/discrete-time Kalman Filter

The Continuous-Time/Discrete-Time Kalman Filter

Update at time t_k:

\hat{x}(t_k|t_k) = \hat{x}(t_k|t_{k-1}) + \Sigma(t_k|t_{k-1}) C_k^T [C_k \Sigma(t_k|t_{k-1}) C_k^T + R_k]^{-1} [y_k - r_k - C_k \hat{x}(t_k|t_{k-1})]    (26)
\Sigma(t_k|t_k) = \Sigma(t_k|t_{k-1}) - \Sigma(t_k|t_{k-1}) C_k^T [C_k \Sigma(t_k|t_{k-1}) C_k^T + R_k]^{-1} C_k \Sigma(t_k|t_{k-1})    (27)

or, equivalently:

\Sigma^{-1}(t_k|t_k) = \Sigma^{-1}(t_k|t_{k-1}) + C_k^T R_k^{-1} C_k.    (28)
Continuous/discrete-time Kalman Filter

Continuous-time extrapolation from t_k to t_{k+1} with the initial conditions \hat{x}(t_k|t_k) and \Sigma(t_k|t_k):

\dot{\hat{x}}(t|t_k) = A(t)\hat{x}(t|t_k) + B(t)u(t)    (29)
\dot{\Sigma}(t|t_k) = A(t)\Sigma(t|t_k) + \Sigma(t|t_k)A^T(t) + B(t)Q(t)B^T(t).    (30)

Assuming that the Kalman filter starts with an update at the initial time t_0, it is initialized at t_0 as follows:

\hat{x}(t_0|t_{-1}) = x_0    (31)
\Sigma(t_0|t_{-1}) = \Sigma_0.    (32)
Discrete-time Kalman Filter

We consider the following discrete-time stochastic system:

x_{k+1} = F_k x_k + G_k \tilde{v}_k    (33)
x_0 = \xi    (34)
y_k = C_k x_k + \tilde{r}_k    (35)
\xi : N(x_0, \Sigma_0)    (36)
\tilde{v}_k : N(u_k, Q_k)    (37)
Cov(\tilde{v}_i, \tilde{v}_j) = Q_i \delta_{ij}    (38)
\tilde{r}_k : N(r_k, R_k)    (39)
Cov(\tilde{r}_i, \tilde{r}_j) = R_i \delta_{ij}    (40)
\xi, \tilde{v}, \tilde{r} : mutually independent.    (41)
Discrete-time Kalman Filter

We have the following (approximate) zero-order-hold-equivalence correspondences to the continuous-time stochastic system:

F_k = \Phi(t_{k+1}, t_k)   (transition matrix)    (42)
G_k = \int_{t_k}^{t_{k+1}} \Phi(t_{k+1}, t) B(t)\,dt   (zero-order-hold equivalence)    (43)
C_k = C(t_k)    (44)
Q_k = \frac{Q(t_k)}{t_{k+1} - t_k}    (45)
R_k = \frac{R(t_k)}{\Delta t_k}, where \Delta t_k is the averaging time at time t_k.    (46)
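As an illustration of (42)-(45), consider a hypothetical double-integrator plant. Because its A is nilpotent (A^2 = 0), the matrix exponential series truncates exactly, so the transition matrix and the integral (43) have simple closed forms and the sketch needs no numerical integration:

```python
import numpy as np

dt = 0.1                                   # sampling interval t_{k+1} - t_k
A = np.array([[0.0, 1.0], [0.0, 0.0]])     # double integrator; A @ A = 0
B = np.array([[0.0], [1.0]])
Q = np.array([[0.4]])                      # continuous-time noise intensity

# A nilpotent => exp(A*dt) = I + A*dt exactly: F_k = Phi(t_{k+1}, t_k), eq. (42)
F = np.eye(2) + A * dt

# G_k = int Phi(t_{k+1}, t) B dt over [t_k, t_{k+1}], eq. (43); with
# Phi(t_{k+1}, t) = I + A*(t_{k+1} - t) this integrates to:
G = B * dt + A @ B * dt**2 / 2

# Discrete-time noise covariance, eq. (45)
Qk = Q / dt
```

For a general (non-nilpotent) A one would compute Phi with a matrix-exponential routine instead of the truncated series.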
Discrete-time Kalman Filter

The Discrete-Time Kalman Filter

Update at time t_k:

\hat{x}_{k|k} = \hat{x}_{k|k-1} + \Sigma_{k|k-1} C_k^T [C_k \Sigma_{k|k-1} C_k^T + R_k]^{-1} [y_k - r_k - C_k \hat{x}_{k|k-1}]    (47)
\Sigma_{k|k} = \Sigma_{k|k-1} - \Sigma_{k|k-1} C_k^T [C_k \Sigma_{k|k-1} C_k^T + R_k]^{-1} C_k \Sigma_{k|k-1}    (48)

or, equivalently:

\Sigma_{k|k}^{-1} = \Sigma_{k|k-1}^{-1} + C_k^T R_k^{-1} C_k.    (49)
Discrete-time Kalman Filter

Extrapolation from t_k to t_{k+1}:

\hat{x}_{k+1|k} = F_k \hat{x}_{k|k} + G_k u_k    (50)
\Sigma_{k+1|k} = F_k \Sigma_{k|k} F_k^T + G_k Q_k G_k^T.    (51)

Initialization at time t_0:

\hat{x}_{0|-1} = x_0    (52)
\Sigma_{0|-1} = \Sigma_0.    (53)
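The complete discrete-time filter (47)-(53) can be sketched on simulated data; all numerical values below are made up for illustration. The information-form identity (49) provides a built-in consistency check on each update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative system matrices (not from the text)
F = np.array([[1.0, 0.1], [0.0, 1.0]])
G = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.2]])
R = np.array([[0.5]])
u = np.array([0.1])                  # known input u_k (constant here)

x = np.array([0.0, 1.0])             # true state, x_0 = xi
x_hat = np.zeros(2)                  # x_{0|-1} = x_0, eq. (52)
Sigma = np.eye(2)                    # Sigma_{0|-1} = Sigma_0, eq. (53)

for k in range(50):
    y = C @ x + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)
    # --- update at t_k, eqs. (47), (48) ---
    Sigma_prior = Sigma.copy()
    K = Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + R)
    x_hat = x_hat + K @ (y - C @ x_hat)
    Sigma = Sigma - K @ C @ Sigma
    Sigma_post = Sigma.copy()
    # --- extrapolation to t_{k+1}, eqs. (50), (51) ---
    v_k = u + rng.normal(0.0, np.sqrt(Q[0, 0]), size=1)   # v~_k ~ N(u_k, Q_k)
    x = F @ x + G @ v_k                                   # true-state propagation
    x_hat = F @ x_hat + G @ u
    Sigma = F @ Sigma @ F.T + G @ Q @ G.T
```

After every update, inverting Sigma_post and Sigma_prior verifies (49) numerically, and the trace of the covariance drops, reflecting the information gained from y_k.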
Extended Kalman Filter

Since the linearized problem may not be a good approximation, we stick to the following philosophy: where it does not hurt, analyze the nonlinear system; where it hurts, use linearization.

We consider the following nonlinear stochastic system:

\dot{x}(t) = f(x(t), u(t), t) + B(t)\dot{v}(t)    (54)
x(t_0) = \xi    (55)
y(t) = g(x(t), t) + \dot{r}(t)    (56)
\xi : N(x_0, \Sigma_0)    (57)
\dot{v} : N(v(t), Q(t))    (58)
\dot{r} : N(r(t), R(t)).    (59)
Extended Kalman Filter

The extended Kalman filter

Dynamics of the state estimate:

\dot{\hat{x}}(t) = f(\hat{x}(t), u(t), t) + B(t)v(t) + H(t)[y(t) - r(t) - g(\hat{x}(t), t)]    (60)
\hat{x}(t_0) = x_0    (61)

with H(t) = \Sigma(t)C^T(t)R^{-1}(t). The error covariance matrix \Sigma(t) must be calculated in real time using the following matrix Riccati differential equation:

\dot{\Sigma}(t) = A(t)\Sigma(t) + \Sigma(t)A^T(t) - \Sigma(t)C^T(t)R^{-1}(t)C(t)\Sigma(t) + B(t)Q(t)B^T(t)    (62)
\Sigma(t_0) = \Sigma_0.    (63)
Extended Kalman Filter

The matrices A(t) and C(t) correspond to the dynamics matrix and the output matrix, respectively, of the linearization of the nonlinear system around the estimated trajectory:

A(t) = \frac{\partial f(\hat{x}(t), u(t), t)}{\partial x}    (64)
C(t) = \frac{\partial g(\hat{x}(t), t)}{\partial x}.    (65)
Extended Kalman Filter

Because the dynamics of the error covariance matrix \Sigma(t) correspond to the linearized system, we face the following problems:

- This filter is not optimal.
- The state estimate will be biased.
- The reference point for the linearization is questionable.
- The matrix \Sigma(t) is only an approximation of the state error covariance matrix.
- The state vectors x(t) and \hat{x}(t) are not Gaussian even if \xi, \dot{v}, and \dot{r} are.
Extended Kalman Filter

Again, the continuous-time measurement corrupted with the white noise \dot{r} with infinite covariance,

y(t) = g(x(t), t) + \dot{r}(t),    (66)

must be replaced by an averaged discrete-time measurement

y_k = g(x(t_k), t_k) + \tilde{r}_k    (67)
\tilde{r}_k : N(r_k, R_k)    (68)
Cov(\tilde{r}_i, \tilde{r}_j) = R_i \delta_{ij}.    (69)
Continuous-time/discrete-time extended Kalman filter

Update at time t_k:

\hat{x}(t_k|t_k) = \hat{x}(t_k|t_{k-1}) + \Sigma(t_k|t_{k-1}) C_k^T [C_k \Sigma(t_k|t_{k-1}) C_k^T + R_k]^{-1} [y_k - r_k - g(\hat{x}(t_k|t_{k-1}), t_k)]    (70)
\Sigma(t_k|t_k) = \Sigma(t_k|t_{k-1}) - \Sigma(t_k|t_{k-1}) C_k^T [C_k \Sigma(t_k|t_{k-1}) C_k^T + R_k]^{-1} C_k \Sigma(t_k|t_{k-1})    (71)

Extrapolation between t_k and t_{k+1}:

\dot{\hat{x}}(t|t_k) = f(\hat{x}(t|t_k), u(t), t) + B(t)v(t)    (73)
\dot{\Sigma}(t|t_k) = A(t)\Sigma(t|t_k) + \Sigma(t|t_k)A^T(t) + B(t)Q(t)B^T(t)    (74)
Continuous-time/discrete-time extended Kalman filter

Again, the matrices A(t) and C_k are the dynamics matrix and the output matrix, respectively, of the nonlinear system linearized around the estimated trajectory:

A(t) = \frac{\partial f(\hat{x}(t), u(t), t)}{\partial x}    (75)
C_k = \frac{\partial g(\hat{x}(t_k|t_{k-1}), t_k)}{\partial x}    (76)

Initialization at t_0:

\hat{x}(t_0|t_{-1}) = x_0    (77)
\Sigma(t_0|t_{-1}) = \Sigma_0.    (78)
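The update/extrapolation cycle (70)-(78) can be sketched for a hypothetical pendulum-like system (not an example from the text). Here the Jacobians (75)-(76) are formed by finite differences, and the extrapolation uses simple Euler sub-steps:

```python
import numpy as np

# Hypothetical nonlinear system: state x = (angle, angular velocity),
# measured angle.  f and g play the roles in eqs. (54), (56).
def f(x, u, t):
    return np.array([x[1], -np.sin(x[0]) + u])

def g(x, t):
    return np.array([x[0]])

def jacobian(fun, x, *args, eps=1e-6):
    """Numerical Jacobian of fun with respect to x (finite differences)."""
    fx = fun(x, *args)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (fun(x + dx, *args) - fx) / eps
    return J

B = np.array([[0.0], [1.0]])
Q = np.array([[0.01]])
R = np.array([[0.05]])
dt, n_euler = 0.1, 10            # sample time, Euler sub-steps

def ekf_step(x_hat, Sigma, y, u=0.0, t=0.0):
    """One update (70)-(71) followed by extrapolation (73)-(74)."""
    # --- update at t_k ---
    Ck = jacobian(g, x_hat, t)                     # eq. (76)
    S = Ck @ Sigma @ Ck.T + R
    K = Sigma @ Ck.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (y - g(x_hat, t))
    Sigma = Sigma - K @ Ck @ Sigma
    # --- Euler extrapolation from t_k to t_{k+1} ---
    h = dt / n_euler
    for _ in range(n_euler):
        At = jacobian(f, x_hat, u, t)              # eq. (75)
        x_hat = x_hat + h * f(x_hat, u, t)
        Sigma = Sigma + h * (At @ Sigma + Sigma @ At.T + B @ Q @ B.T)
    return x_hat, Sigma
```

A call such as `ekf_step(np.array([0.5, 0.0]), 0.1 * np.eye(2), np.array([0.45]))` performs one full filter cycle; in a real application the Jacobians would often be derived analytically rather than numerically.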
Kalman Filter and Parameter Identification

The system and the measurement equations in discrete time are given by

x_{k+1} = F_k x_k + G_k u_k + G_k Q_k^{1/2} w_k^{(1)}    (79)
y_k = C_k x_k + r_k + R_k^{1/2} w_k^{(2)}.    (80)

The update equations of the Kalman filter are

\hat{x}_{k|k} = \hat{x}_{k|k-1} + \Sigma_{k|k-1} C_k^T [C_k \Sigma_{k|k-1} C_k^T + R_k]^{-1} v_k    (81)
v_k = y_k - r_k - C_k \hat{x}_{k|k-1}    (82)
\Sigma_{k|k} = \Sigma_{k|k-1} - \Sigma_{k|k-1} C_k^T [C_k \Sigma_{k|k-1} C_k^T + R_k]^{-1} C_k \Sigma_{k|k-1},    (83)

where v_k denotes the prediction error.
Kalman Filter and Parameter Identification

The prediction equations are given by

\hat{x}_{k|k-1} = F_{k-1} \hat{x}_{k-1|k-1} + G_{k-1} u_{k-1}    (84)
\Sigma_{k|k-1} = F_{k-1} \Sigma_{k-1|k-1} F_{k-1}^T + G_{k-1} Q_{k-1} G_{k-1}^T.    (85)

The algorithm is carried out in two distinct parts:

Prediction Step: The first step consists of forming an optimal predictor of the next observation, given all the information available up to time t_{k-1}. This results in a so-called a priori estimate for time t_k.

Updating Step: The a priori estimate is updated with the new information arriving at time t_k, which is combined with the already available information from time t_{k-1}. The result of this step is the filtered estimate, or the a posteriori estimate.
Kalman Filter and Parameter Identification

The prediction error is defined by

v_k = y_k - \hat{y}_{k|k-1} = y_k - E[y_k | F_{k-1}] = y_k - r_k - C_k \hat{x}_{k|k-1}

and its covariance matrix can be calculated by

K_{k|k-1} = Cov(v_k | F_{k-1}) = E[v_k v_k^T | F_{k-1}]
         = C_k E[\tilde{x}_{k|k-1} \tilde{x}_{k|k-1}^T | F_{k-1}] C_k^T + R_k^{1/2} E[w_k^{(2)} w_k^{(2)T} | F_{k-1}] R_k^{T/2}
         = C_k \Sigma_{k|k-1} C_k^T + R_k,

where \tilde{x}_{k|k-1} = x_k - \hat{x}_{k|k-1} denotes the state prediction error. This result is important for forming the log-likelihood estimator.
Kalman Filter and Parameter Identification

We start to examine the appropriate log-likelihood function within our framework of the Kalman filter. Let \psi \in R^d be the vector of unknown parameters, which belongs to the admissible parameter space \Psi. The various system matrices of the state space model, as well as the variances of the stochastic processes given in the previous section, depend on \psi. The likelihood function of the state space model is given by the joint density of the observational data y = (y_N, y_{N-1}, ..., y_1):

l(y, \psi) = p(y_N, y_{N-1}, ..., y_1; \psi),    (86)

which reflects how likely it would have been to observe the data if \psi were the true vector of parameters.
Kalman Filter and Parameter Identification

Using the definition of conditional probability and employing Bayes' theorem recursively, we now write the joint density as a product of conditional densities:

l(y, \psi) = p(y_N | y_{N-1}, ..., y_1; \psi) \cdots p(y_k | y_{k-1}, ..., y_1; \psi) \cdots p(y_1; \psi),    (87)

where we approximate the initial density function p(y_1; \psi) by p(y_1 | y_0; \psi). Furthermore, since we deal with a Markovian system, future values y_l with l > k depend on (y_k, y_{k-1}, ..., y_1) only through the current value y_k. Thus the expression in (87) can be rewritten to depend only on the last observed values, and it reduces to

l(y, \psi) = p(y_N | y_{N-1}; \psi) \cdots p(y_k | y_{k-1}; \psi) \cdots p(y_1; \psi).    (88)
Kalman Filter and Parameter Identification

For the purpose of estimating the parameter vector \psi, we express the likelihood function in terms of the prediction error. Since the variance of the prediction error v_k is the same as the conditional variance of y_k, i.e.,

Cov[v_k | F_{k-1}] = Cov[y_k | F_{k-1}],    (89)

we are able to state the density function. The density function p(y_k | y_{k-1}; \psi) is a Gaussian (normal) distribution with conditional mean

E[y_k | F_{k-1}] = r_k + C_k \hat{x}_{k|k-1}    (90)

and conditional covariance matrix

Cov[y_k | F_{k-1}] = K_{k|k-1}.    (91)
Kalman Filter and Parameter Identification

The density function of an n-dimensional normal distribution can be written in matrix notation as

\frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1}(x - \mu)},

where \mu denotes the mean value and \Sigma the covariance matrix. In our case, x - \mu is replaced by y_k - r_k - C_k \hat{x}_{k|k-1} = v_k, and the covariance matrix is K_{k|k-1}. The probability density function thus takes the form

p(y_k | y_{k-1}; \psi) = \frac{1}{(2\pi)^{p/2} |K_{k|k-1}|^{1/2}} e^{-\frac{1}{2} v_k^T K_{k|k-1}^{-1} v_k},    (92)

where v_k \in R^p.
Kalman Filter and Parameter Identification

Taking the logarithm of (92) yields

\ln(p(y_k | y_{k-1}; \psi)) = -\frac{p}{2} \ln(2\pi) - \frac{1}{2} \ln|K_{k|k-1}| - \frac{1}{2} v_k^T K_{k|k-1}^{-1} v_k,    (93)

which gives us the whole log-likelihood function as the sum

L(y, \psi) = -\frac{1}{2} \sum_{k=1}^{N} \left[ p \ln(2\pi) + \ln|K_{k|k-1}| + v_k^T K_{k|k-1}^{-1} v_k \right].    (94)

In order to estimate the unknown parameters from the log-likelihood function in (94), we employ an appropriate optimization routine which maximizes L(y, \psi) with respect to \psi.
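Evaluating L(y, psi) in (94) amounts to one Kalman filter pass that accumulates the prediction errors v_k and their variances K_{k|k-1}. The following sketch does this for a scalar model; the model form, parameter values, and simulated data are made up for illustration:

```python
import numpy as np

def kf_loglik(psi, ys, x0=0.0, sigma0=1.0):
    """Log-likelihood (94) of the scalar state space model
       x_{k+1} = F x_k + w_k,  w_k ~ N(0, q);   y_k = x_k + r_k,  r_k ~ N(0, R),
    computed via the Kalman filter prediction errors.  psi = (F, q, R)."""
    F, q, R = psi
    x_hat, Sigma, L = x0, sigma0, 0.0
    for y in ys:
        K = Sigma + R                  # K_{k|k-1} = C Sigma C' + R, with C = 1
        v = y - x_hat                  # prediction error v_k
        L -= 0.5 * (np.log(2 * np.pi) + np.log(K) + v**2 / K)
        # measurement update, then one-step prediction
        x_hat = x_hat + (Sigma / K) * v
        Sigma = Sigma - Sigma**2 / K
        x_hat = F * x_hat
        Sigma = F**2 * Sigma + q
    return L

# Simulate data from "true" parameters (made-up values)
rng = np.random.default_rng(1)
F_true, q_true, R_true = 0.9, 0.2, 0.1
x, ys = 0.0, []
for _ in range(300):
    ys.append(x + rng.normal(0.0, np.sqrt(R_true)))
    x = F_true * x + rng.normal(0.0, np.sqrt(q_true))
```

In practice one would hand the negative of `kf_loglik` to a numerical optimizer to carry out the maximization (95); a badly mis-specified psi (e.g. a grossly inflated q) yields a visibly lower likelihood on the same data.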
Kalman Filter and Parameter Identification

Given the simplicity of the innovation form of the likelihood function, we can find the values of the prediction error and its conditional covariance matrix for every given parameter vector \psi using the Kalman filter algorithm. Hence, we are able to calculate the value of the conditional log-likelihood function numerically. The parameters are estimated by the optimization

\psi_{ML} = \arg\max_{\psi \in \Psi} L(y, \psi),    (95)

where \Psi denotes the parameter space. The optimization is an unconstrained optimization problem if \Psi = R^d, or a constrained optimization problem if the parameter space \Psi \subset R^d.