Lecture 2: From Linear Regression to Kalman Filter and Beyond
Department of Biomedical Engineering and Computational Science
Aalto University
January 26, 2012
Contents
1. Batch and Recursive Estimation
2. Towards Bayesian Filtering
3. Kalman Filter and General Bayesian Optimal Filter
4. Summary and Demo
Batch Linear Regression [1/2]

[Figure: noisy measurements and the true signal of the linear model.]

Consider the linear regression model

  y_k = a_1 + a_2 t_k + ε_k,

with ε_k ~ N(0, σ²) and a = (a_1, a_2) ~ N(m_0, P_0). In probabilistic notation this is:

  p(y_k | a) = N(y_k | H_k a, σ²)
  p(a) = N(a | m_0, P_0),

where H_k = (1  t_k).
Batch Linear Regression [2/2]

The Bayesian batch solution by the Bayes' rule:

  p(a | y_{1:N}) ∝ p(a) ∏_{k=1}^{N} p(y_k | a)
               = N(a | m_0, P_0) ∏_{k=1}^{N} N(y_k | H_k a, σ²).

The posterior is Gaussian

  p(a | y_{1:N}) = N(a | m_N, P_N).

The mean and covariance are given as

  m_N = [P_0^{-1} + (1/σ²) H^T H]^{-1} [(1/σ²) H^T y + P_0^{-1} m_0]
  P_N = [P_0^{-1} + (1/σ²) H^T H]^{-1},

where H_k = (1  t_k), H = (H_1; H_2; ...; H_N), and y = (y_1; ...; y_N).
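For concreteness, here is a minimal NumPy sketch of the batch solution. The simulated data, the noise variance σ², and the prior values are illustrative assumptions, not values from the lecture.

import numpy as np

# Simulate illustrative data from y_k = a1 + a2*t_k + noise (assumed values).
rng = np.random.default_rng(0)
N = 50
t = np.linspace(0, 1, N)
sigma2 = 0.01                                  # measurement noise variance
y = 1.0 + 0.5 * t + rng.normal(0, np.sqrt(sigma2), N)

# Prior a ~ N(m0, P0).
m0 = np.zeros(2)
P0 = np.eye(2)

# Stack the rows H_k = (1  t_k) into the matrix H.
H = np.column_stack([np.ones(N), t])

# Posterior covariance P_N and mean m_N of p(a | y_{1:N}).
PN = np.linalg.inv(np.linalg.inv(P0) + (H.T @ H) / sigma2)
mN = PN @ ((H.T @ y) / sigma2 + np.linalg.inv(P0) @ m0)
print("batch posterior mean:", mN)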
Recursive Linear Regression [1/3]

Assume that we have already computed the posterior distribution conditioned on the measurements up to step k-1:

  p(a | y_{1:k-1}) = N(a | m_{k-1}, P_{k-1}).

Assume that we get the kth measurement y_k. Using the equations from the previous slide we get

  p(a | y_{1:k}) ∝ p(y_k | a) p(a | y_{1:k-1}) ∝ N(a | m_k, P_k).

The mean and covariance are given as

  m_k = [P_{k-1}^{-1} + (1/σ²) H_k^T H_k]^{-1} [(1/σ²) H_k^T y_k + P_{k-1}^{-1} m_{k-1}]
  P_k = [P_{k-1}^{-1} + (1/σ²) H_k^T H_k]^{-1}.
Recursive Linear Regression [2/3]

By the matrix inversion lemma (or Woodbury identity):

  P_k = P_{k-1} - P_{k-1} H_k^T [H_k P_{k-1} H_k^T + σ²]^{-1} H_k P_{k-1}.

Now the equations for the mean and covariance reduce to

  S_k = H_k P_{k-1} H_k^T + σ²
  K_k = P_{k-1} H_k^T S_k^{-1}
  m_k = m_{k-1} + K_k [y_k - H_k m_{k-1}]
  P_k = P_{k-1} - K_k S_k K_k^T.

Computing these for k = 1, ..., N gives exactly the linear regression solution, but without a matrix inversion¹. This is a special case of the Kalman filter.

¹ Without an explicit matrix inversion: here S_k is a scalar, so only a scalar division is needed.
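A matching sketch of the recursive form, using the same illustrative data-generating assumptions as the batch sketch above; run on the same data, the final mean and covariance agree with the batch posterior up to numerical error.

import numpy as np

rng = np.random.default_rng(0)                 # same seed/data as the batch sketch
N = 50
t = np.linspace(0, 1, N)
sigma2 = 0.01
y = 1.0 + 0.5 * t + rng.normal(0, np.sqrt(sigma2), N)

m = np.zeros(2)                                # m_0
P = np.eye(2)                                  # P_0
for k in range(N):
    Hk = np.array([[1.0, t[k]]])               # H_k = (1  t_k)
    Sk = float(Hk @ P @ Hk.T) + sigma2         # S_k (a scalar here)
    Kk = (P @ Hk.T) / Sk                       # gain K_k
    m = m + Kk.ravel() * (y[k] - float(Hk @ m))
    P = P - (Kk @ Kk.T) * Sk                   # P_k = P_{k-1} - K_k S_k K_k^T
print("recursive posterior mean:", m)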
Recursive Linear Regression [3/3]

Convergence of the recursive solution to the batch solution: on the last step the solutions are exactly equal.

[Figure: recursive estimates E[a_1] and E[a_2] converging over time to the true values a_1 and a_2.]
Batch vs. Recursive Estimation [1/2]

General batch solution:
- Specify the measurement model: p(y_{1:N} | θ) = ∏_k p(y_k | θ).
- Specify the prior distribution p(θ).
- Compute the posterior distribution by the Bayes' rule:

    p(θ | y_{1:N}) = (1/Z) p(θ) ∏_k p(y_k | θ).

- Compute point estimates, moments, predictive quantities etc. from the posterior distribution.
Batch vs. Recursive Estimation [2/2]

General recursive solution:
- Specify the measurement likelihood p(y_k | θ).
- Specify the prior distribution p(θ).
- Process the measurements y_1, ..., y_N one at a time, starting from the prior:

    p(θ | y_1)     = (1/Z_1) p(y_1 | θ) p(θ)
    p(θ | y_{1:2}) = (1/Z_2) p(y_2 | θ) p(θ | y_1)
    ⋮
    p(θ | y_{1:N}) = (1/Z_N) p(y_N | θ) p(θ | y_{1:N-1}).

- The posterior at the last step is the same as the batch solution.
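To make the normalization constants Z_k concrete, here is a small sketch of this recursive updating on a discretized parameter grid; the Bernoulli likelihood and the measurement sequence are illustrative assumptions, not from the slides.

import numpy as np

theta = np.linspace(0.001, 0.999, 999)        # grid over the scalar parameter
post = np.full_like(theta, 1.0 / theta.size)  # uniform prior p(theta)

ys = [1, 0, 1, 1, 0, 1]                       # illustrative 0/1 measurements
for yk in ys:
    lik = theta**yk * (1.0 - theta)**(1 - yk) # p(y_k | theta), Bernoulli (assumed)
    post = lik * post                         # p(y_k | theta) p(theta | y_{1:k-1})
    post /= post.sum()                        # divide by Z_k
print("posterior mean of theta:", (theta * post).sum())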
Advantages of the Recursive Solution

- The recursive solution can be considered the online learning solution to the Bayesian learning problem.
- Batch Bayesian inference is a special case of recursive Bayesian inference.
- The parameter can be modeled to change between the measurement steps; this is the basis of filtering theory.
Drift Model for Linear Regression [1/3]

Let us assume a Gaussian random walk between the measurements in the linear regression model:

  p(y_k | a_k) = N(y_k | H_k a_k, σ²)
  p(a_k | a_{k-1}) = N(a_k | a_{k-1}, Q)
  p(a_0) = N(a_0 | m_0, P_0).

Again, assume that we already know

  p(a_{k-1} | y_{1:k-1}) = N(a_{k-1} | m_{k-1}, P_{k-1}).

The joint distribution of a_k and a_{k-1} is (due to the Markovianity of the dynamics!):

  p(a_k, a_{k-1} | y_{1:k-1}) = p(a_k | a_{k-1}) p(a_{k-1} | y_{1:k-1}).
Drift Model for Linear Regression [2/3]

Integrating over a_{k-1} gives:

  p(a_k | y_{1:k-1}) = ∫ p(a_k | a_{k-1}) p(a_{k-1} | y_{1:k-1}) da_{k-1}.

For Markov processes this equation is called the Chapman-Kolmogorov equation. Because the distributions are Gaussian, the result is Gaussian,

  p(a_k | y_{1:k-1}) = N(a_k | m_k^-, P_k^-),

where

  m_k^- = m_{k-1}
  P_k^- = P_{k-1} + Q.
Drift Model for Linear Regression [3/3]

As in the pure recursive estimation, we get

  p(a_k | y_{1:k}) ∝ p(y_k | a_k) p(a_k | y_{1:k-1}) ∝ N(a_k | m_k, P_k).

After applying the matrix inversion lemma, the mean and covariance can be written as

  S_k = H_k P_k^- H_k^T + σ²
  K_k = P_k^- H_k^T S_k^{-1}
  m_k = m_k^- + K_k [y_k - H_k m_k^-]
  P_k = P_k^- - K_k S_k K_k^T.

Again, we have derived a special case of the Kalman filter. The batch version of this solution would be much more complicated.
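A sketch of this drift-model filter in code, combining the prediction and update steps; the random-walk covariance Q and the simulated data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
N = 100
t = np.linspace(0, 1, N)
sigma2 = 0.01
Q = 1e-4 * np.eye(2)                          # assumed random-walk covariance
y = 1.0 + 0.5 * t + rng.normal(0, np.sqrt(sigma2), N)  # illustrative data

m = np.zeros(2)                               # m_0
P = np.eye(2)                                 # P_0
for k in range(N):
    # Prediction (Chapman-Kolmogorov): m_k^- = m_{k-1}, P_k^- = P_{k-1} + Q.
    m_pred, P_pred = m, P + Q
    # Update with the measurement y_k.
    Hk = np.array([[1.0, t[k]]])
    Sk = float(Hk @ P_pred @ Hk.T) + sigma2
    Kk = (P_pred @ Hk.T) / Sk
    m = m_pred + Kk.ravel() * (y[k] - float(Hk @ m_pred))
    P = P_pred - (Kk @ Kk.T) * Sk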
State Space Notation

In the previous section we formulated the model as

  p(a_k | a_{k-1}) = N(a_k | a_{k-1}, Q)
  p(y_k | a_k) = N(y_k | H_k a_k, σ²).

But in Kalman filtering and control theory the vector of parameters a_k is usually called the state and denoted as x_k. In the more standard state space notation:

  p(x_k | x_{k-1}) = N(x_k | x_{k-1}, Q)
  p(y_k | x_k) = N(y_k | H_k x_k, σ²).

Or equivalently

  x_k = x_{k-1} + q
  y_k = H_k x_k + r,

where q ~ N(0, Q) and r ~ N(0, σ²).
Kalman Filter [1/2]

The canonical Kalman filtering model is

  p(x_k | x_{k-1}) = N(x_k | A_{k-1} x_{k-1}, Q_{k-1})
  p(y_k | x_k) = N(y_k | H_k x_k, R_k).

More often, this model can be seen in the form

  x_k = A_{k-1} x_{k-1} + q_{k-1}
  y_k = H_k x_k + r_k.

The Kalman filter actually calculates the following distributions:

  p(x_k | y_{1:k-1}) = N(x_k | m_k^-, P_k^-)
  p(x_k | y_{1:k}) = N(x_k | m_k, P_k).
Kalman Filter [2/2]

Prediction step of the Kalman filter:

  m_k^- = A_{k-1} m_{k-1}
  P_k^- = A_{k-1} P_{k-1} A_{k-1}^T + Q_{k-1}.

Update step of the Kalman filter:

  S_k = H_k P_k^- H_k^T + R_k
  K_k = P_k^- H_k^T S_k^{-1}
  m_k = m_k^- + K_k [y_k - H_k m_k^-]
  P_k = P_k^- - K_k S_k K_k^T.

These equations will be derived from the general Bayesian filtering equations in the next lecture.
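The same equations as general-purpose functions, a minimal sketch for the multivariate case:

import numpy as np

def kf_predict(m, P, A, Q):
    # m_k^- = A_{k-1} m_{k-1},  P_k^- = A_{k-1} P_{k-1} A_{k-1}^T + Q_{k-1}
    return A @ m, A @ P @ A.T + Q

def kf_update(m_pred, P_pred, y, H, R):
    S = H @ P_pred @ H.T + R                  # innovation covariance S_k
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain K_k
    m = m_pred + K @ (y - H @ m_pred)         # m_k
    P = P_pred - K @ S @ K.T                  # P_k
    return m, P

Applied in a loop over the measurements, these reproduce the earlier regression recursions: A = I and Q = 0 gives the pure recursive estimation, and Q > 0 gives the drift model.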
Probabilistic Non-Linear Filtering [1/2]

Generic discrete-time state space models:

  x_k = f(x_{k-1}, q_{k-1})
  y_k = h(x_k, r_k).

Generic Markov models:

  x_k ~ p(x_k | x_{k-1})
  y_k ~ p(y_k | x_k).

Approximation methods: extended Kalman filters (EKF), unscented Kalman filters (UKF), sequential Monte Carlo (SMC) filters, a.k.a. particle filters.
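As one example of the SMC approach, a minimal bootstrap particle filter sketch; the dynamics f, the measurement function h, the noise levels, and the measurements are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(x)                       # assumed dynamics
h = lambda x: x**2                            # assumed measurement function
q_std, r_std, n = 0.1, 0.5, 1000

x = rng.normal(0.0, 1.0, n)                   # particles from an assumed prior p(x_0)
for yk in [0.2, 0.5, 0.3]:                    # illustrative measurements
    x = f(x) + rng.normal(0, q_std, n)        # propagate through the dynamics
    w = np.exp(-0.5 * ((yk - h(x)) / r_std)**2)  # Gaussian likelihood weights
    w /= w.sum()
    print("posterior mean estimate:", np.sum(w * x))
    x = rng.choice(x, size=n, p=w)            # resample (bootstrap step)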
Probabilistic Non-Linear Filtering [2/2]

In continuous-discrete filtering models, the dynamics are modeled in continuous time and the measurements are taken at discrete time steps. The continuous-time versions of Markov models are called stochastic differential equations:

  dx/dt = f(x, t) + w(t),

where w(t) is a continuous-time Gaussian white noise process.

Approximation methods: extended Kalman filters, unscented Kalman filters, sequential Monte Carlo, particle filters.
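For simulation, such an SDE is typically time-discretized; a minimal Euler-Maruyama sketch with an assumed drift f and an assumed spectral density q_c for the white noise w(t):

import numpy as np

rng = np.random.default_rng(3)
f = lambda x, t: -x                           # assumed drift
qc, dt, T = 0.5, 0.01, 1.0                    # noise spectral density, step, horizon

x = 0.0
for k in range(int(T / dt)):
    # The increment of the driving noise over dt has variance qc * dt.
    x = x + f(x, k * dt) * dt + np.sqrt(qc * dt) * rng.normal()
print("sample of x(T):", x)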
Summary

- The linear regression problem can be solved as a batch problem or recursively; the latter solution is a special case of the Kalman filter.
- A generic Bayesian estimation problem can also be solved either as a batch problem or recursively.
- If we let the linear regression parameters change between the measurements, we get a simple linear state space model, again solvable with the Kalman filter.
- Generalizing this idea and its solution gives the Kalman filter algorithm.
- Further generalizing to non-Gaussian models results in a generic probabilistic state space model.
Demonstration Batch and recursive linear regression.