Department of Biomedical Engineering and Computational Science Aalto University March 17, 2011
Contents
1. What is Optimal Smoothing?
2. Bayesian Optimal Smoothing Equations
3. Rauch-Tung-Striebel Smoother
4. Gaussian Approximation Based Smoothing
5. Particle Smoothing
6. Summary and Demonstration
Filtering, Prediction and Smoothing
Types of Smoothing Problems
- Fixed-interval smoothing: estimate the states on an interval [0, T] given measurements on the same interval.
- Fixed-point smoothing: estimate the state at a fixed point of time in the past.
- Fixed-lag smoothing: estimate the state at a fixed delay in the past.
Here we shall only consider fixed-interval smoothing; the others can quite easily be derived from it.
Examples of Smoothing Problems
- Given all the radar measurements of a rocket (or missile) trajectory, what was the exact place of launch?
- Estimate the whole trajectory of a car based on GPS measurements to calibrate the inertial navigation system accurately.
- What was the history of a chemical/combustion/other process given a batch of measurements from it?
- Remove noise from an audio signal by using a smoother to estimate the true audio signal under the noise.
- The smoothing solution also arises in the EM algorithm for estimating the parameters of a state space model.
Optimal Smoothing Algorithms
- Linear Gaussian models:
  - Rauch-Tung-Striebel smoother (RTSS).
  - Two-filter smoother.
- Non-linear Gaussian models:
  - Extended Rauch-Tung-Striebel smoother (ERTSS).
  - Unscented Rauch-Tung-Striebel smoother (URTSS).
  - Statistically linearized Rauch-Tung-Striebel smoother (SLRTSS).
  - Gaussian Rauch-Tung-Striebel smoothers (GRTSS): cubature, Gauss-Hermite, Bayes-Hermite, Monte Carlo.
  - Two-filter versions of the above.
- Non-linear non-Gaussian models:
  - Sequential importance resampling based smoother.
  - Rao-Blackwellized particle smoothers.
  - Grid based smoother.
Problem Formulation
Probabilistic state space model:
$$\text{measurement model: } y_k \sim p(y_k \mid x_k)$$
$$\text{dynamic model: } x_k \sim p(x_k \mid x_{k-1})$$
- Assume that the filtering distributions $p(x_k \mid y_{1:k})$ have already been computed for all $k = 0, \dots, T$.
- We want recursive equations for computing the smoothing distribution for all $k < T$: $p(x_k \mid y_{1:T})$.
- The recursion will go backwards in time, because on the last step the filtering and smoothing distributions coincide: $p(x_T \mid y_{1:T})$.
Derivation of Formal Smoothing Equations [1/2]
The key: due to the Markov properties of the state we have
$$p(x_k \mid x_{k+1}, y_{1:T}) = p(x_k \mid x_{k+1}, y_{1:k}).$$
Thus we get
$$p(x_k \mid x_{k+1}, y_{1:T}) = p(x_k \mid x_{k+1}, y_{1:k})
= \frac{p(x_k, x_{k+1} \mid y_{1:k})}{p(x_{k+1} \mid y_{1:k})}
= \frac{p(x_{k+1} \mid x_k, y_{1:k})\, p(x_k \mid y_{1:k})}{p(x_{k+1} \mid y_{1:k})}
= \frac{p(x_{k+1} \mid x_k)\, p(x_k \mid y_{1:k})}{p(x_{k+1} \mid y_{1:k})}.$$
Derivation of Formal Smoothing Equations [2/2]
Assuming that the smoothing distribution of the next step $p(x_{k+1} \mid y_{1:T})$ is available, we get
$$p(x_k, x_{k+1} \mid y_{1:T}) = p(x_k \mid x_{k+1}, y_{1:T})\, p(x_{k+1} \mid y_{1:T})
= p(x_k \mid x_{k+1}, y_{1:k})\, p(x_{k+1} \mid y_{1:T})
= \frac{p(x_{k+1} \mid x_k)\, p(x_k \mid y_{1:k})\, p(x_{k+1} \mid y_{1:T})}{p(x_{k+1} \mid y_{1:k})}.$$
Integrating over $x_{k+1}$ gives
$$p(x_k \mid y_{1:T}) = p(x_k \mid y_{1:k}) \int \frac{p(x_{k+1} \mid x_k)\, p(x_{k+1} \mid y_{1:T})}{p(x_{k+1} \mid y_{1:k})}\, dx_{k+1}.$$
Bayesian Optimal Smoothing Equations
The Bayesian optimal smoothing equations consist of a prediction step and a backward update step:
$$p(x_{k+1} \mid y_{1:k}) = \int p(x_{k+1} \mid x_k)\, p(x_k \mid y_{1:k})\, dx_k$$
$$p(x_k \mid y_{1:T}) = p(x_k \mid y_{1:k}) \int \frac{p(x_{k+1} \mid x_k)\, p(x_{k+1} \mid y_{1:T})}{p(x_{k+1} \mid y_{1:k})}\, dx_{k+1}$$
The recursion is started from the filtering (and smoothing) distribution of the last time step, $p(x_T \mid y_{1:T})$.
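For a finite state space the integrals above become sums, so the recursion can be checked exactly. Below is a minimal NumPy sketch for a hypothetical 3-state Markov chain (the transition matrix, prior and likelihoods are made up for illustration), compared against brute-force enumeration of all state paths:

```python
import numpy as np

# Hypothetical 3-state chain: A[i, j] = p(x_{k+1} = j | x_k = i).
rng = np.random.default_rng(0)
T, n = 5, 3
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
p0 = np.array([0.5, 0.3, 0.2])             # prior over the first state
L = rng.uniform(0.1, 1.0, size=(T, n))     # likelihoods p(y_k | x_k = j)

# Forward pass: filtering distributions p(x_k | y_{1:k}).
filt = np.zeros((T, n))
f = p0 * L[0]
filt[0] = f / f.sum()
for k in range(1, T):
    f = (filt[k - 1] @ A) * L[k]           # predict, then update
    filt[k] = f / f.sum()

# Backward pass: the optimal smoothing recursion, started from p(x_T | y_{1:T}).
smooth = np.zeros((T, n))
smooth[-1] = filt[-1]
for k in range(T - 2, -1, -1):
    pred = filt[k] @ A                     # p(x_{k+1} | y_{1:k})
    smooth[k] = filt[k] * (A @ (smooth[k + 1] / pred))

# Brute-force reference: marginalize the joint over all n**T state paths.
paths = np.array(np.meshgrid(*[range(n)] * T, indexing="ij")).reshape(T, -1).T
w = np.array([p0[p[0]] * np.prod(L[np.arange(T), p]) *
              np.prod(A[p[:-1], p[1:]]) for p in paths])
w /= w.sum()
exact = np.array([[w[paths[:, k] == j].sum() for j in range(n)]
                  for k in range(T)])
print(np.abs(smooth - exact).max())
```

This is the "grid based smoother" mentioned earlier in its simplest possible form; the same two-pass structure carries over to all the continuous-state smoothers below.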
Linear-Gaussian Smoothing Problem
Gaussian driven linear model, i.e., Gauss-Markov model:
$$x_k = A_{k-1}\, x_{k-1} + q_{k-1}$$
$$y_k = H_k\, x_k + r_k.$$
In probabilistic terms the model is
$$p(x_k \mid x_{k-1}) = N(x_k \mid A_{k-1} x_{k-1}, Q_{k-1})$$
$$p(y_k \mid x_k) = N(y_k \mid H_k x_k, R_k).$$
The Kalman filter can be used for computing all the Gaussian filtering distributions:
$$p(x_k \mid y_{1:k}) = N(x_k \mid m_k, P_k).$$
RTS: Derivation Preliminaries
Gaussian probability density:
$$N(x \mid m, P) = \frac{1}{(2\pi)^{n/2} |P|^{1/2}} \exp\left(-\frac{1}{2}(x - m)^T P^{-1} (x - m)\right).$$
Let x and y have the Gaussian densities
$$p(x) = N(x \mid m, P), \qquad p(y \mid x) = N(y \mid H x, R).$$
Then the joint and marginal distributions are
$$\begin{pmatrix} x \\ y \end{pmatrix} \sim N\left( \begin{pmatrix} m \\ H m \end{pmatrix}, \begin{pmatrix} P & P H^T \\ H P & H P H^T + R \end{pmatrix} \right)$$
$$y \sim N(H m,\; H P H^T + R).$$
RTS: Derivation Preliminaries (cont.)
If the random variables x and y have the joint Gaussian probability density
$$\begin{pmatrix} x \\ y \end{pmatrix} \sim N\left( \begin{pmatrix} a \\ b \end{pmatrix}, \begin{pmatrix} A & C \\ C^T & B \end{pmatrix} \right),$$
then the marginal and conditional densities of x and y are given as follows:
$$x \sim N(a, A)$$
$$y \sim N(b, B)$$
$$x \mid y \sim N(a + C B^{-1}(y - b),\; A - C B^{-1} C^T)$$
$$y \mid x \sim N(b + C^T A^{-1}(x - a),\; B - C^T A^{-1} C).$$
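The conditioning rule above can be checked numerically: the conditional covariance $A - C B^{-1} C^T$ must equal the inverse of the top-left block of the joint precision matrix, a standard block-inversion identity. A small sketch with assumed random matrices:

```python
import numpy as np

# Build a random SPD joint covariance [[A, C], [C^T, B]] (assumed sizes).
rng = np.random.default_rng(1)
nx, ny = 2, 3
M = rng.standard_normal((nx + ny, nx + ny))
S = M @ M.T + (nx + ny) * np.eye(nx + ny)
A, C, B = S[:nx, :nx], S[:nx, nx:], S[nx:, nx:]

# Conditional covariance of x given y from the lemma.
cond_cov = A - C @ np.linalg.inv(B) @ C.T
# It equals the inverse of the corresponding block of the joint precision.
precision = np.linalg.inv(S)
print(np.allclose(cond_cov, np.linalg.inv(precision[:nx, :nx])))
```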
Derivation of Rauch-Tung-Striebel Smoother [1/4]
By the Gaussian distribution computation rules we get
$$p(x_k, x_{k+1} \mid y_{1:k}) = p(x_{k+1} \mid x_k)\, p(x_k \mid y_{1:k})
= N(x_{k+1} \mid A_k x_k, Q_k)\, N(x_k \mid m_k, P_k)
= N\left( \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} \,\middle|\, m_1, P_1 \right),$$
where
$$m_1 = \begin{pmatrix} m_k \\ A_k m_k \end{pmatrix}, \qquad
P_1 = \begin{pmatrix} P_k & P_k A_k^T \\ A_k P_k & A_k P_k A_k^T + Q_k \end{pmatrix}.$$
Derivation of Rauch-Tung-Striebel Smoother [2/4]
By the conditioning rule of the Gaussian distribution we get
$$p(x_k \mid x_{k+1}, y_{1:T}) = p(x_k \mid x_{k+1}, y_{1:k}) = N(x_k \mid m_2, P_2),$$
where
$$G_k = P_k A_k^T (A_k P_k A_k^T + Q_k)^{-1}$$
$$m_2 = m_k + G_k (x_{k+1} - A_k m_k)$$
$$P_2 = P_k - G_k (A_k P_k A_k^T + Q_k)\, G_k^T.$$
Derivation of Rauch-Tung-Striebel Smoother [3/4]
The joint distribution of $x_k$ and $x_{k+1}$ given all the data is
$$p(x_{k+1}, x_k \mid y_{1:T}) = p(x_k \mid x_{k+1}, y_{1:T})\, p(x_{k+1} \mid y_{1:T})
= N(x_k \mid m_2, P_2)\, N(x_{k+1} \mid m^s_{k+1}, P^s_{k+1})
= N\left( \begin{bmatrix} x_{k+1} \\ x_k \end{bmatrix} \,\middle|\, m_3, P_3 \right),$$
where
$$m_3 = \begin{pmatrix} m^s_{k+1} \\ m_k + G_k (m^s_{k+1} - A_k m_k) \end{pmatrix}, \qquad
P_3 = \begin{pmatrix} P^s_{k+1} & P^s_{k+1} G_k^T \\ G_k P^s_{k+1} & G_k P^s_{k+1} G_k^T + P_2 \end{pmatrix}.$$
Derivation of Rauch-Tung-Striebel Smoother [4/4]
The marginal mean and covariance are thus given as
$$m^s_k = m_k + G_k (m^s_{k+1} - A_k m_k)$$
$$P^s_k = P_k + G_k (P^s_{k+1} - A_k P_k A_k^T - Q_k)\, G_k^T.$$
The smoothing distribution is then Gaussian with the above mean and covariance:
$$p(x_k \mid y_{1:T}) = N(x_k \mid m^s_k, P^s_k).$$
Rauch-Tung-Striebel Smoother
Backward recursion equations for the smoothed means $m^s_k$ and covariances $P^s_k$:
$$m^-_{k+1} = A_k m_k$$
$$P^-_{k+1} = A_k P_k A_k^T + Q_k$$
$$G_k = P_k A_k^T\, [P^-_{k+1}]^{-1}$$
$$m^s_k = m_k + G_k\, [m^s_{k+1} - m^-_{k+1}]$$
$$P^s_k = P_k + G_k\, [P^s_{k+1} - P^-_{k+1}]\, G_k^T,$$
where $m_k$ and $P_k$ are the mean and covariance computed by the Kalman filter. The recursion is started from the last time step T, with $m^s_T = m_T$ and $P^s_T = P_T$.
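The backward recursion above is short to implement. Below is a minimal NumPy sketch (not the lecture's own Matlab code) with constant A and Q, plus a toy Kalman filter for a scalar random walk whose parameters are assumed for illustration:

```python
import numpy as np

def rts_smoother(ms, Ps, A, Q):
    # ms: (T, n) Kalman filter means; Ps: (T, n, n) filter covariances.
    T = ms.shape[0]
    msm, Psm = ms.copy(), Ps.copy()           # start: m^s_T = m_T, P^s_T = P_T
    for k in range(T - 2, -1, -1):
        mp = A @ ms[k]                        # predicted mean  m^-_{k+1}
        Pp = A @ Ps[k] @ A.T + Q              # predicted cov   P^-_{k+1}
        G = Ps[k] @ A.T @ np.linalg.inv(Pp)   # smoother gain G_k
        msm[k] = ms[k] + G @ (msm[k + 1] - mp)
        Psm[k] = Ps[k] + G @ (Psm[k + 1] - Pp) @ G.T
    return msm, Psm

# Toy usage: scalar random walk x_k = x_{k-1} + q_k, y_k = x_k + r_k.
rng = np.random.default_rng(0)
A, Q, H, R = np.eye(1), 0.1 * np.eye(1), np.eye(1), 0.5 * np.eye(1)
T = 50
x = np.cumsum(rng.normal(0, np.sqrt(0.1), T))
y = x + rng.normal(0, np.sqrt(0.5), T)
m, P = np.zeros(1), np.eye(1)
ms, Ps = np.zeros((T, 1)), np.zeros((T, 1, 1))
for k in range(T):
    m, P = A @ m, A @ P @ A.T + Q                    # Kalman predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    m, P = m + K @ (y[k] - H @ m), P - K @ S @ K.T   # Kalman update
    ms[k], Ps[k] = m, P
msm, Psm = rts_smoother(ms, Ps, A, Q)
```

The smoothed variances are never larger than the filtered ones, and at k = T the two solutions coincide.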
RTS Smoother: Car Tracking Example
The dynamic model of the car tracking model from the first & third lectures was:
$$\begin{pmatrix} x_k \\ y_k \\ \dot{x}_k \\ \dot{y}_k \end{pmatrix}
= \underbrace{\begin{pmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}}_{A}
\begin{pmatrix} x_{k-1} \\ y_{k-1} \\ \dot{x}_{k-1} \\ \dot{y}_{k-1} \end{pmatrix} + q_{k-1},$$
where $q_k$ is zero mean with a covariance matrix
$$Q = \begin{pmatrix}
q_1^c\, \Delta t^3/3 & 0 & q_1^c\, \Delta t^2/2 & 0 \\
0 & q_2^c\, \Delta t^3/3 & 0 & q_2^c\, \Delta t^2/2 \\
q_1^c\, \Delta t^2/2 & 0 & q_1^c\, \Delta t & 0 \\
0 & q_2^c\, \Delta t^2/2 & 0 & q_2^c\, \Delta t
\end{pmatrix}.$$
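The model matrices above are straightforward to build in code; a small sketch (parameter values Δt, q1c, q2c are placeholders):

```python
import numpy as np

def car_model(dt, q1, q2):
    # Constant-velocity car model: state (x, y, xdot, ydot).
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    # Discretized white-noise-acceleration covariance with spectral
    # densities q1, q2 for the two coordinates.
    Q = np.array([[q1*dt**3/3, 0,          q1*dt**2/2, 0         ],
                  [0,          q2*dt**3/3, 0,          q2*dt**2/2],
                  [q1*dt**2/2, 0,          q1*dt,      0         ],
                  [0,          q2*dt**2/2, 0,          q2*dt     ]])
    return A, Q

A, Q = car_model(0.1, 1.0, 1.0)
```

The resulting A and Q can be fed directly to the RTS smoother recursion of the previous slide.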
Non-Linear Smoothing Problem
Non-linear Gaussian state space model:
$$x_k = f(x_{k-1}) + q_{k-1}$$
$$y_k = h(x_k) + r_k,$$
where $q_{k-1} \sim N(0, Q_{k-1})$ and $r_k \sim N(0, R_k)$. We want to compute Gaussian approximations to the smoothing distributions:
$$p(x_k \mid y_{1:T}) \approx N(x_k \mid m^s_k, P^s_k).$$
Extended Rauch-Tung-Striebel Smoother Derivation
The approximate joint distribution of $x_k$ and $x_{k+1}$ is
$$p(x_k, x_{k+1} \mid y_{1:k}) = N\left( \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} \,\middle|\, m_1, P_1 \right),$$
where
$$m_1 = \begin{pmatrix} m_k \\ f(m_k) \end{pmatrix}, \qquad
P_1 = \begin{pmatrix} P_k & P_k\, F_x^T(m_k) \\ F_x(m_k)\, P_k & F_x(m_k)\, P_k\, F_x^T(m_k) + Q_k \end{pmatrix}.$$
The rest of the derivation is analogous to the linear RTS smoother.
Extended Rauch-Tung-Striebel Smoother
The equations for the extended RTS smoother are
$$m^-_{k+1} = f(m_k)$$
$$P^-_{k+1} = F_x(m_k)\, P_k\, F_x^T(m_k) + Q_k$$
$$G_k = P_k\, F_x^T(m_k)\, [P^-_{k+1}]^{-1}$$
$$m^s_k = m_k + G_k\, [m^s_{k+1} - m^-_{k+1}]$$
$$P^s_k = P_k + G_k\, [P^s_{k+1} - P^-_{k+1}]\, G_k^T,$$
where the matrix $F_x(m_k)$ is the Jacobian matrix of $f(x)$ evaluated at $m_k$.
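One backward step of the extended RTS smoother can be sketched as follows (the numerical values and the pendulum-like f are assumed for the self-check; one built-in consistency property is that when the next-step smoothing distribution equals the prediction, the step returns the filtering distribution unchanged):

```python
import numpy as np

def erts_step(m, P, ms_next, Ps_next, f, Fx, Q):
    # One backward ERTSS step: linearize f at the filter mean m,
    # then apply the linear RTS update.
    J = Fx(m)
    mp = f(m)                          # predicted mean  m^-_{k+1}
    Pp = J @ P @ J.T + Q               # predicted cov   P^-_{k+1}
    G = P @ J.T @ np.linalg.inv(Pp)    # smoother gain G_k
    return m + G @ (ms_next - mp), P + G @ (Ps_next - Pp) @ G.T

# Assumed toy model and numbers for a sanity check.
m = np.array([0.5, -0.2])
P = np.array([[0.4, 0.1], [0.1, 0.3]])
Q = 0.05 * np.eye(2)
f = lambda x: np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])
Fx = lambda x: np.array([[1.0, 0.1], [-0.1 * np.cos(x[0]), 1.0]])
ms, Ps = erts_step(m, P, f(m), Fx(m) @ P @ Fx(m).T + Q, f, Fx, Q)
```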
Statistically Linearized Rauch-Tung-Striebel Smoother Derivation
With statistical linearization we get the approximation
$$p(x_k, x_{k+1} \mid y_{1:k}) = N\left( \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} \,\middle|\, m_1, P_1 \right),$$
where
$$m_1 = \begin{pmatrix} m_k \\ E[f(x_k)] \end{pmatrix}, \qquad
P_1 = \begin{pmatrix} P_k & E[f(x_k)\,\delta x_k^T]^T \\ E[f(x_k)\,\delta x_k^T] & E[f(x_k)\,\delta x_k^T]\, P_k^{-1}\, E[f(x_k)\,\delta x_k^T]^T + Q_k \end{pmatrix}.$$
The expectations are taken with respect to the filtering distribution of $x_k$. The derivation proceeds as with the linear RTS smoother.
Statistically Linearized Rauch-Tung-Striebel Smoother
The equations for the statistically linearized RTS smoother are
$$m^-_{k+1} = E[f(x_k)]$$
$$P^-_{k+1} = E[f(x_k)\,\delta x_k^T]\, P_k^{-1}\, E[f(x_k)\,\delta x_k^T]^T + Q_k$$
$$G_k = E[f(x_k)\,\delta x_k^T]^T\, [P^-_{k+1}]^{-1}$$
$$m^s_k = m_k + G_k\, [m^s_{k+1} - m^-_{k+1}]$$
$$P^s_k = P_k + G_k\, [P^s_{k+1} - P^-_{k+1}]\, G_k^T,$$
where $\delta x_k = x_k - m_k$ and the expectations are taken with respect to the filtering distribution $x_k \sim N(m_k, P_k)$.
Gaussian Rauch-Tung-Striebel Smoother Derivation
With Gaussian moment matching we get the approximation
$$p(x_k, x_{k+1} \mid y_{1:k}) = N\left( \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} \,\middle|\, \begin{bmatrix} m_k \\ m^-_{k+1} \end{bmatrix}, \begin{bmatrix} P_k & D_{k+1} \\ D_{k+1}^T & P^-_{k+1} \end{bmatrix} \right),$$
where
$$m^-_{k+1} = \int f(x_k)\, N(x_k \mid m_k, P_k)\, dx_k$$
$$P^-_{k+1} = \int [f(x_k) - m^-_{k+1}][f(x_k) - m^-_{k+1}]^T\, N(x_k \mid m_k, P_k)\, dx_k + Q_k$$
$$D_{k+1} = \int [x_k - m_k][f(x_k) - m^-_{k+1}]^T\, N(x_k \mid m_k, P_k)\, dx_k.$$
Gaussian Rauch-Tung-Striebel Smoother
The equations for the Gaussian RTS smoother are
$$m^-_{k+1} = \int f(x_k)\, N(x_k \mid m_k, P_k)\, dx_k$$
$$P^-_{k+1} = \int [f(x_k) - m^-_{k+1}][f(x_k) - m^-_{k+1}]^T\, N(x_k \mid m_k, P_k)\, dx_k + Q_k$$
$$D_{k+1} = \int [x_k - m_k][f(x_k) - m^-_{k+1}]^T\, N(x_k \mid m_k, P_k)\, dx_k$$
$$G_k = D_{k+1}\, [P^-_{k+1}]^{-1}$$
$$m^s_k = m_k + G_k\, (m^s_{k+1} - m^-_{k+1})$$
$$P^s_k = P_k + G_k\, (P^s_{k+1} - P^-_{k+1})\, G_k^T.$$
Cubature Smoother Derivation [1/2]
Recall the 3rd order spherical Gaussian integral rule:
$$\int g(x)\, N(x \mid m, P)\, dx \approx \frac{1}{2n} \sum_{i=1}^{2n} g(m + \sqrt{P}\, \xi^{(i)}),$$
where
$$\xi^{(i)} = \begin{cases} \sqrt{n}\, e_i, & i = 1, \dots, n \\ -\sqrt{n}\, e_{i-n}, & i = n+1, \dots, 2n, \end{cases}$$
and $e_i$ denotes a unit vector in the direction of coordinate axis i.
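The rule above is a few lines of code; a sketch using the Cholesky factor as one valid square root of P (the toy m and P are assumed). Since the rule is exact for polynomials up to 3rd order, it recovers the mean and covariance of the Gaussian exactly:

```python
import numpy as np

def spherical_cubature(g, m, P):
    # 3rd-order spherical cubature approximation of E[g(x)], x ~ N(m, P).
    n = m.shape[0]
    L = np.linalg.cholesky(P)                              # sqrt(P)
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])   # unit sigma points
    sigma = m[:, None] + L @ xi                            # columns: m + sqrt(P) xi^(i)
    vals = np.array([g(sigma[:, i]) for i in range(2 * n)])
    return vals.mean(axis=0)

# Assumed toy mean and covariance for the exactness check.
m = np.array([1.0, 2.0])
P = np.array([[2.0, 0.5], [0.5, 1.0]])
mean_est = spherical_cubature(lambda x: x, m, P)
cov_est = spherical_cubature(lambda x: np.outer(x - m, x - m), m, P)
```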
Cubature Smoother Derivation [2/2]
We get the approximation
$$p(x_k, x_{k+1} \mid y_{1:k}) = N\left( \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} \,\middle|\, \begin{bmatrix} m_k \\ m^-_{k+1} \end{bmatrix}, \begin{bmatrix} P_k & D_{k+1} \\ D_{k+1}^T & P^-_{k+1} \end{bmatrix} \right),$$
where
$$X_k^{(i)} = m_k + \sqrt{P_k}\, \xi^{(i)}$$
$$m^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} f(X_k^{(i)})$$
$$P^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} [f(X_k^{(i)}) - m^-_{k+1}][f(X_k^{(i)}) - m^-_{k+1}]^T + Q_k$$
$$D_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} [X_k^{(i)} - m_k][f(X_k^{(i)}) - m^-_{k+1}]^T.$$
Cubature Rauch-Tung-Striebel Smoother [1/3]
1. Form the sigma points:
$$X_k^{(i)} = m_k + \sqrt{P_k}\, \xi^{(i)}, \qquad i = 1, \dots, 2n,$$
where the unit sigma points are defined as
$$\xi^{(i)} = \begin{cases} \sqrt{n}\, e_i, & i = 1, \dots, n \\ -\sqrt{n}\, e_{i-n}, & i = n+1, \dots, 2n. \end{cases}$$
2. Propagate the sigma points through the dynamic model:
$$\hat{X}_{k+1}^{(i)} = f(X_k^{(i)}), \qquad i = 1, \dots, 2n.$$
Cubature Rauch-Tung-Striebel Smoother [2/3]
3. Compute the predicted mean $m^-_{k+1}$, the predicted covariance $P^-_{k+1}$ and the cross-covariance $D_{k+1}$:
$$m^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \hat{X}_{k+1}^{(i)}$$
$$P^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} (\hat{X}_{k+1}^{(i)} - m^-_{k+1})(\hat{X}_{k+1}^{(i)} - m^-_{k+1})^T + Q_k$$
$$D_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} (X_k^{(i)} - m_k)(\hat{X}_{k+1}^{(i)} - m^-_{k+1})^T.$$
Cubature Rauch-Tung-Striebel Smoother [3/3]
4. Compute the gain $G_k$, mean $m^s_k$ and covariance $P^s_k$ as follows:
$$G_k = D_{k+1}\, [P^-_{k+1}]^{-1}$$
$$m^s_k = m_k + G_k\, (m^s_{k+1} - m^-_{k+1})$$
$$P^s_k = P_k + G_k\, (P^s_{k+1} - P^-_{k+1})\, G_k^T.$$
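Steps 1-4 can be sketched as one backward update. A useful property for checking an implementation is that for a linear dynamic model the cubature smoother reproduces the linear RTS step exactly (all the numbers below are assumed for the check):

```python
import numpy as np

def crts_step(m, P, ms_next, Ps_next, f, Q):
    # Steps 1-4 of the cubature RTS smoother as a single backward update.
    n = m.shape[0]
    L = np.linalg.cholesky(P)
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    X = m[:, None] + L @ xi                                       # sigma points
    Xhat = np.stack([f(X[:, i]) for i in range(2 * n)], axis=1)   # propagated
    mp = Xhat.mean(axis=1)                                        # m^-_{k+1}
    dX, dXh = X - m[:, None], Xhat - mp[:, None]
    Pp = dXh @ dXh.T / (2 * n) + Q                                # P^-_{k+1}
    D = dX @ dXh.T / (2 * n)                                      # D_{k+1}
    G = D @ np.linalg.inv(Pp)
    return m + G @ (ms_next - mp), P + G @ (Ps_next - Pp) @ G.T

# Linear test model (assumed numbers): must match the closed-form RTS step.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
m, P = np.array([0.3, -0.1]), np.array([[0.5, 0.1], [0.1, 0.2]])
Q = 0.05 * np.eye(2)
ms_next, Ps_next = np.array([0.4, 0.0]), 0.3 * np.eye(2)
ms_c, Ps_c = crts_step(m, P, ms_next, Ps_next, lambda x: A @ x, Q)
# Closed-form linear RTS step for comparison:
Pp = A @ P @ A.T + Q
G = P @ A.T @ np.linalg.inv(Pp)
ms_l = m + G @ (ms_next - A @ m)
Ps_l = P + G @ (Ps_next - Pp) @ G.T
```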
Unscented Rauch-Tung-Striebel Smoother [1/3]
1. Form the sigma points:
$$X_k^{(0)} = m_k$$
$$X_k^{(i)} = m_k + \sqrt{n + \lambda}\, \left[\sqrt{P_k}\right]_i$$
$$X_k^{(i+n)} = m_k - \sqrt{n + \lambda}\, \left[\sqrt{P_k}\right]_i, \qquad i = 1, \dots, n.$$
2. Propagate the sigma points through the dynamic model:
$$\hat{X}_{k+1}^{(i)} = f(X_k^{(i)}), \qquad i = 0, \dots, 2n.$$
Unscented Rauch-Tung-Striebel Smoother [2/3]
3. Compute the predicted mean, covariance and cross-covariance:
$$m^-_{k+1} = \sum_{i=0}^{2n} W_i^{(m)}\, \hat{X}_{k+1}^{(i)}$$
$$P^-_{k+1} = \sum_{i=0}^{2n} W_i^{(c)}\, (\hat{X}_{k+1}^{(i)} - m^-_{k+1})(\hat{X}_{k+1}^{(i)} - m^-_{k+1})^T + Q_k$$
$$D_{k+1} = \sum_{i=0}^{2n} W_i^{(c)}\, (X_k^{(i)} - m_k)(\hat{X}_{k+1}^{(i)} - m^-_{k+1})^T.$$
Unscented Rauch-Tung-Striebel Smoother [3/3]
4. Compute the gain, smoothed mean and smoothed covariance as follows:
$$G_k = D_{k+1}\, [P^-_{k+1}]^{-1}$$
$$m^s_k = m_k + G_k\, (m^s_{k+1} - m^-_{k+1})$$
$$P^s_k = P_k + G_k\, (P^s_{k+1} - P^-_{k+1})\, G_k^T.$$
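The sigma points and weights of step 1 can be sketched as below, using the common parameterization $\lambda = \alpha^2(n + \kappa) - n$ (the parameter values are illustrative, not prescribed by the slides). A good self-check is that the weighted sigma-point statistics reproduce m and P exactly:

```python
import numpy as np

def unscented_points(m, P, alpha=0.5, beta=2.0, kappa=1.0):
    # Sigma points and weights of the unscented transform.
    n = m.shape[0]
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)   # columns: sqrt((n+lam) P)_i
    X = np.column_stack([m, m[:, None] + L, m[:, None] - L])   # (n, 2n+1)
    Wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return X, Wm, Wc

# Assumed toy mean and covariance.
m = np.array([1.0, -2.0])
P = np.array([[1.0, 0.3], [0.3, 0.8]])
X, Wm, Wc = unscented_points(m, P)
mean_est = X @ Wm
cov_est = ((X - mean_est[:, None]) * Wc) @ (X - mean_est[:, None]).T
```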
Other Gaussian RTS Smoothers
- The Gauss-Hermite RTS smoother is based on multidimensional Gauss-Hermite integration.
- The Bayes-Hermite or Gaussian process RTS smoother uses Gaussian process based quadrature (Bayes-Hermite).
- Monte Carlo integration based RTS smoothers.
- Central differences etc.
Particle Smoothing [1/2]
- The smoothing solution can be obtained from SIR by storing the whole state histories in the particles.
- Special care is needed in the resampling step.
- The smoothed distribution approximation is then of the form
$$p(x_k \mid y_{1:T}) \approx \sum_{i=1}^{N} w_T^{(i)}\, \delta(x_k - x_k^{(i)}),$$
where $x_k^{(i)}$ is the kth component of $x_{1:T}^{(i)}$.
- Unfortunately, the approximation is often quite degenerate.
- Specialized algorithms for particle smoothing exist.
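A minimal sketch of smoothing by stored histories, for an assumed scalar random-walk model (not a model from the slides). Each particle carries its whole trajectory; resampling copies entire histories, which is exactly what causes the early-history degeneracy noted above:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 30, 500
Q, R = 0.1, 0.5                              # assumed noise variances
x = np.cumsum(rng.normal(0, np.sqrt(Q), T))  # simulated truth
y = x + rng.normal(0, np.sqrt(R), T)         # simulated measurements

hist = np.zeros((N, T))                      # particle histories x^(i)_{1:k}
w = np.full(N, 1.0 / N)
part = rng.normal(0, 1, N)                   # prior draw for the first state
for k in range(T):
    if k > 0:
        # Bootstrap proposal: sample from the dynamic model.
        part = hist[:, k - 1] + rng.normal(0, np.sqrt(Q), N)
    w = w * np.exp(-0.5 * (y[k] - part) ** 2 / R)   # weight by likelihood
    w = w / w.sum()
    hist[:, k] = part
    if 1.0 / np.sum(w**2) < N / 2:           # adaptive resampling of
        idx = rng.choice(N, N, p=w)          # whole histories
        hist = hist[idx]
        w = np.full(N, 1.0 / N)

# Final weights w_T^(i) weight the stored trajectories at every k.
smoothed_mean = w @ hist
```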
Particle Smoothing [2/2]
Recall the Rao-Blackwellized particle filtering model:
$$s_k \sim p(s_k \mid s_{k-1})$$
$$x_k = A(s_{k-1})\, x_{k-1} + q_k, \qquad q_k \sim N(0, Q)$$
$$y_k = H(s_k)\, x_k + r_k, \qquad r_k \sim N(0, R)$$
The principle of Rao-Blackwellized particle smoothing is the following:
1. During filtering, store the whole sampled state and Kalman filter histories in the particles.
2. At the smoothing step, apply Rauch-Tung-Striebel smoothers to each of the Kalman filter histories in the particles.
The smoothing distribution approximation will then be of the form
$$p(x_k, s_k \mid y_{1:T}) \approx \sum_{i=1}^{N} w_T^{(i)}\, \delta(s_k - s_k^{(i)})\, N(x_k \mid m_k^{s,(i)}, P_k^{s,(i)}).$$
Summary
- Optimal smoothing is used for computing estimates of state trajectories given the measurements on the whole trajectory.
- The Rauch-Tung-Striebel (RTS) smoother is the closed form smoother for linear Gaussian models.
- The extended, statistically linearized and unscented RTS smoothers are the approximate nonlinear smoothers corresponding to the EKF, SLF and UKF.
- Gaussian RTS smoothers: the cubature RTS smoother, Gauss-Hermite RTS smoothers and various others.
- Particle smoothing can be done by storing the whole state histories in the SIR algorithm.
- The Rao-Blackwellized particle smoother is a combination of particle smoothing and RTS smoothing.
Matlab Demo: Pendulum [1/2]
Pendulum model:
$$\begin{pmatrix} x_k^1 \\ x_k^2 \end{pmatrix}
= \underbrace{\begin{pmatrix} x_{k-1}^1 + x_{k-1}^2\, \Delta t \\ x_{k-1}^2 - g \sin(x_{k-1}^1)\, \Delta t \end{pmatrix}}_{f(x_{k-1})}
+ \begin{pmatrix} 0 \\ q_{k-1} \end{pmatrix}$$
$$y_k = \underbrace{\sin(x_k^1)}_{h(x_k)} + r_k.$$
The required Jacobian matrix for the ERTSS:
$$F_x(x) = \begin{pmatrix} 1 & \Delta t \\ -g \cos(x^1)\, \Delta t & 1 \end{pmatrix}.$$
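A Python sketch of the pendulum dynamics and the ERTSS Jacobian (Δt and g are assumed values, the demo's actual settings may differ), with a finite-difference check of the Jacobian:

```python
import numpy as np

dt, g = 0.01, 9.81   # assumed step size and gravity

def f(x):
    # Pendulum dynamic model: x[0] = angle, x[1] = angular velocity.
    return np.array([x[0] + x[1] * dt,
                     x[1] - g * np.sin(x[0]) * dt])

def F_x(x):
    # Analytic Jacobian of f, as required by the ERTSS.
    return np.array([[1.0, dt],
                     [-g * np.cos(x[0]) * dt, 1.0]])

# Central finite-difference check of the Jacobian at an arbitrary point.
x0 = np.array([0.7, -0.3])
eps = 1e-6
J_num = np.column_stack([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                         for e in np.eye(2)])
print(np.max(np.abs(J_num - F_x(x0))))
```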
Matlab Demo: Pendulum [2/2]
The required expected value for the SLRTSS is
$$E[f(x)] = \begin{pmatrix} m_1 + m_2\, \Delta t \\ m_2 - g \sin(m_1) \exp(-P_{11}/2)\, \Delta t \end{pmatrix}$$
And the cross term:
$$E[f(x)(x - m)^T] = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix},$$
where
$$c_{11} = P_{11} + \Delta t\, P_{12}$$
$$c_{12} = P_{12} + \Delta t\, P_{22}$$
$$c_{21} = P_{12} - g\, \Delta t \cos(m_1)\, P_{11} \exp(-P_{11}/2)$$
$$c_{22} = P_{22} - g\, \Delta t \cos(m_1)\, P_{12} \exp(-P_{11}/2)$$
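The key closed-form expectation behind these formulas is $E[\sin(x^1)] = \sin(m_1)\exp(-P_{11}/2)$ for $x^1 \sim N(m_1, P_{11})$. It can be checked numerically with one-dimensional Gauss-Hermite quadrature (the values of $m_1$ and $P_{11}$ below are assumed):

```python
import numpy as np

m1, P11 = 0.4, 0.3   # assumed filter mean and variance of the angle

# Gauss-Hermite rule: int g(t) exp(-t^2) dt ~ sum w_i g(t_i), so with the
# substitution x = m1 + sqrt(2 P11) t we integrate against N(x | m1, P11).
t, w = np.polynomial.hermite.hermgauss(40)
approx = np.sum(w * np.sin(m1 + np.sqrt(2 * P11) * t)) / np.sqrt(np.pi)
closed = np.sin(m1) * np.exp(-P11 / 2)
print(abs(approx - closed))
```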