A Brief Overview of Particle Filtering
Adam M. Johansen
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/
May 11th, 2010
Warwick University Centre for Systems Biology
Outline
- Background
  - Monte Carlo Methods
  - Hidden Markov Models / State Space Models
- Sequential Monte Carlo Filtering
  - Sequential Importance Sampling
  - Sequential Importance Resampling
- Advanced Methods
  - Smoothing
  - Parameter Estimation
Background
Monte Carlo: Estimating π
Rain falls uniformly on a square; a circle of radius r is inscribed in it.
A_square = 4r^2 and A_circle = πr^2, so p = A_circle / A_square = π/4.
383 of 500 drops land in the circle:
  π̂ = 4 × 383/500 = 3.064.
Confidence intervals can also be obtained.
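The rain experiment above is easy to reproduce; a minimal sketch in Python, with a pseudo-random number generator standing in for the rain:

```python
import random

def estimate_pi(n_drops, seed=0):
    """Estimate pi as 4 * (fraction of uniform points on [-1,1]^2
    that land inside the inscribed unit circle)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_drops):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_drops

print(estimate_pi(100_000))  # close to pi for large n_drops
```

The Monte Carlo standard error here is O(1/√n), so 100,000 drops give roughly two correct decimal places.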
Monte Carlo: The Monte Carlo Method
Given a probability density p, consider
  I = ∫ φ(x) p(x) dx.
Simple Monte Carlo solution:
- Sample X_1, ..., X_N iid from p.
- Estimate Î = (1/N) Σ_{i=1}^N φ(X_i).
Justified by the law of large numbers... and the central limit theorem.
Can also be viewed as approximating π(dx) = p(x) dx with
  π̂^N(dx) = (1/N) Σ_{i=1}^N δ_{X_i}(dx).
Monte Carlo: Importance Sampling
Given q such that p(x) > 0 ⇒ q(x) > 0 and p(x)/q(x) < ∞, define w(x) = p(x)/q(x) and:
  I = ∫ φ(x) p(x) dx = ∫ φ(x) w(x) q(x) dx.
This suggests the importance sampling estimator:
- Sample X_1, ..., X_N iid from q.
- Estimate Î = (1/N) Σ_{i=1}^N w(X_i) φ(X_i).
Can also be viewed as approximating π(dx) = p(x) dx with
  π̂^N(dx) = (1/N) Σ_{i=1}^N w(X_i) δ_{X_i}(dx).
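A small illustration of the estimator (the concrete densities are my choice, not from the slides): take p = N(0,1), proposal q = N(0,2), and φ(x) = x², so the true value is E_p[X²] = 1.

```python
import math
import random

def norm_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def is_estimate(phi, n, seed=1):
    """Importance sampling estimate of E_p[phi(X)] for p = N(0,1),
    using proposal q = N(0, 2^2)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)                            # X_i ~ q
        w = norm_pdf(x, 0.0, 1.0) / norm_pdf(x, 0.0, 2.0)  # w(x) = p(x)/q(x)
        acc += w * phi(x)
    return acc / n

print(is_estimate(lambda x: x * x, 100_000))  # approximately 1
```

Note that the proposal has heavier tails than the target, which keeps the weights, and hence the estimator variance, bounded.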
Monte Carlo: Variance and Importance Sampling
The variance depends upon the proposal:
  Var(Î) = E[ ( (1/N) Σ_{i=1}^N w(X_i) φ(X_i) − I )^2 ]
         = (1/N^2) Σ_{i=1}^N ( E[ w(X_i)^2 φ(X_i)^2 ] − I^2 )
         = (1/N) ( E_q[ w(X)^2 φ(X)^2 ] − I^2 )
         = (1/N) ( E_p[ w(X) φ(X)^2 ] − I^2 )
For positive φ, this is minimized by:
  q(x) ∝ p(x) φ(x),   i.e.   w(x) φ(x) ∝ 1.
Monte Carlo: Self-Normalised Importance Sampling
Often, p is known only up to a normalising constant. As E_q(Cwφ) = C E_p(φ):
if v(x) = C w(x), then
  E_q(vφ) / E_q(v·1) = E_q(Cwφ) / E_q(Cw·1) = C E_p(φ) / (C E_p(1)) = E_p(φ).
Estimate the numerator and denominator with the same sample:
  Î = Σ_{i=1}^N v(X_i) φ(X_i) / Σ_{i=1}^N v(X_i).
Biased for finite samples, but consistent. Typically reduces variance.
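A sketch of the self-normalised estimator (the specific target is my choice): the algorithm only sees the unnormalised density p̃(x) = exp(−x²/2), i.e. p = N(0,1) with its constant withheld; the proposal is q = N(0,2), and the unknown constant cancels in the ratio.

```python
import math
import random

def snis_estimate(phi, n, seed=2):
    """Self-normalised IS estimate of E_p[phi] where only the
    unnormalised target ptilde(x) = exp(-x^2/2) is available
    (p = N(0,1)); proposal q = N(0, 2^2)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)                       # X_i ~ q
        ptilde = math.exp(-0.5 * x * x)               # unnormalised target
        q = math.exp(-0.5 * (x / 2.0) ** 2) / 2.0     # q up to the shared sqrt(2*pi),
                                                      # which also cancels in the ratio
        v = ptilde / q                                # v(x) = C * w(x)
        num += v * phi(x)
        den += v
    return num / den

print(snis_estimate(lambda x: x * x, 100_000))  # consistent estimate of E_p[X^2] = 1
```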
Hidden Markov Models / State Space Models
[Graphical model: a chain x_1 → x_2 → ... → x_6, with each x_i emitting an observation y_i.]
- Unobserved Markov chain {X_n} with transition density f.
- Observed process {Y_n} with conditional density g.
Joint density:
  p(x_{1:n}, y_{1:n}) = f_1(x_1) g(y_1|x_1) Π_{i=2}^n f(x_i|x_{i−1}) g(y_i|x_i).
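To make the notation concrete, a sketch of simulating from the joint density above for a scalar linear-Gaussian HMM (the particular f and g are my choice, not from the slides): f(x_i|x_{i−1}) = N(0.9 x_{i−1}, 1) and g(y_i|x_i) = N(x_i, 0.5²).

```python
import random

def simulate_hmm(n, seed=3):
    """Simulate (x_{1:n}, y_{1:n}) from a scalar linear-Gaussian HMM:
    x_1 ~ N(0,1), x_i | x_{i-1} ~ N(0.9 x_{i-1}, 1), y_i | x_i ~ N(x_i, 0.5^2)."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = rng.gauss(0.0, 1.0)           # x_1 ~ f_1
    for _ in range(n):
        xs.append(x)
        ys.append(rng.gauss(x, 0.5))  # y_i ~ g(.|x_i)
        x = rng.gauss(0.9 * x, 1.0)   # x_{i+1} ~ f(.|x_i)
    return xs, ys

xs, ys = simulate_hmm(100)
print(len(xs), len(ys))
```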
Hidden Markov Models: Inference in HMMs
Given y_{1:n}:
- What is x_{1:n}? In particular, what is x_n, and what about x_{n+1}?
- f and g may have unknown parameters θ.
- We wish to obtain estimates for n = 1, 2, ....
Hidden Markov Models: Some Terminology
Three main problems. Given known parameters θ, what are:
- The filtering and prediction distributions:
    p(x_n|y_{1:n}) = ∫ p(x_{1:n}|y_{1:n}) dx_{1:n−1}
    p(x_{n+1}|y_{1:n}) = ∫ p(x_{n+1}|x_n) p(x_n|y_{1:n}) dx_n
- The smoothing distributions:
    p(x_{1:n}|y_{1:n}) = p(x_{1:n}, y_{1:n}) / p(y_{1:n})
- And how can we estimate static parameters:
    p(θ|y_{1:n}) and p(x_{1:n}, θ|y_{1:n})?
Hidden Markov Models: Formal Solutions
Filtering and prediction recursions:
  p(x_n|y_{1:n}) = p(x_n|y_{1:n−1}) g(y_n|x_n) / ∫ p(x_n'|y_{1:n−1}) g(y_n|x_n') dx_n'
  p(x_{n+1}|y_{1:n}) = ∫ p(x_n|y_{1:n}) f(x_{n+1}|x_n) dx_n
Smoothing:
  p(x_{1:n}|y_{1:n}) = p(x_{1:n−1}|y_{1:n−1}) f(x_n|x_{n−1}) g(y_n|x_n) / ∫ g(y_n|x_n') f(x_n'|x_{n−1}') p(x_{n−1}'|y_{1:n−1}) dx_{n−1:n}'
Particle Filtering: Importance Sampling in This Setting
We target p(x_{1:n}|y_{1:n}) for n = 1, 2, ....
- We could sample from a sequence q_n(x_{1:n}) for each n.
- Or we could let q_n(x_{1:n}) = q_n(x_n|x_{1:n−1}) q_{n−1}(x_{1:n−1}) and re-use our samples.
The importance weight decomposes:
  w_n(x_{1:n}) ∝ p(x_{1:n}|y_{1:n}) / [q_n(x_n|x_{1:n−1}) q_{n−1}(x_{1:n−1})]
             = p(x_{1:n}|y_{1:n}) / [q_n(x_n|x_{1:n−1}) p(x_{1:n−1}|y_{1:n−1})] × w_{n−1}(x_{1:n−1})
             = f(x_n|x_{n−1}) g(y_n|x_n) / [q_n(x_n|x_{n−1}) p(y_n|y_{1:n−1})] × w_{n−1}(x_{1:n−1})
Particle Filtering: Sequential Importance Sampling — Prediction & Update
A first "particle filter":
- Simple default: q_n(x_n|x_{n−1}) = f(x_n|x_{n−1}).
- The importance weighting becomes: w_n(x_{1:n}) = w_{n−1}(x_{1:n−1}) g(y_n|x_n).
Algorithmically, at iteration n, given {W_{n−1}^i, X_{1:n−1}^i} for i = 1, ..., N:
- Sample X_n^i ~ f(·|X_{n−1}^i)   (prediction)
- Weight W_n^i ∝ W_{n−1}^i g(y_n|X_n^i)   (update)
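The prediction and update steps can be sketched in Python for an illustrative scalar model (my choice, not from the slides): f(x_n|x_{n−1}) = N(0.9 x_{n−1}, 1) and g(y_n|x_n) = N(x_n, 0.5²), with the bootstrap proposal q = f.

```python
import math
import random

def sis_step(particles, weights, y, rng):
    """One SIS prediction/update step with q = f (bootstrap proposal),
    for f(x_n|x_{n-1}) = N(0.9 x_{n-1}, 1), g(y_n|x_n) = N(x_n, 0.5^2)."""
    new_particles, new_weights = [], []
    for x_prev, w_prev in zip(particles, weights):
        x = rng.gauss(0.9 * x_prev, 1.0)           # prediction: X_n^i ~ f(.|X_{n-1}^i)
        g = math.exp(-0.5 * ((y - x) / 0.5) ** 2)  # g(y_n|X_n^i), up to a constant
        new_particles.append(x)
        new_weights.append(w_prev * g)             # update: W_n^i propto W_{n-1}^i g
    total = sum(new_weights)
    return new_particles, [w / total for w in new_weights]

rng = random.Random(4)
particles = [rng.gauss(0.0, 1.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for y in [0.3, 0.5, 0.9]:                          # a short synthetic record y_{1:3}
    particles, weights = sis_step(particles, weights, y, rng)
print(sum(w * x for w, x in zip(weights, particles)))  # filtering mean estimate
```

Without resampling, the weights of such a filter degenerate as n grows, which motivates the resampling step discussed below.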
Particle Filtering: Example — Almost Constant Velocity Model
States: x_n = [s_n^x, u_n^x, s_n^y, u_n^y]^T (positions and velocities in two dimensions).
Dynamics: x_n = A x_{n−1} + ε_n, with

  A = [ 1  Δt  0   0 ]
      [ 0   1  0   0 ]
      [ 0   0  1  Δt ]
      [ 0   0  0   1 ]

Observation: y_n = B x_n + ν_n, with

  B = [ 1  0  0  0 ]
      [ 0  0  1  0 ]

so that [r_n^x, r_n^y]^T = [s_n^x, s_n^y]^T + ν_n.
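A minimal simulator for this model, assuming Δt = 1 and iid Gaussian noise on each coordinate (the noise scales are illustrative assumptions; the slides do not specify them):

```python
import random

DT = 1.0  # time step (assumed)

def acv_simulate(n, seed=5):
    """Simulate the almost-constant-velocity model:
    state x = [sx, ux, sy, uy], observation r = [sx, sy] + noise.
    Noise standard deviations (0.1 dynamics, 0.5 observation) are illustrative."""
    rng = random.Random(seed)
    sx = ux = sy = uy = 0.0
    states, obs = [], []
    for _ in range(n):
        # x_n = A x_{n-1} + eps_n, A the constant-velocity matrix above
        sx, ux = sx + DT * ux + 0.1 * rng.gauss(0, 1), ux + 0.1 * rng.gauss(0, 1)
        sy, uy = sy + DT * uy + 0.1 * rng.gauss(0, 1), uy + 0.1 * rng.gauss(0, 1)
        states.append((sx, ux, sy, uy))
        # y_n = B x_n + nu_n: only the positions are observed
        obs.append((sx + 0.5 * rng.gauss(0, 1), sy + 0.5 * rng.gauss(0, 1)))
    return states, obs

states, obs = acv_simulate(50)
print(states[-1], obs[-1])
```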
Particle Filtering: Better Proposal Distributions
The importance weighting is:
  w_n(x_{1:n}) ∝ [f(x_n|x_{n−1}) g(y_n|x_n) / q_n(x_n|x_{n−1})] w_{n−1}(x_{1:n−1})
Optimally, we'd choose:
  q_n(x_n|x_{n−1}) ∝ f(x_n|x_{n−1}) g(y_n|x_n)
This is typically impossible, but good approximations are often available and can give much better performance.
Even so, the variance of the weights eventually becomes too large.
Particle Filtering: Resampling
Given {W_n^i, X_{1:n}^i} targeting p(x_{1:n}|y_{1:n}):
- Sample X̃_{1:n}^i ~ Σ_{j=1}^N W_n^j δ_{X_{1:n}^j}.
- Then {1/N, X̃_{1:n}^i} also targets p(x_{1:n}|y_{1:n}).
Such steps can be incorporated into the SIS algorithm.
Particle Filtering: Sequential Importance Resampling
Algorithmically, at iteration n, given {W_{n−1}^i, X_{n−1}^i} for i = 1, ..., N:
- Resample, obtaining {1/N, X̃_{n−1}^i}.
- Sample X_n^i ~ q(·|X̃_{n−1}^i).
- Weight W_n^i ∝ f(X_n^i|X̃_{n−1}^i) g(y_n|X_n^i) / q(X_n^i|X̃_{n−1}^i).
Actually:
- You can resample more efficiently.
- It's only necessary to resample some of the time.
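As a sketch of the full loop: a bootstrap filter with multinomial resampling triggered only when the effective sample size drops below N/2. The scalar model (f(x_n|x_{n−1}) = N(0.9 x_{n−1}, 1), g(y_n|x_n) = N(x_n, 0.5²)), the N/2 threshold, and the synthetic observations are all illustrative assumptions, not from the slides.

```python
import math
import random

def resample(particles, weights, rng):
    """Multinomial resampling: draw N indices with probabilities W^i."""
    return rng.choices(particles, weights=weights, k=len(particles))

def bootstrap_filter(ys, n_particles=500, seed=6):
    """SIR with q = f; resamples only when ESS < N/2.
    Returns the sequence of filtering-mean estimates."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    means = []
    for y in ys:
        ess = 1.0 / sum(w * w for w in weights)      # effective sample size
        if ess < n_particles / 2:
            particles = resample(particles, weights, rng)
            weights = [1.0 / n_particles] * n_particles
        particles = [rng.gauss(0.9 * x, 1.0) for x in particles]  # propose from f
        unnorm = [w * math.exp(-0.5 * ((y - x) / 0.5) ** 2)
                  for w, x in zip(weights, particles)]            # weight by g
        total = sum(unnorm)
        weights = [u / total for u in unnorm]
        means.append(sum(w * x for w, x in zip(weights, particles)))
    return means

print(bootstrap_filter([0.1, -0.2, 0.4, 0.0, 0.3]))
```

Systematic or stratified resampling would typically replace `rng.choices` in practice; multinomial resampling is used here only because it is the simplest to state.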
Particle Filtering: What Else Can We Do?
Some common improvements:
- Incorporate MCMC moves [7].
- Use the next observation before resampling: auxiliary particle filters [11, 9].
- Sample new values for recent states [6].
- Explicitly approximate the marginal filter [10].
Particle Smoothing: Smoothing
- Resampling helps filtering, but it eliminates unique particle values.
- Consequently, approximating p(x_{1:n}|y_{1:n}) or p(x_m|y_{1:n}) fails for large n.
Particle Smoothing: Smoothing Strategies
Some approaches:
- Fixed-lag smoothing: for large enough L,
    p(x_{n−L}|y_{1:n}) ≈ p(x_{n−L}|y_{1:n−1}).
- Particle implementation of the forward-backward algorithm:
    p(x_{1:n}|y_{1:n}) = p(x_n|y_{1:n}) Π_{m=1}^{n−1} p(x_m|x_{m+1}, y_{1:m}).
  Doucet et al. have developed a forward-only variant.
- Particle implementation of the two-filter formula:
    p(x_m|y_{1:n}) ∝ p(x_m|y_{1:m}) p(y_{m+1:n}|x_m).
Parameter Estimation: Static Parameters
If f(x_n|x_{n−1}) and g(y_n|x_n) depend upon θ:
- We could set x̃_n = (x_n, θ_n) and
    f̃(x̃_n|x̃_{n−1}) = f(x_n|x_{n−1}) δ_{θ_{n−1}}(θ_n).
Degeneracy causes difficulties:
- Resampling eliminates θ values.
- We never propose new θ values.
- MCMC transitions don't mix well.
Parameter Estimation: Approaches to Parameter Estimation
- Artificial dynamics: θ_n = θ_{n−1} + ε_n.
- Iterated filtering.
- Sufficient statistics.
- Various online likelihood approaches.
- Practical filtering.
Summary
SMCTC: A C++ Template Class for SMC Algorithms
- Implementing SMC algorithms in C/C++ isn't hard.
- SMCTC is software for implementing general SMC algorithms.
- The C++ element is largely confined to the library.
- Available (under a GPL-3 license) from
  www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/smctc/
  or type "smctc" into Google.
- Example code includes a simple particle filter.
In Conclusion (with References)
- Sequential Monte Carlo methods ("particle filters") can approximate filtering and smoothing distributions [5].
- Proposal distributions matter.
- Resampling is necessary [2], but not at every iteration, and need not be multinomial [4].
- Smoothing & parameter estimation are harder.
- Software for easy implementation exists: PFLib [1], SMCTC [8].
- Actually, similar techniques apply elsewhere [3].
References
[1] L. Chen, C. Lee, A. Budhiraja, and R. K. Mehra. PFlib: An object oriented MATLAB toolbox for particle filtering. In Proceedings of SPIE Signal Processing, Sensor Fusion and Target Recognition XVI, volume 6567, 2007.
[2] P. Del Moral. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer Verlag, New York, 2004.
[3] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society B, 68(3):411–436, 2006.
[4] R. Douc and O. Cappé. Comparison of resampling schemes for particle filters. In Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, volume I, pages 64–69. IEEE, 2005.
[5] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan and B. Rozovsky, editors, The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2010. To appear.
[6] A. Doucet, M. Briers, and S. Sénécal. Efficient block sampling strategies for sequential Monte Carlo methods. Journal of Computational and Graphical Statistics, 15(3):693–711, 2006.
[7] W. R. Gilks and C. Berzuini. Following a moving target: Monte Carlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society B, 63:127–146, 2001.
[8] A. M. Johansen. SMCTC: Sequential Monte Carlo in C++. Journal of Statistical Software, 30(6):1–41, April 2009.
[9] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 78(12):1498–1504, September 2008. URL http://dx.doi.org/10.1016/j.spl.2008.01.032.
[10] M. Klaas, N. de Freitas, and A. Doucet. Toward practical N² Monte Carlo: The marginal particle filter. In Proceedings of Uncertainty in Artificial Intelligence, 2005.
[11] M. K. Pitt and N. Shephard. Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599, 1999.
SMC Sampler Outline
Given a sample {X_{1:n−1}^(i)}_{i=1}^N targeting η_{n−1}:
- Sample X_n^(i) ~ K_n(X_{n−1}^(i), ·) and calculate
    W_n(X_{1:n}^(i)) = η_n(X_n^(i)) L_{n−1}(X_n^(i), X_{n−1}^(i)) / [η_{n−1}(X_{n−1}^(i)) K_n(X_{n−1}^(i), X_n^(i))].
- Resample, yielding {X_{1:n}^(i)}_{i=1}^N targeting η_n.
This hints that we'd like to use
    L_{n−1}(x_n, x_{n−1}) = η_{n−1}(x_{n−1}) K_n(x_{n−1}, x_n) / ∫ η_{n−1}(x_{n−1}') K_n(x_{n−1}', x_n) dx_{n−1}'.