Auxiliary Particle Methods: Perspectives & Applications
Adam M. Johansen (adam.johansen@bristol.ac.uk)
Oxford University Man Institute, 29th May 2008
Collaborators include: Arnaud Doucet, Nick Whiteley
2 Introduction
3 Outline
Background: Particle Filters
  Hidden Markov Models & Filtering
  Particle Filters & Sequential Importance Resampling
  Auxiliary Particle Filters
Interpretation
  Auxiliary Particle Filters are SIR Algorithms
  Theoretical Considerations
  Practical Implications
Applications
  Auxiliary SMC Samplers
  The Probability Hypothesis Density
  On Stratification
Conclusions
7 HMMs & Particle Filters: Hidden Markov Models
[Graphical model: hidden chain x_1 → x_2 → … → x_6, with each x_n emitting an observation y_n.]
X_n is an X-valued Markov chain with transition density f:
  X_n | {X_{n-1} = x_{n-1}} ~ f(· | x_{n-1})
Y_n is a Y-valued stochastic process, conditionally independent given the state:
  Y_n | {X_n = x_n} ~ g(· | x_n)
8 HMMs & Particle Filters: Optimal Filtering
The filtering distribution may be expressed recursively as:
  Prediction: p(x_n | y_{1:n-1}) = ∫ p(x_{n-1} | y_{1:n-1}) f(x_n | x_{n-1}) dx_{n-1}
  Update: p(x_n | y_{1:n}) = p(x_n | y_{1:n-1}) g(y_n | x_n) / ∫ p(x_n | y_{1:n-1}) g(y_n | x_n) dx_n.
The smoothing distribution may be expressed recursively as:
  Prediction: p(x_{1:n} | y_{1:n-1}) = p(x_{1:n-1} | y_{1:n-1}) f(x_n | x_{n-1})
  Update: p(x_{1:n} | y_{1:n}) = p(x_{1:n} | y_{1:n-1}) g(y_n | x_n) / ∫ p(x_{1:n} | y_{1:n-1}) g(y_n | x_n) dx_{1:n}.
10 HMMs & Particle Filters: Particle Filters via Sequential Importance Resampling
At time n = 1:
  Sample X^i_{1,1} ~ q_1(·).
  Weight W^i_1 ∝ f(X^i_{1,1}) g(y_1 | X^i_{1,1}) / q_1(X^i_{1,1}).
  Resample.
At times n > 1, iterate:
  Sample X^i_{n,n} ~ q_n(· | X^i_{n-1,n-1}); set X^i_{n,1:n-1} = X^i_{n-1}.
  Weight W^i_n ∝ f(X^i_{n,n} | X^i_{n,n-1}) g(y_n | X^i_{n,n}) / q_n(X^i_{n,n} | X^i_{n,1:n-1}).
  Resample.
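The iteration above can be sketched in a few lines. This is a minimal bootstrap-SIR sketch for an illustrative scalar model that is an assumption, not the slides' example: a Gaussian random walk, f(x_n | x_{n-1}) = N(x_{n-1}, 1), observed in unit-variance Gaussian noise, g(y_n | x_n) = N(x_n, 1). With q_n = f, the incremental weight reduces to g.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_g(y, x):
    # log g(y | x) up to an additive constant: N(x, 1) observation density.
    return -0.5 * (y - x) ** 2

def sir_filter(y, n_particles=1000):
    """Bootstrap SIR: propose from f, weight by g, resample at every step."""
    x = rng.normal(0.0, 1.0, n_particles)            # X_1 ~ q_1 = prior
    means = []
    for y_n in y:
        logw = log_g(y_n, x)                         # weights (q_n = f cancels)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                  # estimate of E[X_n | y_{1:n}]
        idx = rng.choice(n_particles, n_particles, p=w)   # multinomial resampling
        x = x[idx] + rng.normal(0.0, 1.0, n_particles)    # propagate via f
    return np.array(means)
```

For a constant observation sequence the filtering mean drifts toward the observed level, as the Kalman recursion for this linear-Gaussian case predicts.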
11 Auxiliary Particle Filters (Pitt & Shephard, 1999)
If we have access to the next observation before resampling, we could use this structure:
  Pre-weight every particle with λ^(i)_n ∝ p̂(y_n | X^(i)_{n-1}).
  Propose new states from the mixture distribution Σ_{i=1}^N λ^(i)_n q(· | X^(i)_{n-1}) / Σ_{i=1}^N λ^(i)_n.
  Weight samples, correcting for the pre-weighting:
    W^i_n ∝ f(X^i_{n,n} | X^i_{n,n-1}) g(y_n | X^i_{n,n}) / [λ^i_n q_n(X^i_{n,n} | X^i_{n,1:n-1})].
  Resample the particle set.
12 Auxiliary Particle Filters: Some Well Known Refinements
We can tidy things up a bit:
1. The auxiliary variable step is equivalent to multinomial resampling.
2. So, there's no need to resample before the pre-weighting.
Now we have:
  Pre-weight every particle with λ^(i)_n ∝ p̂(y_n | X^(i)_{n-1}).
  Resample.
  Propose new states.
  Weight samples, correcting for the pre-weighting:
    W^i_n ∝ f(X^i_{n,n} | X^i_{n,n-1}) g(y_n | X^i_{n,n}) / [λ^i_n q_n(X^i_{n,n} | X^i_{n,1:n-1})].
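One APF iteration in this tidied form can be sketched as follows, again for an assumed scalar Gaussian random-walk model (f and g both unit-variance normals; not the slides' example). Since the transition mean equals x_{n-1}, λ_n = g(y_n | x_{n-1}) is a natural pre-weight, and with q_n = f the correction reduces to g(y_n | x_n)/λ_n(x_{n-1}).

```python
import numpy as np

rng = np.random.default_rng(1)

def apf_step(x, w, y_n, n_particles):
    """One APF iteration: pre-weight, resample, propose from f, correct."""
    # Pre-weight: lambda^(i) approximates p-hat(y_n | x_{n-1}); here we use
    # g(y_n | x_{n-1}) because E[X_n | x_{n-1}] = x_{n-1} in this model.
    log_lam = -0.5 * (y_n - x) ** 2
    lam = w * np.exp(log_lam - log_lam.max())
    lam /= lam.sum()
    idx = rng.choice(n_particles, n_particles, p=lam)    # resample by pre-weights
    x_new = x[idx] + rng.normal(0.0, 1.0, n_particles)   # propose from f
    # Correct for the pre-weighting: W ∝ g(y_n | x_n) / lambda(x_{n-1}).
    logw = -0.5 * (y_n - x_new) ** 2 + 0.5 * (y_n - x[idx]) ** 2
    w_new = np.exp(logw - logw.max())
    return x_new, w_new / w_new.sum()
```

Repeated application with a fixed observation drives the weighted particle mean toward the observed level, as in the plain SIR case.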
13 Interpretation
14 APFs without Auxiliary Variables: General SIR Algorithms
The SIR algorithm can be used somewhat more generally. Given {π_n} defined on E_n = X^n:
  Sample X^i_{n,n} ~ q_n(· | X^i_{n-1}); set X^i_{n,1:n-1} = X^i_{n-1}.
  Weight W^i_n ∝ π_n(X^i_n) / [π_{n-1}(X^i_{n,1:n-1}) q_n(X^i_{n,n} | X^i_{n,1:n-1})].
  Resample.
15 APFs without Auxiliary Variables: An Interpretation of the APF
If we move the first step at time n+1 to the last at time n, we get:
  Resample.
  Propose new states.
  Weight samples, correcting the earlier pre-weighting.
  Pre-weight every particle with λ^(i)_{n+1} ∝ p̂(y_{n+1} | X^(i)_n).
This is an SIR algorithm targeting the sequence of distributions
  η_n(x_{1:n}) ∝ p(x_{1:n} | y_{1:n}) p̂(y_{n+1} | x_n),
which allows estimation under the distribution of actual interest via importance sampling.
16 Theoretical Considerations Theoretical Considerations Direct analysis of the APF is largely unnecessary. Results can be obtained by considering the associated SIR algorithm. SIR has a (discrete time) Feynman-Kac interpretation.
17 Theoretical Considerations: For example...
Proposition. Under standard regularity conditions,
  √N ( ϕ̂^N_{n,APF} − ϕ̄_n ) ⇒ N(0, σ²_n(ϕ_n)),
where
  σ²_1(ϕ_1) = ∫ [p(x_1 | y_1)² / q_1(x_1)] (ϕ_1(x_1) − ϕ̄_1)² dx_1
and, for n > 1,
  σ²_n(ϕ_n) = ∫ [p(x_1 | y_{1:n})² / q_1(x_1)] ( ∫ ϕ_n(x_{1:n}) p(x_{2:n} | y_{2:n}, x_1) dx_{2:n} − ϕ̄_n )² dx_1
  + Σ_{k=2}^{n−1} ∫ [p(x_{1:k} | y_{1:n})² / ( p̂(x_{1:k−1} | y_{1:k}) q_k(x_k | x_{k−1}) )] ( ∫ ϕ_n(x_{1:n}) p(x_{k+1:n} | y_{k+1:n}, x_k) dx_{k+1:n} − ϕ̄_n )² dx_{1:k}
  + ∫ [p(x_{1:n} | y_{1:n})² / ( p̂(x_{1:n−1} | y_{1:n}) q_n(x_n | x_{n−1}) )] (ϕ_n(x_{1:n}) − ϕ̄_n)² dx_{1:n}.
18 Practical Implications
It means we're doing importance sampling.
Choosing p̂(y_n | x_{n-1}) = p(y_n | x_n = E[X_n | x_{n-1}]) is dangerous.
A safer choice would be to ensure that
  sup_{x_{n-1}, x_n} g(y_n | x_n) f(x_n | x_{n-1}) / [ p̂(y_n | x_{n-1}) q(x_n | x_{n-1}) ] < ∞.
Using the APF doesn't ensure superior performance.
19 Practical Implications: A Contrived Illustration
Consider the following binary state-space model with common state and observation spaces:
  X = {0, 1}, p(x_1 = 0) = 0.5, p(x_n = x_{n-1}) = 1 − δ
  Y = X, p(y_n = x_n) = 1 − ε.
δ controls the ergodicity of the state process; ε controls the information contained in the observations.
Consider estimating E(X_2 | Y_{1:2} = (0, 1)).
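Because the state space is binary and the horizon is two, the quantity of interest can be computed exactly by enumerating the four paths (x_1, x_2), which gives a ground truth against which particle variances can be compared. A sketch:

```python
import itertools

def posterior_mean_x2(delta, eps):
    """Exact E[X_2 | Y_1 = 0, Y_2 = 1] for the binary HMM, by enumeration."""
    def trans(x_prev, x):   # p(x_n = x | x_{n-1} = x_prev)
        return 1 - delta if x == x_prev else delta
    def obs(y, x):          # p(y_n = y | x_n = x)
        return 1 - eps if y == x else eps
    num = den = 0.0
    for x1, x2 in itertools.product([0, 1], repeat=2):
        p = 0.5 * obs(0, x1) * trans(x1, x2) * obs(1, x2)  # joint path probability
        num += x2 * p
        den += p
    return num / den
```

Two sanity checks follow directly from the model: at ε = 0.5 the observations carry no information, so the posterior mean is 0.5; as ε → 0 the observations become exact and the answer approaches y_2 = 1.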
20 Practical Implications
[Figure: surface plot of Var(SIR) − Var(APF) over ε ∈ [0.05, 0.5] and δ ∈ [0.1, 1]; the difference ranges from roughly −0.2 to 0.05.]
21 Practical Implications
[Figure: variance comparison at ε = 0.25, plotting empirical and asymptotic variances of SISR and the APF against δ ∈ [0.1, 0.9].]
22 Applications & Extensions
23 Auxiliary SMC Samplers: SMC Samplers (Del Moral, Doucet & Jasra, 2006)
SIR techniques can be adapted for any sequence of distributions {π_n}. Define
  π̃_n(x_{1:n}) = π_n(x_n) Π_{k=1}^{n−1} L_k(x_{k+1}, x_k).
At time n:
  Sample X^i_n ~ K_n(X^i_{n−1}, ·).
  Weight W^i_n ∝ π_n(X^i_n) L_{n−1}(X^i_n, X^i_{n−1}) / [π_{n−1}(X^i_{n−1}) K_n(X^i_{n−1}, X^i_n)].
  Resample.
24 Auxiliary SMC Samplers
An auxiliary SMC sampler for a sequence of distributions {π_n} comprises:
  An SMC sampler targeting some auxiliary sequence of distributions {µ_n}.
  A sequence of importance weight functions w_n(x_n) ∝ dπ_n/dµ_n (x_n).
Generic approaches:
  Choose µ_n(x_n) ∝ π_n(x_n) V_n(x_n).
  Incorporate as much information as possible prior to resampling.
26 Auxiliary SMC Samplers: Resample-Move Approaches
A common approach in SMC samplers:
  Choose K_n(x_{n−1}, x_n) to be π_n-invariant.
  Set L_{n−1}(x_n, x_{n−1}) = π_n(x_{n−1}) K_n(x_{n−1}, x_n) / π_n(x_n),
  leading to W_n(x_{n−1}, x_n) = π_n(x_{n−1}) / π_{n−1}(x_{n−1}).
It would make more sense to resample and then move...
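The weight W_n = π_n(x_{n−1}) / π_{n−1}(x_{n−1}) is simple to implement. A minimal sketch, under assumptions not taken from the slides: a tempering sequence π_n(x) ∝ exp(−β_n x²/2), i.e. N(0, 1/β_n) with β increasing from 0.1 to 1, and a π_n-invariant random-walk Metropolis kernel as K_n, with resampling before each move.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed tempering schedule: pi_n = N(0, 1/beta_n), beta from 0.1 up to 1.
betas = np.linspace(0.1, 1.0, 10)

def resample_move(n_particles=2000):
    """SMC sampler with weight pi_n(x)/pi_{n-1}(x), then resample, then move."""
    x = rng.normal(0.0, np.sqrt(1.0 / betas[0]), n_particles)  # sample pi_1 exactly
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw = -0.5 * (b - b_prev) * x ** 2          # log pi_n(x) - log pi_{n-1}(x)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = x[rng.choice(n_particles, n_particles, p=w)]       # resample
        # Move with a pi_n-invariant random-walk Metropolis kernel.
        prop = x + rng.normal(0.0, 0.5, n_particles)
        accept = np.log(rng.uniform(size=n_particles)) < -0.5 * b * (prop ** 2 - x ** 2)
        x = np.where(accept, prop, x)
    return x
```

At the final step β = 1, so the particles approximate N(0, 1); the sample mean and variance should be close to 0 and 1 respectively.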
27 Auxiliary SMC Samplers: An Interpretation of Pure Resample-Move
It's SIR with auxiliary distributions:
  µ_n(x_n) = π_{n+1}(x_n)
  L_{n−1}(x_n, x_{n−1}) = µ_{n−1}(x_{n−1}) K_n(x_{n−1}, x_n) / µ_{n−1}(x_n) = π_n(x_{n−1}) K_n(x_{n−1}, x_n) / π_n(x_n)
  w̃_n(x_{n−1:n}) = µ_n(x_n) / µ_{n−1}(x_n) = π_{n+1}(x_n) / π_n(x_n)
  w_n(x_n) = µ_{n−1}(x_n) / µ_n(x_n) = π_n(x_n) / π_{n+1}(x_n).
28 Auxiliary SMC Samplers: Piecewise-Deterministic Processes
[Figure: a piecewise-deterministic trajectory in the (x, y) plane, x ∈ [35, 55], y ∈ [−10, 25].]
29 Auxiliary SMC Samplers: Filtering of Piecewise-Deterministic Processes (Whiteley, Johansen & Godsill, 2007)
X_n = (k_n, τ_{n,1:k_n}, θ_{n,0:k_n}) specifies a continuous-time trajectory (ζ_t)_{t ∈ [0, t_n]}.
ζ_t is observed in the presence of noise, e.g. Y_n = ζ_{t_n} + V_n.
Target a sequence of distributions {π_n} on nested spaces:
  π_n(k, τ_{1:k}, θ_{0:k}) ∝ q(θ_0) [ Π_{j=1}^k f(τ_j | τ_{j−1}) q(θ_j | τ_j, θ_{j−1}, τ_{j−1}) ] S(t_n, τ_k) Π_{p=1}^n g(y_p | ζ_{t_p}).
30 Auxiliary SMC Samplers: Filtering of Piecewise-Deterministic Processes (Whiteley, Johansen & Godsill, 2007)
π_n yields posterior marginal distributions for the recent history of ζ_t.
Employ auxiliary SMC samplers:
  Choose µ_n(k, τ_{1:k}, θ_{0:k}) ∝ π_n(k, τ_{1:k}, θ_{0:k}) V_n(τ_k, θ_k).
  The kernel proposes new pairs (τ_j, θ_j) and adjusts recent pairs.
Applications in object tracking and finance.
31 The Auxiliary SMC-PHD Filter: Probability Hypothesis Density Filtering (Mahler, 2003; Vo et al., 2005)
Approximates the optimal filter for a class of spatial point-process-valued HMMs.
A recursion for intensity functions:
  ᾱ_n(x_n) = ∫ f(x_n | x_{n−1}) p_S(x_{n−1}) α_{n−1}(x_{n−1}) dx_{n−1} + γ(x_n)
  α_n(x_n) = [ 1 − p_D(x_n) + Σ_{r=1}^{m_n} ψ_{n,r}(x_n) / Z_{n,r} ] ᾱ_n(x_n).
Approximate the intensity functions {α_n}_{n≥0} using SMC.
32 The Auxiliary SMC-PHD Filter: An ASMC Implementation (Whiteley et al., 2007)
  α_{n−1}(dx_{n−1}) ≈ α^N_{n−1}(dx_{n−1}) = (1/N) Σ_{i=1}^N W^i_{n−1} δ_{X^i_{n−1}}(dx_{n−1})
In a simple case, the target integral at the nth iteration is:
  ∫∫ ϕ(x_n) [ψ_{n,r}(x_n) / Z_{n,r}] f(x_n | x_{n−1}) p_S(x_{n−1}) dx_n α^N_{n−1}(dx_{n−1}).
The proposal distribution q_n(x_n, x_{n−1}, r) is built from α^N_{n−1} and potential functions {V_{n,r}}, and factorises:
  q_n(dx_{n−1} | r) ∝ Σ_{i=1}^N V_{n,r}(X^(i)_{n−1}) W^(i)_{n−1} δ_{X^(i)_{n−1}}(dx_{n−1}) / Σ_{i=1}^N V_{n,r}(X^(i)_{n−1}) W^(i)_{n−1}, together with q_n(r).
The normalising constants {Z_{n,r}} are also estimated by IS.
33 On Stratification: Stratification...
Back to the HMM setting, let (A_p)_{p=1}^M denote a partition of X.
Introduce a stratum indicator variable R_n = Σ_{p=1}^M p I_{A_p}(X_n).
Define the extended model:
  p(x_{1:n}, r_{1:n} | y_{1:n}) ∝ g(y_n | x_n) f(x_n | r_n, x_{n−1}) f(r_n | x_{n−1}) p(x_{1:n−1}, r_{1:n−1} | y_{1:n−1}).
An auxiliary SMC filter resamples from a distribution with N × M support points, over particles × strata.
Applications in switching state-space models.
34 Conclusions
35 Conclusions
The APF is a standard SIR algorithm for nonstandard distributions. This interpretation:
  allows standard results to be applied directly,
  provides guidance on implementation, and
  allows the same technique to be applied in more general settings.
Thanks for listening... any questions?
References
[1] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society B, 68(3):411–436, 2006.
[2] A. M. Johansen and A. Doucet. Auxiliary variable sequential Monte Carlo methods. Technical Report 07:09, University of Bristol, Department of Mathematics, Statistics Group, University Walk, Bristol, BS8 1TW, UK, July 2007.
[3] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 2008. To appear.
[4] A. M. Johansen and N. Whiteley. A modern perspective on auxiliary particle filters. In Proceedings of the Workshop on Inference and Estimation in Probabilistic Time Series Models, Isaac Newton Institute, June 2008. To appear.
[5] R. P. S. Mahler. Multitarget Bayes filtering via first-order multitarget moments. IEEE Transactions on Aerospace and Electronic Systems, 39(4):1152, October 2003.
[6] M. K. Pitt and N. Shephard. Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599, 1999.
[7] B. Vo, S. S. Singh, and A. Doucet. Sequential Monte Carlo methods for multi-target filtering with random finite sets. IEEE Transactions on Aerospace and Electronic Systems, 41(4):1223–1245, October 2005.
[8] N. Whiteley, A. M. Johansen, and S. Godsill. Monte Carlo filtering of piecewise-deterministic processes. Technical Report CUED/F-INFENG/TR-592, University of Cambridge, Department of Engineering, Trumpington Street, Cambridge, CB2 1PZ, UK, December 2007.
[9] N. Whiteley, S. Singh, and S. Godsill. Auxiliary particle implementation of the probability hypothesis density filter. Technical Report CUED/F-INFENG/590, University of Cambridge, Department of Engineering, Trumpington Street, Cambridge, CB2 1PZ, UK, 2007.