Particle Filtering: a brief introductory tutorial. Frank Wood, Gatsby, August 2007
1 Particle Filtering: a brief introductory tutorial. Frank Wood, Gatsby, August 2007
2 Problem: Target Tracking
A ballistic projectile has been launched in our direction and may or may not land near enough to kill us. We have orders to launch an interceptor only if there is a >5% chance it will land near enough to kill us (the projectile's kill radius is 100 meters); the interceptor requires 5 seconds to destroy incoming projectiles. We live under an evil dictator who will kill us himself if we unnecessarily launch an interceptor (they're expensive, after all).
3 Problem Schematic
(Figure: the projectile at Cartesian position $(x_t, y_t)$, observed from the origin $(0,0)$ in polar coordinates as range $r_t$ and angle $\theta_t$.)
4 Problem(s)
Only noisy observations are available. The true trajectory is unknown and must be inferred from the noisy observations. Full details will be given at the end of the lecture.
5 Probabilistic approach
Treat the true trajectory as a sequence of latent variables. Specify a model and do inference to recover the distribution over the latent variables (trajectories).
6 Linear State Space Model (LSSM)
Discrete time, first-order Markov chain:
$$x_{t+1} = a x_t + \epsilon, \qquad \epsilon \sim \mathcal{N}(\mu_\epsilon, \sigma_\epsilon^2)$$
$$y_t = b x_t + \eta, \qquad \eta \sim \mathcal{N}(\mu_\eta, \sigma_\eta^2)$$
(Figure: graphical model with hidden chain $x_{t-1} \to x_t \to x_{t+1}$ and observations $y_{t-1}, y_t, y_{t+1}$ attached to the corresponding hidden states.)
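As a concrete illustration (not part of the original slides), here is a minimal sketch that simulates such an LSSM; the parameter values a, b and the noise scales are arbitrary choices for illustration.

```python
import numpy as np

def simulate_lssm(T, a=0.95, b=1.0, sigma_eps=0.5, sigma_eta=1.0, rng=None):
    """Simulate a 1-D linear state space model:
       x[t+1] = a*x[t] + eps,  eps ~ N(0, sigma_eps^2)
       y[t]   = b*x[t] + eta,  eta ~ N(0, sigma_eta^2)
    Parameter values here are illustrative, not from the lecture."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(T)
    y = np.zeros(T)
    x[0] = rng.normal(0.0, 1.0)                      # arbitrary initial state
    y[0] = b * x[0] + rng.normal(0.0, sigma_eta)
    for t in range(1, T):
        x[t] = a * x[t - 1] + rng.normal(0.0, sigma_eps)
        y[t] = b * x[t] + rng.normal(0.0, sigma_eta)
    return x, y

x_true, y_obs = simulate_lssm(100)
```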
7 LSSM
Joint distribution:
$$p(x_{1:N}, y_{1:N}) = \prod_{i=1}^{N} p(y_i \mid x_i)\, p(x_i \mid x_{i-1})$$
For prediction we want the posterior predictive distribution $p(x_i \mid y_{1:i-1})$ and the posterior (filtering) distribution $p(x_{i-1} \mid y_{1:i-1})$.
8 Inferring the distributions of interest
Many methods exist to infer these distributions: Markov chain Monte Carlo (MCMC), variational inference, belief propagation, etc. In this setting sequential inference is possible because of characteristics of the model structure, and preferable due to the problem requirements.
9-11 Exploiting LSSM model structure for sequential inference
Use Bayes' rule and the conditional independence structure dictated by the LSSM graphical model:
$$\begin{aligned}
p(x_i \mid y_{1:i}) &= \int p(x_i, x_{1:i-1} \mid y_{1:i})\, dx_{1:i-1} \\
&\propto \int p(y_i \mid x_i, x_{1:i-1}, y_{1:i-1})\, p(x_i, x_{1:i-1} \mid y_{1:i-1})\, dx_{1:i-1} \\
&= p(y_i \mid x_i) \int p(x_i \mid x_{1:i-1}, y_{1:i-1})\, p(x_{1:i-1} \mid y_{1:i-1})\, dx_{1:i-1} \\
&= p(y_i \mid x_i) \int p(x_i \mid x_{i-1})\, p(x_{1:i-1} \mid y_{1:i-1})\, dx_{1:i-1} \\
&= p(y_i \mid x_i) \int p(x_i \mid x_{i-1})\, p(x_{i-1} \mid y_{1:i-1})\, dx_{i-1} \\
&= p(y_i \mid x_i)\, p(x_i \mid y_{1:i-1})
\end{aligned}$$
12 Particle filtering steps
Start with a discrete representation of the posterior up to observation i−1. Use Monte Carlo integration to represent the posterior predictive distribution as a finite mixture model. Use importance sampling, with the posterior predictive distribution as the proposal distribution, to sample the posterior distribution up to observation i.
13 Posterior predictive distribution
$$p(x_i \mid y_{1:i}) \propto p(y_i \mid x_i) \int p(x_i \mid x_{i-1})\, p(x_{i-1} \mid y_{1:i-1})\, dx_{i-1} = p(y_i \mid x_i)\, p(x_i \mid y_{1:i-1})$$
Here $p(y_i \mid x_i)$ is the likelihood, $p(x_i \mid x_{i-1})$ is the state model, and $p(x_{i-1} \mid y_{1:i-1})$ is the posterior up to observation i−1. Start with a discrete representation of this last distribution.
14 Monte Carlo integration
General setup:
$$p(x) \approx \sum_{\ell=1}^{L} w_\ell\, \delta_{x_\ell}, \qquad \lim_{L\to\infty} \sum_{\ell=1}^{L} w_\ell\, f(x_\ell) = \int f(x)\, p(x)\, dx$$
As applied in this stage of the particle filter:
$$p(x_i \mid y_{1:i-1}) = \int p(x_i \mid x_{i-1})\, p(x_{i-1} \mid y_{1:i-1})\, dx_{i-1} \approx \sum_{\ell=1}^{L} w_\ell^{i-1}\, p(x_i \mid x_\ell^{i-1})$$
The samples $\{w_\ell^{i-1}, x_\ell^{i-1}\}$ come from $p(x_{i-1} \mid y_{1:i-1})$; the result is a finite mixture model.
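A small illustration (not from the slides) of approximating an expectation by Monte Carlo with equally weighted samples; the target density and test function are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate E_p[f(x)] for p = N(1, 2^2) and f(x) = x^2 using L samples.
L = 100_000
samples = rng.normal(loc=1.0, scale=2.0, size=L)   # x_l ~ p(x)
weights = np.full(L, 1.0 / L)                       # equal weights w_l = 1/L

mc_estimate = np.sum(weights * samples**2)          # sum_l w_l f(x_l)
exact = 1.0**2 + 2.0**2                             # E[x^2] = mu^2 + sigma^2 = 5
print(mc_estimate, exact)
```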
15 Intuitions
The name "particle filter" comes from the physical interpretation of the samples (particles) and the fact that the filtering distribution is often the distribution of interest.
16 Start with samples representing the hidden state
$$\{w_\ell^{i-1}, x_\ell^{i-1}\}_{\ell=1}^{L} \sim p(x_{i-1} \mid y_{1:i-1})$$
(Figure: the weighted particle set in the problem schematic, observer at (0,0).)
17 Evolve them according to the state model
$$\hat{x}_m^i \sim p(\cdot \mid x_\ell^{i-1}), \qquad \hat{w}_m^i \propto w_m^{i-1}$$
(Figure: the propagated particle set in the problem schematic.)
18 Re-weight them by the likelihood
$$w_m^i \propto \hat{w}_m^i\, p(y_i \mid \hat{x}_m^i)$$
(Figure: the re-weighted particle set in the problem schematic.)
19 Down-sample / resample
$$\{\tilde{w}_\ell, \tilde{x}_\ell\}_{\ell=1}^{L} \leftarrow \{w_m^i, x_m^i\}_{m=1}^{M}$$
(Figure: the resampled particle set in the problem schematic.)
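Putting the three steps above together, a minimal one-step bootstrap-style update might look like the sketch below (not from the slides); `propagate` and `likelihood` are stand-ins for the state model and observation model of whatever problem is being solved.

```python
import numpy as np

def particle_filter_step(particles, weights, y_obs, propagate, likelihood, rng):
    """One evolve / re-weight / resample cycle.
    particles, weights : arrays of length L representing p(x_{i-1} | y_{1:i-1})
    propagate(x, rng)  : draws x_i ~ p(x_i | x_{i-1})   (assumed supplied)
    likelihood(y, x)   : evaluates p(y_i | x_i)         (assumed supplied)
    """
    L = len(particles)
    # Evolve according to the state model; weights carry over.
    proposed = np.array([propagate(x, rng) for x in particles])
    # Re-weight by the likelihood and normalize.
    new_weights = weights * np.array([likelihood(y_obs, x) for x in proposed])
    new_weights /= new_weights.sum()
    # Resample L particles (multinomial) and reset to uniform weights.
    idx = rng.choice(L, size=L, p=new_weights)
    return proposed[idx], np.full(L, 1.0 / L)
```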
20 Particle filtering
Consists of two basic elements. Monte Carlo integration:
$$p(x) \approx \sum_{\ell=1}^{L} w_\ell\, \delta_{x_\ell}, \qquad \lim_{L\to\infty} \sum_{\ell=1}^{L} w_\ell\, f(x_\ell) = \int f(x)\, p(x)\, dx$$
and importance sampling.
21 Importance sampling
$$E_x[f(x)] = \int p(x)\, f(x)\, dx = \int \frac{p(x)}{q(x)}\, f(x)\, q(x)\, dx \approx \frac{1}{L} \sum_{\ell=1}^{L} \frac{p(x_\ell)}{q(x_\ell)}\, f(x_\ell), \qquad x_\ell \sim q(\cdot)$$
Importance weights: $r_\ell = p(x_\ell)/q(x_\ell)$. Here $p$ is the original distribution (hard to sample from, easy to evaluate) and $q$ is the proposal distribution (easy to sample from).
22 Importance sampling: un-normalized distributions
$$p(x) = \frac{\tilde{p}(x)}{Z_p}, \qquad q(x) = \frac{\tilde{q}(x)}{Z_q}$$
$\tilde{p}$ is the un-normalized distribution to sample from (still hard to sample from, easy to evaluate); $\tilde{q}$ is the un-normalized proposal distribution (still easy to sample from), with $x_\ell \sim q(\cdot)$. A new term appears: the ratio of normalizing constants.
$$E_x[f(x)] \approx \frac{Z_q}{Z_p}\, \frac{1}{L} \sum_{\ell=1}^{L} \frac{\tilde{p}(x_\ell)}{\tilde{q}(x_\ell)}\, f(x_\ell)$$
23 Normalizing the importance weights
Un-normalized importance weights: $r_\ell = \tilde{p}(x_\ell)/\tilde{q}(x_\ell)$. It takes a little algebra to show that
$$\frac{Z_p}{Z_q} \approx \frac{1}{L} \sum_{\ell=1}^{L} r_\ell$$
so the normalized importance weights are
$$w_\ell = \frac{r_\ell}{\sum_{\ell'=1}^{L} r_{\ell'}}$$
and
$$E_x[f(x)] \approx \frac{Z_q}{Z_p}\, \frac{1}{L} \sum_{\ell=1}^{L} \frac{\tilde{p}(x_\ell)}{\tilde{q}(x_\ell)}\, f(x_\ell) \approx \sum_{\ell=1}^{L} w_\ell\, f(x_\ell)$$
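A quick numerical sketch of self-normalized importance sampling (not part of the slides); the target and proposal are arbitrary toy choices, and only an un-normalized target density is evaluated.

```python
import numpy as np

rng = np.random.default_rng(1)

def p_tilde(x):
    # un-normalized target: N(3, 0.5^2) without its normalizing constant
    return np.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)

L = 50_000
x = rng.normal(0.0, 3.0, size=L)         # x_l ~ q = N(0, 3^2), easy to sample from
q_tilde = np.exp(-0.5 * (x / 3.0) ** 2)  # un-normalized proposal density evaluated at x_l
r = p_tilde(x) / q_tilde                 # un-normalized weights r_l
w = r / r.sum()                          # normalized weights w_l

print(np.sum(w * x))                     # estimate of E_p[x]; should be close to 3
```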
24 PF'ing: Forming the posterior predictive
$$\{w_\ell^{i-1}, x_\ell^{i-1}\}_{\ell=1}^{L} \sim p(x_{i-1} \mid y_{1:i-1}) \quad \text{(the posterior up to observation } i-1\text{)}$$
$$p(x_i \mid y_{1:i-1}) = \int p(x_i \mid x_{i-1})\, p(x_{i-1} \mid y_{1:i-1})\, dx_{i-1} \approx \sum_{\ell=1}^{L} w_\ell^{i-1}\, p(x_i \mid x_\ell^{i-1})$$
The proposal distribution for importance sampling of the posterior up to observation i is this approximate posterior predictive distribution.
25 Sampling the posterior predictive
Generating samples from the posterior predictive distribution is the first place where we can introduce variance reduction techniques.
$$\{\hat{w}_m^i, \hat{x}_m^i\}_{m=1}^{M} \sim p(x_i \mid y_{1:i-1}) \approx \sum_{\ell=1}^{L} w_\ell^{i-1}\, p(x_i \mid x_\ell^{i-1})$$
For instance, sample from each mixture component twice, so that M, the number of samples drawn, is two times L, the number of densities in the mixture model, and assign weights $\hat{w}_m^i = w_\ell^{i-1}/2$.
26 Not the best
The most efficient Monte Carlo estimator of a function Γ(x) comes from survey sampling theory: Neyman allocation. The number drawn from each mixture density is proportional to the weight of that mixture density times the standard deviation of Γ over that mixture density. Take home: smarter sampling is possible. [Cochran 1963]
27 Over-sampling from the posterior predictive distribution
$$\{\hat{w}_m^i, \hat{x}_m^i\}_{m=1}^{M} \sim p(x_i \mid y_{1:i-1})$$
(Figure: the over-sampled particle set in the problem schematic.)
28 Importance sampling the posterior
Recall that we want samples from
$$p(x_i \mid y_{1:i}) \propto p(y_i \mid x_i)\, p(x_i \mid y_{1:i-1}) \propto p(y_i \mid x_i) \int p(x_i \mid x_{i-1})\, p(x_{i-1} \mid y_{1:i-1})\, dx_{i-1}$$
and make the following importance sampling identifications:
$$\tilde{p}(x_i) = p(y_i \mid x_i)\, p(x_i \mid y_{1:i-1}) \quad \text{(distribution from which we want to sample)}$$
$$q(x_i) = p(x_i \mid y_{1:i-1}) \approx \sum_{\ell=1}^{L} w_\ell^{i-1}\, p(x_i \mid x_\ell^{i-1}) \quad \text{(proposal distribution)}$$
29 Sequential importance sampling
Weighted posterior samples arise as $\{\hat{w}_m^i, \hat{x}_m^i\} \sim q(\cdot)$, with un-normalized weights
$$\hat{r}_m^i = \frac{\tilde{p}(\hat{x}_m^i)}{q(\hat{x}_m^i)} = p(y_i \mid \hat{x}_m^i)\, \hat{w}_m^i$$
Normalization of the weights takes place as before:
$$w_m^i = \frac{\hat{r}_m^i}{\sum_{m'=1}^{M} \hat{r}_{m'}^i}$$
We are left with M weighted samples from the posterior up to observation i:
$$\{w_m^i, x_m^i\}_{m=1}^{M} \sim p(x_i \mid y_{1:i})$$
30 An alternative view
$$\{\hat{w}_m^i, \hat{x}_m^i\} \sim \sum_{\ell=1}^{L} w_\ell^{i-1}\, p(x_i \mid x_\ell^{i-1})$$
$$p(x_i \mid y_{1:i-1}) \approx \sum_{m=1}^{M} \hat{w}_m^i\, \delta_{\hat{x}_m^i}$$
and, up to normalization,
$$p(x_i \mid y_{1:i}) \approx \sum_{m=1}^{M} p(y_i \mid \hat{x}_m^i)\, \hat{w}_m^i\, \delta_{\hat{x}_m^i}$$
31 Importance sampling from the posterior distribution
$$\{w_m^i, x_m^i\}_{m=1}^{M} \sim p(x_i \mid y_{1:i})$$
(Figure: the weighted particle set in the problem schematic.)
32 Sequential importance re-sampling
Down-sample to L particles and weights from the collection of M particles and weights:
$$\{\tilde{w}_\ell, \tilde{x}_\ell\}_{\ell=1}^{L} \leftarrow \{w_m^i, x_m^i\}_{m=1}^{M}$$
This can be done via multinomial sampling, or in a way that provably minimizes estimator variance [Fearnhead 2004].
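A minimal sketch of the multinomial option (not from the slides): draw L ancestor indices in proportion to the M weights and reset the surviving particles to uniform weight.

```python
import numpy as np

def multinomial_downsample(particles, weights, L, rng):
    """Down-sample M weighted particles to L particles with uniform weights.
    particles : array of shape (M, ...); weights : array of shape (M,) summing to 1."""
    idx = rng.choice(len(particles), size=L, p=weights)   # multinomial draw of ancestors
    return particles[idx], np.full(L, 1.0 / L)

# Example: keep 5 of 10 weighted particles (toy numbers).
rng = np.random.default_rng(2)
parts = np.arange(10.0)
w = np.ones(10) / 10.0
new_parts, new_w = multinomial_downsample(parts, w, L=5, rng=rng)
```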
33 Down-sampling the particle set
$$\{\tilde{w}_\ell, \tilde{x}_\ell\}_{\ell=1}^{L} \sim p(x_i \mid y_{1:i})$$
(Figure: the down-sampled particle set in the problem schematic.)
34 Recap
Starting with (weighted) samples from the posterior up to observation i−1, Monte Carlo integration was used to form a mixture model representation of the posterior predictive distribution. The posterior predictive distribution was used as a proposal distribution for importance sampling of the posterior up to observation i. M > L samples were drawn and re-weighted according to the likelihood (the importance weight), then the collection of particles was down-sampled to L weighted samples.
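To connect the recap to runnable code, here is a hedged sketch of one full iteration with M = 2L over-sampling followed by down-sampling to L (my own reconstruction, not the lecture's support code); `propagate` and `likelihood` again stand in for the problem-specific state and observation models.

```python
import numpy as np

def sir_iteration(particles, weights, y_obs, propagate, likelihood, rng):
    """One SIR iteration with M = 2L over-sampling of the posterior predictive.
    particles, weights : length-L arrays representing p(x_{i-1} | y_{1:i-1})."""
    L = len(particles)
    # Over-sample: draw from each mixture component twice, halving its weight.
    proposed = np.array([propagate(x, rng) for x in np.repeat(particles, 2)])
    prop_w = np.repeat(weights, 2) / 2.0          # \hat{w}_m = w_l / 2, so M = 2L
    # Re-weight by the likelihood and normalize over all M samples.
    r = prop_w * np.array([likelihood(y_obs, x) for x in proposed])
    r /= r.sum()
    # Down-sample M -> L (multinomial here; Fearnhead's scheme would be better).
    idx = rng.choice(2 * L, size=L, p=r)
    return proposed[idx], np.full(L, 1.0 / L)
```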
35 SIR Particle Filter
The sequential importance sampling (SIS) particle filter multiplicatively updates weights at every iteration and never resamples particles (skipped here); it suffers from concentrated weight / particle degeneracy. The sequential importance resampling (SIR) particle filter avoids many of the problems associated with SIS particle filtering, but always uses representations with L particles and does not maintain a weighted particle set.
36 Particle evolution
$$\{w_\ell^{i-1}, x_\ell^{i-1}\}_{\ell=1}^{L} \sim p(x_{i-1} \mid y_{1:i-1})$$
37 Particle evolution
$$\{\hat{w}_m^i, \hat{x}_m^i\}_{m=1}^{M} \sim p(x_i \mid y_{1:i-1})$$
38 Particle evolution
$$\{w_m^i, x_m^i\}_{m=1}^{M} \sim p(x_i \mid y_{1:i})$$
39 Particle evolution
$$\{\tilde{w}_\ell, \tilde{x}_\ell\}_{\ell=1}^{L} \sim p(x_i \mid y_{1:i})$$
(Figures: the particle set in the problem schematic at each of these four stages.)
40 LSSM Not alone
Various other models are amenable to sequential inference; Dirichlet process mixture modelling is one example, and dynamic Bayes nets are another.
41 Tricks and Variants
Reduce the dimensionality of the integrand through analytic integration (Rao-Blackwellization). Reduce the variance of the Monte Carlo estimator through: maintaining a weighted particle set, stratified sampling, over-sampling, optimal re-sampling.
42 Rao-Blackwellization In models where parameters can be analytically marginalized out, or the particle state space can otherwise be collapsed, the efficiency of the particle filter can be improved by doing so
43 Stratified Sampling
Sampling from a mixture density $\sum_k \pi_k f_k(\cdot)$ to produce weighted samples $\{w_n, x_n\}_{n=1}^{K}$; the second algorithm below (the stratified one) produces a more efficient Monte Carlo estimator.
Standard sampling: for n = 1:N, choose k according to $\pi_k$, sample $x_n$ from $f_k$, set $w_n$ equal to 1/N.
Stratified sampling: for n = 1:K, sample $x_n$ from $f_n$, set $w_n$ equal to $\pi_n$.
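A hedged sketch contrasting the two schemes for a small Gaussian mixture (toy parameters, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
pis = np.array([0.5, 0.3, 0.2])          # mixture weights pi_k
mus = np.array([-2.0, 0.0, 3.0])         # component means; components are N(mu_k, 1)

def standard_sampling(N):
    ks = rng.choice(len(pis), size=N, p=pis)          # choose components by pi_k
    xs = rng.normal(mus[ks], 1.0)
    ws = np.full(N, 1.0 / N)
    return ws, xs

def stratified_sampling():
    xs = rng.normal(mus, 1.0)                         # one draw per component f_k
    ws = pis.copy()                                   # carry the mixture weight
    return ws, xs

# Both give weighted samples whose weighted mean estimates E[x] under the mixture;
# the stratified version has lower variance for the same number of draws.
for ws, xs in (standard_sampling(3), stratified_sampling()):
    print(np.sum(ws * xs))
```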
44 Intuition: weighted particle set
What is the difference between these two discrete distributions over the set {a, b, c}?
Unweighted: (a), (a), (b), (b), (c)
Weighted: (.4, a), (.4, b), (.2, c)
Weighted particle representations are equally or more efficient for the same number of particles.
45 Optimal particle re-weighting
Next step: when down-sampling, pass all particles whose weights exceed a threshold (those with $c\, w_m^i \ge 1$) through without modifying their weights, where c is the unique solution to
$$\sum_{m=1}^{M} \min\{c\, w_m^i, 1\} = L$$
Resample all particles with weights below the threshold using stratified sampling and give the selected particles weight 1/c. [Fearnhead 2004]
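The following is a rough sketch of this scheme (my own reconstruction, not Fearnhead's reference implementation): it finds c by bisection, passes high-weight particles through unchanged, and stratified-resamples the remainder. It assumes normalized, strictly positive weights and M >= L.

```python
import numpy as np

def optimal_downsample(particles, weights, L, rng):
    """Fearnhead-style down-sampling sketch (see assumptions in the lead-in)."""
    M = len(weights)
    # Solve sum_m min(c*w_m, 1) = L for c by bisection (monotone increasing in c).
    lo, hi = float(L), float(L) / weights.min()
    for _ in range(100):
        c = 0.5 * (lo + hi)
        s = np.minimum(c * weights, 1.0).sum()
        lo, hi = (c, hi) if s < L else (lo, c)
    keep = c * weights >= 1.0                 # particles passed through unchanged
    n_draw = L - int(keep.sum())
    rest_w = weights[~keep]
    probs = rest_w / rest_w.sum()
    # Stratified draw of n_draw indices from the remaining particles.
    u = (rng.random(n_draw) + np.arange(n_draw)) / max(n_draw, 1)
    idx = np.minimum(np.searchsorted(np.cumsum(probs), u), len(probs) - 1)
    new_particles = np.concatenate([particles[keep], particles[~keep][idx]])
    new_weights = np.concatenate([weights[keep], np.full(n_draw, 1.0 / c)])
    return new_particles, new_weights / new_weights.sum()
```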
46 Result is provably optimal
In the down-sampling step
$$\{\tilde{w}_\ell, \tilde{x}_\ell\}_{\ell=1}^{L} \leftarrow \{w_m^i, x_m^i\}_{m=1}^{M}$$
imagine instead a sparse set of weights, of which some are zero:
$$\{\bar{w}_m^i, x_m^i\}_{m=1}^{M} \leftarrow \{w_m^i, x_m^i\}_{m=1}^{M}$$
Then this down-sampling algorithm is optimal with respect to
$$\sum_{m=1}^{M} E_{\bar{w}}\big[(\bar{w}_m^i - w_m^i)^2\big]$$
[Fearnhead 2004]
47 Problem Details
Time and position are given in seconds and meters respectively. Initial launch velocity and position are both unknown. The maximum muzzle velocity of the projectile is 1000 m/s. The measurement error in the Cartesian coordinate system is N(0, 10000) and N(0, 500) for x and y position respectively. The measurement error in the polar coordinate system is N(0, .001) for θ and Gamma(1, 100) for r. The kill radius of the projectile is 100 m.
48 Data and Support Code
49 Laws of Motion
In case you've forgotten:
$$\mathbf{r} = (v_0 \cos\alpha)\, t\, \mathbf{i} + \left(v_0 \sin(\alpha)\, t - \tfrac{1}{2} g t^2\right) \mathbf{j}$$
where $v_0$ is the initial speed and $\alpha$ is the initial angle.
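For reference, a small sketch evaluating this trajectory (the launch speed and angle below are illustrative values, not part of the problem data; g = 9.81 m/s^2):

```python
import numpy as np

def trajectory(v0, alpha, t, g=9.81):
    """Position (x, y) at times t for launch speed v0 (m/s) and angle alpha (radians)."""
    x = v0 * np.cos(alpha) * t
    y = v0 * np.sin(alpha) * t - 0.5 * g * t**2
    return x, y

t = np.linspace(0.0, 20.0, 200)
x, y = trajectory(v0=300.0, alpha=np.deg2rad(45.0), t=t)   # illustrative launch parameters
```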
50 Good luck!
51 Monte Carlo Integration
Compute integrals for which analytical solutions are unknown:
$$\int f(x)\, p(x)\, dx, \qquad p(x) \approx \sum_{\ell=1}^{L} w_\ell\, \delta_{x_\ell}$$
52 Monte Carlo Integration
The integral is approximated as the weighted sum of function evaluations at L points:
$$\int f(x)\, p(x)\, dx \approx \int f(x) \sum_{\ell=1}^{L} w_\ell\, \delta_{x_\ell}(x)\, dx = \sum_{\ell=1}^{L} w_\ell\, f(x_\ell)$$
53 Sampling
To use MC integration one must be able to sample from p(x):
$$\{w_\ell, x_\ell\}_{\ell=1}^{L} \sim p(\cdot), \qquad \lim_{L\to\infty} \sum_{\ell=1}^{L} w_\ell\, \delta_{x_\ell} = p(\cdot)$$
54 Theory (Convergence)
The quality of the approximation is independent of the dimensionality of the integrand. Convergence of the integral estimate to the truth is $O(1/\sqrt{n})$ in the number of samples n, from the law of large numbers. The error is independent of the dimensionality of x.
55 Bayesian Modelling
A formal framework for the expression of modelling assumptions. Two common operations: applying Bayes' rule and marginalizing over models.
$$\underbrace{p(\theta \mid x)}_{\text{Posterior}} = \frac{\overbrace{p(x \mid \theta)}^{\text{Likelihood}}\, \overbrace{p(\theta)}^{\text{Prior}}}{\underbrace{p(x)}_{\text{Evidence}}} = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta}$$
56 Posterior Estimation
Often the distribution over latent random variables (parameters) is of interest. Sometimes this is easy (conjugacy). Usually it is hard, because computing the evidence is intractable.
57 Conjugacy Example
$$\theta \sim \mathrm{Beta}(\alpha, \beta), \qquad x \mid \theta \sim \mathrm{Binomial}(N, \theta)$$
$$p(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \qquad p(x \mid \theta) = \binom{N}{x}\, \theta^{x} (1 - \theta)^{N - x}$$
Here x is the number of successes in N trials and θ is the probability of success.
58 Conjugacy Continued
$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta} = \frac{1}{Z(x)}\, \theta^{\alpha - 1 + x} (1 - \theta)^{\beta - 1 + N - x} = \mathrm{Beta}(\alpha + x,\, \beta + N - x)$$
where
$$Z(x) = \frac{\Gamma(\alpha + x)\,\Gamma(\beta + N - x)}{\Gamma(\alpha + \beta + N)}$$
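A quick numerical check of this conjugate update (the hyperparameters and data below are illustrative, not from the slides):

```python
from scipy import stats

alpha, beta, N, x = 2.0, 2.0, 10, 7            # illustrative prior and data
posterior = stats.beta(alpha + x, beta + N - x)  # Beta(alpha + x, beta + N - x)
print(posterior.mean())                          # posterior mean (alpha+x)/(alpha+beta+N) = 9/14
```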
59 Non-Conjugate Models
Easy to come up with examples:
$$\sigma^2 \sim \mathcal{N}(0, \alpha), \qquad x \mid \sigma^2 \sim \mathcal{N}(0, \sigma^2)$$
60 Posterior Inference
Posterior averages are frequently important in problem domains, e.g. the posterior predictive distribution
$$p(x_{i+1} \mid x_{1:i}) = \int p(x_{i+1} \mid \theta, x_{1:i})\, p(\theta \mid x_{1:i})\, d\theta$$
and the evidence (as seen), for model comparison, etc.
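When this integral over θ has no closed form, it can be approximated with posterior samples; a minimal sketch, where `predictive_density` is a hypothetical stand-in for the model's likelihood term and the θ samples are assumed to come from some posterior sampler:

```python
import numpy as np

def posterior_predictive_mc(theta_samples, predictive_density, x_new):
    """Approximate p(x_new | data) = E_{theta | data}[ p(x_new | theta) ]
    using samples theta_samples drawn from p(theta | data).
    predictive_density(x, theta) evaluates p(x | theta)   (assumed supplied)."""
    vals = np.array([predictive_density(x_new, th) for th in theta_samples])
    return vals.mean()
```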
61-64 Relating the Posterior to the Posterior Predictive
(The derivation from slides 9-11, repeated step by step:)
$$\begin{aligned}
p(x_i \mid y_{1:i}) &= \int p(x_i, x_{1:i-1} \mid y_{1:i})\, dx_{1:i-1} \\
&\propto \int p(y_i \mid x_i, x_{1:i-1}, y_{1:i-1})\, p(x_i, x_{1:i-1} \mid y_{1:i-1})\, dx_{1:i-1} \\
&= p(y_i \mid x_i) \int p(x_i \mid x_{1:i-1}, y_{1:i-1})\, p(x_{1:i-1} \mid y_{1:i-1})\, dx_{1:i-1} \\
&= p(y_i \mid x_i) \int p(x_i \mid x_{i-1})\, p(x_{1:i-1} \mid y_{1:i-1})\, dx_{1:i-1} \\
&= p(y_i \mid x_i) \int p(x_i \mid x_{i-1})\, p(x_{i-1} \mid y_{1:i-1})\, dx_{i-1} \\
&= p(y_i \mid x_i)\, p(x_i \mid y_{1:i-1})
\end{aligned}$$