Sequential Monte Carlo Methods - A Review. Arnaud Doucet. Engineering Department, Cambridge University, UK
Sequential Monte Carlo Methods - A Review. Arnaud Doucet, Engineering Department, Cambridge University, UK. ad2/arnaud doucet.html, ad2@eng.cam.ac.uk. Institut Henri Poincaré - Paris - 2 December
Outline of the Talk
1. Objectives.
2. Sequential Importance Sampling.
3. Resampling.
4. Sampling while looking ahead.
5. Variance Reduction.
6. Smoothing.
7. Sequential Monte Carlo Samplers (with P. Del Moral).
1.1 What do we want to do?
Estimate expectations $\int f(x_{0:n})\,\pi_n(dx_{0:n})$ with respect to a sequence of probability distributions $\{\pi_n\}_{n\ge 0}$ known up to a normalizing constant, i.e. $\pi_n(dx_{0:n}) = \pi_n(x_{0:n})\,dx_{0:n}$, where $x_{0:n} = (x_0, x_1, \ldots, x_n)$.
Monte Carlo methods: obtain $N$ weighted samples $\{X^{(i)}_{0:n}, W^{(i)}_n\}_{i=1,\ldots,N}$ ($W^{(i)}_n > 0$, $\sum_{i=1}^N W^{(i)}_n = 1$) such that
$$\sum_{i=1}^N W^{(i)}_n f\big(X^{(i)}_{0:n}\big) \approx \int f(x_{0:n})\,\pi_n(dx_{0:n}).$$
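The weighted-sample approximation above can be illustrated with a toy target (a minimal sketch, not taken from the talk): let $\pi = \mathcal{N}(0,1)$ be known only up to its normalizing constant, draw points from a wide uniform, and attach self-normalized weights.

```python
import numpy as np

rng = np.random.default_rng(1)

# pi = N(0, 1), known only up to a constant: pi(x) ∝ exp(-x^2 / 2).
# Draw from a wide uniform and attach self-normalized weights W^(i).
N = 200_000
X = rng.uniform(-8.0, 8.0, size=N)     # samples X^(i)
w = np.exp(-0.5 * X**2)                # unnormalized pi(X^(i))
W = w / w.sum()                        # W^(i) > 0, sum_i W^(i) = 1

est = np.sum(W * X**2)                 # approximates E[X^2] = 1 under N(0,1)
print(est)
```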
1.2 Sequential Monte Carlo Methods
Sequential Monte Carlo methods: having obtained $\{X^{(i)}_{0:n-1}, W^{(i)}_{n-1}\}_{i=1,\ldots,N}$ from $\pi_{n-1}$, obtain $\{X^{(i)}_{0:n}, W^{(i)}_n\}_{i=1,\ldots,N}$ from $\pi_n$.
Real-time methods, i.e. non-iterative: good for us engineers.
Even for batch problems, they can provide better results than MCMC.
Combination of importance sampling, resampling and MCMC.
SMC are not a BLACK-BOX!!!
1.3 The optimal filtering case
Let $\{X_n\}_{n\ge 0}$ and $\{Y_n\}_{n\ge 1}$ be $\mathbb{R}^p$- and $\mathbb{R}^q$-valued stochastic processes.
Evolution equation: $\Pr(X_{n+1} \in A \mid X_{0:n}, Y_{1:n}) = \int_A f(x \mid X_n)\,dx$.
Observation equation: $\Pr(Y_{n+1} \in B \mid X_{0:n+1}, Y_{1:n}) = \int_B g(y \mid X_{n+1})\,dy$.
Sequence of posterior distributions: $\Pr(X_{0:n} \in dx_{0:n} \mid Y_{1:n} = y_{1:n}) = p(x_{0:n} \mid y_{1:n})\,dx_{0:n}$ with
$$p(x_{0:n} \mid y_{1:n}) \propto \underbrace{f(x_0) \prod_{k=1}^n f(x_k \mid x_{k-1})}_{\text{prior}}\; \underbrace{\prod_{k=1}^n g(y_k \mid x_k)}_{\text{likelihood}}.$$
1.4 Applications of Sequential Monte Carlo Methods
Optimal filtering: the most popular application of SMC.
Other applications: polymer design, protein folding, quantum physics, bioinformatics, genetics, rare events, computing eigenmeasures of positive operators.
2.1 Importance Sampling
Cannot sample from $\pi_n(x_{0:n})$! Importance sampling: introduce a pdf $q_n(x_{0:n}) > 0$ wherever $\pi_n(x_{0:n}) > 0$; then
$$\pi_n(x_{0:n}) = \frac{w_n(x_{0:n})\,q_n(x_{0:n})}{\int w_n(x_{0:n})\,q_n(x_{0:n})\,dx_{0:n}}$$
where the importance weight is $w_n(x_{0:n}) \propto \pi_n(x_{0:n})/q_n(x_{0:n})$.
If $X^{(i)}_{0:n} \sim q_n$ then
$$\pi^N_n(dx_{0:n}) = \sum_{i=1}^N W^{(i)}_n\,\delta_{X^{(i)}_{0:n}}(dx_{0:n}), \quad W^{(i)}_n \propto w_n\big(X^{(i)}_{0:n}\big), \quad \sum_{i=1}^N W^{(i)}_n = 1.$$
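A minimal importance-sampling sketch (my own toy example, not from the talk): target $\pi = \mathcal{N}(0,1)$ known up to a constant, proposal $q = \mathcal{N}(0, 2^2)$, estimating the tail probability $P(X > 2) \approx 0.0228$ with self-normalized weights $W^{(i)} \propto \pi(X^{(i)})/q(X^{(i)})$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Importance sampling with q = N(0, 2^2) for the target pi = N(0, 1).
# Constants in pi and q are dropped: they cancel in the normalization.
N = 500_000
X = rng.normal(0.0, 2.0, size=N)

log_pi = -0.5 * X**2                  # unnormalized log pi
log_q = -0.5 * (X / 2.0) ** 2         # unnormalized log q
w = np.exp(log_pi - log_q)            # w(x) ∝ pi(x)/q(x)
W = w / w.sum()                       # normalized weights W^(i)

est = np.sum(W * (X > 2.0))           # self-normalized estimate of P(X > 2)
print(est)                            # close to 0.0228
```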
2.2 Sequential Importance Sampling
To be sequential, weights must be computed sequentially:
$$w_n(x_{0:n}) \propto \frac{\pi_n(x_{0:n})}{q_n(x_{0:n})} = \frac{\pi_n(x_{0:n})\,q_{n-1}(x_{0:n-1})}{q_n(x_{0:n})\,\pi_{n-1}(x_{0:n-1})}\,w_{n-1}(x_{0:n-1}).$$
Only impute $x_n$ at time $n$: $q_n(x_{0:n}) = q_{n-1}(x_{0:n-1})\,q_n(x_n \mid x_{0:n-1})$, so
$$w_n(x_{0:n}) \propto \frac{\pi_n(x_{0:n})}{\pi_{n-1}(x_{0:n-1})\,q_n(x_n \mid x_{0:n-1})}\,w_{n-1}(x_{0:n-1}).$$
2.3 Sequential Importance Sampling for Optimal Filtering
One has $\pi_n(x_{0:n}) = p(x_{0:n} \mid y_{1:n})$ and $q_n(x_{0:n} \mid y_{1:n}) = q_{n-1}(x_{0:n-1} \mid y_{1:n-1})\,q(x_n \mid x_{0:n-1}, y_{1:n})$. The importance weight is
$$w_n(x_{0:n}) \propto \frac{f(x_n \mid x_{n-1})\,g(y_n \mid x_n)}{q(x_n \mid x_{0:n-1}, y_{1:n})}\,w_{n-1}(x_{0:n-1}).$$
2.4 Sequential Importance Sampling for Optimal Filtering
Sampling step. For $i = 1, \ldots, N$, sample $X^{(i)}_n \sim q(\cdot \mid X^{(i)}_{0:n-1}, y_{1:n})$ and update the importance weights
$$W^{(i)}_n \propto \frac{f\big(X^{(i)}_n \mid X^{(i)}_{n-1}\big)\,g\big(y_n \mid X^{(i)}_n\big)}{q\big(X^{(i)}_n \mid X^{(i)}_{0:n-1}, y_{1:n}\big)}\,W^{(i)}_{n-1}.$$
The empirical distribution $p^N(dx_{0:n} \mid y_{1:n}) = \sum_{i=1}^N W^{(i)}_n\,\delta_{X^{(i)}_{0:n}}(dx_{0:n})$ is an approximation of $p(x_{0:n} \mid y_{1:n})$.
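Pure SIS (no resampling yet) can be sketched on an illustrative scalar linear-Gaussian model of my choosing, $x_n = 0.9\,x_{n-1} + v_n$, $y_n = x_n + w_n$, using the prior as proposal; tracking the effective sample size $1/\sum_i (W^{(i)}_n)^2$ makes the weight degeneracy that motivates resampling visible.

```python
import numpy as np

rng = np.random.default_rng(3)

# SIS with the prior f(x_n | x_{n-1}) = N(0.9 x_{n-1}, 1) as proposal on a
# toy scalar model with y_n = x_n + N(0, 1) noise; parameters illustrative.
T, N = 50, 1000
x_true = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
    y[t] = x_true[t] + rng.normal()

X = np.zeros((N, T))
logW = np.zeros(N)
for t in range(1, T):
    X[:, t] = 0.9 * X[:, t - 1] + rng.normal(size=N)   # sample from the prior
    logW += -0.5 * (y[t] - X[:, t]) ** 2               # w_n ∝ w_{n-1} g(y_n | x_n)

W = np.exp(logW - logW.max())
W /= W.sum()
ess = 1.0 / np.sum(W**2)          # effective sample size
print(ess)                        # collapses to a handful of particles
```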
2.5 Choice of the Importance Density
The closer $q_n(x_{0:n})$ is to $\pi_n(x_{0:n})$, the better it works...
One has
$$w_n(x_{0:n}) \propto \frac{\pi_n(x_n \mid x_{0:n-1})}{q_n(x_n \mid x_{0:n-1})}\,\frac{\pi_n(x_{0:n-1})}{\pi_{n-1}(x_{0:n-1})}\,w_{n-1}(x_{0:n-1}).$$
Obvious choice: $q_n(x_n \mid x_{0:n-1}) = \pi_n(x_n \mid x_{0:n-1})$, but this requires knowing $\pi_n(x_{0:n-1})$.
Filtering case: $q(x_n \mid x_{0:n-1}, y_{1:n}) \propto g(y_n \mid x_n)\,f(x_n \mid x_{n-1})$.
2.6 Choice of the Importance Density
Prior distribution: $q(x_n \mid x_{0:n-1}, y_{1:n}) = f(x_n \mid x_{n-1})$, so $w_n(x_{0:n}) \propto w_{n-1}(x_{0:n-1})\,g(y_n \mid x_n)$.
Optimal distribution: $q(x_n \mid x_{0:n-1}, y_{1:n}) \propto g(y_n \mid x_n)\,f(x_n \mid x_{n-1})$, so $w_n(x_{0:n}) \propto w_{n-1}(x_{0:n-1}) \int g(y_n \mid x_n)\,f(x_n \mid x_{n-1})\,dx_n$.
Alternative suboptimal distributions: use the EKF, UKF or anything you want to approximate the optimal distribution. Advantage of Monte Carlo: still theoretically valid!
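For a scalar linear-Gaussian pair $f = \mathcal{N}(a x_{n-1}, \sigma_v^2)$, $g = \mathcal{N}(x_n, \sigma_w^2)$ (an illustrative model, not one from the talk), the optimal proposal and its incremental weight are available in closed form, since the product of two Gaussians in $x_n$ is Gaussian:

```python
import numpy as np

def optimal_proposal(x_prev, y, a=0.9, var_v=1.0, var_w=1.0):
    """Optimal proposal q(x_n | x_{n-1}, y_n) ∝ g(y_n | x_n) f(x_n | x_{n-1})
    for f = N(a x_{n-1}, var_v), g = N(x_n, var_w).  Returns the proposal
    mean and variance, and the log incremental weight
    log ∫ g(y_n | x) f(x | x_{n-1}) dx = log N(y_n; a x_{n-1}, var_v + var_w)."""
    s2 = 1.0 / (1.0 / var_v + 1.0 / var_w)            # proposal variance
    m = s2 * (a * x_prev / var_v + y / var_w)         # proposal mean
    var_pred = var_v + var_w                          # predictive variance of y_n
    log_w_inc = (-0.5 * np.log(2 * np.pi * var_pred)
                 - 0.5 * (y - a * x_prev) ** 2 / var_pred)
    return m, s2, log_w_inc

m, s2, log_w = optimal_proposal(x_prev=0.5, y=1.2)
print(m, s2)    # 0.825 0.5
```

Note the incremental weight depends on $x_{n-1}$ and $y_n$ only, not on the sampled $x_n$, which is exactly the property exploited in Section 4.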
3.1 Resampling
The dimension of the state space increases with $n$, so plain importance sampling collapses. Hence the need for resampling - a KEY element of SMC methods.
Basic idea of resampling: at time $n$, multiply particles $X^{(i)}_{0:n}$ with high weights $W^{(i)}_n$ and discard ones with small weights. Give the resampled particles an equal weight. Keep the number of particles fixed.
3.2 Multinomial Resampling
At time $n$, $\pi^N_n(dx_{0:n}) = \sum_{i=1}^N W^{(i)}_n\,\delta_{X^{(i)}_{0:n}}(dx_{0:n})$. Sample $N$ times from $\pi^N_n$ to obtain $N$ new samples; this is equivalent to copying $X^{(i)}_{0:n}$ $k_i$ times where
$$(k_1, \ldots, k_N) \sim \mathcal{M}\big(N;\, W^{(1)}_n, \ldots, W^{(N)}_n\big).$$
One has $\mathbb{E}[k_i] = N W^{(i)}_n$ but large variance $\operatorname{var}[k_i] = N W^{(i)}_n \big(1 - W^{(i)}_n\big)$.
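The moment formulas can be checked empirically (the weight vector below is arbitrary, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Multinomial resampling counts (k_1,...,k_N) ~ M(N; W^(1),...,W^(N)).
# Empirically checks E[k_i] = N W^(i) and var[k_i] = N W^(i)(1 - W^(i)).
N = 5
W = np.array([0.4, 0.3, 0.15, 0.1, 0.05])

reps = 20_000
counts = rng.multinomial(N, W, size=reps)   # each row sums to N
print(counts.mean(axis=0))                  # ≈ N * W
print(counts.var(axis=0))                   # ≈ N * W * (1 - W)
```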
3.3 Better Resampling Schemes
What you really want:
$$\sum_{i=1}^N W^{(i)}_n\,\delta_{X^{(i)}_{0:n}}(dx_{0:n}) \approx \sum_{i=1}^N \frac{k_i}{N}\,\delta_{X^{(i)}_{0:n}}(dx_{0:n}), \quad \text{where } \sum_{i=1}^N k_i = N,\ k_i \in \mathbb{N}.$$
Residual resampling (Baker 1987). Set $k_i = \lfloor N W^{(i)}_n \rfloor + \bar{k}_i$ where $(\bar{k}_1, \ldots, \bar{k}_N) \sim \mathcal{M}\big(N - \sum_j \lfloor N W^{(j)}_n \rfloor;\, \bar{W}^{(1)}_n, \ldots, \bar{W}^{(N)}_n\big)$ with $\bar{W}^{(i)}_n \propto N W^{(i)}_n - \lfloor N W^{(i)}_n \rfloor$.
Stratified/systematic sampling (Whitley 1994, Kitagawa 1996, Carpenter et al. 1999): 2 lines of code... very efficient.
Minimum entropy sampling (Crisan 2001).
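The "2 lines of code" remark about systematic resampling can be made concrete (a sketch; the weight vector is arbitrary): draw one uniform, spread $N$ evenly spaced points over $[0,1)$, and invert the cumulative weights.

```python
import numpy as np

rng = np.random.default_rng(5)

def systematic_resample(W, rng):
    """Systematic resampling: the core really is about two lines."""
    N = len(W)
    u = (rng.uniform() + np.arange(N)) / N      # one uniform, N strata
    return np.searchsorted(np.cumsum(W), u)     # ancestor indices

W = np.array([0.5, 0.25, 0.125, 0.125])
idx = systematic_resample(W, rng)
print(idx)    # each count k_i is within 1 of N * W_i
```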
3.4 More Resampling Schemes
Keep some information about the weights: if a particle receives $k_i > 0$ copies, give each copy the new weight $W^{(i)}_n / k_i$ instead of $1/N$.
If a particle of weight $1.7/N$ is copied two times, each copy gets weight $(1 - 0.3/2)/N = 0.85/N$: one subtracts $0.3/N$ of mass relative to equal weighting.
If a particle of weight $1.7/N$ is copied one time, it keeps weight $1.7/N$: one adds $0.7/N$ of mass relative to equal weighting.
3.5 SIS + Resampling for Optimal Filtering
Sampling step. For $i = 1, \ldots, N$, set $\widetilde{X}^{(i)}_{0:n-1} = X^{(i)}_{0:n-1}$, sample $\widetilde{X}^{(i)}_n \sim q(\cdot \mid X^{(i)}_{0:n-1}, y_{1:n})$ and compute
$$W^{(i)}_n \propto \frac{f\big(\widetilde{X}^{(i)}_n \mid \widetilde{X}^{(i)}_{n-1}\big)\,g\big(y_n \mid \widetilde{X}^{(i)}_n\big)}{q\big(\widetilde{X}^{(i)}_n \mid \widetilde{X}^{(i)}_{0:n-1}, y_{1:n}\big)}.$$
Selection step. Multiply/suppress samples $\{\widetilde{X}^{(i)}_{0:n}\}$ with high/low weights $W^{(i)}_n$ respectively to obtain $\{X^{(i)}_{0:n}\}$.
$p^N(dx_{0:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^N \delta_{X^{(i)}_{0:n}}(dx_{0:n})$ is an approximation of $p(x_{0:n} \mid y_{1:n})$.
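Putting the sampling and selection steps together gives the bootstrap filter. A minimal sketch on the same illustrative scalar linear-Gaussian model as before (prior proposal, multinomial resampling at every step); for this model the exact filter is the Kalman filter, which serves as a check.

```python
import numpy as np

rng = np.random.default_rng(6)

# Bootstrap filter (SIS + resampling) vs the exact Kalman filter on a toy
# model x_t = a x_{t-1} + v_t, y_t = x_t + w_t; parameters illustrative.
a, var_v, var_w = 0.9, 1.0, 1.0
T, N = 30, 5000

x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal()
    y[t] = x[t] + rng.normal()

# --- particle filter ---
X = np.zeros(N)
pf_mean = np.zeros(T)
for t in range(1, T):
    X = a * X + rng.normal(size=N)                 # sampling step (prior proposal)
    logw = -0.5 * (y[t] - X) ** 2 / var_w          # weight by the likelihood
    w = np.exp(logw - logw.max()); W = w / w.sum()
    pf_mean[t] = np.sum(W * X)
    X = X[rng.choice(N, size=N, p=W)]              # selection step

# --- exact Kalman filter for comparison ---
m, P = 0.0, 0.0
kf_mean = np.zeros(T)
for t in range(1, T):
    mp, Pp = a * m, a * a * P + var_v              # predict
    K = Pp / (Pp + var_w)                          # Kalman gain
    m, P = mp + K * (y[t] - mp), (1 - K) * Pp      # update
    kf_mean[t] = m

print(np.max(np.abs(pf_mean - kf_mean)))           # small for large N
```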
3.6 Using MCMC to Prevent Degeneracy
If the distribution of the weights $W^{(i)}_n$ is skewed, a few particles dominate, i.e. poor approximation. To prevent degeneracy:
Use a kernel approximation $p^N(x_{0:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^N K_h\big(x_{0:n} - X^{(i)}_{0:n}\big)$... but it modifies the target distribution.
Use an MCMC step: no perturbation of the target. Sample $X'^{(i)}_{0:n} \sim K_n\big(X^{(i)}_{0:n}, \cdot\big)$ where $K_n$ leaves $\pi_n$ invariant: $\pi_n(dx'_{0:n}) = \int K_n(dx'_{0:n} \mid x_{0:n})\,\pi_n(dx_{0:n})$.
4.1 Look One Step Ahead Before Sampling
Consider the optimal case where $q_n(x_n \mid x_{0:n-1}) = \pi_n(x_n \mid x_{0:n-1})$; then $w_n(x_{0:n}) \propto \pi_n(x_{0:n-1}) / \pi_{n-1}(x_{0:n-1})$ is independent of $x_n$!
Sampling then resampling is inefficient; resampling then sampling makes sense.
In the case of optimal filtering, $q(x_n \mid x_{0:n-1}, y_{1:n}) \propto g(y_n \mid x_n)\,f(x_n \mid x_{n-1})$ and $w_n(x_{0:n}) \propto \int g(y_n \mid x_n)\,f(x_n \mid x_{n-1})\,dx_n$, i.e. look at what is happening one step ahead and sample in this region.
4.2 Look One Step Ahead Before Sampling
In the general case, one cannot compute $w_n(x_{0:n}) \propto \pi_n(x_{0:n-1}) / \pi_{n-1}(x_{0:n-1})$: evaluating $\pi_n(x_{0:n-1})$ requires an integration!
Basic idea: approximate this ratio by, say, $\widehat{W}^{(i)}_n$, and resample $\{X^{(i)}_{0:n-1}\}$ according to these weights. Instead of resampling from $\sum_{i=1}^N W^{(i)}_{n-1}\,\delta_{X^{(i)}_{0:n-1}}(dx_{0:n-1})$, resample from
$$\sum_{i=1}^N \alpha^{(i)}_n\,\delta_{X^{(i)}_{0:n-1}}(dx_{0:n-1}), \quad \alpha^{(i)}_n \propto W^{(i)}_{n-1}\,\widehat{W}^{(i)}_n, \quad \sum_{i=1}^N \alpha^{(i)}_n = 1.$$
4.3 Auxiliary Particle Filter-Like
Selection step. Set $\widetilde{X}^{(i)}_{0:n-1} = X^{(i)}_{0:n-1}$; multiply/suppress samples $\{\widetilde{X}^{(i)}_{0:n-1}\}$ with high/low weights $\{\widehat{W}^{(i)}_n\}$ respectively to obtain $\{X^{(i)}_{0:n-1}\}$.
Sampling step. For $i = 1, \ldots, N$, sample $X^{(i)}_n \sim q(\cdot \mid X^{(i)}_{0:n-1}, y_{1:n})$ and set
$$W^{(i)}_n \propto \frac{W^{(i)}_{n-1}}{\widehat{W}^{(i)}_n}\,\frac{f\big(X^{(i)}_n \mid X^{(i)}_{n-1}\big)\,g\big(y_n \mid X^{(i)}_n\big)}{q\big(X^{(i)}_n \mid X^{(i)}_{0:n-1}, y_{1:n}\big)}.$$
$p^N(dx_{0:n} \mid y_{1:n}) = \sum_{i=1}^N W^{(i)}_n\,\delta_{X^{(i)}_{0:n}}(dx_{0:n})$ is an approximation of $p(x_{0:n} \mid y_{1:n})$.
4.4 Why Only One Step Ahead?
If it is expected that $\pi_{n+L}(x_{0:n-1})$ is very different from $\pi_{n-1}(x_{0:n-1})$, look multiple steps ahead:
Boost particles using an approximation of $\pi_{n+L}(x_{0:n-1}) / \pi_{n-1}(x_{0:n-1})$.
Sample particles with an approximation of $\pi_{n+L}(x_n \mid x_{n-1})$.
4.5 Modifying the Past of the Trajectories
If there is a large discrepancy between $\pi_{n-1}(x_{0:n-1})$ and $\pi_n(x_{0:n-1})$, all these methods are inefficient!
At time $n$, we might want to sample new trajectories according to, say, a kernel $q_n(dx'_{0:n} \mid x_{0:n-1})$ more general than the standard $\delta_{x_{0:n-1}}(dx'_{0:n-1})\,q_n(dx'_n \mid x_{0:n-1})$.
Problem: if $q_n(dx'_{0:n} \mid x_{0:n-1})$ is a general kernel then the importance distribution $\int q_n(dx'_{0:n} \mid x_{0:n-1})\,\pi_{n-1}(dx_{0:n-1})$ has no analytical expression.
$\Rightarrow$ Importance sampling on an extended space.
4.6 Phantom Kernel - More Freedom for My Particles
Artificial joint measure
$$\underbrace{\pi_n(dx'_{0:n})}_{\text{fixed}}\;\underbrace{L_n(dx_{0:n-1} \mid x'_{0:n})}_{\text{phantom kernel}}.$$
Importance weights:
$$w_n(x_{0:n-1}, x'_{0:n}) \propto \frac{\pi_n(dx'_{0:n})\,L_n(dx_{0:n-1} \mid x'_{0:n})}{\pi_{n-1}(dx_{0:n-1})\,q_n(dx'_{0:n} \mid x_{0:n-1})}.$$
Application to parameter estimation $\pi_n(d\theta)$ (see SMC samplers):
$$w_n(\theta, \theta') \propto \frac{\pi_n(d\theta')\,L_n(d\theta \mid \theta')}{\pi_{n-1}(d\theta)\,q_n(d\theta' \mid \theta)}.$$
5.1 Variance Reduction Methods
Control variates.
Antithetic variables: $\operatorname{var}[A + B] = \operatorname{var}[A] + \operatorname{var}[B] + 2\operatorname{cov}[A, B]$.
Quasi-Monte Carlo: explore the space more uniformly.
Rao-Blackwellisation: integrate out variables analytically whenever you can.
Importance sampling: to compute $\int f(x)\,\pi(dx)$, using samples from $\pi(dx)$ is not the best choice.
5.2 Rao-Blackwellisation
"A Good Monte Carlo is a Dead Monte Carlo" - Trotter.
Assume $x_n = (u_n, v_n)$, so $\pi(x_{0:n}) = \pi(u_{0:n}, v_{0:n}) = \pi(v_{0:n} \mid u_{0:n})\,\pi(u_{0:n})$, with $\pi(u_{0:n})$ known up to a normalizing constant.
Two possible estimates of $\int f(v_{0:n})\,\pi(dv_{0:n})$:
$$I_1 = \frac{1}{N} \sum_{i=1}^N f\big(V^{(i)}_{0:n}\big), \qquad I_2 = \frac{1}{N} \sum_{i=1}^N \mathbb{E}\big[f(V_{0:n}) \mid U^{(i)}_{0:n}\big].$$
One has $\operatorname{var}[I_2] \le \operatorname{var}[I_1]$ as
$$\operatorname{var}[f(V_{0:n})] = \underbrace{\operatorname{var}\big[\mathbb{E}[f(V_{0:n}) \mid U_{0:n}]\big]}_{\text{corresponds to } I_2} + \mathbb{E}\big[\operatorname{var}[f(V_{0:n}) \mid U_{0:n}]\big].$$
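The variance decomposition can be seen numerically on a toy pair of my choosing, $U \sim \mathcal{N}(0,1)$, $V \mid U \sim \mathcal{N}(U, 1)$ with $f(v) = v$, so that $\mathbb{E}[f(V) \mid U] = U$ is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(7)

# I1 averages f(V) directly; I2 averages E[f(V) | U] = U (Rao-Blackwellised).
# With N samples, var[I1] = 2/N and var[I2] = 1/N for this toy pair.
reps, N = 2000, 100
I1 = np.zeros(reps); I2 = np.zeros(reps)
for r in range(reps):
    U = rng.normal(size=N)
    V = U + rng.normal(size=N)
    I1[r] = V.mean()                  # standard estimate
    I2[r] = U.mean()                  # Rao-Blackwellised estimate

print(I1.var(), I2.var())             # ≈ 2/N vs ≈ 1/N: var[I2] <= var[I1]
```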
5.3 Application to Optimal Filtering
Conditionally linear Gaussian state-space models. One has $U_n$ a Markov process and
$$V_n = A(U_n) V_{n-1} + B(U_n) W_n, \quad W_n \text{ i.i.d. } \mathcal{N}(0, I),$$
$$Y_n = C(U_n) V_n + D(U_n) Z_n, \quad Z_n \text{ i.i.d. } \mathcal{N}(0, I).$$
Then
$$p(u_{0:n}, v_{0:n} \mid y_{1:n}) = \underbrace{p(v_{0:n} \mid y_{1:n}, u_{0:n})}_{\text{Gaussian distribution}}\;\underbrace{p(u_{0:n} \mid y_{1:n})}_{\text{known up to a normalizing constant}},$$
$$p(u_{0:n} \mid y_{1:n}) \propto \underbrace{p(y_{1:n} \mid u_{0:n})}_{\text{likelihood - Kalman filter}}\;p(u_{0:n}).$$
Sample from $p(u_{0:n} \mid y_{1:n})$ and not from $p(u_{0:n}, v_{0:n} \mid y_{1:n})$: the gain can be enormous!
5.4 Application to Optimal Filtering
Standard estimate: if $\big(U^{(i)}_{0:n}, V^{(i)}_{0:n}\big) \sim p(u_{0:n}, v_{0:n} \mid y_{1:n})$,
$$\widehat{E}_1[V_n \mid y_{1:n}] = \frac{1}{N} \sum_{i=1}^N V^{(i)}_n.$$
Rao-Blackwellised estimate: if $U^{(i)}_{0:n} \sim p(u_{0:n} \mid y_{1:n})$,
$$\widehat{E}_2[V_n \mid y_{1:n}] = \frac{1}{N} \sum_{i=1}^N \underbrace{\widehat{E}\big[V_n \mid y_{1:n}, U^{(i)}_{0:n}\big]}_{\text{Kalman}}.$$
One has $\operatorname{var}\big[\widehat{E}_2[V_n \mid y_{1:n}]\big] \le \operatorname{var}\big[\widehat{E}_1[V_n \mid y_{1:n}]\big]$, and the gain will be high if the average variance $\mathbb{E}\big[\operatorname{var}[V_n \mid y_{1:n}, U_{0:n}]\big]$ is large.
6.1 Smoothing Problems
Smoothing problems: sample from $\pi_n(dx_{0:n})$ / estimate $\pi_n(dx_k)$.
SMC methods yield $\pi^N_n(dx_{0:n}) = \sum_{i=1}^N W^{(i)}_n\,\delta_{X^{(i)}_{0:n}}(dx_{0:n})$, so $X_{0:n} \sim \pi^N_n$ is an approximate sample from $\pi_n$ and $\pi^N_n(dx_k) = \sum_{i=1}^N W^{(i)}_n\,\delta_{X^{(i)}_k}(dx_k)$.
Problem: as $n$ gets large, the approximation of the joint distribution is BAD!!!
6.2 Smoothing Using Marginal Distributions
Aim: sampling from $p(dx_{0:n} \mid y_{1:n})$. Forward filtering backward sampling, based on
$$p(x_{0:n} \mid y_{1:n}) = p(x_n \mid y_{1:n}) \prod_{k=0}^{n-1} \frac{f(x_{k+1} \mid x_k)\,p(x_k \mid y_{1:k})}{p(x_{k+1} \mid y_{1:k})}.$$
Sample $X_n \sim p^N(dx_n \mid y_{1:n})$, then for $k = n-1, \ldots, 0$ sample
$$X_k \sim \frac{f(X_{k+1} \mid x_k)\,p^N(dx_k \mid y_{1:k})}{p^N(X_{k+1} \mid y_{1:k})} = \frac{\sum_{i=1}^N W^{(i)}_k\,f\big(X_{k+1} \mid X^{(i)}_k\big)\,\delta_{X^{(i)}_k}(dx_k)}{\sum_{i=1}^N W^{(i)}_k\,f\big(X_{k+1} \mid X^{(i)}_k\big)}.$$
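The backward pass above can be sketched directly from stored filter output (the scalar model and observations below are illustrative, not from the talk): run a forward particle filter, keep $\{X^{(i)}_k, W^{(i)}_k\}$ at every $k$, then draw a trajectory backwards by reweighting the time-$k$ particles with $f(X_{k+1} \mid X^{(i)}_k)$.

```python
import numpy as np

rng = np.random.default_rng(8)

a, T, N = 0.9, 20, 500
y = rng.normal(size=T)                        # synthetic observations

# --- forward filter, storing particles and weights at every step ---
Xs = np.zeros((T, N)); Ws = np.full((T, N), 1.0 / N)
X = np.zeros(N)
for t in range(1, T):
    X = a * X + rng.normal(size=N)            # propagate with f
    logw = -0.5 * (y[t] - X) ** 2             # weight with g
    w = np.exp(logw - logw.max())
    Xs[t], Ws[t] = X, w / w.sum()
    X = X[rng.choice(N, size=N, p=Ws[t])]     # resample

# --- backward sampling of one trajectory ---
traj = np.zeros(T)
traj[T - 1] = Xs[T - 1, rng.choice(N, p=Ws[T - 1])]
for k in range(T - 2, -1, -1):
    # reweight time-k particles by the transition density f(X_{k+1} | X_k^(i))
    logb = np.log(Ws[k]) - 0.5 * (traj[k + 1] - a * Xs[k]) ** 2
    b = np.exp(logb - logb.max()); b /= b.sum()
    traj[k] = Xs[k, rng.choice(N, p=b)]

print(traj.shape)
```

By construction, every state of the sampled trajectory is one of the stored forward particles at that time step.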
6.3 Smoothing Using Marginal Distributions
Aim: estimating $\pi_n(dx_k)$. Forward filtering backward smoothing:
$$p(x_k \mid y_{1:n}) = p(x_k \mid y_{1:k}) \int \frac{f(x_{k+1} \mid x_k)\,p(x_{k+1} \mid y_{1:n})}{p(x_{k+1} \mid y_{1:k})}\,dx_{k+1}.$$
Two-filter formula - requires an additional assumption:
$$p(x_k \mid y_{1:n}) \propto \underbrace{p(x_k \mid y_{1:k})}_{\text{filter}}\;\underbrace{p(y_{k+1:n} \mid x_k)}_{\text{information filter}}.$$
7.1 Sampling from a Probability Distribution
Aim: sample from a sequence $\pi_n(dx)$ defined on a space of fixed dimension.
Solution: define
$$\pi_n(dx_{0:n}) = \pi_n(dx_n) \prod_{k=1}^n L_k(dx_{k-1} \mid x_k).$$
Examples: $\pi_n(dx) = \pi(dx)$ for all $n$; $\pi_n(dx) \propto [\pi_0(x)]^{\gamma_n} [\pi(x)]^{1-\gamma_n}\,dx$ with $\gamma_n$ decreasing from 1 towards 0; $\pi_n(dx) \propto \pi^{\gamma_n}(x)\,dx$ with $\gamma_n \to \infty$; $\pi_n(dx) = p(dx \mid y_{1:n})$.
$\{L_n\}$ is a sequence of phantom kernels - free parameters.
7.2 Sequential Monte Carlo Sampler
For $i = 1, \ldots, N$, set $\widetilde{X}^{(i)}_{0:n-1} = X^{(i)}_{0:n-1}$, sample $\widetilde{X}^{(i)}_n \sim M_n\big(\widetilde{X}^{(i)}_{n-1}, \cdot\big)$ and compute
$$W^{(i)}_n \propto \frac{\pi_n\big(d\widetilde{X}^{(i)}_n\big)\,L_n\big(\widetilde{X}^{(i)}_n, d\widetilde{X}^{(i)}_{n-1}\big)}{\pi_{n-1}\big(d\widetilde{X}^{(i)}_{n-1}\big)\,M_n\big(\widetilde{X}^{(i)}_{n-1}, d\widetilde{X}^{(i)}_n\big)}, \quad \sum_{i=1}^N W^{(i)}_n = 1.$$
Multiply/discard particles $\{\widetilde{X}^{(i)}_{0:n}\}$ with respect to high/low weights $\{W^{(i)}_n\}$ to obtain $N$ particles $\{X^{(i)}_{0:n}\}$.
* P. Del Moral & A. Doucet, On a Class of Genealogical and Interacting Metropolis Models, to appear in Lecture Notes in Mathematics.
* P. Del Moral & A. Doucet, Sequential Monte Carlo Samplers, TR Cambridge Univ., CUED-F-INFENG no. 444.
7.3 Relations to Previous Work
Annealed importance sampling (Neal, 1998): no resampling, $M_n$ an MCMC kernel with invariant distribution $\pi_n$, and
$$L_n(x, dx') = \frac{M_n(x', dx)\,\pi_n(dx')}{\pi_n(dx)} \;\Rightarrow\; W^{(i)}_n \propto \frac{\pi_n\big(d\widetilde{X}^{(i)}_{n-1}\big)}{\pi_{n-1}\big(d\widetilde{X}^{(i)}_{n-1}\big)}.$$
Population Monte Carlo (TR, Cappé et al. 2002): homogeneous case $M_n = M$, $L_n = L$ and $\pi_n = \pi$, $M$ an MCMC kernel with invariant distribution $\pi$, and
$$L(x, dx') = \pi(dx') \;\Rightarrow\; W^{(i)}_n \propto \frac{\pi\big(d\widetilde{X}^{(i)}_n\big)}{M\big(\widetilde{X}^{(i)}_{n-1}, d\widetilde{X}^{(i)}_n\big)}.$$
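The annealed-importance-sampling special case can be sketched on a toy problem of my choosing (not Neal's example): anneal from $\pi_0 = \mathcal{N}(0, 10^2)$ to $\pi = \mathcal{N}(0, 1)$ along the tempered path of Section 7.1, using random-walk Metropolis kernels invariant for each intermediate distribution and accumulating the weights $\pi_n/\pi_{n-1}$. The number of temperatures, step size and particle count below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)

def log_target(x, g):
    # tempered path: log pi_g = (1 - g) log pi_0 + g log pi, g: 0 -> 1,
    # with pi_0 = N(0, 10^2) and pi = N(0, 1), both unnormalized
    return (1 - g) * (-0.5 * x**2 / 100.0) + g * (-0.5 * x**2)

N, steps = 5000, 25
gammas = np.linspace(0.0, 1.0, steps + 1)
x = rng.normal(0.0, 10.0, size=N)              # exact draws from pi_0
logw = np.zeros(N)
for g_prev, g in zip(gammas[:-1], gammas[1:]):
    logw += log_target(x, g) - log_target(x, g_prev)   # AIS weight update
    for _ in range(2):                                  # MH moves targeting pi_g
        prop = x + rng.normal(0.0, 1.0, size=N)
        accept = np.log(rng.uniform(size=N)) < log_target(prop, g) - log_target(x, g)
        x = np.where(accept, prop, x)

W = np.exp(logw - logw.max()); W /= W.sum()
est = np.sum(W * x**2)
print(est)                    # ≈ 1, the variance of pi = N(0, 1)
```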
7.4 A Few Comments
All SMC algorithms can be reused. Basic element of more complex algorithms.
Many degrees of freedom: optimization of $L_n$ given $M_n$...
Many convergence results available; see the papers of Del Moral & Miclo.
8.1 References
SMC Website:
Algorithms: Sequential Monte Carlo Methods in Practice, New York: Springer-Verlag.
Theoretical results: P. Del Moral & L. Miclo.
Special Issue of Applied Signal Processing, deadline April 30.
Many potential algorithmic and theoretical developments / Numerous applications.
Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig
More informationHauptman and Karle Joint and Conditional Probability Distributions. Robert H. Blessing, HWI/UB Structural Biology Department, January 2003 ( )
Hauptma ad Karle Joit ad Coditioal Probability Distributios Robert H Blessig HWI/UB Structural Biology Departmet Jauary 00 ormalized crystal structure factors are defied by E h = F h F h = f a hexp ihi
More informationo <Xln <X2n <... <X n < o (1.1)
Metrika, Volume 28, 1981, page 257-262. 9 Viea. Estimatio Problems for Rectagular Distributios (Or the Taxi Problem Revisited) By J.S. Rao, Sata Barbara I ) Abstract: The problem of estimatig the ukow
More informationStatistical Hypothesis Testing. STAT 536: Genetic Statistics. Statistical Hypothesis Testing - Terminology. Hardy-Weinberg Disequilibrium
Statistical Hypothesis Testig STAT 536: Geetic Statistics Kari S. Dorma Departmet of Statistics Iowa State Uiversity September 7, 006 Idetify a hypothesis, a idea you wat to test for its applicability
More informationVector Quantization: a Limiting Case of EM
. Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationTMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.
Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx
More informationJournal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula
Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials
More informationExpectation-Maximization Algorithm.
Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet
More informationOutline. CSCI-567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019
Outlie CSCI-567: Machie Learig Sprig 209 Gaussia mixture models Prof. Victor Adamchik 2 Desity estimatio U of Souther Califoria Mar. 26, 209 3 Naive Bayes Revisited March 26, 209 / 57 March 26, 209 2 /
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More information( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2
82 CHAPTER 4. MAXIMUM IKEIHOOD ESTIMATION Defiitio: et X be a radom sample with joit p.m/d.f. f X x θ. The geeralised likelihood ratio test g.l.r.t. of the NH : θ H 0 agaist the alterative AH : θ H 1,
More informationInvestigating the Significance of a Correlation Coefficient using Jackknife Estimates
Iteratioal Joural of Scieces: Basic ad Applied Research (IJSBAR) ISSN 2307-4531 (Prit & Olie) http://gssrr.org/idex.php?joural=jouralofbasicadapplied ---------------------------------------------------------------------------------------------------------------------------
More informationRegression with an Evaporating Logarithmic Trend
Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,
More informationMultilevel ensemble Kalman filtering
Multilevel esemble Kalma filterig Håko Hoel 1 Kody Law 2 Raúl Tempoe 3 1 Departmet of Mathematics, Uiversity of Oslo, Norway 2 Oak Ridge Natioal Laboratory, TN, USA 3 Applied Mathematics ad Computatioal
More informationG. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan
Deviatio of the Variaces of Classical Estimators ad Negative Iteger Momet Estimator from Miimum Variace Boud with Referece to Maxwell Distributio G. R. Pasha Departmet of Statistics Bahauddi Zakariya Uiversity
More information6.003 Homework #3 Solutions
6.00 Homework # Solutios Problems. Complex umbers a. Evaluate the real ad imagiary parts of j j. π/ Real part = Imagiary part = 0 e Euler s formula says that j = e jπ/, so jπ/ j π/ j j = e = e. Thus the
More informationLecture 24: Variable selection in linear models
Lecture 24: Variable selectio i liear models Cosider liear model X = Z β + ε, β R p ad Varε = σ 2 I. Like the LSE, the ridge regressio estimator does ot give 0 estimate to a compoet of β eve if that compoet
More information1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable
More informationLecture 11 and 12: Basic estimation theory
Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis
More informationCEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering
CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio
More informationStudy the bias (due to the nite dimensional approximation) and variance of the estimators
2 Series Methods 2. Geeral Approach A model has parameters (; ) where is ite-dimesioal ad is oparametric. (Sometimes, there is o :) We will focus o regressio. The fuctio is approximated by a series a ite
More informationLecture 13: Maximum Likelihood Estimation
ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select
More informationAsymptotic Coupling and Its Applications in Information Theory
Asymptotic Couplig ad Its Applicatios i Iformatio Theory Vicet Y. F. Ta Joit Work with Lei Yu Departmet of Electrical ad Computer Egieerig, Departmet of Mathematics, Natioal Uiversity of Sigapore IMS-APRM
More informationClustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.
Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More information