Advanced Sequential Monte Carlo Methods

1 Advanced Sequential Monte Carlo Methods
Arnaud Doucet
Departments of Statistics & Computer Science, University of British Columbia
A.D. () 1 / 35

2 Generic Sequential Monte Carlo Scheme
At time n = 1, sample X_1^(i) ~ q_1(·) and set w_1(X_1^(i)) = γ_1(X_1^(i)) / q_1(X_1^(i)). Resample {X_1^(i), W_1^(i)} to obtain new particles, also denoted X_1^(i).
At time n ≥ 2, sample X_n^(i) ~ q_n(·|X_{1:n-1}^(i)) and compute
w_n(X_{1:n}^(i)) = γ_n(X_{1:n}^(i)) / [γ_{n-1}(X_{1:n-1}^(i)) q_n(X_n^(i)|X_{1:n-1}^(i))].
Resample {X_{1:n}^(i), W_n^(i)} to obtain new particles, also denoted X_{1:n}^(i).

3 Sequential Monte Carlo for Hidden Markov Models
At time n = 1, sample X_1^(i) ~ q_1(·|y_1) and set w_1(X_1^(i)) = µ(X_1^(i)) g(y_1|X_1^(i)) / q_1(X_1^(i)|y_1). Resample {X_1^(i), W_1^(i)} to obtain new particles, also denoted X_1^(i).
At time n ≥ 2, sample X_n^(i) ~ q_n(·|y_n, X_{n-1}^(i)) and compute
w_n(X_{1:n}^(i)) = f(X_n^(i)|X_{n-1}^(i)) g(y_n|X_n^(i)) / q_n(X_n^(i)|y_n, X_{n-1}^(i)).
Resample {X_{1:n}^(i), W_n^(i)} to obtain new particles, also denoted X_{1:n}^(i).
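
To make the scheme above concrete, here is a minimal bootstrap particle filter sketch for a scalar linear-Gaussian state-space model, taking q_n = f so that the incremental weight reduces to g(y_n|x_n); the model and all parameter values are illustrative choices, not from the slides.

```python
import numpy as np

def bootstrap_filter(y, n_particles=1000, phi=0.9, sigma_v=1.0, sigma_w=0.5, rng=None):
    """Bootstrap particle filter for X_n = phi X_{n-1} + sigma_v V_n,
    Y_n = X_n + sigma_w W_n, with V_n, W_n i.i.d. N(0, 1)."""
    rng = rng or np.random.default_rng(0)
    N = n_particles
    # Time 1: sample from the stationary prior.
    x = rng.normal(0.0, sigma_v / np.sqrt(1 - phi**2), size=N)
    means = []
    for yn in y:
        logw = -0.5 * ((yn - x) / sigma_w) ** 2       # log g(y_n | x_n), up to a constant
        w = np.exp(logw - logw.max()); w /= w.sum()
        means.append(np.sum(w * x))                    # filtering mean estimate
        x = x[rng.choice(N, size=N, p=w)]              # multinomial resampling
        x = phi * x + sigma_v * rng.normal(size=N)     # propagate with q_n = f(. | x_{n-1})
    return np.array(means)

# Simulate data from the same model, then filter it.
rng = np.random.default_rng(1)
T, phi, sv, sw = 50, 0.9, 1.0, 0.5
xs = np.zeros(T); xs[0] = rng.normal(0, sv / np.sqrt(1 - phi**2))
for t in range(1, T):
    xs[t] = phi * xs[t - 1] + sv * rng.normal()
ys = xs + sw * rng.normal(size=T)
est = bootstrap_filter(ys, n_particles=2000)
print(round(float(np.corrcoef(est, xs)[0, 1]), 2))
```

The filtering means track the hidden states closely for this well-mixing model, which is the regime where resampling helps most.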

4 Resampling can drastically improve the performance of SIS in models having good mixing properties, e.g. state-space models: this can be verified experimentally and theoretically.
Resampling does not solve all our problems; only the SMC approximations of the most recent marginals π_n(x_{n-L+1:n}) are reliable; i.e. we can have uniform (in time) convergence bounds.
A.D. () 4 / 35

5 A Limited Framework?
It seems that there is not much to do to improve over this SMC scheme. Given a sequence of distributions π_n(x_{1:n}), use your favourite resampling scheme; then the only degree of freedom is essentially q_n(x_n|x_{1:n-1}).
We know that the best choice is q_opt(x_n|x_{1:n-1}) = π_n(x_n|x_{1:n-1}), so how can we do any better?
Answer: Modify the sequence of target distributions and the associated proposals in a sensible way.

6 Advanced SMC Methods
Auxiliary particle filter
Resample-move algorithm
Block sampling strategy

7 Auxiliary Particle Filter
This is a very popular strategy introduced by Pitt & Shephard (1999). It was originally introduced using auxiliary variables, but the presentation here is completely different.
Initial Remark: The standard SMC algorithm appears very inefficient when
q(x_n|y_n, x_{n-1}) = p(x_n|y_n, x_{n-1}) ⇒ w_n(x_{1:n}) ∝ p(y_n|x_{n-1}).

8 Standard Algorithm vs Alternative Strategy
Standard Strategy: Sample X_n^(i) ~ p(·|y_n, X_{n-1}^(i)) and compute w_n(X_{1:n}^(i)) ∝ p(y_n|X_{n-1}^(i)). Resample {X_{1:n}^(i), W_n^(i)} to obtain new particles, also denoted X_{1:n}^(i).
Alternative Strategy: Compute w_n(X_{1:n-1}^(i)) ∝ p(y_n|X_{n-1}^(i)). Resample {X_{1:n-1}^(i), W_n^(i)} to obtain new particles, also denoted X_{1:n-1}^(i), approximately distributed according to p(x_{1:n-1}|y_{1:n}). Then sample X_n^(i) ~ p(·|y_n, X_{n-1}^(i)).
We swap the sampling and resampling steps; this yields more diverse particles at time n, hence intuitively a better estimate.

9 Auxiliary Particle Filter
We can only swap the sampling and resampling steps when q(x_n|y_n, x_{n-1}) = p(x_n|y_n, x_{n-1}), as the resulting weight is then independent of x_n; i.e. we have w_n(x_{1:n}) ∝ p(y_n|x_{n-1}).
If we cannot sample from p(x_n|y_n, x_{n-1}) and/or do not know p(y_n|x_{n-1}), we can simply propose to use approximations p̂(x_n|y_n, x_{n-1}) and p̂(y_n|x_{n-1}) and correct for the bias.
The APF is thus essentially a look-ahead strategy where we try to anticipate the quality of our current particles X_{1:n-1}^(i) with respect to y_n.

10 Algorithm
Assuming you have {X_{1:n-1}^(i), W_{n-1}^(i)} approximating p(x_{1:n-1}|y_{1:n-1}):
Compute W̃_n^(i) ∝ W_{n-1}^(i) p̂(y_n|X_{n-1}^(i)).
Resample {X_{1:n-1}^(i), W̃_n^(i)} to obtain new particles, also denoted X_{1:n-1}^(i), approximately distributed according to p̃(x_{1:n-1}|y_{1:n}) ∝ p(x_{1:n-1}|y_{1:n-1}) p̂(y_n|x_{n-1}).
Sample X_n^(i) ~ p̂(·|y_n, X_{n-1}^(i)) and reweight:
W_n^(i) ∝ p(X_{1:n}^(i)|y_{1:n}) / [p̃(X_{1:n-1}^(i)|y_{1:n}) p̂(X_n^(i)|y_n, X_{n-1}^(i))]
∝ f(X_n^(i)|X_{n-1}^(i)) g(y_n|X_n^(i)) / [p̂(y_n|X_{n-1}^(i)) p̂(X_n^(i)|y_n, X_{n-1}^(i))].
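
A minimal sketch of one "fully adapted" APF step for a scalar linear-Gaussian model, where p(y_n|x_{n-1}) and p(x_n|y_n, x_{n-1}) are available in closed form, so p̂ = p and the final reweighting above is constant; the model and all numbers are illustrative, not from the slides.

```python
import numpy as np

def apf_step(x_prev, y, phi=0.9, sv=1.0, sw=1.0, rng=None):
    """One fully adapted APF step for X_n = phi X_{n-1} + sv V_n, Y_n = X_n + sw W_n:
    weight by p(y_n | x_{n-1}), resample, then sample from p(x_n | y_n, x_{n-1})."""
    rng = rng or np.random.default_rng(0)
    N = len(x_prev)
    mu_pred = phi * x_prev
    # First-stage weights: predictive likelihood p(y_n | x_{n-1}) = N(y; phi x, sv^2 + sw^2).
    logw = -0.5 * (y - mu_pred) ** 2 / (sv**2 + sw**2)
    w = np.exp(logw - logw.max()); w /= w.sum()
    x_prev = x_prev[rng.choice(N, size=N, p=w)]      # resample BEFORE sampling x_n
    # Optimal proposal p(x_n | y_n, x_{n-1}) is Gaussian here.
    s2 = 1.0 / (1.0 / sv**2 + 1.0 / sw**2)
    m = s2 * (phi * x_prev / sv**2 + y / sw**2)
    return m + np.sqrt(s2) * rng.normal(size=N)

rng = np.random.default_rng(2)
x = rng.normal(size=5000)          # particles at time n-1, here ~ N(0, 1)
x = apf_step(x, y=1.5, rng=rng)    # assimilate an informative observation
print(round(float(x.mean()), 2))
```

Because the resampling uses p(y_n|x_{n-1}) before x_n is drawn, the surviving ancestors are already adapted to the new observation.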

11 Interpretation as a standard SMC algorithm
It is easy to check that this algorithm is nothing but a standard SMC algorithm for
π_n(x_{1:n}) ∝ p(x_{1:n}|y_{1:n}) p̂(y_{n+1}|x_n).
We do not target p(x_{1:n}|y_{1:n}) directly, so it is necessary to use IS to correct for the discrepancy between this target and π_{n-1}(x_{1:n-1}) p̂(x_n|y_n, x_{n-1}), the distribution of the particles obtained after the sampling step:
W_n^(i) ∝ p(X_{1:n}^(i)|y_{1:n}) / [π_{n-1}(X_{1:n-1}^(i)) p̂(X_n^(i)|y_n, X_{n-1}^(i))].
All the convergence results for standard SMC can thus straightforwardly be extended to the APF.
Perhaps surprisingly, the APF does not dominate uniformly the standard SMC scheme, even if q(x_n|y_n, x_{n-1}) = p(x_n|y_n, x_{n-1}). This is because it is just a one-step optimization procedure.

12 Design Issues
In the literature, it is often suggested to approximate
p(y_n|x_{n-1}) = ∫ g(y_n|x_n) f(x_n|x_{n-1}) dx_n
via p̂(y_n|x_{n-1}) = g(y_n|µ(x_{n-1})), where µ(x_{n-1}) is the mode, mean or median of f(x_n|x_{n-1}).
Typically, people tend to build an approximation p̂(x_n|y_n, x_{n-1}) independently of the approximation p̂(y_n|x_{n-1}).
A simpler and better way consists of building an approximation p̂(x_n, y_n|x_{n-1}) = p̂(x_n|y_n, x_{n-1}) p̂(y_n|x_{n-1}) of p(x_n, y_n|x_{n-1}) = g(y_n|x_n) f(x_n|x_{n-1}) such that
p(x_n, y_n|x_{n-1}) / p̂(x_n, y_n|x_{n-1}) < C < ∞.

13 Limitations
The algorithms described earlier suffer from several limitations. Even if the optimal importance distribution p(x_n|y_n, x_{n-1}) can be used, this does not guarantee that the SMC algorithms will be efficient. Indeed, if the variance of p(y_n|x_{n-1}) is high, then the variance of the resulting approximation will be high. Hence it will be necessary to resample very frequently, and the approximation p̂(x_{1:n}|y_{1:n}) of the joint distribution p(x_{1:n}|y_{1:n}) will be unreliable.
One major problem with the approaches discussed above is that only the variables X_n^(i) are sampled at time n, while the path values X_{1:n-1}^(i) remain fixed. An obvious way to improve upon these algorithms would involve not only sampling X_n^(i) at time n, but also modifying the values of the paths over a fixed lag, X_{n-L+1:n-1}^(i) for L > 1, in light of the new observation y_n; L being fixed or upper bounded.

14 Resample-Move
The Resample-Move algorithm (Gilks & Berzuini, JRSS B, 2001) is a standard approach to mitigate this problem. Like MCMC, it relies upon Markov kernels with appropriate invariant distributions. Whilst MCMC uses such kernels to generate collections of correlated samples, the Resample-Move algorithm uses them within an SMC algorithm as a principled way to jitter the particle locations and thus to reduce degeneracy.
A Markov kernel K_n(x'_{1:n}|x_{1:n}) of invariant distribution p(x_{1:n}|y_{1:n}) is a Markov transition kernel with the property that
∫ p(x_{1:n}|y_{1:n}) K_n(x'_{1:n}|x_{1:n}) dx_{1:n} = p(x'_{1:n}|y_{1:n}).
For such a kernel, if X_{1:n} ~ p(x_{1:n}|y_{1:n}) and X'_{1:n}|X_{1:n} ~ K_n(·|X_{1:n}), then X'_{1:n} ~ p(x_{1:n}|y_{1:n}). Even if X_{1:n} is not distributed according to p(x_{1:n}|y_{1:n}), then, after applying K_n, X'_{1:n} can only have a distribution closer to p(x_{1:n}|y_{1:n}) in total variation than that of X_{1:n}.

15 Examples of Markov Kernels
For example, we could consider the following Gibbs sampler: set x'_{1:n-L} = x_{1:n-L}, then sample x'_{n-L+1} from p(x_{n-L+1}|y_{1:n}, x_{1:n-L}, x_{n-L+2:n}), sample x'_{n-L+2} from p(x_{n-L+2}|y_{1:n}, x'_{1:n-L+1}, x_{n-L+3:n}), and so on until we sample x'_n from p(x_n|y_{1:n}, x'_{1:n-1}); that is,
K_n(x'_{1:n}|x_{1:n}) = δ_{x_{1:n-L}}(x'_{1:n-L}) ∏_{k=n-L+1}^{n} p(x'_k|y_{1:n}, x'_{1:k-1}, x_{k+1:n}),
and we write, with a slight abuse of notation, K_n(x'_{n-L+1:n}|x_{1:n}) for the non-degenerate component of the MCMC kernel. It is straightforward to verify that this kernel is p(x_{1:n}|y_{1:n})-invariant.

16 If it is not possible to sample from p(x_k|y_{1:n}, x'_{1:k-1}, x_{k+1:n}) = p(x_k|y_k, x'_{k-1}, x_{k+1}), we can instead employ a Metropolis-Hastings (MH) strategy and sample a candidate x'_k according to some proposal q(x'_k|y_k, x'_{k-1}, x_{k:k+1}) and accept it with the usual MH acceptance probability
min{1, [p(x'_{1:k}, x_{k+1:n}|y_{1:n}) q(x_k|y_k, x'_{k-1}, x'_k, x_{k+1})] / [p(x'_{1:k-1}, x_{k:n}|y_{1:n}) q(x'_k|y_k, x'_{k-1}, x_{k:k+1})]}
= min{1, [g(y_k|x'_k) f(x_{k+1}|x'_k) f(x'_k|x'_{k-1}) q(x_k|y_k, x'_{k-1}, x'_k, x_{k+1})] / [g(y_k|x_k) f(x_{k+1}|x_k) f(x_k|x'_{k-1}) q(x'_k|y_k, x'_{k-1}, x_{k:k+1})]}.
These kernels can be ergodic only if L = n, so that all of the components of x_{1:n} are updated. However, in our context we will use non-ergodic kernels, as we restrict ourselves to updating the variables X_{n-L+1:n} for some fixed or bounded L.

17 Resample-Move
Assuming we have access to {X_{1:n-1}^(i)} approximately distributed according to p(x_{1:n-1}|y_{1:n-1}), then at times n ≥ L:
Sample X̃_n^(i) ~ q(·|y_n, X_{n-1}^(i)) and set X̃_{1:n}^(i) = (X_{1:n-1}^(i), X̃_n^(i)).
Compute the weights W_n^(i) ∝ g(y_n|X̃_n^(i)) f(X̃_n^(i)|X_{n-1}^(i)) / q(X̃_n^(i)|y_n, X_{n-1}^(i)).
Resample {W_n^(i), X̃_{1:n}^(i)} to obtain N new equally-weighted particles X̄_{1:n}^(i).
Sample X_{n-L+1:n}^(i) ~ K_n(x_{n-L+1:n}|X̄_{1:n}^(i)) and set X_{1:n}^(i) = (X̄_{1:n-L}^(i), X_{n-L+1:n}^(i)).
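
The jitter idea can be sketched as follows: after resampling a one-step path extension, apply a random-walk MH kernel leaving p(x_n|y_n, x_{n-1}) invariant (an L = 1 move). The scalar Gaussian model, the step size and the proposal are illustrative choices, not from the slides.

```python
import numpy as np

def mh_jitter(x_n, x_prev, y, phi=0.9, sv=1.0, sw=1.0, step=0.5, rng=None):
    """One random-walk MH move leaving p(x_n | y_n, x_{n-1}) invariant (an L = 1
    move), used to rejuvenate particles duplicated by resampling."""
    rng = rng or np.random.default_rng(0)
    def logtarget(x):  # log f(x | x_prev) + log g(y | x), up to constants
        return -0.5 * (x - phi * x_prev) ** 2 / sv**2 - 0.5 * (y - x) ** 2 / sw**2
    prop = x_n + step * rng.normal(size=len(x_n))
    accept = np.log(rng.uniform(size=len(x_n))) < logtarget(prop) - logtarget(x_n)
    return np.where(accept, prop, x_n)

rng = np.random.default_rng(3)
x_prev = rng.normal(size=2000)                 # particles at time n-1
y = 1.0                                        # current observation
x_n = 0.9 * x_prev + rng.normal(size=2000)     # prior (bootstrap) proposal
logw = -0.5 * (y - x_n) ** 2                   # weights g(y | x_n), sw = 1
w = np.exp(logw - logw.max()); w /= w.sum()
idx = rng.choice(2000, size=2000, p=w)         # resample whole paths, duplicating some
x_prev, x_n = x_prev[idx], x_n[idx]
before = len(np.unique(x_n))
x_n = mh_jitter(x_n, x_prev, y, rng=rng)
after = len(np.unique(x_n))
print(before, after)                           # the MCMC move restores diversity
```

Every accepted move turns a duplicated particle into a fresh draw, without changing the distribution being approximated, thanks to the invariance property.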

18 Interpretation as a standard SMC algorithm
We can justify inserting MCMC transitions within an SMC algorithm as follows. Given a target distribution π, an instrumental distribution µ and a π-invariant Markov kernel K, the following generalization of the IS identity is trivially true:
∫ π(y) φ(y) dy = ∫∫ [π(y) L(x|y) / (µ(x) K(y|x))] µ(x) K(y|x) φ(y) dx dy
for any Markov kernel L.
This approach corresponds to IS on an enlarged space using µ(x) K(y|x) as the proposal distribution for a target π(y) L(x|y), and then estimating a function φ̄(x, y) = φ(y).

19 In particular, for the time-reversal kernel associated with K,
L(x|y) = π(x) K(y|x) / π(y),
we have the importance weight
π(y) L(x|y) / [µ(x) K(y|x)] = π(x) / µ(x).
This interpretation of such an approach illustrates its deficiency: the importance weights depend only upon the location before the MCMC move, while the sample depends upon the location after the move. Even if the kernel were perfectly mixing, leading to a collection of iid samples from the target distribution, some of these samples would be eliminated and some replicated in the resampling step.
Resampling after an MCMC step will always lead to greater sample diversity than performing the steps in the other order (and this algorithm can be justified directly by the invariance property).

20 It is possible to reformulate this algorithm as a specific application of the generic SMC algorithm. To simplify notation we write q_n(x_n|x_{n-1}) for q(x_n|y_n, x_{n-1}) and, to clarify our argument, it is necessary to add a superscript to the variables; e.g. X_k^p corresponds to the p-th time the random variable X_k is sampled; in this and the following section, this superscript does not denote the particle index.
Resample-Move is then a generic SMC algorithm associated with an extended target distribution π_n over all the successively sampled variables, built from p(x_{1:n}|y_{1:n}) and the time-reversal kernels L_k associated with the MCMC kernels K_k, and admitting p(x_{1:n}|y_{1:n}) as a marginal.

21 If no resampling is used, then the proposal distribution on this extended space is obtained by composing the importance distributions q_k and the MCMC kernels K_k used in the successive sampling steps.
This sequence of target distributions admits the filtering distributions of interest as marginals. The clear theoretical advantage of using MCMC moves is that the use of even non-ergodic MCMC kernels {K_n} can only improve the mixing properties of {π_n} compared to the natural sequence of filtering distributions; this explains why these algorithms outperform a standard particle filter for a given number of particles.

22 Block Sampling
Resample-Move suffers from a major drawback: although it allows us to reintroduce some diversity among the set of particles after the resampling step over a lag of length L > 1, the importance weights have the same expression as for the standard particle filter. This strategy does not significantly decrease the number of resampling steps compared to a standard approach. It can partially mitigate the problems associated with resampling, but it does not prevent these resampling steps in the first place.
An alternative block sampling approach consists of directly sampling the components x_{n-L+1:n} at time n; the previously-sampled values of the components x_{n-L+1:n-1} are simply discarded.

23 Why it is not trivial...
The basic idea is trivial but not applicable... Consider you have X_{1:n-1} ~ p(x_{1:n-1}|y_{1:n-1}) and at time n you sample X'_{n-L+1:n} ~ q(·|X_{n-L}, y_{n-L+1:n}); then the joint distribution of (X_{1:n-1}, X'_{n-L+1:n}) is
p(x_{1:n-1}|y_{1:n-1}) q(x'_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}).
If we discard X_{n-L+1:n-1}, then the distribution of (X_{1:n-L}, X'_{n-L+1:n}) is
∫ p(x_{1:n-1}|y_{1:n-1}) q(x'_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}) dx_{n-L+1:n-1} = p(x_{1:n-L}|y_{1:n-1}) q(x'_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}).
We typically do not know p(x_{1:n-L}|y_{1:n-1}) up to a normalizing constant, so we cannot use IS!

24 Extended Importance Sampling
The idea consists of using an extended target distribution
p(x_{1:n-L}, x'_{n-L+1:n}|y_{1:n}) q(x_{n-L+1:n-1}|y_{n-L+1:n-1}, x_{n-L}),
whose marginal is by construction p(x_{1:n-L}, x'_{n-L+1:n}|y_{1:n}). We then use IS on this extended state-space and compute the weights
[p(x_{1:n-L}, x'_{n-L+1:n}|y_{1:n}) q(x_{n-L+1:n-1}|y_{n-L+1:n-1}, x_{n-L})] / [p(x_{1:n-1}|y_{1:n-1}) q(x'_{n-L+1:n}|x_{n-L}, y_{n-L+1:n})].

25 Algorithm Settings
If we knew how to compute p(x_{1:n-L}|y_{1:n}), then the IS distribution minimizing the variance of the importance weight would be
q_opt(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}) = p(x_{n-L+1:n}|y_{n-L+1:n}, x_{n-L}),
and the importance weight would be proportional to
p(y_{n-L+1:n}|x_{n-L}) = ∫ ∏_{k=n-L+1}^{n} f(x_k|x_{k-1}) g(y_k|x_k) dx_{n-L+1:n}.
This optimal weight has a variance which typically decreases exponentially fast with L (under mixing assumptions). As this distribution is typically not available and/or p(y_{n-L+1:n}|x_{n-L}) cannot be computed, we need to use an approximation.
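
The shrinking dependence of the weight on the conditioning state can be checked numerically in a scalar linear-Gaussian model, where the incremental weight p(y_n|y_{n-L+1:n-1}, x_{n-L}) is computable exactly by a Kalman recursion. The model, the lag values and the use of stationary draws for the conditioning particles are illustrative assumptions, not from the slides.

```python
import numpy as np

def log_block_lik(x0, ys, phi=0.9, sv=1.0, sw=1.0):
    """log p(y_{1:L} | x_0) for X_k = phi X_{k-1} + sv V_k, Y_k = X_k + sw W_k,
    computed exactly with a scalar Kalman recursion, vectorized over states x0."""
    m, P = phi * x0, sv**2 * np.ones_like(x0)
    ll = np.zeros_like(x0)
    for y in ys:
        S = P + sw**2                                # innovation variance
        ll += -0.5 * np.log(2 * np.pi * S) - 0.5 * (y - m) ** 2 / S
        K = P / S                                    # Kalman gain
        m, P = m + K * (y - m), (1.0 - K) * P        # update with y
        m, P = phi * m, phi**2 * P + sv**2           # predict next state
    return ll

rng = np.random.default_rng(4)
x0 = rng.normal(0.0, 1.0 / np.sqrt(1 - 0.9**2), size=5000)  # stationary particles
ys = rng.normal(size=10)                                    # a fixed observation record
variances = []
for L in (1, 2, 5, 10):
    # incremental log-weight log p(y_10 | y_{11-L:9}, x_{10-L}); shared terms cancel
    lw = log_block_lik(x0, ys[10 - L:]) - log_block_lik(x0, ys[10 - L:9])
    variances.append(float(np.var(lw)))
print([round(v, 6) for v in variances])   # shrinks as L grows
```

The variance of the log-weight over the particle cloud collapses geometrically with the lag, which is exactly why larger blocks require far fewer resampling steps.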

26 The distribution q(x_{n-L+1:n-1}|y_{n-L+1:n-1}, x_{n-L}) minimizing the variance of the weights is simply
q_opt(x_{n-L+1:n-1}|y_{n-L+1:n-1}, x_{n-L}) = p(x_{1:n-1}|y_{1:n-1}) q(x'_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}) / ∫ p(x_{1:n-1}|y_{1:n-1}) q(x'_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}) dx_{n-L+1:n-1}.
So if we pick q_opt(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}) = p(x_{n-L+1:n}|y_{n-L+1:n}, x_{n-L}), then
q_opt(x_{n-L+1:n-1}|y_{n-L+1:n-1}, x_{n-L}) = p(x_{n-L+1:n-1}|y_{n-L+1:n-1}, x_{n-L}).
This suggests once more using an approximation of this density.

27 Block Sampling SMC
Assuming we have access to {X_{1:n-1}^(i)} approximately distributed according to p(x_{1:n-1}|y_{1:n-1}), then at time n:
Sample X'^(i)_{n-L+1:n} ~ p̂(·|y_{n-L+1:n}, X_{n-L}^(i)).
Compute the weights
W_n^(i) ∝ [p(X_{1:n-L}^(i), X'^(i)_{n-L+1:n}, y_{1:n}) p̂(X_{n-L+1:n-1}^(i)|y_{n-L+1:n-1}, X_{n-L}^(i))] / [p(X_{1:n-1}^(i), y_{1:n-1}) p̂(X'^(i)_{n-L+1:n}|y_{n-L+1:n}, X_{n-L}^(i))].
Resample {W_n^(i), (X_{1:n-L}^(i), X'^(i)_{n-L+1:n})} to obtain N new equally weighted particles X_{1:n}^(i).

28 Interpretation as a standard SMC algorithm
Once more this is just a special case of the generic SMC algorithm. To simplify notation we write q_n(x_{n-L+1:n}|x_{n-L}) for q(x_{n-L+1:n}|y_{n-L+1:n}, x_{n-L}) and, to clarify our argument, we add a superscript to the variables; e.g. X_k^p corresponds to the p-th time the random variable X_k is sampled.
The block sampling algorithm then corresponds to a generic SMC algorithm whose extended target π_n incorporates the discarded blocks through the artificial conditional distributions introduced above; if no resampling is used, a path is sampled by composing the successive block proposals q_k.

29 Application to Bearings-only Tracking
Target model: X_n = A X_{n-1} + V_n, where V_n i.i.d. N(0, Σ) and A is the standard constant-velocity transition matrix with sampling period T,
A = [1 T 0 0; 0 1 0 0; 0 0 1 T; 0 0 0 1].
The state vector X_n = (X_n^1, X_n^2, X_n^3, X_n^4)^T is such that X^1 (resp. X^3) corresponds to the horizontal (resp. vertical) position of the target, whereas X^2 (resp. X^4) corresponds to the horizontal (resp. vertical) velocity.
One only receives observations of the bearings of the target from a sensor located at the origin:
Y_n = tan⁻¹(X_n^3 / X_n^1) + W_n, where W_n i.i.d. N(0, σ_W²) with σ_W² very small; i.e. the observations are almost noiseless.
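
A short simulation sketch of this tracking model. The sampling period T, the noise covariance Σ, the bearing-noise standard deviation and the initial state are illustrative values not given in the slide, and arctan2 is used as the bearing function to avoid quadrant ambiguity.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 1.0                                    # sampling period (illustrative)
A = np.array([[1, T, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, T],
              [0, 0, 0, 1]], dtype=float)  # constant-velocity dynamics
Sigma = 0.01 * np.eye(4)                   # process noise covariance (illustrative)
sw = 0.005                                 # bearing noise std: almost noiseless

x = np.array([1.0, 0.1, 1.0, -0.05])       # initial (pos, vel, pos, vel)
xs, ys = [], []
for _ in range(100):
    x = A @ x + rng.multivariate_normal(np.zeros(4), Sigma)
    xs.append(x.copy())
    ys.append(np.arctan2(x[2], x[0]) + sw * rng.normal())  # bearing from the origin
xs, ys = np.array(xs), np.array(ys)
print(xs.shape, len(ys))  # → (100, 4) 100
```

The near-noiseless bearings make the bootstrap proposal very poor here, which is why the EKF-based block proposals of the next slides pay off.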

30 We build an approximation p̂(x_{n-L+1:n}|y_{n-L+1:n}, x_{n-L}) of the optimal importance distribution using the EKF and the forward filtering/backward sampling formula.
We compare: the standard bootstrap filter; two resample-move algorithms in which the SISR algorithm for L = 1 using the EKF proposal is followed by (i) one-at-a-time Metropolis-Hastings moves using an approximation of the full conditionals p(x_k|y_k, x_{k-1}, x_{k+1}) as a proposal over a lag L = 10 (algorithm RML(10)), and (ii) the EKF proposal for L = 10 (algorithm RMFL(10)); and the block sampling algorithms for L = 1, 2, 5 and 10 using the EKF proposal, denoted SMC-EKF(L).
Systematic resampling is performed whenever the ESS goes below N/2.

31 Comparison in terms of resampling steps

Filter      | Avge. # Resampling steps
Bootstrap   | 46.7
SMC-EKF(1)  | 44.6
RML(10)     | 45.2
RMFL(10)    | 43.3
SMC-EKF(2)  | 34.9
SMC-EKF(5)  | 4.6
SMC-EKF(10) | 0.3

Table 1: Average number of resampling steps per simulation.

32 Figure: Average number of unique particles X_n^(i) approximating p(x_n|y_{1:n}) plotted against time (x-axis), for the Bootstrap, RMFL(10), SMC-EKF(5) and SMC-EKF(10) filters.

33 Application to Stochastic Volatility
We consider a standard SV model:
X_n = φ X_{n-1} + σ V_n, X_1 ~ N(0, σ²/(1 − φ²)),
Y_n = β exp(X_n/2) W_n,
where V_n i.i.d. N(0, 1) and W_n i.i.d. N(0, 1).
We propose to build an approximation of p(x_{n-L+1:n}|y_{n-L+1:n}, x_{n-L}) using the fact that
log Y_n² = log β² + X_n + log W_n².
We approximate the non-Gaussian noise term log W_n² with a Gaussian noise of similar mean and variance and hence obtain a linear Gaussian model approximation. We then use the Kalman filter to build our proposal.
The performance of our algorithms is assessed through computer simulations based on varying sample sizes to attain an approximately equal computational cost.
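
The Gaussianization step can be checked numerically: for W_n ~ N(0, 1), log W_n² is a log-χ²₁ variable with mean −γ − log 2 ≈ −1.27 and variance π²/2 ≈ 4.93, and the moment-matched Gaussian reuses these two numbers. The Monte Carlo check below and the values of β and x_n are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
w = rng.normal(size=1_000_000)
z = np.log(w**2)                    # the non-Gaussian noise term log W_n^2
mean_z, var_z = float(z.mean()), float(z.var())
print(round(mean_z, 2), round(var_z, 2))
# Closed forms: E[log W^2] = -gamma - log 2 ~ -1.27, Var[log W^2] = pi^2/2 ~ 4.93.

# Linearized observation for the KF proposal: log Y_n^2 = log beta^2 + X_n + Z_n,
# with Z_n replaced by N(mean_z, var_z); beta and x_n below are illustrative.
beta, x_n = 0.7, 0.3
log_y2 = np.log(beta**2) + x_n + z[:10]   # simulated linearized observations
```

With log W² replaced by this Gaussian, (X_n, log Y_n²) forms a linear Gaussian model, so exact Kalman recursions can deliver the block proposal.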

34
Filter      | # Particles | Avge. # Resampling Steps
Bootstrap   |             |
SMC-EKF(1)  |             |
SMC-EKF(2)  | 4           | 8.
SMC-EKF(5)  |             | 6.6
SMC-EKF(10) |             | 0.45

Table 2: Average number of resampling steps per simulation.

35 Figure: Average number of unique particles X_n^(i) approximating p(x_n|y_{1:n}) plotted against time (x-axis), for the Bootstrap, SMC-EKF(1), SMC-EKF(5) and SMC-EKF(10) filters.

36 Figure: Empirical measure approximations of p(x_n|y_{1:n}) (posterior density over the sample space, plotted against time) at four time instances, for Bootstrap (top left), SMC-EKF(1) (top right), SMC-EKF(5) (bottom left) and SMC-EKF(10) (bottom right).


Entropy Rates and Asymptotic Equipartition Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,

More information

A goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality

A goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality A goodess-of-fit test based o the empirical characteristic fuctio ad a compariso of tests for ormality J. Marti va Zyl Departmet of Mathematical Statistics ad Actuarial Sciece, Uiversity of the Free State,

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Kolmogorov-Smirnov type Tests for Local Gaussianity in High-Frequency Data

Kolmogorov-Smirnov type Tests for Local Gaussianity in High-Frequency Data Proceedigs 59th ISI World Statistics Cogress, 5-30 August 013, Hog Kog (Sessio STS046) p.09 Kolmogorov-Smirov type Tests for Local Gaussiaity i High-Frequecy Data George Tauche, Duke Uiversity Viktor Todorov,

More information

Expectation-Maximization Algorithm.

Expectation-Maximization Algorithm. Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statisticians use the word population to refer the total number of (potential) observations under consideration 6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Lecture 11 and 12: Basic estimation theory

Lecture 11 and 12: Basic estimation theory Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis

More information

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze

More information

Module 1 Fundamentals in statistics

Module 1 Fundamentals in statistics Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly

More information

Generalized Semi- Markov Processes (GSMP)

Generalized Semi- Markov Processes (GSMP) Geeralized Semi- Markov Processes (GSMP) Summary Some Defiitios Markov ad Semi-Markov Processes The Poisso Process Properties of the Poisso Process Iterarrival times Memoryless property ad the residual

More information

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT Itroductio to Extreme Value Theory Laures de Haa, ISM Japa, 202 Itroductio to Extreme Value Theory Laures de Haa Erasmus Uiversity Rotterdam, NL Uiversity of Lisbo, PT Itroductio to Extreme Value Theory

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

The random version of Dvoretzky s theorem in l n

The random version of Dvoretzky s theorem in l n The radom versio of Dvoretzky s theorem i l Gideo Schechtma Abstract We show that with high probability a sectio of the l ball of dimesio k cε log c > 0 a uiversal costat) is ε close to a multiple of the

More information

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8)

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8) Elemets of Statistical Methods Lots of Data or Large Samples (Ch 8) Fritz Scholz Sprig Quarter 2010 February 26, 2010 x ad X We itroduced the sample mea x as the average of the observed sample values x

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Kernel density estimator

Kernel density estimator Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I

More information

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

A survey on penalized empirical risk minimization Sara A. van de Geer

A survey on penalized empirical risk minimization Sara A. van de Geer A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the

More information

Direction: This test is worth 150 points. You are required to complete this test within 55 minutes.

Direction: This test is worth 150 points. You are required to complete this test within 55 minutes. Term Test 3 (Part A) November 1, 004 Name Math 6 Studet Number Directio: This test is worth 10 poits. You are required to complete this test withi miutes. I order to receive full credit, aswer each problem

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

MA131 - Analysis 1. Workbook 9 Series III

MA131 - Analysis 1. Workbook 9 Series III MA3 - Aalysis Workbook 9 Series III Autum 004 Cotets 4.4 Series with Positive ad Negative Terms.............. 4.5 Alteratig Series.......................... 4.6 Geeral Series.............................

More information

MAS275 Probability Modelling

MAS275 Probability Modelling MAS275 Probability Modellig 6 Poisso processes 6.1 Itroductio Poisso processes are a particularly importat topic i probability theory. The oe-dimesioal Poisso process, which most of this sectio will be

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

The Poisson Process *

The Poisson Process * OpeStax-CNX module: m11255 1 The Poisso Process * Do Johso This work is produced by OpeStax-CNX ad licesed uder the Creative Commos Attributio Licese 1.0 Some sigals have o waveform. Cosider the measuremet

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information

Comparison of Resampling Schemes for Particle Filtering

Comparison of Resampling Schemes for Particle Filtering Compariso of Resamplig Schemes for article Filterig Radal Douc Ecole olytechique 928 alaiseau, Frace douc at cmapx.polytechique.fr Eric Moulies GET Télécom aris 46 rue Barrault, 75634 aris, Frace moulies

More information

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018) Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Vector Quantization: a Limiting Case of EM

Vector Quantization: a Limiting Case of EM . Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z

More information