Efficient Block Sampling Strategies for Sequential Monte Carlo Methods


Arnaud DOUCET, Mark BRIERS, and Stéphane SÉNÉCAL

Sequential Monte Carlo (SMC) methods are a powerful set of simulation-based techniques for sampling sequentially from a sequence of complex probability distributions. These methods rely on a combination of importance sampling and resampling techniques. In a Markov chain Monte Carlo (MCMC) framework, block sampling strategies often perform much better than algorithms based on one-at-a-time sampling strategies if good proposal distributions to update blocks of variables can be designed. In an SMC framework, standard algorithms sequentially sample the variables one at a time whereas, like MCMC, the efficiency of algorithms could be improved significantly by using block sampling strategies. Unfortunately, a direct implementation of such strategies is impossible as it requires the knowledge of integrals which do not admit closed-form expressions. This article introduces a new methodology which bypasses this problem and is a natural extension of standard SMC methods. Applications to several sequential Bayesian inference problems demonstrate these methods.

Key Words: Block sequential Monte Carlo; Importance sampling; Markov chain Monte Carlo; Optimal filtering; Particle filtering; State-space models.

1. INTRODUCTION

Sequential Monte Carlo (SMC) methods are a set of flexible simulation-based methods for sampling from a sequence of probability distributions, each distribution being only known up to a normalizing constant. These methods were originally introduced in the early 1950s by physicists and have become very popular over the past few years in statistics and related fields; see Chopin (2002, 2004); Gilks and Berzuini (2001); Künsch (2005); Liu (2001); Pitt and Shephard (1999).
For example, these methods are now extensively used to solve sequential Bayesian inference problems arising in econometrics, advanced signal processing or robotics; see Doucet, de Freitas, and Gordon (2001) for a comprehensive review of the literature. SMC methods approximate the sequence of probability distributions of interest using a large set of random samples, named particles. These particles are propagated over time using simple importance sampling (IS) and resampling mechanisms. Asymptotically, that is, as the number of particles goes to infinity, the convergence of these particle approximations towards the sequence of probability distributions can be ensured under very weak assumptions as discussed by Del Moral (2004). However, for practical implementations, a finite and sometimes quite restricted number of particles has to be considered. In these cases, it is crucial to design efficient sampling strategies in order to sample particles in regions of high probability mass. A large amount of effort has been devoted to deriving improved schemes for: (1) sampling particles based on tailored importance densities (e.g., Carpenter, Clifford, and Fearnhead 1999; Doucet, Godsill, and Andrieu 2000; Guo, Wang, and Chen 2005; Liu and Chen 1998; Pitt and Shephard 1999), (2) MCMC steps to rejuvenate the particle population (e.g., Chopin 2002; Doucet, Gordon, and Krishnamurthy 2001; Fearnhead 2002; Gilks and Berzuini 2001), and (3) look-ahead techniques (e.g., Grassberger 1997; Liu 2001; Meirovitch 1985; Wang, Chen, and Guo 2002). However, tailored importance densities attempt to sample only one variable at a time, MCMC steps require the use of fast mixing kernels for good performance, and look-ahead techniques are computationally expensive as they typically require a local Monte Carlo integration for each particle. We propose here an alternative approach that allows us to extend the class of importance sampling distributions in a plethora of applications without having to perform any local Monte Carlo integration. Guidelines for the design of efficient sampling schemes based on this new framework are given.

Arnaud Doucet is Associate Professor, Departments of Statistics and Computer Science, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada (arnaud@stat.ubc.ca). Mark Briers is Research Student, Department of Engineering, University of Cambridge, Cambridge, CB2 1PZ, UK (mb511@cam.ac.uk). Stéphane Sénécal is Research Associate, The Institute of Statistical Mathematics, Department of Statistical Science, Tokyo, Japan (steph@ism.ac.jp).

American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
The resulting methods are natural and principled extensions of standard SMC schemes. They can be applied in a straightforward way wherever SMC methods are currently used. We demonstrate their efficiency on various optimal filtering problems.

The rest of this article is organized as follows. In Section 2, standard SMC methods are briefly reviewed and we outline their limitations. The new importance sampling approach is presented in Section 3. In Section 4, we illustrate our methodology using several optimal filtering problems. Finally, we give a brief discussion and draw conclusions in Section 5.

2. SEQUENTIAL MONTE CARLO METHODS

In this section we introduce the notation, briefly describe standard SMC methods, and outline their limitations; see Del Moral (2004), Doucet et al. (2000), and Liu (2001) for further details. Let us consider a sequence of probability distributions $\{\pi_n\}_{n \ge 1}$ such that $\pi_n$ is defined on the product space $E_n = E^n$ and admits a density denoted $\pi_n(x_{1:n})$ with respect to a dominating measure (typically Lebesgue) where, for any general sequence $\{z_k\}$, we write $z_{i:j} = (z_i, z_{i+1}, \ldots, z_j)$. Each density is known up to a normalizing constant, that is,

$$\pi_n(x_{1:n}) = Z_n^{-1} \gamma_n(x_{1:n}),$$

where $\gamma_n : E_n \to \mathbb{R}^+$ can be evaluated pointwise whereas $Z_n$ is unknown.

SMC methods are a class of algorithms for approximately sampling sequentially from $\{\pi_n\}$; that is, first sample from $\pi_1$, then $\pi_2$, and so on. By sampling, we mean obtaining at time $n$ a collection of $N$ ($N \gg 1$) weighted random samples $\{W_n^{(i)}, X_{1:n}^{(i)}\}$, $i = 1, \ldots, N$, with $W_n^{(i)} > 0$ and $\sum_{i=1}^{N} W_n^{(i)} = 1$ satisfying, for any $\pi_n$-integrable function $\varphi_n$,

$$\sum_{i=1}^{N} W_n^{(i)} \varphi_n\big(X_{1:n}^{(i)}\big) \xrightarrow[N \to \infty]{} \int \varphi_n(x_{1:n})\, \pi_n(x_{1:n})\, dx_{1:n}.$$

These random samples are known as particles and are propagated through time using importance sampling and resampling mechanisms. A popular application of SMC methods is optimal filtering, where a latent Markov process $\{X_n\}_{n \ge 1}$ is only observed through a sequence of noisy observations $\{Y_n\}_{n \ge 1}$. In this case the target distribution $\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n})$ is the posterior distribution of $X_{1:n}$ given a realization of the observations $Y_{1:n} = y_{1:n}$; see Section 4 for additional details.

2.1 STANDARD SMC METHODS

We first describe the standard sequential importance sampling resampling (SISR) scheme. At time $n-1$, assume a set of weighted particles $\{W_{n-1}^{(i)}, X_{1:n-1}^{(i)}\}$ approximating $\pi_{n-1}$ is available. Note that a random sample/particle $X_{1:n-1}^{(i)}$ represents a path from time 1 to $n-1$. The probability density of moving to $x_n$ when the current path is $x_{1:n-1}$ is denoted $q_n(x_n \mid x_{1:n-1})$. The densities $\{q_n\}$ are parameters of the algorithm to be selected by the user. The algorithm proceeds as follows at time $n$.

1. Sample $X_n^{(i)} \sim q_n(\cdot \mid X_{1:n-1}^{(i)})$.
2. Update and normalize the weights

$$W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\pi_n\big(X_{1:n}^{(i)}\big)}{\pi_{n-1}\big(X_{1:n-1}^{(i)}\big)\, q_n\big(X_n^{(i)} \mid X_{1:n-1}^{(i)}\big)}. \tag{2.1}$$

3. If the degeneracy of $\{W_n^{(i)}\}$ is high, resample $\{X_{1:n}^{(i)}\}$ according to $\{W_n^{(i)}\}$ to obtain $N$ unweighted particles also denoted $\{X_{1:n}^{(i)}\}$ (i.e., weights of resampled particles $W_n^{(i)} \leftarrow N^{-1}$).

The resampling step is necessary as, in most cases, the variance of the importance weights tends to increase over time. Thus, after a small number of time steps, all particles except a few have negligible weights. In the resampling operation, we associate to each particle a number of offspring proportional to its weight.
Hence we focus the future computational efforts on the zones of high probability; see Doucet, de Freitas, and Gordon (2001) for several standard resampling schemes. The degeneracy of the particle representation is typically measured using the effective sample size (ESS), as stated by Liu and Chen (1998):

$$\mathrm{ESS} = \left( \sum_{i=1}^{N} \big(W_n^{(i)}\big)^2 \right)^{-1}. \tag{2.2}$$

The ESS takes values between 1 and $N$; if the ESS is below a given threshold, say $N/2$, then we resample (Liu 2001, chap. 3). After the resampling step, the particles are approximately distributed according to $\pi_n$. Expression (2.1) follows from

$$\underbrace{\frac{\pi_n(x_{1:n})}{\mu_n(x_{1:n})}}_{\text{new weight}} = \underbrace{\frac{\pi_{n-1}(x_{1:n-1})}{\mu_{n-1}(x_{1:n-1})}}_{\text{previous weight}} \; \underbrace{\frac{\pi_n(x_{1:n})}{\pi_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{1:n-1})}}_{\text{incremental weight}},$$

where $\mu_n$ is the distribution of $X_{1:n}^{(i)}$ after the sampling step; that is, if the last resampling step occurred at time $p$ ($p < n$) one has approximately

$$\mu_n(x_{1:n}) = \pi_p(x_{1:p}) \prod_{k=p+1}^{n} q_k(x_k \mid x_{1:k-1}).$$

The SISR algorithm has a computational complexity of order $O(N)$ and, for many practical applications such as optimal filtering, the calculation of the incremental weight has a fixed computational complexity. An alternative, popular SMC method is the auxiliary approach introduced by Pitt and Shephard (1999) in the context of optimal filtering.

The efficiency of the algorithms described above is highly dependent on the choice of the importance distributions $\{q_n\}$. The variance of importance sampling estimates can be shown to be approximately proportional to one plus the variance of the unnormalized importance weights (Liu 2001). In practice, the resampling step introduces correlations between particles and the variance expression is much more complex; see Chopin (2004); Del Moral (2004); Künsch (2005). However, it remains sensible to try to minimize the variance of the unnormalized importance weights appearing in the SISR algorithm. In current approaches, the only degree of freedom we have at time $n$ is the importance distribution $q_n(x_n \mid x_{1:n-1})$ as the paths $X_{1:n-1}^{(i)}$ previously sampled are not modified. In this case, we are restricted to looking at the minimization of the variance of the incremental weights conditional upon $X_{1:n-1}^{(i)}$. It is well known and straightforward to establish that this conditional variance is minimized for

$$q_n^{\mathrm{opt}}(x_n \mid x_{1:n-1}) = \pi_n(x_n \mid x_{1:n-1}). \tag{2.3}$$

Using this distribution, the incremental importance weight is given by

$$\frac{\pi_n(x_{1:n})}{\pi_{n-1}(x_{1:n-1})\, q_n^{\mathrm{opt}}(x_n \mid x_{1:n-1})} = \frac{\pi_n(x_{1:n-1})}{\pi_{n-1}(x_{1:n-1})}. \tag{2.4}$$

However, it can be difficult to sample from $\pi_n(x_n \mid x_{1:n-1})$ and/or to compute $\pi_n(x_{1:n-1})$. Various methods have been proposed to approximate them. For example, in the optimal filtering context, $\pi_n(x_n \mid x_{1:n-1})$ and $\pi_n(x_{1:n-1})$ are typically approximated using standard suboptimal filtering techniques such as the extended Kalman filter; see, for example, Carpenter et al. (1999); Doucet et al. (2000); Guo et al. (2005); Pitt and Shephard (1999).
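The SISR recursion (2.1) with ESS-triggered resampling (2.2) can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: it assumes a scalar linear Gaussian model $X_n = \alpha X_{n-1} + \sigma_v V_n$, $Y_n = X_n + \sigma_w W_n$ with a stationary initial distribution, uses the prior transition as proposal (so the incremental weight reduces to the likelihood), and the function names and parameter values are hypothetical.

```python
import numpy as np


def ess(w):
    """Effective sample size (2.2) for normalized weights w."""
    return 1.0 / np.sum(w ** 2)


def systematic_resample(w, rng):
    """Ancestor indices from systematic resampling of normalized weights w."""
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n   # one uniform, n strata
    cdf = np.cumsum(w)
    cdf[-1] = 1.0                                   # guard against round-off
    return np.searchsorted(cdf, positions, side="right")


def sisr_bootstrap(y, n_particles, alpha, sigma_v, sigma_w, rng):
    """SISR with the prior as proposal for the assumed AR(1) model."""
    # Assumption: X_1 is drawn from the stationary distribution of the AR(1).
    x = rng.normal(0.0, sigma_v / np.sqrt(1.0 - alpha ** 2), n_particles)
    logw = np.zeros(n_particles)
    means, n_resample = [], 0
    for n, y_n in enumerate(y):
        if n > 0:
            # Step 1: extend each path by sampling X_n from the proposal.
            x = alpha * x + sigma_v * rng.normal(size=n_particles)
        # Step 2: with the prior as proposal, the incremental weight in (2.1)
        # reduces to the likelihood g(y_n | x_n), up to a constant.
        logw += -0.5 * ((y_n - x) / sigma_w) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                 # estimate of E[X_n | y_{1:n}]
        # Step 3: resample when the ESS falls below N/2.
        if ess(w) < n_particles / 2:
            x = x[systematic_resample(w, rng)]
            logw = np.zeros(n_particles)
            n_resample += 1
    return np.array(means), n_resample


# Simulated data under hypothetical parameter values.
rng = np.random.default_rng(0)
alpha, sigma_v, sigma_w = 0.9, 1.0, 0.5
x_true, y = 0.0, []
for _ in range(100):
    x_true = alpha * x_true + sigma_v * rng.normal()
    y.append(x_true + sigma_w * rng.normal())
means, n_resample = sisr_bootstrap(np.array(y), 1000, alpha, sigma_v, sigma_w, rng)
print("resampling steps:", n_resample)
```

Only the current state of each particle is stored here because the bootstrap weight depends on the path through $x_n$ alone; a version that keeps full paths would resample whole rows of a path array instead.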

2.2 LIMITATIONS

Standard SMC methods suffer from several limitations. It is important to emphasize at this point that, even if the importance distribution (2.3) can be used or well approximated, this does not guarantee that the SMC algorithm will be efficient. Indeed, if the discrepancy between two consecutive distributions $\pi_n(x_{1:n-1})$ and $\pi_{n-1}(x_{1:n-1})$ is high, then the variance of the incremental weight (2.4) will be high. Consequently it will be necessary to resample very often and the particle approximation of the joint distribution

$$\hat{\pi}_n(dx_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(dx_{1:n})$$

will be unreliable. In particular, for $k \ll n$ the marginal distribution $\hat{\pi}_n(dx_{1:k})$ will only be approximated by a few if not one unique particle because the algorithm will have resampled many times between times $k$ and $n$. The problem with approaches discussed until now is that only the variables $\{X_n^{(i)}\}$ are sampled at time $n$ but the path values $\{X_{1:n-1}^{(i)}\}$ remain fixed. An obvious way to improve the algorithm would consist of not only sampling $\{X_n^{(i)}\}$ at time $n$ but also modifying the values of the paths over a fixed lag $\{X_{n-L+1:n-1}^{(i)}\}$ for $L > 1$ in light of $\pi_n$; $L$ being fixed or upper bounded to ensure that we have a sequential algorithm. The objective of this approach is not only to sample $\{X_n^{(i)}\}$ in regions of high probability mass but also to modify the path values $\{X_{n-L+1:n-1}^{(i)}\}$ to move them towards these regions. This approach is conceptually simple. Unfortunately, we will see that a direct naive implementation of it is impossible as it would require calculating an intractable integral for each particle. In the next section we present an original approach which allows us to circumvent this problem.

3. EFFICIENT BLOCK SAMPLING STRATEGIES FOR SMC

3.1 EXTENDED IMPORTANCE SAMPLING

At time $n-1$, assume a set of weighted particles $\{W_{n-1}^{(i)}, X_{1:n-1}^{(i)}\}$ approximating $\pi_{n-1}$ is available. We propose not only to extend the current paths but also to sample again a section of their paths over a fixed lag.
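Let $q_n(x'_{n-L+1:n} \mid x_{1:n-1})$ denote the probability density of moving to $x'_{n-L+1:n}$ when the current path is $x_{1:n-1}$; that is, we sample $X_{n-L+1:n}^{\prime (i)} \sim q_n(\cdot \mid X_{1:n-1}^{(i)})$, construct the new paths $\big(X_{1:n-L}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)$, and discard $X_{n-L+1:n-1}^{(i)}$. Letting $\mu_{n-1}$ denote the distribution of $X_{1:n-1}^{(i)}$ at time $n-1$, the distribution of $\big(X_{1:n-1}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)$ is thus given by

$$\mu_n(x_{1:n-1}, x'_{n-L+1:n}) = \mu_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1}) \tag{3.1}$$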

and hence the distribution of the paths of interest $\big(X_{1:n-L}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)$ is

$$\mu_n(x_{1:n-L}, x'_{n-L+1:n}) = \int \mu_n(x_{1:n-1}, x'_{n-L+1:n})\, dx_{n-L+1:n-1}. \tag{3.2}$$

We would like to correct for the discrepancy between $\pi_n(x_{1:n-L}, x'_{n-L+1:n})$ and $\mu_n(x_{1:n-L}, x'_{n-L+1:n})$ by using importance sampling. However there are two problems with this approach. First, it is usually impossible to compute $\mu_n(x_{1:n-L}, x'_{n-L+1:n})$ pointwise up to a normalizing constant. Second, even if it were possible, there would no longer be a simple expression such as (2.1) for the weight update.

To deal with this problem, the key idea is to perform importance sampling on the enlarged space associated with $\big(X_{1:n-1}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)$ as their joint probability distribution (3.1) does not involve any integral. To do this it is necessary to extend the dimensionality of the target distribution $\pi_n(x_{1:n-L}, x'_{n-L+1:n})$ to be able to perform importance sampling. We introduce an artificial conditional distribution $\lambda_n(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n})$ that allows us to define a new extended target distribution

$$\pi_n(x_{1:n-L}, x'_{n-L+1:n})\, \lambda_n(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n}).$$

As one can see, by construction, this artificial target distribution admits the required distribution $\pi_n(x_{1:n-L}, x'_{n-L+1:n})$ as a marginal. So if we perform IS to estimate this artificial target distribution, then marginally we will obtain an estimate of $\pi_n(x_{1:n-L}, x'_{n-L+1:n})$. It is now easy to estimate the incremental weight using the following relation

$$\frac{\pi_n(x_{1:n-L}, x'_{n-L+1:n})\, \lambda_n(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n})}{\mu_n(x_{1:n-1}, x'_{n-L+1:n})} = \frac{\pi_{n-1}(x_{1:n-1})}{\mu_{n-1}(x_{1:n-1})} \cdot \frac{\pi_n(x_{1:n-L}, x'_{n-L+1:n})\, \lambda_n(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n})}{\pi_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1})} \tag{3.3}$$

as this does not involve any integration. Note that in this framework, $\pi_{n-1}(x_{1:n-1}) / \mu_{n-1}(x_{1:n-1})$ does not correspond to the importance weight calculated at time $n-1$ because we also use artificial distributions before time $n$. However, if we express the target distribution at time $n$ as $\pi_n$ multiplied by all the artificial distributions introduced until time $n$ then the following block SISR algorithm weights the particles consistently at time $n$; see Appendix for details.

At time $n < L$

1. Sample $X_{1:n}^{\prime (i)} \sim q_n(\cdot \mid X_{1:n-1}^{(i)})$.
2. Update and normalize the weights

$$W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\pi_n\big(X_{1:n}^{\prime (i)}\big)\, \lambda_n\big(X_{1:n-1}^{(i)} \mid X_{1:n}^{\prime (i)}\big)}{\pi_{n-1}\big(X_{1:n-1}^{(i)}\big)\, q_n\big(X_{1:n}^{\prime (i)} \mid X_{1:n-1}^{(i)}\big)}.$$

3. Set $X_{1:n}^{(i)} \leftarrow X_{1:n}^{\prime (i)}$.
4. If the degeneracy of $\{W_n^{(i)}\}$ is high, resample $\{X_{1:n}^{(i)}\}$ according to $\{W_n^{(i)}\}$ to obtain $N$ unweighted particles (that is, weights of resampled particles $W_n^{(i)} \leftarrow N^{-1}$).

At time $n \ge L$

5. Sample $X_{n-L+1:n}^{\prime (i)} \sim q_n(\cdot \mid X_{1:n-1}^{(i)})$.
6. Update and normalize the weights

$$W_n^{(i)} \propto W_{n-1}^{(i)}\, \underbrace{\frac{\pi_n\big(X_{1:n-L}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)\, \lambda_n\big(X_{n-L+1:n-1}^{(i)} \mid X_{1:n-L}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)}{\pi_{n-1}\big(X_{1:n-1}^{(i)}\big)\, q_n\big(X_{n-L+1:n}^{\prime (i)} \mid X_{1:n-1}^{(i)}\big)}}_{\text{incremental weight}}. \tag{3.4}$$

7. Set $X_{1:n}^{(i)} \leftarrow \big(X_{1:n-L}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)$.
8. If the degeneracy of $\{W_n^{(i)}\}$ is high, resample $\{X_{1:n}^{(i)}\}$ according to $\{W_n^{(i)}\}$ to obtain $N$ unweighted particles (i.e., weights of resampled particles $W_n^{(i)} \leftarrow N^{-1}$).

We adopt the convention $\pi_0(x_{1:0}) = 1$ and $\lambda_1(x_{1:0} \mid x'_1) = 1$. This algorithm is a simple and principled extension of the standard SISR procedure. An auxiliary version of this method in the spirit of Pitt and Shephard (1999) can also be obtained. General convergence results developed for SMC methods apply in a straightforward way; see Del Moral (2004). Indeed, the only difference is that instead of sampling from the initial sequence of distributions, we now sample from a sequence of extended distributions defined in the Appendix. Clearly the performance of the algorithm is highly dependent on the artificial distributions $\{\lambda_n\}$ and the importance distributions $\{q_n\}$. In the next subsection, we provide guidelines on how to select these distributions so as to optimize the performance of the algorithm.

3.2 ALGORITHMIC SETTINGS

We first address the selection of the artificial distributions $\{\lambda_n\}$. To select them, we propose to minimize the variance of the incremental importance weight appearing in (3.4). We will denote this incremental weight $w_n(x_{1:n-1}, x'_{n-L+1:n})$.

Proposition 1. The conditional distribution $\lambda_n(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n})$ which minimizes the variance of the incremental importance weight $w_n(x_{1:n-1}, x'_{n-L+1:n})$ is given by

$$\lambda_n^{\mathrm{opt}}(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n}) = \frac{\pi_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1})}{\int \pi_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1})\, dx_{n-L+1:n-1}} \tag{3.5}$$

and in this case the incremental weight takes the form

$$w_n^{\mathrm{opt}}(x_{1:n-L}, x'_{n-L+1:n}) = \frac{\pi_n(x_{1:n-L}, x'_{n-L+1:n})}{\int \pi_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1})\, dx_{n-L+1:n-1}}. \tag{3.6}$$

Proof of Proposition 1: The result follows from the variance decomposition formula

$$\mathrm{var}\big[w_n(X_{1:n-1}, X'_{n-L+1:n})\big] = E\big[\mathrm{var}\big[w_n(X_{1:n-1}, X'_{n-L+1:n}) \mid X_{1:n-L}, X'_{n-L+1:n}\big]\big] + \mathrm{var}\big[E\big[w_n(X_{1:n-1}, X'_{n-L+1:n}) \mid X_{1:n-L}, X'_{n-L+1:n}\big]\big]. \tag{3.7}$$

The second term on the right-hand side of (3.7) is independent of $\lambda_n(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n})$ as

$$E\big[w_n(X_{1:n-1}, X'_{n-L+1:n}) \mid X_{1:n-L}, X'_{n-L+1:n}\big] = \int \frac{\pi_n(x_{1:n-L}, x'_{n-L+1:n})\, \lambda_n(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n})}{\pi_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1})} \cdot \frac{\pi_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1})}{\int \pi_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1})\, dx_{n-L+1:n-1}}\, dx_{n-L+1:n-1} = w_n^{\mathrm{opt}}(x_{1:n-L}, x'_{n-L+1:n}).$$

The term $E\big[\mathrm{var}\big[w_n(X_{1:n-1}, X'_{n-L+1:n}) \mid X_{1:n-L}, X'_{n-L+1:n}\big]\big]$ is equal to zero if one uses the expression (3.5) for $\lambda_n$ as in this case the incremental weight becomes independent of $x_{n-L+1:n-1}$.

This result is intuitive and simply states that the optimal artificial distribution $\lambda_n^{\mathrm{opt}}$ is the one that takes us back to the case where we perform importance sampling on the space where the variables $x_{n-L+1:n-1}$ are integrated out. In practice, however, it is typically impossible to use $\lambda_n^{\mathrm{opt}}$ and $w_n^{\mathrm{opt}}$, as the marginal distribution

$$\int \pi_{n-1}(x_{1:n-1})\, q_n(x'_{n-L+1:n} \mid x_{1:n-1})\, dx_{n-L+1:n-1} \tag{3.8}$$

cannot be computed in closed form. There is an important exception. If $q_n(x'_{n-L+1:n} \mid x_{1:n-1}) = q_n(x'_{n-L+1:n} \mid x_{1:n-L})$, then (3.8) does not involve an integral and

$$\lambda_n^{\mathrm{opt}}(x_{n-L+1:n-1} \mid x_{1:n-L}, x'_{n-L+1:n}) = \pi_{n-1}(x_{n-L+1:n-1} \mid x_{1:n-L}), \tag{3.9}$$

$$w_n^{\mathrm{opt}}(x_{1:n-L}, x'_{n-L+1:n}) = \frac{\pi_n(x_{1:n-L}, x'_{n-L+1:n})}{\pi_{n-1}(x_{1:n-L})\, q_n(x'_{n-L+1:n} \mid x_{1:n-L})}. \tag{3.10}$$

As is the case with standard SMC previously discussed, $\pi_{n-1}(x_{1:n-L})$ is typically unknown. However, $\lambda_n$ could be selected so as to approximate (3.9). We emphasize that even if it is not equal to (3.9), this procedure still yields asymptotically consistent estimates.

Having optimized $\lambda_n$, we now consider the distributions $\{q_n\}$ that minimize the conditional variance of the incremental importance weight (3.6).

Proposition 2. The importance distribution $q_n(x'_{n-L+1:n} \mid x_{1:n-1})$ which minimizes the variance of the $\lambda_n$-optimized incremental weight $w_n^{\mathrm{opt}}(x_{1:n-L}, x'_{n-L+1:n})$ conditional upon $x_{1:n-L}$ is given by

$$q_n^{\mathrm{opt}}(x'_{n-L+1:n} \mid x_{1:n-1}) = \pi_n(x'_{n-L+1:n} \mid x_{1:n-L}) \tag{3.11}$$

and in this case $w_n^{\mathrm{opt}}(x_{1:n-L}, x'_{n-L+1:n})$ satisfies

$$w_n^{\mathrm{opt}}(x_{1:n-L}, x'_{n-L+1:n}) = \frac{\pi_n(x_{1:n-L})}{\pi_{n-1}(x_{1:n-L})}. \tag{3.12}$$

Proof of Proposition 2: The proof is straightforward as it is easy to check that the conditional variance of $w_n^{\mathrm{opt}}$ is equal to zero for $q_n^{\mathrm{opt}}$ given in (3.11). The expression (3.12) follows by inserting (3.11) into (3.6).

Note that this result is a straightforward extension of the standard case where $L = 1$ as discussed in (2.3)-(2.4). In practice, it follows from Propositions 1 and 2 that we should aim to design importance distributions $\{q_n\}$ which approximate (3.11) and then select artificial distributions $\{\lambda_n\}$ to approximate (3.9). So, if we use an approximation $\hat{\pi}_n(x'_{n-L+1:n} \mid x_{1:n-L})$ of $\pi_n(x'_{n-L+1:n} \mid x_{1:n-L})$ for the importance distribution, we can also use an approximation $\hat{\pi}_{n-1}(x_{n-L+1:n-1} \mid x_{1:n-L})$ of $\pi_{n-1}(x_{n-L+1:n-1} \mid x_{1:n-L})$ for the optimal artificial distribution. In this case, the block SISR algorithm proceeds as follows at time $n$ ($n \ge L$).

1. Sample $X_{n-L+1:n}^{\prime (i)} \sim \hat{\pi}_n(\cdot \mid X_{1:n-L}^{(i)})$.
2. Update and normalize the weights

$$W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\pi_n\big(X_{1:n-L}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)\, \hat{\pi}_{n-1}\big(X_{n-L+1:n-1}^{(i)} \mid X_{1:n-L}^{(i)}\big)}{\pi_{n-1}\big(X_{1:n-1}^{(i)}\big)\, \hat{\pi}_n\big(X_{n-L+1:n}^{\prime (i)} \mid X_{1:n-L}^{(i)}\big)}.$$

3. Set $X_{1:n}^{(i)} \leftarrow \big(X_{1:n-L}^{(i)}, X_{n-L+1:n}^{\prime (i)}\big)$.
4. If the degeneracy of $\{W_n^{(i)}\}$ is high, resample $\{X_{1:n}^{(i)}\}$ according to $\{W_n^{(i)}\}$ to obtain $N$ unweighted particles (i.e., weights of resampled particles $W_n^{(i)} \leftarrow N^{-1}$).

3.3 DISCUSSION

The resample-move (RM) strategy proposed by Gilks and Berzuini (2001) (see also Doucet, Gordon, and Krishnamurthy 2001; Fearnhead 2002) is a popular alternative method to limit the degeneracy of the particle population. It can also be interpreted as sampling from a sequence of artificially extended distributions. Assume we have samples $\{W_{n-1}^{(i)}, X_{1:n-1}^{(i)}\}$ approximating $\pi_{n-1}$. At time $n$, the RM algorithm first uses a standard SISR step as described in Section 2. Then the paths between time $n-L+1$ and $n$ are moved according to an MCMC kernel $q_n(x'_{n-L+1:n} \mid x_{1:n})$ of invariant distribution $\pi_n(x_{n-L+1:n} \mid x_{1:n-L})$ and their weights are not modified.
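This MCMC step corresponds to sampling from an extended distribution

$$\pi_n(x_{1:n-L}, x'_{n-L+1:n})\, \lambda_n(x_{n-L+1:n} \mid x_{1:n-L}, x'_{n-L+1:n})$$

where the artificial measure is given by

$$\lambda_n(x_{n-L+1:n} \mid x_{1:n-L}, x'_{n-L+1:n}) = \frac{\pi_n(x_{1:n})\, q_n(x'_{n-L+1:n} \mid x_{1:n})}{\pi_n(x_{1:n-L}, x'_{n-L+1:n})}.$$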
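As a short check of why this choice of artificial measure leaves the weights unmodified (a sketch added for illustration, writing $\mu_n$ for the distribution of the paths after the SISR step, as in Section 2.1): the moved block is proposed from $\mu_n(x_{1:n})\, q_n(x'_{n-L+1:n} \mid x_{1:n})$, so the importance weight on the extended space equals the weight held before the move, and the invariance of the kernel is what makes $\lambda_n$ a properly normalized density.

```latex
% Weight on the extended space: the kernel and the marginal target cancel.
\frac{\pi_n(x_{1:n-L}, x'_{n-L+1:n})\,
      \lambda_n(x_{n-L+1:n} \mid x_{1:n-L}, x'_{n-L+1:n})}
     {\mu_n(x_{1:n})\, q_n(x'_{n-L+1:n} \mid x_{1:n})}
  = \frac{\pi_n(x_{1:n})}{\mu_n(x_{1:n})}.

% Normalization of \lambda_n, using invariance of q_n with respect to
% \pi_n(x_{n-L+1:n} \mid x_{1:n-L}):
\int \pi_n(x_{1:n})\, q_n(x'_{n-L+1:n} \mid x_{1:n})\, dx_{n-L+1:n}
  = \pi_n(x_{1:n-L})\, \pi_n(x'_{n-L+1:n} \mid x_{1:n-L})
  = \pi_n(x_{1:n-L}, x'_{n-L+1:n}).
```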

In practice, one introduces a resampling step between the standard SISR step and the MCMC step if the degeneracy of the importance weights is high. If this step was not introduced, then RM would be inefficient. Indeed, even if one had a very fast mixing MCMC kernel, the weights (2.1) would not be modified. This is suboptimal. The introduction of a resampling step mitigates this problem but, contrary to the block sampling strategies described in the previous section, RM can only limit the path degeneracy over a lag of length $L$. This is demonstrated in Section 4.

In the context of static models, SMC algorithms using an RM-type strategy have been proposed by Chopin (2002) whereas algorithms based on using alternative artificial measures have been proposed by Del Moral, Doucet, and Jasra (2006). However, in Chopin (2002) and Del Moral et al. (2006), the authors use at time $n$ an MCMC kernel of invariant distribution $\pi_n$ to sample the particles, whereas the particles are sampled here using approximations of Gibbs moves.

We believe that the new approach proposed here is simpler and is a natural extension of standard techniques corresponding to the case $L = 1$. We do not claim that these block sampling SMC methods will always outperform standard SMC. It depends entirely on the ability of the user to design good approximations of the distributions $\pi_n(x_{n-L+1:n} \mid x_{1:n-L})$. Similarly, in an MCMC framework, block sampling strategies will only outperform one-at-a-time strategies if the proposal distributions to sample blocks are designed carefully. A lot of effort has been devoted to the design of efficient importance distributions/proposal distributions (e.g., Durbin and Koopman 2000; Pitt and Shephard 1997) and these methods can be directly applied to our framework.

4. APPLICATIONS TO OPTIMAL FILTERING

4.1 MODEL

In this section, we detail the application of block sampling SMC methods to optimal filtering. Consider an unobserved hidden Markov process $\{X_n\}_{n \ge 1}$ defined by

$$X_1 \sim \mu, \qquad X_n \mid X_{n-1} = x_{n-1} \sim f(\cdot \mid x_{n-1}).$$

We only have access to noisy observations $\{Y_n\}_{n \ge 1}$.
These observations are such that conditional on $\{X_n\}_{n \ge 1}$ their marginal density is given by

$$Y_n \mid X_n = x_n \sim g(\cdot \mid x_n).$$

At time $n$, the optimal estimation of the collection of states $X_{1:n}$ given a realization of the observations $Y_{1:n} = y_{1:n}$ is based on the posterior density

$$\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n}) \propto \mu(x_1)\, g(y_1 \mid x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})\, g(y_k \mid x_k).$$

The optimal distribution (3.11) and associated importance weight (3.12) are equal to

$$\pi_n(x_{n-L+1:n} \mid x_{1:n-L}) = p(x_{n-L+1:n} \mid y_{n-L+1:n}, x_{n-L}), \tag{4.1}$$

$$\frac{\pi_n(x_{1:n-L})}{\pi_{n-1}(x_{1:n-L})} = \frac{p(x_{1:n-L} \mid y_{1:n})}{p(x_{1:n-L} \mid y_{1:n-1})} \propto p(y_n \mid y_{n-L+1:n-1}, x_{n-L}). \tag{4.2}$$

We can assess the effect of the block sampling approach on the optimal importance weights in the important case where the optimal filter forgets its initial condition exponentially; see Del Moral (2004, chap. 4) for sufficient conditions for exponential forgetting. In this important case, under additional assumptions, it has already been established that SMC methods converge uniformly in time in $L_p$ norm (Del Moral 2004, chap. 7) and that the variance of the SMC approximations is also bounded uniformly in time; see Chopin (2004, theorem 5). The following simple result shows that in this case the optimal weights (4.2) also become independent of $x_{n-L}$ as $L$ increases.

Lemma 1. Assume that (for finite constants $A$, $B$ and $\alpha < 1$) $g(y_n \mid x_n) < A$ for any $x_n$ and that the optimal filter forgets its initial conditions exponentially, that is, we have

$$\int \big| p(x_n \mid y_{n-L+1:n}, x_{n-L}) - p(x_n \mid y_{n-L+1:n}, x'_{n-L}) \big|\, dx_n \le B \alpha^{L}$$

for any $(x_{n-L}, x'_{n-L})$ and any $L$. In this case the optimal importance weights satisfy for any $y_n$

$$\big| p(y_n \mid y_{n-L+1:n-1}, x_{n-L}) - p(y_n \mid y_{n-L+1:n-1}, x'_{n-L}) \big| \le A B \alpha^{L}.$$

The straightforward proof is omitted. In practice, we cannot compute these weights exactly and so use approximations instead. However, this result suggests that if we can approximate the optimal importance distribution in a satisfactory way then the variance of these weights will decrease significantly with $L$, limiting drastically the number of resampling steps necessary.

Let us consider a simple Gaussian autoregressive model

$$X_n = \alpha X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n,$$

where $V_n \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$ and $W_n \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$. In this case, it is easy to establish that $p(x_n \mid y_{n-L+1:n}, x_{n-L})$ is a Gaussian distribution with covariance independent of $(x_{n-L}, y_{n-L+1:n})$ such that

$$E(x_n \mid y_{n-L+1:n}, x_{n-L}) - E(x_n \mid y_{n-L+1:n}, x'_{n-L}) = \left( \frac{\alpha}{1 + \sigma_v^2 / \sigma_w^2} \right)^{L} (x_{n-L} - x'_{n-L}).$$

As soon as

$$\frac{\alpha}{1 + \sigma_v^2 / \sigma_w^2} < 1$$

then $p(x_n \mid y_{n-L+1:n}, x_{n-L})$ forgets its initial condition exponentially quickly. This convergence is faster when the signal to noise ratio $\sigma_v^2 / \sigma_w^2$ is high and the underlying Markov process $\{X_n\}$ is mixing quickly (i.e., small $\alpha$). Although we have only discussed here the linear Gaussian case (solvable through the Kalman filter), more generally the exponential forgetting property will hold when the Markov process $\{X_n\}$ mixes quickly and/or when the observations are sufficiently informative. In such situations, we expect block sampling SMC methods to outperform significantly standard methods if good approximations of the optimal importance distributions can be obtained.

4.2 SIMULATIONS

This section discusses the application of the block sampling SMC methods to two popular problems. The first problem is a target tracking problem which has been analyzed in a number of statistical publications including Fearnhead (2002) and Gilks and Berzuini (2001). The second is for stochastic volatility models appearing in Kim, Shephard, and Chib (1998), Pitt and Shephard (1997), and Pitt and Shephard (1999).

4.2.1 Bearings-Only Tracking

The target is modeled using a standard constant velocity model

$$X_n = \begin{pmatrix} 1 & T & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T \\ 0 & 0 & 0 & 1 \end{pmatrix} X_{n-1} + V_n,$$

where $V_n \overset{\text{iid}}{\sim} \mathcal{N}(0, \Sigma)$, with $T = 1$ and

$$\Sigma = 5 \begin{pmatrix} T^3/3 & T^2/2 & 0 & 0 \\ T^2/2 & T & 0 & 0 \\ 0 & 0 & T^3/3 & T^2/2 \\ 0 & 0 & T^2/2 & T \end{pmatrix}.$$

The state vector $X_n = (X_n^1, X_n^2, X_n^3, X_n^4)^{\mathsf{T}}$ is such that $X_n^1$ (respectively $X_n^3$) corresponds to the horizontal (respectively vertical) position of the target whereas $X_n^2$ (respectively $X_n^4$) corresponds to the horizontal (respectively vertical) velocity. One only receives observations of the bearings of the target from a sensor located at the origin

$$Y_n = \tan^{-1}\!\left( \frac{X_n^3}{X_n^1} \right) + W_n,$$

where $W_n \overset{\text{iid}}{\sim} \mathcal{N}(0, 10^{-4})$; that is, the observations are almost noiseless. In the simulations, the initial state $X_1$ is distributed according to a Gaussian of mean corresponding to the true initial simulated point and an identity covariance. We emphasize that these parameters are representative of real-world tracking scenarios.
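Before turning to EKF-based approximations, the block SISR recursion can be made concrete on the Gaussian autoregressive model of Section 4.1, where the optimal proposal (4.1) and weight (4.2) are available exactly through Kalman forward filtering and backward sampling. The sketch below is illustrative only: the function names, the parameter values, and the stationary initial distribution for $X_1$ are assumptions, and an exact Kalman filter stands in for the EKF because the model is linear and Gaussian.

```python
import numpy as np


def systematic_resample(w, rng):
    """Ancestor indices from systematic resampling of normalized weights w."""
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    cdf = np.cumsum(w)
    cdf[-1] = 1.0                                   # guard against round-off
    return np.searchsorted(cdf, positions, side="right")


def block_sisr_ar1(y, L, n_particles, alpha, sigma_v, sigma_w, rng):
    """Block SISR with the optimal block proposal (4.1) and weight (4.2)."""
    T = len(y)
    q, r = sigma_v ** 2, sigma_w ** 2
    p1 = q / (1.0 - alpha ** 2)          # assumption: stationary variance for X_1
    paths = np.zeros((n_particles, T))
    logw = np.zeros(n_particles)
    means, n_resample = np.zeros(T), 0
    for t in range(T):
        s = max(0, t - L + 1)            # first (0-based) index of the block
        if s == 0:                       # fewer than L past states: redraw whole path
            mp, pp = np.zeros(n_particles), p1
        else:                            # condition on the fixed anchor x_{n-L}
            mp, pp = alpha * paths[:, s - 1], q
        m, p = [], []
        for k in range(s, t + 1):        # forward Kalman filtering over the block
            if k == t:
                # Incremental weight (4.2): p(y_n | y_{n-L+1:n-1}, x_{n-L}),
                # up to a factor common to all particles.
                logw += -0.5 * (y[t] - mp) ** 2 / (pp + r)
            gain = pp / (pp + r)
            m.append(mp + gain * (y[k] - mp))
            p.append((1.0 - gain) * pp)
            mp, pp = alpha * m[-1], alpha ** 2 * p[-1] + q
        # Backward sampling of the block, starting from the last state.
        x = m[-1] + np.sqrt(p[-1]) * rng.normal(size=n_particles)
        paths[:, t] = x
        for j in range(len(m) - 2, -1, -1):
            back = alpha * p[j] / (alpha ** 2 * p[j] + q)
            mean = m[j] + back * (x - alpha * m[j])
            var = p[j] * (1.0 - back * alpha)
            x = mean + np.sqrt(var) * rng.normal(size=n_particles)
            paths[:, s + j] = x
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * paths[:, t])
        if 1.0 / np.sum(w ** 2) < n_particles / 2:   # ESS (2.2) below N/2
            paths = paths[systematic_resample(w, rng)]
            logw = np.zeros(n_particles)
            n_resample += 1
    return means, n_resample


# Simulated data under hypothetical parameter values.
rng = np.random.default_rng(0)
alpha, sigma_v, sigma_w = 0.9, 1.0, 0.5
x_true, y = 0.0, []
for _ in range(100):
    x_true = alpha * x_true + sigma_v * rng.normal()
    y.append(x_true + sigma_w * rng.normal())
y = np.array(y)
for lag in (1, 5):
    _, count = block_sisr_ar1(y, lag, 1000, alpha, sigma_v, sigma_w, rng)
    print("L =", lag, "resampling steps:", count)
```

With $L = 1$ this reduces to the standard optimal proposal (2.3); for larger $L$ the incremental weight depends on the particle only through $x_{n-L}$, which is the dependence Lemma 1 bounds.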
To build an approximation $\hat{p}(x_{n-L+1:n} \mid y_{n-L+1:n}, x_{n-L})$ of the optimal importance distribution (4.1), we use the extended Kalman filter (EKF) combined with the forward filtering/backward sampling formula described by Chib (1996) and Frühwirth-Schnatter (1994). More precisely we use

$$\hat{p}(x_{n-L+1:n} \mid y_{n-L+1:n}, x_{n-L}) = \hat{p}(x_n \mid y_{n-L+1:n}, x_{n-L}) \prod_{k=n-L+1}^{n-1} \hat{p}(x_k \mid y_{n-L+1:k}, x_{n-L}, x_{k+1}), \tag{4.3}$$

where

$$\hat{p}(x_k \mid y_{n-L+1:k}, x_{n-L}, x_{k+1}) = \frac{f(x_{k+1} \mid x_k)\, \hat{p}(x_k \mid y_{n-L+1:k}, x_{n-L})}{\int f(x_{k+1} \mid x_k)\, \hat{p}(x_k \mid y_{n-L+1:k}, x_{n-L})\, dx_k}.$$

The distributions $\hat{p}(x_k \mid y_{n-L+1:k}, x_{n-L})$ are Gaussian distributions whose parameters are computed using an EKF initialized using $X_{n-L} = x_{n-L}$.

We compare the following:

- the standard bootstrap filter (see, e.g., Gordon, Salmond, and Smith 1993), which uses the prior as importance distribution;
- two resample-move algorithms as described by Gilks and Berzuini (2001), where the SISR algorithm for $L = 1$ using the EKF proposal is followed by: (1) one-at-a-time Metropolis-Hastings (MH) moves using an approximation of the full conditionals $p(x_k \mid y_k, x_{k-1}, x_{k+1})$ as a proposal over a lag $L = 10$ (algorithm RM(10)); and (2) moves using the EKF proposal given by (4.3) for $L = 10$ (algorithm RMF(10)). The acceptance probabilities of those moves were between 0.5 and 0.6 in all cases;
- the block SISR algorithms for $L = 2$, 5, and 10 which use the EKF proposal, denoted SMC-EKF($L$).

Systematic resampling is performed whenever the ESS defined in (2.2) goes below $N/2$. The results are displayed in Table 1.

Table 1. Average Number of Resampling Steps for 100 Simulations, 100 Time Instances per Simulation Using N = 1,000 Particles

Filter         Avge. # resampling steps
Bootstrap      46.7
SMC-EKF(1)     44.6
RM(10)         45.2
RMF(10)        43.3
SMC-EKF(2)     34.9
SMC-EKF(5)     4.6
SMC-EKF(10)    1.3

The standard algorithms (namely, bootstrap, SMC-EKF(1), RM(10), and RMF(10)) need to resample very often as the ESS drops below $N/2$. The resample-move algorithms RM(10) and RMF(10) suffer from the same problems as standard SMC techniques (bootstrap and SMC-EKF(1)) despite their computational complexity being similar to SMC-EKF(10); this is because MCMC steps are only introduced after an EKF(1) proposal has been performed. Conversely, as $L$ increases, the number of resampling steps required by SMC-EKF($L$) methods decreases dramatically. Consequently, the number of unique paths approximating $p(x_{1:100} \mid y_{1:100})$ remains very large. In Figure 1, we display the average number of unique particles $\{X_n^{(i)}\}$ approximating $p(x_n \mid y_{1:100})$. We see that using standard techniques this number rapidly decreases towards 1 as $n$ decreases whereas using the block sampling approach this decrease is much slower.

Figure 1. Average number of unique particles $\{X_n^{(i)}\}$ approximating $p(x_n \mid y_{1:100})$ (y-axis) plotted against time (x-axis).

4.2.2 Stochastic Volatility

We consider the popular stochastic volatility model as described by Durbin and Koopman (2000); Kim et al. (1998); Pitt and Shephard (1997, 1999)

$$X_n = \phi X_{n-1} + \sigma V_n, \quad X_1 \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{1 - \phi^2}\right), \qquad Y_n = \beta \exp(X_n / 2)\, W_n, \tag{4.4}$$

where $V_n \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$ and $W_n \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$. In the SMC context, several techniques have been proposed to approximate the optimal importance distribution for $L = 1$, that is, $p(x_n \mid y_n, x_{n-1})$ (Pitt and Shephard 1999). In the MCMC context (as in Pitt and Shephard 1997), methods to approximate distributions of the form $p(x_{n-L+1:n} \mid y_{n-L+1:n}, x_{n-L})$ have been proposed but these are typically computationally intensive. We propose here a simpler alternative based on the fact that

$$\log(Y_n^2) = \log(\beta^2) + X_n + \log(W_n^2). \tag{4.5}$$

This representation has been previously used in the econometrics literature to obtain the optimal linear minimum mean square estimate of $\{X_n\}$ using the Kalman filter. We use it here to build our importance distribution. We approximate the non-Gaussian noise term $\log(W_n^2)$ with a Gaussian noise of similar mean and variance and hence obtain a linear Gaussian model approximation of (4.4)-(4.5). We then proceed in a similar fashion to the bearings-only tracking example, by using a Gaussian approximation of the optimal distribution of the form (4.3). The performance of our algorithms is assessed through computer simulations based on varying sample sizes to attain an approximately equal computational cost. We compare the standard bootstrap filter and the block SISR algorithms for $L = 1$, 2, 5, and 10, denoted SMC-EKF($L$). Systematic resampling is again performed whenever the ESS goes below $N/2$. The results are displayed in Table 2 for $\sigma^2 = 0.9$, $\phi = 0.8$, and $\beta = 0.7$.

Table 2. Average Number of Resampling Steps for 100 Simulations using 500 Time Instances per Simulation

Filter         # particles    Avge. # resampling steps
Bootstrap
SMC-EKF(1)
SMC-EKF(2)
SMC-EKF(5)
SMC-EKF(10)

The computational complexity of the proposed approach is higher than that of standard techniques. However, as these algorithms use the observations to guide the particles in regions of high probability mass, they are much more robust to outliers than standard techniques as was clearly emphasized by Pitt and Shephard (1999). Moreover, the number of resampling steps is consequently significantly limited. This is useful if a parallel implementation is performed as the resampling operation is seen as a major bottleneck to the parallelization of SMC techniques.

Figure 2 displays the average number of unique particles $\{X_n^{(i)}\}$ approximating $p(x_n \mid y_{1:500})$. We see that using the standard techniques, this number decreases rather quickly as $n$ decreases. However, using the block sampling approach this decreases much more slowly. In particular, SMC-EKF(10) performs remarkably well. For $n < 400$, the SMC-EKF(10) algorithm outperforms the bootstrap filter in terms of unique number of particles.
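The Gaussian approximation of $\log(W_n^2)$ used to build the proposal for the stochastic volatility model can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the standard closed-form moments of a log chi-square variable with one degree of freedom (mean $-(\gamma + \log 2)$, with $\gamma$ the Euler-Mascheroni constant, and variance $\pi^2/2$), which are not quoted in the text above, and the function names are hypothetical.

```python
import numpy as np

# Closed-form moments of log(W^2) for W ~ N(0, 1); standard results,
# assumed here rather than taken from the article.
MEAN_LOG_CHI2 = -(np.euler_gamma + np.log(2.0))   # about -1.27
VAR_LOG_CHI2 = np.pi ** 2 / 2.0                   # about 4.93


def log_chi2_moments(n_draws, rng):
    """Monte Carlo estimates of the mean and variance of log(W^2)."""
    w = rng.normal(size=n_draws)
    z = np.log(w ** 2)
    return z.mean(), z.var()


def linearize_sv_observation(y, beta):
    """Transform y_n as in (4.5) and center the noise term.

    The result z_n satisfies z_n = x_n + e_n, where e_n has mean zero and
    variance VAR_LOG_CHI2; treating e_n as Gaussian gives the linear
    Gaussian approximation described in the text.
    """
    return np.log(y ** 2) - np.log(beta ** 2) - MEAN_LOG_CHI2


rng = np.random.default_rng(0)
mc_mean, mc_var = log_chi2_moments(1_000_000, rng)
print("mean:", mc_mean, "closed form:", MEAN_LOG_CHI2)
print("variance:", mc_var, "closed form:", VAR_LOG_CHI2)
```

An observation exactly equal to zero would map to minus infinity under this transform, so data containing exact zeros would need separate handling.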
SMC-EKF(10) provides estimates of $p(x_n \mid y_{1:500})$ that are much more reliable than those of the bootstrap filter as $n$ decreases. Interestingly, for the same computational complexity, the bootstrap filter consistently outperforms the SMC-EKF(1) algorithm for this problem. However, we emphasize here that, if outliers were present, the improvements brought by the SMC-EKF(1) algorithm and the block sampling algorithms over the bootstrap would be much higher than in these simulations. We now apply the algorithms with $N = 1{,}000$ particles for all algorithms to the pound/dollar daily exchange rates from 1/10/81 to 28/6/85. This time series consists of

945 data points and the parameters σ = , φ = , β = are selected as in Durbin and Koopman (2000).

Figure 2. Average number of unique particles $\{X_n^{(i)}\}$ approximating $p(x_n \mid y_{1:500})$ (y-axis) plotted against time (x-axis).

Figure 3 displays the empirical measures approximating various marginal smoothing distributions. As expected, this approximation improves significantly as $L$ increases.

Figure 3. Empirical measures approximating the smoothing distributions $p(x_n \mid y_{1:945})$ at times $n = 100, 130, 160, 190$ for bootstrap (top left), SMC-EKF(1) (top right), SMC-EKF(5) (bottom left), SMC-EKF(10) (bottom right).

Figure 4 displays SMC estimates of the posterior variances $\operatorname{var}[X_n \mid y_{1:945}]$. The variance estimates of the bootstrap and SMC-EKF(1) quickly decay to zero as $n$ decreases because the posterior distributions $p(x_n \mid y_{1:945})$ are approximated by one unique particle. The variance estimates provided by the block sampling approaches

are much better. In particular, SMC-EKF(10) provides variance estimates which remain approximately similar as $n$ decreases; this is expected as a result of the ergodic properties of this state-space model. An MCMC run on the same dataset yields comparable estimates. This provides strong evidence that such blocking strategies can significantly limit the degeneracy of the particle population and yield much better estimates of joint distributions than standard techniques.

Figure 4. SMC estimates of $\operatorname{var}(X_n \mid y_{1:945})$ (y-axis) plotted against time (x-axis). Top to bottom: bootstrap, SMC-EKF(1), SMC-EKF(5), SMC-EKF(10).

5. DISCUSSION

This article presented principled extensions of standard SMC methods that allow us to implement block sampling strategies. These methods can be applied wherever SMC methods apply. Given that the cost of block sampling schemes is higher than that of standard methods, it is difficult to assess beforehand whether they will be beneficial for a specific application. Nevertheless, the examples presented in the previous section show that block sampling can dramatically reduce the number of resampling steps and provide a far better approximation of joint distributions than standard techniques, for a fixed computational complexity. Generally, our guidelines are that we will observe significant gains when it is possible to design a sensible approximation of the optimal importance distribution (3.11) and when the discrepancy between successive target distributions is high. In the optimal filtering framework, this situation occurs when we receive, for example, informative observations and the dynamic noise is high. This also suggests that the block sampling approach could only be used in cases where we observe a significant drop in the ESS using standard techniques.
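The ESS criterion mentioned here, together with the systematic resampling rule used in the simulations (resample whenever the ESS falls below $N/2$), can be sketched as follows. This is a minimal illustration, assuming the usual definition ESS $= 1/\sum_i (W^{(i)})^2$ for normalized weights; the function names are hypothetical and the paper's own implementation may differ.

```python
import numpy as np


def effective_sample_size(weights):
    """ESS = 1 / sum(W_i^2) for normalized weights W_i."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)


def systematic_resample(weights, rng):
    """Return ancestor indices drawn by systematic resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    # One uniform draw shifted across n equally spaced strata.
    positions = (rng.uniform() + np.arange(n)) / n
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)


def resample_if_needed(particles, weights, rng, threshold=0.5):
    """Resample only when ESS < threshold * N (N/2 by default).

    Returns (particles, normalized weights, whether resampling occurred).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    if effective_sample_size(w) < threshold * n:
        idx = systematic_resample(w, rng)
        return particles[idx], np.full(n, 1.0 / n), True
    return particles, w, False
```

Counting `np.unique(idx).size` after each resampling step gives a quantity comparable in spirit to the number of unique particles plotted in the figures, and monitoring the returned flag gives the number of resampling steps reported in Table 2.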

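APPENDIX: BLOCK SAMPLING WEIGHT UPDATE DERIVATION

This appendix establishes the validity of the weight update rule (3.4). To clarify our argument, it is necessary to add a superscript to the variables; for example, $X_k^p$ corresponds to the $p$th time the random variable $X_k$ is sampled. Using such notation, a path is sampled according to

$$
\begin{aligned}
X_1^1 &\sim q_1(\cdot), \\
(X_1^2, X_2^1) &\sim q_2(\cdot \mid X_1^1), \\
&\;\;\vdots \\
(X_1^L, X_2^{L-1}, \ldots, X_L^1) &\sim q_L(\cdot \mid X_1^{L-1}, \ldots, X_{L-1}^1), \\
(X_2^L, X_3^{L-1}, \ldots, X_{L+1}^1) &\sim q_{L+1}(\cdot \mid X_1^L, X_2^{L-1}, \ldots, X_L^1), \\
&\;\;\vdots \\
(X_{n-L+1}^L, \ldots, X_n^1) &\sim q_n(\cdot \mid X_{1:n-L}^L, X_{n-L+1}^{L-1}, \ldots, X_{n-1}^1).
\end{aligned}
$$

To summarize, the importance distribution at time $n$ is of the form

$$
q_n\!\left(x_1^{1:L}, \ldots, x_{n-L+1}^{1:L}, x_{n-L+2}^{1:L-1}, \ldots, x_n^1\right)
= q_1(x_1^1)\, q_2(x_1^2, x_2^1 \mid x_1^1) \times \cdots \times q_n\!\left(x_{n-L+1}^L, \ldots, x_n^1 \mid x_{1:n-L}^L, x_{n-L+1}^{L-1}, \ldots, x_{n-1}^1\right); \tag{A.1}
$$

that is, at time $n$ we have sampled $L$ times the variables $x_{1:n-L+1}$, then $L-i$ times $x_{n-L+1+i}$ for $i = 1, \ldots, L-1$. We now consider the following extended target distribution, denoted $\bar{\pi}_n$:

$$
\bar{\pi}_n\!\left(x_1^{1:L}, \ldots, x_{n-L+1}^{1:L}, x_{n-L+2}^{1:L-1}, \ldots, x_n^1\right)
= \pi_n\!\left(x_{1:n-L+1}^L, x_{n-L+2}^{L-1}, \ldots, x_n^1\right) \lambda_2(x_1^1 \mid x_1^2, x_2^1) \times \cdots \times \lambda_n\!\left(x_{n-L+1}^{L-1}, \ldots, x_{n-1}^1 \mid x_{1:n-L}^L, x_{n-L+1}^L, \ldots, x_n^1\right). \tag{A.2}
$$

Clearly we have

$$
\underbrace{\frac{\bar{\pi}_n\!\left(x_1^{1:L}, \ldots, x_{n-L+1}^{1:L}, x_{n-L+2}^{1:L-1}, \ldots, x_n^1\right)}{q_n\!\left(x_1^{1:L}, \ldots, x_{n-L+1}^{1:L}, x_{n-L+2}^{1:L-1}, \ldots, x_n^1\right)}}_{\text{new weight}}
= \underbrace{\frac{\bar{\pi}_{n-1}\!\left(x_1^{1:L}, \ldots, x_{n-L}^{1:L}, x_{n-L+1}^{1:L-1}, \ldots, x_{n-1}^1\right)}{q_{n-1}\!\left(x_1^{1:L}, \ldots, x_{n-L}^{1:L}, x_{n-L+1}^{1:L-1}, \ldots, x_{n-1}^1\right)}}_{\text{previous weight}}
\times \underbrace{\frac{\pi_n\!\left(x_{1:n-L+1}^L, x_{n-L+2}^{L-1}, \ldots, x_n^1\right) \lambda_n\!\left(x_{n-L+1}^{L-1}, \ldots, x_{n-1}^1 \mid x_{1:n-L}^L, x_{n-L+1}^L, \ldots, x_n^1\right)}{\pi_{n-1}\!\left(x_{1:n-L}^L, x_{n-L+1}^{L-1}, \ldots, x_{n-1}^1\right) q_n\!\left(x_{n-L+1}^L, \ldots, x_n^1 \mid x_{1:n-L}^L, x_{n-L+1}^{L-1}, \ldots, x_{n-1}^1\right)}}_{\text{incremental weight}}.
$$

This establishes the validity of (3.4).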
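As a sanity check on the incremental weight above (added here for illustration, not part of the original derivation), consider $L = 1$. Each variable is then sampled exactly once, so the set of discarded values $x_{n-L+1}^{L-1}, \ldots, x_{n-1}^1$ is empty, $\lambda_n$ is a distribution over nothing and contributes a factor of 1, and the incremental weight should reduce to the familiar one-at-a-time importance sampling ratio:

```latex
% Special case L = 1 of the incremental weight: no variable is resampled,
% so the artificial backward kernel \lambda_n disappears.
\[
\frac{\pi_n\left(x_{1:n}^1\right)}
     {\pi_{n-1}\left(x_{1:n-1}^1\right)\, q_n\left(x_n^1 \mid x_{1:n-1}^1\right)} .
\]
```

Dropping the superscripts, this is $\pi_n(x_{1:n}) / \{\pi_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{1:n-1})\}$, which agrees with the description of the block approach as a natural extension of standard SMC methods.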

[Received April. Revised January 2006.]

REFERENCES

Carpenter, J., Clifford, P., and Fearnhead, P. (1999), "An Improved Particle Filter for Non-linear Problems," IEEE Proceedings Radar, Sonar and Navigation, 146, 2-7.

Chib, S. (1996), "Calculating Posterior Distributions and Modal Estimates in Markov Mixture Models," Journal of Econometrics, 75.

Chopin, N. (2002), "A Sequential Particle Filter Method for Static Models," Biometrika, 89.

Chopin, N. (2004), "Central Limit Theorem for Sequential Monte Carlo Methods and its Application to Bayesian Inference," The Annals of Statistics, 32.

Del Moral, P. (2004), Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, New York: Springer-Verlag.

Del Moral, P., Doucet, A., and Jasra, A. (2006), "Sequential Monte Carlo Samplers," Journal of the Royal Statistical Society, Series B, 68.

Doucet, A., de Freitas, J., and Gordon, N. (eds.) (2001), Sequential Monte Carlo Methods in Practice, New York: Springer-Verlag.

Doucet, A., Godsill, S., and Andrieu, C. (2000), "On Sequential Monte Carlo Sampling Methods for Bayesian Filtering," Statistics and Computing, 10.

Doucet, A., Gordon, N., and Krishnamurthy, V. (2001), "Particle Filters for State Estimation of Jump Markov Linear Systems," IEEE Transactions on Signal Processing, 49.

Durbin, J., and Koopman, S. (2000), "Time Series Analysis of Non-Gaussian Observations Based on State Space Models from Both Classical and Bayesian Perspectives" (with discussion), Journal of the Royal Statistical Society, Series B, 62.

Fearnhead, P. (2002), "MCMC, Sufficient Statistics and Particle Filter," Journal of Computational and Graphical Statistics, 11.

Frühwirth-Schnatter, S. (1994), "Data Augmentation and Dynamic Linear Models," Journal of Time Series Analysis, 15.

Gilks, W., and Berzuini, C. (2001), "Following a Moving Target - Monte Carlo Inference for Dynamic Bayesian Models," Journal of the Royal Statistical Society, Series B, 63.

Gordon, N., Salmond, D., and Smith, A. (1993), "Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation," IEEE Proceedings F, 140.

Grassberger, P. (1997), "Pruned-Enriched Rosenbluth Method: Simulations of Theta Polymers of Chain Length up to 1,000,000," Physical Review E, 56.

Guo, D., Wang, X., and Chen, R. (2005), "New Sequential Monte Carlo Methods for Nonlinear Dynamic Systems," Statistics and Computing, 15.

Kim, S., Shephard, N., and Chib, S. (1998), "Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models," Review of Economic Studies, 65.

Künsch, H. (2005), "Recursive Monte Carlo Filters: Algorithms and Theoretical Analysis," The Annals of Statistics, 33.

Liu, J. (2001), Monte Carlo Strategies in Scientific Computing, Berlin: Springer.

Liu, J., and Chen, R. (1998), "Sequential Monte Carlo Methods for Dynamic Systems," Journal of the American Statistical Association, 93.

Meirovitch, H. (1985), "Scanning Method as an Unbiased Simulation Technique and its Application to the Study of Self-Attracting Random Walks," Physical Review A.

Pitt, M., and Shephard, N. (1997), "Likelihood Analysis of Non-Gaussian Measurement Time Series," Biometrika, 84.

Pitt, M., and Shephard, N. (1999), "Filtering via Simulation: Auxiliary Particle Filters," Journal of the American Statistical Association, 94.

Wang, X., Chen, R., and Guo, D. (2002), "Delayed-Pilot Sampling for Mixture Kalman Filter with Applications in Fading Channels," IEEE Transactions on Signal Processing, 50.
