ON MULTIPLE TRY SCHEMES AND THE PARTICLE METROPOLIS-HASTINGS ALGORITHM


ON MULTIPLE TRY SCHEMES AND THE PARTICLE METROPOLIS-HASTINGS ALGORITHM

L. Martino, F. Leisen, J. Corander

University of Helsinki, Helsinki (Finland). University of Kent, Canterbury (UK).

ABSTRACT

Markov Chain Monte Carlo (MCMC) algorithms and Sequential Monte Carlo (SMC) methods (a.k.a., particle filters) are well-known Monte Carlo methodologies, widely used in different fields for Bayesian inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates according to certain weights. The Particle MH (PMH) algorithm is another advanced MCMC technique, specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of conditional densities. PMH jointly combines the SMC and MCMC approaches. Both MTM and PMH have been widely studied and applied in the literature. PMH variants have often been applied for the joint purpose of tracking dynamic variables and tuning constant parameters in a state space model. Furthermore, PMH can also be considered as an alternative particle smoothing method. In this work, we investigate connections, similarities and differences among MTM schemes and PMH methods. This study allows the design of novel, efficient schemes for filtering and smoothing purposes in state space models. More specifically, one of them, called Particle Multiple Try Metropolis (P-MTM), obtains very promising results in different numerical simulations.

Keywords: Bayesian Inference; Particle Filter; Particle Smoother; Markov Chain Monte Carlo (MCMC); Multiple Try Metropolis; Particle MCMC.

1. INTRODUCTION

Monte Carlo statistical methods are powerful tools for numerical inference and stochastic optimization [34, 21]. Markov Chain Monte Carlo (MCMC) [5, 9, 20, 34] and Sequential Monte Carlo (SMC) algorithms (a.k.a., particle filters) [2, 7, 10, 29] are classical Monte Carlo techniques employed in order to approximate an integral, involving a complicated target probability density function (pdf) [21, 20] (the posterior distribution in Bayesian inference), which cannot be computed analytically. MCMC algorithms produce a Markov chain with a stationary distribution that coincides with the target pdf, whereas SMC methods yield an approximation of the target measure by weighted samples. The Multiple Try Metropolis (MTM) method of [22], [21, Chapter 5] is an advanced MCMC technique which is an extension of the well-known Metropolis-Hastings (MH) algorithm [28, 13]. In MTM, the next state of the chain is selected among a set of candidates according to some suitable weights. This enables the MTM sampler to make large step-size jumps without lowering the acceptance rate, so that the exploration of a larger portion of the state space is facilitated. A famous special case of MTM, well known in the molecular simulation field, is the orientational bias Monte Carlo technique [11].

(This work has been supported by the AoF grant 2570, and by the European Union 7th Framework Programme through the Marie Curie Initial Training Network "Machine Learning for Personalized Medicine" MLPM2012, Grant No.)

Due to its good performance, several generalizations of the basic MTM scheme [22] can be found in the literature: with correlated candidates [26, 33], with more general forms of the weights and different frameworks [18, 27, 31, 39], and with adaptive and interacting proposal pdfs [4]. Interesting and related studies about MTM, or about the use of multiple auxiliary variables for building acceptance probabilities within an MH approach, can be found in [3, 37, 25]. Independently from the derivation of the MTM schemes, the class of Particle MCMC (P-MCMC) methods has been proposed in the literature [1, 6, 30, 38]. P-MCMC methods are specifically designed to solve inference problems in state space models, combining the SMC and MCMC approaches. In this work, we focus on a specific P-MCMC method, called the Particle Metropolis-Hastings (PMH) algorithm [1, 6]. The idea behind PMH is first to approximate the target measure with a delta approximation based on weighted samples obtained by a particle filter, and then to use this approximation as proposal pdf within an MH technique. The Particle Marginal MH technique is a variant of the standard PMH method designed in order to estimate jointly the sequence of hidden states and the static parameters of the model [38]. Note that the standard PMH scheme can be interpreted as a particle smoother [9, 10, 12, 16, 17, 36] since, at each iteration of PMH, the weighted paths obtained by a particle filter are employed in order to update the previous estimation according to a suitable MH-type rule, taking into account all the received observations.

The authors in [1] discuss the relationships of P-MCMC with other existing techniques. They mention and describe precisely the relationship with the so-called configurational bias Monte Carlo method [35], [21, Chapter 5]; this technique is also strictly connected to the MTM scheme. The authors also allude briefly to the MTM method [22]. However, the relationship between MTM and PMH deserves a more careful look. In this work, we show that the MTM and PMH algorithms are strictly connected: PMH can be interpreted as an MTM scheme using an independent proposal pdf which generates correlated candidates, drawn and weighted sequentially through a particle filter. In order to clarify the relationship between MTM and PMH, we recall the batch importance sampling (IS) and sequential importance sampling (SIS) methods and point out some relevant considerations about different estimators of the marginal likelihood (a.k.a., Bayesian evidence) [9]. Furthermore, we also introduce a variant of the MTM scheme with independent proposal pdf. These observations are essential in order to show the connection between MTM and PMH. This exhaustive study allows the design of novel, more efficient schemes, taking advantage of the different alternatives and analyses already provided in the literature about both techniques [37, 27, 30, 38]. Hence, we propose novel possible MTM and PMH schemes. One of them, called Particle MTM (P-MTM), combines the standard PMH and MTM kernels, properly mixing the main advantages of both: the sequential construction of the different tries, as in PMH, and the possibility, offered by an MTM scheme, of considering several candidates obtained by perturbing the previous state of the chain. P-MTM provides excellent performance, as we show with numerical results. In one simulation, we test P-MTM as a particle smoother in order to make inference about a sequence of hidden states in a stochastic volatility model, obtaining very favorable results.

The paper is structured as follows. In Section 2, we recall some required concepts about importance sampling and resampling techniques.
The MTM methods are described in Section 3, whereas the description of the PMH algorithms and their relationship with the MTM schemes is provided in Section 4. Novel schemes are discussed in Section 5. Section 6 is devoted to the numerical simulations and, in Section 7, we provide some conclusions.

2. IMPORTANCE SAMPLING

In many applications, we desire to infer a vector of unknown parameters, x ∈ R^{Dζ}, given a set of observed data, y ∈ R^{D_Y}. In these cases, one is interested in approximating different moments of the posterior density π(x|y) that, hereafter, we simply denote as π(x).

(Footnote: Let us denote by π(x_{1:D}|y_{1:D}) the posterior of the hidden states x_{1:D} given the received observations y_{1:D}. PMH schemes can also be employed for improving the approximations of the marginal posteriors π(x_d|y_{1:D}) with d < D, jointly with the complete posterior π(x_{1:D}|y_{1:D}). PMH can be interpreted as a way to properly combine several independent runs of a particle filter (run after obtaining all the observations y_{1:D}), generating an ergodic Markov chain with invariant distribution π(x_{1:D}|y_{1:D}) [9].)

More specifically, in this work, we denote the variable of interest as x = x_{1:D} = [x_1, x_2, ..., x_D] ∈ X ⊆ R^{Dζ}, where x_d ∈ X_d ⊆ R^{ζ} for all d = 1, ..., D. The target density is indicated as π̄(x) = π(x)/Z, where

Z = ∫_X π(x) dx   (1)

is often known as the marginal likelihood (a.k.a., Bayesian evidence). In many applications, we are only able to evaluate π(x), since Z is unknown. Moreover, in general, we are not able to draw random samples from π̄(x). Monte Carlo techniques employ a simpler proposal density, denoted as q(x), with support X ⊆ R^{Dζ}, for generating possible random candidates. (For the sake of simplicity, in the rest of the work we consider the proposal function q(x) to be normalized, i.e., ∫ q(x) dx = 1.) Then, these candidates are filtered using some suitable procedure, in order to produce a particle approximation of π̄(x) and also to provide an estimation of Z.

2.1. Batch and Sequential Importance Sampling

A well-known Monte Carlo technique is the importance sampling (IS) method. IS provides an approximation of the measure of π̄ by weighted samples. More specifically, N samples x^(1), ..., x^(N) are drawn from a proposal pdf q(x) and then they are weighted as

w_D^(n) = π(x^(n)) / q(x^(n)),   n = 1, ..., N,   (2)

where the super-index n in w_D^(n) denotes the corresponding particle and the sub-index D refers to the dimension of x, i.e., x = x_{1:D} = [x_1, ..., x_D]. Thus, the particle approximation is

π̂(x) = Σ_{n=1}^{N} w̄_D^(n) δ(x − x^(n)),   (3)

where we have denoted the normalized weights as w̄_D^(n) = w_D^(n) / Σ_{i=1}^{N} w_D^(i). An estimation of Z is given by

Ẑ = (1/N) Σ_{n=1}^{N} w_D^(n).   (4)

In high dimensional spaces (x = x_{1:D} ∈ X ⊆ R^{Dζ}), an equivalent sequential procedure, called sequential importance sampling (SIS), is preferred to the previous batch approach. Recalling that x = x_{1:D} = [x_1, ..., x_D], we can observe that a target pdf π(x) can always be expressed as

π(x) = γ_1(x_1) Π_{d=2}^{D} γ_d(x_d|x_{1:d−1}),   (5)

using the chain rule [32], where γ_1(x_1) is a marginal pdf and the γ_d(x_d|x_{1:d−1}) are conditional pdfs. We also consider the joint pdf of the partial vector x_{1:d} = [x_1, ..., x_d],

π̄_d(x_{1:d}) = π_d(x_{1:d})/Z_d,   with   π_d(x_{1:d}) = γ_1(x_1) Π_{j=2}^{d} γ_j(x_j|x_{1:j−1}),   (6)

where

Z_d = ∫_{X^d} π_d(x_{1:d}) dx_{1:d},   (7)

and, clearly, we have π_D(x_{1:D}) = π(x). In many applications, the target appears directly decomposed as in Eq. (5), e.g., in state-space models. However, in general, one needs to marginalize the target π(x) several times in order to obtain analytically the conditional pdfs γ_d(x_d|x_{1:d−1}). Given the target in Eq. (5), we can also consider a proposal pdf decomposed in the same fashion, q(x) = q_1(x_1) q_2(x_2|x_1) ··· q_{D−1}(x_{D−1}|x_{1:D−2}) q_D(x_D|x_{1:D−1}). In a batch IS scheme, given an n-th sample x^(n) = x^(n)_{1:D} ~ q(x), we assign the importance weight

w_D^(n) = π(x^(n)) / q(x^(n)) = [γ_1(x^(n)_1) γ_2(x^(n)_2|x^(n)_1) ··· γ_D(x^(n)_D|x^(n)_{1:D−1})] / [q_1(x^(n)_1) q_2(x^(n)_2|x^(n)_1) ··· q_D(x^(n)_D|x^(n)_{1:D−1})].

The previous expression suggests a recursive procedure for computing the importance weights: starting with w_1^(n) = π_1(x^(n)_1)/q_1(x^(n)_1), and then

w_d^(n) = w_{d−1}^(n) β_d^(n) = Π_{j=1}^{d} β_j^(n),   d = 1, ..., D,   (8)

where we have set

β_1^(n) = w_1^(n)   and   β_d^(n) = γ_d(x^(n)_d|x^(n)_{1:d−1}) / q_d(x^(n)_d|x^(n)_{1:d−1}),   (9)

for d = 2, ..., D. Thus, given N samples x^(1), ..., x^(N), we finally obtain the particle approximations of the sequence of pdfs π̄_d(x_{1:d}) as

π̂_d(x_{1:d}) = Σ_{n=1}^{N} w̄_d^(n) δ(x_{1:d} − x^(n)_{1:d}),   d = 1, ..., D,   (10)

and an estimator of each normalizing constant Z_d is given by

Ẑ_d = (1/N) Σ_{n=1}^{N} w_d^(n) = (1/N) Σ_{n=1}^{N} Π_{j=1}^{d} β_j^(n).   (11)

However, an alternative, equivalent formulation is often used [1],

Z̃_D = Π_{d=1}^{D} [ Σ_{n=1}^{N} w̄_{d−1}^(n) β_d^(n) ],   (12)

where each factor satisfies

Ẑ_d = [ Σ_{n=1}^{N} w̄_{d−1}^(n) β_d^(n) ] Ẑ_{d−1},   (13)

so that, setting Ẑ_0 = 1 for simplicity,

Z̃_D = (Ẑ_1/Ẑ_0)(Ẑ_2/Ẑ_1) ··· (Ẑ_D/Ẑ_{D−1}) = Ẑ_D.   (14)

An alternative derivation of the (final) estimator Z̃_D is given in Appendix A.

Remark 1. In SIS, there are two equivalent formulations, Ẑ_D in Eq. (4) and Z̃_D in Eq. (12), of the estimator of Z.
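To make Eqs. (2)-(12) concrete, the following minimal Python/NumPy sketch runs batch IS through the sequential weight recursion of Eqs. (8)-(9) on an illustrative factorized Gaussian target; the specific target and proposal densities are arbitrary choices for illustration, not those used later in Section 6.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
D, N = 10, 5000                      # dimension and number of samples

# Illustrative target factorized as in Eq. (5): pi(x) = prod_d N(x_d | 2, 1),
# evaluated only through the (here already normalized) factors gamma_d, so Z = 1.
def log_gamma(xd):
    return norm.logpdf(xd, loc=2.0, scale=1.0)

x = np.zeros((N, D))
logw = np.zeros(N)                   # running log-weights, Eqs. (8)-(9)
for d in range(D):
    if d == 0:
        x[:, d] = rng.normal(0.0, 2.0, size=N)              # q_1(x_1) = N(0, 2^2)
        log_q = norm.logpdf(x[:, d], loc=0.0, scale=2.0)
    else:
        x[:, d] = rng.normal(x[:, d - 1], 1.0)              # q_d(x_d | x_{d-1}) = N(x_{d-1}, 1)
        log_q = norm.logpdf(x[:, d], loc=x[:, d - 1], scale=1.0)
    logw += log_gamma(x[:, d]) - log_q                      # accumulate log beta_d^(n)

w = np.exp(logw)
w_bar = w / w.sum()                                         # normalized weights
Z_hat = w.mean()                                            # estimator of Eq. (4)
post_mean = w_bar @ x                                       # IS estimate of E[x] (approx. 2 per component)
```

Since the target in the sketch is normalized, Z = 1 and Ẑ should concentrate around 1 as N grows; the same code evaluated with unnormalized factors would return an estimate of the true Z.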

2.2. Sequential Importance Resampling (SIR)

Sequential Importance Resampling (SIR) [21, 34] combines the sequential construction of the importance weights, as in SIS, with the application of resampling steps [7, 8]. Namely, when some pre-established criterion is fulfilled [7, 8, 23], N independent particles are drawn according to the probability mass π̂_d(x_{1:d}). Then, the resampled particles are propagated for providing the next approximation π̂_{d+1}(x_{1:d+1}). More specifically, let us consider that a resampling step is performed at the d-th iteration. Hence, N samples x^(j)_{1:d} are drawn from π̂_d(x_{1:d}), and then the corresponding weights are set to the same value [7, 8]. A proper choice [24] is to set the unnormalized importance weights to

w_d(x^(j)_{1:d}) = Ẑ_d,   j = 1, ..., N,   (15)

i.e., w_d^(1) = w_d^(2) = ... = w_d^(N). One reason why this is a good choice, for instance, is that the normalized weights then satisfy w̄_d(x^(n)_{1:d}) = 1/N, equal for each resampled particle x^(n)_{1:d}. Hence, after a resampling step, we have the following weights,

ξ_d^(n) = { w_d^(n), without resampling at the d-th iteration;  Ẑ_d, with resampling at the d-th iteration, }   (16)

so that, in any case, (1/N) Σ_{n=1}^{N} ξ_d^(n) = Ẑ_d, as expected. Therefore, the weight recursion for SIR becomes ξ_d^(n) = ξ_{d−1}^(n) β_d^(n), where

ξ_{d−1}^(n) = { ξ_{d−1}^(n), without resampling at the (d−1)-th iteration;  Ẑ_{d−1}, with resampling at the (d−1)-th iteration. }   (17)

See Appendix A for further details.

Remark 2. With the recursive definition of the weights ξ_d^(n) in Eq. (17), the two estimators

Ẑ_D = (1/N) Σ_{n=1}^{N} ξ_{D−1}^(n) β_D^(n),   Z̃_D = Π_{d=1}^{D} [ Σ_{n=1}^{N} ξ̄_{d−1}^(n) β_d^(n) ],   (18)

where ξ̄_j^(n) = ξ_j^(n) / Σ_{i=1}^{N} ξ_j^(i), are both valid and equivalent estimators of Z [24].

For instance, if the resampling is applied at each iteration, observe that they become

Ẑ_D = Ẑ_{D−1} [ (1/N) Σ_{n=1}^{N} β_D^(n) ],   (19)

Z̃_D = Π_{d=1}^{D} [ (1/N) Σ_{n=1}^{N} β_d^(n) ],   (20)

and clearly coincide. Note that, with respect to the estimator in Eq. (11) (for SIS, i.e., without resampling), the operations of product and sum are inverted. Figure 2 depicts different examples of generation of weighted samples x^(n), with or without employing resampling steps. More specifically, Figure 2 shows the components x^(n)_1, ..., x^(n)_D of each sample, with D = 10. Remark 2 is necessary in order to describe exhaustively the relationship between the MTM and PMH algorithms.
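The proper weighting of resampled particles in Eq. (15) and the equivalence of the two estimators in Eqs. (19)-(20) can be checked numerically. Below is a hedged sketch (multinomial resampling at every iteration and illustrative Gaussian factors, not the exact setup of the paper) in which Ẑ_D, propagated through ξ_d = Ẑ_{d−1} β_d, and the product-form Z̃_D coincide.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
D, N = 10, 1000

x = np.zeros((N, D))
Z_hat = 1.0      # running estimator; after resampling every particle carries xi_d = Z_hat_d (Eq. (15))
Z_til = 1.0      # product-form estimator of Eqs. (12)/(20)
for d in range(D):
    loc = np.zeros(N) if d == 0 else x[:, d - 1]
    scale = 2.0 if d == 0 else 1.0
    x[:, d] = rng.normal(loc, scale)                                # draw from q_d(x_d | x_{d-1})
    beta = np.exp(norm.logpdf(x[:, d], 2.0, 1.0)
                  - norm.logpdf(x[:, d], loc, scale))               # incremental weights beta_d^(n), Eq. (9)
    xi = Z_hat * beta              # xi_d^(n) = xi_{d-1} * beta_d^(n), with xi_{d-1} = Z_hat_{d-1} (Eq. (17))
    Z_hat = xi.mean()              # Z_hat_d = (1/N) sum_n xi_d^(n), Eq. (19)
    Z_til *= beta.mean()           # Z_tilde_d = prod_j [(1/N) sum_n beta_j^(n)], Eq. (20)
    idx = rng.choice(N, size=N, p=xi / xi.sum())                    # multinomial resampling (every iteration)
    x[:, : d + 1] = x[idx, : d + 1]                                 # resampled paths, equally weighted

print(Z_hat, Z_til)                # the two estimators coincide, as stated in Remark 2
```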

Below, we recall the MTM schemes and discuss a suitable novel variant in order to link MTM to PMH.

Table 1. Generic MTM algorithm.

1. Choose an initial state x_0 and the total number of iterations K.
2. For k = 1, ..., K:
   (a) Draw N samples x^(i) ~ q(x|x_{k−1}), i = 1, ..., N.
   (b) Choose one sample x* ∈ {x^(1), ..., x^(N)} with probability proportional to the importance weights
       w^(i) = π(x^(i)) / q(x^(i)|x_{k−1}),   i = 1, ..., N.
       Namely, draw a sample x* from π̂(x) = Σ_{n=1}^{N} w̄^(n) δ(x − x^(n)).
   (c) Draw N−1 auxiliary samples z^(j) ~ q(x|x*), j = 1, ..., N−1, and set z^(N) = x_{k−1}.
   (d) Compute the importance weights also for the auxiliary points,
       ρ^(i) = π(z^(i)) / q(z^(i)|x*),   i = 1, ..., N.
   (e) Set x_k = x* with probability
       α = 1 ∧ [ Σ_{i=1}^{N} w^(i) / Σ_{i=1}^{N} ρ^(i) ];
       otherwise, with probability 1 − α, set x_k = x_{k−1}.

3. MULTIPLE TRY METROPOLIS (MTM) ALGORITHMS

The Multiple Try Metropolis (MTM) algorithm [22] is an advanced MCMC technique where N candidates are generated at each iteration. According to some suitable weights, one candidate is chosen and accepted as the new state with a suitable probability α. The MTM steps with a generic proposal q(x|x_{k−1}), depending on the previous state, are summarized in Table 1, where we have denoted a ∧ b = min[a, b]. For N = 1, the MTM algorithm becomes the standard Metropolis-Hastings (MH) method [21, 34]. We consider importance weights in order to facilitate the comparison with other techniques; however, different kinds of weights could be applied [22, 27]. The MTM method generates a reversible Markov chain that converges to π̄(x) [22, 27]. If the proposal pdf is independent of the previous state of the chain, i.e., q(x), the algorithm can be simplified: indeed, steps 2(c) and 2(d) can be removed from the MTM scheme. Namely, one does not need to generate the auxiliary samples at step 2(c); in this case, we could directly set z^(j) = x^(j), j = 1, ..., N. The simplified MTM algorithm (I-MTM) is given in Table 2. A graphical representation of an MTM scheme is provided in Figure 1, with D = 1 and N = 2.

Alternative version of the I-MTM method (I-MTM2). In this work, we highlight that the I-MTM method can be designed in an alternative way. With a proposal pdf independent of the previous state, we have seen that we can set z^(j) = x^(j), j = 1, ..., N, because each x^(j) is itself drawn from q(x). By the same argument, we can also use the samples generated in the previous iteration of the algorithm as auxiliary points, since all the samples are generated independently from the same proposal pdf. The resulting alternative version of I-MTM is summarized in Table 3. Note that, in this case, we can write the acceptance probability as α = 1 ∧ [Ẑ*/Ẑ^(k−1)], where Ẑ* and Ẑ^(k−1) are both estimators of Z.
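As an illustration of Table 1 (and, by contrast, of the simplification exploited by I-MTM and I-MTM2 when the proposal does not depend on the previous state), here is a minimal sketch of one generic MTM iteration for a scalar state with a Gaussian random-walk proposal; the target and the parameter values are illustrative assumptions, not those of Section 6.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def mtm_step(x_prev, log_target, rng, N=10, sigma=1.0):
    """One iteration of the generic MTM of Table 1, for a scalar state and a
    Gaussian random-walk proposal q(x | x_prev) = N(x_prev, sigma^2)."""
    # (a) draw N tries from q(x | x_prev) and compute the importance weights w^(i)
    cand = rng.normal(x_prev, sigma, size=N)
    logw = log_target(cand) - norm.logpdf(cand, x_prev, sigma)
    # (b) select one try with probability proportional to its weight
    p = np.exp(logw - logsumexp(logw))
    p /= p.sum()
    x_star = cand[rng.choice(N, p=p)]
    # (c)-(d) draw N-1 auxiliary points from q(x | x_star), set the last one to x_prev,
    # and compute their weights rho^(i)
    aux = rng.normal(x_star, sigma, size=N)
    aux[-1] = x_prev
    logrho = log_target(aux) - norm.logpdf(aux, x_star, sigma)
    # (e) accept x_star with probability min{1, sum_i w^(i) / sum_i rho^(i)}
    if np.log(rng.uniform()) < min(0.0, logsumexp(logw) - logsumexp(logrho)):
        return x_star
    return x_prev

# toy usage: sample from a Gaussian target pi(x) = N(x | 2, 1)
rng = np.random.default_rng(2)
log_pi = lambda x: norm.logpdf(x, 2.0, 1.0)
chain = [0.0]
for _ in range(2000):
    chain.append(mtm_step(chain[-1], log_pi, rng, N=5))
```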

Fig. 1. Sketch of a generic MTM method with D = 1 and N = 2 tries. In this example, the second candidate is selected, x* = x^(2), with probability w̄^(2) = w^(2)/(w^(1) + w^(2)). The auxiliary points are z^(1) ~ q(z|x*) and z^(2) = x_{k−1}.

Table 2. MTM with independent proposal (I-MTM).

1. Choose an initial state x_0 and the total number of iterations K.
2. For k = 1, ..., K:
   (a) Draw N samples x^(i) ~ q(x), i = 1, ..., N.
   (b) Choose one sample x* ∈ {x^(1), ..., x^(N)} with probability proportional to the importance weights
       w^(i) = π(x^(i)) / q(x^(i)),   i = 1, ..., N.
       Moreover, we denote as w* and w_{k−1} the weights corresponding to x* and x_{k−1}, respectively.
   (c) Set x_k = x* with probability
       α = 1 ∧ [ Σ_{i=1}^{N} w^(i) / (Σ_{i=1}^{N} w^(i) − w* + w_{k−1}) ] = 1 ∧ [ Σ_{i=1}^{N} w^(i) / Σ_{i=1}^{N} ρ^(i) ],   (21)
       where the values ρ^(i) denote the importance weights of the set {z^(1), ..., z^(N)} = {x^(1), ..., x^(N)} \ {x*} ∪ {x_{k−1}}. Otherwise, set x_k = x_{k−1}.

4. PARTICLE METROPOLIS-HASTINGS ALGORITHM AND ITS RELATIONSHIP WITH MTM

Consider that we are able to factorize the target density as π(x) = γ_1(x_1) γ_2(x_2|x_1) ··· γ_D(x_D|x_{1:D−1}). The Particle Metropolis-Hastings (PMH) method [1] is another MCMC technique, proposed independently from the MTM algorithm and specifically designed for being applied in this framework. The complete description is provided in Table 4. At each iteration, a particle filter is run in order to provide a particle approximation of the target measure by N weighted samples.

Table 3. Alternative I-MTM algorithm (I-MTM2).

1. Choose an initial state x_0 and the total number of iterations K, and obtain an estimation Ẑ^(0) ≈ Z.
2. For k = 1, ..., K:
   (a) Draw N samples x^(i) ~ q(x) and choose one sample x* ∈ {x^(1), ..., x^(N)} with probability proportional to the importance weights
       w^(i) = π(x^(i)) / q(x^(i)),   i = 1, ..., N.
   (b) Set x_k = x* and Ẑ^(k) = Ẑ* = (1/N) Σ_{i=1}^{N} w^(i) with probability
       α = 1 ∧ [ Ẑ* / Ẑ^(k−1) ];
       otherwise, with probability 1 − α, set x_k = x_{k−1} and Ẑ^(k) = Ẑ^(k−1).

Table 4. Particle Metropolis-Hastings (PMH) algorithm.

1. Choose an initial state x_0 and the total number of iterations K, and obtain an estimation Ẑ^(0) ≈ Z.
2. For k = 1, ..., K:
   (a) Using a proposal pdf of the type q(x) = q_1(x_1) q_2(x_2|x_1) ··· q_D(x_D|x_{1:D−1}), employ SIR (see Section 2.2) for drawing N particles and weighting them properly, obtaining {x^(i), w̄_D^(i)}_{i=1}^{N}. Namely, we obtain a particle approximation of the measure of the target pdf,
       π̂(x) = Σ_{i=1}^{N} w̄_D^(i) δ(x − x^(i)).
       Furthermore, we also obtain Ẑ* in Eq. (11), or Z̃* in Eq. (12).
   (b) Draw x* ~ π̂(x), i.e., choose a particle x* ∈ {x^(1), ..., x^(N)} with probability w̄_D^(i), i = 1, ..., N.
   (c) Set x_k = x* and Ẑ^(k) = Ẑ* with probability
       α = 1 ∧ [ Ẑ* / Ẑ^(k−1) ],   (22)
       otherwise set x_k = x_{k−1} and Ẑ^(k) = Ẑ^(k−1).

Then, a sample among the N weighted particles is chosen via one resampling step. This selected sample is then accepted or rejected as the next state of the chain according to an MH-type acceptance probability, which involves two estimators of Z. Both estimators, Ẑ and Z̃, can be used in PMH (although the original algorithm is described with the use of Z̃ [1]), provided that the resampled particles are properly weighted as shown in Eq. (15) [24]. A generalization of PMH for handling both dynamic and fixed parameters (the hidden states and the parameters of the model, respectively), called the Particle Marginal MH algorithm, is described in Appendix B.

Smoothing. PMH is a particle smoother, since the outputs of different runs of a particle filter (i.e., at each run, N weighted paths obtained with a SIR procedure) are further processed through the MH acceptance function, taking into account all the received observations.
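A compact sketch of the PMH iteration of Table 4 is given below. The function run_sir is a hypothetical user-supplied particle filter returning the N weighted paths and the evidence estimate Ẑ (or Z̃) of Eqs. (11)-(12); only the MH-type rule of Eq. (22) is shown explicitly.

```python
import numpy as np

def pmh(run_sir, K, rng):
    """Particle Metropolis-Hastings (Table 4). run_sir(rng) is assumed to return
    (paths, w_bar, Z_hat): N weighted paths from a SIR run and the evidence estimate."""
    paths, w_bar, Z_prev = run_sir(rng)
    x_prev = paths[rng.choice(len(w_bar), p=w_bar)]      # initial state from a first PF run
    chain = [x_prev]
    for _ in range(K):
        paths, w_bar, Z_new = run_sir(rng)               # (a) independent SIR run
        x_star = paths[rng.choice(len(w_bar), p=w_bar)]  # (b) resample one path from pi_hat
        if rng.uniform() < min(1.0, Z_new / Z_prev):     # (c) accept with the probability of Eq. (22)
            x_prev, Z_prev = x_star, Z_new
        chain.append(x_prev)
    return np.array(chain)
```

Note that exactly the same loop implements I-MTM2 of Table 3 if run_sir is replaced by a batch IS routine drawing the N candidates independently from q(x); this is precisely the structural coincidence discussed in Section 4.1.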

Fig. 2. Examples of application of the IS technique. We consider as target density a multivariate Gaussian pdf, π(x) = Π_{d=1}^{10} N(x_d|2, 2). In each panel, every component of the different particles is represented, so that each particle x^(i) forms a path; the normalized weights w̄^(n) corresponding to each panel are also shown. The line-width of each path is proportional to the corresponding weight w̄^(n), and the particle corresponding to the greatest weight is always depicted in black. (a) Batch IS or SIS with N = 5 particles and q(x) = Π_{d=1}^{10} N(x_d|0, 2). (b) Batch IS or SIS with N = 40 particles and q(x) = N(x_1|2, 1) Π_{d=2}^{10} N(x_d|x_{d−1}, 1). (c) SIR with N = 40 particles, the same proposal as in (b), and resampling steps at the iterations d = 4, 8.

4.1. Relationship between MTM and PMH

A simple look at the alternative version of the MTM technique with independent proposal (I-MTM2), introduced in Section 3, and at the PMH method shows that they are strictly related. Indeed, the structure of the two algorithms coincides. The connections and differences are listed below.

The main difference lies in the fact that the candidates in PMH are generated sequentially, using an SMC procedure (i.e., by a particle filter). If the resampling steps in the SMC are not applied, then I-MTM2 and PMH are exactly the same algorithm, regardless of whether the candidates are drawn in a batch or in a sequential way. Namely, I-MTM2 generates directly x^(i) = [x^(i)_1, ..., x^(i)_D] from a pdf q(x) in the space of x, whereas PMH draws sequentially each component x^(i)_d of x from q_d(x_d|x^(i)_{1:d−1}) (see also Figure 4 for further clarification).

The use of resampling steps is the main difference between the generation procedures of PMH and I-MTM2. Owing to the use of resampling, the candidates {x^(1), ..., x^(N)} proposed by PMH are not independent, differently from the MTM schemes considered in this work. Without resampling, the generated samples x^(i) = x^(i)_{1:D} would be independent, as in I-MTM2. The generation of correlated samples can also be considered in MTM methods, as shown for instance in [5], without jeopardizing the ergodicity of the chain. Thus, more precisely, PMH can be considered as an I-MTM2 scheme using correlated samples (e.g., as in [5]), where the candidates are generated sequentially. To clarify this point, Figure 2 shows different particles weighted with IS weights (the line-width of each path is proportional to the corresponding normalized weight w̄^(n)). More specifically, we represent each component x^(n)_d, d = 1, ..., D = 10, of each particle x^(n) = x^(n)_{1:10}, with n = 1, ..., N and N ∈ {5, 40}. The target density is a multivariate Gaussian pdf, π(x) = Π_{d=1}^{10} N(x_d|2, 2), i.e., with expected value µ_d = 2 for d = 1, ..., 10. Figures 2(a)-(b) correspond to the application of IS with two different proposal pdfs and without resampling. In Figure 2(a), the components x^(n)_d are independent. In Figure 2(b), the components x^(n)_d within each sample x^(n) are correlated, but the samples x^(n), n = 1, ..., N, are still independent.

In Figure 2(c), two resampling steps are also applied, at the iterations d = 4 and d = 8, generating correlation among the particles x^(n), n = 1, ..., N, as well. Figure 2(c) corresponds to the sample generation in PMH.

In their standard formulations, I-MTM2 uses the estimator Ẑ in Eq. (4), whereas PMH has been proposed using Z̃, given in Eq. (12). However, these are equivalent formulations of an estimator of the normalizing constant Z (see Remark 2), provided that the resampled particles are properly weighted [24].

The PMH approach is preferable in high dimension, when the target can be factorized as in Eq. (5), since the use of the resampling steps can provide a better proposal generation procedure.

5. NOVEL SCHEMES

The previous considerations also allow us to design novel PMH schemes. For instance, we can easily suggest an alternative proper acceptance probability,

α = 1 ∧ [ N Ẑ* / (N Ẑ* − w* + w_{k−1}) ].   (23)

We denote as var-PMH the variant of the PMH technique which uses the probability α above instead of the probability in Eq. (22). Namely, var-PMH is identical to the PMH method in Table 4, with Eq. (22) replaced by Eq. (23). The var-PMH structure is equivalent (within a sequential framework) to the I-MTM scheme of Table 2, in the same fashion as PMH in Table 4 is equivalent to I-MTM2 of Table 3.

State dependent PMH (S-PMH). Moreover, we can also extend the standard PMH method by employing a state-dependent proposal pdf (i.e., dependent on the previous state), instead of an independent proposal (namely, independent of the previous state) as in Table 4. This novel scheme, denoted as S-PMH, is outlined in Table 5. In this case, the generation of a backward path is required at step 2(c); hence, S-PMH carries this additional computational cost. However, the generated backward paths could also be recycled for estimating the hidden states (nevertheless, this requires and deserves a more specific analysis). The validity of S-PMH is ensured since it corresponds to the MTM scheme in Table 1. In S-PMH, the approximation π̂ is constructed with correlated samples, due to the resampling, unlike in MTM. However, this does not jeopardize the ergodicity (as shown, e.g., in [5]). Furthermore, we consider the use of resampling steps only at certain 0 ≤ R ≤ K pre-established iterations, d_1, ..., d_R. If R = 0, no resampling is applied, so that we obtain a standard MTM scheme with a sequential generation of the tries. If R = K, the resampling is applied at each iteration, so that we have a bootstrap filter for generating the samples [7, 9]. Figure 3(a) shows a sketch of the different schemes discussed in this work: the MTM schemes are given on the left side, whereas the corresponding PMH approaches are provided on the right, and the boxes with dashed contours represent the novel schemes introduced in this work. As an example of proposal in the S-PMH scheme, we can consider q_d(s_d|s_{1:d−1}, x_{1:D,k−1}) = q_d(s_d|s_{d−1}, x_{d,k−1}), so that the complete proposal is q(s|x_{k−1}) = q_1(s_1|x_{1,k−1}) Π_{d=2}^{D} q_d(s_d|s_{d−1}, x_{d,k−1}). However, it is not straightforward to choose and properly tune the components q_d(s_d|s_{d−1}, x_{d,k−1}).

Particle Multiple Try Metropolis (P-MTM) algorithm. A simpler and more robust scheme consists in alternately performing a standard PMH kernel (or var-PMH), denoted as K_PMH(x_t|x_{t−1}), and an MTM kernel with a random walk proposal pdf, denoted as K_MTM(x_t|x_{t−1}). Namely, the Particle Multiple Try Metropolis (P-MTM) algorithm is formed by the following steps:

1. Choose an initial state x_0, and set t = 1.
2. While t < T:
   (a) Generate a new state using one step of PMH, i.e., x_t ~ K_PMH(x|x_{t−1}), drawing the candidates sequentially with an independent proposal pdf q(x) = q_1(x_1) Π_{d=2}^{D} q_d(x_d|x_{d−1}), and set t = t + 1.
   (b) Generate a new state using one step of MTM, x_t ~ K_MTM(x|x_{t−1}), using a random walk proposal pdf, i.e., q(x|x_{t−1}), in order to draw the N candidates, and set t = t + 1.

The two kernels are linked together and, since each kernel leaves the target pdf π̄ invariant, it is easy to show that the product of two valid kernels also leaves π̄ invariant (see Appendix C). P-MTM exploits the advantage of selecting N particles component by component using resampling steps (as in PMH) and, jointly, the possibility of proposing N candidates taking into account the previous state of the chain. Figure 3(b) provides a graphical representation of P-MTM, and Figure 4 gives a sketch of the two generation approaches jointly used in P-MTM.

Fig. 3. (a) Graphical representation of the MTM methods (generic MTM, I-MTM, I-MTM2) and the corresponding PMH schemes (S-PMH, variant of PMH, PMH); the boxes with dashed contours contain the novel schemes presented in this work. (b) Graphical sketch of the P-MTM algorithm, alternating K_PMH(x_t|x_{t−1}) and K_MTM(x_{t+1}|x_t).

Fig. 4. Examples of generation of one candidate x^(n) (with D = 5): (a) with a sequential approach as in PMH, using q(x) = q_1(x_1) Π_{d=2}^{5} q_d(x_d|x_{d−1}), without considering resampling steps; (b) from a random walk proposal q(x|x_{k−1}) which takes into account the previous state of the chain x_{k−1}, for instance, q(x|x_{k−1}) = Π_{d=1}^{5} q_d(x_d|x_{d,k−1}) (if each component is drawn independently of the others).
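In code, P-MTM is simply the alternation of the two kernels. The sketch below assumes pmh_kernel and mtm_kernel are user-supplied callables (e.g., built from the PMH and MTM sketches above), with the PMH kernel keeping track internally (for instance, via a closure) of the evidence estimate attached to the current state.

```python
def p_mtm(x0, T, pmh_kernel, mtm_kernel, rng):
    """Particle MTM (Section 5): alternate one PMH step (independent, sequential proposal)
    and one random-walk MTM step; each kernel leaves pi invariant, hence so does their
    composition (Appendix C)."""
    x, chain = x0, [x0]
    for t in range(T):
        x = pmh_kernel(x, rng) if t % 2 == 0 else mtm_kernel(x, rng)
        chain.append(x)
    return chain
```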

6. NUMERICAL SIMULATIONS

6.1. Comparison among different particle schemes

In order to test the different techniques, we consider a multidimensional Gaussian target density,

π(x) = π(x_1, ..., x_D) = Π_{d=1}^{D} N(x_d|µ_d, σ_d²),   (24)

with x = x_{1:D} ∈ R^D, D = 10, µ_{1:3} = 2, µ_{4:7} = 4, µ_{8:10} = 1, and σ_d = 2. We apply I-MTM, I-MTM2, PMH, var-PMH (using the acceptance probability in Eq. (23)) and P-MTM for estimating the vector µ_{1:10}. In each method, we employ Gaussian pieces q_d(x_d|x_{d−1}) = N(x_d|x_{d−1}, σ_p²) for the sequential construction of the N candidates, setting σ_p = 2. For the random walk MTM (RW-MTM) part within P-MTM, we consider a Gaussian proposal q_rw(x|x_{k−1}) = Π_{d=1}^{D} N(x_d|x_{d,k−1}, σ_rw²), with σ_rw = 1, where x_{k−1} = [x_{1,k−1}, ..., x_{D,k−1}] is the previous state of the chain. For all the PMH schemes, we perform resampling at each iteration (in I-MTM and I-MTM2, clearly, no resampling is applied). We test the techniques considering different values of the number of particles N and of the number of iterations of the chain K. We compute the MSE in estimating the vector µ_{1:10}, averaging over 500 independent simulations. The starting particles, at d = 1, are chosen randomly, x^(i)_1 ~ N(x; 2, 4), for i = 1, ..., N, at each run and for each method.

Figures 5(a)-(b) show the MSE as a function of the number of iterations K in semi-log scale, keeping the number of tries fixed to N = 3. Figure 5(a) reports the results of the MTM schemes, whereas Figure 5(b) reports the results of the PMH schemes. Figure 5(c) depicts the MSE in the estimation of µ_{1:10} as a function of N, for the PMH methods. These results show that the use of an acceptance probability of the type in Eqs. (21)-(23) provides a smaller MSE. This is more evident for a small number of candidates N: namely, the use of the acceptance probability in Eq. (23) within PMH is preferable, since it provides better performance. When N grows, the performance of the PMH and var-PMH methods becomes similar, since the acceptance probability approaches 1 in both cases. P-MTM provides excellent results, outperforming the other schemes. The MSE vanishes as N increases, as expected, confirming the validity of the novel schemes. The results also show that performing resampling at each iteration is not optimal, and that a smaller rate of resampling steps could improve the performance [7, 9]. Figure 5(d) depicts 35 different states x_k = x_{1:10,k} at different iteration indices k, obtained with var-PMH (N = 1000 and K = 1000), with the values µ_{1:10} shown as a dashed line.

6.2. Inference in a stochastic volatility model

We consider a stochastic volatility model where the hidden state x_d ∈ R at the d-th iteration follows an AR(1) process and represents the log-volatility [14] of a financial time series at time d. The state space model is given by

x_d = α x_{d−1} + u_d,   y_d = exp(x_d/2) v_d,   d = 1, ..., D,   (25)

where α = 0.9 is the AR parameter, and u_d and v_d denote independent zero-mean Gaussian random variables with variances σ_u² = 1 and σ_v² = 0.5, respectively. Note that v_d is a multiplicative noise. We observe a sequence y_{1:D} ∈ R^D, and we desire to infer the hidden states x_{1:D}, analyzing the joint posterior π(x_{1:D}|y_{1:D}). A classical particle filtering (PF) approach provides a particle approximation of the posterior π(x_{1:D}|y_{1:D}) and, due to the Monte Carlo approach, also of the marginal posteriors π(x_d|y_{1:D}) with d < D (used for smoothing). Unfortunately, when d << D, the standard PF strategy in general fails: the marginal distribution π(x_D|y_{1:D}) (used for filtering) occupies a privileged role within the particle filter framework, as it is better characterized than any of the other marginal posterior distributions, as pointed out in [9].
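For reference, a sketch of the model of Eq. (25) and of the bootstrap-filter ingredients used inside PMH/P-MTM is given below; the initialization x_0 = 0 is an assumption, since the text does not specify the prior of the first state.

```python
import numpy as np

rng = np.random.default_rng(3)
D, alpha, var_u, var_v = 100, 0.9, 1.0, 0.5      # model of Eq. (25)

# simulate hidden log-volatilities and observations (x_0 = 0 assumed)
x = np.zeros(D)
y = np.zeros(D)
for d in range(D):
    x_prev = x[d - 1] if d > 0 else 0.0
    x[d] = alpha * x_prev + np.sqrt(var_u) * rng.normal()
    y[d] = np.exp(x[d] / 2.0) * np.sqrt(var_v) * rng.normal()

# bootstrap-filter ingredients: the proposal is the transition p(x_d | x_{d-1}) =
# N(alpha * x_{d-1}, var_u), so the incremental weight beta_d of Eq. (9) reduces to
# the likelihood l(y_d | x_d) = N(y_d | 0, exp(x_d) * var_v)
def log_lik(xd, yd):
    var = np.exp(xd) * var_v
    return -0.5 * (np.log(2.0 * np.pi * var) + yd ** 2 / var)
```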

Fig. 5. (a)-(b) MSE versus the number of iterations K of the chain in semi-log scale, fixing the number of particles N = 3: (a) I-MTM (solid line) and I-MTM2 (dashed line); (b) PMH (solid line), var-PMH (triangles, dashed line) and P-MTM (stars, dashed line). (c) MSE versus N in semi-log scale for PMH (solid line), var-PMH (triangles, dashed line) and P-MTM (stars, dashed line). (d) Different states x_k = x_{1:10,k} at different iteration indices k, obtained with var-PMH (N = 1000 and K = 1000); the values µ_{1:10} are shown as a dashed line (µ_{1:3} = 2, µ_{4:7} = 4 and µ_{8:10} = 1).

PMH schemes can also be employed for improving the approximations of the marginal posteriors π(x_d|y_{1:D}) with d < D, as well as of the complete posterior π(x_{1:D}|y_{1:D}). Indeed, PMH can be interpreted as a way to properly combine several independent runs of a particle filter (performed after obtaining all the observations y_{1:D}), generating an ergodic Markov chain with invariant distribution π(x_{1:D}|y_{1:D}).

We test the standard PMH technique [1] and the Particle MTM (P-MTM) method described in Section 5. In both cases, for the sequential proposal part, we employ a standard bootstrap particle filter (i.e., using the transition probability p(x_d|x_{d−1}) given by the model as proposal, with resampling at each iteration). For the standard MTM block in P-MTM, we use q(x|x_{k−1}) = Π_{d=1}^{D} N(x_d|x_{d,k−1}, σ_p²) as proposal pdf, with σ_p = 0.5. We have averaged the results over 500 independent runs. At each run, we generate a different sequence of observations y_{1:D} from the model, setting D = 100. Clearly, both schemes are compared with the same number of evaluations of the posterior (i.e., for a fixed N, PMH performs K iterations of the chain, whereas P-MTM is formed by K/2 PMH steps and K/2 MTM steps, as described in Section 5). First, we consider N = 10 and vary the total number of iterations of the algorithms, K, from K = 2 to K = 500. The results are shown in Fig. 6(a). Then, we fix K = 50 and test different values of N, from N = 10 to N = 1000. The results are shown in Fig. 6(b). We can observe that P-MTM always outperforms PMH, obtaining smaller values of MSE (with the exception of one value with a very small number of iterations, K = 2, in Fig. 6(a)).

Fig. 6. (a) MSE versus the number of iterations K of the generated chain in semi-log scale, fixing the number of particles (N = 10). (b) MSE versus the number of particles N in semi-log scale, fixing the number of iterations of the algorithms (K = 50).

7. CONCLUSIONS

In this work, we have highlighted the strong connection between the MTM and PMH algorithms. Specifically, PMH can be interpreted as an MTM scheme using correlated candidates, drawn and weighted sequentially through a particle filter.

We have employed this consideration in order to introduce novel MTM and PMH schemes, properly mixing both approaches. For instance, the Particle MTM (P-MTM) algorithm combines the standard PMH and MTM kernels, using the sequential construction of the different tries obtained by PMH, and the possibility of perturbing the previous state of the chain using an MTM step with a random-walk proposal density. P-MTM is a very efficient technique for filtering and smoothing in state space models. We have tested P-MTM, jointly with other techniques, in different numerical simulations, obtaining excellent results. As a future line, we plan to design the marginal version of P-MTM for the joint purpose of inferring dynamic and static parameters in a state space framework.

References

[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. J. R. Statist. Soc. B, 72(3):269-342, 2010.
[2] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174-188, February 2002.
[3] M. Bédard, R. Douc, and E. Moulines. Scaling analysis of multiple-try MCMC methods. Stochastic Processes and their Applications, 122, 2012.
[4] R. Casarin, R. Craiu, and F. Leisen. Interacting multiple try algorithms with different proposal distributions. Statistics and Computing, 23(2):185-200, 2013.
[5] R. V. Craiu and C. Lemieux. Acceleration of the Multiple-Try Metropolis algorithm using antithetic and stratified sampling. Statistics and Computing, 17(2):109-120, 2007.
[6] J. Dahlin, F. Lindsten, and T. B. Schön. Particle Metropolis-Hastings using Langevin dynamics. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2013.
[7] P. M. Djurić, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, and J. Míguez. Particle filtering. IEEE Signal Processing Magazine, 20(5):19-38, September 2003.
[8] A. Doucet, N. de Freitas, and N. Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer, New York (USA), 2001.
[9] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: fifteen years later. Technical report, 2008.
[10] W. Fong, S. J. Godsill, A. Doucet, and M. West. Monte Carlo smoothing with application to audio signal enhancement. IEEE Transactions on Signal Processing, 50(2), February 2002.
[11] D. Frenkel and B. Smit. Understanding Molecular Simulation: From Algorithms to Applications. Academic Press, San Diego, 1996.
[12] S. Godsill, A. Doucet, and M. West. Monte Carlo smoothing for nonlinear time series. Journal of the American Statistical Association, 99(465):156-168, March 2004.
[13] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97-109, 1970.
[14] E. Jacquier, N. G. Polson, and P. E. Rossi. Bayesian analysis of stochastic volatility models. Journal of Business and Economic Statistics, 12(4):371-389, October 1994.

[15] L. Jing and P. Vadakkepat. Interacting MCMC particle filter for tracking maneuvering target. Digital Signal Processing, 20:561-574, August 2010.
[16] G. Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state-space models. J. Comput. Graph. Statist., 5(1):1-25, 1996.
[17] G. Kitagawa and S. Sato. Monte Carlo smoothing and self-organising state-space model. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice, chapter 9. Springer, 2001.
[18] G. Kobayashi and H. Kozumi. Generalized multiple-point Metropolis algorithms for approximate Bayesian computation. Journal of Statistical Computation and Simulation, 85(4), 2015.
[19] J. R. Larocque and P. Reilly. Reversible jump MCMC for joint detection and estimation of sources in colored noise. IEEE Transactions on Signal Processing, 50(2), February 1998.
[20] F. Liang, C. Liu, and R. Carroll. Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples. Wiley Series in Computational Statistics, England, 2010.
[21] J. S. Liu. Monte Carlo Strategies in Scientific Computing. Springer.
[22] J. S. Liu, F. Liang, and W. H. Wong. The Multiple-Try method and local optimization in Metropolis sampling. Journal of the American Statistical Association, 95(449):121-134, March 2000.
[23] L. Martino, V. Elvira, and F. Louzada. Alternative Effective Sample Size measures for Importance Sampling. IEEE Workshop on Statistical Signal Processing (SSP), June 2016.
[24] L. Martino, V. Elvira, and F. Louzada. Weighting a resampled particle in Sequential Monte Carlo. IEEE Workshop on Statistical Signal Processing (SSP), June 2016.
[25] L. Martino and F. Louzada. Issues in the Multiple Try Metropolis mixing. Computational Statistics (to appear), 2016.
[26] L. Martino, V. P. del Olmo, and J. Read. A multi-point Metropolis scheme with generic weight functions. Statistics & Probability Letters, 82(7), 2012.
[27] L. Martino and J. Read. On the flexibility of the design of Multiple Try Metropolis schemes. Computational Statistics, 28(6), 2013.
[28] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equations of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087-1092, 1953.
[29] J. Míguez, T. Ghirmai, M. F. Bugallo, and P. M. Djurić. A sequential Monte Carlo technique for blind synchronization and detection in frequency-flat Rayleigh fading wireless channels. Signal Processing, 84(11), November 2004.
[30] I. Nevat, G. W. Peters, and J. Yuan. Channel tracking in relay systems via particle MCMC. In IEEE Vehicular Technology Conference, pages 1-5, September 2011.
[31] S. Pandolfi, F. Bartolucci, and N. Friel. A generalized multiple-try version of the reversible jump algorithm. Computational Statistics & Data Analysis (available online), 2013.
[32] A. Papoulis. Probability, Random Variables and Stochastic Processes. McGraw-Hill Series in Electrical Engineering, 1984.

[33] Z. S. Qin and J. S. Liu. Multi-Point Metropolis method with application to hybrid Monte Carlo. Journal of Computational Physics, 172, 2001.
[34] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, 2004.
[35] J. I. Siepmann and D. Frenkel. Configurational bias Monte Carlo: a new sampling scheme for flexible chains. Molecular Physics, 75(1):59-70, 1992.
[36] P. Stavropoulos and D. M. Titterington. Improved particle filters and smoothing. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice, chapter 4. Springer, 2001.
[37] G. Storvik. On the flexibility of Metropolis-Hastings acceptance probabilities in auxiliary variable proposal generation. Scandinavian Journal of Statistics, 38(2), February 2011.
[38] T. Vu, B. N. Vo, and R. Evans. A particle marginal Metropolis-Hastings multi-target tracker. IEEE Transactions on Signal Processing, 62(5), August 2014.
[39] Y. Zhang and W. Zhang. Improved generic acceptance function for multi-point Metropolis algorithms. 2nd International Conference on Electronic and Mechanical Engineering and Information Technology, 2012.

A. ALTERNATIVE FORMULATION OF THE ESTIMATOR OF Z

In the SIS approach, there are two possible equivalent formulations of the estimator of Z: the first one, Ẑ_D, in Eqs. (4) and (11), and the second one, Z̃_D, given in Eq. (12). This alternative formulation can also be derived as follows. Consider the estimator

Z_d = ∫_{X^d} π_d(x_{1:d}) dx_{1:d},   Ẑ_d = (1/N) Σ_{n=1}^{N} w_d^(n),   (26)

and the integral of the conditional factor γ_d with respect to π̄_{d−1}. Clearly, we can write

∫_{X^d} γ_d(x_d|x_{1:d−1}) π̄_{d−1}(x_{1:d−1}) dx_{1:d} = (1/Z_{d−1}) ∫_{X^d} γ_d(x_d|x_{1:d−1}) π_{d−1}(x_{1:d−1}) dx_{1:d} = (1/Z_{d−1}) ∫_{X^d} π_d(x_{1:d}) dx_{1:d}   (27)

= Z_d / Z_{d−1}.   (28)

Moreover,

∫_{X^d} γ_d(x_d|x_{1:d−1}) π̄_{d−1}(x_{1:d−1}) dx_{1:d} = ∫_{X^d} [γ_d(x_d|x_{1:d−1}) / q_d(x_d|x_{1:d−1})] q_d(x_d|x_{1:d−1}) π̄_{d−1}(x_{1:d−1}) dx_{1:d} = ∫_{X^d} β_d(x_d|x_{1:d−1}) q_d(x_d|x_{1:d−1}) π̄_{d−1}(x_{1:d−1}) dx_{1:d},   (29)

where we have set β_d(x_d|x_{1:d−1}) = γ_d(x_d|x_{1:d−1}) / q_d(x_d|x_{1:d−1}). Replacing π̄_{d−1}(x_{1:d−1}) with the particle approximation π̂_{d−1}(x_{1:d−1}) given in Eq. (10),

∫_{X^d} β_d(x_d|x_{1:d−1}) q_d(x_d|x_{1:d−1}) π̂_{d−1}(x_{1:d−1}) dx_{1:d} = Σ_{n=1}^{N} w̄_{d−1}^(n) ∫_{X^d} β_d(x_d|x_{1:d−1}) q_d(x_d|x_{1:d−1}) δ(x_{1:d−1} − x^(n)_{1:d−1}) dx_{1:d} = Σ_{n=1}^{N} w̄_{d−1}^(n) ∫_{X_d} β_d(x_d|x^(n)_{1:d−1}) q_d(x_d|x^(n)_{1:d−1}) dx_d.

Hence, using Monte Carlo again for approximating each integral within the sum, i.e., given N samples x^(n)_d ~ q_d(x_d|x^(n)_{1:d−1}), n = 1, ..., N (one sample for each different q_d(·|x^(n)_{1:d−1})), and denoting β_d^(n) = β_d(x^(n)_d|x^(n)_{1:d−1}), we obtain

∫_{X^d} β_d(x_d|x_{1:d−1}) q_d(x_d|x_{1:d−1}) π̂_{d−1}(x_{1:d−1}) dx_{1:d} ≈ Σ_{n=1}^{N} w̄_{d−1}^(n) β_d^(n),   (30)

and, furthermore,

Σ_{n=1}^{N} w̄_{d−1}^(n) β_d^(n) = Σ_{n=1}^{N} [w_{d−1}^(n) / Σ_{i=1}^{N} w_{d−1}^(i)] β_d^(n) = [Σ_{n=1}^{N} w_d^(n)] / [Σ_{i=1}^{N} w_{d−1}^(i)] = [(1/N) Σ_{n=1}^{N} w_d^(n)] / [(1/N) Σ_{i=1}^{N} w_{d−1}^(i)] = Ẑ_d / Ẑ_{d−1},   (31)

where we have used the recursive expression of the weights, w_d^(n) = w_{d−1}^(n) β_d^(n), and Ẑ_d is the estimator in Eq. (26). Finally, setting Ẑ_0 = 1, we obtain

Z̃_D = Π_{d=1}^{D} [ Σ_{n=1}^{N} w̄_{d−1}^(n) β_d^(n) ] = (Ẑ_1/Ẑ_0)(Ẑ_2/Ẑ_1) ··· (Ẑ_D/Ẑ_{D−1}) = Ẑ_D,   (32)

that is exactly the estimator in Eq. (12).

A.1. Application of resampling

Let us consider approximating the integral in Eq. (30) via importance sampling, assuming in this case that the N samples x^(1)_{1:d}, ..., x^(N)_{1:d} are drawn from q_d(x_d|x_{1:d−1}) π̂_{d−1}(x_{1:d−1}); hence we can write

(1/N) Σ_{n=1}^{N} β_d^(n) ≈ ∫_{X^d} β_d(x_d|x_{1:d−1}) q_d(x_d|x_{1:d−1}) π̂_{d−1}(x_{1:d−1}) dx_{1:d}.   (33)

Moreover, using Eq. (27), we have (1/N) Σ_{n=1}^{N} β_d^(n) ≈ Z_d / Z_{d−1}.

B. PARTICLE MARGINAL METROPOLIS-HASTINGS (PM-MH) ALGORITHM

The Particle Marginal Metropolis-Hastings (PM-MH) algorithm is a simple extension of the PMH method for the combined sampling of dynamic and fixed unknown parameters, denoted as x and θ, respectively. Let us consider the following state space model,

x_d ~ q_d(x_d|x_{d−1}, θ),   y_d ~ l_d(y_d|x_d, θ),   (34)

where q_d represents a transition probability and l_d is the likelihood function. The parameter θ ∈ Θ is also considered unknown, so that the inference problem consists in inferring (x_{1:D}, θ) given the sequence of received measurements y_{1:D}. With respect to the notation used in Section 2.1, we have γ_1(x_1|θ) = l_1(y_1|x_1, θ) q_1(x_1|θ), and γ_d(x_d|x_{1:d−1}, θ) = l_d(y_d|x_d, θ) q_d(x_d|x_{d−1}, θ),

with d = 2, ..., D. Hence, considering also a prior p(θ) over θ, and setting x = x_{1:D}, y = y_{1:D}, the complete target is

π(x, θ|y) = π(x|y, θ) p(θ|y)   (35)
          = π(x|y, θ) p(y|θ) p(θ) / p(y)   (36)
          = π(x, y|θ) p(θ) / p(y)   (37)
          = [ l_1(y_1|x_1, θ) q_1(x_1|θ) Π_{d=2}^{D} l_d(y_d|x_d, θ) q_d(x_d|x_{d−1}, θ) ] p(θ) / p(y).   (38)

We can evaluate π(x, y|θ) ∝ π(x|y, θ); hence it is not an issue to use a self-normalized IS approach for approximating π(x|y, θ). However, we cannot evaluate p(θ|y), p(y|θ) and p(y). Let us consider applying a standard MH method for sampling from π(x, θ|y). We assume it is possible to draw samples [x*, θ*] from the proposal pdf q(θ, x|θ_{k−1}) = q_θ(θ|θ_{k−1}) π(x|y, θ), where k = 1, ..., K is the iteration index of the chain and π(x|y, θ) is the posterior of x. Assuming, hypothetically, that it is also possible to draw from q(θ_{k−1}, x_{k−1}|θ*), we obtain the following acceptance probability,

α = [ π(x*, θ*|y) q(θ_{k−1}, x_{k−1}|θ*) ] / [ π(x_{k−1}, θ_{k−1}|y) q(θ*, x*|θ_{k−1}) ]   (39)
  = [ π(x*, θ*|y) q_θ(θ_{k−1}|θ*) π(x_{k−1}|y, θ_{k−1}) ] / [ π(x_{k−1}, θ_{k−1}|y) q_θ(θ*|θ_{k−1}) π(x*|y, θ*) ].   (40)

Then, since π(x, θ|y) = π(x|y, θ) p(θ|y), we can replace it into the expression above, obtaining

α = [ π(x*|y, θ*) p(θ*|y) q_θ(θ_{k−1}|θ*) π(x_{k−1}|y, θ_{k−1}) ] / [ π(x_{k−1}|y, θ_{k−1}) p(θ_{k−1}|y) q_θ(θ*|θ_{k−1}) π(x*|y, θ*) ]   (41)
  = [ p(θ*|y) q_θ(θ_{k−1}|θ*) ] / [ p(θ_{k−1}|y) q_θ(θ*|θ_{k−1}) ]   (42)
  = [ p(y|θ*) p(θ*) q_θ(θ_{k−1}|θ*) ] / [ p(y|θ_{k−1}) p(θ_{k−1}) q_θ(θ*|θ_{k−1}) ].   (43)

The problem is that, in general, we are not able to evaluate the likelihood function Z(θ) = p(y|θ) = ∫ π(x, y|θ) dx. However, we can approximate Z(θ) via importance sampling. Thus, the idea is to use the approximate proposal pdf q(θ, x|θ_{k−1}) = q_θ(θ|θ_{k−1}) π̂(x|y, θ), where π̂ is a particle approximation of π obtained by SIR and, at the same time, we get the estimation Ẑ(θ*). Therefore, the PM-MH algorithm can be summarized as follows:

1. For k = 1, ..., K:
   (a) Draw θ* ~ q_θ(θ|θ_{k−1}) and then x* ~ π̂(x|y, θ*) via SIR.
   (b) Set [θ_k, x_k] = [θ*, x*] with probability
       α = [ Ẑ(θ*) p(θ*) q_θ(θ_{k−1}|θ*) ] / [ Ẑ(θ_{k−1}) p(θ_{k−1}) q_θ(θ*|θ_{k−1}) ];
       otherwise set [θ_k, x_k] = [θ_{k−1}, x_{k−1}].

Given the observations provided in this work, PM-MH can be seen as a combination of an MH method with respect to θ and an MTM-type method with respect to x.
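A sketch of the resulting PM-MH loop is given below; q_theta_sample, q_theta_logpdf, log_prior and run_sir_given_theta are hypothetical user-supplied functions, the last one returning a path drawn from the particle approximation of π(x|y, θ) together with the SIR estimate Ẑ(θ) of p(y|θ).

```python
import numpy as np

def pm_mh(theta0, q_theta_sample, q_theta_logpdf, log_prior, run_sir_given_theta, K, rng):
    """Particle Marginal MH (Appendix B). run_sir_given_theta(theta, rng) is assumed to
    return (x_path, Z_hat): a path drawn from the particle approximation of pi(x|y,theta)
    and the estimate of p(y|theta)."""
    x, Z = run_sir_given_theta(theta0, rng)
    theta, chain = theta0, [(theta0, x)]
    for _ in range(K):
        theta_p = q_theta_sample(theta, rng)                      # theta' ~ q_theta(.|theta_{k-1})
        x_p, Z_p = run_sir_given_theta(theta_p, rng)              # x' ~ pi_hat(x|y,theta') via SIR
        log_a = (np.log(Z_p) + log_prior(theta_p) + q_theta_logpdf(theta, theta_p)
                 - np.log(Z) - log_prior(theta) - q_theta_logpdf(theta_p, theta))
        if np.log(rng.uniform()) < min(0.0, log_a):               # MH rule with the ratio of Appendix B
            theta, x, Z = theta_p, x_p, Z_p
        chain.append((theta, x))
    return chain
```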

C. INVARIANT DENSITY OF P-MTM

Let us consider two MCMC kernels, K_PMH(y|x) and K_MTM(z|y), with x, y, z ∈ X ⊆ R^{Dζ}, corresponding to the PMH and MTM steps in the P-MTM scheme, respectively. We assume that π̄(·) is the invariant density of both chains. The two kernels have been designed such that

∫ K_PMH(y|x) π̄(x) dx = π̄(y),   ∫ K_MTM(z|y) π̄(y) dy = π̄(z).

In P-MTM, the kernels K_PMH and K_MTM are used sequentially. Namely, first a sample is drawn as y' ~ K_PMH(y|x), and then z' ~ K_MTM(z|y'). The complete transition probability from x to z is given by

K_PMTM(z|x) = ∫ K_MTM(z|y) K_PMH(y|x) dy.   (44)

The target π̄ is also invariant with respect to K_PMTM(z|x) [34, 21, 20]. Indeed, we can write

∫ K_PMTM(z|x) π̄(x) dx = ∫ [ ∫ K_MTM(z|y) K_PMH(y|x) dy ] π̄(x) dx
                      = ∫ K_MTM(z|y) [ ∫ K_PMH(y|x) π̄(x) dx ] dy
                      = ∫ K_MTM(z|y) π̄(y) dy
                      = π̄(z),   (45)

which is precisely the definition of an invariant pdf of K_PMTM(z|x). Clearly, we can invert the order of application of the kernels, i.e., using first K_MTM and then K_PMH. Moreover, since the two kernels are connected sequentially, the intermediate steps are also distributed as π̄(z), after a burn-in period.
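The invariance of the composed kernel can also be checked empirically. In the toy sketch below, two random-walk MH kernels with different step sizes stand in for K_PMH and K_MTM (an illustrative check under that substitution, not the P-MTM algorithm itself), and the composed chain still reproduces the moments of a standard Gaussian target.

```python
import numpy as np

rng = np.random.default_rng(4)
log_pi = lambda z: -0.5 * z ** 2                 # standard Gaussian target (up to a constant)

def mh_kernel(x, sigma):
    """A pi-invariant random-walk MH kernel; two step sizes play the roles of K_PMH and K_MTM."""
    xp = x + sigma * rng.normal()
    return xp if np.log(rng.uniform()) < log_pi(xp) - log_pi(x) else x

x, samples = 0.0, []
for _ in range(100000):
    x = mh_kernel(x, 0.5)                        # first kernel
    x = mh_kernel(x, 3.0)                        # second kernel, applied sequentially as in Eq. (44)
    samples.append(x)
print(np.mean(samples), np.var(samples))         # approximately 0 and 1: pi remains invariant
```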

Table 5. State dependent PMH (S-PMH).

1. Choose an initial state x_0 and the total number of iterations K.
2. For k = 1, ..., K:
   (a) Using a proposal pdf of the type
       q(s|x_{k−1}) = q_1(s_1|x_{1,k−1}) q_2(s_2|s_1, x_{1:2,k−1}) ··· q_D(s_D|s_{1:D−1}, x_{1:D,k−1}),   (46)
       employ SIR (see Section 2.2) for drawing N particles, x^(i), and weighting them properly, obtaining {x^(i), w_D^(i)}_{i=1}^{N}. The resampling steps are applied at R fixed and pre-established iterations (0 ≤ R ≤ K), d_1 < d_2 < ... < d_R. Thus, we obtain a particle approximation of the measure of the target pdf, π̂(x) = Σ_{i=1}^{N} w̄_D^(i) δ(x − x^(i)). Furthermore, we also obtain Ẑ_X in Eq. (11) or Z̃_X as in Eq. (12).
   (b) Draw x* ~ π̂(x), i.e., choose a particle x* ∈ {x^(1), ..., x^(N)} with probability w̄_D^(i), i = 1, ..., N.
   (c) Draw N − 1 particles z^(1), ..., z^(N−1) via SIR using q(z|x*) as in Eq. (46), applying resampling at the same iterations, d_1 < d_2 < ... < d_R, used in the generation of the x^(i)'s. Moreover, set z^(N) = x_{k−1}.
   (d) Compute
       ρ^(i) = π(z^(i)) / q(z^(i)|x*),   i = 1, ..., N,
       and Ẑ_Z = (1/N) Σ_{i=1}^{N} ρ^(i).
   (e) Set x_k = x* with probability
       α = 1 ∧ [ Ẑ_X / Ẑ_Z ];
       otherwise set x_k = x_{k−1}.


More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

Adaptive Rejection Sampling with fixed number of nodes

Adaptive Rejection Sampling with fixed number of nodes Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, Brazil. Abstract The adaptive rejection sampling

More information

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers International Journal of Statistics an Probability; Vol 6, No 5; September 207 ISSN 927-7032 E-ISSN 927-7040 Publishe by Canaian Center of Science an Eucation Improving Estimation Accuracy in Nonranomize

More information

A Modification of the Jarque-Bera Test. for Normality

A Modification of the Jarque-Bera Test. for Normality Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

Gaussian processes with monotonicity information

Gaussian processes with monotonicity information Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

Cascaded redundancy reduction

Cascaded redundancy reduction Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,

More information

Adaptive Rejection Sampling with fixed number of nodes

Adaptive Rejection Sampling with fixed number of nodes Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, São Carlos (São Paulo). Abstract The adaptive

More information

A simple model for the small-strain behaviour of soils

A simple model for the small-strain behaviour of soils A simple moel for the small-strain behaviour of soils José Jorge Naer Department of Structural an Geotechnical ngineering, Polytechnic School, University of São Paulo 05508-900, São Paulo, Brazil, e-mail:

More information

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE Journal of Soun an Vibration (1996) 191(3), 397 414 THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE E. M. WEINSTEIN Galaxy Scientific Corporation, 2500 English Creek

More information

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University

More information

Non-Linear Bayesian CBRN Source Term Estimation

Non-Linear Bayesian CBRN Source Term Estimation Non-Linear Bayesian CBRN Source Term Estimation Peter Robins Hazar Assessment, Simulation an Preiction Group Dstl Porton Down, UK. probins@stl.gov.uk Paul Thomas Hazar Assessment, Simulation an Preiction

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

The total derivative. Chapter Lagrangian and Eulerian approaches

The total derivative. Chapter Lagrangian and Eulerian approaches Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function

More information

Hybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion

Hybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion Hybri Fusion for Biometrics: Combining Score-level an Decision-level Fusion Qian Tao Raymon Velhuis Signals an Systems Group, University of Twente Postbus 217, 7500AE Enschee, the Netherlans {q.tao,r.n.j.velhuis}@ewi.utwente.nl

More information

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain Nonlinear Aaptive Ship Course Tracking Control Base on Backstepping an Nussbaum Gain Jialu Du, Chen Guo Abstract A nonlinear aaptive controller combining aaptive Backstepping algorithm with Nussbaum gain

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

The Recycling Gibbs Sampler for Efficient Learning

The Recycling Gibbs Sampler for Efficient Learning The Recycling Gibbs Sampler for Efficient Learning L. Martino, V. Elvira, G. Camps-Valls Universidade de São Paulo, São Carlos (Brazil). Télécom ParisTech, Université Paris-Saclay. (France), Universidad

More information

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France APPROXIMAE SOLUION FOR RANSIEN HEA RANSFER IN SAIC URBULEN HE II B. Bauouy CEA/Saclay, DSM/DAPNIA/SCM 91191 Gif-sur-Yvette Ceex, France ABSRAC Analytical solution in one imension of the heat iffusion equation

More information

TIME-DELAY ESTIMATION USING FARROW-BASED FRACTIONAL-DELAY FIR FILTERS: FILTER APPROXIMATION VS. ESTIMATION ERRORS

TIME-DELAY ESTIMATION USING FARROW-BASED FRACTIONAL-DELAY FIR FILTERS: FILTER APPROXIMATION VS. ESTIMATION ERRORS TIME-DEAY ESTIMATION USING FARROW-BASED FRACTIONA-DEAY FIR FITERS: FITER APPROXIMATION VS. ESTIMATION ERRORS Mattias Olsson, Håkan Johansson, an Per öwenborg Div. of Electronic Systems, Dept. of Electrical

More information

Kernel Sequential Monte Carlo

Kernel Sequential Monte Carlo Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section

More information

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University

More information

Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling

Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling Case Stuy : Document Retrieval Collapse Gibbs an Variational Methos for LDA Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 7 th, 0 Example

More information

A Novel Decoupled Iterative Method for Deep-Submicron MOSFET RF Circuit Simulation

A Novel Decoupled Iterative Method for Deep-Submicron MOSFET RF Circuit Simulation A Novel ecouple Iterative Metho for eep-submicron MOSFET RF Circuit Simulation CHUAN-SHENG WANG an YIMING LI epartment of Mathematics, National Tsing Hua University, National Nano evice Laboratories, an

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

Markov chain Monte Carlo methods for visual tracking

Markov chain Monte Carlo methods for visual tracking Markov chain Monte Carlo methods for visual tracking Ray Luo rluo@cory.eecs.berkeley.edu Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA 94720

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

YVES F. ATCHADÉ, GARETH O. ROBERTS, AND JEFFREY S. ROSENTHAL

YVES F. ATCHADÉ, GARETH O. ROBERTS, AND JEFFREY S. ROSENTHAL TOWARDS OPTIMAL SCALING OF METROPOLIS-COUPLED MARKOV CHAIN MONTE CARLO YVES F. ATCHADÉ, GARETH O. ROBERTS, AND JEFFREY S. ROSENTHAL Abstract. We consier optimal temperature spacings for Metropolis-couple

More information

Implicit Differentiation

Implicit Differentiation Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,

More information

3.7 Implicit Differentiation -- A Brief Introduction -- Student Notes

3.7 Implicit Differentiation -- A Brief Introduction -- Student Notes Fin these erivatives of these functions: y.7 Implicit Differentiation -- A Brief Introuction -- Stuent Notes tan y sin tan = sin y e = e = Write the inverses of these functions: y tan y sin How woul we

More information

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS BY BENJAMIN AVANZI, LUKE C. CASSAR AND BERNARD WONG ABSTRACT In this paper we investigate the potential of Lévy copulas as a tool for

More information

The Press-Schechter mass function

The Press-Schechter mass function The Press-Schechter mass function To state the obvious: It is important to relate our theories to what we can observe. We have looke at linear perturbation theory, an we have consiere a simple moel for

More information

Lower Bounds for the Smoothed Number of Pareto optimal Solutions

Lower Bounds for the Smoothed Number of Pareto optimal Solutions Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

A simple tranformation of copulas

A simple tranformation of copulas A simple tranformation of copulas V. Durrleman, A. Nikeghbali & T. Roncalli Groupe e Recherche Opérationnelle Créit Lyonnais France July 31, 2000 Abstract We stuy how copulas properties are moifie after

More information

26.1 Metropolis method

26.1 Metropolis method CS880: Approximations Algorithms Scribe: Dave Anrzejewski Lecturer: Shuchi Chawla Topic: Metropolis metho, volume estimation Date: 4/26/07 The previous lecture iscusse they some of the key concepts of

More information

MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX

MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX J. WILLIAM HELTON, JEAN B. LASSERRE, AND MIHAI PUTINAR Abstract. We investigate an iscuss when the inverse of a multivariate truncate moment matrix

More information

Topic Modeling: Beyond Bag-of-Words

Topic Modeling: Beyond Bag-of-Words Hanna M. Wallach Cavenish Laboratory, University of Cambrige, Cambrige CB3 0HE, UK hmw26@cam.ac.u Abstract Some moels of textual corpora employ text generation methos involving n-gram statistics, while

More information

05 The Continuum Limit and the Wave Equation

05 The Continuum Limit and the Wave Equation Utah State University DigitalCommons@USU Founations of Wave Phenomena Physics, Department of 1-1-2004 05 The Continuum Limit an the Wave Equation Charles G. Torre Department of Physics, Utah State University,

More information

A Path Planning Method Using Cubic Spiral with Curvature Constraint

A Path Planning Method Using Cubic Spiral with Curvature Constraint A Path Planning Metho Using Cubic Spiral with Curvature Constraint Tzu-Chen Liang an Jing-Sin Liu Institute of Information Science 0, Acaemia Sinica, Nankang, Taipei 5, Taiwan, R.O.C., Email: hartree@iis.sinica.eu.tw

More information

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum October 6, 4 ARDB Note Analytic Scaling Formulas for Crosse Laser Acceleration in Vacuum Robert J. Noble Stanfor Linear Accelerator Center, Stanfor University 575 San Hill Roa, Menlo Park, California 945

More information

Lecture 2: Correlated Topic Model

Lecture 2: Correlated Topic Model Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables

More information

Delocalization of boundary states in disordered topological insulators

Delocalization of boundary states in disordered topological insulators Journal of Physics A: Mathematical an Theoretical J. Phys. A: Math. Theor. 48 (05) FT0 (pp) oi:0.088/75-83/48//ft0 Fast Track Communication Delocalization of bounary states in isorere topological insulators

More information

Florian Heiss; Viktor Winschel: Estimation with Numerical Integration on Sparse Grids

Florian Heiss; Viktor Winschel: Estimation with Numerical Integration on Sparse Grids Florian Heiss; Viktor Winschel: Estimation with Numerical Integration on Sparse Gris Munich Discussion Paper No. 2006-15 Department of Economics University of Munich Volkswirtschaftliche Fakultät Luwig-Maximilians-Universität

More information

A NONLINEAR SOURCE SEPARATION APPROACH FOR THE NICOLSKY-EISENMAN MODEL

A NONLINEAR SOURCE SEPARATION APPROACH FOR THE NICOLSKY-EISENMAN MODEL 6th European Signal Processing Conference EUSIPCO 28, Lausanne, Switzerlan, August 25-29, 28, copyright by EURASIP A NONLINEAR SOURCE SEPARATION APPROACH FOR THE NICOLSKY-EISENMAN MODEL Leonaro Tomazeli

More information

Monte Carlo Methods with Reduced Error

Monte Carlo Methods with Reduced Error Monte Carlo Methos with Reuce Error As has been shown, the probable error in Monte Carlo algorithms when no information about the smoothness of the function is use is Dξ r N = c N. It is important for

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

Bayesian analysis of massive datasets via particle filters

Bayesian analysis of massive datasets via particle filters Bayesian analysis of massive atasets via particle filters Greg Rigeway RAND PO Box 2138 Santa Monica, CA 90407-2138 gregr@ran.org Davi Maigan Department of Statistics 477 Hill Center Rutgers University

More information

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations Diophantine Approximations: Examining the Farey Process an its Metho on Proucing Best Approximations Kelly Bowen Introuction When a person hears the phrase irrational number, one oes not think of anything

More information

LDA Collapsed Gibbs Sampler, VariaNonal Inference. Task 3: Mixed Membership Models. Case Study 5: Mixed Membership Modeling

LDA Collapsed Gibbs Sampler, VariaNonal Inference. Task 3: Mixed Membership Models. Case Study 5: Mixed Membership Modeling Case Stuy 5: Mixe Membership Moeling LDA Collapse Gibbs Sampler, VariaNonal Inference Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox May 8 th, 05 Emily Fox 05 Task : Mixe

More information

Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification

Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection an System Ientification Borhan M Sananaji, Tyrone L Vincent, an Michael B Wakin Abstract In this paper,

More information

Radar Sensor Management for Detection and Tracking

Radar Sensor Management for Detection and Tracking Raar Sensor Management for Detection an Tracking Krüger White, Jason Williams, Peter Hoffensetz Defence Science an Technology Organisation PO Box 00, Einburgh, SA Australia Email: Kruger.White@sto.efence.gov.au,

More information

Dot trajectories in the superposition of random screens: analysis and synthesis

Dot trajectories in the superposition of random screens: analysis and synthesis 1472 J. Opt. Soc. Am. A/ Vol. 21, No. 8/ August 2004 Isaac Amiror Dot trajectories in the superposition of ranom screens: analysis an synthesis Isaac Amiror Laboratoire e Systèmes Périphériques, Ecole

More information

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016 Noname manuscript No. (will be inserte by the eitor) Scaling properties of the number of ranom sequential asorption iterations neee to generate saturate ranom packing arxiv:607.06668v2 [con-mat.stat-mech]

More information

How to Minimize Maximum Regret in Repeated Decision-Making

How to Minimize Maximum Regret in Repeated Decision-Making How to Minimize Maximum Regret in Repeate Decision-Making Karl H. Schlag July 3 2003 Economics Department, European University Institute, Via ella Piazzuola 43, 033 Florence, Italy, Tel: 0039-0-4689, email:

More information

Controlled sequential Monte Carlo

Controlled sequential Monte Carlo Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation

More information

Optimal Signal Detection for False Track Discrimination

Optimal Signal Detection for False Track Discrimination Optimal Signal Detection for False Track Discrimination Thomas Hanselmann Darko Mušicki Dept. of Electrical an Electronic Eng. Dept. of Electrical an Electronic Eng. The University of Melbourne The University

More information

arxiv: v1 [math.co] 29 May 2009

arxiv: v1 [math.co] 29 May 2009 arxiv:0905.4913v1 [math.co] 29 May 2009 simple Havel-Hakimi type algorithm to realize graphical egree sequences of irecte graphs Péter L. Erős an István Miklós. Rényi Institute of Mathematics, Hungarian

More information