arxiv: v2 [stat.me] 15 May 2018


Piecewise-Deterministic Markov Chain Monte Carlo

Paul Vanetti¹, Alexandre Bouchard-Côté², George Deligiannidis¹, Arnaud Doucet¹

May 16, 2018

¹ Department of Statistics, University of Oxford, UK.
² Department of Statistics, University of British Columbia, Canada.

Abstract

A novel class of non-reversible Markov chain Monte Carlo schemes relying on continuous-time piecewise-deterministic Markov processes has recently emerged. In these algorithms, the state of the Markov process evolves according to a deterministic dynamics which is modified using a Markov transition kernel at random event times. These methods enjoy remarkable features including the ability to update only a subset of the state components while other components implicitly keep evolving, and the ability to use an unbiased estimate of the gradient of the log-target while preserving the target as invariant distribution. However, they also suffer from important limitations. The deterministic dynamics used so far do not exploit the structure of the target. Moreover, exact simulation of the event times is feasible for an important yet restricted class of problems and, even when it is, it is application specific. This limits the applicability of these techniques and prevents the development of a generic software implementation of them. We introduce novel MCMC methods addressing these shortcomings. In particular, we introduce novel continuous-time algorithms relying on exact Hamiltonian flows and novel non-reversible discrete-time algorithms which can exploit complex dynamics such as approximate Hamiltonian dynamics arising from symplectic integrators while preserving the attractive features of continuous-time algorithms. We demonstrate the performance of these schemes on a variety of applications.

Keywords: generalized Metropolis-Hastings; Hamiltonian dynamics; intractable likelihood; non-reversible Markov chain Monte Carlo; piecewise-deterministic Markov process; weak convergence.
1 Introduction

Markov chain Monte Carlo (MCMC) methods are the tools of choice to sample non-standard probability distributions. In high-dimensional scenarios, the celebrated Metropolis-Hastings algorithm usually performs poorly and alternative algorithms are required. Two of the most popular alternatives are slice sampling [37] and Hamiltonian Monte Carlo (HMC) methods [18, 38, 3, 4], which have had much empirical success over recent years. More recently, continuous-time non-reversible MCMC algorithms based on Piecewise-Deterministic Markov Process (PDMP) schemes have also appeared in the literature in applied probability [35, 17, 7], automatic control [34], physics [42, 32, 27, 39], statistics and machine learning [1, 6, 2, 5, 4, 47]. In physics, these schemes have quickly become popular as they provide state-of-the-art performance when applied to the simulation of large-scale physical models. They also show promise for statistics applications, in particular for high-dimensional sparse graphical models [1] and big data [1, 6, 21, 4].

However, the PDMP-based schemes currently available suffer from shortcomings which limit both their applicability and performance. To ensure invariance with respect to the target distribution, one needs to be able to simulate these continuous-time processes exactly. In practice, this severely restricts the deterministic dynamics one can use: all the existing algorithms use a simple linear dynamics that does not exploit the geometry of the target. Moreover, exact simulation of the event times is problem specific and may be impossible in certain scenarios. This prevents the development of a generic software implementation of these techniques.

In this paper, we address these limitations by developing novel continuous-time and discrete-time Piecewise-Deterministic Markov Chain Monte Carlo (PD-MCMC) techniques which bring together HMC, PDMP and generalized Metropolis-Hastings.

First, we show that it is possible to develop continuous-time PD-MCMC algorithms relying on Hamiltonian dynamics. In this context, exact simulation of the resulting PDMP remains possible for an important class of target distributions. The resulting algorithms provide an alternative to elliptical slice sampling-type algorithms [36, 8]. We also exploit a generalized version of the Metropolis-Hastings algorithm (see, e.g., [31]) satisfying a skewed detailed balance condition to derive novel schemes.

Second, we introduce novel discrete-time PD-MCMC algorithms. These non-reversible algorithms can be thought of as a discretized version of continuous-time PD-MCMC but preserve the target distribution as invariant distribution for all discretization steps. These schemes are not only able to exploit complex dynamics, such as approximate Hamiltonian dynamics arising from symplectic integrators, but it is also always possible to simulate the event times. Moreover, some versions of these discrete-time algorithms do not even require being able to compute the gradient of the log-target. These methods enjoy the same attractive features as their continuous-time counterparts: they can leverage any representation of the target as a product of non-negative factors. Additionally, they can use unbiased estimators of the log-target distribution and its gradient and still provide algorithms with the correct invariant distribution.

The rest of the paper is organised as follows. In Section 2 we review continuous-time PDMPs, provide sufficient conditions to ensure invariance of a PDMP with respect to a given target distribution, discuss existing PD-MCMC algorithms and finally introduce novel algorithms relying on Hamiltonian dynamics. In Section 3, we introduce the class of discrete-time PDMP and provide sufficient conditions to ensure invariance of a PDMP with respect to a given target distribution which parallel the ones obtained in the continuous-time scenario. We review existing and describe novel discrete-time PD-MCMC algorithms.
Section 4 is dedicated to the efficient implementation of discrete-time algorithms using subsampling and prefetching ideas, while Section 5 proposes discrete-time algorithms to handle scenarios where the target is intractable but its logarithm and the gradient of its logarithm can be estimated unbiasedly. The empirical performance of some of these schemes is reviewed in Section 6. Appendix A contains all the proofs of validity of the proposed algorithms, while weak convergence of a specific discrete-time scheme to a PDMP is proven in Appendix B.

2 Continuous-Time PDMP and PD-MCMC

2.1 PDMP

PDMPs were introduced in [14]. We will only provide here an informal review of this class of processes in the spirit of [34, 17, 2, 5] and refer the reader to [15] for a detailed theoretical treatment. For the sake of simplicity, assume that $Z = \mathbb{R}^n$. A $Z$-valued continuous-time PDMP process $\{z_t; t \ge 0\}$ is a càdlàg process involving a deterministic dynamics altered by random jumps at random event times. It is defined through

1. an Ordinary Differential Equation (ODE) with differentiable drift $\phi : Z \to Z$, i.e.,
$$\frac{dz_t}{dt} = \phi(z_t), \tag{1}$$
which induces a deterministic flow
$$(t, z) \in \mathbb{R}_+ \times Z \mapsto \Phi_t(z) \in Z \tag{2}$$
satisfying the semi-group property $\Phi_s \circ \Phi_t = \Phi_{s+t}$ and such that $t \mapsto \Phi_t(z)$ is càdlàg,
2. an event rate $\lambda : Z \to \mathbb{R}_+$, with $\lambda(z_t)\epsilon + o(\epsilon)$ being the probability of having an event in the time interval $[t, t + \epsilon]$, and
3. a Markov transition kernel $Q$ from $Z$ to $Z$ where the state at event time $t$ is given by $z_t \sim Q(z_{t^-}, \cdot)$, $z_{t^-}$ being the state of the process just before the event.

Algorithm 1 describes how to simulate the path of a PDMP.

Algorithm 1 Simulation of continuous-time PDMP
1. Initialize $z_0$ arbitrarily on $Z$ and set $t_0 \leftarrow 0$.
2. for $k = 1, 2, \ldots$ do
   (a) Sample inter-event time $\tau_k$, where $\tau_k$ is a non-negative random variable such that
   $$P(\tau_k \ge t) = \exp\left[-\int_{r=0}^{t} \lambda\big(\Phi_r(z_{t_{k-1}})\big)\,dr\right]. \tag{3}$$
   (b) For $r \in (0, \tau_k)$, set
   $$z_{t_{k-1}+r} \leftarrow \Phi_r(z_{t_{k-1}}). \tag{4}$$
   (c) Set $t_k \leftarrow t_{k-1} + \tau_k$ and sample
   $$z_{t_k} \sim Q(z_{t_k^-}, \cdot). \tag{5}$$

To be able to exactly simulate a PDMP, we thus need to be able to simulate from the distribution (3) and compute the flow (4). Finally, we also need to be able to simulate from the transition kernel $Q$. In important scenarios, exact simulation of the event times can be performed using inversion of the integrated rate function as in [42] or using adaptive thinning procedures as in [1].

We now introduce the generator associated with the PDMP. For functions in the domain of the generator, it is defined by
$$Lf(z) = \lim_{\epsilon \downarrow 0} \frac{E[f(z_{t+\epsilon}) \mid z_t = z] - f(z)}{\epsilon}.$$
Under suitable regularity conditions [15, Theorem 26.14], it can be shown that this generator is given by
$$Lf(z) = \langle \phi(z), \nabla f(z) \rangle + \lambda(z) \int [f(z') - f(z)]\, Q(z, dz'), \tag{6}$$
where $\langle a, b \rangle$ denotes the scalar product between vectors $a, b$ and $\|a\|^2 = \langle a, a \rangle$. The first term on the right-hand side of (6) arises from the deterministic dynamics while the second term corresponds to the jump component of the process.

2.2 From PDMP to PD-MCMC

Assume we are interested in sampling from a given target probability distribution on the Borel space $(Z, \mathcal{B}(Z))$. If we want to use a PDMP mechanism to sample this target distribution, this PDMP needs at least to admit this distribution as invariant distribution. We provide here sufficient conditions to ensure this is satisfied. If additionally the PDMP is ergodic, this will allow us to estimate consistently expectations with respect to the invariant distribution. From now onward, the target distribution will be assumed to have a strictly positive density $\rho(z)$ with respect to the Lebesgue measure $dz$ where
$$\rho(z) = \exp(-H(z)). \tag{7}$$
Invariance with respect to $\rho$ will be satisfied if $\int \rho(dz)\, Lf(z) = 0$ for all functions $f$ in the domain of the generator [15, Proposition 34.7].
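From (6), this means that we need
$$\int \rho(dz)\, \langle \phi(z), \nabla f(z) \rangle + \int \rho(dz)\, \lambda(z) \int Q(z, dz')\, [f(z') - f(z)] = 0.$$
However, using integration by parts, we obtain
$$\int \rho(dz)\, \langle \phi(z), \nabla f(z) \rangle = -\int \rho(dz)\, [\nabla \cdot \phi(z) - \langle \nabla H(z), \phi(z) \rangle]\, f(z)$$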

where $\nabla \cdot \phi(z) := \sum_i \partial_i \phi_i(z)$ is the divergence of the vector field $\phi$. Hence, a sufficient condition to ensure invariance of a PDMP with respect to $\rho$ is to have
$$\int \rho(dz) \left[ \lambda(z) \int Q(z, dz')\, [f(z') - f(z)] - [\nabla \cdot \phi(z) - \langle \nabla H(z), \phi(z) \rangle]\, f(z) \right] = 0. \tag{8}$$

The following notation will prove useful to formulate sufficient conditions to ensure invariance of a PDMP with respect to $\rho$. Suppose that we are given a measure $\nu$ on $(Z, \mathcal{B}(Z))$ and a measurable mapping $\Gamma : Z \to Z$. Then the push-forward of the measure $\nu$ under the mapping $\Gamma$, often denoted by $\Gamma_{\#}\nu(dz)$, is the measure $A \mapsto \nu(\Gamma^{-1}(A))$ for any $A \in \mathcal{B}(Z)$. We will use here the notation $\nu(\Gamma^{-1}(dz))$. For any measurable $f : Z \to \mathbb{R}$, the following identity holds
$$\int_Z f(z)\, \Gamma_{\#}\nu(dz) = \int_Z f \circ \Gamma(z)\, \nu(dz).$$

2.2.1 Sufficient conditions for global methods

We provide here useful sufficient conditions on $\phi$, $\lambda$, and $Q$ to ensure $\rho$-invariance of the associated PDMP, without making any structural assumptions on these objects.

(A1) Conditions on $\phi$, $\lambda$, and $Q$
1. There exists a $\rho$-preserving mapping $S : Z \to Z$; that is, $S$ is measurable and satisfies $\rho(S^{-1}(dz)) = \rho(dz)$.
2. The event rate $\lambda$ satisfies
$$\lambda(S(z)) - \lambda(z) = \nabla \cdot \phi(z) - \langle \nabla H(z), \phi(z) \rangle. \tag{9}$$
3. The kernel $Q$ satisfies
$$\int \rho(dz)\, \lambda(z)\, Q(z, dz') = \rho(S^{-1}(dz'))\, \lambda(S(z')). \tag{10}$$

Based on these assumptions, straightforward calculations show that the following result holds.

Proposition 1. Assume (A1). Then the PDMP admits $\rho$ as invariant distribution.

2.2.2 Sufficient conditions for local methods

Assume that $H(z)$ can be decomposed as follows
$$H(z) = \sum_{i=1}^{n} H_i(z), \tag{11}$$
where potentially each $H_i(z)$ only depends on a subset of the components of $z$. In this context, like in standard MCMC, we might be interested in using a transition kernel which is a mixture of kernels performing local updates. This can be achieved in the PDMP framework by introducing an event rate of the form
$$\lambda(z) = \sum_{i=1}^{n} \lambda_i(z) \tag{12}$$
and a transition kernel of the form
$$Q(z, dz') = \sum_{i=1}^{n} \frac{\lambda_i(z)}{\lambda(z)}\, Q_i(z, dz') \tag{13}$$
where the $Q_i$ are Markov transition kernels. Let us write $[n] := \{1, 2, \ldots, n\}$. To simulate the event times of the resulting PDMP, one can associate a clock to each index $i \in [n]$ and use a priority queue [42, 32, 1].
When it is possible to bound $\{\lambda_i ; i \in [n]\}$ locally in time, more elaborate thinning strategies have been developed in [1, Section 3.3.2] and [29]. Based on these structural assumptions on $\lambda$ and $Q$, we can provide useful sufficient local conditions on $\phi$, $\{\lambda_i : i \in [n]\}$ and $\{Q_i : i \in [n]\}$ to ensure that invariance of the associated PDMP with respect to $\rho$ is satisfied.

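(A2) Conditions on $\phi$, $\{\lambda_i : i \in [n]\}$ and $\{Q_i : i \in [n]\}$
1. There exists a $\rho$-preserving mapping $S : Z \to Z$.
2. The event rates $\{\lambda_i : i \in [n]\}$ satisfy
$$\sum_{i=1}^{n} [\lambda_i(S(z)) - \lambda_i(z)] = \nabla \cdot \phi(z) - \langle \nabla H(z), \phi(z) \rangle. \tag{14}$$
3. For all $i \in [n]$, the transition kernel $Q_i$ satisfies
$$\int \rho(dz)\, \lambda_i(z)\, Q_i(z, dz') = \rho(S^{-1}(dz'))\, \lambda_i(S(z')). \tag{15}$$

If the functions $\{H_i : i \in [n]\}$ are differentiable then Assumption A2.2 is satisfied for a divergence-free vector field, i.e. $\nabla \cdot \phi = 0$, if for all $i \in [n]$
$$\lambda_i(S(z)) - \lambda_i(z) = -\langle \nabla H_i(z), \phi(z) \rangle. \tag{16}$$

Proposition 2. Assume (A2). Then the PDMP admits $\rho$ as invariant distribution.

2.2.3 Sufficient conditions for doubly stochastic methods

Consider now a slight generalization of the previous scenario where the target distribution cannot even be evaluated pointwise up to a normalizing constant but there exists a measure $\mu$ on some measurable space $(\Omega, \mathcal{G})$ and a function $H_\omega(z) : \Omega \times Z \to \mathbb{R}$ which can be evaluated pointwise up to an additive constant such that
$$H(z) = \int H_\omega(z)\, \mu(d\omega). \tag{17}$$
In this context, we consider an event rate of the form
$$\lambda(z) = \int \lambda_\omega(z)\, \mu(d\omega) \tag{18}$$
where $\lambda_\omega(z) : \Omega \times Z \to \mathbb{R}_+$, and a transition kernel of the form
$$Q(z, dz') = \frac{\int \lambda_\omega(z)\, \mu(d\omega)\, Q_\omega(z, dz')}{\int \lambda_\omega(z)\, \mu(d\omega)}, \tag{19}$$
where $Q_\omega$ is a Markov transition kernel from $Z$ to $Z$. In Section 2.2.2, (11), (12) and (13) simply correspond to (17), (18) and (19) if we select $\mu$ as the measure such that $\mu(\{i\}) = 1$ for all $i \in \Omega = [n]$. The sufficient conditions of the previous section can be directly generalized.

(A3) Conditions on $\phi$, $\{\lambda_\omega : \omega \in \Omega\}$ and $\{Q_\omega : \omega \in \Omega\}$
1. There exists a $\rho$-preserving mapping $S : Z \to Z$.
2. The event rates $\{\lambda_\omega : \omega \in \Omega\}$ satisfy
$$\int [\lambda_\omega(S(z)) - \lambda_\omega(z)]\, \mu(d\omega) = \nabla \cdot \phi(z) - \langle \nabla H(z), \phi(z) \rangle. \tag{20}$$
3. For all $\omega \in \Omega$, the transition kernel $Q_\omega$ satisfies
$$\int \rho(dz)\, \lambda_\omega(z)\, Q_\omega(z, dz') = \rho(S^{-1}(dz'))\, \lambda_\omega(S(z')). \tag{21}$$

If $\mu$ is a probability measure and the derivative $\nabla H_\omega(z)$ is well-defined for almost all $\omega \in \Omega$ then, under weak regularity conditions, it follows from (17) that $\nabla H_\omega(z)$ is an unbiased estimate of $\nabla H(z)$ when $\omega \sim \mu$, and Assumption A3.2 will be satisfied for a divergence-free field if
$$\lambda_\omega(S(z)) - \lambda_\omega(z) = -\langle \nabla H_\omega(z), \phi(z) \rangle. \tag{22}$$
We will refer to this class of PD-MCMC as doubly stochastic in reference to doubly-stochastic Poisson processes.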
Proposition 3. Assume (A3). Then the PDMP admits $\rho$ as invariant distribution.
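The mixture structure (12)-(13) can be illustrated with a small sketch. It assumes, purely for illustration, that the rates $\lambda_i$ are constant in time (in a real PDMP they vary along the flow and require inversion or thinning); the function name `first_event` and the example rates are hypothetical. With constant rates, running one exponential clock per index and keeping the earliest one yields an event time with rate $\lambda = \sum_i \lambda_i$ and selects index $i$ with probability $\lambda_i / \lambda$, which is the weight appearing in (13).

```python
import math
import random


def first_event(rates, rng):
    """Return (time, index) of the first event among independent exponential clocks.

    rates[i] plays the role of a *constant* lambda_i (an assumption of this sketch);
    a clock with rate zero never rings.
    """
    best_time, best_index = math.inf, None
    for i, rate in enumerate(rates):
        if rate <= 0.0:
            continue  # this clock never rings
        t = rng.expovariate(rate)  # Exp(rate) waiting time for clock i
        if t < best_time:
            best_time, best_index = t, i
    return best_time, best_index


if __name__ == "__main__":
    rng = random.Random(0)
    rates = [0.5, 1.5, 2.0]  # hypothetical lambda_1, lambda_2, lambda_3
    draws = [first_event(rates, rng) for _ in range(20000)]
    mean_time = sum(t for t, _ in draws) / len(draws)
    freqs = [sum(1 for _, i in draws if i == k) / len(draws) for k in range(len(rates))]
    # The mean should be close to 1 / sum(rates), the frequencies close to rates / sum(rates).
    print(mean_time, freqs)
```

For many clocks, the linear scan would typically be replaced by the priority queue mentioned in Section 2.2.2, so that only the clocks affected by an event need to be redrawn.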

2.3 Existing PD-MCMC algorithms

All the existing algorithms we are aware of are based on the following framework. The target distribution admits a density with respect to Lebesgue measure on $X = \mathbb{R}^d$ equal to $\pi(x) = \exp(-U(x))$. Letting $z = (x, v)$, an extended target distribution $\rho(dz)$ on $Z = X \times V$ is then defined as
$$\rho(dz) = \pi(dx)\, \psi(dv), \tag{23}$$
where $\psi$ is an auxiliary distribution on $V$, where $V$ can be for example either $\mathbb{R}^d$ or the unit hypersphere $S^{d-1}$, so that $n = 2d$. The following linear dynamics is then considered: $\phi(z) = (v, 0_d)$, so the resulting flow is analytically tractable and given by
$$\Phi_t(z) = (x + vt, v). \tag{24}$$
In this case, we have $\nabla \cdot \phi = 0$. Additionally, all these algorithms rely on $S(x, v) = (x, -v)$, which can be viewed as a time reversal, so (9) becomes
$$\lambda(S(z)) - \lambda(z) = \lambda(x, -v) - \lambda(x, v) = -\langle \nabla U(x), v \rangle. \tag{25}$$
These algorithms differ in the way the event rate and the transition kernels are specified. We just give a few examples here and refer the reader to the list of references for other examples.

2.3.1 Bouncy particle sampler

This algorithm proposed in [42] exploits any additive decomposition of the potential $U$, i.e.
$$U(x) = \sum_{i=1}^{m} U_i(x). \tag{26}$$
For $\lambda_{\mathrm{ref}} > 0$, it uses the event rate
$$\lambda(z) = \lambda_{\mathrm{ref}} + \sum_{i=1}^{m} \langle \nabla U_i(x), v \rangle_+$$
where $x_+ := \max(0, x)$. It also relies on the transition kernel
$$Q(z, dz') = \frac{\lambda_{\mathrm{ref}}}{\lambda(z)}\, \delta_x(dx')\, \psi(dv') + \sum_{i=1}^{m} \frac{\langle \nabla U_i(x), v \rangle_+}{\lambda(z)}\, \delta_x(dx')\, \delta_{R_{\nabla U_i}(x) v}(dv'), \tag{27}$$
where, for any vector field $W : \mathbb{R}^d \to \mathbb{R}^d$, we define $R_W(x)$ as
$$R_W(x) v := v - 2\, \frac{\langle W(x), v \rangle}{\|W(x)\|^2}\, W(x). \tag{28}$$
We note that (28) corresponds to a bounce as it can be interpreted as a Newtonian collision with the plane perpendicular to $W$ at $x$. In [42], a normal distribution is used for $\psi$ but the uniform distribution on $S^{d-1}$ can also be used [35, 16]. We are in the scenario where $\lambda$ and $Q$ are of the form (12) and (13) with $n = m + 1$, $\lambda_i(z) = \langle \nabla U_i(x), v \rangle_+$ and $Q_i(z, dz') = \delta_x(dx')\, \delta_{R_{\nabla U_i}(x) v}(dv')$ for $i \in [m]$, and $\lambda_0(z) = \lambda_{\mathrm{ref}}$, $Q_0(z, dz') = \delta_x(dx')\, \psi(dv')$. It can be checked that Assumption A2 holds in this scenario. In particular, Assumption A2.2 can be verified by checking the stronger condition (16).
Indeed, if we write $\nabla H_i := (\nabla_x H_i, \nabla_v H_i)$ then (16) becomes $\lambda_i(x, -v) - \lambda_i(x, v) = -\langle \nabla_x H_i, v \rangle$, which is satisfied for $H_i(z) := U_i(x)$ for $i \in [m]$ and $H_0(z) := 0$. For $m = 1$, we refer to this algorithm as the global BPS and for $m > 1$ as the local BPS. The local BPS is computationally advantageous compared to the global BPS when either $U_i(x)$ only depends on a subset of the components of $x$, as for sparse graphical models, and/or when $m$ is very large, as for big data applications.

The BPS algorithm has been further extended to the scenario where one has access to an unbiased estimate of $\nabla U$; see [4] and [2, Section 4.4.2]. The validity of this algorithm can be established as an application of the results of Section 2.2.3. We are not aware of any implementation of this algorithm in scenarios where $\mu$ is not an atomic measure with finite support, in which case the algorithm is the local BPS.
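To make the global BPS concrete, the following is a minimal sketch of Algorithm 1 specialised to a case where the event times can be simulated by inversion. It assumes a standard normal target, so that $U(x) = \|x\|^2/2$ and $\nabla U(x) = x$; this target, the function names and the choice of monitoring the time average of $x_1^2$ are illustrative assumptions and are not taken from the paper. Along the linear flow (24) the bounce rate is $\langle x + vs, v \rangle_+ = (a + bs)_+$ with $a = \langle x, v \rangle$ and $b = \langle v, v \rangle$, so the integrated rate in (3) can be inverted in closed form.

```python
import math
import random


def bounce_time(a, b, e):
    """Solve int_0^tau max(0, a + b*s) ds = e for tau (requires b > 0 and e >= 0)."""
    return (-a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b


def reflect(v, grad):
    """Bounce (28): reflect v in the hyperplane orthogonal to grad."""
    scale = 2.0 * sum(g * w for g, w in zip(grad, v)) / sum(g * g for g in grad)
    return [w - scale * g for w, g in zip(v, grad)]


def bps_gaussian(n_events, lam_ref, rng, dim=2):
    """Global BPS for a standard normal target; returns the time average of x_1^2."""
    x = [0.0] * dim
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    total_time, integral = 0.0, 0.0
    for _ in range(n_events):
        a = sum(xi * vi for xi, vi in zip(x, v))
        b = sum(vi * vi for vi in v)
        t_bounce = bounce_time(a, b, rng.expovariate(1.0))  # inversion of (3)
        t_refresh = rng.expovariate(lam_ref)                # refreshment clock
        tau = min(t_bounce, t_refresh)
        # Exact integral of (x_1 + v_1 * s)^2 over the linear segment [0, tau].
        integral += x[0] ** 2 * tau + x[0] * v[0] * tau ** 2 + v[0] ** 2 * tau ** 3 / 3.0
        total_time += tau
        x = [xi + vi * tau for xi, vi in zip(x, v)]         # linear flow (24)
        if t_bounce < t_refresh:
            v = reflect(v, x)                               # grad U(x) = x here
        else:
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # refresh v from psi
    return integral / total_time


if __name__ == "__main__":
    # The time average of x_1^2 should be close to the target variance, 1.
    print(bps_gaussian(20000, 1.0, random.Random(0)))
```

Because the path is piecewise linear, the time average is computed exactly on each segment rather than from a discretised skeleton.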

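2.3.2 Zig-Zag sampler

This algorithm proposed in [6, 7] uses for $\psi$ the uniform distribution on $\{-1, 1\}^d$.¹ It relies on the following event rates
$$\lambda_i(z) = \lambda_{\mathrm{ref},i} + \langle \partial_i U(x), v_i \rangle_+,$$
while the transition kernel is selected as
$$Q_i(z, dz') = \delta_x(dx')\, \delta_{-v_i}(dv'_i) \prod_{j \ne i} \delta_{v_j}(dv'_j).$$
It is also possible to further exploit any additive decomposition of $U(x)$ within this framework and this has been used to develop an efficient sampling algorithm for big data [6]. Again, it is easy to show that Assumption A2 is satisfied.

2.3.3 BPS sampler with randomized bounces

Alternatives to bounces of the form (28) have been proposed where one uses
$$Q(z, dz') = \delta_x(dx')\, Q_x(v, dv') \tag{29}$$
and $\psi(v) = g(\|v\|)$. In this case, Assumption A1.3 is verified if
$$\int \psi(dv)\, \lambda(x, v)\, Q_x(v, dv') = \psi(dv')\, \lambda(x, -v'). \tag{30}$$
Here $\psi$ will be the standard multivariate normal distribution. We consider the scenario where $\lambda(x, v) = \langle \nabla U(x), v \rangle_+$ as in the global BPS. To present the various methods proposed in the literature, a decomposition of the velocity similar to that adopted in [33] is useful:
$$v = a_\parallel v_\parallel + a_\perp v_\perp, \tag{31}$$
where $v_\parallel$ and $v_\perp$ are unit norm vectors such that
$$v_\parallel = \frac{\nabla U(x)}{\|\nabla U(x)\|}, \qquad \langle v_\parallel, v_\perp \rangle = 0. \tag{32}$$
All the randomized bounce procedures return a vector $v'$
$$v' = -a'_\parallel v_\parallel + a'_\perp v'_\perp, \tag{33}$$
where $\langle v_\parallel, v'_\perp \rangle = 0$. With this notation, we obtain $\lambda(x, -v') = (a'_\parallel)_+ \|\nabla U(x)\|$.

Let $\chi(k)$ and $\chi^2(k)$ be the $\chi$ and $\chi^2$ distributions respectively, with $k$ degrees of freedom. Under $\psi$, the random variables $a_\parallel$ and $a_\perp$ are independent and satisfy
$$a_\perp \sim \chi(d - 1), \qquad a_\parallel \sim \mathcal{N}(0, 1). \tag{34}$$
Indeed, we have $a_\perp^2 \sim \chi^2(d - 1)$ and $a_\perp \ge 0$. We give below some examples of kernels $Q_x(v, dv')$ satisfying Equation (30).

1. Independent sampling [2]: [2] proposes using $Q_x(v, dv') \propto \psi(dv')\, \lambda(x, -v') \propto (a'_\parallel)_+\, \psi(dv')$, which satisfies (30), but a scheme to sample this distribution was not given. Using the parameterization (31)-(33), (34) shows this can be achieved by sampling $a'_\parallel$ according to a density proportional to $a_+$ times the standard normal density, which is equivalent to sampling $a'_\parallel \sim \chi(2)$. Finally, sample $\tilde v \sim \psi$ and set $a'_\perp v'_\perp = \tilde v - \langle \tilde v, v_\parallel \rangle v_\parallel$.
2. Forward-event chain [33]: In [33], $\psi$ is the uniform distribution on $S^{d-1}$, whereas we consider the scenario where $\psi$ is the normal distribution. One uses $v'_\perp = v_\perp$, sets $a'_\parallel = a_\parallel$ and samples $a'_\perp \sim \chi(d - 1)$. Alternatively, sample $a'_\parallel \sim \chi(2)$ and set $a'_\perp = a_\perp$. For either scheme, we recover the method of [33] on $S^{d-1}$ by normalizing $v'$, i.e. setting $v' \leftarrow v' / \|v'\|$.
3. Autoregressive bounce: this is a new scheme where one samples $a'_\parallel \sim \chi(2)$ with probability $p_b$ and sets $a'_\parallel = a_\parallel$ otherwise, then samples $\tilde v \sim \psi$ and sets $\tilde a_\perp \tilde v_\perp = \tilde v - \langle \tilde v, v_\parallel \rangle v_\parallel$. Finally, set $a'_\perp v'_\perp = \rho\, a_\perp v_\perp + \sqrt{1 - \rho^2}\, \tilde a_\perp \tilde v_\perp$ for $\rho \in [-1, 1]$.

The properties of these randomized bounces are not yet well understood. In Section 6, we compare them experimentally on a variety of models.

¹ In this scenario, $\rho(dz)$ does not admit a density with respect to Lebesgue measure but the results discussed previously can be directly extended to this scenario.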
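The independent-sampling bounce of item 1 can be sketched as follows, assuming $\psi$ is the standard normal and that the gradient $\nabla U(x)$ is supplied as a non-zero list of floats; the function name `independent_bounce` is hypothetical. The $\chi(2)$ draw is obtained by inversion, using the fact that $\sqrt{2E}$ with $E$ a unit-rate exponential variable has the $\chi(2)$ (Rayleigh) distribution.

```python
import math
import random


def independent_bounce(grad, rng):
    """Draw v' with density proportional to psi(v') * <grad, -v'>_+ for psi = N(0, I)."""
    norm = math.sqrt(sum(g * g for g in grad))
    v_par = [g / norm for g in grad]  # unit vector along grad U(x), as in (32)
    # chi(2) draw by inversion: sqrt(2 * Exp(1)).
    a_par = math.sqrt(2.0 * rng.expovariate(1.0))
    # Orthogonal part: remove the v_par component from a fresh N(0, I) draw.
    fresh = [rng.gauss(0.0, 1.0) for _ in grad]
    dot = sum(f * p for f, p in zip(fresh, v_par))
    orth = [f - dot * p for f, p in zip(fresh, v_par)]
    # v' = -a_par * v_par + orthogonal part, as in (33).
    return [-a_par * p + o for p, o in zip(v_par, orth)]


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [3.0, 4.0, 0.0]  # hypothetical value of grad U(x)
    v_new = independent_bounce(grad, rng)
    # The new velocity always points "downhill": <grad, v'> < 0.
    print(v_new, sum(g * w for g, w in zip(grad, v_new)))
```

The forward-event chain and autoregressive variants differ only in which of the two components is resampled and how the orthogonal part is combined with the previous one.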

2.4 Hamiltonian PD-MCMC

Although all previously proposed methods rely on the linear flow (24), the framework presented in Section 2.2 is much more flexible. We exploit here this generalization to provide novel continuous-time PD-MCMC algorithms relying on Hamiltonian dynamics.² As in Section 2.3, we consider targets of the form $\rho(z) = \pi(x)\, \psi(v)$ with $\pi(x) = \exp(-U(x))$ being the density of interest on $X = \mathbb{R}^d$ and $\psi$ the standard multivariate normal on $V = \mathbb{R}^d$. We use here the Hamiltonian flow $\Phi_t$ associated with the Hamiltonian
$$\hat H(z) = V(x) + K(v), \tag{35}$$
where $K(v) = v^T v / 2$ and $\mu(x) \propto \exp(-V(x))$ is an auxiliary probability density ensuring $\Phi_t$ is analytically tractable, e.g., $V$ is quadratic or linear [41]. For example, if $\pi(x)$ is a posterior density arising from a Gaussian prior, then $\mu(x)$ could be this Gaussian prior. Alternatively, $\mu(x)$ can always be selected as a Gaussian approximation to $\pi(x)$. We can then rewrite the target as $\rho(z) = \exp(-H(z))$ where
$$H(z) = \tilde U(x) + V(x) + K(v),$$
where $\tilde U(x) := U(x) - V(x)$. This is the same rationale as in elliptical slice sampling-type algorithms [36, 8]: both schemes use an exact Hamiltonian dynamics associated with an approximation of $\pi$ to explore the space. The difference between these algorithms and the method proposed here is that we correct for the discrepancy between $\mu$ and $\pi$ by using a PDMP mechanism instead of slice sampling techniques.

The Hamiltonian flow $\Phi_t$ is induced by the ODE of drift $\phi = (\phi_x, \phi_v)$ where $\phi_x = \nabla_v \hat H(z) = v$ and $\phi_v = -\nabla_x \hat H(z) = -\nabla V(x)$. Hence, we have $\nabla \cdot \phi(z) = 0$ and
$$\nabla \cdot \phi(z) - \langle \nabla H(z), \phi(z) \rangle = -\langle \nabla_x H(z), \phi_x \rangle - \langle \nabla_v H(z), \phi_v \rangle = -\langle \nabla \tilde U(x), v \rangle - \langle \nabla V(x), v \rangle + \langle \nabla V(x), v \rangle = -\langle \nabla \tilde U(x), v \rangle.$$
One can check that Assumption A1 is thus verified for $S(z) = (x, -v)$ if we use an event rate and transition kernel as in the global BPS but based on $\tilde U$ only³
$$\lambda(z) := \lambda_{\mathrm{ref}} + \langle \nabla \tilde U(x), v \rangle_+,$$
$$Q(z, dz') := \frac{\lambda_{\mathrm{ref}}}{\lambda(z)}\, \delta_x(dx')\, \psi(dv') + \frac{\langle \nabla \tilde U(x), v \rangle_+}{\lambda(z)}\, \delta_x(dx')\, \delta_{R_{\nabla \tilde U}(x) v}(dv').$$
We can alternatively use the randomized bounces described in Section 2.3.3, substituting $\tilde U$ for $U$. Figure 1 illustrates a sample path obtained from the resulting Hamiltonian BPS algorithm.
Local and doubly stochastic versions of this algorithm as for BPS [42, 1, 41] can also be directly developed. In the big data examples considered in [1, 6, 4], one could for example use for $\mu$ a Gaussian approximation of $\pi$. A local algorithm can then be obtained using for $\nabla \tilde U_i$ the difference of the gradient of the log-likelihood corresponding to data $i$ and the properly rescaled gradient of the log-approximate posterior, as in [6]. If the terms $\nabla \tilde U_i$ are locally bounded, we can simulate exactly the PDMP using thinning techniques which boil down to data subsampling [1, 6]. This provides an alternative to [13], which also exploits Hamiltonian dynamics and subsampling but does not preserve $\pi$ as invariant distribution.

Finally, we also note that the methods introduced in this section can be combined with the HMC algorithm of [41] proposed to perform exact simulation of constrained normal distributions. This significantly extends the applicability of the work in [41], which can be viewed as a special case where $\tilde U = 0$. An alternative approach to constrained problems is proposed in [5] but it is limited to piecewise-linear dynamics.

² The first arXiv version of [1] proposed a version of the BPS algorithm using Hamiltonian dynamics but uses a different approach based on manifolds. The algorithm suggested therein does not preserve the correct invariant distribution.
³ For $\tilde U = 0$, this algorithm corresponds to a continuous-time HMC algorithm with momentum/velocity refreshment at Poisson times.
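As an illustration of the ingredients of the Hamiltonian BPS, the sketch below takes the simplest tractable choice $V(x) = \|x\|^2/2$ (a standard normal $\mu$), for which the flow of $\hat H$ is a rotation in each $(x_i, v_i)$ plane. This choice of $V$, the function names and the example $\tilde U$ are assumptions made for illustration. The sketch only evaluates the flow and the event rate; simulating the event times along a curved flow generally requires thinning and is not attempted here.

```python
import math


def hamiltonian_flow(x, v, t):
    """Exact flow of dx/dt = v, dv/dt = -x, i.e. of H_hat = |x|^2/2 + |v|^2/2."""
    c, s = math.cos(t), math.sin(t)
    new_x = [xi * c + vi * s for xi, vi in zip(x, v)]
    new_v = [-xi * s + vi * c for xi, vi in zip(x, v)]
    return new_x, new_v


def event_rate(x, v, grad_u_tilde, lam_ref):
    """lambda(z) = lam_ref + <grad U_tilde(x), v>_+ for a user-supplied gradient."""
    inner = sum(g * vi for g, vi in zip(grad_u_tilde(x), v))
    return lam_ref + max(0.0, inner)


def energy(x, v):
    """Value of H_hat(z) = V(x) + K(v) for the quadratic V assumed above."""
    return 0.5 * (sum(xi * xi for xi in x) + sum(vi * vi for vi in v))


if __name__ == "__main__":
    x, v = [1.0, -0.5], [0.3, 2.0]
    x1, v1 = hamiltonian_flow(x, v, 0.7)
    print(energy(x, v), energy(x1, v1))  # H_hat is conserved along the flow

    # Hypothetical correction U_tilde(x) = sum_i x_i^4 / 4, so grad U_tilde(x) = x^3.
    grad = lambda pos: [p ** 3 for p in pos]
    print(event_rate(x1, v1, grad, lam_ref=0.5))
```

The rate depends on the target only through $\nabla \tilde U$, so the closer $\mu$ is to $\pi$, the fewer bounce events are triggered.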

[Figure 1 panels: BPS (Hamiltonian, global); BPS (piecewise linear, global); BPS (piecewise linear, local). Axes: first position coordinate, second position coordinate.]

Figure 1: Examples of paths for the Hamiltonian BPS (left), global BPS (middle) and local BPS (blue). All algorithms are run for a wall clock time of 15ms on a 1-dimensional Gaussian latent field with sparsely observed Poisson distributed observations (one observation for every 1 latent variables), see Section 6.1 for details. The first two position coordinates are shown.

2.5 Using generalized Metropolis-Hastings transitions at event times

All the algorithms we have considered so far are such that only a part of the state $z = (x, v)$ is updated at event times, i.e., the transition kernel is of the form $Q(z, dz') = \delta_x(dx')\, Q_x(v, dv')$. We might be interested in designing more general transition kernels satisfying Assumption A1.3 and similarly Assumption A2.3 or Assumption A3.3. For the sake of illustration, consider Assumption A1.3. This can be rewritten as
$$\int \bar\rho(dz)\, Q(z, dz') = \bar\rho(S^{-1}(dz')) \tag{36}$$
for the probability measure $\bar\rho(dz) \propto \rho(dz)\, \lambda(z)$, assuming that $\int \rho(dz)\, \lambda(z) < \infty$, a weak condition which we assume holds. If the mapping $S$ is an involution, i.e., $S^{-1} = S$, and we can design a kernel $Q$ satisfying the so-called skewed detailed balance condition
$$\bar\rho(dz)\, Q(z, dz') = \bar\rho(S(dz'))\, Q(S(z'), S(dz)), \tag{37}$$
then it follows directly by integrating both terms in this equality with respect to the variable $z$ that it will satisfy (36). We present here a generic mechanism which can be used to achieve this, known as the Generalized Metropolis-Hastings (GMH) algorithm.

The GMH algorithm is a simple extension of MH; see for example [31, pp ]. For a probability measure $\nu(dz) = \nu(z)\, dz$ on $Z$, let us consider the following GMH kernel defined for a Markov proposal kernel $M$ by
$$T(z, dz') = \beta(z, z')\, M(z, dz') + \left[ 1 - \int \beta(z, w)\, M(z, dw) \right] \delta_{S(z)}(dz') \tag{38}$$
where
$$\beta(z, z') = g\left( \frac{\nu(S(dz'))\, M(S(z'), S(dz))}{\nu(dz)\, M(z, dz')} \right).$$
We make the following assumptions:

(A4) Conditions on $\nu$, $S$, $M$ and $g$
1. The mapping $S$ is an involution, i.e., $S^{-1} = S$.
2. The Radon-Nikodym derivative
$$\frac{\nu(S(dz'))\, M(S(z'), S(dz))}{\nu(dz)\, M(z, dz')}$$
is defined and positive for almost all $(z, z') \in Z \times Z$.
3. The function $g : \mathbb{R}_+ \to [0, 1]$ satisfies
$$g(r) = r\, g(1/r). \tag{39}$$

Assumption A4.3 is satisfied for $g(r) = \min(1, r)$. For a deterministic proposal $M(z, dz') = \delta_{\Psi(z)}(dz')$, Assumption A4.2 is satisfied if $\Psi$ admits an inverse $\Psi^{-1}$ such that
$$\Psi^{-1} = S \circ \Psi \circ S \tag{40}$$
and then the acceptance probability is given by
$$\beta(z, z') = \beta(z) = g\left( \frac{\nu(S \circ \Psi(dz))}{\nu(dz)} \right). \tag{41}$$

Proposition 4. Assume (A4). Then the GMH kernel $T$ defined by (38) satisfies the following skewed detailed balance condition
$$\nu(dz)\, T(z, dz') = \nu(S(dz'))\, T(S(z'), S(dz)). \tag{42}$$
If additionally $S$ is a $\nu$-preserving mapping then the GMH kernel is $\nu$-invariant.

The proof of this result follows from direct calculations given in the Appendix and can also be found in [31, pp ]. Using this result, it is possible to check easily Assumption A1.3 for the BPS and Zig-Zag processes. For example, for the BPS, $Q$ is of the form (38) with $\nu = \bar\rho$, $S^{-1} = S$, $g(r) = \min(1, r)$, as we use a deterministic proposal $\Psi(z) = (x, R_{\nabla U}(x) v)$ which verifies $\Psi^{-1} = S \circ \Psi \circ S$, so $\beta(z, z') = 1$ for all $z, z'$. Hence by Proposition 4, $Q$ satisfies the skewed detailed balance (37), hence it satisfies (36).

The benefit of the GMH approach is that it allows us to define much more general kernels at event times. For example, one could use a deterministic proposal with $\Psi(z) = (x, R_{\nabla \hat U}(x) v)$ where $\hat U$ is a computationally cheap approximation of $U$. It is valid to use such a deterministic proposal as it satisfies $\Psi^{-1}(z) = S \circ \Psi \circ S(z)$. In this case, there is a probability of the bounce being rejected and of setting $z \leftarrow S(z)$. We can also use transition kernels which modify the component $x$ of $z$.

3 Discrete-time PDMP and PD-MCMC

We introduce here the class of discrete-time PDMP and present general conditions for such processes to ensure invariance w.r.t. a strictly positive density $\rho(z) = \exp(-H(z))$. These conditions parallel the conditions given in Section 2.2 for continuous-time algorithms.

3.1 Discrete-time PDMP

As in the continuous-time scenario, we assume for simplicity that $Z = \mathbb{R}^n$. A $Z$-valued discrete-time PDMP process $\{z_t ; t \in \mathbb{N}\}$ involves a deterministic dynamics altered by random jumps at random event times. It is defined through

1. a diffeomorphism $\Phi : Z \to Z$ with the absolute value of the determinant of the Jacobian satisfying $|\nabla \Phi(z)| > 0$ for all $z$,
2. an acceptance probability $\alpha : Z \to [0, 1]$ with $1 - \alpha(z)$ being the probability of having an event at the next time step when the current state is $z$, and
3. a Markov transition kernel $Q$ from $Z$ to $Z$ where the state at event time $t$ is given by $z_t \sim Q(z_{t-1}, \cdot)$.

Algorithm 2 describes how to simulate the path of a discrete-time PDMP. It will be convenient to use the conventions $\prod_{i=0}^{-1} = 1$, $\Phi^0(z) = z$ and $\Phi^{r+1}(z) = \Phi^r \circ \Phi(z)$ for $r \in \mathbb{N}$.

Algorithm 2 Simulation of discrete-time PDMP
1. Initialize $z_0$ arbitrarily on $Z$ and set $t_0 \leftarrow 0$.
2. for $k = 1, 2, \ldots$ do
   (a) Sample inter-event time $\tau_k$, where $\tau_k$ is a non-negative integer-valued random variable such that
   $$P(\tau_k = j) = \big(1 - \alpha(\Phi^j(z_{t_{k-1}}))\big) \prod_{i=0}^{j-1} \alpha(\Phi^i(z_{t_{k-1}})). \tag{43}$$
   (b) If $\tau_k \ge 1$ then for $r \in \{1, \ldots, \tau_k\}$, set
   $$z_{t_{k-1}+r} \leftarrow \Phi^r(z_{t_{k-1}}). \tag{44}$$
   (c) Set $t_k \leftarrow t_{k-1} + \tau_k + 1$ and sample
   $$z_{t_k} \sim Q(z_{t_k - 1}, \cdot). \tag{45}$$

The process $\{z_t ; t \in \mathbb{N}\}$ is nothing but a Markov process of transition kernel
$$K(z, dz') = \alpha(z)\, \delta_{\Phi(z)}(dz') + (1 - \alpha(z))\, Q(z, dz'). \tag{46}$$

3.2 From discrete-time PDMP to PD-MCMC

Similarly to Section 2.2, assume we are interested in sampling a strictly positive density $\rho(z)$ given by (7) using a discrete-time PDMP process. Invariance of the kernel $K$ with respect to $\rho$ is satisfied if, by definition, one has
$$\int \rho(dz)\, K(z, dz') = \rho(dz'). \tag{47}$$
From (46), (47) can be rewritten as
$$\rho(\Phi^{-1}(z'))\, \alpha(\Phi^{-1}(z'))\, |\nabla \Phi^{-1}(z')|\, dz' + \int \rho(dz)\, (1 - \alpha(z))\, Q(z, dz') = \rho(dz'). \tag{48}$$
All the following developments could also be adapted to sample from distributions on discrete spaces but this will not be discussed here.

3.2.1 Sufficient conditions for global methods

We provide here useful sufficient conditions on $\Phi$, $\alpha$, and $Q$ to ensure $\rho$-invariance of the associated discrete-time PDMP, without making any structural assumption on these objects.

(A5) Conditions on $\Phi$, $\alpha$, and $Q$
1. There exists a $\rho$-preserving mapping $S : Z \to Z$.
2. The acceptance probability $\alpha$ satisfies
$$\log \alpha(S \circ \Phi(z)) - \log \alpha(z) = -\log |\nabla \Phi(z)| + H(\Phi(z)) - H(z). \tag{49}$$
3. The kernel $Q$ satisfies
$$\int \rho(dz)\, (1 - \alpha(z))\, Q(z, dz') = \rho(S^{-1}(dz'))\, (1 - \alpha(S(z'))). \tag{50}$$

Remark 5. Conditions A5.1 to A5.3 parallel the conditions A1.1 to A1.3.

Proposition 6. Assume (A5). Then the discrete-time PDMP admits $\rho$ as invariant distribution.

Remark 7. When $S$ is an involution so that $\rho(S^{-1}(dz')) = \rho(S(dz'))$, condition A5.3 can be interpreted as a skewed invariance condition on $\nu(dz) \propto \rho(dz)(1 - \alpha(z))$. The quantity $\rho(dz)(1 - \alpha(z))$ is proportional to the invariant distribution of the jump chain, i.e. the distribution of those states where the proposal $\Phi(z)$ is rejected.
It has a clear analogue in the continuous-time scenario where the jumps occur at states with distribution proportional to $\rho(dz)\, \lambda(z)$.
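The kernel (46) and the inter-event law (43) of Algorithm 2 can be sketched generically as follows. The callables `flow`, `alpha` and `jump` stand for $\Phi$, $\alpha$ and a sampler from $Q$; these names, and the toy map used in the example, are hypothetical and serve only to show the control flow.

```python
import random


def kernel_step(z, flow, alpha, jump, rng):
    """One draw from K(z, .) = alpha(z) * delta_{Phi(z)} + (1 - alpha(z)) * Q(z, .)."""
    if rng.random() < alpha(z):
        return flow(z)       # no event: follow the deterministic map
    return jump(z, rng)      # event: sample from the kernel Q


def inter_event_time(z, flow, alpha, rng, max_steps=10 ** 6):
    """Sample tau with P(tau = j) = (1 - alpha(Phi^j z)) * prod_{i<j} alpha(Phi^i z).

    Returns (tau, Phi^tau(z)), i.e. the state from which the jump is drawn.
    """
    for j in range(max_steps):
        if rng.random() >= alpha(z):  # event with probability 1 - alpha(z)
            return j, z
        z = flow(z)
    raise RuntimeError("no event within max_steps")


if __name__ == "__main__":
    rng = random.Random(0)
    flow = lambda z: z + 1                     # toy map on the integers
    alpha = lambda z: 1.0 if z < 3 else 0.0    # always move until z reaches 3
    print(inter_event_time(0, flow, alpha, rng))
```

Iterating `kernel_step` and sampling `inter_event_time` followed by a jump produce the same process; the second form is the one that lends itself to skipping over long deterministic stretches.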

3.2.2 Sufficient conditions for local methods

In scenarios where $H(z)$ can be decomposed as in (11), it will prove convenient to consider an acceptance probability of the form
$$\alpha(z) = \prod_{i=1}^{n} \alpha_i(z) \tag{51}$$
where the $\alpha_i : Z \to [0, 1]$ are themselves acceptance probabilities.⁴ To sample an event of probability $\alpha(z)$, we can sample independent Bernoulli variables $B_i$, such that $B_i \sim \mathrm{Ber}(1 - \alpha_i(z))$ for $i \in [n]$, where $\mathrm{Ber}(p)$ is the Bernoulli distribution of parameter $p$. Hence the probability of the event $B = (0, \ldots, 0)$, where $B = (B_1, \ldots, B_n)$, is $\alpha(z)$. Thus if $B = (0, \ldots, 0)$, we will set $z' \leftarrow \Phi(z)$. Otherwise, that is if $B \in \mathcal{B}$ where $\mathcal{B} = \{0, 1\}^n \setminus \{(0, \ldots, 0)\}$, then we will sample $z' \sim Q(z, \cdot)$ where
$$Q(z, dz') = \sum_{b \in \mathcal{B}} Q_{|B| \ge 1}(b \mid z)\, Q_b(z, dz'). \tag{52}$$
In this expression $Q_b$ is a Markov kernel and $Q_{|B| \ge 1}(b \mid z)$ is the distribution of $B$ conditioned upon $|B| := \sum_i B_i \ge 1$, which is given by
$$Q_{|B| \ge 1}(b \mid z) = \frac{\prod_{i=1}^{n} \mathrm{Ber}(b_i; 1 - \alpha_i(z))}{1 - \alpha(z)}. \tag{53}$$
Based on these structural assumptions on $\alpha$ and $Q$, we can provide useful sufficient local conditions on $\Phi$, $\{\alpha_i : i \in [n]\}$ and $\{Q_b : b \in \mathcal{B}\}$ to ensure that invariance of the associated discrete-time PDMP w.r.t. $\rho$ is satisfied.

(A6) Conditions on $\Phi$, $\{\alpha_i : i \in [n]\}$, and $\{Q_b : b \in \mathcal{B}\}$
1. There exists a $\rho$-preserving mapping $S : Z \to Z$.
2. The acceptance probabilities $\{\alpha_i : i \in [n]\}$ satisfy
$$\sum_{i=1}^{n} [\log \alpha_i(S \circ \Phi(z)) - \log \alpha_i(z)] = -\log |\nabla \Phi(z)| + H(\Phi(z)) - H(z). \tag{54}$$
3. For all $b \in \mathcal{B}$, the transition kernel $Q_b$ satisfies
$$\int \rho(dz)\, (1 - \alpha(z))\, Q_{|B| \ge 1}(b \mid z)\, Q_b(z, dz') = \rho(S^{-1}(dz'))\, (1 - \alpha(S(z')))\, Q_{|B| \ge 1}(b \mid S(z')). \tag{55}$$

For a mapping such that $|\nabla \Phi| = 1$, Assumption A6.2 is satisfied if for all $i \in [n]$
$$\log \alpha_i(S \circ \Phi(z)) - \log \alpha_i(z) = H_i(\Phi(z)) - H_i(z). \tag{56}$$

Proposition 8. Assume (A6). Then the discrete-time PDMP admits $\rho$ as invariant distribution.

3.2.3 Sufficient conditions for doubly stochastic methods

Consider finally the scenario where $H(z)$ is given by (17). In this context, we consider an acceptance probability of the form
$$\alpha(z) = \exp\left[ \int \log \alpha_\omega(z)\, \mu(d\omega) \right] \tag{57}$$
where $\alpha_\omega : Z \to [0, 1]$, which is a generalization of (51) from the measure $\mu(\{i\}) = 1$ on a finite space $\Omega = [n]$ to an arbitrary measure on a general space.
Obviously, when $\Omega$ is not finite, the strategy previously adopted to simulate an event of probability $\alpha(z)$ is not applicable. However, this can be achieved by simulating a Poisson process $P$ on $\Omega$ of rate $\Lambda(d\omega) = -\log \alpha_\omega(z)\, \mu(d\omega)$, the law of which we denote by $Q(dP \mid z)$, and noticing that $\alpha(z)$ is the void probability of $P$. A similar idea was used in a different context in [3]. Hence if the number of points is null, i.e. $|P| = 0$, then we will set $z' \leftarrow \Phi(z)$. If $|P| \ge 1$, that is $P \in \mathcal{P}$ where $\mathcal{P}$ is the set of configurations of the Poisson process having at least one point, then we will sample $z' \sim Q(z, \cdot)$ where
$$Q(z, dz') = \int_{\mathcal{P}} Q_{|P| \ge 1}(dP \mid z)\, Q_P(z, dz'). \tag{58}$$

⁴ The authors in [32] derive a continuous-time local PD-MCMC by using this factorized acceptance probability, using a mapping $\Phi(z) = (x + \epsilon v, v)$ and taking the limit as $\epsilon \to 0$. However, for a strictly positive $\epsilon > 0$, they do not define a discrete-time local PD-MCMC as proposed here.

In this expression $Q_P$ is a Markov kernel and $Q_{|P| \ge 1}(dP \mid z)$ is the law of the Poisson process $P$ conditioned upon the event $|P| \ge 1$, which is given by
$$Q_{|P| \ge 1}(dP \mid z) = \frac{\mathbb{I}(|P| \ge 1)\, Q(dP \mid z)}{1 - \alpha(z)}. \tag{59}$$

(A7) Conditions on $\Phi$, $\{\alpha_\omega : \omega \in \Omega\}$ and $\{Q_P : P \in \mathcal{P}\}$
1. There exists a $\rho$-preserving mapping $S : Z \to Z$.
2. The acceptance probabilities $\{\alpha_\omega : \omega \in \Omega\}$ satisfy
$$\int [\log \alpha_\omega(S \circ \Phi(z)) - \log \alpha_\omega(z)]\, \mu(d\omega) = -\log |\nabla \Phi(z)| + H(\Phi(z)) - H(z). \tag{60}$$
3. For all $P \in \mathcal{P}$, the transition kernel $Q_P$ satisfies
$$\int \rho(dz)\, (1 - \alpha(z))\, Q_{|P| \ge 1}(dP \mid z)\, Q_P(z, dz') = \rho(S^{-1}(dz'))\, (1 - \alpha(S(z')))\, Q_{|P| \ge 1}(dP \mid S(z')). \tag{61}$$

Assumption A7.3 is an informal expression meaning that we assume that for $Q_{|P| \ge 1}(dP \mid z)$-almost all $P \in \mathcal{P}$
$$\int \rho(dz)\, (1 - \alpha(z))\, \frac{dQ_{|P| \ge 1}(P \mid z)}{dQ_{|P| \ge 1}(P \mid S(z'))}\, Q_P(z, dz') = \rho(S^{-1}(dz'))\, (1 - \alpha(S(z'))),$$
and the Radon-Nikodym derivative in the expression above is well-defined and strictly positive for $Q_P(z, dz')$-almost all $z'$. For a mapping such that $|\nabla \Phi| = 1$, Assumption A7.2 is satisfied if for all $\omega \in \Omega$
$$\log \alpha_\omega(S \circ \Phi(z)) - \log \alpha_\omega(z) = H_\omega(\Phi(z)) - H_\omega(z). \tag{62}$$

Proposition 9. Assume (A7). Then the discrete-time PDMP admits $\rho$ as invariant distribution.

3.3 Existing PD-MCMC algorithms

A few algorithms proposed in the literature can be considered as special instances of discrete-time PD-MCMC algorithms. They all rely on the same framework discussed in Section 2.3, that is, they sample an extended target density $\rho(z) = \exp(-H(z)) = \pi(x)\, \psi(v)$ defined in (23) on $Z = \mathbb{R}^d \times \mathbb{R}^d$, where $\pi$ is the target distribution of interest and $\psi$ is a standard multivariate normal. They use a mapping such that $|\nabla \Phi| = 1$, $\Phi^{-1} = S \circ \Phi \circ S$ with $S(z) = (x, -v)$ and $\alpha(z) = \min\{1, \rho(\Phi(z)) / \rho(z)\}$. A fairly generic scheme is detailed in Algorithm 3.

Algorithm 3 Discrete-time PD-MCMC
1. With probability $\min\{1, \rho(\Phi(z)) / \rho(z)\}$, set $z \leftarrow \Phi(z)$.
2. Otherwise, sample $z' = (x', v') \sim M(z, \cdot)$.
3. With probability
$$\beta((x, v), (x', v')) = \min\left\{ 1, \frac{[\rho(x', -v') - \rho(\Phi(x', -v'))]_+\, M((x', -v'), (x, -v))}{[\rho(x, v) - \rho(\Phi(x, v))]_+\, M((x, v), (x', v'))} \right\}$$
set $z \leftarrow z'$, otherwise set $z \leftarrow (x, -v)$.

This scheme satisfies Assumption A5.1 to Assumption A5.3 and is thus $\rho$-invariant.
In particular, Assumption A5.3 is satisfied as Steps 2 and 3 correspond to using for the event kernel $Q$ a GMH kernel satisfying the skewed-detailed balance condition (42) for $\nu(dz) \propto \rho(dz)\,(1 - \alpha(z))$.

Remark 10. Algorithm 3 can alternatively be viewed as a composition of reversible kernels. First, a delayed-rejection algorithm proposing $\Phi$ and, in case of rejection, then proposing $M(z, S^{-1}(\cdot))$. Second, the involution $S$ is applied unconditionally. In the delayed-rejection framework, we can view condition A5.3 as a condition on delayed-rejection kernels expressed in a sort of remainder form. While our algorithm uses two proposals, extending this remainder condition to multiple proposals would require that each $Q_k$ satisfies

$$\int \rho(dz) \prod_{i=1}^{k-1} (1 - \alpha_i(z))\, Q_k(z, dz') = \rho(dz') \prod_{i=1}^{k-1} (1 - \alpha_i(z')).$$

3.3.1 Guided random walk

This algorithm was proposed in [25]. It is a special case of Algorithm 3 which uses $\Phi(z) = (x + v\epsilon, v)$ for some $\epsilon > 0$ and a proposal $M(z, dz') = \delta_{S(z)}(dz')$ which is accepted with probability 1.

3.3.2 Hamiltonian Monte Carlo

The celebrated HMC algorithm proposed in [18] is also a special case of Algorithm 3 which uses a proposal $M(z, dz') = \delta_{S(z)}(dz')$. However, contrary to the guided random walk, it uses for $\Phi$ a symplectic integrator targeting the Hamiltonian $H$. This deterministic proposal indeed satisfies $|\nabla \Phi| = 1$ and $\Phi^{-1} = S \circ \Phi \circ S$ (see, e.g., [38, 3]). The resulting PD-MCMC kernel $K$ is usually combined with a momentum refreshment step $v \sim \psi$.

3.3.3 Reflective Slice Sampling: discrete-time BPS schemes

Several versions of slice sampling, known as reflective slice sampling, are based on bounces similar to the BPS and are also a special case of Algorithm 3; see [37, Section 7]. They rely on $\Phi(z) = (x + v\epsilon, v)$ for some $\epsilon > 0$ and a deterministic proposal $M(z, dz') = \delta_{\Psi(z)}(dz')$. Reflective slice sampling with inner reflections uses $\Psi(z) = (x', v') = (x, R_{\nabla U}(x)v)$, while reflective slice sampling with outer reflections uses $\Psi(z) = (x', v') = (x + v\epsilon + R_{\nabla U}(x + v\epsilon)v\epsilon,\; R_{\nabla U}(x + v\epsilon)v)$. Both proposals satisfy $\Psi^{-1} = S \circ \Psi \circ S$. The outer version of the algorithm has recently been proposed independently in [43]; see also [44] for a related proposal in the context of nested sampling. In either case, the acceptance probability simplifies to

$$\beta(x, v) = \min\left\{1, \frac{[\pi(x') - \pi(x' - v'\epsilon)]_+}{[\pi(x) - \pi(x + v\epsilon)]_+}\right\}.$$

Intuitively, these algorithms can be interpreted as discrete-time versions of the BPS process. Elementary calculations indeed show that in both cases $\alpha(z) \approx 1 - \epsilon \langle \nabla U(x), v \rangle_+$ and $\beta(z) \to 1$ as $\epsilon \to 0$ under regularity assumptions. We provide here a weak convergence result for the resulting Markov chain where $\psi$ is the uniform distribution on $S^{d-1}$ to limit technicalities.

Proposition 11. Under regularity conditions, reflective slice sampling with inner reflections converges weakly to the BPS for $\lambda_{\mathrm{ref}} = 0$ as $\epsilon \to 0$.

A precise mathematical statement, Theorem 12, and its proof are given in Appendix B.
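As an illustration, the following is a minimal sketch of reflective slice sampling with inner reflections, written as an instance of Algorithm 3 with the simplified acceptance probability above. Several choices are assumptions made only for this example and do not come from the text: the target is a standard bivariate Gaussian, $U(x) = \|x\|^2/2$; the velocity is refreshed from $\psi$ with probability $\lambda_{\mathrm{ref}}\epsilon$ per step (the modification discussed in the next paragraph) so that the chain explores velocities of different norms; and all function names are hypothetical.

```python
import math
import random


def potential(x):
    """U(x) = |x|^2 / 2: standard Gaussian target (illustrative assumption)."""
    return 0.5 * sum(xi * xi for xi in x)


def grad_potential(x):
    return list(x)


def reflect(v, g):
    """R_g v = v - 2 <g, v> / |g|^2 g; falls back to -v if g is exactly zero."""
    norm2 = sum(gi * gi for gi in g)
    if norm2 == 0.0:
        return [-vi for vi in v]
    scale = 2.0 * sum(gi * vi for gi, vi in zip(g, v)) / norm2
    return [vi - scale * gi for vi, gi in zip(v, g)]


def mass_drop(x, v, eps):
    """[pi(x) - pi(x + eps v)]_+ / pi(x), i.e. 1 - alpha(x, v)."""
    x_next = [xi + eps * vi for xi, vi in zip(x, v)]
    diff = potential(x) - potential(x_next)
    return 0.0 if diff >= 0.0 else 1.0 - math.exp(diff)


def step(x, v, eps, lam_ref, rng):
    # Refreshment with probability lam_ref * eps (assumption of this sketch).
    if rng.random() < lam_ref * eps:
        v = [rng.gauss(0.0, 1.0) for _ in v]
    drop = mass_drop(x, v, eps)
    # Step 1 of Algorithm 3: move with probability alpha(z) = 1 - drop.
    if rng.random() >= drop:
        return [xi + eps * vi for xi, vi in zip(x, v)], v
    # Step 2: deterministic proposal (x, R v), the inner reflection.
    v_new = reflect(v, grad_potential(x))
    # Step 3: beta = min{1, [pi(x) - pi(x - eps v')]_+ / [pi(x) - pi(x + eps v)]_+}.
    beta = min(1.0, mass_drop(x, [-vi for vi in v_new], eps) / drop)
    if rng.random() < beta:
        return x, v_new
    return x, [-vi for vi in v]  # rejection: flip the velocity


def run_chain(n_steps, eps=0.2, lam_ref=1.0, seed=0):
    rng = random.Random(seed)
    x = [1.0, -0.5]
    v = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    xs = []
    for _ in range(n_steps):
        x, v = step(x, v, eps, lam_ref, rng)
        xs.append(tuple(x))
    return xs
```

Calling `run_chain(40000)` returns the position component of the chain; for this target, empirical means and variances of each coordinate should be close to 0 and 1. Working with differences of $U$ rather than with $\pi$ directly avoids evaluating the normalising constant.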
We can modify this algorithm to include a refreshment, i.e. by sampling $v \sim \psi$ with probability $\lambda_{\mathrm{ref}}\epsilon$. The weak convergence result of Proposition 11 can be directly extended to this case to show that the resulting discrete-time process converges weakly to the BPS process with refreshment rate $\lambda_{\mathrm{ref}}$. Note that the kernel $K$ would still be $\rho$-invariant if $\Psi$ were using a computationally cheap approximation $\nabla \hat{U}$ of $\nabla U$ to bounce. However, this discrete-time algorithm does not converge to the BPS process, as the probability of accepting $z' = S(z)$ does not vanish as $\epsilon \to 0$ in this scenario. Under regularity conditions, it will instead converge towards the algorithm described at the end of Section [number missing in source].

3.4 Extensions

3.4.1 Discrete-time BPS with randomized bounces

As discussed in Section 2.3.3, a variety of randomized bounces has been proposed for continuous-time PD-MCMC. We show here how to generalize these ideas to discrete time. Let $\psi$ denote the standard normal distribution on $\mathbb{R}^d$, and let $\Phi(z) = (x + v\epsilon, v)$, $\alpha(z) = \min\{1, \rho(\Phi(z))/\rho(z)\}$ and $S(z) = (x, -v)$, which satisfy Assumptions A5.1 and A5.2. We select an event kernel of the form $Q(z, dz') = \delta_x(dx')\, Q_x(v, dv')$ based on a proposal $M_x(v, dv') = M_x(v, v')\, dv'$. This leads to Algorithm 4.

Algorithm 4 Discrete-time BPS with randomized bounces

1. With probability $\min\{1, \pi(x + v\epsilon)/\pi(x)\}$, set $z \leftarrow (x + v\epsilon, v)$.
2. Otherwise
   a) Sample $v' \sim M_x(v, \cdot)$.
   b) With probability
   $$\min\left\{1, \frac{\psi(v')\,[\pi(x) - \pi(x - v'\epsilon)]_+\, M_x(-v', -v)}{\psi(v)\,[\pi(x) - \pi(x + v\epsilon)]_+\, M_x(v, v')}\right\}$$
   set $z \leftarrow (x, v')$.
   c) Otherwise set $z \leftarrow (x, -v)$.

For the kernel $M_x(v, \cdot)$, we can use the randomized bounces developed in Section 2.3.3 as well as $M_x(v, \cdot) = \psi(\cdot)$. The forward-event [33], generalized BPS [47], and autoregressive bouncing procedures discussed in Section 2.3.3 induce a transition kernel $M_x$ satisfying

$$\psi(v)\, \langle \nabla U(x), v \rangle_+\, M_x(v, v') = \psi(v')\, \langle \nabla U(x), -v' \rangle_+\, M_x(-v', -v),$$

for which we would expect the acceptance ratio in Step 2.b of Algorithm 4 to be close to 1 for small $\epsilon$.

The invariance with respect to $\rho$ of the transition kernel is easy to check. Assumption A5.1 is clearly satisfied. Assumption A5.2 follows from direct calculations using $|\nabla \Phi| = 1$ and $\Phi^{-1} = S \circ \Phi \circ S$. Finally, Assumption A5.3 follows from the fact that the event kernel corresponding to Steps 2.a to 2.c of Algorithm 4 is a GMH kernel with $\nu(z) \propto \rho(z)\,(1 - \alpha(z))$ and proposal kernel $M_x(v, dv')$.

3.4.2 Discrete-time Hamiltonian BPS

We consider here the discrete-time version of the Hamiltonian BPS proposed in Section 2.4. This is achieved by setting $\psi$ as the standard normal distribution on $\mathbb{R}^d$, $\alpha(z) = \min\{1, \rho(\Phi(z))/\rho(z)\}$ and $S(z) = (x, -v)$. We also consider an approximation $\hat{H}(z)$, defined in (35), of the Hamiltonian $H(z)$, recall that $\tilde{U}(x) := U(x) - V(x)$, and denote $\Psi(z) = (x, R_{\nabla \tilde{U}}(x)v)$. In Section 2.4, we were considering for $\Phi_t$ the exact Hamiltonian flow associated with $\hat{H}(z)$. In discrete time we can select for $\Phi$ either this exact flow $\Phi_\epsilon$ for some $\epsilon > 0$ or a leapfrog integrator with $L$ steps, which we will denote $\Phi_{\mathrm{HD}}$. The crucial difference is thus that it is not necessary to restrict ourselves to a Hamiltonian $\hat{H}(z)$ for which the Hamiltonian equations can be solved exactly. The resulting algorithm then proceeds as follows.

Algorithm 5 Discrete-time Hamiltonian BPS

1. With probability $\min\{1, \rho(\Phi_{\mathrm{HD}}(z))/\rho(z)\}$, set $z \leftarrow \Phi_{\mathrm{HD}}(z)$.
2. Otherwise
   a) With probability
   $$\min\left\{1, \frac{[\rho(x, -R_{\nabla \tilde{U}}(x)v) - \rho(\Phi_{\mathrm{HD}}(x, -R_{\nabla \tilde{U}}(x)v))]_+}{[\rho(x, v) - \rho(\Phi_{\mathrm{HD}}(x, v))]_+}\right\} = \min\left\{1, \frac{[\rho(x, v) - \rho(\Phi_{\mathrm{HD}}(x, -R_{\nabla \tilde{U}}(x)v))]_+}{[\rho(x, v) - \rho(\Phi_{\mathrm{HD}}(x, v))]_+}\right\},$$
   set $z \leftarrow (x, R_{\nabla \tilde{U}}(x)v)$.
   b) Otherwise set $z \leftarrow (x, -v)$.

The invariance with respect to $\rho$ of the transition kernel is easy to check. Assumption A5.1 is obviously satisfied. Assumption A5.2 follows from direct calculations using $|\nabla \Phi| = 1$ and $\Phi^{-1} = S \circ \Phi \circ S$. Finally, Assumption A5.3 follows from the fact that the event kernel corresponding to Steps a) and b) of Algorithm 5 is a GMH kernel with $\nu(z) \propto \rho(z)\,(1 - \alpha(z))$ and a deterministic transition kernel satisfying $\Psi^{-1} = S \circ \Psi \circ S$.

If $\Phi$ is a leapfrog integrator of stepsize $\epsilon > 0$ targeting the Hamiltonian $H(z)$, then the strategy described above is not directly applicable, as $\nabla \tilde{U}(x) = 0$ for all $x$ so $R_{\nabla \tilde{U}}(x)$ is not defined. However, as $\Phi$ can be thought of as the exact time discretization of a shadow Hamiltonian of the form $\hat{H}_\epsilon(z) = H(z) + \epsilon^2 H^{(2)}(z) + O(\epsilon^4)$ [3, p. 17], it may be possible to build bounces based on $\nabla H^{(2)}(z)$ to correct for the discrepancy between the true Hamiltonian dynamics and its leapfrog approximation.

3.4.3 Discrete-time gradient-free BPS

The BPS-type algorithms given thus far all require computation of the gradient of the potential $\nabla U(x)$ in order to update the velocity $v$ when a bounce event occurs. However, we may wish to target potential functions where this gradient cannot be computed or is very expensive to compute. Additionally, the gradient may not be informative in some models, such as certain embeddings of discrete spaces where the gradient may be zero almost everywhere.

A scheme to approximate the gradient $\nabla U(x)$ by computing numerical differences was advanced in [43]. Here, some number $n_{\mathrm{cpt}}$ of orthogonal unit vectors $\zeta_i$, $i \in [n_{\mathrm{cpt}}]$, are selected, and the gradient is approximated along each of these vectors by, e.g.,

$$\nabla_i = \frac{U(x + h\zeta_i) - U(x - h\zeta_i)}{2h}$$

for some small value $h$. The combination of these $n_{\mathrm{cpt}}$ vectors yields an approximation to the gradient

$$\hat{g} = \sum_{i=1}^{n_{\mathrm{cpt}}} \nabla_i\, \zeta_i,$$

which for $n_{\mathrm{cpt}} = d$ is a typical numerical approximation to the gradient. The new velocity is found by a reversible map from the old velocity to the new velocity which preserves the magnitude of the velocity and maintains the projection of the velocity on the gradient vector.

We may derive an algorithm which operates in the same spirit as that of [43]. By taking $n_{\mathrm{cpt}}$ orthogonal unit vectors, here selected randomly and independently of $v$, we can achieve a reversible algorithm by simply taking the reflection off the approximate gradient,

$$v' = v - 2\, \frac{\langle \hat{g}, v \rangle}{\|\hat{g}\|^2}\, \hat{g},$$

and accepting this proposal in the same way we would accept a typical bounce in the discrete-time BPS algorithm; specifically, by accepting the bounce with probability

$$\min\left\{1, \frac{[\pi(x) - \pi(x - v'\epsilon)]_+}{[\pi(x) - \pi(x + v\epsilon)]_+}\right\}.$$

Alternatively, we propose an algorithm which is related to the continuous-time randomized bounces of Section 2.3.3. We had previously noted that the independent sampling algorithm proposed in [2] consists of sampling from the distribution proportional to $\psi(v')\lambda(x, -v')$, independently of the current value of $v$. Based on the discrete-time invariance condition (5), we may analogously sample from the distribution proportional to $\psi(v')\,[\pi(x) - \pi(x - v'\epsilon)]_+$. This can be accomplished by using rejection sampling with instrumental distribution $\psi$, noting that the ratio between the densities is bounded above by $\pi(x)$; thus each rejection sampling proposal $v'$ is accepted with probability $[\pi(x) - \pi(x - v'\epsilon)]_+ / \pi(x)$, and the first accepted proposal is also accepted as the new state $v'$. See Algorithm 6 for details of this rejection-sampling scheme.

Algorithm 6 Discrete-time gradient-free BPS

1. With probability $\min\{1, \pi(x + v\epsilon)/\pi(x)\}$, set $z \leftarrow (x + v\epsilon, v)$.
2. Otherwise
   a) Sample $v' \sim \psi$.
   b) With probability $[\pi(x) - \pi(x - v'\epsilon)]_+ / \pi(x)$ set $z \leftarrow (x, v')$.
   c) Otherwise go to Step 2.a.
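The rejection-sampling scheme of Algorithm 6 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `gradient_free_bps` and its arguments are hypothetical, $\psi$ is taken to be the standard normal, and the target is passed as an unnormalised log-density `log_pi` so that only ratios of $\pi$ are ever evaluated. No gradient of $U$ is used anywhere.

```python
import math
import random


def gradient_free_bps(log_pi, x0, n_steps, eps, seed=0):
    """Sketch of Algorithm 6; `log_pi` maps a list of floats to log pi(x) + const."""
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    samples = []
    for _ in range(n_steps):
        x_next = [xi + eps * vi for xi, vi in zip(x, v)]
        log_ratio = log_pi(x_next) - log_pi(x)
        # Step 1: move with probability min{1, pi(x + eps v) / pi(x)}.
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_next
        else:
            # Step 2: propose v' ~ psi until one is accepted with probability
            # [pi(x) - pi(x - eps v')]_+ / pi(x).
            while True:
                cand = [rng.gauss(0.0, 1.0) for _ in range(d)]
                x_back = [xi - eps * ci for xi, ci in zip(x, cand)]
                diff = log_pi(x_back) - log_pi(x)
                accept = 0.0 if diff >= 0.0 else 1.0 - math.exp(diff)
                if rng.random() < accept:
                    v = cand
                    break
        samples.append(list(x))
    return samples


def example_log_pi(x):
    """Illustrative target (assumption): Gaussian with mean 2 and std 0.5."""
    return -0.5 * ((x[0] - 2.0) / 0.5) ** 2
```

For instance, `gradient_free_bps(example_log_pi, [2.0], 30000, 0.2)` produces positions whose empirical mean and variance should be close to 2 and 0.25. The inner loop can take many iterations when $x$ is close to a mode and $\epsilon$ is small, since the per-proposal acceptance probability is then of order $\epsilon^2$; events are correspondingly rare in that regime.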

3.4.4 Efficient Implementation of Discrete-time PD-MCMC

All the implementations of discrete-time PD-MCMC schemes we are aware of consist of simulating the algorithm using the kernel (46); that is, at each time step it is checked whether an event occurs with probability $1 - \alpha(z)$ when in state $z$. However, it is possible to improve over this implementation in some interesting scenarios. Assume there exists $\bar{\alpha} : Z \times \mathbb{N} \to [0, 1]$ such that for $k \in \mathbb{N}$ we have $\alpha(\Phi^k(z)) \ge \bar{\alpha}(z, k) > 0$, where $\bar{\alpha}(z, k)$ is computationally cheaper to evaluate than $\alpha(\Phi^k(z))$. It is then possible to simulate an inter-event time of distribution (43) by simulating a time from the instrumental distribution

$$P(\tau = j) = (1 - \bar{\alpha}(z, j)) \prod_{i=0}^{j-1} \bar{\alpha}(z, i),$$

which is then accepted with probability $(1 - \alpha(\Phi^\tau(z))) / (1 - \bar{\alpha}(z, \tau))$. For a linear dynamics $\Phi(z) = (x + v\epsilon, v)$, we can obtain such bounds by upper bounding the derivative of $t \mapsto U(x + vt)$. If $\alpha(z) = \min\{1, \rho(\Phi(z))/\rho(z)\}$, we can also always use, for example, the lower bound $\bar{\alpha}(z, k) = \prod_{i=1}^{n} \bar{\alpha}_i(z, k)$ where $\bar{\alpha}_i(z, k) = \min\{1, \rho_i(\Phi^{k+1}(z))/\rho_i(\Phi^k(z))\}$ for $\rho(z) = \prod_{i=1}^{n} \rho_i(z)$. It has the potential advantage that simulating an event of probability $\bar{\alpha}(z, k)$ can be performed in parallel by simulating independent Bernoulli random variables $B_i \sim \mathrm{Ber}(1 - \bar{\alpha}_i(z, k))$ for $i \in [n]$.

Finally, there are scenarios where it is possible to directly simulate an event time from (43). For example, assume that $\pi(x) = \exp(-U(x))$ where $U$ is strictly convex, $\Phi(z) = (x + v\epsilon, v)$ and $\alpha(z) = \min\{1, \rho(\Phi(z))/\rho(z)\} = \min\{1, \exp(-(U(x + v\epsilon) - U(x)))\}$; then it is easy to show that Algorithm 7 returns a sample from (43). This adapts the approaches developed in [1, Section 2.3.1] for the continuous-time BPS algorithm to the discrete-time case.

Algorithm 7 Simulation of the inter-event time for discrete-time BPS for strictly log-concave targets

1. Minimize the potential along the continuous trajectory: $t^* = \arg\min \{U(x + vt) : t \ge 0\}$.
2. Set $k^* = \arg\min \{U(x + vk\epsilon) : k \in \{\lfloor t^*/\epsilon \rfloor, \lceil t^*/\epsilon \rceil\}\}$.
3. Solve for $t \ge t^*$: $U(x + vt) - U(x + vk^*\epsilon) = E$, $E \sim \mathrm{Exp}(1)$.
4. Return $\tau = \lfloor t/\epsilon \rfloor$.

All these strategies can be easily combined.
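The thinning construction described in this section can be sketched as follows. The function name `inter_event_time` and the callables `alpha` and `alpha_bar` are hypothetical: `alpha(k)` stands for the expensive quantity $\alpha(\Phi^k(z))$ and `alpha_bar(k)` for the cheap bound $\bar{\alpha}(z, k)$. The sketch assumes that a rejected candidate is handled by continuing the scan from the next time step, which makes the per-step event probability $(1 - \bar{\alpha})\cdot(1 - \alpha)/(1 - \bar{\alpha}) = 1 - \alpha$.

```python
import random


def inter_event_time(alpha, alpha_bar, rng):
    """Draw tau with P(tau = j) = (1 - alpha(j)) * prod_{i<j} alpha(i) by thinning.

    alpha(k):     exact no-event probability at step k (expensive).
    alpha_bar(k): cheap lower bound, 0 < alpha_bar(k) <= alpha(k).
    """
    k = 0
    while True:
        bound = alpha_bar(k)
        # Candidate event with probability 1 - alpha_bar(z, k); only then
        # is the expensive alpha evaluated.
        if rng.random() >= bound:
            if rng.random() < (1.0 - alpha(k)) / (1.0 - bound):
                return k
        k += 1
```

With constant values `alpha(k) = 0.9` and `alpha_bar(k) = 0.8`, the returned times are geometric with mean $0.9/0.1 = 9$, while `alpha` is evaluated on roughly one step in five rather than on every step.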
As an example of combining these strategies, we can use an upper bound $\bar{\alpha}(z, k) = \prod_{i=1}^{n} \bar{\alpha}_i(z, k)$ where $\rho_i(z)$ is strictly log-concave for some $i \in [n]$.

4 Discrete-time local PD-MCMC

4.1 Algorithm description

Given the framework provided in Section 3.2.2, it is not difficult to obtain discrete-time local PD-MCMC schemes for $\rho(z) = \exp\left(-\sum_{i=1}^{n} H_i(z)\right) = \pi(x)\psi(v) = \exp(-U(x))\psi(v)$ on $Z = \mathbb{R}^d \times \mathbb{R}^d$, where $\pi$ is the target distribution of interest and $\psi$ is a multivariate normal. We can for example select a dynamics, involution and acceptance probability satisfying $|\nabla \Phi| = 1$, $\alpha_i(z) = \min\{1, \rho_i(\Phi(z))/\rho_i(z)\}$ with $\rho_i(z) = \exp(-H_i(z))$, $S(z) = (x, -v)$, $\Phi^{-1} = S \circ \Phi \circ S$ and $\rho \circ S = \rho$. A rather generic local PD-MCMC scheme is presented in Algorithm 8.

Algorithm 8 Discrete-time local PD-MCMC

1. For $i \in [n]$, sample $B_i \sim \mathrm{Ber}\left([\rho_i(z) - \rho_i(\Phi(z))]_+ / \rho_i(z)\right)$.
2. If $B_i = 0$ for all $i \in [n]$, set $z \leftarrow \Phi(z)$.
3. Otherwise, sample $z' \sim M_B(z, \cdot)$.
4. With probability
$$\min\left\{1, \frac{M_B(S(z'), S(z))}{M_B(z, z')} \prod_{i=1}^{n} \frac{\rho_i(S(z'))\, \mathrm{Ber}(B_i; 1 - \alpha_i(S(z')))}{\rho_i(z)\, \mathrm{Ber}(B_i; 1 - \alpha_i(z))}\right\} \tag{63}$$
set $z \leftarrow z'$. Otherwise, set $z \leftarrow (x, -v)$.

Here Steps 3 and 4 of Algorithm 8 correspond to a GMH kernel satisfying the skewed-detailed balance condition (42) for $\nu_b(dz) \propto \rho(dz)\,(1 - \alpha(z))\, Q_{|B| \ge 1}(b \mid z)$ and a proposal $M_b(z, dz')$ for any $b \in \mathcal{B}$.

Consider a special case of Algorithm 8, given in Algorithm 9, which corresponds to a discrete-time version of local BPS. It uses $\Phi(z) = (x + v\epsilon, v)$, $S(z) = (x, -v)$ and a deterministic proposal $M_b(z, dz') = \delta_{\Psi_b(z)}(dz')$ satisfying $\Psi_b^{-1} = S \circ \Psi_b \circ S$. We also use $\rho_i(z) = \exp(-U_i(x)) := \pi_i(x)$ for $i \in [m]$, so that $U(x) = \sum_{i=1}^{m} U_i(x)$, and $\rho_n(z) = \psi(v)$ with $n = m + 1$. We could have selected $\alpha_n(z) = \alpha_{\mathrm{ref}}$ to refresh the velocity periodically but we omit it for ease of presentation. The only difference with Algorithm 8 is that we actually use here an alternative acceptance probability which is lower than (63) but has the advantage that it factorizes across $i$. It will prove useful as it is then possible to simulate an event with the required acceptance probability by simulating independent events in parallel.

Algorithm 9 Discrete-time local BPS

1. For $i \in [m]$, sample $B_i \sim \mathrm{Ber}\left([\pi_i(x) - \pi_i(x + v\epsilon)]_+ / \pi_i(x)\right)$.
2. If $B_i = 0$ for all $i \in [m]$, set $z \leftarrow (x + v\epsilon, v)$.
3. Otherwise,
   a) Set $z' \leftarrow \Psi_B(z) := (x, v')$, where $v' \leftarrow R_{\nabla \bar{U}}(x)v$ with $\nabla \bar{U}(x) := \sum_{i : B_i = 1} \nabla U_i(x)$.
   b) With probability
   $$\prod_{i=1}^{m} \min\left\{1, \frac{\rho_i(S \circ \Psi_B(z))\, \mathrm{Ber}(B_i; 1 - \alpha_i(S \circ \Psi_B(z)))}{\rho_i(z)\, \mathrm{Ber}(B_i; 1 - \alpha_i(z))}\right\} = \prod_{i : B_i = 0} \min\left\{1, \frac{\min(\pi_i(x), \pi_i(x - v'\epsilon))}{\min(\pi_i(x), \pi_i(x + v\epsilon))}\right\} \prod_{i : B_i = 1} \min\left\{1, \frac{[\pi_i(x) - \pi_i(x - v'\epsilon)]_+}{[\pi_i(x) - \pi_i(x + v\epsilon)]_+}\right\}, \tag{64}$$
   set $z \leftarrow \Psi_B(z)$.
   c) Otherwise, set $z \leftarrow (x, -v)$.

Note that $\nabla \bar{U}(x)$ depends on both $u$, $v$ and $\epsilon$; we stress this dependence as it is omitted notationally. Algorithms 8 and 9 might appear of limited interest as they require sampling $n$ Bernoulli random variables at each iteration.
In the next sections, we show how we can propose implementations that parallel the priority queue implementation of the local BPS proposed in [42] (see [1, Section 3.3.1] for a detailed description) as well as the subsampling algorithms proposed in [1, 6, 29, Section 3.3.2].

4.2 Prefetching implementation

We first describe a priority queue type implementation of Algorithm 9 based on parallel prefetching ideas [11, 2] in scenarios where

$$U(x) = \sum_{i=1}^{m} U_i(x_{S_i}),$$

$x_{S_i}$ being a subset of the components of $x$ and $\pi_i(x) = \exp(-U_i(x_{S_i}))$. There are many possible variations of this implementation.

Algorithm 10 Discrete-time local BPS implementation via parallel prefetching

1. Initialization
   a) For $i \in [m]$, sample non-negative event times $\tau_i$ with distribution
   $$\max\left\{0,\, 1 - \frac{\pi_i(x + v(\tau_i + 1)\epsilon)}{\pi_i(x + v\tau_i\epsilon)}\right\} \prod_{k=0}^{\tau_i - 1} \min\left\{1, \frac{\pi_i(x + v(k + 1)\epsilon)}{\pi_i(x + vk\epsilon)}\right\}.$$
2. Iteration $t$, $t \ge 1$
   a) If $\min_i \tau_i > 0$, then set $z \leftarrow (x + \epsilon v, v)$. Update $\tau_i \leftarrow \tau_i - 1$.
   b) Otherwise,
      i. Compute
      $$\nabla \bar{U}(x) := \sum_{i : \tau_i = 0} \nabla U_i(x_{S_i}), \tag{65}$$
      and let $v' \leftarrow R_{\nabla \bar{U}}(x)v$.
      ii. With probability
      $$\prod_{i : \tau_i > 0} \min\left\{1, \frac{\min(\pi_i(x), \pi_i(x - v'\epsilon))}{\min(\pi_i(x), \pi_i(x + v\epsilon))}\right\} \prod_{i : \tau_i = 0} \min\left\{1, \frac{[\pi_i(x) - \pi_i(x - v'\epsilon)]_+}{[\pi_i(x) - \pi_i(x + v\epsilon)]_+}\right\}, \tag{66}$$
      set $z \leftarrow (x, v')$. Sample again $\tau_i$ for all $i$ where $v'_j \ne v_j$ for some $j \in S_i$.
      iii. Otherwise set $z \leftarrow (x, -v)$. Sample $\tau_i$ for all $i$.

The efficiency of Algorithm 10 relies on the capability of computing the $\tau_i$ efficiently. This may be possible when, for example, this is done in parallel or when some property of $\pi_i$ allows it, such as in the case of log-concave targets detailed in Algorithm 7 given above.

4.3 Subsampling implementations

For sufficiently small $\epsilon$, we might expect that Step 1 of Algorithm 9 would yield very few indices for which $B_i = 1$. This motivates an approach which can sample these variables more efficiently by finding an upper bound on the probability that $B_i = 1$, essentially allowing us to bound the number of indices for which $B_i = 1$. We present Algorithm 11; here, the acceptance of the bounce move (64) is computed in two stages: in Step 4.b we simulate events of probability

$$1 - \min\left\{1, \frac{[\pi_i(x) - \pi_i(x - v'\epsilon)]_+}{[\pi_i(x) - \pi_i(x + v\epsilon)]_+}\right\}$$

for each $i$ where $B_i = 1$; if these succeed then in Step 4.c we simulate events of probability

$$1 - \min\left\{1, \frac{\min(\pi_i(x), \pi_i(x - v'\epsilon))}{\min(\pi_i(x), \pi_i(x + v\epsilon))}\right\}$$

for each $i$ where $B_i = 0$. We suggest that one can make use of the efficient procedures described in Algorithm 12 and Algorithm 13 to sample multiple Bernoulli random variables in both Steps 1 and 4.c; in both cases we expect few cases where the respective Bernoulli variables are 1.
While Step 4.b also samples a set of Bernoulli variables, our assumption that $\epsilon$ is small suggests that the number of variables sampled here will be small; as such this step may be inexpensive and there is likely little to be gained by a more sophisticated simulation scheme.
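To make the factorised structure concrete, here is a minimal sketch of one step of Algorithm 9 in which the bounce acceptance (64) is evaluated in the two stages described in this section, first over the factors with $B_i = 1$ and then over those with $B_i = 0$. It does not implement the efficient Bernoulli sampling of Algorithms 12 and 13. The following are assumptions of the sketch and not taken from the text: the three-factor potential $U_1 = x_1^2/2$, $U_2 = x_2^2/2$, $U_3 = (x_1 - x_2)^2/2$; refreshment of $v$ from a standard normal with probability $\lambda_{\mathrm{ref}}\epsilon$ per step, applied as a separate kernel; and all function names.

```python
import math
import random

# Illustrative factors (U_i, grad U_i); the target is a bivariate Gaussian with
# precision [[2, -1], [-1, 2]], i.e. variances 2/3 and covariance 1/3.
FACTORS = [
    (lambda x: 0.5 * x[0] ** 2, lambda x: [x[0], 0.0]),
    (lambda x: 0.5 * x[1] ** 2, lambda x: [0.0, x[1]]),
    (lambda x: 0.5 * (x[0] - x[1]) ** 2, lambda x: [x[0] - x[1], x[1] - x[0]]),
]


def shift(x, v, t):
    return [xi + t * vi for xi, vi in zip(x, v)]


def event_prob(u, x, v, eps):
    """[pi_i(x) - pi_i(x + eps v)]_+ / pi_i(x) = 1 - alpha_i(x, v)."""
    diff = u(x) - u(shift(x, v, eps))
    return 0.0 if diff >= 0.0 else 1.0 - math.exp(diff)


def reflect(v, g):
    norm2 = sum(gi * gi for gi in g)
    if norm2 == 0.0:
        return [-vi for vi in v]
    scale = 2.0 * sum(gi * vi for gi, vi in zip(g, v)) / norm2
    return [vi - scale * gi for vi, gi in zip(v, g)]


def local_bps_step(x, v, eps, rng):
    probs = [event_prob(u, x, v, eps) for u, _ in FACTORS]
    fired = [rng.random() < p for p in probs]        # Step 1: B_i
    if not any(fired):                               # Step 2: no event
        return shift(x, v, eps), v
    g = [0.0] * len(x)                               # Step 3a: summed gradient
    for b, (_, grad) in zip(fired, FACTORS):
        if b:
            g = [gi + di for gi, di in zip(g, grad(x))]
    v_new = reflect(v, g)
    back = [-vi for vi in v_new]
    flipped = [-vi for vi in v]
    # Stage 1: factors with B_i = 1 (second product in (64)).
    for b, p, (u, _) in zip(fired, probs, FACTORS):
        if b and rng.random() >= min(1.0, event_prob(u, x, back, eps) / p):
            return x, flipped
    # Stage 2: factors with B_i = 0 (first product in (64)); the ratio of
    # minima equals alpha_i(x, -v') / alpha_i(x, v).
    for b, p, (u, _) in zip(fired, probs, FACTORS):
        if not b:
            ratio = (1.0 - event_prob(u, x, back, eps)) / (1.0 - p)
            if rng.random() >= min(1.0, ratio):
                return x, flipped
    return x, v_new


def run_local_bps(n_steps, eps=0.2, lam_ref=1.0, seed=0):
    rng = random.Random(seed)
    x, v = [0.5, -0.5], [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    xs = []
    for _ in range(n_steps):
        if rng.random() < lam_ref * eps:             # refreshment (assumption)
            v = [rng.gauss(0.0, 1.0) for _ in v]
        x, v = local_bps_step(x, v, eps, rng)
        xs.append(tuple(x))
    return xs
```

Because the acceptance probability is a product of per-factor terms, the loop can stop at the first failing factor; this early exit is what the two-stage evaluation exploits, and each per-factor test could equally be run in parallel.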
