Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms


Yan Bai
Department of Statistics, University of Toronto, Toronto, ON M5S 3G3, Canada. yanbai@utstat.toronto.edu

Feb 2009; Revised Nov 2009

Abstract

In this paper we mainly study the ergodicity of adaptive MCMC algorithms. Assume that, under some regularity conditions on the target distribution, all the MCMC samplers in $\{P_\gamma : \gamma \in \mathcal{Y}\}$ simultaneously satisfy a group of drift conditions and admit a uniform small set $C$ in the sense of the $m$-step transition, so that each MCMC sampler converges to the target at a polynomial rate. We then say that the family $\{P_\gamma : \gamma \in \mathcal{Y}\}$ is simultaneously polynomially ergodic. Suppose that Diminishing Adaptation and simultaneous polynomial ergodicity hold. We find that the adaptive MCMC algorithm is ergodic either when the number of drift conditions is greater than or equal to two, or when there is a single drift condition of a specific form. We also discuss some recent results related to this topic, and show that under an additional condition, Containment is necessary for ergodicity of adaptive MCMC.

Keywords: adaptive MCMC, ergodicity, simultaneous drift condition, simultaneous polynomial ergodicity

1 Introduction

Markov chain Monte Carlo (MCMC) algorithms are widely used for approximately sampling from complicated probability distributions. The method has its limitations, however, because it is often necessary to tune the scaling and other parameters before the algorithm converges efficiently. Adaptive MCMC methods using regeneration times and other involved constructions were proposed in [8, 6, and elsewhere]. More recently, adaptive MCMC methods have been used to improve convergence via a self-tuning procedure. In this direction, a key step was made by Haario et al. [9], who proposed an adaptive Metropolis algorithm attempting to optimize the proposal distribution, and proved that a particular version of this algorithm converges weakly to the target distribution. Andrieu and Robert [2] observed that the algorithm of [9] can be viewed as a version of the Robbins-Monro stochastic control algorithm [see 12]. The results of [9] were then generalized by [4, 1], proving convergence of more general adaptive MCMC algorithms. Roberts and Rosenthal [13] use a coupling method to show that ergodicity of an adaptive MCMC algorithm is implied by Containment and Diminishing Adaptation, under the basic assumptions that each transition kernel $P_\gamma$ on the state space $\mathcal{X}$, indexed by the adaptive parameter space $\mathcal{Y}$ ($\gamma \in \mathcal{Y}$), is $\phi$-irreducible and aperiodic and admits a nontrivial unique finite invariant probability measure $\pi$.

Diminishing Adaptation is relatively easy to check, since the adaptive strategy is commonly designed by the user. Containment, however, is rather abstract and hard to verify. In their paper, [13] prove that simultaneous strongly aperiodic geometric ergodicity ensures Containment. Yang [14] and Atchadé and Fort [3] respectively tackle Open Problem 21 of [13], and both results are very interesting. [14] assumes that all the transition kernels simultaneously satisfy the drift condition $P_\gamma V \le V - 1 + b I_C$ and that the adaptive parameter space is compact under some metric, and connects these assumptions with the regeneration decomposition to obtain a bound on the distance $\|P_\gamma^n(x,\cdot) - \pi(\cdot)\|_{TV}$ that is uniform over $\gamma$. Once this bound is available, the distance $\|P_\gamma^n(x,\cdot) - \pi(\cdot)\|_{TV}$ can be uniformly bounded in terms of the test function, and boundedness of the test function sequence $\{V(X_n) : n \ge 0\}$ then ensures Containment. In some situations, checking Containment directly is quite hard. [3] use a coupling method similar to that of [13] to prove an attractive and stronger result when the adaptive MCMC algorithm is restricted to Markovian Adaptation. [3] also assume uniform strong aperiodicity, the simultaneous drift condition in the weakest form $P_\gamma V \le V - 1 + b I_C$, and uniform convergence on every sublevel set of the test function $V(x)$. The idea is that once the chain enters some large sublevel set of $V(x)$, the coupling method for ergodicity can be applied.

In Section 2 we introduce notation and conditions. In Section 3 we discuss the results of [14, 3]. In Section 4 we provide a necessary condition for ergodicity under an additional condition. In Section 5 we present our conditions and main result.

2 Terminology

Consider a target distribution $\pi(\cdot)$ defined on the state space $\mathcal{X}$ with respect to some $\sigma$-field $\mathcal{B}(\mathcal{X})$ ($\pi(x)$ is also used for the density function). Let $\{P_\gamma : \gamma \in \mathcal{Y}\}$ be a family of transition kernels of time-homogeneous Markov chains, each with stationary distribution $\pi$, i.e. $\pi P_\gamma = \pi$ for all $\gamma \in \mathcal{Y}$. An adaptive MCMC algorithm $Z := \{(X_n, \Gamma_n) : n \ge 0\}$ is regarded as lying in the sample path space $\Omega := (\mathcal{X} \times \mathcal{Y})^\infty$ equipped with a $\sigma$-field $\mathcal{F}$. For each initial state $x \in \mathcal{X}$ and initial parameter $\gamma \in \mathcal{Y}$, there is a probability measure $\mathbb{P}_{(x,\gamma)}$ such that the probability of the event $[Z \in A]$ is well defined for any set $A \in \mathcal{F}$. There is a filtration $\mathcal{G} := \{\mathcal{G}_n : n \ge 0\}$ such that $Z$ is adapted to $\mathcal{G}$. Denote the corresponding expectation by $\mathbb{E}_{(x,\gamma)}[f(X_n)]$ for measurable functions $f : \mathcal{X} \to \mathbb{R}$. A framework for adaptive MCMC is defined as follows:

1. Given an initial state $X_0 := x_0 \in \mathcal{X}$ and a kernel $P_{\Gamma_0}$ with $\Gamma_0 := \gamma_0 \in \mathcal{Y}$, at each iteration $n+1$, $X_{n+1}$ is generated from $P_{\Gamma_n}(X_n, \cdot)$;

2. $\Gamma_{n+1}$ is obtained from some function of $X_0, \ldots, X_{n+1}$ and $\Gamma_0, \ldots, \Gamma_n$.

For $A \in \mathcal{B}(\mathcal{X})$,

$$\mathbb{P}_{(x_0,\gamma_0)}(X_{n+1} \in A \mid \mathcal{G}_n) = \mathbb{P}_{(x_0,\gamma_0)}(X_{n+1} \in A \mid X_n, \Gamma_n) = P_{\Gamma_n}(X_n, A). \quad (1)$$

We say that the adaptive MCMC $Z$ is ergodic if for any initial state $x_0 \in \mathcal{X}$ and any kernel index $\gamma_0 \in \mathcal{Y}$, $\|\mathbb{P}_{(x_0,\gamma_0)}(X_n \in \cdot) - \pi(\cdot)\|_{TV}$ converges to zero, where $\|\mu\|_{TV} = \sup_{A \in \mathcal{B}(\mathcal{X})} |\mu(A)|$. Obviously the joint distribution of $X_{n+1}$ and $\Gamma_{n+1}$ conditional on $\mathcal{G}_n$ is

$$\mathbb{P}_{(x_0,\gamma_0)}(X_{n+1} \in dx,\ \Gamma_{n+1} \in d\gamma \mid \mathcal{G}_n) = P_{\Gamma_n}(X_n, dx)\,\mathbb{P}_{(x_0,\gamma_0)}(\Gamma_{n+1} \in d\gamma \mid X_{n+1} = x, \mathcal{G}_n).$$
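To make the two-step framework above concrete, here is a minimal sketch in Python (ours, not from the paper) of the generic adaptive MCMC iteration. The random-walk Metropolis kernel family and the acceptance-rate adaptation rule are hypothetical illustrations; the diminishing step size $1/n$ is chosen so that Diminishing Adaptation plausibly holds.

```python
import numpy as np

def adaptive_mcmc(x0, gamma0, n_iter, rng=None):
    """Generic adaptive MCMC loop: X_{n+1} ~ P_{Gamma_n}(X_n, .), then
    Gamma_{n+1} = g(X_0..X_{n+1}, Gamma_0..Gamma_n), as in the framework above."""
    rng = rng or np.random.default_rng(0)
    xs, gammas = [x0], [gamma0]
    for n in range(n_iter):
        x_next = sample_kernel(xs[-1], gammas[-1], rng)   # draw from P_{Gamma_n}(X_n, .)
        gamma_next = update_gamma(xs + [x_next], gammas)  # user-designed adaptation rule
        xs.append(x_next)
        gammas.append(gamma_next)
    return np.array(xs), gammas

# Hypothetical example: random-walk Metropolis for N(0,1), gamma = proposal
# scale, adapted toward acceptance rate 0.44 by a Robbins-Monro-type rule
# (cf. [2, 12]); all names and constants here are illustrative only.
def sample_kernel(x, gamma, rng):
    y = x + gamma * rng.standard_normal()
    log_alpha = -0.5 * (y**2 - x**2)          # log pi(y)/pi(x) for N(0,1)
    return y if np.log(rng.uniform()) < log_alpha else x

def update_gamma(xs, gammas):
    accepted = float(xs[-1] != xs[-2])
    n = len(gammas)
    # step size 1/n shrinks, so consecutive kernels differ less and less
    return max(1e-3, gammas[-1] + (accepted - 0.44) / n)
```

Step 1 uses only $(X_n, \Gamma_n)$, matching Eq. (1); the theory below concerns which adaptation rules in the role of `update_gamma` preserve ergodicity.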

Furthermore, if $\Gamma_{n+1}$ is simply a function of $X_n$ and $\Gamma_n$, then the adaptive MCMC algorithm is said to use Markovian Adaptation, i.e. the joint process $\{(X_n, \Gamma_n) : n \ge 0\}$ is a time-inhomogeneous Markov chain.

Containment is defined as follows: for any $X_0 = x_0$ and $\Gamma_0 = \gamma_0$ and any $\epsilon > 0$, the stochastic process $\{M_\epsilon(X_n, \Gamma_n) : n \ge 0\}$ is bounded in probability $\mathbb{P}_{(x_0,\gamma_0)}$, i.e. for all $\delta > 0$ there is $N \in \mathbb{N}$ such that $\mathbb{P}_{(x_0,\gamma_0)}(M_\epsilon(X_n, \Gamma_n) \le N) \ge 1 - \delta$ for all $n \in \mathbb{N}$, where $M_\epsilon(x,\gamma) = \inf\{n \ge 1 : \|P_\gamma^n(x,\cdot) - \pi(\cdot)\|_{TV} \le \epsilon\}$ is the $\epsilon$-convergence function.

Diminishing Adaptation is defined as follows: for any $X_0 = x_0$ and $\Gamma_0 = \gamma_0$, $\lim_{n\to\infty} D_n = 0$ in probability $\mathbb{P}_{(x_0,\gamma_0)}$, where $D_n = \sup_{x\in\mathcal{X}} \|P_{\Gamma_{n+1}}(x,\cdot) - P_{\Gamma_n}(x,\cdot)\|_{TV}$ represents the amount of adaptation performed between iterations $n$ and $n+1$.

For a non-negative function $V$, the process $\{V(X_n) : n \ge 0\}$ is bounded in probability if, given any $(x_0,\gamma_0) \in \mathcal{X}\times\mathcal{Y}$ and any $\delta > 0$, there exist $N > 0$ and $M > 0$ such that for $n > N$, $\mathbb{P}_{(x_0,\gamma_0)}(V(X_n) > M) < \delta$. Suppose that the parameter space $\mathcal{Y}$ is a metric space. We say the adaptive parameter process $\{\Gamma_n : n \ge 0\}$ is bounded in probability if for all $x_0\in\mathcal{X}$, $\gamma_0\in\mathcal{Y}$, $\epsilon > 0$, there exist $N > 0$ and some compact set $B\subset\mathcal{Y}$ such that for $n > N$, $\mathbb{P}_{(x_0,\gamma_0)}(\Gamma_n \in B^c) < \epsilon$.

3 Simultaneous Drift Conditions

Before studying the simultaneous drift conditions, we recall the result of [13]:

Theorem 1 ([13]). Ergodicity of an adaptive MCMC algorithm is implied by Containment and Diminishing Adaptation.

Remark 1. [13] gave one condition, simultaneous strongly aperiodic geometric ergodicity, which ensures Containment. In that definition, the drift conditions have the same form $P_\gamma V(x) \le \lambda V(x) + b I_C(x)$ for all $\gamma\in\mathcal{Y}$. However, if $\{\Gamma_n : n \ge 0\}$ is bounded in probability, Containment is implied by requiring only that on each compact subset $B$ of $\mathcal{Y}$ the drift conditions have the same form $P_\gamma V_B(x) \le \lambda_B V_B(x) + b_B I_C(x)$. More generally, see the following corollary.

Corollary 1. Suppose that (i) the parameter space $\mathcal{Y}$ is a metric space; (ii) the adaptive parameter process $\{\Gamma_n : n \ge 0\}$ is bounded in probability; (iii) for any compact set $K\subset\mathcal{Y}$ and any $\epsilon > 0$, the local $\epsilon$-convergence time process

$$\Big\{ \widetilde{M}_\epsilon(X_n) := \inf\Big\{ m\in\mathbb{N}^+ : \sup_{\gamma\in K}\|P_\gamma^m(X_n,\cdot) - \pi(\cdot)\|_{TV} < \epsilon \Big\} : n \ge 0 \Big\}$$

is bounded in probability; (iv) Diminishing Adaptation holds. Then the adaptive MCMC algorithm $\{X_n : n \ge 0\}$ with the adaptive parameter process $\{\Gamma_n : n \ge 0\}$ is ergodic.

The proof is trivial and omitted.
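On a finite state space, the two quantities defined above can be computed exactly. The following sketch (ours, not from the paper) evaluates the $\epsilon$-convergence function $M_\epsilon(x,\gamma)$ by iterating a transition matrix, and the adaptation gap $D_n$ between two explicit kernels; both reduce to elementary total variation computations.

```python
import numpy as np

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

def eps_convergence_time(P, pi, x, eps, max_n=10_000):
    """M_eps(x, gamma) = inf{n >= 1 : ||P_gamma^n(x, .) - pi||_TV <= eps},
    computed by propagating the point mass at state x through the kernel P."""
    row = np.zeros(len(pi))
    row[x] = 1.0
    for n in range(1, max_n + 1):
        row = row @ P
        if tv(row, pi) <= eps:
            return n
    return np.inf  # not eps-converged within max_n steps

def adaptation_gap(P_next, P_curr):
    """D_n = sup_x ||P_{Gamma_{n+1}}(x, .) - P_{Gamma_n}(x, .)||_TV."""
    return max(tv(P_next[x], P_curr[x]) for x in range(P_curr.shape[0]))
```

Containment asks precisely that `eps_convergence_time` evaluated along the adaptive run $(X_n, \Gamma_n)$ remain bounded in probability, while Diminishing Adaptation asks that `adaptation_gap` between consecutive kernels tend to zero.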

[14] gave the following conditions to tackle Open Problem 21 of [13]:

Y1: there exist a constant $\delta > 0$, a set $C\in\mathcal{F}$, and probability measures $\nu_\gamma(\cdot)$ for $\gamma\in\mathcal{Y}$, such that $P_\gamma(x,\cdot) \ge \delta I_C(x)\nu_\gamma(\cdot)$ for $\gamma\in\mathcal{Y}$;

Y2: all kernels uniformly satisfy the weakest drift condition $P_\gamma V \le V - 1 + b I_C$, where $V : \mathcal{X}\to[1,\infty)$ and $\pi(V) < \infty$;

Y3: $\mathcal{Y}$ is compact under the metric $d(\gamma_1,\gamma_2) = \sup_{x\in\mathcal{X}}\|P_{\gamma_1}(x,\cdot) - P_{\gamma_2}(x,\cdot)\|_{TV}$;

Y4: the stochastic process $\{V(X_n) : n \ge 0\}$ is bounded in probability.

Theorem 2 ([14]). Suppose Diminishing Adaptation holds. Then conditions (Y1)-(Y4) ensure the ergodicity of adaptive MCMC algorithms.

Remark 2. 1. In the proof of Theorem 2, (Y1) and (Y2) together ensure that each transition kernel is ergodic to $\pi$. (Y3) and (Y4) together imply that the total variation distance between $P_\gamma^n$ and $\pi$ converges to zero uniformly on $\mathcal{Y}$.

2. The condition $\pi(V) < \infty$ is a relatively strong condition. For each $P_\gamma$, suppose that the chain $\{X_n^{(\gamma)} : n \ge 0\}$ is a time-homogeneous Markov chain with transition kernel $P_\gamma$. For any recurrent set $A\subset\mathcal{X}$ with $\pi(A) > 0$, by the representation of the invariant measure in Meyn and Tweedie [11] (hereafter MT),

$$\pi(V) = \int_A \pi(dy)\,\mathbb{E}_\gamma\Big[\sum_{i=1}^{\tau_A} V(X_i^{(\gamma)}) \,\Big|\, X_0^{(\gamma)} = y\Big].$$

Assume that there exists a small set $C_1\subset\mathcal{X}$ with $\sup_{x\in C_1}\mathbb{E}_\gamma\big[\sum_{i=1}^{\tau_{C_1}-1} V(X_i^{(\gamma)}) \mid X_0^{(\gamma)} = x\big] < \infty$, and denote

$$U_\gamma(x) = \mathbb{E}_\gamma\Big[\sum_{i=0}^{\sigma_{C_1}} V(X_i^{(\gamma)}) \,\Big|\, X_0^{(\gamma)} = x\Big].$$

Hence, by the drift characterization in MT, $P_\gamma U_\gamma \le U_\gamma - V(x) + b_1 I_{C_1}$, where $b_1 = \sup_{x\in C_1}\mathbb{E}_\gamma\big[\sum_{i=1}^{\tau_{C_1}-1} V(X_i^{(\gamma)}) \mid X_0^{(\gamma)} = x\big]$. Suppose that there is a test function $V_1$ satisfying $P_\gamma V_1 \le V_1 - V + b I_{C_1}$ and $V_1(x) \ge V(x)$ on $C_1$. By the comparison theorem in MT, $U_\gamma(x) \le V_1(x)$. So $P_\gamma U_\gamma \le U_\gamma - 1 + b_1 I_{C_1}$.

We will instead study the simultaneous drift condition of the form $P_\gamma V_1 \le V_1 - V_0 + b I_C$, where the test functions $V_0(x)$ and $V_1(x)$ are uniform over all $P_\gamma$. In this situation, condition (Y3) is unnecessary, and condition (Y4) is implied (see Theorem 6 and Remark 7).

[3] also gave the following conditions to study the ergodicity of adaptive MCMC with Markovian Adaptation:

M1: there exist a probability measure $\nu(\cdot)$, a constant $\delta > 0$, and a set $C\in\mathcal{F}$ such that $P_\gamma(x,\cdot) \ge \delta I_C(x)\nu(\cdot)$ for $\gamma\in\mathcal{Y}$;

M2: there exist a measurable function $V : \mathcal{X}\to[1,\infty)$ and a positive constant $b > 0$ such that for any $\gamma\in\mathcal{Y}$, $(P_\gamma V)(x) \le V(x) - 1 + b I_C(x)$;

M3: for any sublevel set $D_l = \{x\in\mathcal{X} : V(x) \le l\}$ of $V$,

$$\lim_{n\to\infty}\ \sup_{D_l\times\mathcal{Y}} \|P_\gamma^n(x,\cdot) - \pi(\cdot)\|_{TV} = 0.$$

Theorem 3 ([3]). Suppose Diminishing Adaptation holds. Then conditions (M1)-(M3) imply the ergodicity of adaptive MCMC algorithms with Markovian Adaptation.

Remark 3. 1. Since

$$\mathbb{P}_{(x_0,\gamma_0)}(V(X_n) > M) \le \pi(D_M^c) + \|\mathbb{P}_{(x_0,\gamma_0)}(X_n\in\cdot) - \pi(\cdot)\|_{TV},$$

$M$ can be taken extremely large such that $\pi(D_M^c) < \epsilon$, and (M1)-(M3) together with Diminishing Adaptation imply that the right-hand side of the above inequality converges to zero. So $\{V(X_n) : n \ge 0\}$ is bounded in probability.

2. In Section 4 we show that under a certain condition, Containment is a necessary condition for ergodicity of adaptive MCMC provided that (M3) holds. From another viewpoint, the proof of [3] does apply the coupling method to check Containment, using Diminishing Adaptation and the simultaneous drift conditions.

4 The necessary condition for ergodicity

Before showing our result, let us discuss one simple example which was studied in [5].

Example 1. Let the state space be $\mathcal{X} = \{1,2\}$ and the transition kernel

$$P_\theta = \begin{bmatrix} 1-\theta & \theta \\ \theta & 1-\theta \end{bmatrix}.$$

Obviously, for each $\theta\in(0,1)$, the stationary distribution is uniform on $\mathcal{X}$. Consider a state-independent adaptation: for $k = 1,2,\ldots$, at each time $n = 2k-1$ choose the transition kernel index $\theta_{n-1} = 1/2$, and at each time $n = 2k$ choose the transition kernel index $\theta_{n-1} = 1/n$.

Remark 4. The chain converges to the target distribution, but neither Diminishing Adaptation nor Containment holds.

Remark 5. Obviously, the adaptive parameter process $\{\theta_n : n \ge 0\}$ is not bounded in probability.

Example 1 shows that Containment is also not necessary. In the following theorem, we prove that under an additional condition similar to (M3), Containment is necessary for ergodicity of adaptive algorithms.

Theorem 4 (Necessity of Containment). Suppose that there exists an increasing sequence of sets $D_k\subset\mathcal{X}$ on the state space $\mathcal{X}$ such that for any $k > 0$,

$$\lim_{n\to\infty}\ \sup_{D_k\times\mathcal{Y}} \|P_\gamma^n(x,\cdot) - \pi(\cdot)\|_{TV} = 0. \quad (2)$$

If the adaptive MCMC algorithm is ergodic then Containment holds.

Proof: Fix $\epsilon > 0$. For any $\delta > 0$, take $K > 0$ such that $\pi(D_K^c) < \delta/2$. For the set $D_K$, there exists $M$ such that

$$\sup_{D_K\times\mathcal{Y}} \|P_\gamma^M(x,\cdot) - \pi(\cdot)\|_{TV} < \epsilon.$$

Hence, for any $(x_0,\gamma_0)\in\mathcal{X}\times\mathcal{Y}$, by the ergodicity of the adaptive MCMC $\{X_n\}_n$, there exists some $N > 0$ such that for $n > N$, $\mathbb{P}_{(x_0,\gamma_0)}(X_n\in D_K^c) - \pi(D_K^c) < \delta/2$. Moreover,

$$[X_n\in D_K] = [(X_n,\Gamma_n)\in D_K\times\mathcal{Y}] \subset [M_\epsilon(X_n,\Gamma_n)\le M].$$

Hence,

$$\mathbb{P}_{(x_0,\gamma_0)}(M_\epsilon(X_n,\Gamma_n) > M) \le \mathbb{P}_{(x_0,\gamma_0)}\big((X_n,\Gamma_n)\in(D_K\times\mathcal{Y})^c\big) = \mathbb{P}_{(x_0,\gamma_0)}(X_n\in D_K^c) \le \big[\mathbb{P}_{(x_0,\gamma_0)}(X_n\in D_K^c) - \pi(D_K^c)\big] + \pi(D_K^c) < \delta.$$

Therefore, Containment holds.
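Returning to Example 1 above, a quick simulation (ours, not from the paper) makes it concrete: the law of $X_n$ reaches the uniform target immediately after the first $\theta = 1/2$ step and every $P_\theta$ preserves it, yet the kernel gap $D_n = |\theta_{n+1} - \theta_n|$ tends to $1/2$ along a subsequence, so Diminishing Adaptation fails.

```python
import numpy as np

def example1_law(n_steps, start=0):
    """Exact law of X_n in Example 1: the kernel used at time n is P_theta
    with theta = 1/2 for odd n and theta = 1/n for even n."""
    law = np.zeros(2)
    law[start] = 1.0
    for n in range(1, n_steps + 1):
        theta = 0.5 if n % 2 == 1 else 1.0 / n
        P = np.array([[1 - theta, theta], [theta, 1 - theta]])
        law = law @ P
    return law

for n in [1, 2, 3, 10, 101]:
    law = example1_law(n)
    print(n, law, "TV to uniform:", 0.5 * np.abs(law - 0.5).sum())
```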

Corollary 2. Suppose that the parameter space $\mathcal{Y}$ is a metric space, and the adaptive scheme $\{\Gamma_n : n \ge 0\}$ is bounded in probability. Suppose that there exists an increasing sequence of sets $(D_k,\mathcal{Y}_k)\subset\mathcal{X}\times\mathcal{Y}$ such that for any $k > 0$,

$$\lim_{n\to\infty}\ \sup_{D_k\times\mathcal{Y}_k} \|P_\gamma^n(x,\cdot) - \pi(\cdot)\|_{TV} = 0.$$

If the adaptive MCMC algorithm is ergodic then Containment holds.

Proof: Using the same technique as in Theorem 4, for large enough $M > 0$,

$$\mathbb{P}_{(x_0,\gamma_0)}(M_\epsilon(X_n,\Gamma_n) > M) \le \mathbb{P}_{(x_0,\gamma_0)}\big((X_n,\Gamma_n)\in(D_k\times\mathcal{Y}_k)^c\big) \le \mathbb{P}_{(x_0,\gamma_0)}(X_n\in D_k^c) + \mathbb{P}_{(x_0,\gamma_0)}(\Gamma_n\in\mathcal{Y}_k^c) \le \big[\mathbb{P}_{(x_0,\gamma_0)}(X_n\in D_k^c) - \pi(D_k^c)\big] + \pi(D_k^c) + \mathbb{P}_{(x_0,\gamma_0)}(\Gamma_n\in\mathcal{Y}_k^c).$$

Since $\{\Gamma_n : n \ge 0\}$ is bounded in probability, the result holds.

5 Simultaneous Polynomial Ergodicity

Although the ergodicity of adaptive MCMC algorithms is, to some degree, solved in [14] and [3], some properties related to simultaneous polynomial ergodicity are still unknown. In this section, we find that the conditions (Y4) and (M3) are implied for adaptive MCMC with simultaneous polynomial ergodicity. Before studying it, let us recall the quantitative bound for time-homogeneous Markov chains with polynomial convergence rate due to Fort and Moulines [7].

Theorem 5 ([7]). Suppose that the time-homogeneous transition kernel $P$ satisfies the following conditions:

- $P$ is $\pi$-irreducible for an invariant probability measure $\pi$;

- there exist some sets $C\in\mathcal{B}(\mathcal{X})$ and $D\in\mathcal{B}(\mathcal{X})$, $C\subset D$, $\pi(C) > 0$, and an integer $m\ge 1$, such that for any $(x,x')\in\Delta := (C\times D)\cup(D\times C)$ and $A\in\mathcal{B}(\mathcal{X})$,

$$P^m(x,A)\wedge P^m(x',A) \ge \rho_{x,x'}(A), \quad (3)$$

where $\rho_{x,x'}$ is some measure on $\mathcal{X}$ for $(x,x')\in\Delta$, and $\epsilon_- := \inf_{(x,x')\in\Delta}\rho_{x,x'}(\mathcal{X}) > 0$;

- let $q\ge 1$; there exist some measurable functions $V_k : \mathcal{X}\to\mathbb{R}^+\setminus\{0\}$ for $k\in\{0,1,\ldots,q\}$ and, for $k\in\{0,1,\ldots,q-1\}$, some constants $0 < a_k < 1$, $b_k < \infty$, and $c_k > 0$ such that

$$P V_{k+1}(x) \le V_{k+1}(x) - V_k(x) + b_k I_C(x), \qquad \inf_{x\in\mathcal{X}} V_k(x) \ge c_k > 0, \qquad V_k(x) - b_k \ge a_k V_k(x) \ \ \text{for } x\in D^c, \quad (4)$$

and $\sup_D V_q < \infty$;

- $\pi(V_q^\beta) < \infty$ for some $\beta\in(0,1]$.

Then, for any $x\in\mathcal{X}$ and $n\ge m$,

$$\|P^n(x,\cdot) - \pi(\cdot)\|_{TV} \le \min_{1\le l\le q} B_l^{(\beta)}(x,n), \quad (5)$$

with

$$B_l^{(\beta)}(x,n) = \frac{\big\langle (I - A_m^{(\beta)})^{-1}\,\delta_x\otimes\pi(W^\beta),\ e_l \big\rangle}{\epsilon_+\Big( S(l,n+1-m)^\beta + \sum_{j\le n+1-m}(1-\epsilon_+)^{j\wedge(n-m)}\big(S(l,j+1)^\beta - S(l,j)^\beta\big) \Big)},$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product in $\mathbb{R}^{q+1}$, $\{e_l\}$, $0\le l\le q$, is the canonical basis of $\mathbb{R}^{q+1}$, and $I$ is the identity matrix; $\delta_x\otimes\pi(W^\beta) := \int\!\!\int \delta_x(dy)\,\pi(dy')\,W^\beta(y,y')$ where $W^\beta(x,x') := \big(W_0^\beta(x,x'),\ldots,W_q^\beta(x,x')\big)^T$ with $W_0(x,x') := 1$ and

$$W_l(x,x') = I_\Delta(x,x') + I_{\Delta^c}(x,x')\Big(\prod_{k=0}^{l-1} a_k\Big)^{-1}(m(V_0))^{-1}\big(V_l(x) + V_l(x')\big) \quad \text{for } 1\le l\le q,$$

where $m(V_0) := \inf_{(x,x')\in\Delta^c}\{V_0(x) + V_0(x')\}$;

$$S(0,k) := 1 \quad\text{and}\quad S(i,k) := \sum_{j=1}^{k} S(i-1,j), \quad i\ge 1;$$

$$A_m^{(\beta)} := \begin{bmatrix}
A_m^{(\beta)}(0) & 0 & \cdots & \cdots & 0 \\
A_m^{(\beta)}(1) & A_m^{(\beta)}(0) & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & & \vdots \\
A_m^{(\beta)}(q-1) & A_m^{(\beta)}(q-2) & \cdots & A_m^{(\beta)}(0) & 0 \\
A_m^{(\beta)}(q) & A_m^{(\beta)}(q-1) & \cdots & A_m^{(\beta)}(1) & A_m^{(\beta)}(0)
\end{bmatrix},$$

where

$$A_m^{(\beta)}(l) := \sup_{(x,x')\in\Delta} S(l,m)^\beta\big(1-\rho_{x,x'}(\mathcal{X})\big)\int\!\!\int R_{x,x'}(x,dy)\,R_{x,x'}(x',dy')\,W_l^\beta(y,y'),$$

with the residual kernel

$$R_{x,x'}(u,dy) := \big(1-\rho_{x,x'}(\mathcal{X})\big)^{-1}\big(P^m(u,dy) - \rho_{x,x'}(dy)\big),$$

and $\epsilon_+ := \sup_{(x,x')\in\Delta}\rho_{x,x'}(\mathcal{X})$.

Remark 6. In $B_l^{(\beta)}(x,n)$, $\epsilon_+$ depends on the set $\Delta$ and the measure $\rho_{x,x'}$; the matrix $(I - A_m^{(\beta)})^{-1}$ depends on the set $\Delta$, the transition kernel $P$, $\rho_{x,x'}$, and the test functions $V_k$; $\delta_x\otimes\pi(W^\beta)$ depends on the set $\Delta$ and the test functions $V_k$. Consider the special case of the theorem where $\rho_{x,x'}(dy) = \delta\nu(dy)$ for a probability measure $\nu$ with $\nu(C) > 0$, and $\Delta := C\times C$.

1. $\epsilon_+ = \epsilon_- = \delta$.

2. $I - A_m^{(\beta)}$ is a lower triangular matrix, so $(I - A_m^{(\beta)})^{-1} =: \big(b_{ij}^{(\beta)}\big)_{i,j=0,\ldots,q}$ is also lower triangular, and, fixing $k\ge 0$, all entries $b_{i,i-k}^{(\beta)}$ are equal. On the diagonal, $b_{ii}^{(\beta)} = \big(1 - A_m^{(\beta)}(0)\big)^{-1}$. For $i > j$, $b_{ij}^{(\beta)}$ is a polynomial combination of $A_m^{(\beta)}(0),\ldots,A_m^{(\beta)}(i-j)$ divided by $\big(1 - A_m^{(\beta)}(0)\big)^{i-j+1}$; by some algebra, we can obtain that $b_{10}^{(\beta)} = A_m^{(\beta)}(1)\big(1 - A_m^{(\beta)}(0)\big)^{-2}$.

So, by calculating $B_1^{(\beta)}(x,n)$, we can get a quantitative bound of simple form: $B_1^{(\beta)}(x,n)$ involves only the two test functions $V_0(x)$ and $V_1(x)$.

Remark 7. From Equation (4), $V_0(x) \ge b_0/(1-a_0) > b_0$ on $D^c$ because $0 < a_0 < 1$. Consider the drift condition $P V_1 \le V_1 - V_0 + b_0 I_C$. Since $\pi P = \pi$, $\pi(V_0) \le b_0\pi(C) \le b_0$. Hence the $V_0$ in the theorem cannot be constant: a constant $V_0$ would exceed $b_0$ everywhere while satisfying $\pi(V_0)\le b_0$.

Remark 8. Without the condition $\pi(V_q^\beta) < \infty$, the bound in Equation (5) can also be obtained; however, the bound is then possibly infinite. The subscript $l$ of $B_l^{(\beta)}(x,n)$ and $\beta$ explain the polynomial rate: the related rate is $S(l,n+1-m)^\beta = O((n+1-m)^{l\beta})$. It can be observed that $B_l^{(\beta)}(x,n)$ involves the test functions $V_0(x),\ldots,V_l(x)$, and $\limsup_{n\to\infty} n^{\beta l}\,B_l^{(\beta)}(x,n) < \infty$. The maximal rate of convergence is equal to $q\beta$.

5.1 Conditions

The following conditions are derived from Theorem 5, with some changes added so that they apply to adaptive MCMC algorithms. We say that the family $\{P_\gamma : \gamma\in\mathcal{Y}\}$ is simultaneously polynomially ergodic (S.P.E.) if conditions (A1)-(A4) are satisfied.

A1: each $P_\gamma$ is $\phi_\gamma$-irreducible with stationary distribution $\pi(\cdot)$.

Remark 9. By MT, if $P_\gamma$ is $\phi_\gamma$-irreducible with invariant probability measure $\pi$, then $P_\gamma$ is $\pi$-irreducible and the invariant measure $\pi$ is a maximal irreducibility measure.

A2: there are a set $C\subset\mathcal{X}$, some integer $m\in\mathbb{N}$, some constant $\delta > 0$, and some probability measures $\nu_\gamma(\cdot)$ on $\mathcal{X}$ such that $\pi(C) > 0$ and

$$P_\gamma^m(x,\cdot) \ge \delta I_C(x)\nu_\gamma(\cdot) \quad \text{for } \gamma\in\mathcal{Y}. \quad (6)$$

Remark 10. In Theorem 5, condition Eq. (3) enables the splitting technique. Here we consider the special case of that condition where $\rho_{x,x'}(dy) = \delta\nu_\gamma(dy)$ and $\Delta = C\times C$. Thus, by Remark 6, the bound on $\|P_\gamma^n(x,\cdot) - \pi(\cdot)\|_{TV}$ depends on $C$, $m$, the minorization constant $\delta$, $\pi(\cdot)$, $\nu_\gamma$, and the test functions $V_l(x)$, so we assume that these are uniform over all the transition kernels.

A3: there are $q\in\mathbb{N}$ and measurable functions $V_0, V_1, \ldots, V_q : \mathcal{X}\to(0,\infty)$ with $V_0 \le V_1 \le \cdots \le V_q$, such that for $k = 0,1,\ldots,q-1$ there are $0 < \alpha_k < 1$, $b_k < \infty$, and $c_k > 0$ such that

$$P_\gamma V_{k+1}(x) \le V_{k+1}(x) - V_k(x) + b_k I_C(x), \qquad V_k(x) \ge c_k \ \text{for } x\in\mathcal{X} \text{ and } \gamma\in\mathcal{Y}; \quad (7)$$

$$V_k(x) - b_k \ge \alpha_k V_k(x) \quad \text{for } x\in\mathcal{X}\setminus C; \quad (8)$$

$$\sup_{x\in C} V_q(x) < \infty. \quad (9)$$

Remark 11. For $x\in C$, $\nu_\gamma(V_l) \le \frac{1}{\delta}P_\gamma^m V_l(x) \le \frac{1}{\delta}\sup_{x\in C}V_l(x) + \frac{m b_{l-1}}{\delta}$.

A4: $\pi(V_q^\beta) < \infty$ for some $\beta\in(0,1]$.
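Condition (7) is the part that is typically verified by hand in applications. As a sanity check, the following sketch (ours; the kernel family, test functions, and constants are hypothetical, and the state space is truncated to a finite grid) verifies a simultaneous drift inequality of the form $P_\gamma V_1 \le V_1 - V_0 + b I_C$ numerically for a small family of kernels.

```python
import numpy as np

def drift_gap(P, V1, V0, C_mask, b):
    """Largest violation of P_gamma V1 <= V1 - V0 + b*1_C over the grid;
    a nonpositive value means the drift inequality (7) holds for P."""
    return np.max(P @ V1 - (V1 - V0 + b * C_mask))

# Hypothetical family: lazy reflected random walks on {0,...,N} with hold
# probability gamma and a downward bias; V1 quadratic, V0 linear, C near 0.
N = 300
xs = np.arange(N + 1)
V1 = (xs + 1.0) ** 2
V0 = 0.2 * (xs + 1.0)
C_mask = (xs <= 5).astype(float)

def kernel(gamma):
    P = np.zeros((N + 1, N + 1))
    for x in xs:
        P[x, x] = gamma
        P[x, max(x - 1, 0)] += (1 - gamma) * 0.7  # biased toward the small set
        P[x, min(x + 1, N)] += (1 - gamma) * 0.3
    return P

b = 2.0
# Nonpositive output => the same V0, V1, C, b work for every gamma tested.
print(max(drift_gap(kernel(g), V1, V0, C_mask, b) for g in [0.1, 0.3, 0.5]))
```

The point of the "simultaneous" requirement is visible here: one set of ingredients $(V_0, V_1, C, b)$ must serve the entire family at once. Note the sketch checks only the drift inequality (7), not the full S.P.E. package (A1)-(A4).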

5.2 Main Result

Theorem 6. Suppose an adaptive MCMC algorithm satisfies Diminishing Adaptation. Then the algorithm is ergodic in any of the following cases:

(i) S.P.E. holds, and the number $q$ of simultaneous drift conditions is strictly greater than two;

(ii) S.P.E. holds, the number $q$ of simultaneous drift conditions is greater than or equal to two, and there exists an increasing function $f : \mathbb{R}^+\to\mathbb{R}^+$ such that $V_1(x) \le f(V_0(x))$;

(iii) conditions (A1) and (A2) hold, and there exist some positive constants $c > 0$, $b > b' > 0$, $\alpha\in(0,1)$, and a measurable function $V(x) : \mathcal{X}\to\mathbb{R}^+$ with $V(x)\ge 1$ and $\sup_{x\in C}V(x) < \infty$ such that

$$P_\gamma V(x) \le V(x) - cV^\alpha(x) + b\,I_C(x) \quad \text{for } \gamma\in\mathcal{Y}, \quad (10)$$

and $cV^\alpha(x) \ge b'$ for $x\in C^c$;

(iv) conditions (A1), (A2), and (A4) hold, and there exist some constants $b > b' > 0$ and two measurable functions $V_0 : \mathcal{X}\to\mathbb{R}^+$ and $V_1 : \mathcal{X}\to\mathbb{R}^+$ with $1 \le V_0(x) \le V_1(x)$ and $\sup_{x\in C}V_1(x) < \infty$ such that

$$P_\gamma V_1(x) \le V_1(x) - V_0(x) + b\,I_C(x) \quad \text{for } \gamma\in\mathcal{Y}, \quad (11)$$

$V_0(x) \ge b'$ for $x\in C^c$, and the process $\{V_1(X_n) : n \ge 0\}$ is bounded in probability.

The theorem consists of Theorem 7, Theorem 8, Theorem 9, and Lemma 1. Theorem 9 shows that $\{V(X_n) : n \ge 0\}$ in case (iii) is bounded in probability. Case (iii) is a special case of S.P.E. with $q = 1$, so the uniform quantitative bound of $\|P_\gamma^n(x,\cdot) - \pi(\cdot)\|_{TV}$ over $\gamma\in\mathcal{Y}$ exists.

Remark 12. For part (iii), (A4) is implied, with $\beta = \alpha$, by the drift condition (10) and the corresponding moment bound in MT.

5.3 Proof of Theorem 6

Lemma 1. Suppose that the family $\{P_\gamma : \gamma\in\mathcal{Y}\}$ is S.P.E. If the stochastic process $\{V_l(X_n) : n \ge 0\}$ is bounded in probability for some $l\in\{1,\ldots,q\}$, then Containment is satisfied.

Proof: We use the notation of Theorem 5. From S.P.E., for $\gamma\in\mathcal{Y}$, let $\rho_{x,x'}(dy) = \delta\nu_\gamma(dy)$ (so $\rho_{x,x'}(\mathcal{X}) = \delta$) and $\Delta := C\times C$. Then $\epsilon_+ = \epsilon_- = \delta$. Note that the matrix $I - A_m^{(\beta)}$ is lower triangular. Denote $(I - A_m^{(\beta)})^{-1} := \big(b_{ij}^{(\beta)}\big)_{i,j=0,\ldots,q}$. By the definition of $B_l^{(\beta)}(x,n)$,

$$B_l^{(\beta)}(x,n) = \frac{\sum_{k=0}^{l} b_{lk}^{(\beta)}\int\pi(dy)\,W_k^\beta(x,y)}{\epsilon_+\Big(S(l,n+1-m)^\beta + \sum_{j\le n+1-m}(1-\epsilon_+)^{j\wedge(n-m)}\big(S(l,j+1)^\beta - S(l,j)^\beta\big)\Big)} \le \frac{1}{\epsilon_+\,S(l,n+1-m)^\beta}\sum_{k=0}^{l} b_{lk}^{(\beta)}\int\pi(dy)\,W_k^\beta(x,y).$$

By some algebra, for $k = 1,\ldots,q$,

$$\int\pi(dy)\,W_k^\beta(x,y) \le 1 + \Big(m(V_0)\prod_{i=0}^{k-1}a_i\Big)^{-\beta}\big[V_k^\beta(x) + \pi(V_k^\beta)\big] \quad (12)$$

because $\beta\in(0,1]$. In addition, $m(V_0) \ge c_0$, so the coefficient of the second term on the right-hand side is finite. By induction, we obtain that $b_{10}^{(\beta)} = A_m^{(\beta)}(1)\big(1 - A_m^{(\beta)}(0)\big)^{-2}$ and $b_{11}^{(\beta)} = \big(1 - A_m^{(\beta)}(0)\big)^{-1}$. It is easy to check that $0 < b_{11}^{(\beta)} \le 1/\delta$. By some algebra,

$$A_m^{(\beta)}(1) \le m^\beta + m^\beta\sup_{(x,x')\in C\times C}\big(a_0\,m(V_0)\big)^{-\beta}\big(P_\gamma^m V_1^\beta(x) + P_\gamma^m V_1^\beta(x')\big) \le m^\beta + 2m^\beta\big(a_0\,m(V_0)\big)^{-\beta}\Big(\sup_{x\in C}V_1(x) + mb_0\Big)$$

because $P_\gamma^m V_1^\beta(x) \le P_\gamma^m V_1(x) \le V_1(x) + mb_0$. Therefore, $b_{10}^{(\beta)}$ is bounded from above by some value independent of $\gamma$. Thus,

$$B_1^{(\beta)}(x,n) \le \frac{1}{\delta\,S(1,n+1-m)^\beta}\Big(b_{10}^{(\beta)}\int\pi(dy)\,W_0^\beta(x,y) + b_{11}^{(\beta)}\int\pi(dy)\,W_1^\beta(x,y)\Big) \le \frac{1}{\delta\,(n+1-m)^\beta}\Big(b_{10}^{(\beta)} + b_{11}^{(\beta)}\big[1 + (a_0\,m(V_0))^{-\beta}\big(V_1^\beta(x) + \pi(V_1^\beta)\big)\big]\Big).$$

Therefore, the boundedness in probability of the process $\{V_1(X_k) : k \ge 0\}$ implies that the random sequence $B_1^{(\beta)}(X_n, n)$ converges to zero in probability, uniformly in $\gamma$. Containment holds.

Let $\{Z_j : j \ge 0\}$ be an adaptive sequence of positive random variables; for each $j$, $Z_j$ denotes a fixed Borel measurable function of $X_j$. Let $\tau_n$ denote any stopping time starting from time $n$ of the process $\{X_i : i \ge 0\}$, i.e. $[\tau_n = i]\in\sigma(X_k : k = 1,\ldots,n+i)$ and $\mathbb{P}(\tau_n < \infty) = 1$.

Lemma 2 (Dynkin's Formula for adaptive MCMC). For $m > 0$ and $n > 0$,

$$\mathbb{E}[Z_{\tau_{m,n}} \mid X_m, \Gamma_m] = Z_m + \mathbb{E}\Big[\sum_{i=1}^{\tau_{m,n}}\big(\mathbb{E}[Z_{m+i}\mid\mathcal{F}_{m+i-1}] - Z_{m+i-1}\big)\,\Big|\,X_m,\Gamma_m\Big],$$

where $\tau_{m,n} := \min\big(n,\ \tau_m,\ \inf\{k\ge 0 : Z_{m+k}\ge n\}\big)$.

Proof:

$$Z_{\tau_{m,n}} = Z_m + \sum_{i=1}^{\tau_{m,n}}(Z_{m+i} - Z_{m+i-1}) = Z_m + \sum_{i=1}^{n} I(\tau_{m,n}\ge i)(Z_{m+i} - Z_{m+i-1}).$$

Since $[\tau_{m,n}\ge i]$ is measurable with respect to $\mathcal{F}_{m+i-1}$,

$$\mathbb{E}[Z_{\tau_{m,n}}\mid X_m,\Gamma_m] = Z_m + \mathbb{E}\Big[\sum_{i=1}^{n}\mathbb{E}[Z_{m+i} - Z_{m+i-1}\mid\mathcal{F}_{m+i-1}]\,I(\tau_{m,n}\ge i)\,\Big|\,X_m,\Gamma_m\Big] = Z_m + \mathbb{E}\Big[\sum_{i=1}^{\tau_{m,n}}\big(\mathbb{E}[Z_{m+i}\mid\mathcal{F}_{m+i-1}] - Z_{m+i-1}\big)\,\Big|\,X_m,\Gamma_m\Big].$$
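Lemma 2 can be sanity-checked by simulation: with $Z_j = V(X_j)$, the increment $\mathbb{E}[Z_{m+i}\mid\mathcal{F}_{m+i-1}]$ is just $(P_{\Gamma_{m+i-1}}V)(X_{m+i-1})$, which is available in closed form for a toy kernel. The sketch below (ours; the kernel, test function, adaptation rule, and the bounded stopping time are all hypothetical choices) compares the two sides of the formula by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(2)
V = lambda x: (x + 1.0) ** 2

def step(x, gamma):
    """Toy kernel P_gamma on {0,1,...}: hold w.p. gamma, else down 0.7/up 0.3."""
    if rng.uniform() < gamma:
        return x
    return max(x - 1, 0) if rng.uniform() < 0.7 else x + 1

def PV(x, gamma):
    """(P_gamma V)(x), computed exactly for the toy kernel."""
    return gamma * V(x) + (1 - gamma) * (0.7 * V(max(x - 1, 0)) + 0.3 * V(x + 1))

def dynkin_check(x0=20, n_reps=20_000, horizon=30):
    """Monte Carlo check of Lemma 2 with Z_j = V(X_j) and the bounded stopping
    time tau = min(horizon, first hitting time of C = {x <= 5})."""
    lhs = rhs = 0.0
    for _ in range(n_reps):
        x, inc = x0, 0.0
        for _ in range(horizon):
            gamma = 0.1 + 0.4 * rng.uniform()   # arbitrary adaptation rule
            inc += PV(x, gamma) - V(x)          # E[Z_i | F_{i-1}] - Z_{i-1}
            x = step(x, gamma)
            if x <= 5:
                break
        lhs += V(x)                             # Z_tau
        rhs += V(x0) + inc
    print(lhs / n_reps, rhs / n_reps)           # the two sides should agree

dynkin_check()
```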

Lemma 3 (Comparison Lemma for adaptive MCMC). Suppose that there exist two sequences of positive functions $\{s_j, f_j : j \ge 0\}$ on $\mathcal{X}$ such that

$$\mathbb{E}[Z_{j+1}\mid\mathcal{F}_j] \le Z_j - f_j(X_j) + s_j(X_j). \quad (13)$$

Then for a stopping time $\tau_n$ starting from time $n$ of the adaptive MCMC $\{X_i : i \ge 0\}$,

$$\mathbb{E}\Big[\sum_{j=0}^{\tau_n-1} f_{n+j}(X_{n+j})\,\Big|\,X_n,\Gamma_n\Big] \le Z_n(X_n) + \mathbb{E}\Big[\sum_{j=0}^{\tau_n-1} s_{n+j}(X_{n+j})\,\Big|\,X_n,\Gamma_n\Big].$$

Proof: The result follows from Lemma 2 and Eq. (13).

The following proposition shows the relation between the moments of the hitting time and the test-function-modulated ($V$-modulated) moments for adaptive MCMC algorithms with S.P.E.; it is derived from the corresponding result for Markov chains in [10, Theorem 3.2]. Define the first return time and the $i$th return time to the set $C$ from time $n$, respectively, by

$$\tau_{n,C} := \tau_{n,C}(1) := \min\{k\ge 1 : X_{n+k}\in C\} \quad (14)$$

and

$$\tau_{n,C}(i) := \min\{k > \tau_{n,C}(i-1) : X_{n+k}\in C\} \quad \text{for } n\ge 0 \text{ and } i > 1. \quad (15)$$

Proposition 1. Consider an adaptive MCMC $\{X_i : i\ge 0\}$ with the adaptive parameter process $\{\Gamma_i : i\ge 0\}$. If the family $\{P_\gamma : \gamma\in\mathcal{Y}\}$ is S.P.E., then there exist some constants $\{d_i : i = 0,\ldots,q-1\}$ such that at time $n$, for $k = 1,\ldots,q$,

$$\frac{c_{q-k}}{k}\,\mathbb{E}\big[\tau_{n,C}^k \mid X_n,\Gamma_n\big] \le \mathbb{E}\Big[\sum_{i=0}^{\tau_{n,C}-1}(i+1)^{k-1}\,V_{q-k}(X_{n+i})\,\Big|\,X_n,\Gamma_n\Big] \le d_{q-k}\Big(V_q(X_n) + \sum_{i=1}^{k} b_{q-i}\,I_C(X_n)\Big),$$

where the test functions $\{V_i(\cdot) : i = 0,\ldots,q\}$, the set $C$, $\{c_i : i = 0,\ldots,q-1\}$, and $\{b_i : i = 0,\ldots,q-1\}$ are as defined in the S.P.E. conditions.

Proof: Since $V_{q-k}(x) \ge c_{q-k}$ on $\mathcal{X}$ and $\sum_{i=0}^{\tau_{n,C}-1}(i+1)^{k-1} \ge \int_0^{\tau_{n,C}} x^{k-1}\,dx = \tau_{n,C}^k/k$,

$$\mathbb{E}\Big[\sum_{i=0}^{\tau_{n,C}-1}(i+1)^{k-1}\,V_{q-k}(X_{n+i})\,\Big|\,X_n,\Gamma_n\Big] \ge \frac{c_{q-k}}{k}\,\mathbb{E}\big[\tau_{n,C}^k \mid X_n,\Gamma_n\big]. \quad (16)$$

So the first inequality holds. Consider $k = 1$. By S.P.E. and Lemma 3,

$$\mathbb{E}\Big[\sum_{i=0}^{\tau_{n,C}-1} V_{q-1}(X_{n+i})\,\Big|\,X_n,\Gamma_n\Big] \le V_q(X_n) + b_{q-1}\,I_C(X_n). \quad (17)$$

So the case $k = 1$ of the second inequality of the result holds. For $i\ge 0$, by S.P.E.,

$$\mathbb{E}\big[(i+1)^{k-1}V_{q-k+1}(X_{n+i+1})\mid X_{n+i},\Gamma_{n+i}\big] - i^{k-1}V_{q-k+1}(X_{n+i}) \le (i+1)^{k-1}\big(V_{q-k+1}(X_{n+i}) - V_{q-k}(X_{n+i}) + b_{q-k}I_C(X_{n+i})\big) - i^{k-1}V_{q-k+1}(X_{n+i}) \le -(i+1)^{k-1}V_{q-k}(X_{n+i}) + d\,i^{k-2}V_{q-k+1}(X_{n+i}) + (i+1)^{k-1}b_{q-k}I_C(X_{n+i})$$

for some positive $d$ independent of $i$. By Lemma 3,

$$\mathbb{E}\Big[\sum_{i=0}^{\tau_{n,C}-1}(i+1)^{k-1}V_{q-k}(X_{n+i})\,\Big|\,X_n,\Gamma_n\Big] \le d\,\mathbb{E}\Big[\sum_{i=0}^{\tau_{n,C}-1} i^{(k-1)-1}V_{q-(k-1)}(X_{n+i})\,\Big|\,X_n,\Gamma_n\Big] + b_{q-k}I_C(X_n). \quad (18)$$

From the above equation, by induction, the second inequality of the result holds.

Theorem 7. Suppose that the family $\{P_\gamma : \gamma\in\mathcal{Y}\}$ is S.P.E. with $q > 2$. Then Containment holds.

Proof: For $k = 1,\ldots,q$, take $M > 0$ large enough that $C\subset\{x : V_{q-k}(x)\le M\}$. Then

$$\mathbb{P}_{(x_0,\gamma_0)}(V_{q-k}(X_n) > M) = \sum_{i=0}^{n}\mathbb{P}_{(x_0,\gamma_0)}(V_{q-k}(X_n) > M,\ \tau_{i,C} > n-i,\ X_i\in C) + \mathbb{P}_{(x_0,\gamma_0)}(V_{q-k}(X_n) > M,\ \tau_{0,C} > n,\ X_0\notin C).$$

By Proposition 1 and Markov's inequality, for $i = 0,\ldots,n$,

$$\mathbb{P}_{(x_0,\gamma_0)}(V_{q-k}(X_n) > M,\ \tau_{i,C} > n-i \mid X_i\in C) \le \mathbb{P}_{(x_0,\gamma_0)}\Big(\sum_{j=0}^{\tau_{i,C}-1}(j+1)^{k-1}V_{q-k}(X_{i+j}) > (n-i)^{k-1}M + c_{q-k}\sum_{j=0}^{n-i-1}(j+1)^{k-1},\ \tau_{i,C} > n-i \,\Big|\, X_i\in C\Big) \le \frac{\sup_{x\in C}\mathbb{E}_{(x_0,\gamma_0)}\Big[\mathbb{E}_{(x_0,\gamma_0)}\Big[\sum_{j=0}^{\tau_{i,C}-1}(j+1)^{k-1}V_{q-k}(X_{i+j})\,\Big|\,X_i,\Gamma_i\Big]\,\Big|\,X_i = x\Big]}{(n-i)^{k-1}M + c_{q-k}\sum_{j=0}^{n-i-1}(j+1)^{k-1}} \le \frac{d_{q-k}\big(\sup_{x\in C}V_q(x) + \sum_{j=1}^{k}b_{q-j}\big)}{(n-i)^{k-1}M + c_{q-k}\sum_{j=0}^{n-i-1}(j+1)^{k-1}},$$

and

$$\mathbb{P}_{(x_0,\gamma_0)}(V_{q-k}(X_n) > M,\ \tau_{0,C} > n \mid X_0\notin C) \le \frac{d_{q-k}\big(V_q(x_0) + \sum_{j=1}^{k}b_{q-j}I_C(x_0)\big)}{n^{k-1}M + c_{q-k}\sum_{j=0}^{n-1}(j+1)^{k-1}}.$$

By simple algebra,

$$(n-i)^{k-1}M + c_{q-k}\sum_{j=0}^{n-i-1}(j+1)^{k-1} = O\big((n-i)^{k-1}(M + c_{q-k}(n-i))\big).$$

Therefore,

$$\mathbb{P}_{(x_0,\gamma_0)}(V_{q-k}(X_n) > M) \le d_{q-k}\Big(\sup_{x\in C\cup\{x_0\}}V_q(x) + \sum_{j=1}^{k}b_{q-j}\Big)\Big(\sum_{i=0}^{n}\frac{\mathbb{P}_{(x_0,\gamma_0)}(X_i\in C)}{(n-i)^{k-1}(M + c_{q-k}(n-i))} + \frac{\delta_{C^c}(x_0)}{n^{k-1}(M + c_{q-k}\,n)}\Big). \quad (19)$$

Whenever $q\ge 2$, $k$ can be chosen equal to $2$, and for $k\ge 2$ the summation on the right-hand side of Eq. (19) is finite for given $M$. But if $q = 2$ then only the process $\{V_0(X_n) : n\ge 0\}$ is shown to be bounded in probability, so $q > 2$ is required for the result. Hence, taking $M > 0$ large enough, the probability can be made arbitrarily small: the sequence $\{V_{q-2}(X_n) : n\ge 0\}$ is bounded in probability. By Lemma 1, Containment holds.

Remark 13. In the proof, only (A3) is used.

Remark 14. If $V_1$ is dominated by a nice (increasing) function of $V_0$, then the sequence $\{V_1(X_n) : n\ge 0\}$ is bounded in probability. In Theorem 9, we discuss this situation for a certain simultaneous single polynomial drift condition.

Theorem 8. Suppose that $\{P_\gamma : \gamma\in\mathcal{Y}\}$ is S.P.E. with $q = 2$. Suppose that there exists a strictly increasing function $f : \mathbb{R}^+\to\mathbb{R}^+$ such that $V_1(x) \le f(V_0(x))$ for all $x\in\mathcal{X}$. Then Containment is implied.

Proof: From Eq. (19), we have that $\{V_0(X_n) : n\ge 0\}$ is bounded in probability. Since $V_1(x) \le f(V_0(x))$,

$$\mathbb{P}_{(x_0,\gamma_0)}(V_1(X_n) > f(M)) \le \mathbb{P}_{(x_0,\gamma_0)}(f(V_0(X_n)) > f(M)) = \mathbb{P}_{(x_0,\gamma_0)}(V_0(X_n) > M),$$

because $f(\cdot)$ is strictly increasing. By the boundedness of $V_0(X_n)$, for any $\epsilon > 0$ there exist $N > 0$ and some $M > 0$ such that for $n > N$, $\mathbb{P}_{(x_0,\gamma_0)}(V_1(X_n) > f(M)) \le \epsilon$. Therefore $\{V_1(X_n) : n\ge 0\}$ is bounded in probability. By Lemma 1, Containment is satisfied.

Consider the single polynomial drift condition (see [10]): $P_\gamma V(x) \le V(x) - cV^\alpha(x) + b\,I_C(x)$ where $0\le\alpha<1$. The moments of the hitting time to the set $C$ then satisfy (see the details in [10]), for any $1\le\xi\le 1/(1-\alpha)$,

$$\mathbb{E}_x\Big[\sum_{i=0}^{\tau_C-1}(i+1)^{\xi-1}\,V^{1-\xi(1-\alpha)}(X_i)\Big] \le V(x) + b\,I_C(x),$$

so the polynomial rate function is $r(n) = n^{\xi-1}$. If $\alpha = 0$, then $r(n)$ is a constant; in this situation it is difficult to utilize the technique of Theorem 7 to prove that $\{V(X_n) : n\ge 0\}$ is bounded in probability. Thus, we assume $\alpha\in(0,1)$.
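For instance, the toy kernel family from the earlier drift-check sketch also satisfies a single polynomial drift condition of this form. The check below (ours; the constants $c$, $b$, $\alpha$, the set $C$, and the truncation are hypothetical) numerically verifies $P_\gamma V \le V - cV^\alpha + b\,I_C$ with $V(x) = (x+1)^2$ and $\alpha = 1/2$.

```python
import numpy as np

# Same reflected lazy random walks on {0,...,N} as before (hold prob gamma,
# down 0.7 / up 0.3 among moves); hypothetical polynomial drift ingredients.
N = 500
xs = np.arange(N + 1)
V = (xs + 1.0) ** 2
alpha, c, b = 0.5, 0.2, 2.0
C_mask = (xs <= 5).astype(float)

def kernel(gamma):
    P = np.zeros((N + 1, N + 1))
    for x in xs:
        P[x, x] = gamma
        P[x, max(x - 1, 0)] += (1 - gamma) * 0.7
        P[x, min(x + 1, N)] += (1 - gamma) * 0.3
    return P

# Condition (10): P_gamma V <= V - c V^alpha + b 1_C, checked off the upper
# boundary (the truncation at N distorts the last row of the kernel).
for g in [0.1, 0.3, 0.5]:
    gap = kernel(g) @ V - (V - c * V**alpha + b * C_mask)
    print(g, gap[:-1].max() <= 0)
```

Here $cV^\alpha(x) = 0.2(x+1)$ grows without bound off $C$, so the side condition of Theorem 9 below ($cV^\alpha \ge b'$ on $C^c$ for some $0 < b' < b$) is also satisfied.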

Proposition 2. Consider an adaptive MCMC $\{X_n : n\ge 0\}$ with an adaptive scheme $\{\Gamma_n : n\ge 0\}$. Suppose that (A1) holds, and that there exist some positive constants $c > 0$, $b > 0$, $\alpha\in(0,1)$, and a measurable function $V(x) : \mathcal{X}\to\mathbb{R}^+$ with $V(x)\ge 1$ and $\sup_{x\in C}V(x) < \infty$ such that

$$P_\gamma V(x) \le V(x) - cV^\alpha(x) + b\,I_C(x) \quad \text{for } \gamma\in\mathcal{Y}. \quad (20)$$

Then for $1\le\xi\le 1/(1-\alpha)$,

$$\mathbb{E}_{(x_0,\gamma_0)}\Big[\sum_{i=0}^{\tau_{n,C}-1}(i+1)^{\xi-1}\,V^{1-\xi(1-\alpha)}(X_{n+i})\,\Big|\,X_n,\Gamma_n\Big] \le c_\xi(C)\,\big(V(X_n) + 1\big). \quad (21)$$

Proof: The proof applies the techniques of Lemma 3.5 and Theorem 3.6 of [10].

Theorem 9. Suppose that (A2) and the conditions in Proposition 2 are satisfied, and that there exists some constant $b'$ with $b > b' > 0$ such that $cV^\alpha(x) \ge b'$ for $x\in C^c$. Then Containment is implied.

Proof: Using the same techniques as in Theorem 7, we have that

$$\mathbb{P}_{(x_0,\gamma_0)}\big(V^{1-\xi(1-\alpha)}(X_n) > M\big) \le c_\xi\Big(\sup_{x\in C\cup\{x_0\}}V(x) + 1\Big)\Big(\sum_{i=0}^{n}\frac{\mathbb{P}_{(x_0,\gamma_0)}(X_i\in C)}{(n-i)^{\xi-1}(M+n-i)} + \frac{\delta_{C^c}(x_0)}{n^{\xi-1}(M+n)}\Big). \quad (22)$$

Therefore, for $\xi\in[1, 1/(1-\alpha))$, the sequence $\{V^{1-\xi(1-\alpha)}(X_n) : n\ge 0\}$ is bounded in probability. Since $1-\xi(1-\alpha) > 0$, the process $\{V(X_n) : n\ge 0\}$ is bounded in probability. By Lemma 1, Containment holds.

Acknowledgements

Special thanks are due to Professor Jeffrey S. Rosenthal; without his assistance this work would not have been completed. The author is also very grateful to G. Fort for her helpful suggestions.

References

[1] C. Andrieu and E. Moulines. On the ergodicity properties of some adaptive Markov chain Monte Carlo algorithms. Ann. Appl. Probab., 16(3):1462-1505, 2006.

[2] C. Andrieu and C.P. Robert. Controlled MCMC for optimal sampling. Preprint, 2001.

[3] Y.F. Atchadé and G. Fort. Limit theorems for some adaptive MCMC algorithms with subgeometric kernels. Preprint, 2008.

[4] Y.F. Atchadé and J.S. Rosenthal. On adaptive Markov chain Monte Carlo algorithms. Bernoulli, 11(5):815-828, 2005.

[5] Y. Bai, G.O. Roberts, and J.S. Rosenthal. On the containment condition for adaptive Markov chain Monte Carlo algorithms. Preprint, 2008.

[6] A.E. Brockwell and J.B. Kadane. Identification of regeneration times in MCMC simulation, with application to adaptive schemes. J. Comp. Graph. Stat., 14:436-458, 2005.

[7] G. Fort and E. Moulines. Computable bounds for subgeometrical and geometrical ergodicity. Preprint, 2000.

[8] W.R. Gilks, G.O. Roberts, and S.K. Sahu. Adaptive Markov chain Monte Carlo through regeneration. J. Amer. Statist. Assoc., 93:1045-1054, 1998.

[9] H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7:223-242, 2001.

[10] S.F. Jarner and G.O. Roberts. Polynomial convergence rates of Markov chains. Ann. Appl. Probab., 12(1):224-247, 2002.

[11] S.P. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, London, 1993.

[12] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Statist., 22:400-407, 1951.

[13] G.O. Roberts and J.S. Rosenthal. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Prob., 44:458-475, 2007.

[14] C. Yang. Recurrent and ergodic properties of adaptive MCMC. Preprint, 2008.


More information

ON THE STRONG LIMIT THEOREMS FOR DOUBLE ARRAYS OF BLOCKWISE M-DEPENDENT RANDOM VARIABLES

ON THE STRONG LIMIT THEOREMS FOR DOUBLE ARRAYS OF BLOCKWISE M-DEPENDENT RANDOM VARIABLES Available at: http://publications.ictp.it IC/28/89 United Nations Educational, Scientific Cultural Organization International Atomic Energy Agency THE ABDUS SALAM INTERNATIONAL CENTRE FOR THEORETICAL PHYSICS

More information

Lecture 5 - Hausdorff and Gromov-Hausdorff Distance

Lecture 5 - Hausdorff and Gromov-Hausdorff Distance Lecture 5 - Hausdorff and Gromov-Hausdorff Distance August 1, 2011 1 Definition and Basic Properties Given a metric space X, the set of closed sets of X supports a metric, the Hausdorff metric. If A is

More information

Value Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes

Value Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes Value Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes RAÚL MONTES-DE-OCA Departamento de Matemáticas Universidad Autónoma Metropolitana-Iztapalapa San Rafael

More information

TUNING OF MARKOV CHAIN MONTE CARLO ALGORITHMS USING COPULAS

TUNING OF MARKOV CHAIN MONTE CARLO ALGORITHMS USING COPULAS U.P.B. Sci. Bull., Series A, Vol. 73, Iss. 1, 2011 ISSN 1223-7027 TUNING OF MARKOV CHAIN MONTE CARLO ALGORITHMS USING COPULAS Radu V. Craiu 1 Algoritmii de tipul Metropolis-Hastings se constituie într-una

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

MET Workshop: Exercises

MET Workshop: Exercises MET Workshop: Exercises Alex Blumenthal and Anthony Quas May 7, 206 Notation. R d is endowed with the standard inner product (, ) and Euclidean norm. M d d (R) denotes the space of n n real matrices. When

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

Walsh Diffusions. Andrey Sarantsev. March 27, University of California, Santa Barbara. Andrey Sarantsev University of Washington, Seattle 1 / 1

Walsh Diffusions. Andrey Sarantsev. March 27, University of California, Santa Barbara. Andrey Sarantsev University of Washington, Seattle 1 / 1 Walsh Diffusions Andrey Sarantsev University of California, Santa Barbara March 27, 2017 Andrey Sarantsev University of Washington, Seattle 1 / 1 Walsh Brownian Motion on R d Spinning measure µ: probability

More information

REVERSALS ON SFT S. 1. Introduction and preliminaries

REVERSALS ON SFT S. 1. Introduction and preliminaries Trends in Mathematics Information Center for Mathematical Sciences Volume 7, Number 2, December, 2004, Pages 119 125 REVERSALS ON SFT S JUNGSEOB LEE Abstract. Reversals of topological dynamical systems

More information

Lecture Notes Introduction to Ergodic Theory

Lecture Notes Introduction to Ergodic Theory Lecture Notes Introduction to Ergodic Theory Tiago Pereira Department of Mathematics Imperial College London Our course consists of five introductory lectures on probabilistic aspects of dynamical systems,

More information

Formal power series rings, inverse limits, and I-adic completions of rings

Formal power series rings, inverse limits, and I-adic completions of rings Formal power series rings, inverse limits, and I-adic completions of rings Formal semigroup rings and formal power series rings We next want to explore the notion of a (formal) power series ring in finitely

More information

1 Kernels Definitions Operations Finite State Space Regular Conditional Probabilities... 4

1 Kernels Definitions Operations Finite State Space Regular Conditional Probabilities... 4 Stat 8501 Lecture Notes Markov Chains Charles J. Geyer April 23, 2014 Contents 1 Kernels 2 1.1 Definitions............................. 2 1.2 Operations............................ 3 1.3 Finite State Space........................

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

25.1 Markov Chain Monte Carlo (MCMC)

25.1 Markov Chain Monte Carlo (MCMC) CS880: Approximations Algorithms Scribe: Dave Andrzejewski Lecturer: Shuchi Chawla Topic: Approx counting/sampling, MCMC methods Date: 4/4/07 The previous lecture showed that, for self-reducible problems,

More information

Analysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*

Analysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal* Analysis of the Gibbs sampler for a model related to James-Stein estimators by Jeffrey S. Rosenthal* Department of Statistics University of Toronto Toronto, Ontario Canada M5S 1A1 Phone: 416 978-4594.

More information

Lect4: Exact Sampling Techniques and MCMC Convergence Analysis

Lect4: Exact Sampling Techniques and MCMC Convergence Analysis Lect4: Exact Sampling Techniques and MCMC Convergence Analysis. Exact sampling. Convergence analysis of MCMC. First-hit time analysis for MCMC--ways to analyze the proposals. Outline of the Module Definitions

More information