Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms
Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms

Yan Bai

Feb 2009; Revised Nov 2009

Abstract

In this paper we mainly study the ergodicity of adaptive MCMC algorithms. Assume that, under some regularity conditions on the target distribution, all the MCMC samplers in {P_γ : γ ∈ Y} simultaneously satisfy a group of drift conditions and share a uniform small set C in the sense of the m-step transition, so that each MCMC sampler converges to the target at a polynomial rate. We then say that the family {P_γ : γ ∈ Y} is simultaneously polynomially ergodic. Suppose that Diminishing Adaptation and simultaneous polynomial ergodicity hold. We find that the adaptive MCMC algorithm is ergodic either when the number of drift conditions is greater than or equal to two, or when the single drift condition has a specific form. We also discuss some recent results related to this topic, and show that under an additional condition, Containment is necessary for ergodicity of adaptive MCMC.

Keywords: adaptive MCMC, ergodicity, simultaneous drift condition, simultaneous polynomial ergodicity

1 Introduction

Markov chain Monte Carlo (MCMC) algorithms are widely used for approximately sampling from complicated probability distributions. The method has its limitations, however: it is often necessary to tune the scaling and other parameters before an algorithm converges efficiently. Adaptive MCMC methods using regeneration times and other elaborate constructions have been proposed in [8, 6, and elsewhere]. More recently, adaptive MCMC methods have been used to improve convergence via a self-tuning procedure. In this direction, a key step was made by Haario et al. [9], who proposed an adaptive Metropolis algorithm attempting to optimize the proposal distribution, and proved that a particular version of this algorithm correctly converges weakly to the target distribution.
Andrieu and Robert [2] observed that the algorithm of [9] can be viewed as a version of the Robbins-Monro stochastic control algorithm [see 12]. The results of [9] were then generalized by [4, 1], proving convergence of more general adaptive MCMC algorithms. Roberts and Rosenthal [13] used a coupling method to show that ergodicity of an adaptive MCMC algorithm is implied by Containment and Diminishing Adaptation, under the basic

Department of Statistics, University of Toronto, Toronto, ON M5S 3G3, CA. yanbai@utstat.toronto.edu
assumptions: each transition kernel P_γ, defined on the state space X and indexed by the adaptive parameter space Y (γ ∈ Y), is φ-irreducible and aperiodic, and admits the nontrivial finite invariant probability measure π as its unique stationary distribution. Diminishing Adaptation is relatively easy to check, since the adaptive strategy is usually designed by the user. Containment, however, is rather abstract and hard to verify. In their paper, they prove that simultaneous strongly aperiodic geometric ergodicity ensures Containment. Yang [14] and Atchadé and Fort [3] respectively tackle Open Problem 21 in [13]; both results are very interesting. [14] assumed that all the transition kernels simultaneously satisfy the drift condition P_γV ≤ V − 1 + b·I_C and that the adaptive parameter space is compact under some metric, and connected this with the regeneration decomposition to find a uniform bound of the distance ‖P_γ^n(x,·) − π(·)‖_TV over all γ ∈ Y. Given these conditions, ‖P_γ^n(x,·) − π(·)‖_TV can be uniformly bounded in terms of the test function, and boundedness of the sequence {V(X_n) : n ≥ 0} in probability then ensures Containment, which is otherwise quite hard to check directly. [3] used a coupling method similar to that of [13] to prove an attractive and stronger result when the adaptive MCMC algorithm is restricted to Markovian Adaptation. [3] also assumed uniform strong aperiodicity, the simultaneous drift condition in its weakest form P_γV ≤ V − 1 + b·I_C, and uniform convergence on every sublevel set of the test function V(x). The idea is that once the chain enters some large sublevel set of V(x), the coupling method for ergodicity applies.

In Section 2 we introduce notation and some conditions. In Section 3 we discuss the results in [14, 3]. In Section 4 we provide a necessary condition for ergodicity under an additional condition. In Section 5 we state our conditions and main result.
2 Terminology

Consider a target distribution π(·) defined on the state space X with respect to some σ-field B(X) (π(x) is also used for the density function). Let {P_γ : γ ∈ Y} be a family of transition kernels of time-homogeneous Markov chains, each with stationary distribution π, i.e. πP_γ = π for all γ ∈ Y. An adaptive MCMC algorithm Z := {(X_n, Γ_n) : n ≥ 0} is regarded as lying in the sample path space Ω := (X × Y)^∞ equipped with a σ-field F. For each initial state x ∈ X and initial parameter γ ∈ Y, there is a probability measure P_{(x,γ)} such that the probability of the event [Z ∈ A] is well defined for any set A ∈ F. There is a filtration G := {G_n : n ≥ 0} such that Z is adapted to G. Denote the corresponding expectation by E_{(x,γ)}[f(X_n)] for measurable f : X → R. A framework for adaptive MCMC is:

1. Given an initial state X_0 := x_0 ∈ X and a kernel P_{Γ_0} with Γ_0 := γ_0 ∈ Y, at each iteration n+1, X_{n+1} is generated from P_{Γ_n}(X_n, ·);
2. Γ_{n+1} is obtained from some function of X_0, …, X_{n+1} and Γ_0, …, Γ_n.

For A ∈ B(X),

  P_{(x_0,γ_0)}(X_{n+1} ∈ A | G_n) = P_{(x_0,γ_0)}(X_{n+1} ∈ A | X_n, Γ_n) = P_{Γ_n}(X_n, A).  (1)

We say that the adaptive MCMC Z is ergodic if for any initial state x_0 ∈ X and any kernel index γ_0 ∈ Y, ‖P_{(x_0,γ_0)}(X_n ∈ ·) − π(·)‖_TV converges to zero, where ‖µ‖_TV := sup_{A∈B(X)} |µ(A)|. Obviously the joint distribution of X_{n+1} and Γ_{n+1} conditional on G_n is

  P_{(x_0,γ_0)}(X_{n+1} ∈ dx, Γ_{n+1} ∈ dγ | G_n) = P_{Γ_n}(X_n, dx) P_{(x_0,γ_0)}(Γ_{n+1} ∈ dγ | X_{n+1} = x, G_n).
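The two-step framework above (sample X_{n+1} from P_{Γ_n}, then update Γ_{n+1} from the history) can be sketched concretely. The following code is an illustrative assumption, not taken from the paper: the kernel family P_γ is a random-walk Metropolis sampler with proposal scale γ, and the adaptation rule is a Robbins-Monro style update whose O(1/n) step size makes Diminishing Adaptation hold by construction.

```python
import numpy as np

def adaptive_rwm(log_target, x0, gamma0, n_iter, target_acc=0.44, seed=0):
    """Sketch of the adaptive MCMC framework: X_{n+1} ~ P_{Gamma_n}(X_n, .),
    then Gamma_{n+1} is a function of the history (hypothetical scheme)."""
    rng = np.random.default_rng(seed)
    x, gamma = float(x0), float(gamma0)
    xs, gammas = [x], [gamma]
    for n in range(n_iter):
        # One step of the kernel P_gamma: random-walk Metropolis, scale gamma.
        y = x + gamma * rng.standard_normal()
        accepted = np.log(rng.uniform()) < log_target(y) - log_target(x)
        if accepted:
            x = y
        # Robbins-Monro update of the proposal scale; the 1/(n+1) step size
        # forces sup_x ||P_{Gamma_{n+1}}(x,.) - P_{Gamma_n}(x,.)||_TV -> 0.
        gamma = gamma * np.exp((float(accepted) - target_acc) / (n + 1))
        xs.append(x)
        gammas.append(gamma)
    return np.array(xs), np.array(gammas)
```

For a standard normal target one can call `adaptive_rwm(lambda z: -0.5 * z * z, 0.0, 1.0, 20000)`; the adaptation amount |Γ_{n+1} − Γ_n| is of order Γ_n/n, which diminishes, while the target acceptance rate 0.44 is a conventional tuning choice for one-dimensional random-walk Metropolis.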
Furthermore, if Γ_{n+1} is simply a function of X_n and Γ_n, then the adaptive MCMC algorithm is said to use Markovian Adaptation, i.e. the joint process {(X_n, Γ_n) : n ≥ 0} is a time-inhomogeneous Markov chain.

Containment is defined as follows: for any X_0 = x_0 and Γ_0 = γ_0 and any ε > 0, the stochastic process {M_ε(X_n, Γ_n) : n ≥ 0} is bounded in probability under P_{(x_0,γ_0)}, i.e. for all δ > 0 there is N ∈ N such that P_{(x_0,γ_0)}(M_ε(X_n, Γ_n) ≤ N) ≥ 1 − δ for all n, where

  M_ε(x, γ) := inf{n ≥ 1 : ‖P_γ^n(x,·) − π(·)‖_TV ≤ ε}

is the ε-convergence function.

Diminishing Adaptation is defined as follows: for any X_0 = x_0 and Γ_0 = γ_0, lim_{n→∞} D_n = 0 in probability P_{(x_0,γ_0)}, where D_n := sup_{x∈X} ‖P_{Γ_{n+1}}(x,·) − P_{Γ_n}(x,·)‖_TV represents the amount of adaptation performed between iterations n and n+1.

For a non-negative function V, the process {V(X_n) : n ≥ 0} is bounded in probability if for any (x_0, γ_0) ∈ X × Y and δ > 0 there exist N > 0 and M > 0 such that for all n > N, P_{(x_0,γ_0)}(V(X_n) > M) < δ. Suppose that the parameter space Y is a metric space. We say the adaptive parameter process {Γ_n : n ≥ 0} is bounded in probability if for all x_0 ∈ X, γ_0 ∈ Y, and ε > 0 there exist N > 0 and some compact set B ⊆ Y such that for n > N, P_{(x_0,γ_0)}(Γ_n ∈ B^c) < ε.

3 Simultaneous Drift Conditions

Before studying the simultaneous drift conditions, we recall the result of [13]:

Theorem 1 ([13]). Ergodicity of an adaptive MCMC algorithm is implied by Containment and Diminishing Adaptation.

Remark 1. [13] gave one condition, simultaneous strongly aperiodic geometric ergodicity, which ensures Containment. In its definition the drift conditions have the same form, P_γV(x) ≤ λV(x) + b·I_C(x) for all γ ∈ Y. However, if {Γ_n : n ≥ 0} is bounded in probability, Containment is implied by drift conditions of the same form on each compact subset B of Y: P_γV_B(x) ≤ λ_B V_B(x) + b_B I_C(x). More generally, see the following corollary.

Corollary 1.
Suppose that (i) the parameter space Y is a metric space; (ii) the adaptive parameter process {Γ_n : n ≥ 0} is bounded in probability; (iii) for any compact set K ⊆ Y and any ε > 0, the local ε-convergence time process

  { M̃_ε(X_n) := inf{m ∈ N_+ : sup_{γ∈K} ‖P_γ^m(X_n,·) − π(·)‖_TV < ε} : n ≥ 0 }

is bounded in probability; (iv) Diminishing Adaptation holds. Then the adaptive MCMC algorithm {X_n : n ≥ 0} with adaptive parameter process {Γ_n : n ≥ 0} is ergodic.

The proof is straightforward and omitted. [14] gave the following conditions to tackle Open Problem 21 in [13]:

Y1: There exist a constant δ > 0, a set C ∈ B(X), and a probability measure ν_γ(·) for each γ ∈ Y, such that P_γ(x,·) ≥ δ I_C(x) ν_γ(·) for all γ ∈ Y;

Y2: all kernels uniformly satisfy the weakest drift condition P_γV ≤ V − 1 + b·I_C, where V : X → [1, ∞) and π(V) < ∞;
Y3: Y is compact under the metric d(γ_1, γ_2) = sup_{x∈X} ‖P_{γ_1}(x,·) − P_{γ_2}(x,·)‖_TV;

Y4: the stochastic process {V(X_n) : n ≥ 0} is bounded in probability.

Theorem 2 ([14]). Suppose Diminishing Adaptation holds. Then conditions (Y1)-(Y4) ensure the ergodicity of adaptive MCMC algorithms.

Remark 2. 1. In the proof of Theorem 2, (Y1) and (Y2) together ensure that each transition kernel is ergodic with limit π, while (Y3) and (Y4) imply that the total variation distance between P_γ^n and π converges to zero uniformly over Y.

2. The condition π(V) < ∞ is relatively strong. For each P_γ, suppose that {X_n^{(γ)} : n ≥ 0} is a time-homogeneous Markov chain with transition kernel P_γ. For any recurrent set A ⊆ X with π(A) > 0, by a proposition of Meyn and Tweedie [11] (MT),

  π(V) = ∫_A π(dy) E_γ[ ∑_{i=0}^{τ_A−1} V(X_i^{(γ)}) | X_0^{(γ)} = y ].

Assume there exists a small set C_1 ⊆ X with sup_{x∈C_1} E_γ[ ∑_{i=0}^{τ_{C_1}−1} V(X_i^{(γ)}) | X_0^{(γ)} = x ] < ∞, and denote U_γ(x) := E_γ[ ∑_{i=0}^{σ_{C_1}} V(X_i^{(γ)}) | X_0^{(γ)} = x ]. Then, by a theorem in MT, P_γU_γ ≤ U_γ − V(x) + b_1 I_{C_1}, where b_1 := sup_{x∈C_1} E_γ[ ∑_{i=0}^{τ_{C_1}−1} V(X_i^{(γ)}) | X_0^{(γ)} = x ]. Suppose that there is a test function V_1 satisfying P_γV_1 ≤ V_1 − V + b·I_{C_1} and dominating V on C_1. By a comparison proposition in MT, U_γ(x) ≤ V_1(x), so P_γU_γ ≤ U_γ − 1 + b_1 I_{C_1}.

We will instead study the simultaneous drift condition of the form P_γV_1 ≤ V_1 − V_0 + b·I_C, where the test functions V_0(x) and V_1(x) are the same for every P_γ. In this situation, condition (Y3) is unnecessary and condition (Y4) is implied (see Theorem 6, Remark 7).

[3] also gave the following conditions for the ergodicity of adaptive MCMC with Markovian Adaptation:

M1: there exist a probability measure ν(·), a constant δ > 0, and a set C ∈ B(X) such that P_γ(x,·) ≥ δ I_C(x) ν(·) for all γ ∈ Y;

M2: there exist a measurable function V : X → [1, ∞) and a positive constant b > 0 such that for any γ ∈ Y, (P_γV)(x) ≤ V(x) − 1 + b·I_C(x);

M3: for any sublevel set D_l = {x ∈ X : V(x) ≤ l} of V, lim_{n→∞} sup_{D_l×Y} ‖P_γ^n(x,·) − π(·)‖_TV = 0.

Theorem 3 ([3]).
Suppose Diminishing Adaptation holds. Then conditions (M1)-(M3) imply the ergodicity of the adaptive MCMC algorithm with Markovian Adaptation.

Remark 3. 1. Since

  P_{(x_0,γ_0)}(V(X_n) > M) ≤ π(D_M^c) + ‖P_{(x_0,γ_0)}(X_n ∈ ·) − π(·)‖_TV,

M can be taken large enough that π(D_M^c) < ε, and (M1)-(M3) together with Diminishing Adaptation imply that the second term on the right-hand side converges to zero. So {V(X_n) : n ≥ 0} is bounded in probability.

2. In Section 4 we show that under a certain condition, Containment is a necessary condition for ergodicity of adaptive MCMC provided that (M3) holds. Viewed this way, the proof in [3] does apply the coupling method to check Containment by using Diminishing Adaptation and the simultaneous drift conditions.
4 The necessary condition for ergodicity

Before showing our result, let us discuss one simple example which was studied in [5].

Example 1. Let the state space be X = {1, 2} and the transition kernel

  P_θ = [ 1−θ  θ ; θ  1−θ ].

Obviously, for each θ ∈ (0, 1), the stationary distribution is uniform on X. Consider a state-independent adaptation: for k = 1, 2, …, at each time n = 2k−1 choose the transition kernel index θ_{n−1} = 1/2, and at each time n = 2k choose the transition kernel index θ_{n−1} = 1/n.

Remark 4. The chain converges to the target distribution, but neither Diminishing Adaptation nor Containment holds.

Remark 5. Obviously, the adaptive parameter process {θ_n : n ≥ 0} is not bounded in probability.

Example 1 shows that Containment is not necessary either. In the following theorem, we prove that under an additional condition similar to (M3), Containment is necessary for ergodicity of adaptive algorithms.

Theorem 4 (Necessity of Containment). Suppose that there exists an increasing sequence of sets D_k ⊆ X on the state space X such that for any k > 0,

  lim_{n→∞} sup_{D_k×Y} ‖P_γ^n(x,·) − π(·)‖_TV = 0.  (2)

If the adaptive MCMC algorithm is ergodic, then Containment holds.

Proof: Fix ε > 0. For any δ > 0, take K > 0 such that π(D_K^c) < δ/2. For the set D_K, there exists M such that

  sup_{D_K×Y} ‖P_γ^M(x,·) − π(·)‖_TV < ε.

Hence, for any (x_0, γ_0) ∈ X × Y, by ergodicity of the adaptive MCMC {X_n}, there exists N > 0 such that for all n > N,

  P_{(x_0,γ_0)}(X_n ∈ D_K^c) − π(D_K^c) < δ/2.

For (X_n, Γ_n) ∈ D_K × Y,

  [X_n ∈ D_K] = [(X_n, Γ_n) ∈ D_K × Y] ⊆ [M_ε(X_n, Γ_n) ≤ M].

Hence,

  P_{(x_0,γ_0)}(M_ε(X_n, Γ_n) > M) ≤ P_{(x_0,γ_0)}((X_n, Γ_n) ∈ (D_K × Y)^c) = P_{(x_0,γ_0)}(X_n ∈ D_K^c) ≤ [P_{(x_0,γ_0)}(X_n ∈ D_K^c) − π(D_K^c)] + π(D_K^c) < δ.

Therefore, Containment holds. ∎
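Example 1 and the ε-convergence function M_ε can be made numerically concrete. In the sketch below (the function names and the exact time-indexing convention are my own assumptions, not from the paper), M_ε(x, θ) is computed exactly for the two-state kernel P_θ, and the alternating scheme of Example 1 is simulated: the occupation frequencies converge to the uniform target even though M_ε(x, 1/n) → ∞ along the even times, so Containment fails.

```python
import numpy as np

def tv(p, q):
    # Total variation distance between two discrete distributions.
    return 0.5 * float(np.abs(p - q).sum())

def m_eps(theta, eps, max_n=100_000):
    """Exact eps-convergence time M_eps(x, theta) for the two-state kernel
    P_theta, starting from state 1.  Here ||P_theta^n(x,.) - pi||_TV
    equals (1 - 2*theta)**n / 2 for theta in (0, 1/2]."""
    P = np.array([[1 - theta, theta], [theta, 1 - theta]])
    pi = np.array([0.5, 0.5])
    row = np.array([1.0, 0.0])          # point mass at state 1
    for n in range(1, max_n + 1):
        row = row @ P                   # row = P_theta^n(1, .)
        if tv(row, pi) <= eps:
            return n
    return np.inf

def simulate_example1(n_steps, seed=0):
    """Alternating scheme of Example 1: theta = 1/2 at odd times and 1/n at
    even times (the indexing convention is an assumption on my part)."""
    rng = np.random.default_rng(seed)
    x, visits = 0, np.zeros(2)          # states coded 0, 1 instead of 1, 2
    for n in range(1, n_steps + 1):
        theta = 0.5 if n % 2 == 1 else 1.0 / n
        if rng.uniform() < theta:       # flip with probability theta
            x = 1 - x
        visits[x] += 1
    return visits / n_steps             # occupation frequencies
```

Every θ = 1/2 step restores the exactly uniform marginal, which is why the chain is ergodic; meanwhile M_ε(x, 1/n) grows without bound, illustrating Remarks 4 and 5 (ergodicity without Containment or Diminishing Adaptation).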
Corollary 2. Suppose that the parameter space Y is a metric space and the adaptive scheme {Γ_n : n ≥ 0} is bounded in probability. Suppose that there exists an increasing sequence of sets (D_k, Y_k) ⊆ X × Y such that for any k > 0,

  lim_{n→∞} sup_{D_k×Y_k} ‖P_γ^n(x,·) − π(·)‖_TV = 0.

If the adaptive MCMC algorithm is ergodic, then Containment holds.

Proof: Using the same technique as in Theorem 4, for large enough M > 0,

  P_{(x_0,γ_0)}(M_ε(X_n, Γ_n) > M) ≤ P_{(x_0,γ_0)}((X_n, Γ_n) ∈ (D_k × Y_k)^c) ≤ P_{(x_0,γ_0)}(X_n ∈ D_k^c) + P_{(x_0,γ_0)}(Γ_n ∈ Y_k^c) ≤ [P_{(x_0,γ_0)}(X_n ∈ D_k^c) − π(D_k^c)] + π(D_k^c) + P_{(x_0,γ_0)}(Γ_n ∈ Y_k^c).

Since {Γ_n : n ≥ 0} is bounded in probability, the result holds. ∎

5 Simultaneous Polynomial Ergodicity

Although the ergodicity of adaptive MCMC algorithms is, to some degree, settled in [14] and [3], some properties of simultaneous polynomial ergodicity remain unknown. In this section, we show that conditions (Y4) and (M3) are implied for adaptive MCMC with simultaneous polynomial ergodicity. Before studying it, let us recall a quantitative bound for time-homogeneous Markov chains with polynomial convergence rate, due to Fort and Moulines [7].

Theorem 5 ([7]). Suppose that the time-homogeneous transition kernel P satisfies the following conditions:

P is π-irreducible with invariant probability measure π.

There exist sets C, D ∈ B(X) with C ⊆ D and π(C) > 0, and an integer m ≥ 1, such that for any (x, x′) ∈ Δ := (C × D) ∪ (D × C) and A ∈ B(X),

  P^m(x, A) ∧ P^m(x′, A) ≥ ρ_{x,x′}(A),  (3)

where ρ_{x,x′} is some measure on X for each (x, x′) ∈ Δ, and ε := inf_{(x,x′)∈Δ} ρ_{x,x′}(X) > 0.

Let q ≥ 1. There exist measurable functions V_k : X → R_+ \ {0} for k ∈ {0, 1, …, q} and, for k ∈ {0, 1, …, q−1}, constants 0 < a_k < 1, b_k < ∞, and c_k > 0 such that

  P V_{k+1}(x) ≤ V_{k+1}(x) − V_k(x) + b_k I_C(x), inf_{x∈X} V_k(x) ≥ c_k > 0, V_k(x) − b_k ≥ a_k V_k(x) for x ∈ D^c,  (4)

and sup_D V_q < ∞. Moreover, π(V_q^β) < ∞ for some β ∈ (0, 1].
Then, for any x ∈ X and n ≥ m,

  ‖P^n(x,·) − π(·)‖_TV ≤ min_{1≤l≤q} B_l^{(β)}(x, n),  (5)

with

  B_l^{(β)}(x, n) = ⟨(I − A_m^{(β)})^{-1} δ_x ⊗ π(W^β), e_l⟩ · [ ε_+ S(l, n+1−m)^{−β} + ∑_{j≥n+1−m} (1 − ε_+)^{j−(n−m)} (S(l, j+1)^β − S(l, j)^β) ],

where ⟨·,·⟩ denotes the inner product in R^{q+1}, {e_l : 0 ≤ l ≤ q} is the canonical basis of R^{q+1}, and I is the identity matrix; δ_x ⊗ π(W^β) := ∫∫ δ_x(dy) π(dy′) W^β(y, y′), where W^β(x, x′) := (W_0^β(x, x′), …, W_q^β(x, x′))^T with W_0(x, x′) := 1 and, for 1 ≤ l ≤ q,

  W_l(x, x′) := I_Δ(x, x′) + I_{Δ^c}(x, x′) (∏_{k=0}^{l−1} a_k)^{-1} (m(V_0))^{-1} (V_l(x) + V_l(x′)),

where m(V_0) := inf_{(x,x′)∈Δ^c} {V_0(x) + V_0(x′)};

  S(0, k) := 1 and S(i, k) := ∑_{j=1}^k S(i−1, j) for i ≥ 1;

A_m^{(β)} is the lower triangular (q+1)×(q+1) matrix

  A_m^{(β)} := [ A_m^{(β)}(0) 0 ⋯ 0 ; A_m^{(β)}(1) A_m^{(β)}(0) ⋯ 0 ; ⋮ ⋱ ; A_m^{(β)}(q−1) A_m^{(β)}(q−2) ⋯ A_m^{(β)}(0) 0 ; A_m^{(β)}(q) A_m^{(β)}(q−1) ⋯ A_m^{(β)}(1) A_m^{(β)}(0) ],

where

  A_m^{(β)}(l) := sup_{(x,x′)∈Δ} S(l, m)^β (1 − ρ_{x,x′}(X)) ∫∫ R_{x,x′}(x, dy) R_{x,x′}(x′, dy′) W_l^β(y, y′),

with the residual kernel

  R_{x,x′}(u, dy) := (1 − ρ_{x,x′}(X))^{-1} (P^m(u, dy) − ρ_{x,x′}(dy)),

and ε_+ := sup_{(x,x′)∈Δ} ρ_{x,x′}(X).

Remark 6. In B_l^{(β)}(x, n), ε_+ depends on the set Δ and the measures ρ_{x,x′}; the matrix (I − A_m^{(β)})^{-1} depends on Δ, the transition kernel P, ρ_{x,x′}, and the test functions V_k; δ_x ⊗ π(W^β) depends on Δ and the test functions V_k. Consider the special case of the theorem in which ρ_{x,x′}(dy) = δν(dy), where ν is a probability measure with ν(C) > 0, and Δ := C × C.

1. ε_+ = ε = δ.

2. I − A_m^{(β)} is lower triangular, so (I − A_m^{(β)})^{-1} =: (b_{ij}^{(β)})_{i,j=0,…,q} is also lower triangular; for fixed k ≥ 0, all entries b_{i,i−k}^{(β)} are equal, with b_{ii}^{(β)} = 1/(1 − A_m^{(β)}(0)). For i > j, b_{ij}^{(β)} is a polynomial
combination of A_m^{(β)}(0), …, A_m^{(β)}(i−j) divided by a power of (1 − A_m^{(β)}(0)). By some algebra, one obtains b_{21}^{(β)} = A_m^{(β)}(1)/(1 − A_m^{(β)}(0))^2.

So, by calculating B_1^{(β)}(x, n), we get a quantitative bound of simple form: B_1^{(β)}(x, n) involves only the two test functions V_0(x) and V_1(x).

Remark 7. From Equation (4), V_0(x) ≥ b_0/(1 − a_0) > b_0 on D^c, because 0 < a_0 < 1. Consider the drift condition P V_1 ≤ V_1 − V_0 + b_0 I_C. Since πP = π, π(V_0) ≤ b_0 π(C) ≤ b_0. Hence the V_0 in the theorem cannot be constant.

Remark 8. Without the condition π(V_q^β) < ∞, the bound in Equation (5) can still be obtained; however, it may be infinite. The subscript l of B_l^{(β)}(x, n) and the exponent β explain the polynomial rate: the relevant rate is S(l, n+1−m)^β = O((n+1−m)^{lβ}). Observe that B_l^{(β)}(x, n) involves the test functions V_0(x), …, V_l(x), and lim sup_{n→∞} n^{βl} B_l^{(β)}(x, n) < ∞. The maximal rate of convergence equals qβ.

5.1 Conditions

The following conditions are derived from Theorem 5, with some changes so that they apply to adaptive MCMC algorithms. We say that the family {P_γ : γ ∈ Y} is simultaneously polynomially ergodic (S.P.E.) if conditions (A1)-(A4) are satisfied.

A1: each P_γ is φ_γ-irreducible with stationary distribution π(·);

Remark 9. By a proposition in MT, if P_γ is φ_γ-irreducible with invariant distribution π, then P_γ is π-irreducible and π is a maximal irreducibility measure.

A2: there are a set C ⊆ X, an integer m ∈ N, a constant δ > 0, and a probability measure ν_γ(·) on X such that π(C) > 0 and

  P_γ^m(x,·) ≥ δ I_C(x) ν_γ(·) for all γ ∈ Y;  (6)

Remark 10. In Theorem 5, condition Eq. (3) enables the splitting technique. Here we consider the special case of that condition with ρ_{x,x′}(dy) = δν_γ(dy) and Δ = C × C. Thus, by Remark 6, the bound on ‖P_γ^n(x,·) − π(·)‖_TV depends on C, m, the minorization constant δ, π(·), ν_γ, and the test functions V_l(x), so we assume these are uniform over all the transition kernels.
A3: there are q ∈ N and measurable functions V_0, V_1, …, V_q : X → (0, ∞) with V_0 ≤ V_1 ≤ ⋯ ≤ V_q, such that for k = 0, 1, …, q−1 there are constants 0 < α_k < 1, b_k < ∞, and c_k > 0 with:

  P_γV_{k+1}(x) ≤ V_{k+1}(x) − V_k(x) + b_k I_C(x), V_k(x) ≥ c_k for x ∈ X and γ ∈ Y;  (7)
  V_k(x) − b_k ≥ α_k V_k(x) for x ∈ X \ C;  (8)
  sup_{x∈C} V_q(x) < ∞.  (9)

Remark 11. For x ∈ C, ν_γ(V_l) ≤ (1/δ) P_γ^m V_l(x) ≤ (1/δ)(sup_{x∈C} V_l(x) + m b_{l−1}).

A4: π(V_q^β) < ∞ for some β ∈ (0, 1].
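On a finite state space, the drift inequality in (7) can be checked for a whole family of kernels directly by matrix-vector products. The sketch below is entirely illustrative (the "drift down towards C = {0}" kernel family, the test functions, and the constants are my own assumptions, and only the inequality from (7) is verified, not (8), (9), or the minorization (6)):

```python
import numpy as np

def check_simultaneous_drift(kernels, V1, V0, C_mask, b):
    """Return True iff P V1 <= V1 - V0 + b*1_C holds pointwise for every
    transition matrix P in `kernels` (finite-state sketch of condition (7))."""
    rhs = V1 - V0 + b * C_mask
    return all(np.all(P @ V1 <= rhs + 1e-12) for P in kernels)

# Hypothetical family P_gamma: from state x >= 1 move to x-1 with probability
# gamma, stay put otherwise; state 0 plays the role of the small set C.
def down_kernel(gamma, n_states=6):
    P = np.zeros((n_states, n_states))
    P[0, 0] = 1.0
    for x in range(1, n_states):
        P[x, x - 1] = gamma
        P[x, x] = 1.0 - gamma
    return P

states = np.arange(6)
V1 = states + 1.0                   # test function V_1(x) = x + 1
V0 = np.full(6, 0.5)                # toy V_0 (constant, for this check only)
C = (states == 0).astype(float)     # indicator of C = {0}
```

For x ≥ 1, P_γV_1(x) = x + 1 − γ, so the inequality P_γV_1 ≤ V_1 − V_0 + b·I_C holds exactly when γ ≥ 1/2; the check therefore passes for the sub-family γ ∈ {0.5, 0.7, 0.9} with b = 1 and fails once a kernel with γ = 0.3 is added.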
5.2 Main Result

Theorem 6. Suppose an adaptive MCMC algorithm satisfies Diminishing Adaptation. Then the algorithm is ergodic under any of the following cases:

(i) S.P.E. holds and the number q of simultaneous drift conditions is strictly greater than two;

(ii) S.P.E. holds, the number q of simultaneous drift conditions is greater than or equal to two, and there exists an increasing function f : R_+ → R_+ such that V_1(x) ≤ f(V_0(x));

(iii) conditions (A1) and (A2) hold, and there exist constants c > 0, b′ > b > 0, α ∈ (0, 1), and a measurable function V : X → R_+ with V(x) ≥ 1 and sup_{x∈C} V(x) < ∞ such that

  P_γV(x) ≤ V(x) − cV^α(x) + b·I_C(x) for all γ ∈ Y,  (10)

and cV^α(x) I_{C^c}(x) ≥ b′;

(iv) conditions (A1), (A2), and (A4) hold, and there exist constants b′ > b > 0 and two measurable functions V_0 : X → R_+ and V_1 : X → R_+ with 1 ≤ V_0(x) ≤ V_1(x) and sup_{x∈C} V_1(x) < ∞ such that

  P_γV_1(x) ≤ V_1(x) − V_0(x) + b·I_C(x) for all γ ∈ Y,  (11)

V_0(x) I_{C^c}(x) ≥ b′, and the process {V_1(X_n) : n ≥ 0} is bounded in probability.

The theorem consists of Theorem 7, Theorem 8, Theorem 9, and Lemma 1. Theorem 9 shows that {V(X_n) : n ≥ 0} in case (iii) is bounded in probability. Case (iii) is a special case of S.P.E. with q = 1, for which the uniform quantitative bound of ‖P_γ^n(x,·) − π(·)‖_TV over γ ∈ Y exists.

Remark 12. For part (iii), (A4) is implied by a theorem in MT with β = α.

5.3 Proof of Theorem 6

Lemma 1. Suppose that the family {P_γ : γ ∈ Y} is S.P.E. If the stochastic process {V_l(X_n) : n ≥ 0} is bounded in probability for some l ∈ {1, …, q}, then Containment is satisfied.

Proof: We use the notation of Theorem 5. From S.P.E., for γ ∈ Y, let ρ_{x,x′}(dy) = δν_γ(dy) (so ρ_{x,x′}(X) = δ) and Δ := C × C. Then ε_+ = ε = δ. Note that the matrix I − A_m^{(β)} is lower triangular. Denote (I − A_m^{(β)})^{-1} := (b_{ij}^{(β)})_{i,j=0,…,q}.
By the definition of B_l^{(β)}(x, n),

  B_l^{(β)}(x, n) = [ ∑_{k=0}^l b_{lk}^{(β)} ∫ π(dy) W_k^β(x, y) ] · [ ε_+ S(l, n+1−m)^{−β} + ∑_{j≥n+1−m} (1 − ε_+)^{j−(n−m)} (S(l, j+1)^β − S(l, j)^β) ],

and the geometrically weighted tail sum is of no larger order than the leading term, so the behaviour of B_l^{(β)}(x, n) is governed by

  ε_+ S(l, n+1−m)^{−β} ∑_{k=0}^l b_{lk}^{(β)} ∫ π(dy) W_k^β(x, y).

By some algebra, for k = 1, …, q,

  ∫ π(dy) W_k^β(x, y) ≤ 1 + (m(V_0))^{−β} ( ∏_{i=0}^{k−1} a_i )^{−β} [ V_k^β(x) + π(V_k^β) ]  (12)
because β ∈ (0, 1]. In addition, m(V_0) ≥ c_0 > 0, so the coefficient of the second term on the right-hand side is finite. By induction, we obtain b_{10}^{(β)} = A_m^{(β)}(1)/(1 − A_m^{(β)}(0))^2 and b_{11}^{(β)} = 1/(1 − A_m^{(β)}(0)). It is easy to check that 0 < b_{11}^{(β)} ≤ 1/δ. By some algebra,

  A_m^{(β)}(1) ≤ m^β + sup_{(x,x′)∈C×C} ∫∫ R_{x,x′}(x, dy) R_{x,x′}(x′, dy′) W_1^β(y, y′)
  ≤ m^β + sup_{(x,x′)∈C×C} [ 1 + (a_0 m(V_0))^{−β} (P_γ^m V_1^β(x) + P_γ^m V_1^β(x′)) ]
  ≤ m^β + 1 + 2 (a_0 m(V_0))^{−β} (sup_{x∈C} V_1(x) + m b_0)^β,

because P_γ^m V_1^β(x) ≤ (P_γ^m V_1(x))^β ≤ (V_1(x) + m b_0)^β. Therefore b_{10}^{(β)} is bounded above by a value independent of γ. Thus,

  B_1^{(β)}(x, n) ≤ (δ / S(1, n+1−m)^β) [ b_{10}^{(β)} ∫ π(dy) W_0^β(x, y) + b_{11}^{(β)} ∫ π(dy) W_1^β(x, y) ]
  ≤ (δ / (n+1−m)^β) ( b_{10}^{(β)} + b_{11}^{(β)} [ 1 + (a_0 m(V_0))^{−β} (V_1^β(x) + π(V_1^β)) ] ).

Therefore, boundedness in probability of the process {V_1(X_k) : k ≥ 0} implies that the random sequence B_1^{(β)}(X_n, n) converges to zero in probability, uniformly over γ ∈ Y. Containment holds. ∎

Let {Z_j : j ≥ 0} be an adaptive sequence of positive random variables; each Z_j is a fixed Borel-measurable function of X_j. Let τ_n denote a stopping time started from time n of the process {X_i : i ≥ 0}, i.e. [τ_n = i] ∈ σ(X_k : k = 1, …, n+i) and P(τ_n < ∞) = 1.

Lemma 2 (Dynkin's formula for adaptive MCMC). For m > 0 and n > 0,

  E[Z_{τ̄_{m,n}} | X_m, Γ_m] = Z_m + E[ ∑_{i=1}^{τ̄_{m,n}} ( E[Z_{m+i} | F_{m+i−1}] − Z_{m+i−1} ) | X_m, Γ_m ],

where τ̄_{m,n} := min(n, τ_m, inf{k ≥ 0 : Z_{m+k} ≥ n}).

Proof:

  Z_{τ̄_{m,n}} = Z_m + ∑_{i=1}^{τ̄_{m,n}} (Z_{m+i} − Z_{m+i−1}) = Z_m + ∑_{i=1}^n I(τ̄_{m,n} ≥ i)(Z_{m+i} − Z_{m+i−1}).

Since [τ̄_{m,n} ≥ i] is measurable with respect to F_{m+i−1},

  E[Z_{τ̄_{m,n}} | X_m, Γ_m] = Z_m + E[ ∑_{i=1}^n E[Z_{m+i} − Z_{m+i−1} | F_{m+i−1}] I(τ̄_{m,n} ≥ i) | X_m, Γ_m ] = Z_m + E[ ∑_{i=1}^{τ̄_{m,n}} ( E[Z_{m+i} | F_{m+i−1}] − Z_{m+i−1} ) | X_m, Γ_m ]. ∎
Lemma 3 (Comparison lemma for adaptive MCMC). Suppose that there exist two sequences of positive functions {s_j, f_j : j ≥ 0} on X such that

  E[Z_{j+1} | F_j] ≤ Z_j − f_j(X_j) + s_j(X_j).  (13)

Then for a stopping time τ_n started from time n of the adaptive MCMC {X_i : i ≥ 0},

  E[ ∑_{j=0}^{τ_n−1} f_{n+j}(X_{n+j}) | X_n, Γ_n ] ≤ Z_n(X_n) + E[ ∑_{j=0}^{τ_n−1} s_{n+j}(X_{n+j}) | X_n, Γ_n ].

Proof: The result follows from Lemma 2 and Eq. (13). ∎

The following proposition relates the moments of the hitting time to the V-modulated moments of the test functions for adaptive MCMC algorithms with S.P.E.; it is derived from the corresponding result for Markov chains in [10, Theorem 3.2]. Define the first return time and the i-th return time to the set C from time n respectively by

  τ_{n,C} := τ_{n,C}(1) := min{k ≥ 1 : X_{n+k} ∈ C}  (14)

and

  τ_{n,C}(i) := min{k > τ_{n,C}(i−1) : X_{n+k} ∈ C} for n ≥ 0 and i > 1.  (15)

Proposition 1. Consider an adaptive MCMC {X_i : i ≥ 0} with adaptive parameter {Γ_i : i ≥ 0}. If the family {P_γ : γ ∈ Y} is S.P.E., then there exist constants {d_i : i = 0, …, q−1} such that at time n, for k = 1, …, q,

  c_{q−k} E[τ_{n,C}^k | X_n, Γ_n] ≤ k E[ ∑_{i=0}^{τ_{n,C}−1} (i+1)^{k−1} V_{q−k}(X_{n+i}) | X_n, Γ_n ] and E[ ∑_{i=0}^{τ_{n,C}−1} (i+1)^{k−1} V_{q−k}(X_{n+i}) | X_n, Γ_n ] ≤ d_{q−k} ( V_q(X_n) + ∑_{i=1}^k b_{q−i} I_C(X_n) ),

where the test functions {V_i(·) : i = 0, …, q}, the set C, {c_i : i = 0, …, q−1}, and {b_i : i = 0, …, q−1} are as in the definition of S.P.E.

Proof: Since V_{q−k}(x) ≥ c_{q−k} on X and ∑_{i=0}^{τ_{n,C}−1} (i+1)^{k−1} ≥ ∫_0^{τ_{n,C}} x^{k−1} dx = k^{−1} τ_{n,C}^k,

  E[ ∑_{i=0}^{τ_{n,C}−1} (i+1)^{k−1} V_{q−k}(X_{n+i}) | X_n, Γ_n ] ≥ (c_{q−k}/k) E[τ_{n,C}^k | X_n, Γ_n].  (16)

So the first inequality holds. Consider k = 1. By S.P.E. and Lemma 3,

  E[ ∑_{i=0}^{τ_{n,C}−1} V_{q−1}(X_{n+i}) | X_n, Γ_n ] ≤ V_q(X_n) + b_{q−1} I_C(X_n).  (17)
So the case k = 1 of the second inequality holds. For i ≥ 0, by S.P.E.,

  E[(i+1)^{k−1} V_{q−k+1}(X_{n+i+1}) | X_{n+i}, Γ_{n+i}] − i^{k−1} V_{q−k+1}(X_{n+i})
  ≤ (i+1)^{k−1} ( V_{q−k+1}(X_{n+i}) − V_{q−k}(X_{n+i}) + b_{q−k} I_C(X_{n+i}) ) − i^{k−1} V_{q−k+1}(X_{n+i})
  ≤ −(i+1)^{k−1} V_{q−k}(X_{n+i}) + d ( i^{k−2} V_{q−k+1}(X_{n+i}) ) + (i+1)^{k−1} b_{q−k} I_C(X_{n+i})

for some positive d independent of i. By Lemma 3,

  E[ ∑_{i=0}^{τ_{n,C}−1} (i+1)^{k−1} V_{q−k}(X_{n+i}) | X_n, Γ_n ] ≤ d E[ ∑_{i=0}^{τ_{n,C}−1} (i+1)^{(k−1)−1} V_{q−(k−1)}(X_{n+i}) | X_n, Γ_n ] + b_{q−k} I_C(X_n).  (18)

From the above, by induction, the second inequality of the result holds. ∎

Theorem 7. Suppose that the family {P_γ : γ ∈ Y} is S.P.E. with q > 2. Then Containment holds.

Proof: For k = 1, …, q, take M > 0 large enough that C ⊆ {x : V_{q−k}(x) ≤ M}. Then

  P_{(x_0,γ_0)}(V_{q−k}(X_n) > M) = ∑_{i=0}^n P_{(x_0,γ_0)}(V_{q−k}(X_n) > M, τ_{i,C} > n−i, X_i ∈ C) + P_{(x_0,γ_0)}(V_{q−k}(X_n) > M, τ_{0,C} > n, X_0 ∉ C).

By Proposition 1, for i = 0, …, n,

  P_{(x_0,γ_0)}(V_{q−k}(X_n) > M, τ_{i,C} > n−i | X_i ∈ C)
  ≤ P_{(x_0,γ_0)}( ∑_{j=0}^{τ_{i,C}−1} (j+1)^{k−1} V_{q−k}(X_{i+j}) > (n−i)^{k−1} M + c_{q−k} ∑_{j=0}^{n−i−1} (j+1)^{k−1}, τ_{i,C} > n−i | X_i ∈ C )
  ≤ sup_{x∈C} E_{(x_0,γ_0)}[ E_{(x_0,γ_0)}[ ∑_{j=0}^{τ_{i,C}−1} (j+1)^{k−1} V_{q−k}(X_{i+j}) | X_i, Γ_i ] | X_i = x ] / [ (n−i)^{k−1} M + c_{q−k} ∑_{j=0}^{n−i−1} (j+1)^{k−1} ]
  ≤ d_{q−k} ( sup_{x∈C} V_q(x) + ∑_{j=1}^k b_{q−j} ) / [ (n−i)^{k−1} M + c_{q−k} ∑_{j=0}^{n−i−1} (j+1)^{k−1} ],
and

  P_{(x_0,γ_0)}(V_{q−k}(X_n) > M, τ_{0,C} > n | X_0 ∉ C) ≤ d_{q−k} ( V_q(x_0) + ∑_{j=1}^k b_{q−j} I_C(x_0) ) / [ n^{k−1} M + c_{q−k} ∑_{j=0}^{n−1} (j+1)^{k−1} ].

By simple algebra,

  (n−i)^{k−1} M + c_{q−k} ∑_{j=0}^{n−i−1} (j+1)^{k−1} = O( (n−i)^{k−1} (M + c_{q−k}(n−i)) ).

Therefore,

  P_{(x_0,γ_0)}(V_{q−k}(X_n) > M) ≤ d_{q−k} ( sup_{x∈C∪{x_0}} V_q(x) + ∑_{j=1}^k b_{q−j} ) [ ∑_{i=0}^n P_{(x_0,γ_0)}(X_i ∈ C) / ((n−i)^{k−1}(M + c_{q−k}(n−i))) + δ_{C^c}(x_0) / (n^{k−1}(M + c_{q−k} n)) ].  (19)

Whenever q ≥ 2, k can be chosen as 2, and for k ≥ 2 the summation on the right-hand side of Eq. (19) is finite for given M. But if q = 2, then only the process {V_0(X_n) : n ≥ 0} is bounded in probability, so q > 2 is required for the result. Hence, taking M > 0 large enough, the probability becomes arbitrarily small, and the sequence {V_{q−2}(X_n) : n ≥ 0} is bounded in probability. By Lemma 1, Containment holds. ∎

Remark 13. In the proof, only (A3) is used.

Remark 14. If V_1 is a nice (non-decreasing) function of V_0, then the sequence {V_1(X_n) : n ≥ 0} is bounded in probability. In Theorem 9, we discuss this situation for a certain simultaneous single polynomial drift condition.

Theorem 8. Suppose that {P_γ : γ ∈ Y} is S.P.E. with q = 2, and that there exists a strictly increasing function f : R_+ → R_+ such that V_1(x) ≤ f(V_0(x)) for all x ∈ X. Then Containment holds.

Proof: From Eq. (19), {V_0(X_n) : n ≥ 0} is bounded in probability. Since V_1(x) ≤ f(V_0(x)),

  P_{(x_0,γ_0)}(V_1(X_n) > f(M)) ≤ P_{(x_0,γ_0)}(f(V_0(X_n)) > f(M)) = P_{(x_0,γ_0)}(V_0(X_n) > M),

because f(·) is strictly increasing. By the boundedness of {V_0(X_n)}, for any ε > 0 there exist N > 0 and M > 0 such that for n > N, P_{(x_0,γ_0)}(V_1(X_n) > f(M)) ≤ ε. Therefore {V_1(X_n) : n ≥ 0} is bounded in probability, and by Lemma 1, Containment is satisfied. ∎

Consider the single polynomial drift condition (see [10]): P_γV(x) ≤ V(x) − cV^α(x) + b·I_C(x), where 0 ≤ α < 1.
The moments of the hitting time to the set C then satisfy (see the details in [10]): for any 1 ≤ ξ ≤ 1/(1−α),

  E_x[ ∑_{i=0}^{τ_C−1} (i+1)^{ξ−1} V^{1−ξ(1−α)}(X_i) ]

is bounded by a constant multiple of V(x) + b·I_C(x).
The polynomial rate function is r(n) = n^{ξ−1}. If α = 0, then r(n) is constant; in that situation it is difficult to use the technique of Theorem 7 to prove that {V(X_n) : n ≥ 0} is bounded in probability. Thus we assume α ∈ (0, 1).

Proposition 2. Consider an adaptive MCMC {X_n : n ≥ 0} with an adaptive scheme {Γ_n : n ≥ 0}. Suppose that (A1) holds, and that there exist constants c > 0, b > 0, α ∈ (0, 1), and a measurable function V : X → R_+ with V(x) ≥ 1 and sup_{x∈C} V(x) < ∞ such that

  P_γV(x) ≤ V(x) − cV^α(x) + b·I_C(x) for all γ ∈ Y.  (20)

Then for 1 ≤ ξ ≤ 1/(1−α),

  E_{(x_0,γ_0)}[ ∑_{i=0}^{τ_{n,C}−1} (i+1)^{ξ−1} V^{1−ξ(1−α)}(X_{n+i}) | X_n, Γ_n ] ≤ c_ξ(C)(V(X_n) + 1).  (21)

Proof: The proof applies the techniques of Lemma 3.5 and Theorem 3.6 of [10]. ∎

Theorem 9. Suppose that (A2) and the conditions of Proposition 2 are satisfied, and that there exists a constant b′ > b such that cV^α(x) I_{C^c}(x) ≥ b′. Then Containment holds.

Proof: Using the same techniques as in Theorem 7, we have

  P_{(x_0,γ_0)}( V^{1−ξ(1−α)}(X_n) > M ) ≤ c_ξ ( sup_{x∈C∪{x_0}} V(x) + 1 ) [ ∑_{i=0}^n P_{(x_0,γ_0)}(X_i ∈ C) / ((n−i)^{ξ−1}(M + n−i)) + δ_{C^c}(x_0) / (n^{ξ−1}(M + n)) ].  (22)

Therefore, for ξ ∈ [1, 1/(1−α)), the sequence {V^{1−ξ(1−α)}(X_n) : n ≥ 0} is bounded in probability. Since 1 − ξ(1−α) > 0, the process {V(X_n) : n ≥ 0} is bounded in probability. By Lemma 1, Containment holds. ∎

Acknowledgements

Special thanks are due to Professor Jeffrey S. Rosenthal; without his assistance this work would not have been completed. The author is also very grateful to G. Fort for her helpful suggestions.

References

[1] C. Andrieu and E. Moulines. On the ergodicity properties of some adaptive Markov chain Monte Carlo algorithms. Ann. Appl. Probab., 16(3).

[2] C. Andrieu and C.P. Robert. Controlled MCMC for optimal sampling. Preprint.

[3] Y.F. Atchadé and G. Fort. Limit theorems for some adaptive MCMC algorithms with subgeometric kernels. Preprint.

[4] Y.F. Atchadé and J.S. Rosenthal. On adaptive Markov chain Monte Carlo algorithms. Bernoulli, 11(5).
[5] Y. Bai, G.O. Roberts, and J.S. Rosenthal. On the Containment condition of adaptive Markov chain Monte Carlo algorithms.

[6] A.E. Brockwell and J.B. Kadane. Identification of regeneration times in MCMC simulation, with application to adaptive schemes. J. Comp. Graph. Stat., 14.

[7] G. Fort and E. Moulines. Computable bounds for subgeometrical and geometrical ergodicity.

[8] W.R. Gilks, G.O. Roberts, and S.K. Sahu. Adaptive Markov chain Monte Carlo. J. Amer. Statist. Assoc., 93.

[9] H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7.

[10] S.F. Jarner and G.O. Roberts. Polynomial convergence rates of Markov chains. Ann. Appl. Probab., 12(1).

[11] S.P. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. London: Springer-Verlag.

[12] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Stat., 22.

[13] G.O. Roberts and J.S. Rosenthal. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Prob., 44.

[14] C. Yang. Recurrent and ergodic properties of adaptive MCMC. Preprint.
More informationThe simple slice sampler is a specialised type of MCMC auxiliary variable method (Swendsen and Wang, 1987; Edwards and Sokal, 1988; Besag and Green, 1
Recent progress on computable bounds and the simple slice sampler by Gareth O. Roberts* and Jerey S. Rosenthal** (May, 1999.) This paper discusses general quantitative bounds on the convergence rates of
More informationQuantitative Non-Geometric Convergence Bounds for Independence Samplers
Quantitative Non-Geometric Convergence Bounds for Independence Samplers by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 28; revised July 29.) 1. Introduction. Markov chain Monte Carlo (MCMC)
More informationOn Reparametrization and the Gibbs Sampler
On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department
More informationarxiv: v1 [math.pr] 17 May 2009
A strong law of large nubers for artingale arrays Yves F. Atchadé arxiv:0905.2761v1 [ath.pr] 17 May 2009 March 2009 Abstract: We prove a artingale triangular array generalization of the Chow-Birnbau- Marshall
More informationON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS
The Annals of Applied Probability 1998, Vol. 8, No. 4, 1291 1302 ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS By Gareth O. Roberts 1 and Jeffrey S. Rosenthal 2 University of Cambridge
More informationPractical conditions on Markov chains for weak convergence of tail empirical processes
Practical conditions on Markov chains for weak convergence of tail empirical processes Olivier Wintenberger University of Copenhagen and Paris VI Joint work with Rafa l Kulik and Philippe Soulier Toronto,
More informationarxiv: v1 [stat.co] 28 Jan 2018
AIR MARKOV CHAIN MONTE CARLO arxiv:1801.09309v1 [stat.co] 28 Jan 2018 Cyril Chimisov, Krzysztof Latuszyński and Gareth O. Roberts Contents Department of Statistics University of Warwick Coventry CV4 7AL
More informationChapter 7. Markov chain background. 7.1 Finite state space
Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider
More informationLectures on Stochastic Stability. Sergey FOSS. Heriot-Watt University. Lecture 4. Coupling and Harris Processes
Lectures on Stochastic Stability Sergey FOSS Heriot-Watt University Lecture 4 Coupling and Harris Processes 1 A simple example Consider a Markov chain X n in a countable state space S with transition probabilities
More informationLecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.
Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although
More informationLecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.
1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if
More informationCentral limit theorems for ergodic continuous-time Markov chains with applications to single birth processes
Front. Math. China 215, 1(4): 933 947 DOI 1.17/s11464-15-488-5 Central limit theorems for ergodic continuous-time Markov chains with applications to single birth processes Yuanyuan LIU 1, Yuhui ZHANG 2
More informationPerturbed Proximal Gradient Algorithm
Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech Université Paris-Saclay, 75013, Paris, France Large-scale inverse problems and optimization Applications to image processing
More informationA generalization of Dobrushin coefficient
A generalization of Dobrushin coefficient Ü µ ŒÆ.êÆ ÆÆ 202.5 . Introduction and main results We generalize the well-known Dobrushin coefficient δ in total variation to weighted total variation δ V, which
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationMarkov Chains and Stochastic Sampling
Part I Markov Chains and Stochastic Sampling 1 Markov Chains and Random Walks on Graphs 1.1 Structure of Finite Markov Chains We shall only consider Markov chains with a finite, but usually very large,
More informationExamples of Adaptive MCMC
Examples of Adaptive MCMC by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 2006; revised January 2008.) Abstract. We investigate the use of adaptive MCMC algorithms to automatically tune the
More informationGeometric Convergence Rates for Time-Sampled Markov Chains
Geometric Convergence Rates for Time-Sampled Markov Chains by Jeffrey S. Rosenthal* (April 5, 2002; last revised April 17, 2003) Abstract. We consider time-sampled Markov chain kernels, of the form P µ
More informationAdaptive Markov Chain Monte Carlo: Theory and Methods
Chapter Adaptive Markov Chain Monte Carlo: Theory and Methods Yves Atchadé, Gersende Fort and Eric Moulines 2, Pierre Priouret 3. Introduction Markov chain Monte Carlo (MCMC methods allow to generate samples
More informationRecent Advances in Regional Adaptation for MCMC
Recent Advances in Regional Adaptation for MCMC Radu Craiu Department of Statistics University of Toronto Collaborators: Yan Bai (Statistics, Toronto) Antonio Fabio di Narzo (Statistics, Bologna) Jeffrey
More informationExistence, Uniqueness and Stability of Invariant Distributions in Continuous-Time Stochastic Models
Existence, Uniqueness and Stability of Invariant Distributions in Continuous-Time Stochastic Models Christian Bayer and Klaus Wälde Weierstrass Institute for Applied Analysis and Stochastics and University
More informationUniversity of Toronto Department of Statistics
Optimal Proposal Distributions and Adaptive MCMC by Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0804 June 30, 2008 TECHNICAL REPORT SERIES University of Toronto
More informationON THE DEFINITION OF RELATIVE PRESSURE FOR FACTOR MAPS ON SHIFTS OF FINITE TYPE. 1. Introduction
ON THE DEFINITION OF RELATIVE PRESSURE FOR FACTOR MAPS ON SHIFTS OF FINITE TYPE KARL PETERSEN AND SUJIN SHIN Abstract. We show that two natural definitions of the relative pressure function for a locally
More informationMonte Carlo methods for sampling-based Stochastic Optimization
Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from
More informationarxiv: v2 [math.pr] 4 Sep 2017
arxiv:1708.08576v2 [math.pr] 4 Sep 2017 On the Speed of an Excited Asymmetric Random Walk Mike Cinkoske, Joe Jackson, Claire Plunkett September 5, 2017 Abstract An excited random walk is a non-markovian
More informationWhen is a Markov chain regenerative?
When is a Markov chain regenerative? Krishna B. Athreya and Vivekananda Roy Iowa tate University Ames, Iowa, 50011, UA Abstract A sequence of random variables {X n } n 0 is called regenerative if it can
More informationVariance Bounding Markov Chains
Variance Bounding Markov Chains by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 2006; revised April 2007.) Abstract. We introduce a new property of Markov chains, called variance bounding.
More informationLattice Gaussian Sampling with Markov Chain Monte Carlo (MCMC)
Lattice Gaussian Sampling with Markov Chain Monte Carlo (MCMC) Cong Ling Imperial College London aaa joint work with Zheng Wang (Huawei Technologies Shanghai) aaa September 20, 2016 Cong Ling (ICL) MCMC
More informationTHE SET OF RECURRENT POINTS OF A CONTINUOUS SELF-MAP ON AN INTERVAL AND STRONG CHAOS
J. Appl. Math. & Computing Vol. 4(2004), No. - 2, pp. 277-288 THE SET OF RECURRENT POINTS OF A CONTINUOUS SELF-MAP ON AN INTERVAL AND STRONG CHAOS LIDONG WANG, GONGFU LIAO, ZHENYAN CHU AND XIAODONG DUAN
More informationMARKOV CHAINS AND HIDDEN MARKOV MODELS
MARKOV CHAINS AND HIDDEN MARKOV MODELS MERYL SEAH Abstract. This is an expository paper outlining the basics of Markov chains. We start the paper by explaining what a finite Markov chain is. Then we describe
More informationUniformly Uniformly-ergodic Markov chains and BSDEs
Uniformly Uniformly-ergodic Markov chains and BSDEs Samuel N. Cohen Mathematical Institute, University of Oxford (Based on joint work with Ying Hu, Robert Elliott, Lukas Szpruch) Centre Henri Lebesgue,
More informationMATH 564/STAT 555 Applied Stochastic Processes Homework 2, September 18, 2015 Due September 30, 2015
ID NAME SCORE MATH 56/STAT 555 Applied Stochastic Processes Homework 2, September 8, 205 Due September 30, 205 The generating function of a sequence a n n 0 is defined as As : a ns n for all s 0 for which
More informationLocal consistency of Markov chain Monte Carlo methods
Ann Inst Stat Math (2014) 66:63 74 DOI 10.1007/s10463-013-0403-3 Local consistency of Markov chain Monte Carlo methods Kengo Kamatani Received: 12 January 2012 / Revised: 8 March 2013 / Published online:
More informationLecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321
Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process
More informationRenewal theory and computable convergence rates for geometrically ergodic Markov chains
Renewal theory and computable convergence rates for geometrically ergodic Markov chains Peter H. Baxendale July 3, 2002 Revised version July 6, 2003 Abstract We give computable bounds on the rate of convergence
More informationWeighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing
Advances in Dynamical Systems and Applications ISSN 0973-5321, Volume 8, Number 2, pp. 401 412 (2013) http://campus.mst.edu/adsa Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes
More informationA mathematical framework for Exact Milestoning
A mathematical framework for Exact Milestoning David Aristoff (joint work with Juan M. Bello-Rivas and Ron Elber) Colorado State University July 2015 D. Aristoff (Colorado State University) July 2015 1
More informationMixing time for a random walk on a ring
Mixing time for a random walk on a ring Stephen Connor Joint work with Michael Bate Paris, September 2013 Introduction Let X be a discrete time Markov chain on a finite state space S, with transition matrix
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationGLUING LEMMAS AND SKOROHOD REPRESENTATIONS
GLUING LEMMAS AND SKOROHOD REPRESENTATIONS PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO Abstract. Let X, E), Y, F) and Z, G) be measurable spaces. Suppose we are given two probability measures γ and
More informationMarkov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains
Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time
More informationSUPPLEMENT TO EXTENDING THE LATENT MULTINOMIAL MODEL WITH COMPLEX ERROR PROCESSES AND DYNAMIC MARKOV BASES
SUPPLEMENT TO EXTENDING THE LATENT MULTINOMIAL MODEL WITH COMPLEX ERROR PROCESSES AND DYNAMIC MARKOV BASES By Simon J Bonner, Matthew R Schofield, Patrik Noren Steven J Price University of Western Ontario,
More informationThe Wang-Landau algorithm in general state spaces: Applications and convergence analysis
The Wang-Landau algorithm in general state spaces: Applications and convergence analysis Yves F. Atchadé and Jun S. Liu First version Nov. 2004; Revised Feb. 2007, Aug. 2008) Abstract: The Wang-Landau
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationTail inequalities for additive functionals and empirical processes of. Markov chains
Tail inequalities for additive functionals and empirical processes of geometrically ergodic Markov chains University of Warsaw Banff, June 2009 Geometric ergodicity Definition A Markov chain X = (X n )
More informationOn the Applicability of Regenerative Simulation in Markov Chain Monte Carlo
On the Applicability of Regenerative Simulation in Markov Chain Monte Carlo James P. Hobert 1, Galin L. Jones 2, Brett Presnell 1, and Jeffrey S. Rosenthal 3 1 Department of Statistics University of Florida
More informationChapter 2: Markov Chains and Queues in Discrete Time
Chapter 2: Markov Chains and Queues in Discrete Time L. Breuer University of Kent 1 Definition Let X n with n N 0 denote random variables on a discrete space E. The sequence X = (X n : n N 0 ) is called
More informationMinorization Conditions and Convergence Rates for Markov Chain Monte Carlo. (September, 1993; revised July, 1994.)
Minorization Conditions and Convergence Rates for Markov Chain Monte Carlo September, 1993; revised July, 1994. Appeared in Journal of the American Statistical Association 90 1995, 558 566. by Jeffrey
More informationITERATED FUNCTION SYSTEMS WITH CONTINUOUS PLACE DEPENDENT PROBABILITIES
UNIVERSITATIS IAGELLONICAE ACTA MATHEMATICA, FASCICULUS XL 2002 ITERATED FUNCTION SYSTEMS WITH CONTINUOUS PLACE DEPENDENT PROBABILITIES by Joanna Jaroszewska Abstract. We study the asymptotic behaviour
More informationConvergence Rate of Markov Chains
Convergence Rate of Markov Chains Will Perkins April 16, 2013 Convergence Last class we saw that if X n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution
More informationAn Introduction to Entropy and Subshifts of. Finite Type
An Introduction to Entropy and Subshifts of Finite Type Abby Pekoske Department of Mathematics Oregon State University pekoskea@math.oregonstate.edu August 4, 2015 Abstract This work gives an overview
More informationCONVERGENCE THEOREM FOR FINITE MARKOV CHAINS. Contents
CONVERGENCE THEOREM FOR FINITE MARKOV CHAINS ARI FREEDMAN Abstract. In this expository paper, I will give an overview of the necessary conditions for convergence in Markov chains on finite state spaces.
More informationExercise Solutions to Functional Analysis
Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n
More information25.1 Ergodicity and Metric Transitivity
Chapter 25 Ergodicity This lecture explains what it means for a process to be ergodic or metrically transitive, gives a few characterizes of these properties (especially for AMS processes), and deduces
More informationKERNEL ESTIMATORS OF ASYMPTOTIC VARIANCE FOR ADAPTIVE MARKOV CHAIN MONTE CARLO. By Yves F. Atchadé University of Michigan
Submitted to the Annals of Statistics arxiv: math.pr/0911.1164 KERNEL ESTIMATORS OF ASYMPTOTIC VARIANCE FOR ADAPTIVE MARKOV CHAIN MONTE CARLO By Yves F. Atchadé University of Michigan We study the asymptotic
More informationStochastic Simulation
Stochastic Simulation Ulm University Institute of Stochastics Lecture Notes Dr. Tim Brereton Summer Term 2015 Ulm, 2015 2 Contents 1 Discrete-Time Markov Chains 5 1.1 Discrete-Time Markov Chains.....................
More informationWinter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo
Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte
More informationGeneral Glivenko-Cantelli theorems
The ISI s Journal for the Rapid Dissemination of Statistics Research (wileyonlinelibrary.com) DOI: 10.100X/sta.0000......................................................................................................
More informationLecture notes for Analysis of Algorithms : Markov decision processes
Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationON THE SIMILARITY OF CENTERED OPERATORS TO CONTRACTIONS. Srdjan Petrović
ON THE SIMILARITY OF CENTERED OPERATORS TO CONTRACTIONS Srdjan Petrović Abstract. In this paper we show that every power bounded operator weighted shift with commuting normal weights is similar to a contraction.
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationMinicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics
Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture
More informationRudiments of Ergodic Theory
Rudiments of Ergodic Theory Zefeng Chen September 24, 203 Abstract In this note we intend to present basic ergodic theory. We begin with the notion of a measure preserving transformation. We then define
More informationRandom Times and Their Properties
Chapter 6 Random Times and Their Properties Section 6.1 recalls the definition of a filtration (a growing collection of σ-fields) and of stopping times (basically, measurable random times). Section 6.2
More informationPROBLEMS. (b) (Polarization Identity) Show that in any inner product space
1 Professor Carl Cowen Math 54600 Fall 09 PROBLEMS 1. (Geometry in Inner Product Spaces) (a) (Parallelogram Law) Show that in any inner product space x + y 2 + x y 2 = 2( x 2 + y 2 ). (b) (Polarization
More informationFunctional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...
Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................
More informationReversible Markov chains
Reversible Markov chains Variational representations and ordering Chris Sherlock Abstract This pedagogical document explains three variational representations that are useful when comparing the efficiencies
More informationStochastic relations of random variables and processes
Stochastic relations of random variables and processes Lasse Leskelä Helsinki University of Technology 7th World Congress in Probability and Statistics Singapore, 18 July 2008 Fundamental problem of applied
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationReal Analysis Notes. Thomas Goller
Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................
More informationAdvances and Applications in Perfect Sampling
and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC
More informationSTOCHASTIC PROCESSES Basic notions
J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving
More informationON THE STRONG LIMIT THEOREMS FOR DOUBLE ARRAYS OF BLOCKWISE M-DEPENDENT RANDOM VARIABLES
Available at: http://publications.ictp.it IC/28/89 United Nations Educational, Scientific Cultural Organization International Atomic Energy Agency THE ABDUS SALAM INTERNATIONAL CENTRE FOR THEORETICAL PHYSICS
More informationLecture 5 - Hausdorff and Gromov-Hausdorff Distance
Lecture 5 - Hausdorff and Gromov-Hausdorff Distance August 1, 2011 1 Definition and Basic Properties Given a metric space X, the set of closed sets of X supports a metric, the Hausdorff metric. If A is
More informationValue Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes
Value Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes RAÚL MONTES-DE-OCA Departamento de Matemáticas Universidad Autónoma Metropolitana-Iztapalapa San Rafael
More informationTUNING OF MARKOV CHAIN MONTE CARLO ALGORITHMS USING COPULAS
U.P.B. Sci. Bull., Series A, Vol. 73, Iss. 1, 2011 ISSN 1223-7027 TUNING OF MARKOV CHAIN MONTE CARLO ALGORITHMS USING COPULAS Radu V. Craiu 1 Algoritmii de tipul Metropolis-Hastings se constituie într-una
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More informationMET Workshop: Exercises
MET Workshop: Exercises Alex Blumenthal and Anthony Quas May 7, 206 Notation. R d is endowed with the standard inner product (, ) and Euclidean norm. M d d (R) denotes the space of n n real matrices. When
More informationConsistency of the maximum likelihood estimator for general hidden Markov models
Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models
More informationWalsh Diffusions. Andrey Sarantsev. March 27, University of California, Santa Barbara. Andrey Sarantsev University of Washington, Seattle 1 / 1
Walsh Diffusions Andrey Sarantsev University of California, Santa Barbara March 27, 2017 Andrey Sarantsev University of Washington, Seattle 1 / 1 Walsh Brownian Motion on R d Spinning measure µ: probability
More informationREVERSALS ON SFT S. 1. Introduction and preliminaries
Trends in Mathematics Information Center for Mathematical Sciences Volume 7, Number 2, December, 2004, Pages 119 125 REVERSALS ON SFT S JUNGSEOB LEE Abstract. Reversals of topological dynamical systems
More informationLecture Notes Introduction to Ergodic Theory
Lecture Notes Introduction to Ergodic Theory Tiago Pereira Department of Mathematics Imperial College London Our course consists of five introductory lectures on probabilistic aspects of dynamical systems,
More informationFormal power series rings, inverse limits, and I-adic completions of rings
Formal power series rings, inverse limits, and I-adic completions of rings Formal semigroup rings and formal power series rings We next want to explore the notion of a (formal) power series ring in finitely
More information1 Kernels Definitions Operations Finite State Space Regular Conditional Probabilities... 4
Stat 8501 Lecture Notes Markov Chains Charles J. Geyer April 23, 2014 Contents 1 Kernels 2 1.1 Definitions............................. 2 1.2 Operations............................ 3 1.3 Finite State Space........................
More informationPart II Probability and Measure
Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More information25.1 Markov Chain Monte Carlo (MCMC)
CS880: Approximations Algorithms Scribe: Dave Andrzejewski Lecturer: Shuchi Chawla Topic: Approx counting/sampling, MCMC methods Date: 4/4/07 The previous lecture showed that, for self-reducible problems,
More informationAnalysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*
Analysis of the Gibbs sampler for a model related to James-Stein estimators by Jeffrey S. Rosenthal* Department of Statistics University of Toronto Toronto, Ontario Canada M5S 1A1 Phone: 416 978-4594.
More informationLect4: Exact Sampling Techniques and MCMC Convergence Analysis
Lect4: Exact Sampling Techniques and MCMC Convergence Analysis. Exact sampling. Convergence analysis of MCMC. First-hit time analysis for MCMC--ways to analyze the proposals. Outline of the Module Definitions
More information