On Reparametrization and the Gibbs Sampler
Jorge Carlos Román, Department of Mathematics, Vanderbilt University
James P. Hobert, Department of Statistics, University of Florida
Brett Presnell, Department of Statistics, University of Florida

March 2014

(Román's research supported by NSF Grant DMS; Hobert's research supported by NSF Grant DMS. Corresponding author's e-mail address: jc.roman@vanderbilt.edu.)

Abstract

Gibbs samplers derived under different parametrizations of the target density can have radically different rates of convergence. In this article, we specify conditions under which reparametrization leaves the convergence rate of a Gibbs chain unchanged. An example illustrates how these results can be exploited in convergence rate analyses.

1 Introduction

It is well known that Gibbs samplers derived under different parametrizations of a Bayesian hierarchical model can have dramatically different rates of convergence (Gelfand et al., 1995; Papaspiliopoulos et al., 2007; Roberts and Sahu, 1997; Yu and Meng, 2011). In this article, we consider the reverse situation, in which reparametrization has no effect. To motivate our study, we begin with a fresh look at a well-known toy example involving a simple random effects model with known variance components.

Consider the one-way random effects model given by
$$Y_{ij} = \theta_i + \epsilon_{ij}, \quad (1)$$
$i = 1, \dots, c$, $j = 1, \dots, m_i$, where the $\theta_i$'s are independent and identically distributed (iid) $N(\mu, \sigma^2)$, and the $\epsilon_{ij}$'s are independent of the $\theta_i$'s and iid $N(0, \sigma_e^2)$. (For now, we restrict attention to the balanced case where $m_i \equiv m$.) Suppose that the variance components, $\sigma^2$ and $\sigma_e^2$, are known,
and that the prior on $\mu$ is flat. Let $\theta = (\theta_1, \dots, \theta_c)$ and let $y$ denote the observed data. A simple calculation shows that the posterior density of $\mu$ given $y$ is normal, but consider nevertheless the two-component Gibbs chain $\{(\mu_n, \theta_n)\}_{n=0}^\infty$ that alternately samples from the conditional distributions $\theta \mid \mu, y$ and $\mu \mid \theta, y$, which are $c$-variate normal and univariate normal, respectively. The marginal sequence $\{\mu_n\}_{n=0}^\infty$ is itself a Markov chain whose invariant density is the posterior density (of $\mu$ given $y$), and it is easy to show that the exact rate of convergence of this chain is $\sigma_e^2/(\sigma_e^2 + m\sigma^2)$ (see, e.g., Liu et al., 1994). The rate of convergence will be formally defined in Section 2, but for now it suffices to note that the rate is between 0 and 1, and smaller is better.

Now consider a reparametrized version of model (1) given by $Y_{ij} = \mu + u_i + \epsilon_{ij}$, where the $u_i$'s are iid $N(0, \sigma^2)$, and the $\epsilon_{ij}$'s are independent of the $u_i$'s and still iid $N(0, \sigma_e^2)$. Let $u = (u_1, \dots, u_c)$. This is called the non-centered parametrization (NCP), whereas model (1) is called the centered parametrization (CP). If we put the same flat prior on $\mu$, then the posterior density of $\mu$ given $y$ remains the same as in the CP model. However, the two-component Gibbs sampler derived from the NCP model, which alternates between draws from $u \mid \mu, y$ and $\mu \mid u, y$, is not the same as the one based on the CP. Furthermore, the two Gibbs samplers have completely different convergence behavior. Indeed, the convergence rate of the NCP Gibbs sampler is $1 - \sigma_e^2/(\sigma_e^2 + m\sigma^2)$. So when one of the two Gibbs samplers is very slow to converge, the other converges extremely rapidly. This simple example illustrates that reparametrization can significantly affect the convergence rate of the Gibbs sampler.

In a practical version of the one-way model, the variance components are unknown. In this case, the standard default prior density for $(\mu, \sigma^2, \sigma_e^2)$ is $1/(\sigma_e^2 \sqrt{\sigma^2})$.
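The contrasting rates in the known-variance toy example above are easy to check by simulation. The following minimal Python sketch (ours, for illustration only; the data and all variable names are hypothetical) runs both the CP and NCP Gibbs samplers. For this Gaussian target, the $\mu$-chain of each sampler is a first-order autoregression, so its lag-1 autocorrelation matches the corresponding theoretical rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: c groups, m observations per group; variances known.
c, m = 10, 4
sig2, sig2e = 1.0, 1.0                      # sigma^2 (between), sigma_e^2 (within)
y = rng.normal(0.0, np.sqrt(sig2 + sig2e), size=(c, m))
ybar_i = y.mean(axis=1)                     # group means
ybar = y.mean()                             # grand mean

w = (m / sig2e) / (m / sig2e + 1.0 / sig2)  # shrinkage weight m*sig2/(m*sig2 + sig2e)
v = 1.0 / (m / sig2e + 1.0 / sig2)          # conditional variance of each theta_i (or u_i)

def cp_gibbs(n):
    """Centered chain: alternate theta | mu, y and mu | theta, y."""
    mu, out = 0.0, np.empty(n)
    for t in range(n):
        theta = rng.normal(w * ybar_i + (1 - w) * mu, np.sqrt(v))
        mu = rng.normal(theta.mean(), np.sqrt(sig2 / c))
        out[t] = mu
    return out

def ncp_gibbs(n):
    """Non-centered chain: alternate u | mu, y and mu | u, y."""
    mu, out = 0.0, np.empty(n)
    for t in range(n):
        u = rng.normal(w * (ybar_i - mu), np.sqrt(v))
        mu = rng.normal(ybar - u.mean(), np.sqrt(sig2e / (c * m)))
        out[t] = mu
    return out

def lag1(x):
    """Empirical lag-1 autocorrelation of a scalar chain."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rate_cp = sig2e / (sig2e + m * sig2)            # = 0.2 for these settings
print(lag1(cp_gibbs(200_000)), rate_cp)         # empirical vs theoretical, ~0.2
print(lag1(ncp_gibbs(200_000)), 1 - rate_cp)    # ~0.8
```

With $\sigma^2 = \sigma_e^2 = 1$ and $m = 4$, the CP chain mixes rapidly (rate $0.2$) while the NCP chain is slow (rate $0.8$), consistent with the identities quoted above.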
We assume that the posterior is proper; see Román (2012) for conditions. The posterior density of $(\mu, \sigma^2, \sigma_e^2)$ given $y$, which is the same under CP and NCP, is intractable, so this is no longer a toy example. As in the known-variance case, there are two different versions of the standard two-component Gibbs sampler for this problem: the CP Gibbs sampler, which alternates between $\theta, \mu \mid \sigma^2, \sigma_e^2, y$ and $\sigma^2, \sigma_e^2 \mid \mu, \theta, y$, and the NCP Gibbs sampler, which alternates between $u, \mu \mid \sigma^2, \sigma_e^2, y$ and $\sigma^2, \sigma_e^2 \mid u, \mu, y$. The results of Section 3 imply that, in contrast with the known-variance case, these two Gibbs samplers converge at exactly the same rate. Consequently, convergence rate results for either of these Gibbs samplers apply directly to the other. In Section 3 we compare the results of Román (2012), who analyzed the NCP Gibbs sampler, with those of Tan and Hobert (2009), who studied the CP version.

The CP and NCP Gibbs Markov chains described above share the same rate of convergence because the transformation that takes the CP model to the NCP model involves variables ($\theta$ and $\mu$) that reside in the same component (or block) of the two-component Gibbs sampler. (Note that this
is not the case in the toy example where the variance components are known.) The main result in this paper is a formalization of this idea.

We now provide an overview of our results in the special case where the target distribution has a density with respect to Lebesgue measure. Suppose $f : \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \times \cdots \times \mathbb{R}^{d_k} \to [0, \infty)$ is a probability density function, and let $\Phi_1 = \{(X_n^{(1)}, X_n^{(2)}, \dots, X_n^{(k)})\}_{n=0}^\infty$ denote the Markov chain simulated by the $k$-component Gibbs sampler based on $f(x_1, x_2, \dots, x_k)$ that updates the components in the natural order. It is well known and easy to see that the marginal sequence $\tilde{\Phi}_1 := \{(X_n^{(2)}, \dots, X_n^{(k)})\}_{n=0}^\infty$ is also a Markov chain. Now, for $i \in \{2, 3, \dots, k\}$, let $\Phi_i$ denote the $k$-component Gibbs sampler whose update order is $(i, i+1, \dots, k, 1, 2, \dots, i-1)$, and let $\tilde{\Phi}_i$ denote the corresponding marginal Markov chain (that leaves out $X^{(i)}$). We show that all $2k$ of these chains converge at exactly the same rate. Not only is this fact the key to the proof of our main result concerning reparametrization, it is also useful from a practical standpoint. Indeed, if one wishes to know the rate of convergence of $\Phi_1$, then it suffices to study the lower-dimensional chain $\tilde{\Phi}_i$ (for any $i = 1, 2, \dots, k$), which may be easier to analyze than $\Phi_1$. This idea has been used to establish qualitative convergence results (such as geometric and uniform ergodicity) for two-component Gibbs samplers (see, e.g., Diebolt and Robert (1994) and Román and Hobert (2012)).

Now let $(X_1, X_2, \dots, X_k)$ denote a random vector with density $f$, and consider the $k$-component Gibbs sampler based on the distribution of $(t_1(X_1), t_2(X_2), \dots, t_k(X_k))$. Suppose $f(x_1, x_2, \dots, x_k)$ can be written as a function of $(t_1(x_1), t_2(x_2), \dots, t_k(x_k))$, an assumption that obviously holds if each $t_i : \mathbb{R}^{d_i} \to \mathbb{R}^{d_i}$ is invertible.
Then, by exploiting the fact that the $2k$ chains described above share the same rate, we show that the Gibbs samplers based on the original and the transformed variables converge at the same rate. An important implication of this result is that, when analyzing the convergence rate of a Gibbs sampler, one is free to choose a convenient parametrization, as long as the corresponding transformation respects the within-component restriction.

The remainder of this article is organized as follows. Section 2 contains some background on general state space Markov chain theory as well as preliminary results. Our main result, showing that a within-component reparametrization does not affect the convergence rate of the Gibbs Markov chain, can be found in Section 3. That section also contains the application of our main result to the Gibbs samplers for the one-way model with improper priors.
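The within-component invariance previewed above can be illustrated numerically with a target simpler than the one-way model. The sketch below is our own hypothetical illustration, not an example from the paper: it runs the two-component Gibbs sampler for a standard bivariate normal target with correlation $r$, and again after the invertible within-component reparametrization $t_1(x_1) = 2x_1 + 1$, $t_2(x_2) = x_2$. Both first-coordinate marginal chains are first-order autoregressions with coefficient $r^2$, so their empirical lag-1 autocorrelations agree.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 0.9          # correlation of the standard bivariate normal target

def lag1(x):
    """Empirical lag-1 autocorrelation of a scalar chain."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def gibbs_original(n):
    """Gibbs sampler for (X1, X2) standard bivariate normal, correlation r."""
    x1, out = 0.0, np.empty(n)
    for t in range(n):
        x2 = rng.normal(r * x1, np.sqrt(1 - r**2))
        x1 = rng.normal(r * x2, np.sqrt(1 - r**2))
        out[t] = x1
    return out

def gibbs_reparam(n):
    """Same target after the within-component map t1(x1) = 2*x1 + 1."""
    y1, out = 1.0, np.empty(n)
    for t in range(n):
        x2 = rng.normal(r * (y1 - 1) / 2, np.sqrt(1 - r**2))
        y1 = rng.normal(2 * r * x2 + 1, 2 * np.sqrt(1 - r**2))
        out[t] = y1
    return out

# Both chains are AR(1) in their first coordinate with coefficient r**2,
# so the empirical lag-1 autocorrelations agree (~0.81 for r = 0.9).
print(lag1(gibbs_original(200_000)), lag1(gibbs_reparam(200_000)), r**2)
```

The transformation acts only on the first block, so, in the spirit of the main result, the two samplers are reparametrizations of one another that share the same convergence rate.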
2 Markov Chain Background and Preliminary Results

As in Meyn and Tweedie (1993, Chapter 3), let $P(x, dy)$ be a generic Markov transition function (MTF) on a set $X$ equipped with a countably generated $\sigma$-algebra. Let $P^n(x, dy)$ denote the $n$-step MTF. We assume throughout that the chain determined by $P$ is $\psi$-irreducible, aperiodic and positive recurrent with invariant probability measure $\pi$. We do not assume reversibility. For a measure $\nu$ on $X$, let $\nu P^n(dy) = \int_X P^n(x, dy)\, \nu(dx)$. Following Roberts and Tweedie (2001) and Rosenthal (2003), define the $L^1$-rate of convergence of the Markov chain as
$$\rho = \exp\Big\{ \sup_{\nu \in p(\pi)} \lim_{n \to \infty} \frac{1}{n} \log \|\nu P^n - \pi\|_{TV} \Big\},$$
where $\|\cdot\|_{TV}$ denotes the total variation norm for signed measures and $p(\pi)$ is the set of all probability measures $\nu$ that are absolutely continuous with respect to $\pi$ with $\int_X (d\nu/d\pi)^2\, d\pi < \infty$. For reversible chains, $\rho$ equals the usual rate of convergence, i.e., the spectral radius (and norm) of the self-adjoint Markov operator defined by $P$ (Rosenthal, 2003, Proposition 2). As in Roberts and Rosenthal (1997), we say that the chain (or the corresponding MTF) is $\pi$-a.e. geometrically ergodic if there exist $M : X \to (0, \infty)$ and $\kappa < 1$ such that, for $\pi$-a.e. $x \in X$,
$$\|P^n(x, \cdot) - \pi(\cdot)\|_{TV} \le M(x)\, \kappa^n \quad \text{for all } n \in \mathbb{N}.$$
We often omit the "$\pi$-a.e." and simply write "geometrically ergodic." The next proposition follows easily from results in Roberts and Rosenthal (1997) and Roberts and Tweedie (2001).

Proposition 1. The Markov chain based on $P$ is geometrically ergodic if and only if $\rho < 1$.

Now, for $i = 1, 2, \dots, k$, let $(X_i, \mathcal{F}_i, \mu_i)$ denote $\sigma$-finite measure spaces, and let $(X, \mathcal{F}, \mu)$ denote the corresponding product space. Suppose that $\pi$ is a probability distribution on $(X, \mathcal{F})$ having density $f(x_1, x_2, \dots, x_k)$ with respect to $\mu$. Let $P_i$ denote the MTF of the $k$-component Gibbs sampler whose update order is $(i, i+1, \dots, k, 1, 2, \dots, i-1)$, and let $Q_i$ denote the MTF of the corresponding marginal Markov chain (that leaves out the $i$th component).
A proof of the following result can be found in the Appendix.

Proposition 2. The Markov chains defined by the MTFs $\{P_i\}_{i=1}^k$ and $\{Q_i\}_{i=1}^k$ all share the same $L^1$ convergence rate.

In conjunction with Proposition 1, Proposition 2 shows that geometric ergodicity is a solidarity property for the $2k$ chains defined by $\{P_i\}_{i=1}^k$ and $\{Q_i\}_{i=1}^k$. That is, either all $2k$ chains are geometrically ergodic, or none of them is. This result is actually well known when $k = 2$. Indeed, in
that case, Diaconis et al.'s (2008) Lemma 2.4 shows that geometric ergodicity is a solidarity property for $P_1$ and $Q_1$, and symmetry implies that the same holds for $P_2$ and $Q_2$. (These facts can also be established using results in Roberts and Rosenthal (2001).) Furthermore, when $k = 2$, the marginal Markov chains defined by $Q_1$ and $Q_2$ are reversible, and the norms of the corresponding self-adjoint Markov operators are identical (Liu et al., 1994). Then, because a reversible Markov chain is geometrically ergodic if and only if the norm of its Markov operator is strictly less than one (Roberts and Rosenthal, 1997), it follows that $Q_1$ is geometrically ergodic if and only if $Q_2$ is geometrically ergodic, completing the cycle, and the argument (for $k = 2$).

3 Reparametrization

Suppose that $(X_1, X_2, \dots, X_k)$ has (joint) distribution $\pi$, and let $\tilde\pi$ represent the distribution of $(t_1(X_1), t_2(X_2), \dots, t_k(X_k))$. Under what conditions does the Gibbs sampler based on $\tilde\pi$ have the same rate of convergence as the sampler based on $\pi$; i.e., when is the convergence rate of the Gibbs sampler unchanged by within-block transformations? To formalize this question, let $(X_i, \mathcal{F}_i, \mu_i)$, $i = 1, 2, \dots, k$, $(X, \mathcal{F}, \mu)$, $\pi$, and $f$ be as in the previous section. Let $(Y_i, \mathcal{G}_i)$, $i = 1, 2, \dots, k$, be measurable spaces, let $(Y, \mathcal{G})$ be their product, and assume that $t_i : X_i \to Y_i$, $i = 1, 2, \dots, k$, are measurable transformations. Finally, let $T(x_1, x_2, \dots, x_k) = (t_1(x_1), t_2(x_2), \dots, t_k(x_k))$ and let $\tilde\pi = \pi \circ T^{-1}$ be the probability distribution induced on $(Y, \mathcal{G})$ by the transformation $T$, i.e., $\tilde\pi(B) = \pi(T^{-1}(B))$, $B \in \mathcal{G}$, where $T^{-1}(B)$ is the pre-image of $B$ under $T$. The following result is proved in the Appendix.

Proposition 3. Suppose that there exists a measurable function $\tilde f : Y \to \mathbb{R}$ such that, for all $(x_1, x_2, \dots, x_k) \in X$,
$$f(x_1, x_2, \dots, x_k) = \tilde f(t_1(x_1), t_2(x_2), \dots, t_k(x_k)). \quad (2)$$
Then the $k$-component Gibbs samplers based on $\pi$ and $\tilde\pi$ (both updating the components in the natural order) have the same $L^1$-rate of convergence.

Remark. The main hypothesis of Proposition 3 clearly holds when each $t_i$ is an invertible function (with measurable inverse), since then (2) holds with $\tilde f(y_1, y_2, \dots, y_k) = f\big(t_1^{-1}(y_1), t_2^{-1}(y_2), \dots, t_k^{-1}(y_k)\big)$.

We now return to the CP and NCP Gibbs samplers for the one-way model. In the Introduction, we considered only the balanced case, in which all the $m_i$'s are the same, and we considered only one prior density. Here we allow the $m_i$ to differ, and we consider a family of prior densities for
$(\mu, \sigma^2, \sigma_e^2)$ given by $(\sigma^2)^{-(a+1)} (\sigma_e^2)^{-(b+1)}\, I_{(0,\infty)}(\sigma_e^2)\, I_{(0,\infty)}(\sigma^2)$, where $a, b$ are hyperparameters. Note that by taking $(a, b) = (-1/2, 0)$, we recover the default prior from the Introduction. Tan and Hobert (2009) analyzed the Gibbs sampler based on the CP version of the one-way model and proved that the CP Gibbs Markov chain is geometrically ergodic if $a < 0$,
$$M + 2b \ge c + 3 \quad \text{and} \quad \frac{c}{m}\Big\{\min_{i=1,\dots,c}\Big(\frac{m_i}{m_i + 1}\Big)\Big\}^{-1} < 2 \exp\Big\{\Psi\Big(\frac{M}{2} + a\Big)\Big\},$$
where $M = \sum_{i=1}^c m_i$, $m = \max\{m_1, m_2, \dots, m_c\}$ and $\Psi(x) = \frac{d}{dx}\log\big(\Gamma(x)\big)$ is the digamma function. Román (2012) (see also Román and Hobert (2012)) subsequently proved that the NCP Gibbs Markov chain is geometrically ergodic if $a < 0$,
$$M + 2b \ge c + 2 \quad \text{and} \quad 1 < 2 \exp\Big\{\Psi\Big(\frac{c}{2} + a\Big)\Big\}.$$
It is easy to see that Román's conditions are weaker (i.e., easier to satisfy) than those of Tan and Hobert. However, the two sets of conditions are directly comparable only if geometric ergodicity is a solidarity property for the two different Gibbs chains. Let $\pi(\theta, \mu, \sigma^2, \sigma_e^2 \mid y)$ denote the complete data posterior density under the CP model, which is the invariant density of the CP Gibbs Markov chain. Consider a one-to-one transformation of $\big((\theta, \mu), (\sigma^2, \sigma_e^2)\big)$ to $\big(t(\theta, \mu), (\sigma^2, \sigma_e^2)\big)$, where $t : \mathbb{R}^{c+1} \to \mathbb{R}^{c+1}$ is defined as follows:
$$t(\theta, \mu) = \big(\theta_1 - \mu,\ \theta_2 - \mu,\ \dots,\ \theta_c - \mu,\ \mu\big).$$
The density of the transformed variable is exactly the complete data posterior density under the NCP model, so Proposition 3 implies that the CP and NCP Gibbs chains share the same $L^1$-rate. Thus, Román's (2012) result is indeed an improvement upon that of Tan and Hobert (2009).

We now present an example involving a transformation that is not one-to-one. Consider a pair of random variables $(X_1, X_2)$ such that
$$X_1 \mid X_2 = x_2 \sim N(0, 1/x_2) \quad (3)$$
and $X_2 \sim \mathrm{Gamma}\big(\frac{\nu}{2}, \frac{\nu}{2}\big)$, where $\nu > 0$ is a known constant. Then the density of $(X_1, X_2)$ is
$$f(x_1, x_2) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)\sqrt{2\pi}}\, x_2^{(\nu - 1)/2} \exp\Big\{-\frac{x_2}{2}\big(x_1^2 + \nu\big)\Big\}\, I_{(0,\infty)}(x_2),$$
and it can be shown that
$$X_2 \mid X_1 = x_1 \sim \mathrm{Gamma}\Big(\frac{\nu + 1}{2}, \frac{1}{2}\big(x_1^2 + \nu\big)\Big). \quad (4)$$
Although direct simulation of $(X_1, X_2)$ is clearly possible, consider the Gibbs sampler which uses the conditionals in (3) and (4). Suppose we use the transformation $U_1 = t_1(X_1) = X_1^2$ (which is not one-to-one) together with $U_2 = t_2(X_2) = X_2$. Since $X_1 \mid X_2 = x_2 \sim N(0, 1/x_2)$, it follows immediately, using a $\chi^2$-type calculation, that $X_1^2 \mid X_2 = x_2 \sim \mathrm{Gamma}(1/2, x_2/2)$. In other words,
$$U_1 \mid U_2 = u_2 \sim \mathrm{Gamma}(1/2, u_2/2). \quad (5)$$
Obviously, $U_2 \sim \mathrm{Gamma}\big(\frac{\nu}{2}, \frac{\nu}{2}\big)$, and the density of $(U_1, U_2)$ is
$$f_U(u_1, u_2) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\, \frac{1}{\sqrt{2\pi u_1}}\, u_2^{(\nu - 1)/2} \exp\Big\{-\frac{u_2}{2}\big(u_1 + \nu\big)\Big\}\, I_{(0,\infty)}(u_1)\, I_{(0,\infty)}(u_2).$$
Moreover, a simple calculation shows that
$$U_2 \mid U_1 = u_1 \sim \mathrm{Gamma}\Big(\frac{\nu + 1}{2}, \frac{1}{2}\big(u_1 + \nu\big)\Big). \quad (6)$$
The associated Gibbs sampler can be simulated using the conditionals given in (5) and (6). Finally, because the joint density of $(X_1, X_2)$ depends on $x_1$ only through $t_1(x_1) = x_1^2$, the condition in Proposition 3 is satisfied and we conclude that the Gibbs samplers associated with $f$ and $f_U$ converge at the same $L^1$-rate.

Appendix

Proof of Proposition 2. We will prove the result for $k = 3$. The extension to general $k$ is obvious and only involves more complicated notation. The proof has two parts: first we show that $P_1$, $P_2$ and $P_3$ share the same $L^1$ rate; then we show that $P_i$ and $Q_i$ have the same $L^1$ rate for $i = 1, 2, 3$.

We prove the first result by showing that $\rho_1 \le \rho_2 \le \rho_3 \le \rho_1$, where $\rho_i$ denotes the $L^1$ rate of $P_i$. For this, we need only show that $\rho_1 \le \rho_2$, with the remaining inequalities following by symmetry. To prove $\rho_1 \le \rho_2$, we show that for each fixed $\nu \in p(\pi)$, there exists a $\tilde\nu \in p(\pi)$ such that, for all $n \in \mathbb{N}$,
$$\|\nu P_1^{n+1} - \pi\|_{TV} \le \|\tilde\nu P_2^n - \pi\|_{TV}. \quad (7)$$
From this it follows that
$$\lim_{n \to \infty} n^{-1} \log \|\nu P_1^n - \pi\|_{TV} \le \lim_{n \to \infty} n^{-1} \log \|\tilde\nu P_2^n - \pi\|_{TV} \le \log(\rho_2),$$
which implies $\rho_1 \le \rho_2$. To prove (7), let $(X_1, X_2, X_3)$ have distribution $\pi$, and let $f_{1|23}(x_1 \mid x_2, x_3)$, $f_{2|13}(x_2 \mid x_1, x_3)$, and $f_{3|12}(x_3 \mid x_1, x_2)$ represent the conditional densities of $X_1$ (given $X_2$ and $X_3$), of $X_2$ (given $X_1$ and $X_3$), and of $X_3$ (given $X_1$ and $X_2$), respectively. For $i = 1, 2$ and $A \in \mathcal{F}$, we have
$$P_i\big((x_1, x_2, x_3), A\big) = \int_A k_i(x_1', x_2', x_3' \mid x_1, x_2, x_3)\, \mu(d(x_1', x_2', x_3')),$$
where $k_1$ and $k_2$ are the Markov transition densities associated with $P_1$ and $P_2$, respectively. Of course,
$$k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3) = f_{1|23}(x_1' \mid x_2, x_3)\, f_{2|13}(x_2' \mid x_1', x_3)\, f_{3|12}(x_3' \mid x_1', x_2'),$$
and $k_2$ is defined analogously. It is convenient to express each of $P_1$ and $P_2$ as the composition of three simple transition kernels. To this end, let $\delta_x(\cdot)$ denote a point mass measure at $x$ and let
$$P_{1|23}\big((x_1, x_2, x_3), A\big) = \int_A f_{1|23}(x_1' \mid x_2, x_3)\, (\mu_1 \times \delta_{x_2} \times \delta_{x_3})(d(x_1', x_2', x_3'))$$
be the kernel associated with the single update of $X_1$ (given $X_2$ and $X_3$). Define the kernels associated with the (conditional) updates of $X_2$ and $X_3$ analogously and call them $P_{2|13}$ and $P_{3|12}$, respectively. A routine calculation shows that $P_1 = P_{1|23} P_{2|13} P_{3|12}$ and $P_2 = P_{2|13} P_{3|12} P_{1|23}$.

Given $\nu \in p(\pi)$ having density $q$ with respect to $\pi$, let $\tilde\nu = \nu P_{1|23}$. A straightforward calculation shows that $\tilde\nu$ has density $\tilde q = P_{1|23}\, q$ with respect to $\pi$. Moreover, a simple application of Jensen's inequality shows that $\int_X (\tilde q)^2\, d\pi < \infty$, so $\tilde\nu \in p(\pi)$. Also, given a function $g : X \to [-1, 1]$, let $\hat g = P_{2|13} P_{3|12}\, g$ and note that $\|\hat g\|_\infty \le 1$, where $\|\cdot\|_\infty$ is the supremum norm. Writing $P_1$ and $P_2$ in terms of the kernels $P_{1|23}$, $P_{2|13}$ and $P_{3|12}$, and using a simple induction argument, we obtain that, for any $n \ge 1$, $\nu P_1^{n+1} g = \tilde\nu P_2^n \hat g$ for all $\nu \in p(\pi)$ and all $g : X \to [-1, 1]$. Finally, since $\pi = \pi P_{2|13} = \pi P_{3|12}$, we have that $\pi = \pi P_{2|13} P_{3|12}$ and thus $\pi g = \pi \hat g$. Hence,
$$\big|\nu P_1^{n+1}(g) - \pi(g)\big| = \big|\tilde\nu P_2^n(\hat g) - \pi(\hat g)\big| \le \sup_{\{h : \|h\|_\infty \le 1\}} \big|\tilde\nu P_2^n(h) - \pi(h)\big| = 2\, \|\tilde\nu P_2^n - \pi\|_{TV},$$
and because $g$ was arbitrary, (7) follows.
For the second part of the proof, let $\eta_i$ denote the $L^1$ rate for $Q_i$, $i = 1, 2, 3$. We will show that $\rho_1 = \eta_1$. The other two equivalences then follow by symmetry. For a measurable set $B$ in $X_2 \times X_3$,
$$Q_1\big((x_2, x_3), B\big) = \int_B \Big[\int_{X_1} k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3)\, \mu_1(dx_1')\Big]\, (\mu_2 \times \mu_3)(d(x_2', x_3')),$$
and the corresponding invariant distribution is given by
$$\pi_{23}(B) = \int_B \Big[\int_{X_1} f(x_1, x_2, x_3)\, \mu_1(dx_1)\Big]\, (\mu_2 \times \mu_3)(d(x_2, x_3)).$$
Given $\alpha \in p(\pi_{23})$ and $g : X_2 \times X_3 \to [-1, 1]$, define $\tilde\alpha \in p(\pi)$ by
$$\tilde\alpha(A) = \int_A \frac{d\alpha}{d\pi_{23}}(x_2, x_3)\, \pi(d(x_1, x_2, x_3))$$
and $\check g : X \to [-1, 1]$ by $\check g(x_1, x_2, x_3) = g(x_2, x_3)$, respectively. Then
$$(P_1 \check g)(x_1, x_2, x_3) = \int_X k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3)\, g(x_2', x_3')\, \mu(d(x_1', x_2', x_3')) = \int_{X_3}\int_{X_2} \Big[\int_{X_1} k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3)\, \mu_1(dx_1')\Big] g(x_2', x_3')\, \mu_2(dx_2')\, \mu_3(dx_3') = (Q_1 g)(x_2, x_3),$$
and it follows by induction that $(P_1^n \check g)(x_1, x_2, x_3) = (Q_1^n g)(x_2, x_3)$ for all $n \ge 1$. Thus,
$$\tilde\alpha(P_1^n \check g) = \int_X (P_1^n \check g)(x_1, x_2, x_3)\, \frac{d\alpha}{d\pi_{23}}(x_2, x_3)\, \pi(d(x_1, x_2, x_3)) = \int_X (Q_1^n g)(x_2, x_3)\, \frac{d\alpha}{d\pi_{23}}(x_2, x_3)\, \pi(d(x_1, x_2, x_3)) = \int_{X_3}\int_{X_2} (Q_1^n g)(x_2, x_3)\, \frac{d\alpha}{d\pi_{23}}(x_2, x_3) \Big[\int_{X_1} f(x_1, x_2, x_3)\, \mu_1(dx_1)\Big] \mu_2(dx_2)\, \mu_3(dx_3) = \int_{X_3}\int_{X_2} (Q_1^n g)(x_2, x_3)\, \frac{d\alpha}{d\pi_{23}}(x_2, x_3)\, \pi_{23}(d(x_2, x_3)) = \alpha(Q_1^n g).$$
Finally, since $\pi(\check g) = \pi_{23}(g)$, we have
$$\big|\alpha Q_1^n(g) - \pi_{23}(g)\big| = \big|\tilde\alpha P_1^n(\check g) - \pi(\check g)\big| \le \sup_{\{h : \|h\|_\infty \le 1\}} \big|\tilde\alpha P_1^n(h) - \pi(h)\big| = 2\, \|\tilde\alpha P_1^n - \pi\|_{TV}$$
for all $n \ge 1$, and since $g : X_2 \times X_3 \to [-1, 1]$ was arbitrary, $\|\alpha Q_1^n - \pi_{23}\|_{TV} \le \|\tilde\alpha P_1^n - \pi\|_{TV}$ for all $n \ge 1$. This proves that $\eta_1 \le \rho_1$.

To prove the reverse inequality, let $\nu \in p(\pi)$ and $g : X \to [-1, 1]$, and define $\tilde\nu \in p(\pi_{23})$ by
$$\tilde\nu(B) = \int_B \Big[\int_{X_1} \frac{d\nu}{d\pi}(x_1, x_2, x_3)\, f_{1|23}(x_1 \mid x_2, x_3)\, \mu_1(dx_1)\Big]\, \pi_{23}(d(x_2, x_3)),$$
and, noting that $(P_1 g)(x_1, x_2, x_3)$ does not depend on $x_1$, let $\check g(x_2, x_3) = (P_1 g)(x_1, x_2, x_3)$. An induction argument similar to the one above shows that $(P_1^{n+1} g)(x_1, x_2, x_3) = (Q_1^n \check g)(x_2, x_3)$ for
all $n \ge 1$, and thus
$$\nu(P_1^{n+1} g) = \int_{X_3}\int_{X_2}\int_{X_1} (Q_1^n \check g)(x_2, x_3)\, \frac{d\nu}{d\pi}(x_1, x_2, x_3)\, f(x_1, x_2, x_3)\, \mu_1(dx_1)\, \mu_2(dx_2)\, \mu_3(dx_3) = \int_{X_3}\int_{X_2} (Q_1^n \check g)(x_2, x_3) \Big[\int_{X_1} \frac{d\nu}{d\pi}(x_1, x_2, x_3)\, f_{1|23}(x_1 \mid x_2, x_3)\, \mu_1(dx_1)\Big]\, \pi_{23}(d(x_2, x_3)) = \int_{X_3}\int_{X_2} (Q_1^n \check g)(x_2, x_3)\, \frac{d\tilde\nu}{d\pi_{23}}(x_2, x_3)\, \pi_{23}(d(x_2, x_3)) = \tilde\nu(Q_1^n \check g).$$
Finally, since $\pi(g) = (\pi P_1)(g) = \pi(P_1 g) = \pi_{23}(\check g)$, we have
$$\big|\nu P_1^{n+1}(g) - \pi(g)\big| = \big|\tilde\nu Q_1^n(\check g) - \pi_{23}(\check g)\big| \le \sup_{\{h : \|h\|_\infty \le 1\}} \big|\tilde\nu Q_1^n(h) - \pi_{23}(h)\big| = 2\, \|\tilde\nu Q_1^n - \pi_{23}\|_{TV}$$
for all $n \ge 1$. Since $g : X \to [-1, 1]$ was arbitrary, it follows that $\|\nu P_1^{n+1} - \pi\|_{TV} \le \|\tilde\nu Q_1^n - \pi_{23}\|_{TV}$. This implies that $\rho_1 \le \eta_1$, completing the proof of the proposition.

A few technical remarks will be helpful before beginning the proof of Proposition 3. We will employ the following lemma.

Lemma 1. Let $(X, \mathcal{F}, \mu)$ be a measure space, let $(Y, \mathcal{G})$ be a measurable space, and let $\pi$ be a probability measure on $(X, \mathcal{F})$ having density $f$ with respect to $\mu$. Suppose that $T : X \to Y$ is measurable and that $f(x) = \tilde f(T(x))$ for some measurable function $\tilde f : Y \to \mathbb{R}$. Let $\nu = \mu \circ T^{-1}$ be the measure induced on $(Y, \mathcal{G})$ by $\mu$ and $T$. Similarly, let $\tilde\pi = \pi \circ T^{-1}$ be the probability measure induced on $(Y, \mathcal{G})$ by $\pi$ and $T$. Then $\tilde\pi$ has density $\tilde f$ with respect to $\nu$.

Proof. By change of variables (Billingsley, 1995, Theorem 16.13), for any $B \in \mathcal{G}$, we have
$$\tilde\pi(B) = \pi\big(T^{-1}(B)\big) = \int_{T^{-1}(B)} \tilde f(T(x))\, \mu(dx) = \int_B \tilde f(y)\, (\mu \circ T^{-1})(dy) = \int_B \tilde f(y)\, \nu(dy).$$

Returning to the specific context of Proposition 3, consider the product spaces $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{G})$, and the transformation $T(x_1, x_2, \dots, x_k) = (t_1(x_1), t_2(x_2), \dots, t_k(x_k))$. By Lemma 1, $\tilde\pi = \pi \circ T^{-1}$ has density $\tilde f$ with respect to the measure $\nu = \mu \circ T^{-1}$. Let $\nu_i = \mu_i \circ t_i^{-1}$, $i = 1, 2, \dots, k$. If the measure spaces $(Y_i, \mathcal{G}_i, \nu_i)$, $i = 1, 2, \dots, k$, are $\sigma$-finite, then it is easy to check that $\nu$ is equal to the product measure $\nu_1 \times \nu_2 \times \cdots \times \nu_k$.
However, there is nothing in our hypotheses to guarantee that the $\nu_i$ are $\sigma$-finite, and if any of them fails to be $\sigma$-finite, then technical difficulties arise that invalidate our proof.
Fortunately, we may assume without loss of generality that the $\nu_i$ are $\sigma$-finite, and even finite. To see this, let $\pi_i$ denote the $i$th marginal distribution of $\pi$ and let $f_i$ denote the density of $\pi_i$ with respect to $\mu_i$, which may be computed in the usual way by integrating $f$ over all but its $i$th coordinate. Then it is easy to check that $\pi$ has density $\check f$ with respect to the product measure $\pi_1 \times \pi_2 \times \cdots \times \pi_k$, where
$$\check f(x_1, x_2, \dots, x_k) = \begin{cases} \dfrac{f(x_1, x_2, \dots, x_k)}{f_1(x_1) f_2(x_2) \cdots f_k(x_k)}, & \text{if } f_1(x_1) f_2(x_2) \cdots f_k(x_k) > 0, \\[4pt] 0, & \text{otherwise.} \end{cases}$$
Now let
$$\tilde f_1(y_1) = \int_{X_k} \cdots \int_{X_2} \tilde f(y_1, t_2(x_2), \dots, t_k(x_k))\, \mu_2(dx_2) \cdots \mu_k(dx_k),$$
and define $\tilde f_2, \dots, \tilde f_k$ similarly. From (2), it is obvious that $f_i(x_i) = \tilde f_i(t_i(x_i))$, $i = 1, 2, \dots, k$, and it follows that $\check f(x_1, x_2, \dots, x_k)$ is a function of $(t_1(x_1), t_2(x_2), \dots, t_k(x_k))$. Thus, the hypotheses of Proposition 3 also hold upon replacement of $f$ by $\check f$ and $\mu_i$ by $\pi_i$, $i = 1, 2, \dots, k$. But in this case $\nu_i = \pi_i \circ t_i^{-1}$, which is a probability measure, and hence finite.

Proof of Proposition 3. Again, we will prove the result for $k = 3$. Assume, without loss of generality, that $\nu_i = \mu_i \circ t_i^{-1}$ is $\sigma$-finite for $i = 1, 2, 3$. Let $(X_1, X_2, X_3)$ have distribution $\pi$. We first prove that the Gibbs sampler based on (the distribution of) $(t_1(X_1), X_2, X_3)$ has the same $L^1$-rate of convergence as the one based on $(X_1, X_2, X_3)$. A similar argument then implies that this rate of convergence is shared by the Gibbs sampler based on $(t_1(X_1), t_2(X_2), X_3)$, and then by the Gibbs sampler based on $(t_1(X_1), t_2(X_2), t_3(X_3))$, thus proving the result.

By Lemma 1, $(t_1(X_1), X_2, X_3)$ has density $g(y_1, x_2, x_3) = \tilde f(y_1, t_2(x_2), t_3(x_3))$ with respect to $\nu_1 \times \mu_2 \times \mu_3$.
Letting $g_{1|23}(y_1 \mid x_2, x_3)$, $g_{2|13}(x_2 \mid y_1, x_3)$, and $g_{3|12}(x_3 \mid y_1, x_2)$ represent the corresponding conditional densities, the Gibbs sampler based on $(t_1(X_1), X_2, X_3)$ has transition density
$$\tilde k_1(y_1', x_2', x_3' \mid y_1, x_2, x_3) = g_{1|23}(y_1' \mid x_2, x_3)\, g_{2|13}(x_2' \mid y_1', x_3)\, g_{3|12}(x_3' \mid y_1', x_2')$$
with respect to $\nu_1 \times \mu_2 \times \mu_3$. But $g(t_1(x_1), x_2, x_3) = \tilde f(t_1(x_1), t_2(x_2), t_3(x_3)) = f(x_1, x_2, x_3)$, and from this it is easily checked that $g_{1|23}(t_1(x_1) \mid x_2, x_3) = f_{1|23}(x_1 \mid x_2, x_3)$, $g_{2|13}(x_2 \mid t_1(x_1), x_3) =$
$f_{2|13}(x_2 \mid x_1, x_3)$, and $g_{3|12}(x_3 \mid t_1(x_1), x_2) = f_{3|12}(x_3 \mid x_1, x_2)$. By change of variables,
$$\int_{Y_1} \tilde k_1(y_1', x_2', x_3' \mid y_1, x_2, x_3)\, \nu_1(dy_1') = \int_{Y_1} g_{1|23}(y_1' \mid x_2, x_3)\, g_{2|13}(x_2' \mid y_1', x_3)\, g_{3|12}(x_3' \mid y_1', x_2')\, (\mu_1 \circ t_1^{-1})(dy_1') = \int_{X_1} g_{1|23}(t_1(x_1') \mid x_2, x_3)\, g_{2|13}(x_2' \mid t_1(x_1'), x_3)\, g_{3|12}(x_3' \mid t_1(x_1'), x_2')\, \mu_1(dx_1') = \int_{X_1} f_{1|23}(x_1' \mid x_2, x_3)\, f_{2|13}(x_2' \mid x_1', x_3)\, f_{3|12}(x_3' \mid x_1', x_2')\, \mu_1(dx_1') = \int_{X_1} k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3)\, \mu_1(dx_1'),$$
and thus the marginal $(X_2, X_3)$ chain of the Gibbs sampler based on $(t_1(X_1), X_2, X_3)$ has the same transition density (with respect to $\mu_2 \times \mu_3$) as the marginal $(X_2, X_3)$ chain of the Gibbs sampler based on $(X_1, X_2, X_3)$. This implies that the two marginal chains have the same $L^1$-convergence rate, and it follows from Proposition 2 that the two parent chains also share this rate.

Acknowledgments

The authors thank Aixin Tan and an anonymous referee for helpful comments and suggestions.

References

BILLINGSLEY, P. (1995). Probability and Measure. 3rd ed. John Wiley and Sons, New York.

DIACONIS, P., KHARE, K. and SALOFF-COSTE, L. (2008). Gibbs sampling, exponential families and orthogonal polynomials (with discussion). Statistical Science.

DIEBOLT, J. and ROBERT, C. P. (1994). Estimation of finite mixture distributions by Bayesian sampling. Journal of the Royal Statistical Society, Series B.

GELFAND, A. E., SAHU, S. K. and CARLIN, B. P. (1995). Efficient parametrisations for normal linear mixed models. Biometrika.

LIU, J. S., WONG, W. H. and KONG, A. (1994). Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika.

MEYN, S. P. and TWEEDIE, R. L. (1993). Markov Chains and Stochastic Stability. Springer-Verlag, London.
PAPASPILIOPOULOS, O., ROBERTS, G. O. and SKÖLD, M. (2007). A general framework for the parametrization of hierarchical models. Statistical Science.

ROBERTS, G. and SAHU, S. K. (1997). Updating schemes, correlation structure, blocking and parameterisation for the Gibbs sampler. Journal of the Royal Statistical Society, Series B.

ROBERTS, G. O. and ROSENTHAL, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability.

ROBERTS, G. O. and ROSENTHAL, J. S. (2001). Markov chains and de-initializing processes. Scandinavian Journal of Statistics.

ROBERTS, G. O. and TWEEDIE, R. L. (2001). Geometric $L^2$ and $L^1$ convergence are equivalent for reversible Markov chains. Journal of Applied Probability, 38A.

ROMÁN, J. C. (2012). Convergence Analysis of Block Gibbs Samplers for Bayesian General Linear Mixed Models. Ph.D. thesis, Department of Statistics, University of Florida.

ROMÁN, J. C. and HOBERT, J. P. (2012). Convergence analysis of the Gibbs sampler for Bayesian general linear mixed models with improper priors. Annals of Statistics.

ROSENTHAL, J. S. (2003). Asymptotic variance and convergence rates of nearly-periodic MCMC algorithms. Journal of the American Statistical Association.

TAN, A. and HOBERT, J. P. (2009). Block Gibbs sampling for Bayesian random effects models with improper priors: Convergence and regeneration. Journal of Computational and Graphical Statistics.

YU, Y. and MENG, X.-L. (2011). To center or not to center: That is not the question. An ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency (with discussion). Journal of Computational and Graphical Statistics.
More informationSimultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms
Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Yan Bai Feb 2009; Revised Nov 2009 Abstract In the paper, we mainly study ergodicity of adaptive MCMC algorithms. Assume that
More informationPartially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing
Partially Collapsed Gibbs Samplers: Theory and Methods David A. van Dyk 1 and Taeyoung Park Ever increasing computational power along with ever more sophisticated statistical computing techniques is making
More informationA Geometric Interpretation of the Metropolis Hastings Algorithm
Statistical Science 2, Vol. 6, No., 5 9 A Geometric Interpretation of the Metropolis Hastings Algorithm Louis J. Billera and Persi Diaconis Abstract. The Metropolis Hastings algorithm transforms a given
More informationA regeneration proof of the central limit theorem for uniformly ergodic Markov chains
A regeneration proof of the central limit theorem for uniformly ergodic Markov chains By AJAY JASRA Department of Mathematics, Imperial College London, SW7 2AZ, London, UK and CHAO YANG Department of Mathematics,
More informationAnalysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*
Analysis of the Gibbs sampler for a model related to James-Stein estimators by Jeffrey S. Rosenthal* Department of Statistics University of Toronto Toronto, Ontario Canada M5S 1A1 Phone: 416 978-4594.
More informationMinorization Conditions and Convergence Rates for Markov Chain Monte Carlo. (September, 1993; revised July, 1994.)
Minorization Conditions and Convergence Rates for Markov Chain Monte Carlo September, 1993; revised July, 1994. Appeared in Journal of the American Statistical Association 90 1995, 558 566. by Jeffrey
More informationPartially Collapsed Gibbs Samplers: Theory and Methods
David A. VAN DYK and Taeyoung PARK Partially Collapsed Gibbs Samplers: Theory and Methods Ever-increasing computational power, along with ever more sophisticated statistical computing techniques, is making
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationYaming Yu Department of Statistics, University of California, Irvine Xiao-Li Meng Department of Statistics, Harvard University.
Appendices to To Center or Not to Center: That is Not the Question An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency Yaming Yu Department of Statistics, University of
More informationVariance Bounding Markov Chains
Variance Bounding Markov Chains by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 2006; revised April 2007.) Abstract. We introduce a new property of Markov chains, called variance bounding.
More informationLECTURE 15 Markov chain Monte Carlo
LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte
More informationFaithful couplings of Markov chains: now equals forever
Faithful couplings of Markov chains: now equals forever by Jeffrey S. Rosenthal* Department of Statistics, University of Toronto, Toronto, Ontario, Canada M5S 1A1 Phone: (416) 978-4594; Internet: jeff@utstat.toronto.edu
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically
More informationAsymptotically Stable Drift and Minorization for Markov Chains. with Application to Albert and Chib s Algorithm
Asymptotically Stable Drift and Minorization for Markov Chains with Application to Albert and Chib s Algorithm Qian Qin and James P. Hobert Department of Statistics University of Florida December 2017
More informationMARGINAL MARKOV CHAIN MONTE CARLO METHODS
Statistica Sinica 20 (2010), 1423-1454 MARGINAL MARKOV CHAIN MONTE CARLO METHODS David A. van Dyk University of California, Irvine Abstract: Marginal Data Augmentation and Parameter-Expanded Data Augmentation
More informationTo Center or Not to Center: That is Not the Question An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency
To Center or Not to Center: That is Not the Question An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency Yaming Yu Department of Statistics, University of California, Irvine
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationConvergence complexity analysis of Albert and Chib s algorithm for Bayesian probit regression
Convergence complexity analysis of Albert and Chib s algorithm for Bayesian probit regression Qian Qin and James P. Hobert Department of Statistics University of Florida April 2018 Abstract The use of
More informationBayesian Computation in Color-Magnitude Diagrams
Bayesian Computation in Color-Magnitude Diagrams SA, AA, PT, EE, MCMC and ASIS in CMDs Paul Baines Department of Statistics Harvard University October 19, 2009 Overview Motivation and Introduction Modelling
More informationBayesian inference for multivariate skew-normal and skew-t distributions
Bayesian inference for multivariate skew-normal and skew-t distributions Brunero Liseo Sapienza Università di Roma Banff, May 2013 Outline Joint research with Antonio Parisi (Roma Tor Vergata) 1. Inferential
More informationAsymptotic efficiency of simple decisions for the compound decision problem
Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com
More informationInvariant HPD credible sets and MAP estimators
Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in
More informationLecture 8: The Metropolis-Hastings Algorithm
30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:
More informationThe Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations
The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture
More informationGeneral Glivenko-Cantelli theorems
The ISI s Journal for the Rapid Dissemination of Statistics Research (wileyonlinelibrary.com) DOI: 10.100X/sta.0000......................................................................................................
More informationApplicability of subsampling bootstrap methods in Markov chain Monte Carlo
Applicability of subsampling bootstrap methods in Markov chain Monte Carlo James M. Flegal Abstract Markov chain Monte Carlo (MCMC) methods allow exploration of intractable probability distributions by
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationA Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait
A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute
More informationMarginal Markov Chain Monte Carlo Methods
Marginal Markov Chain Monte Carlo Methods David A. van Dyk Department of Statistics, University of California, Irvine, CA 92697 dvd@ics.uci.edu Hosung Kang Washington Mutual hosung.kang@gmail.com June
More informationeqr094: Hierarchical MCMC for Bayesian System Reliability
eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationINTRODUCTION TO MARKOV CHAIN MONTE CARLO
INTRODUCTION TO MARKOV CHAIN MONTE CARLO 1. Introduction: MCMC In its simplest incarnation, the Monte Carlo method is nothing more than a computerbased exploitation of the Law of Large Numbers to estimate
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More informationGENERAL STATE SPACE MARKOV CHAINS AND MCMC ALGORITHMS
GENERAL STATE SPACE MARKOV CHAINS AND MCMC ALGORITHMS by Gareth O. Roberts* and Jeffrey S. Rosenthal** (March 2004; revised August 2004) Abstract. This paper surveys various results about Markov chains
More informationCONVERGENCE ANALYSIS OF THE GIBBS SAMPLER FOR BAYESIAN GENERAL LINEAR MIXED MODELS WITH IMPROPER PRIORS
The Annals of Statistics 01, Vol. 40, No. 6, 83 849 DOI: 10.114/1-AOS105 Institute of Mathematical Statistics, 01 CONVERGENCE ANALYSIS OF THE GIBBS SAMPLER FOR BAYESIAN GENERAL LINEAR MIXED MODELS WITH
More informationn [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1)
1.4. CONSTRUCTION OF LEBESGUE-STIELTJES MEASURES In this section we shall put to use the Carathéodory-Hahn theory, in order to construct measures with certain desirable properties first on the real line
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationSlice Sampling Mixture Models
Slice Sampling Mixture Models Maria Kalli, Jim E. Griffin & Stephen G. Walker Centre for Health Services Studies, University of Kent Institute of Mathematics, Statistics & Actuarial Science, University
More informationGeometric Ergodicity and Hybrid Markov Chains
Geometric Ergodicity and Hybrid Markov Chains by Gareth O. Roberts* and Jeffrey S. Rosenthal** (August 1, 1996; revised, April 11, 1997.) Abstract. Various notions of geometric ergodicity for Markov chains
More informationg(x) = P (y) Proof. This is true for n = 0. Assume by the inductive hypothesis that g (n) (0) = 0 for some n. Compute g (n) (h) g (n) (0)
Mollifiers and Smooth Functions We say a function f from C is C (or simply smooth) if all its derivatives to every order exist at every point of. For f : C, we say f is C if all partial derivatives to
More informationQuantitative Non-Geometric Convergence Bounds for Independence Samplers
Quantitative Non-Geometric Convergence Bounds for Independence Samplers by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 28; revised July 29.) 1. Introduction. Markov chain Monte Carlo (MCMC)
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the
More information1 Kernels Definitions Operations Finite State Space Regular Conditional Probabilities... 4
Stat 8501 Lecture Notes Markov Chains Charles J. Geyer April 23, 2014 Contents 1 Kernels 2 1.1 Definitions............................. 2 1.2 Operations............................ 3 1.3 Finite State Space........................
More informationConvergence of Conditional Metropolis-Hastings Samplers
Convergence of Conditional Metropolis-Hastings Samplers Galin L. Jones Gareth O. Roberts Jeffrey S. Rosenthal (June, 2012; revised March 2013 and June 2013) Abstract We consider Markov chain Monte Carlo
More informationSome Results on the Ergodicity of Adaptive MCMC Algorithms
Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationProblem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function
Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Solution. If we does not need the pointwise limit of
More informationChapter 7. Markov chain background. 7.1 Finite state space
Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider
More informationApril 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning
for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions
More informationSimulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris
Simulation of truncated normal variables Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris Abstract arxiv:0907.4010v1 [stat.co] 23 Jul 2009 We provide in this paper simulation algorithms
More informationA Review of Pseudo-Marginal Markov Chain Monte Carlo
A Review of Pseudo-Marginal Markov Chain Monte Carlo Discussed by: Yizhe Zhang October 21, 2016 Outline 1 Overview 2 Paper review 3 experiment 4 conclusion Motivation & overview Notation: θ denotes the
More informationGeometric Convergence Rates for Time-Sampled Markov Chains
Geometric Convergence Rates for Time-Sampled Markov Chains by Jeffrey S. Rosenthal* (April 5, 2002; last revised April 17, 2003) Abstract. We consider time-sampled Markov chain kernels, of the form P µ
More informationSUPPLEMENT TO PAPER CONVERGENCE OF ADAPTIVE AND INTERACTING MARKOV CHAIN MONTE CARLO ALGORITHMS
Submitted to the Annals of Statistics SUPPLEMENT TO PAPER CONERGENCE OF ADAPTIE AND INTERACTING MARKO CHAIN MONTE CARLO ALGORITHMS By G Fort,, E Moulines and P Priouret LTCI, CNRS - TELECOM ParisTech,
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationModified Simes Critical Values Under Positive Dependence
Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia
More informationAn introduction to adaptive MCMC
An introduction to adaptive MCMC Gareth Roberts MIRAW Day on Monte Carlo methods March 2011 Mainly joint work with Jeff Rosenthal. http://www2.warwick.ac.uk/fac/sci/statistics/crism/ Conferences and workshops
More informationCHAPTER 6. Differentiation
CHPTER 6 Differentiation The generalization from elementary calculus of differentiation in measure theory is less obvious than that of integration, and the methods of treating it are somewhat involved.
More informationAn ABC interpretation of the multiple auxiliary variable method
School of Mathematical and Physical Sciences Department of Mathematics and Statistics Preprint MPS-2016-07 27 April 2016 An ABC interpretation of the multiple auxiliary variable method by Dennis Prangle
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationCONVERGENCE RATES AND REGENERATION OF THE BLOCK GIBBS SAMPLER FOR BAYESIAN RANDOM EFFECTS MODELS
CONVERGENCE RATES AND REGENERATION OF THE BLOCK GIBBS SAMPLER FOR BAYESIAN RANDOM EFFECTS MODELS By AIXIN TAN A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
More informationNonparametric Drift Estimation for Stochastic Differential Equations
Nonparametric Drift Estimation for Stochastic Differential Equations Gareth Roberts 1 Department of Statistics University of Warwick Brazilian Bayesian meeting, March 2010 Joint work with O. Papaspiliopoulos,
More informationWeak convergence of Markov chain Monte Carlo II
Weak convergence of Markov chain Monte Carlo II KAMATANI, Kengo Mar 2011 at Le Mans Background Markov chain Monte Carlo (MCMC) method is widely used in Statistical Science. It is easy to use, but difficult
More informationVARIABLE TRANSFORMATION TO OBTAIN GEOMETRIC ERGODICITY IN THE RANDOM-WALK METROPOLIS ALGORITHM
Submitted to the Annals of Statistics VARIABLE TRANSFORMATION TO OBTAIN GEOMETRIC ERGODICITY IN THE RANDOM-WALK METROPOLIS ALGORITHM By Leif T. Johnson and Charles J. Geyer Google Inc. and University of
More informationComputer intensive statistical methods
Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture
More informationEstimating the spectral gap of a trace-class Markov operator
Estimating the spectral gap of a trace-class Markov operator Qian Qin, James P. Hobert and Kshitij Khare Department of Statistics University of Florida April 2017 Abstract The utility of a Markov chain
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationSupplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements
Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model
More information08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms
(February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops
More informationLecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.
Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although
More informationImproved Robust MCMC Algorithm for Hierarchical Models
UNIVERSITY OF TEXAS AT SAN ANTONIO Improved Robust MCMC Algorithm for Hierarchical Models Liang Jing July 2010 1 1 ABSTRACT In this paper, three important techniques are discussed with details: 1) group
More informationA quick introduction to Markov chains and Markov chain Monte Carlo (revised version)
A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to
More informationControl Variates for Markov Chain Monte Carlo
Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability
More informationMarkov Chains and De-initialising Processes
Markov Chains and De-initialising Processes by Gareth O. Roberts* and Jeffrey S. Rosenthal** (November 1998; last revised July 2000.) Abstract. We define a notion of de-initialising Markov chains. We prove
More informationBayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units
Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional
More informationHastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model
UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced
More informationMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)
More information