On Reparametrization and the Gibbs Sampler


Jorge Carlos Román
Department of Mathematics, Vanderbilt University

James P. Hobert
Department of Statistics, University of Florida

Brett Presnell
Department of Statistics, University of Florida

March 2014

(Román's research supported by NSF Grant DMS; Hobert's research supported by NSF Grant DMS. Corresponding author's e-mail: jc.roman@vanderbilt.edu.)

Abstract

Gibbs samplers derived under different parametrizations of the target density can have radically different rates of convergence. In this article, we specify conditions under which reparametrization leaves the convergence rate of a Gibbs chain unchanged. An example illustrates how these results can be exploited in convergence rate analyses.

1 Introduction

It is well known that Gibbs samplers derived under different parametrizations of a Bayesian hierarchical model can have dramatically different rates of convergence (Gelfand et al., 1995; Papaspiliopoulos et al., 2007; Roberts and Sahu, 1997; Yu and Meng, 2011). In this article, we consider the reverse situation, in which reparametrization has no effect. To motivate our study, we begin with a fresh look at a well-known toy example involving a simple random effects model with known variance components.

Consider the one-way random effects model given by

$$Y_{ij} = \theta_i + \epsilon_{ij}, \quad (1)$$

for $i = 1, \ldots, c$ and $j = 1, \ldots, m_i$, where the $\theta_i$ are independent and identically distributed (iid) $N(\mu, \sigma^2)$, and the $\epsilon_{ij}$ are independent of the $\theta_i$ and iid $N(0, \sigma^2_e)$. (For now, we restrict attention to the balanced case where $m_i \equiv m$.)

Suppose that the variance components, $\sigma^2$ and $\sigma^2_e$, are known, and that the prior on $\mu$ is flat. Let $\theta = (\theta_1, \ldots, \theta_c)$ and let $y$ denote the observed data. A simple calculation shows that the posterior density of $\mu$ given $y$ is normal, but consider nevertheless the two-component Gibbs chain $\{(\mu_n, \theta_n)\}_{n=0}^{\infty}$ that alternately samples from the conditional distributions $\theta \mid \mu, y$ and $\mu \mid \theta, y$, which are $c$-variate normal and univariate normal, respectively. The marginal sequence $\{\mu_n\}_{n=0}^{\infty}$ is itself a Markov chain whose invariant density is the posterior density (of $\mu$ given $y$), and it is easy to show that the exact rate of convergence of this chain is $\sigma^2_e/(\sigma^2_e + m\sigma^2)$ (see, e.g., Liu et al., 1994). The rate of convergence will be formally defined in Section 2, but for now it suffices to note that the rate is between 0 and 1, and smaller is better.

Now consider a reparametrized version of model (1) given by $Y_{ij} = \mu + u_i + \epsilon_{ij}$, where the $u_i$ are iid $N(0, \sigma^2)$, and the $\epsilon_{ij}$ are independent of the $u_i$ and still iid $N(0, \sigma^2_e)$. Let $u = (u_1, \ldots, u_c)$. This is called the non-centered parametrization (NCP), whereas model (1) is called the centered parametrization (CP). If we put the same flat prior on $\mu$, then the posterior density of $\mu$ given $y$ remains the same as in the CP model. However, the two-component Gibbs sampler derived from the NCP model, which alternates between draws from $u \mid \mu, y$ and $\mu \mid u, y$, is not the same as the one based on the CP. Furthermore, the two Gibbs samplers have completely different convergence behavior. Indeed, the convergence rate of the NCP Gibbs sampler is $1 - \sigma^2_e/(\sigma^2_e + m\sigma^2)$. So when one of the two Gibbs samplers is very slow to converge, the other converges extremely rapidly. This simple example illustrates that reparametrization can significantly affect the convergence rate of the Gibbs sampler.

In a practical version of the one-way model, the variance components are unknown. In this case, the standard default prior density for $(\mu, \sigma^2, \sigma^2_e)$ is $1/\big(\sigma^2_e \sqrt{\sigma^2}\big)$. We assume that the posterior is proper; see Román (2012) for conditions. The posterior density of $(\mu, \sigma^2, \sigma^2_e)$ given $y$, which is the same under the CP and the NCP, is intractable, so this is no longer a toy example. As in the known-variance case, there are two different versions of the standard two-component Gibbs sampler for this problem: the CP Gibbs sampler, which alternates between $\theta, \mu \mid \sigma^2, \sigma^2_e, y$ and $\sigma^2, \sigma^2_e \mid \mu, \theta, y$, and the NCP Gibbs sampler, which alternates between $u, \mu \mid \sigma^2, \sigma^2_e, y$ and $\sigma^2, \sigma^2_e \mid u, \mu, y$. The results of Section 3 imply that, in contrast with the known-variance case, these two Gibbs samplers converge at exactly the same rate. Consequently, convergence rate results for either of these Gibbs samplers apply directly to the other. In Section 3 we compare the results of Román (2012), who analyzed the NCP Gibbs sampler, with those of Tan and Hobert (2009), who studied the CP version.

The CP and NCP Gibbs Markov chains described above share the same rate of convergence because the transformation that takes the CP model to the NCP model involves variables ($\theta$ and $\mu$) that reside in the same component (or block) of the two-component Gibbs sampler. (Note that this is not the case in the toy example where the variance components are known.) The main result in this paper is a formalization of this idea.
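To make the toy example concrete, here is a minimal simulation sketch (Python with NumPy; the data are simulated and all variable names are ours, not the paper's). With the variance components held known, the $\{\mu_n\}$ sequence of each sampler is a Gaussian AR(1) process, so its lag-one autocorrelation equals its convergence rate, and the two estimates should land near $\sigma^2_e/(\sigma^2_e + m\sigma^2)$ and its complement, respectively.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated balanced data: c groups of size m, variance components known.
c, m, sig2, sig2e = 10, 5, 4.0, 1.0
theta_true = rng.normal(0.0, np.sqrt(sig2), c)
y = theta_true[:, None] + rng.normal(0.0, np.sqrt(sig2e), (c, m))
ybar = y.mean(axis=1)                      # group means

def cp_gibbs(n, mu=0.0):
    """CP sampler: alternate theta | mu, y and mu | theta, y."""
    prec = m / sig2e + 1.0 / sig2          # conditional precision of theta_i
    out = np.empty(n)
    for t in range(n):
        theta = rng.normal((m * ybar / sig2e + mu / sig2) / prec,
                           1.0 / np.sqrt(prec))
        mu = rng.normal(theta.mean(), np.sqrt(sig2 / c))
        out[t] = mu
    return out

def ncp_gibbs(n, mu=0.0):
    """NCP sampler: alternate u | mu, y and mu | u, y."""
    prec = m / sig2e + 1.0 / sig2          # conditional precision of u_i
    out = np.empty(n)
    for t in range(n):
        u = rng.normal(m * (ybar - mu) / sig2e / prec, 1.0 / np.sqrt(prec))
        mu = rng.normal(y.mean() - u.mean(), np.sqrt(sig2e / (c * m)))
        out[t] = mu
    return out

def lag1(x):
    """Lag-one autocorrelation, which here equals the convergence rate."""
    x = x - x.mean()
    return (x[:-1] * x[1:]).mean() / (x * x).mean()

rate = sig2e / (sig2e + m * sig2)          # theoretical CP rate
print("CP : theory %.3f  sampled %.3f" % (rate, lag1(cp_gibbs(200_000))))
print("NCP: theory %.3f  sampled %.3f" % (1 - rate, lag1(ncp_gibbs(200_000))))
```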

We now provide an overview of our results in the special case where the target distribution has a density with respect to Lebesgue measure. Suppose $f: \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \times \cdots \times \mathbb{R}^{d_k} \to [0, \infty)$ is a probability density function, and let $\Phi_1 = \{(X_n^{(1)}, X_n^{(2)}, \ldots, X_n^{(k)})\}_{n=0}^{\infty}$ denote the Markov chain simulated by the $k$-component Gibbs sampler based on $f(x_1, x_2, \ldots, x_k)$ that updates the components in the natural order. It is well known and easy to see that the marginal sequence $\tilde{\Phi}_1 := \{(X_n^{(2)}, \ldots, X_n^{(k)})\}_{n=0}^{\infty}$ is also a Markov chain. Now, for $i \in \{2, 3, \ldots, k\}$, let $\Phi_i$ denote the $k$-component Gibbs sampler whose update order is $(i, i+1, \ldots, k, 1, 2, \ldots, i-1)$, and let $\tilde{\Phi}_i$ denote the corresponding marginal Markov chain (that leaves out $X^{(i)}$). We show that all $2k$ of these chains converge at exactly the same rate. Not only is this fact the key to the proof of our main result concerning reparametrization, it is also useful from a practical standpoint. Indeed, if one wishes to know the rate of convergence of $\Phi_1$, then it suffices to study the lower-dimensional chain $\tilde{\Phi}_i$ (for any $i = 1, 2, \ldots, k$), which may be easier to analyze than $\Phi_1$. This idea has been used to establish qualitative convergence results (such as geometric and uniform ergodicity) for two-component Gibbs samplers (see, e.g., Diebolt and Robert (1994) and Román and Hobert (2012)). A simulation-oriented sketch of these constructions is given at the end of this section.

Now let $(X_1, X_2, \ldots, X_k)$ denote a random vector with density $f$, and consider the $k$-component Gibbs sampler $\hat{\Phi}_1$ based on the distribution of $(\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_k) = (t_1(X_1), t_2(X_2), \ldots, t_k(X_k))$. Suppose $f(x_1, x_2, \ldots, x_k)$ can be written as a function of $(t_1(x_1), t_2(x_2), \ldots, t_k(x_k))$, an assumption that obviously holds if each $t_i: \mathbb{R}^{d_i} \to \mathbb{R}^{d_i}$ is invertible. Then, by exploiting the fact that the $2k$ chains described above share the same rate, we show that $\Phi_1$ and $\hat{\Phi}_1$ converge at the same rate. An important implication of this result is that, when analyzing the convergence rate of a Gibbs sampler, one is free to choose a convenient parametrization, as long as the corresponding transformation respects the within-component restriction.

The remainder of this article is organized as follows. Section 2 contains some background on general state space Markov chain theory as well as preliminary results. Our main result, showing that a within-component reparametrization does not affect the convergence rate of the Gibbs Markov chain, can be found in Section 3. That section also contains the application of our main result to the Gibbs samplers for the one-way model with improper priors.
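As a concrete rendering of the objects just introduced, the following sketch (Python; the helper gibbs_scan and the bivariate normal illustration are ours, purely illustrative) simulates a deterministic-scan Gibbs sampler with an arbitrary update order and also returns the marginal sequence obtained by discarding the first-updated component, i.e., the analogues of $\Phi_i$ and $\tilde{\Phi}_i$.

```python
import numpy as np

def gibbs_scan(conditionals, x0, order, n_iter, rng):
    """Deterministic-scan Gibbs sampler updating components in `order`
    (0-based).  `conditionals[i](x, rng)` draws component i from its
    conditional distribution given the current values of the others.
    Returns the full chain (Phi_i in the text) and the marginal chain
    obtained by discarding the first-updated component (Phi-tilde_i)."""
    x = list(x0)
    draws = np.empty((n_iter, len(x0)))
    for n in range(n_iter):
        for i in order:
            x[i] = conditionals[i](x, rng)
        draws[n] = x
    keep = [j for j in range(len(x0)) if j != order[0]]
    return draws, draws[:, keep]

# Illustration: two-component Gibbs for a bivariate normal with
# correlation r, so each full conditional is N(r * other, 1 - r^2).
r = 0.9
conds = [lambda x, g: g.normal(r * x[1], np.sqrt(1 - r**2)),
         lambda x, g: g.normal(r * x[0], np.sqrt(1 - r**2))]
phi, phi_tilde = gibbs_scan(conds, (0.0, 0.0), (0, 1), 10_000,
                            np.random.default_rng(3))
```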

2 Markov Chain Background and Preliminary Results

As in Meyn and Tweedie (1993, Chapter 3), let $P(x, dy)$ be a generic Markov transition function (MTF) on a set $\mathsf{X}$ equipped with a countably generated $\sigma$-algebra. Let $P^n(x, dy)$ denote the $n$-step MTF. We assume throughout that the chain determined by $P$ is $\psi$-irreducible, aperiodic and positive recurrent with invariant probability measure $\pi$. We do not assume reversibility. For a measure $\nu$ on $\mathsf{X}$, let $\nu P^n(dy) = \int_{\mathsf{X}} P^n(x, dy)\, \nu(dx)$. Following Roberts and Tweedie (2001) and Rosenthal (2003), define the $L^1$-rate of convergence of the Markov chain as

$$\rho = \exp\left\{ \sup_{\nu \in p(\pi)} \lim_{n \to \infty} \frac{1}{n} \log \|\nu P^n - \pi\|_{TV} \right\},$$

where $\|\cdot\|_{TV}$ denotes the total variation norm for signed measures and $p(\pi)$ is the set of all probability measures $\nu$ that are absolutely continuous with respect to $\pi$ with $\int_{\mathsf{X}} (d\nu/d\pi)^2\, d\pi < \infty$. For reversible chains, $\rho$ equals the usual rate of convergence, i.e., the spectral radius (and norm) of the self-adjoint Markov operator defined by $P$ (Rosenthal, 2003, Proposition 2). As in Roberts and Rosenthal (1997), we say that the chain (or the corresponding MTF) is $\pi$-a.e. geometrically ergodic if there exist $M: \mathsf{X} \to (0, \infty)$ and $\kappa < 1$ such that, for $\pi$-a.e. $x \in \mathsf{X}$,

$$\|P^n(x, \cdot) - \pi(\cdot)\|_{TV} \le M(x)\,\kappa^n \quad \text{for all } n \in \mathbb{N}.$$

We often omit the "$\pi$-a.e." and simply write "geometrically ergodic." The next proposition follows easily from results in Roberts and Rosenthal (1997) and Roberts and Tweedie (2001).

Proposition 1. The Markov chain based on $P$ is geometrically ergodic if and only if $\rho < 1$.

Now, for $i = 1, 2, \ldots, k$, let $(\mathsf{X}_i, \mathcal{F}_i, \mu_i)$ denote $\sigma$-finite measure spaces, and let $(\mathsf{X}, \mathcal{F}, \mu)$ denote the corresponding product space. Suppose that $\pi$ is a probability distribution on $(\mathsf{X}, \mathcal{F})$ having density $f(x_1, x_2, \ldots, x_k)$ with respect to $\mu$. Let $P_i$ denote the MTF of the $k$-component Gibbs sampler whose update order is $(i, i+1, \ldots, k, 1, 2, \ldots, i-1)$, and let $Q_i$ denote the MTF of the corresponding marginal Markov chain (that leaves out the $i$th component). A proof of the following result can be found in the Appendix.

Proposition 2. The Markov chains defined by the MTFs $\{P_i\}_{i=1}^{k}$ and $\{Q_i\}_{i=1}^{k}$ all share the same $L^1$ convergence rate.

In conjunction with Proposition 1, Proposition 2 shows that geometric ergodicity is a solidarity property for the $2k$ chains defined by $\{P_i\}_{i=1}^{k}$ and $\{Q_i\}_{i=1}^{k}$. That is, either all $2k$ chains are geometrically ergodic, or none of them is. This result is actually well known when $k = 2$. Indeed, in that case, Lemma 2.4 of Diaconis et al. (2008) shows that geometric ergodicity is a solidarity property for $P_1$ and $Q_1$, and symmetry implies that the same holds for $P_2$ and $Q_2$. (These facts can also be established using results in Roberts and Rosenthal (2001).) Furthermore, when $k = 2$, the marginal Markov chains defined by $Q_1$ and $Q_2$ are reversible, and the norms of the corresponding self-adjoint Markov operators are identical (Liu et al., 1994). Then, because a reversible Markov chain is geometrically ergodic if and only if the norm of its Markov operator is strictly less than one (Roberts and Rosenthal, 1997), it follows that $Q_1$ is geometrically ergodic if and only if $Q_2$ is, which completes the cycle, and hence the argument, for $k = 2$.
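Proposition 2 can be checked numerically in a small discrete setting. The sketch below (Python; the construction is ours, not from the paper) builds the exact transition matrices of the three systematic-scan Gibbs samplers $P_1$, $P_2$, $P_3$ for a random everywhere-positive pmf on $\{0,1\}^3$, forms the three marginal chains $Q_1$, $Q_2$, $Q_3$, and prints the second-largest eigenvalue modulus of each matrix, which for a finite, irreducible, aperiodic chain governs the rate of convergence; all six values coincide, as the proposition predicts.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# A random, everywhere-positive joint pmf f on {0,1}^3.
f = rng.random((2, 2, 2))
f /= f.sum()
states = list(product(range(2), repeat=3))
idx = {s: i for i, s in enumerate(states)}

def update_kernel(coord):
    """8 x 8 kernel for the conditional update of one coordinate."""
    U = np.zeros((8, 8))
    for s in states:
        rest = [s[j] for j in range(3) if j != coord]
        num = np.array([f[tuple(np.insert(rest, coord, v))] for v in (0, 1)])
        cond = num / num.sum()            # conditional pmf of the coordinate
        for v in (0, 1):
            t = list(s)
            t[coord] = v
            U[idx[s], idx[tuple(t)]] = cond[v]
    return U

U1, U2, U3 = (update_kernel(i) for i in range(3))
P = [U1 @ U2 @ U3, U2 @ U3 @ U1, U3 @ U1 @ U2]   # scan orders of P1, P2, P3

def marginal(Pmat, coord):
    """4 x 4 kernel of the marginal chain that leaves out `coord`.
    Rows of Pmat do not depend on the starting value of `coord`,
    because that coordinate is the first one updated."""
    keep = [s for s in states if s[coord] == 0]   # representatives
    Q = np.zeros((4, 4))
    for a, s in enumerate(keep):
        for t, p in zip(states, Pmat[idx[s]]):
            u = list(t)
            u[coord] = 0
            Q[a, keep.index(tuple(u))] += p
    return Q

Q = [marginal(P[i], i) for i in range(3)]

def slem(M):
    """Second-largest eigenvalue modulus: the chain's convergence rate."""
    return np.sort(np.abs(np.linalg.eigvals(M)))[-2]

print([round(slem(M), 8) for M in P + Q])   # six identical numbers
```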

3 Reparametrization

Suppose that $(X_1, X_2, \ldots, X_k)$ has (joint) distribution $\pi$, and let $\tilde{\pi}$ represent the distribution of $(t_1(X_1), t_2(X_2), \ldots, t_k(X_k))$. Under what conditions does the Gibbs sampler based on $\tilde{\pi}$ have the same rate of convergence as the sampler based on $\pi$; i.e., when is the convergence rate of the Gibbs sampler unchanged by within-block transformations? To formalize this question, let $(\mathsf{X}_i, \mathcal{F}_i, \mu_i)$, $i = 1, 2, \ldots, k$, $(\mathsf{X}, \mathcal{F}, \mu)$, $\pi$, and $f$ be as in the previous section. Let $(\mathsf{Y}_i, \mathcal{G}_i)$, $i = 1, 2, \ldots, k$, be measurable spaces, let $(\mathsf{Y}, \mathcal{G})$ be their product, and assume that $t_i: \mathsf{X}_i \to \mathsf{Y}_i$, $i = 1, 2, \ldots, k$, are measurable transformations. Finally, let $T(x_1, x_2, \ldots, x_k) = (t_1(x_1), t_2(x_2), \ldots, t_k(x_k))$ and let $\tilde{\pi} = \pi \circ T^{-1}$ be the probability distribution induced on $(\mathsf{Y}, \mathcal{G})$ by the transformation $T$; i.e., $\tilde{\pi}(B) = \pi(T^{-1}(B))$, $B \in \mathcal{G}$, where $T^{-1}(B)$ is the pre-image of $B$ under $T$. The following result is proved in the Appendix.

Proposition 3. Suppose that there exists a measurable function $\tilde{f}: \mathsf{Y} \to \mathbb{R}$ such that

$$f(x_1, x_2, \ldots, x_k) = \tilde{f}\big(t_1(x_1), t_2(x_2), \ldots, t_k(x_k)\big) \quad (2)$$

for all $(x_1, x_2, \ldots, x_k) \in \mathsf{X}$. Then the $k$-component Gibbs samplers based on $\pi$ and $\tilde{\pi}$ (both updating the components in the natural order) have the same $L^1$-rate of convergence.

Remark. The main hypothesis of Proposition 3 clearly holds when each $t_i$ is an invertible function (with measurable inverse), since then (2) holds with $\tilde{f}(y_1, y_2, \ldots, y_k) = f\big(t_1^{-1}(y_1), t_2^{-1}(y_2), \ldots, t_k^{-1}(y_k)\big)$.

We now return to the CP and NCP Gibbs samplers for the one-way model. In the Introduction, we considered only the balanced case, in which all the $m_i$ are the same, and we considered only one prior density.

Here we allow the $m_i$ to differ, and we consider a family of prior densities for $(\mu, \sigma^2, \sigma^2_e)$ given by

$$\big(\sigma^2\big)^{-(a+1)} \big(\sigma^2_e\big)^{-(b+1)} I_{(0,\infty)}(\sigma^2_e)\, I_{(0,\infty)}(\sigma^2),$$

where $a$ and $b$ are hyperparameters. Note that by taking $(a, b) = (-1/2, 0)$, we recover the default prior from the Introduction. Tan and Hobert (2009) analyzed the Gibbs sampler based on the CP version of the one-way model and proved that the CP Gibbs Markov chain is geometrically ergodic if $a < 0$,

$$M + 2b \ge c + 3, \qquad \text{and} \qquad \min\left\{ \left( \frac{1}{c}\sum_{i=1}^{c} \frac{m_i}{m_i + 1} \right)^{-1},\; \frac{M}{m} \right\} < 2\exp\left\{ \Psi\left( \frac{c}{2} + a \right) \right\},$$

where $M = \sum_{i=1}^{c} m_i$, $m = \max\{m_1, m_2, \ldots, m_c\}$ and $\Psi(x) = \frac{d}{dx}\log\Gamma(x)$ is the digamma function. Román (2012) (see also Román and Hobert (2012)) subsequently proved that the NCP Gibbs Markov chain is geometrically ergodic if $a < 0$,

$$M + 2b \ge c + 2, \qquad \text{and} \qquad 1 < 2\exp\left\{ \Psi\left( \frac{c}{2} + a \right) \right\}.$$

It is easy to see that Román's conditions are weaker (i.e., easier to satisfy) than those of Tan and Hobert. However, the two sets of conditions are directly comparable only if geometric ergodicity is a solidarity property for the two different Gibbs chains. Let $\pi(\theta, \mu, \sigma^2, \sigma^2_e \mid y)$ denote the complete data posterior density under the CP model, which is the invariant density of the CP Gibbs Markov chain. Consider a one-to-one transformation of $\big((\theta, \mu), (\sigma^2, \sigma^2_e)\big)$ to $\big(t(\theta, \mu), (\sigma^2, \sigma^2_e)\big)$, where $t: \mathbb{R}^{c+1} \to \mathbb{R}^{c+1}$ is defined as follows:

$$t(\theta, \mu) = \big(\theta_1 - \mu,\, \theta_2 - \mu,\, \ldots,\, \theta_c - \mu,\, \mu\big).$$

The density of the transformed variable is exactly the complete data posterior density under the NCP model, so Proposition 3 implies that the CP and NCP Gibbs chains share the same $L^1$-rate. Thus, Román's (2012) result is indeed an improvement upon that of Tan and Hobert (2009).

We now present an example involving a transformation that is not one-to-one. Consider a pair of random variables $(X_1, X_2)$ such that

$$X_1 \mid X_2 = x_2 \sim N(0, 1/x_2) \quad (3)$$

and $X_2 \sim \mathrm{Gamma}\big(\frac{\nu}{2}, \frac{\nu}{2}\big)$, where $\nu > 0$ is a known constant.

Then the density of $(X_1, X_2)$ is

$$f(x_1, x_2) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)\sqrt{2\pi}}\; x_2^{\frac{\nu - 1}{2}} \exp\left\{ -x_2 \left( \frac{x_1^2 + \nu}{2} \right) \right\} I_{(0,\infty)}(x_2),$$

and it can be shown that

$$X_2 \mid X_1 = x_1 \sim \mathrm{Gamma}\left( \frac{\nu + 1}{2},\; \frac{1}{2}\big(x_1^2 + \nu\big) \right). \quad (4)$$

Although direct simulation of $(X_1, X_2)$ is clearly possible, consider the Gibbs sampler which uses the conditionals in (3) and (4). Suppose we use the transformation $U_1 = t_1(X_1) = X_1^2$ (which is not one-to-one) together with $U_2 = t_2(X_2) = X_2$. Since $X_1 \mid X_2 = x_2 \sim N(0, 1/x_2)$, it follows immediately, using a $\chi^2$-type calculation, that $X_1^2 \mid X_2 = x_2 \sim \mathrm{Gamma}(1/2,\, x_2/2)$. In other words,

$$U_1 \mid U_2 = u_2 \sim \mathrm{Gamma}(1/2,\, u_2/2). \quad (5)$$

Obviously, $U_2 \sim \mathrm{Gamma}\big(\frac{\nu}{2}, \frac{\nu}{2}\big)$, and the density of $(U_1, U_2)$ is

$$f_U(u_1, u_2) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)\sqrt{2\pi}}\; \frac{1}{\sqrt{u_1}}\; u_2^{\frac{\nu - 1}{2}} \exp\left\{ -\frac{u_2}{2}\big(u_1 + \nu\big) \right\} I_{(0,\infty)}(u_1)\, I_{(0,\infty)}(u_2).$$

Moreover, a simple calculation shows that

$$U_2 \mid U_1 = u_1 \sim \mathrm{Gamma}\left( \frac{\nu + 1}{2},\; \frac{1}{2}\big(u_1 + \nu\big) \right). \quad (6)$$

The associated Gibbs sampler can be simulated using the conditionals given in (5) and (6). Finally, because the joint density of $(X_1, X_2)$ depends on $x_1$ only through $t_1(x_1) = x_1^2$, the condition in Proposition 3 is satisfied, and we conclude that the Gibbs samplers associated with $f$ and $f_U$ converge at the same $L^1$-rate.
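As a quick empirical check (ours, not part of the paper), the two Gibbs samplers can be run side by side. NumPy's gamma generator is parametrized by shape and scale, so the rate parameters in (4), (5) and (6) enter as reciprocals. Because $x_1$ enters the $X_2$-update only through $x_1^2$, the $X_2$- and $U_2$-sequences share the same transition law, and their estimated lag-one autocorrelations agree up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(2)
nu = 3.0   # known constant nu > 0

def gibbs_x(n):
    """Gibbs sampler for (X1, X2) using the conditionals (3) and (4)."""
    x2 = 1.0
    out = np.empty(n)
    for t in range(n):
        x1 = rng.normal(0.0, 1.0 / np.sqrt(x2))               # (3)
        x2 = rng.gamma((nu + 1.0) / 2.0, 2.0 / (x1**2 + nu))  # (4)
        out[t] = x2
    return out

def gibbs_u(n):
    """Gibbs sampler for (U1, U2) = (X1^2, X2) using (5) and (6)."""
    u2 = 1.0
    out = np.empty(n)
    for t in range(n):
        u1 = rng.gamma(0.5, 2.0 / u2)                         # (5)
        u2 = rng.gamma((nu + 1.0) / 2.0, 2.0 / (u1 + nu))     # (6)
        out[t] = u2
    return out

def lag1(x):
    x = x - x.mean()
    return (x[:-1] * x[1:]).mean() / (x * x).mean()

# The X2- and U2-sequences have the same transition law, so the two
# autocorrelation estimates agree up to Monte Carlo error.
print(lag1(gibbs_x(100_000)), lag1(gibbs_u(100_000)))
```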

Appendix

Proof of Proposition 2. We will prove the result for $k = 3$; the extension to general $k$ is obvious and only involves more complicated notation. The proof has two parts: first we show that $P_1$, $P_2$ and $P_3$ share the same $L^1$ rate, and then we show that $P_i$ and $Q_i$ have the same $L^1$ rate for $i = 1, 2, 3$.

We prove the first part by showing that $\rho_1 \le \rho_2 \le \rho_3 \le \rho_1$, where $\rho_i$ denotes the $L^1$ rate of $P_i$. For this, we need only show that $\rho_1 \le \rho_2$, with the remaining inequalities following by symmetry. To prove $\rho_1 \le \rho_2$, we show that for each fixed $\nu \in p(\pi)$, there exists a $\nu' \in p(\pi)$ such that, for all $n \in \mathbb{N}$,

$$\|\nu P_1^{n+1} - \pi\|_{TV} \le \|\nu' P_2^{n} - \pi\|_{TV}. \quad (7)$$

From this it follows that

$$\lim_{n \to \infty} \frac{1}{n} \log \|\nu P_1^{n} - \pi\|_{TV} \le \lim_{n \to \infty} \frac{1}{n} \log \|\nu' P_2^{n} - \pi\|_{TV} \le \log(\rho_2),$$

which implies $\rho_1 \le \rho_2$.

To prove (7), let $(X_1, X_2, X_3)$ have distribution $\pi$, and let $f_{1|23}(x_1 \mid x_2, x_3)$, $f_{2|13}(x_2 \mid x_1, x_3)$ and $f_{3|12}(x_3 \mid x_1, x_2)$ represent the conditional densities of $X_1$ (given $X_2$ and $X_3$), of $X_2$ (given $X_1$ and $X_3$), and of $X_3$ (given $X_1$ and $X_2$), respectively. For $i = 1, 2$ and $A \in \mathcal{F}$, we have

$$P_i\big((x_1, x_2, x_3), A\big) = \int_A k_i(x_1', x_2', x_3' \mid x_1, x_2, x_3)\; \mu\big(d(x_1', x_2', x_3')\big),$$

where $k_1$ and $k_2$ are the Markov transition densities associated with $P_1$ and $P_2$, respectively. Of course,

$$k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3) = f_{1|23}(x_1' \mid x_2, x_3)\, f_{2|13}(x_2' \mid x_1', x_3)\, f_{3|12}(x_3' \mid x_1', x_2'),$$

and $k_2$ is defined analogously. It is convenient to express each of $P_1$ and $P_2$ as the composition of three simple transition kernels. To this end, let $\delta_x(\cdot)$ denote a point mass measure at $x$ and let

$$P_{1|23}\big((x_1, x_2, x_3), A\big) = \int_A f_{1|23}(x_1' \mid x_2, x_3)\; (\mu_1 \times \delta_{x_2} \times \delta_{x_3})\big(d(x_1', x_2', x_3')\big)$$

be the kernel associated with the single update of $X_1$ (given $X_2$ and $X_3$). Define the kernels associated with the (conditional) updates of $X_2$ and $X_3$ analogously and call them $P_{2|13}$ and $P_{3|12}$, respectively. A routine calculation shows that $P_1 = P_{1|23} P_{2|13} P_{3|12}$ and $P_2 = P_{2|13} P_{3|12} P_{1|23}$.

Given $\nu \in p(\pi)$ having density $q$ with respect to $\pi$, let $\nu' = \nu P_{1|23}$. A straightforward calculation shows that $\nu'$ has density $q' = P_{1|23}\, q$ with respect to $\pi$. Moreover, a simple application of Jensen's inequality shows that $\int_{\mathsf{X}} (q')^2\, d\pi < \infty$, so $\nu' \in p(\pi)$. Also, given a function $g: \mathsf{X} \to [-1, 1]$, let $\hat{g} = P_{2|13} P_{3|12}\, g$ and note that $\|\hat{g}\|_\infty \le 1$, where $\|\cdot\|_\infty$ is the supremum norm. Writing $P_1$ and $P_2$ in terms of the kernels $P_{1|23}$, $P_{2|13}$ and $P_{3|12}$, and using a simple induction argument, we obtain, for any $n \ge 1$, that $\nu P_1^{n+1} g = \nu' P_2^{n} \hat{g}$ for all $\nu \in p(\pi)$ and all $g: \mathsf{X} \to [-1, 1]$. Finally, since $\pi = \pi P_{2|13} = \pi P_{3|12}$, we have $\pi = \pi P_{2|13} P_{3|12}$ and thus $\pi g = \pi \hat{g}$. Hence,

$$\big|\nu P_1^{n+1}(g) - \pi(g)\big| = \big|\nu' P_2^{n}(\hat{g}) - \pi(\hat{g})\big| \le \sup_{\{h:\, \|h\|_\infty \le 1\}} \big|\nu' P_2^{n}(h) - \pi(h)\big| = 2\, \|\nu' P_2^{n} - \pi\|_{TV},$$

and because $g$ was arbitrary, (7) follows.

For the second part of the proof, let $\eta_i$ denote the $L^1$ rate of $Q_i$, $i = 1, 2, 3$. We will show that $\rho_1 = \eta_1$; the other two equalities then follow by symmetry. For a measurable set $B$ in $\mathsf{X}_2 \times \mathsf{X}_3$,

$$Q_1\big((x_2, x_3), B\big) = \int_B \left[ \int_{\mathsf{X}_1} k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3)\; \mu_1(dx_1') \right] (\mu_2 \times \mu_3)\big(d(x_2', x_3')\big),$$

and the corresponding invariant distribution is given by

$$\pi_{2,3}(B) = \int_B \left[ \int_{\mathsf{X}_1} f(x_1, x_2, x_3)\; \mu_1(dx_1) \right] (\mu_2 \times \mu_3)\big(d(x_2, x_3)\big).$$

Given $\alpha \in p(\pi_{2,3})$ and $g: \mathsf{X}_2 \times \mathsf{X}_3 \to [-1, 1]$, define $\check{\alpha} \in p(\pi)$ by

$$\check{\alpha}(A) = \int_A \frac{d\alpha}{d\pi_{2,3}}(x_2, x_3)\; \pi\big(d(x_1, x_2, x_3)\big)$$

and $\check{g}: \mathsf{X} \to [-1, 1]$ by $\check{g}(x_1, x_2, x_3) = g(x_2, x_3)$, respectively. Then

$$\begin{aligned} (P_1 \check{g})(x_1, x_2, x_3) &= \int_{\mathsf{X}} k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3)\, g(x_2', x_3')\; \mu\big(d(x_1', x_2', x_3')\big) \\ &= \int_{\mathsf{X}_3} \int_{\mathsf{X}_2} \left[ \int_{\mathsf{X}_1} k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3)\; \mu_1(dx_1') \right] g(x_2', x_3')\; \mu_2(dx_2')\; \mu_3(dx_3') \\ &= (Q_1 g)(x_2, x_3), \end{aligned}$$

and it follows by induction that $(P_1^{n} \check{g})(x_1, x_2, x_3) = (Q_1^{n} g)(x_2, x_3)$ for all $n \ge 1$. Thus,

$$\begin{aligned} \check{\alpha}(P_1^{n} \check{g}) &= \int_{\mathsf{X}} (P_1^{n} \check{g})(x_1, x_2, x_3)\, \frac{d\alpha}{d\pi_{2,3}}(x_2, x_3)\; \pi\big(d(x_1, x_2, x_3)\big) \\ &= \int_{\mathsf{X}} (Q_1^{n} g)(x_2, x_3)\, \frac{d\alpha}{d\pi_{2,3}}(x_2, x_3)\; \pi\big(d(x_1, x_2, x_3)\big) \\ &= \int_{\mathsf{X}_3} \int_{\mathsf{X}_2} (Q_1^{n} g)(x_2, x_3)\, \frac{d\alpha}{d\pi_{2,3}}(x_2, x_3) \left[ \int_{\mathsf{X}_1} f(x_1, x_2, x_3)\; \mu_1(dx_1) \right] \mu_2(dx_2)\; \mu_3(dx_3) \\ &= \int_{\mathsf{X}_3} \int_{\mathsf{X}_2} (Q_1^{n} g)(x_2, x_3)\, \frac{d\alpha}{d\pi_{2,3}}(x_2, x_3)\; \pi_{2,3}\big(d(x_2, x_3)\big) = \alpha(Q_1^{n} g). \end{aligned}$$

Finally, since $\pi(\check{g}) = \pi_{2,3}(g)$, we have

$$\big|\alpha Q_1^{n}(g) - \pi_{2,3}(g)\big| = \big|\check{\alpha} P_1^{n}(\check{g}) - \pi(\check{g})\big| \le \sup_{\{h:\, \|h\|_\infty \le 1\}} \big|\check{\alpha} P_1^{n}(h) - \pi(h)\big| = 2\, \|\check{\alpha} P_1^{n} - \pi\|_{TV}$$

for all $n \ge 1$, and since $g: \mathsf{X}_2 \times \mathsf{X}_3 \to [-1, 1]$ was arbitrary, $\|\alpha Q_1^{n} - \pi_{2,3}\|_{TV} \le \|\check{\alpha} P_1^{n} - \pi\|_{TV}$ for all $n \ge 1$. This proves that $\eta_1 \le \rho_1$.

To prove the reverse inequality, let $\nu \in p(\pi)$ and $g: \mathsf{X} \to [-1, 1]$, and define $\check{\nu} \in p(\pi_{2,3})$ by

$$\check{\nu}(B) = \int_B \left[ \int_{\mathsf{X}_1} \frac{d\nu}{d\pi}(x_1, x_2, x_3)\, f_{1|23}(x_1 \mid x_2, x_3)\; \mu_1(dx_1) \right] \pi_{2,3}\big(d(x_2, x_3)\big)$$

and, noting that $(P_1 g)(x_1, x_2, x_3)$ does not depend on $x_1$, let $\check{g}(x_2, x_3) = (P_1 g)(x_1, x_2, x_3)$. An induction argument similar to the one above shows that $(P_1^{n+1} g)(x_1, x_2, x_3) = (Q_1^{n} \check{g})(x_2, x_3)$ for all $n \ge 1$,

and thus,

$$\begin{aligned} \nu(P_1^{n+1} g) &= \int_{\mathsf{X}_3} \int_{\mathsf{X}_2} \int_{\mathsf{X}_1} (Q_1^{n} \check{g})(x_2, x_3)\, \frac{d\nu}{d\pi}(x_1, x_2, x_3)\, f(x_1, x_2, x_3)\; \mu_1(dx_1)\; \mu_2(dx_2)\; \mu_3(dx_3) \\ &= \int_{\mathsf{X}_3} \int_{\mathsf{X}_2} (Q_1^{n} \check{g})(x_2, x_3)\, \frac{d\check{\nu}}{d\pi_{2,3}}(x_2, x_3) \left[ \int_{\mathsf{X}_1} f(x_1, x_2, x_3)\; \mu_1(dx_1) \right] \mu_2(dx_2)\; \mu_3(dx_3) \\ &= \int_{\mathsf{X}_3} \int_{\mathsf{X}_2} (Q_1^{n} \check{g})(x_2, x_3)\, \frac{d\check{\nu}}{d\pi_{2,3}}(x_2, x_3)\; \pi_{2,3}\big(d(x_2, x_3)\big) = \check{\nu}(Q_1^{n} \check{g}). \end{aligned}$$

Finally, since $\pi(g) = (\pi P_1)(g) = \pi(P_1 g) = \pi_{2,3}(\check{g})$, we have

$$\big|\nu P_1^{n+1}(g) - \pi(g)\big| = \big|\check{\nu} Q_1^{n}(\check{g}) - \pi_{2,3}(\check{g})\big| \le \sup_{\{h:\, \|h\|_\infty \le 1\}} \big|\check{\nu} Q_1^{n}(h) - \pi_{2,3}(h)\big| = 2\, \|\check{\nu} Q_1^{n} - \pi_{2,3}\|_{TV}$$

for all $n \ge 1$. Since $g: \mathsf{X} \to [-1, 1]$ was arbitrary, it follows that $\|\nu P_1^{n+1} - \pi\|_{TV} \le \|\check{\nu} Q_1^{n} - \pi_{2,3}\|_{TV}$. This implies that $\rho_1 \le \eta_1$, completing the proof of the proposition.

A few technical remarks will be helpful before beginning the proof of Proposition 3. We will employ the following lemma.

Lemma 1. Let $(\mathsf{X}, \mathcal{F}, \mu)$ be a measure space, let $(\mathsf{Y}, \mathcal{G})$ be a measurable space, and let $\pi$ be a probability measure on $(\mathsf{X}, \mathcal{F})$ having density $f$ with respect to $\mu$. Suppose that $T: \mathsf{X} \to \mathsf{Y}$ is measurable and that $f(x) = \tilde{f}(T(x))$ for some measurable function $\tilde{f}: \mathsf{Y} \to \mathbb{R}$. Let $\nu = \mu \circ T^{-1}$ be the measure induced on $(\mathsf{Y}, \mathcal{G})$ by $\mu$ and $T$. Similarly, let $\tilde{\pi} = \pi \circ T^{-1}$ be the probability measure induced on $(\mathsf{Y}, \mathcal{G})$ by $\pi$ and $T$. Then $\tilde{\pi}$ has density $\tilde{f}$ with respect to $\nu$.

Proof. By change of variables (Billingsley, 1995, Theorem 16.13), for any $B \in \mathcal{G}$, we have

$$\tilde{\pi}(B) = \pi\big(T^{-1}(B)\big) = \int_{T^{-1}(B)} \tilde{f}(T(x))\; \mu(dx) = \int_B \tilde{f}(y)\; (\mu \circ T^{-1})(dy) = \int_B \tilde{f}(y)\; \nu(dy).$$

Returning to the specific context of Proposition 3, consider the product spaces $(\mathsf{X}, \mathcal{F}, \mu)$ and $(\mathsf{Y}, \mathcal{G})$, and the transformation $T(x_1, x_2, \ldots, x_k) = (t_1(x_1), t_2(x_2), \ldots, t_k(x_k))$. By Lemma 1, $\tilde{\pi} = \pi \circ T^{-1}$ has density $\tilde{f}$ with respect to the measure $\nu = \mu \circ T^{-1}$. Let $\nu_i = \mu_i \circ t_i^{-1}$, $i = 1, 2, \ldots, k$. If the measure spaces $(\mathsf{Y}_i, \mathcal{G}_i, \nu_i)$, $i = 1, 2, \ldots, k$, are $\sigma$-finite, then it is easy to check that $\nu$ is equal to the product measure $\nu_1 \times \nu_2 \times \cdots \times \nu_k$. However, there is nothing in our hypotheses to guarantee that the $\nu_i$ are $\sigma$-finite, and if any of them fail to be $\sigma$-finite, then technical difficulties arise which invalidate our proof.

Fortunately, we may assume without loss of generality that the $\nu_i$ are $\sigma$-finite, and even finite. To see this, let $\pi_i$ denote the $i$th marginal distribution of $\pi$ and let $f_i$ denote the density of $\pi_i$ with respect to $\mu_i$, which may be computed in the usual way by integrating $f$ over all but its $i$th coordinate. Then it is easy to check that $\pi$ has density $\bar{f}$ with respect to the product measure $\pi_1 \times \pi_2 \times \cdots \times \pi_k$, where

$$\bar{f}(x_1, x_2, \ldots, x_k) = \begin{cases} \dfrac{f(x_1, x_2, \ldots, x_k)}{f_1(x_1) f_2(x_2) \cdots f_k(x_k)}, & \text{if } f_1(x_1) f_2(x_2) \cdots f_k(x_k) > 0, \\[4pt] 0, & \text{otherwise.} \end{cases}$$

Now let

$$\tilde{f}_1(y_1) = \int_{\mathsf{X}_k} \cdots \int_{\mathsf{X}_2} \tilde{f}\big(y_1, t_2(x_2), \ldots, t_k(x_k)\big)\; \mu_2(dx_2) \cdots \mu_k(dx_k),$$

and define $\tilde{f}_2, \ldots, \tilde{f}_k$ similarly. From (2), it is obvious that $f_i(x_i) = \tilde{f}_i(t_i(x_i))$, $i = 1, 2, \ldots, k$, and it follows that $\bar{f}(x_1, x_2, \ldots, x_k)$ is a function of $(t_1(x_1), t_2(x_2), \ldots, t_k(x_k))$. Thus, the hypotheses of Proposition 3 also hold upon replacement of $f$ by $\bar{f}$ and $\mu_i$ by $\pi_i$, $i = 1, 2, \ldots, k$. But in this case $\nu_i = \pi_i \circ t_i^{-1}$, which is a probability measure, and hence finite.

Proof of Proposition 3. Again, we will prove the result for $k = 3$. Assume, without loss of generality, that $\nu_i = \mu_i \circ t_i^{-1}$ is $\sigma$-finite for $i = 1, 2, 3$. Let $(X_1, X_2, X_3)$ have distribution $\pi$. We first prove that the Gibbs sampler based on (the distribution of) $(t_1(X_1), X_2, X_3)$ has the same $L^1$-rate of convergence as the one based on $(X_1, X_2, X_3)$. A similar argument then implies that this rate of convergence is shared by the Gibbs sampler based on $(t_1(X_1), t_2(X_2), X_3)$, and then by the Gibbs sampler based on $(t_1(X_1), t_2(X_2), t_3(X_3))$, thus proving the result.

By Lemma 1, $(t_1(X_1), X_2, X_3)$ has density $g(y_1, x_2, x_3) = \tilde{f}(y_1, t_2(x_2), t_3(x_3))$ with respect to $\nu_1 \times \mu_2 \times \mu_3$. Letting $g_{1|23}(y_1 \mid x_2, x_3)$, $g_{2|13}(x_2 \mid y_1, x_3)$, and $g_{3|12}(x_3 \mid y_1, x_2)$ represent the corresponding conditional densities, the Gibbs sampler based on $(t_1(X_1), X_2, X_3)$ has transition density

$$\tilde{k}_1(y_1', x_2', x_3' \mid y_1, x_2, x_3) = g_{1|23}(y_1' \mid x_2, x_3)\, g_{2|13}(x_2' \mid y_1', x_3)\, g_{3|12}(x_3' \mid y_1', x_2')$$

with respect to $\nu_1 \times \mu_2 \times \mu_3$. But $g(t_1(x_1), x_2, x_3) = \tilde{f}(t_1(x_1), t_2(x_2), t_3(x_3)) = f(x_1, x_2, x_3)$, and from this it is easily checked that $g_{1|23}(t_1(x_1) \mid x_2, x_3) = f_{1|23}(x_1 \mid x_2, x_3)$, $g_{2|13}(x_2 \mid t_1(x_1), x_3) = f_{2|13}(x_2 \mid x_1, x_3)$, and $g_{3|12}(x_3 \mid t_1(x_1), x_2) = f_{3|12}(x_3 \mid x_1, x_2)$.

By change of variables,

$$\begin{aligned} \int_{\mathsf{Y}_1} \tilde{k}_1(y_1', x_2', x_3' \mid y_1, x_2, x_3)\; \nu_1(dy_1') &= \int_{\mathsf{Y}_1} g_{1|23}(y_1' \mid x_2, x_3)\, g_{2|13}(x_2' \mid y_1', x_3)\, g_{3|12}(x_3' \mid y_1', x_2')\; (\mu_1 \circ t_1^{-1})(dy_1') \\ &= \int_{\mathsf{X}_1} g_{1|23}(t_1(x_1') \mid x_2, x_3)\, g_{2|13}(x_2' \mid t_1(x_1'), x_3)\, g_{3|12}(x_3' \mid t_1(x_1'), x_2')\; \mu_1(dx_1') \\ &= \int_{\mathsf{X}_1} f_{1|23}(x_1' \mid x_2, x_3)\, f_{2|13}(x_2' \mid x_1', x_3)\, f_{3|12}(x_3' \mid x_1', x_2')\; \mu_1(dx_1') \\ &= \int_{\mathsf{X}_1} k_1(x_1', x_2', x_3' \mid x_1, x_2, x_3)\; \mu_1(dx_1'), \end{aligned}$$

and thus the marginal $(X_2, X_3)$ chain of the Gibbs sampler based on $(t_1(X_1), X_2, X_3)$ has the same transition density (with respect to $\mu_2 \times \mu_3$) as the marginal $(X_2, X_3)$ chain of the Gibbs sampler based on $(X_1, X_2, X_3)$. This implies that the two marginal chains have the same $L^1$-convergence rate, and it follows from Proposition 2 that the two parent chains also share this rate.

Acknowledgments

The authors thank Aixin Tan and an anonymous referee for helpful comments and suggestions.

References

Billingsley, P. (1995). Probability and Measure. 3rd ed. John Wiley and Sons, New York.

Diaconis, P., Khare, K. and Saloff-Coste, L. (2008). Gibbs sampling, exponential families and orthogonal polynomials (with discussion). Statistical Science.

Diebolt, J. and Robert, C. P. (1994). Estimation of finite mixture distributions by Bayesian sampling. Journal of the Royal Statistical Society, Series B.

Gelfand, A. E., Sahu, S. K. and Carlin, B. P. (1995). Efficient parametrisations for normal linear mixed models. Biometrika.

Liu, J. S., Wong, W. H. and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika.

Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer-Verlag, London.

Papaspiliopoulos, O., Roberts, G. O. and Sköld, M. (2007). A general framework for the parametrization of hierarchical models. Statistical Science.

Roberts, G. and Sahu, S. K. (1997). Updating schemes, correlation structure, blocking and parameterisation for the Gibbs sampler. Journal of the Royal Statistical Society, Series B.

Roberts, G. O. and Rosenthal, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability.

Roberts, G. O. and Rosenthal, J. S. (2001). Markov chains and de-initializing processes. Scandinavian Journal of Statistics.

Roberts, G. O. and Tweedie, R. L. (2001). Geometric $L^2$ and $L^1$ convergence are equivalent for reversible Markov chains. Journal of Applied Probability, 38A.

Román, J. C. (2012). Convergence Analysis of Block Gibbs Samplers for Bayesian General Linear Mixed Models. Ph.D. thesis, Department of Statistics, University of Florida.

Román, J. C. and Hobert, J. P. (2012). Convergence analysis of the Gibbs sampler for Bayesian general linear mixed models with improper priors. Annals of Statistics.

Rosenthal, J. S. (2003). Asymptotic variance and convergence rates of nearly-periodic MCMC algorithms. Journal of the American Statistical Association.

Tan, A. and Hobert, J. P. (2009). Block Gibbs sampling for Bayesian random effects models with improper priors: Convergence and regeneration. Journal of Computational and Graphical Statistics.

Yu, Y. and Meng, X.-L. (2011). To center or not to center: That is not the question - an ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency (with discussion). Journal of Computational and Graphical Statistics.


More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M. Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although

More information

Improved Robust MCMC Algorithm for Hierarchical Models

Improved Robust MCMC Algorithm for Hierarchical Models UNIVERSITY OF TEXAS AT SAN ANTONIO Improved Robust MCMC Algorithm for Hierarchical Models Liang Jing July 2010 1 1 ABSTRACT In this paper, three important techniques are discussed with details: 1) group

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Control Variates for Markov Chain Monte Carlo

Control Variates for Markov Chain Monte Carlo Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability

More information

Markov Chains and De-initialising Processes

Markov Chains and De-initialising Processes Markov Chains and De-initialising Processes by Gareth O. Roberts* and Jeffrey S. Rosenthal** (November 1998; last revised July 2000.) Abstract. We define a notion of de-initialising Markov chains. We prove

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)

More information