CONVERGENCE ANALYSIS OF THE GIBBS SAMPLER FOR BAYESIAN GENERAL LINEAR MIXED MODELS WITH IMPROPER PRIORS


THE ANNALS OF STATISTICS
2012, Vol. 40, No. 6, 2823–2849
DOI: 10.1214/12-AOS1052
© Institute of Mathematical Statistics, 2012

CONVERGENCE ANALYSIS OF THE GIBBS SAMPLER FOR BAYESIAN GENERAL LINEAR MIXED MODELS WITH IMPROPER PRIORS

BY JORGE CARLOS ROMÁN AND JAMES P. HOBERT

Vanderbilt University and University of Florida

Received April 2012; revised September 2012. Supported by NSF Grants DMS and DMS.
MSC2010 subject classifications. Primary 60J27; secondary 62F15.
Key words and phrases. Convergence rate, geometric drift condition, geometric ergodicity, Markov chain, Monte Carlo, posterior propriety.

Bayesian analysis of data from the general linear mixed model is challenging because any nontrivial prior leads to an intractable posterior density. However, if a conditionally conjugate prior density is adopted, then there is a simple Gibbs sampler that can be employed to explore the posterior density. A popular default among the conditionally conjugate priors is an improper prior that takes a product form with a flat prior on the regression parameter, and so-called power priors on each of the variance components. In this paper, a convergence rate analysis of the corresponding Gibbs sampler is undertaken. The main result is a simple, easily-checked sufficient condition for geometric ergodicity of the Gibbs Markov chain. This result is close to the best possible result in the sense that the sufficient condition is only slightly stronger than what is required to ensure posterior propriety. The theory developed in this paper is extremely important from a practical standpoint because it guarantees the existence of central limit theorems that allow for the computation of valid asymptotic standard errors for the estimates computed using the Gibbs sampler.

1. Introduction. The general linear mixed model (GLMM) takes the form

(1) $Y = X\beta + Zu + e$,

where $Y$ is an $N \times 1$ data vector, $X$ and $Z$ are known matrices with dimensions $N \times p$ and $N \times q$, respectively, $\beta$ is an unknown $p \times 1$ vector of regression coefficients, $u$ is a random vector whose elements represent the various levels of the random factors in the model, and $e \sim N_N(0, \sigma_e^2 I)$. The random vectors $e$ and $u$ are assumed to be independent. Suppose there are $r$ random factors in the model. Then $u$ and $Z$ are partitioned accordingly as $u = (u_1^T, u_2^T, \ldots, u_r^T)^T$ and $Z = (Z_1\ Z_2\ \cdots\ Z_r)$, where $u_i$ is $q_i \times 1$, $Z_i$ is $N \times q_i$ and $q_1 + \cdots + q_r = q$. Then $Zu = \sum_{i=1}^{r} Z_i u_i$,

and it is assumed that $u \sim N_q(0, D)$, where $D = \oplus_{i=1}^{r} \sigma_{u_i}^2 I_{q_i}$. Let $\sigma^2$ denote the vector of variance components, that is, $\sigma^2 = (\sigma_e^2, \sigma_{u_1}^2, \ldots, \sigma_{u_r}^2)^T$. For background on this model, which is sometimes called the variance components model, see Searle, Casella and McCulloch (1992).

A Bayesian version of the GLMM can be assembled by specifying a prior distribution for the unknown parameters, $\beta$ and $\sigma^2$. A popular choice is the proper (conditionally) conjugate prior that takes $\beta$ to be multivariate normal, and takes each of the variance components to be inverted gamma. One obvious reason for using such a prior is that the resulting posterior has conditional densities with standard forms, and this facilitates the use of the Gibbs sampler. In situations where there is little prior information, the hyperparameters of this proper prior are often set to extreme values, as this is thought to yield a noninformative prior. Unfortunately, these extreme proper priors approximate improper priors that correspond to improper posteriors, and this results in various forms of instability. This problem has led several authors, including Daniels (1999) and Gelman (2006), to discourage the use of such extreme proper priors, and to recommend alternative default priors that are improper, but lead to proper posteriors.

Consider, for example, the one-way random effects model given by

(2) $Y_{ij} = \beta + \alpha_i + e_{ij}$,

where $i = 1, \ldots, c$, $j = 1, \ldots, n_i$, the $\alpha_i$'s are i.i.d. $N(0, \sigma_\alpha^2)$, and the $e_{ij}$'s, which are independent of the $\alpha_i$'s, are i.i.d. $N(0, \sigma_e^2)$. This is an important special case of model (1). (See Section 5 for a detailed explanation of how the GLMM reduces to the one-way model.) The standard diffuse prior for this model, which is among those recommended by Gelman (2006), has density proportional to $1/(\sigma_e^2\, \sigma_\alpha)$. This prior, like many of the improper priors for the GLMM that have been suggested and studied in the literature, is called a power prior because it is a product of terms, each a variance component raised to a (possibly negative) power. Of course, like the proper conjugate priors mentioned above, power priors also lead to posteriors whose conditional densities have standard forms.

In this paper, we consider the following parametric family of priors for $(\beta, \sigma^2)$:

(3) $p(\beta, \sigma^2; a, b) = \left[(\sigma_e^2)^{-(a_e+1)} e^{-b_e/\sigma_e^2}\right] \left[\prod_{i=1}^{r} (\sigma_{u_i}^2)^{-(a_i+1)} e^{-b_i/\sigma_{u_i}^2}\right] I_{\mathbb{R}_+^{r+1}}(\sigma^2)$,

where $a = (a_e, a_1, \ldots, a_r)$ and $b = (b_e, b_1, \ldots, b_r)$ are fixed hyperparameters, and $\mathbb{R}_+ := (0, \infty)$. By taking $b$ to be the vector of 0's, we can recover the power priors described above. Note that $\beta$ does not appear on the right-hand side of (3); that is, we are using a so-called flat prior for $\beta$. Consequently, even if all the elements of $a$ and $b$ are strictly positive, so that every variance component gets a proper prior, the overall prior remains improper. There have been several studies concerning posterior propriety in this context, but it is still not known exactly which values of $a$ and $b$ yield proper posteriors.

The best known result is due to Sun, Tsutakawa and He (2001), and we state it below so that it can be used in a comparison later in this section. Define $\theta = (\beta^T, u^T)^T$ and $W = (X\ Z)$, so that $W\theta = X\beta + Zu$. Let $y$ denote the observed value of $Y$, and let $\phi_d(x; \mu, \Sigma)$ denote the $N_d(\mu, \Sigma)$ density evaluated at the vector $x$. By definition, the posterior density is proper if

$m(y) := \int_{\mathbb{R}_+^{r+1}} \int_{\mathbb{R}^{p+q}} \pi(\theta, \sigma^2 \mid y)\, d\theta\, d\sigma^2 < \infty$,

where

(4) $\pi(\theta, \sigma^2 \mid y) = \phi_N(y; W\theta, \sigma_e^2 I)\, \phi_q(u; 0, D)\, p(\beta, \sigma^2; a, b)$.

A routine calculation shows that the posterior is improper if $\mathrm{rank}(X) < p$. The following result provides sufficient (and nearly necessary) conditions for propriety. (Throughout the paper, the symbol $P$ subscripted with a matrix will denote the projection onto the column space of that matrix.)

THEOREM 1 [Sun, Tsutakawa and He (2001)]. Assume that $\mathrm{rank}(X) = p$, and let $t = \mathrm{rank}(Z^T(I - P_X)Z)$ and $\mathrm{SSE} = \|(I - P_W)y\|^2$. If the following four conditions hold, then $m(y) < \infty$:

(A) for each $i \in \{1, 2, \ldots, r\}$, one of the following holds: (A1) $a_i < 0$ and $b_i = 0$; (A2) $b_i > 0$;
(B) for each $i \in \{1, 2, \ldots, r\}$, $q_i + 2a_i > q - t$;
(C) $N + 2a_e > p - 2\sum_{i=1}^{r} a_i I_{(-\infty,0)}(a_i)$;
(D) $b_e + \mathrm{SSE} > 0$.

If $m(y) < \infty$, then the posterior density is well defined (i.e., proper) and is given by $\pi(\theta, \sigma^2 \mid y)/m(y)$, but it is intractable in the sense that posterior expectations cannot be computed in closed form, nor even by classical Monte Carlo methods. However, there is a simple two-step Gibbs sampler that can be used to approximate the intractable posterior expectations. This Gibbs sampler simulates a Markov chain, $\{(\theta_n, \sigma_n^2)\}_{n=0}^{\infty}$, that lives on $\mathsf{X} = \mathbb{R}^{p+q} \times \mathbb{R}_+^{r+1}$, and has invariant density $\pi(\theta, \sigma^2 \mid y)$. If the current state of the chain is $(\theta_n, \sigma_n^2)$, then the next state, $(\theta_{n+1}, \sigma_{n+1}^2)$, is simulated using the usual two steps. Indeed, we draw $\theta_{n+1}$ from $\pi(\theta \mid \sigma_n^2, y)$, which is a $(p+q)$-dimensional multivariate normal density, and then we draw $\sigma_{n+1}^2$ from $\pi(\sigma^2 \mid \theta_{n+1}, y)$, which is a product of $r+1$ univariate inverted gamma densities. The exact forms of these conditional densities are given in Section 2. Because the Gibbs Markov chain is Harris ergodic (see Section 2), we can use it to construct consistent estimates of intractable posterior expectations. For $k > 0$, let $L^k(\pi)$ denote the set of functions $g : \mathbb{R}^{p+q} \times \mathbb{R}_+^{r+1} \to \mathbb{R}$ such that

$E_\pi |g|^k := \int_{\mathbb{R}_+^{r+1}} \int_{\mathbb{R}^{p+q}} |g(\theta, \sigma^2)|^k\, \pi(\theta, \sigma^2 \mid y)\, d\theta\, d\sigma^2 < \infty$.
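The next paragraph recalls that averages along the chain are strongly consistent for posterior expectations, and that reporting them responsibly requires an asymptotic standard error. As a purely illustrative aside (not part of the paper), here is a minimal batch-means standard error sketch in Python; the function name and the batch count are our own choices, and the estimator is trustworthy only when a Markov chain CLT of the kind discussed below holds.

```python
import numpy as np

def batch_means_se(gvals, n_batches=30):
    # Split the chain output g(theta_0, sigma2_0), ..., g(theta_{m-1}, sigma2_{m-1})
    # into nonoverlapping batches; the sample standard deviation of the batch
    # means, divided by sqrt(n_batches), estimates the standard error of the
    # overall average whenever a CLT for that average exists.
    m = (len(gvals) // n_batches) * n_batches
    batches = np.asarray(gvals[:m]).reshape(n_batches, -1).mean(axis=1)
    return batches.std(ddof=1) / np.sqrt(n_batches)
```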

If $g \in L^1(\pi)$, then the ergodic theorem implies that the average

$\bar g_m := \frac{1}{m} \sum_{i=0}^{m-1} g(\theta_i, \sigma_i^2)$

is a strongly consistent estimator of $E_\pi g$, no matter how the chain is started. Of course, in practice, an estimator is only useful if it is possible to compute an associated (probabilistic) bound on the difference between the estimate and the truth. Typically, this bound is based on a standard error. All available methods of computing a valid asymptotic standard error for $\bar g_m$ are based on the existence of a central limit theorem (CLT) for $\bar g_m$ [see, e.g., Bednorz and Łatuszyński (2007), Flegal, Haran and Jones (2008), Flegal and Jones (2010), Jones et al. (2006)]. Unfortunately, even if $g \in L^k(\pi)$ for all $k > 0$, Harris ergodicity is not enough to guarantee the existence of a CLT for $\bar g_m$ [see, e.g., Roberts and Rosenthal (1998, 2004)]. The standard method of establishing the existence of CLTs is to prove that the underlying Markov chain converges at a geometric rate. Let $\mathcal{B}(\mathsf{X})$ denote the Borel sets in $\mathsf{X}$, and let $P^n : \mathsf{X} \times \mathcal{B}(\mathsf{X}) \to [0, 1]$ denote the $n$-step Markov transition function of the Gibbs Markov chain. That is, $P^n((\theta, \sigma^2), A)$ is the probability that $(\theta_n, \sigma_n^2) \in A$, given that the chain is started at $(\theta_0, \sigma_0^2) = (\theta, \sigma^2)$. Also, let $\Pi(\cdot)$ denote the posterior distribution. The chain is called geometrically ergodic if there exist a function $M : \mathsf{X} \to [0, \infty)$ and a constant $\varrho \in [0, 1)$ such that, for all $(\theta, \sigma^2) \in \mathsf{X}$ and all $n = 0, 1, \ldots$, we have

$\left\|P^n\left((\theta, \sigma^2), \cdot\right) - \Pi(\cdot)\right\|_{\mathrm{TV}} \le M(\theta, \sigma^2)\, \varrho^n$,

where $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm. The relationship between geometric convergence and CLTs is simple: if the chain is geometrically ergodic and $E_\pi |g|^{2+\delta} < \infty$ for some $\delta > 0$, then there is a CLT for $\bar g_m$.

Our main result (Theorem 2 in Section 3) provides conditions under which the Gibbs Markov chain is geometrically ergodic. The conditions of Theorem 2 are not easy to interpret, and checking them may require some nontrivial numerical work. On the other hand, the following corollary to Theorem 2 is a slightly weaker result whose conditions are very easy to check and understand.

COROLLARY 1. Assume that $\mathrm{rank}(X) = p$. If the following four conditions hold, then the Gibbs Markov chain is geometrically ergodic:

(A) for each $i \in \{1, 2, \ldots, r\}$, one of the following holds: (A1) $a_i < 0$ and $b_i = 0$; (A2) $b_i > 0$;
(B′) for each $i \in \{1, 2, \ldots, r\}$, $q_i + 2a_i > q - t + 2$;
(C′) $N + 2a_e > p + t + 2$;
(D) $b_e + \mathrm{SSE} > 0$.
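Because the quantities in Corollary 1 are all computable directly from $X$, $Z$ and $y$, the four conditions can be verified mechanically. The following Python sketch is ours, not the paper's; the function and argument names are hypothetical, and it assumes $a = (a_e, a_1, \ldots, a_r)$, $b = (b_e, b_1, \ldots, b_r)$ and $q\_list = (q_1, \ldots, q_r)$.

```python
import numpy as np

def corollary1_holds(X, Z, q_list, a, b, y):
    # Check (A), (B'), (C'), (D) of Corollary 1; rank(X) = p is assumed.
    N, p = X.shape
    q = Z.shape[1]
    PX = X @ np.linalg.solve(X.T @ X, X.T)        # projection onto col(X)
    t = np.linalg.matrix_rank(Z.T @ (np.eye(N) - PX) @ Z)
    W = np.hstack([X, Z])
    PW = W @ np.linalg.pinv(W.T @ W) @ W.T        # projection onto col(W)
    SSE = float(np.sum(((np.eye(N) - PW) @ y) ** 2))
    condA = all(bi > 0 or (bi == 0 and ai < 0) for ai, bi in zip(a[1:], b[1:]))
    condB = all(qi + 2 * ai > q - t + 2 for qi, ai in zip(q_list, a[1:]))
    condC = N + 2 * a[0] > p + t + 2
    condD = b[0] + SSE > 0
    return condA and condB and condC and condD
```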

As we explain in Section 2, the best result we could possibly hope to obtain is that the Gibbs Markov chain is geometrically ergodic whenever the posterior is proper. With this in mind, note that the conditions of Corollary 1 are very close to the conditions for propriety given in Theorem 1. In fact, the former imply the latter. To see this, assume that (A), (B′), (C′) and (D) all hold. Then, obviously, (B) holds, and all that remains is to show that (C) holds. This would follow immediately if we could establish that

(5) $t \ge -2\sum_{i=1}^{r} a_i I_{(-\infty,0)}(a_i)$.

We consider two cases. First, if $\sum_{i=1}^{r} I_{(-\infty,0)}(a_i) = 0$, then it follows that $\sum_{i=1}^{r} a_i I_{(-\infty,0)}(a_i) = 0$, and (5) holds (since $t$ is nonnegative). On the other hand, if $\sum_{i=1}^{r} I_{(-\infty,0)}(a_i) > 0$, then there is at least one negative $a_i$, and (B′) implies that

$\sum_{i=1}^{r} (q_i + 2a_i)\, I_{(-\infty,0)}(a_i) > q - t + 2$.

This inequality combined with the fact that $q = q_1 + \cdots + q_r$ yields

$t > q + 2 - \sum_{i=1}^{r} (q_i + 2a_i)\, I_{(-\infty,0)}(a_i) \ge 2 - 2\sum_{i=1}^{r} a_i I_{(-\infty,0)}(a_i) \ge -2\sum_{i=1}^{r} a_i I_{(-\infty,0)}(a_i)$,

where the middle inequality uses $\sum_{i=1}^{r} q_i I_{(-\infty,0)}(a_i) \le q$. So (5) holds, and this completes the argument.

The strong similarity between the conditions of Corollary 1 and those of Theorem 1 might lead the reader to believe that the proofs of our results rely somehow on Theorem 1. This is not the case, however. In fact, we do not even assume posterior propriety before embarking on our convergence rate analysis; see Section 3.

The only other existing result on geometric convergence of Gibbs samplers for linear mixed models with improper priors is that of Tan and Hobert (2009), who considered (a slightly reparameterized version of) the one-way random effects model (2) and priors with $b = (b_e, b_1) = (0, 0)$. We show in Section 5 that our Theorem 2 (specialized to the one-way model) improves upon the result of Tan and Hobert (2009) in the sense that our sufficient conditions for geometric convergence are weaker. Moreover, it is known in this case exactly which priors lead to proper posteriors (when SSE > 0), and we use this fact to show that our results can be very close to the best possible. For example, if the standard diffuse prior is used, then the posterior is proper if and only if $c \ge 3$. On the other hand, our results imply that the Gibbs Markov chain is geometrically ergodic as long as $c \ge 3$ and the total sample size, $N = n_1 + n_2 + \cdots + n_c$, is at least $c + 2$. The extra condition that $N \ge c + 2$ is extremely weak. Indeed, SSE > 0 implies that $N \ge c + 1$, so, for fixed $c \ge 3$, our condition for geometric ergodicity fails only in the single case where $N = c + 1$.

An analogue of Corollary 1 for the GLMM with proper priors can be found in Johnson and Jones (2010). In contrast with our results, one of their sufficient conditions for geometric convergence is that $X^T Z = 0$, which rarely holds in practice. Overall, the proper and improper cases are similar, in the sense that geometric ergodicity is established via geometric drift conditions in both cases. However, the drift conditions are quite disparate, and the analysis required in the improper case is substantially more demanding. Finally, we note that the linear models considered by Papaspiliopoulos and Roberts (2008) are substantively different from ours because these authors assume that the variance components are known.

The remainder of this paper is organized as follows. Section 2 contains a formal definition of the Gibbs Markov chain. The main convergence result is stated and proved in Section 3, and an application involving the two-way random effects model is given in Section 4. In Section 5, we consider the one-way random effects model and compare our conditions for geometric convergence with those of Tan and Hobert (2009). Finally, Section 6 concerns an interesting technical issue related to the use of improper priors.

2. The Gibbs sampler. In this section, we formally define the Markov chain underlying the Gibbs sampler, and state some of its properties. Recall that $\theta = (\beta^T, u^T)^T$, $\sigma^2 = (\sigma_e^2, \sigma_{u_1}^2, \ldots, \sigma_{u_r}^2)^T$ and $\pi(\theta, \sigma^2 \mid y)$ is the (potentially improper) unnormalized posterior density defined at (4). Suppose that

(6) $\int_{\mathbb{R}^{p+q}} \pi(\theta, \sigma^2 \mid y)\, d\theta < \infty$

for all $\sigma^2$ outside a set of measure zero in $\mathbb{R}_+^{r+1}$, and that

(7) $\int_{\mathbb{R}_+^{r+1}} \pi(\theta, \sigma^2 \mid y)\, d\sigma^2 < \infty$

for all $\theta$ outside a set of measure zero in $\mathbb{R}^{p+q}$. These two integrability conditions are necessary, but not sufficient, for posterior propriety. (Keep in mind that it is not known exactly which priors yield proper posteriors.) When (6) and (7) hold, we can define conditional densities as follows:

$\pi(\theta \mid \sigma^2, y) = \frac{\pi(\theta, \sigma^2 \mid y)}{\int_{\mathbb{R}^{p+q}} \pi(\theta, \sigma^2 \mid y)\, d\theta}$ and $\pi(\sigma^2 \mid \theta, y) = \frac{\pi(\theta, \sigma^2 \mid y)}{\int_{\mathbb{R}_+^{r+1}} \pi(\theta, \sigma^2 \mid y)\, d\sigma^2}$.

Clearly, when the posterior is proper, these conditionals are the usual ones based on $\pi(\theta, \sigma^2 \mid y)$. When the posterior is improper, they are incompatible conditional densities; that is, there is no (proper) joint density that generates them. In either case, we can run the Gibbs sampler as usual by drawing alternately from the two conditionals. However, as we explain below, if the posterior is improper, then the resulting Markov chain cannot be geometrically ergodic. Despite this fact, we do not restrict attention to the cases where the sufficient conditions for propriety in

Theorem 1 are satisfied. Indeed, we hope to close the gap that currently exists between the necessary and sufficient conditions for propriety by finding weaker conditions than those in Theorem 1 that imply geometric ergodicity (and hence posterior propriety).

We now provide a set of conditions that guarantee that the integrability conditions are satisfied. Define

$\bar s = \min\{q_1 + 2a_1, q_2 + 2a_2, \ldots, q_r + 2a_r, N + 2a_e\}$.

The proof of the following result is straightforward and is left to the reader.

PROPOSITION 1. The following four conditions are sufficient for (6) and (7) to hold: (S1) $\mathrm{rank}(X) = p$; (S2) $\min\{b_1, b_2, \ldots, b_r\} \ge 0$; (S3) $b_e + \mathrm{SSE} > 0$; (S4) $\bar s > 0$.

Note that $\mathrm{SSE} = \mathrm{SSE}(X, Z, y) = \|y - W\hat\theta\|^2$, where $W = (X\ Z)$ and $\hat\theta = (W^T W)^{+} W^T y$. Therefore, if condition (S3) holds, then for all $\theta \in \mathbb{R}^{p+q}$,

$b_e + \|y - W\theta\|^2 = b_e + \|y - W\hat\theta\|^2 + \|W\theta - W\hat\theta\|^2 \ge b_e + \mathrm{SSE} > 0$.

Note also that if $N > p + q$, then SSE is strictly positive with probability one under the data generating model.

Assume now that (S1)–(S4) hold so that the conditional densities are well defined. Routine manipulation of $\pi(\theta, \sigma^2 \mid y)$ shows that $\pi(\theta \mid \sigma^2, y)$ is a multivariate normal density with mean vector

$m = \begin{pmatrix} (X^T X)^{-1} X^T \left[I - (\sigma_e^2)^{-1} Z Q^{-1} Z^T (I - P_X)\right] y \\ (\sigma_e^2)^{-1} Q^{-1} Z^T (I - P_X)\, y \end{pmatrix}$

and covariance matrix

$V = \begin{pmatrix} \sigma_e^2 (X^T X)^{-1} + R Q^{-1} R^T & -R Q^{-1} \\ -Q^{-1} R^T & Q^{-1} \end{pmatrix}$,

where $Q = (\sigma_e^2)^{-1} Z^T (I - P_X) Z + D^{-1}$ and $R = (X^T X)^{-1} X^T Z$.

Things are a bit more complicated for $\pi(\sigma^2 \mid \theta, y)$ due to the possible existence of a bothersome set of measure zero. Define $A = \{i \in \{1, 2, \ldots, r\} : b_i = 0\}$. If $A$ is empty, then $\pi(\sigma^2 \mid \theta, y)$ is well defined for every $\theta \in \mathbb{R}^{p+q}$, and it is the following product of $r+1$ inverted gamma densities:

$\pi(\sigma^2 \mid \theta, y) = f_{IG}\left(\sigma_e^2;\ \frac{N}{2} + a_e,\ \frac{\|y - W\theta\|^2}{2} + b_e\right) \prod_{i=1}^{r} f_{IG}\left(\sigma_{u_i}^2;\ \frac{q_i}{2} + a_i,\ \frac{\|u_i\|^2}{2} + b_i\right)$,

where

$f_{IG}(v; c, d) = \frac{d^c}{\Gamma(c)}\, v^{-(c+1)}\, e^{-d/v}$ for $v > 0$ (and 0 for $v \le 0$),

for $c, d > 0$.
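For concreteness, here is a hedged Python sketch of one sweep of the two-step sampler, written directly from the conditional densities above. The function and variable names are ours, and no attempt is made at numerical efficiency (the projections and $Q^{-1}$ would normally be factored once).

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(sigma2, X, Z, q_list, a, b, y):
    # One sweep: theta ~ pi(theta | sigma2, y), then sigma2 ~ pi(sigma2 | theta, y).
    N, p = X.shape
    se2, su2 = sigma2[0], np.asarray(sigma2[1:])
    PX = X @ np.linalg.solve(X.T @ X, X.T)                 # projection onto col(X)
    Dinv = np.diag(np.repeat(1.0 / su2, q_list))           # D^{-1}
    Qinv = np.linalg.inv(Z.T @ (np.eye(N) - PX) @ Z / se2 + Dinv)
    mu_u = Qinv @ Z.T @ (np.eye(N) - PX) @ y / se2         # E(u | sigma2, y)
    mu_b = np.linalg.solve(X.T @ X, X.T @ (y - Z @ mu_u))  # E(beta | sigma2, y)
    R = np.linalg.solve(X.T @ X, X.T @ Z)
    V = np.block([[se2 * np.linalg.inv(X.T @ X) + R @ Qinv @ R.T, -R @ Qinv],
                  [-Qinv @ R.T, Qinv]])
    theta = rng.multivariate_normal(np.concatenate([mu_b, mu_u]), V, method="eigh")
    beta, u = theta[:p], theta[p:]
    # sigma_e^2 ~ IG(N/2 + a_e, ||y - W theta||^2/2 + b_e), drawn as 1/Gamma.
    resid2 = np.sum((y - X @ beta - Z @ u) ** 2)
    se2_new = 1.0 / rng.gamma(N / 2 + a[0], 1.0 / (resid2 / 2 + b[0]))
    su2_new, start = [], 0
    for i, qi in enumerate(q_list, start=1):
        ui2 = np.sum(u[start:start + qi] ** 2)
        su2_new.append(1.0 / rng.gamma(qi / 2 + a[i], 1.0 / (ui2 / 2 + b[i])))
        start += qi
    return theta, np.array([se2_new] + su2_new)
```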

On the other hand, if $A$ is nonempty, then

$\int_{\mathbb{R}_+^{r+1}} \pi(\theta, \sigma^2 \mid y)\, d\sigma^2 = \infty$

whenever $\theta \in \mathcal{N} := \{\theta \in \mathbb{R}^{p+q} : \prod_{i \in A} \|u_i\| = 0\}$. The fact that $\pi(\sigma^2 \mid \theta, y)$ is not defined when $\theta \in \mathcal{N}$ is irrelevant from a simulation standpoint because the probability of observing a $\theta$ in $\mathcal{N}$ is zero. However, in order to perform a theoretical analysis, the Markov transition density (Mtd) of the Gibbs Markov chain must be defined for every $\theta \in \mathbb{R}^{p+q}$. Obviously, the Mtd can be defined arbitrarily on a set of measure zero. Thus, for $\theta \notin \mathcal{N}$, we define $\pi(\sigma^2 \mid \theta, y)$ as in the case where $A$ is empty, while if $\theta \in \mathcal{N}$, we define it to be

$f_{IG}(\sigma_e^2; 1, 1) \prod_{i=1}^{r} f_{IG}(\sigma_{u_i}^2; 1, 1)$.

Note that this definition can also be used when $A$ is empty if we simply define $\mathcal{N}$ to be $\varnothing$ in that case.

The Mtd of the Gibbs Markov chain, $\{(\theta_n, \sigma_n^2)\}_{n=0}^{\infty}$, is defined as

$k(\theta, \sigma^2 \mid \tilde\theta, \tilde\sigma^2) = \pi(\sigma^2 \mid \theta, y)\, \pi(\theta \mid \tilde\sigma^2, y)$.

It is easy to see that the chain is ψ-irreducible, and that $\pi(\theta, \sigma^2 \mid y)$ is an invariant density. It follows that the chain is positive recurrent if and only if the posterior is proper [Meyn and Tweedie (1993), Chapter 10]. Since a geometrically ergodic chain is necessarily positive recurrent, the Gibbs Markov chain cannot be geometrically ergodic when the posterior is improper. The point here is that conditions implying geometric ergodicity also imply posterior propriety.

The marginal sequences, $\{\theta_n\}_{n=0}^{\infty}$ and $\{\sigma_n^2\}_{n=0}^{\infty}$, are themselves Markov chains; see, for example, Liu, Wong and Kong (1994). The $\sigma^2$-chain lives on $\mathbb{R}_+^{r+1}$ and has Mtd given by

$k_1(\sigma^2 \mid \tilde\sigma^2) = \int_{\mathbb{R}^{p+q}} \pi(\sigma^2 \mid \theta, y)\, \pi(\theta \mid \tilde\sigma^2, y)\, d\theta$

and invariant density $\int_{\mathbb{R}^{p+q}} \pi(\theta, \sigma^2 \mid y)\, d\theta$. Similarly, the θ-chain lives on $\mathbb{R}^{p+q}$ and has Mtd

$k_2(\theta \mid \tilde\theta) = \int_{\mathbb{R}_+^{r+1}} \pi(\theta \mid \sigma^2, y)\, \pi(\sigma^2 \mid \tilde\theta, y)\, d\sigma^2$

and invariant density $\int_{\mathbb{R}_+^{r+1}} \pi(\theta, \sigma^2 \mid y)\, d\sigma^2$. Since the two marginal chains are also ψ-irreducible, they are positive recurrent if and only if the posterior is proper. Moreover, when the posterior is proper, routine calculations show that all three chains are Harris ergodic; that is, ψ-irreducible, aperiodic and positive Harris recurrent; see Román (2012) for details.
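A short driver loop makes the strongly consistent ergodic averages of the Introduction concrete. Everything here is for illustration only; it assumes `gibbs_step`, `batch_means_se` and the data objects from the earlier sketches are in scope, that $p = 1$, and that the quantity of interest is $g(\theta, \sigma^2) = \beta$.

```python
import numpy as np

sigma2 = np.ones(1 + len(q_list))        # arbitrary starting value for sigma^2
draws = []
for _ in range(10000):
    theta, sigma2 = gibbs_step(sigma2, X, Z, q_list, a, b, y)
    draws.append(theta[0])               # g(theta, sigma^2) = beta when p = 1
print(np.mean(draws))                    # ergodic average, consistent for E_pi(beta)
print(batch_means_se(draws))             # CLT-based standard error (Section 1)
```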

An important fact that we will exploit is that geometric ergodicity is a solidarity property for the three chains $\{(\theta_n, \sigma_n^2)\}_{n=0}^{\infty}$, $\{\theta_n\}_{n=0}^{\infty}$ and $\{\sigma_n^2\}_{n=0}^{\infty}$; that is, either all three are geometric or none of them is [Diaconis, Khare and Saloff-Coste (2008), Liu, Wong and Kong (1994), Roberts and Rosenthal (2001)]. In the next section, we prove that the Gibbs Markov chain converges at a geometric rate by proving that one of the marginal chains does.

3. The main result. In order to state the main result, we need a bit more notation. For $i \in \{1, \ldots, r\}$, define $R_i$ to be the $q_i \times q$ matrix of 0's and 1's such that $R_i u = u_i$. In other words, $R_i$ is the matrix that extracts $u_i$ from $u$. Here is our main result.

THEOREM 2. Assume that (S1)–(S4) hold so that the Gibbs sampler is well defined. If the following two conditions hold, then the Gibbs Markov chain is geometrically ergodic:

(1) For each $i \in \{1, 2, \ldots, r\}$, one of the following holds: (i) $a_i < 0$ and $b_i = 0$; (ii) $b_i > 0$.
(2) There exists an $s \in (0, 1] \cap (0, \bar s/2)$ such that

(8) $2^{-s}\, (p + t)^s\, \frac{\Gamma(N/2 + a_e - s)}{\Gamma(N/2 + a_e)} < 1$

and

(9) $\sum_{i=1}^{r} \left\{2^{-s}\, \frac{\Gamma(q_i/2 + a_i - s)}{\Gamma(q_i/2 + a_i)}\right\} \left[\mathrm{tr}\left(R_i \left(I - P_{Z^T(I-P_X)Z}\right) R_i^T\right)\right]^s < 1$,

where $t = \mathrm{rank}(Z^T(I - P_X)Z)$ and $P_{Z^T(I-P_X)Z}$ is the projection onto the column space of $Z^T(I - P_X)Z$.

REMARK 1. It is important to reiterate that, by themselves, (S1)–(S4) do not imply that the posterior density is proper. Of course, if conditions (1) and (2) in Theorem 2 hold as well, then the chain is geometric, so the posterior is necessarily proper.

REMARK 2. A numerical search could be employed to check the second condition of Theorem 2. Indeed, one could evaluate the left-hand sides of (8) and (9) at all values of $s$ on a fine grid in the interval $(0, 1] \cap (0, \bar s/2)$; a sketch of such a search is given below. The goal, of course, would be to find a single value of $s$ at which both (8) and (9) are satisfied. It can be shown that, if there does exist an $s \in (0, 1] \cap (0, \bar s/2)$ such that (8) and (9) hold, then $N + 2a_e > p + t$ and, for each $i = 1, 2, \ldots, r$, $q_i + 2a_i > 2\, \mathrm{tr}(R_i(I - P_{Z^T(I-P_X)Z})R_i^T)$. Thus, it would behoove the user to verify these simple conditions before engaging in any numerical work.
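The grid search mentioned in Remark 2 is easy to mechanize. The sketch below is our code with hypothetical names; it computes $t$, the traces $\zeta_i = \mathrm{tr}(R_i(I - P_{Z^T(I-P_X)Z})R_i^T)$ and $\bar s$ from the design matrices, then scans a grid in $(0, 1] \cap (0, \bar s/2)$ for an $s$ satisfying both (8) and (9), returning such an $s$ or `None`.

```python
import numpy as np
from scipy.special import gammaln

def theorem2_search(X, Z, q_list, a, n_grid=2000):
    # a = (a_e, a_1, ..., a_r).  Works on the log scale to avoid overflow.
    N, p = X.shape
    q = Z.shape[1]
    PX = X @ np.linalg.solve(X.T @ X, X.T)
    A = Z.T @ (np.eye(N) - PX) @ Z
    t = np.linalg.matrix_rank(A)
    IP = np.eye(q) - A @ np.linalg.pinv(A)          # I minus projection onto col(A)
    idx = np.cumsum([0] + list(q_list))
    zeta = [np.trace(IP[idx[i]:idx[i + 1], idx[i]:idx[i + 1]])
            for i in range(len(q_list))]            # zeta_i, factor by factor
    s_bar = min([qi + 2 * ai for qi, ai in zip(q_list, a[1:])] + [N + 2 * a[0]])
    if s_bar <= 0:
        return None                                  # (S4) fails
    upper = min(1.0, 0.999999 * s_bar / 2)
    for s in np.linspace(upper, 1e-6, n_grid):
        lhs8 = (p + t) ** s * np.exp(-s * np.log(2)
                 + gammaln(N / 2 + a[0] - s) - gammaln(N / 2 + a[0]))
        lhs9 = sum(z ** s * np.exp(-s * np.log(2)
                 + gammaln(qi / 2 + ai - s) - gammaln(qi / 2 + ai))
                   for qi, ai, z in zip(q_list, a[1:], zeta))
        if lhs8 < 1 and lhs9 < 1:
            return s
    return None
```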

REMARK 3. When evaluating (9), it may be helpful to write $P_{Z^T(I-P_X)Z}$ as $U^T P_\Lambda U$, where $U$ and $\Lambda$ are the orthogonal and diagonal matrices, respectively, in the spectral decomposition $U^T \Lambda U$ of $Z^T(I - P_X)Z$. That is, $U$ is a $q$-dimensional orthogonal matrix and $\Lambda$ is a diagonal matrix containing the eigenvalues of $Z^T(I - P_X)Z$. Of course, the projection $P_\Lambda$ is a $q \times q$ binary diagonal matrix whose $i$th diagonal element is 1 if and only if the $i$th diagonal element of $\Lambda$ is positive.

REMARK 4. Note that

$\mathrm{tr}\left(\sum_{i=1}^{r} R_i (I - P_{Z^T(I-P_X)Z}) R_i^T\right) = \mathrm{tr}\left[(I - P_{Z^T(I-P_X)Z}) \sum_{i=1}^{r} R_i^T R_i\right] = \mathrm{tr}(I - P_{Z^T(I-P_X)Z}) = \mathrm{rank}(I - P_{Z^T(I-P_X)Z}) = q - t$.

Moreover, when $r > 1$, the matrix $I - P_{Z^T(I-P_X)Z}$ has $q = q_1 + q_2 + \cdots + q_r$ diagonal elements, and the (nonnegative) term $\mathrm{tr}(R_i(I - P_{Z^T(I-P_X)Z})R_i^T)$ is simply the sum of the $q_i$ diagonal elements that correspond to the $i$th random factor.

REMARK 5. Recall from the Introduction that Corollary 1 provides an alternative set of sufficient conditions for geometric ergodicity that are harder to satisfy, but easier to check. A proof of Corollary 1 is given at the end of this section.

We will prove Theorem 2 indirectly by proving that the $\sigma^2$-chain is geometrically ergodic (when the conditions of Theorem 2 hold). This is accomplished by establishing a geometric drift condition for the $\sigma^2$-chain.

PROPOSITION 2. Assume that (S1)–(S4) hold so that the Gibbs sampler is well defined. Under the two conditions of Theorem 2, there exist a $\rho \in [0, 1)$ and a finite constant $L$ such that, for every $\sigma^2 \in \mathbb{R}_+^{r+1}$,

(10) $E\left[v(\sigma_1^2) \mid \sigma_0^2 = \sigma^2\right] \le \rho\, v(\sigma^2) + L$,

where the drift function is defined as

$v(\sigma^2) = \alpha (\sigma_e^2)^s + \sum_{i=1}^{r} (\sigma_{u_i}^2)^s + \alpha (\sigma_e^2)^{-c} + \sum_{i=1}^{r} (\sigma_{u_i}^2)^{-c}$,

and $\alpha$ and $c$ are positive constants. Hence, under the two conditions of Theorem 2, the $\sigma^2$-chain is geometrically ergodic.

REMARK 6. The formulas for $\rho = \rho(\alpha, s, c)$ and $L = L(\alpha, s, c)$ are provided in the proof, as is a set of acceptable values for the pair $(\alpha, c)$. Recall that the value of $s$ is given to us in the hypothesis of Theorem 2.

PROOF OF PROPOSITION 2. The proof has two parts. In part I, we establish the validity of the geometric drift condition (10). In part II, we use results from Meyn and Tweedie (1993) to show that (10) implies geometric ergodicity of the $\sigma^2$-chain.

Part I. By conditioning on θ and iterating, we can express $E[v(\sigma_1^2) \mid \sigma^2]$ as

$E\left[\alpha E\left((\sigma_e^2)^s \mid \theta\right) + \sum_{i=1}^{r} E\left((\sigma_{u_i}^2)^s \mid \theta\right) + \alpha E\left((\sigma_e^2)^{-c} \mid \theta\right) + \sum_{i=1}^{r} E\left((\sigma_{u_i}^2)^{-c} \mid \theta\right) \,\Big|\, \sigma^2\right]$.

We now develop upper bounds for each of the four terms inside the square brackets. Fix $s \in S := (0, 1] \cap (0, \bar s/2)$, and define

$G_0(s) = 2^{-s}\, \frac{\Gamma(N/2 + a_e - s)}{\Gamma(N/2 + a_e)}$

and, for each $i \in \{1, 2, \ldots, r\}$, define

$G_i(s) = 2^{-s}\, \frac{\Gamma(q_i/2 + a_i - s)}{\Gamma(q_i/2 + a_i)}$.

Note that, since $s \in (0, 1]$, $(x_1 + x_2)^s \le x_1^s + x_2^s$ whenever $x_1, x_2 \ge 0$. Thus,

$E\left[(\sigma_e^2)^s \mid \theta\right] = G_0(s)\left(2b_e + \|y - W\theta\|^2\right)^s \le G_0(s)\left(\|y - W\theta\|^2\right)^s + 2^s G_0(s)\, b_e^s$.

Similarly,

$E\left[(\sigma_{u_i}^2)^s \mid \theta\right] = G_i(s)\left(2b_i + \|u_i\|^2\right)^s \le G_i(s)\left(\|u_i\|^2\right)^s + 2^s G_i(s)\, b_i^s$.

Now, for any $c > 0$, we have

$E\left[(\sigma_e^2)^{-c} \mid \theta\right] = G_0(-c)\left(2b_e + \|y - W\theta\|^2\right)^{-c} \le \frac{2^{-c} G_0(-c)}{(b_e + \mathrm{SSE})^c}$

and, for each $i \in \{1, 2, \ldots, r\}$,

$E\left[(\sigma_{u_i}^2)^{-c} \mid \theta\right] = G_i(-c)\left(2b_i + \|u_i\|^2\right)^{-c} \le G_i(-c)\left[\left(\|u_i\|^2\right)^{-c} I_{\{0\}}(b_i) + (2b_i)^{-c} I_{(0,\infty)}(b_i)\right]$.

Recall that $A = \{i : b_i = 0\}$, and note that $\sum_{i=1}^{r} E[(\sigma_{u_i}^2)^{-c} \mid \theta]$ can be bounded above by a constant if $A$ is empty. Thus, we consider the case in which $A$ is empty separately from the case where $A \ne \varnothing$. We begin with the latter, which is the more difficult case.

Case I: $A$ is nonempty. Combining the four bounds above (and applying Jensen's inequality twice), we have

(11) $E\left[v(\sigma_1^2) \mid \sigma^2\right] \le \alpha G_0(s)\left[E\left(\|y - W\theta\|^2 \mid \sigma^2\right)\right]^s + \sum_{i=1}^{r} G_i(s)\left[E\left(\|u_i\|^2 \mid \sigma^2\right)\right]^s + \sum_{i \in A} G_i(-c)\, E\left[\|u_i\|^{-2c} \mid \sigma^2\right] + \kappa(\alpha, s, c)$,

where

$\kappa(\alpha, s, c) = \alpha 2^s G_0(s)\, b_e^s + 2^s \sum_{i=1}^{r} G_i(s)\, b_i^s + \frac{\alpha 2^{-c} G_0(-c)}{(b_e + \mathrm{SSE})^c} + \sum_{i : b_i > 0} G_i(-c)(2b_i)^{-c}$.

Appendix A.2 contains a proof of the following inequality:

(12) $E\left[\|y - W\theta\|^2 \mid \sigma^2\right] \le (p + t)\, \sigma_e^2 + \left(\|(I - P_X)y\| + \|(I - P_X)Z\|\, K\right)^2$,

where $\|\cdot\|$ with a matrix argument denotes the Frobenius norm, and the constant $K = K(X, Z, y)$ is defined and shown to be finite in Appendix A.1. It follows immediately that

$\left[E\left(\|y - W\theta\|^2 \mid \sigma^2\right)\right]^s \le (p + t)^s (\sigma_e^2)^s + \left(\|(I - P_X)y\| + \|(I - P_X)Z\|\, K\right)^{2s}$.

In Appendix A.3, it is shown that, for each $i \in \{1, 2, \ldots, r\}$, we have

$E\left[\|u_i\|^2 \mid \sigma^2\right] \le \xi_i\, \sigma_e^2 + \zeta_i \sum_{j=1}^{r} \sigma_{u_j}^2 + (\|R_i\|\, K)^2$,

where $\xi_i = \mathrm{tr}(R_i (Z^T(I - P_X)Z)^{+} R_i^T)$, $\zeta_i = \mathrm{tr}(R_i (I - P_{Z^T(I-P_X)Z}) R_i^T)$ and $A^{+}$ denotes the Moore–Penrose inverse of the matrix $A$. It follows that

(13) $\left[E\left(\|u_i\|^2 \mid \sigma^2\right)\right]^s \le \xi_i^s (\sigma_e^2)^s + \zeta_i^s \sum_{j=1}^{r} (\sigma_{u_j}^2)^s + (\|R_i\|\, K)^{2s}$.

In Appendix A.4, it is established that, for any $c \in (0, 1/2)$, and for each $i \in \{1, 2, \ldots, r\}$, we have

(14) $E\left[\|u_i\|^{-2c} \mid \sigma^2\right] \le 2^{-c}\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}\left[\lambda_{\max}^{c} (\sigma_e^2)^{-c} + (\sigma_{u_i}^2)^{-c}\right]$,

where $\lambda_{\max}$ denotes the largest eigenvalue of $Z^T(I - P_X)Z$. Using (12)–(14) in (11), we have

(15) $E\left[v(\sigma_1^2) \mid \sigma^2\right] \le \alpha\left[\delta_1(s) + \frac{\delta_2(s)}{\alpha}\right](\sigma_e^2)^s + \delta_3(s) \sum_{j=1}^{r} (\sigma_{u_j}^2)^s + \delta_4(c)\, (\sigma_e^2)^{-c} + \delta_5(c) \sum_{j=1}^{r} (\sigma_{u_j}^2)^{-c} + L(\alpha, s, c)$,

where

$\delta_1(s) := G_0(s)(p + t)^s$, $\delta_2(s) := \sum_{i=1}^{r} \xi_i^s\, G_i(s)$, $\delta_3(s) := \sum_{i=1}^{r} \zeta_i^s\, G_i(s)$,

$\delta_4(c) := 2^{-c} \lambda_{\max}^{c} \sum_{i \in A} G_i(-c)\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}$, $\delta_5(c) := 2^{-c} \max_{i \in A}\left[G_i(-c)\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}\right]$

and

$L(\alpha, s, c) = \kappa(\alpha, s, c) + \alpha G_0(s)\left(\|(I - P_X)y\| + \|(I - P_X)Z\|\, K\right)^{2s} + \sum_{i=1}^{r} G_i(s)(\|R_i\|\, K)^{2s}$.

Hence,

$E\left[v(\sigma_1^2) \mid \sigma^2\right] \le \rho(\alpha, s, c)\, v(\sigma^2) + L(\alpha, s, c)$,

where

$\rho(\alpha, s, c) = \max\left\{\delta_1(s) + \frac{\delta_2(s)}{\alpha},\ \delta_3(s),\ \frac{\delta_4(c)}{\alpha},\ \delta_5(c)\right\}$.

We must now show that there exists a triple $(\alpha, s, c) \in \mathbb{R}_+ \times S \times (0, 1/2)$ such that $\rho(\alpha, s, c) < 1$. We begin by demonstrating that, if $c$ is small enough, then $\delta_5(c) < 1$. Define $\tilde a = -\max_{i \in A} a_i$. Also, set $C = (0, 1/2) \cap (0, \tilde a)$. Fix $c \in C$ and note that

$\delta_5(c) = \max_{i \in A}\left[\frac{\Gamma(q_i/2 + a_i + c)}{\Gamma(q_i/2 + a_i)} \cdot \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}\right]$.

For any $i \in A$, $c + a_i < 0$, and since $\bar s > 0$, it follows that $0 < q_i/2 + a_i < q_i/2 + a_i + c < q_i/2$. But $\Gamma(x - z)/\Gamma(x)$ is decreasing in $x$ for $x > z > 0$, so we have

$\frac{\Gamma(q_i/2 + a_i)}{\Gamma(q_i/2 + a_i + c)} = \frac{\Gamma((q_i/2 + a_i + c) - c)}{\Gamma(q_i/2 + a_i + c)} > \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}$,

and it follows immediately that $\delta_5(c) < 1$ whenever $c \in C$. The two conditions of Theorem 2 imply that there exists an $s^* \in S$ such that $\delta_1(s^*) < 1$ and $\delta_3(s^*) < 1$. Let $c^*$ be any point in $C$, and choose $\alpha^*$ to be any number larger than

$\max\left\{\frac{\delta_2(s^*)}{1 - \delta_1(s^*)},\ \delta_4(c^*)\right\}$.

A simple calculation shows that $\rho(\alpha^*, s^*, c^*) < 1$, and this completes the argument for case I.

Case II: $A = \varnothing$. Since we no longer have to deal with $\sum_{i=1}^{r} E[(\sigma_{u_i}^2)^{-c} \mid \theta]$, bound (15) becomes

$E\left[v(\sigma_1^2) \mid \sigma^2\right] \le \alpha\left[\delta_1(s) + \frac{\delta_2(s)}{\alpha}\right](\sigma_e^2)^s + \delta_3(s) \sum_{j=1}^{r} (\sigma_{u_j}^2)^s + L(\alpha, s, c)$,

and there is no restriction on $c$ other than $c > 0$. [Note that the constant term $L(\alpha, s, c)$ requires no alteration when we move from case I to case II.] Hence,

$E\left[v(\sigma_1^2) \mid \sigma^2\right] \le \rho(\alpha, s)\, v(\sigma^2) + L(\alpha, s, c)$,

where

$\rho(\alpha, s) = \max\left\{\delta_1(s) + \frac{\delta_2(s)}{\alpha},\ \delta_3(s)\right\}$.

We must now show that there exists an $(\alpha, s) \in \mathbb{R}_+ \times S$ such that $\rho(\alpha, s) < 1$. As in case I, the two conditions of Theorem 2 imply that there exists an $s^* \in S$ such that $\delta_1(s^*) < 1$ and $\delta_3(s^*) < 1$. Let $\alpha^*$ be any number larger than $\delta_2(s^*)/(1 - \delta_1(s^*))$. A simple calculation shows that $\rho(\alpha^*, s^*) < 1$, and this completes the argument for case II. This completes part I of the proof.

Part II. We begin by establishing that the $\sigma^2$-chain satisfies certain properties. Recall that its Mtd is given by

$k_1(\sigma^2 \mid \tilde\sigma^2) = \int_{\mathbb{R}^{p+q}} \pi(\sigma^2 \mid \theta, y)\, \pi(\theta \mid \tilde\sigma^2, y)\, d\theta$.

Note that $k_1$ is strictly positive on $\mathbb{R}_+^{r+1} \times \mathbb{R}_+^{r+1}$. It follows that the $\sigma^2$-chain is ψ-irreducible and aperiodic, and that its maximal irreducibility measure is equivalent to Lebesgue measure on $\mathbb{R}_+^{r+1}$; for definitions, see Meyn and Tweedie (1993), Chapters 4 and 5. Let $P_1$ denote the Markov transition function of the $\sigma^2$-chain; that is, for any $\tilde\sigma^2 \in \mathbb{R}_+^{r+1}$ and any Borel set $A$,

$P_1(\tilde\sigma^2, A) = \int_A k_1(\sigma^2 \mid \tilde\sigma^2)\, d\sigma^2$.

We now demonstrate that the $\sigma^2$-chain is a Feller chain; that is, for each fixed open set $O$, $P_1(\cdot, O)$ is a lower semi-continuous function on $\mathbb{R}_+^{r+1}$. Indeed, let $\{\tilde\sigma_m^2\}_{m=1}^{\infty}$ be a sequence in $\mathbb{R}_+^{r+1}$ that converges to $\tilde\sigma^2 \in \mathbb{R}_+^{r+1}$. Then

$\liminf_{m \to \infty} P_1(\tilde\sigma_m^2, O) = \liminf_{m \to \infty} \int_O k_1(\sigma^2 \mid \tilde\sigma_m^2)\, d\sigma^2 = \liminf_{m \to \infty} \int_O \left[\int_{\mathbb{R}^{p+q}} \pi(\sigma^2 \mid \theta, y)\, \pi(\theta \mid \tilde\sigma_m^2, y)\, d\theta\right] d\sigma^2 \ge \int_O \left[\int_{\mathbb{R}^{p+q}} \pi(\sigma^2 \mid \theta, y) \liminf_{m \to \infty} \pi(\theta \mid \tilde\sigma_m^2, y)\, d\theta\right] d\sigma^2 = \int_O \left[\int_{\mathbb{R}^{p+q}} \pi(\sigma^2 \mid \theta, y)\, \pi(\theta \mid \tilde\sigma^2, y)\, d\theta\right] d\sigma^2 = P_1(\tilde\sigma^2, O)$,

where the inequality follows from Fatou's lemma, and the third equality follows from the fact that $\pi(\theta \mid \sigma^2, y)$ is continuous in $\sigma^2$; for a proof of continuity, see Román (2012). We conclude that $P_1(\cdot, O)$ is lower semi-continuous, so the $\sigma^2$-chain is Feller.

The last thing we must do before we can appeal to the results in Meyn and Tweedie (1993) is to show that the drift function, $v(\cdot)$, is unbounded off compact sets; that is, we must show that, for every $d \in \mathbb{R}$, the set

$S_d = \left\{\sigma^2 \in \mathbb{R}_+^{r+1} : v(\sigma^2) \le d\right\}$

is compact. Let $d$ be such that $S_d$ is nonempty (otherwise $S_d$ is trivially compact), which means that $d$ and $d/\alpha$ must be larger than 1. Since $v(\sigma^2)$ is a continuous function, $S_d$ is closed in $\mathbb{R}_+^{r+1}$. Now consider the following set:

$T_d = \left[(d/\alpha)^{-1/c},\ (d/\alpha)^{1/s}\right] \times \left[d^{-1/c},\ d^{1/s}\right] \times \cdots \times \left[d^{-1/c},\ d^{1/s}\right]$.

The set $T_d$ is compact in $\mathbb{R}_+^{r+1}$. Since $S_d \subseteq T_d$, $S_d$ is a closed subset of a compact set in $\mathbb{R}_+^{r+1}$, so it is compact in $\mathbb{R}_+^{r+1}$. Hence, the drift function is unbounded off compact sets.

Since the $\sigma^2$-chain is Feller and its maximal irreducibility measure is equivalent to Lebesgue measure on $\mathbb{R}_+^{r+1}$, Meyn and Tweedie's (1993) Theorem 6.2.5 shows that every compact set in $\mathbb{R}_+^{r+1}$ is petite. Hence, for each $d \in \mathbb{R}$, the set $S_d$ is petite, so $v(\cdot)$ is unbounded off petite sets. It now follows from the drift condition (10) and an application of Meyn and Tweedie's (1993) Lemma 15.2.8 that condition (iii) of Meyn and Tweedie's (1993) Theorem 15.0.1 is satisfied, so the $\sigma^2$-chain is geometrically ergodic. This completes part II of the proof. □

We end this section with a proof of Corollary 1.

PROOF OF COROLLARY 1. It suffices to show that, together, conditions (B′) and (C′) of Corollary 1 imply the second condition of Theorem 2. Clearly, (B′) and (C′) imply that $\bar s/2 > 1$, so $(0, 1] \cap (0, \bar s/2) = (0, 1]$. Take $s = 1$. Condition (C′) implies

$2^{-s}(p + t)^s\, \frac{\Gamma(N/2 + a_e - s)}{\Gamma(N/2 + a_e)} = \frac{p + t}{N + 2a_e - 2} < 1$.

Now, we know from Remark 4 that $\sum_{i=1}^{r} \mathrm{tr}(R_i(I - P_{Z^T(I-P_X)Z})R_i^T) = q - t$. Hence,

$\sum_{i=1}^{r} \left\{2^{-s}\, \frac{\Gamma(q_i/2 + a_i - s)}{\Gamma(q_i/2 + a_i)}\right\} \left[\mathrm{tr}\left(R_i(I - P_{Z^T(I-P_X)Z})R_i^T\right)\right]^s = \sum_{i=1}^{r} \frac{\mathrm{tr}(R_i(I - P_{Z^T(I-P_X)Z})R_i^T)}{q_i + 2a_i - 2} \le \frac{\sum_{i=1}^{r} \mathrm{tr}(R_i(I - P_{Z^T(I-P_X)Z})R_i^T)}{\min_{j \in \{1, 2, \ldots, r\}}\{q_j + 2a_j - 2\}} = \frac{q - t}{\min_{j \in \{1, 2, \ldots, r\}}\{q_j + 2a_j - 2\}} < 1$,

where the last inequality follows from condition (B′). □

4. An application of the main result. In this section, we illustrate the application of Theorem 2 using the two-way random effects model with one observation per cell. The model equation is

$Y_{ij} = \beta + \alpha_i + \gamma_j + \varepsilon_{ij}$,

where $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, the $\alpha_i$'s are i.i.d. $N(0, \sigma_\alpha^2)$, the $\gamma_j$'s are i.i.d. $N(0, \sigma_\gamma^2)$ and the $\varepsilon_{ij}$'s are i.i.d. $N(0, \sigma_e^2)$. The $\alpha_i$'s, $\gamma_j$'s and $\varepsilon_{ij}$'s are all independent.

We begin by explaining how to put this model in GLMM (matrix) form. There are a total of $N = mn$ observations and we arrange them using the usual (lexicographical) ordering

$Y = (Y_{11}\ \cdots\ Y_{1n}\ Y_{21}\ \cdots\ Y_{2n}\ \cdots\ Y_{m1}\ \cdots\ Y_{mn})^T$.

Since $\beta$ is a univariate parameter common to all of the observations, $p = 1$ and $X$ is an $N \times 1$ column vector of ones, which we denote by $1_N$. There are $r = 2$ random factors with $q_1 = m$ and $q_2 = n$, so $q = m + n$. Letting $\otimes$ denote the Kronecker product, we can write the $Z$ matrix as $(Z_1\ Z_2)$, where $Z_1 = I_m \otimes 1_n$ and $Z_2 = 1_m \otimes I_n$. We assume throughout this section that $\mathrm{SSE} > 0$.
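The Kronecker-product description of the design is easy to verify numerically. The following short Python check is ours; it builds $X$ and $Z$ for the case $m = 5$, $n = 6$ used below and confirms that $t = m + n - 2 = 9$.

```python
import numpy as np

m, n = 5, 6
X = np.ones((m * n, 1))                      # X = 1_N with N = mn
Z1 = np.kron(np.eye(m), np.ones((n, 1)))     # Z_1 = I_m (Kronecker) 1_n
Z2 = np.kron(np.ones((m, 1)), np.eye(n))     # Z_2 = 1_m (Kronecker) I_n
Z = np.hstack([Z1, Z2])
PX = X @ np.linalg.solve(X.T @ X, X.T)
A = Z.T @ (np.eye(m * n) - PX) @ Z           # Z'(I - P_X)Z, block diagonal
print(np.linalg.matrix_rank(A))              # 9 = m + n - 2
```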

We now examine conditions (8) and (9) of Theorem 2 for this particular model. A routine calculation shows that $Z^T(I - P_X)Z$ is a block diagonal matrix given by

$Z^T(I - P_X)Z = n\left(I_m - \frac{1}{m} J_m\right) \oplus m\left(I_n - \frac{1}{n} J_n\right)$,

where $J_d$ is a $d \times d$ matrix of ones (and $\oplus$ is the direct sum operator). It follows immediately that

$t = \mathrm{rank}\left(Z^T(I - P_X)Z\right) = \mathrm{rank}\left(I_m - \frac{1}{m} J_m\right) + \mathrm{rank}\left(I_n - \frac{1}{n} J_n\right) = m + n - 2$.

Hence, (8) becomes

$2^{-s}(m + n - 1)^s\, \frac{\Gamma(mn/2 + a_e - s)}{\Gamma(mn/2 + a_e)} < 1$.

Now, it can be shown that

$I - P_{Z^T(I-P_X)Z} = \left(\frac{1}{m} J_m\right) \oplus \left(\frac{1}{n} J_n\right)$.

Hence, we have

$\mathrm{tr}\left(R_1(I - P_{Z^T(I-P_X)Z})R_1^T\right) = \mathrm{tr}\left(\frac{1}{m} J_m\right) = 1$

and

$\mathrm{tr}\left(R_2(I - P_{Z^T(I-P_X)Z})R_2^T\right) = \mathrm{tr}\left(\frac{1}{n} J_n\right) = 1$.

Therefore, (9) reduces to

$2^{-s}\left\{\frac{\Gamma(m/2 + a_1 - s)}{\Gamma(m/2 + a_1)} + \frac{\Gamma(n/2 + a_2 - s)}{\Gamma(n/2 + a_2)}\right\} < 1$.

Now consider a concrete example in which $m = 5$, $n = 6$, and the prior is

$\frac{I_{\mathbb{R}_+}(\sigma_e^2)\, I_{\mathbb{R}_+}(\sigma_\alpha^2)\, I_{\mathbb{R}_+}(\sigma_\gamma^2)}{\sigma_e^2\, \sigma_\alpha\, \sigma_\gamma}$.

So, we are taking $b_e = b_1 = b_2 = a_e = 0$ and $a_1 = a_2 = -1/2$. Corollary 1 implies that the Gibbs Markov chain is geometrically ergodic whenever $m, n \ge 6$, but this result is not applicable when $m = 5$ and $n = 6$. Hence, we turn to Theorem 2. In this case, $\bar s = 4$, so we need to find an $s \in (0, 1]$ such that

$\max\left\{2^{-s}(10)^s\, \frac{\Gamma(30/2 + 0 - s)}{\Gamma(30/2 + 0)},\ 2^{-s}\left[\frac{\Gamma(5/2 - 1/2 - s)}{\Gamma(5/2 - 1/2)} + \frac{\Gamma(6/2 - 1/2 - s)}{\Gamma(6/2 - 1/2)}\right]\right\} < 1$.

The reader can check that, when $s = 0.9$, the left-hand side is approximately 0.87. Therefore, Theorem 2 implies that the Gibbs Markov chain is geometrically ergodic in this case.
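The claimed numerical value is a one-line computation. The following snippet is ours; it evaluates the two terms of the display at $s = 0.9$, working on the log scale via `scipy.special.gammaln`.

```python
import numpy as np
from scipy.special import gammaln

s = 0.9
term8 = np.exp(-s * np.log(2) + s * np.log(10) + gammaln(15 - s) - gammaln(15))
term9 = np.exp(-s * np.log(2)) * (np.exp(gammaln(2.0 - s) - gammaln(2.0))
                                  + np.exp(gammaln(2.5 - s) - gammaln(2.5)))
print(max(term8, term9))   # about 0.87 < 1, so Theorem 2 applies
```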

5. Specializing to the one-way random effects model. The only other existing results on geometric convergence of Gibbs samplers for linear mixed models with improper priors are those of Tan and Hobert (2009) (hereafter, T&H). These authors considered the one-way random effects model, which is a simple, but important, special case of the GLMM given in (1). In this section, we show that our results improve upon those of T&H. Recall that the one-way model is given by

(16) $Y_{ij} = \beta + \alpha_i + e_{ij}$,

where $i = 1, \ldots, c$, $j = 1, \ldots, n_i$, the $\alpha_i$'s are i.i.d. $N(0, \sigma_\alpha^2)$, and the $e_{ij}$'s, which are independent of the $\alpha_i$'s, are i.i.d. $N(0, \sigma_e^2)$. It is easy to see that (16) is a special case of the GLMM. Obviously, there are a total of $N = n_1 + \cdots + n_c$ observations, and we arrange them in a column vector with the usual ordering as follows:

$Y = (Y_{11}\ \cdots\ Y_{1n_1}\ Y_{21}\ \cdots\ Y_{2n_2}\ \cdots\ Y_{c1}\ \cdots\ Y_{cn_c})^T$.

As in the two-way model of Section 4, $\beta$ is a univariate parameter common to all of the observations, so $p = 1$ and $X = 1_N$. Here there is only one random factor (with $c$ levels), so $r = 1$, $q = q_1 = c$ and $Z = \oplus_{i=1}^{c} 1_{n_i}$. Of course, in this case, $\mathrm{SSE} = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (y_{ij} - \bar y_i)^2$, where $\bar y_i = n_i^{-1} \sum_{j=1}^{n_i} y_{ij}$. We assume throughout this section that SSE > 0.

We note that T&H actually considered a slightly different parameterization of the one-way model. In their version, $\beta$ does not appear in the model equation (16), but rather as the mean of the $u_i$'s. In other words, T&H used the centered parameterization, whereas here we are using the noncentered parameterization. Román (2012) shows that, because $\beta$ and $(\alpha_1\ \cdots\ \alpha_c)^T$ are part of a single block in the Gibbs sampler, the centered and noncentered versions of the Gibbs sampler converge at exactly the same rate.

T&H considered improper priors for $(\beta, \sigma_e^2, \sigma_\alpha^2)$ that take the form

$(\sigma_e^2)^{-(a_e+1)} I_{\mathbb{R}_+}(\sigma_e^2)\, (\sigma_\alpha^2)^{-(a_1+1)} I_{\mathbb{R}_+}(\sigma_\alpha^2)$,

and they showed that the Gibbs sampler for the one-way model is geometrically ergodic if $a_1 < 0$ and two further inequalities, which we label (17), hold. The first of these bounds $N + 2a_e$ from below by a quantity that involves $c$ and the group sample sizes $n_1, n_2, \ldots, n_c$; the second requires that a certain ratio involving $\bar n = \max\{n_1, n_2, \ldots, n_c\}$ be smaller than $2\exp\{\psi(c/2 + a_1)\}$, where $\psi(x) = \frac{d}{dx} \log \Gamma(x)$ is the digamma function.

We now consider the implications of Theorem 2 in the case of the one-way model.

First, $t = \mathrm{rank}(Z^T(I - P_X)Z) = c - 1$. Combining this fact with Remark 4, it follows that the two conditions of Theorem 2 will hold if $a_1 < 0$ and there exists an $s \in (0, 1] \cap (0, c/2 + a_1) \cap (0, N/2 + a_e)$ such that

$\max\left\{2^{-s} c^s\, \frac{\Gamma(N/2 + a_e - s)}{\Gamma(N/2 + a_e)},\ 2^{-s}\, \frac{\Gamma(c/2 + a_1 - s)}{\Gamma(c/2 + a_1)}\right\} < 1$.

Román (2012) shows that such an $s$ does indeed exist (so the Gibbs chain is geometrically ergodic) when

(18) $N + 2a_e \ge c + 2$ and $1 < 2\exp\{\psi(c/2 + a_1)\}$.

It is straightforward to show that the conditions of T&H described above imply (18). Consequently, if (17) holds, then so does (18). In other words, our sufficient conditions are weaker than those of T&H, so our result improves upon theirs. Moreover, in contrast with the conditions of T&H, our conditions do not directly involve the group sample sizes, $n_1, n_2, \ldots, n_c$.

Of course, the best result possible would be that the Gibbs Markov chain is geometrically ergodic whenever the posterior is proper. Our result is very close to the best possible in the important case where the standard diffuse prior is used; that is, when $a_1 = -1/2$ and $a_e = 0$. The posterior is proper in this case if and only if $c \ge 3$ [Sun, Tsutakawa and He (2001)]. It follows from (18) that the Gibbs Markov chain is geometrically ergodic as long as $c \ge 3$ and the total sample size, $N = n_1 + n_2 + \cdots + n_c$, is at least $c + 2$. This additional sample size condition is extremely weak. Indeed, the positivity of SSE implies that $N \ge c + 1$, so, for fixed $c \ge 3$, our condition for geometric ergodicity fails only in the single case where $N = c + 1$. Interestingly, in this case, the conditions of Corollary 1 reduce to $c \ge 5$ and $N \ge c + 3$.
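Condition (18) is trivial to check numerically. The helper below is ours; it evaluates (18) with `scipy.special.digamma`, and the two example calls illustrate the diffuse-prior boundary cases just discussed.

```python
import numpy as np
from scipy.special import digamma

def condition_18(c, N, a_e=0.0, a_1=-0.5):
    # (18): N + 2 a_e >= c + 2  and  1 < 2 exp{psi(c/2 + a_1)}
    return (N + 2 * a_e >= c + 2) and (1 < 2 * np.exp(digamma(c / 2 + a_1)))

print(condition_18(c=3, N=5))    # True: psi(1) = -0.5772 > -log 2
print(condition_18(c=2, N=10))   # False: psi(1/2) = -1.9635 < -log 2
```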

6. Discussion. Our decision to work with the $\sigma^2$-chain rather than the θ-chain was based on an important technical difference between the two chains that stems from the fact that $\pi(\sigma^2 \mid \theta, y)$ is not continuous in θ for each fixed $\sigma^2$ (when the set $A$ is nonempty). Indeed, recall that, for $\theta \notin \mathcal{N}$,

$\pi(\sigma^2 \mid \theta, y) = f_{IG}\left(\sigma_e^2;\ \frac{N}{2} + a_e,\ \frac{\|y - W\theta\|^2}{2} + b_e\right) \prod_{i=1}^{r} f_{IG}\left(\sigma_{u_i}^2;\ \frac{q_i}{2} + a_i,\ \frac{\|u_i\|^2}{2} + b_i\right)$,

but for $\theta \in \mathcal{N}$,

$\pi(\sigma^2 \mid \theta, y) = f_{IG}(\sigma_e^2; 1, 1) \prod_{i=1}^{r} f_{IG}(\sigma_{u_i}^2; 1, 1)$.

Also, recall that the Mtd of the $\sigma^2$-chain is given by

$k_1(\sigma^2 \mid \tilde\sigma^2) = \int_{\mathbb{R}^{p+q}} \pi(\sigma^2 \mid \theta, y)\, \pi(\theta \mid \tilde\sigma^2, y)\, d\theta$.

Since the set $\mathcal{N}$ has measure zero, the arbitrary part of $\pi(\sigma^2 \mid \theta, y)$ washes out of $k_1$. However, the same cannot be said for the θ-chain, whose Mtd is given by

$k_2(\theta \mid \tilde\theta) = \int_{\mathbb{R}_+^{r+1}} \pi(\theta \mid \sigma^2, y)\, \pi(\sigma^2 \mid \tilde\theta, y)\, d\sigma^2$.

This difference between $k_1$ and $k_2$ comes into play when we attempt to apply certain topological results from Markov chain theory, such as those in Chapter 6 of Meyn and Tweedie (1993). In particular, in our proof that the $\sigma^2$-chain is a Feller chain (which was part of the proof of Proposition 2), we used the fact that $\pi(\theta \mid \sigma^2, y)$ is continuous in $\sigma^2$ for each fixed θ. Since $\pi(\sigma^2 \mid \theta, y)$ is not continuous, we cannot use the same argument to prove that the θ-chain is Feller. In fact, we suspect that the θ-chain is not Feller, and if this is true, it means that our method of proof will not work for the θ-chain.

It is possible to circumvent the problem described above by removing the set $\mathcal{N}$ from the state space of the θ-chain. In this case, we are no longer required to define $\pi(\sigma^2 \mid \theta, y)$ for $\theta \in \mathcal{N}$, and since $\pi(\sigma^2 \mid \theta, y)$ is continuous (for fixed $\sigma^2$) on $\mathbb{R}^{p+q} \setminus \mathcal{N}$, the Feller argument for the θ-chain will go through. On the other hand, the new state space has holes in it, and this could complicate the search for a drift function that is unbounded off compact sets. For example, consider a toy drift function given by $v(x) = x^2$. This function is clearly unbounded off compact sets when the state space is $\mathbb{R}$, but not when the state space is $\mathbb{R} \setminus \{0\}$. The modified drift function $v^*(x) = x^2 + 1/x^2$ is unbounded off compact sets for the holey state space.

T&H overlooked a set of measure zero (similar to our $\mathcal{N}$), and this oversight led to an error in the proof of their main result (Proposition 3). However, Román (2012) shows that T&H's proof can be repaired and that their result is correct as stated. The fix involves deleting the offending null set from the state space, and adding a term to the drift function.

APPENDIX: UPPER BOUNDS

A.1. Preliminary results. Here is our first result.

LEMMA 1. The following inequalities hold for all $\sigma^2 \in \mathbb{R}_+^{r+1}$ and all $i \in \{1, 2, \ldots, r\}$:

(1) $Q^{-1} \preceq \left(Z^T(I - P_X)Z\right)^{+} \sigma_e^2 + \left(I - P_{Z^T(I-P_X)Z}\right) \sum_{j=1}^{r} \sigma_{u_j}^2$;
(2) $\mathrm{tr}\left((I - P_X) Z Q^{-1} Z^T (I - P_X)\right) \le \mathrm{rank}\left(Z^T(I - P_X)Z\right) \sigma_e^2$;
(3) $\left(R_i Q^{-1} R_i^T\right)^{-1} \preceq \left((\sigma_e^2)^{-1} \lambda_{\max} + (\sigma_{u_i}^2)^{-1}\right) I_{q_i}$.
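Matrix inequalities like Lemma 1(1) are easy to spot-check numerically before trusting them in a long argument. The snippet below is our sanity check, not part of the paper, on a small random instance with assumed dimensions; it verifies that the difference between the bound and $Q^{-1}$ is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, q_list = 12, 2, [3, 4]
q = sum(q_list)
X = rng.normal(size=(N, p))
Z = rng.integers(0, 2, size=(N, q)).astype(float)
se2, su2 = 0.7, np.array([1.3, 0.4])
PX = X @ np.linalg.solve(X.T @ X, X.T)
A = Z.T @ (np.eye(N) - PX) @ Z                       # Z'(I - P_X)Z
Dinv = np.diag(np.repeat(1.0 / su2, q_list))
Qinv = np.linalg.inv(A / se2 + Dinv)
bound = np.linalg.pinv(A) * se2 + (np.eye(q) - A @ np.linalg.pinv(A)) * su2.sum()
print(np.all(np.linalg.eigvalsh(bound - Qinv) > -1e-10))   # True: bound - Q^{-1} is PSD
```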

PROOF. Recall from Section 3 that $U^T \Lambda U$ is the spectral decomposition of $Z^T(I - P_X)Z$, and that $P_\Lambda$ is a binary diagonal matrix whose $i$th diagonal element is 1 if and only if the $i$th diagonal element of $\Lambda$ is positive. Let $\bar\sigma^2 = \sum_{j=1}^{r} \sigma_{u_j}^2$. Since $(\bar\sigma^2)^{-1} I_q \preceq D^{-1}$, we have

$(\sigma_e^2)^{-1} Z^T(I - P_X)Z + (\bar\sigma^2)^{-1} I_q \preceq (\sigma_e^2)^{-1} Z^T(I - P_X)Z + D^{-1} = Q$,

and this yields

(19) $Q^{-1} \preceq \left((\sigma_e^2)^{-1} Z^T(I - P_X)Z + (\bar\sigma^2)^{-1} I_q\right)^{-1} = U^T\left((\sigma_e^2)^{-1} \Lambda + (\bar\sigma^2)^{-1} I_q\right)^{-1} U$.

Now let $\Lambda^{+}$ be a diagonal matrix whose diagonal elements, $\{\lambda_i^{+}\}_{i=1}^{q}$, are given by

$\lambda_i^{+} = \begin{cases} \lambda_i^{-1}, & \lambda_i \ne 0, \\ 0, & \lambda_i = 0. \end{cases}$

Note that, for each $i \in \{1, 2, \ldots, q\}$, we have

$\frac{1}{\lambda_i (\sigma_e^2)^{-1} + (\bar\sigma^2)^{-1}} \le \lambda_i^{+} \sigma_e^2 + I_{\{0\}}(\lambda_i)\, \bar\sigma^2$.

This shows that

$\left((\sigma_e^2)^{-1} \Lambda + (\bar\sigma^2)^{-1} I_q\right)^{-1} \preceq \Lambda^{+} \sigma_e^2 + (I - P_\Lambda)\, \bar\sigma^2$.

Together with (19), this leads to

$Q^{-1} \preceq U^T\left(\Lambda^{+} \sigma_e^2 + (I - P_\Lambda)\, \bar\sigma^2\right) U = \left(Z^T(I - P_X)Z\right)^{+} \sigma_e^2 + U^T(I - P_\Lambda)U\, \bar\sigma^2$.

So to prove the first statement, it remains to show that $U^T(I - P_\Lambda)U = I - P_{Z^T(I-P_X)Z}$. But notice that, letting $A = Z^T(I - P_X)Z$ and using its spectral decomposition, we have

$A(A^T A)^{+} A^T = U^T \Lambda U (U^T \Lambda^T \Lambda U)^{+} U^T \Lambda U = U^T \Lambda (\Lambda^T \Lambda)^{+} \Lambda U = U^T P_\Lambda U$,

which implies that

$I - P_{Z^T(I-P_X)Z} = I - A(A^T A)^{+} A^T = I - U^T P_\Lambda U = U^T(I - P_\Lambda)U$.

The proof of the first statement is now complete. Now let $\tilde Z = (I - P_X)Z$. Multiplying the first statement on the left and the right by $\tilde Z$ and $\tilde Z^T$, respectively, and

then taking traces yields

(20) $\mathrm{tr}\left(\tilde Z Q^{-1} \tilde Z^T\right) \le \mathrm{tr}\left(\tilde Z (\tilde Z^T \tilde Z)^{+} \tilde Z^T\right) \sigma_e^2 + \mathrm{tr}\left(\tilde Z U^T (I - P_\Lambda) U \tilde Z^T\right) \bar\sigma^2$.

Since $(\tilde Z^T \tilde Z)(\tilde Z^T \tilde Z)^{+}$ is idempotent, we have

$\mathrm{tr}\left(\tilde Z (\tilde Z^T \tilde Z)^{+} \tilde Z^T\right) = \mathrm{tr}\left(\tilde Z^T \tilde Z (\tilde Z^T \tilde Z)^{+}\right) = \mathrm{rank}\left(\tilde Z^T \tilde Z (\tilde Z^T \tilde Z)^{+}\right) = \mathrm{rank}(\tilde Z^T \tilde Z)$.

Furthermore,

$\mathrm{tr}\left(\tilde Z U^T (I - P_\Lambda) U \tilde Z^T\right) = \mathrm{tr}\left(U^T (I - P_\Lambda) U \tilde Z^T \tilde Z\right) = \mathrm{tr}\left(U^T (I - P_\Lambda) U U^T \Lambda U\right) = \mathrm{tr}\left(U^T (I - P_\Lambda) \Lambda U\right) = 0$,

where the last line follows from the fact that $(I - P_\Lambda)\Lambda = 0$. It follows from (20) that

$\mathrm{tr}\left((I - P_X) Z Q^{-1} Z^T (I - P_X)\right) \le \mathrm{rank}\left(Z^T (I - P_X) Z\right) \sigma_e^2$,

and the second statement has been established.

Recall from Section 3 that $\lambda_{\max}$ is the largest eigenvalue of $Z^T(I - P_X)Z$, and that $R_i$ is the $q_i \times q$ matrix of 0's and 1's such that $R_i u = u_i$. Now, fix $i \in \{1, 2, \ldots, r\}$ and note that

$Q = (\sigma_e^2)^{-1} Z^T (I - P_X) Z + D^{-1} \preceq (\sigma_e^2)^{-1} \lambda_{\max} I_q + D^{-1}$.

It follows that

$R_i\left((\sigma_e^2)^{-1} \lambda_{\max} I_q + D^{-1}\right)^{-1} R_i^T \preceq R_i Q^{-1} R_i^T$,

and since these two matrices are both positive definite, we have

$\left(R_i Q^{-1} R_i^T\right)^{-1} \preceq \left(R_i\left((\sigma_e^2)^{-1} \lambda_{\max} I_q + D^{-1}\right)^{-1} R_i^T\right)^{-1} = \left((\sigma_e^2)^{-1} \lambda_{\max} + (\sigma_{u_i}^2)^{-1}\right) I_{q_i}$,

and this proves that the third statement is true. □

Let $z_i$ and $y_i$ denote the $i$th column of $\tilde Z^T = ((I - P_X)Z)^T$ and the $i$th component of $y$, respectively. Also, define $K$ to be

$K := \sum_{i=1}^{N} |y_i| \sup_{w \in \mathbb{R}_+^{N+q}} \left\|\left(t_i t_i^T + \sum_{j \in \{1, 2, \ldots, N\} \setminus \{i\}} w_j t_j t_j^T + \sum_{j=N+1}^{N+q} w_j t_j t_j^T\right)^{-1} t_i\right\|$,

where, for $j = 1, 2, \ldots, N$, $t_j = z_j$, and for $j \in \{N+1, \ldots, N+q\}$, the $t_j$ are the standard orthonormal basis vectors in $\mathbb{R}^q$; that is, $t_{N+l}$ has a one in the $l$th position and zeros everywhere else.

LEMMA 2. For any $\sigma^2 \in \mathbb{R}_+^{r+1}$, we have

$h(\sigma^2) := \left\|(\sigma_e^2)^{-1} Q^{-1} Z^T (I - P_X)\, y\right\| \le K < \infty$.

The following result from Khare and Hobert (2011) will be used in the proof of Lemma 2.

LEMMA 3. Fix $n \in \{2, 3, \ldots\}$ and $m \in \mathbb{N}$, and let $x_1, \ldots, x_n$ be vectors in $\mathbb{R}^m$. Then

$C_{m,n}(x_1; x_2, \ldots, x_n) := \sup_{w \in \mathbb{R}_+^{n-1}} \left\|\left(x_1 x_1^T + \sum_{i=2}^{n} w_i x_i x_i^T\right)^{-1} x_1\right\| I(x_1 \ne 0)$

is finite.

PROOF OF LEMMA 2. Recall that we defined $z_i$ and $y_i$ to be the $i$th column of $\tilde Z^T = ((I - P_X)Z)^T$ and the $i$th component of $y$, respectively. Now,

$h(\sigma^2) = \left\|\left(Z^T(I - P_X)Z + \sigma_e^2 D^{-1}\right)^{-1} Z^T(I - P_X)\, y\right\| = \left\|\left(\tilde Z^T \tilde Z + \sigma_e^2 D^{-1}\right)^{-1} \sum_{i=1}^{N} z_i y_i\right\| \le \sum_{i=1}^{N} |y_i| \left\|\left(\sum_{j=1}^{N} z_j z_j^T + \sigma_e^2 D^{-1}\right)^{-1} z_i\right\| = \sum_{i=1}^{N} |y_i|\, K_i(\sigma^2)$,

where

$K_i(\sigma^2) := \left\|\left(z_i z_i^T + \sum_{j \in \{1, 2, \ldots, N\} \setminus \{i\}} z_j z_j^T + \sigma_e^2 D^{-1}\right)^{-1} z_i\right\|$.

For each $i \in \{1, 2, \ldots, N\}$, define

$\hat K_i := \sup_{w \in \mathbb{R}_+^{N+q}} \left\|\left(t_i t_i^T + \sum_{j \in \{1, 2, \ldots, N\} \setminus \{i\}} w_j t_j t_j^T + \sum_{j=N+1}^{N+q} w_j t_j t_j^T\right)^{-1} t_i\right\|$,

and notice that $K$ can be written as $K = \sum_{i=1}^{N} |y_i| \hat K_i$. Therefore, it is enough to show that, for each $i \in \{1, 2, \ldots, N\}$, $K_i(\sigma^2) \le \hat K_i < \infty$. Now,

$K_i(\sigma^2) = \left\|\left(z_i z_i^T + \sum_{j \in \{1, 2, \ldots, N\} \setminus \{i\}} z_j z_j^T + \sigma_e^2 D^{-1}\right)^{-1} z_i\right\| \le \sup_{w \in \mathbb{R}_+^{N+q}} \left\|\left(t_i t_i^T + \sum_{j \in \{1, 2, \ldots, N\} \setminus \{i\}} w_j t_j t_j^T + \sum_{j=N+1}^{N+q} w_j t_j t_j^T\right)^{-1} t_i\right\| = \hat K_i$,

because $\sigma_e^2 D^{-1}$ can be written as $\sum_{j=N+1}^{N+q} w_j t_j t_j^T$ with strictly positive weights (take $w_{N+l} = \sigma_e^2/\sigma_{u_k}^2$, where the $l$th coordinate of $u$ belongs to the $k$th random factor). Finally, an application of Lemma 3 shows that $\hat K_i$ is finite, and the proof is complete. □

Let $\chi_k^2(\mu)$ denote the noncentral chi-square distribution with $k$ degrees of freedom and noncentrality parameter $\mu$.

LEMMA 4. If $J \sim \chi_k^2(\mu)$ and $\gamma \in (0, k/2)$, then

$E\left[J^{-\gamma}\right] \le 2^{-\gamma}\, \frac{\Gamma(k/2 - \gamma)}{\Gamma(k/2)}$.

PROOF.

$E\left[J^{-\gamma}\right] = \sum_{i=0}^{\infty} \frac{(\mu/2)^i e^{-\mu/2}}{i!} \int_{\mathbb{R}_+} x^{-\gamma}\, \frac{1}{\Gamma(k/2 + i)\, 2^{k/2+i}}\, x^{k/2+i-1} e^{-x/2}\, dx = 2^{-\gamma} \sum_{i=0}^{\infty} \frac{(\mu/2)^i e^{-\mu/2}}{i!}\, \frac{\Gamma(k/2 + i - \gamma)}{\Gamma(k/2 + i)}$.

Since $\Gamma(x - \gamma)/\Gamma(x)$ is decreasing for $x > \gamma > 0$, each term $\Gamma(k/2 + i - \gamma)/\Gamma(k/2 + i)$ is at most $\Gamma(k/2 - \gamma)/\Gamma(k/2)$, and the result follows. □

A.2. An upper bound on $E[\|y - W\theta\|^2 \mid \sigma^2]$. We remind the reader that $\theta = (\beta^T, u^T)^T$, $W = (X\ Z)$, and that $\pi(\theta \mid \sigma^2, y)$ is a multivariate normal density with mean $m$ and covariance matrix $V$. Thus,

(21) $E\left[\|y - W\theta\|^2 \mid \sigma^2\right] = \mathrm{tr}\left(W V W^T\right) + \|y - Wm\|^2$,

and we have

(22) $\mathrm{tr}\left(W V W^T\right) = \sigma_e^2\, \mathrm{tr}(P_X) + \mathrm{tr}\left(P_X Z Q^{-1} Z^T P_X\right) - 2\, \mathrm{tr}\left(Z Q^{-1} Z^T P_X\right) + \mathrm{tr}\left(Z Q^{-1} Z^T\right) = p \sigma_e^2 - \mathrm{tr}\left(Z Q^{-1} Z^T P_X\right) + \mathrm{tr}\left(Z Q^{-1} Z^T\right) = p \sigma_e^2 + \mathrm{tr}\left(Z Q^{-1} Z^T (I - P_X)\right) = p \sigma_e^2 + \mathrm{tr}\left((I - P_X) Z Q^{-1} Z^T (I - P_X)\right) \le p \sigma_e^2 + \mathrm{rank}\left(Z^T (I - P_X) Z\right) \sigma_e^2 = (p + t)\, \sigma_e^2$,

where the inequality is an application of Lemma 1. Finally, a simple calculation shows that

$y - Wm = (I - P_X)\left[I - (\sigma_e^2)^{-1} Z Q^{-1} Z^T (I - P_X)\right] y$.

Hence,

(23) $\|y - Wm\| \le \|(I - P_X) y\| + (\sigma_e^2)^{-1}\left\|(I - P_X) Z Q^{-1} Z^T (I - P_X)\, y\right\| \le \|(I - P_X) y\| + \|(I - P_X) Z\| \left\|(\sigma_e^2)^{-1} Q^{-1} Z^T (I - P_X)\, y\right\| \le \|(I - P_X) y\| + \|(I - P_X) Z\|\, K$,

where $\|\cdot\|$ with a matrix argument denotes the Frobenius norm and the last inequality uses Lemma 2. Finally, combining (21), (22) and (23) yields

$E\left[\|y - W\theta\|^2 \mid \sigma^2\right] \le (p + t)\, \sigma_e^2 + \left(\|(I - P_X) y\| + \|(I - P_X) Z\|\, K\right)^2$.

A.3. An upper bound on $E[\|u_i\|^2 \mid \sigma^2]$. Note that

(24) $E\left[\|u_i\|^2 \mid \sigma^2\right] = E\left[\|R_i u\|^2 \mid \sigma^2\right] = \mathrm{tr}\left(R_i Q^{-1} R_i^T\right) + \left\|R_i E(u \mid \sigma^2)\right\|^2$.

By Lemma 1, we have

(25) $\mathrm{tr}\left(R_i Q^{-1} R_i^T\right) \le \mathrm{tr}\left(R_i \left(Z^T(I - P_X)Z\right)^{+} R_i^T\right) \sigma_e^2 + \mathrm{tr}\left(R_i \left(I - P_{Z^T(I-P_X)Z}\right) R_i^T\right) \sum_{j=1}^{r} \sigma_{u_j}^2 = \xi_i\, \sigma_e^2 + \zeta_i \sum_{j=1}^{r} \sigma_{u_j}^2$.

Now, by Lemma 2,

(26) $\left\|R_i E(u \mid \sigma^2)\right\| \le \|R_i\| \left\|E(u \mid \sigma^2)\right\| = \|R_i\|\, h(\sigma^2) \le \|R_i\|\, K$.

Combining (24), (25) and (26) yields

$E\left[\|u_i\|^2 \mid \sigma^2\right] \le \xi_i\, \sigma_e^2 + \zeta_i \sum_{j=1}^{r} \sigma_{u_j}^2 + (\|R_i\|\, K)^2$.

A.4. An upper bound on $E[\|u_i\|^{-2c} \mid \sigma^2]$. Fix $i \in \{1, 2, \ldots, r\}$. Given $\sigma^2$, $(R_i Q^{-1} R_i^T)^{-1/2} u_i$ has a multivariate normal distribution with identity covariance matrix. It follows that, conditional on $\sigma^2$, the distribution of $u_i^T (R_i Q^{-1} R_i^T)^{-1} u_i$ is $\chi_{q_i}^2(w)$ for some noncentrality parameter $w$. It follows from Lemma 4 that, as long as $c \in (0, 1/2)$, we have

$E\left[\left[u_i^T \left(R_i Q^{-1} R_i^T\right)^{-1} u_i\right]^{-c} \mid \sigma^2\right] \le 2^{-c}\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}$.

Now, by Lemma 1,

$E\left[(\|u_i\|^2)^{-c} \mid \sigma^2\right] = \left((\sigma_e^2)^{-1}\lambda_{\max} + (\sigma_{u_i}^2)^{-1}\right)^{c} E\left[\left[\left((\sigma_e^2)^{-1}\lambda_{\max} + (\sigma_{u_i}^2)^{-1}\right) \|u_i\|^2\right]^{-c} \mid \sigma^2\right] \le \left((\sigma_e^2)^{-1}\lambda_{\max} + (\sigma_{u_i}^2)^{-1}\right)^{c} E\left[\left[u_i^T \left(R_i Q^{-1} R_i^T\right)^{-1} u_i\right]^{-c} \mid \sigma^2\right] \le \left((\sigma_e^2)^{-1}\lambda_{\max} + (\sigma_{u_i}^2)^{-1}\right)^{c}\, 2^{-c}\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)} \le 2^{-c}\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)} \left[\lambda_{\max}^{c} (\sigma_e^2)^{-c} + (\sigma_{u_i}^2)^{-c}\right]$.

Acknowledgments. The authors thank three anonymous reviewers for helpful comments and suggestions.

REFERENCES

BEDNORZ, W. and ŁATUSZYŃSKI, K. (2007). A few remarks on "Fixed-width output analysis for Markov chain Monte Carlo" by Jones et al. J. Amer. Statist. Assoc.
DANIELS, M. J. (1999). A prior for the variance in hierarchical models. Canad. J. Statist.
DIACONIS, P., KHARE, K. and SALOFF-COSTE, L. (2008). Gibbs sampling, exponential families and orthogonal polynomials (with discussion). Statist. Sci.
FLEGAL, J. M., HARAN, M. and JONES, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statist. Sci.
FLEGAL, J. M. and JONES, G. L. (2010). Batch means and spectral variance estimators in Markov chain Monte Carlo. Ann. Statist.
GELMAN, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Anal.
JOHNSON, A. A. and JONES, G. L. (2010). Gibbs sampling for a Bayesian hierarchical general linear model. Electron. J. Stat.

JONES, G. L., HARAN, M., CAFFO, B. S. and NEATH, R. (2006). Fixed-width output analysis for Markov chain Monte Carlo. J. Amer. Statist. Assoc.
KHARE, K. and HOBERT, J. P. (2011). A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. Ann. Statist.
LIU, J. S., WONG, W. H. and KONG, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika.
MEYN, S. P. and TWEEDIE, R. L. (1993). Markov Chains and Stochastic Stability. Springer, London.
PAPASPILIOPOULOS, O. and ROBERTS, G. (2008). Stability of the Gibbs sampler for Bayesian hierarchical models. Ann. Statist.
ROBERTS, G. O. and ROSENTHAL, J. S. (1998). Markov-chain Monte Carlo: Some practical implications of theoretical results (with discussion). Canad. J. Statist.
ROBERTS, G. O. and ROSENTHAL, J. S. (2001). Markov chains and de-initializing processes. Scand. J. Stat.
ROBERTS, G. O. and ROSENTHAL, J. S. (2004). General state space Markov chains and MCMC algorithms. Probab. Surv.
ROMÁN, J. C. (2012). Convergence analysis of block Gibbs samplers for Bayesian general linear mixed models. Ph.D. thesis, Dept. Statistics, Univ. Florida, Gainesville, FL.
SEARLE, S. R., CASELLA, G. and MCCULLOCH, C. E. (1992). Variance Components. Wiley, New York.
SUN, D., TSUTAKAWA, R. K. and HE, Z. (2001). Propriety of posteriors with improper priors in hierarchical linear mixed models. Statist. Sinica.
TAN, A. and HOBERT, J. P. (2009). Block Gibbs sampling for Bayesian random effects models with improper priors: Convergence and regeneration. J. Comput. Graph. Statist.

DEPARTMENT OF MATHEMATICS, VANDERBILT UNIVERSITY, NASHVILLE, TENNESSEE 37240, USA. E-mail: jc.roman@vanderbilt.edu

DEPARTMENT OF STATISTICS, UNIVERSITY OF FLORIDA, GAINESVILLE, FLORIDA 32611, USA. E-mail: jhobert@stat.ufl.edu


More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

On the Central Limit Theorem for an ergodic Markov chain

On the Central Limit Theorem for an ergodic Markov chain Stochastic Processes and their Applications 47 ( 1993) 113-117 North-Holland 113 On the Central Limit Theorem for an ergodic Markov chain K.S. Chan Department of Statistics and Actuarial Science, The University

More information

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries Chapter 1 Measure Spaces 1.1 Algebras and σ algebras of sets 1.1.1 Notation and preliminaries We shall denote by X a nonempty set, by P(X) the set of all parts (i.e., subsets) of X, and by the empty set.

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Proofs for Large Sample Properties of Generalized Method of Moments Estimators

Proofs for Large Sample Properties of Generalized Method of Moments Estimators Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

When is a Markov chain regenerative?

When is a Markov chain regenerative? When is a Markov chain regenerative? Krishna B. Athreya and Vivekananda Roy Iowa tate University Ames, Iowa, 50011, UA Abstract A sequence of random variables {X n } n 0 is called regenerative if it can

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Convergence complexity analysis of Albert and Chib s algorithm for Bayesian probit regression

Convergence complexity analysis of Albert and Chib s algorithm for Bayesian probit regression Convergence complexity analysis of Albert and Chib s algorithm for Bayesian probit regression Qian Qin and James P. Hobert Department of Statistics University of Florida April 2018 Abstract The use of

More information

Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing

Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing Advances in Dynamical Systems and Applications ISSN 0973-5321, Volume 8, Number 2, pp. 401 412 (2013) http://campus.mst.edu/adsa Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

ST 740: Linear Models and Multivariate Normal Inference

ST 740: Linear Models and Multivariate Normal Inference ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /

More information

Lecture 8: Linear Algebra Background

Lecture 8: Linear Algebra Background CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 8: Linear Algebra Background Lecturer: Shayan Oveis Gharan 2/1/2017 Scribe: Swati Padmanabhan Disclaimer: These notes have not been subjected

More information

Part III. A Decision-Theoretic Approach and Bayesian testing

Part III. A Decision-Theoretic Approach and Bayesian testing Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to

More information

A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources

A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources Wei Kang Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College

More information

Reversible Markov chains

Reversible Markov chains Reversible Markov chains Variational representations and ordering Chris Sherlock Abstract This pedagogical document explains three variational representations that are useful when comparing the efficiencies

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

First Hitting Time Analysis of the Independence Metropolis Sampler

First Hitting Time Analysis of the Independence Metropolis Sampler (To appear in the Journal for Theoretical Probability) First Hitting Time Analysis of the Independence Metropolis Sampler Romeo Maciuca and Song-Chun Zhu In this paper, we study a special case of the Metropolis

More information

10. Exchangeability and hierarchical models Objective. Recommended reading

10. Exchangeability and hierarchical models Objective. Recommended reading 10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Markov Chains, Stochastic Processes, and Matrix Decompositions

Markov Chains, Stochastic Processes, and Matrix Decompositions Markov Chains, Stochastic Processes, and Matrix Decompositions 5 May 2014 Outline 1 Markov Chains Outline 1 Markov Chains 2 Introduction Perron-Frobenius Matrix Decompositions and Markov Chains Spectral

More information

1 Stochastic Dynamic Programming

1 Stochastic Dynamic Programming 1 Stochastic Dynamic Programming Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When events in the future

More information

On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables

On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables Deli Li 1, Yongcheng Qi, and Andrew Rosalsky 3 1 Department of Mathematical Sciences, Lakehead University,

More information

Global Maxwellians over All Space and Their Relation to Conserved Quantites of Classical Kinetic Equations

Global Maxwellians over All Space and Their Relation to Conserved Quantites of Classical Kinetic Equations Global Maxwellians over All Space and Their Relation to Conserved Quantites of Classical Kinetic Equations C. David Levermore Department of Mathematics and Institute for Physical Science and Technology

More information

Markov Random Fields

Markov Random Fields Markov Random Fields 1. Markov property The Markov property of a stochastic sequence {X n } n 0 implies that for all n 1, X n is independent of (X k : k / {n 1, n, n + 1}), given (X n 1, X n+1 ). Another

More information

Chapter 7. Markov chain background. 7.1 Finite state space

Chapter 7. Markov chain background. 7.1 Finite state space Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

Analysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*

Analysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal* Analysis of the Gibbs sampler for a model related to James-Stein estimators by Jeffrey S. Rosenthal* Department of Statistics University of Toronto Toronto, Ontario Canada M5S 1A1 Phone: 416 978-4594.

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Bayesian Nonparametric Point Estimation Under a Conjugate Prior

Bayesian Nonparametric Point Estimation Under a Conjugate Prior University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-15-2002 Bayesian Nonparametric Point Estimation Under a Conjugate Prior Xuefeng Li University of Pennsylvania Linda

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

DAVID DAMANIK AND DANIEL LENZ. Dedicated to Barry Simon on the occasion of his 60th birthday.

DAVID DAMANIK AND DANIEL LENZ. Dedicated to Barry Simon on the occasion of his 60th birthday. UNIFORM SZEGŐ COCYCLES OVER STRICTLY ERGODIC SUBSHIFTS arxiv:math/052033v [math.sp] Dec 2005 DAVID DAMANIK AND DANIEL LENZ Dedicated to Barry Simon on the occasion of his 60th birthday. Abstract. We consider

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

2. The Concept of Convergence: Ultrafilters and Nets

2. The Concept of Convergence: Ultrafilters and Nets 2. The Concept of Convergence: Ultrafilters and Nets NOTE: AS OF 2008, SOME OF THIS STUFF IS A BIT OUT- DATED AND HAS A FEW TYPOS. I WILL REVISE THIS MATE- RIAL SOMETIME. In this lecture we discuss two

More information

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing Partially Collapsed Gibbs Samplers: Theory and Methods David A. van Dyk 1 and Taeyoung Park Ever increasing computational power along with ever more sophisticated statistical computing techniques is making

More information

HIERARCHICAL BAYESIAN ANALYSIS OF BINARY MATCHED PAIRS DATA

HIERARCHICAL BAYESIAN ANALYSIS OF BINARY MATCHED PAIRS DATA Statistica Sinica 1(2), 647-657 HIERARCHICAL BAYESIAN ANALYSIS OF BINARY MATCHED PAIRS DATA Malay Ghosh, Ming-Hui Chen, Atalanta Ghosh and Alan Agresti University of Florida, Worcester Polytechnic Institute

More information

ELEC633: Graphical Models

ELEC633: Graphical Models ELEC633: Graphical Models Tahira isa Saleem Scribe from 7 October 2008 References: Casella and George Exploring the Gibbs sampler (1992) Chib and Greenberg Understanding the Metropolis-Hastings algorithm

More information

MARGINAL MARKOV CHAIN MONTE CARLO METHODS

MARGINAL MARKOV CHAIN MONTE CARLO METHODS Statistica Sinica 20 (2010), 1423-1454 MARGINAL MARKOV CHAIN MONTE CARLO METHODS David A. van Dyk University of California, Irvine Abstract: Marginal Data Augmentation and Parameter-Expanded Data Augmentation

More information

Factorization of Seperable and Patterned Covariance Matrices for Gibbs Sampling

Factorization of Seperable and Patterned Covariance Matrices for Gibbs Sampling Monte Carlo Methods Appl, Vol 6, No 3 (2000), pp 205 210 c VSP 2000 Factorization of Seperable and Patterned Covariance Matrices for Gibbs Sampling Daniel B Rowe H & SS, 228-77 California Institute of

More information

The Lebesgue Integral

The Lebesgue Integral The Lebesgue Integral Brent Nelson In these notes we give an introduction to the Lebesgue integral, assuming only a knowledge of metric spaces and the iemann integral. For more details see [1, Chapters

More information

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public

More information

Remarks on Improper Ignorance Priors

Remarks on Improper Ignorance Priors As a limit of proper priors Remarks on Improper Ignorance Priors Two caveats relating to computations with improper priors, based on their relationship with finitely-additive, but not countably-additive

More information

The Skorokhod reflection problem for functions with discontinuities (contractive case)

The Skorokhod reflection problem for functions with discontinuities (contractive case) The Skorokhod reflection problem for functions with discontinuities (contractive case) TAKIS KONSTANTOPOULOS Univ. of Texas at Austin Revised March 1999 Abstract Basic properties of the Skorokhod reflection

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Tail inequalities for additive functionals and empirical processes of. Markov chains

Tail inequalities for additive functionals and empirical processes of. Markov chains Tail inequalities for additive functionals and empirical processes of geometrically ergodic Markov chains University of Warsaw Banff, June 2009 Geometric ergodicity Definition A Markov chain X = (X n )

More information

Estimating the spectral gap of a trace-class Markov operator

Estimating the spectral gap of a trace-class Markov operator Estimating the spectral gap of a trace-class Markov operator Qian Qin, James P. Hobert and Kshitij Khare Department of Statistics University of Florida April 2017 Abstract The utility of a Markov chain

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Isomorphisms between pattern classes

Isomorphisms between pattern classes Journal of Combinatorics olume 0, Number 0, 1 8, 0000 Isomorphisms between pattern classes M. H. Albert, M. D. Atkinson and Anders Claesson Isomorphisms φ : A B between pattern classes are considered.

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

The small ball property in Banach spaces (quantitative results)

The small ball property in Banach spaces (quantitative results) The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence

More information