Analysis of Polya-Gamma Gibbs sampler for Bayesian logistic analysis of variance


Electronic Journal of Statistics Vol. 11 (2017) ISSN: 1935-7524 DOI: 10.1214/17-EJS1227

Analysis of Polya-Gamma Gibbs sampler for Bayesian logistic analysis of variance

Hee Min Choi
Department of Statistics, University of California, Davis

and

Jorge Carlos Román
Department of Mathematics and Statistics, San Diego State University

Abstract: We consider the intractable posterior density that results when the one-way logistic analysis of variance model is combined with a flat prior. We analyze Polson, Scott and Windle's (2013) data augmentation (DA) algorithm for exploring the posterior. The Markov operator associated with the DA algorithm is shown to be trace-class.

AMS 2000 subject classifications: Primary 60J27; secondary 60K35.
Keywords and phrases: Polya-Gamma distribution, data augmentation algorithm, geometric convergence rate, Markov chain, Markov operator, Monte Carlo, trace-class operator.

Received September 2015.

Contents
1 Introduction
2 Polson, Scott and Windle's algorithm
3 Main result
4 Discussion
A Appendix
Acknowledgements
References

1. Introduction

Consider the logistic regression set-up in which Y_1, ..., Y_N are independent Bernoulli random variables such that Pr(Y_i = 1) = F(x_i^T β), where x_i is a p × 1 vector of known covariates that are associated with Y_i, β is a p × 1 vector

of unknown regression coefficients, and F is the standard logistic distribution function, that is, F(s) = e^s/(1 + e^s). An important special case is the one-way logistic analysis of variance (ANOVA) model, where each x_i is a unit vector. (See Section 3 for a detailed explanation of how the logistic regression model reduces to the one-way model.) In general, the joint mass function of Y_1, ..., Y_N is given by

∏_{i=1}^N Pr(Y_i = y_i | β) = ∏_{i=1}^N [F(x_i^T β)]^{y_i} [1 - F(x_i^T β)]^{1-y_i} I_{{0,1}}(y_i).   (1)

A Bayesian version of the logistic regression model (1) can be assembled by specifying a prior for the unknown regression parameter β. We consider a flat improper prior. Of course, whenever an improper prior is used, one must check that the resulting posterior is well-defined (i.e., proper). Chen and Shao (2000) provide necessary and sufficient conditions for propriety, and these are stated explicitly in Section 2. We assume that these conditions are satisfied, and denote the posterior by π(β | y). This posterior density is intractable in the sense that its expectations, which are required for Bayesian inference, cannot be computed in closed form. However, we may resort to Markov chain Monte Carlo (MCMC) methods to approximate the intractable posterior expectations. The most recent MCMC method for this problem is a data augmentation (DA) algorithm that is based on the Polya-Gamma latent data strategy developed in Polson, Scott and Windle (2013) (hereafter, PS&W). In this article, we show that in the one-way logistic ANOVA model, which is an important special case of logistic regression models, the Markov operator associated with PS&W's DA algorithm is trace-class (see Section 3 for the definition). The fact that this Markov operator is trace-class implies that it is also compact, which in turn implies that the corresponding Markov chain is geometrically ergodic.
This is very important from a practical standpoint because geometric ergodicity guarantees the existence of central limit theorems for ergodic averages, which allows for the calculation of asymptotically valid standard errors for the MCMC estimates of posterior expectations (see, e.g., Flegal, Haran and Jones, 2008; Jones, Haran, Caffo and Neath, 2006). Aside from our work, the only existing convergence rate result for a DA algorithm for the Bayesian logistic model is the one in Choi and Hobert (2013). However, the model considered in Choi and Hobert (2013) has proper normal priors on the regression parameter. While our result applies to a relatively small class of logistic regression models, it is the first of its kind for logistic regression models with an improper prior. It turns out that using a flat improper prior complicates the analysis that is required to study the corresponding Markov chain; indeed, our analysis is substantially different from theirs. The remainder of this paper is organized as follows. Section 2 contains a formal description of PS&W's algorithm for exploring the posterior π(β | y). In Section 3, we study the operator associated with PS&W's DA Markov chain for the one-way logistic ANOVA model and show that the associated Markov

operator is trace-class. Finally, in Section 4 we discuss possible difficulties in extending our result to the general logistic regression model.

2. Polson, Scott and Windle's algorithm

We begin with a description of the posterior density and conditions for propriety. We consider the posterior density that results when the logistic regression likelihood (1) is combined with a flat prior on the regression parameter β. Let y = (y_1, ..., y_N)^T denote the vector of observed data and define the marginal density as

c(y) = ∫_{R^p} ∏_{i=1}^N [F(x_i^T β)]^{y_i} [1 - F(x_i^T β)]^{1-y_i} dβ.

By definition, the posterior is proper if and only if c(y) < ∞. Of course, when c(y) < ∞, the posterior density is given by

π(β | y) = c(y)^{-1} ∏_{i=1}^N [F(x_i^T β)]^{y_i} [1 - F(x_i^T β)]^{1-y_i}.

Recall from the Introduction that Chen and Shao (2000) provide necessary and sufficient conditions for propriety. In order to state Chen and Shao's (2000) result, we need a bit more notation. As usual, let X denote the N × p design matrix whose ith row is x_i^T, and let Z be an N × p matrix whose ith row is z_i^T := I_{{0}}(y_i) x_i^T - I_{{1}}(y_i) x_i^T. Finally, let 0_p be the p × 1 vector of zeros. Here is the result.

Proposition 1 (Chen and Shao, 2000). The function c(y) is finite if and only if (A) the design matrix X has full column rank and (B) there is a vector b = (b_1, ..., b_N)^T with strictly positive components such that Z^T b = 0_p.

In particular, in the one-way logistic ANOVA model, the design matrix X has full column rank, so the posterior is proper if and only if (B) holds. (The precise form of X in the one-way model and an easily checkable equivalent condition for propriety are stated in the next section.) Throughout the remainder of this section, we assume that the conditions of Proposition 1 are satisfied so that the posterior is well-defined. We now describe PS&W's DA algorithm for exploring the posterior π(β | y). Let R_+ := (0, ∞).
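The quantities above are straightforward to compute. As a quick illustration (the function and data names are ours, not the paper's), the log of the unnormalized posterior, i.e. the log of the integrand that defines c(y), can be evaluated stably in numpy via log F(s) = -log(1 + e^{-s}):

```python
import numpy as np

def log_posterior_kernel(beta, X, y):
    """Log of the unnormalized posterior under a flat prior: the Bernoulli
    log likelihood with logistic link F(s) = e^s / (1 + e^s)."""
    s = X @ beta
    # log F(s) = -log(1 + e^{-s});  log(1 - F(s)) = -log(1 + e^{s}).
    return np.sum(-y * np.logaddexp(0.0, -s) - (1 - y) * np.logaddexp(0.0, s))

# Toy one-way ANOVA design: p = 2 groups with n_1 = n_2 = 3 observations.
X = np.repeat(np.eye(2), 3, axis=0)
y = np.array([1, 0, 1, 0, 0, 1])
val = log_posterior_kernel(np.zeros(2), X, y)  # each observation contributes log(1/2)
```

At β = 0_p every observation contributes log(1/2), so val equals -6 log 2; propriety (Proposition 1) is exactly the statement that the exponential of this kernel integrates over R^p.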
For fixed w ∈ R^N_+, define Σ(w) = (Z^T Ω(w) Z)^{-1} and μ(w) = Σ(w) Z^T(-1_N/2), where Ω(w) is the N × N diagonal matrix whose ith diagonal entry is w_i, and 1_N is the N × 1 vector of 1s. When we write W ~ PG(1, c), we mean W has a Polya-Gamma density (PS&W) given by

f(x; c) = cosh(c/2) exp{-c^2 x / 2} g(x),

where c ≥ 0 and

g(x) = Σ_{k=0}^∞ (-1)^k [(2k+1)/√(2πx^3)] exp{-(2k+1)^2/(8x)} I_{(0,∞)}(x).   (2)

The dynamics of PS&W's Markov chain, Φ = {β^(m)}_{m=0}^∞, are implicitly defined through the following two-step procedure for moving from the current state, β^(m) = β, to the new state, β^(m+1).

Iteration m+1 of PS&W's DA algorithm:
1. Draw W_1, ..., W_N independently with W_i ~ PG(1, |z_i^T β|), and call the observed value w = (w_1, ..., w_N)^T.
2. Draw β^(m+1) ~ N_p(μ(w), Σ(w)).

A highly efficient rejection sampler for simulating the Polya-Gamma distribution is provided in PS&W. Also, a formal derivation of the above algorithm is similar to the one for the normal prior case provided in Choi and Hobert (2013). The Markov transition density (Mtd) of the Markov chain Φ is given by

k(β | β′) = ∫_{R^N_+} π(β | w, y) π(w | β′, y) dw,

where π(β | w, y) and π(w | β′, y) are conditional densities that can be gleaned from the two-step algorithm described above. Indeed, π(w | β′, y) is a product of Polya-Gamma densities, and π(β | w, y) is a multivariate normal density. Note that k maps R^p × R^p into R_+ and that π(β | y) is an invariant density for this Mtd. It follows that the corresponding Markov chain Φ is Harris ergodic; that is, irreducible, aperiodic, and positive Harris recurrent (see, e.g., Hobert, 2011).

3. Main result

In this section, we restrict attention to the one-way logistic ANOVA model, an important special case of logistic regression models, and present a spectral analysis result concerning PS&W's Markov chain Φ for this problem. In particular, we prove that the Markov operator associated with Φ is trace-class. We begin by describing the one-way logistic ANOVA model. Let {Y_ij} be independent Bernoulli random variables such that

Pr(Y_ij = 1 | β) = F(β_i),  i = 1, 2, ..., p,  j = 1, 2, ..., n_i,

where β_i is the main effect of the ith treatment group, and β = (β_1, ..., β_p)^T. Note that there are p treatment groups and the number of observations in different groups may differ. Of course, this model can be written in the logistic regression form (1). Indeed, there are a total of N := n_1 + ... + n_p observations and we arrange them using the usual lexicographical ordering:

y = [y_11 ... y_1n_1  y_21 ... y_2n_2  ...  y_p1 ... y_pn_p]^T.

The random version of y, call it Y, is defined similarly using the same ordering. We denote the lth components of y and Y by y_l and Y_l, respectively. Letting ñ_0 = 0 and ñ_j = Σ_{k=1}^j n_k for j ∈ N_p := {1, 2, ..., p}, we can write

x_i = e_j  if ñ_{j-1} + 1 ≤ i ≤ ñ_j,

where e_j denotes the jth unit vector in R^p (i.e., the jth column of the p × p identity matrix). Note that the design matrix X has full column rank since it equals ⊕_{i=1}^p 1_{n_i}, which has orthogonal columns. It follows from Proposition 1 that the posterior π(β | y) that results when the one-way logistic ANOVA likelihood is combined with a flat prior on β is proper if and only if (B) holds, but this condition is not easy to interpret. We present an equivalent condition that is easy to check and understand. As usual, let p̂_i be the proportion of 1s in the ith treatment group, that is, p̂_i = n_i^{-1} Σ_{l=ñ_{i-1}+1}^{ñ_i} y_l. The following result, which is proven in the Appendix, implies that the posterior is proper if and only if there is at least one 1 and at least one 0 in each treatment group.

Corollary 1. Assume that X = ⊕_{i=1}^p 1_{n_i}. The posterior is proper if and only if

0 < p̂_i < 1 for all i ∈ N_p.   (B′)

Assume now that the posterior π(β | y) is proper. We study PS&W's Markov chain Φ for exploring the posterior. In order to formally describe our main result, we must introduce the operator associated with Φ. Recall that k(β | β′) denotes the Mtd of Φ, that is, the conditional density of β^(m+1) given that β^(m) = β′.
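The two-step DA update of Section 2 specializes nicely to this model and is easy to prototype. In the sketch below (our own, not the paper's code) we stand in for PS&W's exact rejection sampler with a truncated sum-of-gammas representation of PG(1, c), so the Polya-Gamma draws are only approximate; the truncation level K and all names are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def rpg_approx(c, K=200, rng=rng):
    """Approximate draw from PG(1, c) by truncating the infinite
    sum-of-gammas representation
        W = (1 / (2 pi^2)) * sum_{k>=1} g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    where the g_k are iid Exponential(1).  Exact only as K -> infinity."""
    k = np.arange(1, K + 1)
    g = rng.exponential(size=K)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2))) / (2 * np.pi ** 2)

def da_step(beta, Z, rng=rng):
    """One iteration of the two-step DA update under the flat prior."""
    # Step 1: W_i ~ PG(1, |z_i' beta|), independently.
    w = np.array([rpg_approx(abs(z @ beta), rng=rng) for z in Z])
    # Step 2: beta | w ~ N_p(mu(w), Sigma(w)), with
    #   Sigma(w) = (Z' Omega(w) Z)^{-1} and mu(w) = -Sigma(w) Z' 1_N / 2.
    Sigma = np.linalg.inv(Z.T @ (w[:, None] * Z))
    mu = -0.5 * Sigma @ (Z.T @ np.ones(Z.shape[0]))
    return rng.multivariate_normal(mu, Sigma)

# Toy one-way ANOVA data: p = 2 groups of size 3 each.
X = np.repeat(np.eye(2), 3, axis=0)
y = np.array([1, 0, 1, 0, 0, 1])
Z = (1 - 2 * y)[:, None] * X        # rows z_i = (I{y_i=0} - I{y_i=1}) x_i
beta = np.zeros(2)
for _ in range(10):
    beta = da_step(beta, Z)
```

Because E[PG(1, 0)] = 1/4, averaging many rpg_approx(0.0) draws gives roughly 0.25, a convenient sanity check on the latent-variable step.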
Let L^2_0 be the space of real-valued functions with domain R^p that are square integrable and have mean zero with respect to the posterior density π(β | y). This is a Hilbert space in which the inner product of g, h ∈ L^2_0 is defined as

⟨g, h⟩ = ∫_{R^p} g(β) h(β) π(β | y) dβ,

and the corresponding norm is, of course, given by ‖g‖ = ⟨g, g⟩^{1/2}. The Mtd k defines an operator on L^2_0, and the spectrum of the operator contains a great deal of information about the convergence behavior of the corresponding Markov chain Φ (see, e.g., Hobert, Roy and Robert, 2011). Let K : L^2_0 → L^2_0 denote the operator that maps g ∈ L^2_0 to

(Kg)(β′) = ∫_{R^p} g(β) k(β | β′) dβ.

Because the operator K is self-adjoint and positive, the spectrum of K is a subset of the interval [0, 1] (see, e.g., Hobert and Marchev, 2008; Liu, Wong and Kong, 1994; Hobert et al., 2011). Moreover, if the self-adjoint, positive operator K is compact, then its spectrum consists solely of eigenvalues (which are all strictly less than one) and the point {0} (see, e.g., Retherford, 1993, pp. 61-62). If the sum of the eigenvalues is finite, then the operator is called trace-class (see Khare and Hobert (2011), and references therein). Here is our main result.

Theorem 1. Assume that X = ⊕_{i=1}^p 1_{n_i} and the posterior is proper. Then the Markov operator K is trace-class.

The following lemma, which is a simple consequence of results in Devroye (2009), will be used in the proof of Theorem 1.

Lemma 1. Let g denote the density of a PG(1, 0) random variable. If w ∈ (0, 1/log 3], then

g(w) ≤ (2πw^3)^{-1/2} exp{-1/(8w)}.

Remark 1. It is clear that (2) is the density of a PG(1, 0) random variable. As described in Devroye (2009), Lemma 1 follows from the fact that g is of the form g(w) = Σ_{k=0}^∞ (-1)^k a_k(w), where the nonnegative sequence {a_k(w)}_{k=0}^∞ is decreasing in k for w ∈ (0, 1/log 3], so the alternating series is bounded above by its leading term a_0(w).

Proof of Theorem 1. To prove that the Markov operator K is trace-class, we follow a technique used in Khare and Hobert (2011); that is, we will establish the following condition:

∫_{R^p} k(β | β) dβ = ∫_{R^N_+} [∫_{R^p} π(β | w, y) π(w | β, y) dβ] dw < ∞.   (3)

The key is to bound the inner integral in (3) by

c_1 ∏_{i=1}^N exp{a/(8w_i)} g(w_i),

where c_1 and a < 1 are constants. We then use Lemma 1 to complete the proof. We begin by evaluating the inner integral in (3). Recall that Σ = Σ(w) = (Z^T Ω(w) Z)^{-1} and μ = μ(w) = Σ(w) Z^T(-1_N/2). First, note that the product of densities π(β | w, y) π(w | β, y) can be written as follows:

(2π)^{-p/2} |Σ|^{-1/2} exp{-(1/2)(β^T (Z^T Ω Z) β + β^T Z^T 1_N)} exp{-(1/8) 1_N^T Z (Z^T Ω Z)^{-1} Z^T 1_N} ∏_{i=1}^N [cosh(z_i^T β / 2) exp{-(z_i^T β)^2 w_i / 2} g(w_i)]

= (2π)^{-p/2} |Σ|^{-1/2} exp{-(1/2) β^T (2Σ^{-1}) β} ∏_{i=1}^N [cosh(z_i^T β / 2) exp{-z_i^T β / 2} g(w_i)] exp{-(1/8) 1_N^T Z (Z^T Ω Z)^{-1} Z^T 1_N}

= (2π)^{-p/2} |Σ|^{-1/2} exp{-(1/2) β^T (2Σ^{-1}) β} [∏_{i=1}^N (1 + exp{-z_i^T β})/2] exp{-(1/8) 1_N^T Z (Z^T Ω Z)^{-1} Z^T 1_N} ∏_{i=1}^N g(w_i),   (4)

where the last equality follows from cosh(a) e^{-a} = (1 + e^{-2a})/2. We now evaluate the integral of (4) with respect to β. Recall N_N = {1, 2, ..., N}. For each A ⊆ N_N, define an N × p matrix Z_A whose ith row is z_i^T if i ∈ A, and 0_p^T if i ∈ N_N \ A. Then it is easy to see that

exp{-1_N^T Z_A β} = ∏_{i∈A} exp{-z_i^T β} if A is nonempty, and exp{-1_N^T Z_A β} = 1 if A is empty.

Therefore, we have

∏_{i=1}^N (1 + exp{-z_i^T β}) = Σ_{A⊆N_N} exp{-1_N^T Z_A β},

so the integral of (4) with respect to β can be written as

c_0 exp{-(1/8) 1_N^T Z (Z^T Ω Z)^{-1} Z^T 1_N} [∏_{i=1}^N g(w_i)] Σ_{A⊆N_N} ∫_{R^p} exp{-1_N^T Z_A β} φ(β; 0_p, Σ/2) dβ,   (5)

where c_0 is a constant, and φ(β; 0_p, Σ/2) is the multivariate normal density with mean 0_p and covariance Σ/2. Note that the integral in (5) is just the moment generating function of a N_p(0_p, Σ/2) random vector evaluated at the point -Z_A^T 1_N. Hence, (5) is equal to

c_0 exp{-(1/8) 1_N^T Z (Z^T Ω Z)^{-1} Z^T 1_N} [∏_{i=1}^N g(w_i)]

× Σ_{A⊆N_N} exp{(1/4) 1_N^T Z_A (Z^T Ω Z)^{-1} Z_A^T 1_N}.   (6)

We now express the exponential terms of (6) in a more compact way. Recall that X = ⊕_{i=1}^p 1_{n_i} and that the ith row of Z is z_i^T = I_{{0}}(y_i) x_i^T - I_{{1}}(y_i) x_i^T. Clearly, the N × p matrix Z can be written as UX, where U is the N × N diagonal matrix whose ith diagonal entry is equal to u_i := I_{{0}}(y_i) - I_{{1}}(y_i). It is easy to see that Z^T Ω Z = X^T Ω X and, using the orthogonality of the columns of X, that Z^T Ω Z is a p × p diagonal matrix whose jth diagonal entry is equal to

Σ_{k=ñ_{j-1}+1}^{ñ_j} w_k,

where ñ_0 = 0 and ñ_i = Σ_{k=1}^i n_k for i ∈ N_p. Let z_ij denote the jth component of the row vector z_i^T. A straightforward calculation yields that, for A ⊆ N_N,

1_N^T Z_A (Z^T Ω Z)^{-1} Z_A^T 1_N = Σ_{j=1}^p (Σ_{i∈A} z_ij)^2 / (Σ_{k=ñ_{j-1}+1}^{ñ_j} w_k).

Hence, the exponential terms in (6) can be written as

Σ_{A⊆N_N} exp{-(1/8) Σ_{j=1}^p [(Σ_{i=1}^N z_ij)^2 - 2 (Σ_{i∈A} z_ij)^2] / (Σ_{k=ñ_{j-1}+1}^{ñ_j} w_k)}.   (7)

We now claim that, for all A ⊆ N_N and all j ∈ N_p,

(1/n_j^2) [2 (Σ_{i∈A} z_ij)^2 - (Σ_{i=1}^N z_ij)^2] < 1.

Recall that p̂_j is the proportion of 1s in the jth treatment group, that is, p̂_j = n_j^{-1} Σ_{i=ñ_{j-1}+1}^{ñ_j} y_i. Also, recall from the definition of Z that its jth column is

[0_{ñ_{j-1}}^T  u_{ñ_{j-1}+1} ... u_{ñ_j}  0_{N-ñ_j}^T]^T,

where u_i = I_{{0}}(y_i) - I_{{1}}(y_i). Of course, when j = 1 or j = p, we omit the vector 0_0; for example, the first column of Z is [u_1 ... u_{n_1}  0_{N-n_1}^T]^T. Since there are n_j p̂_j 1s and n_j(1 - p̂_j) 0s in the jth treatment group, in the jth column of Z, n_j p̂_j of the u_i's are -1s and n_j(1 - p̂_j) of the u_i's are 1s. Assume without loss of generality that, for each j ∈ N_p, the first n_j p̂_j of these u_i's are -1s and the remaining n_j(1 - p̂_j) are 1s. That is, we assume that the jth column of Z is equal to

[-1_{n_1 p̂_1}^T  1_{n_1(1-p̂_1)}^T  0_{N-n_1}^T]^T  if j = 1,
[0_{ñ_{j-1}}^T  -1_{n_j p̂_j}^T  1_{n_j(1-p̂_j)}^T  0_{N-ñ_j}^T]^T  if 1 < j < p,   (8)
[0_{ñ_{p-1}}^T  -1_{n_p p̂_p}^T  1_{n_p(1-p̂_p)}^T]^T  if j = p.
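The structural fact that powers this calculation, namely that Z^T Ω Z = X^T Ω X is diagonal with jth entry equal to the sum of the w_k in the jth treatment group, can be confirmed numerically. The check below is ours, not the paper's; the group sizes and weights are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n = [3, 4, 2]                        # group sizes n_1, ..., n_p
p, N = len(n), sum(n)
X = np.repeat(np.eye(p), n, axis=0)  # X = direct sum of the vectors 1_{n_i}
y = rng.integers(0, 2, size=N)
u = 1 - 2 * y                        # u_i = I{y_i = 0} - I{y_i = 1}
Z = u[:, None] * X                   # Z = U X with U = diag(u)
w = rng.gamma(2.0, size=N)           # arbitrary positive weights
Omega = np.diag(w)

M = Z.T @ Omega @ Z
starts = np.cumsum([0] + n[:-1])
group_sums = [w[s:s + nj].sum() for s, nj in zip(starts, n)]
```

Here M agrees with X^T Ω X and with the diagonal matrix of per-group sums, which is exactly what allows (Z^T Ω Z)^{-1} to be written entrywise in (7).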

Letting q̂_j = 1 - p̂_j for j ∈ N_p, the sum of the jth column of Z is

Σ_{i=1}^N z_ij = n_j(1 - p̂_j) - n_j p̂_j = n_j(q̂_j - p̂_j),

and, for all A ⊆ N_N and j ∈ N_p,

|Σ_{i∈A} z_ij| ≤ max{n_j p̂_j, n_j q̂_j}.

It follows that, for all A ⊆ N_N and all j ∈ N_p,

(1/n_j^2)[2 (Σ_{i∈A} z_ij)^2 - (Σ_{i=1}^N z_ij)^2] ≤ 2 max{p̂_j, q̂_j}^2 - (q̂_j - p̂_j)^2
= 2(p̂_j^2 + q̂_j^2) - 2 min{p̂_j, q̂_j}^2 - (q̂_j - p̂_j)^2
= 1 - 2 min{p̂_j, q̂_j}^2,

where the final equality uses 2(p̂_j^2 + q̂_j^2) - (q̂_j - p̂_j)^2 = (p̂_j + q̂_j)^2 = 1. Since condition (B′) of Corollary 1 is in force, for all j ∈ N_p, we have

1 - 2 min{p̂_j, q̂_j}^2 < 1.

Hence, (7) is bounded above by

2^N exp{(a/8) Σ_{j=1}^p n_j^2 / (Σ_{k=ñ_{j-1}+1}^{ñ_j} w_k)},   (9)

where

a := max_{j∈N_p} [1 - 2 min{p̂_j, q̂_j}^2] < 1.

Combining (5), (6), (7) and (9), we have

∫_{R^p} π(β | w, y) π(w | β, y) dβ ≤ c_1 exp{(a/8) Σ_{j=1}^p n_j^2 / (Σ_{k=ñ_{j-1}+1}^{ñ_j} w_k)} ∏_{i=1}^N g(w_i)
≤ c_1 exp{(a/8) Σ_{j=1}^p Σ_{k=ñ_{j-1}+1}^{ñ_j} 1/w_k} ∏_{i=1}^N g(w_i)
= c_1 ∏_{i=1}^N exp{a/(8w_i)} g(w_i),

where c_1 is a constant, and the last inequality is due to the arithmetic-harmonic mean inequality (n_j^2 / Σ_k w_k ≤ Σ_k 1/w_k within each group). It follows that

∫_{R^N_+} [∫_{R^p} π(β | w, y) π(w | β, y) dβ] dw

≤ c_1 ∏_{i=1}^N ∫_{R_+} exp{a/(8w_i)} g(w_i) dw_i
= c_1 ∏_{i=1}^N [∫_0^t exp{a/(8w_i)} g(w_i) dw_i + ∫_t^∞ exp{a/(8w_i)} g(w_i) dw_i]
≤ c_1 ∏_{i=1}^N [∫_0^t (2πw_i^3)^{-1/2} exp{-(1 - a)/(8w_i)} dw_i + c_2],   (10)

where t = 1/log 3, c_2 := exp{a/(8t)} is a constant, and the last inequality is due to Lemma 1 on (0, t] and the bound exp{a/(8w_i)} ≤ exp{a/(8t)} on [t, ∞). The integrand in (10) is the kernel of an inverse gamma density, and hence (3) is satisfied.

4. Discussion

We have shown in Section 3 that, in the one-way logistic ANOVA model, the Markov operator associated with PS&W's DA chain is trace-class (and thus the chain is geometrically ergodic). Given this result, it is natural to ask whether PS&W's DA operator is also trace-class in the general logistic regression model. We note that our trace-class proof for the one-way model relies heavily on the orthogonality of the columns of the design matrix X, which is not guaranteed in general. In particular, the idea is to use orthogonality to express (Z^T Ω Z)^{-1} as a diagonal matrix whose diagonal entries are simple functions of w = (w_1, ..., w_N). In the general case, however, (Z^T Ω Z)^{-1} does not necessarily take a simple form, which in turn complicates the analysis. Thus, we believe that establishing (3) for the general case would be much more than a straightforward extension of the proof of our Theorem 1. It is our hope that our result for the one-way logistic ANOVA model will promote the study of the DA Markov chain for the general logistic regression model.

Appendix A: Appendix

In this section, we provide a proof of Corollary 1. Before presenting the proof, we recall the structure of Z given in (8), which will be used a couple of times within the proof. The jth column of the N × p matrix Z is

[-1_{n_1 p̂_1}^T  1_{n_1(1-p̂_1)}^T  0_{N-n_1}^T]^T  if j = 1,
[0_{ñ_{j-1}}^T  -1_{n_j p̂_j}^T  1_{n_j(1-p̂_j)}^T  0_{N-ñ_j}^T]^T  if 1 < j < p,
[0_{ñ_{p-1}}^T  -1_{n_p p̂_p}^T  1_{n_p(1-p̂_p)}^T]^T  if j = p.

Proof of Corollary 1. Recall that c(y) < ∞ if and only if (B) holds. Thus, it suffices to show that (B′) is equivalent to (B). First, we shall write b = (b_1^T, ..., b_p^T)^T, where each b_j is an n_j × 1 vector.
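As a numerical companion to the proof (our own sketch: the particular weights are one valid choice of b, not necessarily the one used in the paper), condition (B′) lets us build a strictly positive b with Z^T b = 0_p by weighting each observation by the within-group frequency of the opposite outcome:

```python
import numpy as np

rng = np.random.default_rng(2)
n = [4, 3, 5]                        # group sizes n_1, ..., n_p
p, N = len(n), sum(n)
X = np.repeat(np.eye(p), n, axis=0)
# Force at least one 0 and one 1 per group, so that (B') holds.
y = np.concatenate([np.concatenate(([0, 1], rng.integers(0, 2, nj - 2))) for nj in n])
Z = (1 - 2 * y)[:, None] * X

starts = np.cumsum([0] + n[:-1])
b = np.empty(N)
for s, nj in zip(starts, n):
    phat = y[s:s + nj].mean()        # proportion of 1s in the group
    # y = 1 rows of the jth column of Z carry -1, y = 0 rows carry +1;
    # these weights make the group contributions cancel exactly.
    b[s:s + nj] = np.where(y[s:s + nj] == 1, 1 - phat, phat)
```

Each group contributes n_j p̂_j copies of -(1 - p̂_j) and n_j(1 - p̂_j) copies of +p̂_j to the jth component of Z^T b, which sum to zero, and positivity of b is exactly the condition 0 < p̂_j < 1.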
We start by demonstrating that (B) implies (B′). We will establish this by contradiction. Assume (without

loss of generality) that p̂_1 = 0. Then there are no 1s in the first treatment group, so the first column of Z is equal to [1_{n_1}^T 0_{N-n_1}^T]^T. It follows that the first component of the p × 1 vector Z^T b is 1_{n_1}^T b_1 > 0 for all b ∈ R^N_+, which contradicts (B). We can similarly prove that the assumption p̂_1 = 1 yields a contradiction to (B). Hence, it must be that 0 < p̂_j < 1 for all j ∈ N_p.

We now show that (B′) implies (B). We will establish this by explicitly constructing a vector b ∈ R^N_+ such that Z^T b = 0_p. For each j ∈ N_p, define

b_j = [(q̂_j/p̂_j) 1_{n_j p̂_j}^T  1_{n_j q̂_j}^T]^T  if 0 < p̂_j < 1/2,
b_j = [1_{n_j p̂_j}^T  (p̂_j/q̂_j) 1_{n_j q̂_j}^T]^T  if 1/2 ≤ p̂_j < 1,

where, as before, q̂_j = 1 - p̂_j. Clearly, each b_j is in R^{n_j}_+, so b ∈ R^N_+. Therefore, we need only show that Z^T b = 0_p. If 1/2 ≤ p̂_j < 1, then the jth component of the p × 1 vector Z^T b is equal to

[-n_j p̂_j  n_j q̂_j] [1  p̂_j/q̂_j]^T = -n_j p̂_j + n_j p̂_j = 0.

We can similarly show that, if 0 < p̂_j < 1/2, then the jth component of Z^T b is also zero. Hence, Z^T b = 0_p, which completes the proof.

Acknowledgements

Román's research was supported by NSF Grant DMS.

References

Chen, M.-H. and Shao, Q.-M. (2000). Propriety of posterior distribution for dichotomous quantal response models. Proceedings of the American Mathematical Society.
Choi, H. M. and Hobert, J. P. (2013). The Polya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electronic Journal of Statistics.
Devroye, L. (2009). On exact simulation algorithms for some distributions related to Jacobi theta functions. Statistics and Probability Letters 79.
Flegal, J. M., Haran, M. and Jones, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science.
Hobert, J. P. (2011). The data augmentation algorithm: Theory and methodology. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. Jones and X.-L. Meng, eds.). Chapman & Hall/CRC Press.
Hobert, J. P. and Marchev, D. (2008).
A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. The Annals of Statistics.

Hobert, J. P., Roy, V. and Robert, C. P. (2011). Improving the convergence properties of the data augmentation algorithm with an application to Bayesian mixture modelling. Statistical Science.
Jones, G. L., Haran, M., Caffo, B. S. and Neath, R. (2006). Fixed-width output analysis for Markov chain Monte Carlo. Journal of the American Statistical Association.
Khare, K. and Hobert, J. P. (2011). A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. The Annals of Statistics.
Liu, J. S., Wong, W. H. and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika.
Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference for logistic models using Polya-Gamma latent variables. Journal of the American Statistical Association.
Retherford, J. R. (1993). Hilbert Space: Compact Operators and the Trace Theorem. Cambridge University Press.


More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing Partially Collapsed Gibbs Samplers: Theory and Methods David A. van Dyk 1 and Taeyoung Park Ever increasing computational power along with ever more sophisticated statistical computing techniques is making

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

An ABC interpretation of the multiple auxiliary variable method

An ABC interpretation of the multiple auxiliary variable method School of Mathematical and Physical Sciences Department of Mathematics and Statistics Preprint MPS-2016-07 27 April 2016 An ABC interpretation of the multiple auxiliary variable method by Dennis Prangle

More information

HIERARCHICAL BAYESIAN ANALYSIS OF BINARY MATCHED PAIRS DATA

HIERARCHICAL BAYESIAN ANALYSIS OF BINARY MATCHED PAIRS DATA Statistica Sinica 1(2), 647-657 HIERARCHICAL BAYESIAN ANALYSIS OF BINARY MATCHED PAIRS DATA Malay Ghosh, Ming-Hui Chen, Atalanta Ghosh and Alan Agresti University of Florida, Worcester Polytechnic Institute

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

More information

Hamiltonian Monte Carlo

Hamiltonian Monte Carlo Chapter 7 Hamiltonian Monte Carlo As with the Metropolis Hastings algorithm, Hamiltonian (or hybrid) Monte Carlo (HMC) is an idea that has been knocking around in the physics literature since the 1980s

More information

Partially Collapsed Gibbs Samplers: Theory and Methods

Partially Collapsed Gibbs Samplers: Theory and Methods David A. VAN DYK and Taeyoung PARK Partially Collapsed Gibbs Samplers: Theory and Methods Ever-increasing computational power, along with ever more sophisticated statistical computing techniques, is making

More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Partial factor modeling: predictor-dependent shrinkage for linear regression

Partial factor modeling: predictor-dependent shrinkage for linear regression modeling: predictor-dependent shrinkage for linear Richard Hahn, Carlos Carvalho and Sayan Mukherjee JASA 2013 Review by Esther Salazar Duke University December, 2013 Factor framework The factor framework

More information

Resampling Markov chain Monte Carlo algorithms: Basic analysis and empirical comparisons. Zhiqiang Tan 1. February 2014

Resampling Markov chain Monte Carlo algorithms: Basic analysis and empirical comparisons. Zhiqiang Tan 1. February 2014 Resampling Markov chain Monte Carlo s: Basic analysis and empirical comparisons Zhiqiang Tan 1 February 2014 Abstract. Sampling from complex distributions is an important but challenging topic in scientific

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo 1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time

More information

Local consistency of Markov chain Monte Carlo methods

Local consistency of Markov chain Monte Carlo methods Ann Inst Stat Math (2014) 66:63 74 DOI 10.1007/s10463-013-0403-3 Local consistency of Markov chain Monte Carlo methods Kengo Kamatani Received: 12 January 2012 / Revised: 8 March 2013 / Published online:

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Markov Chains, Stochastic Processes, and Matrix Decompositions

Markov Chains, Stochastic Processes, and Matrix Decompositions Markov Chains, Stochastic Processes, and Matrix Decompositions 5 May 2014 Outline 1 Markov Chains Outline 1 Markov Chains 2 Introduction Perron-Frobenius Matrix Decompositions and Markov Chains Spectral

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

Kobe University Repository : Kernel

Kobe University Repository : Kernel Kobe University Repository : Kernel タイトル Title 著者 Author(s) 掲載誌 巻号 ページ Citation 刊行日 Issue date 資源タイプ Resource Type 版区分 Resource Version 権利 Rights DOI URL Note on the Sampling Distribution for the Metropolis-

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Stat 5421 Lecture Notes Proper Conjugate Priors for Exponential Families Charles J. Geyer March 28, 2016

Stat 5421 Lecture Notes Proper Conjugate Priors for Exponential Families Charles J. Geyer March 28, 2016 Stat 5421 Lecture Notes Proper Conjugate Priors for Exponential Families Charles J. Geyer March 28, 2016 1 Theory This section explains the theory of conjugate priors for exponential families of distributions,

More information

On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression

On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression Joyee Ghosh, Yingbo Li, Robin Mitra arxiv:157.717v2 [stat.me] 9 Feb 217 Abstract In logistic regression, separation occurs when

More information

Eventually reducible matrix, eventually nonnegative matrix, eventually r-cyclic

Eventually reducible matrix, eventually nonnegative matrix, eventually r-cyclic December 15, 2012 EVENUAL PROPERIES OF MARICES LESLIE HOGBEN AND ULRICA WILSON Abstract. An eventual property of a matrix M C n n is a property that holds for all powers M k, k k 0, for some positive integer

More information

Chapter 7. Markov chain background. 7.1 Finite state space

Chapter 7. Markov chain background. 7.1 Finite state space Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider

More information

Bayesian GLMs and Metropolis-Hastings Algorithm

Bayesian GLMs and Metropolis-Hastings Algorithm Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

Zero Variance Markov Chain Monte Carlo for Bayesian Estimators

Zero Variance Markov Chain Monte Carlo for Bayesian Estimators Noname manuscript No. will be inserted by the editor Zero Variance Markov Chain Monte Carlo for Bayesian Estimators Supplementary material Antonietta Mira Reza Solgi Daniele Imparato Received: date / Accepted:

More information

Efficient Sampling Methods for Truncated Multivariate Normal and Student-t Distributions Subject to Linear Inequality Constraints

Efficient Sampling Methods for Truncated Multivariate Normal and Student-t Distributions Subject to Linear Inequality Constraints Efficient Sampling Methods for Truncated Multivariate Normal and Student-t Distributions Subject to Linear Inequality Constraints Yifang Li Department of Statistics, North Carolina State University 2311

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state

More information

Xia Wang and Dipak K. Dey

Xia Wang and Dipak K. Dey A Flexible Skewed Link Function for Binary Response Data Xia Wang and Dipak K. Dey Technical Report #2008-5 June 18, 2008 This material was based upon work supported by the National Science Foundation

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Simulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris

Simulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris Simulation of truncated normal variables Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris Abstract arxiv:0907.4010v1 [stat.co] 23 Jul 2009 We provide in this paper simulation algorithms

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

The Multivariate Gaussian Distribution

The Multivariate Gaussian Distribution The Multivariate Gaussian Distribution Chuong B. Do October, 8 A vector-valued random variable X = T X X n is said to have a multivariate normal or Gaussian) distribution with mean µ R n and covariance

More information

Quantitative Non-Geometric Convergence Bounds for Independence Samplers

Quantitative Non-Geometric Convergence Bounds for Independence Samplers Quantitative Non-Geometric Convergence Bounds for Independence Samplers by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 28; revised July 29.) 1. Introduction. Markov chain Monte Carlo (MCMC)

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra

Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra D. R. Wilkins Contents 3 Topics in Commutative Algebra 2 3.1 Rings and Fields......................... 2 3.2 Ideals...............................

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 Sequential parallel tempering With the development of science and technology, we more and more need to deal with high dimensional systems. For example, we need to align a group of protein or DNA sequences

More information

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS The Annals of Applied Probability 1998, Vol. 8, No. 4, 1291 1302 ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS By Gareth O. Roberts 1 and Jeffrey S. Rosenthal 2 University of Cambridge

More information

Markov Chain Monte Carlo Lecture 4

Markov Chain Monte Carlo Lecture 4 The local-trap problem refers to that in simulations of a complex system whose energy landscape is rugged, the sampler gets trapped in a local energy minimum indefinitely, rendering the simulation ineffective.

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

CONVERGENCE ANALYSIS OF THE GIBBS SAMPLER FOR BAYESIAN GENERAL LINEAR MIXED MODELS WITH IMPROPER PRIORS

CONVERGENCE ANALYSIS OF THE GIBBS SAMPLER FOR BAYESIAN GENERAL LINEAR MIXED MODELS WITH IMPROPER PRIORS The Annals of Statistics 01, Vol. 40, No. 6, 83 849 DOI: 10.114/1-AOS105 Institute of Mathematical Statistics, 01 CONVERGENCE ANALYSIS OF THE GIBBS SAMPLER FOR BAYESIAN GENERAL LINEAR MIXED MODELS WITH

More information

General Glivenko-Cantelli theorems

General Glivenko-Cantelli theorems The ISI s Journal for the Rapid Dissemination of Statistics Research (wileyonlinelibrary.com) DOI: 10.100X/sta.0000......................................................................................................

More information

Inference in state-space models with multiple paths from conditional SMC

Inference in state-space models with multiple paths from conditional SMC Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September

More information

Reversible Markov chains

Reversible Markov chains Reversible Markov chains Variational representations and ordering Chris Sherlock Abstract This pedagogical document explains three variational representations that are useful when comparing the efficiencies

More information

Bayesian Methods with Monte Carlo Markov Chains II

Bayesian Methods with Monte Carlo Markov Chains II Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3

More information

Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems

Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Scott W. Linderman Matthew J. Johnson Andrew C. Miller Columbia University Harvard and Google Brain Harvard University Ryan

More information

Nonlinear Inequality Constrained Ridge Regression Estimator

Nonlinear Inequality Constrained Ridge Regression Estimator The International Conference on Trends and Perspectives in Linear Statistical Inference (LinStat2014) 24 28 August 2014 Linköping, Sweden Nonlinear Inequality Constrained Ridge Regression Estimator Dr.

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

HOW TO CHOOSE THE STATE RELEVANCE WEIGHT OF THE APPROXIMATE LINEAR PROGRAM?

HOW TO CHOOSE THE STATE RELEVANCE WEIGHT OF THE APPROXIMATE LINEAR PROGRAM? HOW O CHOOSE HE SAE RELEVANCE WEIGH OF HE APPROXIMAE LINEAR PROGRAM? YANN LE ALLEC AND HEOPHANE WEBER Abstract. he linear programming approach to approximate dynamic programming was introduced in [1].

More information

Research Article Power Prior Elicitation in Bayesian Quantile Regression

Research Article Power Prior Elicitation in Bayesian Quantile Regression Hindawi Publishing Corporation Journal of Probability and Statistics Volume 11, Article ID 87497, 16 pages doi:1.1155/11/87497 Research Article Power Prior Elicitation in Bayesian Quantile Regression Rahim

More information