The Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic


Hee Min Choi and James P. Hobert
Department of Statistics, University of Florida

August 2013

Abstract

One of the most widely used data augmentation algorithms is Albert and Chib's (1993) algorithm for Bayesian probit regression. Polson, Scott and Windle (2013) recently introduced an analogous algorithm for Bayesian logistic regression. The main difference between the two is that Albert and Chib's (1993) truncated normals are replaced by so-called Polya-Gamma random variables. In this note, we establish that the Markov chain underlying Polson et al.'s (2013) algorithm is uniformly ergodic. This theoretical result has important practical benefits. In particular, it guarantees the existence of central limit theorems that can be used to make an informed decision about how long the simulation should be run.

AMS 2000 subject classifications. Primary 60J27; secondary 62F15.
Abbreviated title. Gibbs sampler for Bayesian logistic regression.
Key words and phrases. Polya-Gamma distribution, Data augmentation algorithm, Minorization condition, Markov chain, Monte Carlo.

1 Introduction

Consider a binary regression set-up in which $Y_1, \ldots, Y_n$ are independent Bernoulli random variables such that $\Pr(Y_i = 1 \mid \beta) = F(x_i^T \beta)$, where $x_i$ is a $p \times 1$ vector of known covariates associated with $Y_i$, $\beta$ is

a $p \times 1$ vector of unknown regression coefficients, and $F : \mathbb{R} \rightarrow (0, 1)$ is a distribution function. Two important special cases are probit regression, where $F$ is the standard normal distribution function, and logistic regression, where $F$ is the standard logistic distribution function, that is, $F(t) = e^t/(1 + e^t)$. In general, the joint mass function of $Y_1, \ldots, Y_n$ is given by
$$\prod_{i=1}^n \Pr(Y_i = y_i \mid \beta) = \prod_{i=1}^n \big[ F(x_i^T \beta) \big]^{y_i} \big[ 1 - F(x_i^T \beta) \big]^{1 - y_i} I_{\{0,1\}}(y_i).$$
A Bayesian version of the model requires a prior distribution for the unknown regression parameter, $\beta$. If $\pi(\beta)$ is the prior density for $\beta$, then the posterior density of $\beta$ given the data, $y = (y_1, \ldots, y_n)$, is defined as
$$\pi(\beta \mid y) = \frac{\pi(\beta)}{c(y)} \prod_{i=1}^n \big[ F(x_i^T \beta) \big]^{y_i} \big[ 1 - F(x_i^T \beta) \big]^{1 - y_i},$$
where $c(y)$ is the normalizing constant, that is,
$$c(y) := \int_{\mathbb{R}^p} \pi(\beta) \prod_{i=1}^n \big[ F(x_i^T \beta) \big]^{y_i} \big[ 1 - F(x_i^T \beta) \big]^{1 - y_i} \, d\beta.$$
Regardless of the choice of $F$, the posterior density is intractable in the sense that expectations with respect to $\pi(\beta \mid y)$, which are required for Bayesian inference, cannot be computed in closed form. Moreover, classical Monte Carlo methods based on independent and identically distributed (iid) samples from $\pi(\beta \mid y)$ are problematic when the dimension, $p$, is large. These difficulties have spurred the development of many Markov chain Monte Carlo methods for exploring $\pi(\beta \mid y)$. One of the most popular is a simple data augmentation (DA) algorithm for the probit regression case that was developed by Albert and Chib (1993). Each iteration of this algorithm has two steps: the first entails the simulation of $n$ independent univariate truncated normals, and the second requires a draw from the $p$-variate normal distribution. A convergence rate analysis of the underlying Markov chain can be found in Roy and Hobert (2007). Since the publication of Albert and Chib (1993), the search has been on for an analogous algorithm for Bayesian logistic regression.
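The intractability concerns the normalizing constant $c(y)$ and posterior expectations; the unnormalized posterior itself is cheap to evaluate. The following sketch (with made-up data and hypothetical function names, assuming the logistic link and the $N_p(b, B)$ prior used later in the paper) computes the log of $\pi(\beta) \prod_i F(x_i^T \beta)^{y_i} [1 - F(x_i^T \beta)]^{1 - y_i}$:

```python
import numpy as np

def log_unnormalized_posterior(beta, X, y, b, B):
    """Log of pi(beta) * prod_i F(x_i' beta)^{y_i} [1 - F(x_i' beta)]^{1-y_i}
    for the logistic link F(t) = e^t / (1 + e^t) and a N_p(b, B) prior,
    dropping constants that do not depend on beta."""
    eta = X @ beta                                        # x_i' beta for each i
    # log F(eta) = eta - log(1 + e^eta); log(1 - F(eta)) = -log(1 + e^eta)
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))    # numerically stable
    resid = beta - b
    log_prior = -0.5 * resid @ np.linalg.solve(B, resid)  # up to a constant
    return log_lik + log_prior

# Tiny illustration with invented data (n = 4, p = 2).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2))
y = np.array([1.0, 0.0, 1.0, 1.0])
b, B = np.zeros(2), np.eye(2)
print(log_unnormalized_posterior(np.zeros(2), X, y, b, B))
```

At $\beta = 0$ the likelihood contributes $-n \log 2$ and the prior term vanishes, which gives a quick correctness check.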
Attempts to develop such an analogue by mimicking the missing data argument of Albert and Chib (1993) in the logistic case have not been entirely successful. In particular, the resulting algorithms are much more complicated than that of Albert and Chib (1993), and simplified versions

of them are inexact (see, e.g., Frühwirth-Schnatter and Frühwirth, 2010; Holmes and Held, 2006; Marchev, 2011). Using a different approach, Polson et al. (2013) (hereafter PS&W) have developed a real analogue of Albert and Chib's (1993) algorithm for Bayesian logistic regression. The main difference between the two algorithms is that the truncated normals in Albert and Chib's (1993) algorithm are replaced by Polya-Gamma random variables in PS&W's algorithm.

We now describe the new algorithm. In the remainder of this paper, we restrict attention to the logistic regression model with a proper $N_p(b, B)$ prior on $\beta$. As usual, let $X$ denote the $n \times p$ matrix whose $i$th row is $x_i^T$, and let $\mathbb{R}_+ = (0, \infty)$. For fixed $w$, define
$$\Sigma(w) = \big( X^T \Omega(w) X + B^{-1} \big)^{-1} \quad \text{and} \quad m(w) = \Sigma(w) \Big( X^T \Big( y - \frac{1}{2} 1_n \Big) + B^{-1} b \Big),$$
where $\Omega(w)$ is the $n \times n$ diagonal matrix whose $i$th diagonal element is $w_i$, and $1_n$ is an $n \times 1$ vector of 1s. Finally, let PG(1, c) denote the Polya-Gamma distribution with parameters 1 and $c$, which is carefully defined in Section 2. The dynamics of PS&W's Markov chain are defined (implicitly) through the following two-step procedure for moving from the current state, $\beta^{(m)} = \beta'$, to $\beta^{(m+1)}$.

Iteration $m + 1$ of PS&W's DA algorithm:

1. Draw $W_1, \ldots, W_n$ independently with $W_i \sim \mathrm{PG}\big(1, |x_i^T \beta'|\big)$, and call the observed value $w = (w_1, \ldots, w_n)$.
2. Draw $\beta^{(m+1)} \sim N_p\big( m(w), \Sigma(w) \big)$.

Highly efficient methods of simulating Polya-Gamma random variables are provided by PS&W. In this paper, we prove that PS&W's Markov chain is uniformly ergodic. This theoretical convergence rate result is extremely important when it comes to using PS&W's algorithm in practice to estimate intractable posterior expectations. Indeed, uniform ergodicity guarantees the existence of central limit theorems for averages such as $m^{-1} \sum_{i=0}^{m-1} g\big( \beta^{(i)} \big)$, where $g : \mathbb{R}^p \rightarrow \mathbb{R}$ is square integrable with respect to

the target posterior, $\pi(\beta \mid y)$ (see, e.g., Roberts and Rosenthal, 2004; Tierney, 1994). It also allows for the computation of a consistent estimator of the associated asymptotic variance, which can be used to make an informed decision about how long the simulation should be run (see, e.g., Flegal, Haran and Jones, 2008). We also establish the existence of the moment generating function (mgf) of the posterior distribution. It follows that, if $g(\beta_1, \ldots, \beta_p) = \beta_i^a$ for some $i \in \{1, 2, \ldots, p\}$ and $a > 0$, then $g$ is square integrable with respect to the posterior.

The remainder of this paper is organized as follows. Section 2 contains a brief, but careful development of PS&W's DA algorithm. Section 3 contains the proof that the Markov chain underlying this algorithm is uniformly ergodic, as well as the proof that the posterior density, $\pi(\beta \mid y)$, has an mgf.

2 Polson, Scott and Windle's Algorithm

PS&W's DA algorithm is based on a latent data representation of the posterior distribution. To describe it, we need to introduce what PS&W call the Polya-Gamma distribution. Let $\{E_k\}_{k=1}^\infty$ be a sequence of iid Exp(1) random variables and define
$$W = \frac{1}{2\pi^2} \sum_{k=1}^{\infty} \frac{E_k}{(k - 1/2)^2}.$$
It is well-known (see, e.g., Biane, Pitman and Yor, 2001) that the random variable $W$ has density
$$g(w) = \sum_{k=0}^{\infty} (-1)^k \frac{2k + 1}{\sqrt{2 \pi w^3}} \, e^{-\frac{(2k+1)^2}{8w}} I_{(0,\infty)}(w), \qquad (1)$$
and that its Laplace transform is given by $E\big[ e^{-tW} \big] = \cosh^{-1}\big( \sqrt{t/2} \big)$. (Recall that $\cosh(z) = (e^z + e^{-z})/2$.) PS&W create the Polya-Gamma family of densities through an exponential tilting of the density $g$. Indeed, consider a parametric family of densities, indexed by $c \geq 0$, that takes the form
$$f(x; c) = \cosh(c/2) \, e^{-\frac{c^2 x}{2}} g(x).$$
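The infinite-sum representation above gives a direct (if inefficient) way to draw approximately from the PG(1, 0) law: truncate the series at a large $K$. The sketch below does exactly that and checks the draws against the Laplace transform $E[e^{-tW}] = \cosh^{-1}(\sqrt{t/2})$ and the mean $E[W] = \frac{1}{2\pi^2}\sum_k (k - 1/2)^{-2} = 1/4$. This is only an illustration, not the efficient sampler of PS&W; the function name and truncation level are our own choices.

```python
import numpy as np

def pg10_sample(size, K=200, seed=None):
    """Approximate PG(1, 0) draws via the truncated series
    W = (1 / (2 pi^2)) * sum_{k=1}^K E_k / (k - 1/2)^2, E_k iid Exp(1)."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, K + 1)
    E = rng.exponential(size=(size, K))
    return (E / (k - 0.5) ** 2).sum(axis=1) / (2.0 * np.pi ** 2)

w = pg10_sample(200_000, seed=42)
t = 3.0
mc_laplace = np.exp(-t * w).mean()           # Monte Carlo E[e^{-tW}]
exact = 1.0 / np.cosh(np.sqrt(t / 2.0))      # cosh^{-1}(sqrt(t/2))
print(mc_laplace, exact)                     # should agree to a few decimals
print(w.mean())                              # E[W] = 1/4 for PG(1, 0)
```

The truncation bias at $K = 200$ is of order $10^{-4}$ for the mean, well below Monte Carlo noise at this sample size.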

Of course, when $c = 0$, we recover the original density. A random variable with density $f(x; c)$ is said to have a PG(1, c) distribution.

We now describe a latent data formulation that leads to PS&W's DA algorithm. Our development is different, and we believe somewhat more transparent, than that given by PS&W. Conditional on $\beta$, let $\{(Y_i, W_i)\}_{i=1}^n$ be independent random pairs such that $Y_i$ and $W_i$ are also independent, with
$$Y_i \sim \text{Bernoulli}\bigg( \frac{e^{x_i^T \beta}}{1 + e^{x_i^T \beta}} \bigg) \quad \text{and} \quad W_i \sim \mathrm{PG}\big(1, |x_i^T \beta|\big).$$
Let $W = (W_1, \ldots, W_n)$ and denote its density by $f(w \mid \beta)$. Combining this latent data model with the prior, $\pi(\beta)$, yields the augmented posterior density defined as
$$\pi(\beta, w \mid y) = \bigg[ \prod_{i=1}^n \Pr(Y_i = y_i \mid \beta) \bigg] f(w \mid \beta) \, \frac{\pi(\beta)}{c(y)}.$$
Clearly, $\int \pi(\beta, w \mid y) \, dw = \pi(\beta \mid y)$, which is our target posterior density. PS&W's DA algorithm alternates between draws from $\pi(\beta \mid w, y)$ and $\pi(w \mid \beta, y)$. The conditional independence of $Y_i$ and $W_i$ implies that $\pi(w \mid \beta, y) = f(w \mid \beta)$. Thus, we can draw from $\pi(w \mid \beta, y)$ by making $n$ independent draws from the Polya-Gamma distribution (as in the first step of the two-step procedure described in the Introduction). The other conditional density is multivariate normal. To see this, note that
$$\begin{aligned}
\pi(\beta \mid w, y) &\propto \bigg[ \prod_{i=1}^n \Pr(Y_i = y_i \mid \beta) \bigg] f(w \mid \beta) \, \pi(\beta) \\
&= \prod_{i=1}^n \bigg[ \frac{\big( e^{x_i^T \beta} \big)^{y_i}}{1 + e^{x_i^T \beta}} \bigg] \prod_{i=1}^n \bigg[ \cosh\Big( \frac{|x_i^T \beta|}{2} \Big) e^{-\frac{(x_i^T \beta)^2 w_i}{2}} g(w_i) \bigg] \pi(\beta) \\
&\propto \pi(\beta) \prod_{i=1}^n \bigg[ \frac{\big( e^{x_i^T \beta} \big)^{y_i}}{1 + e^{x_i^T \beta}} \cosh\Big( \frac{x_i^T \beta}{2} \Big) e^{-\frac{(x_i^T \beta)^2 w_i}{2}} \bigg] \\
&= \pi(\beta) \prod_{i=1}^n \exp\bigg\{ y_i x_i^T \beta - \frac{x_i^T \beta}{2} - \frac{(x_i^T \beta)^2 w_i}{2} \bigg\},
\end{aligned}$$
where the last equality follows from the fact that $\cosh(z/2) = (1 + e^z)/(2 e^{z/2})$. A routine Bayesian regression-type calculation then reveals that $\beta \mid w, y \sim N_p(m(w), \Sigma(w))$, where $m(w)$ and $\Sigma(w)$ are defined in the Introduction.
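The Gaussian conditional just derived is straightforward to implement. Below is a minimal sketch (function name and data are our own invention) of the $\beta$-update $\beta \mid w, y \sim N_p(m(w), \Sigma(w))$ with $\Sigma(w) = (X^T \Omega(w) X + B^{-1})^{-1}$ and $m(w) = \Sigma(w)(X^T(y - \tfrac{1}{2}1_n) + B^{-1}b)$:

```python
import numpy as np

def beta_update(w, X, y, b, B, rng):
    """One draw from beta | w, y ~ N_p(m(w), Sigma(w))."""
    Binv = np.linalg.inv(B)
    # X' Omega(w) X, with Omega(w) = diag(w), computed without forming Omega
    Sigma = np.linalg.inv(X.T @ (w[:, None] * X) + Binv)
    m = Sigma @ (X.T @ (y - 0.5) + Binv @ b)
    return rng.multivariate_normal(m, Sigma), m, Sigma

# Invented data (n = 5, p = 2) and stand-in latent values w.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
b, B = np.zeros(2), 2.0 * np.eye(2)
w = rng.exponential(size=5)
beta, m, Sigma = beta_update(w, X, y, b, B, rng)
print(m, Sigma)
```

Pairing this update with a Polya-Gamma draw for $w$ gives one full iteration of the two-step procedure from the Introduction.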

The Markov transition density (Mtd) of the DA Markov chain, $\Phi = \{\beta^{(m)}\}_{m=0}^\infty$, is
$$k\big( \beta \mid \beta' \big) = \int_{\mathbb{R}^n_+} \pi(\beta \mid w, y) \, \pi(w \mid \beta', y) \, dw. \qquad (2)$$
Of course, we do not have to perform the integration in (2), but we do need to be able to simulate random vectors with density $k(\cdot \mid \beta')$, and this is exactly what the two-step procedure described in the Introduction does. Note that $k : \mathbb{R}^p \times \mathbb{R}^p \rightarrow (0, \infty)$; that is, $k$ is strictly positive. It follows immediately that the Markov chain $\Phi$ is irreducible, aperiodic and Harris recurrent (see, e.g., Hobert, 2011).

3 Uniform Ergodicity

For $m \in \{1, 2, 3, \ldots\}$, let $k^m : \mathbb{R}^p \times \mathbb{R}^p \rightarrow (0, \infty)$ denote the $m$-step Mtd of PS&W's Markov chain, where $k^1 \equiv k$. These densities are defined inductively as follows:
$$k^m\big( \beta \mid \beta' \big) = \int_{\mathbb{R}^p} k^{m-1}\big( \beta \mid \tilde{\beta} \big) \, k\big( \tilde{\beta} \mid \beta' \big) \, d\tilde{\beta}.$$
The conditional density of $\beta^{(m)}$ given $\beta^{(0)} = \beta'$ is precisely $k^m(\cdot \mid \beta')$. The Markov chain $\Phi$ is geometrically ergodic if there exist $M : \mathbb{R}^p \rightarrow [0, \infty)$ and $\rho \in [0, 1)$ such that, for all $m$,
$$\frac{1}{2} \int_{\mathbb{R}^p} \big| k^m\big( \beta \mid \beta' \big) - \pi(\beta \mid y) \big| \, d\beta \leq M(\beta') \, \rho^m. \qquad (3)$$
The quantity on the left-hand side of (3) is the total variation distance between the posterior distribution and the distribution of $\beta^{(m)}$ (given $\beta^{(0)} = \beta'$). If the function $M(\cdot)$ is bounded above, then the chain is uniformly ergodic. One method of establishing uniform ergodicity is to construct a minorization condition that holds on the entire state space. In particular, if we can find a $\delta > 0$ and a density function $h : \mathbb{R}^p \rightarrow [0, \infty)$ such that, for all $\beta, \beta' \in \mathbb{R}^p$,
$$k\big( \beta \mid \beta' \big) \geq \delta \, h(\beta), \qquad (4)$$
then the chain is uniformly ergodic. In fact, if (4) holds, then (3) holds with $M \equiv 1$ and $\rho = 1 - \delta$ (see, e.g., Jones and Hobert, 2001, Section 3.2).
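The practical content of a uniform minorization is quantitative: with $M \equiv 1$ and $\rho = 1 - \delta$, the bound (3) says that $m \geq \log \varepsilon / \log(1 - \delta)$ iterations force the total variation distance below $\varepsilon$. A tiny helper (our own illustrative code, not from the paper) makes this explicit:

```python
import math

def iterations_for_accuracy(delta, eps):
    """Smallest m with (1 - delta)^m <= eps, from the bound (3)
    specialized to M = 1 and rho = 1 - delta."""
    if not (0.0 < delta <= 1.0 and 0.0 < eps < 1.0):
        raise ValueError("need 0 < delta <= 1 and 0 < eps < 1")
    if delta == 1.0:
        return 1
    return math.ceil(math.log(eps) / math.log(1.0 - delta))

# e.g. a minorization constant of 0.1 and a target accuracy of 0.01
print(iterations_for_accuracy(0.1, 0.01))
```

As Remark 1 below notes, in realistic problems $\delta$ is often so small that this bound is astronomically conservative; its value is qualitative (existence of CLTs), not as a stopping rule.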

In order to state our main result, we need a bit more notation. Let $\phi(z; \mu, V)$ denote the multivariate normal density with mean $\mu$ and covariance matrix $V$ evaluated at the point $z$. Recall that the prior on $\beta$ is $N_p(b, B)$, and define
$$s = B^{1/2} \Big( X^T \Big( y - \frac{1}{2} 1_n \Big) + B^{-1} b \Big).$$
Here is our main result.

Proposition 1. The Mtd of $\Phi$ satisfies the following minorization condition:
$$k\big( \beta \mid \beta' \big) \geq \delta \, \phi(\beta; m^*, \Sigma^*),$$
where $m^* = \big( \frac{1}{2} X^T X + B^{-1} \big)^{-1} B^{-1/2} s$, $\Sigma^* = \big( \frac{1}{2} X^T X + B^{-1} \big)^{-1}$, and
$$\delta = \frac{|\Sigma^*|^{1/2}}{|B|^{1/2}} \exp\Big\{ \frac{1}{2} m^{*T} \Sigma^{*-1} m^* \Big\} \frac{e^{-\frac{n}{4}}}{2^n} \exp\Big\{ -\frac{1}{2} s^T s \Big\}.$$
Hence, PS&W's Markov chain is uniformly ergodic.

Remark 1. As discussed in the Introduction, uniform ergodicity guarantees the existence of central limit theorems and consistent estimators of the associated asymptotic variances, and these can be used to make an informed decision about how long the simulation should be run. While it is true that, in many cases, $\delta$ will be so close to zero that the bound (3) (with $M \equiv 1$ and $\rho = 1 - \delta$) will not be useful for getting a handle on the total variation distance, this does not detract from the usefulness of the result. Moreover, it is certainly possible that a different analysis could yield a different minorization condition with a larger $\delta$.

The following easily established lemmas will be used in the proof of Proposition 1.

Lemma 1. If $A$ is a symmetric nonnegative definite matrix, then all of the eigenvalues of $(I + A)^{-1}$ are in $(0, 1]$, and $I - (I + A)^{-1}$ is also nonnegative definite.

Lemma 2. For $a, b \in \mathbb{R}$, $\cosh(a + b) \leq 2 \cosh(a) \cosh(b)$.

Proof of Proposition 1. Recall that $\Sigma = \Sigma(w) = \big( X^T \Omega(w) X + B^{-1} \big)^{-1}$ and $m = m(w) = \Sigma(w) \big( X^T (y - \frac{1}{2} 1_n) + B^{-1} b \big)$. We begin by showing that $\Sigma \preceq B$. Indeed,
$$\Sigma = \big( X^T \Omega X + B^{-1} \big)^{-1} = \Big( B^{-1/2} \big( B^{1/2} X^T \Omega X B^{1/2} + I \big) B^{-1/2} \Big)^{-1} = B^{1/2} \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} B^{1/2},$$

where $\tilde{X} = X B^{1/2}$. Now, since $\tilde{X}^T \Omega \tilde{X}$ is nonnegative definite, Lemma 1 implies that $\Sigma \preceq B$, and the result follows. Next, we show that $m^T \Sigma^{-1} m \leq s^T s$. Letting $l = y - \frac{1}{2} 1_n$, we have
$$\begin{aligned}
m^T \Sigma^{-1} m &= (X^T l + B^{-1} b)^T (X^T \Omega X + B^{-1})^{-1} (X^T l + B^{-1} b) \\
&= (X^T l + B^{-1} b)^T B^{1/2} \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} B^{1/2} (X^T l + B^{-1} b) \\
&= (\tilde{X}^T l + B^{-1/2} b)^T \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} (\tilde{X}^T l + B^{-1/2} b) \\
&\leq (\tilde{X}^T l + B^{-1/2} b)^T (\tilde{X}^T l + B^{-1/2} b) = s^T s,
\end{aligned}$$
where the inequality follows from Lemma 1. Using these two inequalities, we have
$$\begin{aligned}
\pi(\beta \mid w, y) &= (2\pi)^{-\frac{p}{2}} |\Sigma|^{-\frac{1}{2}} \exp\Big\{ -\frac{1}{2} (\beta - m)^T \Sigma^{-1} (\beta - m) \Big\} \\
&= (2\pi)^{-\frac{p}{2}} |\Sigma|^{-\frac{1}{2}} \exp\Big\{ -\frac{1}{2} \beta^T (X^T \Omega X) \beta - \frac{1}{2} \beta^T B^{-1} \beta - \frac{1}{2} m^T \Sigma^{-1} m + m^T \Sigma^{-1} \beta \Big\} \\
&\geq (2\pi)^{-\frac{p}{2}} |B|^{-\frac{1}{2}} \exp\Big\{ -\frac{1}{2} \beta^T (X^T \Omega X) \beta - \frac{1}{2} \beta^T B^{-1} \beta - \frac{1}{2} s^T s + m^T \Sigma^{-1} \beta \Big\} \\
&= (2\pi)^{-\frac{p}{2}} |B|^{-\frac{1}{2}} \exp\Big\{ -\frac{1}{2} \beta^T B^{-1} \beta - \frac{1}{2} s^T s + s^T B^{-1/2} \beta \Big\} \exp\Big\{ -\sum_{i=1}^n \frac{w_i (x_i^T \beta)^2}{2} \Big\}.
\end{aligned}$$
Now since
$$\pi(w \mid \beta', y) = \prod_{i=1}^n \Big[ \cosh\Big( \frac{x_i^T \beta'}{2} \Big) \exp\Big\{ -\frac{(x_i^T \beta')^2 w_i}{2} \Big\} g(w_i) \Big],$$
it follows that
$$\pi(\beta \mid w, y) \, \pi(w \mid \beta', y) \geq (2\pi)^{-\frac{p}{2}} |B|^{-\frac{1}{2}} \exp\Big\{ -\frac{1}{2} \beta^T B^{-1} \beta - \frac{1}{2} s^T s + s^T B^{-1/2} \beta \Big\} \prod_{i=1}^n \Big[ \cosh\Big( \frac{x_i^T \beta'}{2} \Big) \exp\Big\{ -\frac{(x_i^T \beta)^2 + (x_i^T \beta')^2}{2} w_i \Big\} g(w_i) \Big].$$
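The two matrix inequalities just established, $\Sigma(w) \preceq B$ and $m^T \Sigma^{-1} m \leq s^T s$, hold for every $w \in \mathbb{R}^n_+$; they are easy to sanity-check numerically on a random instance (a sketch with invented dimensions and data):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 6, 3
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, size=n).astype(float)
b = rng.standard_normal(p)
A = rng.standard_normal((p, p))
B = A @ A.T + np.eye(p)                      # a valid prior covariance
w = rng.exponential(size=n)                  # an arbitrary point in R_+^n

Binv = np.linalg.inv(B)
Sigma = np.linalg.inv(X.T @ (w[:, None] * X) + Binv)
m = Sigma @ (X.T @ (y - 0.5) + Binv @ b)

# Sigma <= B in the positive semidefinite ordering
assert np.all(np.linalg.eigvalsh(B - Sigma) > -1e-10)

# m' Sigma^{-1} m <= s' s, with s = B^{1/2} (X'(y - 0.5 1_n) + B^{-1} b)
evals, evecs = np.linalg.eigh(B)
B_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
s = B_half @ (X.T @ (y - 0.5) + Binv @ b)
quad = m @ np.linalg.solve(Sigma, m)
assert quad <= s @ s + 1e-10
print(quad, s @ s)
```

Such checks are no substitute for the proof, but they are a cheap guard against transcription errors in the matrix algebra.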

Recall that $k(\beta \mid \beta') = \int_{\mathbb{R}^n_+} \pi(\beta \mid w, y) \, \pi(w \mid \beta', y) \, dw$. To this end, note that
$$\begin{aligned}
\int_{\mathbb{R}_+} \exp\Big\{ -\frac{(x_i^T \beta)^2 + (x_i^T \beta')^2}{2} w_i \Big\} g(w_i) \, dw_i &= \cosh^{-1}\Big( \frac{1}{2} \sqrt{(x_i^T \beta)^2 + (x_i^T \beta')^2} \Big) \\
&\geq \cosh^{-1}\Big( \frac{|x_i^T \beta|}{2} + \frac{|x_i^T \beta'|}{2} \Big) \\
&\geq \frac{1}{2} \cosh^{-1}\Big( \frac{x_i^T \beta}{2} \Big) \cosh^{-1}\Big( \frac{x_i^T \beta'}{2} \Big),
\end{aligned}$$
where the first inequality is due to the fact that $\sqrt{a + b} \leq \sqrt{a} + \sqrt{b}$ for non-negative $a, b$, and the second inequality is by Lemma 2. It follows that
$$\int_{\mathbb{R}^n_+} \prod_{i=1}^n \Big[ \cosh\Big( \frac{x_i^T \beta'}{2} \Big) \exp\Big\{ -\frac{(x_i^T \beta)^2 + (x_i^T \beta')^2}{2} w_i \Big\} g(w_i) \Big] dw \geq 2^{-n} \prod_{i=1}^n \cosh^{-1}\Big( \frac{x_i^T \beta}{2} \Big).$$
Moreover,
$$\prod_{i=1}^n \cosh^{-1}\Big( \frac{x_i^T \beta}{2} \Big) \geq \prod_{i=1}^n \exp\Big\{ -\frac{|x_i^T \beta|}{2} \Big\} \geq \prod_{i=1}^n \exp\Big\{ -\frac{(x_i^T \beta)^2 + 1}{4} \Big\} = e^{-\frac{n}{4}} \exp\Big\{ -\frac{1}{4} \beta^T X^T X \beta \Big\},$$
where the second inequality holds because $|a| \leq (a^2 + 1)/2$ for any real $a$. Putting all of this together, we have
$$\begin{aligned}
k(\beta \mid \beta') &= \int_{\mathbb{R}^n_+} \pi(\beta \mid w, y) \, \pi(w \mid \beta', y) \, dw \\
&\geq (2\pi)^{-\frac{p}{2}} |B|^{-\frac{1}{2}} \exp\Big\{ -\frac{1}{2} \beta^T B^{-1} \beta - \frac{1}{2} s^T s + s^T B^{-1/2} \beta \Big\} \, 2^{-n} e^{-\frac{n}{4}} \exp\Big\{ -\frac{1}{4} \beta^T X^T X \beta \Big\} \\
&= (2\pi)^{-\frac{p}{2}} |B|^{-\frac{1}{2}} \exp\Big\{ -\frac{1}{2} \beta^T \Big( \frac{1}{2} X^T X + B^{-1} \Big) \beta + s^T B^{-1/2} \beta \Big\} \, 2^{-n} e^{-\frac{n}{4}} \exp\Big\{ -\frac{1}{2} s^T s \Big\} \\
&= (2\pi)^{-\frac{p}{2}} |\Sigma^*|^{-\frac{1}{2}} \exp\Big\{ -\frac{1}{2} (\beta - m^*)^T \Sigma^{*-1} (\beta - m^*) \Big\} \frac{|\Sigma^*|^{1/2}}{|B|^{1/2}} \exp\Big\{ \frac{1}{2} m^{*T} \Sigma^{*-1} m^* \Big\} 2^{-n} e^{-\frac{n}{4}} \exp\Big\{ -\frac{1}{2} s^T s \Big\} \\
&= \delta \, \phi(\beta; m^*, \Sigma^*),
\end{aligned}$$

and the proof is complete. $\square$

Remark 2. It is worth pointing out that our proof circumvents the need to deal directly with the unwieldy density of the PG(1, 0) distribution, which is given in (1).

The utility of the Polya-Gamma latent data strategy extends far beyond the basic logistic regression model. Indeed, PS&W use this technique to develop useful Gibbs samplers for a variety of Bayesian hierarchical models in which a logit link is employed. These include Bayesian logistic regression with random effects and negative-binomial regression. Of course, given the results herein, it is natural to ask whether these other Gibbs samplers are also uniformly ergodic, and to what extent our methods could be used to answer this question. We note, however, that the Mtd of the Polya-Gamma Gibbs sampler is relatively simple. For example, the Gibbs sampler for the mixed effects model has more than two steps, so its Mtd is significantly more complex than that of the Polya-Gamma Gibbs sampler. Thus, the development of (uniform) minorization conditions for the other Gibbs samplers would likely entail more than just a straightforward extension of the proof of Proposition 1. On the other hand, it is possible that some of the inequalities in that proof could be recycled.

Finally, we establish that the posterior distribution of $\beta$ given the data has a moment generating function.

Proposition 2. For any fixed $t \in \mathbb{R}^p$,
$$\int_{\mathbb{R}^p} e^{\beta^T t} \, \pi(\beta \mid y) \, d\beta < \infty.$$
Hence, the moment generating function of the posterior distribution exists.

Remark 3. Chen and Shao (2000) develop results for Bayesian logistic regression with a flat (improper) prior on $\beta$. In particular, these authors provide conditions on $X$ and $y$ that guarantee the existence of the mgf of the posterior of $\beta$ given the data. Note that our result holds for all $X$ and $y$.

Proof. Recall that
$$\int \pi(\beta, w \mid y) \, dw = \pi(\beta \mid y),$$

where
$$\pi(\beta, w \mid y) = \bigg[ \prod_{i=1}^n \Pr(Y_i = y_i \mid \beta) \bigg] f(w \mid \beta) \, \frac{\pi(\beta)}{c(y)}.$$
Hence, it suffices to show that $\int_{\mathbb{R}^n_+} \int_{\mathbb{R}^p} e^{\beta^T t} \, \pi(\beta, w \mid y) \, d\beta \, dw < \infty$. Straightforward calculations similar to those done in Section 2 show that
$$\pi(\beta, w \mid y) = \phi(\beta; m, \Sigma) \, \frac{2^{-n}}{c(y)} \frac{|\Sigma|^{1/2}}{|B|^{1/2}} \exp\Big\{ -\frac{1}{2} b^T B^{-1} b \Big\} \exp\Big\{ \frac{1}{2} m^T \Sigma^{-1} m \Big\} \prod_{i=1}^n g(w_i).$$
Then, since $\Sigma \preceq B$ and $m^T \Sigma^{-1} m \leq s^T s$, we have
$$\pi(\beta, w \mid y) \leq \phi(\beta; m, \Sigma) \, \frac{2^{-n}}{c(y)} \exp\Big\{ -\frac{1}{2} b^T B^{-1} b \Big\} \exp\Big\{ \frac{1}{2} s^T s \Big\} \prod_{i=1}^n g(w_i).$$
Now, using the formula for the multivariate normal mgf, it suffices to show that
$$\int_{\mathbb{R}^n_+} \int_{\mathbb{R}^p} e^{\beta^T t} \, \phi(\beta; m, \Sigma) \bigg[ \prod_{i=1}^n g(w_i) \bigg] d\beta \, dw = \int_{\mathbb{R}^n_+} \exp\Big\{ m^T t + \frac{1}{2} t^T \Sigma t \Big\} \bigg[ \prod_{i=1}^n g(w_i) \bigg] dw < \infty.$$
We establish this by demonstrating that $m^T t + \frac{1}{2} t^T \Sigma t$ is uniformly bounded in $w$. For a matrix $A$, define $\|A\| = \sup_{\|x\| = 1} \|Ax\|$. Of course, if $A$ is non-negative definite, then $\|A\|$ is equal to the largest eigenvalue of $A$. Now, using Cauchy-Schwarz and properties of the norm, we have
$$m^T t \leq \|m\| \|t\| = \big\| B^{1/2} B^{-1/2} m \big\| \|t\| \leq \big\| B^{1/2} \big\| \big\| B^{-1/2} m \big\| \|t\|.$$
Now since
$$B^{-1/2} m = \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} \Big( \tilde{X}^T \Big( y - \frac{1}{2} 1_n \Big) + B^{-1/2} b \Big),$$
we have
$$\begin{aligned}
\big\| B^{-1/2} m \big\| &= \Big\| \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} \Big( \tilde{X}^T \Big( y - \frac{1}{2} 1_n \Big) + B^{-1/2} b \Big) \Big\| \\
&\leq \Big\| \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} \tilde{X}^T \Big( y - \frac{1}{2} 1_n \Big) \Big\| + \Big\| \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} B^{-1/2} b \Big\| \\
&\leq \Big\| \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} \Big\| \Big\| \tilde{X}^T \Big( y - \frac{1}{2} 1_n \Big) \Big\| + \Big\| \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} \Big\| \big\| B^{-1/2} b \big\| \\
&\leq \Big\| \tilde{X}^T \Big( y - \frac{1}{2} 1_n \Big) \Big\| + \big\| B^{-1/2} b \big\|,
\end{aligned}$$

where the first inequality is due to the fact that $\|a + b\| \leq \|a\| + \|b\|$ for any vectors $a$ and $b$, and the third inequality is due to the fact that $\big\| \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} \big\| \leq 1$ (Lemma 1). Hence, $m^T t$ is uniformly bounded in $w$. Finally, another application of Lemma 1 yields
$$t^T \Sigma t = t^T \big( X^T \Omega X + B^{-1} \big)^{-1} t = t^T B^{1/2} \big( \tilde{X}^T \Omega \tilde{X} + I \big)^{-1} B^{1/2} t \leq t^T B t,$$
so $t^T \Sigma t$ is also uniformly bounded in $w$, and the result follows. $\square$

Acknowledgment. The second author was supported by NSF Grant DMS-11-06395.

References

ALBERT, J. H. and CHIB, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88 669-679.

BIANE, P., PITMAN, J. and YOR, M. (2001). Probability laws related to the Jacobi theta and Riemann zeta functions, and Brownian excursions. Bulletin of the American Mathematical Society, 38 435-465.

CHEN, M.-H. and SHAO, Q.-M. (2000). Propriety of posterior distribution for dichotomous quantal response models. Proceedings of the American Mathematical Society, 129 293-302.

FLEGAL, J. M., HARAN, M. and JONES, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23 250-260.

FRÜHWIRTH-SCHNATTER, S. and FRÜHWIRTH, R. (2010). Data augmentation and MCMC for binary and multinomial logit models. In Statistical Modelling and Regression Structures. Springer-Verlag, 111-132.

HOBERT, J. P. (2011). The data augmentation algorithm: Theory and methodology. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. Jones and X.-L. Meng, eds.). Chapman & Hall/CRC Press.

HOLMES, C. and HELD, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1 145-168.

JONES, G. L. and HOBERT, J. P. (2001). Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statistical Science, 16 312-334.

MARCHEV, D. (2011). Markov chain Monte Carlo algorithms for the Bayesian logistic regression model. In Proceedings of the Annual International Conference on Operations Research and Statistics. 154-159.

POLSON, N. G., SCOTT, J. G. and WINDLE, J. (2013). Bayesian inference for logistic models using Polya-Gamma latent variables. Journal of the American Statistical Association (to appear). ArXiv:1205.0310v3.

ROBERTS, G. O. and ROSENTHAL, J. S. (2004). General state space Markov chains and MCMC algorithms. Probability Surveys, 1 20-71.

ROY, V. and HOBERT, J. P. (2007). Convergence rates and asymptotic standard errors for Markov chain Monte Carlo algorithms for Bayesian probit regression. Journal of the Royal Statistical Society, Series B, 69 607-623.

TIERNEY, L. (1994). Markov chains for exploring posterior distributions (with discussion). The Annals of Statistics, 22 1701-1762.