BAYESIAN ANALYSIS OF BINARY REGRESSION USING SYMMETRIC AND ASYMMETRIC LINKS


Sankhyā: The Indian Journal of Statistics, 2000, Volume 62, Series B, Pt. 3, pp. 373-387

BAYESIAN ANALYSIS OF BINARY REGRESSION USING SYMMETRIC AND ASYMMETRIC LINKS

By SANJIB BASU, Northern Illinois University, DeKalb, USA

and SAURABH MUKHOPADHYAY, Merck Research Laboratories, Rahway, USA

SUMMARY. Binary response regression is a useful technique for analyzing categorical data. Popular binary models use special link functions such as the logit or the probit link. In this article, the inverse link function $H$ is modeled as a scale mixture of cumulative distribution functions. Two different models for $H$ are proposed: (i) $H$ is a finite normal scale mixture with a Dirichlet distribution prior on the mixing distribution; and (ii) $H$ is a scale mixture of truncated normal distributions with the mixing distribution having a Dirichlet prior. The second model allows symmetric as well as asymmetric links. Bayesian analyses of these models using data augmentation and Gibbs sampling are described. Model diagnostics by cross validation of the conditional predictive distributions are proposed. These analyses are illustrated on the beetle mortality data and the Challenger o-ring distress data.

1. Introduction

Consider the binary regression model

$$P(Y_i = 1) = H(x_i^T \beta), \qquad i = 1, \ldots, N. \eqno(1)$$

Here the binary response $y_i$ is either 0 or 1, $x_i^T = (x_{1i}, \ldots, x_{ki})$ is the set of covariates, $\beta = (\beta_1, \ldots, \beta_k)^T$ is a vector of unknown parameters, and the function $H$ is usually assumed to be known. In the terminology of generalized linear models (McCullagh and Nelder (1989)), $H$ is the inverse link function. For ease of exposition, we refer to $H$ as the link function in this article. The popular probit and logit models are obtained when $H$ is chosen as the standard normal cdf $\Phi$ or the cdf of the standard logistic distribution, respectively. Such particular choices of $H$ are often made for convenience and on an ad hoc basis.

Paper received November 1998; revised November .
AMS (1991) subject classification. 62F15, 62J12.
Key words and phrases. Asymmetric link, binary data, cross-validation, Dirichlet distribution, Gibbs sampling, normal scale mixture, predictive distribution.

Binary regression with a parametric family of link functions (instead of a single fixed choice) has been explored by many; see Prentice (1976), Aranda-Ordaz (1981), Guerrero and Johnson (1982), and Stukel (1988). Their work shows that such extended models can significantly improve fits. Notice that (1) requires the range of $H$ to be $[0, 1]$; usually it is also preferable that $H$ be a nondecreasing smooth function. These requirements match exactly those of a smooth continuous cumulative distribution function (cdf). Recently, there has been strong interest in Bayesian analysis of binary and polychotomous response regression with the class (or a subclass) of cdfs as choices for the function $H$; see Albert and Chib (1993), Chen and Dey (1996) and the references therein.

Let $\mathcal{F}$ be the class of all cdfs on $\mathbb{R}$. The family $\mathcal{F}$ includes cdfs which are often undesirable as choices for $H$, for example, cdfs of discrete distributions. Instead, we consider the subclass of normal scale mixture cdfs

$$\mathcal{F}_N = \Big\{ F_N(\cdot) = \int_{[0,\infty)} \Phi(\cdot/\sigma)\, dG(\sigma) : G \text{ is a cdf on } [0,\infty) \Big\}$$

as possible choices for the function $H$. The class of normal scale mixtures allows a variety of functional forms and varying tail structures (including the normal, all $t$ distributions, the logistic, the double exponential, and the Cauchy), thus presenting us with a wide array of choices for the link $H$. Moreover, a normal scale mixture cdf $F_N$ is continuous, smooth, infinitely differentiable and symmetric ($F_N(\theta) = 1 - F_N(-\theta)$). The cdf $F_N$ also does not have a wiggly structure; indeed, $F_N$ is convex on $(-\infty, 0)$ and concave on $(0, \infty)$. For $H(\cdot) = F_N(\cdot) = \int \Phi(\cdot/\sigma)\, dG(\sigma)$, our binary regression model (1) becomes

$$P(Y_i = 1) = \int \Phi(\{x_i^T \beta\}/\sigma)\, dG(\sigma), \qquad i = 1, \ldots, N, \eqno(2)$$

which includes two unknowns, $\beta$ and the mixing distribution $G$. These two unknowns are related since the interpretation of the regression coefficient $\beta$ depends on the form of the link function $H(\cdot)$ and hence on $G$. In our prior specification, however, we typically use the improper non-subjective prior $\pi_1(\beta) \propto 1$ independent of $G$. This specification reflects our complete uncertainty about the value of $\beta$ irrespective of the form of the link function. For the mixing distribution $G$, we use an independent prior $\pi_2(G)$. The posterior distribution $\pi(\beta, G \mid y)$, which combines the prior $\pi(\beta, G)$ and the sampling model of (2), is analytically intractable. Albert and Chib (1993) described a data-augmented Gibbs sampling methodology for probit and $t$-link models. We extend this algorithm to our case of normal scale mixture links.

In section 2, we consider finite normal scale mixture links. The mixing distribution $G$ is of the form $\sum_{j=1}^{s} p_j \delta_{\{\tau_j\}}$ where $0 \le p_j \le 1$, $\sum_j p_j = 1$, and $\delta_{\{\tau_j\}}$ is the degenerate distribution at $\tau_j$. We assume that the support points $0 < \tau_1 < \ldots < \tau_s < \infty$ are user-specified. The resulting link function is $H(\cdot) = \sum_{j=1}^{s} p_j \Phi(\cdot/\tau_j)$, a finite scale mixture of normal cdfs. We assume a Dirichlet distribution ($DD(\nu)$) prior on $p = (p_1, \ldots, p_s)^T$, i.e., $\pi(G) = \pi(p) = \text{constant} \times \prod_{j=1}^{s} p_j^{\nu_j - 1}$, where the $\nu_j > 0$ are user-specified.
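To make the finite-mixture link concrete, the short sketch below evaluates $H(\eta) = \sum_j p_j \Phi(\eta/\tau_j)$ and the resulting Bernoulli log-likelihood for a candidate $(\beta, p)$. It is only an illustration of the definitions above, assuming numpy/scipy; the data arrays, the support grid tau and the function names are placeholders, not part of the paper.

```python
import numpy as np
from scipy.stats import norm

def mixture_link(eta, p, tau):
    """Finite normal scale-mixture link H(eta) = sum_j p_j * Phi(eta / tau_j)."""
    eta = np.atleast_1d(eta)
    cdf_grid = norm.cdf(eta[:, None] / tau[None, :])   # shape (len(eta), s)
    return cdf_grid @ p

def bernoulli_loglik(y, X, beta, p, tau):
    """Log-likelihood of binary y under P(Y_i = 1) = H(x_i' beta)."""
    theta = mixture_link(X @ beta, p, tau)
    theta = np.clip(theta, 1e-12, 1 - 1e-12)            # guard against log(0)
    return np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

# toy usage with hypothetical data and support grid
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
beta = np.array([0.3, 1.2])
tau = np.array([0.5, 1.0, 2.0, 4.0])                     # user-specified support points
p = np.full(4, 0.25)                                     # mixing probabilities
y = rng.binomial(1, mixture_link(X @ beta, p, tau))
print(bernoulli_loglik(y, X, beta, p, tau))
```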

The normal scale mixture link of (2) always produces a symmetric link. We develop a new family of distributions based on mixtures of truncated normals that naturally contains symmetric and asymmetric distributions. Use of truncated normal mixtures as the link $H$ introduces many methodological and computational challenges; these are discussed in sections 3 and 4. Model checking is an integral part of any statistical analysis. We consider several cross validation model checking criteria and develop easy methods for their calculation in section 5. In section 6, we apply our proposed models to the beetle mortality data and the Challenger data of Dalal, Fowlkes and Hoadley (1989). Conclusions are given in section 7.

2. Finite Normal Scale Mixture Links

We observe $(y_i, x_i)$, $i = 1, \ldots, N$, where $y_i$ is binary and $x_i$ is a set of covariates which are either continuous or categorical. We assume that the $Y_i$'s are independent Bernoulli($\theta_i$) with $\theta_i = \int \Phi(\{x_i^T \beta\}/\sigma)\, dG(\sigma)$ as in (2). We take $G = \sum_{j=1}^{s} p_j \delta_{\{\tau_j\}}$ and put a Dirichlet distribution prior $\pi_2(G) = DD(\nu)$ on $p = (p_1, \ldots, p_s)^T$. We further assume an independent prior $\pi_1(\beta)$ on $\beta$. The posterior distribution for this model is analytically intractable. We use Gibbs sampling, which is an extension of the sampler proposed by Albert and Chib (1993). This sampler introduces two sets of latent variables $Z = (Z_1, \ldots, Z_N)^T$ and $\sigma = (\sigma_1, \ldots, \sigma_N)^T$. The complete model structure along with the distributions of the latent variables is given below:

(a) Given $\beta$ and $\sigma$, the latent variables $Z_1, \ldots, Z_N$ are independent with $Z_i \sim N(x_i^T \beta, \sigma_i^2)$.

(b) Given $Z$, the responses $Y_1, \ldots, Y_N$ are completely determined, with $Y_i = 1$ if $Z_i > 0$ and $Y_i = 0$ otherwise.

(c) Given $G$, the latent variables $\sigma_1, \ldots, \sigma_N$ are i.i.d. $G$.

(d) $G = \sum_{j=1}^{s} p_j \delta_{\{\tau_j\}}$, and $p = (p_1, \ldots, p_s)^T$ has a Dirichlet distribution prior $\pi_2(p) = DD(\nu)$.

(e) $\beta$ is independent of $G$ and has a prior $\pi_1(\beta)$.

From (a) and (b), $P(Y_i = 1 \mid \sigma, \beta, G) = P(Z_i > 0 \mid \sigma, \beta, G) = \Phi(\{x_i^T \beta\}/\sigma_i)$. Integrated over $\sigma_i$ (from (c)), $P(Y_i = 1 \mid \beta, G) = \int \Phi(\{x_i^T \beta\}/\sigma_i)\, dG(\sigma_i)$, thus giving back our model of (2).

To implement the Gibbs sampler, one needs to simulate from the full conditional distribution of each unobserved variable given the observed $y$ and the remaining variables. These distributions are described next.

(i) Given $y, \beta, \sigma$ and $G$: $Z_1, \ldots, Z_N$ are independent with $Z_i$ distributed as $N(x_i^T \beta, \sigma_i^2)$ truncated at left by 0 if $y_i = 1$, and truncated at right by 0 if $y_i = 0$.

In (ii)-(iv) below, we assume that the given $y$ and $z$ satisfy $(y_i - \tfrac{1}{2})\, z_i > 0$, $i = 1, \ldots, N$.

(ii) Given $y, \beta, z$ and $G$: $\sigma_1, \ldots, \sigma_N$ are independent with $\sigma_i \sim \sum_{j=1}^{s} q_{ij} \delta_{\{\tau_j\}}$, where $q_{ij} = \{\frac{p_j}{\tau_j} \phi(\{z_i - x_i^T \beta\}/\tau_j)\} / \{\sum_{k=1}^{s} \frac{p_k}{\tau_k} \phi(\{z_i - x_i^T \beta\}/\tau_k)\}$ and $\phi(\cdot)$ is the $N(0,1)$ density function.

(iii) Notice that $\sigma_i$ belongs to the set $\{\tau_1, \ldots, \tau_s\}$ with probability 1. Given $y, \beta, z, \sigma$, let $k_j$ be the number of $\sigma_i$ which equal $\tau_j$. Then $G \mid y, \beta, z, \sigma = \sum_{j=1}^{s} p_j \delta_{\{\tau_j\}}$, where $p = (p_1, \ldots, p_s)^T$ has a Dirichlet distribution $DD(\nu^*)$ with $\nu^*_j = \nu_j + k_j$, $j = 1, \ldots, s$.

(iv) If we assume the customary diffuse prior $\pi_1(\beta) \propto 1$, then $\beta \mid y, z, \sigma, G \sim N_k(\hat{\beta}, (X^T W X)^{-1})$, where $\hat{\beta} = (X^T W X)^{-1} X^T W z$, $W = \mathrm{diag}(1/\sigma_i^2)$, and $X = [x_1, \ldots, x_N]^T$ is the design matrix (we assume $\mathrm{rank}(X) = k$). This follows immediately from Bayesian linear model theory.

Introduction of the latent variables $Z$ and $\sigma$ substantially simplifies this calculation and enables us to obtain the conditional densities in closed forms. Simulation from each of the distributions in (i)-(iv) is relatively easy.

The above model requires two user inputs in the prior structure at step (d): the support set $\tau = (\tau_1, \ldots, \tau_s)$ for the $\sigma_i$'s and the Dirichlet distribution parameter $\nu = (\nu_1, \ldots, \nu_s)$. In the examples, we often choose equally spaced values for the $\tau_j$'s with some small and some large values, and choose equiprobable $\nu_1 = \ldots = \nu_s$ in the absence of other information. One full cycle of the resulting Gibbs sampler is sketched below.
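The following sketch implements one pass through the full conditionals (i)-(iv), assuming the flat prior $\pi_1(\beta) \propto 1$ and only numpy/scipy; the function name gibbs_cycle and the argument layout are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm, truncnorm

def gibbs_cycle(y, X, beta, sigma, p, tau, nu, rng):
    """One cycle of the data-augmented Gibbs sampler for the finite-mixture link."""
    N, k = X.shape
    mean = X @ beta

    # (i) Z_i ~ N(x_i'beta, sigma_i^2), truncated left at 0 if y_i = 1, right at 0 if y_i = 0
    a = np.where(y == 1, (0.0 - mean) / sigma, -np.inf)
    b = np.where(y == 1, np.inf, (0.0 - mean) / sigma)
    z = truncnorm.rvs(a, b, loc=mean, scale=sigma, random_state=rng)

    # (ii) sigma_i drawn from {tau_1,...,tau_s} with weights q_ij ∝ (p_j/tau_j) phi((z_i - x_i'beta)/tau_j)
    resid = z - mean
    w = (p / tau) * norm.pdf(resid[:, None] / tau[None, :])
    w /= w.sum(axis=1, keepdims=True)
    idx = np.array([rng.choice(len(tau), p=w_i) for w_i in w])
    sigma = tau[idx]

    # (iii) p | ... ~ Dirichlet(nu_j + k_j), k_j = #{i : sigma_i = tau_j}
    counts = np.bincount(idx, minlength=len(tau))
    p = rng.dirichlet(nu + counts)

    # (iv) beta | ... ~ N_k(beta_hat, (X'WX)^{-1}) with W = diag(1/sigma_i^2)
    W = 1.0 / sigma**2
    XtWX = X.T @ (X * W[:, None])
    cov = np.linalg.inv(XtWX)
    beta_hat = cov @ (X.T @ (W * z))
    beta = rng.multivariate_normal(beta_hat, cov)

    return beta, sigma, p, z
```

Repeated application of gibbs_cycle, after a suitable burn-in, yields approximate draws from the joint posterior of $(\beta, p)$.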

3. Asymmetric Links

Aranda-Ordaz (1981), Stukel (1988), and Agresti (1990) describe data where asymmetric links produce significantly better fits than symmetric links. Normal scale mixtures, however, always produce symmetric links due to the symmetry of the normal distribution. If we want to keep the normal mixture structure, one possible way to generate asymmetric links is to consider both location and scale mixtures of normals. Location-scale mixtures of normals are extremely rich; in fact, they contain all densities on the real line in their weak convergence closure (see Lo (1984)). They thus also include multiple-spiked and multimodal densities. If we use only the data to choose a single link function from this class (for example, through a maximum likelihood procedure), a multiple-spiked density will probably be the best choice. The situation is reminiscent of the density estimation scenario where, without any smoothness restriction, a density with spikes at the data points is the maximum likelihood estimate. One way to avoid this problem is to choose an appropriate prior that makes the choice of these undesirable functions less probable a priori. We take an alternate route where we start from a smaller class of functions not containing the undesirable functions. This class is described next.

We consider normal distributions which are truncated either at left or at right, and then consider their scale mixtures. Define a cdf $F(z, \sigma)$ as follows: (i) for $\sigma > 0$, $F(z, \sigma) = 0$ if $z < 0$ and $= 2\Phi(z/\sigma) - 1$ if $z \ge 0$; (ii) for $\sigma < 0$, $F(z, \sigma) = 2\Phi(z/|\sigma|)$ if $z < 0$ and $= 1$ if $z \ge 0$; and (iii) for $\sigma = 0$, $F(z, \sigma) = 0$ if $z < 0$ and $= 1$ if $z \ge 0$. The corresponding density is $f(z, \sigma) = \frac{2}{|\sigma|}\phi(z/|\sigma|)$ if $\sigma z > 0$ and $= 0$ otherwise. Notice that $f(\cdot, \sigma)$ is simply the density of $N(0, \sigma^2)$ truncated at left by 0 if $\sigma > 0$ and truncated at right by 0 if $\sigma < 0$. We consider scale mixtures of $F(\cdot, \sigma)$ as possible choices for the link function $H$, i.e., $H(z) = \int_{\mathbb{R}} F(z, \sigma)\, dG(\sigma)$, where $G$ is a distribution on the whole real line. The class of normal scale mixtures is a subclass of these distributions: if $G$ is symmetric about 0 then $H$ is a normal scale mixture cdf. On the other hand, if $G$ is asymmetric, so also is $H$. The symmetry relation $\Phi(z/\sigma) = 1 - \Phi(-z/\sigma)$ of normal cdfs translates to the following relation in $F$: $F(z, \sigma) = 1 - F(-z, -\sigma)$.

Some caution is also needed in the introduction of the latent variables $Z$. We replace (a) of section 2 by

(a') Given $\beta$ and $\sigma$, the latent variables $Z_1, \ldots, Z_N$ are independent with $Z_i$ having density $f(z_i - x_i^T\beta, \sigma_i)$.

The rest of the finite mixture model ((b)-(e) of section 2) remains the same, except that the domain of the $\tau_j$'s changes to $-\infty < \tau_1 < \ldots < \tau_s < \infty$. To avoid complications, we further assume that $\tau_j \ne 0$, $j = 1, \ldots, s$. With this structure, $P(Y_i = 1 \mid \sigma, \beta, G) = P(Z_i > 0 \mid \sigma, \beta, G) = 1 - F(-x_i^T\beta, \sigma_i) = F(x_i^T\beta, -\sigma_i)$. Integrated over $\sigma_i$, $P(Y_i = 1 \mid \beta, G) = \int F(x_i^T\beta, -\sigma_i)\, dG(\sigma_i)$, again a scale mixture of the family $F(\cdot, \sigma)$, as we want.

For this asymmetric model, the full conditional distributions of each unobserved variable given the observed $y$ and the remaining variables are given below. Note that $Z_i$ has density $f(z_i - x_i^T\beta, \sigma_i)$, $i = 1, \ldots, N$. This poses $N$ restrictions: $\sigma_i (Z_i - x_i^T\beta) > 0$, $i = 1, \ldots, N$ (see the definition of $f(\cdot, \sigma)$), or, in vector notation, $\sigma \circ (Z - X\beta) > 0$, where $\circ$ denotes the coordinatewise product and the inequality holds in every coordinate.

(1) Given $y, \beta, \sigma$ and $G$: $Z_1, \ldots, Z_N$ are independent with $Z_i$ having density $c_1 \phi(\{z_i - x_i^T\beta\}/|\sigma_i|)$ on the restricted domain $\mathcal{Z}$, where $\mathcal{Z} = \{z_i : \sigma_i(z_i - x_i^T\beta) > 0,\ z_i > 0\}$ if $y_i = 1$, $\mathcal{Z} = \{z_i : \sigma_i(z_i - x_i^T\beta) > 0,\ z_i \le 0\}$ if $y_i = 0$, and $c_1$ is the normalizing constant.

In (2)-(4) below, we assume $(y_i - \tfrac{1}{2})\, Z_i > 0$, $i = 1, \ldots, N$.

(2) Given $y, \beta, Z$ and $G$: $\sigma_1, \ldots, \sigma_N$ are independent with $\sigma_i \sim c_2 \sum_{j=1}^{s} q_{ij} \delta_{\{\tau_j\}}$, where $q_{ij} = \frac{p_j}{|\tau_j|} \phi(\{Z_i - x_i^T\beta\}/|\tau_j|)$ if $\tau_j (Z_i - x_i^T\beta) > 0$ and $= 0$ otherwise, and $c_2$ is the normalizing constant.

(3) The conditional distribution of $G \mid y, \beta, Z, \sigma$ is the same as in (iii) of section 2.

(4) For $\pi_1(\beta) \propto 1$, $\beta \mid y, Z, \sigma, G$ is distributed as $N_k(\hat{\beta}, (X^T W X)^{-1})$ (as in (iv) of section 2) restricted to the support $\{\beta : \sigma \circ (Z - X\beta) > 0\}$.
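For concreteness, the truncated-normal cdf $F(z, \sigma)$ defined at the beginning of this section and the resulting asymmetric mixture link $H(z) = \sum_j p_j F(z, \tau_j)$ can be evaluated directly. The short sketch below only illustrates these definitions, assuming numpy/scipy; the support grid and weights are placeholders.

```python
import numpy as np
from scipy.stats import norm

def F_trunc(z, sigma):
    """cdf F(z, sigma) of N(0, sigma^2) truncated at left (sigma > 0) or right (sigma < 0) by 0."""
    z = np.asarray(z, dtype=float)
    if sigma > 0:                      # support on [0, infinity)
        return np.where(z < 0, 0.0, 2.0 * norm.cdf(z / sigma) - 1.0)
    elif sigma < 0:                    # support on (-infinity, 0]
        return np.where(z < 0, 2.0 * norm.cdf(z / abs(sigma)), 1.0)
    else:                              # degenerate point mass at 0
        return np.where(z < 0, 0.0, 1.0)

def asymmetric_link(z, p, tau):
    """Finite scale mixture H(z) = sum_j p_j F(z, tau_j); asymmetric unless p is symmetric in tau."""
    return sum(p_j * F_trunc(z, t_j) for p_j, t_j in zip(p, tau))

# illustration: placing more mass on negative tau's skews the link
z = np.linspace(-4, 4, 9)
tau = np.array([-2.0, -1.0, 1.0, 2.0])
p = np.array([0.4, 0.3, 0.2, 0.1])
print(asymmetric_link(z, p, tau))
```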

The first problem associated with these conditional distributions relates to the irreducibility of the underlying Markov chain. Suppose we start the Gibbs sampler from an initial positive value for $\sigma_i$, i.e., $\sigma_i^{(0)} > 0$. It is then easy to see that at every iteration of the Gibbs cycle the generated $\sigma_i^{(r)}$ will be $> 0$. Similarly, if the initial $\sigma_i^{(0)} < 0$, then every generated $\sigma_i^{(r)} < 0$. This implies that the Markov chain generated by the conditional distributions (1)-(4) is not irreducible. We circumvent this problem by generating from the joint distribution of $(Z, \sigma) \mid y, \beta, G$ (instead of their individual full conditional distributions). The generation from this joint distribution can be done in the following steps: (i) given $y, \beta, G$, the pairs $(Z_i, \sigma_i)$, $i = 1, \ldots, N$, are independent; (ii) generate $\sigma_i$ from the distribution of $\sigma_i \mid y, \beta, G$; and then (iii) generate $Z_i$ from the conditional distribution of $Z_i \mid \sigma_i, y, \beta, G$. The last distribution is already obtained in (1). The distribution of $\sigma_i \mid y, \beta, G$ required in (ii) is very similar to the distribution in (2) above. We have $\sigma_i \sim c_2 \sum_{j=1}^{s} q^*_{ij} \delta_{\{\tau_j\}}$, where the $q^*_{ij}$'s are now determined by the following rules. Suppose $y_i = 1$. Then $q^*_{ij} = p_j\, \Phi(\min(x_i^T\beta, 0)/|\tau_j|)$ if $\tau_j < 0$. For $\tau_j > 0$, $q^*_{ij} = 0$ if $x_i^T\beta < 0$ and $q^*_{ij} = p_j\, (\Phi(x_i^T\beta/\tau_j) - 1/2)$ if $x_i^T\beta > 0$. The case of $y_i = 0$ is similar.

The other problem relates to the generation of $\beta$ from its full conditional distribution given in (4). We need to generate random deviates from a multivariate normal on the restricted domain $\{\beta : c_i\, x_i^T\beta > b_i,\ i = 1, \ldots, N\}$, where $c_i = -1$, $b_i = -Z_i$ if $\sigma_i > 0$ and $c_i = 1$, $b_i = Z_i$ if $\sigma_i < 0$. Thus, we need to generate $\beta$ from a $k$-dimensional space, but the support of $\beta$ is restricted by $N$ ($N \ge k$) linear constraints. Moreover, the support region changes in every Gibbs iteration. Our first attempted solution to this problem was simply to generate $\beta$ from the $k$-dimensional multivariate normal $N_k(\hat{\beta}, (X^T W X)^{-1})$ and then accept or reject depending on whether the generated $\beta$ falls within the support or outside it. However, we soon found that this proposal is extremely inefficient. When $N$ is moderately large (for example, $N = 481$ and $k = 2$ as in the beetle mortality example of section 6), the $k$-dimensional support of $\beta$ restricted by the $N$ hyperplanes may be a very crooked region of small volume and may carry a very small percentage of the total mass of the $N_k(\hat{\beta}, (X^T W X)^{-1})$ distribution.

4. Simulation Techniques

In this section we discuss two problems of random variate generation: (i) simulation from a univariate truncated normal distribution and (ii) simulation from a multivariate normal over a restricted support. The first problem has an easy solution by the inverse cdf method, unless the support interval is in the far tail of the distribution, where the method may run into numerical problems. Many other efficient methods are available for generation from distribution tails; see Schmeiser (1980) and Dagpunar (1988). Since the normal distribution is log-concave, one may also use the adaptive rejection method of Gilks and Wild (1992). We follow the envelope rejection method where the truncated normal density is dominated by a truncated exponential density $\lambda \exp(-\lambda(x - a))\, I_{\{x \ge a\}}$ and the parameter $\lambda$ is chosen optimally. See Dagpunar (1988, p. 185) for more details. This tail-sampling step is sketched below.
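The sketch below implements this envelope rejection step for a standard normal truncated to $[a, \infty)$; the rate $\lambda = (a + \sqrt{a^2 + 4})/2$ is the usual optimal choice for this exponential envelope, and the general $N(\mu, \sigma^2)$ case reduces to it by standardizing. The function names and the fallback threshold are illustrative.

```python
import numpy as np

def std_normal_left_tail(a, rng):
    """Sample X ~ N(0,1) conditioned on X >= a via a translated-exponential envelope."""
    lam = 0.5 * (a + np.sqrt(a * a + 4.0))     # optimal exponential rate for this envelope
    while True:
        x = a + rng.exponential(1.0 / lam)      # proposal from lambda * exp(-lambda (x - a)), x >= a
        if rng.random() <= np.exp(-0.5 * (x - lam) ** 2):
            return x                            # accept with probability exp(-(x - lam)^2 / 2)

def truncated_normal(mu, sigma, a, rng):
    """Sample from N(mu, sigma^2) truncated to [a, infinity)."""
    a_std = (a - mu) / sigma
    if a_std < 0.5:
        # truncation point not in the far tail: plain rejection from N(0,1) is cheap enough
        while True:
            x = rng.standard_normal()
            if x >= a_std:
                return mu + sigma * x
    return mu + sigma * std_normal_left_tail(a_std, rng)

rng = np.random.default_rng(1)
draws = np.array([truncated_normal(0.0, 1.0, 3.5, rng) for _ in range(5)])
print(draws)   # all draws lie above 3.5
```

Right-truncation to $(-\infty, b]$ follows by symmetry: sample from the left-truncated normal at $-b$ and negate the draw.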

The second simulation problem arises from the asymmetric link model, where we need to generate a random $\beta$ from the $k$-dimensional distribution $N_k(\hat{\beta}, (X^T W X)^{-1})$ on the restricted support $\{\beta : \sigma \circ (Z - X\beta) > 0\}$. In the following, we discuss the general problem of generating a random deviate $X$ from a multivariate density $f$ restricted to a support set $A \subset \mathbb{R}^k$. Let $S$ be the set $\{(x, u) : x \in A,\ 0 \le u \le f(x)\} \subset \mathbb{R}^{k+1}$, i.e., the region under the graph of $f$ over $A$. Our problem is then equivalent to generating a uniformly distributed random point $Y = (X, U)$ on $S$, since $X$ (the projection of $Y$ onto $A$) is then a random deviate with density $f$ on $A$ (see Devroye (1986, Theorem 3.1, p. 40)). We use the Markov chain Monte Carlo method proposed by Smith (1984) to generate uniformly distributed points over a bounded region $S$ (also see Rubin (1984) for other approaches). Smith's mixing algorithm is as follows:

(1) Start with an initial point $Y_0 \in S$ and set $i = 0$.

(2) Generate a random direction $d$ uniformly distributed over a direction set $D \subset \mathbb{R}^{k+1}$. Find the line set $L = S \cap \{y : y = Y_i + \lambda d,\ \lambda \text{ a real scalar}\}$. Generate a new point $Y_{i+1}$ uniformly distributed over $L$.

(3) If $i <$ the prespecified maximum iteration number, set $i = i + 1$ and go back to (2).

Smith shows that (under some assumptions) this Markovian scheme generates points asymptotically uniformly distributed over $S$. One choice for the direction set $D$ is the set of $(k+1)$ coordinate directions. Moreover, if the restricted region is of the form $\{x : Ax \le b\}$, then the choice $D = \{\text{coordinate directions}\}$ especially simplifies the determination of the line segment $L$ at every iteration. Notice that the restricted region we consider in the asymmetric link model is exactly of this form. However, if the region under consideration has the shape of an elongated polygonal tube of small cross-section at an angle to the coordinate axes, then the coordinate-direction algorithm will take many small steps and the rate of convergence will be painfully slow. Our experience with the examples suggests that this can happen in the asymmetric link model. Instead of the coordinate directions, we therefore choose the alternative random-directions algorithm, where at each iteration a random direction is chosen from the direction set $D$ = the $(k+1)$-dimensional unit sphere $= \{d : \|d\| = 1\}$. The determination of the line segment $L$ is slightly more complicated in the random-directions algorithm, but the convergence rate is faster. The basic mechanics of the random-directions step are sketched below.
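The following sketch shows the random-directions (hit-and-run) mechanics for the simpler problem of sampling uniformly from a polytope $\{x : Ax \le b\}$; the method described above applies the same direction/line-segment step in the $(k+1)$-dimensional region under the graph of the restricted normal density. It is a minimal illustration, assuming numpy only, with a toy polytope as the usage example.

```python
import numpy as np

def hit_and_run(A, b, x0, n_steps, rng):
    """Random-directions walk targeting the uniform distribution on {x : A x <= b}."""
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)                  # uniform direction on the unit sphere
        # feasible step sizes lambda satisfy A(x + lambda d) <= b, i.e. lambda * (A d) <= b - A x
        Ad = A @ d
        slack = b - A @ x
        upper = np.min(slack[Ad > 1e-12] / Ad[Ad > 1e-12], initial=np.inf)
        lower = np.max(slack[Ad < -1e-12] / Ad[Ad < -1e-12], initial=-np.inf)
        lam = rng.uniform(lower, upper)         # uniform point on the line segment L
        x = x + lam * d
        samples.append(x.copy())
    return np.array(samples)

# toy usage: uniform samples on the square [-1, 1]^2
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
rng = np.random.default_rng(2)
print(hit_and_run(A, b, np.zeros(2), 5, rng))
```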

5. Model Diagnostics

Model selection and model diagnostics are integral parts of any data analysis. The formal Bayesian criterion for comparison of two models is the Bayes factor. The computation of Bayes factors from a Markov chain Monte Carlo analysis, however, is typically difficult since the simulation methods avoid the computation of the normalizing constant, and this is precisely what is needed in the Bayes factor (see DiCiccio et al. (1997) for a review of various Bayes factor estimation methods). We avoid these complications of Bayes factor computation and instead use the cross-validated predictive criteria which have been proposed and used by Geisser and Eddy (1979), Gelfand, Dey, and Chang (1992), Gelfand (1996), and Gelfand and Ghosh (1998), among others.

In many applications, we observe multiple independent binary responses under the same covariate vector $x_i$. Let $L$ be the number of distinct $x_i$'s, and denote them by $x_1, \ldots, x_L$. Let $n_k$ be the total number of binary $Y$'s observed under $x_k$ (so $\sum_{k=1}^{L} n_k = N$), out of which $T_k = \sum_{i : x_i = x_k} Y_i$ are 1's. According to our sampling model, $T_1, \ldots, T_L$ are independent and $T_k \sim \mathrm{Binomial}(n_k, \theta_k)$, where $\theta_k = H(x_k^T\beta)$. For model checking, we cross-validate the sufficient statistics $T_1, \ldots, T_L$ (instead of the $Y$'s). Let $t_k$ be the observed value of $T_k$, let $t$ be the $L \times 1$ observed data vector, and let $t_{(k)}$ denote the $(L-1) \times 1$ vector with the $k$-th observation $t_k$ deleted. Also, let $\omega = (\beta, G, Z, \sigma)$ denote the set of unobserved variables. We use $f$ to denote predictive distributions (e.g., $f(t_k \mid t_{(k)})$) as well as sampling distributions ($f(t_k \mid \omega)$), and $\pi$ to denote priors ($\pi(\omega)$) as well as posteriors ($\pi(\omega \mid t)$). We assume all relevant integrals in the following exist.

We check models from a cross-validated predictive approach and examine $f(t_k \mid t_{(k)})$, i.e., the predictive distribution of the random variable $T_k$ conditioned on the remaining observations $t_{(k)}$. Following Gelfand, Dey and Chang (1992), we compare a random $T_k$ from $f(t_k \mid t_{(k)})$ against the observed value $t_k$ by the following two checking criteria. See Gelfand, Dey and Chang (1992) for other checking criteria and more details.

(a) $d_{1k}$ = the expected difference between the observed $t_k$ and the random $T_k$, i.e., $d_{1k} = t_k - \mu_k$, where $\mu_k$ is the mean of the distribution $f(t_k \mid t_{(k)})$. The $d_{1k}$ are thus the familiar residuals. We can also compute the studentized residuals $d^*_{1k} = d_{1k}/s_k$, where $s_k^2 = \mathrm{Var}[T_k \mid t_{(k)}]$. We use the quantity $Q_1 = \sum_k (d^*_{1k})^2$ as a summary model diagnostic index.

(b) $d_{2k} = f(t_k \mid t_{(k)})$, i.e., the likelihood of observing $T_k = t_k$ given the remaining observations $t_{(k)}$. Small values of $d_{2k}$ criticize the model. As suggested by Geisser and Eddy (1979) and Gelfand, Dey and Chang (1992), we use $Q_2 = \prod_{k=1}^{L} d_{2k}$ as the second summary index of model diagnostics. Notice that $Q_2$ can be interpreted as a joint pseudo marginal likelihood of the observed $t$.

To compute $d_{1k}$ and $d_{2k}$, we need the mean $\mu_k$, the variance $s_k^2$, and the value $f(t_k \mid t_{(k)})$ of the predictive distribution. Notice that $f(t_k \mid t_{(k)}) = \int f(t_k \mid \omega)\, \pi(\omega \mid t_{(k)})\, d\omega$. One possible strategy to approximate $f(t_k \mid t_{(k)})$ is as follows: (i) delete $t_k$ from the observed data vector $t$ to obtain $t_{(k)}$; (ii) use Gibbs sampling (sections 2 and 3) to generate $R$ Monte Carlo samples $\omega_r$ from $\pi(\omega \mid t_{(k)})$; and (iii) approximate $f(t_k \mid t_{(k)})$ by the Monte Carlo sum $\frac{1}{R}\sum_{r=1}^{R} f(t_k \mid \omega_r)$. Since we need $f(t_k \mid t_{(k)})$ for every $k = 1, \ldots, L$, this strategy would require $L$ separate Gibbs sampling runs. The following alternative strategy works faster; it estimates $f(t_k \mid t_{(k)})$ for every $k = 1, \ldots, L$ from a single Gibbs run. Notice that if we can generate $\omega$ samples from $\pi(\omega \mid t_{(k)})$, we can then approximate $f(t_k \mid t_{(k)})$ by (iii) above. But

$$\pi(\omega \mid t_{(k)}) = \frac{f(t_{(k)} \mid \omega)\, \pi(\omega)}{m(t_{(k)})} = \frac{m(t)}{m(t_{(k)})} \cdot \frac{\pi(\omega \mid t)}{f(t_k \mid \omega)} = \frac{c(t, t_{(k)})}{f(t_k \mid \omega)}\, \pi(\omega \mid t),$$

where $m(t) = \int f(t \mid \omega)\, \pi(\omega)\, d\omega$ is the marginal likelihood and $c(t, t_{(k)}) = m(t)/m(t_{(k)})$ is a constant.

If we now generate $R$ Monte Carlo $\omega$ samples from the complete posterior distribution $\pi(\omega \mid t)$ in one Gibbs sampling run, then $f(t_k \mid t_{(k)})$ can be estimated by $\{c(t, t_{(k)})/R\}\sum_{r=1}^{R} f(t_k \mid \omega_r)/f(t_k \mid \omega_r)$ for every $k = 1, \ldots, L$. The constant $c(t, t_{(k)})$ is not known, but notice that $1 = \int \pi(\omega \mid t_{(k)})\, d\omega = c(t, t_{(k)}) \int \{1/f(t_k \mid \omega)\}\, \pi(\omega \mid t)\, d\omega$. Hence, $c(t, t_{(k)})$ can also be estimated from the same Gibbs run by $\{R^{-1}\sum_{r=1}^{R} 1/f(t_k \mid \omega_r)\}^{-1}$.

To obtain $d_{1k}$, we need $\mu_k = E[T_k \mid t_{(k)}]$ and $s_k^2 = \mathrm{Var}[T_k \mid t_{(k)}]$, which suggests that we may need to estimate the predictive density $f(t_k \mid t_{(k)})$ for a whole range of values of $T_k$. However, this again can be simplified. Notice that $\mu_k = E[T_k \mid t_{(k)}] = E[\,E[T_k \mid t_{(k)}, \omega] \mid t_{(k)}\,] = E[\,E[T_k \mid \omega] \mid t_{(k)}\,]$, since $T_1, \ldots, T_L$ are conditionally independent given $\omega$. In our setup, $T_k \mid \omega \sim \mathrm{Binomial}(n_k, \theta_k)$ with $\theta_k = H(x_k^T\beta)$, so the inner expectation is $E[T_k \mid \omega] = n_k\theta_k$ and $\mu_k = \int n_k\theta_k\, \pi(\omega \mid t_{(k)})\, d\omega$. Similarly,

$$s_k^2 = \mathrm{Var}[T_k \mid t_{(k)}] = \mathrm{Var}[\,E[T_k \mid \omega] \mid t_{(k)}\,] + E[\,\mathrm{Var}[T_k \mid \omega] \mid t_{(k)}\,] = \mathrm{Var}[n_k\theta_k \mid t_{(k)}] + E[n_k\theta_k(1 - \theta_k) \mid t_{(k)}] = -\mu_k^2 + \int \{n_k^2\theta_k^2 + n_k\theta_k(1 - \theta_k)\}\, \pi(\omega \mid t_{(k)})\, d\omega.$$

$\mu_k$ and $s_k^2$ can now be easily estimated by the corresponding Monte Carlo sums taken over samples generated from the posterior distribution, weighting each draw by $c(t, t_{(k)})/f(t_k \mid \omega_r)$ as above. These single-run computations are sketched below.
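The following sketch computes $d_{1k}$, $d^*_{1k}$, $d_{2k}$, $Q_1$ and $Q_2$ from a single set of posterior draws of the fitted probabilities $\theta_k$, using the harmonic-mean estimate of $c(t, t_{(k)})$ derived above. It assumes numpy/scipy; the input array theta_draws ($R$ draws by $L$ cells) is a placeholder for whatever sampler produced the posterior.

```python
import numpy as np
from scipy.stats import binom

def cross_validated_diagnostics(t, n, theta_draws):
    """Single-run estimates of the cross-validated residuals and CPOs.

    t           : (L,) observed success counts
    n           : (L,) binomial sample sizes
    theta_draws : (R, L) posterior draws of theta_k = H(x_k' beta)
    """
    lik = binom.pmf(t, n, theta_draws)                 # f(t_k | omega_r), shape (R, L)
    lik = np.clip(lik, 1e-300, None)                   # guard against exact zeros

    # d_2k = f(t_k | t_(k)) ~ harmonic mean of f(t_k | omega_r)  (= estimate of c(t, t_(k)))
    d2 = 1.0 / np.mean(1.0 / lik, axis=0)

    # weights c(t, t_(k)) / f(t_k | omega_r) turn posterior draws into
    # draws from pi(omega | t_(k)) when estimating mu_k and s_k^2
    w = d2 / lik
    mu = np.mean(w * n * theta_draws, axis=0)
    second = np.mean(w * (n**2 * theta_draws**2 + n * theta_draws * (1 - theta_draws)), axis=0)
    s2 = second - mu**2

    d1 = t - mu
    d1_star = d1 / np.sqrt(s2)
    Q1 = np.sum(d1_star**2)
    Q2 = np.prod(d2)
    return d1, d1_star, d2, Q1, Q2
```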

6. Application

We illustrate the use of our binary response models and the calculation of our model diagnostic tools in two examples. The first example studies the well-known beetle mortality data; we compare analyses based on the finite mixture model and the asymmetric link model. In the second example, we analyze the Challenger o-ring distress data; we compare the performances of our proposed finite mixture and asymmetric link models with Bayesian probit and $t$-links. In addition, we also study the performance of the general normal mixture link model proposed in Basu and Mukhopadhyay (1998).

Example 1. Bliss (1935) reports the results of a toxicological experiment concerning the number of beetles killed after 5 hours of exposure to gaseous carbon disulphide at various concentrations. Figure 1 shows the observed proportion of beetles killed against log dosage of carbon disulphide. The plot clearly shows a non-symmetric structure. Aranda-Ordaz (1981), Stukel (1988), Agresti (1990) and many others have examined these data from a non-Bayesian viewpoint. These earlier analyses found that asymmetric link models typically lead to significant improvement in the maximum likelihood based model fit.

Figure 1. Observed proportions of beetles killed and posterior expected mortality probabilities from the MF and MA models.

We analyze these data using our proposed finite mixture link model MF and the asymmetric link model MA. We use log dosage as the single covariate; thus, our postulated link structure is $P(Y = 1) = H(\beta_0 + \beta_1\, \text{log-dosage})$. In the finite mixture model MF, the function $H$ is a finite scale mixture of normal cdfs, $H(\cdot) = \sum_{j=1}^{s} p_j \Phi(\cdot/\tau_j)$, where the mixing probabilities $p = (p_1, \ldots, p_s)$ follow a Dirichlet distribution $DD(\nu)$. We use $s = 11$ and the set of $\tau_j$ values $T = \{0.5, 1, 2, 3, 4, 6, 8, 10, 15, 20, 30\}$. This choice covers the range of small, moderate as well as large $\tau$ values. We use equal values for the Dirichlet distribution parameters, $\nu_1 = \ldots = \nu_{11} = \alpha/11$, where $\alpha > 0$ is a user-specified constant. For the asymmetric link model MA, we use a similar specification of the prior parameters. Here the support points $\tau_j$ are chosen to be $T \cup \{-T\}$, where $T$ is the same set as in model MF, and the Dirichlet distribution parameters are specified to be $\nu_1 = \ldots = \nu_{22} = \alpha/22$.
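In code, this prior specification amounts to nothing more than fixing the support grid and a symmetric Dirichlet parameter; the value of alpha below is a placeholder for the user-specified constant.

```python
import numpy as np

# support grid T from the text, and its reflection for the asymmetric model
T = np.array([0.5, 1, 2, 3, 4, 6, 8, 10, 15, 20, 30])
alpha = 1.0                               # user-specified total Dirichlet mass (placeholder value)

tau_MF = T                                # MF: 11 positive support points
nu_MF = np.full(T.size, alpha / T.size)

tau_MA = np.concatenate([-T[::-1], T])    # MA: 22 support points, negative and positive
nu_MA = np.full(tau_MA.size, alpha / tau_MA.size)
```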

In Figure 1, the posterior expected probabilities of beetle mortality obtained from models MF and MA are plotted along with the observed proportions. The asymmetric link model MA easily adapts to the non-symmetric shape displayed by the observed proportions and shows a better fit compared to the symmetric finite mixture model MF. Let $n_k$ and $t_k$ denote the number of beetles exposed and the number of beetles killed at a particular log-dosage level. We use the sum of squared differences between the observed and expected counts, i.e., $SSE = \sum_{k=1}^{L} [\,t_k - n_k\, E\{H(\beta_0 + \beta_1\, \text{log-dosage}_k) \mid t\}\,]^2$, as a summary index of model fit. We further calculate the summary diagnostic indices $Q_1 = \sum_k (d^*_{1k})^2$ and $Q_2 = \prod_k d_{2k}$ as proposed in section 5. These values are shown in Table 1. The SSE for model MA is almost half of the SSE value for model MF, indicating a significantly better fit from the former model. A comparison of the diagnostic index $Q_2$ values for the two models provides further support for this statement: the value of $Q_2$ for the MA model is almost 38 times higher than for the MF model. The other index, $Q_1$, however, is slightly higher for the asymmetric model MA. This is mostly due to the high influence ($d^*_{1k} = 5.28$) of the 6th observation (see the plots of $d^*_{1k}$ and $d_{2k}$ in Figure 2). Overall, the asymmetric link model MA fits these data significantly better than the finite mixture model MF.

Table 1. Beetle mortality data: SSE and summary model diagnostic measures

                                  Finite mixture model MF    Asymmetric model MA
SSE
$Q_1 = \sum_k (d^*_{1k})^2$
$Q_2 = \prod_k d_{2k}$

Example 2. An interesting application of binary regression to the risk analysis of the Challenger space shuttle is given in Dalal, Fowlkes and Hoadley (1989). The Rogers commission concluded that the Challenger accident was caused by a gas leak through the 6 o-ring joints of the shuttle. Dalal, Fowlkes and Hoadley (1989) looked at the number of distressed o-rings (among the 6) versus launch temperature (Temp.) and pressure (Pres.) for 23 previous shuttle flights. The previous shuttles were launched at temperatures between 53°F and 81°F. A maximum likelihood logistic regression analysis with temperature and pressure as covariates yields an insignificant effect of pressure and predicts a strong probability (about 82%) of distress in the o-rings at Temp. = 31 and Pres. = 200, the actual launch conditions of the fatal shuttle. However, Lavine (1991) later pointed out that such a prediction depends strongly on the choice of the link function. A probit or a complementary log-log link model fits the data equally well, but predicts a smaller probability of distress (about 67%) at the launch conditions.

In our Bayesian analysis of the Challenger data, we include both temperature and pressure as covariates. We examine the following models for the link function $H$: MP, a probit link, $H = \Phi$; MT, a $t$ link, $H$ = the cdf of a $t$ distribution; MF, a finite mixture model; and MA, an asymmetric link model.

Figure 2. Beetle mortality data: Model diagnostics for the MF and MA models.

In addition, we examine the general mixture model MG proposed in Basu and Mukhopadhyay (1998). In this model, the link function, $H(\cdot) = \int \Phi(\cdot/\sigma)\, dG(\sigma)$, is still a scale mixture of normals. However, the mixing distribution $G$ is not restricted to be supported on finitely many points; rather, it can be an arbitrary distribution. A Dirichlet process prior is assumed on the mixing distribution $G$. We refer the reader to Basu and Mukhopadhyay (1998) for further details about this general mixture model.

The performances of these five models on the Challenger data are shown in Table 2 and Figures 3 and 4. In terms of SSE, the general mixture model MG performs the best, though the SSE values for the other models (except MA) are comparable. The asymmetric link model MA has the worst (largest) SSE value, but its summary model diagnostic measure $Q_1 = \sum_k (d^*_{1k})^2$ is the best (smallest). The plots of $d^*_{1k}$ and $d_{2k}$ in Figure 4 point out the 5th and the 14th observations as influential in models MF and MG. Both Dalal, Fowlkes and Hoadley (1989) and Lavine (1991) also found observation 14 troublesome. However, notice that neither of these observations is influential in the MA model; in fact, all the $d^*_{1k}$ and $d_{2k}$ values are within reasonable range in the MA model. The MA model thus adapts itself to guard against the influential observations, but loses in terms of SSE (the model fitting criterion) in the process. This dichotomy between model diagnostics and model fitting can also be seen in the plot of fitted probabilities in Figure 3.

Table 2. Challenger data: SSE, summary model diagnostics and predicted failure probability, P(31, 200)

Model    SSE    $Q_1 = \sum_k (d^*_{1k})^2$    $Q_2 = \prod_k d_{2k}$    P(31, 200)
MP
MT
MF
MG
MA

Figure 3. Observed proportions of o-ring distress and posterior expected/predicted distress probabilities from the five models: MP, MT, MF, MG and MA.

Figure 4. Challenger data: Model diagnostics for the MF, MG and MA models.

One of the principal aims of the analysis of the Challenger data is to predict the probability of a failure, P(31, 200), at Temp. = 31 and Pres. = 200. For our five Bayesian models, the P(31, 200) values listed in Table 2 show a wide range. This is clearly expected and is in agreement with Lavine's (1991) findings. As seen in Figure 4, Temp. = 31 is far beyond the range of the observed data. Prediction at such an extrapolated point is expected to depend strongly on the choice of the link function and on the other parameters of the model.

7. Discussion

In this article, we have proposed Bayesian analyses of binary response regression using scale mixtures of cdfs as the link function $H$. We have presented two different link structures: finite normal scale mixtures (MF) and scale mixtures of truncated normals (MA). These models introduce flexibility in the choice of $H$ and free the user from a single pre-specified functional form. Moreover, the model MA introduces additional flexibility by allowing asymmetry in $H$.

Basu and Mukhopadhyay (1998) generalized the finite mixture model to a general mixture model in which the mixing distribution $G$ is not restricted to be supported on finitely many points; they use a Dirichlet process prior on $G$. We note here that their proposed methodology can easily be implemented for our asymmetric link model of section 3. The resulting analysis would involve a simple combination of their methodology with the techniques proposed in section 3.

References

Albert, J.H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data, Journal of the American Statistical Association, 88.

Agresti, A. (1990). Categorical Data Analysis, John Wiley, New York.

Aranda-Ordaz, F.J. (1981). On two families of transformations to additivity for binary response data, Biometrika, 68.

Basu, S. and Mukhopadhyay, S. (1998). Binary response regression with normal scale mixture links. In Generalized Linear Models: A Bayesian Perspective, D.K. Dey et al. (eds.), Marcel Dekker, New York.

Chen, M-H. and Dey, D.K. (1998). Bayesian modeling of correlated binary responses via scale mixture of multivariate normal link functions, Sankhyā, 60, 322.

Dagpunar, J. (1988). Principles of Random Variate Generation, Oxford University Press, Oxford.

Dalal, S.R., Fowlkes, E.B. and Hoadley, B. (1989). Risk analysis of the space shuttle: pre-Challenger prediction of failure, Journal of the American Statistical Association, 84.

Devroye, L. (1986). Non-Uniform Random Variate Generation, Springer-Verlag, New York.

DiCiccio, T.J., Kass, R.E., Raftery, A. and Wasserman, L. (1997). Computing Bayes factors by combining simulation and asymptotic approximations, Journal of the American Statistical Association, 92.

Geisser, S. and Eddy, W. (1979). A predictive approach to model selection, Journal of the American Statistical Association, 74.

Gelfand, A.E. (1996). Model determination using sampling-based methods. In Markov Chain Monte Carlo in Practice (W.R. Gilks, S. Richardson and D.J. Spiegelhalter, eds.), Chapman and Hall, London.

Gelfand, A.E., Dey, D.K. and Chang, H. (1992). Model determination using predictive distributions with implementations via sampling-based methods. In Bayesian Statistics 4, J.M. Bernardo et al. (eds.), Oxford University Press, Oxford.

Gelfand, A.E. and Ghosh, S.K. (1998). Model choice: a minimum posterior predictive loss approach, Biometrika, 85.

Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling, Applied Statistics, 41.

Guerrero, V.M. and Johnson, R. (1982). Use of the Box-Cox transformation with binary response models, Biometrika, 69.

Lavine, M. (1991). Problems in extrapolation illustrated with space shuttle o-ring data, Journal of the American Statistical Association, 86.

Lo, A.Y. (1984). On a class of Bayesian nonparametric estimates: I. Density estimates, Annals of Statistics, 12.

McCullagh, P. and Nelder, J. (1989). Generalized Linear Models, 2nd ed., Chapman and Hall, London.

Prentice, R.L. (1976). A generalization of the probit and logit models for dose response curves, Biometrics, 32.

Rubin, P.A. (1984). Generating random points in a polytope, Communications in Statistics - Simulation and Computation, 13.

Schmeiser, B.W. (1980). Generation of variates from distribution tails, Operations Research, 28.

Smith, R.L. (1984). Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions, Operations Research, 32.

Stukel, T.A. (1988). Generalized logistic models, Journal of the American Statistical Association, 83.

Sanjib Basu
Division of Statistics
Northern Illinois University
DeKalb, IL, USA

Saurabh Mukhopadhyay
Merck Research Laboratories
P.O. Box 2000, RY
Rahway, New Jersey, USA
saurabh


Theory and Methods of Statistical Inference PhD School in Statistics cycle XXIX, 2014 Theory and Methods of Statistical Inference Instructors: B. Liseo, L. Pace, A. Salvan (course coordinator), N. Sartori, A. Tancredi, L. Ventura Syllabus Some prerequisites:

More information

Bayesian time series classification

Bayesian time series classification Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University

More information

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

Efficient Sampling Methods for Truncated Multivariate Normal and Student-t Distributions Subject to Linear Inequality Constraints

Efficient Sampling Methods for Truncated Multivariate Normal and Student-t Distributions Subject to Linear Inequality Constraints Efficient Sampling Methods for Truncated Multivariate Normal and Student-t Distributions Subject to Linear Inequality Constraints Yifang Li Department of Statistics, North Carolina State University 2311

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

A general mixed model approach for spatio-temporal regression data

A general mixed model approach for spatio-temporal regression data A general mixed model approach for spatio-temporal regression data Thomas Kneib, Ludwig Fahrmeir & Stefan Lang Department of Statistics, Ludwig-Maximilians-University Munich 1. Spatio-temporal regression

More information

Nonparametric Bayesian modeling for dynamic ordinal regression relationships

Nonparametric Bayesian modeling for dynamic ordinal regression relationships Nonparametric Bayesian modeling for dynamic ordinal regression relationships Athanasios Kottas Department of Applied Mathematics and Statistics, University of California, Santa Cruz Joint work with Maria

More information

BAYESIAN MODEL CRITICISM

BAYESIAN MODEL CRITICISM Monte via Chib s BAYESIAN MODEL CRITICM Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

22s:152 Applied Linear Regression. Chapter 2: Regression Analysis. a class of statistical methods for

22s:152 Applied Linear Regression. Chapter 2: Regression Analysis. a class of statistical methods for 22s:152 Applied Linear Regression Chapter 2: Regression Analysis Regression analysis a class of statistical methods for studying relationships between variables that can be measured e.g. predicting blood

More information

Partial factor modeling: predictor-dependent shrinkage for linear regression

Partial factor modeling: predictor-dependent shrinkage for linear regression modeling: predictor-dependent shrinkage for linear Richard Hahn, Carlos Carvalho and Sayan Mukherjee JASA 2013 Review by Esther Salazar Duke University December, 2013 Factor framework The factor framework

More information

Bayesian inference for multivariate skew-normal and skew-t distributions

Bayesian inference for multivariate skew-normal and skew-t distributions Bayesian inference for multivariate skew-normal and skew-t distributions Brunero Liseo Sapienza Università di Roma Banff, May 2013 Outline Joint research with Antonio Parisi (Roma Tor Vergata) 1. Inferential

More information

COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS

COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS M. Gasparini and J. Eisele 2 Politecnico di Torino, Torino, Italy; mauro.gasparini@polito.it

More information

Markov Chain Monte Carlo in Practice

Markov Chain Monte Carlo in Practice Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France

More information

A model of skew item response theory

A model of skew item response theory 1 A model of skew item response theory Jorge Luis Bazán, Heleno Bolfarine, Marcia D Ellia Branco Department of Statistics University of So Paulo Brazil ISBA 2004 May 23-27, Via del Mar, Chile 2 Motivation

More information

Kobe University Repository : Kernel

Kobe University Repository : Kernel Kobe University Repository : Kernel タイトル Title 著者 Author(s) 掲載誌 巻号 ページ Citation 刊行日 Issue date 資源タイプ Resource Type 版区分 Resource Version 権利 Rights DOI URL Note on the Sampling Distribution for the Metropolis-

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

BAYESIAN ANALYSIS OF CORRELATED PROPORTIONS

BAYESIAN ANALYSIS OF CORRELATED PROPORTIONS Sankhyā : The Indian Journal of Statistics 2001, Volume 63, Series B, Pt. 3, pp 270-285 BAYESIAN ANALYSIS OF CORRELATED PROPORTIONS By MARIA KATERI, University of Ioannina TAKIS PAPAIOANNOU University

More information

Generalized common spatial factor model

Generalized common spatial factor model Biostatistics (2003), 4, 4,pp. 569 582 Printed in Great Britain Generalized common spatial factor model FUJUN WANG Eli Lilly and Company, Indianapolis, IN 46285, USA MELANIE M. WALL Division of Biostatistics,

More information

Rank Regression with Normal Residuals using the Gibbs Sampler

Rank Regression with Normal Residuals using the Gibbs Sampler Rank Regression with Normal Residuals using the Gibbs Sampler Stephen P Smith email: hucklebird@aol.com, 2018 Abstract Yu (2000) described the use of the Gibbs sampler to estimate regression parameters

More information

A nonparametric Bayesian approach to inference for non-homogeneous. Poisson processes. Athanasios Kottas 1. (REVISED VERSION August 23, 2006)

A nonparametric Bayesian approach to inference for non-homogeneous. Poisson processes. Athanasios Kottas 1. (REVISED VERSION August 23, 2006) A nonparametric Bayesian approach to inference for non-homogeneous Poisson processes Athanasios Kottas 1 Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California,

More information

D-optimal Designs for Factorial Experiments under Generalized Linear Models

D-optimal Designs for Factorial Experiments under Generalized Linear Models D-optimal Designs for Factorial Experiments under Generalized Linear Models Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago Joint research with Abhyuday

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

On a multivariate implementation of the Gibbs sampler

On a multivariate implementation of the Gibbs sampler Note On a multivariate implementation of the Gibbs sampler LA García-Cortés, D Sorensen* National Institute of Animal Science, Research Center Foulum, PB 39, DK-8830 Tjele, Denmark (Received 2 August 1995;

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Bayesian Analysis for Step-Stress Accelerated Life Testing using Weibull Proportional Hazard Model

Bayesian Analysis for Step-Stress Accelerated Life Testing using Weibull Proportional Hazard Model Noname manuscript No. (will be inserted by the editor) Bayesian Analysis for Step-Stress Accelerated Life Testing using Weibull Proportional Hazard Model Naijun Sha Rong Pan Received: date / Accepted:

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information