Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives

Anissa Wood
5 years ago

1 Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives

Often interest focuses on comparing a null hypothesis of no difference between groups to an order-restricted alternative. For example, we may have a $k$-level ordered categorical predictor, $w_i$, with

$$y_i = \beta_0 + \sum_{h=1}^{k-1} 1(w_i = h+1)\,\beta_h + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$

$H_0: \beta_1 = \cdots = \beta_{k-1} = 0$ (homogeneity, no association)
$H_1: \beta_1 \le \cdots \le \beta_{k-1}$ (simple increasing order)

How can we assess these hypotheses from a Bayesian perspective?

2 Suppose we assume the conjugate prior density $\beta = (\beta_0, \ldots, \beta_{k-1}) \sim N(\mu_0, \Sigma_0)$ and $\sigma^2 \sim IG(a_0, b_0)$.

Under this prior density, we can easily calculate the posterior density.
Posterior probabilities of $\beta_h < 0$ can be calculated.
We can also calculate $\Pr(\beta_1 \le \cdots \le \beta_{k-1} \mid \text{data})$.

How can we address $H_0$ vs $H_1$ using this posterior? Is there a better way?

3 The Bayes factor is a standard way of comparing two hypotheses, $H_0$ and $H_1$. To calculate the Bayes factor, we need the prior and posterior probabilities of each of the two hypotheses.

What are these probabilities under the conjugate normal prior? Can we use $\Pr(H_0) = 1 - \Pr(H_1) = 1 - \Pr(\beta_1 \le \cdots \le \beta_{k-1})$ as the prior? Why or why not?
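For reference, the Bayes factor can be written as the ratio of posterior odds to prior odds. The display below is the standard textbook definition rather than a formula taken from these slides; the subscript notation $BF_{10}$ (evidence for $H_1$ against $H_0$) is an assumption introduced here.

```latex
BF_{10}
  = \frac{\Pr(H_1 \mid \text{data}) \,/\, \Pr(H_0 \mid \text{data})}
         {\Pr(H_1) \,/\, \Pr(H_0)}
```

Both prior probabilities appear in the denominator, so the ratio is only well defined when each hypothesis receives prior probability strictly between 0 and 1.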

4 The problem with this approach is that the typical normal conjugate prior is continuous and therefore assigns zero probability to the null hypothesis. Thus, the above strategy doesn't make sense.

Instead, we want to choose a prior density for $\beta$ that allocates probability to both $H_0$ and $H_1$, with these probabilities adding to one. Essentially, we need a prior that has support on the restricted space $\Omega = \{\beta : \beta_1 \le \cdots \le \beta_{k-1}\}$, with positive probability assigned to equalities.

5 We would also like to have a prior that is easy to elicit and results in easy computation.

To place order restrictions on parameters in Bayesian models, Gelfand, Smith and Lee (1992) proposed priors of the form

$$\pi(\beta) \propto 1(\beta \in \Omega)\, N(\beta; \mu_0, \Sigma_0),$$

which is a truncated Gaussian density. This prior allocates probability one to the restricted space $\Omega$. In addition, the full conditional densities of the $\beta$'s follow a conditionally conjugate normal form.

Is this approach good for comparing $H_0$ and $H_1$?

6 Actually, we are still assigning zero prior probability to the null hypothesis $H_0$. By discarding draws from the multivariate normal density that are inconsistent with $\beta_1 \le \cdots \le \beta_{k-1}$, we ensure that a strictly increasing order is satisfied. However, we never draw a value of $\beta$ such that $\beta_j = \beta_h$ for $j \ne h$.

A generalization is to include point masses to accommodate equalities.
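The point about discarded draws can be checked numerically. The following is a minimal sketch, assuming independent standard normal components (a simplification of the general $N(\mu_0, \Sigma_0)$ prior); the function name `sample_ordered_normal` is hypothetical. It keeps only draws that satisfy the order constraint and then counts how many kept draws contain an exact tie.

```python
import random

def sample_ordered_normal(k_minus_1, n_keep, rng, mean=0.0, sd=1.0):
    """Rejection sampler: draw independent normals, keep draws in nondecreasing order."""
    kept, tried = [], 0
    while len(kept) < n_keep:
        beta = [rng.gauss(mean, sd) for _ in range(k_minus_1)]
        tried += 1
        # Accept only if beta_1 <= beta_2 <= ... <= beta_{k-1}
        if all(beta[j] <= beta[j + 1] for j in range(k_minus_1 - 1)):
            kept.append(beta)
    return kept, tried

rng = random.Random(0)
kept, tried = sample_ordered_normal(3, 5000, rng)

# Count kept draws in which two adjacent coefficients are exactly equal
n_ties = sum(any(b[j] == b[j + 1] for j in range(len(b) - 1)) for b in kept)
accept_rate = len(kept) / tried

print(accept_rate)  # roughly 1/6 for three exchangeable components
print(n_ties)       # ties have probability zero under a continuous density
```

Every kept draw is ordered, but equalities essentially never occur, which is why the truncated Gaussian prior alone cannot represent $H_0$.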

7 In particular, first reparameterize so that $\gamma_1 = \beta_1$ and $\gamma_j = \beta_j - \beta_{j-1}$ for $j = 2, \ldots, k-1$. Then choose the following prior density:

$$\pi(\beta_0, \gamma) = N(\beta_0; \mu_0, \sigma_0^2) \prod_{h=1}^{k-1} \left\{ \pi_{0h}\, 1(\gamma_h = 0) + (1 - \pi_{0h})\, 1(\gamma_h > 0)\, \frac{N(\gamma_h; \mu_h, \sigma_h^2)}{\int_0^\infty N(z; \mu_h, \sigma_h^2)\, dz} \right\}$$

The $\gamma_h$ parameters are assigned prior densities consisting of mixtures of point masses at zero (with probability $\pi_{0h}$) and normal densities truncated below by zero.

The prior probability of equivalent means for individuals with $w_i = j$ and $w_i = j + 1$ is $\pi_{0j}$, for $j = 1, \ldots, k-1$.

The prior probability of the overall null hypothesis $H_0$ is $\pi_0 = \prod_{j=1}^{k-1} \pi_{0j}$.
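As a quick check on the claim that the prior probability of $H_0$ is the product of the point-mass probabilities, the prior for $\gamma$ can be simulated directly. This is a sketch under assumed hyperparameters ($\pi_{0h} = 0.5$, $\mu_h = 0$, $\sigma_h = 1$, $k - 1 = 3$); the helper names `draw_gamma_prior` and `gamma_to_beta` are hypothetical, and the truncated normal is drawn by the inverse-CDF method.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for cdf and inv_cdf

def draw_gamma_prior(pi0, mu, sd, rng):
    """One draw of (gamma_1, ..., gamma_{k-1}) from the point-mass mixture prior."""
    gamma = []
    for p, m, s in zip(pi0, mu, sd):
        if rng.random() < p:
            gamma.append(0.0)  # point mass at zero
        else:
            # Inverse-CDF draw from N(m, s^2) truncated below by zero
            lo = STD.cdf(-m / s)
            u = lo + (1.0 - lo) * rng.random()
            u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf in its domain
            gamma.append(max(m + s * STD.inv_cdf(u), 1e-12))
    return gamma

def gamma_to_beta(gamma):
    """Invert the reparameterization: beta_h = gamma_1 + ... + gamma_h."""
    beta, total = [], 0.0
    for g in gamma:
        total += g
        beta.append(total)
    return beta

rng = random.Random(1)
pi0 = [0.5, 0.5, 0.5]
draws = [draw_gamma_prior(pi0, [0.0] * 3, [1.0] * 3, rng) for _ in range(20000)]

# Fraction of prior draws with every gamma_h equal to zero (i.e. H0 holds)
frac_h0 = sum(all(g == 0.0 for g in d) for d in draws) / len(draws)
print(frac_h0)  # should be near 0.5 ** 3 = 0.125
```

Because every $\gamma_h$ is nonnegative, each `gamma_to_beta(d)` is automatically nondecreasing, so the prior lives on $\Omega$ while still giving positive mass to equalities.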

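8 Under this prior, $\Pr(H_0) = \pi_0$ and $\Pr(H_1) = 1 - \pi_0$. The prior has support on the restricted space $\Omega$.

In addition, the prior density is conditionally conjugate, with the full conditional posterior of $\gamma_h$ of the form

$$\hat\pi_h\, 1(\gamma_h = 0) + (1 - \hat\pi_h)\, 1(\gamma_h > 0)\, \frac{N(\gamma_h; \hat\mu_h, \hat\sigma_h^2)}{\int_0^\infty N(z; \hat\mu_h, \hat\sigma_h^2)\, dz},$$

where $\hat\mu_h$ and $\hat\sigma_h^2$ are the posterior mean and variance derived under an unrestricted $N(\mu_{0h}, \sigma_{0h}^2)$ prior density for $\gamma_h$ (the normal component written with mean $\mu_h$ and variance $\sigma_h^2$ in the prior), and $\hat\pi_h$ is the posterior probability of $\gamma_h = 0$ given the data and the other parameters.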
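The slides do not give an expression for the conditional posterior probability of the point mass. One way to obtain it, sketched here as a derivation from the mixture prior and a normal conditional likelihood rather than as a quoted result, is to compare the conditional marginal likelihoods of the two mixture components. The hat notation and the use of $\Phi$ for the standard normal CDF are assumptions of this sketch, and it uses $\int_0^\infty N(z; \mu, \sigma^2)\,dz = \Phi(\mu/\sigma)$.

```latex
\hat\pi_h
  = \frac{\pi_{0h}}
         {\pi_{0h} + (1 - \pi_{0h})\,
            \dfrac{N(0; \mu_{0h}, \sigma_{0h}^2)}{N(0; \hat\mu_h, \hat\sigma_h^2)}\,
            \dfrac{\Phi(\hat\mu_h / \hat\sigma_h)}{\Phi(\mu_{0h} / \sigma_{0h})}}
```

The ratio in the denominator is the conditional marginal likelihood under the truncated normal component divided by the likelihood at $\gamma_h = 0$, so $\hat\pi_h$ shrinks toward zero when the data pull $\hat\mu_h$ well above zero relative to $\hat\sigma_h$.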

9 Due to the simplicity of this form, we can proceed with a Gibbs sampling algorithm:

1. Specify initial values for $\beta_0$, $\gamma$ and $\sigma^2$.
2. Update $\sigma^2$ by sampling from its IG full conditional.
3. Update $\beta_0$ by sampling from its normal full conditional.
4. Update $\gamma_h$, for $h = 1, \ldots, k-1$, by sampling from the zero-inflated truncated normal full conditional:
   (a) Sample from the point mass using Bernoulli($\hat\pi_h$).
   (b) If not in the point mass, sample from $N(\hat\mu_h, \hat\sigma_h^2)$ truncated below by 0.
5. Repeat 2-4.
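The steps above can be sketched in pure Python. This is a minimal illustration, not the author's implementation. It assumes independent priors with a common slab $N(\texttt{mu\_g}, \texttt{var\_g})$ and common point-mass probability `pi0` for every $\gamma_h$, and a cumulative-indicator design $x_{ih} = 1(w_i \ge h + 1)$ implied by the reparameterization. The point-mass probability in `spike_prob` is derived from the mixture prior, not quoted from the slides. All function names and default hyperparameters are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf and inv_cdf

def log_norm_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def spike_prob(pi0, mu0, var0, mu_hat, var_hat):
    """Conditional posterior probability that gamma_h = 0 (derived, see lead-in)."""
    log_ratio = (log_norm_pdf(0.0, mu0, var0)
                 - math.log(max(STD.cdf(mu0 / math.sqrt(var0)), 1e-300))
                 + math.log(max(STD.cdf(mu_hat / math.sqrt(var_hat)), 1e-300))
                 - log_norm_pdf(0.0, mu_hat, var_hat))
    if log_ratio > 700:  # guard against overflow in exp
        return 0.0
    return pi0 / (pi0 + (1 - pi0) * math.exp(log_ratio))

def rtruncnorm_pos(mean, sd, rng):
    """Inverse-CDF draw from N(mean, sd^2) truncated below by 0 (crude in far tails)."""
    lo = STD.cdf(-mean / sd)
    u = lo + (1.0 - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return max(mean + sd * STD.inv_cdf(u), 1e-12)

def gibbs(y, w, k, n_iter=2000, burn=500, pi0=0.5, mu_b=0.0, var_b=100.0,
          mu_g=0.0, var_g=1.0, a0=1.0, b0=1.0, seed=0):
    """Return post-burn-in draws of (beta0, gamma, sigma2); w takes values 1..k."""
    rng = random.Random(seed)
    n, q = len(y), k - 1
    # x[i][h] = 1 if w_i >= h + 2 (0-based h), so E[y_i] = beta0 + sum_h x[i][h] * gamma[h]
    x = [[1.0 if w[i] >= h + 2 else 0.0 for h in range(q)] for i in range(n)]
    beta0, gamma, sigma2 = sum(y) / n, [0.0] * q, 1.0  # step 1: initial values
    draws = []
    for it in range(n_iter):
        # Step 2: sigma^2 from its inverse-gamma full conditional
        eff = [sum(x[i][h] * gamma[h] for h in range(q)) for i in range(n)]
        ssr = sum((y[i] - beta0 - eff[i]) ** 2 for i in range(n))
        sigma2 = 1.0 / rng.gammavariate(a0 + n / 2, 1.0 / (b0 + ssr / 2))
        # Step 3: beta0 from its normal full conditional
        v = 1.0 / (1.0 / var_b + n / sigma2)
        m = v * (mu_b / var_b + sum(y[i] - eff[i] for i in range(n)) / sigma2)
        beta0 = rng.gauss(m, math.sqrt(v))
        # Step 4: each gamma_h from its zero-inflated truncated normal full conditional
        for h in range(q):
            r = [y[i] - beta0 - sum(x[i][j] * gamma[j] for j in range(q) if j != h)
                 for i in range(n)]
            v = 1.0 / (1.0 / var_g + sum(x[i][h] for i in range(n)) / sigma2)
            m = v * (mu_g / var_g + sum(x[i][h] * r[i] for i in range(n)) / sigma2)
            if rng.random() < spike_prob(pi0, mu_g, var_g, m, v):
                gamma[h] = 0.0                                   # (a) point mass
            else:
                gamma[h] = rtruncnorm_pos(m, math.sqrt(v), rng)  # (b) truncated normal
        if it >= burn:
            draws.append((beta0, list(gamma), sigma2))
    return draws
```

A call such as `gibbs(y, w, k=3)` returns a list of `(beta0, gamma, sigma2)` tuples; the original coefficients can be recovered from each draw as cumulative sums of `gamma`. The pure-Python loops favor readability over speed, so a vectorized version would be preferable for large samples.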

10 Calculation of Bayes factors for hypothesis testing

From the Gibbs sampling output, we have samples from the posterior density for $\gamma$. The elements of $\gamma$ that are equal to zero tell us which hypothesis we are in for a given sample. For example, $\gamma_1 = \cdots = \gamma_{k-1} = 0$ implies $H_0$.

Thus, in implementing the Gibbs sampler we are effectively moving between different hypotheses, in the same way that stochastic search algorithms move between models with different predictors. The posterior probability of a given hypothesis can be calculated simply as the proportion of samples for which that hypothesis holds.
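The proportion-of-samples calculation can be sketched as follows. The function names `prob_h0` and `bayes_factor_01` are hypothetical, the input is assumed to be a list of sampled $\gamma$ vectors, and the four toy draws are made up purely to show the arithmetic.

```python
def prob_h0(gamma_draws):
    """Estimate Pr(H0 | data) as the share of draws with every gamma_h exactly zero."""
    return sum(all(g == 0.0 for g in d) for d in gamma_draws) / len(gamma_draws)

def bayes_factor_01(post_h0, prior_h0):
    """Bayes factor for H0 against H1: posterior odds divided by prior odds."""
    if post_h0 >= 1.0:
        return float("inf")  # every draw fell in H0
    posterior_odds = post_h0 / (1.0 - post_h0)
    prior_odds = prior_h0 / (1.0 - prior_h0)
    return posterior_odds / prior_odds

# Toy illustration with k - 1 = 2: two of the four draws satisfy H0
toy_draws = [[0.0, 0.0], [0.0, 0.3], [0.0, 0.0], [0.1, 0.2]]
post = prob_h0(toy_draws)               # 0.5
prior = 0.5 * 0.5                       # pi_0 = product of pi_0h with pi_0h = 0.5
print(post, bayes_factor_01(post, prior))  # Bayes factor of about 3 in favor of H0
```

With few post-burn-in draws the estimated proportion can be exactly 0 or 1, in which case the Bayes factor estimate is degenerate and a longer chain is needed.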

11 Discussion

This strategy is very useful for inferences on effects of ordered categorical predictors. For binary and ordered categorical response data, the same approach can be applied by using a probit model for the response, with data augmentation (Albert and Chib, 1993) for computation. It can also be used to analyze discrete-time survival data, with a continuation ratio probit model characterizing the survival likelihood. For other GLMs similar approaches can be used, but the prior is no longer conditionally conjugate, so computation can be more intensive.

12 Midterm Review Problem Set

1. Suppose that 2500 pregnant women are enrolled in a study and the outcome is the occurrence of preterm birth. Possible predictors of preterm birth include age of the woman, smoking, socioeconomic status, body mass index, bleeding during pregnancy, serum level of DDE, and several dietary factors. Formulate the problem of selecting the important predictors of preterm birth in a generalized linear model (GLM) framework. Show the components of the GLM, including the link function and distribution (in exponential family form). Describe (briefly) how estimation and inference could proceed via a frequentist approach.

2. Women are enrolled in a study when they go off contraception with the intention of achieving a pregnancy. Suppose there are 350 women in the study who provide information on the number of menstrual cycles required to achieve a pregnancy, whether or not they smoke cigarettes, and their age at the beginning of the attempt. Describe a statistical model for addressing the question: Is cigarette smoking related to time to pregnancy? Formulate the statistical model within a Bayesian framework and outline the details of model fitting and inference (including the form of the posterior density, an outline of the algorithm for posterior computation, and the approach for addressing the scientific question based on the posterior).

3. A study is conducted to examine the impact of alcohol intake during pregnancy on the occurrence of birth defects of 5 different types. Outcome data for a child consist of 5 binary indicators of the presence or absence of the different birth defects. A physician working with you on the study notes that certain children have several birth defects, possibly due to defects in important unmeasured genes, while most children have no defects. Describe a latent variable model for analyzing these data and outline (briefly) the details of a Bayesian analysis (including the form of the posterior density, an outline of the algorithm for posterior computation, and the approach for addressing the scientific question based on the posterior).

4. A toxicology study is conducted in which pregnant mice are exposed to different doses of a chemical. The outcome data consist of an ordinal ranking of the sickness of each pup in each litter, with 1 = healthy, 2 = low birth weight but otherwise healthy, 3 = malformed, and 4 = dead. The goal of the study is to see if dose is associated with the health of the pup. Describe a model and analytic strategy. What is the interpretation of the model parameters? What assumptions are being made, and can they be relaxed?
