Bayesian Multivariate Logistic Regression


Sean M. O'Brien and David B. Dunson
Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC

Goals
- Brief review of existing methods
- Illustrate some useful computational techniques:
  - MCMC importance sampling (Hastings, 1970)
  - Data-augmentation Gibbs algorithm (Albert & Chib, 1993)
  - Methods for sampling from the truncated multivariate normal (Devroye, 1989; Geweke, 1989)
  - Metropolis algorithms for sampling correlation matrices (Chib & Greenberg, 1998; Chen & Dey, 1998)

Introduction
- Correlated binary data arise in numerous applications:
  - Longitudinal studies
  - Cluster-randomized trials
  - Epidemiologic studies of twins
- Approaches to regression analysis of multivariate binary and ordinal categorical data:
  - Generalized estimating equations (GEE)
  - Generalized linear mixed models (GLMMs)
- The logistic link is common in the health sciences (odds ratios)
- Some approaches that work well for frequentist inference do not work as well in the Bayesian context

Two main types of models
1. Cluster-specific models. Regression parameters have a cluster-specific interpretation. For example,
     Logit Pr[y_ij = 1] = x_ij'β + z_ij'b_i,  b_i ~ N(0, D)
2. Marginal models. Regression parameters have a population-average (marginal) interpretation (desirable for epidemiologic studies):
     Logit Pr[y_ij = 1] = x_ij'β
A full likelihood is not necessary for frequentist inference (can use GEE), but a full likelihood is needed for Bayesian inference.
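The interpretational difference can be checked numerically: under a logistic random-intercept (cluster-specific) model, averaging over the random effect pulls the population-average probability toward 1/2, so the same β means different things in the two model types. A minimal sketch, with hypothetical values for x'β and Var(b_i):

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def marginal_prob(lin_pred, sd_b, grid=4001, width=10.0):
    """Population-average Pr[y_ij = 1] = E_b[expit(lin_pred + b)], b ~ N(0, sd_b^2),
    computed by numerical integration over a fine grid."""
    b = np.linspace(-width * sd_b, width * sd_b, grid)
    dens = np.exp(-0.5 * (b / sd_b) ** 2) / (sd_b * np.sqrt(2.0 * np.pi))
    return float(np.sum(expit(lin_pred + b) * dens) * (b[1] - b[0]))

lin_pred = 2.0   # hypothetical x_ij' beta
sd_b = 2.0       # hypothetical sqrt(D)
p_cluster = expit(lin_pred)                 # prob. for a typical cluster (b_i = 0)
p_marginal = marginal_prob(lin_pred, sd_b)  # population-average prob.
print(p_cluster, p_marginal)
```

The marginal probability lies strictly between 1/2 and the cluster-specific one, which is why marginal models are preferred when population-average odds ratios are the inferential target.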

Full Likelihood Approaches to Marginal Models
Multivariate logistic regression: parameterization via cross-ratios (Glonek & McCullagh, 1995). Let ȳ_ij = 1 − y_ij. Then

  Logit Pr(y_ij = 1) = log[ E(y_ij) / E(ȳ_ij) ] = x_ij'β_j

  log{ [E(y_ij y_ih) / E(ȳ_ij y_ih)] / [E(y_ij ȳ_ih) / E(ȳ_ij ȳ_ih)] } = x_ijh'θ_jh

  log{ [E(y_ij y_ih y_ik) / E(ȳ_ij y_ih y_ik)] / [E(y_ij ȳ_ih y_ik) / E(ȳ_ij ȳ_ih y_ik)] }
  − log{ [E(y_ij y_ih ȳ_ik) / E(ȳ_ij y_ih ȳ_ik)] / [E(y_ij ȳ_ih ȳ_ik) / E(ȳ_ij ȳ_ih ȳ_ik)] } = x_ijhk'γ_jhk

  etc.

Typically one imposes additional restrictions to simplify the model.

Full Likelihood Approaches to Marginal Models
Multivariate probit models (Chib & Greenberg, 1998):
  y_ij = 1(z_ij > 0)
  z_ij = x_ij'β + e_ij
  e_i = (e_i1, ..., e_ip)' ~ N(0, R)
Notation:
- y_i = (y_i1, ..., y_ip)' is a vector of binary outcomes
- x_ij is a vector of predictors associated with y_ij
- R is a correlation matrix (for identifiability)
- β and R are the parameters to be estimated
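A short simulation from this latent-variable representation (the dimensions, β, and R below are hypothetical) shows how the correlation matrix R induces association among the binary outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Multivariate probit: z_i = X_i beta + e_i, e_i ~ N(0, R), y_ij = 1(z_ij > 0)
n, p, q = 50000, 3, 2                      # hypothetical sizes
beta = np.array([0.5, -1.0])               # hypothetical coefficients
R = np.array([[1.0, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 1.0]])            # correlation matrix (identifiability)

X = rng.normal(size=(n, p, q))             # predictors for each outcome j
e = rng.multivariate_normal(np.zeros(p), R, size=n)
z = np.einsum('ipq,q->ip', X, beta) + e    # latent normals
y = (z > 0).astype(int)                    # observed binary outcomes

# The correlated errors induce positive association between outcome pairs
pair_corr = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
print(y.mean(axis=0), pair_corr)
```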

Multivariate Categorical Regression Methods
- Generalized estimating equations (GEE): cannot be used for Bayesian inference.
- Generalized linear mixed models (GLMMs): the posterior is improper when simple non-informative priors are chosen (Natarajan and Kass, JASA, 2000); regression parameters have a subject-specific, not population-averaged, interpretation.
- Multivariate logistic regression: modeling dependency via multi-way odds ratios is unwieldy.
- Multivariate probit regression: advantages are simplified computation and modeling of dependency.

Objectives
Propose a new likelihood and computational algorithm for multivariate logistic regression:
- The model for individual outcomes is univariate logistic regression
- The correlation structure is similar to that of probit models
Advantages:
- Results can be summarized by odds ratios
- Simple and flexible correlation structure
- Computation is simple (as for probit models)
- Posterior is proper when non-informative priors are chosen

Model specification via underlying variables
Binary logistic regression, univariate case:
  Logit Pr[y_i = 1] = x_i'β
Equivalent model:
  y_i = 1(z_i > 0),  z_i ~ Logistic(x_i'β, 1)
where the logistic density is
  f(z; µ) = exp{−(z − µ)} / [1 + exp{−(z − µ)}]²
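The equivalence is easy to verify (µ = x_i'β below is an arbitrary illustrative value): the latent-variable formulation reproduces the logistic regression probability exactly, and a Monte Carlo check agrees.

```python
import numpy as np
from scipy.stats import logistic

rng = np.random.default_rng(1)

mu = 0.7                                     # hypothetical x_i' beta
p_direct = 1.0 / (1.0 + np.exp(-mu))         # inverting Logit Pr[y_i = 1] = mu
p_latent = logistic.sf(0.0, loc=mu)          # Pr[z_i > 0], z_i ~ Logistic(mu, 1)

z = logistic.rvs(loc=mu, size=200000, random_state=rng)
p_mc = (z > 0).mean()                        # Monte Carlo version of the same event
print(p_direct, p_latent, p_mc)
```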

Model specification via underlying variables
Multivariate generalization:
- Let y_i = (y_i1, ..., y_ip)' denote the vector of binary responses
- Let X_i denote the (p × q) matrix of predictors
  y_ij = 1(z_ij > 0)
  z_i = (z_i1, ..., z_ip)' ~ Multivariate Logistic(X_i β, R)

Choice of Multivariate Logistic Density
- There is a lack of flexible multivariate logistic distributions
- Need to define a new logistic density with a flexible correlation structure
- Approach: transform variables that follow a standard multivariate distribution
  Let t = (t_1, ..., t_p)' ~ Multivariate t_ν(0, R)
  Let z_j = µ_j + log[ F(t_j) / (1 − F(t_j)) ], where F(·) is the CDF of t_j
  Then z = (z_1, ..., z_p)' is Multivariate Logistic(µ, R)
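The construction can be sketched directly (ν, µ, and R below are illustrative choices): draw multivariate t as scale-mixed normals, push each margin through the t CDF, then apply the logit transform. Each margin of z should then be univariate Logistic(µ_j, 1), i.e. mean µ_j and variance π²/3.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(2)

nu = 7.3                                   # illustrative degrees of freedom
mu = np.array([-1.0, 0.0, 2.0])            # illustrative locations
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

n = 200000
phi = rng.gamma(nu / 2.0, 2.0 / nu, size=n)            # Gamma(nu/2, rate nu/2)
normals = rng.multivariate_normal(np.zeros(3), R, size=n)
t_draws = normals / np.sqrt(phi)[:, None]              # t ~ Multivariate t_nu(0, R)

u = np.clip(t_dist.cdf(t_draws, df=nu), 1e-12, 1 - 1e-12)   # F(t_j), clipped for safety
z = mu + np.log(u / (1.0 - u))                         # z_j = mu_j + logit(F(t_j))
print(z.mean(axis=0), z.var(axis=0))                   # ~ mu and ~ pi^2/3 per margin
```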

Form of proposed multivariate logistic density

  L_p(z | µ, R) = T_{p,ν}(t | 0, R) ∏_{j=1}^p [ L(z_j | µ_j) / T_ν(t_j | 0, 1) ],   (1)

where the conventional multivariate t density is

  T_{p,ν}(t | µ, Σ) = { Γ((ν + p)/2) / [ Γ(ν/2)(νπ)^{p/2} |Σ|^{1/2} ] } [ 1 + (1/ν)(t − µ)'Σ^{−1}(t − µ) ]^{−(ν+p)/2},

t_j = F^{−1}( e^{z_j} / (e^{z_j} + e^{µ_j}) ) with F^{−1}(·) denoting the inverse CDF of the T_ν(0, 1) density, t = (t_1, ..., t_p)', µ = (µ_1, ..., µ_p)', L(· | µ_j) is the univariate logistic density with location µ_j, and R is a correlation matrix (i.e., with 1's on the diagonal).
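Density (1) can be evaluated numerically; a sketch using SciPy's multivariate t is below. A convenient sanity check: for p = 1 the t factors cancel, so the density must reduce to the univariate logistic density L(z | µ).

```python
import numpy as np
from scipy.stats import t as t_dist, logistic, multivariate_t

def mv_logistic_pdf(z, mu, R, nu=7.3):
    """Density (1): T_{p,nu}(t | 0, R) * prod_j L(z_j | mu_j) / T_nu(t_j | 0, 1),
    with t_j = F^{-1}(e^{z_j} / (e^{z_j} + e^{mu_j})). Illustrative implementation."""
    z, mu = np.atleast_2d(z), np.asarray(mu, dtype=float)
    u = 1.0 / (1.0 + np.exp(-(z - mu)))        # e^{z_j} / (e^{z_j} + e^{mu_j})
    t = t_dist.ppf(u, df=nu)                   # t_j
    joint = multivariate_t(loc=np.zeros(len(mu)), shape=R, df=nu).pdf(t)
    ratio = logistic.pdf(z, loc=mu) / t_dist.pdf(t, df=nu)
    return np.atleast_1d(joint) * np.prod(ratio, axis=1)

# p = 1 reduction: should equal the logistic density
val = mv_logistic_pdf([0.3], [0.0], np.eye(1))[0]
print(val, logistic.pdf(0.3))
```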

Bayesian Implementation
Probability model:
  y_ij = 1(z_ij > 0)
  z_i = (z_i1, ..., z_ip)' ~ Multivariate Logistic(X_i β, R_i)
  R_i = R_i(θ, X_i)
Prior specification:
- Assume π(β, θ) = π(β)π(θ)
- Choose π(β) = N(β_0, Σ_β) or π(β) ∝ 1
- Can use any prior for π(θ) (including uniform)

Posterior Computation
Use MCMC importance sampling (Hastings, 1970):
1. Use a data-augmentation/Gibbs/Metropolis algorithm to sample from an approximation to the posterior, π_approx(θ | data)
2. Assign importance weights: letting θ denote a sample from π_approx(θ | data),
     importance weight = π_exact(θ | data) / π_approx(θ | data)
The approximation is based on a (multivariate) t approximation to the (multivariate) logistic (see Albert & Chib, 1993). The nearly perfect approximation makes importance sampling highly efficient.

MCMC Importance Sampling (Hastings, 1970)
A method to calculate population means, moments, percentiles, and other expectations of the form
  E = ∫ g(x) π(x) dx
Suppose we have an MCMC algorithm that samples from π*(x) ≈ π(x):
- Draw a sample {x^(1), ..., x^(T)} from an MCMC that targets π*(x)
- Define importance weights w^(t) = π(x^(t)) / π*(x^(t))
- One can prove that, as T → ∞,
  Ê = [ Σ_{t=1}^T w^(t) g(x^(t)) ] / [ Σ_{t=1}^T w^(t) ] → ∫ g(x) π(x) dx
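The scheme can be demonstrated with the same ingredients used later in the talk: take the standard logistic as the exact target π and a variance-matched scaled t_ν as the approximation π* (ν = 8 here is an illustrative choice, not necessarily the paper's). The self-normalized estimate recovers the logistic second moment π²/3, and the effective sample size stays near 100% because the approximation is nearly perfect:

```python
import numpy as np
from scipy.stats import logistic, t as t_dist

rng = np.random.default_rng(3)

nu = 8.0                                             # illustrative df
sigma = np.sqrt(np.pi**2 * (nu - 2.0) / (3.0 * nu))  # match logistic variance pi^2/3

# Draw from pi*(x): x = sigma * t_nu  (a stand-in for the MCMC draws)
x = sigma * t_dist.rvs(df=nu, size=200000, random_state=rng)

# Importance weights w = pi(x) / pi*(x), computed on the log scale for stability
log_w = logistic.logpdf(x) - (t_dist.logpdf(x / sigma, df=nu) - np.log(sigma))
w = np.exp(log_w - log_w.max())

est_second_moment = np.sum(w * x**2) / np.sum(w)     # self-normalized estimate of E[x^2]
ess_frac = np.sum(w)**2 / (np.sum(w**2) * len(x))    # effective sample size fraction
print(est_second_moment, ess_frac)
```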

Computation for Multivariate Logistic Regression

True model (logistic):
  y_ij = 1(z_ij > 0)
  z_ij = x_ij'β + log[ F(t_ij) / (1 − F(t_ij)) ]
  t_i ~ N(0, φ_i^{−1} R),  φ_i ~ Gamma(ν/2, ν/2)

Approximation (t-link):
  y_ij = 1(z_ij > 0)
  z_ij = x_ij'β + σ t_ij
  t_i ~ N(0, φ_i^{−1} R),  φ_i ~ Gamma(ν/2, ν/2)

When ν and σ² are appropriately chosen, these two models yield virtually identical inferences about β and R. We sample from the posterior under the multivariate t-link model:
- Data-augmentation approach (Albert & Chib, 1993)
- Gibbs steps to update β, the t_i's, and the φ_i's
- Metropolis step to update R
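The quality of the t-link approximation can be checked by comparing the link functions directly: the standard logistic CDF versus the CDF of σ t_ν with variance-matched σ. Here ν = 8 is an illustrative choice; the paper selects (ν, σ²) so the two links are virtually indistinguishable.

```python
import numpy as np
from scipy.stats import t as t_dist

nu = 8.0                                             # illustrative df
sigma = np.sqrt(np.pi**2 * (nu - 2.0) / (3.0 * nu))  # variance-matched scale

z = np.linspace(-10.0, 10.0, 4001)
logistic_cdf = 1.0 / (1.0 + np.exp(-z))              # logistic link
t_link_cdf = t_dist.cdf(z / sigma, df=nu)            # scaled-t link
max_gap = float(np.abs(logistic_cdf - t_link_cdf).max())
print(max_gap)                                       # maximum absolute discrepancy
```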

Full conditionals for the t-link model
Likelihood:
  y_ij = 1(z_ij > 0),  φ_i ~iid Gamma(ν/2, ν/2),
  z_i = (z_i1, ..., z_ip)' ~ N( X_i β, (σ²/φ_i) R )
Prior: β ~ N(β_0, Σ_β),  R ~ π(R)
Full conditionals:
1. β | z, R, y, φ ~ N(AB, A), where
     A = [ Σ_β^{−1} + σ^{−2} Σ_{i=1}^n φ_i X_i'R^{−1}X_i ]^{−1}
     B = Σ_β^{−1} β_0 + σ^{−2} Σ_{i=1}^n φ_i X_i'R^{−1} z_i
2. z_i | β, R, y, z_(−i), φ ~ TN_{Ω_{y_i}}( X_i β, (σ²/φ_i) R ), a multivariate normal truncated to the region consistent with y_i
3. φ_i | β, R, y, z, φ_(−i) ~ Gamma( (ν + p)/2, [ν + σ^{−2}(z_i − X_i β)'R^{−1}(z_i − X_i β)]/2 )
4. R: use a Metropolis step
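Full conditional 1 is a standard conjugate normal update; a minimal sketch, with hypothetical dimensions and current-state values standing in for the MCMC state:

```python
import numpy as np

rng = np.random.default_rng(4)

def update_beta(X, z, phi, R, sigma2, beta0, Sigma_beta, rng):
    """Draw beta ~ N(A B, A) with
    A = [Sigma_beta^{-1} + sigma^{-2} sum_i phi_i X_i' R^{-1} X_i]^{-1},
    B =  Sigma_beta^{-1} beta0 + sigma^{-2} sum_i phi_i X_i' R^{-1} z_i."""
    Rinv = np.linalg.inv(R)
    Sb_inv = np.linalg.inv(Sigma_beta)
    prec = Sb_inv.copy()
    B = Sb_inv @ beta0
    for Xi, zi, phii in zip(X, z, phi):
        prec += (phii / sigma2) * Xi.T @ Rinv @ Xi
        B += (phii / sigma2) * Xi.T @ Rinv @ zi
    A = np.linalg.inv(prec)
    return rng.multivariate_normal(A @ B, A)

n, p, q = 30, 3, 2                         # hypothetical sizes
X = rng.normal(size=(n, p, q))             # stacked X_i matrices
z = rng.normal(size=(n, p))                # current latent draws
phi = rng.gamma(4.0, 0.25, size=n)         # current mixing variables
beta = update_beta(X, z, phi, np.eye(p), 1.0, np.zeros(q), 10.0 * np.eye(q), rng)
print(beta)
```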

Metropolis step for R
Sample a candidate value R̃ for the p* = p(p − 1)/2 unique elements of R:
  unique(R̃) ~ N_{p*}( unique(R^{(t−1)}), Ω ),
where Ω is chosen by experimentation to yield a desirable acceptance probability.
If R̃ is positive definite, set R^(t) = R̃ with probability
  min{ 1, [ π(R̃) ∏_{i=1}^n N_p( z_i^(t) | X_i β^(t), (σ²/φ_i^(t)) R̃ ) ] / [ π(R^{(t−1)}) ∏_{i=1}^n N_p( z_i^(t) | X_i β^(t), (σ²/φ_i^(t)) R^{(t−1)} ) ] }
and set R^(t) = R^{(t−1)} otherwise. If R̃ is not positive definite, set R^(t) = R^{(t−1)}.
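A sketch of this random-walk step under a uniform prior on R (so the prior ratio cancels); the latent draws, means, and scales below are hypothetical placeholders for the current MCMC state:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)

def update_R(R, z, means, scales, step, rng):
    """One Metropolis update of the unique off-diagonal elements of R; z, means,
    and scales stand in for z_i^(t), X_i beta^(t), and sigma^2 / phi_i^(t)."""
    p = R.shape[0]
    iu = np.triu_indices(p, k=1)
    cand = R.copy()
    cand[iu] = R[iu] + step * rng.normal(size=len(iu[0]))  # perturb unique elements
    cand[(iu[1], iu[0])] = cand[iu]                        # restore symmetry
    if np.any(np.linalg.eigvalsh(cand) <= 0.0):            # not positive definite
        return R
    def loglik(C):
        return sum(multivariate_normal(mean=m, cov=s * C).logpdf(zi)
                   for zi, m, s in zip(z, means, scales))
    log_ratio = loglik(cand) - loglik(R)                   # uniform prior cancels
    return cand if np.log(rng.uniform()) < log_ratio else R

p, n = 3, 40
true_R = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
z = rng.multivariate_normal(np.zeros(p), true_R, size=n)   # stand-in latent draws
R = np.eye(p)
for _ in range(20):
    R = update_R(R, z, np.zeros((n, p)), np.ones(n), 0.05, rng)
print(R)
```

By construction the chain never leaves the set of valid correlation matrices: the diagonal is untouched and non-positive-definite candidates are rejected outright.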

Weights for Importance Sampling
  weight ∝ π_true(β, R, z | y) / π_approx(β, R, z | y)
An equivalent computational formula is
  weight = π_logistic(z | β, R) / π_t-link(z | β, R)
         = ∏_{i=1}^n { [ T_{p,ν}(t_i | 0, R) / T_{p,ν}(z_i | X_i β, σ² R) ] ∏_{j=1}^p [ L(z_ij | x_ij'β) / T_{1,ν}(t_ij | 0) ] },
where t_i = (t_i1, ..., t_ip)' is defined by t_ij = F^{−1}( e^{z_ij} / (e^{z_ij} + e^{x_ij'β}) ) with F^{−1}(·) denoting the inverse CDF of the T_ν(0, 1) density.

Extension to ordered categorical data
Cumulative logits model, univariate case:
  Logit Pr[y_i ≤ k] = α_k − x_i'β
Equivalent latent-variable model:
  y_i = 1 if z_i ∈ (−∞, α_1)
        2 if z_i ∈ (α_1, α_2)
        ...
        k if z_i ∈ (α_{k−1}, ∞),
  z_i ~ Logistic(x_i'β, 1)
- If π(α) ∝ 1, then the full conditional of α_j is uniform.
- The full conditional of z_i is truncated to fall in (α_{y_i − 1}, α_{y_i}).
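The two formulations match term by term (the cutpoints and x_i'β below are arbitrary illustrative values): Pr[y_i ≤ k] from the cumulative-logit formula equals Pr[z_i < α_k] under the latent logistic model.

```python
import numpy as np
from scipy.stats import logistic

alpha = np.array([-1.0, 0.5, 2.0])     # illustrative ordered cutpoints
xb = 0.8                               # illustrative x_i' beta

cum_direct = 1.0 / (1.0 + np.exp(-(alpha - xb)))  # expit(alpha_k - x_i' beta)
cum_latent = logistic.cdf(alpha, loc=xb)          # Pr[z_i < alpha_k]
print(cum_direct, cum_latent)
```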

Approaches to sampling from the truncated normal
1. Inverse CDF method:
   Draw u ~ Uniform( Φ((a − µ)/σ), Φ((b − µ)/σ) )
   Set z = µ + σ Φ^{−1}(u)
   - Computing Φ^{−1}(·) is slow
   - Splus crashes when (a − µ)/σ > 8
2. Rejection sampling with an exponential proposal for the tail region (a, ∞):
   Draw E_i ~iid Exponential(1), i = 1, 2
   Repeat until E_1² ≤ 2 σ^{−2} (a − µ)² E_2
   Set x = a + σ² E_1 / (a − µ)
3. Geweke (1989) proposed a mixed-rejection algorithm that chooses between (i) normal rejection sampling, (ii) uniform rejection sampling, and (iii) exponential rejection sampling.
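Sketches of methods 1 and 2, with hypothetical truncation bounds (method 2 targets the tail region (a, ∞) of a N(µ, σ²)), checked against SciPy's exact truncated-normal moments:

```python
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(6)

def tn_inverse_cdf(mu, sigma, a, b, size, rng):
    """Inverse-CDF method: u ~ Uniform(Phi((a-mu)/sigma), Phi((b-mu)/sigma)),
    z = mu + sigma * Phi^{-1}(u); slow and unstable deep in the tail."""
    lo, hi = norm.cdf((a - mu) / sigma), norm.cdf((b - mu) / sigma)
    return mu + sigma * norm.ppf(rng.uniform(lo, hi, size=size))

def tn_exp_rejection(mu, sigma, a, size, rng):
    """Exponential rejection for (a, inf): draw E1, E2 ~ Exp(1), accept when
    E1^2 <= 2 (a - mu)^2 E2 / sigma^2, and set x = a + sigma^2 E1 / (a - mu)."""
    alpha = (a - mu) / sigma
    out = np.empty(size)
    for i in range(size):
        while True:
            e1, e2 = rng.exponential(), rng.exponential()
            if e1 * e1 <= 2.0 * alpha * alpha * e2:
                out[i] = a + sigma * e1 / alpha
                break
    return out

x1 = tn_inverse_cdf(0.0, 1.0, 0.5, 2.0, 50000, rng)
x2 = tn_exp_rejection(0.0, 1.0, 3.0, 20000, rng)
# Compare against the exact truncated-normal means
print(x1.mean(), truncnorm.mean(0.5, 2.0), x2.mean(), truncnorm.mean(3.0, np.inf))
```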

Trick for verifying propriety
Let y denote the (n × p) matrix of outcomes and let y_j denote the data in the jth column of y; in other words, y_j = {y_1j, ..., y_nj}. Let π[β | y_j] denote the posterior distribution of β given y_j obtained by fitting a univariate logistic regression model with π[β] ∝ 1.
Theorem: If at least one π[β | y_j] is proper, then π[β, R | y] is proper.
Proof sketch: the normalizing constant satisfies
  ∫∫ Pr(y | β, R) dβ dR = ∫∫ Pr(y_j | β) Pr(y_(−j) | β, R, y_j) dβ dR
                        ≤ ∫ Pr(y_j | β) dβ ∫ dR ∝ ∫ π[β | y_j] dβ ∫ dR < ∞,
since Pr(y_(−j) | β, R, y_j) ≤ 1, the marginal likelihood Pr(y_j | β) does not depend on R (the margins are univariate logistic), and the set of correlation matrices is bounded.
Note: π[β | y_j] is proper if the MLE exists. Programs like SAS PROC LOGISTIC automatically check for existence of the MLE.

Example
- Data: all twin pregnancies (n = 584) enrolled in the Collaborative Perinatal Project from 1959 to 1965
- Outcome: small for gestational age (SGA) birth
- Covariates: gender, maternal age, years of cigarette smoking, weight gain during pregnancy, gestational age at delivery, and variables relating to obstetric history
- Previously analyzed by Ananth and Preisser via maximum likelihood using a different bivariate logistic model, with odds ratios used to model the within-twins association
- Goals: (i) to assess the efficiency of importance sampling; (ii) to assess whether the two different models yield similar results

Example - Model and Prior Specification
Marginal probability model (same as Ananth & Preisser):
  logit Pr(y_ij = 1 | x_ij, β, θ) = x_ij'β
Correlation model: let ρ_i denote the single free correlation parameter in R_i,
  ρ_i = θ_1 if subject i is primiparous
        θ_2 if subject i is multiparous
Prior: π(β, θ_1, θ_2) ∝ 1

Table 1. Bivariate logistic regression analysis of SGA in twins

                                        A & P, 1999     Posterior Summary
Covariate                               MLE (SE)        Mean (SD)       OR (95% CI)
Intercept                   β_0         3.10 (1.62)     2.97 (1.63)
Female infant               β_1         0.36 (0.17)     0.35 (0.17)     1.43 (1.01, 2.01)
Pregnancy history           β_2        -0.36 (0.26)    -0.39 (0.26)     0.68 (0.41, 1.13)
                            β_3        -0.92 (0.38)    -0.97 (0.38)     0.38 (0.18, 0.79)
                            β_4         0.44 (0.33)     0.42 (0.32)     1.53 (0.81, 2.87)
                            β_5         0.38 (0.46)     0.45 (0.44)     1.57 (0.66, 3.68)
log(age)                    β_6        -1.03 (0.37)    -1.00 (0.47)     0.37 (0.15, 0.91)
log(wt gain + 6)            β_7        -0.47 (0.17)    -0.46 (0.17)     0.63 (0.45, 0.88)
log(yrs smoking + 1)        β_8         0.26 (0.10)     0.25 (0.09)     1.28 (1.07, 1.55)
(Gest age - 37)             β_9         0.21 (0.04)     0.21 (0.04)
(Gest age - 37)²            β_10        0.02 (0.01)     0.02 (0.01)
Correlation (primiparous)   θ_1                         0.16 (0.16)
Correlation (multiparous)   θ_2                         0.35 (0.07)

Pr[θ_2 > θ_1 | data] = 86%

Conclusions
- The computational algorithm is easy to program and efficient
- The posterior is proper under mild conditions
- Uses an underlying normal framework, similar to probit models
- Has a marginal logistic interpretation for individual outcomes
- Generalizations are straightforward:
  - Multivariate polychotomous outcomes
  - Mixed discrete and continuous outcomes