Latent Variable Models for Binary Data

Suppose that for a given vector of explanatory variables x, the latent variable U has a continuous cumulative distribution function F(u; x), and that the binary response Y = 1 is recorded if and only if U > 0:

θ = Pr(Y = 1 | x) = 1 − F(0; x).

Since U is not directly observed, there is no loss of generality in taking the critical value (i.e., cutoff point) to be 0. In addition, we can take the standard deviation of U (or some other measure of dispersion) to be 1, again without loss of generality.

Probit Models

For example, if U ∼ N(x_i′β, 1) it follows that

θ_i = Pr(Y = 1 | x_i) = Φ(x_i′β),

where Φ(·) is the cumulative normal distribution function,

Φ(t) = (2π)^{−1/2} ∫_{−∞}^t exp(−z²/2) dz.

The relation is linearized by the inverse normal transformation,

Φ^{−1}(θ_i) = x_i′β = Σ_{j=1}^p x_{ij} β_j.
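As a quick numerical illustration of the probit link and its inverse (a Python sketch; the value of x_i′β is made up), the standard library's NormalDist provides Φ and Φ^{−1} directly:

```python
from statistics import NormalDist

Phi = NormalDist().cdf          # standard normal cdf, Phi(t)
Phi_inv = NormalDist().inv_cdf  # inverse normal transformation, Phi^{-1}

eta = 0.7                       # hypothetical linear predictor x_i'beta
theta = Phi(eta)                # success probability under the probit model

# The inverse normal transformation linearizes the relation:
assert abs(Phi_inv(theta) - eta) < 1e-9
```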

We have regarded the cutoff value of U as fixed and the mean of U as changing with x. Alternatively, one could assume that the distribution of U is fixed and allow the critical value to vary with x (e.g., dose). In toxicology studies where dose is the explanatory variable, it makes sense to let V denote the minimum dose needed to produce a response (i.e., the tolerance).

Under the second formulation, y_i = 1 if x_i′β > v_i. It follows that

Pr(Y = 1 | x_i) = Pr(V ≤ x_i′β).

Note that the shape of the dose-response curve is determined by the distribution function of V. If V ∼ N(0, 1), then Pr(Y = 1 | x_i) = Φ(x_i′β), and it follows that the U and V formulations are equivalent. The U formulation is more common.

Latent Utilities & Choice Models

Suppose that Fred is choosing between 2 brands of a product (say, Ben & Jerry's or Häagen-Dazs). Fred has a utility for Ben & Jerry's (denoted by Z_i1) and a utility for Häagen-Dazs (denoted by Z_i2). Letting the difference in utilities be represented by the normal linear model, we have

U_i = Z_i1 − Z_i2 = x_i′β + ɛ_i, where ɛ_i ∼ N(0, 1).

If Fred has a higher utility for Ben & Jerry's, then Z_i1 > Z_i2, U_i > 0, and Fred will choose Ben & Jerry's (Y_i = 1).
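A small Monte Carlo sketch of this setup (Python; the linear predictor value is hypothetical): simulating the latent utility difference U_i = x_i′β + ɛ_i and counting how often U_i > 0 recovers the probit choice probability Φ(x_i′β).

```python
import random
from statistics import NormalDist

random.seed(1)
xb = 0.4                        # hypothetical difference in mean utilities, x_i'beta
n = 50_000

# U = x'beta + eps with eps ~ N(0, 1); Fred chooses brand 1 when U > 0
wins = sum(random.gauss(xb, 1.0) > 0 for _ in range(n))
p_hat = wins / n

# The simulated choice frequency matches the probit form Phi(x'beta)
assert abs(p_hat - NormalDist().cdf(xb)) < 0.01
```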

This latent utility formulation is again equivalent to a probit model for the binary response. The generalization to a multinomial response is straightforward: introduce k latent utilities instead of 2, and let an individual's response (i.e., choice) correspond to the category with the maximum utility. Although the probit model is preferred in bioassay and social science applications, the logistic model is preferred in the biomedical sciences. Of course, the choice of distribution function for U (and hence the choice of link in the binary response GLM) should be motivated by model fit.

Logistic Regression

The normal form is only one possibility for the distribution of U. Another is the logistic distribution with location x_i′β and unit scale, which has cumulative distribution function

F(u) = exp(u − x_i′β) / {1 + exp(u − x_i′β)},

so that

F(0; x_i) = 1 / {1 + exp(x_i′β)}.

It follows that

Pr(Y = 1 | x_i) = Pr(U > 0 | x_i) = 1 − F(0; x_i) = 1 / {1 + exp(−x_i′β)}.

To linearize this relation, we take the logit transformation of both sides:

log{θ_i / (1 − θ_i)} = x_i′β.
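The logit transformation and its inverse can be checked numerically (a minimal Python sketch; the linear predictor value is made up):

```python
import math

def inv_logit(eta):
    """Pr(Y = 1 | x) = 1 / (1 + exp(-x'beta)) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(theta):
    """log{theta / (1 - theta)}, the linearizing transformation."""
    return math.log(theta / (1.0 - theta))

eta = 1.3                       # hypothetical x_i'beta
theta = inv_logit(eta)

# The logit transformation recovers the linear predictor
assert abs(logit(theta) - eta) < 1e-12
```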

Homework Exercise: For x_i = (1, x)′ and β_2 > 0, reformulate the logistic regression in terms of a threshold model (i.e., the V formulation of the probit model described above). Derive the probability density function (pdf) obtained by differentiating Pr(Y = 1 | x_i) with respect to x. Reparameterize in terms of τ = 1/β_2 and µ = β_1/β_2. Plot this pdf for µ = 0 and πτ/√3 = 1 along with the N(0, 1) pdf in S-PLUS. Which density has the fatter tails? Is the pdf for x in the logistic case in the exponential family?

Some Generalizations of the Logistic Model

The logistic regression model assumes a restricted dose-response shape, and it is possible to generalize the model to relax this restriction. Aranda-Ordaz (1981) proposed two families of linearizing transformations, which can easily be inverted and which span a range of forms. The first, which is restricted to symmetric cases (i.e., invariant to interchanging success & failure), is

(2/ν) {θ^ν − (1 − θ)^ν} / {θ^ν + (1 − θ)^ν}.

In the limit as ν → 0 this is logistic, and for ν = 1 it is linear. The second family is

log[{(1 − θ)^{−ν} − 1}/ν],

which reduces to the extreme value model when ν = 0 and the logistic when ν = 1.
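The limiting behavior of the first (symmetric) family can be verified numerically (a Python sketch; the function name and the value of θ are illustrative):

```python
import math

def aranda_ordaz_sym(theta, nu):
    """First Aranda-Ordaz family: (2/nu) {theta^nu - (1-theta)^nu} / {theta^nu + (1-theta)^nu}."""
    a, b = theta ** nu, (1.0 - theta) ** nu
    return (2.0 / nu) * (a - b) / (a + b)

theta = 0.8
logit = math.log(theta / (1.0 - theta))

# As nu -> 0 the transformation approaches the logit ...
assert abs(aranda_ordaz_sym(theta, 1e-6) - logit) < 1e-6
# ... and at nu = 1 it is linear in theta: 2(2 theta - 1)
assert abs(aranda_ordaz_sym(theta, 1.0) - 2.0 * (2.0 * theta - 1.0)) < 1e-12
```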

When there is doubt about the transformation, a formal approach is to use one or the other of the above families and fit the resulting model for a range of possible values of ν. A profile likelihood can be obtained for ν by plotting the maximized likelihood against ν (frequentist). Potentially, one could choose a standard form, such as the logistic, if the corresponding value of ν falls within the 95% profile likelihood confidence region. Alternatively, we could choose a prior density for ν and implement a Bayesian approach.

Data Augmentation Algorithms for Probit Models (Albert & Chib, 1993, JASA, 669-679)

Now, suppose that

p_i = Pr(y_i = 1 | x_i, β) = Φ(x_i′β),

where Φ(·) is the N(0, 1) cdf. As discussed previously, this probit regression model is equivalent to

y_i = 1(z_i > 0), z_i ∼ N(x_i′β, 1),

where z_1, …, z_n are independent latent variables. Note that if the z_i are known and a multivariate normal prior is chosen for β, the posterior distribution is multivariate normal.

The z_i are unknown latent variables, which we introduce for computational convenience and which have no impact on the model interpretation. By introducing the z_i's, we are augmenting the observed data y = (y_1, …, y_n)′ with latent data z = (z_1, …, z_n)′. The joint posterior density of the unobservables β and z is

π(β, z | y) ∝ π(β) ∏_{i=1}^n {1(z_i > 0) 1(y_i = 1) + 1(z_i ≤ 0) 1(y_i = 0)} N(z_i; x_i′β, 1),

which is the prior for β times the prior for z given β times the likelihood for y given z and β.

Note that, integrating out the latent data, we have

π(β | y) = ∫ π(β, z | y) dz
∝ π(β) ∏_{i=1}^n {∫_{−∞}^0 N(z_i; x_i′β, 1) dz_i}^{1(y_i = 0)} {∫_0^∞ N(z_i; x_i′β, 1) dz_i}^{1(y_i = 1)}
= π(β) ∏_{i=1}^n Φ(−x_i′β)^{1(y_i = 0)} Φ(x_i′β)^{1(y_i = 1)}
= π(β) ∏_{i=1}^n {1 − Φ(x_i′β)}^{1(y_i = 0)} Φ(x_i′β)^{1(y_i = 1)},

so computation could proceed for this binary-response GLM using Gibbs sampling with adaptive rejection sampling (ARS), e.g., in WinBUGS. That procedure updates the parameters one at a time, however, and requires programming of ARS.
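The identity behind this marginalization, ∫_0^∞ N(z; x_i′β, 1) dz = Φ(x_i′β), can be checked by simple numerical integration (a Python sketch; the value of x_i′β and the quadrature settings are made up):

```python
import math

def Phi(t):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def npdf(z, m):
    """N(z; m, 1) density."""
    return math.exp(-0.5 * (z - m) ** 2) / math.sqrt(2.0 * math.pi)

m = 0.8                         # hypothetical x_i'beta
steps, upper = 20_000, 12.0     # midpoint rule on (0, 12]; the tail beyond is negligible
h = upper / steps
integral = sum(npdf((k + 0.5) * h, m) for k in range(steps)) * h

# Integrating the positive part of N(z; x'beta, 1) gives Phi(x'beta)
assert abs(integral - Phi(m)) < 1e-6
```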

Alternative: Data Augmentation Gibbs Sampler

1. Choose initial values for β and a prior density, β ∼ N(β_0, Σ_0).
2. Impute the latent data by sampling from the full conditional distribution,

π(z_i | β, y) ∝ {1(z_i > 0) 1(y_i = 1) + 1(z_i ≤ 0) 1(y_i = 0)} N(z_i; x_i′β, 1),

which is N(x_i′β, 1) truncated to z_i > 0 if y_i = 1 and to z_i ≤ 0 if y_i = 0.
3. Update β (jointly!) by sampling from the full conditional, π(β | z, y) = N(β̂, Σ_β), where Σ_β = (Σ_0^{−1} + X′X)^{−1} is the posterior covariance conditional on z, and β̂ = Σ_β (Σ_0^{−1} β_0 + X′z).
4. Repeat steps 2-3 until apparent convergence, and calculate posterior summaries for β based on a large number of additional iterates.
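The steps above can be sketched in Python (numpy for the linear algebra; the function name, simulated data, prior settings, and iteration counts are illustrative choices, not part of the original notes):

```python
import numpy as np
from statistics import NormalDist

def probit_gibbs(X, y, beta0, Sigma0, n_iter=1500, seed=0):
    """Sketch of the Albert-Chib data-augmentation Gibbs sampler for probit regression."""
    rng = np.random.default_rng(seed)
    nd = NormalDist()
    n, p = X.shape
    Sigma0_inv = np.linalg.inv(Sigma0)
    # The step-3 covariance does not depend on z, so compute it once
    Sigma_beta = np.linalg.inv(Sigma0_inv + X.T @ X)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # Step 2: z_i ~ N(x_i'beta, 1) truncated to z_i > 0 if y_i = 1
        # and z_i <= 0 if y_i = 0, drawn by inverting the normal cdf
        mu = X @ beta
        z = np.empty(n)
        for i in range(n):
            a = nd.cdf(-mu[i])                   # Pr(z_i <= 0 | beta)
            u = rng.uniform()
            q = a + u * (1.0 - a) if y[i] == 1 else u * a
            q = min(max(q, 1e-12), 1.0 - 1e-12)  # guard against cdf under/overflow
            z[i] = mu[i] + nd.inv_cdf(q)
        # Step 3: draw beta jointly from its multivariate normal full conditional
        beta_hat = Sigma_beta @ (Sigma0_inv @ beta0 + X.T @ z)
        beta = rng.multivariate_normal(beta_hat, Sigma_beta)
        draws[t] = beta
    return draws

# Illustrative usage on simulated data (true coefficients chosen arbitrarily):
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ np.array([0.3, 1.0]) + rng.normal(size=n) > 0).astype(int)
draws = probit_gibbs(X, y, beta0=np.zeros(2), Sigma0=100.0 * np.eye(2))
post_mean = draws[500:].mean(axis=0)             # discard burn-in iterates
```

Note that step 3 draws β in a single block, which is the source of the algorithm's efficiency relative to one-at-a-time ARS updates.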

Some Comments

If you like probit models, the Albert and Chib algorithm is extremely useful, being very easy to program and efficient relative to ARS. Probit models have the disadvantage that the regression coefficients cannot be expressed as a simple analytic function of the probability of response. However, by approximating the logistic model using a scale mixture of normals, one can modify the Albert and Chib approach for logistic regression (O'Brien and Dunson, 2003, ISDS Discussion Paper 03-08). Underlying normal models are not limited to univariate binary data; the generalizations are extremely useful.

Extending GLMs for Correlated Data

GLMs assume that the observations y_1, …, y_n are independent draws from an exponential family distribution. However, in many applications there may be dependency in the outcome data. For example, in longitudinal studies, repeated observations are collected for each study subject.

Longitudinal Studies

For subject i (i = 1, …, n), the outcome data consist of an n_i × 1 vector of measurements at follow-up times t_{i,1}, …, t_{i,n_i}. Instead of a single measurement y_i for subject i, we have a vector of measurements, y_i = (y_{i,1}, …, y_{i,n_i})′. Since different measurements for a subject may be correlated, the standard GLM is not appropriate.

Possibilities for Repeated Measures Data

1. Conditional model: allow the linear predictor to differ across study subjects, η_ij = x_ij′β + b_i, where x_ij does not include an intercept and b_i is a subject-specific parameter (i.e., subject is a blocking factor, in ANOVA jargon).
2. Marginal model: specify a marginal model for the population-averaged response, and construct a variance estimator which takes the correlation structure into account (e.g., generalized estimating equations; Liang and Zeger, 1986).
3. Mixed model: assume the regression coefficients for a subject are drawn from a population distribution, and estimate both the population and subject-specific parameters (Laird and Ware, 1982).