Likelihoods for Multivariate Binary Data

Log-Linear Model

We have $2^n - 1$ distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates. One choice is the log-linear model:

$$\Pr(Y = y) = c(\theta) \exp\left( \sum_j \theta^{(1)}_j y_j + \sum_{j_1 < j_2} \theta^{(2)}_{j_1 j_2} y_{j_1} y_{j_2} + \ldots + \theta^{(n)}_{12\ldots n} y_1 \cdots y_n \right),$$

with $2^n - 1$ parameters $\theta = (\theta^{(1)}_1, \ldots, \theta^{(1)}_n, \theta^{(2)}_{12}, \ldots, \theta^{(2)}_{n-1,n}, \ldots, \theta^{(n)}_{12\ldots n})^T$, and where $c(\theta)$ is the normalizing constant.

This formulation allows calculation of cell probabilities, but is less useful for describing $\Pr(Y = y)$ as a function of $x$. Note that we have $2^n - 1$ parameters and two aims: reduce this number, and introduce a regression model.

Example: $n = 2$. We have

$$\Pr(Y_1 = y_1, Y_2 = y_2) = c(\theta) \exp\left( \theta^{(1)}_1 y_1 + \theta^{(1)}_2 y_2 + \theta^{(2)}_{12} y_1 y_2 \right),$$

where $\theta = (\theta^{(1)}_1, \theta^{(1)}_2, \theta^{(2)}_{12})^T$ and

$$c(\theta)^{-1} = \sum_{y_1 = 0}^{1} \sum_{y_2 = 0}^{1} \exp\left( \theta^{(1)}_1 y_1 + \theta^{(1)}_2 y_2 + \theta^{(2)}_{12} y_1 y_2 \right).$$

The four cell probabilities are:

 y_1  y_2  Pr(Y_1 = y_1, Y_2 = y_2)
  0    0   c(θ)
  1    0   c(θ) exp(θ^{(1)}_1)
  0    1   c(θ) exp(θ^{(1)}_2)
  1    1   c(θ) exp(θ^{(1)}_1 + θ^{(1)}_2 + θ^{(2)}_{12})
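As a sanity check, the $n = 2$ cell probabilities can be computed by direct enumeration. A minimal R sketch, with illustrative values for the $\theta$'s:

## Cell probabilities for the n = 2 log-linear model (illustrative theta values)
theta1  <- 0.5    # theta^(1)_1
theta2  <- -0.3   # theta^(1)_2
theta12 <- 0.8    # theta^(2)_12

y   <- expand.grid(y1 = 0:1, y2 = 0:1)
lin <- theta1 * y$y1 + theta2 * y$y2 + theta12 * y$y1 * y$y2
p   <- exp(lin) / sum(exp(lin))   # c(theta) is the reciprocal of the sum
cbind(y, prob = p)

## The cross-product ratio recovers exp(theta12):
p00 <- p[1]; p10 <- p[2]; p01 <- p[3]; p11 <- p[4]
(p11 * p00) / (p10 * p01)         # equals exp(0.8)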

Hence we have the interpretations:

$$\exp(\theta^{(1)}_1) = \frac{\Pr(Y_1 = 1, Y_2 = 0)}{\Pr(Y_1 = 0, Y_2 = 0)} = \frac{\Pr(Y_1 = 1 \mid Y_2 = 0)}{\Pr(Y_1 = 0 \mid Y_2 = 0)},$$

the odds of an event at trial 1, given no event at trial 2;

$$\exp(\theta^{(1)}_2) = \frac{\Pr(Y_1 = 0, Y_2 = 1)}{\Pr(Y_1 = 0, Y_2 = 0)} = \frac{\Pr(Y_2 = 1 \mid Y_1 = 0)}{\Pr(Y_2 = 0 \mid Y_1 = 0)},$$

the odds of an event at trial 2, given no event at trial 1;

$$\exp(\theta^{(2)}_{12}) = \frac{\Pr(Y_1 = 1, Y_2 = 1)\Pr(Y_1 = 0, Y_2 = 0)}{\Pr(Y_1 = 1, Y_2 = 0)\Pr(Y_1 = 0, Y_2 = 1)} = \frac{\Pr(Y_2 = 1 \mid Y_1 = 1)/\Pr(Y_2 = 0 \mid Y_1 = 1)}{\Pr(Y_2 = 1 \mid Y_1 = 0)/\Pr(Y_2 = 0 \mid Y_1 = 0)},$$

the odds of an event at trial 2 given an event at trial 1, divided by the odds of an event at trial 2 given no event at trial 1. Hence if this parameter is larger than 1 we have positive dependence.

Quadratic Exponential Log-Linear Model

We describe three approaches to modeling the association between binary responses: conditional odds ratios, correlations, and marginal odds ratios. Zhao and Prentice (1990) consider the log-linear model with third- and higher-order terms set to zero, so that

$$\Pr(Y = y) = c(\theta) \exp\left( \sum_j \theta^{(1)}_j y_j + \sum_{j<k} \theta^{(2)}_{jk} y_j y_k \right).$$

For this model

$$\frac{\Pr(Y_j = 1 \mid Y_k = y_k, Y_l = 0, l \neq j, k)}{\Pr(Y_j = 0 \mid Y_k = y_k, Y_l = 0, l \neq j, k)} = \exp(\theta^{(1)}_j + \theta^{(2)}_{jk} y_k).$$

Interpretation:
- $\exp(\theta^{(1)}_j)$ is the odds of a success, given all other responses are zero.
- $\exp(\theta^{(2)}_{jk})$ is the odds ratio describing the association between $Y_j$ and $Y_k$, given all other responses are fixed (equal to zero).
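A small numerical check of the conditional-odds formula for $n = 3$, under assumed values for the $\theta$'s:

## Quadratic exponential model, n = 3: verify the conditional odds formula
theta  <- c(0.2, -0.1, 0.4)                      # theta^(1)_j, j = 1, 2, 3
theta2 <- c("12" = 0.5, "13" = 0, "23" = -0.3)   # theta^(2)_jk

y   <- as.matrix(expand.grid(y1 = 0:1, y2 = 0:1, y3 = 0:1))
lin <- y %*% theta +
  theta2["12"] * y[, 1] * y[, 2] +
  theta2["13"] * y[, 1] * y[, 3] +
  theta2["23"] * y[, 2] * y[, 3]
p <- exp(lin) / sum(exp(lin))

## Odds of Y1 = 1 given Y2 = 1, Y3 = 0 should equal exp(theta_1 + theta_12):
num <- p[y[, 1] == 1 & y[, 2] == 1 & y[, 3] == 0]
den <- p[y[, 1] == 0 & y[, 2] == 1 & y[, 3] == 0]
c(num / den, exp(theta[1] + theta2["12"]))       # the two values agree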

Limitations:

1. Suppose we now wish to model $\theta$ as a function of $x$. Example: $Y$ is respiratory infection, $x$ is mother's smoking (no/yes). Then we could let the parameters $\theta$ depend on $x$, i.e. $\theta(x)$. But the difference between $\theta^{(1)}_j(x = 1)$ and $\theta^{(1)}_j(x = 0)$ represents the effect of smoking on the conditional probability of respiratory infection at visit $j$, given that there was no infection at any other visit. This is difficult to interpret, and we would rather model the marginal probability.

2. The interpretation of the $\theta$ parameters depends on the number of responses $n$; this is particularly a problem in a longitudinal setting with different $n_i$.

Bahadur Representation

Another approach to parameterizing a multivariate binary model was proposed by Bahadur (1961), who used the marginal means, together with second- and higher-order moments specified in terms of correlations. Let

$$R_j = \frac{Y_j - \mu_j}{[\mu_j(1 - \mu_j)]^{1/2}},$$

$$\rho_{jk} = \text{corr}(Y_j, Y_k) = E[R_j R_k], \quad \rho_{jkl} = E[R_j R_k R_l], \quad \ldots, \quad \rho_{1,\ldots,n} = E[R_1 \cdots R_n].$$

Then we can write

$$\Pr(Y = y) = \prod_{j=1}^{n} \mu_j^{y_j}(1 - \mu_j)^{1 - y_j} \left( 1 + \sum_{j<k} \rho_{jk} r_j r_k + \sum_{j<k<l} \rho_{jkl} r_j r_k r_l + \ldots + \rho_{1,\ldots,n} r_1 r_2 \cdots r_n \right).$$

This is appealing because the marginal means $\mu_j$ appear directly, with the correlations as nuisance parameters.
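For small $n$ the Bahadur representation is easy to evaluate directly. A sketch for $n = 2$, with assumed values of $\mu_1$, $\mu_2$, and $\rho$ (the function name bahadur2 is ours):

## Bahadur representation, n = 2: joint probabilities from (mu1, mu2, rho)
bahadur2 <- function(y1, y2, mu1, mu2, rho) {
  r1 <- (y1 - mu1) / sqrt(mu1 * (1 - mu1))   # standardized residuals
  r2 <- (y2 - mu2) / sqrt(mu2 * (1 - mu2))
  mu1^y1 * (1 - mu1)^(1 - y1) * mu2^y2 * (1 - mu2)^(1 - y2) * (1 + rho * r1 * r2)
}
y <- expand.grid(y1 = 0:1, y2 = 0:1)
p <- bahadur2(y$y1, y$y2, mu1 = 0.4, mu2 = 0.6, rho = 0.2)
sum(p)               # probabilities sum to 1
sum(p[y$y1 == 1])    # marginal Pr(Y1 = 1) = 0.4, as required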

Limitations: Unfortunately, the correlations are constrained in complicated ways by the marginal means.

Example: consider measurements on a single individual, $Y_1$ and $Y_2$, with means $\mu_1$ and $\mu_2$. We have

$$\text{corr}(Y_1, Y_2) = \frac{\Pr(Y_1 = 1, Y_2 = 1) - \mu_1\mu_2}{\{\mu_1(1 - \mu_1)\mu_2(1 - \mu_2)\}^{1/2}}$$

and

$$\max(0, \mu_1 + \mu_2 - 1) \leq \Pr(Y_1 = 1, Y_2 = 1) \leq \min(\mu_1, \mu_2),$$

which implies complex constraints on the correlation. For example, if $\mu_1 = 0.8$ and $\mu_2 = 0.2$ then $-1 \leq \text{corr}(Y_1, Y_2) \leq 0.25$: the marginal means rule out all but weak positive correlation.

Marginal Odds Ratios

An alternative is to parameterize in terms of the marginal means and the marginal odds ratios defined by

$$\gamma_{jk} = \frac{\Pr(Y_j = 1, Y_k = 1)\Pr(Y_j = 0, Y_k = 0)}{\Pr(Y_j = 1, Y_k = 0)\Pr(Y_j = 0, Y_k = 1)} = \frac{\Pr(Y_j = 1 \mid Y_k = 1)/\Pr(Y_j = 0 \mid Y_k = 1)}{\Pr(Y_j = 1 \mid Y_k = 0)/\Pr(Y_j = 0 \mid Y_k = 0)},$$

which is the odds that the $j$-th observation is a 1, given the $k$-th observation is a 1, divided by the odds that the $j$-th observation is a 1, given the $k$-th observation is a 0. Hence if $\gamma_{jk} > 1$ we have positive dependence between outcomes $j$ and $k$.

It is then possible to obtain the joint distribution in terms of the means $\mu$, where $\mu_j = \Pr(Y_j = 1)$, the odds ratios $\gamma = (\gamma_{12}, \ldots, \gamma_{n-1,n})$, and contrasts of odds ratios.
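The constraint can be computed from the Fréchet bounds on $\Pr(Y_1 = 1, Y_2 = 1)$; a short sketch reproducing the $\mu_1 = 0.8$, $\mu_2 = 0.2$ example (the function name corr_bounds is ours):

## Bounds on corr(Y1, Y2) implied by the marginal means
corr_bounds <- function(mu1, mu2) {
  p11_lo <- max(0, mu1 + mu2 - 1)   # Frechet lower bound on Pr(Y1=1, Y2=1)
  p11_hi <- min(mu1, mu2)           # Frechet upper bound
  s <- sqrt(mu1 * (1 - mu1) * mu2 * (1 - mu2))
  c(lower = (p11_lo - mu1 * mu2) / s, upper = (p11_hi - mu1 * mu2) / s)
}
corr_bounds(0.8, 0.2)   # upper bound 0.25: strong positive correlation is impossible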

We need to find $E[Y_j Y_k] = \mu_{jk}$ so that we can write down the likelihood function, or an estimating function. For the case $n = 2$ we have

$$\gamma_{12} = \frac{\Pr(Y_1 = 1, Y_2 = 1)\Pr(Y_1 = 0, Y_2 = 0)}{\Pr(Y_1 = 1, Y_2 = 0)\Pr(Y_1 = 0, Y_2 = 1)} = \frac{\mu_{12}(1 - \mu_1 - \mu_2 + \mu_{12})}{(\mu_1 - \mu_{12})(\mu_2 - \mu_{12})},$$

and so

$$\mu_{12}^2(\gamma_{12} - 1) + \mu_{12} b + \gamma_{12}\mu_1\mu_2 = 0,$$

where $b = (\mu_1 + \mu_2)(1 - \gamma_{12}) - 1$, to give

$$\mu_{12} = \frac{-b \pm \sqrt{b^2 - 4(\gamma_{12} - 1)\gamma_{12}\mu_1\mu_2}}{2(\gamma_{12} - 1)}.$$

The joint probabilities can be laid out in a 2x2 table:

            Y_2 = 0                Y_2 = 1
 Y_1 = 0    1 - µ_1 - µ_2 + µ_12   µ_2 - µ_12   1 - µ_1
 Y_1 = 1    µ_1 - µ_12             µ_12         µ_1
            1 - µ_2                µ_2

Limitations

In a longitudinal setting (we add an $i$ subscript to denote individuals), finding the $\mu_{ijk}$ terms is computationally complex. Large numbers of nuisance odds ratios arise if the $n_i$ are large, so assumptions such as $\gamma_{ijk} = \gamma$ for all $i, j, k$ may be made. Another possibility is to take

$$\log \gamma_{ijk} = \alpha_0 + \alpha_1 |t_{ij} - t_{ik}|^{-1},$$

so that the degree of association is inversely proportional to the time between observations.
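Given $(\mu_1, \mu_2, \gamma_{12})$ the quadratic can be solved directly, taking the root that lies in the Fréchet interval. A sketch (the helper name mu12_from_or is ours):

## Solve for mu12 = Pr(Y1=1, Y2=1) given marginal means and odds ratio gamma
mu12_from_or <- function(mu1, mu2, gamma) {
  if (abs(gamma - 1) < 1e-10) return(mu1 * mu2)   # independence case
  b    <- (mu1 + mu2) * (1 - gamma) - 1
  disc <- b^2 - 4 * (gamma - 1) * gamma * mu1 * mu2
  (-b - sqrt(disc)) / (2 * (gamma - 1))  # root in [max(0, mu1+mu2-1), min(mu1, mu2)]
}
mu12 <- mu12_from_or(0.4, 0.6, gamma = 2)
## check: recompute the odds ratio from the implied 2x2 table
(mu12 * (1 - 0.4 - 0.6 + mu12)) / ((0.4 - mu12) * (0.6 - mu12))   # = 2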

Modeling Multivariate Binary Data Using GEE

For a marginal Bernoulli outcome we have

$$\Pr(Y_{ij} = y_{ij} \mid x_{ij}) = \mu_{ij}^{y_{ij}}(1 - \mu_{ij})^{1 - y_{ij}} = \exp\left( y_{ij}\theta_{ij} - \log\{1 + e^{\theta_{ij}}\} \right),$$

where $\theta_{ij} = \log(\mu_{ij}/(1 - \mu_{ij}))$, an exponential family representation. For independent responses we therefore have the likelihood

$$\Pr(Y \mid x) = \exp\left[ \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( y_{ij}\theta_{ij} - \log\{1 + e^{\theta_{ij}}\} \right) \right] = \exp\left[ \sum_{i=1}^{m} \sum_{j=1}^{n_i} l_{ij} \right].$$

To find the MLEs we consider the score equation:

$$G(\beta) = \frac{\partial l}{\partial \beta} = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \frac{\partial l_{ij}}{\partial \theta_{ij}} \frac{\partial \theta_{ij}}{\partial \beta} = \sum_{i=1}^{m} \sum_{j=1}^{n_i} x_{ij}(y_{ij} - \mu_{ij}) = \sum_{i=1}^{m} x_i^T(y_i - \mu_i).$$

So GEE with working independence can be implemented with standard software, though we need to fix up the standard errors via sandwich estimation.

Non-independence GEE

Assuming working correlation matrices $R_i(\alpha)$ gives the estimating equation

$$G(\beta) = \sum_{i=1}^{m} D_i^T W_i^{-1}(y_i - \mu_i),$$

where $W_i = \Delta_i^{1/2} R_i(\alpha) \Delta_i^{1/2}$ and $\Delta_i$ is the diagonal matrix of variances.

Here $\alpha$ are parameters for which we need a consistent estimator (Newey, 1990, shows that the choice of estimator for $\alpha$ has no effect on the asymptotic efficiency). Define a set of $n_i(n_i - 1)/2$ empirical correlations

$$R_{ijk} = \frac{(Y_{ij} - \mu_{ij})(Y_{ik} - \mu_{ik})}{[\mu_{ij}(1 - \mu_{ij})\mu_{ik}(1 - \mu_{ik})]^{1/2}}.$$

We can then define a set of moment-based estimating equations to obtain estimates of $\alpha$.
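The two steps, a working-independence fit for $\beta$ followed by a moment-based estimate of an exchangeable $\alpha$ from the empirical correlations, can be sketched on simulated data (the simulation settings are illustrative; robust standard errors would come from a sandwich estimator, as geese supplies in the example later):

## Working-independence fit, then a moment-based exchangeable alpha
## from the empirical correlations R_ijk (simulated data)
set.seed(1)
m <- 200; n <- 4
id  <- rep(1:m, each = n)
x   <- rnorm(m * n)
b_i <- rep(rnorm(m), each = n)            # induces within-cluster dependence
y   <- rbinom(m * n, 1, plogis(-1 + 0.5 * x + b_i))
fit <- glm(y ~ x, family = binomial)      # working independence = ordinary GLM
mu  <- fitted(fit)
r   <- (y - mu) / sqrt(mu * (1 - mu))     # standardized residuals
## average the n(n-1)/2 pairwise products within each cluster
alpha_hat <- mean(sapply(split(r, id), function(ri) {
  cp <- tcrossprod(ri)
  mean(cp[upper.tri(cp)])
}))
alpha_hat   # moment estimate of the common (exchangeable) correlation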

First Extension to GEE

Rather than use a method of moments estimator for $\alpha$, Prentice (1988) proposed a second set of estimating equations for $\alpha$. In the context of data with $\text{var}(Y_{ij}) = v(\mu_{ij})$:

$$G_1(\beta, \alpha) = \sum_{i=1}^{m} D_i^T W_i^{-1}(Y_i - \mu_i),$$

$$G_2(\beta, \alpha) = \sum_{i=1}^{m} E_i^T H_i^{-1}(T_i - \Sigma_i),$$

where $R_{ij} = (Y_{ij} - \mu_{ij})/v(\mu_{ij})^{1/2}$, to give data

$$T_i^T = (R_{i1}R_{i2}, \ldots, R_{i,n_i - 1}R_{i n_i}, R_{i1}^2, \ldots, R_{i n_i}^2),$$

$\Sigma_i(\alpha) = E[T_i]$ is a model for the correlations and variances of the standardized residuals, $E_i = \partial\Sigma_i/\partial\alpha$, and $H_i = \text{cov}(T_i)$ is the working covariance model. The vector $T_i$ has $n_i(n_i - 1)/2 + n_i$ elements, and in general the working covariance model $H_i$ is complex.

In the context of binary data the variances are determined by the mean, and so Prentice (1988) drops the last $n_i$ terms of $T_i$; he also suggests taking a diagonal working covariance model $H_i$, i.e. ignoring the covariances. The theoretical variances are given by

$$\text{var}(R_{ij}R_{ik}) = 1 + (1 - 2p_{ij})(1 - 2p_{ik})\{p_{ij}(1 - p_{ij})p_{ik}(1 - p_{ik})\}^{-1/2}\,\Sigma(\alpha)_{ijk} - \Sigma(\alpha)_{ijk}^2,$$

which depend on the assumed correlation model $\Sigma$; these may be taken as the diagonal elements of $H_i$.
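The variance formula is straightforward to evaluate; a one-function sketch (the name var_RR is ours, and the inputs are illustrative):

## Theoretical variance of a product of standardized binary residuals,
## used as a diagonal element of H_i in Prentice's (1988) scheme
var_RR <- function(p_j, p_k, sigma_jk) {
  1 + (1 - 2 * p_j) * (1 - 2 * p_k) *
    (p_j * (1 - p_j) * p_k * (1 - p_k))^(-1/2) * sigma_jk - sigma_jk^2
}
var_RR(0.3, 0.6, sigma_jk = 0.1)   # e.g., exchangeable model with alpha = 0.1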

Application of the GEE Extension to the Marginal Odds Model

We have the marginal mean model

$$\text{logit } E[Y_{ij} \mid x_{ij}] = x_{ij}\beta.$$

Suppose we specify a model for the associations in terms of the marginal log odds ratios:

$$\alpha_{ijk} = \log\left\{ \frac{\Pr(Y_{ij} = 1, Y_{ik} = 1)\Pr(Y_{ij} = 0, Y_{ik} = 0)}{\Pr(Y_{ij} = 1, Y_{ik} = 0)\Pr(Y_{ij} = 0, Y_{ik} = 1)} \right\}.$$

These are nuisance parameters, but how do we estimate them? Carey et al. (1992) suggest the following approach for estimating $\beta$ and $\alpha$. Let

$$\mu_{ij} = \Pr(Y_{ij} = 1), \quad \mu_{ik} = \Pr(Y_{ik} = 1), \quad \mu_{ijk} = \Pr(Y_{ij} = 1, Y_{ik} = 1).$$

It is easy to show that

$$\frac{\Pr(Y_{ij} = 1 \mid Y_{ik} = y_{ik})}{\Pr(Y_{ij} = 0 \mid Y_{ik} = y_{ik})} = \exp(y_{ik}\alpha_{ijk})\frac{\Pr(Y_{ij} = 1, Y_{ik} = 0)}{\Pr(Y_{ij} = 0, Y_{ik} = 0)} = \exp(y_{ik}\alpha_{ijk})\left( \frac{\mu_{ij} - \mu_{ijk}}{1 - \mu_{ij} - \mu_{ik} + \mu_{ijk}} \right),$$

which can be written as a logistic regression model in terms of conditional probabilities:

$$\text{logit } E[Y_{ij} \mid Y_{ik}] = \log\left( \frac{\Pr(Y_{ij} = 1 \mid Y_{ik} = y_{ik})}{\Pr(Y_{ij} = 0 \mid Y_{ik} = y_{ik})} \right) = y_{ik}\alpha_{ijk} + \log\left( \frac{\mu_{ij} - \mu_{ijk}}{1 - \mu_{ij} - \mu_{ik} + \mu_{ijk}} \right),$$

where the second term on the right is a known offset (the $\mu$'s are a function of $\beta$ only). Suppose for simplicity that $\alpha_{ijk} = \alpha$. Then, given current estimates of $\beta$ and $\alpha$, we can fit a logistic regression model by regressing $Y_{ij}$ on $Y_{ik}$ for $1 \leq j < k \leq n_i$ to estimate $\alpha$; this can then be used within the working correlation model. Carey et al. (1992) refer to this method as alternating logistic regressions.
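One $\alpha$-update of alternating logistic regressions can be written as an offset logistic regression. A schematic sketch, assuming balanced data ($y$ and mu held as $m \times n$ matrices) and a common $\alpha$; the helper names are ours, and a full implementation would alternate this step with the $\beta$-update:

## One ALR update for a common log odds ratio alpha (schematic sketch).
## or2p() inverts the odds ratio to the pairwise probability mu_ijk,
## via the same quadratic as in the earlier sketch.
or2p <- function(m1, m2, g) {
  if (abs(g - 1) < 1e-10) return(m1 * m2)
  b <- (m1 + m2) * (1 - g) - 1
  (-b - sqrt(b^2 - 4 * (g - 1) * g * m1 * m2)) / (2 * (g - 1))
}
alr_alpha_step <- function(y, mu, alpha_curr) {
  resp <- pred <- offs <- numeric(0)
  n <- ncol(mu)
  for (i in seq_len(nrow(mu))) {
    for (j in 1:(n - 1)) for (k in (j + 1):n) {
      mujk <- or2p(mu[i, j], mu[i, k], exp(alpha_curr))
      resp <- c(resp, y[i, j])   # regress Y_ij ...
      pred <- c(pred, y[i, k])   # ... on Y_ik, with the known offset:
      offs <- c(offs, log((mu[i, j] - mujk) /
                          (1 - mu[i, j] - mu[i, k] + mujk)))
    }
  }
  ## coefficient on Y_ik is the updated alpha
  coef(glm(resp ~ pred - 1 + offset(offs), family = binomial))["pred"]
}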

Indonesian Children's Health Example

> summary(geese(y ~ xero + age, corstr = "independence", id = id, family = "binomial"))

Mean Model:
 Mean Link:                 logit
 Variance to Mean Relation: binomial

 Coefficients:
                estimate      san.se       wald            p
 (Intercept) -2.38479528 0.117676276 410.699689 0.000000e+00
 age         -0.02605769 0.005306513  24.113112 9.083967e-07
 xero         0.72015485 0.419718477   2.943985 8.619783e-02

Scale Model:
 Scale Link: identity
 Estimated Scale Parameters:
              estimate    san.se     wald            p
 (Intercept)  0.977505 0.2766052 12.48871 0.0004094196

Correlation Model:
 Correlation Structure: independence

Number of clusters: 275   Maximum cluster size: 6

> summary(geese(y ~ age + xero, corstr = "exchangeable", id = id, family = "binomial"))

 Coefficients:
                estimate      san.se       wald            p
 (Intercept) -2.37015400 0.117210489 408.902887 0.000000e+00
 age         -0.02532507 0.005271204  23.082429 1.552026e-06
 xero         0.58758892 0.449818037   1.706371 1.914569e-01

 Estimated Scale Parameters:
              estimate    san.se     wald            p
 (Intercept) 0.9681312 0.2618218 13.67278 0.0002175859

 Estimated Correlation Parameters:
         estimate     san.se     wald         p
 alpha 0.04423924 0.03222984 1.884079 0.1698713

> summary(geese(y ~ age + xero, corstr = "ar1", id = id, family = "binomial"))

 Coefficients:
                estimate     san.se       wald            p
 (Intercept) -2.37470963 0.11733291 409.620156 0.000000e+00
 age         -0.02597886 0.00528451  24.167452 8.831225e-07
 xero         0.63692645 0.44374132   2.060245 1.511859e-01

 Estimated Scale Parameters:
              estimate    san.se     wald            p
 (Intercept) 0.9715817 0.2694961 12.99732 0.0003119374

 Estimated Correlation Parameters:
         estimate     san.se     wald         p
 alpha 0.05844094 0.04528613 1.665344 0.1968834

Conditional Likelihood: Binary Longitudinal Data

Recall that conditional likelihood is a technique for eliminating nuisance parameters, here what we have previously modeled as random effects. Consider individual $i$ with binary observations $y_{i1}, \ldots, y_{in_i}$ and assume the random intercepts model

$$Y_{ij} \mid \gamma_i, \beta \sim \text{Bernoulli}(p_{ij}),$$

where

$$\log\left( \frac{p_{ij}}{1 - p_{ij}} \right) = x_{ij}\beta + \gamma_i,$$

where $\gamma_i = x_i\beta + b_i$ and the $x_{ij}$ (a slight change from our usual notation) are those covariates which change within an individual.

We have

$$\Pr(y_{i1}, \ldots, y_{in_i} \mid \gamma_i, \beta) = \prod_{j=1}^{n_i} \frac{\exp(\gamma_i y_{ij} + x_{ij}\beta y_{ij})}{1 + \exp(\gamma_i + x_{ij}\beta)} = \frac{\exp\left( \gamma_i \sum_{j=1}^{n_i} y_{ij} + \sum_{j=1}^{n_i} x_{ij}y_{ij}\beta \right)}{\prod_{j=1}^{n_i}[1 + \exp(\gamma_i + x_{ij}\beta)]}$$

$$= \frac{\exp(\gamma_i t_{2i} + t_{1i}\beta)}{\prod_{j=1}^{n_i}[1 + \exp(\gamma_i + x_{ij}\beta)]} = \frac{\exp(\gamma_i t_{2i} + t_{1i}\beta)}{k(\gamma_i, \beta)} = p(t_{1i}, t_{2i} \mid \gamma_i, \beta),$$

where

$$t_{1i} = \sum_{j=1}^{n_i} x_{ij}y_{ij}, \quad t_{2i} = \sum_{j=1}^{n_i} y_{ij}, \quad k(\gamma_i, \beta) = \prod_{j=1}^{n_i}[1 + \exp(\gamma_i + x_{ij}\beta)].$$

We have

$$L_c(\beta) = \prod_{i=1}^{m} p(t_{1i} \mid t_{2i}, \beta) = \prod_{i=1}^{m} \frac{p(t_{1i}, t_{2i} \mid \gamma_i, \beta)}{p(t_{2i} \mid \gamma_i, \beta)},$$

where

$$p(t_{2i} \mid \gamma_i, \beta) = \sum_{l=1}^{\binom{n_i}{y_{i+}}} \frac{\exp\left( \gamma_i \sum_{j=1}^{n_i} y_{ij} + \sum_{k=1}^{n_i} x_{ik}y^l_{ik}\beta \right)}{k(\gamma_i, \beta)},$$

where the summation is over the $\binom{n_i}{y_{i+}}$ ways of choosing $y_{i+}$ ones out of $n_i$, and $y^l_i = (y^l_{i1}, \ldots, y^l_{in_i})$, $l = 1, \ldots, \binom{n_i}{y_{i+}}$, is the collection of these ways. Hence

$$L_c(\beta) = \prod_{i=1}^{m} \frac{\exp\left( \gamma_i \sum_{j=1}^{n_i} y_{ij} + \sum_{j=1}^{n_i} x_{ij}y_{ij}\beta \right)}{\sum_{l=1}^{\binom{n_i}{y_{i+}}} \exp\left( \gamma_i \sum_{j=1}^{n_i} y_{ij} + \sum_{k=1}^{n_i} x_{ik}y^l_{ik}\beta \right)} = \prod_{i=1}^{m} \frac{\exp\left( \sum_{j=1}^{n_i} x_{ij}y_{ij}\beta \right)}{\sum_{l=1}^{\binom{n_i}{y_{i+}}} \exp\left( \sum_{k=1}^{n_i} x_{ik}y^l_{ik}\beta \right)}.$$

Notes

- It can be computationally expensive to evaluate the likelihood if $n_i$ is large; e.g., if $n_i = 20$ and $y_{i+} = 10$, $\binom{n_i}{y_{i+}} = 184{,}756$.
- There is no contribution to the conditional likelihood from individuals:
  - with $n_i = 1$;
  - with $y_{i+} = 0$ or $y_{i+} = n_i$;
  - for those covariates with $x_{i1} = \ldots = x_{in_i} = x_i$.
- The conditional likelihood estimates $\beta$'s that are associated with within-individual covariates. If a covariate only varies between individuals, then its coefficient cannot be estimated using conditional likelihood. For covariates that vary both between and within individuals, only the within-individual contrasts are used.
- The similarity to Cox's partial likelihood may be exploited to carry out computation (a sketch follows the examples below).
- We have not made a distributional assumption for the $\gamma_i$'s!
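The contribution of one individual can be evaluated by enumerating the $\binom{n_i}{y_{i+}}$ configurations. A minimal sketch for a single covariate (the function name cond_lik_i is ours); it reproduces the first worked example below:

## Conditional likelihood contribution for one individual:
## enumerate all ways of placing y_i+ ones among n_i trials
cond_lik_i <- function(y, x, beta) {
  n <- length(y); s <- sum(y)
  if (s == 0 || s == n) return(1)   # concordant individuals contribute nothing
  configs <- combn(n, s)            # each column gives the positions of the ones
  denom <- sum(apply(configs, 2, function(pos) exp(sum(x[pos] * beta))))
  exp(sum(x[y == 1] * beta)) / denom
}
## y = (0, 0, 1): contribution is exp(x3*b) / (exp(x1*b) + exp(x2*b) + exp(x3*b))
cond_lik_i(c(0, 0, 1), x = c(0.1, 0.5, 0.9), beta = 1)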

Examples: If $n_i = 3$ and $y_i = (0, 0, 1)$, so that $y_{i+} = 1$, then $y^1_i = (1, 0, 0)$, $y^2_i = (0, 1, 0)$, $y^3_i = (0, 0, 1)$, and the contribution to the conditional likelihood is

$$\frac{\exp(x_{i3}\beta)}{\exp(x_{i1}\beta) + \exp(x_{i2}\beta) + \exp(x_{i3}\beta)}.$$

If $n_i = 3$ and $y_i = (1, 0, 1)$, so that $y_{i+} = 2$, then $y^1_i = (1, 1, 0)$, $y^2_i = (1, 0, 1)$, $y^3_i = (0, 1, 1)$, and the contribution to the conditional likelihood is

$$\frac{\exp(x_{i1}\beta + x_{i3}\beta)}{\exp(x_{i1}\beta + x_{i2}\beta) + \exp(x_{i1}\beta + x_{i3}\beta) + \exp(x_{i2}\beta + x_{i3}\beta)}.$$
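In practice, the Cox partial-likelihood connection noted above means conditional logistic regression software can be used; for example, survival::clogit on simulated random-intercept data (a sketch; the simulation settings are illustrative):

## Conditional logistic regression via the Cox partial-likelihood machinery
library(survival)
set.seed(2)
m <- 200; n_i <- 4
id  <- rep(1:m, each = n_i)
x   <- rnorm(m * n_i)                     # within-individual covariate
g_i <- rep(rnorm(m, sd = 2), each = n_i)  # random intercepts, never modeled
y   <- rbinom(m * n_i, 1, plogis(x + g_i))
fit <- clogit(y ~ x + strata(id), method = "exact")
summary(fit)  # beta estimated from within-individual contrasts only;
              # individuals with y_i+ = 0 or n_i drop out automatically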