Stat 579: Generalized Linear Models and Extensions


Stat 579: Generalized Linear Models and Extensions
Linear Mixed Models for Longitudinal Data
Yan Lu, April 2018, week 15

Data structure

Group          Subject            t_1          t_2          ...   t_{n_i}
Experimental   1st subject        y_{1,1}      y_{1,2}      ...   y_{1,n_1}
               2nd subject        y_{2,1}      y_{2,2}      ...   y_{2,n_2}
               ...
               m_1-th subject     y_{m_1,1}    y_{m_1,2}    ...   y_{m_1,n_{m_1}}
Control        1st subject        y_{m_1+1,1}  y_{m_1+1,2}  ...   y_{m_1+1,n_{m_1+1}}
               2nd subject        y_{m_1+2,1}  y_{m_1+2,2}  ...   y_{m_1+2,n_{m_1+2}}
               ...
               m-th subject       y_{m,1}      y_{m,2}      ...   y_{m,n_m}

Linear mixed model

Y_i = X_i β + Z_i b_i + ε_i    (1)

where

β_g = (β_g0, β_g1)',  β_B = (β_B0, β_B1)',  β = (β_g', β_B')' = (β_g0, β_g1, β_B0, β_B1)',  b_i = (b_i0, b_i1)',

with b_i ~ N(0, g) independent of ε_i ~ N(0, R_i). Therefore

Y_i ~ N(X_i β, Z_i g Z_i' + R_i)

The model reduces to

Y_i = Z_i β_g + Z_i b_i + ε_i   if child i is a girl (δ_i = 1)
Y_i = Z_i β_B + Z_i b_i + ε_i   if child i is a boy (δ_i = 0)

where

Z_i = [ 1   t_i1
        1   t_i2
        :    :
        1   t_in_i ],

X_i = [ δ_i   δ_i t_i1    (1 − δ_i)   (1 − δ_i) t_i1
        δ_i   δ_i t_i2    (1 − δ_i)   (1 − δ_i) t_i2
        :       :             :            :
        δ_i   δ_i t_in_i  (1 − δ_i)   (1 − δ_i) t_in_i ],

with δ_i = 1 for girls and δ_i = 0 for boys.

Y_i = X_i β + Z_i b_i + ε_i

The parameters of the model are β and those in g and R_i. The b_i are unobserved random variables, while Z_i is fixed and known. The likelihood function is based on the marginal distribution

Y_i ~ N(X_i β, Z_i g Z_i' + R_i),

where the responses Y_1, Y_2, ..., Y_m for different individuals are assumed independent. This has the same form as the longitudinal models Y_i ~ N(X_i β, Σ_i), where here Σ_i = Z_i g Z_i' + R_i.

Thus standard ML and REML can be used for inference on β, while AIC/BIC may be used to select an appropriate covariance structure.

We can also state model (1) in stacked form as Y = Xβ + Zb + ε:   (2)

[ Y_1 ]   [ X_1 ]       [ Z_1  0   ...  0  ] [ b_1 ]   [ ε_1 ]
[ Y_2 ] = [ X_2 ] β  +  [  0   Z_2 ...  0  ] [ b_2 ] + [ ε_2 ]
[  :  ]   [  :  ]       [  :        .   :  ] [  :  ]   [  :  ]
[ Y_m ]   [ X_m ]       [  0   0   ... Z_m ] [ b_m ]   [ ε_m ]

where E(ε) = 0, E(b) = 0, and

var(ε) = [ R_1  0   ...  0
            0   R_2 ...  0
            :            :
            0   0   ... R_m ] = R

and

var(b) = [ g  0 ... 0
           0  g ... 0
           :        :
           0  0 ... g ] = G.

Note that G and g are different. R and G are block diagonal, which reflects the assumptions that b_1, b_2, ..., b_m are mutually independent, as are ε_1, ε_2, ..., ε_m. As before, we also assume the b_i's and ε_i's are independent of each other.

Concisely, the linear mixed model for longitudinal data can be written as

Y = Xβ + Zb + ε,  with b ~ N(0, G) independent of ε ~ N(0, R),

so that Y ~ N(Xβ, V), where var(Y) = V = ZGZ' + R. V is block diagonal with i-th diagonal block var(Y_i) = Z_i g Z_i' + R_i.
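As a minimal numerical sketch of this block structure (all dimensions and covariance values below are made up for illustration), the stacked V = ZGZ' + R can be assembled and checked against the per-subject blocks Z_i g Z_i' + R_i:

```python
import numpy as np

def block_diag(blocks):
    """Place each block along the diagonal, zeros elsewhere."""
    r = sum(b.shape[0] for b in blocks)
    c = sum(b.shape[1] for b in blocks)
    out = np.zeros((r, c))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i += b.shape[0]
        j += b.shape[1]
    return out

# Made-up example: m = 2 subjects, random intercept + slope, n_i = 3 times each
m, n_i = 2, 3
t = np.arange(n_i, dtype=float)
Z_i = np.column_stack([np.ones(n_i), t])   # same Z_i for both subjects
g = np.array([[1.0, 0.3], [0.3, 0.5]])     # random-effects covariance g
R_i = 0.25 * np.eye(n_i)                   # R_i = sigma^2 I

Z = block_diag([Z_i] * m)                  # block-diagonal Z
G = np.kron(np.eye(m), g)                  # G = diag(g, ..., g)
R = np.kron(np.eye(m), R_i)                # R = diag(R_1, ..., R_m)
V = Z @ G @ Z.T + R                        # var(Y) = Z G Z' + R
```

The leading 3x3 block of V equals Z_1 g Z_1' + R_1, and the off-diagonal blocks are zero, reflecting independence across subjects.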

Model (1), Y_i = X_i β + Z_i b_i + ε_i, includes many settings as special cases:

- the standard linear regression model, which excludes Z_i b_i and assumes R_i = σ² I_{n_i} (independent errors with constant variance); the same is true for standard ANOVA
- the longitudinal model from week 13, which also excludes Z_i b_i
- the random coefficient model, where X_i β was Z_i β
- the split-plot, randomized block, and one-way random effects models

When we assume E(b_i) = 0, we typically are thinking of b_i as subject-specific deviations, which makes sense when the effects in Z_i are contained in X_i.

One-way random effects model

Y_ij = µ + α_i + ε_ij,  i = 1, 2, ..., m,  j = 1, ..., n_i   (3)

In vector form,

Y_i = (Y_i1, Y_i2, ..., Y_in_i)',  ε_i = (ε_i1, ε_i2, ..., ε_in_i)',  α = (α_1, α_2, ..., α_m)',  X_i = (1, 1, ..., 1)'  (n_i × 1),

with α ~ N(0, σ_α² I_m), ε ~ N(0, σ² I_n), and α and ε independent. Then

Y_i = X_i µ + Z_i α_i + ε_i,  where Z_i = X_i.

In stacked form (balanced case, n_i = k), β = µ and

X = (X_1', X_2', ..., X_m')'  (mk × 1),

Z = [ 1  0  ...  0
      1  0  ...  0
      :
      0  0  ...  1 ]  (mk × m; column i indicates the observations on subject i),

Y = (Y_1', Y_2', ..., Y_m')',  ε = (ε_1', ε_2', ..., ε_m')'.

One-way random effects model (3) can then be written as Y = Xβ + Zα + ε, where µ = β, α = b, g = σ_α², and R_i = σ_e² I_{n_i}. Writing J_{n_i} for an n_i × 1 vector of ones (so Z_i = J_{n_i}) and I_{n_i} for the n_i × n_i identity matrix, this implies

Z_i g Z_i' + R_i = σ_α² J_{n_i} J_{n_i}' + σ_e² I_{n_i}.
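The covariance identity above can be verified numerically; the variance values here (σ_α² = 2, σ_e² = 0.5, n_i = 4) are made up for illustration:

```python
import numpy as np

sigma_a2, sigma_e2, n_i = 2.0, 0.5, 4
Z_i = np.ones((n_i, 1))            # Z_i = J_{n_i}, a column of ones
g = np.array([[sigma_a2]])         # scalar variance of alpha_i
R_i = sigma_e2 * np.eye(n_i)
V_i = Z_i @ g @ Z_i.T + R_i        # Z_i g Z_i' + R_i

J = np.ones((n_i, n_i))            # J_{n_i} J_{n_i}' = n_i x n_i matrix of ones
assert np.allclose(V_i, sigma_a2 * J + sigma_e2 * np.eye(n_i))
```

This is the familiar compound-symmetry structure: variance σ_α² + σ_e² on the diagonal and covariance σ_α² off the diagonal.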

Best linear unbiased prediction (BLUP)

The expected response for subject i conditional on b_i is

E(Y_i | b_i) = X_i β + Z_i b_i.

The unconditional expectation, averaging over all subjects, is

E(Y_i) = E{E(Y_i | b_i)} = X_i β + Z_i E(b_i) = X_i β.

Given the ML or REML estimator of β, say β̂, we estimate the unconditional (or population-averaged) mean with X_i β̂.

How do we estimate (actually, predict) the subject-specific mean

E(Y_i | b_i) = X_i β + Z_i b_i ?

It is reasonable to consider X_i β̂ + Z_i b̂_i for a suitably chosen b̂_i. Statistical theory suggests that the best prediction of the random variable b_i based on Y_i is E(b_i | Y_i). Under normality, this can be shown to equal

E(b_i | Y_i) = g Z_i' V_i^{-1} (Y_i − X_i β).

This follows because Y_i | b_i ~ N(X_i β + Z_i b_i, R_i) and b_i ~ N(0, g) imply that (Y_i, b_i) has a multivariate normal distribution. It is well known that b_i | Y_i is then normal with

E(b_i | Y_i) = E(b_i) + cov(b_i, Y_i) var(Y_i)^{-1} (Y_i − E(Y_i)) = 0 + g Z_i' V_i^{-1} (Y_i − X_i β).

Here V i = var(y i ) and cov(b i, Y i ) = cov(b i, X i β + Z i b i + ɛ i ) = cov(b i, Z i b i + ɛ i ) = cov(b i, Z i b i ) + cov(b i, ɛ i ) = cov(b i, b i )Z i = gz i So result follows. 16 / 38

Best prediction

E(b_i | Y_i) = g Z_i' V_i^{-1} (Y_i − X_i β)

Replacing the unknown parameters g, V_i, and β by their ML (or REML) estimates gives

b̂_i = ĝ Z_i' V̂_i^{-1} (Y_i − X_i β̂),

which is often called the empirical BLUP (best linear unbiased predictor). Thus the subject-specific trajectory X_i β + Z_i b_i is predicted via

X_i β̂ + Z_i b̂_i = X_i β̂ + Z_i ĝ Z_i' V̂_i^{-1} (Y_i − X_i β̂)
                = (I − Z_i ĝ Z_i' V̂_i^{-1}) X_i β̂ + Z_i ĝ Z_i' V̂_i^{-1} Y_i
                = (I − W_i) X_i β̂ + W_i Y_i,  where W_i = Z_i ĝ Z_i' V̂_i^{-1}.
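A quick numerical check that the two forms of the predicted trajectory agree; β, g, R_i, and the data below are made-up illustration values:

```python
import numpy as np

rng = np.random.default_rng(1)
n_i = 5
t = np.arange(n_i, dtype=float)
X_i = np.column_stack([np.ones(n_i), t])
Z_i = X_i.copy()                            # random intercept + slope
beta = np.array([1.0, 0.5])
g = np.array([[1.0, 0.2], [0.2, 0.4]])
R_i = 0.3 * np.eye(n_i)
V_i = Z_i @ g @ Z_i.T + R_i
y_i = rng.normal(X_i @ beta, 1.0)

# BLUP of b_i: g Z_i' V_i^{-1} (y_i - X_i beta)
b_hat = g @ Z_i.T @ np.linalg.solve(V_i, y_i - X_i @ beta)

# Predicted trajectory, two algebraically equivalent forms
W_i = Z_i @ g @ Z_i.T @ np.linalg.inv(V_i)  # the "weight" matrix W_i
pred1 = X_i @ beta + Z_i @ b_hat
pred2 = (np.eye(n_i) - W_i) @ (X_i @ beta) + W_i @ y_i
assert np.allclose(pred1, pred2)
```

The second form makes the shrinkage interpretation explicit: the prediction pulls the raw response y_i toward the mean X_i β.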

If we view W_i as a weight matrix, then the best prediction is a weighted average of the individual's own response Y_i and the estimated mean response X_i β̂, averaged over all individuals with the same design matrix X_i as subject i. Note that the predictions in the one-way random effects ANOVA given before are special cases of the above results.

Tests on variance components

Suppose we fit

Y_ij = β_0 + β_1 t_ij + b_0i + b_1i t_ij + ε_ij,

(b_0i, b_1i)' ~ N( 0, [ σ_0²  σ_01
                        σ_10  σ_1² ] ),

and wish to test H_0: σ_1² = 0, i.e., b_1i = 0 for all i. In this case β_1 is the slope of each individual's regression line. This test can be done informally using AIC/BIC (i.e., compare the null model to the alternative model with arbitrary σ_1²), or with the likelihood ratio test

LR = −2 log(L̂_red / L̂_full),

where L̂_red and L̂_full are the maximized likelihoods for the reduced and full models.

Boundary correction

H_0 specifies that σ_1² lies on the boundary of the possible values for σ_1² (i.e., σ_1² ≥ 0), so the standard theory for LR tests doesn't hold. One can show that under H_0,

LR → 0.5 χ²_1 + 0.5 χ²_2,

i.e., the LR statistic has a null distribution that is a 50:50 mixture of χ²_1 and χ²_2 distributions (rather than the usual χ², since removing b_1i removes both σ_1² and σ_01). This mixture holds because we are comparing two nested models, one with one random effect b_0i, the other with two, b_0i and b_1i. More generally, if we compare nested models with q and q + 1 correlated random effects (models that still differ by one random effect), then

LR → 0.5 χ²_q + 0.5 χ²_{q+1}.
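The mixture p-value has a closed form here, since P(χ²_1 > x) = erfc(√(x/2)) and P(χ²_2 > x) = exp(−x/2). A small sketch (the function name is my own, not from the lecture):

```python
import math

def lr_mixture_pvalue(lr):
    """P-value for H0: sigma_1^2 = 0 using the 50:50 chi^2_1 / chi^2_2 mixture."""
    p_chi1 = math.erfc(math.sqrt(lr / 2.0))   # P(chi^2_1 > lr)
    p_chi2 = math.exp(-lr / 2.0)              # P(chi^2_2 > lr)
    return 0.5 * p_chi1 + 0.5 * p_chi2
```

Because the mixture p-value is always smaller than the naive χ²_2 p-value, ignoring the boundary issue makes the test conservative (it under-rejects H_0).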

Time-dependent covariates

Mixed models allow for a variety of complications with longitudinal data, for example dropout and missing data. Covariates in the fixed effects can either be static (constant over time), such as baseline measurements, or time dependent, such as time itself. In some cases, the treatment (or group) one is assigned to changes with time. For example, you may be switched to the placebo group if you are having adverse reactions to a drug, or switched to a higher dose of a drug if the given dose is not effective. Some care is needed to ensure proper inferences with these complications.

Generalized estimating equations (GEEs), Liang and Zeger (1986)

GEEs apply to non-normal data, where there is a shortage of convenient joint distributions (no analogue of the multivariate normal). The GEE approach bypasses the distributional problem by modeling only the mean function and the correlation function. An estimating equation is proposed for estimating the regression effects, and appropriate statistical theory is then applied to obtain standard errors and procedures for inference.

Standard longitudinal model for normal responses

The standard model for longitudinal data can be written as

Y = Xβ + ε,  with ε ~ N(0, Σ),

so Y ~ N(Xβ, Σ), where Σ is block diagonal with i-th diagonal block Σ_i. We usually write Σ = σ² R, so that var(y_ij) = σ² and cov(Y_i) = Σ_i = σ² R_i. Equivalently, y_i ~ N(X_i β, σ² R_i).

The score equation for β is

∂L/∂β = (1/σ²) X' R^{-1} (y − µ) = (1/σ²) Σ_{j=1}^k X_j' R_j^{-1} (y_j − µ_j) = 0,   (4)

with solution

β̂ = (X' R^{-1} X)^{-1} X' R^{-1} y = ( Σ_{j=1}^k X_j' R_j^{-1} X_j )^{-1} Σ_{j=1}^k X_j' R_j^{-1} y_j.   (5)

The MLEs of σ² and R can be computed independently of β; plugging these into (5) gives the overall MLE of β.
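A numerical check that the stacked and blockwise forms of (5) agree (note σ² cancels in β̂). The design, working correlation, and responses below are made-up illustration values:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n_i = 3, 4
X_blocks = [np.column_stack([np.ones(n_i), rng.normal(size=n_i)]) for _ in range(m)]
y_blocks = [rng.normal(size=n_i) for _ in range(m)]
idx = np.arange(n_i)
R_j = 0.4 ** np.abs(np.subtract.outer(idx, idx))   # a common AR(1)-type R_j

# Blockwise form: (sum_j X_j' R_j^{-1} X_j)^{-1} sum_j X_j' R_j^{-1} y_j
A = sum(Xj.T @ np.linalg.solve(R_j, Xj) for Xj in X_blocks)
c = sum(Xj.T @ np.linalg.solve(R_j, yj) for Xj, yj in zip(X_blocks, y_blocks))
beta_block = np.linalg.solve(A, c)

# Stacked form: (X' R^{-1} X)^{-1} X' R^{-1} y with block-diagonal R
X = np.vstack(X_blocks)
y = np.concatenate(y_blocks)
Rinv = np.kron(np.eye(m), np.linalg.inv(R_j))
beta_stack = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ y)
```

The blockwise form is how software actually accumulates the estimator: one small solve per subject instead of one large one.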

Generalized linear models

y_i ~ independent Bin(n_i, µ_i)

A glm has the form g(µ_i) = η_i = x_i' β for some link function g(·):

g(µ_i) = log( µ_i / (1 − µ_i) )     logit link: logistic regression
g(µ_i) = Φ^{-1}(µ_i)                probit link: probit regression
g(µ_i) = log{ −log(1 − µ_i) }       complementary log-log link

with inverse links

µ_i = g^{-1}(η_i) = exp(η_i) / (1 + exp(η_i))   logit link
µ_i = Φ(η_i)                                     probit link
µ_i = 1 − exp{ −exp(η_i) }                       complementary log-log
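The three link/inverse-link pairs can be coded directly as a sketch. The probit inverse below uses a simple bisection, since the Python standard library has Φ (via `math.erf`) but no Φ^{-1}; all function names are my own, not from the lecture:

```python
import math

def logit(mu):
    return math.log(mu / (1.0 - mu))

def probit(mu):
    """Phi^{-1}(mu) by bisection on [-10, 10] (illustrative helper)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cloglog(mu):
    return math.log(-math.log(1.0 - mu))

def inv_logit(eta):
    return math.exp(eta) / (1.0 + math.exp(eta))

def inv_probit(eta):
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))
```

Each pair round-trips: g^{-1}(g(µ)) = µ for µ in (0, 1).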

General longitudinal model

In the general normal case, y ~ N(Xβ, Σ) and

β̂ = (X' Σ̂^{-1} X)^{-1} X' Σ̂^{-1} y

is the MLE of β, where Σ̂ is the MLE of Σ. The key to generalizing glms to longitudinal responses is to modify equation (4) to accommodate g(µ_ij) = x_ij' β instead of µ_ij = x_ij' β, for some link function g(·). If the responses within an individual were independent and satisfied an exponential family model, this would be a glm and the score function for β based solely on y_i would be

∂l_i/∂β = X_i' W_i Δ_i (y_i − µ_i),

where

Δ_i = diag( g'(µ_ij) ),  W_i = diag( w_ij / [φ V(µ_ij) g'(µ_ij)²] ),

and w_ij is the weight for the j-th observation in y_i, i.e., y_ij.

Assuming responses between individuals are independent, the score function based on the whole sample is

∂L/∂β = Σ_{j=1}^k ∂l_j/∂β = Σ_{j=1}^k X_j' W_j Δ_j (y_j − µ_j).

Recall the score function under normality,

0 = (1/σ²) Σ_{j=1}^k X_j' R_j^{-1} (y_j − µ_j);

R_j = I_{n_j} corresponds to the case where responses within an individual are independent. The idea behind GEEs for glms is to take the score function for independent within-individual responses,

0 = Σ_{j=1}^k X_j' W_j Δ_j (y_j − µ_j),

and replace W_j Δ_j by a matrix that captures the within-individual correlation structure, but reduces to W_j Δ_j if responses within an individual are independent.

Note that

W_i Δ_i = diag( w_ij / [φ V(µ_ij) g'(µ_ij)²] ) diag( g'(µ_ij) )
        = diag( w_ij / [φ V(µ_ij) g'(µ_ij)] )
        = diag( 1/g'(µ_ij) ) diag( w_ij / [φ V(µ_ij)] )
        = Δ_i^{-1} V_i^{-1}.

Recall for a glm that var(y_ij) = φ V(µ_ij)/w_ij, where φ is the dispersion parameter. So if responses within an individual were independent, then var(y_i) = V_i = diag( φ V(µ_ij)/w_ij ), and the score function for β for a glm with independent within-individual responses reduces to

∂L/∂β = Σ_{j=1}^k X_j' W_j Δ_j (y_j − µ_j)
      = Σ_{j=1}^k X_j' Δ_j^{-1} V_j^{-1} (y_j − µ_j)
      = Σ_{j=1}^k D_j' V_j^{-1} (y_j − µ_j) = 0,

where D_j' = X_j' Δ_j^{-1} depends on X_j and the link function, but not the variance function.

GEE extension

∂L/∂β = Σ_{j=1}^k D_j' V_j^{-1} (y_j − µ_j) = 0   (6)

The GEE extension replaces V_i as defined above by a covariance matrix for y_i that reflects the correlation structure of a longitudinal response vector:

var(y_i) = φ A_i^{1/2} w_i^{-1/2} R_i(α) w_i^{-1/2} A_i^{1/2},

where A_i = diag( V(µ_ij) ) and w_i = diag( w_ij ), so that

A_i^{1/2} = diag( √V(µ_ij) ),  A_i^{1/2} w_i^{-1/2} = diag( √( V(µ_ij)/w_ij ) ).

Here R_i(α) is the n_i × n_i correlation matrix for y_i, which depends on a set α of correlation parameters. If responses within an individual are independent, R_i(α) = I_{n_i}, and thus

V_i = φ A_i^{1/2} w_i^{-1/2} w_i^{-1/2} A_i^{1/2} = diag( φ V(µ_ij)/w_ij )

as before.

Several comments

Consider equation (6), ∂L/∂β = Σ_{j=1}^k D_j' V_j^{-1} (y_j − µ_j) = 0, where D_j = D_j(β), V_j = V_j(α, β, φ), and µ_j = µ_j(β).

- For normal responses with V_i = σ² R_i(α), the GEE estimate of β is the MLE when σ² and α are estimated by ML (and the REML estimate when σ² and α are estimated by REML).
- GEE estimators are MLEs when R_i(α) = I_{n_i} and the responses come from an exponential family distribution.
- The scale parameter φ is optional (not required for some models).

Choice of R_i(α)

- Independence: R_i(α) = I_{n_i}
- Exchangeable:

  R_i(α) = [ 1  α  ...  α
             α  1  ...  α
             :          :
             α  α  ...  1 ]

- Unstructured: corr(y_ij, y_ik) = α_jk
- AR(1): corr(y_ij, y_ik) = α^{|k−j|}
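The exchangeable and AR(1) structures are easy to build directly; these two helper functions are my own names, sketched for illustration:

```python
import numpy as np

def exchangeable(n, alpha):
    """R_i(alpha) with 1 on the diagonal and alpha everywhere else."""
    return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))

def ar1(n, alpha):
    """R_i(alpha) with (j, k) entry alpha^{|k - j|}."""
    idx = np.arange(n)
    return alpha ** np.abs(np.subtract.outer(idx, idx))
```

Both reduce to the independence structure I_n when alpha = 0.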

Estimators

We need to estimate α to compute the GEE; take the exchangeable R_i(α) as an example. Let y_i = (y_i1, y_i2, ..., y_in_i)' with corr(y_i) = R_i(α) exchangeable. Define the scaled Pearson residuals

Γ_ij = (y_ij − µ_ij) / √var(y_ij) = (y_ij − µ_ij) / √( φ V(µ_ij)/w_ij ).

Then

α = ρ_jk = corr(y_ij, y_ik) = E(Γ_ij Γ_ik),  j ≠ k.

If the parameters in Γ_ij were known, each product Γ_ij Γ_ik would be an unbiased estimator of α. There are n_i(n_i − 1)/2 such pairs for individual i. Adding up all pairs over all subjects and dividing by the total number of pairs N = 0.5 Σ_{i=1}^m n_i(n_i − 1) gives an estimate of α:

α̂ = (1/N) Σ_{i=1}^m Σ_{j<k} Γ_ij Γ_ik.

Define the unscaled residuals e_ij via Γ_ij = e_ij / √φ, i.e.,

e_ij = (y_ij − µ_ij) / √( V(µ_ij)/w_ij ),

so that

α̂ = ( 1/(N φ) ) Σ_{i=1}^m Σ_{j<k} e_ij e_ik.   (7)

Note that α̂ depends on φ and on β (through µ_ij): α̂ = α̂(φ, β). Since var(Γ_ij) = 1, we have E(e_ij²) = var(e_ij) = φ, so

φ̂ = (1/N') Σ_{i=1}^m Σ_{j=1}^{n_i} e_ij²,  where N' = Σ_{i=1}^m n_i is the total number of observations.   (8)

Since e_ij depends on β, φ̂ is a function of β, say φ̂(β).
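A sketch of the moment estimators (7) and (8) in the Gaussian case, where V(µ) = 1 and w_ij = 1 so e_ij is just the raw residual; the residuals are simulated with made-up exchangeable correlation α = 0.5 and dispersion φ = 2:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n_i = 500, 4
alpha_true, phi_true = 0.5, 2.0
R = (1 - alpha_true) * np.eye(n_i) + alpha_true * np.ones((n_i, n_i))
L = np.linalg.cholesky(phi_true * R)
e = rng.normal(size=(m, n_i)) @ L.T     # residuals with exchangeable correlation

# Equation (8): phi_hat = (1/N') sum e_ij^2, N' = total number of observations
phi_hat = (e ** 2).sum() / (m * n_i)

# Equation (7): alpha_hat = (1/(N phi_hat)) sum over pairs j < k of e_ij e_ik
N_pairs = m * n_i * (n_i - 1) // 2
cross = sum(e[:, j] @ e[:, k] for j in range(n_i) for k in range(j + 1, n_i))
alpha_hat = cross / (N_pairs * phi_hat)
```

With a moderate number of subjects the estimates land close to the simulated values.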

Fitting algorithm

1. Get an initial estimate of β, say β̂_0, by solving equation (6) assuming the independence structure (i.e., R_i(α) = I_{n_i}); this doesn't require estimates of φ or α.
2. Compute µ̂_ij = µ_ij(β̂_0) and ê_ij = (y_ij − µ̂_ij) / √( V(µ̂_ij)/w_ij ); plug these into (8) to get φ̂_0 = φ̂(β̂_0), and into (7) with φ̂_0 to get α̂_0 = α̂(φ̂_0, β̂_0).
3. Compute V_i = V_i(α̂_0, β̂_0, φ̂_0) and D_i = D_i(β̂_0) and plug into (6). Solve the GEE (6) for β, giving β̂_(1).
4. Repeat steps 2 and 3 with β̂_(1); iterate until (hopefully) convergence.
5. Report the final estimates β̂, α̂, and φ̂.

The same idea applies to other correlation structures.
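The algorithm above can be sketched end to end in the simplest setting: Gaussian responses with the identity link and an exchangeable working correlation, where step 3 is an exact GLS solve. Everything below (sample sizes, true parameters, data) is simulated for illustration, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n_i, p = 100, 4, 2
beta_true = np.array([1.0, -0.5])
X = rng.normal(size=(m, n_i, p))
R_true = 0.6 * np.ones((n_i, n_i)) + 0.4 * np.eye(n_i)   # exchangeable, alpha = 0.6
L = np.linalg.cholesky(R_true)
y = np.einsum('ijk,k->ij', X, beta_true) + rng.normal(size=(m, n_i)) @ L.T

def gls_beta(alpha):
    """Step 3: solve the GEE (6) exactly for the identity link."""
    R = (1 - alpha) * np.eye(n_i) + alpha * np.ones((n_i, n_i))
    Rinv = np.linalg.inv(R)
    A = sum(X[i].T @ Rinv @ X[i] for i in range(m))
    c = sum(X[i].T @ Rinv @ y[i] for i in range(m))
    return np.linalg.solve(A, c)

beta = gls_beta(0.0)                       # step 1: independence start
for _ in range(10):                        # steps 2-4: iterate
    e = y - np.einsum('ijk,k->ij', X, beta)
    phi = (e ** 2).sum() / (m * n_i)       # equation (8)
    pairs = m * n_i * (n_i - 1) // 2
    alpha = sum(e[:, j] @ e[:, k] for j in range(n_i)
                for k in range(j + 1, n_i)) / (pairs * phi)   # equation (7)
    beta = gls_beta(alpha)                 # step 3 with updated alpha
```

At convergence the GEE score evaluated at (β̂, α̂, φ̂) is numerically zero, and α̂ recovers the simulated exchangeable correlation.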

The working correlation structure

It is difficult to propose a suitable correlation structure R_i(α) for non-normal longitudinal data. The GEE approach has some good properties for dealing with this problem:

- β is consistently estimated even if R_i(α) is incorrectly specified.
- Standard errors and inference procedures are available that account for R_i(α) possibly being misspecified.

In standard ML-based inference for normally distributed repeated measures, the ML estimate of β is also consistent even if R_i(α) is misspecified, but the ML-based standard errors and tests are not robust to misspecification of R_i(α). This makes GEE-based inference attractive even for normal responses. Because R_i(α) is not necessarily taken seriously, it is called the working correlation structure. But estimators with less variability will be produced by using an R_i(α) that mimics the true correlation.

Distribution of GEE estimators

Under suitable regularity conditions, β̂ is approximately normally distributed,

β̂ ~ N(β, Σ).

Given an estimator Σ̂ of Σ, we can construct confidence intervals, tests, and other inferences.