Stat 579: Generalized Linear Models and Extensions
Linear Mixed Models for Longitudinal Data
Yan Lu
April 2018, week 15
1 / 38
Data structure

                             t_1           t_2           ...  t_{n_i}
Experimental  1st subject    y_{1,1}       y_{1,2}       ...  y_{1,n_1}
              2nd subject    y_{2,1}       y_{2,2}       ...  y_{2,n_2}
              ...
              m_1th subject  y_{m_1,1}     y_{m_1,2}     ...  y_{m_1,n_{m_1}}
Control       1st subject    y_{m_1+1,1}   y_{m_1+1,2}   ...  y_{m_1+1,n_{m_1+1}}
              2nd subject    y_{m_1+2,1}   y_{m_1+2,2}   ...  y_{m_1+2,n_{m_1+2}}
              ...
              mth subject    y_{m,1}       y_{m,2}       ...  y_{m,n_m}
2 / 38
Linear mixed model

Y_i = X_i β + Z_i b_i + ε_i    (1)

where

β_g = (β_{g0}, β_{g1})',   β_B = (β_{B0}, β_{B1})',   β = (β_g', β_B')' = (β_{g0}, β_{g1}, β_{B0}, β_{B1})'

b_i = (b_{i0}, b_{i1})' ~ N(0, g)   independent of   ε_i ~ N(0, R_i)

Therefore, Y_i ~ N(X_i β, Z_i g Z_i' + R_i)
3 / 38
Y_i = X_i β + Z_i b_i + ε_i = { Z_i β_g + Z_i b_i + ε_i   if child i is a girl
                              { Z_i β_B + Z_i b_i + ε_i   if child i is a boy

where

       [ 1  t_{i1}   ]          [ δ_i  δ_i t_{i1}     (1-δ_i)  (1-δ_i) t_{i1}    ]
Z_i =  [ 1  t_{i2}   ],   X_i = [ δ_i  δ_i t_{i2}     (1-δ_i)  (1-δ_i) t_{i2}    ]
       [ ...         ]          [ ...                                            ]
       [ 1  t_{in_i} ]          [ δ_i  δ_i t_{in_i}   (1-δ_i)  (1-δ_i) t_{in_i}  ]

with δ_i = 1 for girls and δ_i = 0 for boys.
4 / 38
Y_i = X_i β + Z_i b_i + ε_i

The parameters of the model are β and those in g and R_i. The b_i are unknown random variables, while Z_i is fixed and known. The likelihood function is based on the marginal distribution

Y_i ~ N(X_i β, Z_i g Z_i' + R_i)

where the responses Y_1, Y_2, ..., Y_m for different individuals are assumed independent. This has the same form as the longitudinal models Y_i ~ N(X_i β, Σ_i), where here Σ_i = Z_i g Z_i' + R_i.
5 / 38
Thus standard ML and REML can be used for inference on β, whereas AIC/BIC may be used for selection of an appropriate covariance structure.
6 / 38
We can also state model (1) as

Y = Xβ + Zb + ε    (2)

where

    [ Y_1 ]       [ X_1 ]       [ Z_1  0   ...  0   ]       [ b_1 ]       [ ε_1 ]
Y = [ Y_2 ],  X = [ X_2 ],  Z = [ 0    Z_2 ...  0   ],  b = [ b_2 ],  ε = [ ε_2 ]
    [ ... ]       [ ... ]       [ ...           ... ]       [ ... ]       [ ... ]
    [ Y_m ]       [ X_m ]       [ 0    0   ...  Z_m ]       [ b_m ]       [ ε_m ]

with E(ε) = 0, E(b) = 0, and

         [ R_1  0   ...  0   ]
var(ε) = [ 0    R_2 ...  0   ]  = R
         [ ...           ... ]
         [ 0    0   ...  R_m ]
7 / 38
and

         [ g  0  ...  0 ]
var(b) = [ 0  g  ...  0 ]  = G
         [ ...      ... ]
         [ 0  0  ...  g ]

Note that G and g are different. R and G are block diagonal, which reflects the assumptions that b_1, b_2, ..., b_m are mutually independent, as are ε_1, ε_2, ..., ε_m. As before, we also assume the b_i's and ε_i's are independent.
8 / 38
Concisely, the linear mixed model for longitudinal data can be written as

Y = Xβ + Zb + ε

with b ~ N(0, G) independent of ε ~ N(0, R). Then Y ~ N(Xβ, V), where

var(Y) = V = ZGZ' + R

V is block-diagonal with ith diagonal block var(Y_i) = Z_i g Z_i' + R_i.
9 / 38
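The block-diagonal structure of V can be checked numerically. This is a minimal numpy sketch (not from the slides; the variance values and time points are illustrative) for m = 2 subjects with a random intercept and slope: the stacked form ZGZ' + R reproduces the subject-level blocks Z_i g Z_i' + R_i.

```python
import numpy as np
from scipy.linalg import block_diag

g = np.array([[1.0, 0.3],
              [0.3, 0.5]])        # var(b_i): random intercept/slope covariance (illustrative)
sigma2 = 0.25                     # R_i = sigma2 * I: conditional independence within subject

t = [np.array([0., 1., 2.]), np.array([0., 2.])]   # measurement times per subject
Z = [np.column_stack([np.ones_like(ti), ti]) for ti in t]

# subject-level covariance blocks Z_i g Z_i' + R_i
V_blocks = [Zi @ g @ Zi.T + sigma2 * np.eye(len(ti)) for Zi, ti in zip(Z, t)]

# stacked form: Z block-diagonal, G = diag(g, g), R = sigma2 * I
Zbig = block_diag(*Z)
G = block_diag(g, g)
V = Zbig @ G @ Zbig.T + sigma2 * np.eye(sum(len(ti) for ti in t))
```

Responses from different subjects are uncorrelated, so the off-diagonal cross blocks of V are zero.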
Model (1), Y_i = X_i β + Z_i b_i + ε_i, includes many settings as special cases:

- the standard linear regression model, which excludes Z_i b_i and assumes R_i = σ² I_{n_i}: independent errors with constant variance (the same is true for standard ANOVA)
- the longitudinal model from week 13, which also excludes Z_i b_i
- the random coefficient model, where X_i β was Z_i β
- the split plot, randomized block and one-way random effects models

When we assume E(b_i) = 0, we typically are thinking of b_i as subject-specific deviations, which makes sense when the effects in Z_i are contained in X_i.
10 / 38
One-way random effects model

Y_ij = µ + α_i + ε_ij,   i = 1, 2, ..., m,   j = 1, ..., n_i    (3)

      [ Y_i1   ]         [ ε_i1   ]        [ α_1 ]         [ 1 ]
Y_i = [ Y_i2   ],  ε_i = [ ε_i2   ],  α =  [ α_2 ],  X_i = [ 1 ]   (n_i × 1)
      [ ...    ]         [ ...    ]        [ ... ]         [...]
      [ Y_in_i ]         [ ε_in_i ]        [ α_m ]         [ 1 ]

α ~ N(0, σ_α² I_m),   ε ~ N(0, σ² I_n),   α and ε independent

Y_i = X_i µ + Z_i α_i + ε_i,   where Z_i = X_i
11 / 38
β = µ,   X = (X_1', X_2', ..., X_m')'   (mk × 1 for balanced data, n_i = k),

    [ 1  0  ...  0 ]
    [ ...          ]
    [ 1  0  ...  0 ]
    [ 0  1  ...  0 ]                    [ Y_1 ]        [ ε_1 ]
Z = [ ...          ]   (mk × m),    Y = [ Y_2 ],   ε = [ ε_2 ]
    [ 0  1  ...  0 ]                    [ ... ]        [ ... ]
    [ ...          ]                    [ Y_m ]        [ ε_m ]
    [ 0  0  ...  1 ]
12 / 38
One-way random effects model (3) can be written as Y = Xβ + Zα + ε where µ = β and α = b. With g = σ_α², R_i = σ_e² I_{n_i} (I_{n_i} the n_i × n_i identity matrix), and J_{n_i} the n_i × n_i matrix of 1s, this implies

Z_i g Z_i' + R_i = σ_α² J_{n_i} + σ_e² I_{n_i}
13 / 38
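The compound-symmetry form above can be verified directly. A small numpy sketch (variance values illustrative, not from the slides): with Z_i a column of 1s and g = σ_α², the marginal covariance has σ_α² + σ_e² on the diagonal and σ_α² off the diagonal.

```python
import numpy as np

sigma_a2, sigma_e2, n_i = 2.0, 1.0, 4   # illustrative variance components
Zi = np.ones((n_i, 1))                  # one-way random effects: Z_i = 1-vector
g = np.array([[sigma_a2]])

# Z_i g Z_i' + sigma_e^2 I  =  sigma_a^2 J + sigma_e^2 I (compound symmetry)
Vi = Zi @ g @ Zi.T + sigma_e2 * np.eye(n_i)
```

Any two observations on the same subject share covariance σ_α², so the within-subject correlation is σ_α²/(σ_α² + σ_e²), the intraclass correlation.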
Best linear unbiased prediction (BLUPs)

The expected response for subject i conditional on b_i is

E(Y_i | b_i) = X_i β + Z_i b_i

The unconditional expectation, averaging over all subjects, is

E(Y_i) = E{E(Y_i | b_i)} = X_i β + Z_i E(b_i) = X_i β

Given the ML or REML estimator of β, say β̂, we estimate the unconditional (or population-averaged) mean with X_i β̂.
14 / 38
How do we estimate (actually predict) the subject-specific mean

E(Y_i | b_i) = X_i β + Z_i b_i ?

It is reasonable to consider X_i β̂ + Z_i b̂_i for a suitably chosen b̂_i. Statistical theory suggests that the best prediction of the random variable b_i based on Y_i is E(b_i | Y_i). Under normality, this can be shown to equal

E(b_i | Y_i) = g Z_i' V_i^{-1} (Y_i - X_i β)

This follows because Y_i | b_i ~ N(X_i β + Z_i b_i, R_i) and b_i ~ N(0, g) imply that (Y_i, b_i) has a multivariate normal distribution. It is well known that b_i | Y_i is then normal with

E(b_i | Y_i) = E(b_i) + cov(b_i, Y_i) var(Y_i)^{-1} (Y_i - E(Y_i)) = 0 + g Z_i' V_i^{-1} (Y_i - X_i β)
15 / 38
Here V i = var(y i ) and cov(b i, Y i ) = cov(b i, X i β + Z i b i + ɛ i ) = cov(b i, Z i b i + ɛ i ) = cov(b i, Z i b i ) + cov(b i, ɛ i ) = cov(b i, b i )Z i = gz i So result follows. 16 / 38
Best Prediction

E(b_i | Y_i) = g Z_i' V_i^{-1} (Y_i - X_i β)

Replacing the unknown parameters g, V_i and β by their ML or REML estimates gives

b̂_i = ĝ Z_i' V̂_i^{-1} (Y_i - X_i β̂)

which is often called the empirical BLUP (best linear unbiased predictor). Thus the subject-specific trajectory X_i β + Z_i b_i is predicted via

X_i β̂ + Z_i b̂_i = X_i β̂ + Z_i ĝ Z_i' V̂_i^{-1} (Y_i - X_i β̂)
               = (I - Z_i ĝ Z_i' V̂_i^{-1}) X_i β̂ + Z_i ĝ Z_i' V̂_i^{-1} Y_i
               = (I - W_i) X_i β̂ + W_i Y_i,   where W_i = Z_i ĝ Z_i' V̂_i^{-1}
17 / 38
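The BLUP and its weighted-average form can be computed in a few lines. A numpy sketch (the parameter values and data here are made up for illustration; in practice ĝ, V̂_i, β̂ come from ML/REML): it checks that X_i β̂ + Z_i b̂_i equals (I - W_i) X_i β̂ + W_i Y_i.

```python
import numpy as np

# illustrative estimates: random intercept + slope, 4 time points
g = np.array([[1.0, 0.2],
              [0.2, 0.4]])
sigma2 = 0.5
t = np.array([0., 1., 2., 3.])
Xi = np.column_stack([np.ones_like(t), t])
Zi = Xi
beta = np.array([10.0, 1.5])                 # plays the role of beta_hat
yi = np.array([10.2, 12.0, 12.8, 15.1])      # observed responses for subject i

Vi = Zi @ g @ Zi.T + sigma2 * np.eye(len(t))

# empirical BLUP: b_hat_i = g Z_i' V_i^{-1} (y_i - X_i beta)
b_hat = g @ Zi.T @ np.linalg.solve(Vi, yi - Xi @ beta)

# subject-specific trajectory, two equivalent ways
traj = Xi @ beta + Zi @ b_hat
Wi = Zi @ g @ Zi.T @ np.linalg.inv(Vi)
traj_weighted = (np.eye(len(t)) - Wi) @ (Xi @ beta) + Wi @ yi
```

The second form makes the shrinkage explicit: the prediction pulls the observed Y_i toward the population mean X_i β̂.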
If you consider W_i a weight, then the best prediction is a weighted average of an individual's response Y_i and the estimated mean response X_i β̂, averaged over all individuals with the same design matrix X_i as subject i. Note that the predictions in the one-way random effects ANOVA given before are special cases of the above results.
18 / 38
Tests on variance components

Suppose, in the model

Y_ij = β_0 + β_1 t_ij + b_{0i} + b_{1i} t_ij + ε_ij,   (b_{0i}, b_{1i})' ~ N( 0, [ σ_0²  σ_01
                                                                                 σ_10  σ_1² ] ),

we wish to test H_0: σ_1² = 0, i.e., b_{1i} = 0 for all i. In this case β_1 is the slope of each individual's regression line. This test can be done informally using AIC/BIC (i.e., compare the null model to the alternative model with arbitrary σ_1²), or using the LR test

LR = -2 log(L̂_red / L̂_full)

where L̂_red and L̂_full are the maximized likelihoods for the reduced and full models.
19 / 38
Boundary correction

H_0 specifies that σ_1² lies on the boundary of the possible values for σ_1² (i.e., σ_1² ≥ 0), so the standard theory for LR tests doesn't hold. One can show that under H_0,

LR ~ 0.5 χ²_1 + 0.5 χ²_2

i.e., the LR statistic has a null distribution that is a 50:50 mixture of χ²_1 and χ²_2 distributions (rather than the usual χ²_2). This mixture result holds since we are comparing 2 nested models, one with 1 random effect b_{0i}, the other with two, b_{0i} and b_{1i}. More generally, if we are comparing nested models with q and q + 1 correlated random effects (which still differ by 1 random effect), then

LR ~ 0.5 χ²_q + 0.5 χ²_{q+1}
20 / 38
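The mixture p-value is easy to compute with scipy. A short sketch (the LR value 5.0 is made up for illustration): the boundary-corrected p-value is the 50:50 average of the two chi-square tail probabilities, so it always lies between the χ²_q and χ²_{q+1} p-values.

```python
from scipy.stats import chi2

def mixture_pvalue(lr_stat, q):
    """P(LR > lr_stat) under the 0.5*chi2_q + 0.5*chi2_{q+1} null mixture."""
    return 0.5 * chi2.sf(lr_stat, df=q) + 0.5 * chi2.sf(lr_stat, df=q + 1)

p = mixture_pvalue(5.0, q=1)   # testing one extra random effect (q = 1)
```

Using the naive χ²_2 reference here would be conservative (p too large), which is why the correction matters near the 0.05 boundary.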
Time dependent covariates

Mixed models allow for a variety of complications with longitudinal data, for example:

- dropout/missing data
- covariates in the fixed effects can be either static (constant over time), such as baseline measurements, or time dependent, such as time itself
- in some cases, the treatment (or group) one is assigned changes with time. For example, you may be switched to the placebo group if you are having adverse reactions to a drug, or switched to a higher dose of a drug if the given dose is not effective.

Some care is needed to ensure proper inferences with these complications.
21 / 38
Generalized estimating equations (GEEs)

Liang and Zeger (1986). GEEs apply to non-normal data, where there is a shortage of convenient joint distributions analogous to the multivariate normal. The GEE approach bypasses the distribution problem by modeling only the mean function and the correlation function. An estimating equation is proposed for estimating the regression effects, and then appropriate statistical theory is applied to generate standard errors and procedures for inference.
22 / 38
Standard longitudinal model for normal responses

The standard model for longitudinal data can be written as

Y = Xβ + ε,   ε ~ N(0, Σ),   so   Y ~ N(Xβ, Σ),

where Σ is block-diagonal with ith diagonal block Σ_i. We usually write

Σ = σ² R,   var(y_ij) = σ²,   cov(y_i) = Σ_i = σ² R_i

Equivalently, y_i ~ N(X_i β, σ² R_i).
23 / 38
∂L/∂β = (1/σ²) X' R^{-1} (y - µ) = (1/σ²) Σ_{j=1}^k X_j' R_j^{-1} (y_j - µ_j) = 0    (4)

β̂ = (X' R^{-1} X)^{-1} X' R^{-1} y = ( Σ_{j=1}^k X_j' R_j^{-1} X_j )^{-1} Σ_{j=1}^k X_j' R_j^{-1} y_j    (5)

MLEs of σ² and R can be computed independently of β; plugging these into (5) gives the overall MLE of β.
24 / 38
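The equality of the stacked and subject-by-subject forms in (5) can be checked numerically. A numpy sketch with simulated data (sizes and the within-subject correlation are arbitrary): the GLS estimate computed from the full block-diagonal R matches the estimate built from per-subject sums.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
X_blocks = [rng.normal(size=(3, 2)) for _ in range(2)]   # X_j, two subjects
y_blocks = [rng.normal(size=3) for _ in range(2)]        # y_j
R_j = 0.5 * np.eye(3) + 0.5       # within-subject correlation: 1 on diag, 0.5 off

# stacked form: beta_hat = (X' R^{-1} X)^{-1} X' R^{-1} y
X, y = np.vstack(X_blocks), np.concatenate(y_blocks)
R = block_diag(R_j, R_j)
Rinv = np.linalg.inv(R)
beta_stacked = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ y)

# subject-by-subject sums, as in equation (5)
A = sum(Xj.T @ np.linalg.inv(R_j) @ Xj for Xj in X_blocks)
b = sum(Xj.T @ np.linalg.inv(R_j) @ yj for Xj, yj in zip(X_blocks, y_blocks))
beta_sums = np.linalg.solve(A, b)
```

The sum form is what software actually computes, since it never builds the full N × N matrix R.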
Generalized linear models

y_i ~ indep Bin(n_i, µ_i). A glm has the following form, with some link function g(·):

g(µ_i) = η_i = x_i' β

g(µ_i) = log( µ_i / (1 - µ_i) )     logit link: logistic regression
g(µ_i) = Φ^{-1}(µ_i)                probit link: probit regression
g(µ_i) = log{ -log(1 - µ_i) }       complementary log-log link

µ_i = g^{-1}(η_i) = exp(η_i) / (1 + exp(η_i))   logit link
µ_i = g^{-1}(η_i) = Φ(η_i)                      probit link
µ_i = g^{-1}(η_i) = 1 - exp{ -exp(η_i) }        complementary log-log
25 / 38
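The three link/inverse-link pairs above can be sketched and round-trip checked in numpy/scipy (the test values of µ are arbitrary):

```python
import numpy as np
from scipy.stats import norm

# (link g, inverse link g^{-1}) pairs for the binomial mean
links = {
    "logit":   (lambda mu: np.log(mu / (1 - mu)),   lambda e: 1 / (1 + np.exp(-e))),
    "probit":  (norm.ppf,                           norm.cdf),
    "cloglog": (lambda mu: np.log(-np.log(1 - mu)), lambda e: 1 - np.exp(-np.exp(e))),
}

mu = np.array([0.1, 0.5, 0.9])
roundtrips = {name: inv(g(mu)) for name, (g, inv) in links.items()}
```

Each inverse link maps any η on the real line back into (0, 1), which is what makes g(µ_i) = x_i' β a sensible model for probabilities.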
General longitudinal model

In the general case, y ~ N(Xβ, Σ), and

β̂ = (X' Σ̂^{-1} X)^{-1} X' Σ̂^{-1} y

is the MLE of β, where Σ̂ is the MLE of Σ. The key to generalizing glms to longitudinal responses is to modify equation (4) to accommodate g(µ_ij) = X_ij' β instead of µ_ij = X_ij' β, for some link function g(·). If the responses within an individual were independent and satisfied an EF (exponential family) model, this would be a glm, and the score function for β based solely on y_i would be

∂l_i/∂β = X_i' W_i Δ_i (y_i - µ_i)

Δ_i = diag( g'(µ_ij) ),   W_i = diag( w_ij / (φ V(µ_ij) g'(µ_ij)²) )

where w_ij is the weight for the jth obsn in y_i, i.e., y_ij.
26 / 38
Assuming responses between individuals are independent, the score function based on the sample is

∂L/∂β = Σ_{j=1}^k ∂l_j/∂β = Σ_{j=1}^k X_j' W_j Δ_j (y_j - µ_j)

Recall the score function under normality,

0 = (1/σ²) Σ_{j=1}^k X_j' R_j^{-1} (y_j - µ_j)

where R_j = I_{n_j} corresponds to the case where responses within an individual are independent. The idea behind GEEs for glms is to take the score function when responses within an individual are independent,

0 = Σ_{j=1}^k X_j' W_j Δ_j (y_j - µ_j),

and replace W_j Δ_j by a matrix that captures the within-individual correlation structure, but reduces to W_j Δ_j if responses within an individual are independent.
27 / 38
SAS

W_i Δ_i = diag( w_ij / (φ V(µ_ij) g'(µ_ij)²) · g'(µ_ij) )
        = diag( w_ij / (φ V(µ_ij) g'(µ_ij)) )
        = diag( 1 / g'(µ_ij) ) diag( w_ij / (φ V(µ_ij)) )
        = Δ_i^{-1} V_i^{-1}
28 / 38
Recall for a glm that

var(y_ij) = φ V(µ_ij) / w_ij

where φ is the dispersion parameter. So if responses within an individual were independent, then var(y_i) = V_i = diag( φ V(µ_ij)/w_ij ), and the score function for β for a glm when responses within an individual are independent reduces to

∂L/∂β = Σ_{j=1}^k X_j' W_j Δ_j (y_j - µ_j) = Σ_{j=1}^k X_j' Δ_j^{-1} V_j^{-1} (y_j - µ_j)
      = Σ_{j=1}^k D_j' V_j^{-1} (y_j - µ_j) = 0

where D_j = Δ_j^{-1} X_j depends on X_j and the link function, but not the variance function.
29 / 38
GEE extension

∂L/∂β = Σ_{j=1}^k D_j' V_j^{-1} (y_j - µ_j) = 0    (6)

The GEE extension replaces V_i as defined above by a covariance matrix for y_i that reflects the correlation structure of a longitudinal response vector:

var(y_i) = φ A_i^{1/2} (W_i*)^{-1/2} R_i(α) (W_i*)^{-1/2} A_i^{1/2}

where A_i = diag( V(µ_ij) ) and W_i* = diag( w_ij ), so that

A_i^{1/2} = diag( √V(µ_ij) ),   A_i^{1/2} (W_i*)^{-1/2} = diag( √(V(µ_ij)/w_ij) )
30 / 38
R_i(α) is the n_i × n_i correlation matrix for y_i, which depends on a set α of correlation parameters. If responses within an individual are independent, R_i(α) = I_{n_i}, and thus

V_i = φ A_i^{1/2} (W_i*)^{-1/2} (W_i*)^{-1/2} A_i^{1/2} = diag( φ V(µ_ij) / w_ij )

as before.
31 / 38
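The sandwich construction of the working covariance can be sketched in numpy (the means, weights, and binomial variance function here are illustrative choices, not from the slides), including the check that R = I recovers the diagonal glm covariance:

```python
import numpy as np

phi = 1.0
mu = np.array([0.2, 0.5, 0.7])          # illustrative fitted means for one subject
w = np.array([1.0, 2.0, 1.0])           # prior weights w_ij
Vfun = mu * (1 - mu)                    # binomial variance function V(mu)

A_half = np.diag(np.sqrt(Vfun))         # A_i^{1/2}
W_neg_half = np.diag(1 / np.sqrt(w))    # (W_i*)^{-1/2}

def working_cov(R_alpha):
    """var(y_i) = phi * A^{1/2} W*^{-1/2} R(alpha) W*^{-1/2} A^{1/2}"""
    return phi * A_half @ W_neg_half @ R_alpha @ W_neg_half @ A_half

V_indep = working_cov(np.eye(3))        # independence working correlation
```

With any non-identity R(α), the same construction keeps the glm variances on the diagonal while adding the chosen within-subject correlation off the diagonal.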
Several comments

Consider equation (6), ∂L/∂β = Σ_{j=1}^k D_j' V_j^{-1} (y_j - µ_j) = 0, where D_j = D_j(β), V_j = V_j(α, β, φ), µ_j = µ_j(β).

- For a normal response with V_i = σ² R_i(α), the GEE estimate of β is the MLE when σ² and α are estimated by ML (and the REML estimate if σ² and α are estimated by REML)
- GEE estimators are MLEs when R_i(α) = I_{n_i} and responses are from an EF distribution
- The scale parameter φ is optional (not required for some models)
32 / 38
Choice of R_i(α)

- Independent: R_i(α) = I_{n_i}
- Exchangeable:
             [ 1  α  α  ...  α ]
  R_i(α) =   [ α  1  α  ...  α ]
             [ ...             ]
             [ α  α  α  ...  1 ]
- Unstructured: corr(y_ij, y_ik) = α_jk
- AR(1): corr(y_ij, y_ik) = α^{|k - j|}
33 / 38
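The exchangeable and AR(1) choices above can be built as explicit matrices. A small numpy sketch (α = 0.3 and n_i = 4 are arbitrary):

```python
import numpy as np

def exchangeable(n, alpha):
    """All off-diagonal correlations equal alpha."""
    return (1 - alpha) * np.eye(n) + alpha * np.ones((n, n))

def ar1(n, alpha):
    """Correlation decays geometrically with the time lag |k - j|."""
    idx = np.arange(n)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

R_ex = exchangeable(4, 0.3)
R_ar = ar1(4, 0.3)
```

Exchangeable uses one parameter regardless of n_i; AR(1) also uses one but lets the correlation fade with lag, which is often more plausible for equally spaced longitudinal measurements.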
Estimators

We need to estimate α to compute the GEE; use exchangeable R_i(α) as an example. Let y_i = (y_i1, y_i2, ..., y_in_i)' with corr(y_i) = R_i(α), R_i(α) exchangeable. The scaled Pearson residuals are

Γ_ij = (y_ij - µ_ij) / √var(y_ij) = (y_ij - µ_ij) / √(φ V(µ_ij)/w_ij)

α = ρ_jk = corr(y_ij, y_ik) = E(Γ_ij Γ_ik),   j ≠ k

If the parameters in Γ_ij were known, then each product Γ_ij Γ_ik would be an unbiased estimator of α. There are n_i(n_i - 1)/2 such pairs for individual i. Adding up all pairs over all subjects and dividing by the total number of pairs N = 0.5 Σ_{i=1}^m n_i(n_i - 1) gives an estimate of α:

α̂ = (1/N) Σ_{i=1}^m Σ_{j<k} Γ_ij Γ_ik
34 / 38
SAS uses unscaled residuals Γ_ij = e_ij / √φ, so

e_ij = (y_ij - µ_ij) / √(V(µ_ij)/w_ij)

α̂ = (1/(N φ̂)) Σ_{i=1}^m Σ_{j<k} e_ij e_ik    (7)

Note that α̂ depends on φ and β (through µ_ij): α̂(φ, β). Since var(Γ_ij) = 1, E(e_ij²) = var(e_ij) = φ, so

φ̂ = (1/N*) Σ_{i=1}^m Σ_{j=1}^{n_i} e_ij²,   N* = Σ_{i=1}^m n_i    (8)

e_ij depends on β, so φ̂ is a function of β, say φ̂(β).
35 / 38
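The moment estimators (7) and (8) translate directly into code. A numpy sketch with made-up residuals (the residual values are purely illustrative): φ̂ averages the squared unscaled residuals over all N* observations, and α̂ averages the within-subject cross products over all N pairs, scaled by φ̂.

```python
import numpy as np

def moment_estimates(resid_list):
    """phi_hat and exchangeable alpha_hat from per-subject unscaled residuals e_i."""
    n_total = sum(len(e) for e in resid_list)                       # N* in (8)
    phi_hat = sum(float(e @ e) for e in resid_list) / n_total       # (8)
    n_pairs = sum(len(e) * (len(e) - 1) // 2 for e in resid_list)   # N in (7)
    cross = sum(float(np.sum(np.outer(e, e)[np.triu_indices(len(e), 1)]))
                for e in resid_list)                                # sum over j < k
    alpha_hat = cross / (n_pairs * phi_hat)                         # (7)
    return phi_hat, alpha_hat

phi_hat, alpha_hat = moment_estimates([np.array([1.0, -1.0, 0.5]),
                                       np.array([0.5, -0.5])])
```

Subjects with n_i = 1 contribute to φ̂ but not to α̂, since they have no within-subject pairs.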
Fitting algorithm

1. Get an initial estimate of β, say β̂_0, by solving equation (6) assuming an independence structure (i.e., R_i(α) = I_{n_i}); this doesn't require estimates of φ or α.
2. Compute µ̂_ij = µ_ij(β̂_0) and ê_ij = (y_ij - µ̂_ij)/√(V(µ̂_ij)/w_ij); plug these into (8) to get φ̂_0 = φ̂(β̂_0), and into (7) with φ̂_0 to get α̂_0 = α̂(φ̂_0, β̂_0).
3. Compute V_i = V_i(α̂_0, β̂_0, φ̂_0) and D_i = D_i(β̂_0) and plug into (6). Solve the GEE score function (6) for β, say β̂^(1).
4. Repeat steps 2 and 3 with β̂^(1); iterate until (hopefully) convergence.
5. Final estimates β̂, α̂ and φ̂.

The same idea applies to other correlation structures.
36 / 38
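The steps above can be sketched end to end in numpy for a logistic model with exchangeable working correlation. This is a minimal illustration on simulated data, not a production fitter: w_ij = 1 throughout, the moment estimates of φ and α follow (7)-(8), and each β update is one Fisher-scoring/GLS step on the GEE score (6).

```python
import numpy as np

# simulated clustered binary data (illustrative, not from the slides)
rng = np.random.default_rng(1)
m, n_i = 40, 4
beta_true = np.array([-0.5, 1.0])
X = [np.column_stack([np.ones(n_i), rng.normal(size=n_i)]) for _ in range(m)]
y = [rng.binomial(1, 1 / (1 + np.exp(-Xi @ beta_true))) for Xi in X]

def gee_logistic_exchangeable(X, y, iters=25):
    p = X[0].shape[1]
    beta = np.zeros(p)                 # step 1: independence start (phi, alpha not needed)
    alpha = phi = 0.0
    for _ in range(iters):
        mu = [1 / (1 + np.exp(-Xi @ beta)) for Xi in X]
        # step 2: unscaled residuals -> moment estimates of phi and alpha
        e = [(yi - mi) / np.sqrt(mi * (1 - mi)) for yi, mi in zip(y, mu)]
        N_star = sum(len(ei) for ei in e)
        phi = sum(float(ei @ ei) for ei in e) / N_star
        n_pairs = sum(len(ei) * (len(ei) - 1) // 2 for ei in e)
        cross = sum(float(np.sum(np.outer(ei, ei)[np.triu_indices(len(ei), 1)]))
                    for ei in e)
        alpha = cross / (n_pairs * phi)
        # step 3: assemble V_i, D_i and take one solving step on the GEE score
        A = np.zeros((p, p)); b = np.zeros(p)
        for Xi, yi, mi in zip(X, y, mu):
            n = len(yi)
            a_half = np.sqrt(mi * (1 - mi))                    # A_i^{1/2} diagonal
            R = (1 - alpha) * np.eye(n) + alpha * np.ones((n, n))
            Vi = phi * (a_half[:, None] * R * a_half[None, :])
            Di = Xi * (mi * (1 - mi))[:, None]                 # d mu / d beta, logit link
            A += Di.T @ np.linalg.solve(Vi, Di)
            b += Di.T @ np.linalg.solve(Vi, yi - mi)
        beta = beta + np.linalg.solve(A, b)                    # step 4: iterate
    return beta, alpha, phi                                    # step 5

beta_hat, alpha_hat, phi_hat = gee_logistic_exchangeable(X, y)
```

A real fitter would add a convergence check on β between iterations and the robust (sandwich) variance estimate; both are omitted here to keep the loop readable.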
The working correlation structure

It is difficult to propose a suitable correlation structure R_i(α) for non-normal longitudinal data. The GEE approach has some good properties for dealing with this problem:

- β is consistently estimated even if R_i(α) is incorrectly specified
- standard errors and inference procedures are available that account for R_i(α) possibly being misspecified

In standard ML-based inference for normally distributed repeated measures, the ML estimate of β is also consistent even if R_i(α) is misspecified, but the ML-based standard errors and tests are not robust to misspecification of R_i(α). This makes GEE-based inference attractive even for normal responses. R_i(α) is not necessarily taken seriously; it is called the working correlation structure. But estimators with less variability will be produced by using an R_i(α) that mimics the true correlation.
37 / 38
Distribution of GEE estimators β̂

Under suitable regularity conditions, β̂ is approximately normally distributed,

β̂ ~ N(β, Σ)

Given an estimator Σ̂ of Σ, we can construct CIs, tests and inferences.
38 / 38