Stat 579: Generalized Linear Models and Extensions


Linear Mixed Models for Longitudinal Data
Yan Lu, April 2018, week 14

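Data structure and model

Subject $i$ is observed at times $t_{i1}, t_{i2}, \ldots, t_{in_i}$:

    Subject        Observations
    1st subject    y_11, y_12, ..., y_1n_1
    2nd subject    y_21, y_22, ..., y_2n_2
    ...
    mth subject    y_m1, y_m2, ..., y_mn_m

Model:

$$
\begin{aligned}
y_{ij} &= \beta_{i0} + \beta_{i1} t_{ij} + \epsilon_{ij} \\
       &= (\beta_0 + b_{i0}) + (\beta_1 + b_{i1}) t_{ij} + \epsilon_{ij} \\
       &= \beta_0 + \beta_1 t_{ij} + b_{i0} + b_{i1} t_{ij} + \epsilon_{ij},
\end{aligned}
$$

where $b_{i0}$ and $b_{i1}$ are random.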

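In matrix form,

$$
Y_i = Z_i \beta_i + \epsilon_i = Z_i(\beta + b_i) + \epsilon_i = Z_i \beta + Z_i b_i + \epsilon_i \qquad (1)
$$

where

$$
Y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{in_i} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad
b_i = \begin{bmatrix} b_{i0} \\ b_{i1} \end{bmatrix}, \quad
Z_i = \begin{bmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{bmatrix}, \quad
\epsilon_i = \begin{bmatrix} \epsilon_{i1} \\ \epsilon_{i2} \\ \vdots \\ \epsilon_{in_i} \end{bmatrix}.
$$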

$\beta$ is the average of the intercepts and slopes across subjects in the population.

$b_i$ is the subject-specific deviation of the intercept and slope from the population average values.
- We think of $b_i$ as random effects that describe how much the $i$th subject's regression coefficients differ from the population values.

We usually assume that $E(b_i) = 0$ and $\mathrm{Var}(b_i) = g$, where $g$ specifies how the regression effects vary and covary across subjects in the population.

$$
E(Y_i) = E\{Z_i \beta + Z_i b_i + \epsilon_i\} = Z_i \beta + Z_i E(b_i) + E(\epsilon_i) = Z_i \beta,
$$

or alternatively, $E(Y_{ij}) = \beta_0 + \beta_1 t_{ij}$, $j = 1, 2, \ldots, n_i$.

- Assume that $b_i$ and $\epsilon_i$ are independent (the deviations in regression coefficients are independent of the within-individual deviations). Then

$$
\begin{aligned}
\mathrm{var}(Y_i) &= \mathrm{var}(Z_i \beta + Z_i b_i + \epsilon_i) = \mathrm{var}(Z_i b_i + \epsilon_i) \\
                  &= \mathrm{var}(Z_i b_i) + \mathrm{var}(\epsilon_i) = Z_i g Z_i' + R_i .
\end{aligned}
$$
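As a numerical illustration of $\mathrm{var}(Y_i) = Z_i g Z_i' + R_i$, the R sketch below builds the marginal covariance matrix for one subject measured at ages 8, 10, 12 and 14. The values of `g` and `sigma2` are invented for illustration (they are not estimates from these slides), and $R_i = \sigma^2 I$ is assumed.

```r
# Hypothetical inputs: none of these numbers come from the lecture.
ages   <- c(8, 10, 12, 14)
Z      <- cbind(1, ages)                    # n_i x 2 design: intercept, time
g      <- matrix(c( 4.0, -0.2,
                   -0.2,  0.05), nrow = 2)  # assumed var(b_i), positive definite
sigma2 <- 2                                 # assumed within-subject variance
R      <- sigma2 * diag(length(ages))       # R_i = sigma^2 * I

V <- Z %*% g %*% t(Z) + R                   # var(Y_i) = Z g Z' + R
print(V)
print(diag(V))                              # variances are quadratic in age
```

Even with a constant within-subject variance, the diagonal of `V` changes with age, because the random slope contributes a term that grows with $t_{ij}^2$.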

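We also typically assume normality of $\epsilon_i$ and $b_i$, that is

$$
\epsilon_i \sim N(0, R_i), \qquad b_i \sim N(0, g),
$$

which implies

$$
Y_i = Z_i \beta + Z_i b_i + \epsilon_i \sim N(Z_i \beta,\; Z_i g Z_i' + R_i).
$$

Equivalently, in hierarchical form:

$$
\begin{aligned}
Y_i \mid Z_i, \beta_i &\sim N(Z_i \beta_i, R_i), \quad \text{i.e., } Y_i = Z_i \beta_i + \epsilon_i \text{ with } \epsilon_i \sim N(0, R_i), \\
\beta_i \mid Z_i &\sim N(\beta, g), \\
Y_i \mid Z_i &\sim N(Z_i \beta,\; Z_i g Z_i' + R_i).
\end{aligned}
$$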

Remarks:

Assuming a linear relationship within individuals implies that the population mean response changes linearly with time, where the population regression line has an intercept and slope that are the averages of the subject-specific intercepts and slopes.

How do I think about $\mathrm{var}(b_i) = g$?
- If the subject-specific lines are parallel to the population mean line with different intercepts, then $\mathrm{var}(b_{i0}) > \mathrm{var}(b_{i1}) \approx 0$.
- If the subject-specific lines display a funnel shape opening to the right, individual slopes increase as intercepts increase, so $\mathrm{cov}(b_{i0}, b_{i1}) > 0$.
- If the subject-specific lines cross each other, individual slopes decrease as intercepts increase, so $\mathrm{cov}(b_{i0}, b_{i1}) < 0$.

The marginal models in week 13 allow the mean to vary with time, but assumed that

$$
\mathrm{var}(Y_i) = \Sigma_i ,
$$

whereas here we have

$$
\mathrm{Var}(Y_i) = Z_i g Z_i' + R_i .
$$

- Since $Z_i$ depends on the times $t_{ij}$ (more specifically, $Z_i g Z_i'$ has elements that are quadratic in time), our current model reduces to the models considered in week 13 only in very special cases.
- If $\mathrm{var}(b_{i1}) = 0$ (which implies a model with random intercepts but fixed slopes) and if $R_i = \sigma^2 I_{n_i}$, then $\mathrm{var}(Y_i)$ has a homogeneous intraclass structure.
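The special case mentioned above can be checked directly. With random intercepts only, $Z_i$ reduces to a column of ones and $g$ to a scalar, written here as $\sigma_b^2$ (a symbol introduced only for this sketch):

```latex
\begin{aligned}
\mathrm{var}(Y_i) &= \sigma_b^2 \, 1_{n_i} 1_{n_i}' + \sigma^2 I_{n_i}, \\
\mathrm{var}(y_{ij}) &= \sigma_b^2 + \sigma^2, \qquad
\mathrm{cov}(y_{ij}, y_{ik}) = \sigma_b^2 \quad (j \neq k), \\
\mathrm{corr}(y_{ij}, y_{ik}) &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2} \quad (j \neq k).
\end{aligned}
```

Every observation has the same variance and every pair of observations on a subject has the same correlation, which is the homogeneous intraclass (compound symmetry) pattern.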

Multiple groups

Recall that $Y_i = Z_i \beta + Z_i b_i + \epsilon_i$ has a fixed effect component $\beta$ and a random effect component $b_i$, plus an independent residual component $\epsilon_i$. Thus a mixed effects model is the end product of a random coefficient regression model.

Because the design matrices for $\beta$ and $b_i$ are both $Z_i$, it is easy to recognize that the random effect piece just specifies subject deviations from $Z_i \beta$, i.e.,

$$
Z_i(\beta + b_i) = Z_i \beta + Z_i b_i .
$$

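Recall

$$
Y_i = Z_i \beta + Z_i b_i + \epsilon_i ,
$$

$$
Y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{in_i} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad
b_i = \begin{bmatrix} b_{i0} \\ b_{i1} \end{bmatrix}, \quad
Z_i = \begin{bmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{bmatrix}, \quad
\epsilon_i = \begin{bmatrix} \epsilon_{i1} \\ \epsilon_{i2} \\ \vdots \\ \epsilon_{in_i} \end{bmatrix}.
$$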

More generally, a mixed model has the structure

$$
Y_i = X_i \beta + Z_i b_i + \epsilon_i \qquad (2)
$$

where $X_i$ and $Z_i$ need not have the same number of columns, or any of the same effects, and

$$
b_i \sim N(0, g) \ \text{independent of} \ \epsilon_i \sim N(0, R_i).
$$

Therefore,

$$
Y_i \sim N(X_i \beta,\; Z_i g Z_i' + R_i).
$$

We can use this more general structure so that our model (1) can handle multiple groups, e.g., boys and girls in the dental data.

The key to generalizing (1) to multiple groups is to modify the fixed effects portion of the model so that the populations of boys and girls each have their own regression line, while the subject-specific deviation term $Z_i b_i$ is left unchanged.

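The simplest means to this end is to define boy and girl population mean regression coefficients

$$
\beta_g = \begin{bmatrix} \beta_{g0} \\ \beta_{g1} \end{bmatrix}, \quad
\beta_B = \begin{bmatrix} \beta_{B0} \\ \beta_{B1} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_g \\ \beta_B \end{bmatrix}
      = \begin{bmatrix} \beta_{g0} \\ \beta_{g1} \\ \beta_{B0} \\ \beta_{B1} \end{bmatrix}
$$

and write

$$
Y_i = \begin{cases}
Z_i \beta_g + Z_i b_i + \epsilon_i & \text{child } i \text{ is a girl} \\
Z_i \beta_B + Z_i b_i + \epsilon_i & \text{child } i \text{ is a boy.}
\end{cases}
$$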

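Define

$$
A_i = \begin{cases}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} & \text{child } i \text{ is a girl} \\[2ex]
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} & \text{child } i \text{ is a boy.}
\end{cases}
$$

Recall that

$$
\beta = \begin{bmatrix} \beta_g \\ \beta_B \end{bmatrix}
      = \begin{bmatrix} \beta_{g0} \\ \beta_{g1} \\ \beta_{B0} \\ \beta_{B1} \end{bmatrix},
$$

so that

$$
A_i \beta = \begin{cases} \beta_g & \text{girls} \\ \beta_B & \text{boys.} \end{cases}
$$

All $A_i$ does is pick out the elements of $\beta$ for boys or for girls. Thus we can write the model more compactly as

$$
Y_i = Z_i A_i \beta + Z_i b_i + \epsilon_i = X_i \beta + Z_i b_i + \epsilon_i .
$$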

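$$
X_i = Z_i A_i =
\begin{cases}
\begin{bmatrix} 1 & t_{i1} & 0 & 0 \\ 1 & t_{i2} & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & t_{in_i} & 0 & 0 \end{bmatrix} & \text{girls} \\[4ex]
\begin{bmatrix} 0 & 0 & 1 & t_{i1} \\ 0 & 0 & 1 & t_{i2} \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 1 & t_{in_i} \end{bmatrix} & \text{boys}
\end{cases}
\;=\;
\begin{bmatrix}
\delta_i & \delta_i t_{i1} & (1 - \delta_i) & (1 - \delta_i) t_{i1} \\
\delta_i & \delta_i t_{i2} & (1 - \delta_i) & (1 - \delta_i) t_{i2} \\
\vdots & \vdots & \vdots & \vdots \\
\delta_i & \delta_i t_{in_i} & (1 - \delta_i) & (1 - \delta_i) t_{in_i}
\end{bmatrix}
$$

where

$$
\delta_i = \begin{cases} 1 & \text{child } i \text{ is a girl} \\ 0 & \text{child } i \text{ is a boy.} \end{cases}
$$

$X_i$ is simply the SGP design matrix used earlier. We could perhaps have written down the model directly without ever defining $A_i$.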
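To make the role of $A_i$ concrete, here is a small R sketch that builds $X_i = Z_i A_i$ for a girl and for a boy and compares it with the $\delta_i$ form. The ages 8, 10, 12, 14 are those of the dental example; the helper `make_X()` is a hypothetical name introduced only for this illustration.

```r
ages <- c(8, 10, 12, 14)                    # times t_i1, ..., t_in_i
Z    <- cbind(1, ages)                      # common design for beta and b_i

A_girl <- cbind(diag(2), matrix(0, 2, 2))   # picks (beta_g0, beta_g1)
A_boy  <- cbind(matrix(0, 2, 2), diag(2))   # picks (beta_B0, beta_B1)

X_girl <- Z %*% A_girl
X_boy  <- Z %*% A_boy

# Same matrices written with the indicator delta (1 = girl, 0 = boy)
make_X <- function(delta, t) {
  cbind(delta, delta * t, 1 - delta, (1 - delta) * t)
}

print(X_girl)
print(X_boy)
print(all(X_girl == make_X(1, ages)) && all(X_boy == make_X(0, ages)))
```

Each child contributes non-zero columns only for their own group's intercept and slope, which is the separate group parametrization (SGP) design.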

Discussion

POPULATION-AVERAGED VS. SUBJECT-SPECIFIC:
- Population-averaged approach: the focus of modeling is on the averages (means) across the population of units at each time point, and how these averages are related over time.
- Subject-specific approach: the focus of modeling is on individual units.

In the case where the models considered are linear, the two perspectives ultimately lead to the same type of model for the mean, so that either interpretation is valid. But this is not the case with non-linear models or models for non-normal data.

The subject-specific, random coefficient approach has the additional feature that it automatically leads to a particular assumption about the structure of the covariance matrix of a data vector, which naturally acknowledges within-unit and among-unit variation separately.

In contrast, the population-averaged approach forces the data analyst to model this covariance directly, thinking about the two sources of variation together.

As a result, the subject-specific approach of the random coefficient model and, more generally, of linear mixed effects models has become very popular.

ALTERNATIVE TERMINOLOGY: The random coefficient model, allowing for the possibility of different groups, is sometimes referred to as a growth curve model in the statistical and subject-matter literature.

CHOICE OF COVARIANCE STRUCTURE:

One may in principle take the covariance matrix $R_i$, corresponding to within-unit variation, to be one of a variety of structures according to knowledge of the data collection process. If the main source of within-unit variation is measurement error, or if it is instead fluctuation but the observations are far apart in time, taking $R_i$ diagonal may be reasonable.

One may in principle take the covariance matrix $\mathrm{var}(b_i)$, characterizing variation among units (through how the parameters in the individual trajectories vary), to be the same for all groups or different, depending on the belief about the pattern of variation for each group.

The most commonly used form of the random coefficient model is that where

$$
R_i = \sigma^2 I_{n_i}, \qquad \mathrm{var}(b_i) = g \ \text{(the same for all groups).}
$$

Often this structure is suitable; e.g., units tend to vary similarly for each group, although the means may be different. This same kind of assumption (means differ, variance the same) is standard in the usual analysis of variance models and methods.

This model is considered extensively and almost exclusively in much of the literature.

It is certainly possible to relax these assumptions; for example, we can take $g$ to be different for each gender group in the dental data example.

One pitfall of trying to get too fancy with the modeling of $R_i$ and $\mathrm{var}(b_i)$ is that one is quite likely to end up with a model that is too complicated to be sorted out given the data at hand. Identifiability becomes a problem.

Many people are willing to risk the possibility that they may incorrectly specify $R_i$ and/or $g$ by, for example, assuming that $\mathrm{var}(b_i) = g$ is common to all groups when it may not be.

The form of the model $\Sigma_i = Z_i g Z_i' + R_i$ is sufficiently general that, even if the two components $g$ and $R_i$ are not exactly correctly chosen, the resulting $\Sigma_i$ matrix will differ very little from the one that would be obtained if they were. Thus, if one's main interest is in estimating $\beta$ and in tests about it, this may be okay.

However, if interest is focused on $b_i$ and $R_i$ themselves, then obviously one would want to investigate all possibilities. Be aware, though, that fitting very fancy models may lead to difficulties and over-fitting.

Example: dental data

Orthodontic distance is measured at ages $t_{i1} = 8$, $t_{i2} = 10$, $t_{i3} = 12$, $t_{i4} = 14$ ($n_i = 4$) for each subject from 2 groups (boys (red) and girls (black)).

    > ex.data
    obs  subject  age  distance  gender
    [data listing not preserved]

    > aa <- aggregate(distance ~ age + gender, data = ex.data, mean)  # cell means
    > aa
    age  gender  distance
    [cell means not preserved]

    # Interaction plot: mean distance by age, one trace per gender
    par(mfrow = c(1, 1), oma = c(0, 0, 3, 0), pch = 42, font.sub = 3, adj = 0.5)
    interaction.plot(ex.data$age, ex.data$gender, ex.data$distance,
                     trace.label = "gender", xlab = "age", ylab = "distance",
                     col = c("black", "red"))
    mtext(side = 3, outer = TRUE, line = 0, cex = 1.3, "Age Distance Study")
    par(adj = 1)
    title(sub = "interact.ps")
    # Save the current graphics device as a PNG file
    savePlot(filename = "dentalinteraction", type = c("png"), device = dev.cur())

Figure 1: Sample means at each time across children. [Interaction plot: distance (y-axis) against age (x-axis), one trace per gender.]

Increasing linear trend with time; males (red) and females (black) have different intercepts and possibly different slopes.

    # Individual profiles for the girls (ex.data2: female subset of the data)
    par(mfrow = c(1, 1), oma = c(0, 0, 3, 0), pch = 42, font.sub = 3, adj = 0.5)
    interaction.plot(ex.data2$age, ex.data2$child, ex.data2$distance,
                     trace.label = "girls", xlab = "age", ylab = "distance",
                     col = 1:11)
    mtext(side = 3, outer = TRUE, line = 0, cex = 1.3,
          "Age Distance Study Female")
    par(adj = 1)
    title(sub = "interact2.ps")
    savePlot(filename = "dentalinteraction2", type = c("png"), device = dev.cur())

Figure 2: Sample means at each time across children. [Profile plot for the girls.]

Increasing linear trend with time.

    # Individual profiles for the boys (ex.data3: male subset of the data)
    par(mfrow = c(1, 1), oma = c(0, 0, 3, 0), pch = 42, font.sub = 3, adj = 0.5)
    interaction.plot(ex.data3$age, ex.data3$child, ex.data3$distance,
                     trace.label = "boys", xlab = "age", ylab = "distance",
                     col = 1:16)
    mtext(side = 3, outer = TRUE, line = 0, cex = 1.3,
          "Age Distance Study Male")
    par(adj = 1)
    title(sub = "interact3.ps")
    savePlot(filename = "dentalinteraction3", type = c("png"), device = dev.cur())

Figure 3: Sample means at each time across children. [Profile plot for the boys.]

Increasing linear trend with time; more crossing of individual lines (interaction) is observed.

    > ## Test whether the two covariance matrices are equal across the groups
    > ## (boxM() is assumed to come from a package such as biotools)
    > tres <- boxM(d2[, 3:6], d2[, "gender"])
    > tres

    Box's M-test for Homogeneity of Covariance Matrices
    data: d2[, 3:6]
    Chi-Sq (approx.) = , df = 10, p-value =

Do not reject the null hypothesis of equal covariance matrices across the groups.

Fit the model with a common $g$ matrix for both genders, using the default diagonal within-child covariance matrix $R_i$ with the same variance $\sigma^2$ for each gender.

The random effects structure is specified in parentheses; here we allow for a random intercept and slope that are correlated.

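    # lmer() from lme4; the Satterthwaite t-tests in the output below
    # suggest that lmerTest was loaded as well
    myfit <- lmer(distance ~ -1 + gender + age:gender + (1 + age | child),
                  REML = FALSE, data = ex.data)
    summary(myfit)
    anova(myfit)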

    > summary(myfit)
    Linear mixed model fit by maximum likelihood
    t-tests use Satterthwaite approximations to degrees of freedom [lmerMod]
    Formula: distance ~ -1 + gender + age:gender + (1 + age | child)
       Data: ex.data

    AIC  BIC  logLik  deviance  df.resid
    [values not preserved]

    Random effects:
     Groups    Name         Variance  Std.Dev.  Corr
     child     (Intercept)
               age
     Residual
    [values not preserved]
    Number of obs: 108, groups: child,

Estimated $g$ matrix: $g = [\ \ ]$ (entries not preserved).

    Fixed effects:
                  Estimate  Std. Error  df  t value  Pr(>|t|)
    gender0                                          e-14 *
    gender1                                          e-16 *
    gender0:age                                      e-05 *
    gender1:age                                      e-10 *
    ---
    Signif. codes: 0 *** ** 0.01 *
    [estimates, standard errors and leading digits of the p-values not preserved]

    Correlation of Fixed Effects:
                  gendr0  gendr1  gndr0:
    gender1
    gender0:age
    gender1:age
    [values not preserved]

    > anova(myfit)
    Analysis of Variance Table of type III with Satterthwaite
    approximation for degrees of freedom
                 Sum Sq  Mean Sq  NumDF  DenDF  F.value  Pr(>F)
    gender                                               < 2.2e-16 ***
    gender:age                                           e-10 ***
    ---
    Signif. codes: 0 *** ** 0.01 *
    [remaining values not preserved]

Figure 4: Scatter plot of boys after centering and scaling. [Scatterplot matrix with one panel per age.]

Boys: positive correlations are observed between times.

Figure 5: Scatter plot of girls after centering and scaling. [Scatterplot matrix with one panel per age.]

Girls: positive correlations are observed between times.

Fit two parameterizations of a regression model:
- Difference group parametrization (DGP)
- Separate group parametrization (SGP)

    ex.data$child  <- factor(ex.data$child)
    ex.data$gender <- factor(ex.data$gender)

    # First do an OLS fit ignoring correlation
    # Assume the group difference parametrization
    fit.ls.dgp <- lm(distance ~ gender + age + age:gender, data = ex.data)
    summary(fit.ls.dgp)

    # Assume a separate intercept and slope by gender
    fit.ls.sgp <- lm(distance ~ -1 + gender + age:gender, data = ex.data)
    summary(fit.ls.sgp)

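DGP

    Coefficients:
    ##              Estimate  Std. Error  t value  Pr(>|t|)
    ## (Intercept)                                 < 2e-16 ***
    ## gender1
    ## age                                         **
    ## gender1:age
    [remaining values not preserved]

The estimated mean function for girls is (Intercept) + (age) AGE.

The estimated mean function for boys is ((Intercept) + gender1) + (age + gender1:age) AGE.

(The numerical estimates in these two mean functions are not preserved.)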

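SGP

    Coefficients:
    ##              Estimate  Std. Error  t value  Pr(>|t|)
    ## gender0                                     < 2e-16 ***
    ## gender1                                     < 2e-16 ***
    ## gender0:age                                 **
    ## gender1:age                                 e-08 ***
    [remaining values not preserved]

The estimated mean function for girls is gender0 + (gender0:age) AGE.

The estimated mean function for boys is gender1 + (gender1:age) AGE.

(The numerical estimates in these two mean functions are not preserved.)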
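The two parameterizations describe the same fitted lines. As a hedged check, the sketch below refits both with `lm()` on the `Orthodont` data shipped with the nlme package, which is assumed here to be a stand-in for the lecture's `ex.data` (its variables are named `Sex` and `Subject` rather than `gender` and `child`, and `Male` is the reference level).

```r
library(nlme)   # used here only for the Orthodont data set

# Assumption: Orthodont stands in for ex.data; Sex has levels Male, Female.
fit_dgp <- lm(distance ~ Sex + age + age:Sex, data = Orthodont)
fit_sgp <- lm(distance ~ -1 + Sex + age:Sex, data = Orthodont)

b_dgp <- coef(fit_dgp)
b_sgp <- coef(fit_sgp)

# SGP intercept and slope for the non-reference group, rebuilt from DGP:
# reference coefficient plus the difference coefficient
female_from_dgp <- c(b_dgp["(Intercept)"] + b_dgp["SexFemale"],
                     b_dgp["age"] + b_dgp["SexFemale:age"])
print(rbind(from_dgp = unname(female_from_dgp),
            sgp      = unname(b_sgp[c("SexFemale", "SexFemale:age")])))

# Both parameterizations give identical fitted values
print(all.equal(fitted(fit_dgp), fitted(fit_sgp)))
```

DGP is convenient for testing group differences directly (the difference coefficients), while SGP reports each group's line without further arithmetic.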

GLS fit, nlme package

Model (a): unstructured $\Sigma_i$. Common unstructured correlation $\Sigma_i$, with variances changing over time, for both genders.

- Note that gls() defines BIC differently from SAS: it uses the total number of observations $N$, while SAS MIXED uses the total number of individuals $m$.
- The weights argument makes the variances on the diagonal differ over time; the default with no weights argument is that they are the same for all times.
- Note that the standard error estimates of $\hat\beta$ are not correct; the robust.cov function is needed to derive the correct SEs.

    fit.un <- gls(distance ~ -1 + gender + age:gender, data = ex.data,
                  correlation = corSymm(form = ~ 1 | child),
                  weights = varIdent(form = ~ 1 | age), method = "ML")
    ## varIdent: constant variance(s), generally used to allow
    ## different variances according to the levels of a
    ## classification factor.

    rbind(beta.un, sebeta.un)
    ##            gender0  gender1  gender0:age  gender1:age
    ## beta.un
    ## sebeta.un

    sebeta.un.corrected
    ## gender0  gender1  gender0:age  gender1:age
    ##

    # marginal covariance and correlation matrices
    V.un
    ## Marginal variance covariance matrix
    ##      [,1] [,2] [,3] [,4]
    ## [1,]
    ## [2,]
    ## [3,]
    ## [4,]
    ## Standard Deviations:
    [values not preserved]

(b2) Compound symmetry with variance different by gender

    fit.cs2 <- gls(distance ~ -1 + gender + gender:age, data = ex.data,
                   correlation = corCompSymm(form = ~ 1 | child),
                   weights = varIdent(form = ~ 1 | gender), method = "ML")
    beta.cs2   <- coef(fit.cs2)
    sebeta.cs2 <- summary(fit.cs2)$tTable[, "Std.Error"]
    V.cs2.girl <- getVarCov(fit.cs2, individual = 1)    # a girl's covariance matrix
    V.cs2.boy  <- getVarCov(fit.cs2, individual = 12)   # a boy's covariance matrix
    Gamma.cs2  <- cov2cor(V.cs2.girl)                   # implied correlation matrix

    rbind(beta.cs2, sebeta.cs2)
    ##             gender0  gender1  gender0:age  gender1:age
    ## beta.cs2
    ## sebeta.cs2

    V.cs2.girl
    ## Marginal variance covariance matrix
    ##      [,1] [,2] [,3] [,4]
    ## [1,]
    ## [2,]
    ## [3,]
    ## [4,]
    ## Standard Deviations:
    [values not preserved]

    V.cs2.boy
    ## Marginal variance covariance matrix
    ##      [,1] [,2] [,3] [,4]
    ## [1,]
    ## [2,]
    ## [3,]
    ## [4,]
    ## Standard Deviations:

    Gamma.cs2
    ## Marginal variance covariance matrix
    ##      [,1] [,2] [,3] [,4]
    ## [1,]
    ## [2,]
    ## [3,]
    ## [4,]
    ## Standard Deviations:
    [values not preserved]

Model selection

SAS and R use different conventions to calculate AIC and BIC. We have used ML here, in which case AIC is the same but BIC differs, as noted above. If we had used REML, both the AIC and BIC values would be calculated differently by SAS and R, which use different conventions regarding the number of observations and the number of parameters; such values can be compared within a single implementation, but not across SAS and R.

Use the anova() function to compare the models.

Both AIC and BIC support the common compound symmetric correlation model with a different variance for each gender (constant across time).
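A comparison of this kind can be sketched with nlme's `gls()` and `anova()`. The code below is an illustration under assumptions, not the lecture's own script: it uses the `Orthodont` data from nlme as a stand-in for `ex.data` (with `Sex` and `Subject` in place of `gender` and `child`), and it fits only three of the covariance structures.

```r
library(nlme)

# Assumption: Orthodont plays the role of ex.data. All fits use ML so that
# the information criteria are comparable across covariance structures.
fit_un <- gls(distance ~ -1 + Sex + age:Sex, data = Orthodont,
              correlation = corSymm(form = ~ 1 | Subject),      # unstructured
              weights = varIdent(form = ~ 1 | age), method = "ML")

fit_cs <- gls(distance ~ -1 + Sex + age:Sex, data = Orthodont,
              correlation = corCompSymm(form = ~ 1 | Subject),  # compound symmetry
              method = "ML")

fit_cs2 <- gls(distance ~ -1 + Sex + age:Sex, data = Orthodont,
               correlation = corCompSymm(form = ~ 1 | Subject),
               weights = varIdent(form = ~ 1 | Sex),            # variance by group
               method = "ML")

# Table of df, AIC, BIC and log-likelihood; smaller AIC/BIC is preferred
print(anova(fit_un, fit_cs, fit_cs2))
```

The likelihood ratio tests that `anova()` prints are only meaningful for nested pairs of models; AIC and BIC can be read for any pair fitted by ML to the same data.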

    anova(fit.un, fit.cs, fit.cs2, fit.ar1)
    ##          Model  df  AIC  BIC  logLik  Test  L.Ratio  p-value
    ## fit.un
    ## fit.cs
    ## fit.cs2
    ## fit.ar1
    [values not preserved]

Fitting linear mixed effects models using the lme() function in nlme

A general call to lme() looks like

    fit.object <- lme(model formula, random, correlation, weights, data)

correlation is a specification for the within-individual correlation structure $R_i$; the variable on the right of the | specifies the factor determining sets of observations that are assumed independent/uncorrelated (observations are independent by child here).

See the nlme documentation for more information, and the corClasses and corStruct help pages for lists of built-in correlation structures.

weights is a specification for the nature of the within-individual variances in $R_i$; the variable on the right of the | specifies a feature by which they can differ. Here, using age specifies that the variances on the diagonal of the overall covariance matrix can differ across age (time); using gender specifies a different variance for each gender (common for all times); and using gender*age gives variances that change over age and differ between genders.

In general, the weights option allows one to make the variance on the diagonal of the overall covariance model differ depending on a group factor by

    weights = varIdent(form = ~ 1 | groupvar)

or change over time by

    weights = varIdent(form = ~ 1 | timevar)

Model (a): Common $g$ (or $D$) matrix for both genders, with the default diagonal within-child covariance matrix $R_i$ having the same variance $\sigma^2$ for each gender. Thus $\sigma^2$ is the sum of the realization process variance and the measurement error variance.

    dental.lme.a <- lme(distance ~ -1 + gender + age:gender, data = thedat,
                        random = ~ age | child, method = "ML")

    beta.a <- fixed.effects(dental.lme.a)   # beta, also fixef(dental.lme.a)
    b.a    <- random.effects(dental.lme.a)  # posterior modes b_i, also ranef(dental.lme.a)
    sebeta.a <- summary(dental.lme.a)$tTable[, "Std.Error"]
    # these SEs are "off" by a factor very close to 1
    D.a      <- getVarCov(dental.lme.a, type = "random.effects")           # D (g)
    sigma2.a <- dental.lme.a$sigma^2                                       # sigma^2
    V.a <- getVarCov(dental.lme.a, type = "marginal", individual = 1)      # V_i
    R.a <- getVarCov(dental.lme.a, type = "conditional", individual = 1)   # R_i
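The extracted pieces fit together as $V_i = Z_i D Z_i' + \sigma^2 I$. The sketch below checks this numerically for one child. It assumes that nlme's `Orthodont` data can stand in for `thedat` (with `Sex` and `Subject` replacing `gender` and `child`); the object names are introduced here for illustration only.

```r
library(nlme)

# Assumption: Orthodont stands in for thedat (ages 8, 10, 12, 14 per child).
fit <- lme(distance ~ -1 + Sex + age:Sex, data = Orthodont,
           random = ~ age | Subject, method = "ML")

D      <- getVarCov(fit, type = "random.effects")   # estimated var(b_i)
D      <- matrix(as.numeric(D), 2, 2)               # drop class and attributes
sigma2 <- fit$sigma^2                               # estimated sigma^2

# Marginal covariance reported by nlme for the first child
V1 <- getVarCov(fit, type = "marginal", individual = 1)[[1]]

# The same matrix rebuilt by hand from Z_i, D and sigma^2
Z        <- cbind(1, c(8, 10, 12, 14))
V_manual <- Z %*% D %*% t(Z) + sigma2 * diag(4)

print(all.equal(matrix(as.numeric(V1), 4, 4), V_manual))
```

Because every child is measured at the same four ages and the model uses a common $D$ and $\sigma^2$, the marginal covariance matrix is the same for every child under this model.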

    sebeta.a <- summary(dental.lme.a)$tTable[, "Std.Error"]
    # these SEs are "off" by a factor very close to 1; DON'T USE THEM

    # The attribute varFix returns the CORRECT
    # model-based covariance matrix!
    # Thus, the square roots of its diagonal elements are
    # model-based standard errors (compare to SAS)
    # USE THE FOLLOWING ONE FOR SEs
    sebeta.model.a <- sqrt(diag(dental.lme.a$varFix))

    > sebeta.a
    gender0  gender1  gender0:age  gender1:age

    > sebeta.model.a
    gender0  gender1  gender0:age  gender1:age
    [values not preserved]

    # Compare the fitted models via AIC and BIC; model (b),
    # with diagonal R_i with gender-specific within-child
    # variances and common random effects covariance
    # matrix D, is preferred
    dental.lme.b <- lme(distance ~ -1 + gender + age:gender, data = thedat,
                        random = ~ age | child,
                        weights = varIdent(form = ~ 1 | gender), method = "ML")

    # Refit model (b) using REML (the lme() default) and extract the estimates
    dental.lme.b.reml <- lme(distance ~ -1 + gender + age:gender, data = thedat,
                             random = ~ age | child,
                             weights = varIdent(form = ~ 1 | gender))
    beta.b.reml <- fixed.effects(dental.lme.b.reml)               # beta
    # model-based SEs, taken from the REML fit
    sebeta.model.b.reml <- sqrt(diag(dental.lme.b.reml$varFix))
    b.b.reml <- random.effects(dental.lme.b.reml)                 # posterior modes b_i
    D.b.reml <- getVarCov(dental.lme.b.reml, type = "random.effects")  # D

    # Random effects: empirical Bayes estimates bhat_i (first 20 rows)
    b.b.reml[1:20, ]
    ##    (Intercept)  age
    [values not preserved]

    # PA predicted values X_i betahat are produced by level = 0;
    # SS predicted values X_i betahat + Z_i bhat_i are produced
    # by level = 1; both are obtained with level = 0:1
    fitted(dental.lme.b.reml, level = 0:1)[1:20, ]
    ##    fixed  child
    [values not preserved]
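The relation between the two kinds of prediction can be verified by hand: the subject-specific (level 1) fitted value equals the population-averaged (level 0) value plus $Z_i \hat b_i$. The sketch below does this on nlme's `Orthodont` data, assumed here as a stand-in for `thedat`; for brevity it omits the gender-specific variances of model (b).

```r
library(nlme)

# Assumption: Orthodont stands in for thedat; REML is the lme() default.
fit <- lme(distance ~ -1 + Sex + age:Sex, data = Orthodont,
           random = ~ age | Subject)

pred <- fitted(fit, level = 0:1)   # columns "fixed" (PA) and "Subject" (SS)

# Empirical Bayes estimates bhat_i, one row per observation's child
b <- as.matrix(ranef(fit))[as.character(Orthodont$Subject), ]

# SS prediction = PA prediction + bhat_i0 + bhat_i1 * age
ss_manual <- pred[, "fixed"] + b[, "(Intercept)"] + b[, "age"] * Orthodont$age

print(all.equal(unname(ss_manual), unname(pred[, "Subject"])))
```

The level 0 column depends only on a child's group and age, while the level 1 column also moves each child's line toward their own observed trajectory.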

    # Plot SS residuals vs. predicted values
    # pdf("dental.residplot.pdf", width = 10)
    plot(dental.lme.b.reml, resid(., type = "p", level = 1) ~ fitted(., level = 1))
    qqnorm(dental.lme.b.reml, ~ resid(., type = "p", level = 1), abline = c(0, 1))

    # One can also make QQ plots and histograms of
    # the bhat_i themselves to assess the normality of
    # the random effects, but remember that
    # these are "shrunken", so they could be misleading.
    qqnorm(dental.lme.b.reml, ~ ranef(.))

Figure 6: Plot of SS residuals vs. predicted values. [Standardized residuals (y-axis) against fitted values (x-axis).]

No obvious pattern noticed.

Figure 7: QQ plot of SS residuals. [Quantiles of standard normal against standardized residuals.]

Not too bad.

Figure 8: QQ plots for estimated random effects. [Quantiles of standard normal against random effects; panels: (Intercept), age.]

Not too bad.
