Part III: Linear mixed effects models

Size: px

Start display at page:

Download "Part III: Linear mixed effects models"

Jemimah Parker
5 years ago
Views:

1 Part III: Linear mixed effects models 159 BIO 245, Spring 2018

2 Dental growth data Females Males Length, mm Length, mm Age, years Age, years One possible scientific goal is to estimate and formally compare the average growth trajectory between males and females 160 BIO 245, Spring 2018

3 In Part II we considered two-stage least squares as one possible analysis strategy: (1) estimate the growth trajectory for each child: Y k = Z k β k + ǫ k child-specific regression parameters, β k ǫ k are observation-specific random variations around each subjects underlying growth curve assumed to be i.i.d with mean 0 and variance σ 2 k (2) consider differences in the child-specific coefficients between males and females β k = W k β + γ k β characterizes the mean growth curve in the population (of children) γ k are child-specific deviations around the population growth curve assumed to be i.i.d with mean 0 and variance-covariance matrix G 161 BIO 245, Spring 2018

4 Results: Fitted lines Coefficients Length, mm Slope Age, years Intercept > summary(lm(beta1 ~ gender, data=betamat))... Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) gender * BIO 245, Spring 2018

5 Issues The design matrix is constrained at each stage of the analysis stage 1 is restricted to within-subject covariates stage 2 is restricted to between-subject covariates Information is lost by having summarized the response vector for subject k at stage 1 Differential uncertainty in the β k is not accounted for at stage 2 estimates, not the true β k uncertainty depends on n k and when the observations are collected The fact that observations are correlated is ignored All of these are key motivators for combining stages 1 and 2 into a single model formulation 163 BIO 245, Spring 2018

6 Linear mixed effects models Basic idea Assume that each cluster has a regression model characterized by cluster-specific regression parameters Structure the cluster-specific regressions across the population of clusters via a series of p fixed effects parameters that are common to all clusters in the population q random effects parameters that permit cluster-specific perturbations As we will see, both sets of parameters allow for differences in the regressions across subjects 164 BIO 245, Spring 2018

7 Notation Y k = (Y k1, Y k2,..., Y knk ) T Response vector β = (β 1, β 2,..., β p ) T X ki = (X ki,1, X ki,2,..., X ki,p ) X k = (X k1, X k2,..., X knk ) T γ k = (γ k1, γ k2,..., γ kq ) T Z ki = (Z ki,1, Z ki,2,..., Z ki,q ) Z k = (Z k1, Z k2,..., Z knk ) T ǫ k = (ǫ k1, ǫ k2,..., ǫ knk ) T Fixed effects Design matrix for the fixed effects n k p Random effects Design matrix for the random effects n k q Residual error terms 165 BIO 245, Spring 2018

8 Definition By a linear mixed effects model, we mean a statistical model with the following assumptions/components: model for the response of the k th cluster, given the fixed and random effects: Y k = X k β + Z k γ k + ǫ k model for the random effects and residual error: E[γ k ] = 0 E[ǫ k ] = 0 Cov[γ k ] = G(α) Cov[ǫ k ] = R k (α) Cov[γ k,ǫ k ] = 0 α referred to as covariance parameters responses across clusters are independent of eachother 166 BIO 245, Spring 2018

9 Note, this specification defines a marginal linear model: E[Y k X k ] = E γ [E Y [Y k X k, γ k ]] = E γ [X k β + Z k γ k ] = X k β µ k (β) Cov[Y k X k ] = Cov γ [E Y [Y k X k, γ k ]] + E γ [Cov Y [Y k X k, γ k ]] = Cov γ [X k β + Z k γ k ] + E γ [R k (α)] = Z k G(α)Zk T + R k (α) Σ k (α) Cov[Y k,y k ] = 0 In principle, therefore, we could use the methods developed in Part II to estimate components of this model 167 BIO 245, Spring 2018

10 From the expression for Σ(α), we can see that G k (α) and R k (α) jointly characterize variation in the observed responses around the population mean induced by β Σ k (α) = Z k G(α)Z T k + R k (α) G(α) is a q q matrix that quantifies random variation in the trajectories across clusters R k (α) is a n k n k matrix that quantifies random variation within subjects 168 BIO 245, Spring 2018

11 Interpretation Consider the interpretation of β in: Y k = X k β + Z k γ k + ǫ k Following standard principles, the components of β correspond to differences in the mean response between specific populations determined by the specific covariate under consideration require holding other terms constant, including the random effects interpretation is therefore conditional on the random effect Since the γ k are continuous, in theory they are unique to each cluster interpretation of β is therefore cluster-specific Some folks find this unappealing simultaneously thinking about averages and yet we are conditioning on a specific cluster 169 BIO 245, Spring 2018

12 Recall, however, the induced marginal model: E[Y k X k ] = X k β the interpretation of β in this model does not require conditioning on the subject interpretation is therefore marginal with respect to cluster membership more in line with our usual interpretation of regression parameters The main point here is that despite the model being conditionally specified, since the true conditional and marginal parameters (i.e. the regression coefficients) are the same numerically one can interpret results from a linear mixed effects model as pertaining to marginal associations relevant key question becomes how do you estimate β as well as perform inference? 170 BIO 245, Spring 2018

13 Q: What about the interpretation of the random effects? often interpreted as latent characteristics that are specific to the subject represent the collective impact of factors relevant to the response but not included in the model Need to be careful, however, not to interpret their inclusion as adjusting for unmeasured confounding 171 BIO 245, Spring 2018

14 Special case #1: Random intercepts model Arguably the simplest mixed effects model is one in which a single random intercept is introduced: Y ki = X T kiβ + γ 0k + ǫ ki γ 0k is a cluster-specific deviation around the population intercept, β 0 assume E[γ 0k ] = 0 and V[γ 0k ] = σ 2 γ assume E[ǫ ki ] = 0 and V[ǫ ki ] = σ 2 ǫ Consider the induced marginal covariance, Σ k, by considering two specific study units: Y ki = X T kiβ + γ 0k + ǫ ki Y kj = X T kjβ + γ 0k + ǫ kj notice how the two observations share the same value of γ 0k 172 BIO 245, Spring 2018

15 We then have: V[Y ki ] = V γ [E Y [Y ki γ 0k ]] + E γ [V Y [Y ki γ 0k ]] = σ 2 γ + σ 2 ǫ Cov[Y ki,y kj ] = Cov γ [E Y [Y ki γ 0k ],E Y [Y kj γ 0k ]] = σ 2 γ + E γ [Cov Y [Y ki,y kj γ 0k ]] so that for α = (σγ, 2 σǫ): 2 Σ(α) = σ 2 γ +σ 2 ǫ σ 2 γ... σ 2 γ σγ 2 σγ 2 +σǫ 2... σγ σ 2 γ σ 2 γ... σ 2 γ +σ 2 ǫ 173 BIO 245, Spring 2018

16 Consequently, the induced marginal model has an exchangeable dependence structure Notice that the correlation between two study units is: Cor[Y ki,y kj ] = σ 2 γ σ 2 γ + σ 2 ǫ = Between Between + Within [0,1] only positive correlation is entertained degree of correlation depends on the interplay between the between-cluster variation and the within-cluster variation in the responses if σ 2 γ = 0 there is no correlation if σ 2 ǫ = 0 there is perfect (positive) correlation 174 BIO 245, Spring 2018

17 Visualization of K=10 clusters with n k =5: σ γ = 1; σ ε = 1 σ γ = 1; σ ε = 0 Y ki Y ki T ki T ki σ γ = 0; σ ε = 1 σ γ = 2; σ ε = 1 Y ki Y ki T ki T ki 175 BIO 245, Spring 2018

18 Special case #2: Random intercepts/slopes model The random intercept model can be extended to permit the effects of select covariates to vary across the clusters For example in a longitudinal study in which T ki is the timing of the i th observation in the k th cluster, we might specify the model: Y ki = X T kiβ + γ 0k + T ki γ 1k + ǫ ki γ 0k is a cluster-specific deviation around the population intercept γ 1k is a cluster-specific deviation around the population slope for time assume T ki is an element of X ki assume Cov[γ k ] = G(α) = Σ γ,00 Σ γ,01 Σ γ,01 Σ γ,11 assume E[ǫ ki ] = 0 and V[ǫ ki ] = σǫ BIO 245, Spring 2018

19 Again consider the induced marginal covariance, Σ k, by considering two specific study units: Y ki = β 0 + β 1 T ki + γ 0k + γ 1k T ki + ǫ ki Y kj = β 0 + β 1 T kj + γ 0k + γ 1k T kj + ǫ kj for simplicity, just have the intercept and a term for T ki in X k We then have: V[Y ki ] = V γ [E Y [Y ki γ k ]] + E γ [V Y [Y ki γ k ]] = Σ γ,00 + 2Σ γ,01 T ki + Σ γ,11 T 2 ki + σ 2 ǫ Cov[Y ki,y kj ] = Cov γ [E Y [Y ki γ],e Y [Y kj γ k ]] + E γ [Cov Y [Y ki,y kj γ k ]] = Σ γ,00 + Σ γ,01 (T ki +T kj ) + Σ γ,11 T ki T kj 177 BIO 245, Spring 2018

20 In contrast to the exchangeable structure induced by the random intercepts model, the random intercepts/slopes model permits: heteroskedasticity across the study units within a cluster covariance (correlation) to depend on the two study units under consideration Note both of these extensions are specific to the covariate whose slope is permitted to vary across the clusters In principle, one could include any element of X k in Z k depending on what we believe about the structure of dependence the scientific interest(s) 178 BIO 245, Spring 2018

21 Visualization of K=10 clusters with n k =5: Σ γ00 = 1; Σ γ01 = 0; Σ γ11 = 1; σ ε = 0 Σ γ00 = 0.1; Σ γ01 = 0; Σ γ11 = 1; σ ε = 0 Y ki Y ki T ki T ki Σ γ00 = 1; Σ γ01 = 0; Σ γ11 = 0.1; σ ε = 0 Σ γ00 = 1; Σ γ01 = 0.8; Σ γ11 = 1; σ ε = 0 Y ki Y ki T ki T ki 179 BIO 245, Spring 2018

22 Special case #3: Random intercepts/serial dependence model The two special cases we ve seen so far build flexibility in the first component of: Cov[Y k X k ] = Σ k (α) = Z k G(α)Z T k }{{} Random effects + R k (α) }{{} Residual error i.e. solely via the random effects An alternative strategy to introducing flexibility is to explicitly model the correlation structure in the R(α) i.e. move beyond and permit the ǫ ki to be correlated R k (α) = σ 2 ǫi nk 180 BIO 245, Spring 2018

23 In a longitudinal study, one could do this by postulating that the residual error terms arise, in part, due to some (latent) stochastic process introduces serial dependence For example, the random intercepts/serial dependence model: Y ki = X T kiβ + γ 0k + W k (T ki ) + ǫ ki γ 0k is a cluster-specific random intercept assume E[γ 0k ] = 0 and V[γ 0k ] = σ 2 γ W k (T ki ) is a serial dependence term assume E[ǫ ki ] = 0 and V[ǫ ki ] = σ2 ǫ Assume the stochastic process W k ( ) is mean zero and is characterized by its covariance function: Cov[W k (T ki ), W k (T kj )] = σ 2 Wρ(U k,ij ) where U k,ij = T ki T kj 181 BIO 245, Spring 2018

24 Note, the ǫ ki terms in this model don t have the same interpretation as in the two previous models capture variation in the error terms beyond that explained by W k ( ) Under this model, assuming independence between γ 0k and W k ( ), for two specific study units we have: Y ki = β 0 + β 1 T ki + γ 0k + W k (T ki ) + ǫ ki Y kj = β 0 + β 1 T kj + γ 0k + W k (T kj ) + ǫ kj V[Y ki ] = V γ [E Y [Y ki γ 0k ]] + E γ [V Y [Y ki γ 0k ]] = σ 2 γ + σ 2 W + σ 2 ǫ Cov[Y ki,y kj ] = Cov γ [E Y [Y ki γ 0k ],E Y [Y kj γ 0k ]] + E γ [Cov Y [Y ki,y kj γ 0k ]] = σγ 2 + σwρ(u 2 k,ij ) 182 BIO 245, Spring 2018

25 Notice that the correlation between two study units is: Cor[Y ki,y kj ] = σ2 γ + σ 2 W ρ(u k,ij) σ 2 γ + σ 2 W + σ2 ǫ Specific examples for the correlation function include: ρ(u k,ij ) = ρ U kij ρ(u k,ij ) = exp{ U kij /range} auto-regressive model exponential spatial correlation 183 BIO 245, Spring 2018

26 Visualization of K=10 clusters with n k =5 based on an auto-regressive correlation function for W k ( ): σ γ = 0; σ W = 1; ρ W = 1.0; σ ε = 0 σ γ = 0; σ W = 1; ρ W = 0.8; σ ε = 0 Y ki Y ki T ki T ki σ γ = 0; σ W = 1; ρ W = 0.0; σ ε = 0 σ γ = 0; σ W = 2; ρ W = 0.8; σ ε = 0 Y ki Y ki T ki T ki 184 BIO 245, Spring 2018

27 Comment The random intercepts/slopes and the random intercepts/serial dependence models both structure Σ k (α) such that correlation among study units depends, in part, on the timing of their measurement One way of thinking how to distinguish between the two models is that the: random intercepts/slopes model structures dependence globally random intercepts/serial dependence structures dependence locally 185 BIO 245, Spring 2018

28 Estimation/inference Model components: response of the k th cluster, given the fixed and random effects: Y k = X k β + Z k γ k + ǫ k random effects and residual error: E[γ k ] = 0 E[ǫ k ] = 0 Cov[γ k ] = G(α) Cov[ǫ k ] = R k (α) Cov[γ k,ǫ k ] = 0 responses across clusters are independent of eachother If we are to proceed on the basis of likelihood-based estimation/inference we need to specify a complete probability distribution for the data 186 BIO 245, Spring 2018

29 The marginalized likelihood The most common way forward is to assume that and that γ k MVN q (0,G(α)) ǫ k MVN nk (0,R k (α)) γ k ǫ k Given these assumptions, we can write down the marginal distribution of the response for the k th cluster as: f Y (Y k β,α) = f(y k,γ k β,α) γ k = f Y γ (Y k γ k,β,α)f γ (γ k α) γ k marginal with respect to the random effects a convolution of two multivariate Normal distributions 187 BIO 245, Spring 2018

30 Since the convolution of two Normal distributions is itself a Normal distribution, and using results we derived on slide 167, we can write that: Y k β,α MVN nk (X k β, Σ k (α)) where Σ k (α) = Z k G(α)Z T k +R k (α) Use this as a basis for an intergated or marginal likelihood that can be used to perform estimation/inference with respect to (β, α): L(β,α) = K k=1 f Y (Y k β,α) assume that contributions from different clusters are independent From this, the log-likelihood is proportional to: l(β,α) K K log Σ k (α) + (Y k X k β) T Σ k (α) 1 (Y k X k β) k=1 k=1 188 BIO 245, Spring 2018

31 ML estimation of β Differentiating with respect to β yields the score: U β (β,α) = β l(β,α) = K k=1 X T k Σ k (α) 1 (Y k X k β) which yields the familiar MLE for β given α: ( K ) 1( K ) β(α) = Xk T Σ k (α) 1 X k Xk T Σ k (α) 1 Y k k=1 k=1 The covariance of β(α) is based on the inverse of the Fisher information matrix: [ 2 ] K I β (β,α) = E β β T l(β,α) = Xk T Σ k (α) 1 X k k=1 denoted this by A(α) when discussing least squares estimation 189 BIO 245, Spring 2018

32 Notice that the MLE is equivalent to the WLS estimator with W taken to be Σ k (α) 1 we could therefore use the theory of WLS to perform estimation/ inference for β since β(α) is a linear combination of the Y k, we have: β(α) MVN p (β, A(α) 1 ) since α is unknown, we could plug-in any consistent estimator and appeal to Slutsky s Theorem to see that: as K A( α) 1/2 ( β( α) β) MVN p (0, I) note, it is not sufficient for n k for fixed K 190 BIO 245, Spring 2018

33 ML estimation of α Differentiating the log-likelihood with respect to α yields: U α (β,α) = α l(β,α) K [ = (Y k X k β) T Σ k (α) 1 Σ k α Σ k(α) 1 (Y k X k β) k=1 trace ( )] Σ 1 Σ k k α In general one won t be able to solve U α (β,α) = 0 analytically However one proceeds, the solution α must satisfy the constraint that the resulting covariance matrix Σ( α) is positive definite several computation methods for parameterizing Σ(α) so that this constraint is satisfied e.g. Cholesky decomposition 191 BIO 245, Spring 2018

34 REML estimation Recall that, in general, the MLE for variance components in a multivariate Normal exhibit small sample bias intuition is that the standard ML estimator doesn t acknowledge the fact that the mean (i.e. β) is estimated implications for inference in small samples When considering estimation/inference for general linear models for dependent data we derived the REML estimator as an alternative REML can also be used here Operationally, the strategy involves the use of the linear transformation: Y (Z, β) where Z = B T Y with B the N (N p) matrix defined by the requirements that BB T = A and B T B = I. 192 BIO 245, Spring 2018

35 Exploiting the facts that E[Z] = 0 and Cov[Z, β] = 0, the pdf of Z, expressed as a function of Y, is proportional to: f Y (Y ) f( β) = (2π) (N p)/2 Σ(α) 1/2 X T Σ(α) 1 X 1/2 exp { 1 } 2 (Y X β) T Σ(α) 1 (Y X β) The REML estimator of Σ, therefore, maximizes the so-called restricted log-likelihood: l (α) log Σ log X T Σ(α) 1 X (Y X β) T Σ(α) 1 (Y X β) 193 BIO 245, Spring 2018

36 Inference From an inferential perspective, one might be interested in answering questions regarding the mean model and/or the dependence model Mean model: formally evaluate the association between some exposure of interest and the response compare competing specifications of some association investigate potential effect modification Dependence model: explain the nature of random variation in the data valid inference for the fixed effects requires a correct structure for Σ k (α) avoid over-parameterization which can lead to inefficient inference for the fixed effects 194 BIO 245, Spring 2018

37 Inference for the mean model Consider testing the fixed effects in nested linear mixed models: H 0 : β = β 1 0 versus H 1 : β = β 1 β 2 Usual Wald test will be valid, regardless of whether ML or REML was used If ML was used, one could also use a likelihood ratio test If REML was used, a likelihood ratio test will not be valid X k β under H 0 is not the same as X k β under H 1 correspondingly different Z = B T Y under H 0 and H 1 REML likelihoods are based on different sets of observations differences in REML deviances are not meaningful 195 BIO 245, Spring 2018

38 Inference for the dependence structure Consider testing whether the random slopes in a random intercepts/slopes model are necessary i.e. compare the fit based on to that based on random intercepts/slopes model: Y ki = X T kiβ + γ 0k + T ki γ 1k + ǫ ki to that based on the random intercepts model: Y ki = X T kiβ + γ 0k + ǫ ki Recall the covariance structure for the random intercepts and slopes: Cov[γ k ] = G(α) = Σ γ,00 Σ γ,01 Σ γ,01 Σ γ, BIO 245, Spring 2018

39 We can therefore structure the hypotheses for this question as: H 0 : G(α) = Σ γ, H 1 : G(α) = versus Σ γ,00 Σ γ,01 Σ γ,01 Σ γ,11 Notice that under the null, Σ γ,11 = 0 G(α) under H 0 is the boundary of the parameter space defined by the alternative violation of a standard assumption used to establish the (asymptotic) χ 2 distribution of the likelihood ratio test statistic 197 BIO 245, Spring 2018

40 In general, when testing variance components, the asymptotic sampling distribution of the LRT statistic is a mixture of χ 2 distributions nature of the mixture depends on the hypotheses one is testing For H 0 and H 1 above, the correct sampling distribution is a 50:50 mixture of a χ 2 2 distribution and a χ 2 1 distribution Density of χ LRT statistic Density of the mixture LRT statistic 198 BIO 245, Spring 2018

41 See that the naïve use of a χ 2 2 distribution will lead to overly conservative inference for a given value of the test statistic, the p-value based on a χ 2 2 distribution will be bigger than the p-value based on the correct mixture fail to reject H 0 when one should reject it leads to an incorrect over-simplification of the model for the dependence structure Additional details can be found in: Self and Liang (JASA, 1987) Stram and Lee (Biometrics, 1994) Finally, note that, in contrast to likelihood-based inference for β, one can perform valid likelihood-based inference for α whether ML or REML is used X k β is the same under H 0 and H 1 REML likelihoods are based on the same set of observations 199 BIO 245, Spring 2018

42 Empirical Bayes So far estimation/inference for (β, α) has been based on a marginalized likelihood in which the random effects have been integrated out In some settings, one might be interested in the random effects themselves visualization of heterogeneity across clusters characterization of cluster-specific associations profiling and ranking If (β, α) were known, one could proceed within the Bayesian paradigm for the unknown γ k : take the conditional distribution of the data as the likelihood take the distributional assumptions for the random effects as the prior the posterior of γ k given the data Y k : π(γ k Y k,β,α) L(γ k ;Y k,β,α) π(γ k α) 200 BIO 245, Spring 2018

43 As an estimate of γ k, one might compute and report the mean of the posterior distribution: γ k = E[γ k Y k,β,α] For a given form of this expectation, one could plug in estimates of (β, α) based on ML or REML Referred to as the empirical Bayes estimator empirical in the sense that the data was used to inform the prior i.e. we plugged α into π(γ k α) 201 BIO 245, Spring 2018

44 Empirical Bayes in a simple setting Consider the simple setting in which there are no covariates: Y ki = β 0k + ǫ ki β 0k Normal(β 0, τ 2 ) ǫ ki Normal(0, σ 2 ) The induced marginal distribution of the response vector is: Y k1 Y k2... Y knk MVN nk β 0 β 0... β 0, τ 2 +σ 2 τ 2... τ 2 τ 2 τ 2 +σ 2... τ τ 2 τ 2... τ 2 +σ BIO 245, Spring 2018

45 It is relatively straightforward to show that the ML estimator of β 0, the population mean, is simply the WLS estimator: ˆβ 0 = k w ky k k w k where w k = τ 2 +σ 2 /n k obtain estimates of σ 2 and τ 2 via ML or REML Now consider the joint distribution of the cluster-specific mean, Y k and β 0k : Y k β 0k MVN 2 β 0 β 0, τ2 +σ 2 /n k τ 2 τ 2 τ BIO 245, Spring 2018

46 The corresponding conditional distribution of β 0k Y k is a Normal with mean E[β 0k Y k,β 0,σ 2,τ 2 ] = β 0 + = τ 2 τ 2 +σ 2 /n k (Y k β 0 ) σ 2 /n k τ 2 +σ 2 /n k β 0 + τ 2 τ 2 +σ 2 /n k Y k weighted average of the population mean and the cluster-specific mean of the responses as n k gets large, more weight is attached to the sample mean also depends on the relative degree of between- versus within-cluster variation Operationally one would plug-in the estimates of β 0, σ 2 and τ 2, we get the empirical Bayes estimate of β 0k : β 0k = ˆσ 2 /n k ˆτ 2 + ˆσ 2 /n k ˆβ0 + ˆτ 2 ˆτ 2 + ˆσ 2 /n k Y k 204 BIO 245, Spring 2018

47 Empirical Bayes for linear mixed effects models Now consider the linear mixed effects model: Y k = X k β + Z k γ k + ǫ k γ k MVN q (0,G(α)) ǫ k MVN nk (0,R k (α)) γ k ǫ k for which the induced marginal distribution of Y k is a multivariate Normal: Y k MVN nk (X k β, Σ k (α)) where Σ k (α) = Z k G(α)Z T k +R k (α) Suppose we obtain estimates of (β, α) via ML or REML 205 BIO 245, Spring 2018

48 Now consider the (cluster-specific) joint distribution of Y k and γ k : Y k γ k MVN nk +q X kβ 0, Σ k(α) G(α)Zk T Z k G(α) G(α) since Cov[Y k, γ k ] = Cov[X k β + Z k γ k + ǫ k, γ k ] = Cov[X k β, γ k ] + Cov[Z k γ k, γ k ]+ Cov[ǫ k, γ k ] = 0 + Z k Cov[γ k, γ k ] + 0 = Z k G(α) The induced conditional distribution of γ k Y k is: γ k Y k MVN q (G(α)Z T k Σ k (α) 1 (Y k X k β), Σ k(α)) where Σ k(α) = G(α) G(α)Z T k Σ k (α) 1 Z k G(α) 206 BIO 245, Spring 2018

49 Consequently, the empirical Bayes estimator of γ k is: γ k = G( α)z T k Σ k ( α) 1 (Y k X k β) simply plug in the estimates (ML or REML) of (β, α) Can also examine the fitted values of the response based on the empirical Bayes estimator: Ŷ k = X k β + Zk γ k = X k β + Zk [ G( α)z T k Σ k ( α) 1 (Y k X k β) ] = [I nk Z k G( α)z T k Σ k ( α) 1 ]X k β + Zk G( α)z T k Σ k ( α) 1 Y k weighted average of the estimated population profile, X k β, and the observed data, Y k. 207 BIO 245, Spring 2018

50 Fitting models in R There are many ways of fitting linear mixed effects models in R Two functions/packages written by Doug Bates and colleagues are: lme() function in the nlme package lmer() function in the lme4 package Unfortunately, no single function/package has the functionality to fit all forms of a linear mixed effects model that one could be interested in e.g. lme() permits direct specification of structure for the off-diagonal elements of R k (α), whereas lmer() does not (at least as far as I can tell) e.g. lmer() permits non-nested levels of clustering, whereas lme() does not (at least as far as I can tell) 208 BIO 245, Spring 2018

51 The basic call to lme() is has the following elements: fixed random correlation weights method Specification for the regression model, X k β Specification for the random effects model, Z k γ k Specification for serial dependence in R k (α) Specification for heteroskedasticity in R k (α) Indicator of whether ML or REML is used > ## >?lme ## Help on specifying a linear mixed model >?corclasses ## Help on specifying serial dependence >?varclasses ## Help on specifying heteroskedasticity Additional resources: Pinheiro J. and Bates D. Mixed-Effects Models in S and S-PLUS (2006) Galecki A. and Tomasz B. Linear mixed-effects models using R: A step-by-step approach (2013). 209 BIO 245, Spring 2018

52 Dental growth data Fit a series of models with the same mean structure: E[Y ki ] = β 0 + β 1 A ki + β 1 G k + β 3 A kig k where A ki = A ki 8 Consider a range of specifications for dependence: Σ k (α) = Z k G(α)Z T k +R k (α) > ## > load("growth.rdata") > growth$agestar <- growth$age - 8 > library(nlme) > > ## Naive - independent errors > ## > fit0.ml <- glm(length ~ agestar * gender, data=growth, family=gaussian) 210 BIO 245, Spring 2018

53 > ## Random intercepts + independent errors > ## > fit1.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ 1 id), + data=growth, method="ml") > > ## Independent random intercepts/slopes + independent errors > ## * two subject-specific random effects are independent > ## > fit2.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ agestar id, pdclass="pddiag"), + data=growth, method="ml") > > ## Correlated random intercepts/slopes + independent errors > ## * two subject-specific random effects are correlated > ## > fit3.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ agestar id), + data=growth, method="ml") 211 BIO 245, Spring 2018

54 > ## Random intercepts + auto-regressive errors > ## > fit4.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ 1 id), + correlation=corar1(form= ~ agestar id), + data=growth, method="ml") > > ## Random intercepts + exponential spatial errors > ## > fit5.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ 1 id), + correlation=corexp(form= ~ agestar id), + data=growth, method="ml") > > ## Random intercepts + (exponential spatial and independent measurement error) > ## > fit6.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ 1 id), + correlation=corexp(form= ~ agestar id, nugget=true), + data=growth, method="ml") 212 BIO 245, Spring 2018

55 > ## Random intercepts + independent errors > ## * heteroskedasticity across age > ## > fit7.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ 1 id), + weights=varident(form= ~1 agestar), + data=growth, method="ml") > > ## Random intercepts + independent errors > ## * heteroskedasticity across gender > ## > fit8.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ 1 id), + weights=varident(form= ~1 gender), + data=growth, method="ml") 213 BIO 245, Spring 2018

56 Output: > ## > summary(fit0.ml)... Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** agestar ** gendermale * agestar:gendermale (Dispersion parameter for gaussian family taken to be ) Null deviance: on 103 degrees of freedom Residual deviance: on 100 degrees of freedom AIC: > sqrt(summary(fit0.ml)$dispersion) [1] BIO 245, Spring 2018

57 > ## Random intercepts + independent errors > ## > summary(fit1.ml) Linear mixed-effects model fit by maximum likelihood Data: growth AIC BIC loglik Random effects: Formula: ~1 id (Intercept) Residual StdDev: Fixed effects: length ~ agestar * gender Value Std.Error DF t-value p-value (Intercept) agestar gendermale agestar:gendermale Number of Observations: 104 Number of Groups: BIO 245, Spring 2018

58 > ## Independent random intercepts/slopes + independent errors > ## > summary(fit2.ml)... Random effects: Formula: ~agestar id Structure: Diagonal (Intercept) agestar Residual StdDev: > > ## Correlated random intercepts/slopes + independent errors > ## > summary(fit3.ml)... Random effects: Formula: ~agestar id Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr (Intercept) (Intr) agestar Residual BIO 245, Spring 2018

59 > ## Random intercepts + auto-regressive errors > ## > summary(fit4.ml)... Random effects: Formula: ~1 id (Intercept) Residual StdDev: Correlation Structure: ARMA(1,0) Formula: ~agestar id Parameter estimate(s): Phi BIO 245, Spring 2018

60 > ## Random intercepts + exponential spatial errors > ## > summary(fit5.ml)... Random effects: Formula: ~1 id (Intercept) Residual StdDev: Correlation Structure: Exponential spatial correlation Formula: ~agestar id Parameter estimate(s): range > > ## Correlation between two observation 1 time unit apart > ## > range < > exp(-1/range) [1] e BIO 245, Spring 2018

61 > ## Random intercepts + (exponential spatial and independent measurement error) > ## > summary(fit6.ml)... Random effects: Formula: ~1 id (Intercept) Residual StdDev: Correlation Structure: Exponential spatial correlation Formula: ~agestar id Parameter estimate(s): range nugget > > ## Correlation between two observation 1 time unit apart > ## > range < > nugget < > (1-nugget) * exp(-1/range) [1] BIO 245, Spring 2018

62 > ## Random intercepts + independent errors > ## * heteroskedasticity across age > summary(fit7.ml)... Random effects: Formula: ~1 id (Intercept) Residual StdDev: Variance function: Structure: Different standard deviations per stratum Formula: ~1 agestar Parameter estimates: BIO 245, Spring 2018

63 > ## Random intercepts + independent errors > ## * heteroskedasticity across gender > summary(fit8.ml)... Random effects: Formula: ~1 id (Intercept) Residual StdDev: Variance function: Structure: Different standard deviations per stratum Formula: ~1 gender Parameter estimates: female male BIO 245, Spring 2018

64 Comparison of model fit: Dependence model log-like AIC 0. Independence Random intercepts + inde. errors Independent random intercepts/slopes + inde. errors Correlated random intercepts/slopes + inde. errors Random intercepts + AR errors Random intercepts + ES errors Random intercepts + (ES and ME errors) Random intercepts + heteroske inde. errors (age) Random intercepts + heteroske inde. errors (gender) seems clear that model 8 provides the best fit (from among those presented, at least) 222 BIO 245, Spring 2018

65 Results for: E[Y ki ] = β 0 + β 1 A ki + β 1 G k + β 3 A kig k Dependence Estimate Standard error model β 0 β 1 β 2 β 3 β 0 β 1 β 2 β standard error estimates are notably smaller under model BIO 245, Spring 2018

66 Empirical Bayes estimates of the random effects estimates from a random intercept/slopes model compare estimates with those from K=26 separate regressions Subject specific intercepts Subject specific slopes Random effects Random effects Fixed effects Fixed effects see structure imposed by the model 224 BIO 245, Spring 2018

67 Finally consider the impact of using REML for model 8 > > fit8.reml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ 1 id), + weights=varident(form= ~1 gender), + data=growth, method="reml") > > summary(fit8.reml) Linear mixed-effects model fit by REML... Random effects: Formula: ~1 id (Intercept) Residual StdDev: Variance function: Structure: Different standard deviations per stratum Formula: ~1 gender Parameter estimates: female male BIO 245, Spring 2018

68 ML REML Est SE Est SE Fixed effects Intercept, β Main effect for age, β Main effect for gender, β Interaction, β Variance components SD of random intercepts SD for errors females males differences don t appear to be substantial 226 BIO 245, Spring 2018

69 MACS CD4+ cell count data Consider the mean model: E[CD4 ki ] = β 0 + β 1 time ki + β 2 age ki + β 3 smoker ki based on the following data manipulations: + β 4 drug ki + β 5 cesd-cs ki + β 5 cesd-lo ki > ## > load("macs.rdata") > macs$smoker <- as.numeric(macs$packs > 0) > macs$age <- macs$age / 5 > macs$cesd <- (macs$cesd - 10) / 5 restricted to K=266 participants those who seroconverted with the pre measurement no more than 6 months prior to seroconversion cesd-cs and cesd-lo are the cross-sectional and longitudinal forms of CESD we ve considered previously 227 BIO 245, Spring 2018

70 Fit seven of the models for the dependence structure considered for the dental growth data for simplicity do not consider heteroskedasticity Dependence model log-like AIC 0. Naïve independence -10,956 21, Random intercepts + inde. errors -10,705 21, Independent random intercepts/slopes + inde. errors -10,674 21, Correlated random intercepts/slopes + inde. errors -10,663 21, Random intercepts + AR errors -10,832 21, Random intercepts + ES errors -10,677 21, Random intercepts + (ES and ME errors) -10,667 21,357 indications that dependence models 3, and 6 provide the best fits to the data 228 BIO 245, Spring 2018

71 > ## Correlated random intercepts/slopes + independent errors > ## > summary(fit3.ml)... Random effects: Formula: ~time id Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr (Intercept) (Intr) time Residual Fixed effects: list(my.form) Value Std.Error DF t-value p-value (Intercept) time substantial variation in the random slopes relative to that in the random intercepts and residual error negative correlation between the random intercepts and slopes 229 BIO 245, Spring 2018

72 Subject specific slope for time Subject specific intercept vast majority of subject-specific slopes are negative participants with higher initial CD4 count tend to have steeper declines post-seroconversion 230 BIO 245, Spring 2018

73 > ## Random intercepts + (exponential spatial and independent measurement error) > ## > summary(fit6.ml) Random effects: Formula: ~1 id (Intercept) Residual StdDev: Correlation Structure: Exponential spatial correlation Formula: ~time id Parameter estimate(s): range nugget > > ## Correlation between two observation 1 time unit apart > ## > (1-nugget) * exp(-1/range) [1] Fairly substantial correlation in the serial dependence term 231 BIO 245, Spring 2018

74 Point estimates: E[CD4 ki ] = β 0 + β 1 time ki + β 2 age ki + β 3 smoker ki + β 4 drug ki + β 5 cesd-cs ki + β 5 cesd-lo ki Dependence Int time age smoker drug cesd-cs cesd-lo model BIO 245, Spring 2018

75 Standard error estimates: E[CD4 ki ] = β 0 + β 1 time ki + β 2 age ki + β 3 smoker ki + β 4 drug ki + β 5 cesd-cs ki + β 5 cesd-lo ki Dependence Int time age smoker drug cesd-cs cesd-lo model BIO 245, Spring 2018

76 Model diagnostics Since estimation/inference for linear mixed effects models is likelihood-based, the validity of results relies on a series of assumptions: correct specification of the mean model correct specification of the dependence model Normality of γ k and ǫ k K is sufficiently large for asymptotic results to apply Prior to formal modeling, one can (and should) perform an initial EDA to examine possible errors in the data as well as to get insight into model structure Once a model has been fit, the assumptions can be re-assessed using an analysis of residuals and the empirical Bayes estimates of the random effects 234 BIO 245, Spring 2018

77 Residuals In mixed effects models, the notion of a residual can be defined at a number of different levels marginal (population-level) residuals: stage-one (cluster-level) residuals: e k = Y k X k β ǫ k = Y k X k β Z k γ k γ k can also be viewed as form of stage two residual deviation around the population regression Estimates of these quantities can be obtained by plugging in β and γ k : ê k = Y k X k β ǫ k = Y k X k β Zk γ k 235 BIO 245, Spring 2018

78 Standardization The observed residuals arise from having fit a model and, therefore, should exhibit variation-covariation in accordance with the structure of the model e.g. if the model permits the residual error variance to differ by some covariate, then the observed residuals will reflect this If the goal is to investigate the extent to which the model is inadequate, we should acknowledge this phenomenon One way forward is to standardize (or normalize) the residuals let L T kl k be the Cholesky decomposition of Σ k ( α) use L k to form the standardized marginal residuals ê k = L 1 k êk = L 1 k (Y k X k β) 236 BIO 245, Spring 2018

79 If Σ k (α) has been correctly specified then Cov[ê k] I nk no mean-variance relationship in the variance components no relationship between the residuals and any of the covariates no residual correlation among observations within a cluster Note, as appropriate, one might also consider standardizing the stage one residuals ǫ k = Y k X k β Z k γ k particularly useful for investigating residual (unaccounted) serial dependence in the specification of R k (α) 237 BIO 245, Spring 2018

80 Diagnostics Conventionally, folks tend to focus on the stage one residuals and the random effects estimates when performing diagnostics for linear mixed models patterns in the population residuals could result from inadequacy in the model structure in various places Mean model: plots of ˆǫ ki versus the columns of X k to investigate whether the assumed specification of X k β is adequate plots of γ kq versus the columns of X k to investigate whether the assumed specification of Z k γ k is adequate Note, if the residuals have been standardized, one should compare them with the columns of X k similarly standardized 238 BIO 245, Spring 2018

81 Dependence model: plot of ˆǫ ki versus the fitted values, ˆµ ki = X k β + Zk γ k, to see if there is a residual mean-variance relationship plot of ˆǫ ki versus lag(ˆǫ ki ) to see if there is any residual serial dependence Normality: standard approach is to present a Q-Q plot of the fitted residual versus the expected normalized residuals from a Normal distribution stage one residuals, ˆǫ ki random effect estimates, γ k,j for j= 1,..., q as appropriate, one can also examine scatter plots of the components of γ k to assess joint Normality 239 BIO 245, Spring 2018

82 Dental growth data Recall the best fitting model was model 8 random intercepts + heteroskedastic error terms by gender Present select diagnostics based on model 1: random intercepts + homoskedastic independent error > > ## Random intercepts + independent errors > ## > fit1.ml <- lme(fixed=length ~ agestar * gender, + random=restruct(~ 1 id), + data=growth, method="ml") > > epshat <- resid(fit1.ml, type="normalized") > gammahat <- ranef(fit1.ml)[,1] 240 BIO 245, Spring 2018

83 Investigate heteroskedasticity in the standardized stage 1 residuals: years 10 years 12 years 14 years Females Males inconclusive evidence regarding heteroskedasticity by age strong evidence of heteroskedasticity by gender 241 BIO 245, Spring 2018

84 Compare residuals and lagged residuals to investigate potential serial dependence: only for observations at ages 10, 12 and 14 Stage 1 residual Lagged stage 1 residual indication of unaccounted for local correlation 242 BIO 245, Spring 2018

85 Q-Q plots: Stage 1 residuals Random intercepts Sample quantiles Sample quantiles Theoretical quantiles Theoretical quantiles looking for linearity between the theoretical and sample quantiles some evidence of heavier-than-normal tails in the stage 1 residuals random intercepts seems fine tough to tell with only K=26 children 243 BIO 245, Spring 2018

86 Summary Goals: perform estimation/inference for regression parameters from a model for a continuous response while acknowledging within-cluster dependence also perform estimation/inference for the dependence structure estimate cluster-specific parameters/effects Approach: combine fixed effects with random effects into a single model formulation: Y k = X k β + Z k γ k + ǫ k consider various strategies for structuring the model components so that a range of dependence models can be entertained structure for Cov[γ k ] = G(α) structure for Cov[ǫ k ] = R k (α) 244 BIO 245, Spring 2018

87 Estimation/inference: integrated likelihood for (β, α) ML or REML empirical Bayes for the cluster-specific random effects, γ k Since estimation/inference is fully parametric, diagnostics for model structure and assumptions are critical mean model dependence model distributional assumptions 245 BIO 245, Spring 2018

Covariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects

Covariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects Covariance Models (*) Mixed Models Laird & Ware (1982) Y i = X i β + Z i b i + e i Y i : (n i 1) response vector X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed