1 Mixed effect models and longitudinal data analysis

Mixed effects models provide a flexible approach to any situation where the data have a grouping structure that introduces some kind of correlation between the observations. The term "effects" refers to the parameters associated with the predictor variables (because they are, in some sense, the effect that the predictors have on the response). In all the methods we have seen until now, the effects have been considered fixed (as opposed to random), i.e. unknown numbers. The term "mixed effects" refers to the fact that some of the effects are now considered to be random variables. Fixed effects are usually associated with the variables that are of interest for the analysis and whose relationship with the response we hope to generalize beyond the specific setting of the experiment. Random effects are associated with predictors whose values in the experiment are a random sample from a larger real-world set, so we can model this sampling procedure with random variables. Examples are subject-specific characteristics, batch effects in industrial settings, and so on.

1.1 Linear mixed effects model

Let Y = (Y_1, ..., Y_n)^T be a vector of n observations, and for each i = 1, ..., n let x^{(i)} = (x_{i1}, ..., x_{ip_1}) and z^{(i)} = (z_{i1}, ..., z_{ip_2}) be row vectors of known covariates of length p_1 and p_2 respectively. Let X be the n × p_1 matrix with ith row x^{(i)} and Z be the n × p_2 matrix with ith row z^{(i)}. We require that p_1 + p_2 < n, and that X and Z have full rank.
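To fix ideas, here is a minimal sketch (not part of the notes; group sizes and covariates are made up) of how X and Z can be built for a random-intercept model with an intercept and one covariate in X, and one indicator column of Z per group:

```python
import numpy as np

rng = np.random.default_rng(0)

n_groups, per_group = 3, 4
n = n_groups * per_group
group = np.repeat(np.arange(n_groups), per_group)            # group label of each observation

x = rng.normal(size=n)                                       # one fixed-effect covariate
X = np.column_stack([np.ones(n), x])                         # n x p_1 fixed-effects design (p_1 = 2)
Z = (group[:, None] == np.arange(n_groups)).astype(float)    # n x p_2 random-effects design (p_2 = 3)

print(X.shape, Z.shape)                                      # (12, 2) (12, 3)
```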

The linear mixed effects model assumes that

Y = Xβ + Zu + ε,   (1)

where β is an unknown vector of fixed effects (parameters); u is an unknown vector of random effects, with u ~ N(0, G); ε is a random error, with ε ~ N(0, σ² I_n), where σ² > 0 and I_n is the n × n identity matrix; and u and ε are independent.

Alternatively, it is possible to define the model through the conditional distribution of the response vector (this is also called the hierarchical formulation):

Y | u = ũ ~ N(Xβ + Zũ, σ² I_n)   and   u ~ N(0, G).

Model (1) implies that Y ~ N(Xβ, Σ), where Σ = Z G Z^T + σ² I_n (this is called the marginal formulation of the model). Estimation and inference for the fixed effects parameters are based on this marginal formulation.

Now let α be the vector formed by the unknown parameters in the covariance matrix G and by σ² (i.e. all the unknown parameters in the covariance matrix of Y), and let θ = (β, α) be the vector containing all the unknown parameters in the marginal formulation of the model. The likelihood of the model is then

L(θ; Y) ∝ |Σ(α)|^{-1/2} exp( -(1/2) (Y - Xβ)^T Σ^{-1}(α) (Y - Xβ) ),   (2)

and we want to estimate θ by maximizing the likelihood function (or the log-likelihood). If α were known, the maximum likelihood estimator for β would be

β̂(α) = (X^T Σ^{-1}(α) X)^{-1} X^T Σ^{-1}(α) Y.   (3)

Since α (i.e. the covariance of the random effects and the error variance) is unknown in practice, we first need to estimate it. A first possible approach is to substitute β̂(α) into (2) and maximize the likelihood numerically with respect to α. This leads to the maximum likelihood estimator (MLE) for the parameter vector θ.
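As an illustration, the following sketch builds the marginal covariance Σ and computes the generalized least squares estimate (3); the values of G, σ² and the simulated data are made up, so that α is treated as known here:

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, per_group = 3, 4
n = n_groups * per_group
group = np.repeat(np.arange(n_groups), per_group)

X = np.column_stack([np.ones(n), rng.normal(size=n)])        # fixed-effects design
Z = (group[:, None] == np.arange(n_groups)).astype(float)    # random-intercept design

G = 0.5 * np.eye(n_groups)        # random-effects covariance, treated as known here
sigma2 = 1.0                      # error variance, treated as known here
beta_true = np.array([1.0, 2.0])

u = rng.multivariate_normal(np.zeros(n_groups), G)
Y = X @ beta_true + Z @ u + rng.normal(scale=np.sqrt(sigma2), size=n)

Sigma = Z @ G @ Z.T + sigma2 * np.eye(n)                     # marginal covariance of Y
Sigma_inv = np.linalg.inv(Sigma)
beta_hat = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)   # equation (3)
print(beta_hat)
```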

However, the MLE for the linear mixed effects model has a couple of drawbacks. First, it is biased (analogously to the MLE for the variance of a Gaussian sample) and the error on the random effects covariance may be large. Second, since the covariance is constrained to be positive definite, we may have numerical instability when the unconstrained maximum of the likelihood corresponds to a matrix that is not positive definite and the optimum is reached on the boundary of the permissible domain. To bypass these problems, Corbeil and Searle (1976) propose a restricted maximum likelihood (REML) approach. This consists in estimating the covariance matrix Σ first and then plugging it into (3). To estimate Σ, we consider linear combinations a_k of the observations such that a_k^T X = 0, since this cancels out the fixed effects. Let A be the matrix with columns a_1, ..., a_{n-p_1}; then U = A^T Y ~ N(0, A^T Σ(α) A). We can then estimate α by maximizing the likelihood L(α; U). Note that, while the difference between MLE and REML lies in the way the covariance parameters are estimated, they also lead to different estimates of β, since its estimator depends on Σ.

The random effects being random variables, the only parameters we can really estimate are those of their covariance structure. However, to obtain fitted values from the model we need to estimate the realization of the random effects for the different levels of the factor, since we only know the expression of Y | u. To do this, we focus on the conditional distribution of the random effects given the data, u | Y, and we can choose the mode of this distribution as predictor for the random effects (maximum a posteriori estimator, MAP). This is easily obtained from Bayes' theorem: u | Y is still multivariate Gaussian (since both u and Y | u are multivariate Gaussian), with mean

G(α) Z^T Σ^{-1}(α) (Y - Xβ),

which coincides with the mode of the distribution. We can then plug in the estimates of β and α obtained by MLE or REML (this approach is often called empirical Bayes). Therefore, the conditional mode (and mean) of the random effects is

û = G(α̂) Z^T Σ^{-1}(α̂) (Y - X β̂).

This is peculiar to Gaussian models; in a more general case we would need to maximize the conditional distribution obtained from Bayes' theorem. Henderson et al. (1959) show that the estimate of β and the conditional mean of u in model (1), where G and σ² are known (or in practice estimated by REML), are the solution of the linear system

[ X^T X    X^T Z             ] [ β̂ ]   [ X^T Y ]
[ Z^T X    Z^T Z + σ² G^{-1} ] [ û  ] = [ Z^T Y ].

However, solving this system is in practice computationally more expensive than the approaches described above.
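As a numerical illustration (again with made-up G, σ² and simulated data, treated as known), the following sketch computes û both from the conditional-mean formula and by solving Henderson's system, and checks that the two agree:

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, per_group = 3, 5
n = n_groups * per_group
group = np.repeat(np.arange(n_groups), per_group)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = (group[:, None] == np.arange(n_groups)).astype(float)
G, sigma2 = 0.5 * np.eye(n_groups), 1.0                      # treated as known here
u = rng.multivariate_normal(np.zeros(n_groups), G)
Y = X @ np.array([1.0, 2.0]) + Z @ u + rng.normal(scale=np.sqrt(sigma2), size=n)

# Conditional mean/mode of u given Y, with beta replaced by its GLS estimate.
Sigma_inv = np.linalg.inv(Z @ G @ Z.T + sigma2 * np.eye(n))
beta_hat = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
u_hat = G @ Z.T @ Sigma_inv @ (Y - X @ beta_hat)

# Henderson's linear system for (beta, u) jointly.
p1 = X.shape[1]
A = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + sigma2 * np.linalg.inv(G)]])
b = np.concatenate([X.T @ Y, Z.T @ Y])
sol = np.linalg.solve(A, b)
print(np.allclose(sol[:p1], beta_hat), np.allclose(sol[p1:], u_hat))   # True True
```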

1.2 Inference for linear mixed effects models

We can compare two nested models m_0 and m_1 using the likelihood ratio test statistic

2 ( l(β̂_1, α̂_1; Y) - l(β̂_0, α̂_0; Y) ),

where (β̂_0, α̂_0) and (β̂_1, α̂_1) are the estimated parameters of the two models and l denotes the log-likelihood function. This test statistic is asymptotically distributed as a χ² with degrees of freedom equal to the difference in the number of parameters between the two models. However, this asymptotic result is based on some technical assumptions that are not always satisfied in practice (see, e.g., Casella and Berger, 2002, for more details). In particular, the parameters under the null model are required not to lie on the boundary of the parameter space. This is a problem for testing the variances and covariances of the random effects, which are constrained to be positive (or positive definite) while the null hypothesis is that they are equal to zero. Moreover, the χ² approximation is often poor in finite samples. In addition, if the models differ in their fixed effects, it is not possible to use REML estimates in the likelihood ratio statistic, because REML estimates the covariance parameters from linear combinations of the data that remove the fixed effects, so the two likelihoods are not comparable. These considerations lead us to be careful with p-values based on χ² approximations and to usually prefer parametric bootstrap methods to approximate the distribution of the test statistic.

1.3 Generalized linear mixed effects model

It is also possible to consider random effects in the context of generalized linear models.

Definition 1.1. A generalized linear mixed effects model assumes that Y_i | u ~ ED(µ_i, φ_i) independently for i = 1, ..., n, where g(µ_i) = x^{(i)} β + z^{(i)} u for some monotonic and differentiable function g, and φ_i = φ a_i for known a_i > 0. Here β and φ are unknown parameters and the random effects satisfy u ~ F for some distribution F.

Note that this model requires the observations to be conditionally independent (given u) and that the random effects u affect the distributions only through the conditional mean. The parameters of this model and the conditional modes of the random effects can be estimated by maximum likelihood, but efficient algorithms for this optimization problem are very much an active area of research.
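Returning to the inference issue discussed in Section 1.2, the following is a minimal sketch of a parametric bootstrap likelihood ratio test for a random-intercept variance. It assumes the statsmodels package is available; the data, the formula and the number of bootstrap replicates are illustrative choices, not part of the notes.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
M, n_i = 10, 5
df = pd.DataFrame({
    "g": np.repeat(np.arange(M), n_i),
    "x": rng.normal(size=M * n_i),
})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(size=M * n_i)     # data generated under the null

def lrt_stat(data):
    ll0 = smf.ols("y ~ x", data).fit().llf                                   # null: no random effect
    ll1 = smf.mixedlm("y ~ x", data, groups=data["g"]).fit(reml=False).llf   # random intercept, ML fit
    return 2 * (ll1 - ll0)

observed = lrt_stat(df)

# Parametric bootstrap: simulate from the fitted null model, refit both models.
null_fit = smf.ols("y ~ x", df).fit()
boot = []
for _ in range(200):                                          # small number of replicates, for speed
    sim = df.copy()
    sim["y"] = null_fit.fittedvalues + rng.normal(scale=np.sqrt(null_fit.scale), size=len(df))
    boot.append(lrt_stat(sim))

p_value = np.mean(np.array(boot) >= observed)
print(observed, p_value)
```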

1.4 Longitudinal data and repeated measures

A context where mixed effects models are useful is that of longitudinal data and repeated measures, i.e. when several subjects are measured repeatedly. When the measurements are taken over time, the data are called longitudinal. This means that observations cannot be assumed independent, and therefore we need some way to accommodate the correlation structure. One possibility is to use a mixed effects model. Other possibilities (which we are not considering in this course) are estimating the covariance structure using some parametric model (for example, exponentially decreasing in time) and then fitting the model by generalized least squares (or generalized estimating equations for GLMs), or transition models, which model each response conditionally on the other responses of the same subject.

Let Y_i be the vector of responses of length n_i for subject i = 1, ..., M, and let its conditional distribution be

Y_i | u_i ~ N(X_i β + Z_i u_i, σ² Λ_i),   (4)

independently, with u_i ~ N(0, G) independent between subjects. The main difference with the model considered before is that we allow here for a more general covariance structure Λ_i for the errors, which may also change between subjects. The marginal formulation of model (4) is Y_i ~ N(X_i β, Σ_i), where Σ_i = σ² Λ_i + Z_i G Z_i^T. We can then combine the data from the different subjects in a single model by stacking

Y = (Y_1^T, ..., Y_M^T)^T,   X = (X_1^T, ..., X_M^T)^T,   u = (u_1^T, ..., u_M^T)^T,

with Z the block-diagonal matrix with blocks Z_1, ..., Z_M, and we get that Y ~ N(Xβ, Σ), where Σ = diag(Σ_1, ..., Σ_M) is a block-diagonal matrix. Now the model has the form of the linear mixed effects model discussed in Section 1.1, and estimation and inference can be carried out with the methods described in the previous sections.

Analogously, it is possible to define a generalized linear mixed effects model for longitudinal data, where Y_{ij} | u_i ~ ED(µ_{ij}, φ_{ij}) for j = 1, ..., n_i measurements, u_i ~ F independently for i = 1, ..., M subjects, and g(µ_{ij}) = x_i^{(j)} β + z_i^{(j)} u_i.
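As an illustration of the longitudinal setting, here is a minimal sketch of fitting a random-intercept-and-slope model with statsmodels (assuming it is available); the subjects, the time grid and the parameter values are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
M, n_i = 20, 6                                    # M subjects, n_i measurements each
subj = np.repeat(np.arange(M), n_i)
time = np.tile(np.arange(n_i, dtype=float), M)

b0 = rng.normal(scale=1.0, size=M)                # subject-specific intercept deviations
b1 = rng.normal(scale=0.3, size=M)                # subject-specific slope deviations
y = (2.0 + b0[subj]) + (0.5 + b1[subj]) * time + rng.normal(scale=0.5, size=M * n_i)

df = pd.DataFrame({"y": y, "time": time, "subject": subj})

# Random intercept and random slope in time for each subject (u_i is two-dimensional).
fit = smf.mixedlm("y ~ time", df, groups=df["subject"], re_formula="~time").fit(reml=True)
print(fit.summary())
```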

References

Casella, G. and Berger, R. L. (2002). Statistical Inference. Duxbury.

Corbeil, R. R. and Searle, S. R. (1976). Restricted maximum likelihood (REML) estimation of variance components in the mixed model. Technometrics, 18.

Henderson, C. R., Kempthorne, O., Searle, S. R. and Von Krosigk, C. M. (1959). The estimation of environmental and genetic trends from records subject to culling. Biometrics, 15.

Kenward, M. G. and Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics.
