11. Linear Mixed-Effects Models. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

Size: px

Start display at page:

Download "11. Linear Mixed-Effects Models. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49"

Jasmin Armstrong
5 years ago
Views:

1 11. Linear Mixed-Effects Models Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

2 The Linear Mixed-Effects Model y = Xβ + Zu + e X is an n p matrix of known constants β R p is an unknown parameter vector Z is an n q matrix of known constants u is a q 1 random vector e is an n 1 vector of random errors Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

3 The Linear Mixed-Effects Model y = Xβ + Zu + e The elements of β are considered to be non-random and are called fixed effects. The elements of u are random variables and are called random effects. The elements of the error vector e are always considered to be random variables. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

4 Because the model includes both fixed and random effects (in addition to the random errors), it is called a mixed-effects model or, more simply, a mixed model. The model is called a linear mixed-effects model because (as we will soon see) E(y u) = Xβ + Zu, a linear function of fixed and random effects. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

5 We assume that E(e) = 0 Var(e) = R E(u) = 0 Var(u) = G Cov(e, u) = 0. It follows that opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

6 E(y) = E(Xβ + Zu + e) = Xβ + ZE(u) + E(e) = Xβ Var(y) = Var(Xβ + Zu + e) = Var(Zu + e) = Var(Zu) + Var(e) = ZVar(u)Z + R = ZGZ + R Σ. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

7 We usually consider the special case in which [ ] ([ ] [ ]) u 0 G 0 N, e 0 0 R = y N(Xβ, ZGZ + R). opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

8 The conditional mean and variance, given the random effects, are E(y u) = Xβ + Zu and Var(y u) = R. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

9 Example 1 Suppose an experiment was conducted to compare the height of plants grown at two soil moisture levels (labeled 1 and 2). The soil moisture levels were randomly assigned to 4 pots with 2 pots per moisture level. For each moisture level, 3 seeds were planted in one pot and 2 seeds were planted in the other. After a four-week growing period, the height of each seedling was measured. Let y ijk denote the height for soil moisture level i, pot j, seedling k. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

10 Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

11 Consider the model y ijk = µ + α i + p ij + e ijk p 11, p 12, p 21, p 22 i.i.d. N(0, σ 2 p) independent of the e ijk terms, which are assumed to be iid N(0, σ 2 e). This model can be written in the form y = Xβ + Zu + e, where Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

12 y = y 111 y 112 y 113 y 121 y 122 y 211 y 212 y 213 y 221 y 222, X = µ, β = α 1 α 2, Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

13 Z = , u = p 11 p 12 p 21 p 22, e = e 111 e 112 e 113 e 121 e 122 e 211 e 212 e 213 e 221 e 222. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

14 y 111 y 112 y 113 y 121 y 122 y 211 y 212 y 213 y 221 y 222 = µ + α 1 µ + α 1 µ + α 1 µ + α 1 µ + α 1 µ + α 2 µ + α 2 µ + α 2 µ + α 2 µ + α 2 + p 11 p 11 p 11 p 12 p 12 p 21 p 21 p 21 p 22 p 22 + e 111 e 112 e 113 e 121 e 122 e 211 e 212 e 213 e 221 e 222 opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

15 y 111 = µ + α 1 + p 11 + e 111 y 112 = µ + α 1 + p 11 + e 112 y 113 = µ + α 1 + p 11 + e 113 y 121 = µ + α 1 + p 12 + e 121 y 122 = µ + α 1 + p 12 + e 122 y 211 = µ + α 2 + p 21 + e 211 y 212 = µ + α 2 + p 21 + e 212 y 213 = µ + α 2 + p 21 + e 213 y 221 = µ + α 2 + p 22 + e 221 y 222 = µ + α 2 + p 22 + e 222 opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

16 G = Var(u) = Var([p 11, p 12, p 21, p 22 ] ) = σ 2 pi 4 4 R = Var(e) = σ 2 ei Var(y) = ZGZ + R = Zσ 2 piz + σ 2 ei = σ 2 pzz + σ 2 ei. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

17 ZZ = Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

18 Thus, Var(y) = σ 2 pzz + σ 2 ei is a block diagonal matrix. The first block is Var y 111 y 112 = σ 2 p + σ 2 e σ 2 p σ 2 p σ 2 p σ 2 p + σ 2 e σ 2 p. y 113 σ 2 p σ 2 p σ 2 p + σ 2 e opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

19 Note that Var(y ijk ) = σp 2 + σe 2 i, j, k. Cov(y ijk, y ijk ) = σp 2 i, j, and k k. Cov(y ijk, y i j k ) = 0 if i i or j j. Any two observations from the same pot have covariance σp. 2 Any two observations from different pots are uncorrelated. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

20 Alternative Derivation of Variances and Covariances Var(y ijk ) = Var(µ + α i + p ij + e ijk ) = Var(p ij + e ijk ) = Var(p ij ) + Var(e ijk ) + Cov(p ij, e ijk ) + Cov(e ijk, p ij ) = σp 2 + σe = σp 2 + σe. 2 For k k, Cov(y ijk, y ijk ) = Cov(µ + α i + p ij + e ijk, µ + α i + p ij + e ijk ) = Cov(p ij + e ijk, p ij + e ijk ) = Cov(p ij, p ij ) + Cov(p ij, e ijk ) +Cov(e ijk, p ij ) + Cov(e ijk, e ijk ) = Var(p ij ) = σp. 2 Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

21 Note that Var(y) may be written as σ 2 ev where V is a block diagonal matrix with blocks of the form 1 + σp/σ 2 e 2 σp/σ 2 e 2... σp/σ 2 e 2 σ p/σ 2 e σp/σ 2 e 2... σp/σ 2 e σ 2 p/σ 2 e σ 2 p/σ 2 e σ 2 p/σ 2 e Thus, if σ 2 p/σ 2 e were known, we would have the Aitken Model. y = Xβ + ɛ, where ɛ = Zu + e N(0, σ 2 V), σ 2 σ 2 e. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

22 Thus, if σ 2 p/σ 2 e were known, we would use GLS to estimate any estimable Cβ by Cˆβ V = C(X V 1 X) X V 1 y. However, we seldom know σ 2 p/σ 2 e or, more generally, Σ or V. For the general problem where Var(y) = Σ is an unknown positive definite matrix, we can rewrite Σ as σ 2 V, where σ 2 is an unknown positive variance and V is an unknown positive definite matrix. As in our simple example, each entry of V is usually assumed to be a known function of few unknown parameters. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

23 Thus, our strategy for estimating an estimable Cβ involves estimating the unknown parameters in V to obtain Cˆβ ˆV = C(X ˆV 1 X) X ˆV 1 y. In general, Cˆβ ˆV = C(X ˆV 1 X) X ˆV 1 y is an nonlinear estimator that is an approximation to Cˆβ V = C(X V 1 X) X V 1 y, which would be the BLUE of Cβ if V were known. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

24 In special cases, Cˆβ ˆV may be a linear estimator. However, even for our simple example involving seedling height, Cˆβ ˆV is a nonlinear estimator of Cβ for C = [1, 1, 0] Cβ = µ + α 1, C = [1, 0, 1] Cβ = µ + α 2, and C = [0, 1, 1] Cβ = α 1 α 2. Confidence intervals and tests for these estimable functions are not exact. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

25 In our simple example involving seedling height, there was only one random factor (pot). When there are m random factors, we can partition Z and u as Z = [Z 1,..., Z m ] and u = u 1. u m, where u j is the vector of random effects associated with factor j (j = 1,..., m). opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

26 We can write Zu as [Z 1,..., Z m ] u 1. u m = m Z j u j. j=1 We often assume that all random effects (including random errors) are mutually independent and that the random effects associated with the jth random factor have variance σ 2 j (j = 1,..., m). Under these assumptions, Var(y) = ZGZ + R = m σj 2 Z j Z j + σei. 2 j=1 opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

27 Example 2 Consider an experiment involving 4 litters of 4 animals each. Suppose 4 treatments are randomly assigned to the 4 animals in each litter. Suppose we obtain two replicate muscle samples from each animal and measure the response of interest for each muscle sample. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

28 Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

29 Let y ijk denote the kth measure of the response for the animal from litter j that received treatment i (i = 1, 2, 3, 4; j = 1, 2, 3, 4; k = 1, 2). Suppose y ijk = µ + τ i + l j + a ij + e ijk, where β = [µ, τ 1, τ 2, τ 3, τ 4 ] R 5 is an unknown vector of fixed parameters, u = [l 1, l 2, l 3, l 4, a 11, a 21, a 31, a 41, a 12,..., a 34, a 44 ] is a vector of random effects, and opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

30 e = [e 111, e 112, e 211, e 212, e 311, e 312, e 411, e 412,..., e 441, e 442 ] is a vector of random errors. With y = [y 111, y 112, y 211, y 212, y 311, y 312, y 411, y 412,..., y 441, y 442 ], we can write the model as a linear mixed-effects model y = Xβ + Zu + e, where opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

31 X = , Z = The matrix above repeated three more times Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

32 Kronecker Product Notation A B = = a 11 a 12 a 1n a 21 a 22 a 2n B... a m1 a m2 a mn a 11 B a 12 B a 1n B a 21 B a 22 B a 2n B... a m1 B a m2 B a mn B Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

33 We can write less and be more precise using Kronecker product notation. X = [ 1 8 1, I ], Z = [ I , I ]. In this experiment, we have two random factors: litter and animal. We can partition our random effects vector u into a vector of litter effects and a vector of animal effects: Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

34 [ ] l u =, l = a l 1 l 2 l 3 l 4 a 11 a 44 a 21 a 31, a = a 41. a 12. We make the usual assumption that [ ] ([ ] [ ]) l 0 σ 2 u = N, l I 0, a 0 0 σai 2 where σl 2, σ2 a R + are unknown parameters. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

35 We can partition Z = [ I = [Z l, Z a ]. 2 1 ] We have Zu = [Z l, Z a ] [ ] l a = Z l l + Z a a and Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

36 Var(Zu) = ZGZ = [Z l, Z a ] [ ] [ ] σ 2 l I 0 Z l 0 σ 2 ai Z a = Z l (σ 2 l I)Z l + Z a(σ 2 ai)z a = σ 2 l Z lz l + σ2 az a Z a = σ 2 l I σ2 a I Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

37 We usually assume that all random effects and random errors are mutually independent and that the errors (like the effects within each factor) are identically distributed: l 0 σl 2 a N 0, I σai 2 0. e σei 2 The unknown variance parameters σl 2, σ2 a, σe 2 R + are called variance components. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

38 In this case, we have R = Var(e) = σ 2 ei. Thus, Var(y) = ZGZ + R = σl 2 Z lz l + σ2 az a Z a + σei. 2 This is a block diagonal matrix with a block as follows. (To get a block to fit on one slide, let l = σ 2 l, a = σ2 a, e = σ 2 e). opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

39 l + a + e l + a l l l l l l l + a l + a + e l l l l l l l l l + a + e l + a l l l l l l l + a l + a + e l l l l l l l l l + a + e l + a l l l l l l l + a l + a + e l l l l l l l l l + a + e l + a l l l l l l l + a l + a + e Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

40 Random Effects Specify the Correlation Structure Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

41 Without the mouse random effects, our model would correspond to an RCBD with 2 mice per treatment per litter Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

42 With no random effects, our model would correspond to a CRD with 8 mice per treatment Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

43 Review of Experimental Design Terminology Experiment An investigation in which the investigator applies some treatments to experimental units and then observes the effect of the treatments on the experimental units by measuring one or more response variables. opyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

44 Treatment a condition or set of conditions applied to experimental units in an experiment. Experimental Unit the physical entity to which a treatment is randomly assigned and independently applied. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

45 Response Variable a characteristic of an experimental unit that is measured after treatment and analyzed to assess the effects of treatments on experimental units. Observational Unit the unit on which a response variable is measured. There is often a one-to-one correspondence between experimental units and observational units, but that is not always true. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

46 In our example involving plant heights and soil moisture levels, pots were the experimental units because soil moisture levels were randomly assigned to pots. Seedlings were the observational units because the response was measured separately for each seedling. Whenever there is more than one observational unit for an experimental unit or whenever the response is measured multiple times for an experimental unit, we say we have multiple observations per experiment unit. This scenario is also referred to as subsampling or pseudo-replication. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

47 Whenever an experiment involves multiple observations per experimental unit, it is important to include a random effect for each experimental unit. Without a random effect for each experimental unit, a one-to-one correspondence between observations and experimental units is assumed. Including random effects in a model is one way to account for a lack of independence among observations that might be expected based on the design of an experiment. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

48 Completely Randomized Design (CRD) experimental design in which, for given number of experiment units per treatment, all possible assignments of treatments to experimental units are equally likely. Block a group of experimental units that, prior to treatment, are expected to be more like one another (with respect to one or more response variables) than experimental units in general. Randomized Complete Block Design (RCBD) experimental design in which separate and completely randomized treatment assignments are made for each of multiple blocks in such a way that all treatments have at least one experimental unit in each block. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

49 Blocking grouping similar experimental units together and assigning different treatments within such groups of experimental units. The experiment involving muscle samples from mice used blocking. Each litter was a block in that experiment. Each mouse was an experimental unit. Each muscle sample was an observational unit. Copyright c 2018 Dan Nettleton (Iowa State University) 11. Statistics / 49

Linear Mixed-Effects Models. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 34

Linear Mixed-Effects Models. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 34 Linear Mixed-Effects Models Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 34 The Linear Mixed-Effects Model y = Xβ + Zu + e X is an n p design matrix of known constants β R