Stat 579: Generalized Linear Models and Extensions


Mixed models
Yan Lu, March 2018, week 8

Restricted Maximum Likelihood (REML)

REML uses a likelihood function computed from a transformed set of the data, chosen so that the nuisance parameters (the fixed effects) have no effect on it.

Neyman-Scott problem

Example (17) of Jiang's book shows that if the number of parameters increases with the sample size, the MLE may not be consistent. Suppose two observations are collected from each of $m$ individuals:

individual 1: $y_{11}$, $y_{12}$
individual 2: $y_{21}$, $y_{22}$
...
individual m: $y_{m1}$, $y_{m2}$

- Each individual $i$ has its own (unknown) mean $\mu_i$.
- The observations are independent and normally distributed with variance $\sigma^2$; we want to estimate $\sigma^2$.

$$y_{ij} = \mu_i + \epsilon_{ij}, \quad \epsilon_{ij} \stackrel{iid}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, m,\ j = 1, 2. \qquad (1)$$

There are $m + 1$ parameters ($\mu_1, \ldots, \mu_m, \sigma^2$) and $2m$ observations, so the number of parameters grows in proportion to the sample size.

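Let $y = (y_{11}, y_{12}, y_{21}, y_{22}, \ldots, y_{m1}, y_{m2})'$, $\beta = (\mu_1, \mu_2, \ldots, \mu_m)'$, $\epsilon = (\epsilon_{11}, \epsilon_{12}, \epsilon_{21}, \epsilon_{22}, \ldots, \epsilon_{m1}, \epsilon_{m2})'$, and

$$X = I_m \otimes 1_2 \quad (2m \times m).$$

Model (1) can be written as

$$y = X\beta + \epsilon. \qquad (2)$$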

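The ML estimator of $\sigma^2$ is

$$\hat\sigma^2_{ML} = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{2m} = \frac{1}{2m}\sum_{i=1}^m\sum_{j=1}^2 (y_{ij} - \bar y_{i\cdot})^2 = \frac{1}{2m}\sum_{i=1}^m \frac12 (y_{i1} - y_{i2})^2.$$

Let $Z_i = y_{i1} - y_{i2}$. Then $Z_i \sim N(0, 2\sigma^2)$, the $Z_i$ are iid, and

$$S^2 = \frac{\sum_{i=1}^m (Z_i - \bar Z)^2}{m-1} \xrightarrow{p} 2\sigma^2.$$

Since $\sum_{i=1}^m Z_i^2/(m-1) = S^2 + \frac{m}{m-1}\bar Z^2$ and $\bar Z \xrightarrow{p} 0$, also $\sum_{i=1}^m Z_i^2/(m-1) \xrightarrow{p} 2\sigma^2$.

Hence

$$\hat\sigma^2_{ML} = \frac12 \cdot \frac{\sum_{i=1}^m Z_i^2}{2m} = \frac12 \cdot \frac{\sum_{i=1}^m Z_i^2}{m-1} \cdot \frac{m-1}{2m} \xrightarrow{p} \frac12 \cdot 2\sigma^2 \cdot \frac12 = \frac{\sigma^2}{2},$$

so the MLE is not consistent for $\sigma^2$.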
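A quick simulation makes the inconsistency concrete. The sketch below assumes NumPy; the sample size, seed, true $\sigma^2 = 4$ and the spread of the hypothetical individual means $\mu_i$ are arbitrary choices. Alongside the MLE it computes $\sum_i Z_i^2/(2m)$, the estimate based only on the differences $Z_i$, which is where the REML idea below leads.

```python
import numpy as np

def neyman_scott_estimates(y):
    """ML and difference-based (REML) estimates of sigma^2 from an (m, 2) array y."""
    m = y.shape[0]
    resid = y - y.mean(axis=1, keepdims=True)   # y_ij - ybar_i, since beta_hat_i = ybar_i
    sigma2_ml = np.sum(resid ** 2) / (2 * m)    # RSS / 2m
    z = y[:, 0] - y[:, 1]                       # Z_i = y_i1 - y_i2 ~ N(0, 2 sigma^2)
    sigma2_reml = np.sum(z ** 2) / (2 * m)      # sum Z_i^2 / m estimates 2 sigma^2
    return sigma2_ml, sigma2_reml


rng = np.random.default_rng(2018)
m, sigma2 = 100_000, 4.0
mu = rng.normal(0.0, 10.0, size=m)              # hypothetical individual means
y = mu[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(m, 2))
ml, reml = neyman_scott_estimates(y)
print(f"ML   = {ml:.3f}  (limit sigma^2/2 = {sigma2 / 2})")
print(f"REML = {reml:.3f}  (limit sigma^2   = {sigma2})")
```

The ML estimate settles near $\sigma^2/2$ no matter how large $m$ is, while the difference-based estimate is exactly twice it and settles near $\sigma^2$.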

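Transformation

Consider the transformation $Z_i = y_{i1} - y_{i2}$, so that $Z_i \stackrel{iid}{\sim} N(0, 2\sigma^2)$. Let

$$Z = A'y, \qquad A' = I_m \otimes (1, -1) \quad (m \times 2m).$$

Recalling that $X = I_m \otimes 1_2$ $(2m \times m)$, this $A$ satisfies $A'X = 0$.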

Then $y = X\beta + \epsilon$ transforms to $A'y = A'X\beta + A'\epsilon$, i.e., $Z = A'\epsilon$. REML applies a transformation to the data to eliminate the fixed effects, then uses the transformed data to estimate the variance component. Here this gives $\hat\sigma^2_{REML} = \sum_{i=1}^m Z_i^2/(2m) \xrightarrow{p} \sigma^2$.
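The transformation can be checked numerically. This NumPy sketch, with an arbitrary $m = 4$, seed and means, builds $X = I_m \otimes 1_2$ and $A' = I_m \otimes (1, -1)$ with `np.kron` and confirms that $A'X = 0$, so $\beta$ drops out of $A'y$.

```python
import numpy as np

m = 4
X = np.kron(np.eye(m), np.ones((2, 1)))             # X = I_m (x) 1_2, shape (2m, m)
At = np.kron(np.eye(m), np.array([[1.0, -1.0]]))    # A' = I_m (x) (1, -1), shape (m, 2m)

rng = np.random.default_rng(0)
beta = rng.normal(0.0, 10.0, size=m)                # arbitrary individual means mu_i
eps = rng.normal(0.0, 1.0, size=2 * m)
y = X @ beta + eps                                  # model (2): y = X beta + eps

z = At @ y
print(np.allclose(At @ X, 0))                       # A'X = 0
print(np.allclose(z, y[0::2] - y[1::2]))            # z_i = y_i1 - y_i2
print(np.allclose(z, At @ eps))                     # Z = A' eps: beta has dropped out
```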

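Restricted maximum likelihood (REML)

Consider the mixed model

$$y_{n\times1} = X_{n\times p}\,\beta_{p\times1} + Z_{n\times q}\,\alpha_{q\times1} + \epsilon_{n\times1}, \qquad (3)$$

where $\alpha \sim N(0, G)$, $\epsilon \sim N(0, R)$, $\alpha$ and $\epsilon$ are independent, $\mathrm{Var}(y) = ZGZ' + R = V$, and $\mathrm{rank}(X) = p$. To eliminate the fixed effects $\beta$, choose an $n \times (n-p)$ matrix $A$ with $\mathrm{rank}(A) = n - p$ such that

$$A'_{(n-p)\times n}\, X_{n\times p} = 0.$$

Then (3) becomes

$$A'y = A'X\beta + A'Z\alpha + A'\epsilon = 0 + A'Z\alpha + A'\epsilon = A'(Z\alpha + \epsilon).$$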

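Define $z = A'y$, so that $z \sim N(0, A'VA)$ with density

$$f_R(z) = \frac{1}{(2\pi)^{(n-p)/2}\,|A'VA|^{1/2}} \exp\left\{-\frac12 z'(A'VA)^{-1}z\right\}. \qquad (4)$$

Writing $\theta$ for the vector of variance-component parameters that determine $V$, the restricted log-likelihood is

$$l_R(\theta) = c - \frac12\log|A'VA| - \frac12 z'(A'VA)^{-1}z.$$

The REML estimator of $\theta$ is defined as the maximizer of (4).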
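A minimal NumPy sketch of (4), assuming a toy one-way layout with illustrative data. The hypothetical helper `error_contrasts` takes $A$ as an orthonormal basis of the orthogonal complement of the column space of $X$, obtained from a full SVD. Replacing $A$ by $AB$ for an invertible $B$ changes $l_R$ only by the constant $\log|\det B|$, which does not depend on $\theta$, so the maximizer is unchanged.

```python
import numpy as np

def reml_loglik(y, V, A):
    """Restricted log-likelihood of z = A'y ~ N(0, A'VA), as in (4)."""
    z = A.T @ y
    M = A.T @ V @ A
    _, logdet = np.linalg.slogdet(M)
    r = A.shape[1]                                   # n - p error contrasts
    return -0.5 * r * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * z @ np.linalg.solve(M, z)

def error_contrasts(X):
    """n x (n - p) matrix A with orthonormal columns and A'X = 0."""
    U, _, _ = np.linalg.svd(X, full_matrices=True)
    p = np.linalg.matrix_rank(X)
    return U[:, p:]                                  # orthogonal complement of col(X)

# Toy one-way layout (m = 3 groups, k = 2 per group); the data are illustrative only.
m, k = 3, 2
X = np.ones((m * k, 1))
Z = np.kron(np.eye(m), np.ones((k, 1)))
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])

def V_of(s2_alpha, s2):
    return s2_alpha * Z @ Z.T + s2 * np.eye(m * k)   # V = Z G Z' + R

A1 = error_contrasts(X)
B = np.random.default_rng(1).normal(size=(A1.shape[1], A1.shape[1]))
A2 = A1 @ B                                          # another valid A (B is invertible a.s.)
diffs = [reml_loglik(y, V_of(sa, s), A1) - reml_loglik(y, V_of(sa, s), A2)
         for sa, s in [(0.5, 1.0), (15.0, 2.0), (3.0, 7.0)]]
print(diffs)   # every entry equals log|det B|, whatever theta is
```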

Remarks
- The REML estimator does not depend on $A$: the choice of $A$ is not unique, but every valid choice gives the same estimate.
- REML is a method of estimating $\theta$, not $\beta$ ($\beta$ is eliminated before estimation).
- Once $\hat\theta_R$ is obtained, $\beta$ is usually estimated the same way as in ML, i.e., $\hat\beta = (X'\hat V^{-1}X)^{-1}X'\hat V^{-1}y$ with $\hat V = V(\hat\theta_R)$; such an estimator is referred to as the REML estimator of $\beta$.
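As an illustration of the last remark, the sketch below plugs variance-component estimates into the generalized least squares formula. It assumes NumPy, toy balanced one-way data, and the values $(\hat\sigma^2_\alpha, \hat\sigma^2) = (15, 2)$, which are the REML estimates for these particular numbers under the one-way closed forms given below; `gls_beta` is a hypothetical helper name.

```python
import numpy as np

def gls_beta(y, X, V):
    """beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} y, with V evaluated at the estimated theta."""
    Vinv_X = np.linalg.solve(V, X)
    return np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)

m, k = 3, 2
X = np.ones((m * k, 1))
Z = np.kron(np.eye(m), np.ones((k, 1)))
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])        # illustrative data
theta_reml = (15.0, 2.0)                             # (sigma_alpha^2, sigma^2) for these data
V_hat = theta_reml[0] * Z @ Z.T + theta_reml[1] * np.eye(m * k)
print(gls_beta(y, X, V_hat))   # [6.]: in the balanced one-way model this is the grand mean
```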

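Example: one-way balanced random effects model

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1, \ldots, m,\ j = 1, \ldots, k,\ n = mk,$$

with $\alpha \sim N(0, \sigma^2_\alpha I_m)$, $\epsilon \sim N(0, \sigma^2 I_n)$, and $\alpha$ and $\epsilon$ independent.

Source | df | SS | MS
Between groups (SSA) | $m-1$ | $\sum_{i=1}^m k(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2$ | MSA = SSA/(m-1)
Within groups (SSE) | $n-m$ | $\sum_{i,j}(y_{ij} - \bar y_{i\cdot})^2$ | MSE = SSE/(n-m)
Total | $n-1$ | $\sum_{i,j}(y_{ij} - \bar y_{\cdot\cdot})^2$ |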

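For the transformation $z = A'y$ with any $A$ satisfying $A'X = 0$, up to an additive constant that depends on the choice of $A$,

$$l_R = -\frac12(mk-1)\log 2\pi - \frac12\log(mk) - \frac12\left[m(k-1)\log\sigma^2 + (m-1)\log(\sigma^2 + k\sigma^2_\alpha)\right] - \frac{SSE}{2\sigma^2} - \frac{SSA}{2(\sigma^2 + k\sigma^2_\alpha)}.$$

Setting $\partial l_R/\partial\sigma^2 = 0$ and $\partial l_R/\partial\sigma^2_\alpha = 0$ gives the REML equations

$$\sigma^2 + k\sigma^2_\alpha = MSA, \qquad \sigma^2 = MSE.$$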
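The REML equations can be sanity-checked numerically: at $\sigma^2 = MSE$, $\sigma^2_\alpha = (MSA - MSE)/k$ both partial derivatives of $l_R$ should vanish. The sketch assumes NumPy and illustrative summary statistics ($m = 3$, $k = 2$, $SSA = 64$, $SSE = 6$), and uses central finite differences rather than analytic derivatives.

```python
import numpy as np

def reml_loglik_oneway(s2, s2_alpha, ssa, sse, m, k):
    """Restricted log-likelihood l_R of the balanced one-way model."""
    lam = s2 + k * s2_alpha                          # sigma^2 + k sigma_alpha^2
    return (-0.5 * (m * k - 1) * np.log(2 * np.pi) - 0.5 * np.log(m * k)
            - 0.5 * m * (k - 1) * np.log(s2) - 0.5 * (m - 1) * np.log(lam)
            - sse / (2 * s2) - ssa / (2 * lam))

# Illustrative summary statistics: m = 3 groups of k = 2 observations.
m, k, ssa, sse = 3, 2, 64.0, 6.0
msa, mse = ssa / (m - 1), sse / (m * (k - 1))
s2_hat, s2a_hat = mse, (msa - mse) / k               # solution of the REML equations

h = 1e-5
grad = [
    (reml_loglik_oneway(s2_hat + h, s2a_hat, ssa, sse, m, k)
     - reml_loglik_oneway(s2_hat - h, s2a_hat, ssa, sse, m, k)) / (2 * h),
    (reml_loglik_oneway(s2_hat, s2a_hat + h, ssa, sse, m, k)
     - reml_loglik_oneway(s2_hat, s2a_hat - h, ssa, sse, m, k)) / (2 * h),
]
print(grad)   # both components are ~0 at (MSE, (MSA - MSE)/k)
```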

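ML and REML estimators for the one-way random effects model

Parameter | ML estimator | REML estimator
$\hat\sigma^2$ | MSE | MSE
$\hat\sigma^2_\alpha$ | $\max\left\{0, \frac1k\left[\left(1 - \frac1m\right)MSA - MSE\right]\right\}$ | $\max\left\{0, \frac1k(MSA - MSE)\right\}$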
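A small NumPy sketch of the table, assuming a balanced data array of shape $(m, k)$; the data values and the function name `oneway_estimates` are illustrative only.

```python
import numpy as np

def oneway_estimates(y):
    """ML and REML variance-component estimates for a balanced (m, k) array y."""
    m, k = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    msa = k * np.sum((group_means - grand_mean) ** 2) / (m - 1)
    mse = np.sum((y - group_means[:, None]) ** 2) / (m * (k - 1))
    return {
        "sigma2": mse,                                           # table entry for both methods
        "sigma2_alpha_ml": max(0.0, ((1 - 1 / m) * msa - mse) / k),
        "sigma2_alpha_reml": max(0.0, (msa - mse) / k),
    }

y = np.array([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])   # illustrative data, m = 3, k = 2
print(oneway_estimates(y))
```

For these data $MSA = 32$ and $MSE = 2$, giving ML $\hat\sigma^2_\alpha = 29/3$ and REML $\hat\sigma^2_\alpha = 15$. The $\hat\sigma^2 = MSE$ entries assume the truncation at 0 is not active; when $\hat\sigma^2_\alpha$ is set to 0, $\hat\sigma^2$ is re-estimated under $\sigma^2_\alpha = 0$.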

Comments
- ML and REML share the merits of being based on the likelihood principle, which leads to useful properties such as consistency, asymptotic normality and efficiency.
- For balanced mixed ANOVA models, the REML estimates of the variance components are identical to the classical ANOVA-type estimates obtained by setting mean squares equal to their expectations and solving.
- REML estimates of variance components are generally preferred. REML estimators are not guaranteed to be unbiased, but they are usually less biased than ML estimators.
- In general, expect ML and REML estimates to differ more as the number $p$ of fixed effects in the model increases.

Analysis of variance (ANOVA) estimation (method of moments)

The basic idea comes from the method of moments. Let $Q$ be a $q$-dimensional vector whose components are quadratic functions of the data. The ANOVA estimators of the variance components are obtained by solving the system of equations $E(Q) = Q$. For the one-way balanced model, $Q = (SSA, SSE)'$, with SSA and SSE as in the ANOVA table of that model above.

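For the one-way model,

$$E(SSA) = (m-1)k\sigma^2_\alpha + (m-1)\sigma^2, \qquad E(SSE) = m(k-1)\sigma^2.$$

The ANOVA estimating equations are

$$SSA = (m-1)k\sigma^2_\alpha + (m-1)\sigma^2, \qquad SSE = m(k-1)\sigma^2,$$

which give, after truncating a negative solution at 0,

$$\hat\sigma^2 = MSE, \qquad \hat\sigma^2_\alpha = \max\left\{0, \frac1k(MSA - MSE)\right\}.$$

The ANOVA estimators are the same as the REML estimators.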
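The expectations $E(MSA) = \sigma^2 + k\sigma^2_\alpha$ and $E(MSE) = \sigma^2$ behind the estimating equations can be checked by Monte Carlo. The sketch assumes NumPy and hypothetical true values $\mu = 3$, $\sigma^2_\alpha = 2$, $\sigma^2 = 1$ with $m = 10$, $k = 5$; the averages should be close to 11 and 1.

```python
import numpy as np

rng = np.random.default_rng(42)
m, k = 10, 5
mu, s2_alpha, s2 = 3.0, 2.0, 1.0        # hypothetical true values
reps = 20_000

# Simulate `reps` independent data sets y_ij = mu + alpha_i + eps_ij.
alpha = rng.normal(0.0, np.sqrt(s2_alpha), size=(reps, m, 1))
y = mu + alpha + rng.normal(0.0, np.sqrt(s2), size=(reps, m, k))
group_means = y.mean(axis=2)
grand_means = y.mean(axis=(1, 2))
msa = k * np.sum((group_means - grand_means[:, None]) ** 2, axis=1) / (m - 1)
mse = np.sum((y - group_means[:, :, None]) ** 2, axis=(1, 2)) / (m * (k - 1))

print(msa.mean(), s2 + k * s2_alpha)    # E(MSA) = sigma^2 + k sigma_alpha^2
print(mse.mean(), s2)                   # E(MSE) = sigma^2
sigma2_alpha_anova = np.maximum(0.0, (msa - mse) / k)   # ANOVA estimates, one per data set
```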

Interpretation
- As a consequence of treating the group effects as random, statistical inferences can be made about the population from which the group effects were drawn.
- Random effects induce a correlation among observations that share the same group effect.
- Using random effects involves making extra assumptions, but often results in more precise estimates.

Fixed and random effects
- Fixed effects are interpreted as usual; multiple comparisons of the groups are still of interest.
- Estimates of the random effects $\alpha_i$ (within group) are interpreted similarly, but we are not particularly interested in the specific levels that were randomly selected from the population.
- A larger estimate of $\sigma^2_\alpha$ indicates that the random effects are more important. Commonly, we are interested in the proportion $\hat\sigma^2_\alpha/(\hat\sigma^2_\alpha + \hat\sigma^2)$, the estimated correlation between two observations in the same group (intraclass correlation).

Tests in Gaussian Mixed Models

Example: animal behavior study (Lonsdorf et al. (2004), Nature, 428)
- Objective: to study how young chimpanzees learn termite fishing.
- Study design: follow chimpanzee mothers over four field seasons in Gombe National Park, Tanzania.
- Primary observation: the age at which the mother's offspring acquired the termite-fishing skill.
- Results: female offspring acquired the skill at a younger age (31 vs 58 months) and spent more time fishing; male offspring preferred to play.

Can we apply a mixed model to these data?
- Offspring of the same mother are likely to be correlated, and we need to account for this correlation.
- Some mothers termite fish more skillfully than others, so their offspring might acquire the skill at an earlier age. We account for these mother effects by treating them as random.
- We also account for sex as a fixed effect.

Mixed model:

$$y_{ij} = (\text{age})_{ij} = \beta_0 + \beta_1\,\text{sex}_{ij} + \gamma_i + \epsilon_{ij},$$

where $i$ indexes chimp mothers and $j$ the $j$th offspring, the mother-specific effect is $\gamma_i \sim N(0, \sigma^2_\gamma)$, and $\epsilon_{ij} \sim N(0, \sigma^2)$.

Tests of interest
- $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$: is sex associated with the age at which termite fishing is acquired?
- $H_0: \sigma^2_\gamma = 0$ vs $H_1: \sigma^2_\gamma > 0$: is there a mother-specific effect?

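Example: one-way random effects model

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1, \ldots, m,\ j = 1, \ldots, n_i. \qquad (5)$$

Let $\sum_i n_i = n$; for simplicity, take $n_1 = n_2 = \cdots = n_m = k$, so $n = mk$. Let $y = (y_{11}, \ldots, y_{1k}, y_{21}, \ldots, y_{2k}, \ldots, y_{m1}, \ldots, y_{mk})'$, $\epsilon = (\epsilon_{11}, \ldots, \epsilon_{1k}, \epsilon_{21}, \ldots, \epsilon_{2k}, \ldots, \epsilon_{m1}, \ldots, \epsilon_{mk})'$, and $Z = I_m \otimes 1_k$ $(mk \times m)$.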

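Let $\beta = \mu$, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)'$ and $X = 1_{mk}$. Then the one-way random effects model (5) can be written as

$$y = X\beta + Z\alpha + \epsilon, \quad \alpha \sim N(0, \sigma^2_\alpha I_m),\ \epsilon \sim N(0, \sigma^2 I_{mk}),$$

with $\alpha$ and $\epsilon$ independent.
- Fixed: $\mu$.
- Random: $\alpha_i$ (variance $\sigma^2_\alpha$) and $\epsilon_{ij}$ (variance $\sigma^2$).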
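In matrix form, $\mathrm{Var}(y) = \sigma^2_\alpha ZZ' + \sigma^2 I_{mk}$. The NumPy sketch below, with hypothetical variance components, builds $X$, $Z$ and $V$ and shows the correlation the random effects induce: observations in the same group have correlation $\sigma^2_\alpha/(\sigma^2_\alpha + \sigma^2)$, the proportion mentioned under Interpretation, while observations in different groups are uncorrelated.

```python
import numpy as np

m, k = 3, 4
s2_alpha, s2 = 2.0, 1.0                              # hypothetical variance components
X = np.ones((m * k, 1))                              # X = 1_{mk}
Z = np.kron(np.eye(m), np.ones((k, 1)))              # Z = I_m (x) 1_k, shape (mk, m)
V = s2_alpha * Z @ Z.T + s2 * np.eye(m * k)          # Var(y) = sigma_alpha^2 ZZ' + sigma^2 I

sd = np.sqrt(np.diag(V))
R = V / np.outer(sd, sd)                             # correlation matrix of y
print(R[0, 1], s2_alpha / (s2_alpha + s2))           # same group: intraclass correlation
print(R[0, k])                                       # different groups: 0
```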

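F test

ANOVA table for $H_0: \sigma^2_\alpha = 0$ vs $H_1: \sigma^2_\alpha > 0$:

Source | df | SS | MS
Between groups (SSB) | $m-1$ | $\sum_{i=1}^m n_i(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2$ | MSB = SSB/(m-1)
Within groups (SSE) | $n-m$ | $\sum_{i,j}(y_{ij} - \bar y_{i\cdot})^2$ | MSE = SSE/(m(k-1))
Total | $n-1$ | $\sum_{i,j}(y_{ij} - \bar y_{\cdot\cdot})^2$ |

In the balanced case $n - m = m(k-1)$, and

$$\frac{MSB/(\sigma^2 + k\sigma^2_\alpha)}{MSE/\sigma^2} \sim F_{m-1,\,m(k-1)}.$$

Under $H_0$, therefore, $MSB/MSE \sim F_{m-1,\,m(k-1)}$, and $H_0$ is rejected for large values of $MSB/MSE$.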
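A sketch of the F test for a balanced layout, assuming NumPy and SciPy (`scipy.stats.f.sf` gives the upper-tail probability of the F distribution); the data and the helper name `oneway_f_test` are illustrative.

```python
import numpy as np
from scipy import stats

def oneway_f_test(y):
    """F test of H0: sigma_alpha^2 = 0 for a balanced (m, k) array y."""
    m, k = y.shape
    group_means = y.mean(axis=1)
    msb = k * np.sum((group_means - y.mean()) ** 2) / (m - 1)
    mse = np.sum((y - group_means[:, None]) ** 2) / (m * (k - 1))
    f_stat = msb / mse
    p_value = stats.f.sf(f_stat, m - 1, m * (k - 1))    # upper tail of F(m-1, m(k-1))
    return f_stat, p_value

y = np.array([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])     # illustrative data
print(oneway_f_test(y))   # F = MSB/MSE = 32/2 = 16 on (2, 3) df
```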

Some notation
- Projection matrix: $P_X = X(X'X)^{-1}X'$.
- $P_X^\perp = I - P_X$.
- $Z_X = P_X^\perp Z$.

F test for mixed effects models

For the general mixed model $y = X\beta + Z\alpha + \epsilon$, let $\alpha = (\alpha_1', \alpha_2', \ldots, \alpha_s')'$ with $\alpha_i \sim N(0, \sigma_i^2 I)$, $\epsilon \sim N(0, \sigma^2 I)$, and $Z = (Z_1, Z_2, \ldots, Z_s)$, where $Z_i$ is associated with the random vector $\alpha_i$. We wish to test $H_0: \sigma_1^2 = 0$ vs $H_1: \sigma_1^2 > 0$. Write the mixed model as

$$y = X\beta + Z_1\alpha_1 + Z_{-1}\alpha_{-1} + \epsilon,$$

where $\alpha_{-1} = (\alpha_2', \ldots, \alpha_s')'$ and $Z_{-1} = (Z_2, \ldots, Z_s)$.

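Consider the quadratic forms (Jiang, Chapter 2)

$$q_1 = \sigma^{-2}\, y' P_{Z_{1(X, Z_{-1})}}\, y = y'\left\{P_{Z_{1(X, Z_{-1})}}/\sigma^2\right\}y, \qquad q_2 = \sigma^{-2}\, y' P^\perp_{(X,Z)}\, y = y'\left\{P^\perp_{(X,Z)}/\sigma^2\right\}y,$$

where, in the notation above, $Z_{1(X,Z_{-1})} = P^\perp_{(X,Z_{-1})} Z_1$. Then $q_1$ and $q_2$ are independent, $q_2 \sim \chi^2_{r_2}$ with $r_2 = n - \mathrm{rank}\{(X, Z)\}$, and, under $H_0$, $q_1 \sim \chi^2_{r_1}$ with $r_1 = \mathrm{rank}\{(X, Z)\} - \mathrm{rank}\{(X, Z_{-1})\}$. Hence, under $H_0$,

$$F = \frac{q_1/r_1}{q_2/r_2} \sim F_{r_1, r_2},$$

and since $\sigma^2$ cancels, $F$ can be computed from the data.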
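The quadratic forms can be computed directly with projection matrices. In this NumPy sketch (hypothetical helper names `proj` and `mixed_f_stat`), projections are formed as $BB^+$ with the Moore-Penrose pseudoinverse, which also covers rank-deficient $B$; $\sigma^2$ is dropped because it cancels in $F$. For the one-way model ($s = 1$, $Z_{-1}$ empty) the statistic reduces to $MSB/MSE$.

```python
import numpy as np

def proj(B):
    """Orthogonal projection onto the column space of B (B B^+ handles rank deficiency)."""
    return B @ np.linalg.pinv(B)

def mixed_f_stat(y, X, Z1, Z_rest):
    """F = (q1/r1)/(q2/r2) for H0: sigma_1^2 = 0; sigma^2 cancels, so it is omitted."""
    n = len(y)
    XZr = np.hstack([X, Z_rest])                    # (X, Z_{-1})
    XZ = np.hstack([X, Z1, Z_rest])                 # (X, Z)
    Z1_res = (np.eye(n) - proj(XZr)) @ Z1           # P^perp_{(X, Z_{-1})} Z_1
    q1 = y @ proj(Z1_res) @ y
    q2 = y @ (np.eye(n) - proj(XZ)) @ y
    r1 = np.linalg.matrix_rank(XZ) - np.linalg.matrix_rank(XZr)
    r2 = n - np.linalg.matrix_rank(XZ)
    return (q1 / r1) / (q2 / r2), r1, r2

# One-way layout: s = 1, so Z_{-1} has no columns. Data are illustrative.
m, k = 3, 2
X = np.ones((m * k, 1))
Z1 = np.kron(np.eye(m), np.ones((k, 1)))
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
print(mixed_f_stat(y, X, Z1, np.empty((m * k, 0))))   # F = 16 with (r1, r2) = (2, 3)
```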

Likelihood ratio test

The likelihood ratio test is used to compare the fit of two models. It is based on the likelihood ratio, which expresses how many times more likely the data are under one model than under the other.

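Let $Y = (Y_1, Y_2, \ldots, Y_n)$, where $Y_1, \ldots, Y_n$ have joint pdf $f(y; \theta)$ for $\theta \in \Omega$, and consider the hypotheses

$$H_0: \theta \in \Omega_0 \quad \text{vs} \quad H_1: \theta \in \Omega \setminus \Omega_0.$$

The generalized likelihood ratio (GLR) is defined by

$$\lambda(y) = \frac{\max_{\theta \in \Omega_0} f(y; \theta)}{\max_{\theta \in \Omega} f(y; \theta)} = \frac{f(y; \hat\theta_0)}{f(y; \hat\theta)},$$

where $\hat\theta$ denotes the usual MLE of $\theta$ and $\hat\theta_0$ denotes the MLE under the restriction that $H_0$ is true.

If $y \sim f(y; \theta_1, \ldots, \theta_k)$ and $H_0: (\theta_1, \ldots, \theta_r) = (\theta_{10}, \ldots, \theta_{r0})$ with $r < k$, then, under $H_0$ and regularity conditions, approximately for large $n$

$$-2\log\lambda(y) \sim \chi^2(r)$$

(Hartley and Rao (1967) and Jiang (2005c) provided rigorous proofs). An appropriate size-$\alpha$ test rejects $H_0$ if $-2\log\lambda(y) \geq \chi^2_{1-\alpha}(r)$.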

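One-way random effects model

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1, \ldots, m,\ j = 1, \ldots, k, \quad \alpha_i \sim N(0, \sigma^2_\alpha),\ \epsilon_{ij} \sim N(0, \sigma^2).$$

The log-likelihood is

$$l(\mu, \sigma^2_\alpha, \sigma^2) = c - \frac12(n-m)\log\sigma^2 - \frac12\sum_{i=1}^m\log(\sigma^2 + k\sigma^2_\alpha) - \frac{1}{2\sigma^2}\sum_{i=1}^m\sum_{j=1}^k(y_{ij} - \mu)^2 + \frac{\sigma^2_\alpha}{2\sigma^2}\sum_{i=1}^m\frac{k^2(\bar y_{i\cdot} - \mu)^2}{\sigma^2 + k\sigma^2_\alpha}.$$

Under $H_0: \sigma^2_\alpha = 0$ (vs $H_1: \sigma^2_\alpha > 0$),

$$l(\mu, 0, \sigma^2) = c - \frac n2\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^m\sum_{j=1}^k(y_{ij} - \mu)^2.$$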

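Setting the partial derivatives of $l(\mu, 0, \sigma^2)$ equal to 0:

$$\frac{\partial l}{\partial\mu} = \frac{1}{2\sigma^2}\sum_{i=1}^m\sum_{j=1}^k 2(y_{ij} - \mu) = 0, \qquad \frac{\partial l}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^m\sum_{j=1}^k(y_{ij} - \mu)^2 = 0,$$

so the restricted MLEs are

$$\hat\mu = \bar y_{\cdot\cdot}, \qquad \hat\sigma^2_0 = \frac1n\sum_{i=1}^m\sum_{j=1}^k(y_{ij} - \bar y_{\cdot\cdot})^2.$$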

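Without any restriction (and when the resulting $\hat\sigma^2_\alpha$ is positive),

$$\hat\mu = \bar y_{\cdot\cdot}, \qquad \hat\sigma^2 = \frac{\sum_{i=1}^m\sum_{j=1}^k(y_{ij} - \bar y_{i\cdot})^2}{m(k-1)} = MSE, \qquad \hat\sigma^2_\alpha = \frac1k\left[\frac{k\sum_{i=1}^m(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2}{m} - \hat\sigma^2\right],$$

so $-2\log\lambda(y)$ can be calculated. However, under $H_0: \sigma^2_\alpha = 0$ vs $H_1: \sigma^2_\alpha > 0$, $-2\log\lambda(y)$ is not distributed as $\chi^2_1$, because the null value lies on the boundary of the parameter space. Its distribution can be approximated by the mixture $\frac12\chi^2_0 + \frac12\chi^2_1$, where $\chi^2_0$ is a point mass at 0. If there is no boundary issue, say $H_0: \sigma^2_\alpha = 2$ vs $H_1: \sigma^2_\alpha \neq 2$, then $-2\log\lambda(y)$ is approximately distributed as $\chi^2_1$.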
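A sketch of the likelihood ratio test for $H_0: \sigma^2_\alpha = 0$ in the balanced one-way model, assuming NumPy and SciPy. It evaluates the log-likelihood above with $c = -\frac n2\log 2\pi$, plugs in the unrestricted and restricted MLEs, and computes the p-value from the $\frac12\chi^2_0 + \frac12\chi^2_1$ approximation, i.e. half the $\chi^2_1$ upper-tail probability. The data and the function names `oneway_loglik` and `oneway_lrt` are illustrative.

```python
import numpy as np
from scipy import stats

def oneway_loglik(mu, s2_alpha, s2, y):
    """l(mu, sigma_alpha^2, sigma^2) for a balanced (m, k) array y, with c = -(n/2) log(2 pi)."""
    m, k = y.shape
    n = m * k
    lam = s2 + k * s2_alpha
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * (n - m) * np.log(s2) - 0.5 * m * np.log(lam)
            - np.sum((y - mu) ** 2) / (2 * s2)
            + s2_alpha * k**2 * np.sum((y.mean(axis=1) - mu) ** 2) / (2 * s2 * lam))

def oneway_lrt(y):
    """-2 log lambda for H0: sigma_alpha^2 = 0 and its p-value under 0.5*chi2_0 + 0.5*chi2_1."""
    m, k = y.shape
    ybar = y.mean()
    group_means = y.mean(axis=1)
    mse = np.sum((y - group_means[:, None]) ** 2) / (m * (k - 1))
    s2a = (k * np.sum((group_means - ybar) ** 2) / m - mse) / k   # unrestricted MLE
    s2_0 = np.sum((y - ybar) ** 2) / (m * k)                      # restricted MLE of sigma^2
    if s2a <= 0:                       # unrestricted MLE is on the boundary: both fits coincide
        return 0.0, 1.0
    stat = 2 * (oneway_loglik(ybar, s2a, mse, y) - oneway_loglik(ybar, 0.0, s2_0, y))
    return stat, 0.5 * stats.chi2.sf(stat, df=1)   # chi2_0 is a point mass at 0

y = np.array([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])   # illustrative data
print(oneway_lrt(y))
```

Using the naive $\chi^2_1$ reference instead of the mixture would double the p-value, making the test conservative.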
