ML and REML Variance Component Estimation. Copyright © 2012 Dan Nettleton (Iowa State University), Statistics



2 Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \Sigma)$ for some positive definite, symmetric matrix $\Sigma$. Furthermore, suppose each element of $\Sigma$ is a known function of an unknown vector of parameters $\theta \in \Theta$.

3 For example, consider
$$\Sigma = \sum_{j=1}^{m} \sigma_j^2 Z_j Z_j' + \sigma_e^2 I,$$
where $Z_1, \ldots, Z_m$ are known matrices, $\theta = [\theta_1, \ldots, \theta_m, \theta_{m+1}]' = [\sigma_1^2, \ldots, \sigma_m^2, \sigma_e^2]'$, and $\Theta = \{\theta : \theta_j > 0,\ j = 1, \ldots, m+1\}$.

4 The likelihood function is
$$L(\beta, \theta \mid y) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta)\right\}$$
and the log likelihood function is
$$\ell(\beta, \theta \mid y) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta).$$

5 Based on previous results, $(y - X\beta)'\Sigma^{-1}(y - X\beta)$ is minimized over $\beta \in \mathbb{R}^p$, for any fixed $\theta$, by
$$\hat\beta(\theta) \equiv (X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y.$$

6 Thus, the profile log likelihood for $\theta$ is
$$\ell^*(\theta \mid y) = \sup_{\beta \in \mathbb{R}^p} \ell(\beta, \theta \mid y) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(y - X\hat\beta(\theta))'\Sigma^{-1}(y - X\hat\beta(\theta)).$$

7 In the general case, numerical methods are used to find the maximizer of $\ell^*(\theta \mid y)$ over $\theta \in \Theta$. Let
$$\hat\theta_{MLE} = \arg\max\{\ell^*(\theta \mid y) : \theta \in \Theta\}.$$
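The profile log likelihood above can be evaluated and maximized numerically. The following is a minimal sketch, assuming the one-factor covariance $\Sigma = \sigma_1^2 ZZ' + \sigma_e^2 I$; the function name `profile_loglik`, the toy data, and the coarse grid search (a stand-in for a proper optimizer) are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def profile_loglik(theta, y, X, Z):
    """Profile log likelihood l*(theta | y) for Sigma = theta[0] * Z Z' + theta[1] * I."""
    n = len(y)
    Sigma = theta[0] * (Z @ Z.T) + theta[1] * np.eye(n)
    Sinv = np.linalg.inv(Sigma)
    # beta_hat(theta) = (X' Sinv X)^- X' Sinv y; pinv serves as the generalized inverse
    beta_hat = np.linalg.pinv(X.T @ Sinv @ X) @ X.T @ Sinv @ y
    resid = y - X @ beta_hat
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * resid @ Sinv @ resid

# Made-up toy data: two groups of three observations, intercept-only mean.
y = np.array([4.1, 5.0, 4.6, 7.2, 6.8, 7.9])
X = np.ones((6, 1))
Z = np.kron(np.eye(2), np.ones((3, 1)))      # 6 x 2 group-indicator matrix

# Coarse grid search over (sigma_1^2, sigma_e^2), standing in for a real optimizer.
grid_1 = np.linspace(0.1, 8.0, 80)
grid_e = np.linspace(0.02, 2.0, 100)
theta_mle = max(((s1, se) for s1 in grid_1 for se in grid_e),
                key=lambda th: profile_loglik(th, y, X, Z))
print(theta_mle)
```

A grid keeps the sketch dependency-free; in practice a gradient-based or derivative-free optimizer over $\Theta$ would replace it.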

8 Let $\hat\Sigma_{MLE}$ be the matrix $\Sigma$ with $\hat\theta_{MLE}$ in place of $\theta$. The MLE of an estimable $C\beta$ is then given by
$$C\hat\beta_{MLE} = C(X'\hat\Sigma_{MLE}^{-1}X)^{-}X'\hat\Sigma_{MLE}^{-1}y.$$

9 We can write $\Sigma$ as $\sigma^2 V$, where $\sigma^2 > 0$, $V$ is PD and symmetric, $\sigma^2$ is a known function of $\theta$, and each entry of $V$ is a known function of $\theta$, e.g.,
$$\Sigma = \sum_{j=1}^{m} \sigma_j^2 Z_j Z_j' + \sigma_e^2 I = \sigma_e^2\left[\sum_{j=1}^{m} \frac{\sigma_j^2}{\sigma_e^2} Z_j Z_j' + I\right].$$

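10 If we let $\hat\sigma^2_{MLE}$ and $\hat V_{MLE}$ denote $\sigma^2$ and $V$ with $\hat\theta_{MLE}$ in place of $\theta$, then
$$\hat\Sigma_{MLE} = \hat\sigma^2_{MLE}\hat V_{MLE} \quad\text{and}\quad \hat\Sigma_{MLE}^{-1} = \frac{1}{\hat\sigma^2_{MLE}}\hat V_{MLE}^{-1}.$$
It follows that
$$C\hat\beta_{MLE} = C(X'\hat\Sigma_{MLE}^{-1}X)^{-}X'\hat\Sigma_{MLE}^{-1}y = C\left(X'\frac{1}{\hat\sigma^2_{MLE}}\hat V_{MLE}^{-1}X\right)^{-}X'\frac{1}{\hat\sigma^2_{MLE}}\hat V_{MLE}^{-1}y = C(X'\hat V_{MLE}^{-1}X)^{-}X'\hat V_{MLE}^{-1}y.$$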

11 Note that
$$C\hat\beta_{MLE} = C(X'\hat V_{MLE}^{-1}X)^{-}X'\hat V_{MLE}^{-1}y$$
is the GLS estimator under the Aitken model with $\hat V_{MLE}$ in place of $V$.
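The cancellation of the scalar $\hat\sigma^2_{MLE}$ can be spot-checked numerically. This is a small sketch on made-up data; the helper `gls`, the seed, and the value 3.7 used for $\sigma^2$ are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

def gls(X, y, W):
    """GLS estimate (X' W^{-1} X)^- X' W^{-1} y, with pinv as the generalized inverse."""
    Winv = np.linalg.inv(W)
    return np.linalg.pinv(X.T @ Winv @ X) @ X.T @ Winv @ y

rng = np.random.default_rng(0)           # arbitrary seed for the made-up data
n = 8
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
y = rng.normal(size=n)
L = rng.normal(size=(n, n))
V = L @ L.T + n * np.eye(n)              # an arbitrary positive definite V

b_V = gls(X, y, V)                       # weights based on V alone
b_Sigma = gls(X, y, 3.7 * V)             # weights based on Sigma = sigma^2 V
print(np.allclose(b_V, b_Sigma))
```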

12 We have seen by example that the MLE of the variance component vector $\theta$ can be biased. For example, for the case of $\Sigma = \sigma^2 I$, where $\theta = [\sigma^2]$, the MLE of $\sigma^2$ is
$$\frac{(y - X\hat\beta)'(y - X\hat\beta)}{n}$$
with expectation $\frac{n-r}{n}\sigma^2$.

13 This MLE for $\sigma^2$ is often criticized for failing to account for the loss of degrees of freedom needed to estimate $\beta$:
$$E\left[\frac{(y - X\hat\beta)'(y - X\hat\beta)}{n}\right] = \frac{n-r}{n}\sigma^2 < \sigma^2 = E\left[\frac{(y - X\beta)'(y - X\beta)}{n}\right].$$

14 A familiar special case: Suppose $y_1, \ldots, y_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$. Then
$$E\left[\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{n}\right] = \sigma^2, \quad\text{but}\quad E\left[\frac{\sum_{i=1}^{n}(y_i - \bar y)^2}{n}\right] = \frac{n-1}{n}\sigma^2.$$
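Both expectations depend only on the mean and variance of the $y_i$, so they can be verified exactly by enumeration. The sketch below swaps the normal distribution for a small made-up discrete distribution (an assumption made purely so that every sample can be listed) and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 3]                        # support of a made-up distribution
n = 3                                     # sample size
p = Fraction(1, len(values))              # each value equally likely

mu = sum(p * v for v in values)
sigma2 = sum(p * (v - mu) ** 2 for v in values)

e_known_mean = Fraction(0)                # E[ sum (y_i - mu)^2 / n ]
e_sample_mean = Fraction(0)               # E[ sum (y_i - ybar)^2 / n ]
for sample in product(values, repeat=n):  # every possible i.i.d. sample
    prob = p ** n
    ybar = Fraction(sum(sample), n)
    e_known_mean += prob * sum((v - mu) ** 2 for v in sample) / n
    e_sample_mean += prob * sum((v - ybar) ** 2 for v in sample) / n

print(e_known_mean == sigma2)
print(e_sample_mean == Fraction(n - 1, n) * sigma2)
```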

15 REML is an approach that produces unbiased estimators for these special cases and produces less biased estimators than ML estimators in general. Depending on whom you ask, REML stands for REsidual Maximum Likelihood or REstricted Maximum Likelihood.

16 The REML method:
1. Find $n - \mathrm{rank}(X) = n - r$ linearly independent vectors $a_1, \ldots, a_{n-r}$ such that $a_i'X = 0'$ for all $i = 1, \ldots, n-r$.
2. Find the maximum likelihood estimate of $\theta$ using $w_1 \equiv a_1'y, \ldots, w_{n-r} \equiv a_{n-r}'y$ as data.

17
$$A = [a_1, \ldots, a_{n-r}], \qquad w = \begin{bmatrix} w_1 \\ \vdots \\ w_{n-r} \end{bmatrix} = \begin{bmatrix} a_1'y \\ \vdots \\ a_{n-r}'y \end{bmatrix} = A'y.$$

18 If $a'X = 0'$, then $a'y$ is known as an error contrast. Thus, $w_1, \ldots, w_{n-r}$ comprise a set of $n-r$ error contrasts. Because $(I - P_X)X = X - P_XX = X - X = 0$, the elements of $(I - P_X)y = y - P_Xy = y - \hat y$ are each error contrasts.

19 Because $\mathrm{rank}(I - P_X) = n - \mathrm{rank}(X) = n - r$, there exists a set of $n-r$ linearly independent rows of $I - P_X$ that can be used in step 1 of the REML method to get $a_1, \ldots, a_{n-r}$. If we do use a subset of rows of $I - P_X$ to get $a_1, \ldots, a_{n-r}$, the error contrasts $w_1 = a_1'y, \ldots, w_{n-r} = a_{n-r}'y$ will be a subset of the elements of $(I - P_X)y = y - \hat y$, the residual vector. This is why it makes sense to call the procedure Residual Maximum Likelihood.

20 Note that
$$w = A'y = A'(X\beta + \varepsilon) = A'X\beta + A'\varepsilon = 0 + A'\varepsilon = A'\varepsilon.$$
Thus, $w = A'\varepsilon \sim N(A'0, A'\Sigma A) \overset{d}{=} N(0, A'\Sigma A)$, and the distribution of $w$ depends on $\theta$ but not $\beta$.
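One way to carry out step 1 of the REML method in code is to take an orthonormal basis of the orthogonal complement of the column space of $X$ from a singular value decomposition. This is a sketch of one possible construction, not the only valid choice of $A$; the helper name `error_contrast_matrix`, the rank tolerance, and the toy design are assumptions. It also illustrates that $w$ does not change when $\beta$ changes.

```python
import numpy as np

def error_contrast_matrix(X, tol=1e-10):
    """Return an n x (n - r) matrix A with orthonormal columns and A'X = 0."""
    U, s, _ = np.linalg.svd(X, full_matrices=True)
    r = int(np.sum(s > tol * s.max()))   # numerical rank of X
    return U[:, r:]                      # left singular vectors orthogonal to C(X)

n = 6
g = np.repeat([0.0, 1.0], 3)
X = np.column_stack([np.ones(n), g, 1.0 - g])    # intercept plus two indicators: rank 2
A = error_contrast_matrix(X)

rng = np.random.default_rng(1)                   # arbitrary seed
eps = rng.normal(size=n)
# Same errors, two different beta vectors: the error contrasts coincide.
w_beta1 = A.T @ (X @ np.array([1.0, 2.0, 3.0]) + eps)
w_beta2 = A.T @ (X @ np.array([-5.0, 0.5, 9.0]) + eps)
print(A.shape, np.allclose(w_beta1, w_beta2))
```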

21 The log likelihood function in this case is
$$\ell_w(\theta \mid w) = -\frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log|A'\Sigma A| - \frac{1}{2}w'(A'\Sigma A)^{-1}w.$$
An MLE for $\theta$, say $\hat\theta$, can be found in the general case using numerical methods to obtain the REML estimate of $\theta$. Let
$$\hat\theta = \arg\max\{\ell_w(\theta \mid w) : \theta \in \Theta\}.$$

22 Once a REML estimate of $\theta$ (and thus $\Sigma$) has been obtained, the BLUE of an estimable $C\beta$ if $\Sigma$ were known can be approximated by
$$C\hat\beta_{REML} = C(X'\hat\Sigma^{-1}X)^{-}X'\hat\Sigma^{-1}y,$$
where $\hat\Sigma$ is $\Sigma$ with $\hat\theta$ (the REML estimate of $\theta$) in place of $\theta$.
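The two stages (maximize $\ell_w$ over $\theta$, then plug $\hat\Sigma$ into the GLS formula) can be sketched end to end. The example assumes a balanced one-way random-effects layout with $\Sigma = \sigma_1^2 ZZ' + \sigma_e^2 I$; the toy data, the function name `reml_loglik`, and the coarse grid search are illustrative assumptions rather than a recommended fitting procedure.

```python
import numpy as np

def reml_loglik(theta, y, A, Z):
    """Log likelihood of w = A'y for Sigma = theta[0] * Z Z' + theta[1] * I."""
    w = A.T @ y
    S = A.T @ (theta[0] * (Z @ Z.T) + theta[1] * np.eye(len(y))) @ A
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * len(w) * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * w @ np.linalg.solve(S, w)

# Made-up toy data: two groups of three observations, intercept-only mean.
y = np.array([4.1, 5.0, 4.6, 7.2, 6.8, 7.9])
X = np.ones((6, 1))
Z = np.kron(np.eye(2), np.ones((3, 1)))

# Columns of U beyond rank(X) = 1 are orthonormal and orthogonal to C(X), so A'X = 0.
U, _, _ = np.linalg.svd(X, full_matrices=True)
A = U[:, 1:]

# Coarse grid search over (sigma_1^2, sigma_e^2), standing in for a real optimizer.
grid_1 = np.linspace(0.1, 8.0, 80)
grid_e = np.linspace(0.02, 2.0, 100)
theta_reml = max(((s1, se) for s1 in grid_1 for se in grid_e),
                 key=lambda th: reml_loglik(th, y, A, Z))

# Plug the REML estimate of Sigma into the GLS formula for beta.
Sigma_hat = theta_reml[0] * (Z @ Z.T) + theta_reml[1] * np.eye(6)
Sinv = np.linalg.inv(Sigma_hat)
beta_reml = np.linalg.pinv(X.T @ Sinv @ X) @ X.T @ Sinv @ y
print(theta_reml, beta_reml)
```

For balanced data like this, the grid maximizer should land near the usual ANOVA estimates of the two variance components, up to the grid spacing.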

23 Suppose $A$ and $B$ are each $n \times (n-r)$ matrices satisfying $\mathrm{rank}(A) = \mathrm{rank}(B) = n - r$ and $A'X = B'X = 0$. Let $w = A'y$ and $v = B'y$. Prove that $\hat\theta$ maximizes $\ell_w(\theta \mid w)$ over $\theta \in \Theta$ iff $\hat\theta$ maximizes $\ell_v(\theta \mid v)$ over $\theta \in \Theta$.

24 Proof: $A'X = 0 \Rightarrow X'A = 0 \Rightarrow$ each column of $A$ is in $\mathcal{N}(X')$. Also, $\dim(\mathcal{N}(X')) = n - \mathrm{rank}(X) = n - r$. Therefore, the $n-r$ LI columns of $A$ form a basis for $\mathcal{N}(X')$. Similarly, $B'X = 0 \Rightarrow X'B = 0 \Rightarrow$ the columns of $B$ are in $\mathcal{N}(X')$. Therefore, there exists an $(n-r) \times (n-r)$ matrix $M$ such that $AM = B$.

25 Note that $M$, an $(n-r) \times (n-r)$ matrix, is nonsingular because
$$n - r = \mathrm{rank}(B) = \mathrm{rank}(AM) \le \mathrm{rank}(M) \le n - r \;\Rightarrow\; \mathrm{rank}(M) = n - r.$$
Therefore,
$$v'(B'\Sigma B)^{-1}v = y'B(M'A'\Sigma AM)^{-1}B'y = y'AMM^{-1}(A'\Sigma A)^{-1}(M')^{-1}M'A'y = y'A(A'\Sigma A)^{-1}A'y = w'(A'\Sigma A)^{-1}w.$$

26 Also
$$|B'\Sigma B| = |M'A'\Sigma AM| = |M'|\,|A'\Sigma A|\,|M| = |M|^2|A'\Sigma A|.$$

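27 Now note that
$$\begin{aligned}
\ell_v(\theta \mid v) &= -\frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log|B'\Sigma B| - \frac{1}{2}v'(B'\Sigma B)^{-1}v \\
&= -\frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log(|M|^2|A'\Sigma A|) - \frac{1}{2}w'(A'\Sigma A)^{-1}w \\
&= -\frac{1}{2}\log(|M|^2) - \frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log|A'\Sigma A| - \frac{1}{2}w'(A'\Sigma A)^{-1}w \\
&= -\log|M| + \ell_w(\theta \mid w).
\end{aligned}$$
Because $M$ is free of $\theta$, the result follows.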
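The invariance can be checked numerically: for two valid contrast matrices $A$ and $B = AM$, the two log likelihoods should differ by the same $\theta$-free amount at every $\theta$, namely minus the log of the absolute determinant of $M$. The sketch below assumes $\Sigma = \sigma_1^2 ZZ' + \sigma_e^2 I$; the data, the seed, the matrix $M$, and the three $\theta$ values are all made up for illustration.

```python
import numpy as np

def contrast_loglik(Sigma, A, y):
    """Log likelihood of w = A'y ~ N(0, A' Sigma A)."""
    w = A.T @ y
    S = A.T @ Sigma @ A
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * len(w) * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * w @ np.linalg.solve(S, w)

rng = np.random.default_rng(2)                    # arbitrary seed
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])   # rank r = 2
Z = np.kron(np.eye(2), np.ones((3, 1)))
y = rng.normal(size=n)

U, _, _ = np.linalg.svd(X, full_matrices=True)
A = U[:, 2:]                                      # n x (n - r) with A'X = 0
M = rng.normal(size=(4, 4)) + 4 * np.eye(4)       # assumed nonsingular for this seed
B = A @ M                                         # another valid choice with B'X = 0

diffs = []
for s1, se in [(0.5, 1.0), (2.0, 0.3), (1.0, 4.0)]:
    Sigma = s1 * (Z @ Z.T) + se * np.eye(n)
    diffs.append(contrast_loglik(Sigma, B, y) - contrast_loglik(Sigma, A, y))

log_abs_det_M = np.linalg.slogdet(M)[1]           # log of |det M|
print(np.allclose(diffs, -log_abs_det_M))
```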

28 Consider the special case $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$. Prove that the REML estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{y'(I - P_X)y}{n - r}.$$
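A numerical spot check (not a proof) of this claim: with orthonormal error contrasts, $w \sim N(0, \sigma^2 I)$, so the contrast log likelihood has a simple closed form that can be compared at $\hat\sigma^2$ and at nearby values. The design matrix, seed, and tolerances below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)                    # arbitrary seed
n = 7
t = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), t, t + 1.0])     # third column is redundant: rank 2
y = rng.normal(size=n)
r = np.linalg.matrix_rank(X, tol=1e-10)

P_X = X @ np.linalg.pinv(X, rcond=1e-10)          # orthogonal projector onto C(X)
sigma2_reml = y @ (np.eye(n) - P_X) @ y / (n - r)

# Orthonormal error contrasts: A'A = I, so w = A'y ~ N(0, sigma^2 I).
U, _, _ = np.linalg.svd(X, full_matrices=True)
A = U[:, r:]
w = A.T @ y

def l_w(s2):
    """Contrast log likelihood as a function of sigma^2 when A'A = I."""
    return -0.5 * (n - r) * np.log(2 * np.pi * s2) - 0.5 * (w @ w) / s2

print(np.isclose(w @ w / (n - r), sigma2_reml))
print(l_w(sigma2_reml) > max(l_w(0.9 * sigma2_reml), l_w(1.1 * sigma2_reml)))
```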

29 REML Theorem: Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \Sigma)$, $\Sigma$ a PD, symmetric matrix whose entries are known functions of an unknown parameter vector $\theta \in \Theta$. Furthermore, suppose $\mathrm{rank}(X) = r$ and $\tilde X$ is any $n \times r$ matrix consisting of any set of $r$ LI columns of $X$.

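30 REML Theorem (continued): Let $A$ be any $n \times (n-r)$ matrix satisfying $\mathrm{rank}(A) = n - r$ and $A'X = 0$. Let $w = A'y \sim N(0, A'\Sigma A)$. Then $\hat\theta$ maximizes $\ell_w(\theta \mid w)$ over $\theta \in \Theta$ $\iff$ $\hat\theta$ maximizes, over $\theta \in \Theta$,
$$g(\theta) = -\frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}(y - X\hat\beta(\theta))'\Sigma^{-1}(y - X\hat\beta(\theta)),$$
where $\hat\beta(\theta) = (X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y$.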
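The theorem says that $\ell_w(\theta \mid w)$ and $g(\theta)$ differ only by a quantity free of $\theta$. The sketch below checks this on made-up data with a rank-deficient $X$, assuming $\Sigma = \sigma_1^2 ZZ' + \sigma_e^2 I$ and an orthonormal $A$ obtained from an SVD; all names, the seed, and the three $\theta$ values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)                    # arbitrary seed
n, r = 6, 2
t = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), t, 1.0 + t])     # rank 2: third column is redundant
X_tilde = X[:, :2]                                # r linearly independent columns of X
Z = np.kron(np.eye(2), np.ones((3, 1)))
y = rng.normal(size=n)

U, _, _ = np.linalg.svd(X, full_matrices=True)
A = U[:, r:]                                      # A'X = 0 and A'A = I
w = A.T @ y

def l_w(Sigma):
    """Log likelihood of the error contrasts w = A'y."""
    S = A.T @ Sigma @ A
    return (-0.5 * (n - r) * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(S)[1]
            - 0.5 * w @ np.linalg.solve(S, w))

def g(Sigma):
    """The function g(theta) from the REML Theorem, evaluated at Sigma(theta)."""
    Sinv = np.linalg.inv(Sigma)
    # rcond is set explicitly because X' Sinv X is singular here
    beta_hat = np.linalg.pinv(X.T @ Sinv @ X, rcond=1e-10) @ X.T @ Sinv @ y
    resid = y - X @ beta_hat
    return (-0.5 * np.linalg.slogdet(Sigma)[1]
            - 0.5 * np.linalg.slogdet(X_tilde.T @ Sinv @ X_tilde)[1]
            - 0.5 * resid @ Sinv @ resid)

diffs = []
for s1, se in [(0.5, 1.0), (2.0, 0.3), (1.0, 4.0)]:
    Sigma = s1 * (Z @ Z.T) + se * np.eye(n)
    diffs.append(l_w(Sigma) - g(Sigma))
print(max(diffs) - min(diffs) < 1e-9)             # same offset at every theta
```

Because the offset is the same at every $\theta$, maximizing $g$ (which never requires constructing $A$) yields the same $\hat\theta$ as maximizing $\ell_w$.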

31 We have previously shown:
1. There exists an $n \times (n-r)$ matrix $Q$ such that $QQ' = I - P_X$, $Q'X = 0$, and $Q'Q = I_{(n-r) \times (n-r)}$, and
2. Any $n \times (n-r)$ matrix $A$ of rank $n-r$ satisfying $A'X = 0$ leads to the same REML estimator $\hat\theta$.
Thus, without loss of generality, we may assume the matrix $A$ in the REML Theorem is $Q$.

32 We will now prove 3 lemmas and then use those lemmas to prove the REML Theorem.

33 Lemma R1: Let $G = \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}$ and suppose $A$ is an $n \times (n-r)$ matrix satisfying $A'A = I_{(n-r) \times (n-r)}$ and $AA' = I - P_X$. Then
$$|[A, G]|^2 = |\tilde X'\tilde X|^{-1}.$$

34 Proof of Lemma R1:
$$|[A, G]|^2 = |[A, G]|\,|[A, G]| = |[A, G]'|\,|[A, G]| = \begin{vmatrix} A'A & A'G \\ G'A & G'G \end{vmatrix} = \begin{vmatrix} I & A'G \\ G'A & G'G \end{vmatrix} = |I|\,|G'G - G'AA'G|$$
(by our result on the determinant of a partitioned matrix)

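35
$$\begin{aligned}
&= |G'G - G'(I - P_X)G| = |G'P_XG| = |G'P_{\tilde X}G| \\
&= |(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}\tilde X(\tilde X'\tilde X)^{-1}\tilde X'\Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}| \\
&= |[(\tilde X'\Sigma^{-1}\tilde X)^{-1}(\tilde X'\Sigma^{-1}\tilde X)](\tilde X'\tilde X)^{-1}[(\tilde X'\Sigma^{-1}\tilde X)(\tilde X'\Sigma^{-1}\tilde X)^{-1}]| \\
&= |(\tilde X'\tilde X)^{-1}| = |\tilde X'\tilde X|^{-1}.
\end{aligned}$$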

36 Lemma R2:
$$|A'\Sigma A| = |\tilde X'\Sigma^{-1}\tilde X|\,|\Sigma|\,|\tilde X'\tilde X|^{-1},$$
where $A$, $\tilde X$, and $\Sigma$ are as previously defined.

37 Proof of Lemma R2: Again define $G = \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}$. Show that
(1) $A'\Sigma G = 0$, and
(2) $G'\Sigma G = (\tilde X'\Sigma^{-1}\tilde X)^{-1}$.

38 Next show that
$$|[A, G]|^2\,|\Sigma| = |A'\Sigma A|\,|\tilde X'\Sigma^{-1}\tilde X|^{-1}.$$

39 Now use Lemma R1 to complete the proof of Lemma R2.

40 Lemma R3:
$$A(A'\Sigma A)^{-1}A' = \Sigma^{-1} - \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1},$$
where $A$, $\Sigma$, and $\tilde X$ are as defined previously.

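41 Proof of Lemma R3:
$$\begin{aligned}
[A, G]^{-1}\Sigma^{-1}([A, G]')^{-1} &= ([A, G]'\Sigma[A, G])^{-1} \\
&= \begin{bmatrix} A'\Sigma A & A'\Sigma G \\ G'\Sigma A & G'\Sigma G \end{bmatrix}^{-1} = \begin{bmatrix} A'\Sigma A & 0 \\ 0 & G'\Sigma G \end{bmatrix}^{-1} \\
&= \begin{bmatrix} A'\Sigma A & 0 \\ 0 & (\tilde X'\Sigma^{-1}\tilde X)^{-1} \end{bmatrix}^{-1} = \begin{bmatrix} (A'\Sigma A)^{-1} & 0 \\ 0 & \tilde X'\Sigma^{-1}\tilde X \end{bmatrix}.
\end{aligned}$$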

42 Now multiplying on the left by $[A, G]$ and on the right by $[A, G]'$ yields
$$\Sigma^{-1} = [A, G]\begin{bmatrix} (A'\Sigma A)^{-1} & 0 \\ 0 & \tilde X'\Sigma^{-1}\tilde X \end{bmatrix}\begin{bmatrix} A' \\ G' \end{bmatrix} = A(A'\Sigma A)^{-1}A' + G\tilde X'\Sigma^{-1}\tilde XG'. \quad (3)$$

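43 Now
$$G\tilde X'\Sigma^{-1}\tilde XG' = \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1} = \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}. \quad (4)$$
Combining (3) and (4) yields
$$A(A'\Sigma A)^{-1}A' = \Sigma^{-1} - \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}.$$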
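Lemmas R1, R2, and R3 are matrix identities, so each can be spot-checked on a random instance. The sketch below assumes a full-column-rank design (so that $X = \tilde X$), an arbitrary positive definite $\Sigma$ built from random numbers, and an orthonormal $A$ taken from an SVD; none of these specific choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)                    # arbitrary seed
n, r = 6, 2
X_tilde = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # full column rank
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)                   # arbitrary positive definite Sigma
Sinv = np.linalg.inv(Sigma)

U, _, _ = np.linalg.svd(X_tilde, full_matrices=True)
A = U[:, r:]                                      # A'A = I and AA' = I - P_X
K = X_tilde.T @ Sinv @ X_tilde                    # X~' Sigma^{-1} X~
G = Sinv @ X_tilde @ np.linalg.inv(K)
AG = np.hstack([A, G])                            # the n x n matrix [A, G]
det = np.linalg.det

# Lemma R1: |[A, G]|^2 = |X~'X~|^{-1}
r1 = np.isclose(det(AG) ** 2, 1.0 / det(X_tilde.T @ X_tilde))
# Lemma R2: |A' Sigma A| = |X~' Sigma^{-1} X~| |Sigma| |X~'X~|^{-1}
r2 = np.isclose(det(A.T @ Sigma @ A), det(K) * det(Sigma) / det(X_tilde.T @ X_tilde))
# Lemma R3: A (A' Sigma A)^{-1} A' = Sigma^{-1} - Sigma^{-1} X~ K^{-1} X~' Sigma^{-1}
r3 = np.allclose(A @ np.linalg.inv(A.T @ Sigma @ A) @ A.T,
                 Sinv - Sinv @ X_tilde @ np.linalg.inv(K) @ X_tilde.T @ Sinv)
print(r1, r2, r3)
```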

44 Now use Lemmas R2 and R3 to prove the REML Theorem.

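45
$$\begin{aligned}
\ell_w(\theta \mid w) &= -\frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log|A'\Sigma A| - \frac{1}{2}w'(A'\Sigma A)^{-1}w \\
&= \text{constant}_1 - \frac{1}{2}\log(|\tilde X'\Sigma^{-1}\tilde X|\,|\Sigma|\,|\tilde X'\tilde X|^{-1}) - \frac{1}{2}y'A(A'\Sigma A)^{-1}A'y \\
&= \text{constant}_2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}y'(\Sigma^{-1} - \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1})y
\end{aligned}$$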

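46
$$\begin{aligned}
&= \text{constant}_2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}y'\Sigma^{-1/2}(I - \Sigma^{-1/2}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1/2})\Sigma^{-1/2}y \\
&= \text{constant}_2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}y'\Sigma^{-1/2}(I - P_{\Sigma^{-1/2}\tilde X})\Sigma^{-1/2}y.
\end{aligned}$$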

47 Now $\mathcal{C}(\Sigma^{-1/2}\tilde X) = \mathcal{C}(\Sigma^{-1/2}X) \Rightarrow P_{\Sigma^{-1/2}\tilde X} = P_{\Sigma^{-1/2}X}$. Therefore,
$$y'\Sigma^{-1/2}(I - P_{\Sigma^{-1/2}\tilde X})\Sigma^{-1/2}y = y'\Sigma^{-1/2}(I - P_{\Sigma^{-1/2}X})\Sigma^{-1/2}y = \|(I - P_{\Sigma^{-1/2}X})\Sigma^{-1/2}y\|^2$$

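48
$$\begin{aligned}
&= \|\Sigma^{-1/2}y - \Sigma^{-1/2}X(X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y\|^2 \\
&= \|\Sigma^{-1/2}y - \Sigma^{-1/2}X\hat\beta(\theta)\|^2 = \|\Sigma^{-1/2}(y - X\hat\beta(\theta))\|^2 \\
&= (y - X\hat\beta(\theta))'\Sigma^{-1/2}\Sigma^{-1/2}(y - X\hat\beta(\theta)) = (y - X\hat\beta(\theta))'\Sigma^{-1}(y - X\hat\beta(\theta)).
\end{aligned}$$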

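49 Therefore, we have
$$\ell_w(\theta \mid w) = \text{constant}_2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}(y - X\hat\beta(\theta))'\Sigma^{-1}(y - X\hat\beta(\theta)) = \text{constant}_2 + g(\theta).$$
Therefore, $\hat\theta$ maximizes $\ell_w(\theta \mid w)$ over $\theta \in \Theta$ $\iff$ $\hat\theta$ maximizes $g(\theta)$ over $\theta \in \Theta$.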
