Prediction

Rasmus Waagepetersen
Department of Mathematics, Aalborg University, Denmark
March 22, 2017

WLS and BLUE (prelude to BLUP)

Suppose that Y has mean Xβ and known covariance matrix V (but Y need not be normal). Then

  β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹Y

is a weighted least squares (WLS) estimate since it minimizes

  (Y − Xβ)ᵀV⁻¹(Y − Xβ).

It is also the best linear unbiased estimate (BLUE), that is, the unbiased estimate with smallest variance in the sense that Var β̃ − Var β̂ is positive semi-definite for any other linear unbiased estimate β̃.

[Slide 1]

More generally, suppose Y has mean vector µ in a linear subspace M with orthogonal projection P. Then Aµ̂ = APY is BLUE for ψ = Aµ.

[Slide 2]

BLUE for general parameter

Suppose EY = µ is in a linear subspace M, Cov Y = σ²I and ψ = Aµ. Then the BLUE of ψ is ψ̂ = Aµ̂, where µ̂ = PY and P is the orthogonal projection on M.

Proof: Assume ψ̃ is a LUE, i.e. ψ̃ = BY and Eψ̃ = Bµ = Aµ for all µ ∈ M. We also have APµ = Aµ for all µ ∈ M (which implies that ψ̂ is unbiased). Thus for all w ∈ Rⁿ,

  (B − AP)Pw = BPw − APw = APw − APw = 0

since Pw ∈ M. This implies (B − AP)P = 0, which gives

  Cov(ψ̃ − ψ̂, ψ̂) = σ²(B − AP)PᵀAᵀ = 0,

so that

  Var ψ̃ = Var(ψ̃ − ψ̂ + ψ̂) = Var(ψ̃ − ψ̂) + Var ψ̂.

Note: this is like Pythagoras if we say that ψ̃ − ψ̂ and ψ̂ are orthogonal when their covariance is zero.

[Slide 3]

Case: V = LLᵀ different from σ²I, and µ = Xβ where X has full rank. Then the BLUE of ψ = Aµ is Aµ̂, where

  µ̂ = X(XᵀV⁻¹X)⁻¹XᵀV⁻¹Y

is the WLS estimate of µ.

Proof: combine the previous result with the result below, letting Ỹ = L⁻¹Y.

Result: suppose ψ̂ = CỸ is BLUE of ψ based on data Ỹ and K is an invertible matrix. Then ψ̂ = CK⁻¹Y is BLUE based on Y = KỸ as well.

[Slide 4]
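A short numerical companion (not from the original slides; the design matrix, covariance matrix and data below are made up for illustration): it computes the WLS/BLUE estimate directly and, mirroring the proof above, via the transformed data Ỹ = L⁻¹Y with V = LLᵀ.

#Minimal R sketch (illustration only): WLS/BLUE with made-up X, V and data
set.seed(1)
n=50
X=cbind(1,rnorm(n))                        #made-up design matrix
V=diag(runif(n,0.5,2))                     #known (here diagonal) covariance matrix
beta=c(2,-1)
Y=as.numeric(X%*%beta)+rnorm(n,sd=sqrt(diag(V)))

Vinv=solve(V)
betahat=solve(t(X)%*%Vinv%*%X, t(X)%*%Vinv%*%Y)   #WLS/BLUE estimate

#same estimate via the transformation Ytilde = L^{-1} Y, where V = L L^T
L=t(chol(V))
betahat2=lm.fit(solve(L,X), solve(L,Y))$coefficients

Both computations give the same estimate; the second shows how the general V reduces to the σ²I case.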

Estimable parameters and BLUE

Definition: A linear combination aᵀβ is estimable if it has a LUE bᵀY.

Result: aᵀβ is estimable if and only if aᵀβ = cᵀµ for some c. If aᵀβ is estimable, then the BLUE is cᵀµ̂.

[Slide 5]

Optimal prediction

Let X and Y be random variables and g a real function. General result:

  Cov(Y − E[Y | X], g(X)) = Cov(E[Y − E[Y | X] | X], E[g(X) | X]) + E Cov(Y − E[Y | X], g(X) | X) = 0,

where the first term vanishes since E[Y − E[Y | X] | X] = 0 and the second since g(X) is fixed given X. Note, since E[Y − E[Y | X]] = 0, we also have

  E[(Y − E[Y | X]) g(X)] = 0.

In particular, for any prediction Ỹ = f(X) of Y:

  E[(Y − E[Y | X])(E[Y | X] − f(X))] = 0,

from which it follows that

  E(Y − Ỹ)² = E(Y − E[Y | X])² + E(E[Y | X] − Ỹ)² ≥ E(Y − E[Y | X])².

Thus E[Y | X] minimizes the mean square prediction error.

[Slide 6]

Pythagoras and conditional expectation

The space of real random variables with finite variance may be viewed as a vector space with inner product and (L²) norm

  ⟨X, Y⟩ = E(XY),  ‖X‖² = EX².

Orthogonal decomposition (Pythagoras):

  ‖Y‖² = ‖E[Y | X]‖² + ‖Y − E[Y | X]‖².

E[Y | X] may be viewed as the projection of Y on X since it minimizes the distance E(Y − Ỹ)² among all predictors Ỹ = f(X). For zero-mean random variables, orthogonal is the same as uncorrelated.

Decomposition of Y: Y = E[Y | X] + (Y − E[Y | X]), where the predictor E[Y | X] and the prediction error Y − E[Y | X] are uncorrelated. Thus

  Var Y = Var E[Y | X] + Var(Y − E[Y | X]) = Var E[Y | X] + E Var[Y | X],

whereby Var(Y − E[Y | X]) = E Var[Y | X]: the prediction variance is equal to the expected conditional variance of Y.

(Grimmett & Stirzaker, Probability and Random Processes, Chapter 7.9, is a good source on this perspective on prediction and conditional expectation.)

[Slides 7-8]
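The decomposition Var Y = Var E[Y | X] + E Var[Y | X] is easy to check by simulation. A minimal sketch, assuming the toy model X ∼ N(0, 4) and Y | X = x ∼ N(x, 1) (chosen purely for illustration):

#Minimal R sketch (illustration only): law of total variance, optimality of E[Y|X]
set.seed(1)
m=1e6
X=rnorm(m,0,2)                 #Var X = 4
Y=rnorm(m,mean=X,sd=1)         #Y | X = x ~ N(x,1), so E[Y|X] = X, Var[Y|X] = 1
var(Y)                         #approx 5
var(X)+1                       #Var E[Y|X] + E Var[Y|X] = 4 + 1
mean((Y-X)^2)                  #approx 1: MSPE of the optimal predictor E[Y|X]
mean((Y-0.9*X)^2)              #approx 1.04: any other f(X) does worse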

BLUP

Consider random vectors Y and X with mean vectors EY = µ_Y, EX = µ_X and covariance matrix

  Σ = [ Σ_Y    Σ_YX ]
      [ Σ_XY   Σ_X  ]

Then the best linear unbiased predictor of Y given X is

  Ŷ = µ_Y + Σ_YX Σ_X⁻¹ (X − µ_X)

in the sense that

  Var[Y − (a + BX)] − Var[Y − Ŷ]

is positive semi-definite for all linear unbiased predictors a + BX, with equality only if Ŷ = a + BX (unbiased: E[Y − a − BX] = 0).

[Slide 9]

Fact:

  Cov[Y − Ŷ, Ŷ] = 0 and Cov[CX, Y − Ŷ] = 0 for all C.   (1)

Thus

  Var Ŷ = Cov(Y, Ŷ) = Σ_YX Σ_X⁻¹ Σ_XY.

It follows that the mean square prediction error is

  Var[Y − Ŷ] = Var Y + Var Ŷ − 2 Cov(Y, Ŷ) = Σ_Y − Σ_YX Σ_X⁻¹ Σ_XY.

Proof of fact:

  Cov[CX, Y − Ŷ] = Cov[CX, Y] − Cov[CX, Ŷ] = C Σ_XY − C Σ_X Σ_X⁻¹ Σ_XY = 0.

[Slide 10]

Proof of BLUP

One version: Recall Cov[BX, Y − Ŷ] = 0 for all B. Then

  Var[Y − (a + BX)] = Var[Y − Ŷ] + Var[Ŷ − (a + BX)]
                      + Cov[Y − Ŷ, Ŷ − (a + BX)] + Cov[Ŷ − (a + BX), Y − Ŷ]
                    = Var[Y − Ŷ] + Var[Ŷ − (a + BX)].

Hence Var[Y − (a + BX)] − Var[Y − Ŷ] = Var[Ŷ − (a + BX)], where the right hand side is positive semi-definite.

Second version (Y scalar for consistency with the previous slide): the BLUP is the projection of Y onto the linear subspace spanned by 1, X_1, ..., X_n (with orthonormal basis U_1, ..., U_n, where U = Σ_X^(−1/2) X), in analogy with least squares. NB: the conditional expectation E[Y | X] is the projection onto the space of all variables Z = f(X_1, ..., X_n), f a real function.

[Slide 11]

Conditional distribution in the multivariate normal distribution

Consider jointly normal random vectors Y and X with mean vector µ = (µ_Y, µ_X) and covariance matrix

  Σ = [ Σ_Y    Σ_YX ]
      [ Σ_XY   Σ_X  ]

Then (provided Σ_X is invertible)

  Y | X = x ∼ N(µ_Y + Σ_YX Σ_X⁻¹ (x − µ_X), Σ_Y − Σ_YX Σ_X⁻¹ Σ_XY).

Proof: By the BLUP results, Y = Ŷ + R, where Ŷ = µ_Y + Σ_YX Σ_X⁻¹ (X − µ_X), R = Y − Ŷ and Cov(R, X) = 0. By normality R is independent of X, and the result follows.

[Slide 12]
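Referring to slides 9-10, here is a minimal R sketch of the BLUP and its mean square prediction error for jointly normal (Y, X), with made-up mean vectors and covariance blocks; MASS::mvrnorm is used to check property (1) empirically.

#Minimal R sketch (illustration only): BLUP of scalar Y given bivariate X
library(MASS)                                #for mvrnorm
set.seed(1)
muY=1; muX=c(0,0)
SigmaY=2
SigmaYX=c(1,0.5)                             #Cov(Y,X)
SigmaX=matrix(c(1,0.3,0.3,1),2,2)

blup=function(x) as.numeric(muY+SigmaYX%*%solve(SigmaX,x-muX))
mspe=SigmaY-SigmaYX%*%solve(SigmaX,SigmaYX)  #Var[Y - Yhat]

#empirical check
Sigma=rbind(c(SigmaY,SigmaYX),cbind(SigmaYX,SigmaX))
sim=mvrnorm(1e5,mu=c(muY,muX),Sigma=Sigma)
R=sim[,1]-apply(sim[,-1],1,blup)             #prediction errors Y - Yhat
c(var(R),mspe)                               #agree
cov(R,sim[,2])                               #approx 0, cf. (1)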

Conditional simulation using prediction

Suppose Y and X are jointly normal and we wish to simulate from Y | X = x. By the previous result, Y | X = x has the distribution of ŷ + R, where ŷ = µ_Y + Σ_YX Σ_X⁻¹ (x − µ_X). We thus need to simulate R. This can be done by "simulated prediction": simulate (Y, X) and compute Ŷ and R = Y − Ŷ. Then our conditional simulation is ŷ + R.

This is advantageous if it is easier to simulate (Y, X) and compute Ŷ than to simulate directly from the conditional distribution of Y | X = x (e.g. if simulation of (Y, X) is easy but Σ_Y − Σ_YX Σ_X⁻¹ Σ_XY is difficult).

[Slide 13]

Prediction in the linear mixed model

Let U ∼ N(0, Ψ) and Y | U = u ∼ N(Xβ + Zu, Σ). Then Cov[U, Y] = ΨZᵀ and Var Y = V = ZΨZᵀ + Σ, and

  Û = E[U | Y] = ΨZᵀV⁻¹(Y − Xβ),  Var Û = ΨZᵀV⁻¹ZΨᵀ.

NB:

  ΨZᵀ(ZΨZᵀ + Σ)⁻¹ = (Ψ⁻¹ + ZᵀΣ⁻¹Z)⁻¹ZᵀΣ⁻¹,

which is e.g. useful if Ψ⁻¹ is sparse (as for an AR model). Similarly,

  Var[U | Y] = Var[U − Û] = Ψ − ΨZᵀV⁻¹ZΨᵀ = (Ψ⁻¹ + ZᵀΣ⁻¹Z)⁻¹.

One-way anova example at p. 186 in M & T.

[Slide 14]

IQ example

Y measurement of IQ, U subject-specific random effect: Y = µ + U + ɛ, where the standard deviations of U and ɛ are 15 and 5 and µ = 100. Given Y = 130,

  E[µ + U | Y = 130] = 100 + 15²/(15² + 5²) · (130 − 100) = 127.

An example of shrinkage to the mean.

[Slide 15]

BLUP as hierarchical likelihood estimates

Maximization of the joint density ("hierarchical likelihood") f(y | u; β) f(u; ψ) with respect to u gives the BLUP (M & T p. 186 for one-way anova and p. 183 for the general linear mixed model). Joint maximization wrt. u and β gives Henderson's mixed-model equations (M & T p. 184), leading to the BLUE β̂ and the BLUP û.

[Slide 16]
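Returning to conditional simulation by simulated prediction (slide 13), a minimal R sketch in a bivariate normal toy example (all parameter values made up). Here (Y, X) is simulated via the conditional distribution for convenience; in applications the joint mechanism, e.g. a mixed model, would be used, and the exact conditional distribution makes the check straightforward.

#Minimal R sketch (illustration only): conditional simulation of Y given X = x
set.seed(1)
muY=0; muX=0; sY=2; sX=1; rho=0.8
SigYX=rho*sY*sX; SigX=sX^2
x=1.5                                      #the observed value of X

yhat=muY+SigYX/SigX*(x-muX)                #prediction at the observed x

#simulated prediction: simulate (Y,X), predict Y from X, keep the residual
m=1e5
Xs=rnorm(m,muX,sX)
Ys=rnorm(m,muY+SigYX/SigX*(Xs-muX),sqrt(sY^2-SigYX^2/SigX))
Rs=Ys-(muY+SigYX/SigX*(Xs-muX))            #R = Y - Yhat
Ycond=yhat+Rs                              #draws from Y | X = x

c(mean(Ycond),yhat)                        #agree
c(var(Ycond),sY^2-SigYX^2/SigX)            #agree with the conditional variance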

BLUP of mixed effect with unknown β

Assume EX = Cβ and EY = Dβ. Given X and β, the BLUP of K = Aβ + BY is

  K̂(β) = Aβ + BŶ(β),

where Ŷ(β) = Dβ + Σ_YX Σ_X⁻¹ (X − Cβ) is the BLUP of Y. Typically β is unknown. Then the BLUP is

  K̂ = Aβ̂ + BŶ(β̂),

where β̂ is the BLUE (Harville, 1991).

Proof: K̂(β) can be rewritten as

  Aβ + BŶ(β) = [A + BD − BΣ_YX Σ_X⁻¹ C]β + BΣ_YX Σ_X⁻¹ X = T + BΣ_YX Σ_X⁻¹ X.

Note that the BLUE of T = [A + BD − BΣ_YX Σ_X⁻¹ C]β is T̂ = [A + BD − BΣ_YX Σ_X⁻¹ C]β̂.

Now consider a LUP K̃ = HX = [H − BΣ_YX Σ_X⁻¹]X + BΣ_YX Σ_X⁻¹ X of K. By unbiasedness, T̃ = [H − BΣ_YX Σ_X⁻¹]X is a LUE of T, so Var[T̃ − T] − Var[T̂ − T] is positive semi-definite. Also note that by (1),

  Cov[T̃ − T, K̂(β) − K] = 0 and Cov[T̂ − T, K̂(β) − K] = 0.

Using this it follows that Var[K̃ − K] − Var[K̂ − K] is positive semi-definite.

[Slide 17]

Hint: subtract and add K̂(β).

[Slide 18]

EBLUP and EBLUE

Typically the covariance matrix depends on unknown parameters. EBLUPs are obtained by replacing the unknown variance parameters by their estimates (similarly for the EBLUE).

[Slide 19]

Model assessment

Make histograms, qq-plots etc. for the EBLUPs of ɛ and U. It may be advantageous to consider standardized EBLUPs; the standardized BLUP is [Cov Û]^(−1/2) Û.

[Slide 20]
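As a concrete illustration of EBLUPs (not from the slides; the data and model below are simulated), the formula Û = ΨZᵀV⁻¹(Y − Xβ̂), with REML estimates of the variance parameters plugged in, can be compared against the EBLUPs returned by lme4's ranef() in a one-way anova:

#Minimal R sketch (illustration only): EBLUPs in a simulated one-way anova
library(lme4)
set.seed(1)
k=30; r=5                                  #30 groups, 5 observations per group
g=factor(rep(1:k,each=r))
u=rnorm(k,0,2)                             #true random effects, sd 2
y=10+u[g]+rnorm(k*r,0,1)                   #residual sd 1
fit=lmer(y~1+(1|g),REML=TRUE)

vc=as.data.frame(VarCorr(fit))
s2u=vc$vcov[1]                             #estimated Var(U)
s2e=vc$vcov[2]                             #estimated residual variance
Z=model.matrix(~g-1)                       #n x k indicator matrix
V=s2u*Z%*%t(Z)+s2e*diag(k*r)               #estimated Var(Y)
uhat=s2u*t(Z)%*%solve(V,y-fixef(fit))      #Psi Z^T V^{-1} (Y - X betahat)

max(abs(uhat-ranef(fit)$g[,1]))            #approx 0: matches lme4's EBLUPs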

Example: prediction of random intercepts and slopes in orthodont data

library(lme4)                    #lme4 for lmer(); Orthodont data from nlme
data(Orthodont, package="nlme")

ort7=lmer(distance~age+factor(Sex)+(1|Subject),data=Orthodont)

#check of model ort7
res=residuals(ort7)
qqnorm(res)
qqline(res)
#outliers occur for subjects M09 and M13

#plot residuals against subjects
boxplot(res~Orthodont$Subject)

#plot residuals against fitted values
fitted=fitted(ort7)
plot(rank(fitted),res)

#extract predictions of random intercepts
raneffects=ranef(ort7)
ranint=raneffects$Subject

#qqplot of random intercepts
qqnorm(ranint[[1]])
qqline(ranint[[1]])

#plot for subject M09
M09=Orthodont$Subject=="M09"
plot(Orthodont$age[M09],fitted[M09],type="l",ylim=c(20,32))
points(Orthodont$age[M09],Orthodont$distance[M09])

[Slides 21-22]

[Slide 23: diagnostic figures: normal Q-Q plot and histogram of the residuals, residuals against ranked fitted values, normal Q-Q plot of the random intercepts, and the fitted line with observed distances for subject M09.]

Example: quantitative genetics (Sorensen and Waagepetersen 2003)

X_ij: size of the jth litter of the ith pig.

[Pedigree figure: pigs M16, M11, M03, M14, M06, M10 and F06, F05, F02, F03, distinguishing pigs with and without litter size observations.]

[Slide 24]

U_i, Ũ_i random genetic effects influencing the size and the variability of X_ij:

  X_ij | U_i = u_i, Ũ_i = ũ_i ∼ N(µ_i + u_i, exp(µ̃_i + ũ_i))

  (U_1, ..., U_n, Ũ_1, ..., Ũ_n) ∼ N(0, G ⊗ A)

A: additive genetic relationship (correlation) matrix (depending on the pedigree). The correlation structure is derived from the simple model

  U_offspring = (U_father + U_mother)/2 + ɛ.

Q = A⁻¹ is sparse! (a generalization of AR(1))

  G = [ σ_u²       ρ σ_u σ_ũ ]
      [ ρ σ_u σ_ũ  σ_ũ²      ]

ρ: coefficient of genetic correlation between U_i and Ũ_i. (A toy numerical illustration of the G ⊗ A structure is given after the exercises below.)

NB: high dimension, n > …

Aim: identify pigs with favorable genetic effects.

[Slide 25]

Exercises

1. Write out in detail the proofs of the results on slide 2.
2. For the orthodont data: fit models where the subject-specific intercepts are treated a) as fixed effects, b) as random effects. Compare the BLUEs of the fixed effects with the BLUPs of the random effects.
3. Fill in the details of the proof on slide 4.
4. Verify the results on page 186 in M&T regarding BLUPs in the case of a one-way anova.
5. Consider slides 13-14 and the special case of the hidden AR(1) model. Explain how you can do conditional simulation of the hidden AR(1) process U given the observed data Y using 1) conditional simulation using prediction and 2) the expressions for the conditional mean and covariance (cf. slide 14). Try to implement the solutions in R.

[Slide 26]
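A toy numerical illustration (not from the slides) of the G ⊗ A covariance structure, using a made-up three-pig pedigree; kronecker() (operator %x%) builds the joint covariance of (U_1, ..., U_n, Ũ_1, ..., Ũ_n).

#Minimal R sketch (illustration only): the G %x% A covariance structure
su=1; sut=0.5; rho=0.4                     #made-up variance parameters
G=matrix(c(su^2,rho*su*sut,rho*su*sut,sut^2),2,2)

#made-up pedigree: pigs 1 and 2 unrelated founders, pig 3 their offspring
A=matrix(c(1,0,0.5,
           0,1,0.5,
           0.5,0.5,1),3,3)

Sigma=G%x%A                                #6 x 6 covariance of (U, Utilde)
Q=solve(A)                                 #precision of A; sparse for large pedigrees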
