WLS and BLUE (prelude to BLUP) Prediction

Rasmus Waagepetersen, Department of Mathematics, Aalborg University, Denmark
April 21, 2018

WLS and BLUE (slide 2)

Suppose that $Y$ has mean $X\beta$ and known covariance matrix $V$ (but $Y$ need not be normal). Then
$$\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$$
is a weighted least squares estimate since it minimizes
$$(Y - X\beta)^T V^{-1} (Y - X\beta).$$
It is also the best linear unbiased estimate (BLUE), that is, the unbiased estimate with smallest variance in the sense that $\operatorname{Var}\tilde\beta - \operatorname{Var}\hat\beta$ is positive semi-definite for any other linear unbiased estimate $\tilde\beta$.

BLUE for general parameter (slides 3-4)

Theorem: Suppose $EY = \mu$ lies in a linear subspace $M$, $\operatorname{Cov} Y = \sigma^2 I$ and $\psi = A\mu$. Then the BLUE of $\psi$ is $\hat\psi = A\hat\mu$, where $\hat\mu = PY$ and $P$ is the orthogonal projection onto $M$.

Key result: for any other LUE $\tilde\psi = BY$,
$$\operatorname{Cov}(\tilde\psi - \hat\psi, \hat\psi) = E[(\tilde\psi - \hat\psi)\hat\psi^T] = 0.$$
Note: this is like Pythagoras if we call $X$ and $Y$ orthogonal when the inner product $E[XY] = 0$.

Proof of key result: Assume $\tilde\psi$ is a LUE, i.e. $\tilde\psi = BY$ and $E\tilde\psi = B\mu = A\mu$ for all $\mu \in M$. We also have $AP\mu = A\mu$ for all $\mu \in M$ (which implies that $\hat\psi$ is unbiased). Thus for all $w \in \mathbb{R}^p$,
$$(B - AP)Pw = BPw - APw = APw - APw = 0$$
since $Pw \in M$. This implies $(B - AP)P = 0$, which gives
$$\operatorname{Cov}(\tilde\psi - \hat\psi, \hat\psi) = \sigma^2 (B - AP)P^T A^T = 0.$$
By the key result,
$$\operatorname{Var}\tilde\psi = \operatorname{Var}(\tilde\psi - \hat\psi) + \operatorname{Var}\hat\psi \quad\Rightarrow\quad \operatorname{Var}\tilde\psi - \operatorname{Var}\hat\psi = \operatorname{Var}(\tilde\psi - \hat\psi) \ge 0.$$
Hence $\hat\psi$ is BLUE.
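As a small illustration (not from the slides), the WLS/BLUE formula can be evaluated directly in R; a minimal sketch on simulated data, where the design matrix, the coefficients and the diagonal covariance matrix V are all made up for illustration:

## Minimal sketch: WLS/BLUE with known covariance matrix V (simulated, illustrative data)
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))                     # design matrix: intercept and one covariate
beta <- c(2, -1)                            # "true" coefficients, illustrative
V <- diag(runif(n, 0.5, 2))                 # known covariance matrix (here diagonal)
L <- t(chol(V))                             # V = L L^T
Y <- X %*% beta + L %*% rnorm(n)            # Y has mean X beta and covariance V
Vinv <- solve(V)
betahat <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% Y)   # (X' V^-1 X)^-1 X' V^-1 Y
betahat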

BLUE with a non-diagonal covariance matrix (slide 5)

Lemma: suppose $\hat\psi = C\tilde Y$ is the BLUE of $\psi$ based on data $\tilde Y$ and $K$ is an invertible matrix. Then $\hat\psi = CK^{-1}Y$ is the BLUE based on $Y = K\tilde Y$ as well.

Proof: note that any LUE $\tilde\psi = BY$ based on $Y$ is a LUE based on $\tilde Y$ as well ($\tilde\psi = BY = BK\tilde Y$).

Corollary: suppose $V = LL^T$ is invertible and $\mu = X\beta$ where $X$ has full rank. Then the BLUE of $\psi = A\mu$ is $A\hat\mu$, where
$$\hat\mu = X(X^T V^{-1} X)^{-1} X^T V^{-1} Y$$
is the WLS estimate of $\mu$.

Proof: $\tilde Y = L^{-1}Y$ has covariance matrix $I$ and mean $\tilde\mu = \tilde X\beta$, where $\tilde\mu = L^{-1}\mu$ and $\tilde X = L^{-1}X$. Thus by the theorem, the BLUE of $A\mu = AL\tilde\mu$ is $AL\tilde X(\tilde X^T\tilde X)^{-1}\tilde X^T\tilde Y$. Applying the lemma, the BLUE based on $Y$ is $AL\tilde X(\tilde X^T\tilde X)^{-1}\tilde X^T L^{-1}Y = A\hat\mu$.

Estimable parameters and BLUE (slide 6)

Definition: A linear combination $a^T\beta$ is estimable if it has a LUE $b^T Y$.

Result: $a^T\beta$ is estimable $\Leftrightarrow$ $a^T\beta = c^T\mu$ for some $c$.

By the results on the previous slides: if $a^T\beta$ is estimable, then its BLUE is $c^T\hat\mu$.

Optimal prediction (slide 7)

Let $X$ and $Y$ be random variables and $g$ a real function. General result:
$$\operatorname{Cov}(Y - E[Y|X], g(X)) = \operatorname{Cov}(E[Y - E[Y|X] \mid X], E[g(X)|X]) + E\operatorname{Cov}(Y - E[Y|X], g(X) \mid X) = 0.$$
In particular, for any prediction $\tilde Y = f(X)$ of $Y$:
$$E\big[(Y - E[Y|X])(E[Y|X] - f(X))\big] = 0,$$
from which it follows that
$$E(Y - \tilde Y)^2 = E(Y - E[Y|X])^2 + E(E[Y|X] - \tilde Y)^2 \ge E(Y - E[Y|X])^2.$$
Thus $E[Y|X]$ minimizes the mean square prediction error.

Pythagoras and conditional expectation (slide 8)

The space of real random variables with finite variance may be viewed as a vector space with inner product and ($L^2$) norm
$$\langle X, Y\rangle = E(XY), \qquad \|X\| = \sqrt{EX^2}.$$
Orthogonal decomposition (Pythagoras):
$$\|Y\|^2 = \|E[Y|X]\|^2 + \|Y - E[Y|X]\|^2.$$
$E[Y|X]$ may be viewed as the projection of $Y$ on $X$ since it minimizes the distance $E(Y - \tilde Y)^2$ among all predictors $\tilde Y = f(X)$. For zero-mean random variables, orthogonal is the same as uncorrelated.

(Grimmett & Stirzaker, Probability and Random Processes, Chapter 7.9 is a good source on this perspective on prediction and conditional expectation.)
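The corollary is essentially a whitening argument; the following minimal R sketch (same illustrative simulated data as in the sketch above, regenerated here so the snippet runs on its own) checks numerically that OLS on the $L^{-1}$-transformed data reproduces the WLS estimate:

## Sketch: OLS on whitened data L^{-1} Y, L^{-1} X equals the WLS estimate
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n)); beta <- c(2, -1)
V <- diag(runif(n, 0.5, 2)); L <- t(chol(V))     # V = L L^T
Y <- X %*% beta + L %*% rnorm(n)
betahat_wls <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% Y)
Xt <- solve(L, X); Yt <- solve(L, Y)             # whitened data: Cov(Yt) = I
betahat_ols <- solve(t(Xt) %*% Xt, t(Xt) %*% Yt)
all.equal(c(betahat_wls), c(betahat_ols))        # TRUE up to numerical error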

Decomposition of Y (slide 9)

$$Y = E[Y|X] + (Y - E[Y|X]),$$
where the predictor $E[Y|X]$ and the prediction error $Y - E[Y|X]$ are uncorrelated. Thus
$$\operatorname{Var} Y = \operatorname{Var} E[Y|X] + \operatorname{Var}(Y - E[Y|X]) = \operatorname{Var} E[Y|X] + E\operatorname{Var}[Y|X],$$
whereby $\operatorname{Var}(Y - E[Y|X]) = E\operatorname{Var}[Y|X]$: the prediction variance is equal to the expected conditional variance of $Y$.

BLUP (slide 10)

Consider random vectors $Y$ and $X$ with mean vectors $EY = \mu_Y$, $EX = \mu_X$ and covariance matrix
$$\Sigma = \begin{bmatrix} \Sigma_Y & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_X \end{bmatrix}.$$
Then the best linear unbiased predictor of $Y$ given $X$ is
$$\hat Y = \mu_Y + \Sigma_{YX}\Sigma_X^{-1}(X - \mu_X)$$
in the sense that $\operatorname{Var}[Y - (a + BX)] - \operatorname{Var}[Y - \hat Y]$ is positive semi-definite for all linear unbiased predictors $a + BX$, with equality only if $\hat Y = a + BX$ (unbiased: $E[Y - a - BX] = 0$).

Prediction variance / mean square prediction error (slide 11)

Fact:
$$\operatorname{Cov}[Y - \hat Y, \hat Y] = 0 \quad\text{and}\quad \operatorname{Cov}[CX, Y - \hat Y] = 0 \text{ for all } C. \tag{1}$$
Thus
$$\operatorname{Var}\hat Y = \operatorname{Cov}(Y, \hat Y) = \Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}.$$
It follows that the mean square prediction error is
$$\operatorname{Var}[Y - \hat Y] = \operatorname{Var} Y + \operatorname{Var}\hat Y - 2\operatorname{Cov}(Y, \hat Y) = \Sigma_Y - \Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}.$$
Proof of fact:
$$\operatorname{Cov}[CX, Y - \hat Y] = \operatorname{Cov}[CX, Y] - \operatorname{Cov}[CX, \hat Y] = C\Sigma_{XY} - C\Sigma_X\Sigma_X^{-1}\Sigma_{XY} = 0.$$

Proof of BLUP (slide 12)

By (1), $\operatorname{Cov}[CX, Y - \hat Y] = 0$ for all $C$. Hence
$$\operatorname{Var}[Y - (a + BX)] = \operatorname{Var}[Y - \hat Y] + \operatorname{Var}[\hat Y - (a + BX)] + \operatorname{Cov}[Y - \hat Y, \hat Y - (a + BX)] + \operatorname{Cov}[\hat Y - (a + BX), Y - \hat Y] = \operatorname{Var}[Y - \hat Y] + \operatorname{Var}[\hat Y - (a + BX)].$$
Hence
$$\operatorname{Var}[Y - (a + BX)] - \operatorname{Var}[Y - \hat Y] = \operatorname{Var}[\hat Y - (a + BX)],$$
where the right hand side is positive semi-definite.
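A Monte Carlo sanity check of the BLUP formula and the mean square prediction error (not from the slides; the joint covariance matrix is made up for illustration and the MASS package is assumed for mvrnorm):

## Sketch: BLUP Yhat = mu_Y + Sigma_YX Sigma_X^{-1} (X - mu_X) and its prediction variance
library(MASS)
set.seed(2)
muY <- 1; muX <- c(1, -1)
SigmaY <- 2; SigmaYX <- c(1, 0.5)
SigmaX <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
Sigma  <- rbind(c(SigmaY, SigmaYX), cbind(SigmaYX, SigmaX))    # joint covariance of (Y, X1, X2)
sim <- mvrnorm(1e5, mu = c(muY, muX), Sigma = Sigma)
Y <- sim[, 1]; X <- sim[, 2:3]
W <- SigmaYX %*% solve(SigmaX)                                 # BLUP coefficients Sigma_YX Sigma_X^{-1}
Yhat <- muY + sweep(X, 2, muX) %*% t(W)
c(empirical = var(Y - c(Yhat)), theoretical = SigmaY - c(W %*% SigmaYX))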

BLUP as projection (slide 13)

Take $Y$ scalar for consistency with the slide on the $L^2$ space view, and $X = (X_1, \ldots, X_n)^T$. Assume wlog that all variables are centered, $EY = EX_i = 0$ (otherwise consider prediction of $Y - EY$ based on $X_i - EX_i$). The BLUP is the projection of $Y$ onto the linear subspace spanned by $X_1, \ldots, X_n$ (with orthonormal basis $U_1, \ldots, U_n$, where $U = \Sigma_X^{-1/2}X$):
$$\hat Y = \sum_{i=1}^n E[Y U_i]\, U_i = \Sigma_{YX}\Sigma_X^{-1}X$$
(analogue to least squares).

NB: the conditional expectation $E[Y|X]$ is the projection of $Y$ onto the space of all variables $Z = f(X_1, \ldots, X_n)$ where $f$ is a real function.

Conditional distribution in the multivariate normal distribution (slide 14)

Consider jointly normal random vectors $Y$ and $X$ with mean vector $\mu = (\mu_Y, \mu_X)$ and covariance matrix
$$\Sigma = \begin{bmatrix} \Sigma_Y & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_X \end{bmatrix}.$$
Then (provided $\Sigma_X$ is invertible)
$$Y \mid X = x \sim N\big(\mu_Y + \Sigma_{YX}\Sigma_X^{-1}(x - \mu_X),\ \Sigma_Y - \Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}\big).$$
Proof: By BLUP, $Y = \hat Y + R$, where $\hat Y = \mu_Y + \Sigma_{YX}\Sigma_X^{-1}(X - \mu_X)$, $R = Y - \hat Y \sim N(0, \Sigma_Y - \Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY})$ and $\operatorname{Cov}(R, X) = 0$. By normality $R$ is independent of $X$. Given $X = x$, $\hat Y$ is constant and the distribution of $R$ is not affected. Thus the result follows.

Optimal prediction for jointly normal random vectors (slide 15)

By the previous result it follows that the BLUP of $Y$ given $X$ coincides with $E[Y|X]$ when $(X, Y)$ are jointly normal. Hence for normally distributed $(X, Y)$, the BLUP is the optimal prediction.

Conditional simulation using prediction (slide 16)

Suppose $Y$ and $X$ are jointly normal and we wish to simulate $Y \mid X = x$. By the previous result,
$$Y \mid X = x \sim \hat y + R,$$
where $\hat y = \mu_Y + \Sigma_{YX}\Sigma_X^{-1}(x - \mu_X)$. We thus need to simulate $R$. This can be done by "simulated prediction": simulate $(Y, X)$ and compute $\hat Y$ and $R = Y - \hat Y$. Then our conditional simulation is $\hat y + R$. This is advantageous if it is easier to simulate $(Y, X)$ and compute $\hat Y$ than to simulate directly from the conditional distribution of $Y \mid X = x$ (e.g. if simulation of $(Y, X)$ is easy but $\Sigma_Y - \Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}$ is difficult to handle).
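A minimal R sketch of the "simulated prediction" recipe on slide 16, using the same made-up joint normal model as in the previous sketch (repeated here so the snippet is self-contained; the conditioning value x is arbitrary and MASS is assumed for mvrnorm):

## Sketch: conditional simulation of Y given X = x via simulated prediction
library(MASS)
set.seed(3)
muY <- 1; muX <- c(1, -1)
SigmaY <- 2; SigmaYX <- c(1, 0.5)
SigmaX <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
Sigma  <- rbind(c(SigmaY, SigmaYX), cbind(SigmaYX, SigmaX))
W <- SigmaYX %*% solve(SigmaX)
x <- c(1.5, -0.2)                                   # conditioning value
yhat_x <- muY + c(W %*% (x - muX))                  # BLUP / conditional mean at X = x
sim <- mvrnorm(1e4, mu = c(muY, muX), Sigma = Sigma)          # unconditional simulation of (Y, X)
R  <- sim[, 1] - (muY + sweep(sim[, 2:3], 2, muX) %*% t(W))   # prediction errors R = Y - Yhat
Ycond <- yhat_x + c(R)                              # draws from Y | X = x
c(mean = mean(Ycond), var = var(Ycond))             # compare with slide 14: yhat_x and Sigma_Y - Sigma_YX Sigma_X^{-1} Sigma_XY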

Prediction in the linear mixed model (slide 17)

Let $U \sim N(0, \Psi)$ and $Y \mid U = u \sim N(X\beta + Zu, \Sigma)$. Then $\operatorname{Cov}[U, Y] = \Psi Z^T$ and $\operatorname{Var} Y = V = Z\Psi Z^T + \Sigma$, and
$$\hat U = E[U|Y] = \Psi Z^T V^{-1}(Y - X\beta), \qquad \operatorname{Var}\hat U = \Psi Z^T V^{-1} Z\Psi^T.$$
NB:
$$\Psi Z^T(Z\Psi Z^T + \Sigma)^{-1} = (\Psi^{-1} + Z^T\Sigma^{-1}Z)^{-1}Z^T\Sigma^{-1},$$
which is e.g. useful if $\Psi^{-1}$ is sparse (like an AR-model). Similarly
$$\operatorname{Var}[U|Y] = \operatorname{Var}[U - \hat U] = \Psi - \Psi Z^T V^{-1} Z\Psi^T = (\Psi^{-1} + Z^T\Sigma^{-1}Z)^{-1}.$$
One-way anova example at p. 186 in M & T.

IQ example (slide 18)

$Y$ is a measurement of IQ and $U$ a subject-specific random effect:
$$Y = \mu + U + \epsilon,$$
where the standard deviations of $U$ and $\epsilon$ are 15 and 5 and $\mu = 100$. Given $Y = 130$,
$$E[\mu + U \mid Y = 130] = \mu + \frac{15^2}{15^2 + 5^2}(130 - \mu) = 100 + 0.9 \cdot 30 = 127.$$
An example of shrinkage to the mean.

BLUP as hierarchical likelihood estimates (slide 19)

Maximization of the joint density ("hierarchical likelihood") $f(y \mid u; \beta)\, f(u; \psi)$ with respect to $u$ gives the BLUP (M & T p. 171-172 for the one-way anova and p. 183 for the general linear mixed model). Joint maximization with respect to $u$ and $\beta$ gives Henderson's mixed-model equations (M & T p. 184), leading to the BLUE $\hat\beta$ and the BLUP $\hat u$.

BLUP of a mixed effect with unknown β (slide 20)

Assume $EX = C\beta$ and $EY = D\beta$. Given $X$ and $\beta$, the BLUP of $K = A\beta + BY$ is
$$\hat K(\beta) = A\beta + B\hat Y(\beta),$$
where $\hat Y(\beta) = D\beta + \Sigma_{YX}\Sigma_X^{-1}(X - C\beta)$ is the BLUP of $Y$. Typically $\beta$ is unknown. Then the BLUP is
$$\hat K = A\hat\beta + B\hat Y(\hat\beta),$$
where $\hat\beta$ is the BLUE (Harville, 1991).
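The IQ numbers can be reproduced with the matrix formulas on slide 17; a minimal R sketch for the one-observation case ($X = Z = 1$, $\Psi = 15^2$, $\Sigma = 5^2$):

## Sketch: Uhat = Psi Z' V^{-1} (Y - X beta) in the one-observation IQ example
Psi <- matrix(15^2)                                  # Var(U)
Sig <- matrix(5^2)                                   # Var(epsilon)
Z <- matrix(1); Xd <- matrix(1); beta <- 100         # design and mu
V <- Z %*% Psi %*% t(Z) + Sig                        # Var(Y) = 250
y <- 130
Uhat <- Psi %*% t(Z) %*% solve(V) %*% (y - Xd %*% beta)    # 225/250 * 30 = 27
c(Uhat = c(Uhat), prediction = beta + c(Uhat))             # mu + Uhat = 127, as on the slide
Psi - Psi %*% t(Z) %*% solve(V) %*% Z %*% t(Psi)           # Var[U | Y] = (1/Psi + Z' Sig^{-1} Z)^{-1} = 22.5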

Proof (slide 21)

$\hat K(\beta)$ can be rewritten as
$$A\beta + B\hat Y(\beta) = [A + BD - B\Sigma_{YX}\Sigma_X^{-1}C]\beta + B\Sigma_{YX}\Sigma_X^{-1}X = T + B\Sigma_{YX}\Sigma_X^{-1}X.$$
Note that the BLUE of $T = [A + BD - B\Sigma_{YX}\Sigma_X^{-1}C]\beta$ is $\hat T = [A + BD - B\Sigma_{YX}\Sigma_X^{-1}C]\hat\beta$.

Now consider a LUP $\tilde K = HX = [H - B\Sigma_{YX}\Sigma_X^{-1}]X + B\Sigma_{YX}\Sigma_X^{-1}X$ of $K$. By unbiasedness, $\tilde T = [H - B\Sigma_{YX}\Sigma_X^{-1}]X$ is a LUE of $T$. Note that $\operatorname{Var}[\tilde T - T] - \operatorname{Var}[\hat T - T]$ is positive semi-definite. Also note that by (1)
$$\operatorname{Cov}[\tilde T - T, \hat K(\beta) - K] = 0 \quad\text{and}\quad \operatorname{Cov}[\hat T - T, \hat K(\beta) - K] = 0.$$
Using this it follows that $\operatorname{Var}[\tilde K - K] - \operatorname{Var}[\hat K - K]$ is positive semi-definite. Hint: subtract and add $\hat K(\beta)$ both in $\operatorname{Var}[\tilde K - K]$ and $\operatorname{Var}[\hat K - K]$.

EBLUP and EBLUE (slide 22)

Typically the covariance matrix depends on unknown parameters. EBLUPs are obtained by replacing the unknown variance parameters by their estimates (similarly for EBLUE).

Model assessment (slide 23)

Make histograms, qq-plots etc. for the EBLUPs of $\epsilon$ and $U$. It may be advantageous to consider standardized EBLUPs. The standardized BLUP is $[\operatorname{Cov}\hat U]^{-1/2}\hat U$.

Example: prediction of random intercepts and slopes in orthodont data (slides 24-26)

library(lme4)                       # lmer; the Orthodont data are in the nlme package
data(Orthodont,package="nlme")
ort7=lmer(distance~age+factor(Sex)+(1|Subject),data=Orthodont)
#check of model
ort7
#residuals
res=residuals(ort7)
qqnorm(res)
qqline(res)
#outliers occur for subjects M09 and M13
#plot residuals against subjects
boxplot(res~Orthodont$Subject)
#plot residuals against fitted values
fitted=fitted(ort7)
plot(rank(fitted),res)
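As an aside (not from the slides): the standardized EBLUPs of slide 23 can be approximated from the fitted model ort7 using the conditional standard deviations that lme4 returns with ranef(..., condVar = TRUE). Note this scales each predicted intercept by its own conditional SD, a simpler elementwise variant of the $[\operatorname{Cov}\hat U]^{-1/2}\hat U$ matrix standardization above:

## Sketch: elementwise-standardized EBLUPs of the random intercepts (uses ort7 from above)
re <- as.data.frame(ranef(ort7, condVar = TRUE))   # columns include condval (EBLUP) and condsd
re$std <- re$condval / re$condsd                   # scale each EBLUP by its conditional SD
qqnorm(re$std); qqline(re$std)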

#extract predictions of random intercepts
ranint=ranef(ort7)$Subject
#qqplot of random intercepts
qqnorm(ranint[[1]])
qqline(ranint[[1]])
#plot for subject M09
M09=Orthodont$Subject=="M09"
plot(Orthodont$age[M09],fitted[M09],type="l",ylim=c(20,32))
points(Orthodont$age[M09],Orthodont$distance[M09])

[Figure: normal Q-Q plots of the residuals and of the predicted random intercepts, boxplot of residuals by subject, residuals against the rank of the fitted values, and observed and fitted distance against age for subject M09.]

Example: quantitative genetics (Sorensen and Waagepetersen 2003) (slides 27-28)

$U_i, \tilde U_i$ are random genetic effects influencing the size and the variability of $X_{ij}$:
$$X_{ij} \mid U_i = u_i, \tilde U_i = \tilde u_i \sim N\big(\mu_i + u_i,\ \exp(\mu_i + \tilde u_i)\big),$$
where $X_{ij}$ is the size of the $j$th litter of the $i$th pig, and
$$(U_1, \ldots, U_n, \tilde U_1, \ldots, \tilde U_n) \sim N(0, G \otimes A).$$
$A$ is the additive genetic relationship (correlation) matrix (depending on the pedigree). The correlation structure is derived from the simple model
$$U_{\text{offspring}} = \tfrac{1}{2}(U_{\text{father}} + U_{\text{mother}}) + \epsilon.$$
$Q = A^{-1}$ is sparse! (a generalization of AR(1)). Moreover
$$G = \begin{bmatrix} \sigma_u^2 & \rho\sigma_u\sigma_{\tilde u} \\ \rho\sigma_u\sigma_{\tilde u} & \sigma_{\tilde u}^2 \end{bmatrix},$$
where $\rho$ is the coefficient of genetic correlation between $U_i$ and $\tilde U_i$. NB: high dimension, $n > 6000$.

[Figure: histogram of litter size; pedigree diagram indicating pigs with and without observations.]

Aim: identify pigs with favorable genetic effects.
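To make the $G \otimes A$ covariance structure concrete, here is a toy R sketch (not from the slides) with a three-animal pedigree, two unrelated parents and one offspring, for which the additive relationship matrix A is known; all variance parameters are made up:

## Toy sketch: covariance G kronecker A for (U_1,...,U_n, Utilde_1,...,Utilde_n)
## Pedigree: animals 1 and 2 are unrelated parents of animal 3
A <- matrix(c(1,   0,   0.5,
              0,   1,   0.5,
              0.5, 0.5, 1  ), 3, 3, byrow=TRUE)    # additive relationship matrix
sig_u <- 1; sig_ut <- 0.5; rho <- 0.3              # illustrative variance parameters
G <- matrix(c(sig_u^2,          rho*sig_u*sig_ut,
              rho*sig_u*sig_ut, sig_ut^2), 2, 2)
Sigma_U <- G %x% A                                 # 6 x 6 covariance of (U, Utilde)
solve(A)                                           # Q = A^{-1}; in large pedigrees this is sparse
                                                   # (nonzeros only between an animal, its parents and its mates)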

Exercises (slide 29)

1. Write out in detail the proofs of the results on slides 3-5.
2. Fill in the details of the proof on slide 21.
3. Verify the results on page 186 in M & T regarding BLUPs in the case of a one-way anova.
4. Consider slides 15-16 and the special case of the hidden AR(1) model from the second miniproject. Explain how you can do conditional simulation of the hidden AR(1) process U given the observed data Y using 1) conditional simulation using prediction and 2) the expressions for the conditional mean and covariance (cf. slide 14). Try to implement the solutions in R.