Peter Hoff: Linear and multilinear models (April 3)


Contents

1 Linear model
2 GLS for multivariate regression
3 Covariance estimation for the GLM
4 Testing the GLH

A reference for some of this material can be found somewhere.

1 Linear model

Recall that the linear model is
$$ y = X\beta + \Sigma^{1/2} z, \qquad E[z] = 0, \quad \mathrm{Var}[z] = I, $$
where $X \in \mathbb{R}^{n\times p}$, $\beta \in \mathbb{R}^p$ and $\Sigma \in \mathcal{S}^n_+$. The term "linear" here typically refers to the mean parameters: the expectation of any element $y_i$ of $y$ is linear in the parameters. If the parameter space for $\Sigma$ is unrestricted, then the covariance model is linear as well. If the covariance is restricted, say $\Sigma = \Sigma(\theta)$ for $\theta \in \Theta$, then typically the covariance model is nonlinear in its parameters (e.g. autoregressive covariance).

A surprising number of other models can be viewed as submodels of the linear model:

1. Row and column effects models: Let $y_{i,j} = \mu + a_i + b_j + \sigma_i \psi_j z_{i,j}$. Then
$$ Y = \mu 1 1' + a 1' + 1 b' + \Sigma^{1/2} Z \Psi^{1/2} $$
$$ y = 1\mu + (1 \otimes I)a + (I \otimes 1)b + (\Psi \otimes \Sigma)^{1/2} z $$
$$ y = [\, 1 \;\; (1 \otimes I) \;\; (I \otimes 1) \,] \begin{pmatrix} \mu \\ a \\ b \end{pmatrix} + (\Psi \otimes \Sigma)^{1/2} z. $$

2. Random effects models:

3. Independent multivariate sample: Let $y_1, \ldots, y_n \sim$ i.i.d. $N_p(\mu, \Sigma)$. Then
$$ Y = 1\mu' + Z\Sigma^{1/2} $$
$$ y = (I \otimes 1)\mu + (\Sigma^{1/2} \otimes I)z. $$

4. General linear model: Let $y_i \sim N(B'x_i, \Sigma)$, independently for $i = 1, \ldots, n$. Then
$$ Y = XB + Z\Sigma^{1/2} $$
$$ y = (I \otimes X)b + (\Sigma^{1/2} \otimes I)z. $$

5. Correlated rows: Suppose $\mathrm{Cov}[Y_{[i,]}] \propto \Sigma$ and $\mathrm{Cov}[Y_{[,j]}] \propto \Psi$. Then
$$ Y = XB + \Psi^{1/2} Z \Sigma^{1/2} $$
$$ y = (I \otimes X)b + (\Sigma^{1/2} \otimes \Psi^{1/2})z. $$

6. Row and column covariates: Suppose $E[y_{i,j}] = x_i' B w_j$; for example,
$$ Y = XBW' + \Psi^{1/2} Z \Sigma^{1/2} $$
$$ y = (W \otimes X)b + (\Sigma^{1/2} \otimes \Psi^{1/2})z. $$

So we begin by studying the deceptively simple general form of the linear model.

OLS: For a given value of $\beta$, the RSS is given by
$$ \mathrm{RSS}(\beta) = (y - X\beta)'(y - X\beta). $$
Differentiation or completing the square shows that the minimizer in $\beta$ is
$$ \hat\beta = (X'X)^{-1} X'y. $$
It is straightforward to show that the OLS estimator is unbiased, $E[\hat\beta] = \beta$, and that the variance of this estimator is
$$ \mathrm{Var}[\hat\beta] = (X'X)^{-1} (X'\Sigma X)(X'X)^{-1}. $$
If $\Sigma = \sigma^2 I$ then the OLS estimator is optimal in terms of variance:

Theorem 1 (Gauss-Markov). Suppose $\mathrm{Var}[y] = \sigma^2 I$ and let $\check\beta = Ay$ be an unbiased estimator of $\beta$. Then
$$ \mathrm{Var}[\check\beta] = \mathrm{Var}[\hat\beta] + E[(\check\beta - \hat\beta)(\check\beta - \hat\beta)']. $$

If $\Sigma \neq \sigma^2 I$ then the OLS estimator does not have minimum variance.
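As a quick numerical illustration (not from the original notes), the following Python/NumPy sketch simulates one draw from the linear model with a non-spherical $\Sigma$ and computes the OLS estimator together with its sandwich variance $(X'X)^{-1}X'\Sigma X(X'X)^{-1}$. The dimensions, seed, and AR(1)-type $\Sigma$ are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])

# An illustrative non-spherical covariance (AR(1)-type correlation).
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Sigma)              # Sigma = L L'
y = X @ beta + L @ rng.normal(size=n)      # one draw from y = X beta + Sigma^{1/2} z (Cholesky root)

XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y               # (X'X)^{-1} X'y
var_ols = XtX_inv @ X.T @ Sigma @ X @ XtX_inv   # sandwich variance of beta_ols
print(beta_ols)
print(np.diag(var_ols))                    # marginal variances of the OLS coefficients
```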

Exercise 1. Obtain the forms of $\hat\beta$, $\hat y = X\hat\beta$, and $\mathrm{Var}[\hat\beta]$ in terms of the components of the SVD of $X$. Comment on the following intuitive statement: if $y \approx X\beta$ then "$X^{-1}y$" $\approx \beta$.

GLS: If we knew $\Sigma$ we could obtain a decorrelated vector $\tilde y = \Sigma^{-1/2} y$. The model for $\tilde y$ is
$$ \tilde y = \Sigma^{-1/2} y = \Sigma^{-1/2} X\beta + \Sigma^{-1/2}\Sigma^{1/2} z = \tilde X \beta + z, $$
where $\tilde X = \Sigma^{-1/2} X$. The OLS estimator based on the decorrelated data and corresponding regressors is
$$ \hat\beta_{OLS} = (\tilde X'\tilde X)^{-1} \tilde X' \tilde y. $$
By the Gauss-Markov theorem, this estimator is optimal among all linear unbiased estimators based on $\tilde y$, that is, unbiased estimators of the form $A\tilde y$. This includes unbiased estimators of the form $Ay$, since $y$ is itself a linear transformation of $\tilde y$ ($y = \Sigma^{1/2}\tilde y$), so any $Ay$ can be written as $(A\Sigma^{1/2})\tilde y$. What is this estimator in terms of the original data and covariates?
$$ \hat\beta_{GLS} = \hat\beta_{OLS} = (\tilde X'\tilde X)^{-1} \tilde X'\tilde y = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1} y. $$
The variance of this estimator is easily shown to be
$$ \mathrm{Var}[\hat\beta_{GLS}] = (X'\Sigma^{-1}X)^{-1}. $$
This is the smallest variance attainable among linear unbiased estimators: the GLS estimate is the BLUE, the best linear unbiased estimator.
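Continuing in the same illustrative setup (again my own sketch, not from the notes), the code below checks numerically that OLS applied to the decorrelated data $\tilde y = L^{-1}y$, $\tilde X = L^{-1}X$, with $\Sigma = LL'$ a Cholesky factorization, reproduces the closed-form GLS estimator $(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$. Any square root of $\Sigma$ works in place of the symmetric $\Sigma^{1/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Sigma)              # Sigma = L L'
y = X @ beta + L @ rng.normal(size=n)

# Decorrelate with L^{-1} (any square root of Sigma works here).
X_t = np.linalg.solve(L, X)                # "X tilde"
y_t = np.linalg.solve(L, y)                # "y tilde"
beta_whiten = np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)

# Closed form: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
Sinv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)

print(np.allclose(beta_whiten, beta_gls))  # True
print(np.linalg.inv(X.T @ Sinv @ X))       # Var[beta_gls] = (X' Sigma^{-1} X)^{-1}
```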

Furthermore, suppose we assume normal errors, $z \sim N_n(0, I)$. For known $\Sigma$, the MLE of $\beta$ is the minimizer in $\beta$ of
$$ -2 \log p(y \mid X, \beta) = (y - X\beta)'\Sigma^{-1}(y - X\beta) + c, $$
where $c$ does not depend on $\beta$. Taking derivatives, the MLE is the GLS estimator.

Big picture: If $y \sim N_n(X\beta, \Sigma)$ and $\Sigma$ is known, then $\hat\beta_{GLS} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$ is

- the GLS estimator;
- the ML estimator;
- the OLS estimator based on the decorrelated data $\Sigma^{-1/2}y$;
- the BLUE based on $y$.

Now the problem is that $\Sigma$ is rarely known. In some cases, particularly if $\Sigma = \Sigma(\theta)$ for some low-dimensional parameter $\theta$, we can jointly estimate $\Sigma$ and $\beta$. This is known as feasible GLS.

2 GLS for multivariate regression

Perhaps the most commonly used model is multivariate regression, a.k.a. the general linear model:
$$ Y = XB + Z\Sigma^{1/2}, \qquad y = (I \otimes X)b + (\Sigma^{1/2} \otimes I)z. $$

GLS: Using the vector form of the model, the GLS estimator of $b = \mathrm{vec}(B)$ is
$$ \hat b_{GLS} = \big( (I \otimes X)'(\Sigma \otimes I)^{-1}(I \otimes X) \big)^{-1} \big( (I \otimes X)'(\Sigma^{-1} \otimes I) y \big) = (\Sigma^{-1} \otimes X'X)^{-1} (\Sigma^{-1} \otimes X') y = (I \otimes (X'X)^{-1}X') y. $$
Now "unvectorize":
$$ \hat b_{GLS} = (I \otimes (X'X)^{-1}X') y \quad \Longleftrightarrow \quad \hat B_{GLS} = (X'X)^{-1} X' Y. $$
Notice that $\Sigma$ does not appear in the formula for the GLS estimator. Think about why this is:

The multivariate regression model is just a regression model for each variable:
$$ y_j = X b_j + e_j. $$
So the $j$th column of $B \in \mathbb{R}^{q\times p}$ is $b_j$, the vector of regression coefficients for the $j$th variable. In this model, there is no correlation among the elements of $y_j$, i.e. across the rows of $Y$. We already knew that in this case the BLUE for $b_j$ based on $y_j$ is the OLS estimator:
$$ \hat\beta_j = (X'X)^{-1} X' y_j. $$
Is this the BLUE among estimators based on all of $Y$? The question comes down to whether or not observing other columns of $Y$ that are correlated with $y_j$ helps with estimating $b_j$. The answer given by the above calculation is no.
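The collapse of the Kronecker-form GLS estimator to $(X'X)^{-1}X'Y$ can be checked numerically. The sketch below (illustrative names and dimensions, not part of the notes) builds the vectorized GLS estimator explicitly and compares it with column-by-column OLS.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, p = 40, 3, 4
X = rng.normal(size=(n, q))
B = rng.normal(size=(q, p))
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)                        # p x p error covariance
Y = X @ B + rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Vectorized GLS with X_tilde = I (x) X and Var[vec(E)] = Sigma (x) I.
IX = np.kron(np.eye(p), X)
V = np.kron(Sigma, np.eye(n))
y = Y.reshape(-1, order="F")                       # vec(Y), columns stacked
b_gls = np.linalg.solve(IX.T @ np.linalg.solve(V, IX),
                        IX.T @ np.linalg.solve(V, y))
B_gls = b_gls.reshape(q, p, order="F")             # "unvectorize"

B_ols = np.linalg.solve(X.T @ X, X.T @ Y)          # column-by-column OLS
print(np.allclose(B_gls, B_ols))                   # True: Sigma drops out
```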

Binding the OLS estimators columnwise gives the multivariate OLS/GLS estimator:
$$ \hat B = [\, \hat\beta_1 \;\cdots\; \hat\beta_p \,]. $$
To summarize, the estimator $\hat B_{GLS} = (X'X)^{-1}X'Y$ is

- the OLS estimator;
- the BLUE;
- the MLE under normal errors;
- the column-bound collection of univariate OLS estimators.

What is the covariance of $\hat B$? How do we even write this out, since $\hat B$ is a matrix? One approach is to vectorize. Recall the vector form of the GLM:
$$ Y = XB + Z\Sigma^{1/2}, \qquad y = (I \otimes X)b + (\Sigma^{1/2} \otimes I)z = \tilde X b + \tilde\Sigma^{1/2} z, $$
where $\tilde X = I \otimes X$ and $\tilde\Sigma = \Sigma \otimes I$. We already calculated the variance of a GLS estimator:
$$ \mathrm{Var}[\hat B] \equiv \mathrm{Var}[\hat b] = (\tilde X'\tilde\Sigma^{-1}\tilde X)^{-1} = \big( (I \otimes X')(\Sigma^{-1} \otimes I)(I \otimes X) \big)^{-1} = (\Sigma^{-1} \otimes X'X)^{-1} = \Sigma \otimes (X'X)^{-1}. $$
This means that
$$ \mathrm{Cov}[\hat B_{[,j]}, \hat B_{[,j']}] = \sigma_{j,j'} (X'X)^{-1}, \qquad \mathrm{Cov}[\hat B_{[k,]}, \hat B_{[k',]}] = s_{k,k'} \Sigma, $$
where $s_{k,k'} = [(X'X)^{-1}]_{k,k'}$. In particular, $\mathrm{Cov}[\hat b_{k,j}, \hat b_{k',j'}] = \sigma_{j,j'} s_{k,k'}$. So in a sense, $\Sigma$ is the column covariance of $\hat B$, and $(X'X)^{-1}$ is the row covariance.
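The Kronecker algebra behind $\mathrm{Var}[\hat b] = \Sigma \otimes (X'X)^{-1}$ can also be verified directly. The following sketch (again with made-up dimensions) checks the identity and reads off one of the $\sigma_{j,j'}(X'X)^{-1}$ blocks.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, p = 20, 3, 2
X = rng.normal(size=(n, q))
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)

# Var[b_hat] = ((I (x) X)'(Sigma (x) I)^{-1}(I (x) X))^{-1}
IX = np.kron(np.eye(p), X)
Vinv = np.kron(np.linalg.inv(Sigma), np.eye(n))
var_b = np.linalg.inv(IX.T @ Vinv @ IX)

XtX_inv = np.linalg.inv(X.T @ X)
print(np.allclose(var_b, np.kron(Sigma, XtX_inv)))        # True

# Block (j, j') of Sigma (x) (X'X)^{-1}: Cov[B_hat[:, j], B_hat[:, j']] = sigma_{j,j'} (X'X)^{-1}
j, jp = 0, 1
block = var_b[j * q:(j + 1) * q, jp * q:(jp + 1) * q]
print(np.allclose(block, Sigma[j, jp] * XtX_inv))          # True
```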

3 Covariance estimation for the GLM

Since the OLS/GLS estimator of $B$ is $(X'X)^{-1}X'Y$, the fitted values for $Y$ are
$$ \hat Y = X\hat B = X(X'X)^{-1}X'Y = P_X Y. $$
So the fitted values of the matrix $Y$ are just the column-wise projections of $Y$ onto the space spanned by the columns of $X$. Obvious but important is that $\hat B$ itself is a function of $\hat Y$:
$$ \hat B = (X'X)^{-1}X'Y = (X'X)^{-1}(X'X)(X'X)^{-1}X'Y = (X'X)^{-1}X'\big( X(X'X)^{-1}X' \big) Y = (X'X)^{-1}X' P_X Y = (X'X)^{-1}X'\hat Y. $$
The residuals are the projection of $Y$ onto the orthogonal complement of the column space of $X$:
$$ \hat E = Y - \hat Y = (I - P_X)Y = (I - X(X'X)^{-1}X')Y = P_0 Y. $$
This gives the decomposition $Y = \hat Y + \hat E$.
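A short sketch of the projection picture (illustrative data only): compute $P_X$ and $P_0$, form $\hat Y$ and $\hat E$, and confirm the decomposition and the orthogonality relations used below.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, p = 30, 4, 3
X = rng.normal(size=(n, q))
Y = X @ rng.normal(size=(q, p)) + rng.normal(size=(n, p))

P_X = X @ np.linalg.solve(X.T @ X, X.T)      # projection onto the column space of X
P_0 = np.eye(n) - P_X                        # projection onto its orthogonal complement
Y_hat = P_X @ Y                              # fitted values
E_hat = P_0 @ Y                              # residuals

print(np.allclose(Y, Y_hat + E_hat))         # decomposition Y = Y_hat + E_hat
print(np.allclose(X.T @ E_hat, 0))           # residual columns orthogonal to col(X)
print(np.allclose(P_X @ P_0, 0))             # the two projections annihilate each other
```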

Is $\hat Y$ correlated with $\hat E$? Let's find out. Let $\hat y$ and $\hat e$ be the vectorizations of $\hat Y$ and $\hat E$ respectively. The covariance between these two matrices is the covariance between their vectorizations,
$$ \mathrm{Cov}[\hat y, \hat e] = E[(\hat y - E[\hat y])\hat e']. $$
Now what are these items?
$$ \hat Y - E[\hat Y] = P_X Y - E[P_X Y] = P_X(Y - E[Y]) = P_X E, \qquad \hat y - E[\hat y] = (I \otimes P_X)e; $$
$$ \hat E = P_0 Y = P_0(XB + E) = P_0 E, \qquad \hat e = (I \otimes P_0)e. $$
Therefore, writing $e = \mathrm{vec}(E)$ with $\mathrm{Var}[e] = \Sigma \otimes I$,
$$ \mathrm{Cov}[\hat y, \hat e] = E[(\hat y - E[\hat y])\hat e'] = E[(I \otimes P_X)ee'(I \otimes P_0)] = (I \otimes P_X)(\Sigma \otimes I)(I \otimes P_0) = \Sigma \otimes (P_X P_0) = 0. $$
So as with ordinary regression, the residuals are uncorrelated with the fitted values, and hence also uncorrelated with $\hat B$. This is useful, as it allows us to obtain an estimate of $\Sigma$ that is independent of our estimate of $B$ if the errors are normal.
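To see the zero covariance empirically, a small Monte Carlo sketch (my own, with arbitrary dimensions and replication count) estimates the cross-covariance between $\mathrm{vec}(\hat Y)$ and $\mathrm{vec}(\hat E)$ over repeated draws from the model; it should be near zero, up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(5)
n, q, p, nsim = 15, 2, 2, 5000
X = rng.normal(size=(n, q))
B = rng.normal(size=(q, p))
A = rng.normal(size=(p, p))
Sigma = A @ A.T + 0.5 * np.eye(p)
P_X = X @ np.linalg.solve(X.T @ X, X.T)
P_0 = np.eye(n) - P_X

yh, eh = [], []
for _ in range(nsim):
    Y = X @ B + rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    yh.append((P_X @ Y).reshape(-1, order="F"))   # vec(Y_hat)
    eh.append((P_0 @ Y).reshape(-1, order="F"))   # vec(E_hat)
yh, eh = np.array(yh), np.array(eh)

# Empirical cross-covariance matrix between vec(Y_hat) and vec(E_hat).
cross = (yh - yh.mean(0)).T @ (eh - eh.mean(0)) / (nsim - 1)
print(np.abs(cross).max())                        # small; shrinks as nsim grows
```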

OK, let's consider estimating $\Sigma$ from $\hat E$. Now $\hat E = P_0 E$, and under the model $E = Z\Sigma^{1/2}$. Therefore $\hat E = P_0 Z\Sigma^{1/2}$, and
$$ S = \hat E'\hat E = \Sigma^{1/2} Z' P_0' P_0 Z \Sigma^{1/2}, $$
$$ E[S] = \Sigma^{1/2} E[Z' P_0 Z] \Sigma^{1/2} = \Sigma^{1/2} (\mathrm{tr}(P_0) I) \Sigma^{1/2} = (n - q)\,\Sigma. $$
Therefore, $\hat\Sigma = \hat E'\hat E / (n - q)$ is an unbiased estimator of $\Sigma$. However, it is not the MLE:

Exercise 2. Show that the MLE of $(B, \Sigma)$ is $(\hat B, S/n)$.
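The unbiasedness of $\hat\Sigma = \hat E'\hat E/(n-q)$ is easy to check by simulation. The sketch below (illustrative settings, not from the notes) averages $\hat E'\hat E$ over many replicates and compares $\hat\Sigma$ with the downward-biased MLE $S/n$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, q, p, nsim = 20, 3, 2, 5000
X = rng.normal(size=(n, q))
B = rng.normal(size=(q, p))
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)
P_0 = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

S_mean = np.zeros((p, p))
for _ in range(nsim):
    Y = X @ B + rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    E_hat = P_0 @ Y
    S_mean += (E_hat.T @ E_hat) / nsim            # Monte Carlo estimate of E[S]

print(S_mean / (n - q))      # approximately Sigma (unbiased estimator)
print(Sigma)
print(S_mean / n)            # the MLE S/n is biased toward zero by the factor (n - q)/n
```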

If the errors are Gaussian then we can say more. Suppose
$$ Y = XB + Z\Sigma^{1/2}, \quad Z \sim N_{n\times p}(0, I \otimes I), \qquad \text{or equivalently,} \qquad Y \sim N_{n\times p}(XB, \Sigma \otimes I). $$
Then $\hat E = P_0 Z\Sigma^{1/2}$. Now recall from a few months ago the following:

- $P_0$ is a rank-$(n-q)$ idempotent projection matrix;
- $P_0 = GG'$ for some $G \in \mathcal{V}_{n-q,n} \subset \mathbb{R}^{n\times(n-q)}$;
- if $Z \sim N(0, I \otimes I)$ then $G'Z \sim N_{(n-q)\times p}(0, I \otimes I)$.

So we have
$$ S = \hat E'\hat E = \Sigma^{1/2} Z' P_0' P_0 Z \Sigma^{1/2} = \Sigma^{1/2} Z' P_0 Z \Sigma^{1/2} = \Sigma^{1/2} Z' GG' Z \Sigma^{1/2} = W'W, $$
where $W = G'Z\Sigma^{1/2} \sim N_{(n-q)\times p}(0, \Sigma \otimes I)$. Therefore
$$ S = \hat E'\hat E = W'W \sim \mathrm{Wishart}_{n-q}(\Sigma). $$

We've seen a simpler version of this before: Suppose $Y \sim N_{n\times p}(1\mu', \Sigma \otimes I)$. Although it doesn't look like it, this is a multivariate linear regression model where the regression model for each column $j$ is $y_j = 1\mu_j + e_j$. That is, there is $q = 1$ predictor and it is constant across rows:

- $X = 1$ and $B = \mu'$;
- $P_X = 1(1'1)^{-1}1' = 11'/n$;
- $P_0 = I - 11'/n = C$, the centering matrix;
- $S = \hat E'\hat E = Y'C'CY = Y'CY \sim \mathrm{Wishart}_{n-1}(\Sigma)$.
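The construction $P_0 = GG'$ and the identity $S = W'W$ with $W = G'Z\Sigma^{1/2}$ can be verified numerically. In this sketch (illustrative only), $G$ is recovered from an eigendecomposition of $P_0$ and a Cholesky factor is used as the square root of $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, q, p = 20, 3, 2
X = rng.normal(size=(n, q))
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)
Sig_half = np.linalg.cholesky(Sigma)               # a square root of Sigma

P_0 = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
evals, evecs = np.linalg.eigh(P_0)
G = evecs[:, evals > 0.5]                          # eigenvectors with eigenvalue 1
print(G.shape)                                     # (n, n - q)
print(np.allclose(G @ G.T, P_0))                   # True: P_0 = G G'

Z = rng.normal(size=(n, p))
E = Z @ Sig_half.T                                 # rows have covariance Sigma
E_hat = P_0 @ E
W = G.T @ Z @ Sig_half.T                           # W = G'Z Sigma^{1/2}
print(np.allclose(E_hat.T @ E_hat, W.T @ W))       # True: S = W'W ~ Wishart_{n-q}(Sigma)
```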

4 Testing the GLH

We now derive some classical testing and confidence procedures for the regression matrix $B$ of the Gaussian GLM: $Y \sim N(XB, \Sigma \otimes I)$. Some results we've obtained so far:

- $\hat B = (X'X)^{-1}X'Y \sim N_{q\times p}(B, \Sigma \otimes (X'X)^{-1})$;
- $S = \hat E'\hat E \sim \mathrm{Wishart}_{n-q}(\Sigma)$;
- $\hat B$ and $\hat E$ are uncorrelated, and so independent.

Consider first inference for $B_{[k,]}' = b_k$, the (transposed) $k$th row of the matrix $B$. This $p \times 1$ vector represents the effects of the $k$th predictor $X_{[,k]}$ on $Y$:
$$ Y = XB + E = \sum_{k=1}^q X_{[,k]} b_k' + E. $$
In the context of making inference about $b_k$, our results above tell us that

- $\hat b_k \sim N_p(b_k, s_{k,k}\Sigma)$, where $s_{k,k} = [(X'X)^{-1}]_{k,k}$ as above;
- $\hat b_k$ is independent of $S$.

Our objective is to make a confidence region or test a hypothesis about $b_k$ based on $\hat b_k$. Recall that a particularly simple case of this is when $q = 1$ and $X = 1$. This is just the multivariate normal model $Y \sim N_{n\times p}(1\mu', \Sigma \otimes I)$. Our results, both old and new, imply that

- $\bar y \sim N_p(\mu, \Sigma/n)$;
- $\bar y$ is independent of $S$.

The point is that a statistical method developed to infer $\mu$ from $(\bar y, S)$ can be used to infer $b_k$ from $(\hat b_k, S)$. So we proceed with the former, since the notation is simpler.

Suppose $\Sigma$ were known and we wanted to test $H : E[y] = \mu$. We might then construct a test statistic as follows:
$$ \bar y \sim N(\mu, \Sigma/n) $$
$$ \sqrt{n}\,(\bar y - \mu) \sim N(0, \Sigma) $$
$$ z(\mu) = \sqrt{n}\,\Sigma^{-1/2}(\bar y - \mu) \sim N(0, I) $$
$$ z^2(\mu) = n(\bar y - \mu)'\Sigma^{-1}(\bar y - \mu) \sim \chi^2_p. $$
So in this case the null distribution is $\chi^2_p$, which of course does not depend on the parameters. In this case, where $\Sigma$ is known, the statistic is a pivot statistic, that is, it has a distribution that doesn't depend on any unknown parameters. We use such statistics to test and infer values of $\mu$.

Since $\Sigma$ is unknown, we might try replacing it with $\hat\Sigma$, the unbiased estimator:
$$ t^2(\bar y, \hat\Sigma) = n(\bar y - \mu)'\hat\Sigma^{-1}(\bar y - \mu). $$
Can we use this for testing and confidence region construction? To do so requires the following:

- show that $t^2$ is pivotal, that is, its distribution doesn't depend on unknown parameters;
- derive the distribution of $t^2$.
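As a final sketch (again with made-up numbers), here are the two statistics for the $X = 1$ case: the known-$\Sigma$ pivot $z^2(\mu)$, whose null distribution is $\chi^2_p$, and the plug-in statistic $t^2$ that replaces $\Sigma$ by $\hat\Sigma = S/(n-1)$, whose distribution is derived next.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 30, 3
mu = np.array([1.0, 0.0, -1.0])
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)

Y = rng.multivariate_normal(mu, Sigma, size=n)     # q = 1, X = 1, B = mu'
ybar = Y.mean(axis=0)
S = (Y - ybar).T @ (Y - ybar)                      # S = Y'CY ~ Wishart_{n-1}(Sigma)
Sigma_hat = S / (n - 1)                            # unbiased estimator of Sigma

d = ybar - mu
z2 = n * d @ np.linalg.solve(Sigma, d)             # ~ chi^2_p when Sigma is known
t2 = n * d @ np.linalg.solve(Sigma_hat, d)         # plug-in statistic; distribution derived next
print(z2, t2)
```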