STAT 540: Data Analysis and Regression


Wen Zhou (riczw@stat.colostate.edu, http://www.stat.colostate.edu/~riczw/)
Department of Statistics, Colorado State University
Fall 2015

Contents
1. Multiple Linear Regression Model
2. Inference on Multiple Regression
3. Inference about Regression Parameters
4. Estimation and Prediction
5. Geometric View of Regression and Linear Models
6. Estimating Estimable Functions of the Coefficients

Multiple Linear Regression
- Multiple linear regression model
  1. Multiple linear regression model in matrix terms
  2. Estimation of regression coefficients
- Inference
  1. ANOVA results
  2. Inference about regression parameters
  3. Estimation of mean response and prediction of new observation
- Inference about regression parameters
- Estimation and prediction
- Geometric interpretation of linear models and regression
- Estimating estimable functions of the regression (linear model) coefficients β

1. Multiple Linear Regression Model

Multiple Linear Regression
Example: number of predictor variables = 2.
Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ε_i,  ε_i iid N(0, σ^2),  for i = 1, ..., n.
Response surface: E(Y_i) = β_0 + β_1 X_i1 + β_2 X_i2.
Example: Y = pine bark beetle density, X_1 = temperature, X_2 = tree species.

Interpretation of Coefficients
β_0: intercept. When the model scope includes X_1 = X_2 = 0, β_0 is interpreted as the mean response E(Y) at X_1 = X_2 = 0.
β_j: slope in the direction of X_j (the effect of X_j).
∂E(Y)/∂X_j = E_{Y|X=(X_1+1, X_2)}(Y) - E_{Y|X=(X_1, X_2)}(Y) = β_j  (shown here for j = 1).
Interpreted as the change in the mean response E(Y) per unit increase in X_j, when the other predictors are held constant.
What if X_j is qualitative?

Multiple Linear Regression
A general linear regression model is, for i = 1, ..., n,
Y_i = β_0 + Σ_{j=1}^{p-1} X_ij β_j + ε_i,  ε_i iid N(0, σ^2).
Response surface: E(Y_i) = β_0 + Σ_{j=1}^{p-1} X_ij β_j.
Regression coefficients: β_0, β_1, ..., β_{p-2}, β_{p-1}.
Predictor variables: X_1, ..., X_{p-1} are known constants/values.
The model is linear in the parameters, not necessarily in the shape of the response surface.

Response Surface Examples
Polynomial regression: E(Y) = β_0 + β_1 X + β_2 X^2 + β_3 X^3.
Transformed variables: E(log(Y)) = β_0 + β_1 X_1 + β_2 X_2.
Interaction effects: E(Y) = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_1 X_2. The change in the mean response corresponding to a unit change in X_1 depends on X_2, and vice versa. Testing whether β_3 = 0 is very challenging in the high-dimensional setting (n = o(p)).

Qualitative Predictor Variables
Example: Let Y = length of hospital stay, X_1 = age, and X_2 = gender (0 for male, 1 for female).
An additive model is E(Y) = β_0 + β_1 X_1 + β_2 X_2.
Thus the response surface for males (X_2 = 0) is E(Y) = β_0 + β_1 X_1, and for females (X_2 = 1) it is E(Y) = (β_0 + β_2) + β_1 X_1.
β_2 is the difference in mean response between females and males at any fixed age.
This kind of model is sometimes called an ANCOVA model.

Qualitative Predictor Variables
Interaction: the relationship between X_1 and Y for a fixed value of X_2 = x_2 depends on x_2.
An interaction model is E(Y) = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_1 X_2.
Thus the response surface for males (X_2 = 0) is E(Y) = β_0 + β_1 X_1, and for females (X_2 = 1) it is E(Y) = (β_0 + β_2) + (β_1 + β_3) X_1.

Notation
n observations, 1 response variable, p β's with p - 1 predictors (i.e., β_0 is the pth coefficient).
Response vector: Y_{n×1} = (Y_1, Y_2, ..., Y_n)^T.
The predictors are arranged in the design matrix
X_{n×p} =
  [ 1  X_11  X_12  ...  X_1,p-1 ]
  [ 1  X_21  X_22  ...  X_2,p-1 ]
  [ ...                         ]
  [ 1  X_n1  X_n2  ...  X_n,p-1 ]
Random error: ε_{n×1} = (ε_1, ε_2, ..., ε_n)^T.
Regression coefficients: β_{p×1} = (β_0, β_1, ..., β_{p-1})^T.

Multiple Linear Regression Model in Matrix Terms
The multiple linear regression model can be written as Y = Xβ + ε, where, as we have seen before, ε ~ N_n(0_n, σ^2 I_{n×n}).
Thus, E(ε) = 0_n, Var{ε} = σ^2 I_{n×n}, and Y ~ N_n(Xβ, σ^2 I_{n×n}).

Least Squares Estimation
Consider the criterion
Q = Σ_{i=1}^n (Y_i - β_0 - Σ_{j=1}^{p-1} β_j X_ij)^2 = (Y - Xβ)^T (Y - Xβ).
The least squares estimate of β is β̂ = (X^T X)^{-1} X^T Y, assuming that X^T X is invertible. This is also the MLE under the normal-error model.
What condition on X do we need to have X^T X invertible? What if X^T X is not invertible?
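A minimal numpy sketch of the computation above (the simulated data, seed, and variable names are illustrative assumptions, not from the slides); in practice one solves the normal equations X^T X β = X^T Y rather than forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3                                     # n observations, p coefficients (incl. intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# Least squares estimate: solve X'X beta = X'y (assumes X'X invertible)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```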

Fitted Values and Residuals
Fitted values: Ŷ = Xβ̂ = HY, where the hat matrix is H = X(X^T X)^{-1} X^T.
Residuals: e = Y - Ŷ = (I - H)Y.

2. Inference on Multiple Regression

Sums of Squares
In matrix form, with J the n×n matrix of ones,
SSR  = Σ_{i=1}^n (Ŷ_i - Ȳ)^2  = Y^T (H - J/n) Y
SSE  = Σ_{i=1}^n (Y_i - Ŷ_i)^2 = Y^T (I - H) Y
SSTO = Σ_{i=1}^n (Y_i - Ȳ)^2  = Y^T (I - J/n) Y
Partitioning of the total sum of squares, and in particular the df, are
SSTO (df = n - 1) = SSR (df = p - 1) + SSE (df = n - p).
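An illustrative numpy check of the quadratic forms and the SSTO = SSR + SSE partition (simulated data and names are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix H = X (X'X)^{-1} X'
J = np.ones((n, n))                              # matrix of ones
SSTO = y @ (np.eye(n) - J / n) @ y
SSR  = y @ (H - J / n) @ y
SSE  = y @ (np.eye(n) - H) @ y
print(np.isclose(SSTO, SSR + SSE))               # partition holds numerically
```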

Mean Squares
Define the mean squares MSR = SSR/(p - 1) and MSE = SSE/(n - p).
It can be shown that E(MSE) = σ^2. It can also be shown that
E(MSR) = σ^2 if β_j = 0 for all j = 1, ..., p - 1, and E(MSR) > σ^2 otherwise.

ANOVA Table
The ANOVA table is

Source       SS     df      MS    F
Regression   SSR    p - 1   MSR   F = MSR/MSE
Error        SSE    n - p   MSE
Total        SSTO   n - 1

If β_1 = ... = β_{p-1} = 0, then E(MSE) = E(MSR) = σ^2, in which case MSR/MSE ≈ 1.

Overall F Test for Regression Relation
Test H_0: β_1 = ... = β_{p-1} = 0 vs. H_a: not all β_j (j = 1, ..., p - 1) equal zero.
It can be shown that, under H_0,
F = MSR/MSE ~ F(p - 1, n - p).
Thus we can perform an F-test at level α by the decision rule: reject H_0 if F > F(1 - α; p - 1, n - p).
Conditional on H_0 being rejected, we may want to find (or estimate) the active set
S = {j : β_j ≠ 0}  (identification/selection).
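A small Python sketch of the overall F test as described above (data, seed, and names are assumptions for illustration only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
SSE = np.sum((y - y_hat) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
MSR, MSE = SSR / (p - 1), SSE / (n - p)
F = MSR / MSE
p_value = stats.f.sf(F, p - 1, n - p)            # P(F_{p-1, n-p} > F)
alpha = 0.05
reject = F > stats.f.ppf(1 - alpha, p - 1, n - p)
print(F, p_value, reject)
```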

Coefficient of Multiple Determination, R^2
The coefficient of multiple determination is denoted by R^2 and is defined as
R^2 = SSR/SSTO = 1 - SSE/SSTO.
Interpretation: the proportion of variation in the Y_i's explained by the regression relation.

More on R^2
As more predictors are added to the model (p increases), R^2 cannot decrease. Why?
Recall SSTO = SSR + SSE. SSTO is fixed for Y, while SSE is the minimum of the unconstrained convex optimization problem β̂ = arg min SSE(β_0, ..., β_{p-1}).
Suppose we consider an extra predictor and thus minimize SSE(β_0, ..., β_p). The β that minimizes this SSE cannot be inferior to the previous minimizer, because β_p = 0 is a special case within the new minimization problem, which therefore incorporates the previous one.

Adjusted R^2
R^2 depends on p (even for p ≪ n); how can we remove that dependence?
The adjusted coefficient of multiple determination, R_a^2, is defined as
R_a^2 = 1 - (SSE/(n - p)) / (SSTO/(n - 1)) = 1 - ((n - 1)/(n - p)) · SSE/SSTO.
The adjusted coefficient of multiple determination R_a^2 may decrease when more predictors are added to the model.
Many other statistics, such as AIC, BIC, Mallows' C_p, etc., will be discussed later; they are superior to R_a^2.
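For concreteness, an illustrative computation of R^2 and R_a^2 from the formulas above (simulated data and variable names are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
SSE = np.sum((y - X @ beta_hat) ** 2)
SSTO = np.sum((y - y.mean()) ** 2)
R2 = 1 - SSE / SSTO
R2_adj = 1 - (SSE / (n - p)) / (SSTO / (n - 1))  # penalizes extra predictors
print(R2, R2_adj)
```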

3. Inference about Regression Parameters

Estimation of Regression Coefficients
Mean: E(β̂) = β. That is, the LS estimate β̂ is an unbiased estimate of β.
Variance-covariance matrix: Σ_β̂ := Var{β̂} = σ^2 (X^T X)^{-1}, with
(Σ_β̂)_kk = Var(β̂_k) and (Σ_β̂)_kl = Cov(β̂_k, β̂_l), indexing k, l = 0, ..., p - 1.

Inference about Regression Coefficients
The estimated variance-covariance matrix is
Σ̂_β̂ := s^2{β̂} = MSE · (X^T X)^{-1} =
  [ s^2{β̂_0}          s{β̂_0, β̂_1}      ...  s{β̂_0, β̂_{p-1}} ]
  [ s{β̂_1, β̂_0}       s^2{β̂_1}         ...  s{β̂_1, β̂_{p-1}} ]
  [ ...                                                       ]
  [ s{β̂_{p-1}, β̂_0}   s{β̂_{p-1}, β̂_1}  ...  s^2{β̂_{p-1}}    ]
Under the multiple linear regression model, we have
(β̂_k - β_k) / s{β̂_k} ~ t(n - p),  for k = 0, 1, ..., p - 1.

Inference about Regression Coefficients
Thus the (1 - α) confidence interval for β_k is β̂_k ± t(1 - α/2; n - p) · s{β̂_k}.
Test H_0: β_k = β_k0 versus H_a: β_k ≠ β_k0. Under H_0, we have
t* = (β̂_k - β_k0) / s{β̂_k} ~ t(n - p).
Thus we can perform a t-test at level α by the decision rule: reject H_0 if |t*| > t(1 - α/2; n - p).
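A sketch of these coefficient confidence intervals and t-tests (for H_0: β_k = 0) in numpy/scipy; the simulated data and names are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
MSE = np.sum((y - X @ beta_hat) ** 2) / (n - p)
se = np.sqrt(MSE * np.diag(XtX_inv))             # s{beta_hat_k}

alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, n - p)
ci = np.column_stack([beta_hat - tcrit * se, beta_hat + tcrit * se])
t_stat = beta_hat / se                           # test H0: beta_k = 0
p_vals = 2 * stats.t.sf(np.abs(t_stat), n - p)
print(ci)
print(p_vals)
```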

4. Estimation and Prediction

Estimation of Mean Response: Hidden Extrapolation
Define X_h = (1, X_h1, ..., X_h,p-1)^T. Caution about hidden extrapolation.
The region (with respect to X_0) defined by
d(X_0) = X_0^T (X^T X)^{-1} X_0 ≤ h_max,  where h_max = max_i h_ii,
is an ellipsoid enclosing all data points inside the regressor variable hull (RVH). Prediction at any X_0 outside this region (i.e., d(X_0) > h_max) is hidden extrapolation, at least to some degree.
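A quick numeric sketch of this check (the design, the candidate point X_0, and all names are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)      # leverages h_ii = x_i'(X'X)^{-1} x_i
h_max = h.max()

X0 = np.array([1.0, 3.0, -3.0])                  # candidate prediction point
d0 = X0 @ XtX_inv @ X0
print(d0, h_max, d0 > h_max)                     # True suggests hidden extrapolation
```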

Estimation of Mean Response
The estimated mean response corresponding to X_h is Ŷ_h = X_h^T β̂.
Mean: E(Ŷ_h) = X_h^T β = E(Y_h).
Variance: Var{Ŷ_h} = σ^2 X_h^T (X^T X)^{-1} X_h.
The estimated variance is s^2{Ŷ_h} = MSE · X_h^T (X^T X)^{-1} X_h.

Confidence Intervals for Mean Response
The (1 - α) confidence interval for E(Y_h) is Ŷ_h ± t(1 - α/2; n - p) · s{Ŷ_h}.
The Working-Hotelling (1 - α) confidence band for the regression surface is Ŷ_h ± W · s{Ŷ_h}, where W^2 = p F(1 - α; p, n - p).
The Bonferroni (1 - α) joint confidence intervals for g mean responses are Ŷ_h ± B · s{Ŷ_h}, where B = t(1 - α/(2g); n - p).
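An illustrative sketch of the Working-Hotelling and Bonferroni intervals for g = 2 mean responses (data, the points in Xh, and names are assumptions for illustration only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
MSE = np.sum((y - X @ beta_hat) ** 2) / (n - p)

Xh = np.array([[1.0, 0.5, -0.2], [1.0, -1.0, 1.0]])          # g = 2 points of interest
Yh_hat = Xh @ beta_hat
s_Yh = np.sqrt(MSE * np.einsum('ij,jk,ik->i', Xh, XtX_inv, Xh))

alpha, g = 0.05, Xh.shape[0]
W = np.sqrt(p * stats.f.ppf(1 - alpha, p, n - p))            # Working-Hotelling multiplier
B = stats.t.ppf(1 - alpha / (2 * g), n - p)                  # Bonferroni multiplier
print(np.column_stack([Yh_hat - W * s_Yh, Yh_hat + W * s_Yh]))
print(np.column_stack([Yh_hat - B * s_Yh, Yh_hat + B * s_Yh]))
```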

Prediction of New Observation
The predicted new observation corresponding to X_h is Ŷ_h = X_h^T β̂, and
Mean: E(Ŷ_h) = X_h^T β = E(Y_h(new)).
Prediction error variance: σ^2_pred = Var(Ŷ_h - Y_h(new)) = σ^2 (1 + X_h^T (X^T X)^{-1} X_h).
The estimated prediction error variance is s^2{pred} = MSE · (1 + X_h^T (X^T X)^{-1} X_h).

Prediction Intervals for New Observation
The (1 - α) prediction interval for Y_h(new) is Ŷ_h ± t(1 - α/2; n - p) · s{pred}.
The Scheffé (1 - α) joint prediction intervals for g new observations are Ŷ_h ± S · s{pred}, where S^2 = g F(1 - α; g, n - p).
The Bonferroni (1 - α) joint prediction intervals for g new observations are Ŷ_h ± B · s{pred}, where B = t(1 - α/(2g); n - p).
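A minimal sketch of a single (1 - α) prediction interval using s{pred} as defined above (simulated data, X_h, and names are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
MSE = np.sum((y - X @ beta_hat) ** 2) / (n - p)

Xh = np.array([1.0, 0.5, -0.2])
Yh_hat = Xh @ beta_hat
s_pred = np.sqrt(MSE * (1 + Xh @ XtX_inv @ Xh))  # s{pred}

alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, n - p)
print(Yh_hat - tcrit * s_pred, Yh_hat + tcrit * s_pred)
```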

5. Geometric View of Regression and Linear Models

Geometric Viewpoint: The Column Space of the Design Matrix
Xβ is a linear combination of the columns of X:
Xβ = [x_1, ..., x_p] (β_1, ..., β_p)^T = β_1 x_1 + ... + β_p x_p.
The set of all possible linear combinations of the columns of X is called the column space of X and is denoted by
C(X) = {Xa : a ∈ R^p}.
The Gauss-Markov linear model says y is a random vector whose mean is in the column space of X and whose variance is σ^2 I for some positive real number σ^2, i.e.,
E(y) ∈ C(X) and Var(y) = σ^2 I, σ^2 ∈ R^+.

An Example Column Space
X = [1, 1]^T.
C(X) = {Xa : a ∈ R} = {[1, 1]^T a : a ∈ R} = {a [1, 1]^T : a ∈ R} = {[a, a]^T : a ∈ R}.

Another Example Column Space
X_1 =
  [ 1 0 ]
  [ 1 0 ]
  [ 0 1 ]
  [ 0 1 ]
C(X_1) = { X_1 [a_1, a_2]^T : [a_1, a_2]^T ∈ R^2 }
       = { a_1 [1, 1, 0, 0]^T + a_2 [0, 0, 1, 1]^T : a_1, a_2 ∈ R }
       = { [a_1, a_1, 0, 0]^T + [0, 0, a_2, a_2]^T : a_1, a_2 ∈ R }
       = { [a_1, a_1, a_2, a_2]^T : a_1, a_2 ∈ R }.

Another Example Column Space
X_1 =
  [ 1 0 ]
  [ 1 0 ]
  [ 0 1 ]
  [ 0 1 ]
X_2 =
  [ 1 1 0 ]
  [ 1 1 0 ]
  [ 1 0 1 ]
  [ 1 0 1 ]
x ∈ C(X_1) ⟹ x = X_1 a for some a ∈ R^2
           ⟹ x = X_2 [0, a_1, a_2]^T, i.e., x = X_2 b for some b ∈ R^3
           ⟹ x ∈ C(X_2).
Thus C(X_1) ⊆ C(X_2).

Another Example Column Space (continued)
x ∈ C(X_2) ⟹ x = X_2 a for some a ∈ R^3
           ⟹ x = a_1 [1, 1, 1, 1]^T + a_2 [1, 1, 0, 0]^T + a_3 [0, 0, 1, 1]^T for some a ∈ R^3
           ⟹ x = [a_1 + a_2, a_1 + a_2, a_1 + a_3, a_1 + a_3]^T for some a_1, a_2, a_3 ∈ R
           ⟹ x = X_1 [a_1 + a_2, a_1 + a_3]^T for some a_1, a_2, a_3 ∈ R.

Another Example Column Space (continued)
x = X_1 [a_1 + a_2, a_1 + a_3]^T for some a_1, a_2, a_3 ∈ R
  ⟹ x = X_1 b for some b ∈ R^2
  ⟹ x ∈ C(X_1).
Thus C(X_2) ⊆ C(X_1), and as we have shown C(X_1) ⊆ C(X_2), it follows that C(X_1) = C(X_2).

Estimation of E(y)
A fundamental goal of linear model analysis is to estimate E(y).
We could, of course, use y itself to estimate E(y): y is obviously an unbiased estimator of E(y), but it is often not a very sensible estimator.
For example, suppose
[y_1, y_2]^T = [1, 1]^T µ + [ε_1, ε_2]^T, and we observe y = [6.1, 2.3]^T.
Should we estimate E(y) = [µ, µ]^T by y = [6.1, 2.3]^T?

Estimation of E(y)
The Gauss-Markov linear model says that E(y) ∈ C(X), so we should use that information when estimating E(y).
Consider estimating E(y) by the point in C(X) that is closest to y (as measured by the usual Euclidean distance). This unique point is called the orthogonal projection of y onto C(X) and is denoted by ŷ (although it could be argued that Ê(y) might be better notation).
By definition, ‖y - ŷ‖ = min_{z ∈ C(X)} ‖y - z‖, where ‖a‖ = (Σ_{i=1}^n a_i^2)^{1/2}.

Geometric Viewpoint on Multiple Regression (and Linear Models)
Geometrically, how do we minimize the distance between Y and C(X)? That point is the orthogonal projection Ŷ = Xβ̂ = HY.
The vector between Y and Xβ̂ is the residual e = Y - Xβ̂, and the distance is ‖e‖ = √SSE.
For R^2: if we add another predictor, C(X) gains a dimension, so ‖e‖ can only decrease.
Note: if dim(C(X)) = n, then ŷ = y and the fit is perfect (SSE = 0).


Orthogonal Projection Matrices
It can be shown that, as we did for least squares estimators, for any y ∈ R^n the estimator ŷ = P_X y is optimal, i.e., ŷ = P_X y is the best estimator of E(y) in the class of linear unbiased estimators, where the unique matrix P_X = H is the hat matrix, also called the orthogonal projection matrix:
- HH = H (idempotent)
- H' = H (symmetric)
- HX = X and X'H = X' (Why? Intuitively, projecting vectors already in C(X) changes nothing.)
If (X'X) is not invertible, we use a generalized inverse (X'X)^-, where a generalized inverse A^- of A satisfies A A^- A = A. H is invariant to the choice of (X'X)^-, which is itself not unique.
ŷ and y - ŷ are orthogonal. (Why?)
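A numpy sketch checking these properties, using the rank-deficient treatment effects design that appears later in the slides (the response values y and the use of the Moore-Penrose pseudoinverse as the generalized inverse are illustrative assumptions):

```python
import numpy as np

# Intercept column plus two treatment indicator columns (not full column rank)
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)

H = X @ np.linalg.pinv(X.T @ X) @ X.T            # P_X built from a generalized inverse of X'X
print(np.allclose(H @ H, H))                     # idempotent
print(np.allclose(H, H.T))                       # symmetric
print(np.allclose(H @ X, X))                     # HX = X

y = np.array([6.0, 5.0, 7.0, 2.0, 3.0, 1.0])     # made-up responses
y_hat = H @ y
print(np.isclose(y_hat @ (y - y_hat), 0.0))      # y_hat is orthogonal to y - y_hat
```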

An Example Orthogonal Projection
Suppose [y_1, y_2]^T = [1, 1]^T µ + [ε_1, ε_2]^T, and we observe y = [6.1, 2.3]^T. Then
X(X'X)^{-1}X' = [1, 1]^T ([1, 1][1, 1]^T)^{-1} [1, 1]
              = [1, 1]^T [2]^{-1} [1, 1]
              = (1/2) [1 1; 1 1]
              = [1/2 1/2; 1/2 1/2].

An Example Orthogonal Projection
Thus, the orthogonal projection of y = [6.1, 2.3]^T onto the column space of X = [1, 1]^T is
P_X y = Hy = [1/2 1/2; 1/2 1/2] [6.1, 2.3]^T = [4.2, 4.2]^T.

Geometric Illustration
Suppose X = [1, 2]^T and y = [2, 3/4]^T. [A sequence of figures builds up the picture: the vector y, the line C(X), and the orthogonal projection ŷ of y onto C(X).]

Geometric Illustration
The angle between ŷ and the residual y - ŷ is 90°; hence the name orthogonal projection.

6. Estimating Estimable Functions of the Coefficients

What if X is not of full column rank?
Then X^T X is not invertible, and (X^T X)^- has to be defined through a generalized inverse.
If X is not of full column rank, then there are infinitely many vectors in the set {b : Xb = Xβ} for any fixed value of β. Thus, no matter what the value of E(y) is, there will be infinitely many vectors b such that Xb = E(y).
Our response vector y can help us learn about E(y) = Xβ, but when X is NOT of full column rank, there is NO hope of learning about β alone unless additional information about β is available.
However, we can still estimate estimable functions of β.

Treatment Effects Model
Researchers randomly assigned a total of six experimental units to two treatments and measured a response of interest:
y_ij = µ + τ_i + ε_ij,  i = 1, 2;  j = 1, 2, 3,
i.e.,
[y_11, y_12, y_13, y_21, y_22, y_23]^T = [µ + τ_1, µ + τ_1, µ + τ_1, µ + τ_2, µ + τ_2, µ + τ_2]^T + [ε_11, ε_12, ε_13, ε_21, ε_22, ε_23]^T.
Question: what are X and β?

Treatment Effects Model (continued)
In this case, it makes no sense to estimate β = [µ, τ_1, τ_2]^T because there are multiple (infinitely many, in fact) choices of β that define the same mean for y. For example, adding any constant c to µ and subtracting c from both τ_1 and τ_2 yields exactly the same Xβ = E(y).
When multiple values of β define the same E(y), we say that β is non-estimable.

Estimable Functions of β
A linear function of β, Cβ, is said to be estimable if there is a linear function of y, say Ay, that is an unbiased estimator of Cβ. Otherwise (if no such linear function exists), Cβ is non-estimable.
Note that Ay is an unbiased estimator of Cβ if and only if
E(Ay) = Cβ for all β ∈ R^p  ⟺  AXβ = Cβ for all β ∈ R^p  ⟺  AX = C.
This says that we can estimate Cβ as long as Cβ = AXβ = A E(y) for some A, i.e., as long as Cβ is a linear function of E(y).
The bottom line: we can always estimate E(y) and all linear functions of E(y); all other linear functions of β are non-estimable.
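A short Python sketch of a numerical estimability check. The condition AX = C for some A is equivalent to every row of C lying in the row space of X; the helper `is_estimable` below is a hypothetical function written for illustration, not something from the slides:

```python
import numpy as np

# Treatment effects design (rank 2, not of full column rank)
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)

def is_estimable(C, X, tol=1e-10):
    """C @ beta is estimable iff each row of C lies in the row space of X,
    i.e., stacking C under X does not increase the matrix rank."""
    C = np.atleast_2d(C)
    return np.linalg.matrix_rank(np.vstack([X, C]), tol=tol) == np.linalg.matrix_rank(X, tol=tol)

print(is_estimable(np.array([1.0, 1.0, 0.0]), X))    # mu + tau_1   -> estimable
print(is_estimable(np.array([0.0, 1.0, -1.0]), X))   # tau_1 - tau_2 -> estimable
print(is_estimable(np.array([0.0, 1.0, 0.0]), X))    # tau_1 alone  -> non-estimable
```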

Treatment Effects Model (continued)
Xβ = [1 1 0; 1 1 0; 1 1 0; 1 0 1; 1 0 1; 1 0 1] [µ, τ_1, τ_2]^T = [µ + τ_1, µ + τ_1, µ + τ_1, µ + τ_2, µ + τ_2, µ + τ_2]^T,
so that
[1, 0, 0, 0, 0, 0] Xβ = [1, 1, 0] β = µ + τ_1,
[0, 0, 0, 1, 0, 0] Xβ = [1, 0, 1] β = µ + τ_2,
[1, 0, 0, -1, 0, 0] Xβ = [0, 1, -1] β = τ_1 - τ_2
are estimable functions of β.

Estimating Estimable Functions of β
If Cβ is estimable, then there exists a matrix A such that C = AX and Cβ = AXβ = A E(y) for any β ∈ R^p.
It makes sense to estimate Cβ by
A Ê(y) = A ŷ = A P_X y = A X(X'X)^- X' y = A X(X'X)^- X'X β̂ = A P_X X β̂ = A X β̂ = C β̂.
Cβ̂ is called an Ordinary Least Squares (OLS) estimator of Cβ. Note that although the hat is on β, it is Cβ that we are estimating.
Invariance of Cβ̂ to the choice of β̂: although there are infinitely many solutions to the normal equations when X is not of full column rank, Cβ̂ is the same for all normal-equation solutions β̂ whenever Cβ is estimable (STAT 640).

Treatment Effects Model (continued)
Suppose our aim is to estimate τ_1 - τ_2. As noted before,
Xβ = [1 1 0; 1 1 0; 1 1 0; 1 0 1; 1 0 1; 1 0 1] [µ, τ_1, τ_2]^T = [µ + τ_1, µ + τ_1, µ + τ_1, µ + τ_2, µ + τ_2, µ + τ_2]^T,
so that [1, 0, 0, -1, 0, 0] Xβ = [0, 1, -1] β = τ_1 - τ_2.
Thus, we can compute the OLS estimator of τ_1 - τ_2 as
[1, 0, 0, -1, 0, 0] ŷ = [0, 1, -1] β̂,
where β̂ is any solution to the normal equations.

Treatment Effects Model (continued)
The normal equations X'X b = X'y in this case are
[1 1 1 1 1 1; 1 1 1 0 0 0; 0 0 0 1 1 1] X [b_1, b_2, b_3]^T = [1 1 1 1 1 1; 1 1 1 0 0 0; 0 0 0 1 1 1] [y_11, y_12, y_13, y_21, y_22, y_23]^T,
so that
[6 3 3; 3 3 0; 3 0 3] [b_1, b_2, b_3]^T = [y_.., y_1., y_2.]^T.

Treatment Effects Model (continued)
β̂_1 = [ȳ_.., ȳ_1. - ȳ_.., ȳ_2. - ȳ_..]^T and β̂_2 = [0, ȳ_1., ȳ_2.]^T are both solutions to the normal equations (check this).
Thus, the OLS estimator of Cβ = [0, 1, -1] β = τ_1 - τ_2 is
C β̂_1 = [0, 1, -1] [ȳ_.., ȳ_1. - ȳ_.., ȳ_2. - ȳ_..]^T = ȳ_1. - ȳ_2. = [0, 1, -1] [0, ȳ_1., ȳ_2.]^T = C β̂_2.
HW: Can you find two different generalized inverses of (X'X), A_1 and A_2, satisfying (X'X) A_i (X'X) = (X'X) (so that A_i = (X'X)^- for each i), which give β̂_1 and β̂_2, respectively?
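A numerical sketch of this invariance: two different solutions to the normal equations give the same value of Cβ̂ = τ̂_1 - τ̂_2. The response values and the particular solutions constructed here (Moore-Penrose solution, and a solution with µ set to 0) are illustrative assumptions, not the slides' choices:

```python
import numpy as np

X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
y = np.array([6.0, 5.0, 7.0, 2.0, 3.0, 1.0])     # made-up responses
C = np.array([0.0, 1.0, -1.0])                   # tau_1 - tau_2

# Two different solutions to the normal equations X'X b = X'y
beta1 = np.linalg.pinv(X.T @ X) @ X.T @ y                  # minimum-norm (Moore-Penrose) solution
beta2 = np.linalg.lstsq(X[:, 1:], y, rcond=None)[0]        # set mu = 0, fit tau_1, tau_2 only
beta2 = np.concatenate([[0.0], beta2])

print(np.allclose(X.T @ X @ beta1, X.T @ y))     # both satisfy the normal equations
print(np.allclose(X.T @ X @ beta2, X.T @ y))
print(C @ beta1, C @ beta2)                      # identical: ybar_1. - ybar_2.
```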

The Gauss-Markov Theorem
Under the Gauss-Markov linear model, the OLS estimator c'β̂ of an estimable linear function c'β is the unique Best Linear Unbiased Estimator (BLUE), in the sense that Var(c'β̂) is strictly less than the variance of any other linear unbiased estimator of c'β, for all β ∈ R^p and all σ^2 ∈ R^+.
The Gauss-Markov Theorem says that if we want to estimate an estimable linear function c'β using a linear estimator that is unbiased, we should always use the OLS estimator.
In our simple example of the treatment effects model, we could have used y_11 - y_21 to estimate τ_1 - τ_2. It is easy to see that y_11 - y_21 is a linear estimator that is unbiased for τ_1 - τ_2, but its variance (2σ^2) is clearly larger than the variance of the OLS estimator ȳ_1. - ȳ_2. (2σ^2/3), as guaranteed by the Gauss-Markov Theorem.