Lecture 15: Multiple Regression I (Chapter 6, Set 2)

Least Squares Estimation

The quadratic form to be minimized is

Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{i1} - \beta_2 X_{i2} - \cdots - \beta_{p-1} X_{i,p-1})^2,

which in matrix notation is

Q = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta.

Now consider the partial derivative vector operator

\frac{\partial}{\partial \beta}(Q) = \begin{pmatrix} \partial Q/\partial \beta_0 \\ \partial Q/\partial \beta_1 \\ \vdots \\ \partial Q/\partial \beta_{p-1} \end{pmatrix}.

Then

\frac{\partial}{\partial \beta}(Q) = -2X'Y + 2X'X\beta.

Note: This result was verified in the context of simple linear regression (where it is easy to verify), but it holds for multiple regression as well.

Setting this derivative vector equal to 0 and replacing β with its least squares estimator b leads to the normal equations

(X'X)b = X'Y,

whose solution is

b = (X'X)^{-1}(X'Y).

Note: The matrix solution looks the same as in the case of simple linear regression. However, now b is a p × 1 column vector of least squares estimators, (X'X)^{-1} is a symmetric p × p constant matrix, and (X'Y) is a p × 1 vector. In the general multiple regression case with p − 1 predictor variables, it is not possible to obtain nice algebraic expressions for the elements of b as in simple regression. Instead, the least squares estimates are found numerically by first computing (X'X)^{-1} and then multiplying it on the right by (X'Y). All matrices involved in these computations can be recovered from SAS. Refer to the SAS output that accompanies these notes.
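The notes rely on SAS for these matrix computations. For readers working outside SAS, here is a minimal numpy sketch (the toy data and variable names are hypothetical, not taken from the notes) that forms X'X and X'Y and solves the normal equations for b.

```python
import numpy as np

# Hypothetical toy data: n = 6 observations, p - 1 = 2 predictor variables.
X1 = np.array([4.0, 7.0, 3.0, 9.0, 6.0, 5.0])
X2 = np.array([2.0, 1.0, 5.0, 4.0, 3.0, 6.0])
Y  = np.array([12.1, 14.3, 13.0, 19.8, 16.2, 17.5])

n = len(Y)
X = np.column_stack([np.ones(n), X1, X2])   # design matrix with an intercept column
p = X.shape[1]                              # p = number of regression coefficients

# Normal equations (X'X) b = X'Y, solved for b = (X'X)^{-1} (X'Y).
XtX = X.T @ X
XtY = X.T @ Y
b = np.linalg.solve(XtX, XtY)
print("b =", b)
```

In practice np.linalg.lstsq or a QR decomposition is numerically preferable to forming (X'X)^{-1} explicitly; the solve call above simply mirrors the normal-equations derivation in the notes.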

Properties of the Least Squares Estimators

1. The least squares estimators are unbiased, i.e. E(b) = β.

2. The variance-covariance matrix of b is V(b) = σ²(X'X)^{-1}.

3. The estimated variance-covariance matrix of b is V̂(b) = MSE (X'X)^{-1}.

In the notation of the text, the p × p variance-covariance matrix V(b) = E[(b − β)(b − β)'] is estimated by

\hat{V}(b) = \begin{pmatrix} S^2(b_0) & S(b_0, b_1) & \cdots & S(b_0, b_{p-1}) \\ S(b_1, b_0) & S^2(b_1) & \cdots & S(b_1, b_{p-1}) \\ \vdots & \vdots & \ddots & \vdots \\ S(b_{p-1}, b_0) & S(b_{p-1}, b_1) & \cdots & S^2(b_{p-1}) \end{pmatrix},

which is output by SAS when PROC REG is used with the option COVB. Also, the estimated standard error of b_k is

s(b_k) = \sqrt{S^2(b_k)}, \quad k = 0, 1, \ldots, p-1,

and these are automatically output by SAS when PROC REG is used.

Remark: By the Gauss-Markov Theorem, the least squares estimators b are the best (in the sense of smallest variance) linear unbiased estimators (BLUE) of β.

Predicted Values and Residuals

As in simple linear regression, Ŷ = Xb, where

\hat{Y} = \begin{pmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1,p-1} \\ 1 & X_{21} & \cdots & X_{2,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{n,p-1} \end{pmatrix}, \quad b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{pmatrix}.

Thus

\hat{Y}_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + \cdots + b_{p-1} X_{i,p-1}.

Since b = (X'X)^{-1}(X'Y), the vector of fitted values can be written as Ŷ = X(X'X)^{-1}X'Y.
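Continuing the hypothetical numpy sketch above (it assumes X, Y, b, XtX, n, and p from that block are in scope), the estimated variance-covariance matrix MSE (X'X)^{-1} and the standard errors s(b_k), the quantities SAS reports with PROC REG and the COVB option, could be computed as follows.

```python
# Continuation of the hypothetical sketch above; X, Y, b, XtX, n, p are assumed to exist.
Y_hat = X @ b                         # fitted values, Y_hat = X b
e = Y - Y_hat                         # residual vector
SSE = e @ e
MSE = SSE / (n - p)                   # estimate of sigma^2 (anticipates the ANOVA section below)

cov_b = MSE * np.linalg.inv(XtX)      # estimated V(b) = MSE (X'X)^{-1}
se_b = np.sqrt(np.diag(cov_b))        # s(b_k) = sqrt(S^2(b_k)), k = 0, ..., p-1
print("se(b_k) =", se_b)
```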

That is, Ŷ = HY, where

H = X(X'X)^{-1}X'.

Remark: The so-called hat matrix H is n × n no matter how many predictor variables are involved in the regression. Of course, it is more difficult to compute when there are p − 1 predictor variables. As in the case of simple linear regression,

H' = H, \quad H^2 = H,

and the residual vector is

e = Y - \hat{Y} = Y - HY = (I_n - H)Y.

Note: H transforms Y into the estimated mean response vector Ŷ, while I_n − H transforms Y into e, the vector of residuals.

Variance-Covariance Matrices of Ŷ and the Residuals

The variance-covariance matrix of the predicted values Ŷ is

V(\hat{Y}) = \sigma^2 H.

Since σ² is not known, the estimated variance-covariance matrix of Ŷ is

\hat{V}(\hat{Y}) = MSE \, H.

Similarly,

V(e) = \sigma^2 (I_n - H), \quad \hat{V}(e) = MSE \, (I_n - H).

Note: The mean square error MSE is defined in the next section.

Analysis of Variance

As in simple linear regression, the fundamental identity on which the analysis of variance is based is

SSTO = SSR + SSE,

where

SSTO = \sum (Y_i - \bar{Y})^2 = Y'(I_n - \tfrac{1}{n}J)Y, \quad J = 11' \text{ (the } n \times n \text{ matrix of ones)},

SSE = \sum e_i^2 = e'e = Y'(I_n - H)Y,

SSR = \sum (\hat{Y}_i - \bar{Y})^2 = Y'(H - \tfrac{1}{n}J)Y.
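A short continuation of the same hypothetical sketch, forming the hat matrix, checking its symmetry and idempotency, and verifying the identity SSTO = SSR + SSE numerically.

```python
# Continuation of the hypothetical sketch; X, XtX, Y, n are assumed to exist.
H = X @ np.linalg.inv(XtX) @ X.T      # hat matrix H = X (X'X)^{-1} X', an n x n matrix
I_n = np.eye(n)
J = np.ones((n, n))                   # J = 11', the n x n matrix of ones

SSTO = Y @ (I_n - J / n) @ Y          # total sum of squares
SSE  = Y @ (I_n - H) @ Y              # error sum of squares
SSR  = Y @ (H - J / n) @ Y            # regression sum of squares

print(np.allclose(H, H.T), np.allclose(H @ H, H))   # H' = H and H^2 = H
print(np.isclose(SSTO, SSR + SSE))                   # fundamental ANOVA identity
```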

The degrees of freedom associated with the above sums of squares are

df(SSTO) = n - 1, \quad df(SSE) = n - p, \quad df(SSR) = p - 1.

Thus the corresponding mean squares are

MSE = \frac{SSE}{n - p}, \quad MSR = \frac{SSR}{p - 1}.

By Cochran's Theorem, SSR and SSE are independent with the following χ² distributions:

\frac{SSE}{\sigma^2} \sim \chi^2(n - p), \quad \frac{SSR}{\sigma^2} \sim \chi^2(p - 1, \theta/\sigma^2),

where

\theta = \sum \left[ \beta_1 (X_{i1} - \bar{X}_1) + \beta_2 (X_{i2} - \bar{X}_2) + \cdots + \beta_{p-1} (X_{i,p-1} - \bar{X}_{p-1}) \right]^2.

Remark:

E(MSE) = \sigma^2, \quad E(MSR) = \sigma^2 + \frac{\theta}{p - 1}.

We see the consistency when this is compared with

E(MSR) = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2

for p − 1 = 1 (simple linear regression).

ANOVA Table:

Source             df      Sum of Squares   Mean Square         Expected MS        F Ratio
Regression/model   p - 1   SSR              MSR = SSR/(p - 1)   σ² + θ/(p - 1)     F = MSR/MSE
Error              n - p   SSE              MSE = SSE/(n - p)   σ²
Total              n - 1   SSTO

F Test for Regression

The F ratio in the above ANOVA table tests the hypotheses

H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0, \quad H_a: \beta_k \neq 0 \text{ for at least one } k = 1, 2, \ldots, p - 1.

The test statistic is

F = \frac{MSR}{MSE},

and under H_0,

F \sim F(p - 1, n - p).

Thus, the decision rules for an α-level test are:

Decision rule I:  Accept H_0 if F ≤ F(1 − α; p − 1, n − p); reject H_0 if F > F(1 − α; p − 1, n − p).
Decision rule II: Accept H_0 if P_v = P(F(p − 1, n − p) > F) ≥ α; reject H_0 if P_v < α.
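Again continuing the hypothetical sketch, the overall F statistic and its p-value P(F(p − 1, n − p) > F) can be computed with scipy; comparing the p-value against α reproduces decision rule II.

```python
from scipy import stats

# Continuation of the hypothetical sketch; SSR, SSE, n, p are assumed to exist.
MSR = SSR / (p - 1)
MSE = SSE / (n - p)
F = MSR / MSE
p_value = stats.f.sf(F, p - 1, n - p)   # P( F(p-1, n-p) > F ), the upper-tail probability

alpha = 0.05
print("F =", F, " p-value =", p_value)
print("reject H0" if p_value < alpha else "do not reject H0")   # decision rule II
```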

Coefficient of Multiple Determination

R^2 = \frac{SSR}{SSTO}

measures the proportion of the total variation in the response variable Y that is explained by its linear relationship with the explanatory variables X_1, X_2, ..., X_{p-1}. Thus R² plays the same role in multiple regression that the coefficient of determination does in simple regression.

Comments: A large value of R² does not necessarily imply that the fitted model is a useful one. For instance,

1. Nonlinearity may exist even if R² is large.

2. Most of the observations may have been taken over limited ranges of the predictor variables. Despite a high R² in this case, the fitted model may not be useful if most predictions require extrapolation outside the region of the observations.

3. Even though R² is large, MSE may still be too large for inferences to be useful when high precision is required.

4. The F-test statistic above can also be written in terms of R²:

F = \frac{MSR}{MSE} = \left( \frac{n - p}{p - 1} \right) \frac{R^2}{1 - R^2}.

Adjusted R²

Recall

R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}.

The adjusted R² is obtained by dividing SSE and SSTO by their respective degrees of freedom. That is,

R_a^2 = 1 - \frac{SSE/(n - p)}{SSTO/(n - 1)} = 1 - \left( \frac{n - 1}{n - p} \right) \frac{SSE}{SSTO}.

Remark: Adding another explanatory variable to the multiple regression will always decrease SSE and thus increase R². However, R_a² may actually decrease when another explanatory variable is added to the model, because the decrease in SSE may be more than offset by the loss of a degree of freedom in the denominator (i.e. n − p).

Coefficient of Multiple Correlation

R = \sqrt{R^2}.

R does not have a direct interpretation in terms of the reduction in the variability of the dependent variable, as R² does, and it is not often used.
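Finally, for the same hypothetical sketch, R², the adjusted R_a², and the multiple correlation R can be computed directly, along with a check that the F statistic re-expressed in terms of R² matches MSR/MSE.

```python
# Continuation of the hypothetical sketch; SSR, SSE, SSTO, n, p are assumed to exist.
R2 = SSR / SSTO
R2_adj = 1 - (SSE / (n - p)) / (SSTO / (n - 1))   # adjusted R^2
R = np.sqrt(R2)                                   # coefficient of multiple correlation

# F statistic written in terms of R^2; should equal MSR/MSE from the F-test sketch.
F_from_R2 = ((n - p) / (p - 1)) * (R2 / (1 - R2))
print("R^2 =", R2, " adjusted R^2 =", R2_adj, " R =", R, " F =", F_from_R2)
```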