Multiple Linear Regression

Simple linear regression fits a straight line relating a response $Y$ to a single variable $X$. If $X$ is linearly related to $Y$, this explains some of the variability in $Y$, but in most cases a great deal of variability about the line remains. Some of this remaining variability may be explained by including other covariates in the model.

Adding a Second Covariate

Example: University staff are interested in the relationship between how well students do in high school and their performance in the first year of university. In one US study, 224 Computer Science students at a large university were followed for their first 3 semesters of university. The data are in the file CSData.txt. Measured variables include each student's GPA after first year of university, high school average grades (coded from 1 to 10) in math, science and English, and the student's Math and Verbal SAT scores.

Adding a Second Covariate

Example continued: An initial analysis looked at how well average high school math grade (HSM, coded from 1 to 10) predicted GPA. The fitted least squares line was

$$GPA = 0.908 + 0.208\,HSM$$

A test of whether $\beta_1 = 0$ gave a p-value of $< 0.0001$. The $R^2$ value, however, was only 0.1905. Can we do better and account for more of the variability?

Adding a Second Covariate

Example continued: Maybe the Math SAT scores (SATM) are a better predictor. The fitted least squares line for this is

$$GPA = 1.284 + 0.0023\,SATM$$

The test of $\beta_1 = 0$ has a p-value of 0.0001 and $R^2 = 0.0634$. Conclusion: Math SAT scores are a useful predictor of first-year university GPA, but not as good as high school math grade.

Adding a Second Covariate

Example continued: Now suppose that we try to include both the high school and SAT math scores in the model. The least squares estimated model is now

$$GPA = 0.666 + 0.193\,HSM + 0.0006\,SATM$$

The p-value for testing $\beta_1 = 0$ (coefficient of HSM) is $< 0.0001$, as before. That for testing $\beta_2 = 0$ (coefficient of SATM) is now 0.319. The $R^2$ for this model with 2 covariates is 0.1942, only slightly higher than it was for the simple regression with the covariate HSM alone. What is going on?
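
The numerical results above come from SAS output. Purely as an illustration, here is a Python/statsmodels sketch of the same three fits; it assumes CSData.txt is a whitespace-delimited text file with columns named GPA, HSM and SATM (the file layout and column names are assumptions, not taken from the notes).

```python
# Sketch only: fit the three models from the example and compare R^2.
import pandas as pd
import statsmodels.formula.api as smf

cs = pd.read_csv("CSData.txt", sep=r"\s+")     # assumed whitespace-delimited file

fit_hsm = smf.ols("GPA ~ HSM", data=cs).fit()           # simple model, HSM only
fit_satm = smf.ols("GPA ~ SATM", data=cs).fit()         # simple model, SATM only
fit_both = smf.ols("GPA ~ HSM + SATM", data=cs).fit()   # two-covariate model

for name, fit in [("HSM", fit_hsm), ("SATM", fit_satm), ("HSM+SATM", fit_both)]:
    print(name, "R^2 =", round(fit.rsquared, 4))
print(fit_both.summary())   # coefficient table with t tests and p-values
```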

The Two Covariate Model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

The interpretation of the parameters $\beta_1$ and $\beta_2$ is different than in the simple model. $\beta_2$ is now the effect of $X_2$ after adjusting for $X_1$. Similarly, $\beta_1$ is the effect of $X_1$ after adjusting for the effect of $X_2$. If $X_1$ and $X_2$ are uncorrelated, then the estimates will be the same as the estimates in the simple models, but in general this is not true.

The Two Covariate Model

Consider the following sequence of models.

1. Fit a simple linear model between $y_1, \ldots, y_n$ and the covariate $X_1$ with observed values $x_{11}, \ldots, x_{n1}$. Find the residuals $e_{11}, \ldots, e_{n1}$.
2. Fit a regression with the variable $X_2$ as response (observed values $x_{12}, \ldots, x_{n2}$) and $x_{11}, \ldots, x_{n1}$ as covariate values, and find the residuals $e_{12}, \ldots, e_{n2}$.
3. Fit a regression with $e_{11}, \ldots, e_{n1}$ as response values and $e_{12}, \ldots, e_{n2}$ as covariate values.

The Two Covariate Model

The residuals from the first model can be considered the values of $Y$ adjusted for the values of $X_1$. Similarly, the residuals from the second model can be considered the values of the covariate $X_2$ adjusted for the values of $X_1$. Hence the third model regresses the adjusted response on the adjusted second covariate. The estimated slope in this third model is exactly the estimated coefficient of $X_2$ in the two covariate model. The standard errors, test statistics and p-values will be nearly identical as well (they differ only through the degrees of freedom used in $\hat{\sigma}^2$, and through rounding), as sketched below.
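
A minimal numerical check of this three-step construction, using simulated data (the data, coefficients and the `ols` helper below are illustrative, not part of the notes):

```python
# Demonstrate that the slope from the residual-on-residual regression equals
# the coefficient of x2 in the two-covariate fit.
import numpy as np

rng = np.random.default_rng(0)
n = 224
x1 = rng.normal(5, 2, n)                       # first covariate
x2 = 0.6 * x1 + rng.normal(0, 1, n)            # correlated second covariate
y = 1.0 + 0.2 * x1 + 0.05 * x2 + rng.normal(0, 0.5, n)

def ols(X, y):
    """Least squares fit with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.solve(X.T @ X, X.T @ y)

# Step 1: residuals of y on x1.  Step 2: residuals of x2 on x1.
e_y = y - np.column_stack([np.ones(n), x1]) @ ols(x1, y)
e_x2 = x2 - np.column_stack([np.ones(n), x1]) @ ols(x1, x2)

# Step 3: regress the adjusted response on the adjusted covariate.
slope_step3 = ols(e_x2, e_y)[1]

# Coefficient of x2 in the two-covariate model.
beta2_full = ols(np.column_stack([x1, x2]), y)[2]

print(slope_step3, beta2_full)                 # the two numbers agree
```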

Adding a Second Covariate

Note that it is not always the case that the $R^2$ for the two covariate model is less than the sum of the $R^2$ values for the individual simple models. In some cases, neither covariate alone is a good predictor of the response, but the two together do a very good job of explaining the variability in the response. This can happen, for example, if $X_1$ and $X_2$ are highly negatively correlated with each other but $Y$ is related to both in the same direction. In a 1987 paper, David Hamilton (Dalhousie) showed this quite nicely.

Multiple Regression

The assumed model is of the form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

The error terms $\varepsilon$ are assumed to have mean 0 for every value of $x = (x_1, \ldots, x_p)^t$. The variance of $\varepsilon$ is assumed constant (equal to $\sigma^2$) for all covariate vectors $x$. As before, we wish to estimate the $p+2$ parameters and make inferences about the coefficients in the model.

Estimating the Coefficients

As in simple regression, we estimate the coefficients using least squares. That is, we find the values of $\beta_0, \beta_1, \ldots, \beta_p$ which minimize

$$S(\beta_0, \beta_1, \ldots, \beta_p) = \sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_p x_{ip}\right)^2$$

Taking partial derivatives and setting them equal to 0 gives the $p+1$ normal equations, which must be solved simultaneously to get the estimators.

The Model in Matrix Notation

Let $Y = (y_1, \ldots, y_n)^t$ be the response vector. The error vector is similarly defined as $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^t$. Define the vector of coefficients as $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^t$. The design matrix $X$ is the $n \times (p+1)$ matrix with all elements in the first column equal to 1 and column $r+1$ equal to $(x_{1r}, \ldots, x_{nr})^t$. Then the linear regression model can be written as

$$Y = X\beta + \varepsilon$$

Estimation in Matrix Notation

For given $\beta$ the residuals can be written as $e_i = y_i - x_i\beta$, where $x_i$ is the $i$th row of $X$. Then we can write

$$S(\beta) = \sum_{i=1}^n (y_i - x_i\beta)^2 = (Y - X\beta)^t(Y - X\beta)$$

Taking derivatives and setting them equal to 0 gives the least squares estimates

$$\hat{\beta} = (X^t X)^{-1} X^t Y$$

Estimation of $\sigma^2$

The fitted values from the regression are $\hat{Y} = X\hat{\beta}$. The residuals are

$$e = Y - \hat{Y} = Y - X\hat{\beta}$$

As in the simple model, we can find an unbiased estimator of $\sigma^2$ by looking at the sum of squared residuals $SSE = e^t e = (Y - X\hat{\beta})^t(Y - X\hat{\beta})$. The estimator is

$$\hat{\sigma}^2 = \frac{SSE}{n - (p+1)}$$
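
A minimal numpy sketch of the formulas $\hat{\beta} = (X^t X)^{-1} X^t Y$ and $\hat{\sigma}^2 = SSE/(n-(p+1))$ on simulated data (all data and variable names are illustrative, not from the notes):

```python
# Compute the least squares estimates and the estimate of sigma^2 directly
# from the matrix formulas on the last two slides.
import numpy as np

rng = np.random.default_rng(0)
n, p = 224, 2
x1 = rng.normal(5, 2, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)
y = 1.0 + 0.2 * x1 + 0.05 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])      # n x (p+1) design matrix

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'Y, solved stably
e = y - X @ beta_hat                           # residual vector
SSE = e @ e
sigma2_hat = SSE / (n - (p + 1))               # unbiased estimator of sigma^2

print(beta_hat, sigma2_hat)
```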

Simple Regression in Matrix Notation

Simple regression is a special case of multiple regression with $p = 1$ and can be formulated in the same matrix framework. $Y$ is still the vector of observed response variables, and

$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$$

Simple Regression in Matrix Notation

We can then see that

$$X^t X = \begin{pmatrix} n & n\bar{x} \\ n\bar{x} & \sum x_i^2 \end{pmatrix}$$

The inverse of this matrix is

$$(X^t X)^{-1} = \frac{1}{n S_{xx}} \begin{pmatrix} \sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix}$$

Simple Regression in Matrix Notation

We also have

$$X^t Y = \begin{pmatrix} n\bar{y} \\ \sum x_i y_i \end{pmatrix}$$

Hence we get

$$\hat{\beta} = (X^t X)^{-1} X^t Y = \frac{1}{n S_{xx}} \begin{pmatrix} n\bar{y}\sum x_i^2 - n\bar{x}\sum x_i y_i \\ n\sum x_i y_i - n^2 \bar{x}\bar{y} \end{pmatrix}$$

A little algebra shows that these estimates agree with those given for the simple regression model.
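
A quick numerical check, on simulated data (illustrative, not from the notes), that the matrix formula reproduces the familiar simple regression estimates $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$:

```python
# Matrix formula versus the familiar closed-form simple regression estimates.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 50)

X = np.column_stack([np.ones(len(x)), x])
beta_matrix = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'Y

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
beta1 = Sxy / Sxx
beta0 = y.mean() - beta1 * x.mean()

print(beta_matrix, (beta0, beta1))                # the two pairs agree
```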

Simple Regression in Matrix Notation

Note that we can write

$$\sigma^2 (X^t X)^{-1} = \frac{\sigma^2}{n S_{xx}} \begin{pmatrix} \sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix} = \begin{pmatrix} \left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)\sigma^2 & -\dfrac{\bar{x}\,\sigma^2}{S_{xx}} \\ -\dfrac{\bar{x}\,\sigma^2}{S_{xx}} & \dfrac{\sigma^2}{S_{xx}} \end{pmatrix}$$

The diagonal elements of this matrix give $\mathrm{Var}(\hat{\beta}_0)$ and $\mathrm{Var}(\hat{\beta}_1)$ as stated in Theorem 2. The off-diagonal element gives $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1)$, again as in Theorem 2.

Properties of the Estimators

Theorem 6. The estimators described on the previous slides satisfy

1. $E(\hat{\beta} \mid X) = \beta$.
2. $\mathrm{Var}(\hat{\beta} \mid X) = \sigma^2 (X^t X)^{-1}$.
3. $E(\hat{\sigma}^2 \mid X) = \sigma^2$.

Standard Errors and Sampling Distributions

The estimated variance-covariance matrix for $\hat{\beta}$ is given by $\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^t X)^{-1} = \hat{\sigma}^2 C$, where $C = (X^t X)^{-1}$. The standard errors can be found from the diagonal elements of this matrix:

$$se(\hat{\beta}_j) = \hat{\sigma}\sqrt{c_{jj}}$$

These standard errors are always output as part of the SAS output from a regression. The sampling distributions are

$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-(p+1)}, \qquad j = 0, 1, \ldots, p$$

Individual Confidence Intervals

Confidence intervals for individual parameters are given by

$$\hat{\beta}_j \pm t_{(n-p-1;\,\alpha/2)}\, se(\hat{\beta}_j), \qquad j = 0, \ldots, p$$

As with the estimates, these are to be interpreted as confidence intervals for the additional effect of $X_j$ when all of the other covariates are included in the model.

Individual Hypothesis Tests

Similarly, we can test whether $\beta_j = \beta_j^0$ using the test statistic

$$t_j = \frac{\hat{\beta}_j - \beta_j^0}{se(\hat{\beta}_j)}$$

The p-value for this test can be found by comparing the test statistic to the $t_{n-p-1}$ distribution. It is important to remember that this is a test of the additional contribution of $X_j$ when all of the other covariates are in the model.
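
As a sketch of the last three slides (again on simulated data; all names are illustrative), the standard errors, t statistics, two-sided p-values and 95% confidence intervals can be computed directly from the formulas above:

```python
# Standard errors, t tests of H0: beta_j = 0, and individual confidence intervals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 224, 2
x1 = rng.normal(5, 2, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)
y = 1.0 + 0.2 * x1 + 0.05 * x2 + rng.normal(0, 0.5, n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sigma2_hat = (e @ e) / (n - (p + 1))

C = np.linalg.inv(X.T @ X)                     # (X'X)^{-1}
se = np.sqrt(sigma2_hat * np.diag(C))          # se(beta_j) = sigma_hat * sqrt(c_jj)

t_stats = beta_hat / se                        # tests of H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p - 1)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p - 1)
ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])

print(t_stats, p_values, ci, sep="\n")
```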

Analysis of Variance Table

As in the simple model, we can write the total sum of squares as the error sum of squares plus a sum of squares related to the model. These are generally displayed in an ANOVA table exactly as in the simple case. The degrees of freedom for the model is now $p$, whereas that for the error is $n - p - 1$. The mean squares are the sums of squares divided by their degrees of freedom.

The Coefficient of Determination

As in the simple model, we can define the coefficient of determination

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

This is still a measure of the proportion of variability in $Y$ which is accounted for by the fitted model. It can also be shown that $R^2$ is the square of the correlation coefficient between $Y$ and the fitted values $\hat{Y}$. In multiple regression, a plot of $Y$ against $\hat{Y}$ is also very useful since it incorporates all of the covariates at once.

Testing All Coefficients Equal to 0

We will usually want to test whether the fitted model is useful in predicting the response. The null hypothesis here is

$$H_0 : \beta_1 = \beta_2 = \cdots = \beta_p = 0$$

The alternative is that at least one of the $\beta_j \neq 0$.

Testing All Coefficients Equal to 0

Under the null hypothesis it can be shown that

$$F = \frac{MSR}{MSE} \sim F_{p,\,n-p-1}$$

This value is given in the F Value column of the ANOVA table. The final column gives the p-value for the test. Note that this is a simultaneous test for all of the covariates. Rejecting $H_0$ does not imply that all of the covariates are important, just that some subset of them is important.
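
A sketch of the ANOVA decomposition and the overall F test on simulated data (illustrative names, not from the notes):

```python
# Overall F test that all slope coefficients are 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 224, 2
x1 = rng.normal(5, 2, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)
y = 1.0 + 0.2 * x1 + 0.05 * x2 + rng.normal(0, 0.5, n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
SSR = SST - SSE                                # model sum of squares

MSR, MSE = SSR / p, SSE / (n - p - 1)
F = MSR / MSE
p_value = stats.f.sf(F, p, n - p - 1)          # P(F_{p, n-p-1} > F)

print(F, p_value, SSR / SST)                   # F, its p-value, and R^2
```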

Testing a Subset of Coefficients Equal to 0

In some cases we may want to test whether some, but not all, of the coefficients are 0. We define the full model to be

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$$

Suppose the null hypothesis we want to test is

$$H_0 : \beta_{k+1} = \beta_{k+2} = \cdots = \beta_p = 0$$

against the alternative that at least one of these $\beta_j \neq 0$. Then we can define the reduced model to be

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon$$

Testing a Subset of Coefficients Equal to 0

From the full model we can get the SSE and MSE; call them $SSE_{full}$ and $MSE_{full}$. As always, the degrees of freedom will be $n - p - 1$. Similarly, we can find $SSE_{red}$ from the reduced model, with $n - k - 1$ degrees of freedom. Then we can define the test statistic

$$F = \frac{(SSE_{red} - SSE_{full})/(p - k)}{MSE_{full}}$$

which has an $F_{p-k,\,n-p-1}$ distribution if $H_0$ is true, so we can find a p-value.

Other Tests

Suppose that $H_0$ specifies some reduced model which can be thought of as a special case of the full model. For example, we may have a model that specifies that some subset of coefficients are all equal, but not necessarily 0. Then we fit the full and reduced models to get the SSE and the degrees of freedom from each. The test statistic is then

$$F = \frac{(SSE_{red} - SSE_{full})/(df_{red} - df_{full})}{SSE_{full}/df_{full}}$$

which has an $F$ distribution with $df_{red} - df_{full}$ and $df_{full}$ degrees of freedom when $H_0$ is true.
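
A sketch of the general full-versus-reduced F test on simulated data; with the reduced model obtained by dropping $X_2$, this is exactly the subset test of the previous slides (the function and variable names are illustrative, not from the notes):

```python
# Nested-model (full vs reduced) F test.
import numpy as np
from scipy import stats

def sse_and_df(X, y):
    """Residual sum of squares and residual degrees of freedom for an OLS fit."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta
    return e @ e, len(y) - X.shape[1]

rng = np.random.default_rng(0)
n = 224
x1 = rng.normal(5, 2, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)
y = 1.0 + 0.2 * x1 + 0.05 * x2 + rng.normal(0, 0.5, n)

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = np.column_stack([np.ones(n), x1])      # reduced model: H0 says beta_2 = 0

sse_f, df_f = sse_and_df(X_full, y)
sse_r, df_r = sse_and_df(X_red, y)

F = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
p_value = stats.f.sf(F, df_r - df_f, df_f)
print(F, p_value)
```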

Prediction

As before, we often wish to do inference for a new vector of covariate values $x_0 = (1, x_{1,0}, x_{2,0}, \ldots, x_{p,0})^t$. The point estimator of $\mu_0 = E[Y \mid x_0]$ is

$$\hat{\mu}_0 = x_0^t \hat{\beta}$$

The standard error of this estimator is

$$se(\hat{\mu}_0) = \hat{\sigma} \sqrt{x_0^t (X^t X)^{-1} x_0}$$

A $100(1-\alpha)\%$ confidence interval is given by

$$\hat{\mu}_0 \pm t_{(n-p-1,\,\alpha/2)}\, se(\hat{\mu}_0)$$

Prediction

We may also wish to predict the value of the response for an individual with covariate values $x_0$. The point predictor is the same as the point estimator of $\mu_0$:

$$\hat{y}_0 = x_0^t \hat{\beta}$$

For the standard error, however, we need to account for the extra error term, so we get the standard error of prediction

$$se(\hat{y}_0) = \hat{\sigma} \sqrt{1 + x_0^t (X^t X)^{-1} x_0}$$

A $100(1-\alpha)\%$ prediction interval is given by

$$\hat{y}_0 \pm t_{(n-p-1,\,\alpha/2)}\, se(\hat{y}_0)$$
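
A sketch of both intervals at a hypothetical new covariate vector $x_0$, again on simulated data (the value of $x_0$ and all names are illustrative, not from the notes):

```python
# Confidence interval for the mean response and prediction interval at x0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 224, 2
x1 = rng.normal(5, 2, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)
y = 1.0 + 0.2 * x1 + 0.05 * x2 + rng.normal(0, 0.5, n)
X = np.column_stack([np.ones(n), x1, x2])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma_hat = np.sqrt((e @ e) / (n - p - 1))

x0 = np.array([1.0, 6.0, 4.0])                 # new covariate vector, with the leading 1
mu0_hat = x0 @ beta_hat                        # point estimate / point prediction

h0 = x0 @ XtX_inv @ x0                         # x0' (X'X)^{-1} x0
se_mean = sigma_hat * np.sqrt(h0)              # se for the mean response
se_pred = sigma_hat * np.sqrt(1 + h0)          # se for a new observation

t_crit = stats.t.ppf(0.975, df=n - p - 1)
print("95% CI:", mu0_hat - t_crit * se_mean, mu0_hat + t_crit * se_mean)
print("95% PI:", mu0_hat - t_crit * se_pred, mu0_hat + t_crit * se_pred)
```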