Lecture 10 Multiple Linear Regression


Lecture 10 Multiple Linear Regression
STAT 512, Spring 2011
Background Reading: KNNL 6.1-6.5

Topic Overview: Multiple Linear Regression Model

Data for Multiple Regression

Y_i is the response variable (as usual). X_{i1}, X_{i2}, ..., X_{i,p-1} are the p - 1 explanatory variables for cases i = 1, 2, ..., n.

Example: In HW #2, you considered predicting freshman GPA based on ACT scores. Perhaps we consider high school GPA and an intelligence test as well. Then for this problem, we would be using p = 4.

Multicollinearity

Predictor variables are often correlated with each other. If predictor variables are highly correlated, they will be competing to explain the same part of the variation in the response variable. Caution: using highly correlated predictor variables in the same model will not lead to useful parameter estimates, so we want to be careful of this.

Multiple Regression Model

Y_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + ... + β_{p-1} X_{i,p-1} + ε_i,   i = 1, 2, ..., n observations

Assumptions exactly as before: ε_i ~ iid N(0, σ²).

Y_i is the value of the response variable for the i-th case. X_{ik} is the value of the k-th explanatory variable for the i-th case.
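The model above can be illustrated by simulating data from it. This is a minimal sketch with made-up coefficients (β = [10, 2, -1]) and σ = 0.5; none of these numbers come from the lecture.

```python
import numpy as np

# Simulate data from Y_i = b0 + b1*X_i1 + b2*X_i2 + eps_i
# with made-up coefficients and iid N(0, sigma^2) errors.
rng = np.random.default_rng(0)
n = 50                                      # number of cases
X = np.column_stack([np.ones(n),            # column of 1s for the intercept
                     rng.uniform(0, 10, n), # X_{i1}
                     rng.uniform(0, 10, n)])# X_{i2}
beta = np.array([10.0, 2.0, -1.0])          # hypothetical parameters
eps = rng.normal(0, 0.5, n)                 # iid N(0, 0.25) errors
Y = X @ beta + eps                          # the response vector
print(Y.shape)
```

The same recipe extends to any number of predictors by appending columns to X.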

Multiple Regression Model (2)

β_0 is the intercept (think multidimensional). β_1, β_2, ..., β_{p-1} are the regression (slope) coefficients for the explanatory variables. Parameters as usual include all of the β's as well as σ². These need to be estimated from the data.

Regression Plane/Surface (figure)

Model in Matrix Form

Y_{n×1} = X_{n×p} β_{p×1} + ε_{n×1}

ε ~ N(0, σ² I_{n×n}),  so  Y ~ N(Xβ, σ² I)

Design matrix X

X_{n×p} =
[ 1  X_11  X_12  ...  X_{1,p-1} ]
[ 1  X_21  X_22  ...  X_{2,p-1} ]
[ ...                           ]
[ 1  X_n1  X_n2  ...  X_{n,p-1} ]

Coefficient vector β

β_{p×1} = [β_0, β_1, ..., β_{p-1}]'

Least Squares Solution

Minimize distances between the points and the response surface: find b to minimize SSE = (Y - Xb)'(Y - Xb).

Obtain normal equations as before: X'Xb = X'Y

Least squares solution as before: b = (X'X)^{-1} X'Y
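The least-squares solution can be sketched in a few lines of NumPy. The data below are made up so that Y = X·[1, 2, 1] exactly, which lets us see the normal equations recover those coefficients.

```python
import numpy as np

# Tiny made-up data set: intercept column plus two predictors,
# with Y constructed to fit b = [1, 2, 1] exactly.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0]])
Y = np.array([5.0, 6.0, 11.0, 12.0, 16.0])

# Solve the normal equations X'X b = X'Y. Using linalg.solve is
# numerically preferable to forming (X'X)^{-1} explicitly.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)  # [1. 2. 1.]
```

In practice one would use `np.linalg.lstsq` or a statistics package, but the normal-equations form above matches the slide's formula directly.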

Fitted Values / Residuals

Fitted (predicted) values for the mean of Y are Ŷ = Xb = X(X'X)^{-1}X'Y = HY.

Residuals are e = Y - Ŷ = Y - HY = (I - H)Y.

Note the formulas are the same as before, with hat matrix H = X(X'X)^{-1}X'.
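A quick sketch of the hat matrix on made-up data. A useful sanity check, not stated on the slide but standard, is that H is a projection matrix: symmetric and idempotent (HH = H).

```python
import numpy as np

# Made-up simple data: intercept column plus one predictor.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([2.0, 4.1, 5.9, 8.0])

H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix H = X (X'X)^{-1} X'
Y_hat = H @ Y                         # fitted values  Y-hat = HY
e = Y - Y_hat                         # residuals e = (I - H) Y

# H is idempotent: applying the projection twice changes nothing.
print(np.allclose(H @ H, H))  # True
```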

Linear Regression Models

The term linear here refers to the parameters, not the predictor variables. We can use linear regression models to deal with almost any function of a predictor variable (e.g. X², log(X), etc.). We cannot use linear regression models to deal with nonlinear functions of the parameters (unless we can find a transformation that makes them linear).

Types of Predictors

Continuous Predictors: we are used to these.

Qualitative Predictors: two possible outcomes (e.g. male/female), represented by 0 or 1.

Polynomial Regression: use squared or higher-order terms in the regression model; typically always include the lower-order terms.

Y_i = β_0 + β_1 X_i + β_2 X_i² + ... + β_{p-1} X_i^{p-1} + ε_i
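Building the design matrix for a polynomial model is mechanical; a sketch with a made-up predictor x and a cubic fit (so the lower-order terms 1, x, x² are included, as the slide advises):

```python
import numpy as np

# Made-up predictor values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Design matrix with columns 1, x, x^2, x^3 (intercept plus all
# lower-order terms up to the cubic).
X = np.vander(x, N=4, increasing=True)
print(X[1])  # row for x = 2: [1. 2. 4. 8.]
```

The resulting X plugs directly into the same least-squares machinery; the model stays linear in the β's even though it is cubic in x.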

Types of Predictors (2)

Using Transformed Variables: transform one or more X's, or transform Y.

Interaction Effects: use the product of predictor variables as an additional variable, with each variable in the product included by itself as well.

Y_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + β_3 X_{i1} X_{i2} + ε_i

More on some of these models later...

Analysis of Variance

Formulas for sums of squares (in matrix terms) are the same as before:

SSR = Σ(Ŷ_i - Ȳ)² = b'X'Y - (1/n) Y'JY

SSE = Σ(Y_i - Ŷ_i)² = e'e = Y'Y - b'X'Y

SSTO = Σ(Y_i - Ȳ)² = Y'Y - (1/n) Y'JY
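The matrix forms above can be checked numerically on made-up data; J is the n × n matrix of ones, and the decomposition SSTO = SSR + SSE should hold exactly.

```python
import numpy as np

# Made-up data: intercept column plus one predictor.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
Y = np.array([2.0, 4.0, 5.0, 9.0])
n = len(Y)
J = np.ones((n, n))  # n x n matrix of ones

b = np.linalg.solve(X.T @ X, X.T @ Y)

SSTO = Y @ Y - (Y @ J @ Y) / n   # Y'Y - (1/n) Y'JY
SSR = b @ X.T @ Y - (Y @ J @ Y) / n   # b'X'Y - (1/n) Y'JY
SSE = Y @ Y - b @ X.T @ Y        # Y'Y - b'X'Y

print(np.isclose(SSTO, SSR + SSE))  # True
```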

Analysis of Variance (2)

Degrees of freedom depend on the model:
- Always n - 1 total degrees of freedom.
- Model degrees of freedom equal the number of terms in the model (p - 1). Each variable has at least one term; there may be additional terms for squares, interactions, etc.
- Error degrees of freedom are the difference between total and model degrees of freedom (n - p).

Analysis of Variance (3)

Mean squares are obtained by dividing SS by DF for each source. The mean square error (MSE) is still, always, and forever, the estimate of σ².

ANOVA Table

Source              df     SS                     MS              F
Regression (Model)  p - 1  SSR = Σ(Ŷ_i - Ȳ)²      MSR = SSR/df_R  MSR/MSE
Error               n - p  SSE = Σ(Y_i - Ŷ_i)²    MSE = SSE/df_E
Total               n - 1  SSTO = Σ(Y_i - Ȳ)²

F-test for model significance

The ratio F = MSR / MSE is again used to test for a regression relationship.

Difference from SLR:
Null hypothesis H_0: β_1 = β_2 = ... = β_{p-1} = 0
Alternative hypothesis H_a: at least one β_k ≠ 0

Tests model significance, not individual variables; gives no indication of which variable(s) in particular are important.

F-test for model significance (2)

Under the null, the statistic has an F-distribution with p - 1 and n - p degrees of freedom. Reject if the statistic is larger than the critical value for α = 0.05, or if the p-value for the test (given in the SAS ANOVA table) is less than 0.05.

If we reject, conclude at least one of the explanatory variables is important. If we fail to reject, and the sample size is large enough (power), then none of the explanatory variables are useful.
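The overall F-test can be sketched end to end on made-up data. Here n = 6 and p = 3, so under H_0 the statistic is F(2, 3); the value 9.55 below is the standard α = 0.05 critical value for those degrees of freedom from an F table.

```python
import numpy as np

# Made-up data: intercept plus two predictors, with a strong signal.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0],
              [1.0, 6.0, 4.0]])
Y = np.array([5.1, 6.2, 10.8, 12.1, 16.0, 16.9])
n, p = X.shape

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
SSR = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)

F = (SSR / (p - 1)) / (SSE / (n - p))  # MSR / MSE
print(F > 9.55)  # compare to F(0.95; 2, 3) ~ 9.55 -> True here
```

In SAS, PROC REG reports this F statistic and its p-value in the ANOVA table; the hand computation above is only to make the formula concrete.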

Coefficient of Multiple Determination

R² = SSR/SSTO = 1 - SSE/SSTO

Measures the percentage of variation explained by the variables in the model. Additional variables will make R² go up, so we cannot really use R² to determine whether a variable should be added.

Adjusted R²

R²_a = 1 - MSE/MSTO = 1 - [(n - 1)/(n - p)] (SSE/SSTO)

Recall mean squares are SS adjusted by degrees of freedom. R²_a can increase or decrease when a new variable is introduced into the model, depending on whether the decrease in SSE is offset by the lost degree of freedom. R²_a can be used to decide if variables are important in a model.
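Both measures are one-liners once the sums of squares are in hand; a sketch on made-up data, showing that the degrees-of-freedom adjustment can only pull R²_a below R²:

```python
import numpy as np

# Made-up simple data: intercept plus one predictor.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n, p = X.shape

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = np.sum((Y - X @ b) ** 2)
SSTO = np.sum((Y - Y.mean()) ** 2)

R2 = 1 - SSE / SSTO                              # R^2
R2_adj = 1 - (n - 1) / (n - p) * SSE / SSTO      # adjusted R^2
print(R2_adj <= R2)  # True: (n-1)/(n-p) >= 1 inflates the penalty
```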

Inference for INDIVIDUAL Regression Coefficients

We already have b ~ N(β, σ²(X'X)^{-1}), so define

s²{b} = MSE (X'X)^{-1}   (a p × p matrix)

For an individual b_k, the estimated variance is the k-th diagonal element of this matrix:

s²{b_k} = [s²{b}]_{kk}

Note: k = 0, 1, ..., p - 1.

Confidence Intervals for β_k

CI for β_k is b_k ± t_crit s{b_k}. The critical value comes from a t-distribution with n - p degrees of freedom (DF for error).

If the CI includes zero, then we cannot reject H_0: β_k = 0 (i.e. that variable is not significant when added to the model containing all of the other variables).

Significance Test for β_k

Known as a t-test; it tests whether the k-th explanatory variable is important when added to all of the other variables in the model (i.e. it is a conditional test).

Test statistic is as before: t* = b_k / s{b_k}

Compare to the t critical value on n - p degrees of freedom. This is the test given in the model parameters section of SAS output for PROC REG.
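The standard errors, t statistics, and confidence intervals can be computed together from s²{b} = MSE (X'X)^{-1}. A sketch on made-up data; the critical value 3.182 is t(0.975) on n - p = 3 degrees of freedom from a t table.

```python
import numpy as np

# Made-up simple data: intercept plus one predictor.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = np.sum((Y - X @ b) ** 2) / (n - p)      # estimate of sigma^2
s_b = np.sqrt(MSE * np.diag(XtX_inv))         # s{b_k}, k = 0, ..., p-1

t_stats = b / s_b                             # t* = b_k / s{b_k}
t_crit = 3.182                                # t(0.975; n - p = 3 df)
ci = np.column_stack([b - t_crit * s_b,       # 95% CI for each beta_k
                      b + t_crit * s_b])
print(abs(t_stats[1]) > t_crit)  # slope clearly significant here: True
```

This reproduces by hand what the parameter-estimates table of PROC REG reports (estimate, standard error, t value).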

Upcoming in Lecture 11: Case Study, Computer Science Student Data