Multiple Regression: Inference for Multiple Regression and a Case Study. IPS Chapters 11.1 and 11.2. 2009 W.H. Freeman and Company


Objectives (IPS Chapters 11.1 and 11.2): multiple regression; data for multiple regression; the multiple linear regression model; estimation of the parameters; confidence interval for βj; significance test for βj; ANOVA table for multiple regression; squared multiple correlation R².

Data for multiple regression. Up to this point we have considered, in detail, the linear regression model with one explanatory variable x: ŷ = b0 + b1x. More complex linear models are usually needed in practical situations: there are many problems in which knowledge of more than one explanatory variable is necessary to obtain a better understanding and better prediction of a particular response. In general, we have data on n cases and p explanatory variables.

Multiple linear regression model. For p explanatory variables, we can express the population mean response μy as a linear equation: μy = β0 + β1x1 + … + βpxp. The statistical model for the n sample data points (i = 1, 2, …, n) is then: Data = fit + residual, that is, yi = (β0 + β1x1i + … + βpxpi) + εi, where the εi are independent and normally distributed N(0, σ). Multiple linear regression assumes equal variance σ² of y. The parameters of the model are β0, β1, …, βp.
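The model above can be made concrete by simulating from it. This is a minimal sketch with numpy; all the numbers (n, the β's, σ) are illustrative choices, not values from the text:

```python
import numpy as np

# Simulate data from the multiple linear regression model
#   y_i = beta0 + beta1*x_1i + ... + betap*x_pi + eps_i,  eps_i ~ N(0, sigma).
# All numbers below (n, betas, sigma) are illustrative, not from the text.
rng = np.random.default_rng(0)
n, p = 100, 3
beta = np.array([2.0, 0.5, -1.0, 0.8])        # beta0, beta1, beta2, beta3
sigma = 1.5                                   # common residual sd (equal variance)
X = rng.normal(size=(n, p))                   # p explanatory variables
mu_y = beta[0] + X @ beta[1:]                 # population mean response
y = mu_y + rng.normal(scale=sigma, size=n)    # data = fit + residual
```

Each row of X holds the p explanatory values for one case, and y adds independent N(0, σ) residuals to the linear mean response.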

Estimation of the parameters. We select a random sample of n individuals on which p + 1 variables are measured (x1, …, xp, y). The least-squares regression method minimizes the sum of the squared deviations ei = yi − ŷi to express y as a linear function of the p explanatory variables: ŷi = b0 + b1x1i + … + bpxpi. As with simple linear regression, the constant b0 is the y intercept. The regression coefficients b1, …, bp reflect the unique association of each independent variable with the y variable; they are analogous to the slope in simple regression. The estimates b0, …, bp are unbiased estimates of the population parameters β0, …, βp, and ŷ is an unbiased estimate of the mean response μy.
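The least-squares fit can be computed directly with numpy. A sketch on simulated data (the true coefficients below are illustrative; with small residual noise the estimates should land close to them):

```python
import numpy as np

# Least-squares estimation of b0, ..., bp on simulated data.
# True coefficients (1.0, 2.0, -0.5) are illustrative, not from the text.
rng = np.random.default_rng(1)
n, p = 200, 2
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

Xd = np.column_stack([np.ones(n), X])         # design matrix with intercept column
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)    # minimizes the sum of e_i^2
yhat = Xd @ b                                 # fitted values yhat_i
e = y - yhat                                  # residuals e_i = y_i - yhat_i
```

With an intercept in the model, the residuals sum to zero, and b holds the estimates (b0, b1, b2) in order.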

Confidence interval for βj. Estimating the regression parameters β0, …, βj, …, βp is a case of one-sample inference with unknown population variance, so we rely on the t distribution with n − p − 1 degrees of freedom. A level C confidence interval for βj is bj ± t* SEbj, where SEbj is the standard error of bj (we rely on software to obtain it) and t* is the critical value of the t(n − p − 1) distribution with area C between −t* and +t*.
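A sketch of how software obtains SEbj and the interval, on simulated data: the standard errors come from the diagonal of s²(XᵀX)⁻¹, and t* comes from the t(n − p − 1) distribution (here via scipy; all data values are illustrative):

```python
import numpy as np
from scipy import stats

# 95% confidence intervals b_j +/- t* SE(b_j), t* from t(n - p - 1).
# SE(b_j) is computed from s^2 * (X'X)^{-1}, as regression software does.
# Simulated data for illustration; true beta2 = 0.
rng = np.random.default_rng(2)
n, p = 80, 2
X = rng.normal(size=(n, p))
y = 0.5 + 1.5 * X[:, 0] + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)      # least-squares estimates
resid = y - Xd @ b
dfe = n - p - 1                               # degrees of freedom: n - p - 1
s2 = resid @ resid / dfe                      # estimates sigma^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
tstar = stats.t.ppf(0.975, dfe)               # t* for C = 95%
lo, hi = b - tstar * se, b + tstar * se       # the level-C interval for each beta_j
```

Note that t* exceeds the normal critical value 1.96, reflecting the extra uncertainty from estimating σ.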

Significance test for βj. To test the hypothesis H0: βj = 0 versus a one- or two-sided alternative, we calculate the t statistic t = bj/SEbj, which has the t(n − p − 1) distribution when H0 is true, and use it to find the p-value of the test. Note: software typically provides two-sided p-values.
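The test statistic and its two-sided p-value can be sketched the same way (simulated data; x1 is given a strong effect and x2 none, so the test should clearly reject H0: β1 = 0):

```python
import numpy as np
from scipy import stats

# t statistic t = b_j / SE(b_j), compared to t(n - p - 1) for a
# two-sided p-value. Simulated, illustrative data: true beta2 = 0.
rng = np.random.default_rng(3)
n, p = 80, 2
X = rng.normal(size=(n, p))
y = 0.5 + 1.5 * X[:, 0] + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
resid = y - Xd @ b
dfe = n - p - 1
s2 = resid @ resid / dfe
se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))

t = b / se                                   # t statistic for each b_j
pvals = 2 * stats.t.sf(np.abs(t), dfe)       # two-sided p-values
```

The p-value for b1 should be essentially zero here, while the p-value for b2 depends on the random draw, mirroring how an individual coefficient test behaves when the true β is zero.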

ANOVA F-test for multiple regression. For a multiple linear relationship, the ANOVA tests the hypotheses H0: β1 = β2 = … = βp = 0 versus Ha: H0 not true by computing the F statistic F = MSM/MSE. When H0 is true, F follows the F(p, n − p − 1) distribution. The p-value is P(F > f). A significant p-value doesn't mean that all p explanatory variables have a significant influence on y, only that at least one does.

ANOVA table for multiple regression

Source   Sum of squares SS   df          Mean square MS   F         P-value
Model    Σ(ŷi − ȳ)²          p           SSM/DFM          MSM/MSE   Tail area above F
Error    Σ(yi − ŷi)²         n − p − 1   SSE/DFE
Total    Σ(yi − ȳ)²          n − 1

The sums of squares and degrees of freedom add up: SST = SSM + SSE and DFT = DFM + DFE. The estimate s of the regression standard deviation σ is calculated from the residuals ei = yi − ŷi: s² = Σei²/(n − p − 1) = SSE/DFE = MSE. s² = MSE is an unbiased estimate of σ².
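The decomposition in the table can be verified numerically. A sketch on simulated, illustrative data, computing each sum of squares directly from the fitted values:

```python
import numpy as np

# ANOVA decomposition for multiple regression: SST = SSM + SSE,
# and F = MSM/MSE. Simulated data; coefficients are illustrative.
rng = np.random.default_rng(4)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
yhat = Xd @ b
ybar = y.mean()

SSM = np.sum((yhat - ybar) ** 2)     # model SS, df = p
SSE = np.sum((y - yhat) ** 2)        # error SS, df = n - p - 1
SST = np.sum((y - ybar) ** 2)        # total SS, df = n - 1
MSM, MSE = SSM / p, SSE / (n - p - 1)
F = MSM / MSE                        # compared to F(p, n - p - 1) under H0
```

Because the model includes an intercept, SST splits exactly into SSM + SSE, and with strong true effects F lands far out in the F(p, n − p − 1) tail.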

Squared multiple correlation R². Just as with simple linear regression, R², the squared multiple correlation, is the proportion of the variation in the response variable y that is explained by the model. In the particular case of multiple linear regression, the model is all p explanatory variables taken together: R² = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = SSM/SST.
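Computing R² as SSM/SST is a one-liner once the fit is in hand. A sketch with illustrative simulated numbers:

```python
import numpy as np

# R^2 = SSM/SST: the fraction of variation in y explained by the
# p explanatory variables together. Simulated, illustrative data.
rng = np.random.default_rng(5)
n, p = 120, 2
X = rng.normal(size=(n, p))
y = 2.0 + 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
yhat = Xd @ b
ybar = y.mean()

R2 = np.sum((yhat - ybar) ** 2) / np.sum((y - ybar) ** 2)   # SSM / SST
```

Equivalently, R² = 1 − SSE/SST, so it always lies between 0 and 1.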

We have data on 224 first-year computer science majors at a large university in a given year. The data for each student include: * Cumulative GPA after 2 semesters at the university (y, response variable) * SAT math score (SATM, x1, explanatory variable) * SAT verbal score (SATV, x2, explanatory variable) * Average high school grade in math (HSM, x3, explanatory variable) * Average high school grade in science (HSS, x4, explanatory variable) * Average high school grade in English (HSE, x5, explanatory variable) Here are the summary statistics for these data given by the software SAS:

The first step in multiple linear regression is to study all pair-wise relationships between the p + 1 variables. Here is the SAS output for all pair-wise correlation analyses (value of r and two-sided p-value for H0: ρ = 0). Scatterplots for all 15 pair-wise relationships are also necessary to understand the data.

For simplicity, let's first run a multiple linear regression using only the three high school grade averages. (Output annotations: P-value very significant; R² fairly small (20%); HSM significant; HSS and HSE not.)

The ANOVA for the multiple linear regression using only HSM, HSS, and HSE is very significant: at least one of the regression coefficients is significantly different from zero. But R² is fairly small (0.205): only about 20% of the variation in cumulative GPA can be explained by these high school scores. (Remember, a small p-value does not imply a large effect.)

The tests of hypotheses for each bj within the multiple linear regression reach significance for HSM only. We found a significant correlation between HSS and GPA when these were analyzed by themselves, so why is bHSS not significant in the multiple regression equation? Because HSS and HSM are themselves significantly correlated: when all three high school averages are used together in the multiple regression analysis, only HSM contributes significantly to our ability to predict GPA.

We now drop the least significant variable from the previous model: HSS. The P-value is still very significant and R² is still about 20%; HSM is significant, HSE is not. The conclusions are about the same, but notice that the actual regression coefficients have changed:
predicted GPA = 0.590 + 0.169 HSM + 0.045 HSE + 0.034 HSS
predicted GPA = 0.624 + 0.183 HSM + 0.061 HSE

Let's run a multiple linear regression with the two SAT scores only. The ANOVA test for βSATM and βSATV is very significant: at least one is not zero. But R² is very small (0.06): only 6% of the variation in GPA is explained by these tests. When taken together, only SATM is a significant predictor of GPA (P = 0.0007).

We finally run a multiple regression model with all five variables together. The overall test is significant (P-value very significant, R² fairly small at about 21%), but only the average high school math grade (HSM) makes a significant contribution in this model to predicting the cumulative GPA. This conclusion applies to computer science majors at this large university.

[Minitab and Excel output for the same regression, not shown.]