Chapter 13. Multiple Regression and Model Building


Multiple Regression Models: The General Multiple Regression Model is y = β0 + β1x1 + β2x2 + ... + βkxk + ε, where y is the dependent variable, x1, x2, ..., xk are the independent variables, E(y) = β0 + β1x1 + β2x2 + ... + βkxk is the deterministic portion of the model, and βi determines the contribution of the independent variable xi.

Multiple Regression Models: Analyzing a Multiple Regression Model
1. Hypothesize the deterministic component of the model.
2. Use sample data to estimate β0, β1, β2, ..., βk.
3. Specify the probability distribution of ε and estimate σ.
4. Check that the assumptions on ε are satisfied.
5. Statistically evaluate the usefulness of the model.
6. If the model is deemed useful, use it for prediction, estimation, and other purposes.

The First-Order Model: Estimating and Interpreting the β-Parameters. For E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5, the fitted model ŷ = β̂0 + β̂1x1 + ... + β̂kxk is chosen to minimize SSE = Σ(yi − ŷi)².

The First-Order Model: Estimating and Interpreting the β-Parameters. y = β0 + β1x1 + β2x2 + β3x3 + ε, where y = sales price (dollars), x1 = appraised land value (dollars), x2 = appraised improvements (dollars), and x3 = area (square feet).
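
The slides fit this model in Excel; as a rough illustration of the same least-squares step in code, here is a minimal Python/statsmodels sketch. The data are simulated stand-ins for the textbook's n = 20 property-sales sample (the ranges and coefficients used to generate them are assumptions, not the book's data).

```python
# Minimal sketch: fitting a first-order model by least squares.
# The data below are simulated stand-ins for the n = 20 property-sales sample.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20
x1 = rng.uniform(2_000, 60_000, n)      # appraised land value ($), assumed range
x2 = rng.uniform(10_000, 80_000, n)     # appraised improvements ($), assumed range
x3 = rng.uniform(800, 3_000, n)         # living area (sq. ft.), assumed range
y = 1_470 + 0.81 * x1 + 0.82 * x2 + 13.5 * x3 + rng.normal(0, 5_000, n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))  # adds the beta_0 column of ones
fit = sm.OLS(y, X).fit()                # least squares: minimizes SSE
print(fit.params)                       # beta_0-hat, beta_1-hat, beta_2-hat, beta_3-hat
print(fit.ssr)                          # SSE = sum of (y_i - y_hat_i)^2
```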

The First-Order Model: Estimating and Interpreting the β-Parameters. Plot of the data for sample size n = 20.

The First-Order Model: Estimating and Interpreting the β-Parameters. Fit the model to the data.

The First-Order Model: Estimating and Interpreting the β-Parameters. Interpreting the β estimates: β̂1 = .8145, so E(y), the mean sale price of the property, is estimated to increase $.8145 for every $1 increase in appraised land value, holding the other variables constant. β̂2 = .8204, so the mean sale price is estimated to increase $.8204 for every $1 increase in appraised improvements, holding the other variables constant. β̂3 = 13.53, so the mean sale price is estimated to increase $13.53 for each additional square foot of living area, holding the other variables constant.

The First-Order Model: Estimating and Interpreting the β-Parameters. Given the model E(y) = 1 + 2x1 + x2, the effect of a 1-unit increase in x2 on E(y), holding x1 constant, is 1, the coefficient of x2.


Model Assumptions. Assumptions about the Random Error ε: 1. For any given set of values of x1, x2, ..., xk, the random error ε has a normal probability distribution with mean 0 and variance σ². 2. The random errors are independent. Estimator of σ² for a Multiple Regression Model with k Independent Variables: s² = SSE / (n − number of estimated β parameters) = SSE / [n − (k + 1)].
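
A minimal sketch of this σ² estimator, using a small made-up residual vector; the numbers and k = 2 are assumptions chosen only to show the arithmetic.

```python
# Minimal sketch of s^2 = SSE / (n - (k + 1)) with a made-up residual vector.
import numpy as np

residuals = np.array([1.2, -0.7, 0.3, -1.1, 0.9, -0.6])   # y_i - y_hat_i (hypothetical)
n, k = residuals.size, 2
sse = np.sum(residuals ** 2)
s2 = sse / (n - (k + 1))          # divisor = n minus the number of estimated betas
s = np.sqrt(s2)                   # estimated standard deviation of epsilon
print(sse, s2, s)
```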

Inferences about the β-Parameters. Two types of inferences can be made, using either confidence intervals or hypothesis tests. For any inferences to be valid, the assumptions made about the random error term ε (normal distribution with mean 0 and variance σ², independence of errors) must be met.

Inferences about the β-Parameters. A 100(1 − α)% Confidence Interval for a β-Parameter: β̂i ± t(α/2) · s(β̂i), where t(α/2) is based on n − (k + 1) degrees of freedom, n = number of observations, and k + 1 = number of β parameters in the model.
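
A minimal sketch of this confidence interval in Python with scipy; the estimate, standard error, n, and k below are illustrative values (the standard error in particular is made up).

```python
# Minimal sketch of a 100(1 - alpha)% CI: beta_i-hat +/- t_{alpha/2} * s_{beta_i-hat}.
from scipy import stats

beta_hat, se_beta = 0.8145, 0.21      # hypothetical estimate and standard error
n, k, alpha = 20, 3, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - (k + 1))   # t_{alpha/2} on n-(k+1) df
ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
print(ci)
```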

Inferences about the β-Parameters. A Test of an Individual Parameter Coefficient. One-tailed test: H0: βi = 0, Ha: βi < 0 (or Ha: βi > 0). Two-tailed test: H0: βi = 0, Ha: βi ≠ 0. Test statistic: t = β̂i / s(β̂i). Rejection region, one-tailed: t < −tα (or t > tα when Ha: βi > 0); two-tailed: |t| > t(α/2). Here tα and t(α/2) are based on n − (k + 1) degrees of freedom.
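
The same made-up numbers can be pushed through the individual-coefficient t-test; a minimal sketch, assuming a two-tailed alternative.

```python
# Minimal sketch of t = beta_i-hat / s_{beta_i-hat} with a two-tailed p-value.
from scipy import stats

beta_hat, se_beta = 0.8145, 0.21      # hypothetical estimate and standard error
n, k, alpha = 20, 3, 0.05
t_stat = beta_hat / se_beta
df = n - (k + 1)
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)
print(t_stat, p_two_tailed, p_two_tailed < alpha)   # reject H0: beta_i = 0 if p < alpha
```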

Inferences about the β-Parameters. An Excel analysis: the regression output provides the quantities used for hypothesis tests about the parameter coefficients and for confidence intervals.

Checking the Overall Utility of a Model. 3 tests:
1. Multiple coefficient of determination: R² = 1 − SSE/SSyy = (SSyy − SSE)/SSyy = explained variability / total variability.
2. Adjusted multiple coefficient of determination: R²a = 1 − [(n − 1)/(n − (k + 1))](SSE/SSyy) = 1 − [(n − 1)/(n − (k + 1))](1 − R²).
3. Global F-test: test statistic F = [(SSyy − SSE)/k] / [SSE/(n − (k + 1))] = (R²/k) / [(1 − R²)/(n − (k + 1))].

Checking the Overall Utility of a Model. Testing Global Usefulness of the Model: The Analysis of Variance F-Test. H0: β1 = β2 = ... = βk = 0; Ha: at least one βi ≠ 0. Test statistic: F = [(SSyy − SSE)/k] / [SSE/(n − (k + 1))] = (R²/k) / [(1 − R²)/(n − (k + 1))] = Mean Square (Model) / Mean Square (Error), where n is the sample size and k is the number of terms in the model. Rejection region: F > Fα, with k numerator degrees of freedom and n − (k + 1) denominator degrees of freedom.
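
A minimal sketch computing R², adjusted R², and the global F statistic directly from SSyy and SSE; the sums of squares, n, and k below are hypothetical values.

```python
# Minimal sketch: R^2, adjusted R^2, and the global F-test from SS_yy and SSE.
from scipy import stats

ss_yy, sse = 9_000.0, 1_500.0          # total and unexplained variability (hypothetical)
n, k, alpha = 20, 3, 0.05
r2 = 1 - sse / ss_yy
r2_adj = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)
f_stat = (r2 / k) / ((1 - r2) / (n - (k + 1)))
f_crit = stats.f.ppf(1 - alpha, k, n - (k + 1))
print(r2, r2_adj, f_stat, f_stat > f_crit)   # reject H0: beta_1 = ... = beta_k = 0 if F > F_alpha
```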

Checking the Overall Utility of a Model. Checking the Utility of a Multiple Regression Model: 1. Conduct a test of overall model adequacy using the F-test; if H0 is rejected, proceed to step 2. 2. Conduct t-tests on the β parameters of particular interest.

Using the Model for Estimation and Prediction As in Simple Linear Regression, intervals around a predicted value will be wider than intervals around an estimated value Most statistics packages will print out both confidence and prediction intervals

Model Building: Interaction Models. An Interaction Model Relating E(y) to Two Quantitative Independent Variables: E(y) = β0 + β1x1 + β2x2 + β3x1x2, where (β1 + β3x2) represents the change in E(y) for every 1-unit increase in x1, holding x2 fixed, and (β2 + β3x1) represents the change in E(y) for every 1-unit increase in x2, holding x1 fixed.
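
A minimal sketch of fitting an interaction model, assuming simulated data: the interaction enters simply as the product column x1·x2, and the estimated slope of E(y) in x1 at a fixed x2 is recovered as β̂1 + β̂3x2.

```python
# Minimal sketch: interaction model E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y = 2 + 1.5 * x1 + 0.8 * x2 + 0.4 * x1 * x2 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))   # interaction term is the product x1*x2
fit = sm.OLS(y, X).fit()
b = fit.params
print(b[1] + b[3] * 5.0)    # estimated change in E(y) per unit x1 when x2 is fixed at 5
```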

Model Building: Interaction Models. No interaction: the relationship between y and xi is not affected by a second x. Interaction: the linear relationship between y and xi depends on the value of another x.

Model Building: Interaction Models

Model Building: Quadratic and other Higher-Order Models. A Quadratic (Second-Order) Model: E(y) = β0 + β1x + β2x², where β0 is the y-intercept of the curve, β1 is a shift parameter, and β2 is the rate of curvature.

Model Building: Quadratic and other Higher-Order Models. Home Size-Electrical Usage Data:

Size of Home, x (sq. ft.)    Monthly Usage, y (kilowatt-hours)
1,290                        1,182
1,350                        1,172
1,470                        1,264
1,600                        1,493
1,710                        1,571
1,840                        1,711
1,980                        1,804
2,230                        1,840
2,400                        1,95
2,930                        1,954

Model Building: Quadratic and other Higher-Order Models. Fitted model: ŷ = −1,216.1 + 2.3989x − .00045x².
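
A minimal sketch of fitting a quadratic model by adding an x² column; the data below are simulated (with generating coefficients loosely echoing the fitted equation above), not the textbook's home-size sample.

```python
# Minimal sketch: quadratic model E(y) = b0 + b1*x + b2*x^2, fit by adding an x^2 column.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1_200, 3_000, 40)                        # e.g. home size in sq. ft. (assumed range)
y = -1_200 + 2.4 * x - 0.00045 * x**2 + rng.normal(0, 40, x.size)

X = sm.add_constant(np.column_stack([x, x**2]))
fit = sm.OLS(y, X).fit()
print(fit.params)     # b0-hat (intercept), b1-hat (shift), b2-hat (rate of curvature)
```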

Model Building: Quadratic and other Higher-Order Models. A Complete Second-Order Model with Two Quantitative Independent Variables: E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2², where β0 is the y-intercept (the value of E(y) when x1 = x2 = 0); changes in β1 and β2 cause the surface to shift along the x1 and x2 axes; β3 controls the rotation of the surface; and β4 and β5 control the type of surface and the rates of curvature.

Model Building: Quadratic and other Higher-Order Models

Model Building: Qualitative (Dummy) Variable Models. Dummy variables are coded qualitative variables. Codes are in the form (1, 0), with 1 indicating the presence of a condition and 0 its absence. Create dummy variables so that there is one less dummy variable than the number of categories of the qualitative variable of interest. Example: a gender dummy variable coded as x = 1 if male, x = 0 if female; if the model is E(y) = β0 + β1x, β1 captures the effect of being male on the dependent variable.
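
A minimal sketch of dummy-variable coding for a two-level qualitative variable, using made-up data.

```python
# Minimal sketch: code a two-level qualitative variable as a single 0/1 dummy
# (one fewer dummy than the number of categories) and fit E(y) = b0 + b1*x.
import numpy as np
import statsmodels.api as sm

gender = np.array(["male", "female", "male", "female", "male", "female", "male", "female"])
y = np.array([52.0, 47.0, 55.0, 45.0, 53.0, 48.0, 56.0, 46.0])   # hypothetical responses

x = (gender == "male").astype(float)        # x = 1 if male, 0 if female
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
print(fit.params)   # [beta_0-hat, beta_1-hat]; beta_1-hat estimates the effect of being male
```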

Model Building: Models with both Quantitative and Qualitative Variables. Start with a first-order model with one quantitative variable, E(y) = β0 + β1x1. Adding a qualitative variable (represented by dummy variables x2 and x3) with no interaction gives E(y) = β0 + β1x1 + β2x2 + β3x3.

Model Building: Models with both Quantitative and Qualitative Variables. Adding interaction terms gives E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3, where β1x1 is the main effect of x1, β2x2 + β3x3 is the main effect of the qualitative variable (x2 and x3), and β4x1x2 + β5x1x3 are the interaction terms.

Model Building: Comparing Nested Models Models are nested if one model contains all the terms of the other model and at least one additional term. Complete (full) model the more complex model Reduced model the simpler model

Model Building: Comparing Nested Models. Example: complete (full) model E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²; reduced model E(y) = β0 + β1x1 + β2x2 + β3x1x2.

Model Building: Comparing Nested Models. F-Test for Comparing Nested Models. Reduced model: E(y) = β0 + β1x1 + ... + βgxg. Complete model: E(y) = β0 + β1x1 + ... + βgxg + βg+1xg+1 + ... + βkxk. H0: βg+1 = βg+2 = ... = βk = 0; Ha: at least one β under test is nonzero. Test statistic: F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = [(SSE_R − SSE_C)/(number of β's tested in H0)] / MSE_C. Rejection region: F > Fα, with k − g numerator degrees of freedom and n − (k + 1) denominator degrees of freedom.
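
A minimal sketch of the nested-model F-test on simulated data; statsmodels' compare_f_test reproduces the partial-F statistic above (the data-generating coefficients are assumptions).

```python
# Minimal sketch: fit reduced and complete models on the same simulated data and
# compare them with F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))].
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 60
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y = 1 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 + rng.normal(0, 2, n)

X_reduced = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
X_complete = sm.add_constant(np.column_stack([x1, x2, x1 * x2, x1**2, x2**2]))
fit_r = sm.OLS(y, X_reduced).fit()
fit_c = sm.OLS(y, X_complete).fit()

f_stat, p_value, df_diff = fit_c.compare_f_test(fit_r)   # tests H0: beta_4 = beta_5 = 0
print(f_stat, p_value, df_diff)
```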

Model Building: Stepwise Regression. Used when there is a large set of candidate independent variables. Software packages add variables in order of explanatory value, with decisions based on the largest t-values at each step. The procedure is best used as a screening procedure only.

Residual Analysis: Checking the Regression Assumptions. Regression residual: the difference between an observed y value and its corresponding predicted value, ε̂ = y − ŷ. Properties of regression residuals: the mean of the residuals equals zero, and the standard deviation of the residuals is equal to the standard deviation of the fitted regression model.

Residual Analysis: Checking the Regression Assumptions. Analyzing residuals: the top plot of residuals reveals a non-random, curved pattern. The second plot, produced after a second-order term is added to the model, shows a random pattern, indicating a better model.

Residual Analysis: Checking the Regression Assumptions. Identifying outliers: residual plots can reveal outliers. Outliers should be checked to determine whether an error is involved. If an error is involved, or the observation is not representative, the analysis can be rerun after deleting the data point to assess its effect.

Residual Analysis: Checking the Regression Assumptions. Checking for normal errors: compare the distribution of residuals with the outlier included and with it removed.

Residual Analysis: Checking the Regression Assumptions. Checking for equal variances: a pattern in the residuals indicates a violation of the equal-variance assumption and can point to the use of a transformation of the dependent variable to stabilize the variance.

Residual Analysis: Checking the Regression Assumptions Steps in Residual Analysis 1. Check for misspecified model by plotting residuals against quantitative independent variables 2. Examine residual plots for outliers 3. Check for non-normal error using frequency distribution of residuals 4. Check for unequal error variances using plots of residuals against predicted values
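
A minimal sketch of these residual checks on simulated data, where the true relationship is quadratic but a first-order model is deliberately fit so the misspecification shows up in the plots.

```python
# Minimal sketch of the residual checks listed above, on simulated data:
# residuals vs. x (misspecification), vs. fitted values (unequal variance),
# and a histogram of residuals (non-normal errors).
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 80)
y = 3 + 2 * x + 0.5 * x**2 + rng.normal(0, 2, x.size)   # true model is quadratic

fit = sm.OLS(y, sm.add_constant(x)).fit()                # deliberately misspecified first-order fit
resid, fitted = fit.resid, fit.fittedvalues

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(x, resid);      axes[0].set_title("residuals vs. x (curved = misspecified)")
axes[1].scatter(fitted, resid); axes[1].set_title("residuals vs. y-hat (fan = unequal variance)")
axes[2].hist(resid, bins=15);   axes[2].set_title("residual histogram (check normality)")
plt.show()
```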

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation. Estimability: the number of levels of observed x-values must be one more than the order of the polynomial in x that you want to fit. Multicollinearity: when two or more independent variables are correlated.

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation. Multicollinearity, when two or more independent variables are correlated, leads to confusing and misleading results, including incorrect signs on parameter estimates. It can be identified by checking the correlations among the x's, by non-significant t-tests for most or all of the x's, or by estimated β parameters with signs opposite from those expected. It can be addressed by dropping one or more of the correlated variables from the model, or by restricting inferences to the range of the sample data and not making inferences about individual β parameters based on t-tests.
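
A minimal sketch of checking correlations among the x's on simulated data; the variance inflation factors shown alongside are a common companion check, not something the slides cover.

```python
# Minimal sketch: detect multicollinearity via pairwise correlations among the x's,
# plus variance inflation factors; x2 is simulated to be nearly a copy of x1.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.05, n)        # highly correlated with x1
x3 = rng.normal(0, 1, n)

print(np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False))   # large off-diagonal r flags trouble

X = sm.add_constant(np.column_stack([x1, x2, x3]))
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print(vifs)   # large VIFs flag the correlated pair
```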

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation. Extrapolation: using the model to predict outside the range of the sample data is dangerous. Correlated errors: most common when working with time series data, where the values of y and the x's are observed over a period of time; the solution is to develop a time series model.