Chapter 7 Student Lecture Notes 7-1


QM353: Business Statistics
Chapter 7: Multiple Regression Analysis and Model Building

Chapter Goals

After completing this chapter, you should be able to:
- Explain model building using multiple regression analysis
- Apply multiple regression analysis to business decision-making situations
- Analyze and interpret the computer output for a multiple regression model
- Test the significance of the independent variables in a multiple regression model
- Recognize potential problems in multiple regression analysis and take steps to correct the problems
- Incorporate qualitative variables into the regression model by using dummy variables
- Use variable transformations to model nonlinear relationships
- Test whether the coefficients of a regression model are useful

The Multiple Regression Model

Idea: examine the linear relationship between a dependent variable (y) and two or more independent variables (x_i).

Population model, with y-intercept β0, population slopes β1, ..., βk, and random error ε:

    y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε

Estimated multiple regression model, with estimated intercept b0 and estimated slope coefficients b1, ..., bk (ŷ is the estimated, or predicted, value of y):

    ŷ = b0 + b1 x1 + b2 x2 + ... + bk xk

With two independent variables the fitted equation ŷ = b0 + b1 x1 + b2 x2 describes a plane; with more variables it is called the regression hyperplane. For each sample observation, the residual is e = y - ŷ. The best-fit equation is found by minimizing the sum of squared errors, Σe².

© 2008 Prentice-Hall, Inc.
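The least-squares idea above can be sketched in code. This is a minimal pure-Python illustration for the two-predictor case (not the textbook's Excel/PHStat workflow; the function name is my own), solving the centered normal equations directly:

```python
def ols_two_predictors(y, x1, x2):
    """Least-squares fit of y-hat = b0 + b1*x1 + b2*x2 by solving
    the 2x2 normal equations in centered (deviation) form."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2      # zero would mean perfectly collinear x's
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2     # the fitted plane passes through the means
    return b0, b1, b2

# Sanity check on data generated from an exact plane y = 1 + 2*x1 + 3*x2:
x1 = [0, 1, 2, 0, 1]
x2 = [0, 0, 1, 1, 2]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
b0, b1, b2 = ols_two_predictors(y, x1, x2)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # 1.0 2.0 3.0
```

Because the data here lie exactly on a plane, least squares recovers the generating coefficients; on real data it returns the plane minimizing Σe².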

Multiple Regression Assumptions

Errors (residuals) from the regression model: e = y - ŷ
- The model errors are statistically independent and represent a random sample from the population of all possible errors
- For a given x there can exist many values of y, and thus many possible values of ε
- The errors are normally distributed
- The mean of the errors is zero
- The errors have a constant variance

Basic Model-Building Concepts

Models are used to test changes without actually implementing them, and can be used to predict outputs based on specified inputs. Model building consists of three components: model specification, model fitting, and model diagnosis.

Model Specification
- Sometimes referred to as model identification; this is the process of establishing the framework for the model
- Decide what you want to do and select the dependent variable (y)
- Determine the potential independent variables (x) for your model
- Gather sample data (observations) for all variables

Model Fitting
- The process of actually constructing the equation from the data
- May include some or all of the independent variables (x)
- The goal is to explain the variation in the dependent variable (y) with the selected independent variables (x)

Model Diagnosis
- Analyze the quality of the model (perform diagnostic checks)
- Assess the extent to which the assumptions appear to be satisfied
- If the model is unacceptable, begin the model-building process again
- Use the simplest model that meets your needs; the goal is to help you make better decisions

Example

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
- Dependent variable: pie sales (units per week)
- Independent variables: price (in $) and advertising (in $100s)
- Data are collected for 15 weeks

Formulate the Model

Week   Pie Sales   Price ($)   Advertising ($100s)
 1     350         5.50        3.3
 2     460         7.50        3.3
 3     350         8.00        3.0
 4     430         8.00        4.5
 5     350         6.80        3.0
 6     380         7.50        4.0
 7     430         4.50        3.0
 8     470         6.40        3.7
 9     450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7

Multiple regression model: Sales = b0 + b1(Price) + b2(Advertising)

The Correlation Matrix

              Pie Sales   Price     Advertising
Pie Sales      1
Price         -0.44327    1
Advertising    0.55632    0.03044   1

Correlations between the dependent variable and the independent variables can be found using Excel (Data tab: Data Analysis / Correlation); the statistical significance of a correlation can be checked with a t test.
- Price vs. Sales: r = -0.44327, a negative association between price and sales
- Advertising vs. Sales: r = 0.55632, a positive association between advertising and sales

Interpretation of Estimated Coefficients

Slope (b_i): estimates that the average value of y changes by b_i units for each one-unit increase in x_i, holding all other variables constant. Example: if b1 = -20, then sales (y) are expected to decrease by an estimated 20 pies per week for each $1 increase in selling price (x1), net of the effects of changes due to advertising (x2).

y-intercept (b0): the estimated average value of y when all x_i = 0 (assuming all x_i = 0 is within the range of observed values).

Scatter Diagrams

[Figure: scatter plots of Sales vs. Price and Sales vs. Advertising]

Estimating a Multiple Linear Regression Equation

Computer software is generally used to generate the coefficients and the measures of goodness of fit for multiple regression.
- Excel: Data / Data Analysis / Regression
- PHStat: Add-Ins / PHStat / Regression / Multiple Regression
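The correlation matrix above can be reproduced with a few lines of pure Python (a sketch, assuming the weekly data as transcribed in the table; `pearson_r` is my own helper, not part of the text):

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

sales  = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price  = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advert = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

print(round(pearson_r(price, sales), 5))   # ≈ -0.44327
print(round(pearson_r(advert, sales), 5))  # ≈  0.55632
print(round(pearson_r(price, advert), 5))  # ≈  0.03044
```

The near-zero Price/Advertising correlation foreshadows the multicollinearity check later in the chapter.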

Multiple Regression Output

Regression Statistics
  Multiple R           0.72213
  R Square             0.52148
  Adjusted R Square    0.44172
  Standard Error       47.46341
  Observations         15

ANOVA
               df    SS          MS          F         Significance F
  Regression    2    29460.027   14730.013   6.53861   0.01201
  Residual     12    27033.306    2252.776
  Total        14    56493.333

               Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
  Intercept    306.52619      114.25389         2.68285   0.01993    57.58835   555.46404
  Price        -24.97509       10.83213        -2.30565   0.03979   -48.57626    -1.37392
  Advertising   74.13096       25.96732         2.85478   0.01449    17.55303   130.70888

The Multiple Regression Equation

    Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where Sales is the number of pies per week, Price is in $, and Advertising is in $100s.
- b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising
- b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price

Using the Model to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

    Sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62

Predicted sales is 428.62 pies. Note that Advertising is in $100s, so $350 means x2 = 3.5.

Multiple Coefficient of Determination (R²)

Reports the proportion of total variation in y explained by all x variables taken together:

    R² = SSR/SST = (sum of squares regression)/(total sum of squares)
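As a cross-check on the regression output above, a small pure-Python least-squares fit (my own sketch, solving the two-predictor normal equations; the data are as transcribed in the weekly table) reproduces the reported coefficients and the prediction:

```python
def ols_two_predictors(y, x1, x2):
    """Fit y-hat = b0 + b1*x1 + b2*x2 via the centered normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

sales  = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price  = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advert = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

b0, b1, b2 = ols_two_predictors(sales, price, advert)
pred = b0 + b1 * 5.50 + b2 * 3.5   # price $5.50, advertising $350
print(round(b0, 3), round(b1, 3), round(b2, 3))  # ≈ 306.526 -24.975 74.131
print(round(pred, 2))                            # ≈ 428.62
```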

Multiple Coefficient of Determination (continued)

From the ANOVA table:

    R² = SSR/SST = 29460.0/56493.3 = 0.52148

52.1% of the variation in pie sales is explained by the variation in price and advertising.

Adjusted R²

R² never decreases when a new x variable is added to the model, which can be a disadvantage when comparing models. What is the net effect of adding a new variable? We lose a degree of freedom when a new x variable is added; did the new variable add enough explanatory power to offset that loss?

Adjusted R² shows the proportion of variation in y explained by all x variables, adjusted for the number of x variables used (n = sample size, k = number of independent variables):

    R²_A = 1 - (1 - R²)(n - 1)/(n - k - 1)

- Penalizes excessive use of unimportant independent variables
- Is smaller than R²
- Is useful in comparing models

From the output, R²_A = 0.44172: 44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and the number of independent variables.

Model Diagnosis: F-Test

The F-test for overall significance of the model shows whether there is a linear relationship between all of the x variables considered together and y.

Hypotheses:
    H0: β1 = β2 = ... = βk = 0   (no linear relationship)
    HA: at least one βi ≠ 0      (at least one independent variable affects y)

Test statistic:

    F = (SSR/k) / (SSE/(n - k - 1)) = MSR/MSE

where F has D1 = k (numerator) and D2 = n - k - 1 (denominator) degrees of freedom.
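The R² and adjusted-R² arithmetic above can be verified directly from the ANOVA quantities (a sketch; the numbers are taken from the output shown):

```python
# ANOVA quantities from the pie-sales regression output
ssr, sse, sst = 29460.027, 27033.306, 56493.333
n, k = 15, 2    # sample size and number of independent variables

r2 = ssr / sst                                   # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # penalized for k predictors

print(round(r2, 5))      # ≈ 0.52148
print(round(adj_r2, 5))  # ≈ 0.44172
```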

F-Test for Overall Significance

From the output:

    F = MSR/MSE = 14730.0/2252.8 = 6.5386

with D1 = 2 and D2 = 12 degrees of freedom; the p-value for the F-test is 0.01201.

H0: β1 = β2 = 0; HA: β1 and β2 not both zero; α = 0.05.
Critical value: F0.05 = 3.885 with 2 and 12 degrees of freedom. Since F = 6.5386 > 3.885, reject H0 at α = 0.05.
Conclusion: the regression model does explain a significant portion of the variation in pie sales (there is evidence that at least one independent variable affects y).

Model Diagnosis: Are Individual Variables Significant?

Use t-tests of the individual variable slopes. Each test shows whether there is a linear relationship between the variable x_i and y.

Hypotheses:
    H0: βi = 0   (no linear relationship)
    HA: βi ≠ 0   (a linear relationship does exist between x_i and y)

Test statistic (df = n - k - 1):

    t = b_i / s_{b_i}

From the output:
- t-value for Price: t = -2.306, with p-value 0.0398
- t-value for Advertising: t = 2.855, with p-value 0.0145

Inferences about the Slope: t-Test Example

d.f. = 15 - 2 - 1 = 12 and α = 0.05, so t_{α/2} = 2.1788 (α/2 = 0.025). Reject H0 if t < -2.1788 or t > 2.1788.

               Coefficients   Standard Error   t Stat     P-value
  Price        -24.97509      10.83213         -2.30565   0.03979
  Advertising   74.13096      25.96732          2.85478   0.01449

The test statistic for each variable falls in the rejection region (both p-values < 0.05).
Decision: reject H0 for each variable.
Conclusion: there is evidence that both Price and Advertising affect pie sales at α = 0.05.
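The F and t statistics above are simple ratios of quantities in the output, which a few lines of Python can confirm (a sketch using the reported values):

```python
# Values from the pie-sales regression output
msr, mse = 14730.013, 2252.776
coef = {"Price": -24.97509, "Advertising": 74.13096}
se   = {"Price": 10.83213,  "Advertising": 25.96732}

f_stat = msr / mse                                    # overall F-test statistic
t_stats = {name: coef[name] / se[name] for name in coef}

print(round(f_stat, 3))                  # ≈ 6.539
print(round(t_stats["Price"], 3))        # ≈ -2.306
print(round(t_stats["Advertising"], 3))  # ≈ 2.855
```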

Standard Deviation of the Regression Model

The estimate of the standard deviation of the regression model is:

    s_ε = sqrt(SSE/(n - k - 1)) = sqrt(MSE)

Is this value large or small? It must be compared to the mean size of y.

From the output, the standard deviation of the regression model is 47.46. A rough prediction range for pie sales in a given week is ±2(47.46) = ±94.9 pies. Pie sales in the sample were in the 300 to 490 per week range, so this range is probably too large to be acceptable. The analyst may want to look for additional variables that can explain more of the variation in weekly sales.

Model Diagnosis: Multicollinearity

Multicollinearity: high correlation exists between two independent variables, so the variables overlap and contribute redundant information to the multiple regression model.

Including two highly correlated independent variables can adversely affect the regression results:
- No new information is provided
- It can lead to unstable coefficients (large standard errors and low t-values)
- Coefficient signs may not match prior expectations

Some indications of severe multicollinearity:
- Incorrect signs on the coefficients
- A large change in the value of a previous coefficient when a new variable is added to the model
- A previously significant variable becomes insignificant when a new independent variable is added
- The estimate of the standard deviation of the model increases when a variable is added to the model
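The standard error of the estimate and the rough ±2s prediction range can be computed directly (a sketch using the ANOVA values from the output):

```python
from math import sqrt

sse, n, k = 27033.306, 15, 2
s_e = sqrt(sse / (n - k - 1))   # standard error of the estimate, sqrt(MSE)

print(round(s_e, 4))            # ≈ 47.4634
print(round(2 * s_e, 1))        # rough ±2s prediction range, ≈ 94.9 pies
```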

Detecting Collinearity (Variance Inflationary Factor)

VIF_j is used to measure collinearity:

    VIF_j = 1 / (1 - R²_j)

where R²_j is the coefficient of determination when the jth independent variable is regressed against the remaining independent variables. If VIF_j > 5, x_j is highly correlated with the other explanatory variables.

Detecting collinearity in PHStat: Add-Ins / PHStat / Regression / Multiple Regression, then check the variance inflationary factor (VIF) box.

Output for the pie sales example (Price regressed against all other X):

  Multiple R            0.0304
  R Square              0.0009
  Adjusted R Square    -0.0759
  Standard Error        1.2153
  Observations          15
  VIF                   1.0009

Since there are only two explanatory variables, only one VIF is reported. VIF is < 5, so there is no evidence of collinearity between Price and Advertising.

Confidence Interval Estimate for the Slope

Confidence interval for the population slope β1 (the effect of changes in price on pie sales):

    b_i ± t_{α/2} s_{b_i}

where t has (n - k - 1) d.f.

               Coefficients   Standard Error   Lower 95%   Upper 95%
  Intercept    306.52619      114.25389         57.58835   555.46404
  Price        -24.97509       10.83213        -48.57626    -1.37392
  Advertising   74.13096       25.96732         17.55303   130.70888

Example: weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each $1 increase in the selling price.
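With only two explanatory variables, R²_j for Price is simply the squared Price/Advertising correlation, so the single reported VIF can be checked by hand (a sketch using the correlation from the matrix earlier):

```python
# Correlation between Price and Advertising from the correlation matrix
r_price_adv = 0.03044
r2_j = r_price_adv ** 2        # R²_j when Price is regressed on Advertising
vif = 1 / (1 - r2_j)

print(round(vif, 5))           # ≈ 1.00093, far below the usual cutoff of 5
```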
Qualitative (Dummy) Variables

A categorical explanatory variable (dummy variable) has two or more levels: yes or no, on or off, male or female; or freshman, sophomore, etc., for class standing. Dummy variables are sometimes called indicator variables.
- Regression intercepts differ if the variable is significant
- Equal slopes are assumed for the other variables
- The number of dummy variables needed is (number of levels - 1)

Dummy-Variable Model Example (with 2 Levels)

    ŷ = b0 + b1 x1 + b2 x2

Let y = pie sales and x1 = price, with x2 = holiday (x2 = 1 if a holiday occurred during the week, x2 = 0 if there was no holiday that week). Then:

    Holiday:     ŷ = (b0 + b2) + b1 x1   (different intercept)
    No holiday:  ŷ = b0 + b1 x1          (same slope)

If H0: β2 = 0 is rejected, then Holiday has a significant effect on pie sales.

© 2008 Prentice-Hall, Inc.
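To make the intercept shift concrete, here is a small sketch (the coefficients come from the illustrative holiday equation used in the interpretation example; the function name and the $5.00 price are my own):

```python
def predict_sales(price, holiday):
    """Illustrative fitted model: Sales = 300 - 30*Price + 15*Holiday,
    where holiday is 1 if a holiday occurred that week, else 0."""
    return 300 - 30 * price + 15 * holiday

with_holiday = predict_sales(5.00, 1)     # 165.0
without_holiday = predict_sales(5.00, 0)  # 150.0
print(with_holiday - without_holiday)     # 15.0: the dummy shifts only the intercept
```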

Example: Interpreting the Dummy-Variable Coefficient (with 2 Levels)

    Sales = 300 - 30(Price) + 15(Holiday)

where Sales = number of pies sold per week, Price = pie price in $, and Holiday = 1 if a holiday occurred during the week, 0 if no holiday occurred.

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.

Dummy-Variable Models (more than 2 Levels)

The number of dummy variables is one less than the number of levels. Example: y = house price and x1 = square feet, where the style of the house is also thought to matter: style = ranch, split level, or condo. Three levels, so two dummy variables are needed.

Let the default category be condo:
    x2 = 1 if ranch, 0 if not
    x3 = 1 if split level, 0 if not

    ŷ = b0 + b1 x1 + b2 x2 + b3 x3

b2 shows the impact on price if the house is a ranch style, compared to a condo; b3 shows the impact on price if the house is a split-level style, compared to a condo.

Interpreting the Dummy-Variable Coefficients (with 3 Levels)

Suppose the estimated equation is

    ŷ = 20.43 + 0.045 x1 + 23.53 x2 + 18.84 x3

    For a condo (x2 = x3 = 0):           ŷ = 20.43 + 0.045 x1
    For a ranch (x2 = 1, x3 = 0):        ŷ = 20.43 + 0.045 x1 + 23.53
    For a split level (x2 = 0, x3 = 1):  ŷ = 20.43 + 0.045 x1 + 18.84

With the same square feet, a ranch will have an estimated average price 23.53 thousand dollars more than a condo, and a split level will have an estimated average price 18.84 thousand dollars more than a condo.

Chapter Summary
- Developed the multiple regression model
- Tested the significance of the multiple regression model
- Developed adjusted R²
- Tested individual regression coefficients
- Used dummy variables
- Examined interaction in a multiple regression model
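The three-level dummy coding can be sketched the same way (my own helper function; the 2000-square-foot input is a made-up illustration, and prices are in $1000s as in the example):

```python
def predict_price(sqft, style):
    """Illustrative fitted model: y-hat = 20.43 + 0.045*sqft + 23.53*ranch + 18.84*split,
    with condo as the default (omitted) category; price is in $1000s."""
    ranch = 1 if style == "ranch" else 0
    split = 1 if style == "split level" else 0
    return 20.43 + 0.045 * sqft + 23.53 * ranch + 18.84 * split

sqft = 2000                                 # hypothetical house size
condo = predict_price(sqft, "condo")
ranch = predict_price(sqft, "ranch")
split = predict_price(sqft, "split level")
print(round(ranch - condo, 2))  # 23.53: ranch premium over condo, same square feet
print(round(split - condo, 2))  # 18.84: split-level premium over condo
```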