Chapter 3 Multiple Regression Complete Example


Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal

Review Goals After completing this lecture, you should be able to: Formulate null and alternative hypotheses for applications involving a single population mean or proportion Formulate a decision rule for testing a hypothesis Know how to use the test statistic, critical value, and p-value approaches to test the null hypothesis Know what Type I and Type II errors are 2

Review Goals explain model building using multiple regression analysis apply multiple regression analysis to business decision-making situations (continued) analyze and interpret the computer output for a multiple regression model test the significance of the independent variables in a multiple regression model 3

Review Goals recognize potential problems in multiple regression analysis and take steps to correct the problems incorporate qualitative variables into the regression model by using dummy variables (continued) use variable transformations to model nonlinear relationships 4

What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population mean Example: The mean monthly cell phone bill of this city is μ = $42 population proportion Example: The proportion of adults in this city with cell phones is p = 0.68 5

The Null Hypothesis, H 0 States the assumption (numerical) to be tested Example: The average number of TV sets in U.S. homes is at least three (H 0 : μ ≥ 3) Is always about a population parameter (H 0 : μ ≥ 3), not about a sample statistic (not x̄ ≥ 3) 6

The Null Hypothesis, H 0 (continued) Begin with the assumption that the null hypothesis is true Similar to the notion of innocent until proven guilty Refers to the status quo Always contains the =, ≤, or ≥ sign May or may not be rejected 7

The Alternative Hypothesis, H A Is the opposite of the null hypothesis e.g.: The average number of TV sets in U.S. homes is less than 3 (H A : μ < 3) Challenges the status quo Never contains the =, ≤, or ≥ sign May or may not be accepted Is generally the hypothesis that is believed (or needs to be supported) by the researcher: a research hypothesis 8

Formulating Hypotheses Example 1: Ford motor company has worked to reduce road noise inside the cab of the redesigned F150 pickup truck. It would like to report in its advertising that the truck is quieter. The average of the prior design was 68 decibels at 60 mph. What is the appropriate hypothesis test? 9

Formulating Hypotheses Example 1: Ford Motor Company has worked to reduce road noise inside the cab of the redesigned F150 pickup truck. It would like to report in its advertising that the truck is quieter. The average of the prior design was 68 decibels at 60 mph. What is the appropriate test? H 0 : μ ≥ 68 (the truck is not quieter) status quo H A : μ < 68 (the truck is quieter) wants to support If the null hypothesis is rejected, Ford has sufficient evidence to support that the truck is now quieter. 10

Formulating Hypotheses Example 2: The average annual income of buyers of Ford F150 pickup trucks is claimed to be $65,000 per year. An industry analyst would like to test this claim. What is the appropriate hypothesis test? 11

Formulating Hypotheses Example 2: The average annual income of buyers of Ford F150 pickup trucks is claimed to be $65,000 per year. An industry analyst would like to test this claim. What is the appropriate test? H 0 : μ = 65,000 (income is as claimed) status quo H A : μ ≠ 65,000 (income is different than claimed) The analyst will believe the claim unless sufficient evidence is found to discredit it. 12

Hypothesis Testing Process Claim: the population mean age is 50. Null Hypothesis: H 0 : µ = 50 Population Now select a random sample: Sample Suppose the sample mean age is 20: x = 20 Is x = 20 likely if µ = 50? If not likely, REJECT Null Hypothesis 13

Reason for Rejecting H 0 Consider the sampling distribution of x̄ when H 0 is true (μ = 50). If it is unlikely that we would get a sample mean as low as x̄ = 20 when μ = 50 is in fact the population mean, then we reject the null hypothesis that μ = 50. 14

Errors in Making Decisions Type I Error Reject a true null hypothesis Considered a serious type of error The probability of Type I Error is Called level of significance of the test Set by researcher in advance 15

Errors in Making Decisions (continued) Type II Error Fail to reject a false null hypothesis The probability of Type II Error is β β is a calculated value, the formula is discussed later in the chapter 16

Outcomes and Probabilities Possible hypothesis test outcomes (Key: Outcome (Probability)):

Decision            | State of Nature: H 0 True | State of Nature: H 0 False
Do Not Reject H 0   | No error (1 − α)          | Type II Error (β)
Reject H 0          | Type I Error (α)          | No error (1 − β)
17

Type I & II Error Relationship Type I and Type II errors cannot happen at the same time Type I error can only occur if H 0 is true Type II error can only occur if H 0 is false If the Type I error probability (α) increases, then the Type II error probability (β) decreases, and vice versa 18

Factors Affecting Type II Error All else equal: β increases when the difference between the hypothesized parameter and its true value decreases β increases when α decreases β increases when σ increases β increases when n decreases The formula used to compute the value of β is discussed later in the chapter 19

Level of Significance, α Defines unlikely values of the sample statistic if the null hypothesis is true Defines the rejection region of the sampling distribution Is designated by α (level of significance) Typical values are 0.01, 0.05, or 0.10 Is selected by the researcher at the beginning Provides the critical value(s) of the test 20

Hypothesis Tests for the Mean Hypothesis Tests for σ Known σ Unknown Assume first that the population standard deviation σ is known 21

Process of Hypothesis Testing 1. Specify population parameter of interest 2. Formulate the null and alternative hypotheses 3. Specify the desired significance level, α 4. Define the rejection region 5. Take a random sample and determine whether or not the sample result is in the rejection region 6. Reach a decision and draw a conclusion 22

Level of Significance and the Rejection Region Level of significance = α Lower tail test Example: H 0 : μ ≥ 3, H A : μ < 3 (rejection region of area α in the lower tail, below −z α ) Upper tail test Example: H 0 : μ ≤ 3, H A : μ > 3 (rejection region of area α in the upper tail, above z α ) Two tailed test Example: H 0 : μ = 3, H A : μ ≠ 3 (rejection regions of area α/2 in each tail, below −z α/2 and above z α/2 ) 23

Critical Value for Lower Tail Test The cutoff value, −z α or x̄ α , is called a critical value H 0 : μ ≥ 3, H A : μ < 3 Reject H 0 when the statistic falls below the critical value, where x̄ α = μ − z α σ/√n 24

Critical Value for Upper Tail Test The cutoff value, z α or x̄ α , is called a critical value H 0 : μ ≤ 3, H A : μ > 3 Reject H 0 when the statistic falls above the critical value, where x̄ α = μ + z α σ/√n 25

Critical Values for Two Tailed Tests There are two cutoff values (critical values): ±z α/2 , or x̄ α/2(Lower) and x̄ α/2(Upper) H 0 : μ = 3, H A : μ ≠ 3 Reject H 0 in either tail, where x̄ α/2 = μ ± z α/2 σ/√n 26

The Rejection Region Lower tail test (H 0 : μ ≥ 3, H A : μ < 3): Reject H 0 if z < −z α , i.e., if x̄ < x̄ α Upper tail test (H 0 : μ ≤ 3, H A : μ > 3): Reject H 0 if z > z α , i.e., if x̄ > x̄ α Two tailed test (H 0 : μ = 3, H A : μ ≠ 3): Reject H 0 if z < −z α/2 or z > z α/2 , i.e., if x̄ < x̄ α/2(L) or x̄ > x̄ α/2(U) 27

Two Equivalent Approaches to Hypothesis Testing z units: For a given α, find the critical z value(s): −z α , z α , or ±z α/2 Convert the sample mean x̄ to a z test statistic: z = (x̄ − μ)/(σ/√n) Reject H 0 if z is in the rejection region, otherwise do not reject H 0 x̄ units: Given α, calculate the critical value(s) x̄ α , or x̄ α/2(L) and x̄ α/2(U) The sample mean is the test statistic. Reject H 0 if x̄ is in the rejection region, otherwise do not reject H 0 28

Hypothesis Testing Example Test the claim that the true mean # of TV sets in US homes is at least 3. (Assume σ = 0.8) 1. Specify the population value of interest The mean number of TVs in US homes 2. Formulate the appropriate null and alternative hypotheses H 0 : μ ≥ 3, H A : μ < 3 (This is a lower tail test) 3. Specify the desired level of significance Suppose that α = 0.05 is chosen for this test 29

Hypothesis Testing Example (continued) 4. Determine the rejection region This is a one-tailed test with α = 0.05. Since σ is known, the cutoff value is a z value: Reject H 0 if z < −z α = −1.645; otherwise do not reject H 0 30

Hypothesis Testing Example (continued) 5. Obtain sample evidence and compute the test statistic Suppose a sample is taken with the following results: n = 100, x̄ = 2.84 (σ = 0.8 is assumed known) Then the test statistic is: z = (x̄ − μ)/(σ/√n) = (2.84 − 3)/(0.8/√100) = −0.16/0.08 = −2.0 31

Hypothesis Testing Example (continued) 6. Reach a decision and interpret the result Since z = −2.0 < −1.645, we reject the null hypothesis that the mean number of TVs in US homes is at least 3. There is sufficient evidence that the mean is less than 3. 32

Hypothesis Testing Example (continued) An alternate way of constructing the rejection region, expressed in x̄ rather than z units: x̄ α = μ − z α σ/√n = 3 − 1.645(0.8/√100) = 2.8684 Since x̄ = 2.84 < 2.8684, we reject the null hypothesis 33
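The lower-tail z test above can be reproduced in a few lines of Python. This is a sketch, not part of the original lecture (the slides use tables and Excel); the critical value −1.645 is hard-coded from the slide.

```python
import math

# Lower-tail z test: H0: mu >= 3, HA: mu < 3, alpha = 0.05
mu0, sigma, n, xbar = 3.0, 0.8, 100, 2.84
z = (xbar - mu0) / (sigma / math.sqrt(n))     # (2.84 - 3) / 0.08 = -2.0
z_crit = -1.645                               # critical z for alpha = 0.05, lower tail

# Equivalent cutoff in x-bar units
x_crit = mu0 + z_crit * sigma / math.sqrt(n)  # 3 - 1.645 * 0.08 = 2.8684

reject = z < z_crit                           # True: reject H0
print(round(z, 4), round(x_crit, 4), reject)
```

Both forms give the same decision: z = −2.0 falls below −1.645, just as x̄ = 2.84 falls below 2.8684.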

p-value Approach to Testing Convert the sample statistic ( x̄ ) to a test statistic (a z value, if σ is known) Determine the p-value from a table or computer Compare the p-value with α If p-value < α, reject H 0 If p-value ≥ α, do not reject H 0 34

p-value Approach to Testing (continued) p-value: Probability of obtaining a test statistic more extreme (≤ or ≥) than the observed sample value, given H 0 is true Also called the observed level of significance The smallest value of α for which H 0 can be rejected 35

p-value example Example: How likely is it to see a sample mean of 2.84 (or something further below the mean) if the true mean is μ = 3.0? p-value = P(x̄ ≤ 2.84 | μ = 3.0) = P(z ≤ (2.84 − 3.0)/(0.8/√100)) = P(z ≤ −2.0) = 0.0228 (with α = 0.05) 36

p-value example (continued) Compare the p-value with α If p-value < α, reject H 0 If p-value ≥ α, do not reject H 0 Here: p-value = 0.0228, α = 0.05 Since 0.0228 < 0.05, we reject the null hypothesis 37
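Instead of a z table, the p-value can be computed directly. A minimal sketch using Python's standard-library normal distribution (not part of the original slides, which use a printed table):

```python
from statistics import NormalDist

# Same setup as the TV-sets example: H0: mu >= 3, HA: mu < 3
mu0, sigma, n, xbar = 3.0, 0.8, 100, 2.84
z = (xbar - mu0) / (sigma / n**0.5)        # -2.0
p_value = NormalDist().cdf(z)              # P(Z <= -2.0), approx 0.0228
alpha = 0.05
print(round(p_value, 4), p_value < alpha)  # reject H0 when p-value < alpha
```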

Example: Upper Tail z Test for Mean (σ Known) A phone industry manager thinks that customer monthly cell phone bills have increased, and now average over $52 per month. The company wishes to test this claim. (Assume σ = 10 is known) Form hypothesis test: H 0 : μ ≤ 52 (the average is not over $52 per month) H A : μ > 52 (the average is greater than $52 per month, i.e., sufficient evidence exists to support the manager's claim) 38

Example: Find Rejection Region (continued) Suppose that α = 0.10 is chosen for this test Find the rejection region: Reject H 0 if z > z α = 1.28; otherwise do not reject H 0 39

Review: Finding Critical Value - One Tail What is z given α = 0.10? The area below the critical value is 0.90, so look up the z with table area 0.40 above the mean: Critical Value = 1.28

Standard Normal Distribution Table (Portion)
 Z     .07     .08     .09
1.1   .3790   .3810   .3830
1.2   .3980   .3997   .4015
1.3   .4147   .4162   .4177
40

Example: Test Statistic (continued) Obtain sample evidence and compute the test statistic Suppose a sample is taken with the following results: n = 64, x̄ = 53.1 (σ = 10 was assumed known) Then the test statistic is: z = (x̄ − μ)/(σ/√n) = (53.1 − 52)/(10/√64) = 1.1/1.25 = 0.88 41

Example: Decision (continued) Reach a decision and interpret the result: Do not reject H 0 since z = 0.88 < 1.28, i.e., there is not sufficient evidence that the mean bill is over $52 42

p-Value Solution (continued) Calculate the p-value and compare to α: p-value = P(x̄ ≥ 53.1 | μ = 52.0) = P(z ≥ (53.1 − 52.0)/(10/√64)) = P(z ≥ 0.88) = 0.5 − 0.3106 = 0.1894 Do not reject H 0 since p-value = 0.1894 > α = 0.10 43

Critical Value Approach to Testing Hypothesis tests for μ: σ known vs. σ unknown. When σ is known, convert the sample statistic ( x̄ ) to a z test statistic: z = (x̄ − μ)/(σ/√n) 44

Critical Value Approach to Testing When σ is unknown, convert the sample statistic ( x̄ ) to a t test statistic (the population must be approximately normal): t n−1 = (x̄ − μ)/(s/√n) 45

Hypothesis Tests for μ, σ Unknown 1. Specify the population value of interest 2. Formulate the appropriate null and alternative hypotheses 3. Specify the desired level of significance 4. Determine the rejection region (critical values are from the t-distribution with n-1 d.f.) 5. Obtain sample evidence and compute the t test statistic 6. Reach a decision and interpret the result 46

Example: Two-Tail Test (σ Unknown) The average cost of a hotel room in New York is said to be $168 per night. A random sample of 25 hotels resulted in x̄ = $172.50 and s = $15.40. Test at the α = 0.05 level. (Assume the population distribution is normal) H 0 : μ = 168, H A : μ ≠ 168 47

Example Solution: Two-Tail Test H 0 : μ = 168, H A : μ ≠ 168 α = 0.05, n = 25 σ is unknown, so use a t statistic Critical values: t 24 = ±2.0639 Test statistic: t n−1 = (x̄ − μ)/(s/√n) = (172.50 − 168)/(15.40/√25) = 1.46 Since −2.0639 < 1.46 < 2.0639, do not reject H 0 : there is not sufficient evidence that the true mean cost is different from $168 48
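The hotel-room t test above can be sketched in Python. This is an illustration only; the critical value 2.0639 (t with 24 d.f., α/2 = 0.025) is taken from the slide's t table rather than computed.

```python
import math

# Two-tail t test: H0: mu = 168, HA: mu != 168, alpha = 0.05
mu0, n, xbar, s = 168.0, 25, 172.50, 15.40
t = (xbar - mu0) / (s / math.sqrt(n))   # 4.5 / 3.08, approx 1.46
t_crit = 2.0639                         # t_{0.025, 24} from a t table
reject = abs(t) > t_crit                # False: do not reject H0
print(round(t, 4), reject)
```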

Type II Error Type II error is the probability of failing to reject a false H 0 Suppose we fail to reject H 0 : μ ≥ 52 when in fact the true mean is μ = 50 49

Type II Error (continued) Suppose we do not reject H 0 : μ ≥ 52 when in fact the true mean is μ = 50 The true distribution of x̄ is then centered at μ = 50, while H 0 is not rejected whenever x̄ falls above the cutoff 50

Type II Error (continued) Suppose we do not reject H 0 : μ ≥ 52 when in fact the true mean is μ = 50 Here, β = P( x̄ ≥ cutoff ) if μ = 50 51

Calculating β Suppose n = 64, σ = 6, and α = 0.05 The cutoff (for H 0 : μ ≥ 52) is x̄ α = μ − z α σ/√n = 52 − 1.645(6/√64) = 50.766 So β = P( x̄ ≥ 50.766 ) if μ = 50 52

Calculating β (continued) Suppose n = 64, σ = 6, and α = 0.05 β = P( x̄ ≥ 50.766 | μ = 50 ) = P( z ≥ (50.766 − 50)/(6/√64) ) = P( z ≥ 1.02 ) = 0.5 − 0.3461 = 0.1539 Probability of Type II error: β = 0.1539 53
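The β calculation above is easy to automate. A sketch (not from the slides): the slide's table lookup rounds z to 1.02 and gets β = 0.1539, while the exact normal CDF gives approximately 0.1535.

```python
from statistics import NormalDist
import math

# H0: mu >= 52 tested at alpha = 0.05 with n = 64, sigma = 6
mu0, sigma, n = 52.0, 6.0, 64
z_alpha = 1.645                                    # critical z for alpha = 0.05
cutoff = mu0 - z_alpha * sigma / math.sqrt(n)      # 50.766

# beta = P(xbar >= cutoff) when the true mean is 50
mu_true = 50.0
z = (cutoff - mu_true) / (sigma / math.sqrt(n))    # approx 1.02
beta = 1 - NormalDist().cdf(z)                     # approx 0.1535 (table gives .1539)
print(round(cutoff, 3), round(beta, 4))
```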

Hypothesis Tests in Minitab 54

Hypothesis Tests in Minitab 55

Sample Minitab Output 56

Hypothesis Tests Summary Addressed hypothesis testing methodology Performed z Test for the mean (σ known) Discussed p value approach to hypothesis testing Performed one-tail and two-tail tests... 57

Hypothesis Tests Summary (continued) Performed t test for the mean (σ unknown) Performed z test for the proportion Discussed Type II error and computed its probability 58

Multiple Regression Assumptions Errors (residuals) from the regression model: e = (y − ŷ) The model errors are independent and random The errors are normally distributed The mean of the errors is zero Errors have a constant variance 59

Model Specification Decide what you want to do and select the dependent variable Determine the potential independent variables for your model Gather sample data (observations) for all variables 60

The Correlation Matrix Correlation between the dependent variable and selected independent variables can be found using Excel: Formula Tab: Data Analysis / Correlation Can check for statistical significance of correlation with a t test 61

Example A distributor of frozen dessert pies wants to evaluate factors thought to influence demand Dependent variable: Pie sales (units per week) Independent variables: Price (in $), Advertising (in $100s) Data are collected for 15 weeks 62

Pie Sales Model

Week  Pie Sales  Price ($)  Advertising ($100s)
  1     350        5.50        3.3
  2     460        7.50        3.3
  3     350        8.00        3.0
  4     430        8.00        4.5
  5     350        6.80        3.0
  6     380        7.50        4.0
  7     430        4.50        3.0
  8     470        6.40        3.7
  9     450        7.00        3.5
 10     490        5.00        4.0
 11     340        7.20        3.5
 12     300        7.90        3.2
 13     440        5.90        4.0
 14     450        5.00        3.5
 15     300        7.00        2.7

Multiple regression model: Sales = b 0 + b 1 (Price) + b 2 (Advertising)

Correlation matrix:
              Pie Sales   Price     Advertising
Pie Sales      1
Price         -0.44327    1
Advertising    0.55632    0.03044   1
63
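The least-squares coefficients the lecture later reads off the Excel output can be reproduced from the table above. A sketch using numpy (the lecture itself uses Excel/Minitab, not Python):

```python
import numpy as np

# Pie sales data transcribed from the slide (15 weeks)
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adv   = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
                  3.5, 3.2, 4.0, 3.5, 2.7])

# Least-squares fit of Sales = b0 + b1*Price + b2*Advertising
X = np.column_stack([np.ones_like(price), price, adv])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(b)   # approx [306.526, -24.975, 74.131], matching the Excel output
```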

Interpretation of Estimated Coefficients Slope (b i ) Estimates that the average value of y changes by b i units for each 1 unit increase in X i holding all other variables constant Example: if b 1 = -20, then sales (y) is expected to decrease by an estimated 20 pies per week for each $1 increase in selling price (x 1 ), net of the effects of changes due to advertising (x 2 ) y-intercept (b 0 ) The estimated average value of y when all x i = 0 (assuming all x i = 0 is within the range of observed values) 64

Pie Sales Correlation Matrix Pie Sales 1 Pie Sales Price Advertising Price -0.44327 1 Advertising 0.55632 0.03044 1 Price vs. Sales : r = -0.44327 There is a negative association between price and sales Advertising vs. Sales : r = 0.55632 There is a positive association between advertising and sales 65

Scatter Diagrams Sales vs. Price and Sales vs. Advertising (scatter plots of the sample data) 66

Estimating a Multiple Linear Regression Equation Computer software is generally used to generate the coefficients and measures of goodness of fit for multiple regression Excel: Data / Data Analysis / Regression Minitab: Stat / Regression / Regression 67

Estimating a Multiple Linear Regression Equation Excel: 68

Minitab: Estimating a Multiple Linear Regression Equation 69

Multiple Regression Output

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

Sales = 306.526 − 24.975(Price) + 74.131(Advertising)

ANOVA        df      SS          MS          F         Significance F
Regression    2    29460.027   14730.013    6.53861    0.01201
Residual     12    27033.306    2252.776
Total        14    56493.333

              Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept      306.52619      114.25389        2.68285   0.01993     57.58835    555.46404
Price          -24.97509       10.83213       -2.30565   0.03979    -48.57626     -1.37392
Advertising     74.13096       25.96732        2.85478   0.01449     17.55303    130.70888
70

The Multiple Regression Equation Sales = 306.526 − 24.975(Price) + 74.131(Advertising) where Sales is in number of pies per week, Price is in $, and Advertising is in $100s. b 1 = −24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising b 2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price 71

Using The Model to Make Predictions Predict sales for a week in which the selling price is $5.50 and advertising is $350: Sales = 306.526 − 24.975(Price) + 74.131(Advertising) = 306.526 − 24.975(5.50) + 74.131(3.5) = 428.62 Predicted sales is 428.62 pies Note that Advertising is in $100s, so $350 means that x 2 = 3.5 72
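The prediction above is a direct plug-in of the fitted coefficients; a one-off check in Python (an illustration, using the coefficients from the output):

```python
b0, b1, b2 = 306.526, -24.975, 74.131   # coefficients from the regression output
price, adv = 5.50, 3.5                  # advertising of $350 -> 3.5 (in $100s)
sales_hat = b0 + b1 * price + b2 * adv
print(round(sales_hat, 2))              # approx 428.62 pies
```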

Predictions in Minitab 73

Predictions in Minitab (continued) The output shows the predicted y value, the confidence interval for the mean y value given these x's, and the prediction interval for an individual y value given these x's 74

Multiple Coefficient of Determination (R 2 ) Reports the proportion of total variation in y explained by all x variables taken together: R 2 = SSR / SST = (sum of squares regression) / (total sum of squares) 75

Multiple Coefficient of Determination (continued) From the regression output: R 2 = SSR / SST = 29460.0 / 56493.3 = 0.52148 52.1% of the variation in pie sales is explained by the variation in price and advertising 76

Adjusted R 2 R 2 never decreases when a new x variable is added to the model This can be a disadvantage when comparing models What is the net effect of adding a new variable? We lose a degree of freedom when a new x variable is added Did the new x variable add enough explanatory power to offset the loss of one degree of freedom? 77

Adjusted R 2 (continued) Shows the proportion of variation in y explained by all x variables, adjusted for the number of x variables used: R 2 A = 1 − (1 − R 2 )(n − 1)/(n − k − 1) (where n = sample size, k = number of independent variables) Penalizes excessive use of unimportant independent variables Smaller than R 2 Useful in comparing among models 78

Adjusted R 2 (continued) From the regression output: R 2 A = 0.44172 44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables 79
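The adjustment formula is easy to apply by hand. A quick check in Python (an illustration, using the R 2 from the output):

```python
# Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)
r2, n, k = 0.52148, 15, 2
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 4))   # approx 0.4417, matching the Adjusted R Square in the output
```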

Is the Model Significant? F-Test for Overall Significance of the Model Shows if there is a linear relationship between all of the x variables considered together and y Use the F test statistic Hypotheses: H 0 : β 1 = β 2 = … = β k = 0 (no linear relationship) H A : at least one β i ≠ 0 (at least one independent variable affects y) 80

F-Test for Overall Significance (continued) Test statistic: F = (SSR/k) / (SSE/(n − k − 1)) = MSR / MSE where F has D 1 = k (numerator) and D 2 = (n − k − 1) (denominator) degrees of freedom 81

F-Test for Overall Significance (continued) From the regression output: F = MSR / MSE = 14730.0 / 2252.8 = 6.5386, with 2 and 12 degrees of freedom The p-value for the F-test is 0.01201 (Significance F) 82

F-Test for Overall Significance (continued) H 0 : β 1 = β 2 = 0 H A : β 1 and β 2 not both zero α = 0.05, df 1 = 2, df 2 = 12 Critical value: F 0.05 = 3.885 Test statistic: F = MSR/MSE = 6.5386 Decision: Since 6.5386 > 3.885, reject H 0 at α = 0.05 Conclusion: The regression model does explain a significant portion of the variation in pie sales (there is evidence that at least one independent variable affects y) 83
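The F statistic above can be recomputed from the ANOVA sums of squares. A sketch (the critical value 3.885 is taken from the slide's F table, not computed):

```python
# ANOVA quantities from the regression output
ssr, sse, n, k = 29460.027, 27033.306, 15, 2
msr = ssr / k                 # 14730.013
mse = sse / (n - k - 1)       # 2252.776
F = msr / mse                 # approx 6.5386
F_crit = 3.885                # F(0.05; 2, 12) from an F table
print(round(F, 4), F > F_crit)   # F exceeds the critical value: reject H0
```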

Are Individual Variables Significant? Use t-tests of individual variable slopes Shows if there is a linear relationship between the variable x i and y Hypotheses: H 0 : β i = 0 (no linear relationship) H A : β i ≠ 0 (linear relationship does exist between x i and y) 84

Are Individual Variables Significant? (continued) H 0 : β i = 0 (no linear relationship) H A : β i ≠ 0 (linear relationship does exist between x i and y) Test statistic: t = (b i − 0) / s_b i (df = n − k − 1) 85

Are Individual Variables Significant? (continued) From the regression output: t-value for Price is t = −2.306, with p-value 0.0398 t-value for Advertising is t = 2.855, with p-value 0.0145 86

Inferences about the Slope: t Test Example H 0 : β i = 0, H A : β i ≠ 0 d.f. = 15 − 2 − 1 = 12, α = 0.05, t α/2 = 2.1788 From the Excel output: Price: coefficient −24.97509, standard error 10.83213, t = −2.30565, p-value 0.03979 Advertising: coefficient 74.13096, standard error 25.96732, t = 2.85478, p-value 0.01449 The test statistic for each variable falls in the rejection region (p-values < 0.05) Decision: Reject H 0 for each variable Conclusion: There is evidence that both Price and Advertising affect pie sales at α = 0.05 87
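Each t statistic is just the coefficient divided by its standard error. A quick check in Python (an illustration; the critical value 2.1788 is the slide's t table value for 12 d.f.):

```python
# t = b_i / s_{b_i}, compared against t_{0.025, 12}
t_price = -24.97509 / 10.83213       # approx -2.3057
t_adv   = 74.13096 / 25.96732        # approx  2.8548
t_crit  = 2.1788                     # t_{0.025, 12} from a t table
reject_price = abs(t_price) > t_crit # True
reject_adv   = abs(t_adv) > t_crit   # True
print(round(t_price, 4), round(t_adv, 4))
```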

Confidence Interval Estimate for the Slope Confidence interval for a population slope β i (e.g., the effect of changes in price on pie sales): b i ± t α/2 s_b i where t has (n − k − 1) d.f. From the output: Price: coefficient −24.97509, Lower 95% = −48.57626, Upper 95% = −1.37392 Example: Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each $1 increase in the selling price 88
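The 95% interval for the Price slope can be reconstructed from the coefficient and its standard error. A sketch (t = 2.1788 from the slide's table; small rounding differences from Excel's interval are expected):

```python
# 95% CI for the Price slope: b1 +/- t_{0.025, 12} * s_{b1}
b1, se_b1 = -24.97509, 10.83213
t_crit = 2.1788
lower = b1 - t_crit * se_b1   # approx -48.576
upper = b1 + t_crit * se_b1   # approx  -1.374
print(round(lower, 3), round(upper, 3))
```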

Standard Deviation of the Regression Model The estimate of the standard deviation of the regression model is: s ε = √( SSE / (n − k − 1) ) = √MSE Is this value large or small? Must compare to the mean size of y for comparison 89

Standard Deviation of the Regression Model (continued) From the regression output, the standard deviation of the regression model (Standard Error) is 47.46 90

Standard Deviation of the Regression Model (continued)

The standard deviation of the regression model is 47.46
A rough prediction range for pie sales in a given week is ± 2(47.46) ≈ 94.9
Pie sales in the sample were in the 300-to-500-per-week range, so this range is probably too large to be acceptable
The analyst may want to look for additional variables that can explain more of the variation in weekly sales

Multicollinearity

Multicollinearity: high correlation exists between two independent variables
This means the two variables contribute redundant information to the multiple regression model

Multicollinearity (continued)

Including two highly correlated independent variables can adversely affect the regression results:
- No new information is provided
- Can lead to unstable coefficients (large standard errors and low t-values)
- Coefficient signs may not match prior expectations

Some Indications of Severe Multicollinearity

- Incorrect signs on the coefficients
- Large change in the value of a previous coefficient when a new variable is added to the model
- A previously significant variable becomes insignificant when a new independent variable is added
- The estimate of the standard deviation of the model increases when a variable is added to the model

Detect Collinearity (Variance Inflationary Factor)

VIF_j is used to measure collinearity:

  VIF_j = 1 / (1 - R²_j)

where R²_j is the coefficient of determination when the jth independent variable is regressed against the remaining k - 1 independent variables

If VIF_j > 5, x_j is highly correlated with the other explanatory variables
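The VIF formula is a one-liner; a sketch, using the R²_j value from the pie sales example:

```python
def vif(r_squared_j: float) -> float:
    """Variance inflation factor: VIF_j = 1 / (1 - R²_j), where R²_j comes
    from regressing x_j on the remaining independent variables."""
    return 1.0 / (1.0 - r_squared_j)

# Pie sales example: regressing Price on Advertising gives R²_j = 0.000926446
print(round(vif(0.000926446), 9))   # ≈ 1.000927305 — far below 5
```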

Detect Collinearity (continued)

Output for the pie sales example (regressing Price against all other X):

Regression Analysis: Price and all other X
  Multiple R           0.030437581
  R Square             0.000926446
  Adjusted R Square   -0.075925366
  Standard Error       1.21527235
  Observations        15
  VIF                  1.000927305

Since there are only two explanatory variables, only one VIF is reported
VIF < 5, so there is no evidence of collinearity between Price and Advertising

Qualitative (Dummy) Variables

- A categorical explanatory variable (dummy variable) has two or more levels: yes or no, on or off, male or female
- Coded as 0 or 1
- Regression intercepts are different if the variable is significant
- Assumes equal slopes for the other variables
- The number of dummy variables needed is (number of levels - 1)

Dummy-Variable Model Example (with 2 Levels)

Let:
  ŷ = b0 + b1x1 + b2x2

where:
  y  = pie sales
  x1 = price
  x2 = holiday (x2 = 1 if a holiday occurred during the week; x2 = 0 if there was no holiday that week)

Dummy-Variable Model Example (with 2 Levels) (continued)

  Holiday:    ŷ = b0 + b1x1 + b2(1) = (b0 + b2) + b1x1
  No holiday: ŷ = b0 + b1x1 + b2(0) = b0 + b1x1

(Plot: sales y against x1 = price — two parallel lines with different intercepts, b0 + b2 and b0, and the same slope)

If H0: β2 = 0 is rejected, then Holiday has a significant effect on pie sales

Interpreting the Dummy Variable Coefficient (with 2 Levels)

Example:
  Sales = 300 - 30(Price) + 15(Holiday)

where:
  Sales   = number of pies sold per week
  Price   = pie price in $
  Holiday = 1 if a holiday occurred during the week, 0 if no holiday occurred

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price
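This interpretation of b2 can be checked by evaluating the estimated equation at the same price with and without a holiday (a sketch; the $5.00 price is just an illustrative value):

```python
def predicted_sales(price: float, holiday: int) -> float:
    # Estimated equation from the slide: Sales = 300 - 30(Price) + 15(Holiday)
    return 300 - 30 * price + 15 * holiday

price = 5.00   # hypothetical pie price
effect = predicted_sales(price, 1) - predicted_sales(price, 0)
print(effect)  # 15.0 — exactly b2, regardless of the price chosen
```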

Dummy-Variable Models (more than 2 Levels)

The number of dummy variables is one less than the number of levels

Example: y = house price; x1 = square feet
The style of the house is also thought to matter: style = ranch, split level, condo
Three levels, so two dummy variables are needed

Dummy-Variable Models (more than 2 Levels) (continued)

Let the default category be "condo":
  x2 = 1 if ranch, 0 if not
  x3 = 1 if split level, 0 if not

  ŷ = b0 + b1x1 + b2x2 + b3x3

b2 shows the impact on price if the house is a ranch style, compared to a condo
b3 shows the impact on price if the house is a split-level style, compared to a condo

Interpreting the Dummy Variable Coefficients (with 3 Levels)

Suppose the estimated equation is

  ŷ = 20.43 + 0.045x1 + 23.53x2 + 18.84x3

  For a condo (x2 = x3 = 0):    ŷ = 20.43 + 0.045x1
  For a ranch (x3 = 0):         ŷ = 20.43 + 0.045x1 + 23.53
  For a split level (x2 = 0):   ŷ = 20.43 + 0.045x1 + 18.84

With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a condo
With the same square feet, a split level will have an estimated average price of 18.84 thousand dollars more than a condo
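The three intercepts can be verified by coding the two dummies from the style label (a sketch; the 2,000 square feet figure is just an illustrative input):

```python
def predicted_price(sqft: float, style: str) -> float:
    """House price in $1000s from the estimated equation
    y-hat = 20.43 + 0.045 x1 + 23.53 x2 + 18.84 x3 (condo is the default)."""
    x2 = 1 if style == "ranch" else 0        # ranch dummy
    x3 = 1 if style == "split level" else 0  # split-level dummy
    return 20.43 + 0.045 * sqft + 23.53 * x2 + 18.84 * x3

sqft = 2000  # hypothetical square footage
print(round(predicted_price(sqft, "ranch") - predicted_price(sqft, "condo"), 2))        # 23.53
print(round(predicted_price(sqft, "split level") - predicted_price(sqft, "condo"), 2))  # 18.84
```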

Interaction Effects

- Hypothesizes interaction between pairs of x variables: the response to one x variable varies at different levels of another x variable
- Contains two-way cross product terms, e.g.:

  y = β0 + β1x1 + β2x2 + β3x3  +  β4x1x2 + β5x1x3
      (basic terms)               (interactive terms)

Effect of Interaction

Given: y = β0 + β1x1 + β2x2 + β3x1x2 + ε

- Without the interaction term, the effect of x1 on y is measured by β1
- With the interaction term, the effect of x1 on y is measured by β1 + β3x2, so the effect changes as x2 increases

Interaction Example

  y = 1 + 2x1 + 3x2 + 4x1x2,  where x2 = 0 or 1 (dummy variable)

  x2 = 1: y = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1
  x2 = 0: y = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1

The effect (slope) of x1 on y does depend on the value of x2
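The two slopes can be confirmed numerically; a minimal sketch of the example above:

```python
def y(x1: float, x2: float) -> float:
    # Interaction model from the example: y = 1 + 2x1 + 3x2 + 4x1x2
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Slope of x1 at each level of the dummy x2 is β1 + β3 x2
slope_when_x2_is_0 = y(1, 0) - y(0, 0)  # 2 + 4(0) = 2
slope_when_x2_is_1 = y(1, 1) - y(0, 1)  # 2 + 4(1) = 6
print(slope_when_x2_is_0, slope_when_x2_is_1)  # 2 6
```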

Interaction Regression Model Worksheet

  Case i   y_i   x_1i   x_2i   x_1i·x_2i
    1       1     1      3         3
    2       4     8      5        40
    3       1     3      2         6
    4       3     5      6        30
    :       :     :      :         :

Multiply x1 by x2 to get x1x2, then run the regression with y, x1, x2, and x1x2
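The worksheet step can be sketched with NumPy: form the cross-product column, then fit the full model. (With only the four cases shown and four coefficients the fit is exact, so this is purely illustrative of the mechanics.)

```python
import numpy as np

y  = np.array([1.0, 4.0, 1.0, 3.0])
x1 = np.array([1.0, 8.0, 3.0, 5.0])
x2 = np.array([3.0, 5.0, 2.0, 6.0])

x1x2 = x1 * x2   # the interaction column: [3, 40, 6, 30]

# Design matrix for y = b0 + b1*x1 + b2*x2 + b3*x1x2
X = np.column_stack([np.ones_like(y), x1, x2, x1x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
```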

Evaluating Presence of Interaction

Hypothesize interaction between pairs of independent variables:

  y = β0 + β1x1 + β2x2 + β3x1x2 + ε

Hypotheses:
  H0: β3 = 0 (no interaction between x1 and x2)
  HA: β3 ≠ 0 (x1 interacts with x2)

Model Building

Goal: develop a model with the best set of independent variables
- Easier to interpret if unimportant variables are removed
- Lower probability of collinearity

Two common approaches:
- Stepwise regression procedure: provides evaluation of alternative models as variables are added
- Best-subset approach: try all combinations and select the best using the highest adjusted R² and lowest s_ε

Stepwise Regression

Idea: develop the least squares regression equation in steps, through forward selection, backward elimination, or standard stepwise regression

The coefficient of partial determination measures the marginal contribution of each independent variable, given that the other independent variables are in the model

Best Subsets Regression

Idea: estimate all possible regression equations using all possible combinations of independent variables
Choose the best fit by looking for the highest adjusted R² and lowest standard error s_ε

Stepwise regression and best subsets regression can be performed using Minitab or other statistical software packages
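A best-subsets search is easy to sketch with itertools: fit every combination of candidate predictors and keep the one with the highest adjusted R². This is a hypothetical illustration (the data below are made up), not the Minitab procedure itself:

```python
from itertools import combinations
import numpy as np

def adjusted_r2(y, X):
    """Least-squares fit of y on X (X includes the intercept column);
    returns adjusted R-squared."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = ((y - X @ b) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return 1 - (sse / (n - p)) / (sst / (n - 1))

def best_subset(y, predictors):
    """Evaluate every non-empty subset of the named predictor columns and
    return (adjusted R², names) for the best one."""
    n = len(y)
    best = None
    for r in range(1, len(predictors) + 1):
        for names in combinations(predictors, r):
            X = np.column_stack([np.ones(n)] + [predictors[m] for m in names])
            score = adjusted_r2(y, X)
            if best is None or score > best[0]:
                best = (score, names)
    return best

# Made-up example: y depends on x1 only, so the search should pick x1
rng = np.random.default_rng(0)
x1 = np.arange(10.0)
x2 = rng.normal(size=10)
y = 2 * x1 + 1
print(best_subset(y, {"x1": x1, "x2": x2}))
```

With more than a handful of candidate variables the number of subsets grows as 2^k, which is why dedicated software is normally used.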