Chapter 12: Simple Regression
North Seattle Community College
BUS21: Business Statistics

Chapter 12 Learning Objectives

In this chapter, you learn:
- How to use regression analysis to predict the value of a dependent variable based on an independent variable
- The meaning of the regression coefficients \(b_0\) and \(b_1\)
- How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
- To make inferences about the slope and correlation coefficient
- To estimate mean values and predict individual values

Correlation

- Correlation analysis
  - is used to measure the association (linear relationship) between two variables
  - is only concerned with the strength of the relationship
  - does not imply cause and effect
  - can be visualized with a scatter plot showing the relationship between the two variables
- Regression analysis is used to
  - predict the value of a dependent variable based on the value of at least one independent variable
  - explain the impact on the dependent variable of changes in an independent variable

Dependent variable (Y): the variable we wish to predict or explain
Independent variable (X): the variable used to predict or explain the dependent variable

Types of Relationships

- Only one independent variable, X
- The relationship between X and Y is described by a linear function
- Assumption: changes in Y are related to changes in X

[Figure: example scatter plots of linear relationships and curvilinear relationships]
Strength of Relationships

[Figure: example scatter plots showing a strong relationship, a weak relationship, and no relationship]

For a population:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

- \(Y_i\): dependent variable
- \(\beta_0\): intercept coefficient
- \(\beta_1\): slope coefficient
- \(X_i\): independent variable
- \(\beta_0 + \beta_1 X_i\): linear component
- \(\varepsilon_i\): random error component

[Figure: the line \(Y_i = \beta_0 + \beta_1 X_i\), marking the observed value of Y for \(X_i\), the predicted value of Y for \(X_i\), the intercept \(\beta_0\), the slope \(\beta_1\), and the random error \(\varepsilon_i\) for this \(X_i\) value]

Commonly shown in math as y = mx + b, where the slope m plays the role of \(\beta_1\) and the intercept b plays the role of \(\beta_0\).

Linear Regression Equation

The simple linear regression equation is based on a sample and gives an estimate of the population regression line:

\[ \hat{y}_i = b_0 + b_1 x_i \]

- \(\hat{y}_i\): estimated (or predicted) value of Y for observation i
- \(b_0\): estimate of the regression intercept
- \(b_1\): estimate of the regression slope
- \(x_i\): value of X for observation i

The Least Squares Method

Least Squares Criterion: minimize the sum of the squared differences between Y and Ŷ:

\[ \min \sum (y_i - \hat{y}_i)^2 \]

which equates to

\[ \min \sum \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2 \]

The sum of the squared differences is minimized when:

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad \text{and} \qquad b_0 = \bar{y} - b_1 \bar{x} \]
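The slides state the least squares formulas for \(b_0\) and \(b_1\) without showing where they come from. The sketch below fills in that step by setting the partial derivatives of the criterion to zero. The label S for the sum of squared differences is introduced here for illustration and is not a symbol from the slides.

```latex
\begin{align*}
S(b_0, b_1) &= \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr)^2 \\[4pt]
\frac{\partial S}{\partial b_0} &= -2 \sum \bigl(y_i - b_0 - b_1 x_i\bigr) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\[4pt]
\frac{\partial S}{\partial b_1} &= -2 \sum x_i \bigl(y_i - b_0 - b_1 x_i\bigr) = 0 \\[4pt]
\text{substituting } b_0:\quad
0 &= \sum x_i \bigl[(y_i - \bar{y}) - b_1 (x_i - \bar{x})\bigr] \\[4pt]
b_1 &= \frac{\sum x_i (y_i - \bar{y})}{\sum x_i (x_i - \bar{x})}
     = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
\end{align*}
```

The last equality holds because \(\sum \bar{x}(y_i - \bar{y}) = 0\) and \(\sum \bar{x}(x_i - \bar{x}) = 0\), so subtracting those terms from the numerator and denominator changes neither.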
Example:

- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
- A random sample of 10 houses is selected
  - X = square feet
  - Y = house price ($1000s)

       x (sq ft)   y ($1000s)    x - x̄     y - ȳ   (x - x̄)(y - ȳ)    (x - x̄)^2
            1400          245     -315     -41.5          13072.5        99225
            1600          312     -115      25.5          -2932.5        13225
            1700          279      -15      -7.5            112.5          225
            1875          308      160      21.5           3440.0        25600
            1100          199     -615     -87.5          53812.5       378225
            1550          219     -165     -67.5          11137.5        27225
            2350          405      635     118.5          75247.5       403225
            2450          324      735      37.5          27562.5       540225
            1425          319     -290      32.5          -9425.0        84100
            1700          255      -15     -31.5            472.5          225
Sum                                  0       0.0         172500.0      1571500
Mean        1715        286.5

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{172500}{1571500} \approx 0.109768 \approx 0.110 \]

\[ b_0 = \bar{y} - b_1 \bar{x} = 286.5 - (0.109768)(1715) \approx 98.25 \]

The intercept is computed with the unrounded slope; substituting the rounded value 0.110 would give 97.85 instead of 98.25.

[Figure: scatter plot of house price ($1000s) vs. square feet]

Using Excel

Excel output:

Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error       41.33032
Observations         10

The regression equation is:
house price = 98.25 + (0.110)(square feet)

ANOVA
              df   SS            MS            F         Significance F
Regression     1   18934.9348    18934.9348    11.0848   0.01039
Residual       8   13665.5652     1708.1957
Total          9   32600.5

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept         98.24833         58.03348   1.69296   0.12892    -35.5772   232.07386
Square Feet        0.10977          0.03297   3.32938   0.01039     0.03374     0.1858
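The hand calculation of \(b_0\) and \(b_1\) in the house-price example can be checked with a few lines of code. The following is a minimal sketch in plain Python, assuming the ten (square feet, price) pairs from the example table; the function name `least_squares_fit` is a hypothetical helper, not something from the course materials.

```python
# Data from the house-price example: size in square feet and price in $1000s.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares_fit(xs, ys):
    """Hypothetical helper: return (b0, b1, sxy, sxx) for a simple linear fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations (numerator of b1).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (denominator of b1).
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1, sxy, sxx


b0, b1, sxy, sxx = least_squares_fit(square_feet, price)
print(f"Sxy = {sxy:.1f}, Sxx = {sxx:.1f}")
print(f"b1 = {b1:.5f}, b0 = {b0:.5f}")
```

The two sums match the totals of the last two table columns (172500 and 1571500), and the coefficients agree with the Excel values 0.10977 and 98.24833 to the digits shown.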
[Figure: scatter plot of house price ($1000s) vs. square feet with the fitted line; intercept = 98.25, slope = 0.110]

house price = 98.25 + (0.110)(square feet)

Interpretation of \(b_0\)

house price = 98.25 + 0.110 (square feet)

- \(b_0\) is the estimated mean value of Y when the value of X is zero
- Note: Because a house cannot have a square footage of zero, \(b_0\) has no practical application here

Interpretation of \(b_1\)

house price = 98.25 + 0.110 (square feet)

- \(b_1\) is the estimated change in the mean value of Y when the value of X changes by one unit
- Here, \(b_1 = 0.110\) tells us that the mean value of a house increases by 0.110 x $1000, or about $110, on average, for each additional square foot of size

Making Predictions

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.110 (sq. ft.)
            = 98.25 + 0.110 (2000)
            ≈ 318

The predicted price for a house with 2000 square feet is about $318,000.

Making Predictions

- When using a regression model for prediction, only predict within the relevant range of data
- Do not try to extrapolate beyond the range of observed Xs; in this sample the observed sizes run from 1100 to 2450 square feet

[Figure: scatter plot of house price ($1000s) vs. square feet, with the relevant range for interpolation marked]

Total variation of y is made up of two parts:

\[ \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 \]

Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares

SST = SSR + SSE
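The advice under Making Predictions (predict only within the relevant range) can be made explicit in code. The sketch below assumes the rounded equation from the slides, house price = 98.25 + 0.110 (square feet), and takes the relevant range to be the smallest and largest sizes in the ten-house sample (1100 and 2450 square feet). The function name `predict_price` and the choice to raise an error on extrapolation are illustrative assumptions, not part of the course materials.

```python
# Rounded coefficients from the house-price example (price is in $1000s).
B0 = 98.25   # estimated intercept
B1 = 0.110   # estimated slope, $1000s per square foot

# Assumption: the relevant range is the observed range of X in the sample.
MIN_SQFT, MAX_SQFT = 1100, 2450


def predict_price(square_feet):
    """Hypothetical helper: predicted price in $1000s, interpolation only."""
    if not MIN_SQFT <= square_feet <= MAX_SQFT:
        raise ValueError(
            f"{square_feet} sq ft is outside the observed range "
            f"{MIN_SQFT}-{MAX_SQFT}; refusing to extrapolate"
        )
    return B0 + B1 * square_feet


print(f"Predicted price for 2000 sq ft: {predict_price(2000):.2f} ($1000s)")
```

Raising an error is one possible design; a gentler variant could return the prediction together with a warning flag. Either way, the point is that the fitted line says nothing reliable about houses much smaller or larger than those sampled.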
Total variation of y (cont'd)

- SST = total sum of squares (Total Variation)
  - The variation of the \(y_i\) values around their mean, \(\bar{y}\)
- SSR = regression sum of squares (Explained Variation)
  - Variation attributable to the relationship between X and Y
- SSE = error sum of squares (Unexplained Variation)
  - Variation in Y attributable to factors other than X

\[ SST = \sum (y_i - \bar{y})^2 \qquad SSR = \sum (\hat{y}_i - \bar{y})^2 \qquad SSE = \sum (y_i - \hat{y}_i)^2 \]

[Figure: the fitted line \(\hat{y}_i = b_0 + b_1 x_i\) with \(y_i\), \(\hat{y}_i\), and \(\bar{y}\) marked at \(x_i\), showing SST, SSR, and SSE as vertical distances]

- The coefficient of determination is
  - the portion of the total variation in the dependent variable that is explained by variation in the independent variable
  - also called r-squared and denoted \(r^2\)

\[ r^2 = \frac{\text{regression sum of squares}}{\text{total sum of squares}} = \frac{SSR}{SST} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} \]

Note: \(r^2\) is the ratio SSR/SST, and SSR can never exceed SST because SST = SSR + SSE with SSE ≥ 0, so \(0 \le r^2 \le 1\).

\(r^2 = 1\)

There is a perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X.

[Figure: example scatter plots with \(r^2 = 1\)]

\(0 < r^2 < 1\)

As \(r^2\) decreases, there is a weaker linear relationship between X and Y: some, but not all, of the variation in Y is explained by variation in X.

[Figure: example scatter plots with \(0 < r^2 < 1\)]

\(r^2 = 0\)

There is no linear relationship between X and Y: the value of Y does not depend on X. None of the variation in Y is explained by variation in X.

[Figure: example scatter plot with \(r^2 = 0\)]
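The decomposition SST = SSR + SSE and the resulting \(r^2\) can be verified numerically. The following is a minimal sketch in plain Python, assuming the ten (square feet, price) pairs from the house-price example; the variable names are illustrative.

```python
# Data from the house-price example: size in square feet and price in $1000s.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar

# Fitted value for each observation.
fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in price)                      # total variation
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)             # explained variation
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(price, fitted))  # unexplained variation
r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
print(f"r^2 = {r_squared:.5f}")
```

The identity SST = SSR + SSE holds only when the fitted values come from the least squares line with an intercept; for an arbitrary line through the data the two parts need not add up to the total.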
Using Excel

Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error       41.33032
Observations         10

\[ r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5} = 0.58082 \]

58.08% of the variation in house prices is explained by variation in square feet.

ANOVA
              df   SS            MS            F         Significance F
Regression     1   18934.9348    18934.9348    11.0848   0.01039
Residual       8   13665.5652     1708.1957
Total          9   32600.5

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept         98.24833         58.03348   1.69296   0.12892    -35.5772   232.07386
Square Feet        0.10977          0.03297   3.32938   0.01039     0.03374     0.1858
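Most of the summary figures in the Excel output follow directly from the two sums of squares in the ANOVA table. The sketch below assumes the standard simple-regression formulas (MS = SS / df, F = MSR / MSE, standard error = square root of MSE, adjusted \(r^2\) = 1 - (1 - \(r^2\))(n - 1)/(n - 2)) and takes SSR, SSE, and n from the output. The variable names are illustrative, and the p-values and confidence limits are left out because they require the F and t distributions.

```python
import math

# Inputs read from the Excel ANOVA table of the house-price example.
n = 10              # observations
ssr = 18934.9348    # regression sum of squares
sse = 13665.5652    # residual (error) sum of squares
sst = ssr + sse     # total sum of squares

df_regression = 1       # one independent variable
df_residual = n - 2     # n minus the two estimated coefficients

msr = ssr / df_regression     # mean square, regression
mse = sse / df_residual       # mean square, residual
f_stat = msr / mse            # F statistic
std_error = math.sqrt(mse)    # "Standard Error" in Regression Statistics

r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)  # equals |r| when there is one X
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(f"MSE = {mse:.4f}, F = {f_stat:.4f}, standard error = {std_error:.5f}")
print(f"R Square = {r_squared:.5f}, Multiple R = {multiple_r:.5f}, "
      f"Adjusted R Square = {adj_r_squared:.5f}")
```

Reproducing the table this way is a quick consistency check on software output: if any of these derived values disagrees with the printed one, the sums of squares or degrees of freedom were probably misread.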