Chapter 4 - Simple Linear Regression

Managerial decisions often are based on the relationship between two or more variables. Regression analysis can be used to develop an equation showing how the variables are related.
The variable being predicted is called the dependent variable and is denoted by $y$.
The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by $x$.
Simple linear regression involves one independent variable and one dependent variable. The relationship between the two variables is approximated by a straight line.
Regression analysis involving two or more independent variables is called multiple regression.

Simple Linear Regression Model
The equation that describes how $y$ is related to $x$ and an error term is called the regression model.
The simple linear regression model is: $y = \beta_0 + \beta_1 x + \varepsilon$
$\beta_0$ and $\beta_1$ are called parameters of the model; $\varepsilon$ is a random variable called the error term.

Simple Linear Regression Equation
The simple linear regression equation is: $E(y) = \beta_0 + \beta_1 x$
The graph of the regression equation is a straight line.
$\beta_0$ is the $y$-intercept of the regression line.
$\beta_1$ is the slope of the regression line.
$E(y)$ is the expected value of $y$ for a given $x$ value.

[Figure: Positive Linear Relationship. $E(y)$ against $x$; regression line with intercept $\beta_0$; slope $\beta_1$ is positive.]
[Figure: Negative Linear Relationship. $E(y)$ against $x$; regression line with intercept $\beta_0$; slope $\beta_1$ is negative.]
[Figure: No Relationship.]

Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is: $\hat{y} = b_0 + b_1 x$
The graph is called the estimated regression line.
$b_0$ is the $y$-intercept of the line.
$b_1$ is the slope of the line.
$\hat{y}$ is the estimated value of $y$ for a given $x$ value.
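Estimation Process
Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$. Regression equation: $E(y) = \beta_0 + \beta_1 x$. Unknown parameters: $\beta_0$, $\beta_1$.
Sample data: $(x_1, y_1), \ldots, (x_n, y_n)$.
Estimated regression equation: $\hat{y} = b_0 + b_1 x$. Sample statistics: $b_0$, $b_1$.
$b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.

Least Squares Method
Least squares criterion: $\min \sum (y_i - \hat{y}_i)^2 = \min \sum \left(y_i - (b_0 + b_1 x_i)\right)^2$
where:
$y_i$ = observed value of the dependent variable for the $i$th observation
$\hat{y}_i$ = estimated value of the dependent variable for the $i$th observation

[Figure: Simple Linear Regression Model, $y = \beta_0 + \beta_1 x + \varepsilon$. Shows the observed value of $y$ for $x_i$, the predicted value of $y$ for $x_i$, the random error $\varepsilon_i$ for this $x_i$ value, intercept = $\beta_0$, slope = $\beta_1$.]

Slope for the estimated regression equation: $b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$y$-intercept for the estimated regression equation: $b_0 = \bar{y} - b_1 \bar{x}$
where:
$x_i$ = value of the independent variable for the $i$th observation
$y_i$ = value of the dependent variable for the $i$th observation
$\bar{x}$ = mean value for the independent variable
$\bar{y}$ = mean value for the dependent variable

Simple Linear Regression Example: Reed Auto Sales
Reed Auto periodically has a special weeklong sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown here.

Number of TV Ads (x)    Number of Cars Sold (y)    x - x̄    y - ȳ
1                       14                         -1       -6
3                       24                          1        4
2                       18                          0       -2
1                       17                         -1       -3
3                       27                          1        7
Σx = 10                 Σy = 100

$\bar{x} = 2$, $\bar{y} = 20$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 20$, $\sum (x_i - \bar{x})^2 = 4$

Estimated Regression Equation
Slope for the estimated regression equation: $b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{20}{4} = 5$
$y$-intercept for the estimated regression equation: $b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$
Estimated regression equation: $\hat{y} = 10 + 5x$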
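The least squares formulas for the slope and intercept can be checked with a few lines of code. The following is a minimal sketch, not part of the original slides. It assumes the five Reed Auto observations are x = 1, 3, 2, 1, 3 and y = 14, 24, 18, 17, 27, which is consistent with the totals and results quoted in the example. The function name `least_squares` is illustrative.

```python
# Reed Auto Sales sample (assumed values, consistent with the sums 10 and 100).
tv_ads = [1, 3, 2, 1, 3]          # x: number of TV ads
cars_sold = [14, 24, 18, 17, 27]  # y: number of cars sold


def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(tv_ads, cars_sold)
print(f"y_hat = {b0:g} + {b1:g}x")  # y_hat = 10 + 5x
```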
Using Excel's Chart Tools for Scatter Diagram & Estimated Regression Equation
[Figure: Reed Auto Sales Estimated Regression Line.]

Measures of Variation
Total variation is made up of two parts: $SST = SSR + SSE$
SST = total sum of squares (total variation). Measures the variation of the $y_i$ values around their mean $\bar{y}$: $SST = \sum (y_i - \bar{y})^2$
SSR = regression sum of squares (explained variation). Variation attributable to the relationship between $x$ and $y$: $SSR = \sum (\hat{y}_i - \bar{y})^2$
SSE = error sum of squares (unexplained variation). Variation in $y$ attributable to factors other than $x$: $SSE = \sum (y_i - \hat{y}_i)^2$
where:
$\bar{y}$ = mean value of the dependent variable
$y_i$ = observed value of the dependent variable
$\hat{y}_i$ = predicted value of $y$ for the given $x_i$ value

[Figure: Measures of Variation. For one observation, shows $SST = \sum (y_i - \bar{y})^2$, $SSE = \sum (y_i - \hat{y}_i)^2$ and $SSR = \sum (\hat{y}_i - \bar{y})^2$ relative to $\bar{y}$ and the regression line.]

Coefficient of Determination, $r^2$ or $R^2$
Relationship among SST, SSR, SSE: $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$, that is, $SST = SSR + SSE$
The coefficient of determination is: $r^2 = SSR/SST$
For Reed Auto Sales: $r^2 = SSR/SST = 100/114 = .8772$
The regression relationship is very strong; 87.72% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.

Sample Correlation Coefficient
We learned in Chapter 3
$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}} = (\text{sign of } b_1)\sqrt{r^2}$
where $b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
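The three sums of squares, the coefficient of determination and the sample correlation coefficient can be reproduced numerically. This is an illustrative sketch under stated assumptions: the Reed Auto data are taken to be x = 1, 3, 2, 1, 3 and y = 14, 24, 18, 17, 27, and the estimated regression equation is taken to be ŷ = 10 + 5x. Variable names are hypothetical.

```python
import math

# Assumed Reed Auto Sales data and fitted coefficients (y_hat = 10 + 5x).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

r_squared = ssr / sst
# The correlation coefficient takes the sign of the slope b1.
r_xy = math.copysign(math.sqrt(r_squared), b1)

print(sst, ssr, sse)                         # 114.0 100.0 14.0
print(round(r_squared, 4), round(r_xy, 4))   # 0.8772 0.9366
```

The check `sst == ssr + sse` holds here, which mirrors the identity SST = SSR + SSE.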
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".
$r_{xy} = +\sqrt{.8772} = +.9366$
Note: This only holds for simple regression.

Examples of Approximate $r^2$ (or $R^2$) Values
$r^2 = 1$: Perfect linear relationship between $x$ and $y$: 100% of the variation in $y$ is explained by variation in $x$.
$0 < r^2 < 1$: Weaker linear relationships between $x$ and $y$: some but not all of the variation in $y$ is explained by variation in $x$.
$r^2 = 0$: No linear relationship between $x$ and $y$: the value of $y$ does not depend on $x$. (None of the variation in $y$ is explained by variation in $x$.)

[Figure: Different Values of the Correlation Coefficient Once Again.]

Armand's Pizza (Excel File)
$\hat{y} = 60 + 5x$

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.950122955
R Square             0.90273363
Adjusted R Square    0.890575334
Standard Error       13.82931669
Observations         10

ANOVA
                    df    SS       MS        F          Significance F
Regression (SSR)    1     14200    14200     74.2484    2.54887E-05
Residual (SSE)      8     1530     191.25
Total (SST)         9     15730

                Coefficients    Standard Error    t Stat      P-value    Lower 95%      Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       60              9.22603481        6.503336    0.00019    38.72472559    81.27527     38.7247256     81.2752744
X Variable 1    5               0.580265238       8.616749    2.5E-05    3.661905963    6.338094     3.66190596     6.338094037
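Several entries in the Excel summary output are simple functions of the ANOVA sums of squares. The sketch below is illustrative only. It assumes the Armand's Pizza values SSR = 14,200 and SSE = 1,530 with n = 10 observations, and it uses the standard adjusted R² formula for one independent variable, which the slides do not state.

```python
import math

# Assumed ANOVA quantities for Armand's Pizza (one independent variable).
n = 10
ssr = 14200.0
sse = 1530.0
sst = ssr + sse  # 15730

r_square = ssr / sst
multiple_r = math.sqrt(r_square)
# Adjusted R Square for a single predictor (standard formula, an assumption here).
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)

msr = ssr / 1        # regression mean square, 1 degree of freedom
mse = sse / (n - 2)  # residual mean square, n - 2 degrees of freedom
standard_error = math.sqrt(mse)
f_stat = msr / mse

print(round(multiple_r, 4), round(r_square, 4), round(adjusted_r_square, 4))
print(round(standard_error, 4), round(f_stat, 4))
```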
[Figure: Armand's Pizza. Scatter plot of Sales against Population with predicted values and linear trendline $\hat{y} = 60 + 5x$, $R^2 = 0.9027$.]

Reed Auto Sales Estimated Regression Line Once Again

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.936585812
R Square             0.877192982
Adjusted R Square    0.83625731
Standard Error       2.160246899
Observations         5

ANOVA
              df    SS     MS       F              Significance F
Regression    1     100    100      21.42857143    0.0189863
Residual      3     14     4.667
Total         4     114

             Coefficients    Standard Error    t Stat    P-value      Lower 95%      Upper 95%    Lower 95.0%    Upper 95.0%
Intercept    10              2.366431913       4.226     0.0242360    2.4689575      17.531       2.469          17.5310425
Ads          5               1.08012345        4.629     0.0189863    1.562565119    8.437        1.563          8.43743488

Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
$\hat{y} = 10 + 5(3) = 25$ cars

Looking at Regression in More Detail

Assumptions About the Error Term $\varepsilon$
1. The error $\varepsilon$ is a random variable with mean of zero.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed random variable.

Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used: the t Test and the F Test.
Both the t test and F test require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.

An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
$s^2 = MSE = SSE/(n - 2)$
where: $SSE = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$

An Estimate of $\sigma$
To estimate $\sigma$ we take the square root of $s^2$. The resulting $s$ is called the standard error of the estimate.
$s = \sqrt{MSE} = \sqrt{\dfrac{SSE}{n - 2}}$
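The point estimate and the estimates of σ² and σ follow directly from the fitted line. A minimal sketch, assuming the Reed Auto data x = 1, 3, 2, 1, 3 and y = 14, 24, 18, 17, 27 and the estimated equation ŷ = 10 + 5x (the variable names are illustrative):

```python
import math

# Assumed Reed Auto Sales data and fitted coefficients.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)

# Point estimate of mean cars sold when 3 TV ads are run.
y_hat_3_ads = b0 + b1 * 3

# SSE, then MSE = SSE / (n - 2) as the estimate of the error variance.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
s = math.sqrt(mse)  # standard error of the estimate

print(y_hat_3_ads)                   # 25.0
print(round(mse, 3), round(s, 4))    # 4.667 2.1602
```

The divisor is n - 2 because two parameters, the intercept and the slope, are estimated from the sample.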
Testing for Significance: t Test
Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
Test statistic: $t = \dfrac{b_1}{s_{b_1}}$ where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
Rejection rule: Reject $H_0$ if p-value < $\alpha$ or $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where $t_{\alpha/2}$ is based on a t distribution with $n - 2$ degrees of freedom

Testing for Significance: t Test (Reed Auto Sales)
1. Determine the hypotheses. $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
2. Specify the level of significance. $\alpha = .05$
3. Select the test statistic. $t = \dfrac{b_1}{s_{b_1}}$
4. State the rejection rule. Reject $H_0$ if p-value < .05 or $|t| > 3.182$ (with 3 degrees of freedom)
5. Compute the value of the test statistic. $t = \dfrac{b_1}{s_{b_1}} = \dfrac{5}{1.08} = 4.63$
6. Determine whether to reject $H_0$. $t = 4.541$ provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, $t = 4.63 > 3.182$.) We can reject $H_0$.

Confidence Interval for $\beta_1$
The form of a confidence interval for $\beta_1$ is: $b_1 \pm t_{\alpha/2}\, s_{b_1}$
$b_1$ is the point estimator.
$t_{\alpha/2}\, s_{b_1}$ is the margin of error,
where $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 2$ degrees of freedom.

Confidence Interval for $\beta_1$ (Reed Auto Sales)
Rejection rule: Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% confidence interval for $\beta_1$: $b_1 \pm t_{\alpha/2}\, s_{b_1} = 5 \pm 3.182(1.08) = 5 \pm 3.44$, or 1.56 to 8.44
Conclusion: 0 is not included in the confidence interval. Reject $H_0$.

Testing for Significance: F Test
Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
Test statistic: $F = MSR/MSE$
Rejection rule: Reject $H_0$ if p-value < $\alpha$ or $F > F_\alpha$
where $F_\alpha$ is based on an F distribution with 1 degree of freedom in the numerator and $n - 2$ degrees of freedom in the denominator
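The t statistic and the confidence interval for the slope can be traced step by step. This is a sketch under stated assumptions: the Reed Auto data are x = 1, 3, 2, 1, 3 and y = 14, 24, 18, 17, 27, the fitted equation is ŷ = 10 + 5x, and the critical value 3.182 (α = .05, 3 degrees of freedom) is hard-coded from the t table value quoted in the text rather than computed.

```python
import math

# Assumed Reed Auto Sales data and fitted coefficients.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)

x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))   # standard error of the estimate

s_b1 = s / math.sqrt(s_xx)     # estimated standard deviation of b1
t_stat = b1 / s_b1

T_CRIT = 3.182                 # t_{.025} with 3 d.f., taken from the text
margin = T_CRIT * s_b1
ci_lower, ci_upper = b1 - margin, b1 + margin

reject_h0 = abs(t_stat) > T_CRIT
print(round(s_b1, 2), round(t_stat, 2))       # 1.08 4.63
print(round(ci_lower, 2), round(ci_upper, 2)) # 1.56 8.44
print(reject_h0)                              # True
```

Both decision rules agree: the test statistic exceeds the critical value, and 0 falls outside the interval.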
[Figure: Mechanics of the F Test Graphically.]

Testing for Significance: F Test (Reed Auto Sales)
1. Determine the hypotheses. $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
2. Specify the level of significance. $\alpha = .05$
3. Select the test statistic. $F = MSR/MSE$
4. State the rejection rule. Reject $H_0$ if p-value < .05 or $F > 10.13$ (with 1 d.f. in numerator and 3 d.f. in denominator)
5. Compute the value of the test statistic. $F = MSR/MSE = 100/4.667 = 21.43$
6. Determine whether to reject $H_0$. $F = 17.44$ provides an area of .025 in the upper tail. Thus, the p-value corresponding to $F = 21.43$ is less than .025, which is below $\alpha = .05$. Hence, we reject $H_0$.
The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.

Some Cautions about the Interpretation of Significance Tests
Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between $x$ and $y$ is significant does not enable us to conclude that a cause-and-effect relationship is present between $x$ and $y$.
Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between $x$ and $y$.

Residual Analysis
If the assumptions about the error term $\varepsilon$ appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
The residuals provide the best information about $\varepsilon$.
Residual for observation $i$: $y_i - \hat{y}_i$
Much of the residual analysis is based on an examination of graphical plots.

Residual Plot Against $x$
If the assumption that the variance of $\varepsilon$ is the same for all values of $x$ is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.
The residuals should be unbiased: they have an average value of zero in any thin vertical strip.
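The F test arithmetic can be verified in a few lines. The sketch below assumes the Reed Auto values SSR = 100, SSE = 14, n = 5, slope 5 and Σ(x - x̄)² = 4, and hard-codes the critical value 10.13 from the F table value quoted in the text. It also illustrates a known property of simple linear regression: with one independent variable, the F statistic equals the square of the t statistic for the slope.

```python
import math

# Assumed Reed Auto Sales quantities.
n = 5
ssr, sse = 100.0, 14.0
b1, s_xx = 5.0, 4.0

msr = ssr / 1        # 1 numerator degree of freedom
mse = sse / (n - 2)  # n - 2 denominator degrees of freedom
f_stat = msr / mse

F_CRIT = 10.13       # F_{.05} with (1, 3) d.f., taken from the text
reject_h0 = f_stat > F_CRIT

# t statistic for the slope, for comparison with F.
t_stat = b1 / math.sqrt(mse / s_xx)

print(round(f_stat, 2), reject_h0)   # 21.43 True
print(round(t_stat ** 2, 2))         # 21.43
```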
Residual Plot Against $x$
Example: Armand's Pizza Parlors

Student Population (x)    Sales (y)    Predicted sales ŷ = 60 + 5x    Residuals (y - ŷ)
2                         58           70                             -12
6                         105          90                             15
8                         88           100                            -12
8                         118          100                            18
12                        117          120                            -3
16                        137          140                            -3
20                        157          160                            -3
20                        169          160                            9
22                        149          170                            -21
26                        202          190                            12

[Figure: Residual Plot Against $x$.]

Using Excel to Produce a Residual Plot
When the Regression dialog box appears, we must also select the Residual Plot option.
The output will include two new items:
A plot of the residuals against the independent variable, and
A list of predicted values of $y$ and the corresponding residual values.

Standardized Residual Plot
The standardized residual plot can provide insight about the assumption that the error term $\varepsilon$ has a normal distribution.
If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.

Example: Armand's Pizza Parlors

Observation    Predicted sales ŷ = 60 + 5x    Residuals (y - ŷ)    Standardized Residual
1              70                             -12                  -1.0792
2              90                             15                   1.2224
3              100                            -12                  -.9487
4              100                            18                   1.4230
5              120                            -3                   -.2296
6              140                            -3                   -.2296
7              160                            -3                   -.2372
8              160                            9                    .7115
9              170                            -21                  -1.7114
10             190                            12                   1.0792

Independence Assumption
The independence assumption is most likely to be violated when the data are time-series data.
If the data are not time series, then they can be reordered without affecting the data.
For time-series data, the time-ordered error terms can be autocorrelated.
Positive autocorrelation is when a positive error term in time period $i$ tends to be followed by another positive value in period $i + k$.
Negative autocorrelation is when a positive error term tends to be followed by a negative value.

[Figure: Independence Assumption Visually: Positive Autocorrelation.]
[Figure: Independence Assumption Visually: Negative Autocorrelation.]
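The residuals and standardized residuals for Armand's Pizza Parlors can be recomputed from the data. This is a hedged sketch. The data values (student population 2, 6, 8, 8, 12, 16, 20, 20, 22, 26 and sales 58, 105, 88, 118, 117, 137, 157, 169, 149, 202) are assumed. The standardized residual formula used here, residual divided by s·sqrt(1 - h_i) with leverage h_i = 1/n + (x_i - x̄)²/Σ(x - x̄)², is an assumption: the slides do not state it, though it appears to reproduce the tabulated values.

```python
import math

# Assumed Armand's Pizza Parlors data and fitted line y_hat = 60 + 5x.
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0
n = len(population)

predicted = [b0 + b1 * xi for xi in population]
residuals = [yi - yh for yi, yh in zip(sales, predicted)]

x_bar = sum(population) / n
s_xx = sum((xi - x_bar) ** 2 for xi in population)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # standard error of the estimate

standardized = []
for xi, e in zip(population, residuals):
    leverage = 1 / n + (xi - x_bar) ** 2 / s_xx   # h_i (assumed formula)
    standardized.append(e / (s * math.sqrt(1 - leverage)))

for i, (yh, e, z) in enumerate(zip(predicted, residuals, standardized), start=1):
    print(f"{i:2d}  {yh:5.0f}  {e:5.0f}  {z:8.4f}")
```

If the normality assumption holds, roughly 95% of the standardized residuals would be expected to fall between -2 and +2, which is the case for all ten values here.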