4//6

Applied Statistics and Probability for Engineers
Sixth Edition
Douglas C. Montgomery, George C. Runger

Chapter 11
Simple Linear Regression and Correlation

CHAPTER OUTLINE
11-1 Empirical Models
11-2 Simple Linear Regression
11-3 Properties of the Least Squares Estimators
11-4 Hypothesis Tests in Simple Linear Regression
    11-4.1 Use of t-tests
    11-4.2 Analysis of variance approach to test significance of regression
11-5 Confidence Intervals
    11-5.1 Confidence intervals on the slope and intercept
    11-5.2 Confidence interval on the mean response
11-6 Prediction of New Observations
11-7 Adequacy of the Regression Model
    11-7.1 Residual analysis
    11-7.2 Coefficient of determination ($R^2$)
11-8 Correlation
11-9 Regression on Transformed Variables
11-10 Logistic Regression
Learning Objectives for Chapter 11

After careful study of this chapter, you should be able to do the following:
1. Use simple linear regression to build empirical models from engineering and scientific data.
2. Understand how the method of least squares is used to estimate the parameters in a linear regression model.
3. Analyze residuals to determine if the regression model is an adequate fit to the data or to see if any underlying assumptions are violated.
4. Test statistical hypotheses and construct confidence intervals on the regression model parameters.
5. Use the regression model to make a prediction of a future observation and construct an appropriate prediction interval on the future observation.
6. Apply the correlation model.
7. Use simple transformations to achieve a linear regression model.

11-1: Empirical Models

Many problems in engineering and science involve exploring the relationships between two or more variables.

Regression analysis is a statistical technique that is very useful for these types of problems.

For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature. Regression analysis can be used to build a model to predict yield at a given temperature level.
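11-2: Simple Linear Regression

Simple linear regression considers a single regressor or predictor $x$ and a dependent or response variable $Y$.

At each level of $x$, $Y$ is a random variable whose expected value is
\[ E(Y \mid x) = \beta_0 + \beta_1 x \]

We assume that each observation, $Y$, can be described by the model
\[ Y = \beta_0 + \beta_1 x + \epsilon \]
where $\epsilon$ is a random error with mean zero and variance $\sigma^2$.

Least Squares Estimates

The least-squares estimates of the intercept and slope in the simple linear regression model are
\[ \hat\beta_0 = \bar y - \hat\beta_1 \bar x \] (11-1)
\[ \hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \] (11-2)
where $\bar y = (1/n)\sum_{i=1}^{n} y_i$ and $\bar x = (1/n)\sum_{i=1}^{n} x_i$.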
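11-2: Simple Linear Regression

The fitted or estimated regression line is therefore
\[ \hat y = \hat\beta_0 + \hat\beta_1 x \] (11-3)

Note that each pair of observations satisfies the relationship
\[ y_i = \hat\beta_0 + \hat\beta_1 x_i + e_i, \qquad i = 1, 2, \ldots, n \]
where $e_i = y_i - \hat y_i$ is called the residual. The residual describes the error in the fit of the model to the $i$th observation $y_i$.

Notation

\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \]
\[ S_{xy} = \sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \]

With this notation, the slope estimate in Equation 11-2 is $\hat\beta_1 = S_{xy}/S_{xx}$.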
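11-2: Simple Linear Regression

EXAMPLE 11-1 Oxygen Purity

We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed:
\[ n = 20, \quad \sum x_i = 23.92, \quad \sum y_i = 1{,}843.21, \quad \bar x = 1.1960, \quad \bar y = 92.1605 \]
\[ \sum y_i^2 = 170{,}044.5321, \quad \sum x_i^2 = 29.2892, \quad \sum x_i y_i = 2{,}214.6566 \]
\[ S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088 \]
and
\[ S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 2{,}214.6566 - \frac{(23.92)(1{,}843.21)}{20} = 10.17744 \]

EXAMPLE 11-1 Oxygen Purity (continued)

Therefore, the least squares estimates of the slope and intercept are
\[ \hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748 \]
and
\[ \hat\beta_0 = \bar y - \hat\beta_1 \bar x = 92.1605 - (14.94748)(1.196) = 74.28331 \]

The fitted simple linear regression model (with the coefficients reported to three decimal places) is
\[ \hat y = 74.283 + 14.947x \]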
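The arithmetic in Example 11-1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the summary sums are assumed to be the ones quoted in the example, and all variable names are hypothetical.

```python
# Summary sums for the oxygen purity data (assumed values, as read from Example 11-1).
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

# Corrected sum of squares and sum of cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares slope (Equation 11-2) and intercept (Equation 11-1).
b1 = s_xy / s_xx
b0 = sum_y / n - b1 * sum_x / n

print(f"S_xx = {s_xx:.5f}, S_xy = {s_xy:.5f}")
print(f"slope = {b1:.3f}, intercept = {b0:.3f}")
```

Working from the sums rather than the raw observations mirrors the hand calculation; with the raw data available, the centered forms of $S_{xx}$ and $S_{xy}$ are usually less sensitive to rounding.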
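11-2: Simple Linear Regression

Estimating $\sigma^2$

The error sum of squares is
\[ SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 \]

It can be shown that the expected value of the error sum of squares is $E(SS_E) = (n - 2)\sigma^2$.

An unbiased estimator of $\sigma^2$ is
\[ \hat\sigma^2 = \frac{SS_E}{n - 2} \] (11-4)
where $SS_E$ can be easily computed using
\[ SS_E = SS_T - \hat\beta_1 S_{xy} \] (11-5)
and $SS_T = \sum_{i=1}^{n} (y_i - \bar y)^2$ is the total sum of squares.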
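As a rough check of Equations 11-4 and 11-5, the sketch below plugs in values assumed from the oxygen purity example ($SS_T = 173.38$, $\hat\beta_1 = 14.94748$, $S_{xy} = 10.17744$, $n = 20$). The variable names are illustrative only.

```python
# Assumed values from the oxygen purity example.
n = 20
ss_t = 173.38      # total sum of squares
b1 = 14.94748      # least squares slope
s_xy = 10.17744    # corrected sum of cross-products

ss_e = ss_t - b1 * s_xy        # Equation 11-5
sigma2_hat = ss_e / (n - 2)    # Equation 11-4, unbiased estimate of sigma^2

print(f"SS_E = {ss_e:.2f}, sigma^2 estimate = {sigma2_hat:.2f}")
```

The divisor is $n - 2$ rather than $n$ because two parameters, the intercept and the slope, were estimated from the data.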
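11-3: Properties of the Least Squares Estimators

Slope Properties
\[ E(\hat\beta_1) = \beta_1, \qquad V(\hat\beta_1) = \frac{\sigma^2}{S_{xx}} \]

Intercept Properties
\[ E(\hat\beta_0) = \beta_0 \quad \text{and} \quad V(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right] \]

11-4: Hypothesis Tests in Simple Linear Regression
11-4.1 Use of t-tests

Suppose we wish to test
\[ H_0: \beta_1 = \beta_{1,0} \qquad H_1: \beta_1 \ne \beta_{1,0} \]

An appropriate test statistic would be
\[ T_0 = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2 / S_{xx}}} \] (11-6)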
11-4: Hypothesis Tests in Simple Linear Regression
11-4.1 Use of t-tests

The test statistic for the slope could also be written as:
\[ T_0 = \frac{\hat\beta_1 - \beta_{1,0}}{se(\hat\beta_1)} \]

We would reject the null hypothesis if
\[ |t_0| > t_{\alpha/2,\,n-2} \]

Suppose we wish to test
\[ H_0: \beta_0 = \beta_{0,0} \qquad H_1: \beta_0 \ne \beta_{0,0} \]

An appropriate test statistic would be
\[ T_0 = \frac{\hat\beta_0 - \beta_{0,0}}{\sqrt{\hat\sigma^2\left[\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right]}} = \frac{\hat\beta_0 - \beta_{0,0}}{se(\hat\beta_0)} \] (11-7)
11-4: Hypothesis Tests in Simple Linear Regression
11-4.1 Use of t-tests

We would reject the null hypothesis on the intercept if
\[ |t_0| > t_{\alpha/2,\,n-2} \]

An important special case of the hypotheses on the slope is
\[ H_0: \beta_1 = 0 \qquad H_1: \beta_1 \ne 0 \]

These hypotheses relate to the significance of regression.

Failure to reject $H_0$ is equivalent to concluding that there is no linear relationship between $x$ and $Y$.
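The two t statistics can be sketched as follows. The inputs are assumed from the oxygen purity example ($n = 20$, $\bar x = 1.1960$, $S_{xx} = 0.68088$, $\hat\beta_0 = 74.283$, $\hat\beta_1 = 14.947$, $\hat\sigma^2 = 1.18$), and the critical value $t_{0.005,18} = 2.88$ is taken from a t table for a two-sided test at $\alpha = 0.01$ rather than computed.

```python
import math

# Assumed values from the oxygen purity example.
n, x_bar, s_xx = 20, 1.1960, 0.68088
b0, b1, sigma2_hat = 74.283, 14.947, 1.18

# Estimated standard errors of the slope and intercept.
se_b1 = math.sqrt(sigma2_hat / s_xx)
se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar**2 / s_xx))

t_slope = (b1 - 0.0) / se_b1        # Equation 11-6 with beta_{1,0} = 0
t_intercept = (b0 - 0.0) / se_b0    # Equation 11-7 with beta_{0,0} = 0

t_crit = 2.88                       # t_{0.005,18}, read from a t table
reject_slope = abs(t_slope) > t_crit

print(f"t (slope) = {t_slope:.2f}, t (intercept) = {t_intercept:.2f}")
print("reject H0: beta_1 = 0 ->", reject_slope)
```

Because the inputs are rounded, the last digit of each statistic may differ slightly from software output computed at full precision.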
11-4: Hypothesis Tests in Simple Linear Regression

EXAMPLE 11-2 Oxygen Purity Tests of Coefficients

We will test for significance of regression using the model for the oxygen purity data from Example 11-1. The hypotheses are
\[ H_0: \beta_1 = 0 \qquad H_1: \beta_1 \ne 0 \]
and we will use $\alpha = 0.01$. From Example 11-1 and Table 11-2 we have
\[ \hat\beta_1 = 14.947, \quad n = 20, \quad S_{xx} = 0.68088, \quad \hat\sigma^2 = 1.18 \]
so the t-statistic in Equation 11-6 becomes
\[ t_0 = \frac{\hat\beta_1}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{\hat\beta_1}{se(\hat\beta_1)} = \frac{14.947}{\sqrt{1.18 / 0.68088}} = 11.35 \]

Practical Interpretation: Since the reference value of t is $t_{0.005,18} = 2.88$, the value of the test statistic is very far into the critical region, implying that $H_0: \beta_1 = 0$ should be rejected. There is strong evidence that the slope is not zero. The P-value for this test is $P \approx 1.23 \times 10^{-9}$. This was obtained manually with a calculator.

Table 11-2 presents the Minitab output for this problem. Notice that the t-statistic value for the slope is computed as 11.35 and that the reported P-value is P = 0.000. Minitab also reports the t-statistic for testing the hypothesis $H_0: \beta_0 = 0$. This statistic is computed from Equation 11-7, with $\beta_{0,0} = 0$, as $t_0 = 46.62$. Clearly, then, the hypothesis that the intercept is zero is rejected.

11-4.2 Analysis of Variance Approach to Test Significance of Regression

The analysis of variance identity is
\[ \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2 \] (11-8)

Symbolically,
\[ SS_T = SS_R + SS_E \] (11-9)
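The identity in Equations 11-8 and 11-9 can be checked numerically. The sketch below uses a small hypothetical data set (not the oxygen purity data) and plain Python; it fits the least squares line and then compares the three sums of squares.

```python
# Hypothetical data, used only to illustrate the identity SS_T = SS_R + SS_E.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Least squares fit.
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares in Equation 11-8.
ss_t = sum((yi - y_bar) ** 2 for yi in y)
ss_r = sum((fi - y_bar) ** 2 for fi in y_hat)
ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

print(f"SS_T = {ss_t:.4f}, SS_R + SS_E = {ss_r + ss_e:.4f}")
```

The decomposition holds exactly only for the least squares line; for any other line the cross-product term does not vanish.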
11-4: Hypothesis Tests in Simple Linear Regression
11-4.2 Analysis of Variance Approach to Test Significance of Regression

If the null hypothesis $H_0: \beta_1 = 0$ is true, the statistic
\[ F_0 = \frac{SS_R / 1}{SS_E / (n - 2)} = \frac{MS_R}{MS_E} \] (11-10)
follows the $F_{1,\,n-2}$ distribution and we would reject $H_0$ if $f_0 > f_{\alpha,\,1,\,n-2}$.

The quantities $MS_R$ and $MS_E$ are called mean squares.

Analysis of variance table:

Source of Variation   Sum of Squares                           Degrees of Freedom   Mean Square   F_0
Regression            $SS_R = \hat\beta_1 S_{xy}$              1                    $MS_R$        $MS_R / MS_E$
Error                 $SS_E = SS_T - \hat\beta_1 S_{xy}$       $n - 2$              $MS_E$
Total                 $SS_T$                                   $n - 1$

Note that $MS_E = \hat\sigma^2$.
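The entries of the analysis of variance table reduce to a few divisions. The following sketch assumes the sums of squares for the oxygen purity model ($SS_R = 152.13$, $SS_E = 21.25$, $n = 20$); the names are illustrative.

```python
# Assumed sums of squares for the oxygen purity model.
n = 20
ss_r, ss_e = 152.13, 21.25

ms_r = ss_r / 1          # regression mean square, 1 degree of freedom
ms_e = ss_e / (n - 2)    # error mean square, equal to the estimate of sigma^2
f0 = ms_r / ms_e         # Equation 11-10

print(f"MS_R = {ms_r:.2f}, MS_E = {ms_e:.4f}, F0 = {f0:.2f}")
```

In simple linear regression this F statistic is the square of the t statistic for $H_0: \beta_1 = 0$, so the two tests lead to the same conclusion.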
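11-4: Hypothesis Tests in Simple Linear Regression

EXAMPLE 11-3 Oxygen Purity ANOVA

We will use the analysis of variance approach to test for significance of regression using the oxygen purity data model from Example 11-1. Recall that $SS_T = 173.38$, $\hat\beta_1 = 14.947$, $S_{xy} = 10.17744$, and $n = 20$. The regression sum of squares is
\[ SS_R = \hat\beta_1 S_{xy} = (14.947)(10.17744) = 152.13 \]
and the error sum of squares is
\[ SS_E = SS_T - SS_R = 173.38 - 152.13 = 21.25 \]

The analysis of variance for testing $H_0: \beta_1 = 0$ is summarized in the Minitab output in Table 11-2. The test statistic is $f_0 = MS_R / MS_E = 152.13 / 1.18 = 128.86$, for which we find that the P-value is $P \approx 1.23 \times 10^{-9}$, so we conclude that $\beta_1$ is not zero.

There are frequently minor differences in terminology among computer packages. For example, sometimes the regression sum of squares is called the model sum of squares, and the error sum of squares is called the residual sum of squares.

11-5: Confidence Intervals
11-5.1 Confidence Intervals on the Slope and Intercept

Definition

Under the assumption that the observations are normally and independently distributed, a $100(1 - \alpha)\%$ confidence interval on the slope $\beta_1$ in simple linear regression is
\[ \hat\beta_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \] (11-11)

Similarly, a $100(1 - \alpha)\%$ confidence interval on the intercept $\beta_0$ is
\[ \hat\beta_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]} \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]} \] (11-12)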
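Equations 11-11 and 11-12 can be evaluated directly. This sketch assumes the oxygen purity values ($n = 20$, $\bar x = 1.1960$, $S_{xx} = 0.68088$, $\hat\beta_0 = 74.283$, $\hat\beta_1 = 14.947$, $\hat\sigma^2 = 1.18$) and a tabulated critical value $t_{0.025,18} = 2.101$ for 95% confidence; none of the names come from the original slides.

```python
import math

# Assumed values from the oxygen purity example.
n, x_bar, s_xx = 20, 1.1960, 0.68088
b0, b1, sigma2_hat = 74.283, 14.947, 1.18
t_crit = 2.101   # t_{0.025,18}, read from a t table

# Half-widths of the two intervals.
half_b1 = t_crit * math.sqrt(sigma2_hat / s_xx)
half_b0 = t_crit * math.sqrt(sigma2_hat * (1 / n + x_bar**2 / s_xx))

ci_slope = (b1 - half_b1, b1 + half_b1)        # Equation 11-11
ci_intercept = (b0 - half_b0, b0 + half_b0)    # Equation 11-12

print("95% CI on slope:    ", tuple(round(v, 3) for v in ci_slope))
print("95% CI on intercept:", tuple(round(v, 3) for v in ci_intercept))
```

Both intervals use the same critical value because both estimators share the $n - 2$ error degrees of freedom.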
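11-5: Confidence Intervals

EXAMPLE 11-4 Oxygen Purity Confidence Interval on the Slope

We will find a 95% confidence interval on the slope of the regression line using the data in Example 11-1. Recall that $\hat\beta_1 = 14.947$, $S_{xx} = 0.68088$, and $\hat\sigma^2 = 1.18$ (see Table 11-2). Then, from Equation 11-11 we find
\[ \hat\beta_1 - t_{0.025,18}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{0.025,18}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \]
or
\[ 14.947 - 2.101\sqrt{\frac{1.18}{0.68088}} \le \beta_1 \le 14.947 + 2.101\sqrt{\frac{1.18}{0.68088}} \]
This simplifies to
\[ 12.181 \le \beta_1 \le 17.713 \]

Practical Interpretation: This CI does not include zero, so there is strong evidence (at $\alpha = 0.05$) that the slope is not zero. The CI is reasonably narrow ($\pm 2.766$) because the error variance is fairly small.

11-5.2 Confidence Interval on the Mean Response

Definition

A $100(1 - \alpha)\%$ confidence interval about the mean response at the value $x = x_0$, say $\mu_{Y|x_0}$, is given by
\[ \hat\mu_{Y|x_0} - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right]} \le \mu_{Y|x_0} \le \hat\mu_{Y|x_0} + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right]} \] (11-13)
where $\hat\mu_{Y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$ is computed from the fitted regression model.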
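A small helper makes the structure of Equation 11-13 explicit. The function name `mean_response_ci` and its argument order are hypothetical, and the call below assumes the oxygen purity values with $x_0 = 1.00$ and $t_{0.025,18} = 2.101$.

```python
import math

def mean_response_ci(x0, b0, b1, sigma2_hat, n, x_bar, s_xx, t_crit):
    """Return (lower, upper) limits of Equation 11-13 at x0."""
    mu_hat = b0 + b1 * x0
    half = t_crit * math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx))
    return mu_hat - half, mu_hat + half

# Assumed oxygen purity values; x0 = 1.00 percent hydrocarbon level.
args = dict(b0=74.283, b1=14.947, sigma2_hat=1.18, n=20,
            x_bar=1.1960, s_xx=0.68088, t_crit=2.101)
lower, upper = mean_response_ci(1.00, **args)

# Width at the mean of x, where the interval is narrowest.
lo_c, hi_c = mean_response_ci(args["x_bar"], **args)

print(f"95% CI at x0 = 1.00: ({lower:.2f}, {upper:.2f})")
print(f"width at x_bar: {hi_c - lo_c:.3f}, width at 1.00: {upper - lower:.3f}")
```

The $(x_0 - \bar x)^2$ term is what widens the interval as $x_0$ moves away from the center of the data.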
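11-5: Confidence Intervals

EXAMPLE 11-5 Oxygen Purity Confidence Interval on the Mean Response

We will construct a 95% confidence interval about the mean response for the data in Example 11-1. The fitted model is $\hat\mu_{Y|x_0} = 74.283 + 14.947x_0$, and the 95% confidence interval on $\mu_{Y|x_0}$ is found from Equation 11-13 as
\[ \hat\mu_{Y|x_0} \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(x_0 - 1.1960)^2}{0.68088}\right]} \]

Suppose that we are interested in predicting mean oxygen purity when $x_0 = 1.00\%$. Then
\[ \hat\mu_{Y|1.00} = 74.283 + 14.947(1.00) = 89.23 \]
and the 95% confidence interval is
\[ 89.23 \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]} \]
or
\[ 89.23 \pm 0.75 \]
Therefore, the 95% CI on $\mu_{Y|1.00}$ is
\[ 88.48 \le \mu_{Y|1.00} \le 89.98 \]
This is a reasonably narrow CI.

11-6: Prediction of New Observations

Prediction Interval

A $100(1 - \alpha)\%$ prediction interval on a future observation $Y_0$ at the value $x_0$ is given by
\[ \hat y_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right]} \le Y_0 \le \hat y_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right]} \] (11-14)

The value $\hat y_0$ is computed from the regression model $\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$.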
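Equation 11-14 differs from the mean response interval only by the extra 1 inside the bracket, which accounts for the variability of the new observation itself. A minimal sketch, again assuming the oxygen purity values and a hypothetical helper name:

```python
import math

def prediction_interval(x0, b0, b1, sigma2_hat, n, x_bar, s_xx, t_crit):
    """Return (lower, upper) limits of Equation 11-14 for a new observation at x0."""
    y0_hat = b0 + b1 * x0
    half = t_crit * math.sqrt(
        sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    )
    return y0_hat - half, y0_hat + half

# Assumed oxygen purity values; t_{0.025,18} = 2.101 read from a t table.
pi_lower, pi_upper = prediction_interval(
    1.00, b0=74.283, b1=14.947, sigma2_hat=1.18,
    n=20, x_bar=1.1960, s_xx=0.68088, t_crit=2.101,
)

print(f"95% prediction interval at x0 = 1.00: ({pi_lower:.2f}, {pi_upper:.2f})")
```

The prediction interval is always wider than the confidence interval on the mean response at the same $x_0$, and it does not shrink to zero width as $n$ grows.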
11-6: Prediction of New Observations

EXAMPLE 11-6 Oxygen Purity Prediction Interval

To illustrate the construction of a prediction interval, suppose we use the data in Example 11-1 and find a 95% prediction interval on the next observation of oxygen purity at $x_0 = 1.00\%$. Using Equation 11-14 and recalling from Example 11-5 that $\hat y_0 = 89.23$, we find that the prediction interval is
\[ 89.23 - 2.101\sqrt{1.18\left[1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]} \le Y_0 \le 89.23 + 2.101\sqrt{1.18\left[1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]} \]
which simplifies to
\[ 86.83 \le y_0 \le 91.63 \]
This is a reasonably narrow prediction interval.

11-7: Adequacy of the Regression Model

Fitting a regression model requires several assumptions.
1. Errors are uncorrelated random variables with mean zero;
2. Errors have constant variance; and,
3. Errors are normally distributed.

The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model.
11-7: Adequacy of the Regression Model
11-7.1 Residual Analysis

The residuals from a regression model are $e_i = y_i - \hat y_i$, where $y_i$ is an actual observation and $\hat y_i$ is the corresponding fitted value from the regression model.

Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful.

EXAMPLE 11-7 Oxygen Purity Residuals

The regression model for the oxygen purity data in Example 11-1 is $\hat y = 74.283 + 14.947x$.

Table 11-4 presents the observed and predicted values of $y$ at each value of $x$ from this data set, along with the corresponding residual. These values were computed using Minitab and show the number of decimal places typical of computer output.

A normal probability plot of the residuals is shown in Fig. 11-10. Since the residuals fall approximately along a straight line in the figure, we conclude that there is no severe departure from normality.

The residuals are also plotted against the predicted value $\hat y_i$ in Fig. 11-11 and against the hydrocarbon levels $x_i$ in Fig. 11-12. These plots do not indicate any serious model inadequacies.
11-7: Adequacy of the Regression Model

Example 11-7

Figure 11-10 Normal probability plot of residuals, Example 11-7.
11-7: Adequacy of the Regression Model

Example 11-7

Figure 11-11 Plot of residuals versus predicted oxygen purity, $\hat y$, Example 11-7.

11-7.2 Coefficient of Determination ($R^2$)

The quantity
\[ R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} \]
is called the coefficient of determination and is often used to judge the adequacy of a regression model.

\[ 0 \le R^2 \le 1 \]

We often refer (loosely) to $R^2$ as the amount of variability in the data explained or accounted for by the regression model.
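11-7: Adequacy of the Regression Model
11-7.2 Coefficient of Determination ($R^2$)

For the oxygen purity regression model,
\[ R^2 = SS_R / SS_T = 152.13 / 173.38 = 0.877 \]
Thus, the model accounts for 87.7% of the variability in the data.

11-8: Correlation

We assume that the joint distribution of $X_i$ and $Y_i$ is the bivariate normal distribution presented in Chapter 5, where $\mu_Y$ and $\sigma_Y^2$ are the mean and variance of $Y$, $\mu_X$ and $\sigma_X^2$ are the mean and variance of $X$, and $\rho$ is the correlation coefficient between $Y$ and $X$. Recall that the correlation coefficient is defined as
\[ \rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \] (11-15)
where $\sigma_{XY}$ is the covariance between $Y$ and $X$.

The conditional distribution of $Y$ for a given value of $X = x$ is
\[ f_{Y|x}(y) = \frac{1}{\sqrt{2\pi}\,\sigma_{Y|x}} \exp\left[-\frac{1}{2}\left(\frac{y - \beta_0 - \beta_1 x}{\sigma_{Y|x}}\right)^2\right] \] (11-16)
where
\[ \beta_0 = \mu_Y - \mu_X \rho \frac{\sigma_Y}{\sigma_X} \] (11-17)
\[ \beta_1 = \frac{\sigma_Y}{\sigma_X} \rho \] (11-18)
and the conditional variance is $\sigma_{Y|x}^2 = \sigma_Y^2 (1 - \rho^2)$.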
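11-8: Correlation

It is possible to draw inferences about the correlation coefficient $\rho$ in this model. The estimator of $\rho$ is the sample correlation coefficient
\[ R = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar X)}{\left[\sum_{i=1}^{n} (X_i - \bar X)^2 \sum_{i=1}^{n} (Y_i - \bar Y)^2\right]^{1/2}} = \frac{S_{XY}}{(S_{XX}\, SS_T)^{1/2}} \] (11-19)

Note that
\[ \hat\beta_1 = \left(\frac{SS_T}{S_{XX}}\right)^{1/2} R \] (11-20)

We may also write:
\[ R^2 = \hat\beta_1^2 \frac{S_{XX}}{S_{YY}} = \frac{\hat\beta_1 S_{XY}}{SS_T} = \frac{SS_R}{SS_T} \]
where $S_{YY} = SS_T$.

It is often useful to test the hypotheses
\[ H_0: \rho = 0 \qquad H_1: \rho \ne 0 \]

The appropriate test statistic for these hypotheses is
\[ T_0 = \frac{R\sqrt{n - 2}}{\sqrt{1 - R^2}} \] (11-21)

Reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$.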
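The test statistic in Equation 11-21 needs only the sample correlation and the sample size. The sketch below uses a hypothetical helper `corr_t_statistic`, with $r = 0.9818$ and $n = 25$ assumed from the wire bond example and $t_{0.025,23} = 2.069$ read from a t table.

```python
import math

def corr_t_statistic(r, n):
    """t statistic of Equation 11-21 for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

t0 = corr_t_statistic(0.9818, 25)   # r and n assumed from the wire bond example
t_crit = 2.069                      # t_{0.025,23}, read from a t table
reject_h0 = abs(t0) > t_crit

print(f"t0 = {t0:.2f}, reject H0: {reject_h0}")
```

This statistic is numerically identical to the t statistic for $H_0: \beta_1 = 0$ in the corresponding regression, which is why regression software output can be used to test $\rho = 0$.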
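11-8: Correlation

The test procedure for the hypotheses
\[ H_0: \rho = \rho_0 \qquad H_1: \rho \ne \rho_0 \]
where $\rho_0 \ne 0$ is somewhat more complicated. In this case, the appropriate test statistic is
\[ Z_0 = (\operatorname{arctanh} R - \operatorname{arctanh} \rho_0)(n - 3)^{1/2} \] (11-22)

Reject $H_0$ if $|z_0| > z_{\alpha/2}$.

The approximate $100(1 - \alpha)\%$ confidence interval is
\[ \tanh\left(\operatorname{arctanh} r - \frac{z_{\alpha/2}}{\sqrt{n - 3}}\right) \le \rho \le \tanh\left(\operatorname{arctanh} r + \frac{z_{\alpha/2}}{\sqrt{n - 3}}\right) \] (11-23)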
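Equation 11-23 maps $r$ to the arctanh scale, builds a normal-theory interval there, and maps the limits back with tanh. A minimal sketch using only the standard library follows; the helper name `fisher_ci` is hypothetical, and $r = 0.9818$, $n = 25$, $z_{0.025} = 1.96$ are assumed from the wire bond example.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval on rho from Equation 11-23."""
    z = math.atanh(r)                    # transform r to the arctanh scale
    half = z_crit / math.sqrt(n - 3)     # half-width on the transformed scale
    return math.tanh(z - half), math.tanh(z + half)

rho_lower, rho_upper = fisher_ci(0.9818, 25)

print(f"approximate 95% CI on rho: ({rho_lower:.4f}, {rho_upper:.4f})")
```

Because tanh is bounded, the limits always stay inside $(-1, 1)$, and the interval is not symmetric about $r$ when $r$ is close to $\pm 1$.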
11-8: Correlation

EXAMPLE 11-8 Wire Bond Pull Strength

In Chapter 1 (Section 1-3) an application of regression analysis is described in which an engineer at a semiconductor assembly plant is investigating the relationship between pull strength of a wire bond and two factors: wire length and die height. In this example, we will consider only one of the factors, the wire length. A random sample of 25 units is selected and tested, and the wire bond pull strength and wire length are observed for each unit. The data are shown in Table 1-2. We assume that pull strength and wire length are jointly normally distributed.

Figure 11-13 shows a scatter diagram of wire bond strength versus wire length. We have used the Minitab option of displaying box plots of each individual variable on the scatter diagram. There is evidence of a linear relationship between the two variables.

The Minitab output for fitting a simple linear regression model to the data is shown below.

Figure 11-13 Scatter plot of wire bond strength versus wire length, Example 11-8.
11-8: Correlation

Minitab Output for Example 11-8

Regression Analysis: Strength versus Length

The regression equation is
Strength = 5.11 + 2.90 Length

Predictor   Coef     SE Coef   T       P
Constant    5.115    1.146     4.46    0.000
Length      2.9027   0.1170    24.80   0.000

S = 3.093   R-Sq = 96.4%   R-Sq(adj) = 96.2%
PRESS = 272.144   R-Sq(pred) = 95.54%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   5885.9   5885.9   615.08   0.000
Residual Error   23    220.1      9.6
Total            24   6105.9

Example 11-8 (continued)

Now $S_{xx} = 698.56$ and $S_{xy} = 2027.7132$, and the sample correlation coefficient is
\[ r = \frac{S_{xy}}{[S_{xx}\, SS_T]^{1/2}} = \frac{2027.7132}{[(698.560)(6105.9)]^{1/2}} = 0.9818 \]

Note that $r^2 = (0.9818)^2 = 0.9640$ (which is reported in the Minitab output), or that approximately 96.40% of the variability in pull strength is explained by the linear relationship to wire length.
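11-8: Correlation

Example 11-8 (continued)

Now suppose that we wish to test the hypotheses
\[ H_0: \rho = 0 \qquad H_1: \rho \ne 0 \]
with $\alpha = 0.05$. We can compute the t-statistic of Equation 11-21 as
\[ t_0 = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.9818\sqrt{23}}{\sqrt{1 - 0.9640}} = 24.8 \]

This statistic is also reported in the Minitab output as a test of $H_0: \beta_1 = 0$. Because $t_{0.025,23} = 2.069$, we reject $H_0$ and conclude that the correlation coefficient $\rho \ne 0$.

Finally, we may construct an approximate 95% confidence interval on $\rho$ from Equation 11-23. Since $\operatorname{arctanh} r = \operatorname{arctanh} 0.9818 = 2.3452$, Equation 11-23 becomes
\[ \tanh\left(2.3452 - \frac{1.96}{\sqrt{22}}\right) \le \rho \le \tanh\left(2.3452 + \frac{1.96}{\sqrt{22}}\right) \]
which reduces to
\[ 0.9585 \le \rho \le 0.9921 \]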
11-9: Transformation and Logistic Regression

We occasionally find that the straight-line regression model $Y = \beta_0 + \beta_1 x + \epsilon$ is inappropriate because the true regression function is nonlinear. Sometimes nonlinearity is visually determined from the scatter diagram, and sometimes, because of prior experience or underlying theory, we know in advance that the model is nonlinear.

In some of these situations, a nonlinear function can be expressed as a straight line by using a suitable transformation. Such nonlinear models are called intrinsically linear.

EXAMPLE 11-9 Windmill Power

A research engineer is investigating the use of a windmill to generate electricity and has collected data on the DC output from this windmill and the corresponding wind velocity. The data are plotted in Figure 11-14 and listed in Table 11-5.

Table 11-5 Observed Values and Regressor Variable for Example 11-9.

Observation Number, i   Wind Velocity (mph), x_i   DC Output, y_i
 1                       5.00                      1.582
 2                       6.00                      1.822
 3                       3.40                      1.057
 4                       2.70                      0.500
 5                      10.00                      2.236
 6                       9.70                      2.386
 7                       9.55                      2.294
 8                       3.05                      0.558
 9                       8.15                      2.166
10                       6.20                      1.866
11                       2.90                      0.653
12                       6.35                      1.930
13                       4.60                      1.562
14                       5.80                      1.737
15                       7.40                      2.088
16                       3.60                      1.137
17                       7.85                      2.179
18                       8.80                      2.112
19                       7.00                      1.800
20                       5.45                      1.501
21                       9.10                      2.303
22                      10.20                      2.310
23                       4.10                      1.194
24                       3.95                      1.144
25                       2.45                      0.123
11-9: Transformation and Logistic Regression

Example 11-9 (Continued)

Figure 11-14 Plot of DC output $y$ versus wind velocity $x$ for the windmill data.

Figure 11-15 Plot of residuals $e_i$ versus fitted values $\hat y_i$ for the windmill data.

Figure 11-16 Plot of DC output versus $x' = 1/x$ for the windmill data.
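The idea behind the reciprocal transformation plotted in Figure 11-16 can be sketched as follows: if $y$ is linear in $x' = 1/x$, an ordinary least squares fit of $y$ on $x'$ recovers the coefficients. The data below are hypothetical and noise-free, generated from $y = 3 - 7/x$; they are not the windmill data, and the helper `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical, noise-free data generated from y = 3 - 7/x.
x = [2.0, 4.0, 5.0, 10.0]
y = [3.0 - 7.0 / xi for xi in x]

# Reciprocal transformation x' = 1/x, then an ordinary straight-line fit.
x_prime = [1.0 / xi for xi in x]
b0, b1 = fit_line(x_prime, y)

print(f"intercept = {b0:.3f}, slope on 1/x = {b1:.3f}")
```

With real data the fit on the transformed scale would not be exact, and the residual plots for the transformed model should be examined just as for any other simple linear regression.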
11-9: Transformation and Logistic Regression

Example 11-9 (Continued)

Figure 11-17 Plot of residuals versus fitted values $\hat y_i$ for the transformed model for the windmill data.

Figure 11-18 Normal probability plot of the residuals for the transformed model for the windmill data.

A plot of the residuals from the transformed model versus $\hat y$ is shown in Figure 11-17. This plot does not reveal any serious problem with inequality of variance.

The normal probability plot, shown in Figure 11-18, gives a mild indication that the errors come from a distribution with heavier tails than the normal (notice the slight upward and downward curve at the extremes). This normal probability plot has the z-score value plotted on the horizontal axis. Since there is no strong signal of model inadequacy, we conclude that the transformed model is satisfactory.

Important Terms & Concepts of Chapter 11

Analysis of variance test in regression
Confidence interval on mean response
Correlation coefficient
Empirical model
Confidence intervals on model parameters
Intrinsically linear model
Least squares estimation of regression model parameters
Logistic regression
Model adequacy checking
Odds ratio
Prediction interval on a future observation
Regression analysis
Residual plots
Residuals
Scatter diagram
Simple linear regression model standard error
Statistical test on model parameters
Transformations