Chapter 11: Simple Linear Regression and Correlation

11-1 Empirical Models
11-2 Simple Linear Regression
11-3 Properties of the Least Squares Estimators
11-4 Hypothesis Tests in Simple Linear Regression
  11-4.1 Use of t-tests
  11-4.2 Analysis of variance approach to test significance of regression
11-5 Confidence Intervals
  11-5.1 Confidence intervals on the slope and intercept
  11-5.2 Confidence interval on the mean response
11-6 Prediction of New Observations
11-7 Adequacy of the Regression Model
  11-7.1 Residual analysis
  11-7.2 Coefficient of determination ($R^2$)
11-8 Correlation
11-9 Regression on Transformed Variables
11-10 Logistic Regression

Chapter Learning Objectives
After careful study of this chapter you should be able to:
1. Use simple linear regression for building empirical models of engineering and scientific data
2. Understand how the method of least squares is used to estimate the parameters in a linear regression model
3. Analyze residuals to determine if the regression model is an adequate fit to the data or to see if any underlying assumptions are violated
4. Test statistical hypotheses and construct confidence intervals on the regression model parameters
5. Use the regression model to make a prediction of a future observation and construct an appropriate prediction interval on the future observation
6. Apply the correlation model
7. Use simple transformations to achieve a linear regression model

Empirical Models
Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis is a statistical technique that is very useful for these types of problems. For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature. Regression analysis can be used to build a model to predict yield at a given temperature level.

Empirical Model - Example Data
[Table 11-1: Oxygen purity, y, and hydrocarbon level, x]

Empirical Model - Example Plot
Figure 11-1: Scatter diagram of oxygen purity versus hydrocarbon level from Table 11-1.

Simple Linear Regression
Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship:
$$E(Y \mid x) = \mu_{Y \mid x} = \beta_0 + \beta_1 x$$
where the slope $\beta_1$ and the intercept $\beta_0$ of the line are called regression coefficients. The simple linear regression model is given by
$$Y = \beta_0 + \beta_1 x + \epsilon$$
where $\epsilon$ is the random error term.

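Variance of Y = Variance of ε
We think of the regression model as an empirical model. Suppose that the mean and variance of $\epsilon$ are 0 and $\sigma^2$, respectively. Then the mean of Y given x is
$$E(Y \mid x) = E(\beta_0 + \beta_1 x + \epsilon) = \beta_0 + \beta_1 x + E(\epsilon) = \beta_0 + \beta_1 x$$
and, because x is treated as a fixed value, the variance of Y given x is
$$V(Y \mid x) = V(\beta_0 + \beta_1 x + \epsilon) = V(\beta_0 + \beta_1 x) + V(\epsilon) = 0 + \sigma^2 = \sigma^2$$

Model of True Regression Line
The true regression model is a line of mean values:
$$\mu_{Y \mid x} = \beta_0 + \beta_1 x$$
where $\beta_1$ can be interpreted as the change in the mean of Y for a unit change in x (the slope of the line). The variability of Y at a particular value of x is determined by the error variance, $\sigma^2$. This implies that there is a distribution of Y-values at each x and that the variance of this distribution is the same at each x.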

Distribution of Y along Line
Figure 11-2: The distribution of Y for a given value of x for the oxygen purity-hydrocarbon data.

Predictor and Response Variables
The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y. The observation Y at each level of x is a random variable, and its expected value is
$$E(Y \mid x) = \beta_0 + \beta_1 x$$
We assume that each observation, Y, can be described by the model:
$$Y = \beta_0 + \beta_1 x + \epsilon$$

Method of Least Squares
Suppose that we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. The method of least squares estimates the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of the squares of the vertical deviations of the observations from the line.
Figure 11-3: Deviations of the data from the estimated regression model.

Sum of Squared Deviations
Since the n observations in the sample can be expressed as
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$
the sum of the squares of the deviations (errors) of the observations from the true regression line is
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

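Least Squares Normal Equations
Setting the partial derivatives of L with respect to $\beta_0$ and $\beta_1$ equal to zero at the estimates $\hat\beta_0$ and $\hat\beta_1$,
$$\frac{\partial L}{\partial \beta_0}\bigg|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \qquad \frac{\partial L}{\partial \beta_1}\bigg|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i)\,x_i = 0$$
and simplifying gives the least squares normal equations:
$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$

Simple Linear Regression Coefficients
Solving the normal equations gives the least squares estimates
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$
where $\bar{y} = (1/n)\sum_{i=1}^{n} y_i$ and $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$.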

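Fitted Regression Line
The fitted or estimated regression line is
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x$$
Each pair of observations satisfies $y_i = \hat\beta_0 + \hat\beta_1 x_i + e_i$, where $e_i = y_i - \hat{y}_i$ is called the residual.

Sums of Squares
The following notation may also be used:
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \qquad \text{(11-10)}$$
$$S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \qquad \text{(11-11)}$$
Then,
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$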
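As a minimal sketch of the sums-of-squares formulas above, the Python snippet below computes $S_{xx}$, $S_{xy}$ and the two estimates. It assumes the 20 oxygen purity observations are the values of Table 11-1 as listed in the textbook (the table itself is not reproduced in these notes); with these values the fit should agree with the coefficients 74.283 and 14.947 in the Excel output. The function name `least_squares` is illustrative only.

```python
# Oxygen purity data, assumed to match Table 11-1 of the textbook
# x: hydrocarbon level, y: oxygen purity
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]


def least_squares(x, y):
    """Return (b0, b1, s_xx, s_xy) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Computational forms of S_xx and S_xy
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = s_xy / s_xx                  # slope: S_xy / S_xx
    b0 = sum_y / n - b1 * sum_x / n   # intercept: y_bar - b1 * x_bar
    return b0, b1, s_xx, s_xy


b0, b1, s_xx, s_xy = least_squares(x, y)
print(f"S_xx = {s_xx:.5f}, S_xy = {s_xy:.5f}")
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")
```

The computational forms subtract two large, nearly equal numbers. That is harmless here, but for large or badly scaled data sets the centered forms $\sum (x_i - \bar{x})^2$ and $\sum y_i (x_i - \bar{x})$ are numerically safer.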

Simple Linear Regression - Example
Example 11-1

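Example 11-1 (continued)
Figure 11-4: Scatter plot of oxygen purity y versus hydrocarbon level x and regression model $\hat{y} = 74.283 + 14.947x$.

Computing $\sigma^2$
The error sum of squares is
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
It can be shown that the expected value of the error sum of squares is $E(SS_E) = (n-2)\sigma^2$. Therefore an unbiased estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{SS_E}{n-2} \qquad \text{(11-13)}$$
where $SS_E$ can be easily computed using
$$SS_E = SS_T - \hat\beta_1 S_{xy}$$
with $SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$ the total sum of squares of the response.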
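The two ways of obtaining $SS_E$ can be checked against each other numerically. This sketch, again assuming the oxygen purity values of the textbook's Table 11-1, computes $SS_E$ directly from the residuals and through the shortcut $SS_T - \hat\beta_1 S_{xy}$, then forms $\hat\sigma^2$; the results should be close to the Residual SS (21.250) and MS (1.181) in the Excel output.

```python
# Oxygen purity data, assumed to match Table 11-1 of the textbook
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Error sum of squares computed directly from the residuals e_i = y_i - y_hat_i
ss_e_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Computational shortcut: SS_E = SS_T - b1 * S_xy
ss_t = sum((yi - y_bar) ** 2 for yi in y)
ss_e_shortcut = ss_t - b1 * s_xy

# Unbiased estimator of sigma^2 (Equation 11-13)
sigma2_hat = ss_e_direct / (n - 2)

print(f"SS_E (direct)   = {ss_e_direct:.3f}")
print(f"SS_E (shortcut) = {ss_e_shortcut:.3f}")
print(f"sigma^2 hat = {sigma2_hat:.4f}, sigma hat = {sigma2_hat ** 0.5:.3f}")
```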

Excel Data Analysis Tool Regression Output

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.937
R Square             0.877
Adjusted R Square    0.871
Standard Error       1.087
Observations         20

ANOVA
              df   SS        MS        F         Significance F
Regression     1   152.127   152.127   128.862   0.000
Residual      18    21.250     1.181
Total         19   173.377

              Coefficients   Standard Error   t Stat   P-value
Intercept     74.283         1.593            46.617   0.000
X Variable 1  14.947         1.317            11.352   0.000

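Properties of Least Squares Estimators
Slope properties for the mean and variance:
$$E(\hat\beta_1) = \beta_1 \qquad \text{(11-15)}$$
$$V(\hat\beta_1) = \frac{\sigma^2}{S_{xx}} \qquad \text{(11-16)}$$
Intercept properties for the mean and variance:
$$E(\hat\beta_0) = \beta_0 \quad \text{and} \quad V(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right] \qquad \text{(11-17)}$$
Both estimators are therefore unbiased.

Estimated Standard Errors
In simple linear regression the estimated standard error of the slope and the estimated standard error of the intercept are
$$se(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}} \quad \text{and} \quad se(\hat\beta_0) = \sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}$$
respectively, where $\hat\sigma^2$ is computed using Equation 11-13.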
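To make the standard error formulas concrete, the sketch below plugs in summary quantities for the oxygen purity fit. The values n = 20, $\bar{x}$ = 1.1960 and $S_{xx}$ = 0.68088 are assumed here (computed from the textbook's Table 11-1 data, which these notes do not list), and $SS_E$ = 21.250 is taken from the Excel output. The two results should reproduce the Standard Error column of that output.

```python
import math

# Summary quantities for the oxygen purity data (assumed; computed from Table 11-1)
n = 20
x_bar = 1.1960
s_xx = 0.68088
sigma2_hat = 21.250 / (n - 2)   # Equation 11-13, with SS_E from the Excel output

# Estimated standard errors of the slope and the intercept
se_b1 = math.sqrt(sigma2_hat / s_xx)
se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))

print(f"se(b1) = {se_b1:.3f}")
print(f"se(b0) = {se_b0:.3f}")
```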

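Hypothesis Test for the Slope
If we wish to test whether the slope equals some value $\beta_{1,0}$:
$$H_0: \beta_1 = \beta_{1,0} \qquad H_1: \beta_1 \neq \beta_{1,0} \qquad \text{(11-18)}$$
An appropriate test statistic is
$$T_0 = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{\hat\beta_1 - \beta_{1,0}}{se(\hat\beta_1)} \qquad \text{(11-19)}$$
which follows the t distribution with n - 2 degrees of freedom when $H_0$ is true. We would reject the null hypothesis if
$$|t_0| > t_{\alpha/2,\,n-2} \qquad \text{(11-20)}$$

Hypothesis Test for the Intercept
If we wish to test whether the intercept equals some value $\beta_{0,0}$:
$$H_0: \beta_0 = \beta_{0,0} \qquad H_1: \beta_0 \neq \beta_{0,0} \qquad \text{(11-21)}$$
An appropriate test statistic is
$$T_0 = \frac{\hat\beta_0 - \beta_{0,0}}{\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}} = \frac{\hat\beta_0 - \beta_{0,0}}{se(\hat\beta_0)} \qquad \text{(11-22)}$$
We would reject the null hypothesis if $|t_0| > t_{\alpha/2,\,n-2}$.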

Significance of Regression
An important special case of these hypotheses is
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0 \qquad \text{(11-23)}$$
Failure to reject $H_0$ is equivalent to concluding that there is no linear relationship between x and Y. In other words, if we conclude that the slope could be 0, the information on x tells us nothing (through a straight-line model) about the variation in the response, Y.
Figure 11-5: The hypothesis $H_0: \beta_1 = 0$ is not rejected.
Figure 11-6: The hypothesis $H_0: \beta_1 = 0$ is rejected.
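A minimal sketch of the t-test for significance of regression, using the slope estimate and standard error reported in the Excel output for the oxygen purity data. The critical value $t_{0.025,18}$ = 2.101 ($\alpha$ = 0.05, n - 2 = 18) is read from a t table and hard-coded, which keeps the example free of third-party libraries.

```python
b1_hat = 14.947   # estimated slope (Excel "Coefficients")
se_b1 = 1.317     # estimated standard error of the slope (Excel "Standard Error")
beta1_0 = 0.0     # H0: beta1 = 0 (significance of regression)
t_crit = 2.101    # t_{0.025,18} from a t table: alpha = 0.05, n - 2 = 18

# Test statistic T0 = (b1_hat - beta1_0) / se(b1_hat)
t0 = (b1_hat - beta1_0) / se_b1
reject_h0 = abs(t0) > t_crit

print(f"t0 = {t0:.2f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

The small gap between this t0 and the Excel "t Stat" of 11.352 comes only from using the rounded coefficient and standard error.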

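Hypothesis Testing - Example
Example 11-2

Analysis of Variance (ANOVA)
The analysis of variance identity is
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \text{that is,} \quad SS_T = SS_R + SS_E$$
If the null hypothesis $H_0: \beta_1 = 0$ is true, the statistic
$$F_0 = \frac{SS_R / 1}{SS_E / (n-2)} = \frac{MS_R}{MS_E}$$
follows the $F_{1,\,n-2}$ distribution, and we would reject $H_0$ if $f_0 > f_{\alpha,1,n-2}$.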

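The ANOVA Table
The quantities $MS_R = SS_R/1$ and $MS_E = SS_E/(n-2)$ are called the mean squares of the regression and of the errors, respectively. Analysis of variance (ANOVA) table:

Source of Variation   Sum of Squares                        Degrees of Freedom   Mean Square   $F_0$
Regression            $SS_R = \hat\beta_1 S_{xy}$           1                    $MS_R$        $MS_R/MS_E$
Error                 $SS_E = SS_T - \hat\beta_1 S_{xy}$    $n-2$                $MS_E$
Total                 $SS_T$                                $n-1$

Note that $MS_E = \hat\sigma^2$.

Analysis of Variance - Example
Example 11-3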
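The ANOVA quantities can be checked with a few lines of arithmetic on the sums of squares in the Excel output. In this sketch the critical value $f_{0.05,1,18}$ = 4.41 is taken from an F table (assuming $\alpha$ = 0.05), and the last line compares $f_0$ with the square of the Excel t statistic for the slope.

```python
# Sums of squares from the Excel ANOVA table (oxygen purity data)
n = 20
ss_r = 152.127
ss_e = 21.250
ss_t = 173.377

ms_r = ss_r / 1          # regression has 1 degree of freedom
ms_e = ss_e / (n - 2)    # error has n - 2 degrees of freedom; MS_E estimates sigma^2
f0 = ms_r / ms_e

f_crit = 4.41            # f_{0.05,1,18} from an F table (assumed alpha = 0.05)
t0 = 11.352              # Excel "t Stat" for the slope

print(f"SS_R + SS_E = {ss_r + ss_e:.3f}, SS_T = {ss_t:.3f}")
print(f"MS_E = {ms_e:.3f}, f0 = {f0:.2f}, reject H0: {f0 > f_crit}")
print(f"t0^2 = {t0 ** 2:.2f}")  # agrees with f0 up to rounding of the printed output
```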

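Equivalence of t-tests and ANOVA
For testing $H_0: \beta_1 = 0$, the t-test and the ANOVA F-test are equivalent. Since $SS_R = \hat\beta_1 S_{xy} = \hat\beta_1^2 S_{xx}$ and $MS_E = \hat\sigma^2$,
$$T_0^2 = \frac{\hat\beta_1^2}{\hat\sigma^2 / S_{xx}} = \frac{\hat\beta_1^2 S_{xx}}{MS_E} = \frac{MS_R}{MS_E} = F_0$$
and because $t_{\alpha/2,\,n-2}^2 = f_{\alpha,1,n-2}$, the two tests always reach the same conclusion.

Confidence Intervals on Regression Model Parameters
Under the assumption that the observations are normally and independently distributed, a $100(1-\alpha)\%$ confidence interval on the slope $\beta_1$ in simple linear regression is
$$\hat\beta_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}}$$
and a $100(1-\alpha)\%$ confidence interval on the intercept $\beta_0$ is
$$\hat\beta_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]} \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}$$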
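Applied to the oxygen purity fit, both interval formulas reduce to "estimate plus or minus t times standard error". This sketch takes the estimates and standard errors from the Excel output and $t_{0.025,18}$ = 2.101 from a t table; because those inputs are rounded, the slope interval matches Example 11-4 only to about the third decimal.

```python
# Estimates and standard errors from the Excel output (oxygen purity data)
estimates = {
    "intercept": (74.283, 1.593),
    "slope": (14.947, 1.317),
}
t_crit = 2.101  # t_{0.025,18}: 95% interval with n - 2 = 18 degrees of freedom

intervals = {}
for name, (est, se) in estimates.items():
    half_width = t_crit * se
    intervals[name] = (est - half_width, est + half_width)
    lo, hi = intervals[name]
    print(f"95% CI on the {name}: {lo:.3f} <= parameter <= {hi:.3f}")
```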

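Example 11-4 (Confidence Interval on the Slope)
For the oxygen purity data, the 95% confidence interval on the slope is
$$12.181 \le \beta_1 \le 17.713$$

Confidence Interval on the Mean Response
The point estimate of the mean response at a given value $x = x_0$ is
$$\hat\mu_{Y \mid x_0} = \hat\beta_0 + \hat\beta_1 x_0$$
A $100(1-\alpha)\%$ confidence interval on the mean response $\mu_{Y \mid x_0}$ is then
$$\hat\mu_{Y \mid x_0} - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \le \mu_{Y \mid x_0} \le \hat\mu_{Y \mid x_0} + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}$$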
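A sketch of the mean-response interval, assuming the oxygen purity summary values n = 20, $\bar{x}$ = 1.1960 and $S_{xx}$ = 0.68088 (computed from the textbook's Table 11-1) together with the coefficients and $SS_E$ from the Excel output. The evaluation points $x_0$ = 1.00, $\bar{x}$ and 1.50 are arbitrary illustrative choices, and `mean_response_ci` is a hypothetical helper name.

```python
import math

# Summary quantities for the oxygen purity fit (assumed; Table 11-1 and the Excel output)
n = 20
x_bar = 1.1960
s_xx = 0.68088
b0_hat, b1_hat = 74.283, 14.947
sigma2_hat = 21.250 / (n - 2)
t_crit = 2.101  # t_{0.025,18} for a 95% interval


def mean_response_ci(x0):
    """Return (lower, estimate, upper) of a 95% CI on the mean response at x0."""
    mu_hat = b0_hat + b1_hat * x0
    half_width = t_crit * math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx))
    return mu_hat - half_width, mu_hat, mu_hat + half_width


for x0 in (1.00, x_bar, 1.50):
    lo, mu, hi = mean_response_ci(x0)
    print(f"x0 = {x0:.3f}: {lo:.2f} <= mu_Y|x0 <= {hi:.2f} (estimate {mu:.2f})")
```

The printed intervals are narrowest at $x_0 = \bar{x}$ and widen as $x_0$ moves away from $\bar{x}$, because the term $(x_0 - \bar{x})^2 / S_{xx}$ grows.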

Example 11-5 (Confidence Interval on the Mean Response)

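Example 11-5 (continued)
Figure 11-7: Scatter diagram of oxygen purity data from Example 11-1 with fitted regression line and 95% confidence limits on $\mu_{Y \mid x_0}$.

Prediction of New Observations
The point estimate of a new observation $Y_0$ at $x = x_0$ is
$$\hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0$$
A $100(1-\alpha)\%$ prediction interval on the new response $Y_0$ is then
$$\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \le Y_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}$$
The extra term 1 inside the brackets accounts for the variability of the new observation itself, so the prediction interval is always wider than the confidence interval on the mean response at the same $x_0$.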
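The only difference between the prediction interval and the confidence interval on the mean response is the extra 1 inside the square root. The sketch below assumes n = 20, $\bar{x}$ = 1.1960 and $S_{xx}$ = 0.68088 (computed from the textbook's Table 11-1), plus the coefficients and $SS_E$ from the Excel output, and computes both 95% intervals at the illustrative point $x_0$ = 1.00. The helper name `interval` is hypothetical.

```python
import math

# Summary quantities for the oxygen purity fit (assumed; Table 11-1 and the Excel output)
n = 20
x_bar = 1.1960
s_xx = 0.68088
b0_hat, b1_hat = 74.283, 14.947
sigma2_hat = 21.250 / (n - 2)
t_crit = 2.101  # t_{0.025,18}


def interval(x0, new_observation):
    """95% prediction interval (new_observation=True) or CI on the mean response."""
    y_hat = b0_hat + b1_hat * x0
    # The extra 1 adds the variance of the new observation itself
    extra = 1.0 if new_observation else 0.0
    half_width = t_crit * math.sqrt(
        sigma2_hat * (extra + 1 / n + (x0 - x_bar) ** 2 / s_xx))
    return y_hat - half_width, y_hat + half_width


x0 = 1.00
pi_lo, pi_hi = interval(x0, new_observation=True)
ci_lo, ci_hi = interval(x0, new_observation=False)
print(f"95% PI on Y0:      {pi_lo:.2f} <= Y0 <= {pi_hi:.2f}")
print(f"95% CI on mu_Y|x0: {ci_lo:.2f} <= mu <= {ci_hi:.2f}")
```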

Example 11-6 (Prediction Interval)

Example 11-6 (continued)
Figure 11-8: Scatter diagram of oxygen purity data from Example 11-1 with fitted regression line, 95% prediction limits (outer lines), and 95% confidence limits on $\mu_{Y \mid x_0}$.

Adequacy of Regression Models
Fitting a regression model requires several assumptions:
1. The errors are uncorrelated random variables with mean zero;
2. The errors have constant variance; and
3. The errors are normally distributed.
Estimating the model parameters requires assumptions 1 and 2; hypothesis tests and interval estimates also require assumption 3. The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model.

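Residual (Error) Analysis
The residuals from a regression model are $e_i = y_i - \hat{y}_i$, $i = 1, 2, \ldots, n$, where $y_i$ is an actual observation and $\hat{y}_i$ is the corresponding fitted value from the regression model. Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful.

Residual Plots
Figure 11-9: Patterns for residual plots: (a) satisfactory, (b) funnel, (c) double bow, (d) nonlinear.
Patterns (b) and (c) indicate that the error variance is not constant; pattern (d) suggests that the model is inadequate and that higher-order terms or a transformation may be needed.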
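As a minimal sketch of residual analysis, the code below refits the oxygen purity line (assuming the textbook's Table 11-1 values), computes the residuals $e_i = y_i - \hat{y}_i$, and prepares the coordinates of a normal probability plot by pairing the j-th smallest residual with the standard normal quantile of $(j - 0.5)/n$, one common plotting-position choice. It uses only the standard library (`statistics.NormalDist`, Python 3.8+) and leaves the plotting itself out.

```python
from statistics import NormalDist

# Oxygen purity data, assumed to match Table 11-1 of the textbook
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, fitted)]

# With an intercept in the model, least squares residuals sum to zero (up to rounding)
print(f"sum of residuals = {sum(residuals):.2e}")

# Normal probability plot coordinates: ordered residual vs. normal quantile
ordered = sorted(residuals)
z = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
for z_j, e_j in zip(z, ordered):
    print(f"{z_j:6.2f}  {e_j:6.2f}")
```

If the normality assumption is reasonable, these points should fall close to a straight line.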

Residual Analysis - Example
Example 11-7

Example 11-7 (continued)
Figure 11-10: Normal probability plot of residuals, Example 11-7.
Figure 11-11: Plot of residuals versus predicted oxygen purity, $\hat{y}$, Example 11-7.

Coefficient of Determination ($R^2$)
The quantity
$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} \qquad \text{(11-34)}$$
is called the coefficient of determination and is often used to judge the adequacy of a regression model. Because $SS_T = SS_R + SS_E$ with both terms nonnegative, $0 \le R^2 \le 1$. We often refer (loosely) to $R^2$ as the amount of variability in the data explained or accounted for by the regression model.

$R^2$ Computations - Example
For the oxygen purity regression model,
$$R^2 = \frac{SS_R}{SS_T} = \frac{152.13}{173.38} = 0.877$$
Thus, the model accounts for 87.7% of the variability in the data.
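The $R^2$ arithmetic can be written out with the sums of squares from the Excel output. Both forms of the definition should agree, and the square root of $R^2$ should match the "Multiple R" entry (0.937), which for a single regressor equals the absolute value of the sample correlation between x and y.

```python
import math

# Sums of squares from the Excel ANOVA table (oxygen purity data)
ss_r = 152.127
ss_e = 21.250
ss_t = 173.377

r2 = ss_r / ss_t            # R^2 = SS_R / SS_T
r2_alt = 1 - ss_e / ss_t    # equivalent form, since SS_T = SS_R + SS_E
r = math.sqrt(r2)           # "Multiple R" in the Excel output

print(f"R^2 = {r2:.3f} (alternative form: {r2_alt:.3f}), Multiple R = {r:.3f}")
```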

Regression on Transformed Variables
In many cases a plot of the dependent variable, y, against the independent variable, x, may show that the relationship is not linear. Performing a linear regression would then lead to a poor fit, and residual analysis would show that the model is inadequate. However, we can often transform the independent variable x first; the transformed variable, x', may have a linear relationship with y. We can then perform a linear regression of y on x'. Note, however, that any use of the new equation for prediction works on the transformed scale: a given value of x must first be converted to x' before computing $\hat{y}$, and a value of x' obtained from the equation must be converted back (reverse transformation) to give the corresponding value of x.
Transformations can take many forms. Typical ones include:
$x' = \log x$ (logarithm)
$x' = \sqrt{x}$ (square root)
$x' = 1/x$ (inverse)

Example 11-9
An engineer has collected data on the DC output from a windmill under different wind speed conditions and wishes to develop a model describing output in terms of wind speed. The table below shows the data collected for the output, y, as the response and the wind speed, x, as the independent variable (regressor). The final column shows the transformed value, x' = 1/x.

Obs.   Output (y)   Velocity (x)   x' = 1/x
1      1.582         5.00          0.200
2      1.822         6.00          0.167
3      1.057         3.40          0.294
4      0.500         2.70          0.370
5      2.236        10.00          0.100
6      2.386         9.70          0.103
7      2.294         9.55          0.105
8      0.558         3.05          0.328
9      2.166         8.15          0.123
10     1.866         6.20          0.161
11     0.653         2.90          0.345
12     1.930         6.35          0.157
13     1.562         4.60          0.217
14     1.737         5.80          0.172
15     2.088         7.40          0.135
16     1.137         3.60          0.278
17     2.179         7.85          0.127
18     2.112         8.80          0.114
19     1.800         7.00          0.143
20     1.501         5.45          0.183
21     2.303         9.10          0.110
22     2.310        10.20          0.098
23     1.194         4.10          0.244
24     1.144         3.95          0.253
25     0.123         2.45          0.408

Example 11-9 (continued)
[Figure: scatter plot of DC output, y, versus wind velocity, x, with the fitted line for the original data]
Regression equation (original data): $\hat{y} = 0.1309 + 0.2411x$, $R^2 = 0.875$
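To compare the two fits of Example 11-9 numerically, the sketch below regresses y on x and y on x' = 1/x with the same least squares formulas. It uses the data from the table above but computes exact reciprocals instead of the three-decimal x' column, so the transformed coefficients may differ slightly in the last digits from those reported in these notes. The helper names `fit_line` and `predict_output` are hypothetical.

```python
# Windmill data from Example 11-9: wind velocity x and DC output y
velocity = [5.00, 6.00, 3.40, 2.70, 10.00, 9.70, 9.55, 3.05, 8.15, 6.20,
            2.90, 6.35, 4.60, 5.80, 7.40, 3.60, 7.85, 8.80, 7.00, 5.45,
            9.10, 10.20, 4.10, 3.95, 2.45]
output = [1.582, 1.822, 1.057, 0.500, 2.236, 2.386, 2.294, 0.558, 2.166, 1.866,
          0.653, 1.930, 1.562, 1.737, 2.088, 1.137, 2.179, 2.112, 1.800, 1.501,
          2.303, 2.310, 1.194, 1.144, 0.123]


def fit_line(x, y):
    """Least squares fit of y on x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    r_squared = b1 * s_xy / ss_t  # SS_R / SS_T
    return b0, b1, r_squared


b0_orig, b1_orig, r2_orig = fit_line(velocity, output)

# Transform the regressor: x' = 1/x (exact reciprocals)
inv_velocity = [1.0 / v for v in velocity]
b0_tr, b1_tr, r2_tr = fit_line(inv_velocity, output)

print(f"original:    y_hat = {b0_orig:.4f} + {b1_orig:.4f} x,   R^2 = {r2_orig:.3f}")
print(f"transformed: y_hat = {b0_tr:.4f} + ({b1_tr:.4f}) x',  R^2 = {r2_tr:.3f}")


def predict_output(wind_speed):
    """Predict DC output by first transforming the wind speed to x' = 1/x."""
    return b0_tr + b1_tr / wind_speed


print(f"predicted output at x = 8.0: {predict_output(8.0):.3f}")
```

Note that `predict_output` takes a wind speed on the original scale and applies the transformation internally, which is the conversion step the transformed model requires.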

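Example 11-9 (continued)
[Figure: scatter plot of DC output, y, versus transformed wind velocity, x' = 1/x, with the fitted line for the transformed data]
Regression equation (transformed data): $\hat{y} = 2.9789 - 6.9345x'$, $R^2 = 0.980$
The transformed model accounts for more of the variability in DC output than the model fitted to the original data ($R^2$ = 0.980 versus 0.875).

THE END OF ENGG 319 CLASS NOTES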