933 Probability and Statistics for Software and Knowledge Engineers
Lecture 3: Simple Linear Regression and Correlation
Monchai Sopitkamon, Ph.D.

Outline
The Simple Linear Regression Model (12.1)
Fitting the Regression Line (12.2)
The Analysis of Variance Table (12.6)
Residual Analysis (12.7)
Correlation Analysis (12.9)
The Simple Linear Regression Model I (12.1)
Purpose of regression analysis: predict the value of a dependent (response) variable from the values of at least one explanatory (independent) variable, also called predictors or factors.
Purpose of correlation analysis: measure the strength of the correlation between two variables.

The Simple Linear Regression Model II (12.1)
Simple linear regression model: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$, so that $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
Here $\beta_0$ is the intercept parameter and $\beta_1$ is the slope parameter.
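To make the model concrete, here is a minimal Python sketch that generates data from it: each $y_i$ is the value of the true line $\beta_0 + \beta_1 x_i$ plus independent $N(0, \sigma^2)$ noise. The function name `simulate_slr` and all parameter values are hypothetical, chosen only for illustration.

```python
import random


def simulate_slr(xs, beta0, beta1, sigma, seed=None):
    """Draw y_i = beta0 + beta1 * x_i + eps_i with independent eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Each y_i is normal with its mean on the true line and the same variance sigma^2
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, chosen only for illustration
xs = [float(i) for i in range(10)]
tight = simulate_slr(xs, beta0=1.0, beta1=0.5, sigma=0.1, seed=42)
loose = simulate_slr(xs, beta0=1.0, beta1=0.5, sigma=1.0, seed=42)
```

Because both calls use the same seed, the underlying standard normal draws are identical, so the `loose` points sit ten times as far from the line as the `tight` ones: a larger $\sigma$ means more scatter about $\beta_0 + \beta_1 x$.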
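The Simple Linear Regression Model III (12.1)
Interpretation of the error variance $\sigma^2$: it measures how widely the observations $y_i$ scatter about the true regression line $\beta_0 + \beta_1 x$.
[Figure: interpretation of the error variance $\sigma^2$]

The Simple Linear Regression Model IV (12.1)
$\beta_1 > 0$: positive relationship; $\beta_1 = 0$: no relationship; $\beta_1 < 0$: negative relationship.
The SLR model is not appropriate for a nonlinear relationship.
[Figure: example scatter plots for each case]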
The Simple Linear Regression Model V (12.1)
Ex. 67, pg. 536: Car Plant Electricity Usage
[Figure: scatter plot of electricity usage against production]
(Excel sheet)
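Fitting the Regression Line I (12.2)
Selecting the best line: compare the errors between the observed $y$ values and the estimated $y$ values on a candidate line.
[Figure: the least squares fit]

Fitting the Regression Line II (12.2)
$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$: predicted value of $y$ for observation $i$.
$y_i$: value of observation $i$.
$e_i = y_i - \hat{y}_i$: error (residual) of observation $i$.
$\hat\beta_0$ and $\hat\beta_1$ are chosen to minimize
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$,
subject to $\sum_{i=1}^{n} e_i = 0$.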
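Fitting the Regression Line III (12.2)
Method of least squares:
$\hat\beta_1 = \frac{S_{XY}}{S_{XX}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
Variance of errors: $\hat\sigma^2 = \frac{SSE}{n-2}$; the divisor is $n - 2$ since two regression parameters need to be computed first.

Fitting the Regression Line IV (12.2)
Ex. 67, pg. 545: Car Plant Electricity Usage
$n = 12$, $\bar{x} = 4.885$, $\bar{y} = 2.846$, $\sum x_i^2 = 291.231$, $\sum x_i y_i = 169.2532$
$\hat\beta_1 = \frac{169.2532 - 12\bar{x}\bar{y}}{291.231 - 12\bar{x}^2} = \frac{2.4305}{4.8723} = 0.4988$
$\hat\beta_0 = 2.846 - 0.4988 \times 4.885 = 0.409$
Fitted regression line: $\hat{y} = 0.409 + 0.499x$
(Excel sheet)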
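The least squares formulas can be checked numerically with a short Python sketch. The 12 (production, electricity usage) pairs below are assumed to be the monthly data of the textbook's Example 67; they are not listed on these slides, but they reproduce the slide's $\bar{x} = 4.885$, $\sum x_i^2 = 291.231$ and $\sum x_i y_i = 169.2532$. The function name `least_squares_fit` is hypothetical.

```python
def least_squares_fit(xs, ys):
    """Least squares estimates (b0, b1) and the error variance estimate sigma^2_hat."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate sigma^2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_XX = sum x_i^2 - n x_bar^2 and S_XY = sum x_i y_i - n x_bar y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - 2 because two parameters (b0, b1) were estimated first
    sigma2_hat = sse / (n - 2)
    return b0, b1, sigma2_hat


# Assumed Example 67 data: production (x) and electricity usage (y) for 12 months
production = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
usage = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

b0, b1, sigma2_hat = least_squares_fit(production, usage)
print(f"y_hat = {b0:.3f} + {b1:.4f} x, sigma^2_hat = {sigma2_hat:.4f}")
```

Using the unrounded means inside the function avoids the small drift that appears when $\bar{y}$ is rounded to 2.846 before computing $S_{XY}$.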
Fitting the Regression Line V (12.2)
Ex. 67, pg. 545: Car Plant Electricity Usage
[Figure: scatter plot of electricity usage against production with the fitted line $\hat{y} = 0.409 + 0.499x$ and $R^2 = 0.80$]
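The Analysis of Variance Table: Sum of Squares Decomposition I (12.6.1)
Apply an ANOVA approach similar to the one used for the one-factor layout in Chapter 11, and consider the variability in the dependent variable $y$.
Hypothesis test: $H_0: \beta_1 = 0$

The Analysis of Variance Table: Sum of Squares Decomposition II (12.6.1)
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \quad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$SST = SSR + SSE$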
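A small Python sketch that computes the three sums of squares directly from their definitions and confirms $SST = SSR + SSE$. It reuses the assumed Example 67 data (not listed on the slides); the function name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variability in y
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # part explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # part left in the residuals
    return sst, ssr, sse


# Assumed Example 67 data: production (x) and electricity usage (y)
production = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
usage = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

sst, ssr, sse = sums_of_squares(production, usage)
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, SSR + SSE = {ssr + sse:.4f}")
```

The decomposition holds exactly (up to rounding) only for the least squares line; for any other line the cross term does not vanish.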
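The Analysis of Variance Table: Sum of Squares Decomposition III (12.6.1)
The sums of squares for a simple linear regression
[Figure: sum of squares decomposition]

The Analysis of Variance Table: Sum of Squares Decomposition IV (12.6.1)
The analysis of variance table for a simple linear regression analysis:
Regression: df $= 1$, $SSR$, $MSR = SSR/1$, $F = MSR/MSE$
Error: df $= n - 2$, $SSE$, $MSE = SSE/(n - 2)$
Total: df $= n - 1$, $SST$
Hypothesis test: $H_0: \beta_1 = 0$
The two-sided p-value is p-value $= P(X > F)$, where $X$ is an RV that has an $F_{1,n-2}$ distribution.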
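The Analysis of Variance Table: Sum of Squares Decomposition V (12.6.1)
Coefficient of determination ($R^2$): the fraction of the total variation in $y$ that is explained by the regression:
$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}, \qquad 0 \le R^2 \le 1$
The closer $R^2$ is to one, the better the regression model fits the data.

The Analysis of Variance Table: Sum of Squares Decomposition VI (12.6.1)
The coefficient of determination $R^2$ is larger in scenario II than in scenario I.
[Figure: scenarios I and II]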
The Analysis of Variance Table: Sum of Squares Decomposition VII (12.6.1)
Ex. 67, pg. 57: Car Plant Electricity Usage
$MSR = 1.2124$, $MSE = 0.0299$, $F = MSR/MSE = 40.53$
$SSR = 1.2124$, $SST = 1.5115$, $R^2 = SSR/SST = 0.802$
About 80% of the variability in electricity usage is explained by the regression on production.
(Excel sheet)
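The ANOVA quantities in this example can be reproduced with a Python sketch, again on the assumed Example 67 data. For the p-value it relies on the fact that an $F_{1,\nu}$ variable is distributed as the square of a $t_\nu$ variable, so $P(X > F) = P(|T| > \sqrt{F})$. The Python standard library has no t or F distribution functions, so the t tail is integrated numerically with Simpson's rule here; a statistics package would normally supply it. The names `t_two_sided_p` and `anova_table` are hypothetical.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """P(|T| > |t|) for T ~ t(df), via Simpson's rule on the density over [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = density(0.0) + density(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    area = total * h / 3                # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def anova_table(xs, ys):
    """Sums of squares, mean squares, F statistic, p-value and R^2 for SLR."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ssr = sst - sse
    msr = ssr / 1                       # regression: 1 degree of freedom
    mse = sse / (n - 2)                 # error: n - 2 degrees of freedom
    f = msr / mse
    # X ~ F(1, n-2) has the law of T^2 with T ~ t(n-2): P(X > F) = P(|T| > sqrt(F))
    p_value = t_two_sided_p(math.sqrt(f), n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst, "MSR": msr, "MSE": mse,
            "F": f, "p": p_value, "R2": ssr / sst}


# Assumed Example 67 data: production (x) and electricity usage (y)
production = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
usage = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

table = anova_table(production, usage)
for key, value in table.items():
    print(f"{key}: {value:.4g}")
```

With $F \approx 40.5$ on 1 and 10 degrees of freedom the p-value is far below 0.001, so $H_0: \beta_1 = 0$ is rejected.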
Residual Analysis Methods I (12.7.1)
Residuals: differences between the observed values of the dependent variable and the corresponding predicted (fitted) values, $e_i = y_i - \hat{y}_i$.
Residual analysis can be used to:
- identify outliers
- check whether the fitted model is good
- check whether the error variance is constant
- check whether the error terms are normally distributed
(Excel sheet)

Residual Analysis Methods II (12.7.1)
Plot the residuals $e_i$ against the values of the explanatory variable $x_i$.
A random scatter plot indicates no problem with the fitted regression model.
If $|e_i / \hat\sigma|$ (the standardized residual) is greater than 3, the data point is an outlier.
If there are outliers, they should be removed and the regression line fitted again.
(Excel sheet)
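The outlier rule can be sketched in Python as follows: fit the line, compute $e_i$ and $\hat\sigma = \sqrt{SSE/(n-2)}$, and flag points with $|e_i/\hat\sigma| > 3$. The function names and the small data set (points on $y = 1 + 2x$ with one point shifted by 5) are hypothetical.

```python
import math


def standardized_residuals(xs, ys):
    """Residuals e_i = y_i - y_hat_i and standardized residuals e_i / sigma_hat."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sigma_hat = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return residuals, [e / sigma_hat for e in residuals]


def outlier_indices(xs, ys, cutoff=3.0):
    """Indices i whose standardized residual satisfies |e_i / sigma_hat| > cutoff."""
    _, z = standardized_residuals(xs, ys)
    return [i for i, z_i in enumerate(z) if abs(z_i) > cutoff]


# Hypothetical data: 20 points on y = 1 + 2x, with point 10 shifted up by 5
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]
ys[10] += 5.0
print(outlier_indices(xs, ys))
```

Note that $|e_i/\hat\sigma|$ can never exceed $\sqrt{n-2}$, so with very small samples the cutoff of 3 cannot be reached at all.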
Residual Analysis Methods III (12.7.1)
[Figure: residual plot indicating points that may be outliers]

Residual Analysis Methods IV (12.7.1)
If the residual plot shows positive and negative residuals grouped together, a linear model is not suitable.
[Figure: a grouping of positive and negative residuals indicates that the linear model is inappropriate]
Residual Analysis Methods V (12.7.1)
If the residual plot shows a funnel shape, the error variance $\sigma^2$ is not constant, which conflicts with the model assumption.
[Figure: a funnel shape in the residual plot indicates a nonconstant error variance]

Residual Analysis Methods VI (12.7.1)
A normal probability plot (normal scores plot) of the residuals can be used to check whether the error terms $\epsilon_i$ are normally distributed.
[Figure: normal scores against residuals for a simulated sample from a normal distribution, with the points lying approximately on a straight line]
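A Python sketch of how the points of a normal scores plot can be produced: sort the residuals and pair the $i$-th smallest with a normal score. The plotting positions $(i - 0.375)/(n + 0.25)$ (Blom's formula) are an assumption, a common choice that may differ from the textbook's. The `straightness` summary (the correlation of the plotted points) is an added numeric aid, not part of the slides; the function names and the skewed sample are hypothetical.

```python
from statistics import NormalDist


def normal_scores(n):
    """Normal scores Phi^{-1}((i - 0.375) / (n + 0.25)) for i = 1..n (Blom's positions)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def normal_plot_points(residuals):
    """Pair the i-th smallest residual with the i-th normal score."""
    ordered = sorted(residuals)
    return list(zip(normal_scores(len(ordered)), ordered))


def straightness(points):
    """Correlation of the plotted points; values near 1 mean a nearly straight line."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals: one large value bends the plot away from a straight line
skewed = [0.0] * 7 + [10.0]
print(straightness(normal_plot_points(skewed)))
```

In practice the points would be plotted and inspected by eye; the single number only summarizes how close they are to a line.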
Residual Analysis Methods VII (12.7.1)
[Figure: normal scores plots of simulated samples from non-normal distributions, which show nonlinear patterns]
Such nonlinear patterns indicate a non-normal distribution of the residuals, in which case the linear modeling approach may not be used.
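The Sample Correlation Coefficient I (12.9.1)
From the correlation equation in Section 2.5.4,
$\rho = \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$,
which measures the strength of linear association between two jointly distributed RVs $X$ and $Y$.
The sample correlation coefficient $r$ for a set of paired data observations $(x_i, y_i)$ is
$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}}, \qquad -1 \le r \le 1$

The Sample Correlation Coefficient II (12.9.1)
$r = 0$: no linear association; $r < 0$: negative linear association; $r > 0$: positive linear association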
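A Python sketch of the sample correlation coefficient $r = S_{XY}/\sqrt{S_{XX} S_{YY}}$, applied to the assumed Example 67 data (not listed on the slides). Its square matches the $R^2 = 0.802$ of the regression on the same data, and $r$ does not change when the roles of $x$ and $y$ are swapped. The function name `sample_correlation` is hypothetical.

```python
import math


def sample_correlation(xs, ys):
    """r = S_XY / sqrt(S_XX * S_YY) for paired observations (x_i, y_i)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Assumed Example 67 data: production (x) and electricity usage (y)
production = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
usage = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

r = sample_correlation(production, usage)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Rescaling either variable by a positive factor or shifting it leaves $r$ unchanged; multiplying one variable by a negative factor only flips the sign of $r$.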
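The Sample Correlation Coefficient III (12.9.1)
$r^2 = R^2$: the square of the sample correlation coefficient equals the coefficient of determination.
$r$ is unchanged if $x$ and $y$ are swapped, in contrast to regression analysis, which requires that one variable be dependent and the other explanatory.
$r$ is also unaffected by linear transformations of the variables, e.g., $x_i \to a x_i + b$ and $y_i \to c y_i + d$ with $a, c > 0$ (if $a$ and $c$ have opposite signs, only the sign of $r$ changes).

The Sample Correlation Coefficient IV (12.9.1)
The hypothesis test $H_0: \rho = 0$ (no correlation between the RVs) can be performed by computing the t-statistic
$t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$,
which has a t-distribution with $n - 2$ degrees of freedom under $H_0$.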
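The Sample Correlation Coefficient V (12.9.1)
Ex. 69, pg. 588: Cranial Circumferences
$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = 0.255$, so $r^2 = R^2 = 0.255^2 = 0.065$
Null hypothesis $H_0: \rho = 0$ (no correlation)
Computed t-statistic: $t = \frac{0.255\sqrt{18}}{\sqrt{1 - 0.255^2}} = 1.12$
p-value $= 2 \times P(X > 1.12) = 0.277$, where $X$ has a t-distribution with 18 degrees of freedom.
Since p-value > $\alpha$, we accept $H_0$: there is no significant evidence that finger length and cranial circumference are correlated.
(Excel sheet)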
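The correlation test can be sketched in Python from $r$ and $n$ alone. The inputs are the example's $r = 0.255$ and $n = 20$ (inferred from the $\sqrt{18}$ in the t-statistic). As the standard library has no t distribution, the two-sided tail is again integrated numerically with Simpson's rule; the names `t_two_sided_p` and `correlation_t_test` are hypothetical.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """P(|T| > |t|) for T ~ t(df), via Simpson's rule on the density over [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = density(0.0) + density(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def correlation_t_test(r, n):
    """t = r sqrt(n - 2) / sqrt(1 - r^2) and its two-sided p-value under H0: rho = 0."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t, t_two_sided_p(t, n - 2)


# Ex. 69: r = 0.255; n = 20 is inferred from sqrt(18) = sqrt(n - 2)
t_stat, p_value = correlation_t_test(0.255, 20)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```

The p-value is well above common significance levels such as $\alpha = 0.05$, matching the conclusion that $H_0: \rho = 0$ is not rejected.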