SMAM 39 Least Squares Example.

Heating and combustion analyses were performed in order to study the composition of moon rocks collected by the Apollo 14 and 15 crews. Recorded in c1 and c2 of the Minitab output are the determinations of hydrogen (H) and carbon (C) in parts per million (PPM) for 11 specimens.

Row   Hydrogen   Carbon
  1      120.0    105.0
  2       82.0    110.0
  3       90.0     99.0
  4        8.0     22.0
  5       38.0     50.0
  6       20.0     50.0
  7        2.8      7.3
  8       66.0     74.0
  9        2.0      7.7
 10       20.0     45.0
 11       85.0     51.0

With x = hydrogen and y = carbon, the summary quantities are

\[ n = 11, \quad \sum x = 533.8, \quad \sum x^2 = 43124.84, \quad \sum y = 621, \quad \sum y^2 = 48624.58, \quad \sum xy = 43760.84 \]

Some of the above sums will be computed on a hand-held calculator. The computing formula for the slope b is

\[ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left( \sum x \right)^2} = \frac{11(43760.84) - (533.8)(621)}{11(43124.84) - (533.8)^2} = 0.7912 \]
The intercept a is

\[ a = \frac{1}{n} \left( \sum y - b \sum x \right) = \frac{1}{11} \left( 621 - (0.7912)(533.8) \right) = 18.0598 \]

The least squares equation is

\[ \hat{y} = 18.0598 + 0.7912x \]

The predicted values are obtained by substituting each x value into the least squares equation to obtain the predicted y. The differences between the observed and the predicted values are called the residuals. For example, when x = 8 the observed value is y = 22. The predicted value is

\[ \hat{y} = 18.0598 + 0.7912(8) = 24.39 \]

The residual is 22 - 24.39 = -2.39.

One way to find SSE is to find the sum of the squares of the residuals. The SSR may then be found by subtraction after finding SST. This method is prone to roundoff errors. It is better to find SSR first. Use the following computing formulae:

\[ S_{xx} = \sum x^2 - \frac{\left( \sum x \right)^2}{n}, \qquad S_{yy} = \sum y^2 - \frac{\left( \sum y \right)^2}{n}, \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n} \]

SSR = b S_xy = (0.7912)(13625.4) = 10780.6
SST = S_yy = 13566.3
SSE = SST - SSR = 2785.7

100 R^2 = 100(10780.6/13566.3) = 79.5%, so about 79.5% of the variation is accounted for.
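The hand computation can be cross-checked with a short script. What follows is a minimal sketch in Python, which is not part of the original Minitab session; the function name least_squares and the dictionary keys are illustrative choices, and the data are the eleven hydrogen and carbon readings from the worksheet.

```python
# Hydrogen (x) and carbon (y) readings in ppm for the 11 specimens.
hydrogen = [120.0, 82.0, 90.0, 8.0, 38.0, 20.0, 2.8, 66.0, 2.0, 20.0, 85.0]
carbon = [105.0, 110.0, 99.0, 22.0, 50.0, 50.0, 7.3, 74.0, 7.7, 45.0, 51.0]


def least_squares(x, y):
    """Fit y = a + b*x with the computing formulae for S_xx, S_yy and S_xy.

    Illustrative helper (hypothetical name), not a Minitab command.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(v * v for v in x) - sum_x ** 2 / n
    s_yy = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(u * v for u, v in zip(x, y)) - sum_x * sum_y / n
    b = s_xy / s_xx                # slope
    a = (sum_y - b * sum_x) / n    # intercept
    ssr = b * s_xy                 # sum of squares due to regression
    sst = s_yy                     # corrected total sum of squares
    return {"a": a, "b": b, "ssr": ssr, "sst": sst,
            "sse": sst - ssr, "r2": ssr / sst}


fit = least_squares(hydrogen, carbon)
print(f"slope b = {fit['b']:.4f}, intercept a = {fit['a']:.4f}")
print(f"SSR = {fit['ssr']:.1f}, SST = {fit['sst']:.1f}, SSE = {fit['sse']:.1f}")
print(f"100 R^2 = {100 * fit['r2']:.1f}%")
```

Because the script carries the unrounded slope, its intercept differs from the hand value in the fourth decimal place.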
Worksheet size: 100000 cells

MTB > name c1='hydrogen'
MTB > name c2='carbon'
MTB > set c1
DATA> 120 82 90 8 38 20 2.8 66 2.0 20 85
DATA> end
MTB > set c2
DATA> 105 110 99 22 50 50 7.3 74 7.7 45 51
DATA> end
MTB > print c1 c2

Data Display

Row   Hydrogen   Carbon
  1      120.0    105.0
  2       82.0    110.0
  3       90.0     99.0
  4        8.0     22.0
  5       38.0     50.0
  6       20.0     50.0
  7        2.8      7.3
  8       66.0     74.0
  9        2.0      7.7
 10       20.0     45.0
 11       85.0     51.0

Consider the scatterplot.

MTB > plot c2*c1
[Character plot: Carbon on the vertical axis (ticks at 0, 35, 70, 105) against Hydrogen on the horizontal axis (ticks at 0, 25, 50, 75, 100, 125); each x marks one specimen.]

The plot suggests that a straight line might be a reasonable fit. The correlation coefficient is computed next.

MTB > corr c2 c1

Correlations (Pearson)

Correlation of Carbon and Hydrogen = 0.891

The regression line is given below.

MTB > regress c2 on 1 c1

Regression Analysis

The regression equation is
Carbon = 18.1 + 0.791 Hydrogen

Predictor      Coef    Stdev   t-ratio       p
Constant     18.059    8.394      2.15   0.060
Hydrogen     0.7912   0.1341      5.90   0.000

s = 17.59   R-sq = 79.5%   R-sq(adj) = 77.2%

Analysis of Variance

SOURCE       DF      SS      MS       F       p
Regression    1   10781   10781   34.83   0.000
Error         9    2786     310
Total        10   13566
Unusual Observations
Obs.  Hydrogen   Carbon     Fit  Stdev.Fit  Residual  St.Resid
 11         85    51.00   85.31       7.21    -34.31    -2.14R

R denotes an obs. with a large st. resid.

The regression line accounts for 79.5% of the variation. We can find the predicted values and plot them.

MTB > let c5=18.058+.7912*c1
MTB > name c5='predict'
MTB > print c1 c5

Row   Hydrogen   predict
  1      120.0   113.002
  2       82.0    82.936
  3       90.0    89.266
  4        8.0    24.388
  5       38.0    48.124
  6       20.0    33.882
  7        2.8    20.273
  8       66.0    70.277
  9        2.0    19.640
 10       20.0    33.882
 11       85.0    85.310

In the plot below the letter B marks the predicted values and the letter A marks the observed values.

MTB > gstd
* NOTE * Standard Graphics are enabled.
         Professional Graphics are disabled.
         Use the GPRO command to enable Professional Graphics.
MTB > mplot c2*c1 c5*c1
Character Multiple Plot

[Character plot: observed carbon (A) and predicted carbon (B) on the vertical axis (ticks at 35, 70, 105) against hydrogen on the horizontal axis (ticks at 0, 25, 50, 75, 100, 125).]

A = Carbon vs. Hydrogen    B = C5 (predict) vs. Hydrogen

This gives an idea of how good the fit is.

Recorded here are the scores of 16 students on a midterm and final exam in statistics.

Data Display

Row   midterm   final
  1        81      80
  2        75      82
  3        71      83
  4        61      57
  5        96     100
  6        56      30
  7        85      68
  8        18      56
  9        70      40
 10        77      87
 11        71      65
 12        91      86
 13        88      82
 14        79      57
 15        77      75
 16        68      47

Again, let's make a scatter plot.

MTB > name c3='midterm'
MTB > name c4='final'
MTB > set c3
DATA> 81 75 71 61 96 56 85 18 70 77 71 91 88 79 77 68
DATA> end
MTB > set c4
DATA> 80 82 83 57 100 30 68 56 40 87 65 86 82 57 75 47
DATA> end
MTB > plot c4*c3

Character Plot

[Character plot: Final on the vertical axis (ticks at 25, 50, 75, 100) against Midterm on the horizontal axis (ticks at 15, 30, 45, 60, 75, 90); each x marks one student.]

MTB > GPro

Observe that one of the observations, (18, 56), lies far from the rest.

MTB > corr c4 c3

Correlations (Pearson)

Correlation of final and midterm = 0.583

MTB > regress c4 on 1,c3

Regression Analysis

The regression equation is
final = 23.0 + 0.625 midterm

Predictor      Coef    Stdev   t-ratio       p
Constant      22.95    17.41      1.32   0.208
midterm      0.6252   0.2327      2.69   0.018

s = 16.22   R-sq = 34.0%   R-sq(adj) = 29.3%

Analysis of Variance
SOURCE       DF       SS       MS      F       p
Regression    1   1898.7   1898.7   7.22   0.018
Error        14   3681.3    262.9
Total        15   5579.9

Unusual Observations
Obs.  midterm    final     Fit  Stdev.Fit  Residual  St.Resid
  8      18.0    56.00   34.21      13.37     21.79    2.37RX

Only 34% of the variation is accounted for. Redoing the regression without the unusual observation improves the fit considerably, but not enough to make it worthwhile.

MTB > let c6=c3
MTB > let c7=c4

The unusual observation was deleted from the columns on the worksheet.

MTB > regress c7 on 1,c6

Regression Analysis

The regression equation is
C7 = -37.1 + 1.39 C6

15 cases used 1 cases contain missing values

Predictor      Coef    Stdev   t-ratio       p
Constant     -37.09    24.62     -1.51   0.156
C6            1.392   0.3192      4.36   0.001

s = 13.00   R-sq = 59.4%   R-sq(adj) = 56.3%

Analysis of Variance

SOURCE       DF       SS       MS       F       p
Regression    1   3216.4   3216.4   19.02   0.001
Error        13   2198.5    169.1
Total        14   5414.9

Theoretical Development

Given a set of data points (X_i, Y_i), i = 1, ..., n, the objective is to find the straight line such that the sum of the squares of the differences between the observed values and
those that would be predicted by the regression equation is a minimum. This amounts to finding the values of the slope b and the y-intercept a such that

\[ F(a, b) = \sum_{i=1}^{n} (Y_i - a - bX_i)^2 \]

is minimized. A non-calculus derivation of the least squares equation follows.
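Derivation of Least Squares Formula without Calculus

Notation

\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]

The goal is to find a and b so that

\[ F(a, b) = \sum_{i=1}^{n} (y_i - a - bx_i)^2 \]

is minimized. This is the sum of the squared differences between the observed values and those predicted by the fitted equation.

Now, adding and subtracting \bar{y} and b\bar{x},

\[ \sum_{i=1}^{n} (y_i - a - bx_i)^2 = \sum_{i=1}^{n} \left[ (y_i - \bar{y}) + (\bar{y} - a - b\bar{x}) - b(x_i - \bar{x}) \right]^2 \]

\[ = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - a - b\bar{x})^2 + b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 - 2b \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]

The cross terms involving the constant (\bar{y} - a - b\bar{x}) vanish because \sum (y_i - \bar{y}) = 0 and \sum (x_i - \bar{x}) = 0. Continuing,

\[ = S_{yy} + n(\bar{y} - a - b\bar{x})^2 + b^2 S_{xx} - 2b S_{xy} \]

\[ = S_{yy} + n(\bar{y} - a - b\bar{x})^2 + S_{xx} \left( b^2 - 2b \frac{S_{xy}}{S_{xx}} \right) \]

\[ = S_{yy} - \frac{S_{xy}^2}{S_{xx}} + n(\bar{y} - a - b\bar{x})^2 + S_{xx} \left( b - \frac{S_{xy}}{S_{xx}} \right)^2 \]

The first two terms do not depend on a or b, and the last two terms are nonnegative, so the expression is minimized when both of them are zero, that is, when

\[ b = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad a = \bar{y} - b\bar{x} \]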
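The completed-square identity can be checked numerically. The sketch below, in Python, uses a small made-up data set (the x and y values are hypothetical, chosen only for illustration) and compares F(a, b) computed directly with the decomposed form for several arbitrary choices of a and b. The function names are illustrative, not taken from the notes.

```python
def direct_f(a, b, x, y):
    """F(a, b): sum of squared deviations of y from the line a + b*x."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def corrected_sums(x, y):
    """Return n, x-bar, y-bar, S_xx, S_yy and S_xy."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return n, x_bar, y_bar, s_xx, s_yy, s_xy


def decomposed_f(a, b, x, y):
    """The same quantity written in completed-square form."""
    n, x_bar, y_bar, s_xx, s_yy, s_xy = corrected_sums(x, y)
    return (s_yy - s_xy ** 2 / s_xx
            + n * (y_bar - a - b * x_bar) ** 2
            + s_xx * (b - s_xy / s_xx) ** 2)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 8.0]

# The two forms agree for any a and b, not only at the minimum.
for a, b in [(0.0, 0.0), (1.5, -2.0), (3.0, 0.25)]:
    assert abs(direct_f(a, b, x, y) - decomposed_f(a, b, x, y)) < 1e-9

# At b = S_xy/S_xx and a = y_bar - b*x_bar both squared terms vanish,
# so moving away from that point can only increase F.
n, x_bar, y_bar, s_xx, s_yy, s_xy = corrected_sums(x, y)
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar
f_min = direct_f(a_hat, b_hat, x, y)
for da, db in [(0.1, 0.0), (0.0, 0.1), (-0.1, -0.1)]:
    assert direct_f(a_hat + da, b_hat + db, x, y) > f_min
```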
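Once the regression equation is derived, the corrected sum of squares can be broken up into two parts: a sum of squares due to regression and a sum of squares due to error.

\[ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - a - bx_i + a + bx_i - \bar{y})^2 \]

\[ = \sum_{i=1}^{n} (y_i - a - bx_i)^2 + \sum_{i=1}^{n} (a + bx_i - \bar{y})^2 + 2 \sum_{i=1}^{n} (y_i - a - bx_i)(a + bx_i - \bar{y}) \]

Since a = \bar{y} - b\bar{x}, we have y_i - a - bx_i = y_i - \bar{y} + b\bar{x} - bx_i and a + bx_i - \bar{y} = bx_i - b\bar{x}, so

\[ \sum_{i=1}^{n} (y_i - a - bx_i)(a + bx_i - \bar{y}) = \sum_{i=1}^{n} (y_i - \bar{y} + b\bar{x} - bx_i)(bx_i - b\bar{x}) = bS_{xy} - b^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}} - \frac{S_{xy}^2}{S_{xx}^2} S_{xx} = 0 \]

The cross term is therefore zero and

\[ SST = SSR + SSE \]

where SSE = \sum (y_i - a - bx_i)^2 and

\[ SSR = \sum_{i=1}^{n} (a + bx_i - \bar{y})^2 = b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 = b^2 S_{xx} = bS_{xy} \]

The quantity R^2 = SSR/SST represents the proportion of the variation accounted for by the regression line.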
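As an illustrative check of this decomposition, the following Python sketch fits the midterm and final exam scores with and without observation 8 and confirms that SST = SSR + SSE when each sum of squares is computed from its definition. The helper name anova_parts is hypothetical, and Python stands in here for the Minitab session used in the notes.

```python
# Midterm (x) and final (y) scores for the 16 students.
midterm = [81, 75, 71, 61, 96, 56, 85, 18, 70, 77, 71, 91, 88, 79, 77, 68]
final = [80, 82, 83, 57, 100, 30, 68, 56, 40, 87, 65, 86, 82, 57, 75, 47]


def anova_parts(x, y):
    """Return (a, b, SST, SSR, SSE), each sum of squares from its definition."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((a + b * xi - y_bar) ** 2 for xi in x)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sst, ssr, sse


full = anova_parts(midterm, final)

# Drop observation 8 (list index 7), the point (18, 56).
keep = [i for i in range(len(midterm)) if i != 7]
trimmed = anova_parts([midterm[i] for i in keep], [final[i] for i in keep])

for label, (a, b, sst, ssr, sse) in [("all 16", full), ("without obs 8", trimmed)]:
    # The cross term is zero, so the two parts add up to the total.
    assert abs(sst - (ssr + sse)) < 1e-6
    print(f"{label}: final = {a:.2f} + {b:.4f} midterm, "
          f"R^2 = {100 * ssr / sst:.1f}%")
```

Computing SSE directly from the residuals, rather than by subtraction, is what makes the assertion a genuine check of the identity.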