Lear Regreo ad Correlato 00 Predctg the erupto tme after oberved a erupto of 4 mute durato. 90 80 70 Iter-erupto Tme.5.0.5 3.0 3.5 4.0 4.5 5.0 5.5 Durato A ample of tererupto tme wa take durg Augut -8, 978 ad Augut 6-3, 979. The data et aved the fle g.dat whch a ASCII fle, ad the SPSS fle gr.av. The frt colum of the data et for date, the ecod colum erupto durato tme ( mute) ad the thrd colum tererupto tme (mute). How ca we help the tourt? Tr ome eplorator techque uch a htogram, boplot, catter plot,..., to ee f there a patter. A charactertc of the geer the durato of the prevou erupto. We could thk of our data a par of the form (durato of erupto, tme utl et erupto). Do ou ee two ubgroup th data et? Do ou ee that a loger durato ted to be followed b a loger tme terval utl the et erupto? J. S. Rehart, a 969 paper the Joural of Geophcal Reearch, provde a mecham for th patter baed o the temperature level of the water at the bottom of a geer tube at the tme the water at the top reache the bolg temperature. That a horter erupto would be followed b a horter tererupto tme (ad a loger erupto would be followed b a loger tererupto tme) alo cotet wth Rehart' model, ce a hort erupto characterzed b havg more water at the bottom of the geer beg heated hort of bolg temperature, ad left the tube. Th water ha bee heated omewhat, however, o t take le tme for the et erupto to occur. A log erupto reult the tube beg empted, o the water mut be heated from a colder temperature, whch take loger. Awer the followg queto:. I there a correlato betwee the durato of erupto ad the tme betwee erupto? Ca we model the relato ug a mple regreo modelg techque ad fd the equato of the le that bet decrbe the data.. Ue lear regreo techque to predct the average watg tme utl et erupto after a erupto that lat 4.3 mute. (Ue a 95% cofdece terval for predctg the average watg tme utl et erupto.)
Lear Regreo ad Correlato Smple Lear Regreo Model the Lear Relato Betwee Two Quattatve varable, (repoe varable) ad (eplaator varable). µ + ε (Smple Lear Regreo Model) that to determe µ α + β. What are the value of α ad β? Leat-Square Etmate of Regreo Parameter Fd oluto of α ad β of a traght le that mmze the followg varablt meaure (total accumulated quared error). Σ [ (α + β )] Maatee Klled 30? +? Redual (or Error) 0 0 0 0 0 700 800 Leat-Square Etmate of Regreo Parameter ˆ SS β, ˆ α ˆ β Aother formula: ˆ β r, SS where ad are tadard devato of ad Eample: (Maatee) 37 7945 4 ˆ β. 486 ˆ α. 486 4. 430439 09809. 5 4 4 Equato of the regreo le: ˆ α ˆ + ˆ β ; ˆ 4. 430439 +. 486 If at a certa ear the umber of power boat regtered 700,000, etmate how ma maatee o average would be klled. ŷ 4.430439 +.486, at 700 4.430439 +.486 700 45.973 Number of Power Boat
Lear Regreo ad Correlato Problem of etrapolato Etrapolated reult for a value out of the cope of A poble tred Scope of data Regreo ad caualt Regreo telf provde o formato about caual patter ad mut upplemeted b addtoal aal (wth deged ad cotrolled epermet) to obta ght about caual relatohp. Frt Order Smple Lear Regreo Model Model aumpto: α + β + ε wth error depedet ad detcall dtrbuted a ε Ν (0, σ ) ad mea of at µ α + β. Mea of at 4 3 4 3
Lear Regreo ad Correlato Redual: e ˆ ˆ ˆ α + β ˆ ŷ Eample: Fd the redual at 4. Oberved, ad predcted 4.430439 +.486 4 6.0. The redual 6.0 4.99. Source of varablt: Repoe varable, Eplaator varable, Error Redual Sum of Square (SSRed), or Sum of Square of Error (SSE) Total Sum of Square (SSTo) SSTo SSR + SSE ( ) Sum of Square due to Regreo (SSR) SSTo SSE ANOVA Table Source of Var. S.S. d.f. M.S. F Regreo SSR SSR/MSR MSR/MSE Error SSE SSE/( )MSE Total (corrected) SSTo ( ˆ ) e ˆ ) ( Etmato of σ : MSE SSE / ( ) 8.87 (Degree of freedom ) Etmated Stadard Error of the regreo model: 4.8 4
Lear Regreo ad Correlato Model R Model Summar R Square Adjuted R Square Std. Error of the Etmate.94 a.886.877 4.8 a. Predctor: (Cotat), NPOWERBT ANOVA b Model Sum of Square df Mea Square F Sg. Regreo 7.979 7.979 93.65.000 a Redual 9.4 8.87 Total 93.49 3 a. Predctor: (Cotat), NPOWERBT b. Depedet Varable: MANKILL Model (Cotat) NPOWERBT a. Depedet Varable: MANKILL Utadardzed Coeffcet B Coeffcet a Std. Error Stadardzed Coeffcet Beta -4.430 7.4-5.589.000.5.03.94 9.675.000 t Sg. Equato of the regreo le: ˆ α ˆ + ˆ β ; ˆ 4. 430439 +. 486 Iferece for Regreo Coeffcet Hpothe: Ho: β β 0, v.. Ha: β β 0 (It ofte tetg for Ho: β 0 v.. Ha: β 0.) Tet Stattc: ˆ β β 0 t ~ t-dtrbuto d.f., where e ˆ ˆ ( β ).03 e ˆ ( ˆ β ) ( ) Deco rule: Reject Ho, f C.V. approach: t < t α/ or t > t α/ p-value approach: p-value < α Hpothe: Ho: α α 0, v.. Ha: α α 0 (It ofte tetg for Ho: α 0 v.. Ha: α 0.) Tet Stattc: ˆ α α0 t ~ t-dtrbuto d.f., where e ˆ ( ˆ α) Deco rule: Reject Ho, f C.V. approach: t < t α/ or t > t α/ p-value approach: p-value < α α 7.4 ( ) e ˆ ( ˆ ) + 5
Lear Regreo ad Correlato Iferece for Predcted Value The (-α) 00% cofdece terval for predctg the mea repoe at : ˆ tα / e ˆ ( ˆ) where ± ( ) e ˆ ( ˆ) d.f. ( ) + Predcted Number of Maatee Klled o Average at 4 > 6.0 ± 3.9 > (.09, 9.9) The (-α) 00% cofdece terval for predctg a dvdual outcome at : ˆ t ˆ ( ~ α / e ) where ± ( ) e ˆ ( ~ ) d.f. ( ) + + Predcted Number of Maatee Klled at 4 > 6.0 ± 0.0 > (5.9, 6.) Cofdece Iterval Bad Number of maatee klled 30 0 0 0 0 0 700 800 Number of Powerboat 6
Lear Regreo ad Correlato Evaluato of the Model Coeffcet of Determato It the proporto of varato oberved that ca be eplaed b the varable wth the lear regreo model. It the quare of the Pearo correlato coeffcet, r. r SSE/SSTo SSR/SSTo Redual Plot A catter plot of the redual agat the predcted value of the repoe varable to verf the aumpto behd the regreo model. (Homogeet of varace, radom ormal error, appropratee of the lear model.).5 Scatterplot Depedet Varable: Number of maatee klle Regreo Stadardzed Redual.0.5 0.0 -.5 -.0 -.5 -.0 -.5 -.5 -.0 -.5 0.0.5.0.5.0 Regreo Stadardzed Predcted Value 0 0 Model ot a good lear ft Volato of the equal varace aumpto 7
Lear Regreo ad Correlato Traformato The crcle of power rule. Whe the relato betwee ad ot lear, we beg lookg at traformato of the form p ad p. p, -3, -, -, -/, l, /,,, 3, Whe the catter plot how patter of data a Quadrat I, oe ca traform the varable wth a p fucto b creag power p to a umber larger tha oe, or traformato varable wth p fucto b creag power p to a umber large tha oe. Th what the up ad up mea. If the patter mlar to the catter plot Quadrat II, the ether creae the power of p traformato to a umber larger tha oe ( up) or decreae the power of p traformato to a umber le tha oe ( dow). up Quadrat II Quadrat I dow up Quadrat III Quadrat IV dow Eample: female lfe epectac, GDP (Gro dometc product) Plot of v.. Plot of v.. l() 90 90 80 80 Female lfe epectac 99 70-0000 0 0000 0000 30000 Female lfe epectac 99 70 4 5 6 7 8 9 0 GDP per capta Natural log of GDP ˆ αˆ + ˆ β l( ) 8
Lear Regreo ad Correlato Correlato: Eame Quattatve Bvarate Data The correlato, ρ, betwee two radom varable, X ad Y, defed a, ( X µ X ) ( Y µ Y ) ρ average σ X σ Y product of the tadard devate of X ad Y, quatfe the tregth of lear relatohp. Regreo Aal I there a relato betwee umber of power boat the area ad umber of maatee klled? Year NPB( ) Nkll( ) 77 447 3 78 4 79 48 4 80 498 6 8 53 4 8 5 0 83 56 5 84 559 34 85 585 33 86 64 33 87 645 39 88 675 43 89 7 90 79 47 Maatee Klled 30 0 0 0 0 Scatter Plot 0 700 800 Number of Power Boat Eample: Calculato of Correlato Coeffcet Σ 7945 (Σ) 63305 Σ 4 (Σ) 69744 Σ 468597 Σ 56 Σ 475 S 09809.5, S 93.4857 S 37 r 0.9447789, r 0.886379485 9
Lear Regreo ad Correlato 0 Formula for correlato coeffcet: Properte of correlato coeffcet: < r < It meaure the tregth ad drecto of the lear relato betwee two quattatve varable. r f all pot le eactl o a traght le. ρ he otato for populato correlato coeffcet. Correlato Doe Not Impl Cauato Eample: The umber of powerboat regtered ma ot be the drect caue for the death of Maatee..,,, ) ( ) ( ) )( ( where r r,.) correlato ample Pearo' ( r cloe to r cloe to r cloe to zero r cloe to zero
Lear Regreo ad Correlato t-tet for correlato Hpothe: Ho: ρ ρ 0, v.. Ha: ρ ρ 0 Tet Stattc: t r ρ0 ( r ) /( ) ~ t-dtrbuto d.f. If tud rak correlato, ue r : Tet Stattc: t r r Deco rule: Reject Ho, f C.V. approach: t < t α/ or t > t α/ p-value approach: p-value < α Eample: (Maatee) I there a gfcat correlato betwee powerboat ad death of maatee? Ho: ρ 0, v.. Ha: ρ 0. 94 0 t 9. 65 (. 886) /( 4 ) d.f. 4 -, p-value <.0005, reject Ho, there gfcat lear relato. Spearma Rak Correlato Coeffcet Spearma correlato coeffcet calculated baed o the rak of the obervato,.e., r rak of amog all, ad r rak of amog all, where d r r. If te ue average rak. Tet for correlato ug rak correlato coeffcet If ample ze le tha 0 ue pecal table. If ample ze large ue t-dtrbuto table. The t-tet the ame a the tet procedure for o-rak data. It le etve to outler wth the dadvatage of lo of formato. Eample: (Maatee) ( )( ) r r r r 6 r ( ( ) ( ) r r r r d ) Year NPB( ) Rak Nkll( ) Rak 77 447 3 78 4 5 79 48 3 4 6.5 80 498 4 6 3 8 53 6 4 6.5 8 5 5 0 4 83 56 7 5 84 559 8 34 0 85 585 9 33 8.5 86 64 0 33 8.5 87 645 39 88 675 43 89 7 3 4 90 79 4 47 3 Σ d (-) + (-5) + (3-6.5) + 57 r t 6 57. 875 4( 4 ) 4. 875 6. 6. 875 p-value <.0005 <.05. There gfcat correlato.
Lear Regreo ad Correlato Eample: I a vetgato of the relato betwee the average aual temperature ad the mortalt de for a tpe of breat cacer wome certa rego of Europe, data were collected from 5 Europea coutre. 0 00 90 SPSS Output 80 Model Summar 70 Mortalt Ide 30 Average Temperature Model R R Square Adjuted R Square Std. Error of the Etmate.94 a.853.84 5.936 a. Predctor: (Cotat), Average Temperature Sample correlato coeffcet: Eample: I a vetgato, coutre were cluded to tud the relato betwee female lfe epectac ad the brthrate. 90 80 70 Female lfe epectac 99 0 0 0 30 Brth per 000 populato, 99 Model R Model Summar R Square Adjuted R Square Std. Error of the Etmate.870 a.756.754 5.6 a. Predctor: (Cotat), Brth per 000 populato, 99 Sample correlato coeffcet: