/3/08
Systems & Biomedical Engineering Department
SBE 304: Bio-Statistics
Simple Linear Regression and Correlation
Dr. Ayman Eldeib
Fall 2017

Statistics
Descriptive: organising, summarising & describing data
Correlational: relationships
Inferential: generalising, significance
Purpose of Correlation
The purpose of correlation is to help address the question: What is the relationship, degree of association, or amount of shared variance between two (or more) variables?
Other ways of putting the correlational question include:
Are two variables dependent on or independent of one another?
Can one variable be predicted from another?

Correlation: A Scatter Graph
Presenting the data of two variables as points in a 2-D space makes up one of the most powerful and useful areas of data analysis.
Two Main Questions: Linear Relationship
1. How likely is the relationship between two variables to be linear?
   a. Use correlation.
   b. This involves calculating a correlation coefficient (r).
2. What is the best description of the linear relationship between two variables?
   a. Use regression.
   b. This involves calculating the equation of the best-fit line. Linear regression tries to explain the relationship as an equation of the form y = A + B*x.

Correlation
Covariance:
$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Pearson's correlation coefficient is the standardized covariance (unitless):
$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}}$$
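The covariance and Pearson definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names and the five data points are assumptions chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)

def variance(xs):
    """Sample variance is the covariance of a variable with itself."""
    return covariance(xs, xs)

def pearson_r(xs, ys):
    """Standardized covariance: unitless and always within [-1, 1]."""
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))

# Illustrative values only (assumed, not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(covariance(xs, ys))           # 1.5
print(round(pearson_r(xs, ys), 4))  # 0.7746
```

Rescaling either variable (for example, changing its units) changes the covariance but leaves r unchanged, which is what "unitless" means here.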
Correlation
A ratio that indicates the amount of variation shared between two sets of data (two variables).
Its range is between -1.0 and +1.0: never > 1 or < -1.
The value tells you how closely the data approximate a straight line.
This ratio is expressed as a correlation coefficient (r):
[Scale: -1 (perfect negative, inverse relationship) | strong | -0.7 | moderate | -0.3 | weak | -0.1 | 0 (no relationship) | +0.1 | weak | +0.3 | moderate | +0.7 | strong | +1 (perfect positive relationship)]

Correlation
[Figure: scatter plots for r = -1, r = -0.6, r = 0, r = +0.3, r = +1, r = 0]
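As a rough sketch of how the strength scale might be applied, the helper below maps a value of r to a verbal label. The function name is hypothetical, and the cut-offs 0.1, 0.3 and 0.7 are an assumption read off the scale; other texts use different conventions.

```python
def describe_correlation(r):
    """Label r using assumed cut-offs of 0.1, 0.3 and 0.7 on |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.1:
        return "no relationship"
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (-1, -0.6, 0, 0.3, 1):
    print(value, "->", describe_correlation(value))
```

These labels describe only how close the points lie to a straight line; a curved relationship can still give r near 0.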
Correlation
[Figure: scatter plots contrasting linear relationships with curvilinear relationships]

Correlation
[Figure: scatter plots contrasting strong relationships with weak relationships]
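[Figure: scatter plot showing no relationship]

Correlation
Calculating by hand
$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}} = \frac{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \cdot \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}}$$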
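Correlation
Calculating by hand
The factors of n - 1 cancel, leaving:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}}$$
$SS_{xy}$: numerator of the covariance
$SS_x$, $SS_y$: numerators of the variances

Linear Regression
In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (predictor) variable (X) and the other the dependent (outcome) variable (Y).
This allows us to predict values; predicting outside the range of the observed data is known as extrapolation.
Linear regression helps us to determine the rate of change (slope) and to predict the expected value of Y for a given value of X.
This is exactly equivalent to drawing a line on the graph.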
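The by-hand route to r through sums of squares can be sketched as follows. The function names and data are illustrative assumptions; the point is only that the three sums are all that is needed.

```python
from math import sqrt

def sums_of_squares(xs, ys):
    """Return (SS_xy, SS_x, SS_y): the numerators of the covariance
    and of the two variances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy, ss_x, ss_y

def r_from_sums(xs, ys):
    """r = SS_xy / sqrt(SS_x * SS_y); the n - 1 factors have cancelled."""
    ss_xy, ss_x, ss_y = sums_of_squares(xs, ys)
    return ss_xy / sqrt(ss_x * ss_y)

# Illustrative values only (assumed, not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(sums_of_squares(xs, ys))        # (6.0, 10.0, 6.0)
print(round(r_from_sums(xs, ys), 4))  # 0.7746
```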
Linear Regression
Fit a line.
Is this one good enough? Or is this one better? Or this?
Which line has the best fit to the data?

Linear Regression
Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).
Dependent variable: denoted Y
Independent variables: denoted $X_1, X_2, \ldots, X_k$
If we only have ONE independent variable, the model is
$$y = \beta_0 + \beta_1 x + \varepsilon$$
which is referred to as simple linear regression. We would be interested in estimating $\beta_0$ and $\beta_1$ from the data we collect.
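Linear Regression
Variables:
X = independent variable (we provide this)
Y = dependent variable (we observe this)
Parameters:
$\beta_0$ = Y-intercept
$\beta_1$ = slope
$\varepsilon$ ~ normal random variable ($\mu_\varepsilon = 0$, $\sigma_\varepsilon = ???$) [noise/error]

Linear Regression
Meaning of $\beta_0$ and $\beta_1$
$\beta_1 > 0$ [positive slope]
$\beta_1 < 0$ [negative slope]
[Figure: a straight line with its slope and y-intercept labeled]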
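One way to get a feel for the roles of the parameters is to generate data from the model y = β0 + β1·x + ε. The sketch below is an illustration under assumed values: the function name `simulate` and the numbers chosen for the intercept, slope and noise standard deviation are hypothetical.

```python
import random

def simulate(xs, beta0, beta1, sigma, seed=0):
    """Draw y = beta0 + beta1 * x + eps with eps ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [0, 1, 2, 3, 4, 5]  # values we provide (independent variable)

# Assumed parameters: intercept 1.0, slope 2.0, noise s.d. 0.5.
noisy = simulate(xs, beta0=1.0, beta1=2.0, sigma=0.5)

# With sigma = 0 the points fall exactly on the line.
exact = simulate(xs, beta0=1.0, beta1=2.0, sigma=0.0)

for x, y_noisy, y_exact in zip(xs, noisy, exact):
    print(x, round(y_noisy, 2), y_exact)
```

In practice σ_ε is unknown (the "???" above) and has to be estimated from the scatter of the observations around the fitted line.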
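Linear Regression
Estimating the Coefficients
In much the same way we base estimates of $\mu$ on $\bar{x}$, we estimate $\beta_0$ with $b_0$ and $\beta_1$ with $b_1$, the y-intercept and slope (respectively) of the least squares or regression line given by:
$$\hat{y} = b_0 + b_1 x$$
This is an application of the least squares method, and it produces a straight line that minimizes the sum of the squared differences between the points and the line.
[Figure: the least squares line, the best-fit line]

Linear Regression (Cont.)
The coefficients $b_1$ and $b_0$ for the least squares line are calculated as:
$$b_1 = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$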
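A minimal sketch of the least squares calculation, using the standard closed-form result (slope = sum of cross-deviations over sum of squared x-deviations, intercept = mean of y minus slope times mean of x). The function names and data are illustrative assumptions.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_x          # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through the means
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative values only (assumed, not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")  # y_hat = 2.200 + 0.600 x

# Any other line gives a larger sum of squared differences.
print(sse(xs, ys, b0, b1) < sse(xs, ys, b0 + 0.1, b1))  # True
print(sse(xs, ys, b0, b1) < sse(xs, ys, b0, b1 - 0.1))  # True
```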
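Linear Regression
Example
Statistics: data points -> information

Data points:
  x    y
  1    6
  2    1
  3    9
  4    5
  5   17
  6   12

Least squares line: $\hat{y} = 0.934 + 2.114x$

X-Y choice really matters
[Figure: the same data plotted as Y vs X and as X vs Y, R = -0.9]

Linear Regression
Given 2 variables, you can plot 2 equivalent graphs: X against Y, or Y against X. In fact these are not quite the same!
Swapping the axes around has no effect on the correlation, BUT it does deeply affect the relationship inferred, that is, the actual best-fit line.
[Figure: R = -0.9; Y vs X IS NOT THE SAME AS X vs Y]
The Y-on-X line is NOT just the X-on-Y line transposed.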
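The asymmetry between the two regressions can be checked numerically. The sketch below fits Y on X and X on Y to six illustrative points (treated here as assumed data) and compares the slopes; the helper name `fit` is hypothetical.

```python
from math import sqrt

def fit(us, vs):
    """Least squares line predicting v from u: returns (intercept, slope)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    ss_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    ss_u = sum((u - u_bar) ** 2 for u in us)
    slope = ss_uv / ss_u
    return v_bar - slope * u_bar, slope

# Six illustrative points (assumed data).
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 1, 9, 5, 17, 12]

a_yx, b_yx = fit(xs, ys)  # Y on X: minimizes vertical distances
a_xy, b_xy = fit(ys, xs)  # X on Y: minimizes horizontal distances

print(f"Y on X: y = {a_yx:.3f} + {b_yx:.3f} x")  # y = 0.933 + 2.114 x
print(f"X on Y: x = {a_xy:.3f} + {b_xy:.3f} y")  # x = 1.565 + 0.232 y

# Drawn on the same axes, the X-on-Y line has slope 1 / b_xy, not b_yx.
print(f"{1 / b_xy:.3f}")  # 4.306

# The correlation is the same whichever way round the axes go:
# the product of the two slopes equals r squared.
r = sqrt(b_yx * b_xy)
print(f"{r:.3f}")  # 0.701
```

The two lines coincide only when the points lie exactly on a straight line (r = ±1).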
Linear Regression
Variation
Total: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Random/unexplained: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Due to regression: $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$SST = SSR + SSE$

Hypothesis Testing
Almost all correlations are not 0, therefore the question is: What is the likelihood that a relationship between variables is a true relationship, or could it simply be a result of random sampling variability or chance?
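The split of the total variation into a part due to regression and an unexplained part can be verified numerically. This is a small sketch on assumed data; the function name `variation` is hypothetical, and the line is fitted by least squares, which is the condition under which the identity holds.

```python
def variation(xs, ys):
    """Return (SST, SSR, SSE) for the least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)               # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # due to regression
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # random/unexplained
    return sst, ssr, sse

# Illustrative values only (assumed, not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sst, ssr, sse = variation(xs, ys)
print(round(sst, 3), round(ssr, 3), round(sse, 3))  # 6.0 3.6 2.4
print(abs(sst - (ssr + sse)) < 1e-9)                # True
```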
Hypothesis Testing
Example: Significance of correlation
$H_0$: r = 0, assuming that there is no true relationship.
$H_1$: r ≠ 0, assuming that the relationship is real.
Initially assume $H_0$ is true, and evaluate whether the data support $H_1$.
The test statistic
$$t = r \sqrt{\frac{n - 2}{1 - r^2}}$$
has a Student t-distribution with n - 2 degrees of freedom.
The rejection region depends on whether we are doing a one- or two-tail test (a two-tail test is most typical).
Why n - 2? Because with 2 data points our line is certain to be a perfect fit.

Thank You
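A minimal sketch of the significance test for a correlation, using the standard statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sample values r = 0.70 and n = 6 are assumptions for illustration, and the critical value 2.776 is the usual two-tail 5% entry for 4 degrees of freedom from a t table.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for testing H0: no linear relationship (n - 2 d.f.)."""
    if n <= 2:
        # With 2 points the line always fits perfectly: 0 degrees of freedom.
        raise ValueError("need more than 2 data points")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * sqrt((n - 2) / (1 - r * r))

# Assumed sample: r = 0.70 from n = 6 observations.
r, n = 0.70, 6
t = correlation_t(r, n)

# Two-tail critical value at the 5% level for n - 2 = 4 d.f. (t table).
T_CRITICAL = 2.776

print(round(t, 3))          # 1.96
print(abs(t) > T_CRITICAL)  # False: do not reject H0
```

With so few points, even a fairly large sample correlation is compatible with chance; the same r from a larger sample would give a larger t.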