Chapter 9 Correlation and Regression

9.1 Correlation

Correlation

A correlation is a relationship between two variables. The data can be represented by the ordered pairs (x, y), where x is the independent (or explanatory) variable and y is the dependent (or response) variable.

A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables.

Example: [Scatter plot of the ordered pairs (x, y)]

Larson & Farber, Elementary Statistics: Picturing the World, 3e

Linear Correlation

[Figure: four scatter plots of y versus x]
Negative Linear Correlation: as x increases, y tends to decrease.
Positive Linear Correlation: as x increases, y tends to increase.
No Correlation
Nonlinear Correlation

Correlation Coefficient

The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The formula for r is

\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \]

The range of the correlation coefficient is −1 to 1. If x and y have a strong positive linear correlation, r is close to 1. If x and y have a strong negative linear correlation, r is close to −1. If there is no linear correlation or a weak linear correlation, r is close to 0.

Linear Correlation

[Figure: four scatter plots, each labeled with its r-value: strong negative correlation, strong positive correlation, weak positive correlation, and nonlinear correlation]

Calculating a Correlation Coefficient

In words, with the corresponding symbols:
1. Find the sum of the x-values: Σx
2. Find the sum of the y-values: Σy
3. Multiply each x-value by its corresponding y-value and find the sum: Σxy
4. Square each x-value and find the sum: Σx²
5. Square each y-value and find the sum: Σy²
6. Use these five sums to calculate the correlation coefficient:

\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \]

Correlation Coefficient

Example: Calculate the correlation coefficient r for the following data.

 x     y    xy    x²    y²
 1    −3    −3     1     9
 2    −1    −2     4     1
 3     0     0     9     0
 4     1     4    16     1
 5     2    10    25     4
Σx = 15   Σy = −1   Σxy = 9   Σx² = 55   Σy² = 15

\[ r = \frac{5(9) - (15)(-1)}{\sqrt{5(55) - 15^2}\,\sqrt{5(15) - (-1)^2}} = \frac{60}{\sqrt{50}\,\sqrt{74}} \approx 0.986 \]

There is a strong positive linear correlation between x and y.

Correlation Coefficient

Example: The following data represent the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.

Hours, x:        0   1   2   3   3   5   5   5   6   7   7  10
Test score, y:  96  85  82  74  95  68  76  84  58  65  75  50

Continued.
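
The six-step calculation of r can be sketched in Python. This is a minimal illustration rather than part of the slides: the function name `correlation_coefficient` is made up for this sketch, and the data are the five ordered pairs of the first worked example (x = 1 to 5, y = −3, −1, 0, 1, 2).

```python
import math

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, computed from the five sums."""
    n = len(xs)
    sum_x = sum(xs)                               # step 1
    sum_y = sum(ys)                               # step 2
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # step 3
    sum_x2 = sum(x * x for x in xs)               # step 4
    sum_y2 = sum(y * y for y in ys)               # step 5
    # step 6: combine the five sums
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
ys = [-3, -1, 0, 1, 2]
r = correlation_coefficient(xs, ys)
print(round(r, 3))  # roughly 0.986, a strong positive linear correlation
```

The function fails with a division by zero when all x-values or all y-values are identical, which matches the fact that r is undefined in that case.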

Correlation Coefficient

Example continued:

[Scatter plot: test score (y) versus hours watching TV (x)]

Correlation Coefficient

Example continued:

 x     y     xy     x²      y²
 0    96      0      0    9216
 1    85     85      1    7225
 2    82    164      4    6724
 3    74    222      9    5476
 3    95    285      9    9025
 5    68    340     25    4624
 5    76    380     25    5776
 5    84    420     25    7056
 6    58    348     36    3364
 7    65    455     49    4225
 7    75    525     49    5625
10    50    500    100    2500
Σx = 54   Σy = 908   Σxy = 3724   Σx² = 332   Σy² = 70836

\[ r = \frac{12(3724) - (54)(908)}{\sqrt{12(332) - 54^2}\,\sqrt{12(70836) - 908^2}} \approx -0.831 \]

There is a strong negative linear correlation. As the number of hours spent watching TV increases, the test scores tend to decrease.

Testing a Population Correlation Coefficient

Once the sample correlation coefficient r has been calculated, we need to determine whether there is enough evidence to decide that the population correlation coefficient ρ is significant at a specified level of significance. One way to determine this is to use Table 11 in Appendix B. If |r| is greater than the critical value, there is enough evidence to decide that the correlation coefficient ρ is significant.

 n    α = 0.05   α = 0.01
 4     0.950      0.990
 5     0.878      0.959
 6     0.811      0.917
 7     0.754      0.875

For a sample of size n = 6, ρ is significant at the 5% significance level if |r| > 0.811.

Testing a Population Correlation Coefficient

Finding the Correlation Coefficient ρ

In words, with the corresponding symbols:
1. Determine the number of pairs of data in the sample: determine n.
2. Specify the level of significance: identify α.
3. Find the critical value: use Table 11 in Appendix B.
4. Decide if the correlation is significant: if |r| > critical value, the correlation is significant. Otherwise, there is not enough evidence to support that the correlation is significant.
5. Interpret the decision in the context of the original claim.

Testing a Population Correlation Coefficient

Example: The following data represent the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday. The correlation coefficient is r ≈ −0.831.

Hours, x:        0   1   2   3   3   5   5   5   6   7   7  10
Test score, y:  96  85  82  74  95  68  76  84  58  65  75  50

Is the correlation coefficient significant at α = 0.01?

Example continued: r ≈ −0.831, n = 12, α = 0.01

Appendix B: Table 11 (excerpt)

 n    α = 0.05   α = 0.01
 4     0.950      0.990
 5     0.878      0.959
 6     0.811      0.917
 ...
12     0.576      0.708
13     0.553      0.684

|r| ≈ 0.831 > 0.708

Because |r| exceeds the critical value, the population correlation is significant. There is enough evidence at the 1% level of significance to conclude that there is a significant linear correlation between the number of hours of television watched during the weekend and the scores of each student who took a test the following Monday.
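
The table-lookup test can be sketched as follows. This is an illustration under stated assumptions: the dictionary `CRITICAL_R` holds only a handful of rows of the critical-value table (n = 4 to 7, 12, and 13), and the names `sample_r` and `is_significant` are invented for this sketch. The data are the TV-hours and test-score pairs of the example.

```python
import math

hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

# Critical values of |r| for a few sample sizes (excerpt only).
CRITICAL_R = {
    4: {0.05: 0.950, 0.01: 0.990},
    5: {0.05: 0.878, 0.01: 0.959},
    6: {0.05: 0.811, 0.01: 0.917},
    7: {0.05: 0.754, 0.01: 0.875},
    12: {0.05: 0.576, 0.01: 0.708},
    13: {0.05: 0.553, 0.01: 0.684},
}

def sample_r(xs, ys):
    """Sample correlation coefficient from the five sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2)
    )

def is_significant(r, n, alpha):
    """True when |r| exceeds the tabulated critical value for n and alpha."""
    return abs(r) > CRITICAL_R[n][alpha]

r = sample_r(hours, scores)
significant = is_significant(r, len(hours), 0.01)
print(round(r, 3), significant)
```

The comparison uses |r|, so a strongly negative r such as this one is judged by its magnitude, not its sign.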

Hypothesis Testing for ρ

A hypothesis test can also be used to determine whether the sample correlation coefficient r provides enough evidence to conclude that the population correlation coefficient ρ is significant at a specified level of significance. A hypothesis test can be one-tailed or two-tailed.

Left-tailed test:
  H0: ρ ≥ 0 (no significant negative correlation)
  Ha: ρ < 0 (significant negative correlation)
Right-tailed test:
  H0: ρ ≤ 0 (no significant positive correlation)
  Ha: ρ > 0 (significant positive correlation)
Two-tailed test:
  H0: ρ = 0 (no significant correlation)
  Ha: ρ ≠ 0 (significant correlation)

Hypothesis Testing for ρ

The t-Test for the Correlation Coefficient

A t-test can be used to test whether the correlation between two variables is significant. The test statistic is r, and the standardized test statistic

\[ t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]

follows a t-distribution with n − 2 degrees of freedom.

In this text, only two-tailed hypothesis tests for ρ are considered.

Hypothesis Testing for ρ

Using the t-Test for the Correlation Coefficient ρ

In words, with the corresponding symbols:
1. State the null and alternative hypotheses: state H0 and Ha.
2. Specify the level of significance: identify α.
3. Identify the degrees of freedom: d.f. = n − 2.
4. Determine the critical value(s) and rejection region(s): use Table 5 in Appendix B.

Hypothesis Testing for ρ

Using the t-Test for the Correlation Coefficient ρ (continued)

5. Find the standardized test statistic:

\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]

6. Make a decision to reject or fail to reject the null hypothesis: if t is in the rejection region, reject H0. Otherwise, fail to reject H0.
7. Interpret the decision in the context of the original claim.

Hypothesis Testing for ρ

Example: The following data represent the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday. The correlation coefficient is r ≈ −0.831.

Hours, x:        0   1   2   3   3   5   5   5   6   7   7  10
Test score, y:  96  85  82  74  95  68  76  84  58  65  75  50

Test the significance of this correlation coefficient at α = 0.01.

Example continued:

H0: ρ = 0 (no correlation)
Ha: ρ ≠ 0 (significant correlation)

The level of significance is α = 0.01.
Degrees of freedom are d.f. = 12 − 2 = 10.
The critical values are −t0 = −3.169 and t0 = 3.169.
The standardized test statistic is

\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-0.831}{\sqrt{\dfrac{1 - (-0.831)^2}{12 - 2}}} \approx -4.72 \]

[Figure: t-distribution with rejection regions t < −3.169 and t > 3.169]

The test statistic falls in the rejection region, so H0 is rejected. At the 1% level of significance, there is enough evidence to conclude that there is a significant linear correlation between the number of hours of TV watched over the weekend and the test scores on Monday morning.
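
The standardized test statistic and the two-tailed decision rule can be checked with a few lines of Python. This is a sketch, not the text's procedure: the function name `t_statistic` is invented here, and the critical value 3.169 (α = 0.01, d.f. = 10) is typed in by hand rather than looked up from a table.

```python
import math

def t_statistic(r, n):
    """Standardized test statistic for the correlation coefficient."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = -0.831, 12          # values from the TV-hours example
t = t_statistic(r, n)

t_critical = 3.169         # two-tailed critical value, entered by hand
reject_h0 = abs(t) > t_critical   # rejection regions: t < -t0 or t > t0

print(round(t, 2), reject_h0)
```

Comparing |t| with the positive critical value is equivalent to checking both tails of the rejection region at once.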

Correlation and Causation

The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables. If there is a significant correlation between two variables, you should consider the following possibilities.
1. Is there a direct cause-and-effect relationship between the variables? Does x cause y?
2. Is there a reverse cause-and-effect relationship between the variables? Does y cause x?
3. Is it possible that the relationship between the variables can be caused by a third variable or by a combination of several other variables?
4. Is it possible that the relationship between two variables may be a coincidence?

9.2 Linear Regression

Residuals

After verifying that the linear correlation between two variables is significant, next we determine the equation of the line that can be used to predict the value of y for a given value of x.

[Figure: data points and a fitted line, with vertical distances d1, d2, d3 between each observed y-value and the predicted y-value on the line]

For a given x-value, d = (observed y-value) − (predicted y-value).

For each data point, d_i represents the difference between the observed y-value and the predicted y-value for a given x-value on the line. These differences are called residuals.

Regression Line

A regression line, also called a line of best fit, is the line for which the sum of the squares of the residuals is a minimum.

The Equation of a Regression Line

The equation of a regression line for an independent variable x and a dependent variable y is

ŷ = mx + b

where ŷ is the predicted y-value for a given x-value. The slope m and y-intercept b are given by

\[ m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} \quad\text{and}\quad b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n} \]

where ȳ is the mean of the y-values and x̄ is the mean of the x-values. The regression line always passes through (x̄, ȳ).

Regression Line

Example: Find the equation of the regression line.

 x     y    xy    x²    y²
 1    −3    −3     1     9
 2    −1    −2     4     1
 3     0     0     9     0
 4     1     4    16     1
 5     2    10    25     4
Σx = 15   Σy = −1   Σxy = 9   Σx² = 55   Σy² = 15

\[ m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{5(9) - (15)(-1)}{5(55) - 15^2} = \frac{60}{50} = 1.2 \]

Continued.

Regression Line

Example continued:

\[ b = \bar{y} - m\bar{x} = \frac{-1}{5} - (1.2)\,\frac{15}{5} = -3.8 \]

The equation of the regression line is ŷ = 1.2x − 3.8.

[Graph: the data points and the regression line, with the point (x̄, ȳ) = (3, −1/5) marked]
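
The slope and intercept formulas, and the claim that the regression line passes through (x̄, ȳ), can be verified numerically. The sketch below is illustrative only: `regression_line` is a name chosen for this example, and the data are the five ordered pairs from the worked example (x = 1 to 5, y = −3, −1, 0, 1, 2).

```python
def regression_line(xs, ys):
    """Return slope m and intercept b of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    return m, b

xs = [1, 2, 3, 4, 5]
ys = [-3, -1, 0, 1, 2]
m, b = regression_line(xs, ys)

# The fitted line evaluated at the mean of x should give the mean of y.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
passes_through_means = abs((m * x_bar + b) - y_bar) < 1e-9

print(round(m, 2), round(b, 2), passes_through_means)
```

The tolerance in the last check allows for floating-point rounding; algebraically the equality is exact because b is defined as ȳ − m x̄.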

Regression Line

Example: The following data represent the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score for a student who watches 9 hours of TV.

Hours, x:        0   1   2   3   3   5   5   5   6   7   7  10
Test score, y:  96  85  82  74  95  68  76  84  58  65  75  50

Σx = 54   Σy = 908   Σxy = 3724   Σx² = 332   Σy² = 70836

Regression Line

Example continued:

\[ m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{12(3724) - (54)(908)}{12(332) - 54^2} \approx -4.067 \]

\[ b = \bar{y} - m\bar{x} = \frac{908}{12} - (-4.067)\,\frac{54}{12} \approx 93.97 \]

ŷ = −4.07x + 93.97

[Graph: test score versus hours watching TV with the regression line, and the point (x̄, ȳ) = (54/12, 908/12) ≈ (4.5, 75.7) marked]

Continued.

Regression Line

Example continued:

Using the equation ŷ = −4.07x + 93.97, we can predict the test score for a student who watches 9 hours of TV.

ŷ = −4.07x + 93.97 = −4.07(9) + 93.97 = 57.34

A student who watches 9 hours of TV over the weekend can expect to receive about a 57.34 on Monday's test.
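
The same calculation for the TV-hours data can be reproduced in Python. This is a sketch with assumed names (`regression_line`, `predicted_score`); the coefficients are rounded to two decimal places before predicting, as in the hand calculation, so the prediction made with unrounded coefficients would differ slightly in the second decimal.

```python
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

def regression_line(xs, ys):
    """Return slope m and intercept b of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    return m, b

m, b = regression_line(hours, scores)
m_rounded, b_rounded = round(m, 2), round(b, 2)

# Predicted test score for a student who watches 9 hours of TV,
# using the rounded coefficients.
predicted_score = m_rounded * 9 + b_rounded

print(m_rounded, b_rounded, round(predicted_score, 2))
```

Nine hours lies inside the observed range of 0 to 10 hours, so the prediction is an interpolation rather than an extrapolation.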

9.3 Measures of Regression and Prediction Intervals

Variation About a Regression Line

To find the total variation, you must first calculate the total deviation, the explained deviation, and the unexplained deviation.

Total deviation = y_i − ȳ
Explained deviation = ŷ_i − ȳ
Unexplained deviation = y_i − ŷ_i

[Figure: a data point (x_i, y_i), the point (x_i, ŷ_i) on the regression line, and the point (x̄, ȳ), showing the total deviation split into the explained and unexplained deviations]

Variation About a Regression Line

The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y.

\[ \text{Total variation} = \sum \left(y_i - \bar{y}\right)^2 \]

The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y.

\[ \text{Explained variation} = \sum \left(\hat{y}_i - \bar{y}\right)^2 \]

The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and each corresponding predicted y-value.

\[ \text{Unexplained variation} = \sum \left(y_i - \hat{y}_i\right)^2 \]

Total variation = Explained variation + Unexplained variation
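
The identity Total variation = Explained variation + Unexplained variation can be checked on a small data set. The sketch below assumes the five ordered pairs used in the earlier worked example (x = 1 to 5, y = −3, −1, 0, 1, 2) together with their regression line ŷ = 1.2x − 3.8; the variable names are chosen for this illustration.

```python
xs = [1, 2, 3, 4, 5]
ys = [-3, -1, 0, 1, 2]
m, b = 1.2, -3.8                      # least-squares line for these data

y_bar = sum(ys) / len(ys)
predicted = [m * x + b for x in xs]   # the y-hat values

total = sum((y - y_bar) ** 2 for y in ys)
explained = sum((p - y_bar) ** 2 for p in predicted)
unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))

print(round(total, 2), round(explained, 2), round(unexplained, 2))
# The three values should satisfy total = explained + unexplained.
```

The identity relies on the line being the least-squares regression line; for an arbitrary line through the data the two parts generally do not add up to the total.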

Coefficient of Determination

The coefficient of determination r² is the ratio of the explained variation to the total variation. That is,

\[ r^2 = \frac{\text{Explained variation}}{\text{Total variation}} \]

Example: The correlation coefficient for the data that represent the number of hours students watched television and the test scores of each student is r ≈ −0.831. Find the coefficient of determination.

r² ≈ (−0.831)² ≈ 0.691

About 69.1% of the variation in the test scores can be explained by the variation in the hours of TV watched. About 30.9% of the variation is unexplained.

The Standard Error of Estimate

When a ŷ-value is predicted from an x-value, the prediction is a point estimate. An interval can also be constructed.

The standard error of estimate s_e is the standard deviation of the observed y_i-values about the predicted ŷ-value for a given x_i-value. It is given by

\[ s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}} \]

where n is the number of ordered pairs in the data set. The closer the observed y-values are to the predicted y-values, the smaller the standard error of estimate will be.

The Standard Error of Estimate

Finding the Standard Error of Estimate

In words, with the corresponding symbols:
1. Make a table that includes the column headings shown: x_i, y_i, ŷ_i, (y_i − ŷ_i), (y_i − ŷ_i)²
2. Use the regression equation to calculate the predicted y-values: ŷ_i = m x_i + b
3. Calculate the sum of the squares of the differences between each observed y-value and the corresponding predicted y-value: Σ(y_i − ŷ_i)²
4. Find the standard error of estimate:

\[ s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}} \]

The Standard Error of Estimate

Example: The regression equation for the following data is ŷ = 1.2x − 3.8. Find the standard error of estimate.

x_i   y_i    ŷ_i    (y_i − ŷ_i)²
 1    −3    −2.6       0.16
 2    −1    −1.4       0.16
 3     0    −0.2       0.04
 4     1     1.0       0
 5     2     2.2       0.04
                    Σ = 0.4  (unexplained variation)

\[ s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}} = \sqrt{\frac{0.4}{5 - 2}} \approx 0.365 \]

The standard deviation of the observed y-values about the predicted ŷ-value for a given x-value is about 0.365.

The Standard Error of Estimate

Example: The regression equation for the data that represent the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday is ŷ = −4.07x + 93.97. Find the standard error of estimate.

Hours, x_i   Test score, y_i    ŷ_i     (y_i − ŷ_i)²
    0             96           93.97        4.12
    1             85           89.90       24.01
    2             82           85.83       14.67
    3             74           81.76       60.22
    3             95           81.76      175.30
    5             68           73.62       31.58
    5             76           73.62        5.66
    5             84           73.62      107.74
    6             58           69.55      133.40
    7             65           65.48        0.23
    7             75           65.48       90.63
   10             50           53.27       10.69

Continued.

The Standard Error of Estimate

Example continued:

Σ(y_i − ŷ_i)² = 658.25  (unexplained variation)

\[ s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}} = \sqrt{\frac{658.25}{12 - 2}} \approx 8.11 \]

The standard deviation of the student test scores for a specific number of hours of TV watched is about 8.11.
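
Both standard-error examples can be reproduced with one short function. This is a minimal sketch: `standard_error` is a name chosen here, and the regression coefficients are passed in already rounded (1.2 and −3.8 for the small data set, −4.07 and 93.97 for the TV-hours data), mirroring the hand calculations.

```python
import math

def standard_error(xs, ys, m, b):
    """Standard error of estimate about the line y-hat = m*x + b."""
    n = len(xs)
    # Unexplained variation: sum of squared residuals.
    unexplained = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(unexplained / (n - 2))

# Small data set with regression line y-hat = 1.2x - 3.8
se_small = standard_error([1, 2, 3, 4, 5], [-3, -1, 0, 1, 2], 1.2, -3.8)

# TV-hours data with regression line y-hat = -4.07x + 93.97
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
se_tv = standard_error(hours, scores, -4.07, 93.97)

print(round(se_small, 3), round(se_tv, 2))
```

The divisor is n − 2 rather than n because two quantities, the slope and the intercept, are estimated from the data.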

Prediction Intervals

Two variables have a bivariate normal distribution if for any fixed value of x, the corresponding values of y are normally distributed, and for any fixed value of y, the corresponding x-values are normally distributed.

A prediction interval can be constructed for the true value of y. Given a linear regression equation ŷ = mx + b and x0, a specific value of x, a c-prediction interval for y is

ŷ − E < y < ŷ + E

where

\[ E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n\left(x_0 - \bar{x}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} \]

The point estimate is ŷ and the margin of error is E. The probability that the prediction interval contains y is c.

Prediction Intervals

Construct a Prediction Interval for y for a Specific Value of x

In words, with the corresponding symbols:
1. Identify the number of ordered pairs in the data set n and the degrees of freedom: d.f. = n − 2
2. Use the regression equation and the given x-value to find the point estimate ŷ: ŷ = m x_i + b
3. Find the critical value t_c that corresponds to the given level of confidence c: use Table 5 in Appendix B.
4. Find the standard error of estimate s_e:

\[ s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}} \]

5. Find the margin of error E:

\[ E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n\left(x_0 - \bar{x}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} \]

6. Find the left and right endpoints and form the prediction interval.
   Left endpoint: ŷ − E
   Right endpoint: ŷ + E
   Interval: ŷ − E < y < ŷ + E
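
The six construction steps can be sketched in Python. This is an illustration under stated assumptions: the function name `prediction_interval` is invented here, the data are the TV-hours and test-score pairs used throughout the chapter, x0 = 4 hours, and the critical value t_c = 2.228 (95% confidence, 10 degrees of freedom) is entered by hand instead of being looked up. The code uses unrounded regression coefficients, so its results can differ in the second decimal place from a hand calculation with rounded values.

```python
import math

hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

def prediction_interval(xs, ys, x0, t_c):
    """Return (point estimate, margin of error) for y at x = x0."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n

    # Regression line y-hat = m*x + b
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * x_bar
    y_hat = m * x0 + b

    # Standard error of estimate, with n - 2 degrees of freedom
    unexplained = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(unexplained / (n - 2))

    # Margin of error E
    margin = t_c * s_e * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_hat, margin

y_hat, margin = prediction_interval(hours, scores, x0=4, t_c=2.228)
lower, upper = y_hat - margin, y_hat + margin
print(round(y_hat, 2), round(margin, 2), round(lower, 2), round(upper, 2))
```

Because of the leading 1 under the square root, the margin of error is always larger than t_c times s_e, and it grows as x0 moves away from the mean of the x-values.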

Prediction Intervals

Example: The following data represent the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.

Hours, x:        0   1   2   3   3   5   5   5   6   7   7  10
Test score, y:  96  85  82  74  95  68  76  84  58  65  75  50

ŷ = −4.07x + 93.97, s_e ≈ 8.11

Construct a 95% prediction interval for the test scores when 4 hours of TV are watched.

Continued.

Prediction Intervals

Example continued: Construct a 95% prediction interval for the test scores when the number of hours of TV watched is 4.

There are n − 2 = 12 − 2 = 10 degrees of freedom. The point estimate is

ŷ = −4.07x + 93.97 = −4.07(4) + 93.97 = 77.69.

The critical value is t_c = 2.228, and s_e ≈ 8.11. With Σx = 54, Σx² = 332, and x̄ = 54/12 = 4.5, the margin of error is

\[ E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n\left(x_0 - \bar{x}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} = 2.228(8.11)\sqrt{1 + \frac{1}{12} + \frac{12(4 - 4.5)^2}{12(332) - 54^2}} \approx 18.83 \]

ŷ − E < y < ŷ + E
77.69 − 18.83 = 58.86     77.69 + 18.83 = 96.52

You can be 95% confident that when a student watches 4 hours of TV over the weekend, the student's test grade will be between 58.86 and 96.52.

9.4 Multiple Regression

Multiple Regression Equation

In many instances, a better prediction can be found for a dependent (response) variable by using more than one independent (explanatory) variable.

For example, a more accurate prediction of Monday's test grade from the previous section might be made by considering the number of other classes a student is taking as well as the student's previous knowledge of the test material.

A multiple regression equation has the form

\[ \hat{y} = b + m_1 x_1 + m_2 x_2 + m_3 x_3 + \cdots + m_k x_k \]

where x1, x2, x3, …, xk are independent variables, b is the y-intercept, and y is the dependent variable.

* Because the mathematics associated with this concept is complicated, technology is generally used to calculate the multiple regression equation.

Predicting y-Values

After finding the equation of the multiple regression line, you can use the equation to predict y-values over the range of the data.

Example: The following multiple regression equation can be used to predict the annual U.S. rice yield (in pounds).

ŷ = 859 + 5.76x1 + 3.82x2

where x1 is the number of acres planted (in thousands), and x2 is the number of acres harvested (in thousands). (Source: U.S. National Agricultural Statistics Service)

a.) Predict the annual rice yield when x1 = 2758 and x2 = 2714.
b.) Predict the annual rice yield when x1 = 3581 and x2 = 3021.

Continued.

Predicting y-Values

Example continued:

a.) ŷ = 859 + 5.76x1 + 3.82x2 = 859 + 5.76(2758) + 3.82(2714) = 27,112.56
The predicted annual rice yield is 27,112.56 pounds.

b.) ŷ = 859 + 5.76x1 + 3.82x2 = 859 + 5.76(3581) + 3.82(3021) = 33,025.78
The predicted annual rice yield is 33,025.78 pounds.
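
Evaluating a multiple regression equation is a matter of substituting the values of the independent variables. The short sketch below does this for the rice-yield equation; the function name `predict_rice_yield` and its parameter names are chosen for this illustration only.

```python
def predict_rice_yield(acres_planted, acres_harvested):
    """Evaluate y-hat = 859 + 5.76*x1 + 3.82*x2.

    Both arguments are in thousands of acres, as in the example.
    """
    return 859 + 5.76 * acres_planted + 3.82 * acres_harvested

yield_a = predict_rice_yield(2758, 2714)   # part a
yield_b = predict_rice_yield(3581, 3021)   # part b

print(round(yield_a, 2), round(yield_b, 2))
```

As with simple regression, predictions of this kind are meaningful only for values of x1 and x2 within the range of the data used to fit the equation.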