Linear Correlation

Many research issues are pursued with nonexperimental studies that seek to establish relationships among 2 or more variables.
- E.g., correlates of intelligence; the relation between SAT and GPA; the relation between working memory size and speed of problem solving
- This type of research cannot establish pathways of cause and effect
- Relationships among variables can take on many forms
- Correlation is also used to establish test reliability

Scatter Plots

[Figure: four scatter plots titled Negative Linear Relation, Perfect Positive Linear Relation, Positive Linear Relation, and Positive Non-linear Relation]
[Figure: Negative Linear Relation and Positive Linear Relation scatter plots, each with its best-fitting line]
- Note the non-systematic deviation of points around the best-fitting line. This suggests a linear relationship between X and Y: $Y = a + bX$

[Figure: Positive Non-linear Relation scatter plots, fitted once with $Y = a + bX^c$ (good fit) and once with $Y = a + bX$ (poor fit)]
- Note the systematic deviation of points around the best-fitting line. This suggests a non-linear relationship
- Many other non-linear relationships between X and Y are also possible, e.g., $Y = a + b\log X$, etc.
We will only consider linear relationships. There are two related descriptive issues:

- How strong is the relationship? (Chapter 6)
  - What is the linear correlation between the variables?
  - The measure we focus on is called the linear correlation coefficient, or the Pearson product-moment correlation
  - We will also consider a version for ordinal data
  - There are many other coefficients of relationship as well
- What is the relationship? (Chapter 7)
  - I.e., what equation can one use to predict Y from X (or vice versa)?
  - Note that different equations are needed to predict in each direction

Another way to phrase the strength-of-relationship question is: how well does the standard score (z-score) of one variable predict the standard score (z-score) of the other?
- I.e., does knowing how many s.d.s above or below the mean one variable is tell one how many s.d.s above or below the mean the other variable is?
- This phrasing makes it meaningful to relate measures on different scales (e.g., height and weight) or of different values on the same scale (e.g., heights of children and parents)
- We want a scale that runs from +1 to -1: perfect positive to perfect negative linear relation
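To make the point about scales concrete, here is a minimal Python sketch showing that z-scores do not depend on the unit of measurement. The height values are hypothetical and chosen only for illustration; `z_scores` is a helper name introduced here, and the standard deviation is assumed to use the N - 1 denominator.

```python
from statistics import mean, stdev  # stdev divides by N - 1


def z_scores(values):
    """Convert raw scores to standard scores: (score - mean) / s.d."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical heights, first in centimetres, then converted to inches.
heights_cm = [150.0, 160.0, 165.0, 172.0, 183.0]
heights_in = [h / 2.54 for h in heights_cm]

z_cm = z_scores(heights_cm)
z_in = z_scores(heights_in)

# The two sets of z-scores agree up to floating-point rounding.
same = all(abs(a - b) < 1e-9 for a, b in zip(z_cm, z_in))
print(same)  # True
```

Because the z-scores are identical in either unit, a measure of relationship built from z-scores is unaffected by the choice of scale.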
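To illustrate, consider the following data:

| X | Y | $z_X$ | $z_Y$ | $z_X z_Y$ |
|---|---|---|---|---|
| 1 | 2 | -1.49 | -1.49 | 2.21 |
| 2 | 4 | -1.16 | -1.16 | 1.34 |
| 3 | 6 | -.83 | -.83 | .68 |
| 4 | 8 | -.50 | -.50 | .25 |
| 5 | 10 | -.17 | -.17 | .03 |
| 6 | 12 | .17 | .17 | .03 |
| 7 | 14 | .50 | .50 | .25 |
| 8 | 16 | .83 | .83 | .68 |
| 9 | 18 | 1.16 | 1.16 | 1.34 |
| 10 | 20 | 1.49 | 1.49 | 2.21 |
| Mean 5.5 | 11.0 | | | |
| S.D. 3.03 | 6.06 | | | |
| | | | SUM | 9 |
| | | | SUM/(N-1) | 1 |

[Figure: scatter plot, Perfect Positive Linear Relation]

Consider the equation

$$r = \frac{\sum z_X z_Y}{N - 1}, \quad \text{where } z_X = \frac{X - \bar{X}}{s_X} \text{ and } z_Y = \frac{Y - \bar{Y}}{s_Y}$$

- Note that the product $z_X z_Y$ is positive if X and Y are on the same side of their means and negative if they are on opposite sides
- $r = 1$ if the relationship is perfectly positively linear and $r = -1$ if it is perfectly negatively linear
- It is between +1 and -1 otherwise. This occurs when the relationship contains random error or is non-linear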
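The z-score definition of r can be sketched in a few lines of Python. This is an illustrative implementation, not the course's own code: the function name `pearson_r_from_z` is made up here, and the data are assumed to be the perfectly linear example X = 1..10 with Y = 2X.

```python
from statistics import mean, stdev  # stdev divides by N - 1


def pearson_r_from_z(xs, ys):
    """Pearson r as the sum of z-score products divided by N - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


X = list(range(1, 11))   # 1, 2, ..., 10
Y = [2 * x for x in X]   # perfectly linear in X

r_pos = pearson_r_from_z(X, Y)                 # perfect positive relation
r_neg = pearson_r_from_z(X, [-y for y in Y])   # perfect negative relation
print(round(r_pos, 6), round(r_neg, 6))  # 1.0 -1.0
```

Reversing the sign of Y flips every z-score product from positive to negative, which is why the second result is -1.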
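There is another way to calculate the correlation coefficient. Starting from the z-score definition, lots of algebra leads to a computational equation that may be easier to use:

$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$$

For the same data (X: mean 5.5, S.D. 3.03; Y: mean 11.0, S.D. 6.06):

| X | Y | $X^2$ | $Y^2$ | $XY$ |
|---|---|---|---|---|
| 1 | 2 | 1 | 4 | 2 |
| 2 | 4 | 4 | 16 | 8 |
| 3 | 6 | 9 | 36 | 18 |
| 4 | 8 | 16 | 64 | 32 |
| 5 | 10 | 25 | 100 | 50 |
| 6 | 12 | 36 | 144 | 72 |
| 7 | 14 | 49 | 196 | 98 |
| 8 | 16 | 64 | 256 | 128 |
| 9 | 18 | 81 | 324 | 162 |
| 10 | 20 | 100 | 400 | 200 |
| Sum 55 | 110 | 385 | 1540 | 770 |

$$r = \frac{770 - \dfrac{(55)(110)}{10}}{\sqrt{\left[385 - \dfrac{55^2}{10}\right]\left[1540 - \dfrac{110^2}{10}\right]}} = \frac{165}{\sqrt{(82.5)(330)}} = 1$$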
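As a sketch of the raw-score route, the Python below computes r from the five column sums (ΣX, ΣY, ΣX², ΣY², ΣXY). It assumes the computational formula r = [ΣXY - (ΣX)(ΣY)/N] / sqrt{[ΣX² - (ΣX)²/N][ΣY² - (ΣY)²/N]}; the function and variable names are illustrative choices, not taken from the slides.

```python
from math import sqrt


def pearson_r_computational(xs, ys):
    """Pearson r from raw-score sums, with no z-scores needed."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    sp = sum_xy - sum_x * sum_y / n    # sum of cross-products of deviations
    ss_x = sum_x2 - sum_x ** 2 / n     # sum of squared deviations of X
    ss_y = sum_y2 - sum_y ** 2 / n     # sum of squared deviations of Y
    return sp / sqrt(ss_x * ss_y)


X = list(range(1, 11))
Y = [2 * x for x in X]

r = pearson_r_computational(X, Y)
print(r)  # 1.0
```

For these data the intermediate quantities are 165, 82.5, and 330, so the result is 165 / sqrt(82.5 × 330) = 1.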
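Another Example

| X | Y | $z_X$ | $z_Y$ | $z_X z_Y$ |
|---|---|---|---|---|
| 13 | 8 | -1.56 | -1.49 | 2.33 |
| 17 | 19 | -.52 | -.74 | .39 |
| 19 | 29 | .00 | -.07 | .00 |
| 19 | 38 | .00 | .58 | .00 |
| 22 | 36 | .78 | .42 | .33 |
| 24 | 49 | 1.30 | 1.30 | 1.69 |
| Mean 19.0 | 29.83 | | | |
| S.D. 3.85 | 14.7 | | SUM | 4.74 |

[Figure: scatter plot of Y against X for these data]

$$r = \frac{4.74}{5} = .95$$

Proportion of Variance Accounted for

- $r^2$ has a very simple interpretation in terms of the improvement in predictability of Y provided by knowledge of X over that obtained from the mean of Y alone
- In the absence of any knowledge of X, one can do no better than use $\bar{Y}$ as the best guess for Y
- When X and Y are linearly related, then $\hat{Y} = a + bX$ gives a better predicted value

[Figure: scatter plot of the same data with the best-fitting line and the mean of Y]

Note the following:

$$(Y - \bar{Y}) = (Y - \hat{Y}) + (\hat{Y} - \bar{Y})$$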
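The improvement in predictability can be checked numerically. The sketch below compares the squared error of guessing the mean of Y for every case with the squared error of predicting from a best-fitting line. Two things are assumptions: the six (X, Y) pairs are the second example's data as read here, and the line is the standard least-squares line (slope = sum of cross-products / sum of squares of X), which this section does not itself define.

```python
from statistics import mean

# Second example's data as read here (an assumption, see the lead-in).
X = [13, 17, 19, 19, 22, 24]
Y = [8, 19, 29, 38, 36, 49]

mx, my = mean(X), mean(Y)
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # cross-products
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)

# Least-squares line Y-hat = a + b * X (assumed method).
b = sp / ss_x
a = my - b * mx
y_hat = [a + b * x for x in X]

error_mean = ss_y                                           # guessing the mean
error_line = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))  # using the line

r = sp / (ss_x * ss_y) ** 0.5
improvement = 1 - error_line / error_mean   # proportional reduction in error

print(round(r, 2), round(r * r, 2), round(improvement, 2))  # 0.95 0.9 0.9
```

The proportional reduction in squared error equals r squared, which is the sense in which r squared measures improvement over using the mean alone.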
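Squaring and summing over all scores (lots of algebra occurs next):

$$\sum (Y - \bar{Y})^2 = \sum (Y - \hat{Y})^2 + \sum (\hat{Y} - \bar{Y})^2$$

- $\sum (Y - \bar{Y})^2$: total variability of Y (sum of squared deviations of Y about its mean). Stays fixed
- $\sum (Y - \hat{Y})^2$: sum of squared deviations of predicted from actual scores. Gets smaller as prediction improves
- $\sum (\hat{Y} - \bar{Y})^2$: total variability of Y accounted for by X (sum of squared deviations of $\hat{Y}$ about its mean). Gets larger as prediction improves

Coefficient of determination:

$$r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

Some Factors Affecting r

- Restricting the range of data decreases the correlation
- Eliminating the middle portion of the data or adding extreme scores inflates the correlation

[Figure: scatter plot example. Overall, r = .7; for a restricted middle range of X, r = .3; for only the low and high extremes of X, r = .89]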
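The effect of range restriction can be illustrated with a small simulation. Everything specific in this sketch is an assumption made for illustration: the simulated population correlation of 0.7, the sample size, the random seed, and the cutoffs used for the "middle" and "extreme" subsets. None of it reproduces the plotted example's data.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from sums of squared deviations and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def r_for(subset):
    """Correlation within a list of (x, y) pairs."""
    xs, ys = zip(*subset)
    return pearson_r(xs, ys)


random.seed(0)   # fixed seed so the run is repeatable
rho = 0.7        # assumed population correlation
pairs = []
for _ in range(5000):
    x = random.gauss(0, 1)
    y = rho * x + sqrt(1 - rho ** 2) * random.gauss(0, 1)
    pairs.append((x, y))

r_all = r_for(pairs)
r_middle = r_for([p for p in pairs if -0.5 < p[0] < 0.5])   # restricted range
r_extremes = r_for([p for p in pairs if abs(p[0]) > 1.5])   # middle removed

print(f"all: {r_all:.2f}  middle only: {r_middle:.2f}  "
      f"extremes only: {r_extremes:.2f}")
```

Under these assumptions, the correlation in the restricted middle band comes out well below the overall value, and the correlation among the extreme scores comes out above it.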
Ordinal Data

- If the data are ordinal in nature, or the relationship is distinctly non-linear, then it is inappropriate to use the Pearson product-moment correlation, as it assumes a linear relationship and requires at least interval-scaled data
- There are various alternatives
- One is the Spearman correlation, which is the Pearson correlation calculated on ranks
- You can use the equation in the book, or the prior equations, but using the ranks rather than the raw data
- For the latter, convert the X scores to ranks 1 to N, convert the Y scores to ranks 1 to N, then proceed as we did with the prior examples

[Figure: scatter plot of decision time (number of minutes) against group size. Pearson r = .9; Spearman $r_s$ = 1.00]
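The rank-then-correlate procedure can be sketched as follows. The group sizes and decision times below are hypothetical values invented to give a monotonic but non-linear pattern; they are not the data from the plotted example. Giving tied scores their average rank is a common convention and is an assumption here, as are the function names.

```python
from math import sqrt


def ranks(values):
    """Ranks from 1 to N (smallest = 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def spearman_r(xs, ys):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: time rises with group size, but not in a straight line.
group_size = [1, 2, 4, 8, 16]
decision_time = [5.0, 6.0, 8.0, 14.0, 40.0]

r_p = pearson_r(group_size, decision_time)
r_s = spearman_r(group_size, decision_time)
print(round(r_p, 2), round(r_s, 2))
```

Because the hypothetical times increase with every increase in group size, the two sets of ranks match exactly and the Spearman value is 1, while the Pearson value falls short of 1 because the raw relationship is curved.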