Chapter VII Measures of Correlation

A researcher may be interested in finding out whether two variables are significantly related. For instance, the researcher may want to know whether mental ability is significantly related to school performance, whether work performance is significantly related to the level of morale of the employees, or whether study habits are significantly related to grades in mathematics. In all these problems, the researcher can choose a research design that will help establish such relationships. The simplest approach is to study the patterns of values of the two variables. This is the area of correlation analysis.

Basic Concepts in Correlation Analysis

Correlation analysis is concerned with the analysis of the linear relationship between two or more variables. It is used to determine whether the variables are significantly related. The relationship between two variables may be positive or negative. A positive linear relationship exists if an increase in the value of one variable corresponds to an increase in the value of the other; equivalently, a decrease in the value of one corresponds to a decrease in the value of the other. A negative linear relationship exists if a decrease in the value of one variable corresponds to an increase in the value of the other; equivalently, an increase in the value of one corresponds to a decrease in the value of the other.

Example: It has been shown that IQ and academic performance are positively related. The following pairs of variables are negatively correlated:

1. academic achievement and hours per week of watching TV
2. time spent in practice and number of typing errors
3. absenteeism rate and job satisfaction

To determine the strength of the association between variables, we compute a statistic known as the correlation coefficient. It measures the degree or strength of the linear relationship between two or more variables. There are several coefficients of correlation, and their use depends on the type of data (e.g., Pearson r, Spearman rho, Cramér's V, point-biserial). Most coefficients of correlation assume values between -1 and +1. The qualitative description of the strength of correlation is based on the following suggested guide.

Figure 1. Guide in interpreting correlation coefficients.

The Scatter Plot

A scatter plot, also known as a scatter diagram, is a graphical representation of the linear relationship between two variables. The following scatter plots show the various types and degrees of linear association between two variables.

Figure a. Strong positive correlation
Figure b. Low negative correlation

Figure c. Strong negative correlation
Figure d. No correlation
Figure e. Low positive correlation

Spurious Correlation and Causality

Causality, also known as causation, is a cause-and-effect relationship between two variables. A significant correlation does not necessarily indicate causality; it may only reflect a common linkage in a sequence of events. One such situation arises when both variables are influenced by a common cause (a confounding variable) and are therefore correlated with each other (spurious correlation). For example, individuals with a higher level of income have both higher levels of savings and higher levels of spending. We might find a positive correlation between level of savings and level of spending, but this does not mean that one variable causes the other. Moore (1993) found a significant strong correlation between the amount of ice cream sold and the number of deaths by drowning. This does not mean that eating ice cream causes death by drowning, or the other way around; the correlation is actually due to the season (summer). A researcher who wants to show cause and effect should conduct a controlled experiment.
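The income example can be illustrated with a small simulation. The sketch below is only an illustration in Python (the chapter's own analyses use Stata); all numbers are simulated and hypothetical, and the names `pearson_r`, `income`, `savings`, and `spending` are assumptions introduced here. Savings and spending are each generated from income plus independent noise, so neither one influences the other, yet their correlation coefficient (Pearson r, covered later in this chapter) comes out strongly positive.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the simulated (hypothetical) data are reproducible

# Income is the common cause; savings and spending depend on income only.
income = [rng.gauss(50, 10) for _ in range(500)]
savings = [0.2 * i + rng.gauss(0, 1) for i in income]
spending = [0.7 * i + rng.gauss(0, 3) for i in income]

# Strong correlation, although savings never enters the spending equation.
r_raw = pearson_r(savings, spending)

# After removing the part of each variable that is driven by income,
# only the independent noise terms remain and the correlation vanishes.
savings_net = [s - 0.2 * i for s, i in zip(savings, income)]
spending_net = [s - 0.7 * i for s, i in zip(spending, income)]
r_net = pearson_r(savings_net, spending_net)

print(f"r(savings, spending)           = {r_raw:.3f}")
print(f"r after removing income effect = {r_net:.3f}")
```

The second correlation is close to zero because, in this simulated setting, income is the only link between the two variables.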

Correlation Between Two Interval Variables (Pearson r)

The Pearson product-moment correlation coefficient, popularly known as Pearson's r, is the most widely used correlation coefficient. Values of r for pairs of variables are commonly reported in research reports and journals as a means of summarizing the extent of the relationship between two variables measured in at least interval scale. This coefficient is appropriate if observations are sampled from a bivariate normal distribution. Given sample data consisting of n pairs (X_i, Y_i), Pearson r is computed using the following formula:

$$ r = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}} $$

Testing for the Significance of Pearson's r

When computing a correlation coefficient, it is also useful to test the coefficient for significance. This gives the researcher some idea of how large a correlation coefficient must be before it can be taken to demonstrate that there really is a relationship between two variables. Two variables may appear related by chance, and a hypothesis test for r allows the researcher to decide whether the observed r could have emerged by chance or not.

To test the correlation coefficient for statistical significance, it is necessary to define the true correlation coefficient that would be observed if all population values were available. This true correlation coefficient is usually denoted by the Greek letter ρ (rho). The null hypothesis is that there is no relationship between the two variables X and Y. In symbols,

H0: ρ = 0

The null hypothesis may be tested against any one of the following alternative hypotheses:

a) Ha: ρ ≠ 0
b) Ha: ρ > 0
c) Ha: ρ < 0

The test is based on the sample (observed) correlation coefficient r. As various samples are drawn, each of size n, the values of r vary from sample to sample. When the null hypothesis is true, the statistic below follows a t distribution with n - 2 degrees of freedom. Hence, the test statistic for testing the significance of r is given by

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$
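The two formulas above can be sketched in a few lines of Python (used here only as an illustration; the chapter's own analyses use Stata). The function names `pearson_r` and `t_statistic` are assumptions introduced for this sketch. The data are the years of formal education (X) and age of entry into the labour force (Y) of the 12 workers in this section's example.

```python
import math


def pearson_r(x, y):
    """Pearson r from the raw-score formula (sums of X, Y, X^2, Y^2 and XY)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Years of formal education (X) and age of entry into the labour force (Y).
education = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
entry_age = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]

r = pearson_r(education, entry_age)
t = t_statistic(r, len(education))
print(f"r = {r:.4f}, t = {t:.3f}, df = {len(education) - 2}")
```

The p-value is not computed here because the Python standard library has no t distribution; it can be read from a t table or from the Stata output.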

Example: The following table presents data on years of formal education (X) and age of entry into the labour force (Y) of a random sample of 12 workers.

      X      Y     X²     Y²     XY
     10     16    100    256    160
     12     17    144    289    204
     15     18    225    324    270
      8     15     64    225    120
     20     18    400    324    360
     17     22    289    484    374
     12     19    144    361    228
     15     22    225    484    330
     12     18    144    324    216
     10     15    100    225    150
      8     18     64    324    144
     10     16    100    256    160

ΣX = 149, ΣY = 214, ΣX² = 1999, ΣY² = 3876, ΣXY = 2716

$$ r = \frac{12(2716) - (149)(214)}{\sqrt{\left[12(1999) - 149^2\right]\left[12(3876) - 214^2\right]}} = 0.6241 $$

Therefore, there is a moderate positive correlation between years of formal education (X) and age of entry into the labour force (Y).

Computing and testing the significance of Pearson r using Stata

1. Type the data in the Stata Data Editor (or in Excel) the way it appears in the table above (only the columns of X and Y).

2. From the menu at the top of the screen, click on Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Pairwise correlations.
3. In the Main tab of the dialog box, select X and Y in the pull-down menu.
4. Check the boxes opposite "Print significance level for each entry" and "Significance level for displaying with a star".
5. Click OK.

OUTPUT

    . pwcorr x y, sig star(5)

                     x         y
        x       1.0000
        y       0.6241*   1.0000
                0.0301

The first line is the command. The value 0.6241 is Pearson r, and the value 0.0301 printed beneath it is the p-value.

Interpretation: There is a significant moderate positive correlation between years of formal education and age of entry into the labour force (r = 0.6241, p = 0.0301).

Correlation Between Two Ordinal Variables (Spearman rho)

If both variables are measured in the ordinal scale, then the most appropriate correlation coefficient is the Spearman rank coefficient, popularly known as Spearman rho. The value of the Spearman rho coefficient is calculated based on the ranks of the observations. The computational formula of Spearman's rho is given by

$$ r_s = 1 - \frac{6\sum d^2}{n^3 - n} $$

where
d = paired differences between the ranks
n = the number of paired ranks

The test statistic for testing the significance of Spearman's rho is

$$ t = r_s\sqrt{\frac{n-2}{1-r_s^2}} $$

which is referred to a t distribution with n - 2 degrees of freedom.

Example: Suppose that a researcher wants to find out the relationship between the job performance of 6 employees (measured on a scale of 1 to 10) and their job satisfaction (also measured on a scale of 1 to 10). The hypothetical data are given below, with ranks in parentheses.

    Performance    Satisfaction    Difference (d)    d²
    10 (1)          8 (3)          2                 4
     8 (3)          9 (2)          1                 1
     9 (2)         10 (1)          1                 1
     4 (6)          5 (5)          1                 1
     5 (5)          3 (6)          1                 1
     6 (4)          6 (4)          0                 0
                                   Σd² =             8

Thus,

$$ r_s = 1 - \frac{6(8)}{6^3 - 6} = 1 - \frac{48}{210} = 0.7714 $$

There exists a strong positive relationship between performance and satisfaction.

Computing and testing the significance of Spearman rho using Stata

1. Type the data in the Stata Data Editor (or in Excel), with one column for performance and one column for satisfaction.
2. From the menu at the top of the screen, click on Statistics > Nonparametric analysis > Tests of hypotheses > Spearman's rank correlation.
3. In the Main tab of the dialog box, select X and Y in the pull-down menu.
4. Check the boxes opposite "Print significance level for each entry" and "Significance level for displaying with a star".
5. Click OK.

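The hand computation of Spearman's rho can be checked with a short Python sketch (an illustration only; the chapter's own analyses use Stata). The helper names `ranks_descending` and `spearman_rho` are assumptions introduced here, and the ranking helper assumes there are no tied scores, which holds for the performance and satisfaction data in this example.

```python
import math


def ranks_descending(scores):
    """Rank the scores so that the highest score gets rank 1 (assumes no ties)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


def spearman_rho(x, y):
    """Spearman's rho from the rank-difference formula 1 - 6*sum(d^2)/(n^3 - n)."""
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


performance = [10, 8, 9, 4, 5, 6]
satisfaction = [8, 9, 10, 5, 3, 6]

n = len(performance)
rho = spearman_rho(performance, satisfaction)
# Test statistic, referred to a t distribution with n - 2 degrees of freedom.
t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
print(f"rho = {rho:.4f}, t = {t:.3f}, df = {n - 2}")
```

With tied scores, average ranks would be needed instead, and the simple rank-difference formula is then only approximate.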

OUTPUT

    . spearman performance satisfaction, star(0.05)

     Number of obs =       6
    Spearman's rho =       0.7714

    Test of Ho: performance and satisfaction are independent
        Prob > |t| =       0.0724

The first line is the command. The value 0.7714 is Spearman rho, and 0.0724 is the p-value.

Interpretation: There is a strong relationship between performance and satisfaction (r_s = 0.7714). However, the correlation is not significant at the 5% level (p = 0.0724). The strong association between these two variables may be due only to chance, or the non-significant result may reflect the very small sample size.

Correlation Between an Interval Variable and a Two-category Nominal Variable

If one of the two variables in the analysis is dichotomous (two-category nominal) and the other is interval in scale, the point-biserial correlation coefficient is appropriate. The computational formula of the point-biserial coefficient is given by

$$ r_{pbi} = \frac{\bar{y}_1 - \bar{y}_0}{s_y}\sqrt{pq} $$

where:
ȳ1 - mean of the Y scores for those individuals with X scores equal to 1
ȳ0 - mean of the Y scores for those individuals with X scores equal to 0
s_y - standard deviation of all Y scores
p - proportion of individuals with an X score equal to 1
q - proportion of individuals with an X score equal to 0

Similar to the Pearson r and the Spearman rho, the significance of the point-biserial coefficient is tested using a t test.

Example: Consider the dexterity scores of a random sample of male and female respondents. Let 1 denote male and 0 denote female.

    Subject    Sex    Dexterity
     1         1      11
     2         1      13
     3         1      12
     4         1      13
     5         1      10
     6         0      10
     7         0       7
     8         0       9
     9         0       6
    10         0       7
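As a minimal sketch, the point-biserial formula can be applied to the dexterity data above in Python (an illustration only; the chapter's own analyses use Stata). The function name `point_biserial` is an assumption introduced here. The sketch also assumes that s_y is the sample standard deviation (n - 1 divisor), which is what `statistics.stdev` computes and which reproduces the coefficient reported in this section.

```python
import math
import statistics


def point_biserial(x, y):
    """Point-biserial coefficient for a 0/1 variable x and an interval variable y."""
    y1 = [score for group, score in zip(x, y) if group == 1]
    y0 = [score for group, score in zip(x, y) if group == 0]
    p = len(y1) / len(y)          # proportion with X = 1
    q = 1 - p                     # proportion with X = 0
    s_y = statistics.stdev(y)     # assumption: sample SD (n - 1 divisor)
    return (statistics.mean(y1) - statistics.mean(y0)) / s_y * math.sqrt(p * q)


sex = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
dexterity = [11, 13, 12, 13, 10, 10, 7, 9, 6, 7]

r_pbi = point_biserial(sex, dexterity)
n = len(sex)
# Test statistic, referred to a t distribution with n - 2 degrees of freedom.
t = r_pbi * math.sqrt(n - 2) / math.sqrt(1 - r_pbi ** 2)
print(f"r_pbi = {r_pbi:.4f}, t = {t:.4f}, df = {n - 2}")
```

Using the population standard deviation (n divisor) instead would give a larger coefficient, so the choice of divisor should be stated when the coefficient is reported.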

Solution: The mean dexterity score is 11.8 for males and 7.8 for females. The standard deviation of the dexterity scores (computed with the n - 1 divisor) is s_y = 2.5298; p = 0.5 and q = 0.5. Hence,

$$ r_{pbi} = \frac{\bar{y}_1 - \bar{y}_0}{s_y}\sqrt{pq} = \frac{11.8 - 7.8}{2.5298}\sqrt{(0.5)(0.5)} = 0.7906 $$

Therefore, there exists a high correlation between sex and dexterity scores. Males tend to have higher dexterity scores than females.

Computing and testing the significance of the point-biserial coefficient using Stata

1. Type the data in the Stata Data Editor (or in Excel) as it appears in the table above.
2. Install the point-biserial (pbis) package. In the Command window, type findit pbis and hit Enter. Follow the installation instructions and wait until installation is complete.

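In the findit results, click on the link to the pbis package to install it.

3. In the Command window, type pbis sex dexterity, using the variable names in your own dataset (the output below was produced with a variable named dexterityscore). Hit Enter.

OUTPUT

    . pbis sex dexterityscore
    (obs= 10)

    Np= 5    p= 0.50
    Nq= 5    q= 0.50

    ------------------+------------------+------------------+------------------+
    Coef.= 0.7906       t= 3.6515          P>|t| = 0.0065     df= 8

The first line is the command. Coef. is the point-biserial coefficient, and P>|t| is the p-value.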

Interpretation: There exists a significant strong correlation between sex and dexterity scores (r_pbi = 0.7906, p = 0.0065). Further, males have significantly higher dexterity scores than females.

Correlation Between Two Dichotomous Nominal Variables

The phi (φ) coefficient is useful when both variables are nominal in scale, or categorical, with two categories (dichotomous). In this case, the data are arranged in a 2 x 2 contingency table. Let a and b be the cell frequencies in the first row of the table and c and d the cell frequencies in the second row. The computational formula of the phi coefficient, together with the test statistic for testing its significance, is given by

$$ \phi = \frac{bc - ad}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}, \qquad t = \phi\sqrt{\frac{n-2}{1-\phi^2}} $$

Example: Consider the problem of determining the correlation between sex and political party affiliation. Suppose we have 10 subjects with the following data:

    Person    Sex (X)    Party (Y)
    A         1          1
    B         1          1
    C         1          0
    D         1          1
    E         1          1
    F         0          0
    G         0          1
    H         0          1
    I         0          0
    J         0          0

where:
sex - 1 if male; 0 if female
party - 1 if Republican; 0 if Democrat

The corresponding 2 x 2 contingency table is given below.

                     Sex (X)
    Party (Y)      0      1      Total
    1              2      4       6
    0              3      1       4
    Total          5      5      10

Then, with a = 2, b = 4, c = 3, and d = 1,

$$ \phi = \frac{(4)(3) - (2)(1)}{\sqrt{(6)(4)(5)(5)}} = 0.408 $$

Therefore, there exists a moderate positive association between sex and political party affiliation.

Computing and testing the significance of the phi coefficient using Stata

1. Type the data in the Stata Data Editor (or in Excel) as it appears in the listing of the 10 subjects above (only the columns of sex and party).
2. Install the phi coefficient (phi) package. In the Command window, type findit phi and hit Enter. Scroll down to this link: sp3 from http://www.stata.com/stb/stb3 and click on it.

Wait until installation is complete.

3. In the Command window, type phi sex party. Hit Enter.

OUTPUT

    . phi sex party

                   Party (Y)
    Sex (X)      0      1      Total
    0            3      2       5
    1            1      4       5
    Total        4      6      10

    Pearson chi2(1) = 1.6667    Pr = 0.197
    phi = Cohen's w = fourfold point correlation = 0.408
    phi-squared = 0.1667

The first line is the command. Pr is the p-value, and phi is the phi coefficient.

Interpretation: There exists a non-significant moderate association between sex and political party affiliation (φ = 0.408, p = 0.197). There is not sufficient evidence to conclude that one sex is predominantly affiliated with a political party.

REMARK: The value of the phi coefficient can also be computed from the value of the chi-square test statistic. It is calculated as

$$ \phi = \sqrt{\frac{\chi^2}{N}} $$
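The phi computation and the remark above can be checked with a short Python sketch (an illustration only; the chapter's own analyses use Stata). The cell labels follow the 2 x 2 table of party by sex in the example: a = 2 and b = 4 in the first row, c = 3 and d = 1 in the second. The chi-square shortcut for a 2 x 2 table used below is the standard one and is not taken from the chapter.

```python
import math

# Cell frequencies of the 2 x 2 table (rows: party 1, party 0; columns: sex 0, sex 1).
a, b, c, d = 2, 4, 3, 1
n = a + b + c + d

# Product of the four marginal totals.
margins = (a + b) * (c + d) * (a + c) * (b + d)

# Phi from the cell frequencies.
phi = (b * c - a * d) / math.sqrt(margins)

# Pearson chi-square for a 2 x 2 table (standard shortcut formula),
# then phi recovered from chi-square as sqrt(chi2 / N).
chi2 = n * (a * d - b * c) ** 2 / margins
phi_from_chi2 = math.sqrt(chi2 / n)

print(f"phi = {phi:.4f}, chi2 = {chi2:.4f}, sqrt(chi2/N) = {phi_from_chi2:.4f}")
```

Note that the square root of chi-square over N is always non-negative, so it recovers only the size of phi and not its sign.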

Chi-square-based Measures of Association for Categorical Variables

The chi-square test for independence of two categorical variables is used to determine whether the variables are associated or not. However, it does not provide information on the strength of the association. The following are some measures of (nonlinear) association between variables based on the chi-square test. The significance of these measures follows that of the chi-square test statistic.

a) Cramér's V

Cramér's V is appropriate if the dimension of the contingency table is larger than 2 x 2 and the number of rows is not equal to the number of columns. It is considered an alternative to the phi coefficient. The value of V can be calculated from the chi-square statistic as follows:

$$ V = \sqrt{\frac{\chi^2}{N(L-1)}} $$

where
χ² = the value of the chi-square statistic
N = the grand total
L = the number of rows or columns, whichever is smaller

Example: Suppose that we are interested in the relationship between work performance problems of employees and their use of three treatment clinics. The data are given below.

                                          Type of Clinic
    Type of Problem                Cardiovascular   Mental Health   Alcoholism   Total
    Absenteeism                    27               13               8            48
    Tardiness                      15               14              11            40
    Poor Quality Work              15               12              14            41
    Problems with Other Workers     3               16              24            43
    Total                          60               55              57           172

Analysis using Stata

1. Type the data in the Stata Data Editor (or in Excel), with one row for each combination of problem and clinic and a column containing the number of workers in that cell (the variables problem, clinic, and num_workers used in the command of the next step).

2. Type the following command in the Command window: tabulate problem clinic [fweight=num_workers], chi2 V. Press Enter.

OUTPUT

    . tabulate problem clinic [fweight=num_workers], chi2 V

                                        clinic
    problem                  Cardiovas   Mental he   Alcoholis   Total
    Absenteeism              27          13           8           48
    Tardiness                15          14          11           40
    Poor quality of work     15          12          14           41
    Problem with co-worke     3          16          24           43
    Total                    60          55          57          172

    Pearson chi2(6) = 27.9281    Pr = 0.000
    Cramér's V = 0.2849

Interpretation: There exists a significant but low association between work performance problem and type of treatment clinic (V = 0.2849, p < 0.001). The majority of workers with absenteeism problems attend the Cardiovascular clinic, while workers who have problems with their co-workers usually attend the Alcoholism clinic.

b) Contingency coefficient C

The contingency coefficient C is appropriate if the dimension of the contingency table is larger than 2 x 2 and the number of rows is equal to the number of columns. The value of C is calculated as:

$$ C = \sqrt{\frac{\chi^2}{\chi^2 + N}} $$

where
χ² = the value of the chi-square statistic
N = the grand total

Unfortunately, there is no available Stata package for computing the value of C. Its value has to be computed manually.

Example: A random sample of 200 married men, all retired, were classified according to education and number of children. Is there a significant association between the size of the family and the level of education attained by the father?

                         Number of Children
    Education        0-1     2-3     Over 3    Total
    Elementary       14      37      32         83
    Secondary        19      42      17         78
    College          12      17      10         39
    Total            45      96      59        200
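Both chi-square-based measures can be computed directly from a contingency table. The Python sketch below is an illustration only (the chapter's own analyses use Stata); the function names `chi_square`, `cramers_v`, and `contingency_c` are assumptions introduced here. It is applied to the clinic table (Cramér's V) and to the education table above (contingency coefficient C).

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and grand total N for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (N * (L - 1))), with L the smaller of rows and columns."""
    chi2, n = chi_square(table)
    smaller = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (smaller - 1)))


def contingency_c(table):
    """Contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


# Rows: type of problem; columns: Cardiovascular, Mental Health, Alcoholism.
clinic_table = [[27, 13, 8], [15, 14, 11], [15, 12, 14], [3, 16, 24]]
# Rows: Elementary, Secondary, College; columns: 0-1, 2-3, Over 3 children.
education_table = [[14, 37, 32], [19, 42, 17], [12, 17, 10]]

print(f"Cramér's V    = {cramers_v(clinic_table):.4f}")
print(f"Contingency C = {contingency_c(education_table):.4f}")
```

The p-values are not computed here because the Python standard library has no chi-square distribution; they can be read from the Stata output or from a chi-square table with (rows - 1)(columns - 1) degrees of freedom.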

Analysis using Stata

1. Type the data in the Stata Data Editor (or in Excel), with one row for each combination of education level and number of children and a column containing the number of men in that cell (the variables educ, numchild, and num_men used in the command of the next step).
2. Type the following command in the Command window: tabulate educ numchild [fweight=num_men], chi2. Press Enter.

OUTPUT

    . tabulate educ numchild [fweight=num_men], chi2

                           numchild
    educ             0-1     2-3     Over 3    Total
    Elementary       14      37      32         83
    High School      19      42      17         78
    College          12      17      10         39
    Total            45      96      59        200

    Pearson chi2(4) = 7.4644    Pr = 0.113

3. To generate the value of C, type the command: display sqrt(7.4644/(7.4644+200)). This gives C = 0.1896818.

Interpretation: There exists a non-significant, very low association between the level of education of the father and the number of children in the family (C = 0.1897, p = 0.113).

Other measures of association

1. Gamma coefficient
2. Kendall's rank correlation
3. Eta coefficient