CORRELATION - Pearson-r - Spearman-rho
Scatter Diagram
A scatter diagram is a graph that shows the relationship between two variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The predictor variable is plotted on the horizontal axis and the response variable on the vertical axis. Do not connect the points when drawing a scatter diagram.
Scatterplot
A scatterplot is a graph that shows the location of each data point formed by a pair of X-Y scores. In a positive linear relationship, as the X scores increase, the Y scores tend to increase. In a negative linear relationship, as the X scores increase, the Y scores tend to decrease. In a nonlinear relationship, as the X scores increase, the Y scores neither only increase nor only decrease; the direction of change varies.
Types of relationship
A horizontal scatterplot, with a horizontal regression line, indicates no relationship. Sloping scatterplots with regression lines oriented so that Y increases as X increases indicate a positive linear relationship. Sloping scatterplots with regression lines oriented so that Y decreases as X increases indicate a negative linear relationship. Scatterplots producing curved regression lines indicate nonlinear relationships.
Strength of relationship
The strength of a relationship is the extent to which one value of Y is consistently paired with one and only one value of X. It is also referred to as the degree of association between the two variables. The absolute value of the correlation coefficient (the size of the number we calculate) indicates the strength of the relationship. The largest absolute value you can obtain is 1.0 and the smallest is 0. The larger the absolute value, the stronger the relationship.
For example, on average, as height in people increases, so does weight.
No.  Height (in)  Weight (lbs)
1    60           102
2    62           120
3    63           130
4    65           150
5    65           120
6    68           145
7    69           175
8    70           170
9    72           185
10   74           210
Example of a positive correlation
If the correlation is positive, when one variable increases, so does the other.
For example, as study time increases, the number of errors on an exam decreases.
No.  Study time (min)  No. of errors on test
1    90                25
2    100               28
3    130               20
4    150               20
5    180               15
6    200               12
7    220               13
8    300               10
9    350               8
10   400               6
Example of a negative correlation
If the correlation is negative, when one variable increases, the other decreases.
Example of a zero correlation
If there is no relationship between the two variables, then as one variable increases, the other variable neither increases nor decreases. In this case, the correlation is zero. For example, if we measure the SAT-V scores of college freshmen and also measure the circumference of their right big toes, we would expect a correlation of about zero.
What is the correlation coefficient?
Linear means straight line. Correlation means co-relation, or the degree to which two variables "go together". Linear correlation means to go together in a straight line. The correlation coefficient is a number that summarizes the direction and degree (closeness) of the linear relation between two variables.
The correlation coefficient is also known as the Pearson Product-Moment Correlation Coefficient. The sample value is called r, and the population value is called ρ (rho).
The correlation coefficient can take values from -1 through 0 to +1. The sign (+ or -) gives the direction of the relationship. When the correlation is positive (r > 0), as the value of one variable increases, so does the other. When it is negative (r < 0), as the value of one variable increases, the other decreases.
The correlation coefficient
1. Pearson correlation coefficient (both variables must be interval or ratio)
2. Spearman rank-order correlation coefficient (both variables are ordinal, that is, ranked)
3. Point-biserial correlation coefficient (one variable is interval or ratio and one variable is nominal and dichotomous)
4. Phi (both variables are nominal and dichotomous)
Correlation & Association
Scale              Example
Interval-interval  Pearson r
Ordinal-ordinal    Spearman rank
Nominal-nominal    Phi, Chi-square test of independence
Nominal-interval   Eta
Nominal-ordinal    Theta, Kruskal-Wallis H test
Ordinal-interval   Jaspen's M, F test
Measuring Associations: Pearson's correlation
Pearson correlation coefficient
The conceptual (definitional) formula of the correlation coefficient is:
$$r = \frac{\sum xy}{N S_X S_Y} \qquad (1.1)$$
where x and y are deviation scores and $S_X$ and $S_Y$ are sample standard deviations, that is,
$$x = X - \bar{X}, \quad y = Y - \bar{Y}, \quad S_X = \sqrt{\frac{\sum x^2}{N}}, \quad S_Y = \sqrt{\frac{\sum y^2}{N}}$$
Another way of defining correlation is:
$$r = \frac{\sum z_x z_y}{N} \qquad (1.2)$$
where $z_x$ is X in z-score form, $z_y$ is Y in z-score form, and S and N have their customary meaning:
$$z_x = \frac{X - \bar{X}}{S_X}, \quad z_y = \frac{Y - \bar{Y}}{S_Y}$$
This says that r is the average cross-product of z-scores.
Sometimes you will see these formulas written as:
$$r = \frac{\sum xy}{(N-1) S_X S_Y} \quad \text{and} \quad r = \frac{\sum z_x z_y}{N-1}$$
These formulas are correct when the standard deviations used in the calculations are the estimated population standard deviations (computed with N - 1) rather than the sample standard deviations (computed with N). The main point is to be consistent: either use N throughout or use N - 1 throughout.
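The consistency point can be checked numerically. The sketch below is an illustration only: the function names and the `ddof` parameter are assumptions introduced here, not part of the slides. It computes r from deviation scores as in (1.1), once with N and once with N - 1, and again from z-scores as in (1.2), using the height and weight data from the positive-correlation example.

```python
import math

def pearson_r(xs, ys, ddof=0):
    """Pearson r from deviation scores, as in formula (1.1).

    ddof=0 divides by N, ddof=1 divides by N - 1. The same divisor is used
    for the cross-products and for both standard deviations.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviation scores x
    dy = [y - mean_y for y in ys]  # deviation scores y
    divisor = n - ddof
    s_x = math.sqrt(sum(d * d for d in dx) / divisor)
    s_y = math.sqrt(sum(d * d for d in dy) / divisor)
    return sum(a * b for a, b in zip(dx, dy)) / (divisor * s_x * s_y)

def pearson_r_from_z(xs, ys):
    """Pearson r as the average cross-product of z-scores, as in (1.2), using N."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    z_x = [(x - mean_x) / s_x for x in xs]
    z_y = [(y - mean_y) / s_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / n

# Height (in) and weight (lbs) from the positive-correlation example
heights = [60, 62, 63, 65, 65, 68, 69, 70, 72, 74]
weights = [102, 120, 130, 150, 120, 145, 175, 170, 185, 210]

r_n = pearson_r(heights, weights, ddof=0)   # N throughout
r_n1 = pearson_r(heights, weights, ddof=1)  # N - 1 throughout
r_z = pearson_r_from_z(heights, weights)    # z-score form
print(round(r_n, 3), round(r_n1, 3), round(r_z, 3))
```

All three values should agree up to floating-point rounding, because the divisor cancels as long as it is used consistently.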
Example:
Covariance
Covariance ($cov_{xy}$) represents the degree to which two variables change together:
$$cov_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$
The correlation is the average of cross-products (also called a covariance) standardized by dividing through by both standard deviations, that is, $r = cov_{xy} / (s_x s_y)$, with $s_x$ and $s_y$ computed using the same n - 1 divisor.
Height  Weight
72      190
66      135
69      155
72      165
71      155
Find (i) the covariance and (ii) the coefficient of correlation.
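One way to check an answer to this exercise is sketched below. It assumes the n - 1 divisor from the covariance slide is used for both the covariance and the standard deviations; the variable names are illustrative.

```python
import math

heights = [72, 66, 69, 72, 71]
weights = [190, 135, 155, 165, 155]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# (i) covariance: sum of cross-products of deviations, divided by n - 1
cross_products = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
cov_hw = cross_products / (n - 1)

# (ii) correlation: covariance divided by both standard deviations (n - 1 divisor)
s_h = math.sqrt(sum((h - mean_h) ** 2 for h in heights) / (n - 1))
s_w = math.sqrt(sum((w - mean_w) ** 2 for w in weights) / (n - 1))
r = cov_hw / (s_h * s_w)

print("covariance:", cov_hw)
print("correlation:", round(r, 3))
```

By hand, the means are 70 and 160, the cross-products sum to 170, so the covariance is 170 / 4 = 42.5 and r is approximately 0.83.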
Interpretation of Pearson Coefficient
r          Interpretation
0.00-0.20  can be ignored
0.20-0.40  low
0.40-0.60  medium
0.60-0.80  high
0.80-1.00  very high
Strength of Pearson r
Coefficient  Strength
0.01-0.09    Trivial
0.10-0.29    Low to moderate
0.30-0.49    Moderate to substantial
0.50-0.69    Substantial to very strong
0.70-0.89    Very strong
>0.90        Near perfect
The coefficient of determination
Correlation cannot be used to explain whether or not one variable causes another, but it can be used for predictive purposes. The coefficient of determination, computed by squaring the correlation coefficient, tells the proportion of the variability of one variable that can be explained by the other variable.
Coefficient of determination = $r^2$
Suppose that the bird and whale migration times were correlated with r = 0.5. Then $r^2 = (0.5)^2 = 0.25$. This means that 0.25, or 25%, of the variance in the time of whale migration can be explained by the variance in the time of bird migration. The remaining 0.75, or 75%, of the variance is explained by other factors. Therefore, even if the birds were 2 weeks late in their migration, you would not expect the whales to be 2 weeks late, because 75% of the variation in whale migration is explained by factors other than bird migration.
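The arithmetic in the migration example can be written as a tiny sketch. The function name is an assumption made for illustration.

```python
def coefficient_of_determination(r):
    """Proportion of variance in one variable explained by the other (r squared)."""
    return r ** 2

r = 0.5  # correlation between bird and whale migration times in the example
explained = coefficient_of_determination(r)
unexplained = 1 - explained
print(f"explained: {explained:.0%}, unexplained: {unexplained:.0%}")
```

Note that a moderate r of 0.5 leaves three quarters of the variance unexplained, which is why squaring r gives a more sober view of predictive value than r itself.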
Spearman's Coefficient of Rank Correlation, $r_s$
Spearman's rank-order correlation coefficient
This correlation coefficient is used when one or more of the variables is measured on an ordinal (ranking) scale. It describes the linear relationship between two variables measured using ranked scores. The symbol used is $r_s$ (the subscript s stands for Spearman; Charles Spearman invented this one).
Computational formula for the Spearman rank-order correlation coefficient:
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
where N is the number of pairs of ranks and D is the difference between the two ranks in each pair.
Running the Spearman Rank-Order Correlation Test
1. Determine the difference between the ranks for each subject
2. Square each difference and sum the squares
3. Calculate the rho statistic
4. Compare the obtained rho value with the critical value
Summary of the Spearman Rank-Order Correlation Test
Hypotheses: $H_0$: rho = 0; $H_a$: rho ≠ 0, or rho < 0, or rho > 0
Assumptions: Subjects are randomly selected; observations are rank-ordered
Decision rules (n = number of pairs of ranks): If $rho_{obt} \ge rho_{crit}$, reject $H_0$. If $rho_{obt} < rho_{crit}$, do not reject $H_0$.
Formula: $$rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
Sample data
Participant  Observer A: X  Observer B: Y
1            4              3
2            1              2
3            9              8
4            8              6
5            3              5
6            5              4
7            6              7
8            2              1
9            7              9
Solution
Participant  Observer A: X  Observer B: Y  D   D^2
1            4              3              1   1
2            1              2              -1  1
3            9              8              1   1
4            8              6              2   4
5            3              5              -2  4
6            5              4              1   1
7            6              7              -1  1
8            2              1              1   1
9            7              9              -2  4
$\sum D^2 = 18$
Solution
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(18)}{9(9^2 - 1)} = 1 - \frac{108}{720} = 1 - 0.15 = +0.85$$
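Steps 1 to 3 of the procedure can be sketched in a few lines and checked against the worked solution for the two observers. The function name is illustrative. Step 4, the comparison with a critical value, needs a table of critical values that is not reproduced in these notes, so it is left out.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman r_s from two lists of ranks, using 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(rank_x)
    diffs = [a - b for a, b in zip(rank_x, rank_y)]  # step 1: rank differences D
    sum_d2 = sum(d * d for d in diffs)               # step 2: square and sum
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))     # step 3: the rho statistic

# Rankings given by Observer A and Observer B to the nine participants
observer_a = [4, 1, 9, 8, 3, 5, 6, 2, 7]
observer_b = [3, 2, 8, 6, 5, 4, 7, 1, 9]

r_s = spearman_from_ranks(observer_a, observer_b)
print(round(r_s, 2))
```

The result should match the hand calculation, 1 - 108/720 = 0.85.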
What does the value of $r_s$ tell you?
Spearman's rank correlation coefficient is actually derived from the product-moment correlation coefficient, such that $-1 \le r_s \le 1$.
$r_s = 0.85$ means that a child receiving a particular ranking from one observer tended to receive very close to the same ranking from the other observer.
$r_s = +1$ means that the rankings are in complete agreement.
$r_s = 0$ means that there is no correlation between the rankings.
$r_s = -1$ means that the rankings are in complete disagreement. In fact, they are in exact reverse order.
Exercise: The marks of eight candidates in English and Mathematics are:
Candidate     1   2   3   4   5   6   7   8
English (x)   50  58  35  86  76  43  40  60
Maths (y)     65  72  54  82  32  74  40  53
Rank the results and hence find Spearman's rank correlation coefficient between the two sets of marks. Comment on the value obtained.
Solution
English (x)   50  58  35  86  76  43  40  60
Maths (y)     65  72  54  82  32  74  40  53
Rank x         4   5   1   8   7   3   2   6
Rank y         5   6   4   8   1   7   2   3
D             -1  -1  -3   0   6  -4   0   3
D^2            1   1   9   0  36  16   0   9
$\sum D^2 = 72$
Solution
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(72)}{8(8^2 - 1)} = 1 - \frac{432}{504} = 1 - 0.857 = 0.143$$
Spearman's coefficient of rank correlation is 0.143. This appears to show a very weak positive correlation between the English and Mathematics rankings.
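The ranking step in this exercise can also be sketched in code. The helper below is an illustrative assumption, not part of the slides; it ranks the lowest mark as 1, as in the solution table, and is only valid when no marks are tied.

```python
def ranks_ascending(scores):
    """Rank each score with 1 = lowest. Assumes there are no tied scores."""
    ordered = sorted(scores)
    return [ordered.index(s) + 1 for s in scores]

english = [50, 58, 35, 86, 76, 43, 40, 60]
maths = [65, 72, 54, 82, 32, 74, 40, 53]

rank_x = ranks_ascending(english)
rank_y = ranks_ascending(maths)

n = len(english)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(rank_x)
print(rank_y)
print(sum_d2, round(r_s, 3))
```

The ranks and the sum of squared differences should reproduce the solution table, giving 1 - 432/504, which is about 0.143.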
Tied Ranks
A tied rank occurs when two participants receive the same rank on the same variable (e.g. two people are tied for first on variable X). Tied ranks result in an incorrect value of $r_s$, so resolve (correct) any tied ranks before computing $r_s$: for each participant at a tied rank, assign the mean of the ranks that would have been used had there not been a tie.
Example
Runner  Race X  Race Y  To resolve ties                      New Y
A       4       1       Tie uses ranks 1 and 2, becomes 1.5  1.5
B       3       1       Tie uses ranks 1 and 2, becomes 1.5  1.5
C       2       2       Becomes 3rd                          3
D       1       3       Becomes 4th                          4
Example
Runner  Race X  New Y  D     D^2
A       4       1.5    2.5   6.25
B       3       1.5    1.5   2.25
C       2       3      -1    1
D       1       4      -3    9
$\sum D^2 = 18.5$
Solution
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(18.5)}{4(4^2 - 1)} = 1 - \frac{111}{60} = 1 - 1.85 = -0.85$$
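The tie-resolution rule can be sketched as a small helper that gives each tied value the mean of the rank positions it occupies. The function name is an assumption for illustration. The sketch follows the procedure in these notes; when ties are present, the shortcut formula is only an approximation to Pearson's r computed on the ranks.

```python
def average_ranks(values):
    """Rank values with 1 = smallest; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1  # first rank position occupied by v
        count = ordered.count(v)      # how many entries are tied at v
        ranks.append(first + (count - 1) / 2)  # mean of first..first+count-1
    return ranks

race_x = [4, 3, 2, 1]  # finishing positions of runners A-D in race X
race_y = [1, 1, 2, 3]  # race Y positions as recorded, with A and B tied for first

new_y = average_ranks(race_y)  # resolves the tie
n = len(race_x)
sum_d2 = sum((x - y) ** 2 for x, y in zip(race_x, new_y))
r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(new_y)
print(sum_d2, round(r_s, 2))
```

The resolved ranks should match the New Y column, and the result should match the hand calculation, 1 - 111/60 = -0.85.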