Inferences for Correlation
Quantitative Methods II

Plan for Today
Recall: correlation coefficient
Bivariate normal distributions
Hypothesis testing for population correlation
Confidence intervals for population correlation
Bivariate Analysis
Is there a relationship between two variables? For example, is there a relationship between a person's income and their level of educational attainment?
Questions of this type are studied in bivariate analysis, where "bi" indicates two variables. Statisticians also work with more than two variables, which leads to multivariate analysis.

Correlation Coefficient r
The correlation coefficient was developed by Karl Pearson in the early 1900s as a numerical measure of the strength and direction of the linear association between the independent variable x and the dependent variable y.
The value of r is always between −1 and 1.
If r > 0, the correlation is positive (when x increases, y tends to increase as well).
If r < 0, the correlation is negative (when x increases, y tends to decrease).
Correlation Coefficient r
r = −1: perfect negative correlation
−1 < r < −0.6: strong negative correlation
−0.6 < r < −0.3: moderate negative correlation
−0.3 < r < 0: weak negative correlation
r = 0: no correlation
0 < r < 0.3: weak positive correlation
0.3 < r < 0.6: moderate positive correlation
0.6 < r < 1: strong positive correlation
r = 1: perfect positive correlation

A formula for the coefficient of correlation
$$r = \frac{s_x}{s_y}\, b$$
where $s_x$ and $s_y$ are the standard deviations of the x- and y-data respectively, and the slope is
$$b = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$$
However, the correlation coefficient is much faster and easier to compute using your scientific calculator!
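As a rough illustration of the formula above, the following Python sketch computes r as (s_x / s_y) · b. The function name `pearson_r` is hypothetical, and the data are the five (hours, grade) pairs from the study-hours example later in these notes. Both standard deviations are computed with the same 1/n convention, which is an assumption that does not affect the result because the factor cancels in the ratio.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation via r = (s_x / s_y) * b, where b is the regression slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    mean_y2 = sum(y * y for y in ys) / n
    # Slope b = (mean(xy) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
    b = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    # Standard deviations with the 1/n convention (the factor cancels in s_x / s_y)
    s_x = sqrt(mean_x2 - mean_x ** 2)
    s_y = sqrt(mean_y2 - mean_y ** 2)
    return (s_x / s_y) * b

# Data from the study-hours example in these notes
hours = [2, 5, 1, 4, 2]
grades = [80, 80, 70, 90, 60]
r = pearson_r(hours, grades)
print(round(r, 4))  # 0.6138
```

The printed value agrees with the r = 0.6138 used in the study-hours example.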
Salaries and education
The table on the left represents a sample of 11 individuals working for the government of Quebec, their annual salary in thousands of dollars, and their educational attainment in years.

Computing the correlation coefficient
Using the formulas from before or the built-in calculator functions, compute:
r = 0.6395
This is an example of a strong positive correlation.
Bivariate normal distribution
We shall always assume that the set of ordered pairs (x, y) comes from a bivariate normal distribution. This means that for a fixed value of x the values of y are normally distributed, and for a fixed value of y the values of x are normally distributed as well.
In most cases, the results are still accurate if the distributions are bell-shaped and symmetrical, and the y-variances are approximately equal.

Hypothesis testing for correlation
The population correlation is denoted by the Greek letter ρ ("rho") and the sample correlation by r.
The null hypothesis is always that the values of x and y have no linear correlation, that is, H0: ρ = 0.
The alternative hypothesis is always HA: ρ ≠ 0 (a two-tailed test).
The test statistic
To test hypotheses for a population correlation, we use the Student's t distribution with n − 2 degrees of freedom:
$$df = n - 2$$
The test statistic t is given by
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
Here n is the number of pairs of data (x, y).

Example: study hours and grades
Five students have recorded the number of hours they studied for an exam and their grades:

Hours    2    5    1    4    2
Grade   80   80   70   90   60

Assuming a bivariate normal population, test at a 5% level of significance whether the correlation between the number of hours of study and the grade is significant.
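For readers who want to check a test like this numerically, here is a minimal Python sketch of the statistic and a two-tailed p-value. The function names are hypothetical. The p-value is obtained by integrating the Student's t density with Simpson's rule, which is a swapped-in numerical approximation rather than the table lookup used in these notes. The inputs r = 0.6138 and the critical value 3.182 are the hand-computed and tabulated values for this example.

```python
from math import sqrt, gamma, pi

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n the number of (x, y) pairs."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value for Student's t (Simpson's rule, even steps)."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - area)       # both tails

r, n = 0.6138, 5                  # study-hours example
df = n - 2
t = t_statistic(r, n)
p = two_tailed_p(t, df)
t_crit = 3.182                    # t-table value for df = 3, two-tailed alpha = 0.05
reject = abs(t) > t_crit

print(f"t = {t:.2f}, df = {df}, p = {p:.3f}, reject H0: {reject}")
```

Comparing |t| with the tabulated critical value and comparing the p-value with α always lead to the same decision, so either check is enough in practice.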
Example: study hours and grades (continued)
First, compute the sample correlation coefficient, using the formulas or a calculator. We have r = 0.6138.
State the hypotheses: H0: ρ = 0, HA: ρ ≠ 0 (a two-tailed test).
The test statistic:
$$t = \frac{0.6138\sqrt{5-2}}{\sqrt{1-0.6138^2}} = 1.35$$
The number of degrees of freedom is df = 3.
The critical values: $\pm t_{3,\,0.025} = \pm 3.182$
The p-value ≈ 2 × 0.135 = 0.270 > 0.05 = α
Decision: fail to reject H0, since |t| = 1.35 < 3.182. The sample does not provide sufficient evidence of a linear correlation between study hours and grades.

Example: reading time and TV
Do reading and TV viewing compete for leisure time? To find out, a psychologist interviewed a random sample of 15 children regarding the number of books they had read during the last year and the number of hours they had spent watching TV on a daily basis.
If a correlation coefficient of −0.715 is obtained, is the correlation significant at the 5% level of significance? Assume that it's a bivariate normal population.
Example: reading time and TV (continued)
Let us state the hypotheses: H0: ρ = 0, HA: ρ ≠ 0 (a two-tailed test).
The test statistic:
$$t = \frac{-0.715\sqrt{15-2}}{\sqrt{1-(-0.715)^2}} = -3.69$$
The number of degrees of freedom is df = 13.
The critical values: $\pm t_{13,\,0.025} = \pm 2.16$
(Sketch the curve and the regions of rejection.)
The p-value ≈ 0.003 < 0.05 = α
Decision: reject H0, since |t| = 3.69 > 2.16.

The confidence intervals
We start with the Fisher transformation:
$$Z = \frac{1}{2}\ln\frac{1+r}{1-r}$$
It turns out that, for a bivariate normal population, Z is (approximately) normally distributed with standard deviation $1/\sqrt{n-3}$.
So the confidence interval for $\mu_Z$ is
$$c = Z - \frac{z_{\alpha/2}}{\sqrt{n-3}} < \mu_Z < Z + \frac{z_{\alpha/2}}{\sqrt{n-3}} = d$$
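The interval for μ_Z above can be sketched in a few lines of Python. The function names `fisher_z` and `z_interval` are hypothetical, and the critical value z(α/2) is taken from the standard library's `statistics.NormalDist` instead of a z-table, so it carries more decimals than the rounded table values. The inputs are r = 0.6138 and n = 5 from the study-hours example.

```python
from math import log, sqrt
from statistics import NormalDist

def fisher_z(r):
    """Fisher transformation Z = (1/2) * ln((1 + r) / (1 - r))."""
    return 0.5 * log((1 + r) / (1 - r))

def z_interval(r, n, confidence):
    """Endpoints (c, d) of the confidence interval for mu_Z."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z(alpha/2), e.g. about 1.96 for 95%
    half_width = z_crit / sqrt(n - 3)              # standard deviation of Z is 1/sqrt(n - 3)
    z = fisher_z(r)
    return z - half_width, z + half_width

c, d = z_interval(0.6138, 5, 0.95)
print(round(c, 4), round(d, 4))
```

Note that the formula requires n > 3, and that very small samples such as n = 5 give wide intervals.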
The confidence intervals (continued)
Now we perform the inverse Fisher transformation to get the confidence interval for the population correlation ρ:
$$\frac{e^{2c}-1}{e^{2c}+1} < \rho < \frac{e^{2d}-1}{e^{2d}+1}$$
Make sure you can compute these quantities correctly on your scientific calculator! Let us consider examples.

Example: study hours and grades
Let us find the 95% confidence interval for ρ. Recall that we have r = 0.6138.
Do the Fisher transformation:
$$Z = \frac{1}{2}\ln\frac{1+0.6138}{1-0.6138} = 0.7150$$
$z_{\alpha/2} = 1.96$. The confidence interval for $\mu_Z$:
$$0.7150 - \frac{1.96}{\sqrt{2}} < \mu_Z < 0.7150 + \frac{1.96}{\sqrt{2}}$$
so
$$c = -0.6709 < \mu_Z < 2.1009 = d$$
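The inverse Fisher transformation is easy to mistype on a calculator, so a small Python check may help. This is only a sketch: the function name `inverse_fisher` is hypothetical, and the endpoints are the values c = −0.6709 and d = 2.1009 from the study-hours example. The expression (e^{2z} − 1)/(e^{2z} + 1) is the hyperbolic tangent, so `math.tanh` is used as a cross-check.

```python
from math import exp, tanh

def inverse_fisher(z):
    """Inverse Fisher transformation: (e^(2z) - 1) / (e^(2z) + 1)."""
    return (exp(2 * z) - 1) / (exp(2 * z) + 1)

c, d = -0.6709, 2.1009            # interval for mu_Z, study-hours example
lower, upper = inverse_fisher(c), inverse_fisher(d)
print(round(lower, 4), round(upper, 4))

# The same mapping is available as tanh, which many calculators also provide
assert abs(lower - tanh(c)) < 1e-12
assert abs(upper - tanh(d)) < 1e-12
```

Because the transformation is increasing, the lower endpoint c always maps to the lower bound for ρ and d to the upper bound.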
Example: study hours and grades (continued)
Now we'll do the inverse Fisher transformation.
$$\frac{e^{2(-0.6709)}-1}{e^{2(-0.6709)}+1} < \rho < \frac{e^{2(2.1009)}-1}{e^{2(2.1009)}+1}$$
After computation, we find the 95% confidence interval for the population correlation coefficient ρ:
$$-0.5856 < \rho < 0.9705$$
The interval contains 0, which agrees with the decision not to reject H0 at the 5% level.

Example: reading time and TV
Let us find the 94% confidence interval for ρ. Recall that we have r = −0.715.
Do the Fisher transformation:
$$Z = \frac{1}{2}\ln\frac{1+(-0.715)}{1-(-0.715)} = -0.8973$$
$z_{\alpha/2} = 1.88$. The confidence interval for $\mu_Z$:
$$-0.8973 - \frac{1.88}{\sqrt{12}} < \mu_Z < -0.8973 + \frac{1.88}{\sqrt{12}}$$
so
$$c = -1.4400 < \mu_Z < -0.3546 = d$$
Example: reading time and TV (continued)
Now we'll do the inverse Fisher transformation.
$$\frac{e^{2(-1.44)}-1}{e^{2(-1.44)}+1} < \rho < \frac{e^{2(-0.3546)}-1}{e^{2(-0.3546)}+1}$$
After computation, we find the 94% confidence interval for the population correlation coefficient ρ:
$$-0.8937 < \rho < -0.3404$$

Example: salt and anxiety (practice)
Is there a correlation between a person's salt intake and their level of stress and anxiety? A study of 32 volunteers found a correlation coefficient of 0.26 between the participants' salt intake and the amplitude of their adrenaline spikes.
(a) Test at a 5% level of significance whether the population correlation is significant.
(b) Construct a 95% confidence interval for the population correlation coefficient.
Assume a bivariate normal population.
Example: immigration and GDP (practice)
Do immigration rates correlate with GDP (gross domestic product)? A researcher took data from 40 different countries and found the correlation coefficient for her sample to be equal to 0.44.
(a) Test at a 1% level of significance whether there is a significant correlation between immigration rates and GDP.
(b) Construct a 98% confidence interval for the population correlation coefficient.