PS2: Two Variable Statistics
LT2: Measuring Correlation and Line of Best Fit by Eye
LT3: Linear Regression
LT4: The χ² Test of Independence
Pearson's Correlation Coefficient

In examinations you are expected to calculate r using technology: enter the data in your calculator and run LinReg(ax+b) to find r. This method is simple and shouldn't be too much of a worry on your paper. However, calculating r using the formula is recommended for the Internal Assessment to get full marks on the mathematical process portion. For practice we will only look at a few points rather than a large data set, due to time.

The formula is:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

You may recognize the bottom. What does this look similar to?

The top is the covariance (before dividing by the number of points), which measures how the x and y values of each point vary together relative to the means. Comparing every point with the means in this way gives r, a measure of correlation from -1 to 1.
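As a rough check on the formula, here is a minimal Python sketch that computes r directly from the sums in the numerator and denominator. The function name `pearson_r` and the two tiny data sets are illustrative assumptions, not course data; the points were chosen to lie exactly on a rising line and on a falling line, so r should land on the two ends of its range.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the long formula (hypothetical helper name)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of the products of the deviations from the means
    top = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations
    bottom = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return top / bottom

# Made-up points on a perfectly rising line, then on a perfectly falling line
rising = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
falling = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])
print(round(rising, 4), round(falling, 4))
```

Real data almost never sits exactly on a line, so values strictly between -1 and 1 are the normal case.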
Pearson's Correlation Coefficient

Also, an important side note:

$$r = \frac{S_{xy}}{S_x S_y}$$

S_x = standard deviation of x
S_y = standard deviation of y
S_xy = covariance of x and y

Sometimes the covariance is given, and all we need is to find the standard deviations of x and y in order to calculate r.

Calculating r by hand

Using the long equation

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}},$$

let's look at an example and find the values needed to find r.

Example: Daisy investigates how the volume of water in a pot affects the time it takes to boil on the stove. The results are given in the table. Find and interpret Pearson's correlation coefficient between the two variables.

| Pot | Volume (x, L) | Time to boil (y, min) |
|-----|---------------|-----------------------|
| A   | 1             | 2                     |
| B   | 2             | 4                     |
| C   | 4             | 7                     |
| D   | 6             | 9                     |
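The two forms of r can be compared numerically. The sketch below uses Daisy's four data points and assumes the population versions of the covariance and standard deviation (dividing by n); the same divisor appears on top and bottom, so it cancels as long as it is used consistently. Variable names such as `r_short` and `r_long` are illustrative only.

```python
import math
import statistics

volume = [1, 2, 4, 6]      # x, litres
time_min = [2, 4, 7, 9]    # y, minutes

n = len(volume)
mean_x = statistics.mean(volume)
mean_y = statistics.mean(time_min)

# S_xy: population covariance, the average product of the deviations
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(volume, time_min)) / n
# S_x and S_y: population standard deviations
s_x = statistics.pstdev(volume)
s_y = statistics.pstdev(time_min)
r_short = s_xy / (s_x * s_y)

# Long formula, for comparison
top = sum((x - mean_x) * (y - mean_y) for x, y in zip(volume, time_min))
bottom = math.sqrt(
    sum((x - mean_x) ** 2 for x in volume)
    * sum((y - mean_y) ** 2 for y in time_min)
)
r_long = top / bottom

print(round(r_short, 4), round(r_long, 4))
```

Both routes should print the same value, which is a handy way to check a by-hand table.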
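Calculating r by hand

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

Each column of the table needs to be filled in.

| Pot   | Volume (x, L) | Time to boil (y, min) | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|-------|---------------|-----------------------|---------------|---------------|------------------------------|-------------------|-------------------|
| A     | 1             | 2                     |               |               |                              |                   |                   |
| B     | 2             | 4                     |               |               |                              |                   |                   |
| C     | 4             | 7                     |               |               |                              |                   |                   |
| D     | 6             | 9                     |               |               |                              |                   |                   |
| Total |               |                       |               |               |                              |                   |                   |

Calculating the means of x and y should be a cinch!

$$\bar{x} = \frac{\sum x}{4} = \qquad \bar{y} = \frac{\sum y}{4} =$$

r =

Try One On Your Own

Period 7's test scores and IQs for 5 people are given below. By hand, find r and interpret it to decide what type of correlation there is.

| Person | Score (x) | IQ (y) |
|--------|-----------|--------|
| 1      | 66        | 124    |
| 2      | 49        | 126    |
| 3      | 55        | 130    |
| 4      | 68        | 168    |
| 5      | 58        | 101    |
| Total  |           |        |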
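Try One On Your Own

| Person | Score (x) | IQ (y) | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|--------|-----------|--------|---------------|---------------|------------------------------|-------------------|-------------------|
| 1      | 66        | 124    | 6.8           | -5.8          | -39.44                       | 46.24             | 33.64             |
| 2      | 49        | 126    | -10.2         | -3.8          | 38.76                        | 104.04            | 14.44             |
| 3      | 55        | 130    | -4.2          | 0.2           | -0.84                        | 17.64             | 0.04              |
| 4      | 68        | 168    | 8.8           | 38.2          | 336.16                       | 77.44             | 1459.24           |
| 5      | 58        | 101    | -1.2          | -28.8         | 34.56                        | 1.44              | 829.44            |
| Total  | 296       | 649    |               |               | 369.2                        | 246.8             | 2336.8            |

$$\bar{x} = \frac{\sum x}{5} = 59.2, \qquad \bar{y} = \frac{\sum y}{5} = 129.8$$

$$r = \frac{369.2}{\sqrt{246.8 \times 2336.8}} \approx 0.4862$$

Now that you've done this once or twice, click here for something amazing by an amazing person.

| Score (x) | IQ (y) | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|-----------|--------|---------------|---------------|------------------------------|-------------------|-------------------|
| 66        | 124    |               |               |                              |                   |                   |
| 49        | 126    |               |               |                              |                   |                   |
| 55        | 130    |               |               |                              |                   |                   |
| 68        | 168    |               |               |                              |                   |                   |
| 58        | 101    |               |               |                              |                   |                   |
| 31        | 111    |               |               |                              |                   |                   |
| 60        | 199    |               |               |                              |                   |                   |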
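For anyone who wants to check a by-hand table, the columns can be rebuilt with a short Python sketch. It uses the five Period 7 scores and IQs from the slide; the variable names and print layout are illustrative choices, and the totals should agree with the by-hand values up to rounding.

```python
import math

scores = [66, 49, 55, 68, 58]       # x
iqs = [124, 126, 130, 168, 101]     # y

mean_x = sum(scores) / len(scores)
mean_y = sum(iqs) / len(iqs)

# One row per person: x, y, x - mean, y - mean, product, squared deviations
rows = []
for x, y in zip(scores, iqs):
    dx = x - mean_x
    dy = y - mean_y
    rows.append((x, y, dx, dy, dx * dy, dx ** 2, dy ** 2))

# Column totals needed by the formula for r
total_xy = sum(row[4] for row in rows)
total_xx = sum(row[5] for row in rows)
total_yy = sum(row[6] for row in rows)
r = total_xy / math.sqrt(total_xx * total_yy)

for row in rows:
    print("  ".join(f"{value:9.2f}" for value in row))
print("totals:", round(total_xy, 2), round(total_xx, 2), round(total_yy, 2))
print("r =", round(r, 4))
```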
r²: The Coefficient of Determination

To help describe the correlation between two variables, we can also calculate the coefficient of determination r². This is simply the square of Pearson's product-moment correlation coefficient r, and as such the direction of the correlation is eliminated.

r describes the direction of the correlation and how strong it is.
r² describes how much of the variation in one variable is explained by its linear relationship with the other variable, usually quoted as a percent.

Do not get these confused. The IA specifically states to dock points if students get these mixed up, as evidence that the student doesn't know what r stands for.

Yep. That's me.
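The difference between r and r² can be seen numerically. This sketch reuses the Period 7 scores and IQs; the helper `pearson_r` and the trick of negating the y values (to reverse the direction of the trend) are illustrative assumptions, not part of the course method.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the long formula (hypothetical helper name)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    top = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    bottom = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return top / bottom

scores = [66, 49, 55, 68, 58]
iqs = [124, 126, 130, 168, 101]

r = pearson_r(scores, iqs)
r_squared = r ** 2

# Negating every y value reverses the direction of the relationship
r_flipped = pearson_r(scores, [-y for y in iqs])

print("r =", round(r, 4), " r^2 =", round(r_squared, 4))
print("flipped r =", round(r_flipped, 4), " r^2 =", round(r_flipped ** 2, 4))
```

The sign of r changes when the trend is reversed, while r² stays the same, which is why r² alone cannot tell you the direction.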
3C
LINE OF BEST FIT BY EYE

What is the line of best fit? A line we can draw to best represent the relationship between two variables.

How do we calculate this line? Well, it's by eye, so we never really get a very accurate line, but something close will do. Here is how you do it!
LOBF: the calculations by hand

Step 1: Calculate the mean of the x values, $\bar{x}$, and the mean of the y values, $\bar{y}$.
Step 2: Mark the mean point $(\bar{x}, \bar{y})$ on the scatter diagram.
Step 3: Draw a line through the mean point which fits the trend of the data, so that about the same number of data points are above the line as below it.

This process is an estimate and can therefore result in some discrepancies. Make sure you use a straight edge for all your lines of best fit by eye.

Example

A group of LCC students were surveyed on how much they run a week. The data was recorded in a table and the results were as follows.

1) Plot the points on a scatterplot (accuracy is important).
2) Find the line of best fit by eye (graph $(\bar{x}, \bar{y})$).
3) Describe the correlation (strength, direction, outliers, etc.).

| Age (x) | Distance, miles (y) |
|---------|---------------------|
| 16      | 11                  |
| 66      | 3                   |
| 49      | 21                  |
| 23      | 17                  |
| 22      | 11                  |
| 55      | 1                   |
| 71      | 2                   |
| 58      | 6                   |
| 31      | 14                  |
| 60      | 8                   |

[Blank scatter plot grid: Distance Miles (y) vs Age (x)]
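Steps 1 to 3 can be sanity-checked with a small Python sketch: it finds the mean point for the running data and counts how many points fall above and below a trial line through that point. The trial slope of -0.25 miles per year of age is a made-up eyeball guess, and the function name `count_above_below` is hypothetical; try other slopes to see how the balance changes.

```python
ages = [16, 66, 49, 23, 22, 55, 71, 58, 31, 60]    # x
miles = [11, 3, 21, 17, 11, 1, 2, 6, 14, 8]        # y

# Step 1: the mean point
mean_age = sum(ages) / len(ages)
mean_miles = sum(miles) / len(miles)

def count_above_below(slope):
    """Count points above and below the line through the mean point."""
    above = below = 0
    for x, y in zip(ages, miles):
        line_y = mean_miles + slope * (x - mean_age)
        if y > line_y:
            above += 1
        elif y < line_y:
            below += 1
    return above, below

trial_slope = -0.25   # assumption: a slope read off a by-eye line
above, below = count_above_below(trial_slope)
print("mean point:", (round(mean_age, 1), round(mean_miles, 1)))
print("above:", above, "below:", below)
```

A roughly even split is what Step 3 asks for; the count is only a guide and does not replace judging the overall trend by eye.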
INTERPOLATION AND EXTRAPOLATION

Using the line of best fit we can make predictions about values we don't know. For instance, on the previous graph we had 10 different ages. On a scale from 0-70+ we have many more possibilities.

Interpolation is an estimate of a data point between the lowest x value (lower pole) and the highest x value (upper pole), using the line of best fit.
Extrapolation is an estimate of a data point outside the lower pole and upper pole, using the line of best fit.

TOK: Think about this! Are there any limitations to interpolation or extrapolation? Think in terms of the previous slides. How many miles should a 1 year old run? Do all people run the same at age 20?

Example

On a hot day, nine cars were left in the sun in a car parking lot. The length of time each car was left in the sun (in minutes) was recorded, as well as the temperature inside the car at the end of the period.

| Car  | A   | B  | C  | D  | E  | F   | G   | H  | I  |
|------|-----|----|----|----|----|-----|-----|----|----|
| Time | 50  | 5  | 25 | 40 | 15 | 45  | 55  | 10 | 15 |
| Temp | 100 | 70 | 88 | 96 | 77 | 110 | 121 | 80 | 73 |

A. Calculate the mean of both variables.
B. Draw a scatter diagram of the data.
C. Plot the point $(\bar{x}, \bar{y})$ on the scatter diagram and then draw the line of best fit.
D. Predict:
   - the temperature at 35 minutes
   - the temperature at 75 minutes
   Comment on the reliability of your predictions.
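To make the interpolation versus extrapolation distinction concrete, here is a hedged Python sketch for the car data. It computes the mean point, then predicts with a line through that point. The slope of 1.0 degree per minute is purely an assumed by-eye value (your own line will differ), and the names `predict_temp` and `kind_of_estimate` are hypothetical.

```python
times = [50, 5, 25, 40, 15, 45, 55, 10, 15]        # minutes in the sun
temps = [100, 70, 88, 96, 77, 110, 121, 80, 73]    # temperature inside the car

# Mean point that the by-eye line must pass through
mean_time = sum(times) / len(times)
mean_temp = sum(temps) / len(temps)

ASSUMED_SLOPE = 1.0   # assumption: degrees per minute, read off a by-eye line

def predict_temp(minutes):
    """Prediction from the line through the mean point with the assumed slope."""
    return mean_temp + ASSUMED_SLOPE * (minutes - mean_time)

def kind_of_estimate(minutes):
    """Interpolation if inside the observed x range, otherwise extrapolation."""
    if min(times) <= minutes <= max(times):
        return "interpolation"
    return "extrapolation"

print("mean point:", (round(mean_time, 1), round(mean_temp, 1)))
for minutes in (35, 75):
    print(minutes, "min:", round(predict_temp(minutes), 1), kind_of_estimate(minutes))
```

The 75 minute prediction lies outside the observed times, so it leans entirely on the assumption that the linear trend continues, which is why it is the less reliable of the two.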
Homework

11B.2 P. 325-326 #1-3 (Saputo page numbers 11-14)
11B.3 P. 327 #1-4
11C P. 330 #1-3