Correlation (pp. 1 of 6)

Car dealers want to know how mileage affects price on used Corvettes. Biologists are studying the effects of temperature on cricket chirps. Farmers are trying to determine if there is a relationship between the yield of a type of grain and the time between flowering and harvesting in that type of grain. In the collection and analysis of these data, it is important to determine not only how the two variables are related but also the strength of the relationship.

Why would the strength of the relationship be important?

1. Study the scatterplots below and describe the relationship between the two variables.

[Scatterplots: Graph A, Graph B, Graph C, Graph D]

2009, TESCCC 08/01/09 page 54 of 69
2. Use the scatterplots to compare the strength of the relationship between the variables.

Let's Play Ball

The following set of data gives the percentage of games won by baseball teams in the American League and the average attendance (rounded to the nearest thousand) for each home game in the 1993 season.

3. Sketch a scatterplot and use the graph to determine if there is a relationship between the variables. Find a linear regression model for the relationship. Study the graph and predict the strength of the relationship.

Team                  Percentage of   Average Attendance per
                      Games Won       Home Game (in thousands)
Baltimore Orioles     53              45
Boston Red Sox        49              30
California Angels     44              25
Chicago White Sox     58              32
Cleveland Indians     47              27
Detroit Tigers        53              24
Kansas City Royals    52              24
Milwaukee Brewers     43              21
Minnesota Twins       44              25
New York Yankees      54              30
Oakland Athletics     42              25
Seattle Mariners      51              25
Texas Rangers         53              28
Toronto Blue Jays     59              50
In order to make appropriate and accurate predictions and decisions from the data, we must go further than just analysis of the graphs. One factor to consider is the value of the correlation coefficient, r. This is a numeric value that assesses the strength of the relationship between the first and second coordinates in a set of points.

The value of the correlation coefficient lies in the range -1 ≤ r ≤ 1.

If the points in a scatterplot do not appear to have any relationship, the correlation value will be approximately zero.
If the points in a scatterplot all fall on a regression line that increases from left to right, the linear correlation value will equal +1.
If the points in a scatterplot all fall on a regression line that decreases from left to right, the linear correlation value will equal -1.

[Figures: example scatterplots with r = +1 and r = -1]

The common correlation value classifications are given in the following list. The classification depends on the size of r, not its sign, so the ranges apply to |r|. Correlation can be found for both linear and non-linear data.

0 < |r| < 0.33       Weak, very weak, to no correlation as it approaches 0
0.34 < |r| < 0.67    Moderate correlation
0.68 < |r| < 1.00    Strong, very strong, to perfect at ±1.00

4. Analyze the scatterplots on Graphs A, B, C, and D on the first page of this handout and estimate the correlation coefficient value for each graph. Explain your reasoning.
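The classification list above can be written as a small lookup. The following is a minimal Python sketch and is not part of the original handout. The function name classify_correlation is hypothetical, and the treatment of values that fall between the listed bands (for example 0.335) is an assumption: each band is extended up to the start of the next one.

```python
def classify_correlation(r):
    """Return the handout's strength label for a correlation value r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    strength = abs(r)  # the sign gives the direction, not the strength
    if strength < 0.34:   # assumption: closes the gap between 0.33 and 0.34
        return "weak"
    if strength < 0.68:   # assumption: closes the gap between 0.67 and 0.68
        return "moderate"
    return "strong"


# Try a few sample values, positive and negative.
for value in (0.10, -0.50, 0.90, -1.00):
    print(value, classify_correlation(value))
```

Because the label depends only on the absolute value, r = -0.50 and r = 0.50 receive the same classification.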
5. Stronger correlation values mean that the model can better be used to make predictions and decisions from the data. Which sets of data in Graphs A, B, C, and D could best be used to make predictions? Explain your reasoning.

Calculating the Correlation Value by Formula (Optional)

The correlation coefficient, r, can be calculated manually for a set of (x, y) data points using the following formula.

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \, S_x \, S_y} $$

$\bar{x}$ = mean of the x variables
$x$ = individual domain variables
$\bar{y}$ = mean of the y variables
$y$ = individual range variables
$S_x$ = standard deviation of the x variables
$S_y$ = standard deviation of the y variables
$\sum$ = sum of all values
$n$ = number of points

Correlation Coefficient by Graphing Calculator

Although the correlation coefficient can be calculated manually, it is much quicker and more reliable to use statistical software or the graphing calculator. When the regression equation is determined on the graphing calculator, the calculator can simultaneously give the r-value, if the diagnostic tool of the calculator is turned on.

Turn on the diagnostics: CATALOG, scroll to DiagnosticOn, ENTER, ENTER
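For readers who want to check the optional manual calculation with software, here is a minimal Python sketch. It is not part of the original handout. It assumes the formula is the sum of the products (x - mean of x)(y - mean of y), divided by n times S_x times S_y, and it assumes S_x and S_y are population standard deviations (computed by dividing by n). If sample standard deviations were used instead, the divisor would have to be n - 1. The function name correlation and the sample points are made up for illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r from the deviation-product formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumption: population standard deviations (divide by n, not n - 1).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n * s_x * s_y)


# Illustrative points that lie exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(round(correlation(xs, ys), 6))
```

Since the sample points fall exactly on an increasing line, the result should agree with the +1 case described earlier, up to floating-point rounding.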
6. Enter the data comparing percentage of games won to average attendance at baseball games in 1993 into the L1 and L2 lists in the graphing calculator. Graph a scatterplot of the data on the calculator. Run a linear regression. (Be sure to turn on the diagnostics.) What is the strength of the linear relationship according to the r-value? Enter the linear function in Y1. Use the graph and function model to describe the situation.
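Statistical software can be used in place of the graphing calculator for this exercise. The following Python sketch is not part of the original handout. It fits a least-squares line to the 1993 baseball data from the table earlier in this handout and reports the r-value, which is assumed to correspond to what the calculator's linear regression shows with diagnostics on. The function name linear_regression is hypothetical.

```python
from math import sqrt

# (percentage of games won, average home attendance in thousands),
# copied in order from the 1993 table in this handout.
data = [
    (53, 45), (49, 30), (44, 25), (58, 32), (47, 27), (53, 24), (52, 24),
    (43, 21), (44, 25), (54, 30), (42, 25), (51, 25), (53, 28), (59, 50),
]


def linear_regression(points):
    """Return (slope, intercept, r) of the least-squares line y = slope*x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)   # spread of x
    syy = sum((y - mean_y) ** 2 for _, y in points)   # spread of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = linear_regression(data)
print(f"y = {slope:.3f}x + {intercept:.3f}")
print(f"r = {r:.3f}")
```

Compare the printed r-value with the classification list to judge the strength of the relationship, and use the printed equation as the function to enter in Y1.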
Independent Practice

1. Sketch a scatterplot to represent each of the following. Give a range between which the r-value would fall.
   a. Moderate, positive, linear correlation
   b. Moderate, negative, quadratic correlation
   c. Perfect, negative, linear correlation
   d. Weak, positive, linear correlation
   e. Strong, positive, exponential correlation

2. The table below gives the first semester and second semester scores for last year's Math Models course. Analyze the data and use the data to determine if the first semester scores for this year's class would make good predictors for the second semester scores. Do any scores not fit the predictions? If so, explain the possible reasons.

1st Semester    2nd Semester
81              80
75              82
71              83
61              57
96              100
56              30
85              68
18              56
70              40
77              87
71              65
91              86
88              82
79              57
77              75
68              47