Section 9-1 Correlation

A correlation is a relationship between two variables. The data can be represented by ordered pairs (x, y), where x is the independent (or explanatory) variable and y is the dependent (or response) variable.

There are several types of correlation that can be identified by graphing a scatter-plot of the ordered pairs and looking at the pattern.
If the dots tend to run from left to right in a more or less upward fashion, the correlation is positive.
If the dots tend to run from left to right in a more or less downward fashion, the correlation is negative.
If the dots tend to be all over the graph with no pattern, there is no correlation.
If the dots form a pattern other than a line (a curve, for example), the correlation is nonlinear.

The correlation coefficient is a measure of the strength and direction of a linear relationship between two variables.
The sample correlation coefficient is denoted by the letter r.
The population correlation coefficient is denoted by ρ, the Greek letter rho (pronounced "row").
The correlation coefficient runs from -1 to 1; the closer the value is to either end, the stronger the linear relationship is.
A correlation coefficient of 1 would signify a perfect positive linear relationship.
A correlation coefficient of -1 would signify a perfect negative linear relationship.
A correlation coefficient of 0 would signify no linear relationship.

While there is a formula for finding the value of r, we are going to use the calculator to find this for us.

Steps to graphing a scatter-plot and finding the correlation coefficient on the calculator:
1) Turn StatPlot on. Press 2nd Y=, select Plot 1, turn it on, and make sure that it is looking at L1 and L2.
2) STAT-EDIT, enter data points. Use L1 for the x-values and L2 for the y-values.
3) Set your window (WINDOW).
Set x-min to a number less than the smallest x-value in your list.
Set x-max to a number greater than the largest x-value in your list.
Set y-min to a number less than the smallest y-value in your list.
Set y-max to a number greater than the largest y-value in your list.
4) Hit the GRAPH key to look at your scatter-plot.
5) To find the correlation coefficient, run STAT-Test-F. The calculator will give you the values for r² and r.
We'll talk more about r² later, but for now we are looking at r to quantify the strength of the relationship. This test also gives you the equation of the line of regression, as well as an abundance of other information that we will use later, including p.

To graph the regression line, simply enter the equation into the Y= screen and press GRAPH. The line will appear and go through the scatter-plot that you already have.

Once we have a number that represents the strength of the relationship, we need to determine whether or not this relationship is significant. This is necessary to determine whether the line can be used for predicting y-values.

There are two ways to determine if the relationship is significant. Since we have been doing hypothesis testing for two chapters now, we will use the hypothesis test method of determining whether a relationship is significant as our first choice.

The hypotheses are written in the following way: to test whether there is any correlation at all, the hypotheses are
H0: ρ = 0 and Ha: ρ ≠ 0
Notice that this means that the null hypothesis states that there is no relationship. In this class, we will ONLY conduct the two-tailed test for any significance.

Once we have the hypotheses written, we will conduct a t-test to test them. STAT-Test-F (LinRegTTest) will give us the t-score, as well as the r value, the p value, and the equation of best fit.
Enter data into L1 and L2, then run STAT-Test-F, making sure to indicate that you are running a two-tailed test. Set Freq: to 1 and leave RegEQ: blank.
If p ≤ α, reject H0. If p > α, fail to reject H0.

The Pearson Correlation Coefficient chart (Table 11), found on page A28 in the back of your book, can only be used if the desired α level is 0.05 or 0.01. To use the chart, simply use the number of pairs of data (n) for the row and the α for the column to find the critical value.
If the absolute value of r is greater than the critical value, the relationship is significant.
If the absolute value of r is less than or equal to the critical value, the relationship is not significant.

Correlation and Causation

It is important to remember that just because two variables are related does not necessarily mean that one causes the other. There are 4 possibilities:
1) A cause-and-effect relationship between the variables: x causes y. For example, spending more money on advertising results in more sales.
2) A cause-and-effect relationship between the variables: y causes x. For example, maybe more time between Old Faithful eruptions causes the next one to last longer, instead of the other way around.
3) A third, as yet unknown, variable may be causing both x and y. The Chapter Opener on page 495 shows a positive correlation between a movie's budget and its ticket sales. Which one causes the other? Maybe they are both caused by the actors who star in the movies. Big stars demand more money to appear in films (budget goes up).
Big stars draw more people to the theaters to see their movies (ticket sales go up). Maybe they are both caused by the hype generated by the movie studio prior to the release of the movie. Advertising causes the budget to go up. Advertising may lure more people into the theater to see the movie (ticket sales go up).
4) The variables only appear to be related; it's a coincidence. For example, there may be a strong positive correlation between the number of coyotes living in an area and the number of families owning more than two cars in that same area, but it is highly unlikely that one causes the other. The relation would probably be due to coincidence.

Example 3 (Page 498)
Old Faithful, located in Yellowstone National Park, is the world's most famous geyser. The duration (in minutes) of several of Old Faithful's eruptions and the times (in minutes) until the next eruption are shown in the table below. Display the data in a scatter-plot and determine whether there appears to be a positive or negative linear correlation or no linear correlation at all.

Duration, x   1.80 1.82 1.90 1.93 1.98 2.05 2.13 2.30 2.37 2.82 3.13 3.27 3.65
Time, y       56   58   62   56   57   57   60   57   61   73   76   77   77

Duration, x   3.78 3.83 3.88 4.10 4.27 4.30 4.43 4.47 4.53 4.55 4.60 4.63
Time, y       79   85   80   89   90   89   89   86   89   86   92   91

STAT-Edit
Enter Duration (x) values into L1.
Enter Time (y) values into L2.
2nd Y=, turn Plot 1 On.
Select the scatter-plot (first option).
Make sure that the correct lists are being looked at.
Window
Set x-min to something less than the smallest x-value in your data set.
Set x-max to something greater than the largest x-value in your data set.
Repeat for y.
Graph
This plot appears to show a positive linear correlation.

Example 5 (Page 501)
Use a technology tool (TI-84 Plus) to calculate the correlation coefficient for the Old Faithful data given in Example 3. What can you conclude?
STAT-Test-F. r ≈ ______; since this value is pretty close to 1, it suggests a strong positive linear correlation.

Example 6 (Page 503)
Using the data from Example 5, and the Pearson Correlation Coefficient chart on page A28, determine whether the correlation coefficient is significant.
To use the table, simply look at the row for n and the column for α. This is your critical value. If the absolute value of r is greater than the critical value, the correlation is significant.
Looking at the Pearson Correlation Coefficient chart, the critical value for n = 25 and α = .05 is ______.
Since the r value that we got when we ran the test was r ≈ ______, and ______ > ______, we conclude that the correlation is significant.
At the 5% level of significance, there is enough evidence to conclude that there is a linear correlation between the duration of Old Faithful's eruptions and the time between eruptions.

Example 6 (Page 503)
Using the data from Example 5, and the hypothesis testing method, determine whether the correlation coefficient is significant.
Write the hypotheses: H0: ρ = 0, Ha: ρ ≠ 0 (claim)
STAT-Test-F. Set for L1 and L2 and a two-tailed test.
We get a t-value (standardized test statistic) of ______. Our p-value is ______.
Since p ≤ α, we reject H0. Remember that the null says that there is no significant correlation. Since we rejected that, we are saying that there is a significant correlation.
At the 5% level of significance, there is enough evidence to conclude that there is a linear correlation between the duration of Old Faithful's eruptions and the time between eruptions.

Example 7 (Page 505)
Using the data from Example 4 (provided below), test the significance of this correlation coefficient. Use α = 0.05.

Advertising $ (in thousands)    2.4  1.6  2.0  2.6  1.4  1.6  2.0  2.2
Company Sales (in thousands)    225  184  220  240  180  184  186  215

H0: ρ = 0 (no correlation)
Ha: ρ ≠ 0 (significant correlation)
Enter Advertising values into L1 and Sales values into L2.
Run STAT-Test-F. Designate L1 and L2 and specify a two-tailed test. Don't change Freq or RegEQ.
The results are: t ≈ 5.478; p ≈ 0.0015; r ≈ 0.913
Since p ≤ α, we reject H0. This means that there is a significant correlation between advertising expenses and company sales. Also, since r is positive, it is a positive correlation.
At the 5% significance level, there is enough evidence to conclude that there is a linear correlation between advertising expenses and company sales.

Section 9-2 Equation of Best Fit for Linear Regression

The only thing in Section 9-2 that is new is to use the equation of the line of best fit to make predictions about y-values. You can only use the equation to make predictions if the correlation is significant!! That's why we ran the tests in 9-1 to determine whether the correlation is significant or not.
When you run STAT-Test-F, you get the equation of the line of best fit, too. Using the data from Example 7 in 9-1, we got the following:
y = a + bx; a = 104.061 and b = 50.729, so the equation is ŷ = 50.729x + 104.061

Example 3 (Page 516)
Use the equation of best fit from Example 7 in 9-1 to predict the expected company sales for the following advertising expenses.
a) 1.5 thousand
b) 1.8 thousand
c) 2.5 thousand
Remember, we have already determined that the correlation is significant, so this equation can be used for making predictions.
1) Plug each value of x into the equation to find the y-value prediction.
a) ŷ = 50.729(1.5) + 104.061 ≈ 180.155, or $180,155
b) ŷ = 50.729(1.8) + 104.061 ≈ 195.373, or $195,373
c) ŷ = 50.729(2.5) + 104.061 ≈ 230.884, or $230,884
2) Enter the equation into Y= (y = 50.729x + 104.061). Use 2nd WINDOW to set the beginning of the table, then use 2nd GRAPH to see the y-value for each x.
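If you want to check the calculator's line and the three predictions away from the TI-84, the following is a minimal Python sketch. It assumes the standard least-squares formulas for the slope and intercept; the function name least_squares is made up for this illustration. Because it works with unrounded coefficients, the last decimal place can differ slightly from the hand calculations above.

```python
# Advertising data from Example 7 of Section 9-1 (both in thousands of dollars).
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def least_squares(xs, ys):
    """Return (b, a) for the line y = a + bx (hypothetical helper name)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    # Slope, then intercept from the two sample means.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return b, a


b, a = least_squares(x, y)
print(f"y = {b:.3f}x + {a:.3f}")

# Predicted sales for advertising expenses of 1.5, 1.8, and 2.5 thousand.
predictions = {x0: a + b * x0 for x0 in (1.5, 1.8, 2.5)}
for x0, y_hat in predictions.items():
    print(f"x = {x0}: predicted sales = {y_hat:.3f} thousand")
```

The printed slope and intercept should agree with the a and b values reported by STAT-Test-F to three decimal places.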
Section 9-3

We already know how to calculate the correlation coefficient, r. The square of this coefficient is called the coefficient of determination. The coefficient of determination is equal to the ratio of the explained variation to the total variation.
In other words, if r² = .81, then 81% of the variation in y can be explained by the relationship between x and y. The other 19% of the variation is unexplained and is due to other factors or to sampling error.

How to find the Standard Error of Estimate:
1) Go to STAT-Edit. Put x values into L1 and y values into L2.
2) STAT-Test-F. The Standard Error of Estimate is the standard deviation of the residuals (s). Scroll down the list of values given as the results of the test, and find s.

Construct a Prediction Interval for a Specific x-value (x0).
1) Determine degrees of freedom (n - 2).
2) Use the regression equation and the given x (x0) to find ŷ.
3) Find the critical t value (tc) that corresponds to the level of confidence (c) by using the calculator (invT((1 - c)/2), with degrees of freedom being found at step 1). The calculator returns a negative number for this left-tail area; use its absolute value.
4) Use the tc value and the se value to calculate the margin of error (E).

$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

n = sample size
x0 is the x value that you used to find ŷ.
x̄ is the sample mean of the x-values.
Σx² is the total of all the squared x's. Square first, then add them up.
(Σx)² is the total of the x's, squared. Add first, then square the answer.
The values for x̄, Σx², and Σx can all be found by going to STAT-Calc 1 (1-Var Stats).
5) Find the left and right endpoints by subtracting E from ŷ and then adding E to ŷ. These answers are your interval.

Example 1 (Page 526)
The correlation coefficient for the advertising expenses and company sales data as calculated in Example 4 of Section 9-1 is r ≈ 0.913. Find the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation?
r² ≈ 0.83. About 83% of the variation in the company sales can be explained by the variation in the advertising expenditures. About 17% (the rest) of the variation is unexplained and is due to chance or other variables.
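For anyone who wants to see where r and r² come from without the calculator, here is a short Python sketch. It assumes the usual computational formula for the sample correlation coefficient (the sums of x, y, xy, x², and y²); the function name correlation_coefficient is just a label for this example. It uses the advertising and sales data from Example 7 of Section 9-1.

```python
import math

# Advertising expenses and company sales (both in thousands of dollars).
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r from the raw sums (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    sum_y2 = sum(yi * yi for yi in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_coefficient(x, y)
r_squared = r ** 2  # coefficient of determination

print(f"r   = {r:.3f}")
print(f"r^2 = {r_squared:.3f}")
print(f"explained variation:   {r_squared:.1%}")
print(f"unexplained variation: {1 - r_squared:.1%}")
```

Squaring the unrounded r and squaring the rounded value 0.913 can give answers that differ in the third decimal place, so expect small rounding differences between this output and a hand calculation.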
Example 2 (Page 528)
The regression equation for the advertising expenses and company sales data as calculated in Example 1 of Section 9-2 is ŷ = 50.729x + 104.061. Find the standard error of estimate.

x   2.4  1.6  2.0  2.6  1.4  1.6  2.0  2.2
y   225  184  220  240  180  184  186  215

1) Go to STAT-Edit. Put x values into L1 and y values into L2.
2) STAT-Test-F. Find s on the list of values given from this test.
3) The standard error of the estimate is se ≈ 10.29.

Example 3 (Page 530)
Using the results of Example 2, construct a 95% prediction interval for the company sales when the advertising expenses are $2100. What can you conclude?
We were told that the advertising expenses are $2100, so we plug in 2.1 for x to find ŷ.
ŷ = 50.729(2.1) + 104.061 ≈ 210.592
From here, we need to be able to use the formula for the margin of error. It's not pretty, but it works.

$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

tc = 2.447 (2nd VARS 4, with area (1 - .95)/2 and 6 degrees of freedom; use the absolute value of the result).
se = 10.29, from the last example.
n = 8
x0 = 2.1 (this is the x value we used to find ŷ).
x̄ = 1.975, Σx² = 32.44, Σx = 15.8 (these values are from STAT-Calc 1 (1-Var Stats)).

$$E = (2.447)(10.29)\sqrt{1 + \frac{1}{8} + \frac{8(2.1 - 1.975)^2}{8(32.44) - (15.8)^2}} \approx 26.857$$

We subtract 26.857 from 210.592 to get the left end of the estimate: 183.735.
We add 26.857 to 210.592 to get the right end of the estimate: 237.449.
183.735 < y < 237.449
We can be 95% confident that when advertising expenses are $2100, the company sales will be between $183,735 and $237,449.
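The standard error of estimate and the prediction interval in Examples 2 and 3 can be reproduced with the following Python sketch. It assumes the standard definition of the standard error of estimate (the square root of the sum of squared residuals divided by n - 2) and takes the critical value tc = 2.447 from the worked example as a constant rather than computing it. All variable names are chosen for this illustration only.

```python
import math

# Advertising expenses and company sales (both in thousands of dollars).
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# Least-squares line y = a + bx.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Standard error of estimate: based on the residuals, with n - 2 degrees of freedom.
sum_sq_residuals = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sum_sq_residuals / (n - 2))

# Assumption: critical t for c = 0.95 and 6 degrees of freedom, copied from
# the worked example (invT on the calculator), not computed here.
T_C = 2.447

x0 = 2.1            # $2100 of advertising, expressed in thousands
x_bar = sum_x / n
y_hat = a + b * x0  # point prediction at x0

# Margin of error for the prediction interval.
E = T_C * s_e * math.sqrt(
    1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
)
lower, upper = y_hat - E, y_hat + E

print(f"s_e   = {s_e:.3f}")
print(f"y_hat = {y_hat:.3f}")
print(f"E     = {E:.3f}")
print(f"{lower:.3f} < y < {upper:.3f}")
```

Since the sketch keeps the unrounded slope, intercept, and standard error, its endpoints may differ from the hand calculation in the last decimal place.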