Chapter 4 Data with Two Variables


1 Scatter Plots and Correlation and 2 Pearson's Correlation Coefficient

Looking for Correlation

Example: Does the number of hours you watch TV per week affect your average grade in a class? (Table columns: Hours, Grade)

To see whether there is a relationship, we will create a scatter plot and analyze it.

Definition: A scatter plot is a graphical representation of the relationship between two quantitative variables. The two values may come from the same individual (e.g., education v. income, height v. weight) or from paired individuals (e.g., ages of partners in a relationship).

Scatter Plots

When working with scatter plots, there are two variables. They may be of two different types.

Definition: A response variable measures the outcome of a study.

Definition: An explanatory variable may explain or influence changes in a response variable.

Explanatory variables are often called independent and are plotted on the x-axis. Response variables are often called dependent and are plotted on the y-axis.

Back to Our Example

In our example, which is the explanatory variable? Hours of TV watched.

The response variable is therefore the average grade. So the question we are trying to answer is: does watching TV influence the average grade in a class?

Let's plot the data and see what we have.

The Scatter Plot

[Figure: Grades v. Hours of TV; x-axis: Hours of TV, y-axis: Grade]

How Does the Relationship Look?

What do we think? It looks like the more hours of TV that are watched, the lower the average grade. But how good is the relationship? We can measure this in different ways. One is the direction (+ or −) and another is the strength. Both are captured by Pearson's Correlation Coefficient.

Facts About Pearson's Correlation Coefficient

1. $-1 \le r \le 1$. The weakest correlation is 0 and the strongest correlation is ±1. Whether r is positive or negative only tells us which direction the relationship goes: whether y increases as x increases or y decreases as x increases. Being negative is not bad.
2. Correlation makes no distinction between x and y, that is, between the choice of explanatory and response variables. We still need to be careful, though, as the next part (the regression line) depends heavily on the correct choice.
3. Correlation measures only the linear relationship.
4. Correlation is not resistant.
5. Correlation has no units.

So How Do We Find Pearson's Correlation Coefficient?

The Correlation Coefficient

$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{S_x}\right)\left(\frac{y_i - \bar{y}}{S_y}\right) = \frac{1}{n-1} \sum z_x z_y$

Let's find the correlation coefficient for our example. First, we need a few values: $\bar{x}$, $\bar{y}$, $S_x$, and $S_y$. For our data, $S_y = 8.971$.

Finding Pearson's Correlation Coefficient

For each pair, find the z-score of each value and multiply the two z-scores together. Sum the products, then divide by $n - 1$.

(Table columns: $i$, $z_x$, $z_y$, product)

$r = \frac{1}{n-1} \sum z_x z_y$

Interpretation: moderate negative correlation.
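The z-score procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the `hours`/`grades` values are made up for the sketch and are not the data from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r as the sum of paired z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Multiply the two z-scores for each pair, sum, then divide by n - 1.
    products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data, NOT the values from the slides.
hours = [2, 4, 6, 8, 10]
grades = [90, 85, 80, 70, 75]
r = pearson_r(hours, grades)
print(round(r, 4))
```

For this made-up data the result is r = −0.9, a strong negative correlation.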

Another Way To Calculate

We can use a formula similar to the one we used for the standard deviation to find the correlation coefficient. First, we need a few formulas.

The Three S's

$S_{xx} = \sum (x - \bar{x})^2$
$S_{yy} = \sum (y - \bar{y})^2$
$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$

Pearson's Correlation Coefficient

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$

Another Way To Calculate

This looks just as complicated as the other method, or worse, but there are simpler ways to calculate the three S's that make this method less painful.

Calculating the Three S's

$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$
$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$

And we can get these sums from the calculator...

Another Way To Calculate

From the calculator: $\sum x = 69$, $\sum y = 533$, $\sum xy = 5083$.

Another Way To Calculate

So we have

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{S_{xy}}{\sqrt{(142.86)(482.86)}} = -0.6505$

The value is negative, which matches the moderate negative correlation found with the z-score method.
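As a rough check of the shortcut formulas, the sketch below plugs in the sums quoted on the slides. One value is an assumption: the sample size is not legible in the transcript, so `n = 7` is inferred here from $S_y = 8.971$ and $S_{yy} = 482.86$ (since $S_{yy} = (n-1)S_y^2$).

```python
from math import sqrt

# Sums and S-values quoted on the slides.
sum_x, sum_y, sum_xy = 69, 533, 5083
s_xx, s_yy = 142.86, 482.86

# Assumption: n is not stated in the transcript; 7 is inferred from
# S_yy = (n - 1) * S_y**2 with S_y = 8.971.
n = 7

# Shortcut formula for S_xy, then r = S_xy / sqrt(S_xx * S_yy).
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / sqrt(s_xx * s_yy)
print(round(s_xy, 2), round(r, 4))
```

Under that assumption the computation gives $S_{xy} \approx -170.86$ and $r \approx -0.6505$, in line with the value on the slide.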

So Can We Say There Is A Relationship?

So, can we say that there is a direct relationship between the number of hours of TV watched and the average grade? Not so fast...

Correlation does not necessarily imply causation.

Just because the data look the part does not mean we have evidence of a relationship. We have to consider a couple of other things. One is lurking variables: variables that may be present but that we are not actually considering in the data.

Can you think of any lurking variables that would affect our example?

Significance

We also need to test for significance to see what is going on.

If $|r|\sqrt{n} > 3$, the correlation is significant. Otherwise it is not significant.

The smaller the value of $|r|\sqrt{n}$, the less likely it is that the correlation is significant.

Reasons why data may not be significant:

1. Genuine lack of correlation
2. Not enough data

Our example is not significant because there is too little data, so we cannot conclude that watching TV has a direct impact on grades.
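The rule of thumb above is easy to express as a small helper. This is a sketch under stated assumptions: the function name `is_significant` is hypothetical, the test statistic is read as $|r|\sqrt{n}$ with the threshold 3 from the slide, and `n = 7` for the TV example is inferred rather than stated in the transcript.

```python
from math import sqrt

def is_significant(r, n, threshold=3):
    """Rule of thumb from the slides: significant when |r| * sqrt(n) > threshold."""
    return abs(r) * sqrt(n) > threshold

# TV example: r = -0.6505 from the slides; n = 7 is an inferred assumption.
tv_statistic = abs(-0.6505) * sqrt(7)
tv_significant = is_significant(-0.6505, 7)

# Same r with a hypothetical larger sample, to show the "not enough data" effect.
larger_sample_significant = is_significant(-0.6505, 25)
print(round(tv_statistic, 2), tv_significant, larger_sample_significant)
```

With seven observations the statistic is about 1.72, well under 3. The same r with 25 observations would clear the threshold, which is why "not enough data" is listed as a reason for non-significance.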

Assumptions and Conditions for Correlation

Quantitative Variables Condition: Don't make the common error of calling an association involving a categorical variable a correlation. Correlation is only about quantitative variables.

Straight Enough Condition: The best check for the assumption that the variables are truly linearly related is to look at the scatter plot to see whether it looks reasonably straight. That's a judgment call, but not a difficult one.

No Outliers Condition: Outliers can distort the correlation dramatically, making a weak association look strong or a strong one look weak. Outliers can even change the sign of the correlation. But it's easy to see outliers in the scatter plot, so to check this condition, just look.
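To illustrate the No Outliers Condition, here is a minimal sketch with made-up points (not from the slides) in which a single outlier flips the sign of the correlation. The helper `pearson_r` is a hypothetical name and uses the three S's shortcut formulas.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via the shortcut formulas for S_xx, S_yy, and S_xy."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: five points lying exactly on a decreasing line.
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 3, 2, 1]
r_clean = pearson_r(xs, ys)

# Add one far-away point (20, 20) and recompute.
r_outlier = pearson_r(xs + [20], ys + [20])
print(round(r_clean, 4), round(r_outlier, 4))
```

Without the extra point the correlation is perfectly negative; with it, r becomes strongly positive, even though five of the six points still fall on a decreasing line.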

Another Example

Example: The following table gives the power numbers for the starting nine of the 2007 Boston Red Sox. Is there a relationship between the number of home runs and the number of RBIs? Does the number of home runs affect the number of RBIs? Produce a scatter plot and discuss the correlation.

Player      Home Runs   RBIs
Varitek
Youkilis
Pedroia     8           50
Lowell
Lugo        8           73
Ramirez
Crisp       6           60
Drew
Ortiz

Red Sox Example

Which is the explanatory variable? Which is the response variable?

Since we are asking whether home runs (HR) affect RBIs, HR is the explanatory variable and therefore x, and RBIs is the response variable y.

[Figure: Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]

Before We Go On

Something to notice: we have two values with the same x-coordinate (Pedroia and Lugo each hit 8 home runs).

[Figure: Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]

Finding Pearson's Correlation Coefficient

What is our guess as to the correlation? Now let's find Pearson's correlation coefficient on the calculator.

1. Input the data in the usual way, with the explanatory variable under L1 and the response variable under L2.
2. Press STAT and scroll to TESTS.
3. Select LinRegTTest.
4. Make sure the XList and YList are the lists where the data for the explanatory and response variables are located, respectively.
5. Press Calculate and scroll to find r and r².

Using Technology

For our example, we have r = .8463 and r² = .7162.

So the correlation coefficient tells us that there is a strong positive correlation.

What Does r² Tell Us?

r² tells us how much better our predictions will be if we go through the trouble of finding the regression line rather than just making our predictions with the means. Ours is pretty good here, indicating that we should find the regression line.

[Figure: Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]

Technology and Scatter Plots

We can also create a scatter plot on the calculator.

1. Make sure there are no functions in the grapher (press Y= to check).
2. Input the data in the usual way (we already have it there for this example).
3. Press 2nd and Y= to get into the STAT PLOT menu.
4. Make sure only the plot we want is turned on.
5. Select the first graph in the first row and then make sure the XList and YList are correct.
6. Press ZOOM 9.

One More Example

Example: There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. The accompanying table gives data on yearly wine consumption (in liters of alcohol from drinking wine per person) and yearly deaths from heart disease (per 100,000 people) in 19 developed nations. Construct a scatter plot and describe what you see.

(Table columns: Country, Alcohol, Deaths)

Countries: Australia, Austria, Belgium, Canada, Denmark, Finland, France, Iceland, Ireland, Italy, Netherlands, New Zealand, Norway, Spain, Sweden, Switzerland, United Kingdom, United States, West Germany

The Scatter Plot

[Figure: Heart Disease v. Alcohol from Wine; x-axis: Alcohol from Wine (in liters), y-axis: Deaths (per 100,000)]

r = −.8428, a strong negative correlation.

r² = .7103, so it is worthwhile to find the linear regression line.

3 Slopes and Equations of Fitted Lines

This Section Is...

The focus of this section is the idea of a fitted curve and the formulas used to find the equation of a line given two points. We will not talk about this section here unless we need to.

4 The Least Squares Line

Getting Started

Definition: A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Linear functions are of the form $y = mx + b$, but we will write them as $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope. The calculator actually uses the form $y = a + bx$, so be careful.

Formulas

What we will be finding is the least squares regression line of y on x. This is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Least Squares Regression Line

$b_1 = r \frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$

If the correlation coefficient is too small, there is no point in finding $\hat{y}$, since $b_0$ and $b_1$ both depend on r.
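The two formulas can be sketched directly in Python. The function name `least_squares_line` and the data below are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (b0, b1, r) using b1 = r * s_y / s_x and b0 = mean_y - b1 * mean_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations of x and y.
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Pearson's correlation coefficient.
    r = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = mean_y - b1 * mean_x
    return b0, b1, r

# Hypothetical data, not from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r = least_squares_line(xs, ys)
print(round(b0, 2), round(b1, 2), round(r, 4))
```

For this data the line is $\hat{y} = 2.2 + 0.6x$. Note that the slope always has the same sign as r, since $s_x$ and $s_y$ are positive.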

Example Using Given Values

Example: The following list gives the power numbers for the starting nine Red Sox players for the 2007 season. We want to know whether the number of home runs affects the number of RBIs.

Name             Home Runs   RBIs
Jason Varitek
Kevin Youkilis
Dustin Pedroia   8           50
Mike Lowell
Julio Lugo       8           73
Manny Ramirez
Coco Crisp       6           60
J.D. Drew
David Ortiz

Variable Types

What is the response variable? What is the explanatory variable?

The Needed Values

We can find the mean and standard deviation of both sets of data quickly using our technology.

(Table columns: Variable, Mean, Standard Deviation; rows: x, y)

And, since the data is already in the calculator, we can obtain the values of r and r² quickly as well: r = .8463 and r² = .7162.

The Correlation Coefficient

How would we describe this correlation coefficient? It indicates a strong, positive correlation.

We can use these values to find the equation of the regression line.

$b_1 = r \frac{s_y}{s_x} = 2.29$
$b_0 = \bar{y} - b_1 \bar{x} = 80.33 - 2.29(15.78) = 44.19$

So, the regression line is $\hat{y} = 44.19 + 2.29x$.

Alternative Calculation

We can also use the three S's to find the equation of the least squares line.

Least Squares Regression Line

$m = \frac{S_{xy}}{S_{xx}}$ and $b = \bar{y} - m\bar{x}$

Alternative Calculation

$S_{xy} = \sum xy - \frac{(142)(723)}{9}$
$S_{xx} = 2896 - \frac{(142)^2}{9} = 655.56$
$m = \frac{S_{xy}}{S_{xx}} = 2.29$
$b = \bar{y} - m\bar{x} = 80.33 - 2.29(15.78) = 44.19$

And we have the same results.
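The arithmetic for this alternative calculation can be checked with a short sketch. It uses only the sums that survive in the slides ($\sum x = 142$, $\sum y = 723$, $\sum x^2 = 2896$, $n = 9$). The value of $\sum xy$ is not legible in the transcript, so $S_{xy}$ is not recomputed here and the slope 2.29 is taken from the slides as given.

```python
# Sums quoted on the slides for the 2007 Red Sox example.
n = 9
sum_x, sum_y, sum_x2 = 142, 723, 2896

# Slope as stated on the slides; sum(xy) is not available in the
# transcript, so S_xy (and hence the slope) is not recomputed here.
slope = 2.29

# Shortcut formula: S_xx = sum(x^2) - (sum x)^2 / n.
s_xx = sum_x2 - sum_x ** 2 / n

# Intercept from b = mean_y - m * mean_x.
mean_x = sum_x / n
mean_y = sum_y / n
intercept = mean_y - slope * mean_x
print(round(s_xx, 2), round(mean_x, 2), round(mean_y, 2), round(intercept, 2))
```

Using unrounded means, the intercept comes out near 44.20. The slides' 44.19 results from rounding $\bar{x}$ to 15.78 and $\bar{y}$ to 80.33 before subtracting.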

Practical Interpretation

What do these values mean in practical terms?

The slope tells us that a change of one unit in the explanatory variable is associated with a change in the response variable by the amount and direction of the slope. In our example, the slope is $b_1 = 2.29$, which tells us that for every additional home run hit, you'd expect an additional 2.29 RBIs.

The y-intercept tells us the value of the response variable when the explanatory variable is 0. In our example $b_0 = 44.19$, which means that we expect a player who hits no home runs to have 44.19 RBIs.

Note: In context, we may need to round to whole numbers for the answers to make sense.

The Scatter Plot

Let's see how good the regression line is by plotting it over the scatter plot. To do so, we press Y= and put the line under Y1, then select GRAPH.

[Figure: Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]

Plot and Line

And now with the regression line $\hat{y} = 44.19 + 2.29x$:

[Figure: 2007 Red Sox Power Numbers with regression line; x-axis: Home Runs, y-axis: RBIs]

Predictions

One use of the regression line is making predictions. Suppose we wanted to know about how many RBIs we could expect a player to have if they hit 60 home runs. We are looking to predict the value of y (so we want $\hat{y}$) and we are given a value of x = 60.

Our prediction would be $\hat{y} = 44.19 + 2.29(60) = 181.59$.

So, our prediction is 182 RBIs.
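The prediction step is a single evaluation of the fitted line. In the sketch below the coefficients are the ones from the slides; the function name `predict_rbis` is a hypothetical choice for illustration.

```python
# Coefficients of the fitted line from the slides: y-hat = 44.19 + 2.29 x.
B0 = 44.19
B1 = 2.29

def predict_rbis(home_runs):
    """Predicted RBIs for a given number of home runs."""
    return B0 + B1 * home_runs

prediction = predict_rbis(60)
# Round to a whole number because RBIs are counts.
print(round(prediction, 2), round(prediction))
```

This reproduces the 181.59 on the slide, which rounds to 182 RBIs.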

Facts about Regression Lines

1. The distinction between explanatory and response variables is essential. Remember the formulas...
2. There is a close connection between slope and correlation.
3. The point $(\bar{x}, \bar{y})$ is always on the line.
4. r² gives the fraction of the variation in the values of y that is explained by the least squares regression line of y on x. In other words, r² gives the fraction of the data's variation accounted for by the model.
5. This only shows us the linear model; it is possible that there is little linear correlation but that the data are strongly related under some other type of model.
6. We will not always get perfect correlation (probably never), but we need the data to be straight enough for the line to make sense. What that means varies. Depending on the situation, r = .3 could be good enough; other times r = .8 would be a minimum.
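Facts 3 and 4 can be checked numerically. The sketch below uses made-up data (not from the slides) to confirm that the fitted line passes through $(\bar{x}, \bar{y})$ and that r² equals one minus the ratio of the residual sum of squares to the total variation in y.

```python
from math import sqrt

# Hypothetical data, not from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# The three S's, in their definitional form.
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = s_xy / sqrt(s_xx * s_yy)
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Fact 3: the line passes through the point of means.
y_hat_at_mean = b0 + b1 * mean_x

# Fact 4: fraction of the variation in y explained by the line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - sse / s_yy
print(round(r ** 2, 4), round(explained_fraction, 4), y_hat_at_mean, mean_y)
```

For this data both r² and the explained fraction come out to 0.6, and the predicted value at $\bar{x} = 3$ is $\bar{y} = 4$.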

Another Sox Example

Suppose we wanted to know if a player was expected to score more runs if he got more hits. To answer this question, we will use the roster of the 2011 Boston Red Sox.

Name                    Runs   Hits
Jarrod Saltalamacchia
Adrian Gonzalez
Dustin Pedroia
Marco Scutaro
Kevin Youkilis
Carl Crawford
Jacoby Ellsbury
J.D. Drew
David Ortiz
Jed Lowrie
Josh Reddick
Jason Varitek
Darnell McDonald
Mike Aviles
Mike Cameron            9      14
Drew Sutton
Ryan Lavarnway          5      9
Yamaico Navarro         6      8
Conor Jackson           2      3
Jose Iglesias           3      2
Lars Anderson           2      0
Joey Gathright          1      0

130 The Correlation Coefficient The first thing we will do is find the correlation coefficient. When we plug all of the data into our technology, we get r = .9942. Interpretation? There is a strong, positive correlation between hits and runs scored.
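For readers who want to see what the "technology" is doing, here is a minimal sketch of the computation. It uses only the seven roster rows whose values survive in this transcription, so it is an assumption-laden illustration: its result will not equal the r = .9942 reported for the full roster. The function name is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from sums of squared deviations: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Only the rows with surviving values: Cameron, Lavarnway, Navarro,
# Jackson, Iglesias, Anderson, Gathright. This is a subset, not the full roster.
hits = [14, 9, 8, 3, 2, 0, 0]   # explanatory variable (x)
runs = [9, 5, 6, 2, 3, 2, 1]    # response variable (y)

r_subset = pearson_r(hits, runs)
print(f"r for the surviving rows only: {r_subset:.4f}")
```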

137 Interpretation of r² What is the value of r² here? r² = .9885 What does this mean? r² tells us the fraction of the variability of the response variable that is accounted for by the variation in the explanatory variable. Here, this means that we can explain 98.85% of the variability in the runs total by the variability in the number of hits. For each standard deviation that x is above the mean of the explanatory variable, the predicted y will be r standard deviations above the mean of the response variable. Here, we would say that for every standard deviation a player is above the mean number of hits, we predict he will be .9942 standard deviations above the mean number of runs. Also note that 1 − r² is the fraction of the variation in the original data left in the residuals.
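The arithmetic behind these statements is short enough to sketch. The snippet below assumes r² = .9885 (from the 98.85% figure) and a positive correlation; the function name is hypothetical.

```python
import math

r_squared = 0.9885             # from "98.85% of the variability" on the slide
r = math.sqrt(r_squared)       # positive root, because the correlation is positive
unexplained = 1 - r_squared    # fraction of the variation left in the residuals

def predicted_z_runs(z_hits):
    """Predicted z-score for runs, given the z-score for hits: z_y-hat = r * z_x."""
    return r * z_hits

print(round(r, 4))                      # agrees with the reported r to four decimals
print(round(unexplained, 4))
print(round(predicted_z_runs(1.0), 4))  # one standard deviation above the mean in hits
```

Note that the multiplier in the z-score statement is r, not r².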

139 Producing the Scatter Plot Now, let's produce a scatter plot for the data. [Figure: Red Sox scatter plot, runs (y-axis) against hits (x-axis)]

144 The Assumptions The points give us a pretty good indication that there is a very strong positive correlation here. Before we go on, we want to make sure all of the assumptions about regression lines are met.
Quantitative Variable Condition: If either y or x is categorical, you cannot make a scatter plot and you cannot perform a regression.
Straight Enough Condition: Does the data look straight enough that we can see a linear relationship in the data set?
Outlier Condition: Are there any outliers that dramatically influence the fit of the regression line?
Does the Plot Thicken Condition: Does the spread of the data around the generally straight relationship seem to be consistent for all values of x?

150 Linear Regression Line Since all of these are satisfied, we will continue on to find the formula of the regression line. Using our technology, we have ŷ = b0 + b1x = 1.92 + .52x. What is the practical interpretation of the slope b1? For each additional hit, we expect a player to score .52 additional runs. What is the practical interpretation of the y-intercept b0? If a player has no hits, we expect 1.92 runs to be scored.

152 Scatter Plot With Regression Line [Figure: 2011 Red Sox scatter plot, runs against hits, with the regression line drawn over it] So, when we plot the regression line over the scatter plot, we see that the line is a good fit.

158 Predictions
1. If a player got 200 hits, how many runs would we expect them to have? Here, we are given the x value and, using our regression line, we find the predicted value: ŷ = 1.92 + .52(200) = 105.92. So, we'd expect about 106 runs for a player with 200 hits.
2. What if we wanted to know how many hits a player had if they scored 120 runs? We are given the value of y and want to find the value of x. So, we use our algebra skills...
y = 1.92 + .52x
120 = 1.92 + .52x
118.08 = .52x
227.08 ≈ x
We expect about 227 hits.
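Both predictions can be reproduced in a few lines. This is a minimal sketch that assumes the slope .52 and intercept 1.92 given in the interpretations of the regression line; the function names are hypothetical.

```python
B0 = 1.92  # y-intercept: expected runs for a player with no hits
B1 = 0.52  # slope: additional expected runs per additional hit

def predict_runs(hits):
    """Plug x into the regression line: y-hat = b0 + b1 * x."""
    return B0 + B1 * hits

def hits_for_runs(runs):
    """Solve the same line for x, as in the algebra on the slide."""
    return (runs - B0) / B1

print(round(predict_runs(200), 2))    # 105.92, about 106 runs
print(round(hits_for_runs(120), 2))   # 227.08, about 227 hits
```

The second function simply rearranges the line of y on x, mirroring the slide.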

163 Important Points A few important points to keep in mind:
1. An observation is influential for a statistical calculation if removing it would markedly change the results of the calculation. Points that are outliers in either the x or y direction are often influential points.
2. Correlation and least squares regression lines are not resistant.
3. They only describe linear relationships.
4. There could be lurking variables. Those are ones that are not among the explanatory or response variables but may influence the interpretation of the relationship.
5. An association between an explanatory variable x and a response variable y, even if the correlation is very strong, is not itself good evidence that changes in x actually cause changes in y. The phrase to remember is that correlation does not necessarily imply causation.
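One way to see what "influential" and "not resistant" mean is to recompute r with each observation left out in turn. The sketch below does that on hypothetical data (not from the slides) in which the last point is an outlier in both directions; the helper name is made up for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from sums of squared deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: five ordinary points plus one far-away point (20, 15).
x = [1, 2, 3, 4, 5, 20]
y = [2, 1, 3, 2, 3, 15]

r_all = pearson_r(x, y)

# Correlation with observation i removed, for each i.
r_without = [
    pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    for i in range(len(x))
]

# The observation whose removal changes r the most.
most_influential = max(range(len(x)), key=lambda i: abs(r_all - r_without[i]))

print(f"r with all points: {r_all:.3f}")
print(f"most influential index: {most_influential}, "
      f"r without it: {r_without[most_influential]:.3f}")
```

A single far-away point is enough to make a weak relationship look very strong here, which is the sense in which correlation is not resistant.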

164 Example Where You Are Doing The Work Example We want to know if there is a relationship between the score on the math portion of the SAT exam and the number of hours spent studying for the test. The question is, Does studying more increase the score on the exam? The following data was taken from a study of 20 students as they prepared for and took the SAT exam. [Table: Hours and Score for each of the 20 students]

168 Variable Types What is the response variable? Math SAT score. What is the explanatory variable? Hours of study.

171 Pearson's Correlation Coefficient So let's first find the correlation coefficient to see what we are dealing with. r = .9336 Our interpretation? This tells us there is a strong positive correlation.

174 And Now r² What about r²? r² = .8716 This tells us what? We can explain 87.16% of the variation in the Math SAT score by the variation in the hours of study.

178 Is The Data Significant? What is the inequality we are using? |r|√n > 3 Is this data significant? |r|√n = .9336 × √20 ≈ 4.18 > 3 So, the data is significant based on this criterion.
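The check is a one-line computation. The sketch below assumes the inequality reads |r|√n > 3 (the symbols are damaged in this transcription, so that form is a reconstruction) and takes n = 20 from the number of students in the study; the function name is hypothetical.

```python
from math import sqrt

def passes_rule_of_thumb(r, n, threshold=3.0):
    """Assumed rule of thumb: the correlation counts as significant if |r| * sqrt(n) > 3."""
    return abs(r) * sqrt(n) > threshold

r = 0.9336   # correlation between hours of study and Math SAT score
n = 20       # number of students in the study

statistic = abs(r) * sqrt(n)
print(round(statistic, 2), passes_rule_of_thumb(r, n))
```

With the same sample size, a weak correlation such as r = .3 would fall well short of the threshold.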

180 Visual Representation Next, let's produce our scatter plot so we can see what we are dealing with. [Figure: "Math SAT Score v. Hours of Study" scatter plot, SAT score (y-axis) against hours of study (x-axis)]

189 Checking Conditions/Assumptions We are feeling pretty good about this - it seems to have a strong, positive correlation. When we consider the conditions, are we still happy with this?
Quantitative Variable Condition: Both variables are quantitative.
Straight Enough Condition: Data looks reasonably straight.
Outlier Condition: There do not seem to be any outliers.
Does the Plot Thicken Condition: Pretty much - other than the one person who studied for 22 hours, the relationship seems very strong.

Chapter 4 Data with Two Variables

Chapter 4 Data with Two Variables Chapter 4 Data with Two Variables 1 Scatter Plots and Correlation and 2 Pearson s Correlation Coefficient Looking for Correlation Example Does the number of hours you watch TV per week impact your average

More information

Chapter 6 Scatterplots, Association and Correlation

Chapter 6 Scatterplots, Association and Correlation Chapter 6 Scatterplots, Association and Correlation Looking for Correlation Example Does the number of hours you watch TV per week impact your average grade in a class? Hours 12 10 5 3 15 16 8 Grade 70

More information

STATISTICS Relationships between variables: Correlation

STATISTICS Relationships between variables: Correlation STATISTICS 16 Relationships between variables: Correlation The gentleman pictured above is Sir Francis Galton. Galton invented the statistical concept of correlation and the use of the regression line.

More information

AP Statistics Two-Variable Data Analysis

AP Statistics Two-Variable Data Analysis AP Statistics Two-Variable Data Analysis Key Ideas Scatterplots Lines of Best Fit The Correlation Coefficient Least Squares Regression Line Coefficient of Determination Residuals Outliers and Influential

More information

Ch. 3 Review - LSRL AP Stats

Ch. 3 Review - LSRL AP Stats Ch. 3 Review - LSRL AP Stats Multiple Choice Identify the choice that best completes the statement or answers the question. Scenario 3-1 The height (in feet) and volume (in cubic feet) of usable lumber

More information

Bivariate Data Summary

Bivariate Data Summary Bivariate Data Summary Bivariate data data that examines the relationship between two variables What individuals to the data describe? What are the variables and how are they measured Are the variables

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Chapter 3: Examining Relationships Most statistical studies involve more than one variable. Often in the AP Statistics exam, you will be asked to compare two data sets by using side by side boxplots or

More information

BIVARIATE DATA data for two variables

BIVARIATE DATA data for two variables (Chapter 3) BIVARIATE DATA data for two variables INVESTIGATING RELATIONSHIPS We have compared the distributions of the same variable for several groups, using double boxplots and back-to-back stemplots.

More information

4.1 Introduction. 4.2 The Scatter Diagram. Chapter 4 Linear Correlation and Regression Analysis

4.1 Introduction. 4.2 The Scatter Diagram. Chapter 4 Linear Correlation and Regression Analysis 4.1 Introduction Correlation is a technique that measures the strength (or the degree) of the relationship between two variables. For example, we could measure how strong the relationship is between people

More information

Describing Bivariate Relationships

Describing Bivariate Relationships Describing Bivariate Relationships Bivariate Relationships What is Bivariate data? When exploring/describing a bivariate (x,y) relationship: Determine the Explanatory and Response variables Plot the data

More information

Chapter 8. Linear Regression /71

Chapter 8. Linear Regression /71 Chapter 8 Linear Regression 1 /71 Homework p192 1, 2, 3, 5, 7, 13, 15, 21, 27, 28, 29, 32, 35, 37 2 /71 3 /71 Objectives Determine Least Squares Regression Line (LSRL) describing the association of two

More information

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc. Chapter 8 Linear Regression Copyright 2010 Pearson Education, Inc. Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu: Copyright

More information

5.1 Bivariate Relationships

5.1 Bivariate Relationships Chapter 5 Summarizing Bivariate Data Source: TPS 5.1 Bivariate Relationships What is Bivariate data? When exploring/describing a bivariate (x,y) relationship: Determine the Explanatory and Response variables

More information

Example: Can an increase in non-exercise activity (e.g. fidgeting) help people gain less weight?

Example: Can an increase in non-exercise activity (e.g. fidgeting) help people gain less weight? Example: Can an increase in non-exercise activity (e.g. fidgeting) help people gain less weight? 16 subjects overfed for 8 weeks Explanatory: change in energy use from non-exercise activity (calories)

More information

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y Regression and correlation Correlation & Regression, I 9.07 4/1/004 Involve bivariate, paired data, X & Y Height & weight measured for the same individual IQ & exam scores for each individual Height of

More information

Relationships Regression

Relationships Regression Relationships Regression BPS chapter 5 2006 W.H. Freeman and Company Objectives (BPS chapter 5) Regression Regression lines The least-squares regression line Using technology Facts about least-squares

More information

Prob/Stats Questions? /32

Prob/Stats Questions? /32 Prob/Stats 10.4 Questions? 1 /32 Prob/Stats 10.4 Homework Apply p551 Ex 10-4 p 551 7, 8, 9, 10, 12, 13, 28 2 /32 Prob/Stats 10.4 Objective Compute the equation of the least squares 3 /32 Regression A scatter

More information

Chapter 10 Correlation and Regression

Chapter 10 Correlation and Regression Chapter 10 Correlation and Regression 10-1 Review and Preview 10-2 Correlation 10-3 Regression 10-4 Variation and Prediction Intervals 10-5 Multiple Regression 10-6 Modeling Copyright 2010, 2007, 2004

More information

Steps to take to do the descriptive part of regression analysis:

Steps to take to do the descriptive part of regression analysis: STA 2023 Simple Linear Regression: Least Squares Model Steps to take to do the descriptive part of regression analysis: A. Plot the data on a scatter plot. Describe patterns: 1. Is there a strong, moderate,

More information

If the roles of the variable are not clear, then which variable is placed on which axis is not important.

If the roles of the variable are not clear, then which variable is placed on which axis is not important. Chapter 6 - Scatterplots, Association, and Correlation February 6, 2015 In chapter 6-8, we look at ways to compare the relationship of 2 quantitative variables. First we will look at a graphical representation,

More information

Finite Mathematics : A Business Approach

Finite Mathematics : A Business Approach Finite Mathematics : A Business Approach Dr. Brian Travers and Prof. James Lampes Second Edition Cover Art by Stephanie Oxenford Additional Editing by John Gambino Contents What You Should Already Know

More information

Chapter 12: Linear Regression and Correlation

Chapter 12: Linear Regression and Correlation Chapter 12: Linear Regression and Correlation Linear Equations Linear regression for two variables is based on a linear equation with one independent variable. It has the form: y = a + bx where a and b

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics

More information

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation? Did You Mean Association Or Correlation? AP Statistics Chapter 8 Be careful not to use the word correlation when you really mean association. Often times people will incorrectly use the word correlation

More information

The empirical ( ) rule

The empirical ( ) rule The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%

More information

Chapter 6. September 17, Please pick up a calculator and take out paper and something to write with. Association and Correlation.

Chapter 6. September 17, Please pick up a calculator and take out paper and something to write with. Association and Correlation. Please pick up a calculator and take out paper and something to write with. Sep 17 8:08 AM Chapter 6 Scatterplots, Association and Correlation Copyright 2015, 2010, 2007 Pearson Education, Inc. Chapter

More information

3.2: Least Squares Regressions

3.2: Least Squares Regressions 3.2: Least Squares Regressions Section 3.2 Least-Squares Regression After this section, you should be able to INTERPRET a regression line CALCULATE the equation of the least-squares regression line CALCULATE

More information

Relationships between variables. Visualizing Bivariate Distributions: Scatter Plots

Relationships between variables. Visualizing Bivariate Distributions: Scatter Plots SFBS Course Notes Part 7: Correlation Bivariate relationships (p. 1) Linear transformations (p. 3) Pearson r : Measuring a relationship (p. 5) Interpretation of correlations (p. 10) Relationships between

More information

The response variable depends on the explanatory variable.

The response variable depends on the explanatory variable. A response variable measures an outcome of study. > dependent variables An explanatory variable attempts to explain the observed outcomes. > independent variables The response variable depends on the explanatory

More information

Ch Inference for Linear Regression

Ch Inference for Linear Regression Ch. 12-1 Inference for Linear Regression ACT = 6.71 + 5.17(GPA) For every increase of 1 in GPA, we predict the ACT score to increase by 5.17. population regression line β (true slope) μ y = α + βx mean

More information

CORRELATION. compiled by Dr Kunal Pathak

CORRELATION. compiled by Dr Kunal Pathak CORRELATION compiled by Dr Kunal Pathak Flow of Presentation Definition Types of correlation Method of studying correlation a) Scatter diagram b) Karl Pearson s coefficient of correlation c) Spearman s

More information

Learning Objectives. Math Chapter 3. Chapter 3. Association. Response and Explanatory Variables

Learning Objectives. Math Chapter 3. Chapter 3. Association. Response and Explanatory Variables ASSOCIATION: CONTINGENCY, CORRELATION, AND REGRESSION Chapter 3 Learning Objectives 3.1 The Association between Two Categorical Variables 1. Identify variable type: Response or Explanatory 2. Define Association

More information

1. Create a scatterplot of this data. 2. Find the correlation coefficient.

1. Create a scatterplot of this data. 2. Find the correlation coefficient. How Fast Foods Compare Company Entree Total Calories Fat (grams) McDonald s Big Mac 540 29 Filet o Fish 380 18 Burger King Whopper 670 40 Big Fish Sandwich 640 32 Wendy s Single Burger 470 21 1. Create

More information

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships Objectives 2.3 Least-squares regression Regression lines Prediction and Extrapolation Correlation and r 2 Transforming relationships Adapted from authors slides 2012 W.H. Freeman and Company Straight Line

More information

Solving Quadratic & Higher Degree Equations

Solving Quadratic & Higher Degree Equations Chapter 7 Solving Quadratic & Higher Degree Equations Sec 1. Zero Product Property Back in the third grade students were taught when they multiplied a number by zero, the product would be zero. In algebra,

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

Chapter 12 : Linear Correlation and Linear Regression

Chapter 12 : Linear Correlation and Linear Regression Chapter 1 : Linear Correlation and Linear Regression Determining whether a linear relationship exists between two quantitative variables, and modeling the relationship with a line, if the linear relationship

More information

determine whether or not this relationship is.

determine whether or not this relationship is. Section 9-1 Correlation A correlation is a between two. The data can be represented by ordered pairs (x,y) where x is the (or ) variable and y is the (or ) variable. There are several types of correlations

More information

IOP2601. Some notes on basic mathematical calculations

IOP2601. Some notes on basic mathematical calculations IOP601 Some notes on basic mathematical calculations The order of calculations In order to perform the calculations required in this module, there are a few steps that you need to complete. Step 1: Choose

More information
