Chapter 4 Data with Two Variables
- Darrell Mason
- 5 years ago
Section 1: Scatter Plots and Correlation, and Section 2: Pearson's Correlation Coefficient
Looking for Correlation

Example: Does the number of hours you watch TV per week impact your average grade in a class?

[Data table: Hours, Grade]

To see if there is a relationship, we will create a scatter plot and analyze it.

Definition: A scatter plot is a graphical representation of the relationship between two quantitative variables. The two values may come from the same individual (e.g., education v. income, height v. weight) or from paired individuals (e.g., ages of partners in a relationship).
Scatter Plots

When working with scatter plots, there are two variables, and they may play two different roles.

Definition: A response variable measures the outcome of a study.

Definition: An explanatory variable may explain or influence changes in a response variable.

Explanatory variables are often called independent variables and go on the x-axis. Response variables are often called dependent variables and go on the y-axis.
Back to Our Example

In our example, which is the explanatory variable? The hours of TV watched. The response variable is therefore the average grade.

So the question we are trying to answer is: does watching TV influence the average grade in a class? Let's plot the data and see what we have.
The Scatter Plot

[Figure: scatter plot, Grades v. Hours of TV; x-axis: Hours of TV, y-axis: Grade]
How Does the Relationship Look?

What do we think? It looks like the more hours of TV that are watched, the lower the average grade. But how strong is the relationship? We can describe it in two ways: by its direction (+ or -) and by its strength. Pearson's correlation coefficient captures both.
Facts About Pearson's Correlation Coefficient

1. $-1 \le r \le 1$. The weakest correlation is 0 and the strongest is ±1. The sign of r only tells us the direction of the relationship: whether y increases or decreases as x increases. A negative value is not bad.
2. Correlation makes no distinction between x and y, that is, between the choice of explanatory and response variables. We still need to choose carefully, though, because the regression line in the next part depends heavily on the correct choice.
3. Correlation measures only the linear relationship.
4. Correlation is not resistant.
5. Correlation has no units.
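Facts 2 and 5 can be checked numerically. The sketch below is only an illustration: the height and weight values are made up for this purpose, and `pearson_r` is a hypothetical helper name, not something from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up illustrative data (assumption, not from the slides).
heights_in = [60, 62, 65, 68, 71]
weights_lb = [110, 125, 140, 155, 180]

r_xy = pearson_r(heights_in, weights_lb)   # x explanatory, y response
r_yx = pearson_r(weights_lb, heights_in)   # roles swapped

# Change the units of x from inches to centimeters.
heights_cm = [h * 2.54 for h in heights_in]
r_cm = pearson_r(heights_cm, weights_lb)

print(r_xy, r_yx, r_cm)  # all three agree up to floating-point rounding
```

Swapping the variables or rescaling one of them leaves r unchanged, which is what "no distinction between x and y" and "no units" mean in practice.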
So How Do We Find Pearson's Correlation Coefficient?

The Correlation Coefficient:
$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{S_x}\right)\left(\frac{y_i - \bar{y}}{S_y}\right) = \frac{1}{n-1} \sum z_x z_y$

Let's find the correlation coefficient for our example. First, we need a few values: $\bar{x}$, $\bar{y}$, $S_x$, and $S_y$.

$\bar{x} = 9.857$, $\bar{y} = 76.143$, $S_x = 4.880$, $S_y = 8.971$
Finding Pearson's Correlation Coefficient

For each pair, find the z-score of each value and multiply the two z-scores together. Sum the products, then divide by n - 1.

[Table: i, $z_x$, $z_y$, product]

$r = \frac{1}{n-1} \sum z_x z_y = -0.6505$

Interpretation: Moderate negative correlation
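The z-score recipe above can be sketched in a few lines. The data here are made up for illustration (the TV and grade values are not reproduced), and the function names are hypothetical.

```python
import math

def z_scores(values):
    """z-scores using the sample standard deviation (divide by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r_from_z(xs, ys):
    """Multiply paired z-scores, sum the products, divide by n - 1."""
    products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(products) / (len(xs) - 1)

# Made-up illustrative data (assumption, not the example from the slides).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r_from_z(hours, scores)
print(round(r, 4))  # 0.7746
```

Keeping the z-scores as a separate step mirrors the table layout (i, z_x, z_y, product) used on the slide.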
Another Way To Calculate

We can use a formula similar to the one we used for the standard deviation to find the correlation coefficient. First, we need a few formulas.

The Three S's:
$S_{xx} = \sum (x - \bar{x})^2$
$S_{yy} = \sum (y - \bar{y})^2$
$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$

Pearson's Correlation Coefficient:
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
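To see why this formula agrees with the z-score version, here is a short derivation. It assumes the sample standard deviations are defined as $S_x = \sqrt{S_{xx}/(n-1)}$ and $S_y = \sqrt{S_{yy}/(n-1)}$.

```latex
\begin{align*}
r &= \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{S_x}\right)\left(\frac{y_i - \bar{y}}{S_y}\right)
   = \frac{S_{xy}}{(n-1)\, S_x S_y} \\
  &= \frac{S_{xy}}{(n-1)\sqrt{\dfrac{S_{xx}}{n-1}}\sqrt{\dfrac{S_{yy}}{n-1}}}
   = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
\end{align*}
```

The factors of $n-1$ cancel, so both methods must give the same value of r.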
Another Way To Calculate

This looks just as complicated as the other method, or worse, but there are simpler ways to calculate the three S's that make this method less painful.

Calculating the Three S's:
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$
$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$

And we can get these sums from the calculator.
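The shortcut for $S_{xx}$ follows from expanding the square and using $\bar{x} = \sum x / n$. The other two shortcuts work the same way.

```latex
\begin{align*}
S_{xx} &= \sum (x - \bar{x})^2
        = \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
       &= \sum x^2 - \frac{2(\sum x)^2}{n} + \frac{(\sum x)^2}{n}
        = \sum x^2 - \frac{(\sum x)^2}{n}
\end{align*}
```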
Another Way To Calculate

Σx = 69       Σx² = 823       (Σx)² = 4761
Σy = 533      Σy² = 41067     (Σy)² = 284089
Σxy = 5083    (Σx)(Σy) = 36777
Another Way To Calculate

So we have

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-170.86}{\sqrt{(142.86)(482.86)}} = -0.6505$
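As a rough check of this arithmetic, the sketch below recomputes r from the summary values on the slides. The sample size n = 7 is an assumption: it is not stated outright, but it is the value consistent with the reported $S_y = 8.971$ and $S_{yy} = 482.86$.

```python
import math

# Summary values as reported on the slides.
sum_x, sum_y, sum_xy = 69, 533, 5083
s_xx, s_yy = 142.86, 482.86

# Assumption: n = 7, inferred from S_yy / S_y**2 = n - 1 (about 6).
n = 7

# Shortcut formula for S_xy, then Pearson's r.
s_xy = sum_xy - (sum_x * sum_y) / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(s_xy, 2), round(r, 4))  # -170.86 -0.6505
```

The negative sign comes entirely from $S_{xy}$, since $S_{xx}$ and $S_{yy}$ are sums of squares and cannot be negative.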
So Can We Say There Is A Relationship?

So, can we say that there is a direct relationship between the number of hours of TV watched and the average grade? Not so fast...

Correlation does not necessarily imply causation.

Just because the data look the part does not mean we have evidence of a causal relationship. We have to consider a couple of other things. One is lurking variables: variables that may affect the relationship but are not included in the data.

Can you think of any lurking variables that would impact our example?
Significance

We also need to test for significance to see what is going on.

If $|r|\sqrt{n} > 3$, the correlation is significant. Otherwise, it is not significant.

The smaller this value, the less likely it is that the correlation is significant.

Reasons why data may not be significant:
1. Genuine lack of correlation
2. Not enough data

Our example is not significant because of quantity: there is not enough data. So we cannot conclude that watching TV has a direct impact on grades.
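The rule of thumb can be sketched as a small function. This assumes the test statistic is $|r|\sqrt{n}$ compared against 3, and that the TV example has n = 7 (inferred from the reported sums, not stated on the slides). The name `is_significant` is hypothetical.

```python
import math

def is_significant(r, n, threshold=3.0):
    """Rule of thumb: the correlation is significant if |r| * sqrt(n) > threshold."""
    return abs(r) * math.sqrt(n) > threshold

# TV example: r = -0.6505 with an assumed n = 7.
tv_statistic = abs(-0.6505) * math.sqrt(7)
print(round(tv_statistic, 2), is_significant(-0.6505, 7))  # 1.72 False
```

With the same r, the statistic grows with $\sqrt{n}$, which is why "not enough data" is listed as a reason for a non-significant result.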
Assumptions and Conditions for Correlation

Quantitative Variables Condition: Don't make the common error of calling an association involving a categorical variable a correlation. Correlation is only about quantitative variables.

Straight Enough Condition: The best check for the assumption that the variables are truly linearly related is to look at the scatter plot to see whether it looks reasonably straight. That's a judgment call, but not a difficult one.

No Outliers Condition: Outliers can distort the correlation dramatically, making a weak association look strong or a strong one look weak. Outliers can even change the sign of the correlation. But it's easy to see an outlier in the scatter plot, so to check this condition, just look.
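The claim that a single outlier can flip the sign of the correlation is easy to demonstrate. The data below are made up for illustration, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data lying exactly on a rising line.
x_clean = [1, 2, 3, 4, 5]
y_clean = [1, 2, 3, 4, 5]
r_clean = pearson_r(x_clean, y_clean)

# The same data plus one extreme point far from the pattern.
x_outlier = x_clean + [20]
y_outlier = y_clean + [-20]
r_outlier = pearson_r(x_outlier, y_outlier)

print(r_clean, r_outlier)  # r_clean is 1.0; r_outlier is negative
```

Five points in perfect positive agreement are outweighed by one extreme point, which is why looking at the scatter plot comes first.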
Another Example

Example: The following gives the power numbers for the starting 9 of the 2007 Boston Red Sox. Is there a relationship between the number of home runs and the number of RBIs? Does the number of home runs affect the number of RBIs? Produce a scatter plot and discuss the correlation.

Player     Home Runs   RBIs
Varitek    17          68
Youkilis   16          83
Pedroia    8           50
Lowell     21          120
Lugo       8           73
Ramirez    20          88
Crisp      6           60
Drew       11          64
Ortiz      35          117
Red Sox Example

Which is the explanatory variable? Which is the response variable?

Since we are asking if HR affects RBIs, HR would be the explanatory variable and therefore x. So RBIs is the y variable.

[Figure: scatter plot, Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]
Before We Go On

Something to notice: we have two values with the same x-coordinate (Pedroia and Lugo, with 8 home runs each).

[Figure: scatter plot, Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]
Finding Pearson's Correlation Coefficient

What is our guess as to the correlation? Now let's find Pearson's correlation coefficient.

1. Input the data in the usual way, with the explanatory variable under L1 and the response variable under L2.
2. Press STAT and scroll to TESTS.
3. Select LinRegTTest.
4. Make sure the XList and YList are the lists where the data for the explanatory and response variables are located, respectively.
5. Press Calculate and scroll to find r and r².
Using Technology

For our example, we have

r = 0.8463
r² = 0.7162

So the correlation coefficient tells us that there is a strong positive correlation.
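For readers without the calculator, the same two values can be reproduced with a short script. This is a sketch, not the calculator's method. The home run and RBI figures are an assumption: they are the 2007 season totals for these nine players, chosen because they reproduce the sums Σx = 142 and Σy = 723 used in this chapter.

```python
import math

# Assumed data: 2007 season totals, in the order Varitek, Youkilis, Pedroia,
# Lowell, Lugo, Ramirez, Crisp, Drew, Ortiz.
home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]   # explanatory variable (x)
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]   # response variable (y)

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(home_runs, rbis)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))  # 0.8463 0.7162
```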
What Does r² Tell Us?

r² tells us how much better our predictions will be if we go to the trouble of finding the regression line rather than just making our predictions with the means. Ours is pretty good here, indicating that we should find the regression line.

[Figure: scatter plot, Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]
Technology and Scatter Plots

We can also create a scatter plot on the calculator.

1. Make sure there are no functions in the grapher (press Y= to check).
2. Input the data in the usual way (we already have it there for this example).
3. Press 2nd and Y= to get into the STAT PLOT menu.
4. Make sure only the plot we want is turned on.
5. Select the first graph in the first row, then make sure the XList and YList are correct.
6. Press ZOOM 9.
One More Example

Example: There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. The accompanying table gives data on yearly wine consumption (in liters of alcohol from drinking wine per person) and yearly deaths from heart disease (per 100,000 people) in 19 developed nations. Construct a scatter plot and describe what you see.

[Data table: Country, Alcohol, Deaths. Countries: Australia, Austria, Belgium, Canada, Denmark, Finland, France, Iceland, Ireland, Italy, Netherlands, New Zealand, Norway, Spain, Sweden, Switzerland, United Kingdom, United States, West Germany]
The Scatter Plot

[Figure: scatter plot, Heart Disease v. Alcohol from Wine; x-axis: Alcohol from Wine (in liters), y-axis: Deaths (per 100,000)]

r = -0.8428, strong negative correlation
r² = 0.7103, worthwhile to find the linear regression line
Section 3: Slopes and Equations of Fitted Lines
This Section Is...

The focus of this section is the idea of a fitted curve and the formulas used to find the equation of a line given two points. We will not talk about this section here unless we need to.
Section 4: The Least Squares Line
Getting Started

Definition: A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Linear functions are of the form $y = mx + b$, but we will write them as $y = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope. The calculator actually uses the form $y = a + bx$, so be careful.
Formulas

What we will be finding is the least squares regression line of y on x. This is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Least Squares Regression Line:
$b_1 = r \frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$

If the correlation coefficient is too small, there is no point in finding $\hat{y}$, since $b_0$ and $b_1$ both depend on r.
Example Using Given Values

Example: The following list gives the power numbers for the starting 9 Red Sox players for the 2007 season.

Name              Home Runs   RBIs
Jason Varitek     17          68
Kevin Youkilis    16          83
Dustin Pedroia    8           50
Mike Lowell       21          120
Julio Lugo        8           73
Manny Ramirez     20          88
Coco Crisp        6           60
J.D. Drew         11          64
David Ortiz       35          117

We want to know if the number of home runs affects the number of RBIs.
Variable Types

What is the response variable? What is the explanatory variable?
The Needed Values

We can find the mean and standard deviation of both sets of data quickly using our technology. And, since the data is already in the calculator, we can obtain the values of r and r² quickly as well.

Variable   Mean    Standard Deviation
x          15.78   9.05
y          80.33   24.47

r = 0.8463
r² = 0.7162
The Correlation Coefficient

How would we describe this correlation coefficient? It indicates a strong, positive correlation.

We can use these values to find the equation of the regression line.

$b_1 = r \frac{s_y}{s_x} = 0.8463 \left(\frac{24.47}{9.05}\right) = 2.29$
$b_0 = \bar{y} - b_1 \bar{x} = 80.33 - 2.29(15.78) = 44.19$

So, the regression line is $\hat{y} = 44.19 + 2.29x$.
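The same slope and intercept can be sketched in code using $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$. The data values are an assumption (2007 season totals consistent with the sums used in this chapter), and the variable names are illustrative.

```python
import statistics

# Assumed data: 2007 season totals for the nine starters.
home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]   # x
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]   # y
n = len(home_runs)

x_bar = statistics.mean(home_runs)
y_bar = statistics.mean(rbis)
s_x = statistics.stdev(home_runs)   # sample standard deviation (n - 1)
s_y = statistics.stdev(rbis)

# Pearson's r from the sum of cross-deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, rbis))
r = s_xy / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x          # slope
b0 = y_bar - b1 * x_bar     # intercept

print(round(b1, 2), round(b0, 2))  # 2.29 44.24
```

The intercept prints as 44.24 rather than 44.19 because the script carries the unrounded slope (about 2.2876), while the hand calculation rounds the slope to 2.29 before computing the intercept.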
Alternative Calculation

We can also use the three S's to find the equation of the least squares line.

Least Squares Regression Line:
$m = \frac{S_{xy}}{S_{xx}}$ and $b = \bar{y} - m\bar{x}$
Alternative Calculation

$S_{xy} = 12907 - \frac{(142)(723)}{9} = 1499.67$
$S_{xx} = 2896 - \frac{(142)^2}{9} = 655.56$
$m = \frac{1499.67}{655.56} = 2.29$
$b = 80.33 - 2.29(15.78) = 44.19$

And we have the same results.
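A sketch of the same calculation from raw sums, using the shortcut formulas for $S_{xy}$ and $S_{xx}$. The data values are an assumption (2007 season totals that reproduce Σx = 142, Σy = 723, and Σx² = 2896).

```python
# Assumed data: 2007 season totals for the nine starters.
home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]   # x
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]   # y
n = len(home_runs)

# Raw sums, as a calculator would report them.
sum_x = sum(home_runs)
sum_y = sum(rbis)
sum_x2 = sum(x * x for x in home_runs)
sum_xy = sum(x * y for x, y in zip(home_runs, rbis))

# Shortcut formulas for two of the three S's.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

m = s_xy / s_xx                      # slope
b = sum_y / n - m * (sum_x / n)      # intercept: y-bar minus m times x-bar

print(round(s_xy, 2), round(s_xx, 2), round(m, 2))  # 1499.67 655.56 2.29
```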
Practical Interpretation

What do these values mean in practical terms?

The slope tells us that a change of one unit in the explanatory variable corresponds to a predicted change in the response variable of the amount and direction of the slope. In our example, the slope is $b_1 = 2.29$, which tells us that for every additional home run hit, you'd expect an additional 2.29 RBIs.

The y-intercept tells us the value of the response variable when the explanatory variable is 0. In our example $b_0 = 44.19$, which means that we expect a player who hits no home runs to have 44.19 RBIs.

Note: In context, we may need to round to whole numbers for the answers to make any sense.
The Scatter Plot

Let's see how good the regression line is by plotting it over the scatter plot. To do so, we press Y= and put the line under Y1, then select GRAPH.

[Figure: scatter plot, Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]
Plot and Line

And now with the regression line $\hat{y} = 44.19 + 2.29x$.

[Figure: scatter plot with regression line, 2007 Red Sox Power Numbers; x-axis: Home Runs, y-axis: RBIs]
Predictions

One use of the regression line is making predictions. Suppose we wanted to know about how many RBIs we could expect a player to have if they hit 60 home runs. We are looking to predict the value of y (so we want $\hat{y}$) and we are given a value of x = 60.

Our prediction would be

$\hat{y} = 44.19 + 2.29(60) = 181.59$

So, our prediction is 182 RBIs.
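The prediction step is a one-line function once the line is known. This sketch uses the rounded coefficients from the text, and the function name `predict_rbis` is hypothetical.

```python
def predict_rbis(home_runs, b0=44.19, b1=2.29):
    """Predicted RBIs from the fitted line y-hat = b0 + b1 * x."""
    return b0 + b1 * home_runs

prediction = predict_rbis(60)
print(round(prediction, 2), round(prediction))  # 181.59 182
```

Rounding to a whole number at the end matches the earlier note that answers may need rounding to make sense in context.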
Facts about Regression Lines

1. The distinction between explanatory and response variables is essential. Remember the formulas.
2. There is a close connection between slope and correlation.
3. $(\bar{x}, \bar{y})$ is always on the line.
4. r² gives the fraction of the variation in the values of y that is explained by the least squares regression line of y on x. In other words, r² gives the fraction of the data's variation accounted for by the model.
5. This only shows us the linear model; it is possible that there is little linear correlation but that the data has a strong relationship under some other type of model.
6. We will not always get perfect correlation (probably never), but we need the data to be straight enough for the line to make sense. What counts as straight enough varies. Depending on the situation, r = 0.3 could be good enough; other times r = 0.8 would be a minimum.
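Facts 3 and 4 can be checked numerically on the Red Sox example. This is a sketch under the assumption that the data are the 2007 season totals used throughout; it compares r² with the explained fraction of variation, 1 - SSE/SST.

```python
import math

# Assumed data: 2007 season totals for the nine starters.
home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]   # x
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]   # y
n = len(home_runs)

x_bar = sum(home_runs) / n
y_bar = sum(rbis) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, rbis))
s_xx = sum((x - x_bar) ** 2 for x in home_runs)
s_yy = sum((y - y_bar) ** 2 for y in rbis)

m = s_xy / s_xx           # slope of the least squares line
b = y_bar - m * x_bar     # intercept
r = s_xy / math.sqrt(s_xx * s_yy)

# Fact 3: the point (x-bar, y-bar) lies on the line.
on_line = abs((b + m * x_bar) - y_bar) < 1e-9

# Fact 4: r squared equals the fraction of variation in y explained by the line.
sse = sum((y - (b + m * x)) ** 2 for x, y in zip(home_runs, rbis))
explained_fraction = 1 - sse / s_yy

print(on_line, round(explained_fraction, 4), round(r ** 2, 4))  # True 0.7162 0.7162
```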
Another Sox Example

Suppose we wanted to know if a player was expected to score more runs if he got more hits. To answer this question, we will use the roster of the 2011 Boston Red Sox.

Name                    Runs   Hits
Jarrod Saltalamacchia
Adrian Gonzalez
Dustin Pedroia
Marco Scutaro
Kevin Youkilis
Carl Crawford
Jacoby Ellsbury
J.D. Drew
David Ortiz
Jed Lowrie
Josh Reddick
Jason Varitek
Darnell McDonald
Mike Aviles
Mike Cameron            9      14
Drew Sutton
Ryan Lavarnway          5      9
Yamaico Navarro         6      8
Conor Jackson           2      3
Jose Iglesias           3      2
Lars Anderson           2      0
Joey Gathright          1      0
130 The Correlation Coefficient The first thing we will do is find the correlation coefficient. When we plug all of the data into our technology, we get r = 0.9942. Interpretation? There is a strong, positive correlation between hits and runs scored.
137 Interpretation of $r^2$ What is the value of $r^2$ here? $r^2 = 0.9885$. What does this mean? $r^2$ tells us the fraction of the variability of the response variable that is accounted for by the variation in the explanatory variable. Here, this means that we can explain 98.85% of the variability in the runs total by the variability in the number of hits. For each standard deviation that x is above the mean of the explanatory variable, the predicted y is r standard deviations above the mean of the response variable. Here, for every standard deviation a player is above the mean number of hits, we predict he will be 0.9942 standard deviations above the mean number of runs. Also note that $1 - r^2$ is the fraction of the variation in the original data left in the residuals.
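Both readings above can be verified on a small example. This is a minimal sketch on made-up data (an assumption, not the Red Sox roster): it fits the least squares line, then compares the explained fraction of variation with $r^2$ and the standardized prediction with r.

```python
import math

# Hypothetical toy data, not the Red Sox roster.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Variation left in the residuals after fitting the line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

explained_fraction = 1 - sse / s_yy  # should match r squared
residual_fraction = sse / s_yy       # should match 1 - r squared

# Move one standard deviation above the mean of x and standardize
# the predicted y; the result should match r.
s_x = math.sqrt(s_xx / (n - 1))
s_y = math.sqrt(s_yy / (n - 1))
predicted = b0 + b1 * (x_bar + s_x)
z_of_prediction = (predicted - y_bar) / s_y

print(f"r^2 = {r ** 2:.4f}, explained = {explained_fraction:.4f}")
print(f"r = {r:.4f}, z of prediction = {z_of_prediction:.4f}")
```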
139 Producing the Scatter Plot Now, let's produce a scatter plot for the data. [Scatter plot: Red Sox, Runs versus Hits]
144 The Assumptions The points give us a pretty good indication that there is a very strong positive correlation here. Before we go on, we want to make sure all of the assumptions about regression lines are met.
Quantitative Variable Condition: If either y or x is categorical, you cannot make a scatter plot and you cannot perform a regression.
Straight Enough Condition: Does the data look straight enough that we can see a linear relationship in the data set?
Outlier Condition: Are there any outliers that dramatically influence the fit of the regression line?
Does the Plot Thicken? Condition: Does the spread of the data around the generally straight relationship seem to be consistent for all values of x?
150 Linear Regression Line Since all of these are satisfied, we will continue on to find the formula of the regression line. Using our technology, we have $\hat{y} = b_0 + b_1 x = 1.92 + 0.52x$. What is the practical interpretation of the slope $b_1$? For each additional hit, we expect a player to score 0.52 additional runs. What is the practical interpretation of the y-intercept $b_0$? If a player has no hits, we expect 1.92 runs to be scored.
152 Scatter Plot With Regression Line [Scatter plot with regression line: 2011 Red Sox, Runs versus Hits] So, when we plot the regression line over the scatter plot, we see that the line is a good fit.
158 Predictions
1. If a player got 200 hits, how many runs would we expect them to have? Here, we are given the x value, and using our regression line, we find the predicted value: $\hat{y} = 1.92 + 0.52(200) = 105.92$. So, we'd expect about 106 runs for a player with 200 hits.
2. What if we wanted to know how many hits a player had if they scored 120 runs? We are given the value of y and want to find the value of x. So, we use our algebra skills:
$\hat{y} = 1.92 + 0.52x$
$120 = 1.92 + 0.52x$
$118.08 = 0.52x$
$227.08 \approx x$
We expect about 227 hits.
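The two predictions can be reproduced in a few lines. This is a sketch using the rounded coefficients quoted on the slides (intercept 1.92, slope 0.52); the function names are hypothetical and chosen only for illustration.

```python
# Rounded coefficients as quoted on the slides.
B0 = 1.92  # intercept: expected runs with zero hits
B1 = 0.52  # slope: expected additional runs per additional hit


def predict_runs(hits: float) -> float:
    """Predicted runs from the fitted line y-hat = B0 + B1 * hits."""
    return B0 + B1 * hits


def hits_for_runs(runs: float) -> float:
    """Solve runs = B0 + B1 * hits for hits (plain algebra)."""
    return (runs - B0) / B1


runs_at_200_hits = predict_runs(200)
hits_at_120_runs = hits_for_runs(120)

print(f"200 hits -> about {runs_at_200_hits:.2f} runs")
print(f"120 runs -> about {hits_at_120_runs:.2f} hits")
```

Solving the line for x is an algebraic shortcut. A least squares line fitted with hits as the response and runs as the explanatory variable would generally be a different line, which is why the distinction between the two variable types matters.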
163 Important Points A few important points to keep in mind:
1. An observation is influential for a statistical calculation if removing it would markedly change the results of the calculation. Points that are outliers in either the x or y direction are often influential points.
2. Correlation and least squares regression lines are not resistant.
3. They only describe linear relationships.
4. There could be lurking variables. Those are variables that are not among the explanatory or response variables but may influence the interpretation of the relationship.
5. An association between an explanatory variable x and a response variable y, even if r is very strong, is not itself good evidence that changes in x actually cause changes in y. The phrase to remember is that correlation does not necessarily imply causation.
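Points 1 and 2 can be seen on a tiny example. The sketch below uses made-up data (an assumption, not from the slides): five points lie exactly on a line, and a single outlier is then added to show how strongly it moves r.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: five points exactly on the line y = x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
r_without = pearson_r(xs, ys)

# Add one point far from the pattern of the others.
r_with = pearson_r(xs + [6.0], ys + [0.0])

print(f"r without the outlier: {r_without:.4f}")
print(f"r with the outlier:    {r_with:.4f}")
```

A single observation takes the correlation from perfect to weak, which is what "not resistant" means in practice.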
164 Example Where You Are Doing The Work Example: We want to know if there is a relationship between the score on the math portion of the SAT exam and the number of hours spent studying for the test. The question is, does studying more increase the score on the exam? The following data were taken from a study of 20 students as they prepared for and took the SAT exam. [Table: Hours and Score for each student]
168 Variable Types What is the response variable? Math SAT score. What is the explanatory variable? Hours of study.
171 Pearson's Correlation Coefficient So let's first find the correlation coefficient to see what we are dealing with. r = 0.9336. Our interpretation? This tells us there is a strong positive correlation.
174 And Now $r^2$ What about $r^2$? $r^2 = 0.8716$. This tells us what? 87.16% of the variation in the Math SAT score can be explained by the variation in the hours of study.
178 Is The Data Significant? What is the inequality we are using? $|r|\sqrt{n} > 3$. Is this data significant? $|r|\sqrt{n} = 0.9336\sqrt{20} \approx 4.18 > 3$. So, the data is significant based on this criterion.
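The check is easy to script. This sketch assumes the rule garbled in the transcription is $|r|\sqrt{n} > 3$; the function name and the `threshold` parameter are hypothetical, and the values r = 0.9336 and n = 20 come from the example.

```python
import math


def is_significant(r: float, n: int, threshold: float = 3.0) -> bool:
    """Rule-of-thumb check, assumed here to be |r| * sqrt(n) > threshold."""
    return abs(r) * math.sqrt(n) > threshold


r = 0.9336  # correlation from the SAT example
n = 20      # number of students in the study

statistic = abs(r) * math.sqrt(n)
print(f"|r| * sqrt(n) = {statistic:.4f}")
print("significant" if is_significant(r, n) else "not significant")
```

Under the same rule, a weaker correlation such as r = 0.3 with the same 20 students would fall well short of the threshold.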
180 Visual Representation Next, let's produce our scatter plot so we can see what we are dealing with. [Scatter plot: Math SAT Score v. Hours of Study; SAT Score versus Hours of Study]
189 Checking Conditions/Assumptions We are feeling pretty good about this; it seems to have a strong, positive correlation. When we consider the conditions, are we still happy with this?
Quantitative Variable Condition: Both variables are quantitative.
Straight Enough Condition: The data look reasonably straight.
Outlier Condition: There do not seem to be any outliers.
Does the Plot Thicken? Condition: Pretty much. Other than the one person who studied for 22 hours, the relationship seems very strong.
More informationRelated Example on Page(s) R , 148 R , 148 R , 156, 157 R3.1, R3.2. Activity on 152, , 190.
Name Chapter 3 Learning Objectives Identify explanatory and response variables in situations where one variable helps to explain or influences the other. Make a scatterplot to display the relationship
More informationa. Length of tube: Diameter of tube:
CCA Ch 6: Modeling Two-Variable Data Name: 6.1.1 How can I make predictions? Line of Best Fit 6-1. a. Length of tube: Diameter of tube: Distance from the wall (in) Width of field of view (in) b. Make a
More informationSTA Module 5 Regression and Correlation. Learning Objectives. Learning Objectives (Cont.) Upon completing this module, you should be able to:
STA 2023 Module 5 Regression and Correlation Learning Objectives Upon completing this module, you should be able to: 1. Define and apply the concepts related to linear equations with one independent variable.
More informationLecture 4 Scatterplots, Association, and Correlation
Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variables In this lecture: We shall look at two quantitative variables.
More informationIntroduce Exploration! Before we go on, notice one more thing. We'll come back to the derivation if we have time.
Introduce Exploration! Before we go on, notice one more thing. We'll come back to the derivation if we have time. Simplifying the calculation of variance Notice that we can rewrite the calculation of a
More information3.1 Scatterplots and Correlation
3.1 Scatterplots and Correlation Most statistical studies examine data on more than one variable. In many of these settings, the two variables play different roles. Explanatory variable (independent) predicts
More informationChapter 1 Review of Equations and Inequalities
Chapter 1 Review of Equations and Inequalities Part I Review of Basic Equations Recall that an equation is an expression with an equal sign in the middle. Also recall that, if a question asks you to solve
More informationLinear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).
Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation
More informationMATH 1130 Exam 1 Review Sheet
MATH 1130 Exam 1 Review Sheet The Cartesian Coordinate Plane The Cartesian Coordinate Plane is a visual representation of the collection of all ordered pairs (x, y) where x and y are real numbers. This
More informationTalking feet: Scatterplots and lines of best fit
Talking feet: Scatterplots and lines of best fit Student worksheet What does your foot say about your height? Can you predict people s height by how long their feet are? If a Grade 10 student s foot is
More informationCORRELATION AND REGRESSION
CORRELATION AND REGRESSION CORRELATION Introduction CORRELATION problems which involve measuring the strength of a relationship. Correlation Analysis involves various methods and techniques used for studying
More informationChapter 7 Linear Regression
Chapter 7 Linear Regression 1 7.1 Least Squares: The Line of Best Fit 2 The Linear Model Fat and Protein at Burger King The correlation is 0.76. This indicates a strong linear fit, but what line? The line
More informationStatistics 1. Edexcel Notes S1. Mathematical Model. A mathematical model is a simplification of a real world problem.
Statistics 1 Mathematical Model A mathematical model is a simplification of a real world problem. 1. A real world problem is observed. 2. A mathematical model is thought up. 3. The model is used to make
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationCORRELATION AND REGRESSION
CORRELATION AND REGRESSION CORRELATION The correlation coefficient is a number, between -1 and +1, which measures the strength of the relationship between two sets of data. The closer the correlation coefficient
More informationAP Statistics. Chapter 6 Scatterplots, Association, and Correlation
AP Statistics Chapter 6 Scatterplots, Association, and Correlation Objectives: Scatterplots Association Outliers Response Variable Explanatory Variable Correlation Correlation Coefficient Lurking Variables
More informationCorrelation and Regression
Correlation and Regression 8 9 Copyright Cengage Learning. All rights reserved. Section 9.2 Linear Regression and the Coefficient of Determination Copyright Cengage Learning. All rights reserved. Focus
More informationUnit 1 Science Models & Graphing
Name: Date: 9/18 Period: Unit 1 Science Models & Graphing Essential Questions: What do scientists mean when they talk about models? How can we get equations from graphs? Objectives Explain why models are
More informationappstats8.notebook October 11, 2016
Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus
More informationChapter 3: Examining Relationships
Chapter 3: Examining Relationships 3.1 Scatterplots 3.2 Correlation 3.3 Least-Squares Regression Fabric Tenacity, lb/oz/yd^2 26 25 24 23 22 21 20 19 18 y = 3.9951x + 4.5711 R 2 = 0.9454 3.5 4.0 4.5 5.0
More informationLinear Regression. Al Nosedal University of Toronto. Summer Al Nosedal University of Toronto Linear Regression Summer / 115
Linear Regression Al Nosedal University of Toronto Summer 2017 Al Nosedal University of Toronto Linear Regression Summer 2017 1 / 115 My momma always said: Life was like a box of chocolates. You never
More informationReview of Regression Basics
Review of Regression Basics When describing a Bivariate Relationship: Make a Scatterplot Strength, Direction, Form Model: y-hat=a+bx Interpret slope in context Make Predictions Residual = Observed-Predicted
More informationAP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation
Scatterplots and Correlation Name Hr A scatterplot shows the relationship between two quantitative variables measured on the same individuals. variable (y) measures an outcome of a study variable (x) may
More information6.1.1 How can I make predictions?
CCA Ch 6: Modeling Two-Variable Data Name: Team: 6.1.1 How can I make predictions? Line of Best Fit 6-1. a. Length of tube: Diameter of tube: Distance from the wall (in) Width of field of view (in) b.
More informationIntroduction to Determining Power Law Relationships
1 Goal Introduction to Determining Power Law Relationships Content Discussion and Activities PHYS 104L The goal of this week s activities is to expand on a foundational understanding and comfort in modeling
More informationBusiness Statistics 41000:
Business Statistics 41000: Plotting and Summarizing Bivariate Data Drew D. Creal University of Chicago, Booth School of Business Week 2: January 17 and 18, 2014 1 Class information Drew D. Creal Email:
More informationChapter 12 Summarizing Bivariate Data Linear Regression and Correlation
Chapter 1 Summarizing Bivariate Data Linear Regression and Correlation This chapter introduces an important method for making inferences about a linear correlation (or relationship) between two variables,
More informationChapter 6: Exploring Data: Relationships Lesson Plan
Chapter 6: Exploring Data: Relationships Lesson Plan For All Practical Purposes Displaying Relationships: Scatterplots Mathematical Literacy in Today s World, 9th ed. Making Predictions: Regression Line
More informationMath 147 Lecture Notes: Lecture 12
Math 147 Lecture Notes: Lecture 12 Walter Carlip February, 2018 All generalizations are false, including this one.. Samuel Clemens (aka Mark Twain) (1835-1910) Figures don t lie, but liars do figure. Samuel
More informationSolving Quadratic & Higher Degree Equations
Chapter 9 Solving Quadratic & Higher Degree Equations Sec 1. Zero Product Property Back in the third grade students were taught when they multiplied a number by zero, the product would be zero. In algebra,
More information( )( b + c) = ab + ac, but it can also be ( )( a) = ba + ca. Let s use the distributive property on a couple of
Factoring Review for Algebra II The saddest thing about not doing well in Algebra II is that almost any math teacher can tell you going into it what s going to trip you up. One of the first things they
More informationSOLUTIONS FOR PROBLEMS 1-30
. Answer: 5 Evaluate x x + 9 for x SOLUTIONS FOR PROBLEMS - 0 When substituting x in x be sure to do the exponent before the multiplication by to get (). + 9 5 + When multiplying ( ) so that ( 7) ( ).
More informationNov 13 AP STAT. 1. Check/rev HW 2. Review/recap of notes 3. HW: pg #5,7,8,9,11 and read/notes pg smartboad notes ch 3.
Nov 13 AP STAT 1. Check/rev HW 2. Review/recap of notes 3. HW: pg 179 184 #5,7,8,9,11 and read/notes pg 185 188 1 Chapter 3 Notes Review Exploring relationships between two variables. BIVARIATE DATA Is
More informationThe Simple Linear Regression Model
The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationAbout Bivariate Correlations and Linear Regression
About Bivariate Correlations and Linear Regression TABLE OF CONTENTS About Bivariate Correlations and Linear Regression... 1 What is BIVARIATE CORRELATION?... 1 What is LINEAR REGRESSION... 1 Bivariate
More informationUnit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930)
BIOSTATS 540 Fall 2015 Introductory Biostatistics Page 1 of 10 Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS Consider the following study of the relationship
More information