Scatterplots and Correlation
Name: __________ Hr: ______

A scatterplot shows the relationship between two quantitative variables measured on the same individuals.

A response variable (y) measures an outcome of a study.
An explanatory variable (x) may help explain or influence changes in a response variable.

**Remember, the explanatory variable goes on the X-axis!**

EXAMPLE 1: Identify the explanatory and response variables:

Mrs. Sapp is interested in the relationship between the hours students spend studying for an exam and their score on the exam.

A researcher is interested in the effects a new drug has on reducing muscle spasms.

How to Make a Scatterplot:
1. Decide which variable is explanatory (x) and which is response (y).
2. Label and scale your axes. (Note: The axes don't need to intersect at (0, 0).)
3. Plot individual values.

EXAMPLE 2: Make a scatterplot of the relationship between body weight and pack weight.
Interpreting Scatterplots

As in any graph of data, look for the overall pattern (DSS) and for striking deviations from that pattern.

On the AP Exam, you will need to mention three important characteristics of the scatterplot:

Direction:
Two variables have a positive association when above-average values of one tend to accompany above-average values of the other, and when below-average values also tend to occur together. (i.e., generally speaking, the y values tend to increase as the x values increase.)
Two variables have a negative association when above-average values of one tend to accompany below-average values of the other. (i.e., generally speaking, the y values tend to decrease as the x values increase.)

Shape: Do the data appear linear or curved?

Strength: If the points cluster closely around an imaginary line, the association is strong. If the points are scattered farther from the line, the association is weak.

Outliers or Influential Points:
Outlier: an individual value that falls outside the overall pattern of the relationship. Ask yourself: _____? In a regression setting, an outlier is a data point with a large residual.
Influential point: when removed, the _____ of the relationship significantly changes (it influences where the LSRL is located).
Typically, if an observation is an outlier, it will be influential.

Positive or Negative Relationship?
a) Minutes spent studying and exam score
b) Age of vehicle and value of vehicle
c) Age and bone density
d) Write down a positive example:

EXAMPLE 3:

EXAMPLE 4: Can Mrs. Sapp be bribed with chocolate?
Correlation

The correlation r measures the direction and strength of the linear relationship between two quantitative variables.

r is always a number between -1 and 1.
r > 0 indicates a positive association.
r < 0 indicates a negative association.
Values of r near 0 indicate a very weak linear relationship.
The strength of the linear relationship increases as r moves away from 0 towards -1 or 1.
The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship.

FACTS about correlation:
1. Correlation makes no distinction between explanatory and response variables.
2. r does not change when we change the units of measurement of x, y, or both.
3. The correlation r itself has no unit of measurement.
4. The correlation coefficient is -1 only when all the points lie on a downward-sloping line, and +1 only when all the points lie on an upward-sloping line.
5. The value of r is a measure of the extent to which x and y are linearly related.

IMPORTANT: A value of r close to zero does not rule out any strong relationship; it just rules out a linear one.
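To make facts 1 and 2 concrete, here is a minimal Python sketch that computes r as the average product of z-scores. The `correlation` helper and the study-hours data are made up for illustration and are not part of the original notes.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
score = [62, 70, 71, 80, 88]

r = correlation(hours, score)
r_swapped = correlation(score, hours)    # Fact 1: swapping x and y
minutes = [60 * h for h in hours]
r_minutes = correlation(minutes, score)  # Fact 2: changing units of x

print(round(r, 3), round(r_swapped, 3), round(r_minutes, 3))
```

All three printed values should agree, since r ignores both the roles of the variables and their units.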
CAUTIONS:
Correlation requires that both variables be quantitative.
Correlation does not describe curved relationships between variables, no matter how strong the relationship is.
Correlation is not resistant. (r is strongly affected by a few outlying observations.)
Correlation is not a complete summary of two-variable data.

EXAMPLE 5: Interpret the relationship between the variables.

a)

b)
x       y
1.2     23.3
2.5     21.5
6.5     12.2
13.1    3.9
24.2    4.0
34.1    18.0
20.8    1.7
37.5    26.1

Correlation and Causation
1. Correlation measures the extent of association (strong, moderate, or weak), but association does not imply causation!
2. It can frequently happen that two variables are highly correlated not because one is causally related to the other but because they are both strongly related to a third variable (called a confounding variable). For instance, why do high values of hot chocolate consumption tend to be paired with lower crime rates?
3. The only way to make a strong case for causation is by conducting a well-controlled scientific experiment!
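The cautions about curved relationships and outliers can be checked numerically. The sketch below uses made-up data (not from these notes) and a small hypothetical `correlation` helper: a perfect curve gives r = 0, and one outlier drags a perfect line from r = 1 down to r = 0.25.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r computed from sums of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical curved data: y is exactly x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_curved = correlation(xs, ys)  # 0, although x determines y completely

# Hypothetical perfectly linear data, then the same data plus one outlier.
line_x = [1, 2, 3, 4, 5, 6]
line_y = [2, 4, 6, 8, 10, 12]
r_line = correlation(line_x, line_y)                 # 1
r_outlier = correlation(line_x + [7], line_y + [0])  # 0.25

print(r_curved, r_line, r_outlier)
```

This is why a scatterplot should always be examined alongside r: the number alone cannot reveal a curve or an unusual point.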
Least Squares Regression

ŷ = a + bx

ŷ is the predicted value of the response variable y.
b is the slope, the amount by which y is predicted to change when x increases by one unit.
a is the y intercept, the predicted value of y when x = 0.

EXAMPLE 6: Does Fidgeting Keep You Slim?
fatgain = 3.505 - 0.00344(NEA change)
Slope:
y-intercept:
Predict the fat gain when NEA = 400 calories:

EXAMPLE 7: The ages (in months) and heights (in inches) of seven children are given.

x    16    24    42    60    75    102    120
y    24    30    35    40    48    56     60

Determine the LSRL (Least Squares Regression Line):
Interpret the slope:
Interpret the y-int:
Predict the height of a child who is 4.5 years old:
Predict the height of someone who is 20 years old:
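One way to check answers to Examples 6 and 7 is sketched below in Python. The `least_squares_line` helper is a name invented here; it uses the standard formulas b = Sxy / Sxx and a = ȳ - b·x̄, and in class a graphing calculator would do the same job.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # y intercept
    return a, b

# Example 7 data: age in months, height in inches.
age_months = [16, 24, 42, 60, 75, 102, 120]
height_in = [24, 30, 35, 40, 48, 56, 60]

a, b = least_squares_line(age_months, height_in)
print(f"height-hat = {a:.2f} + {b:.3f}(age)")

pred_54 = a + b * 54     # 4.5 years = 54 months, inside the observed ages
pred_240 = a + b * 240   # 20 years = 240 months, far outside the observed ages

# Example 6: plug NEA change = 400 into the given equation.
fat_gain_400 = 3.505 - 0.00344 * 400
print(round(pred_54, 1), round(pred_240, 1), round(fat_gain_400, 3))
```

The 20-year prediction comes out to roughly 102.5 inches (over 8 feet), a sign that the line should not be trusted so far beyond the ages actually observed.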
Extrapolation is the use of a regression line for predictions outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.

RESIDUALS:
The vertical deviation between the observations & the LSRL.
The sum of the residuals is always zero.
FORMULA: residual = y - ŷ (observed y minus predicted y)

Residual plots:
A scatterplot of the (x, residual) pairs.
Purpose is to tell if a linear association exists between the x & y variables.
If no pattern exists between the points in the residual plot, then the association is linear.
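As a rough illustration of residuals, the Python sketch below reuses the age and height data from Example 7 of these notes, fits the LSRL with the standard formulas, and lists each residual (observed minus predicted). Variable names are chosen here for readability and are not from the notes.

```python
# Example 7 data: age in months, height in inches.
age_months = [16, 24, 42, 60, 75, 102, 120]
height_in = [24, 30, 35, 40, 48, 56, 60]

n = len(age_months)
x_bar = sum(age_months) / n
y_bar = sum(height_in) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(age_months, height_in))
     / sum((x - x_bar) ** 2 for x in age_months))
a = y_bar - b * x_bar

# residual = observed y - predicted y
predicted = [a + b * x for x in age_months]
residuals = [y - y_hat for y, y_hat in zip(height_in, predicted)]

for x, e in zip(age_months, residuals):
    print(f"age {x:>3}: residual {e:+.2f}")

# The residuals from a least-squares line sum to zero (up to rounding error).
total = sum(residuals)
```

Plotting these residuals against age gives the residual plot; scattered points with no pattern support a linear model.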
EXAMPLES:

EXAMPLE 8:
Determine the LSRL (Least Squares Regression Line):
Find the correlation coefficient (r):
Interpret the slope:
Interpret the y-int:
Predict the range of motion for a 29-year-old:
Predict the range of motion for a 50-year-old:
Make a residual plot. What does this plot tell you about the linearity of the data?
Calculate the residual for age 24:
Calculate the residual for age 14:
Outliers and Influential Points:

Standard deviation formula: s = sqrt( Σ(y - ŷ)² / (n - 2) )
Interpretation: The distance a typical value is from the LSRL.

COEFFICIENT OF DETERMINATION (r²): the percent of variation in y that can be explained by the least-squares regression line of y on x.
FORMULA: r² = 1 - SSResid / SSTo
Memorize this statement!

Example: Referring to the age and range of motion data, how well does age predict the range of motion after knee surgery?
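The two summary measures can be computed by hand from the residuals. The sketch below is only an illustration: it uses the age and height data from Example 7 of these notes (the knee-surgery data are not reproduced here), the usual definitions s = sqrt(SSResid / (n - 2)) and r² = 1 - SSResid / SSTo, and variable names chosen for this example.

```python
from math import sqrt

# Example 7 data: age in months, height in inches.
age_months = [16, 24, 42, 60, 75, 102, 120]
height_in = [24, 30, 35, 40, 48, 56, 60]

n = len(age_months)
x_bar = sum(age_months) / n
y_bar = sum(height_in) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(age_months, height_in))
     / sum((x - x_bar) ** 2 for x in age_months))
a = y_bar - b * x_bar

# SSResid: squared residuals about the line. SSTo: squared deviations about y-bar.
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(age_months, height_in))
ss_total = sum((y - y_bar) ** 2 for y in height_in)

s = sqrt(ss_resid / (n - 2))          # standard deviation of the residuals
r_squared = 1 - ss_resid / ss_total   # coefficient of determination

print(f"s = {s:.2f} inches, r^2 = {r_squared:.1%}")
```

A sentence in the memorized form would read: about that percent of the variation in height is explained by the least-squares regression line of height on age.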
COMPUTER OUTPUT (Example continued)

Minitab output looks like:

Regression Analysis: % Fat y versus Age (x)

The regression equation is
% Fat y = 3.22 + 0.548 Age (x)

Predictor    Coef      SE Coef    T       P
Constant     3.221     5.076      0.63    0.535
Age (x)      0.5480    0.1056     5.19    0.000

S = 5.754    R-Sq = 62.7%    R-Sq(adj) = 60.4%

Analysis of Variance
Source            DF    SS         MS        F        P
Regression        1     891.87     891.87    26.94    0.000
Residual Error    16    529.66     33.10
Total             17    1421.54

Labels on the output:
Regression line: % Fat y = 3.22 + 0.548 Age (x)
Estimated y intercept a: 3.221 (Coef for Constant)
Estimated slope b: 0.5480 (Coef for Age (x))
Residual df = n - 2: 16
SSResid: 529.66
SSTo: 1421.54
s_e²: 33.10 (MS for Residual Error)

EXAMPLE 1:
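The pieces of the Minitab output are tied together by a few simple relationships, which the short Python sketch below checks using only numbers copied from the output above. Variable names are chosen here for readability.

```python
from math import sqrt

# Values copied from the Minitab output.
ss_regression = 891.87
ss_resid = 529.66
ss_total = 1421.54
df_resid = 16
slope, se_slope = 0.5480, 0.1056

n = df_resid + 2                       # residual df = n - 2, so n = 18
s = sqrt(ss_resid / df_resid)          # should match S = 5.754
r_squared = ss_regression / ss_total   # should match R-Sq = 62.7%
r = sqrt(r_squared)                    # positive root because the slope is positive
t_slope = slope / se_slope             # should match T = 5.19

print(f"n = {n}, s = {s:.3f}, R-Sq = {r_squared:.1%}, "
      f"r = {r:.3f}, T = {t_slope:.2f}")
```

Note that the output reports R-Sq but not r itself; r is recovered by taking the square root and attaching the sign of the slope.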
EXAMPLE 2:
a) Write the equation of the LSRL.
b) State and interpret the correlation coefficient.
c) State and interpret the slope.
d) State and interpret the standard deviation.
e) State and interpret the coefficient of determination.