Comparing Quantitative Variables
Lecture 8, January 29, 2018

Four Stages of Statistics
Data Collection
Displaying and Summarizing Data
    One Categorical
    Two Categorical
    One Quantitative
    One Categorical and One Quantitative
    Two Quantitative
Probability
Inference

Scatterplot
Scatterplot: graphical display of the relationship between two quantitative variables
Response variable: variable plotted along the y-axis that we are trying to explain or predict
Predictor variable: variable plotted along the x-axis that we use to explain changes in the response variable
Observations are plotted as ordered pairs (x, y).

Example #1: Response vs. Predictor
Scenario: We want to know whether a student's verbal SAT score gives any information about their math SAT score, so we sample 44 Pitt students.
Question: What should be the response variable? What should be the predictor variable?
Response: Math SAT score
Predictor: Verbal SAT score
Describing a Scatterplot
Linearity
Linear: points on the scatterplot are clustered around a straight line
Nonlinear: points on the scatterplot follow some other type of pattern (parabolic, exponential, etc.); the slope of the curve changes as the predictor variable changes
Direction
Positive: the response variable tends to increase as the predictor variable increases
Negative: the response variable tends to decrease as the predictor variable increases
None: no visible pattern between the variables

Covariance
Covariance: measure of how two quantitative variables vary together, i.e., the direction of their linear relationship
Two types:
Population covariance: denoted by $\sigma_{xy}$; a parameter, generally unknown
Sample covariance: denoted by $s_{xy}$; a statistic, a good approximation of $\sigma_{xy}$
$$s_{xy} = \frac{(x_1-\bar{x})(y_1-\bar{y}) + \cdots + (x_n-\bar{x})(y_n-\bar{y})}{n-1}$$

Example #2: Describing a Scatterplot
Scenario: We want to know whether a student's verbal SAT score gives any information about their math SAT score, so we sample 44 Pitt students.
Question: What are the linearity and direction?
Linearity: Math scores change at ____
Direction: Math scores tend to ____

Facts About Covariance
Positive covariance implies a positive linear relationship; negative covariance implies a negative linear relationship.
A covariance of 0 means there is no linear relationship (the variables can still be related in a nonlinear way).
Unbounded: values range from $-\infty$ to $\infty$.
Problem: the size of the covariance depends on the units of the variables, so it does not help us interpret how strong the relationship is.
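The sample covariance (sum of the products of deviations from the two means, divided by n - 1) can be sketched in a few lines of Python. This is a minimal illustration, not course material: the function name `sample_covariance` and the toy data are hypothetical. The last line shows why covariance alone does not measure strength: doubling y doubles the covariance, although the relationship is no stronger.

```python
def sample_covariance(x, y):
    """Sample covariance s_xy: sum of products of deviations, divided by n - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Each observation contributes (x_i - x_bar) * (y_i - y_bar).
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (n - 1)


# Hypothetical toy data (not from the lecture): y rises with x.
x = [1, 2, 3]
y = [2, 4, 6]
print(sample_covariance(x, y))                   # 2.0
# Rescaling y changes the covariance, not the strength of the relationship.
print(sample_covariance(x, [2 * v for v in y]))  # 4.0
```

Python 3.10 and later also provide `statistics.covariance`, which computes the same n - 1 sample covariance; the hand-written version is shown only to make the formula visible.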
Example #3: Covariance
Scenario: Math vs. verbal SAT scores on the left; scatterplot comparing a random sample of students' midterm and final exam scores on the right
Question: Which scatterplot appears to have the stronger linear relationship?
Question: What does the covariance tell us?

Correlation
Correlation: measures how strong the linear relationship between two quantitative variables is; it always lies between -1 and 1
Two types:
Population correlation: denoted by $\rho$ (Greek letter "rho"); a parameter, generally unknown
Sample correlation: denoted by $r$; a statistic, a good approximation of $\rho$
$$r = \frac{\text{sample covariance}}{\text{product of std. deviations}} = \frac{s_{xy}}{s_x s_y}$$
where $s_x$ and $s_y$ are the sample standard deviations of x and y.

Describing a Scatterplot Using Correlation
Correlation       Type of Linear Relationship
0.70 to 1.00      Strong positive linear relationship
0.40 to 0.70      Moderate positive linear relationship
0.10 to 0.40      Weak positive linear relationship
-0.10 to 0.10     No linear relationship
-0.40 to -0.10    Weak negative linear relationship
-0.70 to -0.40    Moderate negative linear relationship
-1.00 to -0.70    Strong negative linear relationship
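A minimal sketch of the sample correlation r = s_xy / (s_x s_y), plus a lookup that applies the descriptive table. The function names are hypothetical. The table's rows share their endpoints (0.70 appears in two rows, for example), so this sketch assumes a boundary value gets the stronger label; that convention is ours, not the lecture's.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / (s_x * s_y)


def describe_correlation(r):
    """Label r using the table; assumption: shared endpoints get the stronger label."""
    strength = abs(r)
    if strength >= 0.70:
        label = "Strong"
    elif strength >= 0.40:
        label = "Moderate"
    elif strength >= 0.10:
        label = "Weak"
    else:
        return "No linear relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} linear relationship"


# Hypothetical toy data, not from the lecture.
r = sample_correlation([1, 2, 3, 4], [2, 4, 5, 8])
print(round(r, 2))              # 0.98
print(describe_correlation(r))  # Strong positive linear relationship
```

Dividing by both standard deviations cancels the units, which is why r stays between -1 and 1 while the covariance is unbounded.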
Example #4: Calculating Correlation
Scenario: We want to compare a random sample of students' midterm and final exam scores. The sample covariance is 53.36, and the two sample standard deviations are 7.34 and 7.90.
Question: What is the sample correlation?

Example #5: Estimating Correlation
Scenario: The scatterplot shows the list price for 33 used cars and their corresponding ages in years.
Question: Approximately what is the correlation between age and sale price?
____ relationship; points are ____
Actual correlation: ____

Example #6: Switching Roles of Variables
Scenario: The correlation when using math as the response and verbal as the predictor was 0.619.
Question: What is the correlation if verbal is the response and math is the predictor?
Takeaway: Switching the roles of the variables does not change the correlation, because both the sample covariance and the product of the standard deviations are symmetric in x and y:
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{s_{yx}}{s_y s_x} = r_{yx}$$

Example #7: Ordering Correlations
Scenario: Scatterplots show observations with predictor values from 1 to 25 and response values from 0 to 100.
Task: Order the scatterplots from weakest correlation to strongest.
[Figure: scatterplots A, B, and C]
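The arithmetic for Example #4 and the symmetry behind Example #6 can both be checked in a short sketch. The Example #4 numbers come from the slide; the data used for the symmetry check are hypothetical, and `sample_correlation` is an illustrative helper, not a course-provided function.

```python
import math

# Example #4: plug the given summaries into r = s_xy / (s_x * s_y).
cov_xy = 53.36
s_1, s_2 = 7.34, 7.90  # the two sample standard deviations
r = cov_xy / (s_1 * s_2)
print(round(r, 2))  # 0.92


def sample_correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y) (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


# Example #6 takeaway: swapping predictor and response leaves r unchanged.
x = [1, 2, 3, 4]  # hypothetical data
y = [2, 4, 5, 8]
print(math.isclose(sample_correlation(x, y), sample_correlation(y, x)))  # True
```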
Least Squares Regression Line
Least squares regression line: the line that best fits the data on a scatterplot, in the sense that it minimizes the sum of the squared vertical distances between the observed responses and the line
Summarizes the linear relationship between the predictor (x) and response (y) variables through the equation $\hat{y} = b_0 + b_1 x$
Used to make predictions about the response variable from the predictor

Regression line: $\hat{y} = b_0 + b_1 x$
$b_0$: intercept (predicted response when $x = 0$)
$b_1$: slope (predicted change in the response for every one-unit increase in $x$); the sign of the slope tells the direction of the relationship
$x$: value of the predictor variable
$\hat{y}$: expected value of the response variable at that value of $x$

Example #8: Regression Line
Question: What does the regression line tell us about the relationship between age and sale price?
Sale prices tend to ____

Example #9: Regression Line
Question: How much would we expect to pay for a used car that is 7 years old?
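The slides do not list formulas for the intercept and slope, so this sketch uses the standard least-squares results, writing the line as y_hat = b0 + b1 * x (notation assumed here): b1 = s_xy / s_x^2 and b0 = y_bar - b1 * x_bar. The used-car data are not given in the text, so the example runs on hypothetical points; the function names are illustrative only.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for y_hat = b0 + b1 * x.

    b1 = s_xy / s_x**2 and b0 = y_bar - b1 * x_bar; the (n - 1)
    divisors cancel in the ratio, so plain sums of deviations suffice.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x_new):
    """Predicted response y_hat at predictor value x_new."""
    return b0 + b1 * x_new


# Hypothetical points lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = least_squares_line(x, y)
print(b0, b1)              # 1.0 2.0
print(predict(b0, b1, 7))  # 15.0
```

A negative `b1` would indicate that the response tends to decrease as the predictor increases, matching the rule that the slope's sign gives the direction of the relationship.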
Example #10: Regression Line
Question: What do the intercept and slope mean?
Intercept: A car that is 0 years old would have a ____
Slope: For every additional year older a car is, ____

Example #11: Comparing Quantitative Variables
Question: Which of these statements are true?
I. There is a strong, negative linear relationship.
II. A car that was 2 years old and sold for $4,000 is close to what we would expect.
III. The slope of the regression line is positive.
I. Relationship is ____
II. $4,000 is ____ to what is expected
III. Slope of regression line is ____

Example #12: Reviewing Main Concepts
Scenario: We take a random sample of high school students and survey each one to see whether they play an instrument, play a sport, or participate in a club.
Question: What type of graph is best to display the results of the survey?
Question: What numerical summary is best to summarize the results?

Summary
Scatterplot: graphical display of two quantitative variables
Description: linearity, direction
Correlation: numerical summary of the strength of the linear relationship in a scatterplot
Close to 1 in absolute value: strong linear relationship
Unaffected by switching the roles of the variables
Does not necessarily capture curved (nonlinear) relationships
Regression line: used to make predictions and to determine the direction of the relationship
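To see why correlation does not necessarily capture curved relationships, here is a small sketch with hypothetical data: y = x^2 on points symmetric around 0. The relationship is perfect, yet the positive and negative products of deviations cancel, so r comes out exactly 0. The helper name is illustrative.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y) (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # y = x^2: a perfect but curved relationship
print(sample_correlation(x, y))  # 0.0
```

The same points fed through a straight-line rule such as y = 2x + 1 would give r = 1, so an r near 0 only rules out a linear pattern, not every pattern.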