Chapter 10 Correlation and Regression

Similar documents
Regression Equation. April 25, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

Regression Equation. November 28, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

CREATED BY SHANNON MARTIN GRACEY 146 STATISTICS GUIDED NOTEBOOK/FOR USE WITH MARIO TRIOLA S TEXTBOOK ESSENTIALS OF STATISTICS, 3RD ED.

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Chapter 9. Correlation and Regression

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation

Chapter 10. Correlation and Regression. Lecture 1 Sections:

CHAPTER 4 DESCRIPTIVE MEASURES IN REGRESSION AND CORRELATION

Chapter 7. Scatterplots, Association, and Correlation

Section 4.1 Polynomial Functions and Models. Copyright 2013 Pearson Education, Inc. All rights reserved

Chapter 5 Friday, May 21st

Chapter 6: Exploring Data: Relationships Lesson Plan

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Chapters 9 and 10. Review for Exam. Chapter 9. Correlation and Regression. Overview. Paired Data

MODELING. Simple Linear Regression. Want More Stats??? Crickets and Temperature. Crickets and Temperature 4/16/2015. Linear Model

Chapter 5 Least Squares Regression

Chapter 10 Regression Analysis

Summarizing Data: Paired Quantitative Data

Chapter 14. Statistical versus Deterministic Relationships. Distance versus Speed. Describing Relationships: Scatterplots and Correlation

Chapter 8. Linear Regression. The Linear Model. Fat Versus Protein: An Example. The Linear Model (cont.) Residuals

Statistical View of Least Squares

Warm-up Using the given data Create a scatterplot Find the regression line

Sociology 6Z03 Review I

Review of Regression Basics

MATH 80S Residuals and LSR Models October 3, 2011

Correlation and Regression

6.6 General Form of the Equation for a Linear Relation

Lecture 15: Chapter 10

Chapter 4 Describing the Relation between Two Variables

Section 1.4. Meaning of Slope for Equations, Graphs, and Tables

Chapter 12 - Part I: Correlation Analysis

Analysis of Bivariate Data

Chapter 2: Looking at Data Relationships (Part 3)

Unit 6 - Introduction to linear regression

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships

Correlation and Regression

THE PEARSON CORRELATION COEFFICIENT

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

INFERENCE FOR REGRESSION

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

Chapter 6. September 17, Please pick up a calculator and take out paper and something to write with. Association and Correlation.

Intermediate Algebra Summary - Part I

Bivariate Data Summary

Single and multiple linear regression analysis

Mini Lecture 2.1 Introduction to Functions

Chapter 3: Examining Relationships

Chapter 7 Linear Regression

Correlation: basic properties.

Regression Models. Chapter 4

Chapter 27 Summary Inferences for Regression

AP Statistics Two-Variable Data Analysis

Section Linear Correlation and Regression. Copyright 2013, 2010, 2007, Pearson, Education, Inc.

Describing the Relationship between Two Variables

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

Regression Diagnostics Procedures

BIVARIATE DATA data for two variables

Correlation and Linear Regression

Correlation and Regression

appstats8.notebook October 11, 2016

Common Core State Standards for Mathematics

appstats27.notebook April 06, 2017

Chapter 8. Linear Regression /71

ECON3150/4150 Spring 2015

Chapter 3: Describing Relationships

Chapter 7. Scatterplots, Association, and Correlation. Copyright 2010 Pearson Education, Inc.

Business Statistics. Lecture 10: Correlation and Linear Regression

Systems of Linear Equations

36-309/749 Math Review 2014

Module 1: Equations and Inequalities (30 days) Solving Equations: (10 Days) (10 Days)

Unit 6 - Simple linear regression

STAT Chapter 11: Regression

Nonlinear Regression Section 3 Quadratic Modeling

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Stat 101 Exam 1 Important Formulas and Concepts 1

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

We will now find the one line that best fits the data on a scatter plot.

Chapter 4 Data with Two Variables

11 Correlation and Regression

Chapter 3: Describing Relationships

9. Linear Regression and Correlation

Linear Regression and Correlation. February 11, 2009

Common Core State Standards for Mathematics

AMS 7 Correlation and Regression Lecture 8

BIOSTATISTICS NURS 3324

AP Final Review II Exploring Data (20% 30%)

MICHIGAN STANDARDS MAP for a Basic Grade-Level Program. Grade Eight Mathematics (Algebra I)

Chapter 2. Polynomial and Rational Functions. 2.3 Polynomial Functions and Their Graphs. Copyright 2014, 2010, 2007 Pearson Education, Inc.

Chapter 3: Examining Relationships

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population

AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions

Relationships Regression

AP Statistics L I N E A R R E G R E S S I O N C H A P 7

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Regression. Estimation of the linear function (straight line) describing the linear component of the joint relationship between two variables X and Y.

Chapter 4 Data with Two Variables

Chapter 5: Data Transformation

The following formulas related to this topic are provided on the formula sheet:

REVIEW 8/2/2017 陈芳华东师大英语系

1) A residual plot: A)

Transcription:

Chapter 10 Correlation and Regression 10-1 Review and Preview 10-2 Correlation 10-3 Regression 10-4 Variation and Prediction Intervals 10-5 Multiple Regression 10-6 Modeling Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-1

Section 10-1 Review and Preview Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-2

Preview In this chapter we introduce methods for determining whether a correlation, or association, between two variables exists and whether the correlation is linear. For linear correlations, we can identify an equation that best fits the data and we can use that equation to predict the value of one variable given the value of the other variable. In this chapter, we also present methods for analyzing differences between predicted values and actual values. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-3

Section 10-2 Correlation Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-4

Key Concept In part 1 of this section introduces the linear correlation coefficient r, which is a numerical measure of the strength of the relationship between two variables representing quantitative data. Using paired sample data (sometimes called bivariate data), we find the value of r (usually using technology), then we use that value to conclude that there is (or is not) a linear correlation between the two variables. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-5

Part 1: Basic Concepts of Correlation Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-6

Definition A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-7

Definition The linear correlation coefficient r measures the strength of the linear relationship between the paired quantitative x- and y-values in a sample. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-8

Exploring the Data We can often see a relationship between two variables by constructing a scatterplot. Figure 10-2 following shows scatterplots with different characteristics. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-9

Scatterplots of Paired Data Figure 10-2 Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-10

Scatterplots of Paired Data Figure 10-2 Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-11

Scatterplots of Paired Data Figure 10-2 Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-12

Requirements 1. The sample of paired (x, y) data is a simple random sample of quantitative data. 2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern. 3. The outliers must be removed if they are known to be errors. The effects of any other outliers should be considered by calculating r with and without the outliers included. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-13

Notation for the Linear Correlation Coefficient n = number of pairs of sample data x x 2 ( x) 2 denotes the addition of the items indicated. denotes the sum of all x-values. indicates that each x-value should be squared and then those squares added. indicates that the x-values should be added and then the total squared. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-14

Notation for the Linear Correlation Coefficient xy r indicates that each x-value should be first multiplied by its corresponding y-value. After obtaining all such products, find their sum. = linear correlation coefficient for sample data. = linear correlation coefficient for population data. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-15

Formula The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample. r = n xy ( x)( y) n( x 2 ) ( x) 2 n( y 2 ) ( y) 2 Formula 10-1 Computer software or calculators can compute r Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-16

Interpreting r Using Table A-6: If the absolute value of the computed value of r, denoted r, exceeds the value in Table A-6, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation. Using Software: If the computed P-value is less than or equal to the significance level, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-17

Caution Know that the methods of this section apply to a linear correlation. If you conclude that there does not appear to be linear correlation, know that it is possible that there might be some other association that is not linear. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-18

Rounding the Linear Correlation Coefficient r Round to three decimal places so that it can be compared to critical values in Table A-6. Use calculator or computer if possible. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-19

Properties of the Linear Correlation Coefficient r 1. 1 r 1 2. if all values of either variable are converted to a different scale, the value of r does not change. 3. The value of r is not affected by the choice of x and y. Interchange all x- and y-values and the value of r will not change. 4. r measures strength of a linear relationship. 5. r is very sensitive to outliers, they can dramatically affect its value. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-20

Example: The paired pizza/subway fare costs from Table 10-1 are shown here in Table 10-2. Use computer software with these paired sample values to find the value of the linear correlation coefficient r for the paired sample data. Requirements are satisfied: simple random sample of quantitative data; Minitab scatterplot approximates a straight line; scatterplot shows no outliers - see next slide Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-21

Example: Using software or a calculator, r is automatically calculated: Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-22

Interpreting the Linear Correlation Coefficient r Using Table A-6 to Interpret r: If r exceeds the value in Table A-6, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-23

Interpreting the Linear Correlation Coefficient r Critical Values from Table A-6 and the Computed Value of r Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-24

Example: Using a 0.05 significance level, interpret the value of r = 0.117 found using the 62 pairs of weights of discarded paper and glass listed in Data Set 22 in Appendix B. When the paired data are used with computer software, the P-value is found to be 0.364. Is there sufficient evidence to support a claim of a linear correlation between the weights of discarded paper and glass? Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-25

Example: Using Table A-6 to Interpret r: If we refer to Table A-6 with n = 62 pairs of sample data, we obtain the critical value of 0.254 (approximately) for = 0.05. Because 0.117 does not exceed the value of 0.254 from Table A-6, we conclude that there is not sufficient evidence to support a claim of a linear correlation between weights of discarded paper and glass. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-26

Interpreting r: Explained Variation The value of r 2 is the proportion of the variation in y that is explained by the linear relationship between x and y. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-27

Example: Using the pizza subway fare costs in Table 10-2, we have found that the linear correlation coefficient is r = 0.988. What proportion of the variation in the subway fare can be explained by the variation in the costs of a slice of pizza? With r = 0.988, we get r 2 = 0.976. We conclude that 0.976 (or about 98%) of the variation in the cost of a subway fares can be explained by the linear relationship between the costs of pizza and subway fares. This implies that about 2% of the variation in costs of subway fares cannot be explained by the costs of pizza. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-28

Common Errors Involving Correlation 1. Causation: It is wrong to conclude that correlation implies causality. 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient. 3. Linearity: There may be some relationship between x and y even when there is no linear correlation. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-29

Caution Know that correlation does not imply causality. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-30

Section 10-3 Regression Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-31

Key Concept In part 1 of this section we find the equation of the straight line that best fits the paired sample data. That equation algebraically describes the relationship between two variables. The best-fitting straight line is called a regression line and its equation is called the regression equation. In part 2, we discuss marginal change, influential points, and residual plots as tools for analyzing correlation and regression results. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-32

Part 1: Basic Concepts of Regression Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-33

Regression The regression equation expresses a relationship between x (called the explanatory variable, predictor variable or independent variable), and ^y (called the response variable or dependent variable). The typical equation of a straight line y = mx + b is expressed in the form ^y = b 0 + b 1 x, where b 0 is the y-intercept and b 1 is the slope. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-34

Definitions Regression Equation Given a collection of paired data, the regression equation y ^ = b 0 + b 1 x algebraically describes the relationship between the two variables. Regression Line The graph of the regression equation is called the regression line (or line of best fit, or least squares line). Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-35

Notation for Regression Equation Population Parameter Sample Statistic y-intercept of regression equation Slope of regression equation 0 b 0 1 b 1 Equation of the regression line y = 0 + 1 x y ^= b 0 + b 1 x Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-36

Requirements 1. The sample of paired (x, y) data is a random sample of quantitative data. 2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern. 3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-37

Formulas for b 0 and b 1 Formula 10-3 b 1 r s y s x (slope) Formula 10-4 b 0 y b 1 x (y-intercept) calculators or computers can compute these values Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-38

Special Property The regression line fits the sample points best. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-39

Rounding the y-intercept b 0 and the Slope b 1 Round to three significant digits. If you use the formulas 10-3 and 10-4, do not round intermediate values. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-40

Example: Refer to the sample data given in Table 10-1 in the Chapter Problem. Use technology to find the equation of the regression line in which the explanatory variable (or x variable) is the cost of a slice of pizza and the response variable (or y variable) is the corresponding cost of a subway fare. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-41

Example: Requirements are satisfied: simple random sample; scatterplot approximates a straight line; no outliers Here are results from four different technologies technologies Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-42

Example: All of these technologies show that the regression equation can be expressed as y ^ = 0.0346 +0.945x, where ^y is the predicted cost of a subway fare and x is the cost of a slice of pizza. We should know that the regression equation is an estimate of the true regression equation. This estimate is based on one particular set of sample data, but another sample drawn from the same population would probably lead to a slightly different equation. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-43

Example: Graph the regression equation yˆ 0.0346 0.945x (from the preceding Example) on the scatterplot of the pizza/subway fare data and examine the graph to subjectively determine how well the regression line fits the data. On the next slide is the Minitab display of the scatterplot with the graph of the regression line included. We can see that the regression line fits the data quite well. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-44

Example: Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-45

Using the Regression Equation for Predictions 1. Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well. 2. Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables (as described in Section 10-2). Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-46

Using the Regression Equation for Predictions 3. Use the regression line for predictions only if the data do not go much beyond the scope of the available sample data. (Predicting too far beyond the scope of the available sample data is called extrapolation, and it could result in bad predictions.) 4. If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate, which is its sample mean. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-47

Strategy for Predicting Values of Y Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-48

Using the Regression Equation for Predictions If the regression equation is not a good model, the best predicted value of y is simply y, ^ the mean of the y values. Remember, this strategy applies to linear patterns of points in a scatterplot. If the scatterplot shows a pattern that is not a straight-line pattern, other methods apply, as described in Section 10-6. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-49

Part 2: Beyond the Basics of Regression Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-50

Definitions In a scatterplot, an outlier is a point lying far away from the other data points. Paired sample data may include one or more influential points, which are points that strongly affect the graph of the regression line. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-51

Example: Consider the pizza subway fare data from the Chapter Problem. The scatterplot located to the left on the next slide shows the regression line. If we include this additional pair of data: x = 2.00,y = 20.00 (pizza is still $2.00 per slice, but the subway fare is $ 20.00 which means that people are paid $20 to ride the subway), this additional point would be an influential point because the graph of the regression line would change considerably, as shown by the regression line located to the right. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-52

Example: Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-53

Example: Compare the two graphs and you will see clearly that the addition of that one pair of values has a very dramatic effect on the regression line, so that additional point is an influential point. The additional point is also an outlier because it is far from the other points. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 10.1-54