Chapter 3: Examining Relationships

Chapter 3: Examining Relationships 3.1 Scatterplots 3.2 Correlation 3.3 Least-Squares Regression [Scatterplot of Fabric Tenacity, lb/oz/yd² vs. Fiber Tenacity, g/den, with fitted line y = 3.9951x + 4.5711, R² = 0.9454] 1

Relationship Between Fiber Tenacity and Fabric Tenacity

Fiber Tenacity, g/den    Fabric Tenacity, lb/oz/yd²
3.6                      19.0
3.9                      20.5
4.1                      20.8
4.3                      21.0
4.8                      23.0
5.0                      24.9

Variable Designations Which variable is the dependent variable? Our text uses the term response variable. Which variable is the independent variable? The explanatory variable. Note: Sometimes we do not have a clear explanatory-response situation; we may just want to look at the relationship between two variables. Problems 3.1 and 3.4, p. 123 3

Scatterplot 1: Relationship Between Fiber Tenacity and Fabric Tenacity [Scatterplot of Fabric Tenacity, lb/oz/yd² vs. Fiber Tenacity, g/den] Note placement of response and explanatory variables. Also note axes labels and plot title. 4
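
The same plot can be made outside the calculator as well. Below is a minimal Python sketch, assuming matplotlib is available, that draws the fiber/fabric tenacity scatterplot from the table above with the explanatory variable on the x-axis and the response on the y-axis; it is an illustration, not the textbook's calculator procedure.

```python
# Scatterplot of the fiber/fabric tenacity data (illustrative sketch).
import matplotlib.pyplot as plt

fiber = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]          # explanatory: Fiber Tenacity, g/den
fabric = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]   # response: Fabric Tenacity, lb/oz/yd^2

plt.scatter(fiber, fabric)
plt.xlabel("Fiber Tenacity, g/den")
plt.ylabel("Fabric Tenacity, lb/oz/yd^2")
plt.title("Relationship Between Fiber Tenacity and Fabric Tenacity")
plt.show()
```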

Problem 3.6, p. 125 Type the data into your calculator. Examining a scatterplot: Look for the overall pattern and striking deviations from that pattern. Pay particular attention to outliers. Look at the form, direction, and strength of the relationship. 5

Examining a Scatterplot, cont. Form: Does the relationship appear to be linear? Direction: Is the association positive or negative? Strength of relationship: How closely do the points follow a clear form? In the next section, we will discuss the correlation coefficient as a numerical measure of strength of relationship. 6

Scatterplot for 3.6 7

Problem 3.9, p. 129 8

Tips for Drawing Scatterplots p. 128 9

Adding a Categorical Variable to a Scatterplot [Scatterplot of Income (thousands of year-2000 dollars) vs. Year (67 = 1967), with separate series plotted for Black, Hispanic, White, and Asian groups] 10

Homework Reading: pp. 121-135 11

Practice Problems: 3.11 (p. 129) 3.12 (p. 132) 3.16 (p. 136) 12

Figure 3.6, p. 136 13

[Two scatterplots drawn with different axis scales] Which shows the strongest relationship? 14

The two plots represent the same data! Our eye is not good at judging the strength of a relationship, so we need a method for quantifying the relationship between two variables. The most common measure of relationship is the Pearson Product Moment correlation coefficient; we generally just say correlation coefficient. 15

Correlation Coefficient, r The correlation, r, is the average of the products of the standardized x-values and the standardized y-values for each pair: r = (1/(n-1)) Σ [(x_i - x̄)/s_x] [(y_i - ȳ)/s_y] 16
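
As a concrete check of this formula, here is a short Python sketch (an illustration added here, not part of the text) that computes r for the fiber/fabric tenacity data by standardizing each value and averaging the products.

```python
# r = (1/(n-1)) * sum of z_x * z_y, using the fiber/fabric tenacity data.
from statistics import mean, stdev

x = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]             # fiber tenacity, g/den
y = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]       # fabric tenacity, lb/oz/yd^2

n = len(x)
xbar, ybar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)                    # sample standard deviations

r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy)
        for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 3))                             # a strong positive correlation
```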

Correlation Coefficient, r A correlation coefficient measures these characteristics of the linear relationship between two variables, x and y. Direction of the relationship: positive or negative. Degree of the relationship: how well do the data fit the linear form being considered? A correlation of 1 or -1 represents a perfect fit; a correlation of 0 indicates no linear relationship. 17

Interpreting the Correlation Coefficient, r Correlation Applet: http://www.duxbury.com/authors/mcclellandg/tiein/johnson/correlation.htm Facts about correlation: pp. 143-144 Correlation is not a complete description of two-variable data. We also need to report a complete numerical summary (means and standard deviations, or the 5-number summary) of both x and y. 18

Exercise 3.25, p. 146 19

Outlier, or influential point? Let's enter the data into our calculators and calculate the correlation coefficient. The data are in the middle two columns of Table 1.10, p. 59. r = ? Now, remove the possible influential point. What happens to r? 20
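
The Table 1.10 data are not reproduced in these slides, but the same experiment can be run on any data set. The sketch below reuses the fiber/fabric tenacity data purely as an illustration: it recomputes r after dropping one point so you can see how a single observation can move the correlation.

```python
# How removing one observation changes r (illustrative; not the Table 1.10 data).
from statistics import mean, stdev

def corr(x, y):
    n, xbar, ybar, sx, sy = len(x), mean(x), mean(y), stdev(x), stdev(y)
    return sum((xi - xbar) / sx * (yi - ybar) / sy
               for xi, yi in zip(x, y)) / (n - 1)

x = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
y = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]

print("r with all points:    ", round(corr(x, y), 3))
print("r without last point: ", round(corr(x[:-1], y[:-1]), 3))
```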

Exercises: Understanding Correlation Review Facts about correlation, pp. 143-144 3.34, 3.35, and 3.37, p. 149 Reading: pp. 149-157 22

Relationship Between Winding Tension and Yarn Elongation [Scatterplot of Elongation (%) vs. Winding Tension (g) with fitted line y = -0.0759x + 9.4455, R² = 0.732] 23

Least Squares Regression Ultimately, we would like to predict elongation by using a more practical measurement, winding tension. A regression line, also called a line of best fit, was found. How was the line of best fit determined? Determine mathematically the distance between the line and each data point for all values of x. The distance between the actual value (y) and the predicted value is called a residual (or error): residual = y_i - ŷ_i = error (e) 24

Least Squares Regression: Line of Best Fit This could be done for each data point. If we square each residual and sum all of the squared residuals, we have: Σe² = Σ (y_i - ŷ_i)², summed over i = 1, ..., n. The best-fitting line is the line that has the smallest sum of e²... the least squares regression line! That is, the line of best fit occurs when: Σe² = Σ (y_i - ŷ_i)² = minimum 25
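
One way to convince yourself of this criterion is to compare Σe² for the least-squares line against a line with a slightly different slope. The Python sketch below does this for the fiber/fabric tenacity data; it is an illustration of the idea, not a derivation.

```python
# The least-squares line has a smaller sum of squared residuals than a perturbed line.
from statistics import mean

x = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
y = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]
xbar, ybar = mean(x), mean(y)

# Least-squares slope and intercept.
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

def sse(a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

b2 = b + 0.5                 # nudge the slope; keep the line through (xbar, ybar)
a2 = ybar - b2 * xbar
print("SSE, least-squares line:", round(sse(a, b), 3))
print("SSE, perturbed line:    ", round(sse(a2, b2), 3))   # always at least as large
```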

A Residual (Figure 3.11, p. 151) 26

Least-Squares Regression Line With the help of algebra and a little calculus, it can be shown that this occurs when: b = r (s_y / s_x), a = ȳ - b x̄, and ŷ = a + bx 27
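
These formulas can be applied directly. The following Python sketch computes b = r (s_y / s_x) and a = ȳ - b x̄ for the fiber/fabric tenacity data and then uses ŷ = a + bx for a prediction; the numbers illustrate the formulas and need not match the fitted equation shown on the title slide, which may come from a different data set.

```python
# Least-squares coefficients from b = r*(s_y/s_x) and a = ybar - b*xbar.
from statistics import mean, stdev

x = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]             # fiber tenacity, g/den
y = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]       # fabric tenacity, lb/oz/yd^2

xbar, ybar, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
r = sum((xi - xbar) / sx * (yi - ybar) / sy
        for xi, yi in zip(x, y)) / (len(x) - 1)

b = r * sy / sx            # slope
a = ybar - b * xbar        # intercept

print(f"y-hat = {a:.3f} + {b:.3f} x")
print("predicted fabric tenacity at fiber tenacity 4.5:", round(a + b * 4.5, 2))
```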

Exercise 3.12, p. 132 Is there a relationship between lean body mass and resting metabolic rate for females? Quantify this relationship. Find the line of best fit (the least-squares regression, LSR). Use the LSR to predict the resting metabolic rate for a woman with mass of 45 kg and for a woman with mass of 59.5 kg. 28

Interpreting the Regression Model The slope of the regression line is important for the interpretation of the data: the slope is the rate of change of the response variable for a one-unit change in the explanatory variable. The intercept is the predicted value of y when x = 0. It is statistically meaningful only when x can actually take values close to zero. 29

R²: Coefficient of Determination The proportion of variability in one variable that can be associated with (or predicted by) the variability of the other variable. For example, if r = 0.85, then r² = 0.72 and 1 - r² = 0.28. 30
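
The phrase "proportion of variability" can be made concrete: r² equals 1 minus the ratio of the residual (unexplained) variation to the total variation in y. A minimal Python sketch, reusing the fiber/fabric tenacity data as an illustration:

```python
# r^2 = 1 - SSE/SST: the share of variation in y explained by the regression on x.
from statistics import mean

x = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
y = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]
xbar, ybar = mean(x), mean(y)

b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # left-over variation
sst = sum((yi - ybar) ** 2 for yi in y)                       # total variation in y

print("r^2 =", round(1 - sse / sst, 3))
```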

Exercise 3.45, p. 166 31

Exercise 3.45, p. 166 32

Residuals In regression, we see deviations by looking at the scatter of points about the regression line. The vertical distances from the points to the least-squares regression line are as small as possible, in the sense that they have the smallest possible sum of squares. Because they represent left-over variation in the response after fitting the regression line, these distances are called residuals. 33

Examining the Residuals The residuals show how far the data fall from our regression line, so examining the residuals helps us to assess how well the line describes the data. Residuals Plot 34

Residuals Plot Let's construct a residuals plot, that is, a plot of the explanatory variable vs. the residuals. pp. 174-175 The residuals plot helps us to assess the fit of the least-squares regression line. We are looking for similar spread about the line y = 0 (why?) for all levels of the explanatory variable. 35

Residuals Plot Interpretation, cont. A curved or other systematic pattern shows an underlying relationship that is not linear. Figure 3.19(b), p. 170 Increasing or decreasing spread about the line as x increases indicates that prediction of y will be less accurate for larger or smaller x. Figure 3.19(c), p. 171 Look for outliers! 36

Figures 3.19 (a-c), pp. 170-171 37

How to create a residuals plot Create the regression model using your calculator. Create a column in your STAT menu for the residuals. Remember that a residual is the actual value minus the predicted value: residual = y - ŷ 38
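
Outside the calculator, the same residuals plot takes only a few lines. The Python sketch below (assuming matplotlib, and reusing the fiber/fabric tenacity data as an illustration) computes each residual as y - ŷ and plots it against the explanatory variable with a reference line at 0.

```python
# Residuals plot: explanatory variable vs. residual (actual minus predicted).
import matplotlib.pyplot as plt
from statistics import mean

x = [3.6, 3.9, 4.1, 4.3, 4.8, 5.0]
y = [19.0, 20.5, 20.8, 21.0, 23.0, 24.9]
xbar, ybar = mean(x), mean(y)

b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # y minus y-hat

plt.scatter(x, residuals)
plt.axhline(0)                                            # reference line at 0
plt.xlabel("Fiber Tenacity, g/den")
plt.ylabel("Residual")
plt.title("Residuals Plot")
plt.show()
```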

Residuals Plot for 3.45 39

HW Read through end of chapter Problems: 3.42 and 3.43 (parts a and b only), p. 165 3.46, p. 173 Chapter 3 Test on Monday 40

Regression Outliers and Influential Observations A regression outlier is an observation that lies outside the overall pattern of the other observations. An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line. Sometimes, however, such a point is not influential because it falls in line with the remaining data points. Note: An influential point is often an outlier in terms of x, but we label it influential only if removing it markedly changes the regression. 41

Practice Problems Problems: 3.56, p. 179 3.74, p. 188 3.76, p. 189 42

Preparing for the Test Re-read the chapter. Know the terms and big concepts. Chapter Review, pp. 181-182 Go back over the example and HW problems. Study the slides! 43