AP Statistics: Linear Regression (Chapter 7)


AP Statistics: Linear Regression (Chapter 7)

"The object [of statistics] is to discover methods of condensing information concerning large groups of allied facts into brief and compendious expressions suitable for discussion." -- Francis Galton (1822-1911)

Relationship between Two Quantitative Variables

In the mid-20th century Dr. Mildred Trotter, a forensic anthropologist, determined relationships between the dimensions of various bones and a person's height. The relationships she found are still in use today in the effort to identify missing persons from skeletal remains (think C.S.I.). One relationship compares the length of the femur to the person's height.

Measure the length of your femur along the outside of your leg (in inches). Also measure your height (in inches) if you don't already know it. Record both at the front of the room.

Scatterplot of Height vs. Femur

Femur length is the predictor variable and height is the response variable. Association between the variables:
- Direction: positive
- Form: linear
- Strength: moderate (correlation = 0.591)

Linear Model

A linear model is a line that passes through the middle of the plotted points. The line we will create is called the Least Squares Regression Line: it minimizes the sum of the squared residuals (residuals are also called errors).

Linear Model

A residual is the difference between the observed response and what the model predicts for the response:

residual = observed y − predicted y = y − ŷ

Example: The indicated point is for a femur length of 15 in. The observed height is 62 in., and the model predicts a height of 64.4 in. The residual is:

residual = 62 − 64.4 = −2.4

The model over-predicts the height for this data point by 2.4 in.

Least Squares Regression Equation

The form of the linear model is:

ŷ = b0 + b1 · x,  where  b1 = r · (s_y / s_x)  and  b0 = ȳ − b1 · x̄

Note 1: These formulas are on your AP Formula Sheet.
Note 2: See the optional Math Box on p. 180 to look under the hood.

Least Squares Regression Equation

If we standardized the data, both standard deviations would be 1:

b1 = r · (s_y / s_x) = r · (1 / 1) = r

Since r can never be larger than 1 in magnitude, this indicates that the predicted y tends to be closer to its mean (in standard deviations) than the corresponding x is to its mean: when x increases by 1 SD, the predicted y increases by only r times 1 SD. This property is called regression to the mean.

Back to Our Data

Variable | Mean  | Std. dev.
femur    | 16.10 | 1.71
height   | 65.87 | 4.10

r = 0.591

Compute the Least Squares Regression Model:

b1 = r · (s_y / s_x),  b0 = ȳ − b1 · x̄,  ŷ = b0 + b1 · x
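The slide's formulas can be checked numerically. Here is a minimal sketch in Python (not part of the original slides), using only the summary statistics above:

```python
# Sketch: compute the least-squares slope and intercept from the
# summary statistics on this slide (means, SDs, and r).
mean_femur, sd_femur = 16.10, 1.71
mean_height, sd_height = 65.87, 4.10
r = 0.591

b1 = r * sd_height / sd_femur        # slope: b1 = r * (s_y / s_x)
b0 = mean_height - b1 * mean_femur   # intercept: b0 = y-bar - b1 * x-bar

print(f"predicted height = {b0:.2f} + {b1:.2f} * femur")
# prints "predicted height = 43.06 + 1.42 * femur"
```

This reproduces, up to rounding, the line reported by the software output on a later slide (43.07 + 1.42 · femur).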

Back to Our Data

Calculate the Least Squares Regression Model using your calculator (see p. 193 for TI Tips).

Regression on the Computer

Simple linear regression results:
Dependent Variable: height
Independent Variable: femur
height = 43.06999 + 1.4157449 · femur
Sample size: 34
R (correlation coefficient) = 0.59071902
R-sq = 0.34894896
Estimate of error standard deviation: 3.3562315

Parameter estimates:
Parameter | Estimate  | Std. Err. | Alternative | DF | T-Stat    | P-value
Intercept | 43.06999  | 5.5348135 | ≠ 0         | 32 | 7.7816516 | <0.0001
Slope     | 1.4157449 | 0.3418507 | ≠ 0         | 32 | 4.1414119 | 0.0002

Interpret the Slope and Intercept

predicted height = 43.07 + 1.42 · femur

Intercept: the predicted height if the femur length is 0 inches!
Slope: for every 1-inch increase in femur length we expect an increase of about 1.42 inches in height.

Make Predictions

predicted height = 43.07 + 1.42 · femur

Predict a person's height if the femur length is:
1. 18.5 inches
2. 25 inches
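As a quick check (a sketch, not from the slides), the two predictions can be computed directly from the fitted line:

```python
# Sketch: predictions from the fitted line, predicted height = 43.07 + 1.42 * femur.
def predict_height(femur_inches):
    return 43.07 + 1.42 * femur_inches

print(predict_height(18.5))  # about 69.3 in. -- inside the range of the data
print(predict_height(25.0))  # about 78.6 in. -- extrapolation (see next slide)
```

Note that a femur length of 25 inches lies well outside the observed data, so the second prediction is an extrapolation.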

Make Predictions

A prediction made outside the range of the predictor (explanatory) data is called extrapolation. Proceed with caution: the model may not hold true beyond the observed data.

How Good Is The Model?

Make a residual plot. If there is no apparent pattern in the residuals, then the Least Squares Regression (LSR) Model is reasonable: no pattern means the model captured all of the interesting relationship between the variables. The mean of the residuals will always be 0, and the smaller the residuals, the closer the model fits the data.
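The claim that the residuals always average to zero can be illustrated with a small made-up dataset (the numbers below are hypothetical, not the class data):

```python
# Sketch: fit a least-squares line by hand and check that the residuals
# from the fit sum (and hence average) to essentially zero.
xs = [14.0, 15.5, 16.0, 17.5, 18.0]   # hypothetical femur lengths (in.)
ys = [60.0, 64.0, 65.0, 69.0, 71.0]   # hypothetical heights (in.)
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(abs(sum(residuals)) < 1e-9)  # True: least-squares residuals sum to 0
```

This holds for any least-squares fit that includes an intercept, which is why a residual plot is judged by its pattern, not its average.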

How Good Is The Model?

Software usually plots the residuals against the predicted values (heights in this case). When there is only one explanatory variable, the plot is the same when the residuals are plotted against the explanatory variable (other than scale). [Plots: Residuals vs. Predicted Values; Residuals vs. Explanatory Variable]

How Good Is The Model?

See TI Tips on p. 193 for instructions on creating a residual plot on your calculator. Note: every time you have your calculator create a LSR Model, a residuals list (RESID) is created. Every time. The most common mistake AP Stat students make here is forgetting to rerun the LSR command when working with a new set of data.

How Good Is The Model?

- No left-over underlying pattern: the linear model is reasonable.
- Curved pattern left over after regression: the linear model did not capture the entire relationship between the variables.
- Residuals increase in size as x increases: the linear model did not capture the entire relationship between the variables.

After Deciding On The Model

A statistic we will call R-squared* tells us how much of the variability in the response variable is captured by the model's association with the predictor variable. It goes something like this...

*It has a technical name (the coefficient of determination), but no one really ever uses it.

R-squared

What if I told you a new student was joining the class next week, and I asked you to predict his/her height? Further, suppose the only model you have is based on the mean height of students in the AP Stats class (ȳ = 65.9 in.). What height is your best guess? That's right: the mean height of 65.9 in.

R-squared

The mean height model: ŷ = 65.9

R-squared

Here's the new student: Femur = 18 in., Height = 70 in. (The mean model still predicts ŷ = 65.9.)

R-squared

Here's the new student. The gap between the student's observed height and the mean model's prediction (ŷ = 65.9) is the error with respect to the mean model. The mean height is not a satisfactory guess; we need more information.

R-squared

Now we know the student's femur length to be 18 in. Using predicted height = 43.07 + 1.42 · femur, much of the error with respect to the mean model (ŷ = 65.9) is eliminated by the LSR Model. A better estimate is based on the LSR Model: 68.63 in.

R-squared

The ratio of these two errors (squared and summed over all the data points) gives R-squared: the fraction of the variation around the mean model (ŷ = 65.9) that is eliminated by the LSR Model (predicted height = 43.07 + 1.42 · femur).

R-squared

R-squared is the fraction of the variation in height accounted for by the regression model with femur length. Specifically: about 34.9% of the variation in height can be attributed to the linear regression model with femur length.

R-squared

R-squared will always be between 0% and 100%. R-squared describes the percent of variation in the response variable that can be explained by (or attributed to) the regression on the explanatory variable. In the case of heights predicted from femur lengths, if only 34.9% of the observed variation in heights can be attributed to differences in femur length, there must be other factors that contribute to differences in heights. What might some of these factors be?
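For simple linear regression (one predictor), R-squared is literally the square of the correlation r, which is easy to verify with the value from the software output (a sketch, not part of the slides):

```python
# Sketch: for simple linear regression, R-squared = r ** 2.
r = 0.59071902              # correlation from the software output
r_squared = r ** 2
print(round(r_squared, 4))  # 0.3489 -- about 34.9% of variation explained
```

This matches the R-sq = 0.34894896 reported by the software.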

r vs. R-squared

Correlation r describes the strength and direction of a linear association between two quantitative variables. R-squared describes the amount of variation observed in the response variable that is accounted for by the linear model with the predictor variable.

What if the Variables are Reversed?

Because the Least Squares Regression Line minimizes the vertical distances to the observed data, you can't ever (no, not ever; don't even think it!) use a regression equation to predict the explanatory variable from the response variable. The two fits are different lines:

predicted height = 43.07 + 1.42 · femur
predicted femur = −0.13 + 0.25 · height
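One way to see that the two lines are genuinely different (a sketch using the slide's summary statistics): the two slopes are not reciprocals of each other; their product is r-squared.

```python
# Sketch: the slope of height-on-femur times the slope of femur-on-height
# equals r ** 2, not 1 -- so one fitted line is not the other solved in reverse.
r = 0.591
sd_femur, sd_height = 1.71, 4.10

slope_height_on_femur = r * sd_height / sd_femur   # about 1.42
slope_femur_on_height = r * sd_femur / sd_height   # about 0.25

product = slope_height_on_femur * slope_femur_on_height
print(round(product, 3), round(r ** 2, 3))  # both 0.349, and not 1
```

If the two lines were inverses of each other, the product of the slopes would be exactly 1; it only approaches 1 as r approaches ±1.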

Summary

- If the scatterplot doesn't look straight, don't fit a line!
- Watch out for outliers (in all directions).
- Do use a residual plot to see if the linear model looks reasonable.
- Don't use R-squared to determine whether a linear model is reasonable. R-squared only tells you something about the amount of variation in the response variable that can be traced back to a particular predictor variable.
- Association does not imply causation. A regression model with high r and R-squared in no way means the predictor variable causes the response variable.
- Extrapolate with caution.

Assignment

Read Chapter 7.
Assignment 1: Do Ch 7 exercises #1-11 odd, 17
Assignment 2: Do Ch 7 exercises #19-31 odd, 37-41 odd, 45, 53, 57, 59

www.causeweb.org
John Landers