Summarizing Data: Paired Quantitative Data


regression line (or least-squares line) a straight-line model for the relationship between the explanatory (x) and response (y) variables, often used to produce a prediction ŷ of the response y for a given value of x (the small hat over the variable indicates that the quantity is not a measured value but rather a predicted value of the response variable); equivalently, the line that minimizes the sum of the squared deviations between the data points and the model line. Its equation is

ŷ = b0 + b1·x, with slope b1 = r·(sy/sx) and y-intercept b0 = ȳ − b1·x̄.

[TI-83: STAT Calc LinReg(a+bx)]

Assumptions for using the linear regression model: the Quantitative Variables Condition, the Straight Enough Condition, and the Outlier Condition.
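As a numerical illustration (not part of the original notes, which use a TI-83), the slope and intercept formulas above can be sketched in Python with NumPy on a small hypothetical data set:

```python
import numpy as np

# Hypothetical paired data: explanatory x and response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Summary statistics; ddof=1 gives the sample standard deviations sx, sy.
x_bar, y_bar = x.mean(), y.mean()
s_x, s_y = x.std(ddof=1), y.std(ddof=1)
r = np.corrcoef(x, y)[0, 1]          # correlation coefficient

# Least-squares slope and intercept from the formulas above.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

# Prediction y-hat for a given x within the range of the data.
y_hat = b0 + b1 * 3.5
print(b1, b0, y_hat)
```

Fitting the same data with np.polyfit(x, y, 1) returns an identical slope and intercept, which confirms that the summary-statistic formulas and the direct least-squares fit agree.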

Analyzing Paired Quantitative Data: using the least-squares line

The least-squares regression line is determined by minimizing the y-deviations between the observed data values and the corresponding predicted values, so switching the explanatory and response variables will generate a different least-squares line.

The least-squares line always passes through the point of means (x̄, ȳ). That is, the predicted response for the average value of the explanatory variable x equals the average value of the response variable.

An increase in x of one standard deviation sx corresponds to a change in ŷ of r times the standard deviation sy. Thus, since r lies between −1 and +1, predicted values ŷ lie closer (in standard units) to their mean ȳ than the corresponding x values are to their mean x̄. (We say that the predicted ŷ values regress towards their mean. This is why the least-squares line is also called the regression line.)
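Both properties above, the line passing through the point of means and regression toward the mean, can be checked numerically. The data below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical paired data.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.5, 4.0, 6.5, 7.0])

r = np.corrcoef(x, y)[0, 1]
b1 = r * y.std(ddof=1) / x.std(ddof=1)
b0 = y.mean() - b1 * x.mean()

# Property 1: the line passes through the point of means (x-bar, y-bar).
print(b0 + b1 * x.mean(), y.mean())   # these agree

# Property 2: an x one standard deviation above x-bar predicts a y only
# r standard deviations above y-bar (regression toward the mean).
y_pred = b0 + b1 * (x.mean() + x.std(ddof=1))
z_of_prediction = (y_pred - y.mean()) / y.std(ddof=1)
print(z_of_prediction, r)             # these also agree
```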

coefficient of determination (r² or R²) measures the percentage of the total variation in the y values that is due to their linear association with the corresponding x values.

residual (Resid) the deviation y − ŷ between the measured value of the response variable and its corresponding predicted value on the regression line; the mean of the residuals always equals 0.

residual plot a scatterplot of the pairs (ŷ, Resid), used to evaluate whether a linear model is appropriate: if it is, the residual plot should show no patterns or trends. [TI-83: StatPlot, use Ylist:RESID]

residual standard deviation (se) a measure of how far a typical point lies above or below the regression line, i.e. the size of a typical residual:

se = √( Σ(y − ŷ)² / (n − 2) )

[TI-83: STAT TESTS LinRegTTest, read off s]
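These three quantities can be computed directly from a fitted line; a minimal Python/NumPy sketch using made-up data (the notes themselves use the TI-83 commands shown above):

```python
import numpy as np

# Hypothetical data; fit the least-squares line, then compute the
# quantities defined above: residuals, r^2, and s_e.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)   # least-squares slope and intercept
y_hat = b0 + b1 * x            # predicted values
resid = y - y_hat              # residuals y - y-hat

# Coefficient of determination: fraction of the variation in y
# explained by the linear association with x.
r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

# Residual standard deviation: s_e = sqrt( sum(resid^2) / (n - 2) ).
s_e = np.sqrt(np.sum(resid**2) / (n - 2))

print(resid.mean())   # the mean of the residuals is always 0
print(r2, s_e)
```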

Analyzing Paired Quantitative Data: linear regression wisdom

Residual plots are an indispensable tool for assessing the suitability of the linear model. The data should be homogeneous; that is, there should not be subgroups of the data that differ from each other in some respect (such subgroups are often recognizable in a residual plot).

The Straight Enough Condition warns us to check that the scatterplot is reasonably straight, to ensure that the linear model is appropriate; deviations from straightness are often more easily noticed in a residual plot.

Regression formulas are often used to extrapolate, that is, to make predictions for y corresponding to x values beyond the range of the measured data, based on trends within that range. All such predictions are suspect, and the further one extrapolates, the more suspect the prediction!

The Outlier Condition warns us to be on guard for outliers in the data: points with large deviations in x or y, or both. Such points can be influential, in the sense that the size of the correlation (and hence the regression formula) can change dramatically when the outlier is removed from the data set.
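The effect of an influential point can be demonstrated numerically. In this hypothetical sketch, a single outlying point drags the correlation and the fitted slope far from what the rest of the data show:

```python
import numpy as np

# Five well-behaved points plus one outlier (the last point).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 12.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9, 5.0])

# Correlation and slope with the outlier included...
r_with = np.corrcoef(x, y)[0, 1]
slope_with, _ = np.polyfit(x, y, 1)

# ...and with the outlier removed.
r_without = np.corrcoef(x[:-1], y[:-1])[0, 1]
slope_without, _ = np.polyfit(x[:-1], y[:-1], 1)

print(r_with, r_without)          # the correlation changes dramatically
print(slope_with, slope_without)  # and so does the fitted slope
```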

A residual plot can also identify outliers with high leverage: the tendency to single-handedly change the direction of the regression line by a noticeable amount. Treat them in the same way as influential points.

Outliers in the data need not be bad, and should not be dismissed out of hand or discarded merely to strengthen the association between the variables; they should rather be explained: let the data honestly speak for themselves.

A high correlation does not necessarily signify a causative relationship. There may be a strong association between variables without a cause-and-effect relation between them, since both the explanatory and response variables might be influenced by a third, lurking variable that has not been measured.

Correlations between paired data sets based on averaged data smooth out much of the natural variation in the raw measurements and so tend to be very high; predictions in these cases may be unreliable when applied to individual cases.
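The last point, that averaging inflates correlation, can be illustrated with simulated data (all numbers below are made up for the demonstration): the correlation computed on group averages comes out far higher than the correlation on the raw individual measurements.

```python
import numpy as np

# Simulate noisy individual measurements with a true linear trend.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=300)
y = 2 * x + rng.normal(0, 8, size=300)   # large individual variation

r_raw = np.corrcoef(x, y)[0, 1]          # correlation on raw data

# Average x and y within 10 equal-width bins of x, then correlate
# the bin means: averaging smooths out the individual variation.
edges = np.linspace(0, 10, 11)[1:-1]
bins = np.digitize(x, edges)
x_avg = np.array([x[bins == b].mean() for b in range(10)])
y_avg = np.array([y[bins == b].mean() for b in range(10)])
r_avg = np.corrcoef(x_avg, y_avg)[0, 1]

print(r_raw, r_avg)   # r_avg is much closer to 1 than r_raw
```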