MATH 2560 C F03 Elementary Statistics I LECTURE 9: Least-Squares Regression Line and Equation

1 Outline

- least-squares regression line (LSRL);
- equation of the LSRL;
- interpreting the LSRL;
- correlation and regression.

2 Least-Squares Regression Line

Our first aim: we need a way to draw a regression line that doesn't depend on our guess as to where the line should go. We want the one line that is as close to the data points as possible. Our second aim: we want a regression line that makes the prediction errors as small as possible:

   error = observed value − predicted value → minimize!

Figure 2.13 illustrates the idea.

The most common way to make these errors as small as possible is the LEAST-SQUARES idea.

Least-Squares Regression Line. The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Below we have the least-squares idea expressed as a mathematical problem.

Least-Squares Idea as a Mathematical Problem
1. There are n observations on two variables x and y: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).
2. The line y = a + bx through the scatterplot of these observations predicts the value of y corresponding to x_i as ŷ_i = a + b·x_i.
3. The predicted response ŷ_i will not be exactly the same as the actually observed response y_i.
4. The prediction error for the point x_i is: error = observed y_i − predicted ŷ_i.
5. The method of least squares chooses the line that makes the sum of the squares of these errors as small as possible.
6. Mathematical problem: find the values of the intercept a and the slope b that minimize

   Σ (error)² = Σ (y_i − ŷ_i)² = Σ (y_i − a − b·x_i)².
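The minimization in step 6 has a closed-form solution (given in the next section), but the idea itself is easy to see in code. Below is a minimal sketch in Python, assuming NumPy is available; the small data set is made up for illustration. It computes the sum of squared errors for any candidate line and shows that the least-squares fit (found here with numpy.polyfit) beats an eyeballed guess:

```python
import numpy as np

# Hypothetical illustration data (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sum_squared_errors(a, b, x, y):
    """Sum of squared vertical distances of the points from y = a + b*x."""
    y_hat = a + b * x                  # predicted responses
    return np.sum((y - y_hat) ** 2)    # sum of (observed - predicted)^2

# A degree-1 polyfit returns the least-squares (slope, intercept).
b_ls, a_ls = np.polyfit(x, y, 1)

print(sum_squared_errors(a_ls, b_ls, x, y))  # smallest achievable
print(sum_squared_errors(0.0, 2.0, x, y))    # an eyeballed line: larger
```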

Equation for the LSRL

Equation of the Least-Squares Regression Line
1. Suppose we have data on an explanatory variable x and a response variable y for n individuals.
2. The means and standard deviations of the sample data are x̄ and s_x for x, and ȳ and s_y for y; the correlation between x and y is r.
3. The equation of the least-squares regression line of y on x is

   ŷ = a + bx,  with slope b = r·(s_y/s_x) and intercept a = ȳ − b·x̄.

Example 2.13. Mean height of Kalama children (Table 2.7). We calculate the means and standard deviations for x (age) and y (height), the correlation r, the slope b, and the intercept a, and give the equation of the least-squares line in this case:
1. Mean and standard deviation for x: x̄ = 23.5 months, s_x = 3.606 months.
2. Mean and standard deviation for y: ȳ = 79.85 cm, s_y = 2.302 cm.
3. Correlation: r = 0.9944.
4. Slope: b = r·(s_y/s_x) = 0.9944 × (2.302/3.606) = 0.6348 cm per month.
5. Intercept: a = ȳ − b·x̄ = 79.85 − (0.6348)(23.5) = 64.932 cm.
6. The equation of the least-squares line: ŷ = 64.932 + 0.6348x.
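The slope and intercept formulas need only the five summary statistics, so the Kalama calculation can be checked in a couple of lines. A minimal sketch in Python, using the numbers from Example 2.13:

```python
# Summary statistics from Example 2.13 (Kalama data).
x_bar, s_x = 23.5, 3.606    # mean and SD of age, in months
y_bar, s_y = 79.85, 2.302   # mean and SD of height, in cm
r = 0.9944                  # correlation between age and height

b = r * s_y / s_x           # slope: about 0.6348 cm per month
a = y_bar - b * x_bar       # intercept: about 64.93 cm

print(f"y-hat = {a:.3f} + {b:.4f} x")
```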

3 Interpreting the Regression Line

Interpreting the Least-Squares Regression Line
1. The slope b = r·(s_y/s_x) says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y. (The change in the predicted response ŷ matches the change in x when r = 1 or r = −1; otherwise, since −1 < r < 1, the change in ŷ is less than the change in x.)
2. The least-squares regression line always passes through the point (x̄, ȳ).

Figure 2.14 displays the basic regression output for the Kalama data from a graphing calculator and two statistical software packages.
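Both facts in the list above are easy to verify numerically from the formulas for a and b. A small self-contained sketch (the variable names are mine; the numbers are the Kalama summary statistics):

```python
# Kalama summary statistics (Example 2.13).
x_bar, s_x, y_bar, s_y, r = 23.5, 3.606, 79.85, 2.302, 0.9944
b = r * s_y / s_x
a = y_bar - b * x_bar

def predict(x):
    return a + b * x

# Fact 2: the least-squares line passes through (x-bar, y-bar).
assert abs(predict(x_bar) - y_bar) < 1e-9

# Fact 1: moving one SD in x moves the prediction by r SDs in y.
assert abs((predict(x_bar + s_x) - predict(x_bar)) - r * s_y) < 1e-9
```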

4 Correlation and Regression

Least-squares regression looks at the distances of the data points from the line only in the y direction.

Example 2.14. Expanding the Universe (Figure 2.15). Figure 2.15 is a scatterplot of data that played a central role in the discovery that the universe is expanding. Here r = 0.7842; hence, the relationship between the distances from Earth of 24 spiral galaxies and the speeds at which they are moving away from us is positive and linear.

Important Remark: Although there is only one correlation between velocity and distance, the regression of velocity on distance and the regression of distance on velocity give different lines.
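The Important Remark follows directly from the slope formula: regressing y on x gives slope r·(s_y/s_x), while regressing x on y gives slope r·(s_x/s_y), and the two describe the same line only when |r| = 1. A minimal sketch with made-up data (not the Figure 2.15 values):

```python
import numpy as np

# Hypothetical (distance, velocity) pairs for illustration only.
distance = np.array([0.03, 0.26, 0.28, 0.45, 0.50, 0.63, 0.80, 0.90, 1.00, 1.10])
velocity = np.array([170., 290., 270., 200., 290., 650., 400., 650., 920., 450.])

r = np.corrcoef(distance, velocity)[0, 1]

# Regression of velocity on distance: slope r * s_v / s_d.
b_v_on_d = r * velocity.std(ddof=1) / distance.std(ddof=1)

# Regression of distance on velocity: slope r * s_d / s_v.
b_d_on_v = r * distance.std(ddof=1) / velocity.std(ddof=1)

# Drawn on the same (distance, velocity) axes, the second line has
# slope 1 / b_d_on_v, which differs from b_v_on_d unless |r| = 1.
print(b_v_on_d, 1 / b_d_on_v)
```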

There is a close connection between correlation and regression.

Connection between Correlation and Regression:
- the slope of the least-squares line involves r;
- the square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.

Relationship between r and r². When you report a regression, give r² as a measure of how successfully the regression explains the response. All the software outputs in Figure 2.14 include r². The use of r² to describe the success of regression in explaining the response y is very common; it rests on the fact that there are two sources of variation in the responses y in a regression setting.

Example: Kalama children. One reason the Kalama heights vary is that height changes with age. A second reason is that the heights do not lie exactly on the line, but are scattered above and below it. We use r² to measure the variation along the line as a fraction of the total variation in the response variable.

For a pictorial grasp of what r² tells us, look at Figure 2.16. Both scatterplots resemble the Kalama data, but with many more observations. The least-squares regression line is the same as the one we computed from the Kalama data. In Figure 2.16(a), r = 0.994 and r² = 0.989. In Figure 2.16(b), r = 0.921 and r² = 0.849: there is more scatter about the fitted line, so r² is smaller than in Figure 2.16(a).
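"Fraction of variation explained" can be made concrete with the identity r² = 1 − (sum of squared residuals)/(total sum of squares about ȳ), which holds for the least-squares line. A short sketch with made-up data:

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.3, 9.7, 12.2])

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

ss_residual = np.sum((y - y_hat) ** 2)    # scatter about the fitted line
ss_total = np.sum((y - y.mean()) ** 2)    # total variation in y

r = np.corrcoef(x, y)[0, 1]
print(r ** 2)                             # squared correlation
print(1 - ss_residual / ss_total)         # agrees with r**2
```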

5 More Specific Interpretation of r²

The squared correlation gives us the variance of the predicted responses as a fraction of the variance of the actual responses:

   r² = (variance of predicted values ŷ) / (variance of observed values y).

This fact is always true.

Final Important Remark: The connections with correlation are special properties of least-squares regression. They are not true for other methods of fitting a line to data.
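Going back to the identity r² = var(ŷ)/var(y): it too can be confirmed numerically. A short sketch, again with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.3, 9.7, 12.2])

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x                     # predicted responses

r = np.corrcoef(x, y)[0, 1]
print(np.var(y_hat) / np.var(y))      # variance of predictions / variance of y
print(r ** 2)                         # equals the ratio above
```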

6 Summary
1. A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
2. The most common method of fitting a line to a scatterplot is least squares. The least-squares regression line is the straight line ŷ = a + bx that minimizes the sum of the squares of the vertical distances of the observed y-values from the line.
3. A regression line is used to predict the value of y for any value of x by substituting this x into the equation of the line. Extrapolation beyond the range of x-values spanned by the data is risky.
4. The slope b of a regression line ŷ = a + bx is the rate at which the predicted response ŷ changes along the line as the explanatory variable x changes. Specifically, b is the change in ŷ when x increases by 1.
5. The intercept a of a regression line ŷ = a + bx is the predicted response ŷ when the explanatory variable x = 0. This prediction is of no statistical use unless x can actually take values near 0. The least-squares regression line of y on x is the line with slope b = r·(s_y/s_x) and intercept a = ȳ − b·x̄. This line always passes through the point (x̄, ȳ).
6. Remarks. Correlation and regression are closely connected. The correlation r is the slope of the least-squares regression line when we measure both x and y in standardized units. The square of the correlation, r², is the fraction of the variance of one variable that is explained by least-squares regression on the other variable.