Least Squares Regression


Least Squares Regression. Sections 5.3 & 5.4. Cathy Poliak, Ph.D., cathy@math.uh.edu, Office in Fleming 11c, Department of Mathematics, University of Houston. Lecture 14, Math 2311.

Outline
1. Least-Squares Regression
2. Prediction
3. Coefficient of Determination
4. Residuals
5. Residual Plots

Popper Set Up. Fill in all of the proper bubbles. Make sure your ID number is correct. Make sure the filled-in circles are very dark. This is popper number 07.

Scatterplots and Correlation. We want to know the relationship between two random variables. To look quickly at the relationship, we use the scatterplot and correlation to determine:
Direction (positive or negative)
Strength (weak, moderately weak, strong)
Form (linear or non-linear)

Popper 07 Questions. We are looking at the relationship between how many items are on a shelf (shelf space) and the number of items sold in a week. The correlation coefficient is r = 0.827.
1. What is the direction of the relationship between shelf space and weekly sales? a) Positive b) Negative c) No direction
2. What is the strength of the relationship between shelf space and weekly sales? a) Strong b) Moderate c) Weak
3. What form appears to be in the relationship between shelf space and weekly sales? a) Linear b) Non-linear

Popper 07 Questions. Use the following scatterplot to answer the questions below. [Scatterplot of y vs. x; only the axis tick labels survived transcription.]
4. What is the correlation coefficient for this set of data? a. 1 b. -0.94 c. 0.94 d. 0
5. If both X and Y are multiplied by 2, what would happen to the correlation coefficient? a. It will stay the same. b. It will be divided by 2. c. It will be multiplied by 2. d. It will be zero.

Example. The marketing manager of a supermarket chain would like to use shelf space to predict the sales of coffee. A random sample of 12 stores is selected, with the following results.

Store   Shelf Space (ft)   Weekly Sales (# sold)
  1            5                  160
  2            5                  220
  3            5                  140
  4           10                  190
  5           10                  240
  6           10                  260
  7           15                  230
  8           15                  270
  9           15                  280
 10           20                  260
 11           20                  290
 12           20                  310

Scatterplot. [Scatterplot of Number Sold (roughly 150 to 300) vs. Shelf Space (5 to 20 feet).]

Examining relationships. Correlation measures the direction and strength of the straight-line relationship between two quantitative variables. If a scatterplot shows a linear relationship, we would like to summarize this overall pattern by drawing a line on the scatterplot. A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. This equation is used when one of the variables helps explain or predict the other.

Least-Squares regression. The least-squares regression line (LSRL) of Y on X is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Least-Squares. [Scatterplot of Number Sold vs. Shelf Space (feet), with the fitted least-squares line drawn through the points.]

Equation of the least-squares regression line. Let x be the explanatory variable and y be the response variable for n individuals. From the data, calculate the means x̄ and ȳ, the standard deviations s_x and s_y of the two variables, and their correlation r. The least-squares line is the equation ŷ = a + bx.

Equation of the least-squares regression line. The least-squares line is the equation ŷ = a + bx. In the example of the supermarket sales, let Y = weekly sales and X = shelf space. The least-squares regression equation to predict weekly sales based on shelf space is ŷ = 145 + 7.4x.

Calculating the least-squares regression equation by hand. Let X be the explanatory variable and Y be the response variable for n individuals.
1. From the data, calculate the means x̄ and ȳ, the standard deviations s_x and s_y of the two variables, and their correlation r.
2. Calculate the slope: b = r (s_y / s_x)
3. Calculate the y-intercept: a = ȳ - b x̄
4. Then the equation is ŷ = a + bx, where you substitute the slope for b and the y-intercept for a and leave y and x as variables (do not put any numbers in for y and x).

Finding the Equation for Coffee Sales. Given these values, determine the least-squares regression line (LSRL) equation for predicting number sold based on shelf space. Explanatory variable: shelf space (X), x̄ = 12.5 feet, s_x = 5.83874 feet. Response variable: sales (Y), ȳ = 237.5 units sold, s_y = 52.2451 units sold. Correlation: r = 0.827.
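Plugging these summary statistics into the hand-calculation steps above reproduces the equation given earlier. A minimal check in Python (the slides' own code examples use R):

```python
# Summary statistics from the coffee-sales example
x_bar, s_x = 12.5, 5.83874    # shelf space (ft)
y_bar, s_y = 237.5, 52.2451   # weekly sales (units)
r = 0.827                     # correlation

b = r * s_y / s_x             # slope: b = r * (s_y / s_x)
a = y_bar - b * x_bar         # y-intercept: a = y-bar - b * x-bar

print(f"y-hat = {a:.1f} + {b:.1f}x")  # matches the slides' y-hat = 145 + 7.4x
```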

Example of LSRL for Cost of Car. The following are descriptive statistics for estimating the cost of a car (Y) from the age of the car (X), with r = -0.8224 (the correlation is negative: older cars cost less). Estimated cost (Y): ȳ = 10360.93, s_y = 5482.3372. Age (X): x̄ = 5.214, s_x = 2.940. Determine the least-squares regression line (LSRL) for the data.

Least-Squares Regression Line Using R. Use the command lm(yvariable ~ xvariable). For the coffee sales example:
> shelf=c(5,5,5,10,10,10,15,15,15,20,20,20)
> sales=c(160,220,140,190,240,260,230,270,280,260,290,310)
> lm(sales~shelf)
Call:
lm(formula = sales ~ shelf)
Coefficients:
(Intercept)        shelf
      145.0          7.4
The Intercept value is a and the other value is b, the slope. Thus the equation in this example is ŷ = 145 + 7.4x.
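For readers without R, the same fit can be reproduced from the raw data using the equivalent textbook formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b x̄; a self-contained Python sketch:

```python
# Coffee-sales data from the example
shelf = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
sales = [160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310]

n = len(shelf)
x_bar = sum(shelf) / n
y_bar = sum(sales) / n

# Slope from the sum of cross-products over the sum of squared x-deviations
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(shelf, sales))
sxx = sum((x - x_bar) ** 2 for x in shelf)
b = sxy / sxx          # 7.4, the same slope R reports
a = y_bar - b * x_bar  # 145.0, the same intercept R reports
```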

Least-Squares Regression Line Using TI-83(84). 1. Make sure diagnostics are turned on: press 2ND CATALOG and scroll down to DiagnosticOn. 2. Choose STAT CALC, then 8:LinReg(a+bx). 3. Make sure your Xlist is L1 and your Ylist is L2, and select Calculate. 4. a is your y-intercept and b is your slope.

Prediction. The equation of the regression line makes prediction easy: substitute an x-value into the equation. Predict the weekly sales of coffee with shelf space of 12 feet: ŷ = 145 + 7.4(12) = 233.8. Thus 233.8 is the predicted weekly number of units of coffee sold for a store that has 12 feet of shelf space.

Popper Questions. The least-squares regression line equation to predict the cost of the car (Y) from the age of the car (X) is ŷ = 18358 - 1534x.
6. Predict the cost of an automobile for a 5-year-old car. a) $18358 b) $1534 c) $10688 d) $11.96
7. Predict the cost of an automobile for a 20-year-old car. a) $18358 b) -$1534 c) -$12,322 d) $49038
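Plugging in is mechanical; a short Python check of both popper answers (the helper name predict_cost is ours, not the slides'). Note that a 20-year-old car is likely far outside the sampled ages, and the prediction comes out negative, a sign that the line should not be trusted that far out:

```python
def predict_cost(age):
    """Predicted car cost from the popper equation y-hat = 18358 - 1534x."""
    return 18358 - 1534 * age

print(predict_cost(5))   # 10688  -> answer c
print(predict_cost(20))  # -12322 -> answer c (a nonsense negative price)
```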

Facts about least-squares regression. Fact 1: A change of one standard deviation in x corresponds to a change of r standard deviations in y (this is where the slope b comes from). Fact 2: The least-squares regression line always passes through the point (x̄, ȳ); that is how we get the y-intercept a. Fact 3: The distinction between explanatory and response variables is essential in regression.

R². The square of the correlation, R², describes the strength of a straight-line relationship. Its formal name is the coefficient of determination. It is the percent of the variation in Y that is explained by the regression equation; R² is a measure of how successful the regression was in explaining the response.

R² for coffee sales. The correlation is r = 0.827. The coefficient of determination is R² = 0.827² = 0.684. This means that 68.4% of the variation in coffee sales can be explained by the equation.
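Equivalently, R² can be computed from the data as 1 - SSE/SST, the fraction of total variation not left in the residuals; a Python sketch confirming the 68.4% figure from the raw coffee data and the fitted line:

```python
# Coffee-sales data and the fitted line y-hat = 145 + 7.4x
shelf = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
sales = [160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310]

y_bar = sum(sales) / len(sales)
pred = [145 + 7.4 * x for x in shelf]

sse = sum((y - p) ** 2 for y, p in zip(sales, pred))  # unexplained (residual) variation
sst = sum((y - y_bar) ** 2 for y in sales)            # total variation about the mean
r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.684, matching 0.827 squared
```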

Example. The following is the output of a least-squares regression equation on the TI-84. [Calculator output not reproduced in this transcription.] What percent of the variation in the y-variable can be explained by this regression equation? a) 67% b) 6.7845% c) 77% d) 88%

Is the regression good at predicting the response? The coefficient of determination R² is the percent (fraction) of variability in the response variable (Y) that is explained by the least-squares regression on the explanatory variable. It is a measure of how successful the regression equation was in predicting the response variable: the closer R² is to one (100%), the better the equation is at predicting the response. From the previous question, we can say that 84.33% of the variation in the monthly gas bills can be explained by the least-squares regression line equation.

Is the LSRL good at predicting the response? A residual is the difference between an observed value of the response variable and the value predicted by the regression line: residual = observed y - predicted y. We can determine a residual for each observation. The closer the residuals are to zero, the better we are at predicting the response variable. We can plot the residuals for the observations; this is called a residual plot.

Example of residuals. The regression equation to determine the number of units of coffee sold (y) from shelf space (x) is ŷ = 145 + 7.4x. A store that has 10 feet of space sold 260 units of coffee. Determine the residual for this store. 1. Determine the predicted units sold for x = 10. 2. The observed units sold is the given value, 260. 3. The residual is the difference between the observed y and the predicted y.
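Working the three steps for this store, as a one-line Python check:

```python
observed = 260                   # units the store actually sold
predicted = 145 + 7.4 * 10       # step 1: y-hat at x = 10 feet gives 219
residual = observed - predicted  # steps 2-3: observed minus predicted
print(residual)                  # 41, matching the table of residuals
```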

Residuals of coffee sales. The regression equation is: Weekly Sales = 145 + 7.40 (Shelf Space).

Shelf Space   Observed Weekly Sales   Predicted Weekly Sales   Residual
     5                 160                     182                -22
     5                 220                     182                 38
     5                 140                     182                -42
    10                 190                     219                -29
    10                 240                     219                 21
    10                 260                     219                 41
    15                 230                     256                -26
    15                 270                     256                 14
    15                 280                     256                 24
    20                 260                     293                -33
    20                 290                     293                 -3
    20                 310                     293                 17

Residual Plots. A plot of the residual values against the x-values can tell us about our LSRL model. It is a scatterplot with the x-values on the horizontal axis and the residuals on the vertical axis. In R: plot(x, resid(lm(y ~ x))). If the equation is a good way of predicting the y-variable (our model is a good fit), the residual plot will be a cloud of scattered points with no pattern, centered about zero on the vertical axis.

Residual Plot. [Residual plot for the coffee data: residuals (about -40 to 40) vs. shelf space (5 to 20 feet).]

R code:
> shelf=c(5,5,5,10,10,10,15,15,15,20,20,20)
> sales=c(160,220,140,190,240,260,230,270,280,260,290,310)
> plot(shelf,resid(lm(sales~shelf)),ylab="residuals")
> abline(0,0)

Examining a residual plot. A curved pattern shows that the relationship is not linear. Increasing spread about the zero line as x increases indicates that predictions of y will be less accurate for larger x. Decreasing spread about the zero line as x increases indicates that predictions of y will be more accurate for larger x. Individual points with large residuals are outliers in the vertical (y) direction. Individual points that are extreme in the x direction are outliers in the x-variable.