Fitting a regression model

We wish to fit a simple linear regression model:

$y = \beta_0 + \beta_1 x + \epsilon$.

Fitting a model means obtaining estimators for the unknown population parameters $\beta_0$ and $\beta_1$ (and also for the variance of the errors, $\sigma^2$).

First step: obtain a sample of size $n$ from the relevant population. For each sample unit, obtain measurements $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$.

How do we use the sample values to estimate the model parameters? We wish to find estimators $b_0$, $b_1$ that are best in some sense.

The Method of Least Squares

The method that produces the best estimators we are seeking is called the method of Least Squares (LS), sometimes also known as Ordinary Least Squares (OLS). By best we mean the values of $\beta_0, \beta_1$ that produce a line closest to all $n$ observations: we find the line that minimizes the vertical deviations of the observations from the line.

Formal definition of the LS estimators: the values of $\beta_0, \beta_1$ that minimize the sum of squared deviations of the observations from the line.

Note: the textbook uses $\hat{\beta}_0, \hat{\beta}_1$ to denote the estimators of $\beta_0, \beta_1$, whereas I have used $b_0, b_1$. We mean the same thing and you can use either notation.

The Method of Least Squares (cont'd)

Steps to obtain the LS estimators of $(\beta_0, \beta_1)$:

1. For each observation $(y_i, x_i)$, consider the error $\epsilon_i$:

   $\epsilon_i = y_i - E(y_i) = y_i - (\beta_0 + \beta_1 x_i)$.

2. Find the values of $\beta_0, \beta_1$ that minimize the sum of the squared errors (SSE):

   $SSE = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$.
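
To make the objective concrete, here is a minimal Python sketch (mine, not part of the original notes; the function name `sse` is illustrative). It evaluates the SSE for a candidate pair of coefficients; least squares chooses the pair that makes this quantity as small as possible.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared errors for the candidate line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
```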

The Method of Least Squares (cont'd)

It can be shown that the LS estimators of $\beta_0, \beta_1$ are given by

$b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$,

where $SS_{xy}$ is the sum of cross-deviations of $y$ and $x$:

$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$,

and $SS_{xx}$ is the sum of squared deviations of the $x$'s:

$SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$.

Formulas for $SS_{xy}$ and $SS_{xx}$ that are easier for computation are

$SS_{xy} = \sum_i x_i y_i - n \bar{x} \bar{y}, \qquad SS_{xx} = \sum_i x_i^2 - n (\bar{x})^2$.

[Those of you who know some calculus might be interested in the companion set of notes, LS-derivation, on the course web site. Everyone else: the material in LS-derivation is NOT part of the course, so don't faint.]
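
As a sketch of how the computational formulas translate into code (the helper name `ls_estimates` is my own, not from the notes):

```python
def ls_estimates(xs, ys):
    """Least squares estimates (b0, b1) via the computational formulas."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar  # SS_xy
    ss_xx = sum(x ** 2 for x in xs) - n * xbar ** 2               # SS_xx
    b1 = ss_xy / ss_xx
    b0 = ybar - b1 * xbar
    return b0, b1
```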

Method of LS - Example

Suppose that we have the following data on a sample of $n = 5$ stores, where $y$ represents the number of units sold (in 100s) of a product over a certain period and $x$ represents the amount (in $1,000) spent by the store in advertising the product:

Store   x   y
  1     2   5
  2     3   7
  3     4   6
  4     5   7
  5     6   9

Method of LS - Example (cont'd)

We wish to answer the following questions:

1. How many units can a store expect to sell if it spends $5,000 in advertising?
2. What might be the expected sales if a store were to increase advertising by $1,000?
3. Would it be possible to sell more than 1000 units if advertising were increased? By how much?

To answer all of those questions, we need to get $b_0$ and $b_1$. Use the computational formulas for $SS_{xy}$ and $SS_{xx}$. We need $\bar{x}$ and $\bar{y}$, the products $x_i y_i$, and the squares $x_i^2$. From the table above: $\bar{x} = 4$ and $\bar{y} = 6.8$.

Method of LS - Example (cont'd)

To get $SS_{xy}$ and $SS_{xx}$ we expand the data table:

Store   x   y   xy   x^2
  1     2   5   10    4
  2     3   7   21    9
  3     4   6   24   16
  4     5   7   35   25
  5     6   9   54   36

Now:

$SS_{xy} = \sum_i x_i y_i - n \bar{x} \bar{y} = (10 + 21 + 24 + 35 + 54) - 5 \cdot 4 \cdot 6.8 = 144 - 136 = 8$.

Method of LS - Example (cont'd)

We get the sum of squared deviations of $x$ in a similar manner:

$SS_{xx} = \sum_i x_i^2 - n (\bar{x})^2 = (4 + 9 + 16 + 25 + 36) - 5 \cdot 16 = 90 - 80 = 10$.

We can now compute the estimators for $\beta_0, \beta_1$:

$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{8}{10} = 0.8, \qquad b_0 = \bar{y} - b_1 \bar{x} = 6.8 - 0.8 \cdot 4 = 3.6$.
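
Feeding the five stores into the `ls_estimates` sketch from earlier reproduces these hand computations (a check I added, not part of the original notes):

```python
xs = [2, 3, 4, 5, 6]   # advertising, in $1,000
ys = [5, 7, 6, 7, 9]   # units sold, in 100s

b0, b1 = ls_estimates(xs, ys)
print(b1)  # 0.8
print(b0)  # 3.6 (up to floating-point rounding)
```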

Example - Interpreting results

$\beta_1$ represents the change in $y$ when $x$ increases by one unit. Thus, in our example, every $1,000 increase in advertising expenditures is expected to result in an additional 80 units of the product sold. A store that spends nothing on advertising can expect to sell about 360 units of the product.

How many units can a store that spends $5,000 expect to sell? We need to compute $\hat{y}$, the predicted value of $y$ for $x = 5$:

$\hat{y} = 3.6 + 0.8 \cdot 5 = 7.6$.

Thus a store that spends $5,000 in advertising can expect to sell about 760 units in the period under consideration.
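
Continuing the snippet above, the prediction is a one-liner:

```python
y_hat = b0 + b1 * 5  # predicted y at x = 5, i.e., $5,000 of advertising
print(y_hat)         # 7.6, i.e., about 760 units
```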

Example - Interpreting results (cont'd)

What might be the expected change in sales at a store that increases advertising by $1,000? Since we know that every additional $1,000 represents an increase of about 80 units sold, a store that increases ads by $1,000 can expect to sell: current amount + 80 = $y + 80$.

Would it be possible to sell more than 1000 units if advertising were increased? By how much? By trial and error: for $6,000 in ads we can expect to sell $\hat{y} = 3.6 + 0.8 \cdot 6 = 8.4$ (hundred) units, i.e., 840 units. For $8,000 we can expect to sell $\hat{y} = 3.6 + 0.8 \cdot 8 = 10$ (hundred) units, i.e., 1000 units.

Example - Interpreting results (cont'd)

More formally: for a given $\hat{y}$, solve for $x$ from $\hat{y} = b_0 + b_1 x$. If I know what $\hat{y}$ I want and I have $b_0, b_1$, I can solve for $x$ above as

$x = \frac{\hat{y} - b_0}{b_1}$.

In our example, for $\hat{y} = 10$ and for $b_0 = 3.6$, $b_1 = 0.8$, I get

$x = \frac{10 - 3.6}{0.8} = 8$,

or $8,000, the same answer we obtained earlier by trial and error.
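
The same inversion in code (the helper name `x_for_target` is mine, for illustration):

```python
def x_for_target(y_target, b0, b1):
    """Advertising level x at which the fitted line predicts y_target."""
    return (y_target - b0) / b1

print(x_for_target(10, 3.6, 0.8))  # 8.0, i.e., $8,000 (up to rounding)
```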

Residuals or errors

Earlier we computed $\hat{y}$, the predicted value of $y$ for a given $x$, as $\hat{y} = b_0 + b_1 x$. Note that $\hat{y}$ is an estimator of $E(y)$, the expected value of $y$ for a given $x$. Since we had defined $\epsilon = y - E(y)$, we can now estimate the errors, or residuals, for each observation as

$e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_i$.

Note that the sum of the residuals is equal to 0: $\sum_i e_i = 0$.
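
Continuing the example snippet, the residuals and their sum can be checked directly (a sketch, assuming `xs`, `ys`, `b0`, `b1` from before):

```python
y_hats = [b0 + b1 * x for x in xs]
resids = [y - yh for y, yh in zip(ys, y_hats)]
print(resids)       # approximately [-0.2, 1.0, -0.8, -0.6, 0.6]
print(sum(resids))  # essentially 0 (exactly 0 up to floating-point error)
```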

Example: Tampa home sales

Data are appraised values ($x$) and sale prices ($y$) (both in $1,000) of $n = 92$ residential properties sold in Tampa, FL in 1999. Questions of interest might be:

1. Are appraisal value and sale price associated?
2. What is the expected change in sale price if the assessed value of a home increases by $20,000?
3. What sale price can a home owner expect if the house she owns is appraised at $180,000?
4. A home owner is hoping to sell his home for $500,000 or more. How high would his house need to be appraised for his hopes to be realistic?

See the JMP and SAS outputs. The SAS code is on the web site under Examples.

Example: Tampa home sales (cont'd)

1. Are appraisal value and sale price associated? It appears so. The estimated regression coefficient $b_1$ is 1.07, apparently different from 0.

2. Since $b_1 = 1.07$, the expected change in sale price for every $1,000 increase in assessed value is $b_1 \cdot 1000$ dollars, i.e., $1,070. Thus, an increase in assessed value of $20,000 is associated with an increase in sale price of about $20 \cdot b_1 = 21.4$ (thousand), i.e., $21,400.

3. We compute $\hat{y}$ for $x = 180$: $\hat{y} = 20.94 + 1.07 \cdot 180 = 213.54$. The owner of a home assessed at $180,000 can expect to get about $213,500 for it.

4. The owner wishes to make $500,000: we need to find the $x$ for which $\hat{y} = 500$:

$x = \frac{500 - b_0}{b_1} = \frac{500 - 20.94}{1.07} = 447.72$.

His hopes would be realistic if his home is appraised at about $448,000 or more.
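
Using the coefficient estimates reported in the notes ($b_0 = 20.94$, $b_1 = 1.07$), the two numeric answers can be reproduced; this is a sketch only, since the Tampa data themselves live on the course web site:

```python
b0, b1 = 20.94, 1.07  # estimates from the JMP/SAS output in the notes

print(b0 + b1 * 180)    # about 213.54 (in $1,000), i.e., roughly $213,500
print((500 - b0) / b1)  # about 447.72 (in $1,000), i.e., roughly $448,000
```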

Final comments

We can predict $y$ for any $x$. However, if the $x$ of interest is larger or smaller than all the $x$'s included in the sample, this is called extrapolation. It is always dangerous to extrapolate beyond the range of the sample: we do not know whether our model holds outside the range of the $x$'s in the sample. See figure.