Chapter 4: Regression With One Regressor

Chapter 4: Regression With One Regressor. Copyright 2011 Pearson Addison-Wesley. All rights reserved.

Outline
1. Fitting a line to data
2. The ordinary least squares (OLS) line/regression
3. Measures of fit
4. Population model
5. The least squares assumptions
6. The sampling distribution of the OLS estimator

Fitting a line to data
Suppose data on two rvs (X, Y): (X_1, Y_1), ..., (X_n, Y_n). No probability distribution for now. Suspect Y depends somewhat on X, e.g. Y = average test score in a school district, X = average student-teacher ratio in the district. Try to summarize/fit this dependence (if any) by a line defined by the intercept and slope parameters b_0, b_1. Seek b_0, b_1 s.t. the data approximately satisfy
Y_i ≈ b_0 + b_1 X_i.
Known as a regression.

Residuals of a particular line β_0, β_1 (figure).

Errors/residuals in fit
Given a line b_0, b_1, define the errors/residuals as
u_i := Y_i − (b_0 + b_1 X_i), so Y_i = b_0 + b_1 X_i + u_i.
Wish u_i, i = 1, ..., n to be zero-ish. This wish can be interpreted as a specific goal in various ways. One is in terms of the sum of squared residuals:
SSR := Σ_{i=1,...,n} u_i².
So the goal is to choose the line b_0, b_1 to minimize SSR. The minimizers b*_0, b*_1 are known as least squares. We will see another way later.

Least squares
So the least squares b*_0, b*_1 minimize
SSR = Σ u_i² = Σ (Y_i − b_0 − b_1 X_i)².
If you know basic calculus, you can set equal to zero the derivative of SSR wrt b_0, and that of SSR wrt b_1, then solve the system of two equations for the unknowns b_0, b_1.
Fitted/estimated/predicted value: Ŷ_i := b*_0 + b*_1 X_i. Residual/error: û_i := Y_i − Ŷ_i.
Following is the least squares line b*_0, b*_1:

b*_1 = Σ (X_i − X̄)(Y_i − Ȳ) / Σ (X_i − X̄)²,   b*_0 = Ȳ − b*_1 X̄

Interpretation
Ŷ = b*_0 + b*_1 X. The least squares line does fit the average data (role of b_0). b_1 is the sensitivity of Y to X, for values of X near the mean (assuming the dependence exists!):
b_1 = (sample covariance of X and Y) / (X's sample variance).
The slope is positive iff the data are positively correlated. If X varies little, so the denominator is near zero, the slope is unreliable (it varies greatly with small variations of Y). Exercise: Errors sum to zero.

Application to CA data
Slope = −2.28. Intercept = 698.9. Least squares line:
TestScore-hat = 698.9 − 2.28 STR

Fitted value & residual
For i = Antelope, CA district, (X_i, Y_i) = (19.33, 657.8).
Fitted value: Ŷ_Antelope = 698.9 − 2.28 × 19.33 = 654.8.
Residual: û_Antelope = 657.8 − 654.8 = 3.0.
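
The arithmetic on this slide can be checked in a few lines; a minimal sketch, using the slope, intercept, and Antelope data values from the slide:

```python
# Fitted value and residual for the Antelope, CA district
# (intercept, slope, and data values taken from the slide).
b0, b1 = 698.9, -2.28
x_antelope, y_antelope = 19.33, 657.8

y_hat = b0 + b1 * x_antelope       # fitted value
resid = y_antelope - y_hat         # residual

print(round(y_hat, 1))             # 654.8
print(round(resid, 1))             # 3.0
```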

Interpretation here
TestScore-hat = 698.9 − 2.28 STR
Districts with one fewer student per teacher have test scores 2.28 points higher, on average. Do not interpret the intercept as the value of the line at X = 0, for there are no school districts with every classroom empty (STR = X = 0), and, even if there were, there would be no test scores in such classrooms (i.e. no Y_i's). The intercept is just something that makes this true: Ȳ = b_0 + b_1 X̄.

Illustration: Computing OLS
Data (X, Y) = (3,2), (2,1), (3,1), (4,2). Compute OLS:
Mean of X: 3. Mean of Y: 3/2.
Numerator: Σ (X_i − X̄)(Y_i − Ȳ) = (0)(1/2) + (2−3)(1−3/2) + (0)(−1/2) + (4−3)(2−3/2) = 1.
Denominator: Σ (X_i − X̄)² = 0 + 1 + 0 + 1 = 2.
b*_1 = num / den = 1/2.
b*_0 = Ȳ − b*_1 X̄ = 3/2 − (1/2)(3) = 0.
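
This worked example translates directly into code; a minimal sketch using the four data points above:

```python
# OLS on the four illustrative data points from the slide.
xs = [3, 2, 3, 4]
ys = [2, 1, 1, 2]
n = len(xs)

xbar = sum(xs) / n                 # mean of X: 3
ybar = sum(ys) / n                 # mean of Y: 3/2
num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # numerator: 1
den = sum((x - xbar) ** 2 for x in xs)                      # denominator: 2

b1 = num / den                     # slope: 1/2
b0 = ybar - b1 * xbar              # intercept: 0
```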

Measures of Fit
There are two measures of the fit of the line to the data:
The R² measures the fraction of the variance of Y that is explained by X.
The standard error of the regression (SER) measures the magnitude of the regression's errors.

The R²
Recall Y_i = Ŷ_i + û_i, from the def. of û_i. Exercise: cov(Ŷ_i, û_i) = 0. So the variance splits into explained and unexplained parts:
var(Y_i) = var(Ŷ_i) + var(û_i).
Dividing,
1 = var(Ŷ_i)/var(Y_i) + var(û_i)/var(Y_i),
the explained and unexplained proportions of var(Y). The explained proportion is called R², 0 ≤ R² ≤ 1. Often worded via TSS := Σ (Y_i − Ȳ)² and the (minimized) SSR:
R² = 1 − var(û_i)/var(Y_i) = 1 − Σ (û_i − 0)² / Σ (Y_i − Ȳ)² = 1 − SSR/TSS.

The R², cont'd
Often worded via ESS := Σ (Ŷ_i − Ȳ)²:
R² := var(Ŷ_i)/var(Y_i) = Σ (Ŷ_i − Ȳ)² / Σ (Y_i − Ȳ)² = ESS/TSS.
Whatever formula one uses, clearly a higher R² is better.
Exercise: R² = the square of the correlation b/w X, Y.

Standard Error of the Regression (SER)
It measures the average magnitude of the errors:
SER := sqrt( (1/(n−2)) Σ (û_i − mean(û))² ) = sqrt( (1/(n−2)) Σ û_i² ).
The equality uses the fact that the errors sum/average to zero. Wish this to be small (and OLS minimizes it by definition). The factor is 1/(n−2), instead of 1/n (as in a true average), for technical reasons (showing it is consistent, later). The RMSE is defined as above, but with 1/n. (Very similar if n is large.)
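
Both measures of fit can be computed by hand on the small OLS illustration from earlier; a sketch (the data are the same illustrative four points):

```python
# R^2 and SER on the illustrative four-point example used earlier.
xs = [3, 2, 3, 4]
ys = [2, 1, 1, 2]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
     / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]

ssr = sum(u ** 2 for u in resid)              # minimized SSR
tss = sum((y - ybar) ** 2 for y in ys)        # total sum of squares
r2 = 1 - ssr / tss                            # fraction of var(Y) explained
ser = (ssr / (n - 2)) ** 0.5                  # standard error of the regression
```

Note that the residuals sum to zero, exactly as the "errors sum to zero" exercise claims.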

R² & SER for CA data
R² = .05, SER = 18.6: a poor fit. X = STR explains, via Ŷ, only a small fraction of var(test scores).

The Linear Regression Model
So far, given data, we discussed how to fit a line and measure the fit; that discussion was apart from any probabilistic model for the data. Now assume
Y_i = β_0 + β_1 X_i + u_i, i = 1, ..., n.
Data generated as follows. The X_i's: given. The Y_i's: there are constants β_0, β_1 and rvs u_i ("error term") such that every observed Y_i arises linearly as above. X is known as the independent variable or regressor, Y as the dependent variable. The error term subsumes omitted factors and data measurement errors.

Purpose of the Linear Regression Model
The assumption implies that there is some true model generating the data and the OLS b's. The CLT will imply, given conditions, the rate at which the averaged data and the OLS b's converge to the true model. Note: without a true model, what is there to converge to?! The OLS b's are called estimates β̂_0, β̂_1 (of the true β_0, β_1). (This usage would be senseless without the data-generating model.) As in ch. 3, we will address whether E(b) = β (unbiasedness), b → β (consistency), and the rate of convergence/confidence intervals.

The OLS Assumptions
Assume Y_i = β_0 + β_1 X_i + u_i, i = 1, ..., n.
1. The error term conditional on X has mean zero: E(u_i | X_i = x) = 0.
2. (X_i, Y_i), i = 1, ..., n, are i.i.d.
3. Outliers are rare: E(X⁴), E(Y⁴) are finite.
Purpose: (1) implies the OLS estimator is unbiased. (2) gives the sampling distribution of β̂_0, β̂_1; true under SRS. (3) is needed to apply the CLT for confidence intervals.

OLS assumption 1: E(u_i | X_i = x) = 0
Example: E(u_i | STR = 16) = 0. What are some of these other factors? District's wealth, parental involvement, ... Across all districts with STR = 16, these factors average out, says the assumption. Note, STR = 16 is low, so those districts tend to be wealthy already, suggesting in fact E(u_i | STR = 16) > 0 for such rvs. Exercise: The assumption implies cov(X_i, u_i) = 0.

OLS assumption 2: (X_i, Y_i), i = 1, ..., n are i.i.d.
True if the entity (individual, district) is simply randomly sampled: The entities are selected from the same population, so (X_i, Y_i) are identically distributed for all i = 1, ..., n. The entities are selected at random, so the values of (X, Y) for different entities are independently distributed. One case where sampling is not i.i.d. is where data for the same entity are recorded over time (panel and time-series data). An entity's data tend to show time-dependence (not independent); e.g. a district with small STR in 1999 is likely to have small STR in 2000. We'll address time later in the course.

OLS assumption 3: E(X⁴) and E(Y⁴) finite
Says extreme outliers are rare. This is true whenever both X and Y are bounded. E.g., in the CA data, X = STR is bounded between 0 and a lawful maximum (100?); Y = test score is bounded between 0 and the test max (1600?). Adding a large outlier drastically changes the OLS estimate (since largeness is squared in the errors), so this assumption says that data averages are stable as the sample grows. Btw, look in the data for outliers that may be justifiably removed, e.g. wrong code, scale, or unit.

Why assumption 3 is important
The black dots get an OLS line that is flat. Adding one red outlier, though one among many points, causes the OLS line to move drastically. OLS is unreliable if, as the sample grows, reds arise often. (figure)
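
The sensitivity to outliers is easy to demonstrate numerically; a sketch with made-up data (a flat cloud of points, then one added outlier):

```python
# One extreme outlier can move the OLS slope drastically (assumption 3).
def ols_slope(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
           / sum((x - xbar) ** 2 for x in xs)

xs = [float(x) for x in range(10)]
ys = [5.0] * 10                                   # a perfectly flat cloud

slope_flat = ols_slope(xs, ys)                    # slope is exactly 0
slope_out = ols_slope(xs + [11.0], ys + [100.0])  # one outlier: slope jumps
```

Here a single eleventh point pushes the estimated slope from 0 to above 4.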

Sampling Distribution of the OLS Estimator
The OLS estimate is defined by a sample of data. Different samples yield different estimates, so the OLS estimate is a rv. We wish to learn its sampling distribution, so as to: test hypotheses such as β_1 = 0; construct confidence intervals for β_1. Analogous to what we did with the sample mean as an estimate (rv) of the true mean.

Key auxiliary fact:
β̂_1 = β_1 + Σ (X_i − X̄) u_i / Σ (X_i − X̄)²   (the fraction is the error term)
Result: OLS is unbiased, E(β̂_1) = β_1.
Proof: It suffices that E(fraction) = 0. Toward this, use the Law of Iterated Expectations, E(rv) = E[E(rv | X's)]:
E[ Σ (X_i − X̄) u_i / Σ (X_i − X̄)² ] = E[ Σ (X_i − X̄) E(u_i | X_j, j = 1, ..., n) / Σ (X_i − X̄)² ] = E[ Σ (X_i − X̄) · 0 / Σ (X_i − X̄)² ] = 0.
Details: used E(u_i | X_j, j = 1, ..., n) = E(u_i | X_i), since independently distributed (assumption 2); in turn used E(u_i | X_i) = 0 (assumption 1). So E(u_i | X_j, j = 1, ..., n) = 0 in the above.

Key auxiliary fact:
β̂_1 − β_1 = Σ (X_i − X̄) u_i / Σ (X_i − X̄)²
Result: For all large n,
var(β̂_1) ≈ (1/n) var[(X − µ_X) u] / σ_X⁴.
Idea: with v_i := (X_i − µ_X) u_i,
β̂_1 − β_1 = [ (1/n) Σ (X_i − X̄) u_i ] / [ (1/n) Σ (X_i − X̄)² ] ≈ v̄ / σ_X²,
since (1/n) Σ (X_i − X̄) u_i ≈ (1/n) Σ (X_i − µ_X) u_i = v̄ and the denominator s_X² ≈ σ_X². The v_i can be shown to meet the conditions for the CLT to apply, so v̄ is approximately normal with variance (1/n) var[(X − µ_X) u]. Thus
var(β̂_1) ≈ var(v̄ / σ_X²) = var(v̄) / σ_X⁴ = (1/n) var[(X − µ_X) u] / σ_X⁴.

Summary of the sampling distribution
E(β̂_1) = β_1;
var(β̂_1) ≈ (1/n) var[(X − µ_X) u] / σ_X⁴;
β̂_1 is approximately normal.
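
This summary can be illustrated with a small Monte Carlo experiment; a sketch assuming an illustrative true model Y = 2 + 3X + u with X and u standard normal, so the theoretical variance above works out to 1/n (all parameter values here are made up for the illustration):

```python
# Monte Carlo sketch of the sampling distribution of the OLS slope,
# assuming an illustrative true model Y = 2 + 3X + u, X ~ N(0,1), u ~ N(0,1).
# Theory: E(b1_hat) = 3; var(b1_hat) ~ var[(X - mu_X)u] / (n * sigma_X^4) = 1/n.
import random

random.seed(0)
beta0, beta1, n, reps = 2.0, 3.0, 100, 2000
estimates = []
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [beta0 + beta1 * x + random.gauss(0, 1) for x in xs]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
         / sum((x - xbar) ** 2 for x in xs)
    estimates.append(b1)

mean_b1 = sum(estimates) / reps                             # close to 3 (unbiased)
var_b1 = sum((b - mean_b1) ** 2 for b in estimates) / reps  # close to 1/n = 0.01
```

A histogram of `estimates` would also show the approximately normal shape.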

Importance of var(x) for relablty of OLS Note, we see var( ) s nversely proportonal to 4 σ X So the greater var(x), the more relable s estmate More nformaton n the data makes slope easy to ascertan Illustraton #blue dots = #black dots Slope for blues? Unsure Slope for blacks? About 2! Dfference s that blacks got greater spread, varance. Copyrght 2011 Pearson Addson-Wesley. All rghts reserved. 4-29


Appendix I:
β̂_1 = β_1 + Σ (X_i − X̄) u_i / Σ (X_i − X̄)²
Let us show this auxiliary result that led to our key results. Averaging the model Y_i = β_0 + β_1 X_i + u_i and then subtracting:
Ȳ = β_0 + β_1 X̄ + ū, so Y_i − Ȳ = β_1 (X_i − X̄) + (u_i − ū).
Let us substitute this into the OLS formula for β̂_1:
β̂_1 = Σ (Y_i − Ȳ)(X_i − X̄) / Σ (X_i − X̄)²
= [ β_1 Σ (X_i − X̄)(X_i − X̄) + Σ (u_i − ū)(X_i − X̄) ] / Σ (X_i − X̄)²
= β_1 + Σ u_i (X_i − X̄) / Σ (X_i − X̄)²,
using Σ ū (X_i − X̄) = ū Σ (X_i − X̄) = 0.

Appendix II: Derivation of OLS estimates
Recall the definition of the residual u_i := Y_i − (b_0 + b_1 X_i) and of the criterion SSR := Σ u_i² that the OLS b_0, b_1 are to minimize. To compute these, take derivatives wrt them and set to 0:
d SSR / d b_0 = Σ 2 u_i (d u_i / d b_0) = Σ 2 u_i · d(Y_i − (b_0 + b_1 X_i)) / d b_0 = −2 Σ u_i.
This is asking that the residuals add (or average) to zero, i.e.
0 = Σ u_i = Σ [Y_i − (b_0 + b_1 X_i)], so b*_0 = Ȳ − b_1 X̄.
d SSR / d b_1 = Σ 2 u_i (d u_i / d b_1) = Σ 2 u_i · d(Y_i − (b_0 + b_1 X_i)) / d b_1 = −2 Σ u_i X_i.
That is,
0 = Σ (Y_i − (b_0 + b_1 X_i)) X_i = Σ Y_i X_i − b_0 n X̄ − b_1 Σ X_i².

Appendix II, cont'd
Substituting b_0:
0 = Σ Y_i X_i − (Ȳ − b_1 X̄) n X̄ − b_1 Σ X_i², so b*_1 = ( Σ Y_i X_i − n X̄ Ȳ ) / ( Σ X_i² − n X̄² ).
Now, a bit of algebra shows the numerator is Σ (Y_i − Ȳ)(X_i − X̄) and the denominator is Σ (X_i − X̄)². (Just expand the latter, simplify, and get the above fraction.) Finally, these are global minima (vs. local minima or maxima) because the second derivatives are positive.
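
The two first-order conditions derived here can be verified numerically: at the OLS solution, the residuals sum to zero and are orthogonal to the X's. A sketch with made-up data:

```python
# At the OLS solution, the two first-order conditions hold:
# sum(u_i) = 0 and sum(u_i * X_i) = 0. Checked on made-up data.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.1, 2.9, 5.2, 6.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
     / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
foc0 = sum(resid)                               # first condition: ~0
foc1 = sum(u * x for u, x in zip(resid, xs))    # second condition: ~0
```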