Chapter 4: Regression With One Regressor
Copyright 2011 Pearson Addison-Wesley. All rights reserved.
Outline
1. Fitting a line to data
2. The ordinary least squares (OLS) line/regression
3. Measures of fit
4. Population model
5. The least squares assumptions
6. The sampling distribution of the OLS estimator
Fitting a line to data
Suppose data on two rvs (X,Y): (X_1,Y_1), ..., (X_n,Y_n)
No probability distribution for now
Suspect Y depends somewhat on X, e.g.
 Y = average test score in school district
 X = average student-teacher ratio in school district
Try to summarize/fit this dependence (if any) by a line defined by the intercept and slope parameters b_0, b_1
Seek b_0, b_1 s.t. the data approximately satisfy:
 Y_i ≈ b_0 + b_1 X_i
Known as a regression.
Residuals of a particular line b_0, b_1 (figure)
Errors/residuals in fit
Given a line b_0, b_1, define the errors/residuals as
 u_i := Y_i − (b_0 + b_1 X_i), so Y_i = b_0 + b_1 X_i + u_i
Wish u_i, i = 1, ..., n to be zero-ish. This wish can be interpreted as a specific goal in various ways. One is in terms of the sum of squared residuals:
 SSR := Σ_{i=1}^{n} u_i²
So the goal is to choose the line b_0, b_1 to minimize SSR
The minimizers b*_0, b*_1 are known as least squares
We will see another way later
Least squares
So the least squares b*_0, b*_1 minimize this:
 SSR = Σ u_i² = Σ (Y_i − b_0 − b_1 X_i)²
If you know basic calculus, you can set equal to zero the derivative of SSR wrt b_0, and that of SSR wrt b_1. Then solve the system of two equations for the unknowns b_0, b_1
Fitted/estimated/predicted value: Ŷ_i := b*_0 + b*_1 X_i
Residual/error: û_i := Y_i − Ŷ_i
Following is the least squares line b*_0, b*_1:
 b*_1 = Σ (X_i − X̄)(Y_i − Ȳ) / Σ (X_i − X̄)²
 b*_0 = Ȳ − b*_1 X̄
Interpretation
 Ŷ = b*_0 + b*_1 X
The least squares line does fit the average data point (X̄, Ȳ) (role of b_0)
b_1 is the sensitivity of Y to X, for values of X near the mean (assuming dependence exists!)
 b*_1 = sample covariance of (X,Y) / sample variance of X
Slope is positive iff the data are positively correlated
If X varies little, so the denominator is near zero, the slope is unreliable (varies greatly with small variations of Y)
Exercise: Residuals sum to zero.
Application to CA data
Slope = b*_1 = −2.28
Intercept = b*_0 = 698.9
Least squares line: TestScore-hat = 698.9 − 2.28 STR
Fitted value & residual
For i = Antelope, CA district, (X_i, Y_i) = (19.33, 657.8)
fitted value: Ŷ_Antelope = 698.9 − 2.28 × 19.33 = 654.8
residual: û_Antelope = 657.8 − 654.8 = 3.0
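The fitted-value and residual arithmetic above is easy to reproduce; a minimal sketch, using the slope and intercept reported on the previous slide:

```python
# Fitted value and residual on the estimated CA line
# TestScore-hat = 698.9 - 2.28 * STR (coefficients from the slides).

def fitted_value(str_ratio, b0=698.9, b1=-2.28):
    """Fitted test score for a district with student-teacher ratio str_ratio."""
    return b0 + b1 * str_ratio

# Antelope, CA district: (X, Y) = (19.33, 657.8)
y_hat = fitted_value(19.33)   # about 654.8
residual = 657.8 - y_hat      # about 3.0
```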
Interpretation here
 TestScore-hat = 698.9 − 2.28 STR
Districts with one fewer student per teacher have test scores 2.28 points higher, on average
Do not interpret the intercept as the value of the line at X = 0
For there are no school districts with every classroom empty (STR = X = 0), and, even if there were, there would be no test scores in such classrooms (i.e. no Y_i's)
The intercept is just something that makes this true: Ȳ = b_0 + b_1 X̄
Illustration: Computing OLS
Data (X,Y) = (3,2), (2,1), (3,1), (4,2). Compute OLS:
Mean of X: 3. Mean of Y: 3/2
Numerator: Σ (X_i − X̄)(Y_i − Ȳ) = (0)(·) + (2 − 3)(1 − 3/2) + (0)(·) + (4 − 3)(2 − 3/2) = 1
Denominator: Σ (X_i − X̄)² = 0 + 1 + 0 + 1 = 2
 b*_1 = num/den = 1/2
 b*_0 = Ȳ − b*_1 X̄ = 3/2 − (1/2)(3) = 0
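The worked example above can be checked with a few lines of code; a minimal pure-Python OLS fit:

```python
# OLS via the textbook formulas: b1 = sample cov / sample var of X,
# b0 = Y-bar - b1 * X-bar.

def ols(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    b1 = num / den
    b0 = y_bar - b1 * x_bar
    return b0, b1

# The toy data from the slide: expect b0 = 0, b1 = 1/2
b0, b1 = ols([3, 2, 3, 4], [2, 1, 1, 2])
```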
Measures of Fit
There are two measures of the fit of the line to the data:
The R² measures the fraction of the variance of Y that is explained by X.
The standard error of the regression (SER) measures the magnitude of the regression's errors.
The R²
Recall Y_i = Ŷ_i + û_i, from the def. of û_i. Exercise: cov(Ŷ_i, û_i) = 0
So the variance splits into explained and unexplained parts:
 var(Y) = var(Ŷ) + var(û)
Dividing,
 1 = var(Ŷ)/var(Y) + var(û)/var(Y)
the explained and unexplained proportions of var(Y)
The explained proportion is called R², with 0 ≤ R² ≤ 1. Often worded via TSS := Σ (Y_i − Ȳ)² and the (minimized) SSR:
 R² = 1 − var(û)/var(Y) = 1 − Σ (û_i − 0)² / Σ (Y_i − Ȳ)² = 1 − SSR/TSS
The R² cont'd
Often worded via ESS := Σ (Ŷ_i − Ȳ)²:
 R² := var(Ŷ)/var(Y) = Σ (Ŷ_i − Ȳ)² / Σ (Y_i − Ȳ)² = ESS/TSS
Whatever formula one uses, clearly higher R² is better.
Exercise: R² = the square of the sample correlation between X and Y
Standard Error of the Regression (SER)
It measures the average magnitude of the errors:
 SER := sqrt( (1/(n−2)) Σ_{i=1}^{n} (û_i − ū̂)² ) = sqrt( (1/(n−2)) Σ_{i=1}^{n} û_i² )
The equality uses the fact that the residuals sum/average to zero.
Wish this to be small (and OLS minimizes it by definition).
The factor is 1/(n−2), instead of 1/n (as in a true average), for technical reasons (showing it is consistent, later).
RMSE is defined as above, but with 1/n. (Very similar if n is large)
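Both fit measures can be computed directly for the toy data from the earlier illustration; a minimal sketch, using the OLS line Ŷ = 0 + 0.5 X found there:

```python
# R^2 and SER for the toy data (3,2),(2,1),(3,1),(4,2), whose OLS
# coefficients were b0 = 0, b1 = 1/2 in the worked example.
import math

xs, ys = [3, 2, 3, 4], [2, 1, 1, 2]
b0, b1 = 0.0, 0.5
n = len(xs)
y_bar = sum(ys) / n

y_hat = [b0 + b1 * x for x in xs]
ssr = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # sum of squared residuals
tss = sum((y - y_bar) ** 2 for y in ys)               # total sum of squares

r2 = 1 - ssr / tss
ser = math.sqrt(ssr / (n - 2))
```

Here SSR = 0.5 and TSS = 1, so R² = 0.5 and SER = sqrt(0.5/2) = 0.5.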
R² & SER for CA data
R² = .05, SER = 18.6: a poor fit
X = STR explains via Ŷ only a small fraction of var(test scores)
The Linear Regression Model
So far, given data, we discussed how to fit a line and measure the fit. That discussion was apart from any probabilistic model for the data. Now:
Assume Y_i = β_0 + β_1 X_i + u_i, i = 1, ..., n
Data generated as follows:
The X_i's: given.
The Y_i's: there are constants β_0, β_1 & rvs u_i ("error term") such that every observed Y_i arises linearly as above.
X is known as the independent variable or regressor
Y as the dependent variable
The error term subsumes omitted factors & data measurement errors.
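The data-generating process just described can be sketched in code; the parameter values, the range of X, and the normal error distribution here are illustrative assumptions, not from the chapter:

```python
# Simulating data from the linear model Y_i = beta0 + beta1*X_i + u_i.
# beta0, beta1, the X distribution, and the error sd are all made up
# for illustration.
import random

random.seed(0)
beta0, beta1, n = 698.9, -2.28, 100

xs = [random.uniform(14, 26) for _ in range(n)]              # hypothetical STR values
ys = [beta0 + beta1 * x + random.gauss(0, 10) for x in xs]   # error term u ~ N(0, 10)
```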
Purpose of the Linear Regression Model
The assumption implies that there is some true model generating the data & the OLS b's.
The CLT will imply, given conditions, the rate at which the data averages & the OLS b's converge to the true model.
Note: Without a true model, what is there to converge to?!
The OLS b's are called estimates β̂_0, β̂_1 (of the true β_0, β_1).
(This usage is senseless without the data-generating model.)
As in ch. 3, we will address whether E(b) = β (unbiasedness), b → β (consistency), & the rate of convergence/confidence intervals
The OLS Assumptions
Assume Y_i = β_0 + β_1 X_i + u_i, i = 1, ..., n
1. The error term conditional on X has mean zero: E(u_i | X_i = x) = 0
2. (X_i, Y_i), i = 1, ..., n, are i.i.d.
3. Outliers are rare: E(X⁴), E(Y⁴) are finite.
Purpose:
(1) implies OLS is unbiased.
(2) implies the sampling distribution of β̂_0, β̂_1; true under SRS
(3) needed to apply the CLT for confidence intervals
OLS assumption 1: E(u | X = x) = 0
Example: E(u_i | STR = 16) = 0
What are some of these other factors? District's wealth, parental involvement, ...
Across all districts with STR = 16, these factors average out, says the assumption
Note, STR = 16 is low, so those districts tend to be wealthy already, suggesting in fact E(u_i | STR = 16) > 0 for such rvs.
Exercise: The assumption implies cov(X_i, u_i) = 0
OLS assumption 2: (X_i, Y_i), i = 1, ..., n are iid
True if the entity (individual, district) is simply randomly sampled:
The entities are selected from the same population, so (X_i, Y_i) are identically distributed for all i = 1, ..., n.
The entities are selected at random, so the values of (X, Y) for different entities are independently distributed.
One case where sampling is not iid is where data for the same entity are recorded over time (panel & time series data)
An entity's data tend to show time-dependence (not independent)
E.g. a district with small STR in 1999 is likely to have small STR in 2000. We'll address time later in the course.
OLS assumption 3: E(X⁴) & E(Y⁴) finite
Says extreme outliers are rare
This is true whenever both X and Y are bounded. E.g., in the CA data, X = STR is bounded between 0 and a lawful maximum (100?); Y = test score is bounded between 0 and the test max (1600?)
Adding a large outlier drastically changes the OLS estimate (since largeness is squared in the errors), so this assumption says that data averages are stable as the sample grows
Btw, look in the data for outliers that may be justifiably removed, e.g. wrong code, scale, unit.
Why assumption 3 is important
The black dots get an OLS line that is flat
Adding one red outlier, though only one point among many, causes the OLS line to move drastically
OLS is unreliable if, as the sample grows, reds arise often
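The black-dots/red-outlier picture can be reproduced numerically; a sketch with made-up data (the `ols` helper is just the textbook slope formula):

```python
# One extreme outlier can move the OLS slope drastically.

def ols(xs, ys):
    """Textbook OLS: return (b0, b1)."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    num = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    den = sum((x - xb) ** 2 for x in xs)
    b1 = num / den
    return yb - b1 * xb, b1

# "Black dots": a flat cloud, slope exactly 0
xs = [1, 2, 3, 4, 5]
ys = [2, 2, 2, 2, 2]
_, slope_flat = ols(xs, ys)

# One "red" outlier far from the cloud tilts the whole line
_, slope_outlier = ols(xs + [20], ys + [30])
```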
Sampling Distribution of the OLS Estimator
The OLS estimate is defined by a sample of data. Different samples yield different estimates. So the OLS estimate is a rv. We wish to learn its sampling distribution. This so as to:
Test hypotheses such as β_1 = 0
Construct confidence intervals for β_1
Analogous to what we did with the sample mean as an estimate (rv) of the true mean
Key auxiliary fact:
 β̂_1 = β_1 + Σ (X_i − X̄) u_i / Σ (X_i − X̄)²   (u_i = error term)
Result: OLS is unbiased, E(β̂_1) = β_1
Proof: It suffices that E(fraction) = 0. Toward this, use the Law of Iterated Expectations, E(rv) = E[E(rv | X)]:
 E[ Σ (X_i − X̄) u_i / Σ (X_i − X̄)² ] = E[ Σ (X_i − X̄) E[u_i | X_1, ..., X_n] / Σ (X_i − X̄)² ] = E[ Σ (X_i − X̄) · 0 / Σ (X_i − X̄)² ] = 0
Detail:
- Used E[u_i | X_1, ..., X_n] = E[u_i | X_i], since independently dist'd (assumption 2)
- In turn used E[u_i | X_i] = 0, by assumption 1
- So E[u_i | X_1, ..., X_n] = 0 in the above
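Unbiasedness can be checked by Monte Carlo: the OLS slope, averaged over many simulated samples from a model satisfying assumption 1, should be close to the true β_1. All numbers here (the betas, sample size, distributions) are illustrative assumptions:

```python
# Monte Carlo sketch of E(beta1-hat) = beta1 under E(u|X) = 0.
import random

random.seed(1)
beta0, beta1 = 1.0, 2.0

def draw_slope(n=30):
    """Simulate one sample and return its OLS slope."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + random.gauss(0, 1) for x in xs]
    xb, yb = sum(xs) / n, sum(ys) / n
    num = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    den = sum((x - xb) ** 2 for x in xs)
    return num / den

slopes = [draw_slope() for _ in range(2000)]
mean_slope = sum(slopes) / len(slopes)   # should be close to beta1 = 2.0
```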
Key auxiliary fact:
 β̂_1 − β_1 = Σ (X_i − X̄) u_i / Σ (X_i − X̄)²   (u_i = error term)
Result: For all large n,
 var(β̂_1) ≈ (1/n) var[(X_i − μ_X) u_i] / σ_X⁴
Idea: rewrite the numerator and denominator (each divided by n):
 (1/n) Σ (X_i − X̄) u_i ≈ (1/n) Σ (X_i − μ_X) u_i = v̄, where v_i := (X_i − μ_X) u_i
 (1/n) Σ (X_i − X̄)² = ((n−1)/n) s_X² ≈ σ_X²
The v_i can be shown to meet the conditions for the CLT to apply. So the CLT applies to conclude that v̄ is approximately normal with variance (1/n) var[(X_i − μ_X) u_i]
Thus
 var(β̂_1) ≈ var(v̄ / σ_X²) = var(v̄) / σ_X⁴ = (1/n) var[(X_i − μ_X) u_i] / σ_X⁴
Summary of sampling distribution
 E(β̂_1) = β_1
 var(β̂_1) ≈ (1/n) var[(X_i − μ_X) u_i] / σ_X⁴
 β̂_1 is approximately normal
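The variance formula can also be checked by simulation: the Monte Carlo variance of the slope across many samples should be close to the asymptotic formula. The distributions chosen (X uniform, u standard normal and independent of X, so var[(X − μ_X)u] = σ_X² σ_u²) are illustrative assumptions:

```python
# Monte Carlo check of var(beta1-hat) ~ (1/n) var[(X - mu_X) u] / sigma_X^4.
import random

random.seed(2)
n, reps = 50, 4000

sigma_x2 = 100 / 12.0                        # var of Uniform(0, 10)
sigma_u2 = 1.0                               # var of N(0, 1) errors
theory = (sigma_x2 * sigma_u2) / (n * sigma_x2 ** 2)   # the formula above

def slope():
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]
    xb, yb = sum(xs) / n, sum(ys) / n
    num = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    den = sum((x - xb) ** 2 for x in xs)
    return num / den

slopes = [slope() for _ in range(reps)]
m = sum(slopes) / reps
mc_var = sum((s - m) ** 2 for s in slopes) / (reps - 1)
# mc_var and theory should agree up to simulation noise
```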
Importance of var(x) for relablty of OLS Note, we see var( ) s nversely proportonal to 4 σ X So the greater var(x), the more relable s estmate More nformaton n the data makes slope easy to ascertan Illustraton #blue dots = #black dots Slope for blues? Unsure Slope for blacks? About 2! Dfference s that blacks got greater spread, varance. Copyrght 2011 Pearson Addson-Wesley. All rghts reserved. 4-29
Appendix I:
 β̂_1 = β_1 + Σ (X_i − X̄) u_i / Σ (X_i − X̄)²   (u_i = error term)
Let us show this auxiliary result that led to our key results.
Averaging the model Y_i = β_0 + β_1 X_i + u_i and then subtracting,
 Ȳ = β_0 + β_1 X̄ + ū, so Y_i − Ȳ = β_1 (X_i − X̄) + (u_i − ū)
Let us substitute this into the OLS formula for β̂_1:
 β̂_1 = Σ (Y_i − Ȳ)(X_i − X̄) / Σ (X_i − X̄)²
  = [β_1 Σ (X_i − X̄)(X_i − X̄) + Σ (u_i − ū)(X_i − X̄)] / Σ (X_i − X̄)²
  = β_1 + Σ (u_i − ū)(X_i − X̄) / Σ (X_i − X̄)²
  = β_1 + Σ (X_i − X̄) u_i / Σ (X_i − X̄)²   [since ū Σ (X_i − X̄) = 0]
Appendix II: Derivation of OLS estimates
Recall the definition of the residual u_i := Y_i − (b_0 + b_1 X_i) and of the criterion SSR := Σ u_i² that the OLS b_0, b_1 are to minimize.
To compute these, take derivatives wrt them and set to 0:
 d(SSR)/db_0 = Σ 2 u_i d(u_i)/db_0 = Σ 2 u_i d(Y_i − (b_0 + b_1 X_i))/db_0 = −2 Σ u_i
This is asking that the residuals add (or average) to zero, i.e. that
 0 = Σ u_i = Σ [Y_i − (b_0 + b_1 X_i)]  ⟹  b*_0 = Ȳ − b_1 X̄
 d(SSR)/db_1 = Σ 2 u_i d(u_i)/db_1 = Σ 2 u_i d(Y_i − (b_0 + b_1 X_i))/db_1 = −2 Σ u_i X_i
That is,
 0 = Σ [Y_i − (b_0 + b_1 X_i)] X_i = Σ Y_i X_i − b_0 n X̄ − b_1 Σ X_i²
Appendix II: cont'd
Substituting b_0,
 0 = Σ Y_i X_i − (Ȳ − b_1 X̄) n X̄ − b_1 Σ X_i²  ⟹  b*_1 = (Σ Y_i X_i − n X̄ Ȳ) / (Σ X_i² − n X̄²)
Now, a bit of algebra shows the numerator is Σ (Y_i − Ȳ)(X_i − X̄) and the denominator is Σ (X_i − X̄)². (Just expand the latter, simplify, and get the above fraction.)
Finally, these are global minima (vs. local minima or maxima) because the derivatives of the prior derivatives are positive.
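The first-order conditions derived above can be verified numerically: at the closed-form solution, both partial derivatives of SSR vanish. A sketch using the toy data from the worked example earlier in the chapter:

```python
# At the closed-form OLS solution, dSSR/db0 = -2*sum(u_i) and
# dSSR/db1 = -2*sum(u_i * X_i) are both zero.

xs, ys = [3, 2, 3, 4], [2, 1, 1, 2]
n = len(xs)
xb, yb = sum(xs) / n, sum(ys) / n

# Closed-form OLS coefficients
b1 = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / sum((x - xb) ** 2 for x in xs)
b0 = yb - b1 * xb

# First-order conditions evaluated at (b0, b1)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
d_b0 = -2 * sum(resid)                              # residuals sum to zero
d_b1 = -2 * sum(u * x for u, x in zip(resid, xs))   # residuals orthogonal to X
```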