Linear Regression

1 Single Explanatory Variable

Assume ($y$ is not necessarily normal)

$$y \sim F, \quad Ey = \alpha + \beta x, \quad \mathrm{Var}\, y = \sigma^2,$$

or, equivalently,

$$y = \alpha + \beta x + u, \quad Eu = 0, \quad \mathrm{Var}\, u = \sigma^2, \quad Exu = 0.$$

Examples:

1. School performance as a function of SAT score
2. Consumption as a function of income
3. Murder rate as a function of crime policy

1.1 Curve Fitting

1. Draw pictures
2. Options for fitting:
   (a) Piecewise
   (b) Mean squared deviation
   (c) Mean absolute deviation
   (d) Mean orthogonal squared deviation

1.2 Estimation

1.2.1 Types of Samples

Cross-section data:

$$y_i = \alpha + \beta x_i + u_i, \quad i = 1, 2, \ldots, n \tag{1}$$

Time-series data:

$$y_t = \alpha + \beta x_t + u_t, \quad t = 1, 2, \ldots, T$$
1.2.2 Mechanics

Consider the solution to

$$\left(\hat\alpha, \hat\beta\right) = \arg\min_{a,b} \sum_i \left[y_i - a - b x_i\right]^2.$$

The FOCs are

$$0 = -2\sum_i \left[y_i - \hat\alpha - \hat\beta x_i\right] \quad \Rightarrow \quad \hat\alpha = \bar y - \hat\beta \bar x$$

and

$$0 = -2\sum_i \left[y_i - \hat\alpha - \hat\beta x_i\right] x_i.$$

Note that, for any variable, $\tilde z_i \equiv z_i - \bar z$ satisfies

$$\sum_i \tilde z_i = \sum_i z_i - n\bar z = n\bar z - n\bar z = 0.$$

Substituting $\hat\alpha = \bar y - \hat\beta \bar x$ into the second FOC,

$$0 = \sum_i \left[\tilde y_i - \hat\beta \tilde x_i\right] x_i = \sum_i \left[\tilde y_i - \hat\beta \tilde x_i\right]\left(\tilde x_i + \bar x\right) = \sum_i \left[\tilde y_i - \hat\beta \tilde x_i\right]\tilde x_i + \bar x \sum_i \left[\tilde y_i - \hat\beta \tilde x_i\right] = \sum_i \left[\tilde y_i - \hat\beta \tilde x_i\right]\tilde x_i.$$

Solving for $\hat\beta$, one gets

$$\hat\beta = \frac{\sum_i \tilde x_i \tilde y_i}{\sum_i \tilde x_i^2} = \frac{\frac{1}{n}\sum_i \tilde x_i \tilde y_i}{\frac{1}{n}\sum_i \tilde x_i^2} = \frac{\hat\sigma_{xy}}{\hat\sigma_{xx}}.$$

Note that one can take equation (1) and write it as $\tilde y_i = \beta \tilde x_i + \tilde u_i$, set $\hat\beta = \arg\min_b \sum_i \left[\tilde y_i - b\tilde x_i\right]^2$, and get the same $\hat\beta$.
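The mechanics above can be checked numerically. A minimal sketch in Python (not part of the original notes) on simulated data: the slope is the ratio of demeaned cross-products, $\hat\sigma_{xy}/\hat\sigma_{xx}$, and the intercept follows from the first FOC.

```python
import numpy as np

# Simulated data; the true alpha and beta are illustrative choices.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(size=n)   # true alpha = 1.5, beta = 2.0

# Demeaned ("tilde") variables
xt = x - x.mean()
yt = y - y.mean()

# beta_hat = sum of x~y~ over sum of x~^2; alpha_hat = ybar - beta_hat * xbar
beta_hat = (xt @ yt) / (xt @ xt)
alpha_hat = y.mean() - beta_hat * x.mean()
```

The same `beta_hat` comes out of any canned least-squares routine; the demeaned formula just makes the FOC algebra explicit.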
1.2.3 Properties of $\hat\beta$

$$\hat\beta = \frac{\sum_i \tilde x_i \tilde y_i}{\sum_i \tilde x_i^2} = \frac{\sum_i \tilde x_i \left(\beta \tilde x_i + \tilde u_i\right)}{\sum_i \tilde x_i^2} = \beta\frac{\sum_i \tilde x_i^2}{\sum_i \tilde x_i^2} + \frac{\sum_i \tilde x_i \tilde u_i}{\sum_i \tilde x_i^2} = \beta + \frac{\sum_i \tilde x_i \tilde u_i}{\sum_i \tilde x_i^2}.$$

$$E\hat\beta = E\beta + E\frac{\sum_i \tilde x_i \tilde u_i}{\sum_i \tilde x_i^2} = \beta + \frac{\sum_i \tilde x_i E\tilde u_i}{\sum_i \tilde x_i^2} = \beta$$

$\Rightarrow \hat\beta$ is an unbiased estimator of $\beta$. Note that:

$$\mathrm{Var}\,\hat\beta = E\left[\hat\beta - E\hat\beta\right]^2 = E\left[\hat\beta - \beta\right]^2 = E\left[\frac{\sum_i \tilde x_i \tilde u_i}{\sum_i \tilde x_i^2}\right]^2 = \frac{E\left[\sum_i \tilde x_i \tilde u_i\right]^2}{\left[\sum_i \tilde x_i^2\right]^2} = \frac{E\sum_{i=1}^n\sum_{j=1}^n \tilde x_i \tilde x_j \tilde u_i \tilde u_j}{\left[\sum_i \tilde x_i^2\right]^2}$$
$$= \frac{\sum_{i=1}^n\sum_{j=1}^n \tilde x_i \tilde x_j E\tilde u_i \tilde u_j}{\left[\sum_i \tilde x_i^2\right]^2} = \frac{\sum_{i=1}^n \tilde x_i^2 \sigma^2}{\left[\sum_i \tilde x_i^2\right]^2} = \sigma^2\left[\sum_{i=1}^n \tilde x_i^2\right]^{-1}.$$

1. $\mathrm{Var}\,\hat\beta$ declines as $n$ increases;
2. $\mathrm{Var}\,\hat\beta$ declines as variation in $x$ increases;
3. $\mathrm{Var}\,\hat\beta$ increases as $\sigma^2$ increases.

So far, the assumptions are:

1. $Eu_i = 0$;
2. $\mathrm{Var}\,u_i = \sigma^2 < \infty$;
3. $\mathrm{Cov}(u_i, u_j) = 0 \ \forall i \neq j$;
4. $\mathrm{Cov}(u_i, x_i) = 0$.
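Unbiasedness and the variance formula $\sigma^2\left[\sum_i \tilde x_i^2\right]^{-1}$ can be illustrated by Monte Carlo (a sketch, not part of the original notes; sample sizes and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, beta = 50, 2.0, 1.0
x = rng.uniform(0, 10, size=n)            # regressor held fixed across replications
xt = x - x.mean()
theoretical_var = sigma**2 / (xt @ xt)    # Var(beta_hat) = sigma^2 / sum(x~^2)

reps = 20000
draws = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)
    y = beta * x + u                      # alpha = 0 for simplicity
    yt = y - y.mean()
    draws[r] = (xt @ yt) / (xt @ xt)      # OLS slope in this replication

mean_beta = draws.mean()                  # should be close to beta (unbiased)
mc_var = draws.var()                      # should be close to theoretical_var
```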
If we also assume that $u_i \sim iidN\left(0, \sigma^2\right)$, then, since

$$\hat\beta = \beta + \frac{\sum_i \tilde x_i \tilde u_i}{\sum_i \tilde x_i^2}$$

is a linear combination of normal random variables, $\hat\beta$ is normal:

$$\hat\beta \sim N\left(\beta,\; \sigma^2\left[\sum_{i=1}^n \tilde x_i^2\right]^{-1}\right).$$

Alternatively (instead of assuming $u_i \sim iidN\left(0, \sigma^2\right)$), assume $n \to \infty$. Then, since $\hat\beta - \beta$ is a weighted average of the $u_i$, $i = 1, 2, \ldots, n$, where each $u_i$ has $Eu_i = 0$,

$$E\left[\hat\beta - \beta\right] = 0, \quad \mathrm{Var}\,\hat\beta = \sigma^2\left[\sum_{i=1}^n \tilde x_i^2\right]^{-1} \to 0,$$

$$E\sqrt{n}\left(\hat\beta - \beta\right) = 0, \quad \mathrm{Var}\,\sqrt{n}\left(\hat\beta - \beta\right) = n\sigma^2\left[\sum_i \tilde x_i^2\right]^{-1} = \sigma^2\left[\frac{1}{n}\sum_i \tilde x_i^2\right]^{-1}$$

$$\Rightarrow \text{(CLT)} \quad \sqrt{n}\left(\hat\beta - \beta\right) \sim N\left(0,\; \sigma^2\left[\frac{1}{n}\sum_i \tilde x_i^2\right]^{-1}\right).$$

In either case, since $\sqrt{n}\left(\hat\beta - \beta\right)$ has a known distribution, one can construct hypothesis tests involving $\beta$. Consider

$$H_0: \beta = \beta_0 \quad \text{vs.} \quad H_A: \beta \neq \beta_0.$$

Under the null hypothesis,

$$\frac{\sqrt{n}\left(\hat\beta - \beta_0\right)}{\sigma_\beta} \sim N(0, 1) \quad \text{where} \quad \sigma_\beta = \sigma\sqrt{\left[\frac{1}{n}\sum_i \tilde x_i^2\right]^{-1}},$$

or

$$\frac{\sqrt{n}\left(\hat\beta - \beta_0\right)}{\hat\sigma_\beta} \sim t_{n-2}$$
where

$$\hat\sigma_\beta = \sqrt{s^2\left[\frac{1}{n}\sum_i \tilde x_i^2\right]^{-1}}, \quad s^2 = \frac{1}{n-2}\sum_i \left(\tilde y_i - \hat\beta \tilde x_i\right)^2.$$

Why $n - 2$? We say that $\beta$ is (statistically) significant if $\hat\beta$ is significantly different than $\beta_0 = 0$.

1.2.4 Properties of Residuals

$u_i$ is the error, and $\hat u_i = \tilde y_i - \hat\beta \tilde x_i = y_i - \hat\alpha - \hat\beta x_i \neq u_i$ is the residual. Note that

$$\sum_i \hat u_i = \sum_i \tilde y_i - \hat\beta \sum_i \tilde x_i = 0;$$

$$\sum_i \hat u_i \tilde x_i = \sum_i \tilde y_i \tilde x_i - \hat\beta \sum_i \tilde x_i^2 = \sum_i \tilde y_i \tilde x_i - \frac{\sum_i \tilde y_i \tilde x_i}{\sum_i \tilde x_i^2}\sum_i \tilde x_i^2 = 0.$$

1.2.5 Goodness of Fit

$$\sum_i \tilde y_i^2 = \sum_i \left(y_i - \bar y\right)^2 = \sum_i \left[y_i - \hat y_i + \hat y_i - \bar y\right]^2 \quad \left(\text{where } \hat y_i = \hat\alpha + \hat\beta x_i\right)$$
$$= \sum_i \left(y_i - \hat y_i\right)^2 + \sum_i \left(\hat y_i - \bar y\right)^2 + 2\sum_i \left(y_i - \hat y_i\right)\left(\hat y_i - \bar y\right). \tag{2}$$

Note that

$$y_i - \hat y_i = y_i - \hat\alpha - \hat\beta x_i = \hat u_i$$

and

$$\hat y_i - \bar y = \hat\alpha + \hat\beta x_i - \hat\alpha - \hat\beta \bar x = \hat\beta \tilde x_i.$$

Then

$$\sum_i \left(y_i - \hat y_i\right)\left(\hat y_i - \bar y\right) = \hat\beta \sum_i \hat u_i \tilde x_i = 0.$$

So equation (2) becomes

$$\sum_i \left(y_i - \bar y\right)^2 = \sum_i \left(y_i - \hat y_i\right)^2 + \sum_i \left(\hat y_i - \bar y\right)^2.$$

Define

$$R^2 = \frac{\sum_i \left(\hat y_i - \bar y\right)^2}{\sum_i \left(y_i - \bar y\right)^2} = 1 - \frac{\sum_i \left(y_i - \hat y_i\right)^2}{\sum_i \left(y_i - \bar y\right)^2}.$$

Note that $0 \leq R^2 \leq 1$. When is $R^2 = 1$? When is $R^2 = 0$?

Now let

$$ESS = \sum_i \left(y_i - \hat y_i\right)^2, \quad RSS = \sum_i \left(\hat y_i - \bar y\right)^2.$$

Consider

$$H_0: \beta = 0 \quad \text{vs.} \quad H_A: \beta \neq 0.$$

It can be shown that

$$ESS/\sigma^2 \sim \chi^2_{n-2}, \quad RSS/\sigma^2 \sim \chi^2_1, \quad \text{independent}$$
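The sum-of-squares decomposition and the $R^2$ identity can be verified numerically (a sketch on simulated data, not part of the original notes; note these notes use $ESS$ for the error sum of squares and $RSS$ for the regression sum of squares):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 0.5 + 1.2 * x + rng.normal(size=n)

# OLS fit via the demeaned formulas
xt, yt = x - x.mean(), y - y.mean()
beta_hat = (xt @ yt) / (xt @ xt)
alpha_hat = y.mean() - beta_hat * x.mean()
y_fit = alpha_hat + beta_hat * x

TSS = ((y - y.mean())**2).sum()      # total sum of squares
ESS = ((y - y_fit)**2).sum()         # error (residual) sum of squares
RSS = ((y_fit - y.mean())**2).sum()  # regression sum of squares
R2 = RSS / TSS                       # equals 1 - ESS/TSS because the cross term vanishes
```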
$$\Rightarrow \frac{(n-2)\,RSS}{ESS} \sim F_{1, n-2}.$$

Note that

$$R^2 = RSS/(RSS + ESS), \quad 1 - R^2 = ESS/(RSS + ESS)$$

$$\Rightarrow \frac{(n-2)\,R^2}{1 - R^2} = \frac{(n-2)\,RSS}{ESS} \sim F_{1, n-2}.$$

2 Multiple Regression

Now consider the more complex model

$$y_t = \beta_0 + \sum_{i=1}^n \beta_i x_{ti} + u_t. \tag{3}$$

Examples:

1. Consumption on interest rate, income, wealth, unemployment rate
2. Earnings on schooling, age, race, sex
3. Hours of work on children, sex, spouse's income, wage

2.1 Structure

We can write equation (3) as

$$y_t = \sum_{i=0}^n \beta_i x_{ti} + u_t$$

where $x_{t0} = 1$. Assumptions:

1. Linear model with correct specification
2. $x$'s are nonstochastic and linearly independent
3. $Eu_t = 0$
4. $$Eu_t u_s = \begin{cases} \sigma^2 & \text{if } t = s \\ 0 & \text{if } t \neq s \end{cases}$$
5. (maybe) $u_t \sim iidN\left(0, \sigma^2\right)$

$$\Rightarrow y_t \sim indN\left(\sum_{i=0}^n \beta_i x_{ti},\; \sigma^2\right).$$

Note that

$$\beta_i = \frac{\partial Ey_t}{\partial x_{ti}}.$$

Discuss the significance of the partial derivative.

2.2 Intro to Ordinary Least Squares

Consider minimizing the sum of squared residuals:

$$\min_{\beta_0, \beta_1, \ldots, \beta_n} \sum_{t=1}^T \left[y_t - \sum_{i=0}^n \beta_i x_{ti}\right]^2.$$

The FOCs are

$$0 = -2\sum_{t=1}^T \left[y_t - \sum_{i=0}^n \hat\beta_i x_{ti}\right]x_{ti} \quad \Rightarrow \quad 0 = \sum_{t=1}^T \left[y_t - \sum_{i=0}^n \hat\beta_i x_{ti}\right]x_{ti}. \tag{4}$$

2.3 Matrix Notation

Let

$$\underset{T\times 1}{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix}, \quad \underset{(n+1)\times 1}{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}, \quad \underset{T\times 1}{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{pmatrix},$$

$$\underset{T\times(n+1)}{X} = \begin{pmatrix} x_{10} & x_{11} & \cdots & x_{1n} \\ x_{20} & x_{21} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{T0} & x_{T1} & \cdots & x_{Tn} \end{pmatrix} = \begin{pmatrix} x_0 & x_1 & \cdots & x_n \end{pmatrix}.$$

Then, we can write equation (3) as

$$y = X\beta + u,$$
and we can write the FOCs in equation (4) as

$$X'\left(y - X\hat\beta\right) = 0.$$

We can solve for $\hat\beta$ to get

$$\hat\beta = \left(X'X\right)^{-1}\left(X'y\right).$$

An alternative derivation is to let $\hat u = y - X\hat\beta$ be the residuals and find $\hat\beta$ so that $X'\hat u = 0$ (this will generalize later to instrumental variables estimation):

$$X'\hat u = X'\left(y - X\hat\beta\right) = 0 \quad \Rightarrow \quad \hat\beta = \left(X'X\right)^{-1}\left(X'y\right).$$

2.4 Statistical Properties of $\hat\beta$

$$\hat\beta = \left(X'X\right)^{-1}\left(X'y\right) = \left(X'X\right)^{-1}X'\left(X\beta + u\right) = \left(X'X\right)^{-1}X'X\beta + \left(X'X\right)^{-1}X'u = \beta + \left(X'X\right)^{-1}X'u.$$

$$E\hat\beta = E\beta + E\left(X'X\right)^{-1}X'u = \beta + \left(X'X\right)^{-1}X'Eu = \beta.$$

$$D\left(\hat\beta\right) = E\left(\hat\beta - E\hat\beta\right)\left(\hat\beta - E\hat\beta\right)' = E\left(\hat\beta - \beta\right)\left(\hat\beta - \beta\right)' = E\left(X'X\right)^{-1}X'uu'X\left(X'X\right)^{-1}$$
$$= \left(X'X\right)^{-1}X'\left[Euu'\right]X\left(X'X\right)^{-1} = \left(X'X\right)^{-1}X'\sigma^2 I X\left(X'X\right)^{-1} = \sigma^2\left(X'X\right)^{-1}X'X\left(X'X\right)^{-1} = \sigma^2\left(X'X\right)^{-1}.$$

$$\Rightarrow \mathrm{Var}\,\hat\beta_i = \sigma^2\left[\left(X'X\right)^{-1}\right]_{ii}, \quad \mathrm{Cov}\left(\hat\beta_i, \hat\beta_j\right) = \sigma^2\left[\left(X'X\right)^{-1}\right]_{ij}.$$
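A numerical sketch of the matrix OLS formula (not part of the original notes; data are simulated): $\hat\beta$ solves the normal equations $X'X\hat\beta = X'y$, and the residuals are orthogonal to every column of $X$.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 120, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # first column is the constant
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=T)

# Solve the normal equations X'X b = X'y rather than inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
orth = X.T @ resid   # X'u_hat: should be numerically zero
```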
If $u \sim N\left(0, \sigma^2 I\right)$, then

$$\hat\beta \sim N\left[\beta,\; \sigma^2\left(X'X\right)^{-1}\right] \quad \Rightarrow \quad \frac{\hat\beta_i - \beta_i}{\sigma\sqrt{\left[\left(X'X\right)^{-1}\right]_{ii}}} \sim N(0, 1).$$

Next, let

$$s^2 = \frac{\hat u'\hat u}{T - (n+1)}.$$

Later, we show that $Es^2 = \sigma^2$. Also, it is true that

$$\left[T - (n+1)\right]s^2/\sigma^2 \sim \chi^2_{T-(n+1)}$$

$$\Rightarrow \frac{\hat\beta_i - \beta_i}{\sigma\sqrt{\left[\left(X'X\right)^{-1}\right]_{ii}}} \Bigg/ \sqrt{\frac{\left[T-(n+1)\right]s^2/\sigma^2}{T-(n+1)}} \sim t_{T-(n+1)},$$

which simplifies to

$$\frac{\hat\beta_i - \beta_i}{s\sqrt{\left[\left(X'X\right)^{-1}\right]_{ii}}} \sim t_{T-(n+1)}.$$

2.5 Asymptotics

$$\hat\beta = \beta + \left(X'X\right)^{-1}X'u.$$

We would like to say that

$$\mathrm{plim}\,\hat\beta = \mathrm{plim}\,\beta + \mathrm{plim}\left(X'X\right)^{-1}X'u = \beta + \mathrm{plim}\left(X'X\right)^{-1}X'u$$

and

$$\mathrm{plim}\left(X'X\right)^{-1}X'u = \left(\mathrm{plim}\,X'X\right)^{-1}\mathrm{plim}\,X'u.$$

But $X'X$ explodes as $T \to \infty$, and $X'u$ doesn't converge. To see this, note that

$$X'X = \begin{pmatrix} T & \sum_t x_{t1} & \cdots & \sum_t x_{tn} \\ \sum_t x_{t1} & \sum_t x_{t1}^2 & \cdots & \sum_t x_{t1}x_{tn} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_t x_{tn} & \sum_t x_{t1}x_{tn} & \cdots & \sum_t x_{tn}^2 \end{pmatrix}$$
and

$$X'u = \begin{pmatrix} \sum_t u_t \\ \sum_t x_{t1}u_t \\ \vdots \\ \sum_t x_{tn}u_t \end{pmatrix},$$

which is random. However, if we rewrite

$$\left(X'X\right)^{-1}X'u = \left(\frac{X'X}{T}\right)^{-1}\frac{X'u}{T},$$

then, under some conditions (discuss), $\frac{X'X}{T}$ will converge to a positive definite finite matrix and $\frac{X'u}{T}$ will converge to $0$ (by a Law of Large Numbers). Thus

$$\mathrm{plim}\,\hat\beta = \beta + \mathrm{plim}\left(\frac{X'X}{T}\right)^{-1}\frac{X'u}{T} = \beta + \left(\mathrm{plim}\,\frac{X'X}{T}\right)^{-1}\mathrm{plim}\,\frac{X'u}{T} = \beta + \left(\mathrm{plim}\,\frac{X'X}{T}\right)^{-1}\cdot 0 = \beta$$

$$\Rightarrow \mathrm{plim}\left(\hat\beta - \beta\right) = 0 \quad \Rightarrow \quad \sqrt{T}\left(\hat\beta - \beta\right) \sim N\left[0,\; \sigma^2\left(\mathrm{plim}\,\frac{X'X}{T}\right)^{-1}\right]$$

by a Central Limit Theorem. What would have happened if we had tried to derive the asymptotic distribution of $\sqrt{T}\hat\beta$? What would have happened if we had tried to derive the asymptotic distribution of $\hat\beta$?

2.6 Residual Analysis

$$\hat u = y - X\hat\beta = y - X\left(X'X\right)^{-1}X'y = \left[I - X\left(X'X\right)^{-1}X'\right]y.$$

Consider

$$P_X = X\left(X'X\right)^{-1}X'.$$

Note that

$$P_X X = X\left(X'X\right)^{-1}X'X = X, \quad P_X Xb = X\left(X'X\right)^{-1}X'Xb = Xb,$$

and the same properties hold for $(Xb)'P_X$. $P_X$ is called the projection matrix for $X$ because it projects any vector into the space spanned by $X$.
Figure 1: Explain with a picture; show unbiasedness; explain with three dimensions.

Note that

$$P_X^2 = X\left(X'X\right)^{-1}X'X\left(X'X\right)^{-1}X' = X\left(X'X\right)^{-1}X' = P_X$$

$\Rightarrow P_X$ is idempotent. Consider $I - P_X$:

$$\left(I - P_X\right)\left(I - P_X\right) = I - P_X - P_X + P_X P_X = I - P_X$$

$\Rightarrow I - P_X$ is idempotent. Explain $I - P_X$ geometrically. Note that

$$\left(I - P_X\right)Xb = Xb - P_X Xb = Xb - Xb = 0.$$

Note that both $P_X$ and $\left(I - P_X\right)$ are symmetric.

Back to residuals: Consider

$$\hat u = \left(I - P_X\right)y = \left(I - P_X\right)\left(X\beta + u\right) = \left(I - P_X\right)u.$$

$$b'X'\hat u = b'X'\left(I - P_X\right)u = 0 \quad \forall b. \tag{5}$$
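The projection-matrix properties above are easy to confirm numerically (a sketch on a random design matrix, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(4)
T, k = 30, 4
X = rng.normal(size=(T, k))              # full column rank with probability 1

P = X @ np.linalg.inv(X.T @ X) @ X.T     # projection onto the column space of X
M = np.eye(T) - P                        # I - P_X, the "annihilator"

idem_P = np.allclose(P @ P, P)           # P_X is idempotent
idem_M = np.allclose(M @ M, M)           # I - P_X is idempotent
sym_P = np.allclose(P, P.T)              # P_X is symmetric
kills_X = np.allclose(M @ X, 0)          # (I - P_X) X = 0
```

The trace of $P_X$ equals the number of columns of $X$, which is what drives the degrees-of-freedom result $E\hat u'\hat u = \sigma^2[T - (n+1)]$ on the next page.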
Let $\underset{(n+1)\times 1}{e_i}$ be a vector with $1$ in the $i$th element and $0$ everywhere else. Note that $Xe_i = X_i$ (the $i$th column of $X$)

$$\Rightarrow X_i'\hat u = 0 \quad \forall i \quad \left(\text{set } b = e_i \text{ in equation (5)}\right).$$

When $i = 0$,

$$X_0 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \quad \text{and} \quad X_0'\hat u = \sum_{t=1}^T \hat u_t = 0.$$

Consider

$$\hat u'\hat u = \sum_{t=1}^T \hat u_t^2 = \left[T - (n+1)\right]s^2.$$

$$E\hat u'\hat u = Eu'\left(I - P_X\right)'\left(I - P_X\right)u = Eu'\left(I - P_X\right)\left(I - P_X\right)u = Eu'\left(I - P_X\right)u$$
$$= E\,\mathrm{tr}\,u'\left(I - P_X\right)u = E\,\mathrm{tr}\left(I - P_X\right)uu' = \mathrm{tr}\left(I - P_X\right)Euu' = \mathrm{tr}\left(I - P_X\right)\sigma^2 I = \sigma^2\,\mathrm{tr}\left(I - P_X\right)$$
$$= \sigma^2\left[\mathrm{tr}\,I - \mathrm{tr}\,P_X\right] = \sigma^2 T - \sigma^2\,\mathrm{tr}\,X\left(X'X\right)^{-1}X' = \sigma^2 T - \sigma^2\,\mathrm{tr}\left(X'X\right)^{-1}X'X = \sigma^2 T - \sigma^2\,\mathrm{tr}\,I_{n+1} = \sigma^2\left[T - (n+1)\right]$$

$$\Rightarrow Es^2 = E\left[T - (n+1)\right]^{-1}\hat u'\hat u = \left[T - (n+1)\right]^{-1}E\hat u'\hat u = \sigma^2.$$
2.7 More on Projection

Consider the model

$$y = \underset{k}{X}\beta + \underset{m}{Z}\delta + u,$$

and consider three different estimators for $\beta$:

1. Let $X^* = \left(X \mid Z\right)$, $\beta^* = \begin{pmatrix} \beta \\ \delta \end{pmatrix}$ $\Rightarrow$ we can write the model as $y = X^*\beta^* + u$. Define $\hat\beta^* = \left(X^{*\prime}X^*\right)^{-1}X^{*\prime}y$ and $\hat\beta$ as the first $k$ elements of $\hat\beta^*$.

2. Let

$$\tilde X = X - Z\hat\delta = X - Z\left(Z'Z\right)^{-1}Z'X = \left[I - P_Z\right]X,$$

and define $\hat\beta = \left(\tilde X'\tilde X\right)^{-1}\tilde X'y$.

3. Let

$$\tilde y = y - P_Z y = \left[I - P_Z\right]y,$$

and define $\hat\beta = \left(\tilde X'\tilde X\right)^{-1}\tilde X'\tilde y$.

All three of these estimators provide the exact same estimate of $\beta$. Why? Generalize for

$$y = X\beta + Z\delta + Q\theta + u.$$
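The equivalence of the three estimators (the Frisch-Waugh-Lovell result) can be demonstrated numerically; a sketch on simulated data, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(5)
T, k, m = 200, 2, 3
X = rng.normal(size=(T, k))
Z = np.column_stack([np.ones(T), rng.normal(size=(T, m - 1))])
y = X @ np.array([1.0, -1.0]) + Z @ np.array([0.5, 2.0, -0.3]) + rng.normal(size=T)

def ols(A, b):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(A.T @ A, A.T @ b)

# 1. Long regression on (X | Z); keep the first k coefficients.
b1 = ols(np.column_stack([X, Z]), y)[:k]

# 2. Residualize X on Z, then regress y on the residualized X.
PZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
Xt = (np.eye(T) - PZ) @ X
b2 = ols(Xt, y)

# 3. Residualize both X and y on Z.
b3 = ols(Xt, (np.eye(T) - PZ) @ y)
```

All three produce the same $\hat\beta$ to machine precision; estimator 2 works because $\tilde X' P_Z = 0$, so residualizing $y$ as well changes nothing.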
2.8 Significance

We might want to test the significance of the whole equation. Before we used $R^2$ and $F$. We can use the same statistics again. Let

$$\underbrace{\sum_{t=1}^T \left(y_t - \bar y\right)^2}_{SS} = \underbrace{\sum_{t=1}^T \left(y_t - \hat y_t\right)^2}_{ESS} + \underbrace{\sum_{t=1}^T \left(\hat y_t - \bar y\right)^2}_{RSS}$$

where $\hat y_t = X_t\hat\beta$. Define

$$R^2 = \frac{RSS}{SS} = 1 - \frac{ESS}{SS}.$$

Problems with $R^2$:

1. Its properties depend on having the correct specification of the model (implications for comparing $R^2$ across nonnested models).

2. Its properties depend on the number of $X$'s ($n$). In particular, adding an extra column to $X$ cannot decrease $R^2$, and, if $X'_{n+2}\left(I - P_X\right)y \neq 0$, then $R^2$ increases. $\Rightarrow$ $T$ linearly independent $X$'s will always produce $R^2 = 1$.

3. $R^2$ is only one measure of fit, and it is not a particularly good one. It corresponds to $H_0: \beta_1 = \beta_2 = \cdots = \beta_n = 0$. What is the distribution of $R^2$?

Adjustments:

$$\bar R^2 = 1 - \frac{\widehat{\mathrm{Var}}\,\hat u_t}{\widehat{\mathrm{Var}}\,y_t}$$

where

$$\widehat{\mathrm{Var}}\,\hat u_t = s^2, \quad \widehat{\mathrm{Var}}\,y_t = \frac{1}{T-1}\sum_{t=1}^T \left(y_t - \bar y\right)^2.$$
Note that

$$\bar R^2 = 1 - \frac{s^2}{\widehat{\mathrm{Var}}\,y_t} = 1 - \frac{ESS/\left[T - (n+1)\right]}{SS/\left[T - 1\right]}$$

$$\Rightarrow 1 - \bar R^2 = \left(1 - R^2\right)\frac{T - 1}{T - (n+1)} \quad \Rightarrow \quad \bar R^2 = 1 - \left(1 - R^2\right)\frac{T - 1}{T - (n+1)}.$$

Note that $\bar R^2$ can be negative. It can be shown that, if the $\left|t \text{ statistic}\right|$ associated with a variable is greater than one, then $\bar R^2$ will increase when that variable is included.

Consider the test

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_n = 0 \quad \text{vs.} \quad H_A: \beta_i \neq 0 \text{ for at least one } i.$$

Under the null hypothesis,

$$ESS/\sigma^2 = s^2\left[T - (n+1)\right]/\sigma^2 \sim \chi^2_{T-(n+1)}, \quad RSS/\sigma^2 = \hat\beta^{*\prime}X^{*\prime}X^*\hat\beta^*/\sigma^2 \sim \chi^2_n, \quad \text{independent},$$

where $X^*$ is $X$ without the first column and $\hat\beta^*$ is the corresponding elements of $\hat\beta$

$$\Rightarrow \frac{RSS/n}{ESS/\left[T - (n+1)\right]} \sim F_{n, T-(n+1)} = \frac{R^2/n}{\left(1 - R^2\right)/\left[T - (n+1)\right]}.$$
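The adjusted $\bar R^2$ and the two equivalent forms of the overall $F$ statistic can be computed directly (a sketch on simulated data, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 80, 3                        # n slope coefficients plus a constant
X = np.column_stack([np.ones(T), rng.normal(size=(T, n))])
y = X @ np.array([1.0, 0.5, -0.5, 0.0]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
ESS = resid @ resid                 # error sum of squares (these notes' convention)
SS = ((y - y.mean())**2).sum()      # total sum of squares
RSS = SS - ESS                      # regression sum of squares

R2 = 1 - ESS / SS
R2_bar = 1 - (1 - R2) * (T - 1) / (T - (n + 1))   # adjusted R^2

# Overall F statistic two equivalent ways
F_ss = (RSS / n) / (ESS / (T - (n + 1)))
F_r2 = (R2 / n) / ((1 - R2) / (T - (n + 1)))
```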
where 0 = 0 +6 3; 1 = 1 + 3; 2 = 2 + 3; 3 = 0: ) Ea = 0 +6 3 +( 1 + 3) Sc +( 2 + 3) Ex + u = 0 + 1Sc + 2Ex + 3 (6 + Sc + Ex)+u = 0 + 1Sc + 2Ex + 3Ag + u which is the same as the original model but with di erent implications. In practice, even if there is not perfect colinearity, but near perfect colinearity (as measured by the correlation matrix of X or the eigenvalues of a standardized X 0 X), there will be problems: D b will explode as X 0 X approaches singularity. What can we do? Drop variables, principal components, cry. 2.10 Linearity Discuss which equations can be estimated with OLS: 1. 2. 3. 4. 2.11 Dummy Variables Example: Wages y = 0 + 1X 1 + 2X 2 1 + u log y = 0 + 1 log X 1 + 2 log X 2 + u y = 0 X 1 1 X 2 2 u y = 0 X 1 1 X 2 2 + u log W t = 0 + 1Educ t + 2Age t + 3Sex t + 4Race t + 5Urban t + 6South t + u t : Explain each dummy variable. 17
1. What if you change de nition? 2. With race, what if there are more than categories? 3. What if the constant is deleted? 4. How can you interact dummies? Consider the equation below (from Byrne, Goeree, Hiedemann, and Stern 2001) and interpret each variable: ln w k = 0:028 + 0:035 Educ k 1(Educ k < 12) (6) (0:072) (0:006) + 0:540 1(Educ k =12) + 0:680 1(13 Educ k 15) (0:052) (0:053) + 0:978 1(Educ k =16) + 1:086 1(Educ k =17) (0:053) (0:054) + 0:066 Age k 0:00069 Age k Age k + 0:099 Male k (0:002) (0:00003) (0:030) + 0:028 Marry k + 0:066 W hite k + 0:090 Male k Marry k (0:029) (0:022) (0:042) + 0:022 Male k White k 0:035 Marry k W hite k (0:033) (0:032) + 0:093 Male k Marry k White k + e k (0:045) 2.12 Omitted Variables Consider the model Instead we estimate with and R 2 =0:34 y = X + Z + u: y = Xb + e b = (X 0 X) 1 X 0 y = (X 0 X) 1 X 0 (X + Z + u) = +(X 0 X) 1 X 0 Z +(X 0 X) 1 X 0 u; E b b = +(X 0 X) 1 X 0 Z which is equal to i (X 0 X) 1 X 0 Z =0. his happens if X 0 Z =0or =0; otherwise our estimator is biased. 18
2.13 Measurement Error

Consider the model

$$y = X\beta + u.$$

Instead of observing $y$, we observe

$$w = y + e$$

where $EX'e = 0$. Consider the properties of the OLS estimator $b = \left(X'X\right)^{-1}X'w$:

$$b = \left(X'X\right)^{-1}X'\left(y + e\right) = \left(X'X\right)^{-1}X'y + \left(X'X\right)^{-1}X'e$$

$$\Rightarrow Eb = E\left(X'X\right)^{-1}X'y + E\left(X'X\right)^{-1}X'e = \beta.$$

Now, instead, assume we observe $y$ but we observe $Z = X + e$ instead of $X$, with $Ee = 0$, $Ee'u = 0$, and $\mathrm{plim}\,\frac{X'e}{T} = 0$. So we estimate $b$ in

$$y = Zb + \varepsilon:$$

$$b = \left(Z'Z\right)^{-1}Z'y = \left[\left(X + e\right)'\left(X + e\right)\right]^{-1}\left(X + e\right)'\left(X\beta + u\right).$$

The $Eb$ may not even exist. Consider

$$\mathrm{plim}\,b = \mathrm{plim}\left[\left(X + e\right)'\left(X + e\right)\right]^{-1}\left(X + e\right)'\left(X\beta + u\right) \tag{7}$$
$$= \mathrm{plim}\left[\frac{\left(X + e\right)'\left(X + e\right)}{T}\right]^{-1}\frac{\left(X + e\right)'\left(X\beta + u\right)}{T}$$
$$= \left[\mathrm{plim}\,\frac{X'X}{T} + \mathrm{plim}\,\frac{X'e}{T} + \mathrm{plim}\,\frac{e'X}{T} + \mathrm{plim}\,\frac{e'e}{T}\right]^{-1}\left[\mathrm{plim}\,\frac{X'X}{T}\beta + \mathrm{plim}\,\frac{X'u}{T} + \mathrm{plim}\,\frac{e'X}{T}\beta + \mathrm{plim}\,\frac{e'u}{T}\right]$$
$$= \left[\mathrm{plim}\,\frac{X'X}{T} + \mathrm{plim}\,\frac{e'e}{T}\right]^{-1}\mathrm{plim}\,\frac{X'X}{T}\,\beta \neq \beta.$$

If $n = 1$, then equation (7) becomes

$$\mathrm{plim}\,b = \left[\mathrm{plim}\,\frac{\sum_t \tilde x_t^2}{T} + \mathrm{plim}\,\frac{\sum_t e_t^2}{T}\right]^{-1}\mathrm{plim}\,\frac{\sum_t \tilde x_t^2}{T}\,\beta$$

with the property that $b$ is biased towards $0$ (attenuation bias).
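Attenuation bias is easy to see in a simulation (a sketch, not part of the original notes). With $\mathrm{Var}\,x = \mathrm{Var}\,e = 1$, equation (7) implies $\mathrm{plim}\,b = \beta/2$:

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta = 100_000, 1.0
x = rng.normal(0, 1, size=n)        # true regressor, variance 1
u = rng.normal(0, 1, size=n)
y = beta * x + u

e = rng.normal(0, 1, size=n)        # measurement error, variance 1
z = x + e                           # observed (mismeasured) regressor

# OLS of y on the mismeasured z
zt, yt = z - z.mean(), y - y.mean()
b = (zt @ yt) / (zt @ zt)

# Attenuation factor: Var(x) / (Var(x) + Var(e)) = 1 / (1 + 1) = 0.5
attenuation = 0.5
```

With this large a sample, `b` should sit very close to $0.5\beta$ rather than $\beta$.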
2.14 Gauss-Markov Theorem

Theorem 1 If $y = X\beta + u$ with $u \sim \left(0, \sigma^2 I\right)$, then $\hat\beta = \left(X'X\right)^{-1}X'y$ is BLUE. (Define BLUE.)

Proof. Consider an alternative linear unbiased estimator

$$Ly = L\left(X\beta + u\right).$$

Since $ELy = LX\beta = \beta$ $\forall\beta$ (unbiased), $LX = I$. Note that

$$D\left(Ly\right) = \sigma^2 LL' \quad \text{and} \quad D\left(\hat\beta\right) = \sigma^2\left(X'X\right)^{-1}.$$

Define

$$Z = L - \left(X'X\right)^{-1}X',$$

and note that $ZZ'$ is positive semidefinite. Also,

$$ZZ' = \left[L - \left(X'X\right)^{-1}X'\right]\left[L - \left(X'X\right)^{-1}X'\right]' = LL' - LX\left(X'X\right)^{-1} - \left(X'X\right)^{-1}X'L' + \left(X'X\right)^{-1}X'X\left(X'X\right)^{-1}$$
$$= LL' - \left(X'X\right)^{-1} - \left(X'X\right)^{-1} + \left(X'X\right)^{-1} = LL' - \left(X'X\right)^{-1}.$$

This implies that

$$D\left(Ly\right) - D\left(\hat\beta\right) = \sigma^2\left[LL' - \left(X'X\right)^{-1}\right] = \sigma^2 ZZ' \geq 0.$$
2.15 Testing

Consider the model $y = X\beta + u$ and the test

$$H_0: \beta_k = \beta_k^* \quad \text{vs.} \quad H_A: \beta_k \neq \beta_k^*.$$

Under $H_0$,

$$\sqrt{T}\left(\hat\beta_k - \beta_k^*\right) \sim N\left[0,\; \sigma^2\left[\left(\frac{X'X}{T}\right)^{-1}\right]_{kk}\right]$$

$$\Rightarrow \frac{\sqrt{T}\left(\hat\beta_k - \beta_k^*\right)}{\sqrt{\sigma^2\left[\left(\frac{X'X}{T}\right)^{-1}\right]_{kk}}} \sim N[0, 1].$$

We can construct a test statistic using a normal table. Do some examples. If $\sigma^2$ is unknown, then

$$\frac{\sqrt{T}\left(\hat\beta_k - \beta_k^*\right)}{\sqrt{s^2\left[\left(\frac{X'X}{T}\right)^{-1}\right]_{kk}}} \sim t_{T-(n+1)}.$$

We can construct a test statistic using a t-statistic table. Do some examples. Consider a one-sided test.

Now consider

$$H_0: a'\beta = c \quad \text{vs.} \quad H_A: a'\beta \neq c.$$

Give examples. Note that, under $H_0$,

$$\mathrm{plim}\,a'\hat\beta = a'\beta = c, \quad D\left(a'\hat\beta\right) = a'D\left(\hat\beta\right)a,$$

and, therefore,

$$\sqrt{T}\left(a'\hat\beta - c\right) \sim N\left[0,\; \sigma^2 a'\left(\frac{X'X}{T}\right)^{-1}a\right]$$

and

$$\frac{\sqrt{T}\left(a'\hat\beta - c\right)}{\sqrt{s^2 a'\left(\frac{X'X}{T}\right)^{-1}a}} \sim t_{T-(n+1)}.$$
Now consider

$$H_0: A\beta = c \quad \text{vs.} \quad H_A: A\beta \neq c.$$

Give examples (including significance of a subset of coefficients).

$$\sqrt{T}\left(A\hat\beta - c\right) \sim N\left[0,\; \sigma^2 A\left(\frac{X'X}{T}\right)^{-1}A'\right].$$

Now consider (in general) $\underset{q\times 1}{Z} \sim N\left[0, \Sigma\right]$, and define $R$ such that

$$RR' = \Sigma^{-1} \quad \Rightarrow \quad \Sigma = \left(R^{-1}\right)'R^{-1}.$$

Consider $R'Z$:

$$ER'Z = R'EZ = 0,$$
$$D\left(R'Z\right) = ER'ZZ'R = R'\left[EZZ'\right]R = R'D(Z)R = R'\Sigma R = R'\left(R^{-1}\right)'R^{-1}R = I$$

$$\Rightarrow R'Z \sim N(0, I) \quad \Rightarrow \quad \left(R'Z\right)'\left(R'Z\right) = Z'RR'Z = Z'\Sigma^{-1}Z \sim \chi^2_q.$$

Now let

$$RR' = \left[\sigma^2 A\left(\frac{X'X}{T}\right)^{-1}A'\right]^{-1}$$

$$\Rightarrow R'\sqrt{T}\left(A\hat\beta - c\right) \sim N(0, I)$$

$$\Rightarrow \left[R'\sqrt{T}\left(A\hat\beta - c\right)\right]'\left[R'\sqrt{T}\left(A\hat\beta - c\right)\right] \sim \chi^2_q$$

where $q$ is the number of rows in $A$ ($= \mathrm{Rank}(R)$)

$$\Rightarrow T\left(A\hat\beta - c\right)'RR'\left(A\hat\beta - c\right) = \left(A\hat\beta - c\right)'\left[A\left(X'X\right)^{-1}A'\right]^{-1}\left(A\hat\beta - c\right)\Big/\sigma^2 \sim \chi^2_q.$$
If $\sigma^2$ is unknown, then

$$\left(A\hat\beta - c\right)'\left[A\left(X'X\right)^{-1}A'\right]^{-1}\left(A\hat\beta - c\right)\Big/ qs^2 \sim F_{q, T-(n+1)}$$

(where did the $q$ in the denominator come from?). This is a Wald test statistic.

Alternatively, we could construct a likelihood ratio test statistic. Consider the model

$$y = \underset{n}{X}\beta + \underset{k}{Z}\gamma + u$$

and the test

$$H_0: \gamma = 0 \quad \text{vs.} \quad H_A: \gamma \neq 0.$$

Estimate

$$\hat\beta_R = \left(X'X\right)^{-1}X'y,$$

and define

$$\tilde e = y - X\hat\beta_R = \left(I - P_X\right)y;$$

these are restricted residuals. Next, define

$$ESS_R = \tilde e'\tilde e, \quad \frac{ESS_R}{\sigma^2} \sim \chi^2_{T-n}.$$

Define

$$\hat\varepsilon = \left(I - P_{XZ}\right)y;$$

these are unrestricted residuals. Define $ESS_U = \hat\varepsilon'\hat\varepsilon$. We can show that

$$\frac{ESS_R - ESS_U}{\sigma^2} \sim \chi^2_k, \quad \frac{ESS_U}{\sigma^2} \sim \chi^2_{T-(n+k)}, \quad \text{independent}$$

$$\Rightarrow \frac{\left(ESS_R - ESS_U\right)/k}{ESS_U/\left[T - (n+k)\right]} \sim F_{k, T-(n+k)} \quad \Rightarrow \quad \frac{\left(R_U^2 - R_R^2\right)/k}{\left(1 - R_U^2\right)/\left[T - (n+k)\right]} \sim F_{k, T-(n+k)}.$$

We will discuss likelihood ratio tests (and why this is a likelihood ratio test) in more detail later.
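The restricted-versus-unrestricted $F$ statistic can be computed both ways, from sums of squares and from the two $R^2$'s (a sketch on simulated data with $\gamma = 0$, so $H_0$ is true; not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(9)
T, n, k = 150, 2, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, n - 1))])   # restricted regressors
Z = rng.normal(size=(T, k))                                      # candidate extra regressors
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)                # gamma = 0: H0 holds

def ess(W, y):
    """Residual sum of squares from regressing y on W."""
    P = W @ np.linalg.inv(W.T @ W) @ W.T
    r = y - P @ y
    return r @ r

ESS_R = ess(X, y)                         # restricted residuals
ESS_U = ess(np.column_stack([X, Z]), y)   # unrestricted residuals

F = ((ESS_R - ESS_U) / k) / (ESS_U / (T - (n + k)))

# Equivalent form in terms of R^2's
SS = ((y - y.mean())**2).sum()
R2_R = 1 - ESS_R / SS
R2_U = 1 - ESS_U / SS
F_r2 = ((R2_U - R2_R) / k) / ((1 - R2_U) / (T - (n + k)))
```

Adding regressors can only lower the residual sum of squares, so $ESS_U \leq ESS_R$ and $F \geq 0$ automatically.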