1 Economics 620, Lecture 2: Regression Mechanics (Simple Regression)

Observed variables: $y_i, x_i$, $i = 1, \ldots, n$.
Hypothesized (model): $E y_i = \alpha + \beta x_i$, or $y_i = \alpha + \beta x_i + (y_i - E y_i)$; renaming, we get $y_i = \alpha + \beta x_i + \varepsilon_i$.
Unobserved: $\alpha$, $\beta$, $\varepsilon_i$.

EXAMPLE: ENGEL CURVES

Utility function: $u(z_1, \ldots, z_k) = \sum_{j=1}^k a_j \ln z_j$.
Budget constraint: $m = \sum_{j=1}^k p_j z_j$.
FOC: $a_j / z_j - \lambda p_j = 0$, $j = 1, \ldots, k$
$\Rightarrow \lambda = \sum_{j=1}^k a_j / m$
$\Rightarrow z_j = a_j m / (p_j \sum_{\ell=1}^k a_\ell)$
$\Rightarrow p_j z_j = (a_j / \sum_{\ell=1}^k a_\ell)\, m$
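A quick numeric check of the demand formula above; a minimal sketch in which the weights $a_j$, prices $p_j$, and income $m$ are illustrative values, not part of the lecture:

    import numpy as np

    a = np.array([0.2, 0.3, 0.5])   # hypothetical utility weights a_j
    p = np.array([1.0, 2.0, 4.0])   # hypothetical prices p_j
    m = 100.0                       # hypothetical income

    # Demand from the FOC: z_j = a_j * m / (p_j * sum(a))
    z = a * m / (p * a.sum())

    # Expenditure on good j is linear in income: slope a_j / sum(a), zero intercept
    print(p * z)              # [20. 30. 50.]
    print(a / a.sum() * m)    # same numbers, and they exhaust the budget m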
2 Estimation

We want to estimate $E(y) = \alpha + \beta x$, where $y$ is the expenditure on good $j$ and $x$ is income.
According to the model we also have $\beta = a_j / \sum a_\ell$ and $\alpha = 0$.
We would like to estimate the unknowns from a sample of $n$ observations on $y$ and $x$.
3 The Least Squares Method

The Least Squares criterion to estimate $\alpha$ and $\beta$ is to choose $\hat{\alpha}$ and $\hat{\beta}$ to minimize the sum of squared vertical distances between $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ and $y_i$.
Why do we consider the vertical distances? Why do we square?
Let $S(a, b) = \sum_{i=1}^n (y_i - a - b x_i)^2$.
Partial derivatives:
$\partial S / \partial a = -2 \sum_{i=1}^n (y_i - a - b x_i)$
$\partial S / \partial b = -2 \sum_{i=1}^n x_i (y_i - a - b x_i)$.
4 Normal Equations

Normal equations:
$0 = \sum_{i=1}^n (y_i - a - b x_i)$
$0 = \sum_{i=1}^n x_i (y_i - a - b x_i)$.
$\hat{\alpha}$ and $\hat{\beta}$ are the Least Squares (LS) estimators:
$\sum_{i=1}^n y_i = n\hat{\alpha} + \hat{\beta} \sum_{i=1}^n x_i$  (1)
$\sum_{i=1}^n x_i y_i = \hat{\alpha} \sum_{i=1}^n x_i + \hat{\beta} \sum_{i=1}^n x_i^2$  (2)
From (1): $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, where $\bar{y} = \sum_{i=1}^n y_i / n$ and $\bar{x} = \sum_{i=1}^n x_i / n$.
Substituting into (2): $\sum x_i y_i = (\bar{y} - \hat{\beta}\bar{x}) \sum x_i + \hat{\beta} \sum x_i^2$
5 Normal Equations cont'd.

$\Rightarrow \sum x_i (y_i - \bar{y}) = \hat{\beta}\left(\sum x_i^2 - \bar{x}\sum x_i\right) = \hat{\beta}\left(\sum x_i^2 - n\bar{x}^2\right) = \hat{\beta} \sum (x_i - \bar{x})^2$
$\Rightarrow \hat{\beta} = \dfrac{\sum x_i (y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}$
$\Rightarrow \hat{\beta} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Is this a minimum? Note that:
$\partial^2 S / \partial a^2 = 2n$, $\partial^2 S / \partial a \partial b = 2\sum x_i$, $\partial^2 S / \partial b^2 = 2\sum x_i^2$
6 Normal Equations cont'd.

Is the Hessian p.d.?
$H = 2 \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$
YES! Use Cauchy-Schwarz: $\left(\sum x_i z_i\right)^2 \le \left(\sum x_i^2\right)\left(\sum z_i^2\right)$.
Here (with $z_i = 1$): $\left(\sum x_i\right)^2 \le n \sum x_i^2$.
Define the residuals as $e_i = y_i - \hat{\alpha} - \hat{\beta} x_i$.
From the normal equations: $\sum e_i = \sum x_i e_i = 0$.
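The closed-form estimators and the residual identities can be checked directly; a minimal sketch, in which the simulated data, seed, and true parameter values are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x = rng.uniform(1, 10, n)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, n)   # hypothetical alpha = 2, beta = 0.5

    xbar, ybar = x.mean(), y.mean()
    beta_hat = ((x - xbar) * y).sum() / ((x - xbar) ** 2).sum()   # slope from the normal equations
    alpha_hat = ybar - beta_hat * xbar                            # intercept

    e = y - alpha_hat - beta_hat * x    # residuals
    print(e.sum(), (x * e).sum())       # both ~0, as the normal equations require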
7 Proof of Minimization

Consider alternative estimators $a^*$ and $b^*$:
$S(a^*, b^*) = \sum (y_i - a^* - b^* x_i)^2$
$= \sum \left[(y_i - \hat{\alpha} - \hat{\beta} x_i) + (\hat{\alpha} - a^*) + (\hat{\beta} - b^*) x_i\right]^2$
$= \sum e_i^2 + 2(\hat{\alpha} - a^*)\sum e_i + 2(\hat{\beta} - b^*)\sum x_i e_i + \sum \left[(\hat{\alpha} - a^*) + (\hat{\beta} - b^*) x_i\right]^2$
$\ge \sum e_i^2$
8 Properties of Estimators

LS estimators are unbiased:
$\hat{\beta} = \dfrac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2} = \dfrac{\alpha \sum (x_i - \bar{x}) + \beta \sum (x_i - \bar{x}) x_i + \sum (x_i - \bar{x})\varepsilon_i}{\sum (x_i - \bar{x})^2} = \beta + \dfrac{\sum (x_i - \bar{x})\varepsilon_i}{\sum (x_i - \bar{x})^2}$
(using $\sum (x_i - \bar{x}) = 0$ and $\sum (x_i - \bar{x}) x_i = \sum (x_i - \bar{x})^2$)
$\Rightarrow E\hat{\beta} = \beta$
$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \alpha + (\beta - \hat{\beta})\bar{x} + \bar{\varepsilon}$
$\Rightarrow E\hat{\alpha} = \alpha$
9 More Properties

We cannot get more properties without further assumptions. Assume:
$V(y_i \mid x_i) = V(\varepsilon_i) = \sigma^2$, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$.
Now:
$V(\hat{\beta}) = E(\hat{\beta} - \beta)^2 = E\left[\dfrac{\sum (x_i - \bar{x})\varepsilon_i}{\sum (x_i - \bar{x})^2}\right]^2 = \dfrac{\sum (x_i - \bar{x})^2 \sigma^2}{\left[\sum (x_i - \bar{x})^2\right]^2}$, using $E\varepsilon_i \varepsilon_j = 0$.
Thus: $V(\hat{\beta}) = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$
10 More Properties cont'd.

Now for $V(\hat{\alpha})$: $\hat{\alpha} - \alpha = (\beta - \hat{\beta})\bar{x} + \bar{\varepsilon}$
$\Rightarrow V(\hat{\alpha}) = V(\hat{\beta})\bar{x}^2 + \dfrac{\sigma^2}{n}$
$\Rightarrow V(\hat{\alpha}) = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x_i - \bar{x})^2}\right]$
This requires $\mathrm{Cov}(\hat{\beta}, \bar{\varepsilon}) = 0$. Why?
$E(\hat{\beta} - \beta)\bar{\varepsilon} = E\left[\dfrac{\sum (x_i - \bar{x})\varepsilon_i}{\sum (x_i - \bar{x})^2} \cdot \dfrac{1}{n}\sum \varepsilon_j\right] = \dfrac{(\sigma^2/n)\sum (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} = 0$
11 Engel Curve Example cont'd.

We know: $p_j z_j = (a_j / \sum a_\ell)\, m$.
Is $V(\varepsilon_j) = \sigma^2$ plausible here?
How about logs: $\ln(p_j z_j) = \ln(a_j / \sum a_\ell) + \ln m$?
This implies the regression equation $y = \alpha + \beta x$, where $y$ is log expenditure on good $j$ and $x$ is log income.
What are our expectations about the estimator values? Is this better?
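In the log specification the theory predicts $\beta = 1$ and $\alpha = \ln(a_j / \sum a_\ell)$. A minimal sketch of fitting that regression; the budget share, income range, and multiplicative error are illustrative assumptions, not part of the lecture:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    m = rng.uniform(500, 5000, n)                         # hypothetical incomes
    share = 0.3                                           # hypothetical budget share a_j / sum(a_l)
    spend = share * m * np.exp(rng.normal(0, 0.2, n))     # expenditure with multiplicative error

    x, y = np.log(m), np.log(spend)                       # log income, log expenditure
    beta_hat = ((x - x.mean()) * y).sum() / ((x - x.mean()) ** 2).sum()
    alpha_hat = y.mean() - beta_hat * x.mean()
    print(alpha_hat, np.log(share))                       # intercept near ln(0.3)
    print(beta_hat)                                       # slope near 1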
12 Covariance of Estimators

$\mathrm{Cov}(\hat{\alpha}, \hat{\beta}) = E[(\hat{\alpha} - \alpha)(\hat{\beta} - \beta)]$
$= E\left[\left((\beta - \hat{\beta})\bar{x} + \bar{\varepsilon}\right)\dfrac{\sum (x_i - \bar{x})\varepsilon_i}{\sum (x_i - \bar{x})^2}\right]$
$= -\bar{x}\, E\left[\dfrac{\sum (x_i - \bar{x})\varepsilon_i}{\sum (x_i - \bar{x})^2}\right]^2$
$= \dfrac{-\bar{x}\,\sigma^2}{\sum (x_i - \bar{x})^2}.$
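A Monte Carlo sketch of the unbiasedness, variance, and covariance results above, with the regressors held fixed across replications; the design, seed, and true parameter values are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 30, 20000
    alpha, beta, sigma = 2.0, 0.5, 1.0          # hypothetical true values
    x = rng.uniform(1, 10, n)                   # fixed design
    xbar, Sxx = x.mean(), ((x - x.mean()) ** 2).sum()

    a_hats, b_hats = np.empty(reps), np.empty(reps)
    for r in range(reps):
        y = alpha + beta * x + rng.normal(0, sigma, n)
        b_hats[r] = ((x - xbar) * y).sum() / Sxx
        a_hats[r] = y.mean() - b_hats[r] * xbar

    print(a_hats.mean(), b_hats.mean())                          # ~ alpha, beta (unbiasedness)
    print(b_hats.var(), sigma**2 / Sxx)                          # V(beta_hat)
    print(a_hats.var(), sigma**2 * (1/n + xbar**2 / Sxx))        # V(alpha_hat)
    print(np.cov(a_hats, b_hats)[0, 1], -xbar * sigma**2 / Sxx)  # Cov(alpha_hat, beta_hat)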
13 Gauss-Markov Theorem

The LS estimator is the best linear unbiased estimator (BLUE).
Proof: define $w_i = (x_i - \bar{x}) / \sum (x_i - \bar{x})^2$, so $\hat{\beta} = \sum w_i y_i$.
Consider an alternative linear unbiased estimator $\tilde{\beta} = \sum c_i y_i$, and write $c_i = w_i + d_i$.
Note: $E\tilde{\beta} = \beta \Rightarrow E\sum c_i (\alpha + \beta x_i + \varepsilon_i) = \alpha \sum c_i + \beta \sum c_i x_i = \beta$
$\Rightarrow \sum c_i = 0, \quad \sum c_i x_i = 1$
14 Gauss-Markov Theorem proof cont'd.

Note that $w_i$ satisfies $\sum w_i = 0$ and $\sum w_i x_i = 1$, so $\sum d_i = 0$ and $\sum d_i x_i = 0$.
Now we have:
$V(\tilde{\beta}) = E\left(\sum c_i \varepsilon_i\right)^2 = \sigma^2 \sum c_i^2 = \sigma^2 \sum (w_i + d_i)^2 = \sigma^2\left[\sum d_i^2 + 2\sum w_i d_i + \sum w_i^2\right]$
so
$V(\tilde{\beta}) - V(\hat{\beta}) = \sigma^2\left[\sum d_i^2 + 2\sum w_i d_i\right] = \sigma^2 \sum d_i^2 \ge 0.$
WHY? ($\sum w_i d_i = \sum d_i (x_i - \bar{x}) / \sum (x_i - \bar{x})^2 = 0$, since $\sum d_i = 0$ and $\sum d_i x_i = 0$.)
The variance is minimized when $\sum d_i^2 = 0$, i.e. when the estimators are identical!
A similar argument applies for $\hat{\alpha}$ and for any linear combination of $\hat{\alpha}$ and $\hat{\beta}$.
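A sketch of the variance comparison, using a hypothetical alternative set of weights $c_i = w_i + d_i$ with $\sum d_i = 0$ and $\sum d_i x_i = 0$; the design and the particular choice of $d$ are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 30
    x = rng.uniform(1, 10, n)
    xbar = x.mean()
    w = (x - xbar) / ((x - xbar) ** 2).sum()   # LS weights: beta_hat = sum(w_i * y_i)

    # Build a perturbation d orthogonal to (1, x): project a random vector off both columns
    d = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    d = d - X @ np.linalg.lstsq(X, d, rcond=None)[0]   # now sum(d) ~ 0 and sum(d*x) ~ 0
    c = w + d                                          # alternative unbiased weights

    sigma2 = 1.0                                       # hypothetical error variance
    print(sigma2 * (w ** 2).sum())   # V(beta_hat) = sigma^2 / sum((x_i - xbar)^2)
    print(sigma2 * (c ** 2).sum())   # V(beta_tilde): larger by sigma^2 * sum(d_i^2)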
15 Estimation of Variance

It is natural to use the sum of squared residuals to obtain information about the variance.
$e_i = y_i - \hat{\alpha} - \hat{\beta} x_i = (y_i - \bar{y}) - \hat{\beta}(x_i - \bar{x}) = -(\hat{\beta} - \beta)(x_i - \bar{x}) + (\varepsilon_i - \bar{\varepsilon})$
$\Rightarrow \sum e_i^2 = (\hat{\beta} - \beta)^2 \sum (x_i - \bar{x})^2 + \sum (\varepsilon_i - \bar{\varepsilon})^2 - 2(\hat{\beta} - \beta)\sum (x_i - \bar{x})(\varepsilon_i - \bar{\varepsilon})$
This will involve $\sigma^2$ in expectation, term by term.
First term: $E(\hat{\beta} - \beta)^2 \sum (x_i - \bar{x})^2 = \sigma^2$
16 Estimation of Variance cont'd.

Second term:
$E\sum (\varepsilon_i - \bar{\varepsilon})^2 = E\left[\sum \varepsilon_i^2 + n\bar{\varepsilon}^2 - 2\bar{\varepsilon}\sum \varepsilon_i\right] = n\sigma^2 + \sigma^2 - 2\sigma^2 = (n-1)\sigma^2$
Third term:
$E\left[-2(\hat{\beta} - \beta)\sum (x_i - \bar{x})(\varepsilon_i - \bar{\varepsilon})\right] = -2E\left[\dfrac{\sum (x_i - \bar{x})\varepsilon_i}{\sum (x_i - \bar{x})^2}\sum (x_i - \bar{x})(\varepsilon_i - \bar{\varepsilon})\right] = -2E\dfrac{\left[\sum (x_i - \bar{x})\varepsilon_i\right]^2}{\sum (x_i - \bar{x})^2} = -2\sigma^2$
17 Estimation of Variance cont'd.

Adding the terms we get: $E\sum e_i^2 = (n-2)\sigma^2$
This suggests the estimator $s^2 = \sum e_i^2 / (n-2)$.
This is an unbiased estimator.
It is a quadratic function of $y$.
This is all we can say without further assumptions.
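A sketch checking that $s^2 = \sum e_i^2 / (n-2)$ is unbiased for $\sigma^2$, with the regressors held fixed across replications; the sample size, seed, and parameter values are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps, sigma = 20, 20000, 1.5            # hypothetical sample size and error s.d.
    x = rng.uniform(1, 10, n)
    xbar, Sxx = x.mean(), ((x - x.mean()) ** 2).sum()

    s2 = np.empty(reps)
    for r in range(reps):
        y = 1.0 + 2.0 * x + rng.normal(0, sigma, n)
        b = ((x - xbar) * y).sum() / Sxx
        a = y.mean() - b * xbar
        e = y - a - b * x
        s2[r] = (e ** 2).sum() / (n - 2)       # divide by n - 2, not n

    print(s2.mean(), sigma ** 2)               # ~2.25: unbiased for sigma^2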
18 Summing Up

With the assumption $E y_i = \alpha + \beta x_i$, we can calculate unbiased estimates of $\alpha$ and $\beta$ (linear in $y_i$).
Adding the assumptions $V(y_i \mid x_i) = \sigma^2$ and $E\varepsilon_i \varepsilon_j = 0$ for $i \ne j$, we can obtain sampling variances for $\hat{\alpha}$ and $\hat{\beta}$, establish an optimality property, and obtain an unbiased estimate of $\sigma^2$.
Note that the optimality property may not be that compelling and that we have very little information about the variance estimate.