Econ 39 - Statistical Properties of the OLS Estimator
Sanjaya DeSilva
September 2008
1 Overview

Recall that the true regression model is

Y_i = β_0 + β_1 X_i + u_i   (1)

Applying the OLS method to a sample of data, we estimate the sample regression function

Y_i = b_0 + b_1 X_i + e_i   (2)

where the OLS estimators are

b_1 = Σ x_i y_i / Σ x_i²,   b_0 = Ȳ - b_1 X̄

with x_i = X_i - X̄ and y_i = Y_i - Ȳ denoting deviations from the sample means.

2 Unbiasedness

The OLS estimate b_1 is simply a sample estimate of the population parameter β_1. For every random sample we draw from the population, we will get a different b_1. What, then, is the relationship between the b_1 we obtain from a random sample and the underlying β_1 of the population? To see this, start by rewriting the OLS estimator as follows:

b_1 = Σ x_i y_i / Σ x_i²   (3)
    = Σ x_i (Y_i - Ȳ) / Σ x_i²   (4)
    = (Σ x_i Y_i - Σ x_i Ȳ) / Σ x_i²   (5)
    = (Σ x_i Y_i - Ȳ Σ x_i) / Σ x_i²   (6)
    = Σ x_i Y_i / Σ x_i²   (7)

where the last step uses the fact that deviations from the mean sum to zero, Σ x_i = 0. For Y_i, we can substitute the expression for the true regression line in order to obtain a relationship between b_1 and β_1:

b_1 = Σ x_i (β_0 + β_1 X_i + u_i) / Σ x_i²   (8)
    = (β_0 Σ x_i + β_1 Σ x_i X_i + Σ x_i u_i) / Σ x_i²   (9)
    = β_1 + Σ k_i u_i   (10)

where

k_i = x_i / Σ x_i²   (11)

(The last step uses Σ x_i = 0 and Σ x_i X_i = Σ x_i².) From this expression, we see that b_1 and β_1 are in fact different. However, we can demonstrate that, under certain assumptions, the average b_1 in repeated sampling would equal β_1. To see this, take the expectation of both sides of the above expression:

E(b_1) = β_1 + E(Σ k_i u_i)   (12)

If we assume that X_i, and therefore k_i, is non-stochastic, we can rewrite this as

E(b_1) = β_1 + Σ k_i E(u_i)   (13)
If we also assume that E(u_i) = 0, we get

E(b_1) = β_1   (14)

When the expectation of the sample estimate equals the true parameter, we say that the estimator is unbiased. To recap, we find that if X is non-stochastic and E(u_i) = 0, the OLS estimator is unbiased. However, note that these two conditions are not necessary for unbiasedness. Suppose k_i is stochastic. Then, b_1 is unbiased if

E(Σ k_i u_i) = Σ E(k_i u_i) = 0   (15)

That is, if X and u are uncorrelated, the OLS estimator is unbiased.

3 Variance of the Coefficient Estimate

The variance of the b_1 sampling distribution is, by definition,

Var(b_1) = E[b_1 - E(b_1)]²   (16)

We showed in the previous section that, under certain classical assumptions, E(b_1) = β_1. Then,

Var(b_1) = E[b_1 - β_1]² = E[Σ k_i u_i]²   (17)

Expanding terms, we get

Var(b_1) = E[Σ k_i² u_i² + Σ_{i≠j} k_i k_j u_i u_j]   (18)
         = Σ k_i² E(u_i²) + Σ_{i≠j} k_i k_j E(u_i u_j)   (19)
If we make the following two additional assumptions:

1. The variance of the error term is constant, i.e. Var(u_i) = E[u_i²] = σ²

2. The error terms of different observations are not correlated with each other, i.e. the covariance between all pairs of error terms is zero: E(u_i u_j) = 0 for all i ≠ j

the expression for the variance of b_1 reduces to the following elegant form:

Var(b_1) = Σ k_i² σ² = σ² / Σ x_i²   (20)

Note that the variance of the slope coefficient depends on two things. The variance of the slope coefficient increases as

1. the variance of the error term increases;

2. the sum of squared variation in the independent variable decreases, i.e. as the X variable is clustered around its mean.

3.1 Estimate of the Variance of the Error Term

Even though the above expression is elegant, it is impossible to compute the variance of the slope estimate because we do not know the variance of the underlying error term. We get around this problem by estimating the variance of the error term, σ², using the residuals obtained from OLS. It can be shown that, under certain classical assumptions,

σ̂² = Σ e_i² / (n - 2)   (21)
is an unbiased estimator of σ², i.e.

E[σ̂²] = E[Σ e_i² / (n - 2)] = σ²   (22)

For the formal proof, see the Gujarati Appendix. Note that this proof also depends crucially on the classical assumptions. Note that the numerator of this unbiased estimator is the sum of squared residuals (SSR). The estimator itself is often called the Mean Square Residual. The square root of the estimator is called the standard error of the regression (SER) and is typically used as an estimate of the standard deviation of the error term.

4 The Efficiency of the OLS Estimator

Under the classical assumptions, the OLS estimator b_1 can be written as a linear function of Y:

b_1 = Σ k_i Y_i   (23)

where

k_i = x_i / Σ x_i²   (24)

Our goal now is to show that this OLS estimator has a lower variance than any other linear estimator, i.e. that the OLS estimator is efficient or "best". To do so, consider any other linear unbiased estimator,

b_1* = Σ w_i Y_i   (25)

where w_i is some other function of the X variables.
The expected value of this estimator is

E(b_1*) = Σ w_i E(Y_i) = β_0 Σ w_i + β_1 Σ w_i X_i   (26)

Because b_1* is unbiased,

E(b_1*) = β_1   (27)

For this to be the case, it follows that

Σ w_i = 0   (28)
Σ w_i X_i = 1   (29)

It follows from these two identities that

Σ w_i x_i = Σ w_i (X_i - X̄) = Σ w_i X_i - X̄ Σ w_i = 1   (30)

The variance of b_1* is

Var(b_1*) = Var(Σ w_i Y_i) = Σ w_i² Var(Y_i) = σ² Σ w_i²   (31)

If we rewrite the variance as

Var(b_1*) = σ² Σ ((w_i - k_i) + k_i)²   (32)

and expand this expression,

Var(b_1*) = σ² ( Σ (w_i - k_i)² + Σ k_i² + 2 Σ k_i (w_i - k_i) )   (33)

Note that

Σ k_i w_i = Σ w_i x_i / Σ x_i² = 1 / Σ x_i²   (34)

under the unbiasedness assumption made earlier.
In addition,

Σ k_i² = Σ x_i² / (Σ x_i²)² = 1 / Σ x_i²   (35)

so the cross term in (33) vanishes: Σ k_i (w_i - k_i) = Σ k_i w_i - Σ k_i² = 0. Therefore, the variance of b_1* simplifies to

Var(b_1*) = σ² ( Σ (w_i - k_i)² + Σ k_i² )   (36)

This expression is minimized when

w_i = k_i   (37)

and the minimum variance is

Var(b_1*) = σ² Σ k_i²   (38)

which is exactly the variance of the OLS estimator. This completes the proof that, under the classical assumptions, the OLS estimator has the least variance among all linear unbiased estimators.

4.1 Consistency

We established that the OLS estimator is unbiased and efficient under the classical assumptions. We can also show easily that the OLS estimator is consistent under the same assumptions. An estimator is consistent if its variance approaches zero as the sample size increases. To see this, start with the expression for the variance,

Var(b_1) = σ² / Σ x_i²   (39)

and divide both the numerator and denominator by n:

Var(b_1) = (σ²/n) / (Σ x_i²/n)   (40)
As n, the numerator approaches zero whereas the denomnator remans postve. Therefore, lm V ar(b 1) = 0 (41) n 5 Gauss-Markov Theorem and Classcal Assumptons To recap, we have demonstrated that the OLS estmator, b 1 = y = k Y (4) has the followng propertes; 1. Unbased,.e.. V ar(b 1 ) = E(b 1 ) = β 1 (43) σ = σ k (44) 3. Best or effcent,.e. has lower varance than any other lnear unbased estmator,.e. V ar(b 1 ) < V ar(b 1) (45) where b 1 = w Y and w s any other functon of x. 4. Consstent,.e. lm V ar(b 1) = 0 (46) n 8
if the following classical assumptions are satisfied:

1. The underlying regression model is linear in parameters, has an additive error, and is correctly specified, i.e.

Y_i = β_0 + β_1 f(X_i) + u_i   (47)

2. The X variable is non-stochastic, i.e. fixed in repeated sampling.

3. The expected value of the error term is zero, i.e.

E(u_i) = 0   (48)

Note that the intercept term β_0 ensures that this condition is met. Consider

Y_i = β_0 + β_1 X_i + u_i   (49)

with

E(u_i) = k ≠ 0   (50)

This is equivalent to a model

Y_i = β_0* + β_1 X_i + u_i*   (51)

where

β_0* = β_0 + k   (52)
E(u_i*) = E(u_i - k) = 0   (53)

Note also that the first three conditions are sufficient for OLS to be unbiased.

4. The explanatory variable X is uncorrelated with the error term u, i.e.

Cov(X_i, u_i) = E[x_i u_i] = 0   (54)
Note that this assumption is necessary for OLS to be unbiased. Even if X is stochastic, we can obtain unbiased coefficients as long as X is uncorrelated with the error term. Such correlation occurs typically if X is endogenous, i.e. determined by other variables. If both X and Y are determined by the same unobserved variables, this assumption is violated. If X and Y are determined by each other, i.e. in simultaneous equations, this assumption is also violated. For example, if

Y_i = β_0 + β_1 X_i + u_i   (55)
X_i = δ_0 + δ_1 Y_i + ɛ_i   (56)

then Cov(X_i, u_i) ≠ 0 if δ_1 ≠ 0 and/or Cov(u_i, ɛ_i) ≠ 0.

5. The error term is homoskedastic, i.e. the conditional variance is a constant:

Var(u_i | X_i) = E(u_i² | X_i) = σ²   (57)

6. The error term is serially uncorrelated, i.e. the error term of one observation is not correlated with the error term of any other observation:

Cov(u_i, u_j | X_i, X_j) = E(u_i u_j | X_i, X_j) = 0 for all i ≠ j   (58)

The assumptions of serially uncorrelated and homoskedastic errors allow us to obtain an unbiased estimator of the variance of the error term, and a simple OLS formula for the variance of the coefficient estimate. In addition, we need these two assumptions to demonstrate that OLS is efficient. In fact, we will see later that GLS methods are efficient when these assumptions are violated.
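As a concrete illustration (a minimal simulation sketch, not part of the original notes; all parameter values are arbitrary choices for the demonstration), repeated sampling with X fixed and errors satisfying assumptions 3, 5, and 6 should give a mean of b_1 close to β_1, as in equation (14), and a sampling variance close to σ²/Σ x_i², as in equation (20):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary "true" parameters chosen for the demonstration.
beta0, beta1, sigma = 1.0, 2.0, 1.5
n, reps = 40, 50_000

# X is fixed in repeated sampling (assumption 2).
X = np.linspace(-5, 5, n)
x = X - X.mean()                            # deviations from the mean
theoretical_var = sigma**2 / (x**2).sum()   # equation (20)

# One row per repeated sample; errors satisfy assumptions 3, 5 and 6.
u = rng.normal(0.0, sigma, size=(reps, n))
Y = beta0 + beta1 * X + u                   # broadcasts X across rows
b1_draws = (Y @ x) / (x**2).sum()           # OLS slope, equation (7)

print(b1_draws.mean())                      # close to beta1 = 2.0
print(b1_draws.var(), theoretical_var)      # nearly equal
```

Rerunning with a larger n shrinks both numbers on the last line toward zero, which is the consistency result of Section 4.1.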
There are a few other assumptions that are necessary to obtain OLS coefficients and standard errors:

1. At least one degree of freedom, i.e. the number of observations must exceed the number of parameters (n > k + 1), where k is the number of X variables. In the simple regression with one X variable, this means there should be at least three observations.

2. No X variable should be a deterministic linear function of the other X variables, i.e. no perfect multicollinearity. This condition applies only to multiple regressions, where there is more than one X variable, and is discussed later.

3. There should be some variation in the X variable. If the X variable does not vary, it is impossible to estimate the slope of a regression line.
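A similar simulation sketch (again illustrative, with arbitrary parameter values, and not part of the original notes) shows why the degrees-of-freedom correction in equation (21) matters: dividing the sum of squared residuals by n - 2 rather than n makes σ̂² unbiased, which is most visible when n is small:

```python
import numpy as np

rng = np.random.default_rng(1)

beta0, beta1, sigma = 0.0, 1.0, 2.0   # arbitrary true values
n, reps = 10, 50_000                  # small n makes the correction visible

X = np.linspace(0.0, 1.0, n)          # fixed in repeated sampling
x = X - X.mean()

u = rng.normal(0.0, sigma, size=(reps, n))
Y = beta0 + beta1 * X + u

b1 = (Y @ x) / (x**2).sum()           # OLS slopes, one per sample
b0 = Y.mean(axis=1) - b1 * X.mean()   # OLS intercepts
e = Y - b0[:, None] - np.outer(b1, X) # OLS residuals, row by row

ssr = (e**2).sum(axis=1)              # sum of squared residuals
print((ssr / (n - 2)).mean())         # close to sigma**2 = 4.0 (unbiased)
print((ssr / n).mean())               # close to 3.2, biased downward
```

The naive estimator Σ e_i²/n is biased downward by the factor (n - 2)/n because two degrees of freedom are used up estimating b_0 and b_1, which is exactly the point of assumption 1 above.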