Statistical Properties of OLS Estimators

Linear model: $Y_i = \beta_0 + \beta_1 X_i + u_i$

OLS estimators:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}, \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

Best Linear Unbiased Estimator (BLUE)

Linear estimator: $\hat{\beta}_0$ and $\hat{\beta}_1$ are linear functions of the $Y_i$'s. Consider the numerator term in $\hat{\beta}_1$. Note that $\sum (X_i - \bar{X}) = 0$, so

$$\sum (X_i - \bar{X})\bar{Y} = \bar{Y} \sum (X_i - \bar{X}) = 0.$$

Hence

$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})Y_i}{\sum (X_i - \bar{X})^2} = \sum w_i Y_i, \qquad w_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2},$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = \sum \frac{1}{n} Y_i - \sum w_i \bar{X} Y_i = \sum \left( \frac{1}{n} - w_i \bar{X} \right) Y_i.$$

Unbiased estimator: $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$.

Assumption 1. $E(u_i \mid X) = 0$, where $X$ includes all the $X_i$'s: $X = (X_1, X_2, \dots, X_n)$.

Under this assumption, $E(Y_i \mid X) = \beta_0 + \beta_1 X_i$ and $E(u_i) = E[E(u_i \mid X)] = E[0] = 0$.

Note: $E(u_i \mid X) = 0$ implies that $\operatorname{cov}(u_i, X_j) = 0$, that is, the errors $u_i$ are not correlated with any $X_j$. This is based on the following general result.

Theorem. If $E(Y \mid X) = E(Y)$, then $\operatorname{cov}(X, Y) = 0$.

Proof. We use the law of iterated expectations. Since $\operatorname{cov}(X, Y) = E(XY) - E(X)E(Y)$, it suffices to show $E(XY) = E(X)E(Y)$:

$$E(XY) = E[E(XY \mid X)] = E[X \, E(Y \mid X)] = E[X \, E(Y)] = E(X)E(Y).$$

The first equality is due to the law of iterated expectations; the second to the fact that, conditional on $X$, $X$ becomes a constant inside the expectation; the third to the proposed relationship $E(Y \mid X) = E(Y)$; and the last to the fact that $E(Y)$ is a constant in the expectation with respect to $X$.
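The linearity of both estimators in the $Y_i$'s is easy to verify numerically. Below is a minimal sketch in Python (simulated data; the variable names and parameter values are illustrative, not from the notes) checking that the weight representations reproduce the usual formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(10.0, 2.0, n)                 # simulated regressor
Y = 1.5 + 0.8 * X + rng.normal(0.0, 1.0, n)  # simulated outcome

# Ratio (covariance) form of the slope estimator
b1_ratio = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

# Linear-in-Y form: beta1_hat = sum_i w_i * Y_i
w = (X - X.mean()) / np.sum((X - X.mean()) ** 2)
b1_linear = np.sum(w * Y)

# Intercept as a linear function of Y: sum_i (1/n - w_i * Xbar) * Y_i
b0_linear = np.sum((1.0 / n - w * X.mean()) * Y)

print(b1_ratio, b1_linear)                        # identical up to rounding
print(Y.mean() - b1_ratio * X.mean(), b0_linear)  # identical up to rounding
```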
Example (wage rates of individuals): individuals with 12 years of education ($X_i = 12$) earn $\beta_0 + \beta_1 X_i$ on average. Some individuals receive a higher and some a lower wage rate than this average, and the average of the deviations from the mean is zero.

Proposition: Under Assumption 1, the OLS estimators are unbiased.

Proof. The OLS estimators can be written as

$$(1.\text{a}) \quad \hat{\beta}_1 = \beta_1 + \sum w_i u_i, \qquad w_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2},$$

$$(1.\text{b}) \quad \hat{\beta}_0 = \beta_0 - (\hat{\beta}_1 - \beta_1)\bar{X} + \bar{u}.$$

Taking conditional expectations,

$$E(\hat{\beta}_1 \mid X) = \beta_1 + \sum w_i E(u_i \mid X) = \beta_1 + 0,$$

$$E(\hat{\beta}_0 \mid X) = \beta_0 - \left( E(\hat{\beta}_1 \mid X) - \beta_1 \right)\bar{X} + E(\bar{u} \mid X) = \beta_0 - 0 + 0. \quad \text{q.e.d.}$$

Proof of (1.b): Averaging $Y_i = \beta_0 + \beta_1 X_i + u_i$ over $i$ gives $\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{u}$. Plug this into the equation for $\hat{\beta}_0$ and rearrange terms:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = (\beta_0 + \beta_1 \bar{X} + \bar{u}) - \hat{\beta}_1 \bar{X} = \beta_0 - (\hat{\beta}_1 - \beta_1)\bar{X} + \bar{u}.$$

Proof of (1.a): Consider the numerator term of the $\hat{\beta}_1$ equation:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X})Y_i = \sum (X_i - \bar{X})(\beta_0 + \beta_1 X_i + u_i) = \beta_1 \sum (X_i - \bar{X})X_i + \sum (X_i - \bar{X})u_i = \beta_1 \sum (X_i - \bar{X})^2 + \sum (X_i - \bar{X})u_i,$$

where we used the relationships

$$\sum (X_i - \bar{X})\bar{Y} = \sum (X_i - \bar{X})\beta_0 = 0, \qquad \sum (X_i - \bar{X})X_i = \sum (X_i - \bar{X})(X_i - \bar{X}) = \sum (X_i - \bar{X})^2.$$

Therefore

$$\hat{\beta}_1 = \frac{\beta_1 \sum (X_i - \bar{X})^2 + \sum (X_i - \bar{X})u_i}{\sum (X_i - \bar{X})^2} = \beta_1 + \sum w_i u_i.$$
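To illustrate the proposition numerically, here is a small Monte Carlo sketch (an assumed setup of my own: a fixed design and standard normal errors, so $E(u_i \mid X) = 0$ holds by construction); the average of the slope estimates across replications should be close to the true $\beta_1$:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, n, reps = 2.0, 0.5, 50, 5000
X = rng.uniform(0.0, 10.0, n)       # regressor held fixed across replications

slopes = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, 1.0, n)     # E(u_i | X) = 0 holds by construction
    Y = beta0 + beta1 * X + u
    slopes[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

print(slopes.mean())                # close to 0.5: the estimator is centered at beta1
```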
Violation of Assumption 1

Omitted variables. An omitted variable may cause correlation between the error term and the explanatory variable. Consider a regression of the percentage (math) of 10th graders who pass the math exam on the percentage of children who are eligible for the free lunch program (lunch). The regression result is

$$\widehat{\text{math}} = 32.14 - 0.319\,\text{lunch}, \qquad n = 408, \qquad R^2 = 0.171.$$

This indicates that a 10 percentage point increase in the number of students who are eligible for free lunch will reduce the passing percentage by 3.19 percentage points. A policy implication would be that the government must tighten the eligibility criteria to raise the passing percentage. This regression result does not seem right. The explanatory variable (the percentage of eligible students) is likely correlated with the poverty level and with the quality and resources of the school, which are contained in the error term. This correlation makes the OLS estimator biased.

Assumption 2. $(X_i, Y_i)$ are independently and identically distributed (i.i.d.).

Assumption 3. Large outliers are unlikely.

Assumption 2 means that the sample is drawn randomly. This is possible in experiments; with observational data, we hope that the observations are reasonably independent. Independence of $Y_i$ and $Y_j$ means that $u_i$ and $u_j$ are independent, that is, the $u_i$'s are i.i.d. random variables with

$$E(u_i \mid X) = E(u_i) = 0.$$

Homoskedasticity: $\operatorname{var}(u_i \mid X) = \operatorname{var}(u_i) = \sigma_u^2$

Heteroskedasticity: $\operatorname{var}(u_i \mid X) = \operatorname{var}(u_i) = \sigma_i^2$

Theorem (variances and covariance of the coefficient estimators under homoskedasticity). Under the assumptions listed above, the variances of the OLS estimators are given by

$$\sigma^2_{\hat{\beta}_0} = \sigma_u^2 Q_0, \qquad \sigma^2_{\hat{\beta}_1} = \sigma_u^2 Q_1, \qquad \sigma_{\hat{\beta}_0, \hat{\beta}_1} = \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = -\bar{X} \operatorname{var}(\hat{\beta}_1) = -\sigma_u^2 \bar{X} Q_1,$$

where

$$Q_0 = \left( \frac{1}{n} \sum X_i^2 \right) Q_1, \qquad Q_1 = \frac{1}{\sum (X_i - \bar{X})^2}.$$
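Before the remarks, a simulation sketch of this theorem (simulated data and parameter values of my own choosing) comparing the Monte Carlo variances and covariance of the estimators with the theoretical values $\sigma_u^2 Q_0$, $\sigma_u^2 Q_1$, and $-\sigma_u^2 \bar{X} Q_1$:

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma_u, n, reps = 2.0, 0.5, 1.5, 50, 20000
X = rng.uniform(0.0, 10.0, n)       # fixed design

Q1 = 1.0 / np.sum((X - X.mean()) ** 2)
Q0 = np.mean(X ** 2) * Q1           # Q0 = (sum X_i^2 / n) * Q1

b0 = np.empty(reps)
b1 = np.empty(reps)
for r in range(reps):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma_u, n)
    b1[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0[r] = Y.mean() - b1[r] * X.mean()

print(b1.var(), sigma_u ** 2 * Q1)                          # var(beta1_hat)
print(b0.var(), sigma_u ** 2 * Q0)                          # var(beta0_hat)
print(np.cov(b0, b1)[0, 1], -X.mean() * sigma_u ** 2 * Q1)  # cov(beta0_hat, beta1_hat)
```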
Remarks:
1. Estimators become less precise (i.e., have higher variances) when there is more uncertainty in the error term (i.e., a higher value of $\sigma_u^2$).
2. Estimators become more precise (i.e., have lower variances) when the regressor $X$ is more widely dispersed around its mean, i.e., when the denominator term $\sum (X_i - \bar{X})^2$ is larger.
3. Estimators become more precise as the sample size $n$ increases.
4. The variance of the intercept estimator increases when the values of the regressor $X$ lie far away from the origin 0.
5. $\hat{\beta}_0$ and $\hat{\beta}_1$ are negatively (positively) correlated when $\bar{X}$ is positive (negative), because the regression line must pass through the point of sample means $(\bar{X}, \bar{Y})$.
6. The variances of the coefficient estimators are unknown in practice because they involve $\sigma_u^2$, which is unknown. To compute the variances, we need an estimator for $\sigma_u^2$.

Least squares estimator of $\sigma_u^2$

The variances of the least squares coefficient estimators and their covariance cannot be computed from the data alone because $\sigma_u^2$ is unknown. How do we estimate it?
- $\sigma_u^2$ is the variance of the error term: $\sigma_u^2 = \operatorname{var}(u_i) = E[u_i - E(u_i)]^2 = E[u_i^2]$.
- The expected value is the theoretical counterpart of a sample mean.
- If we had observed data on the $u_i$, we could estimate the variance by the sample mean $\sum u_i^2 / n$.
- Since we do not observe the error terms, we use their estimates, the residuals $\hat{u}_i$.
- A problem is that not all $\hat{u}_i$ can take independent values: for example, $\sum \hat{u}_i = 0$. This means that if we have the values of the first $n-1$ residuals, the last one is automatically determined. This is called the loss of degrees of freedom.
- The number of degrees of freedom lost is equal to the number of parameters we estimate.
- In the simple linear regression model, it is two.
- When we add up all the squared residuals, we are actually adding $n-2$ independent values.
- The degrees of freedom is therefore $n-2$.
- We therefore estimate the variance by averaging over the $n-2$ independent residuals:

$$\hat{\sigma}_u^2 = \frac{\sum \hat{u}_i^2}{n-2}$$

The estimated variances and covariance of the coefficient estimators are obtained by replacing the unknown $\sigma_u^2$ with its unbiased estimate $\hat{\sigma}_u^2$:

$$\hat{\sigma}^2_{\hat{\beta}_0} = \hat{\sigma}_u^2 Q_0, \qquad \hat{\sigma}^2_{\hat{\beta}_1} = \hat{\sigma}_u^2 Q_1, \qquad \hat{\sigma}_{\hat{\beta}_0, \hat{\beta}_1} = -\bar{X} \hat{\sigma}^2_{\hat{\beta}_1}.$$
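As a worked illustration of the degrees-of-freedom correction (simulated data; the parameter values are hypothetical), the following sketch computes $\hat{\sigma}_u^2$ with the $n-2$ divisor and the resulting standard errors of the coefficient estimators:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
X = rng.uniform(0.0, 10.0, n)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.5, n)

# OLS fit and residuals
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - b0 - b1 * X

sigma2_hat = np.sum(resid ** 2) / (n - 2)   # divide by n - 2, not n
Q1 = 1.0 / np.sum((X - X.mean()) ** 2)
Q0 = np.mean(X ** 2) * Q1

print(b0, np.sqrt(sigma2_hat * Q0))         # intercept and its standard error
print(b1, np.sqrt(sigma2_hat * Q1))         # slope and its standard error
```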
Goodness of Fit $R^2$

The idea of the $R^2$ measure of goodness of fit is to compare the SSR in the models with and without the information in the regressor and to measure the fraction of the reduction in the SSR:

$$R^2 = \frac{SSR_r - SSR_u}{SSR_r} = 1 - \frac{SSR_u}{SSR_r},$$

where $SSR_u = \sum \hat{u}_i^2$ is the SSR of the model with the regressor and $SSR_r = \sum (Y_i - \bar{Y})^2$ is the SSR of the intercept-only model.

Another idea is to see how close the predicted values $\hat{Y}_i$ are to the observed $Y_i$. The closeness is measured by the sample correlation coefficient, or its squared value:

$$\operatorname{corr}(Y_i, \hat{Y}_i) = \frac{\operatorname{cov}(Y_i, \hat{Y}_i)}{SD(Y_i)\, SD(\hat{Y}_i)}, \qquad R^2 = [\operatorname{corr}(Y_i, \hat{Y}_i)]^2 = \frac{[\operatorname{cov}(Y_i, \hat{Y}_i)]^2}{\operatorname{var}(Y_i)\operatorname{var}(\hat{Y}_i)}.$$

To show that the last expression is the same as the previous expression for $R^2$, we first show, using $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$,

$$\sum \hat{u}_i = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = \sum (Y_i - \bar{Y}) - \hat{\beta}_1 \sum (X_i - \bar{X}) = 0 - 0 = 0.$$

By the same substitution and the definition of $\hat{\beta}_1$,

$$\sum X_i \hat{u}_i = \sum (X_i - \bar{X})(Y_i - \bar{Y}) - \hat{\beta}_1 \sum (X_i - \bar{X})^2 = 0.$$

Noting $Y_i = \hat{Y}_i + \hat{u}_i$ and $\bar{Y} = \bar{\hat{Y}} + \bar{\hat{u}} = \bar{\hat{Y}}$, it is easy to show

$$\operatorname{cov}(Y_i, \hat{Y}_i) = \frac{1}{n} \sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y}) = \frac{1}{n} \sum (\hat{Y}_i + \hat{u}_i - \bar{Y})(\hat{Y}_i - \bar{Y}) = \frac{1}{n} \sum (\hat{Y}_i - \bar{Y})^2,$$

where the last equality is due to $\sum \hat{u}_i (\hat{Y}_i - \bar{Y}) = 0$, which follows from $\sum \hat{u}_i = 0$ and $\sum X_i \hat{u}_i = 0$. This shows $\operatorname{cov}(Y_i, \hat{Y}_i) = \operatorname{var}(\hat{Y}_i)$, so

$$R^2 = \frac{[\operatorname{cov}(Y_i, \hat{Y}_i)]^2}{\operatorname{var}(Y_i)\operatorname{var}(\hat{Y}_i)} = \frac{\operatorname{var}(\hat{Y}_i)}{\operatorname{var}(Y_i)}.$$

Note that $\operatorname{var}(Y_i) = \frac{1}{n} \sum (Y_i - \bar{Y})^2 = \frac{SSR_r}{n}$.

Now we show $\operatorname{var}(\hat{Y}_i) = \frac{SSR_r - SSR_u}{n}$. Expanding,

$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 + 2 \sum (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) = \sum \hat{u}_i^2 + \sum (\hat{Y}_i - \bar{Y})^2 + 2 \sum \hat{u}_i (\hat{Y}_i - \bar{Y}).$$

The last term is zero:

$$\sum \hat{u}_i (\hat{Y}_i - \bar{Y}) = \sum \hat{u}_i (\hat{\beta}_0 + \hat{\beta}_1 X_i - \bar{Y}) = 0,$$

because $\sum \hat{u}_i (\hat{\beta}_0 - \bar{Y}) = (\hat{\beta}_0 - \bar{Y}) \sum \hat{u}_i = 0$ and $\sum \hat{u}_i X_i = 0$, as shown before. Hence

$$\sum (\hat{Y}_i - \bar{Y})^2 = \sum (Y_i - \bar{Y})^2 - \sum \hat{u}_i^2 = SSR_r - SSR_u.$$

Putting all these results together, we have

$$R^2 = \frac{[\operatorname{cov}(Y_i, \hat{Y}_i)]^2}{\operatorname{var}(Y_i)\operatorname{var}(\hat{Y}_i)} = \frac{\operatorname{var}(\hat{Y}_i)}{\operatorname{var}(Y_i)} = \frac{SSR_r - SSR_u}{SSR_r} = 1 - \frac{SSR_u}{SSR_r}.$$
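A quick numerical confirmation of this equivalence (again with simulated data; the three computed values should agree up to rounding):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = rng.uniform(0.0, 10.0, n)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 2.0, n)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X
resid = Y - Yhat

SSR_u = np.sum(resid ** 2)            # unrestricted SSR (model with regressor)
SSR_r = np.sum((Y - Y.mean()) ** 2)   # restricted SSR (intercept-only model)

r2_ssr = 1.0 - SSR_u / SSR_r
r2_corr = np.corrcoef(Y, Yhat)[0, 1] ** 2
r2_var = Yhat.var() / Y.var()
print(r2_ssr, r2_corr, r2_var)        # all three agree
```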