Multiple Regression Analysis: Heteroskedasticity

y = β0 + β1x1 + β2x2 + ... + βkxk + u

Read Chapter 8.

EE45 - Chaiyuth Punyasavatsut
Topics
8.1 Heteroskedasticity and OLS
8.2 Robust estimation
8.3 Testing for heteroskedasticity
8.4 Weighted least squares estimation
8.5 LPM revisited
8.1 What is Heteroskedasticity?
Recall that the assumption of homoskedasticity implies that, conditional on the explanatory variables, the variance of the unobserved error u is constant.
If this is not true, that is, if the variance of u differs across values of the x's, then the errors are heteroskedastic.
Example: in estimating returns to education, ability is unobservable, and we might think the variance in ability differs by educational attainment.
8.1 Why Worry About Heteroskedasticity?
OLS is still unbiased and consistent even if we do not assume homoskedasticity (MLR.1-MLR.4 are enough).
R² and adjusted R² are unaffected.
But the usual standard errors of the estimates are biased if we have heteroskedasticity. If the standard errors are biased, we cannot use the usual t, F, or LM statistics for drawing inferences.
OLS is no longer BLUE under heteroskedasticity, which means we can find more efficient estimators.
8.2 Heteroskedasticity-Robust Inference
The idea is to adjust standard errors and t, F, and LM statistics so that they are valid in the presence of heteroskedasticity of unknown form. Thus OLS is still useful.
Let us begin with how the variance Var(β̂1) can be estimated under heteroskedasticity. Consider the simple regression
yi = β0 + β1xi + ui,  Var(ui | xi) = σi².
8.2 Variance with Heteroskedasticity
For OLS in the simple regression,
β̂1 = β1 + Σi (xi − x̄)ui / SSTx, where SSTx = Σi (xi − x̄)², so
Var(β̂1) = Σi (xi − x̄)² σi² / SSTx².
8.2 Variance with Heteroskedasticity
If σi² = σ² for all i, this reduces to Var(β̂1) = σ²/SSTx (see eq. (2.57)), so we cannot use that formula when heteroskedasticity is present.
8.2 Variance with Heteroskedasticity
When σi² ≠ σ², a valid estimator of Var(β̂1) is Σi (xi − x̄)² ûi² / SSTx², where the ûi are the OLS residuals.
We can compute this after the OLS regression.
8.2 Variance with Heteroskedasticity
For the multiple regression model, a valid estimator of Var(β̂j) under heteroskedasticity is
Var(β̂j) = Σi r̂ij² ûi² / SSRj²,
where r̂ij is the i-th residual from regressing xj on all the other independent variables, and SSRj is the sum of squared residuals from that regression.
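As an illustration, here is a minimal simulation sketch (the data-generating process and variable names are invented for the example, not from the text) showing that the per-coefficient formula above coincides with the usual matrix "sandwich" form of the heteroskedasticity-robust variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n) * (1 + np.abs(x1))   # heteroskedastic: Var(u|x) grows with |x1|
y = 1.0 + 0.5 * x1 - 0.3 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta_hat

# matrix form: (X'X)^-1 X' diag(uhat^2) X (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
V = XtX_inv @ (X.T @ (X * uhat[:, None] ** 2)) @ XtX_inv
robust_se = np.sqrt(np.diag(V))

# per-coefficient form from the slide: sum(r1^2 * uhat^2) / SSR1^2,
# where r1 = residuals from regressing x1 on the other regressors
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
var_b1 = np.sum(r1 ** 2 * uhat ** 2) / np.sum(r1 ** 2) ** 2

print(robust_se[1], np.sqrt(var_b1))        # the two should coincide
```

Multiplying the estimated variance by n/(n − k − 1) would give the degrees-of-freedom-corrected version sometimes reported by software.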
8.2 Variance with Heteroskedasticity: Robust Standard Errors
Now that we have a consistent estimator of the variance, its square root can be used as a standard error for inference, called the heteroskedasticity-robust standard error.
Sometimes the estimated variance is corrected for degrees of freedom by multiplying by n/(n − k − 1). As n → ∞ it is all the same, though.
The robust standard errors allow us to compute t statistics that are asymptotically t-distributed whether or not heteroskedasticity is present.
8.2 Robust Standard Errors (cont)
It is important to remember that these robust standard errors have only asymptotic justification. With small sample sizes, t statistics formed with robust standard errors will not have a distribution close to the t distribution, and inferences will not be correct.
That is why we like the usual t statistic under the homoskedasticity assumption: it has an exact t distribution.
In Stata, robust standard errors are easily obtained using the robust option of reg.
8.2 Robust Standard Errors (cont)
Note that the robust standard errors can be either larger or smaller than the usual standard errors. In empirical work, we often find that the robust standard errors are larger.
Similarly, we can compute heteroskedasticity-robust F or LM statistics.
8.2 Robust Standard Errors (cont): A Robust LM Statistic
Consider the model
y = β0 + β1x1 + β2x2 + ... + β5x5 + u,  H0: β4 = 0, β5 = 0.
To obtain the usual LM statistic, first we estimate the restricted model to obtain the residuals ŭ. Then we regress ŭ on all the x's, and LM = n·R²ŭ.
8.2 Robust Standard Errors (cont): A Robust LM Statistic
For the robust LM:
1. Run OLS on the restricted model and save the residuals ŭ.
2. Regress each of the excluded variables on all of the included variables (q different regressions) and save each set of residuals ř1, ř2, ..., řq.
3. Regress 1 on ř1ŭ, ř2ŭ, ..., řqŭ, with no intercept.
4. The robust LM statistic is n − SSR1, where SSR1 is the sum of squared residuals from the regression in step 3.
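The four steps above can be sketched in a small simulation (the data-generating process is invented for illustration, with q = 2 excluded variables):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 400, 2
X_incl = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # restricted model: const, x1, x2, x3
X_excl = rng.normal(size=(n, q))                                 # x4, x5: the excluded (tested) variables
y = X_incl @ np.array([1.0, 0.5, -0.2, 0.3]) + rng.normal(size=n)

def resid(X, z):
    """OLS residuals from regressing z on the columns of X."""
    return z - X @ np.linalg.lstsq(X, z, rcond=None)[0]

u_breve = resid(X_incl, y)                                       # step 1
R = np.column_stack([resid(X_incl, X_excl[:, j]) for j in range(q)])  # step 2
prods = R * u_breve[:, None]
ssr1 = np.sum(resid(prods, np.ones(n)) ** 2)                     # step 3: regress 1 on r_j*u, no intercept
lm_robust = n - ssr1                                             # step 4: ~ chi2(q) under H0
print(lm_robust)
```

Since the excluded variables play no role in this data-generating process, the statistic would typically fall below the χ²(2) critical value.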
8.3 Testing for Heteroskedasticity
Why test? To decide whether to use the robust standard errors or the usual ones. Also, if heteroskedasticity is present, OLS is not BLUE; if the form of the heteroskedasticity is known, we can find a better estimator than OLS: WLS.
Consider the model y = β0 + β1x1 + β2x2 + ... + βkxk + u, assuming MLR.1-MLR.4 hold.
Essentially, we want to test H0: Var(u | x1, x2, ..., xk) = σ², which is equivalent to H0: E(u² | x1, x2, ..., xk) = E(u²) = σ².
8.3 Testing for Heteroskedasticity
This tests whether MLR.5 is true. If the data violate this null hypothesis, we can find some relationship between u² and the xj.
A simple approach is to assume a linear function:
u² = δ0 + δ1x1 + ... + δkxk + v.
The null hypothesis of homoskedasticity is then H0: δ1 = δ2 = ... = δk = 0.
Remember that u is the error in the population model; it is not observed, so we have to estimate it.
8.3 Testing for Heteroskedasticity (cont): BP test
We can estimate u² with the squared residuals from the OLS regression. After regressing the squared residuals on all of the x's, we can use the R² to form an F or LM test.
The F statistic is just the reported F statistic for overall significance of that regression, F = [R²/k] / [(1 − R²)/(n − k − 1)], which is distributed F(k, n − k − 1).
The LM statistic is LM = nR², which is distributed χ²(k).
The LM test is also called the Breusch-Pagan test for heteroskedasticity (or BP test for short).
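A sketch of the BP test on simulated data (the data-generating process is invented; here the error variance depends on x1, so heteroskedasticity is truly present):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 300, 2
x = rng.normal(size=(n, k))
u = rng.normal(size=n) * np.exp(0.5 * x[:, 0])   # heteroskedastic errors
y = 1.0 + x @ np.array([0.8, -0.4]) + u

X = np.column_stack([np.ones(n), x])
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# auxiliary regression: squared residuals on all the x's
u2 = uhat ** 2
fit = X @ np.linalg.lstsq(X, u2, rcond=None)[0]
r2_aux = 1 - np.sum((u2 - fit) ** 2) / np.sum((u2 - u2.mean()) ** 2)

lm = n * r2_aux                                       # ~ chi2(k) under H0
f_stat = (r2_aux / k) / ((1 - r2_aux) / (n - k - 1))  # ~ F(k, n-k-1) under H0
print(lm, f_stat)
```

Comparing lm with a χ²(k) critical value (or f_stat with an F critical value) gives the test decision.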
8.3 Testing for Heteroskedasticity (cont): White test
The Breusch-Pagan test will detect any linear form of heteroskedasticity. The White test allows for nonlinearities by using the squares and cross products of all the x's.
It is still just an F or LM test of whether the xj, xj², and xjxh are jointly significant.
This can get unwieldy pretty quickly.
8.3 Testing for Heteroskedasticity (cont): White test
For k = 3, we have 9 terms in total (3 x's, 3 squares xj², and 3 cross products xjxh):
H0: δ1 = 0, ..., δ9 = 0.
We lose many degrees of freedom. We can conserve on df with White's idea of using the OLS fitted values.
8.3 Testing for Heteroskedasticity (cont): White test
Consider that the fitted values from OLS, ŷ, are a function of all the x's:
ŷi = β̂0 + β̂1xi1 + ... + β̂kxik.
Thus ŷ² will be a function of the squares and cross products, so ŷ and ŷ² can proxy for all of the xj, xj², and xjxh.
8.3 Testing for Heteroskedasticity (cont): White test
We can then regress the squared residuals on ŷ and ŷ² and use the R² to form an F or LM statistic.
Note that we need to test only two restrictions now: δ1 = 0, δ2 = 0. This alternative method saves some df.
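The special form of the White test can be sketched the same way (simulated data; names and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=(n, 3))
u = rng.normal(size=n) * (1 + 0.5 * np.abs(x[:, 0]))  # heteroskedastic errors
y = 0.5 + x @ np.array([1.0, -0.5, 0.2]) + u

X = np.column_stack([np.ones(n), x])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - yhat

# regress squared residuals on yhat and yhat^2 (two restrictions)
W = np.column_stack([np.ones(n), yhat, yhat ** 2])
u2 = uhat ** 2
fit = W @ np.linalg.lstsq(W, u2, rcond=None)[0]
r2_aux = 1 - np.sum((u2 - fit) ** 2) / np.sum((u2 - u2.mean()) ** 2)

lm = n * r2_aux                                  # ~ chi2(2) under H0
f_stat = (r2_aux / 2) / ((1 - r2_aux) / (n - 3)) # ~ F(2, n-3) under H0
print(lm, f_stat)
```

Whatever k is, the auxiliary regression always has just two slope terms, which is the df saving.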
8.3 Testing for Heteroskedasticity (cont): White test
The LM statistic has a chi-squared distribution with k df, where k is the number of restrictions (k = 2 in the previous example).
If we use the F statistic, it has an F distribution with (k, n − number of estimated parameters) df, or (2, n − 3) in the previous example.
8.3 Testing for Heteroskedasticity (cont): White test
A heteroskedasticity problem can sometimes be viewed as a model misspecification problem.
In practice, we test the model specification first; once we are satisfied with the functional form, we then test for heteroskedasticity.
8.3 Testing for Heteroskedasticity: Example 8.4 (housing price equation)
price = −21.77 + .00207 lotsize + .123 sqrft + 13.85 bdrms
        (29.48)  (.00064)         (.013)       (9.01)
n = 88, R² = .672
If we regress the squared OLS residuals on the independent variables, we get R² = .16, with n = 88 and k = 3.
8.3 Testing for Heteroskedasticity: Example 8.4 (housing price equation)
The F test (formula 8.15) for heteroskedasticity:
F = (0.16/3) / [(1 − 0.16)/(88 − 3 − 1)] ≈ 5.34.
So we reject the null: there is a heteroskedasticity problem.
What if we take logs of all the variables?
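As an arithmetic check of formula 8.15 (assuming the unrounded auxiliary R² is about .1601, which reproduces the value reported above; .1601 is an assumption, as the slide only reports .16):

```python
# hypothetical check of the F statistic in Example 8.4
r2, k, n = 0.1601, 3, 88   # r2 = assumed unrounded value of the reported R-squared = .16
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f_stat, 2))    # → 5.34
```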
8.3 Testing for Heteroskedasticity: Example 8.5 (White test)
See program.