Econometrics - 30C00200 Lecture 11: Heteroskedasticity Antti Saastamoinen VATT Institute for Economic Research Fall 2015 30C00200 Lecture 11: Heteroskedasticity 12.10.2015 Aalto University School of Business 1 / 31
Outline
What is heteroskedasticity?
Why worry about heteroskedasticity?
Robust standard errors and testing parameters
The efficiency of the estimator
Testing for heteroskedasticity
Reading: Wooldridge, Chapter 8
Some notational remarks
My notation here differs slightly from the one used by Professor Kuosmanen. Most importantly, I denote the error term by u instead of ε. This notation follows the book by Wooldridge.
In these slides I also refer to some Stata data files that have been used in previous courses. The files can be obtained from me on request.
What is heteroskedasticity?
In the context of regression analysis:
Homoskedasticity implies that, conditional on the explanatory variables, the variance of the error u is constant (it does not depend on x): $\mathrm{Var}(u \mid x) = \text{constant}$.
If the variance of u is not constant across observations, i.e. it differs with the values of the x's, then the errors are heteroskedastic: $\mathrm{Var}(u \mid x) = f(x)$.
Any random variable whose variance is a function of some other variable is heteroskedastic!
Example of heteroskedasticity
[Figure not reproduced.]
Example: A data sample with heteroskedasticity
[Figure not reproduced.]
Examples of heteroskedasticity
Why do you think that food expenditures could be heteroskedastic with respect to income?
How about firm sizes and the growth rates of firms?
Empirical example: Wages heteroskedastic with respect to education?
Using the WAGE1.dta dataset.
[Figure not reproduced.]
Functional form misspecification as heteroskedasticity
Functional form misspecification can also lead to heteroskedasticity.
[Figure: average hourly earnings plotted against years of education, with fitted values.]
The properties of OLS revisited
Recall that homoskedasticity was one of the Gauss-Markov assumptions of OLS.
More specifically, we have assumed that the errors are i.i.d.
Heteroskedasticity violates one of these two i's, namely the "identically distributed" part.
What are the implications of this violation?
Why worry about heteroskedasticity?
OLS is still unbiased and consistent, even if we do not assume homoskedasticity. Recall that unbiasedness relies on exogeneity, not on homoskedasticity!
However, there are two main problems with the usual OLS estimation under heteroskedasticity:
1. The usual standard errors of the parameter estimates are not correct.
2. OLS is not necessarily the best (lowest variance) unbiased estimator. A more efficient estimator could be obtained.
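A small Monte Carlo experiment can make the two claims on this slide concrete. The sketch below is only an illustration under assumed settings that are not from the lecture: the data-generating process, the sample size, the error design sd(u | x) = x^2 and the function name `simulate_ols_slope` are all hypothetical. It records the OLS slope and the usual standard error over many simulated samples, so that the average slope can be compared with the true value and the average usual standard error with the actual spread of the estimates.

```python
import numpy as np

def simulate_ols_slope(n=500, reps=2000, seed=0):
    """Repeatedly draw heteroskedastic samples and record the OLS slope
    together with the usual (homoskedasticity-based) standard error."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    usual_se = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(1.0, 5.0, size=n)
        u = rng.normal(size=n) * x**2            # assumed design: sd(u | x) = x^2
        y = 1.0 + 2.0 * x + u                    # assumed true slope: 2
        xd = x - x.mean()
        sst_x = np.sum(xd**2)
        b1 = np.sum(xd * y) / sst_x              # OLS slope
        b0 = y.mean() - b1 * x.mean()            # OLS intercept
        resid = y - b0 - b1 * x
        sigma2_hat = np.sum(resid**2) / (n - 2)  # SSR / (n - k - 1) with k = 1
        slopes[r] = b1
        usual_se[r] = np.sqrt(sigma2_hat / sst_x)
    return slopes, usual_se

slopes, usual_se = simulate_ols_slope()
print("mean of slope estimates:", slopes.mean())
print("sd of slope estimates:  ", slopes.std(ddof=1))
print("mean of usual SEs:      ", usual_se.mean())
```

In this particular design the error variance is largest where x is far from its mean, so the usual standard error tends to understate the true sampling variability; with other designs it could just as well overstate it.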
Robust standard errors
The standard errors of the estimates are biased if we have heteroskedasticity.
If the standard errors are biased, we cannot use the usual t statistics or F statistics for drawing inferences.
Thus we derive an alternative estimator of the standard errors which is robust to heteroskedasticity.
Note, however, that we are still using the OLS estimator to estimate the parameters; it is only the standard errors that we correct.
Variance of OLS with heteroskedasticity
Consider the simple regression model. Recall that the OLS estimator can be written as
$$\hat\beta_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar x)\, u_i}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta_1 + \frac{\mathrm{Cov}(x, u)}{\mathrm{Var}(x)},$$
with Cov and Var understood as sample moments. Therefore, the variance of the OLS estimator is
$$\mathrm{Var}(\hat\beta_1) = \frac{\sum_{i=1}^n (x_i - \bar x)^2 \sigma_i^2}{\left[\sum_{i=1}^n (x_i - \bar x)^2\right]^2} = \frac{\sum_{i=1}^n (x_i - \bar x)^2 \sigma_i^2}{(SST_x)^2},$$
where $\sigma_i^2 = \mathrm{Var}(u_i \mid x_i)$.
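The step from the expression for the estimator to its variance can be spelled out as follows. This is a sketch that conditions on the x values and assumes the errors are independent across observations (so that all covariance terms between u_i and u_j vanish); the constant β_1 does not contribute to the variance.

```latex
\begin{aligned}
\mathrm{Var}(\hat\beta_1 \mid x)
  &= \mathrm{Var}\!\left( \frac{\sum_{i=1}^n (x_i - \bar x)\, u_i}{SST_x} \,\middle|\, x \right) \\
  &= \frac{1}{(SST_x)^2} \sum_{i=1}^n (x_i - \bar x)^2 \,\mathrm{Var}(u_i \mid x_i) \\
  &= \frac{\sum_{i=1}^n (x_i - \bar x)^2 \sigma_i^2}{(SST_x)^2}.
\end{aligned}
```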
Homoskedastic vs. heteroskedastic variance formulas
$$\mathrm{Var}(\hat\beta_1) = \frac{\sum_{i=1}^n (x_i - \bar x)^2 \sigma_i^2}{\left[\sum_{i=1}^n (x_i - \bar x)^2\right]^2} = \frac{\sum_{i=1}^n (x_i - \bar x)^2 \sigma_i^2}{(SST_x)^2}$$
If we now set $\mathrm{Var}(u_i \mid x_i) = \sigma^2$, the above formula becomes
$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2 \sum_{i=1}^n (x_i - \bar x)^2}{(SST_x)^2} = \frac{\sigma^2\, SST_x}{(SST_x)^2} = \frac{\sigma^2}{SST_x},$$
which is exactly the variance formula in the homoskedastic case.
OLS variance under heteroskedasticity 1/3
Under homoskedasticity, we estimate $\sigma^2$ using SSR/(n-k-1). But how do we estimate the error variance under heteroskedasticity, i.e. estimate $\sigma_i^2$?
Halbert White (1980) showed how to obtain a consistent estimator of the variance of the OLS estimator under heteroskedasticity.
In the expression for $\mathrm{Var}(\hat\beta_1)$, we replace the unknown $\sigma_i^2$ by the squared residual of observation i:
$$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\sum_{i=1}^n (x_i - \bar x)^2\, \hat u_i^2}{\left[\sum_{i=1}^n (x_i - \bar x)^2\right]^2}.$$
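The estimator on this slide is short enough to compute by hand. The following is a minimal sketch for the simple regression model; the function name `simple_ols_variances` and the simulated data are assumptions made for illustration, not part of the lecture material. It returns the slope, White's variance estimate, and the usual homoskedasticity-based estimate for comparison.

```python
import numpy as np

def simple_ols_variances(x, y):
    """Slope of a simple regression with White's and the usual variance estimate."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xd = x - x.mean()
    sst_x = np.sum(xd**2)
    b1 = np.sum(xd * y) / sst_x
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    # White: replace sigma_i^2 by the squared residual of observation i
    var_white = np.sum(xd**2 * resid**2) / sst_x**2
    # Usual formula: sigma^2 / SST_x with sigma^2 estimated by SSR / (n - k - 1)
    var_usual = (np.sum(resid**2) / (n - 2)) / sst_x
    return b1, var_white, var_usual

# Hypothetical heteroskedastic sample, for illustration only
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 5.0, size=1000)
y = 1.0 + 2.0 * x + rng.normal(size=1000) * x**2
b1, var_white, var_usual = simple_ols_variances(x, y)
print("slope:", b1)
print("White SE:", np.sqrt(var_white), "usual SE:", np.sqrt(var_usual))
```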
OLS variance under heteroskedasticity 2/3
The presented variance estimator is a consistent estimator of the OLS variance, i.e.
$$\operatorname*{plim}_{n \to \infty} \widehat{\mathrm{Var}}(\hat\beta_1) = \mathrm{Var}(\hat\beta_1).$$
Consequently, in large samples the heteroskedasticity correction works well.
OLS variance under heteroskedasticity 3/3
White's estimator can easily be generalized to the multiple regression model. This can be done in two steps:
1. Use $\hat u_i^2$ to estimate $\sigma_i^2$ and $\mathrm{Var}(u \mid X) = \Omega$, which has the variances on the diagonal.
2. Use the estimated variance-covariance matrix $\hat\Omega$ to estimate $\mathrm{Var}(\hat\beta \mid X)$ with the following formula:
$$\widehat{\mathrm{Var}}(\hat\beta \mid X) = (X'X)^{-1} X' \hat\Omega X (X'X)^{-1}.$$
What would happen above if $\Omega$ were a scalar matrix with a constant $\sigma^2$ on its diagonal?
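The two steps above can be sketched in matrix form with NumPy. The helper names `sandwich` and `ols_robust`, the simulated design and the value `sigma2 = 3.0` are hypothetical choices for this illustration. The last lines also probe the question posed on the slide: with a scalar matrix for Ω, the sandwich formula collapses to the usual σ²(X'X)⁻¹.

```python
import numpy as np

def sandwich(X, omega_diag):
    """(X'X)^{-1} X' Omega X (X'X)^{-1} for a diagonal Omega given by its diagonal."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * omega_diag[:, None])   # X' Omega X without forming the n x n matrix
    return XtX_inv @ meat @ XtX_inv

def ols_robust(X, y):
    """OLS coefficients with heteroskedasticity-robust (White) standard errors."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    V = sandwich(X, resid**2)                # step 1 and step 2 of the slide
    return beta, np.sqrt(np.diag(V))

# Hypothetical data: a constant and two regressors, error sd depends on the second one
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n) * (1.0 + X[:, 2])
beta_hat, robust_se = ols_robust(X, y)
print("coefficients:", beta_hat)
print("robust SEs:  ", robust_se)

# Scalar Omega = sigma^2 * I: compare with sigma^2 (X'X)^{-1}
sigma2 = 3.0
V_scalar = sandwich(X, np.full(n, sigma2))
print("collapses to usual formula:", np.allclose(V_scalar, sigma2 * np.linalg.inv(X.T @ X)))
```

This is the basic form of the estimator without any small-sample adjustment; statistical packages may apply a degrees-of-freedom correction by default, so their numbers can differ slightly.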
Robust standard errors
Now that we have a consistent estimate of the variance, the square root of the estimated variance can be used as a standard error for inference, i.e.
$$\mathrm{se}(\hat\beta_j) = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_j)} = \sqrt{\left[\widehat{\mathrm{Var}}(\hat\beta \mid X)\right]_{jj}}.$$
These are standard errors that are robust to heteroskedasticity. They are often called heteroskedasticity-consistent standard errors. Why?
You may also hear names such as White, Huber-White, or robust standard errors associated with the above estimator.
Robust t statistics
Given robust standard errors, we can construct heteroskedasticity-robust t statistics.
The robust t statistic, however, has a t distribution only asymptotically (it works well in large samples).
Under homoskedasticity, the usual standard errors would be better.
In Stata, robust standard errors are easily obtained using the robust option of reg.
In Excel, there is no direct way to obtain them!
Besides the robust t statistic, a robust version of the F statistic can be obtained as well.
Stata example 1/2
Let us estimate the following wage equation using usual and robust standard errors:
$$wage = \beta_0 + \delta_0\, female + \beta_1\, educ + \beta_2\, exper + u.$$
Stata output (with usual standard errors): [output not reproduced]
Stata example 2/2
Stata output (with heteroskedasticity-robust standard errors): [output not reproduced]
Residuals versus fitted plot
One diagnostic tool for examining heteroskedasticity is the so-called residuals-versus-fitted plot.
[Figure not reproduced.]
Robust SEs and heteroskedasticity testing
It is standard practice to use heteroskedasticity-robust inference methods in larger samples. Recall that the robust standard errors have only a large-sample justification.
There are multiple tests for detecting possible heteroskedasticity, such as the Breusch-Pagan, White and Goldfeld-Quandt tests.
The Breusch-Pagan and White tests are based on testing whether the x variables significantly explain the squared residuals $\hat u_i^2$.
White (1980) test 1/2
$H_0$: homoskedastic errors. Consider for example the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$.
1. Obtain $\hat u^2$ from the above model.
2. Run the following regression: $\hat u^2$ on $x_1, x_2, x_1^2, x_2^2, x_1 x_2$.
3. Obtain the test statistic as $W = n R_u^2$, where n is the sample size and $R_u^2$ is the R-squared from the above regression.
Under $H_0$, this statistic is asymptotically distributed as $W \sim \chi^2(k)$, where k is the number of regressors (excluding the constant) in the above auxiliary regression.
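The three steps of the test can be sketched as follows for the two-regressor model on this slide. The function names, the simulated samples and the seed are assumptions made for illustration. To stay within NumPy, the sketch compares W with the 5% critical value of the χ²(5) distribution, hard-coded as 11.07, instead of computing a p-value.

```python
import numpy as np

CHI2_5DF_CRIT_5PCT = 11.07   # 5% critical value of chi-square with 5 degrees of freedom

def r_squared(Z, v):
    """R-squared from regressing v on Z (Z must contain a constant column)."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    resid = v - Z @ coef
    return 1.0 - (resid @ resid) / np.sum((v - v.mean()) ** 2)

def white_statistic(x1, x2, y):
    """W = n * R^2 from the auxiliary regression of squared residuals."""
    n = y.size
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                                          # step 1
    Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])  # step 2
    return n * r_squared(Z, u2)                                       # step 3, k = 5

# Hypothetical samples: one homoskedastic, one with sd(u | x) = x1
rng = np.random.default_rng(3)
n = 1000
x1 = rng.uniform(1.0, 5.0, size=n)
x2 = rng.normal(size=n)
eps = rng.normal(size=n)
y_hom = 1.0 + 0.5 * x1 - x2 + eps
y_het = 1.0 + 0.5 * x1 - x2 + eps * x1

W_hom = white_statistic(x1, x2, y_hom)
W_het = white_statistic(x1, x2, y_het)
print("W (homoskedastic sample):  ", W_hom, "reject H0:", W_hom > CHI2_5DF_CRIT_5PCT)
print("W (heteroskedastic sample):", W_het, "reject H0:", W_het > CHI2_5DF_CRIT_5PCT)
```

Even when the errors are homoskedastic, the test rejects in roughly 5% of samples at this level, so a single rejection in the first sample would not indicate a coding error.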
White (1980) test 2/2
The test detects very general forms of heteroskedasticity, but it has low power and works well only in large samples.
Recall that the power of a test is the probability that it correctly rejects a false null hypothesis.
More conservative tests, which explicitly define a form of heteroskedasticity (under the alternative hypothesis) or exclude terms from the Step 2 regression of the White test, can be used instead.
Often, functional form misspecification is detected as heteroskedasticity by the White test (and also by some other tests).
Improving the efficiency of the estimator 1/3
Recall that under heteroskedasticity OLS is no longer the most efficient estimator.
Generalized Least Squares (GLS) estimators can be used instead.
In this setting GLS is also known as the Weighted Least Squares (WLS) estimator, since it weights the observations according to their variability.
Let us assume the following (multiplicative) form of heteroskedasticity:
$$\mathrm{Var}(u \mid x) = \sigma^2 h(x)$$
and that the function h(x) is known, i.e. we know the values $h_i$:
$$\mathrm{Var}(u_i \mid x_i) = \sigma^2 h_i$$
Improving the efficiency of the estimator 2/3
Given the form of heteroskedasticity:
$$E\left[\left(u_i/\sqrt{h_i}\right)^2 \,\middle|\, x_i\right] = \mathrm{Var}\left(u_i/\sqrt{h_i} \,\middle|\, x_i\right) = \left(\frac{1}{\sqrt{h_i}}\right)^2 \mathrm{Var}(u_i \mid x_i) = \sigma^2 h_i h_i^{-1} = \sigma^2$$
This suggests that we can estimate the following model (with OLS!) to improve the efficiency:
$$\left(\frac{1}{\sqrt{h_i}}\right) y_i = \beta_0 \left(\frac{1}{\sqrt{h_i}}\right) + \beta_1 \left(\frac{x_{i1}}{\sqrt{h_i}}\right) + \left(\frac{1}{\sqrt{h_i}}\right) u_i$$
The parameters should still be interpreted in terms of the original model.
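The transformed regression on this slide can be run directly with ordinary least squares. The sketch below assumes a known variance function h; the function name `wls_known_h`, the choice h_i = x_i^2 and the simulated sample are hypothetical and serve only as an illustration. Note that the transformed model has no separate constant: the original intercept becomes the coefficient on 1/sqrt(h_i).

```python
import numpy as np

def wls_known_h(x, y, h):
    """WLS for y = b0 + b1 * x + u with Var(u | x) = sigma^2 * h, h known."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.sqrt(np.asarray(h, dtype=float))
    X_star = np.column_stack([w, w * x])   # transformed "intercept" and regressor
    y_star = w * y                          # transformed dependent variable
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta                             # (b0, b1) in terms of the original model

# Hypothetical sample with h_i = x_i^2 and sigma^2 = 1
rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(1.0, 5.0, size=n)
h = x**2
y = 1.0 + 2.0 * x + np.sqrt(h) * rng.normal(size=n)
beta_wls = wls_known_h(x, y, h)
print("WLS estimates (b0, b1):", beta_wls)
```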
Improving the efficiency of the estimator 3/3
Of course, in practice the function h(x) needs to be estimated. This leads to Feasible GLS (FGLS) estimators.
FGLS estimators utilize the residuals of the original model to estimate a model of the heteroskedasticity.
A functional form for the heteroskedasticity has to be assumed, such as $\mathrm{Var}(u \mid x) = \sigma^2 \exp(x\delta)$.
Obviously this form can also be wrong! Thus improved efficiency compared to standard OLS is not guaranteed in theory.
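One common way to make this operational with the exponential form is to regress the log of the squared OLS residuals on the regressors, exponentiate the fitted values to obtain estimates of h_i, and then run WLS with those estimates. The sketch below follows that recipe; it is an illustration under assumptions, with hypothetical function names and simulated data, and it takes the exponential form to be correct.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fgls_exponential(X, y):
    """FGLS assuming Var(u | x) = sigma^2 * exp(x * delta); X includes a constant."""
    resid = y - X @ ols(X, y)               # residuals of the original model
    delta = ols(X, np.log(resid**2))        # model of heteroskedasticity in logs
    h_hat = np.exp(X @ delta)               # estimated variance function (up to scale)
    w = 1.0 / np.sqrt(h_hat)
    beta_fgls = ols(X * w[:, None], y * w)  # WLS with estimated weights
    return beta_fgls, h_hat

# Hypothetical sample where the assumed form holds: Var(u | x) = exp(x)
rng = np.random.default_rng(5)
n = 2000
x = rng.uniform(0.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + np.exp(x / 2.0) * rng.normal(size=n)
beta_fgls, h_hat = fgls_exponential(X, y)
print("FGLS estimates (b0, b1):", beta_fgls)
```

The constant in the log regression absorbs the scale of the variance, which does not matter for the weights. If the assumed form is wrong, the estimates remain usable but the efficiency gain over OLS is no longer assured, as noted on the slide.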
Logarithms as a way to reduce heteroskedasticity
Often we can reduce heteroskedasticity in our data by taking logarithms.
Using the WAGE2.dta dataset, where wage is now monthly earnings.
[Figure not reproduced.]
Concluding remarks 1/2
Remember that the OLS estimator of the coefficient(s) is still unbiased and consistent under heteroskedasticity.
Heteroskedasticity, however, invalidates the basic statistical inference. Moreover, OLS is no longer BLUE under heteroskedasticity.
Remedies for heteroskedasticity include:
1. Obtain the robust standard errors.
2. Use an alternative estimator (GLS).
3. Data transformations.
Generally, some form of safeguard against heteroskedasticity is recommended whenever the problem is suspected to exist.
Concluding remarks 2/2
Is heteroskedasticity only an econometric problem? No!
Heteroskedasticity is also a phenomenon with some economic and practical meaning.
Variance is a common measure of risk. For example, in financial time series modelling heteroskedasticity is used as a tool to model time-varying volatility: Autoregressive Conditional Heteroscedasticity (ARCH) models.
See also for example:
Saastamoinen, A. (2015). Heteroscedasticity or Production Risk? A Synthetic View. Journal of Economic Surveys, 29(3): 459-478. http://onlinelibrary.wiley.com/doi/10.1111/joes.12054/full