Part VII: Heteroskedasticity (as of Oct 15, 2015)
1 Heteroskedasticity
   Consequences
   Heteroskedasticity-robust inference
   Testing for Heteroskedasticity
   Weighted Least Squares (WLS)
   Feasible Generalized Least Squares (GLS)
Consider the regression

    y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + u_i.    (1)

Assumption 2 (classical assumptions) states that the error term u_i is homoskedastic, which means that the variance of u_i (conditional on the explanatory variables) is constant, i.e., var[u_i | x_i] = σ² (< ∞) for all i, where x_i = (x_i1, ..., x_ik). Violation of this assumption is called heteroskedasticity, in which case the variance var[u_i | x_i] = σ_i² varies (e.g. as a function of x_i).
Consequences
Consequences

In the presence of heteroskedasticity:

(i) OLS estimators are not BLUE.
(ii) The usual estimators of var[β̂_j] are biased, implying that t-, F-, and LM-statistics and confidence intervals are no longer reliable.
(iii) OLS estimators are no longer asymptotically efficient.

However,

(iv) OLS estimators are still unbiased.
(v) OLS estimators are still consistent.
Heteroskedasticity-robust inference
Heteroskedasticity-robust inference

Consider for the sake of simplicity

    y_i = β_0 + β_1 x_i + u_i,    i = 1, ..., n,    (2)

where

    var[u_i | x_i] = σ_i².    (3)

The OLS estimator of β_1 can then be written in the form

    β̂_1 = β_1 + Σ_{i=1}^n (x_i − x̄) u_i / Σ_{i=1}^n (x_i − x̄)².    (4)

Because the error terms are uncorrelated,

    var[β̂_1] = Σ_{i=1}^n (x_i − x̄)² σ_i² / (SST_x)²,    (5)

where

    SST_x = Σ_{i=1}^n (x_i − x̄)².    (6)
Heteroskedasticity-robust inference

In the homoskedastic case, where σ_i² = σ² for all i, formula (5) reduces to the usual variance σ_u² / Σ_{i=1}^n (x_i − x̄)². White (1980)² derives a robust estimator for (5):

    var[β̂_1] = Σ_{i=1}^n (x_i − x̄)² û_i² / (SST_x)²,    (7)

where û_i are the OLS residuals. Rewriting (1) in the matrix form

    y = Xβ + u,    (8)

the OLS estimator β̂ = (X'X)⁻¹X'y can be written as

    β̂ = β + (X'X)⁻¹X'u.    (9)

² White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
Heteroskedasticity-robust inference

Given X, the variance-covariance matrix of β̂ is

    cov[β̂] = (X'X)⁻¹ (Σ_{i=1}^n σ_i² x_i x_i') (X'X)⁻¹,    (10)

where x_i' = (1, x_i1, ..., x_ik) is the ith row of the data matrix X of the x-variables. Analogous to (7), an estimator of (10) is

    côv[β̂] = (X'X)⁻¹ (Σ_{i=1}^n û_i² x_i x_i') (X'X)⁻¹,    (11)

which is often adjusted by the factor n/(n − k − 1) (e.g. in EViews). The heteroskedasticity-robust standard error of the estimate β̂_j is the square root of the jth diagonal element of (11).
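The sandwich formula (11) takes only a few lines of NumPy. The sketch below uses simulated data purely for illustration (the variable names and the data-generating process are mine, not from the example above):

```python
import numpy as np

# White (1980) heteroskedasticity-consistent covariance, formula (11):
# (X'X)^{-1} (sum_i uhat_i^2 x_i x_i') (X'X)^{-1}, with the EViews-style
# degrees-of-freedom adjustment n/(n - k - 1).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])          # design matrix with intercept
u = rng.normal(0, 0.5 + 0.3 * x)              # heteroskedastic errors
y = 1.0 + 2.0 * x + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                  # OLS estimates
uhat = y - X @ beta_hat                       # OLS residuals

# "Meat" of the sandwich: sum_i uhat_i^2 x_i x_i'
meat = (X * uhat[:, None]**2).T @ X
cov_white = XtX_inv @ meat @ XtX_inv          # formula (11)
k = X.shape[1] - 1
cov_white_adj = n / (n - k - 1) * cov_white   # df adjustment

robust_se = np.sqrt(np.diag(cov_white_adj))   # robust standard errors
usual_se = np.sqrt(uhat @ uhat / (n - k - 1) * np.diag(XtX_inv))
```

With errors whose variance grows in x, the robust standard errors typically differ from the usual ones, while the coefficient estimates themselves are unchanged.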
Heteroskedasticity-robust inference

Remark 7.1: If the residual variances var[u_i] = σ_i² = σ_u² are all the same, then because X'X = Σ_{i=1}^n x_i x_i', (10) reduces to

    cov[β̂] = σ_u² (X'X)⁻¹ (Σ_{i=1}^n x_i x_i') (X'X)⁻¹ = σ_u² (X'X)⁻¹,

i.e., the usual case.
Heteroskedasticity-robust inference

Example 7.1: Wage example with heteroskedasticity-robust standard errors.

Dependent Variable: LOG(WAGE)
Method: Least Squares
Sample: 1 526
Included observations: 526
White Heteroskedasticity-Consistent Standard Errors & Covariance
================================================================
Variable      Coefficient   Std. Error   t-statistic    Prob.
----------------------------------------------------------------
C                0.321378     0.109469       2.936      0.0035
MARRMALE         0.212676     0.057142       3.722      0.0002
MARRFEM         -0.198268     0.058770      -3.374      0.0008
SINGFEM         -0.110350     0.057116      -1.932      0.0539
EDUC             0.078910     0.007415      10.642      0.0000
EXPER            0.026801     0.005139       5.215      0.0000
TENURE           0.029088     0.006941       4.191      0.0000
EXPER^2         -0.000535     0.000106      -5.033      0.0000
TENURE^2        -0.000533     0.000244      -2.188      0.0291
================================================================
R-squared            0.461   Mean dependent var       1.623
Adjusted R-squared   0.453   S.D. dependent var       0.532
S.E. of regression   0.393   Akaike info criterion    0.988
Sum squared resid   79.968   Schwarz criterion        1.061
Log likelihood    -250.955   F-statistic             55.246
Durbin-Watson stat   1.785   Prob(F-statistic)        0.000
================================================================
Heteroskedasticity-robust inference

Compared to Example 6.3, the standard errors change only slightly (usually a small increase), and the conclusions do not change.
Testing for Heteroskedasticity
Testing for Heteroskedasticity

Consider

    y = β_0 + β_1 x_1 + ... + β_k x_k + u.    (12)

Suppose the variance of u also depends on the x-variables as

    σ_i² = var[u_i | x_1, ..., x_k] = δ_0 + δ_1 x_1 + ... + δ_k x_k.    (13)

Then the homoskedasticity hypothesis is

    H_0: δ_1 = ... = δ_k = 0,    (14)

i.e., σ_i² = δ_0.
Testing for Heteroskedasticity

Writing v_i = u_i² − E[u_i² | x_1, ..., x_k] (note that var[u_i | x_1, ..., x_k] = E[u_i² | x_1, ..., x_k]), we can write (13) as

    u_i² = δ_0 + δ_1 x_1 + ... + δ_k x_k + v_i.    (15)

The error terms u_i are unobservable, so they must be replaced by the OLS residuals û_i.
Testing for Heteroskedasticity

Estimating the parameters with OLS, the null hypothesis (14) can be tested with the overall F-statistic defined in (4.25), which can be written in terms of the R-squared as

    F = (R²_{û²} / k) / ((1 − R²_{û²}) / (n − k − 1)),    (16)

where R²_{û²} is the R-squared of the regression

    û_i² = δ_0 + δ_1 x_1 + ... + δ_k x_k + v_i.    (17)

Under the null hypothesis the F-statistic is asymptotically F-distributed with k and n − k − 1 degrees of freedom.
Testing for Heteroskedasticity

Breusch-Pagan test: Asymptotically, (16) is equivalent to the Lagrange Multiplier (LM) test

    LM = n R²_{û²},    (18)

which is asymptotically χ²-distributed with k degrees of freedom when the null hypothesis is true.

Remark 7.2: The explanatory variables in regression (17) can also include external variables (not just the x-variables).
Testing for Heteroskedasticity

White test: Suppose, for the sake of simplicity, that in (1) k = 3. The White procedure is then to estimate

    û_i² = δ_0 + δ_1 x_1 + δ_2 x_2 + δ_3 x_3 + δ_4 x_1² + δ_5 x_2² + δ_6 x_3² + δ_7 x_1 x_2 + δ_8 x_1 x_3 + δ_9 x_2 x_3 + v_i    (19)

and use an LM-statistic of the form (18) to test whether the coefficients δ_j, j = 1, ..., 9, are zero.

Remark 7.3: As is obvious, the Breusch-Pagan (BP) test with the x-variables is the White test without the cross-terms.
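Both tests boil down to an auxiliary regression of the squared residuals and the statistic LM = n R² from (18). A minimal NumPy sketch, using simulated data (the data-generating process and the hard-coded χ² critical value are mine, not from the text):

```python
import numpy as np

# Breusch-Pagan and White tests via the auxiliary regression (17)/(19)
# and LM = n * R^2 from (18). The 5% critical value of chi^2 with 1 df
# (3.84) is hard-coded to keep the example dependency-free.

def r_squared(y, X):
    """R^2 of an OLS regression of y on X (X must include a constant)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - resid @ resid / tss

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 2 + 3 * x + rng.normal(0, x)              # error sd grows with x

uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # OLS residuals

# Breusch-Pagan: regress uhat^2 on the x-variables, LM ~ chi^2(1)
lm_bp = n * r_squared(uhat**2, X)

# White: also include the squares (no cross-terms with one regressor)
X_white = np.column_stack([np.ones(n), x, x**2])
lm_white = n * r_squared(uhat**2, X_white)
```

Since the simulated errors are strongly heteroskedastic, both LM statistics should exceed the usual χ² critical values by a wide margin.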
Testing for Heteroskedasticity

Example 7.2: In the wage example the BP test (White without cross-terms) yields R²_{û²} = 0.030244. With n = 526, LM = n R²_{û²} ≈ 15.91 with df = 11, producing a p-value of 0.1446. Thus there is no empirical evidence of heteroskedasticity. White with cross-terms gives R²_{û²} = 0.086858 and LM ≈ 45.69 with df = 36 and a p-value of 0.129. Again we do not reject the null hypothesis of homoskedasticity.
Testing for Heteroskedasticity

Remark 7.4: When the x-variables include dummy variables, be aware of the dummy-variable trap due to D² = D; i.e., include only the dummies D themselves, not their squares. Modern econometric packages, like EViews, avoid the trap automatically if the procedure is readily available in the program.
Weighted Least Squares (WLS)
Weighted Least Squares (WLS)

Suppose the heteroskedasticity is of the form

    var[u_i | x_i] = σ² h(x_i),    (20)

where h_i = h(x_i) > 0 is some (known) function of the explanatory variables (and possibly of some other variables).
Weighted Least Squares (WLS)

Dividing both sides of (1) by √h_i and denoting the transformed variables ỹ_i = y_i/√h_i, x̃_ij = x_ij/√h_i, and ũ_i = u_i/√h_i, we get the regression

    ỹ_i = β_0 (1/√h_i) + β_1 x̃_i1 + ... + β_k x̃_ik + ũ_i,    (21)

where

    var[ũ_i | x_i] = (1/h_i) var[u_i | x_i] = (1/h_i) h_i σ² = σ²,    (22)

i.e., the errors are homoskedastic (satisfying classical assumption 2). Applying OLS to (21) again produces BLUE estimators of the parameters.
Weighted Least Squares (WLS)

From the estimation point of view the transformation leads, in fact, to the minimization of

    Σ_{i=1}^n (y_i − β_0 − β_1 x_i1 − ... − β_k x_ik)² / h_i.    (23)

This is called Weighted Least Squares (WLS): the observations are weighted by the inverse of h_i.
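The transformation (21) and the weighted minimization (23) give the same estimates, which a short NumPy sketch can verify. The data and the choice h_i = x_i² are simulated assumptions for illustration only:

```python
import numpy as np

# WLS two ways: (a) divide each observation by sqrt(h_i) and run OLS on
# the transformed data (21); (b) solve the weighted normal equations
# (X' W X) b = X' W y, which minimizes (23) directly.
rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 10, n)
h = x**2                                        # assumed known h(x_i)
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(h))   # var[u_i|x_i] = sigma^2 h_i

X = np.column_stack([np.ones(n), x])
w = 1.0 / np.sqrt(h)
X_t = X * w[:, None]                            # x~_ij = x_ij / sqrt(h_i)
y_t = y * w                                     # y~_i  = y_i  / sqrt(h_i)
beta_wls = np.linalg.lstsq(X_t, y_t, rcond=None)[0]

W = np.diag(1.0 / h)                            # weights 1/h_i from (23)
beta_check = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

The two estimates agree up to numerical precision, confirming that the √h_i transformation is just WLS in disguise.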
Weighted Least Squares (WLS)

Example 7.3: Speed and stopping distance for cars, n = 50 observations.

[Figure: scatter plot of Distance (0–140) against Speed (0–30)]
Weighted Least Squares (WLS)

Visual inspection suggests somewhat increasing variability as a function of speed. For the linear model dist = β_0 + β_1 speed + u, the White test gives LM = 3.22 with df = 2 and a p-value of 0.20, which is not statistically significant.
Weighted Least Squares (WLS)

Physics: the stopping distance is proportional to the square of the speed, i.e., β_1 (speed)². Thus, instead of a linear model, a better alternative should be

    dist_i = β_1 (speed_i)² + error_i.    (24)

Human factor: the reaction time is v_i = β_0 + u_i, where β_0 is the average reaction time and the error term u_i ~ N(0, σ_u²).
Weighted Least Squares (WLS)

During the reaction time the car moves the distance

    v_i · speed_i = β_0 speed_i + u_i speed_i.    (25)

Thus, modeling the error term in (24) as (25) gives

    dist_i = β_0 speed_i + β_1 (speed_i)² + e_i,    (26)

where

    e_i = u_i speed_i.    (27)

Because

    var[e_i | speed_i] = (speed_i)² var[u_i] = (speed_i)² σ_u²,    (28)

the heteroskedasticity is of the form (20) with

    h_i = (speed_i)².    (29)
Weighted Least Squares (WLS) Estimating (26) by ignoring the inherent heteroskedasticity yields Dependent Variable: DISTANCE Method: Least Squares Included observations: 50 ============================================================== Variable Coefficient Std. Error t-statistic Prob. -------------------------------------------------------------- SPEED 1.239 0.560 2.213 0.032 SPEED^2 0.090 0.029 3.067 0.004 ============================================================== R-squared 0.667 Mean dependent var 42.980 Adjusted R-squared 0.660 S.D. dependent var 25.769 S.E. of regression 15.022 Akaike info criterion 8.296 Sum squared resid 10831.117 Schwarz criterion 8.373 Log likelihood -205.401 Durbin-Watson stat 1.763 ==============================================================
Weighted Least Squares (WLS)

Accounting for the heteroskedasticity and estimating the coefficients from

    dist_i / speed_i = β_0 + β_1 speed_i + u_i    (30)

gives

==============================================================
Variable      Coefficient   Std. Error   t-statistic    Prob.
--------------------------------------------------------------
SPEED              1.261        0.426       2.963      0.00472
SPEED^2            0.089        0.026       3.402      0.00136
==============================================================

The results are not materially different. Thus the heteroskedasticity is not a big problem here.
Weighted Least Squares (WLS)

Remark 7.5: The R-squares from (26) and (30) are not comparable. A comparable R-square can be obtained by computing the fitted values d̂ist_i using the coefficient estimates of (30) and squaring the correlation

    R = corr[dist_i, d̂ist_i].    (31)

The R-square for (30) is 0.194 while for (26) it is 0.667. A comparable R-square, however, is obtained by squaring (31), which gives 0.667, i.e., the same in this case (usually it is slightly smaller; why?).
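The comparable R-square of Remark 7.5 can be sketched in NumPy. The data below are simulated to mimic the cars setting (the data-generating process and coefficients are my assumptions, not the actual dataset):

```python
import numpy as np

# Comparable R-square (31): fit the transformed model (30), map the
# fitted values back to the original dist scale, then square the
# correlation between dist and its fitted values.
rng = np.random.default_rng(3)
n = 50
speed = rng.uniform(4, 25, n)
dist = 0.7 * speed + 0.09 * speed**2 + rng.normal(0, 2 * speed / 3)

# Estimate (30): regress dist/speed on a constant and speed
X30 = np.column_stack([np.ones(n), speed])
b30 = np.linalg.lstsq(X30, dist / speed, rcond=None)[0]

# Fitted values on the original scale: dist_hat = b0*speed + b1*speed^2
dist_hat = b30[0] * speed + b30[1] * speed**2

# Comparable R-square: squared correlation of dist and dist_hat
r_comp = np.corrcoef(dist, dist_hat)[0, 1] ** 2
```

Unlike the raw R-square of (30), which measures fit on the dist/speed scale, r_comp measures fit on the original dist scale and can be compared directly with the R-square of (26).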
Feasible generalized Least Squares (GLS)
Feasible generalized Least Squares (GLS)

In practice the h(x) function is rarely known. In order to guarantee strict positivity, a common practice is to model it as

    h(x_i) = exp(δ_0 + δ_1 x_1 + ... + δ_k x_k).    (32)

In such a case we can write

    log(u²) = δ_0 + δ_1 x_1 + ... + δ_k x_k + e,    (33)

where e is an error term.
Feasible generalized Least Squares (GLS)

In order to estimate the unknown parameters, the procedure is:

(i) Obtain the OLS residuals û from regression equation (1).
(ii) Run regression (33) for log(û²) and generate the fitted values ĝ_i.
(iii) Re-estimate (1) by WLS using the weights 1/ĥ_i, where ĥ_i = exp(ĝ_i).

This is called feasible GLS. Another possibility is to obtain the ĝ_i by regressing log(û²) on ŷ and ŷ².
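The three steps above can be sketched as follows, again on simulated data (the data-generating process is an assumption chosen so that the variance has the exponential form (32)):

```python
import numpy as np

# Feasible GLS, steps (i)-(iii): estimate h(x) from the log(uhat^2)
# regression (33), then re-estimate the model by WLS with weights 1/h_hat.
rng = np.random.default_rng(4)
n = 400
x = rng.uniform(0, 3, n)
X = np.column_stack([np.ones(n), x])
u = rng.normal(0, np.exp(0.2 + 0.5 * x))      # variance of the form (32)
y = 1.0 + 2.0 * x + u

# (i) OLS residuals from regression (1)
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# (ii) regress log(uhat^2) on the x-variables, keep fitted values g_hat
g_hat = X @ np.linalg.lstsq(X, np.log(uhat**2), rcond=None)[0]
h_hat = np.exp(g_hat)                         # h_hat_i = exp(g_hat_i)

# (iii) WLS: divide through by sqrt(h_hat) and run OLS
w = 1.0 / np.sqrt(h_hat)
beta_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
```

The exp transform in step (iii) guarantees strictly positive weights even when some fitted values ĝ_i are negative, which is precisely why (32) is the standard specification.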