Lecture 4: Heteroskedasticity
Econometric Methods, Warsaw School of Economics
Outline
1. What is heteroskedasticity?
2. Testing for heteroskedasticity: White, Goldfeld-Quandt, Breusch-Pagan
3. Dealing with heteroskedasticity: robust standard errors
1. What is heteroskedasticity?
Theoretical background

Recall the variance-covariance matrix of \varepsilon:

E(\varepsilon\varepsilon^T) =
\begin{pmatrix}
\mathrm{var}(\varepsilon_1) & \mathrm{cov}(\varepsilon_1,\varepsilon_2) & \cdots & \mathrm{cov}(\varepsilon_1,\varepsilon_T) \\
\mathrm{cov}(\varepsilon_1,\varepsilon_2) & \mathrm{var}(\varepsilon_2) & \cdots & \mathrm{cov}(\varepsilon_2,\varepsilon_T) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(\varepsilon_1,\varepsilon_T) & \mathrm{cov}(\varepsilon_2,\varepsilon_T) & \cdots & \mathrm{var}(\varepsilon_T)
\end{pmatrix}
\overset{\text{OLS assumptions}}{=}
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
Theoretical background

Heteroskedasticity:

\begin{pmatrix}
\mathrm{var}(\varepsilon_1) & \mathrm{cov}(\varepsilon_1,\varepsilon_2) & \cdots & \mathrm{cov}(\varepsilon_1,\varepsilon_T) \\
\mathrm{cov}(\varepsilon_1,\varepsilon_2) & \mathrm{var}(\varepsilon_2) & \cdots & \mathrm{cov}(\varepsilon_2,\varepsilon_T) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(\varepsilon_1,\varepsilon_T) & \mathrm{cov}(\varepsilon_2,\varepsilon_T) & \cdots & \mathrm{var}(\varepsilon_T)
\end{pmatrix}
= \sigma^2
\begin{pmatrix}
\omega_1 & 0 & \cdots & 0 \\
0 & \omega_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \omega_T
\end{pmatrix}

[Figure: time-series plot of residual, actual and fitted values, 2000-2006]
Theoretical background

Non-spherical disturbances: the variance-covariance matrix of the error term, by case:

- serial correlation absent, heteroskedasticity absent: \sigma^2 I, i.e., \sigma^2 on the diagonal, zeros elsewhere
- serial correlation absent, heteroskedasticity present: \sigma^2 \,\mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_T)
- serial correlation present, heteroskedasticity absent: \sigma^2 times a matrix with ones on the diagonal and \omega_{ij} off the diagonal
- both present: a general \Omega
Theoretical background

Consequences of heteroskedasticity

Common features of non-spherical disturbances (see serial correlation): OLS estimates remain unbiased and consistent, but they are inefficient!

Unlike serial correlation, heteroskedasticity can occur both in:
- time series data (e.g., high- and low-volatility periods in financial markets)
- cross-section data (e.g., the variance of the disturbances depends on unit size or on some key explanatory variables)
Exercise (1/3)

Credit cards: based on client-level data, we fit a model that explains credit-card-settled expenditures with: age; income; squared income; a house-ownership dummy.
2. Testing for heteroskedasticity
White test (1)

Step 1: OLS regression

y_i = x_i \beta + \varepsilon_i, \qquad \hat\beta = (X^T X)^{-1} X^T y, \qquad \hat\varepsilon_i = y_i - x_i \hat\beta

Step 2: auxiliary regression

\hat\varepsilon_i^2 = \sum_{k,l} \beta_{k,l}\, x_{k,i} x_{l,i} + v_i

E.g., in a model with a constant and 3 regressors x_{1t}, x_{2t}, x_{3t}, the auxiliary model contains the following explanatory variables: the constant, x_{1t}, x_{2t}, x_{3t}, the squares x_{1t}^2, x_{2t}^2, x_{3t}^2, and the cross terms x_{1t}x_{2t}, x_{2t}x_{3t}, x_{1t}x_{3t}.

IDEA: without heteroskedasticity, the R^2 of the auxiliary equation should be low.
White test (2)

H_0: heteroskedasticity absent
H_1: heteroskedasticity present

W = T R^2 \sim \chi^2(k)

where k is the number of explanatory variables in the auxiliary regression (excluding the constant).

CAUTION! It is a weak test, i.e., it has low power: it often fails to detect heteroskedasticity that is actually present.
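The two steps above can be sketched in Python with numpy alone; `white_test` is a hypothetical helper written for this note, not a library routine:

```python
import numpy as np

def white_test(y, X):
    """White's test, steps 1-2 from the slides (illustrative sketch).

    X is an n x k regressor matrix WITHOUT the constant column; the
    constant is added internally. Returns W = n * R^2 of the auxiliary
    regression and its chi^2 degrees of freedom."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])          # add the intercept
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]   # step 1: OLS
    e2 = (y - Xc @ beta) ** 2                      # squared residuals

    # step 2: auxiliary regressors -- levels, squares, cross products
    cols = [np.ones(n)] + [X[:, i] for i in range(k)]
    for i in range(k):
        for j in range(i, k):
            cols.append(X[:, i] * X[:, j])
    Z = np.column_stack(cols)

    gamma = np.linalg.lstsq(Z, e2, rcond=None)[0]
    resid = e2 - Z @ gamma
    r2 = 1.0 - (resid @ resid) / ((e2 - e2.mean()) @ (e2 - e2.mean()))
    return n * r2, Z.shape[1] - 1                  # df excludes the constant
```

Compare W against the \chi^2(df) critical value; a large W rejects homoskedasticity.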
Goldfeld-Quandt test

- split the sample (T observations) into 2 subsamples (T = n_1 + n_2)
- test for equality of the error-term variance in the two subsamples

H_0: \sigma_1^2 = \sigma_2^2 (equal variance in both subsamples: homoskedasticity)
H_1: \sigma_1^2 > \sigma_2^2 (higher variance in the subsample indexed as 1)

F(n_1 - k,\ n_2 - k) = \frac{\sum_{i=1}^{n_1} \hat\varepsilon_i^2 / (n_1 - k)}{\sum_{i=n_1+1}^{T} \hat\varepsilon_i^2 / (n_2 - k)}

CAUTION! This makes sense only when we index the subsample suspected of higher variance as 1; otherwise we never reject H_0.
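A minimal numpy sketch of the statistic, assuming the data are already sorted so that the first n_1 observations form the subsample suspected of higher variance (`goldfeld_quandt` is an illustrative name, not a library function):

```python
import numpy as np

def goldfeld_quandt(y, X, n1):
    """Goldfeld-Quandt F statistic (sketch). X includes the constant
    column; the first n1 rows are the higher-variance subsample."""
    k = X.shape[1]

    def ssr(ys, Xs):
        # sum of squared OLS residuals within one subsample
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        e = ys - Xs @ b
        return e @ e

    df1, df2 = n1 - k, len(y) - n1 - k
    F = (ssr(y[:n1], X[:n1]) / df1) / (ssr(y[n1:], X[n1:]) / df2)
    return F, df1, df2
```

Compare F against the F(df1, df2) critical value; F close to 1 is consistent with homoskedasticity.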
Breusch-Pagan test

Assume the variance of the disturbances can be explained by a set of variables collected in a matrix Z (analogous to the explanatory variables for y collected in X).

H_0: homoskedasticity
H_1: heteroskedasticity

BP = \frac{1}{2}\left(\frac{\hat\varepsilon^2}{\hat\sigma^2} - \mathbf{1}\right)^T Z (Z^T Z)^{-1} Z^T \left(\frac{\hat\varepsilon^2}{\hat\sigma^2} - \mathbf{1}\right)

where \hat\varepsilon^2 = [\hat\varepsilon_1^2\ \hat\varepsilon_2^2\ \cdots\ \hat\varepsilon_T^2]^T, \mathbf{1} = [1\ 1\ \cdots\ 1]^T, and \hat\sigma^2 = \sum_{i=1}^{n}\hat\varepsilon_i^2/(n-k).

The test statistic is \chi^2-distributed with degrees of freedom equal to the number of regressors in the matrix Z.
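The quadratic form above can be sketched in numpy; `breusch_pagan` is a hypothetical helper for this note, and it follows the slide's \hat\sigma^2 with the (n - k) divisor (textbook versions often divide by n):

```python
import numpy as np

def breusch_pagan(y, X, Z):
    """Breusch-Pagan statistic as the quadratic form on the slide.
    X and Z both include a constant column; the chi^2 degrees of
    freedom equal the number of Z regressors beyond the constant."""
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = (y - X @ b) ** 2
    sigma2 = e2.sum() / (n - k)                 # sigma^2 hat, as on the slide
    g = e2 / sigma2 - 1.0                       # eps^2 / sigma^2 - 1
    Pg = Z @ np.linalg.solve(Z.T @ Z, Z.T @ g)  # Z (Z'Z)^{-1} Z' g
    return 0.5 * g @ Pg, Z.shape[1] - 1
```
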
Exercise (2/3)

Credit cards
1. Does the White test detect heteroskedasticity?
2. Split the sample into two equal subsamples: high-income and low-income. Check whether the variance differs between the two subsamples. (You need to sort the data and restrict the sample to a subsample twice, each time calculating the appropriate statistics.)
3. Perform the Breusch-Pagan test, assuming that the variance depends only on income and squared income (and a constant).
3. Dealing with heteroskedasticity
Robust standard errors

White robust SE

Unlike Newey-West robust SE (robust to both serial correlation and heteroskedasticity), White's SE are robust only to heteroskedasticity (the former were proposed later and generalize White's work).

White (1980):

\mathrm{Var}(\hat\beta) = (X^T X)^{-1} \left( \sum_{t=1}^{T} \hat\varepsilon_t^2\, x_t x_t^T \right) (X^T X)^{-1}

They share the same features as Newey-West SE (see: serial correlation), i.e., they correct statistical inference without improving the estimation efficiency of the parameters themselves.
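The sandwich formula can be computed directly in numpy; `white_se` is a name made up for this illustration:

```python
import numpy as np

def white_se(y, X):
    """OLS with heteroskedasticity-robust (White, 1980) standard errors:
    Var(beta) = (X'X)^{-1} (sum_t e_t^2 x_t x_t') (X'X)^{-1}.
    X includes the constant column."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    e = y - X @ beta
    meat = (X * (e ** 2)[:, None]).T @ X   # sum_t e_t^2 x_t x_t'
    V = bread @ meat @ bread               # the sandwich
    return beta, np.sqrt(np.diag(V))
```

Note that the point estimates are plain OLS; only the standard errors change.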
Recall the GLS estimator

y = X\beta + \varepsilon, \qquad E[\varepsilon] = 0, \qquad E[\varepsilon\varepsilon^T] = \Omega

Under heteroskedasticity, the variance-covariance matrix

\Omega = \begin{pmatrix} \omega_1 & 0 & \cdots & 0 \\ 0 & \omega_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \omega_T \end{pmatrix}
\quad\text{implies}\quad
\Omega^{-1} = \begin{pmatrix} 1/\omega_1 & 0 & \cdots & 0 \\ 0 & 1/\omega_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\omega_T \end{pmatrix}
Knowing the vector [1/\omega_1,\ 1/\omega_2,\ \ldots,\ 1/\omega_T], we can immediately apply GLS:

\hat\beta_{WLS} = \left( X^T \Omega^{-1} X \right)^{-1} X^T \Omega^{-1} y
= \left( \sum_{i=1}^{T} \frac{1}{\omega_i}\, x_i^T x_i \right)^{-1} \left( \sum_{i=1}^{T} \frac{1}{\omega_i}\, x_i^T y_i \right)

where x_i denotes the i-th row of X. This vector can hence be interpreted as a vector of weights, associated with individual observations in the estimation (hence: weighted least squares).
\hat\beta_{WLS}
= \left( \sum_{i=1}^{T} \frac{1}{\omega_i}\, x_i^T x_i \right)^{-1} \left( \sum_{i=1}^{T} \frac{1}{\omega_i}\, x_i^T y_i \right)
= \left( \sum_{i=1}^{T} \frac{x_i^T}{\sqrt{\omega_i}} \frac{x_i}{\sqrt{\omega_i}} \right)^{-1} \left( \sum_{i=1}^{T} \frac{x_i^T}{\sqrt{\omega_i}} \frac{y_i}{\sqrt{\omega_i}} \right)

WLS estimation is hence equivalent to OLS estimation using data transformed in the following way:

y^* = \begin{pmatrix} y_1/\sqrt{\omega_1} \\ y_2/\sqrt{\omega_2} \\ \vdots \\ y_T/\sqrt{\omega_T} \end{pmatrix}, \qquad
X^* = \begin{pmatrix} x_1/\sqrt{\omega_1} \\ x_2/\sqrt{\omega_2} \\ \vdots \\ x_T/\sqrt{\omega_T} \end{pmatrix}

Conclusion: the weights for individual observations are the inverses of the disturbance variances in individual periods. Under OLS, these weights are a unit vector.
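The equivalence between the GLS formula and OLS on the \sqrt{\omega}-scaled data can be checked numerically; this `wls` helper is a sketch that takes the variances \omega as known:

```python
import numpy as np

def wls(y, X, omega):
    """WLS as OLS on transformed data: divide y_i and the row x_i by
    sqrt(omega_i), then run plain least squares (the slides' y*, X*)."""
    s = np.sqrt(omega)
    return np.linalg.lstsq(X / s[:, None], y / s, rcond=None)[0]
```

With omega set to a vector of ones, this collapses to ordinary OLS, as the conclusion above states.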
How to find \omega_1, \omega_2, \ldots, \omega_T?

They are unknown and, with only T observations, T individual variances cannot be estimated. The most popular solutions include:

Way 1: split the sample into subsamples
- estimate the model in each subsample via OLS to obtain the vector \hat\varepsilon
- in every subsample i, estimate the variance of the error terms \hat\sigma_i^2
- assign the weight 1/\hat\sigma_i^2 to all the observations in subsample i
Way 2:
- Estimate the model via OLS to obtain the vector \hat\varepsilon
- Regress \hat\varepsilon_t^2 against a set of potential explanatory variables (when done automatically, usually all the regressors from the base equation, possibly plus their squares)
- Take the fitted value of \hat\varepsilon_t^2 from this regression, say e_t^2, as a proxy of the variance (one cannot use \hat\varepsilon_t^2 itself, as it does not measure the variance adequately: it is just one draw from a distribution, while the fitted value summarizes a number of draws made under similar conditions regarding the explanatory variables for the variance)
- Use 1/e_t^2 as weights for individual observations

In practice, it is common to regress \ln(\hat\varepsilon_t^2) rather than \hat\varepsilon_t^2. This way we compute \widehat{\varepsilon_t^2} = \exp\left(\widehat{\ln \varepsilon_t^2}\right), which rules out negative values of the error-term variance.
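Way 2 can be sketched end-to-end in numpy; following the slide, the log-variance regression here uses the base-equation regressors, and `fgls_way2` is an illustrative name:

```python
import numpy as np

def fgls_way2(y, X):
    """Feasible WLS a la Way 2 (sketch): regress ln(e_t^2) on the
    regressors, exponentiate the fitted values to get a positive
    variance proxy, and re-estimate by WLS with weights 1/e_t^2.
    X includes the constant column."""
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]     # step 1: OLS
    e2 = (y - X @ b_ols) ** 2
    loge2 = np.log(np.maximum(e2, 1e-12))            # guard exact zeros
    g = np.linalg.lstsq(X, loge2, rcond=None)[0]     # variance regression
    var_hat = np.exp(X @ g)                          # positive by construction
    s = np.sqrt(var_hat)
    return np.linalg.lstsq(X / s[:, None], y / s, rcond=None)[0]
```
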
Exercise (3/3)

Credit cards
1. Split the sample into two equal subsamples and use WLS (Way 1).
2. Use all the explanatory variables and their squares as the regressors in the variance equation and use WLS (Way 2).
3. Compare the parameter values between OLS and the two variants of WLS.
4. Compare variable significance between OLS, OLS with White's robust standard errors, and the two variants of WLS.
Readings

Greene: chapter "Generalized Regression Model and Heteroscedasticity"
Homework

Gasoline demand model: verify the presence of heteroskedasticity in the model considered in the previous lecture.

Programming: write an R function performing an automated heteroskedasticity correction a la Way 2 in this presentation, using WLS, with all the right-hand-side variables as potential determinants of the error-term variance.