Heteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD


Introduction For pedagogical reasons, OLS is presented initially under strong simplifying assumptions. One of these is homoskedastic errors, i.e., errors with constant variance, $\sigma^2$. Along with the other Gauss-Markov assumptions, this gives OLS the desirable properties of unbiasedness, consistency, and minimal variance among the class of linear unbiased estimators, i.e., it is BLUE. If the homoskedasticity assumption fails (with the other assumptions remaining intact), however, the last property ceases to apply, and OLS is no longer the most efficient estimator. This heteroskedasticity, or variance that depends on x, is the norm in empirical economics, not the exception. So this chapter describes how to detect heteroskedasticity and what can be done about it once it is detected.

Outline
- Consequences of heteroskedasticity for OLS
- Heteroskedasticity-robust inference after OLS estimation
- Testing for heteroskedasticity
- Weighted least squares estimation

Consequences of heteroskedasticity for OLS OLS estimators do not need the homoskedasticity assumption to be unbiased and consistent. The assumption is required, however, for the usual standard errors that justify inference using t and F statistics; those tests are not valid under heteroskedasticity, i.e., when $\mathrm{Var}(u \mid x_1, \ldots, x_k) = \sigma^2$ is violated. Lastly, OLS loses its efficiency properties under heteroskedasticity: it is possible to find more efficient (lower-variance) estimators than OLS if one knows the form of the heteroskedasticity.

Heteroskedasticity-robust inference after OLS estimation Large samples are wonderful: they can justify OLS estimation even with heteroskedasticity, as long as the bias in the standard errors is corrected. The efficiency loss is tolerable with large samples because the standard errors will be small enough to make valid inferences anyway. So the problem is reduced to one of correcting the bias in the variance of the OLS estimators. Fortunately, some very smart people (Halbert White, Friedhelm Eicker, and Peter J. Huber) have devised a consistent estimator for the variance of the OLS estimator under heteroskedasticity. What is more remarkable: this estimator is valid even if one doesn't know the form of the heteroskedasticity.

Heteroskedasticity-robust inference (continued) It is deceptively easy to have software compute heteroskedasticity-robust standard errors. In Stata it just requires an option at the end of a reg command: typing vce(robust) after the comma will run the regression and calculate robust standard errors. The whole syntax is: reg yvar xvar1 xvar2 ... xvark, vce(robust).
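For instance, a minimal sketch using Stata's built-in auto dataset (an illustrative example, not this course's data):

    * compare classical and heteroskedasticity-robust standard errors
    sysuse auto, clear
    regress price weight mpg
    regress price weight mpg, vce(robust)

The coefficient estimates are identical across the two commands; only the reported standard errors (and hence the t statistics and confidence intervals) change.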

Heteroskedasticity-robust inference (continued) As we have demonstrated before, the simple OLS estimator, $\hat{\beta}_1$, can be expressed:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
Its variance, then, is the expectation of the square of the second term:
$$\mathrm{Var}(\hat{\beta}_1) = E\left[(\hat{\beta}_1 - \beta_1)^2\right] = E\left[\left(\frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)^2\right] = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sigma_i^2}{SST_x^2},$$
where $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and the last equality holds if the errors are at least independent (so that the cross products cancel out).

Heteroskedasticity-robust inference (continued) This is the same expression for the variance as in Chapter 2, with one notable exception: the subscript i on the variance. This means the expression will not simplify by canceling an $SST_x$, because $\sigma_i^2$ cannot be pulled through the sum operator the way a constant could. Professor White's contribution* was showing that the sample analog of the variance,
$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \hat{u}_i^2}{SST_x^2},$$
consistently estimates the variance of $\hat{\beta}_1$ as expressed above. *White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica, 48: 817-838.
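As a check on the formula, the sample analog can be computed by hand for a simple regression. The sketch below (again on the illustrative auto dataset) reproduces this estimator; Stata's vce(robust) additionally applies the degrees-of-freedom correction described on a later slide, so its reported standard error will differ slightly.

    * White's variance estimator for the slope, computed by hand
    sysuse auto, clear
    regress price weight
    predict double uhat, residuals
    quietly summarize weight
    generate double dev2 = (weight - r(mean))^2   // (x_i - xbar)^2
    generate double term = dev2 * uhat^2          // (x_i - xbar)^2 * uhat_i^2
    quietly summarize term
    scalar num = r(sum)
    quietly summarize dev2
    scalar SSTx = r(sum)
    display "robust SE of slope: " sqrt(num / SSTx^2)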

Heteroskedasticity-robust inference (continued) Retracing White's footsteps is definitely beyond the scope and level of technicality of this class; the interested student may read White's paper or a graduate-level text on the subject. The simple OLS case generalizes to multiple regression, in which each estimator is expressed in terms of its residual variation (the part of $x_j$ not explained by the other regressors). The partialling-out expression for $\hat{\beta}_j$ is:
$$\hat{\beta}_j = \beta_j + \frac{\sum_{i=1}^{n} \hat{r}_{ij} u_i}{\sum_{i=1}^{n} \hat{r}_{ij}^2},$$
where $\hat{r}_{ij}$ is the residual from regressing $x_j$ on all the other regressors.

Heteroskedasticity-robust inference (concluded) The heteroskedasticity-robust variance estimator, then, is:
$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{\sum_{i=1}^{n} \hat{r}_{ij}^2 \hat{u}_i^2}{SSR_j^2}, \quad \text{where } SSR_j = \sum_{i=1}^{n} \hat{r}_{ij}^2.$$
Its square root is the heteroskedasticity-robust standard error, sometimes just "robust standard error" for short. The degrees of freedom are corrected for by multiplying the variance by $n/(n-k-1)$ before taking the square root. With the robust standard error in hand, it is possible to conduct inference using the familiar t and F tests, but it is worth reminding you that these tests are only valid asymptotically, i.e., in large samples.

Testing for heteroskedasticity Even though heteroskedasticity is commonplace in empirical work, we would like to know when Assumption MLR.5 does hold, because in those instances inference can be conducted with an exact t distribution instead of a merely asymptotic one. Also, if there is heteroskedasticity, one may prefer a different estimator with improved efficiency, so detecting it is necessary for making an informed decision. Detecting heteroskedasticity amounts to testing whether the hypothesis
$$E(u^2 \mid x_1, \ldots, x_k) = E(u^2) = \sigma^2$$
(that the error variance is independent of the regressors) is rejected or not.

Testing for heteroskedasticity (continued) Keeping with the use of regression analysis to tell when two variables are related, regressing an estimate of the variance on the regressors suggests itself as a test:
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \ldots + \delta_k x_k + v.$$
If the magnitude of the errors is independent of the regressors, then it should be the case that
$$\delta_1 = \delta_2 = \ldots = \delta_k = 0;$$
call this $H_0$. This can be readily tested with an F statistic (a test of the overall model's explanatory power). In R-squared form, this is:
$$F = \frac{R^2_{\hat{u}^2} / k}{\left(1 - R^2_{\hat{u}^2}\right) / (n - k - 1)}.$$
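A minimal sketch of this regression-based (Breusch-Pagan-style) test in Stata, with illustrative variables:

    * regress the squared OLS residuals on the regressors
    sysuse auto, clear
    quietly regress price weight mpg
    predict double uhat, residuals
    generate double uhat2 = uhat^2
    regress uhat2 weight mpg

The overall F statistic (and its p-value) reported for this auxiliary regression is the test of $H_0$.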

Testing for heteroskedasticity (concluded) There are more exotic tests that can be performed to detect heteroskedasticity, e.g., White's test, which adds interactions and quadratics to the procedure outlined above. We will not examine all of the variations in detail; this is left as an exercise for the interested student. Suffice it to say that there are regression-based tests for heteroskedasticity that are intuitive and easy to implement (see computer exercise C4, for example); a sketch of Stata's built-in versions follows.
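Stata packages both of these tests as post-estimation commands after regress:

    * built-in heteroskedasticity tests after estimation
    quietly regress price weight mpg
    estat hettest, rhs fstat    // Breusch-Pagan test using the regressors
    estat imtest, white         // White's test with squares and interactions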

Weighted least squares estimation If the particular form of heteroskedasticity can be discerned, i.e., functionally how the variance depends on x, the regression model can be made homoskedastic with the application of weights. E.g., if the variance of the error, conditional on x (all the regressors), is simply a constant multiplied by a known function h of x, a new homoskedastic error can be constructed by dividing the heteroskedastic error u by the square root of h:
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + u_i; \quad \mathrm{Var}(u \mid \mathbf{x}) = \sigma^2 h(\mathbf{x}),$$
so
$$\mathrm{Var}\left(\frac{u}{\sqrt{h(\mathbf{x})}} \;\middle|\; \mathbf{x}\right) = \sigma^2, \quad \mathrm{s.d.}\left(\frac{u}{\sqrt{h(\mathbf{x})}} \;\middle|\; \mathbf{x}\right) = \sigma, \quad \text{and} \quad E\left(\frac{u}{\sqrt{h(\mathbf{x})}} \;\middle|\; \mathbf{x}\right) = 0.$$

Weighted least squares (continued) To make the regression homoskedastic, just divide through by $\sqrt{h_i} = \sqrt{h(\mathbf{x}_i)}$:
$$\frac{y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} + \beta_1 \frac{x_{i1}}{\sqrt{h_i}} + \ldots + \beta_k \frac{x_{ik}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}.$$
If the original model satisfied all the Gauss-Markov assumptions except homoskedasticity, the transformed model satisfies all of them, along with their desirable properties (BLUE). Estimating the parameters in the preceding line by OLS produces what are called the generalized least squares (GLS) estimators.

Weighted least squares (continued) So where does the weighting come in? When you estimate these parameters by OLS, you are minimizing
$$\sum_{i=1}^{n} \frac{\hat{u}_i^2}{h_i},$$
the sum of squared residuals, each weighted by $1/h_i$. Since h is what inflates the variance of u, the weights assigned are the inverses of how noisy each observation is. This is how weighted least squares improves on the efficiency of regular OLS, which simply weights all observations equally. The GLS estimates will differ from regular OLS estimates, but the interpretation of the coefficients still comes from the original model.
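In Stata this minimization is carried out with analytic weights. A sketch, under the hypothetical assumption that $\mathrm{Var}(u \mid \mathbf{x}) = \sigma^2 \cdot mpg$, i.e., $h(\mathbf{x}) = mpg$:

    * WLS with a known variance function: weight each observation by 1/h_i
    sysuse auto, clear
    regress price weight mpg [aweight = 1/mpg]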

The heteroskedasticity function must be estimated: Feasible GLS The crucial step in performing weighted least squares is figuring out what the function h is. When the weighting function is modeled and estimated from the data, the resulting estimated weights ($\hat{h}_i$) are the basis for feasible GLS (FGLS). In a spirit similar to testing for heteroskedasticity, one could estimate the regression
$$\ln(\hat{u}^2) = \text{constant} + \delta_1 x_1 + \delta_2 x_2 + \ldots + \delta_k x_k + e$$
by OLS, where $\hat{u}^2$ are the squared residuals from the (y on x) OLS regression. The fitted values from this regression of the logs of the squared residuals equal the logs of the estimated weights, under the assumptions
$$h(\mathbf{x}) = \exp(\delta_0 + \delta_1 x_1 + \ldots + \delta_k x_k); \quad \mathrm{Var}(u \mid \mathbf{x}) = \sigma^2 h(\mathbf{x}).$$

Feasible GLS (continued) So
$$\hat{h}_i = \exp\left(\widehat{\text{constant}} + \hat{\delta}_1 x_{i1} + \ldots + \hat{\delta}_k x_{ik}\right)$$
are the estimated variance factors, and their inverses serve as the weights in an FGLS procedure. So FGLS minimizes
$$\sum_{i=1}^{n} \frac{\hat{u}_i^2}{\hat{h}_i}.$$
Since the weights have to be estimated from the data, FGLS is no longer either unbiased or BLUE. But it is consistent and asymptotically more efficient than regular OLS.

FGLS cookbook
1. Regress y on all k regressors in the model.
2. Obtain the residuals, $\hat{u}$.
3. Square the residuals and take the log of each of the squares.
4. Regress the logs of the squares on all k regressors. (See p. 288 for an alternative method that uses the fitted values of y instead.)
5. Obtain the fitted values.
6. Exponentiate the fitted values: e raised to the fitted value gives $\hat{h}_i$.
7. Estimate the original model using the inverses of the $\hat{h}_i$ as weights.
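A sketch of the whole cookbook in Stata, with illustrative variables; the step numbers in the comments refer to the list above:

    * feasible GLS, step by step
    sysuse auto, clear
    quietly regress price weight mpg              // step 1
    predict double uhat, residuals                // step 2
    generate double loguhat2 = ln(uhat^2)         // step 3
    quietly regress loguhat2 weight mpg           // step 4
    predict double ghat, xb                       // step 5
    generate double hhat = exp(ghat)              // step 6
    regress price weight mpg [aweight = 1/hhat]   // step 7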

FGLS (concluded) If the FGLS estimates differ substantially from the OLS estimates, e.g., a statistically significant coefficient changes sign, it is probably because the functional form of the model is mis-specified or a relevant variable has been omitted; either would cause the OLS and FGLS estimators to have different probability limits. Furthermore, the specification of the heteroskedasticity, $\mathrm{Var}(u \mid \mathbf{x}) = \sigma^2 h(\mathbf{x})$, could be wrong because the model of $h(\mathbf{x})$ is mis-specified. FGLS is still unbiased and consistent in that case (assuming the underlying model satisfies the first four Gauss-Markov assumptions), but the standard errors computed under a mis-specified $h(\mathbf{x})$ will not yield valid inferences. Once again, robust standard errors come to the rescue if weighted least squares fails to solve the heteroskedasticity problem. Even if a WLS procedure mis-specifies the weighting function, it is an improvement (in terms of smaller robust standard errors) over ignoring heteroskedasticity and estimating by OLS.

Prediction and prediction intervals with heteroskedasticity The construction of prediction intervals follows a procedure analogous to the one we are familiar with. There are some technical modifications to the standard errors used in constructing the intervals, but I prefer not to devote time to scrutinizing them here. There are topics of more general interest to which to proceed, so the modifications are left to the interested student.

Conclusion Heteroskedasticity is the rule, not the exception, in empirical economics. Robust inference can be performed by having software compute robust ("White") standard errors. Heteroskedasticity can be detected by one of several regression-based tests. If the form of heteroskedasticity is known, a weighted least squares estimator has better efficiency properties than OLS.