Topic 7: Heteroskedasticity


Advanced Econometrics (I)
Dong Chen, School of Economics, Peking University

1 Introduction

If the disturbance variance is not constant across observations, the regression is heteroskedastic. That is,

    Var(ε_i) = σ_i²,  i = 1, …, n.    (1)

We continue to assume that the disturbances are pairwise uncorrelated. This implies that

    E(εε') = σ²Ω = diag(σ_1², σ_2², …, σ_n²).    (2)

Heteroskedasticity may arise in many applications, especially with cross-sectional data.

Example 1: (i) The variation in profits of large firms may be greater than that of small ones, even after accounting for differences in firm size. (ii) The variation of expenditure on certain commodity groups may be higher for high-income families than for low-income ones. (iii) When estimating the return to education, ability is unobservable and thus enters the disturbance; it is possible that the variance of ability varies with the level of education. (iv) Sometimes heteroskedasticity is a consequence of aggregation (e.g., taking averages of the data).

By eyeballing the pattern of the residuals from an OLS estimation, we may find some evidence of heteroskedasticity.

Example 2: Consider the following model:

    EXP = β_1 + β_2 AGE + β_3 INCOME + β_4 INCOME² + β_5 OWNER + ε,    (3)

where EXP is credit card expenditure and OWNER is a dummy variable indicating whether an individual owns a house. Model (3) is estimated by OLS and the residuals are saved. In Figure 1 the residuals are plotted against INCOME, and in Figure 2 against AGE. In Figure 1 the spread of the residuals becomes wider at higher incomes, while in Figure 2 the distribution of the residuals is largely random. Figures 1 and 2 suggest that a common cause of heteroskedasticity is that the variance of the disturbance depends on some of the regressors, i.e., σ_i² = h(x_i). In this case, it appears that σ_i² is positively related to INCOME.

[Figure 1: Plot of the OLS residuals against INCOME]
[Figure 2: Plot of the OLS residuals against AGE]

STATA Tips: To obtain graphs like those in Figures 1 and 2, use the following commands in STATA:

    reg exp age income income2 owner
    predict e, resid
    graph twoway scatter e income, msymbol(oh) yline(0)
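
If the credit card data are not at hand, the same fan-shaped pattern can be reproduced with simulated data. The sketch below is purely illustrative (all variable names and parameter values are made up) and assumes a disturbance whose standard deviation grows with income:

    clear
    set seed 12345
    set obs 500
    generate income = 1 + 9*runiform()              // income index between 1 and 10
    generate age    = 20 + floor(41*runiform())     // ages 20 to 60
    generate eps    = rnormal(0, 2*income)          // s.d. of the disturbance rises with income
    generate exp    = 50 + 2*age + 30*income + eps
    regress exp age income
    predict e, resid
    graph twoway scatter e income, msymbol(oh) yline(0)

The scatter of e against income should fan out as income rises, while the corresponding plot against age should show no systematic spread.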

2 Consequences

Recall from our previous discussion that if we use OLS when Var(ε) = σ²Ω, then:

(i) b is unbiased;
(ii) b is inefficient, while the GLS estimator β̂ is BLUE;
(iii) Var(b) = σ² (X'X)^{-1} X'ΩX (X'X)^{-1}.

So the use of σ²(X'X)^{-1} is incorrect, and it leads to incorrect standard errors and unreliable inferences about the population parameters.
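
These consequences are easy to see in a small Monte Carlo experiment. The sketch below is illustrative only (the data generating process and sample size are arbitrary choices): it repeatedly draws heteroskedastic samples and compares the conventional and robust standard errors of the slope with its actual sampling variability.

    capture program drop hetmc
    program define hetmc, rclass
        clear
        set obs 200
        generate x = exp(rnormal())              // skewed regressor
        generate y = 1 + 2*x + rnormal(0, x)     // Var(eps|x) = x^2: heteroskedastic
        regress y x
        return scalar b    = _b[x]
        return scalar se   = _se[x]              // conventional OLS standard error
        regress y x, vce(robust)
        return scalar se_r = _se[x]              // White robust standard error
    end
    set seed 2024
    simulate b=r(b) se=r(se) se_r=r(se_r), reps(1000) nodots: hetmc
    summarize b se se_r

The mean of b should be close to the true value 2 (unbiasedness), while the standard deviation of b across replications is the benchmark for the reported standard errors; the robust standard error typically tracks it more closely than the conventional one.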

3 Robust Estimation of the Asymptotic Covariance Matrix

The above discussion suggests that if we are to continue using OLS in the presence of heteroskedasticity, then we should at least use the correct formula for Var(b). Note that in the expression for Var(b), σ² and Ω are both unknown. To estimate Var(b), we need to estimate the matrix σ²X'ΩX. White (1980, Econometrica) shows that, under very general conditions, the matrix

    S_0 = (1/n) ∑_{i=1}^n e_i² x_i x_i'    (4)

is a consistent estimator of

    (1/n) σ² X'ΩX = (1/n) ∑_{i=1}^n σ_i² x_i x_i',    (5)

where e_i is the OLS residual for observation i and x_i = [x_i1  x_i2  …  x_iK]'. Therefore, we can obtain a consistent estimator of Var(b), given by

    EstAsyVar(b) = (X'X)^{-1} [∑_{i=1}^n e_i² x_i x_i'] (X'X)^{-1}.    (6)

This is usually called the White heteroskedasticity-consistent (robust) estimator of the covariance matrix of b. Note that in forming this estimator we do not have to assume any specific form of heteroskedasticity, so it is a very useful result. The asymptotic properties of the estimator are unambiguous, but its usefulness in small samples is open to question. Some Monte Carlo studies suggest that in small samples the White estimator tends to underestimate the variance matrix.
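
As a numerical check on formula (6), the sketch below computes the White estimator directly in Mata on a built-in data set. This is the raw version of (6) with no degrees-of-freedom correction; Stata's robust option additionally rescales by n/(n − K), so the reported standard errors will differ slightly in small samples.

    sysuse auto, clear
    regress price mpg weight
    predict e, resid
    mata:
        X   = st_data(., ("mpg", "weight")), J(st_nobs(), 1, 1)   // regressors plus a constant
        res = st_data(., "e")
        XXi = invsym(quadcross(X, X))                             // (X'X)^{-1}
        S   = quadcross(X, res:^2, X)                             // sum of e_i^2 * x_i x_i'
        V   = XXi * S * XXi                                       // formula (6)
        sqrt(diagonal(V))'                                        // robust standard errors
    end
    regress price mpg weight, vce(robust)       // compare: Stata rescales (6) by n/(n-K)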

Remark 1: With the White robust estimator of the covariance matrix, we can construct the t statistic as usual; this is called the heteroskedasticity-robust t statistic. Note that this robust statistic follows a t distribution only asymptotically. In small samples, its sampling distribution is unknown.

Remark 2: We cannot use the F test for testing exact linear restrictions, because the distributional assumptions behind the F statistic require homoskedasticity. But we can use a Wald test. The statistic is

    W = (Rb − q)' {R [EstAsyVar(b)] R'}^{-1} (Rb − q) ~a χ²_J    (7)

under H_0: Rβ = q. That is, the statistic is asymptotically distributed as chi-squared with degrees of freedom J equal to the number of restrictions.

STATA Tips: In STATA, to obtain the White estimator, we simply add the option robust to the regress command. For example,

    reg y x1 x2 x3, robust

The output will then report standard errors computed from the White estimator of the covariance matrix of b.

4 Testing for Heteroskedasticity

Among others, three tests are common in practice for detecting heteroskedasticity. They are: (1) White's general test; (2) the Goldfeld-Quandt test; and (3) the Breusch-Pagan LM test. These tests are based on the following strategy: the OLS estimator of β is consistent even in the presence of heteroskedasticity, so the OLS residuals will mimic the heteroskedasticity of the true disturbances. Hence, tests designed to detect heteroskedasticity are applied to the OLS residuals.

4.1 White's General Test

The hypotheses under examination are H_0: σ_i² = σ² for all i vs H_1: not H_0. Note that to conduct the White test, we do not have to assume any specific form of heteroskedasticity. The test is motivated by the observation that if the model is homoskedastic, then ε_i² should not be correlated with any of the regressors, their squares, or their cross products. A simple operational version of the White test is carried out by obtaining nR² from the auxiliary regression of e_i² on a constant and all unique variables contained in x_i, together with all their squares and cross products.

Example 3: Suppose we have four regressors, x_1, x_2, x_3, and a constant term. The White test is carried out by first obtaining the residuals e_i from OLS estimation of the original model, and then estimating an auxiliary regression of e_i² on a constant and

    x_1, x_2, x_3, x_1², x_2², x_3², x_1x_2, x_1x_3, x_2x_3.

Finally, record the R² from the auxiliary regression and construct the test statistic nR². The test statistic is asymptotically distributed as chi-squared with P − 1 degrees of freedom, where P is the number of regressors in the auxiliary regression, including the constant:

    nR² ~a χ²_{P−1}.    (8)

Remark 3: The White test is very general in that it does not specify any particular form of heteroskedasticity.

Remark 4: Because of this generality, the White test may simply pick up some other specification error (such as the omission of x² from a simple regression) rather than heteroskedasticity.

Remark 5: The power of the White test may be low in some cases.

Remark 6: The White test is nonconstructive: if we reject the null hypothesis, the result of the test does not provide any guidance for the next step.

STATA Tips: To perform the White test in STATA, you can either construct the test statistic manually as in (8) or use the whitetst command after a regress command on the original model. whitetst is not an official STATA command and has to be downloaded: type findit whitetst in STATA, follow the link, and the command will be installed automatically.
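
To make the nR² recipe concrete, here is a minimal sketch on a built-in data set with two regressors, so the auxiliary regression contains the levels, the squares, and the single cross product (the data set and variables are illustrative only):

    sysuse auto, clear
    regress price mpg weight
    predict e, resid
    generate e2 = e^2
    regress e2 c.mpg##c.weight c.mpg#c.mpg c.weight#c.weight   // auxiliary regression
    scalar nR2 = e(N)*e(r2)
    display "White statistic = " nR2 ",  p-value = " chi2tail(5, nR2)
    * 5 = number of auxiliary regressors excluding the constant (P - 1)

In current versions of Stata, estat imtest, white after regress also carries out White's test.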

4.2 Goldfeld-Quandt Test

The Goldfeld-Quandt test assumes some particular form of heteroskedasticity. It tests the hypothesis that E(ε_i²) = σ² h(x_ik), e.g., σ² x_ik². The test is applicable when one of the x variables is thought to cause the heteroskedasticity. Steps:

1. Reorder the observations by the values of x_k.
2. Omit c central observations, so that we are left with two samples of (n − c)/2 observations each.
3. Let σ_1² (σ_2²) be the error variance of the first (second) sample. Test H_0: σ_1² = σ_2² vs H_1: σ_1² > σ_2².
4. Estimate the regression y = Xβ + ε on each subsample (which requires (n − c)/2 > K). Obtain e_1'e_1 and e_2'e_2, where e_1 and e_2 are the residual vectors from the two subsamples, respectively.
5. Form R* = e_1'e_1 / e_2'e_2.

It can be shown that under H_0,

    R* ~ F_{n*, n*},    (9)

where n* = (n − c − 2K)/2.
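
A hedged sketch of these steps on a built-in data set, suspecting that the disturbance variance is larger for heavier cars (the split variable and the choice of c are purely illustrative):

    sysuse auto, clear
    gsort -weight                        // step 1: suspected larger-variance observations first
    local n  = _N
    local c  = 14                        // omit roughly a fifth of the 74 observations
    local n1 = (`n' - `c')/2             // 30 observations in each subsample
    regress price mpg weight in 1/`n1'
    scalar rss1 = e(rss)                 // e_1'e_1 from the first subsample
    regress price mpg weight in `=`n' - `n1' + 1'/`n'
    scalar rss2 = e(rss)                 // e_2'e_2 from the second subsample
    scalar R  = rss1/rss2                // step 5
    scalar df = e(df_r)                  // (n - c)/2 - K
    display "GQ statistic = " R ",  p-value = " Ftail(df, df, R)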

Remark 7: c can be zero. Introducing c is intended to increase the power of the test. However, as c increases, (n − c)/2 decreases, which leaves fewer degrees of freedom for the estimation on each subsample, and this tends to diminish the power of the test. So there is a trade-off in choosing an appropriate c. Some studies suggest that no more than a third of the observations should be dropped; one suggested choice is c of roughly n/3.

Remark 8: The Goldfeld-Quandt statistic is exactly distributed as F under H_0 if the disturbances are normally distributed. If not, the F distribution is only an approximation.

4.3 Breusch-Pagan LM Test

The Goldfeld-Quandt test is reasonably powerful if we know, or are able to identify correctly, the variable to use for the sample separation. This limits its generality: for example, what if a set of regressors jointly determines the nature of the heteroskedasticity? In this regard, the Breusch-Pagan LM test is more general. Assume

    σ_i² = h(z_i'α),

where h(·) is some function, α is a coefficient vector unrelated to β, and z_i is a P × 1 vector of variables thought to cause the heteroskedasticity, with its first element equal to 1. Within this framework, if α_2 = α_3 = … = α_P = 0, then σ_i² = h(α_1) = σ², i.e., homoskedasticity. Therefore, we test

    H_0: α_2 = α_3 = … = α_P = 0 vs H_1: not H_0.

Steps:

1. Regress y on X and obtain the OLS residual vector e.
2. Compute σ̂² = e'e/n and g_i = e_i²/σ̂².
3. Estimate, by OLS, the auxiliary regression

    g_i = α_1 + α_2 z_i2 + α_3 z_i3 + … + α_P z_iP + v_i.    (10)

4. Compute the regression (explained) sum of squares,

    SSR = ∑_{i=1}^n (ĝ_i − ḡ)²,  where ḡ = (1/n) ∑_{i=1}^n g_i.    (11)

Under H_0,

    LM = SSR/2 ~a χ²_{P−1}.    (12)

STATA Tips: To perform the Breusch-Pagan LM test in STATA, you can use the hettest or the bpagan command after the regress command on the original model (bpagan is unofficial and thus needs to be downloaded). The syntax is

    hettest var_list

where var_list specifies the variables in z_i other than the constant. The same syntax applies to bpagan.
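
The LM statistic is also easy to compute by hand. The sketch below uses a built-in data set and takes z_i to be the regressors themselves, which is only one possible choice:

    sysuse auto, clear
    regress price mpg weight
    predict e, resid
    generate g = e^2 / (e(rss)/e(N))     // step 2: g_i = e_i^2 / sigma-hat^2, with sigma-hat^2 = e'e/n
    regress g mpg weight                 // step 3: auxiliary regression of g_i on z_i
    scalar LM = e(mss)/2                 // step 4: half the regression sum of squares
    display "BP LM = " LM ",  p-value = " chi2tail(2, LM)   // P - 1 = 2 restrictions

Running estat hettest mpg weight after the original regression should reproduce this statistic.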

5 Generalized Least Squares Estimator

5.1 Weighted Least Squares when Ω Is Known

Suppose the variance matrix of ε is given by (2), where Ω is known. Without loss of generality, we may write

    σ_i² = σ² ω_i,    (13)

so that

    Ω = diag(ω_1, ω_2, …, ω_n).    (14)

Now consider the weighting matrix

    P = diag(1/√ω_1, 1/√ω_2, …, 1/√ω_n).    (15)

Hence P'P = Ω^{-1}, and

    Py = (y_1/√ω_1, y_2/√ω_2, …, y_n/√ω_n)',  while PX has typical row x_i'/√ω_i.

Regressing Py on PX by OLS gives the GLS estimator,

    β̂ = (X'P'PX)^{-1} X'P'Py = (X'Ω^{-1}X)^{-1} X'Ω^{-1}y = [∑_{i=1}^n w_i x_i x_i']^{-1} [∑_{i=1}^n w_i x_i y_i],    (16)

where w_i = 1/ω_i. In this case, β̂ is also called the weighted least squares (WLS) estimator.
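
To see the transformation in (15) and (16) in action, here is a minimal sketch on a built-in data set in which ω_i is taken, purely for illustration, to be proportional to weight. Each variable, including the constant, is divided by √ω_i, and OLS is then run on the transformed data without a constant:

    sysuse auto, clear
    generate sw     = sqrt(weight)            // sqrt(omega_i), taking omega_i = weight
    generate ystar  = price/sw
    generate cstar  = 1/sw                    // transformed constant term
    generate x1star = mpg/sw
    regress ystar x1star cstar, noconstant    // OLS on Py and PX, i.e., equation (16)
    regress price mpg [aweight=1/weight]      // the same coefficients via analytic weights

The second regression anticipates the aweight syntax discussed below: the point estimates should coincide with those from the transformed regression.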

A common specification is that the variance is proportional to one of the regressors or to its square. For example, if

    σ_i² = σ² x_ik²    (17)

for some k, then the transformed regression model for GLS (or WLS) is

    y/x_k = β_k + β_1 (1/x_k) + ∑_{j≠1,k} β_j (x_j/x_k) + ε/x_k,    (18)

where x_1 = 1 is the constant. If the variance is proportional to x_k rather than to x_k², then the weight applied to each observation is 1/√x_k instead of 1/x_k.

STATA Tips: In STATA, you can perform WLS either by transforming the data manually and then running OLS, or by using the aweight feature of the regress command. The syntax is as follows:

    regress y x1 x2 ... xK [aweight=var_name]

The weight to be used is 1/ω_i. For example, if σ_i² = σ² x_ik², you should first generate a variable, say w, equal to 1/x_ik², and then write [aweight=w] in the regress command. If σ_i² = σ² x_ik, then w should be 1/x_ik.
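
As a usage example of this tip with the specification in (17), suppose (hypothetically) that in the credit card model of Example 2 the disturbance variance were believed proportional to INCOME²; with that data set in memory one would type:

    generate w = 1/(income^2)                        // Var(eps_i) proportional to income^2
    regress exp age income income2 owner [aweight=w]
    * if the variance were proportional to income instead, use  generate w = 1/income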

5.2 Estimation when Ω Is Unknown

It is rare that the form of Ω is known, so usually it has to be estimated. The general form of the heteroskedastic regression model has too many parameters to estimate, so typically the model is restricted by formulating σ²Ω as a function of a few parameters α. Write this function as Ω(α). FGLS based on a consistent estimator of Ω(α) is asymptotically equivalent to full GLS. Recall that for the heteroskedastic model, the GLS estimator is

    β̂ = [∑_{i=1}^n (1/σ_i²) x_i x_i']^{-1} [∑_{i=1}^n (1/σ_i²) x_i y_i].    (19)

Basically, we first need estimates of the σ_i², say σ̂_i², usually obtained from some function of the OLS residuals; we can then compute β̂ from (19) using the σ̂_i². Note that E(ε_i²) = σ_i², so

    ε_i² = σ_i² + v_i,    (20)

where v_i is the difference between ε_i² and its expectation. Since ε_i is unobservable, we use the least squares residuals instead, for which

    e_i = ε_i − x_i'(b − β) = ε_i + u_i,    (21)

so that

    e_i² = ε_i² + u_i² + 2 ε_i u_i.    (22)

However, we know that b is consistent, i.e., b →p β. Therefore the terms involving u_i become negligible, and approximately

    e_i² = σ_i² + v_i.    (23)

The above reasoning leads to the following estimation strategy. If σ_i² = h(z_i'α), where z_i may or may not coincide with x_i, then we can obtain a consistent estimator of α by estimating

    e_i² = h(z_i'α) + v_i.    (24)

Using the fitted values of e_i², say ê_i², in place of σ_i² in (19), we can construct β̂, the feasible generalized least squares (FGLS) estimator. This method is called two-step estimation.

A common functional form for h(·) is the exponential. Suppose we have the model

    y_i = β_1 + β_2 x_i2 + … + β_K x_iK + ε_i,  where ε_i ~ (0, σ_i²).    (25)

We may write

    σ_i² = exp(α_1 + α_2 z_i2 + … + α_P z_iP) v_i,    (26)

where v_i is uncorrelated with the z's and has an expectation of 1. Then

    ln σ_i² = α_1 + α_2 z_i2 + … + α_P z_iP + ln v_i.    (27)

In this case, the procedure for obtaining the FGLS estimator is the following:

1. Regress y on (1, x_2, …, x_K) and obtain the residuals e_i.
2. Compute ln(e_i²) and use it as the dependent variable in model (27); obtain the fitted values of ln(e_i²) from this regression.
3. Compute ĥ_i = exp(fitted value of ln(e_i²)) and its reciprocal, w_i = 1/ĥ_i.
4. Use w_i as the weight to compute the weighted least squares estimator of β.
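
A sketch of this two-step procedure on a built-in data set, taking z_i to be the regressors themselves (an illustrative choice, not a recommendation):

    sysuse auto, clear
    regress price mpg weight                 // step 1: OLS on the original model
    predict e, resid
    generate lne2 = ln(e^2)                  // step 2: dependent variable ln(e_i^2)
    regress lne2 mpg weight                  // estimate the variance function (27)
    predict lnsig2hat, xb                    // fitted values of ln(sigma_i^2)
    generate w = 1/exp(lnsig2hat)            // step 3: h-hat_i = exp(fitted), w_i = 1/h-hat_i
    regress price mpg weight [aweight=w]     // step 4: weighted least squares = FGLS

The final line reports the FGLS coefficients; comparing them and their standard errors with the original OLS output gives a sense of how much the heteroskedasticity correction matters in this illustration.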