Heteroskedasticity

We now consider the implications of relaxing the assumption that the conditional variance $V(u_i \mid x_i) = \sigma^2$ is common to all observations $i = 1, \ldots, N$. In many applications, we may suspect that the conditional variance of the error term varies over the observations, in ways that may be difficult to model convincingly.

For example, the variance of shocks to GDP per capita may be quite different for developing countries that are dependent on primary commodity exports compared to developing countries with more diversified export structures, or compared to OECD countries. Or the variance of shocks to firm-level total factor productivity (TFP) may be quite different for firms in high-tech sectors compared to firms in low-tech sectors, or for new entrants compared to incumbent firms.

Allowing the conditional variance $V(u_i \mid x_i) = \sigma_i^2$ to be different for different observations $i = 1, \ldots, N$ is referred to as conditional heteroskedasticity. We have seen that the consistency property of the OLS estimator does not require the assumption of conditional homoskedasticity. We now show that the asymptotic normality property of the OLS estimator does not require conditional homoskedasticity either. The asymptotic variance of the OLS estimator has a different form in the more general case of conditional heteroskedasticity, but can still be estimated consistently.

This allows us to extend the asymptotic Wald tests of restrictions on the parameter vector $\beta$ to this more general setting. As before, we start with a sufficient set of assumptions to obtain the asymptotic normality result, and to derive the asymptotic variance of the OLS estimator, in the case of conditional heteroskedasticity.

(i) $y_i = x_i'\beta + u_i$ for $i = 1, \ldots, N$, or $y = X\beta + u$

(ii) The data on $(y_i, x_i)$ are independent over $i = 1, \ldots, N$, with $E(u_i) = 0$ and $E(x_i u_i) = 0$ for all $i = 1, \ldots, N$, but now with $E(u_i^2 \mid x_i) = \sigma_i^2$

(iii) $X$ is stochastic and full rank

(iv) The $K \times K$ matrix $M_{XX} = \operatorname{plim}\left(\frac{X'X}{N}\right) = \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i x_i'$ exists and is non-singular

(v) The $K \times 1$ vector $\frac{X'u}{\sqrt{N}} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} x_i u_i \xrightarrow{D} N(0, M_{X\Omega X})$, where $M_{X\Omega X} = \operatorname{plim}\left(\frac{X'uu'X}{N}\right) = \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} u_i^2 x_i x_i'$

Then

$$\sqrt{N}(\hat\beta_{OLS} - \beta) \xrightarrow{D} N\left(0,\; M_{XX}^{-1} M_{X\Omega X} M_{XX}^{-1}\right)$$

Assumption (ii), that $E(u_i^2 \mid x_i) = \sigma_i^2$ with independent observations over $i = 1, \ldots, N$, implies that the conditional variance matrix $E(uu' \mid X) = \Omega$ is an $N \times N$ matrix with elements $\sigma_i^2$ on its main diagonal, and zeros elsewhere, i.e.

$$E(uu' \mid X) = \Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_N^2 \end{pmatrix}$$

Note that there are $N$ parameters $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$ in $\Omega$.

Consequently we cannot estimate $\Omega$ consistently from a sample with $N$ observations: the number of parameters to be estimated increases at the same rate as the sample size. Happily, we do not require a consistent estimator of $\Omega$ to obtain a consistent estimator of

$$\operatorname{avar}(\hat\beta_{OLS}) = \left(\frac{1}{N}\right) M_{XX}^{-1} M_{X\Omega X} M_{XX}^{-1}$$

The proof of this asymptotic normality result follows the same steps that we went through in detail for the case of conditional homoskedasticity. We write

$$\sqrt{N}(\hat\beta_{OLS} - \beta) = \left(\frac{X'X}{N}\right)^{-1}\left(\frac{X'u}{\sqrt{N}}\right)$$

The $K \times K$ matrix $\left(\frac{X'X}{N}\right)^{-1} \xrightarrow{P} M_{XX}^{-1}$ from assumption (iv). The $K \times 1$ vector $\left(\frac{X'u}{\sqrt{N}}\right) \xrightarrow{D} N(0, M_{X\Omega X})$ from assumption (v). The product rule then implies that $\left(\frac{X'X}{N}\right)^{-1}\left(\frac{X'u}{\sqrt{N}}\right)$ has the same limit distribution as $M_{XX}^{-1}\left(\frac{X'u}{\sqrt{N}}\right)$, so that

$$\sqrt{N}(\hat\beta_{OLS} - \beta) \xrightarrow{D} N\left(0,\; M_{XX}^{-1} M_{X\Omega X} M_{XX}^{-1}\right)$$

using the symmetry of $M_{XX}^{-1}$.

As before, assumption (v) can be derived from more primitive assumptions that allow an appropriate (Liapounov) Central Limit Theorem for independent but not identically distributed random vectors to be used to establish that

$$\frac{1}{\sqrt{N}}\sum_{i=1}^{N} x_i u_i \xrightarrow{D} N(0, M_{X\Omega X}), \quad \text{where } M_{X\Omega X} = \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} u_i^2 x_i x_i'$$

Again this result cannot be applied directly to time series models with lagged dependent variables, since these violate the assumption of independent observations. Section 2.3 of Hayashi (2000) provides a more general asymptotic normality result for the OLS estimator, which covers this case.

The assumption that $(y_i, x_i)$ are independent over $i = 1, \ldots, N$ is replaced by the weaker assumption that the stochastic process $\{y_i, x_i\}$ is stationary and ergodic. A stochastic process $\{z_t\}$ $(t = 1, 2, \ldots)$ is (strictly) stationary if the joint distribution of (e.g.) $(z_1, z_2, z_3)$ is the same as the joint distribution of $(z_{101}, z_{102}, z_{103})$. A stationary stochastic process is ergodic if two random variables $z_t$ and $z_{t+k}$ become (almost) independent as $k$ increases. Note however that for time series models with lagged dependent variables, the assumption that $E(x_t u_t) = 0$ for $t = 1, \ldots, T$ rules out serial correlation in the errors $u_t$.

For example, if we have the simple dynamic model

$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 x_t + u_t$$

then

$$y_{t-1} = \beta_0 + \beta_1 y_{t-2} + \beta_2 x_{t-1} + u_{t-1}$$

So $u_{t-1}$ is certainly correlated with $y_{t-1}$. If $u_t$ is also correlated with $u_{t-1}$, which is what we mean by serial correlation of the error term in this context (e.g. if $u_t = \rho u_{t-1} + e_t$, where $e_t$ is serially uncorrelated), then $y_{t-1}$ and $u_t$ will be correlated, violating the assumption that $E[(1, y_{t-1}, x_t)' u_t] = 0$. In this case, the OLS estimator is inconsistent.

Now consider the result

$$\sqrt{N}(\hat\beta_{OLS} - \beta) \xrightarrow{D} N\left(0,\; M_{XX}^{-1} M_{X\Omega X} M_{XX}^{-1}\right)$$

As before, we can use this limit distribution for $\sqrt{N}(\hat\beta_{OLS} - \beta)$ to obtain an approximation to the distribution of $\hat\beta_{OLS}$ that will be accurate in large (but finite) samples. We have

$$\hat\beta_{OLS} \overset{a}{\sim} N\left(\beta,\; \operatorname{avar}(\hat\beta_{OLS})\right), \quad \text{where } \operatorname{avar}(\hat\beta_{OLS}) = \left(\frac{1}{N}\right) M_{XX}^{-1} M_{X\Omega X} M_{XX}^{-1}$$

To make this useful, we require consistent estimators for the $K \times K$ matrices $M_{XX}$ and $M_{X\Omega X}$. We have already seen that $\hat M_{XX} = \left(\frac{X'X}{N}\right)$ provides a consistent estimator of $M_{XX}$. Similarly

$$\hat M_{X\Omega X} = \frac{1}{N}\sum_{i=1}^{N} \hat u_i^2 x_i x_i'$$

provides a consistent estimator of $M_{X\Omega X}$, since

$$\operatorname{plim} \hat M_{X\Omega X} = \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} \hat u_i^2 x_i x_i' = \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} u_i^2 x_i x_i' = M_{X\Omega X}$$

given that $\hat u_i = y_i - x_i'\hat\beta_{OLS} \xrightarrow{P} u_i$.

Then

$$\hat M_{XX}^{-1} \hat M_{X\Omega X} \hat M_{XX}^{-1} = \left(\frac{X'X}{N}\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N}\hat u_i^2 x_i x_i'\right)\left(\frac{X'X}{N}\right)^{-1} = N\,(X'X)^{-1}\left(\sum_{i=1}^{N}\hat u_i^2 x_i x_i'\right)(X'X)^{-1}$$

provides a consistent estimator of $M_{XX}^{-1} M_{X\Omega X} M_{XX}^{-1}$. And

$$\widehat{\operatorname{avar}}(\hat\beta_{OLS}) = \left(\frac{1}{N}\right)\hat M_{XX}^{-1} \hat M_{X\Omega X} \hat M_{XX}^{-1} = (X'X)^{-1}\left(\sum_{i=1}^{N}\hat u_i^2 x_i x_i'\right)(X'X)^{-1}$$

provides a consistent estimator of $\operatorname{avar}(\hat\beta_{OLS}) = \left(\frac{1}{N}\right) M_{XX}^{-1} M_{X\Omega X} M_{XX}^{-1}$.

Now we have

$$\hat\beta_{OLS} \overset{a}{\sim} N\left(\beta,\; \widehat{\operatorname{avar}}(\hat\beta_{OLS})\right), \quad \text{where } \widehat{\operatorname{avar}}(\hat\beta_{OLS}) = (X'X)^{-1}\left(\sum_{i=1}^{N}\hat u_i^2 x_i x_i'\right)(X'X)^{-1}$$

We can compute $\widehat{\operatorname{avar}}(\hat\beta_{OLS})$ using the data on $X$ and the OLS residuals $\hat u_i$. We can then construct asymptotic t-test and Wald test statistics as before, using this heteroskedasticity-consistent estimator of $\operatorname{avar}(\hat\beta_{OLS})$ in place of the estimator we obtained in the special case of conditional homoskedasticity.
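To make the computation concrete, here is a minimal numpy sketch of this "sandwich" estimator. It is not from the original notes: the function name, the simulated data, and the design with error variance increasing in the regressor are all illustrative assumptions.

```python
import numpy as np

def ols_white_se(X, y):
    """OLS coefficients with White (HC0) heteroskedasticity-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                 # OLS estimator
    u_hat = y - X @ beta                     # OLS residuals
    meat = (X * u_hat[:, None] ** 2).T @ X   # sum_i u_hat_i^2 x_i x_i'
    avar = XtX_inv @ meat @ XtX_inv          # (X'X)^{-1} (sum ...) (X'X)^{-1}
    return beta, np.sqrt(np.diag(avar))      # robust standard errors

# Illustrative data: error standard deviation proportional to the regressor
rng = np.random.default_rng(0)
N = 1_000
x = rng.uniform(1, 5, N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, x)   # heteroskedastic errors
beta, se = ols_white_se(X, y)
print(beta, se)
```

With data of this kind, the robust standard errors will typically differ noticeably from the conventional $\hat\sigma^2(X'X)^{-1}$ standard errors, since the error variance depends on $x$.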

This heteroskedasticity-consistent estimator for the asymptotic variance of the OLS estimator was introduced into the econometrics literature in a paper by White (Econometrica, 1980), one of the most cited papers in econometrics (or economics) of the last 30 years. Similar ideas can be found much earlier in the statistics literature, in papers by Huber and by Eicker (both in 1967). The square roots of the elements on the main diagonal of $\widehat{\operatorname{avar}}(\hat\beta_{OLS})$ are variously referred to as heteroskedasticity-consistent standard errors, heteroskedasticity-robust standard errors, or White standard errors (or some combination of Eicker-Huber-White standard errors).

Heteroskedasticity-robust standard errors and test statistics are available in most econometric software. To obtain these in Stata, we can use the vce(robust) option within the regress command, e.g. reg y x1 x2, vce(r). Asymptotic inference based on the non-robust estimator $\widehat{\operatorname{avar}}(\hat\beta_{OLS}) = \hat\sigma^2(X'X)^{-1}$ that we derived under conditional homoskedasticity is valid only under this restrictive assumption.
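For readers working in Python rather than Stata, a hedged equivalent (assuming the statsmodels package, and reusing X and y from the sketch above) would be:

```python
import statsmodels.api as sm

# cov_type='HC1' applies an N/(N-K) small-sample correction, close to what
# Stata's vce(robust) reports; 'HC0' is the uncorrected White estimator
# derived above
results = sm.OLS(y, X).fit(cov_type="HC1")
print(results.bse)   # heteroskedasticity-robust standard errors
```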

But since $E(u_i^2 \mid x_i) = \sigma^2$ is a special case of the more general assumption $E(u_i^2 \mid x_i) = \sigma_i^2$, asymptotic inference based on the robust estimator

$$\widehat{\operatorname{avar}}(\hat\beta_{OLS}) = (X'X)^{-1}\left(\sum_{i=1}^{N}\hat u_i^2 x_i x_i'\right)(X'X)^{-1}$$

that we derived under conditional heteroskedasticity is also valid (in large samples) if the model happens to satisfy conditional homoskedasticity. At no point do we obtain a consistent estimator of the conditional variance matrix $\Omega = E(uu' \mid X)$. All that we need is a consistent estimator of the $K \times K$ matrix $M_{X\Omega X} = \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} u_i^2 x_i x_i'$, which as we have seen can be estimated consistently as $N \to \infty$ with $K$ fixed.

In applications where large data samples are available, a common response to the suspicion that conditional heteroskedasticity may be relevant is to continue to use the OLS estimator, and to use heteroskedasticity-consistent standard errors (and test statistics) in place of the traditional standard errors (and test statistics). The OLS estimator remains consistent, and we have a consistent estimator of the asymptotic variance matrix, so asymptotic inference remains valid in large samples.

That is, we will reject a correct null hypothesis approximately 5% of the time at the 5% significance level (the level or size of the test is approximately correct). And the probability of rejecting a false null hypothesis (the power of the test) increases with the sample size, tending to one in the limit as $N \to \infty$ (the test is said to be consistent). This is sometimes referred to as a passive response to heteroskedasticity: the OLS estimator is not asymptotically efficient in the case of conditional heteroskedasticity, but can still be used to conduct valid hypothesis tests in large samples.

Testing for heteroskedasticity becomes less important if we have the luxury of using large data samples, and are content to follow this passive strategy. Various tests are available that have power to detect conditional heteroskedasticity in the OLS residuals, or to reject the null hypothesis of conditional homoskedasticity. White (Econometrica, 1980) suggested regressing the squared OLS residuals $\hat u_i^2$ on a constant and on all the explanatory variables, their squares and their cross-products.

For example, in the model with $K = 3$ and an intercept term

$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + u_i$$

we run the auxiliary regression

$$\hat u_i^2 = \gamma_1 + \gamma_2 x_{2i} + \gamma_3 x_{3i} + \gamma_4 x_{2i}^2 + \gamma_5 x_{3i}^2 + \gamma_6 (x_{2i} x_{3i}) + v_i$$

and test the restriction $H_0: \gamma_2 = \gamma_3 = \ldots = \gamma_6 = 0$ (which is implied by the conditional homoskedasticity assumption $E(u_i^2 \mid x_i) = \sigma^2$ for all $i = 1, \ldots, N$). A sketch of this test is given below.
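The following is a minimal numpy sketch of this test, using the common LM form of White's statistic ($N R^2$ from the auxiliary regression, asymptotically $\chi^2$ with 5 degrees of freedom under $H_0$); the function name and the data-generating step are purely illustrative.

```python
import numpy as np
from scipy import stats

def white_test(X, y):
    """White's test: regress squared OLS residuals on the regressors, their
    squares and cross-products; the LM statistic is N * R^2 of that regression."""
    N = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u2 = (y - X @ beta) ** 2                 # squared OLS residuals
    x2, x3 = X[:, 1], X[:, 2]
    Z = np.column_stack([np.ones(N), x2, x3, x2**2, x3**2, x2 * x3])
    g = np.linalg.lstsq(Z, u2, rcond=None)[0]
    resid = u2 - Z @ g
    r2 = 1 - (resid @ resid) / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    lm = N * r2
    df = Z.shape[1] - 1                      # gamma_2 = ... = gamma_6 = 0
    return lm, stats.chi2.sf(lm, df)

# Illustrative data with an intercept and two regressors (K = 3)
rng = np.random.default_rng(1)
N = 2_000
x2, x3 = rng.normal(size=N), rng.normal(size=N)
X = np.column_stack([np.ones(N), x2, x3])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, np.exp(0.5 * x2))
print(white_test(X, y))   # small p-value: homoskedasticity rejected
```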

The basic idea is that we specify $\sigma_i^2 = E(u_i^2 \mid x_i)$ to be some unknown function $f(z_i)$ of a vector of observed variables $z_i$. We use the squared residuals $\hat u_i^2$ as a proxy for $u_i^2$, and we use a polynomial approximation to the unknown function $f(z_i)$. Earlier tests for heteroskedasticity based on similar ideas include those proposed by Glejser (JASA, 1969), Ramsey (JRSS(B), 1969), Goldfeld and Quandt (1972) and Breusch and Pagan (Econometrica, 1979).

In some models with conditional heteroskedasticity, we can obtain more efficient estimators than OLS if we are willing to model the form that this heteroskedasticity takes. This active response to heteroskedasticity may be more appropriate in applications where efficiency is considered to be a more important concern. To see the basic idea, we first consider a version of the classical linear regression model with a known form of conditional heteroskedasticity.

Generalized Least Squares

We assume

$E(y \mid X) = X\beta$

$V(y \mid X) = \Omega$, with $\Omega \neq \sigma^2 I_N$ a known, positive definite conditional variance matrix

$X$ is stochastic and full rank

Because $\Omega$ is positive definite and known, we can find a non-stochastic matrix $H$ such that $H'H = \Omega^{-1}$ and $H\Omega H' = I_N$.

Let $y^* = Hy$ and $X^* = HX$. Now

$E(y^* \mid X) = H\,E(y \mid X) = HX\beta = X^*\beta$

$V(y^* \mid X) = H\,V(y \mid X)\,H' = H\Omega H' = I_N$ ($= \sigma^2 I_N$ for $\sigma^2 = 1$)

$X^* = HX$ is stochastic and full rank. The transformed model $y^* = X^*\beta + u^*$ with $V(u^* \mid X^*) = I_N$ is a classical linear regression model with conditional homoskedasticity, which satisfies the assumptions of the Gauss-Markov theorem.

Aitken's theorem: the OLS estimator of $\beta$ in this transformed model, known as the Generalized Least Squares (GLS) estimator, is efficient (in the class of linear, unbiased estimators):

$$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'H'HX)^{-1}X'H'Hy = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$$

We can replace $H$ here by any matrix which is proportional to $H$, and still obtain the GLS estimator. If we use $\tilde H = aH$ for some scalar $a$, we have the transformed variables $\tilde y = \tilde H y = aHy$ and $\tilde X = \tilde H X = aHX$, giving

$$(\tilde X'\tilde X)^{-1}\tilde X'\tilde y = (a^2 X'H'HX)^{-1}a^2 X'H'Hy = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = \hat\beta_{GLS}$$
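A minimal numpy sketch of GLS by transformation, assuming $\Omega$ is known (here we factor $\Omega^{-1}$ with a Cholesky decomposition to obtain a valid $H$); the function name is an illustrative assumption, not from the original notes.

```python
import numpy as np

def gls(X, y, Omega):
    """GLS as OLS on the transformed model y* = Hy, X* = HX,
    where H'H = Omega^{-1} (so that H Omega H' = I)."""
    L = np.linalg.cholesky(np.linalg.inv(Omega))   # Omega^{-1} = L L'
    H = L.T                                        # then H'H = L L' = Omega^{-1}
    Xs, ys = H @ X, H @ y
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]
    # Equivalent direct formula, with Oi = np.linalg.inv(Omega):
    #   np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)
```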

Under the further normality assumption $y \mid X \sim N(X\beta, \Omega)$, the GLS estimator is also the conditional Maximum Likelihood estimator in this particular model where $\Omega$ is known. $\hat\beta_{GLS}$ also has a normal distribution in this case, with $\hat\beta_{GLS} \mid X \sim N\left(\beta, (X'\Omega^{-1}X)^{-1}\right)$.

In practice this GLS estimator cannot be computed, since we don't know the conditional variance matrix $\Omega$. The Feasible Generalized Least Squares (FGLS) estimator replaces the unknown $\Omega$ by an estimator $\hat\Omega$, giving

$$\hat\beta_{FGLS} = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y$$

This can be computed using $\hat y^* = \hat H y$ and $\hat X^* = \hat H X$, where we now require $\hat H'\hat H = \hat\Omega^{-1}$ and $\hat H \hat\Omega \hat H' = I_N$ [or we can use any matrix that is proportional to $\hat H$].

The properties of the FGLS estimator depend on the properties of $\hat\Omega$ as an estimator of $\Omega$. If $\hat\Omega$ is a consistent estimator of $\Omega$, then under quite general conditions we find that $\hat\beta_{FGLS}$ has the same asymptotic distribution as the infeasible $\hat\beta_{GLS}$, giving

$$\hat\beta_{FGLS} \overset{a}{\sim} N\left(\beta, (X'\Omega^{-1}X)^{-1}\right)$$

In this case, $\hat\beta_{FGLS}$ is also asymptotically efficient. However, obtaining a consistent estimator of $\Omega$ is not straightforward.

In general, the symmetric $N \times N$ matrix $\Omega$ has $N(N+1)/2$ distinct elements. Even if we restrict all the off-diagonal elements to be zero, as is natural in a cross-section regression context with independent observations, so that

$$\Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_N^2 \end{pmatrix}$$

we still have $N$ distinct elements. These cannot be estimated consistently from a sample of size $N$.

Consistent estimation requires us to specify a (parametric) model for $\Omega$ of the form $\Omega = \Omega(\varphi)$, where $\Omega(\varphi)$ is a function of the vector $\varphi$, which contains a finite number of additional parameters, not increasing with the sample size, which can be estimated consistently from the data. If this specification of the conditional heteroskedasticity $V(y \mid X) = \Omega(\varphi)$ is correct, and we can find a consistent estimator $\hat\varphi$ for $\varphi$, we can then use the consistent estimator $\hat\Omega = \Omega(\hat\varphi)$ to obtain the FGLS estimator.

As a very simple example (in which implementation does not require the estimation of any additional parameters), we could specify the conditional variance $V(y_i \mid X) = V(u_i \mid X) = \sigma_i^2$ to be proportional to the squared values of one of the regressors, say $x_{Ki}$, giving $\sigma_i^2 = \sigma^2 x_{Ki}^2$. We then let $y_i^* = y_i / x_{Ki}$ and $x_{ki}^* = x_{ki} / x_{Ki}$ for each $k = 1, \ldots, K$ (notice how this transformation affects the intercept in the model). In the transformed model $y_i^* = x_i^{*\prime}\beta + u_i^*$ we then have

$$V(y_i^* \mid X) = V(u_i^* \mid X) = \frac{\sigma_i^2}{x_{Ki}^2} = \frac{\sigma^2 x_{Ki}^2}{x_{Ki}^2} = \sigma^2 \quad \text{for all } i = 1, \ldots, N$$

This transformed model then satisfies conditional homoskedasticity, and we can compute the FGLS estimator here simply as the OLS estimator in the transformed model. Feasible GLS estimators of this kind are also known as Weighted Least Squares estimators, since we weight each observation by a factor which is proportional to (an estimate of) $1/\sigma_i$. Note that the transformation gives less weight to observations where the variance of $u_i$ is (estimated to be) relatively high, and more weight to observations where the variance of $u_i$ is (estimated to be) relatively low. A short sketch of this weighting follows.
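Here is a minimal numpy sketch of this particular weighting scheme, assuming $\sigma_i^2 = \sigma^2 x_{Ki}^2$ with $x_{Ki}$ stored as the last column of X; the function name and the simulated data are illustrative.

```python
import numpy as np

def wls_divide_by_xK(X, y):
    """WLS under sigma_i^2 = sigma^2 * x_Ki^2: divide every variable by
    x_Ki (i.e. weight by 1/sigma_i up to scale), then run OLS on the result."""
    w = X[:, -1]                       # x_Ki, the last regressor
    Xs = X / w[:, None]                # the old intercept column becomes 1/x_Ki
    ys = y / w
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

# Illustrative data: error variance proportional to x_Ki^2
rng = np.random.default_rng(2)
N = 500
xK = rng.uniform(1, 10, N)
X = np.column_stack([np.ones(N), xK])
y = X @ np.array([2.0, 1.5]) + rng.normal(0, 1, N) * xK
print(wls_divide_by_xK(X, y))
```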

If our specification for the conditional variance $V(y \mid X) = \Omega(\varphi)$ is correct, and we estimate $\Omega(\varphi)$ consistently, this weighting is the source of the efficiency gain compared to OLS.

If we specify the conditional variance $V(y \mid X) = \Omega = \Omega(\varphi)$ and further assume that $y \mid X \sim N(X\beta, \Omega(\varphi))$, then the feasible GLS estimator is not the conditional Maximum Likelihood estimator in the case where $\varphi$ is unknown. FGLS uses a consistent estimator $\hat\varphi$ of $\varphi$ to construct a consistent estimator $\hat\Omega$ of $\Omega$, and then maximizes $L(\beta, \hat\Omega) = L(\beta, \Omega(\hat\varphi))$ with respect to $\beta$. The conditional Maximum Likelihood estimator maximizes the likelihood function $L(\beta, \Omega) = L(\beta, \Omega(\varphi))$ with respect to $\beta$ and $\varphi$ jointly.

These estimators are different, unless we happen to have $\hat\varphi = \hat\varphi_{ML}$, giving $\hat\Omega = \hat\Omega_{ML}$, in which case $L(\beta, \hat\Omega_{ML})$ is a concentrated likelihood function, and maximizing $L(\beta, \hat\Omega_{ML})$ with respect to $\beta$ also yields the conditional Maximum Likelihood estimator $\hat\beta_{ML}$. In most applications of Feasible GLS, we do not have $\hat\varphi = \hat\varphi_{ML}$. Then $\hat\beta_{FGLS} \neq \hat\beta_{ML}$, although they are asymptotically equivalent (i.e. they have the same asymptotic distribution) under quite general conditions.

The consistency properties of $\hat\beta_{FGLS}$ and $\hat\beta_{ML}$ in this linear model do not depend on the parametric specification of $\Omega = \Omega(\varphi)$ being correct. But the efficiency advantages of these estimators relative to $\hat\beta_{OLS}$ may not hold if this specification for the form of the conditional heteroskedasticity is incorrect. These estimators do not extend straightforwardly to linear models that do not satisfy the linear conditional expectation assumption $E(y_i \mid x_i) = x_i'\beta$.