Heteroskedasticity and Autocorrelation Consistent Standard Errors

Similar documents
Heteroskedasticity- and Autocorrelation-Robust Inference or Three Decades of HAC and HAR: What Have We Learned?

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic

HAR Inference: Recommendations for Practice

Single Equation Linear GMM with Serially Correlated Moment Conditions

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics

LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH

Single Equation Linear GMM with Serially Correlated Moment Conditions

HAC Corrections for Strongly Autocorrelated Time Series

The Size-Power Tradeoff in HAR Inference

The Functional Central Limit Theorem and Testing for Time Varying Parameters

Testing in GMM Models Without Truncation

Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing

A New Approach to Robust Inference in Cointegration

Asymptotic distribution of GMM Estimator

Chapter 11 GMM: General Formulas and Application

LONG RUN VARIANCE ESTIMATION AND ROBUST REGRESSION TESTING USING SHARP ORIGIN KERNELS WITH NO TRUNCATION

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Choice of Spectral Density Estimator in Ng-Perron Test: Comparative Analysis

Accurate Asymptotic Approximation in the Optimal GMM Frame. Stochastic Volatility Models

Department of Economics, UCSD UC San Diego

Robust Unit Root and Cointegration Rank Tests for Panels and Large Systems *

Economic modelling and forecasting

Heteroskedasticity in Panel Data

Heteroskedasticity in Panel Data

On the Long-Run Variance Ratio Test for a Unit Root

STAD57 Time Series Analysis. Lecture 23

CAE Working Paper # Fixed-b Asymptotic Approximation of the Sampling Behavior of Nonparametric Spectral Density Estimators

Statistics 349(02) Review Questions

Robust Nonnested Testing and the Demand for Money

Economics 308: Econometrics Professor Moody

A Fixed-b Perspective on the Phillips-Perron Unit Root Tests

Inference with Dependent Data Using Cluster Covariance Estimators

HETEROSKEDASTICITY, TEMPORAL AND SPATIAL CORRELATION MATTER

by Ye Cai and Mototsugu Shintani

Econometric Methods for Panel Data

ECE 636: Systems identification

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Cointegrating Regressions with Messy Regressors: J. Isaac Miller

THE ERROR IN REJECTION PROBABILITY OF SIMPLE AUTOCORRELATION ROBUST TESTS

Generalized Method of Moments (GMM) Estimation

On block bootstrapping areal data Introduction

Nonstationary Time Series:

HAR Inference: Recommendations for Practice

Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression

HAR Inference: Recommendations for Practice

CAE Working Paper # A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests. Nicholas M. Kiefer and Timothy J.

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations

POWER MAXIMIZATION AND SIZE CONTROL IN HETEROSKEDASTICITY AND AUTOCORRELATION ROBUST TESTS WITH EXPONENTIATED KERNELS

A New Approach to the Asymptotics of Heteroskedasticity-Autocorrelation Robust Testing

ROBUST MEASUREMENT OF THE DURATION OF

ECONOMETRICS HONOR S EXAM REVIEW SESSION

Time Series Examples Sheet

Lecture 4: Heteroskedasticity

ECONOMICS 210C / ECONOMICS 236A MONETARY HISTORY

Low-Frequency Econometrics

Testing Weak Convergence Based on HAR Covariance Matrix Estimators

Title. Description. Quick start. Menu. stata.com. xtcointtest Panel-data cointegration tests

Time series models in the Frequency domain. The power spectrum, Spectral analysis

E 4160 Autumn term Lecture 9: Deterministic trends vs integrated series; Spurious regression; Dickey-Fuller distribution and test

TitleReducing the Size Distortion of the.

STA 6857 Cross Spectra & Linear Filters ( 4.6 & 4.7)

ECON 616: Lecture 1: Time Series Basics

Inference with Dependent Data Using Cluster Covariance Estimators

9. AUTOCORRELATION. [1] Definition of Autocorrelation (AUTO) 1) Model: y t = x t β + ε t. We say that AUTO exists if cov(ε t,ε s ) 0, t s.

Lecture 9 SLR in Matrix Form

1 Class Organization. 2 Introduction

GMM - Generalized method of moments

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E.

Econ 423 Lecture Notes

Comparing Forecast Accuracy of Different Models for Prices of Metal Commodities

Economics 583: Econometric Theory I A Primer on Asymptotics

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

Basics: Definitions and Notation. Stationarity. A More Formal Definition

Low-Frequency Econometrics

Questions and Answers on Heteroskedasticity, Autocorrelation and Generalized Least Squares

Heteroscedasticity and Autocorrelation

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Fixed-b Asymptotics for t-statistics in the Presence of Time-Varying Volatility

Econ 582 Fixed Effects Estimation of Panel Data

Nonstationary Panels

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p

Introduction to Econometrics

Econometrics of financial markets, -solutions to seminar 1. Problem 1

Alternative HAC Covariance Matrix Estimators with Improved Finite Sample Properties

Module 3. Descriptive Time Series Statistics and Introduction to Time Series Models

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Long-Run Covariability

Introduction to Econometrics

Online Appendix. j=1. φ T (ω j ) vec (EI T (ω j ) f θ0 (ω j )). vec (EI T (ω) f θ0 (ω)) = O T β+1/2) = o(1), M 1. M T (s) exp ( isω)

Problem Set 2 Solution Sketches Time Series Analysis Spring 2010

Current Version: December, 1999 FULLY MODIFIED OLS FOR HETEROGENEOUS COINTEGRATED PANELS * Peter Pedroni. Indiana University

E 4101/5101 Lecture 9: Non-stationarity

POWER MAXIMIZATION AND SIZE CONTROL IN HETEROSKEDASTICITY AND AUTOCORRELATION ROBUST TESTS WITH EXPONENTIATED KERNELS

Time Series I Time Domain Methods

Robust Standard Errors to Spatial and Time Dependence in Aggregate Panel Models

8.2 Harmonic Regression and the Periodogram

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

Transcription:

NBER Summer Institute Minicourse What s New in Econometrics: ime Series Lecture 9 July 6, 008 Heteroskedasticity and Autocorrelation Consistent Standard Errors Lecture 9, July, 008

Outline. What are HAC SE s and why are they needed?. Parametric and Nonparametric Estimators 3. Some Estimation Issues (psd, lag choice, etc.) 4. Inconsistent Estimators Lecture 9, July, 008

. What are HAC SE s and why are they needed? Linear Regression: y t = x t β + e t ˆ β = xx t t' xy t t' = β + xx t t' xe t t' t= t= t= t= t t t t XX t= t= ˆ ( β β) = x x ' xe ' = S A. d S XX p Σ XX = E(x t x t ), A N (0, Ω ), where Ω = lim Var hus a ˆ β N β, V, where HAC problem : ˆΩ =??? Vˆ = S Ω ˆS XX XX t = xe t t ' Lecture 9 3, July, 008

(Same problem arises in IV regression, GMM, ) Notation: Ω = lim Var t = xe t t ' Let w t = x t e t (assume this is a scalar for convenience) Feasible estimator of Ω must use ˆ w = xeˆ, t t t ( ˆ t ) ( w t ) he estimator is Ω ˆ { w }, but most of our discussion uses ˆ { } convenience. Ω for Lecture 9 4, July, 008

An expression for Ω: Suppose w t is covariance stationary with autocovariances γ. (Also, for notational convenience, assume w t is a scalar). hen Ω= var( wt) = var( w+ w +... + w) t= = { γ0 + ( )( γ+ γ ) + ( )( γ + γ ) +... ( γ + γ )} = γ ( γ + γ ) = + = If the autocovariances are -summable so that Ω = var( w ) t t= = γ γ < then Recall spectrum of w at frequency ω is S(ω) = i γ e ω, so that Ω = π = π S(0). Ω is called the Long-run variance of w. Lecture 9 5, July, 008

II. Estimators (a) Parametric Estimators, w t ~ ARMA φ(l)w t = θ(l)ε t, S(ω) = iω iω θ( e ) θ( e ) σ ε iω iω π φ( e ) φ( e ) Ω = π S(0) = σ ( θ θ θ ) θ () q ε = σ ε φ() ( φ φ φp) Ω= ˆ ˆ σ ε ( ˆ θ ˆ ˆ θ θq) ( ˆ φ ˆ φ ˆ φ ) p Lecture 9 6, July, 008

Jargon: VAR-HAC is a version of this. Suppose w t is a vector and the estimated VAR using w t is w =Φ ˆ w +... +Φ ˆ w + ˆ ε, t t p t p t where ˆ Σ ε = ˆˆ' εtεt t= hen Ω= ˆ ( I Φ ˆ... Φ ˆ ) Σ ˆ ( I Φ ˆ... Φ ˆ ) ' p ε p Lecture 9 7, July, 008

(b) Nonparametric Estimators Ω= γ = m Ω= ˆ = m K ˆ γ with ˆ γ = ww = ˆ γ ( 0) t t t= + Lecture 9 8, July, 008

III. Issues (a) ˆΩ 0? (b) Number of lags (m) in non-parameter estimator or order of ARMA in parametric estimator (c) form of K weights Lecture 9 9, July, 008

(a) ˆΩ 0? Ω= ˆ ˆ σ ε ( ˆ θ ˆ ˆ θ θq) ( ˆ φ ˆ φ ˆ φ ) p (yes) m Ω= ˆ K ˆ γ (Not necessarily) = m ˆ ( ˆ ˆ ˆ ) MA() Example: Ω = (γ + γ 0 + γ ) and Ω = γ + γ0 + γ γ γ 0 / but ˆ γ ˆ > γ 0 / is possible. Newey-West (987): Use K = m + (Bartlett Kernel ) Lecture 9 0, July, 008

An alternative expression for ˆ m Ω= = m K ˆ γ Let W = (w w w ) where W ~ (0, Σ WW ), U = HW, U ~ (0, Σ UU ) with Σ UU = HΣ WW H. Choose H so that Σ UU = D a diagonal matrix. A useful result: with W covariance stationary (Σ WW is special ), a particular H matrix yields (approximately) Σ UU = D (diagonal). (he u s are the coefficients from the discrete Fourier transform of the w s.) Variable π Variance (D ii ) u S(0) u and u 3 S( π/) u 4 and u 5 S( π/) u 6 and u 7 S(3 π/) u and u ( odd, similar for even) S([( )/] π/) Lecture 9, July, 008

Some algebra shows that m Ω= ˆ K ˆ γ = = m ku 0 ( )/ u + u + + k = where the k-weights are functions (Fourier transforms) of the K-weights. he algebraic details are unimportant for our purposes but, for those interested, they are sketched on the next slide. Jargon: the term u + u + is called the th periodogram ordinates. Because π Eu ( ) = Eu ( + ) = π S, the th periodogram ordinate is an estimate of the spectrum at frequency (π/). he important point: Evidently, ˆΩ 0 requires k 0 for all. Lecture 9, July, 008

he algebra linking the data w, the u s, the periodogram ordinates and the sample covariances parallels the expressions the spectrum presented in lecture. (Ref: Fuller (976), or Priestly (98), or Brockwell and Davis (99)). In particular, let c = iω t we t where ω = π/ t= hen u where ˆp γ = + u + = c ˆ = γ p p p= t= p+ cos( ω ), ( w w)( w w) t t p hus ω. u + u + = c is a natural estimator of the spectrum at frequency Lecture 9 3, July, 008

m Ω= ˆ K ˆ γ = = m ku 0 ( )/ u + u + + k = is then a weighted average (with weights k of the estimated spectra at difference frequencies. Because the goal is to estimate at ω = 0, the weights should concentrate around this frequency as gets large. Jargon: Kernel (K-weights, sometimes k-weights), Lag Window (Kweights, RAS follows this notation and calls these lwindow in the code), Spectral Window (k-weights) Lecture 9 4, July, 008

runcated Kernel: K = ( m) Spectral Weight (k) (m =, =50) Lecture 9 5, July, 008

Newey-West (Bartlet Kernel): K = m + Spectral Weight (k) (m =, =50) Lecture 9 6, July, 008

Implementing Estimator (lag order, kernel choice and so forth) Parametric Estimators: AR/VAR approximations: Berk (974) Ng and Perron (00), lag lengths in ADF tests. Related, but different issues. Shorter lags: Larger Bias, Smaller Variance Longer lags: Smaller Bias, Larger Variance Practical advice: Choose lags a little longer than you might otherwise (rational given below). Lecture 9 7, July, 008

How Should m be chosen? Andrews (99) and Newey and West (994): minimize mse( ˆΩ Ω) MSE = Variance + Bias ( )/ ˆ u + u + Ω= ku 0 + k = Variance: Spread the weight out over many of the squared u s: Make the spectral weights flat. Bias: E( ˆΩ) u + u depends on the values E + S( π/). So the bias depends on how flat the spectrum is around ω = 0. he more flat it is, the smaller is the bias. he more curved it is, the higher is the bias. Lecture 9 8, July, 008

Spectral Window (k ) for N-W/Bartlett Lag Window (K = m = 5 m = 0 m ), = 50. + m = 0 m = 50 Lecture 9 9, July, 008

MSE minimizing values for w t = φw t + e t, Bartlett Kernel (note the higher is φ, more serial correlation, more curved spectrum around ω = 0, more bias for given value of m.) m* =.447 4 /3 (φ ) /3 /3 =.8 φ /3 /3 φ m* 00 400 000 0.00 0 0 0 0 0.5 0.7 /3 3 5 7 0.50.5 /3 5 8 0.75.50 /3 6 5 0.90.70 /3 7 7 0.95.76 /3 8 7 Lecture 9 0, July, 008

Kernel Choice: Some Kernels (lag windows) Lecture 9, July, 008

Does Kernel Choice Matter? In theory, yes. (QS is optimal) In practice (for psd kernels), not really. Lecture 9, July, 008

Does lag length matter? Yes, quite a bit y t = β + w t, w t = φw t + e t, = 50. Reection of -sided 0% test φ m ˆ * AR m ˆ *(PW) m* m* 0.00 0.0 0. 0. 0.0 0.0 0.5 0.3 0. 0. 0.3 0.3 0.50 0.5 0. 0. 0.5 0.4 0.75 0. 0. 0. 0. 0.8 0.90 0.33 0.5 0.5 0.33 0.5 0.95 0.46 0.0 0.9 0.46 0.37 (More Simulations: den Haan and Levin (997) available on den Haan s web page and other papers listed throughout these slides) Lecture 9 3, July, 008

Why use more lags than MSE Optimal (Sun, Phillips, and Jin (008) ) Intuition: z ~ N(0,σ ), ˆ σ an estimator of σ (assumed independent of z) z Pr < c Pr z c E ( z c) E g( ) ˆ σ = < = < = ( ) E g σ + E ˆ σ σ g σ + E( ˆ σ σ ) g σ = F () c + Bias( ˆ σ ) g ' + MSE( ˆ σ ) g '' χ ( ) ( ) ( ˆ σ ˆ σ ˆ σ ) ( ) ( ) '( ) ( ) ''( ) MSE Bias + Variance his formula Bias and MSE hus m should be bigger Lecture 9 4, July, 008

IV. Making m really big m = b where b is fixed. (Kiefer, Vogelsang and Bunzel (000), Kiefer and Vogelsang (00), Kiefer and Vogelsang (005)) Bartlett m = ( ): ˆ Ω ( wˆ ) = ( ) ˆ γ = + ˆ γ = ww ˆ ˆ = ˆ γ, and where wˆ t = w t w. t t t= + KV (00) show that some rearanging yields : t ˆ / Ω ( wˆ) = wˆ t= = [ s ] d / / Ω = [ ] / ˆ () = But, w W() s, w ξ [ s ] d / / Ω = wˆ ( W( s) sw()), and, where ξ(s) = Ω / (W(s) sw()). Lecture 9 5, July, 008

hus Ω ˆ ( wˆ ): t d / t= = 0 Ω ˆ ( wˆ) = wˆ Ω ( W( s) sw()) ds So that Ω ˆ ( wˆ ) is a robust, but inconsistent estimator of Ω. Distributions of t-statistics (or other statistics using this ˆΩ) will not be asymptotically normal, but large sample distributions easily tabulated (see KV (00)). Lecture 9 6, July, 008

able of t cricitical values F-critical values.. See KVB (DIVIDE BY!) Lecture 9 7, July, 008

Power loss (From KVB) Lecture 9 8, July, 008

Size Control (Finite Sample AR()) Lecture 9 9, July, 008

(a) Advantages... (i) Some what better size control (in Monte Carlos and in theory (Jansson (004)). (ii) More Robust to serial correlation (Müller (007). Müller proposes estimators that yield t-statistics with t-distributions. (b) Disadvantages... (i) Wider confidence intervals... In the one-dimensional regression problem a value of β β 0 chosen so that β is contained in the (true, not size-distorted) 90% HAC confidence interval with probability 0.0. he value of β is contained in the KVB 90% CI with probability 0.3. (Based on an asymptotic calculation) (ii) Robustness to volatility breaks... No... Lecture 9 30, July, 008

KVB Cousins: (Some related work) (a) (Müller (007): Just focus on low frequency observations. How many do you have? Periodogram ordinates at π, =,, (-)/ Let p denote a low frequency cutoff frequencies with periods greater than p are ω π. Solving number of periodogram ordinates with p periods p: p 60 years of data, p = 6 years, number of ordinates = 0. (Does not depend on sampling frequency). Müller : do inference based on these obs. Appropriately weighted these yield t-distributions for t-statisics. Lecture 9 3, July, 008

(b) Panel Data: Hansen (007). Clustered SEs (over ) when is large and n is small ( and n fixed). his too yields t-statistics being distributed t (with n df) and appropriately constructed F-stats having Hotelling t-squared distribution. (c) Ibragimov and Müller (007). Allows heterogeneity. Lecture 9 3, July, 008

HAC Bottom line(s) () Parametric Estimators are easy and tend to perform reasonably well. In some applications a parametric model is natural (MA(q) model for forecasting q+ periods ahead). In other circumstances VAR-HAC is sensible. (den Haan and Levin (997).) hink about changes in volatility. () Why are you estimating Ω? (a) Optimal Weighting matrix in GMM: Minimum MSE ˆΩ seems like a good idea. (Analytic results on this?) (b) Inference: Use more lags than you otherwise would think you should in Newey-West (or other non-parametric estimators). Worried? Use KVB estimators (and their critical values for tests) or Müller (007) versions (with t or F critical values). Lecture 9 33, July, 008