TECHNICAL WORKING PAPER SERIES HETEROSKEDASTICITY-ROBUST STANDARD ERRORS FOR FIXED EFFECTS PANEL DATA REGRESSION. James H. Stock Mark W.


TECHNICAL WORKING PAPER SERIES

HETEROSKEDASTICITY-ROBUST STANDARD ERRORS FOR FIXED EFFECTS PANEL DATA REGRESSION

James H. Stock
Mark W. Watson

Technical Working Paper
http://www.nber.org/papers/0

NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
June 2006

We thank Alberto Abadie, Gary Chamberlain, Doug Staiger, and Hal White for helpful discussions and/or comments and Anna Mikusheva for research assistance. This research was supported in part by NSF grant SBR-0. The views expressed herein are those of the author(s) and do not necessarily reflect the views of the National Bureau of Economic Research.

© 2006 by James H. Stock and Mark W. Watson. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression
James H. Stock and Mark W. Watson
NBER Technical Working Paper No.
June 2006
JEL No. C, C

ABSTRACT

The conventional heteroskedasticity-robust (HR) variance matrix estimator for cross-sectional regression (with or without a degrees of freedom adjustment), applied to the fixed effects estimator for panel data with serially uncorrelated errors, is inconsistent if the number of time periods $T$ is fixed (and greater than two) as the number of entities $n$ increases. We provide a bias-adjusted HR estimator that is $(nT)^{1/2}$-consistent under any sequences $(n, T)$ in which $n$ and/or $T$ increase to $\infty$.

James H. Stock
Department of Economics
Harvard University
Littauer Center M
Cambridge, MA 02138
and NBER
james_stock@harvard.edu

Mark W. Watson
Department of Economics
Princeton University
Princeton, NJ 085-0
and NBER
mwatson@princeton.edu

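1. Model and Theoretical Results

Consider the fixed effects regression model,

$$Y_{it} = \alpha_i + \beta' X_{it} + u_{it}, \quad i = 1, \ldots, n, \; t = 1, \ldots, T, \tag{1}$$

where $X_{it}$ is a $k \times 1$ vector of regressors and where $(X_{it}, u_{it})$ satisfy:

Heteroskedastic panel data model with conditionally uncorrelated errors

1. $(X_{i1}, \ldots, X_{iT}, u_{i1}, \ldots, u_{iT})$ are i.i.d. over $i = 1, \ldots, n$ (i.i.d. over entities),
2. $E(u_{it} \mid X_{i1}, \ldots, X_{iT}) = 0$ (strict exogeneity),
3. $Q_{\tilde X \tilde X} \equiv E\, T^{-1} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}'$ is nonsingular (no perfect multicollinearity), and
4. $E(u_{it} u_{is} \mid X_{i1}, \ldots, X_{iT}) = 0$ for $t \ne s$ (conditionally serially uncorrelated errors).

For the asymptotic results we will further assume:

Stationarity and moment condition

5. $(X_{it}, u_{it})$ is stationary and has absolutely summable cumulants up to order twelve.

The fixed effects estimator is,

$$\hat\beta_{FE} = \Big( \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \Big)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde Y_{it}, \tag{2}$$

where the superscript ~ over variables denotes deviations from entity means ($\tilde X_{it} = X_{it} - T^{-1} \sum_{s=1}^{T} X_{is}$, etc.). The asymptotic distribution of $\hat\beta_{FE}$ is [e.g. Arellano (2003)]

$$\sqrt{nT}\,(\hat\beta_{FE} - \beta) \xrightarrow{d} N(0,\; Q_{\tilde X \tilde X}^{-1} \Sigma\, Q_{\tilde X \tilde X}^{-1}), \quad \text{where } \Sigma = \frac{1}{T} \sum_{t=1}^{T} E(\tilde X_{it} \tilde X_{it}' u_{it}^2). \tag{3}$$

The variance of the asymptotic distribution in (3) is estimated by $\hat Q_{\tilde X \tilde X}^{-1} \hat\Sigma\, \hat Q_{\tilde X \tilde X}^{-1}$, where $\hat Q_{\tilde X \tilde X} = (nT)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}'$ and $\hat\Sigma$ is a heteroskedasticity-robust (HR) covariance matrix estimator. A frequently used HR estimator of $\Sigma$ is

$$\hat\Sigma^{HR-XS} = \frac{1}{nT - n - k} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \hat{\tilde u}_{it}^2, \tag{4}$$

where $\{\hat{\tilde u}_{it}\}$ are the fixed-effects regression residuals, $\hat{\tilde u}_{it} = \tilde u_{it} - (\hat\beta_{FE} - \beta)' \tilde X_{it}$. (For example, $\hat\Sigma^{HR-XS}$ is the estimator used in STATA and EViews.)

Although $\hat\Sigma^{HR-XS}$ is consistent in cross-section regression [White (1980)], it turns out to be inconsistent in panel data regression with fixed $T$. Specifically, an implication of the results in the appendix is that, under fixed-$T$ asymptotics with $T > 2$,

$$\hat\Sigma^{HR-XS} \xrightarrow{p} \Sigma + \frac{1}{T-1}(B - \Sigma) \quad (n \to \infty,\; T \text{ fixed}), \quad \text{where } B = E\Big[ \Big( \frac{1}{T} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \Big) \Big( \frac{1}{T} \sum_{s=1}^{T} u_{is}^2 \Big) \Big]. \tag{5}$$

The expression for $B$ in (5) suggests the bias-adjusted estimator,

$$\hat\Sigma^{HR-FE} = \frac{T-1}{T-2} \Big( \hat\Sigma^{HR-XS} - \frac{1}{T-1} \hat B \Big), \quad \text{where } \hat B = \frac{1}{n} \sum_{i=1}^{n} \Big( \frac{1}{T} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \Big) \Big( \frac{1}{T-1} \sum_{s=1}^{T} \hat{\tilde u}_{is}^2 \Big), \tag{6}$$

where the estimator is defined for $T > 2$.

It is shown in the appendix that, if assumptions 1-5 hold, then under any sequence $(n, T)$ in which $n \to \infty$ and/or $T \to \infty$ (which includes the cases of $n$ fixed or $T$ fixed),

$$\hat\Sigma^{HR-FE} = \Sigma + O_p(1/\sqrt{nT}), \tag{7}$$

so the problematic bias term of order $T^{-1}$ is eliminated if $\hat\Sigma^{HR-FE}$ is used.

Remarks

1. The bias arises because the entity means are not consistently estimated when $T$ is fixed, so the usual step of replacing estimated regression coefficients with their probability limits is inapplicable. This can be seen by considering

$$\tilde\Sigma^{HR-XS} = \frac{1}{n(T-1)} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \tilde u_{it}^2, \tag{8}$$

which is the infeasible version of $\hat\Sigma^{HR-XS}$ in which $\beta$ is treated as known and the degrees-of-freedom correction $k$ is omitted. The bias calculation is short:

$$E\tilde\Sigma^{HR-XS} = E\, \frac{1}{n(T-1)} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \Big( u_{it} - \frac{1}{T} \sum_{s=1}^{T} u_{is} \Big)^2$$
$$= \frac{1}{T-1} \sum_{t} E(\tilde X_{it} \tilde X_{it}' u_{it}^2) - \frac{2}{T(T-1)} \sum_{t} \sum_{s} E(\tilde X_{it} \tilde X_{it}' u_{it} u_{is}) + \frac{1}{T^2 (T-1)} \sum_{t} \sum_{s} \sum_{r} E(\tilde X_{it} \tilde X_{it}' u_{is} u_{ir})$$
$$= \frac{T-2}{T-1}\, \Sigma + \frac{1}{T-1}\, B, \tag{9}$$

where the third equality uses the assumption $E(u_{it} u_{is} \mid X_{i1}, \ldots, X_{iT}) = 0$ for $t \ne s$; rearranging the final expression in (9) yields the plim in (5). The source of the bias is the final two terms in the second line of (9), both of which appear because of estimating the entity means. The problem created by the entity means is an example of the general problem of having increasingly many incidental parameters.

2. The asymptotic bias in $\hat\Sigma^{HR-XS}$ is $O(1/T)$. An implication of the calculations in the appendix is that $\mathrm{var}(\hat\Sigma^{HR-XS}) = O(1/nT)$, so $\mathrm{MSE}(\hat\Sigma^{HR-XS}) = O(1/T^2) + O(1/nT)$.

3. In general, $B - \Sigma$ is neither positive nor negative semidefinite, so standard errors computed using $\hat\Sigma^{HR-XS}$ can in general either be too large or too small.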

4. If $(X_{it}, u_{it})$ are i.i.d. over $t$ as well as over $i$, then the asymptotic bias in $\hat\Sigma^{HR-XS}$ is proportional to the asymptotic bias in the homoskedasticity-only estimator, $\hat\Sigma^{homosk} = \hat Q_{\tilde X \tilde X}\, \hat\sigma_u^2$, where $\hat\sigma_u^2 = \frac{1}{nT - n - k} \sum_{i=1}^{n} \sum_{t=1}^{T} \hat{\tilde u}_{it}^2$. Specifically, $\mathrm{plim}(\hat\Sigma^{HR-XS} - \Sigma) = b_T\, \mathrm{plim}(\hat\Sigma^{homosk} - \Sigma)$, where $b_T = (T-2)/(T-1)^2$. In this sense, $\hat\Sigma^{HR-XS}$ undercorrects for heteroskedasticity.

5. One case in which $\hat\Sigma^{HR-XS} \xrightarrow{p} \Sigma$ is when $T = 2$, in which case the fixed effects estimator and $\hat\Sigma^{HR-XS}$ are equivalent to the estimator and HR variance matrix computed using first-differences of the data (suppressing the intercept).

6. Another case in which $\hat\Sigma^{HR-XS}$ is consistent is when the errors are homoskedastic: if $E(u_{it}^2 \mid X_{i1}, \ldots, X_{iT}) = \sigma_u^2$, then $B = \Sigma = Q_{\tilde X \tilde X}\, \sigma_u^2$.

7. Another estimator of $\Sigma$ is the clustered (over entities) variance estimator,

$$\hat\Sigma^{cluster} = \frac{1}{nT} \sum_{i=1}^{n} \Big( \sum_{t=1}^{T} \tilde X_{it} \hat{\tilde u}_{it} \Big) \Big( \sum_{s=1}^{T} \tilde X_{is} \hat{\tilde u}_{is} \Big)'. \tag{10}$$

If $T = 3$, then the infeasible version of $\hat\Sigma^{HR-FE}$ (in which $\beta$ is known) equals the infeasible version of $\hat\Sigma^{cluster}$, and $\hat\Sigma^{HR-FE}$ is asymptotically equivalent to $\hat\Sigma^{cluster}$ to order $1/\sqrt{n}$; but for $T > 3$, $\hat\Sigma^{cluster}$ and $\hat\Sigma^{HR-FE}$ differ. Interestingly, the inconsistent estimation of the entity means does not affect the clustered variance estimator for any value of $T$ because of the (idempotent matrix) identity $\sum_{t} \tilde X_{it} \tilde u_{it} = \sum_{t} \tilde X_{it} u_{it}$. This identity does not hold in general for heteroskedasticity- and autocorrelation-consistent (HAC) kernel estimators of $\Sigma$; rather, it arises as a special case for the untruncated rectangular kernel used in the cluster variance estimator. Thus the means-estimation problem discussed above for $\hat\Sigma^{HR-XS}$ seems likely to arise for HAC panel data estimators other than $\hat\Sigma^{cluster}$.

8. Under general $(n, T)$ sequences ($n$ and/or $T \to \infty$), $\hat\Sigma^{cluster} = \Sigma + O_p(1/\sqrt{n})$ [Hansen (2005)]. Because $\hat\Sigma^{HR-FE} = \Sigma + O_p(1/\sqrt{nT})$, if the errors are conditionally serially uncorrelated and $T$ is moderate or large then $\hat\Sigma^{HR-FE}$ will be more efficient than $\hat\Sigma^{cluster}$.

9. The assumption of absolutely summable cumulants, which is used in the proof of the $\sqrt{nT}$-consistency of $\hat\Sigma^{HR-FE}$, is stronger than needed to justify HR variance estimation in cross-sectional data or HAC estimation in time series data. In the proof in the appendix, this stronger assumption arises because the number of nuisance parameters (entity means) is increasing when $n \to \infty$. Under $T$ fixed, $n \to \infty$ asymptotics, stationarity and summable cumulants are unnecessary and assumption 5 can be replaced by $E X_{it}^{12} < \infty$ and $E u_{it}^{12} < \infty$, $t = 1, \ldots, T$.

10. As written, $\hat\Sigma^{HR-FE}$ is not guaranteed to be positive semi-definite (psd). Asymptotically equivalent psd estimators can be constructed in a number of standard ways. For example, if the spectral decomposition of $\hat\Sigma^{HR-FE}$ is $Q' \Lambda Q$, then $\hat\Sigma^{HR-FE-psd} = Q' |\Lambda| Q$ is psd.

11. These results should extend to IV panel data regression with heteroskedasticity, albeit with different formulas.
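The three variance estimators discussed in this section (the conventional HR estimator, the bias-adjusted HR estimator, and the clustered estimator) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function names `fe_hr_estimators`, `psd_version`, and `standard_errors` and the array layout (`y` of shape `(n, T)`, `x` of shape `(n, T, k)`, balanced panel) are assumptions made here.

```python
import numpy as np

def fe_hr_estimators(y, x):
    """Fixed effects estimate plus three estimates of Sigma.

    y: (n, T) outcomes; x: (n, T, k) regressors (balanced panel assumed).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n, T, k = x.shape
    if T <= 2:
        raise ValueError("the bias-adjusted estimator needs T > 2")
    # Deviations from entity means.
    xt = x - x.mean(axis=1, keepdims=True)
    yt = y - y.mean(axis=1, keepdims=True)
    X = xt.reshape(n * T, k)
    beta = np.linalg.solve(X.T @ X, X.T @ yt.reshape(n * T))
    uhat = yt - xt @ beta                      # (n, T) FE residuals
    q_hat = X.T @ X / (n * T)
    # Conventional HR estimator with the nT - n - k adjustment.
    hr_xs = np.einsum("itj,itl,it->jl", xt, xt, uhat ** 2) / (n * T - n - k)
    # B-hat: entity-level second moments of x times entity-level residual variance.
    xx_i = np.einsum("itj,itl->ijl", xt, xt) / T
    s2_i = (uhat ** 2).sum(axis=1) / (T - 1)
    b_hat = np.einsum("ijl,i->jl", xx_i, s2_i) / n
    # Bias-adjusted HR estimator.
    hr_fe = (T - 1) / (T - 2) * (hr_xs - b_hat / (T - 1))
    # Clustered (over entities) estimator.
    score = np.einsum("itj,it->ij", xt, uhat)  # (n, k)
    cluster = score.T @ score / (n * T)
    est = {"HR-XS": hr_xs, "HR-FE": hr_fe, "cluster": cluster, "B": b_hat}
    return beta, q_hat, est

def psd_version(sigma):
    """Replace eigenvalues by their absolute values to obtain a psd matrix."""
    lam, vec = np.linalg.eigh((sigma + sigma.T) / 2)
    return (vec * np.abs(lam)) @ vec.T

def standard_errors(q_hat, sigma_hat, n, T):
    """Standard errors from the sandwich Q^{-1} Sigma Q^{-1} / (nT)."""
    q_inv = np.linalg.inv(q_hat)
    return np.sqrt(np.diag(q_inv @ sigma_hat @ q_inv / (n * T)))
```

A typical call would be `beta, q_hat, est = fe_hr_estimators(y, x)` followed by `standard_errors(q_hat, psd_version(est["HR-FE"]), n, T)`; passing the psd version guards against a negative diagonal in small samples.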

2. Monte Carlo Results

A small Monte Carlo study was performed to assess the quantitative importance of the bias in $\hat\Sigma^{HR-XS}$ and the relative MSEs of the variance estimators. The design has a single regressor and Gaussian errors:

$$y_{it} = x_{it} \beta + u_{it} \tag{11}$$
$$x_{it} \sim \text{i.i.d. } N(0, 1) \tag{12}$$
$$u_{it} \mid x_i \sim \text{i.n.i.d. } N(0, \sigma_{it}^2), \quad \sigma_{it}^2 = \lambda (0.1 + x_{it}^2)^{\kappa}, \tag{13}$$

where $\kappa = \pm 1$ and $\lambda$ is chosen so that the unconditional variance of $u_{it}$ is 1. The variance estimators considered are $\hat\Sigma^{HR-XS}$ (given in (4)), $\hat\Sigma^{HR-FE}$ (given in (6)), and $\hat\Sigma^{cluster}$ (given in (10)). The results, which are based on 0,000 Monte Carlo draws, are summarized in Table 1(a) (for $\kappa = 1$) and 1(b) (for $\kappa = -1$). The first three columns of results report the bias of the three estimators, relative to the true value of $\Sigma$ (e.g., $E[\hat\Sigma^{HR-XS} - \Sigma]/\Sigma$). The next three columns report their MSEs, relative to the MSE of the infeasible HR estimator $\hat\Sigma^{inf} = (nT)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it}^2 u_{it}^2$ that could be constructed if the true errors were observed. The final three columns report the size of the 10% two-sided test of $\beta = \beta_0$ based on the t-statistic using the indicated variance estimator and the asymptotic normal critical value. Several results are noteworthy.

First, the bias in $\hat\Sigma^{HR-XS}$ can be large, persists as $n$ increases with $T$ fixed, and can be positive or negative depending on the design. For example, with $T = 5$ and $n = 1000$, the relative bias of $\hat\Sigma^{HR-XS}$ is .% when $\kappa = 1$ and is % when $\kappa = -1$.

Second, a large bias in $\hat\Sigma^{HR-XS}$ can result in a very large relative MSE. Interestingly, in some cases with small $n$ and $T$ and $\kappa = 1$, the MSE of $\hat\Sigma^{HR-XS}$ is less than the MSE of the infeasible estimator, apparently reflecting a bias-variance tradeoff.

Third, the bias correction in $\hat\Sigma^{HR-FE}$ does its job: the relative bias of $\hat\Sigma^{HR-FE}$ is less than % in all cases with $n \ge 100$, and in most cases the MSE of $\hat\Sigma^{HR-FE}$ is very close to the MSE of the infeasible HR estimator.

Fourth, consistent with remark 8, the ratio of the MSE of the cluster variance estimator to the infeasible estimator depends on $T$ and does not converge to 1 as $n$ gets large for fixed $T$. The MSE of the cluster estimator considerably exceeds the MSE of $\hat\Sigma^{HR-FE}$ when $T$ is moderate or large, regardless of $n$.

Fifth, although the focus of this note has been bias and MSE, one would suspect that the variance estimators with less bias would produce tests with better size. Table 1 is consistent with this conjecture: when $\hat\Sigma^{HR-XS}$ is biased up, the t-tests reject too infrequently, and when $\hat\Sigma^{HR-XS}$ is biased down, the t-tests reject too often. When $T$ is small, the magnitudes of these size distortions can be considerable: for $T = 3$ and $n = 1000$, the size of the nominal 10% test is .0% for $\kappa = 1$ and is 6.% when $\kappa = -1$. In contrast, in all cases with $n \ge 500$, the other two variance estimators produce tests with sizes that are within Monte Carlo error of 10%. In more complicated designs, the size distortions of tests based on $\hat\Sigma^{HR-XS}$ are even larger than reported in Table 1.
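A design of this kind can be replicated approximately with a short simulation. The sketch below is not the authors' code: the function name `relative_bias`, the offset `c = 0.1` in the variance function, the use of a fine grid to compute the normalizing constant, and the small default replication counts are all assumptions made for illustration. It reports the mean of each estimator divided by the mean of the infeasible estimator, minus one.

```python
import numpy as np

def relative_bias(n, T, kappa, reps, c=0.1, seed=0):
    """Monte Carlo relative bias of three variance estimators (scalar regressor)."""
    rng = np.random.default_rng(seed)
    # lam normalises the unconditional variance of u to one:
    # lam = 1 / E[(c + x^2)^kappa] for x ~ N(0, 1), via a fine Riemann sum.
    grid = np.linspace(-12.0, 12.0, 480001)
    pdf = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)
    lam = 1.0 / np.sum((c + grid ** 2) ** kappa * pdf * (grid[1] - grid[0]))

    x = rng.standard_normal((reps, n, T))
    u = np.sqrt(lam * (c + x ** 2) ** kappa) * rng.standard_normal((reps, n, T))
    # With beta = 0 the demeaned outcome equals the demeaned error.
    xt = x - x.mean(axis=2, keepdims=True)
    ut = u - u.mean(axis=2, keepdims=True)
    b = (xt * ut).sum(axis=(1, 2)) / (xt ** 2).sum(axis=(1, 2))
    uhat = ut - b[:, None, None] * xt

    hr_xs = (xt ** 2 * uhat ** 2).sum(axis=(1, 2)) / (n * T - n - 1)
    b_hat = ((xt ** 2).mean(axis=2) * (uhat ** 2).sum(axis=2) / (T - 1)).mean(axis=1)
    hr_fe = (T - 1) / (T - 2) * (hr_xs - b_hat / (T - 1))
    cluster = ((xt * uhat).sum(axis=2) ** 2).sum(axis=1) / (n * T)
    # The infeasible estimator uses the true errors; its mean estimates Sigma.
    sigma = (xt ** 2 * u ** 2).mean(axis=(1, 2)).mean()

    return {"HR-XS": hr_xs.mean() / sigma - 1.0,
            "HR-FE": hr_fe.mean() / sigma - 1.0,
            "cluster": cluster.mean() / sigma - 1.0}
```

For example, `relative_bias(400, 5, 1, 300)` should show a clearly negative relative bias for the conventional estimator and a value near zero for the bias-adjusted one; far more replications would be needed to approach the precision of a published table.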

3. Conclusions

Our theoretical results and Monte Carlo simulations, combined with the results in Hansen (2005), suggest the following advice for empirical practice. The usual estimator $\hat\Sigma^{HR-XS}$ can be used if $T = 2$ but should not be used if $T > 2$. If $T = 3$, $\hat\Sigma^{HR-FE}$ and $\hat\Sigma^{cluster}$ are asymptotically equivalent and either can be used. If $T > 3$ and there are good reasons to believe that $u_{it}$ is conditionally serially uncorrelated, then $\hat\Sigma^{HR-FE}$ will be more efficient than $\hat\Sigma^{cluster}$, so $\hat\Sigma^{HR-FE}$ should be used. If, however, serially correlated errors are a possibility, as they are in many applications, then $\hat\Sigma^{cluster}$ should be used in conjunction with $t_n$ or $F_{\cdot,n}$ critical values for hypothesis tests on $\beta$ [see Hansen (2005)].

References

Arellano, M. (2003). Panel Data Econometrics, Oxford: Oxford University Press.

Brillinger, D. (1981). Time Series: Data Analysis and Theory. San Francisco: Holden-Day.

Hansen, C. (2005). "Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data when T is Large," manuscript, Graduate School of Business, University of Chicago.

Leonov, V.P. and Shiryaev, A.N. (1959). "On a Method of Calculation of Semi-Invariants," Theoretical Probability and its Applications, 4, 319-329.

White, H. (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48(4), 817-838.

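Appendix: Proof of (7)

All limits in this appendix hold for any nondecreasing sequence $(n, T)$ in which $n \to \infty$ and/or $T \to \infty$. To simplify the calculations, we consider the special case that $X_{it}$ is a scalar. Without loss of generality, let $E X_{it} = 0$. Adopt the notation $\bar u_i = T^{-1} \sum_{t=1}^{T} u_{it}$ and $\bar X_i = T^{-1} \sum_{t=1}^{T} X_{it}$. The proof repeatedly uses the inequality $\mathrm{var}\big( \sum_{j=1}^{m} a_j \big) \le \big( \sum_{j=1}^{m} \sqrt{\mathrm{var}(a_j)} \big)^2$.

Begin by writing $\sqrt{nT}(\hat\Sigma^{HR-FE} - \Sigma)$ as the sum of four terms using (6) and (9):

$$\sqrt{nT}(\hat\Sigma^{HR-FE} - \Sigma) = \sqrt{nT}\, \frac{T-1}{T-2} \Big[ \Big( \hat\Sigma^{HR-XS} - \frac{1}{T-1} \hat B \Big) - \Big( E\tilde\Sigma^{HR-XS} - \frac{1}{T-1} B \Big) \Big]$$
$$= \frac{T-1}{T-2} \Big[ \sqrt{nT}\big( \hat\Sigma^{HR-XS} - E\tilde\Sigma^{HR-XS} \big) - \frac{\sqrt{nT}}{T-1} (\hat B - B) \Big]$$
$$= \frac{T-1}{T-2} \Big[ \sqrt{nT}\big( \hat\Sigma^{HR-XS} - \tilde\Sigma^{HR-XS} \big) + \sqrt{nT}\big( \tilde\Sigma^{HR-XS} - E\tilde\Sigma^{HR-XS} \big) - \frac{T}{T-1} \sqrt{n/T}\, (\hat B - \tilde B) - \frac{T}{T-1} \sqrt{n/T}\, (\tilde B - B) \Big] \tag{14}$$

where $\tilde\Sigma^{HR-XS}$ is given in (8) and $\tilde B$ is $\hat B$ given in (6) with $\hat{\tilde u}_{it}$ replaced by $\tilde u_{it}$.

The proof of (7) proceeds by showing that, under the stated moment conditions,

(a) $\sqrt{nT}\big( \tilde\Sigma^{HR-XS} - E\tilde\Sigma^{HR-XS} \big) = O_p(1)$,
(b) $\sqrt{n/T}\, (\tilde B - B) = O_p(1/\sqrt{T})$,
(c) $\sqrt{nT}\big( \hat\Sigma^{HR-XS} - \tilde\Sigma^{HR-XS} \big) \xrightarrow{p} 0$,
(d) $\sqrt{n/T}\, (\hat B - \tilde B) \xrightarrow{p} 0$.

Substitution of (a)-(d) into (14) yields $\sqrt{nT}(\hat\Sigma^{HR-FE} - \Sigma) = O_p(1)$ and thus the result (7).

(a) From (8), we have that

$$\mathrm{var}\big[ \sqrt{nT}( \tilde\Sigma^{HR-XS} - E\tilde\Sigma^{HR-XS} ) \big] = \mathrm{var}\Big( \sqrt{\tfrac{T}{n}} \sum_{i=1}^{n} \frac{1}{T-1} \sum_{t=1}^{T} \tilde X_{it}^2 \tilde u_{it}^2 \Big) = \Big( \frac{T}{T-1} \Big)^2 \mathrm{var}\Big( T^{-1/2} \sum_{t=1}^{T} \tilde X_{it}^2 \tilde u_{it}^2 \Big),$$

so (a) follows if it can be shown that $\mathrm{var}\big( T^{-1/2} \sum_{t} \tilde X_{it}^2 \tilde u_{it}^2 \big) = O(1)$. Expanding $T^{-1/2} \sum_{t} \tilde X_{it}^2 \tilde u_{it}^2$ yields:

$$T^{-1/2} \sum_{t=1}^{T} \tilde X_{it}^2 \tilde u_{it}^2 = A_0 - 2 A_1 D_3 + T^{-1/2} \big( A_1^2 D_2 + A_2^2 D_1 - 2 A_2 A_4 \big) + \frac{4}{T} A_1 A_2 A_3 - \frac{3}{T^{3/2}} A_1^2 A_2^2,$$

where $A_0 = T^{-1/2} \sum_{t} X_{it}^2 u_{it}^2$, $A_1 = T^{-1/2} \sum_{t} X_{it}$, $A_2 = T^{-1/2} \sum_{t} u_{it}$, $A_3 = T^{-1/2} \sum_{t} X_{it} u_{it}$, $A_4 = T^{-1/2} \sum_{t} X_{it}^2 u_{it}$, $D_1 = T^{-1} \sum_{t} X_{it}^2$, $D_2 = T^{-1} \sum_{t} u_{it}^2$, and $D_3 = T^{-1} \sum_{t} X_{it} u_{it}^2$. Thus

$$\mathrm{var}\Big( T^{-1/2} \sum_{t} \tilde X_{it}^2 \tilde u_{it}^2 \Big) \le \Big\{ \mathrm{var}(A_0)^{1/2} + 2\, \mathrm{var}(A_1 D_3)^{1/2} + T^{-1/2} \mathrm{var}(A_1^2 D_2)^{1/2} + T^{-1/2} \mathrm{var}(A_2^2 D_1)^{1/2} + 2 T^{-1/2} \mathrm{var}(A_2 A_4)^{1/2} + 4 T^{-1} \mathrm{var}(A_1 A_2 A_3)^{1/2} + 3 T^{-3/2} \mathrm{var}(A_1^2 A_2^2)^{1/2} \Big\}^2$$
$$\le \Big\{ \mathrm{var}(A_0)^{1/2} + 2 \big( E A_1^4\, E D_3^4 \big)^{1/4} + T^{-1/2} \big( E A_1^8\, E D_2^4 \big)^{1/4} + T^{-1/2} \big( E A_2^8\, E D_1^4 \big)^{1/4} + 2 T^{-1/2} \big( E A_2^4\, E A_4^4 \big)^{1/4} + 4 T^{-1} \big( E A_1^8\, E A_2^8 \big)^{1/8} \big( E A_3^4 \big)^{1/4} + 3 T^{-3/2} \big( E A_1^8\, E A_2^8 \big)^{1/4} \Big\}^2 \tag{15}$$

where the second inequality uses term-by-term inequalities; for example, the second term in the final expression obtains using $\mathrm{var}(A_1 D_3) \le E A_1^2 D_3^2 \le ( E A_1^4\, E D_3^4 )^{1/2}$. Thus a sufficient condition for $\mathrm{var}\big( T^{-1/2} \sum_{t} \tilde X_{it}^2 \tilde u_{it}^2 \big)$ to be $O(1)$ is that $\mathrm{var}(A_0)$, $E A_1^8$, $E A_2^8$, $E A_3^4$, $E A_4^4$, $E D_1^4$, $E D_2^4$, and $E D_3^4$ all are $O(1)$.

First consider the $D$ terms. Because $E D_1^4 \le E X_{it}^8$, $E D_2^4 \le E u_{it}^8$, and (by Hölder's inequality) $E D_3^4 \le E X_{it}^4 u_{it}^8 \le ( E X_{it}^{12} )^{1/3} ( E u_{it}^{12} )^{2/3}$, under assumption 5 all the $D$ moments in (15) are $O(1)$.

For the remainder of the proof of (a), drop the subscript $i$. Now turn to the $A$ terms, starting with $A_1$. Because $X_t$ has mean zero and absolutely summable eighth cumulants,

$$E A_1^8 = E\Big( T^{-1/2} \sum_{t=1}^{T} X_t \Big)^8 \le h_8 \Big( \sum_{j=-\infty}^{\infty} |\mathrm{cov}(X_t, X_{t-j})| \Big)^4 + O(T^{-1}) = O(1),$$

where $h_8$ is the eighth moment of a standard normal random variable. (If $a_t$ is stationary with mean zero, autocovariances $\gamma_j$, and absolutely summable cumulants up to order $2k$, then $E\big( T^{-1/2} \sum_{t} a_t \big)^{2k} \le h_{2k} \big( \sum_{j} |\gamma_j| \big)^{k} + O(T^{-1})$.) The same argument applied to $u_t$ yields $E A_2^8 = O(1)$.

Now consider $A_3$ and let $\xi_t = X_t u_t$. Then

$$E A_3^4 = E\Big( T^{-1/2} \sum_{t=1}^{T} \xi_t \Big)^4 = \frac{1}{T^2} \sum_{t_1, \ldots, t_4} E\, \xi_{t_1} \xi_{t_2} \xi_{t_3} \xi_{t_4}$$
$$= 3 \Big[ \frac{1}{T} \sum_{t_1, t_2} \mathrm{cov}(\xi_{t_1}, \xi_{t_2}) \Big]^2 + \frac{1}{T^2} \sum_{t_1, \ldots, t_4} \mathrm{cum}(\xi_{t_1}, \xi_{t_2}, \xi_{t_3}, \xi_{t_4})$$
$$= 3\, \mathrm{var}(\xi_t)^2 + \frac{1}{T} \sum_{t_1, t_2, t_3} \mathrm{cum}(\xi_0, \xi_{t_1}, \xi_{t_2}, \xi_{t_3})$$
$$\le 3\, E X_t^4\, E u_t^4 + \sum_{t_1, t_2, t_3} \big| \mathrm{cum}(X_0 u_0, X_{t_1} u_{t_1}, X_{t_2} u_{t_2}, X_{t_3} u_{t_3}) \big| \tag{16}$$

where cum(.) denotes the cumulant, the third equality follows from assumption 2 and the definition of the fourth cumulant (see definition 2.3.1 of Brillinger (1981)), the fourth equality follows by the stationarity of $(X_t, u_t)$ and because $\mathrm{cov}(\xi_t, \xi_s) = 0$ for $t \ne s$ by assumption 4, and the inequality follows by Cauchy-Schwarz (first term).

It remains to show that the final term in (16) is finite. We do so by using a result of Leonov and Shiryaev (1959), stated as Theorem 2.3.2 in Brillinger (1981), to express the cumulant of products as the product of cumulants. Let $z_{s1} = X_s$ and $z_{s2} = u_s$, and let $\nu = \bigcup_{j=1}^{m} \nu_j$ denote a partition of the set of index pairs $S_{A_3} = \{(0,1), (0,2), (t_1,1), (t_1,2), (t_2,1), (t_2,2), (t_3,1), (t_3,2)\}$. Theorem 2.3.2 implies that

$$\mathrm{cum}(X_0 u_0, X_{t_1} u_{t_1}, X_{t_2} u_{t_2}, X_{t_3} u_{t_3}) = \mathrm{cum}(z_{01} z_{02}, z_{t_1 1} z_{t_1 2}, z_{t_2 1} z_{t_2 2}, z_{t_3 1} z_{t_3 2}) = \sum_{\nu} \mathrm{cum}(z_{ij}, ij \in \nu_1) \cdots \mathrm{cum}(z_{ij}, ij \in \nu_m),$$

where the summation extends over all indecomposable partitions of $S_{A_3}$. Because $(X_t, u_t)$ has mean zero, $\mathrm{cum}(X_0) = \mathrm{cum}(u_0) = 0$, so all partitions with some $\nu_k$ having a single element make a contribution of zero to the sum. Thus nontrivial partitions must have $m \le 4$. Separating out the partition with $m = 1$, we therefore have that

$$\sum_{t_1, t_2, t_3} \big| \mathrm{cum}(X_0 u_0, X_{t_1} u_{t_1}, X_{t_2} u_{t_2}, X_{t_3} u_{t_3}) \big| \le \sum_{t_1, t_2, t_3} \big| \mathrm{cum}(X_0, u_0, X_{t_1}, u_{t_1}, X_{t_2}, u_{t_2}, X_{t_3}, u_{t_3}) \big| + \sum_{\nu:\, m = 2, 3, 4}\; \sum_{t_1, t_2, t_3} \big| \mathrm{cum}(z_{ij}, ij \in \nu_1) \cdots \mathrm{cum}(z_{ij}, ij \in \nu_m) \big|. \tag{17}$$

The first term on the right hand side of (17) satisfies

$$\sum_{t_1, t_2, t_3} \big| \mathrm{cum}(X_0, u_0, X_{t_1}, u_{t_1}, X_{t_2}, u_{t_2}, X_{t_3}, u_{t_3}) \big| \le \sum_{t_1, t_2, \ldots, t_7} \big| \mathrm{cum}(X_0, u_{t_1}, X_{t_2}, u_{t_3}, X_{t_4}, u_{t_5}, X_{t_6}, u_{t_7}) \big|,$$

which is finite by assumption 5.

It remains to show that the second term in (17) is finite. Consider cumulants of the form $\mathrm{cum}(X_{t_1}, \ldots, X_{t_r}, u_{s_1}, \ldots, u_{s_p})$ (including the case of no $X$'s). When $p = 1$, by assumption 2 this cumulant is zero. When $p = 2$, by assumption 4 this cumulant is zero if $s_1 \ne s_2$. Thus the only nontrivial partitions of $S_{A_3}$ either (i) place two occurrences of $u$ in one set and two in a second set, or (ii) place all four occurrences of $u$ in a single set.

In case (i), the three-fold summation reduces to a single summation which can be handled by bounding one or more cumulants and invoking summability. For example, one such term is

$$\sum_{t_1, t_2, t_3} \big| \mathrm{cum}(X_0, X_{t_3})\, \mathrm{cum}(X_{t_1}, u_0, u_{t_2})\, \mathrm{cum}(X_{t_2}, u_{t_1}, u_{t_3}) \big| = \sum_{t} \big| \mathrm{cum}(X_0, X_t)\, \mathrm{cum}(X_t, u_0, u_0)\, \mathrm{cum}(X_0, u_t, u_t) \big|$$
$$\le \mathrm{var}(X_0) \sqrt{E X_0^2\, E u_0^4}\; \sum_{t_1, t_2} \big| \mathrm{cum}(X_0, u_{t_1}, u_{t_2}) \big| < \infty \tag{18}$$

where the inequality uses $|\mathrm{cum}(X_0, X_t)| \le \mathrm{var}(X_0)$, $|\mathrm{cum}(X_t, u_0, u_0)| \le E|X_t u_0^2| \le \sqrt{E X_0^2\, E u_0^4}$, and $\sum_{t} |\mathrm{cum}(X_0, u_t, u_t)| \le \sum_{t_1, t_2} |\mathrm{cum}(X_0, u_{t_1}, u_{t_2})|$; all terms in the final line of (18) are finite by assumption 5. For a partition to be indecomposable, it must be that at least one cumulant under the single summation contains both time indexes 0 and $t$ (if not, the partition satisfies Equation (2.3.5) in Brillinger (1981) and thus violates the row equivalency necessary and sufficient condition for indecomposability). Thus all terms in case (i) can be handled in the same way (bounding and applying summability to a cumulant with indexes of both 0 and $t$) as the term handled in (18). Thus all terms in case (i) are finite.

In case (ii), the summation remains three-dimensional and all cases can be handled by bounding the cumulants not containing the $u$'s and invoking absolute summability for the cumulant containing the $u$'s. A typical term is

$$\sum_{t_1, t_2, t_3} \big| \mathrm{cum}(X_0, u_0, u_{t_1}, u_{t_2}, u_{t_3})\, \mathrm{cum}(X_{t_1}, X_{t_2}, X_{t_3}) \big| \le E|X_0|^3 \sum_{t_1, t_2, t_3} \big| \mathrm{cum}(X_0, u_0, u_{t_1}, u_{t_2}, u_{t_3}) \big| \le E|X_0|^3 \sum_{t_1, \ldots, t_4} \big| \mathrm{cum}(X_0, u_{t_1}, u_{t_2}, u_{t_3}, u_{t_4}) \big| < \infty.$$

Because the number of partitions is finite, the final term in (17) is finite, and it follows from (16) that $E A_3^4 = O(1)$.

Next consider $A_4$. The argument that $E A_4^4 = O(1)$ closely follows the argument for $A_3$. The counterpart of the final line of (16) is

$$E A_4^4 \le 3\, E X_t^8\, E u_t^4 + \sum_{t_1, t_2, t_3} \big| \mathrm{cum}(X_0^2 u_0, X_{t_1}^2 u_{t_1}, X_{t_2}^2 u_{t_2}, X_{t_3}^2 u_{t_3}) \big|,$$

so the leading term in the counterpart of (17) is a twelfth cumulant, which is absolutely summable by assumption 5. Following the remaining steps shows that $E A_4^4 < \infty$.

Now turn to $A_0$. The logic of (17) implies that

$$\mathrm{var}(A_0) = \mathrm{var}\Big( T^{-1/2} \sum_{t} X_t^2 u_t^2 \Big) \le \sum_{t} \big| \mathrm{cov}(X_0^2 u_0^2, X_t^2 u_t^2) \big|$$
$$\le \sum_{t} \big| \mathrm{cum}(X_0, X_0, u_0, u_0, X_t, X_t, u_t, u_t) \big| + \sum_{\nu:\, m = 2, 3, 4}\; \sum_{t} \big| \mathrm{cum}(z_{ij}, ij \in \nu_1) \cdots \mathrm{cum}(z_{ij}, ij \in \nu_m) \big| \tag{19}$$

where the summation over $\nu$ extends over indecomposable partitions of $S_{A_0} = \{(0,1), (0,1), (0,2), (0,2), (t,1), (t,1), (t,2), (t,2)\}$ with $m \ge 2$. The first term in the final line of (19) is finite by assumption 5. For a partition of $S_{A_0}$ to be indecomposable, at least one cumulant must have indexes of both 0 and $t$ (otherwise Brillinger's (1981) Equation (2.3.5) is satisfied). Thus the bounding and summability steps of (18) can be applied to all partitions in (19), so $\mathrm{var}(A_0) = O(1)$. This proves (a).

(b) First note that $E \tilde B = B$:

$$E \tilde B = E\, \frac{1}{n} \sum_{i=1}^{n} \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big) \Big( \frac{1}{T-1} \sum_{s} \tilde u_{is}^2 \Big) = E \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big) \frac{1}{T-1} \Big( \sum_{s} u_{is}^2 - \frac{1}{T} \sum_{s} \sum_{r} u_{is} u_{ir} \Big)$$
$$= E \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big) \frac{1}{T-1} \Big( \sum_{s} u_{is}^2 - \frac{1}{T} \sum_{s} u_{is}^2 \Big) = B,$$

where the penultimate equality obtains because $u_{it}$ is conditionally serially uncorrelated. Thus

$$\frac{n}{T}\, E(\tilde B - B)^2 = \frac{1}{T}\, \mathrm{var}\Big[ \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big) \Big( \frac{1}{T-1} \sum_{s} \tilde u_{is}^2 \Big) \Big]$$
$$\le \frac{1}{T}\, E\Big[ \Big( \frac{1}{T} \sum_{t} X_{it}^2 \Big)^2 \Big( \frac{1}{T-1} \sum_{s} u_{is}^2 \Big)^2 \Big] \le \frac{1}{T} \Big( \frac{T}{T-1} \Big)^2 \sqrt{E X_{it}^8\, E u_{it}^8}, \tag{20}$$

where the first inequality uses $\sum_{t} \tilde X_{it}^2 \le \sum_{t} X_{it}^2$ and $\sum_{t} \tilde u_{it}^2 \le \sum_{t} u_{it}^2$. The result (b) follows from (20). Inspection of the right hand side of the first line in (20) reveals that this variance is positive for finite $T$, so that under fixed-$T$ asymptotics the estimation of $B$ makes a $1/nT$ contribution to the variance of $\hat\Sigma^{HR-FE}$.

(c) Write

$$\sqrt{nT}\big( \hat\Sigma^{HR-XS} - \tilde\Sigma^{HR-XS} \big) = \sqrt{nT} \Big[ \frac{1}{nT - n - k} \sum_{i} \sum_{t} \tilde X_{it}^2 \hat{\tilde u}_{it}^2 - \frac{1}{n(T-1)} \sum_{i} \sum_{t} \tilde X_{it}^2 \tilde u_{it}^2 \Big]$$
$$= \frac{nT}{nT - n - k} \Big[ \frac{1}{\sqrt{nT}} \sum_{i} \sum_{t} \tilde X_{it}^2 \big( \hat{\tilde u}_{it}^2 - \tilde u_{it}^2 \big) \Big] + \frac{k \sqrt{nT}}{nT - n - k}\, \tilde\Sigma^{HR-XS}. \tag{21}$$

An implication of (a) is that $\tilde\Sigma^{HR-XS} - E\tilde\Sigma^{HR-XS} \xrightarrow{p} 0$, so the second term in (21) is $O_p(1/\sqrt{nT})$. To show that the first term in (21) is $o_p(1)$ it suffices to show that $\frac{1}{\sqrt{nT}} \sum_{i} \sum_{t} \tilde X_{it}^2 ( \hat{\tilde u}_{it}^2 - \tilde u_{it}^2 ) \xrightarrow{p} 0$. Because $\hat{\tilde u}_{it} = \tilde u_{it} - (\hat\beta_{FE} - \beta) \tilde X_{it}$,

$$\frac{1}{\sqrt{nT}} \sum_{i} \sum_{t} \tilde X_{it}^2 \big( \hat{\tilde u}_{it}^2 - \tilde u_{it}^2 \big) = (\hat\beta_{FE} - \beta)^2 \frac{1}{\sqrt{nT}} \sum_{i} \sum_{t} \tilde X_{it}^4 - 2 (\hat\beta_{FE} - \beta) \frac{1}{\sqrt{nT}} \sum_{i} \sum_{t} \tilde X_{it}^3 \tilde u_{it}$$
$$= \big[ \sqrt{nT} (\hat\beta_{FE} - \beta) \big]^2 \frac{1}{(nT)^{3/2}} \sum_{i} \sum_{t} \tilde X_{it}^4 - 2 \sqrt{nT} (\hat\beta_{FE} - \beta) \frac{1}{nT} \sum_{i} \sum_{t} \tilde X_{it}^3 u_{it} + 2 \sqrt{nT} (\hat\beta_{FE} - \beta) \frac{1}{nT} \sum_{i} \bar u_i \sum_{t} \tilde X_{it}^3. \tag{22}$$

Consider the first term in (22). Now $\sqrt{nT} (\hat\beta_{FE} - \beta) = O_p(1)$ and

$$E\, \frac{1}{(nT)^{3/2}} \sum_{i} \sum_{t} \tilde X_{it}^4 = \frac{1}{\sqrt{nT}}\, E\Big( \frac{1}{T} \sum_{t} \tilde X_{it}^4 \Big) \to 0,$$

where convergence follows because $E(\tilde X_{it}^4) < \infty$ is implied by $E(X_{it}^4) < \infty$. Thus, by Markov's inequality the first term in (22) converges in probability to zero. Next consider the second term in (22). Because $u_{it}$ is conditionally serially uncorrelated, $u_{it}$ has 4 moments, and $\tilde X_{it}$ has 12 moments (because $X_{it}$ has 12 moments),

$$\mathrm{var}\Big( \frac{1}{nT} \sum_{i} \sum_{t} \tilde X_{it}^3 u_{it} \Big) = \frac{1}{nT}\, E\Big( \frac{1}{T} \sum_{t} \tilde X_{it}^6 u_{it}^2 \Big) \le \frac{1}{nT} \sqrt{ ( E \tilde X_{it}^{12} ) ( E u_{it}^4 ) } \to 0.$$

This result and $\sqrt{nT} (\hat\beta_{FE} - \beta) = O_p(1)$ imply that the second term in (22) converges in probability to zero. Turning to the final term in (22), because $u_{it}$ is conditionally serially uncorrelated, $\tilde X_{it}$ has 12 moments, and $u_{it}$ has 4 moments,

$$\mathrm{var}\Big( \frac{1}{nT} \sum_{i} \bar u_i \sum_{t} \tilde X_{it}^3 \Big) = \frac{1}{nT}\, E\Big[ \Big( \frac{1}{T} \sum_{t} u_{it}^2 \Big) \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^3 \Big)^2 \Big] \le \frac{1}{nT} \sqrt{ E u_{it}^4\; E \tilde X_{it}^{12} } \to 0.$$

This result and $\sqrt{nT} (\hat\beta_{FE} - \beta) = O_p(1)$ imply that the final term in (22) converges in probability to zero, and (c) follows.

(d) Use $\hat{\tilde u}_{it} = \tilde u_{it} - (\hat\beta_{FE} - \beta) \tilde X_{it}$ and collect terms to obtain

$$\sqrt{n/T}\, (\hat B - \tilde B) = \sqrt{n/T}\; \frac{1}{n} \sum_{i} \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big) \Big( \frac{1}{T-1} \sum_{s} \big( \hat{\tilde u}_{is}^2 - \tilde u_{is}^2 \big) \Big)$$
$$= \frac{T}{T-1} \big[ \sqrt{nT} (\hat\beta_{FE} - \beta) \big]^2 \frac{1}{(nT)^{3/2}} \sum_{i} \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big)^2 - \frac{2T}{T-1} \sqrt{nT} (\hat\beta_{FE} - \beta)\, \frac{1}{nT} \sum_{i} \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big) \Big( \frac{1}{T} \sum_{s} \tilde X_{is} u_{is} \Big). \tag{23}$$

Because $\sqrt{nT} (\hat\beta_{FE} - \beta) = O_p(1)$ and $X_{it}$ has four moments, by Markov's inequality the first term in (23) converges in probability to zero (the argument is like that used for the first term in (22)). Turning to the second term in (23),

$$\mathrm{var}\Big[ \frac{1}{nT} \sum_{i} \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big) \Big( \frac{1}{T} \sum_{s} \tilde X_{is} u_{is} \Big) \Big] = \frac{1}{n T^3}\, E\Big[ \Big( \frac{1}{T} \sum_{t} \tilde X_{it}^2 \Big)^2 \Big( \frac{1}{T} \sum_{s} \tilde X_{is}^2 u_{is}^2 \Big) \Big] \le \frac{1}{n T^3} \sqrt{ E \tilde X_{it}^{12}\; E u_{it}^4 } \to 0,$$

so the second term in (23) converges in probability to zero, and (d) follows.

Details of remark 9. The only place in this proof that the summable cumulant condition is used is to bound the $A$ moments in part (a). If $T$ is fixed, a sufficient condition for the moments of $A$ to be bounded is that $X_{it}$ and $u_{it}$ have 12 moments. Stationarity of $(X_{it}, u_{it})$ is used repeatedly but, if $T$ is fixed, stationarity could be relaxed by replacing moments such as $E X_{it}^4$ with $\max_t E X_{it}^4$. Thus, under $T$-fixed, $n \to \infty$ asymptotics, assumption 5 could be replaced by the assumption that $E X_{it}^{12} < \infty$ and $E u_{it}^{12} < \infty$ for $t = 1, \ldots, T$.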

Details of remark. If (, u ) is i.i.d., t,,, i,, n, then Σ E u Q σ + Ω, where Ω jk u, where is the j th element of j cov( j k, u ). Also, the (j,k) element of B is BBjk j k is t s E u Q cov(, ) σ +, jk u j k uis t s σ + Ω Q, jk u jk, where the final equaly uses, for t s, cov(, u ) cov( j k, uis) j k ( ) Ω jk (because (, u ) is i.i.d. over t). hus B Q σ + ( ) Ω u Q σ u + ( ) (Σ Q σ ). he result stated in the remark follows by substuting this final u expression for B into (5), noting that Σ ˆ homosk p Q σ, and collecting terms. u

Table 1. Monte Carlo Results: Bias, Relative MSE, and Size for Three Variance Estimators

Design: $y_{it} = x_{it} \beta + u_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$; $x_{it} \sim$ i.i.d. $N(0,1)$; $u_{it} \mid x_i \sim$ i.n.i.d. $N(0, \sigma_{it}^2)$; $\sigma_{it}^2 = (0.1 + x_{it}^2)^{\kappa} / E[(0.1 + x_{it}^2)^{\kappa}]$

(a) $\kappa = 1$

Columns: T; n; bias relative to true $\Sigma$ (HR-XS, HR-FE, cluster); MSE relative to infeasible (HR-XS, HR-FE, cluster); size at nominal level 10% (HR-XS, HR-FE, cluster)

T    n     | HR-XS   HR-FE   cluster | HR-XS  HR-FE  cluster | HR-XS  HR-FE  cluster
3    50    | -0.80   -0.05   -0.068  | 0.78   .05    .0      | 0.7    0.5    0.8
5    50    | -0.5    -0.09   -0.06   | 0.8    0.98   .       | 0.     0.     0.
10   50    | -0.07   -0.0    -0.0    | 0.9    0.99   .7      | 0.9    0.08   0.9
25   50    | -0.00   -0.005  -0.06   | 0.96   0.99   .       | 0.07   0.0    0.
50   50    | -0.05   -0.00   -0.0    | 0.98   0.99   .8      | 0.0    0.0    0.0
100  50    | -0.008  -0.00   -0.00   | 0.99   .00    6.95    | 0.099  0.098  0.07
3    100   | -0.60   -0.07   -0.05   | 0.89   .      .0      | 0.     0.8    0.0
5    100   | -0.     -0.05   -0.0    | 0.95   .0     .0      | 0.7    0.06   0.0
10   100   | -0.067  -0.006  -0.06   | 0.99   .0     .5      | 0.6    0.05   0.08
25   100   | -0.08   -0.00   -0.0    | .00    .00    .       | 0.0    0.099  0.0
50   100   | -0.0    -0.00   -0.0    | .00    .00    .95     | 0.0    0.00   0.0
100  100   | -0.007  -0.00   -0.0    | .00    .00    6.9     | 0.0    0.00   0.06
3    500   | -0.     -0.006  -0.008  | .60    .      .0      | 0.     0.097  0.097
5    500   | -0.     -0.00   -0.00   | .70    .07    .0      | 0.     0.0    0.0
10   500   | -0.06   -0.00   -0.00   | .5     .0     .55     | 0.     0.0    0.0
25   500   | -0.06   0.000   -0.00   | .9     .0     .8      | 0.0    0.00   0.0
50   500   | -0.0    0.000   -0.00   | .0     .00    .06     | 0.0    0.00   0.0
100  500   | -0.007  0.000   -0.00   | .05    .00    7.      | 0.0    0.00   0.0
3    1000  | -0.9    -0.00   -0.00   | .5     .      .       | 0.0    0.0    0.0
5    1000  | -0.     -0.00   -0.00   | .59    .08    .9      | 0.     0.099  0.00
10   1000  | -0.06   -0.00   -0.00   | .00    .0     .56     | 0.09   0.098  0.099
25   1000  | -0.06   0.000   -0.00   | .      .0     .6      | 0.05   0.0    0.0
50   1000  | -0.0    0.000   -0.00   | .      .00    .9      | 0.0    0.00   0.00
100  1000  | -0.006  0.000   0.000   | .      .00    7.      | 0.0    0.0    0.0

Table 1, ctd.

(b) $\kappa = -1$

Columns: T; n; bias relative to true $\Sigma$ (HR-XS, HR-FE, cluster); MSE relative to infeasible (HR-XS, HR-FE, cluster); size at nominal level 10% (HR-XS, HR-FE, cluster)

T    n     | HR-XS   HR-FE   cluster | HR-XS  HR-FE  cluster | HR-XS  HR-FE  cluster
3    50    | 0.7     0.0     -0.0    | .7     .      .8      | 0.067  0.05   0.0
5    50    | 0.      0.007   -0.0    | 5.0    .68    .0      | 0.060  0.0    0.07
10   50    | 0.      0.00    -0.07   | 6.96   .5     .57     | 0.068  0.0    0.0
25   50    | 0.9     0.00    -0.07   | 6.6    .      .0      | 0.08   0.0    0.08
50   50    | 0.065   0.000   -0.08   | .6     .9     .5      | 0.09   0.0    0.
100  50    | 0.0     0.000   -0.00   | .      .      69.9    | 0.09   0.00   0.0
3    100   | 0.70    0.006   -0.007  | .78    .0     .8      | 0.06   0.099  0.0
5    100   | 0.      0.00    -0.006  | 8.65   .66    .0      | 0.059  0.099  0.0
10   100   | 0.      0.00    -0.009  | .68    .5     .68     | 0.065  0.098  0.0
25   100   | 0.9     0.00    -0.008  | .09    .      .       | 0.08   0.0    0.06
50   100   | 0.065   0.000   -0.009  | 7.9    .9     .6      | 0.090  0.0    0.07
100  100   | 0.0     0.000   -0.00   | 5.9    .      70.98   | 0.09   0.00   0.05
3    500   | 0.7     0.00    -0.00   | .59    .      .0      | 0.06   0.098  0.098
5    500   | 0.09    0.000   -0.00   | 5.8    .66    .0      | 0.059  0.099  0.099
10   500   | 0.      0.00    -0.00   | 55.7   .50    .8      | 0.066  0.099  0.099
25   500   | 0.8     0.000   -0.00   | 9.     .      .5      | 0.08   0.098  0.00
50   500   | 0.06    0.000   -0.00   | .6     .9     .99     | 0.090  0.00   0.0
100  500   | 0.0     0.000   -0.00   | .6     .      7.9     | 0.09   0.098  0.099
3    1000  | 0.69    0.00    0.000   | 5.7    .      .       | 0.06   0.099  0.099
5    1000  | 0.0     0.000   -0.00   | 70.65  .66    .09     | 0.059  0.099  0.099
10   1000  | 0.      0.000   -0.00   | 08.60  .50    .66     | 0.069  0.099  0.099
25   1000  | 0.8     0.000   -0.00   | 97.76  .      .9      | 0.08   0.0    0.0
50   1000  | 0.06    0.000   -0.00   | 68.8   .9     .       | 0.088  0.098  0.099
100  1000  | 0.0     0.000   -0.00   | 0.87   .0     70.8    | 0.09   0.098  0.00

Notes to Table 1: The first three columns of results report the bias of the indicated estimator as a fraction of the true variance. The next three columns report the MSE of the indicated estimator, relative to the MSE of the infeasible estimator $\hat\Sigma^{inf} = (nT)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it}^2 u_{it}^2$. The final three columns report the rejection rate under the null hypothesis of the 2-sided test of $\beta = \beta_0$ based on the t-statistic computed using the indicated variance estimator and the asymptotic normal critical value, where the test has a nominal level of 10%. All results are based on 0,000 Monte Carlo draws.