Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression

Similar documents
TECHNICAL WORKING PAPER SERIES HETEROSKEDASTICITY-ROBUST STANDARD ERRORS FOR FIXED EFFECTS PANEL DATA REGRESSION. James H. Stock Mark W.

Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression

Specification testing in panel data models estimated by fixed effects with instrumental variables

DEPARTMENT OF STATISTICS

Deriving Some Estimators of Panel Data Regression Models with Individual Effects

Modeling GARCH processes in Panel Data: Theory, Simulations and Examples

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity

Specification Test for Instrumental Variables Regression with Many Instruments

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

The Role of "Leads" in the Dynamic Title of Cointegrating Regression Models. Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji

applications to the cases of investment and inflation January, 2001 Abstract


Heteroskedasticity- and Autocorrelation-Robust Inference or Three Decades of HAC and HAR: What Have We Learned?

Heteroskedasticity and Autocorrelation Consistent Standard Errors

Solutions to Odd-Numbered End-of-Chapter Exercises: Chapter 14

HAR Inference: Recommendations for Practice

Robust Unit Root and Cointegration Rank Tests for Panels and Large Systems *

Economic modelling and forecasting

Applied Econometrics. Lecture 3: Introduction to Linear Panel Data Models

Least Squares Estimation-Finite-Sample Properties

Multiple Linear Regression

1 Procedures robust to weak instruments

LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH

ECONOMETRICS HONOR S EXAM REVIEW SESSION

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

This chapter reviews properties of regression estimators and test statistics based on

Department of Economics, UCSD UC San Diego

1 Appendix A: Matrix Algebra

Chapter 11 GMM: General Formulas and Application

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

Efficiency Tradeoffs in Estimating the Linear Trend Plus Noise Model. Abstract

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Panel Threshold Regression Models with Endogenous Threshold Variables

Final Exam. Economics 835: Econometrics. Fall 2010

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Weak Instruments and the First-Stage Robust F-statistic in an IV regression with heteroskedastic errors

Intermediate Econometrics

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Missing dependent variables in panel data models

Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments

The Functional Central Limit Theorem and Testing for Time Varying Parameters

A Practical Test for Strict Exogeneity in Linear Panel Data Models with Fixed Effects

A Primer on Asymptotics

Comment on HAC Corrections for Strongly Autocorrelated Time Series by Ulrich K. Müller

ASSET PRICING MODELS

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics

Introduction to Econometrics

Choice of Spectral Density Estimator in Ng-Perron Test: Comparative Analysis

Economics 583: Econometric Theory I A Primer on Asymptotics

A better way to bootstrap pairs

TitleReducing the Size Distortion of the.

Heteroskedasticity. Part VII. Heteroskedasticity

Testing Linear Restrictions: cont.

Non-linear panel data modeling

GMM estimation of spatial panels

Chapter 15 Panel Data Models. Pooling Time-Series and Cross-Section Data

Online Appendix. j=1. φ T (ω j ) vec (EI T (ω j ) f θ0 (ω j )). vec (EI T (ω) f θ0 (ω)) = O T β+1/2) = o(1), M 1. M T (s) exp ( isω)

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

Lecture 8: Instrumental Variables Estimation

A PANIC Attack on Unit Roots and Cointegration. July 31, Preliminary and Incomplete

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Lecture 4: Heteroskedasticity

Reliability of inference (1 of 2 lectures)

Large Sample Properties of Estimators in the Classical Linear Regression Model

1 Motivation for Instrumental Variable (IV) Regression

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

11. Bootstrap Methods

HETEROSKEDASTICITY, TEMPORAL AND SPATIAL CORRELATION MATTER

Econometrics of Panel Data

Econ 423 Lecture Notes

Testing for Serial Correlation in Fixed-Effects Panel Data Models

Instrumental Variable Estimation with Heteroskedasticity and Many Instruments

Review of Econometrics

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

Econometrics Summary Algebraic and Statistical Preliminaries

Some Monte Carlo Evidence for Adaptive Estimation of Unit-Time Varying Heteroscedastic Panel Data Models

A Practitioner s Guide to Cluster-Robust Inference

the error term could vary over the observations, in ways that are related

Exogeneity tests and weak identification

Heteroskedasticity. We now consider the implications of relaxing the assumption that the conditional

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

Homoskedasticity. Var (u X) = σ 2. (23)

Introductory Econometrics

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Econometrics of Panel Data

Estimation of Panel Data Models with Binary Indicators when Treatment Effects are not Constant over Time. Audrey Laporte a,*, Frank Windmeijer b

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Comparing Forecast Accuracy of Different Models for Prices of Metal Commodities

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

Steven Cook University of Wales Swansea. Abstract

Drawing Inferences from Statistics Based on Multiyear Asset Returns

Introduction to Regression Models for Panel Data Analysis. Indiana University Workshop in Methods February 13, Professor Patricia A.

Darmstadt Discussion Papers in Economics

Chapter 2. Dynamic panel data models

Transcription:

Heteroskedasticy-Robust Standard Errors for Fixed Effects Panel Data Regression May 6, 006 James H. Stock Department of Economics, Harvard Universy and the NBER Mark W. Watson Department of Economics and Woodrow Wilson School, Princeton Universy and the NBER ABSRAC he conventional heteroskedasticy-robust (HR variance matrix estimator for cross-sectional regression (wh or whout a degrees of freedom adjustment, applied to the fixed effects estimator for panel data wh serially uncorrelated errors, is inconsistent if the number of time periods is fixed (and greater than two as the number of enties n increases. We provide a bias-adjusted HR estimator that is n -consistent under any sequences (n, in which n and/or increase to. Key words: Whe standard errors, longudinal data, clustered standard errors JEL codes: C, C We thank Alberto Abadie, Gary Chamberlain, Doug Staiger, and Hal Whe for helpful discussions and/or comments and Anna Mikusheva for research assistance. his research was supported in part by NSF grant SBR-0.

. Model and heoretical Results Consider the fixed effects regression model, Y α i + β + u, i,, n, t,, ( where is a k vector of regressors and where (, u satisfy: Heteroskedastic panel data model wh condionally uncorrelated errors. ( i,, i, u i,,, u i are i.i.d. over i,, n (i.i.d. over enties,. E(u i,, i 0 (strict exogeney Q t is nonsingular (no perfect multicollineary, and. E. E(u u is i,, i 0 for t s (condionally serially uncorrelated errors. For the asymptotic results we will further assume: Stationary and moment condion 5. (, u is stationary and has absolutely summable cumulants up to order twelve. he fixed effects estimator is, ˆFE β n n Y ( i t i t where the superscript ~ over variables denotes deviations from enty means (, etc.. he asymptotic distribution of s is ˆFE β is [e.g. Arrelano (00] d n ( ˆFE β β N(0, Q Σ Q, where Σ E( u. ( t

he variance of the asymptotic distribution in ( is estimated by Qˆ Σˆ Qˆ, where n ( n and ˆΣ is a heteroskedasticy-robust (HR covariance matrix estimator. i t A frequently used HR estimator of Σ is Qˆ ˆ HR S Σ n ˆ n n k u ( i t where { u } are the fixed-effects regression residuals, u u ( β β. ˆ Although ˆ HR S Σ is consistent in cross-section regression [Whe (980], turns out to be inconsistent in panel data regression wh fixed. Specifically, an implication of the results in the appendix is that, under fixed- asymptotics wh >, ˆ ˆFE Σ ˆ HR S p (n, fixed Σ+ ( B Σ, where B E uis t. (5 s he expression for B in (5 suggests the bias-adjusted estimator, ˆ HR FE Σ ˆ HR S Bˆ Σ, where ˆB n ˆ is n i u t (6 s where the estimator is defined for >. It is shown in the appendix that, if assumptions -5 hold, then under any sequence (n, in which n and/or (which includes the cases of n fixed or fixed, For example, ˆ HR S Σ is the estimator used in SAA and Eviews.

ˆ HR FE Σ Σ + O p (/ n (7 so the problematic bias term of order is eliminated if ˆ HR FE Σ is used. Remarks. he bias arises because the enty means are not consistently estimated when is fixed, so the usual step of replacing estimated regression coefficients wh their probabily lims is inapplicable. his can be seen by considering HR S Σ ( n i t u, (8 n which is the infeasible version of ˆ HR S Σ in which β is treated as known and the degrees-of-freedom correction k is omted. he bias calculation is short: EΣ HR S n E u is n ( u i t s E u t E( u E ( + t Σ + t s u is u + E u ( t s E ( is t s r u is ir u B, (9 where the third equaly uses the assumption E(u u is i,, i 0 for t s; rearranging the final expression in (9 yields the plim in (5. he source of the bias is the final two terms in the second line of (9, both of which appear because of estimating the enty means. he problems created by the enty means is an example of the general problem of having increasingly many incidental parameters.

. he asymptotic bias in appendix is that var( Σ ˆ ˆ HR S Σ is O(/. An implication of the calculations in the HR S O(/n, so MSE( Σ ˆ HR S O(/ + O(/n.. In general, B Σ is neher posive nor negative semidefine, so standard errors computed using ˆ HR S Σ can in general eher be too large or too small.. If (, u are i.i.d. over t as well as over i, then the asymptotic bias in ˆ HR S Σ is proportional to the asymptotic bias in the homoskedasticy-only estimator, Σˆ homosk Qˆ σ, where ˆu ˆu σ ˆ. Specifically, plim( ˆ HR S Σ Σ n n k n u i t ( b plim( Σˆ homosk Σ, where b ( /(. In this sense, ˆ HR S Σ undercorrects for heteroskedasticy. 5. One case in which ˆ HR S Σ Σ is when, in which case the fixed effects estimator and p ˆ HR S Σ are equivalent to the estimator and HR variance matrix computed using first-differences of the data (suppressing the intercept. 6. Another case in which ˆ HR S Σ is consistent is when the errors are homoskedastic: if E( u i,, i σ u, then B Σ Q u σ. 7. Another estimator of Σ is the clustered (over enties variance estimator, n ˆ cluster ˆ Σ u uˆ is is n i t s (0 If, then the infeasible version of ˆ HR FE Σ (in which β is known equals the infeasible version of order / n ; but for >, Σˆ cluster, and ˆ cluster Σ ˆ HR FE Σ is asymptotically equivalent to Σˆ cluster to and ˆ HR FE Σ differ. Interestingly, the problem of no consistent estimation of the enty means does not affect the clustered variance estimator for any value of because of the (idempotent matrix identy u t u. his identy does not hold in general for heteroskedasticy- and autocorrelation-consistent (HAC kernel estimators of Σ, rather arises as a special case for the untruncated rectangular kernel used in the cluster variance estimator. t

hus the means-estimation problem discussed above for for HAC panel data estimators other than Σ ˆ cluster. ˆ HR S Σ seems likely to arise 8. Under general (n, sequences (n and/or, Σ ˆ cluster Σ + O p (/ n [Hansen (005]. Because ˆ HR FE Σ Σ + O p (/ n, if the errors are condionally serially uncorrelated and is moderate or large then ˆ HR FE Σ will be more efficient than Σˆ cluster. 9. he assumption of absolutely summable cumulants, which is used in the proof of the n -consistency of Σ ˆ HR FE, is stronger than needed to justify HR variance estimation in cross-sectional data or HAC estimation in time series data. In the proof in the appendix, this stronger assumption arises because the number of nuisance parameters (enty means is increasing when n. Under fixed, n asymptotics, stationary and summable cumulants are unnecessary and assumption 5 E Eu can be replaced by < and <, t,,. 0. As wrten, ˆ HR FE Σ is not guaranteed to be posive semi-define (psd. Asymptotically equivalent psd estimators can be constructed in a number of standard ˆ HR FE psd ways. For example if the spectral decomposion of ˆ HR FE Σ is Q ΛQ, then Σ Q Λ Q is psd.. hese results should extend to IV panel data regression wh heteroskedasticy, albe wh different formulas.. Monte Carlo Results A small Monte Carlo study was performed to assess the quantative importance of the bias in ˆ HR S Σ and the relative MSEs of the variance estimators. he design has a single regressor and Gaussian errors: y x β + u ( x ~ i.i.d. N(0, ( u x i ~ i.n.i.d. N(0, σ, σ λ(0. + x κ, ( 5

where κ ± and λ is chosen so that the uncondional variance of u is. he variance estimators considered are ˆ HR S Σ (given in (, ˆ HR FE Σ (given in (6, and Σˆ cluster (given in (0. he results, which are based on 0,000 Monte Carlo draws, are summarized in able (a (for κ and (b (for κ. he first three columns of results report the bias of the three estimators, relative to the true value of Σ (e.g., E[ ˆ HR S Σ Σ]/Σ. he next three columns report their MSEs, relative to the MSE of the infeasible HR estimator ˆ inf Σ n ( n u that could be constructed if the true errors were i t observed. he final three columns report the size of the 0% two-sided test of β β based on the t-statistic using the indicated variance estimator and the asymptotic normal crical value. Several results are noteworthy. First, the bias in ˆ HR S Σ can be large, persists as n increases wh fixed, and can be posive or negative depending on the design. For example, wh 5, and n 000, the relative bias of Second, a large bias in ˆ HR S Σ is.% when κ and is % when κ. ˆ HR S Σ can result in a very large relative MSE. Interestingly, in some cases wh small n and and κ, the MSE of ˆ HR S Σ is less than the MSE the infeasible estimator, apparently reflecting a bias-variance tradeoff. hird, the bias correction in ˆ HR FE Σ does s job: the relative bias of ˆ HR FE Σ is less than % in all cases wh n 00, and in most cases the MSE of ˆ HR FE Σ is very close to the MSE of the infeasible HR estimator. Fourth, consistent wh remark 8, the ratio of the MSE of the cluster variance estimator to the infeasible estimator depends on and does not converge to as n gets large for fixed. he MSE of the cluster estimator considerably exceeds the MSE of ˆ HR FE Σ when is moderate or large, regardless of n. Fifth, although the focus of this note has been bias and MSE, one would suspect that the variance estimators wh less bias would produce tests wh better size. able is consistent wh this conjecture: When ˆ HR S Σ is biased up, the t-tests reject too infrequently, and when ˆ HR S Σ is biased down, the t-tests reject too often. When is small, the magnudes of these size distortions can be considerable: for and n 0 6

000, the size of the nominal 0% test is.0% for κ and is 6.% when κ. In contrast, in all cases wh n 500, the other two variance estimators produce tests wh sizes that are whin Monte Carlo error of 0%. In more complicated designs, the size distortions of tests based on ˆ HR S Σ are even larger than reported in able.. Conclusions Our theoretical results and Monte Carlo simulations, combined wh the results in Hansen (005, suggest the following advice for empirical practice. he usual estimator Σ ˆ HR S can be used if but should not be used if >. If, ˆ HR FE Σ and Σ ˆ cluster are asymptotically equivalent and eher can be used. If > and there are good reasons to believe that u is condionally serially uncorrelated, then ˆ HR FE Σ will be more efficient than Σˆ cluster, so ˆ HR FE Σ should be used. If, however, serially correlated errors are a possibily as they are in many applications then Σ ˆ cluster should be used in conjunction wh t n or F.,n crical values for hypothesis tests on β [see Hansen (005]. References Arrelano, M. (00. Panel Data Econometrics, Oxford: Oxford Universy Press. Brillinger, D. (98. ime Series Data Analysis and heory. San Francisco: Holden- Day. Hansen, C. (005. Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data wh is Large, manuscript, Graduate School of Business, Universy of Chicago. Leonov, V.P. and Shiryaev, A.N. (959. On a Method of Calculation of Semi- Invariants. heoretical Probabily and s Applications,, 9-9. Whe, H. (980. A Heteroskedasticy-Consistent Covariance Matrix Estimator and a Direct ests for Heteroskedasticy, Econometrica, 8(, 87-88. 7

Appendix: Proof of (7 All lims in this appendix hold for any nondecreasing sequence (n, in which n and/or. o simplify the calculations, we consider the special case that is a scalar. Whout loss of generaly, let E 0. Adopt the notation u i i m. he proof repeatedly uses the inequaly t var ( a j j t u and m ( var( a j j. Begin by wring n ( ˆ HR FE Σ Σ as the sum of four terms using (6 and (9: n ( ˆ HR FE Σ Σ ˆ HR S ˆ HR S n Σ B EΣ B HR S ( HR S n ( n ˆ E Bˆ Σ Σ B HR S ( HR S HR S HR S ( n ˆ Σ Σ + n Σ EΣ n ( Bˆ B + n ( B B ( where HR S Σ is given in (8 and B is ˆB given in (6 wh u replaced by u. he proof of (7 proceeds by showing that, under the stated moment condions, ˆ HR S HR S (a n ( Σ EΣ (b n/ ( B B O p (/, (c ( ˆ HR S HR n S Σ Σ p 0, O p (, (d n/ ( B ˆ B p 0. 8

Substution of (a (d into ( yields. n ( ˆ HR FE Σ Σ O p ( and thus the result (7 (a From (8, we have that var n HR S HR S ( Σ EΣ n ( var u E u n i t var u t / so (a follows if can be shown that var ( t t u yields: u O(. Expanding u t A 0 A D + ( AD AD AA + + A AA / A A where A 0 u t, A t, A u t, A u t, A u t, D t, D u t, and D u t. hus var u {var(a 0 / + var(a D / + / var( t AD / + / var( AD / + / var(a A / + var(a A A / + -/ var( AA / } { / var( + + A ( EA ED / / 8 ( EA ED / 0 + / EA EA + ( / 8 8 ( EAEA ( EA /8 / / 8 + ( EA ED / / 8 8 / + ( EA EA } (5 9

where the second inequaly uses term-by-term inequalies, for example the second term in the final expression obtains using var(a D / sufficient condion for var ( t EA, ED, ED, and ED all are O(. EA D ( / EA ED. hus a u to be O( is that var(a 0, EA 8, EA 8, EA, 8 8 First consider the D terms. Because ED E, ED Eu, and (by Hölder s / / 8 inequaly ED E u ( ( E Eu, under assumption 5 all the D moments in (5 are O(. For the remainder of the proof of (a, drop the subscript i. Now turn to the A terms, starting wh A. Because t ( has mean zero and absolutely summable eighth cumulants, 8 EA E t t 8 h8 cov( t, t j + O( O( j where h 8 is the eighth moment of a standard normal random variable. he same 8 argument applied to u t yields EA O(. Now consider A and let ξ t t u t. hen EA E t ξ t t,..., t Eξ ξξξ t t t t ξt ξ t + t t cov(, t,..., t cum( ξ, ξ, ξ, ξ t t t t var(ξ t + t, t, t cum( ξ, ξ, ξ, ξ 0 t t t If a t is stationary wh mean zero, autocovariances γ j, and absolutely summable cumulants up to order k, then E( / a t t k h k ( γ k j j + O(. 0

E Eu + t t cum( 0u0, tu t, t u, t t u t (6 t, t, t where cum(. denotes the cumulant, the third equaly follows from assumption and the definion of the fourth cumulant (see definion.. of Brillinger (98, the fourth equaly follows by the stationary of ( t, u t and because cov(ξ t,ξ s 0 for t s by assumption, and the inequaly follows by Cauchy-Schwartz (first term. It remains to show that the final term in (6 is fine. We do so by using a result of Leonov and Shiryaev (959, stated as heorem.. in Brillinger (98, to express the cumulant of products as the product of cumulants. Let z s s and z s u s, and let ν m ν j denote a partion of the set of index pairs j S A {(0,, (0,, (t,, (t,, (t,, (t,, (t,, (t,}. heorem.. implies that cum( u, u, u, u 0 0 t t t t t t cum( z0z0, zt zt, zt zt, zt z t cum( zij, ij ν cum( zij, ij ν m, where the summation extends over all indecomposable partions of ν S A. Because ( t, u t has mean zero, cum( 0 cum(u 0 0 so all partions wh some ν k having a single element make a contribution of zero to the sum. hus nontrivial partions must have m. Separating out the partion wh m, we therefore have that cum( 0u0, tu t, t u, t t u t t, t, t t, t, t cum(, u,, u,, u,, u 0 0 t t t t t t + cum( zij, ij ν cum( zij, ij ν m. (7 ν: m,, t, t, t he first term on the right hand side of (7 satisfies cum( 0, u0, t, u t, t, u,, t t u t t, t, t t, t,..., t7 cum(, u,, u,, u,, u 0 t t t t t5 t6 t7 which is fine by assumption 5.

the form It remains to show that the second term in (7 is fine. Consider cumulants of cum(,...,, u,..., u (including the case of no s. When p, by t tr s s p assumption this cumulant is zero. When p, by assumption this cumulant is zero if s s. hus the only nontrivial partions of S A eher (i place two occurrences of u in one set and two in a second set, or (ii place all four occurrences of u in a single set. In case (i, the three-fold summation reduces to a single summation which can be handled by bounding one or more cumulants and invoking summabily. For example, one such term is t, t, t cum(, cum(, u, u cum(, u, u t 0 t t 0 t t t 0 t t u0 u0 0 ut t cum(, cum(,, cum(,, u t 0 0 0 0 t t < (8 t, t var( E Eu cum(, u, u where the inequaly uses cum( 0, var( 0, cum(, u0, u 0 t t E u t 0 E 0Eu 0, and cum( 0, ut, ut t t, t cum(, u, u ; all terms in the final 0 t t line of (8 are fine by assumption 5. For a partion to be indecomposable, must be that at least one cumulant under the single summation contains both time indexes 0 and t (if not, the partion satisfies Equation (..5 in Brillinger (98 and thus violates the row equivalency necessary and sufficient condion for indecomposabily. hus all terms in case (i can be handled in the same way (bounding and applying summabily to a cumulant wh indexes of both 0 and t as the term handled in (8. hus all terms in case (i are fine. In case (ii, the summation remains three-dimensional and all cases can be handled by bounding the cumulants not containing the u s and invoking absolute summabily for the cumulant containing the u s. A typical term is

cum( 0, u0, ut, u t, u cum(,, t t t t t, t, t E cum(, u, u, u, u 0 0 0 t, t, t t t t 0 cum( 0, t, t, t, t <. t,..., t E u u u u Because the number of partions is fine, the final term in (7 is fine, and follows from (6 that EA O(. Next consider A. he argument that EA for A. he counterpart of the final line of (6 is O( closely follows the argument EA E Eu + 8 t t t, t, t cum( u, u, u, u 0 0 0 t t t t t t t t t so the leading term in the counterpart of (7 is a twelfth cumulant, which is absolutely summable by assumption 5. Following the remaining steps shows that EA <. Now turn to A 0. he logic of (7 implies that var(a 0 var t u t cov( u, u 0 0 t t 0 0 0 0 t cum(,, u, u,,, u, u t t t t + zij ij ν zij ij ν m ν: m,,t cum(, cum(, (9 where the summation over ν extends over indecomposable partions of S A 0 {(0,, (0,, (0,, (0,, (t,, (t,, (t,, (t,} wh m. he first term in the final line of (9 is fine by assumption 5. For a partion of S A 0 to be indecomposable, at least

one cumulant must have indexes of both 0 and t (otherwise Brillinger s (98 Equation (..5 is satisfied. hus the bounding and summabily steps of (8 can be applied to all partions in (9, so var(a 0 O(. his proves (a. (b First note that E B B: E B n E is n i u t s E uis u isuir t s ( s r E uis uis t s ( s B where the penultimate equaly obtains because u is condionally serially uncorrelated. hus n E ( B B var uis t s is t s E u E Eu 8 8 is (0 where the first inequaly uses and t t u t u t. he result (b follows from (0. Inspection of the right hand side of the first line in (0 reveals that this variance is posive for fine, so that under fixed- asymptotics the estimation of B makes a /n contribution to the variance of ˆ HR FE Σ. (c ( ˆ HR S HR n S Σ Σ n n n k n ˆ u i t n ( n n i t u

n n ( ˆ u u n ( k n i t k n HR S Σ. ( n ( k An implication of (a is that HR S Σ p E Σ HR S, so the second term in ( is O p (/ n. o show that the first term in ( is o p ( suffices to show that i t ( n u ˆ u n p 0. Because u u ˆ ( ˆ β β, ( u u ( n ˆ n β β n ˆ n i t n n i t n ˆ β β u n ( n ˆ β β ( n n i t ( / ( i t + ( ˆ n β u Consider the first term in (. Now n ( ˆ β β n n ˆ β β u n i t n β i n i. ( t O p ( and E ( n / n i t n E ( 0 where convergence follows because E < is implied by E <. hus, by ( Markov s inequaly the first term in ( converges in probabily to zero. Next consider the second term in (. Because u is condionally serially uncorrelated, u has (respectively moments, and ( has moments (because has moments, var n 6 u n E( u i t n n ( E ( Eu 0. 5

his result and n ( ˆ β β O p ( imply that the second term in ( converges in probabily to zero. urning to the final term in (, because u is condionally serially uncorrelated, has moments, u has moments, var n n ui i t E u n t t u 0 t E E n his result and n ( ˆ β β O p ( imply that the final term in ( converges in probabily to zero, and (c follows. (d Use u u ˆ ( ˆ β β and collect terms to obtain ( ˆ n ( ˆ u is u is n/ B B n i t s n / ( n i t n ( ˆ β β n ˆ n β β isuis n i t. ( s ( Because n ( ˆ β β O p ( and has four moments, by Markov s inequaly the first term in ( converges in probabily to zero (the argument is like that used for the first term in (. urning to the second term in (, n var is is n i u t s var isu is n ( t s 6

n ( E Eu 0 so the second term in ( converges in probabily to zero, and (d follows. Details of remark 9. he only place in this proof that the summable cumulant condion is used is to bound the A moments in part (a. If is fixed, a sufficient condion for the moments of A to be bounded is that and u have moments. Stationary of (, u is used repeatedly but, if is fixed, stationary could be relaxed by replacing moments such as E wh max E. hus, under -fixed, n t asymptotics, assumption 5 could be replaced by the assumption that E < and Eu < for t,,. Details of remark. If (, u is i.i.d., t,,, i,, n, then Σ E u Q σ + Ω, where Ω jk u, where is the j th element of j cov( j k, u. Also, the (j,k element of B is BBjk E u Q σ + cov(,, jk u j k uis j k is t s Q, jk σ u + Ωj k, t s where the final equaly uses, for t s, cov(, u cov( j k, uis j k ( Ω jk (because (, u is i.i.d. over t. hus B Q σ + ( Ω u Q σ u + ( (Σ Q σ. he result stated in the remark follows by substuting this final u expression for B into (5, noting that Σ ˆ homosk p Q σ, and collecting terms. u 7

able. Monte Carlo Results: Bias, Relative MSE, and Size for hree Variance Estimators Design: y x β + u, i,, n, t,, x ~ i.i.d. N(0, u x i ~ i.n.i.d. N(0, σ ; σ (0. + x κ /E[(0. + x κ ] (a κ n ˆ Bias relative to true MSE relative to infeasible Size (nominal level 0% HR S Σ ˆ HR FE Σ ˆ cluster Σ ˆ HR S Σ ˆ HR FE Σ Σ ˆ cluster ˆ HR S Σ ˆ HR FE Σ Σ ˆ cluster 50-0.80-0.05-0.068 0.78.05.0 0.7 0.5 0.8 5 50-0.5-0.09-0.06 0.8 0.98. 0. 0. 0. 0 50-0.07-0.0-0.0 0.9 0.99.7 0.9 0.08 0.9 5 50-0.00-0.005-0.06 0.96 0.99. 0.07 0.0 0. 50 50-0.05-0.00-0.0 0.98 0.99.8 0.0 0.0 0.0 00 50-0.008-0.00-0.00 0.99.00 6.95 0.099 0.098 0.07 00-0.60-0.07-0.05 0.89..0 0. 0.8 0.0 5 00-0. -0.05-0.0 0.95.0.0 0.7 0.06 0.0 0 00-0.067-0.006-0.06 0.99.0.5 0.6 0.05 0.08 5 00-0.08-0.00-0.0.00.00. 0.0 0.099 0.0 50 00-0.0-0.00-0.0.00.00.95 0.0 0.00 0.0 00 00-0.007-0.00-0.0.00.00 6.9 0.0 0.00 0.06 500-0. -0.006-0.008.60..0 0. 0.097 0.097 5 500-0. -0.00-0.00.70.07.0 0. 0.0 0.0 0 500-0.06-0.00-0.00.5.0.55 0. 0.0 0.0 5 500-0.06 0.000-0.00.9.0.8 0.0 0.00 0.0 50 500-0.0 0.000-0.00.0.00.06 0.0 0.00 0.0 00 500-0.007 0.000-0.00.05.00 7. 0.0 0.00 0.0 000-0.9-0.00-0.00.5.. 0.0 0.0 0.0 5 000-0. -0.00-0.00.59.08.9 0. 0.099 0.00 0 000-0.06-0.00-0.00.00.0.56 0.09 0.098 0.099 5 000-0.06 0.000-0.00..0.6 0.05 0.0 0.0 50 000-0.0 0.000-0.00..00.9 0.0 0.00 0.00 00 000-0.006 0.000 0.000..00 7. 0.0 0.0 0.0 8

able, ctd. (b κ n ˆ Bias relative to true MSE relative to infeasible Size (nominal level 0% HR S Σ ˆ HR FE Σ ˆ cluster Σ ˆ HR S Σ ˆ HR FE Σ Σ ˆ cluster ˆ HR S Σ ˆ HR FE Σ Σ ˆ cluster 50 0.7 0.0-0.0.7..8 0.067 0.05 0.0 5 50 0. 0.007-0.0 5.0.68.0 0.060 0.0 0.07 0 50 0. 0.00-0.07 6.96.5.57 0.068 0.0 0.0 5 50 0.9 0.00-0.07 6.6..0 0.08 0.0 0.08 50 50 0.065 0.000-0.08.6.9.5 0.09 0.0 0. 00 50 0.0 0.000-0.00.. 69.9 0.09 0.00 0.0 00 0.70 0.006-0.007.78.0.8 0.06 0.099 0.0 5 00 0. 0.00-0.006 8.65.66.0 0.059 0.099 0.0 0 00 0. 0.00-0.009.68.5.68 0.065 0.098 0.0 5 00 0.9 0.00-0.008.09.. 0.08 0.0 0.06 50 00 0.065 0.000-0.009 7.9.9.6 0.090 0.0 0.07 00 00 0.0 0.000-0.00 5.9. 70.98 0.09 0.00 0.05 500 0.7 0.00-0.00.59..0 0.06 0.098 0.098 5 500 0.09 0.000-0.00 5.8.66.0 0.059 0.099 0.099 0 500 0. 0.00-0.00 55.7.50.8 0.066 0.099 0.099 5 500 0.8 0.000-0.00 9...5 0.08 0.098 0.00 50 500 0.06 0.000-0.00.6.9.99 0.090 0.00 0.0 00 500 0.0 0.000-0.00.6. 7.9 0.09 0.098 0.099 000 0.69 0.00 0.000 5.7.. 0.06 0.099 0.099 5 000 0.0 0.000-0.00 70.65.66.09 0.059 0.099 0.099 0 000 0. 0.000-0.00 08.60.50.66 0.069 0.099 0.099 5 000 0.8 0.000-0.00 97.76..9 0.08 0.0 0.0 50 000 0.06 0.000-0.00 68.8.9. 0.088 0.098 0.099 00 000 0.0 0.000-0.00 0.87.0 70.8 0.09 0.098 0.00 Notes to able : he first three columns of results report the bias of the indicated estimator as a fraction of the true variance. he next three columns report the MSE of the indicated estimator, relative to the MSE of the infeasible estimator Σ ˆ inf ( n n i t u. he final three columns report rejection rate under the null hypothesis of the -sided test of β β0 based on the t-statistic computed using the indicated variance estimator and the asymptotic normal crical value, where the test has a nominal level of 0%. All results are based on 0,000 Monte Carlo draws. 9