TECHNICAL WORKING PAPER SERIES

HETEROSKEDASTICITY-ROBUST STANDARD ERRORS FOR FIXED EFFECTS PANEL DATA REGRESSION

James H. Stock
Mark W. Watson

Technical Working Paper
http://www.nber.org/papers/0

NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
June 2006

We thank Alberto Abadie, Gary Chamberlain, Doug Staiger, and Hal White for helpful discussions and/or comments and Anna Mikusheva for research assistance. This research was supported in part by NSF grant SBR-0. The views expressed herein are those of the author(s) and do not necessarily reflect the views of the National Bureau of Economic Research.

© 2006 by James H. Stock and Mark W. Watson. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression
James H. Stock and Mark W. Watson
NBER Technical Working Paper No.
June 2006
JEL No. C, C

ABSTRACT

The conventional heteroskedasticity-robust (HR) variance matrix estimator for cross-sectional regression (with or without a degrees of freedom adjustment), applied to the fixed effects estimator for panel data with serially uncorrelated errors, is inconsistent if the number of time periods $T$ is fixed (and greater than two) as the number of entities $n$ increases. We provide a bias-adjusted HR estimator that is $(nT)^{1/2}$-consistent under any sequences $(n, T)$ in which $n$ and/or $T$ increase to $\infty$.

James H. Stock
Department of Economics
Harvard University
Littauer Center M
Cambridge, MA 02138
and NBER
james_stock@harvard.edu

Mark W. Watson
Department of Economics
Princeton University
Princeton, NJ 085-0
and NBER
mwatson@princeton.edu

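1. Model and Theoretical Results

Consider the fixed effects regression model,

$$Y_{it} = \alpha_i + \beta' X_{it} + u_{it}, \quad i = 1, \ldots, n, \; t = 1, \ldots, T, \qquad (1)$$

where $X_{it}$ is a $k \times 1$ vector of regressors and where $(X_{it}, u_{it})$ satisfy:

Heteroskedastic panel data model with conditionally uncorrelated errors

1. $(X_{i1}, \ldots, X_{iT}, u_{i1}, \ldots, u_{iT})$ are i.i.d. over $i = 1, \ldots, n$ (i.i.d. over entities),
2. $E(u_{it} \mid X_{i1}, \ldots, X_{iT}) = 0$ (strict exogeneity),
3. $Q_{\tilde X \tilde X} \equiv E\, T^{-1} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}'$ is nonsingular (no perfect multicollinearity), and
4. $E(u_{it} u_{is} \mid X_{i1}, \ldots, X_{iT}) = 0$ for $t \neq s$ (conditionally serially uncorrelated errors).

For the asymptotic results we will further assume:

Stationarity and moment condition

5. $(X_{it}, u_{it})$ is stationary and has absolutely summable cumulants up to order twelve.

The fixed effects estimator is

$$\hat\beta_{FE} = \left( \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \right)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde Y_{it}, \qquad (2)$$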

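where the superscript ~ over variables denotes deviations from entity means ($\tilde X_{it} = X_{it} - T^{-1} \sum_{s=1}^{T} X_{is}$, etc.). The asymptotic distribution of $\hat\beta_{FE}$ is [e.g. Arellano (2003)]

$$\sqrt{nT}\,(\hat\beta_{FE} - \beta) \xrightarrow{d} N(0,\, Q_{\tilde X \tilde X}^{-1} \Sigma\, Q_{\tilde X \tilde X}^{-1}), \quad \text{where } \Sigma = \frac{1}{T} \sum_{t=1}^{T} E(\tilde X_{it} \tilde X_{it}' u_{it}^2). \qquad (3)$$

The variance of the asymptotic distribution in (3) is estimated by $\hat Q_{\tilde X \tilde X}^{-1} \hat\Sigma\, \hat Q_{\tilde X \tilde X}^{-1}$, where $\hat Q_{\tilde X \tilde X} = (nT)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}'$ and $\hat\Sigma$ is a heteroskedasticity-robust (HR) covariance matrix estimator. A frequently used HR estimator of $\Sigma$ is

$$\hat\Sigma^{HR-XS} = \frac{1}{nT - n - k} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \hat{\tilde u}_{it}^2 \qquad (4)$$

where $\{\hat{\tilde u}_{it}\}$ are the fixed-effects regression residuals, $\hat{\tilde u}_{it} = \tilde u_{it} - (\hat\beta_{FE} - \beta)' \tilde X_{it}$. (For example, $\hat\Sigma^{HR-XS}$ is the estimator used in STATA and Eviews.)

Although $\hat\Sigma^{HR-XS}$ is consistent in cross-section regression [White (1980)], it turns out to be inconsistent in panel data regression with fixed $T$. Specifically, an implication of the results in the appendix is that, under fixed-$T$ asymptotics with $T > 2$,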

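$$\hat\Sigma^{HR-XS} \xrightarrow{p} \Sigma + \frac{1}{T-1}(B - \Sigma) \quad (n \to \infty,\ T \text{ fixed}), \quad \text{where } B = E\left[ \left( \frac{1}{T} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \right) \left( \frac{1}{T} \sum_{s=1}^{T} u_{is}^2 \right) \right]. \qquad (5)$$

The expression for $B$ in (5) suggests the bias-adjusted estimator,

$$\hat\Sigma^{HR-FE} = \frac{T-1}{T-2} \left( \hat\Sigma^{HR-XS} - \frac{1}{T-1} \hat B \right), \quad \text{where } \hat B = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{T} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \right) \left( \frac{1}{T-1} \sum_{s=1}^{T} \hat{\tilde u}_{is}^2 \right), \qquad (6)$$

where the estimator is defined for $T > 2$.

It is shown in the appendix that, if assumptions 1-5 hold, then under any sequence $(n, T)$ in which $n \to \infty$ and/or $T \to \infty$ (which includes the cases of $n$ fixed or $T$ fixed),

$$\hat\Sigma^{HR-FE} = \Sigma + O_p(1/\sqrt{nT}) \qquad (7)$$

so the problematic bias term of order $T^{-1}$ is eliminated if $\hat\Sigma^{HR-FE}$ is used.

Remarks

1. The bias arises because the entity means are not consistently estimated when $T$ is fixed, so the usual step of replacing estimated regression coefficients with their probability limits is inapplicable. This can be seen by considering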

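$$\tilde\Sigma^{HR-XS} = \frac{1}{n(T-1)} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \tilde u_{it}^2, \qquad (8)$$

which is the infeasible version of $\hat\Sigma^{HR-XS}$ in which $\beta$ is treated as known and the degrees-of-freedom correction $k$ is omitted. The bias calculation is short:

$$E \tilde\Sigma^{HR-XS} = E\, \frac{1}{n(T-1)} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it} \tilde X_{it}' \left( u_{it} - \frac{1}{T} \sum_{s=1}^{T} u_{is} \right)^2$$
$$= \frac{1}{T-1} \sum_{t} E(\tilde X_{it} \tilde X_{it}' u_{it}^2) - \frac{2}{T(T-1)} \sum_{t} \sum_{s} E(\tilde X_{it} \tilde X_{it}' u_{it} u_{is}) + \frac{1}{T^2 (T-1)} \sum_{t} \sum_{s} \sum_{r} E(\tilde X_{it} \tilde X_{it}' u_{is} u_{ir})$$
$$= \frac{T-2}{T-1}\, \Sigma + \frac{1}{T-1}\, B, \qquad (9)$$

where the third equality uses the assumption $E(u_{it} u_{is} \mid X_{i1}, \ldots, X_{iT}) = 0$ for $t \neq s$; rearranging the final expression in (9) yields the plim in (5). The source of the bias is the final two terms in the second line of (9), both of which appear because of estimating the entity means. The problem created by the entity means is an example of the general problem of having increasingly many incidental parameters.

2. The asymptotic bias in $\hat\Sigma^{HR-XS}$ is $O(1/T)$. An implication of the calculations in the appendix is that $\mathrm{var}(\hat\Sigma^{HR-XS}) = O(1/nT)$, so $\mathrm{MSE}(\hat\Sigma^{HR-XS}) = O(1/T^2) + O(1/nT)$.

3. In general, $B - \Sigma$ is neither positive nor negative semidefinite, so standard errors computed using $\hat\Sigma^{HR-XS}$ can in general either be too large or too small.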

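4. If $(X_{it}, u_{it})$ are i.i.d. over $t$ as well as over $i$, then the asymptotic bias in $\hat\Sigma^{HR-XS}$ is proportional to the asymptotic bias in the homoskedasticity-only estimator, $\hat\Sigma^{homosk} = \hat Q_{\tilde X \tilde X} \hat\sigma_u^2$, where $\hat\sigma_u^2 = (nT - n - k)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \hat{\tilde u}_{it}^2$. Specifically, $\mathrm{plim}(\hat\Sigma^{HR-XS} - \Sigma) = b_T\, \mathrm{plim}(\hat\Sigma^{homosk} - \Sigma)$, where $b_T = (T-2)/(T-1)^2$. In this sense, $\hat\Sigma^{HR-XS}$ undercorrects for heteroskedasticity.

5. One case in which $\hat\Sigma^{HR-XS} \xrightarrow{p} \Sigma$ is when $T = 2$, in which case the fixed effects estimator and $\hat\Sigma^{HR-XS}$ are equivalent to the estimator and HR variance matrix computed using first-differences of the data (suppressing the intercept).

6. Another case in which $\hat\Sigma^{HR-XS}$ is consistent is when the errors are homoskedastic: if $E(u_{it}^2 \mid X_{i1}, \ldots, X_{iT}) = \sigma_u^2$, then $B = \Sigma = Q_{\tilde X \tilde X} \sigma_u^2$.

7. Another estimator of $\Sigma$ is the clustered (over entities) variance estimator,

$$\hat\Sigma^{cluster} = \frac{1}{nT} \sum_{i=1}^{n} \left( \sum_{t=1}^{T} \tilde X_{it} \hat{\tilde u}_{it} \right) \left( \sum_{s=1}^{T} \tilde X_{is} \hat{\tilde u}_{is} \right)'. \qquad (10)$$

If $T = 3$, then the infeasible version of $\hat\Sigma^{HR-FE}$ (in which $\beta$ is known) equals the infeasible version of $\hat\Sigma^{cluster}$, and $\hat\Sigma^{HR-FE}$ is asymptotically equivalent to $\hat\Sigma^{cluster}$ to order $1/\sqrt{n}$; but for $T > 3$, $\hat\Sigma^{cluster}$ and $\hat\Sigma^{HR-FE}$ differ. Interestingly, the problem of no consistent estimation of the entity means does not affect the clustered variance estimator for any value of $T$ because of the (idempotent matrix) identity $\sum_{t=1}^{T} \tilde X_{it} \tilde u_{it} = \sum_{t=1}^{T} \tilde X_{it} u_{it}$. This identity does not hold in general for heteroskedasticity- and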

autocorrelation-consistent (HAC) kernel estimators of $\Sigma$, rather it arises as a special case for the untruncated rectangular kernel used in the cluster variance estimator. Thus the means-estimation problem discussed above for $\hat\Sigma^{HR-XS}$ seems likely to arise for HAC panel data estimators other than $\hat\Sigma^{cluster}$.

8. Under general $(n, T)$ sequences ($n \to \infty$ and/or $T \to \infty$), $\hat\Sigma^{cluster} = \Sigma + O_p(1/\sqrt{n})$ [Hansen (2005)]. Because $\hat\Sigma^{HR-FE} = \Sigma + O_p(1/\sqrt{nT})$, if the errors are conditionally serially uncorrelated and $T$ is moderate or large then $\hat\Sigma^{HR-FE}$ will be more efficient than $\hat\Sigma^{cluster}$.

9. The assumption of absolutely summable cumulants, which is used in the proof of the $\sqrt{nT}$-consistency of $\hat\Sigma^{HR-FE}$, is stronger than needed to justify HR variance estimation in cross-sectional data or HAC estimation in time series data. In the proof in the appendix, this stronger assumption arises because the number of nuisance parameters (entity means) is increasing when $n \to \infty$. Under fixed $T$, $n \to \infty$ asymptotics, stationarity and summable cumulants are unnecessary and assumption 5 can be replaced by $E X_{it}^{12} < \infty$ and $E u_{it}^{12} < \infty$, $t = 1, \ldots, T$.

10. As written, $\hat\Sigma^{HR-FE}$ is not guaranteed to be positive semi-definite (psd). Asymptotically equivalent psd estimators can be constructed in a number of standard ways. For example, if the spectral decomposition of $\hat\Sigma^{HR-FE}$ is $Q' \Lambda Q$, then $\hat\Sigma^{HR-FE-psd} = Q' |\Lambda| Q$ is psd.

11. These results should extend to IV panel data regression with heteroskedasticity, albeit with different formulas.
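The estimators in (4), (6), and (10) can be sketched in code for the scalar-regressor case ($k = 1$). This is a minimal illustration under stated assumptions, not the authors' implementation: the balanced panel stored as nested lists `x[i][t]`, `y[i][t]` and every function name below are introduced here for illustration only.

```python
"""Sketch of the variance estimators in (4), (6), and (10), scalar regressor.

Assumptions (not from the paper): a balanced panel held as nested lists
indexed [entity][period], a single regressor, and hypothetical helper names.
"""
from typing import List, Tuple

Panel = List[List[float]]


def demean(panel: Panel) -> Panel:
    """Subtract each entity's time average (the '~' transformation)."""
    out = []
    for row in panel:
        mean = sum(row) / len(row)
        out.append([v - mean for v in row])
    return out


def fe_residuals(x: Panel, y: Panel) -> Tuple[float, Panel, Panel]:
    """Return (beta_hat_FE, demeaned x, fixed-effects residuals)."""
    xt, yt = demean(x), demean(y)
    sxx = sum(a * a for row in xt for a in row)
    sxy = sum(a * b for rx, ry in zip(xt, yt) for a, b in zip(rx, ry))
    beta = sxy / sxx
    resid = [[b - beta * a for a, b in zip(rx, ry)] for rx, ry in zip(xt, yt)]
    return beta, xt, resid


def sigma_hr_xs(xt: Panel, resid: Panel, k: int = 1) -> float:
    """Conventional HR estimator, equation (4); k is the d.o.f. correction."""
    n, T = len(xt), len(xt[0])
    total = sum((a * e) ** 2 for rx, ru in zip(xt, resid) for a, e in zip(rx, ru))
    return total / (n * T - n - k)


def b_hat(xt: Panel, resid: Panel) -> float:
    """Estimator of B in equation (6)."""
    n, T = len(xt), len(xt[0])
    total = 0.0
    for rx, ru in zip(xt, resid):
        total += (sum(a * a for a in rx) / T) * (sum(e * e for e in ru) / (T - 1))
    return total / n


def sigma_hr_fe(xt: Panel, resid: Panel, k: int = 1) -> float:
    """Bias-adjusted HR estimator, equation (6); requires T > 2."""
    T = len(xt[0])
    if T <= 2:
        raise ValueError("the bias-adjusted estimator requires T > 2")
    return (T - 1) / (T - 2) * (sigma_hr_xs(xt, resid, k) - b_hat(xt, resid) / (T - 1))


def sigma_cluster(xt: Panel, resid: Panel) -> float:
    """Clustered (over entities) estimator, equation (10)."""
    n, T = len(xt), len(xt[0])
    total = sum(sum(a * e for a, e in zip(rx, ru)) ** 2 for rx, ru in zip(xt, resid))
    return total / (n * T)
```

A typical call would be `beta, xt, resid = fe_residuals(x, y)` followed by `sigma_hr_fe(xt, resid)`. With $T = 3$ and the degrees-of-freedom correction switched off (`k=0`), `sigma_hr_fe` and `sigma_cluster` return the same number whenever the rows of `xt` and `resid` each sum to zero, which is the finite-sample counterpart of the $T = 3$ equivalence noted in remark 7.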

2. Monte Carlo Results

A small Monte Carlo study was performed to assess the quantitative importance of the bias in $\hat\Sigma^{HR-XS}$ and the relative MSEs of the variance estimators. The design has a single regressor and Gaussian errors:

$$y_{it} = x_{it} \beta + u_{it} \qquad (11)$$

$$x_{it} \sim \text{i.i.d. } N(0, 1) \qquad (12)$$

$$u_{it} \mid x_i \sim \text{i.n.i.d. } N(0, \sigma_{it}^2), \quad \sigma_{it}^2 = \lambda (0.1 + x_{it}^2)^{\kappa}, \qquad (13)$$

where $\kappa = \pm 1$ and $\lambda$ is chosen so that the unconditional variance of $u_{it}$ is 1. The variance estimators considered are $\hat\Sigma^{HR-XS}$ (given in (4)), $\hat\Sigma^{HR-FE}$ (given in (6)), and $\hat\Sigma^{cluster}$ (given in (10)). The results, which are based on 0,000 Monte Carlo draws, are summarized in Table 1(a) (for $\kappa = 1$) and 1(b) (for $\kappa = -1$). The first three columns of results report the bias of the three estimators, relative to the true value of $\Sigma$ (e.g., $E[\hat\Sigma^{HR-XS} - \Sigma]/\Sigma$). The next three columns report their MSEs, relative to the MSE of the infeasible HR estimator $\hat\Sigma^{inf} = (nT)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde X_{it}^2 u_{it}^2$ that could be constructed if the true errors were observed. The final three columns report the size of the 10% two-sided test of $\beta = \beta_0$ based on the t-statistic using the indicated variance estimator and the asymptotic normal critical value. Several results are noteworthy.

First, the bias in $\hat\Sigma^{HR-XS}$ can be large, persists as $n$ increases with $T$ fixed, and can be positive or negative depending on the design. For example, with T 5, and n 000, the relative bias of $\hat\Sigma^{HR-XS}$ is .% when $\kappa = 1$ and is % when $\kappa = -1$.

Second, a large bias in $\hat\Sigma^{HR-XS}$ can result in a very large relative MSE. Interestingly, in some cases with small $n$ and $T$ and $\kappa = 1$, the MSE of $\hat\Sigma^{HR-XS}$ is less than the MSE of the infeasible estimator, apparently reflecting a bias-variance tradeoff.

Third, the bias correction in $\hat\Sigma^{HR-FE}$ does its job: the relative bias of $\hat\Sigma^{HR-FE}$ is less than % in all cases with n 00, and in most cases the MSE of $\hat\Sigma^{HR-FE}$ is very close to the MSE of the infeasible HR estimator.

Fourth, consistent with remark 8, the ratio of the MSE of the cluster variance estimator to the infeasible estimator depends on $T$ and does not converge to 1 as $n$ gets large for fixed $T$. The MSE of the cluster estimator considerably exceeds the MSE of $\hat\Sigma^{HR-FE}$ when $T$ is moderate or large, regardless of $n$.

Fifth, although the focus of this note has been bias and MSE, one would suspect that the variance estimators with less bias would produce tests with better size. Table 1 is consistent with this conjecture: when $\hat\Sigma^{HR-XS}$ is biased up, the t-tests reject too infrequently, and when $\hat\Sigma^{HR-XS}$ is biased down, the t-tests reject too often. When $T$ is small, the magnitudes of these size distortions can be considerable: for T and n 000, the size of the nominal 10% test is .0% for $\kappa = 1$ and is 6.% when $\kappa = -1$. In contrast, in all cases with n 500, the other two variance estimators produce tests with sizes that are within Monte Carlo error of 10%. In more complicated designs, the size distortions of tests based on $\hat\Sigma^{HR-XS}$ are even larger than reported in Table 1.
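The data-generating process of the Monte Carlo design can be sketched as follows. This is an illustrative sketch only, not the authors' simulation code: the function names, the nested-list panel layout, the quadrature settings, and the default additive constant `c = 0.1` are assumptions made here. The scale $\lambda$ is computed as $1 / E[(c + x^2)^{\kappa}]$ for $x \sim N(0,1)$, which makes the unconditional variance of the error equal to one.

```python
import math
import random
from typing import List, Tuple

Panel = List[List[float]]


def normalizing_lambda(kappa: float, c: float = 0.1,
                       half_width: float = 10.0, steps: int = 40000) -> float:
    """Return 1 / E[(c + x^2)^kappa] for x ~ N(0,1), via the trapezoid rule.

    c, half_width, and steps are illustrative choices (assumptions).
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * (c + x * x) ** kappa * math.exp(-0.5 * x * x)
    expectation = total * h / math.sqrt(2.0 * math.pi)
    return 1.0 / expectation


def simulate_panel(n: int, T: int, beta: float, kappa: float,
                   rng: random.Random, c: float = 0.1) -> Tuple[Panel, Panel]:
    """Draw one panel: x ~ N(0,1), u | x ~ N(0, lambda * (c + x^2)^kappa)."""
    lam = normalizing_lambda(kappa, c)
    x: Panel = []
    y: Panel = []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in range(T)]
        ui = [math.sqrt(lam * (c + v * v) ** kappa) * rng.gauss(0.0, 1.0) for v in xi]
        x.append(xi)
        y.append([beta * v + e for v, e in zip(xi, ui)])
    return x, y
```

For example, `simulate_panel(500, 5, 1.0, -1.0, random.Random(0))` returns one draw with heteroskedasticity that decreases in $x^2$. Repeating the draw, computing each variance estimator, and averaging would give bias and MSE figures of the kind the tables report; the design has no entity effect, which is harmless because the fixed effects estimator is invariant to it.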

3. Conclusions

Our theoretical results and Monte Carlo simulations, combined with the results in Hansen (2005), suggest the following advice for empirical practice. The usual estimator $\hat\Sigma^{HR-XS}$ can be used if $T = 2$ but should not be used if $T > 2$. If $T = 3$, $\hat\Sigma^{HR-FE}$ and $\hat\Sigma^{cluster}$ are asymptotically equivalent and either can be used. If $T > 3$ and there are good reasons to believe that $u_{it}$ is conditionally serially uncorrelated, then $\hat\Sigma^{HR-FE}$ will be more efficient than $\hat\Sigma^{cluster}$, so $\hat\Sigma^{HR-FE}$ should be used. If, however, serially correlated errors are a possibility, as they are in many applications, then $\hat\Sigma^{cluster}$ should be used in conjunction with $t_{n-1}$ or $F_{\cdot,\,n-1}$ critical values for hypothesis tests on $\beta$ [see Hansen (2005)].
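The advice above can be summarized as a small decision helper. This is only a sketch of one reading of the recommendations: the function name, the string labels, and the choice to return the conventional estimator whenever $T = 2$ (where the fixed effects estimator reduces to first differences with one observation per entity) are assumptions made here.

```python
def recommend_variance_estimator(T: int, serial_correlation_possible: bool) -> str:
    """Map (T, belief about serial correlation) to an estimator label.

    Labels are hypothetical: "HR-XS" for (4), "HR-FE" for (6), "cluster" for (10).
    """
    if T < 2:
        raise ValueError("fixed effects regression needs at least T = 2 periods")
    if T == 2:
        # With two periods the conventional estimator is consistent.
        return "HR-XS"
    if serial_correlation_possible:
        # Robust to serial correlation; pair with t or F critical values.
        return "cluster"
    # T > 2 with conditionally serially uncorrelated errors.
    return "HR-FE"
```

For $T = 3$ with serially uncorrelated errors the helper returns `"HR-FE"`, although the clustered estimator is asymptotically equivalent in that case and would be an equally valid choice.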

References

Arellano, M. (2003). Panel Data Econometrics. Oxford: Oxford University Press.

Brillinger, D. (1981). Time Series Data Analysis and Theory. San Francisco: Holden-Day.

Hansen, C. (2005). "Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data when T is Large," manuscript, Graduate School of Business, University of Chicago.

Leonov, V.P. and Shiryaev, A.N. (1959). "On a Method of Calculation of Semi-Invariants," Theory of Probability and its Applications, 4, 319-329.

White, H. (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48(4), 817-838.

Appendix: Proof of (7) All lims in this appendix hold for any nondecreasing sequence (n, ) in which n and/or. o simplify the calculations, we consider the special case that is a scalar. Whout loss of generaly, let E 0. Adopt the notation u i i m. he proof repeatedly uses the inequaly t var ( a j j ) t u and m ( var( a )) j j. Begin by wring n ( ˆ HR FE Σ Σ) as the sum of four terms using (6) and (9): ˆ HR FE n ( Σ Σ) ˆ HR S ˆ HR S n Σ B EΣ B HR S ( HR S n ) ( ) n ˆ E Bˆ Σ Σ B HR S ( HR S HR S HR S ) ( ) n ˆ Σ Σ + n Σ EΣ n ( Bˆ B ) + n ( B ) B () where HR S Σ is given in (8) and B is ˆB given in (6) wh u replaced by u. he proof of (7) proceeds by showing that, under the stated moment condions, ˆ HR S HR S (a) n ( Σ EΣ ) O p (),

(b) n/ ( B B) O p (/ ), (c) ( ˆ HR S HR n S ) Σ Σ p 0, (d) n/ ( B ˆ B ) p 0. Substution of (a) (d) into () yields n ( ˆ HR FE Σ Σ) O p () and thus the result (7). (a) From (8), we have that var n HR S HR S ( Σ EΣ n ) ( var ) u E u n i t var u t / so (a) follows if can be shown that var ( t ) u O(). Expanding t u yields: u t A 0 A D + ( AD AD AA) + + A AA / A A

where A 0 u t, A t, A u t, A u t, A u t, D t, D u t, and D u t. hus var u {var(a 0 ) / + var(a D ) / + / var( t AD ) / + / var( AD ) / + / var(a A ) / + var(a A A ) / + -/ var( AA ) / } { / var( ) + + A ( EA ED ) / / 8 ( EA ED ) / 0 / 8 + ( EA ED ) / ( ) / 8 8 ( EAEA) ( EA) + / EA EA + /8 / / 8 8 / + ( EA EA ) } (5) where the second inequaly uses term-by-term inequalies, for example the second term in the final expression obtains using var(a D ) EA D ( ) / EA ED. hus a / sufficient condion for var ( t ) EA, ED, ED, and ED all are O(). u to be O() is that var(a 0 ), EA 8, EA 8, EA, 8 8 First consider the D terms. Because ED, ED, and (by Hölder s / / 8 inequaly) ED E u ( ) ( ) in (5) are O(). E Eu E Eu, under assumption 5 all the D moments

For the remainder of the proof of (a), drop the subscript i. Now turn to the A terms, starting wh A. Because t ( ) has mean zero and absolutely summable eighth cumulants, 8 EA E t t 8 h8 cov( t, t j) + O( ) O() j where h 8 is the eighth moment of a standard normal random variable. he same 8 argument applied to u t yields EA O(). Now consider A and let ξ t t u t. hen EA E t ξ t t,..., t Eξ ξξξ t t t t ξt ξ t + t t cov(, ) t,..., t cum( ξ, ξ, ξ, ξ ) t t t t var(ξ t ) + t, t, t cum( ξ, ξ, ξ, ξ ) 0 t t t E Eu + t t cum( 0u0, tu t, t u, ) t t u t (6) t, t, t where cum(.) denotes the cumulant, the third equaly follows from assumption and the definion of the fourth cumulant (see definion.. of Brillinger (98)), the fourth If a t is stationary wh mean zero, autocovariances γ j, and absolutely summable cumulants up to order k, then E( / a t t ) k h k ( γ ) k j j + O( ).

equaly follows by the stationary of ( t, u t ) and because cov(ξ t,ξ s ) 0 for t s by assumption, and the inequaly follows by Cauchy-Schwartz (first term). It remains to show that the final term in (6) is fine. We do so by using a result of Leonov and Shiryaev (959), stated as heorem.. in Brillinger (98), to express the cumulant of products as the product of cumulants. Let z s s and z s u s, and let ν m ν j denote a partion of the set of index pairs j S A {(0,), (0,), (t,), (t,), (t,), (t,), (t,), (t,)}. heorem.. implies that cum( u, u, u, u) 0 0 t t t t t t cum( z0z0, zt zt, zt zt, zt z t) cum( zij, ij ν) cum( zij, ij ν m), where the ν summation extends over all indecomposable partions of S A. Because ( t, u t ) has mean zero, cum( 0 ) cum(u 0 ) 0 so all partions wh some ν k having a single element make a contribution of zero to the sum. hus nontrivial partions must have m. Separating out the partion wh m, we therefore have that cum( 0u0, tu t, t u, ) t t u t t, t, t t, t, t cum(, u,, u,, u,, u ) 0 0 t t t t t t + cum( zij, ij ν) cum( zij, ij ν m). (7) ν: m,, t, t, t he first term on the right hand side of (7) satisfies cum( 0, u0, t, u t, t, u,, ) t t u t t, t, t t, t,..., t7 cum(, u,, u,, u,, u ) 0 t t t t t5 t6 t7 5

which is fine by assumption 5. It remains to show that the second term in (7) is fine. Consider cumulants of the form cum(,...,, u,..., u ) (including the case of no s). When p, by t tr s s p assumption this cumulant is zero. When p, by assumption this cumulant is zero if s s. hus the only nontrivial partions of S A eher (i) place two occurrences of u in one set and two in a second set, or (ii) place all four occurrences of u in a single set. In case (i), the three-fold summation reduces to a single summation which can be handled by bounding one or more cumulants and invoking summabily. For example, one such term is t, t, t cum(, )cum(, u, u )cum(, u, u t ) 0 t t 0 t t t 0 t t u0 u0 0 ut t cum(, )cum(,, )cum(,, u t) 0 0 0 0 t t < (8) t, t var( ) E Eu cum(, u, u ) where the inequaly uses cum( 0, ) var( 0 ), cum(, u0, u 0) t t E u t 0 E 0Eu 0, and cum( 0, ut, ut) t t, t cum(, u, u ) ; all terms in the final 0 t t line of (8) are fine by assumption 5. For a partion to be indecomposable, must be that at least one cumulant under the single summation contains both time indexes 0 and t (if not, the partion satisfies Equation (..5) in Brillinger (98) and thus violates the 6

row equivalency necessary and sufficient condion for indecomposabily). hus all terms in case (i) can be handled in the same way (bounding and applying summabily to a cumulant wh indexes of both 0 and t) as the term handled in (8). hus all terms in case (i) are fine. In case (ii), the summation remains three-dimensional and all cases can be handled by bounding the cumulants not containing the u s and invoking absolute summabily for the cumulant containing the u s. A typical term is cum( 0, u0, ut, u t, u )cum(,, ) t t t t t, t, t E cum(, u, u, u, u) 0 0 0 t, t, t t t t 0 cum( 0, t, t, t, ) t <. t,..., t E u u u u Because the number of partions is fine, the final term in (7) is fine, and follows from (6) that EA O(). Next consider A. he argument that EA for A. he counterpart of the final line of (6) is O() closely follows the argument EA E Eu + 8 t t t, t, t cum( u, u, u, u) 0 0 0 t t t t t t t t t so the leading term in the counterpart of (7) is a twelfth cumulant, which is absolutely summable by assumption 5. Following the remaining steps shows that EA <. 7

Now turn to A 0. he logic of (7) implies that var(a 0 ) var t u t cov( u, u ) 0 0 t t 0 0 0 0 t cum(,, u, u,,, u, u ) t t t t + zij ij ν zij ij ν m ν: m,,t cum(, ) cum(, ) (9) where the summation over ν extends over indecomposable partions of S A 0 {(0,), (0,), (0,), (0,), (t,), (t,), (t,), (t,)} wh m. he first term in the final line of (9) is fine by assumption 5. For a partion of S A 0 to be indecomposable, at least one cumulant must have indexes of both 0 and t (otherwise Brillinger s (98) Equation (..5) is satisfied). hus the bounding and summabily steps of (8) can be applied to all partions in (9), so var(a 0 ) O(). his proves (a). (b) First note that E B B: E B n E is n i u t s E uis u isuir t s ( ) s r 8

E uis uis t s ( ) s B where the penultimate equaly obtains because u is condionally serially uncorrelated. hus n E ( B B) var uis t s is t s E u E Eu 8 8 is (0) where the first inequaly uses and t t u t u t. he result (b) follows from (0). Inspection of the right hand side of the first line in (0) reveals that this variance is posive for fine, so that under fixed- asymptotics the estimation of B makes a /n contribution to the variance of ˆ HR FE Σ. (c) ( ˆ HR S HR n S ) Σ Σ n n n k n ˆ u i t n ( ) n n i t u n n ( ˆ u u ) n ( ) k n i t k n HR S Σ. () n ( ) k 9

HR S An implication of (a) is that Σ EΣ p HR S, so the second term in () is O p (/ n ). o show that the first term in () is o p () suffices to show that i t ( ) n u ˆ u n p 0. Because u u ˆ ( ˆ β β), ( u u ) ( ) n ˆ n β β n ˆ n i t ˆ β n i t n ˆ β β u n ( ) n β ( n ) n i t ( ) n / n ( ) i t ˆ n β u + ( ) n ˆ β β u n i t n β i n i. () t Consider the first term in (). Now n ( ˆ β β ) O p () and E ( ) n / n i t n E ( ) 0 where convergence follows because E ) < is implied by E) <. hus, by ( Markov s inequaly the first term in () converges in probabily to zero. Next consider the second term in (). Because u is condionally serially uncorrelated, u has (respectively) moments, and ( has moments (because has moments), 0

var n 6 u n E( u ) i t n n ( E )( Eu ) 0. his result and n ( ˆ β β ) O p () imply that the second term in () converges in probabily to zero. urning to the final term in (), because u is condionally serially uncorrelated, has moments, u has moments, var n n ui i t E u n t t u 0 t E E n his result and n ( ˆ β β ) O p () imply that the final term in () converges in probabily to zero, and (c) follows. (d) Use u u ˆ ( ˆ β β) and collect terms to obtain ( ˆ n ) ( ˆ u is u is ) n/ B B n i t s n / ( n) i t n ( ˆ β β) n ˆ n β β isuis n i t. () s ( )

Because n ( ˆ β β ) O p () and has four moments, by Markov s inequaly the first term in () converges in probabily to zero (the argument is like that used for the first term in ()). urning to the second term in (), n var is is n i u t s var isu is n ( ) t s n ( ) E Eu 0 so the second term in () converges in probabily to zero, and (d) follows. Details of remark 9. he only place in this proof that the summable cumulant condion is used is to bound the A moments in part (a). If is fixed, a sufficient condion for the moments of A to be bounded is that and u have moments. Stationary of (, u ) is used repeatedly but, if is fixed, stationary could be relaxed by replacing moments such as E wh max E. hus, under -fixed, n t asymptotics, assumption 5 could be replaced by the assumption that E < and Eu < for t,,.

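Details of remark 4. If $(X_{it}, u_{it})$ is i.i.d., $t = 1, \ldots, T$, $i = 1, \ldots, n$, then $\Sigma = E \tilde X_{it} \tilde X_{it}' u_{it}^2 = Q_{\tilde X \tilde X} \sigma_u^2 + \Omega$, where $\Omega_{jk} = \mathrm{cov}(\tilde X_{jit} \tilde X_{kit}, u_{it}^2)$ and $\tilde X_{jit}$ is the $j$th element of $\tilde X_{it}$. Also, the $(j,k)$ element of $B$ is

$$B_{jk} = \frac{1}{T^2} \sum_{t} \sum_{s} E \tilde X_{jit} \tilde X_{kit} u_{is}^2 = Q_{\tilde X \tilde X, jk}\, \sigma_u^2 + \frac{1}{T^2} \sum_{t} \sum_{s} \mathrm{cov}(\tilde X_{jit} \tilde X_{kit}, u_{is}^2) = Q_{\tilde X \tilde X, jk}\, \sigma_u^2 + \frac{1}{T-1}\, \Omega_{jk},$$

where the final equality uses, for $t \neq s$, $\mathrm{cov}(\tilde X_{jit} \tilde X_{kit}, u_{is}^2) = (T-1)^{-2}\, \Omega_{jk}$ (because $(X_{it}, u_{it})$ is i.i.d. over $t$). Thus $B = Q_{\tilde X \tilde X} \sigma_u^2 + (T-1)^{-1} \Omega = Q_{\tilde X \tilde X} \sigma_u^2 + (T-1)^{-1} (\Sigma - Q_{\tilde X \tilde X} \sigma_u^2)$. The result stated in the remark follows by substituting this final expression for $B$ into (5), noting that $\hat\Sigma^{homosk} \xrightarrow{p} Q_{\tilde X \tilde X} \sigma_u^2$, and collecting terms.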
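Collecting terms in the substitution above gives the proportionality factor $b_T = (T-2)/(T-1)^2$ between the asymptotic bias of the conventional HR estimator and that of the homoskedasticity-only estimator. The following sketch tabulates the factor exactly; the function name is hypothetical and the use of exact rational arithmetic is a choice made here for illustration.

```python
from fractions import Fraction


def bias_factor(T: int) -> Fraction:
    """Return b_T = (T - 2) / (T - 1)^2 as an exact fraction (needs T >= 2)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    return Fraction(T - 2, (T - 1) ** 2)
```

The factor is zero at $T = 2$, where the conventional estimator is consistent, takes its largest value of $1/4$ at $T = 3$, and then declines at rate $1/T$, in line with the $O(1/T)$ asymptotic bias.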

able. Monte Carlo Results: Bias, Relative MSE, and Size for hree Variance Estimators Design: y x β + u, i,, n, t,, x ~ i.i.d. N(0,) u x i ~ i.n.i.d. N(0, σ ); σ (0. + x ) κ /E[(0. + x ) κ ] (a) κ n ˆ Bias relative to true MSE relative to infeasible Size (nominal level 0%) HR S Σ ˆ HR FE Σ ˆ cluster Σ ˆ HR S Σ ˆ HR FE Σ Σ ˆ cluster ˆ HR S Σ ˆ HR FE Σ Σ ˆ cluster 50-0.80-0.05-0.068 0.78.05.0 0.7 0.5 0.8 5 50-0.5-0.09-0.06 0.8 0.98. 0. 0. 0. 0 50-0.07-0.0-0.0 0.9 0.99.7 0.9 0.08 0.9 5 50-0.00-0.005-0.06 0.96 0.99. 0.07 0.0 0. 50 50-0.05-0.00-0.0 0.98 0.99.8 0.0 0.0 0.0 00 50-0.008-0.00-0.00 0.99.00 6.95 0.099 0.098 0.07 00-0.60-0.07-0.05 0.89..0 0. 0.8 0.0 5 00-0. -0.05-0.0 0.95.0.0 0.7 0.06 0.0 0 00-0.067-0.006-0.06 0.99.0.5 0.6 0.05 0.08 5 00-0.08-0.00-0.0.00.00. 0.0 0.099 0.0 50 00-0.0-0.00-0.0.00.00.95 0.0 0.00 0.0 00 00-0.007-0.00-0.0.00.00 6.9 0.0 0.00 0.06 500-0. -0.006-0.008.60..0 0. 0.097 0.097 5 500-0. -0.00-0.00.70.07.0 0. 0.0 0.0 0 500-0.06-0.00-0.00.5.0.55 0. 0.0 0.0 5 500-0.06 0.000-0.00.9.0.8 0.0 0.00 0.0 50 500-0.0 0.000-0.00.0.00.06 0.0 0.00 0.0 00 500-0.007 0.000-0.00.05.00 7. 0.0 0.00 0.0 000-0.9-0.00-0.00.5.. 0.0 0.0 0.0 5 000-0. -0.00-0.00.59.08.9 0. 0.099 0.00 0 000-0.06-0.00-0.00.00.0.56 0.09 0.098 0.099 5 000-0.06 0.000-0.00..0.6 0.05 0.0 0.0 50 000-0.0 0.000-0.00..00.9 0.0 0.00 0.00 00 000-0.006 0.000 0.000..00 7. 0.0 0.0 0.0

able, ctd. (b) κ n ˆ Bias relative to true MSE relative to infeasible Size (nominal level 0%) HR S Σ ˆ HR FE Σ ˆ cluster Σ ˆ HR S Σ ˆ HR FE Σ Σ ˆ cluster ˆ HR S Σ ˆ HR FE Σ Σ ˆ cluster 50 0.7 0.0-0.0.7..8 0.067 0.05 0.0 5 50 0. 0.007-0.0 5.0.68.0 0.060 0.0 0.07 0 50 0. 0.00-0.07 6.96.5.57 0.068 0.0 0.0 5 50 0.9 0.00-0.07 6.6..0 0.08 0.0 0.08 50 50 0.065 0.000-0.08.6.9.5 0.09 0.0 0. 00 50 0.0 0.000-0.00.. 69.9 0.09 0.00 0.0 00 0.70 0.006-0.007.78.0.8 0.06 0.099 0.0 5 00 0. 0.00-0.006 8.65.66.0 0.059 0.099 0.0 0 00 0. 0.00-0.009.68.5.68 0.065 0.098 0.0 5 00 0.9 0.00-0.008.09.. 0.08 0.0 0.06 50 00 0.065 0.000-0.009 7.9.9.6 0.090 0.0 0.07 00 00 0.0 0.000-0.00 5.9. 70.98 0.09 0.00 0.05 500 0.7 0.00-0.00.59..0 0.06 0.098 0.098 5 500 0.09 0.000-0.00 5.8.66.0 0.059 0.099 0.099 0 500 0. 0.00-0.00 55.7.50.8 0.066 0.099 0.099 5 500 0.8 0.000-0.00 9...5 0.08 0.098 0.00 50 500 0.06 0.000-0.00.6.9.99 0.090 0.00 0.0 00 500 0.0 0.000-0.00.6. 7.9 0.09 0.098 0.099 000 0.69 0.00 0.000 5.7.. 0.06 0.099 0.099 5 000 0.0 0.000-0.00 70.65.66.09 0.059 0.099 0.099 0 000 0. 0.000-0.00 08.60.50.66 0.069 0.099 0.099 5 000 0.8 0.000-0.00 97.76..9 0.08 0.0 0.0 50 000 0.06 0.000-0.00 68.8.9. 0.088 0.098 0.099 00 000 0.0 0.000-0.00 0.87.0 70.8 0.09 0.098 0.00 Notes to able : he first three columns of results report the bias of the indicated estimator as a fraction of the true variance. he next three columns report the MSE of the indicated estimator, relative to the MSE of the infeasible estimator Σ ˆ inf ( n ) n i t u. he final three columns report rejection rate under the null hypothesis of the -sided test of β β0 based on the t-statistic computed using the indicated variance estimator and the asymptotic normal crical value, where the test has a nominal level of 0%. All results are based on 0,000 Monte Carlo draws. 5