Asymptotic Behavior of a t Test Robust to Cluster Heterogeneity

Size: px

Start display at page:

Download "Asymptotic Behavior of a t Test Robust to Cluster Heterogeneity"

Reginald Shaw
6 years ago
Views:

1 Asymptotic Behavior of a t est Robust to Cluster Heteroeneity Andrew V. Carter Department of Statistics University of California, Santa Barbara Kevin. Schnepel and Doulas G. Steierwald Department of Economics University of California, Santa Barbara May 3, 03 Abstract Abstract We study the behavior of a cluster-robust t statistic and make two principle contributions. First, we relax the restriction of previous asymptotic theory that clusters have identical size, and establish that the cluster-robust t statistic continues to have a Gaussian asymptotic null distribution. Second, we determine how variation in cluster sizes, toether with other sources of cluster heteroeneity, a ect the behavior of the test statistic. o do so, we determine the sample speci c measure of cluster heteroeneity that overns this behavior and show that the measure depends on how three quantities vary over clusters: cluster size, the cluster speci c error covariance matrix and the actual value of the covariates. Because, in the absence of a xed desin, the third quantity will always vary over clusters, the vast majority of empirical analyses have test statistics whose nite sample behavior is impacted by cluster heteroeneity. o capture this impact, we develop the e ective number of clusters, which scales down the actual number of clusters by the measure of cluster heteroeneity. If the e ective number of clusters is lare, then Gaussian critical values are appropriate. Introduction In conductin inference with a cluster-robust t statistic, researchers often rely on the result that the conventional t statistic has a Gaussian asymptotic null We thank Dick Startz, toether with members of the Econometrics Research Group at UC Santa Barbara, and Ulrich Müller, who served as discussant at the 03 Econometric Society Winter Meetins, for helpful comments. Correspondin author: dou@econ.ucsb.edu

2 distribution. he existin result is derived for the speci c case in which clusters are equal in size. Because in many applications clusters are unequal in size, there is a ap between the existin result and empirical practice. We ll this ap by establishin that the conventional cluster-robust t statistic has a Gaussian asymptotic null distribution for the more eneral case in which clusters can vary in size. In so doin, we determine the sample speci c measure of cluster heteroeneity that overns the behavior of this cluster-robust t statistic. From the sample speci c measure we construct the e ective number of clusters, which scales down the actual number of clusters by the measure of cluster heteroeneity. It is the e ective number of clusters that overns inference: If the e ective number of clusters is lare, then Gaussian critical values are appropriate. he conventional cluster-robust t statistic is based on the ordinary least squares coe cient estimator from the entire sample, toether with a clusterrobust variance estimator based on the outer product of the residuals. Use of the outer product of the residuals implies that the cluster-robust variance estimator is a function only of between cluster variation and, hence, that consistency of the variance estimator requires that the number of clusters rows without bound. One immediate consequence is that it is not possible to conduct valid inference on xed e ects with the cluster-robust t statistic. he reason is that the coe cient correspondin to the xed e ect takes non-zero values for only a subset of clusters. As this subset does not row with the addition of more clusters, the number of clusters from which to estimate the variance component that corresponds to test of a xed e ect does not row without bound and consistent estimation of this variance component is not possible. he existin asymptotic theory, due to White (984, heorem 6.3, p. 36), applies to clusters of equal size that satisfy a further assumption of cluster homoeneity. Under cluster homoeneity White establishes two principle results. First, that the cluster-robust t statistic has a Gaussian asymptotic null distribution. Second, that the variance component, which appears in the denominator of the test statistic, is consistently estimated throuh the use of the cluster-robust variance estimator. We allow both for unequal cluster size and for further heteroeneity of clusters. Under these more eneral assumptions we establish that the cluster-robust t statistic has a Gaussian asymptotic null distribution. We further establish that the cluster-robust variance estimator can be used to consistently estimate the variance component that appears in the denominator of the test statistic. he asymptotic analysis yields the measure of cluster heteroeneity that overns behavior of the test statistic. he measure, which is sample speci c, depends on how three quantities vary over clusters: cluster size, the cluster speci c error covariance matrix and the actual value of the covariates. o understand why variation in these quantities impacts the behavior of the clusterrobust t statistic, consider a sample of 0 observations divided into two clusters. Because observations are assumed to be independent across clusters, the potentially unique error covariance terms are all contained within the diaonal blocks In what follows we refer to this test statistic simply as the cluster-robust t statistic.

3 that capture the correlation within clusters. If the clusters are equally sized, there are 0 potentially unique terms. As the size of one cluster rows, the number of potentially unique terms rows and reaches a maximum of 9 when one roup contains 9 observations. Variation in cluster size, keepin xed the total number of observations, alters the number of non-zero error covariance terms. Because, as Kloek (98) was amon the rst to show, each of these non-zero terms must be accounted for to avoid upward bias in the test statistic, the behavior of the cluster-robust t statistic is impacted by variation in cluster size. he measure of cluster heteroeneity scales down the number of clusters to produce the e ective number of clusters. Because our results allow the cluster speci c error covariance matrices to take eneral forms, the e ective number of clusters can be thouht of as a eneralization of the correction formula of Moulton (986). wo additional points emere. First, the e ective number of clusters varies with the coe cient under test. Second, because the measure of cluster heteroeneity depends on the value of the covariates under study, the e ective number of clusters should be calculated for virtually all studies that employ the cluster-robust t statistic, even if clusters do not vary in size. he paper is oranized as follows. In Section we de ne the eneral class of models under study and de ne the measure of cluster heteroeneity. We relate the measure to the variability of the cluster-robust variance estimator, establish that the asymptotic null distribution of the cluster-robust t statistic is Gaussian and show that consistent testin of xed e ects is not possible. In Section 3, we de ne the e ective number of clusters and emphasize, throuh simulation, that the e ective number of clusters is a sample speci c measure that varies with the coe cient under test. We illustrate how increases in cluster heteroeneity reduce the e ective number of clusters and how this reduction re ects hiher variability of the cluster-robust variance estimator. he hiher variability, in turn, increases the empirical size of the test. Because the e ective number of clusters depends on the latent error covariance matrix, we discuss how to approximate the e ective number of clusters in Section 4. For two empirical studies we approximate the e ective number of clusters for the key hypotheses under test and discuss appropriate inference. Our remarks, in Section 5, discuss alternative methods of inference when the e ective number of clusters is not lare. Asymptotic Behavior We consider a set of n observations from the linear model y = + u; () where the covariate matrix consists of k linearly independent columns. he key feature of the model is that the observations can be sorted into G clusters, where the errors are independent between clusters. Given the covariance matrix of u,, is then a block diaonal matrix with blocks of the form, 3

4 which corresponds to the covariance matrix for cluster. Because is block diaonal, the variance of the ordinary least squares estimator ^ can be written as the sum of the G cluster speci c variance components. We have h V := V ar i uj = G = h V ar i u j ; where and u are the covariate matrix and error vector for cluster, respectively. he hypotheses under test are formed from subsets of the coe cients in (). he eneral form of null hypothesis is H 0 : a = a 0, where a is a selection vector of dimension k. he cluster-robust t statistic is Z = a ^ r dv ar a ^ ; () where the variance component is V d ar a ^ = a V b a and V b is the clusterrobust variance estimator. he cluster-robust variance estimator, which Shah, Holt and Folsom (977) are amon the rst to use, is the sample analo for V where the observed residuals ^u replace the errors u : bv = G = 0 ^u ^u : (3) White establishes asymptotic results for the cluster-robust t statistic and for the variance component V b a := a V b a. White s proof has two key assumptions: ) that all clusters have an identical, xed, number of observations and ) that clusters are homoeneous in that E not vary over. He then proves that, if G! as n! then Z has a Gaussian asymptotic null distribution and V b a is a consistent estimator of V a. We relax both of White s key, cluster homoeneity, assumptions. We allow the cluster size, n, to vary: over clusters, so that clusters need not be of identical size, and to vary with the sample size, so that cluster sizes need not be xed. We also allow clusters to be heteroeneous, so that E can vary over. We then prove that, if G! as n!, then Z has a Gaussian asymptotic null distribution and bv a is a consistent estimator of V a. Note that to establish consistency we study bv a =V a, rather than V b a V a. We do so because if ^ is a consistent estimator of, then the elements of V convere to zero and we want to ensure that the elements of V b convere to zero at the same rate as the elements of V. Because is restricted only by the requirements of a positive de nite matrix, the test statistic Z is robust to a wide rane of correlated processes. But this eneral robustness has two important implications. First, the variance of ^ can be expressed as a weihted sum of the variances for the cluster-speci c ordinary least squares estimators. (he cluster-speci c ordinary least squares 4

5 estimator for cluster, ^, is based only on the observations for cluster.) his insiht is also contained in Ibraimov and Müller (00), who consider an alternative test statistic based on the unweihted sum of cluster-speci c ordinary least squares estimators. Second, the cluster-robust variance estimator is a function only of between cluster variation. In consequence, the behavior of Z, even for hypothesis tests of coe cients on covariates that vary within clusters, is overned by the number of clusters, not the total number of observations. Of course, if there is no cluster correlation, then each observation e ectively becomes a cluster, G = n, and the behavior of Z is overned by the total number of observations. We collect these ndins in the followin result (alebraic details that verify the result are contained in the Appendix). Result : a) he covariance matrix V, toether with the estimator V b, can be expressed as functions of ^ : V = A V ar ^ A ; (4) bv = A ^ ^ ^ ^ A ; where A =. b) he estimator b V is a function only of between cluster variation. Remarks: he cost of the eneral robustness of Z is re ected in Result b. Because V b is a function only of between cluster variation, consistency of V b requires that the number of clusters row without bound. hus, to ensure we have a consistent test, we require that the selection vector a include only covariates for which the number of clusters in which the covariate takes non-zero values rows without bound. Our proof of the asymptotic null distribution for Z requires two lemmas, which will themselves be helpful in understandin the nite-sample behavior of Z. We rst establish the behavior of ev = A ^ ^ A ; for which the components ^ are uncorrelated across clusters. As we will see below, the behavior of e V leads to a formula that describes how cluster heteroeneity impacts the rate of converence of both e V and b V. Assumption : (i) he error vector u is uncorrelated with the covariate matrix. (ii) Conditional on the covariate matrix, the error vector u is normally distributed with a block diaonal covariance matrix. 5

6 Lemma : Under Assumption, 8" # 9 < eva V = a E : V a ; = ( + (; )) ; G where the quantity is de ned by (; ) = a A V ar (; ) = G G = ^ A a; ; with (; ) = G P (; ). Proof: See Appendix. Remarks: he quantity (; ), which is the squared coe cient of variation for (; ), is the measure of cluster heteroeneity that drives the behavior of the test statistic Z. Clearly (; ) depends on the selection vector a but, because the researcher selects a throuh speci cation of the null hypothesis, we do not explicitly include a as an arument when analyzin the behavior of (; ). We next turn to the behavior of V b. In doin so, we must account for the fact that the components ^ ^ are correlated across clusters throuh the common presence of the random variable ^. Assumption : As n! : (i) he covariance matrix V, which is conditional on the desin, is almost surely well formed in that the larest eienvalue of V,, is bounded in probability relative to V a, n o supp V a > M! 0: (ii) he covariate matrix has a distribution such that G P E A a G a! 0: Note that V a, because the maximal eienvalue for V is the maximum over a of the quadratic form V a. Lemma : Under Assumptions -, E a V b V a = E a V e where a () is bounded in probability. that supp n ( + a ()) > B 6 V a ( + a ()) ; In particular, there exists a B such o! 0:

7 Proof: See Appendix. Remarks: Because n o P ( + a ()) > B n = P a () < + p o n B + P a () > p o B it follows from Markov s inequality that n o P ( + a ()) > B E a () (+ p + E B) a () ( p : B ) Hence a su cient condition is that E a ()! 0 for any bound B >. o understand the implication of Assumption (ii) for the covariate matrix, we discuss a simple desin that satis es the condition. Consider estimation of only an intercept G E A a G a = G n E : n G If each observation is independently and uniformly placed in a cluster, then n follows a multinomial distribution and G E n n G = G n, so that under Assumption (ii) G = o (n), which also implies that E a ()! 0. We are now able to establish our principle result that the cluster-robust test statistic has a Gaussian asymptotic null distribution. Assumption 3: (i) As n! the number of clusters is increasin G!. E[ (;)] (ii) he quantity sup G! 0. heorem : If Assumptions -3 hold, then V b a is a consistent estimator of V a and, under H 0,: Z N (0; ) ; where Proof: denotes converence in distribution. See Appendix. Remarks: Our condition on cluster heteroeneity can be related to the work of Roers (993). Althouh he does not derive an asymptotic null distribution, Roers conjectured that a Gaussian approximation would be adequate for Z if max n n < :05. If we consider a model with only an intercept and no intracluster correlation, then = n n and we see that the adequacy of a Gaussian approximation does depend on n n, albeit throuh the squared coe cient of variation, rather than the maximal value. In addition, part (ii) of Assumption 3 allows for the size of individual clusters, n, to row with n. Because consistency of ^ requires that V a! 0, the rowth of n is restricted; a su cient restriction is n = o (n). o understand the link between these statements and Assumption 3(ii), observe that V a = P h P i (; ). Because supe (; )! 0 implies Assumption 3(ii), our theory encompasses data sets in which n = o (n). 7

8 o obtain consistency, the number of clusters must row without bound. In consequence for any covariate that takes non-zero values over only a xed subset of clusters, consistent testin is not possible. Corollary : For coe cient estimators that depend only on a xed subset of clusters, the elements of b V that correspond to these estimators are inconsistent. Proof: Because V b is a function only of between cluster variation, consistency of V b requires information from a rowin number of clusters. If a coe cient estimator depends only on a nite set of clusters, the requirement is not met. Consider a covariate that takes non-zero values for a xed subset m of the clusters. (For a cluster speci c control, m =.) he element of A that corresponds to this covariate is zero for all clusters other than the set of m, so the correspondin element of V b depends only on m elements even as G! and is inconsistent. Q.E.D. Leadin examples of such covariates are cluster speci c controls (most often termed cluster xed e ects) or controls that correspond to a roup of clusters. 3 E ective Number of Clusters We have established conditions under which the cluster-robust variance estimator V b is consistent and the test statistic Z has a Gaussian asymptotic null distribution. We now turn to the question: How should a researcher use the results to inform empirical analysis? An important component to the answer for this question is contained in Lemma, where we derive a quantity that overns the nite sample behavior of V b and, thus, of Z. his quantity captures how the heteroeneity in clusters impacts the mean-squared error of V b. Recall that heteroeneity across clusters, where the clusters are indexed by, is captured throuh heteroeneity in, where can be written as = a a: (5) From this expression we see that is the key quantity to measure across clusters, so variation in cluster size is not required for cluster heteroeneity. Cluster heteroeneity can arise with clusters of equal size, but where the cluster error covariance matrix di ers over clusters. Moreover, even if is identical across clusters, the fact that the covariates di er over clusters induces heteroeneity. For this reason the vast majority of empirical analyses with cluster-robust inference are characterized by heteroeneous clusters. As detailed in Lemma, the heteroeneity in is captured throuh (; ) = ^ ; where ^ = G P G =. As Lemmas and reveal, the mean-squared 8

9 error of b V is inversely proportional to G = G + (; ) ; so G is the key measure of the adequacy of the asymptotic results. Observe that G G and the manitude of the di erence between G and G increases nonlinearly in the measure of cluster heteroeneity (; ). We refer to G as the e ective number of clusters to re ect the fact that the results in Section extend the conventional analysis (under cluster homoeneity) in which the number of clusters is the measure of the adequacy of the asymptotic results. o emphasize the importance of calculatin G for cluster samples, while also illustratin how to do so, we turn to the simulation model y i = 0 + x + z i + u i : (6) As (5) reveals, the e ective number of clusters varies with the hypothesis under test throuh the selection vector a. For this reason both a covariate that is invariant within clusters, x, and a cluster varyin covariant, z i, are used to construct the outcome, y i, for observation i in cluster. Importantly, because the number of clusters in which x takes non-zero values rows with the sample size, the cluster-invariant covariate is distinct from a xed e ect and the statistic Z is consistent for hypothesis testin on. As noted above, variation in comes from two sources observable to a researcher, cluster sizes and the observed values of the covariates, and from one source that is latent, the error covariance matrix. We capture the impact of the observable sources of variation throuh the followin desin features. Each simulated data set consists of 500 observations divided into 00 clusters. he 00 values of the cluster-invariant covariate are enerated as independent Bernoulli random variables with equal probability of 0 or and the 500 values of the cluster-varyin covariate are enerated as independent Uniform (0; ) random variables. For each simulated data set, the covariates are divided into clusters in the followin way. In the rst cluster size desin, each cluster contains 5 observations. In each succeedin cluster size desin, one observation is subtracted from clusters throuh 00 and these observations are added to the rst cluster. We construct a sequence of 0 cluster size desins, in which the proportion of the sample in the rst cluster rows monotonically from percent to 37 percent. o capture the impact of the latent source of cluster heteroeneity, we use an error-components model u i = " + v i ; (7) where the cluster component " is independent of the individual component v i. While a model with two independent error components is only one of the many types of error processes allowed for under Assumption, such a model illustrates the major factors impactin and hence the e ective number of clusters. o construct from (7), we must specify V ar (u i j). If V ar (u i j) is 9

10 homoskedastic, then the conditional cluster correlation is constant both across and within clusters. If V ar (u i j) is heteroskedastic, then the conditional cluster correlation can di er both across and within clusters. o capture the richness of empirical settins in which V b is typically employed, we set v i j N 0; czi and " j N 0; ", so that the conditional cluster correlation p " " +czi p di ers both across and within clusters. " +cz j he manitude of c plays an important role in characterizin. First, and most obviously, if c = 0 then the conditional cluster correlation is constant. Indeed, because u i consists entirely of the common cluster component, all error terms within a cluster are perfectly correlated. As c increases, the idiosyncractic component rows in importance. We set " = so that for c = 500, the idiosyncratic component dominates u i and all error terms within a cluster are essentially uncorrelated. For each set of cluster sizes and value of c, we enerate 000 simulated data sets and compute the e ective number of clusters. In Fiure, we plot the e ective number of clusters as a function of the variation in cluster sizes. he panels of the ure in the left column correspond to hypothesis tests on the coe cient for the cluster-invariant covariate, x, those in the riht column correspond to hypothesis tests on the coe cient for the cluster-varyin covariate, z i. he rows of the ure correspond to di erin derees of within cluster correlation; in the top row c = 500, while in the bottom row, c = 0. Within each panel, for each set of cluster sizes, the bar spans the rane from the minimum to the maximum e ective number of clusters across the 000 simulated data sets and the dot reports the median value. Fiure 0

11 Several features are immediate. First, increasin the cluster size variation dramatically reduces the e ective number of clusters. Even for data sets with little cluster correlation, c = 500, cluster size variation quickly reduces the e ective number of clusters to less than half the actual number of clusters. Second, that variation in the realized values of f plays an important role. his is perhaps most clearly seen in the upper left panel of Fiure. Here, even thouh clusters are of equal size and the errors exhibit minimal cluster correlation, the e ective number of clusters varies from 00 to 60 with variation in the realized values of the covariates. his variation underscores the need to calculate the e ective number of clusters as a eneral rule. he impact of variation in the unobserved component, the error cluster correlation, is more subtle. o understand why, consider a model with only an intercept, so variation in is driven exclusively by variation in n. If we let v = P n n i= cz i, then n v + n ". Because variation in n is larer than variation in n, as "= v rows, variation in n has a reater impact on variation in. For the eneral model, in which contains covariates, as c declines and the error cluster correlation increases, the variation in has a more pronounced impact on the e ective number of clusters. his impact is most clearly revealed in the riht column of Fiure, where the rane of computed values for the e ective number of clusters tends to widen as the cluster correlation of the error rows. he results in Fiure indicate that the estimator of the standard error is likely to behave more erratically for data sets with larer variation in cluster sizes and where the within cluster correlations are more pronounced. o con rm, h we next compute the (conditional) mean-squared error of V b i bva V, E a V a. o compute this quantity, which is conditional on the realized value of, we simulate data as follows. For each set of cluster sizes and value of c, we enerate the covariate matrix 5 times. For each of these 5 values of, we enerate 000 vectors for u. As each of these vectors ives rise to a di erent value for V b, we have 000 values of V b for each desin matrix from which we calculate the mean-squared error for V b. We report the results for hypothesis tests on the coe cient for the clusterinvariant covariate in Fiure and for the cluster varyin covariate in Fiure 3. Each point in the ures corresponds to the mean-squared error for a speci c desin, where the desin consists of the set of cluster sizes, the value of c and the enerated covariate matrix. Because there are 5 enerated covariate matrices for each of 0 sets of cluster sizes and 4 values of c, there are 00 points, each correspondin to an estimated mean-squared error. Points clustered to the riht, for which the e ective number of clusters is nearly 00, enerally correspond to desins in which all clusters are equal. Points appearin far to the left, for which the e ective number of clusters is less than 0, enerally correspond to desins in which the cluster size variation is pronounced and the value of c is small.

12 Fiure Both ures reveal a strikin pattern. As the e ective number of clusters drops below 40, the mean-squared error of the cluster-robust variance estimator bv rises exponentially. Indeed, an exponential curve is tted to each ure and the point estimates for the two curves are similar ln (mse) = 3:9 (:0) ln (mse) = 3:9 (:0) 0:88 (:03) G for the coe cient on x 0:79 (:0) G for the coe cient on z i :

13 Fiure 3 he increase in mean-squared error bes the question: Is the source of the mean-squared error the bias or the variation of V b? We compute the bias of bv and report the results for the coe cient on the cluster-invariant covariate in Fiure 4 and for the cluster-varyin covariate in Fiure 5. In each ure we plot the manitude of the bias as a function of the larest cluster size. Fiure 4 reveals that increasin the variance of cluster sizes leads to a lare downward bias in V b for the cluster-invariant covariate and that this bias is relatively constant as the cluster correlation for the error chanes. We thus conclude that for the coe cient on the cluster-invariant covariate it is increasin variation in cluster sizes that drives down the e ective number of clusters and results in substantial downward bias in the estimated standard error. 3

14 Fiure 4 For the cluster-varyin coe cient, in contrast, bias plays only a small role in the mean-squared error of the cluster-robust variance estimator. (he bias is pronounced for the case in which c = 500, but as this case has virtually no error cluster correlation, the bias is a re ection of a downward bias arisin from heteroskedasticity.) It seems that for the cluster-varyin coe cient, the meansquared error of the cluster-robust standard error is driven almost exclusively by variation in the estimated standard error. 4

15 Fiure 5 he increase in the mean-squared error of V b has a direct impact on the size of hypothesis tests. As for the mean-squared error calculations, the empirical size is based on the 000 values of V b for each of the 00 covariate matrices (5 covariate matrices for each pair of cluster sizes and value of c). he results are contained in Fiure 6 for the cluster-invariant covariate and in Fiure 7 for the cluster-varyin covariate. 5

16 Fiure 6 For the cluster-invariant covariate, we see in Fiure 6 that the lare downward bias in the cluster-robust standard error translates into in ated test sizes. While the empirical size beins to rise above the nominal size of 5 percent when the e ective number of clusters falls below 40, substantial size distortion only occurs when the e ective number of clusters falls below 0 (re ectin the exponential increase in the mean-squared error). In Fiure 7 we report the nominal size of a test on the coe cient for the cluster-varyin coe cient. We see that while there is some size distortion for very small values of the e ective number of clusters, the absence of substantial bias in the cluster-robust standard error estimator mitiates the distortion. Rather, we see the impact of the variation in the standard error estimator, with empirical size fallin both above and below the nominal level. In sum, the impact of cluster heteroeneity, as re ected in a low e ective number of clusters, is pronounced for test of the cluster-invariant covariate but of much less import for test of the cluster-varyin covariate. 6

17 Fiure 7 Of course, the above calculations rely on knowlede of the error cluster correlation. In practice, absent knowlede of this correlation, can the assumption that all errors are perfectly correlated with a cluster (c = 0) provide a conservative bound? As size distortion is pronounced only for the cluster-invariant covariate, we focus our attention on this covariate. In Fiure 8, we report the mean-squared error of V b as a function of the estimated e ective number of clusters. Because the estimated e ective number of clusters assume c = 0, the estimated e ective number of clusters forms a lower bound for the actual number of clusters. In comparison with Fiure, we see that the only points that lie far from the curve are associated with c = 500. If c = 500, then there is virtually no within cluster correlation, so it is perhaps not surprisin that the assumption of perfect within cluster correlation would provide a conservative bound. 7

18 Fiure 8 4 Remarks Consistency of the cluster-robust variance estimator, and the asymptotic null distribution of the resultant t test statistics, are impacted by cluster heteroeneity. We establish conditions under which the cluster-robust variance estimator is consistent and the resultant t test statistics have an asymptotic Gaussian null distribution, as the number of clusters rows without bound. he conditions lead to a sample speci c measure of cluster heteroeneity, which translates into an e ective number of clusters. he e ective number of clusters is informative for the nite sample behavior of the cluster-robust variance estimator in that a low e ective number of clusters sinals downward bias in the estimated standard errors for the cluster invariant reressor. While the e ective number of clusters should be calculated for all studies that employ the cluster-robust t statistic, we demonstrate that for many forms of heteroeneity the e ective number of clusters is a lare fraction of the actual number of clusters. If the e ective number of clusters is lare, then Gaussian critical values are appropriate. If the e ective number of clusters is not lare, then an alternative method of inference is needed. hree natural alternatives arise: ) use critical values from a t distribution, ) use critical values from a re-samplin method and 3) use an alternative test statistic. We discuss each of these alternatives in turn. 8

19 Use of critical values from a t distribution is arued for by Kott (994). Kott arues that the derees of freedom should be selected to mirror the variation of the cluster-robust variance estimator. Althouh his analysis does not contain formal asymptotic results, our analysis would suest that the derees of freedom in the approximatin distribution should be set equal to the e ective number of clusters. Yet we suest caution before employin critical values from a t distribution. o understand why, consider Z e = a (^ p 0 ). Under eva V homoeneous clusters G ~ a V a G, which would seem to imply that Z e t (G). he implication does not hold because the numerator and denominator of Z e are correlated. he error from approximatin Z e by a t distribution is mani ed under cluster heteroeneity because a G ~V is not a (G=(+ )) random (+ ) V a variable. Further, the test statistic Z is constructed with V b a rather than V e a, which adds a third source of approximation error. Because it is di cult to bound the approximation error that these three sources induce, use of critical values from a t (G=(+ )) could lead to di culty in controllin the size of the test. his caution is borne out in our preliminary simulations. For a desin in which the e ective number of clusters is 33, the critical value that delivers an empirical size of 5 percent is.0, while the critical value from a t distribution with 33 derees of freedom is.04. In sum, we have not established that critical values from a t distribution with the e ective number of clusters as the derees of freedom provide a conservative sized test, and this topic remains for future study. Use of critical values from a bootstrap is arued for by Cameron, Gelbach and Miller (008), who focus on data eneratin processes with homoenous clusters and compare several re-samplin methods. hey nd that use of the wild bootstrap leads to critical values that can control the size of the clusterrobust t statistic. Extendin these results to allow for full ranes of cluster heteroeneity remains a topic for further study. Ibraimov and Mueller propose an alternative test statistic, for which critical values are obtained from the t distribution. he alternative test statistic is based on the averae of cluster speci c coe cient estimators. While such a statistic cannot be used to test hypotheses reardin covariates that are invariant within clusters, the statistic is a simple alternative for test reardin covariates that do vary within clusters. Moreover, the critical value from a t distribution with G derees of freedom is conservative, so the size of the test is controlled. (Size control depends on the nominal size of the test, as the critical values cannot be conservative for all nominal sizes. But for a test with the conventional size of 5 percent, the critical value is conservative.) As indicated by Result, the cluster-robust t statistic depends on weihted averaes of cluster speci c quantities. As the use of unequal weihts in constructin the test statistic likely alters the power of the test statistic, it is an open question as to the conditions under which each statistic delivers hiher power. his too, remains a topic for further study. 9

20 5 Appendix Verification of Result : Let be the n k covariate matrix with all rows that do not correspond to cluster set to zero. Part a: Observe that because y = P y, ^ = y A ^ : (8) As V V ar ^j, the cluster representation of V in (4) follows directly from (8). o derive the cluster representation of V b in (4), note that = =. Hence y ^ h = i y h = i y = ^ ^ : hus because A ^ y ^ = y ^ = ^u ; (9) ^ = ^u and ^u = ^u. Hence the cluster representation of V b in (4) follows directly from (9). Part b: he estimator V b is a function of the residuals y y = (I n ) y = ^u; where =. components hese residuals can be decomposed into two ^u = (I n G + G ) y = (I n G ) y + ( G ) y = ^u W + ^u B ; where G = P is the projection operator onto the cluster speci c models. he residual component ^u W captures the within cluster variation while the residual component ^u B captures the between cluster variation. he quantity V b depends on the residuals throuh the linear function ^u = ^u. Hence, ^u = ^u W + ^u B : Because certain elements of may not be identi ed within cluster, we use the eneralized inverse de ned such that = (Harville 997, heorem.3.4 part (5), p. 67). 0

21 Because the least squares residuals are orthoonal to the correspondin model space, and ^u W = (I n G ) y = y = 0; ^u B = ( G ) y = A y 6= 0: hus, b V is only a function of between cluster variation. Proof of Lemma : Let Q := a A ^ Assumption, Q j N 0; a A V ar ^ A a. Under part (i) of. hus e V a is the sum of G squared, independent, normal random variables. Under normality V ar Q = [V ar (Q )], so V ar eva = h a A V ar ^ A ai : P. he manitude of this quantity is relative to Va, which is a A V ar ^ A a hus 8" # 9 < P h i eva V = a E : V a ; = a A V ar ^ A a P : a A V ar ^ A a From the de nition of (; ), 8" # 9 < eva V = a E : V a ; = ( + (; )) : G Q.E.D. Proof of Lemma : We wish to establish that E a V b V a = E a V e V a ( + a ()) (0) where a () is bounded in probability as n!. follows from the expansion E a V b V a = E a V e V +E na e V V a he settin of the problem a + E a V b a b V e V e V o a ; a

22 for which the Cauchy-Schwarz inequality yields (0) with 3 E a bv V e a a () = : E a ev V a he rest of the calculation is to show that a () is bounded in probability as n!. o bein, note bv V e = A ^ ^ ^ ^ ^ ^ A = = = A ^ ^ ^ ^ ^ ^ + ^ ^ A A ^ A ^ ^ ^ + ^ ^ A ^ + ^ ^ + ^ ^ A : When takin the quadratic form, this becomes a V b V e a = a A ^ = a A ^ ^ A a + ^ + a A ^ ^ A a ^ ^ A a: o obtain the limitin behavior, we use the fact that P A = I to introduce the quantity A G I, so that a A ^ ^ A a = = a A ^ a A ^ " ^ A G I a + a A ^ ^ A G I a + where the last equality follows from P A ^ = ^. a b V e V a = + a A ^ a A ^ # a A ^ ^ We now have ^ G I a G I a; ^ A G I a + () ^ A G I a G a ^ ^ a:

23 herefore, our bound on E a bv V e a depends on a bound on the conditional expectation of the square of each of the three terms on the riht side of (). Each term is the product of vector norms of the form ^ A a N 0; a AV ar ^ A a. We bein with the last term, for which ( ) E G a ^ ^ a = 4 G E a ^ = 3 a G V a : For the remainin two terms in (), let a = a A ^ A ^ G I a. Because " # " a b we have 8! 9 < ( = E a : b ; E 8 < 4E : 4 a! a # " b # ) b! ; () and let b = 93 = a! 8 93 < = 5 4E b ; :! 5 ; E! 3 a E! 3 b 4 5 ; where the last two inequalities are obtained from the Cauchy-Schwarz and Minkowski inequalities, respectively. Here E h i # a 4 = 3 a A V ar ^ A a and E b 4 = 3 "a A G I V A G I a : Hence, the rst term on the riht side of () is bounded as 8 <! 9 E a = A ^ ^ A : G I a (3) ; " # " 3 a A V ar ^ A a a A G I V A a# G I = 3 a V a " a A G I V A G I a# : 3

24 A For the second term on the riht side of (), because P G I = 0, 8 <! 9 E a = A : ^ ^ A G I a (4) ; 8 < = E a! 9 A : G I ^ = ^ A G I a ; " # 3 a A G I V A G I a : Because (f + f f 3 ) 8f +6f +3f3, we combine the bound from Lemma with (), (3) and (4) to obtain ( " # a () G a A + (; ) G I a V A G I a V a (5) " # 9 +G a A G I a V A G I a V a = + G; : From this bound, we are able to verify that a () is bounded in probability as n!. o do so, observe that without loss of enerality we can assume kak =. From Assumption (ii) the second term on the riht side of (5) is bounded in probability, so we are left to verify that P a A G I V A G I a a V a is o P G. Under Assumption (i) supp M! ; V a and Assumption (ii) implies A a G a = o P G : Q.E.D. Proof of heorem : By our Assumption (i) the random variable W = a ^ 0 = p a V a N (0; ) ; 4

25 under the null hypothesis. hus the test statistic a V a Z = W a V b a will convere in distribution to W if a V b a a V a By Lemma E a V b V a = E a V e ; P!, by Slutsky s lemma. V a ( + a ()) : Lemma further implies 8 9 >< a bv V a >= E >: a V a >; = G ( + (; )) ( + a ()) : De ne the set B to be n ( + a ()) < B o, where B is de ned as in Lemma. herefore, to imply converence in probability, ( ) ( )! bv a P V a > " bv a P V a > " B + P (B c ) " ( )!# bv a = E P V a > " B + P (B c ) >< 6 a bv V a >= B 7 E 4E >: a V a >; " 5 + P (B c ) ; where the last inequality follows from Chebyshev s inequality and the fact that B (). From the de nition of B, toether with application of the lemmas described above ( ) bv a B P V a > " E ( + (; )) + P (B c ) ; G" where each term oes to zero by Assumptions 3(i) and 3(i) and Lemma, respectively. Hence ( ) bv a supp V a > "! 0; which implies converence in distribution. Q.E.D. Verification of Result : he for estimatin the term is the coe cient of variation for 0 = 0 : (6) 5

26 Here 0 = (n ) n (n )! ; (7) where n is the total number of observations and is the total number of observations where the covariate is. Further = k k ; (8) where is an indicator function of which roups have covariate. If we insert the results from (7) and (8) into (6) we obtain = k k (n ) + n (n ) : he averae value of is = k k (n ) + n (n ) M G : It follows that = n (n ) + n (n ) M G M G M : G Because all clusters are of equal size, =n = M=G and = (G M) M (G M) : 6

27 References [] Cameron, A., J. Gelbach and D. Miller, Bootstrap-Based Improvements for Inference with Clustered Errors, Review of Economics and Statistics, 008, 90, [] Harville, D., Matrix Alebra from a Statistician s Perspective, 997, Spriner: New York. [3] Ibraimov, R. and U. Müller, t-statistic Based Correlation and Heteroeneity Robust Inference, Journal of Business and Economic Statistics, 00, 8, [4] Kloek,., OLS Estimation in a Model where a Microvariable is Explained by Areates and Contemporaneous Disturbances are Equicorrelated, Econometrica, 98, 49, [5] Kott, P., A Hypothesis est of Linear Reression Coe cients with Survey Data, Survey Methodoloy, 994, 0, [6] Krueer, A., Experimental Estimates of Educational Production Functions, Quarterly Journal of Economics, 999, 4, [7] Kuhn, P., P. Kooreman, A. Soetevent and A. Kapteyn, he E ects of Lottery Prizes on Winners and heir Neihbors: Evidence from the Dutch Postcode Lottery, American Economic Review, 0, 0, [8] Moulton, B., Random Group E ects and the Precision of Reression Estimates, Journal of Econometrics, 986, 3, [9] Roers, W., Reression Standard Errors in Clustered Samples, Stata echnical Bulletin 3, 993, 3, 9-3. [0] Shah, B., M. Holt and R. Folsom, Inference About Reression Models from Sample Survey Data, Bulletin of the International Statistical Institute Proceedins of the 4st Session, 977, 47:3, [] White, H., Asymptotic heory for Econometricians, 984, Academic Press: San Dieo. 7

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

GMM-based inference in the AR() panel data model for parameter values where local identi cation fails Edith Madsen entre for Applied Microeconometrics (AM) Department of Economics, University of openhagen,