Asymptotic Behavior of a t Test Robust to Cluster Heterogeneity

Size: px
Start display at page:

Download "Asymptotic Behavior of a t Test Robust to Cluster Heterogeneity"

Transcription

1 Asymptotic Behavior of a t est Robust to Cluster Heteroeneity Andrew V. Carter Department of Statistics University of California, Santa Barbara Kevin. Schnepel and Doulas G. Steierwald Department of Economics University of California, Santa Barbara May 3, 03 Abstract Abstract We study the behavior of a cluster-robust t statistic and make two principle contributions. First, we relax the restriction of previous asymptotic theory that clusters have identical size, and establish that the cluster-robust t statistic continues to have a Gaussian asymptotic null distribution. Second, we determine how variation in cluster sizes, toether with other sources of cluster heteroeneity, a ect the behavior of the test statistic. o do so, we determine the sample speci c measure of cluster heteroeneity that overns this behavior and show that the measure depends on how three quantities vary over clusters: cluster size, the cluster speci c error covariance matrix and the actual value of the covariates. Because, in the absence of a xed desin, the third quantity will always vary over clusters, the vast majority of empirical analyses have test statistics whose nite sample behavior is impacted by cluster heteroeneity. o capture this impact, we develop the e ective number of clusters, which scales down the actual number of clusters by the measure of cluster heteroeneity. If the e ective number of clusters is lare, then Gaussian critical values are appropriate. Introduction In conductin inference with a cluster-robust t statistic, researchers often rely on the result that the conventional t statistic has a Gaussian asymptotic null We thank Dick Startz, toether with members of the Econometrics Research Group at UC Santa Barbara, and Ulrich Müller, who served as discussant at the 03 Econometric Society Winter Meetins, for helpful comments. Correspondin author: dou@econ.ucsb.edu

2 distribution. he existin result is derived for the speci c case in which clusters are equal in size. Because in many applications clusters are unequal in size, there is a ap between the existin result and empirical practice. We ll this ap by establishin that the conventional cluster-robust t statistic has a Gaussian asymptotic null distribution for the more eneral case in which clusters can vary in size. In so doin, we determine the sample speci c measure of cluster heteroeneity that overns the behavior of this cluster-robust t statistic. From the sample speci c measure we construct the e ective number of clusters, which scales down the actual number of clusters by the measure of cluster heteroeneity. It is the e ective number of clusters that overns inference: If the e ective number of clusters is lare, then Gaussian critical values are appropriate. he conventional cluster-robust t statistic is based on the ordinary least squares coe cient estimator from the entire sample, toether with a clusterrobust variance estimator based on the outer product of the residuals. Use of the outer product of the residuals implies that the cluster-robust variance estimator is a function only of between cluster variation and, hence, that consistency of the variance estimator requires that the number of clusters rows without bound. One immediate consequence is that it is not possible to conduct valid inference on xed e ects with the cluster-robust t statistic. he reason is that the coe cient correspondin to the xed e ect takes non-zero values for only a subset of clusters. As this subset does not row with the addition of more clusters, the number of clusters from which to estimate the variance component that corresponds to test of a xed e ect does not row without bound and consistent estimation of this variance component is not possible. he existin asymptotic theory, due to White (984, heorem 6.3, p. 36), applies to clusters of equal size that satisfy a further assumption of cluster homoeneity. Under cluster homoeneity White establishes two principle results. First, that the cluster-robust t statistic has a Gaussian asymptotic null distribution. Second, that the variance component, which appears in the denominator of the test statistic, is consistently estimated throuh the use of the cluster-robust variance estimator. We allow both for unequal cluster size and for further heteroeneity of clusters. Under these more eneral assumptions we establish that the cluster-robust t statistic has a Gaussian asymptotic null distribution. We further establish that the cluster-robust variance estimator can be used to consistently estimate the variance component that appears in the denominator of the test statistic. he asymptotic analysis yields the measure of cluster heteroeneity that overns behavior of the test statistic. he measure, which is sample speci c, depends on how three quantities vary over clusters: cluster size, the cluster speci c error covariance matrix and the actual value of the covariates. o understand why variation in these quantities impacts the behavior of the clusterrobust t statistic, consider a sample of 0 observations divided into two clusters. Because observations are assumed to be independent across clusters, the potentially unique error covariance terms are all contained within the diaonal blocks In what follows we refer to this test statistic simply as the cluster-robust t statistic.

3 that capture the correlation within clusters. If the clusters are equally sized, there are 0 potentially unique terms. As the size of one cluster rows, the number of potentially unique terms rows and reaches a maximum of 9 when one roup contains 9 observations. Variation in cluster size, keepin xed the total number of observations, alters the number of non-zero error covariance terms. Because, as Kloek (98) was amon the rst to show, each of these non-zero terms must be accounted for to avoid upward bias in the test statistic, the behavior of the cluster-robust t statistic is impacted by variation in cluster size. he measure of cluster heteroeneity scales down the number of clusters to produce the e ective number of clusters. Because our results allow the cluster speci c error covariance matrices to take eneral forms, the e ective number of clusters can be thouht of as a eneralization of the correction formula of Moulton (986). wo additional points emere. First, the e ective number of clusters varies with the coe cient under test. Second, because the measure of cluster heteroeneity depends on the value of the covariates under study, the e ective number of clusters should be calculated for virtually all studies that employ the cluster-robust t statistic, even if clusters do not vary in size. he paper is oranized as follows. In Section we de ne the eneral class of models under study and de ne the measure of cluster heteroeneity. We relate the measure to the variability of the cluster-robust variance estimator, establish that the asymptotic null distribution of the cluster-robust t statistic is Gaussian and show that consistent testin of xed e ects is not possible. In Section 3, we de ne the e ective number of clusters and emphasize, throuh simulation, that the e ective number of clusters is a sample speci c measure that varies with the coe cient under test. We illustrate how increases in cluster heteroeneity reduce the e ective number of clusters and how this reduction re ects hiher variability of the cluster-robust variance estimator. he hiher variability, in turn, increases the empirical size of the test. Because the e ective number of clusters depends on the latent error covariance matrix, we discuss how to approximate the e ective number of clusters in Section 4. For two empirical studies we approximate the e ective number of clusters for the key hypotheses under test and discuss appropriate inference. Our remarks, in Section 5, discuss alternative methods of inference when the e ective number of clusters is not lare. Asymptotic Behavior We consider a set of n observations from the linear model y = + u; () where the covariate matrix consists of k linearly independent columns. he key feature of the model is that the observations can be sorted into G clusters, where the errors are independent between clusters. Given the covariance matrix of u,, is then a block diaonal matrix with blocks of the form, 3

4 which corresponds to the covariance matrix for cluster. Because is block diaonal, the variance of the ordinary least squares estimator ^ can be written as the sum of the G cluster speci c variance components. We have h V := V ar i uj = G = h V ar i u j ; where and u are the covariate matrix and error vector for cluster, respectively. he hypotheses under test are formed from subsets of the coe cients in (). he eneral form of null hypothesis is H 0 : a = a 0, where a is a selection vector of dimension k. he cluster-robust t statistic is Z = a ^ r dv ar a ^ ; () where the variance component is V d ar a ^ = a V b a and V b is the clusterrobust variance estimator. he cluster-robust variance estimator, which Shah, Holt and Folsom (977) are amon the rst to use, is the sample analo for V where the observed residuals ^u replace the errors u : bv = G = 0 ^u ^u : (3) White establishes asymptotic results for the cluster-robust t statistic and for the variance component V b a := a V b a. White s proof has two key assumptions: ) that all clusters have an identical, xed, number of observations and ) that clusters are homoeneous in that E not vary over. He then proves that, if G! as n! then Z has a Gaussian asymptotic null distribution and V b a is a consistent estimator of V a. We relax both of White s key, cluster homoeneity, assumptions. We allow the cluster size, n, to vary: over clusters, so that clusters need not be of identical size, and to vary with the sample size, so that cluster sizes need not be xed. We also allow clusters to be heteroeneous, so that E can vary over. We then prove that, if G! as n!, then Z has a Gaussian asymptotic null distribution and bv a is a consistent estimator of V a. Note that to establish consistency we study bv a =V a, rather than V b a V a. We do so because if ^ is a consistent estimator of, then the elements of V convere to zero and we want to ensure that the elements of V b convere to zero at the same rate as the elements of V. Because is restricted only by the requirements of a positive de nite matrix, the test statistic Z is robust to a wide rane of correlated processes. But this eneral robustness has two important implications. First, the variance of ^ can be expressed as a weihted sum of the variances for the cluster-speci c ordinary least squares estimators. (he cluster-speci c ordinary least squares 4

5 estimator for cluster, ^, is based only on the observations for cluster.) his insiht is also contained in Ibraimov and Müller (00), who consider an alternative test statistic based on the unweihted sum of cluster-speci c ordinary least squares estimators. Second, the cluster-robust variance estimator is a function only of between cluster variation. In consequence, the behavior of Z, even for hypothesis tests of coe cients on covariates that vary within clusters, is overned by the number of clusters, not the total number of observations. Of course, if there is no cluster correlation, then each observation e ectively becomes a cluster, G = n, and the behavior of Z is overned by the total number of observations. We collect these ndins in the followin result (alebraic details that verify the result are contained in the Appendix). Result : a) he covariance matrix V, toether with the estimator V b, can be expressed as functions of ^ : V = A V ar ^ A ; (4) bv = A ^ ^ ^ ^ A ; where A =. b) he estimator b V is a function only of between cluster variation. Remarks: he cost of the eneral robustness of Z is re ected in Result b. Because V b is a function only of between cluster variation, consistency of V b requires that the number of clusters row without bound. hus, to ensure we have a consistent test, we require that the selection vector a include only covariates for which the number of clusters in which the covariate takes non-zero values rows without bound. Our proof of the asymptotic null distribution for Z requires two lemmas, which will themselves be helpful in understandin the nite-sample behavior of Z. We rst establish the behavior of ev = A ^ ^ A ; for which the components ^ are uncorrelated across clusters. As we will see below, the behavior of e V leads to a formula that describes how cluster heteroeneity impacts the rate of converence of both e V and b V. Assumption : (i) he error vector u is uncorrelated with the covariate matrix. (ii) Conditional on the covariate matrix, the error vector u is normally distributed with a block diaonal covariance matrix. 5

6 Lemma : Under Assumption, 8" # 9 < eva V = a E : V a ; = ( + (; )) ; G where the quantity is de ned by (; ) = a A V ar (; ) = G G = ^ A a; ; with (; ) = G P (; ). Proof: See Appendix. Remarks: he quantity (; ), which is the squared coe cient of variation for (; ), is the measure of cluster heteroeneity that drives the behavior of the test statistic Z. Clearly (; ) depends on the selection vector a but, because the researcher selects a throuh speci cation of the null hypothesis, we do not explicitly include a as an arument when analyzin the behavior of (; ). We next turn to the behavior of V b. In doin so, we must account for the fact that the components ^ ^ are correlated across clusters throuh the common presence of the random variable ^. Assumption : As n! : (i) he covariance matrix V, which is conditional on the desin, is almost surely well formed in that the larest eienvalue of V,, is bounded in probability relative to V a, n o supp V a > M! 0: (ii) he covariate matrix has a distribution such that G P E A a G a! 0: Note that V a, because the maximal eienvalue for V is the maximum over a of the quadratic form V a. Lemma : Under Assumptions -, E a V b V a = E a V e where a () is bounded in probability. that supp n ( + a ()) > B 6 V a ( + a ()) ; In particular, there exists a B such o! 0:

7 Proof: See Appendix. Remarks: Because n o P ( + a ()) > B n = P a () < + p o n B + P a () > p o B it follows from Markov s inequality that n o P ( + a ()) > B E a () (+ p + E B) a () ( p : B ) Hence a su cient condition is that E a ()! 0 for any bound B >. o understand the implication of Assumption (ii) for the covariate matrix, we discuss a simple desin that satis es the condition. Consider estimation of only an intercept G E A a G a = G n E : n G If each observation is independently and uniformly placed in a cluster, then n follows a multinomial distribution and G E n n G = G n, so that under Assumption (ii) G = o (n), which also implies that E a ()! 0. We are now able to establish our principle result that the cluster-robust test statistic has a Gaussian asymptotic null distribution. Assumption 3: (i) As n! the number of clusters is increasin G!. E[ (;)] (ii) he quantity sup G! 0. heorem : If Assumptions -3 hold, then V b a is a consistent estimator of V a and, under H 0,: Z N (0; ) ; where Proof: denotes converence in distribution. See Appendix. Remarks: Our condition on cluster heteroeneity can be related to the work of Roers (993). Althouh he does not derive an asymptotic null distribution, Roers conjectured that a Gaussian approximation would be adequate for Z if max n n < :05. If we consider a model with only an intercept and no intracluster correlation, then = n n and we see that the adequacy of a Gaussian approximation does depend on n n, albeit throuh the squared coe cient of variation, rather than the maximal value. In addition, part (ii) of Assumption 3 allows for the size of individual clusters, n, to row with n. Because consistency of ^ requires that V a! 0, the rowth of n is restricted; a su cient restriction is n = o (n). o understand the link between these statements and Assumption 3(ii), observe that V a = P h P i (; ). Because supe (; )! 0 implies Assumption 3(ii), our theory encompasses data sets in which n = o (n). 7

8 o obtain consistency, the number of clusters must row without bound. In consequence for any covariate that takes non-zero values over only a xed subset of clusters, consistent testin is not possible. Corollary : For coe cient estimators that depend only on a xed subset of clusters, the elements of b V that correspond to these estimators are inconsistent. Proof: Because V b is a function only of between cluster variation, consistency of V b requires information from a rowin number of clusters. If a coe cient estimator depends only on a nite set of clusters, the requirement is not met. Consider a covariate that takes non-zero values for a xed subset m of the clusters. (For a cluster speci c control, m =.) he element of A that corresponds to this covariate is zero for all clusters other than the set of m, so the correspondin element of V b depends only on m elements even as G! and is inconsistent. Q.E.D. Leadin examples of such covariates are cluster speci c controls (most often termed cluster xed e ects) or controls that correspond to a roup of clusters. 3 E ective Number of Clusters We have established conditions under which the cluster-robust variance estimator V b is consistent and the test statistic Z has a Gaussian asymptotic null distribution. We now turn to the question: How should a researcher use the results to inform empirical analysis? An important component to the answer for this question is contained in Lemma, where we derive a quantity that overns the nite sample behavior of V b and, thus, of Z. his quantity captures how the heteroeneity in clusters impacts the mean-squared error of V b. Recall that heteroeneity across clusters, where the clusters are indexed by, is captured throuh heteroeneity in, where can be written as = a a: (5) From this expression we see that is the key quantity to measure across clusters, so variation in cluster size is not required for cluster heteroeneity. Cluster heteroeneity can arise with clusters of equal size, but where the cluster error covariance matrix di ers over clusters. Moreover, even if is identical across clusters, the fact that the covariates di er over clusters induces heteroeneity. For this reason the vast majority of empirical analyses with cluster-robust inference are characterized by heteroeneous clusters. As detailed in Lemma, the heteroeneity in is captured throuh (; ) = ^ ; where ^ = G P G =. As Lemmas and reveal, the mean-squared 8

9 error of b V is inversely proportional to G = G + (; ) ; so G is the key measure of the adequacy of the asymptotic results. Observe that G G and the manitude of the di erence between G and G increases nonlinearly in the measure of cluster heteroeneity (; ). We refer to G as the e ective number of clusters to re ect the fact that the results in Section extend the conventional analysis (under cluster homoeneity) in which the number of clusters is the measure of the adequacy of the asymptotic results. o emphasize the importance of calculatin G for cluster samples, while also illustratin how to do so, we turn to the simulation model y i = 0 + x + z i + u i : (6) As (5) reveals, the e ective number of clusters varies with the hypothesis under test throuh the selection vector a. For this reason both a covariate that is invariant within clusters, x, and a cluster varyin covariant, z i, are used to construct the outcome, y i, for observation i in cluster. Importantly, because the number of clusters in which x takes non-zero values rows with the sample size, the cluster-invariant covariate is distinct from a xed e ect and the statistic Z is consistent for hypothesis testin on. As noted above, variation in comes from two sources observable to a researcher, cluster sizes and the observed values of the covariates, and from one source that is latent, the error covariance matrix. We capture the impact of the observable sources of variation throuh the followin desin features. Each simulated data set consists of 500 observations divided into 00 clusters. he 00 values of the cluster-invariant covariate are enerated as independent Bernoulli random variables with equal probability of 0 or and the 500 values of the cluster-varyin covariate are enerated as independent Uniform (0; ) random variables. For each simulated data set, the covariates are divided into clusters in the followin way. In the rst cluster size desin, each cluster contains 5 observations. In each succeedin cluster size desin, one observation is subtracted from clusters throuh 00 and these observations are added to the rst cluster. We construct a sequence of 0 cluster size desins, in which the proportion of the sample in the rst cluster rows monotonically from percent to 37 percent. o capture the impact of the latent source of cluster heteroeneity, we use an error-components model u i = " + v i ; (7) where the cluster component " is independent of the individual component v i. While a model with two independent error components is only one of the many types of error processes allowed for under Assumption, such a model illustrates the major factors impactin and hence the e ective number of clusters. o construct from (7), we must specify V ar (u i j). If V ar (u i j) is 9

10 homoskedastic, then the conditional cluster correlation is constant both across and within clusters. If V ar (u i j) is heteroskedastic, then the conditional cluster correlation can di er both across and within clusters. o capture the richness of empirical settins in which V b is typically employed, we set v i j N 0; czi and " j N 0; ", so that the conditional cluster correlation p " " +czi p di ers both across and within clusters. " +cz j he manitude of c plays an important role in characterizin. First, and most obviously, if c = 0 then the conditional cluster correlation is constant. Indeed, because u i consists entirely of the common cluster component, all error terms within a cluster are perfectly correlated. As c increases, the idiosyncractic component rows in importance. We set " = so that for c = 500, the idiosyncratic component dominates u i and all error terms within a cluster are essentially uncorrelated. For each set of cluster sizes and value of c, we enerate 000 simulated data sets and compute the e ective number of clusters. In Fiure, we plot the e ective number of clusters as a function of the variation in cluster sizes. he panels of the ure in the left column correspond to hypothesis tests on the coe cient for the cluster-invariant covariate, x, those in the riht column correspond to hypothesis tests on the coe cient for the cluster-varyin covariate, z i. he rows of the ure correspond to di erin derees of within cluster correlation; in the top row c = 500, while in the bottom row, c = 0. Within each panel, for each set of cluster sizes, the bar spans the rane from the minimum to the maximum e ective number of clusters across the 000 simulated data sets and the dot reports the median value. Fiure 0

11 Several features are immediate. First, increasin the cluster size variation dramatically reduces the e ective number of clusters. Even for data sets with little cluster correlation, c = 500, cluster size variation quickly reduces the e ective number of clusters to less than half the actual number of clusters. Second, that variation in the realized values of f plays an important role. his is perhaps most clearly seen in the upper left panel of Fiure. Here, even thouh clusters are of equal size and the errors exhibit minimal cluster correlation, the e ective number of clusters varies from 00 to 60 with variation in the realized values of the covariates. his variation underscores the need to calculate the e ective number of clusters as a eneral rule. he impact of variation in the unobserved component, the error cluster correlation, is more subtle. o understand why, consider a model with only an intercept, so variation in is driven exclusively by variation in n. If we let v = P n n i= cz i, then n v + n ". Because variation in n is larer than variation in n, as "= v rows, variation in n has a reater impact on variation in. For the eneral model, in which contains covariates, as c declines and the error cluster correlation increases, the variation in has a more pronounced impact on the e ective number of clusters. his impact is most clearly revealed in the riht column of Fiure, where the rane of computed values for the e ective number of clusters tends to widen as the cluster correlation of the error rows. he results in Fiure indicate that the estimator of the standard error is likely to behave more erratically for data sets with larer variation in cluster sizes and where the within cluster correlations are more pronounced. o con rm, h we next compute the (conditional) mean-squared error of V b i bva V, E a V a. o compute this quantity, which is conditional on the realized value of, we simulate data as follows. For each set of cluster sizes and value of c, we enerate the covariate matrix 5 times. For each of these 5 values of, we enerate 000 vectors for u. As each of these vectors ives rise to a di erent value for V b, we have 000 values of V b for each desin matrix from which we calculate the mean-squared error for V b. We report the results for hypothesis tests on the coe cient for the clusterinvariant covariate in Fiure and for the cluster varyin covariate in Fiure 3. Each point in the ures corresponds to the mean-squared error for a speci c desin, where the desin consists of the set of cluster sizes, the value of c and the enerated covariate matrix. Because there are 5 enerated covariate matrices for each of 0 sets of cluster sizes and 4 values of c, there are 00 points, each correspondin to an estimated mean-squared error. Points clustered to the riht, for which the e ective number of clusters is nearly 00, enerally correspond to desins in which all clusters are equal. Points appearin far to the left, for which the e ective number of clusters is less than 0, enerally correspond to desins in which the cluster size variation is pronounced and the value of c is small.

12 Fiure Both ures reveal a strikin pattern. As the e ective number of clusters drops below 40, the mean-squared error of the cluster-robust variance estimator bv rises exponentially. Indeed, an exponential curve is tted to each ure and the point estimates for the two curves are similar ln (mse) = 3:9 (:0) ln (mse) = 3:9 (:0) 0:88 (:03) G for the coe cient on x 0:79 (:0) G for the coe cient on z i :

13 Fiure 3 he increase in mean-squared error bes the question: Is the source of the mean-squared error the bias or the variation of V b? We compute the bias of bv and report the results for the coe cient on the cluster-invariant covariate in Fiure 4 and for the cluster-varyin covariate in Fiure 5. In each ure we plot the manitude of the bias as a function of the larest cluster size. Fiure 4 reveals that increasin the variance of cluster sizes leads to a lare downward bias in V b for the cluster-invariant covariate and that this bias is relatively constant as the cluster correlation for the error chanes. We thus conclude that for the coe cient on the cluster-invariant covariate it is increasin variation in cluster sizes that drives down the e ective number of clusters and results in substantial downward bias in the estimated standard error. 3

14 Fiure 4 For the cluster-varyin coe cient, in contrast, bias plays only a small role in the mean-squared error of the cluster-robust variance estimator. (he bias is pronounced for the case in which c = 500, but as this case has virtually no error cluster correlation, the bias is a re ection of a downward bias arisin from heteroskedasticity.) It seems that for the cluster-varyin coe cient, the meansquared error of the cluster-robust standard error is driven almost exclusively by variation in the estimated standard error. 4

15 Fiure 5 he increase in the mean-squared error of V b has a direct impact on the size of hypothesis tests. As for the mean-squared error calculations, the empirical size is based on the 000 values of V b for each of the 00 covariate matrices (5 covariate matrices for each pair of cluster sizes and value of c). he results are contained in Fiure 6 for the cluster-invariant covariate and in Fiure 7 for the cluster-varyin covariate. 5

16 Fiure 6 For the cluster-invariant covariate, we see in Fiure 6 that the lare downward bias in the cluster-robust standard error translates into in ated test sizes. While the empirical size beins to rise above the nominal size of 5 percent when the e ective number of clusters falls below 40, substantial size distortion only occurs when the e ective number of clusters falls below 0 (re ectin the exponential increase in the mean-squared error). In Fiure 7 we report the nominal size of a test on the coe cient for the cluster-varyin coe cient. We see that while there is some size distortion for very small values of the e ective number of clusters, the absence of substantial bias in the cluster-robust standard error estimator mitiates the distortion. Rather, we see the impact of the variation in the standard error estimator, with empirical size fallin both above and below the nominal level. In sum, the impact of cluster heteroeneity, as re ected in a low e ective number of clusters, is pronounced for test of the cluster-invariant covariate but of much less import for test of the cluster-varyin covariate. 6

17 Fiure 7 Of course, the above calculations rely on knowlede of the error cluster correlation. In practice, absent knowlede of this correlation, can the assumption that all errors are perfectly correlated with a cluster (c = 0) provide a conservative bound? As size distortion is pronounced only for the cluster-invariant covariate, we focus our attention on this covariate. In Fiure 8, we report the mean-squared error of V b as a function of the estimated e ective number of clusters. Because the estimated e ective number of clusters assume c = 0, the estimated e ective number of clusters forms a lower bound for the actual number of clusters. In comparison with Fiure, we see that the only points that lie far from the curve are associated with c = 500. If c = 500, then there is virtually no within cluster correlation, so it is perhaps not surprisin that the assumption of perfect within cluster correlation would provide a conservative bound. 7

18 Fiure 8 4 Remarks Consistency of the cluster-robust variance estimator, and the asymptotic null distribution of the resultant t test statistics, are impacted by cluster heteroeneity. We establish conditions under which the cluster-robust variance estimator is consistent and the resultant t test statistics have an asymptotic Gaussian null distribution, as the number of clusters rows without bound. he conditions lead to a sample speci c measure of cluster heteroeneity, which translates into an e ective number of clusters. he e ective number of clusters is informative for the nite sample behavior of the cluster-robust variance estimator in that a low e ective number of clusters sinals downward bias in the estimated standard errors for the cluster invariant reressor. While the e ective number of clusters should be calculated for all studies that employ the cluster-robust t statistic, we demonstrate that for many forms of heteroeneity the e ective number of clusters is a lare fraction of the actual number of clusters. If the e ective number of clusters is lare, then Gaussian critical values are appropriate. If the e ective number of clusters is not lare, then an alternative method of inference is needed. hree natural alternatives arise: ) use critical values from a t distribution, ) use critical values from a re-samplin method and 3) use an alternative test statistic. We discuss each of these alternatives in turn. 8

19 Use of critical values from a t distribution is arued for by Kott (994). Kott arues that the derees of freedom should be selected to mirror the variation of the cluster-robust variance estimator. Althouh his analysis does not contain formal asymptotic results, our analysis would suest that the derees of freedom in the approximatin distribution should be set equal to the e ective number of clusters. Yet we suest caution before employin critical values from a t distribution. o understand why, consider Z e = a (^ p 0 ). Under eva V homoeneous clusters G ~ a V a G, which would seem to imply that Z e t (G). he implication does not hold because the numerator and denominator of Z e are correlated. he error from approximatin Z e by a t distribution is mani ed under cluster heteroeneity because a G ~V is not a (G=(+ )) random (+ ) V a variable. Further, the test statistic Z is constructed with V b a rather than V e a, which adds a third source of approximation error. Because it is di cult to bound the approximation error that these three sources induce, use of critical values from a t (G=(+ )) could lead to di culty in controllin the size of the test. his caution is borne out in our preliminary simulations. For a desin in which the e ective number of clusters is 33, the critical value that delivers an empirical size of 5 percent is.0, while the critical value from a t distribution with 33 derees of freedom is.04. In sum, we have not established that critical values from a t distribution with the e ective number of clusters as the derees of freedom provide a conservative sized test, and this topic remains for future study. Use of critical values from a bootstrap is arued for by Cameron, Gelbach and Miller (008), who focus on data eneratin processes with homoenous clusters and compare several re-samplin methods. hey nd that use of the wild bootstrap leads to critical values that can control the size of the clusterrobust t statistic. Extendin these results to allow for full ranes of cluster heteroeneity remains a topic for further study. Ibraimov and Mueller propose an alternative test statistic, for which critical values are obtained from the t distribution. he alternative test statistic is based on the averae of cluster speci c coe cient estimators. While such a statistic cannot be used to test hypotheses reardin covariates that are invariant within clusters, the statistic is a simple alternative for test reardin covariates that do vary within clusters. Moreover, the critical value from a t distribution with G derees of freedom is conservative, so the size of the test is controlled. (Size control depends on the nominal size of the test, as the critical values cannot be conservative for all nominal sizes. But for a test with the conventional size of 5 percent, the critical value is conservative.) As indicated by Result, the cluster-robust t statistic depends on weihted averaes of cluster speci c quantities. As the use of unequal weihts in constructin the test statistic likely alters the power of the test statistic, it is an open question as to the conditions under which each statistic delivers hiher power. his too, remains a topic for further study. 9

20 5 Appendix Verification of Result : Let be the n k covariate matrix with all rows that do not correspond to cluster set to zero. Part a: Observe that because y = P y, ^ = y A ^ : (8) As V V ar ^j, the cluster representation of V in (4) follows directly from (8). o derive the cluster representation of V b in (4), note that = =. Hence y ^ h = i y h = i y = ^ ^ : hus because A ^ y ^ = y ^ = ^u ; (9) ^ = ^u and ^u = ^u. Hence the cluster representation of V b in (4) follows directly from (9). Part b: he estimator V b is a function of the residuals y y = (I n ) y = ^u; where =. components hese residuals can be decomposed into two ^u = (I n G + G ) y = (I n G ) y + ( G ) y = ^u W + ^u B ; where G = P is the projection operator onto the cluster speci c models. he residual component ^u W captures the within cluster variation while the residual component ^u B captures the between cluster variation. he quantity V b depends on the residuals throuh the linear function ^u = ^u. Hence, ^u = ^u W + ^u B : Because certain elements of may not be identi ed within cluster, we use the eneralized inverse de ned such that = (Harville 997, heorem.3.4 part (5), p. 67). 0

21 Because the least squares residuals are orthoonal to the correspondin model space, and ^u W = (I n G ) y = y = 0; ^u B = ( G ) y = A y 6= 0: hus, b V is only a function of between cluster variation. Proof of Lemma : Let Q := a A ^ Assumption, Q j N 0; a A V ar ^ A a. Under part (i) of. hus e V a is the sum of G squared, independent, normal random variables. Under normality V ar Q = [V ar (Q )], so V ar eva = h a A V ar ^ A ai : P. he manitude of this quantity is relative to Va, which is a A V ar ^ A a hus 8" # 9 < P h i eva V = a E : V a ; = a A V ar ^ A a P : a A V ar ^ A a From the de nition of (; ), 8" # 9 < eva V = a E : V a ; = ( + (; )) : G Q.E.D. Proof of Lemma : We wish to establish that E a V b V a = E a V e V a ( + a ()) (0) where a () is bounded in probability as n!. follows from the expansion E a V b V a = E a V e V +E na e V V a he settin of the problem a + E a V b a b V e V e V o a ; a

22 for which the Cauchy-Schwarz inequality yields (0) with 3 E a bv V e a a () = : E a ev V a he rest of the calculation is to show that a () is bounded in probability as n!. o bein, note bv V e = A ^ ^ ^ ^ ^ ^ A = = = A ^ ^ ^ ^ ^ ^ + ^ ^ A A ^ A ^ ^ ^ + ^ ^ A ^ + ^ ^ + ^ ^ A : When takin the quadratic form, this becomes a V b V e a = a A ^ = a A ^ ^ A a + ^ + a A ^ ^ A a ^ ^ A a: o obtain the limitin behavior, we use the fact that P A = I to introduce the quantity A G I, so that a A ^ ^ A a = = a A ^ a A ^ " ^ A G I a + a A ^ ^ A G I a + where the last equality follows from P A ^ = ^. a b V e V a = + a A ^ a A ^ # a A ^ ^ We now have ^ G I a G I a; ^ A G I a + () ^ A G I a G a ^ ^ a:

23 herefore, our bound on E a bv V e a depends on a bound on the conditional expectation of the square of each of the three terms on the riht side of (). Each term is the product of vector norms of the form ^ A a N 0; a AV ar ^ A a. We bein with the last term, for which ( ) E G a ^ ^ a = 4 G E a ^ = 3 a G V a : For the remainin two terms in (), let a = a A ^ A ^ G I a. Because " # " a b we have 8! 9 < ( = E a : b ; E 8 < 4E : 4 a! a # " b # ) b! ; () and let b = 93 = a! 8 93 < = 5 4E b ; :! 5 ; E! 3 a E! 3 b 4 5 ; where the last two inequalities are obtained from the Cauchy-Schwarz and Minkowski inequalities, respectively. Here E h i # a 4 = 3 a A V ar ^ A a and E b 4 = 3 "a A G I V A G I a : Hence, the rst term on the riht side of () is bounded as 8 <! 9 E a = A ^ ^ A : G I a (3) ; " # " 3 a A V ar ^ A a a A G I V A a# G I = 3 a V a " a A G I V A G I a# : 3

24 A For the second term on the riht side of (), because P G I = 0, 8 <! 9 E a = A : ^ ^ A G I a (4) ; 8 < = E a! 9 A : G I ^ = ^ A G I a ; " # 3 a A G I V A G I a : Because (f + f f 3 ) 8f +6f +3f3, we combine the bound from Lemma with (), (3) and (4) to obtain ( " # a () G a A + (; ) G I a V A G I a V a (5) " # 9 +G a A G I a V A G I a V a = + G; : From this bound, we are able to verify that a () is bounded in probability as n!. o do so, observe that without loss of enerality we can assume kak =. From Assumption (ii) the second term on the riht side of (5) is bounded in probability, so we are left to verify that P a A G I V A G I a a V a is o P G. Under Assumption (i) supp M! ; V a and Assumption (ii) implies A a G a = o P G : Q.E.D. Proof of heorem : By our Assumption (i) the random variable W = a ^ 0 = p a V a N (0; ) ; 4

25 under the null hypothesis. hus the test statistic a V a Z = W a V b a will convere in distribution to W if a V b a a V a By Lemma E a V b V a = E a V e ; P!, by Slutsky s lemma. V a ( + a ()) : Lemma further implies 8 9 >< a bv V a >= E >: a V a >; = G ( + (; )) ( + a ()) : De ne the set B to be n ( + a ()) < B o, where B is de ned as in Lemma. herefore, to imply converence in probability, ( ) ( )! bv a P V a > " bv a P V a > " B + P (B c ) " ( )!# bv a = E P V a > " B + P (B c ) >< 6 a bv V a >= B 7 E 4E >: a V a >; " 5 + P (B c ) ; where the last inequality follows from Chebyshev s inequality and the fact that B (). From the de nition of B, toether with application of the lemmas described above ( ) bv a B P V a > " E ( + (; )) + P (B c ) ; G" where each term oes to zero by Assumptions 3(i) and 3(i) and Lemma, respectively. Hence ( ) bv a supp V a > "! 0; which implies converence in distribution. Q.E.D. Verification of Result : he for estimatin the term is the coe cient of variation for 0 = 0 : (6) 5

26 Here 0 = (n ) n (n )! ; (7) where n is the total number of observations and is the total number of observations where the covariate is. Further = k k ; (8) where is an indicator function of which roups have covariate. If we insert the results from (7) and (8) into (6) we obtain = k k (n ) + n (n ) : he averae value of is = k k (n ) + n (n ) M G : It follows that = n (n ) + n (n ) M G M G M : G Because all clusters are of equal size, =n = M=G and = (G M) M (G M) : 6

27 References [] Cameron, A., J. Gelbach and D. Miller, Bootstrap-Based Improvements for Inference with Clustered Errors, Review of Economics and Statistics, 008, 90, [] Harville, D., Matrix Alebra from a Statistician s Perspective, 997, Spriner: New York. [3] Ibraimov, R. and U. Müller, t-statistic Based Correlation and Heteroeneity Robust Inference, Journal of Business and Economic Statistics, 00, 8, [4] Kloek,., OLS Estimation in a Model where a Microvariable is Explained by Areates and Contemporaneous Disturbances are Equicorrelated, Econometrica, 98, 49, [5] Kott, P., A Hypothesis est of Linear Reression Coe cients with Survey Data, Survey Methodoloy, 994, 0, [6] Krueer, A., Experimental Estimates of Educational Production Functions, Quarterly Journal of Economics, 999, 4, [7] Kuhn, P., P. Kooreman, A. Soetevent and A. Kapteyn, he E ects of Lottery Prizes on Winners and heir Neihbors: Evidence from the Dutch Postcode Lottery, American Economic Review, 0, 0, [8] Moulton, B., Random Group E ects and the Precision of Reression Estimates, Journal of Econometrics, 986, 3, [9] Roers, W., Reression Standard Errors in Clustered Samples, Stata echnical Bulletin 3, 993, 3, 9-3. [0] Shah, B., M. Holt and R. Folsom, Inference About Reression Models from Sample Survey Data, Bulletin of the International Statistical Institute Proceedins of the 4st Session, 977, 47:3, [] White, H., Asymptotic heory for Econometricians, 984, Academic Press: San Dieo. 7

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails GMM-based inference in the AR() panel data model for parameter values where local identi cation fails Edith Madsen entre for Applied Microeconometrics (AM) Department of Economics, University of openhagen,

More information

Economics 241B Estimation with Instruments

Economics 241B Estimation with Instruments Economics 241B Estimation with Instruments Measurement Error Measurement error is de ned as the error resulting from the measurement of a variable. At some level, every variable is measured with error.

More information

A New Approach to Robust Inference in Cointegration

A New Approach to Robust Inference in Cointegration A New Approach to Robust Inference in Cointegration Sainan Jin Guanghua School of Management, Peking University Peter C. B. Phillips Cowles Foundation, Yale University, University of Auckland & University

More information

Testing for Regime Switching: A Comment

Testing for Regime Switching: A Comment Testing for Regime Switching: A Comment Andrew V. Carter Department of Statistics University of California, Santa Barbara Douglas G. Steigerwald Department of Economics University of California Santa Barbara

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Comment on HAC Corrections for Strongly Autocorrelated Time Series by Ulrich K. Müller

Comment on HAC Corrections for Strongly Autocorrelated Time Series by Ulrich K. Müller Comment on HAC Corrections for Strongly Autocorrelated ime Series by Ulrich K. Müller Yixiao Sun Department of Economics, UC San Diego May 2, 24 On the Nearly-optimal est Müller applies the theory of optimal

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes Wild Bootstrap Inference for Wildly Dierent Cluster Sizes Matthew D. Webb October 9, 2013 Introduction This paper is joint with: Contributions: James G. MacKinnon Department of Economics Queen's University

More information

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner Econometrics II Nonstandard Standard Error Issues: A Guide for the Practitioner Måns Söderbom 10 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom,

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

Speci cation of Conditional Expectation Functions

Speci cation of Conditional Expectation Functions Speci cation of Conditional Expectation Functions Econometrics Douglas G. Steigerwald UC Santa Barbara D. Steigerwald (UCSB) Specifying Expectation Functions 1 / 24 Overview Reference: B. Hansen Econometrics

More information

Environmental Econometrics

Environmental Econometrics Environmental Econometrics Syngjoo Choi Fall 2008 Environmental Econometrics (GR03) Fall 2008 1 / 37 Syllabus I This is an introductory econometrics course which assumes no prior knowledge on econometrics;

More information

1 The Multiple Regression Model: Freeing Up the Classical Assumptions

1 The Multiple Regression Model: Freeing Up the Classical Assumptions 1 The Multiple Regression Model: Freeing Up the Classical Assumptions Some or all of classical assumptions were crucial for many of the derivations of the previous chapters. Derivation of the OLS estimator

More information

Chapter 2. GMM: Estimating Rational Expectations Models

Chapter 2. GMM: Estimating Rational Expectations Models Chapter 2. GMM: Estimating Rational Expectations Models Contents 1 Introduction 1 2 Step 1: Solve the model and obtain Euler equations 2 3 Step 2: Formulate moment restrictions 3 4 Step 3: Estimation and

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Stat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010

Stat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010 Stat60: Bayesian Modelin and Inference Lecture Date: March 10, 010 Bayes Factors, -priors, and Model Selection for Reression Lecturer: Michael I. Jordan Scribe: Tamara Broderick The readin for this lecture

More information

Robust Semiparametric Optimal Testing Procedure for Multiple Normal Means

Robust Semiparametric Optimal Testing Procedure for Multiple Normal Means Veterinary Dianostic and Production Animal Medicine Publications Veterinary Dianostic and Production Animal Medicine 01 Robust Semiparametric Optimal Testin Procedure for Multiple Normal Means Pen Liu

More information

Estimation and Inference of Linear Trend Slope Ratios

Estimation and Inference of Linear Trend Slope Ratios Estimation and Inference of Linear rend Slope Ratios imothy J. Vogelsang and Nasreen Nawaz Department of Economics, Michigan State University October 4 Abstract We focus on the estimation of the ratio

More information

1. The Multivariate Classical Linear Regression Model

1. The Multivariate Classical Linear Regression Model Business School, Brunel University MSc. EC550/5509 Modelling Financial Decisions and Markets/Introduction to Quantitative Methods Prof. Menelaos Karanasos (Room SS69, Tel. 08956584) Lecture Notes 5. The

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models School of Economics and Management - University of Geneva Christophe Hurlin, Université of Orléans University of Orléans April 2018 C. Hurlin (University of Orléans)

More information

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008 ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008 Instructions: Answer all four (4) questions. Point totals for each question are given in parenthesis; there are 00 points possible. Within

More information

Testing for a Trend with Persistent Errors

Testing for a Trend with Persistent Errors Testing for a Trend with Persistent Errors Graham Elliott UCSD August 2017 Abstract We develop new tests for the coe cient on a time trend in a regression of a variable on a constant and time trend where

More information

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Jonathan B. Hill Dept. of Economics University of North Carolina - Chapel Hill November

More information

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER By Donald W. K. Andrews August 2011 COWLES FOUNDATION DISCUSSION PAPER NO. 1815 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS

More information

Lecture 4: Linear panel models

Lecture 4: Linear panel models Lecture 4: Linear panel models Luc Behaghel PSE February 2009 Luc Behaghel (PSE) Lecture 4 February 2009 1 / 47 Introduction Panel = repeated observations of the same individuals (e.g., rms, workers, countries)

More information

Parametric Inference on Strong Dependence

Parametric Inference on Strong Dependence Parametric Inference on Strong Dependence Peter M. Robinson London School of Economics Based on joint work with Javier Hualde: Javier Hualde and Peter M. Robinson: Gaussian Pseudo-Maximum Likelihood Estimation

More information

Robust Con dence Intervals in Nonlinear Regression under Weak Identi cation

Robust Con dence Intervals in Nonlinear Regression under Weak Identi cation Robust Con dence Intervals in Nonlinear Regression under Weak Identi cation Xu Cheng y Department of Economics Yale University First Draft: August, 27 This Version: December 28 Abstract In this paper,

More information

Lecture Notes on Measurement Error

Lecture Notes on Measurement Error Steve Pischke Spring 2000 Lecture Notes on Measurement Error These notes summarize a variety of simple results on measurement error which I nd useful. They also provide some references where more complete

More information

Strati cation in Multivariate Modeling

Strati cation in Multivariate Modeling Strati cation in Multivariate Modeling Tihomir Asparouhov Muthen & Muthen Mplus Web Notes: No. 9 Version 2, December 16, 2004 1 The author is thankful to Bengt Muthen for his guidance, to Linda Muthen

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

GMM estimation of spatial panels

GMM estimation of spatial panels MRA Munich ersonal ReEc Archive GMM estimation of spatial panels Francesco Moscone and Elisa Tosetti Brunel University 7. April 009 Online at http://mpra.ub.uni-muenchen.de/637/ MRA aper No. 637, posted

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:

More information

GMM based inference for panel data models

GMM based inference for panel data models GMM based inference for panel data models Maurice J.G. Bun and Frank Kleibergen y this version: 24 February 2010 JEL-code: C13; C23 Keywords: dynamic panel data model, Generalized Method of Moments, weak

More information

Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing

Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing Yixiao Sun Department of Economics University of California, San Diego Peter C. B. Phillips Cowles Foundation, Yale University,

More information

MC3: Econometric Theory and Methods. Course Notes 4

MC3: Econometric Theory and Methods. Course Notes 4 University College London Department of Economics M.Sc. in Economics MC3: Econometric Theory and Methods Course Notes 4 Notes on maximum likelihood methods Andrew Chesher 25/0/2005 Course Notes 4, Andrew

More information

The Impact of a Hausman Pretest on the Size of a Hypothesis Test: the Panel Data Case

The Impact of a Hausman Pretest on the Size of a Hypothesis Test: the Panel Data Case The Impact of a Hausman retest on the Size of a Hypothesis Test: the anel Data Case atrik Guggenberger Department of Economics UCLA September 22, 2008 Abstract: The size properties of a two stage test

More information

2014 Preliminary Examination

2014 Preliminary Examination 014 reliminary Examination 1) Standard error consistency and test statistic asymptotic normality in linear models Consider the model for the observable data y t ; x T t n Y = X + U; (1) where is a k 1

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

Chapter 1. GMM: Basic Concepts

Chapter 1. GMM: Basic Concepts Chapter 1. GMM: Basic Concepts Contents 1 Motivating Examples 1 1.1 Instrumental variable estimator....................... 1 1.2 Estimating parameters in monetary policy rules.............. 2 1.3 Estimating

More information

A Conditional-Heteroskedasticity-Robust Con dence Interval for the Autoregressive Parameter

A Conditional-Heteroskedasticity-Robust Con dence Interval for the Autoregressive Parameter A Conditional-Heteroskedasticity-Robust Con dence Interval for the Autoregressive Parameter Donald W. K. Andrews Cowles Foundation for Research in Economics Yale University Patrik Guggenberger Department

More information

The Case Against JIVE

The Case Against JIVE The Case Against JIVE Related literature, Two comments and One reply PhD. student Freddy Rojas Cama Econometrics Theory II Rutgers University November 14th, 2011 Literature 1 Literature 2 Key de nitions

More information

Likelihood Ratio Based Test for the Exogeneity and the Relevance of Instrumental Variables

Likelihood Ratio Based Test for the Exogeneity and the Relevance of Instrumental Variables Likelihood Ratio Based est for the Exogeneity and the Relevance of Instrumental Variables Dukpa Kim y Yoonseok Lee z September [under revision] Abstract his paper develops a test for the exogeneity and

More information

Economics 241B Review of Limit Theorems for Sequences of Random Variables

Economics 241B Review of Limit Theorems for Sequences of Random Variables Economics 241B Review of Limit Theorems for Sequences of Random Variables Convergence in Distribution The previous de nitions of convergence focus on the outcome sequences of a random variable. Convergence

More information

Solution to the take home exam for ECON 3150/4150

Solution to the take home exam for ECON 3150/4150 Solution to the tae home exam for ECO 350/450 Jia Zhiyan and Jo Thori Lind April 2004 General comments Most of the copies we ot were quite ood, and it seems most of you have done a real effort on the problem

More information

We begin by thinking about population relationships.

We begin by thinking about population relationships. Conditional Expectation Function (CEF) We begin by thinking about population relationships. CEF Decomposition Theorem: Given some outcome Y i and some covariates X i there is always a decomposition where

More information

Chapter 6. Maximum Likelihood Analysis of Dynamic Stochastic General Equilibrium (DSGE) Models

Chapter 6. Maximum Likelihood Analysis of Dynamic Stochastic General Equilibrium (DSGE) Models Chapter 6. Maximum Likelihood Analysis of Dynamic Stochastic General Equilibrium (DSGE) Models Fall 22 Contents Introduction 2. An illustrative example........................... 2.2 Discussion...................................

More information

A formal statistical test for the number of factors in. the approximate factor models

A formal statistical test for the number of factors in. the approximate factor models A formal statistical test for the number of factors in the approximate factor models Alexei Onatski Economics Department, Columbia University September 26, 2006 Abstract In this paper we study i.i.d. sequences

More information

Inference for Clustered Data

Inference for Clustered Data Inference for Clustered Data Chang Hyung Lee and Douglas G. Steigerwald Department of Economics University of California, Santa Barbara March 23, 2017 Abstract This article introduces eclust, a new Stata

More information

A CONDITIONAL-HETEROSKEDASTICITY-ROBUST CONFIDENCE INTERVAL FOR THE AUTOREGRESSIVE PARAMETER. Donald W.K. Andrews and Patrik Guggenberger

A CONDITIONAL-HETEROSKEDASTICITY-ROBUST CONFIDENCE INTERVAL FOR THE AUTOREGRESSIVE PARAMETER. Donald W.K. Andrews and Patrik Guggenberger A CONDITIONAL-HETEROSKEDASTICITY-ROBUST CONFIDENCE INTERVAL FOR THE AUTOREGRESSIVE PARAMETER By Donald W.K. Andrews and Patrik Guggenberger August 2011 Revised December 2012 COWLES FOUNDATION DISCUSSION

More information

Lecture 1- The constrained optimization problem

Lecture 1- The constrained optimization problem Lecture 1- The constrained optimization problem The role of optimization in economic theory is important because we assume that individuals are rational. Why constrained optimization? the problem of scarcity.

More information

Single-Equation GMM: Endogeneity Bias

Single-Equation GMM: Endogeneity Bias Single-Equation GMM: Lecture for Economics 241B Douglas G. Steigerwald UC Santa Barbara January 2012 Initial Question Initial Question How valuable is investment in college education? economics - measure

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor guirregabiria SOLUTION TO FINL EXM Monday, pril 14, 2014. From 9:00am-12:00pm (3 hours) INSTRUCTIONS:

More information

Mismeasurement of Distance E ects: The Role of Internal Location of Production

Mismeasurement of Distance E ects: The Role of Internal Location of Production Mismeasurement of Distance E ects: The Role of Internal Location of Production Hakan Yilmazkuday December 15, 2014 Abstract The estimated e ects of distance in empirical international trade reressions

More information

Habits and Multiple Equilibria

Habits and Multiple Equilibria Habits and Multiple Equilibria Lorenz Kuen and Eveny Yakovlev February 5, 2018 Abstract Abstract TBA. A Introduction Brief introduction TBA B A Structural Model of Taste Chanes Several structural models

More information

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF

More information

Phase Diagrams: construction and comparative statics

Phase Diagrams: construction and comparative statics 1 / 11 Phase Diarams: construction and comparative statics November 13, 215 Alecos Papadopoulos PhD Candidate Department of Economics, Athens University of Economics and Business papadopalex@aueb.r, https://alecospapadopoulos.wordpress.com

More information

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations. Exercises for the course of Econometrics Introduction 1. () A researcher is using data for a sample of 30 observations to investigate the relationship between some dependent variable y i and independent

More information

Tests for Cointegration, Cobreaking and Cotrending in a System of Trending Variables

Tests for Cointegration, Cobreaking and Cotrending in a System of Trending Variables Tests for Cointegration, Cobreaking and Cotrending in a System of Trending Variables Josep Lluís Carrion-i-Silvestre University of Barcelona Dukpa Kim y Korea University May 4, 28 Abstract We consider

More information

POWER MAXIMIZATION AND SIZE CONTROL IN HETEROSKEDASTICITY AND AUTOCORRELATION ROBUST TESTS WITH EXPONENTIATED KERNELS

POWER MAXIMIZATION AND SIZE CONTROL IN HETEROSKEDASTICITY AND AUTOCORRELATION ROBUST TESTS WITH EXPONENTIATED KERNELS POWER MAXIMIZAION AND SIZE CONROL IN HEEROSKEDASICIY AND AUOCORRELAION ROBUS ESS WIH EXPONENIAED KERNELS By Yixiao Sun, Peter C. B. Phillips and Sainan Jin January COWLES FOUNDAION DISCUSSION PAPER NO.

More information

Testing Weak Convergence Based on HAR Covariance Matrix Estimators

Testing Weak Convergence Based on HAR Covariance Matrix Estimators Testing Weak Convergence Based on HAR Covariance atrix Estimators Jianning Kong y, Peter C. B. Phillips z, Donggyu Sul x August 4, 207 Abstract The weak convergence tests based on heteroskedasticity autocorrelation

More information

1 Correlation between an independent variable and the error

1 Correlation between an independent variable and the error Chapter 7 outline, Econometrics Instrumental variables and model estimation 1 Correlation between an independent variable and the error Recall that one of the assumptions that we make when proving the

More information

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors by Bruce E. Hansen Department of Economics University of Wisconsin October 2018 Bruce Hansen (University of Wisconsin) Exact

More information

Generalized Least-Squares Regressions V: Multiple Variables

Generalized Least-Squares Regressions V: Multiple Variables City University of New York (CUNY) CUNY Academic Works Publications Research Kinsborouh Community Collee -05 Generalized Least-Squares Reressions V: Multiple Variables Nataniel Greene CUNY Kinsborouh Community

More information

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1 Markov-Switching Models with Endogenous Explanatory Variables by Chang-Jin Kim 1 Dept. of Economics, Korea University and Dept. of Economics, University of Washington First draft: August, 2002 This version:

More information

Föreläsning /31

Föreläsning /31 1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +

More information

Notes on Generalized Method of Moments Estimation

Notes on Generalized Method of Moments Estimation Notes on Generalized Method of Moments Estimation c Bronwyn H. Hall March 1996 (revised February 1999) 1. Introduction These notes are a non-technical introduction to the method of estimation popularized

More information

Grouped Effects Estimators in Fixed Effects Models

Grouped Effects Estimators in Fixed Effects Models Grouped Effects Estimators in Fixed Effects Models C. Alan Bester and Christian B. Hansen April 2009 Abstract. We consider estimation of nonlinear panel data models with common and individual specific

More information

THE SIGNAL ESTIMATOR LIMIT SETTING METHOD

THE SIGNAL ESTIMATOR LIMIT SETTING METHOD ' THE SIGNAL ESTIMATOR LIMIT SETTING METHOD Shan Jin, Peter McNamara Department of Physics, University of Wisconsin Madison, Madison, WI 53706 Abstract A new method of backround subtraction is presented

More information

1 A Non-technical Introduction to Regression

1 A Non-technical Introduction to Regression 1 A Non-technical Introduction to Regression Chapters 1 and Chapter 2 of the textbook are reviews of material you should know from your previous study (e.g. in your second year course). They cover, in

More information

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012 SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER By Donald W. K. Andrews August 2011 Revised March 2012 COWLES FOUNDATION DISCUSSION PAPER NO. 1815R COWLES FOUNDATION FOR

More information

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator Chapter 6: Endogeneity and Instrumental Variables (IV) estimator Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 15, 2013 Christophe Hurlin (University of Orléans)

More information

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity Songnian Chen a, Xun Lu a, Xianbo Zhou b and Yahong Zhou c a Department of Economics, Hong Kong University

More information

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models Takashi Yamagata y Department of Economics and Related Studies, University of York, Heslington, York, UK January

More information

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors by Bruce E. Hansen Department of Economics University of Wisconsin June 2017 Bruce Hansen (University of Wisconsin) Exact

More information

Lecture 5 Processing microarray data

Lecture 5 Processing microarray data Lecture 5 Processin microarray data (1)Transform the data into a scale suitable for analysis ()Remove the effects of systematic and obfuscatin sources of variation (3)Identify discrepant observations Preprocessin

More information

Discussion Paper Series

Discussion Paper Series INSTITUTO TECNOLÓGICO AUTÓNOMO DE MÉXICO CENTRO DE INVESTIGACIÓN ECONÓMICA Discussion Paper Series Size Corrected Power for Bootstrap Tests Manuel A. Domínguez and Ignacio N. Lobato Instituto Tecnológico

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Supplemental Material 1 for On Optimal Inference in the Linear IV Model

Supplemental Material 1 for On Optimal Inference in the Linear IV Model Supplemental Material 1 for On Optimal Inference in the Linear IV Model Donald W. K. Andrews Cowles Foundation for Research in Economics Yale University Vadim Marmer Vancouver School of Economics University

More information

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland Available on CMS information server CMS CR-28/5 he Compact Muon Solenoid Experiment Conference Report Mailin address: CMS CERN, CH-1211 GENEVA 23, Switzerland 14 February 28 Observin Heavy op Quarks with

More information

Appendix for "O shoring in a Ricardian World"

Appendix for O shoring in a Ricardian World Appendix for "O shoring in a Ricardian World" This Appendix presents the proofs of Propositions - 6 and the derivations of the results in Section IV. Proof of Proposition We want to show that Tm L m T

More information

Bayesian IV: the normal case with multiple endogenous variables

Bayesian IV: the normal case with multiple endogenous variables Baesian IV: the normal case with multiple endogenous variables Timoth Cogle and Richard Startz revised Januar 05 Abstract We set out a Gibbs sampler for the linear instrumental-variable model with normal

More information

Empirical Methods in Applied Microeconomics

Empirical Methods in Applied Microeconomics Empirical Methods in Applied Microeconomics Jörn-Ste en Pischke LSE October 007 1 Nonstandard Standard Error Issues The discussion so far has concentrated on identi cation of the e ect of interest. Obviously,

More information

Bootstrap-Based Improvements for Inference with Clustered Errors

Bootstrap-Based Improvements for Inference with Clustered Errors Bootstrap-Based Improvements for Inference with Clustered Errors Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Maryland, U.C. - Davis May, 2008 May, 2008 1 / 41 1. Introduction OLS regression

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

Dynamical Diffraction

Dynamical Diffraction Dynamical versus Kinematical Di raction kinematical theory valid only for very thin crystals Dynamical Diffraction excitation error s 0 primary beam intensity I 0 intensity of other beams consider di racted

More information

Comparing Nested Predictive Regression Models with Persistent Predictors

Comparing Nested Predictive Regression Models with Persistent Predictors Comparing Nested Predictive Regression Models with Persistent Predictors Yan Ge y and ae-hwy Lee z November 29, 24 Abstract his paper is an extension of Clark and McCracken (CM 2, 25, 29) and Clark and

More information

Obtaining Critical Values for Test of Markov Regime Switching

Obtaining Critical Values for Test of Markov Regime Switching University of California, Santa Barbara From the SelectedWorks of Douglas G. Steigerwald November 1, 01 Obtaining Critical Values for Test of Markov Regime Switching Douglas G Steigerwald, University of

More information

Daily Welfare Gains from Trade

Daily Welfare Gains from Trade Daily Welfare Gains from Trade Hasan Toprak Hakan Yilmazkuday y INCOMPLETE Abstract Using daily price quantity data on imported locally produced agricultural products, this paper estimates the elasticity

More information

Inference on a Structural Break in Trend with Fractionally Integrated Errors

Inference on a Structural Break in Trend with Fractionally Integrated Errors Inference on a Structural Break in rend with Fractionally Integrated Errors Seongyeon Chang Boston University Pierre Perron y Boston University November, Abstract Perron and Zhu (5) established the consistency,

More information

Generalization of Vieta's Formula

Generalization of Vieta's Formula Rose-Hulman Underraduate Mathematics Journal Volume 7 Issue Article 4 Generalization of Vieta's Formula Al-Jalila Al-Abri Sultan Qaboos University in Oman Follow this and additional wors at: https://scholar.rose-hulman.edu/rhumj

More information

Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence)

Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence) Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence) David Glickenstein December 7, 2015 1 Inner product spaces In this chapter, we will only consider the elds R and C. De nition 1 Let V be a vector

More information

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1 PANEL DATA RANDOM AND FIXED EFFECTS MODEL Professor Menelaos Karanasos December 2011 PANEL DATA Notation y it is the value of the dependent variable for cross-section unit i at time t where i = 1,...,

More information

A Course on Advanced Econometrics

A Course on Advanced Econometrics A Course on Advanced Econometrics Yongmiao Hong The Ernest S. Liu Professor of Economics & International Studies Cornell University Course Introduction: Modern economies are full of uncertainties and risk.

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

7 Semiparametric Methods and Partially Linear Regression

7 Semiparametric Methods and Partially Linear Regression 7 Semiparametric Metods and Partially Linear Regression 7. Overview A model is called semiparametric if it is described by and were is nite-dimensional (e.g. parametric) and is in nite-dimensional (nonparametric).

More information

Conical Pendulum Linearization Analyses

Conical Pendulum Linearization Analyses European J of Physics Education Volume 7 Issue 3 309-70 Dean et al. Conical Pendulum inearization Analyses Kevin Dean Jyothi Mathew Physics Department he Petroleum Institute Abu Dhabi, PO Box 533 United

More information

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance Identi cation of Positive Treatment E ects in Randomized Experiments with Non-Compliance Aleksey Tetenov y February 18, 2012 Abstract I derive sharp nonparametric lower bounds on some parameters of the

More information

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley Models, Testing, and Correction of Heteroskedasticity James L. Powell Department of Economics University of California, Berkeley Aitken s GLS and Weighted LS The Generalized Classical Regression Model

More information

Efficiency Tradeoffs in Estimating the Linear Trend Plus Noise Model. Abstract

Efficiency Tradeoffs in Estimating the Linear Trend Plus Noise Model. Abstract Efficiency radeoffs in Estimating the Linear rend Plus Noise Model Barry Falk Department of Economics, Iowa State University Anindya Roy University of Maryland Baltimore County Abstract his paper presents

More information