Robust Inference with Multi-way Clustering

Size: px

Start display at page:

Download "Robust Inference with Multi-way Clustering"

Annabel McCoy
6 years ago
Views:

1 Robust Inference with Multi-way Clustering Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Arizona, U.C. - Davis February 2010 February / 44

2 1.1 Introduction Moulton (1986, 1990) and Bertrand, Duo, and Mullainathan (2004) showed the importance of controlling for clustering. Failure to control underestimates OLS standard errors and overstates t statistics. Initially use one-way random eects model. Now use cluster-robust standard errors (White (1984), Arellano (1987), Liang and Zeger (1986), Rogers (1993)). February / 44

3 This paper extends to cluster-robust in two (nonnested) dimensions Example 1: More than one grouped regressor. Example 2: Cross-section unit and time for panel data. Outline of presentation Lengthy discussion of one-way cluster-robust. Present method for two-way cluster-robust. Simulation and application. Conclusion. February / 44

4 1.2 OLS with Cluster Errors Model for G clusters with N g individuals per cluster: OLS estimator y ig = x 0 ig β + u ig, i = 1,..., N g, g = 1,..., G, y g = X g β + u g, g = 1,..., G, y = Xβ + u, bβ = ( G g=1 N g i=1 x ig x 0 ig ) 1 ( G g=1 N g i=1 x ig y ig ) = ( G g=1 X0 g X g ) 1 ( G g=1 X g y g ) = (X 0 X) 1 X 0 y. February / 44

5 OLS with Clustered Errors (continued) As usual bβ = β + (X 0 X) 1 X 0 u = β + (X 0 X) 1 ( G g=1 X g u g ). Assume independence over g with u g [0, Σ g = E[u g ug 0 ]]. Then β b a N [β, V[ β]] b with V[ b β] = (X 0 X) 1 ( G g=1 X g Σ g X 0 g )(X 0 X) 1. February / 44

6 OLS with Clustered Errors (continued) If ignore clustering the default OLS variance estimate should be inated by approximately where τ j ' 1 + ρ xj ρ u ( N g 1), ρ xj is the within cluster correlation of x j ρ u is the within cluster error correlation N g is the average cluster size. Kloek (1981), Scott and Holt (1982). Moulton (1986, 1990) showed that could be large even if ρ u small. e.g. N G = 81, ρ x = 1 and ρ u = 0.1 then τ j = 9! So should correct for clustering - but Σ g is unknown. February / 44

7 1.3 Random eects approach The original way to obtain consistent variance matrix estimate. Assume a random eects (RE) model u ig = α g + ε ig α g iid[0, σ 2 α] ε ig iid[0, σ 2 ε ]. Then Σ g = σ 2 ui Ng + σ 2 αe Ng e 0 N g and we use bv RE [ b β] = (X 0 X) 1 ( G g=1 X g b Σ g X 0 g )(X 0 X) 1, where bσ g = bσ 2 ui Ng + bσ 2 αe Ng e 0 N g, and bσ 2 ε and bσ 2 α are consistent. Weakness is strong distributional assumptions. February / 44

8 1.4 Cluster-Robust Variance Estimates The current method to obtain variance estimates. The cluster-robust variance estimate (CRVE) bv CR [ b β] = (X 0 X) 1 ( G g=1 X g eu g eu 0 g X 0 g )(X 0 X) 1, provides a consistent estimator of V CR [ b β] if plim 1 G G g=1 X g eu g eu g 0 X 0 g = plim 1 G G g=1 X g Σ g X 0 g. bv CR [ b β] yields cluster-robust standard errors. If OLS residuals are used then eu g = bu g = y g X g b β. Then bv CR [ b β] is downwards-biased for V[ b β]. February / 44

9 Cluster-Robust Variance Estimates (continued) The CRVE was proposed by Liang and Zeger (1986) for grouped data proposed by Arellano (1987) for FE estimator for short panels (where the grouping is on the individual) popularized by incorporation in Stata as the cluster option (Rogers (1993)). Bertrand, Duo and Mullainathan (2004) in a much-cited paper noted that with state-year data many studies were erroneously clustering on state-year (implying independence over time) rather than clustering on just state. February / 44

10 1.5 CRVE with few clusters CRVE assumes G!. What if G is small? Stata uses eu g = p cbu g where c = critical values. Reasonable starting point. G G 1 N 1 N k ' G G 1 and T (G 1) Donald and Lang (2007) do asymptotic theory when G is small and N g!. Then a two-step FGLS RE estimator yields t-test that is T (G 2) under some assumptions. Ibragimov and Muller (2010) do asymptotic theory when G is small and N g! and only within-group variation is relevant. Then separately estimate b β g s and average. Cameron, Gelbach and Miller (2007) propose cluster bootstraps with asymptotic renement. February / 44

11 1.6 Wild Cluster Bootstrap-T Procedure Test H 0 : β 1 = β 0 1 against H a : β 1 6= β 0 1 using w = ( bβ 1 β 0 1 )/s bβ 1. 1 Obtain the OLS estimator b β and OLS residuals bu g, g = 1,..., G. [Best to use residuals that impose H 0 ]. 2 Do B iterations of this step. On the b th iteration: 1 For each cluster g = 1,..., G, form bu g = bu g or bu g = bu g each with probability 0.5 and hence form by g = X 0 g b β + bu g. This yields wild cluster bootstrap resample f(by 1, X 1),..., (by G, X G )g. 2 Calculate the OLS estimate bβ 1,b and its standard error s bβ 1,b and given these form the Wald test statistic w b = ( bβ 1,b 3 Reject H 0 at level α if and only if bβ 1 )/s b β 1,b. w < w [α/2] or w > w [1 α/2], where w [q] denotes the qth quantile of w 1,..., w B. February / 44

12 1.7 Other work on clustering Wooldridge (2003) survey. Bhattacharya (2005) combines stratication and clustering, and in a GMM setting. Christian Hansen (2007a) shows that even for long panels it is okay to use CRVE with FE estimator (as shown in simulations by Krezde (2002)). Christian Hansen (2007b) shows that for short panels with AR(p) errors bias-corrected FGLS does well. Hausman and Kuersteiner (2005) do qualitatively similar small-t correction for Kiefer (1980) FGLS estimator in panel. Conley (1998) analyzes GMM with spatial correlation. Cameron and Miller (2010) show that degrees-of-freedom corrections can lead to cluster-robust variance estimates diering substantially in LSDV versus mean-dierenced model if T is small. February / 44

13 2.0 Motivating example How do job injury rates eect wages? Hersch (1998). CPS individual data on male wages N = But there is no individual data on job injury rate. Instead aggregated data: data on industry injury rates for 211 industries data on occupations injury rates for 387 occupations. Model estimated is What should we do? y igh = α + x 0 igh β + γ rind ig + δ rocc ih + u igh. GLS two-way random eects: u igh = ε g + ε h + ε igh ; ε g, ε h, ε igh i.i.d. Strong assumptions. OLS then cluster on industry for bγ and cluster on occupation for bδ. Ad hoc. February / 44

14 2.1 One-way clustering Return to one-way clustering before extending to two-way. Clustering on group g (for g = 1,..., G ) with N g individuals in group g : y ig = xig 0 β + u ig E[u ig u jg 0jx ig, x jg 0] = 0, for i 6= j unless g = g 0. Then informally the estimated asymptotic variance matrix of b β OLS, denoted Avar[ b β], is where Avar[ b β] = (X 0 X) 1 bb(x 0 X) 1 February / 44

15 One-way Clustering bb = G g=1 X 0 g bu g bu 0 g X g 2 bu 1 bu = X 0 0 bu 2 bu X 0 bu G bu G E = X 0 bubu0. 0 E C 5A X 0 E G = X 0 (bubu 0. S G )X, February / 44

16 2.2 Two-way Clustering With g denoting rst group and h denoting second group y igh = x 0 igh β+u igh, E[u igh u jg 0 h 0jx igh, x jg 0 h 0] = 0, for i 6= j unless g = g 0 or h = h 0. Then use bb = X 0 (bubu 0. S GH )X, (1) where S GH is an N N indicator matrix with ij th entry equal to one if the i th and j th observation share any cluster and equal to zero otherwise. February / 44

17 Two-way Clustering Dene three N N indicator matrices: S G with ij th entry equal to one if the i th and j th observation belong to the same cluster g 2 f1, 2,..., G g S H with ij th entry equal to one if the i th and j th observation belong to the same cluster h 2 f1, 2,..., Hg S G \H with ij th entry equal to one if the i th and j th observation belong to both the same cluster g 2 f1, 2,..., G g and the same cluster h 2 f1, 2,..., Hg. Then so S GH = S G + S H S G \H, bb = X 0 (bubu 0. S G )X + X 0 (bubu 0. S H )X X 0 (bubu 0. S G \H. X), February / 44

18 Two-way Clustering Hence Avar[ b β] = (X 0 X) 1 X 0 (bubu 0. S G )X(X 0 X) 1 (2) +(X 0 X) 1 X 0 (bubu 0. S H )X(X 0 X) 1 (X 0 X) 1 X 0 (bubu 0. S G \H )X(X 0 X) 1. The three components can be separately computed by (1) OLS regression of y on X with standard errors computed using clustering on g 2 f1, 2,..., G g; (2) OLS regression of y on X with standard errors computed using clustering on h 2 f1, 2,..., Hg; and (3) OLS regression of y on X with standard errors computed using clustering on (g, h) 2 f(1, 1),..., (G, H)g Stata add-on cgmreg.ado does this. February / 44

19 Two-way Clustering: other research This method has been independently proposed by others (we began in 2004). Applied: Acemoglu and Pischke (2003): cluster on individual and region-by-time. Applied: Fafchamps and Gubert (2006): cluster on person 1 and person 2 in person-pair data (dyadic) for networks. Theory: Thompson (2006): cluster on rm on time. Theory: Miglioretti and Heagarty (2006): cluster on person, radiologist, and facility. Theory: Conley (1998): cluster on space and time (dierent asymptotic theory). February / 44

20 2.3 Practical Considerations If bv[ b β] is not positive-denite (small G, H) then Decompose bv[ β] b = UΛU 0 ; U contains eigenvectors of bv, and Λ = Diag[λ 1,..., λ d ] contains eigenvalues. Create Λ + = Diag[λ 1 +,..., λ+ d ], with λ+ j = max 0, λ j, and use bv + [ b β] = UΛ + U 0. Fixed eects in one or both dimensions We do not formally address this complication Intuitively if G! and H! then each xed eect is estimated using many observations. In practice the main consequence of including xed eects is a reduction in within-cluster correlation. February / 44

21 2.4 Multi-way clustering Extends to multi-way. e.g. 7 variance estimates for three-way clustering and bb may be written as B e (1,0,0) + eb (0,1,0) + eb (0,0,1) B e (1,1,0) + eb (1,0,1) + eb (0,1,1) + eb (1,1,1 Extends to m-estimators and GMM e.g. For probit three components can be separately computed by (1) probit regression with standard errors computed using clustering on g 2 f1, 2,..., G g; (2) probit regression with standard errors computed using clustering on h 2 f1, 2,..., Hg; and (3) probit regression with standard errors computed using clustering on (g, h) 2 f(1, 1),..., (G, H)g Then add (1) and (2) and subtract (3). February / 44

22 2.5 GMM M-estimator solves N i=1 h i (bθ) = 0. Under standard assumptions, bθ is asymptotically normal with bv[bθ] = ba 1 bbba 1, h where ba = i i, and bb is an estimate of b V[ i h i ]. θ θ For independence over i For one-way clustering bb = bb = N N i=1 j=1 N bh i bh i. 0 i=1 bh i bh 0 ji (i, j) where I (i, j) = 1 if i and j are in the same cluster. February / 44

23 GMM Two-way clustering For two-way clustering bb = N N i=1 j=1 bh i bh 0 ji 1 (i, j) + N N i=1 j=1 bh i bh 0 ji 2 (i, j) N N i=1 j=1 bh i bh 0 ji 12 (i, j) where I 1 (i, j) = 1 if i and j are in the same cluster in dimension 1 and I 2 (i, j) = 1 if i and j are in the same cluster in dimension 2 and I 12 (i, j) = 1 if i and j are in the same cluster in dimensions 1 and 2. February / 44

24 3.1.1 Dgp with no clustering Simulations are for one observation per (h, g) combination. So HG observations and can drop the subscript i. Balanced and equal size groups H = G = 10, 20,..., 100 plus some cases H 6= G. Dgp is y gh = β 0 + β 1 x 1gh + β 2 x 2gh + α g + δ h + ε gh where β 0 = β 1 = β 2 = 1 and x ig, x ih, α g, δ h and ε igh are all iid N[0, 1]. February / 44

25 Dgp with no clustering Compute actual rejection rates for 5% two-sided Wald tests of β 1 = 1 and β 2 = 1. Dierent standard error estimates lead to dierent Wald tests. Methods used are 1. Assume iid errors 2. One-way cluster on g 3. Two-way random eects 4. Two-way cluster robust with T critical values. All methods should work with no clustering. February / 44

26 Dgp with no clustering All methods work ne with G = H = 100 Small sample over-rejection greatest for two-way cluster-robust, though diminished if use T (min(g, H) 1) degrees of freedom.. February / 44

27 Table 1 Rejection probabilities for a true null hypothesis True model: iid errors Number of Group Number of Group 1 Clusters 2 Clusters Assume iid errors Assumption about errors in construction of Variance One-way cluster robust (cluster on group1) Two-way random effects Two-way clusterrobust Two-way clusterrobust, T critical values % 6.7% 9.0% 9.7% 4.4% 5.8% 13.3% 14.9% 9.9% 11.5% % 4.8% 7.9% 5.7% 5.9% 4.9% 9.0% 7.6% 7.2% 5.9% % 4.8% 5.9% 5.9% 5.1% 5.0% 6.8% 7.1% 6.1% 6.1% % 4.8% 5.8% 5.5% 5.3% 4.7% 6.6% 6.1% 5.5% 5.4% % 5.3% 5.7% 6.1% 5.2% 5.8% 5.7% 6.5% 4.9% 5.6% % 5.6% 5.7% 5.8% 4.9% 5.4% 6.5% 6.0% 6.0% 5.2% % 5.3% 4.3% 5.6% 4.9% 5.5% 5.9% 6.0% 5.7% 5.7% % 5.3% 5.2% 6.0% 4.7% 5.2% 6.0% 6.4% 5.6% 5.9% % 4.8% 6.2% 5.0% 5.7% 4.8% 6.1% 4.6% 5.6% 4.3% % 5.1% 4.9% 5.3% 4.7% 5.2% 5.1% 6.1% 4.9% 5.7% % 4.0% 8.6% 7.3% 5.5% 4.1% 8.9% 9.7% 5.9% 6.8% % 5.1% 6.2% 7.3% 5.3% 5.1% 6.7% 7.4% 5.2% 5.8% % 5.1% 7.0% 7.7% 4.7% 5.0% 8.9% 8.9% 5.9% 5.8% % 3.8% 5.8% 5.0% 3.9% 3.7% 6.7% 7.7% 5.4% 5.9% % 5.0% 5.5% 5.4% 5.0% 4.9% 5.9% 6.5% 5.2% 5.7% Note: The null hypothesis should be rejected 5% of the time. Number of monte carlo simulations is 2000, except that for methods 1-3 it is 1000 when G*H > February / 44

28 3.1.2 Dgp with two-way clustered homoskedastic errors Dgp is y gh = β 0 + β 1 x 1gh + β 2 x 2gh + α g + δ h + u gh where now u gh = ε g + ε h + ε gh and each error is iid N [0, 1] and x 1gh = z g + z gh where each is iid N [0, 1] and x 2gh is similar. Then only the two-way methods work. Again two-way random eects does best as this is situation it is intended for. February / 44

29 Table 2 Rejection probabilities for a true null hypothesis True model: random effects on both Group1 and Group 2 Assumption about errors in construction of Variance Number of Group Number of Group 1 Clusters 2 Clusters Assume iid errors One-way cluster robust (cluster on group1) Two-way random effects Two-way clusterrobust Two-way clusterrobust, T critical values % 23.7% 13.7% 33.2% 6.2% 6.6% 17.4% 17.4% 12.6% 13.4% % 32.3% 8.6% 42.7% 5.7% 5.3% 10.3% 9.6% 8.7% 7.6% % 41.7% 7.4% 50.6% 5.2% 5.6% 8.6% 9.1% 7.8% 7.7% % 47.6% 8.7% 55.2% 6.5% 5.4% 9.0% 9.3% 7.9% 8.1% % 50.4% 6.0% 58.8% 4.9% 4.8% 7.0% 6.7% 6.3% 6.2% % 56.3% 6.4% 64.1% 5.6% 6.5% 6.7% 6.1% 5.9% 5.6% % 57.6% 5.5% 64.8% 4.8% 6.0% 6.4% 6.5% 5.9% 6.0% % 60.9% 6.5% 67.0% 4.9% 4.7% 6.3% 7.0% 5.7% 6.5% % 60.7% 5.6% 67.9% 5.0% 5.4% 6.0% 6.7% 5.8% 6.3% % 60.4% 5.8% 67.9% 5.3% 3.6% 6.4% 5.3% 6.1% 5.1% % 21.3% 13.3% 33.4% 8.9% 4.0% 15.0% 9.3% 10.2% 5.8% % 33.1% 9.8% 44.5% 6.7% 4.5% 9.3% 8.1% 8.2% 6.2% % 21.0% 14.1% 31.7% 10.4% 3.3% 14.2% 8.1% 9.2% 4.6% % 33.9% 10.0% 43.7% 6.2% 3.7% 9.2% 6.3% 7.7% 4.6% % 54.0% 6.6% 60.5% 5.3% 3.6% 6.9% 6.9% 6.4% 6.1% Note: See Table 1. February / 44

30 3.1.3 Dgp with two-way clustered heteroskedastic errors Dgp is y gh = β 0 + β 1 x 1gh + β 2 x 2gh + α g + δ h + u gh where u gh = ε g + ε h + ε gh and each error is iid N [0, 1] EXCEPT now ε gh is N [0, jx 1gh x 2gh j] and x 1gh = z g + z gh where each is iid N [0, 1] and x 2gh is similar. Then only the two-way cluster-robust method works. February / 44

31 Table 1 Rejection probabilities for a true null hypothesis True model: a random effect common to each group, and a heteroscedastic component. Assumption about errors in construction of Variance Number of Group 1 Clusters Number of Group 2 Clusters Assume independent errors One-way cluster robust (cluster on group1) Two-way random effects Two-way clusterrobust % 7.9% 15.7% 9.8% 15.9% 14.3% 18.4% 16.5% 14.5% 12.9% 8.6% 8.9% % 5.4% 9.5% 7.1% 13.0% 10.8% 11.9% 10.9% 10.3% 8.8% 7.1% 5.9% % 6.9% 7.0% 8.1% 9.7% 10.8% 8.2% 9.2% 7.1% 8.0% 6.2% 6.0% % 8.2% 6.0% 8.5% 8.8% 9.4% 7.1% 7.0% 6.3% 6.5% 5.9% 5.7% % 10.5% 6.1% 11.2% 9.6% 9.0% 6.4% 6.9% 6.0% 6.4% 4.4% 5.4% % 5.6% 12.9% 8.8% 12.9% 12.3% 13.7% 9.8% 9.6% 5.9% 6.1% 6.0% % 7.5% 7.9% 8.1% 10.5% 11.5% 9.2% 8.6% 7.6% 6.6% 5.2% 6.7% % 6.4% 10.4% 9.4% 10.1% 13.0% 11.3% 10.0% 7.3% 6.8% 7.5% 6.2% % 5.3% 9.2% 6.7% 10.8% 10.1% 9.4% 6.4% 7.7% 4.5% 5.1% 6.2% % 8.1% 6.7% 8.7% 9.9% 10.0% 6.9% 6.8% 6.1% 6.2% 6.1% 5.2% Note: The null hypothesis should be rejected 5% of the time. Number of monte carlo simulations is Two-way clusterrobust, T critical values Group fixed effects, Two-way cluster-robust February / 44

32 3.2 Dgp with Time-State Correlation Extension of BDM (2004). Data on 1,358,623 women from CPS. Model is y ist = αd st + xist 0 β + δ s + γ t + u ist, where d st is placebo policy created to be correlated over s and t : Specically d st = dst s + 2dst t ; s = 1,..., 50 ordered states where dst s = 0.6dst s 1 + v st s, with v st s iid N [0, 1] and dst t = 0.6ds t 1,t + v st t with v st t iid N [0, 1], and also independent from other variables. Here the index s ranges Two-way cluster-robust is best when no xed eects included. No gain when xed eects included. So here xed eects are enough. February / 44

33 Table 2 Rejection probabilities for a true null hypothesis Monte Carlos with micro (CPS) data quartic in age, 4 education dummies quartic in age, 4 education dummies, state fixed effects RHS control variables quartic in age, 4 education dummies, year fixed effects quartic in age, 4 education dummies, state and year fixed effects Standard error estimator: Heteroscedasticity robust 91.6% 92.1% 82.2% 79.0% One-way cluster robust (cluster on state-by-year cell) 19.8% 22.4% 13.1% 13.9% One-way cluster robust (cluster on state) 16.2% 17.0% 12.0% 15.0% One-way cluster robust (cluster on year) 10.2% 8.9% 8.7% 7.6% Two-way cluster-robust (cluster on state and year) 7.2% 6.9% 7.6% 7.1% Note: Data come from 1.3 million employed women from the March CPS. Table reports rejection rates for testing a (true) null hypothesis of zero on the coefficient of fake treatments. The "treatments" are generated as (t = e_s + 2 e_t), with e_s a state-specific autoregressive component and e_t a year-specific "spatial" autoregressive component. The outcome is also modified by an independent yearspecific autoregressive component. See text for details Monte Carlo replications February / 44

34 4.1 Hersch - Cross-Section with Two-Way Clustering CPS data on male wages N = Separate data on industry and occupation injury rates. 211 injuries and 387 occupations. Model estimates is y igh = α + x 0 igh β + γ rind ig + δ rocc ih + u igh. Two-way clustering is best. C February / 44

35 Table 3 Replication of Hersch (1998) Industry Injury Rate Variable Occupation Injury Rate Estimated slope coefficient: Estimated standard errors Default (iid) (0.415) {0.0000} (0.235) {0.0478} and p-values: Heteroscedastic robust (0.397) {0.0000} (0.260) {0.0737} One-way cluster on Industry (0.643) {0.0032} (0.251) {0.0639} One-way cluster on Occupation (0.486) {0.0001} (0.363) {0.2002} Two-way clustering (0.702) {0.0070} (0.357) {0.1927} Note: Replication of Hersch (1998), pg 604, Table 3, Panel B, Column 4. Standard errors in parentheses. P- values from a test of each coefficient equal to zero in brackets. Data are 5960 observations on working men from the Current Population Survey. Both columns come from the same regression. There are 211 industries and 387 occupations in the data set. February / 44

36 4.3 Rose and Engel - Bilateral Trade Dyadic data: trade ows between pairs of countries. First column of Table 3 of Rose and Engel (2002). gravity model is tted for the natural logarithm of bilateral trade single cross-section on trade ows between 98 countries with 3262 unique country pairs. Cameron and Golotvina (2005) control for two-way clustering, using FGLS estimation based on iid country random eects. Here instead apply the more robust method to their paper. Focus on coecient of the log product of real GDP (= 0.867) heteroskedastic-robust standard error of average one-way clustered standard error of two-way robust standard error is if country specic eects are included as alternative way to control for the clustering, then the coecient of the log product of real GDP is no longer identied. February / 44

37 4.3 Gruber and Madrian - Rotating Panel March CPS data on 39,063 men. Model of retirement (y ist ) and health insurance coverage (d st ) Pr[y ist = 1] = Φ(αd st + xist 0 β + δ s + γ t ). Rotating design - many individuals appear in sample twice. February / 44

38 Table 6 Replication of Gruber and Madrian (1995) Model Probit OLS Estimated slope coefficient (* 1000): Estimated standard errors (* 1000): Default (iid) (5.709) (0.675) Heteroscedastic robust (5.732) (0.684) One-way cluster on state-year (6.265) (0.759) One-way cluster on household id (5.866) (0.702) One-way cluster on hhid-by-state-year (5.732) (0.685) Two-way clustering (6.389) (0.775) One-way cluster on State (6.030) (0.718) Note: Replication of Gruber and Madrian (1995), pg 943, Table 3, Model 1, Column 1. Standard errors in parentheses. Data are 39,063 observations on year-old men from the Current Population Surveys. February / 44

39 Conclusion Method is straightforward to implement. Works well except in smallest designs with G = H = 10. Many potential applications. Sometimes has big impact and sometimes modest. But researcher will not know this a priori. February / 44

40 Asymptotic theory Key is h B 0 = lim E G 1 H 2 N i=1 N j=1 z i (θ)z j (θ) 0i. where N i=1 z i (θ) = G g=1 H h=1 z gh (θ) and z gh (θ) = i2cgh z igh (θ) = i2cgh (y igh xigh 0 β)x igh. For two-way clustering since i z i (θ) = g h z gh (θ) we have h E G 1 H 2 N i=1 N j=1 z i (θ)z j (θ) 0i = E hg 1 H 2 Gg=1 Hh=1 Gg 0=1 Hh 0=1 z gh(θ)z g 0 h 0(θ)0i = G 1 H 2 g h h 0 E[z gh z 0 gh 0] g = g 0 +G 1 H 2 h g g 0 E[z gh z 0 g 0 h ] h = h0 G 1 H 2 g h E[z gh z 0 gh]. February / 44

41 Assume the triple sums are of order GH 2 = G. The triple sums are instead of order GH, rather than GH 2, if the dependence of observations in common cluster g goes to zero as clusters h and h 0 become further apart. case with declining time series dependence or spatial dependence. then in B 0 normalize by (GH) rate of convergence of the estimator becomes a faster p GH rather than p G. Still obtain the same asymptotic variance matrix for b β. February / 44

42 Some References Acemoglu, D., and J.-S. Pischke (2003), \Minimum Wages and On-the-job Training," Research in Labor Economics, 22, Arellano, M. (1987), \Computing Robust Standard Errors for Within-Group Estimators," Oxford Bulletin of Economics and Statistics, 49, Bertrand, M., E. Duo, and S. Mullainathan (2004), \How Much Should we Trust Dierences-in-Dierences Estimates," Quarterly Journal of Economics, Bhattacharya, D. (2005), \Asymptotic inference from multi-stage samples,"' Hournal of Econometrics, 126: Cameron, A.C., and N. Golotvina (2005), \Estimation of Country-Pair Data Models Controlling for Clustered Errors: with International Trade Applications," U.C.-Davis Economics Department Working Paper No Conley, T. G. (1999), \GMM with cross sectional dependence," Journal of Econometrics, 92: Donald, S. and K. Lang (2004), \Inferences with Dierences in Dierences and Other Panel Data," October 22, February / 44

43 Some References Fafchamps, M., and F. Gubert (2006), \The Formation of Risk Sharing Networks," mimeo, April Hansen, C. B. (2007a), \Generalized least squares inference in panel and multilevel models with serial correlation and xed eects," Journal of Econometrics, 140, Hansen, C. B. (2007b), \Asymptotic properties of a robust variance matrix estimator for panel data when T is large,"journal of Econometrics, 141: Ibragimov, R. and U.K. Muller (2007), \T-Statistic Based Correlation and Heterogeneity Robust Inference," Harvard Institute of Economic Research Discussion Paper No Kuersteiner, G. and J. Hausman (2007), \Dierence in Dierence meets Generalized Least Squares: Higher Order Properties of Hypotheses Tests". Liang, K.-Y., and S.L. Zeger (1986), \Longitudinal Data Analysis Using Generalized Linear Models," Biometrika, 73, Miglioretti, D.L., and P.J. Heagerty (2006), \Marginal Modeling of Nonnested Multilevel Data using Standard Software," American Journal of Epidemiology, 165(4), February / 44

44 Some References Moulton, B.R. (1986), \Random Group Eects and the Precision of Regression Estimates," Journal of Econometrics, 32, Moulton, B.R. (1990), \An Illustration of a Pitfall in Estimating the Eects of Aggregate Variables on Micro Units," Review of Economics and Statistics, 72, Petersen, M. (2006), \Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches," unpublished manuscript, Northwestern University. Rogers, W.H. (1993), \Regression Standard Errors in Clustered Samples," Stata Technical Bulletin, 13, Thompson, S. (2005), \A Simple Formula for Standard Errors that Cluster by Both Firm and Time," unpublished manuscript. White, H. (1984), Asymptotic Theory for Econometricians, San Diego, Academic Press. Wooldridge, J.M. (2003), \Cluster-Sample Methods in Applied Econometrics," American Economic Review, 93, February / 44

Bootstrap-Based Improvements for Inference with Clustered Errors

Bootstrap-Based Improvements for Inference with Clustered Errors Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Maryland, U.C. - Davis May, 2008 May, 2008 1 / 41 1. Introduction OLS regression