Day 2B Inference for Clustered Data: Part 2 With a Cross-section Data Example (Revised August 25 to add 14. Dyadic Clustering)

Size: px

Start display at page:

Download "Day 2B Inference for Clustered Data: Part 2 With a Cross-section Data Example (Revised August 25 to add 14. Dyadic Clustering)"

Byron Wilson
5 years ago
Views:

1 Day 2B nference for Clustered Data: Part 2 With a Cross-section Data Example (Revised August 25 to add 14. Dyadic Clustering) A. Colin Cameron Univ. of Calif. - Davis... for Center of Labor Economics Norwegian School of Economics Advanced Microeconometrics Aug 28 - Sep 1, Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 1 / 56

2 1. ntroduction 1. ntroduction Consider clustering in more detail. The example is a Moulton-type data example data on individuals in households in communes (villages) These slides are a summary of A. Colin Cameron and Douglas L. Miller (2015), A Practitioner s Guide to Robust nference with Clustered Data, Journal of Human Resources, Vol.50 (2, Spring), Preprint version available at F A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 2 / 56

3 1. ntroduction Outline 1 ntroduction 2 Moulton-Type Data 3 Analysis using Panel Commands 4 Analysis using Mixed Models 5 Cluster-Speci c Fixed E ects 6 What to Cluster Over? 7 Multi-way Clustering 8 Few Clusters: Overview 9 Few Clusters: Bias-Corrected Variance Estimate 10 Few Clusters: Bootstrap with Asymptotic Re nement 11 Few Clusters: mproved Critical t-values 12 Few Clusters: Special Cases 13 Extensions: To V, 2SLS. GMM 14 ADDENDUM: Dyadic Clustering 15 Conclusion A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 3 / 56

4 2. Moulton - Type Data 2. Moulton-Type Data Vietnam data on individuals in households in communes pharvis - number of direct pharmacy visits in past 12 months lnhhexp - log of household medical expenditures illness - number of illnesses denti ers commune - identi es the commune hh - created variable that identi es household person_in_hh - created variable that identi es person in household person - created variable that uniquely identi es household We analyze using (1) regress A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 4 / 56

5 2. Moulton - Type Data OLS without Using Panel Commands OLS without using Panel Commands Summary statistics. sum pharvis lnhhexp illness AGE hh person_in_hh person Variable Obs Mean Std. Dev. Min Max pharvis lnhhexp illness AGE hh person_in_hh person A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 5 / 56

6 2. Moulton - Type Data OLS without Using Panel Commands Cluster-Robust Variance for OLS * OLS estimation with cluster-robust standard errors * Cluster on household and then on commune quietly regress pharvis lnhhexp illness estimates store OLS_iid quietly regress pharvis lnhhexp illness, vce(robust) estimates store OLS_het quietly regress pharvis lnhhexp illness, vce(cluster hh) estimates store OLS_hh quietly regress pharvis lnhhexp illness, vce(cluster commune) estimates store OLS_comm estimates table OLS_iid OLS_het OLS_hh OLS_comm, b(%10.4f) se stats(r2 N) A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 6 / 56

7 2. Moulton - Type Data OLS without Using Panel Commands Standard errors increase with broader clustering level. estimates table OLS_iid OLS_het OLS_hh OLS_comm, b(%10.4f) se stats(r2 N) Variable OLS_iid OLS_het OLS_hh OLS_comm lnhhexp illness _cons r N legend: b/se A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 7 / 56

8 3. Analysis using Panel Commands 3. Analysis using Panel Commands Suppose we cluster on household and want to use xtdescribe. xtset hh panel variable: hh (unbalanced). xtdescribe must specify timevar; use xtset r(459);. xtset hh person_in_hh panel variable: hh (unbalanced) time variable: person_in_hh, 1 to 19 delta: 1 unit Also need to give a time identi er - here person_in_hh A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 8 / 56

9 3. Analysis using Panel Commands. xtdescribe hh: 1, 2,..., 5740 n = 5740 person_in_hh: 1, 2,..., 19 T = 19 Delta(person_in_hh) = 1 unit Span(person_in_hh) = 19 periods (hh*person_in_hh uniquely identifies each observation) Distribution of T_i: min 5% 25% 50% 75% 95% max Freq. Percent Cum. Pattern (other patterns) XXXXXXXXXXXXXXXXXXX. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 -Advanced Sep 1, 2017 Microeconometri 9 / 56

10 3. Analysis using Panel Commands ntracluster Correlation ntracluster Correlation Cluster here is household - here fairly high intracluster correlation. loneway pharvis hh One way Analysis of Variance for pharvis: Source SS df MS F Prob > F Between hh Within hh Total ntraclass Asy. correlation S.E. [95% Conf. nterval] Estimated SD of hh effect Estimated SD within hh Est. reliability of a hh mean (evaluated at n=4.84) Number of obs = R squared = A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 10 / 56

11 3. Analysis using Panel Commands ntracluster Correlation Cluster here is household other methods to estimate intraclass correlation. quietly xtreg pharvis, mle.. display "ntra class correlation for household: " e(rho) ntra class correlation for household: quietly correlate pharvis L1.pharvis.. display "Correlation for adjoining household: " r(rho) Correlation for adjoining household: A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 11 / 56

12 3. Analysis using Panel Commands OLS, RE and FE Estimates * OLS, RE and FE estimation with clustering on household and on village quietly regress pharvis lnhhexp illness, vce(cluster hh) estimates store OLS_hh quietly xtreg pharvis lnhhexp illness, re estimates store RE_hh quietly xtreg pharvis lnhhexp illness, fe estimates store FE_hh quietly xtset commune quietly regress pharvis lnhhexp illness, vce(cluster commune) estimates store OLS_vill quietly xtreg pharvis lnhhexp illness, re estimates store RE_vill quietly xtreg pharvis lnhhexp illness, fe estimates store FE_vill estimates table OLS_hh RE_hh FE_hh OLS_vill RE_vill FE_vill, b(%7.5f) se(%7.4f) A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 12 / 56

13 3. Analysis using Panel Commands OLS, RE and FE Estimates Here xt commands cluster on household but second lot of se s cluster on village (commune) RE and FE more e cient than OLS FE similar e ciency to RE as much within variation F to see this xtsum pharvis. estimates table OLS_hh RE_hh FE_hh OLS_vill RE_vill FE_vill, b(%7.5f) se(%7.4f) Variable OLS_hh RE_hh FE_hh OLS_vill RE_vill FE_vill lnhhexp (omitted) illness _cons legend: b/se A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 13 / 56

14 4. Analysis using Mixed Models 4. Mixed or Multi-level or Hierarchical Model Not used in microeconometrics but used in many other disciplines. Stack all observations for cluster g and specify y g = X g β + Z g u g + ε g where u g is iid (0, G) and Z g is called a design matrix and ε g (0, σ 2 ɛ). Random e ects: Z g = e (a vector of ones) and u g = α g Random coe cients: Z g = X g Reason: β g (β, Σ) so β g β + u g where u g (0, Σ) So y g = X g β g + ε g = X g (β + u g ) + ε g = X g β + X g u i + ε g. Note: y g jx g has mean X g β and variance X 0 g ΣX g + σ 2 ɛ. Simplest case of random intercept is same as xtreg, mle. mixed lwage exp exp2 wks ed jj hh: A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 14 / 56

15 4. Analysis using Mixed Models Example where illness has random slope. * Mixed model with random interecept and a random slope. mixed pharvis lnhhexp illness hh: illness, nolog covariance(unstructured) Mixed effects ML regression Number of obs = 27,765 Group variable: hh Number of groups = 5,740 Obs per group: min = 1 avg = 4.8 max = 19 Wald chi2(2) = Log likelihood = Prob > chi2 = pharvis Coef. Std. Err. z P> z [95% Conf. nterval] lnhhexp illness _cons Random effects Parameters Estimate Std. Err. [95% Conf. nterval] hh: Unstructured var(illness) var(_cons) cov(illness,_cons) var(residual) LR test vs. linear model: chi2(3) = Prob > chi2 = Note: LR test is conservative and provided only for reference. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 15 / 56

16 4. Analysis using Mixed Models Extensions Extensions Proceeding example was a two-level model. Can add additional levels e.g. people in households in communes And can have two-way random e ects y igh = α g + δ g + xigh 0 β + ε igh α g iid (α, σ 2 α) and δ h iid (0, σ 2 δ ). Code as * Twoway random effects with error * e_g + e_h + e_igh, g=lldays h=hh mixed pharvis lnhhexp illness jj _all: R.LLDAYS jj hh:, mle A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 16 / 56

17 5. Cluster-speci c e ects models 5. Cluster-Speci c Fixed E ects Model The model is y ig = x 0 ig β + α g + u ig = x 0 ig β + G h=1 α g dh ig + u ig there are G dummy variables d1 ig,..., dg ig dh ig = 1 if ig th observation is in cluster h and = 0 otherwise. Do FE s Eliminate Within-Cluster Error Correlation? No. Does CRVE still work? Yes if G! and either N g xed or N g!. What is rank of CRVE with K regressors and G clusters For OLS it is minimum of K and G 1 F Reason: bv CR [ β] b = (X 0 X) 1 bb (X 0 X) 1 F bb = C 0 C, where C 0 = [X1 0 bu 1 XG 0 bu G ] is K G F and X1 0 bu XG 0 bu G = 0 For FE it is also minimum of K and G 1 A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 17 / 56

18 5. Cluster-speci c e ects models Feasible GLS Feasible GLS (with FE s) More di cult with xed e ects if N g is small. f N g is nite then bα g is inconsistent for a g Does not lead to inconsistent β b But does mean residuals bu ig inconsistently estimated This contaminates bω used in FGLS estimation. Hansen (2007b) provides bias-corrected FGLS for AR(p) errors Brewer, Crossley and Joyce (2013) implement in DiD setting and show power gains Hausman and Kuersteiner (2008) provide bias-corrected FGLS for Kiefer (1980) error model Ω g = Ω and bω ij = G 1 G g =1 bu ig bu jg, where bu ig are OLS residuals A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 18 / 56

19 5. Cluster-speci c e ects models Testing the Need for Fixed E ects Testing the Need for Fixed E ects Hausman test: T H = ( b β 1;FE b β1;re ) 0 bv 1 ( b β 1;FE b β;re ), Must use a modi ed Hausman test Reason: Default Hausman uses bv = bv 1;FE bv 1;RE But this requires that RE model is fully e cient under H 0 Wooldridge (2012, p.332): OLS regression y ig = x 0 ig β + w 0 g γ + u ig, where w g denotes the subcomponent of x ig that varies within cluster and w g = Ng 1 G i =1 w ig. test H 0 : γ = 0 using a Wald test based on a CRVE FE model is necessary if we reject H 0 Stata user-written command xtoverid, due to Hoechle (2007). Or do pairs cluster bootstrap to estimate bv. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 19 / 56

20 6. What to Cluster Over? Factors Determining What to Cluster Over 6. Factors Determining What to Cluster Over t is not always obvious how to specify the clusters. Moulton (1986, 1990) cluster at the level of an aggregated regressor. Bertrand, Du o and Mullainathan (2004) with state-year data cluster on states (assumed to be independent) rather than state-year pairs. Pepper (2002) cluster at the highest level where there may be correlation e.g. for individual in household in state may want to cluster at level of the state if state policy variable is a regressor. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 20 / 56

21 6. What to Cluster Over? Clustering Due to Survey Design Clustering Due to Survey Design Clustering routinely arises with complex survey data. Then the loss of e ciency due to clustering is called the design e ect This is the inverse of the variance in ation factor given earlier Long literature going back to 1960 s CRVE is called the linearization formula Shah, Holt and Folsom (1977) is early reference. Complex survey data are weighted often ignore assuming conditioning on x handles weighting And strati ed this improves estimator e ciency somewhat Bhattacharya (2005) gives a general GMM treatment. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 21 / 56

22 6. What to Cluster Over? Clustering Due to Survey Design Econometricians reasonably Cluster on PSU or higher Sometimes weight and sometimes not gnore strati cation (with slight loss in e ciency) Survey software controls for all three. Stata svy commands Econometricians use regular commands with vce(cluster) and possibly [pweight=1/prob] A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 22 / 56

23 7. Multi-Way Clustering 7. Multi-way Clustering Example: How do job injury rates e ect wages? Hersch (1998). CPS individual data on male wages N = But there is no individual data on job injury rate. nstead aggregated data: F F data on industry injury rates for 211 industries data on occupation injury rates for 387 occupations. Model estimated is What should we do? y igh = α + x 0 igh β + γ rind ig + δ rocc ih + u igh. Ad hoc robust: OLS and robust cluster on industry for bγ and robust cluster on occupation for bδ. Non-robust: FGLS two-way random e ects: u igh = ε g + ε h + ε igh ; ε g, ε h, ε igh i.i.d. Two-way robust: next A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 23 / 56

24 7. Multi-Way Clustering Two-way Cluster-Robust VE Two-way clustering Robust variance matrix estimates are of the form bavar[ β] b = (X 0 X) 1 bb(x 0 X) 1 For one-way clustering with clusters g = 1,..., G we can write bb = N i=1 N j=1 x i xj 0 bu i bu j 1[i, j in same cluster g] where bu i = y i xi 0 β b and the indicator function 1[A] equals 1 if event A occurs and 0 otherwise. For two-way clustering with clusters g = 1,..., G and h = 1,..., H bb = N i=1 N j=1 x i xj 0 bu i bu j 1[i, j share any of the two clusters] = N i=1 N j=1 x i xj 0 bu i bu j 1[i, j in same cluster g] + N i=1 N j=1 x i xj 0 bu i bu j 1[i, j in same cluster h] N i=1 N j=1 x i xj 0 bu i bu j 1[i, j in both cluster g and h]. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 24 / 56

25 7. Multi-Way Clustering Two-way Cluster-Robust VE Obtain three di erent cluster-robust variance matrices for the estimator by one-way clustering in, respectively, the rst dimension, the second dimension, and by the intersection of the rst and second dimensions add the rst two variance matrices and, to account for double-counting, subtract the third. Thus bv two-way [ β] b = bv G [ β] b + bv H [ β] b bv G \H [ β], b Theory presented in Cameron, Gelbach, and Miller (2006, 2011), Miglioretti and Heagerty (2006), and Thompson (2006, 2011) Extends to multi-way clustering. Early empirical applications that independently proposed this method include Acemoglu and Pischke (2003). A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 25 / 56

26 7. Multi-Way Clustering mplementation mplementation f bv[ b β] is not positive-de nite (small G, H) then Decompose bv[ b β] = UΛU 0 ; U contains eigenvectors of bv, and Λ = Diag[λ 1,..., λ d ] contains eigenvalues. Create Λ + = Diag[λ + 1,..., λ+ d ], with λ+ j = max 0, λ j, and use bv + [ β] b = UΛ + U 0 Stata add-on cgmreg.ado implements this. Also Stata add-on ivreg2.ado has two-way clustering for a variety of linear model estimators. Fixed e ects in one or both dimensions Theory has not formally addressed this complication ntuitively if G! and H! then each xed e ect is estimated using many observations. n practice the main consequence of including xed e ects is a reduction in within-cluster correlation of errors. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 26 / 56

27 7. Multi-Way Clustering Cluster on Household and Age Cluster on Household and Age Use cgmreg.ado at cameron.econ.ucdavis.edu/research/papers.html. cgmreg pharvis lnhhexp illness, cluster(hh AGE) Note: +/ means the corresponding matrix is added/subtracted Calculating cov part for variables: hh (+) Calculating cov part for variables: hh AGE ( ) Calculating cov part for variables: AGE (+) Number of obs = Num clusvars = 2 Num combinations = 3 G(hh) = 5740 G(AGE) = 98 pharvis Coef. Std. Err. z P> z [95% Conf. nterval] lnhhexp illness _cons A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 27 / 56

28 7. Multi-Way Clustering Cluster on Household and Age User-written ivreg2 command gives similar (di erent dof). ivreg2 pharvis lnhhexp illness, cluster(hh AGE) OLS estimation Estimates efficient for homoskedasticity only Statistics robust to heteroskedasticity and clustering on hh and AGE Number of clusters (hh) = 5740 Number of obs = Number of clusters (AGE) = 98 F( 2, 97) = Prob > F = Total (centered) SS = Centered R2 = Total (uncentered) SS = Uncentered R2 = Residual SS = Root MSE = Robust pharvis Coef. Std. Err. z P> z [95% Conf. nterval] lnhhexp illness _cons ncluded instruments: lnhhexp illness A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 28 / 56

29 7. Multi-Way Clustering Cluster on Household and Age Application Example 1: Hersch data Relatively small di erence versus one-way But can simultaneously handle both ways rather than one-way cluster on industry for bγ and one-way cluster on occupation for bδ. Example 2: DiD We have found little di erence if cluster two-way on state and time versus just one-way on state. Studies in nance view this as important. Example 3: Country-pair international trade volume Two-way cluster on country 1 and country 2 leads to much bigger standard errors (Cameron et al. 2011) Cameron and Miller (2012) nd that two-way still doesn t pick up all correlations. nstead other methods including Fafchamps and Gubert (2007). A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 29 / 56

30 7. Multi-Way Clustering Feasible GLS Feasible GLS Two-way random e ects y igh = xigh 0 β + α g + δ h + ε ig with i.i.d. errors xtmixed y x jj _all: R.id1 jj id2:, mle. but cannot then get cluster-robust variance matrix Hierarchical linear models or mixed models richer FGLS y ig = xig 0 β g + u ig β g = W g γ + v j where u ig and v g are errors. see Rabe-Hesketh and Skrondal (2012) A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 30 / 56

31 7. Multi-Way Clustering Spatial Correlation Spatial Correlation Two-way cluster robust related to time-series and spatial HAC. n general bb in preceding has the form i j w (i, j) x i x 0 j bu i bu j. Two-way clustering: w (i, j) = 1 for observations that share a cluster. White and Domowitz (1984) time series: w (i, j) = 1 for observations close in time to one another. Conley (1999) spatial: w (i, j) decays to 0 as the distance between observations grows. The di erence: White & Domowitz and Conley use mixing conditions to ensure decay of dependence in time or distance. Mixing conditions do not apply to clustering due to common shocks. nstead two-way robust requires independence across clusters. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 31 / 56

32 7. Multi-Way Clustering Spatial Correlation Spatial Correlation Consistent VE Driscoll and Kraay (1998) panel data when T! generalizes HAC to spatial correlation errors potentially correlated across individuals correlation across individuals disappears for obs > m time periods apart then w (it, js) = 1 d (it, js) /(m + 1) with sum over i, j, s and t and d (it, js) = jt sj if jt sj m and d (it, js) = 0 otherwise. Stata add-on command xtscc, due to Hoechle (2007). Foote (2007) contrasts various variance matrix estimators in a macroeconomics example. Petersen (2009) contrasts methods for panel data on nancial rms. Barrios, Diamond, mbens, and Kolesár (2012) state-year panel on individuals with spatial correlation across states. And use randomization inference. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 32 / 56

33 8. Few Clusters: Overview 8. Few Clusters: nference with few clusters One-way clustering, and focus on the Wald t-statistic w = b β β 0. s b β CRVE assumes G!. What if G is small? At a minimum use CRVE with rescaled error eu g = p cbu g where c = G G 1 or c = G G 1 N 1 N k ' G G 1 And use T (G 1) critical values Stata does this for regress but not other commands.. But tests still over-reject with small G. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 33 / 56

34 8. Few Clusters: Overview The Basic Problem with Few Clusters The Basic Problem with Few Clusters OLS over ts with bu systematically biased to zero compared to u. e.g. OLS with iid normal errors E[bu 0 bu] = (N K )σ 2, not Nσ 2. Problem is greatest as G gets small - few clusters. How few is few? balanced data; G < 20 to G < 50 depending on data unbalanced data: G less than this. Unusual situation for applied econometrics since have many observations estimation is reasonably precise so it is worthwhile doing statistical inference but because G is small the usual asymptotic theory leads to invalid inference. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 34 / 56

35 8. Few Clusters: Overview Solutions Solutions 1. Bias-corrected CRVE 2. Cluster-Bootstrap with Asymptotic Re nement 3. mproved Critical Values This is an active area of research the following discussion needs references updated and newer ones added. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 35 / 56

36 9. Few Clusters: Bias-Corrected CRVE 9. Bias-Corrected CRVE Simplest is eu g = p cbu g, already mentioned. CR2VE generalizes HC2 for heteroskedasticity eu g = [ Ng H gg ] 1/2 bu g where H gg = X g (X 0 X) 1 Xg 0 gives unbiased CRVE if errors iid normal CR3VE generalizes HC3 for heteroskedasticity eu g + = p G /(G 1)[ Ng H gg ] 1 bu g where H gg = X g (X 0 X) 1 Xg 0 same as jackknife Problems: Not clear which is better CR2VE or CR3VE More importantly, still over-rejects when few clusters. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 36 / 56

37 10. Few Clusters: Bootstrap with asymptotic re nement 10. Few Clusters: Cluster bootstrap with asymptotic re nement Cameron, Gelbach and Miller (2008) Test H 0 : β 1 = β 0 1 against H a : β 1 6= β 0 1 using w = ( bβ 1 β 0 1 )/s bβ 1 perform a cluster bootstrap with asymptotic re nement then true test size is α + O(G 3/2 ) rather than usual α + O(G 1 ) hopefully improvement when G is small wild cluster percentile-t bootstrap is best better than pairs cluster percentile-t bootstrap. For preparatroy material see separate handout on the bootstrap. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 37 / 56

38 10. Few Clusters: Bootstrap with asymptotic re nement Stata Pairs Cluster Bootstrap BC and BCa Con dence ntervals Stata Pairs Cluster Bootstrap BC and BCa Con dence ntervals preserve Keep only 20 communes (Cluster on commune) Stata gives bias-corrected and accelerated BC intervals keep if commune <= 20 (25122 observations deleted) xtset commune // for cluster bootstrap can only xtset the cluster variable regress pharvis lnhhexp illness, vce(boot, /// cluster(commune) seed(10101) reps(999) bca) (running regress on estimation sample) Jackknife replications (20) Bootstrap replications (999) A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 38 / 56

39 10. Few Clusters: Bootstrap with asymptotic re nement Stata Pairs Cluster Bootstrap BC and BCa Con dence ntervals. estat bootstrap, all Linear regression Number of obs = 2,643 Replications = 999 (Replications based on 20 clusters in commune) Observed Bootstrap pharvis Coef. Bias Std. Err. [95% Conf. nterval] lnhhexp (N) (P) (BC) (BCa) illness (N) (P) (BC) (BCa) _cons (N) (P) (BC) (BCa) (N) normal confidence interval (P) percentile confidence interval (BC) bias corrected confidence interval (BCa) bias corrected and accelerated confidence interval. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 39 / 56

40 10. Few Clusters: Bootstrap with asymptotic re nement Stata Pairs Cluster Bootstrap Percentile-t Con dence ntervals Stata Pairs Cluster Bootstrap Percentile-t Con dence ntervals n theory the BC and BCa con dence intervals provide asymptotic re nement. but we nd they di er little from the percentile bootstrap. another possibility is to Resample cluster pairs as for cluster robust se s bootstrap But at each bootstrap calculate wb = ( bβ b bβ)/s b β b Note: we subtract bβ because the bootstrap views the population as the sample so the dgp value of β is bβ. * Percentile-t for a single coe cient: Bootstrap the t statistic. quietly regress pharvis lnhhexp illness, vce(cluster commune). local theta = _b[lnhhexp]. local setheta = _se[lnhhexp]. bootstrap tstar=((_b[lnhhexp]- theta )/_se[lnhhexp]), seed(10101) /// > reps(999) saving(percentilet, replace): /// > regress pharvis lnhhexp illness, vce(cluster commune) A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 40 / 56

41 10. Few Clusters: Bootstrap with asymptotic re nement Stata Pairs Cluster Bootstrap Percentile-t Con dence ntervals We have 999 values of wb, called tstar below.. * Percentile t p value for symmetric two sided Wald test of H0: theta = 0. use percentilet, clear (bootstrap: regress). quietly count if abs(`theta'/`setheta') < abs(tstar). display "p value = " r(n)/_n p value = * Percentile t critical values and confidence interval. _pctile tstar, p(2.5,97.5). scalar lb = `theta' + r(r1)*`setheta'. scalar ub = `theta' + r(r2)*`setheta'. display "2.5 and 97.5 percentiles of t* distn: " r(r1) ", " r(r2) _n /// > "95 percent percentile t confidence interval is: (" lb "," ub ")" 2.5 and 97.5 percentiles of t* distn: , percent percentile t confidence interval is: ( , ) A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 41 / 56

42 10. Few Clusters: Bootstrap with asymptotic re nement Wild Cluster Bootstrap Wild Cluster Bootstrap 1 Obtain the OLS estimator b β and OLS residuals bu g, g = 1,..., G. Best to use residuals that impose H 0. 2 Do B iterations of this step. On the b th iteration: 1 For each cluster g = 1,..., G, form bu g = bu g or bu g = bu g each with probability 0.5 and hence form by g = X 0 g b β + bu g. This yields wild cluster bootstrap resample f(by 1, X 1),..., (by G, X G )g. 2 Calculate the OLS estimate bβ 1,b and its standard error s bβ and given 1,b these form the Wald test statistic wb = ( bβ 1,b bβ 1 )/s b β. 1,b 3 Reject H 0 at level α if and only if w < w [α/2] or w > w [1 α/2], where w [q] denotes the qth quantile of w 1,..., w B. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 42 / 56

43 10. Few Clusters: Bootstrap with asymptotic re nement Wild Cluster Bootstrap Current Research Webb (2013) proposes using a six-point distribution for the weights d g in bu g = d g bu g. The weights d g have a 1/6 chance of each value in f p p p p p p 1.5, 1,.5,.5, 1, 1.5g. Works better with few clusters than two-point F Two-point cluster gives only 2 G 1 di erent bootstrap resamples. Also with very few clusters need to enumerate rather than bootstrap. f have less than ten clusters use Webb s method. MacKinnon and Webb (2013) nd that unbalanced cluster sizes worsens few clusters problem. Wild cluster bootstrap does well. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 43 / 56

44 10. Few Clusters: Bootstrap with asymptotic re nement Wild Cluster Bootstrap Use the Bootstrap with Caution We assume clustering does not lead to estimator inconsistency focus is just on the standard errors. We assume that the bootstrap is valid this is usually the case for smooth problems with asymptotically normal estimators and usual rates of convergence. but there are cases where the bootstrap is invalid. When bootstrapping always set the seed (for replicability) use more bootstraps than the Stata default of 50 F for bootstraps without asymptotic re nement 400 should be plenty. When bootstrapping a xed e ects panel data model the additional option idcluster() must be used F for explanation see Stata manual [R] bootstrap: Bootstrapping statistics from data with a complex structure. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 44 / 56

45 11. Few Clusters: mproved T Critical Values 11. Few Clusters: mproved T Critical Values Suppose all regressors are invariant within clusters, clusters are balanced and errors are i.i.d. normal then y ig = xg 0 β + ε ig =) ȳ g = xg 0 β + ε g with ε g i.i.d. normal so Wald test based on OLS is exactly T (G L), where L is the number of group invariant regressors. Extend to nonnormal errors and group varying regressors asymptotic theory when G is small and N g!. Donald and Lang (2007) propose a two-step FGLS RE estimator yields t-test that is T (G L) under some assumptions Wooldridge (2006) proposes an alternative minimum distance method. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 45 / 56

46 11. Few Clusters: mproved T Critical Values mbens and Kolesar (2012) Data-determined number of degrees of freedom for t and F tests Builds on Satterthwaite (1946) and Bell and McCa rey (2002). Assumes normal errors and particular model for Ω. Match rst two moments of test statistic with rst two moments of χ 2. v = ( G j=1 λ j ) 2 /( G j=1 λ2 j ) and λ j are the eigenvalues of the G G matrix G 0 bωg. Find works better than 2-point Wild cluster bootstrap but they did not impose H 0. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 46 / 56

47 11. Few Clusters: mproved T Critical Values Carter, Schnepel and Steigerwald (2013) provide asymptotic theory when clusters are unbalanced propose a measure of the e ective number of clusters G = G /(1 + δ) F where δ = 1 G G g =1 f(γ g γ) 2 / γ 2 g F γ g = ek 0 (X0 X) 1 Xg 0 Ω g X g (X 0 X) 1 e k F e k is a K 1 vector of zeroes aside from 1 in the k th position if bβ = bβ k F γ = G 1 G g =1 γ g. Cluster heterogeneity (δ 6= 0) can arise for many reasons variation in N g, variation in X g and variation in g across clusters. Brewer, Crossley and Joyce (2013) Do FGLS as gives both e ciency gains and works well even with few clusters. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 47 / 56

48 12. Few Clusters: Special Cases 12. Few Clusters: Special Cases Bester, Conley and Hansen (2009) obtain T (G 1) in settings such as panel where mixing conditions apply. bragimov and Muller (2010) take an alternative approach suppose only within-group variation is relevant then separately estimate β b g s and average asymptotic theory when G is small and N g! A big limitation is assumption of only within variation for example in state-year panel application with clustering on state it rules out z t in y st = x 0 st β + z0 t γ + ε ig where z t are for example time dummies. This limitation is relevant in DiD models with few treated groups Conley and Taber (2010) present a novel method for that case. for synthetic control methods (one treated group) see Abadie, Diamond and Hainmueller (2010) F inference not established. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 48 / 56

49 13. Extensions 13. Extensions The results for OLS and FGLS and t-tests extend to multiple hypothesis tests and V, 2SLS. GMM and nonlinear estimators. These extensions are incorporated in Stata but Stata generally does not use nite-cluster degrees-of-freedom adjustments in computing test p-values and con dence intervals F exception is command regress. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 49 / 56

50 13. Extensions Extensions (continued) See Cameron and Miller, JHR (2015) paper. 7.1 Cluster-Robust F-tests 7.2 nstrumental Variables Estimators V, 2SLS, linear GMM Need modi ed Hausman test for endogeneity : estat endogenous Weak instruments: F F F First-stage F-test should be cluster-robust use add-on xtivreg2 Finlay and Magnusson (2009) have Stata add-on rivtest.ado. 7.3 Nonlinear Estimators Population-averaged (xtreg, pa) and random e ects (e.g. xtlogit, re) give quite di erent βs Rarely can eliminate xed e ects if N g is small. 7.4 Cluster-randomized Experiments A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 50 / 56

51 14. ADDENDUM: Dyadic Clustering 14 ADDENDUM: Dyadic Clustering A dyad is a pair. An example is country pairs. Gravity model for volume of trade between countries Does WTO/GATT membership increase trade? (Rose AER 2004) Data on 187 countries and 52 years ( ) N = 234,597 max observations = /2 = 819,156. Model estimated by OLS is y ght = x 0 ght β + u ght where y ght is the natural logarithm of real bilateral trade trade between countries g and h in time period t. How do we compute standard errors controlling for correlation in u ght? Complication is that US-UK error likely correlated with any pair involving the US or involving UK. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 51 / 56

52 14. ADDENDUM: Dyadic Clustering n general robust variance estimates are of form bavar[ b β] = (X 0 X) 1 bb(x 0 X) 1 / For two-way clustering with y igh = x 0 igh β + u ighg use E[u igh u jg 0 h 0jx igh, x jg 0 h 0] = 0, unless g = g 0 or h = h 0. N N bb = 1[g = g 0 and/or h = h 0 ] bu igh bu jg 0 h 0x ighxjg 0 0 h 0. i =1 j=1 For dyadic we have E[u igh u jg 0 h 0jx igh, x jg 0 h 0] = 0, unless g = g 0 or h = h 0 or h = g 0 or g = h 0. use bb = N N 1[g = g 0 and/or h = h 0 and/or h = g 0 and/or g = h 0 ] i =1 j=1 bu igh bu jg 0 h 0x ighx 0 jg 0 h 0. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 52 / 56

53 14. ADDENDUM: Dyadic Clustering Example: G=4 countries Six Pairs (1, 2), (1, 3), (1, 4), (2, 3), (2, 4) and (3, 4) country-pair: only (g, h) = (g 0, h 0 ) diagonal entries denoted CP two-way: g = g 0 and/or h = h 0 denoted CP and 2way. dyadic: also g = h 0 or h = g 0 denoted CP, 2way and DYAD. (g,h)n(g 0,h 0 ) (1,2) (1,3) (1,4) (2,3) (2,4) (3,4) (1,2) CP 2way 2way DYAD DYAD (1,3) 2way CP 2way 2way DYAD (1,4) 2way 2way CP 2way 2way (2,3) DYAD 2way CP 2way DYAD (2,4) DYAD 2way 2way CP 2way (3,4) DYAD 2way DYAD 2way CP For small G large fraction of correlation matrix is nonzero G = 10 : 38% of error correlations are nonzero G = 30 : 13% of error correlations are nonzero. For large G the fraction potentially correlated! 4/(G 1). A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 53 / 56

54 14. ADDENDUM: Dyadic Clustering t makes a big di erence, even with panel data and country-pair FEs. Table has standard error ratios - bold is dyadic versus usual method. TABLE 4B: COUNTRY PAR PANEL DATA EXAMPLE WTH COUNTRY PAR FXED EFFECTS Comparison of Standard Errors computed in various ways SE RATOS DYAD DYAD DYAD DYAD COEFF TME SG 5% /HET /PARS /2WAY /JACK Both_in_GATTorWTO Yes No One_in_GATTorWTO Yes No GSP Yes Yes Log_Distance Log_product_real_GDP Yes Yes Log_product_real_GDP_pc Yes No Regional_FTA Yes Yes Currency_Union Yes Yes Common_language Land_border Number_landlocked Number_islands Log_product_land_area Common_colonizer Currently_colonized Yes Yes Ever colony Common_country Constant Year dummies Yes Country pair dummies Yes Observations R squared A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 54 / 56

55 14. ADDENDUM: Dyadic Clustering Resources A. Colin Cameron and Douglas L. Miller, "Robust nference for Dyadic Data", December 31, Paper presented at the January 2015 ASSA meetings. Code for Dyadic Cluster Robust Standard Errors For OLS our Stata code is regdyad2.ado This is very much a "beta version". We are in the midst of editing and tweaking it, but we believe that it runs and works. The syntax to use is F regdyad2 yvar xvars, dyads(d_var_1 D_var_2) Paper and code downloadable at A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 55 / 56

56 14. ADDENDUM: Dyadic Clustering 14. Conclusion Where clustering is present it is important to control for it. We focus on obtaining cluster-robust standard errors though clustering may also lead to estimator inconsistency. Many Stata commands provide cluster-robust standard errors using option vce() a cluster bootstrap can be used when option vce() does not include clustering. n practice it can be di cult to know at what level to cluster the number of clusters may be few and asymptotic theory is in the number of clusters. A. Colin Cameron Univ. of Calif. - Davis... for Center of Clustered Labor Economics Data: PartNorwegian 2 School of Economics Aug 28 - Sep Advanced 1, 2017 Microeconometri 56 / 56

Robust Inference for Regression with Clustered Data

Robust Inference for Regression with Clustered Data Robust nference for Regression with Clustered Data Colin Cameron Univ. of California - Davis Joint work with Douglas L. Miller (UC - Davis). Based on A Practitioners Guide to Cluster-Robust nference, Journal