Robust Inference with Clustered Errors

Size: px
Start display at page:

Download "Robust Inference with Clustered Errors"

Transcription

1 Robust nference with Clustered Errors Colin Cameron Univ. of California - Davis Keynote address at The 8th Annual Health Econometrics Workshop, University of Colorado Denver, Anschutz Medical Campus.. Based on A Practitioners Guide to Cluster-Robust nference J. of Human Resources, 2015, vol.50, Joint work with Douglas L. Miller (and earlier Jonah Gelbach). September olin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 1 / 63 D

2 1. ntroduction ntroduction 1. ntroduction Consider straightforward OLS estimation in linear regression model. Suppose estimator b β is consistent for β. Concerned with getting the correct standard errors of b β default: if errors are i.i.d. (0, σ 2 ) heteroskedastic-robust: if errors are independent (0, σ 2 i ) heteroskedastic and autocorrelation-robust (HAC): if errors are serially correlated cluster-robust: if errors are correlated within cluster and independent across clusters F this talk. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 2 / 63 D

3 1. ntroduction ntroduction Why is this important? 1. Cluster-robust standard errors can be much bigger than default or heteroskedastic-robust. 2. So failure to control for clustering overstates t statistics and understates p-values provides too narrow con dence intervals 3. This arises often especially in the empirical / public labor literature using quasi-experimental methods. 4. There are subtleties - not always straightforward to implement. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 3 / 63 D

4 1. ntroduction Example 1: ndividuals in Cluster Example 1: ndividuals in Cluster Example: How do job injury rates e ect wages? Hersch (1998). CPS individual data on male wages. But there is no individual data on job injury rate. nstead aggregated data on occupation injury rates 211 OLS estimate model for individual i in occupation g Problem: y ig = α + x 0 ig β + γ z g + u ig. the regressor z g (job injury risk in occupation g) is perfectly correlated within cluster (occupation) F by construction and the error u ig is (mildly) correlated within cluster F if model overpredicts for one person in occupation j it is likely to overpredict for others in occupation j. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 4 / 63 D

5 1. ntroduction Example 1: ndividuals in Cluster Simpler model, nine occupations, N = Summary statistics Variable Obs Mean Std. Dev. Min Max lnw occrate potexp potexpsq educ union nonwhite northe midw west occ_id olin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 5 / 63 D

6 1. ntroduction Example 1: ndividuals in Cluster Same OLS regression with di erent se s estimated using Stata (1) iid errors, (2) het errors, (3,4) clustered errors global covars potexp potexpsq educ union nonwhite northe midw west regress lnw occrate $covars estimates store one_iid regress lnw occrate $covars, vce(robust) estimates store one_het regress lnw occrate $covars, vce(cluster occ_id) estimates store one_clu xtset occ_id xtreg lnw occrate $covars, pa corr(ind) vce(robust) estimates store one_xtclu estimates table one_iid one_het one_clu one_xtclu, /// b(%10.4f) se(%10.4f) p(%10.3f) stats(n N_clust rank F) Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 6 / 63 D

7 1. ntroduction Example 1: ndividuals in Cluster Same OLS coe cients but cluster-robust standard errors (columns 3 and 4) when cluster on occupation are 2-4 times larger than default (column 1) or heteroskedastic-robust (column 2) and p-values in the last two columns di er substantially: t(8) versus N(0, 1) Variable one_iid one_het one_clu one_xtclu occrate potexp potexpsq educ union Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 7 / 63 D

8 1. ntroduction Example 1: ndividuals in Cluster And cluster-robust variance matrix is rank de cient nonwhite northe midw west _cons N N_clust rank F legend: b/se/p Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 8 / 63 D

9 1. ntroduction Example 1: ndividuals in Cluster Moulton (1986, 1990) is key paper to highlight the larger standard errors when cluster due to regressors correlated within cluster and errors correlated within cluster. The di erent p-values in columns 3 and 4 arise when there are few clusters use t(8) not N(0, 1) The rank de ciency of the overall F-test is explained below individual t-statistics are still okay. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors Econometrics Workshop,University September of Colorado 9 / 63 D

10 1. ntroduction Example 2: Di erence-in-di erences in State-Year Panel Example 2: Di erence-in-di erences State-Year Panel Example: How do wages respond to a policy indicator variable d ts that varies by state? e.g. d ts = 1 if minimum wage law in e ect OLS estimate model for state s at time t Problem: y ts = α + x 0 ts β + γ d ts + u ts. the regressor d ts is highly correlated within cluster F typically d ts is initially 0 and at some stage switches to 1 the error u ts is (mildly) correlated within cluster F if model underpredicts for California in one year then it is likely to underpredict for other years. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 10 / 63 D

11 1. ntroduction Example 2: Di erence-in-di erences in State-Year Panel Again nd that default OLS standard errors are way too small should instead do cluster-robust (cluster on state) The same problem arises if we have data in individuals (i) in states and years y its = α + x 0 its β + γ d ts + u its in that case should also cluster on state. Bertrand, Du o & Mullainathan (2004) key paper that highlighted problems for DiD in 2004 people either ignored the problem or with its data erroneously clustered on state-year pair and not state. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 11 / 63 D

12 1. ntroduction Example 2: Di erence-in-di erences in State-Year Panel Outline 1 ntroduction 2 Cluster-Robust nference for OLS 3 Cluster-Speci c Fixed E ects 4 What to Cluster Over? 5 Multi-way Clustering 6 Few Clusters 7 Extensions (beyond OLS) 8 Empirical Example 9 Conclusion Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 12 / 63 D

13 2. Cluster-Robust nference for OLS Summary 2. Cluster-Robust nference for OLS Clustered errors: y ig = x 0 ig β + u ig with u ig correlated with error for any observation in group g and uncorrelated with error for any observation in other groups. Key result is that then the incorrect default OLS variance estimate should be in ated by τ j ' 1 + ρ xj ρ u ( N g 1), (1) ρ xj is the within cluster correlation of x j (2) ρ u is the within cluster error correlation (3) N g is the average cluster size. Need both (1) and (2) and it also increases with (3). Cluster-robust estimate of V[ b β] is natural extension of White s (1980) heteroskedastic-robust estimate but requires number of groups G!. Potentially more e cient feasible GLS is possible and can also be robusti ed. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 13 / 63 D

14 2. Cluster-Robust nference 2.1 ntuition 2.1 ntuition Suppose we have univariate data y i (µ, σ 2 ). We estimate µ by ȳ and " N 1 Var[ȳ] = Var N y i = 1 N N i=1 2 i=1 N j=1 Cov(y i, y j ) Given independence over i this simpli es to Var[ȳ] = 1 N σ2. Now suppose observations 2 are equicorrelated 3 with Cov(y i, y j ) = ρσ 2 1 ρ ρ for i 6= j so Var[y] = σ 2 ρ ρ 5. Then ρ ρ 1 " # Var[ȳ] = 1 N N 2 Var(y i ) + N N Cov(y i, y j ) i=1 i=1 j=1;j6=i = 1 N 2 [Nσ 2 + N(N 1)ρσ 2 ] = 1 N σ2 f1 + (n 1)ρg. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 14 / 63 D #.

15 2. Cluster-Robust nference 2.1 ntuition So independent errors Equicorrelated errors Var[ȳ] = 1 N σ2. Var[ȳ] = 1 N σ2 f1 + (N 1)ρg. The variance is 1 + (N 1)ρ times larger!. Reason: An extra observation is not providing a new independent piece of information. Note that the e ect can be large if ρ = 0.1 (so R 2 of y i on y j is 0.01) and N = 81 then Var[ȳ] = 9 N 1 σ2 is 9 times larger! Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 15 / 63 D

16 2. Cluster-Robust nference 2.2 Clustered Errors 2.2 OLS with Clustered Errors Model for G clusters with N g individuals per cluster: OLS estimator y ig = x 0 ig β + u ig, i = 1,..., N g, g = 1,..., G, y g = X g β + u g, g = 1,..., G, y = Xβ + u. bβ = ( G g =1 N g i=1 x ig x 0 ig ) 1 ( G g =1 N g i=1 x ig y ig ) = ( G g =1 X0 g X g ) 1 ( G g =1 X0 g y g ) = (X 0 X) 1 X 0 y. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 16 / 63 D

17 2. Cluster-Robust nference 2.2 Clustered Errors As usual bβ = β + (X 0 X) 1 X 0 u = β + (X 0 X) 1 ( G g =1 X g u g ). Assume independence over g and correlation within g E[u ig u jg 0jx ig, x jg 0] = 0, unless g = g 0. Then β b a N [β, V[ β]] b with asymptotic variance Avar[ β] b = (E[X 0 X]) 1 ( G g =1 E[Xg 0 u g ug 0 X g ])(E[X 0 X]) 1 6= σ 2 u(e[x 0 X]) 1. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 17 / 63 D

18 2. Cluster-Robust nference 2.2 Clustered Errors Consequences - KEY RESULT FOR NSGHT Suppose equicorrelation within cluster g 1 Cor[u ig, u jg jx ig, x jg ] = ρ u i = j i 6= j this arises in a random e ects model with u ig = α g + ε ig, where α g and ε ig are i.i.d. errors. an example is individual i in village g or student i in school g. The incorrect default OLS variance estimate should be in ated by τ j ' 1 + ρ xj ρ u ( N g 1), (1) ρ xj is the within cluster correlation of x j (2) ρ u is the within cluster error correlation (3) N g is the average cluster size. Need both (1) and (2) and it also increases with (3) Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 18 / 63 D

19 2. Cluster-Robust nference 2.2 Clustered Errors Theory: Kloek (1981), Scott and Holt (1982). Practice: Moulton (1986, 1990) showed that the variance in ation can be large even if ρ u is small especially with a grouped regressor (same for all individuals in group) so that ρ x = 1. CPS data example: N g = 81, ρ x = 1 and ρ u = 0.1 =) τ j ' 1 + ρ xj ρ u ( N g 1) = = 9. F true standard errors are three times the default! So should correct for clustering even in settings where not obviously a problem. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 19 / 63 D

20 2. Cluster-Robust nference 2.3 The Cluster-Robust Variance Matrix Estimate 2.3 The Cluster-Robust Variance Matrix Estimate Recall for OLS with independent heteroskedastic errors Avar[ β] b = (E[X 0 X]) 1 ( N i=1 E[ui 2 x i xi 0 ])(E[X 0 X]) 1 can be consistently estimated (White (1980)) as N! by bv[ β] b = (X 0 X) 1 ( N i=1 bu i 2 x i xi 0 )(X 0 X) 1. Need 1 N N i=1 bu i 2x i xi 0 not bu i 2 p! E[ui 2] 1 N N i=1e[u 2 i x i x 0 i ] p! 0 Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 20 / 63 D

21 2. Cluster-Robust nference 2.3 The Cluster-Robust Variance Matrix Estimate Similarly for OLS with independent clustered errors Avar[ b β] = (E[X 0 X]) 1 ( G g =1 E[X 0 g u g u 0 g X g ])(E[X 0 X]) 1 can be consistently estimated as G! by the cluster-robust variance estimate (CRVE) bv CR [ b β] = (X 0 X) 1 ( G g =1 X 0 g eu g eu 0 g X g )(X 0 X) 1. Stata uses eu g = cbu g = c(y g X g b β) where c = G G 1 N 1 N K ' G G 1. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 21 / 63 D

22 2. Cluster-Robust nference 2.3 The Cluster-Robust Variance Matrix Estimate The CRVE was proposed by White (1984) for balanced case proposed by Liang and Zeger (1986) for grouped data proposed by Arellano (1987) for FE estimator for short panels (group on individual) Hansen (2007a) and Carter, Schnepel and Steigerwald (2013) also allow N g!. popularized by incorporation in Stata as the cluster option (Rogers (1993)). also allows for heteroskedasticity so is cluster- and heteroskedasticrobust. Stata with cluster identi er id_clu regress y x, vce(cluster id_clu) xtreg y x, pa corr(ind) vce(robust) F F after xtset id_clu from version 12.1 on Stata interprets vce(robust) as cluster-robust for all xt commands. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 22 / 63 D

23 2. Cluster-Robust nference 2.4 Feasible GLS 2.4. Feasible GLS with Cluster-Robust nference Potential e ciency gains for feasible GLS compared to OLS. Specify a model for Ω g = E[u g u 0 g jx g ], e.g. within-cluster equicorrelation. Given bω p! Ω, the feasible GLS estimator of β is G bβ FGLS = g =1 X0 b g Ωg 1 X g 1 G g =1 X0 g b Ω 1 g y g. Default bv[ b β FGLS ] = (X 0 bω 1 X) 1 requires correct Ω. To guard against misspeci ed Ω g use cluster-robust bv CR [ b β FGLS ] = 1 X 0 bω 1 G X g =1 X0 b g Ωg 1 bu g bu g 0 Ω b g 1 X g X 0 bω 1 X where bu g = y g X g βfgls b and bω = Diag[ bω g ] assumes u g and u h are uncorrelated, for g 6= h and needs G!. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 23 / 63 D

24 2. Cluster-Robust nference 2.4 Feasible GLS FGLS Examples Example 1 - Moulton setting Random e ects model: y ig = x 0 ig β + α g + ε ig F xtreg, re vce(robust) Richer hierarchical linear model or mixed model F Stata 13: mixed, vce(robust) Example 2 - BDM setting AR(1) error u it = ρu i,t 1 + ε it and ε it i.i.d. xtreg y x, pa corr(ar 1) vce(robust) Stata allows a range of correlation structures Puzzle - why is FGLS not used more? Easily done in Stata with robust VCE if G! Unless FE s present and N g small (see later). Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 24 / 63 D

25 2. Cluster-Robust nference 2.5 mplementation of OLS and FGLS 2.5 The CRVE can be rank de cient bv CR [ b β] can be rank de cient rank is as most minimum of K and G 1 bb = C 0 C, where C 0 = [X1 0 bu 1 XG 0 bu G ] is K G and X1 0 bu XG 0 bu G = 0 For example if have 15 clusters (say states) Cannot jointly test signi cance of 20 occupation dummies But can test joint signi cance of 14. The test of overall joint statistical signi cance is not computable if G < K but tests on individual coe cients are still okay. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 25 / 63 D

26 2. Cluster-Robust nference 2.6 Pairs Cluster Bootstrap 2.6 Pairs Cluster Bootstrap Do the following steps for each of B bootstrap samples: (1) form G clusters f(y1, X 1 ),..., (y G, X G )g by resampling with replacement G times from the original sample of clusters (2) compute β b b (estimate of β) in the b th bootstrap sample. Compute the variance of the B estimates b β 1,..., b β B as bv CR;boot [ b β] = 1 B 1 B b=1 (b β b b β)( b βb b β) 0, where b β = B 1 B b=1 b β b and B 400. Pairs cluster bootstrap has no asymptotic re nement. But can compute these if Stata doesn t provide a CRVE. Also can do even if usual CRVE is rank de cient? Also cluster jackknife. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 26 / 63 D

27 3. Cluster-Speci c Fixed E ects models 3. Cluster-Speci c Fixed E ects Models: Summary Now y ig = x 0 ig β + α g + u ig = x 0 ig β + G h=1 α g dh ig + u ig. 1. FE s do not in practice absorb all within cluster correlation of u ig still need to uce cluster-robust VCE 2. Cluster-robust VCE is still okay with FE s (if G! ) Arellano (1987) for N g small and Hansen (2007a, p.600) for N g! 3. f N g small use xtreg, fe not reg i.id_clu as reg or areg uses wrong degrees of freedom 4. FGLS with xed e ects needs to bias-adjust for bα g inconsistent Hansen (2007b) provides bias-corrected FGLS for AR(p) errors Brewer, Crossley and Joyce (2013) implement in DiD setting Hausman and Kuersteiner (2008) provide bias-corrected FGLS for Kiefer (1980) error model 5. Need to do a modi ed Hausman test for xed e ects. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 27 / 63 D

28 4. What to Cluster Over? 4.1 Factors Determining What to Cluster Over 4.1 Factors Determining What to Cluster Over t is not always obvious how to specify the clusters. Moulton (1986, 1990) cluster at the level of an aggregated regressor. Bertrand, Du o and Mullainathan (2004) with state-year data cluster on states (assumed to be independent) rather than state-year pairs. Pepper (2002) cluster at the highest level where there may be correlation e.g. for individual in household in state may want to cluster at level of the state if state policy variable is a regressor. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 28 / 63 D

29 4. What to Cluster Over? 4.2 Clustering Due to Survey Design 4.2 Clustering Due to Survey Design Clustering routinely arises with complex survey data. Then the loss of e ciency due to clustering is called the design e ect This is the inverse of the variance in ation factor given earlier Long literature going back to 1960 s CRVE is called the linearization formula Shah, Holt and Folsom (1977) is early reference. Complex survey data are weighted often ignore assuming conditioning on x handles weighting And strati ed this improves estimator e ciency somewhat Bhattacharya (2005) gives a general GMM treatment. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 29 / 63 D

30 4. What to Cluster Over? 4.2 Clustering Due to Survey Design Econometricians reasonably 1. Cluster on PSU or higher 2. Sometimes weight and sometimes not 3. gnore strati cation (with slight loss in e ciency) Survey software controls for all three. Stata svy commands Econometricians use regular commands with vce(cluster) and possibly [pweight=1/prob] Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 30 / 63 D

31 5. Multi-Way Clustering 5. Multi-way Clustering Example: How do job injury rates e ect wages? Hersch (1998). CPS individual data on male wages N = But there is no individual data on job injury rate. nstead aggregated data: F F data on industry injury rates for 211 industries data on occupation injury rates for 387 occupations. Model estimated is What should we do? y igh = α + x 0 igh β + γ rind ig + δ rocc ih + u igh. Ad hoc robust: OLS and robust cluster on industry for bγ and robust cluster on occupation for bδ. Non-robust: FGLS two-way random e ects: u igh = ε g + ε h + ε igh ; ε g, ε h, ε igh i.i.d. Two-way robust: next Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 31 / 63 D

32 5. Multi-Way Clustering 5.1 Two-way Cluster-Robust VCE 5.1 Two-way Cluster-Robust Robust variance matrix estimates are of the form bavar[ β] b = (X 0 X) 1 bb(x 0 X) 1 For one-way clustering with clusters g = 1,..., G we can write bb = N i=1 N j=1 x i xj 0 bu i bu j 1[i, j in same cluster g] where bu i = y i xi 0 β b and the indicator function 1[A] equals 1 if event A occurs and 0 otherwise. For two-way clustering with clusters g = 1,..., G and h = 1,..., H bb = N i=1 N j=1 x i xj 0 bu i bu j 1[i, j share any of the two clusters] = N i=1 N j=1 x i xj 0 bu i bu j 1[i, j in same cluster g] + N i=1 N j=1 x i xj 0 bu i bu j 1[i, j in same cluster h] N i=1 N j=1 x i xj 0 bu i bu j 1[i, j in both cluster g and h]. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 32 / 63 D

33 5. Multi-Way Clustering 5.1 Two-way Cluster-Robust VCE Obtain three di erent cluster-robust variance matrices for the estimator by one-way clustering in, respectively, the rst dimension, the second dimension, and by the intersection of the rst and second dimensions add the rst two variance matrices and, to account for double-counting, subtract the third. Thus bv two-way [ β] b = bv G [ β] b + bv H [ β] b bv G \H [ β], b Theory presented in Cameron, Gelbach, and Miller (2006, 2011), Miglioretti and Heagerty (2006), and Thompson (2006, 2011) Extends to multi-way clustering. Early empirical applications that independently proposed this method include Acemoglu and Pischke (2003). Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 33 / 63 D

34 5. Multi-Way Clustering 5.2 mplementation 5.2 mplementation f bv[ b β] is not positive-de nite (small G, H) then Decompose bv[ b β] = UΛU 0 ; U contains eigenvectors of bv, and Λ = Diag[λ 1,..., λ d ] contains eigenvalues. Create Λ + = Diag[λ + 1,..., λ+ d ], with λ+ j = max 0, λ j, and use bv + [ β] b = UΛ + U 0 Stata add-on cgmreg.ado implements this. Also Stata add-on xtivreg2.ado has two-way clustering for a variety of linear model estimators. Fixed e ects in one or both dimensions Theory has not formally addressed this complication ntuitively if G! and H! then each xed e ect is estimated using many observations. n practice the main consequence of including xed e ects is a reduction in within-cluster correlation of errors. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 34 / 63 D

35 5. Multi-Way Clustering 5.2 mplementation Application Example 1: Hersch data Relatively small di erence versus one-way But can simultaneously handle both ways rather than one-way cluster on industry for bγ and one-way cluster on occupation for bδ. Example 2: DiD We have found little di erence if cluster two-way on state and time versus just one-way on state. Studies in nance view this as important. Example 3: Country-pair international trade volume Two-way cluster on country 1 and country 2 leads to much bigger standard errors (Cameron et al. 2011) Cameron and Miller (2012) nd that two-way still doesn t pick up all correlations. nstead other methods including Fafchamps and Gubert (2007). Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 35 / 63 D

36 5. Multi-Way Clustering 5.3 Feasible GLS 5.3 Feasible GLS Two-way random e ects y igh = xigh 0 β + α g + δ h + ε ig with i.i.d. errors xtmixed y x jj _all: R.id1 jj id2:, mle. but cannot then get cluster-robust variance matrix Hierarchical linear models or mixed models richer FGLS y ig = xig 0 β g + u ig β g = W g γ + v j where u ig and v g are errors. see Rabe-Hesketh and Skrondal (2012) Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 36 / 63 D

37 5. Multi-Way Clustering 5.4 Spatial Correlation 5.4 Spatial Correlation Two-way cluster robust related to time-series and spatial HAC. n general bb in preceding has the form i j w (i, j) x i x 0 j bu i bu j. Two-way clustering: w (i, j) = 1 for observations that share a cluster. White and Domowitz (1984) time series: w (i, j) = 1 for observations close in time to one another. Conley (1999) spatial: w (i, j) decays to 0 as the distance between observations grows. The di erence: White & Domowitz and Conley use mixing conditions to ensure decay of dependence in time or distance. Mixing conditions do not apply to clustering due to common shocks. nstead two-way robust requires independence across clusters. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 37 / 63 D

38 5. Multi-Way Clustering 5.4 Spatial Correlation Spatial Correlation Consistent VE Driscoll and Kraay (1998) panel data when T! generalizes HAC to spatial correlation errors potentially correlated across individuals correlation across individuals disappears for obs > m time periods apart then w (it, js) = 1 d (it, js) /(m + 1) with sum over i, j, s and t and d (it, js) = jt sj if jt sj m and d (it, js) = 0 otherwise. Stata add-on command xtscc, due to Hoechle (2007). Foote (2007) contrasts various variance matrix estimators in a macroeconomics example. Petersen (2009) contrasts methods for panel data on nancial rms. Barrios, Diamond, mbens, and Kolesár (2012) state-year panel on individuals with spatial correlation across states. And use randomization inference. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 38 / 63 D

39 6. Few Clusters 6. nference with Few Clusters One-way clustering, and focus on the Wald t-statistic w = b β β 0. s b β CRVE assumes G!. What if G is small? At a minimum use CRVE with rescaled error eu g = p cbu g where c = G G 1 or c = G G 1 N 1 N k ' G G 1 And use T (G 1) critical values Stata does this for regress but not other commands.. But tests still over-reject with small G. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 39 / 63 D

40 6. Few Clusters nference with few clusters arises often in practice e.g. have only ten states standard methods in e.g. Stata over-reject this is an active area of research. Three approaches 1. Finite sample bias correction to the CRVE 2. Wild cluster bootstrap (with asymptotic re nement) 3. Better t critical values A related distinct problem is one treated cluster and many control clusters. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 40 / 63 D

41 6. Few Clusters 6.1 The Basic Problem with Few Clusters 6.1 The Basic Problem with Few Clusters OLS over ts with bu systematically biased to zero compared to u. e.g. OLS with iid normal errors E[bu 0 bu] = (N K )σ 2, not Nσ 2. Problem is greatest as G gets small - few clusters. How few is few? balanced data; G < 20 to G < 50 depending on data unbalanced data: G less than this. Unusual case. f N is too small with cross-section data, usually everything is statistically insigni cant. With clustered data if G is small we may still have statistical signi cance if N g is small. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 41 / 63 D

42 6. Few Clusters 6.2 Solution 1: Bias-Corrected CRVE 6.2 Solution 1: Bias-Corrected CRVE Simplest is eu g = p cbu g, already mentioned. CR2VE generalizes HC2 for heteroskedasticity eu g = [ Ng H gg ] 1/2 bu g where H gg = X g (X 0 X) 1 Xg 0 gives unbiased CRVE if errors iid normal CR3VE generalizes HC3 for heteroskedasticity eu g + = p G /(G 1)[ Ng H gg ] 1 bu g where H gg = X g (X 0 X) 1 Xg 0 same as jackknife Finite sample Wald tests at least use T (G 1) p-values and critical values and not N [0, 1] Example G = 10 F t = 1.96 has p = using T (9) versus p = 0.05 using N [0, 1] ad hoc reasonable correction used by Stata. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 42 / 63 D

43 6. Few Clusters 6.3 Solution 2: Cluster Bootstrap with Asymptotic Re nement 6.3 Solution 2: Cluster Bootstrap with Asymptotic Re nement Cameron, Gelbach and Miller (2008) Test H 0 : β 1 = β 0 1 against H a : β 1 6= β 0 1 using w = ( bβ 1 β 0 1 )/s bβ 1 perform a cluster bootstrap with asymptotic re nement then true test size is α + O(G 3/2 ) rather than usual α + O(G 1 ) hopefully improvement when G is small wild cluster percentile-t bootstrap is best better than pairs cluster percentile-t bootstrap. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 43 / 63 D

44 6. Few Clusters 6.3 Solution 2: Cluster Bootstrap with Asymptotic Re nement Wild Cluster Bootstrap 1 Obtain the OLS estimator b β and OLS residuals bu g, g = 1,..., G. Best to use residuals that impose H 0. 2 Do B iterations of this step. On the b th iteration: 1 For each cluster g = 1,..., G, form bu g = bu g or bu g = bu g each with probability 0.5 and hence form by g = X 0 g b β + bu g. This yields wild cluster bootstrap resample f(by 1, X 1),..., (by G, X G )g. 2 Calculate the OLS estimate bβ 1,b and its standard error s bβ and given 1,b these form the Wald test statistic wb = ( bβ 1,b bβ 1 )/s b β. 1,b 3 Reject H 0 at level α if and only if w < w [α/2] or w > w [1 α/2], where w [q] denotes the qth quantile of w 1,..., w B. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 44 / 63 D

45 6. Few Clusters 6.3 Solution 2: Cluster Bootstrap with Asymptotic Re nement Current Research Webb (2013) proposes using a six-point distribution for the weights d g in bu g = d g bu g. The weights d g have a 1/6 chance of each value in f p p p p p p 1.5, 1,.5,.5, 1, 1.5g. Works better with few clusters than two-point F Two-point cluster gives only 2 G 1 di erent bootstrap resamples. Also with few clusters need to enumerate rather than bootstrap. MacKinnon and Webb (2013) nd that unbalanced cluster sizes worsens few clusters problem. Wild cluster bootstrap does well. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 45 / 63 D

46 6. Few Clusters 6.3 Solution 2: Cluster Bootstrap with Asymptotic Re nement Use the Bootstrap with Caution We assume clustering does not lead to estimator inconsistency focus is just on the standard errors. We assume that the bootstrap is valid this is usually the case for smooth problems with asymptotically normal estimators and usual rates of convergence. but there are cases where the bootstrap is invalid. When bootstrapping always set the seed (for replicability) use more bootstraps than the Stata default of 50 F for bootstraps without asymptotic re nement 400 should be plenty. When bootstrapping a xed e ects panel data model the additional option idcluster() must be used F for explanation see Stata manual [R] bootstrap: Bootstrapping statistics from data with a complex structure. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 46 / 63 D

47 6. Few Clusters 6.4 Solution 3: mproved T Critical Values Solution 3: mproved T Critical Values Suppose all regressors are invariant within clusters, clusters are balanced and errors are i.i.d. normal then y ig = xg 0 β + ε ig =) ȳ g = xg 0 β + ε g with ε g i.i.d. normal so Wald test based on OLS is exactly T (G L), where L is the number of group invariant regressors. Extend to nonnormal errors and group varying regressors asymptotic theory when G is small and N g!. Donald and Lang (2007) propose a two-step FGLS RE estimator yields t-test that is T (G L) under some assumptions Wooldridge (2006) proposes an alternative minimum distance method. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 47 / 63 D

48 6. Few Clusters 6.4 Solution 3: mproved T Critical Values Current Research (continued) mbens and Kolesar (2012) Data-determined number of degrees of freedom for t and F tests Builds on Satterthwaite (1946) and Bell and McCa rey (2002). Assumes normal errors and particular model for Ω. Match rst two moments of test statistic with rst two moments of χ 2. v = ( G j=1 λ j ) 2 /( G j=1 λ2 j ) and λ j are the eigenvalues of the G G matrix G 0 b G. Find works better than 2-point Wild cluster bootstrap but they did not impose H 0. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 48 / 63 D

49 6. Few Clusters 6.4 Solution 3: mproved T Critical Values Carter, Schnepel and Steigerwald (2013) provide asymptotic theory when clusters are unbalanced propose a measure of the e ective number of clusters G = G /(1 + δ) F where δ = G 1 G g =1 f(γ g γ) 2 / γ 2 g F γ g = ek 0 (X0 X) 1 Xg 0 g X g (X 0 X) 1 e k F e k is a K 1 vector of zeroes aside from 1 in the k th position if bβ = bβ k F γ = G 1 G g =1 γ g. Cluster heterogeneity (δ 6= 0) can arise for many reasons variation in N g, variation in X g and variation in g across clusters. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 49 / 63 D

50 6. Few Clusters 6.4 Solution 3: mproved T Critical Values Brewer, Crossley and Joyce (2013) Do FGLS as gives both e ciency gains and works well even with few clusters. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 50 / 63 D

51 6. Few Clusters 6.5 Special Cases 6.5 Special Cases Bester, Conley and Hansen (2009) obtain T (G 1) in settings such as panel where mixing conditions apply. bragimov and Muller (2010) take an alternative approach suppose only within-group variation is relevant then separately estimate β b g s and average asymptotic theory when G is small and N g! A big limitation is assumption of only within variation for example in state-year panel application with clustering on state it rules out z t in y st = x 0 st β + z0 t γ + ε ig where z t are for example time dummies. This limitation is relevant in DiD models with few treated groups Conley and Taber (2010) present a novel method for that case. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 51 / 63 D

52 7. Extensions 7. Extensions The results for OLS and FGLS and t-tests extend to multiple hypothesis tests and V, 2SLS. GMM and nonlinear estimators. These extensions are incorporated in Stata but Stata generally does not use nite-cluster degrees-of-freedom adjustments in computing test p-values and con dence intervals F exception is command regress. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 52 / 63 D

53 7. Extensions Extensions (continued) 7.1 Cluster-Robust F-tests 7.2 nstrumental Variables Estimators V, 2SLS, linear GMM Need modi ed Hausman test for endogeneity : estat endogenous Weak instruments: F F F First-stage F-test should be cluster-robust use add-on xtivreg2 Finlay and Magnusson (2009) have Stata add-on rivtest.ado. 7.3 Nonlinear Estimators Population-averaged (xtreg, pa) and random e ects (e.g. xtlogit, re) give quite di erent βs Rarely can eliminate xed e ects if N g is small. 7.4 Cluster-randomized Experiments Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 53 / 63 D

54 8. Empirical Example 8. Empirical Example: Moulton Setting Moulton setting Cross-section sample with clustering on state. BDM setting Repeated cross-section data with individual data aggregated to state-year. Demonstrate the impact of clustering on standard errors and test size and consider various nite-cluster corrections. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 54 / 63 D

55 8. Empirical Example 8.1 Cross-section individual-level data Table 1: Moulton setting. Cross-section individual-level data March 2012 CPS data with state-level regressor and cluster on state. N = and G = 51. Compare various standard errors for OLS and FGLS (RE). Table 2: 20% subsample of data in Table 1. Now construct a fake dummy and test H 0 : β = 0. Do this for G = 50, 30, 20, 10 and 6 S = 4000 for G 10 and S = 1000 for G > 10. B = 399 (okay for Monte Carlo but set higher in practice). Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 55 / 63 D

56 8. Empirical Example Table 1 Cross section individual level data mpacts of clustering and estimator choices on estimated coefficients and standard errors Estimation Method OLS FGLS (RE) Slope coefficient Standard Errors Default Heteroscedastic Robust Cluster Robust (cluster on State) Pairs cluster bootstrap Number observations Number clusters (states) Cluster size range 519 to to 5866 ntraclass correlation Notes: March 2012 CPS data, from PUMS download. Default standard errors for OLS assume errors are iid; default standard errors for FGLS assume the Random Effects model is correctly specified. The Bootstrap uses 399 replications. A fixed effect model is not possible, since the regressor is invariant within states. olin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 56 / 63 D

57 8. Empirical Example Table 2 Cross section individual level data Monte Carlo rejection rates of true null hypothesis (slope = 0) with different number of clusters and different rejection methods Nominal 5% rejection rates Wald test method Numbers of Clusters Different standard errors and critical values 1 White Robust, T(N k) for critical value Cluster on state, T(N k) for critical value Cluster on state, T(G 1) for critical value Cluster on state, T(G 2) for critical value Cluster on state, CR2 bias correction, T(G 1) for critical value Cluster on state, CR3 bias correction, T(G 1) for critical value Cluster on state, CR2 bias correction, K degrees of freedom Cluster on state, T(CSS effective # clusters) Pairs cluster bootstrap for standard error, T(G 1) for critical value Bootstrap Percentile T methods 10 Pairs cluster bootstrap Wild cluster bootstrap, Rademacher 2 point distribution, low p value Wild cluster bootstrap, Rademacher 2 point distribution, mid p value Wild cluster bootstrap, Rademacher 2 point distribution, high p value Wild cluster bootstrap, Webb 6 point distribution Wild cluster bootstrap, Rademacher 2 pt, do not impose null hypothesis K effective DOF (mean) K effective DOF (5th percentile) K effective DOF (95th percentile) CSS effective # clusters (mean) Average number of observations Notes: March 2012 CPS data, 20% sample from PUMS download. For 6 and 10 clusters, 4000 Monte Carlo replications. For clusters, 1000 Monte Carlo replications. The Bootstraps use 399 replications. "K effective DOF" from mbens and Kolesar (2013), and "CSS effective # clusters" from Carter, Schnepel and Steigerwald (2013), see Subsection V.D. Row 11 uses lowest p value from interval, when Wild percentile T bootstrapped p values are not point identified due to few clusters. Row 12 uses mid range of interval, and row 13 uses largest p value of interval. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 57 / 63 D

58 8. Empirical Example 8.2 BDM Setting with repreated c Table 3: BDM setting. Panel level state-year data March CPS data. Aggregated from individual level data using Hansen (2007) method F F OLS regress y its on regresors x its and state-year dummies D ts gives coe cients ey ts OLS regress ey ts = α s + δ t + β d ts + u ts G = 51, T = 36, N = G T = 1836, Compare various standard errors for FE-OLS and FE-FGLS (AR(1)). Table 4: Same data as Table 3. Now construct a fake serially correlated dummy and test H 0 : β = 0. Do this for G = 50, 30, 20, 10 and 6. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 58 / 63 D

59 8. Empirical Example Table 3 State year panel data with differences in differences estimation mpacts of clustering and estimation choices on estimated coefficients, standard errors, and p values Estimation Method: Standard Errors p values Model: OLS FE OLS no FE FGLS AR(1) Slope coefficient Standard Errors OLS FE OLS no FE FGLS AR(1) 1 Default standard errors, T(N k) for critical value White Robust, T(N k) for critical value na na 3 Cluster on state, T(G 1) for critical value Cluster on state, CR2 bias correction, T(G 1) for critical value na na 5 Cluster on state, CR2 bias correction, K degrees of freedom na na 6 Pairs cluster bootstrap for standard error, T(G 1) for critical value Bootstrap Percentile T methods 7 Pairs cluster bootstrap na na Wild cluster bootstrap, Rademacher 2 point distribution na na Wild cluster bootstrap, Webb 6 point distribution na na mbens Kolesar effective DOF C S S effective # clusters Number observations Number clusters (states) Notes: March CPS data, from PUMS download. Models 1 and 3 include state and year fixed effects, and a "fake policy" dummy variable that turns on in 1995 for a random subset of half of the states. Model 2 includes year fixed effects but not state fixed effects. The Bootstraps use 999 replications. Model 3 uses FGLS, assuming an AR(1) error within each state. "K effective DOF" from mbens and Kolesar (2013), and "CSS effective # clusters" from Carter, Schnepel and Steigerwald (2013), see Subsection V.D. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 59 / 63 D

60 8. Empirical Example Table 4 State year panel data with differences in differences estimation Monte Carlo rejection rates of true null hypothesis (slope = 0) with different # clusters and different rejection meth Nominal 5% rejection rates Estimation Method Numbers of Clusters Wald Tests 1 Default standard errors, T(N k) for critical value Cluster on state, T(N k) for critical value Cluster on state, T(G 1) for critical value Cluster on state, T(G 2) for critical value Pairs cluster bootstrap for standard error, T(G 1) for critical value Bootstrap Percentile T methods 6 Pairs cluster bootstrap Wild cluster bootstrap, Rademacher 2 point distribution Wild cluster bootstrap, Webb 6 point distribution Notes: March CPS data, from PUMS download. Models include state and year fixed effects, and a "fake poli dummy variable that turns on in 1995 for a random subset of half of the states. For 6 and 10 clusters, 4000 Monte Carlo replications. For clusters, 1000 Monte Carlo replications. The Bootstraps use 399 replications. olin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 60 / 63 D

61 9. Current Research 9. Current research Andreas Hagemann (2016), Cluster-Robust Bootstrap nference in Quantile Regression Models, JASA, forthcoming. wild cluster bootstrap for quantile regression. James G. MacKinnon and Matthew D. Webb (2016), Randomization nference for Di erence-in-di erences with Few Treated Clusters Randomization and bootstrap methods for di erences-in-di erences with few clusters. James E. Pustejovsky and Elizabeth Tipton (2016), Small sample methods for cluster-robust variance estimation and hypothesis testing in xed e ects models. mbens and Kolesar extended to multiple hypothesis tests. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 61 / 63 D

62 9. Current Research Current research (contined) Rustam bragimov and Ulrich K. Müller (2016), nference with Few heterogeneous Clusters, R.E.Stat., Extends bragimov and Müller (2010) from one-sample t-test to two-sample t-test. Alberto Abadie, Susan Athey, Guido W. mbens, Je rey M. Wooldridge (2014), Finite Population Causal Standard Errors, NBER Working Paper proposes randomization-based standard errors that in general are smaller than the conventional robust standard errors. A. Colin Cameron and Douglas L. Miller (2014), "Robust nference for Dyadic Data". robust inference for paired data such as cross-country trade. Colin Cameron Univ. of California - Davis (Keynote Robust address nference at The with 8thClustered Annual Health Errors EconometricsSeptember Workshop,University of Colorado 62 / 63 D

Robust Inference for Regression with Clustered Data

Robust Inference for Regression with Clustered Data Robust nference for Regression with Clustered Data Colin Cameron Univ. of California - Davis Joint work with Douglas L. Miller (UC - Davis). Based on A Practitioners Guide to Cluster-Robust nference, Journal

More information

Robust Inference with Multi-way Clustering

Robust Inference with Multi-way Clustering Robust Inference with Multi-way Clustering Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Arizona, U.C. - Davis February 2010 February 2010 1 / 44 1.1 Introduction Moulton (1986, 1990) and

More information

Bootstrap-Based Improvements for Inference with Clustered Errors

Bootstrap-Based Improvements for Inference with Clustered Errors Bootstrap-Based Improvements for Inference with Clustered Errors Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Maryland, U.C. - Davis May, 2008 May, 2008 1 / 41 1. Introduction OLS regression

More information

Day 2B Inference for Clustered Data: Part 2 With a Cross-section Data Example (Revised August 25 to add 14. Dyadic Clustering)

Day 2B Inference for Clustered Data: Part 2 With a Cross-section Data Example (Revised August 25 to add 14. Dyadic Clustering) Day 2B nference for Clustered Data: Part 2 With a Cross-section Data Example (Revised August 25 to add 14. Dyadic Clustering) A. Colin Cameron Univ. of Calif. - Davis... for Center of Labor Economics Norwegian

More information

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes Wild Bootstrap Inference for Wildly Dierent Cluster Sizes Matthew D. Webb October 9, 2013 Introduction This paper is joint with: Contributions: James G. MacKinnon Department of Economics Queen's University

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. Colin Cameron Douglas L. Miller Cameron and Miller abstract We consider statistical inference for regression when data are grouped into clusters, with

More information

Casuality and Programme Evaluation

Casuality and Programme Evaluation Casuality and Programme Evaluation Lecture V: Difference-in-Differences II Dr Martin Karlsson University of Duisburg-Essen Summer Semester 2017 M Karlsson (University of Duisburg-Essen) Casuality and Programme

More information

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors by Bruce E. Hansen Department of Economics University of Wisconsin October 2018 Bruce Hansen (University of Wisconsin) Exact

More information

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors by Bruce E. Hansen Department of Economics University of Wisconsin June 2017 Bruce Hansen (University of Wisconsin) Exact

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

A Practitioner's Guide to Cluster-Robust Inference

A Practitioner's Guide to Cluster-Robust Inference A Practitioner's Guide to Cluster-Robust Inference A. Colin Cameron and Douglas L. Miller Department of Economics, University of California - Davis. This version: September 30, 2013 Abstract Note: Sections

More information

Robust Inference with Clustered Data

Robust Inference with Clustered Data Robust Inference with Clustered Data A. Colin Cameron and Douglas L. Miller Department of Economics, University of California - Davis. This version: Feb 10, 2010 Abstract In this paper we survey methods

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Lecture 4: Linear panel models

Lecture 4: Linear panel models Lecture 4: Linear panel models Luc Behaghel PSE February 2009 Luc Behaghel (PSE) Lecture 4 February 2009 1 / 47 Introduction Panel = repeated observations of the same individuals (e.g., rms, workers, countries)

More information

Day 1A Ordinary Least Squares and GLS

Day 1A Ordinary Least Squares and GLS Day 1A Ordinary Least Squares and GLS c A. Colin Cameron Univ. of Calif.- Davis Frontiers in Econometrics Bavarian Graduate Program in Economics. Based on A. Colin Cameron and Pravin K. Trivedi (2009,2010),

More information

Day 3B Nonparametrics and Bootstrap

Day 3B Nonparametrics and Bootstrap Day 3B Nonparametrics and Bootstrap c A. Colin Cameron Univ. of Calif.- Davis Frontiers in Econometrics Bavarian Graduate Program in Economics. Based on A. Colin Cameron and Pravin K. Trivedi (2009,2010),

More information

Day 2A Instrumental Variables, Two-stage Least Squares and Generalized Method of Moments

Day 2A Instrumental Variables, Two-stage Least Squares and Generalized Method of Moments Day 2A nstrumental Variables, Two-stage Least Squares and Generalized Method of Moments c A. Colin Cameron Univ. of Calif.- Davis Frontiers in Econometrics Bavarian Graduate Program in Economics. Based

More information

Clustering as a Design Problem

Clustering as a Design Problem Clustering as a Design Problem Alberto Abadie, Susan Athey, Guido Imbens, & Jeffrey Wooldridge Harvard-MIT Econometrics Seminar Cambridge, February 4, 2016 Adjusting standard errors for clustering is common

More information

TECHNICAL WORKING PAPER SERIES ROBUST INFERENCE WITH MULTI-WAY CLUSTERING. A. Colin Cameron Jonah B. Gelbach Douglas L. Miller

TECHNICAL WORKING PAPER SERIES ROBUST INFERENCE WITH MULTI-WAY CLUSTERING. A. Colin Cameron Jonah B. Gelbach Douglas L. Miller TECHNICAL WORKING PAPER SERIES ROBUST INFERENCE WITH MULTI-WAY CLUSTERING A. Colin Cameron Jonah B. Gelbach Douglas L. Miller Technical Working Paper 327 http://www.nber.org/papers/t0327 NATIONAL BUREAU

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

Cluster-Robust Inference

Cluster-Robust Inference Cluster-Robust Inference David Sovich Washington University in St. Louis Modern Empirical Corporate Finance Modern day ECF mainly focuses on obtaining unbiased or consistent point estimates (e.g. indentification!)

More information

Empirical Methods in Applied Microeconomics

Empirical Methods in Applied Microeconomics Empirical Methods in Applied Microeconomics Jörn-Ste en Pischke LSE October 007 1 Nonstandard Standard Error Issues The discussion so far has concentrated on identi cation of the e ect of interest. Obviously,

More information

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner Econometrics II Nonstandard Standard Error Issues: A Guide for the Practitioner Måns Söderbom 10 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom,

More information

1 Outline. Introduction: Econometricians assume data is from a simple random survey. This is never the case.

1 Outline. Introduction: Econometricians assume data is from a simple random survey. This is never the case. 24. Strati cation c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 2002. They can be used as an adjunct to Chapter 24 (sections 24.2-24.4 only) of our subsequent book Microeconometrics:

More information

FNCE 926 Empirical Methods in CF

FNCE 926 Empirical Methods in CF FNCE 926 Empirical Methods in CF Lecture 11 Standard Errors & Misc. Professor Todd Gormley Announcements Exercise #4 is due Final exam will be in-class on April 26 q After today, only two more classes

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

Lecture Notes on Measurement Error

Lecture Notes on Measurement Error Steve Pischke Spring 2000 Lecture Notes on Measurement Error These notes summarize a variety of simple results on measurement error which I nd useful. They also provide some references where more complete

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Inference in difference-in-differences approaches

Inference in difference-in-differences approaches Inference in difference-in-differences approaches Mike Brewer (University of Essex & IFS) and Robert Joyce (IFS) PEPA is based at the IFS and CEMMAP Introduction Often want to estimate effects of policy/programme

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models School of Economics and Management - University of Geneva Christophe Hurlin, Université of Orléans University of Orléans April 2018 C. Hurlin (University of Orléans)

More information

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1 PANEL DATA RANDOM AND FIXED EFFECTS MODEL Professor Menelaos Karanasos December 2011 PANEL DATA Notation y it is the value of the dependent variable for cross-section unit i at time t where i = 1,...,

More information

Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models

Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models arxiv:1601.01981v1 [stat.me] 8 Jan 2016 James E. Pustejovsky Department of Educational Psychology

More information

Föreläsning /31

Föreläsning /31 1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

WILD CLUSTER BOOTSTRAP CONFIDENCE INTERVALS*

WILD CLUSTER BOOTSTRAP CONFIDENCE INTERVALS* L Actualité économique, Revue d analyse économique, vol 91, n os 1-2, mars-juin 2015 WILD CLUSTER BOOTSTRAP CONFIDENCE INTERVALS* James G MacKinnon Department of Economics Queen s University Abstract Confidence

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:

More information

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1 Markov-Switching Models with Endogenous Explanatory Variables by Chang-Jin Kim 1 Dept. of Economics, Korea University and Dept. of Economics, University of Washington First draft: August, 2002 This version:

More information

Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models

Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models James E. Pustejovsky and Elizabeth Tipton July 18, 2016 Abstract In panel data models and other

More information

1 The Multiple Regression Model: Freeing Up the Classical Assumptions

1 The Multiple Regression Model: Freeing Up the Classical Assumptions 1 The Multiple Regression Model: Freeing Up the Classical Assumptions Some or all of classical assumptions were crucial for many of the derivations of the previous chapters. Derivation of the OLS estimator

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

Inference for Clustered Data

Inference for Clustered Data Inference for Clustered Data Chang Hyung Lee and Douglas G. Steigerwald Department of Economics University of California, Santa Barbara March 23, 2017 Abstract This article introduces eclust, a new Stata

More information

A Course on Advanced Econometrics

A Course on Advanced Econometrics A Course on Advanced Econometrics Yongmiao Hong The Ernest S. Liu Professor of Economics & International Studies Cornell University Course Introduction: Modern economies are full of uncertainties and risk.

More information

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations. Exercises for the course of Econometrics Introduction 1. () A researcher is using data for a sample of 30 observations to investigate the relationship between some dependent variable y i and independent

More information

Economics 582 Random Effects Estimation

Economics 582 Random Effects Estimation Economics 582 Random Effects Estimation Eric Zivot May 29, 2013 Random Effects Model Hence, the model can be re-written as = x 0 β + + [x ] = 0 (no endogeneity) [ x ] = = + x 0 β + + [x ] = 0 [ x ] = 0

More information

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?

More information

Applied Econometrics. Lecture 3: Introduction to Linear Panel Data Models

Applied Econometrics. Lecture 3: Introduction to Linear Panel Data Models Applied Econometrics Lecture 3: Introduction to Linear Panel Data Models Måns Söderbom 4 September 2009 Department of Economics, Universy of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom,

More information

Heteroskedasticity. We now consider the implications of relaxing the assumption that the conditional

Heteroskedasticity. We now consider the implications of relaxing the assumption that the conditional Heteroskedasticity We now consider the implications of relaxing the assumption that the conditional variance V (u i x i ) = σ 2 is common to all observations i = 1,..., In many applications, we may suspect

More information

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE Chapter 6. Panel Data Joan Llull Quantitative Statistical Methods II Barcelona GSE Introduction Chapter 6. Panel Data 2 Panel data The term panel data refers to data sets with repeated observations over

More information

Day 4A Nonparametrics

Day 4A Nonparametrics Day 4A Nonparametrics A. Colin Cameron Univ. of Calif. - Davis... for Center of Labor Economics Norwegian School of Economics Advanced Microeconometrics Aug 28 - Sep 2, 2017. Colin Cameron Univ. of Calif.

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

the error term could vary over the observations, in ways that are related

the error term could vary over the observations, in ways that are related Heteroskedasticity We now consider the implications of relaxing the assumption that the conditional variance Var(u i x i ) = σ 2 is common to all observations i = 1,..., n In many applications, we may

More information

1. The Multivariate Classical Linear Regression Model

1. The Multivariate Classical Linear Regression Model Business School, Brunel University MSc. EC550/5509 Modelling Financial Decisions and Markets/Introduction to Quantitative Methods Prof. Menelaos Karanasos (Room SS69, Tel. 08956584) Lecture Notes 5. The

More information

Environmental Econometrics

Environmental Econometrics Environmental Econometrics Syngjoo Choi Fall 2008 Environmental Econometrics (GR03) Fall 2008 1 / 37 Syllabus I This is an introductory econometrics course which assumes no prior knowledge on econometrics;

More information

Notes on Generalized Method of Moments Estimation

Notes on Generalized Method of Moments Estimation Notes on Generalized Method of Moments Estimation c Bronwyn H. Hall March 1996 (revised February 1999) 1. Introduction These notes are a non-technical introduction to the method of estimation popularized

More information

INFERENCE WITH LARGE CLUSTERED DATASETS*

INFERENCE WITH LARGE CLUSTERED DATASETS* L Actualité économique, Revue d analyse économique, vol. 92, n o 4, décembre 2016 INFERENCE WITH LARGE CLUSTERED DATASETS* James G. MACKINNON Queen s University jgm@econ.queensu.ca Abstract Inference using

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information

Wild Cluster Bootstrap Confidence Intervals

Wild Cluster Bootstrap Confidence Intervals Wild Cluster Bootstrap Confidence Intervals James G MacKinnon Department of Economics Queen s University Kingston, Ontario, Canada K7L 3N6 jgm@econqueensuca http://wwweconqueensuca/faculty/macinnon/ Abstract

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Basic Regressions and Panel Data in Stata

Basic Regressions and Panel Data in Stata Developing Trade Consultants Policy Research Capacity Building Basic Regressions and Panel Data in Stata Ben Shepherd Principal, Developing Trade Consultants 1 Basic regressions } Stata s regress command

More information

A New Approach to Robust Inference in Cointegration

A New Approach to Robust Inference in Cointegration A New Approach to Robust Inference in Cointegration Sainan Jin Guanghua School of Management, Peking University Peter C. B. Phillips Cowles Foundation, Yale University, University of Auckland & University

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Panel Data: Fixed and Random Effects

Panel Data: Fixed and Random Effects Short Guides to Microeconometrics Fall 2016 Kurt Schmidheiny Unversität Basel Panel Data: Fixed and Random Effects 1 Introduction In panel data, individuals (persons, firms, cities, ) are observed at several

More information

Chapter 1. GMM: Basic Concepts

Chapter 1. GMM: Basic Concepts Chapter 1. GMM: Basic Concepts Contents 1 Motivating Examples 1 1.1 Instrumental variable estimator....................... 1 1.2 Estimating parameters in monetary policy rules.............. 2 1.3 Estimating

More information

Simultaneous Equations with Error Components. Mike Bronner Marko Ledic Anja Breitwieser

Simultaneous Equations with Error Components. Mike Bronner Marko Ledic Anja Breitwieser Simultaneous Equations with Error Components Mike Bronner Marko Ledic Anja Breitwieser PRESENTATION OUTLINE Part I: - Simultaneous equation models: overview - Empirical example Part II: - Hausman and Taylor

More information

NBER WORKING PAPER SERIES ROBUST STANDARD ERRORS IN SMALL SAMPLES: SOME PRACTICAL ADVICE. Guido W. Imbens Michal Kolesar

NBER WORKING PAPER SERIES ROBUST STANDARD ERRORS IN SMALL SAMPLES: SOME PRACTICAL ADVICE. Guido W. Imbens Michal Kolesar NBER WORKING PAPER SERIES ROBUST STANDARD ERRORS IN SMALL SAMPLES: SOME PRACTICAL ADVICE Guido W. Imbens Michal Kolesar Working Paper 18478 http://www.nber.org/papers/w18478 NATIONAL BUREAU OF ECONOMIC

More information

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL INTRODUCTION TO BASIC LINEAR REGRESSION MODEL 13 September 2011 Yogyakarta, Indonesia Cosimo Beverelli (World Trade Organization) 1 LINEAR REGRESSION MODEL In general, regression models estimate the effect

More information

We can relax the assumption that observations are independent over i = firms or plants which operate in the same industries/sectors

We can relax the assumption that observations are independent over i = firms or plants which operate in the same industries/sectors Cluster-robust inference We can relax the assumption that observations are independent over i = 1, 2,..., n in various limited ways One example with cross-section data occurs when the individual units

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Economics 326 Methods of Empirical Research in Economics Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Vadim Marmer University of British Columbia May 5, 2010 Multiple restrictions

More information

Dealing With Endogeneity

Dealing With Endogeneity Dealing With Endogeneity Junhui Qian December 22, 2014 Outline Introduction Instrumental Variable Instrumental Variable Estimation Two-Stage Least Square Estimation Panel Data Endogeneity in Econometrics

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September

More information

GLS. Miguel Sarzosa. Econ626: Empirical Microeconomics, Department of Economics University of Maryland

GLS. Miguel Sarzosa. Econ626: Empirical Microeconomics, Department of Economics University of Maryland GLS Miguel Sarzosa Department of Economics University of Maryland Econ626: Empirical Microeconomics, 2012 1 When any of the i s fail 2 Feasibility 3 Now we go to Stata! GLS Fixes i s Failure Remember that

More information

E cient Estimation and Inference for Di erence-in-di erence Regressions with Persistent Errors

E cient Estimation and Inference for Di erence-in-di erence Regressions with Persistent Errors E cient Estimation and Inference for Di erence-in-di erence Regressions with Persistent Errors Ryan Greenaway-McGrevy Bureau of Economic Analysis Chirok Han Korea University January 2014 Donggyu Sul Univeristy

More information

GMM estimation of spatial panels

GMM estimation of spatial panels MRA Munich ersonal ReEc Archive GMM estimation of spatial panels Francesco Moscone and Elisa Tosetti Brunel University 7. April 009 Online at http://mpra.ub.uni-muenchen.de/637/ MRA aper No. 637, posted

More information

We begin by thinking about population relationships.

We begin by thinking about population relationships. Conditional Expectation Function (CEF) We begin by thinking about population relationships. CEF Decomposition Theorem: Given some outcome Y i and some covariates X i there is always a decomposition where

More information

Introduction: structural econometrics. Jean-Marc Robin

Introduction: structural econometrics. Jean-Marc Robin Introduction: structural econometrics Jean-Marc Robin Abstract 1. Descriptive vs structural models 2. Correlation is not causality a. Simultaneity b. Heterogeneity c. Selectivity Descriptive models Consider

More information

Dynamic Panel Data Ch 1. Reminder on Linear Non Dynamic Models

Dynamic Panel Data Ch 1. Reminder on Linear Non Dynamic Models Dynamic Panel Data Ch 1. Reminder on Linear Non Dynamic Models Pr. Philippe Polomé, Université Lumière Lyon M EcoFi 016 017 Overview of Ch. 1 Data Panel Data Models Within Estimator First-Differences Estimator

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63 1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:

More information

Small-sample cluster-robust variance estimators for two-stage least squares models

Small-sample cluster-robust variance estimators for two-stage least squares models Small-sample cluster-robust variance estimators for two-stage least squares models ames E. Pustejovsky The University of Texas at Austin Context In randomized field trials of educational interventions,

More information

Lecture 8 Panel Data

Lecture 8 Panel Data Lecture 8 Panel Data Economics 8379 George Washington University Instructor: Prof. Ben Williams Introduction This lecture will discuss some common panel data methods and problems. Random effects vs. fixed

More information

Difference-in-Differences Estimation

Difference-in-Differences Estimation Difference-in-Differences Estimation Jeff Wooldridge Michigan State University Programme Evaluation for Policy Analysis Institute for Fiscal Studies June 2012 1. The Basic Methodology 2. How Should We

More information

Topic 7: Heteroskedasticity

Topic 7: Heteroskedasticity Topic 7: Heteroskedasticity Advanced Econometrics (I Dong Chen School of Economics, Peking University Introduction If the disturbance variance is not constant across observations, the regression is heteroskedastic

More information

Testing Weak Convergence Based on HAR Covariance Matrix Estimators

Testing Weak Convergence Based on HAR Covariance Matrix Estimators Testing Weak Convergence Based on HAR Covariance atrix Estimators Jianning Kong y, Peter C. B. Phillips z, Donggyu Sul x August 4, 207 Abstract The weak convergence tests based on heteroskedasticity autocorrelation

More information

ECON 594: Lecture #6

ECON 594: Lecture #6 ECON 594: Lecture #6 Thomas Lemieux Vancouver School of Economics, UBC May 2018 1 Limited dependent variables: introduction Up to now, we have been implicitly assuming that the dependent variable, y, was

More information

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley Models, Testing, and Correction of Heteroskedasticity James L. Powell Department of Economics University of California, Berkeley Aitken s GLS and Weighted LS The Generalized Classical Regression Model

More information

Small-Sample Methods for Cluster-Robust Inference in School-Based Experiments

Small-Sample Methods for Cluster-Robust Inference in School-Based Experiments Small-Sample Methods for Cluster-Robust Inference in School-Based Experiments James E. Pustejovsky UT Austin Educational Psychology Department Quantitative Methods Program pusto@austin.utexas.edu Elizabeth

More information

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5.

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5. Outline 1 Elena Llaudet 2 3 4 October 6, 2010 5 based on Common Mistakes on P. Set 4 lnftmpop = -.72-2.84 higdppc -.25 lackpf +.65 higdppc * lackpf 2 lnftmpop = β 0 + β 1 higdppc + β 2 lackpf + β 3 lackpf

More information

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid Applied Economics Panel Data Department of Economics Universidad Carlos III de Madrid See also Wooldridge (chapter 13), and Stock and Watson (chapter 10) 1 / 38 Panel Data vs Repeated Cross-sections In

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Econometrics Homework 1

Econometrics Homework 1 Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z

More information

QED. Queen s Economics Department Working Paper No. 1355

QED. Queen s Economics Department Working Paper No. 1355 QED Queen s Economics Department Working Paper No 1355 Randomization Inference for Difference-in-Differences with Few Treated Clusters James G MacKinnon Queen s University Matthew D Webb Carleton University

More information

xtunbalmd: Dynamic Binary Random E ects Models Estimation with Unbalanced Panels

xtunbalmd: Dynamic Binary Random E ects Models Estimation with Unbalanced Panels : Dynamic Binary Random E ects Models Estimation with Unbalanced Panels Pedro Albarran* Raquel Carrasco** Jesus M. Carro** *Universidad de Alicante **Universidad Carlos III de Madrid 2017 Spanish Stata

More information

Comment on HAC Corrections for Strongly Autocorrelated Time Series by Ulrich K. Müller

Comment on HAC Corrections for Strongly Autocorrelated Time Series by Ulrich K. Müller Comment on HAC Corrections for Strongly Autocorrelated ime Series by Ulrich K. Müller Yixiao Sun Department of Economics, UC San Diego May 2, 24 On the Nearly-optimal est Müller applies the theory of optimal

More information