Two-way fixed effects estimators with heterogeneous treatment effects

1 Two-way fixed effects estimators with heterogeneous treatment effects
Clément de Chaisemartin (UC Santa Barbara), Xavier D'Haultfœuille (CREST)
USC-INET workshop on Evidence to Inform Economic Policy

2 Two-way FE regressions are very pervasive
20% of empirical papers published by AER in [...] estimate treatment effects (TE) using regs with period and group fixed effects.
If common trends assumption holds, and if TE of all units is equal to a constant, then β, coeff. of treatment in those regs, is equal to that constant TE.
Often implausible that TE constant: TE may vary over time and across individuals.
E.g.: returns to schooling presumably heterogeneous across people, and some evidence that they are changing over time.
What do those regressions identify if TE heterogeneous?

3 Two-way FE reg. may be misleading if TE heterogeneous
Main result: without assuming constant TE, $\beta = \sum_{g,t} w_{g,t} \Delta_{g,t}$, where $w_{g,t}$: weights summing to 1, and $\Delta_{g,t}$: TE in group g at period t.
$w_{g,t} \neq$ proportion of units in (g, t), so $\beta \neq$ ATE.
But even worse, oftentimes, many weights $w_{g,t}$ are < 0. In 2 articles estimating two-way FE, we find that 50% of weights are < 0.
Then, β could be < 0 even if all the $\Delta_{g,t}$ are > 0.
Estimating weights = diagnostic of β's robustness to heterogeneous TE.
Stata progs. estimating weights available from my website.

4 We propose an alternative to two-way FE reg.
We propose an alternative estimand, the Wald-TC estimand.
Does not rely on constant TE, but on a conditional CT assumption.
Like the standard CT assumption, conditional CT can be partly tested by looking at pre-trends.
Identifies an easy-to-interpret Local Average Treatment Effect (LATE).
In applications, we find that our estimator and the two-way FE estimators are often very different, and sometimes even have opposite signs.
Our Wald-TC estimator is computed by the fuzzydid Stata package, available from the SSC repository.

5 Literature
de Chaisemartin and D'Haultfœuille (2018). Imai et al. (2018).
Several papers study the special case with staggered adoption: Borusyak and Jaravel (2017), Abraham and Sun (2018), Athey and Imbens (2018), Goodman-Bacon (2018), Callaway and Sant'Anna (2018).

6 Outline 1 Set-up 2 Intuition: 2 groups & 2 periods 3 Main results 4 Alternative estimand 5 Applications 6 Conclusion

7 Set-up and notations
Binary treatment D (for simplicity, all results extend to non-binary D).
Potential outcomes (Y(0), Y(1)) and Y = Y(D).
$\bar{t} + 1$ time periods, $T \in \{0, \dots, \bar{t}\}$, and $\bar{g} + 1$ groups, $G \in \{0, \dots, \bar{g}\}$.
Shortcut: for any var. R, $E(R_{.,t}) = E(R \mid T = t)$, $E(R_{g,t}) = E(R \mid G = g, T = t)$.

8 The FE regression
We are going to study two commonly used two-way FE regs.
FE regression: OLS of $E(Y \mid G, T)$ on group and time FE and $E(D \mid G, T)$. $\beta_{fe}$ = coeff. of $E(D \mid G, T)$.
Example, Enikolopov, Petrova, and Zhuravskaya (2011): study effect of NTV, independent TV channel introduced in 1996 in Russia, on the share of votes for opposition parties.
Signal quality heterogeneous: in some regions many people get access to NTV post 1996, in other regions few people.
$E(Y \mid G, T)$: % of votes for opposition in region G & year T.
$E(D \mid G, T)$: % of pop. having access to NTV in region G & year T.

9 The FD regression
FD regression: OLS of $E(Y \mid G, T) - E(Y \mid G, T-1)$ on time FE and $E(D \mid G, T) - E(D \mid G, T-1)$. $\beta_{fd}$ = coeff. of $E(D \mid G, T) - E(D \mid G, T-1)$.
Example, Gentzkow, Shapiro, and Sinkinson (2011): study whether newspapers affect electoral participation in the USA between 1868 and 1928.
Regress FD of turnout in county g between presidential elections t-1 and t on FD of the number of newspapers and state-year FE.

10 Outline 1 Set-up 2 Intuition: 2 groups & 2 periods 3 Main results 4 Alternative estimand 5 Applications 6 Conclusion

11 To gain intuition, let's start with 2 groups & 2 periods
Assume that there are 2 groups and 2 periods. Then,
$$\beta_{fe} = \beta_{fd} = \frac{E(Y_{1,1}) - E(Y_{1,0}) - E(Y_{0,1}) + E(Y_{0,0})}{E(D_{1,1}) - E(D_{1,0}) - E(D_{0,1}) + E(D_{0,0})}.$$
Rhs of previous display: $W_{DID}$, the Wald-DID estimand studied in C&D (2018).

12 The common trends assumption
A1 (Common trends): $E(Y(0) \mid G, T = 1) - E(Y(0) \mid G, T = 0)$ does not depend on G.
Usual CT condition: mean Y(0) follows the same trend in all groups.

13 No negative weights in sharp DID
Assume sharp design ($D = G \times T$). Then, if CT holds, well-known result:
$$E(Y_{1,1}) - E(Y_{1,0}) - E(Y_{0,1}) + E(Y_{0,0}) = \Delta^{TR}_{1,1}, \quad (1)$$
where $\Delta^{TR}_{g,t} = E(Y(1) - Y(0) \mid D = 1, G = g, T = t)$: ATT in (g, t).
CT: if nobody treated, mean of Y follows same trend in 2 groups.
Sharp design: 1 departure from the no-one-treated scenario: G = 1, T = 1 treated.
Diverging trends between 2 groups must come from TE in G = 1, T = 1.

14 Negative weights in fuzzy DID
Now fuzzy design: there can be treated units in both groups and periods:
$$E(Y_{1,1}) - E(Y_{1,0}) - E(Y_{0,1}) + E(Y_{0,0}) = E(D_{1,1})\Delta^{TR}_{1,1} - E(D_{1,0})\Delta^{TR}_{1,0} - E(D_{0,1})\Delta^{TR}_{0,1} + E(D_{0,0})\Delta^{TR}_{0,0}. \quad (2)$$
Potentially, four departures from scenario where no one treated. Discrepancy between trends of 2 groups can come from TE in any (g, t).
Let $DID_D = E(D_{1,1}) - E(D_{1,0}) - E(D_{0,1}) + E(D_{0,0})$, then
$$\beta_{fe} = \frac{E(D_{1,1})}{DID_D}\Delta^{TR}_{1,1} - \frac{E(D_{1,0})}{DID_D}\Delta^{TR}_{1,0} - \frac{E(D_{0,1})}{DID_D}\Delta^{TR}_{0,1} + \frac{E(D_{0,0})}{DID_D}\Delta^{TR}_{0,0}. \quad (3)$$
$\beta_{fe}$ identifies weighted sum of 4 ATTs, where 2 ATTs get < 0 weight.
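
A small numerical illustration of equation (3) may help. The sketch below computes the four weights from treatment shares and shows that the weighted sum can be negative even when every ATT is positive. The function name, the treatment shares, and the ATT values are all hypothetical and chosen only for illustration.

```python
def fuzzy_did_weights(d11, d10, d01, d00):
    """Weights on the four ATTs in the 2x2 fuzzy design, as in equation (3).

    dgt is the share of treated units in group g at period t
    (hypothetical inputs, not taken from any application).
    """
    did_d = d11 - d10 - d01 + d00
    if did_d == 0:
        raise ValueError("DID_D is zero: the Wald-DID is undefined")
    return {
        (1, 1): d11 / did_d,
        (1, 0): -d10 / did_d,
        (0, 1): -d01 / did_d,
        (0, 0): d00 / did_d,
    }


# Hypothetical shares: treatment rises from 40% to 80% in group 1
# and from 30% to 50% in group 0.
w = fuzzy_did_weights(d11=0.8, d10=0.4, d01=0.5, d00=0.3)

# Hypothetical ATTs, all strictly positive.
att = {(1, 1): 1.0, (1, 0): 3.0, (0, 1): 3.0, (0, 0): 1.0}

# Weighted sum of ATTs that beta_fe identifies under common trends.
beta_fe = sum(w[cell] * att[cell] for cell in w)
print(w)
print(beta_fe)
```

With these numbers the weights sum to 1, the cells (1, 0) and (0, 1) get negative weights, and the weighted sum is negative although all four ATTs are positive.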

15 Couldn't we impose more assumptions?
Does $\beta_{fe}$ identify something more meaningful under supplementary, plausible assumptions?

16 The treatment monotonicity assumption
D(0) and D(1): unit's treatment at period 0 and 1.
A2 (Treatment monotonicity): $P(D(1) \geq D(0) \mid G) = 1$ or $P(D(1) \leq D(0) \mid G) = 1$.
Within a group, there can't be both units switching from non-treatment to treatment and units making the opposite switch from T = 0 to T = 1.
Automatically holds when treatment at group × period level.
With individual-level treatment, similar to monotonicity in Imbens and Angrist (1994).
$S = \{D(0) \neq D(1), T = 1\}$: units switching treatment from T = 0 to 1.

17 The stable treatment effect assumption
A3 (Stable treatment effect): for all g,
$$E(Y(1) - Y(0) \mid D(0) = 1, G = g, T = 1) = E(Y(1) - Y(0) \mid D(0) = 1, G = g, T = 0).$$
In each group, ATT of units treated at T = 0 does not change from T = 0 to T = 1.
Restriction on TE heterogeneity over time. May or may not be plausible.

18 We may have no negative weights under CT+TM+STE
Remember, under CT we have:
$$\beta_{fe} = \frac{E(D_{1,1})}{DID_D}\Delta^{TR}_{1,1} - \frac{E(D_{1,0})}{DID_D}\Delta^{TR}_{1,0} - \frac{E(D_{0,1})}{DID_D}\Delta^{TR}_{0,1} + \frac{E(D_{0,0})}{DID_D}\Delta^{TR}_{0,0}. \quad (4)$$
Under CT, TM, and STE:
$$\beta_{fe} = \frac{E(D_{1,1}) - E(D_{1,0})}{DID_D}\Delta^{S}_{1,1} - \frac{E(D_{0,1}) - E(D_{0,0})}{DID_D}\Delta^{S}_{0,1}, \quad (5)$$
where $\Delta^{S}_{g,t} = E(Y(1) - Y(0) \mid S, G = g, T = t)$: switchers' LATE in (g, t).
Intuition: TE of units treated at T = 0 cancels in $E(D_{1,1})\Delta^{TR}_{1,1} - E(D_{1,0})\Delta^{TR}_{1,0}$ under STE.
$\beta_{fe}$ identifies a weighted sum of 2 LATEs:
If % treated increases in both groups, 1 LATE gets < 0 weight.
If % treated increases in group 1 & decreases in group 0, 0 LATE gets < 0 weight.
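
The two sign cases can be checked with a few lines of arithmetic. The sketch below computes the two weights on the switchers' LATEs in equation (5); the function name and the treatment shares are hypothetical.

```python
def switcher_weights(d11, d10, d01, d00):
    """Weights on the LATEs of group 1 and group 0 switchers, as in equation (5).

    dgt is the share of treated units in group g at period t (hypothetical inputs).
    Returns (weight on group 1 LATE, weight on group 0 LATE).
    """
    did_d = (d11 - d10) - (d01 - d00)
    if did_d == 0:
        raise ValueError("DID_D is zero: the Wald-DID is undefined")
    return (d11 - d10) / did_d, -(d01 - d00) / did_d


# Case 1: % treated increases in both groups -> one negative weight.
both_up = switcher_weights(d11=0.8, d10=0.4, d01=0.5, d00=0.3)

# Case 2: % treated increases in group 1 and decreases in group 0
# -> both weights positive.
opposite = switcher_weights(d11=0.8, d10=0.4, d01=0.2, d00=0.3)

print(both_up)
print(opposite)
```

In both cases the two weights sum to 1; only the direction of the change in group 0 decides whether one of them is negative.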

19 Outline 1 Set-up 2 Intuition: 2 groups & 2 periods 3 Main results 4 Alternative estimand 5 Applications 6 Conclusion

20 Results with 2 G & 2 T generalize to many G & many T
Theorem 1
1 If A1 holds, $\beta_{fe} = E[w_{fe,G,T}\,\Delta^{TR}_{G,T} \mid D = 1]$.
2 If A1-A3 hold, $\beta_{fe} = E[\omega_{fe,G,T}\,\Delta^{S}_{G,T} \mid S]$.
Under CT, FE = weighted sum of ATTs: $\beta_{fe} = \sum_{g,t} P(G = g, T = t \mid D = 1)\, w_{fe,g,t}\, \Delta^{TR}_{g,t}$.
Under CT+TM+STE, FE = weighted sum of LATEs: $\beta_{fe} = \sum_{g,t} P(G = g, T = t \mid S)\, \omega_{fe,g,t}\, \Delta^{S}_{g,t}$.
Similar results hold for FD estimand, with different weights.

21 The weights attached to FE and FD
Let $\varepsilon_{fe,g,t}$ = residual of units in group g and at period t in reg. of $E(D \mid G, T)$ on group & time FE.
Weights attached to FE under CT: $w_{fe,g,t} = \varepsilon_{fe,g,t} / E(\varepsilon_{fe,G,T} \mid D = 1)$.
(g, t) receiving < 0 weight = those with a < 0 residual. $w_{fe,g,t}$ can be estimated.
Similarly, weights attached to FD under CT are a function of $\varepsilon_{fd,g,t}$, residual of units in (g, t) in reg. of $E(D \mid G, T) - E(D \mid G, T-1)$ on time FE. Can also be estimated.
Weights attached to FE (resp. FD) under CT+TM+STE are also a function of $\varepsilon_{fe,g,t}$ (resp. $\varepsilon_{fd,g,t}$).
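
As a rough illustration of how the FE weights under CT could be estimated, the sketch below works on cell-level data: a groups × periods array of treatment shares and optional cell sizes. It is not the authors' Stata program. The function names (`twfe_weights`, `twfe_beta`) and the toy staggered design are hypothetical, and NumPy is assumed.

```python
import numpy as np


def _fe_design(n_groups, n_periods):
    """Dummies for every group and for every period except the first."""
    X = np.zeros((n_groups * n_periods, n_groups + n_periods - 1))
    for g in range(n_groups):
        for t in range(n_periods):
            X[g * n_periods + t, g] = 1.0
            if t > 0:
                X[g * n_periods + t, n_groups + t - 1] = 1.0
    return X


def twfe_weights(d, n=None):
    """Residual of E(D|g,t) on group and period FE, divided by its mean among treated units."""
    d = np.asarray(d, dtype=float)
    n = np.ones_like(d) if n is None else np.asarray(n, dtype=float)
    X = _fe_design(*d.shape)
    sw = np.sqrt(n.ravel())  # weight each cell by its number of units
    coef, *_ = np.linalg.lstsq(X * sw[:, None], d.ravel() * sw, rcond=None)
    eps = (d.ravel() - X @ coef).reshape(d.shape)
    treated = n * d  # number of treated units per cell
    return eps / ((treated * eps).sum() / treated.sum())


def twfe_beta(y, d, n=None):
    """Coefficient on d in the cell-level weighted OLS of y on group FE, period FE and d."""
    y, d = np.asarray(y, dtype=float), np.asarray(d, dtype=float)
    n = np.ones_like(d) if n is None else np.asarray(n, dtype=float)
    X = np.column_stack([_fe_design(*d.shape), d.ravel()])
    sw = np.sqrt(n.ravel())
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y.ravel() * sw, rcond=None)
    return coef[-1]


# Toy staggered adoption: group 0 treated from t = 1, group 1 from t = 2.
d = np.array([[0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
w = twfe_weights(d)

# Hypothetical ATTs (all > 0) and outcomes satisfying common trends.
delta = np.array([[0.0, 1.0, 4.0], [0.0, 0.0, 1.0]])
y = np.array([[1.0], [2.0]]) + np.array([0.0, 0.5, 1.5]) + d * delta

beta = twfe_beta(y, d)
decomposition = ((d / d.sum()) * w * delta).sum()  # sum of P(g,t|D=1) * w * ATT
print(w[d > 0], beta, decomposition)
```

In this toy design the early-adopting group gets a negative weight in the last period, and the regression coefficient coincides with the weighted sum of ATTs; it is negative although every ATT is positive. Weights returned for untreated cells are irrelevant, since they are multiplied by a zero share of treated units.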

22 When are all the weights attached to FE positive?
1 Under CT: weights attached to FE all $\geq 0$ only in very special cases (e.g.: sharp DID, 2 groups & periods). Even in staggered adoption designs, one of the many designs where two-way FE regressions have been used, and where groups go from fully untreated to fully treated at heterogeneous dates, some weights are < 0, even though this design is very close to a sharp DID.
2 Under CT+TM+STE: weights all $\geq 0$ in staggered adoption design. In this design, FE relies on STE, but not on homogeneous treatment effect across groups. Outside of this special case, very likely that some weights < 0.
When are all weights attached to FD positive? See slide 35.

23 A measure of robustness of FE and FD to heterogeneous TE
Let $\sigma^{TR} = V(\Delta^{TR}_{G,T} \mid D = 1)^{1/2}$. Std. dev. of ATTs across (g, t): amount of TE heterogeneity.
Corollary 1: If A1 holds, lowest value of $\sigma^{TR}$ such that $\beta_{fe}$ and $\Delta^{TR}$ can have opposite signs is
$$\sigma^{TR}_{fe} = \frac{|\beta_{fe}|}{V(w_{fe,G,T} \mid D = 1)^{1/2}}.$$
The larger $\sigma^{TR}_{fe}$, the more robust to heterogeneous TE FE is.
Similar lower bound on TE heterogeneity to have sign reversal can be obtained for FD under CT, and for FE and FD under CT+TM+STE.
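
The bound in Corollary 1 is a one-line computation once the weights are known. The sketch below assumes the bound is |β_fe| divided by the standard deviation of the FE weights among treated units, and takes as inputs the weights of the treated cells and the number of treated units in each cell. The function name and the numbers are hypothetical.

```python
import math


def sign_reversal_bound(beta_fe, weights, treated_counts):
    """|beta_fe| divided by the std. dev. of the FE weights among treated units.

    weights: weight of each treated (g, t) cell.
    treated_counts: number of treated units in each of those cells.
    """
    total = sum(treated_counts)
    p = [c / total for c in treated_counts]  # P(G = g, T = t | D = 1)
    mean_w = sum(pi * wi for pi, wi in zip(p, weights))
    var_w = sum(pi * (wi - mean_w) ** 2 for pi, wi in zip(p, weights))
    if var_w == 0:
        return math.inf  # constant weights: no sign reversal is possible
    return abs(beta_fe) / math.sqrt(var_w)


# Hypothetical inputs: three equally sized treated cells.
bound = sign_reversal_bound(beta_fe=-0.5, weights=[3.0, -1.5, 1.5],
                            treated_counts=[1, 1, 1])
print(bound)
```

A small value means that a modest amount of treatment effect heterogeneity is enough for the coefficient and the ATT to have opposite signs.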

24 Outline 1 Set-up 2 Intuition: 2 groups & 2 periods 3 Main results 4 Alternative estimand 5 Applications 6 Conclusion

25 The two assumptions underlying our estimand
A1: For $d \in \{0, 1\}$, $E(Y_{g,t}(d) \mid D(t-1) = d) - E(Y_{g,t-1}(d) \mid D(t-1) = d)$ does not depend on g.
Among units untreated in t-1, mean of Y(0) follows same evolution from t-1 to t in all groups.
Among units treated in t-1, mean of Y(1) follows same evolution from t-1 to t in all groups.
Common trends conditional on treatment in t-1.
A6: For all $t \geq 1$, there exists g such that $E(D_{g,t}) = E(D_{g,t-1})$.
For each t, there are groups whose exposure to treatment does not change between t-1 and t.
Often satisfied: Gentzkow et al., staggered adoption design...

26 $\Delta^{S}$ identified without any assumption on TE heterogeneity
Theorem 2: Suppose that A1, A2, and A6 hold. Then $W_{TC} = \Delta^{S}$.
$W_{TC}$ estimand defined in paper, but intuition on what it does given next slide.

27 What does $W_{TC}$ do?
Assume all units in same (g, t) have same D (e.g. Gentzkow et al.). Then $W_{TC}$ is a weighted average of:
1 DIDs comparing the groups that went from non-treatment in t-1 to treatment in t to groups untreated at both dates. Under A1, identifies TE in period t in the switching-in groups.
2 DIDs comparing the groups treated at both dates to those going from treatment in t-1 to non-treatment in t. Under A1, identifies TE in period t in the switching-out groups.
When treatment not constant in (g, t), what $W_{TC}$ does is a bit more complicated, but idea of comparing trends of units with same treatment in t-1 remains.
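
The two comparisons described above can be sketched for the simple case where treatment is binary and constant within each (g, t) cell. This is only an illustration of the idea, not the estimator implemented in fuzzydid: the function name is hypothetical, groups are assumed equally sized, and weighting each DID by its number of switching groups is an assumption of this sketch (the exact estimand is defined in the paper).

```python
import numpy as np


def wald_tc_sharp(y, d):
    """Average of DIDs that compare switching groups to groups with the same
    treatment in t-1 and an unchanged treatment in t.

    y, d: (groups x periods) arrays of mean outcomes and binary treatments.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    numerator, n_switch = 0.0, 0
    for t in range(1, y.shape[1]):
        dy = y[:, t] - y[:, t - 1]
        prev, cur = d[:, t - 1], d[:, t]
        stay0 = (prev == 0) & (cur == 0)   # untreated at both dates
        stay1 = (prev == 1) & (cur == 1)   # treated at both dates
        sw_in = (prev == 0) & (cur == 1)   # switching into treatment
        sw_out = (prev == 1) & (cur == 0)  # switching out of treatment
        if sw_in.any() and stay0.any():
            numerator += sw_in.sum() * (dy[sw_in].mean() - dy[stay0].mean())
            n_switch += int(sw_in.sum())
        if sw_out.any() and stay1.any():
            numerator += sw_out.sum() * (dy[stay1].mean() - dy[sw_out].mean())
            n_switch += int(sw_out.sum())
    if n_switch == 0:
        raise ValueError("no switching group with a valid comparison group")
    return numerator / n_switch


# Toy staggered design with a never-treated group and heterogeneous effects.
d = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
delta = np.array([[0.0, 1.0, 4.0], [0.0, 0.0, 3.0], [0.0, 0.0, 0.0]])
y = np.array([[1.0], [2.0], [3.0]]) + np.array([0.0, 0.5, 1.5]) + d * delta

estimate = wald_tc_sharp(y, d)
print(estimate)
```

In the toy data the two switching cells have effects 1 and 3, and the sketch returns their average. The effect of 4 in the already-treated cell does not enter, since that group does not switch.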

28 Outline 1 Set-up 2 Intuition: 2 groups & 2 periods 3 Main results 4 Alternative estimand 5 Applications 6 Conclusion

29 Enikolopov et al. (2011)
Regress share of vote for opposition in region g and year t on region FE, year FE, and share of people having access to NTV in region g and year t. $\beta_{fe}$ = 6.65 (s.e. = 1.40).
We estimate weights attached to FE under CT: 918 > 0, 1,020 < 0. So under CT, FE = weighted sum of 1,938 ATTs, of which 1,020 get < 0 weight.
Negative weights sum to [...] (s.e. = 0.06).
$\sigma^{TR}_{fe}$ = 0.91 (CI = [0.53, 1.28]): $\beta_{fe}$ and ATT could be of opposite signs even if std. dev. of effect of NTV across subregions is one percentage point.
2 periods (1995 & 1999), no one treated in 1995, so switchers = treated, and weights under CT+TM+STE same as under CT.
In every region, % of pop. having access to NTV > 0 in 1999, so we cannot compute Wald-TC.

30 Gentzkow et al. (2011): two-way FE estimators
Did newspapers affect electoral turnout in the US between 1868 and 1928?
Regress change in turnout from presidential election t-1 to t in county g on change in number of newspapers and state-year FE. Alternatively, one could estimate FE regression.
$\beta_{fd}$ = [...] (0.0009), $\beta_{fe}$ = [...] (0.0011), t-stat of diff: [...].
About half of the weights attached to FD and FE under CT are < 0.
All weights attached to FD under CT+TM+STE are > 0, while 25% of weights attached to FE are still < 0.
FD estimates a weighted average of TEs under CT+TM+STE.

31 Gentzkow et al. (2011): estimating Wald-TC
Still, FD relies on STE, not warranted (e.g. radio developed over period).
A6 holds here: at each t, there are counties whose number of newspapers does not change between t-1 and t.
$\hat{W}_{TC}$ = [...]: [...]% [...] than and significantly different from $\beta_{fd}$, opposite sign to $\beta_{fe}$.
Does not rely on any homogeneous TE assumption, just conditional CT.
Test of pre-trends indicates conditional CT plausible.

32 Outline 1 Set-up 2 Intuition: 2 groups & 2 periods 3 Main results 4 Alternative estimand 5 Applications 6 Conclusion

33 Take-away
Under CT, FE and FD regs identify weighted sums of ATTs. Weights can be estimated. Most often, many are < 0.
Under CT+TM+STE, FE and FD regs often still identify uninterpretable weighted sums of TEs. Exception: staggered adoption design.
We propose an alternative estimator that does not rely on TE homogeneity, and identifies an easily interpretable parameter, the LATE of all switchers. Can be used when for each pair of consecutive periods there are groups whose treatment does not change.
Our applications show that using FE, FD, or $W_{TC}$ can make a large difference.

34 Thank you!

35 When are all the weights attached to FD positive?
1 Under CT, weights attached to FD all $\geq 0$ only in very special cases.
2 Under CT+TM+STE, weights attached to FD all $\geq 0$ iff:
a) in groups where % treated increases from t-1 to t, the change $E(D_{g,t}) - E(D_{g,t-1})$ is $\geq E(D_{.,t}) - E(D_{.,t-1})$;
b) in groups where % treated decreases from t-1 to t, the change $E(D_{g,t}) - E(D_{g,t-1})$ is $\leq E(D_{.,t}) - E(D_{.,t-1})$.
Holds if $E(D_{.,t}) - E(D_{.,t-1}) = 0$ or close to 0. E.g.: Gentzkow et al. Also holds in staggered adoption designs.
Fails if e.g. % treated increases in all groups from t-1 to t, for some t.

36 The random weights assumptions
Under CT, FE and FD often identify uninterpretable weighted sums of ATTs with many < 0 weights. Adding TM+STE does not solve the problem.
Let's add more assumptions:
A4: $cov(\Delta^{TR}_{G,T}, w_{fe,G,T} \mid D = 1) = 0$.
A4: weights attached to FE under CT uncorrelated with ATTs.
Corollary 2: If A1 and A4 hold, $\beta_{fe} = \Delta^{TR}$.
FD also identifies $\Delta^{TR}$ under CT and similar random weights assumption.
Similar result can be obtained under CT+TM+STE.
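
The argument behind Corollary 2 can be sketched in one line. It uses the decomposition of $\beta_{fe}$ under CT from Theorem 1 and the fact that the FE weights average to 1 among treated units by construction, since each weight is a residual divided by the mean residual among the treated. Here $\Delta^{TR} = E(\Delta^{TR}_{G,T} \mid D = 1)$ denotes the overall ATT.

```latex
\begin{align*}
\beta_{fe}
  &= E\left[ w_{fe,G,T}\, \Delta^{TR}_{G,T} \mid D = 1 \right] \\
  &= cov\left( w_{fe,G,T},\, \Delta^{TR}_{G,T} \mid D = 1 \right)
     + E\left[ w_{fe,G,T} \mid D = 1 \right] E\left[ \Delta^{TR}_{G,T} \mid D = 1 \right] \\
  &= 0 + 1 \times \Delta^{TR} = \Delta^{TR}.
\end{align*}
```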

37 Random weights assumptions often implausible
A4 equivalent to $cov(\varepsilon_{fe,G,T}, \Delta^{TR}_{G,T} \mid D = 1) = 0$.
Fails if groups where a large proportion of units adopt the treatment ($\varepsilon_{fe,g,t} > 0$) are those with the largest treatment effect ($\Delta^{TR}_{g,t} > \Delta^{TR}$).
A4 incompatible with Roy model of selection into treatment.

38 The random weights assumptions are partly testable
Under CT, if A4 and corresponding assumption for FD hold, $\beta_{fe} = \beta_{fd}$.
Sometimes, the 2 are significantly different (e.g. Gentzkow et al.), so the two random weights assumptions cannot jointly hold.

39 Extensions We extend our results to non-binary treatments: the weights remain unchanged. We extend our results to FD and FE regressions with controls X : the weights remain unchanged. Our results are widely applicable.

40 The weights attached to FE and FD under CT+TM+STE
Let $s_{g,t} = \mathrm{sgn}(E(D_{g,t}) - E(D_{g,t-1}))$: $s_{g,t}$ = 1, 0, and -1 in groups where % treated increases, is stable, and decreases from t-1 to t.
Weights attached to FD under CT+TM+STE:
$$\omega_{fd,g,t} = \frac{s_{g,t}\,\varepsilon_{fd,g,t}}{E[s_{G,T}\,\varepsilon_{fd,G,T} \mid S]}.$$
$\omega_{fd,g,t}$ can be estimated.
Similarly, weights attached to FE under CT+TM+STE are a function of $\varepsilon_{fe,g,t}$ and $s_{g,t}$. Can also be estimated.
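
A rough sketch of how the FD weights under CT+TM+STE might be computed from cell-level treatment shares follows. It assumes the weight is the sign of the change in treatment times the FD residual, divided by the mean of that product among switchers. Two further assumptions are specific to this sketch and not taken from the slides: the number of switchers in a cell is taken to be the cell size times the absolute change in the treatment share, and period-t cell sizes are used as weights. The function name and the toy data are hypothetical.

```python
import numpy as np


def fd_switcher_weights(d, n=None):
    """Sign of the treatment change times the FD residual, scaled by its mean among switchers.

    d: (groups x periods) array of treatment shares E(D | g, t).
    n: optional (groups x periods) array of cell sizes.
    Returns a (groups x (periods - 1)) array of weights for t >= 1.
    """
    d = np.asarray(d, dtype=float)
    n = np.ones_like(d) if n is None else np.asarray(n, dtype=float)
    dd = d[:, 1:] - d[:, :-1]  # change in treatment share from t-1 to t
    nn = n[:, 1:]              # cell sizes at period t
    # Residual of the first difference on period FE: subtract the period mean.
    eps = dd - (nn * dd).sum(axis=0) / nn.sum(axis=0)
    s = np.sign(dd)
    switchers = nn * np.abs(dd)  # assumed number of switchers per cell
    if switchers.sum() == 0:
        raise ValueError("no switchers in the data")
    denom = (switchers * s * eps).sum() / switchers.sum()
    return s * eps / denom


# Treatment increases in every group: one weight is negative.
all_up = fd_switcher_weights([[0.0, 0.5], [0.0, 0.1]])

# Some groups switch in, some switch out, some are stable, and the
# average change is zero: no negative weight.
balanced = fd_switcher_weights([[0, 1], [0, 0], [1, 1], [1, 0]])

print(all_up.ravel(), balanced.ravel())
```

The two toy designs mirror the condition for non-negative FD weights: weights stay non-negative when the average change in treatment is zero, and turn negative for the group with the smaller increase when treatment rises everywhere.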

Two-way fixed effects estimators with heterogeneous treatment effects. Clément de Chaisemartin, Xavier D'Haultfœuille. April 26, 2018. Abstract: Around 20% of all empirical articles published by the American