Supplementary Appendix: Difference-in-Differences with Multiple Time Periods and an Application on the Minimum Wage and Employment

Brantly Callaway (Department of Economics, Temple University. Email: brantly.callaway@temple.edu)

Pedro H. C. Sant'Anna (Department of Economics, Vanderbilt University. Email: pedro.h.santanna@vanderbilt.edu)

August 31, 2018

This supplementary appendix contains (a) the proofs of the results stated in the main text; (b) results for the case where a researcher has access to repeated cross sections data rather than panel data; (c) extensions of our main results when using "not yet treated" observations as a control group; and (d) additional details on group-time average treatment effects under an unconditional parallel trends assumption, paying particular attention to the possibility of using regressions to estimate group-time average treatment effects.

Appendix A: Proofs of Main Results

We provide the proofs of our results in this appendix. Before proceeding, we first state and prove several auxiliary lemmas that help us prove our main theorems. Let $ATT_X(g,t) = E[Y_t(1) - Y_t(0) \mid X, G_g = 1]$.

Lemma A.1. Under Assumptions 1-4, and for $2 \le g \le t \le \mathcal{T}$,
\[
ATT_X(g,t) = E[Y_t - Y_{g-1} \mid X, G_g = 1] - E[Y_t - Y_{g-1} \mid X, C = 1] \quad \text{a.s.}
\]

Proof of Lemma A.1: In what follows, take all equalities to hold almost surely (a.s.). Notice that for identifying $ATT_X(g,t)$, the key term is $E[Y_t(0) \mid X, G_g = 1]$. And notice that, for $h > s$, $E[Y_s(0) \mid X, G_h = 1] = E[Y_s \mid X, G_h = 1]$, which holds because in time periods before an individual
is first treated, their untreated potential outcomes are observed outcomes. Also, note that, for $2 \le g \le t \le \mathcal{T}$,
\begin{align*}
E[Y_t(0) \mid X, G_g = 1] &= E[Y_t(0) - Y_{t-1}(0) \mid X, G_g = 1] + E[Y_{t-1}(0) \mid X, G_g = 1] \\
&= E[Y_t - Y_{t-1} \mid X, C = 1] + E[Y_{t-1}(0) \mid X, G_g = 1], \tag{A.1}
\end{align*}
where the first equality holds by adding and subtracting $E[Y_{t-1}(0) \mid X, G_g = 1]$, and the second equality holds by Assumption 2. If $g = t$, then the last term in the final equation is identified; otherwise, one can continue recursively in a similar way to (A.1) but starting with $E[Y_{t-1}(0) \mid X, G_g = 1]$. As a result,
\begin{align*}
E[Y_t(0) \mid X, G_g = 1] &= \sum_{j=0}^{t-g} E[Y_{t-j} - Y_{t-j-1} \mid X, C = 1] + E[Y_{g-1} \mid X, G_g = 1] \\
&= E[Y_t - Y_{g-1} \mid X, C = 1] + E[Y_{g-1} \mid X, G_g = 1]. \tag{A.2}
\end{align*}
Combining (A.2) with the fact that, for all $g \le t$, $E[Y_t(1) \mid X, G_g = 1] = E[Y_t \mid X, G_g = 1]$ (which holds because observed outcomes for group $g$ in period $t$ with $g \le t$ are treated potential outcomes) implies the result. $\blacksquare$

Next, recall that
\[
\hat{\pi}_g = \arg\max_{\pi} \sum_{i:\,G_{ig}+C_i=1} \left[ G_{ig} \ln\big(p_g(X_i'\pi)\big) + (1-G_{ig}) \ln\big(1-p_g(X_i'\pi)\big) \right],
\]
$\dot{p}_g(u) = \partial p_g(u)/\partial u$, $\dot{p}_g(X) = \dot{p}_g(X'\pi_g^0)$, and $\pi_g^0$ is the true, unknown vector of parameters indexing the generalized propensity score $p_g(X) = E[G_g \mid X, G_g + C = 1]$.

Lemma A.2. Under Assumption 5,
\[
\sqrt{n}\,(\hat{\pi}_g - \pi_g^0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_g^{\pi}(W_i) + o_p(1),
\quad \text{where} \quad
\xi_g^{\pi}(W) = E\!\left[\frac{(G_g+C)\,\dot{p}_g(X)^2}{p_g(X)\big(1-p_g(X)\big)}\, XX'\right]^{-1} X\, \frac{(G_g+C)\big(G_g - p_g(X)\big)\,\dot{p}_g(X)}{p_g(X)\big(1-p_g(X)\big)}.
\]

Proof of Lemma A.2: Let $n_{gc} = \sum_{i=1}^n (C_i + G_{ig})$. Under Assumption 5, from Theorem 5.39 and Example 5.40 in van der Vaart (1998), we have $\sqrt{n_{gc}}\,(\hat{\pi}_g - \pi_g^0)$
\begin{align*}
&= E\!\left[\left.\frac{\dot{p}_g(X)^2}{p_g(X)\big(1-p_g(X)\big)}\, XX' \,\right|\, G_g + C = 1\right]^{-1} \frac{1}{\sqrt{n_{gc}}} \sum_{i:\,G_{ig}+C_i=1} X_i\, \frac{\big(G_{ig} - p_g(X_i)\big)\,\dot{p}_g(X_i)}{p_g(X_i)\big(1-p_g(X_i)\big)} + o_p(1) \\
&= E[G_g+C]\,\sqrt{\frac{n}{n_{gc}}}\; \frac{1}{\sqrt{n}} \sum_{i=1}^n E\!\left[\frac{(G_g+C)\,\dot{p}_g(X)^2}{p_g(X)\big(1-p_g(X)\big)}\, XX'\right]^{-1} X_i\, \frac{(G_{ig}+C_i)\big(G_{ig} - p_g(X_i)\big)\,\dot{p}_g(X_i)}{p_g(X_i)\big(1-p_g(X_i)\big)} + o_p(1) \\
&= E[G_g+C]\,\sqrt{\frac{n}{n_{gc}}}\; \frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_g^{\pi}(W_i) + o_p(1).
\end{align*}
Thus, since $n_{gc}/n \to_p E[G_g+C]$,
\[
\sqrt{n}\,(\hat{\pi}_g - \pi_g^0) = \sqrt{\frac{n}{n_{gc}}}\, \sqrt{n_{gc}}\,(\hat{\pi}_g - \pi_g^0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_g^{\pi}(W_i) + o_p(1),
\]
and the proof is complete. $\blacksquare$

For an arbitrary $\pi$, let $p_g(x;\pi) = p_g(x'\pi)$ and $\dot{p}_g(x;\pi) = \dot{p}_g(x'\pi)$, for all $g = 2, \dots, \mathcal{T}$. Define the classes of functions
\begin{align*}
\mathcal{H}_{1,g} &= \left\{ (x,c) \mapsto \frac{c\,p_g(x;\pi)}{1-p_g(x;\pi)} : \pi \in \Pi_g \right\}, \\
\mathcal{H}_{2,g} &= \left\{ (x,c,y_t,y_{g-1}) \mapsto \frac{c\,p_g(x;\pi)}{1-p_g(x;\pi)}\,(y_t - y_{g-1}) : \pi \in \Pi_g \right\}, \\
\mathcal{H}_{3,g} &= \left\{ (x,c,y_t,y_{g-1}) \mapsto x\, \frac{c\,\dot{p}_g(x;\pi)}{\big(1-p_g(x;\pi)\big)^2}\,(y_t - y_{g-1}) : \pi \in \Pi_g \right\}, \\
\mathcal{H}_{4,g} &= \left\{ (x,c) \mapsto x\, \frac{c\,\dot{p}_g(x;\pi)}{\big(1-p_g(x;\pi)\big)^2} : \pi \in \Pi_g \right\}, \\
\mathcal{H}_{5,g} &= \left\{ (x,c,g_g) \mapsto x\, \frac{(g_g+c)\big(g_g - p_g(x;\pi)\big)\,\dot{p}_g(x;\pi)}{p_g(x;\pi)\big(1-p_g(x;\pi)\big)} : \pi \in \Pi_g \right\}.
\end{align*}

Lemma A.3. Under Assumptions 1 and 5, for all $g = 2, \dots, \mathcal{T}$ and $t = 2, \dots, \mathcal{T}$, the classes of functions $\mathcal{H}_{j,g}$, $j \in \{1, 2, \dots, 5\}$, are Donsker.

Proof of Lemma A.3: This follows from Example 19.7 in van der Vaart (1998). $\blacksquare$

Lemma A.4. Under Assumptions 1 and 5, the null hypothesis
\[
H_0: E[Y_t - Y_{t-1} \mid X, G_g = 1] - E[Y_t - Y_{t-1} \mid X, C = 1] = 0 \ \text{a.s. for all } 2 \le t < g \le \mathcal{T}
\]
can be equivalently written as
\[
H_0: E\!\left[\left( \frac{G_g}{E[G_g]} - \frac{\dfrac{p_g(X)\,C}{1-p_g(X)}}{E\!\left[\dfrac{p_g(X)\,C}{1-p_g(X)}\right]} \right) (Y_t - Y_{t-1}) \,\middle|\, X \right] = 0 \ \text{a.s. for all } 2 \le t < g \le \mathcal{T}.
\]

Proof of Lemma A.4: First note that
\[
E[Y_t - Y_{t-1} \mid X, G_g = 1] = \frac{E[G_g\,(Y_t - Y_{t-1}) \mid X]}{E[G_g \mid X]}.
\]
Analogously,
\[
E[Y_t - Y_{t-1} \mid X, C = 1] = \frac{E[C\,(Y_t - Y_{t-1}) \mid X]}{E[C \mid X]},
\]
implying that
\[
E[Y_t - Y_{t-1} \mid X, G_g = 1] - E[Y_t - Y_{t-1} \mid X, C = 1] = 0 \ \text{a.s. for all } 2 \le t < g \le \mathcal{T}
\]
if and only if
\[
E\!\left[\left( \frac{G_g}{E[G_g \mid X]} - \frac{C}{E[C \mid X]} \right)(Y_t - Y_{t-1}) \,\middle|\, X \right] = 0 \ \text{a.s. for all } 2 \le t < g \le \mathcal{T}.
\]
Given that, under Assumptions 4 and 5, $E[G_g + C \mid X] > 0$ a.s., the display above holds if and only if
\[
E\!\left[ \frac{1}{E[G_g + C \mid X]} \left( \frac{G_g}{E[G_g \mid X]} - \frac{C}{E[C \mid X]} \right)(Y_t - Y_{t-1}) \,\middle|\, X \right] = 0 \ \text{a.s. for all } 2 \le t < g \le \mathcal{T}. \tag{A.3}
\]
By noticing that
\[
p_g(X) = \frac{E[G_g \mid X]}{E[G_g + C \mid X]}, \qquad 1 - p_g(X) = \frac{E[C \mid X]}{E[G_g + C \mid X]},
\]
and that both of these are bounded away from zero under Assumption 5, we can rewrite (A.3) as
\[
E\!\left[\left( \frac{G_g}{E[G_g]} - \frac{\dfrac{p_g(X)\,C}{1-p_g(X)}}{E\!\left[\dfrac{p_g(X)\,C}{1-p_g(X)}\right]} \right)(Y_t - Y_{t-1}) \,\middle|\, X \right] = 0 \ \text{a.s. for all } 2 \le t < g \le \mathcal{T},
\]
since
\begin{align*}
E\!\left[\frac{p_g(X)\,C}{1-p_g(X)}\right]
&= E\!\left[\frac{E[G_g \mid X, C+G_g=1]}{E[C \mid X, C+G_g=1]}\, C\right]
= E\!\left[\frac{E[G_g \mid X]}{E[C \mid X]}\, C\right] \\
&= E\!\left[\frac{E[G_g \mid X]}{E[C \mid X]}\, E[C \mid X]\right]
= E\big[E[G_g \mid X]\big] = E[G_g]. \tag{A.4}
\end{align*}
This completes the proof. $\blacksquare$

Now, we are ready to proceed with the proofs of our main theorems.

Proof of Theorem 1: Given the result in Lemma A.1,
\begin{align*}
ATT(g,t) &= E\big[ATT_X(g,t) \mid G_g = 1\big] \\
&= E\big[\, E[Y_t - Y_{g-1} \mid X, G_g = 1] - E[Y_t - Y_{g-1} \mid X, C = 1] \mid G_g = 1 \big] \\
&:= E[A_X \mid G_g = 1] - E[B_X \mid G_g = 1],
\end{align*}
and we consider each term separately. For the first term,
\[
E[A_X \mid G_g = 1] = E[Y_t - Y_{g-1} \mid G_g = 1] = E\!\left[\frac{G_g}{E[G_g]}\,(Y_t - Y_{g-1})\right]. \tag{A.5}
\]
For the second term, by repetition of the law of iterated expectations, we have
\begin{align*}
E[B_X \mid G_g = 1]
&= E\big[\, E[Y_t - Y_{g-1} \mid X, C = 1] \mid G_g = 1 \big] \\
&= \frac{1}{E[G_g]}\, E\!\left[ E[G_g \mid X]\; \frac{E\big[C\,(Y_t - Y_{g-1}) \mid X, G_g + C = 1\big]}{1 - p_g(X)} \right] \\
&= \frac{1}{E[G_g]}\, E\!\left[ \frac{p_g(X)}{1 - p_g(X)}\; E[G_g + C \mid X]\; E\big[C\,(Y_t - Y_{g-1}) \mid X, G_g + C = 1\big] \right]
\end{align*}
\begin{align*}
&= \frac{1}{E[G_g]}\, E\!\left[ \frac{p_g(X)}{1 - p_g(X)}\; E\big[(G_g + C)\,C\,(Y_t - Y_{g-1}) \mid X\big] \right]
= \frac{1}{E[G_g]}\, E\!\left[ \frac{p_g(X)\,C}{1 - p_g(X)}\,(Y_t - Y_{g-1}) \right] \\
&= \frac{E\!\left[\dfrac{p_g(X)\,C}{1-p_g(X)}\,(Y_t - Y_{g-1})\right]}{E\!\left[\dfrac{p_g(X)\,C}{1-p_g(X)}\right]}, \tag{A.6}
\end{align*}
where (A.6) follows from (A.4). The proof is completed by combining (A.5) and (A.6). $\blacksquare$

Proof of Theorem 2: Remember that
\[
\widehat{ATT}(g,t) = \mathbb{E}_n\!\left[\frac{G_g}{\mathbb{E}_n[G_g]}\,(Y_t - Y_{g-1})\right] - \mathbb{E}_n\!\left[ \frac{\dfrac{\hat{p}_g(X)\,C}{1-\hat{p}_g(X)}}{\mathbb{E}_n\!\left[\dfrac{\hat{p}_g(X)\,C}{1-\hat{p}_g(X)}\right]}\,(Y_t - Y_{g-1}) \right] := \widehat{ATT}_G(g,t) - \widehat{ATT}_C(g,t),
\]
where $\mathbb{E}_n$ denotes the sample average, and
\[
ATT(g,t) = E\!\left[\frac{G_g}{E[G_g]}\,(Y_t - Y_{g-1})\right] - E\!\left[ \frac{\dfrac{p_g(X)\,C}{1-p_g(X)}}{E\!\left[\dfrac{p_g(X)\,C}{1-p_g(X)}\right]}\,(Y_t - Y_{g-1}) \right] := ATT_G(g,t) - ATT_C(g,t).
\]
In what follows, we will separately show that, for $2 \le g \le t \le \mathcal{T}$,
\[
\sqrt{n}\,\big(\widehat{ATT}_G(g,t) - ATT_G(g,t)\big) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_{gt}^G(W_i) + o_p(1) \tag{A.7}
\]
and
\[
\sqrt{n}\,\big(\widehat{ATT}_C(g,t) - ATT_C(g,t)\big) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_{gt}^C(W_i) + o_p(1). \tag{A.8}
\]
Then,
\[
\sqrt{n}\,\big(\widehat{ATT}(g,t) - ATT(g,t)\big) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_{gt}(W_i) + o_p(1)
\]
holds from (A.7) and (A.8), and the asymptotic normality result follows from the application of the multivariate central limit theorem.
7 Let β g = G g and β g = n G g, and note that βg β g ) = G ig G g ). Then, for all 2 g t T, by the continuous mapping theorem, ÂT T g g, t) AT T g g, t)) = βg n G g Y t Y g ) G g Y t Y g )) concluding the proof of A.7). G g Y t Y g ) ) n βg β g = n G ig Y it Y ig ) G g Y t Y g )) β g G g Y t Y g ) ) n βg β g + o p ) = = := Next we focus on A.8). For an arbitrary function g, let and note that β 2 g Gig Y it Y ig ) G ) ig G g Y t Y g ) + o β g βg 2 p ) G ig Y it Y ig ) AT T g g, t)) + o p ) β g ψgtw G i ) + o p ), w g) = g X) C g X), ÂT T C g, t) AT T C g, t)) = n n w ˆp g ) Y t Y g ) w p g ) Y t Y g )) n w ˆp g ) w p g) Y t Y g ) n w ˆp g ) w p g )) n w ˆp g ) w p g ) := n w ˆp g ) na n ˆp g ) AT T Cg, t) n w ˆp g ) nb n ˆp g ). From Assumption 5, Lemmas A.2 and A.3, and the continuous mapping theorem, n w ˆp g ) = w p g ) + o p ), AT T C g, t) n w ˆp g ) = AT T Cg, t) + o p ). w p g ) 7
8 Thus, ÂT T C g, t) AT T C g, t)) = w p g ) na n ˆp g ) Applying a classical mean value theorem argument, AT T Cg, t) w p g ) nb n ˆp g ) + o p ) A.9) A n ˆp g ) = n w p g ) Y t Y g ) w p g ) Y t Y g ) ) 2 C + n X ṗ g X; π g ) Y it Y ig ) ˆπg π 0 p g X; π g ) g), where π is an intermediate point that satisfies πg πg 0 ˆπg πg 0 a.s. Thus, by Assumption 5, Lemmas A.2 and A.3, and the Classical Glivenko-Cantelli s theorem, Analogously, A n ˆp g ) = n w p g ) Y t Y g ) w p g ) Y t Y g ) A.0) ) 2 C + X ṗ g X) Y it Y ig ) ˆπg π 0 p g X) g) ) + op n /2. A.) B n ˆp g ) = n w p g ) w p g ) ) 2 C + X ṗ g X) ˆπg π 0 p g X) g) ) + op n /2. A.2) Then, A.9), A.0), A.2) and Lemma A.2 yield A.8), concluding the proof. Proof of Theorem 3: Note that, by the conditional multiplier central limit theorem, see Lemma in van der Vaart and Wellner 996), as n, V i Ψ g t W i ) d N0, Σ), A.3) where Σ = Ψ g t W)Ψ g t W). Thus, to conclude the proof that ÂT T g t ÂT T g t ) d N0, Σ), it suffices to show that, for all 2 g t T, V i ψgt W i ) ψ gt W i ) = o p ). 8
9 Towards this, note that V i ψgt W i ) ψ gt W i ) = n V i V i ψg gtw i ) ψgtw G i ) ψc gtw i ) ψgtw C i ), A.4) where and with ψ G gtw) = G g Y t Y g ) ÂT T g g, t), n G g ψ gtw) C = w ˆp g) Y it Y ig ) ÂT T C g, t) + n w ˆp g ) M gt ξ g π W), w ˆp g ) = ˆp g X) C ˆp g X), ) 2 C n X ṗ ˆp g X) g X) Y it Y ig ) ÂT T g g, t) M gt =, n w ˆp g ) Gg + C) ṗ ξ g π g X) 2 W) = n ˆp g X) ˆp g X)) XX X G g + C) G g ˆp g X)) ṗ g X). ˆp g X) ˆp g X)) We will show that each term in A.4) is o p ). For the first term in A.4), we have = V i ψg gtw i ) ψgtw G i ) n G g G g n V i G ig Y it Y ig ) ÂT T g g, t) AT T g g, t) n V i G ig, = o p ), A.5) where the last equality follows from the results in Theorem, together with the law of large numbers, continuous mapping theorem, and Lemma in van der Vaart and Wellner 996). For the second term in A.4), we have = V i n w ˆp g ) n ψc gtw i ) ψgtw C i ) V i w i ˆp g ) w i p g )) Y it Y ig ) 9
10 + n w ˆp g ) + Mgt M gt ) n + M gt n V i ) n V i w i p g ) Y it Y ig ) w p g ) V i ξg π W i ). := A n + A 2n + A 3n + A 4n. ξπ g W i ) ξ π g W i ) From Lemma A.3, we have that H,g, H 2,g, H 3,g and H 5,g are Donsker, and by Assumption 5, w p g ) it is bounded away from zero. Thus, by a stochastic equicontinuity argument, Glivenko- Cantelli s theorem, continuous mapping theorem, and Theorem in van der Vaart and Wellner 996), implying that A n = o p ), A 2n = o p ), A 3n = o p ), and A 4n = o p ), V i From A.3)-A.6), it follows that ) ψc gtw i ) ψgtw C i ) = o p ). A.6) ÂT T g t ÂT T g t ) d N0, Σ). Finally, by the continuous mapping theorem, see e.g. Theorem 0.8 in Kosorok 2008), for any continuous functional Γ ) n )) d Γ ÂT T g t ÂT T g t Γ N0, V )), concluding our proof. Proof of Theorem 4: In order to prove the first part of Theorem 4, we first show that, under H 0, for all 2 t < g T, Ĵu, g, t, ˆp g ) = n ψ test ugt W i ) + o p n /2 ), Towards this end, we write Gg Ĵu, g, t, ˆp g ) = n n G g X u) Y t Y t ) ˆp g X) C n ˆp g X) X u) Y t Y t ) ˆpg X) C n ˆp g X) 0
11 and analyze each term separately. := ĴGu, g, t, ˆp g ) ĴCu, g, t, ˆp g ), As in the proof of Theorem, let β g = G g and β g = n G g. Applying a classical mean value theorem argument, uniformly in u X, Gg Ĵ G u, g, t, ˆp g ) = n X u) Y t Y t ) β g n G g X u) Y t Y t ) β 2 g where β g is an intermediate point that satisfies βg β g functions n G g G g. H 6,g = {x, g g, y t, y t ) g g y t y t ) {x u} : u X }. β g β g a.s.. Define the class of By xample 9. in van der Vaart 998), H 6,g is Donsker under Assumption 5. Furthermore, n G g G g = O p n /2 ). Thus, by the Glivenko-Cantelli s theorem and the continuous mapping theorem, uniformly in u X, Gg Ĵ G u, g, t, ˆp g ) = n G g X u) Y t Y t ) where J Gu, g, t, p g ) G g n G g G g + o p n /2 ) = n w G g Yt Y t ) X u) w G g X u) Y t Y t ) ) + J G u, g, t, p g ) + o p n /2 ), Gg J G u, g, t, p g ) = G g X u) Y t Y t ). A.7) We analyze ĴCu, g, t, ˆp g ) next. Applying a classical mean value theorem argument, uniformly in u X, Ĵ C u, g, t, ˆp g ) = ĴCu, g, t, p g ) n X C ṗ g X; π g ) p g X; π g )) 2 X u) Y t Y t ) + ˆπg π pg X; π g ) C g) 0 n p g X; π g )
12 n X C ṗ g X; π g ) pg X; π g ) C p g X; π g )) 2 n p g X; π g ) X u) Y t Y t ) ˆπg π pg X; π g ) C pg X; π g ) C g) 0 n n p g X; π g ) p g X; π g ) where π is an intermediate point that satisfies πg πg 0 ˆπg πg 0 a.s, and pg X) C n p g X) X u) Y t Y t ) Ĵ C u, g, t, p g ) =. pg X) C n p g X) Define the classes of functions { H 7,g = x, c, y t, y t ) p } g x; π) p g x; π) c y t y t ) {x u} : π Π g, u X, { } x; π) c y t y t ) {x u} H 8,g = x, c, y t, y t ) xṗg p g x; π)) 2 : π Π g, u X, { H 9,g = x, c) cp } g x; π) p g x; π) : π Π g,, { } ṗ g x; π) c H 0,g = x, c) x p g x; π)) 2 : π Π g. By xamples 9.7, 9., and 9.20 in van der Vaart 998), all these classes of functions are Donsker under Assumption 5. theorem, and Lemma A.2, uniformly in u X, for every g, t. Denote Thus, by the Glivenko-Cantelli s theorem, continuous mapping Ĵ C u, g, t, ˆp g ) = ĴCu, ) g, t, p g ) + Mugt test ˆπg π ) g 0 + op n /2, A.8) ˆβ C g pg X) C = n, β C pg X) C g =. p g X) p g X) Applying a classical mean value theorem argument, we have pg X) C n p g X) X u) Y t Y t ) Ĵ C u, g, t, p g ) = pg X) C p g X) pg X) C n p g X) X u) Y t Y t ) pg X) C ) 2 n βc g p g X) pg X) C p g X) 2
13 where β g C is an intermediate point that satisfies βc g βg C βc g βg C a.s.. Since H 7,g is a Donsker Class of functions and pg X) C n p g X) pg X) C = O ) p n /2, p g X) we have that, by the Glivenko-Cantelli s theorem and the continuous mapping theorem, uniformly in u X, Ĵ C u, g, t, p g ) = n w C g Y t Y t ) X u) wg C Y t Y t ) X u) pg X) C n pg X) C p g X) pg X) C + o ) p n /2 p g X) p g X) = n w C g Yt Y t ) X u) w C g X u) Y t Y t ) ) + J C u, g, t, p g ) + o p n /2 ). A.9) Hence, from A.7), A.8), A.9), and the asymptotic linear representation of ˆπ g π 0 g) in Lemma A.2, for every g, t, Ĵu, g, t, ˆp g ) = n ψ test ugt W) + J G u, g, t, p g ) J C u, g, t, p g )) + o p n /2 ) A.20) By noticing that under H 0, J G u, g, t, p g ) = J C u, g, t, p g ) for all u X, g, t) such that 2 t < g T, we have that, under H 0, uniformly in u X, for all 2 t < g T Ĵu, g, t, ˆp g ) = n ψ test ugt W i ) + o p n /2 ). In order to show that Ĵg>t u) Gu) in l X ), it suffices to show that the class of functions is Donsker. H 0 = { x, g g, c, y t, y t ) ψ test ugt : u X, 2 t < g T } This follows straightforwardly from the previously discussed Donsker results and xample 9.20 in van der Vaart 998). Finally, CvM d n Gu) 2 M F X du) follows from the continuous mapping theorem, and the Helly-Bray Theorem. X sup F n,x u) F X u) = o a.s. ), u X Next, we study the behavior of CvM n under H. First, note that under H, for some u X, 3
and some $(g,t)$ with $2 \le t < g \le \mathcal{T}$, $J(u,g,t,p_g) \ne 0$. Thus, from (A.20), under $H_1$, uniformly in $u \in \mathcal{X}$, $\hat{J}_{g>t}(u) = J_{g>t}(u) + O_p(n^{-1/2})$, implying that $CvM_n$ diverges to infinity under $H_1$. Because $\hat{c}_{1-\alpha}^{CvM} = O(1)$ a.s., as $n \to \infty$,
\[
P\big(CvM_n > \hat{c}_{1-\alpha}^{CvM}\big) \to 1,
\]
concluding the proof of Theorem 4. $\blacksquare$

Proof of Theorem 5: In the proof of Theorem 4, we have shown that
\[
\mathcal{H} = \big\{ (x, g_g, c, y_t, y_{t-1}) \mapsto \psi_{ugt}^{test} : u \in \mathcal{X},\ 2 \le t < g \le \mathcal{T} \big\}
\]
is a Donsker class of functions. Then, by the conditional multiplier functional central limit theorem (see Theorem 2.9.6 in van der Vaart and Wellner (1996)), as $n \to \infty$,
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n V_i\, \Psi_{g>t}^{test}(W_i) \rightsquigarrow_B G(u) \ \text{in } \ell^\infty(\mathcal{X}),
\]
where $G(u)$ is the same Gaussian process as in Theorem 4 and $\rightsquigarrow_B$ indicates weak convergence in probability under the bootstrap law. Thus, to conclude the proof, it suffices to show that, for all $2 \le t < g \le \mathcal{T}$, uniformly in $u \in \mathcal{X}$,
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n V_i \big( \hat{\psi}_{ugt}^{test}(W_i) - \psi_{ugt}^{test}(W_i) \big) = o_p(1). \tag{A.21}
\]
The proof of (A.21) follows exactly the same steps as the proof of Theorem 3, and is therefore omitted. $\blacksquare$

Appendix B: Additional Results for Repeated Cross Sections

In this section we extend our results to the case with repeated cross sections data instead of panel data. Here we assume that, for each individual in the pooled sample, we observe $(Y, G_1, \dots, G_{\mathcal{T}}, C, T, X)$, where $T \in \{1, \dots, \mathcal{T}\}$ denotes the time period in which that individual is observed. Let $T_t = 1$ if an observation is observed at time $t$, and zero otherwise. We assume that random samples are available for each time period.
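The sampling scheme just described can be made concrete with a small simulation. The sample size, number of periods, and sampling shares below are purely illustrative, and the variable names are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 1000, 4
lam = np.array([0.25, 0.25, 0.25, 0.25])   # lambda_t = P(T = t), illustrative shares

# Each individual is observed in exactly one period T_i, drawn with probabilities lambda_t
T_i = rng.choice(np.arange(1, T + 1), size=n, p=lam)

# Time-period indicators: T_ind[i, t-1] = 1 if individual i is observed at time t
T_ind = np.stack([(T_i == t).astype(int) for t in range(1, T + 1)], axis=1)

# Every unit is observed exactly once, so each row has exactly one nonzero entry
assert (T_ind.sum(axis=1) == 1).all()
```

Conditioning on a column of `T_ind` then recovers a random sample from the corresponding period, which is all that the random-sampling condition above requires.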
Assumption B.1. Conditional on $T = t$, the data are independent and identically distributed draws from the distribution of $(Y_t, G_1, \dots, G_{\mathcal{T}}, C, X)$, for all $t = 1, \dots, \mathcal{T}$.

Assumption B.1 implies that our sample consists of random draws from the mixture distribution
\[
F_M(y, g_1, \dots, g_{\mathcal{T}}, c, t, x) = \sum_{t=1}^{\mathcal{T}} \lambda_t\, F_{Y_t, G_1, \dots, G_{\mathcal{T}}, C, X \mid T}(y, g_1, \dots, g_{\mathcal{T}}, c, x \mid t),
\]
where $\lambda_t = P(T_t = 1)$. Notice that, once one conditions on the time period, expectations under the mixture distribution correspond to population expectations. Also, because $X$, $G_g$, and $C$ are observed for all individuals, one can use draws from the mixture distribution to estimate the generalized propensity score. With some abuse of notation, we then use $p_g(X)$ as a short notation for $E_M[G_g \mid X, G_g + C = 1]$, where $E_M$ denotes expectations with respect to $F_M(\cdot)$. Define the stabilized weights
\[
w^{treat}(a,b) = \frac{T_b\, G_a}{E_M[T_b\, G_a]}, \qquad
w^{cont}(a,b) = \frac{\dfrac{T_b\, p_a(X)\, C}{1 - p_a(X)}}{E_M\!\left[\dfrac{T_b\, p_a(X)\, C}{1 - p_a(X)}\right]},
\]
where $a, b = 1, 2, \dots, \mathcal{T}$.

Theorem B.1. Under Assumption B.1 and Assumptions 2-4 in the main text, for $2 \le g \le t \le \mathcal{T}$, the group-time average treatment effect for group $g$ in period $t$ is nonparametrically identified, and given by
\[
ATT(g,t) = E_M\big[\big(w^{treat}(g,t) - w^{treat}(g,g-1)\big)\, Y\big] - E_M\big[\big(w^{cont}(g,t) - w^{cont}(g,g-1)\big)\, Y\big].
\]

Proof of Theorem B.1: By the law of iterated expectations, Assumption B.1 and Assumption 3 in the main text, for all $2 \le g \le t \le \mathcal{T}$,
\[
E_M\big[w^{treat}(g,t)\, Y\big] = \frac{E_M[T_t\, G_g\, Y]}{E_M[T_t\, G_g]} = \frac{E[G_g\, Y \mid T_t = 1]}{E[G_g \mid T_t = 1]} = E[Y \mid T_t = 1, G_g = 1] = E[Y_t(1) \mid G_g = 1].
\]
To complete the proof of Theorem B.1, we must show that
\[
E_M\big[\big(w^{treat}(g,g-1) + w^{cont}(g,t) - w^{cont}(g,g-1)\big)\, Y\big] = E[Y_t(0) \mid G_g = 1]. \tag{B.1}
\]
16 Towards this, from Assumption B. and proceeding as in Lemma A., we get Y t 0) X, G g = =Y 0) X, G g =, T t = From the above result, it follows that = Y X, G g =, T g = B.2) + Y X, C =, T t = Y X, C =, T g =. Y t 0) X, G g = = Y X, G g =, T g = G g =, T g = + Y X, C =, T t = G g =, T t = B.3) Y X, C =, T g = G g =, T g =. We consider each term separately. For the first term of B.3), Y X, G g =, T g = G g =, T g = = Y G g =, T g = = M w treat t, g) Y. B.4) Let Y X, C =, T t = = A C=,Tt= X), and note that, by repeated application of the law of iterated expectations as in the proof of Theorem, we have that for the second term of B.3), A C=,Tt= X) G g =, T t = = G g T t = pg X) C p g X)) Y T t = = M G g T t Tt p g X) C M p g X)) Y = M w cont g, t) Y, B.5) where the last equality follows from p g X) := M G g X, G g + C =, and Tt p g X) C M = M T t M G g X, C + G g = C p g X)) M C X, C + G g = = M T t M G g X C M C X M G g X M C X = M M C X = M T t G g X = M T t G g. Following analogous steps, we get that, for the third term of B.3), A C=,Tg = X) Gg =, T g = = M w cont g, g ) Y. B.6) 6
Then, (B.1) follows by combining (B.4), (B.5) and (B.6). The proof of Theorem B.1 is therefore completed. $\blacksquare$

The identification results in Theorem B.1 suggest a simple two-step estimation procedure for the $ATT(g,t)$ with repeated cross-section data. Similar to the panel data case discussed in the main text, we propose to estimate $ATT(g,t)$ by
\[
\widehat{ATT}(g,t) = \mathbb{E}_n\big[\big(\hat{w}^{treat}(g,t) - \hat{w}^{treat}(g,g-1)\big)\, Y\big] - \mathbb{E}_n\big[\big(\hat{w}^{cont}(g,t;\hat{p}) - \hat{w}^{cont}(g,g-1;\hat{p})\big)\, Y\big],
\]
where $\hat{p}_g(\cdot)$ is an estimate of $p_g(\cdot)$ and, for $a, b = 1, 2, \dots, \mathcal{T}$,
\[
\hat{w}^{treat}(a,b) = \frac{T_b\, G_a}{\mathbb{E}_n[T_b\, G_a]}, \qquad
\hat{w}^{cont}(a,b;\hat{p}) = \frac{\dfrac{T_b\, \hat{p}_a(X)\, C}{1 - \hat{p}_a(X)}}{\mathbb{E}_n\!\left[\dfrac{T_b\, \hat{p}_a(X)\, C}{1 - \hat{p}_a(X)}\right]}.
\]
Next, we show that $\widehat{ATT}(g,t)$ is $\sqrt{n}$-consistent, admits an asymptotically linear representation, and is asymptotically normal. These results are analogous to Theorem 2 in the main text. Let $ATT_{g \le t}$ and $\widehat{ATT}_{g \le t}$ denote the vectors of $ATT(g,t)$ and $\widehat{ATT}(g,t)$, respectively, for all $g = 2, \dots, \mathcal{T}$ and $t = 2, \dots, \mathcal{T}$ with $g \le t$. Define
\[
\psi_{g,t}^{rc}(W_i) = \big(\psi_{g,t}^{rc,G}(W_i) - \psi_{g,g-1}^{rc,G}(W_i)\big) - \big(\psi_{g,t}^{rc,C}(W_i) - \psi_{g,g-1}^{rc,C}(W_i)\big),
\]
where, for $g, t = 1, 2, \dots, \mathcal{T}$,
\[
\psi_{g,t}^{rc,G}(W) = w^{treat}(g,t)\, Y - E_M\big[w^{treat}(g,t)\, Y\big]
\]
and
\[
\psi_{g,t}^{rc,C}(W) = w^{cont}(g,t)\, Y - E_M\big[w^{cont}(g,t)\, Y\big] + M_{g,t}^{rc\,\prime}\, \xi_g^{\pi}(W),
\]
with
\[
M_{g,t}^{rc} = \frac{E_M\!\left[ X\, \dfrac{T_t\, C\, \dot{p}_g(X)}{\big(1 - p_g(X)\big)^2}\, \big( Y - E_M[w^{cont}(g,t)\, Y] \big) \right]}{E_M\!\left[\dfrac{T_t\, p_g(X)\, C}{1 - p_g(X)}\right]},
\]
which is a $k \times 1$ vector, with $k$ the dimension of $X$, and $\xi_g^{\pi}(W)$ is as defined in (3.1) in the main text. Finally, let $\Psi_{g \le t}^{rc}$ denote the collection of $\psi_{g,t}^{rc}$ across all periods $t$ and groups $g$ such that $g \le t$.

Theorem B.2. Under Assumption B.1 and Assumptions 2-5 in the main text, for $2 \le g \le t \le \mathcal{T}$,
\[
\sqrt{n}\,\big(\widehat{ATT}(g,t) - ATT(g,t)\big) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_{g,t}^{rc}(W_i) + o_p(1).
\]
18 Furthermore, ÂT T g t AT T g t ) d N0, Σ rc ) where Σ rc = M Ψ rc g tw)ψ rc g tw). Proof of Theorem B.2: The proof of Theorem B.2 follows the same steps as the Proof of Theorem 2. From Theorem B., for each 2 g t T we can write ÂT T g, t) AT T g, t)) = n ŵ treat g, t) Y M w treat g, t) Y ) n ŵ treat g, g ) Y M w treat g, g ) Y ) n ŵ cont g, t; ˆp) Y M w cont g, t) Y ) + n ŵ cont g, g ; ˆp) Y M w cont g, g ) Y ). B.7) We analyze each term separately. First, note that, for each 2 g t T, n T t G g M T t G g ) = Then, by the continuous mapping theorem, Analogously, T it G ig T t G g ). n ŵ treat g, t) Y M w treat g, t) Y ) = ψ rc,g g,t W i ) + o p ). B.8) n ŵ treat g, g ) Y M w treat g, g ) Y ) = ψ rc,g g,g W i ) + o p ). B.9) Next we focus on n ŵ cont g, t; ˆp) Y M w cont g, t) Y ). To simplify notation, write w a,b p) = T b p a X) C p a X), and note that ŵ cont g, t; ˆp) = w g,t ˆp) / n w g,t ˆp) and w cont g, t; p) = w g,t p) / M w g,t p). Then, n ŵ cont g, t; ˆp) Y M w cont g, t) Y ) = n n w g,t ˆp) Y M w g,t p) Y ) n w g,t ˆp) M w g,t p) Y n n w g,t ˆp) M w g,t p)) n w g,t ˆp) M w g,t p) 8
19 := na rc n, g,t ˆp g ) M w cont g, t) Y nb rc n, g,t ˆp g ). n w g,t ˆp) n w g,t ˆp) From Assumption 5, Lemmas A.2 and A.3, and the continuous mapping theorem, Thus, n w g,t ˆp) = M w g,t p) + o p ), M w cont g, t) Y n w g,t ˆp) = M w cont g, t) Y M w g,t p) + o p ). n ŵ cont g, t; ˆp) Y M w cont g, t) Y ) na rc = n, g,t ˆp g ) M w g,t p) M w cont g, t) Y nb rc n, g,t ˆp g ) + o p ) B.0) M w g,t p) Applying a classical mean value theorem argument, A rc n, g,t ˆp g ) = n w g,t p) Y M w g,t p) Y + n X T t C p g X; π g ) ) 2 ṗ g X; π g ) Y ˆπg π 0 g), where π is an intermediate point that satisfies πg πg 0 ˆπg πg 0 a.s. Thus, by Assumption 5, Lemmas A.2 and A.3, and the Glivenko-Cantelli s theorem, A rc n, g,t ˆp g ) = n w g,t p) Y M w g,t p) Y ) 2 Tt C + M X ṗ g X) Y p g X) Analogously, ˆπg π 0 g) + op n /2 ). B.) B n ˆp g ) = n w g,t p) M w g,t p) + M X Combining B.0), B.), B.2) with Lemma A.2 yield n ŵ cont g, t; ˆp) Y M w cont g, t) Y ) = ) 2 Tt C ṗ g X) ˆπg π 0 p g X) g) ) + op n /2. B.2) ψ rc,c g,t W i ) + o p ). B.3) 9
Using the same arguments, we conclude that
\[
\sqrt{n}\,\big(\mathbb{E}_n\big[\hat{w}^{cont}(g,g-1;\hat{p})\, Y\big] - E_M\big[w^{cont}(g,g-1)\, Y\big]\big) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_{g,g-1}^{rc,C}(W_i) + o_p(1). \tag{B.14}
\]
Hence, from (B.7), (B.8), (B.9), (B.13) and (B.14), we conclude that, for each $2 \le g \le t \le \mathcal{T}$,
\[
\sqrt{n}\,\big(\widehat{ATT}(g,t) - ATT(g,t)\big) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_{g,t}^{rc}(W_i) + o_p(1).
\]
The proof is then completed by applying the multivariate central limit theorem. $\blacksquare$

Based on the above results, one can conclude that estimation and inference procedures for $ATT(g,t)$ in the case of repeated cross sections are similar to those in the case with panel data; in fact, one simply needs to adjust the weights slightly. In order to conduct asymptotically valid simultaneous inference, one can leverage the asymptotic linear representation in Theorem B.2 and use a multiplier bootstrap procedure analogous to the one in Theorem 3. The proof of the bootstrap validity in the repeated cross sections case follows exactly the same steps as in Theorem 3, and is therefore omitted.

Appendix C: Analysis with Not Yet Treated as a Control Group

In this appendix, we discuss the case where one considers the "not yet treated" instead of the "never treated" as a control group. This case is particularly relevant in applications where eventually (almost) all units are treated, though the timing of the treatment differs across groups. To carry out this analysis, we make the following assumptions.

Assumption C.1. $\{Y_{i1}, Y_{i2}, \dots, Y_{i\mathcal{T}}, X_i, D_{i1}, D_{i2}, \dots, D_{i\mathcal{T}}\}_{i=1}^n$ is independent and identically distributed (iid).

Assumption C.2. For all $t = 2, \dots, \mathcal{T}$ and $g = 2, \dots, \mathcal{T}$ such that $g \le t$,
\[
E[Y_t(0) - Y_{t-1}(0) \mid X, G_g = 1] = E[Y_t(0) - Y_{t-1}(0) \mid X, D_t = 0] \quad \text{a.s.}
\]

Assumption C.3. For $t = 2, \dots, \mathcal{T}$, $D_{t-1} = 1$ implies that $D_t = 1$.

Assumption C.4. For all $t = 2, \dots, \mathcal{T}$ and $g = 2, \dots, \mathcal{T}$, $P(G_g = 1) > 0$ and $P(D_t = 1 \mid X) < 1$ a.s.
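Assumption C.3 says that treatment adoption is staggered: once a unit becomes treated, it stays treated. This condition is mechanical and can be checked directly in data. A minimal sketch, assuming the treatment indicators are stored as an n-by-T array of zeros and ones (the function name is ours, not from the text):

```python
import numpy as np

def staggered_adoption_holds(D):
    """Check Assumption C.3: D[i, t-1] = 1 implies D[i, t] = 1, i.e.
    every unit's treatment path is non-decreasing over time.

    D : (n_units, n_periods) array of 0/1 treatment indicators.
    """
    # First differences along the time axis are >= 0 iff no unit ever reverts
    return bool(np.all(np.diff(D, axis=1) >= 0))
```

For example, a path `[0, 0, 1, 1]` satisfies the assumption, while `[0, 1, 0, 1]` (a treatment reversal) does not.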
Assumptions C.1 and C.3 are the same as Assumptions 1 and 3 in the main text. Assumptions C.2 and C.4 are the analogues of Assumptions 2 and 4, but using the not yet treated ($D_t = 0$) as a control group instead of the never treated ($C = 1$, or equivalently $D_{\mathcal{T}} = 0$). Note that Assumption C.4 rules out the case in which eventually everyone is treated; in those time periods there is no control group available, and therefore the data themselves are not informative about the average treatment effect when $D_t = 1$ a.s. In such cases, one should concentrate attention only on the time periods such that $P(D_t = 1 \mid X) < 1$ a.s.

Remember that $ATT_X(g,t) = E[Y_t(1) - Y_t(0) \mid X, G_g = 1]$. The next lemma states that, under Assumptions C.1-C.4, we can identify $ATT_X(g,t)$ for $2 \le g \le t \le \mathcal{T}$. This is the analogue of Lemma A.1.

Lemma C.1. Under Assumptions C.1-C.4, and for $2 \le g \le t \le \mathcal{T}$,
\[
ATT_X(g,t) = E[Y_t - Y_{g-1} \mid X, G_g = 1] - E[Y_t - Y_{g-1} \mid X, D_t = 0] \quad \text{a.s.}
\]

Proof of Lemma C.1: In what follows, take all equalities to hold almost surely (a.s.). Notice that for identifying $ATT_X(g,t)$, the key term is $E[Y_t(0) \mid X, G_g = 1]$. And notice that, for $h > s$, $E[Y_s(0) \mid X, G_h = 1] = E[Y_s \mid X, G_h = 1]$, which holds because in time periods before an individual is first treated, their untreated potential outcomes are observed outcomes. Also, note that, for $2 \le g \le t \le \mathcal{T}$,
\begin{align*}
E[Y_t(0) \mid X, G_g = 1] &= E[Y_t(0) - Y_{t-1}(0) \mid X, G_g = 1] + E[Y_{t-1}(0) \mid X, G_g = 1] \\
&= E[Y_t - Y_{t-1} \mid X, D_t = 0] + E[Y_{t-1}(0) \mid X, G_g = 1], \tag{C.1}
\end{align*}
where the first equality holds by adding and subtracting $E[Y_{t-1}(0) \mid X, G_g = 1]$ and the second equality holds by Assumption C.2. If $g = t$, then the last term in the final equation is identified; otherwise, one can continue recursively in a similar way to (C.1) but starting with $E[Y_{t-1}(0) \mid X, G_g = 1]$. As a result,
\begin{align*}
E[Y_t(0) \mid X, G_g = 1] &= \sum_{j=0}^{t-g} E[Y_{t-j} - Y_{t-j-1} \mid X, D_t = 0] + E[Y_{g-1} \mid X, G_g = 1] \\
&= E[Y_t - Y_{g-1} \mid X, D_t = 0] + E[Y_{g-1} \mid X, G_g = 1]. \tag{C.2}
\end{align*}
Combining (C.2) with the fact that, for all $g \le t$, $E[Y_t(1) \mid X, G_g = 1] = E[Y_t \mid X, G_g = 1]$ (which holds because observed outcomes for group $g$ in period $t$ with $g \le t$ are treated potential outcomes) implies the result. $\blacksquare$

With the result of Lemma C.1
in hand, we proceed to show that $ATT(g,t)$ is nonparametrically identified under Assumptions C.1-C.4 and for $2 \le g \le t \le \mathcal{T}$. The following Theorem C.1
is the analogue of Theorem 1.

Theorem C.1. Under Assumptions C.1-C.4 and for $2 \le g \le t \le \mathcal{T}$, the group-time average treatment effect for group $g$ in period $t$ is nonparametrically identified, and given by
\[
ATT(g,t) = E\!\left[\left( \frac{G_g}{E[G_g]} - \frac{\dfrac{P(G_g = 1 \mid X)\,(1 - D_t)}{1 - P(D_t = 1 \mid X)}}{E\!\left[\dfrac{P(G_g = 1 \mid X)\,(1 - D_t)}{1 - P(D_t = 1 \mid X)}\right]} \right)(Y_t - Y_{g-1})\right].
\]

Proof of Theorem C.1: Given the result in Lemma C.1,
\begin{align*}
ATT(g,t) &= E\big[ATT_X(g,t) \mid G_g = 1\big] \\
&= E\big[\, E[Y_t - Y_{g-1} \mid X, G_g = 1] - E[Y_t - Y_{g-1} \mid X, D_t = 0] \mid G_g = 1 \big] \\
&:= E[A_X \mid G_g = 1] - E[B_X^{n.yet} \mid G_g = 1], \tag{C.3}
\end{align*}
and we consider each term separately. For the first term,
\[
E[A_X \mid G_g = 1] = E[Y_t - Y_{g-1} \mid G_g = 1] = E\!\left[\frac{G_g}{E[G_g]}\,(Y_t - Y_{g-1})\right]. \tag{C.4}
\]
For the second term, by repetition of the law of iterated expectations, we have
\begin{align*}
E[B_X^{n.yet} \mid G_g = 1]
&= E\big[\, E[Y_t - Y_{g-1} \mid X, D_t = 0] \mid G_g = 1 \big] \\
&= \frac{1}{E[G_g]}\, E\!\left[ E[G_g \mid X]\; \frac{E\big[(1 - D_t)(Y_t - Y_{g-1}) \mid X\big]}{1 - P(D_t = 1 \mid X)} \right] \\
&= \frac{1}{E[G_g]}\, E\!\left[ \frac{P(G_g = 1 \mid X)\,(1 - D_t)}{1 - P(D_t = 1 \mid X)}\,(Y_t - Y_{g-1}) \right] \\
&= \frac{E\!\left[\dfrac{P(G_g = 1 \mid X)\,(1 - D_t)}{1 - P(D_t = 1 \mid X)}\,(Y_t - Y_{g-1})\right]}{E\!\left[\dfrac{P(G_g = 1 \mid X)\,(1 - D_t)}{1 - P(D_t = 1 \mid X)}\right]}, \tag{C.5}
\end{align*}
where (C.5) follows from
\begin{align*}
E\!\left[\frac{P(G_g = 1 \mid X)\,(1 - D_t)}{1 - P(D_t = 1 \mid X)}\right]
&= E\!\left[ \frac{P(G_g = 1 \mid X)}{1 - P(D_t = 1 \mid X)}\, E[(1 - D_t) \mid X]\right] \\
&= E\!\left[ \frac{P(G_g = 1 \mid X)}{1 - P(D_t = 1 \mid X)}\, \big(1 - P(D_t = 1 \mid X)\big)\right] \\
&= E\big[P(G_g = 1 \mid X)\big] = E\big[E[G_g \mid X]\big] = E[G_g].
\end{align*}
The proof is completed by combining (C.4) and (C.5). $\blacksquare$

Once we have established nonparametric identification of $ATT(g,t)$, we can follow a two-step estimation strategy similar to the one described in Section 3. More precisely, under Assumptions C.1-C.4 and for $2 \le g \le t \le \mathcal{T}$, one can estimate $ATT(g,t)$ by
\[
\widehat{ATT}^{n.yet}(g,t) = \mathbb{E}_n\!\left[\left( \frac{G_g}{\mathbb{E}_n[G_g]} - \frac{\dfrac{\hat{p}_{G_g}(X)\,(1 - D_t)}{1 - \hat{p}_{D_t}(X)}}{\mathbb{E}_n\!\left[\dfrac{\hat{p}_{G_g}(X)\,(1 - D_t)}{1 - \hat{p}_{D_t}(X)}\right]} \right)(Y_t - Y_{g-1})\right],
\]
where $\hat{p}_{G_g}(X)$ is an estimate of $P(G_g = 1 \mid X)$ and $\hat{p}_{D_t}(X)$ is an estimate of $P(D_t = 1 \mid X)$. In contrast to the case analyzed in the main text, here we need to estimate two propensity scores. These can be estimated separately, using binary choice models (e.g. logit), or jointly, using multinomial choice models (e.g. multinomial logit).

Following similar steps as in Theorems 2 and 3, one can show that, under suitable regularity conditions akin to those in Assumption 5, $\widehat{ATT}^{n.yet}(g,t)$ is consistent and asymptotically normal, and that one can use a multiplier bootstrap similar to the one described in Algorithm 1 to conduct asymptotically valid inference. Nonetheless, it is worth mentioning that the asymptotic linear representation of $\widehat{ATT}^{n.yet}(g,t)$ will differ from that of $\widehat{ATT}(g,t)$, because the former is based on two different propensity scores whereas the latter is based on only one. A detailed and formal derivation of the aforementioned results is beyond the scope of this article.
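To make the two-propensity-score structure of the estimator above concrete, here is a minimal sketch in Python. It is an illustration rather than the authors' implementation: both propensity scores are estimated with a hand-rolled Newton-Raphson logit (any parametric first-step estimator could be substituted), and all function and variable names are ours.

```python
import numpy as np

def logit_fit(X, d, iters=25):
    """Newton-Raphson logit; returns fitted P(d = 1 | X).
    A stand-in for any parametric propensity-score estimator."""
    Z = np.column_stack([np.ones(len(d)), X])       # add an intercept
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ b))
        H = Z.T @ (Z * (p * (1 - p))[:, None])      # information matrix
        b += np.linalg.solve(H + 1e-8 * np.eye(Z.shape[1]), Z.T @ (d - p))
    return 1.0 / (1.0 + np.exp(-Z @ b))

def att_not_yet_treated(y_t, y_gm1, G_g, D_t, X):
    """Sketch of ATT(g, t) with the 'not yet treated' (D_t = 0) as controls.

    y_t, y_gm1 : outcomes in periods t and g-1 (panel data)
    G_g        : 1 if unit is first treated in period g
    D_t        : 1 if unit is already treated by period t
    X          : covariates, (n, k) array
    """
    p_G = logit_fit(X, G_g)                 # estimate of P(G_g = 1 | X)
    p_D = logit_fit(X, D_t)                 # estimate of P(D_t = 1 | X)
    dy = y_t - y_gm1
    w_treat = G_g / G_g.mean()
    w_cont = p_G * (1 - D_t) / (1 - p_D)
    w_cont = w_cont / w_cont.mean()         # stabilize the control weights
    return float(np.mean((w_treat - w_cont) * dy))
```

Units that are already treated by period $t$ but do not belong to group $g$ receive zero weight in both terms, which is exactly how the identification result excludes them from the comparison.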
Appendix D: Additional Results for the Case without Covariates

Panel Data

The case where the DID assumption holds without conditioning on covariates is of particular interest. In this appendix, we briefly consider whether or not it is possible to obtain $ATT(g,t)$ using a regression approach, as in the two-period, two-group case. A natural starting point is the model
\[
Y_{igt} = \alpha_t + c_g + \gamma_{gt}\, G_{igt} + u_{igt},
\]
where $\alpha_t$ is a time period fixed effect (we normalize $\alpha_1$ to be equal to zero and $\gamma_{g1}$ to be equal to 1), $c_g$ is time-invariant unobserved heterogeneity that can be distributed differently across groups, and $G_{igt}$ is a dummy variable indicating whether or not individual $i$ is a member of group $g$ and the time period is $t$. Differencing the model across time periods results in
\[
\Delta Y_{igt} = \alpha_t^* + \gamma_{gt}\, G_{igt} + \Delta u_{igt},
\]
where $\alpha_t^* = \alpha_t - \alpha_{t-1}$. Notice that this is a fully saturated model in group and time effects. It is straightforward to show that
\[
\gamma_{gt} = E[\Delta Y_t \mid G_g = 1] - E[\Delta Y_t \mid C = 1].
\]
When $g = t$, this is exactly the DID estimator. Under the augmented unconditional version of the parallel trends assumption, $\gamma_{gt}$ should be equal to 0 for all $g > t$, and it is straightforward to test this using output from standard regression software (e.g. a Wald test). For $t > g$, the long difference estimate of $ATT(g,t)$ can be constructed by
\begin{align*}
ATT(g,t) &= E[Y_t - Y_{g-1} \mid G_g = 1] - E[Y_t - Y_{g-1} \mid C = 1] \\
&= \sum_{s=g}^{t} \big( E[\Delta Y_s \mid G_g = 1] - E[\Delta Y_s \mid C = 1] \big) = \sum_{s=g}^{t} \gamma_{gs}.
\end{align*}
This implies that, under the (augmented) unconditional parallel trends assumption, $ATT(g,t)$ can be recovered using a regression approach. However, combining the estimates of the parameters in this way does not seem to offer much convenience relative to simply computing the estimates directly using the main approach suggested in the paper. Thus, unlike the two-period case, there does not appear to be as exact a mapping from a regression coefficient to a group-time average
25 treatment effect. Common Approaches to Pre-Testing in the Unconditional Case Finally in this section, we consider the most common approach to pre-testing the augmented unconditional version of the parallel trends assumption, that is, to run the following regression see Autor et al. 2007) and Angrist and Pischke 2008)). q Y it = α t + θ g + β 0 D it + β j D it,t+j + u it j= D.) where D it is a dummy variable for whether or not individual i is treated in period t notice that this is not whether they are first treated in period t but whether or not they are treated at all; it is a post-treatment dummy variable), D it,t+j is a j period lead for individual i who is first treated in period t + j. For example, when t = 2, D i2,4 = for j = 2) for individuals who are first treated in period 4, which indicates that the group of individuals first treated in period 4 will be treated 2 periods from period t. Then, one can pre-test the unconditional parallel trends assumption by testing if β j = 0 for j =,..., q. Under the Unconditional DID Assumption, each β j will be 0. One advantage of this approach is that it allows simple graphs of pre-treatment trends. However, it is possible for this approach to miss departures from the unconditional parallel trends assumption that our test would not miss. Consider the case with four periods and three groups the control group, a group first treated in period 4, and a group first treated in period 3. Also, consider the case with q =. It is easy to show that β = Y 3 G 4 = Y 3 C = and β = Y 2 G 3 = Y C = so that the estimate of β will be a weighted average of these two pre-trends. Thus, the unconditional augmented parallel trends assumption could be violated in ways that offset each other leading to β being equal to 0. ven more importantly,the weights associate with the regression coefficient β may not be convex; see Propositions 3 and 7 in Abraham and Sun 208) for detailed arguments. As a consequence, tests for pre-trends based on D.) 
may not be reliable under treatment effect heterogeneity. Our approach described in Remark 7 in the main text, on the other hand, does not suffer from this potential drawback.

References

Abraham, S., and Sun, L. (2018), "Estimating Dynamic Treatment Effects in Event Studies With Heterogeneous Treatment Effects," Working Paper.
Angrist, J. D., and Pischke, J.-S. (2008), Mostly Harmless Econometrics: An Empiricist's Companion, Princeton, NJ: Princeton University Press.

Autor, D. H., Kerr, W. R., and Kugler, A. D. (2007), "Does Employment Protection Reduce Productivity? Evidence From US States," The Economic Journal, 117(521), F189–F217.

Kosorok, M. R. (2008), Introduction to Empirical Processes and Semiparametric Inference, New York, NY: Springer.

van der Vaart, A. W. (1998), Asymptotic Statistics, Cambridge: Cambridge University Press.

van der Vaart, A. W., and Wellner, J. A. (1996), Weak Convergence and Empirical Processes, New York: Springer.