Supplementary material to: Tolerating deance? Local average treatment eects without monotonicity.

Supplementary material to: Tolerating deance? Local average treatment eects without monotonicity. Clément de Chaisemartin September 1, 2016 Abstract This paper gathers the supplementary material to de Chaisemartin (2016). I show that one can estimate quantile treatment eects in the surviving-compliers subpopulation, that one can test the CD condition, and that my results extend to multivariate treatment and instrument, to fuzzy regression discontinuity designs, and to the bounds in Lee (2009) for survey non-response. University of California at Santa Barbara, clementdechaisemartin@ucsb.edu

1 Testability and quantile treatment eects. In this section, I show that random instrument, exclusion restriction, and CD have a testable implication. Then, I discuss how the testable implications of random instrument, exclusion restriction, and ND studied in Huber & Mellace (2012), Kitagawa (2015), and Mourie & Wan (2014) relate to the CD condition. For any random variable R, let S(R) denote the support of R. Let U dz be a random variable uniformly distributed on [0, 1] denoting the rank of an observation in the distribution of Y D = d, Z = z. If the distribution of Y D = d, Z = z is continuous, U dz = F Y D=d,Z=z (Y ). If the distribution of Y D = d, Z = z is discrete, one can randomly allocate a rank to tied observations by setting U dz = F Y D=d,Z=z (Y ) + V (F Y D=d,Z=z (Y ) F Y D=d,Z=z (Y )), where V is uniformly distributed on [0, 1] and independent of (Y, D, Z), and Y = sup{y S(Y D = d, Z = z) : y < Y }. 1 For every d {0, 1}, let p d = F S P (D=d Z=d). Notice that both p 0 and p 1 are included between 0 and 1. Finally, let L = E(Y D = 1, Z = 1, U 11 p 1 ) E(Y D = 0, Z = 0, U 00 1 p 0 ) L = E(Y D = 1, Z = 1, U 11 1 p 1 ) E(Y D = 0, Z = 0, U 00 p 0 ). Theorem S1 If Assumptions 1, 2, and 5 are satised, L W L. (29) The intuition of this result goes as follows. Under random instrument, exclusion restriction, and CD, E(Y 1 Y 0 C V ) is point identied: following Theorem 2.1 in de Chaisemartin (2016), it is equal to W. It is also partially identied. C V is included in {D = 0, Z = 0}, and it accounts for p 0 % of this population. E(Y 0 C V ) can therefore not be larger than the mean of Y 0 of the p 0 % of this population with largest Y 0. It can also not be smaller than the mean of Y 0 of the p 0 % with lowest Y 0 (see Horowitz & Manski, 1995 and Lee (2009)). Combining this with a similar reasoning for E(Y 1 C V ) yields worst case bounds for E(Y 1 Y 0 C V ), L and L. The estimand of E(Y 1 Y 0 C V ) should lie within its bounds, hence the testable implication. To implement this test, one can use results from Andrews & Soares (2010). Let θ satisfy the two following moment inequality conditions: ( Y Z 0 E 0 E P (Z = 1) ( θ + ( F S ( Y (1 Z) P (Z = 0) F S Y DZ1{U 11 p 1 } θ + P (D = 1, Z = 1, U 11 p 1 ) Y (1 D)(1 Z)1{U 00 1 p 0 } P (D = 0, Z = 0, U 00 1 p 0 ) Y DZ1{U 11 1 p 1 } P (D = 1, Z = 1, U 11 1 p 1 ) Y (1 D)(1 Z)1{U 00 p 0 } P (D = 0, Z = 0, U 00 p 0 ) ) )) ( Y Z Y (1 Z) P (Z = 1) P (Z = 0) This denes a moment inequality model with preliminary estimated parameters. This model satises all the technical conditions required in Andrews & Soares (2010) if E( Y 2+δ ) < + 1 Y should be understood as if Y = inf{s(y D = d, Z = z)}. )). 1

for some strictly positive δ, and if P (Z = 0), P (Z = 1), P (D = 1, Z = 1, U 11 p 1 ), P (D = 1, Z = 1, U 11 1 p 1 ), P (D = 0, Z = 0, U 00 p 0 ), and P (D = 0, Z = 0, U 00 1 p 0 ) are all bounded by below by some strictly positive constant ɛ. Under these two mild restrictions, their Theorem 1 applies, and one can use it to construct a uniformly valid condence interval for θ. (29) is rejected when 0 does not belong to this condence interval. As pointed out in Balke & Pearl (1997) and Heckman & Vytlacil (2005), random instrument, exclusion restriction, and ND also have testable implications. Huber & Mellace (2011), Kitagawa (2015), Machado et al. (2013), and Mourie & Wan (2014) have developed statistical tests of these implications. I now discuss how these tests relate to the CD condition. The test suggested in Huber & Mellace (2011) is not a test of CD. Huber & Mellace (2011) use the fact that E(Y 0 NT ) and E(Y 1 AT ) are both point and partially identied under random instrument, exclusion restriction, and ND. For instance, E(Y D = 0, Z = 1) is equal to E(Y 0 NT ) and E(Y D = 0, Z = 0) is equal to (1 p 0 )E(Y 0 NT ) + p 0 E(Y 0 C). This implies E(Y D = 0, Z = 0, U 00 1 p 0 ) E(Y D = 0, Z = 1) E(Y D = 0, Z = 0, U 00 p 0 ). P (NT ) But under CD, E(Y D = 0, Z = 1) is equal to E(Y P (NT )+P (F ) 0 NT ) + the previous implication need not be true anymore. P (F ) P (NT )+P (F ) E(Y 0 F ), so The tests suggested in Kitagawa (2015), Machado et al. (2013), and Mourie & Wan (2014) are not tests of CD, but they are tests of the CDM condition introduced hereafter. Assumption S1 (Compliers-deers for marginals: CDM) There is a subpopulation of C denoted C F which satises: P (C F ) = P (F ) Y 0 C F Y 0 F Y 1 C F Y 1 F. The CDM condition requires that a subgroup of compliers have the same size and the same marginal distributions of Y 0 and Y 1 as deers. CDM is stronger than CD. On the other hand, it is invariant to the scaling of the outcome, while CD is not. With non-binary outcomes, working under assumptions invariant to scaling might be desirable so one might then prefer to invoke CDM than CD, despite the fact it is a stronger assumption. Following the logic of Theorem 2.2 in de Chaisemartin (2016), a sucient condition for CDM to hold is that there be more compliers than deers in each subgroup with the same value of (Y 0, Y 1 ): P (F Y 0, Y 1 ) P (C Y 0, Y 1 ). 2

One can show that under CDM, the marginal distributions of Y 0 and Y 1 are identied for surviving-compliers. The estimands are the same as in Imbens & Rubin (1997): f Y0 C V (y 0 ) = P (D = 0 Z = 0)f Y D=0,Z=0(y 0 ) P (D = 0 Z = 1)f Y D=0,Z=1 (y 0 ) P (D = 0 Z = 0) P (D = 0 Z = 1) f Y1 C V (y 1 ) = P (D = 1 Z = 1)f Y D=1,Z=1(y 1 ) P (D = 1 Z = 0)f Y D=1,Z=0 (y 1 ). P (D = 1 Z = 1) P (D = 1 Z = 0) The testing procedures in Kitagawa (2015) and Mourie & Wan (2014) test whether the rhs of the two previous equations are positive everywhere. Under CDM, these quantities are densities so they must be positive. These procedures are therefore tests of CDM. Machado et al. (2013) focus on binary outcomes. Their procedure tests inequalities equivalent to those considered in Kitagawa (2015). Therefore, their procedure is also a test of the CDM condition. 2 Multivariate treatment and instrument Results presented in the main paper extend to applications where treatment is multivariate. Assume that for every z {0; 1}, D z {0, 1, 2,..., J} for some integer J. Let (Y j ) j {0,1,2,...,J} denote the corresponding potential outcomes. Let C j = {D 1 j > D 0 } denote the subpopulation of compliers induced to go from a treatment below to above j by the instrument. Let F j = {D 0 j > D 1 } denote the subpopulation of deers induced to go from above to below j by the instrument. As in Angrist & Imbens (1995), I assume that the cdf of D Z = 0 stochastically dominates that of D Z = 1. This implies that P (C j ) P (F j ). Consider the following CD condition: Assumption S2 (Compliers-deers for multivariate treatment: CDMU) For every j {1, 2,..., J}, there is a subpopulation of C j denoted C j F P (C j F ) = P (F j ) E(Y j Y j 1 C j F ) = E(Y j Y j 1 F j ). which satises: If CDMU is satised, one can show that W is equal to the following average causal response: J w j E(Y j Y j 1 C j V ), j=1 where C j V is a subset of C j of size P (C j ) P (F j ), and w j are positive real numbers. This generalizes Theorem 1 in Angrist & Imbens (1995). Their Theorem 2 considers the case where the instrument is multivariate: Z {0, 1, 2,..., K}. Let β denote the coecient of D in a 2SLS regression of Y on D using the set of dummies 3

(1{Z = k}) 1 k K as instruments. Angrist & Imbens (1995) show that if D z D z for every z z, β is equal to a weighted average of average causal responses. Let C jz = {D z j > D z 1 } (resp. F jz = {D z j > D z 1 }) denote the subpopulation of compliers induced to go from a treatment below to above j (resp. above to below j) when the instrument moves from z 1 to z. Consider the following condition: Assumption S3 (Compliers-deers for multivariate treatment and instrument: CDMU2) For every j {1, 2,..., J} and z {0, 1, 2,..., K}, there is a subpopulation of C jz denoted C jz F which satises: P (C jz F ) = P (F jz ) E(Y j Y j 1 C jz F ) = E(Y j Y j 1 F jz ). Under CDMU2, one can show that β is equal to a weighted average of average causal responses for surviving-compliers. 3 Fuzzy regression discontinuity designs and survey non response My results also extend to many treatment eect models relying on ND type of conditions. An important example are fuzzy regression discontinuity (RD) designs studied in Hahn et al. (2001). Their Theorem 3 still holds if their Assumption A.3, a ND at the threshold assumption, is replaced by a CD at the threshold assumption. Let S denote the forcing variable, and let s denote the threshold. Let C s = {C, S = s} and F s = {F, S = s} respectively denote compliers and deers at the threshold. Assumption S4 (Compliers-deers at the threshold: CDT) There is a subpopulation of C s denoted C s F which satises: P (C s F S = s) = P (F s S = s) E(Y 1 Y 0 C s F, S = s) = E(Y 1 Y 0 F s, S = s). Finally, my results extend to the sample selection and survey non-response literatures. ND type of conditions have also been used in the sample selection and survey non-response literatures, for instance in Lee (2009) or Behaghel et al. (2012). Let R 0 and R 1 denote a subject's potential responses to a survey without and with the treatment. Let AR, RC, and RF respectively denote always respondents (subjects who satisfy R 0 = R 1 = 1), reponse-compliers (those who satisfy R 0 = 0 and R 1 = 1), and response-deers (those who satisfy R 0 = 1 and R 1 = 0). Under the assumption that there are no response-deers (R 1 R 0 ), Lee (2009) derives bounds for E(Y 1 Y 0 AR). Under the weaker assumption that a subpopulation RC F of RC has the same size and the same average treatment eect as RF, the bounds derived in Lee (2009) are valid bounds for E(Y 1 Y 0 AR RC F ). 4

References Andrews, D. W. K. & Soares, G. (2010), `Inference for parameters dened by moment inequalities using generalized moment selection', Econometrica 78(1), 119157. Angrist, J. D. & Imbens, G. W. (1995), `Two-stage least squares estimation of average causal eects in models with variable treatment intensity', Journal of the American Statistical Association 90(430), pp. 431442. Balke, A. & Pearl, J. (1997), `Bounds on treatment eects from studies with imperfect compliance', Journal of the American Statistical Association 92(439), 11711176. Behaghel, L., Crépon, B., Gurgand, M. & Le Barbanchon, T. (2012), Please call again: Correcting non-response bias in treatment eect models, IZA Discussion Papers 6751, Institute for the Study of Labor (IZA). de Chaisemartin, C. (2016), `Tolerating deance? monotonicity.'. identication of treatment eects without Hahn, J., Todd, P. & Van der Klaauw, W. (2001), `Identication and estimation of treatment eects with a regression-discontinuity design', Econometrica 69(1), 201209. Heckman, J. J. & Vytlacil, E. (2005), `Structural equations, treatment eects, and econometric policy evaluation', Econometrica 73(3), 669738. Horowitz, J. L. & Manski, C. F. (1995), `Identication and robustness with contaminated and corrupted data', Econometrica 63(2), 281302. Huber, M. & Mellace, G. (2011), Testing instrument validity for late identication based on inequality moment constraints, Technical report, University of St. Gallen, School of Economics and Political Science. Huber, M. & Mellace, G. (2012), Relaxing monotonicity in the identication of local average treatment eects. Working Paper. Imbens, G. W. & Rubin, D. B. (1997), `Estimating outcome distributions for compliers in instrumental variables models', Review of Economic Studies 64(4), 555574. Kitagawa, T. (2015), `A test for instrument validity', Econometrica 83(5), 20432063. Lee, D. S. (2009), `Training, wages, and sample selection: Estimating sharp bounds on treatment eects', Review of Economic Studies 76(3), 10711102. 5

Machado, C., Shaikh, A. & Vytlacil, E. (2013), Instrumental variables, and the sign of the average treatment eect, Technical report, Citeseer. Mourie, I. & Wan, Y. (2014), Testing late assumptions, Technical report. 6

A Proofs Proof of Theorem S1 Proving that L and L are valid bounds for E(Y 1 Y 0 C V ) relies on a similar argument as the proof of Proposition 1.a in Lee (2009). Due to a concern for brevity, I refer the reader to this paper for this part of the proof. The testable implication directly follows. QED. 7