Minimum Contrast Empirical Likelihood Manipulation Testing for Regression Discontinuity Design


Jun Ma, School of Economics, Renmin University of China
Hugo Jales, Department of Economics, Syracuse University
Zhengfei Yu, Faculty of Humanities and Social Sciences, University of Tsukuba

This version: March 2017

Abstract

This paper investigates the asymptotic properties of a simple empirical-likelihood-based inference method for discontinuity in density. In a regression discontinuity design (RDD), continuity of the density of the assignment variable at the threshold is regarded as a no-manipulation behavioral assumption, which is a testable implication of an identifying condition for the local average treatment effect (LATE). Our approach is based on the first-order conditions of a minimum contrast (MC) problem and complements the method of Otsu et al. (2013, OXM hereafter). Our inference procedure has three main advantages. First, it requires only one tuning parameter; second, it does not require concentrating out any nuisance parameter and is therefore very easy to implement; third, its delicate second-order properties lead to a simple coverage-error-optimal (CE-optimal hereafter) bandwidth selection rule. We propose a data-driven CE-optimal bandwidth selector for use in practice. Results from Monte Carlo simulations are presented, and the usefulness of our method is illustrated with empirical examples.

1 Introduction

There is a vast literature that uses RDDs as an identification strategy to evaluate LATEs, which represent causal effects in many social science studies; see, e.g., Lee and Lemieux (2010) for a review of papers using RDDs in the economics literature. Hahn et al. (2001) showed that, under a set of identifying conditions, the LATE in an RDD is nonparametrically identified as a quantity that can be estimated by standard nonparametric kernel methods. For the sharp RDD, Lee (2008) provided a different set of conditions for identifying the LATE, which has the testable implication that the density of the assignment variable is continuous at the threshold. Within Lee's (2008) framework, McCrary (2008) proposed a formal Wald-type test of density continuity based on the local linear estimator of Cheng et al. (1997). It is now common practice for applied econometricians to carry out McCrary's (2008) test as a specification/falsification test and to report the corresponding p-value in RDD applications. In the spirit of Lee (2008), Dong (2016) provided a set of identifying conditions that still formally justify testing the continuity of the density of the assignment variable at the threshold in the more general fuzzy RDD. Within Dong's (2016) framework, Yanagi (2015) developed identifying conditions for a local weighted average treatment effect in the fuzzy RDD in the presence of nonclassical measurement error. Yanagi's (2015) analysis justifies using McCrary's (2008) test as a falsification test for RDD even when the assignment variable is measured with error.

Footnote: In particular, the local independence assumption of Hahn et al. (2001) is not required in Lee's (2008) framework. Recently, Dong (2016) pointed out that the plausibility of this local independence assumption is often in doubt in empirical applications.

Footnote: We emphasize that continuity of the density of the assignment variable is neither sufficient nor necessary for identification of treatment effects; see McCrary (2008) for a discussion.

In this paper, we start from the first-order conditions of a minimum contrast problem (see Bickel and Doksum, 2015). These yield a moment condition that approximately identifies the parameter of interest $\theta$, the difference between the right and left limits of the density of the assignment variable at the threshold, and we apply the empirical likelihood (EL) profiling procedure to test $H_0: \theta = 0$.

EL can be applied to make inference on parameters identified by unconditional or conditional moment conditions (see, e.g., Kitamura, 2006, for a comprehensive review) and on nonparametric curves (see, e.g., Chen, 1996; Chen and Qin, 2000; Otsu et al., 2015). Various papers have shown that EL has particularly favorable statistical properties; see Chen and Cui (2007), Kitamura (2001), Newey and Smith (2004) and Otsu (2010), among many others. Our approach is closely related to OXM, who apply the EL profiling procedure to the first-order conditions of the binning estimation method of Cheng et al. (1997). That method can be viewed as two separate local linear least-squares regressions (one on each side of the threshold) with the bin counts as regressands and the bin centers as regressors (see McCrary, 2008, and Otsu et al., 2013, for the technical details). Notice that both this EL-based procedure and McCrary's (2008) Wald-type procedure require choosing two tuning parameters (the bin width and the bandwidth).

Footnote: Recently, Cattaneo et al. (2016) proposed a novel local regression method, with an application to manipulation testing, that does not require binning. Later we adopt this idea to obtain pilot bandwidths when calculating estimates of the CE-optimal bandwidths.

OXM also consider, and recommend, applying the EL procedure to the first-order conditions of the local likelihood approach (see, e.g., Loader, 2006, Chapter 4, for a comprehensive review). The minimum contrast empirical likelihood (MC-EL) manipulation testing method proposed in this paper shares the attractive features of OXM's approaches, including implicit asymptotic variance estimation and data-driven shapes of the confidence interval for $\theta$.

Footnote: Asymptotic variance estimation can be complicated in a Wald-type manipulation test; see Cattaneo et al. (2016).

The testing method proposed in this paper has three advantages over (at least one of) OXM's EL-based methods. First, our MC-EL approach requires selecting only one tuning parameter. Second, using either of OXM's two EL-based approaches to test $H_0: \theta = 0$ requires concentrating out nuisance parameters, whereas our MC-EL method does not require concentrating out any nuisance parameter. This is because the moment condition derived from the MC problem that identifies the parameter of interest is much simpler than those derived from the binning and local likelihood approaches. To compute the MC-EL test statistic, it suffices to solve the EL inner-loop problem, which is a simple convex optimization problem; hence, our approach has a lower computational burden. Third, our MC-EL approach yields a simple expression for the second-order coverage error. As a result, a CE-optimal tuning parameter (bandwidth) selection rule can be derived.

Footnote: See Kitamura (2006, Section 8) for a discussion of the inner-loop and outer-loop optimizations in the context of EL.

The third advantage is particularly desirable, because an automatic, data-driven rule for selecting tuning parameters is widely recognized as very important for nonparametric econometric methods. Taking advantage of the delicate second-order properties of the likelihood-based method, we derive the CE-optimal bandwidth selection rule. As discussed in Otsu et al. (2015, Section 5), under $H_0$ the type-I error is $\alpha + r(n,h)$, where $\alpha$ is the significance level, $n$ is the sample size, $h$ is the bandwidth and $r(n,h) \to 0$ as $n \to \infty$ and $h \to 0$ (under proper rate restrictions).

In this paper, we show that for our MC-EL manipulation test statistic, the leading term of the remainder $r(n,h)$ is positive, and that a CE-optimal bandwidth minimizing this leading term is well-defined and has a very simple explicit form. We further propose a direct plug-in (DPI) estimator of the CE-optimal bandwidth. Given the important role of RDD in economic policy evaluation and the popularity of nonparametric kernel methods for estimation and inference in RDD, bandwidth selection rules tailored to interval estimation and testing in this context have recently drawn attention from theoretical econometricians; see, e.g., Calonico et al. (2014, CCT hereafter), Calonico et al. (2016), Armstrong and Kolesár (2015) and Armstrong and Kolesár (2016). Hence, our MC-EL method and its CE-optimal bandwidth selection rule also contribute to the RDD literature in this respect.

Footnote: In a Wald-type inference procedure for the density at an interior point using the standard kernel density estimator, numerical optimization is required to calculate a DPI estimate of the coverage-error-optimal bandwidth; see Calonico et al. (2017, Corollary 3).

Footnote: These methods are fundamentally different from that of Imbens and Kalyanaraman (2012), who provided a selection rule based on minimizing the asymptotic mean squared error (AMSE) for point estimation in RDD.

Footnote: Since manipulation testing is viewed as a specification/falsification test in RDD, practitioners are more concerned with the type-I error than with the type-II error. If the data and the testing method reveal evidence against the no-manipulation assumption, the validity of the RDD is in doubt. Recently, Gerard et al. (2016) showed that without assuming no manipulation, the LATE is partially identified, and they provided explicit formulae for the upper and lower bounds of the identified set. Compared with criteria that incorporate the type-II error (power) (see, e.g., Gao and Gijbels, 2008), the CE-optimal bandwidth selection rule is of more practical interest to practitioners.

The paper is organized as follows. Section 2 provides the basic setup and preliminaries. Section 3 introduces the MC-EL manipulation testing method and establishes its first-order and second-order asymptotic properties. Section 4 presents Monte Carlo simulation results. Section 5 provides two empirical examples illustrating the usefulness of our testing method. Section 6 concludes.

2 Preliminaries

Suppose our observations $\{X_i : i = 1, \ldots, n\}$ are i.i.d. random variables with Lebesgue density $f$. In RDD applications, $X_i$ is the assignment variable. We assume that $f$ is compactly supported on $[\underline{x}, \bar{x}]$. For a known threshold $c \in (\underline{x}, \bar{x})$, define $f_-(z) \equiv f(z)$ for $z < c$ and $f_+(z) \equiv f(z)$ for $z > c$. Denote $\varphi_-^{(k)} \equiv \lim_{z \uparrow c} f_-^{(k)}(z)$ and $\varphi_+^{(k)} \equiv \lim_{z \downarrow c} f_+^{(k)}(z)$, where $f_s^{(k)}$ denotes the $k$-th order derivative of $f_s$, for $s \in \{-,+\}$. For $k = 0$, we write $\varphi_+$ and $\varphi_-$ for simplicity. Our parameter of interest is $\theta \equiv \varphi_+ - \varphi_-$, which is the same as that considered in OXM; $\theta = 0$ means that the density is continuous at $c$ (i.e., the no-manipulation condition holds). Let $K$ denote a kernel function supported on $[-1, 1]$, $h$ the bandwidth, and $K_h(\cdot) \equiv h^{-1} K(\cdot/h)$. Following Bickel and Doksum (2015), we consider the following minimum contrast problems:

$$(a_-, b_-) \equiv \arg\min_{a,b} \int_{\underline{x}}^{c} \{f(z) - (a + b(z-c))\}^2 K_h(z-c)\,dz \tag{2.1}$$

and

$$(a_+, b_+) \equiv \arg\min_{a,b} \int_{c}^{\bar{x}} \{f(z) - (a + b(z-c))\}^2 K_h(z-c)\,dz. \tag{2.2}$$

Denote

$$m_{j,-} \equiv \int_{-1}^{0} u^j K(u)\,du, \quad m_{j,+} \equiv \int_{0}^{1} u^j K(u)\,du, \quad v_{j,-} \equiv \int_{-1}^{0} u^j K(u)^2\,du, \quad v_{j,+} \equiv \int_{0}^{1} u^j K(u)^2\,du.$$

Once the kernel is given, these are known constants. Solving the first-order conditions of (2.1) and (2.2), we find that the minimizers are

$$a_- = \int_{\underline{x}}^{c} \frac{m_{2,-} - m_{1,-}(z-c)/h}{m_{0,-}m_{2,-} - m_{1,-}^2}\, K_h(z-c)\, f(z)\,dz \tag{2.3}$$

and

$$a_+ = \int_{c}^{\bar{x}} \frac{m_{2,+} - m_{1,+}(z-c)/h}{m_{0,+}m_{2,+} - m_{1,+}^2}\, K_h(z-c)\, f(z)\,dz. \tag{2.4}$$

It can be shown that $a_s = \varphi_s + O(h^2)$ for $s \in \{-,+\}$. The sample analogue of $a_s$ is referred to as the minimum contrast estimator (MCE hereafter) of $\varphi_s$. Let $\mathbf{1}(\cdot)$ denote the indicator function and denote

$$W_{i,-} \equiv \frac{m_{2,-} - m_{1,-}(X_i-c)/h}{m_{0,-}m_{2,-} - m_{1,-}^2}\,\mathbf{1}(X_i<c)\,K_h(X_i-c), \qquad W_{i,+} \equiv \frac{m_{2,+} - m_{1,+}(X_i-c)/h}{m_{0,+}m_{2,+} - m_{1,+}^2}\,\mathbf{1}(X_i>c)\,K_h(X_i-c).$$

Footnote: Such a local-linear-type estimator dates back to Lejeune and Sarda (1992), where the MCE was interpreted as a local linear approximation to the empirical density function. Cheng et al. (1997) showed that both the binning estimator and the MCE enjoy certain optimal theoretical properties.

Let $\hat{\theta} \equiv n^{-1}\sum_{i=1}^{n}(W_{i,+} - W_{i,-})$ be the MCE of $\theta$. Define

$$\kappa_j \equiv \int_{-1}^{0}\left(\frac{m_{2,-} - m_{1,-}t}{m_{0,-}m_{2,-} - m_{1,-}^2}\right)^j K(t)^j\,dt = \int_{0}^{1}\left(\frac{m_{2,+} - m_{1,+}t}{m_{0,+}m_{2,+} - m_{1,+}^2}\right)^j K(t)^j\,dt,$$

where the second equality holds if $K$ is symmetric. Applying the delta method and Jiang and Doksum (2003), under standard regularity conditions (Assumptions 1 and 2 in Section 3), we have

$$\sqrt{nh}\left(\hat{\theta} - \theta - h^2 B_{\mathrm{MC}}\right) \xrightarrow{d} N(0, V_{\mathrm{MC}}), \tag{2.5}$$

where

$$B_{\mathrm{MC}} \equiv \frac{\rho}{2}\left(\varphi_+^{(2)} - \varphi_-^{(2)}\right), \tag{2.6}$$

with

$$\rho \equiv \int_{-1}^{0} \frac{m_{2,-} - m_{1,-}t}{m_{0,-}m_{2,-} - m_{1,-}^2}\, t^2 K(t)\,dt = \frac{m_{2,-}^2 - m_{1,-}m_{3,-}}{m_{0,-}m_{2,-} - m_{1,-}^2},$$

and $V_{\mathrm{MC}} \equiv \kappa_2(\varphi_+ + \varphi_-)$. Notice that statement (2.5) is exactly the same as the corresponding result of OXM, which suggests that the MCE of $\theta$ is first-order equivalent to its local likelihood estimator. The AMSE of the MCE of $\theta$ is $\mathrm{AMSE}(h) \equiv B_{\mathrm{MC}}^2 h^4 + V_{\mathrm{MC}}/(nh)$.
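Computing $\hat\theta$ needs only the one-sided kernel moments $m_{j,s}$ and the weights $W_{i,\pm}$. Below is a minimal Python/NumPy sketch, assuming the triangular kernel $K(u) = (1-|u|)_+$ (any kernel satisfying Assumption 1 could be passed instead). It computes $m_{j,s}$ with the trapezoidal rule rather than in closed form, and all function names are hypothetical; this is an illustration, not the authors' code.

```python
import numpy as np


def triangular(u):
    """Triangular kernel K(u) = max(1 - |u|, 0), supported on [-1, 1] (assumed kernel)."""
    return np.clip(1.0 - np.abs(u), 0.0, None)


def _trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def one_sided_moment(j, side, kernel=triangular, power=1, grid=20001):
    """m_{j,s} (power=1) or v_{j,s} (power=2): integral of u^j K(u)^power
    over [-1, 0] for side '-' and over [0, 1] for side '+'."""
    u = np.linspace(-1.0, 0.0, grid) if side == "-" else np.linspace(0.0, 1.0, grid)
    return _trapezoid(u**j * kernel(u) ** power, u)


def mc_weights(x, c, h, side, kernel=triangular):
    """W_{i,s}: local-linear minimum contrast weights for the one-sided limit at c."""
    m0, m1, m2 = (one_sided_moment(j, side, kernel) for j in range(3))
    u = (x - c) / h
    on_side = (x < c) if side == "-" else (x > c)
    return (m2 - m1 * u) / (m0 * m2 - m1**2) * on_side * kernel(u) / h


def mce_jump(x, c, h, kernel=triangular):
    """theta_hat = n^{-1} * sum_i (W_{i,+} - W_{i,-}), the MCE of phi_+ - phi_-."""
    x = np.asarray(x, dtype=float)
    w_plus = mc_weights(x, c, h, "+", kernel)
    w_minus = mc_weights(x, c, h, "-", kernel)
    return float(np.mean(w_plus - w_minus))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Density 0.8 on [0, 0.5) and 1.2 on [0.5, 1]: the true jump at c = 0.5 is 0.4.
    n = 200_000
    left = rng.uniform(size=n) < 0.4
    x = np.where(left, rng.uniform(0.0, 0.5, n), rng.uniform(0.5, 1.0, n))
    print(mce_jump(x, c=0.5, h=0.1))
```

In the simulated example, the density jumps from 0.8 to 1.2 at $c = 0.5$, so $\hat\theta$ should land near 0.4. The density is constant on each side of $c$, so the $O(h^2)$ bias vanishes and only sampling noise of order $(nh)^{-1/2}$ remains.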

It is now clear that the AMSE-optimal bandwidth is

$$h^* \equiv \left(\frac{V_{\mathrm{MC}}}{4B_{\mathrm{MC}}^2}\right)^{1/5} n^{-1/5}. \tag{2.7}$$

DPI estimation of $h^*$ requires pilot estimates of $\varphi_s^{(k)}$, for $(s,k) \in \{-,+\} \times \{0,2\}$. Let $\hat{T} \equiv \sqrt{nh}\,\hat{\theta}/\hat{V}_{\mathrm{MC}}^{1/2}$ be the t-statistic, where $\hat{V}_{\mathrm{MC}}$ is a consistent estimator of $V_{\mathrm{MC}}$. Using (2.5) to test $H_0: \theta = 0$ requires either undersmoothing (i.e., $nh^5 \to 0$ as $n \to \infty$), in which case $\hat{T}$ is asymptotically standard normal, or explicit estimation (removal) of the bias term. The AMSE-optimal bandwidth (2.7) is too large for the bias to be negligible. Various papers have suggested an ad hoc undersmoothing strategy, namely using $\hat{h}^*/2$ in practice, where $\hat{h}^*$ is an estimate of $h^*$; see, e.g., McCrary (2008) and Cattaneo et al. (2016).

The idea extends easily to a $p$-th order MCE. Let $r_p(u) \equiv (1, u, \ldots, u^p)'$ and denote $\beta_{0,s} \equiv \left(\varphi_s/0!,\ \varphi_s^{(1)}/1!,\ \ldots,\ \varphi_s^{(p)}/p!\right)'$ for $s \in \{-,+\}$. Let $e_{\nu+1}$ be the unit vector whose $(\nu+1)$-th component is one and whose other components are all zero. We consider the following more general MC problem:

$$\min_{\beta \in \mathbb{R}^{p+1}} \int_{\underline{x}}^{c} \{f(z) - \beta' r_p(z-c)\}^2 K_h(z-c)\,dz.$$

The minimizer, denoted by $\beta_-$, satisfies the moment conditions

$$\int \mathbf{1}(z<c)\, r_p(z-c)\, K_h(z-c)\, f(z)\,dz = \int_{\underline{x}}^{c} r_p(z-c)\, r_p(z-c)'\, \beta_-\, K_h(z-c)\,dz.$$

The minimizer $\beta_-$ is a local approximation to $\beta_{0,-}$ and can be estimated by the method of moments.

Footnote: Existence of $\beta_{0,s}$ can be guaranteed by imposing a sufficient smoothness assumption on $f_+$ and $f_-$; see the discussion following Assumption 2.

Define the matrices

$$M_{p,s} \equiv \begin{pmatrix} m_{0,s} & m_{1,s} & \cdots & m_{p,s} \\ m_{1,s} & m_{2,s} & \cdots & m_{p+1,s} \\ \vdots & \vdots & \ddots & \vdots \\ m_{p,s} & m_{p+1,s} & \cdots & m_{2p,s} \end{pmatrix}, \qquad V_{p,s} \equiv \begin{pmatrix} v_{0,s} & v_{1,s} & \cdots & v_{p,s} \\ v_{1,s} & v_{2,s} & \cdots & v_{p+1,s} \\ \vdots & \vdots & \ddots & \vdots \\ v_{p,s} & v_{p+1,s} & \cdots & v_{2p,s} \end{pmatrix}$$

for $s \in \{-,+\}$, and $B_h \equiv \mathrm{diag}\{1, h, \ldots, h^p\}$. Define

$$(\hat{\beta}_{0,-}, \hat{\beta}_{1,-}, \ldots, \hat{\beta}_{p,-})' \equiv (M_{p,-}B_h)^{-1}\left\{\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(X_i<c)\, r_p\!\left(\frac{X_i-c}{h}\right) K_h(X_i-c)\right\}.$$

The MCE of $\varphi_-^{(\nu)}/\nu!$ is $\hat{\beta}_{\nu,-}$. Similarly, the MCE of $\varphi_+^{(\nu)}/\nu!$ is the $(\nu+1)$-th component of

$$(\hat{\beta}_{0,+}, \hat{\beta}_{1,+}, \ldots, \hat{\beta}_{p,+})' \equiv (M_{p,+}B_h)^{-1}\left\{\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(X_i>c)\, r_p\!\left(\frac{X_i-c}{h}\right) K_h(X_i-c)\right\}.$$

Denote $l_{p,s} \equiv (m_{p+1,s}, \ldots, m_{2p+1,s})'$ for $s \in \{-,+\}$. Jiang and Doksum (2003) show that

$$E[\hat{\beta}_{\nu,s}] - \frac{\varphi_s^{(\nu)}}{\nu!} = \frac{\varphi_s^{(p+1)} h^{p+1-\nu}}{(p+1)!}\, e_{\nu+1}' M_{p,s}^{-1} l_{p,s} + o\left(h^{p+1-\nu}\right) \tag{2.8}$$

and

$$\mathrm{Var}[\hat{\beta}_{\nu,s}] = \frac{\varphi_s}{nh^{1+2\nu}}\, e_{\nu+1}' M_{p,s}^{-1} V_{p,s} M_{p,s}^{-1} e_{\nu+1} + o\left(\frac{1}{nh^{1+2\nu}}\right), \tag{2.9}$$

for $s \in \{-,+\}$. The higher-order MCE is useful when estimators of $\varphi_s^{(k)}$ for $k > 1$ are needed. We will see that the local quadratic MCE (i.e., $p = 2$) is used to form DPI estimates of $h^*$ and of the CE-optimal bandwidth introduced in the next section. Using a higher-order MCE can also be justified by CCT's argument: the explicit bias estimation (removal) and robustification strategy that CCT propose for local polynomial estimation in RDD is equivalent to increasing the order of the local polynomial. The same strategy can be applied when using the MCE for manipulation testing. In practice, we use an estimated AMSE-optimal bandwidth (2.7) but construct the t-statistic from the local quadratic estimator instead, since $nh^7 \to 0$ suffices for the t-statistic to be asymptotically standard normal. The asymptotic variance of the local quadratic estimator is $(\varphi_+ + \varphi_-)\, e_1' M_{2,-}^{-1} V_{2,-} M_{2,-}^{-1} e_1$, which can be estimated consistently; see Remark 7 of CCT.
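The $p$-th order MCE amounts to one linear solve against the Hankel matrix $M_{p,s}$. The sketch below again assumes the triangular kernel and uses hypothetical function names. It computes $(\hat\beta_{0,s},\ldots,\hat\beta_{p,s})'$ and the kernel factor $e_{\nu+1}'M_{p,s}^{-1}V_{p,s}M_{p,s}^{-1}e_{\nu+1}$ that appears in (2.9).

```python
import numpy as np


def triangular(u):
    """Triangular kernel K(u) = max(1 - |u|, 0) on [-1, 1] (assumed kernel)."""
    return np.clip(1.0 - np.abs(u), 0.0, None)


def _trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def one_sided_moments(count, side, kernel=triangular, power=1, grid=20001):
    """Integrals of u^j K(u)^power over the one-sided support, for j = 0, ..., count - 1."""
    u = np.linspace(-1.0, 0.0, grid) if side == "-" else np.linspace(0.0, 1.0, grid)
    return np.array([_trapezoid(u**j * kernel(u) ** power, u) for j in range(count)])


def hankel(moments, p):
    """(p+1) x (p+1) matrix with (j, k) entry moments[j + k]; gives M_{p,s} or V_{p,s}."""
    return np.array([[moments[j + k] for k in range(p + 1)] for j in range(p + 1)])


def local_poly_mce(x, c, h, p, side, kernel=triangular):
    """(M_{p,s} B_h)^{-1} n^{-1} sum_i 1(side) r_p((X_i - c)/h) K_h(X_i - c);
    entry nu estimates phi_s^(nu) / nu!."""
    x = np.asarray(x, dtype=float)
    u = (x - c) / h
    on_side = (x < c) if side == "-" else (x > c)
    r = np.vander(u, p + 1, increasing=True)   # row i is r_p(u_i)' = (1, u_i, ..., u_i^p)
    k_h = on_side * kernel(u) / h              # 1(side) * K_h(X_i - c)
    rhs = r.T @ k_h / x.size
    M = hankel(one_sided_moments(2 * p + 1, side, kernel), p)
    B_h = np.diag(h ** np.arange(p + 1))
    return np.linalg.solve(M @ B_h, rhs)


def variance_constant(p, nu, side, kernel=triangular):
    """e'_{nu+1} M^{-1} V M^{-1} e_{nu+1}, the kernel factor in the variance (2.9)."""
    M = hankel(one_sided_moments(2 * p + 1, side, kernel), p)
    V = hankel(one_sided_moments(2 * p + 1, side, kernel, power=2), p)
    M_inv = np.linalg.inv(M)
    return float((M_inv @ V @ M_inv)[nu, nu])
```

For $p = 1$ and $\nu = 0$, the variance factor coincides with $\kappa_2$, which equals $24/5$ for the triangular kernel. For the local quadratic case ($p = 2$) with the same kernel it equals $72/7$, so robustification costs some variance in exchange for removing the $O(h^2)$ bias.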

3 Empirical Likelihood Inference

We consider the following empirical likelihood criterion function:

$$\ell(\theta) \equiv \min_{p_1,\ldots,p_n}\; -2\sum_{i=1}^{n}\log(np_i) \quad \text{subject to} \quad \sum_{i=1}^{n} p_i\{(W_{i,+}-W_{i,-})-\theta\} = 0,\quad \sum_{i=1}^{n} p_i = 1,\quad p_i > 0,\ i = 1,\ldots,n. \tag{3.1}$$

Standard derivations using a Lagrange multiplier give, for each $\theta$,

$$\ell(\theta) = 2\sum_{i=1}^{n}\log\left[1 + \hat{\lambda}(\theta)\{(W_{i,+}-W_{i,-})-\theta\}\right], \tag{3.2}$$

where $\hat{\lambda}(\theta)$ satisfies

$$\sum_{i=1}^{n}\frac{(W_{i,+}-W_{i,-})-\theta}{1 + \hat{\lambda}(\theta)\{(W_{i,+}-W_{i,-})-\theta\}} = 0. \tag{3.3}$$

It can be shown that $\ell(\theta)$ is asymptotically $\chi^2_1$ (i.e., the Wilks phenomenon holds). The MC-EL criterion function (3.1) is a counterpart to the concentrated EL criterion function of OXM. However, the MC-EL criterion depends on $\theta$ in a much simpler way: the moment condition in the constraint of (3.1) is linear in $\theta$, whereas the corresponding concentrated EL functions of OXM are minimum value functions of $\theta$, which enters the constraints of outer-loop constrained minimization problems. Computing our MC-EL test statistic requires solving only a simple convex optimization problem, whereas computing the test statistics proposed in OXM requires solving saddle-point problems. Our approach therefore incurs less computational cost and numerical error, since it does not require concentrating out any nuisance parameter. This is because, as $h \to 0$, the parameter of interest can be approximated by $a_+ - a_-$, which is identified by a single moment condition: it is clear from (2.3) and (2.4) that

$$E[W_{1,+} - W_{1,-}] - (a_+ - a_-) = 0. \tag{3.4}$$

Footnote: See, e.g., Equation (8) of OXM.

We now state the regularity assumptions.

Assumption 1. (i) (Kernel) $K: \mathbb{R} \to \mathbb{R}$ is a symmetric and continuous probability density function that is supported on $[-1, 1]$. (ii) (Bandwidth) The bandwidth $h$ is positive and satisfies $h \to 0$ and $(nh)^{-1}\log(n) \to 0$ as $n \to \infty$.

Assumption 2. (Data generating process) Let $\epsilon$ denote some positive constant. On the neighborhood $(c - \epsilon, c)$, $f_-$ has a Lipschitz third-order derivative. An analogous condition holds for $f_+$ on the neighborhood $(c, c + \epsilon)$.

Assumption 2 and the continuous extension theorem guarantee the existence of $\varphi_s^{(k)}$ for $(s,k) \in \{-,+\} \times \{0,1,2,3\}$. This is stronger than the smoothness assumption imposed in OXM, because our DPI estimation of the CE-optimal bandwidth requires the existence of third-order (one-sided) derivatives.

The MC-EL test statistic for $H_0: \theta = 0$ is $\ell(0)$. The following theorem provides its asymptotic distribution under $H_0$.

Theorem 1. Suppose that Assumptions 1 and 2 are satisfied and, additionally, that $nh^5 \to 0$ as $n \to \infty$. Then $\ell(\theta) \xrightarrow{d} \chi^2_1$.

Remark 1. Theorem 1 requires the bandwidth to satisfy $nh^5 \to 0$ as $n \to \infty$, which is also required in the corresponding theorem of OXM. In practice, we can adopt the ad hoc undersmoothing strategy and choose $h = \hat{h}^*/2$, where $\hat{h}^*$ is a DPI estimate of $h^*$. The constants $B_{\mathrm{MC}}$ and $V_{\mathrm{MC}}$ can be consistently estimated by replacing $\varphi_s^{(k)}$ with consistent estimators $\hat{\varphi}_s^{(k)}$, for $(s,k) \in \{-,+\} \times \{0,2\}$, which can be obtained by running local quadratic MCEs with AMSE-optimal bandwidths. Notice that (2.8) and (2.9) imply that the AMSE-optimal bandwidth for the local quadratic MCE of $\varphi_s^{(k)}/k!$ is

$$h_{k,s} \equiv \left\{\frac{18(2k+1)\,\varphi_s\, e_{k+1}' M_{2,s}^{-1} V_{2,s} M_{2,s}^{-1} e_{k+1}}{(3-k)\left(\varphi_s^{(3)}\right)^2 \left(e_{k+1}' M_{2,s}^{-1} l_{2,s}\right)^2}\right\}^{1/7} n^{-1/7}. \tag{3.5}$$

Estimating these bandwidths requires pilot estimates of $\varphi_s^{(k)}$ for $(s,k) \in \{-,+\} \times \{0,3\}$. Following the approach taken in various papers in nonparametric econometrics (see, e.g., Arai and Ichimura, 2015), we use inconsistent but easily implementable estimates of these objects. These estimates can be based on the local regression idea of Cattaneo et al. (2016); see Appendix C for the implementation.
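Because the moment in (3.1) is linear in $\theta$, evaluating $\ell(\theta)$ reduces to solving the scalar equation (3.3) for $\hat\lambda(\theta)$. Its left-hand side is strictly decreasing on the interval where every $1 + \lambda z_i$ stays positive. The sketch below solves (3.3) by bisection; a safeguarded Newton step would work as well. It returns the $\chi^2_1$ p-value. Function names are hypothetical. To test $H_0:\theta = 0$, one would pass $g_i = W_{i,+} - W_{i,-}$ and $\theta = 0$.

```python
import math

import numpy as np


def el_statistic(g, theta):
    """EL criterion l(theta) of (3.2) for the scalar moment E[g_i] - theta = 0.

    Returns +inf when theta lies outside the convex hull of the g_i.
    """
    z = np.asarray(g, dtype=float) - theta
    if z.min() >= 0.0 or z.max() <= 0.0:
        return math.inf
    # 1 + lam * z_i > 0 for all i  <=>  lam in (-1/max z, -1/min z).
    lo, hi = -1.0 / z.max(), -1.0 / z.min()

    def score(lam):
        # Left-hand side of (3.3); strictly decreasing in lam on (lo, hi).
        return float(np.sum(z / (1.0 + lam * z)))

    a, b = lo + 1e-12 * (hi - lo), hi - 1e-12 * (hi - lo)
    for _ in range(200):  # bisection on a monotone function
        mid = 0.5 * (a + b)
        if score(mid) > 0.0:
            a = mid
        else:
            b = mid
    lam = 0.5 * (a + b)
    return float(2.0 * np.sum(np.log1p(lam * z)))


def chi2_1_pvalue(stat):
    """P(chi2_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0)) if math.isfinite(stat) else 0.0
```

No outer loop or nuisance parameter appears here. That reflects the computational advantage over concentrated-EL approaches claimed in the text.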

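Remark 2. Let

$$W^q_{i,-} \equiv \mathbf{1}(X_i<c)\, e_1'(M_{2,-}B_h)^{-1} r_2\!\left(\frac{X_i-c}{h}\right) K_h(X_i-c) \quad\text{and}\quad W^q_{i,+} \equiv \mathbf{1}(X_i>c)\, e_1'(M_{2,+}B_h)^{-1} r_2\!\left(\frac{X_i-c}{h}\right) K_h(X_i-c).$$

The local quadratic EL criterion function is simply

$$\ell^q(\theta) = 2\sum_{i=1}^{n}\log\left\{1 + \hat{\lambda}^q(\theta)\left(W^q_{i,+} - W^q_{i,-} - \theta\right)\right\},$$

where $\hat{\lambda}^q(\theta)$ satisfies a first-order condition similar to (3.3). Given (2.8), it is straightforward to adapt the proof of Theorem 1 to show that $\ell^q(\theta) \xrightarrow{d} \chi^2_1$ when $nh^7 \to 0$ as $n \to \infty$. For practical implementation, we take $h = \hat{h}^*$, which is of order $n^{-1/5}$, where $\hat{h}^*$ is the DPI estimate discussed in Remark 1; the bias is then removed internally by the higher-order polynomial approximation. This is analogous to CCT's robustification strategy.

We now turn to the second-order properties of the EL-based inference procedure. Denote $\ell \equiv \ell(\theta)$ and let $\ell_0$ be its stochastic expansion, satisfying

$$\ell = \ell_0 + O_p\left((nh)^{-3/2} + n^{-1} + h^3 + h\,(nh)^{-1/2} + (nh)^{1/2}h^6 + nh^8\right), \tag{3.6}$$

where the expression of $\ell_0$ can be found in Appendix B.

Footnote: It can be shown that the difference between the CDF of $\ell$ and that of $\ell_0$ is of approximately the same order of magnitude as the stochastic order of the error term in (3.6).

Denote

$$\bar{K}_s(t) \equiv \frac{m_{2,s} - m_{1,s}t}{m_{0,s}m_{2,s} - m_{1,s}^2}\, K(t)$$

for $s \in \{-,+\}$. Since $K$ is symmetric, $\bar{K}_+(t) = \bar{K}_-(-t)$ for all $t \in [-1, 1]$. The following theorem gives the coverage accuracy of the MC-EL statistic. It requires an additional condition, which is satisfied by most commonly used kernels.

Theorem 2. Suppose that Assumptions 1 and 2 are satisfied and, additionally, that there exists a partition $0 = u_1 < \cdots < u_J = 1$ such that $\bar{K}_+'$ is bounded and either strictly positive or strictly negative on $(u_j, u_{j+1})$, for $j = 1, \ldots, J-1$. Then

$$\Pr(\ell_0 \le c_\alpha) = (1-\alpha) - \left\{\frac{nh^5 B_{\mathrm{MC}}^2}{\mu_2} + (nh)^{-1}\left(\frac{\mu_4}{2\mu_2^2} - \frac{\mu_3^2}{3\mu_2^3}\right)\right\} c_\alpha^{1/2}\,\phi\!\left(c_\alpha^{1/2}\right) + O\left(h^3 + n^{-1} + (nh)^{-3/2} + nh^6 + n^2h^{10}\right), \tag{3.7}$$

where $c_\alpha$ denotes the $1-\alpha$ quantile of the $\chi^2_1$ distribution, $\mu_2$, $\mu_3$ and $\mu_4$ are defined in Appendix A, and $\phi$ is the standard normal density function.

Remark 3. It is straightforward to show that $\mu_2 = \kappa_2(\varphi_+ + \varphi_-) + O(h)$, $\mu_3 = \kappa_3(\varphi_+ - \varphi_-) + O(h)$ and $\mu_4 = \kappa_4(\varphi_+ + \varphi_-) + O(h)$. Using these results and (3.7), we have

$$\Pr(\ell_0 \le c_\alpha) = (1-\alpha) - \left\{\frac{nh^5 B_{\mathrm{MC}}^2}{\kappa_2(\varphi_+ + \varphi_-)} + (nh)^{-1}\left(\frac{\kappa_4}{2\kappa_2^2(\varphi_+ + \varphi_-)} - \frac{\kappa_3^2(\varphi_+ - \varphi_-)^2}{3\kappa_2^3(\varphi_+ + \varphi_-)^3}\right)\right\} c_\alpha^{1/2}\,\phi\!\left(c_\alpha^{1/2}\right) + O\left(h^3 + n^{-1} + (nh)^{-3/2} + nh^6 + n^2h^{10}\right). \tag{3.8}$$

Notice that

$$\frac{\kappa_4}{2\kappa_2^2} - \frac{\kappa_3^2(\varphi_+ - \varphi_-)^2}{3\kappa_2^3(\varphi_+ + \varphi_-)^2} \ge \frac{\kappa_4}{2\kappa_2^2} - \frac{\kappa_3^2}{3\kappa_2^3}, \tag{3.9}$$

since $(\varphi_+ - \varphi_-)^2 \le (\varphi_+ + \varphi_-)^2$. The right-hand side of (3.9) depends only on the kernel $K$, and we find that it is positive for all commonly used kernels, including the triangular and Epanechnikov kernels. Denote

$$\Lambda \equiv \frac{\kappa_4}{2\kappa_2^2} - \frac{\kappa_3^2(\varphi_+ - \varphi_-)^2}{3\kappa_2^3(\varphi_+ + \varphi_-)^2}.$$

It follows that the CE-optimal bandwidth

$$h_{\mathrm{CE}} \equiv \arg\min_{h>0}\left\{\frac{nh^5 B_{\mathrm{MC}}^2}{\kappa_2} + \frac{\Lambda}{nh}\right\}$$

is well-defined and solves the first-order condition. We find

$$h_{\mathrm{CE}} = \left(\frac{\kappa_2\Lambda}{5B_{\mathrm{MC}}^2}\right)^{1/6} n^{-1/3}.$$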
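Once pilot values of $B_{\mathrm{MC}}$ and $\varphi_\pm$ are available, every ingredient of $h_{\mathrm{CE}}$ is closed-form. That includes the kernel constants $\kappa_j$ and $\rho$, the kernel-only bound on the right-hand side of (3.9), $\Lambda$, and the AMSE-optimal $h^*$ of (2.7). The following sketch assumes those pilot values are given. It integrates numerically on $[0,1]$, which matches $[-1,0]$ for a symmetric kernel, and all function names are hypothetical.

```python
import numpy as np


def triangular(u):
    return np.clip(1.0 - np.abs(u), 0.0, None)


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def _trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def kernel_constants(kernel, grid=200001):
    """Return (kappa_2, kappa_3, kappa_4, rho), computed on [0, 1]."""
    t = np.linspace(0.0, 1.0, grid)
    k = kernel(t)
    m = [_trapezoid(t**j * k, t) for j in range(4)]
    det = m[0] * m[2] - m[1] ** 2
    kbar = (m[2] - m[1] * t) / det * k  # K-bar_+(t)
    kappa = {j: _trapezoid(kbar**j, t) for j in (2, 3, 4)}
    rho = (m[2] ** 2 - m[1] * m[3]) / det
    return kappa[2], kappa[3], kappa[4], rho


def kernel_only_bound(kernel):
    """Right-hand side of (3.9): kappa_4/(2 kappa_2^2) - kappa_3^2/(3 kappa_2^3)."""
    k2, k3, k4, _ = kernel_constants(kernel)
    return k4 / (2 * k2**2) - k3**2 / (3 * k2**3)


def lambda_const(kernel, phi_plus, phi_minus):
    """Lambda from Remark 3, given (pilot) one-sided density limits."""
    k2, k3, k4, _ = kernel_constants(kernel)
    ratio = (phi_plus - phi_minus) / (phi_plus + phi_minus)
    return k4 / (2 * k2**2) - k3**2 * ratio**2 / (3 * k2**3)


def h_amse(B_mc, V_mc, n):
    """AMSE-optimal bandwidth (2.7): (V_MC / (4 B_MC^2))^{1/5} n^{-1/5}."""
    return (V_mc / (4.0 * B_mc**2)) ** 0.2 * n ** -0.2


def h_ce(B_mc, kappa2, lam, n):
    """CE-optimal bandwidth: (kappa_2 Lambda / (5 B_MC^2))^{1/6} n^{-1/3}."""
    return (kappa2 * lam / (5.0 * B_mc**2)) ** (1.0 / 6.0) * n ** (-1.0 / 3.0)
```

The triangular-kernel integrals can be checked by hand. There $\bar K_+(t) = 6(1-t)(1-2t)$, so $\kappa_2 = 24/5$, $\kappa_3 = 216 \cdot 13/140$, $\kappa_4 = 1296 \cdot 23/315$ and $\rho = -1/10$. These give a right-hand side of (3.9) of roughly 0.84; that value is computed here, not taken from the paper.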

Notice that $h_{\mathrm{CE}} \propto n^{-1/3}$, which is asymptotically smaller than $h^* \propto n^{-1/5}$. In practice, a DPI estimate of $h_{\mathrm{CE}}$ is obtained by replacing $B_{\mathrm{MC}}$ and $\Lambda$ (defined in Remark 3) with consistent estimators. We follow the strategy discussed in Remark 1 and run the local quadratic MCE twice with AMSE-optimal bandwidths. It is also clear that for the manipulation testing problem in RDD, when the test statistic is the MC-EL statistic $\ell(0)$, $h_{\mathrm{CE}}$ minimizes the leading term of the distortion to the type-I error.

Remark 4. For many other Wald-type or EL-type confidence regions for nonparametric curves, the coverage errors are often of orders like $O\left(nh^5 + h + (nh)^{-1}\right)$. See, e.g., Calonico et al. (2017) for Wald-type confidence intervals based on standard or bias-corrected kernel density estimators at interior points, and Otsu et al. (2015) for EL-type confidence regions in RDD. In those cases, a simple explicit form of the CE-optimal bandwidth that minimizes the absolute value of the leading term of the coverage error is not available, and numerical optimization is required to calculate a DPI estimate of the CE-optimal bandwidth; see Calonico et al. (2017, Corollary 3). In the expansion of the coverage probability of the MC-EL statistic, the $O(h)$ term vanishes, which makes it possible to derive a simple explicit form for the CE-optimal bandwidth.

Remark 5. Theorem 2 also implies that the rescaled statistic $(1 + B_c)^{-1}\ell_0$, where

$$B_c \equiv \frac{nh^5 B_{\mathrm{MC}}^2}{\kappa_2(\varphi_+ + \varphi_-)} + \frac{\Lambda}{nh(\varphi_+ + \varphi_-)},$$

has a coverage error of smaller order of magnitude, since the rescaling eliminates the leading term of (3.8). This rescaling device is known as the Bartlett correction. In practice, the unknown quantities in $B_c$ can be estimated by running a higher-order MCE, and a feasible Bartlett-corrected MC-EL test statistic can be defined accordingly.

Footnote: A drawback, however, is that in this nonparametric setting there seems to be no practical guidance on how to select the bandwidth.
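Rescaling by $1 + B_c$ removes the leading term of (3.8) for a simple reason. If $\ell_0$ behaves like $(1+B_c)\chi^2_1$, then $\Pr(\ell_0 \le c_\alpha) = F(c_\alpha/(1+B_c)) \approx (1-\alpha) - B_c\, c_\alpha^{1/2}\phi(c_\alpha^{1/2})$, where $F$ is the $\chi^2_1$ distribution function. The sketch below uses only the standard library, and its function names are hypothetical. It evaluates $B_c$ from plug-in values and computes the Bartlett-corrected p-value; its test checks the first-order approximation numerically.

```python
import math


def chi2_1_cdf(x):
    """P(chi2_1 <= x) = P(|Z| <= sqrt(x)) = erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2.0))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bartlett_factor(n, h, B_mc, kappa2, lam, phi_sum):
    """B_c = n h^5 B_MC^2 / (kappa_2 phi_sum) + Lambda / (n h phi_sum),
    where phi_sum = phi_+ + phi_- (plug-in values are assumed available)."""
    return n * h**5 * B_mc**2 / (kappa2 * phi_sum) + lam / (n * h * phi_sum)


def bartlett_pvalue(stat, b_c):
    """p-value of the rescaled statistic stat / (1 + B_c) against chi2_1."""
    return math.erfc(math.sqrt(stat / (1.0 + b_c) / 2.0))
```

Since $B_c > 0$ in this setting, the correction shrinks the statistic and enlarges the p-value. That offsets the over-rejection implied by the negative leading term in (3.8).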

14 Table : Sizes of different manipulation tests: Design I DGP n EL CE EL CE true EL CCT EL ad hoc t ad hoc t CCT 0.05 Normal t Normal t Monte Carlo Simulations We conduct Monte Carlo simulations to examine the finite-sample performances of our MC-EL test for the density discontinuity and the minimum contrast estimator (MCE) based t-test, under different bandwidth selection rules. The simulation designs follow OXM. Design I corresponds to data generating processes (DGPs) in which the density function of the assignment variable is continuous everywhere, and Design II corresponds to DGPs in which the density function of the assignment variable has a jump at the cutoff c. In Design I, the DGPs are normal distribution N(, 3) and Student s t distribution: + 3 p 5 t(5). In Design II, the DGP s are mixtures of normal distributions, i.e., each observation is drawn from a normal distribution N(, 3) truncated on (, c) with probability, and from N(, 3) truncated on (c, +) with probability. We consider several values for the mixing probability such that the corresponding density jumps at the cutoff c are d =0.05, 0.0, 0.5, 0.0, In both designs, the cutoff in consideration is c = 3. The number of replications is We consider sample sizes n = 500, 000, 000. Table presents the empirical sizes of our MC-EL test and the usual t-test, under Design I: the column EL CE for the feasible CE-optimal MC-EL test, i.e., the MC-EL test using the DPI estimate of CE-optimal bandwidth; the column EL CE true for an infeasible CE-optimal MC-EL test using the true CE-optimal bandwidth h CE ; the column EL CCT for the MC-EL test with CCT-type robustification, i.e., the test is based on the local quadratic MC first-order conditions using 4 d = (c 0 )d 0 c,wherec0 (c 0 )( (c 0 = p )) 3 and d 0 = (c 0). 4

15 DPI estimate of h, the AMSE-optimal bandwidth for the linear MCE (see equation (.7)), denoted by h b ; the column EL ad hoc for the MC-EL test with the ad hoc undersmoothing bandwidth h c /; the column t ad hoc for the MCE-based t-test with the bandwidth h c /; the column t CCT for the MCE-based t-test with a CCT-type robustification, i.e., the density at each side of the cutoff is estimated by the local quadratic MCE using the bandwidth h b. Table shows that among all feasible tests, the CE-optimal MC-EL test exhibits the best finite-sample size property in all the experiments we consider, and its size is very close to the nominal size ( =0.05 or =0.0). This shows that the theoretical size advantage of the CE-optimal MC-EL test is realized in finite samples. In addition, the size performance of the feasible CE-optimal MC-EL test is very close to that of its infeasible version, which indicates good performance of our proposed DPI estimate of h CE. The simulation results also indicate that the ad hoc undersmoonthing bandwidth may lead to noticeable over-rejections. As the normal DGP in Design I is the same as the simulation design of OXM, we can also compare the finite-sample size performance of our MC-EL test with their the EL based tests. 5 When the sample size is n = 000, our feasible CE-optimal MC-EL test performs slightly better than both EL based tests in OXM. When the sample size increases to 000, our CE-optimal MC-EL test has a similar performance as the recommended local likelihood based EL test in OXM. Figures and plot the p-value and p-value discrepancy (Davidson and MacKinnon (998)) for the testing procedures studied in Table, under the normal and Student s t DGP s in Design I, with n = 000. The figures show that the good finite-sample size performance of the feasible CE-optimal MC-EL test is very stable over a range of nominal sizes. Table presents the size-adjusted power of all the aforementioned manipulation tests, under Design II. 
6 All tests we consider has power increasing towards one when the jump size d and the sample size increase. The feasible CE-optimal MC-EL test is relatively more powerful as compared to the t-test when the jump size d is small or the sample size n is small. The infeasible CE-optimal MC-EL test has the best performance in terms of the power properties. When the jump size and the sample size are both large, the power performance of the feasible CE-optimal MC-EL test is quite close to that of its infeasible version. In other cases, the simulation results indicate loss of power when the CE-optimal bandwidth has to be estimated. When the sample size is large, all the feasible tests (i.e., EL CE, EL CCT, EL ad hoc, t 5 See Table 3 of OXM for the sample size n =000, Each critical value is the 0.95 empirical quantile of corresponding test statistic under the null hypothesis of continuity (Design I), computed by 0, 000 simulations. 5

16 Figure : The p-value (left) and the p-value discrepancy plots (right): Design I- normal Figure : The p-value (left) and the p-value discrepancy plots (right): Design II-t ad hoc and t CCT) have similar power performance. Overall the feasible CE-optimal MC-EL manipulation test exhibits excellent size properties and reasonably good power properties. It is also worth mentioning that the computation of the MC-EL testing statistic is very fast. 5 Empirical Illustration We apply our CE-EL based manipulation test to two well-known data sets in the RD literature: the US House elections (Lee (008)) and PROGRESA (Calonico et al. (04)). Lee (008) study the incumbency advantage exploiting the discontinuous change in incumbency status for the individuals 6

won an election by a tight margin compared to those that lost by a tight margin. In this case, the assignment variable is the Democratic vote share in an election. If the density of the vote share exhibits a discontinuity at the cutoff, the credibility of the RDD estimates could be called into question. Similarly, in the case of the Mexican conditional cash transfer program PROGRESA, treatment assignment is a discontinuous function of a poverty index. This allows us to exploit the discontinuous nature of the treatment assignment to identify the local effects of the policy. However, if individuals manipulate their poverty index to ensure that they receive the treatment, the validity of the RD design is in doubt. In these cases, as suggested by McCrary (2008), one can test for this type of manipulation by checking whether the density of the assignment variable is continuous at the cutoff.

Table: Powers (size-adjusted) of different manipulation tests, Design II, α = 0.05. Columns: d, n, EL CE, EL CE true, EL CCT, EL ad hoc, t ad hoc, t CCT.

Figure 3 displays the undersmoothed histogram of the assignment variable for Lee's (2008) dataset. Figure 4 displays the histogram of the (normalized) poverty index for Calonico et al.'s (2014) dataset. In both figures, the histogram heights differ on the two sides of the cutoff. The relevant question is whether these differences are large enough to reject the null hypothesis that the density is continuous at the threshold.

Table 3 presents the bandwidths and p-values for different manipulation tests. We report results for five tests: our proposed MC-EL test using the estimated CE-optimal bandwidth (EL CE), the MC-EL test employing CCT-type robustification (EL CCT), the MC-EL test employing ad hoc undersmoothing (EL ad hoc), the t-test employing ad hoc undersmoothing (t ad hoc), and, finally, the t-test employing CCT-type robustification (t CCT).

Figure 3: Vote share density

Figure 4: Normalized poverty index density

Table 3: Empirical Applications of Manipulation Tests

Data                      EL CE    EL CCT   EL ad hoc   t ad hoc   t CCT
U.S. House (n = 6558)
  Bandwidth               4.6946   5.455    7.68        7.68       5.455
  p-value                 0.47
PROGRESA (n = 809)
  Bandwidth
  p-value

We are interested both in the outcomes of the manipulation tests for the two applications, PROGRESA and the U.S. House elections, and in the magnitudes of the bandwidths used by the different tests. As Table 3 shows, the null hypothesis of no manipulation is not rejected by any of the tests for the U.S. House elections data: the p-values are large, so continuity of the density at the cutoff is not rejected at any conventional significance level. For PROGRESA, the null of no manipulation is not rejected by the EL CE, EL CCT, EL ad hoc, or t CCT tests, but it is marginally rejected by the t ad hoc test, whose bandwidth choice is not supported by theory. Overall, our MC-EL manipulation test, especially with the CE-optimal bandwidth, yields results that are consistent with the majority of studies using these two datasets.

6 Conclusion

In this paper, we propose an EL-based test for manipulation of the assignment variable in regression discontinuity designs. Our test is based on the first-order conditions from a minimum contrast problem. The testing procedure is easier to implement and computationally less burdensome than other EL-based tests. More importantly, by investigating its second-order properties, we derive a coverage-error-optimal bandwidth selector that minimizes the leading (positive) term in the distortion of the type-I error. We then propose a direct plug-in procedure for estimating this bandwidth in practice. The new test performs well in terms of both size and power in a Monte Carlo experiment. Its usefulness is also demonstrated by two empirical examples in which regression discontinuity designs are employed to assess treatment effects.

Mathematical Appendix

A First-order Asymptotic Properties

Denote $Y_i \equiv h(W_{i,+} - W_{i,-})$ and $\mu_k \equiv E[h^{-1}Y^k]$. Denote $A_k \equiv (nh)^{-1}\sum_{i=1}^n Y_i^k - \mu_k$. For notational simplicity, denote $\bar Y_k \equiv (nh)^{-1}\sum_{i=1}^n (Y_i - \theta h)^k$. Denote $\hat\lambda \equiv \hat\lambda(\theta)$. It is clear that $\hat\lambda$ solves
$$ \frac{1}{nh}\sum_{i=1}^n \frac{Y_i - \theta h}{1 + \hat\lambda(Y_i - \theta h)} = 0. \tag{A.1} $$
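The statistic analysed in this appendix is the EL ratio $\ell = 2\sum_{i=1}^n \log\{1 + \hat\lambda(Y_i - \theta h)\}$, with $\hat\lambda$ solving (A.1). The following is a minimal Python sketch of its computation, assuming the moment values $g_i = Y_i - \theta h$ have already been formed (under $H_0: \theta = 0$, simply $g_i = Y_i$); building $Y_i = h(W_{i,+} - W_{i,-})$ requires the boundary kernels of the main text, which are not reproduced here. Because $\sum_i g_i/(1 + \lambda g_i)$ is strictly decreasing in $\lambda$ on the interval where every $1 + \lambda g_i > 0$, the sketch uses bisection; this solver and the function names are our own choices, not the paper's code. The p-value uses the $\chi^2_1$ limit established in Appendix A.

```python
import math


def el_ratio_statistic(g):
    """Return (ell, lam_hat), where ell = 2 * sum(log(1 + lam_hat * g_i))
    and lam_hat solves sum(g_i / (1 + lam_hat * g_i)) = 0, as in (A.1)."""
    g = [float(v) for v in g]
    g_max, g_min = max(g), min(g)
    if g_max <= 0.0 or g_min >= 0.0:
        # Zero is not inside the convex hull of the g_i: the EL statistic is infinite.
        return math.inf, math.nan
    # The admissible range keeps every 1 + lam * g_i strictly positive.
    lo, hi = -1.0 / g_max, -1.0 / g_min
    eps = 1e-12 * (hi - lo)
    a, b = lo + eps, hi - eps

    def score(lam):
        # Left-hand side of (A.1) up to the factor 1/(nh); strictly decreasing in lam.
        return sum(v / (1.0 + lam * v) for v in g)

    for _ in range(200):
        mid = 0.5 * (a + b)
        if score(mid) > 0.0:
            a = mid
        else:
            b = mid
    lam_hat = 0.5 * (a + b)
    ell = 2.0 * sum(math.log1p(lam_hat * v) for v in g)
    return ell, lam_hat


def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))
```

Usage: with `y` holding the $Y_i$ values, `ell, lam = el_ratio_statistic(y)` followed by `chi2_1_pvalue(ell)` gives the asymptotic p-value for $H_0: \theta = 0$. Note that the factor $(nh)^{-1}$ in (A.1) does not affect $\hat\lambda$, so the sketch omits it.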

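It is also clear that
$$ \ell = 2\sum_{i=1}^n \log\{1 + \hat\lambda(Y_i - \theta h)\}. $$

Proof of Theorem 1. (A.1) can be written as
$$ \frac{1}{nh}\sum_{i=1}^n \left\{(Y_i - \theta h) - \frac{\hat\lambda(Y_i - \theta h)^2}{1 + \hat\lambda(Y_i - \theta h)}\right\} = 0. \tag{A.2} $$
Denote $Z_i \equiv |Y_i - \theta h|$ and $Z_n^* \equiv \max_{i=1,\ldots,n} Z_i$. Let $c$ be the maximum over $t \in [-1, 0]$ of the boundary kernel used to the left of the threshold. By Hoeffding's lemma (see, e.g., Giné and Nickl, 2015), for small enough $h$, we have, for all $t > 0$,
$$ E[\exp(tZ_1)] \le \exp(tE[Z_1])\exp(c^2t^2/8). \tag{A.3} $$
By Jensen's inequality and (A.3), for $t > 0$, we have
$$ \exp(tE[Z_n^*]) \le E[\exp(tZ_n^*)] \le \sum_{i=1}^n E[\exp(tZ_i)] \le n\exp(tE[Z_1])\exp(c^2t^2/8), $$
which implies
$$ E[Z_n^*] \le \frac{\log(n)}{t} + E[Z_1] + \frac{c^2 t}{8}. $$
Now $E[Z_n^*] \le 2(c^2/8)^{1/2}(\log(n))^{1/2} + E[Z_1]$ follows by setting $t = (8c^{-2})^{1/2}(\log(n))^{1/2}$. By noticing $E[Z_1] = O(h)$ and using Markov's inequality, we have $Z_n^* = O_p(\log(n)^{1/2})$. It follows from (A.2) that
$$ |\hat\lambda|\,\bar Y_2 \le |\bar Y_1|\,(1 + |\hat\lambda| Z_n^*). \tag{A.4} $$
By the Chebyshev inequality and Bickel and Doksum (2015),
$$ \bar Y_1 = \frac{1}{n}\sum_{i=1}^n\{(W_{i,+} - W_{i,-}) - E[W_{i,+} - W_{i,-}]\} + (E[W_{i,+} - W_{i,-}] - \theta) = O_p((nh)^{-1/2}) + O(h^2). \tag{A.5} $$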
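As a worked step for the maximal inequality above, the value of $t$ is the minimizer of the bound $\log(n)/t + E[Z_1] + c^2t/8$, and at that minimizer the two $t$-dependent terms are equal:

```latex
\begin{align*}
\frac{d}{dt}\Bigl\{\frac{\log n}{t} + \frac{c^2 t}{8}\Bigr\}
  &= -\frac{\log n}{t^2} + \frac{c^2}{8} = 0
  \;\Longrightarrow\; t^{*} = (8c^{-2})^{1/2}(\log n)^{1/2},\\
\frac{\log n}{t^{*}} &= \frac{c^2 t^{*}}{8} = \Bigl(\frac{c^2}{8}\Bigr)^{1/2}(\log n)^{1/2},\\
E[Z_n^{*}] &\le 2\Bigl(\frac{c^2}{8}\Bigr)^{1/2}(\log n)^{1/2} + E[Z_1]
  = \frac{c}{\sqrt{2}}(\log n)^{1/2} + E[Z_1].
\end{align*}
```

Since the bound grows only like $(\log n)^{1/2}$ and $E[Z_1] = O(h)$, Markov's inequality then gives $Z_n^* = O_p(\log(n)^{1/2})$.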

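It now follows from $Z_n^* = O_p(\log(n)^{1/2})$, (A.5) and the assumption $nh^5 \to 0$ that
$$ Z_n^*|\bar Y_1| = O_p\Bigl(\Bigl(\frac{\log(n)}{nh}\Bigr)^{1/2}\Bigr) = o_p(1). \tag{A.6} $$
It is easy to verify that
$$ \bar Y_2 = \lim_{h \to 0}\mu_2 + O_p((nh)^{-1/2}) + O(h), \tag{A.7} $$
where the limit is proportional to $\varphi_+ + \varphi_-$. Now it follows from (A.4), (A.5), (A.6), (A.7), Assumption (ii) and $nh^5 \to 0$ that $\hat\lambda = O_p((nh)^{-1/2})$. Now, (A.1) can be written as
$$ \frac{1}{nh}\sum_{i=1}^n \left\{(Y_i - \theta h) - \hat\lambda(Y_i - \theta h)^2 + \frac{\hat\lambda^2(Y_i - \theta h)^3}{1 + \hat\lambda(Y_i - \theta h)}\right\} = 0. \tag{A.8} $$
It is also straightforward to verify that
$$ \frac{1}{nh}\sum_{i=1}^n |Y_i - \theta h|^3 = O_p(1) \tag{A.9} $$
follows from Markov's inequality. Now
$$ \frac{1}{nh}\sum_{i=1}^n \frac{|Y_i - \theta h|^3}{|1 + \hat\lambda(Y_i - \theta h)|} \le \Bigl\{\frac{1}{nh}\sum_{i=1}^n |Y_i - \theta h|^3\Bigr\}(1 - |\hat\lambda| Z_n^*)^{-1} = O_p(1) \tag{A.10} $$
follows from (A.9), $Z_n^* = O_p(\log(n)^{1/2})$, $\hat\lambda = O_p((nh)^{-1/2})$ and Assumption (ii). Now (A.8), (A.10) and $\hat\lambda = O_p((nh)^{-1/2})$ imply
$$ \hat\lambda = \bar Y_2^{-1}\bar Y_1 + O_p((nh)^{-1}). \tag{A.11} $$
Now by a Taylor expansion,
$$ \ell = 2\sum_{i=1}^n \Bigl\{\hat\lambda(Y_i - \theta h) - \frac{1}{2}\hat\lambda^2(Y_i - \theta h)^2\Bigr\} + \sum_{i=1}^n \eta_i, \tag{A.12} $$
where $|\eta_i| \le (2/3)|\hat\lambda|^3|Y_i - \theta h|^3(1 - |\hat\lambda| Z_n^*)^{-3}$. Therefore, (A.5), (A.9), (A.11), (A.12) and $\hat\lambda = O_p((nh)^{-1/2})$ imply that
$$ \ell = (nh)\bar Y_2^{-1}\bar Y_1^2 + O_p((nh)^{-1/2}). \tag{A.13} $$
Now $\ell \to_d \chi^2_1$ follows from (2.5), (A.7) and (A.13).

B Coverage Accuracy

B.1 Stochastic Expansion

By using (A.1) and the equality
$$ \frac{a}{b} = \frac{a}{c} - \frac{a(b-c)}{c^2} + \frac{a(b-c)^2}{c^3} - \frac{a(b-c)^3}{c^4} + \frac{a(b-c)^4}{bc^4}, \tag{B.1} $$
we have
$$ \frac{1}{nh}\sum_{i=1}^n \Bigl\{(Y_i - \theta h) - (Y_i - \theta h)^2\hat\lambda + (Y_i - \theta h)^3\hat\lambda^2 - (Y_i - \theta h)^4\hat\lambda^3 + \frac{(Y_i - \theta h)^5\hat\lambda^4}{1 + (Y_i - \theta h)\hat\lambda}\Bigr\} = 0, $$
which gives
$$ \hat\lambda = \bar Y_2^{-1}\bar Y_1 + \bar Y_2^{-1}\bar Y_3\hat\lambda^2 - \bar Y_2^{-1}\bar Y_4\hat\lambda^3 + \bar Y_2^{-1}\frac{1}{nh}\sum_{i=1}^n \frac{(Y_i - \theta h)^5\hat\lambda^4}{1 + (Y_i - \theta h)\hat\lambda}. \tag{B.2} $$
Denote $\ell \equiv \ell(\theta)$. By a Taylor expansion, we have
$$ \ell = 2\sum_{i=1}^n \log\{1 + \hat\lambda(Y_i - \theta h)\} = 2\sum_{i=1}^n \Bigl\{\hat\lambda(Y_i - \theta h) - \frac{1}{2}\hat\lambda^2(Y_i - \theta h)^2 + \frac{1}{3}\hat\lambda^3(Y_i - \theta h)^3 - \frac{1}{4}\hat\lambda^4(Y_i - \theta h)^4\Bigr\} + O_p((nh)\hat\lambda^5). \tag{B.3} $$
Now (B.2) and (B.3) give
$$ \ell = (nh)\Bigl\{\bar Y_2^{-1}\bar Y_1^2 + \frac{2}{3}\bar Y_2^{-3}\bar Y_3\bar Y_1^3 + \bar Y_2^{-5}\bar Y_3^2\bar Y_1^4 - \frac{1}{2}\bar Y_2^{-4}\bar Y_4\bar Y_1^4\Bigr\} + O_p((nh)\hat\lambda^5). \tag{B.4} $$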
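Expansion (B.4) can be sanity-checked numerically. Because $\bar Y_k = S_k/(nh)$ with $S_k = \sum_i (Y_i - \theta h)^k$, the factor $nh$ cancels in every term, so only $S_1$ to $S_4$ matter. The sketch below is our own illustration on artificial moment values (the data and the bisection solver are assumptions, not the paper's code); it compares the exact $\ell$ with the first-order term $(nh)\bar Y_2^{-1}\bar Y_1^2$ of (A.13) and with the four-term expansion (B.4) with its remainder dropped.

```python
import math


def el_stat(g):
    """Exact ell = 2 * sum(log(1 + lam * g_i)), lam solving sum(g_i / (1 + lam * g_i)) = 0.
    Requires min(g) < 0 < max(g)."""
    lo, hi = -1.0 / max(g), -1.0 / min(g)
    eps = 1e-12 * (hi - lo)
    a, b = lo + eps, hi - eps
    for _ in range(200):
        mid = 0.5 * (a + b)
        if sum(v / (1.0 + mid * v) for v in g) > 0.0:
            a = mid
        else:
            b = mid
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log1p(lam * v) for v in g)


def b4_terms(g):
    """First-order term of (A.13) and the four-term expansion of (B.4)."""
    s1, s2, s3, s4 = (sum(v ** k for v in g) for k in (1, 2, 3, 4))
    first = s1 ** 2 / s2
    four = (first
            + (2.0 / 3.0) * s3 * s1 ** 3 / s2 ** 3
            + s3 ** 2 * s1 ** 4 / s2 ** 5
            - 0.5 * s4 * s1 ** 4 / s2 ** 4)
    return first, four


# Hypothetical moment values with a small nonzero mean.
g = [1.05, -0.95] * 200
exact = el_stat(g)
first, four = b4_terms(g)
```

On such inputs the four-term expansion should track the exact statistic much more closely than the first-order term, which is what the higher-order terms in (B.4) are for.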

23 It is straightforward to verify that Y =A +(µ ) Y =(µ + A ) h (µ + A )+h Y 3 =(µ 3 + A 3 ) 3h (µ + A )+3h (µ + A ) h 3 Y 4 =(µ 4 + A 4 ) 4h (A 3 + µ 3 )+6h (A + µ ) 4h 3 3 (A + µ )+h 3 4. (B.5) It is also straightforward to check that A k = O p (nh) /, µ k = O (), (B.6) for k =,, 3. Denote µ = MC h + O h 3, where the second equality follows from Bickel and Doksum (05, Proposition.3.). Now by using this result, equalities that are similar to (B.), (B.5), (B.6) andµ = O (), which follows from the fact µ =(' + + ' ) + O (h) with (' + + ' ) > 0, wehave Y Y =µ Y µ Y Y µ + µ 3 Y Y µ =µ Y µ Y A h ha + µ 3 Y A h! + O p (nh) 5 / + n h + h n + h6 (nh) / + h7 µ 3 Y Y Y µ 3, (B.7) Y 3 Y 3 Y 3 =µ 3 Y 3Y 3 µ 6 Y 3Y 3 Y 3 µ 3 + µ 6 Y 3 Y 3 Y 3 Y 3 µ 3 =µ 3 µ 3Y 3 + µ 3 A 3Y 3 3µ hy 3 3µ 4 µ 3Y 3 A h + O p (nh) 5 / + n h + h3 (nh) + h5 3 / n + h7 (nh) / + h8!, (B.8)! Y 5 Y 3Y 4 = µ 5 µ 3Y 4 + O p (nh) + 5 / n h + h8 (nh) + h9, (B.9) / and! Y 4 Y 4 Y 4 = µ 4 µ 4Y 4 + O p (nh) + 5 / n h + h8 (nh) + h9. (B.0) / 3
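The expansions of $\bar Y_1, \ldots, \bar Y_4$ above are binomial expansions of $(Y_i - \theta h)^k$. For example, $\bar Y_2 = (\mu_2 + A_2) - 2\theta h(\mu_1 + A_1) + \theta^2 h$, where $(nh)^{-1}\sum_i Y_i^j = \mu_j + A_j$ by the definition of $A_j$, and the last term comes from $(nh)^{-1}\sum_i 1 = h^{-1}$. A short check of the general identity on arbitrary numbers (the sample values and function names are ours, for illustration only):

```python
import math
import random


def ybar_direct(y, theta, h, k):
    """Ybar_k = (nh)^{-1} * sum_i (Y_i - theta*h)^k."""
    return sum((yi - theta * h) ** k for yi in y) / (len(y) * h)


def ybar_via_moments(y, theta, h, k):
    """Binomial expansion: (nh)^{-1} sum_i Y_i^j equals mu_j + A_j for j >= 1 and
    1/h for j = 0, the latter producing the term (-theta)^k * h^(k-1)."""
    n = len(y)
    total = 0.0
    for j in range(k + 1):
        m_j = sum(yi ** j for yi in y) / (n * h)
        total += math.comb(k, j) * (-theta * h) ** (k - j) * m_j
    return total


random.seed(0)
y = [random.uniform(-1.0, 1.0) for _ in range(50)]  # arbitrary illustrative values
```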

24 It now follows from (B.4), (B.7), (B.8), (B.9), (B.0) and the fact ˆ = O p (nh) / + h that ` =(nh) µ µ A h Y +µ ha Y + µ 3 A h Y + 3 µ 3 3Y µ 3 (A 3 3µ h) Y 3 µ 4 µ 3 A h Y O p (nh) 3 / + n + h3 + h (nh) / +(nh) / h 6 + nh 8! µ 5 µ 3. µ 4 µ 4 Y 4 By lengthy but straightforward algebraic derivations and the fact Y = O p (nh) / + h, we get (3.6) where `0 (nh) µ / Y µ 3 / A h Y + 3 µ 5 / µ 3 Y + 3 µ 5 / (A 3 3µ h) Y 5 6 µ 7 / µ 3 A h Y µ 9 / µ µ 5 / A h Y + µ 3 / ha Y 4 µ 7 / µ 4 Y 3 (B.) is a stochastic approximation. 7 By the fact Y = A +, we can write `0 =(nh)(r 0 + R + R + R 3 ), where R 0 µ / + µ 3 / h + 3 µ 5 / µ 3 µ 3 / h µ 7 / µ 3 h 7 Adapting the proof for deriving an upper bound of the moderate deviation probability for kernel-type estimators (see, e.g., Li and Racine (007, Equation.56)), wecanshowthatforanyk, for any > 0, thereexistssome > 0 such that " / # log (n) P A k > = o n. (B.) nh / Let + h. By using the fact E[Z n]=o log (n) / and (B.), the argument in Liu et al. (04, n log(n) nh Appendix A.) applied to (A.8) shows that for any >0, thereexistsa > 0 such that P[ ˆ > n] =o n. (B.3) Using (B.) and(b.3) andapplyingthedeltamethod(see,e.g.,hall (99, Section.7)),wecanshowthatthe difference between the CDF of `0 and that of ` is of order of magnitude log (n) 5/ (nh) 3/ +log(n) n +log(n) h 3 + log (n) 3/ h (nh) / +(nh) / log (n) / h 6 + nh 8. 4

25 4 + 9 µ 9 / µ 3 4 µ 7 / µ µ 5 / h 4, (B.4) R µ / + µ 3 / h µ 5 / 4 h + 3 µ 5 / µ 3 µ 3 / h µ 7 / µ 9 / µ 3 µ 3 / 3 4 µ 7 / µ 4 A 5 6 µ 7 / µ h A + 3 µ 5 / A 3, h R + 3 µ 5 / µ µ 7 / µ 3 / 4 µ 3 h µ 5 / h 9 µ 9 / µ 3 4 µ 7 / µ 4 A 5 3 µ 7 / µ 3 A A + 3 µ 5 / A A µ 5 / A and R 3 3 µ 5 / A 5 A 3 6 µ 7 / 4 µ 3 A A + 9 µ 9 / µ 3 4 µ 7 / µ 4 A µ 5 / A A. We denote R R + R + R 3. B. Proof of Theorem Proof of Theorem. Let apple 0 j denote the j-th cumulant of (nh) / R. Computations in Section B.3 show that apple 0 = 6 µ 3 / + (nh) + / µ 3 4 µ 5 / µ 3 + µ / h (nh) / 3 µ 7 / µ 3 MC µ 5 / 5 µ 4 MC 6 µ 7 / µ µ 3 / 3! + O h 3 (nh) / +(nh) 3 /, h (nh) / apple 0 =+ 3 µ µ 3 MC h + µ µ 4 apple 0 3 = O 3 36 µ 3 µ 3 (nh) 3 / + n, nh + O h 3 + n + n h, 5

26 and Let i p apple 0 4 = O n +(nh).. Now formally expanding the characteristic function of (nh) / R,wehave h i E exp it (nh) / R t =exp + +(nh) / 6 µ 3 / µ 3 it + 4 µ 5 / µ 3 + µ / 3 µ 7 / µ 3 MC µ 5 / 5 µ 4 MC 6 µ 7 / µ µ 3 / 3 36 µ 7 / µ 3 MCh (it) µ µ 3 MCh (it) +(nh) 4 µ µ 4 h 3 + n +(nh) 3 / + O hit h it 6 µ 3 µ 3 (it). (B.5) The leading term of (B.5) is the Fourier-Stieltjes transform of x 7! R x n (z)dz, where n (x) (x)+(nh) / + 3 µ 7 / µ 3 MC µ 5 / 36 µ 7 / 6 µ 3 / µ 3 x (x)+ 4 µ 5 / µ 3 + µ / hx (x) 5 µ 4 MC 6 µ 7 / µ µ 3 / 3 h x (x) µ 3 MCh x 3 3x (x) + 6 µ µ 3 MCh x (x) x (x). (B.6) +(nh) 4 µ µ 4 6 µ 3 µ 3 The formal Edgeworth expansion of (nh) / R is defined to be such an inverse Fourier-Stieltjes transform. Let C be a class of Borel sets satisfying sup R (@C) (x) =O ( ), denotes the CC boundary of C and (@C) denotes its -neighborhood. The formal Edgeworth expansion is valid in the sense of Bhattacharya and Ghosh (978) if sup CC h i P (nh) / R C Z C n (z)dz = O h 3 + n +(nh) 3 /. (B.7) Now suppose that (B.7) holds. It is straightforward to verify Z c / (nh) / R 0 (x)dx =( ) c / c / (nh) / R 0 c / (nh) R0 + O (nh) R0 4, (B.8) 6

27 Z c / (nh) / R 0 x (x)dx = c / c / (nh) / R 0 c / (nh) / R 0 + O (nh) 3 / R0 3, (B.9) and Z c / (nh) / R 0 x (x)dx = c / c / (nh) / R 0 c / + O (nh) R0 (B.0) Z c / (nh) / R 0 c / (nh) / R 0 x 3 3x (x)dx = (c ) c / (nh) / R 0 + O (nh) 3 / R0 3. (B.) Now by (B.4), (B.7), (B.8), (B.9), (B.0) and(b.), we have Pr `0 6 c =Pr h c / (nh) / R 0 6 (nh) / R 6 c / Z / c (nh) / R 0 = c / (nh) n (x)dx + O / R 0 =( ) (nh) / R 0 i h 3 + n +(nh) 3 / µ nh 5 MC +(nh) µ µ 4 + O h 3 + n +(nh) 3 / + nh 6 + n h 0. 3 µ 3 µ 3 c / c / Notice that in the above distributional expansion, the O h term vanishes. Now it suffices to verify (B.7). The argument we follow is identical to that of Chen and Qin (00). Now, define w (t) =K,+ (t) (t >0) K, (t) (t <0). It is clear that W i,+ W i, = h w Xi c h, i =,,... It can be shown that the following condition (as an analogue of Cramé r s condition) is satisfied: for any, there exists some C > 0 such that sup t + t + t 3 > Z exp it w (u)+it w (u) +it 3 w (u) 3 hf (c hu)du 6 C h (B.) for all sufficiently small h. This can be proved under the additional condition imposed on K,+ by following the arguments in the proof of Hall (99, Lemma 5.6) and Chen and Qin (00, Lemma ). Let A (A,A,A 3 ) 0. By applying (B.) and repeating the same arguments as those in the proof of Hall (99, Theorem 5.8), we establish a valid Edgeworth expansion for (nh) / A. We notice 7

28 that R is a third-order polynomial in A. The validity result (B.7) follows from the existence of a valid Edgeworth expansion for (nh) / A and Skovgaard (98, Theorem 3.), which showed that the validity of an Edgeworth expansion is preserved under a smooth transformation. B.3 Computation of Cumulants of R B.3. First Cumulant By the formulae for moments of products of sample averages on DiCiccio et al. (988, Page),we have E[R ]=0, E[R ]= + 3 µ 5 / µ µ 7 / µ 3 / + 3 µ 5 / n µ4 nh 4 µ 3 h µ 5 / h 5 3 µ 7 / 9 µ 9 / µ 3 µ 3 µ µ o n 8 µ 5 / µ4 nh n µ3 nh 4 µ 7 / µ 4 µ n µ µ o n µ nh µ n (B.3) and E[R 3 ]=O (nh). Let apple j denote the j-th cumulant of R. Nowbyusing = MC h +O h 3, the first cumulant of R satisfies apple =E [R + R + R 3 ] = 6 µ 3 / µ 3 nh µ 5 / µ 3 + µ / n 3 µ 7 / µ 3 MC µ 5 / µ 4 MC 5 6 µ 7 / µ µ 3 / 3 h h n + O n +(nh). (B.4) B.3. Second Cumulant Again, by the formulae on DiCiccio et al. (988, Page ) and lengthy calculations, we have E R = nh + 3 µ µ 3 MC h n + O h n, (B.5) E[R R ]= 3 µ 3 µ 3 µ µ 4 n h + O n, (B.6) h 8

29 5 E[R R 3 ]= 8 µ µ 4 5 µ 3 µ 3 n h + O n h + n 3 h 3 (B.7) and E R = 6 µ 3 µ µ µ 4 n h + O n h + n 3 h 3. (B.8) Now (B.3), (B.5), (B.6), (B.7) and(b.8) imply apple =E R E[R] =E R + E [R R ] + E [R R 3 ]+E R = nh + 3 µ µ 3 MC h n + µ µ µ 3 µ 3 E[R ] + O n 3 h 3 n h + O h n + n h + n 3 h 3. B.3.3 Third Cumulant By the formulae on DiCiccio et al. (988, Page ) and lengthy calculations, we have E R 3 o = nµ 3 / 3 µ 3 n h + µ 5 / µ 3 3µ / n h + O n, (B.9) E R R = µ 3 / 3 µ 3 n h + 4 µ 5 / µ µ / n h + O n 3 h + n. (B.30) It follows from (B.3) and(b.5) that E[R ]E R = 6 µ 3 / µ 3 n h + 4 µ 5 / µ 3 + µ / n h + O n. (B.3) Now the formulae on DiCiccio et al. (988, Page ) directly give moment bounds, e.g., E R R = O (nh) 3. These bounds and (B.4), (B.9), (B.30) and(b.3) imply apple 3 =E R 3 3 E R E[R]+ E[R] 3 =E R 3 +3 E R R =O (nh) 3 + n. E[R ]E R + O (nh) 3 (B.3) 9
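The cumulant computations in B.3 turn moments of $R$ into cumulants through the standard relations $\kappa_2 = E[R^2] - E[R]^2$, $\kappa_3 = E[R^3] - 3E[R^2]E[R] + 2E[R]^3$ (as in (B.32)) and $\kappa_4 = E[R^4] - 3E[R^2]^2 - 4E[R]\kappa_3 + 2E[R]^4$, the form used at the start of B.3.4. A small check against distributions whose cumulants are known (our own illustration):

```python
def cumulants_from_raw_moments(m1, m2, m3, m4):
    """Second to fourth cumulants from the raw moments E[R], E[R^2], E[R^3], E[R^4]."""
    k2 = m2 - m1 ** 2
    k3 = m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3
    # Equivalent to the usual formula after substituting m3 = k3 + 3*m2*m1 - 2*m1^3.
    k4 = m4 - 3.0 * m2 ** 2 - 4.0 * m1 * k3 + 2.0 * m1 ** 4
    return k2, k3, k4


# Exponential(1): raw moments 1, 2, 6, 24 and cumulants (n - 1)!, i.e. 1, 2, 6.
exp_cumulants = cumulants_from_raw_moments(1.0, 2.0, 6.0, 24.0)
```

For a standard normal (raw moments 0, 1, 0, 3) the third and fourth cumulants vanish, which is why only $\kappa_3$ and $\kappa_4$ carry the non-Gaussian corrections in the Edgeworth expansion.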

30 B.3.4 Fourth Cumulant By the relation between cumulants and moments, we have apple 4 =E R 4 3 E R 4 E[R] apple 3 + E[R] 4 =E R 4 3 E R + O n 3 h +(nh) 4. (B.33) The formulae on DiCiccio et al. (988, Page ) imply moment bounds, e.g., E R R 3 = O (nh) 4. By these results and (B.33), we now have apple 4 =E R 4 +6 E R R +4 E R R 3 +4 E R3 R 3 n 3 E R + E R E R +4 E R E[R R ]+4 E o R E[R R 3 ] + O n 3 h +(nh) 4. (B.34) By lengthy calculations, we have E R 4 3 E R = n 3 h 3 µ µ 4 + O n 3 h, (B.35) E R R 3 E R 3 R 3 3 E[R R ]E R = n 3 h 3 3 µ 3 µ 3 E[R 3 R ]E R = n 3 h 3 µ µ 4 3 µ µ 4 + O µ 3 µ 3 + O n 3 h, (B.36) n 3 h (B.37) and E RR E R E R = n 3 h 3 µ µ 4 6 µ 3 µ 3 + O n 3 h. (B.38) Now (B.34), (B.35), (B.36), (B.37) and(b.38) implythat apple 4 = O n 3 h +(nh) 4. C Implementation: Pilot Bandwidths for Estimating ' s and ' (3) s A DPI estimate of (3.5) requires pilot estimates of ' s and ' (3) s. Following Arai and Ichimura (05), we use inconsistent but easily implementable estimates of these objects. These estimates 30

can be based on the local regression idea of Cattaneo et al. (2016). Note that we can write $F(z) = E[1(\tilde X \le X) \mid X = z]$, where $F$ is the distribution function of $X$ and $\tilde X$ is an independent copy of $X$. Let $n_+$ denote the number of observations with $X_i > c$, and let $Z_1, \ldots, Z_{n_+}$ denote these observations. Instead of a local regression, we fit a global polynomial regression. For each of these observations, we generate $R_i \equiv (n_+ - 1)^{-1}\sum_{j \ne i} 1(Z_j \le Z_i)$, for $i = 1, \ldots, n_+$, and regress $R_i$ on $1, (Z_i - c), (Z_i - c)^2, \ldots, (Z_i - c)^5$ to obtain the OLS coefficients, denoted by $\hat\beta_+$. The pilot estimate of $\varphi_+$ is given by $(n_+/n)\, e_2'\hat\beta_+$ and the pilot estimate of $\varphi_+^{(3)}$ is given by $(n_+/n)\, 4!\, e_5'\hat\beta_+$, where $e_k$ is the $k$-th standard basis vector. It is clear from Equation (3.5) that, to form a valid DPI estimate of the AMSE-optimal bandwidth, the estimate of $\varphi_+$ should not be negative. In practice, we therefore solve the constrained least-squares problem
$$ \hat\beta_+ = \arg\min_{\beta:\, e_2'\beta \ge 0} \sum_{i=1}^{n_+}\{R_i - r_5(Z_i - c)'\beta\}^2, \qquad r_5(u) \equiv (1, u, \ldots, u^5)', $$
to avoid negativity.8 Similarly, we obtain inconsistent but easily implementable estimates of $\varphi_-$ and $\varphi_-^{(3)}$. In our simulations, we find that these estimators perform well enough, although theoretically they are not consistent for $\varphi_s$ and $\varphi_s^{(3)}$, $s \in \{-, +\}$.

8 We use the MATLAB command quadprog to solve the inequality-constrained quadratic programming problem.

References

Arai, Y. and H. Ichimura (2015). Simultaneous selection of optimal bandwidths for the sharp regression discontinuity estimator. Working paper, GRIPS.
Armstrong, T. and M. Kolesár (2015). A simple adjustment for bandwidth snooping. Working paper, Yale University.
Armstrong, T. and M. Kolesár (2016). Simple and honest confidence intervals in nonparametric regression. Working paper, Yale University.
Bhattacharya, R. N. and J. K. Ghosh (1978). On the validity of the formal Edgeworth expansion. The Annals of Statistics.

Bickel, P. J. and K. A. Doksum (2015). Mathematical statistics: basic ideas and selected topics. CRC Press.
Calonico, S., M. D. Cattaneo, and M. Farrell (2016). Coverage error optimal confidence intervals for regression discontinuity designs. Working paper, University of Miami.
Calonico, S., M. D. Cattaneo, and M. Farrell (2017). On the effect of bias estimation on coverage accuracy in nonparametric inference. Journal of the American Statistical Association. Forthcoming.
Calonico, S., M. D. Cattaneo, and R. Titiunik (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica 82 (6).
Cattaneo, M. D., M. Jansson, and X. Ma (2016). Simple local regression distribution estimators with an application to manipulation testing. Working paper, University of Michigan.
Chen, S. X. (1996). Empirical likelihood confidence intervals for nonparametric density estimation. Biometrika.
Chen, S. X. and H. Cui (2007). On the second-order properties of empirical likelihood with moment restrictions. Journal of Econometrics 141 (2).
Chen, S. X. and Y. S. Qin (2000). Empirical likelihood confidence intervals for local linear smoothers. Biometrika.
Chen, S. X. and Y. S. Qin (2002). Confidence intervals based on local linear smoother. Scandinavian Journal of Statistics 29 (1).
Cheng, M.-Y., J. Fan, and J. S. Marron (1997). On automatic boundary corrections. The Annals of Statistics 25.
Davidson, R. and J. G. MacKinnon (1998). Graphical methods for investigating the size and power of hypothesis tests. The Manchester School 66, 1–26.
DiCiccio, T., P. Hall, and J. Romano (1988). Bartlett adjustments for empirical likelihood. Technical report No. 98, Department of Statistics, Stanford University.
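The pilot-estimation recipe of Appendix C can be sketched in Python as follows. Assumptions: NumPy is used for least squares, and instead of MATLAB's quadprog (footnote 8) the single nonnegativity constraint on the slope coefficient is handled by refitting with that coefficient fixed at zero whenever the unconstrained slope is negative; for a least-squares problem with one linear inequality constraint this yields the constrained minimizer. The function and argument names are hypothetical.

```python
import math

import numpy as np


def pilot_estimates(x, c, side="+", degree=5):
    """Global-polynomial pilot estimates of the one-sided density at the cutoff c
    and of its third derivative (sketch of Appendix C; names are hypothetical)."""
    x = np.asarray(x, dtype=float)
    z = x[x > c] if side == "+" else x[x < c]
    n_s = z.size
    share = n_s / x.size  # n_+/n (or n_-/n)
    # R_i = (n_s - 1)^{-1} * #{j != i : Z_j <= Z_i}, a leave-one-out empirical CDF.
    counts = np.searchsorted(np.sort(z), z, side="right") - 1
    r = counts / (n_s - 1)
    # Design matrix with columns 1, (Z_i - c), ..., (Z_i - c)^degree.
    design = np.vander(z - c, degree + 1, increasing=True)
    beta = np.linalg.lstsq(design, r, rcond=None)[0]
    if beta[1] < 0:
        # The constraint e_2' beta >= 0 binds: refit with the slope fixed at zero.
        keep = [j for j in range(degree + 1) if j != 1]
        beta = np.zeros(degree + 1)
        beta[keep] = np.linalg.lstsq(design[:, keep], r, rcond=None)[0]
    phi_hat = share * beta[1]                          # (n_s/n) e_2' beta
    phi3_hat = share * math.factorial(4) * beta[4]     # (n_s/n) 4! e_5' beta
    return phi_hat, phi3_hat
```

As a usage example, on an evenly spaced grid over (−1, 1) with cutoff 0, whose density is 0.5 on each side, `pilot_estimates(x, 0.0, "+")` and `pilot_estimates(x, 0.0, "-")` should both return a density estimate close to 0.5 and a third-derivative estimate close to zero, because the leave-one-out CDF is then exactly linear in $Z_i - c$.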
