Consistent Tests for Conditional Treatment Effects

Yu-Chin Hsu*
Department of Economics, University of Missouri at Columbia

(Preliminary: please do not cite or quote without permission.)
This version: May 11, 2011

* Department of Economics, University of Missouri at Columbia, Columbia, MO, U.S.A. Acknowledgement: I thank Jason Abrevaya, Richard Chiburis, Stephen G. Donald, and Kyungchul Song for their insightful comments. All errors and omissions are my own responsibility.

Abstract

We construct a Kolmogorov-Smirnov test for the null hypothesis that the average treatment effect is non-negative conditional on all possible values of the covariates. Under the unconfoundedness assumption, this null hypothesis can be characterized as a conditional moment inequality, and we employ the instrumental variable method to convert the conditional moment inequality into unconditional ones without information loss. The Kolmogorov-Smirnov test is constructed from these unconditional moment inequalities. We show that our test controls size asymptotically, is consistent against fixed alternatives, and is unbiased against some $N^{-1/2}$ local alternatives. Furthermore, our test is more powerful than Lee and Whang's (2009) test against a broad set of $N^{-1/2}$ local alternatives. Monte-Carlo simulation results confirm our theoretical findings. Several extensions are also discussed.

JEL classification: C01, C12, C21
Keywords: Hypothesis testing, treatment effects, test consistency, propensity score.

1 Introduction

This paper proposes a Kolmogorov-Smirnov (KS) test for the null hypothesis that the average treatment effect is non-negative conditional on all possible values of the covariates. We show that, under the unconfoundedness assumption, this null hypothesis can be characterized as a conditional moment inequality. We employ the instrumental variable approach of Andrews and Shi (2010, AS hereafter) to transform the conditional moment inequality into an infinite number of unconditional moment inequalities without information loss. An inverse probability weighted (IPW) estimator as in Hirano, Imbens and Ridder (2003, HIR hereafter) is used to estimate each of the unconditional moments, and the estimated moments, indexed by the instrument functions, are shown to converge weakly to a mean-zero Gaussian process. As in Donald and Hsu (2010b, DH hereafter), we propose a simulation method based on the multiplier central limit theorem to approximate the limiting process of the estimated moments. The test statistic is defined as the supremum of the estimated unconditional moments over the instrument functions. The critical value is constructed from the simulated process and the generalized moment selection (GMS) approach. The GMS method introduced by Andrews and Soares (2010) and AS is similar to the recentering method of Hansen (2005) and Donald and Hsu (2010a) and to the contact set method of Linton, Song and Whang (2010). These methods improve the power of tests involving inequalities without resorting to the least favorable configuration (LFC). We show that our test controls size at the prespecified significance level asymptotically and is consistent against any fixed alternative. Our test is also unbiased against some $N^{-1/2}$ local alternatives.
Lee and Whang (2009, LW hereafter) consider the same null hypothesis as ours; their test statistic is a one-sided $L_1$-type functional of the nonparametric kernel estimator of the conditional average treatment effect. Their test controls size well asymptotically and is also consistent against fixed alternatives. One drawback, however, is that implementing their test requires first specifying a strict subset of the support of the covariates and restricting attention to this subset. This can make LW's test inconsistent when the violation of the null hypothesis occurs outside this subset. When the violation lies within the subset, both tests are consistent, and which one is more powerful depends on the underlying fixed alternative. However, we show that our test is more powerful than LW's against some $N^{-1/2}$ local alternatives that do not converge to the least favorable case. We conduct small-scale Monte-Carlo simulations to study the finite sample performance of our test and LW's, and the results support our theoretical findings.

This paper is related to the treatment effect literature; for recent reviews of this large literature, see Imbens (2004) and Imbens and Wooldridge (2009), among others. Most papers in this literature focus on estimation of and inference for average treatment effects, and only a few construct tests for the average treatment effect conditional on the covariates, e.g., LW and Crump, Hotz, Imbens and Mitnik (2008). We have discussed LW's test above. Crump, Hotz, Imbens and Mitnik (2008) construct nonparametric tests for two different null hypotheses: that the average treatment effects conditional on the covariates are equal to zero for all values of the covariates, or that they are equal to a constant.[1] These null hypotheses involve conditional moment equalities, which differ from ours. The methods developed in this paper extend directly to those null hypotheses, but a detailed comparison between our method and theirs is beyond the scope of this paper and is left for future research. This paper is also related to the literature on conditional moment inequalities, e.g., AS, Chernozhukov, Lee and Rosen (2008), Galichon and Henry (2009), Fan (2008) and Kim (2008). These papers construct confidence sets for parameters defined by a set of conditional moment inequalities and/or equalities. Our focus is different, because we are interested in testing a null hypothesis that is characterized by a moment inequality. Lee, Song and Whang (2011) consider a problem similar to ours, but they require that the treatment assignment be random and independent of the covariates, with a known probability of assignment.

We consider several extensions of our test. We first extend our results to test the null hypothesis that the conditional stochastic dominance relation between the potential outcomes holds for all values of the covariates. LW also consider this type of null hypothesis.
In the treatment effect literature, only a small number of papers, such as DH and Maier (2011), discuss the stochastic dominance relation between the two groups under the unconfoundedness assumption. Abadie (2002), on the other hand, considers tests for the stochastic dominance relation when the treatment assignment is endogenous. These papers focus on the unconditional stochastic dominance relation between the potential outcomes, whereas we focus on the conditional one. Second, we extend our results to the case where the conditioning set is a strict subset of the covariates, so that the unconfoundedness assumption does not hold if we condition only on this subset. The only paper in the literature that discusses the conditional average treatment effect in this case is Hsu and Lieli (2011), who propose a two-step kernel estimator for the conditional average treatment effect; here, by contrast, we are interested in testing whether the conditional average treatment effect is uniformly non-negative conditional on this subset of covariates. Finally, we extend our test to the case where the treatment assignment is endogenous, as in the local average treatment effect setup of Imbens and Angrist (1994), Abadie, Angrist and Imbens (2002), Abadie (2002, 2003), Frölich (2007) and Donald, Hsu and Lieli (2010).

The rest of this paper is organized as follows. Section 2 introduces the model setup and formulates the null hypothesis of interest as a conditional moment inequality. There we introduce AS's instrument approach to transform the conditional moment inequality into a continuum of unconditional moment inequalities without information loss, introduce an IPW estimator of the unconditional moments, and discuss the test statistic and the decision rule. Section 3 derives the asymptotics of the estimated moments and the test statistic. Section 4 presents a simulation method to approximate the limiting process of the estimated moments, together with the GMS method on which the critical value is based. Section 5 discusses the size properties, the consistency against fixed alternatives and the asymptotic local power of our test against $N^{-1/2}$ local alternatives. We introduce LW's test and compare it with ours in Section 6. Section 7 summarizes the Monte-Carlo simulation results, and Section 8 discusses some extensions of our tests. Section 9 concludes, and all mathematical proofs are deferred to the Appendix.

[1] LW only consider the null hypothesis that the average treatment effects conditional on the covariates are equal to zero for all values of the covariates.

2 Test for Conditional Average Treatment Effects

2.1 Hypothesis Formulation

Let $D$ be a dummy variable such that $D = 1$ if the individual receives treatment and $D = 0$ otherwise. Let $X$ be a $k$-dimensional vector of covariates, $k \ge 1$, with compact support $\mathcal{X}$.
Define $Y(1)$ as the potential outcome for the individual under treatment and $Y(0)$ as that without treatment. We observe $(D, X, Y)$, where $Y = D \cdot Y(1) + (1 - D) \cdot Y(0)$, and we have a random sample of size $N$. Let $\mu_0(x) = E[Y(0) \mid X = x]$ and $\mu_1(x) = E[Y(1) \mid X = x]$. The null hypothesis of interest is that the conditional average treatment effect, defined as $\mu_1(x) - \mu_0(x)$, is non-negative for each $x \in \mathcal{X}$, which can be formulated as

$$H_0: \mu_0(X) - \mu_1(X) \le 0 \quad \text{a.s. in } X. \tag{1}$$

We assume that the treatment assignment is unconfounded.

Assumption 2.1 (Unconfoundedness): $(Y(0), Y(1)) \perp D \mid X$.

Let $p(x) = P(D = 1 \mid X = x)$ denote the propensity score, the probability of receiving treatment for an individual with covariates $x$, which is assumed to be bounded away from zero and one on $\mathcal{X}$. Under Assumption 2.1, $\mu_0(x)$ and $\mu_1(x)$ are identified as

$$\mu_0(x) = E\left[\frac{(1 - D)Y}{1 - p(x)} \,\middle|\, X = x\right], \qquad \mu_1(x) = E\left[\frac{DY}{p(x)} \,\middle|\, X = x\right].$$

Hence, under Assumption 2.1, (1) is equivalent to

$$H_0: E\left[\frac{(1 - D)Y}{1 - p(X)} - \frac{DY}{p(X)} \,\middle|\, X\right] \le 0 \quad \text{a.s. in } X. \tag{2}$$

The null hypothesis in (2) involves a conditional moment inequality. To extract all the information in (2), we adopt AS's instrumental variable approach and transform the conditional moment inequality into infinitely many unconditional ones without information loss. Let $l = (x, r)$ and $\mathcal{L} = \mathcal{X} \times [0, \bar{r}]$ for some $\bar{r} > 0$. The set of instrument functions we consider is

$$\mathcal{G}_{cube} = \{g_l(X) = 1(X \in C_l) : C_l \in \mathcal{C}_{cube}\}, \qquad \mathcal{C}_{cube} = \left\{C_l = \prod_{j=1}^{k}[x_j - r, x_j + r] : l \in \mathcal{L}\right\}.$$

As shown by AS, the null hypotheses in (1) and (2) are equivalent to

$$H_0: \nu(l) \equiv E\left[g_l(X)\left(\frac{(1 - D)Y}{1 - p(X)} - \frac{DY}{p(X)}\right)\right] \le 0 \quad \text{for all } l \in \mathcal{L}. \tag{3}$$

That is, we can transform the conditional moment inequality into a continuum of unconditional moment inequalities indexed by the instrument functions. We estimate $\nu(l)$ by the IPW estimator proposed by HIR,

$$\hat{\nu}(l) = \frac{1}{N}\sum_{i=1}^{N} g_l(X_i)\left(\frac{(1 - D_i)Y_i}{1 - \hat{p}(X_i)} - \frac{D_iY_i}{\hat{p}(X_i)}\right),$$

where $\hat{p}(x)$ is a nonparametric estimator of $p(x)$. As in HIR, we estimate $p(x)$ with the Series Logit Estimator (SLE) based on power series. Let $\lambda = (\lambda_1, \ldots, \lambda_k) \in \mathbb{Z}_+^k$ be a $k$-dimensional vector of non-negative integers, where $\mathbb{Z}_+$ denotes the set of non-negative integers, and define the norm $|\lambda| = \sum_{j=1}^{k}\lambda_j$. Let $\{\lambda(m)\}_{m=1}^{\infty}$ be a sequence including all distinct $\lambda \in \mathbb{Z}_+^k$ such that $|\lambda(m)|$ is non-decreasing in $m$, and let $x^{\lambda} = \prod_{j=1}^{k}x_j^{\lambda_j}$. For any integer $K$, define $R^K(x) = (x^{\lambda(1)}, \ldots, x^{\lambda(K)})'$ as a vector of power functions. Let $\Lambda(a) = \exp(a)/(1 + \exp(a))$ be the logistic CDF. The SLE of $p(x)$ is defined as $\hat{p}(x) = \Lambda(R^K(x)'\hat{\pi}_K)$, where

$$\hat{\pi}_K = \arg\max_{\pi_K}\frac{1}{N}\sum_{i=1}^{N}\Big(D_i\log\Lambda\big(R^K(X_i)'\pi_K\big) + (1 - D_i)\log\big(1 - \Lambda(R^K(X_i)'\pi_K)\big)\Big).$$

Other nonparametric estimators can be used to estimate the propensity score, e.g., the local polynomial estimators in Ichimura and Linton (2005), but such estimates are not necessarily bounded away from 0 and 1 in finite samples, so proper trimming is required. Trimming is not required for the SLE, because the estimated propensity score is automatically bounded away from 0 and 1. Furthermore, one can also estimate $\nu(l)$ with an imputation estimator as in, e.g., Heckman, Ichimura and Todd (1997, 1998), Heckman, Ichimura, Smith and Todd (1998) and Hahn (1998):

$$\hat{\nu}(l) = \frac{1}{N}\sum_{i=1}^{N}g_l(X_i)\big(\hat{\mu}_0(X_i) - \hat{\mu}_1(X_i)\big),$$

where $\hat{\mu}_0(x)$ and $\hat{\mu}_1(x)$ are nonparametric estimators of $\mu_0(x)$ and $\mu_1(x)$ for all $x \in \mathcal{X}$. We expect that, under suitable assumptions, all the results below still hold when the imputation estimator is used.

2.2 Test Statistic and Decision Rule

The KS test statistic is defined as

$$\hat{S}_N = \sqrt{N}\sup_{l \in \mathcal{L}}\hat{\nu}(l). \tag{4}$$

In this paper, we focus on the non-standardized version of the test, but the results below can be extended to the standardized version of the test as in AS.[2] All results can also be extended easily to a Cramér-von Mises type test. Given a simulated critical value $\hat{c}$, defined later, the decision rule is:

$$\text{Reject } H_0 \text{ if } \hat{S}_N > \hat{c}. \tag{5}$$

[2] Ideally, the standardized version of our test statistic would be defined as $\hat{S}_N = \sqrt{N}\sup_{l \in \mathcal{L}}\hat{\nu}(l)/\hat{\sigma}(l)$, where $\hat{\sigma}^2(l)$ is an estimator of the asymptotic variance of $\sqrt{N}(\hat{\nu}(l) - \nu(l))$. However, because $\hat{\sigma}^2(l)$ is not uniformly bounded away from 0, dividing $\hat{\nu}(l)$ by $\hat{\sigma}(l)$ causes problems, so both $\hat{\sigma}(l)$ and the standardized test statistic need to be modified. For more details, please refer to Section 3.1 of AS.
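As an illustration of (3) to (5), the Python sketch below computes $\hat{\nu}(l)$ and $\hat{S}_N$ for a scalar covariate, in which case each cube $C_l$ is simply the interval $[x - r, x + r]$. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the supremum over $\mathcal{L}$ is replaced by a maximum over a user-supplied finite list of $(x, r)$ pairs, and the fitted propensity scores $\hat{p}(X_i)$ are taken as given. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Instrument = Tuple[float, float]  # l = (x, r): center and half-width of the interval


def g_l(x: float, l: Instrument) -> int:
    """Instrument g_l(x) = 1(x in [center - r, center + r]) for a scalar covariate."""
    center, r = l
    return 1 if center - r <= x <= center + r else 0


def nu_hat(l: Instrument, y: Sequence[float], d: Sequence[int],
           x: Sequence[float], p_hat: Sequence[float]) -> float:
    """IPW estimate of nu(l) = E[g_l(X) ((1 - D)Y / (1 - p(X)) - DY / p(X))]."""
    total = 0.0
    for yi, di, xi, pi in zip(y, d, x, p_hat):
        if g_l(xi, l):
            total += (1 - di) * yi / (1 - pi) - di * yi / pi
    return total / len(y)


def ks_statistic(instruments: Sequence[Instrument], y, d, x, p_hat) -> float:
    """S_N = sqrt(N) * max of nu_hat(l) over a finite set of instruments."""
    return math.sqrt(len(y)) * max(nu_hat(l, y, d, x, p_hat) for l in instruments)


y = [1.0, 2.0, 3.0, 4.0]
d = [0, 1, 0, 1]
x = [0.1, 0.2, 0.6, 0.9]
p_hat = [0.5] * 4
instruments = [(0.5, 0.5), (0.15, 0.1), (0.6, 0.05)]
print(ks_statistic(instruments, y, d, x, p_hat))  # 3.0
```

In this toy sample only the narrow interval around $x = 0.6$ produces a positive $\hat{\nu}(l)$ (equal to 1.5), so $\hat{S}_N = \sqrt{4} \times 1.5 = 3$.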

3 Asymptotics of $\hat{\nu}(l)$ and the Test Statistic

3.1 Assumptions

In addition to the unconfoundedness assumption, we impose the following regularity conditions, which are identical to those in HIR. The first assumption summarizes the properties of $Y(0)$ and $Y(1)$.

Assumption 3.1 (Distributions of $Y(0)$ and $Y(1)$):
1. $Y(0)$ and $Y(1)$ have finite second moments.
2. $\mu_0(x)$ and $\mu_1(x)$ are continuously differentiable for all $x \in \mathcal{X}$.

Next, we impose conditions on the distribution of $X$.

Assumption 3.2 (Distribution of $X$):
1. The support of the $k$-dimensional covariate $X$ is a Cartesian product of compact intervals, $\mathcal{X} = \prod_{j=1}^{k}[x_{lj}, x_{uj}]$.
2. The density of $X$, $f(x)$, is bounded and bounded away from zero on $\mathcal{X}$.

The following assumption imposes smoothness on the propensity score.

Assumption 3.3 (Propensity Score): For all $x \in \mathcal{X}$, the propensity score $p(x)$ satisfies the following conditions:
1. $p(x)$ is continuously differentiable of order $s \ge 7k$.
2. $p(x)$ is bounded away from zero and one: $0 < \underline{p} \le p(x) \le \bar{p} < 1$.

The last assumption restricts the growth rate of the number of approximating functions included in the series approximation to the propensity score.

Assumption 3.4 (Series Estimator): The SLE of $p(x)$ uses a power series with $K = N^{\nu}$ for some $k/(4(s - k)) < \nu < 1/9$.
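The SLE of Section 2.1 is a logit maximum likelihood fit on the power-series regressors $R^K(x)$. The sketch below, for a scalar covariate so that $R^K(x) = (1, x, \ldots, x^{K-1})'$, maximizes the log-likelihood by Newton-Raphson; the use of Newton-Raphson, the convergence tolerance and all helper names are our assumptions rather than part of HIR's definition. Assumption 3.4 would tie $K$ to the sample size through $K = N^{\nu}$; here $K$ is simply passed in.

```python
import math
from typing import List, Sequence


def logistic(a: float) -> float:
    """Logistic CDF Lambda(a) = exp(a) / (1 + exp(a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def power_basis(x: float, K: int) -> List[float]:
    """R^K(x) = (1, x, ..., x^(K-1)) for a scalar covariate."""
    return [x ** j for j in range(K)]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fit_series_logit(x: Sequence[float], d: Sequence[int], K: int,
                     max_iter: int = 100, tol: float = 1e-10) -> List[float]:
    """Maximize the logit log-likelihood in pi_K by Newton-Raphson."""
    R = [power_basis(xi, K) for xi in x]
    pi = [0.0] * K
    for _ in range(max_iter):
        score = [0.0] * K
        info = [[0.0] * K for _ in range(K)]  # minus the Hessian of the log-likelihood
        for ri, di in zip(R, d):
            p = logistic(sum(a * b for a, b in zip(ri, pi)))
            for j in range(K):
                score[j] += (di - p) * ri[j]
                for m in range(K):
                    info[j][m] += p * (1.0 - p) * ri[j] * ri[m]
        step = solve(info, score)
        pi = [a + s for a, s in zip(pi, step)]
        if max(abs(s) for s in step) < tol:
            break
    return pi


def sle_propensity(x_new: float, pi: Sequence[float]) -> float:
    """p_hat(x) = Lambda(R^K(x)' pi_hat); lies in (0, 1) up to floating-point rounding."""
    return logistic(sum(a * b for a, b in zip(power_basis(x_new, len(pi)), pi)))
```

With $K = 1$ the fit reduces to the sample treatment share, which gives a quick sanity check; with larger $K$ the fitted values satisfy the logit first-order conditions $\sum_i (D_i - \hat{p}(X_i))R^K(X_i) = 0$.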

3.2 Asymptotics of $\hat{\nu}(l)$

Lemma 3.5: Suppose Assumptions 2.1 and 3.1-3.4 hold. Then $\sqrt{N}(\hat{\nu}(\cdot) - \nu(\cdot)) \Rightarrow \Psi(\cdot)$, where $\Rightarrow$ denotes weak convergence and $\Psi(\cdot)$ is a zero-mean Gaussian process with covariance function generated by

$$\psi_l(W) = g_l(X)\left(\frac{(1 - D)Y}{1 - p(X)} - \frac{DY}{p(X)} + \big(D - p(X)\big)\left(\frac{\mu_0(X)}{1 - p(X)} + \frac{\mu_1(X)}{p(X)}\right)\right) - \nu(l),$$

where $W \equiv (Y, D, X)$; i.e., $\mathrm{Cov}(\Psi(l_1), \Psi(l_2)) = E[\psi_{l_1}(W)\psi_{l_2}(W)]$.

The proof of Lemma 3.5 has two parts. First, we show that

$$\sup_{l \in \mathcal{L}}\left|\sqrt{N}\big(\hat{\nu}(l) - \nu(l)\big) - \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\psi_l(W_i)\right| = o_p(1).$$

Second, we show that $\mathcal{K} = \{\psi_l(W) : l \in \mathcal{L}\}$ is a Vapnik-Chervonenkis (VC) class of functions, so that Lemma 3.5 follows from Donsker's theorem, i.e., the functional central limit theorem.

3.3 Asymptotics of the Test Statistic

Define $\mathcal{L}^0 = \{l : \nu(l) = 0\}$, which is non-empty by definition.[3] The following result concerns the asymptotic properties of the test statistic $\hat{S}_N$.

Proposition 3.6: Suppose that Assumptions 2.1 and 3.1-3.4 hold. Then
1. if $H_0$ is true, $\hat{S}_N \xrightarrow{d} \sup_{l \in \mathcal{L}^0}\Psi(l)$;
2. under any fixed alternative $H_1: \nu(l) > 0$ for some $l \in \mathcal{L}$, $\hat{S}_N \to \infty$ in probability.

The first part of Proposition 3.6 shows that the limiting null distribution of the test statistic depends only on those $\hat{\nu}(l)$ with $\nu(l) = 0$; this result is standard in the literature. The second part shows that under any fixed alternative $H_1: \nu(l) > 0$ for some $l \in \mathcal{L}$, the test statistic diverges to infinity, which leads to the consistency of our test, as shown later.

[3] If $l = (x, r)$ with $r = 0$, then $l \in \mathcal{L}^0$.
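To make the covariance kernel in Lemma 3.5 concrete, the sketch below evaluates $\psi_l(W)$ for one observation and estimates $\mathrm{Cov}(\Psi(l_1), \Psi(l_2))$ by a sample average of products. It assumes that $p(X_i)$, $\mu_0(X_i)$, $\mu_1(X_i)$ and $\nu(l)$ are supplied; in practice they would be replaced by estimates such as $\hat{p}$, $\hat{\mu}_0$, $\hat{\mu}_1$ and $\hat{\nu}(l)$. The function names are hypothetical.

```python
from typing import Sequence


def psi_l(g: int, y: float, d: int, p: float, mu0: float, mu1: float, nu_l: float) -> float:
    """psi_l(W) from Lemma 3.5 for a single observation W = (Y, D, X).

    g = g_l(X), p = p(X), mu0 = mu_0(X), mu1 = mu_1(X), nu_l = nu(l).
    """
    ipw = (1 - d) * y / (1 - p) - d * y / p            # IPW moment term
    correction = (d - p) * (mu0 / (1 - p) + mu1 / p)    # propensity-score correction term
    return g * (ipw + correction) - nu_l


def cov_hat(psi_1: Sequence[float], psi_2: Sequence[float]) -> float:
    """Sample analogue of Cov(Psi(l1), Psi(l2)) = E[psi_l1(W) psi_l2(W)]."""
    return sum(a * b for a, b in zip(psi_1, psi_2)) / len(psi_1)
```

For instance, with $g_l(X) = 1$, $Y = 5$, $D = 1$, $p(X) = 0.25$, $\mu_0(X) = 1$, $\mu_1(X) = 3$ and $\nu(l) = 0.5$, the IPW term is $-20$, the correction term is $0.75 \times (4/3 + 12) = 10$, and $\psi_l(W) = -10.5$.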

4 Simulated Process, Generalized Moment Selection and Simulated Critical Value

As noted by McFadden (1989) and Barrett and Donald (2003), the main difficulty with Kolmogorov-Smirnov tests lies in constructing an appropriate critical value, since the limiting distribution of the test statistic depends on the underlying functions. In our case, the limiting distribution of the test statistic depends on $p(x)$, $\mu_0(x)$, $\mu_1(x)$ and $\nu(l)$. Hence, as in DH, we propose a method to simulate the stochastic process $\Psi(\cdot)$ based on the multiplier central limit theorem. We also introduce AS's GMS method, construct the simulated critical value for our test from the simulated process and the GMS, and discuss the properties of the simulated critical value.

4.1 Simulated Process

The stochastic process $\Psi(\cdot)$ is simulated based on the multiplier central limit theorem. Let $U_1, U_2, \ldots$ be bounded independent random variables with mean zero and variance one that are independent of the sample $\mathcal{W} = \{W_1, W_2, \ldots\}$. For all $l \in \mathcal{L}$, we define the simulated stochastic process $\Psi^u(l)$ as

$$\Psi^u(l) = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}U_i\left(g_l(X_i)\left(\frac{(1 - D_i)Y_i}{1 - \hat{p}(X_i)} - \frac{D_iY_i}{\hat{p}(X_i)} + \big(D_i - \hat{p}(X_i)\big)\left(\frac{\hat{\mu}_0(X_i)}{1 - \hat{p}(X_i)} + \frac{\hat{\mu}_1(X_i)}{\hat{p}(X_i)}\right)\right) - \hat{\nu}(l)\right), \tag{6}$$

where $\hat{\mu}_0(x)$ and $\hat{\mu}_1(x)$ are the series estimators of $\mu_0(x)$ and $\mu_1(x)$:

$$\hat{\mu}_0(x) = \left(\sum_{i=1}^{N}\frac{(1 - D_i)Y_i}{1 - \hat{p}(X_i)}R^K(X_i)'\right)\left(\sum_{i=1}^{N}R^K(X_i)R^K(X_i)'\right)^{-1}R^K(x),$$
$$\hat{\mu}_1(x) = \left(\sum_{i=1}^{N}\frac{D_iY_i}{\hat{p}(X_i)}R^K(X_i)'\right)\left(\sum_{i=1}^{N}R^K(X_i)R^K(X_i)'\right)^{-1}R^K(x). \tag{7}$$

As shown in HIR, $\hat{\mu}_0(x)$ and $\hat{\mu}_1(x)$ are consistent for $\mu_0(x)$ and $\mu_1(x)$ uniformly in $x \in \mathcal{X}$.

Lemma 4.1: Suppose Assumptions 2.1 and 3.1-3.4 hold. Then $\Psi^u(\cdot) \Rightarrow \Psi(\cdot)$ conditional on $\mathcal{W}$ in probability, which is denoted by $\Psi^u(\cdot) \overset{P_{\mathcal{W}}}{\Rightarrow} \Psi(\cdot)$.
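The multiplier step in (6) amounts to reweighting the estimated influence-function values by a single common draw of $U_1, \ldots, U_N$. The sketch below produces one draw of $\{\Psi^u(l)\}$ on a finite set of instruments. It assumes that the bracketed term of (6), evaluated at the estimates, has already been computed for each instrument and observation, and it draws $U_i$ uniformly on $[-\sqrt{3}, \sqrt{3}]$, the bounded, mean-zero, unit-variance choice used in Section 7. The function name is hypothetical.

```python
import math
import random
from typing import Dict, Hashable, List


def multiplier_draw(psi_hat: Dict[Hashable, List[float]],
                    rng: random.Random) -> Dict[Hashable, float]:
    """One draw of Psi^u(l) = N^{-1/2} sum_i U_i psi_hat_l(W_i) for every instrument l.

    psi_hat maps each instrument l to the N values of the bracketed term in (6).
    """
    n = len(next(iter(psi_hat.values())))
    bound = math.sqrt(3.0)
    # U_i ~ Uniform[-sqrt(3), sqrt(3)]: bounded, mean zero, variance one.
    u = [rng.uniform(-bound, bound) for _ in range(n)]
    # The same U_i are shared by all instruments, so the draw preserves the
    # covariance structure of the process across l.
    return {l: sum(ui * v for ui, v in zip(u, vals)) / math.sqrt(n)
            for l, vals in psi_hat.items()}
```

Repeating the draw $B$ times with the same `rng` yields the simulated sample from which the critical value is computed.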

The proof of Lemma 4.1 has two steps, similar to DH. In the first step, we show that $N^{-1/2}\sum_{i=1}^{N}U_i\psi_l(W_i) \overset{P_{\mathcal{W}}}{\Rightarrow} \Psi(l)$ by the multiplier central limit theorem (Corollary … of van der Vaart and Wellner, 1996). In the second step, we show that the error from estimating $\mu_0(x)$, $\mu_1(x)$, $p(x)$ and $\nu(l)$ vanishes in the limit.

4.2 Generalized Moment Selection

As do most papers in the moment inequality literature, we use the generalized moment selection method, introduced by Andrews and Soares (2010) and AS, to construct the critical value. It is similar to the recentering method of Hansen (2005) and Donald and Hsu (2010a) and to the contact set approach of Linton, Song and Whang (2010). Again, under the null hypothesis, the $\hat{\nu}(l)$ with $\nu(l) < 0$ do not contribute to the limiting null distribution. The main idea of these approaches is therefore to detect the $l$'s with $\nu(l) < 0$ and to remove those moments from consideration asymptotically. In this way, one can construct a more powerful test without resorting to the LFC. For a sequence of negative numbers $b_N$, we define the generalized moment selection (recentering) function as

$$\hat{\mu}(l) = \hat{\nu}(l) \cdot 1\big(\sqrt{N}\hat{\nu}(l) < b_N\big),$$

where $1(\cdot)$ is the indicator function. We impose the following condition on $b_N$, which is standard in the moment inequality literature.

Assumption 4.2: The sequence of negative numbers $b_N$ satisfies $\lim_{N \to \infty}b_N = -\infty$ and $\lim_{N \to \infty}N^{-1/2}b_N = 0$.

As we will see, the generalized moment selection function is added to the simulated process before taking the supremum. This lets us approximate the null distribution without resorting to the LFC, which improves the power of our test.

4.3 Simulated Critical Value

We define the simulated test statistic as $\hat{S}^u_N = \sup_{l \in \mathcal{L}}\big(\Psi^u(l) + \sqrt{N}\hat{\mu}(l)\big)$ and $\tilde{c}$ as the $(1 - \alpha)$-th quantile of $\hat{S}^u_N$, i.e.,

$$\tilde{c} = \sup\left\{q \,\middle|\, P^u\big(\hat{S}^u_N \le q\big) \le 1 - \alpha\right\}.$$
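The recentering function and the simulated critical value can be sketched as follows, assuming a finite instrument set and a list of $B$ multiplier draws of $\{\Psi^u(l)\}$ (for instance, repeated draws of (6)). Applied to the empirical distribution of the $B$ simulated statistics, the definition of $\tilde{c}$ above picks the $(\lfloor (1 - \alpha)B \rfloor + 1)$-th smallest draw, capped at the largest one. The helper names are hypothetical.

```python
import math
from typing import Dict, Hashable, List


def gms_shift(nu_hat_l: float, n: int, b_n: float) -> float:
    """sqrt(N) * mu_hat(l), where mu_hat(l) = nu_hat(l) * 1(sqrt(N) * nu_hat(l) < b_N)."""
    t = math.sqrt(n) * nu_hat_l
    return t if t < b_n else 0.0


def simulated_critical_value(draws: List[Dict[Hashable, float]],
                             nu_hat: Dict[Hashable, float],
                             n: int, b_n: float, alpha: float) -> float:
    """c_tilde = sup{q : P^u(S^u_N <= q) <= 1 - alpha} under the empirical law of the draws."""
    shift = {l: gms_shift(v, n, b_n) for l, v in nu_hat.items()}
    # Simulated statistic S^u_N = sup_l (Psi^u(l) + sqrt(N) mu_hat(l)) for each draw.
    sups = sorted(max(draw[l] + shift[l] for l in nu_hat) for draw in draws)
    # Index of the (floor((1 - alpha) B) + 1)-th smallest value; the small constant
    # guards against round-off when (1 - alpha) B is an integer.
    k = math.floor((1.0 - alpha) * len(sups) + 1e-9)
    return sups[min(k, len(sups) - 1)]
```

With $B = 1{,}000$ draws and $\alpha = 0.05$, this selects the 951st smallest simulated statistic. Moments with $\sqrt{N}\hat{\nu}(l) < b_N$ receive a large negative shift and therefore effectively drop out of the supremum, which is how the GMS step avoids the LFC.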

The critical value is defined as $\hat{c} = \max\{\tilde{c}, \eta\}$, where $\eta$ is an arbitrarily small positive number. Note that when $\mu_0(X) - \mu_1(X) < 0$ a.s. in $X$, both $\hat{S}_N$ and $\tilde{c}$ can be shown to converge to zero. Defining the critical value $\hat{c}$ as the maximum of $\tilde{c}$ and $\eta$ therefore removes the need for a complicated proof that $\hat{S}_N$ converges to zero faster than $\tilde{c}$, which would otherwise be needed to control the level of the test.[4] The following lemma summarizes the asymptotic properties of $\tilde{c}$ and $\hat{c}$. Define $c$ as the $(1 - \alpha)$-th quantile of $\sup_{l \in \mathcal{L}^+}\Psi(l)$, where $\mathcal{L}^+ \equiv \{l : \nu(l) \ge 0\}$.[5]

Lemma 4.3: Suppose Assumptions 2.1, 3.1-3.4 and 4.2 hold and $\alpha < 1/2$. Then
1. $\tilde{c} \xrightarrow{p} c$;
2. $\hat{c} \xrightarrow{p} \max\{c, \eta\}$.

Lemma 4.3 holds both under the null hypothesis and under fixed alternatives. The first part shows that $\tilde{c}$ converges to $c$, which implies that $\tilde{c}$ is bounded in probability. Furthermore, under the null hypothesis $\mathcal{L}^+ = \mathcal{L}^0$, so $\tilde{c}$ converges to $c$, the $(1 - \alpha)$-th quantile of the limiting null distribution;[6] i.e., we can approximate the critical value without resorting to the LFC. The second part follows from the first part and the continuity of the max operator.

[4] A similar approach is used in Section 4.1 of AS.
[5] If $\sup_{l \in \mathcal{L}^+}\Psi(l)$ is degenerate at 0, then $c = 0$.
[6] When $\mu_0(X) - \mu_1(X) < 0$ a.s. in $X$, we have $\mathcal{L}^0 = \{l = (x, r) : r = 0\}$. As a result, the limiting distribution of $\hat{S}_N$ is degenerate at 0. In addition, $c$ is equal to 0, so both $\hat{S}_N$ and $\tilde{c}$ converge to 0.

5 Size and Power Properties

In this section, we show that our test controls size well and is consistent against fixed alternatives. We also discuss the local power properties of our test and show that it is asymptotically unbiased against some $N^{-1/2}$ local alternatives. Let $M(\cdot)$ denote the Lebesgue measure.

5.1 Size and Power against Fixed Alternatives

Our first main result, regarding the size of our test and its power against fixed alternatives, is summarized in the following theorem.

Theorem 5.1: Suppose Assumptions 2.1, 3.1-3.4 and 4.2 hold and $\alpha < 1/2$. If we reject $H_0$ when $\hat{S}_N > \hat{c}$, then:

1. if $H_0$ is true and $M(\{x : \mu_0(x) - \mu_1(x) = 0\}) > 0$, then $\lim_{\eta \to 0}\lim_{N \to \infty}P(\text{reject } H_0) = \lim_{\eta \to 0}\lim_{N \to \infty}P(\hat{S}_N > \hat{c}) = \alpha$;
2. if $H_0$ is true and $M(\{x : \mu_0(x) - \mu_1(x) = 0\}) = 0$, then $\lim_{N \to \infty}P(\text{reject } H_0) = \lim_{N \to \infty}P(\hat{S}_N > \hat{c}) = 0$;
3. under any fixed alternative $H_1: \nu(l) > 0$ for some $l \in \mathcal{L}$, $\lim_{N \to \infty}P(\text{reject } H_0) = 1$.

The first part of Theorem 5.1 shows that our test has exact size asymptotically when the set of $x$ such that $\mu_0(x) - \mu_1(x) = 0$ has positive measure. In that case there is an $l$ with $r > 0$ such that $\nu(l) = 0$ and $\mathrm{Var}(\psi_l(W)) > 0$, so the limiting null distribution is non-degenerate and $c$ is strictly positive. Hence, if $\eta$ is small enough, $\tilde{c}$ is strictly greater than $\eta$, and $\tilde{c} = \hat{c}$, with probability approaching 1. The first part then follows because, for $\eta$ small enough, $\tilde{c}$ converges to $c$ as the sample size goes to infinity. For the second part, both $\hat{S}_N$ and $\tilde{c}$ converge to 0 by Proposition 3.6 and Lemma 4.3. As a result, for any positive $\eta$, $\hat{S}_N < \eta \le \hat{c}$ with probability approaching 1, and the second part follows. The last part establishes the consistency of our test; it follows because the test statistic $\hat{S}_N$ diverges to positive infinity while the critical value $\hat{c}$ is bounded in probability.

5.2 Local Asymptotic Power

We show that our test is unbiased against some $N^{-1/2}$ local alternatives. We consider a sequence of functions $\tau_N(x)$, playing the role of $\mu_0(x) - \mu_1(x)$, that converges to $\tau(x)$ uniformly in $x$, where $\tau(x) \le 0$ for all $x \in \mathcal{X}$. Define the local alternative as

$$H_{1,N}: \tau_N(x) = \tau(x) + \frac{\delta(x)}{\sqrt{N}}, \tag{8}$$

where $\delta(x)$ is continuous in $x$. Let $\mathcal{X}^0 \equiv \{x : \tau(x) = 0\}$. We impose the following conditions on the local alternatives we consider.

Assumption 5.2: Suppose the following conditions hold:
1. $M(\mathcal{X}^0) > 0$.
2. $\delta(x) \ge 0$ if $x \in \mathcal{X}^0$.
3. $M(\delta^+ \cap \mathcal{X}^0) > 0$, where $\delta^+ \equiv \{x : \delta(x) > 0\}$.

Define $\mathcal{L}^0 \equiv \{l : E[\tau(X)g_l(X)] = 0\}$, $\mathcal{L}^{++} = \{l : E[\tau_N(X)g_l(X)] > 0 \text{ eventually}\}$ and $d(l) = E[\delta(X)g_l(X)]$. Under Assumption 5.2, it is not hard to show that $\mathcal{L}^{++} \subset \mathcal{L}^0$ with $M(\mathcal{L}^{++}) > 0$. In addition, $d(l) \ge 0$ when $l \in \mathcal{L}^0$ and $d(l) > 0$ when $l \in \mathcal{L}^{++}$. The following lemma gives the limiting distribution of the test statistic and the limit of the critical value under the local alternatives that satisfy Assumption 5.2.

Lemma 5.3: Suppose Assumptions 2.1, 3.1-3.4 and 4.2 hold and $\alpha < 1/2$. Under the local alternatives (8) that satisfy Assumption 5.2,
1. $\hat{S}_N \xrightarrow{d} \sup_{l \in \mathcal{L}^0}\big(\Psi(l) + d(l)\big)$;
2. $\hat{c} \xrightarrow{p} \max\{c, \eta\}$, where $c$ is the $(1 - \alpha)$-th quantile of $\sup_{l \in \mathcal{L}^0}\Psi(l)$.

The following theorem shows that our test is unbiased against the local alternatives (8) that satisfy Assumption 5.2.

Theorem 5.4: Under the same assumptions as in Lemma 5.3, the asymptotic local power of our test is greater than or equal to $\alpha$ as $\eta$ tends to zero, i.e., $\lim_{\eta \to 0}\lim_{N \to \infty}P(\text{reject } H_0) \ge \alpha$.

It is well known that tests involving inequalities are unbiased only against some $N^{-1/2}$ local alternatives. If the deviation from the null is allowed to be negative on $\mathcal{X}^0$, our test might be biased; a simple example can be found in Example … of Donald and Hsu (2010a).

6 Comparisons with Lee and Whang (2009)

This section compares our test with LW's. We first summarize LW's test. We then show that LW's test, which is constructed on a user-chosen strict subset of $\mathcal{X}$, might be inconsistent if the violation of the null occurs outside that subset. Furthermore, we show that our test is more powerful than LW's under a broad set of $N^{-1/2}$ local alternatives that do not converge to the least favorable case.

6.1 Lee and Whang's Test

Define the kernel estimator of $\mu_0(x) - \mu_1(x)$ as $\hat{\tau}(x) = \hat{\mu}_0(x) - \hat{\mu}_1(x)$, where

$$\hat{\mu}_0(x) = \frac{\sum_{\{i : D_i = 0\}}Y_iK_h(x - X_i)}{\sum_{\{i : D_i = 0\}}K_h(x - X_i)}, \qquad \hat{\mu}_1(x) = \frac{\sum_{\{i : D_i = 1\}}Y_iK_h(x - X_i)}{\sum_{\{i : D_i = 1\}}K_h(x - X_i)}.$$
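For a scalar covariate, the kernel estimator $\hat{\tau}(x)$ is a difference of two Nadaraya-Watson ratios, one per treatment arm. The sketch below assumes an Epanechnikov kernel purely for illustration and uses the weight $K((x - X_i)/h)/h$, i.e. $K_h$ with $k = 1$; the scaling cancels within each ratio. The function names are hypothetical.

```python
from typing import Callable, Sequence


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel on [-1, 1] (an illustrative choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_tau_hat(x0: float, y: Sequence[float], d: Sequence[int], x: Sequence[float],
                   h: float, kernel: Callable[[float], float] = epanechnikov) -> float:
    """tau_hat(x0) = mu0_hat(x0) - mu1_hat(x0), each a Nadaraya-Watson ratio within one arm."""
    num = [0.0, 0.0]  # sum of K_h(x0 - X_i) * Y_i in the control (0) and treated (1) arms
    den = [0.0, 0.0]  # sum of K_h(x0 - X_i) in each arm
    for yi, di, xi in zip(y, d, x):
        w = kernel((x0 - xi) / h) / h  # K_h(x0 - X_i) for a scalar covariate
        num[di] += w * yi
        den[di] += w
    if den[0] == 0.0 or den[1] == 0.0:
        raise ValueError("no kernel weight in one of the treatment arms at x0")
    return num[0] / den[0] - num[1] / den[1]
```

When the bandwidth is so small that one arm has no observation near $x$, the ratio is undefined; the sketch raises an error rather than returning an arbitrary value.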

Here $K_h(x - X_i) = h^{-k}K((x - X_i)/h)$, where $K(\cdot)$ is a kernel function and $h$ is the bandwidth. LW's test statistic is a one-sided $L_1$-type functional of $\hat{\tau}(x)$, defined as

$$T = \sqrt{N}\int_{\mathcal{X}}\max\{\hat{\tau}(x), 0\}w(x)\,dx,$$

where $w(x) \ge 0$ is a weight function whose support $\mathcal{W}_x$ is a strict subset of $\mathcal{X}$. LW require $\mathcal{W}_x$ to be a strict subset of $\mathcal{X}$ to avoid the boundary problem of kernel estimators. In the least favorable case of the null hypothesis, LW show that

$$\frac{T - a_N}{\sigma_N} \xrightarrow{d} N(0, 1). \tag{9}$$

The exact definitions of $a_N$ and $\sigma_N^2$ are given in the Appendix. Let $\hat{a}_N$ and $\hat{\sigma}_N^2$ be estimators of $a_N$ and $\sigma_N^2$, also defined in the Appendix. The asymptotic normality in (9) still holds with $\hat{a}_N$ and $\hat{\sigma}_N$ in place of $a_N$ and $\sigma_N$. Hence, define the standardized test statistic as $\hat{S}_{LW} = (T - \hat{a}_N)/\hat{\sigma}_N$. The rejection rule of LW's test is: reject $H_0$ if $\hat{S}_{LW} > z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1 - \alpha)$-th quantile of the standard normal distribution. Theorems 4.1 and 4.2 of LW show that their test controls size well and is consistent against fixed alternatives if $M(\{x \in \mathcal{W}_x : \tau(x) > 0\}) > 0$.

6.2 Advantages of Our Test over LW's

1. The first advantage of our test over LW's is that our test is consistent against all fixed alternatives, whereas LW's test is consistent only if $M(\{x \in \mathcal{W}_x : \tau(x) > 0\}) > 0$, because LW restrict attention to the subset $\mathcal{W}_x$ to avoid the boundary problem of the kernel estimator $\hat{\tau}(x)$. Therefore, LW's test might have no power when the violation of the null lies outside $\mathcal{W}_x$. To restore the consistency of LW's test, one could let $\mathcal{W}_x$ expand to $\mathcal{X}$ as $N$ tends to infinity, but the theory for this is not trivial and remains an open question.

2. If the violation of the null lies within $\mathcal{W}_x$, both our test and LW's are consistent, and which test is more powerful depends on the underlying fixed alternative; i.e., our test may be more powerful under some fixed alternatives, while LW's may be more powerful under others. However, we can show that under a broad set of local alternatives, our test is more powerful than LW's. To show this, we modify our Assumption 5.2. Define $A \setminus B \equiv \{x : x \in A \text{ but } x \notin B\}$ for any two sets $A$ and $B$, and define $\mathcal{C} \equiv \{x \in \mathcal{W}_x : \tau(x) = 0\}$.

Assumption 6.1: In addition to the conditions in Assumption 5.2, we assume the following conditions:
(a) $M(\mathcal{C}) > 0$.
(b) $M(\delta^+ \cap \mathcal{C}) > 0$.
(c) $M(\mathcal{W}_x \setminus \mathcal{C}) > 0$.

The last condition requires that the measure of $\mathcal{W}_x \setminus \mathcal{C} = \{x \in \mathcal{W}_x : \tau(x) < 0\}$ be strictly positive, which implies that the local alternatives do not converge to the least favorable case. The following theorem shows that LW's test has no power against the local alternatives satisfying Assumption 6.1. Since the local alternatives defined in Assumption 6.1 are a subset of those defined in Assumption 5.2, our test remains unbiased under Assumption 6.1. Therefore, our test is more powerful than LW's against the local alternatives satisfying Assumption 6.1.

Theorem 6.2: Under the local alternatives (8) that satisfy Assumption 6.1, the local power of LW's test is equal to zero; that is, $\lim_{N \to \infty}P(\hat{S}_{LW} > z_{1-\alpha}) = 0$.

Theorem 6.2 shows that LW's test has no power against local alternatives that do not converge to the least favorable case. In the proof of Theorem 6.2, we show that $\hat{S}_{LW}$ diverges to negative infinity under the local alternatives defined in Assumption 6.1, so that $\lim_{N \to \infty}P(\hat{S}_{LW} > z_{1-\alpha}) = 0$. Since our test is unbiased in this case, it is more powerful than LW's against the broad set of local alternatives satisfying Assumption 6.1. Section 5.1 of LW proposes a more powerful test based on the contact set approach; with a suitable modification of Assumption 6.1, we can show that the result of Theorem 6.2 also holds for LW's more powerful test.
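The integral in LW's statistic generally has no closed form and is approximated numerically, as in Section 7. The sketch below approximates $\int \max\{\hat{\tau}(x), 0\}w(x)\,dx$ over $[0.05, 0.95]$ with a midpoint Riemann sum and applies the rejection rule $\hat{S}_{LW} > z_{1-\alpha}$. It leaves the scaling of $T$ and the estimators $\hat{a}_N$ and $\hat{\sigma}_N$ (defined in LW and in the Appendix) to the caller; the midpoint rule and all names are our assumptions.

```python
from statistics import NormalDist
from typing import Callable


def one_sided_l1(tau_hat: Callable[[float], float], lower: float = 0.05, upper: float = 0.95,
                 grid_points: int = 500,
                 weight: Callable[[float], float] = lambda x: 1.0) -> float:
    """Midpoint Riemann sum for the integral of max{tau_hat(x), 0} w(x) over [lower, upper]."""
    step = (upper - lower) / grid_points
    total = 0.0
    for j in range(grid_points):
        xj = lower + (j + 0.5) * step  # midpoint of the j-th grid cell
        total += max(tau_hat(xj), 0.0) * weight(xj)
    return total * step


def lw_reject(T: float, a_hat: float, sigma_hat: float, alpha: float = 0.05) -> bool:
    """Reject H_0 if S_LW = (T - a_hat) / sigma_hat exceeds z_{1 - alpha}."""
    return (T - a_hat) / sigma_hat > NormalDist().inv_cdf(1.0 - alpha)
```

For example, with $\hat{\tau}(x) = x - 0.5$ the positive part is integrated over $[0.5, 0.95]$, giving $0.45^2/2 = 0.10125$; only the part of the violation inside $\mathcal{W}_x$ ever enters the statistic.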

7 Monte-Carlo Simulations

In this section, we conduct small-scale Monte-Carlo simulations to illustrate the finite sample performance of our test and LW's.

Example 7.1: Let the DGP be

$$X = U_x, \quad D = 1(U_t < X), \quad Y(0) = 2(X - 0.5) + U_0, \quad Y(1) = 2(X - 0.5) + U_1, \quad Y = DY(1) + (1 - D)Y(0),$$

where $U_x$ and $U_t$ are uniformly distributed on $[0, 1]$, $U_0$ and $U_1$ are standard normal, and $U_x$, $U_t$, $U_0$ and $U_1$ are mutually independent.

In Example 7.1, $\mu_0(X) - \mu_1(X) = 0$ a.s. in $X$, which is the least favorable case of the null hypothesis. We use this example to illustrate the size properties of our test and LW's when the null hypothesis is in the least favorable case.

Example 7.2: Let the DGP be the same as in Example 7.1 except that $Y(0) = 2(X - 0.5) \cdot 1(X < 0.5) + U_0$ and $Y(1) = U_1$.

In Example 7.2, $\mu_0(X) - \mu_1(X) \le 0$ a.s. in $X$, and the strict inequality holds when $X < 0.5$. We use this example to illustrate the size properties of our test and LW's when the null hypothesis is not in the least favorable case.

Example 7.3: Let the DGP be the same as in Example 7.1 except that $Y(1) = U_1$.

In Example 7.3, $\mu_0(X) - \mu_1(X) \le 0$ when $X \le 0.5$ and $\mu_0(X) - \mu_1(X) > 0$ when $X > 0.5$, i.e., the null hypothesis is violated. We use this example to show the power of our test and LW's against fixed alternatives.

Example 7.4: Let the DGP be the same as in Example 7.1 except that $Y(0) = 2(X - 0.5) \cdot 1(X < 0.5) + N^{-1/2} + U_0$ and $Y(1) = U_1$.
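The four designs differ only in the conditional means of $Y(0)$ and $Y(1)$. The sketch below draws a sample from each design with Python's standard `random` module and also returns $\tau(x) = \mu_0(x) - \mu_1(x)$, which makes the null or alternative status of each example easy to check. It is an illustrative reimplementation of the DGPs as stated above, not the code used for the reported tables; the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def tau(example: str, x: float, n: int = 1) -> float:
    """mu_0(x) - mu_1(x) in Examples 7.1 to 7.4 (n only matters for Example 7.4)."""
    if example == "7.1":
        return 0.0
    if example == "7.2":
        return 2.0 * (x - 0.5) if x < 0.5 else 0.0
    if example == "7.3":
        return 2.0 * (x - 0.5)
    if example == "7.4":
        return (2.0 * (x - 0.5) if x < 0.5 else 0.0) + 1.0 / math.sqrt(n)
    raise ValueError(f"unknown example {example!r}")


def mu1(example: str, x: float) -> float:
    """mu_1(x): 2(x - 0.5) in Example 7.1 and zero in the other examples."""
    return 2.0 * (x - 0.5) if example == "7.1" else 0.0


def simulate(example: str, n: int,
             rng: random.Random) -> Tuple[List[float], List[int], List[float]]:
    """Draw (X_i, D_i, Y_i), i = 1..n, from the chosen design."""
    xs, ds, ys = [], [], []
    for _ in range(n):
        x = rng.random()                    # X = U_x, uniform on [0, 1)
        d = 1 if rng.random() < x else 0    # D = 1(U_t < X), so p(x) = x
        m1 = mu1(example, x)
        m0 = tau(example, x, n) + m1        # mu_0(x) = tau(x) + mu_1(x)
        y0 = m0 + rng.gauss(0.0, 1.0)       # Y(0) = mu_0(X) + U_0
        y1 = m1 + rng.gauss(0.0, 1.0)       # Y(1) = mu_1(X) + U_1
        xs.append(x)
        ds.append(d)
        ys.append(d * y1 + (1 - d) * y0)    # observed outcome Y
    return xs, ds, ys
```

Because $D = 1(U_t < X)$ with $U_t$ uniform, the true propensity score in all four designs is $p(x) = x$.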

We use Example 7.4 to demonstrate the local power of our test and LW's. Note that the DGP in Example 7.4 converges to the DGP in Example 7.2, which is not in the least favorable case of the null hypothesis.

Following Section 3.5 of AS, we approximate $\hat{S}_N$ using a finite number of instrument functions. We consider intervals of length $r^{-1}$ for $r = 1, \ldots, r_1$, where $r_1 = 8$, so we use a total of 36 intervals.[7][8] Note that the smaller (more negative) the recentering parameter $b_N$ is, the more conservative and the less powerful our test is in finite samples. To illustrate this, we consider $b_N = -0.5\log\log N$, $-\log\log N$ and $-2\log\log N$.[9] When approximating the limiting process, we let the $U_i$'s be independent uniform random variables on $[-\sqrt{3}, \sqrt{3}]$, and the simulated test statistic is approximated with the same instrument functions. For each simulation, we approximate the p-value[10] of our tests with 1,000 repetitions. In all examples, none of the critical values converges to zero, so we set $\eta = 0$. We consider three sample sizes: 300, 500 and 1,000. For $N = 300$, the propensity score is estimated by the SLE with power series $1$ and $X$; for $N = 500$ and $1{,}000$, we use $1$, $X$ and $X^2$.[11] When implementing LW's test, we set $\mathcal{W}_x = [0.05, 0.95]$ and use the uniform weight function $w(x) = 1$ for $x \in \mathcal{W}_x$ and $w(x) = 0$ otherwise.[12] The kernel function is $K(u) = \frac{3}{2}\big(1 - (2u)^2\big) \cdot 1(|u| \le 0.5)$, with bandwidth $h = C_h\hat{s}_XN^{-2/7}$, where $C_h \in \{2, 3\}$ and $\hat{s}_X$ is the sample standard deviation of $X$, as suggested by LW. We use a Riemann sum based on 500 grid points evenly distributed in $[0.05, 0.95]$ to approximate the integrals in $T$, $\hat{a}_N$ and $\hat{\sigma}_N^2$.[13] The rejection rates are calculated from 5,000 simulations and are summarized in Tables 1 to 4. The first observation is that the smaller $b_N$ is, the smaller the rejection rates are in all cases.
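The finite instrument set used in the simulations can be generated as follows. The sketch assumes half-open intervals $[(j - 1)/r, j/r)$, with the last interval for each $r$ closed at 1, in line with the $r = 3$ case spelled out in footnote 7, rather than the centered cubes of Section 2.1; exact fractions are used for the endpoints. The names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Cell = Tuple[Fraction, Fraction]


def interval_instruments(r1: int = 8) -> List[Cell]:
    """All intervals with endpoints (j - 1)/r and j/r, j = 1..r, for r = 1..r1."""
    return [(Fraction(j - 1, r), Fraction(j, r))
            for r in range(1, r1 + 1) for j in range(1, r + 1)]


def g(x: float, cell: Cell) -> int:
    """1(lo <= x < hi), with the interval that ends at 1 also closed on the right."""
    lo, hi = cell
    return 1 if lo <= x < hi or (hi == 1 and x == 1) else 0
```

For each $r$ the $r$ intervals partition $[0, 1]$, so the total count is $1 + 2 + \cdots + r_1 = r_1(r_1 + 1)/2$, which equals 36 for $r_1 = 8$.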
Table 1 presents the size of our test and LW's for Example 7.1, where the null hypothesis is in the least favorable case. Both tests are over-sized, but the size distortion of our test, which lies between 0.44% and 1.34%, is smaller than theirs, which lies between 1.18% and 3.1%. Also, as the sample size increases, the size of both tests gets closer to the 5% level. This suggests that it would be interesting to investigate the higher-order properties of both tests further; however, this is beyond the scope of this paper, and we leave it for future research.

[7] For example, if $r = 3$, we use the instrument functions $1(0 \le X \le 1/3)$, $1(1/3 \le X \le 2/3)$ and $1(2/3 \le X \le 1)$. Therefore, if $r_1 = 8$, we have a total of $1 + 2 + \cdots + 8 = 36$ instrument functions.
[8] Our simulation results are not sensitive to the choice of $r_1$.
[9] The rejection rates of our test with $b_N = 0.5 \log\log N$ will be the largest among the three $b_N$'s we consider.
[10] Define the p-value of our test as $\hat p(\hat S_N) = P\big(\max\{\sup_{l \in \mathcal{L}}(\Psi^u(l) + \sqrt{N}\hat\mu(l)), \eta\} > \hat S_N\big)$. For $\alpha_0 < 1/2$, we reject $H_0$ when $\hat p(\hat S_N) < \alpha_0$. The critical value method and the p-value method are equivalent.
[11] The simulation results are not sensitive to the choice of power series.
[12] In the simulations, we do not consider the estimated weight functions $\hat p(x)(1 - \hat p(x))\hat f^2(x)$ and $\hat f(x)$ suggested by LW, because in that case one should also account for the estimation effect of the estimated weight function when estimating $\hat a_N$ and $\hat\sigma^2_N$, which LW do not discuss.
[13] The simulation results are not sensitive to the number of grid points we choose among 300, 500 and 1,000.

The simulation results of Example 7.2 are summarized in Table 2, which shows that our test is much less conservative than LW's. In fact, by the same argument as for Theorem 6.2, we can show that the size of LW's test is asymptotically zero when the null hypothesis is not in the least favorable case. The simulation results in Table 2 support this: the size of LW's test with $C_h = 2$ and 3 decreases to zero as the sample size increases. On the other hand, the size of our test with each choice of $b_N$ increases to the 5% level as the sample size increases. This confirms the first part of our Theorem 5.1.

Table 3 summarizes the simulation results of Example 7.3, which demonstrate the power of our test and LW's against fixed alternatives. In this case, our test is more powerful than LW's, since the rejection rates of our test with every choice of $b_N$ are larger than theirs for all the sample sizes we consider. When $N = 300$, the difference in power between our test and theirs can be as high as 34.5%. Two reasons account for this. First, LW's test is constructed based on the LFC, and it is well known that tests based on the LFC are in general more conservative and less powerful; our test, which does not rely on the LFC, can be more powerful in this case. Second, LW's test is restricted to $W_x = [0.05, 0.95]$, so it does not use the information that the null is violated on $x \in [0.95, 1]$. Our test does use this information, so it can be more powerful than theirs in this example.

Table 4 presents the simulation results of Example 7.4.
We use this example to illustrate Theorem 5.4 and Theorem 6.2 concerning the asymptotic local power of our test and LW's. Note that the DGP in Example 7.4 satisfies Assumption 6.1. Sections 5 and 6 show that the asymptotic local power of our test is greater than 5%, whereas that of LW's is zero. The simulation results support this theoretical finding: the rejection rates of our test are between 4.62% and 5.66% with $N = 300$ and increase to 7.84%-8.24% when $N = 1,000$, while those of LW's are 2.22% and 2.28% with $N = 300$ and decrease to 0.68% and 0.76%, respectively, when $N = 1,000$.

8 Extensions

We present several extensions of our test in this section. First, we extend our test to the conditional stochastic dominance relation between the treatment group and the control group. Second, we discuss how to test the null hypothesis that the conditional treatment effect is non-negative conditional on $X_a$, a strict subset of $X$; in particular, we focus on cases where the unconfoundedness assumption does not hold when we condition on $X_a$ only. Third, we extend our test to cases where the unconfoundedness assumption does not hold but a binary instrumental variable is available, as in the traditional local average treatment effect setup.

8.1 Conditional Stochastic Dominance Treatment Effects

We can extend our method to test the conditional stochastic dominance relation between the treatment and control groups, which is also considered in LW. To do so, we replace $Y$ with $1(Y \le y)$ for all $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the common compact support of $Y(1)$ and $Y(0)$. To be more specific, let $F_1(y|x)$ and $F_0(y|x)$ denote the conditional CDFs of $Y(1)$ and $Y(0)$. The null hypothesis regarding conditional stochastic dominance treatment effects is
$$H_0^{sd}: F_1(y|X) - F_0(y|X) \le 0 \quad \text{for all } y \in \mathcal{Y} \text{ and a.s. in } X,$$
which is equivalent to
$$H_0^{sd}: \nu(l, y) \equiv E\left[g_l(X)\left(\frac{D\,1(Y \le y)}{p(X)} - \frac{(1 - D)\,1(Y \le y)}{1 - p(X)}\right)\right] \le 0 \quad \text{for all } l \in \mathcal{L} \text{ and } y \in \mathcal{Y}. \tag{10}$$
We estimate $\nu(l, y)$ by
$$\hat\nu(l, y) = \frac{1}{N}\sum_{i=1}^N g_l(X_i)\left(\frac{D_i\,1(Y_i \le y)}{\hat p(X_i)} - \frac{(1 - D_i)\,1(Y_i \le y)}{1 - \hat p(X_i)}\right),$$
and $\sqrt{N}(\hat\nu(l, y) - \nu(l, y))$ converges weakly to a mean zero Gaussian process $\Psi(l, y)$ with covariance kernel generated by
$$\psi_{l,y}(W) = g_l(X)\left(\frac{D\,1(Y \le y)}{p(X)} - \frac{(1 - D)\,1(Y \le y)}{1 - p(X)} - (D - p(X))\left(\frac{F_1(y|X)}{p(X)} + \frac{F_0(y|X)}{1 - p(X)}\right)\right) - \nu(l, y).$$
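As a minimal sketch of the estimator of (10), the following computes $\hat\nu(l, y)$ on a finite grid of $y$ values with closed interval instruments. The function names, the grid and the `order` argument are hypothetical; `order = j > 1` applies the higher-order transform $1(Y \le y)(y - Y)^{j-1}/(j-1)!$ (the $j$-th order integrated CDF, mentioned later in this subsection), and $\hat p$ is taken as a given input.

```python
import math

import numpy as np


def sd_transform(y, grid, order=1):
    """N x |grid| matrix of 1(Y_i <= y)(y - Y_i)^(j-1)/(j-1)!, with j = order."""
    diff = grid[None, :] - y[:, None]                       # y - Y_i
    return (diff >= 0) * diff ** (order - 1) / math.factorial(order - 1)


def sd_moments(x, d, y, p_hat, intervals, grid, order=1):
    """L x |grid| array of nu_hat(l, y) from (10), given an estimated propensity."""
    lo = np.array([a for a, _ in intervals])
    hi = np.array([b for _, b in intervals])
    g = ((x[:, None] >= lo) & (x[:, None] <= hi)).astype(float)   # N x L
    h = sd_transform(y, grid, order)                                # N x |grid|
    w = d / p_hat - (1 - d) / (1 - p_hat)                           # IPW weights
    return g.T @ (w[:, None] * h) / len(y)
```

The statistic $\hat S_N^{sd}$ is then $\sqrt{N}$ times the largest entry of the returned array, taken over both instruments and grid points.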

Given that $\hat F_0(y|x)$ and $\hat F_1(y|x)$ are monotonically increasing and uniformly consistent estimators[14] of $F_0(y|x)$ and $F_1(y|x)$, as in DH, $\Psi(l, y)$ can be approximated well by
$$\Psi^u(l, y) = \frac{1}{\sqrt{N}}\sum_{i=1}^N U_i\left[g_l(X_i)\left(\frac{D_i\,1(Y_i \le y)}{\hat p(X_i)} - \frac{(1 - D_i)\,1(Y_i \le y)}{1 - \hat p(X_i)} - (D_i - \hat p(X_i))\left(\frac{\hat F_1(y|X_i)}{\hat p(X_i)} + \frac{\hat F_0(y|X_i)}{1 - \hat p(X_i)}\right)\right) - \hat\nu(l, y)\right]. \tag{11}$$
The test statistic is defined as $\hat S_N^{sd} = \sqrt{N}\sup_{l \in \mathcal{L}, y \in \mathcal{Y}} \hat\nu(l, y)$. Given the generalized moment selection function $\hat\mu(l, y) = \hat\nu(l, y)\,1(\sqrt{N}\hat\nu(l, y) < -b_N)$, the critical value is $\hat c^{sd} = \max\{\tilde c^{sd}, \eta\}$, where $\tilde c^{sd}$ is the $(1 - \alpha)$-th quantile of $\sup_{l \in \mathcal{L}, y \in \mathcal{Y}}(\Psi^u(l, y) + \sqrt{N}\hat\mu(l, y))$. The rejection rule is: reject $H_0^{sd}$ if $\hat S_N^{sd} > \hat c^{sd}$.

Under the same conditions as in DH, the test for conditional stochastic dominance treatment effects shares the same properties as our test for the conditional average treatment effect, and the advantages of our method over LW's are similar to those in the conditional average treatment effect case. Hence, we omit the formal presentation of these results. The extension of our test to higher-order stochastic dominance is straightforward: for $j$-th order stochastic dominance with $j \ge 2$, we just need to replace $1(Y \le y)$ with $1(Y \le y)(y - Y)^{j-1}/(j-1)!$.

[14] $\hat F_0(y|x)$ is monotonically increasing if $\hat F_0(y_1|x) \le \hat F_0(y_2|x)$ for all $y_1 \le y_2$ and all $x \in \mathcal{X}$. $\hat F_0(y|x)$ is uniformly consistent for $F_0(y|x)$ if $\sup_{y \in \mathcal{Y}, x \in \mathcal{X}}|\hat F_0(y|x) - F_0(y|x)| = o_p(1)$.

8.2 Conditional on a Strict Subset $X_a$

We extend our results to the case where the conditioning set is $X_a$, a strict subset of $X$, and the unconfoundedness assumption does not hold if we condition on $X_a$ only.[15] In the treatment effect literature, most papers focus either on the treatment effects over the whole population or on the treatment effects conditional on the whole set of covariates $X$ for which the unconfoundedness assumption holds. For a researcher, however, requiring a policy to have a uniformly positive effect on all subgroups defined by $X$ might sometimes be too strict. Our test is useful when the researcher is interested in a policy that has a uniformly positive effect on all subgroups defined by $X_a$. The null hypothesis is
$$H_0^a: E[Y(0) - Y(1) \,|\, X_a] \le 0 \quad \text{a.s. in } X_a. \tag{12}$$
Define $l_a = (x_a, r)$ and $\mathcal{L}_a = \mathcal{X}_a \times [0, \bar r]$, where $\mathcal{X}_a$ is the support of $X_a$ and $\bar r > 0$. The set of instrument functions is defined as
$$\mathcal{G}^a_{cube} = \{g_{l_a}(X_a) = 1(X_a \in C_{l_a}) : C_{l_a} \in \mathcal{C}^a_{cube}\}, \qquad \mathcal{C}^a_{cube} = \Big\{C_{l_a} = \prod_{j=1}^{k_a}[x_{a,j} - r, x_{a,j} + r] : l_a \in \mathcal{L}_a\Big\},$$
where $k_a$ denotes the dimension of $X_a$. As a result, the null hypothesis in (12) is equivalent to
$$H_0^a: \nu(l_a) \equiv E\left[g_{l_a}(X_a)\left(\frac{(1 - D)Y}{1 - p(X)} - \frac{DY}{p(X)}\right)\right] \le 0 \quad \text{for all } l_a \in \mathcal{L}_a.$$
Note that the propensity score $p(X)$ still conditions on the full set of covariates. Hence, by replacing $g_l(X)$ with $g_{l_a}(X_a)$, all results of our test extend easily to this case.

However, it is not trivial to extend LW's test to this case. The main reason is that
$$\hat\tau(x_a) = \frac{\sum_{i=1}^N (1 - D_i)Y_i K_h(x_a - X_{ai})}{\sum_{i=1}^N (1 - D_i)K_h(x_a - X_{ai})} - \frac{\sum_{i=1}^N D_iY_i K_h(x_a - X_{ai})}{\sum_{i=1}^N D_i K_h(x_a - X_{ai})}$$
is not a consistent estimator of $\tau(x_a) = E[Y(0) - Y(1) \,|\, X_a = x_a]$, since the treatment status is not unconfounded conditional on $X_a$ alone. For LW's test to work, a two-step estimator of $\tau(x_a)$ as in Hsu and Lieli (2011) is needed, and extending the theory of LW's test to this case is not a trivial exercise.

[15] If the unconfoundedness assumption holds when we condition on $X_a$ only, then the previous methods still work and the theory remains valid when we replace $X$ with $X_a$.

8.3 Tests without Unconfoundedness

Finally, as with LW's test, our method can be applied to cases where the unconfoundedness assumption does not hold but a binary instrument $Z$ is available. Under the conditional local average treatment effect (LATE) setup as in Abadie, Angrist and Imbens (2002) and Abadie (2003), the LATE is defined as $LATE(x) \equiv E[Y(1) - Y(0) \,|\, X = x, C]$, where $C$ denotes the group of compliers, and the null hypothesis of interest is
$$H_0^{late}: LATE(X) \ge 0 \quad \text{a.s. in } X. \tag{13}$$
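Sections 8.2 and 8.3 reuse the same IPW moment with either a different instrument or a different "treatment" indicator. A minimal NumPy sketch with hypothetical names: `cube_instruments` builds $g_{l_a}(X_a) = 1(X_a \in \prod_j [x_{a,j} - r, x_{a,j} + r])$ for a user-chosen finite set of centers and half-widths, and `ipw_moment` computes $N^{-1}\sum_i g_l(\cdot)\,[(1 - T_i)V_i/(1 - \pi_i) - T_iV_i/\pi_i]$. With $T = D$, $V = Y$ and $\pi = \hat p(X)$ evaluated on the full covariate vector, it gives $\hat\nu(l_a)$ for Section 8.2; with $T = Z$ and $\pi = \hat q(X)$, it gives the sample analogue of the moment in (14) below.

```python
import numpy as np


def cube_instruments(xa, centers, r):
    """N x L matrix of 1(X_a in prod_j [c_j - r_l, c_j + r_l]) for each center c."""
    xa = np.asarray(xa, dtype=float)
    xa = xa.reshape(xa.shape[0], -1)                        # N x k_a (1-D means k_a = 1)
    centers = np.asarray(centers, dtype=float)
    centers = centers.reshape(centers.shape[0], -1)         # L x k_a
    half = np.broadcast_to(np.asarray(r, dtype=float), (centers.shape[0],))
    inside = np.abs(xa[:, None, :] - centers[None, :, :]) <= half[None, :, None]
    return inside.all(axis=2).astype(float)                 # inside in every coordinate


def ipw_moment(g, t, v, prob):
    """One moment per instrument: mean of g_l * [(1-T)V/(1-prob) - T V/prob]."""
    ipw = (1 - t) * v / (1 - prob) - t * v / prob
    return g.T @ ipw / len(v)
```

The design choice is deliberate: only the instrument matrix (Section 8.2) or the pair $(T, \pi)$ (Section 8.3) changes, so the multiplier and GMS steps can be applied unchanged.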

It is well known that $LATE(x)$ is identified by
$$LATE(x) = E\left[\frac{ZY}{q(X)} - \frac{(1 - Z)Y}{1 - q(X)} \,\Big|\, X = x\right] \Big/ E\left[\frac{ZD}{q(X)} - \frac{(1 - Z)D}{1 - q(X)} \,\Big|\, X = x\right],$$
where $q(x) = P(Z = 1 \,|\, X = x)$. Given that the denominator is assumed to be strictly positive a.s. in $X$, the null hypothesis in (13) is equivalent to
$$H_0^{late}: E\left[\frac{(1 - Z)Y}{1 - q(X)} - \frac{ZY}{q(X)} \,\Big|\, X\right] \le 0 \quad \text{a.s. in } X, \tag{14}$$
which is essentially equivalent to (2) after we replace $D$ and $p(X)$ with $Z$ and $q(X)$, respectively. In other words, if we treat $Z$ as the primary treatment status, then the test for (14) is equivalent to the test for (2). A similar argument applies to the stochastic dominance case and to the case where the conditioning set $X_a$ is a strict subset of $X$.

9 Conclusion

In this paper, we propose a KS test for the null hypothesis that the conditional average treatment effect is uniformly non-negative over all subgroups defined by the covariates. Our test controls the size well asymptotically, is consistent against any fixed alternative, and is unbiased against some local alternatives. We compare our test with LW's, which may be inconsistent in some cases, and show that our test is more powerful than theirs against some local alternatives. Monte-Carlo simulations confirm our theoretical findings. Several extensions are presented as well.

For future study, it would be interesting to extend our method to test the null hypotheses that, for a fixed quantile index or for a continuum of quantile indexes, the conditional quantile treatment effect is uniformly non-negative over all subgroups defined by the covariates, as in Section 5.2 of LW. To implement this, one would need to use the imputation method with an estimator of the conditional quantile functions, as in Section 5.2 of LW, because it is difficult to define these null hypotheses in the same way as (2) in this paper.

Appendix

Let $C$ be a generic constant that varies from case to case. All limits are taken as $N \to \infty$.

Proof of Lemma 3.5: The proof is similar to the proof of Lemma 3.6 of DH. First, using the same argument as in the addendum of HIR, we have
$$\sup_{l \in \mathcal{L}}\left|\sqrt{N}(\hat\nu(l) - \nu(l)) - \frac{1}{\sqrt{N}}\sum_{i=1}^N \psi_l(W_i)\right| = o_p(1).$$
Second, since $\mathcal{C}_{cube}$ is a VC class of sets, $\mathcal{G}_{cube}$ is a VC class of functions. By Lemma 9.9 in Kosorok (2008), for any measurable function $f(W)$, $\{g_l(X)f(W) : l \in \mathcal{L}\}$ is a VC class of functions, which implies that $\mathcal{K} = \{\psi_l(W) : l \in \mathcal{L}\}$ is a VC class of functions, where
$$\psi_l(W) = g_l(X)\left(\frac{(1 - D)Y}{1 - p(X)} - \frac{DY}{p(X)} + (D - p(X))\left(\frac{\mu_0(X)}{1 - p(X)} + \frac{\mu_1(X)}{p(X)}\right)\right) - \nu(l).$$
Hence, Lemma 3.5 follows from the functional central limit theorem.

Proof of Proposition 3.6: The proof is similar to that of Proposition 1 of Barrett and Donald (2003). Using the same argument with $\Psi(\cdot)$ in place of $T_j(\cdot)$ and $l$ in place of $z$, we can show that for any $c > 0$,
$$\limsup P(\hat S_N \le c) \le P\Big(\sup_{l \in \mathcal{L}}\Psi(l) \le c\Big),$$
and for arbitrary $\epsilon > 0$,
$$\liminf P(\hat S_N \le c) \ge P\Big(\sup_{l \in \mathcal{L}}\Psi(l) \le c - 2\epsilon\Big).$$
Together, we have
$$\lim P(\hat S_N \le c) = P\Big(\sup_{l \in \mathcal{L}}\Psi(l) \le c\Big). \tag{15}$$
If the CDF of $\sup_{l \in \mathcal{L}}\Psi(l)$ is discontinuous at 0, then (15) implies that $\hat S_N \xrightarrow{d} \sup_{l \in \mathcal{L}}\Psi(l)$. If the CDF of $\sup_{l \in \mathcal{L}}\Psi(l)$ is continuous at 0, then following the same argument as in the proof of Lemma 1 of Donald and Hsu (2010a), we can show that $\lim P(\hat S_N \le 0) = P(\sup_{l \in \mathcal{L}}\Psi(l) \le 0) = 0$, which again implies that $\hat S_N \xrightarrow{d} \sup_{l \in \mathcal{L}}\Psi(l)$.

To show the second part, under the fixed alternative $H_1: \nu(l) > 0$ for some $l \in \mathcal{L}$, there is some $l^*$ such that $\nu(l^*) = \xi > 0$. Note that $\hat\nu(l^*) \xrightarrow{p} \xi$, which implies that $\hat S_N \ge \sqrt{N}\hat\nu(l^*) \xrightarrow{p} \infty$.

This completes our proof.

Proof of Lemma 4.1: Rewrite
$$\begin{aligned}
\Psi^u(l) ={}& \frac{1}{\sqrt{N}}\sum_{i=1}^N U_i\psi_l(W_i) + \frac{1}{\sqrt{N}}\sum_{i=1}^N U_i g_l(X_i)\left(\frac{D_iY_i}{p(X_i)} - \frac{D_iY_i}{\hat p(X_i)}\right) \\
&+ \frac{1}{\sqrt{N}}\sum_{i=1}^N U_i g_l(X_i)\left(\frac{(1 - D_i)Y_i}{1 - \hat p(X_i)} - \frac{(1 - D_i)Y_i}{1 - p(X_i)}\right) \\
&+ \frac{1}{\sqrt{N}}\sum_{i=1}^N U_i g_l(X_i)(p(X_i) - \hat p(X_i))\left(\frac{\hat\mu_0(X_i)}{1 - \hat p(X_i)} + \frac{\hat\mu_1(X_i)}{\hat p(X_i)}\right) \\
&+ \frac{1}{\sqrt{N}}\sum_{i=1}^N U_i g_l(X_i)(D_i - p(X_i))\left(\frac{\hat\mu_0(X_i)}{1 - \hat p(X_i)} + \frac{\hat\mu_1(X_i)}{\hat p(X_i)} - \frac{\mu_0(X_i)}{1 - p(X_i)} - \frac{\mu_1(X_i)}{p(X_i)}\right) \\
&- (\hat\nu(l) - \nu(l))\frac{1}{\sqrt{N}}\sum_{i=1}^N U_i.
\end{aligned} \tag{16}$$
Note that by the multiplier central limit theorem (Corollary … of Van der Vaart and Wellner, 1996), we have $\frac{1}{\sqrt{N}}\sum_{i=1}^N U_i\psi_l(W_i) \overset{P_w}{\Rightarrow} \Psi(l)$. It remains to show that all the other terms vanish in the limit. We first look at
$$\chi_N(l|W) = \frac{1}{\sqrt{N}}\sum_{i=1}^N U_i g_l(X_i)\left(\frac{D_iY_i}{p(X_i)} - \frac{D_iY_i}{\hat p(X_i)}\right).$$
We want to show that, conditional on the sample path $W$, $\chi_N(l|W)$ converges weakly to a zero process with probability one, by showing that conditions (i)-(v) of Theorem 10.6 of Pollard (1990) hold with probability one. Conditions (ii)-(v) can be verified easily. To check (i), we first define some notation. Consider the indicators $1(X_{ji} \le x_j)$ with $x_j \in \mathcal{X}_j$, where $X_{ji}$ denotes the $j$-th element of $X_i$ and $\mathcal{X}_j$ is the compact support of $X_j$. Let $X_N(x_j) = (1(X_{j1} \le x_j), \ldots, 1(X_{jN} \le x_j))$ and $\mathcal{G}_{jN} = \{X_N(x_j) : x_j \in \mathcal{X}_j\}$. Let $\mathbf{G}_N = (1, \ldots, 1)$, which is an envelope of $\mathcal{G}_{jN}$. For any two vectors $a = (a_1, \ldots, a_N)$ and $b = (b_1, \ldots, b_N)$ in $\mathbb{R}^N$, define the pointwise product $a \odot b = (a_1b_1, \ldots, a_Nb_N)$. Let $\Theta_N = (\theta_1, \ldots, \theta_N)$ be a vector of non-negative weights and $\Theta_N \odot \mathcal{G}_{jN} = \{\Theta_N \odot X_N(x_j) : x_j \in \mathcal{X}_j\}$. The packing number $D(\epsilon, T_0)$ for a subset

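$T_0$ of a metric space with metric $d$ is defined as the largest $m$ for which there exist points $t_1, \ldots, t_m$ in $T_0$ with $d(t_i, t_j) > \epsilon$ for $i \ne j$. Let $d$ be the metric induced by the $\ell_1$ norm on $\mathbb{R}^N$, $\|(v_1, \ldots, v_N)\|_1 = \sum_{i=1}^N |v_i|$, so that $d(x_{j1}, x_{j2}) = \sum_{i=1}^N |1(X_{ji} \le x_{j1}) - 1(X_{ji} \le x_{j2})|$. For any $x_{j1} \le x_{j2} \le x_{j3}$, we also have $d(x_{j3}, x_{j1}) = d(x_{j3}, x_{j2}) + d(x_{j2}, x_{j1})$. Hence we can show that $D_1(\epsilon\|\Theta_N \odot \mathbf{G}_N\|_1, \Theta_N \odot \mathcal{G}_{jN}) \le 1/\epsilon + 1$ for all $\Theta_N$ and for all sample paths of $X_j$. Similarly, define $\mathcal{G}^+_{jN} = \{(1(X_{j1} < x_j), \ldots, 1(X_{jN} < x_j)) : x_j \in \mathcal{X}_j\}$; we have $D_1(\epsilon\|\Theta_N \odot \mathbf{G}_N\|_1, \Theta_N \odot \mathcal{G}^+_{jN}) \le 1/\epsilon + 1$ for all $\Theta_N$. Define $\tilde{\mathcal{G}}_{jN} = \{X_N(x_{j1}) - X_N(x_{j2}) : x_{j1}, x_{j2} \in \mathcal{X}_j\}$. Let $\|(v_1, \ldots, v_N)\|_2 = (\sum_{i=1}^N v_i^2)^{1/2}$ be the $\ell_2$ norm on $\mathbb{R}^N$, and let $D_1$ and $D_2$ be the $\ell_1$ and $\ell_2$ packing numbers, respectively. Then by Lemma 5.3 of Pollard (1990) and the relation between packing and covering numbers, we have $D_2(\epsilon\|\Theta_N \odot \mathbf{G}_N\|_2, \Theta_N \odot \tilde{\mathcal{G}}_{jN}) \le C\epsilon^{-4}$, where $C$ is a positive number not depending on the realization of the sample path of $X_j$. Define $\mathcal{G}_N = \tilde{\mathcal{G}}_{1N} \odot \cdots \odot \tilde{\mathcal{G}}_{kN}$. Finally, by Equation (5.2) of Pollard (1990), we have
$$D_2(\epsilon\|\Theta_N \odot \mathbf{G}_N\|_2, \Theta_N \odot \mathcal{G}_N) \le C\epsilon^{-4k} \tag{17}$$
for some positive number $C$. Given $W$ and $\omega_u \in \Omega_u$, let
$$F_{Ni}(\omega_u) = M|Y_i||U_i(\omega_u)|, \qquad F_N(\omega_u) = (F_{N1}(\omega_u), \ldots, F_{NN}(\omega_u)),$$
$$f_{Ni}(U_i(\omega_u), l|W) = \frac{1}{\sqrt{N}}U_i(\omega_u)\,g_l(X_i)\left(\frac{1}{p(X_i)} - \frac{1}{\hat p(X_i)}\right)D_iY_i,$$
$$f_N(l, \omega_u|W) = (f_{N1}(U_1(\omega_u), l|W), \ldots, f_{NN}(U_N(\omega_u), l|W)), \qquad \mathcal{F}_{N\omega_u} = \{f_N(l, \omega_u|W) : l \in \mathcal{L}\}.$$
Equation (17) implies that $D_2(\epsilon\|\Theta_N \odot F_N(\omega_u)\|_2, \Theta_N \odot \mathcal{F}_{N\omega_u}) \le C\epsilon^{-4k} \equiv \beta(\epsilon)$, since $\mathcal{G}_{cube}$ is a subset of $\mathcal{G} = \{\prod_{j=1}^k (1(x_j \le x_{j1}) - 1(x_j \le x_{j2})) : x_{j1}, x_{j2} \in \mathcal{X}_j\}$. Since $\int_0^1 \sqrt{\log\beta(t)}\,dt < \infty$, this is sufficient to show (i). We can also show that $H(l_1, l_2) = \lim_{N \to \infty} E[\chi_N(l_1|W)\chi_N(l_2|W)] = 0$ for all $l_1$ and $l_2$ in $\mathcal{L}$, conditional on the sample path $W$ with probability 1. Given this, it follows that conditional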

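on the sample path $W$ with probability 1, $\chi_N(l|W)$ converges to a mean zero Gaussian process with covariance kernel $H(l_1, l_2) = 0$, which is a zero process. Hence $\chi_N(l|W) \overset{P_w}{\Rightarrow} 0$. Similarly, we can show that
$$\frac{1}{\sqrt{N}}\sum_{i=1}^N U_i g_l(X_i)\left(\frac{(1 - D_i)Y_i}{1 - \hat p(X_i)} - \frac{(1 - D_i)Y_i}{1 - p(X_i)}\right) \overset{P_w}{\Rightarrow} 0,$$
$$\frac{1}{\sqrt{N}}\sum_{i=1}^N U_i g_l(X_i)(p(X_i) - \hat p(X_i))\left(\frac{\hat\mu_0(X_i)}{1 - \hat p(X_i)} + \frac{\hat\mu_1(X_i)}{\hat p(X_i)}\right) \overset{P_w}{\Rightarrow} 0,$$
$$\frac{1}{\sqrt{N}}\sum_{i=1}^N U_i g_l(X_i)(D_i - p(X_i))\left(\frac{\hat\mu_0(X_i)}{1 - \hat p(X_i)} + \frac{\hat\mu_1(X_i)}{\hat p(X_i)} - \frac{\mu_0(X_i)}{1 - p(X_i)} - \frac{\mu_1(X_i)}{p(X_i)}\right) \overset{P_w}{\Rightarrow} 0.$$
For the last term in (16),
$$P\left(\sup_{l \in \mathcal{L}}\left|(\hat\nu(l) - \nu(l))\frac{1}{\sqrt{N}}\sum_{i=1}^N U_i\right| > \epsilon\right) \le \frac{E\big[\sup_{l \in \mathcal{L}}|\hat\nu(l) - \nu(l)|^2\big]}{\epsilon^2} \to 0.$$
The first inequality follows from Chebyshev's inequality, the independence of $\{U_i\}$ from the sample, and $E[(\sum_{i=1}^N U_i/\sqrt{N})^2] = 1$. By the dominated convergence theorem, we also have $E[\sup_{l \in \mathcal{L}}|\hat\nu(l) - \nu(l)|^2] \to 0$. This completes the proof of Lemma 4.1.

Proof of Lemma 4.3: We first use the same argument as in the proof of Lemma 3.5 of Donald and Hsu (2010a) to show that, conditional on the sample path $W$ with probability 1,
$$S^u \equiv \sup_{l \in \mathcal{L}}(\Psi^u(l) + \sqrt{N}\hat\mu(l)) \xrightarrow{d} \sup_{l \in \mathcal{L}^+}\Psi(l) \equiv S. \tag{18}$$
If $S$ is degenerate at 0, it is obvious that $\tilde c$ converges to 0, which is equal to $c$. Otherwise, there is some $l \in \mathcal{L}^+$ such that $Var(\Psi(l)) > 0$, so that $S$ is not degenerate. Since $\Psi(l)$ is then a mean zero normal random variable and $S \ge \Psi(l)$, we have $P(S \le 0) \le 1/2$, so when $\alpha < 1/2$, the $(1 - \alpha)$-th quantile of $S$, $c$, is strictly positive. On the other hand, by Tsirel'son (1975), the distribution function of $S$, $F_S(z)$, is continuous for $z > 0$. These facts are sufficient to show that $\tilde c$, the $(1 - \alpha)$-th quantile of $S^u$, converges to $c$ in probability. This completes the first part. For the second part, $\hat c = \max\{\tilde c, \eta\} \xrightarrow{p} \max\{c, \eta\}$ since $\max\{a, b\}$ is a continuous function.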
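The fourth and fifth terms of the decomposition (16) in the proof of Lemma 4.1 come from splitting the estimated propensity-score correction. Writing $A = \mu_0/(1 - p) + \mu_1/p$ and $\hat A = \hat\mu_0/(1 - \hat p) + \hat\mu_1/\hat p$ (shorthand introduced here for illustration only, with arguments $X_i$ suppressed), the algebra can be sketched as:

```latex
\begin{aligned}
\hat\psi_l(W) - \psi_l(W)
  &= g_l(X)\Big[\Big(\tfrac{DY}{p} - \tfrac{DY}{\hat p}\Big)
     + \Big(\tfrac{(1-D)Y}{1-\hat p} - \tfrac{(1-D)Y}{1-p}\Big)
     + (D - \hat p)\hat A - (D - p)A\Big] - (\hat\nu(l) - \nu(l)), \\
(D - \hat p)\hat A - (D - p)A
  &= (D - p)\hat A + (p - \hat p)\hat A - (D - p)A
   = (p - \hat p)\hat A + (D - p)(\hat A - A).
\end{aligned}
```

Multiplying by $U_i/\sqrt{N}$ and summing over $i$ gives the last five terms of (16).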
