Consistent Tests for Conditional Treatment Effects


Yu-Chin Hsu
Department of Economics, University of Missouri at Columbia

(Preliminary: please do not cite or quote without permission.)

This version: May 11, 2011

Department of Economics, University of Missouri at Columbia, Columbia MO, U.S.A. Acknowledgement: I thank Jason Abrevaya, Richard Chiburis, Stephen G. Donald, and Kyungchul Song for their insightful comments. All errors and omissions are my own responsibility.

Abstract

We construct a Kolmogorov-Smirnov test for the null hypothesis that the average treatment effect is non-negative conditional on all possible values of the covariates. The null hypothesis can be characterized as a conditional moment inequality under the unconfoundedness assumption, and we employ the instrumental variable method to convert the conditional moment inequality into unconditional ones without information loss. The Kolmogorov-Smirnov test is constructed based on these unconditional moment inequalities. We show that our test controls size asymptotically, is consistent against fixed alternatives, and is unbiased against some $N^{-1/2}$ local alternatives. Furthermore, our test is more powerful than Lee and Whang's (2009) against a broad set of $N^{-1/2}$ local alternatives. Monte-Carlo simulation results confirm our theoretical findings. Several extensions are also discussed.

JEL classification: C01, C12, C21

Keywords: Hypothesis testing, treatment effects, test consistency, propensity score.

1 Introduction

This paper proposes a Kolmogorov-Smirnov (KS) test for the null hypothesis that the average treatment effect is non-negative conditional on all possible values of the covariates. We show that this null hypothesis can be characterized as a conditional moment inequality under the unconfoundedness assumption. We employ Andrews and Shi's (2010, AS hereafter) instrumental variable approach to transform the conditional moment inequality into an infinite number of unconditional moment inequalities without information loss. An inverse probability weighted (IPW) estimator as in Hirano, Imbens and Ridder (2003, HIR hereafter) is used to estimate each of the unconditional moments, and the estimated moments indexed by the instrument functions are shown to converge weakly to a mean-zero Gaussian process. As in Donald and Hsu (2010b, DH hereafter), we propose a simulation method based on the multiplier central limit theorem to approximate the limiting process of the estimated moments. The test statistic is defined as the supremum of the estimated unconditional moments indexed by the instrument functions. The critical value for the test is constructed from the simulated process and the generalized moment selection (GMS) approach. The GMS method introduced by Andrews and Soares (2010) and AS is similar to the recentering method of Hansen (2005) and Donald and Hsu (2010a), and to the contact set method of Linton, Song and Whang (2010). These methods are used to improve the power of tests involving inequalities without resorting to the least favorable configuration (LFC). We show that our test controls size at the prespecified significance level asymptotically and is consistent against any fixed alternative. Our test is also unbiased against some $N^{-1/2}$ local alternatives.
Lee and Whang (2009, LW hereafter) consider the same null hypothesis as ours, and their test statistic is a one-sided $L_1$-type functional of the nonparametric kernel estimator of the conditional average treatment effects. Their test also controls size asymptotically and is consistent against fixed alternatives. However, one drawback is that to implement their test, one first needs to specify a strict subset of the support of the covariates and restrict attention to this subset. This might cause LW's test to be inconsistent, especially when the violation of the null hypothesis lies outside of this subset. When the violation of the null hypothesis lies within this subset, our test and LW's are both consistent, and which test is more powerful depends on the underlying fixed alternative. However, we can show that our test is more powerful than LW's against some $N^{-1/2}$ local alternatives that do not converge to the least favorable case. We conduct small-scale Monte-Carlo simulations to study the finite sample performance of our test and LW's, and the results support our theoretical findings.

This paper is related to the treatment effect literature. For recent reviews of this large literature, see Imbens (2004) and Imbens and Wooldridge (2009), among others. Most papers in the literature focus on estimation and inference for the average treatment effects, and only a few papers construct tests for the average treatment effect conditional on the covariates, e.g. LW and Crump, Hotz, Imbens and Mitnik (2008). We have discussed LW's test. Crump, Hotz, Imbens and Mitnik (2008) construct nonparametric tests for two different null hypotheses: that the average treatment effects conditional on the covariates are equal to zero for all values of the covariates, or that they are equal to a constant.[1] The null hypotheses of their interest involve conditional moment equalities, which are different from ours. The methods developed in this paper extend directly to those null hypotheses, but a detailed comparison between our method and theirs is beyond the scope of this paper and is left for future research.

This paper is also related to the literature on conditional moment inequalities, e.g. AS, Chernozhukov, Lee and Rosen (2008), Galichon and Henry (2009), Fan (2008) and Kim (2008). These papers construct confidence sets for parameters defined by a set of conditional moment inequalities and/or equalities. Our focus is different, because we are interested in testing a null hypothesis that is characterized by a moment inequality. Lee, Song and Whang (2011) consider a problem similar to ours, but they require that the treatment assignment is random and independent of the covariates, with the probability of assignment known.

We consider several extensions of our tests. We first extend our results to test the null hypothesis that the conditional stochastic dominance relation between the potential outcomes holds for all values of the covariates. LW also consider this type of null hypothesis.
In the treatment effect literature, only a small number of papers discuss the stochastic dominance relation between the two groups under the unconfoundedness assumption, such as DH and Maier (2011). On the other hand, Abadie (2002) considers tests for the stochastic dominance relation when the treatment assignment is endogenous. These papers focus on the unconditional stochastic dominance relation between the potential outcomes, whereas we focus on the conditional stochastic dominance relation. Second, we extend our results to the case where the conditioning set is a strict subset of the covariates and the unconfoundedness assumption does not hold conditional on this subset alone. The only paper in the literature that discusses the conditional average treatment effect in this case is Hsu and Lieli (2011), who propose a two-step kernel estimator for the conditional average treatment effect; here we are interested in testing whether the conditional average treatment effect is uniformly non-negative conditional on this subset of covariates. Finally, we extend our test to the case where the treatment assignment is endogenous, as in the local average treatment effect setup of Imbens and Angrist (1994), Abadie, Angrist and Imbens (2002), Abadie (2002, 2003), Frölich (2007) and Donald, Hsu and Lieli (2010).

[1] LW only consider the null hypothesis that the average treatment effects conditional on the covariates for all values of covariates are equal to zero.

The rest of this paper is organized as follows. Section 2 introduces the model setup and formulates the null hypothesis of interest as a conditional moment inequality. We introduce AS's instrument approach to transform the conditional moment inequality into a continuum of unconditional moment inequalities without information loss. We also introduce an IPW estimator of the unconditional moments, and discuss the test statistic and the decision rule. Section 3 derives the asymptotics of the estimated moments and the test statistic. Section 4 presents a simulation method to approximate the limiting process of the estimated moments and the GMS method on which the critical value is based. Section 5 discusses the size properties, the consistency against fixed alternatives and the asymptotic local power against $N^{-1/2}$ local alternatives of our test. We introduce LW's test and compare it with ours in Section 6. Section 7 summarizes the Monte-Carlo simulation results, and Section 8 discusses some extensions of our tests. Section 9 concludes, and all mathematical proofs are deferred to the Appendix.

2 Test for Conditional Average Treatment Effects

2.1 Hypothesis Formulation

Let $D$ be a dummy variable such that $D = 1$ if the individual receives treatment and $D = 0$ otherwise. Let $X$ be a $k$-dimensional vector of covariates with $k \ge 1$ and compact support $\mathcal{X}$.
Define $Y(1)$ as the potential outcome for the individual under treatment and $Y(0)$ as that without treatment. We observe $D$, $X$ and $Y = D \cdot Y(1) + (1 - D) \cdot Y(0)$. We have a random sample of size $N$. Let $\mu_0(x) = E[Y(0) \mid X = x]$ and $\mu_1(x) = E[Y(1) \mid X = x]$. The null hypothesis of interest is that the conditional average treatment effect, defined as $\mu_1(x) - \mu_0(x)$, is non-negative for each $x \in \mathcal{X}$, which can be formulated as

\[ H_0 : \mu_0(X) - \mu_1(X) \le 0, \quad \text{a.s. in } X. \tag{1} \]

We assume that the treatment assignment is unconfounded.

Assumption 2.1 (Unconfoundedness Assumption): $(Y(0), Y(1)) \perp D \mid X$.

Let $p(x) = P(D = 1 \mid X = x)$ denote the propensity score, the probability of getting treatment for an individual with covariates $x$, which is assumed to be bounded away from zero and one on $\mathcal{X}$. Under Assumption 2.1, $\mu_0(x)$ and $\mu_1(x)$ are identified as

\[ \mu_0(x) = E\left[ \frac{(1-D)Y}{1-p(x)} \,\Big|\, X = x \right], \qquad \mu_1(x) = E\left[ \frac{DY}{p(x)} \,\Big|\, X = x \right]. \]

Hence, under Assumption 2.1, (1) is equivalent to

\[ H_0 : E\left[ \frac{(1-D)Y}{1-p(X)} - \frac{DY}{p(X)} \,\Big|\, X \right] \le 0, \quad \text{a.s. in } X. \tag{2} \]

The null hypothesis in (2) involves a conditional moment inequality. To extract all the information from (2), we adopt AS's instrumental variable approach to transform the conditional moment inequality into infinitely many unconditional ones without information loss. Define $l = (x, r)$ and $\mathcal{L} = \mathcal{X} \times [0, \bar{r}]$ where $\bar{r} > 0$. The set of instrument functions we consider is defined as

\[ \mathcal{G}_{cube} = \{ g_l(X) = 1(X \in C_l) : C_l \in \mathcal{C}_{cube} \}, \qquad \mathcal{C}_{cube} = \Big\{ C_l = \prod_{j=1}^{k} [l_j - r, \, l_j + r] : l \in \mathcal{L} \Big\}. \]

As shown by AS, the null hypotheses in (1) and (2) are equivalent to

\[ H_0 : \nu(l) \equiv E\left[ g_l(X) \left( \frac{(1-D)Y}{1-p(X)} - \frac{DY}{p(X)} \right) \right] \le 0, \quad \text{for all } l \in \mathcal{L}. \tag{3} \]

That is, we can transform the conditional moment inequality into a continuum of unconditional moment inequalities indexed by the instrument functions. We estimate $\nu(l)$ by the IPW estimator proposed by HIR,

\[ \hat{\nu}(l) = \frac{1}{N} \sum_{i=1}^{N} g_l(X_i) \left( \frac{(1-D_i)Y_i}{1-\hat{p}(X_i)} - \frac{D_i Y_i}{\hat{p}(X_i)} \right), \]

where $\hat{p}(X_i)$ is a nonparametric estimator of $p(x)$. As in HIR, we use the Series Logit Estimator (SLE) to estimate $p(x)$ based on power series. Let $\lambda = (\lambda_1, \ldots, \lambda_r)' \in \mathbb{Z}_+^r$ be an $r$-dimensional vector of non-negative integers, where $\mathbb{Z}_+$ denotes the set of non-negative integers, and define the norm of $\lambda$ as $|\lambda| = \sum_{j=1}^{r} \lambda_j$. Let $\{\lambda(k)\}_{k=1}^{\infty}$ be a sequence including all distinct $\lambda \in \mathbb{Z}_+^r$ such that $|\lambda(k)|$ is non-decreasing in $k$, and let $x^{\lambda} = \prod_{j=1}^{r} x_j^{\lambda_j}$. For any integer $K$, define $R^K(x) = (x^{\lambda(1)}, \ldots, x^{\lambda(K)})'$ as a vector of power functions. Let $\Lambda(a) = \exp(a)/(1 + \exp(a))$ be the logistic CDF. The SLE for $p(x)$ is defined as $\hat{p}(x) = \Lambda(R^K(x)' \hat{\pi}_K)$, where

\[ \hat{\pi}_K = \arg\max_{\pi_K} \frac{1}{N} \sum_{i=1}^{N} \Big( D_i \log \Lambda\big(R^K(X_i)' \pi_K\big) + (1 - D_i) \log\big(1 - \Lambda\big(R^K(X_i)' \pi_K\big)\big) \Big). \]

Other nonparametric estimators can be used to estimate the propensity score function, e.g. the local polynomial estimators in Ichimura and Linton (2005), but the estimated propensity score is then not necessarily bounded away from 0 and 1 in finite samples, and proper trimming is required. Trimming is not required for the SLE, because the estimated propensity score function is automatically bounded away from 0 and 1. Furthermore, one can also use the imputation estimator to estimate $\nu(l)$, e.g. Heckman, Ichimura, and Todd (1997, 1998), Heckman, Ichimura, Smith and Todd (1998), and Hahn (1998). That is, one can estimate $\nu(l)$ by

\[ \hat{\nu}(l) = \frac{1}{N} \sum_{i=1}^{N} g_l(X_i) \big( \hat{\mu}_0(X_i) - \hat{\mu}_1(X_i) \big), \]

where $\hat{\mu}_0(x)$ and $\hat{\mu}_1(x)$ are nonparametric estimators of $\mu_0(x)$ and $\mu_1(x)$ for all $x \in \mathcal{X}$. We expect that, under suitable assumptions, all the results discussed below still hold when one uses the imputation estimator.

2.2 Test Statistic and Decision Rule

The KS test statistic is defined as

\[ \hat{S}_N = \sqrt{N} \sup_{l \in \mathcal{L}} \hat{\nu}(l). \tag{4} \]

In this paper, we focus on the non-standardized version of the test, but the results developed below can be extended to the standardized version of the test as in AS.[2] All results can also be extended easily to a Cramér-von Mises type test. Given a simulated critical value $\hat{c}$, which will be defined later, the decision rule is the following:

\[ \text{Reject } H_0 \text{ if } \hat{S}_N > \hat{c}. \tag{5} \]

[2] Ideally, the standardized version of our test statistic should be defined as $\hat{S}_N = \sqrt{N} \sup_{l \in \mathcal{L}} \hat{\nu}(l)/\hat{\sigma}(l)$, where $\hat{\sigma}^2(l)$ is an estimator of the asymptotic variance of $\sqrt{N}(\hat{\nu}(l) - \nu(l))$. However, because $\hat{\sigma}^2(l)$ is not uniformly bounded away from 0, dividing $\hat{\nu}(l)$ by $\hat{\sigma}(l)$ causes problems, so we need to modify $\hat{\sigma}(l)$ and the standardized test statistic. For more details, please refer to Section 3.1 of AS.
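The construction in Sections 2.1 and 2.2 can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it assumes a scalar covariate on $[0, 1]$, replaces the supremum over $\mathcal{L}$ by a finite family of interval indicators, and fits the series logit by plain Newton-Raphson. The function names (`fit_series_logit`, `interval_instruments`, `ks_statistic`) and the demo data are hypothetical.

```python
import numpy as np

def fit_series_logit(R, d, n_iter=50):
    """Logit MLE by Newton-Raphson; R holds the power-series terms (N x K)."""
    pi = np.zeros(R.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-R @ pi))
        grad = R.T @ (d - p)
        hess = (R * (p * (1.0 - p))[:, None]).T @ R
        pi = pi + np.linalg.solve(hess, grad)
    return pi

def interval_instruments(x, r_max=8):
    """Columns are indicators of [(a-1)/r, a/r] for a = 1..r and r = 1..r_max."""
    cols = [((x >= (a - 1) / r) & (x <= a / r)).astype(float)
            for r in range(1, r_max + 1) for a in range(1, r + 1)]
    return np.column_stack(cols)

def ks_statistic(y, d, x, degree=1):
    """Return (sqrt(N) * max_l nu_hat(l), nu_hat) for scalar X on [0, 1]."""
    n = len(y)
    R = np.vander(x, degree + 1, increasing=True)      # 1, x, ..., x^degree
    p_hat = 1.0 / (1.0 + np.exp(-R @ fit_series_logit(R, d)))
    m = (1.0 - d) * y / (1.0 - p_hat) - d * y / p_hat   # IPW moment contributions
    nu_hat = interval_instruments(x).T @ m / n          # one estimate per instrument
    return np.sqrt(n) * nu_hat.max(), nu_hat

# Hypothetical data (an assumption for illustration): P(D=1|X) = 0.3 + 0.4 X, zero effect.
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
d = (rng.uniform(size=500) < 0.3 + 0.4 * x).astype(float)
y = 2.0 * (x - 0.5) + rng.standard_normal(500)
stat, nu_hat = ks_statistic(y, d, x)
```

The finite family of instruments is a design choice made for the sketch; a finer family brings the maximum closer to the supremum at a higher computational cost.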

3 Asymptotics of $\hat{\nu}(l)$ and the Test Statistic

3.1 Assumptions

In addition to the unconfoundedness assumption, we assume the following regularity conditions, which are identical to those in HIR. The first assumption summarizes the properties of $Y(0)$ and $Y(1)$.

Assumption 3.1 (Distributions of $Y(0)$ and $Y(1)$):
1. $Y(0)$ and $Y(1)$ have finite second moments.
2. $\mu_0(x)$ and $\mu_1(x)$ are continuously differentiable for all $x \in \mathcal{X}$.

We impose conditions on the distribution of $X$ and the conditional expectation of $Y(0)$ and $Y(1)$.

Assumption 3.2 (Distribution of $X$):
1. The support of the $k$-dimensional covariate $X$ is a Cartesian product of compact intervals, $\mathcal{X} = \prod_{j=1}^{k} [x_{lj}, x_{uj}]$.
2. The density of $X$, $f(x)$, is bounded and bounded away from zero on $\mathcal{X}$.

The following assumption requires smoothness of the propensity score function.

Assumption 3.3 (Propensity Score): For all $x \in \mathcal{X}$, the propensity score $p(x)$ satisfies the following conditions:
1. $p(x)$ is continuously differentiable of order $s \ge 7k$.
2. $p(x)$ is bounded away from zero and one: $0 < \underline{p} \le p(x) \le \bar{p} < 1$.

The last assumption restricts the growth rate of the number of approximating functions included in the series approximation to the propensity score function.

Assumption 3.4 (Series Estimator): The SLE of $p(x)$ uses a power series with $K = N^{\nu}$ for some $k/(4(s - k)) < \nu < 1/9$.

3.2 Asymptotics of $\hat{\nu}(l)$

Lemma 3.5 Suppose Assumptions 2.1 and 3.1-3.4 hold. Then $\sqrt{N}(\hat{\nu}(\cdot) - \nu(\cdot)) \Rightarrow \Psi(\cdot)$, where $\Rightarrow$ denotes weak convergence and $\Psi(\cdot)$ is a zero-mean Gaussian process with covariance function generated by

\[ \psi_l(W) = g_l(X) \left( \frac{(1-D)Y}{1-p(X)} - \frac{DY}{p(X)} + (D - p(X)) \left( \frac{\mu_0(X)}{1-p(X)} + \frac{\mu_1(X)}{p(X)} \right) \right) - \nu(l), \]

where $W \equiv \{Y, D, X\}$, i.e., $Cov(\Psi(l_1), \Psi(l_2)) = E[\psi_{l_1}(W) \psi_{l_2}(W)]$.

The proof of Lemma 3.5 contains two parts. First, we show that

\[ \sup_{l \in \mathcal{L}} \left| \sqrt{N}(\hat{\nu}(l) - \nu(l)) - \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \psi_l(W_i) \right| = o_p(1). \]

In the second step, we show that $\mathcal{K} = \{\psi_l(W) \mid l \in \mathcal{L}\}$ is a Vapnik-Chervonenkis (VC) class of functions, so that Lemma 3.5 follows from Donsker's Theorem, i.e. the functional central limit theorem.

3.3 Asymptotics of the Test Statistic

Define $\mathcal{L}^{o} = \{l : \nu(l) = 0\}$, which is non-empty by definition.[3] We have the following result concerning the asymptotic properties of the test statistic $\hat{S}_N$.

Proposition 3.6 Suppose that Assumptions 2.1 and 3.1-3.4 hold. Then
1. if $H_0$ is true, $\hat{S}_N \xrightarrow{D} \sup_{l \in \mathcal{L}^{o}} \Psi(l)$;
2. under any fixed alternative $H_1 : \nu(l) > 0$ for some $l \in \mathcal{L}$, $\lim_{N \to \infty} \hat{S}_N = \infty$.

The first part of Proposition 3.6 shows that the limiting null distribution of the test statistic depends only on those $\hat{\nu}(l)$ with $\nu(l) = 0$, a result that is standard in the literature. The second part shows that under any fixed alternative $H_1 : \nu(l) > 0$ for some $l \in \mathcal{L}$, the test statistic diverges to infinity, which leads to the consistency of our test, as we show later.

[3] If $l = (x, r)$ with $r = 0$, then $l \in \mathcal{L}^{o}$.
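As a quick check on the influence function in Lemma 3.5, the following short derivation shows why $\psi_l(W)$ has mean zero. It is a sketch that assumes the centering term $\nu(l)$ enters $\psi_l$ outside the factor $g_l(X)$, and it introduces the shorthand $h_l(X)$, which is not part of the paper's notation.

```latex
\begin{align*}
h_l(X) &\equiv g_l(X)\left(\frac{\mu_0(X)}{1-p(X)} + \frac{\mu_1(X)}{p(X)}\right),\\
E\big[(D - p(X))\,h_l(X)\big]
  &= E\big[h_l(X)\,E[D - p(X)\mid X]\big] = 0,\\
E[\psi_l(W)]
  &= E\left[g_l(X)\left(\frac{(1-D)Y}{1-p(X)} - \frac{DY}{p(X)}\right)\right] + 0 - \nu(l)
   = \nu(l) - \nu(l) = 0 .
\end{align*}
```

The second line uses only $E[D \mid X] = p(X)$; the third uses the definition of $\nu(l)$ in (3).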

4 Simulated Process, Generalized Moment Selection and Simulated Critical Value

As noted by McFadden (1989) and Barrett and Donald (2003), the main difficulty with Kolmogorov-Smirnov tests lies in constructing an appropriate critical value, since the limiting distribution of the test statistic depends on the underlying functions. In our case, the limiting distribution of the test statistic depends on $p(x)$, $\mu_0(x)$, $\mu_1(x)$ and $\nu(l)$. Hence, as in DH, we propose a method to simulate the stochastic process $\Psi(\cdot)$ based on the multiplier central limit theorem. We also introduce AS's GMS method. The simulated critical value for our test is constructed from the simulated process and the GMS. Finally, we discuss the properties of the simulated critical value.

4.1 Simulated Process

The stochastic process $\Psi(\cdot)$ is simulated based on the multiplier central limit theorem. Let $U_1, U_2, \ldots$ be bounded independent random variables with mean zero and variance equal to one that are independent of the sequence $W = \{W_1, W_2, \ldots\}$. For all $l \in \mathcal{L}$, we define the simulated stochastic process $\Psi^u(l)$ as

\[ \Psi^u(l) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} U_i \left( g_l(X_i) \left( \frac{(1-D_i)Y_i}{1-\hat{p}(X_i)} - \frac{D_i Y_i}{\hat{p}(X_i)} + (D_i - \hat{p}(X_i)) \left( \frac{\hat{\mu}_0(X_i)}{1-\hat{p}(X_i)} + \frac{\hat{\mu}_1(X_i)}{\hat{p}(X_i)} \right) \right) - \hat{\nu}(l) \right), \tag{6} \]

where $\hat{\mu}_0(x)$ and $\hat{\mu}_1(x)$ are the series estimators of $\mu_0(x)$ and $\mu_1(x)$:

\[ \hat{\mu}_0(x) = \left( \sum_{i=1}^{N} \frac{(1-D_i)Y_i}{1-\hat{p}(X_i)} R^K(X_i) \right)' \left( \sum_{i=1}^{N} R^K(X_i) R^K(X_i)' \right)^{-1} R^K(x), \]
\[ \hat{\mu}_1(x) = \left( \sum_{i=1}^{N} \frac{D_i Y_i}{\hat{p}(X_i)} R^K(X_i) \right)' \left( \sum_{i=1}^{N} R^K(X_i) R^K(X_i)' \right)^{-1} R^K(x). \tag{7} \]

As shown in HIR, $\hat{\mu}_0(x)$ and $\hat{\mu}_1(x)$ are consistent for $\mu_0(x)$ and $\mu_1(x)$ uniformly on $\mathcal{X}$.

Lemma 4.1 Suppose Assumptions 2.1 and 3.1-3.4 hold. Then $\Psi^u(\cdot) \Rightarrow \Psi(\cdot)$ given $W$ in probability, which is denoted by $\Psi^u(\cdot) \overset{P_w}{\Rightarrow} \Psi(\cdot)$.

The proof of Lemma 4.1 contains two steps, which are similar to those in DH. In the first step, we show that $\frac{1}{\sqrt{N}} \sum_{i=1}^{N} U_i \psi_l(W_i) \overset{P_w}{\Rightarrow} \Psi(l)$ by the multiplier central limit theorem in Van der Vaart and Wellner (1996). In the second step, we show that the estimation error from estimating $\mu_0(x)$, $\mu_1(x)$, $p(x)$ and $\nu(l)$ disappears in the limit.

4.2 Generalized Moment Selection

As in most papers in the moment inequality literature, we use the generalized moment selection method to construct the critical value. The generalized moment selection method was introduced by Andrews and Soares (2010) and AS. It is similar to the recentering method in Hansen (2005) and Donald and Hsu (2010a), and to the contact set approach in Linton, Song and Whang (2010). Under the null hypothesis, those $\hat{\nu}(l)$ with $\nu(l) < 0$ do not contribute to the limiting null distribution. The main idea of these approaches is therefore to identify those $l$ with $\nu(l) < 0$ and to remove the corresponding moments from consideration asymptotically. By doing this, one can construct a more powerful test without resorting to the LFC. For a sequence of negative numbers $b_N$, we define the generalized moment selection function, or recentering function, as

\[ \hat{\mu}(l) = \hat{\nu}(l) \cdot 1\big( \sqrt{N} \hat{\nu}(l) < b_N \big), \]

where $1(\cdot)$ is the indicator function. We impose the following condition on the sequence $b_N$, which is standard in the moment inequality literature.

Assumption 4.2 The sequence of negative numbers $b_N$ satisfies $\lim_{N \to \infty} b_N = -\infty$ and $\lim_{N \to \infty} N^{-1/2} b_N = 0$.

As we will see later, the generalized moment selection function is added to the simulated process before we take the supremum. By doing this, we can approximate the null distribution without resorting to the LFC and thereby improve the power of our test.

4.3 Simulated Critical Value

We define the simulated test statistic as $\hat{S}^u_N \equiv \sup_{l \in \mathcal{L}} \big( \Psi^u(l) + \sqrt{N} \hat{\mu}(l) \big)$ and $\tilde{c}$ as the $(1 - \alpha)$-th quantile of $\hat{S}^u_N$, i.e.,

\[ \tilde{c} = \sup \big\{ q \mid P^u\big( \hat{S}^u_N \le q \big) \le 1 - \alpha \big\}. \]

The critical value is defined as $\hat{c} = \max\{\tilde{c}, \eta\}$, where $\eta$ is an arbitrarily small positive number. Note that when $\mu_0(X) - \mu_1(X) < 0$ a.s. in $X$, we can show that both $\hat{S}_N$ and $\tilde{c}$ converge to zero. Defining the critical value $\hat{c}$ as the maximum of $\tilde{c}$ and $\eta$ therefore removes the need for a complicated proof that $\hat{S}_N$ converges to zero faster than $\tilde{c}$ in order to control the level of the test.[4] The following lemma summarizes the asymptotic properties of $\tilde{c}$ and $\hat{c}$. Define $c$ as the $(1 - \alpha)$-th quantile of $\sup_{l \in \mathcal{L}^{+}} \Psi(l)$, where $\mathcal{L}^{+} \equiv \{l : \nu(l) \ge 0\}$.[5]

Lemma 4.3 Suppose Assumptions 2.1, 3.1-3.4 and 4.2 hold and $\alpha < 1/2$. Then
1. $\tilde{c} \xrightarrow{p} c$.
2. $\hat{c} \xrightarrow{p} \max\{c, \eta\}$.

Note that Lemma 4.3 holds both under the null hypothesis and under fixed alternative hypotheses. The first part of Lemma 4.3 shows that $\tilde{c}$ converges to $c$, which implies that $\tilde{c}$ is bounded in probability. Furthermore, under the null hypothesis, $\mathcal{L}^{+} = \mathcal{L}^{o}$, so $\tilde{c}$ converges to $c$, the $(1 - \alpha)$-th quantile of the null distribution;[6] that is, we can approximate the critical value without resorting to the LFC. The second part follows from the first part and the continuity of the max operator.

[4] A similar approach is used in Section 4.1 of AS.
[5] If $\sup_{l \in \mathcal{L}^{+}} \Psi(l)$ is degenerate at 0, then $c = 0$.
[6] When $\mu_0(X) - \mu_1(X) < 0$ a.s. in $X$, we have $\mathcal{L}^{o} = \{l = (x, r) : r = 0\}$. As a result, the limit distribution of $\hat{S}_N$ is degenerate at 0. In addition, $c$ is equal to 0, so both $\hat{S}_N$ and $\tilde{c}$ converge to 0.

5 Size and Power Properties

In this section, we show that our test controls size well and is consistent against fixed alternatives. We also discuss the local power properties of our test and show that our test is asymptotically unbiased against some $N^{-1/2}$ local alternatives. Let $M(\cdot)$ denote the Lebesgue measure.

5.1 Size and Power against Fixed Alternatives

We summarize our first main result, on the size and the power against fixed alternatives of our test, in the following theorem.

Theorem 5.1 Suppose Assumptions 2.1, 3.1-3.4 and 4.2 hold and $\alpha < 1/2$. If we reject $H_0$ when $\hat{S}_N > \hat{c}$, then:
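The steps of Section 4 (multiplier draws, recentering, quantile, and the floor at $\eta$) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the per-observation term `phi` (the bracket multiplying $g_l(X_i)$ in (6)) is taken as given, the supremum runs over a finite instrument matrix `G`, `np.quantile` stands in for the quantile definition of $\tilde{c}$, and the values `eta=1e-6` and `b_n = -log log N` are hypothetical choices.

```python
import numpy as np

def simulated_critical_value(phi, G, b_n, alpha=0.05, eta=1e-6, n_rep=1000, seed=0):
    """phi: (N,) bracket term of psi_l per observation (without g_l and nu_hat).
    G:   (N, L) matrix with entries g_l(X_i).
    Returns (c_hat, c_tilde)."""
    rng = np.random.default_rng(seed)
    n = len(phi)
    nu_hat = G.T @ phi / n                            # estimated moments, one per l
    psi_hat = G * phi[:, None] - nu_hat               # estimated influence terms (N, L)
    mu_gms = nu_hat * (np.sqrt(n) * nu_hat < b_n)     # recentering function
    # Bounded multipliers with mean 0 and variance 1.
    U = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_rep, n))
    psi_u = U @ psi_hat / np.sqrt(n)                  # (n_rep, L) draws of Psi^u
    s_u = (psi_u + np.sqrt(n) * mu_gms).max(axis=1)   # simulated test statistics
    c_tilde = np.quantile(s_u, 1.0 - alpha)
    return max(c_tilde, eta), c_tilde

# Hypothetical inputs: the moment is binding for x <= 0.5 and slack for x > 0.5.
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(size=n)
G = np.column_stack([x <= 0.5, x > 0.5, np.ones(n, dtype=bool)]).astype(float)
phi = rng.standard_normal(n) - 2.0 * (x > 0.5)
c_hat, c_tilde = simulated_critical_value(phi, G, b_n=-np.log(np.log(n)))
```

In this toy input the slack moments are pushed far below zero by the recentering term, so the simulated supremum is driven by the binding instrument, which is the mechanism that avoids the least favorable configuration.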

1. if $H_0$ is true and $M(\{x : \mu_0(x) - \mu_1(x) = 0\}) > 0$, then $\lim_{\eta \to 0} \lim_{N \to \infty} P(\text{reject } H_0) = \lim_{\eta \to 0} \lim_{N \to \infty} P(\hat{S}_N > \hat{c}) = \alpha$;
2. if $H_0$ is true and $M(\{x : \mu_0(x) - \mu_1(x) = 0\}) = 0$, then $\lim_{N \to \infty} P(\text{reject } H_0) = \lim_{N \to \infty} P(\hat{S}_N > \hat{c}) = 0$;
3. under any fixed alternative $H_1 : \nu(l) > 0$ for some $l \in \mathcal{L}$, $\lim_{N \to \infty} P(\text{reject } H_0) = 1$.

The first part of Theorem 5.1 shows that our test has exact size asymptotically when the set of $x$ such that $\mu_0(x) - \mu_1(x) = 0$ is not of measure zero. In that case, there is an $l$ with $r > 0$ such that $\nu(l) = 0$ and $Var(\psi_l(W)) > 0$, so the limiting null distribution is non-degenerate and $c$ is strictly positive. This implies that if $\eta$ is small enough, $\tilde{c}$ is strictly greater than $\eta$ with probability approaching 1, and hence $\tilde{c} = \hat{c}$ with probability approaching 1. The first part then follows from the fact that, when $\eta$ is small enough, $\tilde{c}$ converges to $c$ as the sample size tends to infinity. For the second part, both $\hat{S}_N$ and $\tilde{c}$ converge to 0 in the limit by Proposition 3.6 and Lemma 4.3. As a result, for any positive $\eta$, we have $\hat{S}_N \le \eta \le \hat{c}$ with probability approaching 1, and the second part follows. The last part shows the consistency of our test; it follows because the test statistic $\hat{S}_N$ diverges to positive infinity while the critical value $\hat{c}$ is bounded in probability.

5.2 Local Asymptotic Power

We show that our test is unbiased against some $N^{-1/2}$ local alternatives. We consider a sequence $\tau_N(x)$ that converges to $\tau(x)$ uniformly in $x$, where $\tau(x) \le 0$ for all $x \in \mathcal{X}$. Define the local alternative as

\[ H_{1,N} : \tau_N(x) = \tau(x) + \frac{\delta(x)}{\sqrt{N}}, \tag{8} \]

where $\delta(x)$ is continuous in $x$. Let $\mathcal{X}^{o} \equiv \{x : \tau(x) = 0\}$. We impose the following conditions on the local alternatives we consider.

Assumption 5.2 Suppose the following conditions hold:
1. $M(\mathcal{X}^{o}) > 0$.
2. $\delta(x) \ge 0$ if $x \in \mathcal{X}^{o}$.
3. $M(\delta^{+} \cap \mathcal{X}^{o}) > 0$, where $\delta^{+} \equiv \{x : \delta(x) > 0\}$.

Define $\mathcal{L}^{o} \equiv \{l : E[\tau(X) g_l(X)] = 0\}$, $\mathcal{L}^{++} = \{l : E[\tau_N(X) g_l(X)] > 0 \text{ eventually}\}$ and $d(l) = E[\delta(X) g_l(X)]$. Under Assumption 5.2, it is not hard to show that $\mathcal{L}^{++} \subseteq \mathcal{L}^{o}$ with $M(\mathcal{L}^{++}) > 0$. In addition, $d(l) \ge 0$ when $l \in \mathcal{L}^{o}$ and $d(l) > 0$ when $l \in \mathcal{L}^{++}$. The following lemma summarizes the limiting distribution of the test statistic and the limit of the critical value under the local alternatives that satisfy Assumption 5.2.

Lemma 5.3 Suppose Assumptions 2.1, 3.1-3.4 and 4.2 hold and $\alpha < 1/2$. Under the local alternatives in (8) that satisfy Assumption 5.2,
1. $\hat{S}_N \xrightarrow{D} \sup_{l \in \mathcal{L}^{o}} (\Psi(l) + d(l))$.
2. $\hat{c} \xrightarrow{p} \max\{c, \eta\}$, where $c$ is the $(1 - \alpha)$-th quantile of $\sup_{l \in \mathcal{L}^{o}} \Psi(l)$.

The following theorem shows that our test is unbiased against the local alternatives defined in (8) that satisfy Assumption 5.2.

Theorem 5.4 Under the same assumptions as in Lemma 5.3, the asymptotic local power of our test is greater than or equal to $\alpha$ when $\eta$ tends to zero, i.e., $\lim_{\eta \to 0} \lim_{N \to \infty} P(\text{reject } H_0) \ge \alpha$.

It is well known that tests involving inequalities are unbiased only against some $N^{-1/2}$ local alternatives. Note that if the deviation from the null is allowed to be negative on $\mathcal{X}^{o}$, our test might be biased. A simple example of this can be found in Donald and Hsu (2010a).

6 Comparisons with Lee and Whang (2009)

This section compares our test with LW's. We first summarize LW's test. We show that LW's test, which is constructed on a user-chosen strict subset of $\mathcal{X}$, might be inconsistent if the violation of the null lies outside of the user-chosen subset. Furthermore, we show that under a broad set of $N^{-1/2}$ local alternatives that do not converge to the least favorable case, our test is more powerful than LW's.

6.1 Lee and Whang's Test

Define the kernel estimator of $\mu_0(x) - \mu_1(x)$ as $\hat{\tau}(x) = \hat{\mu}_0(x) - \hat{\mu}_1(x)$, where

\[ \hat{\mu}_0(x) = \frac{\sum_{\{i : D_i = 0\}} Y_i K_h(x - X_i)}{\sum_{\{i : D_i = 0\}} K_h(x - X_i)}, \qquad \hat{\mu}_1(x) = \frac{\sum_{\{i : D_i = 1\}} Y_i K_h(x - X_i)}{\sum_{\{i : D_i = 1\}} K_h(x - X_i)}. \]

Here $K_h(x - X_i) = h^{-d} K((x - X_i)/h)$, where $K(\cdot)$ is a kernel function and $h$ is the bandwidth. LW's test statistic is a one-sided version of $L_1$-type functionals of $\hat{\tau}(x)$, defined as

\[ T = \sqrt{N} \int_{\mathcal{X}} \max\{\hat{\tau}(x), 0\} \, w(x) \, dx, \]

where $w(x) \ge 0$ is a weight function with support $W_x$, which is a strict subset of $\mathcal{X}$. They require $W_x$ to be a strict subset of $\mathcal{X}$ to avoid the boundary problem of kernel estimators. In the least favorable case of the null hypothesis, LW show that

\[ \frac{T - a_N}{\sigma_N} \xrightarrow{D} N(0, 1). \tag{9} \]

The exact definitions of $a_N$ and $\sigma_N^2$ are given in the Appendix. Let $\hat{a}_N$ and $\hat{\sigma}_N^2$ be estimators of $a_N$ and $\sigma_N^2$, which are also defined in the Appendix. It is shown that the asymptotic normality of $T$ in (9) still holds with $\hat{a}_N$ and $\hat{\sigma}_N$ in place of $a_N$ and $\sigma_N$. Hence, define the standardized test statistic as

\[ \hat{S}_{LW} = \frac{T - \hat{a}_N}{\hat{\sigma}_N}, \]

and the rejection rule of LW's test is: reject $H_0$ if $\hat{S}_{LW} > z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1 - \alpha)$-th quantile of the standard normal distribution. Theorems 4.1 and 4.2 of LW show that their test controls size well and is consistent against a fixed alternative if $M(\{x \in W_x : \tau(x) > 0\}) > 0$.

6.2 Advantages of Our Test over LW's

1. The first advantage of our test over LW's is that our test is consistent against all fixed alternatives, whereas LW's test is consistent only if $M(\{x \in W_x : \tau(x) > 0\}) > 0$, because they need to restrict attention to the subset $W_x$ to avoid the boundary problem of the kernel estimator $\hat{\tau}(x)$. Therefore, LW's test might not have power when the violation of the null lies outside of $W_x$. To restore the consistency of LW's test, one can allow $W_x$ to expand to $\mathcal{X}$ as $N$ tends to infinity, but the theory for this result is not trivial and remains an open question.

2. If the violation of the null lies within $W_x$, our test and LW's are both consistent, and which test is more powerful depends on the underlying fixed alternative; that is, it is possible to show that our test is more powerful under some fixed alternatives, while LW's is more powerful under others. However, we can show that under a broad set of local alternatives, our test is more powerful than LW's. To show this, we modify our Assumption 5.2. Define $A \setminus B \equiv \{x : x \in A \text{ but } x \notin B\}$ for any two subsets $A$ and $B$, and define $C \equiv \{x \in W_x : \tau(x) = 0\}$.

Assumption 6.1 In addition to the conditions in Assumption 5.2, we assume the following conditions:
a) $M(C) > 0$.
b) $M(\delta^{+} \cap C) > 0$.
c) $M(W_x \setminus C) > 0$.

The last condition requires that the measure of $W_x \setminus C \equiv \{x \in W_x : \tau(x) < 0\}$ is strictly positive, which implies that the local alternatives do not converge to the least favorable case. The following theorem shows that LW's test has no power against the local alternatives satisfying Assumption 6.1. Note that the local alternatives defined in Assumption 6.1 are a subset of those defined in Assumption 5.2, so our test is still unbiased under Assumption 6.1. Therefore, our test is more powerful than LW's against those local alternatives satisfying Assumption 6.1.

Theorem 6.2 Under the local alternatives in (8) that satisfy Assumption 6.1, the local power of LW's test is equal to zero. That is, $\lim_{N \to \infty} P(\hat{S}_{LW} > z_{1-\alpha}) = 0$.

Theorem 6.2 shows that LW's test has no power against those local alternatives that do not converge to the least favorable case. In the proof of Theorem 6.2, we show that $\hat{S}_{LW}$ converges to negative infinity under the local alternatives defined in Assumption 6.1. Therefore, the local power of LW's test is $\lim_{N \to \infty} P(\hat{S}_{LW} > z_{1-\alpha}) = 0$. Again, our test is unbiased in this case, so our test is more powerful than LW's against a broad set of local alternatives satisfying Assumption 6.1. Note that Section 5.1 of LW proposes a more powerful test based on the contact set approach. In fact, with a suitable modification of Assumption 6.1, we can show that the same result as in Theorem 6.2 holds for LW's more powerful test.
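To make the unstandardized statistic $T$ of Section 6.1 concrete, here is a minimal sketch for a scalar covariate with a uniform weight on a user-chosen interval. It computes $T$ only; the centering and scaling terms $a_N$ and $\sigma_N$ are defined in the Appendix and are not reproduced. The compactly supported quadratic kernel, the bandwidth `h=0.2`, the function names, and the demo data (in particular the constant propensity score of 0.5) are assumptions made for illustration.

```python
import numpy as np

def kernel(u):
    """Assumed kernel: quadratic on [-0.5, 0.5], integrates to one."""
    return 1.5 * (1.0 - (2.0 * u) ** 2) * (np.abs(u) <= 0.5)

def nw_estimate(grid, x, y, h):
    """Kernel regression of y on x evaluated at the grid points."""
    w = kernel((grid[:, None] - x[None, :]) / h)   # the 1/h factor cancels in the ratio
    return (w @ y) / w.sum(axis=1)

def lw_T(y, d, x, h, lo=0.05, hi=0.95, n_grid=500):
    """sqrt(N) * integral over [lo, hi] of max(tau_hat(x), 0), by a Riemann sum."""
    grid = np.linspace(lo, hi, n_grid)
    tau_hat = (nw_estimate(grid, x[d == 0], y[d == 0], h)
               - nw_estimate(grid, x[d == 1], y[d == 1], h))
    return np.sqrt(len(y)) * np.mean(np.maximum(tau_hat, 0.0)) * (hi - lo)

# Hypothetical data: mu_0(x) - mu_1(x) = 2(x - 0.5), positive for x > 0.5.
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(size=n)
d = (rng.uniform(size=n) < 0.5).astype(float)
y = d * rng.standard_normal(n) + (1.0 - d) * (2.0 * (x - 0.5) + rng.standard_normal(n))
T = lw_T(y, d, x, h=0.2)
```

Because the weight is zero outside `[lo, hi]`, any violation of the null located only outside that interval does not enter `T`, which is the source of the consistency issue discussed above.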

7 Monte-Carlo Simulations

In this section, we conduct small-scale Monte-Carlo simulations to illustrate the finite sample performance of our test and LW's.

Example 7.1 Let the DGP be:

\[ X = U_x, \quad D = 1(U_t < X), \quad Y(0) = 2(X - 0.5) + U_0, \quad Y(1) = 2(X - 0.5) + U_1, \quad Y = D Y(1) + (1 - D) Y(0), \]

where $U_x$ and $U_t$ are uniformly distributed on $[0, 1]$ and $U_0$ and $U_1$ are standard normal. $U_x$, $U_t$, $U_0$ and $U_1$ are independent.

In Example 7.1, we have $\mu_0(X) - \mu_1(X) = 0$ a.s. in $X$, which is the least favorable case of the null hypothesis. We use this example to illustrate the size properties of our test and LW's when the null hypothesis is in the least favorable case.

Example 7.2 Let the DGP be the same as in Example 7.1 except that

\[ Y(0) = 2(X - 0.5) \cdot 1(X < 0.5) + U_0, \quad Y(1) = U_1. \]

In Example 7.2, we have $\mu_0(X) - \mu_1(X) \le 0$ a.s. in $X$, and the strict inequality holds when $X < 0.5$. We use this example to illustrate the size properties of our test and LW's when the null hypothesis is not in the least favorable case.

Example 7.3 Let the DGP be the same as in Example 7.1 except that $Y(1) = U_1$.

In Example 7.3, we have $\mu_0(X) - \mu_1(X) \le 0$ when $X \le 0.5$ and $\mu_0(X) - \mu_1(X) > 0$ when $X > 0.5$, i.e., the null hypothesis is violated. We use this example to show the power of our test and LW's against fixed alternatives.

Example 7.4 Let the DGP be the same as in Example 7.1 except that

\[ Y(0) = 2(X - 0.5) \cdot 1(X < 0.5) + N^{-1/2} + U_0, \quad Y(1) = U_1. \]
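The four designs can be generated with a few lines of code. The sketch below follows the DGPs as stated in Examples 7.1 to 7.4, reading the drift in Example 7.4 as $N^{-1/2}$; the function names `simulate` and `cate_gap` are hypothetical, and `cate_gap` simply returns the implied $\mu_0(x) - \mu_1(x)$ for each design.

```python
import numpy as np

def simulate(example, n, rng):
    """Draw (y, d, x) of size n from the design labelled '7.1', ..., '7.4'."""
    x = rng.uniform(size=n)
    d = (rng.uniform(size=n) < x).astype(float)       # D = 1(U_t < X)
    u0, u1 = rng.standard_normal(n), rng.standard_normal(n)
    base = 2.0 * (x - 0.5)
    if example == "7.1":
        y0, y1 = base + u0, base + u1
    elif example == "7.2":
        y0, y1 = base * (x < 0.5) + u0, u1
    elif example == "7.3":
        y0, y1 = base + u0, u1
    elif example == "7.4":
        y0, y1 = base * (x < 0.5) + n ** -0.5 + u0, u1
    else:
        raise ValueError(f"unknown example: {example}")
    return d * y1 + (1.0 - d) * y0, d, x

def cate_gap(example, x, n):
    """Population value of mu_0(x) - mu_1(x) implied by each design."""
    base = 2.0 * (x - 0.5)
    if example == "7.1":
        return np.zeros_like(x)
    if example == "7.2":
        return base * (x < 0.5)
    if example == "7.3":
        return base
    if example == "7.4":
        return base * (x < 0.5) + n ** -0.5
    raise ValueError(f"unknown example: {example}")

y, d, x = simulate("7.3", 200, np.random.default_rng(0))
```

Evaluating `cate_gap` on a grid makes the role of each design visible: zero everywhere in 7.1, weakly negative in 7.2, positive for $x > 0.5$ in 7.3, and a vanishing positive shift of 7.2 in 7.4.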

We use Example 7.4 to demonstrate the local power of our test and LW's test. Note that the DGP in Example 7.4 converges to the DGP in Example 7.2, which is not in the least favorable case of the null hypothesis.

Following Section 3.5 of AS, we approximate $\hat{S}_N$ by a finite number of instrument functions. The intervals we consider here are those with lengths $r^{-1}$ for $r = 1, \ldots, r_1$ where $r_1 = 8$, so we use a total of 36 intervals.[7][8] Note that the smaller the recentering parameter is, the more conservative and the less powerful our test is in finite samples. To illustrate this, we consider $b_N = -0.5 \log\log N$, $-\log\log N$ and $-2 \log\log N$.[9] When approximating the limiting process, we let the $U_i$'s be independent uniform random variables on $[-\sqrt{3}, \sqrt{3}]$, and the simulated test statistic is approximated by the same instrument functions. For each simulation we approximate the p-value[10] of our test by 1,000 repetitions. For all the examples, none of the critical values converges to zero, so we set $\eta = 0$ in these cases. We consider three different sample sizes: 300, 500 and 1,000. For $N = 300$, the propensity score function is estimated by the SLE with power series 1 and $X$. For $N = 500$ and 1,000, we use 1, $X$ and $X^2$.[11]

When implementing LW's test, we set $W_x = [0.05, 0.95]$ and use the uniform weight function with $w(x) = 1$ for all $x \in W_x$ and $w(x) = 0$ otherwise.[12] The kernel function is $K(u) = \frac{3}{2}(1 - (2u)^2) \cdot 1(|u| \le 0.5)$ with bandwidth $h = C_h \hat{s}_X N^{-2/7}$, where $C_h \in \{2, 3\}$ and $\hat{s}_X$ is the sample standard deviation of $X$, as suggested by LW. We use a Riemann sum to approximate the integrals in the expressions of $T$, $\hat{a}_N$ and $\hat{\sigma}_N^2$ based on 500 gridpoints evenly distributed in $[0.05, 0.95]$.[13]

The rejection rates are calculated based on 5,000 simulations and are summarized in Tables 1-4. The first observation is that the smaller $b_N$ is, the smaller the rejection rates are in all cases.
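Two pieces of arithmetic behind this simulation design are easy to verify by hand. The display below counts the interval instruments for $r_1 = 8$ and checks that a uniform multiplier meets the mean-zero, unit-variance requirement of Section 4.1, assuming the multiplier support is read as $[-\sqrt{3}, \sqrt{3}]$.

```latex
\[
\sum_{r=1}^{8} r = \frac{8 \cdot 9}{2} = 36,
\qquad
E[U_i] = \frac{-\sqrt{3} + \sqrt{3}}{2} = 0,
\qquad
\operatorname{Var}(U_i) = \frac{\big(\sqrt{3} - (-\sqrt{3})\big)^2}{12} = \frac{12}{12} = 1 .
\]
```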
Table 1 presents the size of our test and LW's for Example 7.1, where the null hypothesis is in the least favorable case. Both tests are over-sized, but the size distortion of our test, which is between 0.44% and 1.34%, is smaller than theirs, which is between 1.18% and 3.1%. Also, when the sample size increases, the size of both tests gets closer to the 5% level. This suggests that it will be interesting to further investigate the higher order properties of both tests; however, this is beyond the scope of this paper and we leave it for future research.

The simulation results of Example 7.2 are summarized in Table 2. In Table 2, we find that our test is much less conservative than LW's. Actually, by the same argument as for Theorem 6.2, we can show that the size of LW's test is equal to zero asymptotically when the null hypothesis is not in the least favorable case. Simulation results in Table 2 support this result, as the size of LW's test with $C_h = 2$ and 3 decreases to zero when the sample size increases. On the other hand, the size of our test with different $b_N$'s all increases to the 5% level when the sample size increases. This confirms the first part of our Theorem 5.1.

Table 3 summarizes the simulation results of Example 7.3, which demonstrate the power of our test and LW's against fixed alternatives. In this case, our test is more powerful than LW's since the rejection rates of our test with different choices of $b_N$ are all larger than theirs for all sample sizes we consider. Note that when $N = 300$, the difference in power between ours and theirs can be as high as 34.5%. A couple of reasons account for this. First, LW's test is constructed based on the LFC, and it is well-known that tests based on the LFC are in general more conservative and less powerful. Our test, which does not rely on the LFC, can be more powerful in this case. Second, LW's test is restricted to $W_x = [0.05, 0.95]$, so LW's test does not utilize the information of the violation on $x \in [0.95, 1]$. However, our test can utilize this information, so our test can be more powerful than theirs in this example.

Table 4 presents simulation results of Example 7.4. We use this example to illustrate Theorem 5.4 and Theorem 6.2 concerning the asymptotic local power of our test and LW's. Note that the DGP in Example 7.4 satisfies Assumption 6.1. Section 5 and Section 6 show that the asymptotic local power of our test will be greater than 5%, but that of LW's is zero. The simulation results support this theoretical finding since the simulated rejection rates of our test are between 4.62% and 5.66% with $N = 300$, which increase to between 7.84% and 8.24% when $N$ is 1,000, and those of LW's are 2.22% and 2.28% with $N = 300$ and decrease to 0.68% and 0.76% respectively when $N$ is 1,000.

[7] For example, if $r = 3$, we use the instrument functions $1(0 \le X \le 1/3)$, $1(1/3 \le X \le 2/3)$ and $1(2/3 \le X \le 1)$. Therefore, if $r_1 = 8$, we have a total of 36 instrument functions.
[8] Our simulation results are not sensitive to the choice of $r_1$.
[9] The rejection rates of our test with $b_N = -0.5 \log\log N$ will be the largest among the three $b_N$'s we consider.
[10] Define the p-value of our test as $\hat p(\hat S_N) = P\big(\max\{\sup_{\ell \in \mathcal{L}} (\Psi^u(\ell) + \sqrt{N}\hat\mu(\ell)),\, \eta\} > \hat S_N\big)$. For $\alpha_0 < 1/2$, we reject $H_0$ when $\hat p(\hat S_N) < \alpha_0$. Also, the critical value method and the p-value method are equivalent.
[11] The simulation results are not sensitive to the choices of the power series.
[12] In the simulations, we do not consider the estimated weight functions, $\hat p(x)(1 - \hat p(x))\hat f^2(x)$ and $\hat f(x)$, as suggested by LW, because in this case one should also consider the estimation effects of the estimated weight function when estimating $\hat a_N$ and $\hat\sigma^2_N$. However, this is not discussed in LW.
[13] The simulation results are not sensitive to the numbers of gridpoints we choose among 300, 500 and 1,
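The simulated p-value used in this section can be sketched as below. This is a hedged illustration, not the implementation behind the tables: the input `psi_hat` is a hypothetical $N \times L$ matrix holding the estimated influence values for each observation and instrument function (its construction, which needs estimates of $\mu_0$, $\mu_1$ and the propensity score, is not shown), and the recentering rule is assumed to be $\hat\mu(\ell) = \hat\nu(\ell)\,1(\sqrt{N}\hat\nu(\ell) < b_N)$ with $b_N$ a negative threshold.

```python
import numpy as np


def gms_pvalue(psi_hat, nu_hat, b_n, eta=0.0, reps=1000, rng=None):
    """Multiplier-simulation p-value with GMS recentering (illustrative sketch).

    psi_hat : (N, L) estimated influence values (hypothetical input)
    nu_hat  : (L,) estimated unconditional moments
    b_n     : negative recentering threshold (assumption of this sketch)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = psi_hat.shape[0]
    root_n = np.sqrt(n)
    stat = root_n * nu_hat.max()
    # Keep nu_hat only where the moment is clearly slack; otherwise recenter to 0
    mu_hat = np.where(root_n * nu_hat < b_n, nu_hat, 0.0)
    # Multipliers: uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1
    u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(reps, n))
    psi_u = u @ psi_hat / root_n                       # (reps, L) simulated process
    sim = np.maximum((psi_u + root_n * mu_hat).max(axis=1), eta)
    return float(np.mean(sim > stat)), stat
```

The null hypothesis is then rejected when the returned p-value falls below the chosen significance level.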

8 Extensions

We present several extensions of our test in this section. First, we extend our test to the conditional stochastic dominance relation between the treatment group and the control group. Second, we discuss how to test the null hypothesis that the conditional treatment effect is non-negative conditional on $X_a$, which is a strict subset of $X$. In particular, we focus on the cases where the unconfoundedness assumption does not hold when we condition on $X_a$ only. Third, we extend our tests to the cases where the unconfoundedness assumption does not hold, but there is a binary instrumental variable available as in the traditional local average treatment effect setup.

8.1 Conditional Stochastic Dominance Treatment Effects

We can extend our method to test the conditional stochastic dominance relation between treatment and control groups, which is also considered in LW. To do so, we first replace $Y$ with $1(Y \le y)$ for all $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the common compact support of $Y(1)$ and $Y(0)$. To be more specific, let $F_1(y|x)$ and $F_0(y|x)$ denote the conditional CDFs of $Y(1)$ and $Y(0)$. The null hypothesis regarding conditional stochastic dominance treatment effects is defined as
$$H_0^{sd}: F_1(y|X) - F_0(y|X) \le 0 \quad \forall\, y \in \mathcal{Y} \text{ and a.s. in } X,$$
which is equivalent to
$$H_0^{sd}: \nu(\ell, y) \equiv E\left[ g_\ell(X) \left( \frac{D\, 1(Y \le y)}{p(X)} - \frac{(1 - D)\, 1(Y \le y)}{1 - p(X)} \right) \right] \le 0, \quad \forall\, \ell \in \mathcal{L} \text{ and } y \in \mathcal{Y}. \quad (10)$$
We estimate $\nu(\ell, y)$ by
$$\hat\nu(\ell, y) = \frac{1}{N} \sum_{i=1}^{N} g_\ell(X_i) \left( \frac{D_i\, 1(Y_i \le y)}{\hat p(X_i)} - \frac{(1 - D_i)\, 1(Y_i \le y)}{1 - \hat p(X_i)} \right),$$
and $\sqrt{N}(\hat\nu(\ell, y) - \nu(\ell, y))$ will converge weakly to a mean zero Gaussian process $\Psi(\ell, y)$ with covariance kernel generated by
$$\psi_{\ell,y}(W) = g_\ell(X) \left( \frac{D\, 1(Y \le y)}{p(X)} - \frac{(1 - D)\, 1(Y \le y)}{1 - p(X)} - (D - p(X)) \left( \frac{F_1(y|X)}{p(X)} + \frac{F_0(y|X)}{1 - p(X)} \right) \right) - \nu(\ell, y).$$
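As a rough illustration of the estimator of $\nu(\ell, y)$ described above, the following sketch evaluates it on a finite grid of thresholds. The instrument matrix `g`, the propensity scores `pscore` and the grid `y_grid` are hypothetical inputs supplied by the user; nothing here is taken from the paper's own code.

```python
import numpy as np


def sd_moments(g, d, y, pscore, y_grid):
    """nu_hat[l, m]: IPW moment for instrument column l and threshold y_grid[m]."""
    ind = (y[:, None] <= y_grid[None, :]).astype(float)   # 1(Y_i <= y), shape (N, M)
    w = d / pscore - (1.0 - d) / (1.0 - pscore)           # treated minus control weights
    return g.T @ (w[:, None] * ind) / len(y)              # shape (L, M)


def sd_statistic(g, d, y, pscore, y_grid):
    """sqrt(N) times the maximum of nu_hat over instruments and thresholds."""
    return np.sqrt(len(y)) * sd_moments(g, d, y, pscore, y_grid).max()
```

Taking the maximum over a finite grid only approximates the supremum over $\mathcal{Y}$; a finer grid trades computation for accuracy.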

Given that $\hat F_0(y|x)$ and $\hat F_1(y|x)$ are monotonically increasing and uniformly consistent estimators[14] for $F_0(y|x)$ and $F_1(y|x)$ as in DH, $\Psi(\ell, y)$ can be approximated well by $\Psi^u(\ell, y)$, which is
$$\Psi^u(\ell, y) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} U_i \left[ g_\ell(X_i) \left( \frac{D_i\, 1(Y_i \le y)}{\hat p(X_i)} - \frac{(1 - D_i)\, 1(Y_i \le y)}{1 - \hat p(X_i)} - (D_i - \hat p(X_i)) \left( \frac{\hat F_1(y|X_i)}{\hat p(X_i)} + \frac{\hat F_0(y|X_i)}{1 - \hat p(X_i)} \right) \right) - \hat\nu(\ell, y) \right]. \quad (11)$$
The test statistic is defined as
$$\hat S_N^{sd} = \sqrt{N} \sup_{\ell \in \mathcal{L},\, y \in \mathcal{Y}} \hat\nu(\ell, y).$$
Given the generalized moment selection function $\hat\mu(\ell, y) = \hat\nu(\ell, y) \cdot 1(\sqrt{N}\hat\nu(\ell, y) < b_N)$, the critical value is $\hat c^{sd} = \max\{\tilde c^{sd}, \eta\}$, where $\tilde c^{sd}$ is the $(1 - \alpha)$-th quantile of $\sup_{\ell \in \mathcal{L},\, y \in \mathcal{Y}} (\Psi^u(\ell, y) + \sqrt{N}\hat\mu(\ell, y))$. The rejection rule is: reject $H_0^{sd}$ if $\hat S_N^{sd} > \hat c^{sd}$.

Under the same conditions as in DH, the test for conditional stochastic dominance treatment effects shares the same properties as our test regarding the conditional average treatment effect. The advantages of our method over LW's in this case are similar to those in the conditional average treatment effect case. Hence, we omit the formal presentation of these results.

The extension of our test to higher order stochastic dominance relations is straightforward. For example, for $j$-th order stochastic dominance with $j \ge 2$, we just need to replace $1(Y \le y)$ with $1(Y \le y)(y - Y)^{j-1}/(j-1)!$.

[14] $\hat F_0(y|x)$ is monotonically increasing if $\hat F_0(y_1|x) \le \hat F_0(y_2|x)$ for all $y_1 \le y_2$ and for all $x \in \mathcal{X}$. $\hat F_0(y|x)$ is uniformly consistent for $F_0(y|x)$ if $\sup_{y \in \mathcal{Y},\, x \in \mathcal{X}} |\hat F_0(y|x) - F_0(y|x)| = o_p(1)$.

8.2 Conditional on a Strict Subset $X_a$

We extend our results to the case when the conditioning set is $X_a$, a strict subset of $X$, and the unconfoundedness assumption does not hold if we only condition on $X_a$.[15] In the treatment effect literature, most papers focus on either the treatment effects over the whole population or the conditional treatment effects conditional on the whole set of covariates $X$ such that the unconfoundedness assumption holds. For a researcher, requiring a policy to have a uniformly positive effect on all subgroups defined by $X$ might be too strict sometimes. Our test is useful when the researcher is interested in a policy that has a uniformly positive effect on all subgroups defined by $X_a$. The null hypothesis is
$$H_0^a: E[Y(0) - Y(1) \mid X_a] \le 0 \quad \text{a.s. in } X_a. \quad (12)$$
Define $\ell_a = (x_a, r)$ and $\mathcal{L}_a = \mathcal{X}_a \times [0, \bar r]$, where $\mathcal{X}_a$ is the support of $X_a$ and $\bar r > 0$. The set of instrument functions is defined as
$$\mathcal{G}^a_{cube} = \left\{ g_{\ell_a}(X_a) = 1(X_a \in C_{\ell_a}) : C_{\ell_a} \in \mathcal{C}^a_{cube} \right\}, \quad \mathcal{C}^a_{cube} = \left\{ C_{\ell_a} = \times_{j=1}^{k_a} [\ell_j - r, \ell_j + r] : \ell_a \in \mathcal{L}_a \right\},$$
where $k_a$ denotes the dimension of $X_a$. As a result, the null hypothesis defined in (12) is equivalent to
$$H_0^a: \nu(\ell_a) \equiv E\left[ g_{\ell_a}(X_a) \left( \frac{(1 - D)Y}{1 - p(X)} - \frac{DY}{p(X)} \right) \right] \le 0, \quad \text{for all } \ell_a \in \mathcal{L}_a.$$
Hence, by replacing $g_\ell(X)$ with $g_{\ell_a}(X_a)$, all results of our test can be easily extended to this case. However, it is not trivial to extend LW's test to this case. The main reason is that $\hat\tau(x_a)$ defined as
$$\hat\tau(x_a) \equiv \frac{\sum_{i=1}^{N} (1 - D_i) Y_i K_h(x_a - X_{ai})}{\sum_{i=1}^{N} (1 - D_i) K_h(x_a - X_{ai})} - \frac{\sum_{i=1}^{N} D_i Y_i K_h(x_a - X_{ai})}{\sum_{i=1}^{N} D_i K_h(x_a - X_{ai})}$$
is not a consistent estimator for $\tau(x_a) = E[Y(0) - Y(1) \mid X_a = x_a]$, since the treatment status is not unconfounded conditional on $X_a$ only. For LW's test to work, a two-step estimator for $\tau(x_a)$ as in Hsu and Lieli (2011) is needed, and the extension of the theory of LW's test to this case is not a trivial exercise.

[15] If the unconfoundedness assumption holds when we only condition on $X_a$, then the previous methods still work and the theory is valid when we replace $X$ with $X_a$.

8.3 Tests without Unconfoundedness

Finally, like LW's test, our method can be applied to the cases where the unconfoundedness assumption does not hold, but there is a binary instrument $Z$ available. Under the conditional local average treatment effect (LATE) setup as in Abadie, Angrist and Imbens (2002) and Abadie (2003), the LATE is defined as $LATE(x) \equiv E[Y(1) - Y(0) \mid X = x, C]$, where $C$ denotes the group of compliers, and the null hypothesis of interest is defined as
$$H_0^{late}: LATE(X) \ge 0 \quad \text{a.s. in } X. \quad (13)$$

It is well-known that $LATE(x)$ is identified by
$$LATE(x) = E\left[ \frac{(1 - Z)Y}{1 - q(X)} - \frac{ZY}{q(X)} \,\Big|\, X = x \right] \Big/ E\left[ \frac{(1 - Z)D}{1 - q(X)} - \frac{ZD}{q(X)} \,\Big|\, X = x \right],$$
where $q(x) = P(Z = 1 \mid X = x)$. Given that the denominator, as written, is assumed to be strictly negative a.s. in $X$ (that is, the instrument strictly raises the probability of treatment), the null hypothesis in (13) is equivalent to
$$H_0^{late}: E\left[ \frac{(1 - Z)Y}{1 - q(X)} - \frac{ZY}{q(X)} \,\Big|\, X \right] \le 0, \quad \text{a.s. in } X, \quad (14)$$
which is virtually equivalent to (2) after we replace $D$ and $p(x)$ with $Z$ and $q(x)$ respectively. In other words, if we treat $Z$ as the primary treatment status, then the test for (14) is equivalent to the test for (2). A similar argument applies to the stochastic dominance cases and to the cases where the conditioning set $X_a$ is a strict subset of $X$.

9 Conclusion

In this paper, we propose a KS test for the null hypothesis that the conditional average treatment effect is uniformly non-negative over all subgroups defined by the covariates. Our test can control the size well asymptotically, is consistent against any fixed alternative and is unbiased against some local alternatives. We compare our test with LW's, which may be inconsistent in some cases, and our test is more powerful than theirs against some local alternatives. Monte-Carlo simulations confirm our theoretical findings. Several extensions are presented as well.

For future study, it would be interesting to extend our method to test the null hypotheses that, for a fixed quantile index or for a continuum of quantile indexes, the conditional quantile treatment effect is uniformly non-negative over all subgroups defined by the covariates, as in Section 5.2 of LW. To implement this, one would need to use the imputation method with an estimator for the conditional quantile functions as in Section 5.2 of LW, because it is difficult to define these null hypotheses in the same way as (2) in this paper.

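Appendix

Let $C$ be a generic constant which varies in different cases. All limits are taken as $N \to \infty$.

Proof of Lemma 3.5: The proof is similar to the proof of Lemma 3.6 of DH. First, using the same argument as in the addendum of HIR, we have
$$\sup_{\ell \in \mathcal{L}} \left| \sqrt{N}(\hat\nu(\ell) - \nu(\ell)) - \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \psi_\ell(W_i) \right| = o_p(1).$$
Second, given that $\mathcal{C}_{cube}$ is a VC class of sets, $\mathcal{G}_{cube}$ is a VC class of functions. By Lemma 9.9 in Kosorok (2008), for any measurable function $f(W)$, $\{g_\ell(X) f(W) : \ell \in \mathcal{L}\}$ is a VC class of functions, which implies that $\mathcal{K} = \{\psi_\ell(W) : \ell \in \mathcal{L}\}$ is a VC class of functions, where
$$\psi_\ell(W) = g_\ell(X) \left( \frac{(1 - D)Y}{1 - p(X)} - \frac{DY}{p(X)} + (D - p(X)) \left( \frac{\mu_0(X)}{1 - p(X)} + \frac{\mu_1(X)}{p(X)} \right) \right) - \nu(\ell).$$
Hence, Lemma 3.5 follows from the functional central limit theorem.

Proof of Proposition 3.6: The proof of Proposition 3.6 is similar to that of Proposition 1 of Barrett and Donald (2003). Using the same argument with $\Psi(\cdot)$ in the place of $T_j(\cdot)$ and $\ell$ in the place of $z$, we can show that for any $c > 0$
$$\limsup P(\hat S_N \le c) \le P\Big( \sup_{\ell \in \mathcal{L}} \Psi(\ell) \le c \Big)$$
and for arbitrary $\epsilon > 0$,
$$\liminf P(\hat S_N \le c) \ge P\Big( \sup_{\ell \in \mathcal{L}} \Psi(\ell) \le c - 2\epsilon \Big).$$
Together, we have
$$\lim P(\hat S_N \le c) = P\Big( \sup_{\ell \in \mathcal{L}} \Psi(\ell) \le c \Big). \quad (15)$$
If the CDF of $\sup_{\ell \in \mathcal{L}} \Psi(\ell)$ is discontinuous at 0, then Equation (15) implies that $\hat S_N \xrightarrow{D} \sup_{\ell \in \mathcal{L}} \Psi(\ell)$. If the CDF of $\sup_{\ell \in \mathcal{L}} \Psi(\ell)$ is continuous at 0, then following the same argument as in the proof of Lemma 1 of Donald and Hsu (2010a), we can show that $\lim P(\hat S_N \le 0) = P(\sup_{\ell \in \mathcal{L}} \Psi(\ell) \le 0) = 0$. This implies that $\hat S_N \xrightarrow{D} \sup_{\ell \in \mathcal{L}} \Psi(\ell)$.

To show the second part, under the fixed alternative $H_1: \nu(\ell) > 0$ for some $\ell \in \mathcal{L}$, there is some $\ell^*$ such that $\nu(\ell^*) = \xi > 0$. Note that $\hat\nu(\ell^*) \xrightarrow{p} \xi$, and this implies that $\hat S_N \ge \sqrt{N}\hat\nu(\ell^*) \to \infty$ in probability.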

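This completes our proof.

Proof of Lemma 4.1: Rewrite
$$\begin{aligned}
\Psi^u(\ell) ={}& \sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} \psi_\ell(W_i) + \sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} g_\ell(X_i) \left( \frac{D_i Y_i}{p(X_i)} - \frac{D_i Y_i}{\hat p(X_i)} \right) \\
&+ \sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} g_\ell(X_i) \left( \frac{(1 - D_i) Y_i}{1 - \hat p(X_i)} - \frac{(1 - D_i) Y_i}{1 - p(X_i)} \right) \\
&+ \sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} g_\ell(X_i) (p(X_i) - \hat p(X_i)) \left( \frac{\hat\mu_0(X_i)}{1 - \hat p(X_i)} + \frac{\hat\mu_1(X_i)}{\hat p(X_i)} \right) \\
&+ \sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} g_\ell(X_i) (D_i - p(X_i)) \left( \frac{\hat\mu_0(X_i)}{1 - \hat p(X_i)} + \frac{\hat\mu_1(X_i)}{\hat p(X_i)} - \frac{\mu_0(X_i)}{1 - p(X_i)} - \frac{\mu_1(X_i)}{p(X_i)} \right) \\
&- \sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} (\hat\nu(\ell) - \nu(\ell)). \quad (16)
\end{aligned}$$
Note that by the multiplier central limit theorem (a corollary in Van der Vaart and Wellner, 1996), we have
$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} U_i \psi_\ell(W_i) \xrightarrow{P_w} \Psi(\ell).$$
The only thing left is to show that all the remaining terms vanish in the limit. We first look at
$$\chi_N(\ell \mid \mathbf{W}) = \sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} g_\ell(X_i) \left( \frac{D_i Y_i}{p(X_i)} - \frac{D_i Y_i}{\hat p(X_i)} \right).$$
We want to show that $\chi_N(\ell \mid \mathbf{W})$ converges weakly to a zero process conditional on the sample path $\mathbf{W}$ with probability one, by showing that conditions (i)-(v) of Theorem 10.6 of Pollard (1990) hold with probability one. Conditions (ii)-(v) can be verified easily.

To check (i), we define some notation first. Consider $1(X_{ji} \le x_j)$, where $x_j \in \mathcal{X}_j$, $X_j$ denotes the $j$-th element of $X$ and $\mathcal{X}_j$ is the compact support of $X_j$. Let $X^-_N(x_j) = (1(X_{j1} \le x_j), \ldots, 1(X_{jN} \le x_j))$ and $\mathcal{G}^-_{jN} = \{X^-_N(x_j) : x_j \in \mathcal{X}_j\}$. Let $G_N = (1, \ldots, 1)$, which is the envelope of $\mathcal{G}^-_{jN}$. For any two vectors $a = (a_1, \ldots, a_N)$ and $b = (b_1, \ldots, b_N)$ in $R^N$, define the pointwise product as $a \odot b = (a_1 b_1, \ldots, a_N b_N)$. Let $\Theta_N = (\theta_1, \ldots, \theta_N)$ be a vector of non-negative weights and $\Theta_N \odot \mathcal{G}^-_{jN} = \{\Theta_N \odot X^-_N(x_j) : x_j \in \mathcal{X}_j\}$. The packing number $D(\epsilon, T_0)$ for a subset $T_0$ of a metric space with metric $d$ is defined as the largest $m$ for which there exist points $t_1, \ldots, t_m$ in $T_0$ with $d(t_i, t_j) > \epsilon$ for $i \ne j$. Define the metric $d$ as the $\ell_1$ norm on $R^N$, which is defined as $\|(v_1, \ldots, v_N)\|_1 = \sum_{i=1}^{N} |v_i|$, so that $d(x_{j1}, x_{j2}) = \sum_{i=1}^{N} |1(X_{ji} \le x_{j1}) - 1(X_{ji} \le x_{j2})|$. We also have, for any $x_{j1} \le x_{j2} \le x_{j3}$, $d(x_{j3}, x_{j1}) = d(x_{j3}, x_{j2}) + d(x_{j2}, x_{j1})$. Hence we can show that
$$D_1(\epsilon \|\Theta_N \odot G_N\|_1, \Theta_N \odot \mathcal{G}^-_{jN}) \le 1/\epsilon + 1$$
for all $\Theta_N$ and for all sample paths of $X_j$. Similarly, define $X^+_N(x_j) = (1(X_{j1} \ge x_j), \ldots, 1(X_{jN} \ge x_j))$ and $\mathcal{G}^+_{jN} = \{X^+_N(x_j) : x_j \in \mathcal{X}_j\}$, and we have
$$D_1(\epsilon \|\Theta_N \odot G_N\|_1, \Theta_N \odot \mathcal{G}^+_{jN}) \le 1/\epsilon + 1$$
for all $\Theta_N$. Define $\mathcal{G}_{jN} = \{X^+_N(x_{j1}) \odot X^-_N(x_{j2}) : x_{j1}, x_{j2} \in \mathcal{X}_j\}$. Let $\|(v_1, \ldots, v_N)\|_2 = (\sum_{i=1}^{N} v_i^2)^{1/2}$, which is the $\ell_2$ norm on $R^N$. Let $D_1$ and $D_2$ be the $\ell_1$ and $\ell_2$ packing numbers respectively. Then by Lemma 5.3 of Pollard (1990) and the relation between the packing number and the covering number, we have
$$D_2(\epsilon \|\Theta_N \odot G_N\|_2, \Theta_N \odot \mathcal{G}_{jN}) \le C \epsilon^{-4},$$
where $C$ is a positive number not depending on the realization of the sample path of $X_j$. Define $\mathcal{G}_N = \mathcal{G}_{1N} \odot \cdots \odot \mathcal{G}_{kN}$. Finally, by Equation (5.2) of Pollard (1990), we have
$$D_2(\epsilon \|\Theta_N \odot G_N\|_2, \Theta_N \odot \mathcal{G}_N) \le C \epsilon^{-4k} \quad (17)$$
for some positive number $C$. Given $\mathbf{W}$ and $\omega_u \in \Omega_u$, let
$$F_{Ni}(\omega_u) = M\, |Y_i|\, |U_i(\omega_u)|, \quad F_N(\omega_u) = (F_{N1}(\omega_u), \ldots, F_{NN}(\omega_u)),$$
$$f_{Ni}(U_i(\omega_u), \ell \mid \mathbf{W}) = \frac{1}{\sqrt{N}} U_i(\omega_u)\, g_\ell(X_i) \left( \frac{1}{p(X_i)} - \frac{1}{\hat p(X_i)} \right) D_i Y_i,$$
$$f_N(\ell, \omega_u \mid \mathbf{W}) = \big( f_{N1}(U_1(\omega_u), \ell \mid \mathbf{W}), \ldots, f_{NN}(U_N(\omega_u), \ell \mid \mathbf{W}) \big), \quad \mathcal{F}_{N\omega_u} = \{ f_N(\ell, \omega_u \mid \mathbf{W}) : \ell \in \mathcal{L} \}.$$
Equation (17) implies that
$$D_2(\epsilon \|\Theta_N \odot F_N(\omega_u)\|_2, \Theta_N \odot \mathcal{F}_{N\omega_u}) \le C \epsilon^{-4k} \equiv \beta(\epsilon),$$
since $\mathcal{G}_{cube}$ is a subset of $\mathcal{G} = \{\prod_{j=1}^{k} 1(x_j \ge x_{j1})\, 1(x_j \le x_{j2}) : x_{j1}, x_{j2} \in \mathcal{X}_j\}$. Since $\int_0^1 \sqrt{\log \beta(t)}\, dt < \infty$, this is sufficient to show (i).

We can show that $H(\ell_1, \ell_2) = \lim_{N \to \infty} E[\chi_N(\ell_1 \mid \mathbf{W}) \chi_N(\ell_2 \mid \mathbf{W}) \mid \mathbf{W}] = 0$ for all $\ell_1$ and $\ell_2$ in $\mathcal{L}$, conditional on the sample path $\mathbf{W}$ with probability 1. Given this, it follows that, conditional on the sample path $\mathbf{W}$ with probability 1, $\chi_N(\ell \mid \mathbf{W})$ converges to a mean zero Gaussian process with covariance kernel $H(\ell_1, \ell_2) = 0$, which is a zero process. We have $\chi_N(\ell \mid \mathbf{W}) \xrightarrow{P_w} 0$. Similarly, we can show that
$$\sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} g_\ell(X_i) \left( \frac{(1 - D_i) Y_i}{1 - \hat p(X_i)} - \frac{(1 - D_i) Y_i}{1 - p(X_i)} \right) \xrightarrow{P_w} 0,$$
$$\sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} g_\ell(X_i) (p(X_i) - \hat p(X_i)) \left( \frac{\hat\mu_0(X_i)}{1 - \hat p(X_i)} + \frac{\hat\mu_1(X_i)}{\hat p(X_i)} \right) \xrightarrow{P_w} 0,$$
$$\sum_{i=1}^{N} \frac{U_i}{\sqrt{N}} g_\ell(X_i) (D_i - p(X_i)) \left( \frac{\hat\mu_0(X_i)}{1 - \hat p(X_i)} + \frac{\hat\mu_1(X_i)}{\hat p(X_i)} - \frac{\mu_0(X_i)}{1 - p(X_i)} - \frac{\mu_1(X_i)}{p(X_i)} \right) \xrightarrow{P_w} 0.$$
For the last term in (16),
$$P\left( \sup_{\ell \in \mathcal{L}} \left| (\hat\nu(\ell) - \nu(\ell)) \frac{1}{\sqrt{N}} \sum_{i=1}^{N} U_i \right| > \epsilon \right) \le \frac{E\left[ \sup_{\ell \in \mathcal{L}} |\hat\nu(\ell) - \nu(\ell)|^2 \right]}{\epsilon^2} \to 0.$$
The first inequality follows from Chebyshev's inequality and $E[(\sum_{i=1}^{N} U_i / \sqrt{N})^2] = 1$. By the dominated convergence theorem, we also have
$$E\left[ \sup_{\ell \in \mathcal{L}} |\hat\nu(\ell) - \nu(\ell)|^2 \right] \to 0.$$
This completes the proof of Lemma 4.1.

Proof of Lemma 4.3: To show Lemma 4.3, we first use the same argument as in the proof of Lemma 3.5 of Donald and Hsu (2010a) to show that, conditional on the sample path $\mathbf{W}$ with probability 1,
$$S^u \equiv \sup_{\ell \in \mathcal{L}} (\Psi^u(\ell) + \sqrt{N}\hat\mu(\ell)) \xrightarrow{D} \sup_{\ell \in \mathcal{L}^+} \Psi(\ell) \equiv S. \quad (18)$$
For the case where $S$ is degenerate at 0, it is obvious that $\tilde c$ will converge to 0, which is equal to $c$. In the other case, we assume that there is $\ell \in \mathcal{L}^+$ such that $Var(\Psi(\ell)) > 0$, so that $S$ is not degenerate. It is true that $P(S \le 0) \le 1/2$, so when $\alpha < 1/2$, the $(1 - \alpha)$-th quantile of $S$, $c$, is strictly positive. On the other hand, by Tsirel'son (1975), the distribution function of $S$, $F_S(z)$, is continuous when $z > 0$. These are sufficient to show that $\tilde c$, the $(1 - \alpha)$-th quantile of $S^u$, will converge to $c$ in probability. This completes the first part. For the second part, it is true that $\hat c = \max\{\tilde c, \eta\} \xrightarrow{p} \max\{c, \eta\}$ since $\max\{a, b\}$ is a continuous function.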


More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Bayesian and frequentist inequality tests

Bayesian and frequentist inequality tests Bayesian and frequentist inequality tests David M. Kaplan Longhao Zhuo Department of Economics, University of Missouri July 1, 2016 Abstract Bayesian and frequentist criteria are fundamentally different,

More information

Incorporating Covariates in the Measurement of Welfare and Inequality: Methods and Applications

Incorporating Covariates in the Measurement of Welfare and Inequality: Methods and Applications Incorporating Covariates in the Measurement of Welfare and Inequality: Methods and Applications Stephen G. Donald, UT - Austin Yu-Chin Hsu, UT Austin (soon to be U Missouri) Garry Barrett, UNSW (soon to

More information

Implementing Matching Estimators for. Average Treatment Effects in STATA. Guido W. Imbens - Harvard University Stata User Group Meeting, Boston

Implementing Matching Estimators for. Average Treatment Effects in STATA. Guido W. Imbens - Harvard University Stata User Group Meeting, Boston Implementing Matching Estimators for Average Treatment Effects in STATA Guido W. Imbens - Harvard University Stata User Group Meeting, Boston July 26th, 2006 General Motivation Estimation of average effect

More information

14.30 Introduction to Statistical Methods in Economics Spring 2009

14.30 Introduction to Statistical Methods in Economics Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Supplementary Appendix: Difference-in-Differences with. Multiple Time Periods and an Application on the Minimum. Wage and Employment

Supplementary Appendix: Difference-in-Differences with. Multiple Time Periods and an Application on the Minimum. Wage and Employment Supplementary Appendix: Difference-in-Differences with Multiple Time Periods and an Application on the Minimum Wage and mployment Brantly Callaway Pedro H. C. Sant Anna August 3, 208 This supplementary

More information

Appendix B for The Evolution of Strategic Sophistication (Intended for Online Publication)

Appendix B for The Evolution of Strategic Sophistication (Intended for Online Publication) Appendix B for The Evolution of Strategic Sophistication (Intended for Online Publication) Nikolaus Robalino and Arthur Robson Appendix B: Proof of Theorem 2 This appendix contains the proof of Theorem

More information

Identification and Inference on Regressions with Missing Covariate Data

Identification and Inference on Regressions with Missing Covariate Data Identification and Inference on Regressions with Missing Covariate Data Esteban M. Aucejo Department of Economics London School of Economics e.m.aucejo@lse.ac.uk V. Joseph Hotz Department of Economics

More information

Estimating Semi-parametric Panel Multinomial Choice Models

Estimating Semi-parametric Panel Multinomial Choice Models Estimating Semi-parametric Panel Multinomial Choice Models Xiaoxia Shi, Matthew Shum, Wei Song UW-Madison, Caltech, UW-Madison September 15, 2016 1 / 31 Introduction We consider the panel multinomial choice

More information

Difference-in-Differences Estimation

Difference-in-Differences Estimation Difference-in-Differences Estimation Jeff Wooldridge Michigan State University Programme Evaluation for Policy Analysis Institute for Fiscal Studies June 2012 1. The Basic Methodology 2. How Should We

More information

On Consistent Hypotheses Testing

On Consistent Hypotheses Testing Mikhail Ermakov St.Petersburg E-mail: erm2512@gmail.com History is rather distinctive. Stein (1954); Hodges (1954); Hoefding and Wolfowitz (1956), Le Cam and Schwartz (1960). special original setups: Shepp,

More information

Computer simulation on homogeneity testing for weighted data sets used in HEP

Computer simulation on homogeneity testing for weighted data sets used in HEP Computer simulation on homogeneity testing for weighted data sets used in HEP Petr Bouř and Václav Kůs Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

Testing for Rank Invariance or Similarity in Program Evaluation

Testing for Rank Invariance or Similarity in Program Evaluation Testing for Rank Invariance or Similarity in Program Evaluation Yingying Dong University of California, Irvine Shu Shen University of California, Davis First version, February 2015; this version, October

More information

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

Estimating Conditional Average Treatment Effects

Estimating Conditional Average Treatment Effects Estimating Conditional Average Treatment Effects Jason Abrevaya Yu-Chin Hsu Robert P. Lieli July, 202 Abstract We consider a functional parameter called the conditional average treatment effect CATE, designed

More information

City, University of London Institutional Repository

City, University of London Institutional Repository City Research Online City, University of London Institutional Repository Citation: Kato, K., F Galvao Jr, A. & Montes-Rojas, G. (2012). Asymptotics for panel quantile regression models with individual

More information

Testing instrument validity for LATE identification based on inequality moment constraints

Testing instrument validity for LATE identification based on inequality moment constraints Testing instrument validity for LATE identification based on inequality moment constraints Martin Huber* and Giovanni Mellace** *Harvard University, Dept. of Economics and University of St. Gallen, Dept.

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

New Developments in Econometrics Lecture 16: Quantile Estimation

New Developments in Econometrics Lecture 16: Quantile Estimation New Developments in Econometrics Lecture 16: Quantile Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. Review of Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile

More information

Adaptive test of conditional moment inequalities

Adaptive test of conditional moment inequalities Adaptive test of conditional moment inequalities Denis Chetverikov The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP36/12 Adaptive Test of Conditional Moment Inequalities

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Inference Based on Conditional Moment Inequalities

Inference Based on Conditional Moment Inequalities Inference Based on Conditional Moment Inequalities Donald W. K. Andrews Cowles Foundation for Research in Economics Yale University Xiaoxia Shi Department of Economics University of Wisconsin, Madison

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Testing for Rank Invariance or Similarity in Program Evaluation: The Effect of Training on Earnings Revisited

Testing for Rank Invariance or Similarity in Program Evaluation: The Effect of Training on Earnings Revisited Testing for Rank Invariance or Similarity in Program Evaluation: The Effect of Training on Earnings Revisited Yingying Dong University of California, Irvine Shu Shen University of California, Davis First

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

arxiv: v3 [math.st] 23 May 2016

arxiv: v3 [math.st] 23 May 2016 Inference in partially identified models with many moment arxiv:1604.02309v3 [math.st] 23 May 2016 inequalities using Lasso Federico A. Bugni Mehmet Caner Department of Economics Department of Economics

More information

Michael Lechner Causal Analysis RDD 2014 page 1. Lecture 7. The Regression Discontinuity Design. RDD fuzzy and sharp

Michael Lechner Causal Analysis RDD 2014 page 1. Lecture 7. The Regression Discontinuity Design. RDD fuzzy and sharp page 1 Lecture 7 The Regression Discontinuity Design fuzzy and sharp page 2 Regression Discontinuity Design () Introduction (1) The design is a quasi-experimental design with the defining characteristic

More information

Testing Downside-Risk Efficiency Under Distress

Testing Downside-Risk Efficiency Under Distress Testing Downside-Risk Efficiency Under Distress Jesus Gonzalo Universidad Carlos III de Madrid Jose Olmo City University of London XXXIII Simposio Analisis Economico 1 Some key lines Risk vs Uncertainty.

More information

Imbens/Wooldridge, Lecture Notes 1, Summer 07 1

Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 What s New in Econometrics NBER, Summer 2007 Lecture 1, Monday, July 30th, 9.00-10.30am Estimation of Average Treatment Effects Under Unconfoundedness 1.

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

Defining the Integral

Defining the Integral Defining the Integral In these notes we provide a careful definition of the Lebesgue integral and we prove each of the three main convergence theorems. For the duration of these notes, let (, M, µ) be

More information

Supplemental Material 1 for On Optimal Inference in the Linear IV Model

Supplemental Material 1 for On Optimal Inference in the Linear IV Model Supplemental Material 1 for On Optimal Inference in the Linear IV Model Donald W. K. Andrews Cowles Foundation for Research in Economics Yale University Vadim Marmer Vancouver School of Economics University

More information

Stability of optimization problems with stochastic dominance constraints

Stability of optimization problems with stochastic dominance constraints Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM

More information

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall.

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall. .1 Limits of Sequences. CHAPTER.1.0. a) True. If converges, then there is an M > 0 such that M. Choose by Archimedes an N N such that N > M/ε. Then n N implies /n M/n M/N < ε. b) False. = n does not converge,

More information

A Simple Adjustment for Bandwidth Snooping

A Simple Adjustment for Bandwidth Snooping A Simple Adjustment for Bandwidth Snooping Timothy B. Armstrong Yale University Michal Kolesár Princeton University October 18, 2016 Abstract Kernel-based estimators are often evaluated at multiple bandwidths

More information

Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction

Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction Instrumental Variables Estimation and Weak-Identification-Robust Inference Based on a Conditional Quantile Restriction Vadim Marmer Department of Economics University of British Columbia vadim.marmer@gmail.com

More information

Nonparametric Identification and Estimation of a Transformation Model

Nonparametric Identification and Estimation of a Transformation Model Nonparametric and of a Transformation Model Hidehiko Ichimura and Sokbae Lee University of Tokyo and Seoul National University 15 February, 2012 Outline 1. The Model and Motivation 2. 3. Consistency 4.

More information

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. The Sharp RD Design 3.

More information

Online Appendix for Targeting Policies: Multiple Testing and Distributional Treatment Effects

Online Appendix for Targeting Policies: Multiple Testing and Distributional Treatment Effects Online Appendix for Targeting Policies: Multiple Testing and Distributional Treatment Effects Steven F Lehrer Queen s University, NYU Shanghai, and NBER R Vincent Pohl University of Georgia November 2016

More information

Integration on Measure Spaces

Integration on Measure Spaces Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of

More information

Probability and Measure

Probability and Measure Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

1 Introduction. 2 Measure theoretic definitions

1 Introduction. 2 Measure theoretic definitions 1 Introduction These notes aim to recall some basic definitions needed for dealing with random variables. Sections to 5 follow mostly the presentation given in chapter two of [1]. Measure theoretic definitions

More information

Limit theory and inference about conditional distributions

Limit theory and inference about conditional distributions Limit theory and inference about conditional distributions Purevdorj Tuvaandorj and Victoria Zinde-Walsh McGill University, CIREQ and CIRANO purevdorj.tuvaandorj@mail.mcgill.ca victoria.zinde-walsh@mcgill.ca

More information

A Simple Adjustment for Bandwidth Snooping

A Simple Adjustment for Bandwidth Snooping A Simple Adjustment for Bandwidth Snooping Timothy B. Armstrong Yale University Michal Kolesár Princeton University June 28, 2017 Abstract Kernel-based estimators such as local polynomial estimators in

More information