Marginal Quantile Treatment Effect

Ping Yu
University of Auckland

Started: April 2013
First Version: October 2013
This Version: April 2014

Abstract

This paper studies estimation and inference based on the marginal quantile treatment effect. First, we illustrate the importance of the rank preservation assumption in quantile treatment effect evaluation, show the identifiability of the marginal quantile treatment effect, and clarify the relationship between the marginal quantile treatment effect and other quantile treatment parameters. Second, we develop sharp bounds for the quantile treatment effect with and without the monotonicity assumption, together with sufficient and necessary conditions for point identification. Third, we estimate the marginal quantile treatment effect and the associated quantile treatment effect and integrated quantile treatment effect based on the distribution regression, derive the corresponding weak limits, and show the validity of the bootstrap inferences. The inference procedure can be used to construct uniform confidence bands for quantile treatment parameters and to test unconfoundedness and stochastic dominance. We also develop goodness-of-fit tests to choose regressors in the distribution regression. Fourth, we conduct two counterfactual analyses: deriving the transition matrix and developing the relative marginal policy relevant quantile treatment effect parameter under policy invariance. Fifth, we compare the identification schemes in some important literature with that based on the marginal quantile treatment effect, and point out the advantages and weaknesses of each scheme; e.g., Chernozhukov and Hansen (2005) concentrate mainly on the quantile treatment effect with the selection effect but without the essential heterogeneity; Abadie, Angrist and Imbens (2002), Aakvik, Heckman and Vytlacil (2005) and Chernozhukov and Hansen (2006) suffer from some obvious misspecification problems. Meanwhile, an alternative estimator of the local quantile treatment effect is developed and its weak limit is derived. Finally, we apply the estimation methods to the famous return to schooling dataset of Angrist and Krueger (1991) to illustrate the usefulness of the techniques developed in this paper to practitioners.

Keywords: marginal quantile treatment effect, local quantile treatment effect, rank preservation, selection effect, essential heterogeneity, sharp bound, point identification, distribution regression, two-step estimator, Hadamard differentiability, weak limit, uniform confidence band, unconfoundedness, completeness, stochastic dominance, goodness-of-fit test, transition matrix, relative marginal policy relevant quantile treatment effect, counterfactual analysis, policy invariance, bootstrap validity, return to schooling

JEL-Classification: C2, C3, C4, C2, C26

Email: p.yu@auckland.ac.nz

1 Introduction

Treatment effect evaluation is one main task of econometric analysis. Most literature concentrates on average treatment effect evaluation; see Heckman and Vytlacil (2007a,b) for a comprehensive summary. Meanwhile, as illustrated in Heckman (1992), Heckman et al. (1997) and Heckman and Smith (1993, 1998), questions of political economy or "social justice" require knowledge of the distribution of the treatment effect. As a result, distributional treatment effects (especially when unconfoundedness does not hold) become natural parameters of interest among econometricians.

Actually, distributional treatment effects have been studied extensively in the empirical literature. For example, Card (1996) uses a panel data set to study the effects of unions on the structure of wages; DiNardo et al. (1996) present a semiparametric procedure to analyze the effects of institutional and labor market factors on changes in the U.S. distribution of wages; Bitler et al. (2006) estimate quantile treatment effects using random-assignment data from Connecticut's Jobs First waiver.

Distributional treatment effects are usually estimated based on quantile regression, initiated by Koenker and Bassett (1978) (see Koenker (2005) for an introduction to quantile regression). One related field that has recently attracted much attention is the "general" semiparametric and nonparametric quantile regression with endogeneity. For semiparametric setups, see, e.g., Hong and Tamer (2003), Honoré and Hu (2004), Ma and Koenker (2006), Lee (2007), Sakata (2007) and Jun (2008) among others. For nonparametric setups, see, e.g., Chesher (2003), Chernozhukov et al. (2007), Horowitz and Lee (2007), Imbens and Newey (2009), Chen and Pouzo (2012), and Gagliardini and Scaillet (2012) among others. However, the main interest of this paper concentrates on the special structure of the treatment model, namely, the endogenous variable is binary.

A key parameter we will develop is the marginal quantile treatment effect (MQTE), which is the counterpart of the marginal treatment effect (MTE) in average treatment effect estimation. The idea of the MTE was first introduced in the context of a parametric normal generalized Roy model by Björklund and Moffitt (1987), and was analyzed more generally by Heckman (1997). In a choice (or selection, or participation) model with the latent variable structure, Heckman and Vytlacil (1999, 2001a) express the conventional average treatment effect parameters as different weighted averages of the MTE, and also identify the MTE by the local instrumental variable (LIV) estimator. Actually, Heckman and Vytlacil (2007b) use the MTE to unify the econometric literature on the evaluation of social programs, so it is well recognized that the MTE is a convenient tool to organize the nonparametric literature on average treatment effect evaluation. An embarrassing situation is that the counterpart of the MTE in the quantile treatment effect literature, the MQTE, is yet to be well understood. The purpose of this paper is to integrate the relevant literature on quantile treatment effect evaluation without unconfoundedness into one framework and to provide some useful estimation and inference methods to practitioners based on the MQTE.

There are two intertwined strands of literature concerning distributional treatment effects. Before reviewing the relevant literature, we must emphasize that the distributional treatment effects are functionals of the distribution of $Y_1 - Y_0$, which requires the joint distribution of $Y_1$ and $Y_0$, where $Y_1$ and $Y_0$ are the outcomes under the treatment status and the control status, respectively. As mentioned in Section II.B of Manski (1996) or footnote 5 of Manski (1997), "knowledge of $F(Y_1 - Y_0)$ neither implying nor being implied by knowledge of $F(Y_1)$ and $F(Y_0)$", where $F(X)$ is the cumulative distribution function (CDF) of $X$ for a random variable $X$.
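
Manski's point can be checked numerically. The sketch below is a minimal illustration and is not part of the paper: it builds two hypothetical couplings of $(Y_1, Y_0)$ with identical uniform marginals, one rank-preserving and one rank-reversing, and compares the implied distributions of $Y_1 - Y_0$. The sample size, seed, and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes: both couplings share the same marginals,
# Y0 ~ U(0, 1) and Y1 ~ U(0, 1), but differ in how ranks are linked.
u = rng.uniform(size=n)
y0 = u
y1_comonotone = u            # same rank in both states (rank preservation)
y1_countermonotone = 1 - u   # ranks fully reversed

gain_co = y1_comonotone - y0
gain_counter = y1_countermonotone - y0

# Marginals agree: compare a few quantiles of Y1 under each coupling.
qs = [0.1, 0.5, 0.9]
marg_co = np.quantile(y1_comonotone, qs)
marg_counter = np.quantile(y1_countermonotone, qs)

# The distribution of the gain does not agree: the share of units with a
# strictly positive gain differs sharply between the two couplings.
share_gain_co = np.mean(gain_co > 0)
share_gain_counter = np.mean(gain_counter > 0)

print(marg_co, marg_counter)
print(share_gain_co, share_gain_counter)
```

Both couplings give essentially the same quantiles for $Y_1$, yet no unit has a strictly positive gain under the first coupling while about half do under the second, so the marginals alone cannot pin down $F(Y_1 - Y_0)$.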
Due to the fundamental problem of causal inference (page 947 of Holland (1986)), $Y_1$ and $Y_0$ cannot be observed simultaneously. As a result, even in a randomized experiment, the joint distribution $F(Y_1, Y_0)$ or $F(Y_1 - Y_0)$ cannot be identified without further restrictions, although $F(Y_1)$ and $F(Y_0)$ can be identified. On the other hand, the marginal distributions $F(Y_1)$ and $F(Y_0)$ are also of interest in econometric analysis. For example, in Atkinson (1970), Sen (1997, 2000), Manski (1996, p714),
Imbens and Rubin (1997, p558), Imbens and Wooldridge (2009, p7), and Imbens (2010, p49), the marginal distributions of outcomes are more relevant for a social planner choosing between two programs; see also the Introduction of Abadie (2002) for an example where only the marginal distributions are relevant.

The first strand of literature tackles the joint distribution $F(Y_1, Y_0)$ directly. This strand of literature is mainly interested in the quantile of differences of $Y_1$ and $Y_0$. First, using the classical probability results due to Hoeffding (1940) and Fréchet (1951), Heckman et al. (1997) and Heckman and Smith (1993, 1998) bound $F(Y_1, Y_0)$ using $F(Y_1)$ and $F(Y_0)$ in a randomized experiment. It turns out that bounds of this kind are too wide to be useful. Later, Aakvik et al. (2005) impose more structure on the problem to get more stringent identification results. Basically, two key structures are imposed: (i) the error terms in the outcome equations and the choice equation are independent of all the covariates; (ii) the error terms follow a one-factor structure, i.e., the correlation among the error terms is only through the factor. Under these two assumptions, $F(Y_1, Y_0)$ can be identified, so all interesting functionals of $F(Y_1, Y_0)$ can be identified. For example, the proportion of people who benefit from participation in the program ($P(Y_1 > Y_0)$), gains to participants at selected levels of the no-treatment distribution ($F(Y_1 - Y_0 \mid Y_0 = y_0)$) or treatment distribution ($F(Y_1 - Y_0 \mid Y_1 = y_1)$), and a variety of other questions including the quantile treatment effect (QTE) and the quantile treatment effect on the treated (QTT) can be answered. Aakvik et al. (2005) consider only two binary potential outcomes, and parametric one-factor models with cross-sectional data. Extensions to multiple (possibly continuous or mixed discrete and continuous) outcomes, and semiparametric (or nonparametric) multiple-factor models with possibly panel data can be found in Aakvik et al. (1999) and Carneiro et al. (2001, 2003).
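
The Hoeffding-Fréchet bounds used in this strand are simple to compute, which also makes it easy to see why they tend to be wide. The following sketch is illustrative only; the function name `frechet_hoeffding_bounds` and the input values are hypothetical. It evaluates the classical pointwise bounds $\max\{F(y_1) + F(y_0) - 1, 0\} \le F(y_1, y_0) \le \min\{F(y_1), F(y_0)\}$ at a few pairs of marginal CDF values.

```python
import numpy as np

def frechet_hoeffding_bounds(f1, f0):
    """Pointwise bounds on the joint CDF F(y1, y0).

    f1 and f0 are the marginal CDF values F_{Y1}(y1) and F_{Y0}(y0).
    """
    f1 = np.asarray(f1, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    lower = np.maximum(f1 + f0 - 1.0, 0.0)  # attained under perfect negative dependence
    upper = np.minimum(f1, f0)              # attained under perfect positive dependence
    return lower, upper

# Hypothetical marginal CDF values at three evaluation points.
lower, upper = frechet_hoeffding_bounds([0.5, 0.9, 0.2], [0.5, 0.3, 0.2])
width = upper - lower
print(lower, upper, width)
```

At the median of both marginals the joint CDF is only known to lie between 0 and 0.5, which illustrates how little the marginals alone restrict the joint distribution.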
See Section 2 of Abbring and Heckman (2007) for a summary of this strand of literature.

The second strand of literature concentrates on the marginal distributions of $Y_1$ and $Y_0$ (possibly also conditional on some covariates or some specified population). However, as noted above, the distributional treatment effects require the joint distribution of $Y_1$ and $Y_0$. To circumvent this problem, this strand of literature explicitly or implicitly assumes some type of rank preservation (RP) condition. This type of condition was initiated by Lehmann (1974) and Doksum (1974). Under this assumption, the distributional treatment effects can be described by the difference of quantiles of $Y_1$ and $Y_0$. The first part of this strand of literature bounds $F(Y_1)$ and $F(Y_0)$ without imposing any restrictions on the choice process. Under the RP assumption, these bounds imply bounds on the QTE. This literature is summarized in Manski (1994, 1995, 2003). When further restrictions on the choice process are imposed, point identifying some type of quantile treatment effects is possible. The second part of this strand of literature estimates and conducts inference on some type of quantile treatment effects under point identification. Firpo (2007) estimates the QTE and the QTT in observational studies under the unconfoundedness assumption. When the unconfoundedness assumption fails while only the selection effect exists, Chernozhukov and Hansen (2005) show that the QTE can be identified under some completeness assumption, and Chernozhukov and Hansen (2006) provide a specific estimation scheme; see also Chernozhukov and Hansen (2013) for the most recent developments along this line. When there is also the essential heterogeneity, the monotonicity assumption of Imbens and Angrist (1994) or the uniformity assumption of Heckman and Vytlacil (2005) is usually imposed. Under this assumption, Abadie et al.
(2002) estimate the local quantile treatment effect (LQTE), which is the counterpart of the local average treatment effect (LATE),^1 using the identification results in Abadie (2003); see also Imbens and Rubin (1997) for identification of the marginal potential distributions of compliers when no covariates are present, and Abadie (2002) for bootstrap tests of distributional treatment effects in a similar framework.

^1 The LATE parameter was first introduced by Imbens and Angrist (1994). The MTE is a limit form of the LATE; see, e.g., Björklund and Moffitt (1987), Heckman (1997) and Angrist et al. (2000).

Carneiro and Lee (2009) deal with the essential heterogeneity in an alternative way. They borrow a key assumption, the independence assumption (i) described above, from the first strand
of literature to identify the (conditional) marginal distributions of potential outcomes. These distributions imply the MQTE, which is also the main objective of this paper, but we do not need the independence assumption. The above-mentioned literature concentrates on cross-sectional data; Athey and Imbens (2006) also use panel data to identify the QTT through what they call the changes-in-changes approach under the RP condition on the treated.

Although these two strands of literature use different identification assumptions, their targets are the same, namely, identifying the joint distribution of $Y_1$ and $Y_0$. This paper can be put in the second strand of literature, i.e., we impose some RP assumptions to identify $F(Y_1, Y_0)$. Consequently, the quantile treatment effect in this paper refers to the difference of quantiles rather than the quantile of differences. Meanwhile, we employ the framework in the first strand of literature to study the difference of quantiles.

The rest of this paper is structured as follows. Section 2 sets up our treatment model, illustrates the importance of the RP assumption in quantile treatment effect evaluation, shows the identifiability of the MQTE, and clarifies the relationship between the MQTE and other quantile treatment parameters. Section 3 develops sharp bounds and sufficient and necessary conditions for point identification of the QTE with and without the monotonicity assumption. In Section 4, we estimate the MQTE based on the distribution regression introduced by Foresi and Peracchi (1995), derive its weak limit and show the validity of the bootstrap inferences, and we also develop goodness-of-fit tests to choose regressors. In Section 5, we conduct two counterfactual analyses: deriving the transition matrix and developing the relative marginal policy relevant quantile treatment effect parameter under policy invariance.
In Section 6, we comment on some key literature in the two strands above, pointing out their weaknesses, underlying assumptions, and interactions with this paper. Section 7 presents an empirical application to the return to schooling and Section 8 concludes. All proofs are contained in an appendix.

Some notations are collected here for future reference. $d$ is always used for indicating the two treatment statuses, so "$d = 0, 1$" is not written out explicitly throughout the paper. $\mathrm{supp}(X)$ for a random variable $X$ denotes the support of the distribution of $X$. Both $Q_X(\tau)$ and $Q_\tau(X)$ denote the $\tau$th quantile of a random variable $X$. Capital letters such as $X$ denote random variables and the corresponding lower case letters such as $x$ denote the potential values they may take. For any parameter $\theta$, $d_\theta$ is the dimension of $\theta$. The space $\ell^\infty(\mathcal{F})$ represents the space of real-valued bounded functions defined on the index set $\mathcal{F}$ equipped with the supremum norm $\|\cdot\|_{\ell^\infty(\mathcal{F})}$. $C(\mathcal{Y})$ is the space of continuous functions on $\mathcal{Y}$.

2 The Setup and Parameters of Interest

We use the nonlinear and nonseparable outcome model as in Heckman and Vytlacil (2005),

$$Y_1 = \mu_1(X, U_1), \qquad Y_0 = \mu_0(X, U_0). \tag{1}$$

Actually, the additively separable setup, $Y_d = \mu_d(X) + U_d$, does not lose generality since we can define the new $U_d$ as $Y_d - Q_{Y_d|X}(\tau|X)$ and all our analysis in this paper is conditional on $X$. The distribution of $Y_d$ may be discrete (e.g., employment status), continuous (e.g., wage), or mixed discrete and continuous (e.g., in the national JTPA study 18 month impact sample used in Heckman et al. (1997), a substantial proportion of persons has zero earnings in both distributions of $Y_1$ and $Y_0$). The participation decision is

$$D = 1\left(\mu_D(X, Z) \ge V\right), \tag{2}$$
where $Z$ includes the instruments for the choice process. Having both $X$ and $Z$ appear as arguments of $\mu_D$ does not lose generality since $\mu_D(X, Z)$ may not depend on all elements of $X$. By transforming $\mu_D(X, Z)$ and $V$ by $F_{V|X,Z}$, we can rewrite

$$D = 1\left(p(X, Z) \ge U_D\right), \tag{3}$$

where $U_D \mid X, Z \sim U(0, 1)$ and $p(X, Z)$ is the propensity score. We use these two formulations of $D$ interchangeably throughout the paper. As shown in Vytlacil (2006), there is a larger class of latent index models that have a representation of this form. Also, this setup of $D$ implies the monotonicity assumption of Imbens and Angrist (1994), as shown in Vytlacil (2002).

We impose the following assumptions on the outcome equation and the choice equation.

(A1) $\mu_D(X, Z)$ is a nondegenerate random variable conditional on $X$.
(A2) The random vectors $(U_1, V)$ and $(U_0, V)$ are independent of $Z$ conditional on $X$.
(A3) The distribution of $V$ is absolutely continuous with respect to Lebesgue measure.
(A4) $X_1 = X_0$ almost everywhere, where $X_d$ denotes the value of $X$ if $D$ is set to $d$.
(A5) $1 > P(D = 1 \mid X) > 0$.
(A6) Conditional on $X = x$, $V = v$, $Y_1$ and $Y_0$ have the same rank.

(A1)-(A5) correspond to (A-1)-(A-3), (A-6) and (A-5) in Heckman and Vytlacil (2005), respectively. These assumptions are prevalent in the literature with heterogeneous treatment effects. A necessary condition for (A1) is that $Z$ contains a continuous variable. (A2) allows for both the selection effect ($U_0 \not\perp D \mid X$) and the essential heterogeneity ($(U_1 - U_0) \not\perp D \mid X$). Also, (A2) implies the usual assumption in the control function approach, say, $Z \perp (U_1, U_0) \mid (X, V)$. (A1)-(A5), combined with (1) and (2), impose testable restrictions on the distribution of $(Y, D, Z, X)$; see Heckman and Vytlacil (2005) (page 678) for the index sufficiency restriction and the monotonicity restriction. We refer to Heckman and Vytlacil (2005) for more detailed discussions on (A1)-(A5). The assumption (A6) deserves further examination.

2.1 The Rank Preservation Condition

The key extra assumption beyond those in Heckman and Vytlacil (2005) is the RP condition (A6). Chernozhukov and Hansen (2005) state the RP assumption via the Skorohod representation. We try to do the same thing here although, unlike in their paper, this representation is not essential for the development of our identification scheme. Suppose $Y_d$ is continuous, and the $\tau$th conditional quantile of $Y_d$ given $X$ and $V$ is $q(d, X, V, \tau)$; then we can represent $Y_d = q(d, X, V, R_d)$ by the Skorohod representation, where $R_d \mid (X, V) \sim U(0, 1)$ is the rank variable which represents some unobserved characteristic of $Y_d$, e.g., ability or proneness, among the slice of people with a specific value of $X$ and $V$. The RP assumption (A6) can be restated as $R_1 \mid (X, V) = R_0 \mid (X, V)$.

We now clarify two key points of the Skorohod representation. First, the Skorohod representation decomposes the information in $U_d$ of (1) into two components: the value information and the rank information. The former is incorporated in the quantile function $q(\cdot)$ and the latter is included in $R_d$. Second, because the distribution of $R_d \mid (X, V) \sim U(0, 1)$ does not depend on $(X, V)$, it may be suspected that $R_d$ is independent of $(X, V)$. This is incorrect. The mistake is immediately clear if we rewrite $Y_d = q(d, X, V, R_d(X, V))$; in other words, $R_d$ must be understood as a conditional random variable. Suppose there are $N$ distinct points on the support of $(X, V)$; then there are $N$ rank variables $R_d(X, V)$. Although the distribution of $R_d(X, V) \mid (X = x, V = v) \sim U(0, 1)$ does not depend on $(x, v)$, the unconditional random variable $R_d$ may depend on $(X, V)$. The RP condition does not restrict the dependence between $R_d$ and $(X, V)$; rather, it reduces the total number of conditional rank variables
$R_d(X, V)$ from $2N$ to $N$. To be consistent with the notation in the literature, $R_d$ is replaced by $U_d$ in the rest of this paper. The meaning of $U_d$ should be clear from the context. For example, when $V$ appears as an argument of the representation of $Y_d$, or $Y_d$ is represented as $Y_d = q(\cdot)$, $U_d$ means the rank variable. In this paper, we do not consider the quality of the evidence supporting the assumption (A6). Instead, we consider the evaluation of specific programs under this assumption.

Figure 1: Rank Preserved Conditional on $U_D$ but Unconditionally Unpreserved

Our RP assumption is weaker than the usual assumption that $Y_1$ and $Y_0$ have the same rank conditional on $X = x$. Consider the following example. Suppose $Z$ is the only covariate in the determination of $D$, and $Z$ can take only the values 0 and 1, so the only nontrivial values for $U_D$ are $p(0)$ and $p(1)$. Suppose that for each value of $U_D$ there are only two persons. Figure 1 shows that although the rank is preserved among the people with a specific $U_D$ value, the rank is not preserved if all people are taken into account. In other words, (A6) only requires the RP condition to hold locally ($X = x$, $U_D = u_D$) instead of globally ($X = x$). Conditional on $X = x$, the rank may not be maintained under the treatment. However, for a finer slice of individuals, the rank is maintained. Local rank preservation is much weaker than global rank preservation. The larger the conditioning set on which the RP condition is imposed, the harder it is for the RP condition to hold. Actually, the analysis in Heckman et al. (1997) shows that the unconditional RP condition cannot hold, although substantial departures from perfect positive dependence between $Y_1$ and $Y_0$ are not credible in their context; see also Carneiro et al. (2003) for further evidence against the unconditional RP condition.

The RP condition also imposes a restriction on the joint distribution of $Y_1$ and $Y_0$ given $X = x$ and $U_D = u_D$, namely, the joint distribution is fully determined by the marginal distributions.
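
A small numerical version of the two-slice example may help fix ideas. The numbers below are hypothetical and are not those behind Figure 1; the helper names `ranks` and `rank_preserved` are likewise introduced only for this sketch. Ranks are preserved within each $U_D$ slice but not once the slices are pooled.

```python
import numpy as np

# Hypothetical four-person population: two slices (standing in for the two
# U_D values) with two persons each. All outcome values are illustrative.
u_d = np.array([0, 0, 1, 1])           # slice label
y0 = np.array([1.0, 2.0, 3.0, 4.0])    # outcome without treatment
y1 = np.array([5.0, 6.0, 1.5, 2.5])    # outcome with treatment

def ranks(v):
    """Return the rank (0 = smallest) of each entry of v; no ties assumed."""
    return np.argsort(np.argsort(v))

def rank_preserved(a, b):
    """True if every unit holds the same rank in a and in b."""
    return bool(np.array_equal(ranks(a), ranks(b)))

# Local check: compare ranks separately within each slice.
local = {int(s): rank_preserved(y0[u_d == s], y1[u_d == s])
         for s in np.unique(u_d)}

# Global check: compare ranks in the pooled population.
global_ = rank_preserved(y0, y1)

print(local, global_)
```

Here the slice with lower untreated outcomes gains a lot from treatment while the other slice loses, so pooled ranks are reshuffled even though nobody changes rank within a slice.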
It is not hard to see that when the RP condition holds,

$$P(Y_1 \le y_1, Y_0 \le y_0 \mid X = x, U_D = u_D) = \min\left\{P(Y_1 \le y_1 \mid X = x, U_D = u_D),\; P(Y_0 \le y_0 \mid X = x, U_D = u_D)\right\},$$
which implies that the joint distribution of $Y_1$ and $Y_0$ given $X = x$, $U_D = u_D$ is degenerate. To see what this joint distribution looks like, suppose $Y_d \mid (X = x, U_D = u_D)$ is continuously distributed and $\mathrm{supp}(Y_d \mid X = x, U_D = u_D) = [0, 1]$ to simplify the discussion. It turns out that there is probability mass only on the line

$$\left(y_0,\; F^{-1}_{Y_1|X,U_D}\left(F_{Y_0|X,U_D}(y_0 \mid x, u_D) \mid x, u_D\right)\right), \qquad y_0 \in [0, 1].$$

In other words, realizations of $(Y_1, Y_0)$ can occur simultaneously only on the Q-Q plot. An implication of this result is that if $F_{Y_1|X,U_D}(\cdot \mid x, u_D)$ is the same as $F_{Y_0|X,U_D}(\cdot \mid x, u_D)$, then the correlation between $Y_1$ and $Y_0$ conditional on $X = x$, $U_D = u_D$ must be 1. Figure 2 shows a typical Q-Q plot of $(Y_1, Y_0)$ conditional on $X = x$, $U_D = u_D$. In Figure 2, $P(Y_1 \ge Y_0 \mid Y_0 = y_0, X = x, U_D = u_D) = 1$ when $y_0 \le 0.6$ and $P(Y_1 \ge Y_0 \mid Y_0 = y_0, X = x, U_D = u_D) = 0$ when $y_0 > 0.6$. In other words, for the slice of people with $Y_0 = y_0$, $X = x$, $U_D = u_D$, the participant always benefits as long as $y_0 \le 0.6$, and vice versa. Nevertheless, it is more likely that $P(Y_1 \ge Y_0 \mid Y_0 = y_0, X = x) \in (0, 1)$, $P(Y_1 \ge Y_0 \mid X = x, U_D = u_D) = F_{Y_0|X,U_D}(0.6 \mid x, u_D) \in (0, 1)$ and $P(Y_1 \ge Y_0 \mid X = x) = \int_0^1 P(Y_1 \ge Y_0 \mid X = x, U_D = u_D)\, du_D \in (0, 1)$.

Figure 2: Q-Q Plot of $(Y_1, Y_0)$ Conditional on $X = x$, $U_D = u_D$

It should be emphasized that the RP condition is needed only for defining various quantile treatment effects. Even without this condition, we can still identify various marginal distributions which, as argued in the introduction, are useful for many other purposes. Under the RP assumption, we define the MQTE as in Carneiro and Lee (2009),

$$MQTE_\tau(x, u_D) = Q_{Y_1|X,U_D}(\tau \mid x, u_D) - Q_{Y_0|X,U_D}(\tau \mid x, u_D).$$

If we strengthen the RP assumption to be conditional on $X = x$ or on $X = x$, $D = 1$, then we can define the QTE in Chernozhukov and Hansen (2005, 2006) and the QTT as

$$QTE_\tau(x) = Q_{Y_1|X}(\tau \mid x) - Q_{Y_0|X}(\tau \mid x)$$

and

$$QTT_\tau(x) = Q_{Y_1|X,D}(\tau \mid x, 1) - Q_{Y_0|X,D}(\tau \mid x, 1),$$
respectively. If the RP assumption is conditional on X = ; u D < U D u D, then the LQTE of Abadie et al. (22) 2 is de ned as LQT E (; u D ; u D) = Q YjX;U D (j; (u D ; u D]) Q YjX;U D (j; (u D ; u D]): Finally, if the RP assumption holds unconditionally (with respect to X), 3 then we de ne the integrated QTE (IQTE) IQT E = Q Y () Q Y (); the integrated QTT (IQTT) IQT T = Q YjD(j) Q YjD(j) as in Firpo (27), 4 and the integrated LQTE (ILQTE) 2.2 Identi cation of the MQTE ILQT E (u D ; u D) = Q YjU D (j(u D ; u D]) Q YjU D (j(u D ; u D]): The following theorem states that the MQTE can be identi ed for a range of u D. Theorem Suppose assumptions (A)-(A6) hold. If u D is not an isolated point of P \P, then MQT E (; u D ) can be identi ed for any 2 (; ), where P d =supp(p(x; Z)jX = ; D = d). Proof. To simplify notations, we depress the conditioning on X =. Given the RP assumption (A6), we need only identify Q Yd ju D (ju D ) whose identi cation is equivalent to the identi cation of F Yd ju D (ju D ). We provide two methods to identify F Yd ju D (ju D ). Method : Note that P (Y yjp(z) = p; D = ) p = P (Y yjp(z) = p; D = ) P (D = jp(z) = p) and similarly, P (Y yjp(z) = p; D = ) ( p) = = P (Y yju D p) p = Z p Z p F YjU D (yju D )du D, so d [P (Y yjp(z) = p; D = ) p] = F YjU dp D (yjp); d [P (Y yjp(z) = p; D = ) ( p)] = F YjU dp D (yjp): F YjU D (yju D )du D ; 2 Abadie et al. (22) con ate issues of de nition of parameters with issues of identi cation; see Section 6.2 below for their de nition. Actually, LQT E (; u D ; u D ) can be de ned for any u D; u D 2 (; ) although it can only be identi ed for u D; u D on the support of p(; Z). 3 Note that if the RP assumption holds on X =, Y d jx can be epressed as Y d = q(d; X; U) by the Skorohod representation, where UjX = U jx = U jx. If the RP assumption holds unconditionally, then Y d can be epressed as Y d = q(d; U) by the Skorohod representation, where U = U = U 2. 
This by no means implies that information in $X$ and $Z$ is useless for identification or for efficiency improvement in the evaluation of quantile treatment effects.

4 Be careful about the terminology in the literature. Our IQTE and IQTT are the QTE and QTT of Firpo (2007). Also, the MQTE of Cattaneo (2010) means $Q_\tau(Y_1)$ and $Q_\tau(Y_0)$ rather than $\Delta^{MQTE}_\tau(x,u_D)$, and the MQTE, QTE and QTT in the first strand of literature mentioned in the introduction mean $Q_{Y_1-Y_0|X,U_D}(\tau|x,u_D)$, $Q_{Y_1-Y_0|X}(\tau|x)$ and $Q_{Y_1-Y_0|X,D}(\tau|x,1)$ rather than $\Delta^{MQTE}_\tau(x,u_D)$, $\Delta^{QTE}_\tau(x)$ and $\Delta^{QTT}_\tau(x)$.

Method 2: As in Hahn (1998), we can use $DY$ and $(1-D)Y$ to identify $F_{Y_d|U_D}(\cdot|u_D)$. Note that for any $y \ge 0$,

$$\begin{aligned}
P(DY \le y \mid p(Z) = p) - (1-p)
&= P(DY \le y \mid p(Z) = p, D = 1)\,P(D = 1 \mid p(Z) = p) + P(DY \le y \mid p(Z) = p, D = 0)\,P(D = 0 \mid p(Z) = p) - (1-p) \\
&= P(Y_1 \le y \mid U_D \le p)\,p + P(0 \le y \mid U_D > p)\,P(D = 0 \mid p(Z) = p) - (1-p) \\
&= \int_0^p F_{Y_1|U_D}(y|u_D)\,du_D + (1-p) - (1-p) = \int_0^p F_{Y_1|U_D}(y|u_D)\,du_D,
\end{aligned}$$

so

$$\frac{d\,[P(DY \le y \mid p(Z) = p) - (1-p)]}{dp} = F_{Y_1|U_D}(y|p). \qquad (4)$$

For $y < 0$, repeating the analysis above, we have

$$\frac{dP(DY \le y \mid p(Z) = p)}{dp} = F_{Y_1|U_D}(y|p). \qquad (5)$$

Similarly,

$$-\frac{d\,[P((1-D)Y \le y \mid p(Z) = p) - p]}{dp} = F_{Y_0|U_D}(y|p) \quad \text{when } y \ge 0, \qquad (6)$$

and

$$-\frac{dP((1-D)Y \le y \mid p(Z) = p)}{dp} = F_{Y_0|U_D}(y|p) \quad \text{when } y < 0. \qquad (7)$$

Inverting $F_{Y_d|U_D}(y|p)$ as a function of $y$, we can get $Q_{Y_d|U_D}(\tau|p)$. Since $p(Z)$, $P(Y \le y \mid p(Z) = p, D = d)$, $P(DY \le y \mid p(Z) = p)$ and $P((1-D)Y \le y \mid p(Z) = p)$ for $y \in \mathbb{R}$ and $p \in \mathcal{P}_d$ can be identified, $\Delta^{MQTE}_\tau(u_D) = Q_{Y_1|U_D}(\tau|u_D) - Q_{Y_0|U_D}(\tau|u_D)$ can be identified for all $\tau \in (0,1)$ and all $u_D$ that are not isolated points of $\mathcal{P}_0 \cap \mathcal{P}_1$.

Figure 3: Intuition for Identification of $F_{Y_1|U_D}(y|p)$ with $y \ge 0$ and $y < 0$

Figure 3 provides some intuition for the arguments in the second method. For $y \ge 0$, $P(DY \le y \mid p(Z) = p)$ includes a point mass $1-p$ at $0$, and the remaining probability is $\int_0^p F_{Y_1|U_D}(y|u_D)\,du_D$, while for $y < 0$,

$P(DY \le y \mid p(Z) = p)$ does not include the point mass. This intuition is similar in spirit to that of the censored quantile regression models discussed in Powell (1984, 1986).

The arguments in Theorem 1 can be applied to the discrete $Y_d$ case. Suppose $Y_0$ and $Y_1$ have the same support $\{y_1, \ldots, y_S\}$; then the counterpart of the MQTE is $P_{Y_1|U_D}(y_s|u_D) - P_{Y_0|U_D}(y_s|u_D)$, $s = 1, \ldots, S$, where $P_{Y_d|U_D}(y_s|u_D)$ is the point mass of $Y_d \mid (U_D = u_D)$ at $y_s$. We can still identify $F_{Y_d|U_D}(y_s|p)$ by (4), (5), (6) and (7), and then $P_{Y_d|U_D}(y_1|p) = F_{Y_d|U_D}(y_1|p)$ and $P_{Y_d|U_D}(y_s|p) = F_{Y_d|U_D}(y_s|p) - F_{Y_d|U_D}(y_{s-1}|p)$ for $s = 2, \ldots, S$ can be sequentially identified. If $Y_d$ can take only the values $0$ and $1$, then the parameter of interest is $P_{Y_1|U_D}(1|u_D) - P_{Y_0|U_D}(1|u_D)$, which coincides with the MTE. Of course, we can also consider the case with mixed discrete and continuous outcomes. Both the discrete case and the mixed case are easier to handle than the continuous case, so we will concentrate on the continuous case in the rest of this paper unless stated otherwise.

If we use the idea of LIV as in Heckman and Vytlacil (2001a), we have

$$P(Y \le y \mid p(Z) = p) = P(Y \le y \mid p(Z) = p, D = 1)\,p + P(Y \le y \mid p(Z) = p, D = 0)\,(1-p) = \int_0^p F_{Y_1|U_D}(y|u_D)\,du_D + \int_p^1 F_{Y_0|U_D}(y|u_D)\,du_D,$$

and

$$\frac{\partial P(Y \le y \mid p(Z) = p)}{\partial p} = F_{Y_1|U_D}(y|p) - F_{Y_0|U_D}(y|p),$$

which is the difference of the CDFs in the two treatment statuses. So it is hard to identify the MQTE from $\partial P(Y \le y \mid p(Z) = p)/\partial p$. From Theorem 1, we can identify $E[Y_1 \mid U_D = p]$ and $E[Y_0 \mid U_D = p]$ separately, not just their difference $E[Y_1 - Y_0 \mid U_D = p]$ as in the LIV method of Heckman and Vytlacil (2001a). Method 1 of the proof is a special case of Theorem 1 in Carneiro and Lee (2009). We also discuss Method 2 to highlight the difference between the identification schemes of the MTE and the MQTE.
For the MTE, $E[DY \mid p(Z) = p] = E[Y \mid p(Z) = p, D = 1]\,p = \int_0^p E[Y_1 \mid U_D = u_D]\,du_D$, and $E[(1-D)Y \mid p(Z) = p] = E[Y \mid p(Z) = p, D = 0]\,(1-p) = \int_p^1 E[Y_0 \mid U_D = u_D]\,du_D$, so the two methods in the proof are the same in the MTE identification.

We close this subsection with a concrete example. Suppose $Y_1 = V + 2U$, $Y_0 = 2V + U$, and $D = 1(Z - V > 0)$, where

$$\begin{pmatrix} U \\ V \\ Z \end{pmatrix} \sim N(0, \Sigma) \quad \text{with} \quad \Sigma = \begin{pmatrix} 1 & 0.5 & 0 \\ 0.5 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

It can be shown that $\Delta^{MQTE}_\tau(u_D) = -0.5\,\Phi^{-1}(u_D) + \sqrt{0.75}\,\Phi^{-1}(\tau)$. Figure 4 shows $\Delta^{MQTE}_\tau(u_D)$ for $\tau = 0.1, 0.25, 0.5, 0.75$ and $0.9$. In this simple model, the spreading measure of the MQTE, e.g., $\Delta^{MQTE}_{1-\tau}(u_D) - \Delta^{MQTE}_\tau(u_D)$ for $\tau \in (0, 0.5)$, is the same for any $u_D$, which may not be standard in practice. Also, $\Delta^{MQTE}_\tau(u_D)$ is a decreasing function of $u_D$, which indicates that the more likely an individual is to participate in the program, the higher the benefit she will receive.$^{5}$ In the figure, we also show $\Delta^{MTE}(u_D)$, $\Delta^{QTE}_\tau$ and $\Delta^{ATE}$ $(\equiv E[Y_1] - E[Y_0])$ for comparison. Note that in this example, $\Delta^{MTE}(u_D) = \Delta^{MQTE}_{0.5}(u_D)$, and $\Delta^{QTE}_\tau = 0 = \Delta^{ATE}$ does not depend on $\tau$.$^{6}$ Obviously, $\Delta^{MQTE}_\tau(u_D)$ provides more information than $\Delta^{MTE}(u_D)$, $\Delta^{QTE}_\tau$, and $\Delta^{ATE}$.

5 Aakvik et al. (2005) provide a converse example.

6 It should be emphasized that $\Delta^{QTE}_\tau$ is not well defined in this example since the RP condition does not hold unconditionally, given that $Y_0$ and $Y_1$ have the same marginal distribution but $\mathrm{Corr}(Y_0, Y_1) = 6.5/7 < 1$.
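The closed form in the example can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the normalization $U_D = \Phi(V)$ (so that $D = 1$ exactly when $U_D < \Phi(Z)$), uses the fact that $U \mid V = v \sim N(0.5v, 0.75)$, and relies only on the Python standard library. The function names (`cond_quantile`, `mqte`, `mte`) are hypothetical and not part of the paper.

```python
from math import isclose, sqrt
from statistics import NormalDist

STD = NormalDist()   # standard normal distribution
COV_UV = 0.5         # Cov(U, V); Var(U) = Var(V) = 1 as in the example


def cond_quantile(a_v, a_u, tau, u_d):
    """tau-quantile of a_v*V + a_u*U given U_D = u_d, assuming U_D = Phi(V)."""
    v = STD.inv_cdf(u_d)                      # V is pinned down by U_D
    mean = (a_v + a_u * COV_UV) * v           # E[a_v*V + a_u*U | V = v]
    sd = abs(a_u) * sqrt(1.0 - COV_UV ** 2)   # sd of a_u*U given V = v
    return mean + sd * STD.inv_cdf(tau)


def mqte(tau, u_d):
    """Q_{Y1|U_D}(tau|u_d) - Q_{Y0|U_D}(tau|u_d) with Y1 = V + 2U, Y0 = 2V + U."""
    return cond_quantile(1, 2, tau, u_d) - cond_quantile(2, 1, tau, u_d)


def mqte_closed_form(tau, u_d):
    """Closed form stated in the text."""
    return -0.5 * STD.inv_cdf(u_d) + sqrt(0.75) * STD.inv_cdf(tau)


def mte(u_d):
    """E[Y1 - Y0 | U_D = u_d] = E[U - V | V = v] = -0.5 * v."""
    return -0.5 * STD.inv_cdf(u_d)


# Unconditional second moments of (Y0, Y1)
var_y1 = 1 + 4 + 2 * 2 * COV_UV          # Var(V + 2U) = 7
var_y0 = 4 + 1 + 2 * 2 * COV_UV          # Var(2V + U) = 7
cov_y0_y1 = 2 + COV_UV + 4 * COV_UV + 2  # Cov(V + 2U, 2V + U) = 6.5
corr_y0_y1 = cov_y0_y1 / sqrt(var_y0 * var_y1)

if __name__ == "__main__":
    for tau in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(tau, [round(mqte(tau, u), 3) for u in (0.1, 0.5, 0.9)])
    print("Corr(Y0, Y1) =", corr_y0_y1)
```

Because both potential outcomes have variance 7 and mean 0, their marginal quantiles coincide, which is why the unconditional quantile difference is zero while the correlation stays below one.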

Figure 4: $\Delta^{MQTE}_\tau(u_D)$ for $\tau = 0.1, 0.25, 0.5, 0.75$ and $0.9$ in a Simple Example

2.3 Relationship with Other Parameters of Treatment Effects

In this subsection, we first discuss the relationship between $\Delta^{MQTE}_\tau(x,u_D)$ and $\Delta^{QTT}_\tau(x)$, $\Delta^{QTE}_\tau(x)$, $\Delta^{LQTE}_\tau(x,u_D,u_D')$, $\Delta^{IQTE}_\tau$, $\Delta^{IQTT}_\tau$, $\Delta^{ILQTE}_\tau(u_D,u_D')$. It turns out that the building block is $F_{Y_d|X,U_D}(y_d|x,u_D)$ rather than $\Delta^{MQTE}_\tau(x,u_D)$. Actually, $\Delta^{MQTE}_\tau(x,u_D)$ is more relevant to the (conditional) quantile of $Y_1 - Y_0$. From the supplementary materials, we can show that

$$\Delta^{QTT}_\tau(x) = F^{-1}_{Y_1|X,D}(\tau|x,1) - F^{-1}_{Y_0|X,D}(\tau|x,1)$$

and the quantile treatment effect on the untreated (QTUT)

$$\Delta^{QTUT}_\tau(x) = F^{-1}_{Y_1|X,D}(\tau|x,0) - F^{-1}_{Y_0|X,D}(\tau|x,0),$$

where

$$F_{Y_d|X,D}(y_d|x,1) = \int_0^1 F_{Y_d|X,U_D}(y_d|x,u_D)\,h_{TT}(x,u_D)\,du_D, \qquad F_{Y_d|X,D}(y_d|x,0) = \int_0^1 F_{Y_d|X,U_D}(y_d|x,u_D)\,h_{TUT}(x,u_D)\,du_D,$$

with $h_{TT}(x,u_D) = [1 - F_{p(X,Z)|X}(u_D|x)]/E[p(X,Z) \mid X = x]$ and $h_{TUT}(x,u_D) = F_{p(X,Z)|X}(u_D|x)/E[1 - p(X,Z) \mid X = x]$. Also,

$$\Delta^{QTE}_\tau(x) = F^{-1}_{Y_1|X}(\tau|x) - F^{-1}_{Y_0|X}(\tau|x), \qquad \Delta^{IQTE}_\tau = F^{-1}_{Y_1}(\tau) - F^{-1}_{Y_0}(\tau),$$
$$\Delta^{IQTT}_\tau = F^{-1}_{Y_1|D}(\tau|1) - F^{-1}_{Y_0|D}(\tau|1), \qquad \Delta^{IQTUT}_\tau = F^{-1}_{Y_1|D}(\tau|0) - F^{-1}_{Y_0|D}(\tau|0),$$

where

$$F_{Y_d|X}(y_d|x) = \int_0^1 F_{Y_d|X,U_D}(y_d|x,u_D)\,du_D, \qquad F_{Y_d}(y_d) = \int F_{Y_d|X}(y_d|x)\,dF_X(x),$$
$$F_{Y_d|D}(y|1) = \int F_{Y_d|X,D}(y|x,1)\,dF_{X|D}(x|1) = \int\!\!\int_0^1 F_{Y_d|X,U_D}(y|x,u_D)\,\frac{1 - F_{p(X,Z)|X}(u_D|x)}{P(D = 1)}\,du_D\,dF_X(x),$$
$$F_{Y_d|D}(y|0) = \int F_{Y_d|X,D}(y|x,0)\,dF_{X|D}(x|0) = \int\!\!\int_0^1 F_{Y_d|X,U_D}(y|x,u_D)\,\frac{F_{p(X,Z)|X}(u_D|x)}{P(D = 0)}\,du_D\,dF_X(x).$$

Finally,

$$\Delta^{LQTE}_\tau(x,u_D,u_D') = F^{-1}_{Y_1|X,U_D}(\tau|x,(u_D,u_D']) - F^{-1}_{Y_0|X,U_D}(\tau|x,(u_D,u_D']),$$
$$\Delta^{ILQTE}_\tau(u_D,u_D') = F^{-1}_{Y_1|U_D}(\tau|(u_D,u_D']) - F^{-1}_{Y_0|U_D}(\tau|(u_D,u_D']),$$

where

$$F_{Y_d|X,U_D}(y_d|x,(u_D,u_D']) = \frac{1}{u_D' - u_D}\int_{u_D}^{u_D'} F_{Y_d|X,U_D}(y_d|x,u)\,du,$$
$$F_{Y_d|U_D}(y_d|(u_D,u_D']) = \int F_{Y_d|X,U_D}(y_d|x,(u_D,u_D'])\,dF_{X|p(X,Z)}(x|(u_D,u_D']) = \int F_{Y_d|X,U_D}(y_d|x,(u_D,u_D'])\,\frac{P(p(X,Z) \in (u_D,u_D'] \mid X = x)}{P(p(X,Z) \in (u_D,u_D'])}\,dF_X(x).$$

See Appendix B.1 of Carneiro and Lee (2009) for implementation of some of these parameters in practice. Note that $\Delta^{QTE}_\tau(x) \ne \int_0^1 \Delta^{MQTE}_\tau(x,u_D)\,du_D$, and $\Delta^{IQTE}_\tau \ne \int \Delta^{QTE}_\tau(x)\,dF_X(x)$, so it is hard to find a relationship between these quantile treatment parameters and $\Delta^{MQTE}_\tau(x,u_D)$.

We can also identify the MTE

$$\Delta^{MTE}(x,u_D) = \int y_1\,dF_{Y_1|X,U_D}(y_1|x,u_D) - \int y_0\,dF_{Y_0|X,U_D}(y_0|x,u_D) = \int_0^1 \Delta^{MQTE}_\tau(x,u_D)\,d\tau,$$

so all the parameters that can be identified by $\Delta^{MTE}(x,u_D)$ as listed in Table IA of Heckman and Vytlacil (2005) can also be identified by $\Delta^{MQTE}_\tau(x,u_D)$. In other words, $\Delta^{MQTE}_\tau(x,u_D)$ is a more basic building block of the average treatment parameters. Note that to identify $\Delta^{MTE}$, we do not need the RP assumption, but we need to assume $E[|Y_d|] < \infty$.

Heckman et al. (1997) also consider the following parameters of treatment effects: (a) the proportion of people taking the program who benefit from it, $P(Y_1 > Y_0 \mid D = 1)$; (b) the proportion of the total population that benefits from the program, $P(Y_1 > Y_0 \mid D = 1)\,P(D = 1)$; (c) selected quantiles of the impact distribution, $Q_{Y_1-Y_0|D}(\tau|1)$; (d) the distribution of gains at selected base state values, $F_{Y_1-Y_0|Y_0,D}(\cdot|y_0,1)$.$^{7}$ These parameters can be identified from $\Delta^{MQTE}_\tau(x,u_D)$.
7 They also consider the option value of a social program, $E[\max(Y_0, Z) \mid D = 1] - E[Y_0 \mid D = 1]$, where $Z$ is the option provided by the program.

It is not hard to show that under the RP assumption (A6),

$$P(Y_1 - Y_0 \le y \mid D = 1) = \int\!\!\int_0^1 \left[\int_0^1 1\!\left(\Delta^{MQTE}_\tau(x,u_D) \le y\right) d\tau\right] \frac{1 - F_{p(X,Z)|X}(u_D|x)}{P(D = 1)}\,du_D\,dF_X(x)$$

by a similar derivation as in the expression of $F_{Y_d|D}(y|1)$, so $Q_{Y_1-Y_0|D}(\tau|1)$ can be identified and $P(Y_1 > Y_0 \mid D = 1) = 1 - P(Y_1 - Y_0 \le 0 \mid D = 1)$ can also be identified.$^{8}$ Actually, we can identify any conditional or unconditional quantile of $Y_1 - Y_0$ of interest, e.g., $Q_{Y_1-Y_0|X,D}(\tau|x,d)$, $Q_{Y_1-Y_0|X}(\tau|x)$, $Q_{Y_1-Y_0|D}(\tau|d)$, $Q_{Y_1-Y_0}(\tau)$ and $Q_{Y_1-Y_0|X,U_D}(\tau|x,(u_D,u_D'])$, based on $\Delta^{MQTE}_\tau(x,u_D)$. Since the corresponding weights can be defined similarly to those above, we omit the details. Note that if only assumption (A6) holds, $P(Y_1 > Y_0 \mid D = 1)$ need not equal $\int\!\int_0^1 1(\Delta^{QTT}_\tau(x) > 0)\,d\tau\,dF_{X|D}(x|1)$ or $\int_0^1 1(\Delta^{IQTT}_\tau > 0)\,d\tau$. They are equal only if the RP assumption holds on $X = x, D = 1$ or on $D = 1$. This observation can be used to test whether the RP assumption holds on a larger set than $X = x, U_D = u_D$.

Because the quantile is not a linear operator of the distribution function, $Q_{Y_1-Y_0}(\tau)$ and $Q_{Y_1}(\tau) - Q_{Y_0}(\tau)$ are generally unequal (and do not have any identifiable relationship), so the quantile treatment effect and the quantile of the impact distribution are two different parameters. By contrast, since the mean is a linear operator of the distribution function, the average treatment effect and the average of the impact distribution are the same parameter. In this paper, we concentrate on the three most popular quantile treatment effect parameters in the literature: $\Delta^{MQTE}_\tau(x,u_D)$, $\Delta^{QTE}_\tau(x)$ and $\Delta^{IQTE}_\tau$. We concentrate on the difference of quantiles rather than the quantile of differences because the latter may not be interesting. For example, in the common effect model, the distribution of $Y_1 - Y_0$ is a point mass at a fixed value. Even if the treatment effect is not common, $Y_1 - Y_0$ may still have discrete components in its distribution. See Section 3.2 of Aakvik et al. (2005) for definitions of the distributional counterparts of the MTE, ATE and ATT based on $Y_1 - Y_0$ when the outcomes are binary, and see Section 2 of Abbring and Heckman (2007) for definitions of the distributional treatment effects in more general settings.

Finally, we study $F_{Y_1-Y_0|Y_0,D}(\cdot|y_0,1)$.
We have already shown in Section 2.1 that under the RP assumption (A6),

$$P(Y_1 - Y_0 \le y \mid Y_0 = y_0, X = x, U_D = u_D) = 1\!\left(Q_{Y_1|X,U_D}(F_{Y_0|X,U_D}(y_0|x,u_D)|x,u_D) \le y + y_0\right),$$

so

$$\begin{aligned}
P(Y_1 - Y_0 \le y \mid Y_0 = y_0, D = 1) &= P(Y_1 - Y_0 \le y \mid Y_0 = y_0, U_D \le p(X,Z)) \\
&= \int\!\!\int \left[\frac{\int_0^{p(x,z)} P(Y_1 - Y_0 \le y \mid Y_0 = y_0, X = x, U_D = u_D)\,dF_{U_D|U_0,X}(u_D \mid F_{Y_0|X}(y_0|x), x)}{F_{U_D|U_0,X}(p(x,z) \mid F_{Y_0|X}(y_0|x), x)}\right] dF_{Z|X}(z|x)\,dF_{X|Y_0}(x|y_0),
\end{aligned}$$

where the equality is from the fact that $F_{(U_D,X,Z)|Y_0=y_0} = F_{U_D|Y_0=y_0,X,Z}\,F_{Z|Y_0=y_0,X}\,F_{X|Y_0=y_0} = F_{U_D|U_0=F_{Y_0|X}(y_0|X),X}\,F_{Z|X}\,F_{X|Y_0=y_0}$, and $U_0$ is defined in the Skorohod representation of $Y_0$, $Y_0 \mid X = F^{-1}_{Y_0|X}(U_0|X)$. So this parameter is a complicated functional of $Q_{Y_1|X,U_D}(F_{Y_0|X,U_D}(y_0|x,u_D)|x,u_D)$ and is not easy to estimate. Actually, it is unknown whether it can be point identified, since $F_{U_D|U_0,X}$ is hard to identify nonparametrically without further structure on the model.

8 This parameter is useful, e.g., in the median-voter model, where we need to check whether $P(Y_1 > Y_0 \mid D = 1)\,P(D = 1) > 1/2$.

3 Sharp Bounds for the QTE

Although $Q_{Y_d|X,U_D}(\tau|x,u_D)$ can be point identified from Theorem 1, we show in this section that $Q_{Y_d|X}(\tau|x)$ generally can only be partially identified, which implies that $\Delta^{QTE}_\tau(x)$ can only be partially identified. Here, we implicitly assume that the RP assumption on $X = x$ holds (i.e., $Y_d$ can be represented as $Y_d = q(d,X,U_d)$

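with $U_0 \mid (X = x) = U_1 \mid (X = x)$), but we do not explicitly explore the information content in this assumption.$^{9}$ First, we impose the quantile independence assumption (QIA),

$$Q_{Y_d|X,Z}(\tau|X,Z) = Q_{Y_d|X}(\tau|X) \quad \text{for all } \tau \in (0,1). \qquad (8)$$

This assumption is equivalent to $(Y_0, Y_1) \perp Z \mid X$. It is parallel to the usual IV assumption $E[Y_d \mid X,Z] = E[Y_d \mid X]$ in the average treatment effect evaluation. As in Heckman and Vytlacil (2001b), we assume further that

$$D = 1(p(X,Z) \ge U_D) \quad \text{and} \quad Z \perp U_D \mid X \qquad (9)$$

to study the improvement on the bounds for $\Delta^{QTE}_\tau(x)$.

3.1 Bounds Under the Quantile Independence Assumption

From Proposition 2 and (36) of Manski (1994), we have sharp bounds for $Q_{Y_d|X}(\tau|x)$ under (8):

$$\sup_{z \in \mathcal{Z}_x} L_{1\tau}(x,z) \le Q_{Y_1|X}(\tau|x) \le \inf_{z \in \mathcal{Z}_x} R_{1\tau}(x,z), \qquad \sup_{z \in \mathcal{Z}_x} L_{0\tau}(x,z) \le Q_{Y_0|X}(\tau|x) \le \inf_{z \in \mathcal{Z}_x} R_{0\tau}(x,z),$$

where $\mathcal{Z}_x \equiv \mathrm{supp}(Z \mid X = x)$,

$$\begin{aligned}
L_{1\tau}(x,z) &= \begin{cases} Q_{Y|X,Z,D}\!\left(1 - \frac{1-\tau}{p(x,z)} \,\Big|\, x,z,1\right), & \text{if } p(x,z) > 1-\tau, \\ -\infty, & \text{otherwise,} \end{cases} &
R_{1\tau}(x,z) &= \begin{cases} Q_{Y|X,Z,D}\!\left(\frac{\tau}{p(x,z)} \,\Big|\, x,z,1\right), & \text{if } p(x,z) \ge \tau, \\ \infty, & \text{otherwise,} \end{cases} \\
L_{0\tau}(x,z) &= \begin{cases} Q_{Y|X,Z,D}\!\left(1 - \frac{1-\tau}{1-p(x,z)} \,\Big|\, x,z,0\right), & \text{if } p(x,z) < \tau, \\ -\infty, & \text{otherwise,} \end{cases} &
R_{0\tau}(x,z) &= \begin{cases} Q_{Y|X,Z,D}\!\left(\frac{\tau}{1-p(x,z)} \,\Big|\, x,z,0\right), & \text{if } p(x,z) \le 1-\tau, \\ \infty, & \text{otherwise.} \end{cases}
\end{aligned} \qquad (10)$$

So

$$I^L_\tau(x) \equiv \sup_{z \in \mathcal{Z}_x} L_{1\tau}(x,z) - \inf_{z \in \mathcal{Z}_x} R_{0\tau}(x,z) \le \Delta^{QTE}_\tau(x) \le \inf_{z \in \mathcal{Z}_x} R_{1\tau}(x,z) - \sup_{z \in \mathcal{Z}_x} L_{0\tau}(x,z) \equiv I^U_\tau(x). \qquad (11)$$

This bound is trivial, since $I^L_\tau(x) = -\infty$ and $I^U_\tau(x) = \infty$ if $Y_1$ and $Y_0$ are unbounded. Similar phenomena also occur in the average treatment effect evaluation. To avoid such trivial results, we assume that

$$P\!\left(y^l_d(x) \le Y_d \le y^u_d(x) \mid X = x, Z\right) = 1, \qquad (12)$$

where $y^l_d(x), y^u_d(x) \in \mathbb{R}$ do not depend on $Z$ by (8). To simplify notation, we assume that $y^l_0(x) = y^l_1(x)$, denoted as $y^l(x)$, and $y^u_0(x) = y^u_1(x)$, denoted as $y^u(x)$. Then $-\infty$ in (10) is changed to $y^l(x)$ and $\infty$ is changed to $y^u(x)$. Let $\mathcal{P}_x \equiv \mathrm{supp}(p(X,Z) \mid X = x)$, $p^{\sup}_x = \sup \mathcal{P}_x$ and $p^{\inf}_x = \inf \mathcal{P}_x$.

The width of the bounds is $I^U_\tau(x) - I^L_\tau(x)$, a complicated expression to evaluate, especially if $\mathcal{Z}_x$ is uncountable. Note that the above bounds exactly identify $\Delta^{QTE}_\tau(x)$ if $I^L_\tau(x) = I^U_\tau(x)$.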
9 In Section 6.1, we will show how Chernozhukov and Hansen (2005) point identify $Q_{Y_d|X}(\tau|x)$ by exploring the information content in this assumption and imposing some completeness conditions.

Note also that it is neither necessary nor sufficient for $p(x,z)$

to be a nontrivial function of $z$ for these bounds to improve upon the bounds when (8) is not imposed (i.e., $\sup_{z \in \mathcal{Z}_x}$ and $\inf_{z \in \mathcal{Z}_x}$ are dropped from (11)); this is because $Q_{Y|X,Z,D}(\tau|x,z,d)$ may depend on $z$ through channels other than $p(x,z)$. Evaluating the bounds for $\Delta^{QTE}_\tau(x)$ requires knowledge of

$$p(x,z),\; Q_{Y|X,Z,D}\!\left(1 - \tfrac{1-\tau}{p(x,z)} \,\Big|\, x,z,1\right),\; Q_{Y|X,Z,D}\!\left(\tfrac{\tau}{p(x,z)} \,\Big|\, x,z,1\right),\; y^l(x),\; y^u(x),\; Q_{Y|X,Z,D}\!\left(1 - \tfrac{1-\tau}{1-p(x,z)} \,\Big|\, x,z,0\right),\; Q_{Y|X,Z,D}\!\left(\tfrac{\tau}{1-p(x,z)} \,\Big|\, x,z,0\right)$$

for each $z \in \mathcal{Z}_x$; estimators of these objects can be constructed in an obvious way, so they are omitted here.

The following theorem is parallel to Corollaries 1 and 2 of Proposition 6 in Manski (1994). It develops necessary and sufficient conditions on $p^{\sup}_x$ and $p^{\inf}_x$ for point identifying $\Delta^{QTE}_\tau(x)$.

Theorem 2 Suppose assumptions (8) and (12) hold.

(i) $p^{\sup}_x \ge \min\{\tau, 1-\tau\}$ and $p^{\inf}_x \le \max\{\tau, 1-\tau\}$ are necessary for point identification of $\Delta^{QTE}_\tau(x)$. Also, when $p^{\inf}_x$ and $p^{\sup}_x$ are achieved at some values that $Z$ can take, $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ are sufficient for point identification of $\Delta^{QTE}_\tau(x)$ for any fixed $\tau \in (0,1)$.

(ii) When $(Y_0, Y_1) \perp D \mid (X,Z)$, $p^{\sup}_x = 1$ and $p^{\inf}_x = 0$ is sufficient for point identification of $\Delta^{QTE}_\tau(x)$ for any fixed $\tau \in (0,1)$. If we assume further that $Y_d \mid (X = x)$ is continuously distributed with a positive density on $(y^l(x), y^u(x))$, then $p^{\sup}_x = 1$ and $p^{\inf}_x = 0$ is also necessary for point identification of $\Delta^{QTE}_\tau(x)$ using (11).

For the average treatment effect evaluation, Corollary 1 of Proposition 6 in Manski (1994) implies that $p^{\inf}_x \le 1/2$ and $p^{\sup}_x \ge 1/2$ are necessary for point identification of $\Delta^{ATE}(x) \equiv E[Y_1 - Y_0 \mid X = x]$. A key assumption for this result to hold is that the support of $Y_d \mid (X = x, Z)$ does not depend on $Z$. Our first necessary condition requires only the support independence assumption rather than the full independence assumption (8). So these two sets of necessary conditions are comparable. When $\tau = 1/2$, they are the same.
To understand the sufficient condition for point identification of $\Delta^{QTE}_\tau(x)$ in Theorem 2(i), we need to clarify the meaning of $Q_{Y|X,Z,D}(\tau(x,z)|x,z,d)$ when $z \in \mathcal{Z}_x$ but cannot be taken by $Z$, where $\tau(x,z)$ is the quantile index as a function of $x$ and $z$. Since in this case $Q_{Y|X,Z,D}(\tau(x,z)|x,z,d)$ is not defined, it should be understood as the continuous extension of $Q_{Y|X,Z,D}(\tau(x,z')|x,z',d)$ as $z'$ converges to $z$. The quantile functions in (13) below are similarly understood when $p^{\sup}_x$ and $p^{\inf}_x$ cannot be taken by $p(x,z)$. This extension is not required when $Z$ is discretely distributed. When $Z$ has a continuous component, we can assume that $Z$ can take all values in $\mathcal{Z}_x$, and assume (8) is satisfied for $Z \in \mathcal{Z}_x$. This assumption does not lose generality since redefining a continuous random variable on a set with Lebesgue measure zero does not affect its distribution at all. Under this extension, $p^{\inf}_x$ and $p^{\sup}_x$ must be achieved at some values in $\mathcal{Z}_x$, so $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ are sufficient for point identification of $\Delta^{QTE}_\tau(x)$ for any fixed $\tau \in (0,1)$ regardless of whether $Z$ is discrete, continuous or a mixture. Finally, note that $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ is essentially the usual large support condition, which entails identification-at-infinity.

In the average treatment effect evaluation, there are no sufficient conditions for point identification of $\Delta^{ATE}(x)$ in the literature. From Manski (1994),

$$I^L(x) \le \Delta^{ATE}(x) \le I^U(x),$$

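where

$$\begin{aligned}
I^L(x) &= \sup_{z \in \mathcal{Z}_x} \left\{ p(x,z)\,E[Y \mid X = x, Z = z, D = 1] + (1 - p(x,z))\,y^l(x,z) \right\} - \inf_{z \in \mathcal{Z}_x} \left\{ (1 - p(x,z))\,E[Y \mid X = x, Z = z, D = 0] + p(x,z)\,y^u(x,z) \right\}, \\
I^U(x) &= \inf_{z \in \mathcal{Z}_x} \left\{ p(x,z)\,E[Y \mid X = x, Z = z, D = 1] + (1 - p(x,z))\,y^u(x,z) \right\} - \sup_{z \in \mathcal{Z}_x} \left\{ (1 - p(x,z))\,E[Y \mid X = x, Z = z, D = 0] + p(x,z)\,y^l(x,z) \right\},
\end{aligned}$$

and $y^l(x,z), y^u(x,z) \in \mathbb{R}$ satisfy $P\!\left(y^l(x,z) \le Y_d \le y^u(x,z) \mid X = x, Z = z\right) = 1$. Note here that $y^l(x,z)$ and $y^u(x,z)$ depend on $z$ if only the mean independence assumption, $E[Y_d \mid X,Z] = E[Y_d \mid X]$, is imposed. As in Theorem 2(i), when $p^{\inf}_x$ and $p^{\sup}_x$ are achieved at some values that $Z$ can take, say $p(x,z_0) = p^{\inf}_x$ and $p(x,z_1) = p^{\sup}_x$, $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ imply that

$$E[Y \mid X = x, Z = z_1, D = 1] - E[Y \mid X = x, Z = z_0, D = 0] \le I^L(x) \le I^U(x) \le E[Y \mid X = x, Z = z_1, D = 1] - E[Y \mid X = x, Z = z_0, D = 0],$$

so $\Delta^{ATE}(x)$ is point identified. Corollary 2 of Proposition 6 in Manski (1994) implies that when $(Y_0, Y_1) \perp D \mid (X,Z)$, $\Delta^{ATE}(x)$ is point identified using his bound (35) or $[I^L(x), I^U(x)]$ above if and only if $p^{\sup}_x = 1$ and $p^{\inf}_x = 0$. Our result parallels his result when $Y_d \mid (X = x)$ is continuously distributed with a positive density on $(y^l(x), y^u(x))$. It should be emphasized that when $(Y_0, Y_1) \perp D \mid (X,Z)$, $p^{\sup}_x = 1$ and $p^{\inf}_x = 0$ is necessary for point identification of $\Delta^{QTE}_\tau(x)$ only when (11) is used. Actually, since $Q_{Y_d|X}(\tau|x) = Q_{Y_d|X,Z}(\tau|x,z) = Q_{Y|X,Z,D}(\tau|x,z,d)$, $Q_{Y_d|X}(\tau|x)$ can be identified directly from $Q_{Y|X,Z,D}(\tau|x,z,d)$.

3.2 Bounds Under the Nonparametric Selection Model

The following theorem states the bounds for $\Delta^{QTE}_\tau(x)$ when assumption (9) is imposed.

Theorem 3 Suppose assumptions (8), (9) and (12) hold.

(i) $\Delta^{QTE}_\tau(x)$ has sharp bounds,

$$L_{1\tau}(x) - R_{0\tau}(x) \le \Delta^{QTE}_\tau(x) \le R_{1\tau}(x) - L_{0\tau}(x), \qquad (13)$$

where

$$\begin{aligned}
L_{1\tau}(x) &= \begin{cases} Q_{Y|X,p(X,Z),D}\!\left(1 - \frac{1-\tau}{p^{\sup}_x} \,\Big|\, x, p^{\sup}_x, 1\right), & \text{if } p^{\sup}_x > 1-\tau, \\ y^l(x), & \text{otherwise,} \end{cases} &
R_{1\tau}(x) &= \begin{cases} Q_{Y|X,p(X,Z),D}\!\left(\frac{\tau}{p^{\sup}_x} \,\Big|\, x, p^{\sup}_x, 1\right), & \text{if } p^{\sup}_x \ge \tau, \\ y^u(x), & \text{otherwise,} \end{cases} \\
L_{0\tau}(x) &= \begin{cases} Q_{Y|X,p(X,Z),D}\!\left(1 - \frac{1-\tau}{1-p^{\inf}_x} \,\Big|\, x, p^{\inf}_x, 0\right), & \text{if } p^{\inf}_x < \tau, \\ y^l(x), & \text{otherwise,} \end{cases} &
R_{0\tau}(x) &= \begin{cases} Q_{Y|X,p(X,Z),D}\!\left(\frac{\tau}{1-p^{\inf}_x} \,\Big|\, x, p^{\inf}_x, 0\right), & \text{if } p^{\inf}_x \le 1-\tau, \\ y^u(x), & \text{otherwise.} \end{cases}
\end{aligned}$$

(ii) $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ are sufficient for point identification of $\Delta^{QTE}_\tau(x)$ for any fixed $\tau \in (0,1)$.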
When $Y \mid (X = x, p(X,Z) = p^{\sup}_x, D = 1)$ and $Y \mid (X = x, p(X,Z) = p^{\inf}_x, D = 0)$ are continuously distributed with a positive density on $(y^l(x), y^u(x))$, they are also necessary.

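(iii) $[I^L_\tau(x), I^U_\tau(x)]$ in (11) will simplify to the bounds in (13) under assumption (9).

Figure 5: Intuition for $L_{1\tau} \le Q_{Y_1}(\tau) \le R_{1\tau}$: $p^{\sup} = 0.8$, $\rho = 0.5$, with three quantile indices $\tau_1$, $\tau_2 = 0.46$ and $\tau_3 = 0.9$

Figure 5 provides some intuition for why $L_{1\tau}(x) \le Q_{Y_1|X}(\tau|x) \le R_{1\tau}(x)$; similar intuition can be applied to the bounds for $Q_{Y_0|X}(\tau|x)$. From the proof of Theorem 3,

$$P(Y \le y \mid p(Z) = p^{\sup}, D = 1)\,p^{\sup} \le P(Y_1 \le y) \le P(Y \le y \mid p(Z) = p^{\sup}, D = 1)\,p^{\sup} + (1 - p^{\sup}),$$

where the conditioning on $X = x$ is suppressed. Suppose $(Y_1, V)' \sim N\!\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$; then

$$P(Y \le y \mid p(Z) = p^{\sup}, D = 1)\,p^{\sup} = \int_0^{p^{\sup}} \Phi\!\left(\frac{y - \rho\,\Phi^{-1}(u_D)}{\sqrt{1-\rho^2}}\right) du_D.$$

Figure 5 shows the bounds for $P(Y_1 \le y)$ when $p^{\sup} = 0.8$ and $\rho = 0.5$. Inverting the bounds for $P(Y_1 \le y)$, we can get the bounds for $Q_{Y_1}(\tau)$. When $\tau \le 1 - p^{\sup}$, $L_{1\tau} = y^l$; when $\tau > p^{\sup}$, $R_{1\tau} = y^u$. Only if $\tau \in (1 - p^{\sup}, p^{\sup})$ are both bounds nontrivial. This is not always possible; only if $p^{\sup} > \max(\tau, 1-\tau) \ge 1/2$ ($p^{\inf} < \min(\tau, 1-\tau) \le 1/2$) is neither the left nor the right bound for $Q_{Y_1}(\tau)$ ($Q_{Y_0}(\tau)$) trivial. Pushing $\tau \to 0$ or $1$, we can see that there are nontrivial bounds for $Q_{Y_1}(\tau)$ ($Q_{Y_0}(\tau)$) for all $\tau$ if and only if $p^{\sup} = 1$ ($p^{\inf} = 0$).

Note that $L_{d\tau}(x)$ and $R_{d\tau}(x)$ are increasing functions of $\tau$; hence the bound for $Q_{Y_d|X}(\tau|x)$ shifts to the right as $\tau$ increases. Also observe that

$$1 - \frac{1-\tau}{p^{\sup}_x} \le \tau \le \frac{\tau}{p^{\sup}_x} \quad \text{and} \quad 1 - \frac{1-\tau}{1-p^{\inf}_x} \le \tau \le \frac{\tau}{1-p^{\inf}_x}.$$

Hence $Q_{Y|X,p(X,Z),D}(\tau|x,p^{\sup}_x,1)$ and $Q_{Y|X,p(X,Z),D}(\tau|x,p^{\inf}_x,0)$ lie within the bounds for $Q_{Y_1|X}(\tau|x)$ and $Q_{Y_0|X}(\tau|x)$, respectively. This implies that $F_{Y_1|X}(\cdot|x) = F_{Y|X,p(X,Z),D}(\cdot|x,p^{\sup}_x,1)$ and $F_{Y_0|X}(\cdot|x) = F_{Y|X,p(X,Z),D}(\cdot|x,p^{\inf}_x,0)$ cannot be rejected in the absence of other information.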
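The bound formulas above can be sketched in a few lines of code. This is only an illustration under stated assumptions: `quantile_bounds` is a hypothetical helper, `q_obs` stands for the observed conditional quantile function at the extreme propensity value (treated at $p^{\sup}$ for $Y_1$, untreated at $p^{\inf}$ for $Y_0$), and `share` is the corresponding probability of being observed ($p^{\sup}$ for $Y_1$, $1 - p^{\inf}$ for $Y_0$). The uniform quantile function used in the usage lines is made up for demonstration and is not the Gaussian design behind Figure 5.

```python
def quantile_bounds(q_obs, tau, share, y_l, y_u):
    """Bounds (L, R) on the tau-quantile of a potential outcome.

    q_obs : callable, observed quantile function u -> Q(u) of Y in the
            treatment arm, evaluated at the extreme propensity value
    tau   : quantile index in (0, 1)
    share : probability of observing this arm (p_sup for Y1, 1 - p_inf for Y0)
    y_l, y_u : known lower and upper support bounds of the outcome
    """
    # Lower bound: nontrivial only if share > 1 - tau.
    lower = q_obs(1.0 - (1.0 - tau) / share) if share > 1.0 - tau else y_l
    # Upper bound: nontrivial only if share >= tau.
    upper = q_obs(min(1.0, tau / share)) if share >= tau else y_u
    return lower, upper


def uniform_quantile(u):
    """Hypothetical observed quantile function: Uniform[0, 1] outcome."""
    return u


if __name__ == "__main__":
    for tau in (0.15, 0.5, 0.9):
        print(tau, quantile_bounds(uniform_quantile, tau, 0.8, 0.0, 1.0))
    # With share = 1 the two bounds coincide (point identification).
    print(quantile_bounds(uniform_quantile, 0.5, 1.0, 0.0, 1.0))
```

With `share = 0.8`, the lower bound is trivial for `tau = 0.15`, the upper bound is trivial for `tau = 0.9`, and both are informative for `tau = 0.5`, matching the three cases described in the text.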

Evaluating the bounds for $\Delta^{QTE}_\tau(x)$ requires knowledge only of

$$p^{\inf}_x,\; p^{\sup}_x,\; y^l(x),\; y^u(x),\; Q_{Y|X,p(X,Z),D}\!\left(1 - \tfrac{1-\tau}{p^{\sup}_x} \,\Big|\, x,p^{\sup}_x,1\right),\; Q_{Y|X,p(X,Z),D}\!\left(\tfrac{\tau}{p^{\sup}_x} \,\Big|\, x,p^{\sup}_x,1\right),\; Q_{Y|X,p(X,Z),D}\!\left(1 - \tfrac{1-\tau}{1-p^{\inf}_x} \,\Big|\, x,p^{\inf}_x,0\right),\; Q_{Y|X,p(X,Z),D}\!\left(\tfrac{\tau}{1-p^{\inf}_x} \,\Big|\, x,p^{\inf}_x,0\right).$$

Estimators of these objects can be constructed in an obvious way, so they are omitted here. The simpler structure of these bounds results from assumption (9). Also, it is both necessary and sufficient for $p(x,z)$ to be a nontrivial function of $z$ for the bounds in Theorem 3 to improve upon the bounds when (8) and (9) are not imposed.

As shown in Section 4 of Heckman and Vytlacil (2001b), $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ are necessary and sufficient for point identification of $\Delta^{ATE}(x)$ under assumption (9). Our result parallels their result when $Y_d \mid (X = x)$ is continuously distributed with a positive density on $(y^l(x), y^u(x))$. As shown in Section 6 of Heckman and Vytlacil (2001b), the Manski IV bounds of $\Delta^{ATE}(x)$ simplify to their bounds under assumption (9); the last part of Theorem 3 parallels their result. Finally, note that the bounds for $\Delta^{QTE}_\tau(x)$ can be integrated (with respect to $x$) to get the bounds for $\Delta^{IQTE}_\tau$.

3.3 Some Counterexamples

The bounds for $\Delta^{QTE}_\tau(x)$ in Theorem 3 can be applied to cases with discrete, continuous or mixed response variables. Note that $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ are necessary for point identification of $\Delta^{QTE}_\tau(x)$ only when $Y \mid (X = x, p(X,Z) = p^{\sup}_x, D = 1)$ and $Y \mid (X = x, p(X,Z) = p^{\inf}_x, D = 0)$ are continuously distributed with a positive density on $(y^l(x), y^u(x))$. The following example illustrates that $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ is not necessary for point identification of $\Delta^{QTE}_\tau(x)$ when $Y_d$ is binary. The supplementary materials include another example in a similar spirit where the distribution of $Y_d$ is a mixture of continuous and discrete components.

Example 1 Suppose $Y_d \in \{0,1\}$, $p^{\sup}_1 \equiv P(Y = 1 \mid X = x, p(X,Z) = p^{\sup}_x, D = 1) \in (0,1)$ and $p^{\inf}_0 \equiv P(Y = 1 \mid X = x, p(X,Z) = p^{\inf}_x, D = 0) \in (0,1)$.
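First check the bounds for $Q_{Y_1|X}(\tau|x)$:

$$L_{1\tau}(x) = \begin{cases} 1, & \text{if } p^{\sup}_x > 1-\tau \text{ and } p^{\sup}_1 > \frac{1-\tau}{p^{\sup}_x}, \\ 0, & \text{if } p^{\sup}_x \le 1-\tau \text{ or } \left[p^{\sup}_x > 1-\tau \text{ and } p^{\sup}_1 \le \frac{1-\tau}{p^{\sup}_x}\right], \end{cases}$$
$$R_{1\tau}(x) = \begin{cases} 0, & \text{if } p^{\sup}_x \ge \tau \text{ and } p^{\sup}_1 \le \frac{p^{\sup}_x - \tau}{p^{\sup}_x}, \\ 1, & \text{if } p^{\sup}_x < \tau \text{ or } \left[p^{\sup}_x \ge \tau \text{ and } p^{\sup}_1 > \frac{p^{\sup}_x - \tau}{p^{\sup}_x}\right]. \end{cases}$$

When $p^{\sup}_x\,p^{\sup}_1 > 1-\tau$, $L_{1\tau}(x) = R_{1\tau}(x) = 1$; when $p^{\sup}_x\,(1 - p^{\sup}_1) \ge \tau$, $L_{1\tau}(x) = R_{1\tau}(x) = 0$. Similarly, when $(1 - p^{\inf}_x)\,p^{\inf}_0 > 1-\tau$, $L_{0\tau}(x) = R_{0\tau}(x) = 1$; when $(1 - p^{\inf}_x)\,(1 - p^{\inf}_0) \ge \tau$, $L_{0\tau}(x) = R_{0\tau}(x) = 0$. Figure 6 shows the combinations of $p^{\sup}_x$ ($p^{\inf}_x$) and $p^{\sup}_1$ ($p^{\inf}_0$) that deliver point identification for $\tau = 0.1, 0.25, 0.5, 0.75, 0.9$. Obviously, $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ are not necessary for point identification of $\Delta^{QTE}_\tau(x)$. Only if $p^{\sup}_1 = p^{\inf}_0 = 1-\tau$ are $p^{\inf}_x = 0$ and $p^{\sup}_x = 1$ necessary. Note also that $p^{\sup}_x \ge \min\{\tau, 1-\tau\}$ and $p^{\inf}_x \le \max\{\tau, 1-\tau\}$ hold under point identification of $\Delta^{QTE}_\tau(x)$ for any $p^{\sup}_1, p^{\inf}_0 \in (0,1)$, as predicted by Theorem 2.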

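Figure 6: $p^{\sup}_x$ ($p^{\inf}_x$) and $p^{\sup}_1$ ($p^{\inf}_0$) for Point Identification of $Q_{Y_d|X}(\tau|x)$ (the red and blue areas mark the two identified values, $0$ and $1$)

The next example shows that $[I^L_\tau(x), I^U_\tau(x)]$ in (11) may not simplify to the bounds in Theorem 3 if assumption (9) is not imposed. This example parallels the example in Section 6 of Heckman and Vytlacil (2001b), where they show a similar result for $\Delta^{ATE}(x)$.

Example 2 Suppose $Z$ is binary and there are no other covariates. Take $\inf_{z \in \mathcal{Z}_x} R_{1\tau}(x,z)$ as an example; suppose $y^l(x) = 0$, $y^u(x) = 1$ and $p(1) \equiv p(x,1) > p(x,0) \equiv p(0)$. We want to show that it is possible to have

$$\begin{aligned}
&\min\left\{ Q_{Y|Z,D}\!\left(\tfrac{\tau}{p(1)} \,\Big|\, 1,1\right) 1(p(1) \ge \tau) + 1(p(1) < \tau),\; Q_{Y|Z,D}\!\left(\tfrac{\tau}{p(0)} \,\Big|\, 0,1\right) 1(p(0) \ge \tau) + 1(p(0) < \tau) \right\} \\
&\quad = Q_{Y|Z,D}\!\left(\tfrac{\tau}{p(0)} \,\Big|\, 0,1\right) 1(p(0) \ge \tau) + 1(p(0) < \tau) < Q_{Y|Z,D}\!\left(\tfrac{\tau}{p(1)} \,\Big|\, 1,1\right) 1(p(1) \ge \tau) + 1(p(1) < \tau).
\end{aligned}$$

We must assume $\min\{p(1), p(0)\} \ge \tau$ to make this result hold. If $\min\{p(1), p(0)\} \ge \tau$, we need only check

$$q_1 \equiv Q_{Y|Z,D}\!\left(\tfrac{\tau}{p(1)} \,\Big|\, 1,1\right) > Q_{Y|Z,D}\!\left(\tfrac{\tau}{p(0)} \,\Big|\, 0,1\right) \equiv q_0.$$

First, the QIA needs to be satisfied. Without loss of generality, assume $Y_1 \mid Z$ is uniformly distributed. Then the QIA is satisfied if

$$\begin{aligned}
F_{Y_1|Z}(y_1|1) &= F_{Y_1|Z,D}(y_1|1,0)\,(1 - p(1)) + F_{Y_1|Z,D}(y_1|1,1)\,p(1) = y_1, \\
F_{Y_1|Z}(y_1|0) &= F_{Y_1|Z,D}(y_1|0,0)\,(1 - p(0)) + F_{Y_1|Z,D}(y_1|0,1)\,p(0) = y_1,
\end{aligned} \qquad (14)$$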

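for any $y_1 \in [0,1]$. As long as

$$F_{Y_1|Z,D}(q_1|1,0) = \frac{q_1 - \tau}{1 - p(1)} \in (0,1) \text{ or } \tau < q_1 < \tau + (1 - p(1)), \qquad F_{Y_1|Z,D}(q_0|0,0) = \frac{q_0 - \tau}{1 - p(0)} \in (0,1) \text{ or } \tau < q_0 < \tau + (1 - p(0)),$$

we can find qualified $F_{Y_1|Z,D}(y_1|z,d)$, $z = 0,1$, $d = 0,1$, such that (14) is satisfied. For example, for $z = 0,1$, let

$$\begin{aligned}
F_{Y_1|Z,D}(y_1|z,0) &= \frac{q_z - \tau}{(1 - p(z))\,q_z}\,y_1\,1(y_1 \le q_z) + \left[\frac{q_z - \tau}{1 - p(z)} + \left(1 - \frac{q_z - \tau}{1 - p(z)}\right)\frac{y_1 - q_z}{1 - q_z}\right] 1(y_1 > q_z), \\
F_{Y_1|Z,D}(y_1|z,1) &= \frac{\tau}{p(z)\,q_z}\,y_1\,1(y_1 \le q_z) + \left[\frac{\tau}{p(z)} + \left(1 - \frac{\tau}{p(z)}\right)\frac{y_1 - q_z}{1 - q_z}\right] 1(y_1 > q_z).
\end{aligned}$$

Figure 7 shows the case with $\tau = 0.5$, $p(0) = 0.6$, $p(1) = 0.7$, $q_0 = 0.65 < 0.75 = q_1$.

Figure 7: An Illustration of $\inf_{z \in \mathcal{Z}_x} R_{1\tau}(x,z) \ne R_{1\tau}(x)$ When (9) is NOT Satisfied: $\tau = 0.5$

It is useful to construct a test to check the hypothesis that the bounds $[I^L_\tau(x), I^U_\tau(x)]$ and those in Theorem 3(i) coincide. Since $I^L_\tau(x) \ge L_{1\tau}(x) - R_{0\tau}(x)$ and $I^U_\tau(x) \le R_{1\tau}(x) - L_{0\tau}(x)$ always hold, our null hypothesis is

$$L_{1\tau}(x) - R_{0\tau}(x) \ge I^L_\tau(x), \quad \text{and} \quad I^U_\tau(x) \ge R_{1\tau}(x) - L_{0\tau}(x);$$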