Causal Analysis After Haavelmo: Definitions and a Unified Analysis of Identification of Recursive Causal Models

Causal Inference in the Social Sciences, University of Michigan, December 12, 2012. This draft: December 15, 2012.
James Heckman (University of Chicago) and Rodrigo Pinto (University of Chicago)

Plan of the Talk
Six topics:
1. Haavelmo (1943): econometric approach to causal analysis based on latent variables. First formalization of Yule's credo: correlation is not causation.
2. Linking the models developed in econometrics to DAG approaches (e.g., Pearl, 2009). Those models are fundamentally recursive.
3. Simultaneous causality and mediation analysis (Haavelmo, 1944).

Plan of the Talk (continued)
4. A framework unifying alternative causal estimators for recursive models as solutions to a mixture problem:
   a. Matching
   b. Instrumental Variables
   c. Control Functions / Selection Models
   d. Stratification
   e. Random Effects Approaches
5. New results on identification within this framework using mixture models.
6. Econometric mediation analysis.

1. Econometric Approach to Causality

The econometric approach to causality develops explicit models of outcomes where the causes of effects are investigated and the mechanisms governing the choice of treatment are analyzed. The relationship between treatment outcomes and treatment choice mechanisms is studied. A careful accounting of the unobservables in outcome and treatment choice equations facilitates the design and interpretation of estimators.

Both objective and subjective evaluations are considered, where subjective valuations are those of the person receiving treatment as well as the persons assigning it. Differences between anticipated and realized objective and subjective outcomes are analyzed. Models for simultaneous treatment effects are developed. A careful distinction is made between models for potential outcomes and empirical methods for identifying treatment effects. The relationship between observables and unobservables is carefully analyzed.

The econometric approach to causality addresses questions that arise in policy problems. Three distinct policy questions:
P1. Evaluating the impact of historical interventions on outcomes, including their impact in terms of the well-being of the treated and society at large.
P2. Forecasting the impacts (constructing counterfactual states) of interventions implemented in one environment in other environments, including their impacts in terms of well-being (external validity).
P3. Forecasting the impacts of interventions (constructing counterfactual states associated with interventions) never historically experienced to various environments, including their impacts in terms of well-being.

Table 1: Three Distinct Tasks Arising in the Analysis of Causal Models

| Task | Description | Requirements |
|------|-------------|--------------|
| 1 | Defining the class of hypotheticals or counterfactuals by thought experiments (models) | A scientific theory: a purely mental activity |
| 2 | Identifying causal parameters from hypothetical population data | Mathematical analysis of point or set identification |
| 3 | Identifying parameters from real data | Estimation and testing theory |

A Prototypical Structural Model in Economics
Prototypical econometric model for policy evaluation. An agent can be given two courses of treatment, 1 and 0, with mutually exclusive outcomes $Y(1)$ and $Y(0)$. Costs are $C$. The information of the relevant decision maker is $I$. The decision to treat may be made on the basis of the expected outcomes $E(Y(1) \mid I)$ and $E(Y(0) \mid I)$ and costs $E(C \mid I)$, where the expectations are those of the relevant decision maker.

Expected net value:
$$E(Y(1) \mid I) - E(C \mid I) - E(Y(0) \mid I). \qquad (1)$$

For persons who pick treatment based on expected maximum gain:
$$D = \mathbf{1}\left[ E(Y(1) \mid I) - E(C \mid I) - E(Y(0) \mid I) \geq 0 \right].$$
This is the Generalized Roy Model.
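The decision rule can be illustrated with a minimal Python sketch. It is not taken from the talk: the function name `chooses_treatment`, the numeric values, and the choice to treat an agent with exactly zero expected net value as participating are illustrative assumptions.

```python
def chooses_treatment(ey1, ey0, ec):
    """Return D = 1[E(Y(1)|I) - E(C|I) - E(Y(0)|I) >= 0].

    ey1, ey0, ec are the agent's ex ante expectations (hypothetical inputs).
    """
    return 1 if (ey1 - ec - ey0) >= 0 else 0


# Three hypothetical agents with different expected costs and gains.
agents = [
    {"ey1": 10.0, "ey0": 7.0, "ec": 2.0},  # expected net value +1
    {"ey1": 10.0, "ey0": 7.0, "ec": 4.0},  # expected net value -1
    {"ey1": 5.0, "ey0": 3.0, "ec": 2.0},   # expected net value 0 (marginal agent)
]
decisions = [chooses_treatment(**agent) for agent in agents]
```

The third agent is the marginal case: expected gain exactly offsets expected cost, so the treatment decision hinges on how the tie at zero is broken.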

The ex post treatment effect is $Y(1) - Y(0)$. The ex ante effect is $E(Y(1) \mid I) - E(Y(0) \mid I)$. Behavioral or scientific theory motivates the construction of $(Y(0), Y(1))$ and decision rules about treatment assignment.

The statistical approach to causal inference does not model the treatment assignment rule or its relationship to potential outcomes. The econometric approach makes the treatment assignment equation the centerpiece of its focus and considers both objective and subjective valuations as well as ex ante outcomes $(E(Y(1) \mid I), E(Y(0) \mid I), E(C \mid I))$ and ex post outcomes $(Y(1), Y(0), C)$. For this model, the expected effect of treatment for people at the margin of participating is
$$E\left( Y(1) - Y(0) \mid E(Y(1) \mid I) - E(Y(0) \mid I) - E(C \mid I) = 0 \right),$$
the gain to people just indifferent between treatment and no treatment. Distributional treatment effects can also be studied (Heckman, Smith, and Clements, 1997).

Generating Structural Counterfactuals
The traditional model of econometrics is the "all causes" model. Outcomes for treatment assignment are produced from a deterministic mapping of inputs to outputs:
$$y = g(x, u) \qquad (2)$$
where $x$ and $u$ are fixed variables specified by the relevant economic theory. The notation anticipates the distinction between observable ($x$) and unobservable ($u$) that is important in empirical implementation. The two types of variables in (2) enter symmetrically.

$\mathcal{D}$ is the domain of the mapping $g : \mathcal{D} \to R_y$, where $R_y$ is the range of $y$. There may be multiple outcome variables. All outcomes are explained in a functional sense by the arguments of $g$ in (2). If one models the ex post realizations of outcomes, it is entirely reasonable to invoke an all causes model since the realizations are known (ex post) and all uncertainty has been resolved. Implicit in the definition of a function is the requirement that $g$ be stable or invariant to changes in $x$ and $u$ (autonomy; Frisch, 1938). The $g$ function remains stable as its arguments are varied. Invariance is a key property of a causal model.

Equation (2) is sometimes called a Marshallian causal function. If uncertainty is a feature of the environment, (2) can be interpreted as the ex post realization of the counterfactual as uncertainty is resolved. From the point of view of agent $i$ with information set $I_i$, the ex ante expected value of $Y$ is
$$E(Y \mid I_i). \qquad (3)$$

Haavelmo (1943) All Causes Framework
Early work used recursive linear models:
$$\underbrace{Y}_{\text{outcome}} = \underbrace{X}_{\text{cause observed by analyst}} \beta + \underbrace{U}_{\text{cause unobserved by analyst}} \qquad (4)$$
A distinguishing feature of the econometric approach is the explicit modeling of the unobservables that drive outcomes and produce selection problems, and the analysis of their relationship to observables.

Fixing vs. Conditioning
Haavelmo distinguished between fixing and conditioning on $X$.
Conditioning on $X$: $E(Y \mid X = x) = x\beta + E(U \mid X = x)$.
Fixing $X$ at level $X = x$: $X$ is externally manipulated to take value $x$. Fixing $X$ at different levels is a hypothetical manipulation that does not change $U$: $E(Y \mid X = x, U = u)$ (a mental construct).
In Haavelmo (1943): $y = x\beta + u$.
Causality is in the mind: it is a conceptual thought experiment. This is the ceteris paribus clause of Marshall (1890). There is no algorithm for producing conceptual models.
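The contrast between fixing and conditioning can be sketched with a small simulation. This is an illustration under assumed values, not part of the talk: the design $X = V + \omega$, $U = V + \varepsilon$ with independent standard normal $V, \omega, \varepsilon$, the slope $\beta = 2$, and all variable names are hypothetical. Under this design the population least-squares slope is $\beta + \mathrm{Cov}(X, U)/\mathrm{Var}(X) = 2.5$, while fixing $X$ recovers $\beta$.

```python
import random

rng = random.Random(0)      # fixed seed so the run is reproducible
beta, n = 2.0, 20000        # hypothetical slope and sample size

xs, us = [], []
for _ in range(n):
    v = rng.gauss(0.0, 1.0)               # unobserved common cause V
    xs.append(v + rng.gauss(0.0, 1.0))    # X depends on V
    us.append(v + rng.gauss(0.0, 1.0))    # U depends on V as well
ys = [beta * x + u for x, u in zip(xs, us)]

# Conditioning: the least-squares slope of Y on X absorbs E(U | X = x).
mean_x, mean_y = sum(xs) / n, sum(ys) / n
ols_slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))

# Fixing: set X to 1 and then to 0 for everyone, leaving each U unchanged.
fixing_effect = sum((beta * 1.0 + u) - (beta * 0.0 + u) for u in us) / n
```

In this sketch `ols_slope` should land near 2.5 rather than 2, because $X$ and $U$ share the cause $V$; `fixing_effect` equals $\beta$ because the manipulation leaves $U$ untouched.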

Decomposing Unobserved Confounders
Marschak and Andrews (1944) decompose the unobservable:
$$U = \phi V + \varepsilon, \qquad (5)$$
with $V \not\perp\!\!\!\perp X$ and $\varepsilon \perp\!\!\!\perp (V, X)$, so that
$$E(Y \mid X) = X\beta + \underbrace{\phi E(V \mid X)}_{\text{source of confounding}}.$$
All estimators for causal models control for the effects of $V$ (implicitly or explicitly).

2. Linking the Econometric Framework of Haavelmo to Fundamentally Recursive DAG Models
$M_H$: the hypothetical model (H).
$M_E$: the empirical model (E).
$B_E$, $B_H$: the domains of the empirical and hypothetical models.
Autonomy (Frisch, 1938): $y = f_Y(x, u)$ and $x = f_X(v, \omega)$ hold for both models. (Also called structural invariance.)
Example of a hypothetical model: $y = x\beta + u$ (Haavelmo, 1943).

Some Notation
$M(Y)$: parents of $Y$. They directly cause $Y$, i.e.,
1. Variables in $M(Y)$ cause $Y$.
2. Given $M(Y)$, $Y$ is not affected by changes of other variables in $B$ that are not caused by $Y$.
If $M(Y) = \emptyset$, $Y$ is not caused by any variable in the model. In such cases, $Y$ is an external variable.

Construct a Hypothetical Intervention
Define a hypothetical random variable $\tilde{X}$ not caused by other variables in the model. This is a purely mental construct. The goal of causal analysis is to identify its effect in the data.

Model for Hypotheticals
Model H-1: $B_H = \{X, U, \tilde{X}, Y, V\}$, $M_H(Y) = \{\tilde{X}, U\}$, $M_H(U) = \{V, \varepsilon\}$, $M_H(X) = \{V, \Omega\}$.
Variables $\tilde{X}$, $V$, $\varepsilon$, and $\Omega$ are external: $M_H(\tilde{X}) = M_H(V) = M_H(\varepsilon) = M_H(\Omega) = \emptyset$.
$U$, $V$, $\Omega$, $\varepsilon$ are unobserved.
$\varepsilon$ and $\Omega$ play no essential role in producing selection and evaluation problems. They play an important role in constructing and interpreting the probability spaces of random variables.

Figure 1: Mechanisms of Causality. (a) Haavelmo's Hypothetical Model H-1. [Diagram with nodes $\Omega$, $\varepsilon$, $V$, $U$, $X$, $\tilde{X}$, $Y$.]

Empirical Model
Model E-1: $B_E = \{X, U, Y, V\}$, $M_E(Y) = \{X, U\}$, $M_E(U) = \{V, \varepsilon\}$, $M_E(X) = \{V, \Omega\}$.
$V$, $\Omega$, $\varepsilon$ are external: $M_E(V) = \emptyset$, $M_E(\Omega) = \emptyset$, $M_E(\varepsilon) = \emptyset$.
$U$, $V$ are unobserved.

Figure 2: Mechanisms of Causality. (a) Haavelmo's Hypothetical Model H-1, with nodes $\Omega$, $\varepsilon$, $V$, $U$, $X$, $\tilde{X}$, $Y$. (b) Haavelmo's Empirical Model E-1, with nodes $\Omega$, $\varepsilon$, $V$, $U$, $X$, $Y$.
$B_H = B_E \cup \{\tilde{X}\}$.

Formal Definition of Fixing
Definition D-1 (Fixing): We represent the variable $Y \in B_H$ when $\tilde{X} \in B_H$ is fixed at level $x$ according to the hypothetical model $M_H$ by standard statistical conditioning:
$$Y(x) \equiv (Y \mid \tilde{X} = x), \text{ under model } M_H. \qquad (6)$$

Acyclic Models
To connect $M_E$ and $M_H$, invoke the Local Markov Condition (Kiiveri et al., 1984; Pearl, 1988).
$X, Y \in B$ are variables that belong to the domain of the model. Let $G \subseteq B$ be a subset of the model domain.
Variables directly caused by $X \in B$ are termed direct descendants of $X$, denoted by $M^{-1}(X) = \{Y \in B;\ X \in M(Y)\}$.
Let $M^{-1}(G) = \bigcup_{X \in G} M^{-1}(X)$ for $G \subseteq B$.
Higher order descendants of generation $k$: $M^{-(k+1)}(G) = M^{-1}(M^{-k}(G))$.

The overall set of variables caused by $X$ (directly or indirectly) are called descendants of $X$:
$$D(X) = \bigcup_{j=1}^{|B|} M^{-j}(X).$$
Descendants of $X$ are also called internal variables associated with $X$.
In acyclic models, no variable is a descendant of itself, i.e., $X \notin D(X)$ for all $X \in B$.
The set of non-descendant variables of $X$ are called the variables external to $X$ and defined by $L(X) = B \setminus D(X)$.
If a variable $X \in B$ is external to all other variables in $B$, it is termed external to the model and has no parents: $M(X) = \emptyset$.
Simultaneous equations models in econometrics (Haavelmo, 1944) relax this requirement and allow variables to be descendants of themselves.
Recently formulated causal models in statistics (e.g., by Pearl and Rubin) take a step backward from traditional econometric models (Haavelmo, 1944) and are fundamentally recursive.
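The parent, direct-descendant, and descendant sets can be computed mechanically from a table of parents. The following is a minimal sketch using the parent sets of the empirical model E-1; the dictionary layout, the function names, and the labels `"eps"` and `"omega"` for $\varepsilon$ and $\Omega$ are illustrative assumptions.

```python
# Parent sets M(.) for empirical model E-1 (external variables have no parents).
parents = {
    "Y": {"X", "U"},
    "U": {"V", "eps"},
    "X": {"V", "omega"},
    "V": set(),
    "eps": set(),
    "omega": set(),
}


def children(var):
    """Direct descendants: every variable that lists `var` among its parents."""
    return {y for y, pa in parents.items() if var in pa}


def descendants(var):
    """All variables caused by `var`, directly or indirectly."""
    found, frontier = set(), children(var)
    while frontier:
        found |= frontier
        next_generation = set()
        for v in frontier:
            next_generation |= children(v)
        frontier = next_generation - found   # stop once nothing new appears
    return found


# Variables external to the model have an empty parent set.
external = {v for v, pa in parents.items() if not pa}
# Acyclicity: no variable is a descendant of itself.
is_acyclic = all(v not in descendants(v) for v in parents)
```

Subtracting `found` from each new frontier guarantees termination even if a cycle were present, which is what allows the same function to be used for the acyclicity check.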

Local Markov Condition
Variable $Y$ is independent of its non-descendants conditional on its parents:
$$\text{LMC:} \quad \forall\, Y \in B, \quad Y \perp\!\!\!\perp T \mid \{\text{all } X \in M(Y)\} \quad \forall\, T \in B \setminus (D(Y) \cup \{Y\}). \qquad (7)$$
LMC (7) is a necessary and sufficient condition for obtaining a DAG (Pearl, 1988).

3. Nonrecursive (Simultaneous) Models of Causality: Developed in Economics (Haavelmo, 1944)
A system of linear simultaneous equations captures interdependence among outcomes $Y$. It is the standard framework for mediation analyses (Klein and Goldberger, 1955).

Linear model in terms of parameters $(\Gamma, B)$, observables $(Y, X)$, and unobservables $U$:
$$\Gamma Y + BX = U, \qquad E(U) = 0. \qquad (8)$$
$Y$ is a vector of internal and interdependent variables.
$X$ is external and exogenous ($E(U \mid X) = 0$).
$\Gamma$ is a full rank matrix.

This is a linear-in-the-parameters "all causes" model for the vector $Y$. The causes are $X$ and $U$. The structure is $(\Gamma, B), \Sigma_U$, where $\Sigma_U$ is the variance-covariance matrix of $U$. In the Cowles Commission analysis it is assumed that $\Gamma$, $B$, $\Sigma_U$ are invariant to classes of changes in $X$ and modifications of the distribution of $U$. This is autonomy (Frisch, 1938), later defined as part of the SUTVA (1986) assumption.

Linear systems can be generalized. One can postulate a system of equations $G(Y, X, U) = 0$. Conditions for a unique solution of the reduced forms $Y = K(X, U)$ require that certain Jacobian terms be nonvanishing (Matzkin, 2007, 2010). The structural form (8) is an all causes model that relates in a deterministic way outcomes (internal variables) to other outcomes (internal variables) and external variables (the $X$ and $U$).

Are ceteris paribus manipulations associated with the effect of varying some components of $Y$ on other components of $Y$ possible within the model?

Consider a two-agent model of social interactions. $Y_1$ is the outcome for agent 1; $Y_2$ is the outcome for agent 2.

$$Y_1 = \alpha_1 + \gamma_{12} Y_2 + \beta_{11} X_1 + \beta_{12} X_2 + U_1 \qquad (9a)$$
$$Y_2 = \alpha_2 + \gamma_{21} Y_1 + \beta_{21} X_1 + \beta_{22} X_2 + U_2. \qquad (9b)$$
The social interactions model is a version of the standard simultaneous equations problem. This model is sufficiently flexible to capture the notion that the consumption of agent 1 ($Y_1$) depends on the consumption of agent 2 if $\gamma_{12} \neq 0$; on 1's value of $X$, $X_1$ (assumed to be observed), if $\beta_{11} \neq 0$; on 2's value of $X$, $X_2$, if $\beta_{12} \neq 0$; and on unobservable factors that affect 1 ($U_1$). The determinants of 2's consumption are defined symmetrically. $U_1$ and $U_2$ are allowed to be freely correlated.

Assume
$$E(U_1 \mid X_1, X_2) = 0 \qquad (10a)$$
and
$$E(U_2 \mid X_1, X_2) = 0. \qquad (10b)$$
Completeness (here, $1 - \gamma_{12}\gamma_{21} \neq 0$) guarantees that (9a) and (9b) have a determinate solution for $(Y_1, Y_2)$. Applying Haavelmo's (1943) fixing analysis to (9a) and (9b), the causal effect of $Y_2$ on $Y_1$ is $\gamma_{12}$. This is the effect on $Y_1$ of fixing $Y_2$ at different values, holding constant the other variables in the equation.

Under completeness, the reduced form outcomes of the model after social interactions are solved out can be written as
$$Y_1 = \pi_{10} + \pi_{11} X_1 + \pi_{12} X_2 + \varepsilon_1, \qquad (11a)$$
$$Y_2 = \pi_{20} + \pi_{21} X_1 + \pi_{22} X_2 + \varepsilon_2. \qquad (11b)$$

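Least squares can identify the ceteris paribus effects of $X_1$ and $X_2$ on $Y_1$ and $Y_2$ because $E(\varepsilon_1 \mid X_1, X_2) = 0$ and $E(\varepsilon_2 \mid X_1, X_2) = 0$. Simple algebra gives
$$\pi_{11} = \frac{\beta_{11} + \gamma_{12}\beta_{21}}{1 - \gamma_{12}\gamma_{21}}, \qquad \pi_{12} = \frac{\beta_{12} + \gamma_{12}\beta_{22}}{1 - \gamma_{12}\gamma_{21}},$$
$$\pi_{21} = \frac{\gamma_{21}\beta_{11} + \beta_{21}}{1 - \gamma_{12}\gamma_{21}}, \qquad \pi_{22} = \frac{\gamma_{21}\beta_{12} + \beta_{22}}{1 - \gamma_{12}\gamma_{21}},$$
and
$$\varepsilon_1 = \frac{U_1 + \gamma_{12} U_2}{1 - \gamma_{12}\gamma_{21}}, \qquad \varepsilon_2 = \frac{\gamma_{21} U_1 + U_2}{1 - \gamma_{12}\gamma_{21}}.$$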
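The reduced-form algebra can be checked numerically. The sketch below uses exact rational arithmetic and hypothetical parameter values; the intercept expressions `pi10` and `pi20` are not stated in the text and are derived here by the same substitution, so they should be read as an assumption of the sketch.

```python
from fractions import Fraction as F

# Hypothetical structural parameters for equations (9a) and (9b).
a1, a2 = F(1), F(2)
g12, g21 = F(1, 2), F(1, 3)
b11, b12, b21, b22 = F(1), F(3), F(2), F(5)

det = 1 - g12 * g21               # completeness requires det != 0

# Reduced-form slopes from the formulas for pi_11, ..., pi_22.
pi11 = (b11 + g12 * b21) / det
pi12 = (b12 + g12 * b22) / det
pi21 = (g21 * b11 + b21) / det
pi22 = (g21 * b12 + b22) / det
# Intercepts obtained by the same substitution (derived, not in the text).
pi10 = (a1 + g12 * a2) / det
pi20 = (g21 * a1 + a2) / det


def reduced_form(x1, x2, u1, u2):
    """Solve for (Y1, Y2) via the reduced forms (11a) and (11b)."""
    e1 = (u1 + g12 * u2) / det
    e2 = (g21 * u1 + u2) / det
    return (pi10 + pi11 * x1 + pi12 * x2 + e1,
            pi20 + pi21 * x1 + pi22 * x2 + e2)


def structural_gap(x1, x2, u1, u2):
    """Left side minus right side of (9a) and (9b) at the reduced-form solution."""
    y1, y2 = reduced_form(x1, x2, u1, u2)
    gap1 = y1 - (a1 + g12 * y2 + b11 * x1 + b12 * x2 + u1)
    gap2 = y2 - (a2 + g21 * y1 + b21 * x1 + b22 * x2 + u2)
    return gap1, gap2
```

Because `Fraction` arithmetic is exact, both gaps are exactly zero for any inputs whenever the reduced-form expressions are consistent with the structural equations.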

Without any further information on the variances of $(U_1, U_2)$ and their relationship to the causal parameters, we cannot isolate the causal effects $\gamma_{12}$ and $\gamma_{21}$ from the reduced form regression coefficients. This is so because, holding $X_1$, $X_2$, $U_1$, and $U_2$ fixed in (9a) or (9b), it is not in principle possible to vary $Y_2$ or $Y_1$, respectively, because they are exact functions of $X_1$, $X_2$, $U_1$, and $U_2$. This exact dependence holds true even if $U_1 = 0$ and $U_2 = 0$, so that there are no unobservables.

Can we define causality within the model? There is no mechanism yet specified within the model to independently vary the right hand sides of Equations (9a) and (9b). Some economists suggest that the mere fact that we can write (9a) and (9b) means that we can imagine independent variation.

By the same token, we can imagine a model $Y = \varphi_0 + \varphi_1 X_1 + \varphi_2 X_2$. If part of the model is $(*)$ $X_1 = X_2$, no causal effect of $X_1$ holding $X_2$ constant is possible in principle within the rules of the model. If we break restriction $(*)$ and permit independent variation in $X_1$ and $X_2$, we can define the causal effect of $X_1$ holding $X_2$ constant.
The $X$ effects on $Y_1$ and $Y_2$, identified through the reduced forms, combine the direct effects (through $\beta_{ij}$) and the indirect effects (as they operate through $Y_1$ and $Y_2$, respectively). If we assume exclusions ($\beta_{12} = 0$) or ($\beta_{21} = 0$) or both, we can identify the ceteris paribus causal effects of $Y_2$ on $Y_1$ and of $Y_1$ on $Y_2$, respectively, if $\beta_{22} \neq 0$ or $\beta_{11} \neq 0$, respectively.

Thus if $\beta_{12} = 0$, from the reduced form
$$\frac{\pi_{12}}{\pi_{22}} = \gamma_{12}.$$
If $\beta_{21} = 0$, we obtain
$$\frac{\pi_{21}}{\pi_{11}} = \gamma_{21}.$$
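A short numerical check of identification through exclusion, again in exact arithmetic. The parameter values are hypothetical; both exclusions are imposed at once, with $\beta_{11}$ and $\beta_{22}$ nonzero so that the ratios are well defined.

```python
from fractions import Fraction as F

g12, g21 = F(1, 2), F(1, 3)   # hypothetical interaction effects
b11, b22 = F(4), F(5)         # own-X effects, both nonzero
b12, b21 = F(0), F(0)         # exclusion restrictions

det = 1 - g12 * g21

# Reduced-form slopes implied by the structural parameters.
pi11 = (b11 + g12 * b21) / det
pi12 = (b12 + g12 * b22) / det
pi21 = (g21 * b11 + b21) / det
pi22 = (g21 * b12 + b22) / det

# Ratios of reduced-form coefficients recover the structural effects.
g12_recovered = pi12 / pi22
g21_recovered = pi21 / pi11
```

With $\beta_{12} = 0$, $X_2$ moves $Y_1$ only through $Y_2$, so the ratio of its two reduced-form effects isolates $\gamma_{12}$; the argument for $\gamma_{21}$ is symmetric.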

In a general nonlinear model,
$$Y_1 = g_1(Y_2, X_1, X_2, U_1), \qquad Y_2 = g_2(Y_1, X_1, X_2, U_2),$$
exclusion is defined as $\partial g_1 / \partial X_2 = 0$ for all $(Y_2, X_1, X_2, U_1)$ and $\partial g_2 / \partial X_1 = 0$ for all $(Y_1, X_1, X_2, U_2)$.

Assuming the existence of local solutions, we can solve these equations to obtain
$$Y_1 = \varphi_1(X_1, X_2, U_1, U_2), \qquad Y_2 = \varphi_2(X_1, X_2, U_1, U_2).$$
By the chain rule we can write
$$\frac{\partial g_1}{\partial Y_2} = \frac{\partial Y_1 / \partial X_2}{\partial Y_2 / \partial X_2} = \frac{\partial \varphi_1 / \partial X_2}{\partial \varphi_2 / \partial X_2}.$$
We may define and identify causal effects for $Y_1$ on $Y_2$ using partials with respect to $X_1$ in an analogous fashion.

This definition of causal effects in an interdependent system generalizes the recursive definitions of causality featured in the statistical treatment effect literature (Holland, 1988, and Pearl, 2009a). The key to this definition is manipulation of external inputs and exclusion, not randomization or matching.

Econometric Mediation Analysis
Build on Wright (1921, 1934), Klein and Goldberger (1955), and Theil (1958).
The reduced form estimates the net effect of a policy change $X_1$:
$$\frac{\partial Y_1}{\partial X_1} = \frac{\partial \varphi_1(X_1, X_2, U_1, U_2)}{\partial X_1}. \qquad (12)$$
Using this analysis, the system can trivially be used to conduct mediation analyses:
$$\frac{\partial Y_1}{\partial X_1} = \underbrace{\frac{\partial g_1}{\partial Y_2}}_{\substack{\text{identified through} \\ \text{exclusion in structure}}} \underbrace{\frac{\partial Y_2}{\partial X_1}}_{\substack{\text{identified from} \\ \text{reduced form}}} + \underbrace{\frac{\partial g_1}{\partial X_1}}_{\substack{\text{identified} \\ \text{from structure}}} = \frac{\partial \varphi_1(X_1, X_2, U_1, U_2)}{\partial X_1}.$$

A More General DAG Framework for Causal Analysis
Return to recursive models, as does much of the causal analysis literature. Introduce instrumental variables (Reiersöl, 1945).

Features:
1. Observed outcome $Y$;
2. Observed treatment indicator $T$ that causes outcome $Y$;
3. Unobserved variable $U$ that causes outcome $Y$;
4. Unobserved transmission variable $V$ that causes treatment indicator $T$;
5. Observed external variable $Z$ that causes treatment indicator $T$ (random assignment might generate $Z$);
6. Observed variable $X$ that causes $T$ and $Y$, and might be a descendant of $V$;
7. Hypothetical variable $\tilde{T}$ that replaces $T$ as a cause of $Y$.
Without loss of generality, assume that $V$, $U$ designate unobserved random vectors generating variables $Y$, $T$, and $X$.
Structural model: $y = f_Y(t, u, x)$, $u = f_U(v, \varepsilon)$, $x = f_X(v, \omega)$, $t = f_T(z, v, x)$.

Model H-2 (Extended Model for Hypotheticals; $\Omega$ and $\varepsilon$ kept implicit):
$B_H = \{Y, V, T, Z, X, U, \tilde{T}\}$, $M_H(Y) = \{X, U, \tilde{T}\}$, $M_H(X) = \{V\}$, $M_H(T) = \{V, Z, X\}$, $M_H(U) = \{V\}$.
Variables $Z$, $V$, $\tilde{T}$ are external, that is, $M_H(Z) = M_H(\tilde{T}) = M_H(V) = \emptyset$.
The causal effect of $\tilde{T}$ on $Y$ can be moderated by $X$.
Model E-2 (Extended Empirical Model):
$B_E = \{Y, V, T, Z, X, U\}$, $M_E(Y) = \{X, U, T\}$, $M_E(X) = \{V\}$, $M_E(T) = \{V, Z, X\}$, $M_E(U) = \{V\}$.
Variables $Z$, $V$ are external, that is, $M_E(V) = M_E(Z) = \emptyset$.

Figure 3: Mechanisms of Causality. (a) Hypothetical Model H-2, with nodes $X$, $V$, $U$, $Z$, $T$, $Y$, $\tilde{T}$. (b) Empirical Model E-2, with nodes $X$, $V$, $U$, $Z$, $T$, $Y$.

Models where $\tilde{T}$ and $T$ cause $U$: Generalized Roy Model.
Hypothetical model: $M_H(U) = \{V, \tilde{T}\}$, with the remaining relations as defined in Model H-2.
Empirical model: $M_E(U) = \{V, T\}$, with the remaining relations as defined in Model E-2.

Figure 4: Mechanisms of Causality. (a) Hypothetical Model H-2, with nodes $X$, $V$, $U$, $Z$, $T$, $Y$, $\tilde{T}$. (b) Empirical Model E-2, with nodes $X$, $V$, $U$, $Z$, $T$, $Y$.

Suppose we fix the hypothetical variable $\tilde{T}$ to take exactly the values that $T$ takes in the empirical model. By autonomy, the distribution of $Y$ in both models must be the same. Thus,
$$Y \overset{d}{=} Y(\tilde{T} = T).$$
The left-hand side is a variable in $M_E$, the right-hand side a variable in $M_H$. $\tilde{T} = T$ means that we are fixing $\tilde{T}$ on the values $T$ takes.
For discrete $T$:
$$Y \overset{d}{=} Y(\tilde{T} = T) \overset{d}{=} \sum_{t \in \mathrm{supp}(T)} Y(t)\, \mathbf{1}[T = t]. \qquad (13)$$

Quandt (1958) switching regression model. In the binary treatment case:
$$Y \overset{d}{=} Y(1)T + Y(0)(1 - T), \qquad U \overset{d}{=} U(1)T + U(0)(1 - T).$$
Continuum of treatment, $T \in [0, 1]$:
$$Y = \int_0^1 Y(t)\, \mathbf{1}(T = t)\, dF_T(t).$$
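For a discrete treatment, the observed outcome simply selects the potential outcome that matches the realized treatment. A minimal sketch with made-up potential outcomes for three people (the data and function name are hypothetical):

```python
def observed_outcome(potential, t_realized):
    """Sum over t of Y(t) * 1[T = t], as in equation (13)."""
    return sum(y_t * (1 if t == t_realized else 0)
               for t, y_t in potential.items())


# Each entry: (potential outcomes {t: Y(t)}, realized treatment T).
people = [
    ({0: 3.0, 1: 5.0}, 1),
    ({0: 4.0, 1: 4.5}, 0),
    ({0: 1.0, 1: 6.0}, 1),
]

observed = [observed_outcome(pot, t) for pot, t in people]
# Binary-case switching regression form: Y = Y(1) T + Y(0) (1 - T).
switching = [pot[1] * t + pot[0] * (1 - t) for pot, t in people]
```

The two lists coincide in the binary case, which is the sense in which the switching regression is a special case of the discrete formula.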

Benefit of the More General Framework
It allows for the analysis of heterogeneity in both slopes and intercepts (e.g., Generalized Roy Model, Heckman and Vytlacil, 2007a).
In the linear setup of Haavelmo (1943):
$$Y(t) = X(t)\beta(t) + U(t), \qquad t \in [0, 1] \text{ or } t \in \{0, 1\}.$$

Mean Causal Effects
Average treatment effect (ATE) conditional on $X$ when the hypothetical variable $\tilde{T}$ is fixed at values $t$ and $t'$:
$$E(Y(t) - Y(t') \mid X = x) = E(Y \mid \tilde{T} = t, X = x) - E(Y \mid \tilde{T} = t', X = x); \qquad t \neq t'.$$
More general formula for total causal effects:
$$E(Y(t) - Y(t') \mid K, X = x) = E(Y \mid \tilde{T} = t, K, X = x) - E(Y \mid \tilde{T} = t', K, X = x),$$
where $K$ is some event based on variables not caused by $\tilde{T}$.

A variety of distributional treatment effects, e.g., the percentage who benefit from treatment (net and gross; ex ante and ex post). Benefits to the bottom percentile.

Identification Strategy
Identification of treatment effects requires that we connect models $M_H$ and $M_E$ in a fashion that allows us to evaluate causal parameters of model $M_H$ using data generated by model $M_E$.

4. Identification
One set of identifying conditions:
Assumption A-1: $Z$ is a non-degenerate random variable conditional on $X$.
Assumption A-2: Counterfactual outcomes $Y$ have finite first moments.
Assumption A-3: The population contains both a treatment and a control group for each $X$, that is, $0 < \Pr(T = 1 \mid X = x) < 1$ for all $x \in \mathrm{supp}(X)$.
Assumption A-4: $V$, $U$ are absolutely continuous with respect to Lebesgue measure.

All valid strategies in the literature control for Haavelmo's $V$ in some form or other.

Table 2: Alternative Approaches to Identifying Treatment Effects by Eliminating V: Binary Treatment Case

$(Y(1), Y(0)) \perp\!\!\!\perp T \mid X, V$; $T \in \{0, 1\}$; $Y = TY(1) + (1 - T)Y(0)$;
$$E(Y \mid \tilde{T} = t, X = x) = \int E(Y \mid T = t, X = x, V = v)\, dF_{V \mid X = x}(v)$$

| Method | Need instrument (Z)? | Identify distribution of V? | $V \perp\!\!\!\perp X$? $V \perp\!\!\!\perp Z$? |
|--------|----------------------|-----------------------------|------------------------------------------------|
| Matching (a): $V$, $X$ known | No | Yes ($V$ observed) | No |
| Control Functions (b): $V$ estimated, $X$, $Z$ known (continuous $T$); bounds on quantiles of $V$ estimated (discrete case) | Yes | Yes (over support) | $V \perp\!\!\!\perp X, Z$ |
| Factor Method (c): distribution of $V$ estimated from additional measurements of $V$ ($M$) | No | Yes (with auxiliary measurements over support) | Typically $V \perp\!\!\!\perp X$ (not strictly required) |
| IV LATE (d): $Z$, $X$ known | Yes | Estimate intervals of quantiles of $V$ (Heckman and Vytlacil, 1999, 2005) and conditions on them | No; $V \perp\!\!\!\perp Z$ |

Table 3: Alternative Approaches to Identifying Treatment Effects by Eliminating V: Binary Treatment Case (continued)

$(Y(1), Y(0)) \perp\!\!\!\perp T \mid X, V$; $T \in \{0, 1\}$; $Y = TY(1) + (1 - T)Y(0)$;
$$E(Y \mid \tilde{T} = t, X = x) = \int E(Y \mid T = t, X = x, V = v)\, dF_{V \mid X = x}(v)$$

| Method | Need instrument (Z)? | Identify distribution of V? | $V \perp\!\!\!\perp X$? $V \perp\!\!\!\perp Z$? |
|--------|----------------------|-----------------------------|------------------------------------------------|
| IV LIV (e): $Z$, $X$ known | Yes | Shrink interval of quantiles of $V$ to a point using continuous instruments and conditions on them | No; $V \perp\!\!\!\perp Z$ |
| Stratification (f): $Z$, $X$ known | Instruments give restrictions on strata (balancing scores for $V$) | Identify distribution of strata, which places interval bounds on $V$, and conditions on them | $V \perp\!\!\!\perp Z$ (for strata) |
| Mixing Distributions (g): estimate $V$, $P(V)$ for discrete mixtures (intervals of $V$) | No | Yes (integral equation) | Yes |

4.a Matching: V is Known
Lemma L-1 (Matching): Let $G \subset B_H$ be a set of (matching) variables not caused by $T$, such that, under model $M_H$, $T \perp\!\!\!\perp Y(t) \mid G$ for all $t \in \mathrm{supp}(T)$. Then $E(Y(t) \mid G)$ under $M_H$ is equal to $E(Y \mid T = t, G)$ under $M_E$.
In matching, it is assumed that the matching variables are observed without error.
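The matching lemma can be illustrated with an exact finite-population calculation. The design below is hypothetical: a binary $V$ with $\Pr(V = 1) = 1/2$, treatment probabilities $\Pr(T = 1 \mid V)$ of 1/4 and 3/4, and a deterministic outcome $Y = 2T + 4V$, so the true effect of treatment is 2.

```python
from fractions import Fraction as F

p_v = {0: F(1, 2), 1: F(1, 2)}             # Pr(V = v)
p_t1_given_v = {0: F(1, 4), 1: F(3, 4)}    # Pr(T = 1 | V = v)


def outcome(t, v):
    """Hypothetical outcome equation Y = 2 T + 4 V."""
    return 2 * t + 4 * v


# Joint distribution Pr(T = t, V = v).
joint = {
    (t, v): p_v[v] * (p_t1_given_v[v] if t == 1 else 1 - p_t1_given_v[v])
    for t in (0, 1) for v in (0, 1)
}


def mean_y_given_t(t):
    """E(Y | T = t) in the observed (empirical) population."""
    pr_t = sum(joint[(t, v)] for v in (0, 1))
    return sum(outcome(t, v) * joint[(t, v)] for v in (0, 1)) / pr_t


# Naive contrast: compares groups with different distributions of V.
naive = mean_y_given_t(1) - mean_y_given_t(0)

# Matching on V: contrast within each v, then average over Pr(V = v).
matched = sum((outcome(1, v) - outcome(0, v)) * p_v[v] for v in (0, 1))
```

The naive contrast mixes the treatment effect with the difference in $V$ between treated and untreated groups; conditioning on $V$ removes that difference.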

Lemma L-2: Under Model H-2, $Y(t) \perp\!\!\!\perp T \mid V, X$. Alternatively: $Y \perp\!\!\!\perp T \mid V, \tilde{T} = t, X$.

$V$ may not be observed. Proxy $V$: matching on mismeasured variables (Carneiro, Hansen, and Heckman, 2003). Correct for measurement error in the proxies (Conti, Heckman, Pinger, and Zanolini, 2009, revised 2012).

4.b LIV and LATE Identification
Heckman and Vytlacil (2005) show how to identify $E(Y \mid T = t, V, X)$ from observed data, $E(Y \mid T = t, X)$, when variable $V$ is unobserved.
Under assumptions (A-1) to (A-4), and under the assumption of Model H-2, using weak separability between observables and unobservables, one can establish the following relationship between $V$ and the instrumental variable $Z$:
Lemma L-3: Under Model H-2, $V \perp\!\!\!\perp Z \mid X$.
Their LIV estimator identifies $E(Y \mid T = t, V, X)$ for all quantiles of $V$ in the support of $P(T = 1 \mid Z)$.
The Heckman-Vytlacil separability assumption implies and is implied by the LATE assumptions of Imbens and Angrist (1994).
LATE identifies $E(Y \mid T = t, \underline{V} \leq V \leq \overline{V}, X)$, where $\underline{V}$ and $\overline{V}$ are identified from values of $\Pr(T = 1 \mid Z)$.

4.c Control Function Approach
The method goes back to Telser (1964). See Blundell and Powell (2003) for a survey. Estimate $V$ using the relationship that follows from Lemma L-3 and auxiliary equations that relate $V$ to $T$, $X$, and $Z$. These equations are called control functions.

4.d Stratification (Robins and Greenland, 1992)
Popularized and expanded on by Frangakis and Rubin (2002).

Definition D-2 (Stratification Variable): Let $Z$ be a discrete-valued random variable, where $\mathrm{supp}(Z) = \{z_1, \ldots, z_{N_Z}\}$. Define the strata variable $S$ as an ordered random vector of treatment assignments $T$ when $T$ is conditioned on $Z = z_1, \ldots, Z = z_{N_Z}$:
$$S = [(T \mid Z = z_1), \ldots, (T \mid Z = z_{N_Z})].$$

Example of Stratification Matrix for the Binary Case, $T \in \{0, 1\}$
For $\#(Z) = 3$, the matrix $A$ that represents all possible strata ($S$) is:
$$A = \begin{array}{c|cccccccc} & s_1 & s_2 & s_3 & s_4 & s_5 & s_6 & s_7 & s_8 \\ \hline z_1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ z_2 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ z_3 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \end{array}$$
Each column represents one stratum. Each row associates the strata with the instrument value $Z = z_j$. The presence of elements 0 and 1 in every row implies that $0 < \Pr(T = 1 \mid Z = z) < 1$ for all $z \in \mathrm{supp}(Z)$.
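The matrix of all possible strata can be generated by enumeration. A minimal sketch for three instrument values and binary treatment; the variable names are illustrative, and the tuple reversal is only there to reproduce the column order shown in the example.

```python
from itertools import product

n_z = 3  # number of instrument values z_1, ..., z_{n_z}

# Each stratum lists the treatment taken at z_1, ..., z_{n_z}.
# Reversing each tuple makes the z_1 entry vary fastest across strata.
strata = [combo[::-1] for combo in product([1, 0], repeat=n_z)]

# Row j of A collects the response of every stratum at the (j+1)-th instrument value.
A = [[s[j] for s in strata] for j in range(n_z)]
```

Here `strata[0]` is the always-treated response type `(1, 1, 1)` and `strata[-1]` is the never-treated type `(0, 0, 0)`.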

Idea
Partition the sample space of the strata so that $T \perp\!\!\!\perp Y(t)$ for each subset of the sample space. Identify counterfactuals with each subset.

A Fundamental Problem With Stratification
A basic problem is that the proposed partitions that guarantee independence are not known. Use the concept of stratification to generate a partition that has the desired conditional independence property of Lemma L-1. This identification strategy is a version of mixture models (Rao, 1992) and belongs to a broader class of identification methods which we introduce.

Simplify the argument further: keep $X$ and $U$ implicit.
Model H-3 (Simplified Model for Hypotheticals): $B_H = \{Y, T, \tilde{T}, Z, V\}$, $M_H(Y) = \{\tilde{T}, V\}$, $M_H(T) = \{Z, V\}$, $M_H(Z) = M_H(V) = M_H(\tilde{T}) = \emptyset$. $V$ is unobserved.
Model E-3 (Simplified Empirical Model): $B_E = \{Y, T, Z, V\}$, $M_E(Y) = \{T, V\}$, $M_E(T) = \{Z, V\}$, $M_E(Z) = M_E(V) = \emptyset$. $V$ is unobserved.

[Figure 5: Mechanisms of Causality. Panel (a): Hypothetical Simplified Model H-3, with nodes $V$, $Z$, $T$, $\tilde{T}$, $Y$. Panel (b): Empirical Simplified Model E-3, with nodes $V$, $Z$, $T$, $Y$.]

Causal Models with Strata
Model H-4 (Hypothetical Strata Model): $B_H = \{Y, T, \tilde{T}, Z, V, S\}$, $M_H(Y) = \{\tilde{T}, V\}$, $M_H(T) = \{Z, S\}$, $M_H(S) = \{V\}$, $M_H(Z) = M_H(V) = M_H(\tilde{T}) = \emptyset$. $S$ and $V$ are unobserved variables.
Model E-4 (Empirical Strata Model): $B_E = \{Y, T, Z, V, S\}$, $M_E(Y) = \{T, V, Z\}$, $M_E(T) = \{Z, S\}$, $M_E(S) = \{V\}$, $M_E(Z) = M_E(V) = \emptyset$. $S$ and $V$ are unobserved variables.

[Figure 6: Mechanisms of Causality. Panel (a): Hypothetical Strata Model H-4, with nodes $S$, $V$, $Z$, $T$, $\tilde{T}$, $Y$. Panel (b): Empirical Strata Model E-4, with nodes $S$, $V$, $Z$, $T$, $Y$.]

Strata allow us (under certain conditions) to control for $V$.

Properties of Stratification Procedures
If $\mathrm{supp}(T) = \{t_1, \ldots, t_{N_T}\}$, the partition is finite: there are $N_T^{N_Z}$ possible strata.
Write $\mathrm{supp}(S) = \{s_1, \ldots, s_{N_S}\}$ for the support of $S$, i.e., all strata $s$ such that $\Pr(S = s) > 0$.
Key intuition: we can partition the support of $V$ on the basis of strata, because each value $v$ of $V$ is associated with a unique stratum $s \in \mathrm{supp}(S)$ by construction. This is a consequence of the structural relationship determining treatment: $t = f_T(z, v, x)$.
Specifically, $\mathrm{supp}(V)$ is the union of the disjoint sets $S_n = \{v \in \mathrm{supp}(V) : (S \mid V = v) = s_n\}$ as the stratum $s_n$ varies across $\mathrm{supp}(S)$. In other words, we partition the sample space on the basis of the response type $s \in \mathrm{supp}(S)$. In this notation, the events $S = s_n$ and $V \in S_n$ are equivalent.
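The counting argument can be illustrated with a minimal sketch. It enumerates every conceivable stratum as a map from instrument values to treatment values; the function name `all_strata` and the integer coding of treatments are illustrative assumptions, not notation from the slides.

```python
from itertools import product

def all_strata(n_t: int, n_z: int):
    """Enumerate every conceivable stratum.

    A stratum is a tuple (t(z_1), ..., t(z_NZ)) giving the treatment a unit
    would take at each instrument value. Treatments are coded 0..n_t-1
    (an illustrative coding, not taken from the slides).
    """
    return list(product(range(n_t), repeat=n_z))

strata = all_strata(n_t=2, n_z=3)
print(len(strata))  # number of possible strata, N_T ** N_Z
```

Only the strata with positive probability belong to the support of S, so the enumerated list is an upper bound on N_S.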

Principal strata allow us (under certain conditions) to control for $V$. They form a coarse partition of the support of $V$.

Both $V$ and the strata variable $S$ play the role of conditioning variables in matching as defined in Lemma L-1.
Theorem T-1. In Model H-4, $Y(t) \perp\!\!\!\perp T \mid V$, but also $Y(t) \perp\!\!\!\perp T \mid S$.
Lemma L-4. In Model H-4, $Z \perp\!\!\!\perp S$, $Y(t) \perp\!\!\!\perp Z \mid S$, and $Y(t) \perp\!\!\!\perp T \mid (S, Z)$.
Lemma L-4 shows that the strata variable $S$ of Definition D-2 is independent of the instrument $Z$. From Lemma L-1:
$$E(Y(t)) = \sum_{n=1}^{N_S} E(Y \mid T = t, S_n) \Pr(S_n). \qquad (14)$$

The Strata Identification Problem
Identification problem for mean treatment effects: the identification of $\{E(Y(t) \mid S_n), \Pr(S_n)\}_{n=1}^{N_S}$ from the observed $\{E(Y \mid T = t, Z = z_j), \Pr(T = t \mid Z = z_j)\}_{j=1}^{N_Z}$ for $t \in \mathrm{supp}(T)$.

Identification Based on Strata

Lemma L-5. In Model H-4, $Y(t) \perp\!\!\!\perp Z \mid S$.
Lemma L-6. In Model H-4, $Y(t) \perp\!\!\!\perp T \mid (S, Z)$.

$P_Z(t) = [\Pr(T = t \mid Z = z_1), \ldots, \Pr(T = t \mid Z = z_{N_Z})]$ (propensity scores)
$Q_Z(t) = [E(Y \mid T = t, Z = z_1), \ldots, E(Y \mid T = t, Z = z_{N_Z})]$ (data)
$P_S = [\Pr(S_1), \ldots, \Pr(S_{N_S})]$ (hypothetical)
$Q_S(t) = [E(Y(t) \mid S_1), \ldots, E(Y(t) \mid S_{N_S})]$ (hypothetical)

Definition D-3 (Strata Matrix). Define the strata matrix $A$ as the matrix whose columns are the elements of the support of $S$, $\{s_1, \ldots, s_{N_S}\}$: $A = [s_1, \ldots, s_{N_S}]$. Its dimension is $N_Z \times N_S$.
Define the matrix $A_t = 1[A = t]$, $t \in \mathrm{supp}(T)$; i.e., $A_t$ equals 1 for elements of $A$ that take the value $t$ and zero otherwise.
We write $A[j, n]$ for the element in the $j$-th row and $n$-th column of $A$, $A[\cdot, n]$ for the $n$-th column, and $A[j, \cdot]$ for the $j$-th row. We use $\mathrm{rank}(A)$ for the column rank of $A$.
Elements of the strata matrix are deterministic: $A[j, n] = (T \mid Z = z_j, S_n)$; $n \in \{1, \ldots, N_S\}$, $j \in \{1, \ldots, N_Z\}$.

From Lemma L-4:
$$\underbrace{\Pr(T = t \mid Z = z_j)}_{\text{observed}} = \underbrace{A_t[j, \cdot]}_{\text{can be constructed from strata}} \; \underbrace{P_S}_{\text{unknown probabilities of the strata}} \qquad (15)$$
From Equation (15):
$$P_S = A_T^+ \, [P_Z(0)', \ldots, P_Z(N_T)']' \qquad (16)$$
$A_T^+$ is the Moore-Penrose inverse of the stacked matrix $A_T = [A_0', \ldots, A_{N_T}']'$.
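Equations (15) and (16) can be checked numerically. The sketch below assumes a hypothetical strata matrix with three instrument values and four strata, and hypothetical stratum probabilities `P_S_true` that are used only to generate the "observed" propensity scores; none of these numbers come from the slides.

```python
import numpy as np

# Hypothetical strata matrix: rows are instrument values z_1..z_3,
# columns are strata s_1..s_4, entries are the treatment taken (0 or 1).
A = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 0]])

A1 = (A == 1).astype(float)   # A_t for t = 1
A0 = (A == 0).astype(float)   # A_t for t = 0
A_T = np.vstack([A0, A1])     # stacked matrix [A_0; A_1]

# Hypothetical stratum probabilities, used only to generate the data.
P_S_true = np.array([0.2, 0.3, 0.4, 0.1])

# Equation (15): Pr(T = t | Z = z_j) = A_t[j, :] @ P_S.
P_Z0 = A0 @ P_S_true
P_Z1 = A1 @ P_S_true

# Equation (16): recover P_S with the Moore-Penrose inverse of A_T.
P_S_hat = np.linalg.pinv(A_T) @ np.concatenate([P_Z0, P_Z1])
print(np.round(P_S_hat, 6))
```

Recovery is exact here because the stacked matrix has full column rank; with more strata than independent rows the pseudo-inverse would return only one of many solutions.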

Lemma L-7. In Model H-4, $Y(t) \perp\!\!\!\perp Z \mid S$.
Lemma L-8. In Model H-4, $Y(t) \perp\!\!\!\perp T \mid (S, Z)$.

From Lemmas L-1 and L-4, Equation (15), and the definition of $A_t$:
$$\underbrace{E(Y \mid T = t, Z = z_j)}_{\text{observed}} = \frac{\sum_{n=1}^{N_S} \overbrace{A_t[j, n]}^{\text{can be constructed from strata}} \; \overbrace{E(Y(t) \mid S_n) \Pr(S_n)}^{\text{unobserved}}}{\underbrace{\Pr(T = t \mid Z = z_j)}_{\text{observed}}}. \qquad (17)$$

Theorem T-2. $\{E(Y(t) \mid S_n)\}_{n=1}^{N_S}$ are identified if and only if $\mathrm{rank}(A_t) = N_S$ for each $t \in \mathrm{supp}(T)$.
From Equation (14):
$$E(Y(t)) = \iota_{N_S}' \big(P_S \odot Q_S(t)\big) = \iota_{N_S}' A_t^+ \big(Q_Z(t) \odot P_Z(t)\big) \qquad (18)$$
$\odot$ is the Hadamard product (element-wise multiplication).
$A_t^+$ is the Moore-Penrose inverse of the matrix $A_t$.
$\iota_{N_S}$ is an $N_S$-dimensional vector whose elements are all 1.
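A small numerical check of Equation (18) for $t = 1$ may help. The strata matrix, stratum probabilities, and stratum means below are hypothetical values chosen so that $\mathrm{rank}(A_1) = N_S$; the observed vectors are generated from them through Equations (15) and (17).

```python
import numpy as np

# Hypothetical A_1: rows are z_1..z_3, columns are strata s_1..s_3.
A1 = np.array([[1., 0., 0.],
               [1., 1., 0.],
               [1., 1., 1.]])
P_S = np.array([0.2, 0.3, 0.5])     # hypothetical Pr(S_n)
Q_S1 = np.array([1.0, 2.0, 3.0])    # hypothetical E(Y(1) | S_n)

# "Observed" quantities implied by Equations (15) and (17).
P_Z1 = A1 @ P_S                     # Pr(T = 1 | Z = z_j)
Q_Z1 = (A1 @ (Q_S1 * P_S)) / P_Z1   # E(Y | T = 1, Z = z_j)

# Rank condition of Theorem T-2 for t = 1.
assert np.linalg.matrix_rank(A1) == A1.shape[1]

# Equation (18): invert the linear system, then sum over strata.
strata_products = np.linalg.pinv(A1) @ (Q_Z1 * P_Z1)  # E(Y(1)|S_n) Pr(S_n)
EY1 = strata_products.sum()                           # E(Y(1))
Q_S1_hat = strata_products / P_S                      # E(Y(1) | S_n)
print(EY1, Q_S1_hat)
```

In this example the matrix for $t = 0$, `1 - A1`, has rank 2, so the same inversion would not recover every $E(Y(0) \mid S_n)$.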

Theorem T-3 (Identification of stratum probabilities). $\{\Pr(T = t \mid S_n)\}_{n=1}^{N_S}$ are identified if and only if $\mathrm{rank}(A_t) = N_S$ for each $t \in \mathrm{supp}(T)$.
Theorem T-4. $\{E(Y(t) \mid S_n)\}_{n=1}^{N_S}$ are identified if and only if $\mathrm{rank}(A_t) = N_S$ for each $t \in \mathrm{supp}(T)$.
Corollary C-1. If $\mathrm{rank}(A_t) = N_S$, then $E(Y(t))$ is identified.

Key Requirement for Identification
The identification of strata treatment effects relies on assumptions that restrict the number of elements in the support of $S$. To achieve this end, one must draw on hypothetical models that restrict how $Z$ and $V$ interact to generate the treatment assignment $T$.
While the identification of strata treatment effects requires that $A_t$ be invertible, the identification of the average effect when treatment is set to $t$, namely $E(Y(t))$, requires less restrictive conditions.

Stratification and Mixture Distributions

Define a probability measure $P_\Lambda$ on $(I, \mathcal{Q})$ for a probability measure $\Lambda$ on $(\Theta, \Xi)$ by the following transformation:
$$P_\Lambda(Q) = \int_{\Theta} P_\theta(Q) \, d\Lambda(\theta); \quad Q \in \mathcal{Q}. \qquad (19)$$
Theorem T-5. A necessary and sufficient condition for identifying all finite mixing distributions (e.g., $\Lambda$) associated with a family of probability measures $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ is that the probabilities $\{P_\theta, \theta \in \Theta\}$ are linearly independent as functions on $\mathcal{Q}$.
Linear independence of $\{P_\theta, \theta \in \Theta\}$ translates to $\mathrm{rank}(A) = \#(\mathrm{supp}(S))$.

In our notation, $\Theta = \mathrm{supp}(S)$, and $\Xi$ is a $\sigma$-algebra associated with $\mathrm{supp}(S)$. Let the family of distributions be $\mathcal{P} = \{P_{s_n}, s_n \in \mathrm{supp}(S)\}$, where $P_{s_n}$ is the distribution of $(T \mid S = s_n)$ defined by the following probabilities:
$$\Pr(T = t \mid S = s_n) = [\Pr(Z = z_1), \ldots, \Pr(Z = z_{N_Z})] \, A_t[\cdot, n]; \quad t \in \mathrm{supp}(T).$$

The mixing distribution $\Lambda$ represents a distribution of $S$, say $\Lambda = \{\Pr(S = s_n)\}_{n=1}^{N_S}$. In this case, $P_\Lambda$ is given by:
$$P_\Lambda = \sum_{n=1}^{N_S} \Pr(S = s_n) P_{s_n}.$$
Theorem T-5 states that for $\Lambda$ to be identified, $\{P_\theta, \theta \in \Theta\}$ must be linearly independent. In other words, $A_T$ must have column rank equal to the number of elements in $\mathrm{supp}(S)$, as stated in Theorem T-3.

Example for Binary Treatment

Binary treatment: $N_T = 2$.
Suppose $0 < \Pr(T = 1 \mid Z = z) < 1$ for each $z \in \mathrm{supp}(Z)$.
The matrix of all possible strata: for $i = 1, \ldots, N_Z$ and $n = 1, \ldots, 2^{N_Z}$,
$$A[i, n] = \mathrm{odd}\big(\lceil n / 2^{\,i-1} \rceil\big).$$
$\lceil a \rceil$ is the smallest integer greater than or equal to $a$.
$\mathrm{odd}(a) = 1$ if $a$ is an odd number and zero otherwise.
$A_1 = A$ and $A_0 = \iota_{N_Z} \iota_{N_S}' - A$, where $N_S = 2^{N_Z}$ is the number of columns of $A$.
$N_Z$ is the cardinality of the support of the random variable $Z$: $N_Z = \#(Z)$.

Example of Stratification Matrix for the Binary Case, $T \in \{0, 1\}$
For $\#(Z) = 3$, the matrix $A$ that represents all possible strata ($S$) is:
$$A = \begin{array}{c|cccccccc} & s_1 & s_2 & s_3 & s_4 & s_5 & s_6 & s_7 & s_8 \\ \hline z_1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ z_2 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ z_3 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \end{array}$$
Each column represents one stratum. Each row $j$ associates the strata with the instrument value $Z = z_j$. The presence of both 0 and 1 in every row implies that $0 < \Pr(T = 1 \mid Z = z) < 1$ for all $z \in \mathrm{supp}(Z)$.
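The example matrix can be generated programmatically. The sketch below assumes the closed form $A[i, n] = \mathrm{odd}(\lceil n / 2^{i-1} \rceil)$ with 1-based indices, which is the form that reproduces the column ordering of the $\#(Z) = 3$ example; the function name `binary_strata_matrix` is illustrative.

```python
import math
import numpy as np

def binary_strata_matrix(n_z: int) -> np.ndarray:
    """Return the N_Z x 2**N_Z matrix of all binary strata.

    Entry (i, n), with 1-based indices, is odd(ceil(n / 2**(i-1))). This
    closed form is an assumption chosen to reproduce the column ordering
    of the #(Z) = 3 example.
    """
    a = np.zeros((n_z, 2 ** n_z), dtype=int)
    for i in range(1, n_z + 1):
        for n in range(1, 2 ** n_z + 1):
            a[i - 1, n - 1] = math.ceil(n / 2 ** (i - 1)) % 2
    return a

A = binary_strata_matrix(3)
A1 = A                      # indicator that the stratum takes T = 1
A0 = np.ones_like(A) - A    # indicator that the stratum takes T = 0
print(A)
```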

Monotonicity
Without further assumptions, treatment effects cannot be identified because $\mathrm{rank}(A) \le N_Z < N_S = 2^{N_Z}$.
To identify treatment effects, the number of strata in the support of $S$ must be reduced.
Imbens and Angrist (1994) use a monotonicity assumption that, in this framework, reduces the number of strata.
Without loss of generality, assume $\Pr(T = 1 \mid Z = z_j) > \Pr(T = 1 \mid Z = z_{j'})$ whenever $j > j'$.
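The rank shortfall is easy to see numerically. This sketch enumerates all binary strata for three instrument values (in a different column order from the slide's example, which does not affect rank) and compares the number of strata with the rank of $A$ and of the stacked matrix $[A_0; A_1]$; the variable names are illustrative.

```python
from itertools import product
import numpy as np

n_z = 3
# Columns are all 2**n_z binary response patterns (T at z_1, ..., T at z_NZ).
A = np.array(list(product([1, 0], repeat=n_z))).T
A_T = np.vstack([1 - A, A])   # stacked indicators [A_0; A_1]

n_strata = A.shape[1]
rank_A = np.linalg.matrix_rank(A)
rank_A_T = np.linalg.matrix_rank(A_T)
print(n_strata, rank_A, rank_A_T)
```

Both ranks fall short of the number of strata, so the linear systems linking observed and stratum-level quantities have many solutions unless some strata are ruled out.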

Assumption A-5 (Monotonicity). For all $z_j, z_{j'} \in \mathrm{supp}(Z)$: $j > j' \Rightarrow (T \mid Z = z_j) \ge (T \mid Z = z_{j'})$.
Monotonicity Assumption A-5 does not imply that $T(z)$ is non-increasing in $z$. Instead, it states that $(T \mid Z = z_j, S_n) \ge (T \mid Z = z_{j'}, S_n)$ for all $n \in \{1, \ldots, N_S\}$ and $j > j'$; $j, j' \in \{1, \ldots, N_Z\}$.
Lemma L-9. Under Monotonicity Assumption A-5, $A$ is lower triangular.

Example of Stratification Matrix Under Monotonicity
Under Monotonicity Assumption A-5, if $\#(Z) = 3$, the admissible strata matrix $A$ that comprises all strata in $\mathrm{supp}(S)$ is:
$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}.$$
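As a sketch of how monotonicity prunes the strata, the code below keeps only the binary response patterns that are weakly increasing in the instrument index and orders them from "always treated" to "never treated". The function name `monotone_strata_matrix` and the column ordering rule are illustrative choices.

```python
from itertools import product
import numpy as np

def monotone_strata_matrix(n_z: int) -> np.ndarray:
    """Columns are the binary strata with T weakly increasing in the index j."""
    cols = [s for s in product([0, 1], repeat=n_z)
            if all(s[j] <= s[j + 1] for j in range(n_z - 1))]
    # Order columns from "always treated" to "never treated".
    cols.sort(key=sum, reverse=True)
    return np.array(cols).T

A_mono = monotone_strata_matrix(3)
print(A_mono)
print(A_mono.shape[1], np.linalg.matrix_rank(A_mono))
```

Only $N_Z + 1$ of the $2^{N_Z}$ patterns survive, which is what makes the rank conditions attainable.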

Previous Literature
The analysis of Vytlacil (2002) shows that under Monotonicity Assumption A-5, the treatment assignment function is separable in terms of the propensity score and the unobserved variable $V$.
There exists a function $\tau(V)$ such that treatment assignments $T$ are defined by $T = 1[\Pr(T = 1 \mid Z) \ge \tau(V)]$.
Theorem T-6. There exists a function $\tau(V)$ such that $\Pr\big(1[\Pr(T = 1 \mid Z) \ge \tau(V)] = T\big) = 1$.
LATE implicitly assumes a $V$ with these properties.

LATE identifies $E(Y(1) - Y(0) \mid X, \underline{V} \le V \le \overline{V})$ (Heckman and Vytlacil, 1999, 2005), where $\overline{V} = \Pr(T = 1 \mid \overline{Z})$, $\underline{V} = \Pr(T = 1 \mid \underline{Z})$, and $(\overline{Z}, \underline{Z})$ are the discrete points of evaluation of LATE.

Theorem T-7. Under Monotonicity Assumption A-5, the strata probabilities $\{\Pr(S_n)\}_{n=1}^{N_S}$ are identified.
Theorem T-8. Let $S(t)$ be the union of the $S_n$ such that $\Pr(T = t \mid S_n) > 0$ for $n = 1, \ldots, N_S$. Then, under Monotonicity Assumption A-5, $E(Y(t) \mid S(t))$ is identified.

Stratification Treatment Effects

Revisit the definition of treatment effects using stratification concepts. For simplicity, take the binary case.
Let $\mathrm{supp}(Z) = \{z_1, \ldots, z_{N_Z}\}$ be such that $\Pr(T = 1 \mid Z = z_1) = 0$, $\Pr(T = 1 \mid Z = z_{N_Z}) = 1$, and $\Pr(T = 1 \mid Z = z_n) > \Pr(T = 1 \mid Z = z_{n'})$ whenever $n > n'$.
Under Monotonicity Assumption A-5, the number of strata is $N_S = N_Z - 1$ and the dimension of $A$ is $N_Z \times (N_Z - 1)$.

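The strata matrix $A$ and its Moore-Penrose inverse $A_1^+$ are of the following form:
$$A = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 1 \end{bmatrix}, \qquad A_1^+ = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix}.$$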
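The differencing structure of the pseudo-inverse can be verified numerically. The sketch assumes the strata matrix has entries $A[j, n] = 1$ when $j > n$ (1-based) and 0 otherwise, so that stratum $n$ switches into treatment between $z_n$ and $z_{n+1}$; the helper names are illustrative.

```python
import numpy as np

def strata_matrix(n_z: int) -> np.ndarray:
    """N_Z x (N_Z - 1) matrix with A[j, n] = 1 if j > n (1-based), else 0."""
    j = np.arange(1, n_z + 1)[:, None]
    n = np.arange(1, n_z)[None, :]
    return (j > n).astype(float)

def difference_matrix(n_z: int) -> np.ndarray:
    """(N_Z - 1) x N_Z matrix: row n has +1 in column n+1 and -1 in column n,
    except that the first column is identically zero."""
    d = np.zeros((n_z - 1, n_z))
    for r in range(n_z - 1):
        d[r, r + 1] = 1.0
        if r >= 1:
            d[r, r] = -1.0
    return d

A = strata_matrix(5)
A_pinv = np.linalg.pinv(A)
print(np.round(A_pinv, 6))
print(np.allclose(A_pinv, difference_matrix(5)))
```

Applying this inverse to the vector of propensity scores returns successive differences, which is the content of the strata probability formula.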

Strata Probabilities
Using Theorem T-3, we can identify the strata probabilities and generate the following relationship for the treatment group:
$$\Pr(S_n) = \Pr(T = 1 \mid Z = z_{n+1}) - \Pr(T = 1 \mid Z = z_n); \quad n \in \{1, \ldots, N_S\}. \qquad (20)$$

Under Monotonicity Assumption A-5:
$$\underbrace{E(Y(1) \mid S_n) \Pr(S_n)}_{\text{unobserved}} = \underbrace{E(YT \mid Z = z_{n+1}) - E(YT \mid Z = z_n)}_{\text{observed}}; \quad n \in \{1, \ldots, N_S\}. \qquad (21)$$
For the control group:
$$\underbrace{E(Y(0) \mid S_n) \Pr(S_n)}_{\text{unobserved}} = \underbrace{E(Y(1 - T) \mid Z = z_n) - E(Y(1 - T) \mid Z = z_{n+1})}_{\text{observed}}; \quad n \in \{1, \ldots, N_S\}. \qquad (22)$$

LATE is a Stratum Treatment Effect and Stratum Treatment Effects are LATEs
Subtracting (22) from (21) within each stratum:
$$E(Y(1) - Y(0) \mid S_n) \Pr(S_n) = E(Y \mid Z = z_{n+1}) - E(Y \mid Z = z_n); \quad n \in \{1, \ldots, N_S\}.$$
By equation (20):
$$E(Y(1) - Y(0) \mid S_n) = \frac{E(Y \mid Z = z_{n+1}) - E(Y \mid Z = z_n)}{\Pr(T = 1 \mid Z = z_{n+1}) - \Pr(T = 1 \mid Z = z_n)} \qquad (23)$$
$$= \mathrm{LATE}(z_{n+1}, z_n); \quad n \in \{1, \ldots, N_S\}. \qquad (24)$$
This shows that the strata average treatment effect is identical to the Local Average Treatment Effect (LATE) in Imbens and Angrist (1994).
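The equivalence can be checked at the population level with a short sketch. All numbers (four instrument values, three strata, the stratum probabilities, and the stratum means of $Y(1)$ and $Y(0)$) are hypothetical; the strata matrix assumes stratum $n$ is treated exactly when $j > n$.

```python
import numpy as np

n_z = 4
j = np.arange(1, n_z + 1)[:, None]
n = np.arange(1, n_z)[None, :]
A = (j > n).astype(float)          # A[j, n] = 1 if stratum n is treated at z_j

P_S = np.array([0.3, 0.5, 0.2])    # hypothetical Pr(S_n)
m1 = np.array([2.0, 2.5, 3.0])     # hypothetical E(Y(1) | S_n)
m0 = np.array([1.0, 2.0, 0.5])     # hypothetical E(Y(0) | S_n)

# Observable population moments implied by the model.
p = A @ P_S                                    # Pr(T = 1 | Z = z_j)
EY_Z = A @ (m1 * P_S) + (1 - A) @ (m0 * P_S)   # E(Y | Z = z_j)

# Ratio of adjacent differences, as in Equation (23).
late = np.diff(EY_Z) / np.diff(p)
print(late)       # one LATE per adjacent instrument pair
print(m1 - m0)    # stratum treatment effects
```

Each adjacent pair of instrument values moves exactly one stratum into treatment, so the ratio isolates that stratum's effect.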

Heckman and Vytlacil (1999; 2005) show how to use $\Pr(T = 1 \mid Z)$ to trace out the support of $V$ on which LATE implicitly conditions. This identifies the quantiles of $V$ associated with the changers.

Generating Marginal Treatment Effects
A change of variables permits us to explore this further.
Let $p_n = \Pr(T = 1 \mid Z = z_n)$.
Define $\Delta_n = \Pr(T = 1 \mid Z = z_{n+1}) - \Pr(T = 1 \mid Z = z_n)$.
In this notation, $\sum_{j=1}^{n} \Pr(S_j) = p_n + \Delta_n$ according to equation (20).
Define $L = F_S(S)$, where $F_S$ is the cdf of $S$. From equation (20):
$$L = \sum_{n=1}^{N_S} \Big( \sum_{j=1}^{n} \Pr(S_j) \Big) 1[S = s_n] = \sum_{n=1}^{N_S} (p_n + \Delta_n) \, 1[S = s_n]. \qquad (25)$$

Thus $L = p_n + \Delta_n$, $S = s_n$, and $V \in S_n$ are equivalent, and $P = p_n$ and $Z = z_n$ are equivalent. Thus:
$$E(Y(1) - Y(0) \mid L = p_n + \Delta_n) = \frac{E(Y \mid P = p_n + \Delta_n) - E(Y \mid P = p_n)}{\Delta_n} = \mathrm{LATE}(p_n + \Delta_n, p_n).$$

MTE
If $L$ and $P$ are absolutely continuous:
$$E(Y(1) - Y(0) \mid L = p) = \lim_{\Delta \to 0} \frac{E(Y \mid P = p + \Delta) - E(Y \mid P = p)}{\Delta} = \lim_{\Delta \to 0} \mathrm{LATE}(p + \Delta, p) = \mathrm{MTE}(p).$$
$\mathrm{MTE}(p)$ is the Marginal Treatment Effect (MTE) of Heckman and Vytlacil (1999).
Heckman and Vytlacil (2005, 2007) show how, under different supports, $\mathrm{MTE}(p)$ generates a variety of treatment parameters.
Through limit operations with continuous instruments, Heckman and Vytlacil identify the points of evaluation of the quantiles of $V$.

Treatment Effect Weights

Treatment Effect Weights
A consequence of Lemma L-4: $Y(t) \perp\!\!\!\perp (T, Z) \mid S$. Thus, for any event $K$ determined by $(T, Z)$:
$$E(Y(t) \mid S, K) = E(Y(t) \mid S). \qquad (26)$$
The causal effect conditional on $K$ is a weighted average of the treatment effects within strata:
$$E(Y(1) - Y(0) \mid K) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n, K) \Pr(S_n \mid K)$$
$$= \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n) \, \frac{\Pr(K \mid S_n) \Pr(S_n)}{\Pr(K)} \quad \text{by Equation (26)}$$
$$= \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n) \, \omega_{K,n}, \qquad (27)$$
where the weights $\omega_{K,n} = \Pr(K \mid S_n) \Pr(S_n) / \Pr(K)$ are positive and $\sum_{n=1}^{N_S} \omega_{K,n} = 1$.

In the case of the average treatment effect, the weights are $\omega_{K,n} = \Pr(S_n)$.
We can use equation (27) to examine the average treatment effect in the case of absolutely continuous variables.
Note that in this case $L$ has a uniform $[0, 1]$ distribution, since $L = F_S(S)$, and the average treatment effect can be written as:
$$E(Y(1) - Y(0)) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n) \Pr(S_n) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid L = p_n + \Delta_n) \Pr(L = p_n + \Delta_n).$$
The equation for the continuous case is:
$$E(Y(1) - Y(0)) = \int_0^1 \mathrm{MTE}(p) \, dp. \qquad (28)$$
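As an illustration only, suppose the marginal treatment effect were linear in $p$, with hypothetical constants $a$ and $b$ that do not appear in the slides. Equation (28) then reduces to an elementary integral:

```latex
\[
E(Y(1) - Y(0)) = \int_0^1 \mathrm{MTE}(p)\,dp
             = \int_0^1 (a + b\,p)\,dp
             = a + \frac{b}{2}.
\]
```

Under this assumed form, the average treatment effect equals the marginal treatment effect evaluated at the midpoint $p = 1/2$.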

Since $Z \perp\!\!\!\perp S$ (Lemma L-4):
$$\Pr(T = 1 \mid S_n) = [\Pr(Z = z_1), \ldots, \Pr(Z = z_{N_Z})] \, A_1[\cdot, n] = \sum_{j=n+1}^{N_Z} \Pr(Z = z_j) = 1 - F_P(p_n + \Delta_n),$$
where $F_P$ stands for the cumulative distribution function of $P$.

Thus the treatment-on-the-treated weights are given by:
$$E(Y(1) - Y(0) \mid T = 1) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n) \underbrace{\frac{(1 - F_P(p_n + \Delta_n)) \Pr(S_n)}{\Pr(T = 1)}}_{\text{weights}} \qquad (29)$$
$$= \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid L = p_n + \Delta_n) \underbrace{\frac{(1 - F_P(p_n + \Delta_n)) \Pr(L = p_n + \Delta_n)}{\Pr(T = 1)}}_{\text{weights}}. \qquad (30)$$
For the continuous case:
$$E(Y(1) - Y(0) \mid T = 1) = \int_0^1 \mathrm{MTE}(p) \underbrace{\frac{1 - F_P(p)}{\Pr(T = 1)}}_{\text{weights}} \, dp.$$
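The weighting logic can be sketched for the discrete case with $K = \{T = 1\}$. The instrument distribution, stratum probabilities, and stratum effects below are hypothetical; the weights are computed directly as $\Pr(T = 1 \mid S_n) \Pr(S_n) / \Pr(T = 1)$, using $\Pr(T = 1 \mid S_n) = \sum_j \Pr(Z = z_j) A_1[j, n]$.

```python
import numpy as np

n_z = 4
j = np.arange(1, n_z + 1)[:, None]
n = np.arange(1, n_z)[None, :]
A1 = (j > n).astype(float)              # stratum n is treated at z_j iff j > n

pr_z = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical Pr(Z = z_j)
P_S = np.array([0.3, 0.5, 0.2])         # hypothetical Pr(S_n)
effect = np.array([1.0, 0.5, 2.5])      # hypothetical E(Y(1) - Y(0) | S_n)

pr_t1_given_s = pr_z @ A1               # Pr(T = 1 | S_n), using Z independent of S
pr_t1 = pr_t1_given_s @ P_S             # Pr(T = 1)
weights = pr_t1_given_s * P_S / pr_t1   # weights for K = {T = 1}

tt = effect @ weights                   # effect of treatment on the treated
ate = effect @ P_S                      # average treatment effect
print(weights, weights.sum(), tt, ate)
```

Strata that are treated at more instrument values receive more weight in the treated-group parameter than in the average treatment effect.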

5. Summary of Identification Strategies for Binary Case

The Fundamental Confounding Problem
All of the methods for identifying the Haavelmo model that are discussed in this paper take different approaches to solving the fundamental integral equation:
$$\underbrace{E(Y \mid T = t, X = x)}_{\text{data}} = \int \underbrace{E(Y \mid T = t, X = x, V = v)}_{\text{counterfactual}} \; \underbrace{dF_{V \mid X = x}(v)}_{\text{mixing distribution}}.$$

Table 4: Alternative Approaches to Identifying Treatment Effects by Eliminating V: Binary Treatment Case
$(Y(1), Y(0)) \perp\!\!\!\perp T \mid X, V$; $T \in \{0, 1\}$; $Y = T\,Y(1) + (1 - T)\,Y(0)$
$E(Y \mid T = t, X = x) = \int E(Y \mid T = t, X = x, V = v) \, dF_{V \mid X = x}(v)$

Method | Need Instrument (Z)? | Identify Distribution of V? | $V \perp\!\!\!\perp X$? $V \perp\!\!\!\perp Z$?
Matching (a): V, X known | No | Yes (V observed) | No
Control Functions (b): V estimated, X, Z known (continuous T); bounds on quantiles of V estimated (discrete case) | Yes | Yes (over support) | $V \perp\!\!\!\perp X, Z$
Factor Method (c): distribution of V estimated from additional measurements of V (M) | No | Yes (with auxiliary measurements, over support) | Typically $V \perp\!\!\!\perp X$ (not strictly required)
IV LATE (d): Z, X known | Yes | Estimate intervals of quantiles of V (Heckman and Vytlacil, 1999, 2005) and conditions on them | No; $V \perp\!\!\!\perp Z$