Causal Analysis After Haavelmo: Definitions and a Unified Analysis of Identification of Recursive Causal Models

Causal Inference in the Social Sciences, University of Michigan, December 12, 2012
This draft: December 15, 2012
James Heckman, University of Chicago
Rodrigo Pinto, University of Chicago
(Running footer: James Heckman and Rodrigo Pinto, Causal Analysis After Haavelmo, December 15, 2012.)

Plan of the Talk
Six topics:
1. Haavelmo (1943): the econometric approach to causal analysis based on latent variables. First formalization of Yule's credo: correlation is not causation.
2. Linking the models developed in econometrics to DAG approaches (e.g., Pearl, 2009). Those models are fundamentally recursive.
3. Simultaneous causality and mediation analysis (Haavelmo, 1944).

Plan of the Talk
4. A framework unifying alternative causal estimators for recursive models as solutions to a mixture problem:
   a. Matching
   b. Instrumental variables
   c. Control functions / selection models
   d. Stratification
   e. Random effects approaches
5. New results on identification within this framework using mixture models.
6. Econometric mediation analysis.

1. Econometric Approach to Causality

The econometric approach to causality develops explicit models of outcomes where the causes of effects are investigated and the mechanisms governing the choice of treatment are analyzed. The relationship between treatment outcomes and treatment choice mechanisms is studied. A careful accounting of the unobservables in outcome and treatment choice equations facilitates the design and interpretation of estimators.

Both objective and subjective evaluations are considered, where subjective valuations are those of the person receiving treatment as well as the persons assigning it. Differences between anticipated and realized objective and subjective outcomes are analyzed. Models for simultaneous treatment effects are developed. A careful distinction is made between models for potential outcomes and empirical methods for identifying treatment effects. The relationship between observables and unobservables is carefully analyzed.

The econometric approach to causality addresses questions that arise in analyzing policy problems. Three distinct policy questions:
P1. Evaluating the impact of historical interventions on outcomes, including their impact in terms of the well-being of the treated and society at large.
P2. Forecasting the impacts (constructing counterfactual states) of interventions implemented in one environment in other environments, including their impacts in terms of well-being. (External validity.)
P3. Forecasting the impacts of interventions (constructing counterfactual states associated with interventions) never historically experienced to various environments, including their impacts in terms of well-being.

Table 1: Three Distinct Tasks Arising in the Analysis of Causal Models
Task | Description | Requirements
1 | Defining the class of hypotheticals or counterfactuals by thought experiments (models) | A scientific theory: a purely mental activity
2 | Identifying causal parameters from hypothetical population data | Mathematical analysis of point or set identification
3 | Identifying parameters from real data | Estimation and testing theory

A Prototypical Structural Model in Economics
Prototypical econometric model for policy evaluation. An agent can be given two courses of treatment, 1 and 0, with mutually exclusive outcomes $Y(1)$ and $Y(0)$. Costs are $C$. The information of the relevant decision maker is $I$. The decision to treat may be made on the basis of the expected outcomes $E(Y(1) \mid I)$ and $E(Y(0) \mid I)$ and costs $E(C \mid I)$, where the expectations are those of the relevant decision maker.

Expected net value:
$E(Y(1) \mid I) - E(C \mid I) - E(Y(0) \mid I)$. (1)

For persons who pick treatment based on expected maximum gain:
$D = \mathbf{1}\left[E(Y(1) \mid I) - E(C \mid I) - E(Y(0) \mid I) \geq 0\right]$.
This is the Generalized Roy Model.

The ex post treatment effect is $Y(1) - Y(0)$. The ex ante effect is $E(Y(1) \mid I) - E(Y(0) \mid I)$. Behavioral or scientific theory motivates the construction of $(Y(0), Y(1))$ and decision rules about treatment assignment.

The statistical approach to causal inference does not model the treatment assignment rule or its relationship to potential outcomes. The econometric approach makes the treatment assignment equation the centerpiece of its focus and considers both objective and subjective valuations as well as ex ante outcomes ($E(Y(1) \mid I)$, $E(Y(0) \mid I)$, $E(C \mid I)$) and ex post outcomes ($Y(1)$, $Y(0)$, $C$).
For this model, the expected effect of treatment for people at the margin of participating is
$E\left(Y(1) - Y(0) \mid E(Y(1) \mid I) - E(Y(0) \mid I) - E(C \mid I) = 0\right)$,
the gain to people just indifferent between treatment and no treatment.
Distributional treatment effects: Heckman, Smith, and Clements (1997).
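The decision rule and the ex ante, ex post, and marginal effects described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the normal distributions, the constant expected cost of 0.3, the sample size, and all function and variable names are hypothetical choices made for illustration and do not come from the slides.

```python
import random


def expected_net_value(ey1, ey0, ec):
    """Equation (1): E(Y(1)|I) - E(C|I) - E(Y(0)|I)."""
    return ey1 - ec - ey0


def treatment_choice(ey1, ey0, ec):
    """Decision rule D = 1[expected net value >= 0]."""
    return 1 if expected_net_value(ey1, ey0, ec) >= 0 else 0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


random.seed(0)  # hypothetical population, for illustration only
EXPECTED_COST = 0.3  # assumed constant E(C|I)

agents = []
for _ in range(10_000):
    ey0 = random.gauss(1.0, 1.0)          # ex ante E(Y(0)|I)
    ey1 = ey0 + random.gauss(0.5, 1.0)    # ex ante E(Y(1)|I)
    shock0 = random.gauss(0.0, 1.0)       # uncertainty resolved ex post
    shock1 = random.gauss(0.0, 1.0)
    agents.append({
        "d": treatment_choice(ey1, ey0, EXPECTED_COST),
        "net": expected_net_value(ey1, ey0, EXPECTED_COST),
        "ex_ante": ey1 - ey0,
        "ex_post": (ey1 + shock1) - (ey0 + shock0),
    })

# Ex ante gains: whole population, treated, and untreated.
ate_ex_ante = mean(a["ex_ante"] for a in agents)
gain_treated = mean(a["ex_ante"] for a in agents if a["d"] == 1)
gain_untreated = mean(a["ex_ante"] for a in agents if a["d"] == 0)

# Ex post gain for agents close to indifference (|net value| < 0.05).
marginal_gain = mean(a["ex_post"] for a in agents if abs(a["net"]) < 0.05)
```

Because agents select on expected gains, the treated have larger ex ante gains than the population average, and the gain for agents near the margin is close to the assumed expected cost.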

Generating Structural Counterfactuals
The traditional model of econometrics is the "all causes" model. Outcomes for treatment assignment are produced from a deterministic mapping of inputs to outputs:
$y = g(x, u)$ (2)
where $x$ and $u$ are fixed variables specified by the relevant economic theory. The notation anticipates the distinction between observable ($x$) and unobservable ($u$) that is important in empirical implementation. The two types of variables enter (2) symmetrically.

$D$ is the domain of the mapping $g : D \to R_y$, where $R_y$ is the range of $y$. There may be multiple outcome variables. All outcomes are explained in a functional sense by the arguments of $g$ in (2). If one models the ex post realizations of outcomes, it is entirely reasonable to invoke an all causes model, since the realizations are known (ex post) and all uncertainty has been resolved. Implicit in the definition of a function is the requirement that $g$ be stable or invariant to changes in $x$ and $u$ ("autonomy"; Frisch, 1938). The $g$ function remains stable as its arguments are varied. Invariance is a key property of a causal model.

Equation (2) is sometimes called a Marshallian causal function. If uncertainty is a feature of the environment, (2) can be interpreted as the ex post realization of the counterfactual as uncertainty is resolved. From the point of view of agent $i$ with information set $I_i$, the ex ante expected value of $Y$ is
$E(Y \mid I_i)$. (3)

Haavelmo (1943) All Causes Framework
Early work used recursive linear models:
$\underbrace{Y}_{\text{outcome}} = \underbrace{X}_{\text{cause observed by analyst}} \beta + \underbrace{U}_{\text{cause unobserved by analyst}}$ (4)
The distinguishing feature of the econometric approach is the explicit modeling of unobservables that drive outcomes and produce selection problems, and the analysis of their relationship to observables.

Fixing vs. Conditioning
Haavelmo distinguished between fixing and conditioning on $X$.
Conditioning on $X$, that is, working with $F(Y \mid X = x)$: $E(Y \mid X = x) = x\beta + E(U \mid X = x)$.
Fixing $X$ at level $X = x$: $X$ is externally manipulated to take the value $x$. Fixing $X$ at different levels is a hypothetical manipulation that does not change $U$: $E(Y \mid X = x, U = u)$ (a mental construct).
In Haavelmo (1943): $y = x\beta + u$.
Causality is in the mind: a conceptual thought experiment. Marshall's (1890) ceteris paribus clause. There is no algorithm for producing conceptual models.

Decomposing Unobserved Confounders
Marschak and Andrews (1944) decompose the unobservable:
$U = \phi V + \varepsilon$ (5)
where $V \not\perp X$ (the source of confounding) and $\varepsilon \perp (V, X)$, so that
$E(Y \mid X) = X\beta + \phi E(V \mid X)$.
All estimators for causal models control for the effects of $V$ (implicitly or explicitly).
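The difference between fixing and conditioning, and the role of $V$ as the source of confounding, can be made concrete with a simulated linear model. The sketch below assumes hypothetical values ($\beta = 2$, $\phi = 1.5$, standard normal $V$, $\varepsilon$, and $\omega$, and $x = v + \omega$); none of these numbers come from the slides, and the helper names are illustrative only.

```python
import random

random.seed(1)
BETA, PHI, N = 2.0, 1.5, 20_000  # hypothetical parameter values


def slope(x, y):
    """OLS slope of y on x (regression with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residual(y, v):
    """Residual of y after an OLS regression on v (with an intercept)."""
    b = slope(v, y)
    mv, my = sum(v) / len(v), sum(y) / len(y)
    return [yi - my - b * (vi - mv) for yi, vi in zip(y, v)]


v = [random.gauss(0, 1) for _ in range(N)]
x = [vi + random.gauss(0, 1) for vi in v]          # x = f_X(v, omega)
u = [PHI * vi + random.gauss(0, 1) for vi in v]    # U = phi * V + epsilon
y = [BETA * xi + ui for xi, ui in zip(x, u)]       # y = x * beta + u

# Conditioning: the slope of E(Y | X = x) mixes beta with phi * E(V | X).
conditioning_slope = slope(x, y)

# Fixing: set X externally to 1 and then to 0, leaving U unchanged.
fixing_effect = (sum(BETA * 1.0 + ui for ui in u)
                 - sum(BETA * 0.0 + ui for ui in u)) / N

# Controlling for V (partialling V out of both X and Y) recovers beta.
controlled_slope = slope(residual(x, v), residual(y, v))
```

Under these assumptions the conditioning slope is close to $\beta + \phi\,\mathrm{Cov}(V, X)/\mathrm{Var}(X) = 2.75$, while both the fixing contrast and the regression that controls for $V$ return values close to $\beta = 2$.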

2. Linking the Econometric Framework of Haavelmo to Fundamentally Recursive DAG Models
$M_H$: the hypothetical model (H).
$M_E$: the empirical model (E).
$B_E$, $B_H$: the domains of the empirical and hypothetical models.
Autonomy (Frisch, 1938): $y = f_Y(x, u)$ and $x = f_X(v, \omega)$ hold for both models. (Also called structural invariance.)
Example of a hypothetical model: $y = x\beta + u$ (Haavelmo, 1943).

Some Notation
$M(Y)$: the parents of $Y$. They directly cause $Y$, i.e.:
1. Variables in $M(Y)$ cause $Y$.
2. Given $M(Y)$, $Y$ is not affected by changes of other variables in $B$ that are not caused by $Y$.
If $M(Y) = \emptyset$, $Y$ is not caused by any variable in the model. In such cases, $Y$ is an external variable.

Construct a Hypothetical Intervention
Define a hypothetical random variable $\tilde{X}$ not caused by other variables in the model. This is a purely mental construct. The goal of causal analysis is to identify its effect in the data.

Model for Hypotheticals
Model H-1: $B_H = \{X, U, \tilde{X}, Y, V\}$, $M_H(Y) = \{\tilde{X}, U\}$, $M_H(U) = \{V, \varepsilon\}$, $M_H(X) = \{V, \Omega\}$.
Variables $\tilde{X}$, $V$, $\varepsilon$, and $\Omega$ are external: $M_H(\tilde{X}) = M_H(V) = M_H(\varepsilon) = M_H(\Omega) = \emptyset$.
$U$, $V$, $\Omega$, $\varepsilon$ are unobserved.
$\varepsilon$ and $\Omega$ play no essential role in producing selection and evaluation problems. They play an important role in constructing and interpreting the probability spaces of random variables.

Figure 1: Mechanisms of Causality. (a) Haavelmo's Hypothetical Model H-1. [Diagram: DAG on the nodes $\Omega$, $\varepsilon$, $V$, $U$, $X$, $\tilde{X}$, $Y$.]

Empirical Model
Model E-1: $B_E = \{X, U, Y, V\}$, $M_E(Y) = \{X, U\}$, $M_E(U) = \{V, \varepsilon\}$, $M_E(X) = \{V, \Omega\}$.
$V$, $\Omega$, $\varepsilon$ are external: $M_E(V) = \emptyset$, $M_E(\Omega) = \emptyset$, $M_E(\varepsilon) = \emptyset$.
$U$, $V$ are unobserved.

Figure 2: Mechanisms of Causality. (a) Haavelmo's Hypothetical Model H-1. [Diagram: DAG on the nodes $\Omega$, $\varepsilon$, $V$, $U$, $X$, $\tilde{X}$, $Y$.] (b) Haavelmo's Empirical Model E-1. [Diagram: DAG on the nodes $\Omega$, $\varepsilon$, $V$, $U$, $X$, $Y$.]
$B_H = B_E \cup \{\tilde{X}\}$

Formal Definition of Fixing
Definition D-1 (Fixing): We represent the variable $Y \in B_H$ when $\tilde{X} \in B_H$ is fixed at level $x$ according to the hypothetical model $M_H$ by standard statistical conditioning:
$Y(x) \equiv (Y \mid \tilde{X} = x)$, under model $M_H$. (6)

Acyclic Models
To connect $M_E$ and $M_H$, invoke the Local Markov Condition (Kiiveri et al., 1984; Pearl, 1988).
$X, Y \in B$ are variables that belong to the domain of the model. Let $G \subseteq B$ be a subset of the model domain.
Variables directly caused by $X \in B$ are termed direct descendants of $X$, denoted by $M^{-1}(X) = \{Y \in B;\; X \in M(Y)\}$.
Let $M^{-1}(G) = \bigcup_{X \in G} M^{-1}(X)$ for $G \subseteq B$.
Higher order descendants of generation $k$: $M^{-(k+1)}(G) = M^{-1}(M^{-k}(G))$.

The overall set of variables caused by $X$ (directly or indirectly) are called descendants of $X$:
$D(X) = \bigcup_{j=1}^{\#B} M^{-j}(X)$.
Descendants of $X$ are also called internal variables associated with $X$.
In acyclic models, no variable is a descendant of itself, i.e., $X \notin D(X)$ for all $X \in B$.
The set of non-descendant variables of $X$ are called the variables external to $X$ and are defined by $L(X) = B \setminus D(X)$.
If a variable $X \in B$ is external to all other variables in $B$, it is termed external to the model and has no parents: $M(X) = \emptyset$.
Simultaneous equations models in econometrics (Haavelmo, 1944) relax this requirement and allow variables to be descendants of themselves.
Recently formulated causal models in statistics (e.g., by Pearl and Rubin) take a step backward from traditional econometric models (Haavelmo, 1944) and are fundamentally recursive.
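The descendant set $D(X)$ can be computed mechanically from the parent sets $M(\cdot)$. The following is a minimal sketch, assuming the model is stored as a dictionary from each variable to its set of parents; the dictionary encoding, the string labels such as `"X_tilde"`, and the function names are illustrative assumptions, not notation from the slides.

```python
def descendants(parents, x):
    """Return D(x): every variable caused by x, directly or indirectly."""
    # Direct descendants of v are the variables that list v as a parent.
    children = {v: {y for y, ps in parents.items() if v in ps} for v in parents}
    found, frontier = set(), {x}
    while frontier:
        # Next generation: children of the current frontier not seen before.
        frontier = set().union(*(children[v] for v in frontier)) - found
        found |= frontier
    return found


def is_acyclic(parents):
    """A model is acyclic when no variable is a descendant of itself."""
    return all(v not in descendants(parents, v) for v in parents)


# Empirical model E-1: parents of Y are {X, U}, and so on.
model_e1 = {
    "Y": {"X", "U"},
    "U": {"V", "eps"},
    "X": {"V", "Omega"},
    "V": set(), "eps": set(), "Omega": set(),
}

# Hypothetical model H-1: the hypothetical X_tilde replaces X as a cause of Y.
model_h1 = {
    "Y": {"X_tilde", "U"},
    "U": {"V", "eps"},
    "X": {"V", "Omega"},
    "X_tilde": set(), "V": set(), "eps": set(), "Omega": set(),
}

# A two-equation simultaneous system, where each outcome causes the other.
simultaneous = {"Y1": {"Y2"}, "Y2": {"Y1"}}
```

In this encoding `descendants(model_e1, "X")` contains `"Y"`, whereas in the hypothetical model `X` has no descendants and `X_tilde` takes over as the cause of `Y`. The simultaneous system fails the acyclicity check because each outcome is a descendant of itself.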

Local Markov Condition
A variable $Y$ is independent of its non-descendants conditional on its parents:
LMC: for all $Y \in B$, $Y \perp T \mid M(Y)$ for all $T \in B \setminus (D(Y) \cup \{Y\})$. (7)
LMC (7) is a necessary and sufficient condition for obtaining a DAG (Pearl, 1988).

3. Nonrecursive (Simultaneous) Models of Causality: Developed in Economics (Haavelmo, 1944)
A system of linear simultaneous equations captures interdependence among outcomes $Y$. It is a standard framework for mediation analyses (Klein and Goldberger, 1955).

Linear model in terms of parameters $(\Gamma, B)$, observables $(Y, X)$, and unobservables $U$:
$\Gamma Y + BX = U$, $E(U) = 0$ (8)
$Y$ is a vector of internal and interdependent variables.
$X$ is external and exogenous ($E(U \mid X) = 0$).
$\Gamma$ is a full rank matrix.

Linear-in-the-parameters "all causes" model for the vector $Y$. The causes are $X$ and $U$. The structure is $(\Gamma, B)$, $\Sigma_U$, where $\Sigma_U$ is the variance-covariance matrix of $U$. In the Cowles Commission analysis it is assumed that $\Gamma$, $B$, $\Sigma_U$ are invariant to classes of changes in $X$ and modifications of the distribution of $U$. Autonomy (Frisch, 1938). Later defined as part of the SUTVA (1986) assumption.

Linear systems can be generalized. One can postulate a system of equations $G(Y, X, U) = 0$. Conditions for a unique solution of the reduced forms $Y = K(X, U)$ require that certain Jacobian terms be nonvanishing (Matzkin, 2007, 2010). The structural form (8) is an all causes model that relates in a deterministic way outcomes (internal variables) to other outcomes (internal variables) and external variables (the $X$ and $U$).

Are ceteris paribus manipulations associated with the effect of varying some components of $Y$ on other components of $Y$ possible within the model?

Consider a two-agent model of social interactions. $Y_1$ is the outcome for agent 1; $Y_2$ is the outcome for agent 2.

$Y_1 = \alpha_1 + \gamma_{12} Y_2 + \beta_{11} X_1 + \beta_{12} X_2 + U_1$ (9a)
$Y_2 = \alpha_2 + \gamma_{21} Y_1 + \beta_{21} X_1 + \beta_{22} X_2 + U_2$ (9b)
The social interactions model is a version of the standard simultaneous equations problem. This model is sufficiently flexible to capture the notion that the consumption of agent 1 ($Y_1$) depends on the consumption of agent 2 if $\gamma_{12} \neq 0$, on agent 1's value of $X$, $X_1$ (assumed to be observed), if $\beta_{11} \neq 0$, on agent 2's value of $X$, $X_2$, if $\beta_{12} \neq 0$, and on unobservable factors that affect agent 1 ($U_1$). The determinants of agent 2's consumption are defined symmetrically. $U_1$ and $U_2$ are allowed to be freely correlated.

Assume
$E(U_1 \mid X_1, X_2) = 0$ (10a)
and
$E(U_2 \mid X_1, X_2) = 0$. (10b)
Completeness guarantees that (9a) and (9b) have a determinate solution for $(Y_1, Y_2)$. Applying Haavelmo's (1943) fixing analysis to (9a) and (9b), the causal effect of $Y_2$ on $Y_1$ is $\gamma_{12}$. This is the effect on $Y_1$ of fixing $Y_2$ at different values, holding constant the other variables in the equation.

Under completeness, the reduced form outcomes of the model after social interactions are solved out can be written as
$Y_1 = \pi_{10} + \pi_{11} X_1 + \pi_{12} X_2 + \mathcal{E}_1$, (11a)
$Y_2 = \pi_{20} + \pi_{21} X_1 + \pi_{22} X_2 + \mathcal{E}_2$. (11b)

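Least squares can identify the ceteris paribus effects of $X_1$ and $X_2$ on $Y_1$ and $Y_2$ because $E(\mathcal{E}_1 \mid X_1, X_2) = 0$ and $E(\mathcal{E}_2 \mid X_1, X_2) = 0$. Simple algebra gives
$\pi_{11} = \dfrac{\beta_{11} + \gamma_{12}\beta_{21}}{1 - \gamma_{12}\gamma_{21}}$, $\pi_{12} = \dfrac{\beta_{12} + \gamma_{12}\beta_{22}}{1 - \gamma_{12}\gamma_{21}}$,
$\pi_{21} = \dfrac{\gamma_{21}\beta_{11} + \beta_{21}}{1 - \gamma_{12}\gamma_{21}}$, $\pi_{22} = \dfrac{\gamma_{21}\beta_{12} + \beta_{22}}{1 - \gamma_{12}\gamma_{21}}$,
and
$\mathcal{E}_1 = \dfrac{U_1 + \gamma_{12} U_2}{1 - \gamma_{12}\gamma_{21}}$, $\mathcal{E}_2 = \dfrac{\gamma_{21} U_1 + U_2}{1 - \gamma_{12}\gamma_{21}}$.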

Without any further information on the variances of $(U_1, U_2)$ and their relationship to the causal parameters, we cannot isolate the causal effects $\gamma_{12}$ and $\gamma_{21}$ from the reduced form regression coefficients. This is so because, holding $X_1$, $X_2$, $U_1$, and $U_2$ fixed in (9a) or (9b), it is not in principle possible to vary $Y_2$ or $Y_1$, respectively, because they are exact functions of $X_1$, $X_2$, $U_1$, and $U_2$. This exact dependence holds true even if $U_1 = 0$ and $U_2 = 0$, so that there are no unobservables.

Can we define causality within the model? There is no mechanism yet specified within the model to independently vary the right hand sides of Equations (9a) and (9b). Some economists suggest that the mere fact that we can write (9a) and (9b) means that we can imagine independent variation.

By the same token, we can imagine a model $Y = \varphi_0 + \varphi_1 X_1 + \varphi_2 X_2$. If part of the model is
($*$) $X_1 = X_2$,
no causal effect of $X_1$ holding $X_2$ constant is possible in principle within the rules of the model. If we break restriction ($*$) and permit independent variation in $X_1$ and $X_2$, we can define the causal effect of $X_1$ holding $X_2$ constant.
The $X$ effects on $Y_1$ and $Y_2$, identified through the reduced forms, combine the direct effects (through $\beta_{ij}$) and the indirect effects (as they operate through $Y_1$ and $Y_2$, respectively). If we assume exclusions ($\beta_{12} = 0$) or ($\beta_{21} = 0$) or both, we can identify the ceteris paribus causal effects of $Y_2$ on $Y_1$ and of $Y_1$ on $Y_2$, respectively, if $\beta_{22} \neq 0$ or $\beta_{11} \neq 0$, respectively.

Thus if $\beta_{12} = 0$, from the reduced form
$\dfrac{\pi_{12}}{\pi_{22}} = \gamma_{12}$.
If $\beta_{21} = 0$, we obtain
$\dfrac{\pi_{21}}{\pi_{11}} = \gamma_{21}$.
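The ratio results above can be verified with exact rational arithmetic. The sketch below codes the reduced-form coefficient formulas and imposes both exclusion restrictions; the particular parameter values ($\gamma_{12} = 1/2$, $\gamma_{21} = -1/3$, $\beta_{11} = 2$, $\beta_{22} = 3$) and the function name are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F


def reduced_form(g12, g21, b11, b12, b21, b22):
    """Reduced-form slopes implied by the structural system (9a)-(9b)."""
    det = 1 - g12 * g21
    if det == 0:
        raise ValueError("completeness fails: 1 - g12 * g21 must be nonzero")
    return {
        "pi11": (b11 + g12 * b21) / det,
        "pi12": (b12 + g12 * b22) / det,
        "pi21": (g21 * b11 + b21) / det,
        "pi22": (g21 * b12 + b22) / det,
    }


# Hypothetical structural parameters with both exclusions imposed
# (beta_12 = 0 and beta_21 = 0), and beta_11, beta_22 nonzero.
g12, g21 = F(1, 2), F(-1, 3)
b11, b22 = F(2), F(3)
pi = reduced_form(g12, g21, b11, F(0), F(0), b22)

# Ratios of reduced-form coefficients recover the structural effects.
gamma12_from_rf = pi["pi12"] / pi["pi22"]
gamma21_from_rf = pi["pi21"] / pi["pi11"]

# Without the exclusion beta_12 = 0 the same ratio no longer equals gamma_12.
pi_no_exclusion = reduced_form(g12, g21, b11, F(1), F(0), b22)
ratio_no_exclusion = pi_no_exclusion["pi12"] / pi_no_exclusion["pi22"]
```

With the exclusions in place the two ratios equal $\gamma_{12}$ and $\gamma_{21}$ exactly; once $\beta_{12} \neq 0$ the ratio $\pi_{12}/\pi_{22}$ mixes direct and indirect effects and differs from $\gamma_{12}$.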

In a general nonlinear model,
$Y_1 = g_1(Y_2, X_1, X_2, U_1)$
$Y_2 = g_2(Y_1, X_1, X_2, U_2)$,
exclusion is defined as $\dfrac{\partial g_1}{\partial X_2} = 0$ for all $(Y_2, X_1, X_2, U_1)$ and $\dfrac{\partial g_2}{\partial X_1} = 0$ for all $(Y_1, X_1, X_2, U_2)$.

Assuming the existence of local solutions, we can solve these equations to obtain
$Y_1 = \varphi_1(X_1, X_2, U_1, U_2)$
$Y_2 = \varphi_2(X_1, X_2, U_1, U_2)$.
By the chain rule (using the exclusion $\partial g_1 / \partial X_2 = 0$) we can write
$\dfrac{\partial g_1}{\partial Y_2} = \dfrac{\partial Y_1}{\partial X_2} \Big/ \dfrac{\partial Y_2}{\partial X_2} = \dfrac{\partial \varphi_1}{\partial X_2} \Big/ \dfrac{\partial \varphi_2}{\partial X_2}$.
We may define and identify causal effects for $Y_1$ on $Y_2$ using partials with respect to $X_1$ in an analogous fashion.

This definition of causal effects in an interdependent system generalizes the recursive definitions of causality featured in the statistical treatment effect literature (Holland, 1988, and Pearl, 2009a). The key to this definition is manipulation of external inputs and exclusion, not randomization or matching.

Econometric Mediation Analysis
Builds on Wright (1921, 1934), Klein and Goldberger (1955), and Theil (1958).
The reduced form estimates the net effect of a policy change in $X_1$:
$\dfrac{\partial Y_1}{\partial X_1} = \dfrac{\partial \varphi_1(X_1, X_2, U_1, U_2)}{\partial X_1}$. (12)
Using this analysis, the system can readily be used to conduct mediation analyses:
$\dfrac{\partial Y_1}{\partial X_1} = \underbrace{\left(\dfrac{\partial g_1}{\partial Y_2}\right)}_{\text{identified through exclusion in structure}} \underbrace{\left(\dfrac{\partial Y_2}{\partial X_1}\right)}_{\text{identified from reduced form}} + \underbrace{\dfrac{\partial g_1}{\partial X_1}}_{\text{identified from structure}} = \dfrac{\partial \varphi_1(X_1, X_2, U_1, U_2)}{\partial X_1}$.
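For the linear system (9a)-(9b) the mediation decomposition can be checked term by term. The following worked derivation is a sketch for the linear case only; it uses the reduced-form coefficients $\pi_{11}$ and $\pi_{21}$ from the simultaneous equations discussion and introduces no new assumptions beyond completeness ($1 - \gamma_{12}\gamma_{21} \neq 0$).

```latex
\begin{align*}
\frac{\partial Y_1}{\partial X_1}
  &= \underbrace{\gamma_{12}}_{\partial g_1 / \partial Y_2}\,
     \underbrace{\pi_{21}}_{\partial Y_2 / \partial X_1}
     + \underbrace{\beta_{11}}_{\partial g_1 / \partial X_1} \\
  &= \gamma_{12}\,
     \frac{\gamma_{21}\beta_{11} + \beta_{21}}{1 - \gamma_{12}\gamma_{21}}
     + \beta_{11} \\
  &= \frac{\gamma_{12}\gamma_{21}\beta_{11} + \gamma_{12}\beta_{21}
           + \beta_{11} - \gamma_{12}\gamma_{21}\beta_{11}}
          {1 - \gamma_{12}\gamma_{21}} \\
  &= \frac{\beta_{11} + \gamma_{12}\beta_{21}}{1 - \gamma_{12}\gamma_{21}}
   = \pi_{11}.
\end{align*}
```

The indirect effect through $Y_2$ plus the direct effect $\beta_{11}$ therefore adds up to the net reduced-form effect $\pi_{11}$, as the general decomposition requires.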

A More General DAG Framework for Causal Analysis
Return to recursive models, as does much of the causal analysis literature. Introduce instrumental variables (Reiersöl, 1945).

Features:
1. Observed outcome $Y$;
2. Observed treatment indicator $T$ that causes outcome $Y$;
3. Unobserved variable $U$ that causes outcome $Y$;
4. Unobserved transmission variable $V$ that causes treatment indicator $T$;
5. Observed external variable $Z$ that causes treatment indicator $T$ (random assignment might generate $Z$);
6. Observed variable $X$ that causes $T$ and $Y$, and might be a descendant of $V$;
7. Hypothetical variable $\tilde{T}$ that replaces $T$ as a cause of $Y$.
Without loss of generality, assume that $V$, $U$ designate unobserved random vectors generating variables $Y$, $T$, and $X$.
Structural model: $y = f_Y(t, u, x)$, $u = f_U(v, \varepsilon)$, $x = f_X(v, \omega)$, $t = f_T(z, v, x)$.

Model H-2, Extended Model for Hypotheticals (keep $\Omega$ and $\varepsilon$ implicit):
$B_H = \{Y, V, T, Z, X, U, \tilde{T}\}$, $M_H(Y) = \{X, U, \tilde{T}\}$, $M_H(X) = \{V\}$, $M_H(T) = \{V, Z, X\}$, $M_H(U) = \{V\}$.
Variables $Z$, $V$, $\tilde{T}$ are external, that is, $M_H(Z) = M_H(\tilde{T}) = M_H(V) = \emptyset$.
The causal effect of $T$ on $Y$ can be moderated by $X$.
Model E-2, Extended Empirical Model:
$B_E = \{Y, V, T, Z, X, U\}$, $M_E(Y) = \{X, U, T\}$, $M_E(X) = \{V\}$, $M_E(T) = \{V, Z, X\}$, $M_E(U) = \{V\}$.
Variables $Z$, $V$ are external, that is, $M_E(V) = M_E(Z) = \emptyset$.

Figure 3: Mechanisms of Causality. (a) Hypothetical Model H-2. [Diagram: DAG on the nodes $X$, $V$, $U$, $Z$, $T$, $Y$, $\tilde{T}$.] (b) Empirical Model E-2. [Diagram: DAG on the nodes $X$, $V$, $U$, $Z$, $T$, $Y$.]

Models where $T$ and $\tilde{T}$ cause $U$: Generalized Roy Model.
Variant of Model H-2: $M_H(U) = \{V, \tilde{T}\}$, with the remaining relations as defined in Model H-2.
Variant of Model E-2: $M_E(U) = \{V, T\}$, with the remaining relations as defined in Model E-2.

Figure 4: Mechanisms of Causality. (a) Hypothetical Model H-2. [Diagram: DAG on the nodes $X$, $V$, $U$, $Z$, $T$, $Y$, $\tilde{T}$.] (b) Empirical Model E-2. [Diagram: DAG on the nodes $X$, $V$, $U$, $Z$, $T$, $Y$.]

Suppose we fix the hypothetical variable $\tilde{T}$ to take exactly the values that $T$ takes in the empirical model. By autonomy, the distribution of $Y$ in both models must be the same. Thus,
$Y \overset{d}{=} Y(\tilde{T} = T)$.
The left-hand side is a variable in $M_E$, the right-hand side a variable in $M_H$. $\tilde{T} = T$ means that we are fixing $\tilde{T}$ at the values $T$ takes.
For discrete $T$:
$Y \overset{d}{=} Y(\tilde{T} = T) \overset{d}{=} \sum_{t \in \mathrm{supp}(T)} Y(t)\, \mathbf{1}[T = t]$ (13)

Quandt (1958) switching regression model. In the binary treatment case:
$Y \overset{d}{=} Y(1)T + Y(0)(1 - T)$
$U \overset{d}{=} U(1)T + U(0)(1 - T)$.
Continuum of treatment, $T \in [0, 1]$:
$Y = \int_0^1 Y(t)\, \mathbf{1}(T = t)\, dF_T(t)$.

Benefit of the More General Framework
Allows for the analysis of heterogeneity in both slopes and intercepts (e.g., Generalized Roy Model; Heckman and Vytlacil, 2007a).
In the linear setup of Haavelmo (1943):
$Y(t) = X(t)\beta(t) + U(t)$, $t \in [0, 1]$ or $t \in \{0, 1\}$.

Mean Causal Effects
Average treatment effect (ATE) conditional on $X$ when the hypothetical variable $\tilde{T}$ is fixed at values $t$ and $t'$:
$E(Y(t) - Y(t') \mid X = x) = E(Y \mid \tilde{T} = t, X = x) - E(Y \mid \tilde{T} = t', X = x)$; $t \neq t'$.
More general formula for total causal effects:
$E(Y(t) - Y(t') \mid K, X = x) = E(Y \mid \tilde{T} = t, K, X = x) - E(Y \mid \tilde{T} = t', K, X = x)$,
where $K$ is some event based on variables not caused by $\tilde{T}$.

A variety of distributional treatment effects, e.g., the percentage who benefit from treatment (net and gross; ex ante and ex post), or the benefits to the bottom percentile.

Identification Strategy
Identification of treatment effects requires that we connect models $M_H$ and $M_E$ in a fashion that allows us to evaluate causal parameters of model $M_H$ using data generated by model $M_E$.

4. Identification
One set of identifying conditions:
Assumption A-1. $Z$ is a non-degenerate random variable conditional on $X$.
Assumption A-2. Counterfactual outcomes $Y$ have finite first moments.
Assumption A-3. The population contains both a treatment and a control group for each $X$, that is, $0 < \Pr(T = 1 \mid X = x) < 1$ for all $x \in \mathrm{supp}(X)$.
Assumption A-4. $V$, $U$ are absolutely continuous with respect to Lebesgue measure.

All valid strategies in the literature control for Haavelmo's $V$ in some form or other.

Table 2: Alternative Approaches to Identifying Treatment Effects by Eliminating $V$: Binary Treatment Case
Maintained structure: $(Y(1), Y(0)) \perp T \mid X, V$; $T \in \{0, 1\}$; $Y = T\,Y(1) + (1 - T)\,Y(0)$;
$E(Y \mid \tilde{T} = t, X = x) = \int E(Y \mid T = t, X = x, V = v)\, dF_{V \mid X = x}(v)$.
Method | Information used | Need instrument ($Z$)? | Identify distribution of $V$? | $V \perp X$? $V \perp Z$?
Matching | $V$, $X$ known | No | Yes ($V$ observed) | No
Control functions | $V$ estimated, $X$, $Z$ known (continuous $T$); bounds on quantiles of $V$ estimated (discrete case) | Yes | Yes (over support) | $V \perp X, Z$
Factor method | Distribution of $V$ estimated from additional measurements of $V$ ($M$) | No | Yes (with auxiliary measurements over support) | Typically $V \perp X$ (not strictly required)
IV (LATE) | $Z$, $X$ known | Yes | Estimates intervals of quantiles of $V$ (Heckman and Vytlacil, 1999, 2005) and conditions on them | No; $V \perp Z$

Table 3: Alternative Approaches to Identifying Treatment Effects by Eliminating $V$: Binary Treatment Case (continued)
Maintained structure: $(Y(1), Y(0)) \perp T \mid X, V$; $T \in \{0, 1\}$; $Y = T\,Y(1) + (1 - T)\,Y(0)$;
$E(Y \mid \tilde{T} = t, X = x) = \int E(Y \mid T = t, X = x, V = v)\, dF_{V \mid X = x}(v)$.
Method | Information used | Need instrument ($Z$)? | Identify distribution of $V$? | $V \perp X$? $V \perp Z$?
IV (LIV) | $Z$, $X$ known | Yes | Shrinks the interval of quantiles of $V$ to a point using continuous instruments and conditions on them | No; $V \perp Z$
Stratification | $Z$, $X$ known | Instruments give restrictions on strata (balancing scores for $V$) | Identifies the distribution of strata, which places interval bounds on $V$, and conditions on them | $V \perp Z$ (for strata)
Mixing distributions | Estimate $V$, $P(V)$ for discrete mixtures | No (intervals of $V$) | Yes (integral equation) | Yes

4.a Matching: $V$ is Known
Lemma L-1 (Matching): Let $G \subset B_H$ be a set of (matching) variables not caused by $T$ such that, under model $M_H$, $T \perp Y(t) \mid G$ for all $t \in \mathrm{supp}(T)$. Then $E(Y(t) \mid G)$ under $M_H$ is equal to $E(Y \mid T = t, G)$ under $M_E$.
In matching, it is assumed that the matching variables are observed without error.

Lemma L-2: Under Model H-2, $Y(t) \perp T \mid V, X$. Alternatively: $Y \perp T \mid V, \tilde{T} = t, X$.

$V$ may not be observed. Proxy $V$. Matching on mismeasured variables (Carneiro, Hansen, and Heckman, 2003). Correct for measurement error in the proxies (Conti, Heckman, Pinger, and Zanolini, 2009, revised 2012).

4.b LIV and LATE Identification
Heckman and Vytlacil (2005) show how to identify $E(Y \mid T = t, V, X)$ from observed data ($E(Y \mid T = t, X)$) when the variable $V$ is unobserved.
Under assumptions (A-1) to (A-4) and Model H-2, weak separability between observables and unobservables can be used to establish the following relationship between $V$ and the instrumental variable $Z$:
Lemma L-3: Under Model H-2, $V \perp Z \mid X$.
Their LIV estimator identifies $E(Y \mid T = t, V, X)$ for all quantiles of $V$ in the support of $P(T = 1 \mid Z)$.
The Heckman-Vytlacil separability assumption implies and is implied by the LATE assumptions of Imbens and Angrist (1994).
LATE identifies $E(Y \mid T = t, \underline{V} \leq V \leq \overline{V}, X)$, where $\underline{V}$ and $\overline{V}$ are identified from values of $\Pr(T = 1 \mid Z)$.

4.c Control Function Approach
The method goes back to Telser (1964). See Blundell and Powell (2003) for a survey. Estimate $V$ using the relationship that follows from Lemma L-3 and auxiliary equations that relate $V$ to $T$, $X$, and $Z$. These equations are called control functions.

4.d Stratification (Robins and Greenland, 1992)
Popularized and expanded on by Frangakis and Rubin (2002).

Definition D-2 (Stratification Variable): Let $Z$ be a discrete valued random variable, where $\mathrm{supp}(Z) = \{z_1, \ldots, z_{N_Z}\}$. Define the strata variable $S$ as an ordered random vector of treatment assignments $T$ when $T$ is conditioned on $Z = z_1, \ldots, Z = z_{N_Z}$:
$S = [(T \mid Z = z_1), \ldots, (T \mid Z = z_{N_Z})]$.

Example of Stratification Matrix for the Binary Case $T \in \{0, 1\}$
For $\#(Z) = 3$, the matrix $A$ that represents all possible strata ($S$) has rows indexed by the instrument values $z_1, z_2, z_3$ and columns indexed by the strata $s_1, \ldots, s_8$; its columns are the $2^3 = 8$ possible vectors of treatment values in $\{0, 1\}^3$. [Matrix entries not recoverable from the source.]
Each column represents one stratum. Each row associates the strata with the instrument value $Z = z_j$. The presence of elements 0 and 1 in every row implies that $0 < \Pr(T = 1 \mid Z = z) < 1$ for all $z \in \mathrm{supp}(Z)$.
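The set of possible strata can be enumerated directly. The sketch below builds a stratification matrix for binary treatment and three instrument values; the column ordering produced by `itertools.product` is an arbitrary choice of this sketch and need not match the ordering of $s_1, \ldots, s_8$ on the original slide, and the function name is hypothetical.

```python
from itertools import product


def strata_matrix(treatment_values, n_instrument_values):
    """Rows: instrument values z_1..z_NZ. Columns: all possible strata."""
    # Each stratum is one vector of treatment choices, one entry per z_j.
    strata = list(product(treatment_values, repeat=n_instrument_values))
    return [[s[j] for s in strata] for j in range(n_instrument_values)]


A = strata_matrix((0, 1), 3)

# Recover the strata (columns) from the matrix for inspection.
columns = list(zip(*A))

for row_label, row in zip(("z1", "z2", "z3"), A):
    print(row_label, row)
```

Each printed row lists, for one instrument value, the treatment chosen in every stratum, so every row contains both 0 and 1.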

Idea
Partition the sample space of the strata so that $T \perp Y(t)$ for each subset of the sample space. Identify counterfactuals with each subset.

A Fundamental Problem With Stratification
A basic problem is that the proposed partitions that guarantee independence are not known. Use the concept of stratification to generate a partition that has the desired conditional independence property of Lemma L-1. This identification strategy is a version of mixture models (Rao, 1992) and belongs to a broader class of identification methods which we introduce.

Simplify the argument further: keep $X$ and $U$ implicit.
Model H-3, Simplified Model for Hypotheticals: $B_H = \{Y, T, \tilde{T}, Z, V\}$, $M_H(Y) = \{\tilde{T}, V\}$, $M_H(T) = \{Z, V\}$, $M_H(Z) = M_H(V) = M_H(\tilde{T}) = \emptyset$. $V$ is unobserved.
Model E-3, Simplified Empirical Model: $B_E = \{Y, T, Z, V\}$, $M_E(Y) = \{T, V\}$, $M_E(T) = \{Z, V\}$, $M_E(Z) = M_E(V) = \emptyset$. $V$ is unobserved.

Figure 5: Mechanisms of Causality. (a) Hypothetical Simplified Model H-3. [Diagram: DAG on the nodes $V$, $Z$, $T$, $Y$, $\tilde{T}$.] (b) Empirical Simplified Model E-3. [Diagram: DAG on the nodes $V$, $Z$, $T$, $Y$.]

77 Causal Models with Strata. Model H-4 (Hypothetical Strata Model): $B_H = \{Y, T, \tilde{T}, Z, V, S\}$, $M_H(Y) = \{\tilde{T}, V\}$, $M_H(T) = \{Z, S\}$, $M_H(S) = \{V\}$, $M_H(Z) = M_H(V) = M_H(\tilde{T}) = \emptyset$. $S$ and $V$ are unobserved variables. Model E-4 (Empirical Strata Model): $B_E = \{Y, T, Z, V, S\}$, $M_E(Y) = \{T, V, Z\}$, $M_E(T) = \{Z, S\}$, $M_E(S) = \{V\}$, $M_E(Z) = M_E(V) = \emptyset$. $S$ and $V$ are unobserved variables.

78 Figure 6: Mechanisms of Causality. (a) Hypothetical Strata Model H-4; (b) Empirical Strata Model E-4. [Figure: causal graphs over $S$, $V$, $Z$, $T$, $\tilde{T}$, and $Y$.]

79 Strata allow us (under certain conditions) to control for $V$.

80 Properties of Stratification Procedures. If $\mathrm{supp}(T) = \{t_1, \ldots, t_{N_T}\}$, the partition is finite: there are $N_T^{N_Z}$ possible strata. Write $\mathrm{supp}(S) = \{s_1, \ldots, s_{N_S}\}$ for the support of $S$, i.e., all strata $s$ such that $\Pr(S = s) > 0$. Key intuition: we can partition the support of $V$ on the basis of strata, because each value $v$ of $V$ is associated with a unique stratum $s \in \mathrm{supp}(S)$ by construction. This is a consequence of the structural relationship determining treatment: $t = f_T(z, v, x)$. Specifically, $\mathrm{supp}(V)$ is the union of the disjoint sets $S_n = \{v \in \mathrm{supp}(V) : (S \mid V = v) = s_n\}$ as stratum $s_n$ varies across $\mathrm{supp}(S)$. In other words, we partition the sample space on the basis of the response type $s \in \mathrm{supp}(S)$. In this notation, the events $S = s_n$ and $V \in S_n$ are equivalent.

81 Principal strata allow us (under certain conditions) to control for $V$. They are a coarse partition of $V$.

82 Both $V$ and the strata variable $S$ play the role of conditioning variables in matching as defined in Lemma L-1. Theorem T-1: In Model H-4, $Y(t) \perp\!\!\!\perp T \mid V$, but also $Y(t) \perp\!\!\!\perp T \mid S$. Lemma L-4: In Model H-4, $Z \perp\!\!\!\perp S$, $Y(t) \perp\!\!\!\perp Z \mid S$, and $Y(t) \perp\!\!\!\perp T \mid S, Z$. Lemma L-4 shows that the strata variable $S$ of Definition D-2 is independent of the instrument $Z$. From Lemma L-1: $E(Y(t)) = \sum_{n=1}^{N_S} E(Y \mid T = t, S_n) \Pr(S_n)$. (14)

83 The Strata Identification Problem. Identification problem for mean treatment effects: the identification of $\{E(Y(t) \mid S_n), \Pr(S_n)\}_{n=1}^{N_S}$ from the observed $\{E(Y \mid T = t, Z = z_j), \Pr(T = t \mid Z = z_j)\}_{j=1}^{N_Z}$ for $t \in \mathrm{supp}(T)$.

84 Identification Based on Strata

85 Lemma L-5: In Model H-4, $Y(t) \perp\!\!\!\perp Z \mid S$. Lemma L-6: In Model H-4, $Y(t) \perp\!\!\!\perp T \mid S, Z$.

86 $P_Z(t) = [\Pr(T = t \mid Z = z_1), \ldots, \Pr(T = t \mid Z = z_{N_Z})]'$ (propensity scores); $Q_Z(t) = [E(Y \mid T = t, Z = z_1), \ldots, E(Y \mid T = t, Z = z_{N_Z})]'$ (data); $P_S = [\Pr(S_1), \ldots, \Pr(S_{N_S})]'$ (hypothetical); $Q_S(t) = [E(Y(t) \mid S_1), \ldots, E(Y(t) \mid S_{N_S})]'$ (hypothetical).

87 Definition D-3 (Strata Matrix): Define the strata matrix $A$ as the matrix whose columns are the elements of the support of $S$, $\{s_1, \ldots, s_{N_S}\}$: $A = [s_1, \ldots, s_{N_S}]$. Its dimension is $N_Z \times N_S$. Define the matrix $A_t = \mathbf{1}[A = t]$, $t \in \mathrm{supp}(T)$; i.e., $A_t$ equals 1 for the elements of $A$ that take the value $t$ and zero otherwise. We write $A[j, n]$ for the element in the $j$-th row and $n$-th column of $A$, $A[\cdot, n]$ for the $n$-th column, and $A[j, \cdot]$ for the $j$-th row. We use $\mathrm{rank}(A)$ for the column rank of $A$. Elements of the strata matrix are deterministic: $A[j, n] = (T \mid Z = z_j, S_n)$; $n \in \{1, \ldots, N_S\}$, $j \in \{1, \ldots, N_Z\}$.

88 From Lemma L-4: $\underbrace{\Pr(T = t \mid Z = z_j)}_{\text{observed}} = \underbrace{A_t[j, \cdot]}_{\text{can be constructed from strata}} \; \underbrace{P_S}_{\text{unknown}}$, (15) where $P_S$ is the vector of stratum probabilities. From Equation (15): $P_S = A_T^{+} [P_Z(0)', \ldots, P_Z(N_T)']'$, (16) where $A_T^{+}$ is the Moore-Penrose inverse of the stacked matrix $A_T = [A_0', \ldots, A_{N_T}']'$.
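Equations (15) and (16) can be checked numerically. The sketch below is illustrative only: it assumes a binary treatment, three instrument values, a hypothetical strata matrix `A` with four monotone strata, and made-up stratum probabilities `P_S`. It builds the propensity scores implied by equation (15) and then recovers `P_S` with NumPy's `np.linalg.pinv`, as in equation (16).

```python
import numpy as np

# Hypothetical strata matrix (assumption): rows are z_1..z_3, columns are strata.
# Column 1: always-takers, columns 2-3: switchers, column 4: never-takers.
A = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 0]])

A0 = (A == 0).astype(float)   # A_t for t = 0
A1 = (A == 1).astype(float)   # A_t for t = 1
A_T = np.vstack([A0, A1])     # stacked matrix A_T = [A_0', A_1']'

# Made-up stratum probabilities (assumption); they sum to one.
P_S = np.array([0.1, 0.4, 0.3, 0.2])

# Equation (15): propensity scores implied by the strata.
P_Z0 = A0 @ P_S
P_Z1 = A1 @ P_S

# Equation (16): recover the stratum probabilities from the propensity scores.
P_S_hat = np.linalg.pinv(A_T) @ np.concatenate([P_Z0, P_Z1])

rank_A_T = np.linalg.matrix_rank(A_T)
```

In this example `A_T` has full column rank, so `P_S_hat` reproduces `P_S`; with more strata than the rank of `A_T`, the pseudo-inverse would only return a minimum-norm solution.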

89 Lemma L-7: In Model H-4, $Y(t) \perp\!\!\!\perp Z \mid S$. Lemma L-8: In Model H-4, $Y(t) \perp\!\!\!\perp T \mid S, Z$.

90 From Lemmas L-1 and L-4, Equation (15), and the definition of $A_t$: $\underbrace{E(Y \mid T = t, Z = z_j)}_{\text{observed}} = \sum_{n=1}^{N_S} \underbrace{A_t[j, n]}_{\text{can be constructed from strata}} \, \frac{\overbrace{E(Y(t) \mid S_n) \Pr(S_n)}^{\text{unobserved}}}{\underbrace{\Pr(T = t \mid Z = z_j)}_{\text{observed}}}$. (17)

91 Theorem T-2: $\{E(Y(t) \mid S_n)\}_{n=1}^{N_S}$ are identified if $\mathrm{rank}(A_t) = N_S$ for each $t \in \mathrm{supp}(T)$. From Equation (14): $E(Y(t)) = \iota_{N_S}' (P_S \odot Q_S(t)) = \iota_{N_S}' A_t^{+} (Q_Z(t) \odot P_Z(t))$, (18) where $\odot$ is the Hadamard product (element-wise multiplication), $A_t^{+}$ is the Moore-Penrose inverse of $A_t$, and $\iota_{N_S}$ is an $N_S$-dimensional vector of ones.
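Equation (18) can be illustrated with a small consistency check. This is a sketch under stated assumptions, not the authors' implementation: the strata matrix `A`, the stratum probabilities `P_S`, and the stratum means `Q_S` are all hypothetical. The "observed" vector is $Q_Z(t) \odot P_Z(t)$, whose $j$-th element equals $E(Y \cdot \mathbf{1}[T = t] \mid Z = z_j)$ and is therefore well defined even when $\Pr(T = t \mid Z = z_j) = 0$.

```python
import numpy as np

# Hypothetical strata matrix (assumption): N_Z = 3 rows, N_S = 2 columns.
A = np.array([[0, 0],
              [1, 0],
              [1, 1]])

P_S = np.array([0.6, 0.4])            # assumed Pr(S_n)
Q_S = {0: np.array([1.0, 2.0]),       # assumed E(Y(0) | S_n)
       1: np.array([3.0, 2.5])}       # assumed E(Y(1) | S_n)

def mean_counterfactual(t):
    """Recover E(Y(t)) from the observed moments via equation (18)."""
    A_t = (A == t).astype(float)
    # Rank condition of Theorem T-2.
    assert np.linalg.matrix_rank(A_t) == A.shape[1]
    # Observed moments Q_Z(t) * P_Z(t), generated here from the assumed truth.
    observed = A_t @ (P_S * Q_S[t])
    # Invert the linear system to get P_S * Q_S(t), then sum over strata.
    recovered = np.linalg.pinv(A_t) @ observed
    return recovered.sum()

EY1 = mean_counterfactual(1)
EY0 = mean_counterfactual(0)
```

Here the recovered values should match the direct averages `(P_S * Q_S[1]).sum()` and `(P_S * Q_S[0]).sum()`.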

92 Theorem T-3 (Identification of stratum probabilities): $\{\Pr(T = t \mid S_n)\}_{n=1}^{N_S}$ are identified if $\mathrm{rank}(A_t) = N_S$ for each $t \in \mathrm{supp}(T)$. Theorem T-4: $\{E(Y(t) \mid S_n)\}_{n=1}^{N_S}$ are identified if $\mathrm{rank}(A_t) = N_S$ for each $t \in \mathrm{supp}(T)$. Corollary C-1: $\mathrm{rank}(A_t) = N_S \Rightarrow E(Y(t))$ is identified.

93 Key Requirement for Identification. The identification of strata treatment effects relies on assumptions that restrict the number of elements in the support of $S$. To achieve this end, one must draw on hypothetical models that restrict how $Z$ and $V$ interact to generate the treatment assignment $T$. While the identification of strata treatment effects requires that $A_t$ be invertible, the identification of the average outcome when treatment is set to $t$, namely $E(Y(t))$, is less restrictive.

94 Stratification and Mixture Distributions

95 Define a probability measure $P_\Lambda$ on $(\mathcal{I}, \mathcal{Q})$ for a probability measure $\Lambda$ on $(\Theta, \Xi)$ by the following transformation: $P_\Lambda(Q) = \int_\Theta P_\theta(Q) \, d\Lambda(\theta)$; $Q \in \mathcal{Q}$. (19) Theorem T-5: A necessary and sufficient condition for identifying all finite mixing distributions (e.g., $\Lambda$) associated with a family of probability measures $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ is that the probabilities $\{P_\theta, \theta \in \Theta\}$ are linearly independent as functions on $\mathcal{Q}$. In our setting, linear independence of $\{P_\theta, \theta \in \Theta\}$ translates to $\mathrm{rank}(A) = \#(\mathrm{supp}(S))$.

96 In our notation, $\Theta = \mathrm{supp}(S)$, and $\Xi$ is a $\sigma$-algebra associated with $\mathrm{supp}(S)$. Let the family of distributions be $\mathcal{P} = \{P_{s_n}, s_n \in \mathrm{supp}(S)\}$, where $P_{s_n}$ is the distribution of $(T \mid S = s_n)$ defined by the following probabilities: $\Pr(T = t \mid S = s_n) = [\Pr(Z = z_1), \ldots, \Pr(Z = z_{N_Z})] \, A_t[\cdot, n]$; $t \in \mathrm{supp}(T)$.

97 The mixing distribution $\Lambda$ represents a distribution of $S$, say $\Lambda = \{\Pr(S = s_n)\}_{n=1}^{N_S}$. In this case, $P_\Lambda$ is given by $P_\Lambda = \sum_{n=1}^{N_S} \Pr(S = s_n) P_{s_n}$. Theorem T-5 states that for $\Lambda$ to be identified, $\{P_\theta, \theta \in \Theta\}$ must be linearly independent. In other words, $A_T$ must have column rank equal to the number of elements in $\mathrm{supp}(S)$, as stated in Theorem T-3.

98 Example for Binary Treatment

99 Binary treatment: $N_T = 2$. Suppose $0 < \Pr(T = 1 \mid Z = z) < 1$ for each $z \in \mathrm{supp}(Z)$. The matrix of all possible strata: for $i = 1, \ldots, N_Z$ and $n = 1, \ldots, 2^{N_Z}$, $A[i, n] = \mathrm{odd}(\lceil n / 2^{i-1} \rceil)$, where $\lceil a \rceil$ is the smallest integer greater than or equal to $a$, and $\mathrm{odd}(a) = 1$ if $a$ is an odd number and zero otherwise. $A_1 = A$ and $A_0 = \iota_{N_Z} \iota_{N_S}' - A$, where $N_Z$ is the cardinality of the support of the random variable $Z$: $N_Z = \#(Z)$.

100 Example of Stratification Matrix for the Binary Case, $T \in \{0, 1\}$. For $\#(Z) = 3$, the matrix $A$ that represents all possible strata $S$ has rows $z_1, z_2, z_3$ (the values of $Z$) and columns $s_1, \ldots, s_8$ (the strata); its columns are the $2^3 = 8$ vectors in $\{0, 1\}^3$. Each column represents one stratum. Each row $j$ associates the strata with the instrument value $Z = z_j$. The presence of elements 0 and 1 in every row implies that $0 < \Pr(T = 1 \mid Z = z) < 1$ for all $z \in \mathrm{supp}(Z)$.

101 Monotonicity. Without further assumptions, treatment effects cannot be identified because $\mathrm{rank}(A) \le N_Z < N_S = 2^{N_Z}$. To identify treatment effects, the number of strata in the support of $S$ must be reduced. Imbens and Angrist (1994) use a monotonicity assumption that, in this framework, reduces the number of strata. Without loss of generality, assume $\Pr(T = 1 \mid Z = z_j) > \Pr(T = 1 \mid Z = z_{j'})$ whenever $j > j'$.
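The rank argument can be checked by enumerating every stratum for a binary treatment. The sketch below is a minimal illustration that assumes the indexing rule $A[i, n] = \mathrm{odd}(\lceil n / 2^{i-1} \rceil)$; this rule is an assumption chosen so that each binary column appears exactly once, and any other ordering of the columns would serve equally well. The function name `all_strata_matrix` is hypothetical.

```python
import numpy as np

def all_strata_matrix(n_z):
    """Return the N_Z x 2**N_Z matrix whose columns list every binary stratum."""
    n_s = 2 ** n_z
    A = np.zeros((n_z, n_s), dtype=int)
    for i in range(1, n_z + 1):
        for n in range(1, n_s + 1):
            ceil_val = -(-n // 2 ** (i - 1))   # integer ceiling of n / 2**(i-1)
            A[i - 1, n - 1] = ceil_val % 2     # odd(.) indicator
    return A

A = all_strata_matrix(3)
A1 = A          # indicator of T = 1
A0 = 1 - A      # indicator of T = 0

n_distinct_columns = len({tuple(col) for col in A.T})
rank_A = np.linalg.matrix_rank(A)
```

With three instrument values there are eight distinct strata but the rank of `A` cannot exceed three, which is the shortfall that the monotonicity assumption is meant to close.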

102 Assumption A-5 (Monotonicity): For all $z_j, z_{j'} \in \mathrm{supp}(Z)$, $j > j' \Rightarrow (T \mid Z = z_j) \ge (T \mid Z = z_{j'})$. Monotonicity assumption A-5 does not imply that $T(z)$ is non-increasing in $z$. Instead, it states that $(T \mid Z = z_j, S_n) \ge (T \mid Z = z_{j'}, S_n)$ for all $n \in \{1, \ldots, N_S\}$ and $j > j'$; $j, j' \in \{1, \ldots, N_Z\}$. Lemma L-9: Under monotonicity assumption A-5, $A$ is lower triangular.

103 Example of Stratification Matrix Under Monotonicity. Under monotonicity assumption A-5, if $\#(Z) = 3$, the admissible strata matrix $A$ that comprises all strata in $\mathrm{supp}(S)$ is: $A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix}$.

104 Previous Literature. The analysis of Vytlacil (2002) shows that, under Monotonicity Assumption A-5, the treatment assignment function is separable in terms of the propensity score and the unobserved variable $V$: there exists a function $\tau(V)$ such that treatment assignments $T$ are defined by $T = \mathbf{1}[\Pr(T = 1 \mid Z) \ge \tau(V)]$. Theorem T-6: There exists a function $\tau(V)$ such that $\Pr(\mathbf{1}[\Pr(T = 1 \mid Z) \ge \tau(V)] = T) = 1$. LATE implicitly assumes a $V$ with these properties.

105 LATE identifies $E(Y(1) - Y(0) \mid X, \underline{V} \le V \le \overline{V})$ (Heckman and Vytlacil, 1999, 2005), where $\underline{V} = \Pr(T = 1 \mid Z = \underline{Z})$, $\overline{V} = \Pr(T = 1 \mid Z = \overline{Z})$, and $(\underline{Z}, \overline{Z})$ are the discrete points of LATE.

106 Theorem T-7: Under Monotonicity Assumption A-5, the strata probabilities $\{\Pr(S_n)\}_{n=1}^{N_S}$ are identified. Theorem T-8: Let $S(t)$ be the union of the $S_n$ such that $\Pr(T = t \mid S_n) > 0$, for $n = 1, \ldots, N_S$. Then, under Monotonicity Assumption A-5, $E(Y(t) \mid S(t))$ is identified.

107 Stratification Treatment Effects

108 Revisit the definition of treatment effects using stratification concepts. For simplicity, take the binary case. Let $\mathrm{supp}(Z) = \{z_1, \ldots, z_{N_Z}\}$ be such that $\Pr(T = 1 \mid Z = z_1) = 0$, $\Pr(T = 1 \mid Z = z_{N_Z}) = 1$, and $\Pr(T = 1 \mid Z = z_n) > \Pr(T = 1 \mid Z = z_{n'})$ whenever $n > n'$. Under Monotonicity Assumption A-5, the number of strata is $N_S = N_Z - 1$ and the dimension of $A$ is $N_Z \times (N_Z - 1)$.

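109 The strata matrix $A$ and its Moore-Penrose inverse $A_1^{+}$ are of the following form: $A = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}$ and $A_1^{+} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 & 1 \end{pmatrix}$, i.e., $A[j, n] = \mathbf{1}[j > n]$.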

110 Strata Probabilities. Using Theorem T-3, we can identify the strata probabilities and generate the following relationship for the treatment group: $\Pr(S_n) = \Pr(T = 1 \mid Z = z_{n+1}) - \Pr(T = 1 \mid Z = z_n)$; $n \in \{1, \ldots, N_S\}$. (20)
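Equation (20) says that the stratum probabilities are first differences of the propensity score. A minimal sketch of this, assuming the monotone strata matrix has entries $A[j, n] = \mathbf{1}[j > n]$ (an assumption consistent with equation (20)) and using made-up propensity scores `p`; the helper name `monotone_strata_matrix` is hypothetical:

```python
import numpy as np

def monotone_strata_matrix(n_z):
    """N_Z x (N_Z - 1) matrix with A[j, n] = 1 if j > n (1-based indices)."""
    j = np.arange(1, n_z + 1)[:, None]
    n = np.arange(1, n_z)[None, :]
    return (j > n).astype(float)

# Made-up propensity scores Pr(T = 1 | Z = z_j), increasing from 0 to 1.
p = np.array([0.0, 0.3, 0.7, 1.0])

A = monotone_strata_matrix(len(p))
A_pinv = np.linalg.pinv(A)

# Stratum probabilities from the pseudo-inverse, and from equation (20) directly.
P_S_from_pinv = A_pinv @ p
P_S_from_diff = np.diff(p)
```

The two vectors should agree, and the first column of `A_pinv` should be zero because the first row of `A` carries no information.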

111 Under monotonicity assumption A-5: $\underbrace{E(Y(1) \mid S_n) \Pr(S_n)}_{\text{unobserved}} = \underbrace{E(YT \mid Z = z_{n+1}) - E(YT \mid Z = z_n)}_{\text{observed}}$; $n \in \{1, \ldots, N_S\}$. (21) For the control group: $\underbrace{E(Y(0) \mid S_n) \Pr(S_n)}_{\text{unobserved}} = \underbrace{E(Y(1 - T) \mid Z = z_n) - E(Y(1 - T) \mid Z = z_{n+1})}_{\text{observed}}$; $n \in \{1, \ldots, N_S\}$. (22)

112 LATE is a Stratum Treatment Effect and Stratum Treatment Effects are LATEs. Subtracting (22) from (21) within each stratum: $E(Y(1) - Y(0) \mid S_n) \Pr(S_n) = E(Y \mid Z = z_{n+1}) - E(Y \mid Z = z_n)$; $n \in \{1, \ldots, N_S\}$. By equation (20): $E(Y(1) - Y(0) \mid S_n) = \frac{E(Y \mid Z = z_{n+1}) - E(Y \mid Z = z_n)}{\Pr(T = 1 \mid Z = z_{n+1}) - \Pr(T = 1 \mid Z = z_n)}$ (23) $= \mathrm{LATE}(z_{n+1}, z_n)$; $n \in \{1, \ldots, N_S\}$. (24) This shows that the strata average treatment effect is identical to the Local Average Treatment Effect (LATE) in Imbens and Angrist (1994).
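The equivalence in equations (23) and (24) can be verified on a toy example. The sketch below is illustrative and rests on assumptions: the strata matrix is taken to be $A[j, n] = \mathbf{1}[j > n]$, and the stratum probabilities and stratum mean outcomes are made up. It builds $E(Y \mid Z = z_j)$ and the propensity scores from these primitives and then forms the ratios of adjacent differences.

```python
import numpy as np

# Assumed monotone strata matrix for N_Z = 4: A1[j, n] = 1 if j > n.
A1 = np.array([[0, 0, 0],
               [1, 0, 0],
               [1, 1, 0],
               [1, 1, 1]], dtype=float)
A0 = 1.0 - A1

P_S = np.array([0.3, 0.4, 0.3])   # made-up Pr(S_n)
Y1 = np.array([2.0, 3.0, 5.0])    # made-up E(Y(1) | S_n)
Y0 = np.array([1.0, 2.5, 1.0])    # made-up E(Y(0) | S_n)

# Observable quantities implied by the model.
EY_given_Z = A1 @ (Y1 * P_S) + A0 @ (Y0 * P_S)   # E(Y | Z = z_j)
p = A1 @ P_S                                      # Pr(T = 1 | Z = z_j)

# Equation (23): ratio of adjacent differences, one per stratum.
late = np.diff(EY_given_Z) / np.diff(p)

stratum_effects = Y1 - Y0
```

Each entry of `late` should coincide with the corresponding entry of `stratum_effects`.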

113 Heckman and Vytlacil (1999, 2005) show how to use $\Pr(T = 1 \mid Z)$ to trace out the support of $V$ on which LATE implicitly conditions. This identifies the quantiles of $V$ associated with the changers.

114 Generating Marginal Treatment Effects. A change of variables permits us to explore this further. Let $p_n = \Pr(T = 1 \mid Z = z_n)$ and define $\Delta_n = \Pr(T = 1 \mid Z = z_{n+1}) - \Pr(T = 1 \mid Z = z_n)$. In this notation, $\sum_{j=1}^{n} \Pr(S_j) = p_n + \Delta_n$ according to equation (20). Define $L = F_S(S)$, where $F_S$ is the cdf of $S$. From equation (20): $L = \sum_{n=1}^{N_S} \Big( \sum_{j=1}^{n} \Pr(S_j) \Big) \mathbf{1}[S = s_n] = \sum_{n=1}^{N_S} (p_n + \Delta_n) \, \mathbf{1}[S = s_n]$. (25)

115 Thus $L = p_n + \Delta_n$, $S = s_n$, and $V \in S_n$ are equivalent, and $P = p_n$ and $Z = z_n$ are equivalent. Thus: $E(Y(1) - Y(0) \mid L = p_n + \Delta_n) = \frac{E(Y \mid P = p_n + \Delta_n) - E(Y \mid P = p_n)}{\Delta_n} = \mathrm{LATE}(p_n + \Delta_n, p_n)$.

116 MTE. If $L$ and $P$ are absolutely continuous: $E(Y(1) - Y(0) \mid L = p) = \lim_{\Delta \to 0} \frac{E(Y \mid P = p + \Delta) - E(Y \mid P = p)}{\Delta} = \lim_{\Delta \to 0} \mathrm{LATE}(p + \Delta, p) = \mathrm{MTE}(p)$. $\mathrm{MTE}(p)$ is the Marginal Treatment Effect (MTE) of Heckman and Vytlacil (1999). Heckman and Vytlacil (2005, 2007) show how, under different supports, $\mathrm{MTE}(p)$ generates a variety of treatment parameters. Through limit operations with continuous instruments, Heckman and Vytlacil identify the points of evaluation of the quantiles of $V$.

117 Treatment Effect Weights

118 Treatment Effect Weights. A consequence of Lemma L-4: $Y(t) \perp\!\!\!\perp (T, Z) \mid S$. Thus: $E(Y(t) \mid S, K) = E(Y(t) \mid S)$. (26) This causal effect is a weighted average of the total treatment effects within strata: $E(Y(1) - Y(0) \mid K) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n, K) \Pr(S_n \mid K) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n) \frac{\Pr(K \mid S_n) \Pr(S_n)}{\Pr(K)}$ (by Equation (26)) $= \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n) \, \omega_{K,n}$, (27) where the weights $\omega_{K,n} = \frac{\Pr(K \mid S_n) \Pr(S_n)}{\Pr(K)}$ are positive and $\sum_{n=1}^{N_S} \omega_{K,n} = 1$.

119 In the case of the average treatment effect, the weights are $\omega_{K,n} = \Pr(S_n)$. We can use equation (27) to examine the average treatment effect in the case of absolutely continuous variables. Note that in this case $L$ has a uniform $[0, 1]$ distribution, as $L = F_S(S)$, and the average treatment effect can be written as: $E(Y(1) - Y(0)) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n) \Pr(S_n) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid L = p_n + \Delta_n) \Pr(L = p_n + \Delta_n)$. The equation for the continuous case is: $E(Y(1) - Y(0)) = \int_0^1 \mathrm{MTE}(p) \, dp$. (28)
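The passage from the discrete sum to the integral in equation (28) can be illustrated numerically. This is only a sketch: the marginal treatment effect function `mte` is a hypothetical linear example, the propensity score is assumed to take values on an evenly spaced grid from 0 to 1, and each stratum effect is approximated by the MTE at the midpoint of its interval (exact for a linear MTE).

```python
import numpy as np

def mte(p):
    """Hypothetical marginal treatment effect (assumption): MTE(p) = 2 - 3p."""
    return 2.0 - 3.0 * p

# Assumed propensity score grid p_1 = 0 < ... < p_{N_Z} = 1.
grid = np.linspace(0.0, 1.0, 11)
delta = np.diff(grid)                 # Delta_n = Pr(S_n), by equation (20)
midpoints = grid[:-1] + delta / 2.0

# Discrete version: sum of stratum effects times stratum probabilities.
ate_discrete = np.sum(mte(midpoints) * delta)

# Continuous version: closed form of the integral of 2 - 3p over [0, 1].
ate_integral = 2.0 - 3.0 / 2.0
```

For a nonlinear MTE the discrete sum would approach the integral only as the grid of propensity score values becomes finer.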

120 Since $Z \perp\!\!\!\perp S$ (Lemma L-4): $\Pr(T = 1 \mid S_n) = [\Pr(Z = z_1), \ldots, \Pr(Z = z_{N_Z})] \, A_1[\cdot, n] = \sum_{j=n+1}^{N_Z} \Pr(Z = z_j) = 1 - F_P(p_n + \Delta_n)$, where $F_P$ stands for the cumulative distribution function of $P$.

121 Thus the weights for the effect of treatment on the treated are given by: $E(Y(1) - Y(0) \mid T = 1) = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid S_n) \underbrace{\frac{(1 - F_P(p_n + \Delta_n)) \Pr(S_n)}{\Pr(T = 1)}}_{\text{weights}} = \sum_{n=1}^{N_S} E(Y(1) - Y(0) \mid L = p_n + \Delta_n) \underbrace{\frac{(1 - F_P(p_n + \Delta_n)) \Pr(L = p_n + \Delta_n)}{\Pr(T = 1)}}_{\text{weights}}$. (29) For the continuous case: $E(Y(1) - Y(0) \mid T = 1) = \int_0^1 \mathrm{MTE}(p) \underbrace{\frac{1 - F_P(p)}{\Pr(T = 1)}}_{\text{weights}} \, dp$. (30)
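The weights for the treated group can be computed directly from the strata matrix. The following sketch is illustrative and uses assumptions throughout: a monotone strata matrix with $A_1[j, n] = \mathbf{1}[j > n]$, a made-up instrument distribution `pi_Z`, made-up stratum probabilities, and made-up stratum effects. It uses $\Pr(T = 1 \mid S_n) = \sum_{j=n+1}^{N_Z} \Pr(Z = z_j)$, which relies on $Z$ being independent of $S$.

```python
import numpy as np

# Assumed monotone strata matrix for N_Z = 4: A1[j, n] = 1 if j > n.
A1 = np.array([[0, 0, 0],
               [1, 0, 0],
               [1, 1, 0],
               [1, 1, 1]], dtype=float)

pi_Z = np.array([0.1, 0.2, 0.3, 0.4])        # made-up Pr(Z = z_j)
P_S = np.array([0.3, 0.4, 0.3])              # made-up Pr(S_n)
stratum_effects = np.array([1.0, 0.5, 4.0])  # made-up E(Y(1) - Y(0) | S_n)

# Pr(T = 1 | S_n): probability mass of instrument values that switch stratum n on.
pr_T1_given_S = pi_Z @ A1

# Pr(T = 1) by the law of total probability over strata.
pr_T1 = pr_T1_given_S @ P_S

# Weights omega_{K,n} for K = {T = 1}, and the implied treated-group effect.
weights = pr_T1_given_S * P_S / pr_T1
effect_on_treated = stratum_effects @ weights
```

As a cross-check, `pr_T1` should also equal the instrument-weighted average of the propensity scores, `pi_Z @ (A1 @ P_S)`.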

122 5. Summary of Identification Strategies for Binary Case

123 The Fundamental Confounding Problem. All of the methods for identifying the Haavelmo model that are discussed in this paper take different approaches to solving the fundamental integral equation: $\underbrace{E(Y \mid T = t, X = x)}_{\text{data}} = \int \underbrace{E(Y \mid T = t, X = x, V = v)}_{\text{counterfactual}} \, \underbrace{dF_{V \mid X = x}(v)}_{\text{mixing distribution}}$.

124 Table 4: Alternative Approaches to Identifying Treatment Effects by Eliminating $V$: Binary Treatment Case. Setup: $(Y(1), Y(0)) \perp\!\!\!\perp T \mid X, V$; $T \in \{0, 1\}$; $Y = TY(1) + (1 - T)Y(0)$; $E(Y \mid T = t, X = x) = \int E(Y \mid T = t, X = x, V = v) \, dF_{V \mid X = x}(v)$.
Method | Need Instrument ($Z$)? | Identify Distribution of $V$? | $V \perp\!\!\!\perp X$? $V \perp\!\!\!\perp Z$?
Matching (a): $V$, $X$ known | No | Yes ($V$ observed) | No
Control Functions (b): $V$ estimated; $X$, $Z$ known (continuous $T$); bounds on quantiles of $V$ estimated (discrete case) | Yes | Yes (over support) | $V \perp\!\!\!\perp X, Z$
Factor Method (c): distribution of $V$ estimated from additional measurements of $V$ ($M$) | No | Yes (with auxiliary measurements, over support) | Typically $V \perp\!\!\!\perp X$ (not strictly required)
IV LATE (d): $Z$, $X$ known | Yes | Estimate intervals of quantiles of $V$ (Heckman and Vytlacil, 1999, 2005) and conditions on them | No; $V \perp\!\!\!\perp Z$
