Instrumental Variables: Then and Now

Size: px

Start display at page:

Download "Instrumental Variables: Then and Now"

Damon Fox
5 years ago
Views:

1 Instrumental Variables: Then and Now James Heckman University of Chicago and University College Dublin Econometric Policy Evaluation, Lecture II Koopmans Memorial Lectures Cowles Foundation Yale University September 27, / 188

2 This lecture based on Heckman, Urzua and Vytlacil (2006) Understanding Instrumental Variables in Models With Essential Heterogeneity 2 / 188

3 Policy adoption problem Suppose a policy is proposed for adoption in a country. What can we conclude about the likely effectiveness of the policy in countries? Build a model of counterfactuals. Y 1 = µ 1 (X ) + U 1 (1.1) Y 0 = µ 0 (X ) + U 0. 3 / 188

4 Consider the basic generalized Roy model Two potential outcomes (Y 0, Y 1 ). A choice equation Observed outcomes are D = 1[µ D (Z, V ) > 0]. }{{} net utility Y = DY 1 + (1 D)Y 0 Assume µ D (Z, V ) = µ D (Z) V. 4 / 188

5 Switching Regression Notation Y = Y 0 + (Y 1 Y 0 )D (1.2) = µ 0 + (µ 1 µ 0 + U 1 U 0 )D + U 0. (Quandt, 1958, 1972) In Conventional Regression Notation Y = α + βd + ε (1.3) α = µ 0, β = (Y 1 Y 0 ) = µ 1 µ 0 + U 1 U 0, ε = U 0. β is the treatment effect. 5 / 188

6 Figure 1: distribution of gains, a Roy economy T T Return to Marginal Agent AT E T U T C=1.5 5 =Y 1 -Y 0 β = Y 1 Y 0 TT= 2.666, TUT= Return to Marginal Agent = C = 1.5, ATE = µ 1 µ 0 = β = / 188

7 The model Outcomes Choice Model Y 1 = µ 1 + U 1 = α + β + U 1 D = Y 0 = µ 0 + U 0 = α + U 0 { 1 if D > 0 0 if D 0 General Case (U 1 U 0 ) D ATE TT TUT 7 / 188

8 The model The Researcher Observes (Y, D, C) Y = α + βd + U 0 where β = Y 1 Y 0 Parameterization α = 0.67 (U 1, U [ 0 ) N (0, Σ) ] D = Y 1 Y 0 C β = 0.2 Σ = C = / 188

9 In the case when U 1 = U 0 = ε 0, simple least squares regression of Y on D subject to a selection bias. This is a form of endogeneity bias considered by the Cowles analysts. Upward biased for β if Cov(D, ε) > 0. 9 / 188

10 Three main approaches have been adopted to solve this problem: 1 Selection models 2 Instrumental variable models 3 Matching: assumes that ε D X. Matching is just nonparametric least squares and assumes access to rich data which happens to guarantee this condition. 10 / 188

11 Case I, the traditional case: β is a constant If there is an instrument Z, with the property that then Cov(Z, D) 0 (1.4) Cov(Z, ε) = 0, (1.5) plim ˆβ IV = Cov(Z, Y ) Cov(Z, D) = β. If other instruments exist, each identifies the same β. 11 / 188

12 Case II, heterogeneous response case: β is a random variable even conditioning on X Sorting bias or sorting on the gain which is distinct from sorting on the level. Essential heterogeneity Cov(β, D) 0. Suppose (1.4), (1.5) and Cov(Z, β) = 0. (1.6) Can we identify the mean of (Y 1 Y 0 ) using IV? 12 / 188

13 In general we cannot (Heckman and Robb, 1985). Let β = (µ 1 µ 0 ) β = β + η U 1 U 0 = η Y = α + βd + [ε + ηd]. Need Z to be uncorrelated with [ε + ηd] to use IV to identify β. This condition will be satisfied if policy adoption is made without knowledge of η (= U 1 U 0 ). If decisions about D are made with partial or full knowledge of η, IV does not identify β. 13 / 188

14 The IV condition is E [ε + ηd Z] = 0. E (ε Z) = 0, E (η Z) = 0. Even if η Z, η Z D = 1. E(ηD Z) = E(η D = 1, Z) Pr(D = 1 Z). But E (η Z, D = 1) 0, in general, if agents have some information about the gains. 14 / 188

15 Draft Lottery example (Heckman, 1997). Linear IV does not identify ATE or any standard treatment parameters. 15 / 188

16 Imbens Angrist conditions (1994) Imbens and Angrist (1994) establish that IV can identify an interpretable parameter in the model with essential heterogeneity. Their parameter is a discrete approximation to the marginal gain parameter of Björklund and Moffitt (1987). This parameter can be interpreted as the marginal gain to outcomes induced from a marginal change in the costs of participating in treatment (Björklund-Moffitt). 16 / 188

17 Imbens Angrist conditions (1994) Imbens and Angrist assume the existence of an instrument Z that takes two or more distinct values. Keep conditioning on X implicit. Let D i (z) be the indicator (= 1 if adopted; = 0 if not) It is a random variable for choice when we set Z = z. 17 / 188

18 Imbens Angrist conditions (1994) IV-1 (Independence) Z ( Y 1, Y 0, {D (z)} z Z ). IV-2 (Rank) Pr(D = 1 Z) depends on Z. They supplement the standard IV assumption with a monotonicity assumption. IV-3 (Monotonicity or Uniformity) D i (z) D i (z ) or D i (z) D i (z ) i = 1,..., I. 18 / 188

19 Imbens Angrist conditions (1994) Uniformity of responses across persons. Uniformity is satisfied when, for z < z, D i (z) D i (z ) for all i, while for z > z, D i (z ) D i (z ) for all i. 19 / 188

20 Imbens Angrist conditions (1994) These conditions imply the LATE parameter. E (Y Z = z) E (Y Z = z ) = E ((D(z) D(z )) (Y 1 Y 0 )) (Independence) 20 / 188

21 Imbens Angrist conditions (1994) Using iterated expectations, E (Y Z = z) E (Y Z = z ) (1.7) ( ) E (Y1 Y = 0 D (z) D (z ) = 1) Pr (D (z) D (z ) = 1) ( ) E (Y1 Y 0 D (z) D (z ) = 1) Pr (D (z) D (z. ) = 1) Monotonicity allows us to drop out one term. 21 / 188

22 Imbens Angrist conditions (1994) Suppose, for example, that Pr(D(z) D(z ) = 1) = 0. Thus, E (Y Z = z) E (Y Z = z ) = E (Y 1 Y 0 D(z) D(z ) = 1) Pr (D(z) D(z ) = 1). E (Y Z = z) E (Y Z = z ) LATE = Pr(D = 1 Z = z) Pr(D = 1 Z = z ) = E (Y 1 Y 0 D(z) D(z ) = 1) (1.8) The mean gain to those induced to switch from 0 to 1 by a change in Z from z to z. 22 / 188

23 Imbens Angrist conditions (1994) Observe LATE = ATE if Pr (D = 1 Z = z) = 1 while Pr (D = 1 Z = z ) = 0. Identification at infinity plays a crucial role throughout the entire literature on policy evaluation. 23 / 188

24 Imbens Angrist conditions (1994) In general, LATE E(Y 1 Y 0 ) = E(β). Not treatment on the treated: E(β D = 1). Different instruments define different parameters. Having a wealth of different strong instruments does not improve the precision of the estimate of any particular parameter (Heckman and Robb, 1986). When there are more than two distinct values of Z, Imbens and Angrist use Yitzhaki (1989) weights. 24 / 188

25 Imbens Angrist conditions (1994) Goal of our work: unify literature with a common set of underlying parameters interpretable across studies. To understand how to connect the results of various disparate IV estimands within a unified framework. 25 / 188

26 IV in choice models D = 1 [D > 0] (2.1) 1[ ] is an indicator (1[A] = 1 if A true; 0 otherwise). Example: µ D (Z) = γz D = µ D (Z) V (2.2) D = γz V 26 / 188

27 Examples (V Z) X. The propensity score: P(z) = Pr(D = 1 Z = z) = Pr(γz > V ) = F V (γz) F V is the distribution of V. 27 / 188

28 Examples Generalized Roy model D = 1[Y 1 Y 0 C > 0] Costs C = µ C (W ) + U C Z = (X, W ) µ D (Z) = µ 1 (X ) µ 0 (X ) µ C (W ) V = (U 1 U 0 U C ). 28 / 188

29 Heterogeneous response model In a general model with heterogenous responses, specification of P(Z) and its relationship with the instrument play a crucial role. Cov (Z, ηd) = E (( Z Z ) ηd ) = E (( Z Z ) η D = 1 ) Pr (D = 1) = E( ( Z Z ) η γz > V }{{} F V (γz) > F V (V ) P(Z) > U D )Pr (γz > V ). }{{} P(Z) Probability of selection enters the covariance even though we use only one component of Z as an instrument. 29 / 188

30 Selection models control for this dependence induced by choice. 30 / 188

31 Selection models Assume (U 1, U 0, V ) Z (2.3) [Alternatively (ε, η, V ) Z]. η = (U 1 U 0 ), ε = U 0 (2.4) E (Y D = 0, Z = z) = E(Y 0 D = 0, Z = z) = α + E(U 0 γz < V ) E(Y D = 0, Z = z) = α + K 0 (P(z) }{{} ) control function 31 / 188

32 Selection models E (Y D = 1, Z = z) = E (Y 1 D = 1, Z = z) = α + β + E (U 1 γz > V ) = α + β + K 1 (P(z)) }{{} control function K 0 (P(z)) and K 1 (P(z)) are control functions in the sense of Heckman and Robb (1985, 1986). P(z) is an essential ingredient. Matching: K 1 (P(z)) = K 0 (P(z)). 32 / 188

33 In a model where β is variable and not independent of V, misspecification of Z affects the interpretation of what IV estimates analogous to its role in selection models. Misspecification of Z affects both approaches to identification. This is a new phenomenon in models with heterogenous β. 33 / 188

34 Model for outcomes Y 1 = µ 1 (X, U 1 ) (3.1) Y 0 = µ 0 (X, U 0 ). X are observed and (U 1, U 0 ) are unobserved by the analyst. The X may be dependent on U 0 and U 1. Generalize choice model (2.1) and (2.2) for D, a latent utility. 34 / 188

35 Model for outcomes D = µ D (Z) V and D = 1 (D 0) (3.2) µ D (Z) V can be interpreted as a net utility for a person with characteristics (Z, V ). β = Y 1 Y 0 = µ 1 (X, U 1 ) µ 0 (X, U 0 ) (Treatment Effect) 35 / 188

36 Model for outcomes A special case that links our analysis to standard models in econometrics: Y 1 = X β 1 + U 1 and Y 0 = X β 0 + U 0 ; so β = X (β 1 β 0 ) + (U 1 U 0 ). In the case of separable outcomes, heterogeneity in β arises because in general U 1 U 0 and people differ in their X. Heckman-Vytlacil conditions (1999,2001, 2005) 36 / 188

37 Assumptions A-1 The distribution of µ D (Z) conditional on X is nondegenerate (Rank Condition for IV). This says that we can vary Z (excluded from outcome equations) given X. Key property of an instrument. A-2 (U 0, U 1, V ) are independent of Z conditional on X (Independence Condition for IV). Z is not affecting potential outcomes or affecting the unobservables affecting choices. 37 / 188

38 Assumptions A-3 The distribution of V is continuous (not essential). A-4 E Y 1 <, and E Y 0 < (Finite Means). 38 / 188

39 Assumptions A-5 1 > Pr (D = 1 X ) > 0 (For each X there is a treatment group and a comparison group). A-6 Let X 0 denote the counterfactual value of X that would have been observed if D is set to 0. X 1 is defined analogously. Thus X d = X, for d = 0, 1 (The X d are invariant to counterfactual manipulations). 39 / 188

40 Separability between V and µ D (Z) in choice equation is conventional. Plays an important role in the properties of instrumental variable estimators in models with essential heterogeneity. It implies monotonicity (uniformity) condition (IV-3) from choice equation (3.2). Vytlacil (2002) shows that independence and monotonicity (IV-3) imply the existence of a V and representation (3.2) given some regularity conditions. 40 / 188

41 Use probability integral transform to write D = 1 [F V (µ D (Z)) > F V (V )] = 1 [P (Z) > U D ] (3.3) U D = F V (V ) and P (Z) = F V (µ D (Z)) = Pr[D = 1 Z] P(Z) is transformation of mean scale utility in a discrete choice model. 41 / 188

42 LATE, the marginal treatment effect and instrumental variables A basic parameter that can be used to unify the treatment effect literature: MTE (x, u D ) = E(Y 1 Y 0 X = x, U D = u D ). = E (β X = x, V = v) MTE and the local average treatment effect (LATE) parameter are closely related. For (z, z ) Z(x) Z(x) so that P(z) > P(z ), under (IV-3) and independence (A-2), LATE is: LATE (z, z) = E (Y 1 Y 0 D (z) = 1, D (z ) = 0) (3.4) 42 / 188

43 LATE, the marginal treatment effect and instrumental variables LATE can be written in a fashion free of any instrument: E (Y 1 Y 0 D(z) = 1, D(z ) = 0) (3.5) = E (Y 1 Y 0 u D < U D < u D ) = LATE (u D, u D ) u D = Pr(D (z) = 1) = Pr (D (z) = 1 Z = z) = Pr(D (z) = 1) = P(z) u D = Pr (D (z ) = 1 Z = z ) = Pr(D (z ) = 1) = P(z ) The z just help us define evaluation points for the u D. 43 / 188

44 LATE, the marginal treatment effect and instrumental variables Under (A-1) (A-5), all standard treatment parameters are weighted averages of MTE with weights that can be estimated. 44 / 188

45 LATE, the marginal treatment effect and instrumental variables Table 1A: treatment effects and estimands as weighted averages of the marginal treatment effect ATE(x) = E (Y 1 Y 0 X = x) = 1 0 MTE (x, u D ) du D TT(x) = E (Y 1 Y 0 X = x, D = 1) = 1 0 MTE (x, u D ) ω TT (x, u D ) du D TUT(x) = E (Y 1 Y 0 X = x, D = 0) = 1 0 MTE (x, u D ) ω TUT (x, u D ) du D Policy Relevant Treatment Effect (x) = E (Y a X = x) E (Y a X = x) = 1 0 MTE (x, u D ) ω PRTE (x, u D ) du D for two policies a and a that affect the Z but not the X IV J (x) = 1 0 MTE (x, u D ) ω J IV (x, u D) du D, given instrument J OLS(x) = 1 0 MTE (x, u D ) ω OLS (x, u D ) du D 45 / 188

46 LATE, the marginal treatment effect and instrumental variables Table 1B: weights ω ATE (x, u D ) = 1 ω TT (x, u D ) = [ ] 1 1 u D f (p X = x)dp E(P X = x) ω TUT (x, u D ) = [ u D 0 f (p X = x) dp ] 1 E ((1 P) X = x) [ ] FPa,X (u D ) F Pa,X (u D ) ω PRTE (x, u D ) = P 46 / 188

47 LATE, the marginal treatment effect and instrumental variables Table 1B: weights ωiv J (x, u D) 1 u = D (J(Z) E(J(Z) X = x)) f J,P X (j, t X = x) dt dj Cov(J(Z), D X = x) ω OLS (x, u { D ) } E(U1 X = x, U D = u D ) ω 1 (x, u D ) E(U 0 X = x, U D = u D ) ω 0 (x, u D ) = 1 + MTE (x, u D ) 47 / 188

48 LATE, the marginal treatment effect and instrumental variables Table 1B: weights ω 1 (x, u D ) = [ ] [ ] 1 1 u D f (p X = x) dp E(P X = x) ω 0 (x, u D ) = [ u D 0 f (p X = x) dp ] 1 E((1 P) X = x) Source: Heckman and Vytlacil (2005) 48 / 188

49 LATE, the marginal treatment effect and instrumental variables Figure 2: weights for the marginal treatment effect for different parameters 2 D P h(u D ) MTE.... U D 49 / 188

50 LATE, the marginal treatment effect and instrumental variables E(β U D = u D ) does not vary with u D. Standard case. ATE = TT = LATE = policy counterfactuals = plim IV. 50 / 188

51 LATE, the marginal treatment effect and instrumental variables When will E(β U D = u D ) not vary with u D? 1 If U 1 = U 0 β a Constant. 2 More Generally, if U 1 U 0 is mean independent of U D, so treatment effect heterogeneity is allowed but individuals do not act upon their own idiosyncratic effect. 51 / 188

52 LATE, the marginal treatment effect and instrumental variables Consider standard analysis. plim of OLS: ln Y = α + ( β + U 1 U 0 )D + U 0 E (ln Y D = 1) E (ln Y D = 0) { = β E (U0 D = 1) + E(U 1 U 0 D = 1) + E (U 0 D = 0) = ATE + Sorting Gain + Ability Bias }{{} } = TT + Ability Bias 52 / 188

53 LATE, the marginal treatment effect and instrumental variables If ATE is a parameter of interest, OLS suffers from both sorting bias and ability bias. If TT is parameter of interest, OLS suffers from ability bias. Using IV removes ability bias, but changes the parameter being estimated (neither ATE nor TT in general). Different IV Weight MTE differently. We derive IV weights below. 53 / 188

54 LATE, the marginal treatment effect and instrumental variables IV Instrument Dependent (which Z used and which values of Z used). Hence studies using different Z are not comparable. How to make studies comparable? We can test to see if these complications are required in any particular empirical analysis. 54 / 188

55 Identifying MTE Testing for essential heterogeneity E (Y Z = z) = E(Y P(Z) = p) (index sufficiency) = E (DY 1 + (1 D)Y 0 P (Z) = p) = E(Y 0 ) + E (D (Y 1 Y 0 ) P (Z) = p) [ E(Y1 Y = E(Y 0 ) + 0 D = 1, P (Z) = p) Pr (D = 1 Z = z) = E(Y 0 ) + p 0 E(Y 1 Y 0 U D = u D ) du D. ] 55 / 188

56 Identifying MTE Testing for essential heterogeneity As a consequence, we get LIV (Local Instrumental Variables), which identifies MTE E (Y Z = z) P(z) = E(Y 1 Y 0 U D = u D ). (3.6) P(Z)=uD }{{}}{{} MTE LIV When β D, Y is linear in P (Z): where b = MTE = ATE = TT. E (Y Z) = a + bp (Z) (3.7) These results are valid whether or not Y 1 and Y 0 are separable in U 1 and U 0. Therefore we can identify the treatment parameters using estimated weights and estimated MTE. 56 / 188

57 Identifying MTE Example: college attendance on wages for high school graduates E(Y X, P) as a function of P for average X E(Y P) P Source: Carneiro, Heckman and Vytlacil (2006) 57 / 188

58 Identifying MTE Example: college attendance on wages for high school graduates E(Y 1 Y ) X, U S ) estimated using locally quadratic regression (averaged over X ) E(Y 1 - Y 0 X,U S ) U S Source: Carneiro, Heckman and Vytlacil (2006) 58 / 188

59 Identifying MTE Example: costs of breast cancer treatments using different instruments in P(Z) MTE(UD) ( in $) MTE(UD) Profile and 95% CI with IV = FEEDIF UD Source: Basu, Heckman and Urzua 59 / 188

60 Identifying MTE Example: costs of breast cancer treatments using different instruments in P(Z) MTE(UD) ( in $) MTE(UD) Profile and 95% CI with IV = NORTH, FEEDIF UD Source: Basu, Heckman and Urzua 60 / 188

61 Identifying MTE Example: costs of breast cancer treatments using different instruments in P(Z) Estimated propensity score for BCSRT and MST MTE(η q, u D ) E(P(Z) X) BCSRT MST Using IV = NORTH Source: Basu, Heckman and Urzua 61 / 188

62 Identifying MTE Example: costs of breast cancer treatments using different instruments in P(Z) ω ATE (η q, u D ) MTE(u D ) MTE(UD) ( in $) MTE(UD) Profile and 95% CI with IV = NORTH UD Source: Basu, Heckman and Urzua 62 / 188

63 Identifying MTE Example: costs of breast cancer treatments using different instruments in P(Z) ω TT (η q, u D ) ω IV (η q, u D ) Source: Basu, Heckman and Urzua 63 / 188

64 Identifying MTE Example: costs of breast cancer treatments using different instruments in P(Z) Source: Basu, Heckman and Urzua 64 / 188

65 Identifying MTE Example: costs of breast cancer treatments using different instruments in P(Z) Source: Basu, Heckman and Urzua 65 / 188

66 Identifying MTE Example: unionism on wages Cross Section: 1989 E(Y X, P) Source: Heckman, Schmierer and Urzua (2006) 66 / 188

67 Identifying MTE Example: unionism on wages Bootstrap Mean Estimates of MTE Note: Source: The top Heckman, panel plots Schmierer the estimated and Urzua outcomes (2006) using the whole sample over the support of P with X fixed at its mean value. The bottom panel plots the estimated outcomes using the bootstrap mean coefficients. In each panel, the dotted line represents the estimates using only a linear term in P and the solid line has up to a quartic term in P. 67 / 188

68 Understanding what linear IV estimates Consider J(Z) as an instrument, a scalar function of Z. IV J = Cov(Y, J(Z)) Cov(D, J(Z)) Express it as a weighted average of MTE. Z can be a vector of instruments. 68 / 188

69 Understanding what linear IV estimates Under (A-1) (A-5) and separable choice model IV J = 1 0 MTE (u D ) ω J IV (u D ) du D (3.8) ωiv J (u D ) = E ( ) J (Z) J(Z) P (Z) > u D Pr (P (Z) > ud ). Cov (J (Z), D) (3.9) J(Z) and P(Z) do not have to be continuous random variables. Functional forms of P(Z) and J(Z) are general. 69 / 188

70 Understanding what linear IV estimates Dependence between J(Z) and P(Z) gives shape and sign to the weights. If J(Z) = P(Z), then weights obviously non-negative. If E(J(Z) J(Z) P(Z) u D ) not monotonic in u D, weights can be negative. 70 / 188

71 Understanding what linear IV estimates When J(Z) = P(Z), weight (3.9) follows from Yitzhaki (1989). He considers a regression function E (Y P (Z) = p). Linear regression of Y on P identifies β Y,P = 1 0 ω (p) = [ E (Y P (Z) = p) 1 p p (t E (P)) df P (t) Var (P) ] ω (p) dp, This is the weight (3.9) when P is the instrument. This expression does not require uniformity or monotonicity for the model; consistent with 2-way flows.. 71 / 188

72 Understanding what linear IV estimates Yitzhaki s theorem Theorem Assume (Y, X ) i.i.d. E( Y ) < E( X ) < µ Y = E(Y ) µ X = E(X ) E(Y X ) = g(x ) Assume g (X ) exists and E ( g (X ) ) <. 72 / 188

73 Understanding what linear IV estimates Yitzhaki s theorem Theorem (cont.) Then, where ω(t) = = Cov(Y, X ) Var(X ) = g (t) ω(t) dt, 1 (x µ X ) f X (x) dx Var(X ) t 1 Var(X ) E (X µ X X > t) Pr (X > t). Y = πx + η, π = Cov(Y, X ). Var(X ) 73 / 188

74 Understanding what linear IV estimates Proof of Yitzhaki s theorem Proof. Cov(Y, X ) = Cov (E(Y X ), X ) = Cov (g(x ), X ) = where t is an argument of integration. g(t)(t µ X ) f X (t) dt 74 / 188

75 Understanding what linear IV estimates Proof of Yitzhaki s theorem cont. Integration by parts: Cov(Y, X ) = g(t) = t (x µ X ) f X (x) dx t g (t) g (t) t since E (X µ X ) = 0. (x µ X ) f X (x) dx dt (x µ X ) f X (x) dx dt, 75 / 188

76 Understanding what linear IV estimates Proof of Yitzhaki s theorem cont. Therefore, Cov(Y, X ) = Result follows with ω(t) = g (t) E (X µ X X > t) Pr (X > t) dt. 1 Var(X ) E (X µ X X > t) Pr (X > t) 76 / 188

77 Understanding what linear IV estimates Weights positive. Integrate to one (use integration by parts formula). = 0 when t and t. We get claimed results on weights for TSLS when in place of X we use P(Z). Weight reaches its peak at t = µ X, if f X has density at x = µ X : d dt t (x µ X ) f X (x) dx dt = (t µ X )f X (t) = 0 at t = µ X. 77 / 188

78 Yitzhaki, s Weights Understanding what linear IV estimates Yitzhaki s weights for X BetaPDF(x, α, β) d[g(x)]/dx; b =0, c =0.5 2 w(x); =1, / = X,Y X 2 w(x); =2, / = X,Y X 2 w(x); =5, / = X,Y X 2 w(x); =10, / = X,Y X x, g(x) = b*x + c*log(x), X ~ BetaPDF(x,, ); =5 78 / 188

79 Adoption model 1.5 IV General model Comparing models Examples GED Separability Conclusion Understanding what linear IV estimates 1 Yitzhaki s weights for X BetaPDF(x, α, β) x, g(x) = b*x + c*log(x), X ~ BetaPDF(x,, ); =5 79 / 188

80 Understanding what linear IV estimates Can apply Yitzhaki s analysis to the treatment effect model Y = α + βd + ε P(Z), the propensity score is the instrument: E (Y Z = z) = E (Y P(Z) = p) 80 / 188

81 Understanding what linear IV estimates E (Y P(Z) = p) = α + E (βd P(Z) = p) = α + E (β D = 1, P(Z) = p) p = α + E (β P(Z) > U D, P(Z) = p) p = α + E (β p > U D ) p p = α + β f (β, u D ) du D } 0 {{ } Derivative with respect to p is MTE. g (p) =MTE and weights as before. g(p) 81 / 188

82 Understanding what linear IV estimates Under uniformity, E (Y P (Z) = p) p = E (Y 1 Y 0 U D = u D ) = MTE (u D ). More generally, it is LIV = E(Y P(Z)=p) p. Yitzhaki s result does not rely on uniformity; true of any regression of Y on P. Estimates a weighted net effect. The expression can be generalized. It produces Heckman-Vytlacil weights. 82 / 188

83 Understanding what linear IV estimates The Heckman-Vytlacil weight as a Yitzhaki weight Proof. ( Cov (J (Z), Y ) = E Y J ) ( = E E (Y Z) J ) (Z) ( = E E (Y P (Z)) J ) (Z) ( = E g (P (Z)) J ) (Z). J = J (Z) E (J (Z) P (Z) ud ), E (Y P (Z)) = g (P (Z)). 83 / 188

84 Understanding what linear IV estimates The Heckman-Vytlacil weight as a Yitzhaki weight cont. Cov (J (Z), Y ) = = 1 J 0 1 J g (u D ) 0 J g (u D ) jf P,J (u D, j) djdu D J jf P,J (u D, j) djdu D. 84 / 188

85 Understanding what linear IV estimates The Heckman-Vytlacil weight as a Yitzhaki weight cont. Use integration by parts: Cov (J (Z), Y ) = g (u D ) = = ud J g (u D ) 0 J 1 g (u D ) E 0 1 J jf P,J (p, j) djdp g (u D ) u D J ud J 0 J jf P,J (p, j) djdpdu D 1 0 jf P,J (p, j) djdpdu D ( J (Z) P (Z) ud ) Pr (P (Z) u D ) du D. 85 / 188

86 Understanding what linear IV estimates The Heckman-Vytlacil weight as a Yitzhaki weight cont. Thus: g (u D ) = E (Y P (Z) = p) P (Z) = MTE (u D ). p=ud 86 / 188

87 Understanding the structure of the IV weights Recapitulate: ω J IV (u D ) = J IV = MTE (u D ) ω J IV(u D ) du D 1 (j E(J (Z))) f J,P (j, t) dt dj u D Cov (J (Z), D) (3.10) The weights are always positive if J (Z) is monotonic in the scalar Z. In this case J (Z) and P (Z) have the same distribution and f J,P (j, t) collapses to a single distribution. 87 / 188

88 Understanding the structure of the IV weights The possibility of negative weights arises when J(Z) is not a monotonic function of P(Z). It can also arise when there are two or more instruments, and the analyst computes estimates with only one instrument or a combination of the Z instruments that is not a monotonic fuction of P(Z) so that J(Z) and P(Z) are not perfectly dependent. 88 / 188

89 Understanding the structure of the IV weights The weights can be constructed from data on (J, P, D). Data on (J (Z), P (Z)) pairs and (J (Z), D) pairs (for each X value) are all that is required. 89 / 188

90 Understanding the structure of the IV weights Discrete instruments J (Z) Discrete Case Support of the distribution of P(Z) contains a finite number of values p 1 < p 2 < < p K. Support of the instrument J (Z) is also discrete, taking I distinct values. E(J(Z) P(Z) u D ) is constant in u D for u D within any (p l, p l+1 ) interval, and Pr(P(Z) u D ) is constant in u D for u D within any (p l, p l+1 ) interval. Let λ l denote the weight on the LATE for the interval (p l, p l+1 ). 90 / 188

91 Understanding the structure of the IV weights Discrete instruments J (Z) Under monotonicity, or uniformity IV J = E(Y 1 Y 0 U D = u D )ωiv J (u D ) du D (3.11) = = K 1 pl+1 1 λ l E(Y 1 Y 0 U D = u D ) p l (p l+1 p l ) du D l=1 K 1 LATE (p l, p l+1 )λ l. l=1 91 / 188

92 Understanding the structure of the IV weights Discrete instruments J (Z) Let j i be the i th smallest value of the support of J(Z). λ l = I (j i E(J(Z))) K (f (j i, p t )) t>l Cov (J (Z), D) (p l+1 p l ) (3.12) i=1 92 / 188

93 Understanding the structure of the IV weights Discrete instruments J (Z) In general, this formula is true, under index sufficiency even if monotonicity is violated. It s certainly true under (A-1) (A-5). True where LATE (p l, p l+1 ) is replaced by the Wald estimator, based on P(z l ), l = 1,..., L, instruments. Observe, LATE here defined in terms of P(Z), the natural instrument. 93 / 188

94 Understanding the structure of the IV weights Discrete instruments J (Z) Generalizes the expression presented by Imbens and Angrist (1994) and Yitzhaki (1989, 1996) Their analysis of the case of vector Z only considers the case where J(Z) and P(Z) are perfectly dependent because J(Z) is a monotonic function of P (Z). More generally, the weights can be positive or negative for any l but they must sum to 1 over the l. 94 / 188

95 The central role of the propensity score For the IV weight to be correctly constructed and interpreted, we need to know the correct model for P (Z). IV depends on: 1 the choice of the instrument J (Z), 2 its dependence with P (Z), 3 the specification of the propensity score (i.e., what variables go into Z). Structural LATE or MTE identified by P(Z). Can derive all other instrumental variable estimators in terms of weighted averages of MTE or LATE. 95 / 188

96 Monotonicity, uniformity and conditional instruments Monotonicity or uniformity condition (IV-3) rules out general heterogeneous responses to treatment choices in response to changes in Z. The recent literature on instrumental variables with heterogeneous responses is asymmetric. The uniformity condition can be violated even when all components of γ are of the same sign if Z is a vector and γ is a nondegenerate random variable. D = 1 [γz > γ] 96 / 188

97 Monotonicity, uniformity and conditional instruments Uniformity is a condition on a vector. Changing one coordinate of Z, holding the other coordinates at different values across people, will not necessarily produce uniformity. Let µ D (z) = γ 0 + γ 1 z 1 + γ 2 z 2 + γ 3 z 1 z 2, where γ 0, γ 1, γ 2 and γ 3 are constants. Consider changing z 1 from a common base state while holding z 2 fixed at different values across people. If γ 3 < 0 then µ D (z) does not necessarily satisfy the uniformity condition. 97 / 188

98 Monotonicity, uniformity and conditional instruments Positive weights and uniformity are distinct issues. Under uniformity, and assumptions (A-1) (A-5), the weights on MTE or LIV for any particular instrument may be positive or negative. 98 / 188

99 Monotonicity, uniformity and conditional instruments If we condition on Z 2 = z 2,..., Z K = z K using Z 1 as an instrument, then uniformity is satisfied. Effectively convert the problem back to that of a scalar instrument where the weights must be positive. The concept of conditioning on other instruments to produce positive weights for the selected instrument is a new idea. 99 / 188

100 Monotonicity and weights Monotonicity is a property needed to get treatment effects with just two values of Z, Z = z 1 and Z = z 2, to guarantee that IV estimates a treatment effect. With multiple values of Z we need to weight to produce linear IV. If our IV shifts P(Z) in same way for everyone, it shifts D in the same way for everyone, D = 1 [P(Z) U D ]. 100 / 188

101 Monotonicity and weights If P(Z) is instrument, monotonicity is obviously satisfied. If J(Z) is an instrument and not a monotonic function of P(Z), may not shift P(Z) in same way for all people. We can get two-way flows if, e.g., we use only one Z or else have a random coefficient model, D = 1 [γz V ]. Negative weights are a tip off of two-way flows. 101 / 188

102 Monotonicity and weights If we do not want a treatment effect, who cares? We do not always want a treatment effect. Go back to ask What economic question am I trying to answer? 102 / 188

103 Treatment effects vs. policy effects Even if uniformity condition (IV-3) fails, IV may answer relevant policy questions. IV or TSLS estimates a weighted average of marginal responses which may be pointwise positive or negative. Policies may induce some people to switch into and others to switch out of choices. Net effects are sometimes of interest in many policy analyses. 103 / 188

104 Treatment effects vs. policy effects Thus, subsidized housing in a region supported by higher taxes may attract some to migrate to the region and cause others to leave. The net effect on earnings from the policy is all that is required to perform cost benefit calculations of the policy on outcomes. If the housing subsidy is the instrument, the issue of monotonicity is a red herring. If the subsidy is exogenously imposed, IV estimates the net effect of the policy on mean outcomes. Only if the effect of migration induced by the subsidy on outcomes is the question of interest, and not the effect of the subsidy, does uniformity emerge as an interesting question. 104 / 188

105 Comparing selection and IV models Angrist and Krueger (1999) compare IV with selection models and view the former with favor. Useful to understand this comparison in a model with essential heterogeneity. IV is estimating the derivative (or finite changes) of the parameters of a selection model. IV only conditions on Z (and X ). 105 / 188

106 Comparing selection and IV models The control function approach conditions on Z and D (and X ). From index sufficiency, equivalent to conditioning on P (Z) and D: E (Y X, D, Z) (4.1) = µ 0 (X ) + [µ 1 (X ) µ 0 (X )] D + K 1 (P (Z), X ) D + K 0 (P (Z), X ) (1 D) K 1 (P(Z), X ) = E(U 1 D = 1, X, P (Z)) and K 0 (P(Z), X ) = E (U 0 D = 0, X, P (Z)). 106 / 188

107 Comparing selection and IV models IV approach does not condition on D. It works with the integral (over D) of (4.1). E (Y X, P(Z)) (4.2) = µ 0 (X ) + [µ 1 (X ) µ 0 (X )] P(Z) + K 1 (P (Z), X ) P(Z) + K 0 (P (Z), X ) (1 P(Z)) Under monotonicity and (A-1) (A-5) E(Y X, P(Z)) P(Z) = LIV (X, p) = MTE (X, p). P(Z)=p Control function builds up MTE from components. IV gets it in one fell swoop. 107 / 188

108 Comparing selection and IV models With rank and limit conditions (Heckman and Robb, 1985; Heckman, 1990), using control functions, one can identify µ 1 (X ), µ 0 (X ), K 1 (P (Z), X ), and K 0 (P (Z), X ). The selection (control function) estimator identifies the conditional means E (Y 1 X, P(Z), D = 1) = µ 1 (X ) + K 1 (X, P(Z)) (4.3a) and E (Y 0 X, P(Z), D = 0) = µ 0 (X ) + K 0 (X, P(Z)). (4.3b) 108 / 188

109 Comparing selection and IV models To decompose these means and separate µ 1 (X ) from K 1 (X, P(Z)) without invoking functional form assumptions, it is necessary to have an exclusion (a Z not in X ). This allows µ 1 (X ) and K 1 (X, P (Z)) to be independently varied with respect to each other. We can also invoke curvature conditions without exclusion of variables. (Heckman and Navarro, 2006) In addition there must exist a limit set for Z given X such that K 1 (X, P(Z)) = 0 for Z in that limit set. 109 / 188

110 Comparing selection and IV models Limit set not required for selection model if we are interested only in MTE or LATE. Not required in IV either if we only seek MTE or LATE. 110 / 188

111 Comparing selection and IV models Without functional form assumptions, it is not possible to disentangle µ 1 (X ) from K 1 (X, P(Z)) which may contain constants and functions of X that do not interact with P(Z) (see Heckman (1990)). These limit set arguments are needed for ATE or TT, not LATE or LIV. 111 / 188

112 IV method IV method works with derivatives of (4.2) and not levels. Cannot directly recover the constant terms in (4.3a) and (4.3b). 112 / 188

113 IV method In summary, the control function method directly identifies levels while the LIV approach works with slopes. Constants that do not depend on P(Z) disappear from the LIV estimates of the model. 113 / 188

114 IV method The distributions of U 1, U 0 and V do not need to be specified to estimate control function models (see Powell, 1994). In particular, there is no reliance on normality. 114 / 188

115 Support problems for IV Support conditions with control function models have their counterparts in IV models. One common criticism of selection models is that without invoking functional form assumptions, identification of µ 1 (X ) and µ 0 (X ) requires that P(Z) 1 and P(Z) 0 in limit sets. Identification in limit sets is sometimes called identification at infinity. In order to identify ATE = E(Y 1 Y 0 X ), IV methods also require that P(Z) 1 and P(Z) 0 in limit sets, so an identification at infinity argument is implicit when IV is used to identify this parameter. 115 / 188

116 Support problems for IV The LATE parameter avoids this problem by moving the goal posts and redefining the parameter of interest from a level parameter like ATE or TT to a slope parameter like LATE which differences out the unidentified constants. We can identify this parameter by selection models or IV models without invoking identification at infinity. 116 / 188

117 Support problems for IV The IV estimator is model dependent, just like the selection estimator, but in application, the model does not have to be fully specified to obtain IV using Z (or J(Z)). However the distribution of P (Z) and the relationship between P (Z) and J (Z) generates the weights on MTE (or LIV). The interpretation placed on IV in terms of weights on MTE depends crucially on the specification of P (Z). In both control function and IV approaches for the general model of heterogeneous responses, P (Z) plays a central role. 117 / 188

118 Support problems for IV Two economists using the same instrument will obtain the same point estimate using the same data. Their interpretation of that estimate will differ depending on how they specify the arguments in P(Z), even if neither uses P(Z) as an instrument. By conditioning on P (Z), the control function approach makes the dependence of estimates on the specification of P (Z) explicit. The IV approach is less explicit and masks the assumptions required to economically interpret the empirical output of an IV estimation. 118 / 188

119 Examples based on choice theory Suppose cost of adopting the policy C is the same across all countries. Countries choose to adopt the policy if D > 0 where D is the net benefit: D = (Y 1 Y 0 C) and ATE = E (β) = E (Y 1 Y 0 ) = µ 1 µ 0 Treatment on the treated is E (β D = 1) = E (Y 1 Y 0 D = 1) = µ 1 µ 0 + E (U 1 U 0 D = 1). 119 / 188

120 Figure 1: distribution of gains The Roy Economy U 1 U 0 D T T Return to Marginal Agent AT E T U T C=1.5 5 β = Y=Y 1 1 -YY 0 0 TT= 2.666, TUT= Return to Marginal Agent = C = 1.5, ATE = µ 1 µ 0 = β = / 188

121 The model Outcomes Choice Model Y 1 = µ 1 + U 1 = α + β + U 1 D = Y 0 = µ 0 + U 0 = α + U 0 { 1 if D > 0 0 if D 0 General Case (U 1 U 0 ) D ATE TT TUT 121 / 188

122 The model The Researcher Observes (Y, D, C) Y = α + βd + U 0 where β = Y 1 Y 0 Parameterization α = 0.67 (U 1, U [ 0 ) N (0, Σ) ] D = Y 1 Y 0 C β = 0.2 Σ = C = Let C = γz, γ / 188

123 The Discrete instruments and the weights for LATE A. Standard Case B. Ch Figure 4A: monotonicity, the extended Roy economy Standard case z =(0,1) z' =(1,1) γz γz' β = Y 1 - Y / 188

124 Figure 2. Monotonicity The Extended Roy Economy Adoption model IV General model Comparing models Examples GED Separability Conclusion Discrete instruments and the weights for LATE Figure 4B: monotonicity, the extended Roy economy Changing Z 1 without controlling for Z 2 B. Changing Z 1 without Controlling for Z γz z =(0,1) z' =(1,1) z'' =(1,-1) γz'' γz' β = Y 1 - Y / 188

125 Discrete instruments and the weights for LATE Figure 4C: monotonicity, the extended Roy economy Random coefficient case or Z C. Random Coefficient Case ~ γ =( 0.5,0.5) ~ γ =(-0.5,0.5) z =(0,1) z' =(1,1) ~ ~ γ z= γ z ~ γ z' 0.05 ~ γz' β = Y 1 - Y / 188

126 Discrete instruments and the weights for LATE Figure 4: monotonicity, the extended Roy economy A. Standard Case B. Changing Z 1 without Controlling for Z 2 C. Random Coefficient Case z z 0 z z 0 or z z 00 z z 0 z =(0, 1) and z 0 =(1, 1) z =(0, 1), z 0 =(1, 1) and z 00 =(1, 1) z =(0, 1) and z 0 =(1, 1) γ is a random vector eγ =(0.5, 0.5) and eγ =( 0.5, 0.5) where eγ and eγ are two realizations of γ D(γz) D(γz 0 ) D(γz) D(γz 0 ) or D(γz) <D(γz 00 ) D ³eγz D ³eγz 0 and D (eγz) <D(eγz 0 ) For all individuals Depending on the value of z 0 or z 00 Depending on value of γ 126 / 188

127 Discrete instruments and the weights for LATE Figure 4: monotonicity, the extended Roy economy model Outcomes Choice Model Y 1 = α + β + U 1 j 1 if Y1 Y D = 0 γz > 0 0 if Y 1 Y 0 γz 0 Y 0 = α + U 0 with γz = γ 1 Z 1 + γ 2 Z 2 (U 1, U 0 ) N (0, Σ), Σ = Parameterization» 1 0.9, Z 1 = { 1, 0, 1} and Z 2 = { 1, 0, 1} α = 0.67, β = 0.2, γ = (0.5, 0.5) (except in Case C) 127 / 188

128 Discrete instruments and the weights for LATE Figure 5: IV weights and its components under discrete instruments when P(Z) is the instrument LATE (p l, p l+1 ) = E (Y P(Z) = p l+1) E (Y P(Z) = p l ) p l+1 p l = β (p l+1 p l ) + σ U1 U 0 ( φ ( Φ 1 (1 p l+1 ) ) φ ( Φ 1 (1 p l ) )) p l+1 p l λ l = (p l+1 p l ) = (p l+1 p l ) K K (p i E (P (Z))) f (p i, p t ) i=1 t>l Cov (Z 1, D) K (p t E (P (Z))) f (p t ) t>l Cov (Z 1, D) 128 / 188

129 Discrete instruments and the weights for LATE Joint probability distribution of (Z 1, Z 2 ) and the propensity score Z 1 \Z Cov(Z 1, Z 2 ) = (joint probabilities in ordinary type (Pr(Z 1 = z 1, Z 2 = z 2 )); propensity score in italics (Pr (D = 1 Z 1 = z 1, Z 2 = z 2 ))) 129 / 188

130 Discrete instruments and the weights for LATE Figure 5: IV weights and its components under discrete instruments when P(Z) is the instrument ATE = 0.2, TT = , TUT = and K 1 IV P(Z) = LATE (p l, p l+1 ) λ l = 0.09 l=1 130 / 188

131 Discrete instruments and the weights for LATE Figure 5A: IV weights and its components under discrete instruments when Figure 3. IV Weight and Its Components under Discrete Instrum P(Z) is the instrument (IV Weights) The Extended Roy Economy A. IV Weights B. E E(P(Z λ 1 λ 2 λ 3 λ E(P C. Local Average Treatment Eff 131 / 188

132 Discrete instruments and the weights for LATE Figure 5B: IV weights and its components under discrete instruments when under Discrete Instruments when P (Z) is the Instrument xtended P(Z) isroy the Economy instrument (E(P(Z) P(Z) > p l ) and E(P(Z))) B. E(P (Z) P (Z) > p l ) and E(P (Z)) E(P(Z))= E(P(Z) P(Z)>p 1 )E(P(Z) P(Z)>p 2 )E(P(Z) P(Z)>p 3 )E(P(Z) P(Z)>p 4 ) -0.2 verage Treatment Effects 132 / 188

133 Discrete instruments and the weights for LATE λ 2 λ λ 3 4 Figure 5C: IV weights and its components under discrete instruments when -0.2 P(Z) is the instrument (Local average treatment effects) C. Local Average Treatment Effects 0 E(P(Z) P(Z)>p 1 )E(P(Z) P(Z)>p 2 )E(P(Z) P(Z)>p 3 )E(P(Z LATE 0.340,0.438 LATE 0.438,0.540 LATE 0.540,0.640 LATE 0.640, The model is the same as the one presented below Figure / 188

134 Discrete instruments and the weights for LATE Consider using Z 1 as instrument If Z 1 and Z 2 are negatively dependent and E(Z 1 P(Z) > u D ) is not monotonic in u D, weights negative. This nonmonotonicity is evident in Figure 6B. This produces the pattern of negative weights shown in Figure 6A. Associated with two way flows. Two way flows are induced by uncontrolled variation in Z / 188

135 Figure 2. Monotonicity The Extended Roy Economy Adoption model IV General model Comparing models Examples GED Separability Conclusion Discrete instruments and the weights for LATE Figure 4B: monotonicity, the extended Roy economy Changing Z 1 without controlling for Z 2 B. Changing Z 1 without Controlling for Z 2 C γz z =(0,1) z' =(1,1) z'' =(1,-1) γz'' ~ γ = ~ γ = z = z' = 0.05 γz' β = Y 1 - Y / 188

136 Discrete instruments and the weights for LATE Figure 6: IV weights and its components under discrete instruments when Z 1 is the instrument Figure 4. IV Weight and Its Components under Discrete Instruments when Z 1 is the Instrument The Extended Roy Economy A. IV Weights B. E(Z 1 P (Z) >p ) and E(Z 1) E(Z 1 P(Z)>p 1 ) E(Z 1 P(Z)>p 2 ) E(Z 1 P(Z)>p 3 ) E(Z 1 P(Z)>p 4 ) 1 0 E(Z 1 )= λ λ 1 λ 2 λ ThemodelisthesameastheonepresentedbelowFigure2.Thevaluesofthetreatmentparametersarethesameasthe The model is the same as the one presented after figure 4. ones presented below Figure / 188

137 Discrete instruments and the weights for LATE λ 2 λ λ 3 4 Figure 5C: IV weights and its components under discrete instruments when -0.2 P(Z) is the instrument (local average treatment effects) C. Local Average Treatment Effects 0 E(P(Z) P(Z)>p 1 )E(P(Z) P(Z)>p 2 )E(P(Z) P(Z)>p 3 )E(P(Z LATE 0.340,0.438 LATE 0.438,0.540 LATE 0.540,0.640 LATE 0.640, The model is the same as the one presented below Figure / 188

138 Discrete instruments and the weights for LATE K 1 IV Z 1 = LATE (p l, p l+1 ) λ l = l=1 λ l = (p l+1 p l ) I K (z 1,i E (Z 1 )) f (z 1,i, p t ) i=1 t>l Cov (Z 1, D) 138 / 188

139 Discrete instruments and the weights for LATE Joint probability distribution of (Z 1, Z 2 ) and the propensity score Z 1 \Z Cov(Z 1, Z 2 ) = (joint probabilities in ordinary type (Pr(Z 1 = z 1, Z 2 = z 2 )); propensity score in italics (Pr (D = 1 Z 1 = z 1, Z 2 = z 2 ))) 139 / 188

140 Discrete instruments and the weights for LATE Conditional variable estimator and conditional local average treatment effect when Z 1 is the instrument (given Z 2 = z 2 ) Z 2 = 1 Z 2 = 0 Z 2 = 1 P ( 1, Z 2 ) = p P (0, Z 2 ) = p P (1, Z 2 ) = p λ λ LATE (p 1, p 2 ) LATE (p 2, p 3 ) IV Z 1 Z 2 =z / 188

141 Discrete instruments and the weights for LATE Conditional instrumental variable estimator XI 1 XI 1 IV Z 1 Z 2 =z 2 = LATE (p l, p l+1 Z 2 = z 2 ) λ l Z2 =z 2 = LATE (p l, p l+1 Z 2 = z 2 ) λ l Z2 =z 2 l=1 l=1 LATE (p l, p l+1 Z 2 = z 2 ) = E (Y P(Z) = p l+1, Z 2 = z 2 ) E (Y P(Z) = p l, Z 2 = z 2 ) p l+1 p l IX `z1,i E (Z 1 Z 2 = z 2 ) IX f `z 1,i, p t Z 2 = z 2 i=1 t>l λ l Z2 =z 2 = (p l+1 p l ) Cov (Z 1, D) IX (z 1,t E (Z 1 Z 2 = z 2 )) f (z 1,t, p t Z 2 = z 2 ) t>l = (p l+1 p l ) Cov (Z 1, D) 141 / 188

142 Discrete instruments and the weights for LATE Conditional instrumental variable estimator Probability Distribution of Z 1 Conditional on Z 2 (Pr(Z 1 = z 1 Z 2 = z 2 )) z 1 Pr(Z 1 = z 1 Z 2 = 1) Pr(Z 1 = z 1 Z 2 = 0) Pr(Z 1 = z 1 Z 2 = 1) / 188

143 Continuous instruments Figure 7 plots E(Y P(Z)) and MTE for the models displayed at the base of the figure. In cases I and II, β D. In case I, this is trivial since β is a constant. In case II, β is random but selection into D does not depend on β. Case III is the model with essential heterogeneity (β D). Figure 7A depicts E(Y P(Z)) in the three cases. 143 / 188

144 Continuous instruments Figure 7: conditional expectation of Y on P(Z) and the marginal treatment effect (MTE) Figure 5. Conditional Expectation of Y on P (Z) and the Marginal Treatment Effect (MTE) The Extended Roy Economy A. E(Y P (Z) =p) B. MTE (u D) Cases I and II Case III 0.2 Cases I and II Case III p p=u D 144 / 188

145 Continuous instruments Outcomes Choice Model j Y 1 = α + β 1 if D + U 1 D = > 0 0 if D 0 Y 0 = α + U 0 Case I Case II Case III U 1 = U 0 U 1 U 0 D U 1 U 0 D β =ATE=TT=TUT=IV β =ATE=TT=TUT=IV β =ATE TT TUT IV Parameterization Cases I, II and III Cases II and III Case III α = 0.67 (U 1, U 0» ) N (0, Σ) D = Y 1 Y 0 γz β = 0.2 with Σ = Z N (µ Z, Σ Z )» 9 2 µ Z = (2, 2) and Σ Z = 2 9 γ = (0.5, 0.5) 145 / 188

146 Continuous instruments Cases I and II make E(Y P(Z)) linear in P(Z) (see equation 3.7). Case III is nonlinear in P(Z) which arises when β D. The derivative of E(Y P(Z)) is presented in the right panel (Figure 7B). It is a constant in cases I and II (flat MTE) but declining in U D = P(Z) for the case with selection on the gain. 146 / 188

147 Continuous instruments MTE gives the mean marginal return for persons who have utility P(Z) = u D (P(Z) = u D is the margin of indifference). Figure 7 highlights that MTE (and LATE) identify average returns for persons at the margin of indifference at different levels of the mean utility function P(Z). Figure 8 plots MTE and LATE for different intervals of u D using the model plotted in Figure 7. LATE is the chord of E(Y P(Z)) evaluated at different points. The relationship between LATE and MTE is presented in the right panel of Figure / 188

148 Continuous instruments Figure 8: the local average treatment effect Figure 6. The Local Average Treatment Effect The Extended Roy Economy A. E(Y P (Z) =p) and LATE (p,p +1) B. MTE (u D) and LATE (p,p +1) A LATE(0.1,0.35)= LATE(0.6,0.9)= C B D Area(E,F,G,H)/0.3=LATE(0.6,0.9)= 1.17 E F G Area(A,B,C,D)/0.25=LATE(0.1,0.35)= H p u D =p 148 / 188

149 Continuous instruments Figure 8: the local average treatment effect LATE (p l, p l+1 ) = E (Y P(Z) = p l+1) E (Y P(Z) = p l ) p l+1 p l p l+1 MTE (u D )du D = p l p l+1 p l LATE (0.1, 0.35) = LATE (0.6, 0.9) = / 188

150 Continuous instruments Figure 8: the local average treatment effect Outcomes Choice Model Y 1 = α + β + U 1 { 1 if D D = > 0 0 if D 0 Y 0 = α + U 0 with D = Y 1 Y 0 γz Σ = Parameterization (U 1, U 0 ) N (0, Σ) and Z N (µ Z, Σ Z ) [ ] [ ] , µ Z = (2, 2) and Σ Z = 2 9 α = 0.67, β = 0.2, γ = (0.5, 0.5) 150 / 188

151 Continuous instruments The treatment parameters as a function of p associated with case III are plotted in Figure 9. MTE is the same as that reported in Figure 7. ATE is the same for all p. TT (p) = E(Y 1 Y 0 D = 1, P(Z) = p) declines in p (equivalently, it declines in u D ). LATE(p, p ) = TT (p )p TT (p)p, p p p p MTE = [ TT (p)p]. p 151 / 188

152 Continuous instruments Parameter Definition Under Assumptions (*) Marginal Treatment Effect E [Y 1 Y 0 D =0,P(Z) =p] β + σu1 U0Φ 1 (1 p) Average Treatment Effect E [Y 1 Y 0 P (Z) =p] β Treatment on the Treated E [Y 1 Y 0 D φ(φ > 0,P(Z) =p] β + 1 (1 p)) σu1 U0 Treatment on the Untreated E [Y 1 Y 0 D 0,P(Z) =p] β σu1 U0 µ OLS/Matching on P (Z) E [Y 1 D > 0,P(Z) =p] E [Y 0 D 0,P(Z) =p] β + σ 2 U 1 σu 1,U 0 σu1 U 0 p φ(φ 1 (1 p)) 1 p ³ 1 2p p(1 p) φ Φ 1 (1 p) Note: Φ ( ) and φ ( ) represent the cdf and pdf of a standard normal distribution, respectively. Φ 1 ( ) represents the inverse of Φ ( ). (*):ThemodelinthiscaseisthesameastheonepresentedbelowFigure / 188

153 Continuous instruments Figure 9: treatment parameters and OLS matching as a function of P(Z) = p Figure 7. Treatment Parameters and OLS/Matching as a function of P (Z) =p 5 4 T T (p) A T E ( p ) M T E (p) T U T (p) M a t c h i n g ( p ) p 153 / 188

154 Continuous instruments Another nonmonotonicity example A mixture of two normals: Z P 1 N(µ 1, Σ 1 ) + P 2 N(µ 2, Σ 2 ) P 1 is the proportion in population 1, P 2 is the proportion in population 2 and P 1 + P 2 = / 188

155 Continuous instruments Another nonmonotonicity example Conventional normal outcome selection model generated by the parameters at the base of Figure 11. The discrete choice equation ( is a conventional probit: γz Pr (D = 1 Z = z) = Φ σ V ). The MTE (v), E(Y 1 Y 0 V = v) = µ 1 µ 0 + Cov (U 1 U 0, V ) v. Var (V ) We show results for models with vector Z that satisfies (IV-1) and (IV-2) and with γ > 0 componentwise. 155 / 188

156 Continuous instruments Outcomes Choice Model Y 1 = α + β + U 1 { 1 if D D = > 0 0 if D 0 Y 0 = α + U 0 D = Y 1 Y 0 γz and V = (U 1 U 0 ) (U 1, U 0 ) N (0, Σ), Σ = Parameterization [ ] 1 0.9, α = 0.67, β = / 188

157 Continuous instruments Z = (Z 1, Z 2 ) p 1 N(κ 1, Σ 1 ) + p 2 N(κ 2, Σ 2 ) p 1 = 0.45, p 2 = 0.55 ; Σ 1 = [ Cov(Z 1, γz) = γσ 1 1 = 0.98 ; γ = (0.2, 1.4) ] 157 / 188

158 Continuous instruments Figure 11: marginal treatment effect and IV weights using Z 1 as the instrument when Z = (Z 1, Z 2 ) p 1 N(µ 1, Σ 1 ) + p 2 N(µ 2, Σ 2 ) for different values of Σ 2 Figure 8. Marginal Treatment Effect and IV Weights using Z 1 as the Instrument when Z =(Z 1,Z 2) p 1N(κ 1, Σ 1)+p 2N(κ 2, Σ 2) for different values of Σ A. IV Weights B. MTE (v) Weights MTE 4 ω 1 ω2 3 ω v v Outcomes Choice Model Y 1 = α + β + U 1 D = { 1 if D > 0 0 if D / 188

Lecture 11 Roy model, MTE, PRTE

Lecture 11 Roy model, MTE, PRTE Economics 2123 George Washington University Instructor: Prof. Ben Williams Roy Model Motivation The standard textbook example of simultaneity is a supply and demand system