A Course in Applied Econometrics. Lecture 5. Instrumental Variables with Treatment Effect. Heterogeneity: Local Average Treatment Effects.

A Course in Applied Econometrics Lecture 5 Outline. Introduction 2. Basics Instrumental Variables with Treatment Effect Heterogeneity: Local Average Treatment Effects 3. Local Average Treatment Effects 4. Extrapolation to the Population 5. Covariates Guido Imbens IRP Lectures, UW Madison, August 2008 6. Multivalued Instruments 7. Multivalued Endogenous Regressors. Introduction. Instrumental variables estimate average treatment effects, with the average depending on the instruments. 2. Population averages are only estimable under unrealistically strong assumptions ( identification at infinity, or under the constant effect). 3. Compliers (for whom we can identify effects) are not necessarily the subpopulations that are ex ante the most interesting subpopulations, but need extrapolation for others. 4. The set up here allows the researcher to sharply separate the extrapolation to the (sub-)population of interest from exploration of the information in the data. 2 2. Basics Linear IV with Constant Coefficients. Standard set up: Y i = β 0 + β W i + ε i. There is concern that the regressor W i is endogenous, correlated with ε i. Suppose that we have an instrument Z i that is both uncorrelated with ε i and correlated with W i. In the single instrument / single endogenous regressor, we end up with the ratio of covariances ˆβ IV Ni= = N (Y i Ȳ ) (Z i Z) Ni= N (W i W ) (Z i Z). Using a central limit theorem for all the moments and the delta method we can infer the large sample distribution without additional assumptions. 3

Potential Outcome Set Up Let Y i (0) and Y i () be two potential outcomes for unit i, one for each value of the endogenous regressor or treatment. Let W i be the realized value of the endogenous regressor, equal to zero or one. We observe W i and Y i = Y i (W i )= { Yi () if W i = Y i (0) if W i =0. Define two potential outcomes W i (0) and W i (), representing the value of the endogenous regressor given the two values for the instrument Z i. The actual or realized value of the endogenous variable is W i = W i (Z i )= { Wi () if Z i = W i (0) if Z i =0. 3. Local Average Treatment Effects The key instrumental variables assumption is Assumption (Independence) Z i (Y i (0),Y i (),W i (0),W i ()). It requires that the instrument is as good as randomly assigned, and that it does not directly affect the outcome. The assumption is formulated in a nonparametric way, without definitions of residuals that are tied to functional forms. So we observe the triple Z i, W i = W i (Z i )andy i = Y i (W i (Z i )). 4 5 Assumptions (ctd) Alternatively, we separate the assumption by postulating the existence of four potential outcomes, Y i (z, w), corresponding to the outcome that would be observed if the instrument was Z i = z and the treatment was W i = w. Assumption 2 (Random Assignment) and Z i (Y i (0, 0),Y i (0, ),Y i (, 0),Y i (, ),W i (0),W i ()). Compliance Types It is useful for our approach to think about the compliance behavior of the different units W i (0) Assumption 3 (Exclusion Restriction) Y i (z, w) =Y i (z,w), for all z, z,w. The first of these two assumptions is implied by random assignment of Z i, but the second is substantive, and randomization has no bearing on it. 6 W i () 0 never-taker defier complier always-taker 7

We cannot directly establish the type of a unit based on what we observe for them since we only see the pair (Z i,w i ), not the pair (W i (0),W i ()). Nevertheless, we can rule out some possibilities. Monotonicity Assumption 4 (Monotonicity/No-Defiers) W i () W i (0). Z i 0 complier/never-taker never-taker/defier W i always-taker/defier complier/always-taker This assumption makes sense in a lot of applications. It is implied directly by many (constant coefficient) latent index models of the type: W i (z) ={π 0 + π z + ε i > 0}, but it is much weaker than that. 8 9 Implications for Compliance types: Distribution of Compliance Types Z i Under random assignment and monotonicity we can estimate the distribution of compliance types: 0 complier/never-taker never-taker W i always-taker complier/always-taker π a =Pr(W i (0) = W i () = ) = E[W i Z i =0] π c =Pr(W i (0) = 0,W i () = ) = E[W i Z i =] E[W i Z i =0] For individuals with (Z i =0,W i =)andfor(z i =,W i =0) we can now infer the compliance type. π n =Pr(W i (0) = W i () = 0) = E[W i Z i =] 0

Now consider average outcomes by instrument and treatment: E[Y i W i =0,Z i =0]= π c E[Y i (0) complier] + π n E[Y i (0) never taker], π c + π n π c + π n E[Y i W i =0,Z i =]=E[Y i (0) never taker], E[Y i W i =,Z i =0]=E[Y i () always taker], E[Y i W i =,Z i =]= π c E[Y i () complier] + π a E[Y i () always taker]. π c + π a π c + π a Local Average Treatment Effect Hence the instrumental variables estimand, the ratio of these two reduced form estimands, is equal to the local average treatment effect β IV = E[Y i Z i =] E[Y i Z i =0] E[W i Z i =] E[W i Z i =0] = E[Y i () Y i (0) complier]. From this we can infer the average outcome for compliers, E[Y i (0) complier], and E[Y i () complier], 2 3 4. Extrapolating to the Full Population We can estimate E [Y i (0) never taker], and E [Y i () always taker] We can learn from these averages whether there is any evidence of heterogeneity in outcomes by compliance status, by comparing the pair of average outcomes of Y i (0); E [Y i (0) never taker], and E [Y i (0) complier], and the pair of average outcomes of Y i (): E [Y i () always taker], and E [Y i () complier]. If compliers, never-takers and always-takers are found to be substantially different in levels, then it appears much less plausible that the average effect for compliers is indicative of average effects for other compliance types. 4 5. Covariates Traditionally the TSLS set up is used with the covariates entering in the outcome equation linearly and additively, as Y i = β 0 + β W i + β 2 X i + ε i, with the covariates added to the set of instruments. Given the potential outcome set up with general heterogeneity in the effects of the treatment, one may also wish to allow for more heterogeneity in the correlations between treatment effects and covariates. Here we describe a general way of doing so. Unlike TSLS type approaches, this involves modelling both the dependence of the outcome and the treatment on the covariates. 5

Heckman Selection Model A traditional parametric model with a dummy endogenous variables might have the form (translated to the potential outcome set up used here): W i (z) ={π 0 + π z + π 2 X i + η i 0}, Y i (w) =β 0 + β w + β 2 X i + ε i, with (η i,ε i ) jointly normally distributed (e.g., Heckman, 978). Such a model impose restrictions on the relation between compliance types, covariates and outcomes: never taker if η i < π 0 π π 2 X i i is a complier if π 0 π π 2 X i η i < π 0 π π 2 X always taker if π 0 π 2 X i η i, which imposes strong restrictions, e.g., if E[Y i (0) n, X i ] < E[Y i (0) c, X i ], then E[Y i () c, X i ] < E[Y i () a, X i ] 6 Flexible Alternative Model Specify f Y (w) X,T (y x, t) =f(y x; θ wt ), for (w, t) =(0,n), (0,c),(,c),(,a). A natural model for the distribution of type is a trinomial logit model: Pr(T i = complier X i )= Pr(T i = never taker X i )= Pr(T i =always taker X i )= + exp(π n X i) + exp(π a X i), exp(π nx i ) + exp(π n X i) + exp(π a X i), Pr(T i = complier X i ) Pr(T i = never taker X i ). 7 The log likelihood function is then, factored in terms of the contribution by observed (W i,z i )values: L(π n,π a,θ 0n,θ 0c,θ c,θ a )= exp(π n X i) + exp(π i W i =0,Z i = n X i) + exp(π a X i) f(y i X i ; θ 0n ) i W i =0,Z i =0 i W i =,Z i = ( exp(π n X i ) + exp(π n X i) f(y i X i ; θ 0n )+ ( exp(π a X i ) + exp(π ax i ) f(y i X i ; θ a )+ + exp(π n X i) f(y + exp(π ax i ) f(y exp(π a X i) + exp(π i W i =,Z i =0 nx i ) + exp(π ax i ) f(y i X i ; θ a ). 8 Application: Angrist (990) effect of military service The simple ols regression leads to: log(earnings) i = 5.4364 0.0205 veteran i (0079) (0.067) In Table we present population sizes of the four treatmen/instrument samples. For example, with a low lottery number 5,948 individuals do not, and,372 individuals do serve in the military. Z i 0 5,948,95 W i,372 865 9

Using these data we get the following proportions of the various compliance types, given in Table, under the non-defiers assumption. For example, the proportion of nevertakers is estimated as the conditional probability of W i = 0 given Z i =: Pr(nevertaker) = 95 95 + 865. W i (0) Estimated Average Outcomes by Treatment and Instrument Z i 0 E[Y ]=5.4472 E[Y ]=5.4076, W i E[Y ]=5.4028 E[Y ]=5.4289 W i () 0 never-taker (0.6888) defier (0) complier (0.237) always-taker (0.875) Not much variation by treatment status given instrument, but these comparisons are not causal under IV assumptions. 20 2 W i () 0 E[Yi (0)] = 5.4028 W i (0) E[Yi (0)] = 5.6948, E[Yi ()] = 5.462 defier (NA) E[Yi ()] = 5.4076 The local average treatment effect is -0.2336, a 23% drop in earnings as a result of serving in the military. Simply doing IV or TSLS would give you the same numerical results: log(earnings) i = 5.4836 0.2336 veteran i (0.0289) (0.266) 22 It is interesting in this application to inspect the average outcome for different compliance groups. Average log earnings for never-takers are 5.40, lower by 29% than average earnings for compliers who do not serve in the military. This suggests that never-takers are substantially different than compliers, and that the average effect of 23% for compliers need not be informative never-takers. Note that E[Y i (0) n, X i ] < E[Y i (0) c, X i ], but also E[Y i () c, X i ] > E[Y i () a, X i ] Compliers earn more than nevertakers when not serving, and more than always-takers when serving. Does not fit standard gaussian selection model. 23

6. Multivalued Instruments For any two values of the instrument z 0 and z satisfying the local average treatment effect assumptions we can define the corresponding local average treatment effect: τ z,z 0 = E[Y i () Y i (0) W i (z )=,W i (z 0 )=0]. Note that these local average treatment effects need not be the same for different pairs of instrument values (z 0,z ). Interpretation of IV Estimand Suppose that monotonicity holds for all (z, z ), and suppose that the instruments are ordered in such a way that p(z k ) p(z k ), where p(z) =E[W i Z i = z]. Also suppose that the instrument is relevant, E[g(Z i ) W i ] 0. Then the instrumental variables estimator based on using g(z) as an instrument for W estimates a weighted average of local average treatment effects: τ g( ) = Cov(Y i,g(z i )) Cov(W i,g(z i )) = K λ k τ zk,z k, k= Comparisons of estimates based on different instruments underly conventional tests of overidentifying restrictions in TSLS settings. An alternative interpretation of rejections in such testing procedures is therefore treatment effect heterogeneity. 24 λ k = (p(z k) p(z k )) K l=k π l (g(z l ) E[g(Z i )] Kk= p(z k ) p(z k )) K l=k π l (g(z l ) E[g(Z i )], π k =Pr(Z i = z k ). These weights are nonnegative and sum up to one. 25 Marginal Treatment Effect If the instrument is continuous, and p(z) is continuous in z, we can define the limit of the local average treatment effects τ z = lim z z,z z τ z,z. Suppose we have a latent index model for the receipt of treatment: W i (z) ={h(z)+η i 0}, with the scalar unobserved component η i independent of the instrument Z i. Then we can define the marginal treatment effect τ(η) (Heckman and Vytlacil, 2005) as τ(η) =E [Y i () Y i (0) η i = η]. 26 This marginal treatment effect relates directly to the limit of the local average treatment effects τ(η) =τ z, with η = h(z)). Note that we can only define this for values of η for which there is a z such that τ = h(z). Normalizing the marginal distribution of η to be uniform on [0, ], this restricts η to be in the interval [inf z p(z), sup z p(z)], where p(z) =Pr(W i = Z i = z). Now we can characterize various average treatment effects in terms of this limit. E.g.: τ = τ(η)df η(η). η 27

7. Multivalued Endogenous Variables τ = Cov(Y i,zi) Cov(Wi,Zi) = E[Y i Zi =] E[Yi Zi =0] E[Wi Zi =] E[Wi Zi =0]. Exclusion restriction and monotonicity: Y i (w) W i (z) Z i, W i () W i (0), Then J τ = λ j E[Y i (j) Y i (j ) W i () j>w i (0)], j= λj = Pr(W i() j>w i (0) Ji= Pr(Wi() i>wi(0). with the weights λj estimable. 28 Illustration: Angrist-Krueger (99) Returns to Educ. êduci = 2.797 0.09 qobi (0.006) (0.03) log(earnings) i = 5.903 0.0 qobi (0.00) (0.003) The instrumental variables estimate is the ratio ˆβ IV = 0.09 0.0 =0.020. Weights γj =Pr(Wi() j>wi(0) can be estimated as ˆγj = {Wi j} N i Zi= N0 {Wi j}. i Zi=0 29 0.6 0.3 0.2 0. 0 0.5 0. 0.05 0 0.05 0.04 0.03 0.02 0.0 0 0.8 0.4 Figure : histogram estimate of density of years of education 0 2 4 6 8 2 4 6 8 20 Figure 2: Normalized Weight Function for Instrumental Variables Estimand 0 2 4 6 8 2 4 6 8 20 Figure 3: Unnormalized Weight Function for Instrumental Variables Estimand 0 2 4 6 8 2 4 6 8 20 Figure 3: Education Distribution Function by Quarter 0.4 0.2 0 0 2 4 6 8 2 4 6 8 20