Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case

Similar documents
Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

Econ 2148, fall 2017 Instrumental variables II, continuous treatment

The Generalized Roy Model and Treatment Effects

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh)

Identification in Triangular Systems using Control Functions

PSC 504: Instrumental Variables

150C Causal Inference

Flexible Estimation of Treatment Effect Parameters

Comments on: Panel Data Analysis Advantages and Challenges. Manuel Arellano CEMFI, Madrid November 2006

Nonadditive Models with Endogenous Regressors

The Problem of Causality in the Analysis of Educational Choices and Labor Market Outcomes Slides for Lectures

Lecture 8. Roy Model, IV with essential heterogeneity, MTE

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL

Instrumental Variables

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Methods to Estimate Causal Effects Theory and Applications. Prof. Dr. Sascha O. Becker U Stirling, Ifo, CESifo and IZA

ECO Class 6 Nonparametric Econometrics

A Course in Applied Econometrics. Lecture 5. Instrumental Variables with Treatment Effect. Heterogeneity: Local Average Treatment Effects.

Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death

On Variance Estimation for 2SLS When Instruments Identify Different LATEs

Introduction to Causal Inference. Solutions to Quiz 4

IV Estimation WS 2014/15 SS Alexander Spermann. IV Estimation

An Alternative Assumption to Identify LATE in Regression Discontinuity Design

Imbens, Lecture Notes 2, Local Average Treatment Effects, IEN, Miami, Oct 10 1

Instrumental Variables

Econometrics of causal inference. Throughout, we consider the simplest case of a linear outcome equation, and homogeneous

Sensitivity checks for the local average treatment effect

Michael Lechner Causal Analysis RDD 2014 page 1. Lecture 7. The Regression Discontinuity Design. RDD fuzzy and sharp

Statistical Models for Causal Analysis

Recitation Notes 6. Konrad Menzel. October 22, 2006

AGEC 661 Note Fourteen

Instrumental Variables

The problem of causality in microeconometrics.

Supplementary material to: Tolerating deance? Local average treatment eects without monotonicity.

Recitation Notes 5. Konrad Menzel. October 13, 2006

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Quantitative Economics for the Evaluation of the European Policy

An Alternative Assumption to Identify LATE in Regression Discontinuity Designs

Potential Outcomes and Causal Inference I

SUPPOSE a policy is proposed for adoption in a country.

Lecture #11: Introduction to the New Empirical Industrial Organization (NEIO) -

Bounds on Causal Effects in Three-Arm Trials with Non-compliance. Jing Cheng Dylan Small

CEPA Working Paper No

Applied Microeconometrics Chapter 8 Regression Discontinuity (RD)

Impact Evaluation Technical Workshop:

1 Motivation for Instrumental Variable (IV) Regression

IsoLATEing: Identifying Heterogeneous Effects of Multiple Treatments

Regression Discontinuity Designs.

The problem of causality in microeconometrics.

Sharp Bounds on Causal Effects under Sample Selection*

Principles Underlying Evaluation Estimators

arxiv: v1 [stat.me] 8 Jun 2016

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance

Recitation 7. and Kirkebøen, Leuven, and Mogstad (2014) Spring Peter Hull

Partial Identification of Average Treatment Effects in Program Evaluation: Theory and Applications

Potential Outcomes Model (POM)

Lecture 11 Roy model, MTE, PRTE

Instrumental Variables

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University

Experimental Designs for Identifying Causal Mechanisms

Noncompliance in Randomized Experiments

Mgmt 469. Causality and Identification

Testing instrument validity for LATE identification based on inequality moment constraints

Regression Discontinuity Designs

New Developments in Econometrics Lecture 16: Quantile Estimation

Transparent Structural Estimation. Matthew Gentzkow Fisher-Schultz Lecture (from work w/ Isaiah Andrews & Jesse M. Shapiro)

The relationship between treatment parameters within a latent variable framework

Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies

Estimating the Dynamic Effects of a Job Training Program with M. Program with Multiple Alternatives

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS

Econometrics I G (Part I) Fall 2004

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Empirical approaches in public economics

Final Exam. Economics 835: Econometrics. Fall 2010

The Econometric Evaluation of Policy Design: Part I: Heterogeneity in Program Impacts, Modeling Self-Selection, and Parameters of Interest

ECON The Simple Regression Model

Harvard University. Harvard University Biostatistics Working Paper Series

Econometric Causality

Econometric Analysis of Games 1

Graduate Econometrics I: What is econometrics?

ECO 310: Empirical Industrial Organization Lecture 2 - Estimation of Demand and Supply

Exam ECON5106/9106 Fall 2018

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Difference-in-Differences Methods

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics. Lecture 1

Relaxing Latent Ignorability in the ITT Analysis of Randomized Studies with Missing Data and Noncompliance

Lecture 11/12. Roy Model, MTE, Structural Estimation

Using Instrumental Variables to Find Causal Effects in Public Health

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

Why experimenters should not randomize, and what they should do instead

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

Sharp IV bounds on average treatment effects on the treated and other populations under endogeneity and noncompliance

Trimming for Bounds on Treatment Effects with Missing Outcomes *

Linear IV and Simultaneous Equations

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity

Discussion of Identifiability and Estimation of Causal Effects in Randomized. Trials with Noncompliance and Completely Non-ignorable Missing Data

Transcription:

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Maximilian Kasy Department of Economics, Harvard University 1 / 40

Agenda instrumental variables part I Origins of instrumental variables: Systems of linear structural equations Strong restriction: Constant causal effects. Modern perspective: Potential outcomes, allow for heterogeneity of causal effects Binary case: 1. Keep IV estimand, reinterpret it in more general setting: Local Average Treatment Effect (LATE) 2. Keep object of interest average treatment effect (ATE): Partial identification (Bounds) 2 / 40

Agenda instrumental variables part II Continuous treatment case: 1. Restricting heterogeneity in the structural equation: Nonparametric IV (conditional moment equalities) 2. Restricting heterogeneity in the first stage: Control functions 3. Linear IV: Continuous version of LATE 3 / 40

Takeaways for this part of class Instrumental variables methods were invented jointly with the idea of economic equilibrium. Classic assumptions impose strong restrictions on heterogeneity: same causal effect for every unit. Modern formulations based on potential outcomes relax this assumption. With effect heterogeneity, average treatment effects are not point-identified any more. Two solutions: 1. Re-interpret the classic IV-coefficient in more general setting. 2. Derive bounds on the average treatment effect. 4 / 40

Origins of IV: systems of structural equations Origins of IV: systems of structural equations econometrics pioneered by Cowles commission starting in the 1930s they were interested in demand (elasticities) for agricultural goods introduced systems of simultaneous equations outcomes as equilibria of some structural relationships goal: recover the slopes of structural relationships from observations of equilibrium outcomes and exogenous shifters 5 / 40

Origins of IV: systems of structural equations System of structural equations Y = A Y + B Z + ε, Y : k-dimensional vector of equilibrium outcomes Z : l-dimensional vector of exogenous variables A: unknown k k matrix of coefficients of interest B: unknown k l matrix ε: further unobserved factors affecting outcomes 6 / 40

Origins of IV: systems of structural equations Example: supply and demand Y = (P,Q) P = A 12 Q + B 1 Z + ε 1 demand Q = A 21 P + B 2 Z + ε 2 supply demand function: relates prices to quantity supplied and shifters Z and ε 1 of demand supply function relates quantities supplied to prices and shifters Z and ε 2 of supply. does not really matter which of the equations puts prices on the left hand side. price and quantity in market equilibrium: solution of this system of equations. 7 / 40

Origins of IV: systems of structural equations Reduced form solve equation Y = A Y + B Z + ε for Y as a function of Z and ε bring A Y to the left hand side, pre-multiply by (I A) 1 Y = C Z + η reduced form C := (I A) 1 B reduced form coefficients η := (I A) 1 ε suppose E[ε Z] = 0 (ie., Z is randomly assigned) then we can identify C from E[Y Z] = C Z. 8 / 40

Origins of IV: systems of structural equations Exclusion restrictions suppose we know C what we want is A, possibly B problem: k l coefficients in C = (I A) 1 B k (k + l) coefficients in A and B further assumptions needed exclusion restrictions: assume that some of the coefficients in B or A are = 0. Example: rainfall affects grain supply but not grain demand 9 / 40

Origins of IV: systems of structural equations Supply and demand continued suppose Z is (i) random, E[ε Z] = 0 and (ii) excluded from the demand equation B 11 = 0 by construction, diag(a) = 0 therefore Cov(Z,P) = Cov(Z,A 12 Q + B 1 Z + ε 1 ) = A 12 Cov(Z,Q), the slope of demand is identified by Z is an instrumental variable A 12 = Cov(Z,P) Cov(Z,Q). 10 / 40

Origins of IV: systems of structural equations Remarks historically, applied researchers have not been very careful about choosing Z for which (i) randomization and (ii) exclusion restriction are well justified. since the 1980s, more emphasis on credibility of identifying assumptions some additional problematic restrictions we imposed: 1. linearity 2. constant (non-random) slopes 3. heterogeneity ε is k dimensional and enters additively causal effects assumed to be the same for everyone next section: framework which does not impose this 11 / 40

Treatment effects Modern perspective: Treatment effects and potential outcomes coming from biostatistics / medical trials potential outcome framework: answer to what if questions two treatments: D = 0 or D = 1 eg. placebo vs. actual treatment in a medical trial Y i person i s outcome eg. survival after 2 years potential outcome Y 0 i : what if person i would have gotten treatment 0 potential outcome Y 1 i : what if person i would have gotten treatment 1 question to you: is this even meaningful? 12 / 40

Treatment effects causal effect / treatment effect for person i : Yi 1 Yi 0. average causal effect / average treatment effect: ATE = E[Y 1 Y 0 ], expectation averages over the population of interest 13 / 40

Treatment effects The fundamental problem of causal inference we never observe both Y 0 and Y 1 at the same time one of the potential outcomes is always missing from the data treatment D determines which of the two we observe formally: Y = D Y 1 + (1 D) Y 0. 14 / 40

Treatment effects Selection problem distribution of Y 1 among those with D = 1 need not be the same as the distribution of Y 1 among everyone. in particular E[Y D = 1] = E[Y 1 D = 1] E[Y 1 ] E[Y D = 0] = E[Y 0 D = 0] E[Y 0 ] E[Y D = 1] E[Y D = 0] E[Y 1 Y 0 ] = ATE. 15 / 40

Treatment effects Randomization no selection D is random (Y 0,Y 1 ) D. in this case, E[Y D = 1] = E[Y 1 D = 1] = E[Y 1 ] E[Y D = 0] = E[Y 0 D = 0] = E[Y 0 ] E[Y D = 1] E[Y D = 0] = E[Y 1 Y 0 ] = ATE. can ensure this by actually randomly assigning D independence comparing treatment and control actually compares apples with apples this gives empirical content to the metaphysical notion of potential outcomes! 16 / 40

LATE Instrumental variables recall: simultaneous equations models with exclusion restrictions instrumental variables β = Cov(Z,Y ) Cov(Z,D). we will now give a new interpretation to β using the potential outcomes framework, allowing for heterogeneity of treatment effects Local Average Treatment Effect (LATE) 17 / 40

LATE 6 assumptions 1. Z {0,1}, D {0,1} 2. Y = D Y 1 + (1 D) Y 0 3. D = Z D 1 + (1 Z) D 0 4. D 1 D 0 5. Z (Y 0,Y 1,D 0,D 1 ) 6. Cov(Z,D) 0 18 / 40

LATE Discussion of assumptions Generalization of randomized experiment D is partially randomized instrument Z is randomized D depends on Z, but is not fully determined by it 1. Binary treatment and instrument: both D and Z can only take two values results generalize, but things get messier without this 2. Potential outcome equation for Y : Y = D Y 1 + (1 D) Y 0 exclusion restriction: Z does not show up in the equation determining the outcome. stable unit treatment values assumption (SUTVA): outcomes are not affected by the treatment received by other units. excludes general equilibrium effects or externalities. 19 / 40

LATE 3. Potential outcome equation for D: D = Z D 1 + (1 Z) D 0 SUTVA; treatment is not affected by the instrument values of other units 4. No defiers: D 1 D 0 four possible combinations for the potential treatments (D 0,D 1 ) in the binary setting D 1 = 0,D 0 = 1, is excluded monotonicity 20 / 40

LATE Table: No defiers D 0 D 1 Never takers (NT) 0 0 Compliers (C) 0 1 Always takers (AT) 1 1 Defiers 1 0 21 / 40

LATE 5. Randomization: Z (Y 0,Y 1,D 0,D 1 ) Z is (as if) randomized. in applications, have to justify both exclusion and randomization no reverse causality, common cause! 6. Instrument relevance: Cov(Z, D) 0 guarantees that the IV estimand is well defined there are at least some compliers testable near-violation: weak instruments 22 / 40

LATE Graphical illustration Z=1 Never takers Compliers Always takers Z=0 D=0 D=1 23 / 40

LATE Illustration explained 3 groups, never takers, compliers, and always takers by randomization of Z : each group represented equally given Z = 0 / Z = 1 depending on group: observe different treatment values and potential outcomes. will now take the IV estimand Cov(Z,Y ) Cov(Z,D) interpret it in terms of potential outcomes: average causal effects for the subgroup of compliers idea of proof: take the top part of figure 28, and subtract the bottom part. 24 / 40

LATE Preliminary result: If Z is binary, then Cov(Z,Y ) E[Y Z = 1] E[Y Z = 0] = Cov(Z,D) E[D Z = 1] E[D Z = 0]. Practice problem Prove this. 25 / 40

LATE Proof Consider the covariance in the numerator: Cov(Z,Y ) = E[YZ] E[Y ] E[Z] = E[Y Z = 1] E[Z] (E[Y Z = 1] E[Z]+E[Y Z = 0] E[1 Z]) E[Z] = (E[Y Z = 1] E[Y Z = 0]) E[Z] E[1 Z]. Similarly for the denominator: Cov(Z,D) = (E[D Z = 1] E[D Z = 0]) E[Z] E[1 Z]. The E[Z] E[1 Z] terms cancel when taking a ratio 26 / 40

LATE The LATE result Practice problem Prove this. E[Y Z = 1] E[Y Z = 0] E[D Z = 1] E[D Z = 0] = E[Y 1 Y 0 D 1 > D 0 ] Hint: decompose E[Y Z = 1] E[Y Z = 0] in 3 parts corresponding to our illustration 27 / 40

LATE Z=1 Never takers Compliers Always takers Z=0 D=0 D=1 28 / 40

LATE Proof top part of figure: E[Y Z = 1] = E[Y Z = 1,NT ] P(NT Z = 1) + E[Y Z = 1,C] P(C Z = 1) + E[Y Z = 1,AT ] P(AT Z = 1) = E[Y 0 NT ] P(NT ) + E[Y 1 C] P(C) + E[Y 1 AT ] P(AT ). first equation relies on the no defiers assumption second equation uses the exclusion restriction and randomization assumptions. Similarly E[Y Z = 0] = E[Y 0 NT ] P(NT )+ E[Y 0 C] P(C) + E[Y 1 AT ] P(AT ). 29 / 40

LATE proof continued: Taking the difference, only the complier terms remain, the others drop out: E[Y Z = 1] E[Y Z = 0] = ( E[Y 1 C] E[Y 0 C] ) P(C). denominator: E[D Z = 1] E[D Z = 0] = E[D 1 ] E[D 0 ] taking the ratio, the claim follows. = (P(C) + P(AT )) P(AT ) = P(C). 30 / 40

LATE Recap LATE result: take the same statistical object economists estimate a lot which used to be interpreted as average treatment effect new interpretation in a more general framework allowing for heterogeneity of treatment effects treatment effect for a subgroup Practice problem Is the LATE, E[Y 1 Y 0 D 1 > D 0 ], a structural object? 31 / 40

Bounds An alternative approach: Bounds keep the old structural object of interest: average treatment effect but analyze its identification in the more general framework with heterogeneous treatment effects in general: we can learn something, not everything bounds / partial identification 32 / 40

Bounds Same assumptions as before 1. Z {0,1}, D {0,1} 2. Y = D Y 1 + (1 D) Y 0 3. D = Z D 1 + (1 Z) D 0 4. D 1 D 0 5. Z (Y 0,Y 1,D 0,D 1 ) 6. Cov(Z,D) 0 additionally: 7. Y is bounded, Y [0,1] 33 / 40

Bounds Decomposing ATE in known and unknown components decompose E[Y 1 ]: E[Y 1 ] = E[Y 1 NT ] P(NT ) + E[Y 1 C AT ] P(C AT ). terms that are identified: E[Y 1 C AT ] = E[Y Z = 1,D = 1] P(C AT ) = E[D Z = 1] P(NT ) = E[1 D Z = 1] and thus E[Y 1 C AT ] P(C AT ) = E[YD Z = 1]. 34 / 40

Bounds Data tell us nothing about E[Y 1 NT ]. Y 1 is never observed for never takers. but we know, since Y is bounded, that E[Y 1 NT ] [0,1] Combining these pieces, get upper and lower bounds on E[Y 1 ]: E[Y 1 ] [E[YD Z = 1], E[YD Z = 1] + E[1 D Z = 1]]. 35 / 40

Bounds For Y 0, similarly E[Y 0 ] [E[Y (1 D) Z = 0], E[Y (1 D) Z = 0] + E[D Z = 0]]. Data are uninformative about E[Y 0 AT ]. Practice problem Show this. 36 / 40

Bounds Combining to get bounds on ATE lower bound for E[Y 1 ], upper bound for E[Y 0 ] lower bound on E[Y 1 Y 0 ] E[Y 1 Y 0 ] E[YD Z = 1] E[Y (1 D) Z = 0] E[D Z = 0] upper bound for E[Y 1 ], lower bound for E[Y 0 ] upper bound on E[Y 1 Y 0 ] E[Y 1 Y 0 ] E[YD Z = 1] E[Y (1 D) Z = 0]+E[1 D Z = 1] 37 / 40

Bounds Between randomized experiments and nothing bounds on ATE: E[Y 1 Y 0 ] [E[YD Z = 1] E[Y (1 D) Z = 0] E[D Z = 0], E[YD Z = 1] E[Y (1 D) Z = 0] + E[1 D Z = 1]]. length of this interval: E[1 D Z = 1] + E[D Z = 0] = P(NT ) + P(AT ) = 1 P(C) 38 / 40

Bounds Share of compliers 1 interval ( identified set ) shrinks to a point In the limit, D = Z thus (Y 1,Y 0 ) D randomized experiment Share of compliers 0 length of the interval goes to 1 In the limit the identified set is the same as without instrument 39 / 40

References References Local average treatment effect: Angrist, J., Imbens, G., and Rubin, D. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444 455. Bounds on the average treatment effect: Manski, C. (2003). Partial identification of probability distributions. Springer Verlag, chapter 2 and 7. 40 / 40