Using Instrumental Variables to Find Causal Effects in Public Health

Similar documents
Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Mgmt 469. Causality and Identification

Recitation Notes 6. Konrad Menzel. October 22, 2006

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

EC402 - Problem Set 3

Empirical approaches in public economics

Econometrics. 7) Endogeneity

Applied Health Economics (for B.Sc.)

Click to edit Master title style

Chapter 2: simple regression model

Session IV Instrumental Variables

1 Motivation for Instrumental Variable (IV) Regression

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

The returns to schooling, ability bias, and regression

ECNS 561 Multiple Regression Analysis

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015

Linear Model Under General Variance

Instrumental Variables and the Problem of Endogeneity

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

Steps in Regression Analysis

ECON Introductory Econometrics. Lecture 17: Experiments

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Econ 510 B. Brown Spring 2014 Final Exam Answers

Instrumental Variables

Econ 583 Final Exam Fall 2008

CHAPTER 6: SPECIFICATION VARIABLES

Least Squares Estimation-Finite-Sample Properties

An example to start off with

Development. ECON 8830 Anant Nyshadham

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Instrumental Variables

ECON Introductory Econometrics. Lecture 16: Instrumental variables


Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Instrumental Variables, Simultaneous and Systems of Equations

Econometrics. 8) Instrumental variables

Essential of Simple regression

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Motivation for multiple regression

IV Estimation WS 2014/15 SS Alexander Spermann. IV Estimation

DEEP, University of Lausanne Lectures on Econometric Analysis of Count Data Pravin K. Trivedi May 2005

Next, we discuss econometric methods that can be used to estimate panel data models.

Economics 241B Estimation with Instruments

ECO Class 6 Nonparametric Econometrics

Causality II: How does causal inference fit into public health and what it is the role of statistics?

Econometrics Summary Algebraic and Statistical Preliminaries

EMERGING MARKETS - Lecture 2: Methodology refresher

Recitation Notes 5. Konrad Menzel. October 13, 2006

Using Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models

Selection on Observables: Propensity Score Matching.

Applied Quantitative Methods II

Introductory Econometrics

ECON 497: Lecture 4 Page 1 of 1

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case

ECON The Simple Regression Model

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

4.8 Instrumental Variables

Introduction to Econometrics

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations

Our point of departure, as in Chapter 2, will once more be the outcome equation:

SIMULTANEOUS EQUATION MODEL

On the Use of Linear Fixed Effects Regression Models for Causal Inference

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Notes 11: OLS Theorems ECO 231W - Undergraduate Econometrics

Lecture 4: Linear panel models

Sociology 593 Exam 2 Answer Key March 28, 2002

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Correlation Analysis

For more information about how to cite these materials visit

Multiple Linear Regression CIVL 7012/8012

Quantitative Economics for the Evaluation of the European Policy

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5.

Regression Analysis Tutorial 34 LECTURE / DISCUSSION. Statistical Properties of OLS

AGEC 661 Note Fourteen

Controlling for Time Invariant Heterogeneity

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Advanced Statistical Methods for Observational Studies L E C T U R E 0 6

Econometrics for PhDs

Sensitivity checks for the local average treatment effect

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

Limited Dependent Variables and Panel Data

Ec1123 Section 7 Instrumental Variables

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM)

Lecture Notes Part 7: Systems of Equations

Ability Bias, Errors in Variables and Sibling Methods. James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006

y it = α i + β 0 ix it + ε it (0.1) The panel data estimators for the linear model are all standard, either the application of OLS or GLS.

Advanced Econometrics I

The Problem of Causality in the Analysis of Educational Choices and Labor Market Outcomes Slides for Lectures

Rockefeller College University at Albany

Lecture 8. Roy Model, IV with essential heterogeneity, MTE

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler

Freeing up the Classical Assumptions. () Introductory Econometrics: Topic 5 1 / 94

Applied Statistics and Econometrics. Giuseppe Ragusa Lecture 15: Instrumental Variables

Econometrics Review questions for exam

Small-sample cluster-robust variance estimators for two-stage least squares models

Outline. Overview of Issues. Spatial Regression. Luc Anselin

Transcription:

1 Using Instrumental Variables to Find Causal Effects in Public Health Antonio Trujillo, PhD John Hopkins Bloomberg School of Public Health Department of International Health Health Systems Program October 30, 2013

2 Today s Topics Motivation IV: Matching Through partial assignment Graphical representation of the IV approach The problem of endogeneity in the context of program evaluation The general idea behind the Instrumental Variables (IVs) approach in PE The conceptual framework Three fundamental characteristics of IVs Testing the statistical validity of IVs Additional issues

3 Empirical Work: Some examples The effect of treatment for acute myocardial infarction IV distance to facility (Newhouse and McClellan) Effect of health insurance on medical care use among people with diabetes IVs community coverage; community knowledge about disease Effect of informal care on labor force participation IV parents health Effect of cognitive skills on self-management among people with diabetes IV right/left hand Estimate birth outcome in teenagers IV mother s schooling (Rosenzweig)

4 Matching to Partial Assignment In the case of IV estimations, there is an incentive to take the treatment. This incentive is random. In experimental design the treatment itself is randomized. The core of argument is to find a variable Z that affects X; but has not direct effect on Y. Then, r(y,z) divided by r (X,Z) provided an estimate of the parameter of interest. Because we reduce the variance of X, IV methods do not work well with small samples.

5 Graphical representation of IV Figure 1 X Y ε y In this case, r(y,x) is causal effect of X on Y Figure 2 C X Y In this case, r (Y,X) is NOT causal effect of X on Y ε x ε y Figure 3 C Z α X Y ε x ε y - IV method: r(y,z)= α*β, and r(x,z)= α - Ratio of both correlation gives the causal effect β β

6 More formally Let assume that we are interested in the following causal relationship: Y= β 0 + B 1 X+ ε (1) Taking the expected value of Y and rewrite this expected value as changes in Z will be: E[Y]= E[β 0 + B 1 X+ ε ]= β 0 + B 1 [X]+ E[ε] (2) E[Y/Z=1]- EY/Z=0]= B 1 (E[X/Z=1]-E[x/Z=0])+ (E[ε /Z=1]-E[ε /Z=0]) (3) Diving by changes in X produced by changes in Z yields the IV estimate: E[Y/Z=1]- EY/Z=0]/ E[Y/Z=1]- EY/Z=0] = B 1 + K (4) Main takeaways: (i) the bias K will equal zero if Z is uncorrelated with ε; (ii) it is assumed that the parameters of interest are constant structural effects for the motivation of this IV estimation; (iii) the bias is big when Z and X are weakly correlated. Extending this result when Z takes more than two values lead to similar conclusions.

7 The Problem of Endogeneity in PE Endogeneity arises when: program participation could be function of the outcome variable; or when the error term in the outcome equation may be correlated with the error term in the participation equation. In other words, unobservable factors influence both outcome and program participation In this case, OLS estimates of the outcome equation would be inconsistent. One solution to the problem is to model participation using IVs. Another solution would be to measure the unobserved factor. The problem is in finding good instruments, and evaluating what happened with bad instruments. Keep in mind that weak instruments would also bias the results.

8 The General Idea Behind IVs in PE The idea behind IVs is to replace the endogenous variable (eg., program participation) with its predicted value using exogenous variables and the IV. However, the first step is to identify (at least one) variable that explains program participation: Such that the only link between the IV and the outcome of interest happens through program participation Different statistical tests are used to identify good instruments and the optimal number of them. More about these tests later. Since IV estimates are consistent but not unbiased, analysts using IV should use large databases. In addition, one needs to be very careful with the use of correct standard errors when using IV estimates.

9 IV in Program Evaluation IVs key issues: - exogenous variation in PP - IV allows us to estimate the coefficient of interest without knowing what the omitted variables are Instrumental Variables Program Participation IVs are used to solve: - omitted variable problem in causal relationship. - bias in estimates from measurement problems Outcome of interest Exogenous variables

10 The conceptual framework If program participation is endogenous, we can model it as : Y ij = X ij β + P j α + e ij of interest to the evaluator is α. P ij = X ij δ + Z ij κ + μ ij where Z ij are instruments. Both errors could be correlated because the omitted variables influence both the outcome and program participation The evaluator needs to distinguish between endogenous and exogenous variables in the case of simultaneous equations. Endogenous variables may have both direct and indirect effects.

11 The conceptual framework (II) To have a fully specified a structural model, be able to derive a set of reduced form equations from the structural equations. The key issue of identification is recovery of the structural parameters from the reduced form equation. The estimation methods usually implemented are: Substitute the IVs for program participation and run OLS on outcome 2SLQ: estimate the participation equation with exogenous variables and IVs and then substitute the estimated value in the outcome equation (this would work as an IV) Be careful with the standard errors when using IVs. Nowadays most statistical packages adjust the standard errors in the second stage to use the corrected ones.

12 Weak Instruments If you discover that you only have weak instruments, it is better to use OLS (assume program participation exogenous) Instrumental variables are weak when: there is weak correlation between the IVs and the program participation; strong correlation between IVs and the outcome of interest strong correlation between IVs and the error term in the outcome equation

13 More on Weak Instruments One needs to keep in mind that it is difficult to find a variable (IV) that does not have a net direct effect on the outcome of interest. But even if one finds this variable, IV estimators are biased in finite samples. From equation (4) previously presented, it is easy to derived that the biased will be big when the denominator is close to zero (i.e, weak instrument). This relationship is independent of sample size. In this case, the IV would not contain relevant information about the causal effect even in cases with small standard errors. In empirical work, it is hard to sell the non-net direct effect assumptions. For this reason, IV that comes from naturally occurring variation are becoming more common in the literature. Yet, naturally occurring events may also have an impact in other causes of the outcome.

14 Characteristics of Good IVs Highly correlated with Program Participation (i.e., IV s are not correlated with included variables) Uncorrelated with the error term in both equations (i.e. uncorrelated with outcome) Only link with outcome is through program participation

15 Testing the Validity of IVs Steps to test the validity of IVs: Test exogeneity of program participation: problematic due to measurement characteristics. Test for good instruments R 2 test IVs explain program participation IV does not belong to the outcome of interest equation after controlling for program participation (over-identification test)

16 Test of Exogeneity of PP There are two common tests for exogeneity of PP: Hausman test (most commonly used) Spencer Berk test Hausman test: (null hypothesis: PP is exogenous) Get the reduced form of program participation (based on exogenous variables and IVs) Get predicted PP Run the outcome variable with PP and predicted PP and exogenous variables. The test of the null is whether the coefficient of the predicted PP is significantly different from zero. It is important to keep in mind that the test for exogeneity depends on the model specification (i.e., definition of exogenous variables and IVs)

17 R 2 test One should run the PP equation with only IVs: If the R 2 in participation equation is high when one uses IVs, estimates from 1SLQ and 2SLQ would be similar. If R 2 is low; then IV estimations would be practically meaningless. An OLS would be a better option There is not a definite cut off point; but the literature suggests 0.03-0.05. Below this, instrumental variables are not necessary, and weak.

18 Testing IVs in PP One should implement the following steps: Run the PP equation with exogenous variables and IVs Run the PP equation with exogenous variables (not including IVs) Conduct a t-test for the IV coefficient In case of multiple IVs, one may conduct a Wald test or likelihood ratio test using the results from the two previous regressions One would like to reject the null hypothesis (IVs are statistically significant)

19 Over-identification Test One should implement the following steps: Run the outcome equation with exogenous variables and program participation Run the outcome equation with exogenous variables, program participation and IVs Conduct a T-test for the IV coefficient In case of multiple IVs, one may conduct a Wald test or likelihood ratio test using the results from the two previous regressions One would like to accept the null hypothesis (i.e., after controlling for program participation, IVs do not explain the outcome of interest)

20 Additional issues IV provides the program effect for those for whom the IV influences their decision to participate. Yet, we may have people with different program impact. Therefore, IV estimates provide the Local Average Treatment Effect (LATE) instead of the Average Treatment Effect. It is important to keep in mind functional form issues for the first and the second stage. In practice, it is better to use a linear function in the first stage and use the correct non-linear function in the second stage.