Econ 673: Microeconometrics
Chapter 12: Estimating Treatment Effects


The Problem

Analysts are frequently interested in measuring the impact of a treatment on individual behavior; e.g., the impact of
- job training programs on income
- 401(k)s on household savings
- teenage pregnancy on high school drop-out or college graduation rates
- environmental regulations on pollution levels

Randomized experiments are typically not an option for cost and/or ethical reasons.

Comparisons of treatment and nontreatment outcomes in a nonexperimental setting are contaminated by the treatment selection process.

The Problem (cont'd)

LaLonde (1986, AER) used data from an actual experiment (the National Supported Work Demonstration Experiment) to study the performance of non-experimental estimators:
- simple regression adjustments
- difference-in-differences
- two-step Heckman adjustment

He found that alternative estimators produced very different estimates, and that most deviated substantially from the experimental benchmarks.

In recent years there has been a boom in the development of alternative non-experimental estimators.

Alternative Solutions
- Matching
- Instrumental Variables
- Control Functions

The Literature - Theory

*Wooldridge, J. M. (2002), Econometric Analysis of Cross Section and Panel Data, Cambridge: The MIT Press, Ch. 18.
Heckman, J., and S. Navarro-Lozano (2004), Using Matching, Instrumental Variables, and Control Functions to Estimate Economic Choice Models, The Review of Economics and Statistics 86(1): 30-57.
Rosenbaum, P., and D. Rubin (1983), The Central Role of the Propensity Score in Observational Studies for Causal Effects, Biometrika 70(1): 41-55.
Dehejia, R. H., and S. Wahba (2002), Propensity Score-Matching Methods for Nonexperimental Causal Studies, The Review of Economics and Statistics 84(1): 151-161.
Heckman, J., H. Ichimura, J. Smith, and P. Todd (1998), Characterizing Selection Bias Using Experimental Data, Econometrica 66(5): 1017-1098.
Heckman, J., H. Ichimura, and P. Todd (1997), Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme, Review of Economic Studies 64: 605-654.
Heckman, J., H. Ichimura, and P. Todd (1998), Matching as an Econometric Evaluation Estimator, Review of Economic Studies 65: 261-294.
*Smith, J., and P. Todd (2005), Does Matching Overcome LaLonde's Critique of Nonexperimental Estimators? Journal of Econometrics 125(1-2): 305-353.
Abadie, A., and G. Imbens (2004), Large Sample Properties of Matching Estimators for Average Treatment Effects, working paper, January.

The Literature - Applications

Benjamin, D. (2003), Does 401(k) Eligibility Increase Saving? Evidence from Propensity Score Subclassification, Journal of Public Economics 87: 1259-1290.
Jalan, J., and M. Ravallion (2003), Does Piped Water Reduce Diarrhea for Children in Rural India? Journal of Econometrics 112: 153-173.
Jalan, J., and M. Ravallion (2003), Estimating the Benefit Incidence of an Antipoverty Program by Propensity-Score Matching, Journal of Business and Economic Statistics 21(1): 19-30.
Levine, D., and G. Painter (2003), The Schooling Costs of Teenage Out-of-Wedlock Childbearing: Analysis with a Within-School Propensity-Score-Matching Estimator, The Review of Economics and Statistics 85(4): 884-900.
*List, J., D. Millimet, P. Fredriksson, and W. McHone (2003), Effects of Environmental Regulations on Manufacturing Plant Births: Evidence from a Propensity Score Matching Estimator, The Review of Economics and Statistics 85(4): 944-952.
Park, A., S. Wang, and G. Wu (2002), Regional Poverty Targeting in China, Journal of Public Economics 86: 123-153.

Notation

The choice of the treatment is assumed to be determined in the fashion of a standard random utility (RUM) model, with
\[ V = \mu_V(Z, U_V), \qquad D = 1(V > 0) \]
where
- $Z$ denotes factors observed by the analyst
- $U_V$ denotes factors unobserved by the analyst, but known to the decision maker

Potential Outcomes

Let $Y_1$ and $Y_0$ denote the outcome with and without the treatment, where
\[ Y_1 = \mu_1(X, U_1) \quad (D = 1) \]
\[ Y_0 = \mu_0(X, U_0) \quad (D = 0) \]
The individual-level treatment effect is given by
\[ \Delta = Y_1 - Y_0 \]
Additively separable specifications are often considered, with
\[ V = \mu_V(Z) + U_V, \qquad E(U_V) = 0 \]
\[ Y_1 = \mu_1(X) + U_1, \qquad E(U_1) = 0 \]
\[ Y_0 = \mu_0(X) + U_0, \qquad E(U_0) = 0 \]

Parameters of Interest

Three different treatment effects are typically of interest:

1. The average treatment effect
\[ ATE: \; E(Y_1 - Y_0 \mid X) \]
2. The treatment on the treated
\[ TT: \; E(Y_1 - Y_0 \mid X, D = 1) \]
3. The marginal treatment effect
\[ MTE: \; E(Y_1 - Y_0 \mid X, Z, U_V = u_V) \]

The Selection Problem in a Regression Context

The fundamental problem is that each individual is only observed in one state of the world; i.e., we only observe
\[ Y = D Y_1 + (1 - D) Y_0 = D[\mu_1(X) + U_1] + (1 - D)[\mu_0(X) + U_0] = \mu_0(X) + D[\mu_1(X) - \mu_0(X)] + \varepsilon \]
where $\varepsilon \equiv D U_1 + (1 - D) U_0$ and $\mu_1(X) - \mu_0(X) = ATE(X)$.

Unfortunately, unless the treatment assignment is randomized,
\[ E(\varepsilon \mid X, D) \neq 0. \]

The Biases

From the samples, we can compute
\[ E(Y_1 \mid X, Z, D = 1) = E(Y \mid X, Z, D = 1) \]
\[ E(Y_0 \mid X, Z, D = 0) = E(Y \mid X, Z, D = 0) \]
Integrating out $Z$ yields
\[ E(Y_1 \mid X, D = 1) \quad \text{and} \quad E(Y_0 \mid X, D = 0) \]
The resulting bias from comparing the $(D = 1)$ and $(D = 0)$ means is, for TT,
\[ \text{Bias } TT = [E(Y_1 \mid X, D = 1) - E(Y_0 \mid X, D = 0)] - E(Y_1 - Y_0 \mid X, D = 1) \]
\[ = E(Y_0 \mid X, D = 1) - E(Y_0 \mid X, D = 0) \]

The Biases (cont'd)

For ATE,
\[ \text{Bias } ATE = [E(Y_1 \mid X, D = 1) - E(Y_0 \mid X, D = 0)] - E(Y_1 - Y_0 \mid X) \]
\[ = [E(Y_1 \mid X, D = 1) - E(Y_1 \mid X)] - [E(Y_0 \mid X, D = 0) - E(Y_0 \mid X)] \]
For MTE,
\[ \text{Bias } MTE = [E(Y_1 \mid X, Z, D = 1) - E(Y_0 \mid X, Z, D = 0)] - E(Y_1 - Y_0 \mid X, Z, U_V = u_V) \]
\[ = [E(Y_1 \mid X, Z, D = 1) - E(Y_1 \mid X, Z, U_V = u_V)] - [E(Y_0 \mid X, Z, D = 0) - E(Y_0 \mid X, Z, U_V = u_V)] \]
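The TT bias expression can be checked numerically. The sketch below is a minimal simulation under assumed, hypothetical values that do not come from the slides: no covariates, $Y_0 = 1 + U_0$, $Y_1 = 2 + U_1$, and a selection unobservable that loads 0.8 on $U_0$. Because both potential outcomes are simulated, the bias term $E(Y_0 \mid D = 1) - E(Y_0 \mid D = 0)$ can be computed directly, which is never possible with real data.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n=20000, seed=0):
    """Hypothetical DGP: selection depends on an unobservable correlated with U_0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u0 = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        u_v = 0.8 * u0 + rng.gauss(0.0, 1.0)  # assumed loading of 0.8
        d = 1 if u_v > 0 else 0               # D = 1(V > 0)
        rows.append((d, 1.0 + u0, 2.0 + u1))  # (D, Y_0, Y_1)
    return rows


rows = simulate()
treated = [r for r in rows if r[0] == 1]
untreated = [r for r in rows if r[0] == 0]

# Naive comparison of observed means: E(Y_1 | D=1) - E(Y_0 | D=0)
naive = mean(y1 for _, _, y1 in treated) - mean(y0 for _, y0, _ in untreated)
# Treatment on the treated: E(Y_1 - Y_0 | D=1)
tt = mean(y1 - y0 for _, y0, y1 in treated)
# Bias term from the slide: E(Y_0 | D=1) - E(Y_0 | D=0)
bias_tt = mean(y0 for _, y0, _ in treated) - mean(y0 for _, y0, _ in untreated)

print(f"naive = {naive:.3f}, TT = {tt:.3f}, bias = {bias_tt:.3f}")
```

In the sample, `naive` equals `tt + bias_tt` up to floating-point error, mirroring the decomposition on the slide.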

Ignorability of Treatment

Matching methods are based on the ignorability of treatment assumption introduced by Rosenbaum and Rubin (1983).

Assumption ATE.1: Conditional on $W = (X, Z)$, $D$ and $(Y_0, Y_1)$ are independent:
\[ (Y_0, Y_1) \perp D \mid W \]

A less restrictive version that sometimes suffices is

Assumption ATE.1':
\[ E(Y_0 \mid W, D) = E(Y_0 \mid W) \quad \text{and} \quad E(Y_1 \mid W, D) = E(Y_1 \mid W) \]

This is selection on observables.

Ignorability of Treatment (cont'd)

The key benefit of ignorability is that, even though $(Y_0, Y_1)$ and $D$ might be correlated, once we control for $W$ they are uncorrelated:
\[ E(Y_1 \mid W, D = 0) = E(Y_1 \mid W, D = 1) = E(Y_1 \mid W) \]
\[ E(Y_0 \mid W, D = 1) = E(Y_0 \mid W, D = 0) = E(Y_0 \mid W) \]
By conditioning on $W$, we can construct the missing counterfactuals.

Note: If we are interested in TT, then we only need the weaker assumption that
\[ Y_0 \perp D \mid W \]
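To spell out the step from ignorability to identification, the short derivation below applies Assumption ATE.1' to the observed conditional means, using $Y = D Y_1 + (1 - D) Y_0$. It is a sketch of the standard argument rather than a quotation from the slides.

```latex
\begin{align*}
E(Y \mid W, D = 1) - E(Y \mid W, D = 0)
  &= E(Y_1 \mid W, D = 1) - E(Y_0 \mid W, D = 0) \\
  &= E(Y_1 \mid W) - E(Y_0 \mid W) \\
  &= E(Y_1 - Y_0 \mid W).
\end{align*}
```

The first line uses only the definition of the observed outcome; the second is where ATE.1' is needed.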

Making Use of Ignorability

There are several ways in which we can use the ignorability assumption.

1. Since we have a random sample on $(Y, D, W)$, we can estimate (even nonparametrically)
\[ r_1(W) \equiv E(Y \mid W, D = 1) \]
\[ r_0(W) \equiv E(Y \mid W, D = 0) \]
Given consistent estimators of these functions, a consistent estimator of ATE is
\[ \widehat{ATE} = N^{-1} \sum_{i=1}^{N} [\hat{r}_1(W_i) - \hat{r}_0(W_i)] \]

Making Use of Ignorability (cont'd)

Similarly,
\[ \widehat{TT} = \left( \sum_{i=1}^{N} D_i \right)^{-1} \sum_{i=1}^{N} D_i [\hat{r}_1(W_i) - \hat{r}_0(W_i)] \]

2. Alternatively, if $W$ can take on a finite number of values, i.e., $W \in \{w_1, \ldots, w_M\}$, then we can compute
\[ \tau_{jm} = E(Y_j \mid W = w_m, D = j) \]
\[ \widehat{ATE} = \sum_{m=1}^{M} s_m [\hat{\tau}_{1m} - \hat{\tau}_{0m}] \]
where $s_m$ denotes the population proportion of type $m$. This is a form of matching; it becomes difficult if $M$ is large.
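When $W$ is discrete, the two approaches above coincide: the nonparametric estimates $\hat r_1$ and $\hat r_0$ are simply cell means. The following minimal sketch illustrates this on a small made-up dataset; the data values, the cell labels `"a"` and `"b"`, and the function names are hypothetical.

```python
from collections import defaultdict


def cell_means(data):
    """Return {(w, d): mean of y} for each covariate cell w and treatment status d."""
    totals = defaultdict(lambda: [0.0, 0])
    for y, d, w in data:
        totals[(w, d)][0] += y
        totals[(w, d)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def regression_adjustment(data):
    """ATE and TT from r1_hat(W_i) - r0_hat(W_i), with cell means as r_hat.

    Assumes every cell contains both treated and untreated units (overlap);
    otherwise a KeyError is raised.
    """
    r = cell_means(data)
    effects = [r[(w, 1)] - r[(w, 0)] for _, _, w in data]
    ate = sum(effects) / len(effects)
    treated_effects = [e for e, (_, d, _) in zip(effects, data) if d == 1]
    tt = sum(treated_effects) / len(treated_effects)
    return ate, tt


# Hypothetical sample of (y, d, w) triples.
sample = [
    (5, 1, "a"), (7, 1, "a"), (2, 0, "a"), (4, 0, "a"),
    (11, 1, "b"), (6, 0, "b"), (8, 0, "b"), (7, 0, "b"),
]
ate_hat, tt_hat = regression_adjustment(sample)
print(ate_hat, tt_hat)
```

Here the cell-level effects are 3 in cell `"a"` and 4 in cell `"b"`; ATE weights them by cell size, while TT weights them by the number of treated units in each cell.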

Using the Propensity Score

The ignorability assumption is less useful if $W$ is of high dimensionality.

Rosenbaum and Rubin (1983, Theorem 3) reduce the dimensionality problem using the propensity score: suppose
\[ (Y_1, Y_0) \perp D \mid W, \quad \text{and} \quad 0 < p(W) < 1 \]
where
\[ p(W) = \Pr(D = 1 \mid W) \]
Rosenbaum and Rubin (1983, Theorem 3) show that
\[ (Y_1, Y_0) \perp D \mid p(W) \quad \text{and} \quad 0 < \Pr(D = 1 \mid p(W)) < 1 \]
This is strong ignorability of treatment.

Using the Propensity Score (cont'd)

Again, we can now construct the counterfactuals of interest:
\[ E(Y_1 \mid p(W), D = 0) = E(Y_1 \mid p(W), D = 1) = E(Y_1 \mid p(W)) \]
\[ E(Y_0 \mid p(W), D = 1) = E(Y_0 \mid p(W), D = 0) = E(Y_0 \mid p(W)) \]
Note, however, that we are ruling out the cases $p(W) = 1$ and $p(W) = 0$: we want a good model of $p(W)$, but not too good.
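The requirement $0 < p(W) < 1$ can be illustrated with a discrete $W$, where a natural estimate of the propensity score is the treated share within each cell. The sketch below uses made-up data and hypothetical function names; it flags cells where the estimated score is exactly 0 or 1, so no counterfactual can be constructed there.

```python
from collections import Counter


def cell_propensity(d, w):
    """Estimate p(w) = Pr(D = 1 | W = w) as the treated share in each cell."""
    n_cell = Counter(w)
    n_treated = Counter(w_i for d_i, w_i in zip(d, w) if d_i == 1)
    return {cell: n_treated[cell] / n_cell[cell] for cell in n_cell}


def overlap_violations(p_by_cell):
    """Cells where the estimated score is 0 or 1, i.e., overlap fails."""
    return sorted(cell for cell, p in p_by_cell.items() if p <= 0.0 or p >= 1.0)


# Hypothetical treatment indicators and covariate cells.
d = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
w = ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c"]

p_hat = cell_propensity(d, w)
print(p_hat)
print(overlap_violations(p_hat))
```

In this example every unit in cell `"c"` is treated, so that cell has no untreated comparison units and would have to be dropped or handled separately.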

Using the Propensity Score (cont'd)

Strong ignorability implies that
\[ ATE = E\left\{ \frac{[D - p(W)]\,Y}{p(W)[1 - p(W)]} \right\} \]
\[ TT = E\left\{ \frac{[D - p(W)]\,Y}{1 - p(W)} \right\} \Big/ \Pr(D = 1) \]
Given a consistent estimator of $p(W)$, we then have
\[ \widehat{ATE} = N^{-1} \sum_{i=1}^{N} \frac{[D_i - \hat{p}(W_i)]\,Y_i}{\hat{p}(W_i)[1 - \hat{p}(W_i)]} \]
\[ \widehat{TT} = \left( N^{-1} \sum_{i=1}^{N} D_i \right)^{-1} N^{-1} \sum_{i=1}^{N} \frac{[D_i - \hat{p}(W_i)]\,Y_i}{1 - \hat{p}(W_i)} \]

Propensity Score Matching Estimators

PSM estimators take the form
\[ \hat{\tau} = \frac{1}{n_1} \sum_{i \in I_1 \cap S_P} \left[ Y_{1i} - \hat{Y}_{0i} \right] \]
with
\[ \hat{Y}_{0i} = \sum_{j \in I_0} \hat{W}(i, j)\, Y_{0j} \]
where
- $I_1$ denotes the set of treatment observations
- $n_1$ denotes the number of treatment observations
- $I_0$ denotes the set of comparison observations
- $S_P$ denotes the region of common support
- $\hat{W}(i, j)$ are weights that depend upon the distance between the propensity scores for $i$ and $j$
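The two weighting estimators translate almost line for line into code. The sketch below is a minimal illustration that takes the fitted scores $\hat p(W_i)$ as given; the data are made up, and the scores are assumed to be treated shares in two hypothetical covariate cells (0.5 and 0.25).

```python
def _check_scores(p):
    # The estimators are undefined at p = 0 or p = 1 (overlap must hold).
    if any(p_i <= 0.0 or p_i >= 1.0 for p_i in p):
        raise ValueError("propensity scores must lie strictly between 0 and 1")


def ipw_ate(y, d, p):
    """N^{-1} * sum of [D_i - p_i] Y_i / (p_i [1 - p_i])."""
    _check_scores(p)
    n = len(y)
    return sum((d_i - p_i) * y_i / (p_i * (1.0 - p_i))
               for y_i, d_i, p_i in zip(y, d, p)) / n


def ipw_tt(y, d, p):
    """(N^{-1} sum D_i)^{-1} * N^{-1} * sum of [D_i - p_i] Y_i / (1 - p_i)."""
    _check_scores(p)
    n = len(y)
    treated_share = sum(d) / n
    weighted = sum((d_i - p_i) * y_i / (1.0 - p_i)
                   for y_i, d_i, p_i in zip(y, d, p)) / n
    return weighted / treated_share


# Hypothetical data: four units with p_hat = 0.5, four with p_hat = 0.25.
y = [5, 7, 2, 4, 11, 6, 8, 7]
d = [1, 1, 0, 0, 1, 0, 0, 0]
p = [0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25]

ate_ipw = ipw_ate(y, d, p)
tt_ipw = ipw_tt(y, d, p)
print(ate_ipw, tt_ipw)
```

Because the scores here are exact within-cell treated shares, the weighting estimates coincide with what a cell-by-cell comparison of treated and untreated means would give; with a parametric score model the two generally differ in finite samples.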

The Choice of Weights

Nearest neighbor matching
\[ \hat{W}(i, j) = \begin{cases} 1 & \text{if } j = \arg\min_{k \in I_0} |\hat{P}_i - \hat{P}_k| \\ 0 & \text{otherwise} \end{cases} \]
- frequently used because of ease of implementation
- a single alternative individual serves as the counterfactual for the treated individual

Nearest k neighbors matching
- trades off reduced variance (more information used to construct the counterfactual) and increased bias (on average poorer fits)

The Choice of Weights (cont'd)

Caliper matching
\[ \hat{W}(i, j) = \begin{cases} \dfrac{1}{n_i} & \text{if } |\hat{P}_i - \hat{P}_j| < c \\ 0 & \text{otherwise} \end{cases} \]
- $n_i$ denotes the number of caliper matches for $i$
- Note: Treated individuals for whom no matches can be found are excluded from the analysis

Stratification matching
\[ \hat{W}(i, j) = \begin{cases} \dfrac{1}{n_i} & \text{if } \hat{P}_j \in T_i \\ 0 & \text{otherwise} \end{cases} \]
- $T_i$ denotes the propensity score stratum for $i$
- $n_i$ denotes the number of strata matches for $i$
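The nearest neighbor and caliper weights can be sketched as small functions that plug into the generic PSM form $\hat\tau = n_1^{-1} \sum_i [Y_{1i} - \sum_j \hat W(i,j) Y_{0j}]$. This is an illustration only: the scores, outcomes, and caliper width below are made up, matching is with replacement, and ties in the nearest neighbor search are broken by taking the first control.

```python
def nearest_neighbor_weights(p_i, p_controls):
    """Weight 1 on the control whose score is closest to p_i, 0 elsewhere."""
    j_star = min(range(len(p_controls)), key=lambda k: abs(p_i - p_controls[k]))
    return [1.0 if j == j_star else 0.0 for j in range(len(p_controls))]


def caliper_weights(p_i, p_controls, c):
    """Weight 1/n_i on each control within distance c of p_i; None if no match."""
    inside = [abs(p_i - p_j) < c for p_j in p_controls]
    n_i = sum(inside)
    if n_i == 0:
        return None
    return [1.0 / n_i if hit else 0.0 for hit in inside]


def psm_tt(treated, controls, weight_fn):
    """Average of Y_1i - Y0_hat_i over treated units that have a match."""
    p_controls = [p for p, _ in controls]
    effects = []
    for p_i, y_i in treated:
        weights = weight_fn(p_i, p_controls)
        if weights is None:  # unmatched treated units are excluded
            continue
        y0_hat = sum(w_j * y_j for w_j, (_, y_j) in zip(weights, controls))
        effects.append(y_i - y0_hat)
    return sum(effects) / len(effects)


# Hypothetical (propensity score, outcome) pairs.
treated = [(0.60, 10.0), (0.30, 7.0), (0.95, 12.0)]
controls = [(0.58, 6.0), (0.63, 7.0), (0.35, 5.0), (0.20, 4.0), (0.70, 8.0)]

tt_nn = psm_tt(treated, controls, nearest_neighbor_weights)
tt_caliper = psm_tt(treated, controls, lambda p_i, pc: caliper_weights(p_i, pc, 0.06))
print(tt_nn, tt_caliper)
```

With the caliper, the treated unit at 0.95 has no control within 0.06 and drops out, while the unit at 0.60 is compared with the average of two controls; nearest neighbor matching keeps all three treated units but accepts a poor match for the one at 0.95.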

Matching Decisions (cont'd)

Kernel matching (e.g., Heckman, Ichimura, and Todd; 1997, 1998)
\[ \hat{W}(i, j) = \frac{G\left( \dfrac{\hat{P}_j - \hat{P}_i}{a_n} \right)}{\sum_{k \in I_0} G\left( \dfrac{\hat{P}_k - \hat{P}_i}{a_n} \right)} \]
- $G(\cdot)$ is a kernel function, e.g., $G(s) = \frac{15}{16}(s^2 - 1)^2$ for $|s| \le 1$ and $0$ otherwise
- $a_n$ is a bandwidth parameter

Local linear matching: Fan (1992)

Other Matching Decisions

- matching with or without replacement
  - again, the tradeoff here is between bias and variance
- trimming the support region
  - focus the analysis on the region such that
\[ \Pr(\hat{p}(W) > 0) > 0, \qquad \Pr(1 - \hat{p}(W) > 0) > 0 \]
  - nonparametric density estimators can be used for $p(W)$
  - typically, stricter requirements are placed on the support, with
\[ \Pr(\hat{p}(W) > 0) > c, \qquad \Pr(1 - \hat{p}(W) > 0) > c \]
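The kernel weights can be sketched in a few lines. The example below assumes the biweight kernel given as the slide's example, truncated to $|s| \le 1$; the scores and the bandwidth of 0.2 are made up for illustration.

```python
def biweight(s):
    """Biweight kernel: (15/16) * (s^2 - 1)^2 on [-1, 1], zero outside."""
    if abs(s) > 1.0:
        return 0.0
    return (15.0 / 16.0) * (s * s - 1.0) ** 2


def kernel_weights(p_i, p_controls, bandwidth):
    """Normalized kernel weights W(i, j) for one treated unit with score p_i.

    Returns None when no control falls inside the bandwidth, in which case
    the treated unit has no usable comparison group.
    """
    raw = [biweight((p_j - p_i) / bandwidth) for p_j in p_controls]
    total = sum(raw)
    if total == 0.0:
        return None
    return [g / total for g in raw]


# Hypothetical scores: one treated unit and three controls.
weights = kernel_weights(0.50, [0.45, 0.60, 0.90], bandwidth=0.2)
print(weights)
```

Unlike nearest neighbor matching, every control inside the bandwidth contributes, with closer controls weighted more heavily; the control at 0.90 lies outside the bandwidth and receives zero weight.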

Other Matching Decisions (cont'd)

- difference-in-differences matching
  - uses time series differencing to eliminate unobserved time-invariant effects
  - requires before- and after-treatment observations for both treated and untreated individuals
- conditional matching (e.g., common region, school, etc.)
- the choice of the comparison sample. Heckman et al. (1997, 1998) argue for the following criteria:
  - same data source
  - individuals reside in the same market
  - data contain a rich set of variables affecting outcomes and treatment group

Example #1: Heckman, Ichimura, and Todd (1997) (HIT7)

Use data from the National Job Training Partnership Act (JTPA) Experiment, including
- randomized-out controls
- an eligible nonparticipant comparison group

In this paper, the authors
- decompose the bias in earnings differences
- test the assumptions underlying matching, rejecting most of them
- evaluate the performance of different matching routines
- emphasize the importance of a good comparison group

Decomposing Evaluation Bias in TT

The bias in PSME can be decomposed as follows:
\[ B = \int_{S_1} E(Y_0 \mid X, D = 1)\, f(X \mid D = 1)\, dX - \int_{S_0} E(Y_0 \mid X, D = 0)\, f(X \mid D = 0)\, dX = B_1 + B_2 + B_3 \]
where
\[ B_1 = \int_{S_1 \setminus S_{10}} E(Y_0 \mid X, D = 1)\, f(X \mid D = 1)\, dX - \int_{S_0 \setminus S_{10}} E(Y_0 \mid X, D = 0)\, f(X \mid D = 0)\, dX \]
is the bias due to nonoverlapping support,

Decomposing Evaluation Bias in TT (cont'd)

\[ B_2 = \int_{S_{10}} E(Y_0 \mid X, D = 0) \left\{ f(X \mid D = 1) - f(X \mid D = 0) \right\} dX \]
is the bias due to differing distributions of $X$, and
\[ B_3 = \int_{S_{10}} \left\{ E(Y_0 \mid X, D = 1) - E(Y_0 \mid X, D = 0) \right\} f(X \mid D = 1)\, dX \]
is the bias due to selection on unobservables.

PSME attempts to address $B_1$ and $B_2$, but assumes away $B_3$.
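For a discrete $X$ the integrals become sums, and the decomposition can be verified by hand. The sketch below uses made-up cell probabilities and conditional means of $Y_0$; note that $E(Y_0 \mid X, D = 1)$ is only available when experimental controls exist, as in the HIT setting. All names and numbers are hypothetical.

```python
def decompose_bias(f1, f0, m1, m0):
    """Return (B, B1, B2, B3) for discrete X.

    f1[x], f0[x]: Pr(X = x | D = 1) and Pr(X = x | D = 0)
    m1[x], m0[x]: E(Y_0 | X = x, D = 1) and E(Y_0 | X = x, D = 0)
    """
    s1 = {x for x, f in f1.items() if f > 0}   # support of X among D = 1
    s0 = {x for x, f in f0.items() if f > 0}   # support of X among D = 0
    s10 = s1 & s0                              # common support
    b = sum(m1[x] * f1[x] for x in s1) - sum(m0[x] * f0[x] for x in s0)
    b1 = (sum(m1[x] * f1[x] for x in s1 - s10)
          - sum(m0[x] * f0[x] for x in s0 - s10))
    b2 = sum(m0[x] * (f1[x] - f0[x]) for x in s10)
    b3 = sum((m1[x] - m0[x]) * f1[x] for x in s10)
    return b, b1, b2, b3


# Hypothetical example: X = 1 appears only among D = 0, X = 4 only among D = 1.
f1 = {2: 0.5, 3: 0.25, 4: 0.25}
f0 = {1: 0.5, 2: 0.25, 3: 0.25}
m1 = {2: 10.0, 3: 12.0, 4: 16.0}
m0 = {1: 4.0, 2: 8.0, 3: 12.0}

b, b1, b2, b3 = decompose_bias(f1, f0, m1, m0)
print(b, b1, b2, b3)
```

In this example matching on the common support removes $B_1$ and reweighting removes $B_2$, but the remaining $B_3$ (driven here by the gap between 10 and 8 at $X = 2$) would survive any matching estimator.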

[Slide: Overlap - Adult Males (HIT7)]

[Slide: Overlap - Male Youths (HIT7)]

[Slide: Overlap - Male Youths (HIT7), continued]

[Slide: Decomposition of Bias (HIT7)]

[Slide: Testing Key Assumptions (HIT7)]

[Slide: Testing Key Assumptions, cont'd (HIT7)]

[Slide: Testing Key Assumptions, cont'd (HIT7). Annotations: max correct predictions; increasing number of explanatory variables in p(W) model]

Example #2: Dehejia and Wahba (2002) REStat

- Use data on the National Supported Work (NSW) demonstration
  - this is a randomized experiment
- DW compare experimental treatment effect estimates to those obtained using two comparison samples:
  - Panel Study of Income Dynamics (PSID)
  - Current Population Survey (CPS)
- A variety of matching algorithms are considered


Example #3: Smith and Todd (2005)

Repeat the exercise in DW, but
- investigate alternative sample definitions
- estimate bias by using PSMEs on NSW randomized controls
- add difference-in-differences matching

General conclusions:
- PSMEs are not a silver bullet for nonexperimental situations
- The performance of PSMEs in DW is not generalizable, varying by sample definition
- Difference-in-differences matching performed substantially better than cross-sectional matching alone
- Details of the matching procedure generally had little impact, including
  - type of matching (nearest neighbor, local linear, etc.)
  - propensity score estimation procedure

Example #4: List, Millimet, Fredriksson, and McHone (2003) REStat

- Treatment: nonattainment designation
- Outcome of interest: county-level dirty plant births in New York
- 176 treatment observations
- Caliper matching
  - conditional matching considered for
    - within region and year
    - within year
  - matches are obtained for 8 to 81 of the treatment observations (depending on the use of conditional matches)
- Difference-in-differences estimates using clean plant births as control

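A Simple Experiment

Let
\[ Y_{1i} = 2 + 2X_1 + X_2 + 2X_3 + \varepsilon_{1i} \]
\[ Y_{0i} = 1 + X_1 + 2X_2 + X_3 + \varepsilon_{0i} \]
\[ Y_{Di} = 4 + X_1 + X_2 + X_3 + X_4 + \varepsilon_{Di} \]
with
\[ (\varepsilon_{1i}, \varepsilon_{0i}, \varepsilon_{Di}) \sim N(0, I_3), \qquad X \sim N(1, \Sigma) \]
\[ \Sigma = \sigma_D^2 \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix} \]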
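A design like this is straightforward to simulate. The sketch below is one possible reading of the setup, not the author's code, and it rests on several assumptions: treatment is assigned as $D_i = 1(Y_{Di} > 0)$, which the transcribed slide does not state; the intercept of the $Y_{Di}$ equation is passed as a parameter (default 4, as transcribed) because its sign cannot be verified from the slide; and the values of $\sigma_D$ and $\rho$ used in the example are arbitrary. The equicorrelated $X$ is built from one common factor, which requires $0 \le \rho \le 1$.

```python
import math
import random


def simulate(n, sigma_d, rho, intercept_d=4.0, seed=0):
    """Draw n observations; D = 1(Y_D > 0) is an assumed assignment rule."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)
        # One-factor construction: Var(X_k) = sigma_d^2, Cov(X_k, X_l) = rho * sigma_d^2.
        x = [1.0 + sigma_d * (math.sqrt(rho) * common
                              + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
             for _ in range(4)]
        e1, e0, ed = (rng.gauss(0.0, 1.0) for _ in range(3))
        y1 = 2 + 2 * x[0] + x[1] + 2 * x[2] + e1
        y0 = 1 + x[0] + 2 * x[1] + x[2] + e0
        y_d = intercept_d + sum(x) + ed
        d = 1 if y_d > 0 else 0
        rows.append({"x": x, "d": d, "y1": y1, "y0": y0, "y": y1 if d else y0})
    return rows


def summarize(rows):
    """True sample ATE and TT, plus the naive treated-minus-untreated difference."""
    treated = [r for r in rows if r["d"] == 1]
    controls = [r for r in rows if r["d"] == 0]
    ate = sum(r["y1"] - r["y0"] for r in rows) / len(rows)
    tt = sum(r["y1"] - r["y0"] for r in treated) / len(treated) if treated else None
    naive = None
    if treated and controls:
        naive = (sum(r["y"] for r in treated) / len(treated)
                 - sum(r["y"] for r in controls) / len(controls))
    return {"ate": ate, "tt": tt, "naive": naive,
            "share_treated": len(treated) / len(rows)}


print(summarize(simulate(n=20000, sigma_d=1.0, rho=0.5)))
```

Since $E(X_k) = 1$ for every $k$, the population ATE implied by the two outcome equations is $1 + 1 - 1 + 1 = 2$, which gives a benchmark for the simulated `ate`.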

[Figure: RMSE Using Full Set of Conditioning Variables. Horizontal axis: sigmad; series: Treatment-NonTreatment, PSME]

[Figure: RMSE Omitting X_1. Horizontal axis: sigmad; series: Treatment-NonTreatment, PSME]

[Figure: RMSE Omitting X_2. Horizontal axis: sigmad; series: Treatment-NonTreatment, PSME]

Other Issues

- Post-matching verification that treatment and matched group characteristics are similar
- Limited common support
- Standard errors
  - practitioners frequently ignore uncertainty in the matching process
  - large sample properties: Abadie and Imbens (2004) working paper
- Multiple treatments
  - Lechner, M. (2002), Program Heterogeneity and Propensity Score Matching: An Application to the Evaluation of Active Labor Market Policies, Review of Economics and Statistics 84(2): 205-220.
  - Bellio, R., and E. Gori (2003), Impact Evaluation of Job Training Programmes: Selection Bias in Multilevel Models, Journal of Applied Statistics 30(8): 893-907.