Implementing Matching Estimators for Average Treatment Effects in STATA. Guido W. Imbens, Harvard University. Stata User Group Meeting, Boston

Implementing Matching Estimators for Average Treatment Effects in STATA
Guido W. Imbens, Harvard University
Stata User Group Meeting, Boston, July 26th, 2006

General Motivation

Estimate the average effect of a binary treatment, allowing for general heterogeneity in the effect across units.

Use matching to eliminate the bias that is present in a simple comparison of means by treatment status.

Economic Applications

Labor market programs: Ashenfelter (1978), Ashenfelter and Card (1985), Lalonde (1986), Card and Sullivan (1989), Heckman and Hotz (1989), Friedlander and Robins (1995), Dehejia and Wahba (1999), Lechner (1999), Heckman, Ichimura and Todd (1998).
Effect of military service on earnings: Angrist (1998).
Effect of family composition: Manski, McLanahan, Powers, Sandefur (1992).
Many other applications.

Topics

1. General Set Up / Notation
2. Estimators for the Average Treatment Effect under Unconfoundedness
3. Implementation in STATA Using nnmatch

1. Notation

$N$ individuals/firms/units, indexed by $i = 1, \ldots, N$.
$W_i \in \{0, 1\}$: binary treatment indicator.
$Y_i(1)$: potential outcome for unit $i$ with the treatment.
$Y_i(0)$: potential outcome for unit $i$ without the treatment.
$X_i$: $k \times 1$ vector of covariates.

We observe $\{(X_i, W_i, Y_i)\}_{i=1}^{N}$, where

$$Y_i = \begin{cases} Y_i(0) & \text{if } W_i = 0, \\ Y_i(1) & \text{if } W_i = 1. \end{cases}$$

Fundamental problem: we never observe both $Y_i(0)$ and $Y_i(1)$ for the same individual $i$.

Notation (ctd)

$\mu_w(x) = E[Y(w) \mid X = x]$ (regression functions)
$\sigma_w^2(x) = E[(Y(w) - \mu_w(x))^2 \mid X = x]$ (conditional variances)
$e(x) = E[W \mid X = x] = \Pr(W = 1 \mid X = x)$ (propensity score, Rosenbaum and Rubin, 1983)
$\tau(x) = E[Y(1) - Y(0) \mid X = x] = \mu_1(x) - \mu_0(x)$ (conditional average treatment effect)
$\tau = E[\tau(X)] = E[Y(1) - Y(0)]$ (population average treatment effect)

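Assumptions

I. Unconfoundedness

$$(Y(0), Y(1)) \perp W \mid X.$$

This form is due to Rosenbaum and Rubin (1983). It is similar to selection on observables, or exogeneity. Suppose

$$Y_i(0) = \alpha + \beta' X_i + \varepsilon_i, \qquad Y_i(1) = Y_i(0) + \tau,$$

then

$$Y_i = \alpha + \tau W_i + \beta' X_i + \varepsilon_i,$$

and unconfoundedness is equivalent to $\varepsilon_i \perp W_i \mid X_i$.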
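The step from the two potential-outcome equations to the single regression equation can be spelled out as follows. This is only a worked version of the substitution, using the observation rule $Y_i = Y_i(W_i)$ from the notation slide:

```latex
\begin{aligned}
Y_i &= (1 - W_i)\, Y_i(0) + W_i\, Y_i(1) \\
    &= Y_i(0) + W_i \bigl( Y_i(1) - Y_i(0) \bigr) \\
    &= \alpha + \beta' X_i + \varepsilon_i + \tau\, W_i .
\end{aligned}
```

Given $X_i$, the pair $(Y_i(0), Y_i(1))$ varies only through $\varepsilon_i$ in this model, which is why independence of the potential outcomes from $W_i$ given $X_i$ reduces to independence of $\varepsilon_i$ from $W_i$ given $X_i$.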

II. Overlap

$$0 < \Pr(W = 1 \mid X) < 1.$$

For all values of $X$ there are both treated and control units.

Motivation for Assumptions

I. Descriptive statistics. After reporting the simple difference in mean outcomes for treated and controls, it may be useful to compare average outcomes adjusted for covariates.

II. Alternative: bounds (e.g., Manski, 1990).

III. Unconfoundedness follows from some economic models.

Suppose individuals choose treatment $w$ to maximize expected utility, equal to outcome minus cost, $Y_i(w) - c_i \cdot w$, conditional on a set of covariates $X$:

$$W_i = \arg\max_{w} \; E[Y_i(w) \mid X_i] - c_i \cdot w.$$

Suppose that the costs $c_i$ differ between individuals, independently of the potential outcomes. Then (i) choices will vary between individuals with the same covariates, and (ii) conditional on the covariates $X$ the choice is independent of the potential outcomes.

Identification

$$\tau(x) = E[Y(1) - Y(0) \mid X = x] = E[Y(1) \mid X = x] - E[Y(0) \mid X = x].$$

By unconfoundedness this is equal to

$$E[Y(1) \mid W = 1, X = x] - E[Y(0) \mid W = 0, X = x] = E[Y \mid W = 1, X = x] - E[Y \mid W = 0, X = x].$$

By the overlap assumption we can estimate both terms on the right-hand side. Then $\tau = E[\tau(X)]$.
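The identification argument can be illustrated with a single discrete covariate, where the conditional means are just cell averages. The following is a minimal Python sketch under that assumption; the function names and the toy data are hypothetical and not part of the talk or of nnmatch.

```python
from collections import defaultdict


def ate_by_cells(x, w, y):
    """Average the within-cell treated-minus-control mean differences
    over the sample distribution of a discrete covariate X."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for xi, wi, yi in zip(x, w, y):
        cells[xi][wi].append(yi)
    n = len(y)
    tau = 0.0
    for xi, groups in cells.items():
        # Overlap: every cell needs both treated and control units.
        if not groups[0] or not groups[1]:
            raise ValueError(f"overlap fails in cell X={xi!r}")
        tau_x = (sum(groups[1]) / len(groups[1])
                 - sum(groups[0]) / len(groups[0]))
        tau += tau_x * (len(groups[0]) + len(groups[1])) / n
    return tau


def naive_difference(w, y):
    """Simple difference in means by treatment status (no adjustment)."""
    treated = [yi for wi, yi in zip(w, y) if wi == 1]
    controls = [yi for wi, yi in zip(w, y) if wi == 0]
    return sum(treated) / len(treated) - sum(controls) / len(controls)


# Hypothetical data: treatment is more common in the cell with higher outcomes.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [1, 0, 0, 0, 1, 1, 1, 0]
y = [3.0, 1.0, 1.0, 1.0, 6.0, 6.0, 6.0, 2.0]

tau_hat = ate_by_cells(x, w, y)
naive = naive_difference(w, y)
print(tau_hat, naive)
```

In this toy sample the within-cell effects are 2 and 4, so the cell-based estimate is 3, while the unadjusted comparison of means gives 4 because treated units are concentrated in the high-outcome cell.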

Questions

How well can we estimate $\tau$?
How do we estimate $\tau$?
How do we do inference?
How do we assess assumptions (unconfoundedness/overlap)?

2. Estimation of Average Treatment Effect under Unconfoundedness

I. Regression estimators: estimate $\mu_w(x)$.
II. Propensity score estimators: estimate $e(x)$.
III. Matching: match all units to units with similar values for covariates and opposite treatment.
IV. Combining regression with propensity score and matching methods.

Regression Estimators

Estimate $\mu_w(x)$ nonparametrically, and then

$$\hat\tau = \frac{1}{N} \sum_{i=1}^{N} \bigl( \hat\mu_1(X_i) - \hat\mu_0(X_i) \bigr).$$

These estimators can reach the efficiency bound.
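A minimal Python sketch of the regression (imputation) estimator follows. As a simplifying assumption it fits a separate straight line by least squares in each treatment arm instead of the nonparametric estimate the slide calls for; the helper names and data are hypothetical.

```python
def ols_line(xs, ys):
    """Least-squares line through (xs, ys); returns a prediction function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda z: intercept + slope * z


def regression_ate(x, w, y):
    """Average of mu1_hat(X_i) - mu0_hat(X_i) over all units."""
    mu = {}
    for arm in (0, 1):
        xs = [xi for xi, wi in zip(x, w) if wi == arm]
        ys = [yi for yi, wi in zip(y, w) if wi == arm]
        mu[arm] = ols_line(xs, ys)
    return sum(mu[1](xi) - mu[0](xi) for xi in x) / len(x)


# Hypothetical data: Y(0) = 2x and Y(1) = 3x + 1, so tau(x) = x + 1.
x = [1.0, 3.0, 5.0, 7.0, 1.5, 4.25, 6.5]
w = [0, 0, 0, 0, 1, 1, 1]
y = [2.0, 6.0, 10.0, 14.0, 5.5, 13.75, 20.5]

tau_reg = regression_ate(x, w, y)
print(tau_reg)
```

Because the effect here varies with $x$, the estimate equals the sample average of $X_i + 1$ rather than the effect at any single covariate value.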

Propensity Score Estimators

The key insight is that even with high-dimensional covariates, one can remove all bias by conditioning on a scalar function of the covariates, the propensity score. Formally, if

$$(Y(0), Y(1)) \perp W \mid X,$$

then

$$(Y(0), Y(1)) \perp W \mid e(X),$$

where $e(x) = \Pr(W = 1 \mid X = x)$ (Rosenbaum and Rubin, 1983).

Thus we can reduce the dimension of the conditioning set to one (if we know the propensity score).

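Propensity Score Estimators (ctd)

Estimate $e(x)$ nonparametrically, and then:

A. Weighting (Hirano, Imbens, Ridder, 2003)

$$\hat\tau = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{W_i \, Y_i}{\hat e(X_i)} - \frac{(1 - W_i) \, Y_i}{1 - \hat e(X_i)} \right).$$

This is based on the fact that

$$E\left[ \frac{W \, Y}{e(X)} \,\Big|\, X = x \right] = E\left[ \frac{W \, Y(1)}{e(X)} \,\Big|\, X = x \right] = E\left[ \frac{W}{e(X)} \,\Big|\, X = x \right] E[Y(1) \mid X = x] = \mu_1(x).$$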
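The weighting formula can be sketched in a few lines of Python. The sketch assumes a single discrete covariate, so that the share of treated units within each covariate cell serves as the nonparametric estimate of $e(x)$; function names and data are hypothetical.

```python
from collections import Counter


def estimate_pscore_by_cells(x, w):
    """Estimated propensity score: share of treated units in each X cell."""
    n_cell = Counter(x)
    n_treated = Counter(xi for xi, wi in zip(x, w) if wi == 1)
    return [n_treated[xi] / n_cell[xi] for xi in x]


def ipw_ate(w, y, e_hat):
    """Inverse-propensity-weighted estimate of the average treatment effect."""
    if any(not 0.0 < ei < 1.0 for ei in e_hat):
        raise ValueError("overlap fails: estimated propensity score is 0 or 1")
    n = len(y)
    return sum(wi * yi / ei - (1 - wi) * yi / (1 - ei)
               for wi, yi, ei in zip(w, y, e_hat)) / n


# Hypothetical data with a binary covariate.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [1, 0, 0, 0, 1, 1, 1, 0]
y = [3.0, 1.0, 1.0, 1.0, 6.0, 6.0, 6.0, 2.0]

e_hat = estimate_pscore_by_cells(x, w)
tau_ipw = ipw_ate(w, y, e_hat)
print(e_hat, tau_ipw)
```

With the propensity score estimated by cell shares, the weighted estimate coincides with the average of the within-cell differences in means, which is 3 for these data.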

Propensity Score Estimators (ctd)

B. Blocking (Rosenbaum and Rubin, 1983)

Divide the sample into subsamples on the basis of the value of the (estimated) propensity score. Estimate the average treatment effect within each block as the difference in average outcomes for treated and controls. Average the within-block estimates, weighting each by the proportion of observations in its block.

Using five blocks reduces bias by about 90% (Cochran, 1968), under normality.
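The blocking steps above can be sketched as follows. This Python sketch assumes the propensity score values are already available and forms blocks of (nearly) equal size by sorting on them; the function name and the toy data are hypothetical.

```python
def blocking_ate(e, w, y, n_blocks=5):
    """Blocking estimator: within-block differences in means,
    weighted by the share of observations in each block."""
    n = len(y)
    order = sorted(range(n), key=lambda i: e[i])  # sort units by propensity score
    tau = 0.0
    for b in range(n_blocks):
        block = order[b * n // n_blocks:(b + 1) * n // n_blocks]
        treated = [y[i] for i in block if w[i] == 1]
        controls = [y[i] for i in block if w[i] == 0]
        if not treated or not controls:
            raise ValueError(f"block {b} lacks treated or control units")
        diff = sum(treated) / len(treated) - sum(controls) / len(controls)
        tau += diff * len(block) / n
    return tau


# Hypothetical data: ten units, two per block, outcomes rising with the score.
e = [0.1, 0.1, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7, 0.9, 0.9]
w = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y = [3.0, 1.0, 5.0, 3.0, 7.0, 5.0, 9.0, 7.0, 11.0, 9.0]

tau_block = blocking_ate(e, w, y, n_blocks=5)
print(tau_block)
```

Equal-size blocks are one design choice among several; blocks could also be defined by fixed cut-points of the propensity score.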

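Matching

For each treated unit $i$, find the untreated unit $\ell(i)$ with

$$\| X_{\ell(i)} - X_i \| = \min_{\{l \,:\, W_l = 0\}} \| X_l - X_i \|,$$

and likewise, for each untreated unit, find the nearest treated unit. Define:

$$\hat Y_i(1) = \begin{cases} Y_i & \text{if } W_i = 1, \\ Y_{\ell(i)} & \text{if } W_i = 0, \end{cases} \qquad \hat Y_i(0) = \begin{cases} Y_i & \text{if } W_i = 0, \\ Y_{\ell(i)} & \text{if } W_i = 1. \end{cases}$$

Then the simple matching estimator is:

$$\hat\tau^{sm} = \frac{1}{N} \sum_{i=1}^{N} \bigl( \hat Y_i(1) - \hat Y_i(0) \bigr).$$

Note: since we match all units, it is crucial that matching is done with replacement.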

Matching (ctd)

More generally, let $J_M(i) = \{\ell_1(i), \ldots, \ell_M(i)\}$ be the set of indices for the nearest $M$ matches for unit $i$. Define:

$$\hat Y_i(1) = \begin{cases} Y_i & \text{if } W_i = 1, \\ \sum_{j \in J_M(i)} Y_j / M & \text{if } W_i = 0, \end{cases} \qquad \hat Y_i(0) = \begin{cases} Y_i & \text{if } W_i = 0, \\ \sum_{j \in J_M(i)} Y_j / M & \text{if } W_i = 1. \end{cases}$$

Matching is generally not efficient (unless $M \to \infty$), but the efficiency loss is small (the variance is less than $1 + 1/(2M)$ times the efficiency bound).

The bias is of order $O_p(N^{-1/k})$, where $k$ is the dimension of the covariates. Matching is consistent under weak smoothness conditions (it does not require higher-order derivatives).
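The matching estimator with $M$ matches can be sketched in Python as below. The sketch assumes a scalar covariate and absolute distance, matches with replacement, and makes no attempt to handle ties the way nnmatch does; names and data are hypothetical.

```python
def matched_set(i, x, w, m):
    """J_M(i): indices of the m nearest units with the opposite treatment."""
    opposite = [l for l in range(len(x)) if w[l] != w[i]]
    return sorted(opposite, key=lambda l: abs(x[l] - x[i]))[:m]


def matching_ate(x, w, y, m=1):
    """Matching estimator of the ATE, matching with replacement."""
    n = len(y)
    total = 0.0
    for i in range(n):
        matches = matched_set(i, x, w, m)
        imputed = sum(y[j] for j in matches) / len(matches)
        y1_hat = y[i] if w[i] == 1 else imputed  # Y_i(1) observed or imputed
        y0_hat = y[i] if w[i] == 0 else imputed  # Y_i(0) observed or imputed
        total += y1_hat - y0_hat
    return total / n


# Hypothetical data: four controls and three treated units.
x = [1.1, 1.2, 3.0, 3.1, 5.0, 5.3, 5.4]
w = [0, 1, 0, 1, 0, 1, 0]
y = [1.0, 3.0, 3.0, 6.0, 5.0, 9.0, 5.0]

tau_m1 = matching_ate(x, w, y, m=1)
tau_m2 = matching_ate(x, w, y, m=2)
print(tau_m1, tau_m2)
```

In this sample the treated unit at $x = 5.3$ is the nearest match for two different controls, which is only possible because matching is done with replacement.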

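Matching and Regression

Estimate $\mu_w(x)$, and modify the matching estimator to:

$$\tilde Y_i(1) = \begin{cases} Y_i & \text{if } W_i = 1, \\ Y_{\ell(i)} + \hat\mu_1(X_i) - \hat\mu_1(X_{\ell(i)}) & \text{if } W_i = 0, \end{cases}$$

$$\tilde Y_i(0) = \begin{cases} Y_i & \text{if } W_i = 0, \\ Y_{\ell(i)} + \hat\mu_0(X_i) - \hat\mu_0(X_{\ell(i)}) & \text{if } W_i = 1. \end{cases}$$

Then the bias-corrected matching estimator is:

$$\hat\tau^{bcm} = \frac{1}{N} \sum_{i=1}^{N} \bigl( \tilde Y_i(1) - \tilde Y_i(0) \bigr).$$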
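A minimal Python sketch of the bias correction, for one match and a scalar covariate, is given below. As an assumption of this sketch, $\hat\mu_w$ is a least-squares line fitted separately within each treatment arm on the full sample, which need not be how nnmatch fits it; names and data are hypothetical.

```python
def ols_line(xs, ys):
    """Least-squares line through (xs, ys); returns a prediction function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return lambda z: y_bar + slope * (z - x_bar)


def matching_ate(x, w, y, bias_correct=True):
    """One-match ATE estimator, optionally with regression bias correction."""
    n = len(y)
    mu = {arm: ols_line([x[i] for i in range(n) if w[i] == arm],
                        [y[i] for i in range(n) if w[i] == arm])
          for arm in (0, 1)}
    total = 0.0
    for i in range(n):
        opposite = [l for l in range(n) if w[l] != w[i]]
        l = min(opposite, key=lambda j: abs(x[j] - x[i]))
        imputed = y[l]
        if bias_correct:
            # Adjust the match's outcome for the covariate gap X_i - X_l,
            # using the regression function of the match's own arm.
            imputed += mu[w[l]](x[i]) - mu[w[l]](x[l])
        total += (y[i] - imputed) if w[i] == 1 else (imputed - y[i])
    return total / n


# Hypothetical data: Y(0) = 2x, Y(1) = 2x + 3, so the true effect is 3.
x = [1.0, 3.0, 5.0, 7.0, 1.5, 4.25, 6.5]
w = [0, 0, 0, 0, 1, 1, 1]
y = [2.0, 6.0, 10.0, 14.0, 6.0, 11.5, 16.0]

tau_simple = matching_ate(x, w, y, bias_correct=False)
tau_bcm = matching_ate(x, w, y, bias_correct=True)
print(tau_simple, tau_bcm)
```

Because the outcomes in this toy sample are exactly linear in $x$, the correction removes the matching discrepancy completely; with real data it removes only the part captured by the fitted regression.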

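Variance Estimation

Matching estimators have the form

$$\hat\tau = \sum_{i=1}^{N} \bigl( W_i \, \lambda_i \, Y_i - (1 - W_i) \, \lambda_i \, Y_i \bigr)$$

(linear in outcomes), with weights $\lambda_i = \lambda_i(\mathbf{W}, \mathbf{X})$ that are known functions of the treatment indicators and covariates.

For matching estimators these weights are very non-smooth, and as a result the bootstrap is not valid (not merely lacking a second-order justification, but invalid asymptotically).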

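The variance conditional on $\mathbf{W}$ and $\mathbf{X}$ is

$$V(\hat\tau \mid \mathbf{W}, \mathbf{X}) = \sum_{i=1}^{N} \bigl( W_i \, \lambda_i^2 \, \sigma_1^2(X_i) + (1 - W_i) \, \lambda_i^2 \, \sigma_0^2(X_i) \bigr).$$

All parts are known other than $\sigma_w^2(x)$. For each treated (control) unit, find the closest other treated (control) unit:

$$h(i) = \arg\min_{j \neq i, \; W_j = W_i} \| X_i - X_j \|.$$

Then use the difference between their outcomes to estimate $\sigma_{W_i}^2(X_i)$ for this unit:

$$\hat\sigma_{W_i}^2(X_i) = \frac{1}{2} \bigl( Y_i - Y_{h(i)} \bigr)^2.$$

Substitute into the variance formula. Even though $\hat\sigma_w^2(x)$ is not consistent, the estimator for $V(\hat\tau \mid \mathbf{W}, \mathbf{X})$ is, because it averages over all $\hat\sigma_w^2(X_i)$.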
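The variance recipe can be sketched in Python for the one-match ATE estimator with a scalar covariate. The explicit weight used here, $\lambda_i = (1 + K(i))/N$ with $K(i)$ the number of times unit $i$ serves as a match, is not stated on the slide; it follows from the definition of the simple matching estimator, since each unit's outcome enters once for itself and once for every unit it is matched to. Names and data are hypothetical.

```python
from math import sqrt


def nearest(i, x, candidates):
    """Index among candidates whose covariate is closest to x[i]."""
    return min(candidates, key=lambda j: abs(x[j] - x[i]))


def matching_weights(x, w):
    """lambda_i = (1 + K(i)) / N for one-match ATE matching."""
    n = len(x)
    k = [0] * n
    for i in range(n):
        opposite = [l for l in range(n) if w[l] != w[i]]
        k[nearest(i, x, opposite)] += 1  # unit used as a match once more
    return [(1 + ki) / n for ki in k]


def conditional_variance(x, w, y):
    """Plug-in estimate of V(tau_hat | W, X) using same-arm nearest neighbours."""
    n = len(x)
    lam = matching_weights(x, w)
    var = 0.0
    for i in range(n):
        same_arm = [j for j in range(n) if j != i and w[j] == w[i]]
        h = nearest(i, x, same_arm)
        sigma2_i = 0.5 * (y[i] - y[h]) ** 2
        var += lam[i] ** 2 * sigma2_i
    return var


# Hypothetical data: four controls and three treated units.
x = [1.1, 1.2, 3.0, 3.1, 5.0, 5.3, 5.4]
w = [0, 1, 0, 1, 0, 1, 0]
y = [1.0, 3.0, 3.0, 6.0, 5.0, 9.0, 5.0]

lam = matching_weights(x, w)
tau_hat = sum((2 * wi - 1) * li * yi for wi, li, yi in zip(w, lam, y))
var_hat = conditional_variance(x, w, y)
print(tau_hat, var_hat, sqrt(var_hat))
```

The weighted sum reproduces the one-match matching estimate for these data, which is a useful check that the weights have been computed correctly before they are squared in the variance formula.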

3. Implementation in STATA Using nnmatch

Syntax:

    nnmatch depvar treatvar varlist [weight] [if exp] [in range]
        [, tc({ate|att|atc}) m(#) metric(maha|matname) exact(varlist_ex)
        biasadj(bias|varlist_adj) robusth(#) population level(#)
        keep(filename) replace]

    nnmatch depvar treatvar varlist [weight] [if exp] [in range]

Basic command: treatvar must be a binary variable.

Option 1: tc({ate|att|atc})

One can estimate the overall average effect of the treatment (ate), the average treatment effect for the treated units (att), or the average effect for those who were not treated (atc).

Option 2: m(#)

The number of matches. Kernel matching estimators differ mainly in that the number of matches increases with the sample size. In practice there is little gain from using more than 3-4 matches. Under homoskedasticity the variance is proportional to $1 + 1/(2M)$, where $M$ is the number of matches, so it falls as $M$ grows.
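A quick arithmetic check of the factor $1 + 1/(2M)$ shows why additional matches pay off so little. The Python snippet below only tabulates the formula from the slide; the function name is hypothetical.

```python
def variance_factor(m):
    """Variance relative to the efficiency bound, 1 + 1/(2M), under homoskedasticity."""
    return 1 + 1 / (2 * m)


factors = {m: variance_factor(m) for m in (1, 2, 3, 4, 8, 16)}
for m, factor in factors.items():
    print(f"M = {m:2d}: factor = {factor:.4f}")
```

Moving from one match to four lowers the factor from 1.5 to 1.125, while going from four to sixteen lowers it by less than 0.1 more.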

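Option 3: metric(maha|matname)

The distance metric. There are two main choices: weighting by the inverse of the covariate variances (used when the option is omitted) or the Mahalanobis distance (maha). A prespecified weight matrix (matname) can also be supplied.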

Option 4: exact(varlist_ex)

The covariates in this list receive extra weight (1000 times the weight specified in metric). Useful for binary covariates on which one wishes to match exactly.
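The effect of the extra weight can be illustrated with a small Python sketch of an inverse-variance weighted distance. It is an illustration of the idea only, not nnmatch's implementation: the use of the sample variance, the function names, and the two-covariate data (age, black) are assumptions, while the factor of 1000 is taken from the slide.

```python
from math import sqrt
from statistics import variance


def metric_weights(rows, exact_cols=()):
    """Inverse-variance weight per covariate, times 1000 for exact-match covariates."""
    weights = []
    for c in range(len(rows[0])):
        wgt = 1.0 / variance([r[c] for r in rows])
        if c in exact_cols:
            wgt *= 1000
        weights.append(wgt)
    return weights


def distance(a, b, weights):
    """Weighted Euclidean distance between two covariate vectors."""
    return sqrt(sum(wt * (ai - bi) ** 2 for wt, ai, bi in zip(weights, a, b)))


def nearest(i, rows, weights):
    """Index of the row closest to rows[i], excluding i itself."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: distance(rows[i], rows[j], weights))


# Hypothetical covariates: (age, black).
rows = [(30.0, 1.0), (31.0, 0.0), (50.0, 1.0), (25.0, 0.0)]

match_default = nearest(0, rows, metric_weights(rows))
match_exact = nearest(0, rows, metric_weights(rows, exact_cols=(1,)))
print(match_default, match_exact)
```

Without the extra weight the first unit is matched to the unit closest in age, even though the binary covariate differs; with the extra weight on the binary covariate the match moves to the unit that agrees on it, despite the larger age gap.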

Option 5: biasadj(bias|varlist_adj)

Regression adjustment is applied within the treatment and control groups, using the variables named in this option. If biasadj(bias) is specified, all the variables used in the matching are used again for the adjustment.

Option 6: robusth(#)

Heteroskedasticity-consistent variance estimation.

Option 7: population

Chooses between the sample average treatment effect,

$$\frac{1}{N} \sum_{i=1}^{N} \bigl( \mu_1(X_i) - \mu_0(X_i) \bigr),$$

and the population average treatment effect,

$$E[\mu_1(X) - \mu_0(X)].$$

The former can be estimated more precisely if there is heterogeneity in $\mu_1(x) - \mu_0(x)$.

Option 8: level(#)

Standard STATA option that specifies the confidence level for confidence sets.

Option 9: keep(filename) replace

Allows the user to recover output beyond the estimate and its standard error. A new data set is created with one observation per match, and covariate information is kept for the control and the treated unit in each match.

Three Examples

    nnmatch re78 t age educ black hisp married re74 re75 reo74 reo75, tc(att)

    nnmatch re78 t age educ black hisp married re74 re75 reo74 reo75, tc(att) m(4) exact(reo75) bias(bias) rob(4) keep(lalonde_temp1) replace

    nnmatch re78 t age educ black hisp married re74 re75 reo74 reo75, tc(att) m(4) exact(pscore) bias(bias) rob(4) keep(lalonde_temp2) replace

Cautionary note: If there are few matching variables (all discrete) and many ties, nnmatch can be very slow and memory intensive. One solution is to add a continuous matching variable, even one unrelated to the other variables, to break the ties.