A Course in Applied Econometrics. Lecture 2: Estimation of Average Treatment Effects Under Unconfoundedness, Part II

Outline
1. Assessing Unconfoundedness (not testable)
2. Overlap
3. Illustration based on Lalonde Data

Guido Imbens, IRP Lectures, UW Madison, August 2008

1.I Assessing Unconfoundedness: Multiple Control Groups

Suppose we have a three-valued indicator $T_i \in \{-1, 0, 1\}$ for the groups (e.g., ineligibles, eligible nonparticipants and participants), with the treatment indicator equal to $W_i = 1\{T_i = 1\}$, so that
\[ Y_i = \begin{cases} Y_i(0) & \text{if } T_i \in \{-1, 0\}, \\ Y_i(1) & \text{if } T_i = 1. \end{cases} \]
Suppose we extend the unconfoundedness assumption to independence of the potential outcomes and the three-valued group indicator given covariates,
\[ Y_i(0), Y_i(1) \perp T_i \mid X_i. \]
Now a testable implication is
\[ Y_i(0) \perp 1\{T_i = 0\} \mid X_i,\ T_i \in \{-1, 0\}, \]
and thus
\[ Y_i \perp 1\{T_i = 0\} \mid X_i,\ T_i \in \{-1, 0\}. \]
An implication of this independence condition is being tested by the tests discussed above. Whether this test has much bearing on the unconfoundedness assumption depends on whether the extension of the assumption is plausible given unconfoundedness itself.
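One simple way to probe the testable implication for multiple control groups is to compare outcomes across the two control groups after adjusting for covariates. The sketch below is a minimal illustration, not the procedure used in the lecture: it assumes a linear regression adjustment and homoskedastic standard errors, and the function name `control_group_test` and its arguments are hypothetical.

```python
import numpy as np


def control_group_test(y, x, t):
    """Coefficient and standard error on 1{T=0} among units with T in {-1, 0}.

    Assumptions (not from the lecture): linear adjustment for x and
    homoskedastic errors. Under the extended unconfoundedness assumption
    the coefficient should be close to zero.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    t = np.asarray(t)

    keep = np.isin(t, (-1, 0))              # drop the treated group (T = 1)
    d = (t[keep] == 0).astype(float)        # indicator for the second control group
    xk = x[keep].reshape(int(keep.sum()), -1)

    design = np.column_stack([np.ones(d.size), d, xk])
    beta, *_ = np.linalg.lstsq(design, y[keep], rcond=None)

    resid = y[keep] - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1], np.sqrt(cov[1, 1])
```

A coefficient that is large relative to its standard error suggests the two control groups differ in ways the covariates do not capture, which casts doubt on the extended assumption.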

1.II Assessing Unconfoundedness: Estimate Effects on Pseudo Outcomes

Partition the covariate vector into $X_i = (X_i^p, X_i^r)$, with $X_i^p$ scalar. Unconfoundedness assumes
\[ (Y_i(0), Y_i(1)) \perp W_i \mid (X_i^p, X_i^r). \]
Suppose we are willing to assume that $X_i^r$ is sufficient,
\[ (Y_i(0), Y_i(1)) \perp W_i \mid X_i^r, \]
and suppose $X_i^p$ is a good proxy for $Y_i(0)$. Then we can test
\[ X_i^p \perp W_i \mid X_i^r. \]

Most useful implementations have $X_i^p$ a lagged outcome. Suppose the covariates consist of a number of lagged outcomes $Y_{i,-1}, \ldots, Y_{i,-T}$ as well as time-invariant individual characteristics $Z_i$, so that $X_i = (X_i^p, X_i^r)$, with $X_i^p = Y_{i,-1}$ and $X_i^r = (Y_{i,-2}, \ldots, Y_{i,-T}, Z_i)$. The outcome is $Y_i = Y_{i,0}$.

Now consider the following two assumptions. The first is unconfoundedness given only $T-1$ lags of the outcome:
\[ Y_{i,0}(0), Y_{i,0}(1) \perp W_i \mid Y_{i,-1}, \ldots, Y_{i,-(T-1)}, Z_i. \]
The second is stationarity. Under stationarity it seems reasonable to expect the same condition to hold one period earlier, and it then follows that
\[ Y_{i,-1} \perp W_i \mid Y_{i,-2}, \ldots, Y_{i,-T}, Z_i, \]
which is testable.

2.I Assessing Overlap

The first method to detect lack of overlap is to look at summary statistics for the covariates by treatment group. Most important here is the normalized difference in covariates:
\[ \text{nor-dif} = \frac{\bar{X}_1 - \bar{X}_0}{\sqrt{S_{X,0}^2 + S_{X,1}^2}}, \]
where
\[ \bar{X}_w = \frac{1}{N_w} \sum_{i: W_i = w} X_i \quad \text{and} \quad S_{X,w}^2 = \frac{1}{N_w - 1} \sum_{i: W_i = w} \left( X_i - \bar{X}_w \right)^2. \]
Note that we do not report the t-statistic for the difference,
\[ t = \frac{\bar{X}_1 - \bar{X}_0}{\sqrt{S_{X,0}^2 / N_0 + S_{X,1}^2 / N_1}}. \]
The t-statistic partly reflects the sample size. Given the normalized difference, a larger t-statistic just indicates a larger sample size, and therefore in fact an easier problem in terms of finding credible estimators for average treatment effects.

In general a difference in average means bigger than 0.25 standard deviations is substantial. In that case one may want to be suspicious of simple methods like linear regression with a dummy for the treatment variable. Recall that estimating the average effect essentially amounts to using the controls to estimate $\mu_0(x) = E[Y_i \mid W_i = 0, X_i = x]$ and using this estimated regression function to predict the (missing) control outcomes for the treated units. With a large difference between the two groups, linear regression is going to rely heavily on extrapolation, and thus will be sensitive to the exact functional form.
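The normalized difference and the t-statistic for a single covariate can be computed side by side to see how they differ. This is a minimal sketch; the function name `normalized_difference` and its arguments are illustrative choices, and the treatment indicator is assumed to be coded 0/1.

```python
import numpy as np


def normalized_difference(x, w):
    """Return (normalized difference, t-statistic) for covariate x by group w.

    x: covariate values; w: treatment indicator coded 0/1 (an assumption).
    Sample variances use the N_w - 1 denominator.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w)
    x1, x0 = x[w == 1], x[w == 0]

    diff = x1.mean() - x0.mean()
    s2_1 = x1.var(ddof=1)
    s2_0 = x0.var(ddof=1)

    nor_dif = diff / np.sqrt(s2_0 + s2_1)
    t_stat = diff / np.sqrt(s2_0 / x0.size + s2_1 / x1.size)
    return nor_dif, t_stat
```

The normalized difference does not scale with the group sizes, while the t-statistic does, which is why the former is the more useful diagnostic for overlap.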

Assessing Overlap by Inspecting the Propensity Score Distribution

The second method for assessing overlap is more directly focused on the overlap assumption. It involves inspecting the marginal distribution of the propensity score in both treatment groups. Any difference in covariate distribution shows up in differences in the average propensity score between the two groups. Moreover, any area of non-overlap shows up in zero or one values for the propensity score.

2.II Selecting a Subsample with Overlap: Matching

Appropriate when the focus is on the average effect for the treated, $E[Y_i(1) - Y_i(0) \mid W_i = 1]$, and when there is a relatively large pool of potential controls. Order the treated units by estimated propensity score, highest first. Match the treated unit with the highest propensity score to the closest control on the estimated propensity score, without replacement, and continue down the ordering. This is done only to create a balanced sample, not as the final analysis.

2.III Selecting a Subsample with Overlap: Trimming

Define average effects for subsamples $A$:
\[ \tau(A) = \sum_{i=1}^{N} 1\{X_i \in A\} \, \tau(X_i) \Big/ \sum_{i=1}^{N} 1\{X_i \in A\}. \]
The efficiency bound for $\tau(A)$, assuming homoskedasticity, is
\[ \frac{\sigma^2}{q(A)} \, E\left[ \frac{1}{e(X)} + \frac{1}{1 - e(X)} \,\Big|\, X \in A \right], \]
where $q(A) = \Pr(X \in A)$. Crump, Hotz, Imbens and Mitnik derive the characterization for the set $A$ that minimizes the asymptotic variance. The optimal set has the form
\[ A^* = \{ x \in \mathbb{X} \mid \alpha \le e(x) \le 1 - \alpha \}, \]
dropping observations with extreme values for the propensity score, with the cutoff value $\alpha$ determined by the equation
\[ \frac{1}{\alpha (1 - \alpha)} = 2 \, E\left[ \frac{1}{e(X)(1 - e(X))} \,\Big|\, \frac{1}{e(X)(1 - e(X))} \le \frac{1}{\alpha (1 - \alpha)} \right]. \]
Note that this subsample is selected solely on the basis of the joint distribution of the treatment indicators and the covariates, and therefore does not introduce biases associated with selection based on the outcomes. Calculations for Beta distributions for the propensity score suggest that $\alpha = 0.1$ approximates the optimal set well in practice.
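The cutoff equation for $\alpha$ has a direct sample analog: replace the conditional expectation with an average over the estimated propensity scores. The sketch below is one way to solve that analog and is offered as an illustration under stated assumptions, not as the authors' implementation. The function name `optimal_trimming_cutoff` is hypothetical, and the scores are assumed to lie strictly between 0 and 1.

```python
import numpy as np


def optimal_trimming_cutoff(pscore):
    """Sample-analog solution of 1/(a(1-a)) = 2 E[g | g <= 1/(a(1-a))],
    with g = 1 / (e (1 - e)). Returns 0.0 when no trimming is called for.

    Assumes all propensity scores lie strictly inside (0, 1).
    """
    e = np.asarray(pscore, dtype=float)
    g = np.sort(1.0 / (e * (1.0 - e)))

    # Mean of the k smallest g values, for k = 1, ..., N.
    running_mean = np.cumsum(g) / np.arange(1, g.size + 1)

    # First position where g is at least twice the running mean.
    crossing = np.nonzero(g >= 2.0 * running_mean)[0]
    if crossing.size == 0:
        return 0.0  # largest g is below twice the mean: keep every unit

    k = crossing[0]
    gamma = 2.0 * running_mean[k - 1]  # solves gamma = 2 * mean(g | g <= gamma)
    return 0.5 - np.sqrt(0.25 - 1.0 / gamma)
```

Given the returned value `alpha`, the trimmed sample keeps the units with `alpha <= pscore <= 1 - alpha`. Because only the propensity scores enter, the selection does not depend on outcomes.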

3. Applic. to Lalonde Data (Dehejia-Wahba Sample)

Data on a job training program, first used by Lalonde (1986). See also Heckman and Hotz (1989) and Dehejia and Wahba (1999).

Small experimental evaluation: 185 trainees and 260 controls, a group of individuals very disadvantaged in the labor market.

Large non-experimental comparison group from the CPS (15,992 observations), very different in its distribution of covariates.

How well do the non-experimental results replicate the experimental ones? Is the non-experimental analysis credible? Would we have known whether it was credible without the experimental results?

Table 1: Summary Statistics for Lalonde Data
Columns: Trainees (N=185), mean and (s.d.); Controls (N=260), mean, (s.d.) and n-dif; CPS (N=15,992), mean, (s.d.) and n-dif.
Rows: Black, Hispanic, Age, Married, No Deg, Educ, Earn '74, U '74, Earn '75, U '75.
(Numeric entries are not recoverable from the source.)

Figures: histograms of the propensity score for controls and for trainees, shown separately for the experimental sample, the full CPS sample, and the matched CPS sample.

The experimental data set is well balanced. The difference in averages between treatment and control group is never more than . standard deviations. In contrast, with the CPS comparison group the differences between the averages are up to . standard deviations from zero, suggesting there will be serious issues in obtaining credible estimates of the average effect of the treatment.

Next, let us assess unconfoundedness in this sample using earnings in 1975 as the pseudo outcome. We report results for nine different estimators, including the simple difference, parallel and separate least squares regressions, weighting and blocking on the propensity score, and matching, with the last three also combined with regression. Results are reported both for the experimental control group and for the CPS comparison group. The specification for the propensity score and the block choice are based on an algorithm; see the notes for details.

Table 2: Estimates for Lalonde Data with Earnings '75 as Outcome
Columns: Experimental Controls, est, (s.e.) and t-stat; CPS Comparison Group, est, (s.e.) and t-stat.
Rows: Simple Dif, OLS (parallel), OLS (separate), Weighting, Blocking, Matching, Weight and Regr, Block and Regr, Match and Regr.
(Numeric entries are not recoverable from the source.)

With the CPS comparison group, results are discouraging. We consistently find big effects on earnings in 1975, with point estimates varying widely. The sensitivity is not surprising given the substantial differences in covariate distributions.
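Of the estimators listed, blocking on the propensity score is perhaps the easiest to write down: split the sample into blocks by estimated propensity score, take the difference in mean outcomes within each block, and average across blocks. The sketch below is a simplified illustration. It takes the block boundaries as a user-supplied argument (`edges`), which is an assumption; it does not reproduce the block-selection algorithm referred to in the notes, and the function name `blocking_estimate` is hypothetical.

```python
import numpy as np


def blocking_estimate(y, w, pscore, edges, target="ate"):
    """Blocking (subclassification) estimator on the propensity score.

    edges: interior block boundaries, supplied by the user (an assumption).
    target: "ate" weights blocks by size, "att" by number of treated units.
    Raises if a block has no treated or no control units.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=int)
    e = np.asarray(pscore, dtype=float)

    block = np.digitize(e, edges)  # block label for each unit
    total, weight_sum = 0.0, 0.0
    for b in np.unique(block):
        in_block = block == b
        treated = in_block & (w == 1)
        control = in_block & (w == 0)
        if not treated.any() or not control.any():
            raise ValueError(f"block {b} has no overlap")
        diff = y[treated].mean() - y[control].mean()
        weight = treated.sum() if target == "att" else in_block.sum()
        total += weight * diff
        weight_sum += weight
    return total / weight_sum
```

Applied to a pseudo outcome such as lagged earnings, an estimate far from zero is evidence against unconfoundedness. Raising an error for blocks without both groups is a deliberate choice here, since such blocks signal a lack of overlap rather than something to average over.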

Next, create a matched sample to improve balance. Order the treated observations on the estimated propensity score. Starting with the highest propensity score, match each treated observation to the closest control, without replacement. Match on the propensity score.

Table 3: Summary Statistics for Matched CPS Sample
Columns: Trainees (N=185), mean and (s.d.); Controls (N=185), mean and (s.d.); nor-dif.
Rows: Black, Hispanic, Age, Married, No Degree, Education, Earnings '74, Unempl '74, Earnings '75, Unempl '75.
(Numeric entries are not recoverable from the source.)

In the matched sample the normalized differences are comparable to those in the experimental sample. Now we revisit the analysis on earnings in 1975, and also carry out the analysis on earnings in 1978 (the actual outcome).

Table 4: Estimates on Selected CPS Lalonde Data
Columns: Earn '75 Outcome, est, (s.e.) and t-stat; Earn '78 Outcome, est, (s.e.) and t-stat.
Rows: Simple Dif, OLS (parallel), OLS (separate), Weighting, Blocking, Matching, Weight and Regr, Block and Regr, Match and Regr.
(Numeric entries are not recoverable from the source.)
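The matching step used to build the balanced sample is a greedy nearest-neighbor match on the estimated propensity score, without replacement, processing treated units from the highest score down. A minimal sketch follows; the function name `greedy_pscore_match` is hypothetical, and the propensity scores are assumed to have been estimated already.

```python
import numpy as np


def greedy_pscore_match(ps_treated, ps_control):
    """Match each treated unit to the closest unused control on the
    propensity score, handling treated units in descending score order.

    Returns an integer array m where m[i] is the index (into ps_control)
    of the control matched to treated unit i.
    """
    ps_t = np.asarray(ps_treated, dtype=float)
    ps_c = np.asarray(ps_control, dtype=float)
    if ps_c.size < ps_t.size:
        raise ValueError("need at least as many controls as treated units")

    available = np.ones(ps_c.size, dtype=bool)
    matches = np.empty(ps_t.size, dtype=int)
    for i in np.argsort(-ps_t):           # highest propensity score first
        dist = np.abs(ps_c - ps_t[i])
        dist[~available] = np.inf         # controls already used are excluded
        j = int(np.argmin(dist))
        matches[i] = j
        available[j] = False
    return matches
```

The matched sample consists of all treated units plus the controls indexed by the returned array. It serves only to improve balance; the estimators are then applied to this subsample.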

Now results are consistently small and statistically insignificant for earnings in 1975, so unconfoundedness seems reasonable and the analyses are potentially credible. Estimates for earnings in 1978 are robust across all nine estimators, with the exception of the simple difference in average outcomes by treatment status. Estimates are consistent with the experimental estimates (.).

Conclusion

It is important to assess and address lack of overlap. In reasonably balanced samples the choice of estimator is less important. Combining regression with matching or propensity score blocking is the preferred method because of its robustness properties.