Introduction to Propensity Score Matching: A Review and Illustration


Introduction to Propensity Score Matching: A Review and Illustration
Shenyang Guo, Ph.D.
School of Social Work, University of North Carolina at Chapel Hill
January 28, 2005
For a workshop conducted at the School of Social Work, University of Illinois at Urbana-Champaign.
NSCAW data used to illustrate PSM were collected under funding by the Administration on Children, Youth, and Families of the U.S. Department of Health and Human Services. Findings do not represent the official position or policies of the U.S. DHHS. PSM analyses were funded by the Robert Wood Johnson Foundation Substance Abuse Policy Research Program and by the Children's Bureau's research grant. Results are preliminary and not quotable. Contact information: sguo@email.unc.edu

Outline
Day 1
- Overview: Why PSM?
- History and development of PSM
- Counterfactual framework
- The fundamental assumption
- General procedure
- Software packages
- Review & illustration of the basic methods developed by Rosenbaum and Rubin

Outline (continued)
- Review and illustration of Heckman's difference-in-differences method
- Problems with Rosenbaum & Rubin's method
- Difference-in-differences method
- Nonparametric regression
- Bootstrapping
Day 2
- Practical issues, concerns, and strategies
- Questions and discussions

PSM References Check website: http://sswnt5.sowo.unc.edu/vrc/lectures/index.htm (Link to file Day1b.doc )

Why PSM? (1) Need 1: Analyze causal effects of treatment from observational data. Observational data: data that are not generated by mechanisms of randomized experiments, such as surveys, administrative records, and census data. To analyze such data, an ordinary least squares (OLS) regression model using a dichotomous indicator of treatment does not work, because in such a model the error term is correlated with the explanatory variable.

Why PSM? (2) Y_i = α + τW_i + X_i'β + ε_i. The independent variable W_i is usually correlated with the error term ε_i. The consequence is an inconsistent and biased estimate of the treatment effect τ.

Why PSM? (3) Need 2: Removing selection bias in program evaluation. Fisher's randomization idea. Can social and behavioral research really accomplish randomized assignment of treatment? Consider E(Y1|W=1) − E(Y0|W=0). Adding and subtracting E(Y0|W=1), we have {E(Y1|W=1) − E(Y0|W=1)} + {E(Y0|W=1) − E(Y0|W=0)}. Crucial: the second term, E(Y0|W=1) − E(Y0|W=0), is the selection bias. The debate among education researchers: the impact of Catholic schools vis-à-vis public schools on learning. The Catholic school effect is strongest among those Catholic students who are less likely to attend Catholic schools (Morgan, 2001).

Why PSM? (4) Heckman & Smith (1995) Four Important Questions: What are the effects of factors such as subsidies, advertising, local labor markets, family income, race, and sex on program application decision? What are the effects of bureaucratic performance standards, local labor markets and individual characteristics on administrative decisions to accept applicants and place them in specific programs? What are the effects of family background, subsidies and local market conditions on decisions to drop out from a program and on the length of time taken to complete a program? What are the costs of various alternative treatments?

History and Development of PSM The landmark paper: Rosenbaum & Rubin (1983). Heckman's early work in the late 1970s on selection bias, and his closely related work on dummy endogenous variables (Heckman, 1978), address the same issue of estimating treatment effects when assignment is nonrandom. Heckman's work on the dummy endogenous variable problem and the selection model can be understood as a generalization of the propensity-score approach (Winship & Morgan, 1999). In the 1990s, Heckman and his colleagues developed the difference-in-differences approach, a significant contribution to PSM. In economics, the DID approach and its related techniques are more generally called nonexperimental evaluation, or the econometrics of matching.

The Counterfactual Framework Counterfactual: what would have happened to the treated subjects, had they not received treatment? The key assumption of the counterfactual framework is that individuals selected into treatment and nontreatment groups have potential outcomes in both states: the one in which they are observed and the one in which they are not observed (Winship & Morgan, 1999). For the treated group, we have the observed mean outcome under the condition of treatment, E(Y1|W=1), and the unobserved mean outcome under the condition of nontreatment, E(Y0|W=1). Similarly, for the nontreated group we have both the observed mean E(Y0|W=0) and the unobserved mean E(Y1|W=0).

The Counterfactual Framework (Continued) Under this framework, an evaluation of E(Y1|W=1) − E(Y0|W=0) can be thought of as an effort that uses E(Y0|W=0) to estimate the counterfactual E(Y0|W=1). The central interest of the evaluation is not in E(Y0|W=0), but in E(Y0|W=1). The real debate about the classical experimental approach centers on the question: does E(Y0|W=0) really represent E(Y0|W=1)?

Fundamental Assumption Rosenbaum & Rubin (1983): (Y0, Y1) ⊥ W | X. Different versions: unconfoundedness and ignorable treatment assignment (Rosenbaum & Rubin, 1983), selection on observables (Barnow, Cain, & Goldberger, 1980), conditional independence (Lechner, 1999, 2002), and exogeneity (Imbens, 2004).

General Procedure
Step 1. Run logistic regression:
- Dependent variable: W = 1 if participate; W = 0 otherwise.
- Choose appropriate conditioning (instrumental) variables.
- Obtain propensity score: predicted probability (p) or log[(1−p)/p].
Step 2. Either:
- 1-to-1 or 1-to-n match: nearest neighbor matching, caliper matching, Mahalanobis metric matching, or Mahalanobis with propensity score added; or
- stratification (subclassification); or
- kernel or local linear weight match, then estimate difference-in-differences (Heckman).
Step 3. (After 1-to-1 or 1-to-n matching) Multivariate analysis based on the new sample.
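Step 1 above can be sketched in a few lines of numpy. This is a toy illustration, not the Stata/SAS/SPSS routines the slides use; the gradient-ascent fit, learning rate, and iteration count are illustrative choices.

```python
import numpy as np

def propensity_scores(X, w, iters=200, lr=0.1):
    """Fit a logistic regression of the treatment indicator w on the
    covariates X by simple gradient ascent on the log-likelihood, and
    return the predicted probabilities (propensity scores).
    A minimal sketch; a real analysis would use a packaged logit routine."""
    X1 = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))   # current fitted probabilities
        beta += lr * X1.T @ (w - p) / len(w)     # gradient of the log-likelihood
    return 1.0 / (1.0 + np.exp(-(X1 @ beta)))
```

The returned probabilities can then feed any of the Step-2 branches; the slide's alternative scale log[(1−p)/p] is a monotone transform of the same scores.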

Nearest Neighbor and Caliper Matching
Nearest neighbor: C(P_i) = min_j |P_i − P_j|, j ∈ I_0. The nonparticipant with the value of P_j that is closest to P_i is selected as the match.
Caliper: a variation of nearest neighbor: a match for person i is selected only if |P_i − P_j| < ε, j ∈ I_0, where ε is a pre-specified tolerance. Recommended caliper size: 0.25σ_P.
- 1-to-1 nearest neighbor within caliper (this is a common practice)
- 1-to-n nearest neighbor within caliper
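The caliper rule can be sketched as a greedy 1-to-1 procedure without replacement. This is an illustration of the textbook rule only, not PSMATCH2's implementation; the treated-unit ordering and tie handling are simplifying assumptions.

```python
def caliper_match(p_treat, p_control, caliper):
    """1-to-1 nearest-neighbor matching on the propensity score within a
    caliper, without replacement. Returns (i, j) index pairs; treated
    units with no control inside the caliper are dropped."""
    available = set(range(len(p_control)))
    pairs = []
    for i, pi in enumerate(p_treat):
        if not available:
            break
        # nearest remaining control on the propensity score
        j = min(available, key=lambda j: abs(pi - p_control[j]))
        if abs(pi - p_control[j]) < caliper:   # caliper rule |Pi - Pj| < eps
            pairs.append((i, j))
            available.remove(j)
    return pairs
```

Following the slide's recommendation, caliper would typically be set to 0.25 times the standard deviation of the estimated propensity scores.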

Mahalanobis Metric Matching (with or without replacement)
Mahalanobis without p-score: randomly order subjects, then calculate the distance between the first participant and all nonparticipants. The distance d(i, j) can be defined by the Mahalanobis distance:
d(i, j) = (u − v)^T C^{−1} (u − v),
where u and v are the values of the matching variables for participant i and nonparticipant j, and C is the sample covariance matrix of the matching variables from the full set of nonparticipants.
Mahalanobis metric matching with p-score added (to u and v).
Nearest available Mahalanobis metric matching within calipers defined by the propensity score (needs your own programming).
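The distance formula above can be written directly in numpy. A minimal sketch (function names are illustrative):

```python
import numpy as np

def mahalanobis_dist(u, v, C_inv):
    """Mahalanobis distance d(i,j) = (u-v)' C^{-1} (u-v) between the
    matching-variable vectors of a participant and a nonparticipant,
    in the (squared) form used on the slide."""
    d = np.asarray(u, float) - np.asarray(v, float)
    return float(d @ C_inv @ d)

def match_one(u, V, C):
    """Index of the nonparticipant row in V closest to participant u,
    where C is the nonparticipants' covariance matrix."""
    C_inv = np.linalg.inv(C)
    return int(np.argmin([mahalanobis_dist(u, v, C_inv) for v in V]))
```

With C equal to the identity matrix, the distance reduces to squared Euclidean distance, which makes the role of C as a rescaling of the matching variables easy to see.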

Stratification (Subclassification)
Matching and bivariate analysis are combined into one procedure (no step-3 multivariate analysis):
- Group the sample into five categories based on propensity score (quintiles).
- Within each quintile, calculate the mean outcome for the treated and nontreated groups.
- Estimate the mean difference (average treatment effect) for the whole sample (i.e., all five groups) and its variance using:
δ = Σ_{k=1}^{K} (n_k / N) [Ȳ_{1k} − Ȳ_{0k}],  Var(δ) = Σ_{k=1}^{K} (n_k / N)² Var[Ȳ_{1k} − Ȳ_{0k}]
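The point estimate in the formula above can be sketched as follows. This assumes every stratum contains both treated and control units (real implementations must check this), and the stratum construction by quantiles is a simplification:

```python
import numpy as np

def stratified_ate(y, w, p, k=5):
    """Subclassification estimate of the average treatment effect:
    split the sample into k propensity-score quantile strata, take the
    treated-minus-control mean outcome difference in each stratum, and
    combine with weights n_k / N as in the slide's formula."""
    y, w, p = map(np.asarray, (y, w, p))
    edges = np.quantile(p, np.linspace(0, 1, k + 1))
    # assign each unit to a stratum 0..k-1 by its propensity score
    strata = np.clip(np.searchsorted(edges, p, side="right") - 1, 0, k - 1)
    N, delta = len(y), 0.0
    for s in range(k):
        m = strata == s
        diff = y[m & (w == 1)].mean() - y[m & (w == 0)].mean()
        delta += (m.sum() / N) * diff      # weight by stratum share n_k / N
    return delta
```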

Multivariate Analysis at Step-3 We could perform any kind of multivariate analysis we originally wished to perform on the unmatched data. These analyses may include: multiple regression generalized linear model survival analysis structural equation modeling with multiple-group comparison, and hierarchical linear modeling (HLM) As usual, we use a dichotomous variable indicating treatment versus control in these models.

Very Useful Tutorial for Rosenbaum & Rubin's Matching Methods D'Agostino, R.B. (1998). Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine, 17, 2265-2281.

Software Packages There is currently no commercial software package that offers a formal procedure for PSM. In SAS, Lori Parsons developed several macros (e.g., the GREEDY macro does nearest neighbor within caliper matching). In SPSS, Dr. John Painter of the Jordan Institute developed an SPSS macro to do similar work to GREEDY (http://sswnt5.sowo.unc.edu/vrc/lectures/index.htm). We have investigated several computing packages and found that PSMATCH2 (developed by Edwin Leuven and Barbara Sianesi [2003] as a user-supplied routine in Stata) is the most comprehensive package: it allows users to fulfill most tasks for propensity score matching, and the routine is being continuously improved and updated.

Demonstration of Running STATA/PSMATCH2: Part 1. Rosenbaum & Rubin s Methods (Link to file Day1c.doc )

Problems with the Conventional (Prior to Heckman's DID) Approaches
- Equal weight is given to each nonparticipant within the caliper when constructing the counterfactual mean.
- Loss of sample cases due to 1-to-1 matching. What does the resample represent? External validity.
- It's a dilemma between inexact matching and incomplete matching: while trying to maximize exact matches, cases may be excluded due to incomplete matching; while trying to maximize cases, inexact matching may result.

Heckman's Difference-in-Differences Matching Estimator (1)
Applies when each participant matches to multiple nonparticipants.
KDM = (1/n_1) Σ_{i ∈ I_1 ∩ S_P} { (Y_{1ti} − Y_{0t'i}) − Σ_{j ∈ I_0 ∩ S_P} W(i, j)(Y_{0tj} − Y_{0t'j}) }
where n_1 is the total number of participants; I_1 ∩ S_P indexes participants i in the set of common support; I_0 ∩ S_P indexes the multiple nonparticipants in the common-support set matched to i; W(i, j) is the weight (see the following slides); (Y_{1ti} − Y_{0t'i}) is the before-after difference for participant i; and subtracting the weighted nonparticipant differences (Y_{0tj} − Y_{0t'j}) gives the "difference in differences."
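Given before-after outcome changes and a precomputed weight matrix, the estimator above is a one-liner. A sketch that assumes all units are already inside the common-support region and that the weights have been computed separately:

```python
import numpy as np

def did_matching(dY1, dY0, W):
    """Difference-in-differences matching estimate.
    dY1[i]  : before-after outcome change for treated unit i (Y_1t - Y_0t')
    dY0[j]  : before-after outcome change for comparison unit j
    W[i, j] : matching weights, each row summing to 1 over matched comparisons
    Averages, over treated units, each unit's own change minus the weighted
    mean change of its matched comparisons."""
    dY1, dY0, W = map(np.asarray, (dY1, dY0, W))
    return float(np.mean(dY1 - W @ dY0))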

Heckman's Difference-in-Differences Matching Estimator (2)
Weights W(i, j) (based on the distance between i and j) can be determined by using one of two methods:
1. Kernel matching:
W(i, j) = G((P_j − P_i)/a_n) / Σ_{k ∈ I_0} G((P_k − P_i)/a_n),
where G(.) is a kernel function and a_n is a bandwidth parameter.
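The kernel weights can be sketched directly from the formula; the Gaussian kernel and bandwidth used here are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def kernel_weights(p_i, p_controls, a_n, G=lambda z: np.exp(-z**2 / 2)):
    """Kernel-matching weights for treated unit i over the controls:
    W(i,j) = G((Pj - Pi)/a_n) / sum_k G((Pk - Pi)/a_n)."""
    g = G((np.asarray(p_controls) - p_i) / a_n)
    return g / g.sum()
```

By construction the weights sum to 1 for each treated unit, and controls with propensity scores closer to P_i receive larger weight.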

Heckman's Difference-in-Differences Matching Estimator (3)
2. Local linear weighting function (lowess):
W(i, j) = [ G_{ij} Σ_{k ∈ I_0} G_{ik}(P_k − P_i)² − G_{ij}(P_j − P_i) Σ_{k ∈ I_0} G_{ik}(P_k − P_i) ] / [ Σ_{j ∈ I_0} G_{ij} Σ_{k ∈ I_0} G_{ik}(P_k − P_i)² − ( Σ_{k ∈ I_0} G_{ik}(P_k − P_i) )² ],
where G_{ij} = G((P_j − P_i)/a_n).

A Review of Nonparametric Regression (Curve Smoothing Estimators) I am grateful to John Fox, author of the two Sage (green-cover) books on nonparametric regression (2000), for providing the R code used to produce the illustrative example.

Why Nonparametric? Why Doesn't Parametric Regression Work?
[Figure: scatterplot of female life expectancy (40-80 years) against GDP per capita.]

The Task: Determining the Y-value for a Focal Point x(120)
[Figure: the scatterplot with the focal point x(120), the 120th ordered x: Saint Lucia, x = 3183, y = 74.8. The window, called the span, contains .5n = 95 observations; GDP per capita runs from 0 to 40000.]

Weights within the Span Can Be Determined by the Tricube Kernel Function
z_i = (x_i − x_0)/h
K_T(z) = (1 − |z|³)³ for |z| < 1; 0 for |z| ≥ 1
[Figure: tricube kernel weights (0 to 1) plotted against GDP per capita.]

The Y-value at Focal x(120) Is a Weighted Mean
Weighted mean = 71.11301

Country        Life Exp.  GDP    z       Weight
Poland         75.7       3058   1.3158  0
Lebanon        71.7       3114   0.7263  0.23
Saint Lucia    74.8       3183   0       1.00
South Africa   68.3       3230   0.4947  0.68
Slovakia       75.8       3266   0.8737  0.04
Venezuela      75.7       3496   3.2947  0
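The two slides above, the tricube kernel and the weighted mean at a focal point, can be sketched in a few lines (pure Python; function names are illustrative):

```python
def tricube(z):
    """Tricube kernel K_T(z) = (1 - |z|^3)^3 for |z| < 1, else 0."""
    a = abs(z)
    return (1 - a**3)**3 if a < 1 else 0.0

def local_weighted_mean(x0, x, y, h):
    """Kernel-weighted mean of y at the focal point x0 with half-window h,
    mirroring the slide's z_i = (x_i - x_0)/h construction."""
    w = [tricube((xi - x0) / h) for xi in x]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

Observations at the focal point get weight 1, observations at or beyond the edge of the window get weight 0, matching the table's pattern of weights.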

The Nonparametric Regression Line Connects All 190 Averaged Y-Values
[Figure: nonparametric regression line through the scatterplot of female life expectancy (40-80 years) against GDP per capita.]

Review of Kernel Functions
- Tricube is the default kernel in popular packages.
- Gaussian normal kernel: K_N(z) = (1/√(2π)) e^{−z²/2}
- Epanechnikov kernel: parabolic shape with support [−1, 1], but the kernel is not differentiable at z = ±1.
- Rectangular kernel (a crude method).

Local Linear Regression (Also Known as lowess or loess)
A more sophisticated way to calculate the Y-values. Instead of constructing a weighted average, it aims to fit a smooth local linear regression with estimated β_0 and β_1 that minimize:
Σ_{i=1}^{n} [Y_i − β_0 − β_1(x_i − x_0)]² K((x_i − x_0)/h),
where K(.) is a kernel function, typically tricube.
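The minimization above is an ordinary weighted least-squares problem at each focal point, so it can be solved with the weighted normal equations. A sketch that returns the fitted value β_0 at x_0 (tricube kernel assumed, as on the slide):

```python
import numpy as np

def local_linear_fit(x0, x, y, h):
    """Minimize sum_i K((x_i - x_0)/h) [y_i - b0 - b1 (x_i - x_0)]^2
    and return the fitted value b0 at the focal point x0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = (x - x0) / h
    k = np.where(np.abs(z) < 1, (1 - np.abs(z)**3)**3, 0.0)  # tricube weights
    X = np.column_stack([np.ones_like(x), x - x0])           # [1, x - x0]
    # weighted normal equations: (X' K X) b = X' K y
    b = np.linalg.solve(X.T @ (k[:, None] * X), X.T @ (k * y))
    return float(b[0])
```

On exactly linear data, the local fit reproduces the line, which is the sense in which lowess improves on the flat weighted average at the boundaries.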

The Local Average Is Now Predicted by a Regression Line, Instead of a Line Parallel to the X-axis
[Figure: fitted value Ŷ(120) at the focal point, read off the local regression line over GDP per capita.]

Asymptotic Properties of lowess Fan (1992, 1993) demonstrated the advantages of lowess over more standard kernel estimators: he proved that lowess has nice sampling properties and high minimax efficiency. In Heckman's work prior to 1997, he and his coauthors used kernel weights, but since 1997 they have used lowess. In practice it's fairly complicated to program the asymptotic properties, and no software package provides an estimate of the standard error for lowess; in practice, one uses standard errors estimated by bootstrapping.

Bootstrap Statistical Inference (1) The bootstrap allows the user to make inferences without strong distributional assumptions and without analytic formulas for the parameters of the sampling distribution. Basic idea: treat the sample as if it were the population, and apply Monte Carlo sampling to generate an empirical estimate of the statistic's sampling distribution. This is done by drawing a large number of resamples of size n from the original sample, randomly and with replacement. A closely related idea is the jackknife ("drop one out"): it systematically drops subsets of the data one at a time and assesses the variation in the sampling distribution of the statistics of interest.

Bootstrap Statistical Inference (2) After obtaining the estimated standard error (i.e., the standard deviation of the sampling distribution), one can calculate a 95% confidence interval using one of the following three methods: the normal approximation method, the percentile method, or the bias-corrected (BC) method. The BC method is popular.
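The resampling scheme above can be sketched for an arbitrary statistic. This implements the percentile method only (the slides also mention normal-approximation and bias-corrected variants); the number of replications and seed are illustrative:

```python
import numpy as np

def bootstrap_ci(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error and 95% percentile confidence interval:
    resample n observations with replacement, recompute the statistic
    each time, and read the SE and percentiles off the empirical
    sampling distribution."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    reps = np.array([stat(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)])
    se = reps.std(ddof=1)                     # bootstrap standard error
    lo, hi = np.percentile(reps, [2.5, 97.5]) # percentile-method 95% CI
    return se, (lo, hi)
```

For a matching estimator, `stat` would re-run the entire matching procedure on each resample, which is what makes bootstrapped standard errors for lowess-based matching computationally expensive.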

Finite-Sample Properties of lowess The finite-sample properties of lowess have been examined only recently (Frölich, 2004). Two practical implications: 1. Choose an optimal bandwidth value. 2. Trimming (i.e., discarding the nonparametric regression results in regions where the propensity scores for the nontreated cases are sparse) may not be the best response to the variance problem; conduct sensitivity analyses testing different trimming schemes.

Heckman's Contributions to PSM Unlike traditional matching, DID uses propensity scores differentially to calculate a weighted mean of counterfactuals: a creative way to use information from multiple matches. DID uses longitudinal data (i.e., outcomes before and after the intervention). By doing this, the estimator is more robust: it eliminates time-invariant sources of bias that may arise when program participants and nonparticipants are geographically mismatched, or from differences in survey questionnaires.

Demonstration of Running STATA/PSMATCH2: Part 2. Heckman s Difference-in-differences Method (Link to file Day1c.doc )