PROPENSITY SCORE MATCHING. Walter Leite

EXAMPLE
Question: Does having a job that provides or subsidizes child care increase the length of time that working mothers breastfeed their children?
Treatment: Working for a company that provides or subsidizes child care.
Outcome: Age of the child, in weeks, when breastfeeding ended.
Data source: National Longitudinal Survey of Youth 1979 (NLSY79) and the NLSY79 Children and Youth.
Sample size: Child care was provided or subsidized in 107 (8.85%) of 1209 cases.

ESTIMATION OF PROPENSITY SCORES FOR EXAMPLE
31 covariates were selected. Examples: benefits provided by the mother's current job (i.e., life insurance, dental insurance, profit sharing, retirement, training opportunities), the mother's education level, hours worked per week, employment sector, family size, amount of public assistance received by the family, and whether a cesarean section was performed.
Estimation was performed using logistic regression with the glm function in base R.

PROPENSITY SCORES FOR MATCHING
It is advantageous to match on the linear propensity score (i.e., the logit of the propensity score) rather than the propensity score itself, because the logit avoids compression around zero and one.
$$\operatorname{logit}(e(x)) = \log\left(\frac{e(x)}{1 - e(x)}\right)$$
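
To illustrate the compression argument, here is a minimal Python sketch of the logit transformation. The propensity score values are hypothetical and chosen only to show how scores that sit close together near zero or one are spread apart on the logit scale.

```python
import math

def logit(p):
    """Return the linear propensity score log(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Hypothetical propensity scores near the boundaries and in the middle.
scores = [0.01, 0.02, 0.50, 0.98, 0.99]
linear_scores = [logit(p) for p in scores]

for p, lp in zip(scores, linear_scores):
    print(f"e(x) = {p:.2f}  logit = {lp:+.3f}")
```

The gap between 0.01 and 0.02 is tiny on the probability scale but clearly visible on the logit scale, which is why distances computed on the logit tend to be more informative near the boundaries.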

[Figure: common support region. Distributions of the predicted probability for participants and nonparticipants, marking the range of matched cases and the cases excluded.]

COMMON SUPPORT REQUIREMENTS
Matching to estimate the ATT only requires that the distribution of propensity scores for the treated is contained within the distribution of the untreated.
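
A rough way to check this requirement is to compare each treated score with the range of untreated scores. The sketch below assumes that a simple minimum/maximum rule is an acceptable stand-in for "contained within the distribution"; the function name and the scores are hypothetical.

```python
def att_common_support(treated_scores, untreated_scores):
    """Split treated scores into those inside and outside the untreated range."""
    low, high = min(untreated_scores), max(untreated_scores)
    inside = [p for p in treated_scores if low <= p <= high]
    outside = [p for p in treated_scores if p < low or p > high]
    return inside, outside

# Hypothetical propensity scores.
treated = [0.15, 0.32, 0.48, 0.91]
untreated = [0.03, 0.10, 0.22, 0.35, 0.60]

kept, excluded = att_common_support(treated, untreated)
print("within common support:", kept)
print("excluded:", excluded)
```

A range check ignores sparse regions inside the range, so in practice it is usually paired with a visual comparison of the two distributions.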

MATCHING METHODS TAXONOMY
Replacement:
  Matching with replacement
  Matching without replacement
Algorithms:
  Greedy matching
  Optimal matching
  Genetic matching
Ratio:
  Pair matching (1 to 1)
  Fixed ratio (1 to k)
  Variable ratio
  Full

GREEDY MATCHING
For each individual in the treated sample, select the best available match without accounting for the quality of the match across the entire treated sample.
Greedy matching has the advantage of allowing any analysis to estimate the treatment effect after matching.
Types of greedy matching:
  Nearest neighbor propensity score matching
  Nearest neighbor propensity score matching within a caliper
  Mahalanobis metric matching
  Mahalanobis metric matching within a propensity score caliper

LIMITATIONS OF GREEDY MATCHING
While trying to maximize exact matches (i.e., within the common support region or within a caliper), cases may be excluded due to incomplete matching (no available matches).
While trying to maximize cases (i.e., widening the region), inexact matching may result.
To prevent inexact or incomplete matching when the number of untreated units is not much larger than the number of treated units, use matching with replacement.

NEAREST NEIGHBOR MATCHING
1. Randomly order the treated and untreated individuals.
2. Select the first treated individual i and find the untreated individual j with the closest propensity score.
3. If matching is without replacement, remove j from the pool.
4. Repeat the above process until matches are found for all participants.
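
The steps above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the implementation used by any R package: only the treated units are shuffled (the order of the untreated matters only for ties), and the function name, unit identifiers, and scores are hypothetical.

```python
import random

def nearest_neighbor_match(treated, untreated, replace=False, seed=0):
    """Greedy 1-to-1 nearest neighbor matching on a score.

    treated, untreated: dicts mapping unit id -> propensity score (or its logit).
    Returns a dict mapping each matched treated id to an untreated id.
    """
    rng = random.Random(seed)
    order = list(treated)
    rng.shuffle(order)                      # step 1: random order
    pool = dict(untreated)                  # copy so the input is not modified
    matches = {}
    for i in order:                         # step 2: take treated units in turn
        if not pool:
            break                           # no untreated units left to match
        j = min(pool, key=lambda k: abs(pool[k] - treated[i]))
        matches[i] = j
        if not replace:
            del pool[j]                     # step 3: remove j from the pool
    return matches

# Hypothetical scores.
treated = {"t1": 0.30, "t2": 0.62}
untreated = {"c1": 0.28, "c2": 0.65, "c3": 0.90}

matches = nearest_neighbor_match(treated, untreated)
print(matches)
```

Setting replace=True leaves every untreated unit in the pool, so the same control can serve as the match for several treated units.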

NEAREST NEIGHBOR WITHIN A CALIPER
Caliper: the maximum distance allowed between matched units, which defines a required common-support region for each match, usually expressed in standard deviation units.
Rosenbaum and Rubin (1985) suggest a caliper of 0.25 standard deviations.
Matching method: Within the caliper of each treated observation, select the untreated observation with the closest propensity score.
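
A minimal sketch of caliper matching follows. It assumes that the scores are logits of the propensity score and that the caliper is 0.25 times the standard deviation of all scores pooled together; both are assumptions made for illustration, as are the function name and the data. Treated units are processed in the order given rather than in random order.

```python
import statistics

def caliper_match(treated, untreated, caliper_sd=0.25):
    """Greedy nearest neighbor matching without replacement within a caliper.

    treated, untreated: dicts mapping unit id -> linear propensity score.
    The caliper width is caliper_sd times the standard deviation of all scores.
    """
    all_scores = list(treated.values()) + list(untreated.values())
    width = caliper_sd * statistics.stdev(all_scores)
    pool = dict(untreated)
    matches, unmatched = {}, []
    for i, score in treated.items():
        candidates = {j: abs(s - score) for j, s in pool.items()
                      if abs(s - score) <= width}
        if not candidates:
            unmatched.append(i)             # incomplete matching: nobody in range
            continue
        j = min(candidates, key=candidates.get)
        matches[i] = j
        del pool[j]                         # without replacement
    return matches, unmatched, width

# Hypothetical linear propensity scores.
treated = {"t1": -1.0, "t2": 0.2, "t3": 2.5}
untreated = {"c1": -1.1, "c2": 0.0, "c3": 0.3, "c4": -2.0}

matches, unmatched, width = caliper_match(treated, untreated)
print(matches, unmatched, round(width, 3))
```

The treated unit with the extreme score has no control inside its caliper and is left unmatched, which is the incomplete matching that the caliper trades for closer pairs.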

MAHALANOBIS DISTANCE
$$d(i, j) = (u - v)^{T} C^{-1} (u - v)$$
u and v are the values of the matching variables for participant i and nonparticipant j; C is the sample covariance matrix of the matching variables from the full set of nonparticipants.
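
As a worked illustration, the sketch below computes this distance in plain Python. It follows the slide's definition (no square root) and avoids forming the inverse of C explicitly by solving C x = (u - v) instead. The helper names and the covariate values are hypothetical.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def mahalanobis(u, v, cov):
    """d(i, j) = (u - v)^T C^{-1} (u - v), without a square root."""
    diff = [a - b for a, b in zip(u, v)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))

# Hypothetical covariate values for four nonparticipants (two covariates each).
nonparticipants = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
C = covariance_matrix(nonparticipants)

d = mahalanobis([2.0, 3.0], [1.0, 2.0], C)
print(round(d, 4))
```

With an identity covariance matrix the same function returns the squared Euclidean distance, which is a quick sanity check on the implementation.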

MAHALANOBIS METRIC MATCHING
1. Randomly order the sample.
2. Calculate the Mahalanobis distance between the first treated individual and all untreated individuals based on all covariates.
3. Choose the untreated individual j with the minimum distance d(i,j) as the match for treated individual i.
4. If matching is without replacement, remove j from the pool.
5. Repeat the above process until matches are found for all participants.

MAHALANOBIS METRIC MATCHING WITHIN PROPENSITY SCORE CALIPER
1. Randomly order the sample.
2. Calculate the Mahalanobis distance, based on all covariates except the propensity score, between the first treated individual and all untreated individuals within a propensity score caliper.
3. Choose the untreated individual j with the minimum distance d(i,j) as the match for treated individual i.
4. If matching is without replacement, remove j from the pool.
5. Repeat the above process until matches are found for all participants.

RESULTS OF SIMULATION STUDIES COMPARING DISTANCE MEASURES
Gu and Rosenbaum (1993):
  Propensity score matching performed better than Mahalanobis distance and Mahalanobis within propensity calipers when there were many covariates for use in the study.
Zhao (2004):
  Propensity score matching performed better than Mahalanobis metric matching in conditions with high correlations between covariates and the treatment participation indicator.
  Propensity score matching did not work well when the sample size used in the simulation was small.

OPTIMAL MATCHING
Optimal matching is a network flow optimization problem that can be solved by linear programming methods.
Matching is performed to minimize the total weighted sample distance (which is the minimum cost in the network).
Optimal matching will perform as well as or better than greedy matching with respect to minimum distance and balance.
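
The difference between the two algorithms is easiest to see on a tiny example. The sketch below does not use a network flow solver; it swaps in an exhaustive search over all assignments, which is only feasible for a handful of units but minimizes the same total distance. The function names and scores are hypothetical.

```python
from itertools import permutations

def greedy_pairs(treated, untreated):
    """Match each treated score, in the given order, to the closest unused control."""
    pool = list(range(len(untreated)))
    pairs = []
    for i, t in enumerate(treated):
        j = min(pool, key=lambda k: abs(untreated[k] - t))
        pairs.append((i, j))
        pool.remove(j)
    return pairs

def optimal_pairs(treated, untreated):
    """Exhaustive search for the pair matching with the smallest total distance."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(untreated)), len(treated)):
        cost = sum(abs(untreated[j] - t) for t, j in zip(treated, perm))
        if cost < best_cost:
            best, best_cost = list(enumerate(perm)), cost
    return best

def total_distance(pairs, treated, untreated):
    """Sum of absolute score differences over matched pairs."""
    return sum(abs(treated[i] - untreated[j]) for i, j in pairs)

# Hypothetical scores with as many controls as treated units.
treated = [0.50, 0.60]
untreated = [0.58, 0.30]

greedy = greedy_pairs(treated, untreated)
optimal = optimal_pairs(treated, untreated)
greedy_total = total_distance(greedy, treated, untreated)
optimal_total = total_distance(optimal, treated, untreated)
print(greedy, round(greedy_total, 2))
print(optimal, round(optimal_total, 2))
```

Greedy matching gives the first treated unit its closest control and leaves the second with a poor match; the exhaustive search accepts a slightly worse first pair to obtain a smaller total distance.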

SIMULATION STUDIES COMPARING MATCHING ALGORITHMS
Gu and Rosenbaum (1993):
  Optimal matching outperformed greedy matching except when there were a large number of control units available for matching to treated units.
  Optimal matching performed slightly better than greedy matching when comparing both matching methods based on the propensity distance.
Cepeda, Boston, Farrar and Strom (2003):
  Optimal matching produced a larger reduction in bias when using a variable number of control units compared to using a fixed number of control units.

NUMBER OF MATCHES: PAIR MATCHING
For ATT: Each treated unit is matched to a single control.
For ATE: Each treated unit is matched to a single control and each control to a single treated unit.

NUMBER OF MATCHES: 1 TO K
For ATT: Each treated unit is matched to K controls.
For ATE: Each treated unit is matched to K controls and each control to K treated units.

NUMBER OF MATCHES: VARIABLE RATIO OR FULL
Variable ratio matching: each treated unit is matched to one or more controls, and the number of controls may differ across treated units.
Full matching: each treated unit is matched to one or more controls and each control is matched to one or more treated units.
Full matching can be considered a form of stratification into the maximum number of strata that each contain at least one treated and one untreated unit.

SIMULATION STUDIES COMPARING NUMBER OF MATCHES
Gu and Rosenbaum (1993):
  Full matching performed better than 1 to k matching in terms of distance within matched sets and also produced greater balance, especially when the number of covariates was large.
Cepeda, Boston, Farrar and Strom (2003):
  Overall, optimal matching with a variable number of controls removes more bias than with a fixed number of controls.
Rosenbaum (1989) points out that optimal pair matching is not actually optimal, but optimal full matching is.

SIMULATION RESULTS ABOUT THE EFFECT OF THE NUMBER OF AVAILABLE CONTROLS
Gu and Rosenbaum (1993):
  When there were many controls available for every one that will be used in pair matching, there was little difference between optimal and greedy matching.
  When there is only one control available for matching to each treated unit, optimal matching is noticeably better than greedy matching.
Cepeda, Boston, Farrar and Strom (2003):
  Optimal matching with a fixed number and with a variable number of control units produced identical reductions in bias when the treated to control ratio was 1/5, but optimal matching with a variable number of controls performed better with ratios of 1/2, 1/3 and 1/4.
  Regardless of method, the reduction of bias with optimal matching decreases as the number of available controls decreases.

MATCHING WITH A GENETIC ALGORITHM
Minimizes a multivariate weighted distance d where the weights are chosen to maximize a measure of covariate balance (e.g., standardized mean difference).
The distance has two components:
  Weight matrix: if all weights are 1, d is the Mahalanobis distance.
  Cholesky decomposition of the sample covariance matrix S of the matching variables, which is a lower triangular matrix L where $S = LL^{T}$.

MATCHING METHODS USED FOR THE EXAMPLE
Matching method:
  One-to-one greedy with replacement and caliper
  Variable ratio greedy with replacement and caliper
  Variable ratio genetic with replacement (propensity score [PS] + covariates)
  Variable ratio genetic with replacement (PS only)
  One-to-one optimal without replacement
  Full matching

COVARIATE BALANCE FOR THE EXAMPLE
Matching method                                              Maximum standardized difference   Unbalanced covariates, count (%)
One-to-one greedy with replacement and caliper               0.21                              11 (26.1)
Variable ratio greedy with replacement and caliper           0.30                              4 (9.5)
Variable ratio genetic with replacement (PS + covariates)    0.23                              12 (28.6)
Variable ratio genetic with replacement (PS only)            0.13                              8 (19.0)
One-to-one optimal without replacement                       0.28                              11 (26.1)
Full matching                                                0.26                              4 (9.5)

SHOULD MATCHED DATA BE TREATED AS RELATED SAMPLES?
Schafer and Kang (2008) and Stuart (2010): matched samples should be treated as independent data because matching does not produce correlations between the outcomes of matched individuals.
Austin (2011): because the covariates, which have similar distributions in the treated and matched groups, are related to the outcomes, the distributions of outcomes will be more similar between treated and matched samples than between randomly selected samples.

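ABADIE AND IMBENS SIMPLE MATCHING ESTIMATOR
$$ATE = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i^{1} - \hat{Y}_i^{0} \right)$$
$$ATT = \frac{1}{n_1} \sum_{i \in T} \left( \hat{Y}_i^{1} - \hat{Y}_i^{0} \right)$$
$$\hat{Y}_i^{1} = \begin{cases} Y_i & \text{if } Z_i = 1 \\ \frac{1}{M} \sum_{j \in J_M(i)} Y_j & \text{if } Z_i = 0 \end{cases}$$
$$\hat{Y}_i^{0} = \begin{cases} Y_i & \text{if } Z_i = 0 \\ \frac{1}{M} \sum_{j \in J_M(i)} Y_j & \text{if } Z_i = 1 \end{cases}$$
Here T is the set of treated units, n_1 is the number of treated units, and J_M(i) is the set of M matches for unit i.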
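
The estimator can be sketched as follows: each unit keeps its observed outcome for the condition it received, and the missing potential outcome is imputed as the mean outcome of its M closest units in the opposite group. This sketch assumes matching with replacement on a single score and breaks ties by input order; the function names and data are hypothetical, and no standard errors are computed.

```python
def impute_from_matches(score, other_scores, other_outcomes, m):
    """Average outcome of the m units in the opposite group closest on the score."""
    order = sorted(range(len(other_scores)),
                   key=lambda j: abs(other_scores[j] - score))
    nearest = order[:m]
    return sum(other_outcomes[j] for j in nearest) / len(nearest)

def simple_matching_estimates(z, score, y, m=1):
    """Return (ATE, ATT) from the simple matching estimator with m matches per unit."""
    treated = [i for i in range(len(z)) if z[i] == 1]
    control = [i for i in range(len(z)) if z[i] == 0]
    y1_hat, y0_hat = [], []
    for i in range(len(z)):
        opposite = control if z[i] == 1 else treated
        imputed = impute_from_matches(score[i],
                                      [score[j] for j in opposite],
                                      [y[j] for j in opposite], m)
        # The observed outcome fills one potential outcome, matches fill the other.
        y1_hat.append(y[i] if z[i] == 1 else imputed)
        y0_hat.append(y[i] if z[i] == 0 else imputed)
    effects = [a - b for a, b in zip(y1_hat, y0_hat)]
    ate = sum(effects) / len(effects)
    att = sum(effects[i] for i in treated) / len(treated)
    return ate, att

# Hypothetical data: treatment indicator, matching score, outcome.
z = [1, 1, 0, 0, 0]
score = [0.60, 0.40, 0.58, 0.35, 0.10]
y = [12.0, 9.0, 10.0, 6.0, 4.0]

ate, att = simple_matching_estimates(z, score, y, m=1)
print(ate, att)
```

The ATT averages the unit-level differences over treated units only, so it uses imputed values only for the untreated potential outcome.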

TREATMENT EFFECT ESTIMATES WITH VARIABLE RATIO GENETIC MATCHING USING THE MATCHING PACKAGE
Matching only:
  Estimate...  3.7664
  AI SE...     2.6266
  T-stat...    1.4339
  p.val...     0.1516
Matching with additional bias adjustment by regressing the outcomes on covariates using only the matched data:
  Estimate...  4.352
  AI SE...     2.7694
  T-stat...    1.5714
  p.val...     0.11608

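WEIGHTS FOR TREATMENT EFFECT ESTIMATION WITH 1 TO K, VARIABLE RATIO OR FULL MATCHING
One-to-k or variable ratio: the weight of an untreated unit is the inverse of the total number of matches its treated unit received.
Matching with replacement: the weights of each untreated unit are summed across the multiple matched groups it was included in.
Then, the weights of the matched untreated cases are multiplied by the ratio of the total number of matched untreated units to the total number of treated units.
$$w_i = \begin{cases} 1 & \text{if } Z_i = 1 \\ \frac{n_0}{n_1} \sum_{m \ni i} \frac{1}{M_m} & \text{if } Z_i = 0 \end{cases}$$
Here M_m is the number of untreated units in matched group m, the sum runs over the matched groups that include unit i, and n_0 and n_1 are the numbers of matched untreated and treated units.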
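
The weighting rule can be sketched as below. The sketch assumes that matched groups are stored as a mapping from each treated unit to a non-empty list of its untreated matches, and that the scaling ratio is the number of distinct matched untreated units over the number of treated units; the function name and identifiers are hypothetical.

```python
def matching_weights(matched_sets):
    """Weights from matched sets given as {treated_id: [untreated_ids]}.

    Treated units get weight 1. Each untreated unit gets 1 / (size of the set)
    for every set it belongs to, summed across sets, then rescaled by n0 / n1.
    """
    raw = {}
    for controls in matched_sets.values():
        for j in controls:
            raw[j] = raw.get(j, 0.0) + 1.0 / len(controls)
    n1, n0 = len(matched_sets), len(raw)    # treated units, distinct matched controls
    weights = {i: 1.0 for i in matched_sets}
    weights.update({j: w * n0 / n1 for j, w in raw.items()})
    return weights

# Hypothetical variable ratio matching with replacement.
matched_sets = {
    "t1": ["c1", "c2"],
    "t2": ["c2"],
    "t3": ["c3", "c4", "c1"],
}

weights = matching_weights(matched_sets)
for unit, w in sorted(weights.items()):
    print(unit, round(w, 4))
```

After rescaling, the untreated weights sum to the number of distinct matched untreated units, so heavily reused controls carry more weight without inflating the apparent sample size.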

ESTIMATION OF TREATMENT EFFECT WITH MATCHING
Horvitz and Thompson estimator of the treatment effect (Rosenbaum, 1987):
$$\hat{\tau} = \frac{\sum_{i=1}^{n_1} w_{i1} y_{i1}}{\sum_{i=1}^{n_1} w_{i1}} - \frac{\sum_{i=1}^{n_0} w_{i0} y_{i0}}{\sum_{i=1}^{n_0} w_{i0}}$$
Estimation with weighted regression:
$$Y_i = B_0 + B_1 Z_i + e_i$$
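
The two approaches on this slide give the same point estimate when the regression contains only the treatment indicator: the weighted least squares slope on a binary regressor equals the difference between the weighted group means. The sketch below checks this numerically on hypothetical outcomes and weights; it does not compute standard errors.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_effect(y, z, w):
    """Difference between the weighted outcome means of treated and untreated units."""
    treated = [(yi, wi) for yi, zi, wi in zip(y, z, w) if zi == 1]
    control = [(yi, wi) for yi, zi, wi in zip(y, z, w) if zi == 0]
    mean1 = weighted_mean([t[0] for t in treated], [t[1] for t in treated])
    mean0 = weighted_mean([c[0] for c in control], [c[1] for c in control])
    return mean1 - mean0

def wls_slope(y, z, w):
    """Slope B1 from weighted least squares of y on an intercept and z."""
    z_bar, y_bar = weighted_mean(z, w), weighted_mean(y, w)
    num = sum(wi * (zi - z_bar) * (yi - y_bar) for yi, zi, wi in zip(y, z, w))
    den = sum(wi * (zi - z_bar) ** 2 for zi, wi in zip(z, w))
    return num / den

# Hypothetical outcomes, treatment indicators, and matching weights.
y = [12.0, 9.0, 10.0, 6.0, 4.0]
z = [1, 1, 0, 0, 0]
w = [1.0, 1.0, 1.5, 1.0, 0.5]

effect = weighted_effect(y, z, w)
slope = wls_slope(y, z, w)
print(round(effect, 4), round(slope, 4))
```

The regression form becomes the more convenient one when covariates are added for further adjustment, at which point the two estimates no longer coincide in general.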

TREATMENT EFFECT ESTIMATES USING FULL MATCHING WITH WEIGHTED REGRESSION
Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     10.0428      0.8887   11.300    <2e-16 ***
childcaretrue    3.4806      2.3346    1.491     0.136

ROSENBAUM'S SENSITIVITY ANALYSIS
Rosenbaum (2002) proposed a method based on the Wilcoxon signed rank test.
The sensitivity analysis increases the assumed level of hidden bias to obtain lower and upper bounds for how the p-value of the association might be affected.
By varying the assumed magnitude of hidden bias, we can find out how much hidden bias is necessary to render our p-value not significant.

BASIC CONCEPTS OF ROSENBAUM'S METHOD
If two matched individuals j and k have the same observed covariates but different probabilities of receiving the treatment, $\pi_j$ and $\pi_k$, the odds ratio of these units receiving the treatment is:
$$\frac{\pi_j / (1 - \pi_j)}{\pi_k / (1 - \pi_k)} = \frac{\pi_j (1 - \pi_k)}{\pi_k (1 - \pi_j)}$$
If there is hidden bias, the odds ratio departs from one but is bounded by a constant $\Gamma \geq 1$:
$$\frac{1}{\Gamma} \leq \frac{\pi_j (1 - \pi_k)}{\pi_k (1 - \pi_j)} \leq \Gamma$$

MEASURING THE DEGREE OF HIDDEN BIAS
Γ (gamma) measures the degree of departure from a study that is free of hidden bias.
A study is sensitive to hidden bias if small values of Γ lead to a change in inferences.
A study is insensitive to hidden bias if large values of Γ do not lead to a change in inferences.

STEPS OF WILCOXON'S SIGNED RANK TEST FOR SENSITIVITY ANALYSIS IN A MATCHED PAIRS STUDY
1. Compute the outcome differences within matched pairs and rank their absolute values.
2. Compute the Wilcoxon signed rank statistic for the outcome difference between treated and control.
3. Compute the expectation and variance of the Wilcoxon signed rank statistic under the null hypothesis of no treatment effect.
4. Compute the Z score for the observed signed rank statistic given the expected value and variance.
For the test against the null hypothesis with no hidden bias (Γ = 1), the lower and upper bounds of the p value are the same.

STEPS OF WILCOXON'S SIGNED RANK TEST FOR SENSITIVITY ANALYSIS IN A MATCHED PAIRS STUDY (CONTINUED)
5. Compute the expectation of the lower bound of the Wilcoxon signed rank statistic under the null hypothesis with Γ = 2 (or any other value higher than 1) and its variance.
6. Compute the Z score associated with the lower bound and the associated p value.
7. Compute the expectation of the upper bound of the Wilcoxon signed rank statistic under the null hypothesis with Γ = 2 (or any other value higher than 1) and its variance.
8. Compute the Z score associated with the upper bound and the associated p value.
9. Check whether the range between the p values of the lower and upper bounds crosses 0.05, which would change conclusions about the statistical significance of the results.
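
The steps can be sketched with the large-sample normal approximation. This sketch assumes the standard bounds in which, for S pairs with nonzero differences, the signed rank statistic has expectation p S(S+1)/2 and variance p(1-p) S(S+1)(2S+1)/6, with p = 1/(1+Γ) for the lower p-value bound and p = Γ/(1+Γ) for the upper one. It reports one-sided p-values, and the differences are hypothetical; output from the rbounds package may differ in its details.

```python
import math

def signed_rank_statistic(differences):
    """Sum of the ranks of |d| over pairs with d > 0 (zeros dropped, ties get mean ranks)."""
    d = [x for x in differences if x != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and abs(d[order[end + 1]]) == abs(d[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1     # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return sum(r for r, x in zip(ranks, d) if x > 0), len(d)

def upper_tail(z_score):
    """P(Z >= z_score) for a standard normal variable."""
    return 0.5 * math.erfc(z_score / math.sqrt(2))

def rosenbaum_bounds(differences, gamma):
    """Lower and upper bounds on the one-sided p-value at a given gamma >= 1."""
    t, s = signed_rank_statistic(differences)
    total = s * (s + 1) / 2
    p_hi = gamma / (1 + gamma)
    p_lo = 1 / (1 + gamma)
    variance = p_hi * p_lo * s * (s + 1) * (2 * s + 1) / 6
    z_for_lower = (t - p_lo * total) / math.sqrt(variance)
    z_for_upper = (t - p_hi * total) / math.sqrt(variance)
    return upper_tail(z_for_lower), upper_tail(z_for_upper)

# Hypothetical treated-minus-control outcome differences for ten matched pairs.
diffs = [3, -1, 4, 2, 5, -2.5, 6, 1.5, 7, 3.5]

for gamma in (1.0, 1.5, 2.0):
    lower, upper = rosenbaum_bounds(diffs, gamma)
    print(f"gamma = {gamma:.1f}  lower = {lower:.4f}  upper = {upper:.4f}")
```

At Γ = 1 the two bounds coincide with the ordinary signed rank p-value; as Γ grows the interval widens, and the smallest Γ at which the upper bound passes 0.05 indicates how much hidden bias would be needed to overturn a significant result.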

RESULTS OF ROSENBAUM'S SENSITIVITY ANALYSIS WITH ESTIMATES FROM GENETIC MATCHING
Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
Unconfounded estimate... 0.1305
Gamma   Lower bound   Upper bound
  1.0        0.1305        0.1305
  1.2        0.0343        0.3311
  1.4        0.0077        0.5560
  1.6        0.0016        0.7396
  1.8        0.0003        0.8613
  2.0        0.0001        0.9315
  2.2        0.0000        0.9681
  2.4        0.0000        0.9858
  2.6        0.0000        0.9939
  2.8        0.0000        0.9974
  3.0        0.0000        0.9990

R PACKAGES USEFUL FOR PROPENSITY SCORE MATCHING
Package    Function                        Objective
MatchIt    matchit                         Implements greedy matching and serves as an interface for genetic, optimal, and full matching
Matching   GenMatch, Match, MatchBalance   Obtains covariate weights for genetic matching; implements genetic matching with weights as well as greedy matching
rbounds    psens                           Implements Rosenbaum's sensitivity analysis method