Propensity Score Matching and Genetic Matching: Monte Carlo Results

Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS060) p.5391

Donzé, Laurent
University of Fribourg (Switzerland), Department of Quantitative Economics
Bd de Pérolles 90, CH 1700 Fribourg, Switzerland
E-mail: Laurent.Donze@UniFr.ch

Lai, Peychyn
University of Fribourg (Switzerland), Department of Quantitative Economics
Bd de Pérolles 90, CH 1700 Fribourg, Switzerland
E-mail: Peychyn.Lai@UniFr.ch

Measuring the impact of a treatment, or evaluating a policy programme economically, raises non-trivial problems, especially when the data come from observational studies. Bias adjustment is one of them. In this context, matching methods are generally used. Like many others, we chose these methods, for example in our study of the impact of Swiss technology policy on firm innovation performance [1]. As a precaution, a sensitivity analysis was undertaken. Nevertheless, the question of which matching method is the right one remains open.

Propensity score matching vs genetic matching

Generally, the matching process is based on a set of covariates. The idea is to match two units that have the same characteristics in terms of covariates, in other words to find the nearest neighbour. Following Rosenbaum and Rubin [5], propensity score matching has become a standard. Its great advantage is that the matching operates on the propensity scores only, i.e. the probabilities of assignment to treatment, and not on a vector of covariates. The method is fully justified theoretically but unfortunately has severe limitations. One of these is that the propensity score is generally unknown and has to be estimated. In fact, propensity score matching is valid only if we know the true propensity score model and have sufficient observations to estimate it. One important concept is that of Equal Percent Bias Reducing (EPBR) matching, introduced by Rubin [6].
A matching method is said to be EPBR for a vector of covariates $X$ if the percent reduction in bias is the same for each of the matched covariates and for all linear functions of them. In particular, one can prove that propensity score matching is EPBR only if the covariates have ellipsoidal distributions such as the normal or the Student t distribution. Note also that if the EPBR condition is not fulfilled, the bias can increase even if the univariate covariate means are closer after matching than before. Applying propensity score matching may therefore lead to worse balance across key confounding variables.

Sekhon and Grieve [10] advocate the use of the genetic matching method ([3], [8]). The matching criterion is based on a generalization of the Mahalanobis distance in which a weight matrix is included in the distance function:

(1)  $d(X_i, X_j) = \left[ (X_i - X_j)^\top (\Sigma^{-1/2})^\top \, W \, \Sigma^{-1/2} (X_i - X_j) \right]^{1/2}$,

where $W$ is a $(p \times p)$ matrix; $p$ is the number of covariates in $X$; $i$ and $j$ are two distinct units; $\Sigma^{1/2}$ is a Cholesky decomposition of $\Sigma$, the variance-covariance matrix of $X$. By default, the matrix $W$ is diagonal with $p$ parameters chosen appropriately. Genetic matching allows the propensity scores to be included in the list of covariates.

How to choose the $p$ free parameters of $W$ is an open issue. In general, one compares the distribution of the covariates in the two groups of units (treatment and control) before and after matching. Paired t-tests and non-parametric Kolmogorov-Smirnov tests are applied. The p-values of the test statistics can guide the choice, in the sense that larger p-values signal better balance. The strategy adopted by the genetic matching method is thus to maximise, at each step, the smallest p-value across the covariates. Balance on individual covariates is allowed to worsen as long as the maximum discrepancy is reduced.

Empirical study

In two case studies, Sekhon and Grieve [10] show the superiority of genetic matching over propensity score matching or parametric matching. They emphasize that, unlike propensity score matching, genetic matching does not rely on covariates with ellipsoidal distributions. Genetic matching can reduce bias in cases of great imbalance and is not sensitive to model specification. More precisely, they claim that genetic matching can achieve much better balance than strict propensity score matching. Much research is based on propensity score matching. This is generally done without considering the distribution of the covariates and/or on the basis of an assumed true model for estimating the propensity scores. The results can be very sensitive to the method applied.
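As an aside on the distance function (1), the following is a minimal sketch of how the weighted distance could be evaluated for a diagonal $W$. It is written in plain Python rather than with the R package Matching used in this study, and the function names (cholesky_lower, forward_solve, genmatch_distance) are hypothetical. It assumes that $\Sigma^{1/2}$ is the lower-triangular Cholesky factor of $\Sigma$, so that $\Sigma^{-1/2}(X_i - X_j)$ is obtained by a triangular solve; with $W$ equal to the identity, the result reduces to the ordinary Mahalanobis distance.

```python
import math


def cholesky_lower(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for a lower-triangular matrix L."""
    z = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z


def genmatch_distance(x_i, x_j, sigma, weights):
    """Distance (1) with W = diag(weights); assumes Sigma^{1/2} is the lower Cholesky factor."""
    L = cholesky_lower(sigma)
    delta = [a - b for a, b in zip(x_i, x_j)]
    z = forward_solve(L, delta)  # z = Sigma^{-1/2} (x_i - x_j)
    return math.sqrt(sum(w * v * v for w, v in zip(weights, z)))


if __name__ == "__main__":
    sigma = [[2.0, 1.0], [1.0, 2.0]]
    # Equal weights give the ordinary Mahalanobis distance.
    print(genmatch_distance((1.0, 0.0), (0.0, 0.0), sigma, (1.0, 1.0)))
    # Unequal weights change how much each transformed coordinate counts.
    print(genmatch_distance((1.0, 0.0), (0.0, 0.0), sigma, (1.0, 0.2)))
```

In genetic matching the weights are not fixed by hand as in this sketch; they are the free parameters that the search procedure tunes in order to improve covariate balance.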
It is therefore of great importance to verify the above-mentioned results. To do so, we adopt the following strategy. We first construct true propensity scores based on a vector of covariates $X$. The covariates are generated according to different ellipsoidal or non-ellipsoidal distributions. A sample is drawn with probabilities given by the propensity scores. Four different matching methods are then applied and the balance of the covariates is tested. The R software [4] is used to perform the simulation. In particular, we used the package sampling [11] to draw the treatment group and the package Matching [9] to perform the matching and check the balance of the covariates.

Simulation set-up

1. We first generate a population of size N = 1000 with four variables according to the distributions given in Table 1. For a person, these covariates could represent, respectively, income, sex, age and a last characteristic that is less common in the population.

Table 1: Distributions of the variables

  Variable   Distribution
  $X_1$      $X_1 \sim N(6000, 1500)$
  $X_2$      $X_2 \sim \mathrm{Bernoulli}(0.6)$
  $X_3$      $X_3 \sim N(35, 5)$
  $X_4$      $X_4 \sim \mathrm{Bernoulli}(0.1)$

2. Then, we generate for each unit in the population a propensity score according to the following model:

(2)  $ps(X) := \Phi(2.606 - 0.0004032\,X_1 - 0.5198\,X_2 + 0.008032\,X_3 - 0.7896\,X_4)$,

where $\Phi$ is the standard normal cumulative distribution function.

3. The next step is the simulation. We fix the number of simulation iterations at 1000.

4. At each iteration, a treatment group is drawn. Units are sampled with the Brewer algorithm, with inclusion probabilities given by the propensity scores. The size of the group is equal to the sum of the propensity scores. In our case, the treatment group has 574 units and the control group therefore has 426 units.

5. The units of the treatment group are then matched. We use four matching methods: (a) traditional matching without propensity scores; (b) traditional matching with propensity scores; (c) genetic matching without propensity scores; (d) genetic matching with propensity scores.

6. For each matching method used, we check the balance of the covariates. As in Sekhon and Grieve [10], a paired t-test of the difference in the covariate means between the treatment and control groups is carried out for each covariate, and the p-value of the test is stored. Recall that high p-values suggest better balance.

7. Finally, at the end of the simulation, we compute for each covariate the mean of the p-values.

Results and conclusion

The results are given in Table 2. The first covariate, $X_1$, is drawn from a normal distribution. Before matching, the mean of the p-values is close to zero, which suggests strong imbalance between the two groups. Traditional matching without propensity scores and genetic matching without propensity scores do not really improve the balance. By contrast, including the propensity scores in the vector of covariates greatly improves the balance, for both traditional and genetic matching. With 0.3178, genetic matching with propensity scores appears to be the better method.

The third covariate, $X_3$, also comes from a normal distribution. Before matching, the balance is already good (0.3585). Surprisingly, only traditional matching without propensity scores is able to improve the balance (0.3731). Genetic matching without propensity scores performs the worst (0.2866).

For the second covariate, $X_2$, of Bernoulli type, traditional matching without propensity scores improves the balance the most (0.8467). Again, genetic matching without propensity scores seems to be the worst method.

Finally, for the last covariate, $X_4$, also of Bernoulli type, traditional matching without propensity scores again appears to be the best method, with perfect balance (1.0). In this case, the second best is genetic matching without propensity scores. Note, however, that its balance is no better than before matching (0.4880 vs. 0.4907).

In conclusion, our results are mixed and do not show the superiority of genetic matching, particularly without propensity scores.

Table 2: Means of the p-values

                                    $X_1$    $X_2$    $X_3$    $X_4$
  Before matching                   0.0000   0.0001   0.3585   0.4907
  Traditional matching without PS   0.0000   0.8467   0.3731   1.0000
  Traditional matching with PS      0.2847   0.3255   0.3092   0.3320
  Genetic matching without PS       0.0110   0.3105   0.2866   0.4880
  Genetic matching with PS          0.3178   0.3517   0.3201   0.3461

REFERENCES

[1] S. Arvanitis, L. Donzé, and N. Sydow, Impact of Swiss technology policy on firm innovation performance: an evaluation based on a matching approach, Science and Public Policy, 37 (2010), pp. 63-78.
[2] P. Lai, Genetic matching: Eine bessere Alternative zu den bisherigen Matching-Methoden? Eine empirische Studie über die Auswirkung der Projektförderung der Kommission für Technologie und Innovation, Master's thesis, Department of Quantitative Economics, University of Fribourg, 23 June 2010.
[3] W. R. Mebane, Jr. and J. S. Sekhon, Genetic Optimization Using Derivatives: The rgenoud Package for R, Journal of Statistical Software, (2009).
[4] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2009. ISBN 3-900051-07-0.

[5] P. R. Rosenbaum and D. B. Rubin, The central role of the propensity score in observational studies for causal effects, Biometrika, 70 (1983), pp. 41-55.
[6] D. B. Rubin, Multivariate Matching Methods That Are Equal Percent Bias Reducing, I: Some Examples, Biometrics, 32 (1976), pp. 109-120.
[7] J. S. Sekhon, Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R, unpublished, 2007.
[8] J. S. Sekhon, Matching: Multivariate and Propensity Score Matching with Balance Optimization, 4th May 2011. R package version 4.7-13.
[9] J. S. Sekhon, Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R, Journal of Statistical Software, (2011), forthcoming.
[10] J. S. Sekhon and R. Grieve, A new non-parametric matching method for bias adjustment with applications to economic evaluations, Working Paper, (2008), p. 46.
[11] Y. Tillé and A. Matei, sampling: Functions for drawing and calibrating samples, 21st January 2011. R package version 2.4.

ABSTRACT

Matching methods have been used extensively to evaluate the impact of a specific treatment, for instance in non-experimental studies. In this context, matching methods based on propensity scores have become a well-known standard. However, it appears in many cases that propensity score matching methods are not really a panacea. Following the work of Sekhon and Grieve [10], we explore the genetic matching method in a Monte Carlo study and compare it with the traditional methods. Our results are mixed and do not prove the superiority of the genetic matching method.
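To make steps 1, 2 and 4 of the simulation set-up concrete, the following is a rough sketch in plain Python rather than in R with the sampling package. Several points are assumptions and not taken from the study: the second parameter of each normal distribution in Table 1 is read as a standard deviation, the signs in model (2) are taken as printed in this transcription, the function names are hypothetical, and the Brewer algorithm is replaced by ordinary systematic unequal-probability sampling, which also yields a fixed-size sample with inclusion probabilities proportional to the propensity scores.

```python
import random
from statistics import NormalDist

N_UNITS = 1000  # population size N from step 1


def make_population(n_units, rng):
    """Step 1: draw the four covariates of Table 1 for every unit.

    Assumption: the second Normal parameter is a standard deviation.
    """
    population = []
    for _ in range(n_units):
        x1 = rng.gauss(6000, 1500)
        x2 = 1 if rng.random() < 0.6 else 0
        x3 = rng.gauss(35, 5)
        x4 = 1 if rng.random() < 0.1 else 0
        population.append((x1, x2, x3, x4))
    return population


def propensity_score(unit):
    """Step 2: probit model (2); the signs of the coefficients are an assumption."""
    x1, x2, x3, x4 = unit
    index = 2.606 - 0.0004032 * x1 - 0.5198 * x2 + 0.008032 * x3 - 0.7896 * x4
    return NormalDist().cdf(index)


def draw_treatment_group(scores, rng):
    """Step 4 stand-in: systematic sampling (NOT the Brewer algorithm).

    The sample size is the rounded sum of the scores; the scores are rescaled
    so that the inclusion probabilities add up to that size.
    """
    total = sum(scores)
    n_treated = round(total)
    pi = [s * n_treated / total for s in scores]
    start = rng.random()  # random start in [0, 1)
    treated, cumulative, k = [], 0.0, 0
    for i, p in enumerate(pi):
        cumulative += p
        if start + k < cumulative:  # the k-th sampling point falls in unit i
            treated.append(i)
            k += 1
    return treated


if __name__ == "__main__":
    rng = random.Random(2011)
    population = make_population(N_UNITS, rng)
    scores = [propensity_score(u) for u in population]
    treated = draw_treatment_group(scores, rng)
    print("treated:", len(treated), "controls:", N_UNITS - len(treated))
```

The matching and balance-testing steps (5 to 7) are left out of the sketch because they depend on the matching software used; the group sizes printed here need not reproduce the 574 and 426 units reported in the text.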