Use of Matching Methods for Causal Inference in Experimental and Observational Studies

Kosuke Imai
Department of Politics, Princeton University
April 27, 2007

Kosuke Imai (Princeton University) Matching Methods 1 / 21

This Talk Draws on the Following Papers:

Imai, Kosuke, Gary King, and Elizabeth A. Stuart. "Misunderstandings among Experimentalists and Observationalists: Balance Test Fallacies in Causal Inference."

Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. (2007). "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference." Political Analysis, forthcoming.

Horiuchi, Yusaku, Kosuke Imai, and Naoko Taniguchi. (2007). "Designing and Analyzing Randomized Experiments: Application to a Japanese Election Survey Experiment." American Journal of Political Science, Vol. 51, No. 3 (July), pp. 670-689.

Imai, Kosuke. "Randomization-based Analysis of Randomized Experiments under the Matched-Pair Design: Variance Estimation and Efficiency Considerations."

What are Matching Methods?

Matching Methods in Experiments

1. Matched-Pair Design:
   1. Create pairs of observations based on the pre-treatment covariates so that the two observations within each matched pair are similar.
   2. Randomly assign the treatment within each matched pair (either simple randomization or complete randomization across pairs).
   3. Inference is based on the average of within-pair estimates.
2. Randomized-Block Design:
   1. Form a group, or block, of (more than two) observations based on the pre-treatment covariates so that the observations within each block are similar.
   2. Completely randomize the treatment within each block.
   3. Inference is based on the weighted average of within-block estimates.

Matching and blocking can be based on a single covariate or on a function of multiple covariates (e.g., the Mahalanobis distance).

An Example of a Randomized-Block Design

Japanese election experiment: randomly selected voters are encouraged to look at the party manifestos on the official party websites during Japan's 2004 Upper House election. The six randomized blocks are defined by vote intention and gender: blocks I and II are voters planning to vote (male, female), III and IV are voters not planning to vote (male, female), and V and VI are undecided voters (male, female).

Treatment group              I     II    III    IV     V    VI   Total
DPJ website                194    151     24    33    36    62     500
LDP website                194    151     24    33    36    62     500
DPJ/LDP websites           117     91     15    20    20    37     300
LDP/DPJ websites           117     91     15    20    20    37     300
Control group (no website) 156    121     19    26    29    49     400
Block size                 778    605     97   132   141   247    2000
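As a toy illustration, the matched-pair steps above (pair similar units on a covariate, randomize the treatment within each pair, average the within-pair differences) might be sketched as follows. The covariate values and the constant treatment effect of 5 are entirely hypothetical:

```python
import random
import statistics

random.seed(0)

# Hypothetical pre-treatment covariate for 20 units (no real data).
n = 20
x = sorted(random.gauss(50, 10) for _ in range(n))

# 1. Pair adjacent units after sorting on the covariate, so the two
#    observations within each matched pair are similar.
pairs = [(x[i], x[i + 1]) for i in range(0, n, 2)]

# 2. Randomly assign the treatment within each pair.
# 3. Base inference on the average of within-pair differences.
true_effect = 5.0  # assumed constant unit treatment effect
within_pair_diffs = []
for a, b in pairs:
    treated, control = (a, b) if random.random() < 0.5 else (b, a)
    y_treated = treated + true_effect + random.gauss(0, 1)
    y_control = control + random.gauss(0, 1)
    within_pair_diffs.append(y_treated - y_control)

estimate = statistics.mean(within_pair_diffs)
print(round(estimate, 2))  # roughly recovers the true effect, since paired units are similar
```

Because the outcome depends strongly on the covariate, pairing on it removes most of that variation from each within-pair difference.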

Matching Methods in Observational Studies

1. Matching: Each treated unit is paired with a similar control unit based on the pre-treatment covariates.
2. Subclassification: Treated and control units are grouped into subclasses based on the pre-treatment covariates so that, within each subclass, treated units are similar to control units.
3. Weighting: Each observation in the treated or control group is weighted by the inverse of the probability of being in that group.

Weighting is based on the propensity score (i.e., the probability of receiving the treatment). Matching and subclassification are based on the propensity score and other measures of similarity.

Framework and Notation

Notation:
- Population size: N. Sample size: n.
- Sample selection indicator: I_i.
- Binary treatment assignment: T_i.
- Potential outcomes: Y_i(1) and Y_i(0).
- Observed outcome: Y_i = T_i Y_i(1) + (1 - T_i) Y_i(0) for I_i = 1.
- Observed and unobserved pre-treatment covariates: X_i and U_i.

Quantities of interest:
1. Unit treatment effect: TE_i ≡ Y_i(1) - Y_i(0).
2. Sample average treatment effect: SATE ≡ (1/n) Σ_{i: I_i = 1} TE_i.
3. Population average treatment effect: PATE ≡ (1/N) Σ_{i=1}^{N} TE_i.

Super-population: the N units are sampled from a population of infinite size. Super-population average treatment effect: SPATE ≡ E[Y_i(1) - Y_i(0)].
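The weighting idea can be illustrated with a minimal sketch. Here the propensity score is assumed known (in practice it would be estimated, e.g., by logistic regression), and the data-generating process, with true treatment effect 3, is invented for the example:

```python
import random

random.seed(1)

# Minimal sketch of inverse-probability weighting with a *known*
# propensity score; all numbers are hypothetical.
n = 50_000
weighted_sum = {1: 0.0, 0: 0.0}
weight_total = {1: 0.0, 0: 0.0}
for _ in range(n):
    x_i = random.random()              # pre-treatment covariate
    e_i = 0.2 + 0.6 * x_i              # propensity score P(T = 1 | X)
    t_i = 1 if random.random() < e_i else 0
    y_i = 2.0 * x_i + 3.0 * t_i + random.gauss(0, 1)  # true effect: 3
    # Weight each unit by the inverse probability of being in its group.
    w_i = 1.0 / e_i if t_i == 1 else 1.0 / (1.0 - e_i)
    weighted_sum[t_i] += w_i * y_i
    weight_total[t_i] += w_i

ipw_estimate = (weighted_sum[1] / weight_total[1]
                - weighted_sum[0] / weight_total[0])
print(round(ipw_estimate, 2))  # weighting removes the confounding by X
```

An unweighted difference in means would be biased here, because units with larger X are both more likely to be treated and have larger outcomes; the weights undo that imbalance.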

Decomposition

A Decomposition of Causal Effect Estimation Error

A simple estimator of the PATE, SATE, or SPATE is the difference in means:

  D ≡ (1/(n/2)) Σ_{i: I_i = 1, T_i = 1} Y_i − (1/(n/2)) Σ_{i: I_i = 1, T_i = 0} Y_i.

Estimation error for the PATE: Δ ≡ PATE − D.

Consider an additive model: Y_i(t) = g_t(X_i) + h_t(U_i) for t = 0, 1.

New decomposition:

  Δ = Δ_S + Δ_T = (Δ_SX + Δ_SU) + (Δ_TX + Δ_TU).

1. Δ_S: sample selection error, due to observables (Δ_SX) and unobservables (Δ_SU).
2. Δ_T: treatment imbalance error, due to observables (Δ_TX) and unobservables (Δ_TU).

The decomposition generalizes to the SPATE, various average treatment effects on the treated, and other settings.

Sample Selection Error

Definition:

  Δ_S ≡ PATE − SATE = ((N − n)/N) (NATE − SATE),

where NATE is the nonsample average treatment effect, i.e., NATE ≡ Σ_{i: I_i = 0} TE_i / (N − n).

When does Δ_S equal 0?
1. N = n: the sample is a census of the population.
2. SATE = NATE: the average treatment effect is the same in the sample and the nonsample.
3. Give up the PATE and focus on the SATE.
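A small simulation can confirm that the decomposition is an exact algebraic identity: Δ = PATE − D splits into (PATE − SATE) + (SATE − D), a sample selection term plus a treatment term. All numbers below are simulated, not from any real study:

```python
import random
import statistics

random.seed(2)

# Finite population with heterogeneous unit treatment effects.
N, n = 1000, 100
y0 = [random.gauss(0, 1) for _ in range(N)]
y1 = [y + random.gauss(2, 1) for y in y0]
pate = statistics.mean(y1[i] - y0[i] for i in range(N))

# Draw a sample of n units, then randomize treatment within it.
sampled = random.sample(range(N), n)
sate = statistics.mean(y1[i] - y0[i] for i in sampled)
treated = set(random.sample(sampled, n // 2))
d = (statistics.mean(y1[i] for i in treated)
     - statistics.mean(y0[i] for i in sampled if i not in treated))

delta = pate - d
delta_s = pate - sate   # sample selection error
delta_t = sate - d      # treatment imbalance error
print(round(delta_s, 3), round(delta_t, 3))  # the two sum to delta exactly
```

Repeating the draw would show both components fluctuating around zero here, because both the sampling and the assignment are random.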

Decomposition of Sample Selection Error

Sample selection error due to observables:

  Δ_SX = ((N − n)/N) ∫ [g_1(X) − g_0(X)] d[F̂(X | I = 0) − F̂(X | I = 1)],

where F̂(·) denotes the empirical distribution function.

When does Δ_SX equal 0?
1. F̂(X | I = 0) = F̂(X | I = 1): identical distributions of X_i in the sample and the nonsample.
2. g_1(X_i) − g_0(X_i) = α: treatment effect constant over X_i.

The same argument applies to the sample selection error due to unobservables:

  Δ_SU = ((N − n)/N) ∫ [h_1(U) − h_0(U)] d[F̂(U | I = 0) − F̂(U | I = 1)].

Error Due to Treatment Imbalance

Decomposition of the estimation error due to treatment imbalance:

  Δ_TX = ∫ ([g_1(X) + g_0(X)] / 2) d[F̂(X | T = 0, I = 1) − F̂(X | T = 1, I = 1)],
  Δ_TU = ∫ ([h_1(U) + h_0(U)] / 2) d[F̂(U | T = 0, I = 1) − F̂(U | T = 1, I = 1)].

Δ_TX and Δ_TU vanish if the treatment and control groups are balanced, i.e., have identical empirical distributions:

  F̂(X | T = 1, I = 1) = F̂(X | T = 0, I = 1),
  F̂(U | T = 1, I = 1) = F̂(U | T = 0, I = 1).

The first condition is verifiable from the observed data; the second is not.
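A sketch of why exact matching kills the observable imbalance term: after exact matching on a discrete covariate, the treated and matched-control empirical distributions of X are identical, so the integral defining Δ_TX is zero for any g_1 and g_0. The covariate values and assignment below are hypothetical:

```python
import random
from collections import Counter

random.seed(3)

# Hypothetical units: a discrete covariate and a treatment indicator.
units = [(random.choice(["a", "b", "c"]), random.random() < 0.5)
         for _ in range(300)]
treated_x = [x for x, t in units if t]
control_x = [x for x, t in units if not t]

# Exact matching: for each treated unit, keep one control with the
# same covariate value (treated units with no available match are dropped).
pool = Counter(control_x)
matched_t, matched_c = [], []
for x in treated_x:
    if pool[x] > 0:
        pool[x] -= 1
        matched_t.append(x)
        matched_c.append(x)

# The matched groups now have identical empirical distributions of X.
print(sorted(Counter(matched_t).items()))
```

With identical empirical distributions, F̂(X | T = 0, I = 1) − F̂(X | T = 1, I = 1) is zero everywhere, so Δ_TX vanishes regardless of the (unknown) functions g_1 and g_0; Δ_TU is untouched, which is exactly why the unobservables require an assumption.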

Implications of the Decomposition

The Role of Matching Methods in Causal Inference

The decomposition formalizes the role of matching methods in reducing the estimation error. In both experimental and observational studies, matching methods try to reduce Δ_TX, given by

  Δ_TX = ∫ ([g_1(X) + g_0(X)] / 2) d[F̂(X | T = 0, I = 1) − F̂(X | T = 1, I = 1)],

which represents the estimation error due to treatment imbalance in the observables. They do so by trying to achieve

  F̂(X | T = 1, I = 1) = F̂(X | T = 0, I = 1).

In practice, some imbalance remains and must be adjusted for: by randomization in experimental studies, and by model-based adjustment in observational studies (matching as nonparametric preprocessing).

Consequences of Design Choices on Estimation Error

[Table: rows are the design choices (random sampling; focusing on the SATE rather than the PATE; weighting for nonrandom sampling; large sample size; random treatment assignment; complete blocking; exact matching) and the assumptions (no selection bias; ignorability; no omitted variables); columns are the error components Δ_SX, Δ_SU, Δ_TX, and Δ_TU. The individual cell entries, indicating which components each choice eliminates or reduces, are not recoverable from the transcription.]

What is the Best Design? None!

[Table: each design (ideal experiment; randomized clinical trial with limited or no blocking; randomized clinical trial with complete blocking; field experiment with limited or no blocking; survey experiment with limited or no blocking; observational study with a representative, well-matched data set; observational study with an unrepresentative, well-matched data set) is scored on which of Δ_SX, Δ_SU, Δ_TX, and Δ_TU it sets to zero. The individual cell entries are not recoverable from the transcription.]

Fallacies

Fallacies in Experiments

1. Most applied experimental research conducts simple randomization of the treatment. Among all experiments published in the APSR, AJPS, and JOP since 1995, and those listed in Time-sharing Experiments for the Social Sciences, only one uses matching methods! Two key analytical results:
   1. The randomized-block design always yields more efficient estimates.
   2. The matched-pair design usually yields more efficient estimates.
2. Hypothesis tests should be used to examine the process of randomization itself, not to look for significant imbalance.
   1. Imbalance is a sample concept, not a population one, and cannot be eliminated or reduced by randomization.
   2. Only matched-pair or randomized-block designs can eliminate or reduce imbalance.
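The claim that randomization does not remove imbalance from any single realized sample, while pairing removes most of it by design, can be checked by simulation. The covariate below is hypothetical; we compare the average realized covariate gap between groups under the two designs:

```python
import random
import statistics

random.seed(4)

n = 40
x = sorted(random.gauss(0, 1) for _ in range(n))
pairs = [(x[i], x[i + 1]) for i in range(0, n, 2)]

def simple_gap():
    # Simple randomization: split the units into two groups at random.
    perm = x[:]
    random.shuffle(perm)
    return abs(statistics.mean(perm[: n // 2]) - statistics.mean(perm[n // 2:]))

def paired_gap():
    # Matched-pair design: randomize within pairs of adjacent units.
    diff = 0.0
    for a, b in pairs:
        diff += (a - b) if random.random() < 0.5 else (b - a)
    return abs(diff / (n // 2))

reps = 1000
avg_simple = statistics.mean(simple_gap() for _ in range(reps))
avg_paired = statistics.mean(paired_gap() for _ in range(reps))
print(round(avg_simple, 3), round(avg_paired, 3))  # paired gap is far smaller
```

Every individual simple randomization leaves a nonzero gap; only the design choice, not the act of randomizing, shrinks it.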

Proof that the Randomized-Block Design is Always Better

1. Variance under the complete-randomized design:

  Var_c(D) = (1/(n/2)) {Var(Y_i(1)) + Var(Y_i(0))}.

2. Variance under the randomized-block design:

  Var_b(D) = (1/(n/2)) Σ_{x ∈ X} w_x {Var_x(Y_i(1)) + Var_x(Y_i(0))},

where w_x = n_x / n is the known population weight for the block formed by units with X_i = x.

3. Then, since Var(Y_i(t)) ≥ E[Var_x(Y_i(t))] = Σ_{x ∈ X} w_x Var_x(Y_i(t)), we have Var_c(D) ≥ Var_b(D).

Proof that the Matched-Pair Design is Usually Better

Variance under the matched-pair design:

  Var_m(D) = (1/n) {Var(Y_1j(1) − Y_2j(0)) + Var(Y_2j(1) − Y_1j(0))}.

Then,

  Var_c(D) − Var_m(D) = ((n − 1)/(n(2n − 1))) Cov(Y_1j(1) + Y_1j(0), Y_2j(1) + Y_2j(0)).

Thus, unless matching induces negative correlation, the matched-pair design yields more efficient estimates.
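The variance ranking Var_c(D) ≥ Var_b(D) can be checked by Monte Carlo on a stylized population with two well-separated blocks (all numbers hypothetical, assuming a constant treatment effect of 2):

```python
import random
import statistics

random.seed(5)

# Two blocks of 20 control potential outcomes, far apart in level.
block_a = [random.gauss(0, 1) for _ in range(20)]
block_b = [random.gauss(10, 1) for _ in range(20)]
effect = 2.0  # constant effect, so Y_i(1) = Y_i(0) + 2
units = block_a + block_b

def estimate(blocked):
    # One randomization of the difference-in-means estimator D.
    groups = [units[:20], units[20:]] if blocked else [units[:]]
    t_vals, c_vals = [], []
    for g in groups:
        g = g[:]
        random.shuffle(g)
        half = len(g) // 2
        t_vals += [y + effect for y in g[:half]]  # treated half
        c_vals += g[half:]                        # control half
    return statistics.mean(t_vals) - statistics.mean(c_vals)

reps = 2000
var_c = statistics.variance([estimate(blocked=False) for _ in range(reps)])
var_b = statistics.variance([estimate(blocked=True) for _ in range(reps)])
print(round(var_c, 3), round(var_b, 3))  # blocking cuts the variance sharply
```

Complete randomization lets the block composition of the two groups vary from draw to draw, which inflates Var_c(D); blocking fixes that composition, leaving only the small within-block variation.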

Relative Efficiency of the Matched-Pair Design

One can empirically evaluate the efficiency of the matched-pair design relative to the complete-randomized design. Question: are match-makers dumb?

Under the matched-pair design, I derive bounds on the variance one would have obtained had the complete-randomized design been employed. The bounds are given, for t = 0, 1, by

  (2(n − 1)/(2n − 1)) Var(Y_ij(t)) ≤ Var(Y_i(t)) ≤ (2(n − 1)/(2n − 1)) Var(Y_ij(t)) + (2/(2n − 1)) E{Y_ij(t)²}.

The bounds can be estimated without bias.

Assessing Balance in Matching Methods (Observational Studies)

The success of any matching method depends on the resulting covariate balance. How should one assess the balance of matched data? Ideally, one would compare the joint distribution of all covariates between the matched treatment and control groups. In practice, this is impossible when X is high-dimensional.

Standard practice is the use of balance tests:
- a t test for the difference in means of each variable in X;
- other test statistics, such as the χ², F, and Kolmogorov-Smirnov tests, are also used;
- statistically insignificant test statistics serve both as a justification for the adequacy of the chosen matching method and as a stopping rule for maximizing balance;
- the practice is widespread across disciplines (economics, education, management science, medicine, public health, psychology, and statistics).
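One way to see why the t test is a poor balance measure is to compare it with a plain sample quantity such as the standardized mean difference (a common diagnostic, though not one named on this slide): for a fixed amount of imbalance, the t statistic grows with the sample size while the standardized difference does not. The 0.2 standard-deviation shift below is hypothetical:

```python
import math
import random

random.seed(6)

def diagnostics(n):
    # Simulated covariate with a fixed 0.2 s.d. imbalance, any n.
    treated = [random.gauss(0.2, 1) for _ in range(n)]
    control = [random.gauss(0.0, 1) for _ in range(n)]
    mt, mc = sum(treated) / n, sum(control) / n
    vt = sum((y - mt) ** 2 for y in treated) / (n - 1)
    vc = sum((y - mc) ** 2 for y in control) / (n - 1)
    t_stat = (mt - mc) / math.sqrt(vt / n + vc / n)
    std_diff = (mt - mc) / math.sqrt((vt + vc) / 2)
    return t_stat, std_diff

t_small, d_small = diagnostics(100)
t_large, d_large = diagnostics(10_000)
print(round(t_small, 2), round(t_large, 2))  # t statistic grows with n
print(round(d_small, 2), round(d_large, 2))  # std. difference stays near 0.2
```

The same imbalance can therefore look "insignificant" in a small matched sample and "significant" in a large one, which is exactly why the test conflates balance with power.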

An Illustration of the Balance Test Fallacy

School Dropout Demonstration Assistance Program. Treatment: school restructuring programs. Outcome: dropout rates. We look at the baseline math test score. A silly matching algorithm randomly selects control units to discard.

[Figure: two panels plotted against the number of controls randomly dropped (0 to 400). Left panel: the t statistic (scale 0 to 4) falls into the "statistical insignificance" region as more controls are dropped. Right panel: the math test score imbalance measures (QQ-plot mean deviation and difference in means) remain essentially unchanged.]

Problems with Hypothesis Tests as Stopping Rules

1. A balance test is a function of both balance and statistical power: the more observations dropped, the less power the test has.
2. The t statistic is affected by factors other than balance:

  t = √n_m (X̄_mt − X̄_mc) / √(s²_mt / r_m + s²_mc / (1 − r_m)),

where X̄_mt and X̄_mc are the sample means, s²_mt and s²_mc are the sample variances, n_m is the total number of remaining observations, and r_m is the ratio of remaining treated units to the total number of remaining observations.
3. Even a small imbalance can greatly affect the estimates. Linear regression: E(Y | T, X) = θ + Tβ + Xγ. Bias: E(β̂ − β | T, X) = Gγ, where β̂ is the difference-in-means estimator and G contains the vectors of coefficients from regressions of each covariate in X on a constant and T.
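The slide's silly matching algorithm is easy to reproduce: randomly discarding controls drives the t statistic toward insignificance purely through lost power, even though the expected imbalance is untouched. The scores below are simulated, not the actual SDDAP data:

```python
import math
import random

random.seed(7)

# Simulated baseline scores with a genuine 5-point imbalance.
treated = [random.gauss(55, 10) for _ in range(200)]
control = [random.gauss(50, 10) for _ in range(500)]

def t_stat(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((y - ma) ** 2 for y in a) / (len(a) - 1)
    vb = sum((y - mb) ** 2 for y in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

t_full = t_stat(treated, control)
kept = random.sample(control, 30)  # "silly matching": drop 470 controls at random
t_dropped = t_stat(treated, kept)
print(round(t_full, 2), round(t_dropped, 2))  # statistic shrinks, balance does not improve
```

Nothing about the comparison got better: the expected difference in means is still 5 points; only the test's ability to detect it was destroyed.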

Conclusion

Concluding Recommendations

For experimenters:
1. Unbiasedness should not be your goal.
2. Use matching methods to improve efficiency.
3. Block what you can and randomize what you cannot: the randomized-block design is always more efficient, and the matched-pair design is often more efficient.

For observationalists:
1. Balance should be assessed by comparing the sample covariate differences between the treatment and control groups.
2. Do not use hypothesis tests to assess balance.
3. There is no critical threshold: observed imbalance should be minimized.

For everyone:
1. There is no best design.
2. Minimize each component of the estimation error, sample selection and treatment imbalance, via design and analysis.