The Balance-Sample Size Frontier in Matching Methods for Causal Inference: Supplementary Appendix

Gary King, Christopher Lucas, Richard Nielsen
March 22, 2016

Albert J. Weatherhead III University Professor, Institute for Quantitative Social Science, 1737 Cambridge Street, Harvard University, Cambridge MA 02138; GaryKing.org, king@harvard.edu, (617) 500-7570.
Ph.D. Candidate, Institute for Quantitative Social Science, 1737 Cambridge Street, Harvard University, Cambridge MA 02138; christopherlucas.org, clucas@fas.harvard.edu, (617) 982-2718.
Assistant Professor, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge MA 02139; www.mit.edu/ rnielsen, rnielsen@mit.edu, (857) 998-8039.

Abstract

This is a supplementary appendix to Gary King, Christopher Lucas, and Richard Nielsen, "The Balance-Sample Size Frontier in Matching Methods for Causal Inference," forthcoming, American Journal of Political Science.

A Inspecting the Means of Pruned Observations

In our main paper, Figures 3 and 5 display scaled covariate means across the Lalonde (1986) and the Boyd, Epstein and Martin (2010) frontiers for observations remaining in the matched samples. However, a researcher might also be interested in a description of the pruned data; that is, the covariate means for observations removed from the sample. In this section, we create these plots for both illustrations. To make this calculation, we use the MatchingFrontier software package (King, Lucas and Nielsen, 2015).

A.1 Job Training

As a companion to the analysis displayed in Section 6.1 of the main paper, Figure 1 presents covariate means for pruned observations at each point on the frontier. Note that this differs from Figure 3 in the main text, which presents the means for observations remaining in the matched sample, rather than those pruned. Importantly, this implies that all observations summarized in Figure 1 are untreated, because the estimand in this example is the sample average treatment effect on the treated, for which treated units are not pruned.

Figure 1: Covariate means for pruned observations on the Lalonde (1986) frontier. (Vertical axis: scaled means, from 0.0 to 1.0. Horizontal axis: 0 to 15000. Series: Married, Income 1975, Education, Income 1974, Age, No degree, Unemployment 1974, Unemployment 1975, Black, Hispanic.)

Figure 1 reveals that early in the frontier, the algorithm prunes employed participants with relatively high incomes. This is clear from the means for variables re74, re75, u74, and u75, which are incomes in 1974 and 1975, along with indicators for unemployment in those years. The intuition is that eligibility for the original experimental study was contingent on income and employment status. Naturally, after supplementing the original experiment with observational data, the worst controls are those that would not have been eligible for the experiment, namely those with very high incomes. This is reflected in the sharp
downward slope at the beginning of the frontier of means for pruned observations. The first hundred or so observations pruned are individuals with high income. However, as more observations are pruned, fewer bad matches remain and the means for pruned observations begin to level off.

A.2 Sex and Judging

We complement the analysis in Section 6.2 with Figure 2, which displays covariate means for pruned units.

Figure 2: Covariate means for pruned observations on the Boyd, Epstein and Martin (2010) frontier. (Vertical axis: scaled means, from 0.0 to 1.0. Horizontal axis: 0 to 500. Series: Republican Majority, Median Ideology, Median Age, Majority Experienced, Has Minority, Liberal Lower Direction.)

As in the Lalonde (1986) example, Figure 2 provides intuition for the dimensions
most responsible for imbalance between the treated and control groups. In the Boyd, Epstein and Martin (2010) example, that dimension is the presence of another minority on the bench. The intuition here is that the appointment of justices to appellate courts is a political process; women are more likely to be appointed in districts where other minorities were also appointed, which may reflect a general commitment to representation on the bench.

B Simulation

In this section, we conduct a simple simulation to provide intuition about imbalance and sample size in a context where the true data generating process is exactly known.

B.1 Data Generating Process

Following King and Nielsen (2015), we conduct a simulation in which a completely randomized experiment is hidden within an imbalanced data set. Specifically, we construct an example with 10,000 observations, 5,000 of which are treated. Two covariates, X1 and X2, are observed for all observations. For control units, both X1 and X2 are distributed Uniform(0,5), whereas they are both distributed Uniform(1,6) for treated observations. We generate the outcome Y as

$Y_i = 2T_i + X1_i + X2_i + \epsilon_i$, where $\epsilon_i \sim N(0, 1)$.

Note that the true treatment effect is two.

B.2 Frontier and Estimates

Given these data, we calculate the AMI FSATT frontier, displayed in the left panel of Figure 3. The right panel of Figure 3 displays the treatment effects across this frontier, along with model dependence. In this example, we define model dependence as the range of point estimates for the treatment effect estimated by different model specifications. Thus, the red region represents the range of possible point estimates for the treatment effect given some point on the frontier. Points with wider ranges are those at which the researcher must know the true model in order to correctly estimate the treatment effect.
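
The data generating process and the definition of model dependence above can be sketched in code. This is a minimal illustration, not the authors' implementation: it uses plain Python rather than the MatchingFrontier package, and the function names (`simulate`, `solve`, `ols_treatment_coef`), the seed, and the choice of three specifications are assumptions made for the example. It evaluates model dependence only on the unpruned data, that is, at the start of the frontier.

```python
import random

def simulate(n=10_000, seed=0):
    """Draw (T, X1, X2, Y) following the data generating process in B.1."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        t = 1 if i < n // 2 else 0        # half of the units are treated
        lo = 1.0 if t else 0.0            # treated: Uniform(1,6); control: Uniform(0,5)
        x1 = rng.uniform(lo, lo + 5.0)
        x2 = rng.uniform(lo, lo + 5.0)
        y = 2.0 * t + x1 + x2 + rng.gauss(0.0, 1.0)
        rows.append((t, x1, x2, y))
    return rows

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols_treatment_coef(rows, covariates):
    """OLS coefficient on T from regressing Y on an intercept, T, and the
    chosen covariates (index 0 selects X1, index 1 selects X2)."""
    design = [[1.0, float(t)] + [(x1, x2)[k] for k in covariates]
              for t, x1, x2, _ in rows]
    ys = [r[3] for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)[1]

rows = simulate()

# Hypothetical set of specifications; the appendix does not list the ones it used.
specs = {"Y ~ T": [], "Y ~ T + X1": [0], "Y ~ T + X1 + X2": [0, 1]}
estimates = {name: ols_treatment_coef(rows, cov) for name, cov in specs.items()}

# Model dependence: the range of point estimates across specifications.
model_dependence = max(estimates.values()) - min(estimates.values())

for name, est in estimates.items():
    print(f"{name}: {est:.2f}")
print(f"model dependence (range): {model_dependence:.2f}")
```

On the unpruned data, the specification that omits both covariates is centered on 2 + 1 + 1 = 4 in expectation, since each covariate's mean is one unit higher among treated units, while the correctly specified regression is centered on the true effect of 2. The resulting range is therefore wide before any pruning takes place.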
In contrast to points with wide ranges, points with narrow ranges are those at which the researcher would correctly estimate the treatment effect regardless of the specified functional form. It is possible to estimate the true effect (the number 2) at any point on the frontier where a reasonable number of data points remain. However, when imbalance is largest
(at the start of the frontier), it is possible to estimate an effect more than double the true effect. By contrast, when imbalance is low, a researcher would estimate the true effect regardless of the specified functional form.

Figure 3: The left panel displays the imbalance frontier for the simulated data, while the right panel displays estimated treatment effects across it. (Left vertical axis: Mahalanobis Discrepancy. Right vertical axis: Estimate.)

B.3 Covariate Means and Intuition

The two covariates on which we matched, X1 and X2, are distributed Uniform(0,5) for control units and Uniform(1,6) for treated units. This implies a square region of common support, to which the matching algorithm subsets the data. To see this, represent the data in the two-dimensional space of X1 and X2, as shown in Figure 4. Let (X1, X2) denote a coordinate within the space. The distribution of the data implies common support over a two-dimensional square with corners at (1,1) and (5,5), which is observable in the figure.

Figure 4: Distribution of the simulated data over covariates. (Horizontal axis: X1, from 0 to 6. Vertical axis: X2, from 0 to 6.)

As in previous analyses, we calculate covariate means for both pruned and remaining observations. However, in this illustration, we also calculate the difference in means between treated and control groups. Because this data set is relatively simple, we do not scale the covariate means (in real data, scaling in ways that are substantively meaningful is important).

The top left panel in Figure 5 displays the means for both covariates in the matched samples (i.e., the observations remaining), pooling across treated and control groups. Note that pruning bad matches has relatively little influence on the means of these covariates. The intuition for this lies in the FSATT estimand. Specifically, pooling treatment conditions, E[X1] and E[X2] both equal three, which is also the centerpoint of the square ranging from (1,1) to (5,5) over which we have common support. That is, the centerpoint (3,3) of this supported square is also the centerpoint of the raw data. Because the data are symmetrically distributed around this point, pruning observations according to continuous distance preserves these expectations.

To further illustrate this point, the top right panel of Figure 5 displays the difference in means between treated and control units in the remaining matched sample. Note that the difference in the means of X1 and X2 before pruning is roughly equal to 1, which equals E[Uniform(1,6)] - E[Uniform(0,5)] = 3.5 - 2.5. Because imbalance in X1 and X2 is equal in expectation, imbalance decreases at roughly equal rates in these dimensions until approaching zero.

As in the previous examples displayed in this appendix, we plot the means of pruned observations in the lower panel of Figure 5. As implied by the discussion of covariate means in the matched sample, the means of pruned observations do not change across the frontier.
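
The expectations discussed in this section can be checked numerically. The sketch below is an illustrative assumption rather than the frontier algorithm: instead of pruning sequentially by distance, it simply restricts the data to the square of common support, and the names `draw_covariates`, `in_common_support`, and `diff_in_means` are hypothetical.

```python
import random
from statistics import mean

def draw_covariates(n=10_000, seed=1):
    """Draw (T, X1, X2): control covariates are Uniform(0,5), treated are Uniform(1,6)."""
    rng = random.Random(seed)
    units = []
    for i in range(n):
        t = 1 if i < n // 2 else 0
        lo = 1.0 if t else 0.0
        units.append((t, rng.uniform(lo, lo + 5.0), rng.uniform(lo, lo + 5.0)))
    return units

def in_common_support(unit):
    """True if both covariates lie in [1, 5], the square with corners (1,1) and (5,5)."""
    _, x1, x2 = unit
    return 1.0 <= x1 <= 5.0 and 1.0 <= x2 <= 5.0

def diff_in_means(units, k):
    """Treated-minus-control difference in the mean of covariate k (1 for X1, 2 for X2)."""
    treated = [u[k] for u in units if u[0] == 1]
    control = [u[k] for u in units if u[0] == 0]
    return mean(treated) - mean(control)

units = draw_covariates()
kept = [u for u in units if in_common_support(u)]
pruned = [u for u in units if not in_common_support(u)]

# Pooled means of X1: each is centered on 3 because the data are symmetric around (3, 3).
pooled_mean_raw = mean(u[1] for u in units)
pooled_mean_kept = mean(u[1] for u in kept)
pooled_mean_pruned = mean(u[1] for u in pruned)

# Treated-minus-control imbalance in X1: about 1 before pruning, about 0 on common support.
diff_raw = diff_in_means(units, 1)
diff_kept = diff_in_means(kept, 1)

print(f"pooled mean of X1: raw {pooled_mean_raw:.2f}, "
      f"kept {pooled_mean_kept:.2f}, pruned {pooled_mean_pruned:.2f}")
print(f"difference in means of X1: raw {diff_raw:.2f}, kept {diff_kept:.2f}")
```

Restricting to the square is only a stand-in for the end of the frontier. It shows the same pattern the text describes: the pooled means of remaining and pruned units stay near 3, while the treated-control difference moves from roughly 1 toward zero.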

Figure 5: The top left panel displays covariate means across the frontier, the top right displays the difference in means for treated and control units, and the bottom displays the covariate means for pruned observations. (Vertical axes: unscaled means of X1 and X2 in the top left and bottom panels; difference in unscaled means of X1 and X2 in the top right panel.)

References

Boyd, Christina L., Lee Epstein and Andrew D. Martin. 2010. "Untangling the Causal Effects of Sex on Judging." American Journal of Political Science 54(2):389-411.

King, Gary, Christopher Lucas and Richard Nielsen. 2015. "MatchingFrontier: Automated Matching for Causal Inference." R package version 2.0.0.

King, Gary and Richard Nielsen. 2015. "Why Propensity Scores Should Not Be Used for Matching." Working paper.

Lalonde, Robert. 1986. "Evaluating the Econometric Evaluations of Training Programs." American Economic Review 76:604-620.