Vector-Based Kernel Weighting: A Simple Estimator for Improving Precision and Bias of Average Treatment Effects in Multiple Treatment Settings


Jessica Lum, MA (1); Steven Pizer, PhD (1, 2); Melissa Garrido, PhD (1, 2)
1. Department of Veterans Affairs
2. Boston University School of Public Health
Stata Conference, Columbus, OH, July 20th, 2018

Overview
1. Importance of using the full propensity score vector
2. Common support in the multiple treatment setting
3. Transitive treatment effects
4. Weighting/matching strategies
   - Introduce new treatment effect estimator
5. Monte Carlo (MC) simulation design
6. Demonstrate bias and efficiency of estimators via MC simulations

Multiple Treatment Groups
Accounting for all values of a treatment variable in a single equation helps ensure that propensity scores from a multinomial model lead to treatment effect estimation among patients with non-zero probabilities of receiving any of the other treatments (common support).
Multinomial choice model: predicts several generalized propensity scores, each representing the probability of receiving one of the treatments. The predicted probabilities form a propensity score vector, with one value per treatment, for each observation.
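As a minimal sketch of how a multinomial logit turns linear predictors into a propensity score vector, the function below applies the softmax transform. The linear predictor values are hypothetical and chosen only for illustration; in practice they would come from a fitted model such as Stata's mlogit.

```python
import math

def propensity_vector(linear_predictors):
    """Convert per-treatment linear predictors into a propensity score vector.

    linear_predictors: dict mapping treatment label -> x'beta for that treatment
    (the base outcome has linear predictor 0).
    """
    m = max(linear_predictors.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in linear_predictors.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# Hypothetical linear predictors for one patient; A is the base outcome.
ps = propensity_vector({"A": 0.0, "B": 0.4, "C": -0.3})
print(ps)  # three probabilities that sum to 1
```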

Common Support
Drop units outside the range of common support.
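One way to operationalize this, sketched here under an assumption the slide does not spell out, is a rectangular common support region: for each component of the propensity score vector, keep the range that is observed in every treatment group, and retain only units whose whole vector falls inside it. The function names and toy data are hypothetical.

```python
def common_support_bounds(ps, t):
    """ps: list of dicts (propensity score vectors); t: list of treatment labels.

    Assumed rule: for each PS component, the lower bound is the largest
    group-specific minimum and the upper bound is the smallest group-specific maximum.
    """
    groups = sorted(set(t))
    bounds = {}
    for k in groups:              # component of the PS vector
        lows, highs = [], []
        for g in groups:          # treatment group actually received
            vals = [p[k] for p, tg in zip(ps, t) if tg == g]
            lows.append(min(vals))
            highs.append(max(vals))
        bounds[k] = (max(lows), min(highs))
    return bounds

def in_common_support(ps, t):
    """Flag units whose full PS vector lies inside the common support region."""
    bounds = common_support_bounds(ps, t)
    return [all(bounds[k][0] <= p[k] <= bounds[k][1] for k in bounds) for p in ps]

# Toy data: two units per treatment group.
ps = [
    {"A": 0.60, "B": 0.20, "C": 0.20},
    {"A": 0.30, "B": 0.40, "C": 0.30},
    {"A": 0.50, "B": 0.30, "C": 0.20},
    {"A": 0.20, "B": 0.50, "C": 0.30},
    {"A": 0.40, "B": 0.30, "C": 0.30},
    {"A": 0.25, "B": 0.35, "C": 0.40},
]
t = ["A", "A", "B", "B", "C", "C"]
bounds = common_support_bounds(ps, t)
keep = in_common_support(ps, t)
```

After dropping flagged units, the propensity score model would typically be refit on the retained sample.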

Transitive Treatment Effects*
Treatment effect** estimation involves constructing counterfactual outcomes from a comparison group determined to be most similar to the reference group based on propensity scores.
Pairwise treatment effects are transitive if and only if each is estimated conditional on the same sample, that is, on units eligible to receive the same set of treatments:
\[ E[Y(A) - Y(C) \mid T = A] - E[Y(A) - Y(B) \mid T = A] = E[Y(B) - Y(C) \mid T = A] \]
* Lopez and Gutman (2017).
** All estimates are obtained as weighted mean differences of outcomes, with weights normalized to sum to 1 in each treatment group.
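The weighted mean difference described in the second footnote can be sketched as follows. The function name and toy data are hypothetical; the weights can come from any of the estimators discussed later. When all three pairwise contrasts use the same weighted sample, transitivity holds by construction.

```python
def weighted_mean_difference(y, t, w, a, b):
    """Difference in weighted mean outcomes between treatment groups a and b.

    Dividing by the within-group weight total is the same as normalizing
    the weights to sum to 1 in each treatment group.
    """
    def wmean(group):
        num = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == group)
        den = sum(wi for ti, wi in zip(t, w) if ti == group)
        return num / den
    return wmean(a) - wmean(b)

# Toy data: outcomes, treatment labels, and weights.
y = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0]
t = ["A", "A", "B", "B", "C", "C"]
w = [1.0, 1.0, 1.0, 3.0, 2.0, 2.0]

ab = weighted_mean_difference(y, t, w, "A", "B")
ac = weighted_mean_difference(y, t, w, "A", "C")
bc = weighted_mean_difference(y, t, w, "B", "C")
```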

Goals
The degree to which different weighting or matching strategies lead to robust inferences in messy empirical scenarios with multiple treatment groups is unknown. We seek to understand the scenarios in which all methods perform similarly, as well as scenarios that produce divergent inferences.
To identify when estimators produce unbiased and efficient estimates in a variety of settings, we compare 4 estimators, each of which uses propensity scores differently in treatment effect estimation:
1. Inverse Probability of Treatment Weighting (IPTW) (weighting)
2. Kernel Weighting (KW) (weighting + matching)
3. Vector Matching (VM) (matching)
4. Vector-Based Kernel Weighting (VBKW) (weighting + matching)

Inverse Probability of Treatment Weights
In estimating $E[Y(A) - Y(B)]$,
\[ W = \begin{cases} \dfrac{1}{p(t = A \mid x)}, & \text{if } t = A \\[1ex] \dfrac{1}{p(t = B \mid x)}, & \text{if } t = B \end{cases} \]
In estimating $E[Y(A) - Y(B) \mid T = A]$,
\[ W = \begin{cases} 1, & \text{if } t = A \\[1ex] \dfrac{p(t = A \mid x)}{p(t = B \mid x)}, & \text{if } t = B \end{cases} \]
Incorrectly estimated IPTWs may have extreme values, increasing the variance of the treatment effect estimate and potentially leading to biased estimates.
In pairwise comparisons, the IPTW estimator does not utilize the full propensity score vector.
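The two weight definitions above can be written as a single small function. This is a sketch with hypothetical names; `ps` stands for one unit's estimated propensity score vector.

```python
def iptw_weight(t, ps, estimand="ATE", reference="A"):
    """IPTW weight for one unit.

    t: treatment the unit actually received
    ps: dict of estimated propensity scores, e.g. {"A": 0.5, "B": 0.25, "C": 0.25}
    estimand: "ATE" or "ATT" (ATT is taken among units with T == reference)
    """
    if estimand == "ATE":
        return 1.0 / ps[t]
    if t == reference:
        return 1.0
    return ps[reference] / ps[t]

ps = {"A": 0.5, "B": 0.25, "C": 0.25}
w_ate_a = iptw_weight("A", ps)                    # 1 / 0.5
w_ate_b = iptw_weight("B", ps)                    # 1 / 0.25
w_att_b = iptw_weight("B", ps, estimand="ATT")    # 0.5 / 0.25
```

A very small estimated probability in the denominator produces a very large weight, which is the source of the variance problem noted above.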

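Kernel Weights
In estimating $E[Y(A) - Y(B) \mid T = A]$,
\[ W = \begin{cases} 1, & \text{if } t = A \\[1ex] \dfrac{K_j(D_{iA})}{\sum_{j=1}^{N_B} K_j(D_{iA})}, & \text{if } t = B \end{cases} \]
\[ K_j(D_{iA}) = \begin{cases} \dfrac{3}{4}\left(1 - \left(\dfrac{D_{iA}}{0.06}\right)^2\right), & \text{if } |D_{iA}| < 0.06 \\[1ex] 0, & \text{otherwise} \end{cases} \]
\[ D_{iA} = p_j(A \mid X) - p_i(A \mid X) \]
where i and j index T = A and T = B units, respectively, and $N_B$ is the total number of T = B units.
Weight for estimating $E[Y(A) - Y(B)]$:
\[ W_{E[Y(A) - Y(B)]} = W_{E[Y(A) - Y(B) \mid T = A]} + W_{E[Y(A) - Y(B) \mid T = B]} \]
In pairwise comparisons, the KW estimator does not utilize the full propensity score vector.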
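A minimal sketch of the kernel weights for the ATT among T = A units, assuming the kernel is the Epanechnikov kernel with bandwidth 0.06 and that each comparison unit's total weight is the sum of its normalized kernel values across reference units. That aggregation step and the function names are assumptions made for illustration.

```python
def epanechnikov(d, bandwidth=0.06):
    """Epanechnikov kernel evaluated at a propensity score distance d."""
    u = d / bandwidth
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_weights_att(p_ref, p_cmp, bandwidth=0.06):
    """Weights for comparison (T = B) units when the reference group is T = A.

    p_ref: p(A | X) for each T = A unit (each of these units gets weight 1)
    p_cmp: p(A | X) for each T = B unit
    """
    weights = [0.0] * len(p_cmp)
    for p_i in p_ref:
        k = [epanechnikov(p_j - p_i, bandwidth) for p_j in p_cmp]
        total = sum(k)
        if total == 0.0:
            continue  # no comparison unit within the bandwidth of this reference unit
        for j, k_j in enumerate(k):
            weights[j] += k_j / total  # normalized over the N_B comparison units
    return weights

# One reference unit and three comparison units (toy values).
w = kernel_weights_att([0.50], [0.50, 0.53, 0.70])
```

Closer comparison units receive larger weights, and units farther than the bandwidth from every reference unit receive zero weight.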

Vector Matching*
VM creates matched sets with units that are close on one component of the PS vector and roughly similar on the other components.
To estimate $E[Y(A) - Y(B) \mid T = A]$, $E[Y(A) - Y(C) \mid T = A]$, or $E[Y(B) - Y(C) \mid T = A]$:
1. Refit the PS model to obtain new propensity scores; take the logit transform of the scores.
2. 1:1 greedy match T = A units to T = B units, with replacement, on logit(p(A | X)) within k-means strata of logit(p(C | X)), within a caliper.
3. 1:1 greedy match T = A units to T = C units, with replacement, on logit(p(A | X)) within k-means strata of logit(p(B | X)), within a caliper.
The combination of multiple steps in creating this matched set makes VM relatively complex to implement.
Weight = the number of times a subject is used to create a matched set.
* Lopez MJ, Gutman R. Estimation of causal effects with multiple treatments: A review and new ideas. Statistical Science 2017; 32(3): 432-454.
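A simplified sketch of one matching step (step 2 or 3 above). It assumes the k-means stratum of each unit has already been computed elsewhere and is passed in as a label, and it uses a hypothetical caliper of 0.25 on the logit scale; neither value comes from the slides.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def match_with_replacement(ref, cmp, caliper):
    """1:1 nearest-neighbor matching with replacement, within strata and caliper.

    ref, cmp: lists of (logit_score, stratum) for reference and comparison units.
    Returns, for each comparison unit, the number of times it was used as a match
    (its weight). Reference units with no eligible match are left unmatched.
    """
    counts = [0] * len(cmp)
    for score, stratum in ref:
        candidates = [
            (abs(s - score), j)
            for j, (s, st) in enumerate(cmp)
            if st == stratum and abs(s - score) <= caliper
        ]
        if candidates:
            counts[min(candidates)[1]] += 1  # closest eligible comparison unit
    return counts

# Toy data: (logit of p(A | X), stratum label from k-means on the other component).
ref = [(logit(0.50), 0), (logit(0.52), 0), (logit(0.80), 1)]
cmp = [(logit(0.51), 0), (logit(0.30), 0), (logit(0.79), 0)]
counts = match_with_replacement(ref, cmp, caliper=0.25)
```

In this toy example the third reference unit has no comparison unit in its stratum, so it stays unmatched, which illustrates how the stratification restricts the pool of candidate matches.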

Vector-Based Kernel Weighting
In estimating $E[Y(A) - Y(B) \mid T = A]$,
\[ W = \begin{cases} 1, & \text{if } t = A \\[1ex] \dfrac{K_j(D_{iA})}{\sum_{j=1}^{N_B} K_j(D_{iA})}, & \text{if } t = B \end{cases} \]
\[ K_j(D_{iA}) = \begin{cases} \dfrac{3}{4}\left(1 - \left(\dfrac{D_{iA}}{0.06}\right)^2\right), & \text{if } |D_{iA}| < 0.06 \text{ and } |D_{iB}| < 0.06 \text{ and } |D_{iC}| < 0.06 \\[1ex] 0, & \text{otherwise} \end{cases} \]
\[ D_{iA} = p_j(A \mid X) - p_i(A \mid X), \quad D_{iB} = p_j(B \mid X) - p_i(B \mid X), \quad D_{iC} = p_j(C \mid X) - p_i(C \mid X) \]
where i and j index T = A and T = B units, respectively, and $N_B$ is the total number of T = B units.
Weight for estimating $E[Y(A) - Y(B)]$:
\[ W_{E[Y(A) - Y(B)]} = W_{E[Y(A) - Y(B) \mid T = A]} + W_{E[Y(A) - Y(B) \mid T = B]} \]
This translates to non-zero weight assignment to controls with a similar propensity score vector, instead of controls that are similar only on p(A | X), as in KW.
Rather than matching in several steps, as in VM, VBKW applies propensity score vector matching in one step.
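The only change from kernel weighting is the extra closeness condition on every component of the propensity score vector. The sketch below makes the same assumptions as a plain kernel-weighting implementation would (Epanechnikov kernel, bandwidth 0.06, weights accumulated across reference units); the function names are hypothetical and this is not the authors' planned vbkw command.

```python
def epanechnikov(d, bandwidth=0.06):
    """Epanechnikov kernel evaluated at a propensity score distance d."""
    u = d / bandwidth
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def vbkw_weights_att(ps_ref, ps_cmp, reference="A", bandwidth=0.06):
    """Vector-based kernel weights for comparison units (ATT among the reference group).

    ps_ref, ps_cmp: lists of propensity score vectors (dicts keyed by treatment).
    A comparison unit gets a non-zero kernel value only if it is within the
    bandwidth of the reference unit on every component of the PS vector.
    """
    weights = [0.0] * len(ps_cmp)
    for p_i in ps_ref:
        k = []
        for p_j in ps_cmp:
            close = all(abs(p_j[c] - p_i[c]) < bandwidth for c in p_i)
            k.append(epanechnikov(p_j[reference] - p_i[reference], bandwidth) if close else 0.0)
        total = sum(k)
        if total == 0.0:
            continue  # no comparison unit is close on the full vector
        for j, k_j in enumerate(k):
            weights[j] += k_j / total
    return weights

ps_ref = [{"A": 0.50, "B": 0.30, "C": 0.20}]
ps_cmp = [
    {"A": 0.52, "B": 0.29, "C": 0.19},  # close on all three components
    {"A": 0.50, "B": 0.10, "C": 0.40},  # identical on p(A | X) but far on B and C
]
w = vbkw_weights_att(ps_ref, ps_cmp)
```

The second comparison unit would receive the largest weight under KW because it matches exactly on p(A | X); here it receives zero weight because the rest of its vector differs.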

Vector-Based Kernel Weighting

Features                      VM    KW    VBKW
Requires one step to match          x     x
Requires clustering           x
Weighting                           x     x
Matching                      x     x     x
Utilizes full PS vector       x           x
Transitivity of estimates     x           x

Expectations
We wish to identify scenarios in which inferences are most likely to diverge in finite samples.
We expect estimates from kernel weights (which place low emphasis on extreme weights) to be less biased than IPTW estimates when the data-generating process for the true propensity score is nonlinear and the estimated propensity score model is misspecified.
We expect differences in inferences to be more likely when extreme weights are more likely or when identifying matches is more difficult.

Simulation
We report results for 3 treatment levels and n = 999, as results from the other simulation designs are qualitatively similar.
We look at 12 estimands: 3 ATEs and 9 ATTs. True ATTs equal the true ATEs when treatment effects are homogeneous.
With 3 treatment groups, the true ATEs $E[Y(A) - Y(B)]$, $E[Y(A) - Y(C)]$, and $E[Y(B) - Y(C)]$ were set to -0.1, -0.2, and -0.1, respectively.
Model misspecification was introduced by estimating the propensity score model (mlogit) with main effects only.

Monte Carlo simulation design
Simulation parameters*
- Functional form of the true propensity score model: increasing model complexity through nonlinearity and/or nonadditivity. Based on Setoguchi et al. (2008).
- Number of treatments (k = 3, 4)
- Sample size (n = 999, n = 9,999)
- Sample distribution across treatment groups:
  - Equal distribution of units into treatment groups
  - 50% of sample in one group, remainder split equally
  - 10% of sample in one group, remainder split equally
- Treatment effect distribution:
  - Homogeneous treatment effect
  - Heterogeneous treatment effect (associated with confounder)
  - Heterogeneous treatment effect (associated with outcome-only variable)
* For a given k and n, there are 7 model misspecifications x 12 estimands x 3 sample distributions x 3 effect distributions = 756 unique analytic scenarios in which to compare estimator performance.
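The scenario count in the footnote can be checked by enumerating the design grid. The labels below are placeholders for illustration; the slides do not name the individual misspecification scenarios or estimands.

```python
from itertools import product

# Placeholder labels; only the counts (7, 12, 3, 3) come from the design above.
misspecifications = [f"misspecification_{m}" for m in range(1, 8)]   # 7 PS model scenarios
estimands = [f"estimand_{e}" for e in range(1, 13)]                  # 3 ATEs + 9 ATTs
sample_distributions = ["equal", "50% in one group", "10% in one group"]
effect_distributions = [
    "homogeneous",
    "heterogeneous (confounder)",
    "heterogeneous (outcome-only variable)",
]

scenarios = list(product(misspecifications, estimands,
                         sample_distributions, effect_distributions))
print(len(scenarios))  # 7 * 12 * 3 * 3
```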

Monte Carlo simulation design
Evaluation metrics
- Bias*
- Bias as % of SD of effect estimate*
- Interquartile range (IQR)
- Root-mean-squared error (RMSE)
- Median absolute error (MAE)*
- Number of analytic scenarios with < 40% bias
* Kang and Schafer (2007)
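A sketch of how these metrics might be computed from the Monte Carlo replicates of one analytic scenario. The exact definitions are assumptions based on common usage: bias is the mean estimate minus the truth, percent bias divides that by the SD of the estimates, and MAE is the median absolute deviation from the truth. The estimates shown are made-up numbers, not simulation results.

```python
import math
import statistics

def evaluate(estimates, truth):
    """Summarize the estimates from repeated simulation draws of one scenario."""
    bias = statistics.fmean(estimates) - truth
    sd = statistics.stdev(estimates)
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return {
        "bias": bias,
        "bias_pct_sd": 100.0 * bias / sd,
        "iqr": q3 - q1,
        "rmse": math.sqrt(statistics.fmean((e - truth) ** 2 for e in estimates)),
        "mae": statistics.median(abs(e - truth) for e in estimates),
    }

# Made-up estimates of an effect whose true value is -0.1.
estimates = [-0.12, -0.10, -0.08, -0.11, -0.09]
metrics = evaluate(estimates, truth=-0.1)
```

A scenario would then count toward the last metric when the absolute value of `bias_pct_sd` is below 40.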

VBKW led to least biased and most efficient estimates
Summary of Bias and Efficiency of Estimates

Estimator   Number (%) of analytic scenarios with < 40% bias   Median bias as % of SD   Median absolute bias   Median IQR
IPTW        221 (29)                                            69.626                   0.051                  0.095
KW          356 (47)                                            45.102                   0.030                  0.085
VM          542 (72)                                            26.362                   0.018                  0.103
VBKW        554 (73)                                            17.509                   0.010                  0.075

VBKW less sensitive to PS model misspecification

When treatment effect is homogeneous, IPTW & KW are most likely to be biased

In the presence of heterogeneous, confounder-dependent treatment effects, all strategies likely to produce biased ATEs

VBKW most efficient across various PS model misspecifications

VBKW most efficient across sample distributions

VBKW most efficient across effect distributions

Limitations / Future directions
- Simulations are based on an imposed rather than an empirical DGP. Future work will include plasmode simulations based on an empirical DGP.
- Our estimated PS model contained only main effects, to test robustness to misspecification. Researchers should ensure that the propensity score leads to adequate covariate balance.
- We did not test the sensitivity of results to observed covariate choice or covariate measurement error, nor did we examine doubly robust estimates.
- Future work: plasmode simulations, generalized boosted models, covariate balancing propensity scores, variable bandwidth, assessment of covariate balance, robustness to unobserved confounding.
- Creation of a vbkw command.

Discussion
- Simulation results suggest VBKW is less sensitive than the other methods to PS model misspecification and to the sample distribution across treatment groups.
- VBKW performs only slightly better than VM, but is simpler to implement.
- IPTW and KW are not well suited to producing unbiased estimates of transitive effects.
- None of the estimators consistently produced unbiased estimates when treatment effects were heterogeneous and depended on a confounder.

Contact Information
Thank you! For comments, questions, or suggestions, or to request a copy of the working paper:
Jessica Lum, Jessica.Lum2@va.gov
1. Project supported by VA HSR&D IIR 16-140 (PI: Garrido).
2. The views expressed here are those of the authors and not necessarily those of the Department of Veterans Affairs or the United States Government.