Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Similar documents
Covariate Balancing Propensity Score for General Treatment Regimes

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

The propensity score with continuous treatments

Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

What s New in Econometrics. Lecture 1

Combining multiple observational data sources to estimate causal eects

Propensity Score Weighting with Multilevel Data

Propensity Score Methods for Causal Inference

Achieving Optimal Covariate Balance Under General Treatment Regimes

Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments

Discussion of Papers on the Extensions of Propensity Score

Robustness to Parametric Assumptions in Missing Data Models

arxiv: v1 [stat.me] 15 May 2011

Data Integration for Big Data Analysis for finite population inference

For more information about how to cite these materials visit

Gov 2002: 4. Observational Studies and Confounding

Vector-Based Kernel Weighting: A Simple Estimator for Improving Precision and Bias of Average Treatment Effects in Multiple Treatment Settings

IP WEIGHTING AND MARGINAL STRUCTURAL MODELS (CHAPTER 12) BIOS IPW and MSM

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Propensity Score Analysis with Hierarchical Data

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Extending causal inferences from a randomized trial to a target population

Fractional Imputation in Survey Sampling: A Comparative Review

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Telescope Matching: A Flexible Approach to Estimating Direct Effects

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

High Dimensional Propensity Score Estimation via Covariate Balancing

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1

Gov 2002: 13. Dynamic Causal Inference

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:

Causal modelling in Medical Research

The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Accounting for Complex Sample Designs via Mixture Models

Robust Estimation of Inverse Probability Weights for Marginal Structural Models

University of California, Berkeley

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

An Introduction to Causal Analysis on Observational Data using Propensity Scores

Lecture 4 Multiple linear regression

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting)

Comment: Understanding OR, PS and DR

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:

6 Pattern Mixture Models

INVERSE PROBABILITY WEIGHTED ESTIMATION FOR GENERAL MISSING DATA PROBLEMS

Variable selection and machine learning methods in causal inference

New Developments in Nonresponse Adjustment Methods

Counterfactual Model for Learning Systems

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION

Statistical Estimation

Selection on Observables: Propensity Score Matching.

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

A Sampling of IMPACT Research:

Potential Outcomes Model (POM)

ABSTRACT CAUSAL INFERENCE WITH A CONTINUOUS TREATMENT AND OUTCOME: ALTERNATIVE ESTIMATORS FOR PARAMETRIC DOSE-RESPONSE FUNCTIONS WITH APPLICATIONS

Flexible Estimation of Treatment Effect Parameters

Modification and Improvement of Empirical Likelihood for Missing Response Problem

9 Correlation and Regression

Estimating the Marginal Odds Ratio in Observational Studies

Causal Inference Basics

Simulation-Extrapolation for Estimating Means and Causal Effects with Mismeasured Covariates

NBER WORKING PAPER SERIES USE OF PROPENSITY SCORES IN NON-LINEAR RESPONSE MODELS: THE CASE FOR HEALTH CARE EXPENDITURES

Matching for Causal Inference Without Balance Checking

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing

CompSci Understanding Data: Theory and Applications

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Covariate Balancing Propensity Score for a Continuous Treatment: Application to the Efficacy of Political Advertisements

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

Summary and discussion of The central role of the propensity score in observational studies for causal effects

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Weighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i.

ENTROPY BALANCING IS DOUBLY ROBUST QINGYUAN ZHAO. Department of Statistics, Stanford University DANIEL PERCIVAL. Google Inc.

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek

1 Motivation for Instrumental Variable (IV) Regression

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

Measuring Social Influence Without Bias

ECNS 561 Multiple Regression Analysis

Calibration Estimation for Semiparametric Copula Models under Missing Data

Semiparametric Generalized Linear Models

Since the seminal paper by Rosenbaum and Rubin (1983b) on propensity. Propensity Score Analysis. Concepts and Issues. Chapter 1. Wei Pan Haiyan Bai

ENTROPY BALANCING IS DOUBLY ROBUST. Department of Statistics, Wharton School, University of Pennsylvania DANIEL PERCIVAL. Google Inc.

,..., θ(2),..., θ(n)

New Developments in Econometrics Lecture 16: Quantile Estimation

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014

Gov 2002: 5. Matching

Bayesian Analysis of Latent Variable Models using Mplus

ECON3150/4150 Spring 2015

Model Assisted Survey Sampling

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Transcription:

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census Bureau joint work with Doug Galagate May 12, 2017

Outline 1. Motivation and setup 2. Methods 3. Simulation 4. Real example 5. Extensions Disclaimer: This presentation is intended to inform interested parties of ongoing research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau. Causal Inference with Continuous Treatment and Outcome 2/31

1. Motivation Methods for causal inference from nonrandomized studies typically assume the treatment is binary; continuous-dose treatments are less well studied Two different generalizations of the propensity score (Imai and van Dyk, 2004; Hirano and Imbens, 2004) Marginal structural models (Robins, Hernan and Brumback, 2000) could be applied Start with the simplest, plain-vanilla version Suppose that the average causal dose-response relationship in the population follows a simple parametric form (e.g., linear) Is there a potential-outcome version of ordinary linear regression? How would one estimate this causal regression line? Which ways are best? Causal Inference with Continuous Treatment and Outcome 3/31

Example: Smoking and Medical Expenditures Previously analyzed by Imai and van Dyk (2004) Data from National Medical Expenditure Survey (NMES) Multistage cluster sample from area frame, with oversampling of certain groups Interviews in 1987 from persons in sampled HHs, with expenditures obtained from health care providers Use smokers over the age of 18 (roughly 10,000 in sample) Examine relationship between Y i = expenditures and T i = packyears, a measure of cumulative lifetime smoking Causal Inference with Continuous Treatment and Outcome 4/31

6000 totalexp 0 0 2000 4000 100000 50000 totalexp 150000 8000 Example (continued) 0 50 100 150 200 0 50 packyears 100 150 200 packyears Red lines are loess curves Plot on RHS omits top 5% of Y values Relationship is distorted by many potential confounders (e.g., age) and by sample design We will return to this example later Causal Inference with Continuous Treatment and Outcome 5/31

Usual Setup for Binary Treatment T i = treatment received by unit i (0=control, 1=experimental) Y i = {Y i (0),Y i (1)} set of potential outcomes Y i (1) Y i (0) = treatment effect for unit i (unobservable) Y i = Y i (T i ) = T i Y i (1)+(1 T i )Y i (0) observed outcome E(Y i (1) Y i (0)) = population average treatment effect (PATE) Data available for estimating PATE are (T i,y i, i ), i = 1,...,N, where i = (X i1,...,x ip ) is a vector of pre-treatment covariates For review and history of this binary case, see Rubin (2005) Causal Inference with Continuous Treatment and Outcome 6/31

Setup for Continuous Treatment T i T = (t min,t max ) real interval Y i = {Y i (t) : t T } Y i (t) is function or path indexed by t that describes all treatment effects for unit i Our goal is to estimate the average dose-response function (ADRF) in the population, µ(t) = E(Y i (t)) for t T We cannot observe Y i (t); we see only one randomly chosen point along the path, Y i = Y i (T i ), along with T i and i. The regression of Y i on T i estimates µ (t) = E(Y i (t) T i = t) which, in general, is not the same as the ADRF. Causal Inference with Continuous Treatment and Outcome 7/31

Simulated example Response (Y) 0 20 40 60 80 100 µ (t) µ(t) 8 10 12 14 16 18 20 Dose (T) Simulated sample of N = 200 observed points (T i,y i ), with representative potential-outcome paths (gray lines), average causal dose response function µ(t) = E(Y i (t)), and regression curve µ (t) = E(Y i (t) T i = t). Causal Inference with Continuous Treatment and Outcome 8/31

Parameterizing the dose-response relationship In most previous work, authors made few direct assumptions about the form of Y i (t) or µ(t) We will suppose that Y i (t) = θ i (t), where (t) = (b 1 (t),...,b k (t)) is a vector of known basis functions, and θ i = (θ i1,...,θ ik ) is a vector of unknown coefficients specific to unit i An important special case is (t) = (1,t), which specifies linear paths whose intercepts and slopes may vary Inference for µ(t) = E(Y i (t)) becomes a matter of estimating a k-dimensional parameter ξ = E(θ i ) = (ξ 1,...,ξ k ). We cannot observe θ i, but only Y i = Y i (T i ) = θ i i, where i = (T i ) = (B i1,...,b ik ). Thus Y i is a randomly selected linear combination of the components of θ i, a type of coarsened data (Heitjan and Rubin, 1991) Causal Inference with Continuous Treatment and Outcome 9/31

Additional assumptions Stable Unit Treament Value Assumption (SUTVA): Treatment applied to any unit has no impact on the outcome for any other unit (Rubin, 1980) Strong ignorability: T i θ i i Positivity: P(T i T 0 i ) > 0 for every i in the population and every set T 0 T with positive measure Each technique for estimating ξ will require us to model the distribution of T i given i and/or the distribution of θ i given i. Consistency will require at least some aspects of these models to be correctly specified. We envision situations where none of these models are precisely true, and look for estimators that perform well under moderate amounts of misspecification. Causal Inference with Continuous Treatment and Outcome 10/31

View from function space 2. Methods Under our parametric assumptions, each path Y i (t) is represented by a point θ i = (θ i1,...,θ ik ) in k-dimensional space Once we observe Y i, we know that θ i lies in the (k 1)-dimensional hyperplane L i = {θ i : θ i (t) = Y i t T} If we could observe the exact locations of the θ i = (θ i1,θ i2 ), then the center of the point cloud, ˆξ = N 1 N i=1, would be an unbiased estimate for ξ. Fortunately, there are some observable magic vectors whose expectations are equal to E(θ i ) = ξ. Causal Inference with Continuous Treatment and Outcome 11/31

Magic vector #1. Under a model for P(T i i ), the vector θ i = Ï i i Y i, where Ï i = E( i i i ) 1, has expectation ξ if that T-model is correct. (Matrix generalization of the Horvitz-Thompson element π 1 i I i Y i.) Magic vector #2. Under a model for P(θ i i ), the vector ˆθ i = E(θ i i ) has expectation ξ if that Y-model is correct. Magic vector #3. Under models for P(T i i ) and P(θ i i ), the vector ˆθ i = θ i + (I k Ï i i i )ˆθ i has expectation ξ if either the T-model or the Y-model is correct. Causal Inference with Continuous Treatment and Outcome 12/31

Method #0: The naive approach Regress Y i on i = (T i ) ( N ˆξ = i i i=1 ) 1 ( N ) i Y i. i=1 Solves Í(ξ) = N i=1 Í i = 0, where Í i = Í i (ξ) = i (Y i i ξ) = i i (θ i ξ) is a vector of estimating functions Adapting terminology from Holland (1980), we call this the prima facie estimator Ignores information in i Would be consistent if T i and θ i were independent Performs poorly, not recommended Causal Inference with Continuous Treatment and Outcome 13/31

What if we toss in the covariates? In practice, many analysts adjust for confounders by including them as additional predictors, regressing Y i on ( i, i ) and hoping for a good result. Can work surprisingly well in some cases, badly in others The ADRF describes the marginal mean of Y i (t), which requires averaging over covariates, not conditioning on them, and this difference often goes unappreciated. When analysts do this, the connection to potential outcomes is rarely made explicit. IMO, we should stop teaching people to do this. Causal Inference with Continuous Treatment and Outcome 14/31

Method #1: Importance weighting Robins, Hernan and Brumback (RHB) (2000) use the estimating function of the form Í i = P(T i ) P(T i i ) i(y i i ξ) Equivalent to weighted least-squares (WLS) regression of Y i on i, with weights P(T i )/P(T i i ); RHB call them stabilized weights Related to importance sampling (Hammersley and Handscomb, 1964) Adjusts the expectation of the prima facie Í i to what it would be if ( i,t i,y i ) were sampled from a population in which T i is independent of i Causal Inference with Continuous Treatment and Outcome 15/31

This is a classic case where importance sampling tends to fail, because the target density P(T i ) is more diffuse than the actual density P(T i i ), causing the weights to be highly unstable. Sensitive to misspecification of P(T i i ) and P(T i ) in the tails Sometimes works well when T i is discrete, but not recommended for continuous treatments Causal Inference with Continuous Treatment and Outcome 16/31

Method #2: Inverse second-moment weighting Taking Í i = (Ï i i Y i ξ) for Ï i = [E( i i i )] 1 gives ˆξ = 1 N ( N ) Ï i i Y i. i=1 A natural extension of inverse probability of treatment weighting (IPTW) for a binary T i (Hirano and Imbens, 2001). A modified version that normalizes the weights is ( N ) 1 ( N ) ˆξ = Ï i i i Ï i i Y i i=1 Not a typical WLS regression; the weight Ï i is a matrix, and the information i Ï i i i is asymmetric i=1 Relies on a model for P(T i i ), but is more stable and robust than importance weighting Causal Inference with Continuous Treatment and Outcome 17/31

Method # 3: Regression prediction Taking Í i = (ˆθ i ξ), where ˆθ i = E(θ i i ) under a model, gives N ˆξ = N 1 ˆθ i i=1 For example, suppose θ i i N(ν +Γ i,σ). Under strong ignorability, we can fit this model by regressing Y i on i and ( i i ). Including the interactions between i and i seems key. The difficulty of doing this as the lengths of i and i grow is the curse of dimensionality Causal Inference with Continuous Treatment and Outcome 18/31

Method # 4: Regression prediction with residual bias correction Taking leads to Í i = Ï i i (Y i i ξ) + (Á Ï i i i )(ˆθ i ξ). ˆξ = 1 N where Ŷi = i ˆθ i. N ˆθ i + 1 N i=1 N Ï i i (Y i Ŷi), i=1 Requires models for P(T i i ) and P(θ i i ) Asymptotically unbiased if either model is correct Related to generalized regression (GREG) estimators from survey literature (Deville and Särndal, 1992), but with matrix weights Causal Inference with Continuous Treatment and Outcome 19/31

Method # 5: Prediction from weighted regression Like regression prediction (Method 3), except that we apply the weight matrices Ï i = [E( i i i )] 1 when estimating ˆθ i = E(θ i i ). Requires models for P(T i i ) and P(θ i i ) Asymptotically unbiased if either model is correct Performance is similar to Method 4 in large samples, but very unstable in small samples Causal Inference with Continuous Treatment and Outcome 20/31

Method # 6: Propensity-spline prediction If P(T i = t i ) depends on i only through a ψ( i ) (typically of smaller dimension), then ψ( i ) is a propensity function (PF) (Imai and van Dyk, 2004) For example, if T i i N( i β,σ2 ), then ψ = (β,σ 2 ), then the linear predictor i β is a PF. Inspired by Little and An (2004), build a rich prediction model for ˆθ i = E(θ i i ) (Method 3), but include a spline basis for ψ( i ) as additional predictors Imai and van Dyk (2004) reduce the model to ˆθ i = E(θ i ψ( i )), but this reduction is unnecessary and inefficient Causal Inference with Continuous Treatment and Outcome 21/31

3. Simulation study Evaluate performance of estimators samples of N = 200 and N = 1,000 from a population where the true ADRF is constant Response (Y) 0 20 40 60 80 100 µ (t) µ(t) 8 10 12 14 16 18 20 Dose (T) Causal Inference with Continuous Treatment and Outcome 22/31

Simulation study (continued) Y i (t) = θ i1 + θ i2 t (θ i1,θ i2,t i ) is multivariate normal with mean vector (ξ 1,ξ 2,κ) = (50,0,12) and covariance matrix Ω = ω 11 ω 12 ω 13 ω 12 ω 22 ω 23 ω 13 ω 23 ω 33 = 51.0 3.80 5.92 3.80 0.55 0.51 5.92 0.51 2.02 θ i1, θ i2 and T i are linearly related to true normally distributed covariates A i1,...,a i8 which are hidden from view; analyst sees transformed versions A i1,...,a i8 which are skewed, bounded and binary, leading to model misspecification. Causal Inference with Continuous Treatment and Outcome 23/31

Simulation results Performance of estimators for ξ 2 over 1,000 samples from the artificial population using incorrect Y-models and misspecified but rich T-models. Sample size Method Bias Var. % Bias RMSE MAE N = 200 Prima facie 7.05 0.35 1,190 7.08 7.06 Importance weighting 3.87 2.54 243 4.19 4.10 Inverse second-moment weighting 0.225 0.504 32 0.745 0.491 Regression prediction 0.616 0.500 87 0.938 0.660 Prediction + residual bias correction 0.268 0.496 38 0.753 0.493 Prediction from weighted regression 0.249 14.9 6 3.87 0.570 Propensity-spline prediction 0.166 0.585 22 0.783 0.531 N = 1, 000 Prima facie 7.06 0.063 2,820 7.07 7.05 Importance weighting 2.75 1.89 200 3.07 3.07 Inverse second-moment weighting 0.144 0.083 50 0.322 0.215 Regression prediction 0.660 0.089 221 0.725 0.661 Prediction + residual bias correction 0.158 0.082 55 0.327 0.225 Prediction from weighted regression 0.133 0.087 45 0.324 0.219 Propensity-spline prediction 0.106 0.084 36 0.308 0.204 % Bias = 100 Bias / Var, MAE = median absolute error Causal Inference with Continuous Treatment and Outcome 24/31

Simulation summary Don t use importance weighting Regression prediction isn t terrible, but it seems more susceptible to model failure than other methods Inverse second-moment weighting uses only a T-model, but is surprisingly efficient and robust when the ADRF is linear Weighted regression prediction is unstable in small samples Other dual-model methods (prediction + bias correction, propensity-spline prediction) are competitive Causal Inference with Continuous Treatment and Outcome 25/31

Example: Smoking and Medical Expenditures Previously analyzed by Imai and van Dyk (2004) Data from National Medical Expenditure Survey (NMES) Multistage cluster sample from area frame, with oversampling of certain groups Interviews in 1987 from persons in sampled HHs, with expenditures obtained from health care providers Use smokers over the age of 18 (roughly 10,000 in sample) Examine relationship between Y i = expenditures and T i = packyears, a measure of cumulative lifetime smoking Causal Inference with Continuous Treatment and Outcome 26/31

6000 totalexp 0 0 2000 4000 100000 50000 totalexp 150000 8000 4. Example: Smoking and Medical Expenditures 0 50 100 150 200 0 packyears 50 100 150 200 packyears Estimate the effect of Ti = packyears on Yi = expenditures, supposing that the ADRF is linear (!) Potential confounders include age, sex, race/ethnicity, marital status, education, region, income/poverty, seatbelt use Causal Inference with Continuous Treatment and Outcome 27/31

Complications Complex sample design, with strata, clusters and oversampling Modify the estimating functions to ci w i Í i, where c i = 1 for smokers and 0 otherwise, and w i is the survey weight (number of pop. persons represented by the sampled person) Compute standard errors by a linearization (sandwich) method appropriate for general with replacement designs (Binder, 1983) Smoking status and packyears are missing for 11% of the sample We multiply imputed them M = 25 times under a two-part regression model Recovers an additional 1,000 smokers Causal Inference with Continuous Treatment and Outcome 28/31

Prima facie estimate totalexp 0 2000 4000 6000 8000 0 50 100 150 200 packyears slope = 25.08 (SE=2.20) No adjustment for potential confounders Causal Inference with Continuous Treatment and Outcome 29/31

Inverse second-moment estimate Model the treatment using a heteroscedastic linear regression for cube root of packyears, with variance mean 1.8 Include all main effects and many two-way interactions (50 predictors) totalexp 0 2000 4000 6000 8000 0 50 100 150 200 packyears slope = 10.22 (SE=3.06) Causal Inference with Continuous Treatment and Outcome 30/31

5. Extensions Our parametric assumption maps each Y i (t) to a random vector θ i R k, leading to a wide variety of estimation methods Assumed linear form Y i (t) = (t) θ i is key, because the ADRF is completely determined by E(θ i ) Generalizations to nonlinear forms will be tricky, because they depend on other aspects of the distibution of θ i s (e.g., covariances) which are more difficult to estimate from Y i Generalizations to discrete responses will also be tricky, because the θ i s will no longer be coarsened to hyperplanes Thanks for listening! Causal Inference with Continuous Treatment and Outcome 31/31