Causal Mediation Analysis

Causal Mediation Analysis. Tyler J. VanderWeele, Ph.D. Upcoming Seminar: April 21-22, 2017, Philadelphia, Pennsylvania

Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press, 2015. Hardcover ISBN: 9780199325870. Available on Amazon or from Oxford University Press.

Plan of Presentation
(1) Concepts and Methods for Mediation
(2) Sensitivity Analysis
(3) Mediation with Time-to-Event Outcome
(4) Multiple Mediators
(5) Surrogate Outcomes
(6) A Unification of Mediation and Interaction

1. Concepts and Methods
Tyler J. VanderWeele
Departments of Epidemiology and Biostatistics
Harvard T.H. Chan School of Public Health

Plan of Presentation
(1) Motivating Examples
(2) Traditional Approaches and Limitations
(3) Counterfactual Concepts
(4) Regression-Based Methods
(5) Binary Outcomes and Mediators
(6) Empirical Examples
(7) Macros and Software
(8) Monte-Carlo Approach
(9) Study Design

Questions of Mediation
In a number of research contexts we might be interested in the extent to which the effect of some exposure A on some outcome Y is mediated by an intermediate variable M, and to what extent it is direct.
[Diagram: A → M → Y, plus a direct path A → Y]
Stated another way, we are interested in the direct and indirect effects of the exposure.

Genetics Example
Lung Cancer: In 2008, three GWAS studies (Thorgeirsson et al., 2008; Hung et al., 2008; Amos et al., 2008) identified variants on chromosome 15q25.1 that were associated with lung cancer.
Smoking: These variants had also been shown to be associated with smoking behavior (average cigarettes per day), e.g. through nicotine dependence (Saccone et al., 2007; Spitz et al., 2008).
Debate: There was debate as to whether the effect on lung cancer is direct or operates through pathways related to smoking behavior (Chanock and Hunter, 2008); two of the studies suggested a direct effect, one a mediated effect.
Interaction: Complicating matters further, there was some evidence of gene-environment interaction: carriers of the variant allele extract more nicotine and toxins from each cigarette (Le Marchand, 2008).

Perinatal Epidemiology Example
ART: There is evidence that use of assisted reproductive technologies (ART) leads to worse birth outcomes.
Twins: It is also clear that use of ART leads to a high incidence of twins; being born as a twin also leads to worse birth outcomes.
Mediation? To what extent is the effect of ART on birth outcomes due to twinning? To what extent is it through other pathways?
Policy Relevance: Twins could mostly be prevented for those using ART, e.g. by only allowing single embryo transfer. Some countries, e.g. Sweden, have adopted this policy.

Standard Approach
The standard approach to mediation analysis in much epidemiologic and social science research consists first of regressing the outcome Y on the exposure A and confounding factors C:
$$E[Y \mid A=a, C=c] = \phi_0 + \phi_1 a + \phi_2 c$$
and then comparing the estimate $\phi_1$ for exposure A with the estimate $\theta_1$ obtained when the potential mediator M is included in the regression model:
$$E[Y \mid A=a, M=m, C=c] = \theta_0 + \theta_1 a + \theta_2 m + \theta_4 c$$
(The index $\theta_3$ is reserved for the exposure-mediator interaction term introduced later.)
If the coefficients $\phi_1$ and $\theta_1$ differ, then some of the effect is thought to be mediated, and the following estimates are often used as the usual measures of direct and indirect effect:
Indirect effect = $\phi_1 - \theta_1$
Direct effect = $\theta_1$

Standard Approach
Using the difference between the two coefficients is sometimes called the difference method.
Another standard method, used more commonly in the social sciences, is sometimes referred to as the product method (Baron and Kenny, 1986):
One regresses M on A:
$$E[M \mid A=a, C=c] = \beta_0 + \beta_1 a + \beta_2 c$$
One regresses Y on M and A:
$$E[Y \mid A=a, M=m, C=c] = \theta_0 + \theta_1 a + \theta_2 m + \theta_4 c$$
The direct effect is once again $\theta_1$.
The indirect or mediated effect is the product of the coefficient of A in the regression for M and the coefficient of M in the regression for Y: $\beta_1 \theta_2$.
The product method and the difference method will coincide for continuous outcomes, provided the models are correctly specified, but not for binary outcomes (MacKinnon and Dwyer, 1993; MacKinnon et al., 1995).
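As a rough illustration of the difference and product methods, the following sketch fits the three linear regressions by ordinary least squares on simulated data. The data-generating coefficients (0.8, 0.6, 0.4 and so on), the sample size, and the helper name `ols` are assumptions made for this example only; they do not come from the slides. With a continuous outcome, no interaction, and the same covariates in every model, the two estimates of the indirect effect should agree.

```python
import numpy as np


def ols(y, *columns):
    """Return least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical linear data-generating process (all numbers are illustrative).
rng = np.random.default_rng(0)
n = 5000
c = rng.normal(size=n)                                      # measured confounder C
a = rng.binomial(1, 0.5, size=n).astype(float)              # exposure A
m = 0.5 + 0.8 * a + 0.3 * c + rng.normal(size=n)            # mediator M
y = 1.0 + 0.4 * a + 0.6 * m + 0.2 * c + rng.normal(size=n)  # outcome Y

phi = ols(y, a, c)       # [phi_0, phi_1, phi_2]
theta = ols(y, a, m, c)  # [theta_0, theta_1, theta_2, theta_4]
beta = ols(m, a, c)      # [beta_0, beta_1, beta_2]

direct_effect = theta[1]
indirect_difference = phi[1] - theta[1]  # difference method
indirect_product = beta[1] * theta[2]    # product method

print(f"direct effect (theta_1):           {direct_effect:.3f}")
print(f"indirect effect, difference method: {indirect_difference:.3f}")
print(f"indirect effect, product method:    {indirect_product:.3f}")
```

In this simulated setting the indirect effect built into the data is 0.8 × 0.6 = 0.48 and the direct effect is 0.4. The agreement between the two methods relies on the linear, no-interaction setup; it is not expected to carry over to binary outcomes.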

Standard Approach
The standard approach to mediation analysis of just including the mediator in the regression is subject to two important limitations.
PROBLEM 1: Even if the exposure is randomized, or if all of the exposure-outcome confounders are included in the model, there may be confounders of the mediator-outcome relationship.
[Diagram: A → M → Y, with exposure-outcome confounders C1 and an unmeasured mediator-outcome confounder U]
If control is not made for the mediator-outcome confounders, then results from the standard approach can be highly biased.

Mediator-Outcome Confounding
In many social and biomedical studies, careful thought is given to control for confounding of the exposure-outcome relationship; data are collected on all variables thought to confound the relationship between the exposure and the outcome (C1 in the diagram).
However, often little thought is given to collecting data on variables that might confound the mediator-outcome relationship (C2 in the diagram).
Mediation analyses are often secondary analyses in social and biomedical research, and these variables often are not controlled for.
[Diagram: A → M → Y, with exposure-outcome confounders C1 and mediator-outcome confounders C2]

Mediator-Outcome Confounding
Just as unmeasured exposure-outcome confounders can generate confounding bias in estimates of overall effects, so also unmeasured mediator-outcome confounders can generate bias in estimates of direct and indirect effects.
[Diagram: A → M → Y, with exposure-outcome confounders C1 and an unmeasured mediator-outcome confounder U]

Mediator-Outcome Confounding
The importance of controlling for mediator-outcome confounders when examining direct and indirect effects was pointed out early on in the psychology literature on mediation (Judd and Kenny, 1981).
However, a later paper in the psychology literature (Baron and Kenny, 1986) came to be the canonical reference for mediation analysis in the social sciences (>45,000 citations on Google Scholar).
Unfortunately, the Baron and Kenny (1986) paper did not note that control needed to be made for mediator-outcome confounders in the estimation of direct and indirect effects, even though the point had been made by Judd and Kenny five years earlier in 1981 and even though the two papers shared an author.
As a result, the point has been ignored by most of the research on mediation in the social sciences; many of these analyses are thus likely biased (possibly severely).
Contrary to claims made in the psychology literature, mediator-outcome confounding is an issue for mediation analysis even in randomized trials!

Mediator-Outcome Confounding
SMaRT trial (Strong et al., 2008): a randomized cognitive behavioral therapy (CBT) intervention.
The effect on depression symptoms after 3 months (SCL-20 depression, scale 0-4) was:
$$E[Y \mid A=1] - E[Y \mid A=0] = -0.34 \quad (95\%\ \text{CI}: -0.55, -0.13)$$
The intervention also had an effect on the use of antidepressants, M, at three months:
$$E[M \mid A=1] - E[M \mid A=0] = 0.27$$
Those in the CBT arm were more likely to use antidepressants.
Does the CBT intervention affect depressive symptoms simply because of higher antidepressant use, or through other pathways?
What happens when we regress the outcome Y on treatment and mediator (antidepressant use)?

Mediator-Outcome Confounding
The coefficient for antidepressant use is positive! It looks like antidepressant use increases depression!
The mediated effect through antidepressant use looks detrimental!
The direct effect looks larger than the total effect!
What is going on here?

Mediator-Outcome Confounding
Those in more difficult situations both use an antidepressant and have higher levels of depressive symptoms.
When we ignore this confounding we get paradoxical results!
Antidepressant use is not randomized here.
But there are of course other trials that have randomized antidepressant use.
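The paradox described above can be reproduced in a small simulation. The sketch below is purely hypothetical and does not use the SMaRT data: treatment A is randomized, an unmeasured severity variable U raises both the mediator M (here a continuous stand-in for antidepressant use) and the outcome Y, and M truly lowers Y. All coefficients, the variable names, and the helper `ols` are assumptions chosen for illustration.

```python
import numpy as np


def ols(y, *columns):
    """Return least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical data-generating process; none of these numbers come from SMaRT.
rng = np.random.default_rng(1)
n = 20000
a = rng.binomial(1, 0.5, size=n).astype(float)         # randomized treatment A
u = rng.normal(size=n)                                 # unmeasured severity U
m = 0.3 * a + 1.0 * u + rng.normal(size=n)             # mediator M (continuous stand-in)
y = -0.2 * a - 0.3 * m + 1.0 * u + rng.normal(size=n)  # symptoms Y; M truly lowers Y

total = ols(y, a)[1]        # total effect of A (valid because A is randomized)
naive = ols(y, a, m)        # ignores U: [intercept, coef of A, coef of M]
adjusted = ols(y, a, m, u)  # controls for U: [intercept, coef of A, coef of M, coef of U]

print(f"total effect of A:               {total:.3f}")
print(f"naive coefficient of M:          {naive[2]:.3f}")
print(f"naive 'direct effect' of A:      {naive[1]:.3f}")
print(f"adjusted coefficient of M:       {adjusted[2]:.3f}")
print(f"adjusted direct effect of A:     {adjusted[1]:.3f}")
```

Under these assumptions the naive coefficient of M comes out positive even though its true value is -0.3, and the naive direct effect is larger in magnitude than the total effect. Adjusting for U restores the signs, which is only possible here because the simulation observes U.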

Mediator-Outcome Confounding
There are essentially two approaches to address mediator-outcome confounding (ideally both will be used):
(1) If mediation analysis is going to be part of an epidemiologic study, then careful thought should be given to collecting data on mediator-outcome confounding variables during the study design stage.
(2) After the study is finished, if there are unmeasured mediator-outcome confounders, then sensitivity analysis techniques can be used to assess the extent to which the unmeasured confounding variable would have to affect the mediator and the outcome (and possibly the exposure) in order to invalidate inferences about direct and indirect effects (VanderWeele, 2010; Imai et al., 2010; Hafeman, 2011; Tchetgen Tchetgen and Shpitser, 2012).

Exposure-Mediator Interactions
PROBLEM 2: Interactions between the effects of the exposure and the mediator, if present and neglected, can lead to biases.
Even if we include an interaction term in the regression model:
$$E[Y \mid A=a, C=c] = \phi_0 + \phi_1 a + \phi_2 c$$
$$E[Y \mid A=a, M=m, C=c] = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4 c$$
the usual measures of direct and indirect effect
Indirect effect = $\phi_1 - \theta_1$
Direct effect = $\theta_1$
break down, because it is unclear how to handle the interaction coefficient $\theta_3$.

Exposure-Mediator Interactions
In addition to clarifying the various no-unmeasured-confounding assumptions that are needed in mediation analysis, the early causal inference literature on mediation (Robins and Greenland, 1992; Pearl, 2001) provided definitions of direct and indirect effects that could be used even when there were interactions between the effects of the exposure and the mediator on the outcome, and that could also be used with non-linear models.
In what follows we will:
(1) Consider the causal ("counterfactual") definitions of direct and indirect effects for mediation analysis and discuss the no-unmeasured-confounding assumptions required for identification
(2) Describe regression methods that can be used to estimate these counterfactual direct and indirect effect quantities (e.g. VanderWeele and Vansteelandt, 2009, 2010; cf. Imai et al., 2010)
(3) Provide sensitivity analysis techniques to assess the importance of possible violations of the no-unmeasured-confounding assumptions

Definitions
Let Y denote some outcome of interest for each individual.
Let A denote some exposure or treatment of interest for each individual.
Let M denote some post-treatment intermediate(s) for each individual (potentially on the pathway between A and Y).
Let C denote a set of covariates for each individual.
Let $Y_a$ be the counterfactual outcome (or potential outcome) Y for each individual when intervening to set A to a.
Let $M_a$ be the counterfactual outcome M for each individual when intervening to set A to a.
Let $Y_{am}$ be the counterfactual outcome Y for each individual when intervening to set A to a and M to m.

Definitions
Robins and Greenland (1992) and Pearl (2001) proposed the following counterfactual definitions for direct and indirect effects:
Controlled direct effect: the controlled direct effect comparing treatment level A=1 to A=0, intervening to fix M=m:
$$\mathrm{CDE}(m) = Y_{1m} - Y_{0m}$$
Natural direct effect: the natural direct effect comparing treatment level A=1 to A=0, intervening to fix $M=M_0$:
$$\mathrm{NDE} = Y_{1M_0} - Y_{0M_0}$$
Natural indirect effect: the natural indirect effect comparing the effects of $M=M_1$ versus $M=M_0$, intervening to fix A=1:
$$\mathrm{NIE} = Y_{1M_1} - Y_{1M_0}$$

Properties of Direct and Indirect Effects
A total effect decomposes into a direct and an indirect effect:
$$Y_1 - Y_0 = Y_{1M_1} - Y_{0M_0} = (Y_{1M_1} - Y_{1M_0}) + (Y_{1M_0} - Y_{0M_0}) = \mathrm{NIE} + \mathrm{NDE}$$
The definitions of natural direct and indirect effects do not presuppose the absence of interaction between the effects of the exposure and the mediator on the outcome.
The decomposition of a total effect into a natural direct and a natural indirect effect likewise does not presuppose the absence of such interaction.
Natural direct and indirect effects are useful for effect decomposition; in general, controlled direct effects are not.
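To make the counterfactual definitions concrete, the following sketch evaluates them for a single hypothetical individual whose mediator and outcome are generated by known structural equations. The functions `mediator` and `outcome`, their coefficients, and the individual-level error terms are invented for illustration. Because the outcome equation contains an exposure-mediator interaction, the controlled direct effect changes with m, while the natural effects still sum to the total effect.

```python
def mediator(a, u_m):
    """Hypothetical structural equation for M_a (coefficients are illustrative)."""
    return 1.0 + 2.0 * a + u_m


def outcome(a, m, u_y):
    """Hypothetical structural equation for Y_am, with an a*m interaction."""
    return 0.5 + 1.0 * a + 0.5 * m + 0.25 * a * m + u_y


# Individual-level error terms; in real data these are never observed.
u_m, u_y = 0.2, -0.1

m0 = mediator(0, u_m)  # M_0: mediator value under A=0
m1 = mediator(1, u_m)  # M_1: mediator value under A=1


def cde(m):
    """Controlled direct effect Y_1m - Y_0m for a fixed mediator level m."""
    return outcome(1, m, u_y) - outcome(0, m, u_y)


nde = outcome(1, m0, u_y) - outcome(0, m0, u_y)    # Y_{1 M_0} - Y_{0 M_0}
nie = outcome(1, m1, u_y) - outcome(1, m0, u_y)    # Y_{1 M_1} - Y_{1 M_0}
total = outcome(1, m1, u_y) - outcome(0, m0, u_y)  # Y_1 - Y_0

print(f"CDE(0) = {cde(0.0):.2f}, CDE(2) = {cde(2.0):.2f}")
print(f"NDE = {nde:.2f}, NIE = {nie:.2f}, NDE + NIE = {nde + nie:.2f}, total = {total:.2f}")
```

For this individual the natural direct effect is 1.3 and the natural indirect effect is 1.5, summing to the total effect of 2.8, whereas CDE(0) = 1.0 and CDE(2) = 1.5 differ from each other.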

Identification of Direct and Indirect Effects
To estimate average natural direct and indirect effects we need:
(1) There are no unmeasured exposure-outcome confounders given C
(2) There are no unmeasured mediator-outcome confounders given (C, A)
(3) There are no unmeasured exposure-mediator confounders given C
(4) There is no mediator-outcome confounder affected by exposure (i.e. no arrow from A to C2)
For controlled direct effects, only assumptions (1) and (2) are needed.
Note that (1) and (3) are guaranteed when treatment is randomized.
[Diagram: A → M → Y, with confounders C1 (exposure-outcome), C2 (mediator-outcome), and C3 (exposure-mediator)]

Identification of Direct and Indirect Effects
More formally, in counterfactual notation, these assumptions are:
(1) is $Y_{am} \perp A \mid C$
(2) is $Y_{am} \perp M \mid \{C, A\}$
(3) is $M_a \perp A \mid C$
(4) is $Y_{am} \perp M_{a^*} \mid C$

Identification of Direct and Indirect Effects
Under assumptions (1) and (2), the controlled direct effect conditional on the covariates is given by:
$$E[\mathrm{CDE}(m) \mid c] = E[Y \mid A=1, m, c] - E[Y \mid A=0, m, c]$$
Under (1)-(4), the conditional natural direct and indirect effects are:
$$E[\mathrm{NDE} \mid c] = \sum_m \{E[Y \mid A=1, m, c] - E[Y \mid A=0, m, c]\}\, P(M=m \mid A=0, c)$$
$$E[\mathrm{NIE} \mid c] = \sum_m E[Y \mid A=1, m, c]\, \{P(M=m \mid A=1, c) - P(M=m \mid A=0, c)\}$$
These are the effects within strata of the covariates.
We could average over the strata, weighting each by the probability P(C=c), to get population averages of the effects.

Regression for Causal Mediation Analysis
Similar concepts apply when comparing treatment levels A=a and A=a* (replace 1 by a and 0 by a*).
Under our confounding assumptions (1)-(4), natural direct and indirect effects are given by the following expressions:
$$E[Y_{aM_{a^*}} - Y_{a^*M_{a^*}} \mid c] = \sum_m \{E[Y \mid a, m, c] - E[Y \mid a^*, m, c]\}\, P(m \mid a^*, c)$$
$$E[Y_{aM_a} - Y_{aM_{a^*}} \mid c] = \sum_m E[Y \mid a, m, c]\, \{P(m \mid a, c) - P(m \mid a^*, c)\}$$
We could consider fitting a parametric regression model for Y and a parametric regression model for M and computing these expressions analytically (VanderWeele and Vansteelandt, 2009, 2010; Valeri and VanderWeele, 2013).
Alternatively, Imai et al. (2010) propose using a broad class of parametric or semiparametric models for Y and M, then using simulations to calculate natural direct and indirect effects from the formulas above, with standard errors for these effects obtained by bootstrapping.
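The conditional formulas for E[NDE | c] and E[NIE | c] can be evaluated by hand when the mediator is binary. The sketch below does this for a single covariate stratum, using made-up values for E[Y | A=a, M=m, c] and P(M=1 | A=a, c); the dictionary names `mean_y` and `p_m1` are hypothetical. It assumes the identification assumptions (1)-(4) hold, so these observed quantities can be given a causal reading.

```python
# Hypothetical values within one covariate stratum c (illustrative only).
mean_y = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.5, (1, 1): 3.5}  # E[Y | A=a, M=m, c], keyed by (a, m)
p_m1 = {0: 0.25, 1: 0.75}                                      # P(M=1 | A=a, c), keyed by a


def p_m(m, a):
    """P(M=m | A=a, c) for a binary mediator."""
    return p_m1[a] if m == 1 else 1.0 - p_m1[a]


# E[NDE | c] = sum_m {E[Y|1,m,c] - E[Y|0,m,c]} P(M=m | A=0, c)
nde = sum((mean_y[(1, m)] - mean_y[(0, m)]) * p_m(m, 0) for m in (0, 1))

# E[NIE | c] = sum_m E[Y|1,m,c] {P(M=m | A=1, c) - P(M=m | A=0, c)}
nie = sum(mean_y[(1, m)] * (p_m(m, 1) - p_m(m, 0)) for m in (0, 1))

# Total effect E[Y | A=1, c] - E[Y | A=0, c], for comparison with NDE + NIE.
total = (sum(mean_y[(1, m)] * p_m(m, 1) for m in (0, 1))
         - sum(mean_y[(0, m)] * p_m(m, 0) for m in (0, 1)))

print(f"NDE = {nde}, NIE = {nie}, total = {total}")
```

With these inputs the natural direct effect is 0.75 and the natural indirect effect is 1.0, which add up to the stratum-specific total effect of 1.75. Population-level effects would additionally require averaging such stratum-specific values over P(C=c).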

Regression for Causal Mediation Analysis
We use regressions that accommodate exposure-mediator interaction:
$$E[Y \mid A=a, M=m, C=c] = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4 c$$
$$E[M \mid A=a, C=c] = \beta_0 + \beta_1 a + \beta_2 c$$
Under assumptions (1)-(4), and provided our models are correctly specified, we can combine the estimates from the two models to get the following formulas for direct and indirect effects, comparing exposure levels a and a* (VanderWeele and Vansteelandt, 2009):
$$\mathrm{CDE}(a, a^*; m) = (\theta_1 + \theta_3 m)(a - a^*)$$
$$\mathrm{NDE}(a, a^*; a^*) = (\theta_1 + \theta_3(\beta_0 + \beta_1 a^* + \beta_2 E[C]))(a - a^*)$$
$$\mathrm{NIE}(a, a^*; a) = (\theta_2 \beta_1 + \theta_3 \beta_1 a)(a - a^*)$$
If the conditional NDE were of interest then we would have:
$$E[Y_{aM_{a^*}} - Y_{a^*M_{a^*}} \mid C=c] = (\theta_1 + \theta_3(\beta_0 + \beta_1 a^* + \beta_2 c))(a - a^*)$$

Regression for Causal Mediation Analysis
Note that if there is no interaction between the effects of the exposure and the mediator on the outcome, so that $\theta_3 = 0$, then these expressions reduce to:
$$\mathrm{CDE}(a, a^*; m) = \mathrm{NDE}(a, a^*; a^*) = \theta_1 (a - a^*)$$
$$\mathrm{NIE}(a, a^*; a) = \theta_2 \beta_1 (a - a^*)$$
which are the expressions often used for direct and indirect effects in the social science literature (Baron and Kenny, 1986): the product method.
However, unlike the Baron and Kenny (1986) approach, this approach to direct and indirect effects using counterfactual definitions and estimates can be employed even in settings in which an interaction is present.
Standard errors can be obtained using the delta method.
The proportion mediated is just the indirect effect divided by the total effect.
SAS, Stata, and SPSS macros (Valeri and VanderWeele, 2013) can do this automatically for continuous, binary, count, and time-to-event outcomes.
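The closed-form expressions for a continuous outcome and mediator are simple enough to code directly. The sketch below is not the SAS, Stata, or SPSS macro mentioned above; the function `regression_mediation_effects`, the class `MediationEffects`, and the example coefficient values are hypothetical, and the covariate C is treated as a single scalar for brevity. Standard errors (e.g. via the delta method) are not computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediationEffects:
    cde: float  # controlled direct effect at mediator level m
    nde: float  # natural direct effect
    nie: float  # natural indirect effect

    @property
    def total(self) -> float:
        """Total effect, using the decomposition total = NDE + NIE."""
        return self.nde + self.nie

    @property
    def proportion_mediated(self) -> float:
        """Indirect effect divided by the total effect."""
        return self.nie / self.total


def regression_mediation_effects(theta1, theta2, theta3,
                                 beta0, beta1, beta2,
                                 mean_c, a, a_star, m):
    """Plug coefficients from the outcome and mediator models into the CDE/NDE/NIE formulas."""
    delta = a - a_star
    cde = (theta1 + theta3 * m) * delta
    nde = (theta1 + theta3 * (beta0 + beta1 * a_star + beta2 * mean_c)) * delta
    nie = (theta2 * beta1 + theta3 * beta1 * a) * delta
    return MediationEffects(cde=cde, nde=nde, nie=nie)


# Illustrative coefficient values (assumed, not estimated from any real data).
coefs = dict(theta1=1.0, theta2=0.5, beta0=1.0, beta1=2.0, beta2=0.3,
             mean_c=0.0, a=1.0, a_star=0.0, m=0.0)

effects = regression_mediation_effects(theta3=0.25, **coefs)
no_interaction = regression_mediation_effects(theta3=0.0, **coefs)

print(effects, f"proportion mediated = {effects.proportion_mediated:.3f}")
print(no_interaction)
```

With $\theta_3 = 0.25$ the sketch gives NDE = 1.25 and NIE = 1.5, whereas setting $\theta_3 = 0$ collapses the direct effects to $\theta_1$ and the indirect effect to $\theta_2 \beta_1$, matching the product-method expressions.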