Estimating direct effects in cohort and case-control studies

Estimating direct effects in cohort and case-control studies, Ghent University

Direct effects

Introduction: Motivation; The problem of standard approaches; Controlled direct effect models

In many research studies, the interest is in the effect of an exposure on an outcome that is not mediated by a given intermediate variable, or mediator. We term this a direct effect.

[Diagram: causal graph with exposure X, mediator M, outcome Y and unmeasured factors U]

Example 1: randomized microbicide trials (1)

This is, for instance, the case when the intervention of interest stimulates secondary interventions (Padian et al., Lancet 2007).

Example 1: randomized microbicide trials (2)

For policy-making, the interest is in the pure effect of diaphragm and microbicide use, other than through changing condom use.

[Diagram: causal graph with microbicide X, condom use M and HIV Y]

Example 2: genetic pathways (1)

Interest in direct effect inference is also motivated by the general interest in understanding the biological pathways between an exposure and an outcome.

"When the smoke clears..." (Chanock and Hunter, Nature 2008): Three studies identify an association between genetic variation at a location on chromosome 15 and risk of lung cancer. But they disagree on whether the link is direct or mediated through nicotine dependence.

Example 2: genetic pathways (2)

Is the association with smoking real and the cause of an association with lung cancer?

[Diagram: causal graph with gene X, smoking M and lung cancer Y]

The standard approach

[Diagram: causal graph with exposure X, mediator M, outcome Y and unmeasured factors U]

It is standard to adjust the association between exposure X and outcome Y for the mediator M (Baron and Kenny, 1986):

$$E(Y \mid X, M) = \gamma_0 + \gamma_1 X + \gamma_2 M$$

Even when X is randomly assigned, this introduces bias in the presence of (unmeasured) confounders U for the association between mediator and outcome.
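The bias from mediator adjustment can be illustrated with a small simulation. The sketch below is not from the slides: the data-generating model, its coefficients and the function name are assumptions chosen for illustration. The exposure is randomized, U affects both M and Y, and the true direct effect of X is set to 2.

```python
import numpy as np

def mediator_adjusted_coefficient(n=200_000, seed=0):
    """Coefficient of X from regressing Y on (1, X, M) in simulated data."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n).astype(float)  # randomized exposure
    u = rng.normal(size=n)                          # unmeasured confounder of M and Y
    m = x + u + rng.normal(size=n)                  # mediator depends on X and U
    y = 2.0 * x + m + u + rng.normal(size=n)        # assumed true direct effect: 2
    design = np.column_stack([np.ones(n), x, m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

if __name__ == "__main__":
    print("mediator-adjusted coefficient of X:", mediator_adjusted_coefficient())
```

Under this assumed model, E(U | X, M) = (M - X)/2, so the mediator-adjusted coefficient of X converges to 1.5 rather than to the direct effect of 2, even though X is randomized.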

No unmeasured confounders

[Diagram: causal graph with exposure X, mediator M, outcome Y, measured confounder L and unmeasured factors U]

We assume that all confounders L for the association between mediator and outcome have been measured. Additional adjustment for L removes this bias:

$$E(Y \mid X, M, L) = \gamma_0 + \gamma_1 X + \gamma_2 M + \gamma_3 L$$

The problem of intermediate confounding

[Diagram: causal graph with exposure X, mediator M, outcome Y, confounder L and unmeasured factors U]

It is often realistic to believe that some of those confounders L are themselves affected by the exposure. Additional adjustment for L then continues to introduce bias.

Limitations of structural equations and path analysis

[Diagram: causal graph with exposure X, mediator M, outcome Y, confounder L and unmeasured factors U]

1. Methods based on path analysis and linear structural equation models frequently ignore the possible presence of the unmeasured factors U.
2. When they acknowledge this, inference becomes dependent on the chosen model for $f(L \mid X)$.

Controlled direct effects

In view of this, Robins (1999) proposes to model direct effects directly.

Counterfactual outcome $Y(x, m)$: the outcome if, contrary to fact, exposure and mediator took levels x and m.

Average controlled direct effect of exposure X on outcome Y when holding the mediator M fixed at m:

$$E\{Y(x, m) - Y(0, m)\}$$

Structural nested direct effect models

A structural nested direct effect model (Robins, 1999) is a model for the average controlled direct effect $E\{Y(x, m) - Y(0, m)\}$.

Examples:

$$E\{Y(x, m) - Y(0, m)\} = \beta x$$

$$E\{Y(x, m) - Y(0, m)\} = \beta_1 x + \beta_2 x m$$

How can we estimate the direct effect parameter β in the model $E\{Y(x, m) - Y(0, m)\} = \beta x$?

Inverse probability weighting; Sequential G-estimation (SG)

Inverse probability weighted estimator (1)

[Diagram: causal graph with exposure X, mediator M, outcome Y, confounder L and unmeasured factors U]

Robins (1999) proposes inverse weighting the data by

$$\frac{1}{f(M \mid X, L)}$$

Inverse probability weighted estimator (2)

[Diagram: causal graph with exposure X, mediator M, outcome Y, confounder L and unmeasured factors U]

This removes the association between the mediator and its causes, so that only a direct effect remains.

Inverse probability weighted estimator (3)

An estimate of the direct exposure effect β may thus be obtained by regressing outcome on exposure and mediator, after weighting each subject by

$$\frac{1}{f(M \mid X, L)}$$

The resulting estimator may behave erratically in finite samples when the mediator M
- is quantitative; or
- has strong predictors X and L.

In view of this, we developed alternative estimators.

Goetgeluk, S., Vansteelandt, S. and Goetghebeur, E. (2009). Estimation of controlled direct effects. JRSS B.
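A minimal sketch of the inverse probability weighted estimator, under assumptions that are not in the slides: X, L and M are all binary, so f(M | X, L) can be estimated by cell proportions, and the simulated model (names `simulate`, `ipw_direct_effect`, `TRUE_CDE` and all coefficients) is hypothetical. L is affected by X and shares the unmeasured cause U with Y.

```python
import math
import numpy as np

def simulate(n=200_000, seed=0):
    """Hypothetical model: X -> L -> M, U -> L, U -> Y, and X, M, L -> Y."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n).astype(float)   # randomized exposure
    u = rng.normal(size=n)                           # unmeasured cause of L and Y
    l = (0.8 * x + u > 0).astype(float)              # confounder affected by X
    p_m = 1.0 / (1.0 + np.exp(-(-0.5 + x + l)))      # P(M = 1 | X, L)
    m = rng.binomial(1, p_m).astype(float)
    y = 2.0 * x + 1.5 * m + l + u + rng.normal(size=n)
    return x, m, l, y

def ipw_direct_effect(x, m, l, y):
    """Weight by 1 / f(M | X, L), then regress Y on (1, X, M)."""
    p_m1 = np.empty_like(y)
    for xv in (0.0, 1.0):
        for lv in (0.0, 1.0):
            cell = (x == xv) & (l == lv)
            p_m1[cell] = m[cell].mean()              # cell estimate of P(M = 1 | X, L)
    f_m = np.where(m == 1.0, p_m1, 1.0 - p_m1)       # probability of the observed M
    sw = np.sqrt(1.0 / f_m)                          # square roots of the weights
    design = np.column_stack([np.ones_like(y), x, m])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[1]

# Controlled direct effect implied by the assumed model:
# 2 + P(L = 1 | X = 1) - P(L = 1 | X = 0), with P(L = 1 | X = 1) = Phi(0.8).
TRUE_CDE = 2.0 + (0.5 * (1.0 + math.erf(0.8 / math.sqrt(2.0))) - 0.5)

if __name__ == "__main__":
    print("IPW estimate:", ipw_direct_effect(*simulate()), "target:", TRUE_CDE)
```

With a quantitative mediator the weights would involve a density estimate instead of cell proportions, which is where the finite-sample instability noted on the slide tends to arise.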

Sequential G-estimator (1)

[Diagram: causal graph with exposure X, mediator M, outcome Y, confounder L and unmeasured factors U]

First, remove the indirect effect from the outcome,

$$Y \to Y - \hat\gamma M,$$

where $\hat\gamma$ is the estimate of γ from the regression model

$$E(Y \mid X, M, L) = \delta_1 + \delta_2 X + \delta_3 L + \gamma M$$

Sequential G-estimator (2)

[Diagram: causal graph with exposure X, mediator M, transformed outcome Y*, confounder L and unmeasured factors U]

Now only a direct effect remains. Next, we remove the direct effect from the outcome:

$$Y^*(\beta) \equiv Y - \hat\gamma M - \beta X$$

Sequential G-estimator (3)

[Diagram: causal graph with exposure X, mediator M, transformed outcome Y*, confounder L and unmeasured factors U]

Now, X and $Y^*(\beta)$ must be (mean) independent.

Sequential G-estimator (4)

We thus estimate the direct effect parameter β as the value for which (Goetgeluk et al., JRSS B 2008; Joffe and Greene, Biometrics 2009; Vansteelandt et al., Gen Epi 2009)

$$X \perp\!\!\!\perp Y^*(\beta)$$

That is, by solving

$$0 = \sum_i \{X_i - E(X)\}\,(Y_i - \hat\gamma M_i - \beta X_i)$$

for β, where $\hat\gamma$ is obtained by fitting

$$E(Y \mid X, M, L) = \delta_1 + \delta_2 X + \delta_3 L + \gamma M$$
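The two stages can be sketched in a few lines. This is an illustration under an assumed linear data-generating model, not the authors' code: the function names and coefficients are hypothetical, and E(X) is replaced by the sample mean. Because the estimating equation is linear in β, it has the closed-form solution used below.

```python
import numpy as np

def ols(design, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def sequential_g(x, m, l, y):
    """Two-stage sequential G-estimate of beta in E{Y(x,m) - Y(0,m)} = beta * x."""
    ones = np.ones_like(y)
    # Stage 1: gamma_hat is the coefficient of M in E(Y | X, M, L).
    gamma_hat = ols(np.column_stack([ones, x, l, m]), y)[3]
    # Stage 2: solve 0 = sum (X_i - mean(X)) (Y_i - gamma_hat M_i - beta X_i).
    y_star = y - gamma_hat * m
    xc = x - x.mean()
    return np.sum(xc * y_star) / np.sum(xc * x)

def adjusted_regression(x, m, l, y):
    """Coefficient of X when Y is regressed on (1, X, L, M), for comparison."""
    return ols(np.column_stack([np.ones_like(y), x, l, m]), y)[1]

def simulate(n=200_000, seed=0):
    """Assumed model: L is affected by X and shares the unmeasured cause U with Y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    l = 0.8 * x + u + rng.normal(size=n)
    m = x + l + rng.normal(size=n)
    y = 2.0 * x + 1.5 * m + l + u + rng.normal(size=n)
    return x, m, l, y

if __name__ == "__main__":
    data = simulate()
    print("sequential G:", sequential_g(*data))
    print("regression adjusted for M and L:", adjusted_regression(*data))
```

In this assumed model the controlled direct effect per unit of X is 2 + 0.8 = 2.8 (the path through L counts as direct with respect to M). The sequential G-estimate targets that value, whereas the coefficient of X in the regression adjusted for M and L converges to 1.6.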

Simulation study

- n = 200
- data-generating model: linear structural equation model

Bias

[Figure: six panels plotting the direct effect estimate (vertical axis) against $R^2_{yu}$ (three panels) and $R^2_{ml}$ (three panels)]

Relative efficiency

[Figure: six panels plotting relative efficiency (vertical axis) against $R^2_{yu}$ (three panels) and $R^2_{ml}$ (three panels)]

Case-control studies

Suppose now that subjects were sampled conditional on their disease status (Y = 1: case; Y = 0: control).

[Diagram: causal graph with exposure X, mediator M, outcome Y, confounder L, unmeasured factors U and a selection node]

Sequential G-estimation subject to selection bias

Removing the mediator effect from the outcome induces selection bias:

$$Y \to Y^* = Y - \hat\gamma M, \qquad Y = Y^* + \hat\gamma M$$

[Diagram: causal graph with exposure X, mediator M, transformed outcome Y*, confounder L, unmeasured factors U and a selection node]

Multiplicative direct effect models

Consider the multiplicative direct effect model

$$\frac{P\{Y(x, m) = 1\}}{P\{Y(0, m) = 1\}} = \exp(\beta x)$$

$\exp(\beta x)$ expresses the relative change in probability of disease when the exposure X is increased from 0 to x units while holding the mediator fixed at m.

Estimation principle (1)

The effect of the mediator on disease risk can be estimated through the odds ratio $\exp(\gamma_m M)$ in the logistic regression model:

$$\operatorname{logit} P(Y = 1 \mid M, X, L) = \gamma_0 + \gamma_m M + \gamma_x X + \gamma_l L$$

When the disease prevalence is low, $Y \exp(-\gamma_m M)$ therefore approximates the expected outcome that would be observed if the effect of mediator on outcome were removed.

Estimation principle (2)

By further removing the direct exposure effect, we obtain a residual outcome $Y \exp(-\gamma_m M - \beta X)$, which is no longer affected by X. At the population level,

$$0 = \sum_{i=1}^{n} \{X_i - E(X_i)\}\, Y_i \exp(-\hat\gamma_m M_i - \beta X_i)$$

is thus an unbiased equation for β.

This continues to be true in outcome-dependent sampling designs. It requires approximating $E(X_i)$ with the sample average of $X_i$ in the controls.
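A sketch of this estimating equation in a simulated case-control sample follows. Everything specific here is an assumption made for illustration: the log-linear risk model with a rare outcome, the binary X and L, the hand-written Newton-Raphson logistic fit, and the bisection bracket [-5, 5] for β.

```python
import numpy as np

def fit_logistic(design, y, n_iter=25):
    """Plain Newton-Raphson fit of a logistic regression; returns coefficients."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ coef))
        grad = design.T @ (y - p)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        coef = coef + np.linalg.solve(hess, grad)
    return coef

def seq_g_case_control(x, m, l, y, lo=-5.0, hi=5.0, tol=1e-8):
    """Solve 0 = sum (X_i - mean_controls(X)) Y_i exp(-gamma_m M_i - beta X_i)."""
    design = np.column_stack([np.ones_like(x), m, x, l])
    gamma_m = fit_logistic(design, y)[1]        # log odds ratio for the mediator
    x_bar = x[y == 0].mean()                    # E(X) approximated in the controls
    cases = y == 1                              # only cases contribute (Y_i = 1)

    def g(beta):
        return np.sum((x[cases] - x_bar) * np.exp(-gamma_m * m[cases] - beta * x[cases]))

    g_lo = g(lo)                                # bisection assumes a sign change on [lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_case_control(n_pop=2_000_000, seed=0):
    """Assumed rare-disease population; keep all cases and as many random controls."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n_pop).astype(float)
    l = rng.binomial(1, 0.3 + 0.3 * x).astype(float)   # confounder affected by X
    m = rng.normal(0.5 * x + 0.5 * l, 1.0)
    risk = np.minimum(np.exp(-5.5 + 0.4 * x + 0.5 * m + 0.6 * l), 1.0)
    y = rng.binomial(1, risk).astype(float)
    case_idx = np.flatnonzero(y == 1)
    control_idx = rng.choice(np.flatnonzero(y == 0), size=case_idx.size, replace=False)
    keep = np.concatenate([case_idx, control_idx])
    return x[keep], m[keep], l[keep], y[keep]

# Log relative risk for x = 1 versus 0 at fixed m implied by the assumed model.
TRUE_BETA = 0.4 + np.log((0.4 + 0.6 * np.exp(0.6)) / (0.7 + 0.3 * np.exp(0.6)))

if __name__ == "__main__":
    print("estimate:", seq_g_case_control(*simulate_case_control()), "target:", TRUE_BETA)
```

The target exceeds the conditional coefficient 0.4 because, in the assumed model, part of the effect of X that does not pass through M runs through L. The estimate should be close to the target only insofar as the rare-disease approximation holds.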

Sequential G-estimation: effect of X on L, n = 200

                           Seq-G                 Logistic Reg
n     β        P(Y = 1)    Mean       SD         Mean       SD
200   0        0.010        0.00270   0.426      -0.0637    0.409
               0.027        0.0136    0.432      -0.0548    0.414
               0.053        0.0103    0.395      -0.0532    0.397
               0.15         0.00295   0.397      -0.0333    0.394
      -0.393   0.0075      -0.408     0.403      -0.474     0.391
               0.020       -0.407     0.406      -0.475     0.397
               0.053       -0.426     0.371      -0.492     0.377
               0.10        -0.459     0.377      -0.500     0.378

Sequential G-estimation: no effect of X on L, n = 200

                           Seq-G                 Logistic Reg
n     β        P(Y = 1)    Mean       SD         Mean        SD
200   0        0.409        0.00302   0.425      -0.00778    0.407
               0.027        0.0136    0.432       0.00205    0.412
               0.053        0.00998   0.395       0.000298   0.395
               0.15        -0.00450   0.395      -0.00922    0.396
      -0.393   0.0075      -0.407     0.402      -0.418      0.388
               0.020       -0.406     0.407      -0.418      0.395
               0.053       -0.425     0.371      -0.435      0.375
               0.10        -0.462     0.381      -0.474      0.377

Conclusions (1)

Inferring direct effects requires adjustment for all confounders of the effect of mediator on outcome. When these confounders are themselves affected by the exposure, standard methods are not applicable.

In that setting, sequential G-estimators
- are valid;
- are easily applied;
- have good efficiency; and
- are fairly robust to model misspecification.

Doubly robust direct effect estimators have additional robustness properties and allow for more flexible models.

Goetgeluk, S., Vansteelandt, S. and Goetghebeur, E. (2009). Estimation of controlled direct effects. JRSS B.

Conclusions (2)

Under a rare disease assumption, results are easily extended to outcome-dependent sampling designs under relative risk models.

Vansteelandt, S. (2009). Estimating direct effects in cohort and case-control studies. Epidemiology.

Work on logistic direct effects models is ongoing.