Estimating direct effects in cohort and case-control studies

Estimating direct effects in cohort and case-control studies, Ghent University

Direct effects
Introduction: Motivation; The problem of standard approaches; Controlled direct effect models
In many research studies, the interest is in the effect of an exposure on an outcome that is not mediated by a given intermediate variable, or mediator. We term this a direct effect.
[Diagram: exposure X, mediator M, outcome Y, unmeasured factors U]

Example 1: randomized microbicide trials (1)
This is, for instance, the case when the intervention of interest stimulates secondary interventions.
Padian et al., Lancet 2007

Example 1: randomized microbicide trials (2)
For policy-making, the interest is in the pure effect of diaphragm and microbicide use, other than through changing condom use.
[Diagram: microbicide X, condom use M, HIV Y]

Example 2: genetic pathways (1)
Interest in direct effect inference is also motivated by the general interest in understanding the biological pathways between an exposure and an outcome.
"When the smoke clears..." (Chanock and Hunter, Nature 2008): Three studies identify an association between genetic variation at a location on chromosome 15 and risk of lung cancer. But they disagree on whether the link is direct or mediated through nicotine dependence.

Example 2: genetic pathways (2)
Is the association with smoking real and the cause of an association with lung cancer?
[Diagram: gene X, smoking M, lung cancer Y]

The standard approach
[Diagram: exposure X, mediator M, outcome Y, unmeasured factors U]
It is standard to adjust the association between exposure X and outcome Y for the mediator M (Baron and Kenny, 1986):
$E(Y \mid X, M) = \gamma_0 + \gamma_1 X + \gamma_2 M$
Even when X is randomly assigned, this introduces bias in the presence of (unmeasured) confounders U for the association between mediator and outcome.
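The bias from mediator adjustment can be illustrated with a small simulation. The sketch below is not from the slides: it assumes a hypothetical linear data-generating model in which X is randomized, U affects both M and Y, and the true direct effect of X is 1. The helper `ols` and all coefficient values are illustrative assumptions.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating model (assumption, not from the slides).
x = rng.binomial(1, 0.5, n).astype(float)       # randomized exposure
u = rng.normal(size=n)                          # unmeasured confounder of M and Y
m = 0.5 * x + u + rng.normal(size=n)            # mediator
y = 1.0 * x + 1.0 * m + u + rng.normal(size=n)  # outcome; direct effect of X is 1.0

# Standard approach: regress Y on X and M and read off the coefficient of X.
gamma1_hat = ols(y, x, m)[1]
print(f"true direct effect: 1.0, mediator-adjusted estimate: {gamma1_hat:.3f}")
```

Under this assumed model the coefficient of X converges to 0.75 rather than 1, even though X is randomized, because conditioning on M opens a path from X to Y through U.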

No unmeasured confounders
[Diagram: exposure X, confounder L, mediator M, outcome Y, unmeasured factors U]
We assume that all confounders L for the association between mediator and outcome have been measured. Additional adjustment for L removes this bias:
$E(Y \mid X, M, L) = \gamma_0 + \gamma_1 X + \gamma_2 M + \gamma_3 L$

The problem of intermediate confounding
[Diagram: exposure X, confounder L, mediator M, outcome Y, unmeasured factors U]
It is often realistic to believe that some of those confounders L are themselves affected by the exposure. Additional adjustment for L then continues to introduce bias.
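A simulation can make the intermediate-confounding problem concrete. The sketch below assumes a hypothetical linear model (not taken from the slides) in which X affects L, L affects M and Y, and an unmeasured U affects L and Y. Under these assumed coefficients the controlled direct effect of X is 2: a path X to Y worth 1 plus a path X to L to Y worth 1.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical data-generating model (assumption, not from the slides).
x = rng.binomial(1, 0.5, n).astype(float)       # randomized exposure
u = rng.normal(size=n)                          # unmeasured cause of L and Y
conf = x + u + rng.normal(size=n)               # confounder L, affected by X
m = 0.5 * x + 0.5 * conf + rng.normal(size=n)   # mediator
y = x + m + conf + u + rng.normal(size=n)       # outcome

true_direct = 2.0                     # X -> Y (1.0) plus X -> L -> Y (1.0)
coef_without_l = ols(y, x, m)[1]      # M-Y association confounded by L
coef_with_l = ols(y, x, m, conf)[1]   # adjusts away X -> L -> Y, opens X -> L <- U
print(true_direct, round(coef_without_l, 3), round(coef_with_l, 3))
```

In this assumed setup, neither regression recovers 2: the coefficient of X tends to about 1 without L and to about 0.5 with L.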

Limitations of structural equations and path analysis
[Diagram: exposure X, confounder L, mediator M, outcome Y, unmeasured factors U]
Methods based on path analysis and linear structural equation models frequently ignore the possible presence of the unmeasured factors U.

Controlled direct effects
In view of this, Robins (1999) proposes to model direct effects directly.
Counterfactual outcome $Y(x, m)$: the outcome that would be observed if, contrary to fact, exposure and mediator took levels $x$ and $m$.

Structural nested direct effect models
A structural nested direct effect model (Robins, 1999) is a model for the average controlled direct effect $E\{Y(x, m) - Y(0, m)\}$.
Examples: $E\{Y(x, m) - Y(0, m)\} = \beta x$, or $E\{Y(x, m) - Y(0, m)\} = \beta_1 x + \beta_2 x m$.
How can we estimate the direct effect parameter $\beta$ in the model $E\{Y(x, m) - Y(0, m)\} = \beta x$?

Inverse probability weighted estimator (1)
[Diagram: exposure X, confounder L, mediator M, outcome Y, unmeasured factors U]
Robins (1999) proposes weighting the data by the inverse weights $1 / f(M \mid X, L)$.

Inverse probability weighted estimator (2)
[Diagram: exposure X, confounder L, mediator M, outcome Y, unmeasured factors U]
This removes the association between the mediator and its causes, so that only a direct effect remains.

Inverse probability weighted estimator (3)
An estimate of the direct exposure effect $\beta$ may thus be obtained by regressing outcome on exposure and mediator, after weighting each subject by $1 / f(M \mid X, L)$.
The resulting estimator may behave erratically in finite samples when the mediator M is quantitative or has strong predictors X and L. In view of this, we developed alternative estimators.
Goetgeluk, S., Vansteelandt, S. and Goetghebeur, E. (2009). Estimation of controlled direct effects. JRSS B.
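The weighting step can be sketched as follows for a binary mediator. This is an illustrative sketch only: the simulated model, the logistic model for $f(M \mid X, L)$, and the helper names `fit_logistic` and `ipw_direct_effect` are assumptions, not the authors' implementation. Under the assumed model the controlled direct effect of X is 2.

```python
import numpy as np

def fit_logistic(design, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ coef))
        grad = design.T @ (y - p)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        coef = coef + np.linalg.solve(hess, grad)
    return coef

def ipw_direct_effect(y, x, m, conf):
    """Weight by 1 / f(M | X, L), then regress Y on X and M (binary M assumed)."""
    ones = np.ones_like(y)
    d_med = np.column_stack([ones, x, conf])
    p1 = 1.0 / (1.0 + np.exp(-d_med @ fit_logistic(d_med, m)))
    f_m = np.where(m == 1, p1, 1.0 - p1)   # fitted probability of the observed M
    w = 1.0 / f_m
    d_out = np.column_stack([ones, x, m])
    sw = np.sqrt(w)                        # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(d_out * sw[:, None], y * sw, rcond=None)
    return coef[1], w

rng = np.random.default_rng(2)
n = 200_000
# Hypothetical data-generating model (assumption, not from the slides).
x = rng.binomial(1, 0.5, n).astype(float)
u = rng.normal(size=n)                               # unmeasured cause of L and Y
conf = x + u + rng.normal(size=n)                    # confounder L, affected by X
p_m = 1.0 / (1.0 + np.exp(-(-0.5 + 0.5 * x + 0.5 * conf)))
m = rng.binomial(1, p_m).astype(float)               # binary mediator
y = x + m + conf + u + rng.normal(size=n)            # controlled direct effect is 2.0

beta_ipw, weights = ipw_direct_effect(y, x, m, conf)
print(f"IPW estimate: {beta_ipw:.3f}, largest weight: {weights.max():.1f}")
```

Inspecting the largest weight is a simple way to see the finite-sample concern noted above: a few subjects with very small fitted $f(M \mid X, L)$ can dominate the weighted regression.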

Sequential G-estimator (1)
[Diagram: exposure X, confounder L, mediator M, outcome Y, unmeasured factors U]
First, remove the indirect effect from the outcome, $Y^* = Y - \hat\gamma M$, where $\hat\gamma$ is the estimate from a regression model
$E(Y \mid X, M, L) = \delta_1 + \delta_2 X + \delta_3 L + \gamma M$

Sequential G-estimator (2)
[Diagram: exposure X, confounder L, mediator M, outcome Y*, unmeasured factors U]
Now only a direct effect remains. Next, we remove the direct effect from the outcome:
$Y^*(\beta) = Y - \hat\gamma M - \beta X$

Sequential G-estimator (3)
[Diagram: exposure X, confounder L, mediator M, outcome Y*, unmeasured factors U]
Now, $X$ and $Y^*(\beta)$ must be (mean) independent.

Sequential G-estimator (4)
We thus estimate the direct effect parameter $\beta$ as the value for which $X \perp\!\!\!\perp Y^*(\beta)$ (Goetgeluk et al., JRSS B 2008; Joffe and Greene, Biometrics 2009; Vansteelandt et al., Gen Epi 2009).
That is, by solving
$0 = \sum_i \{X_i - E(X)\}(Y_i - \hat\gamma M_i - \beta X_i)$
for $\beta$, where $\hat\gamma$ is obtained by fitting
$E(Y \mid X, M, L) = \delta_1 + \delta_2 X + \delta_3 L + \gamma M$
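Because the estimating equation is linear in $\beta$, it has a closed-form solution. The sketch below is a minimal illustration under assumptions: the function name `sequential_g`, the simulated linear model, and the use of the sample mean in place of $E(X)$ are not from the slides. Under the assumed model the controlled direct effect is 2 and the mediator effect $\gamma$ is 1.

```python
import numpy as np

def sequential_g(y, x, m, conf):
    """Two-step sequential G-estimator for the linear direct effect model.

    Step 1: estimate gamma from the regression of Y on X, L and M.
    Step 2: solve 0 = sum_i (X_i - mean(X)) (Y_i - gamma_hat M_i - beta X_i).
    """
    design = np.column_stack([np.ones_like(y), x, conf, m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    gamma_hat = coef[3]
    y_star = y - gamma_hat * m          # outcome with the mediator effect removed
    xc = x - x.mean()                   # sample mean stands in for E(X)
    beta_hat = np.sum(xc * y_star) / np.sum(xc * x)
    return beta_hat, gamma_hat

rng = np.random.default_rng(3)
n = 200_000
# Hypothetical data-generating model (assumption, not from the slides).
x = rng.binomial(1, 0.5, n).astype(float)       # randomized exposure
u = rng.normal(size=n)                          # unmeasured cause of L and Y
conf = x + u + rng.normal(size=n)               # confounder L, affected by X
m = 0.5 * x + 0.5 * conf + rng.normal(size=n)   # mediator
y = x + m + conf + u + rng.normal(size=n)       # controlled direct effect is 2.0

beta_hat, gamma_hat = sequential_g(y, x, m, conf)
print(f"gamma_hat: {gamma_hat:.3f}, beta_hat: {beta_hat:.3f}")
```

The first-stage regression is used only for the mediator coefficient; its exposure coefficient is discarded because, as noted earlier, it is biased when L is affected by X.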

Simulation study
n = 200; data-generating model: linear structural equation model

Bias
[Figure: six panels plotting the direct effect estimate (vertical axis) against $R^2_{yu}$ (three panels) and $R^2_{ml}$ (three panels)]

Relative efficiency
[Figure: six panels plotting relative efficiency (vertical axis) against $R^2_{yu}$ (three panels) and $R^2_{ml}$ (three panels)]

Case-control studies
Suppose now that subjects were sampled conditional on their disease status (Y = 1: case; Y = 0: control).
[Diagram: exposure X, confounder L, mediator M, outcome Y, unmeasured factors U, selection]

Sequential G-estimation subject to selection bias
Removing the mediator effect from the outcome induces selection bias.
$Y^* = Y - \hat\gamma M \;\Rightarrow\; Y = Y^* + \hat\gamma M$
[Diagram: exposure X, confounder L, mediator M, outcome Y*, unmeasured factors U, selection]

Multiplicative direct effect models
Consider the multiplicative direct effect model
$\dfrac{P\{Y(x, m) = 1\}}{P\{Y(0, m) = 1\}} = \exp(\beta x)$
Here $\exp(\beta x)$ expresses the relative change in the probability of disease when the exposure X is increased from 0 to x units while holding the mediator fixed at m.

Estimation principle (1)
The effect of the mediator on disease risk can be estimated through the odds ratio $\exp(\gamma_m M)$ in the logistic regression model:
$\mathrm{logit}\, P(Y = 1 \mid M, X, L) = \gamma_0 + \gamma_m M + \gamma_x X + \gamma_l L$
When the disease prevalence is low, $Y \exp(-\gamma_m M)$ therefore approximates the expected outcome that would be observed if the effect of mediator on outcome were removed.

Estimation principle (2)
By further removing the direct exposure effect, we obtain a residual outcome $Y \exp(-\gamma_m M - \beta X)$ which is no longer affected by X. At the population level,
$0 = \sum_{i=1}^{n} \{X_i - E(X_i)\}\, Y_i \exp(-\hat\gamma_m M_i - \beta X_i)$
is thus an unbiased equation for $\beta$. This continues to be true in outcome-dependent sampling designs.
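The two-step principle can be sketched on a simulated case-control sample. Everything specific here is an assumption for illustration: the log-linear risk model and its coefficients, the helper names `fit_logistic` and `solve_beta`, the bisection solver (adequate for a binary exposure, where the equation is monotone in $\beta$), and estimating $E(X)$ from the controls, which is reasonable only when the disease is rare.

```python
import numpy as np

def fit_logistic(design, y, n_iter=30):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ coef))
        grad = design.T @ (y - p)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        coef = coef + np.linalg.solve(hess, grad)
    return coef

def solve_beta(x, m, y, gamma_m, x_mean, lo=-10.0, hi=10.0):
    """Bisection for 0 = sum_i (X_i - E(X)) Y_i exp(-gamma_m M_i - beta X_i)."""
    def g(beta):
        return np.sum((x - x_mean) * y * np.exp(-gamma_m * m - beta * x))
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(4)
n = 1_000_000
# Hypothetical rare-disease population (assumption, not from the slides).
x = rng.binomial(1, 0.5, n).astype(float)
conf = rng.binomial(1, 0.3 + 0.3 * x).astype(float)            # L, affected by X
m = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * conf).astype(float)  # mediator
risk = np.exp(-5.0 + 0.5 * x + 0.7 * m + 0.6 * conf)           # log-linear risk
y = rng.binomial(1, risk).astype(float)

# Case-control sample: all cases plus an equal number of random controls.
cases = np.flatnonzero(y == 1)
controls = rng.choice(np.flatnonzero(y == 0), size=cases.size, replace=False)
idx = np.concatenate([cases, controls])
xs, ms, ls, ys = x[idx], m[idx], conf[idx], y[idx]

# Step 1: mediator effect from logistic regression of Y on M, X and L.
design = np.column_stack([np.ones_like(ys), ms, xs, ls])
gamma_m_hat = fit_logistic(design, ys)[1]

# Step 2: solve the estimating equation, with E(X) taken from the controls.
x_mean = xs[ys == 0].mean()
beta_hat = solve_beta(xs, ms, ys, gamma_m_hat, x_mean)

# True log relative risk under the assumed model: X -> Y plus X -> L -> Y.
beta_true = 0.5 + np.log((0.4 + 0.6 * np.exp(0.6)) / (0.7 + 0.3 * np.exp(0.6)))
print(f"beta_true: {beta_true:.3f}, beta_hat: {beta_hat:.3f}")
```

Only the cases contribute to the estimating equation, since $Y_i = 0$ for controls; the controls enter through the logistic fit and through the estimate of $E(X)$.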

Sequential G-estimation: effect of X on L, n = 200

                           Seq-G                Logistic Reg
n     β        P(Y = 1)    Mean       SD        Mean       SD
200   0        0.010       0.00270    0.426     -0.0637    0.409
               0.027       0.0136     0.432     -0.0548    0.414
               0.053       0.0103     0.395     -0.0532    0.397
               0.15        0.00295    0.397     -0.0333    0.394
      -0.393   0.0075      -0.408     0.403     -0.474     0.391
               0.020       -0.407     0.406     -0.475     0.397
               0.053       -0.426     0.371     -0.492     0.377
               0.10        -0.459     0.377     -0.500     0.378

Sequential G-estimation: no effect of X on L, n = 200

                           Seq-G                Logistic Reg
n     β        P(Y = 1)    Mean       SD        Mean       SD
200   0        0.409       0.00302    0.425     -0.00778   0.407
               0.027       0.0136     0.432     0.00205    0.412
               0.053       0.00998    0.395     0.000298   0.395
               0.15        -0.00450   0.395     -0.00922   0.396
      -0.393   0.0075      -0.407     0.402     -0.418     0.388
               0.020       -0.406     0.407     -0.418     0.395
               0.053       -0.425     0.371     -0.435     0.375
               0.10        -0.462     0.381     -0.474     0.377

Conclusions (1)
Inferring direct effects requires adjustment for all confounders of the effect of mediator on outcome. When these confounders are themselves affected by the exposure, standard methods are not applicable.
In that setting, sequential G-estimators are valid, are easily applied, have good efficiency, and are fairly robust to model misspecification.
Doubly robust direct effect estimators have additional robustness properties and allow for more flexible models.
Goetgeluk, S., Vansteelandt, S. and Goetghebeur, E. (2009). Estimation of controlled direct effects. JRSS B.

Conclusions (2)
Under a rare disease assumption, results are easily extended to outcome-dependent sampling designs under relative risk models.
Vansteelandt, S. (2009). Estimating direct effects in cohort and case-control studies. Epidemiology.
Work on logistic direct effects models is ongoing.