Causal inference in epidemiological practice

Willem van der Wal
Biostatistics, Julius Center, UMC Utrecht
June 5, 2

Overview
Introduction to causal inference
Marginal causal effects
Estimating marginal causal effects
More detail: inverse probability weighting (IPW)
Efficiency & small-sample bias
Possible solution

Counterfactual histories: individual level
Dichotomous treatment
outcome for individual i when not treated: $Y_{i,a=0}$
outcome for individual i when treated: $Y_{i,a=1}$
Treatment with varying dosage
outcome for individual i with dose 1: $Y_{i,a=1}$
outcome for individual i with dose 2: $Y_{i,a=2}$
outcome for individual i with dose 5: $Y_{i,a=5}$
outcome for individual i with dose 10: $Y_{i,a=10}$

Counterfactual histories: population level
Dichotomous treatment
distribution of outcomes if nobody is treated: $P(Y^{a=0})$
distribution of outcomes if everybody is treated: $P(Y^{a=1})$
Treatment with varying dosage
distribution of outcomes if everybody receives dose 1: $P(Y^{a=1})$
distribution of outcomes if everybody receives dose 2: $P(Y^{a=2})$
distribution of outcomes if everybody receives dose 5: $P(Y^{a=5})$
distribution of outcomes if everybody receives dose 10: $P(Y^{a=10})$

Marginal Structural Models
Potential outcomes distribution: $P(Y^a)$
Marginal structural models model the population distribution of potential outcomes.
Examples
dichotomous treatment: $\mathrm{logit}\,P(Y^a = 1) = \beta_0 + \beta_1 a$, where $\exp(\beta_1)$ is the causal odds ratio $\dfrac{P(Y^{a=1}=1)/\{1 - P(Y^{a=1}=1)\}}{P(Y^{a=0}=1)/\{1 - P(Y^{a=0}=1)\}}$
continuous treatment: $E(Y^a) = \beta_0 + \beta_1 a$, where $\beta_1$ is the causal mean difference per unit of treatment, $E(Y^{a=1}) - E(Y^{a=0})$
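The short derivation below uses only the two model forms on this slide. It evaluates each MSM at a = 1 and a = 0 to show why the coefficients have the stated interpretations:

```latex
\begin{aligned}
\operatorname{logit} P(Y^{a}=1) = \beta_0 + \beta_1 a
  \;&\Rightarrow\;
  \frac{P(Y^{a=1}=1)/\{1-P(Y^{a=1}=1)\}}{P(Y^{a=0}=1)/\{1-P(Y^{a=0}=1)\}}
  = \frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{\beta_1},\\
E(Y^{a}) = \beta_0 + \beta_1 a
  \;&\Rightarrow\;
  E(Y^{a=1}) - E(Y^{a=0}) = (\beta_0+\beta_1) - \beta_0 = \beta_1 .
\end{aligned}
```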

Conditional Effects
Causal question: what is, on average, the difference in outcome when patients are treated compared with when they are not treated? This is often too unspecific.
More specific causal questions: for patients in poor health, what is, on average, the difference in outcome between treating and not treating? And for patients in good health?
$P(Y^a \mid V)$, e.g. $\mathrm{logit}\,P(Y^a = 1 \mid V) = \beta_0 + \beta_1 a + \beta_2 V + \beta_3 aV$
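Evaluating the example model at a = 1 and a = 0 within a level v of V gives the conditional causal odds ratio. Coding V = 0 as poor health and V = 1 as good health is a hypothetical choice, used here only for illustration.

```latex
\frac{\operatorname{odds} P(Y^{a=1}=1 \mid V=v)}{\operatorname{odds} P(Y^{a=0}=1 \mid V=v)}
  = \frac{e^{\beta_0+\beta_1+\beta_2 v+\beta_3 v}}{e^{\beta_0+\beta_2 v}}
  = e^{\beta_1+\beta_3 v},
\qquad\text{so } e^{\beta_1} \text{ for } V=0 \text{ and } e^{\beta_1+\beta_3} \text{ for } V=1 .
```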

Trials
Randomized experiments, e.g. randomized treatment (placebo/treatment)
Both treatment arms are comparable with each other and with the whole sample:
$P(Y^{a=0}) = P(Y \mid A = 0)$ and $P(Y^{a=1}) = P(Y \mid A = 1)$
Therefore we can give a causal interpretation to the treatment effect estimated from this sample, e.g.
$P(Y = 1 \mid A = 1) - P(Y = 1 \mid A = 0) = P(Y^{a=1} = 1) - P(Y^{a=0} = 1)$

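Observational studies
Confounding
[Figure: causal diagram with confounder L, treatment A and outcome Y; β is the causal effect of A on Y, r the observed association between A and Y]
Treatment arms are not comparable (this could also happen by chance), so r is not an estimate of β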


Adjusting for confounding: conditioning
Conditioning: stratifying, or including the confounders in the regression model
Estimate the effect within each stratum
Pool the estimates together

Conditioning, caveat 1: Adjusting away the effect
Time-dependent confounder and time-dependent treatment
The confounder is also an intermediate, because it is affected by earlier treatment
[Figure: causal diagram of L_t, A_t and Y_t at two consecutive time points]
Conditioning: the indirect effect through the confounder is adjusted away
Widely known in epidemiology; see e.g. Rothman (2002)

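Conditioning, caveat 2: Non-collapsibility
Simulation: confounder L, exposure A and outcome Y, all dichotomous:
$\mathrm{logit}\,P(L = 1) = \alpha_L$, $\mathrm{logit}\,P(A = 1) = \alpha_A + 2L$, $\mathrm{logit}\,P(Y = 1) = \alpha_Y + A + \beta L$
True conditional causal OR: $e$ (the coefficient of A is 1)
True marginal causal OR: with $\mathrm{logit}\,P(Y^a = 1 \mid L) = \alpha_Y + a + \beta L = f(a, L, \beta)$,
$P(Y^a = 1) = \sum_l \mathrm{expit}\{f(a, l, \beta)\}\, P(L = l) = g(a, \beta)$ and
true marginal cOR $= \dfrac{g(1, \beta)/\{1 - g(1, \beta)\}}{g(0, \beta)/\{1 - g(0, \beta)\}}$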
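The sketch below evaluates the functions f and g from this slide. Its intercepts are hypothetical illustrative values, chosen here rather than taken from the slide: α_L = 0, so P(L = 1) = 0.5, and α_Y = -1. The exposure model does not enter g(a, β), because the counterfactual risk averages over the marginal distribution of L.

```python
import math

def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical intercepts (illustrative assumptions, not values from the slide).
P_L1 = expit(0.0)   # P(L = 1) with alpha_L = 0
ALPHA_Y = -1.0      # alpha_Y in logit P(Y = 1) = alpha_Y + A + beta * L

def f(a: int, l: int, beta: float) -> float:
    """logit P(Y^a = 1 | L = l); the coefficient of a is 1, so the conditional OR is e."""
    return ALPHA_Y + a + beta * l

def g(a: int, beta: float) -> float:
    """P(Y^a = 1): expit(f) averaged over the marginal distribution of L."""
    return sum(expit(f(a, l, beta)) * p_l for l, p_l in ((0, 1.0 - P_L1), (1, P_L1)))

def odds(p: float) -> float:
    return p / (1.0 - p)

def marginal_cor(beta: float) -> float:
    """True marginal causal OR: odds of g(1, beta) divided by odds of g(0, beta)."""
    return odds(g(1, beta)) / odds(g(0, beta))

for beta in (0.0, 1.0, 2.0, 5.0):
    print(f"beta = {beta}: conditional OR = {math.e:.3f}, marginal OR = {marginal_cor(beta):.3f}")
```

With β = 0 the marginal and conditional ORs coincide. For the other values of β used here, the marginal OR moves toward 1, although no confounding is left in g.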

Conditioning, caveat 2: Non-collapsibility
[Figure: true marginal causal OR and true conditional causal OR plotted against β]

Conditioning, caveat 3: Berkson's bias
[Figure: causal diagrams with A, L, U and Y, in which L is a common effect of A and U, and U affects Y]
Condition on L: an association between A and U is introduced
Therefore, an association between A and Y is introduced
See e.g. Hernán et al. (2004)
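The small simulation below illustrates the mechanism. A and U are generated independently, and L depends on both. The coefficients (-2, 2, 2), the sample size and the seed are hypothetical choices. The code compares P(U = 1 | A) in the whole sample with the same comparison after conditioning on L = 1.

```python
import math
import random

def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(2024)
n = 20_000
data = []
for _ in range(n):
    a = int(rng.random() < 0.5)                          # exposure A
    u = int(rng.random() < 0.5)                          # unmeasured U, independent of A
    l = int(rng.random() < expit(-2 + 2 * a + 2 * u))    # L is a common effect of A and U
    data.append((a, u, l))

def risk_u(rows, a):
    """P(U = 1 | A = a) within the given rows."""
    values = [u for (aa, u, _) in rows if aa == a]
    return sum(values) / len(values)

selected = [row for row in data if row[2] == 1]          # conditioning on L = 1

diff_all = risk_u(data, 1) - risk_u(data, 0)
diff_sel = risk_u(selected, 1) - risk_u(selected, 0)
print(f"P(U=1|A=1) - P(U=1|A=0): whole sample {diff_all:+.3f}, within L = 1 {diff_sel:+.3f}")
```

In the whole sample the difference is close to zero. Within L = 1 it is clearly negative: among subjects with L = 1, those without A more often owe L to U.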

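Adjusting for confounding: IPW
Inverse probability weighting (IPW): within each confounder stratum, each exposure group is weighted by the inverse of its probability, so that every group counts 6 subjects:
P(Exposure = Blue) = 1/3: weight by 1/(1/3) = 3, and 3 × 2 = 6
P(Exposure = Blue) = 2/3: weight by 1/(2/3) = 1.5, and 1.5 × 4 = 6
P(Exposure = Red) = 2/3: weight by 1/(2/3) = 1.5, and 1.5 × 4 = 6
P(Exposure = Red) = 1/3: weight by 1/(1/3) = 3, and 3 × 2 = 6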
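The arithmetic above can be reproduced in a few lines. The cell counts are inferred from the products on the slide: two confounder strata of 6 subjects each, one with 2 Blue and 4 Red and the other with 4 Blue and 2 Red. The labels "stratum 1" and "stratum 2" are hypothetical. Exact fractions keep the weights readable.

```python
from fractions import Fraction

# Subjects per (confounder stratum, exposure colour), as implied by the slide's products.
counts = {
    ("stratum 1", "Blue"): 2, ("stratum 1", "Red"): 4,
    ("stratum 2", "Blue"): 4, ("stratum 2", "Red"): 2,
}

def stratum_size(stratum):
    return sum(n for (s, _), n in counts.items() if s == stratum)

weighted = {}
for (stratum, colour), n in counts.items():
    p = Fraction(n, stratum_size(stratum))    # P(Exposure = colour | stratum)
    weight = 1 / p                            # inverse probability weight
    weighted[(stratum, colour)] = weight * n  # size of this cell after weighting
    print(f"{stratum}, {colour}: P = {p}, weight = {float(weight)}, weighted n = {weight * n}")
```

After weighting, every exposure group in every stratum contains 6 (pseudo-)subjects, so exposure no longer depends on the stratum.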

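Adjusting for confounding: IPW
Result of IPW: a reweighted pseudo-population in which exposure no longer depends on the confounder
Take into account the dependency within patients: a patient with weight 3 counts as three copies that are not independent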

IPW, formally
Weight each observation i by $w_i = \dfrac{1}{P(A_i = a_i \mid L_i = l_i)}$
Stabilized weight: $sw_i = \dfrac{P(A_i = a_i)}{P(A_i = a_i \mid L_i = l_i)}$
Continuous exposure: $sw_i = \dfrac{f(a_i)}{f(a_i \mid l_i)}$
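The sketch below computes these weights on simulated data. For a binary exposure it uses a logistic model fitted by Newton-Raphson. For a continuous exposure it uses normal densities from a linear model. Both model choices and the data-generating values are illustrative assumptions. Stabilized weights should average about 1, and unstabilized binary-exposure weights about 2.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
n = 5000
L = rng.normal(size=n)
X = np.column_stack([np.ones(n), L])

# Binary exposure: w_i = 1 / P(A_i = a_i | L_i), sw_i = P(A_i = a_i) / P(A_i = a_i | L_i)
A_bin = (rng.random(n) < 1 / (1 + np.exp(-(-0.5 + L)))).astype(float)
p_cond = 1 / (1 + np.exp(-X @ fit_logistic(X, A_bin)))
p_marg = A_bin.mean()
w_bin = 1 / np.where(A_bin == 1, p_cond, 1 - p_cond)
sw_bin = np.where(A_bin == 1, p_marg, 1 - p_marg) * w_bin

# Continuous exposure: sw_i = f(a_i) / f(a_i | l_i), here with normal densities
A_cont = 0.5 * L + rng.normal(size=n)
coef, *_ = np.linalg.lstsq(X, A_cont, rcond=None)
resid_sd = np.std(A_cont - X @ coef, ddof=2)
sw_cont = (normal_pdf(A_cont, A_cont.mean(), A_cont.std(ddof=1))
           / normal_pdf(A_cont, X @ coef, resid_sd))

print(f"mean w (binary) = {w_bin.mean():.2f}, mean sw (binary) = {sw_bin.mean():.3f}, "
      f"mean sw (continuous) = {sw_cont.mean():.3f}")
```

Stabilization does not change which subjects are up- or down-weighted within an exposure level. It rescales the weights so that they average about 1, which usually reduces their variance.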

HIV example
Effect of active tuberculosis on mortality in HIV patients (n = 276)
Active tuberculosis (TB) status: A(t) changes from 0 to 1
Outcome: mortality
Confounder: CD4 count, L(t)

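HIV example: inverse probability weighting
Observations: $\{A_{ij}, L_{ij}, Y_{ij}\}$
Weight observations by
$$sw_{ij} = \prod_{k=1}^{j} \frac{P[A_{ik} = a_{ik} \mid \bar{A}_{i,k-1} = \bar{a}_{i,k-1}]}{P[A_{ik} = a_{ik} \mid \bar{A}_{i,k-1} = \bar{a}_{i,k-1}, L_{ik} = l_{ik}]} \qquad (1)$$
The elements in the numerator and denominator of (1) are estimated using regression models
Fit the MSM on the weighted observations: $\lambda_{T^{\bar{a}}}(t) = \lambda_0(t)\, e^{\theta a(t)}$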
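Given fitted probabilities from the two regression models, the weight in (1) is a running product over each subject's visits. The sketch below assumes the probabilities are supplied as P(A_k = 1 | ...) for the numerator and denominator models. The rows are hypothetical values, not the HIV data, and no model is fitted here.

```python
from collections import defaultdict

def p_observed(a, p1):
    """Probability of the observed treatment a, given P(A = 1 | ...) = p1."""
    return p1 if a == 1 else 1.0 - p1

def longitudinal_sw(rows):
    """rows: (subject, visit, a, p_num, p_den), sorted by subject and then visit.
    p_num = P(A_k = 1 | treatment history), p_den = P(A_k = 1 | treatment history, L_k).
    Returns {(subject, visit): sw_ij}, the product of the ratios over visits k <= j."""
    running = defaultdict(lambda: 1.0)
    weights = {}
    for subject, visit, a, p_num, p_den in rows:
        running[subject] *= p_observed(a, p_num) / p_observed(a, p_den)
        weights[(subject, visit)] = running[subject]
    return weights

# Hypothetical fitted probabilities for two subjects with two visits each.
rows = [
    (1, 1, 0, 0.20, 0.5),
    (1, 2, 1, 0.25, 0.5),
    (2, 1, 0, 0.20, 0.1),
    (2, 2, 0, 0.20, 0.4),
]
for key, w in longitudinal_sw(rows).items():
    print(key, round(w, 4))
```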

Adjusting for confounding: G-computation
RR = (3/6)/(4/6) = 0.75
$f(y^a) = \int_l f(y^a \mid L = l)\, f(l)\, dl$





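Adjusting for confounding: G-computation
RR = 0.67 (6 versus 9 events, out of equal denominators)
$f(y^a) = \int_l f(y \mid A = a, L = l)\, f(l)\, dl$, since under no unmeasured confounding and consistency $f(y^a \mid L = l) = f(y \mid A = a, L = l)$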
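For a discrete confounder, the integral becomes a sum: P(Y^a = 1) = Σ_l P(Y = 1 | A = a, L = l) P(L = l). The sketch below applies this nonparametric G-computation to hypothetical counts, which are not the data on the slide. It also prints the crude risk ratio for comparison.

```python
# Hypothetical counts: (l, a) -> (number of subjects, number of events)
counts = {
    (0, 0): (20, 4), (0, 1): (10, 1),
    (1, 0): (10, 6), (1, 1): (20, 9),
}
n_total = sum(n for n, _ in counts.values())

def p_l(l):
    """Marginal distribution of the confounder, f(l)."""
    return sum(n for (ll, _), (n, _) in counts.items() if ll == l) / n_total

def risk_given(a, l):
    """P(Y = 1 | A = a, L = l), estimated within the stratum."""
    n, events = counts[(l, a)]
    return events / n

def g_formula_risk(a):
    """P(Y^a = 1) = sum over l of P(Y = 1 | A = a, L = l) * P(L = l)."""
    return sum(risk_given(a, l) * p_l(l) for l in (0, 1))

def crude_risk(a):
    n = sum(counts[(l, a)][0] for l in (0, 1))
    return sum(counts[(l, a)][1] for l in (0, 1)) / n

crude_rr = crude_risk(1) / crude_risk(0)
causal_rr = g_formula_risk(1) / g_formula_risk(0)
print(f"crude RR = {crude_rr:.3f}, G-computation RR = {causal_rr:.4f}")
```

In these made-up data the crude RR is 1 because confounding masks the effect. Standardizing both counterfactual risks to the same distribution of L gives 0.275 / 0.4 = 0.6875.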

Van der Wal et al. (2009)

MSMs: other methods
Doubly robust estimation
Targeted maximum likelihood estimation (see Van der Laan, 2010)

Assumptions
No unmeasured confounding
Consistency: a subject's counterfactual outcome under his or her observed exposure level is precisely his or her observed outcome
Positivity: every exposure level has a positive probability of being allocated in every stratum defined by the confounders (IPW is especially sensitive to violations)
Correct model specification
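A first, crude check of positivity is to list, for each confounder stratum, the exposure levels that never occur in it. The sketch below does this for hypothetical data. It detects only empirical zeros in the sample. Near-violations, such as very small probabilities that produce extreme weights, need a look at the fitted probabilities or the weights themselves.

```python
# Hypothetical data: (confounder stratum, exposure level)
observations = [("young", 0), ("young", 1), ("young", 1),
                ("old", 0), ("old", 0), ("old", 0)]
exposure_levels = {a for _, a in observations}

def positivity_violations(obs, levels):
    """Return, per stratum, the exposure levels that are never observed in that stratum."""
    seen_by_stratum = {}
    for stratum, a in obs:
        seen_by_stratum.setdefault(stratum, set()).add(a)
    return {s: sorted(levels - seen) for s, seen in seen_by_stratum.items() if levels - seen}

print(positivity_violations(observations, exposure_levels))  # {'old': [1]}
```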

Implementation
SPSS: weighting possible, but no robust standard errors; bootstrap rather difficult
SAS: macro for binary exposure, point treatment: Jonsson Funk et al. (2007); longitudinal IPW example: Hernán et al. (2000)
Stata: longitudinal IPW example: Fewell et al. (2004)
R: causalgam (binary exposure, point treatment); cvdsa (point treatment, data-adaptive algorithm, http://www.stat.berkeley.edu/~laan/software/); tmlelite (binary point treatment, targeted maximum likelihood estimation, http://www.stat.berkeley.edu/~laan/software/); ipw (point treatment and longitudinal, flexible models)

IPW problems
Inefficient compared with likelihood-based methods
Small-sample bias when strata defined by the confounder have low response probabilities

Simulation study
1. Continuous confounder: $L \sim N(10, 5)$
2. Dichotomous exposure: $\mathrm{logit}\,P(A = 1) = -10 + L$
3. Continuous outcome: $Y = 10A + 0.5L + N(0, 5)$

Density plots (sample of n=)
[Figure: densities of L for A = 0 and A = 1]
Difficult to reweight: the confounder distributions of the two exposure groups overlap little

Models
1. Uncorrected: $E(Y \mid A) = \beta_0 + \beta_1 A$
2. Conditional: $E(Y \mid A, L) = \beta_0 + \beta_1 A + \beta_2 L$
3. IPW MSM: $E(Y^a) = \beta_0 + \beta_1 a$, with observations weighted by $sw_i = \dfrac{P(A_i = a_i)}{P(A_i = a_i \mid L_i = l_i)}$
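The sketch below fits the three models to one simulated dataset. The data-generating values are illustrative assumptions for this example: L ~ N(10, sd 5), logit P(A = 1) = -10 + L, and Y = 10A + 0.5L + N(0, sd 5), so the true effect is 10. The sample size and seed are arbitrary. The exposure model is a logistic regression fitted by Newton-Raphson.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

def least_squares(X, y, w=None):
    """(Weighted) least squares via rescaling each row by sqrt(w)."""
    w = np.ones(len(y)) if w is None else w
    root_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return coef

rng = np.random.default_rng(7)
n = 2000
L = rng.normal(10, 5, size=n)                                 # assumed: mean 10, sd 5
A = (rng.random(n) < 1 / (1 + np.exp(-(-10 + L)))).astype(float)
Y = 10 * A + 0.5 * L + rng.normal(0, 5, size=n)               # assumed true effect: 10

ones = np.ones(n)
b_uncorrected = least_squares(np.column_stack([ones, A]), Y)[1]     # model 1
b_conditional = least_squares(np.column_stack([ones, A, L]), Y)[1]  # model 2

# Model 3: stabilized weights from a logistic exposure model (centred L for stability)
XL = np.column_stack([ones, L - L.mean()])
p_cond = 1 / (1 + np.exp(-XL @ fit_logistic(XL, A)))
p_marg = A.mean()
sw = np.where(A == 1, p_marg, 1 - p_marg) / np.where(A == 1, p_cond, 1 - p_cond)
b_ipw = least_squares(np.column_stack([ones, A]), Y, sw)[1]

print(f"uncorrected: {b_uncorrected:.2f}, conditional: {b_conditional:.2f}, IPW MSM: {b_ipw:.2f}")
```

The uncorrected estimate is biased upward, because treated subjects have higher L. With so little overlap, a few subjects get very large weights, and the IPW estimate varies strongly from one simulated sample to the next.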


[Figure: parameter estimates (n=, 5 runs) for the uncorrected, conditional and IPW analyses, and for IPW with n=]
IPW: correct model specification!

Iterative IPW algorithm
1. Set $sw_{i,0} = 1$.
2. For $k = 1, \ldots, k_{max}$ repeat the following steps:
   1. In the weighted dataset, estimate new inverse probability weights $\widehat{sw}_{i,k}$.
   2. Convergence is reached when $\mathrm{VAR}(\ln \widehat{sw}_{i,k}) < c_{STOP}$: stop.
   3. Update the inverse probability weights by multiplying: $sw_{i,k} = sw_{i,k-1} \cdot \widehat{sw}_{i,k}$.
   4. Truncate $sw_{i,k}$ at the percentiles $p_{trunc}$% and $(100 - p_{trunc})$%.
   5. Center the weights at a mean of 1: $sw_{i,k} = sw_{i,k} / \mathrm{MEAN}(sw_{i,k})$.
Use the weights $sw_{i,k}$ obtained after convergence (or when $k_{max}$ was reached).
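The sketch below follows our reading of the steps above for a binary exposure and a single continuous confounder. The function names and defaults are hypothetical. It assumes a weighted logistic model for the exposure, fitted by Newton-Raphson, with the marginal P(A = 1) taken as the weighted mean of A. The check at the end measures balance as the weighted difference in mean L between exposure groups.

```python
import numpy as np

def weighted_logit(X, y, w, n_iter=50, tol=1e-12):
    """Weighted logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        step = np.linalg.solve(X.T @ (X * (w * p * (1 - p))[:, None]), X.T @ (w * (y - p)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def estimate_sw(a, L, w):
    """Stabilized weights P_w(A = a) / P_w(A = a | L), estimated in the data weighted by w."""
    X = np.column_stack([np.ones_like(L), L])
    p_cond = 1.0 / (1.0 + np.exp(-X @ weighted_logit(X, a, w)))
    p_marg = np.sum(w * a) / np.sum(w)
    return np.where(a == 1, p_marg, 1 - p_marg) / np.where(a == 1, p_cond, 1 - p_cond)

def iterative_ipw(a, L, p_trunc=1.0, k_max=50, c_stop=1e-7):
    """Hypothetical helper; returns (weights, iterations used, converged flag)."""
    sw = np.ones(len(a))                                   # sw_{i,0} = 1
    for k in range(1, k_max + 1):
        sw_hat = estimate_sw(a, L, sw)                     # new weights in weighted data
        if np.var(np.log(sw_hat)) < c_stop:                # convergence check
            return sw, k, True
        sw = sw * sw_hat                                   # multiplicative update
        lo, hi = np.percentile(sw, [p_trunc, 100 - p_trunc])
        sw = np.clip(sw, lo, hi)                           # truncation
        sw = sw / sw.mean()                                # centre at mean 1
    return sw, k_max, False

def weighted_mean(x, w):
    return np.sum(w * x) / np.sum(w)

rng = np.random.default_rng(3)
n = 2000
L = rng.normal(size=n)
A = (rng.random(n) < 1 / (1 + np.exp(-(-0.5 + L)))).astype(float)

sw, iterations, converged = iterative_ipw(A, L, p_trunc=0.0)   # 0% truncation
gap = weighted_mean(L[A == 1], sw[A == 1]) - weighted_mean(L[A == 0], sw[A == 0])
print(f"converged={converged} after {iterations} iterations; weighted mean difference in L = {gap:.2e}")
```

At convergence the weighted logistic slope is essentially zero, so exposure and confounder are balanced in the weighted data. Truncation (p_trunc > 0) trades some of that balance for less extreme weights.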

[Figure: parameter estimates (n=, 5 runs) for the uncorrected, conditional and IPW analyses and for IIPW at several truncation levels; $k_{max}$ = 5, $c_{STOP}$ = 7]

Simulation results (5 runs)
Columns: Estimate (mean bias, RMSE) | 95% C.I. (coverage, average length) | IIPW iterations (median, converged); Sample size
Uncorrected: 3.75 3.76..3
Conditional: ..48.95.88
IPW: .89.92.58 3.93
IIPW, p_trunc = %: -.3.59.88 4.89 4 99.9%
IIPW, p_trunc = .%: ..2.93 4.47 5 99.98%
IIPW, p_trunc = %: ..83.95 3.27 6 %
IIPW, p_trunc = 5%: ..67.95 2.63 9 %
IIPW, p_trunc = %: ..63.95 2.47 4 %
IIPW, p_trunc = 2%: -..87.94 2.95 68 99.82%

Discussion
Is the resulting IIPW estimator consistent?
Implementation for longitudinal data
Robustness against misspecification
Or maybe we should all just use TMLE?

Literature
Cole SR, Hernán MA (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656-664.
Fewell Z, Hernán MA, Wolfe F, Tilling K, Choi H, Sterne JAC (2004). Controlling for time-dependent confounding using marginal structural models. The Stata Journal, 4(4), 402-420.
Hernán MA (2004). A definition of causal effect for epidemiological research. Journal of Epidemiology & Community Health, 58, 265-271.
Hernán MA, Brumback BA, Robins JM (2000). Marginal structural models to estimate the causal effect of Zidovudine on the survival of HIV-positive men. Epidemiology, 11(5), 561-570.
Hernán MA, Hernández-Díaz S, Robins JM (2004). A structural approach to selection bias. Epidemiology, 15(5), 615-625.
Jonsson Funk M, Westreich D, Davidian M, Weisen C (2007). Introducing a SAS macro for doubly robust estimation. SAS Global Forum 2007, paper 189-2007.
Pearl J (2010). An introduction to causal inference. The International Journal of Biostatistics, 6(2), Article 7.
Petersen ML, Deeks SG, Martin JN, Van der Laan MJ (2007). History-adjusted marginal structural models for estimating time-varying effect modification. American Journal of Epidemiology, 166(9), 985-993.
Rothman KJ (2002). Epidemiology: An Introduction. New York: Oxford University Press.
Van der Laan MJ (2010). Targeted maximum likelihood based causal inference: Part I. The International Journal of Biostatistics, 6(2), Article 2.
Van der Wal WM, Prins M, Lumbreras B, Geskus RB (2009). A simple G-computation algorithm to quantify the causal effect of a secondary illness on the progression of a chronic disease. Statistics in Medicine, 28(18), 2325-2337.