Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool May 9, 2017
Collaborators M. Prague - Marginal methods for CRTs May 9, 2017-2
Outline
Conditional vs. marginal models
Marginal model estimation
Improved marginal model estimation (doubly robust)
Conclusion / discussion
1 Background Conditional vs. Marginal models
Notations
In cross-sectional CRTs:
$A_i$: intervention group of cluster $i$
$X_{ij}$: baseline covariates of individual $j$ in cluster $i$
$Y_{ij}$: outcome at the time of interest for individual $j$ in cluster $i$
Conditional regression (mixed-effects models): $E(Y_{ij} \mid A_i, u_i) = g(\beta_0 + \beta_A^{COND} A_i + u_i)$, with $u_i \sim \mathcal{N}(0, \sigma^2)$; $\beta_A^{COND}$ is the conditional effect of the intervention.
Marginal regression (estimating-equation-based models): $\mu_{ij} = E(Y_{ij} \mid A_i) = g(\beta_0 + \beta_A^{MAR} A_i)$; $\beta_A^{MAR}$ is the marginal effect of the intervention.
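To see why $\beta_A^{COND}$ and $\beta_A^{MAR}$ are different parameters under a non-linear link, here is a minimal sketch (our illustration, not taken from the talk). It assumes a logistic link, so $g$ is the inverse logit, and a normal random intercept. It computes the marginal means $E(Y_{ij} \mid A_i = a)$ by numerically averaging the conditional model over $u_i$, then the implied marginal log odds ratio. The function names and parameter values are hypothetical.

```python
import math


def expit(x: float) -> float:
    """Inverse logit, playing the role of g(.) in the notation above."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def marginal_mean(beta0: float, beta_cond: float, a: int, sigma: float,
                  n_grid: int = 2001) -> float:
    """E(Y | A = a): average of g(beta0 + beta_cond * a + u) over u ~ N(0, sigma^2).

    Uses the trapezoidal rule on [-8 sigma, 8 sigma]; sigma = 0 means no
    random effect, so conditional and marginal means coincide.
    """
    if sigma == 0.0:
        return expit(beta0 + beta_cond * a)
    lo, hi = -8.0 * sigma, 8.0 * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = lo + k * h
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * expit(beta0 + beta_cond * a + u) * density
    return total * h


def marginal_log_or(beta0: float, beta_cond: float, sigma: float) -> float:
    """Marginal log odds ratio implied by the conditional (random-intercept) model."""
    p1 = marginal_mean(beta0, beta_cond, 1, sigma)
    p0 = marginal_mean(beta0, beta_cond, 0, sigma)
    return logit(p1) - logit(p0)


if __name__ == "__main__":
    for s in (0.0, 1.0, 2.0):
        print(s, round(marginal_log_or(-1.0, 0.5, s), 3))
```

With a conditional log odds ratio of 0.5, the marginal log odds ratio equals 0.5 only when $\sigma = 0$ and shrinks towards 0 as $\sigma$ grows. Neither estimate is wrong; they answer different questions.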
Conditional vs. marginal methods
It is essential to understand the underlying assumptions of each method:
Conditional models ($\beta_A^{COND}$) rely on correct specification of untestable aspects of the data distribution (e.g., the random-effects distribution).
Marginal models ($\beta_A^{MAR}$) rely on a correct definition of the population of interest, which can make it difficult to generalise results to other populations.
Definition of the parameter of interest (the intervention effect):
Conditional mean, $\beta_A^{COND}$: the effect given the other responses in the cluster(s) and the unobserved random effects.
Marginal mean, $\beta_A^{MAR}$: the effect on the average response across the population.
How to make a decision? Pros and cons [Hubbard et al. 2010]
More...
Turner E., Li F., Gallis J., Prague M. and Murray D. Review of recent methodological developments in group-randomized trials: Part 1 - Design. American Journal of Public Health (2017), in press.
Turner E., Prague M., Gallis J., Li F. and Murray D. Review of recent methodological developments in group-randomized trials: Part 2 - Analysis. American Journal of Public Health (2017), in press.
2 Marginal Models Adjustment for missing data
GEE principle [Liang and Zeger, 1986]
Solve $\sum_{i=1}^{M} m(Y_i, A_i, \beta) = \sum_{i=1}^{M} \left(\frac{\partial \mu_i}{\partial \beta}\right)^\top V_i^{-1} (Y_i - \mu_i) = 0$.
First, a naive regression (GLM) is fitted, assuming that the observations within a cluster are independent.
Then residuals (observed minus predicted) are computed from the naive model, and a working correlation matrix is estimated from these residuals.
Then the regression coefficients are refitted, accounting for the correlation. The last two steps are iterated until convergence.
The within-cluster correlation structure is treated as a nuisance parameter: it is not itself of inferential interest.
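The iterative scheme above can be sketched in a few lines for the simplest case. This is a minimal illustration under our own assumptions (identity link, Gaussian-type outcome, exchangeable working correlation, moment estimator for the correlation $\rho$), not the algorithm of any particular package. With an identity link $D_i = X_i$, so each step is a weighted least-squares solve; the first pass uses $\rho = 0$, which is the naive independence fit. `gee_exchangeable` and `simulate_crt` are hypothetical names.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def gee_exchangeable(clusters, n_iter=10):
    """clusters: list of clusters, each a list of (x, y) with x a covariate list."""
    p = len(clusters[0][0][0])
    rho, beta = 0.0, [0.0] * p  # rho = 0: the first pass is the naive independence fit
    for _ in range(n_iter):
        lhs = [[0.0] * p for _ in range(p)]
        rhs = [0.0] * p
        for cl in clusters:
            n = len(cl)
            # Exchangeable R has inverse c1 * I - c2 * J (J = matrix of ones)
            c1 = 1.0 / (1.0 - rho)
            c2 = rho / ((1.0 - rho) * (1.0 + (n - 1) * rho))
            sx = [sum(x[k] for x, _ in cl) for k in range(p)]
            sy = sum(y for _, y in cl)
            for k in range(p):
                rhs[k] += c1 * sum(x[k] * y for x, y in cl) - c2 * sx[k] * sy
                for l in range(p):
                    lhs[k][l] += c1 * sum(x[k] * x[l] for x, _ in cl) - c2 * sx[k] * sx[l]
        beta = solve(lhs, rhs)  # identity link: the estimating equation is linear in beta
        # Residuals from the current fit, then moment estimates of phi and rho
        res = [[y - sum(b * v for b, v in zip(beta, x)) for x, y in cl] for cl in clusters]
        n_obs = sum(len(r) for r in res)
        phi = sum(e * e for r in res for e in r) / (n_obs - p)
        # Within each cluster: sum_{j<k} e_j e_k = ((sum e)^2 - sum e^2) / 2
        cross = sum((sum(r) ** 2 - sum(e * e for e in r)) / 2.0 for r in res)
        n_pairs = sum(len(r) * (len(r) - 1) / 2.0 for r in res)
        rho = min(max(cross / ((n_pairs - p) * phi), 0.0), 0.95)  # clamping is a simplification
    return beta, rho


def simulate_crt(m=60, n=20, seed=42):
    """Hypothetical data: Y_ij = 1 + 2 A_i + 0.5 x_ij + u_i + e_ij, u_i and e_ij ~ N(0, 1)."""
    rng = random.Random(seed)
    clusters = []
    for i in range(m):
        a, u = float(i % 2), rng.gauss(0.0, 1.0)
        cl = []
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            cl.append(([1.0, a, x], 1.0 + 2.0 * a + 0.5 * x + u + rng.gauss(0.0, 1.0)))
        clusters.append(cl)
    return clusters


if __name__ == "__main__":
    beta, rho = gee_exchangeable(simulate_crt())
    print("beta:", [round(b, 2) for b in beta], "rho:", round(rho, 2))
```

In this simulated example the true within-cluster correlation is 0.5, so the estimated $\rho$ should land near that value.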
Quadratic Inference Function (QIF)
Limitation of GEE: the working correlation matrix can be difficult to specify. GEE remains consistent when $V_i$ is misspecified (provided the mean model is correct), but it loses efficiency.
QIF: $V_i^{-1} \approx a_0 M_0 + a_1 M_1 + a_2 M_2 + \dots + a_n M_n$, where $(a_0, \dots, a_n)$ are estimated and $(M_0, \dots, M_n)$ is a basis of known matrices.
Facts: QIF can be more efficient than GEE [Odueyungbo et al. 2008]. No implementation yet in R, SAS or Stata.
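The basis idea is easy to check for an exchangeable structure. The sketch below assumes a two-matrix basis, $M_0 = I$ and $M_1$ = ones off the diagonal, which is our choice for illustration. It computes the coefficients for which $a_0 M_0 + a_1 M_1$ equals the inverse of an exchangeable correlation matrix exactly, and verifies that $R \,(a_0 M_0 + a_1 M_1) = I$. All function names are hypothetical.

```python
def exchangeable_R(n, rho):
    """n x n exchangeable correlation matrix: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]


def qif_basis(n):
    """Assumed basis: M0 = identity, M1 = ones off the diagonal."""
    m0 = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
    m1 = [[0.0 if j == k else 1.0 for k in range(n)] for j in range(n)]
    return m0, m1


def exchangeable_inverse_coefs(n, rho):
    """(a0, a1) such that R^{-1} = a0 * M0 + a1 * M1 for exchangeable R."""
    c1 = 1.0 / (1.0 - rho)
    c2 = rho / ((1.0 - rho) * (1.0 + (n - 1) * rho))
    return c1 - c2, -c2  # from R^{-1} = c1 * I - c2 * J and J = M0 + M1


def combine(coefs, mats):
    n = len(mats[0])
    return [[sum(a * m[j][k] for a, m in zip(coefs, mats)) for k in range(n)]
            for j in range(n)]


def matmul(a, b):
    return [[sum(a[j][t] * b[t][k] for t in range(len(b))) for k in range(len(b[0]))]
            for j in range(len(a))]


if __name__ == "__main__":
    n, rho = 5, 0.3
    approx_inv = combine(exchangeable_inverse_coefs(n, rho), qif_basis(n))
    for row in matmul(exchangeable_R(n, rho), approx_inv):
        print([round(v, 10) for v in row])  # identity matrix up to rounding
```

For exchangeable correlation the two-matrix expansion is exact. For other structures (e.g. AR(1)) a basis only approximates $V_i^{-1}$, which is where QIF's robustness to the choice of working correlation comes from.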
The missing data problem
MCAR (missing completely at random): $P(R_{ij} \mid Y_i^{obs}, Y_i^{miss}, X_i, A_i) = P(R_{ij})$
MAR (missing at random): $P(R_{ij} \mid Y_i^{obs}, Y_i^{miss}, X_i, A_i) = P(R_{ij} \mid X_i, A_i, Y_i^{obs})$
CDM (covariate-dependent missingness): $P(R_{ij} \mid Y_i^{obs}, Y_i^{miss}, X_i, A_i) = P(R_{ij} \mid X_i, A_i)$
MNAR (missing not at random): $P(R_{ij} \mid Y_i^{obs}, Y_i^{miss}, X_i, A_i) = P(R_{ij} \mid X_i, A_i, Y_i^{obs}, Y_i^{miss})$
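A small simulation makes the practical difference between these mechanisms concrete. This sketch is our own illustration with a single, unclustered outcome (so MAR, which conditions on other observed outcomes in the cluster, is left out). Assuming $Y = X + \varepsilon$ with $E(Y) = 0$, it compares the complete-case mean when the observation probability is constant (MCAR), depends on $X$ (CDM), or depends on $Y$ itself (MNAR). The function name and all parameter values are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def complete_case_means(n=50000, seed=7):
    """Complete-case mean of Y (true mean 0) under three missingness mechanisms."""
    rng = random.Random(seed)
    sums = {"MCAR": [0.0, 0], "CDM": [0.0, 0], "MNAR": [0.0, 0]}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        # P(R = 1 | ...) for each mechanism
        probs = {"MCAR": 0.5, "CDM": expit(x), "MNAR": expit(y)}
        for mech, p in probs.items():
            if rng.random() < p:  # outcome observed
                sums[mech][0] += y
                sums[mech][1] += 1
    return {mech: s / c for mech, (s, c) in sums.items()}


if __name__ == "__main__":
    for mech, m in complete_case_means().items():
        print(mech, round(m, 3))
```

Only the MCAR complete-case mean stays near 0. Under CDM the bias can be removed by modelling $P(R = 1 \mid X)$; under MNAR it cannot be removed from the observed data alone.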
Adjusting for missing data
Idea 1 (multiple imputation): for every missing outcome, impute plausible values.
Disadvantage: how to impute? We need a model for $f(Y \mid X, A)$.
Idea 2 (inverse-probability weighting): suppose four individuals are identical with respect to their covariates and three of them have a missing outcome. The observed individual gets a weight of 4, which corresponds to a probability of observation of 1/4 = 0.25. Its data therefore count once for itself and three times for the missing individuals.
Disadvantage: how to define "identical"? The link between covariates and missingness must be correctly specified to obtain correct weights. We need a model for $P(R = 1 \mid X, A)$.
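The four-individual example can be written out as arithmetic. In this hypothetical sketch, observation probabilities are estimated as the observed fraction within each covariate stratum, and each observed outcome is weighted by $1/\hat\pi$. We assume the missing individuals in stratum `x0` would have had the same outcome (10) as the observed one, so the full-data mean is $(4 \times 10 + 4 \times 2)/8 = 6$. The stratum labels and values are made up.

```python
from collections import defaultdict


def ipw_mean(records):
    """Weighted mean of observed outcomes, weights 1 / (observed fraction in stratum)."""
    n_total, n_obs = defaultdict(int), defaultdict(int)
    for stratum, y in records:
        n_total[stratum] += 1
        if y is not None:
            n_obs[stratum] += 1
    pi = {s: n_obs[s] / n_total[s] for s in n_total}  # estimated P(R = 1 | stratum)
    num = sum(y / pi[s] for s, y in records if y is not None)
    den = sum(1.0 / pi[s] for s, y in records if y is not None)
    return num / den


def complete_case_mean(records):
    observed = [y for _, y in records if y is not None]
    return sum(observed) / len(observed)


# Stratum "x0": 4 identical individuals, 1 observed (weight 4); stratum "x1": 4 observed
EXAMPLE = [("x0", 10.0), ("x0", None), ("x0", None), ("x0", None)] + [("x1", 2.0)] * 4

if __name__ == "__main__":
    print(complete_case_mean(EXAMPLE))  # ignores the three missing individuals
    print(ipw_mean(EXAMPLE))            # observed x0 individual counts 4 times
```

The complete-case mean is 3.6 because stratum `x0` is under-represented among the observed data; the weighted mean recovers 6 only because "identical" was defined correctly here (one stratum variable), which is exactly the disadvantage noted above.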
Inverse probability weighted (IPW) GEE
Solve $\sum_{i=1}^{M} m(Y_i, A_i, X_i, \beta) = \sum_{i=1}^{M} \left(\frac{\partial \mu_i}{\partial \beta}\right)^\top V_i^{-1} W_i (Y_i - \mu_i) = 0$.
Properties:
$W_i = \mathrm{Diag}[R_{ij}/\pi_{ij}]_{j=1,\dots,n_i}$ is the weighting matrix (observed individuals get weight $1/\pi_{ij}$, missing ones weight 0).
$\pi_{ij} = P(R_{ij} = 1 \mid A_i, X_{ij})$ is the propensity score (PS).
The PS model has to be correctly specified to ensure consistency and asymptotic normality (CAN).
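In practice $\pi_{ij}$ is unknown and has to be estimated. A common choice, assumed here, is a logistic regression of $R_{ij}$ on $(1, A_i, X_{ij})$ fitted by maximum likelihood; the sketch below fits it with Newton-Raphson and returns the diagonal of $W_i$. The data-generating values and all function names are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_propensity(design, observed, max_iter=25, tol=1e-10):
    """Newton-Raphson MLE for logit P(R = 1 | x) = x . gamma (x includes an intercept)."""
    p = len(design[0])
    gamma = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for x, r in zip(design, observed):
            pi = expit(sum(g * v for g, v in zip(gamma, x)))
            for k in range(p):
                score[k] += x[k] * (r - pi)
                for l in range(p):
                    info[k][l] += pi * (1.0 - pi) * x[k] * x[l]
        step = solve(info, score)
        gamma = [g + s for g, s in zip(gamma, step)]
        if max(abs(s) for s in step) < tol:
            break
    return gamma


def ipw_weights(design, observed, gamma):
    """Diagonal of W_i: R_ij / pi_ij, i.e. 0 for missing outcomes."""
    return [r / expit(sum(g * v for g, v in zip(gamma, x))) for x, r in zip(design, observed)]


def simulate_missingness(m=100, n=50, gamma=(0.5, -1.0, 1.0), seed=3):
    """Hypothetical CRT: A_i alternates by cluster, X_ij ~ N(0, 1), R_ij ~ Bernoulli(pi_ij)."""
    rng = random.Random(seed)
    design, observed = [], []
    for i in range(m):
        a = float(i % 2)
        for _ in range(n):
            x = [1.0, a, rng.gauss(0.0, 1.0)]
            pi = expit(sum(g * v for g, v in zip(gamma, x)))
            design.append(x)
            observed.append(1 if rng.random() < pi else 0)
    return design, observed


if __name__ == "__main__":
    design, observed = simulate_missingness()
    gamma_hat = fit_propensity(design, observed)
    print([round(g, 2) for g in gamma_hat])  # should be close to (0.5, -1.0, 1.0)
```

If the logistic form (or the chosen covariates) is wrong, the weights are wrong too, which is why the PS model must be correctly specified for IPW-GEE to be CAN.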
The caveat
A wrong formula is often implemented in software [1]:
$\sum_{i=1}^{M} \left(\frac{\partial \mu_i}{\partial \beta}\right)^\top W_i^{1/2} V_i^{-1} W_i^{1/2} (Y_i - \mu_i) = 0$
Solution [Pepe et al. 1992, for longitudinal data]: the weights need to be cluster-specific, or the working correlation matrix should be the identity.
[1] R (geepack), SAS (GENMOD with observation-specific weights), ... In any case, check the manual.
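The two weightings can be compared directly. This sketch (our illustration, exchangeable working correlation assumed, hypothetical function names) builds $V_i^{-1} W_i$ and $W_i^{1/2} V_i^{-1} W_i^{1/2}$ for one cluster. They differ when weights vary within the cluster and the correlation is non-zero, and coincide exactly in the two situations named in the solution: constant (cluster-specific) weights, or an identity working correlation.

```python
import math


def exchangeable_inverse(n, rho):
    """Inverse of an n x n exchangeable correlation matrix: c1 * I - c2 * J."""
    c1 = 1.0 / (1.0 - rho)
    c2 = rho / ((1.0 - rho) * (1.0 + (n - 1) * rho))
    return [[(c1 if j == k else 0.0) - c2 for k in range(n)] for j in range(n)]


def correct_weighting(vinv, w):
    """V^{-1} W: the weight of individual k multiplies column k of V^{-1}."""
    return [[vinv[j][k] * w[k] for k in range(len(w))] for j in range(len(w))]


def sqrt_weighting(vinv, w):
    """W^{1/2} V^{-1} W^{1/2}: the form flagged above as wrong."""
    return [[math.sqrt(w[j]) * vinv[j][k] * math.sqrt(w[k]) for k in range(len(w))]
            for j in range(len(w))]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    vinv = exchangeable_inverse(3, 0.3)
    varying, constant = [1.0, 2.0, 4.0], [2.0, 2.0, 2.0]
    print(max_abs_diff(correct_weighting(vinv, varying), sqrt_weighting(vinv, varying)))
    print(max_abs_diff(correct_weighting(vinv, constant), sqrt_weighting(vinv, constant)))
```

With individual-level IPW weights (the usual case in CRTs with missing outcomes) and a non-independence working correlation, the two estimating equations are therefore genuinely different.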
Simulation - toy example
Settings: $Age_{ij} \sim \mathcal{N}(30, 10)$; $M = 100$ clusters and $n_i = 100$ individuals per cluster; random effect $u_i$ with ICC = 0.05; $R = 1000$ replicates.
Generation:
$\mathrm{logit}\,P(Y_{ij} = 1) = -3.0 - 0.5A_i + 0.15\,Age_{ij} + 0.05\,Age_{ij}A_i + u_i$
$\mathrm{logit}\,P(R_{ij} = \text{missing}) = -5.0 + 0.75A_i + 0.1\,Age_{ij} + 0.05\,Age_{ij}A_i$
Results by R package and working correlation:
                Independence                  Exchangeable
R package       Bias    SE      Coverage (%)  Bias    SE      Coverage (%)
CRTgeeDR        0.013   0.107   95.9          0.017   0.091   94.6
geepack         0.012   0.101   97.4          0.185   0.101    6.3
geem            0.013   0.098   95.9          0.147   0.153   85.1
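One detail of the settings that benefits from a formula is how an ICC of 0.05 translates into a random-intercept SD for a logistic model. Assuming the usual latent-variable definition, $\mathrm{ICC} = \sigma_u^2 / (\sigma_u^2 + \pi^2/3)$, which gives $\sigma_u \approx 0.416$. The sketch below also generates one replicate of the toy design. It rests on assumptions we state loudly: the extracted slide does not preserve the minus signs of the intercepts, and we read them as $-3.0$ and $-5.0$; the 10 in $\mathcal{N}(30, 10)$ is taken as a standard deviation; clusters are allocated 1:1 by alternation. All names are hypothetical.

```python
import math
import random


def sigma_from_icc(icc):
    """Random-intercept SD such that sigma^2 / (sigma^2 + pi^2 / 3) = icc (logistic model)."""
    return math.sqrt(icc / (1.0 - icc) * math.pi ** 2 / 3.0)


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


# ASSUMPTION: intercept signs lost in extraction; -3.0 and -5.0 are our reading.
B_Y = (-3.0, -0.5, 0.15, 0.05)  # intercept, A, Age, Age x A (outcome model)
B_R = (-5.0, 0.75, 0.10, 0.05)  # intercept, A, Age, Age x A (missingness model)


def simulate_toy(m=100, n=100, icc=0.05, seed=2017):
    """One replicate: rows of (cluster, A, age, y, missing)."""
    rng = random.Random(seed)
    sigma = sigma_from_icc(icc)
    data = []
    for i in range(m):
        a = i % 2                            # ASSUMPTION: 1:1 allocation by alternation
        u = rng.gauss(0.0, sigma)
        for _ in range(n):
            age = rng.gauss(30.0, 10.0)      # ASSUMPTION: 10 is the SD
            eta_y = B_Y[0] + B_Y[1] * a + B_Y[2] * age + B_Y[3] * age * a + u
            eta_r = B_R[0] + B_R[1] * a + B_R[2] * age + B_R[3] * age * a
            y = 1 if rng.random() < expit(eta_y) else 0
            missing = rng.random() < expit(eta_r)
            data.append((i, a, age, y, missing))
    return data


if __name__ == "__main__":
    print(round(sigma_from_icc(0.05), 4))
    rows = simulate_toy()
    print(sum(r[4] for r in rows) / len(rows))  # overall proportion missing
```

Missingness here depends on arm and age, i.e. covariate-dependent missingness, so an IPW-GEE with a PS model in $A_i$ and $Age_{ij}$ (including their interaction) is the correctly specified analysis for this design.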
More...
Turner E. and Prague M. GEE analysis of cluster randomized controlled trial data with missing outcomes: a tutorial in inverse probability weighting methods. Submitted to International Journal of Epidemiology.
Liang K.-Y. and Zeger S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73:13-22.
Qu A. et al. (2000). Improving generalised estimating equations using quadratic inference functions. Biometrika 87:823-836.
3 Going further with Marginal Models Doubly Robust, TMLE,...
The baseline covariate imbalance problem
A pronounced baseline imbalance is not expected a priori in a CRT: if the randomisation process has worked correctly, any observed imbalance is a chance phenomenon. Such imbalance affects efficiency but does not introduce bias.
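Doubly robust GEE estimator (implemented in the R package CRTgeeDR)
Outcome model (OM): $B_{ij}(X_i, A_i) = E(Y_{ij} \mid A_i, X_i)$
Propensity score (PS): $[W_i]_{jj} = R_{ij} / P(R_{ij} = 1 \mid X_i, A_i)$
The estimator is consistent if either the OM or the PS corresponds to the true data-generating process.
$\sum_{i=1}^{M} \left\{ D_i^\top V_i^{-1} W_i \left[Y_i - B_i(X_i, A_i)\right] + \sum_{a=0,1} D_i(a)^\top V_i^{-1} \left[B_i(X_i, a) - \mu_i(\beta, a)\right] \right\} = 0$
Here $D_i = \partial \mu_i / \partial \beta$ is the design (derivative) matrix of the model of interest, $D_i(a)$ is the same matrix with $A_i$ set to $a$, and $V_i$ encodes the working correlations. The first term handles missing data by giving more weight to individuals unlikely to be observed; the second is an augmentation term that accounts for unbalanced covariates. Together they measure the distance between the data ($Y$) and the models ($\mu$, $B$).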
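Double robustness can be demonstrated in the simplest setting. The sketch below is our own simplification, not the CRTgeeDR implementation: independence working correlation, identity link and $\mu(\beta, a) = \beta_0 + \beta_A a$, in which case an equation of this form reduces to an augmented IPW estimate of each arm mean. We also put the known randomisation probability $p_a = 0.5$ in the weight so that the residual term targets each arm mean; the slide's $W_i$ shows only the missingness probability. The "correct" PS is the true $\pi_{ij}$ (an oracle, in practice estimated), and the data-generating values and function names are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(m_clusters=600, n=40, seed=1):
    """Hypothetical CRT: Y = 1 + A + 2 X + u + e (true marginal effect 1),
    outcome observed with probability pi = expit(1 + X - A)."""
    rng = random.Random(seed)
    rows = []  # (a, x, y or None, true pi)
    for i in range(m_clusters):
        a = i % 2
        u = rng.gauss(0.0, 0.5)
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            y = 1.0 + a + 2.0 * x + u + rng.gauss(0.0, 1.0)
            pi = expit(1.0 + x - a)
            rows.append((a, x, y if rng.random() < pi else None, pi))
    return rows


def fit_om(rows, use_x):
    """Per-arm OM on observed rows: B(x, a) = c_a + d_a x (correct here) if use_x,
    else the complete-case arm mean (misspecified: ignores X)."""
    coefs = {}
    for a in (0, 1):
        obs = [(x, y) for aa, x, y, _ in rows if aa == a and y is not None]
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        d = 0.0
        if use_x:
            d = (sum((x - mx) * (y - my) for x, y in obs)
                 / sum((x - mx) ** 2 for x, _ in obs))
        coefs[a] = (my - d * mx, d)
    return lambda x, a: coefs[a][0] + coefs[a][1] * x


def dr_effect(rows, om, use_true_pi, p_a=0.5):
    """Augmented IPW estimate of mu(1) - mu(0)."""
    mu = {}
    for a in (0, 1):
        total = 0.0
        for aa, x, y, pi in rows:
            total += om(x, a)  # augmentation: OM prediction for everyone under arm a
            if aa == a and y is not None:
                w = 1.0 / ((pi if use_true_pi else 1.0) * p_a)
                total += w * (y - om(x, a))  # weighted residual of observed individuals
        mu[a] = total / len(rows)
    return mu[1] - mu[0]


if __name__ == "__main__":
    rows = simulate()
    good, bad = fit_om(rows, True), fit_om(rows, False)
    print("OM right, PS wrong:", round(dr_effect(rows, good, False), 3))
    print("OM wrong, PS right:", round(dr_effect(rows, bad, True), 3))
    print("both wrong:        ", round(dr_effect(rows, bad, False), 3))
```

The first two estimates should be close to the true effect of 1, while the last one reduces to the biased complete-case difference, because missingness depends on $X$ differently in the two arms.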
TMLE (soon in the R package ctmle)
South African Men Study [Jemmott et al. (2014)]
Population: men aged 18-45 years, sexually active, who consented and completed the baseline survey.
Intervention (HIV risk reduction): strengthen behavioral beliefs that support condom use; increase skills and self-efficacy to use condoms; increase HIV/STI risk-reduction knowledge.
Control (health promotion): adhere to physical-activity guidelines; eat a diet with 5-a-day fruit-and-vegetable consumption; limit fat and alcohol intake.
South African Men Study [Jemmott et al. (2014)]
Outcome: frequency of protected intercourse
Missing data:
              HIV/STI group      Control group
$Y$           64% [26%; 100%]    60% [22%; 100%]
$R_Y$         20.8%              17.5%
Imbalance in baseline covariates:
                                    HIV/STI group   Control group
% married in the neighbourhood      19.1%           19.2%
% married in the sample             4.4%            7.2%
South African Men Study [Jemmott et al. (2014)]
Primary outcome: frequency of protected intercourse
Method          HIV/STI intervention effect   SD     p-value
GEE (biased)    3.74                          2.36   0.113
IPW-GEE         3.43                          2.49   0.168
DR-GEE          7.39                          2.89   0.010
Secondary outcome: frequency of protected intercourse with a casual partner
Method          HIV/STI intervention effect   SD     p-value
GEE (biased)    -0.50                         1.08   0.369
IPW-GEE         -0.92                         1.09   0.396
DR-GEE          -0.82                         1.04   0.414
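For readers who want to relate the effect, SD and p-value columns, here is a minimal sketch assuming a two-sided Wald z-test on effect/SD (an assumption: the slide does not state which test was used). For the primary-outcome rows it gives p-values close to those reported (about 0.113, 0.168 and 0.011). The function name is hypothetical.

```python
import math


def wald_p_value(estimate, se):
    """Two-sided p-value for z = estimate / se under a standard normal reference."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    for name, est, se in [("GEE", 3.74, 2.36), ("IPW-GEE", 3.43, 2.49), ("DR-GEE", 7.39, 2.89)]:
        print(name, round(wald_p_value(est, se), 3))
```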
More...
Prague M., Wang R., Tchetgen Tchetgen E. and De Gruttola V. Accounting for interactions and complex inter-subject dependency in estimating treatment effect in cluster randomized trials with missing outcomes. Biometrics (2016) 72(4):1066-1077.
Prague M., Wang R. and De Gruttola V. CRTgeeDR: an R package for doubly robust generalized estimating equations estimations in cluster randomized trials with missing data. In revision, R Journal.
Jemmott J. B. III et al. Cluster-randomized controlled trial of an HIV/sexually transmitted infection risk-reduction intervention for South African men. American Journal of Public Health (2013) 104(3):467-473.
van der Laan M. J. and Robins J. M. Unified Methods for Censored Longitudinal Data and Causality. Springer (2003).
4 Conclusion Discussion and future work
Going further
Take-home messages:
Marginal models are best suited to estimating the intervention effect in two-level CRTs.
If you use marginal estimation, you must adjust for missing data.
IPW for CRTs in standard software may be biased because of how the weights are implemented.
Doubly robust or TMLE approaches may improve efficiency (e.g., R packages CRTgeeDR and tmle).
Acknowledgements and funding
Main sources of funding: R37 AI 51164 (PI: V. De Gruttola), R01MH 100974 (PI: S. Little).
Other acknowledgements: R01 HD053270 (PI: J. Jemmott), NCRR 1S10RR028832-01 (Cluster HMS), R01 AI24643 (PI: Rui Wang).
And my new affiliation:
Thanks, and happy to take questions!
SISTM, Inria Bordeaux Sud-Ouest, France
melanie.prague@inria.fr