Integrated approaches for analysis of cluster randomised trials


Integrated approaches for analysis of cluster randomised trials
Invited Session 4.1 - Recent developments in CRTs
Joint work with L. Turner, F. Li, J. Gallis and D. Murray
Mélanie PRAGUE - SCT 2017 - Liverpool
May 9, 2017

Collaborators M. Prague - Marginal methods for CRTs May 9, 2017-2

Outline
  Conditional vs. Marginal models
  Marginal Models Estimation
  Improved Marginal models estimation (Doubly robust)
  Conclusion / Discussion

1 Background: Conditional vs. Marginal models

Notations
In cross-sectional CRTs:
  $A_i$: intervention group for cluster $i$
  $X_{ij}$: baseline covariates of individual $j$ in cluster $i$
  $Y_{ij}$: outcome at the time of interest for individual $j$ in cluster $i$
Conditional regression (mixed effect models):
  $Y_{ij} = g(\beta_0 + \beta_A^{COND} A_i + u_i)$, $u_i \sim N(0, \sigma)$
  $\beta_A^{COND}$ is the conditional effect of intervention.
Marginal regression (estimating equation-based models):
  $\mu_{ij} = E(Y_{ij} \mid A_i) = g(\beta_0 + \beta_A^{MAR} A_i)$
  $\beta_A^{MAR}$ is the marginal effect of intervention.
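The gap between the conditional and the marginal intervention effect can be illustrated numerically. The sketch below assumes a logit link, a binary intervention indicator, and a normal random intercept with standard deviation `sigma`; all parameter values are hypothetical and not taken from the talk. It averages the conditional response over the random effect and reads off the marginal log odds ratio.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(linear_predictor, sigma, n_grid=2001, width=8.0):
    """Average expit(linear_predictor + u) over u ~ N(0, sigma^2) with a Riemann sum."""
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    norm = 0.0
    for k in range(n_grid):
        u = -width * sigma + k * step
        density = math.exp(-0.5 * (u / sigma) ** 2)  # unnormalised normal density
        total += expit(linear_predictor + u) * density
        norm += density
    return total / norm

# Hypothetical conditional model: logit P(Y=1 | A, u) = beta0 + beta_cond * A + u
beta0, beta_cond, sigma = -1.0, 1.0, 1.5

p_control = marginal_prob(beta0, sigma)
p_treated = marginal_prob(beta0 + beta_cond, sigma)

# Marginal effect on the logit scale: logit E(Y | A=1) - logit E(Y | A=0)
beta_mar = logit(p_treated) - logit(p_control)
print(f"conditional effect: {beta_cond:.3f}, marginal effect: {beta_mar:.3f}")
```

Under these assumptions the marginal effect is smaller in magnitude than the conditional one, which is why the two parameters answer different questions even when both models are correct.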

Conditional vs. Marginal methods
It is essential to understand the underlying assumptions of each method:
  Conditional models rely on correct specification of untestable aspects of the data distribution ($\beta_A^{COND}$).
  Marginal models rely on a correct definition of the population of interest, which can make it difficult to generalise results to other populations ($\beta_A^{MAR}$).
Definition of the parameter of interest (intervention effect):
  Conditional mean, $\beta_A^{COND}$: effect given other responses in the cluster(s) and unobserved random effects.
  Marginal mean, $\beta_A^{MAR}$: effect according to the average response across the population.

How to make a decision? Pros and cons [Hubbard et al. 2010]

More...
  Turner E., Li F., Gallis J., Prague M. and Murray D. Review of recent methodological developments in group randomized trials: Part 1 - Design. (2017) American Journal of Public Health. In press.
  Turner E., Prague M., Gallis J., Li F. and Murray D. Review of recent methodological developments in group randomized trials: Part 2 - Analysis. (2017) American Journal of Public Health. In press.

2 Marginal Models: Adjustment for missing data

GEE Principle [Liang and Zeger, 1986]
$$\sum_{i=1}^{M} m(Y_i, A_i, \beta) = \sum_{i=1}^{M} \frac{\partial \mu_i}{\partial \beta} V_i^{-1} (Y_i - \mu_i) = 0$$
  First, a naive linear regression analysis is carried out, assuming the observations within subjects are independent.
  Then, residuals are calculated from the naive model (observed - predicted) and a working correlation matrix is estimated from these residuals.
  Then the regression coefficients are refit, correcting for the correlation (iterative process).
  The within-subject correlation structure is treated as a nuisance variable (i.e. as a covariate).
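The iterative scheme described on this slide can be sketched for the simplest case. The code below is an illustration under stated assumptions, not the algorithm of any particular package: identity link, a single cluster-level binary intervention, an exchangeable working correlation, and a moment estimator for the correlation. Under those assumptions the estimating equation has a closed-form solution at each step, because cluster $i$ contributes its residual sum scaled by $1 / (1 + (n_i - 1)\rho)$. The function name and the data are hypothetical.

```python
def fit_gee_exchangeable(clusters, n_iter=50):
    """clusters: list of (a_i, [y_i1, ..., y_in_i]) with a_i in {0, 1}.

    Fits mu_ij = beta0 + beta_a * a_i and returns (beta0, beta_a, rho).
    """
    rho = 0.0  # naive start: observations treated as independent
    for _ in range(n_iter):
        # Solve the estimating equation for the arm means given the current rho.
        num = {0: 0.0, 1: 0.0}
        den = {0: 0.0, 1: 0.0}
        for a, ys in clusters:
            w = 1.0 / (1.0 + (len(ys) - 1) * rho)
            num[a] += w * sum(ys)
            den[a] += w * len(ys)
        mu = {a: num[a] / den[a] for a in (0, 1)}

        # Re-estimate the working correlation from the residuals (moment estimator).
        sum_sq, n_obs, cross, n_pairs = 0.0, 0, 0.0, 0
        for a, ys in clusters:
            res = [y - mu[a] for y in ys]
            total = sum(res)
            sq = sum(r * r for r in res)
            sum_sq += sq
            n_obs += len(res)
            cross += (total * total - sq) / 2.0  # sum of products over pairs j < k
            n_pairs += len(res) * (len(res) - 1) // 2
        phi = sum_sq / n_obs
        rho = cross / (n_pairs * phi)
        rho = min(max(rho, 0.0), 0.99)  # keep the working correlation in a safe range
    return mu[0], mu[1] - mu[0], rho

# Hypothetical data: two control and two intervention clusters of unequal size.
clusters = [
    (0, [1.0, 1.2, 0.8]),
    (0, [2.0, 2.1, 1.9, 2.2, 1.8]),
    (1, [3.0, 3.1]),
    (1, [4.0, 4.2, 3.8, 4.1]),
]
beta0, beta_a, rho = fit_gee_exchangeable(clusters)
print(f"beta0={beta0:.3f} beta_A={beta_a:.3f} rho={rho:.3f}")
```

With unequal cluster sizes the exchangeable fit downweights large clusters relative to the independence fit, so the two working correlations give slightly different point estimates on the same data.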

Quadratic Inference Function
Limitation of GEE: the working correlation matrix can be difficult to specify. GEE is always unbiased, but efficiency is lost if $V_i^{-1}$ is misspecified.
QIF: $V_i^{-1} = a_0 M_0 + a_1 M_1 + a_2 M_2 + \dots + a_n M_n$, where $(a_0, \dots, a_n)$ is estimated and $(M_0, \dots, M_n)$ is a basis of known matrices.
Facts:
  QIF is more efficient than GEE [Odueyungbo et al. 2008].
  No implementation yet in R, SAS or STATA.
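To make the basis expansion concrete, the sketch below checks it for one familiar case: the inverse of an exchangeable correlation matrix is a linear combination of two known matrices, the identity ($M_0$) and the matrix with 0 on the diagonal and 1 elsewhere ($M_1$). The choice of basis and the values of `n` and `rho` are assumptions made for illustration. In QIF the coefficients are not computed from a known correlation as done here; the check only shows that such a basis can represent the inverse exactly.

```python
def exchangeable(n, rho):
    """Exchangeable correlation matrix: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

def inverse_from_basis(n, rho):
    """Inverse of the exchangeable matrix written as a0 * M0 + a1 * M1."""
    c = rho / (1.0 + (n - 1) * rho)
    a0 = (1.0 - c) / (1.0 - rho)  # coefficient of M0 = identity
    a1 = -c / (1.0 - rho)         # coefficient of M1 = ones off the diagonal
    matrix = [[a0 if j == k else a1 for k in range(n)] for j in range(n)]
    return matrix, (a0, a1)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

n, rho = 3, 0.3  # hypothetical cluster size and correlation
r_inv, (a0, a1) = inverse_from_basis(n, rho)
product = matmul(exchangeable(n, rho), r_inv)  # should be the identity matrix
print(f"a0={a0:.4f}, a1={a1:.4f}")
```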

The missing data problem
MCAR: $P(R_{ij} \mid Y_i^{obs}; Y_i^{miss}; X_i; A_i) = P(R_{ij})$
MAR: $P(R_{ij} \mid Y_i^{obs}; Y_i^{miss}; X_i; A_i) = P(R_{ij} \mid X_i; A_i, Y_i^{obs})$
CDM: $P(R_{ij} \mid Y_i^{obs}; Y_i^{miss}; X_i; A_i) = P(R_{ij} \mid X_i; A_i)$
MNAR: $P(R_{ij} \mid Y_i^{obs}; Y_i^{miss}; X_i; A_i) = P(R_{ij} \mid X_i; A_i, Y_i^{obs}, Y_i^{miss})$

Adjusting for missing data
Idea 1 (Multiple imputation): For every missing value, impute what the value could have been.
  Disadvantage: How to impute? Find $f(Y \mid X, A)$.
Idea 2 (Inverse-probability weighting): Suppose four individuals are identical according to covariates and three of them are missing. The observed individual gets a weight of 4, the inverse of the probability of observation (1/4 = 0.25). As a result, data from this individual count once for himself and 3 times for the other, missing, individuals.
  Disadvantage: How to describe "identical"? This link needs to be exact to obtain correct weighting. Find $P(R = 1 \mid X, A)$.
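The four-individuals example can be written out as a short calculation. The sketch below uses made-up outcome values and adds a second, fully observed stratum (an assumption, not part of the slide) so that the effect of the weights on an overall mean is visible. The observation probability is estimated as the observed fraction within each stratum of identical individuals.

```python
# Hypothetical toy data: stratum "a" has four identical individuals with one
# observed outcome; stratum "b" has four individuals, all observed.
records = [
    ("a", 12.0), ("a", None), ("a", None), ("a", None),
    ("b", 2.0), ("b", 2.0), ("b", 2.0), ("b", 2.0),
]

def observation_probabilities(rows):
    """Estimate P(R = 1 | stratum) as the observed fraction in each stratum."""
    counts, seen = {}, {}
    for stratum, y in rows:
        counts[stratum] = counts.get(stratum, 0) + 1
        seen[stratum] = seen.get(stratum, 0) + (y is not None)
    return {s: seen[s] / counts[s] for s in counts}

pi = observation_probabilities(records)
weights = {s: 1.0 / p for s, p in pi.items()}  # inverse-probability weights

observed = [(s, y) for s, y in records if y is not None]
complete_case_mean = sum(y for _, y in observed) / len(observed)
ipw_mean = sum(weights[s] * y for s, y in observed) / len(records)

print(weights)             # the single observed individual in "a" gets weight 4
print(complete_case_mean)  # ignores the three missing individuals in "a"
print(ipw_mean)            # lets the observed individual stand in for them
```

If the missing individuals in stratum "a" truly resemble the observed one, the weighted mean recovers the full-sample mean, while the complete-case mean is pulled toward the fully observed stratum.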

Inverse Probability Weighted (IPW) GEE
Solve:
$$\sum_{i=1}^{M} m(Y_i, A_i, X_i, \beta) = \sum_{i=1}^{M} \frac{\partial \mu_i}{\partial \beta} V_i^{-1} W_i (Y_i - \mu_i) = 0$$
Properties:
  $W_i = \mathrm{Diag}\left[\frac{1}{\pi_{ij}}\right]_{j=1 \dots n_i}$ is the weighting matrix.
  $\pi_{ij} = P(R_{ij} = 1 \mid A_i, X_{ij})$ is the propensity score (PS).
  The PS has to be correctly specified to ensure Consistency and Asymptotic Normality (CAN).

The Caveat
A wrong formula is often implemented in software (1):
$$\sum_{i=1}^{M} \frac{\partial \mu_i}{\partial \beta} W_i^{1/2} V_i^{-1} W_i^{1/2} (Y_i - \mu_i) = 0$$
Solution [Pepe et al. 1992, for longitudinal data]: weights need to be cluster-specific or the working correlation matrix should be the identity.
(1) R (geepack), SAS (GENMOD needs to be used with observation-specific weights), ... In any case, check the manual.
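The difference between the two weighting schemes can be checked on a single cluster of size two. The sketch below compares $V^{-1} W$ with $W^{1/2} V^{-1} W^{1/2}$ for an exchangeable working correlation; the correlation value and the weights are hypothetical. It illustrates the algebra only and does not reproduce the internals of any package.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def diag(values):
    return [[values[0], 0.0], [0.0, values[1]]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

rho = 0.5  # hypothetical exchangeable working correlation
det = 1.0 - rho ** 2
v_inv = [[1.0 / det, -rho / det], [-rho / det, 1.0 / det]]

def intended_weighting(weights):
    """V^{-1} W, the form used in the IPW estimating equation."""
    return matmul(v_inv, diag(weights))

def sandwich_weighting(weights):
    """W^{1/2} V^{-1} W^{1/2}, the form the slide flags as wrong."""
    root = diag([math.sqrt(w) for w in weights])
    return matmul(matmul(root, v_inv), root)

unequal = [4.0, 1.0]  # individual-level weights that differ within the cluster
equal = [2.0, 2.0]    # cluster-specific weights

gap_unequal = max_abs_diff(intended_weighting(unequal), sandwich_weighting(unequal))
gap_equal = max_abs_diff(intended_weighting(equal), sandwich_weighting(equal))
print(gap_unequal, gap_equal)
```

The two forms agree when the weights are constant within the cluster, and they also agree when `rho` is 0, which matches the two remedies stated on the slide.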

Simulation - Toy example
Settings:
  $Age_{ij} \sim N(30, 10)$
  $M = 100$ and $n_i = 100$
  $u_i$ such that ICC = 0.05
  R = 1000 replicates
Generation:
  logit(P(Y_ij = 1)) = 3.0 0.5 A_i + 0.15 Age_ij + 0.05 Age_ij A_i + u_i
  logit(P(R_ij = missing)) = 5.0 + 0.75 A_i + 0.1 Age_ij + 0.05 Age_ij A_i

             Independence                Exchangeable
R package    Bias    SE      Coverage    Bias    SE      Coverage
CRTgeeDR     0.013   0.107   95.9        0.017   0.091   94.6
geepack      0.012   0.101   97.4        0.185   0.101   6.3
geem         0.013   0.098   95.9        0.147   0.153   85.1

More...
  Turner E. and Prague M. GEE Analysis of Cluster Randomized Controlled Trial Data with Missing Outcomes: A tutorial in inverse probability weighting methods. Submitted, International Journal of Epidemiology.
  Liang and Zeger (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13-22.
  Qu et al. (2000) Improving generalized estimating equations using quadratic inference functions. Biometrika 87:823-836.

3 Going further with Marginal Models: Doubly Robust, TMLE, ...

The imbalance of baseline covariates problem
A pronounced baseline imbalance is not expected a priori in a CRT: if the randomisation process has worked correctly, any observed imbalance must be a random phenomenon.
It impacts efficiency but not bias.

Doubly Robust GEE Estimator (implemented in the R package CRTgeeDR)
  Outcome Model (OM): $B_{ij}(X_i, A_i) = E(Y_{ij} \mid A_i, X_i)$
  Propensity Score (PS): $[W_i]_{jj} = R_{ij} / P(R_{ij} = 1 \mid X_i, A_i)$
  Unbiased if OM or PS corresponds to the TRUE data generation process.
Doubly robust estimator:
$$\sum_i \left[ D_i V_i^{-1} W_i \left[ Y_i - B_i(X_i, A_i) \right] + \sum_{a=0,1} D_i(a) V_i^{-1} \left[ B_i(X_i, a) - \mu(\beta, a) \right] \right] = 0$$
Reading the terms:
  $D_i$: design matrix (GEE).
  $V_i^{-1}$: correlations (GEE).
  $W_i$: missing data; more weight to individuals unlikely to be observed.
  $B_i$: augmentation for unbalanced covariates; distance between the data ($Y$) and the models ($\mu$, $B$).
  $\mu(\beta, a)$: model of interest.
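The double robustness property can be checked by simulation in a stripped-down setting. The sketch below is a simplification, not the CRTgeeDR implementation: it estimates a single mean with independent individuals, so the design and correlation matrices drop out and the estimating equation reduces to averaging $W (Y - B) + B$. The data-generating values, the seed, and the deliberately wrong models are all assumptions chosen for illustration.

```python
import random

def simulate(n, seed=2017):
    """Hypothetical data: binary covariate x, outcome y, observation indicator r."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)  # true E(Y) = 2
        pi = 0.3 if x == 1 else 0.9              # true P(R = 1 | x)
        r = 1 if rng.random() < pi else 0
        data.append((x, y, r))
    return data

def dr_mean(data, ps, om):
    """Solve sum_i [ W_i (Y_i - B_i) + (B_i - mu) ] = 0 for mu."""
    total = 0.0
    for x, y, r in data:
        b = om(x)
        if r == 1:  # y is only used when it is observed
            total += (y - b) / ps(x)
        total += b
    return total / len(data)

def complete_case_mean(data):
    observed = [y for _, y, r in data if r == 1]
    return sum(observed) / len(observed)

def true_ps(x):
    return 0.3 if x == 1 else 0.9

def true_om(x):
    return 1.0 + 2.0 * x

def wrong_ps(x):
    return 0.6  # ignores the covariate

def wrong_om(x):
    return 0.0  # ignores the covariate

data = simulate(20000)
estimates = {
    "complete case": complete_case_mean(data),
    "PS right, OM wrong": dr_mean(data, true_ps, wrong_om),
    "PS wrong, OM right": dr_mean(data, wrong_ps, true_om),
    "both wrong": dr_mean(data, wrong_ps, wrong_om),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

In this setup the estimate should land near the true mean of 2 whenever at least one of the two models is correct, and near the biased complete-case value when both are wrong.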

tmle (soon in R ctmle)

South African Man Study [Jemmott et al. (2014)]
Population:
  Men 18-45 y.o.
  Sexually active
  Consent / completed the baseline survey
Intervention - HIV reduction:
  Strengthen behavioral beliefs that support condom use
  Increase skill and self-efficacy to use condoms
  Increase HIV/STI risk-reduction knowledge
Control - Health promotion:
  Adhere to physical-activity guidelines
  Have a diet with 5-a-Day fruit-and-vegetable consumption
  Limit fat and alcohol intake

South African Man Study [Jemmott et al. (2014)]
Outcome: frequency of protected intercourse

Missing data:
          HIV/STI group        Control group
Y         64% [26%; 100%]      60% [22%; 100%]
R_Y       20.8%                17.5%

Imbalance in baseline covariates:
                                  HIV/STI group   Control group
% Married in the neighbourhood    19.1%           19.2%
% Married in the sample           4.4%            7.2%

South African Man Study [Jemmott et al. (2014)]
Primary outcome: frequency of protected intercourse
                HIV/STI intervention effect   SD     p-value
GEE (biased)    3.74                          2.36   0.113
IPW-GEE         3.43                          2.49   0.168
DR-GEE          7.39                          2.89   0.010

Secondary outcome: frequency of protected intercourse with casual partner
                HIV/STI intervention effect   SD     p-value
GEE (biased)    -0.50                         1.08   0.369
IPW-GEE         -0.92                         1.09   0.396
DR-GEE          -0.82                         1.04   0.414

More...
  M. Prague, R. Wang, E. Tchetgen Tchetgen and V. De Gruttola. Accounting for Interactions and Complex Inter-Subject Dependency in Estimating Treatment Effect in Cluster Randomized Trials With Missing Outcomes. (2016) Biometrics 72(4):1066-1077.
  M. Prague, R. Wang and V. De Gruttola. CRTgeeDR: An R Package for Doubly Robust Generalized Estimating Equations Estimations in Cluster Randomised Trials with Missing Data. In revision, R Journal.
  John B. Jemmott III et al. Cluster-Randomized Controlled Trial of an HIV/Sexually Transmitted Infection Risk-Reduction Intervention for South African Men. (2013) American Journal of Public Health 104(3):467-473.
  van der Laan and Robins. Unified Methods for Censored Longitudinal Data and Causality. Springer.

6 Conclusion: Discussion and future works

Going further
Take-home message:
  Marginal models are the most suited to estimate the intervention effect in two-level CRTs.
  If using marginal estimation, you must adjust for missing data.
  IPW for CRTs in standard software may be biased due to implementation.
  Using a doubly robust or TMLE approach may improve efficiency (e.g. R packages CRTgeeDR, tmle).

Acknowledgement and Funding
Main sources of funding:
  R37 AI 51164 (PI: V. De Gruttola)
  R01MH 100974 (PI: S. Little)
Other acknowledgements:
  R01 HD053270 (PI: J. Jemmott)
  NCRR 1S10RR028832-01 (Cluster HMS)
  R01 AI24643 (PI: Rui Wang)
And my new affiliation:

Thanks and happy to take questions! SISTM Inria, Bordeaux, Sud-ouest, France melanie.prague@inria.fr