Systematically missing data in individual participant data meta-analysis: a semiparametric inverse probability weighting approach

International Biometric Conference 2014
Jonathan Bartlett (www.missingdata.org.uk)
London School of Hygiene and Tropical Medicine

Acknowledgements
I am grateful for input to this work from:
- Angela Wood (University of Cambridge, UK)
- Ian White and Shaun Seaman (MRC Biostatistics Unit, UK)
- Stijn Vansteelandt (Ghent University, Belgium)
I am supported by an MRC fellowship (MR/K02180X/1). I am also grateful to the MAGGIC Collaborative Group (funding from New Zealand Heart Foundation, University of Auckland and University of Glasgow), whose data I have used in an illustrative analysis.

Outline
- The problem
- Augmented IPW estimation
- Simulations
- Illustrative analysis of MAGGIC
- Conclusions


Individual participant data meta-analysis
Meta-analysis is traditionally performed using aggregate results from study publications. Increasingly, meta-analyses are performed using the individual participant data (IPD-MA) from contributing studies. IPD-MA are being used to (among other things):
- estimate exposure effects, adjusted for a set of confounding variables
- develop prognostic models
Two-stage and one-stage analysis approaches are possible. Here we adopt a two-stage approach.

Missing data in IPD-MA
One difficulty with IPD-MA is systematic missingness: some contributing studies do not measure one or more variables of interest. Analysing only complete studies is inefficient, and potentially biased. Multiple imputation (MI) is an obvious approach to take: we impute missing values (for all participants) in studies which did not record them. However, correctly specifying and fitting appropriate multi-level imputation models is difficult.

An augmented inverse probability weighting approach
We therefore pursue an alternative approach based on augmented inverse probability weighting (AIPW). The augmentation function is similar to an imputation model, and enables information to be extracted from studies with systematic missingness. Importantly, however, consistency does not rely on correct specification of the imputation-type model.


Setup
We assume the MA consists of $n$ studies. In study $i$, there are $N_i$ participants. For participant $j$ in study $i$, we let $Y_{ij}$ denote the outcome of interest, and $X_{ij}$ and $Z_{ij}$ (vectors of) covariates. Let $X_i = (X_{i1}^T, \dots, X_{iN_i}^T)$ (and $Z_i$ similarly) denote matrices of covariates for study $i$. For the moment suppose that there are no missing data.

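Full data analysis
A model for $Y_{ij} \mid X_{ij}, Z_{ij}$ is fitted to study $i$, giving an estimate $\hat\mu_i$ and corresponding variance $\hat\sigma_i^2$. Interest lies in $\mu = E(\mu_i)$ and $\tau^2 = \mathrm{Var}(\mu_i)$. We adopt a method of moments estimation approach due to Paule and Mandel and recommended by DerSimonian [1]. $\mu$ and $\tau^2$ are estimated as the values solving
$$\sum_{i=1}^{n} m(\hat\mu_i, \hat\sigma_i^2, \mu, \tau^2) = 0$$
where
$$m(\hat\mu_i, \hat\sigma_i^2, \mu, \tau^2) = \begin{pmatrix} \dfrac{\hat\mu_i - \mu}{\hat\sigma_i^2 + \tau^2} \\[2ex] \dfrac{(\hat\mu_i - \mu)^2}{\hat\sigma_i^2 + \tau^2} - \dfrac{n-1}{n} \end{pmatrix}$$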
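The two moment equations can be solved numerically. The following is a minimal Python sketch, assuming the study estimates and their variances are already available as plain lists. The function names and the bisection strategy are illustrative choices, not taken from the slides. For fixed $\tau^2$ the first equation gives $\mu$ as an inverse-variance weighted mean, and the second equation is then solved for $\tau^2$, truncated at zero.

```python
def weighted_mean(est, var, tau2):
    """Solve the first moment equation for mu at a fixed tau2."""
    w = [1.0 / (v + tau2) for v in var]
    return sum(wi * e for wi, e in zip(w, est)) / sum(w)


def q_stat(est, var, tau2):
    """Weighted sum of squared deviations; equals n - 1 at the solution."""
    mu = weighted_mean(est, var, tau2)
    return sum((e - mu) ** 2 / (v + tau2) for e, v in zip(est, var))


def paule_mandel(est, var, tol=1e-10):
    """Return (mu, tau2) solving the two moment equations, with tau2 >= 0."""
    n = len(est)
    if q_stat(est, var, 0.0) <= n - 1:
        # No excess heterogeneity: truncate tau2 at zero.
        return weighted_mean(est, var, 0.0), 0.0
    lo, hi = 0.0, 1.0
    while q_stat(est, var, hi) > n - 1:  # q_stat decreases in tau2
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q_stat(est, var, mid) > n - 1:
            lo = mid
        else:
            hi = mid
    tau2 = 0.5 * (lo + hi)
    return weighted_mean(est, var, tau2), tau2
```

With equal within-study variances the solution has a closed form (the sample variance of the estimates minus the common within-study variance), which gives a quick check of the sketch.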

Two-stage MA with systematically missing covariates
Now we suppose that $X_i$ is entirely missing for some studies. $R_i$ denotes whether study $i$ recorded $X_i$ ($R_i = 1$) or not ($R_i = 0$). We assume $X_i$ is missing completely at random (MCAR). We can therefore model the distribution of $R_i$ as $\mathrm{Bin}(1, \pi)$, and $\pi$ can be trivially estimated by $\hat\pi = n^{-1} \sum_{i=1}^{n} R_i$.

Augmented inverse probability weighted estimators
Using the same full data estimating function as before, augmented inverse probability weighted estimators [2] can be constructed as solving
$$\sum_{i=1}^{n} \left[ \frac{R_i}{\hat\pi}\, m(\hat\mu_i, \hat\sigma_i^2, \mu, \tau^2) - \left\{ \frac{R_i - \hat\pi}{\hat\pi} \right\} \phi(Y_i, Z_i, \mu, \tau^2) \right] = 0$$
where $\phi(Y_i, Z_i, \mu, \tau^2)$ is a function of the always observed variables in study $i$. Assuming MCAR is true, estimates are consistent irrespective of the choice of $\phi(Y_i, Z_i, \mu, \tau^2)$.
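A brief sketch of why the choice of $\phi$ does not affect consistency. It treats $\pi$ as known and writes $m_i$ and $\phi_i$ for the two functions evaluated at the true $(\mu, \tau^2)$. These are simplifying assumptions made here for illustration, not steps shown in the slides. Under MCAR, $R_i$ is independent of $(Y_i, X_i, Z_i)$ with $E(R_i) = \pi$, so the augmentation term has mean zero and the weighted term has the same mean as the full data estimating function:

```latex
E\left[ \frac{R_i}{\pi}\, m_i - \frac{R_i - \pi}{\pi}\, \phi_i \right]
  = \frac{E(R_i)}{\pi}\, E(m_i) - \frac{E(R_i) - \pi}{\pi}\, E(\phi_i)
  = E(m_i) - 0 \cdot E(\phi_i)
  = E(m_i)
```

The choice of $\phi$ therefore affects efficiency only, not the limit of the estimator.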

Augmented inverse probability weighted estimators
In a more standard i.i.d. setting, the optimal choice of the augmentation function is given by
$$\phi_{opt}(Y_i, Z_i, \mu, \tau^2) = E\left[ m(\hat\mu_i, \hat\sigma_i^2, \mu, \tau^2) \mid Y_i, Z_i \right]$$
We adopt a pragmatic approach to approximating this:
- impute $X_i$ (in all studies) $L$ times, based on a simple, easy-to-fit imputation model (e.g. using fixed study effects);
- calculate $\hat\mu_i^{imp}$ and $\hat\sigma_i^{2,imp}$ based on each imputed $X_i$, written $\hat\mu_i^{(l)}$ and $\hat\sigma_i^{2(l)}$ for imputation $l$;
- use $\hat\phi_{opt}(Y_i, Z_i, \mu, \tau^2) = \frac{1}{L} \sum_{l=1}^{L} m(\hat\mu_i^{(l)}, \hat\sigma_i^{2(l)}, \mu, \tau^2)$.
The sandwich variance estimator can be used, although we should be wary about relying on large-$n$ asymptotics.
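The AIPW estimating equation with the imputation-based augmentation function can be solved in much the same way as the full data equations. The Python sketch below assumes the per-study estimates from the observed and imputed data have already been computed. All function and argument names are hypothetical, and the bisection on $\tau^2$ assumes the second equation decreases in $\tau^2$, which is not guaranteed in general. The equation is flattened into (coefficient, estimate, variance) terms: a weight $1/\hat\pi$ on each complete study, and a weight $-(R_i - \hat\pi)/(\hat\pi L)$ on each imputed estimate.

```python
def aipw_terms(r, est, var, imp_est, imp_var):
    """Flatten the AIPW estimating function into (coef, estimate, variance) terms.

    r[i] is 1 if study i recorded X, else 0; est[i], var[i] are only read when
    r[i] == 1; imp_est[i], imp_var[i] hold the L imputation-based values.
    Assumes at least one study has r[i] == 1.
    """
    n = len(r)
    pi_hat = sum(r) / n
    terms = []
    for i in range(n):
        if r[i] == 1:
            terms.append((1.0 / pi_hat, est[i], var[i]))
        n_imp = len(imp_est[i])
        coef = -(r[i] - pi_hat) / (pi_hat * n_imp)
        terms.extend((coef, e, v) for e, v in zip(imp_est[i], imp_var[i]))
    return terms


def mu_given_tau2(terms, tau2):
    """The first equation is linear in mu, so it has a closed-form root."""
    num = sum(c * e / (v + tau2) for c, e, v in terms)
    den = sum(c / (v + tau2) for c, e, v in terms)
    return num / den


def tau2_equation(terms, tau2, n):
    """Second component of the AIPW equation; the constant terms sum to n - 1."""
    mu = mu_given_tau2(terms, tau2)
    return sum(c * (e - mu) ** 2 / (v + tau2) for c, e, v in terms) - (n - 1)


def solve_aipw(r, est, var, imp_est, imp_var, tol=1e-10):
    """Return (mu, tau2) solving the AIPW equations, with tau2 truncated at 0."""
    n = len(r)
    terms = aipw_terms(r, est, var, imp_est, imp_var)
    if tau2_equation(terms, 0.0, n) <= 0:
        return mu_given_tau2(terms, 0.0), 0.0
    lo, hi = 0.0, 1.0
    while tau2_equation(terms, hi, n) > 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tau2_equation(terms, mid, n) > 0:
            lo = mid
        else:
            hi = mid
    tau2 = 0.5 * (lo + hi)
    return mu_given_tau2(terms, tau2), tau2
```

When every study is complete ($\hat\pi = 1$) the augmentation weights vanish and the sketch reduces to the full data method of moments estimator.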


Simulation study
Simulations were conducted to assess the AIPW estimator. For each of 1,000 simulations, data were generated for $n = 15$ studies. Study size $N_i$ was generated as $250 + 500\chi^2_3$ (rounded). Covariates $X_i$ and $Z_i$ (both scalar) were generated from a bivariate normal random-effects model. $X_i$ was made MCAR with probability 0.5.

Time to event outcome
A study-specific frailty random variable $\kappa_i$ was generated from a gamma distribution with shape 2.5 and scale 0.4. An event time was generated for each participant, with hazard
$$h(t \mid X_{ij}, Z_{ij}, \kappa_i) = 0.1\, \kappa_i \exp(\eta_i X_{ij} + \mu_i Z_{ij})$$
with
$$\begin{pmatrix} \mu_i \\ \eta_i \end{pmatrix} \sim N\left( \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0.04 & 0 \\ 0 & 0.04 \end{pmatrix} \right)$$
Study duration was generated from a $2 + \mathrm{Gamma}(1, 1)$ distribution. Event times were censored at study duration.
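The data-generating steps for one simulated study can be sketched in Python as follows. Because the hazard does not depend on $t$, event times are exponential with rate equal to the hazard. The slides do not give the parameters of the bivariate normal random-effects model for the covariates, so the values used here (study-level mean SD 0.5, unit variances, correlation 0.5) are assumptions made purely for illustration, as are the function name and the returned dictionary layout.

```python
import math
import random


def simulate_study(rng):
    """Generate one study; returns a dict with 'duration', 'recorded', 'rows'."""
    # chi-squared with 3 df is Gamma(shape=1.5, scale=2)
    n_i = round(250 + 500 * rng.gammavariate(1.5, 2.0))
    kappa = rng.gammavariate(2.5, 0.4)       # frailty: shape 2.5, scale 0.4
    mu_i = rng.gauss(1.0, 0.2)               # SD 0.2 = sqrt(0.04)
    eta_i = rng.gauss(1.0, 0.2)
    duration = 2.0 + rng.gammavariate(1.0, 1.0)
    recorded = rng.random() < 0.5            # X is MCAR at the study level

    # ASSUMPTION: covariate model parameters are not stated in the slides.
    mean_x = rng.gauss(0.0, 0.5)
    mean_z = rng.gauss(0.0, 0.5)
    rows = []
    for _ in range(n_i):
        z = mean_z + rng.gauss(0.0, 1.0)
        x = mean_x + 0.5 * (z - mean_z) + rng.gauss(0.0, math.sqrt(0.75))
        rate = 0.1 * kappa * math.exp(eta_i * x + mu_i * z)
        t = rng.expovariate(rate)            # constant hazard in t
        rows.append({
            "x": x if recorded else None,
            "z": z,
            "time": min(t, duration),        # censor at study duration
            "event": int(t <= duration),
        })
    return {"duration": duration, "recorded": recorded, "rows": rows}


rng = random.Random(2014)
studies = [simulate_study(rng) for _ in range(15)]
```

Passing a seeded `random.Random` instance keeps each simulated data set reproducible.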

Estimation methods
1. Complete studies analysis.
2. MI, with 10 (proper) imputations, using a linear regression imputation model with fixed study effects, including $Z_{ij}$, the event indicator and the overall Nelson-Aalen cumulative hazard as covariates [3]. Studies missing $X_i$ are imputed using the estimated constant, which corresponds to the (arbitrary) first study that had $X_i$ observed.
3. AIPW, assuming MCAR, and using 10 (improper) imputations to calculate $\hat\phi_{opt}(Y_i, Z_i, \mu, \tau^2)$.

Results for $\mu$
Method             Mean (emp. SD)   Mean SE   CI coverage (%)
Complete studies   0.998 (0.081)    0.071     85.6
MI                 0.939 (0.061)    0.055     75.8
AIPW               1.007 (0.069)    0.059     88.6
MI is biased (due to imputation model mis-specification). AIPW is unbiased, more efficient than complete studies analysis, and has the best CI coverage.


MAGGIC study
The Meta-analysis Global Group in Chronic Heart Failure (MAGGIC) is based on data from 39,372 patients with heart failure from 30 studies. The outcome is time to all-cause mortality. We consider an illustrative Cox outcome model, with age, gender and BMI as covariates. Age and gender are fully observed. A previously developed risk score included BMI as a covariate, capped at 30 kg/m², i.e. a non-linear effect [4], and we do the same.

Missingness in BMI
BMI was not recorded at all in 17 studies. In the remaining 13 it was mostly fully recorded, but in a few studies there was non-negligible sporadic missingness. To remove the sporadic missingness problem:
- we set BMI to missing in studies where it is recorded for less than 80% of participants;
- we delete records with BMI missing in studies where BMI is recorded for more than 80% of participants, so that BMI is fully recorded in those studies.
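The rule for removing sporadic missingness can be sketched in Python as below. The record layout (a list of dicts with a `bmi` key, `None` for missing) and the function name are hypothetical. The slides do not say how a study with exactly 80% recorded would be handled, so this sketch treats it as recorded, which is an assumption.

```python
def resolve_sporadic(records, threshold=0.8):
    """Apply the 80% rule to one study's records (list of dicts with 'bmi')."""
    recorded_fraction = sum(r["bmi"] is not None for r in records) / len(records)
    if recorded_fraction < threshold:
        # Treat the study as systematically missing BMI.
        return [dict(r, bmi=None) for r in records]
    # Otherwise drop the few records lacking BMI so the study is complete.
    return [r for r in records if r["bmi"] is not None]


# Study A: 9 of 10 recorded; study B: 5 of 10 recorded.
study_a = [{"id": k, "bmi": 27.0} for k in range(9)] + [{"id": 9, "bmi": None}]
study_b = [{"id": k, "bmi": 24.0 if k < 5 else None} for k in range(10)]
clean_a = resolve_sporadic(study_a)
clean_b = resolve_sporadic(study_b)
```

After this step every study either has BMI for all retained participants or for none, which is the systematic missingness pattern the AIPW approach assumes.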

Analysis approaches
We focus on the log hazard ratio for age (per 10-year increase). We perform three analyses:
- complete studies analysis, using the 13 studies where BMI was recorded;
- multiple imputation (25 imputations), with a fixed study effect, and including the event indicator and Nelson-Aalen cumulative hazard estimate as covariates;
- augmented IPW (25 imputations).

Estimates of log hazard ratio for age (per 10-year increase)
Method             Estimate (SE)
Complete studies   0.333 (0.057)
MI                 0.373 (0.031)
AIPW               0.367 (0.028)
MI and AIPW give similar estimates, and both gain efficiency over the complete studies analysis. There is a suggestion of greater precision from AIPW than from MI.


Conclusions
- The AIPW estimator improves upon the efficiency of complete studies analysis, while remaining robust to mis-specification of the imputation-type model.
- Because it is based on two-stage MA, it can be applied irrespective of the type of regression model being used.
- We are currently working on its extension to non-MCAR missingness mechanisms and to the setting with multiple variables subject to systematic missingness.
- To handle a combination of systematic and sporadic missingness, it may be possible to impute within study (for those with X measured), followed by application of the AIPW approach.

Meta-analysis Global Group in Chronic Heart Failure
Studies: CHARM, ECHOS, DIAMOND, Kirk, Newton, Andersson 1 & 2, Madsen, HFC Edmonton, DIG, Macin, Taffet, Rich 1 & 2, UK Heart, Hillingdon, MUSIC, Berry, Varela-Roman, Euro HF, Grigorian, Guazzi, IN-CHF, Hola, Tribouilloy, Gotsman, Tsutsui, AHFMS, NPC I, Battlescarred, Richards
Executive Committee: C Berry, R Doughty, C Granger, L Køber, B Massie, F McAlister, J McMurray, S Pocock, K Poppe, K Swedberg, J Somaratne, G Whalley
MAGGIC Steering Group: A Ahmed, B Andersson, A Bayes-Genis, C Berry, M Cowie, R Cubbon, R Doughty, J Ezekowitz, J Gonzalez-Juanatey, M Gorini, I Gotsman, L Grigorian-Shamagian, M Guazzi, M Kearney, L Køber, M Komajda, A di Lenarda, M Lenzen, D Lucci, S Macín, B Madsen, A Maggioni, M Martínez-Sellés, F McAlister, F Oliva, K Poppe, M Rich, M Richards, M Senni, I Squire, G Taffet, L Tarantini, C Tribouilloy, R Troughton, H Tsutsui, G Whalley
MAGGIC Coordinating Centre: R Doughty, N Earle, GD Gamble, K Poppe, G Whalley, The University of Auckland, New Zealand
MAGGIC Statistical Centres: R Doughty, K Poppe, G Whalley, The University of Auckland, New Zealand; J Dobson, S Pocock, C Ariti, The London School of Hygiene and Tropical Medicine

References
[1] Rebecca DerSimonian and Raghu Kacker. Random-effects model for meta-analysis of clinical trials: an update. Contemporary Clinical Trials, 28(2):105-114, 2007.
[2] A. A. Tsiatis. Semiparametric Theory and Missing Data. Springer, New York, 2006.
[3] I. R. White and P. Royston. Imputing missing covariate values for the Cox model. Statistics in Medicine, 28:1982-1998, 2009.
[4] S. J. Pocock, C. A. Ariti, J. J. V. McMurray, L. Køber, A. Maggioni, I. B. Squire, K. Swedberg, J. Dobson, K. K. Poppe, G. A. Whalley, and R. N. Doughty. Predicting survival in heart failure: a risk score based on 39372 patients from 30 studies. European Heart Journal, 34:1404-1413, 2013.