Comparison of multiple imputation methods for systematically and sporadically missing multilevel data


Comparison of multiple imputation methods for systematically and sporadically missing multilevel data
V. Audigier, I. White, S. Jolani, T. Debray, M. Quartagno, J. Carpenter, S. van Buuren, M. Resche-Rigon
INSERM, UMR 1153, ECSTRA team, Saint-Louis Hospital, Paris
MODAL Seminar, November 22nd, Lille

Motivation: GREAT data (Great Network, 2013)
Risk factors associated with short-term mortality in acute heart failure
- 28 observational cohorts, 11685 patients
- 2 binary and 8 continuous variables (patient characteristics and potential risk factors): $X_1, X_2, X_3, \ldots, X_p$, $Y$ (LVEF)
- sporadically and systematically missing data
[Figure: missing-data pattern by patient index for gender, bmi, age, SBP, DBP, HR, bnpl, AFib, LVEF]
Aim: explain the relationship between biomarkers (BNP, AFIB, ...) and the left ventricular ejection fraction (LVEF)
$y_{ik} = x_{ik}\beta + z_{ik} b_k + \varepsilon_{ik}$, with $b_k \sim N(0, \Psi)$ and $\varepsilon_{ik} \sim N(0, \sigma^2)$
Quantities of interest: $\hat\beta$ and its associated variability $\mathrm{var}(\hat\beta)$

Methods to handle missing values
Missing data are often assumed to be missing at random (MAR)
Ad-hoc methods
- Complete-case analysis
  - generally leads to biased estimates
  - increases standard errors
- Single imputation
  - leads to unbiased estimates
  - standard errors are downwardly biased

Methods to handle missing values
Relevant methods
- Likelihood approaches (frequentist framework: EM algorithm; Bayesian framework: data augmentation)
  - lead to unbiased estimates
  - specific to the analysis model
  - not always feasible
- Multiple imputation
  - leads to unbiased estimates
  - can be used for several analysis models

Multiple imputation (Rubin, 1987)
1. Generate a set of $M$ parameters $\theta_m$ of an imputation model to generate $M$ plausible imputed data sets:
   $P(X^{miss} \mid X^{obs}, \theta_1), \ldots, P(X^{miss} \mid X^{obs}, \theta_M)$
2. Fit the analysis model on each imputed data set: $\hat\beta_m$, $\mathrm{Var}(\hat\beta_m)$
3. Combine the results:
   $\hat\beta = \frac{1}{M}\sum_{m=1}^{M} \hat\beta_m$
   $T = \frac{1}{M}\sum_{m=1}^{M} \mathrm{Var}(\hat\beta_m) + \left(1 + \frac{1}{M}\right)\frac{1}{M-1}\sum_{m=1}^{M}\left(\hat\beta_m - \hat\beta\right)^2$
Multiple imputation thus provides an estimate of the parameters and of their variability.
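The combining step (step 3) can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `pool_rubin` and the toy numbers are assumptions, and only the pooled estimate and the total variance $T$ are computed.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules.

    estimates, variances: sequences of length M (hypothetical inputs,
    one entry per imputed data set).
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.shape[0]
    beta_bar = estimates.mean(axis=0)            # pooled point estimate
    within = variances.mean(axis=0)              # mean within-imputation variance
    between = estimates.var(axis=0, ddof=1)      # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between   # total variance T
    return beta_bar, total

# Toy example with M = 3 imputed data sets (made-up numbers).
beta_hat, total_var = pool_rubin([0.10, 0.12, 0.11], [0.004, 0.005, 0.003])
```

The between-imputation term is what single imputation ignores, which is why its standard errors are too small.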

MI for multilevel data
Two standard ways to perform MI
- Fully conditional specification (FCS, MICE): a conditional imputation model for each variable
- Joint modelling (JM): a joint imputation model for all variables
The imputation model (joint or conditional) needs to be in line with the data
- need to account for the heterogeneity between clusters
- need to account for the types of data (continuous and binary)

MI for multilevel data

| Type and name | Handles sporadically missing data? | Handles systematically missing data? | Handles missing data in binary variables? | Coded in R |
|---|---|---|---|---|
| JM-Pan | yes | yes | no | yes (Pan) |
| JM-REALCOM | yes | yes | yes | no |
| JM-jomo | yes | yes | yes | yes (jomo) |
| JM-Mplus | yes | yes | yes | no |
| FCS-2lnorm | yes | no | no | yes (mice) |
| FCS-1stage | yes, using variant | yes | yes | yes |
| FCS-2stage | yes | yes | yes, using variant | yes |

Outline
1. Introduction
   - GREAT data
   - Multiple imputation
   - MI for multilevel data
2. MI methods for multilevel data
   - Continuous variables (univariate case, multivariate case)
   - Binary variables
3. Comparisons
   - Simulations
   - Application
4. Conclusion

Continuous variables
Heteroscedastic random effect model as imputation model:
$y_{ik} = x_{ik}\beta + z_{ik} b_k + \varepsilon_{ik}$, with $b_k \sim N(0, \Psi)$ and $\varepsilon_{ik} \sim N(0, \Sigma_k)$
Multiple imputation under this model
1. generating $M$ sets of parameters $\theta_m = \left(\beta_m, \Psi_m, (\Sigma_k^m)_{1 \le k \le K}\right)$
   - Bayesian formulation: draw $\theta_m$ from its posterior distribution
   - asymptotic method: estimate $\theta$, then draw $\theta_m$ from the asymptotic distribution of the estimator
2. imputing the data according to each set $\theta_m$
   - draw $b_k^m \mid y_k^{obs}, \theta_m$
   - draw $y_{ik}^{miss} \mid \theta_m, b_k^m$

Continuous variables: specific issues
With the same heteroscedastic random effect imputation model and the same two steps (generate $\theta_m$, then impute):
1. how to generate $\Sigma_k$ without $y_{ik}$ (systematically missing)?
2. how to draw $b_k^m$ without $y_{ik}$ (systematically missing) or given $y_{ik}$ (sporadically missing)?

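FCS-1stage (Jolani et al., 2015)
Conditional imputation models:
$y_{ik} = x_{ik}\beta + z_{ik} b_k + \varepsilon_{ik}$, with $b_k \sim N(0, \Psi)$ and $\varepsilon_{ik} \sim N(0, \sigma^2)$
For each incomplete variable
1. generate $\theta_m = (\beta_m, \Psi_m, \sigma^2_m)$, $1 \le m \le M$
   - prior: non-informative (Jeffreys)
   - posterior distribution (requires REML estimates):
     $\beta_m \sim N\left(\hat\beta, \mathrm{var}(\hat\beta)\right)$
     $\Psi_m^{-1} \sim W\left(K, \hat b\,\hat b^{\top}\right)$
     $\sigma^2_m \sim \text{Inv-}\Gamma\left(\frac{n-p}{2}, \frac{(n-p)\,\hat\sigma^2}{2}\right)$
2. impute in each cluster $k$ with systematically missing data
   - draw $b_k \sim N(0, \Psi_m)$
   - impute data according to the imputation model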

FCS-1stage (Jolani et al., 2015), continued
With the same imputation model and the same step 1:
2. impute in each cluster $k$ with sporadically missing data
   - draw $b_k \sim N\left(\mu_{b_k \mid y_k}, \Psi_{b_k \mid y_k}\right)$
   - impute data according to the imputation model
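The conditional moments $\mu_{b_k \mid y_k}$ and $\Psi_{b_k \mid y_k}$ are not spelled out on the slide. The sketch below uses the standard result for a linear mixed model with homoscedastic errors, $\Psi_{b_k \mid y_k} = (\Psi^{-1} + Z_k^{\top} Z_k / \sigma^2)^{-1}$ and $\mu_{b_k \mid y_k} = \Psi_{b_k \mid y_k} Z_k^{\top} (y_k - X_k\beta) / \sigma^2$, computed on the observed rows of cluster $k$. Function names and the toy values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conditional_random_effect(y_obs, X_obs, Z_obs, beta, Psi, sigma2):
    """Mean and covariance of b_k given the observed rows of cluster k.

    Standard linear mixed model result, assuming homoscedastic errors.
    """
    resid = y_obs - X_obs @ beta
    precision = np.linalg.inv(Psi) + Z_obs.T @ Z_obs / sigma2
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0                 # enforce exact symmetry
    mean = cov @ (Z_obs.T @ resid) / sigma2
    return mean, cov

def draw_random_effect(mean, cov, rng):
    """One draw of b_k from its conditional normal distribution."""
    return rng.multivariate_normal(mean, cov)

# Toy random-intercept example (made-up values).
beta = np.array([1.0])
Psi = np.array([[1.0]])
X = np.ones((4, 1))
Z = np.ones((4, 1))
y = np.full(4, 2.0)

# Sporadically missing: four observed values in the cluster.
mean_spor, cov_spor = conditional_random_effect(y, X, Z, beta, Psi, 1.0)

# Systematically missing: no observed value in the cluster.
mean_syst, cov_syst = conditional_random_effect(
    np.empty(0), np.empty((0, 1)), np.empty((0, 1)), beta, Psi, 1.0
)

b_k = draw_random_effect(mean_spor, cov_spor, np.random.default_rng(1))
```

With no observed value in the cluster, the same formula reduces to $N(0, \Psi)$, which is the draw used for systematically missing data.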

FCS-2stage (Resche-Rigon and White, 2016)
Conditional imputation models (the same imputation model, with a heteroscedastic assumption):
$y_{ik} = x_{ik}(\beta + b_k) + \varepsilon_{ik}$, with $b_k \sim N(0, \Psi)$ and $\varepsilon_{ik} \sim N(0, \sigma_k^2)$
1. generate $\theta_m = \left(\beta_m, \Psi_m, (\sigma_1^2, \ldots, \sigma_K^2)_m\right)$
   - estimate $\hat\theta$ and $\mathrm{var}(\hat\theta)$ with a two-stage estimator
   - draw $\theta_m$ from the asymptotic distribution of the estimator, with expectation $\hat\theta$ and variance $\mathrm{var}(\hat\theta)$
2. impute in each cluster $k$

FCS-2stage (Resche-Rigon and White, 2016)
1. generate $\theta_m = \left(\beta_m, \Psi_m, (\sigma_1^2, \ldots, \sigma_K^2)_m\right)$
   - stage 1: fit a linear model to each observed cluster
     $\hat\beta_k = \left(X_k^{\top} X_k\right)^{-1} X_k^{\top} y_k$
     $\hat\sigma_k = \sqrt{\frac{\lVert y_k - X_k \hat\beta_k \rVert^2}{n_k - p - 1}}$
   - stage 2: combine the estimates
     $\hat\beta_k = (\beta + b_k) + \varepsilon_k$, with $b_k \sim N(0, \Psi)$ and $\varepsilon_k \sim N\left(0, \mathrm{var}(\hat\beta_k)\right)$
     $\log \hat\sigma_k = (\log \sigma + s_k) + \varepsilon_k$, with $s_k \sim N(0, \Psi_s)$ and $\varepsilon_k \sim N\left(0, \mathrm{var}(\log \hat\sigma_k)\right)$
   - two estimators available: REML or method of moments
   - output: $\hat\Psi$, $\hat\beta$ and their associated (asymptotic) variances; $\log \hat\sigma$, $\hat\Psi_s$ and their associated (asymptotic) variances
2. impute in each cluster $k$ with systematically missing data
   - draw $b_k$ from their marginal distribution
   - impute data according to the imputation model

FCS-2stage (Resche-Rigon and White, 2016), continued
With the same two-stage estimation in step 1:
2. impute in each cluster $k$ with sporadically missing data
   - draw $b_k$ conditionally on $\hat\beta_k$
   - impute data according to the imputation model
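The two-stage estimation in step 1 can be illustrated as follows. This is a simplified sketch, not the authors' code. Stage 1 is an ordinary least-squares fit per cluster. Stage 2 uses the univariate DerSimonian-Laird moment estimator as a stand-in for the method-of-moments option, applied to one scalar coefficient at a time, whereas the estimator on the slides is multivariate. All function names are hypothetical.

```python
import numpy as np

def stage1_ols(X, y):
    """Stage 1: OLS fit within one cluster.

    X includes the intercept column, so X.shape[1] plays the role of p + 1.
    Returns the coefficients and their estimated covariance matrix.
    """
    n, n_coef = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_k = xtx_inv @ X.T @ y
    resid = y - X @ beta_k
    sigma2_k = resid @ resid / (n - n_coef)
    return beta_k, sigma2_k * xtx_inv

def stage2_moments(est, var):
    """Stage 2: DerSimonian-Laird pooling of scalar cluster estimates.

    Returns the pooled mean, the between-cluster variance psi and the
    variance of the pooled mean.
    """
    est = np.asarray(est, dtype=float)
    var = np.asarray(var, dtype=float)
    w = 1.0 / var
    fixed = np.sum(w * est) / np.sum(w)          # fixed-effect mean
    q = np.sum(w * (est - fixed) ** 2)           # Cochran's Q statistic
    k = est.size
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    psi = max(0.0, (q - (k - 1)) / denom)        # truncated at zero
    w_star = 1.0 / (var + psi)
    beta = np.sum(w_star * est) / np.sum(w_star)
    return beta, psi, 1.0 / np.sum(w_star)

# Stage 1 on a toy cluster following y = 1 + 2x exactly.
X_k = np.column_stack([np.ones(4), np.arange(4.0)])
beta_k, cov_k = stage1_ols(X_k, 1.0 + 2.0 * np.arange(4.0))

# Stage 2 on two made-up cluster estimates with unit variances.
beta_pooled, psi_hat, var_pooled = stage2_moments([0.0, 2.0], [1.0, 1.0])
```

A draw $\theta_m$ would then be taken from a normal distribution centred on the pooled estimates, as described in step 1.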

JM-jomo (Quartagno and Carpenter, 2016)
$y_{ik} = x_{ik}\beta + z_{ik} b_k + \varepsilon_{ik}$, with $b_k \sim N(0, \Psi)$ and $\varepsilon_{ik} \sim N(0, \Sigma_k)$
1. Bayesian formulation to generate $\theta_m = (\beta_m, \Psi_m, \Sigma_m)$, $1 \le m \le M$
   - (informative) prior: $\beta \propto 1$, $\Psi^{-1} \sim W(\nu_1, \Lambda_1)$, $\Sigma_k^{-1} \mid \nu_2, \Lambda_2 \sim W(\nu_2, \Lambda_2)$
   - posterior: not known explicitly, but most of the conditional posterior distributions are known, hence a Gibbs sampler
   - does not require REML estimates
   - unknown conditional distributions can be simulated by MCMC
2. Imputation (given by step 1)

Binary variables
- FCS-1stage (Jolani et al., 2015)
  - fit a logistic model with mixed effects to all clusters
  - sporadically missing values not handled
- FCS-2stage (Resche-Rigon and White, 2016)
  - fit a logistic model with fixed effects to each cluster
  - combine estimates using a meta-analysis
  - large clusters are required
- JM-jomo (Quartagno and Carpenter, 2016)
  - probit link: outcomes are latent normal variables, the variance of the errors is fixed to 1
  - draw latent normal variables, then derive categories
  - more time consuming
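The latent normal idea behind the probit approach can be sketched as below: a missing binary value is imputed by drawing a latent normal variable with unit error variance around a linear predictor and thresholding it at zero. This only illustrates the principle. It is not the jomo sampler, and the function name and the linear predictor are assumptions.

```python
import numpy as np

def impute_binary_latent(linear_predictor, rng):
    """Impute binary values through a latent normal variable (probit link)."""
    eta = np.asarray(linear_predictor, dtype=float)
    latent = eta + rng.standard_normal(eta.shape)  # error variance fixed to 1
    return (latent > 0).astype(int)                # derive the category

rng = np.random.default_rng(0)
# With a linear predictor of 0, about half of the imputed values equal 1.
draws = impute_binary_latent(np.zeros(20000), rng)
```

Under this construction the probability of imputing 1 is $\Phi(\eta)$, the standard normal distribution function evaluated at the linear predictor.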

Summary

| Method | Bayesian / asymptotic | Prior for covariance matrices | Heteroscedasticity assumption for errors | Binary variables |
|---|---|---|---|---|
| FCS-1stage | Bayesian | Jeffreys | no | logistic link |
| FCS-2stage | asymptotic | - | yes | logistic link |
| JM-jomo | Bayesian | Wishart | yes | probit link |

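Simulation design
Data generation: 500 incomplete data sets are independently simulated ($n = 11685$, $K = 28$, $18 \le n_k \le 1834$)
$y_{ik} = \beta_0 + \beta_1 x^{(1)}_{ik} + \beta_2 x^{(2)}_{ik} + b^{0}_{k} + b^{1}_{k} x^{(1)}_{ik} + \varepsilon_{ik}$
with $\beta = (.72, .11, .03)$, $\Psi = \begin{pmatrix} .0077 & .0015 \\ .0015 & .0004 \end{pmatrix}$, $\sigma = .15$
- $x^{(1)}_{ik} \sim N(2.9 + \mu_k, .36)$
- $x^{(2)}_{ik}$: $\mathrm{logit}\, P\left(x^{(2)}_{ik} = 1\right) = 4.2 + \nu_k$
- $x^{(3)}_{ik} \sim N(2.9 + \xi_k, .36)$
- $(\mu_k, \nu_k, \xi_k) \sim N\left(0, \begin{pmatrix} .12 & .001 & .001 \\ .001 & .12 & .001 \\ .001 & .001 & .12 \end{pmatrix}\right)$
Missing values are added on $x^{(1)}$, $x^{(2)}$ with $\pi_{syst} = .25$ and $\pi_{spor} = .25$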
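The last step, adding systematically and sporadically missing values, might look like the sketch below. Two assumptions are made that the slide does not state: $\pi_{syst}$ is read as the proportion of clusters in which the variable is entirely missing, and $\pi_{spor}$ as the probability that a value is missing, completely at random, in the remaining clusters. The function name and the equal cluster sizes are illustrative only.

```python
import numpy as np

def add_missing(x, cluster, pi_syst, pi_spor, rng):
    """Return a copy of x with systematic and sporadic missing values (NaN)."""
    x = np.array(x, dtype=float)                  # work on a copy
    cluster = np.asarray(cluster)
    ids = np.unique(cluster)
    n_syst = int(round(pi_syst * ids.size))
    # Clusters in which the variable is never observed.
    syst_ids = rng.choice(ids, size=n_syst, replace=False)
    syst_mask = np.isin(cluster, syst_ids)
    # Sporadic missingness (MCAR assumption) in the other clusters.
    spor_mask = ~syst_mask & (rng.random(x.shape) < pi_spor)
    x[syst_mask | spor_mask] = np.nan
    return x, syst_ids

rng = np.random.default_rng(2017)
cluster = np.repeat(np.arange(28), 50)            # 28 clusters of 50 (toy sizes)
x1 = rng.normal(2.9, 0.6, size=cluster.size)
x1_miss, syst_ids = add_missing(x1, cluster, 0.25, 0.25, rng)
```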

Simulation design
- Methods
  - JM-jomo, FCS-1stage, FCS-2stage
  - Full, CC, FCS-fix, FCS-noclust, JM-pan
  - $M = 5$ imputed arrays
- Estimands: $\beta$ and $\mathrm{var}(\hat\beta)$
- Criteria: bias, rmse, variance estimate, coverage
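The evaluation criteria can be computed from the simulated estimates as in the sketch below. It is a simplified illustration: the interval uses a normal quantile rather than the t reference distribution usually used with multiple imputation, the model standard error is taken as the square root of the mean variance estimate, and the function name is hypothetical.

```python
import numpy as np

def simulation_criteria(estimates, variances, truth, z=1.96):
    """Bias, RMSE, model and empirical SE, and coverage over simulations."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    se = np.sqrt(var)
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    return {
        "bias": est.mean() - truth,
        "rmse": np.sqrt(np.mean((est - truth) ** 2)),
        "model_se": np.sqrt(var.mean()),     # from the variance estimates
        "empirical_se": est.std(ddof=1),     # spread of the point estimates
        "coverage": covered.mean(),
    }

# Toy check with two simulated data sets (made-up numbers).
crit = simulation_criteria([0.9, 1.1], [0.01, 0.01], truth=1.0)
```

Comparing the model and empirical standard errors shows whether a method under- or over-estimates the variability, which in turn drives the coverage.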

Base-case configuration
[Figure: distribution of $\hat\beta - \beta_{true}$ for $\beta_1$ and $\beta_2$, by method: Full, CC, FCS-fix, FCS-noclust, JM-pan, FCS-1stage, FCS-2stageMM, FCS-2stageRE, JM-jomo]

Base-case configuration

| Method | Model SE $\beta_1$ | Model SE $\beta_2$ | Empirical SE $\beta_1$ | Empirical SE $\beta_2$ | 95% cover $\beta_1$ | 95% cover $\beta_2$ | Time (min) |
|---|---|---|---|---|---|---|---|
| Full | 0.0047 | 0.0029 | 0.0048 | 0.0030 | 93.8 | 94.2 | |
| CC | 0.0070 | 0.0053 | 0.0071 | 0.0053 | 92.2 | 94.4 | |
| FCS-fix | 0.0043 | 0.0043 | 0.0058 | 0.0042 | 85.6 | 94.6 | 0.732 |
| FCS-noclust | 0.0041 | 0.0043 | 0.0067 | 0.0046 | 59.6 | 92.4 | 0.601 |
| JM-pan | 0.0048 | 0.0042 | 0.0058 | 0.0039 | 83.0 | 95.2 | 0.006 |
| FCS-1stage | 0.0049 | 0.0046 | 0.0056 | 0.0043 | 90.4 | 96.8 | 63.43 |
| FCS-2stagemm | 0.0059 | 0.0054 | 0.0058 | 0.0044 | 95.0 | 97.0 | 0.538 |
| FCS-2stagere | 0.0059 | 0.0049 | 0.0058 | 0.0044 | 95.0 | 96.2 | 1.304 |
| JM-jomo | 0.0066 | 0.0069 | 0.0057 | 0.0050 | 96.8 | 97.6 | 6.739 |


Robustness to the cluster size: point estimate
[Figure: relative bias of $\beta_1$ and $\beta_2$ as a function of the cluster size $n_k$, for JM, FCS-1stage, FCS-2stageRE and FCS-2stageMM]

Robustness to the number of clusters: point estimate
[Figure: relative bias of $\beta_1$ and $\beta_2$ as a function of the number of clusters $K$, for JM, FCS-1stage, FCS-2stageRE and FCS-2stageMM]

Robustness to $\pi_{syst}$: point estimate
[Figure: relative bias of $\beta_1$ and $\beta_2$ as a function of $\pi_{syst}$, for JM, FCS-1stage, FCS-2stageRE and FCS-2stageMM]

Robustness to $\pi_{syst}$: variance estimate
[Figure: model SE (JM, FCS-1stage, FCS-2stageRE, FCS-2stageMM) and empirical SE of $\beta_1$ and $\beta_2$ as a function of $\pi_{syst}$]

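Robustness to the type of imputed variables
[Figure: distribution of $\hat\beta - \beta_{true}$ for $\beta_1$ and $\beta_2$, by method: Full, FCS-1stage, FCS-2stageMM, FCS-2stageRE, JM-jomo]

| Method | Model SE $\beta_1$ | Model SE $\beta_2$ | Empirical SE $\beta_1$ | Empirical SE $\beta_2$ | 95% cover $\beta_1$ | 95% cover $\beta_2$ | Time (min) |
|---|---|---|---|---|---|---|---|
| Full | 0.0050 | 0.0029 | 0.0049 | 0.0028 | 95.0 | 95.0 | |
| FCS-1stage | 0.0057 | 0.0044 | 0.0059 | 0.0043 | 92.0 | 95.2 | 103.665 |
| FCS-2stagemm | 0.0063 | 0.0051 | 0.0060 | 0.0044 | 94.0 | 96.2 | 0.652 |
| FCS-2stagere | 0.0056 | 0.0045 | 0.0061 | 0.0044 | 90.4 | 95.0 | 1.572 |
| JM-jomo | 0.0074 | 0.0072 | 0.0064 | 0.0047 | 97.0 | 98.6 | 5.612 |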

Other configurations
Methods have similar performances when
- the missing data mechanism is MAR
- the outcome of the analysis model is binary
- the variance of random effects is higher or smaller
- binary variables are generated using a probit link

Application to GREAT data
Explain the relationship between easily measurable biomarkers (BNP, AFIB) and the left ventricular ejection fraction: $y$ = LVEF, $X$ = BNP, AFib
MI using $M = 20$ imputed arrays
[Figure: missing-data pattern by patient index for gender, bmi, age, SBP, DBP, HR, bnpl, AFib, LVEF]

| | CC | JM | FCS-1stage | FCS-2stagere | FCS-2stagemm |
|---|---|---|---|---|---|
| $\beta_{BNP}$ Est | -0.1132 | -0.0891 | -0.0902 | -0.0854 | -0.1009 |
| $\beta_{BNP}$ ModelSE | 0.0108 | 0.0078 | 0.0153 | 0.0099 | 0.0112 |
| $\beta_{AFIB}$ Est | 0.0268 | 0.0216 | 0.0251 | 0.0215 | 0.0273 |
| $\beta_{AFIB}$ ModelSE | 0.0071 | 0.0046 | 0.0047 | 0.0040 | 0.0045 |
| Time | | 94.0 | 18609.3 | 361.3 | 31.8 |


Conclusion
An overview of MI methods for multilevel data
- FCS-1stage, FCS-2stage and JM-jomo all appear to perform well
- They outperform ad hoc methods

| FCS-2stage | FCS-1stage | JM-jomo |
|---|---|---|
| MM version provides a quick way to obtain first results | relevant with few systematically missing values | advised with a lot of incomplete categorical variables |
| for large clusters | time consuming with binary variables | be careful with few clusters |

Methods are implemented in R
- mice package for FCS methods
- jomo package for JM-jomo

Limits and perspectives
Limits
- congeniality: $y_{ik} = \beta_0 + \beta_1 x^{(1)}_{ik} + \beta_2 x^{(2)}_{ik} + b^{0}_{k} + b^{1}_{k} x^{(1)}_{ik} + \varepsilon_{ik}$
- convergence of the FCS approaches
- computational time
Perspectives
- correction for FCS-2stage with small clusters?
- handle logistic link for FCS-1stage?

References
Great Network. Managing acute heart failure in the ED - case studies from the acute heart failure academy, 2013. http://www.greatnetwork.org.
D. B. Rubin. Multiple Imputation for Nonresponse in Surveys. Wiley, New York, 1987.
S. van Buuren. Multiple imputation of multilevel data. In The Handbook of Advanced Multilevel Analysis. Routledge, Milton Park, UK, 2010.
S. Jolani, T. P. A. Debray, H. Koffijberg, S. van Buuren, and K. G. M. Moons. Imputation of systematically missing predictors in an individual participant data meta-analysis: a generalized approach using MICE. Statistics in Medicine, 34(11):1841-1863, 2015.
M. Resche-Rigon, I. R. White, J. Bartlett, S. A. E. Peters, S. G. Thompson, and on behalf of the PROG-IMT Study Group. Multiple imputation for handling systematically missing confounders in meta-analysis of individual participant data. Statistics in Medicine, 32(28):4890-4905, 2013. ISSN 1097-0258. doi: 10.1002/sim.5894.
M. Resche-Rigon and I. White. Multiple imputation by chained equations for systematically and sporadically missing multilevel data. Statistical Methods in Medical Research, 2016.
J. L. Schafer. Imputation of missing covariates under a multivariate linear mixed model. Technical report, Dept. of Statistics, The Pennsylvania State University, 1997.
M. Quartagno and J. R. Carpenter. Multiple imputation for IPD meta-analysis: allowing for heterogeneity and studies with missing covariates. Statistics in Medicine, 35(17):2938-2954, 2016. ISSN 1097-0258.