Surrogate marker evaluation when data are small, large, or very large Geert Molenberghs

Similar documents
Introduction (Alex Dmitrienko, Lilly) Web-based training program

A measure for the reliability of a rating scale based on longitudinal clinical trial data Link Peer-reviewed author version

Causal-inference and Meta-analytic Approaches to Surrogate Endpoint Validation

Random Effects Models for Longitudinal Data

T E C H N I C A L R E P O R T A SANDWICH-ESTIMATOR TEST FOR MISSPECIFICATION IN MIXED-EFFECTS MODELS. LITIERE S., ALONSO A., and G.

Modeling of Correlated Data and Multivariate Survival Data

Package Surrogate. January 20, 2018

A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes

Multilevel Methodology

Practical issues related to the use of biomarkers in a seamless Phase II/III design

T E C H N I C A L R E P O R T THE EFFECTIVE SAMPLE SIZE AND A NOVEL SMALL SAMPLE DEGREES OF FREEDOM METHOD

Implementation of Pairwise Fitting Technique for Analyzing Multivariate Longitudinal Data in SAS

Type I and type II error under random-effects misspecification in generalized linear mixed models Link Peer-reviewed author version

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros

Choice of units of analysis and modeling strategies in multilevel hierarchical models

Generalized Linear Models for Non-Normal Data

Marginal correlation in longitudinal binary data based on generalized linear mixed models

The combined model: A tool for simulating correlated counts with overdispersion. George KALEMA Samuel IDDI Geert MOLENBERGHS

Hierarchical Hurdle Models for Zero-In(De)flated Count Data of Complex Designs

David Hughes. Flexible Discriminant Analysis Using. Multivariate Mixed Models. D. Hughes. Motivation MGLMM. Discriminant. Analysis.

Likelihood ratio testing for zero variance components in linear mixed models

Permutation Test for Bayesian Variable Selection Method for Modelling Dose-Response Data Under Simple Order Restrictions

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM

Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Multivariate Assays With Values Below the Lower Limit of Quantitation: Parametric Estimation By Imputation and Maximum Likelihood

2 Naïve Methods. 2.1 Complete or available case analysis

T E C H N I C A L R E P O R T KERNEL WEIGHTED INFLUENCE MEASURES. HENS, N., AERTS, M., MOLENBERGHS, G., THIJS, H. and G. VERBEKE

A Sampling of IMPACT Research:

Models for Longitudinal Analysis of Binary Response Data for Identifying the Effects of Different Treatments on Insomnia

Modelling Dropouts by Conditional Distribution, a Copula-Based Approach

Longitudinal analysis of ordinal data

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

BIOSTATISTICAL METHODS

Analysis of non-ignorable missing and left-censored longitudinal data using a weighted random effects tobit model

arxiv: v1 [stat.ap] 3 Feb 2015

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test

High-Throughput Sequencing Course

Model Assumptions; Predicting Heterogeneity of Variance

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes

The Union and Intersection for Different Configurations of Two Events Mutually Exclusive vs Independency of Events

Introduction to mtm: An R Package for Marginalized Transition Models

Statistical aspects of prediction models with high-dimensional data

Package CorrMixed. R topics documented: August 4, Type Package

Bootstrap Procedures for Testing Homogeneity Hypotheses

Thieme Chemistry E-Books

,..., θ(2),..., θ(n)

Estimating Optimal Dynamic Treatment Regimes from Clustered Data

Multiple Imputation for Missing Data in Repeated Measurements Using MCMC and Copulas

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data

Imputation Algorithm Using Copulas

Targeted Maximum Likelihood Estimation in Safety Analysis

Dose-response modeling with bivariate binary data under model uncertainty

Linear & Generalized Linear Mixed Models

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles

Econometrics. 5) Dummy variables

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Introduction to Within-Person Analysis and RM ANOVA

A comparison of arm-based and contrast-based approaches to network meta-analysis (NMA)

Joint longitudinal and time-to-event models via Stan

Covariate Adjustment for Logistic Regression Analysis of Binary Clinical Trial Data

On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation

Meta-analysis. 21 May Per Kragh Andersen, Biostatistics, Dept. Public Health

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

PQL Estimation Biases in Generalized Linear Mixed Models

STATISTICAL ANALYSIS WITH MISSING DATA

Core Courses for Students Who Enrolled Prior to Fall 2018

Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition

Longitudinal Data Analysis. Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago

Statistical Practice

Whether to use MMRM as primary estimand.

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure

Generalizing the MCPMod methodology beyond normal, independent data

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial

Semiparametric Generalized Linear Models

Including historical data in the analysis of clinical trials using the modified power priors: theoretical overview and sampling algorithms

A Flexible Modeling Framework for Overdispersed Hierarchical Data

Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data. Jeff Dominitz RAND. and

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL

Nesting and Equivalence Testing

arxiv: v1 [stat.ap] 6 Apr 2018

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Multinomial Regression Models

SEM for Categorical Outcomes

Richard D Riley was supported by funding from a multivariate meta-analysis grant from

Lecture 4: Newton s method and gradient descent

Advantages of Mixed-effects Regression Models (MRM; aka multilevel, hierarchical linear, linear mixed models) 1. MRM explicitly models individual

Multivariate Survival Analysis

Time-Invariant Predictors in Longitudinal Models

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Practical Biostatistics

A Note on Bayesian Inference After Multiple Imputation

Department of Large Animal Sciences. Outline. Slide 2. Department of Large Animal Sciences. Slide 4. Department of Large Animal Sciences

Time-Invariant Predictors in Longitudinal Models

Advanced topics from statistics

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

Transcription:

Surrogate marker evaluation when data are small, large, or very large Geert Molenberghs Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat) Universiteit Hasselt & KU Leuven, Belgium geert.molenberghs@uhasselt.be & geert.molenberghs@kuleuven.be www.ibiostat.be Interuniversity Institute for Biostatistics and statistical Bioinformatics London, March 30, 2017

Broad Principle: Pseudo-likelihood Arnold and Strauss (1991) Geys, Molenberghs, and Ryan (1999) Molenberghs and Verbeke (2005) Units: clusters, repeated measures, spatial data, microarrays,... f(y 1,y 2, y 3 ) f(y 1 y 2, y 3 ) f(y 2 y 1, y 3 ) f(y 3 y 1, y 2 ) f(y 1, y 2,y 3 ) f(y 1,y 2 ) f(y 1, y 3 ) f(y 2, y 3 ) London, March 30, 2017 1

f(y i1,...,y ini ) replaced by a product of convenient factors The wrong likelihood used The right results obtained: Consistent, asymptotically normal estimators Often minor loss of statistical efficiency Often major gain of computational efficiency London, March 30, 2017 2

Specific Use 1: Pseudo-likelihood for HD Multivariate Longitudinal Data Fieuws and Verbeke (2006); Fieuws et al (2006) M sequences of repeated measures Example: 44 sequences of hearing variables London, March 30, 2017 3

Data for patient i: Y i11 Y i12 Y i13... Y i1ni Y i21 Y i22 Y i23... Y i2ni Y i31 Y i32 Y i33... Y i3ni....... Y i,44,1 Y i,44,2 Y i,44,3... Y i,44,ni Fit model to each of the M(M 1)/2 pairs Use PL to reach valid conclusions London, March 30, 2017 4

Specific Use 2: Split Sample Method: (In)dependent Subsamples or London, March 30, 2017 5

Behavior Univariate normal: equivalent Univariate Bernoulli (probability): equivalent Univariate Bernoulli (logit): different estimator, same precision Compound symmetry: different estimator, some precision loss London, March 30, 2017 6

Specific Use 3: Per Cluster Size London, March 30, 2017 7

Fixed Cluster Size Variable Cluster Size Fixed cluster size: closed-form maximum likelihood estimator: easy Variable cluster size: Estimate parameters per cluster size Average these to find But: Now weighted average needed London, March 30, 2017 8

Which Weights and Why? Constant weights Proportional weights Optimal weights Scalar weights Iterated optimal weights Approximate optimal weights London, March 30, 2017 9

Surrogate Markers Model: S ij = µ Si + α i Z ij + ε Sij T ij = µ Ti + β i Z ij + ε Tij Error structure: Individual level: Deviations ε Sij and ε Tij are correlated Trial level: Treatment effects α i and β i are correlated (Information from intercepts µ Si and µ Ti can be used as well) London, March 30, 2017 10

Estimation can be problematic: especially in small studies especialy when studies are of differing sizes Solution 1: Use multiple imputation to make all studies equally large Solution 2: Analyze trial-by-trial: it can be shown that this is valid Combine results across trials using weighted averages When some (or all) trials are very large: sub-sampling is allowable Solution 2-advantages: : Very stable small trials : Very fast very large trials London, March 30, 2017 11

Statistics Applied Surrogate Endpoint Evaluation Methods with SAS and R provides an overview of contemporary meta-analytic and information-theoretic methodology to evaluate candidate surrogate endpoints from clinical trials and beyond. The book strongly focuses on user-friendly software in both SAS and R for a variety of outcome types. The book is aimed at researchers and practitioners who want to study and apply methodology for surrogate endpoint and biomarker evaluation. Methodology is described while keeping mathematical detail to a minimum. Throughout the book, a suite of generic case studies is used to illustrate the concepts and methodology. A large part of the book is devoted to the description and illustration of SAS macros, R language libraries, and R Shiny Apps. The software tools can be downloaded from the authors web pages. Methodology, applications, and software encompass continuous, binary, categorical, time-to-event, and longitudinal outcomes. The University of Hasselt and KU Leuven-based editor team, supplemented by a fine group of chapter authors, has over twenty years of experience in the field of surrogate marker evaluation in clinical and other studies. The book is rooted at the same time in methodological research, regular and short courses taught on the topic, as well as in vast experience with the design and conduct of clinical trials. The team s prolific contributions have led to numerous papers, chapters, and books on this topic. This book was written in a coherent fashion, with common notation, conventions, and case studies throughout all chapters. K23717 ISBN: 978-1-4822-4936-1 90000 Applied Surrogate Endpoint Evaluation Methods with SAS and R Alonso Bigirumurame Burzykowski Buyse Molenberghs Muchene Perualila Shkedy Van der Elst Applied Surrogate Endpoint Evaluation Methods with SAS and R Ariel Alonso Theophile Bigirumurame Tomasz Burzykowski Marc Buyse Geert Molenberghs Leacky Muchene Nolen Joy Perualila Ziv Shkedy Wim Van der Elst 9 781482 249361 w w w. c r c p r e s s. c o m London, March 30, 2017 12

Application: Leuven Diabetes Study 120 general practitioners 2495 patients Outcomes LDL: low-density lipoprotein cholestrol HbA1C: glycosylated hemoglobin SBP: systolic blood pressure Ordinal targets Multiple outcomes & measured repeatedly & ordinal = joint modeling London, March 30, 2017 13

Method 3 sequences Partitioning CPU 1 ML (123) 7 13 2 PLp (12)(13)(23) 1 23 3 PLs (123) 1 21 4 PLps (12)(13)(23) 0 20 London, March 30, 2017 14

CPU Gain / Efficiency Loss Subsamples can be analyzed in parallel Base model above, with numerical integration over Q = 3 quadrature points: 7 13 0 20 More demanding integration: Q = 15 10h02 42 0h4 17 Statistical efficiency: almost always 95% For PLps occasionally 85% 87% London, March 30, 2017 15

Application: Quantifying Expert Opinion Janssen Pharmaceutica chemical compound acquisition to diversify library 22,015 compounds presented to 147 experts Outcome: recommended (1) not recommended (0) Variable #compounds per expert London, March 30, 2017 16

Simple model: logit [P (Y ij = 1 b i )] = β j + b i b i : normal random effect of expert i β j : potential of compound j there are 22,015 β j s London, March 30, 2017 17

Modified Procedure Partition β j s into S mutually exclusive, exhaustive sets Fit model to each of the S = 30 subsets Repeat this W = 20 times 96 hours on HPC (Nehalem) Can be brought down to 1 hour when parallelized Can be optimized further Weighted analysis by differing numbers of compounds per expert London, March 30, 2017 18

Conclusions Broad framework based on: pseudo-likelihood pairwise modeling split sample Statistically valid procedures: consistent, asymptotically normal Can lead to tremendous CPU gain Statistical efficiency loss mostly acceptable London, March 30, 2017 19