Incorporating published univariable associations in diagnostic and prognostic modeling

Thomas Debray
Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, The Netherlands
December 12, 2011

Clinical Prediction Modeling

Aim
- Provide a probability of outcome presence (diagnosis) or occurrence (prognosis) in an individual

Typical Approach
1. Collection of Individual Patient Data (IPD)
2. Data analysis (descriptives, missing values, ...)
3. Investigation of potential predictors
4. Logistic regression modeling
5. Evaluation of generalizability: validation studies

Practical Example: Diagnosis of Deep Vein Thrombosis
- Derivation dataset: IPD of 1,295 patients
- Predictors: gender, oral contraceptive use, presence of malignancy, recent surgery, absence of leg trauma, vein distension, calf difference, D-dimer test
- Logistic regression modeling
- Validation dataset of 1,756 patients
- Discrimination: AUC of 0.86
- Calibration: calibration slope of 1.12
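The two validation measures quoted above can be computed as in the following minimal sketch. It uses simulated data, not the DVT dataset; the function names, the seed, and the deliberately over-confident predictions are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(y, p):
    """Probability that a random case has a higher predicted risk than a
    random non-case; ties count one half."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def calibration_slope(y, p, n_iter=25):
    """Slope b from the logistic regression y ~ a + b * logit(p),
    fitted by Newton-Raphson on the two parameters (a, b)."""
    lp = [math.log(pi / (1.0 - pi)) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, x in zip(y, lp):
            mu = sigmoid(a + b * x)
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: inverse of the 2x2 information matrix times the score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


# Simulated validation set (assumed): true logit is x, the model predicts 2 * x,
# i.e. predictions are too extreme, so the calibration slope should be near 0.5.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(5000)]
y = [1 if rng.random() < sigmoid(x) else 0 for x in xs]
p_pred = [sigmoid(2.0 * x) for x in xs]

slope = calibration_slope(y, p_pred)
discrimination = auc(y[:500], p_pred[:500])  # subset keeps the pairwise loop small
print(f"AUC = {discrimination:.2f}, calibration slope = {slope:.2f}")
```

A slope below 1 indicates predictions that are too extreme, and a slope above 1 (such as the 1.12 reported above) indicates predictions that are not extreme enough.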

Improving Generalization
- Increase Sample Size
- Individual Participant Data
- Individual Study Centers
- Amplify Sample Spectrum
- Domain
- Heterogeneity
- Apply Robust Estimation
- Penalization & Shrinkage
- Model Updating
- Including External Knowledge

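The Adaptation Method
- Introduced by Steyerberg/Greenland
- Re-estimates a multivariable coefficient
- Incorporates univariable coefficients from literature (e.g. log odds ratios for binary outcomes)

$$\beta_{m|L} = \beta_{u|L} + (\beta_{m|I} - \beta_{u|I})$$

$$\operatorname{var}(\beta_{m|L}) = \operatorname{var}(\beta_{u|L}) + \operatorname{var}(\beta_{m|I}) - \operatorname{var}(\beta_{u|I})$$

Here $u$ and $m$ denote univariable and multivariable coefficients, and $L$ and $I$ denote the literature and the individual patient data.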
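As a worked illustration of the adaptation rule, the sketch below shifts a literature univariable coefficient by the univariable-to-multivariable change seen in the IPD. The function name and all numbers are hypothetical; the error raised for a non-positive variance is a choice made for this sketch, not part of the published method.

```python
import math


def adapt_coefficient(beta_u_lit, var_u_lit,
                      beta_m_ipd, var_m_ipd,
                      beta_u_ipd, var_u_ipd):
    """Adaptation method: literature univariable coefficient plus the
    (multivariable - univariable) difference estimated in the IPD."""
    beta = beta_u_lit + (beta_m_ipd - beta_u_ipd)
    var = var_u_lit + var_m_ipd - var_u_ipd
    if var <= 0:
        # The subtraction can yield a non-positive value; flag it instead
        # of returning an unusable variance (design choice of this sketch).
        raise ValueError("adapted variance is not positive")
    return beta, var


# Hypothetical log odds ratios and variances, for illustration only.
beta, var = adapt_coefficient(
    beta_u_lit=1.2, var_u_lit=0.01,   # pooled univariable estimate from literature
    beta_m_ipd=0.9, var_m_ipd=0.06,   # multivariable estimate from IPD
    beta_u_ipd=1.0, var_u_ipd=0.04,   # univariable estimate from IPD
)
se = math.sqrt(var)
lo, hi = beta - 1.96 * se, beta + 1.96 * se
print(f"adapted coefficient {beta:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```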

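The Improved Adaptation Method
- Unbiased variance component

$$\operatorname{var}(\beta_{m|L}) = \operatorname{var}(\beta_{u|L}) + \operatorname{var}(\beta_{m|I}) + \operatorname{var}(\beta_{u|I}) - 2\operatorname{cov}(\beta_{m|I}, \beta_{u|I})$$

- Distributional

$$\beta_{u|L} \sim N(\mu_{u|L}, \sigma_{u|L}^2), \quad \beta_{m|I} \sim N(\mu_{m|I}, \sigma_{m|I}^2), \quad \beta_{u|I} \sim N(\mu_{u|I}, \sigma_{u|I}^2)$$

- Robust Estimation

$$\mu_{m|I} \sim \operatorname{Cauchy}(0, 2.5), \quad \mu_{u|I} \sim \operatorname{Cauchy}(0, 2.5)$$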
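The improved variance follows from the variance of a sum of estimates. The short derivation below assumes that the literature estimate is independent of both IPD estimates; that assumption is stated here for the derivation and is not spelled out on the slide.

```latex
\begin{align*}
\operatorname{var}(\beta_{m|L})
  &= \operatorname{var}\bigl(\beta_{u|L} + \beta_{m|I} - \beta_{u|I}\bigr) \\
  &= \operatorname{var}(\beta_{u|L}) + \operatorname{var}(\beta_{m|I})
     + \operatorname{var}(\beta_{u|I})
     - 2\operatorname{cov}(\beta_{m|I}, \beta_{u|I})
\end{align*}
```

This coincides with the variance of the original adaptation method only in the special case $\operatorname{cov}(\beta_{m|I}, \beta_{u|I}) = \operatorname{var}(\beta_{u|I})$.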

Performance study

Simulation study
- Reference model with 2 predictors for generating data, with $x_1, x_2 \sim N(0, 1)$ and $r(x_1, x_2) = 0$
- Individual Patient Data: $n_{IPD} = 100$ to $1000$
- 4 heterogeneous literature studies ($n_j = 500$)

Case study: Diagnosis of Deep Vein Thrombosis
- IPD: multivariable dataset ($n = 1{,}295$)
- LIT: 7 unadjusted odds ratios (biomarker D-dimer)
- Update D-dimer coefficient in multivariable prediction model
- External validation of updated prediction model ($n = 1{,}756$)
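One replicate of the simulation design can be sketched as follows: draw two uncorrelated standard-normal predictors, generate a binary outcome from a logistic reference model, and fit the univariable and multivariable models on the same IPD. The reference coefficients (0, 1, 1), the seed, and all helper names are assumptions, since the slides do not state them; the fitting routine is a plain Newton-Raphson written for this sketch.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression; an intercept is prepended."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, r)))
            w = mu * (1.0 - mu)
            for a in range(k):
                grad[a] += (yi - mu) * r[a]
                for c in range(k):
                    hess[a][c] += w * r[a] * r[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def simulate(n, b0, b1, b2, rng):
    """Draw x1, x2 ~ N(0, 1) independently and a Bernoulli outcome."""
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append((x1, x2))
        y.append(1 if rng.random() < sigmoid(b0 + b1 * x1 + b2 * x2) else 0)
    return X, y


rng = random.Random(2011)
X, y = simulate(n=1000, b0=0.0, b1=1.0, b2=1.0, rng=rng)  # assumed coefficients
beta_m = fit_logistic(X, y)                       # intercept, x1, x2
beta_u = fit_logistic([(x1,) for x1, _ in X], y)  # intercept, x1 only
print(f"multivariable x1: {beta_m[1]:.2f}, univariable x1: {beta_u[1]:.2f}")
```

Even with uncorrelated predictors, the univariable log odds ratio for $x_1$ is typically closer to zero than the multivariable one, because odds ratios are non-collapsible. This gap is what the adaptation method transfers from the IPD to the literature estimate.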

Simulation Study: homogeneous literature evidence

[Figure: mean squared error and 95% CI coverage as a function of n_I/n_L under homogeneous evidence, comparing the standard approach, the Greenland/Steyerberg adaptation method, and the improved adaptation method.]

Simulation Study: heterogeneous literature evidence

[Figure: mean squared error and 95% CI coverage as a function of n_I/n_L, comparing the standard approach, the Greenland/Steyerberg adaptation method, and the improved adaptation method.]

[Figure: D-dimer coefficient (β_ddim), model discrimination (AUC), and model calibration (actual vs. predicted probability) for four approaches: no meta-analysis (LRM), no meta-analysis (PMLE), Steyerberg adaptation, and improved adaptation.]

Discussion
- Strengths
  - Aggregation usually improves estimation
  - Abundance of external knowledge
  - Straightforward implementation of approaches
  - Explicit aggregated models (no black boxes)
- Weaknesses
  - Heterogeneity of external knowledge
  - Performance gain not always very large
  - Additional efforts required during derivation phase
- Ongoing research
  - Incorporation of previously published prediction models with similar and different predictors