A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

Similar documents
Towards stratified medicine instead of dichotomization, estimate a treatment effect function for a continuous covariate

The influence of categorising survival time on parameter estimates in a Cox model

Fractional Polynomials and Model Averaging

especially with continuous

Multivariable Fractional Polynomials

Multivariable Fractional Polynomials

3 The Multivariable Fractional Polynomial Approach, with Thoughts about Opportunities and Challenges in Big Data

Dose-response modeling with bivariate binary data under model uncertainty

Modelling excess mortality using fractional polynomials and spline functions. Modelling time-dependent excess hazard

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Incorporating published univariable associations in diagnostic and prognostic modeling

Multivariable model-building with continuous covariates: 2. Comparison between splines and fractional polynomials

First Aid Kit for Survival. Hypoxia cohort. Goal. DFS=Clinical + Marker 1/21/2015. Two analyses to exemplify some concepts of survival techniques

Point-wise averaging approach in dose-response meta-analysis of aggregated data

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

Power and Sample Size Calculations with the Additive Hazards Model

A framework for developing, implementing and evaluating clinical prediction models in an individual participant data meta-analysis

Introduction to Statistical Analysis

PubH 7405: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION

Multivariate Survival Analysis

TMA 4275 Lifetime Analysis June 2004 Solution

Classification & Regression. Multicollinearity Intro to Nominal Data

Biostat Methods STAT 5820/6910 Handout #9a: Intro. to Meta-Analysis Methods

Correlation and regression

Practical Biostatistics

Review of Statistics 101

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Lecture 7 Time-dependent Covariates in Cox Regression

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

HOW TO DETERMINE THE NUMBER OF SUBJECTS NEEDED FOR MY STUDY?

Meta-analysis. 21 May Per Kragh Andersen, Biostatistics, Dept. Public Health

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Categorical Predictor Variables

MAS3301 / MAS8311 Biostatistics Part II: Survival

Distributed analysis in multi-center studies

Approximate analysis of covariance in trials in rare diseases, in particular rare cancers

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times

Ph.D. course: Regression models

Ph.D. course: Regression models. Introduction. 19 April 2012

Example name. Subgroups analysis, Regression. Synopsis

Basic Medical Statistics Course

Multiple imputation in Cox regression when there are time-varying effects of covariates

Instrumental variables estimation in the Cox Proportional Hazard regression model

The Design of a Survival Study

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

Building a Prognostic Biomarker

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Relative-risk regression and model diagnostics. 16 November, 2015

Contingency Tables Part One 1

Chapter 1 Statistical Inference

The mfp Package. July 27, GBSG... 1 bodyfat... 2 cox... 4 fp... 4 mfp... 5 mfp.object... 7 plot.mfp Index 9

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

Collinearity/Confounding in richly-parameterized models

Statistics in medicine

Joint Modeling of Longitudinal Item Response Data and Survival

Package CoxRidge. February 27, 2015

A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data

Basic IRT Concepts, Models, and Assumptions

A comparison of arm-based and contrast-based approaches to network meta-analysis (NMA)

Lecture 8 Stat D. Gillen

Lab 8. Matched Case Control Studies

flexsurv: flexible parametric survival modelling in R. Supplementary examples

Multi-state models: prediction

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Lecture 4 Multiple linear regression

Multivariate Dose-Response Meta-Analysis: the dosresmeta R Package

The Weibull Distribution

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

STA441: Spring Multiple Regression. More than one explanatory variable at the same time

A note on R 2 measures for Poisson and logistic regression models when both models are applicable

Regression techniques provide statistical analysis of relationships. Research designs may be classified as experimental or observational; regression

BMA-Mod : A Bayesian Model Averaging Strategy for Determining Dose-Response Relationships in the Presence of Model Uncertainty

The Stata Journal. Editors H. Joseph Newton Department of Statistics Texas A&M University College Station, Texas

CAMPBELL COLLABORATION

Reconstruction of individual patient data for meta analysis via Bayesian approach

β j = coefficient of x j in the model; β = ( β1, β2,

Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE:

Extensions of Cox Model for Non-Proportional Hazards Purpose

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

ANALYSIS OF CORRELATED DATA SAMPLING FROM CLUSTERS CLUSTER-RANDOMIZED TRIALS

Two techniques for investigating interactions between treatment and continuous covariates in clinical trials

Checking model assumptions with regression diagnostics

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

The Function Selection Procedure

APPENDIX B Sample-Size Calculation Methods: Classical Design

Logistic regression model for survival time analysis using time-varying coefficients

Generalized Additive Models

arxiv: v1 [stat.me] 28 Jul 2017

Extensions of Cox Model for Non-Proportional Hazards Purpose

Bayesian Nonparametric Meta-Analysis Model George Karabatsos University of Illinois-Chicago (UIC)

Journal of Statistical Software

Propensity Score Methods for Causal Inference

The mfp Package. R topics documented: May 18, Title Multivariable Fractional Polynomials. Version Date

Practical considerations for survival models

Transcription:

A new strategy for meta-analysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston

Overview Motivation Continuous variables functional form Fractional polynomials Brief description of traditional meta-analysis of trials data Extension to continuous covariates Meta-analysis of continuous covariates linear approach Our proposed strategy based on fractional polynomials Discussion 1

MA standard approaches to analysis in randomized controlled trials Treatment effect in each trial is often binary one effect size (ES) parameter to estimate per trial Mechanically, one combines ES values as a weighted average over the n studies and computes its 95% CI Implicitly, one has a model usually fixed effects or random effects 2

Extension to observational studies In principle, no different to RCTs, but must adjust effect estimate of interest for confounding Adjustment for confounding is problematic for several reasons Different confounders per study, different measurement techniques, Doing MA in this setting is virtually impossible without individual participant data (IPD) 3

Extension of MA for continuous covariates Continuous covariates are problematic for researchers Some categorize continuous covariates into groups and proceed as for standard MA Analyzing one or more dummy variables (But different cutpoints may be used across studies) Alternatively, some assume the dose/response function to be linear, and meta-analyze the regression slopes with or without adjustment for confounders None of this allows for heterogeneity in functional form between studies What to do if confounders differ across studies? 4

Continuous variables what functional form? Traditional approaches a) Linear function - may be inadequate functional form - misspecification of functional form may lead to wrong conclusions b) best standard transformation c) Step function (categorial data) - Loss of information - How many cutpoints? - Which cutpoints? - Bias introduced by outcome-dependent choice 5

Step function - the cutpoint problem relative risk } θ = exp ( β) µ In the Cox model quantitative factor X e.g. S-phase fraction λ(t X > μ) = exp β λ(t X μ) μ ˆ :estimated cutpoint for the comparison of patients with X above and below μ. 6

Searching for optimal cutpoint minimal p-value approach SPF in Freiburg DNA study Problem multiple testing => inflated type I error 7

Searching for optimal cutpoint Inflation of type I errors % significant Sample size Simulation study Type I error about 40% instead of 5%, does not disappear with increased sample size (in contrast to type II error) Severe bias of estimated effect Different cutpoints in each study Step function biologically plausible?? 8

Example 1: Prognostic factors GBSG-study in node-positive breast cancer 299 events for recurrence-free survival time (RFS) in 686 patients with complete data 7 prognostic factors, of which 5 are continuous Tamoxifen yes/no 9

Age as prognostic factor cutpoint analyses 1.00 (a) Optimal cutpoint (37) 1.00 (b) Median cutpoint (53) 0.75 0.75 0.50 0.50 0.25 0.25 Survival function 0.00 1.00 0.75 0 2 4 6 8 (c) Predefined cutpoints (45, 60) 0.00 1.00 0.75 0 2 4 6 8 (d) Ten-year age groups (40-80) 0.50 0.50 0.25 0.25 0.00 0.00 0 2 4 6 8 0 2 4 6 8 Years from randomization The youngest group is always in blue. (a) Optimal (37 years); HR (older vs younger) 0.54, p= 0.004 (b) median (53 years); HR (older vs younger) 1.1, p= 0.4 (c) predefined from earlier analyses (45, 60years); (d) popular (10-year groups) 10

StatMed 2006, 25:127-141 11

Fractional polynomial models Fractional polynomial of degree 2 with powers p = (p 1,p 2 ) is defined as FP2 = β 1 X p 1 + β 2 X p 2 Powers p are taken from a predefined set S = { 2, 1, 0.5, 0, 0.5, 1, 2, 3} 0- logx Example FP2 = β 1 X 0.5 + β 2 X 3 12

Examples of FP2 curves - varying powers (-2, 1) (-2, 2) (-2, -2) (-2, -1) 13

Examples of FP2 curves - single power, different coefficients (-2, 2) 4 2 Y 0-2 -4 10 20 30 40 50 x 14

Our philosophy of function selection Prefer simple (linear) model Use more complex (non-linear) FP1 or FP2 model if indicated by the data Contrasts to more local regression modelling Already starts with a complex model 15

FP analysis for the effect of age Degree 1 Degree 2 Power Model Powers Model Powers Model Powers Model chisquare chisquare chisquare chisquare -2 6.41-2 -2 17.09-1 1 15.56 0 2 11.45-1 3.39-2 -1 17.57-1 2 13.99 0 3 9.61-0.5 2.32-2 -0.5 17.61-1 3 12.37 0.5 0.5 13.37 0 1.53-2 0 17.52-0.5-0.5 16.82 0.5 1 12.29 0.5 0.97-2 0.5 17.30-0.5 0 16.18 0.5 2 10.19 1 0.58-2 1 16.97-0.5 0.5 15.41 0.5 3 8.32 2 0.17-2 2 16.04-0.5 1 14.55 1 1 11.14 3 0.03-2 3 14.91-0.5 2 12.74 1 2 8.99-1 -1 17.58-0.5 3 10.98 1 3 7.15-1 -0.5 17.30 0 0 15.36 2 2 6.87-1 0 16.85 0 0.5 14.43 2 3 5.17-1 0.5 16.25 0 1 13.44 3 3 3.67 7 16

Function selection procedure (FSP) Effect of age at 5% level? χ 2 df p-value Any effect? Best FP2 versus null 17.61 4 0.0015 Linear function suitable? Best FP2 versus linear 17.03 3 0.0007 FP1 sufficient? Best FP2 vs. best FP1 11.20 2 0.0037 17

Many predictors MFP With many continuous predictors selection of best FP for each becomes more difficult MFP algorithm as a standardized way to variable and function selection (usually binary and categorical variables are also available) MFP algorithm combines backward elimination with FP function selection procedures 18

Continuous factors Different results with different analyses Age as prognostic factor in breast cancer (adjusted) P-value 0.9 0.2 0.001 19

Results similar? Nodes as prognostic factor in breast cancer (adjusted) P-value 0.001 0.001 0.001 20

Meta-analysis: Simplest approach (for illustration) Univariate not adjusted Assume linearity 21

Example with a good IPD dataset Data: SEER registries in USA we use the primary breast cancer database 9 registries; 83,804 patients; 8,099 events (deaths) No registry (study) has a small sample size Time-to-event analysis using Cox regression Several continuous covariates of interest: nodes (number of positive lymph nodes) age (age at surgery for breast cancer) size (tumour size) Several potential confounders the above variables; other clinical and demographic variables As examples, consider nodes - the strongest predictor and age a weak predictor 22

SEER data: what do the linear nodes and age functions look like? nodes modelled as linear Log relative hazard 0 2 4 6 0 10 20 30 40 50 Number of positive lymph nodes Node negative (0) chosen as reference point 23

Age modeled as linear age modelled as linear Log relative hazard -.5 0.5 20 40 60 80 100 Age (yr) 60 yrs chosen as reference point 24

Linear approach to MA Cox model Do Cox regression on covariate in i th study, estimate i th slope and SE Get combined slope as weighted mean, and its SE 25

Forest plot for nodes % Weight Registry Events ES (95% CI) (I-V) Connecticut 1084 0.11 (0.10, 0.11) 13.28 Metropolitan Detroit 1540 0.12 (0.11, 0.12) 20.61 Hawaii 235 0.11 (0.09, 0.12) 5.10 Iowa 1269 0.13 (0.12, 0.13) 11.33 New Mexico 400 0.10 (0.09, 0.12) 4.46 San Francisco-Oakland SMSA 1270 0.11 (0.11, 0.12) 15.03 Utah 386 0.09 (0.08, 0.11) 5.21 Seattle (Puget Sound) 1184 0.11 (0.10, 0.12) 15.59 Metropolitan Atlanta 731 0.10 (0.09, 0.10) 9.39 I-V Overall (I-squared = 79.2%, p = 0.000) D+L Overall 0.11 (0.11, 0.11) 0.11 (0.10, 0.11) 100.00 0.05.1.15 26

Comments on forest plot for nodes Some slope heterogeneity between studies I-squared = 79% (see Higgins & Thompson 2002) Weights are not vastly different between studies no one dataset dominates the results Fixed- and random-effects estimates of overall slope are similar (although CI s differ) 27

Going beyond linearity We have assumed functions for age and nodes are linear is that sensible? If not, what do we do next? Relax linearity assumption use fractional polynomials to model possible non-linearity 28

What does the nodes function really look like? nodes: studywise selected FP Log relative hazard 0 1 2 3 4 0 10 20 30 40 50 Number of positive lymph nodes 29

age (fitted by FP2 functions) Studywise FP2 Log relative hazard 0 1 2 3 20 40 60 80 100 Age (yr) 30

Comments Information on functional form within each study has clearly been lost with the linear approach Effect of nodes (strong predictor) is highly non-linear Effect of age (weak predictor) is variable but also appears non-linear We do need a more flexible approach to MA 31

Our proposal based on FPs Assume that the covariate of interest is measured on the same scale in each study No problem with nodes and age FP approach offers a practical compromise between flexibility and stability for determining a dose/response function in a single study With or without adjustment for confounders To obtain an overall (average) function from n studies, we suggest a strategy with 3 main steps 32

Our proposed strategy Step 1: Determine confounder model use MFP, summarize selected model in a confounder index Step 2: Determine functional form Adjusted for confounder index from step 1 3 methods Overall FP Studywise FP2 Studywise selected FP Step 3: Pointwise average the fitted functions Fixed or random effect weights 33

Results for SEER breast cancer data Step 1 (determine confounder model) Treatment variables were included in all the confounder models Somewhat different variables and FP transformations were selected across studies Step 2 (determine functional form) Individual functions for nodes and age Step 3 (averaging the functions) Average functions using random-effects weights 34

Step 1: Determine confounder model Each study can have different potential confounders Determine confounder models M 1,, M n per study Apply the MFP procedure * to confounders in study i = 1,, n to determine model M i Include standard confounders (e.g. age, sex, etc) in models Don t include continuous covariate of main interest Summarize confounder models as index (linear predictor) xb i per study Adjust linearly for index values xb i in further analyses *as explained in Willi Sauerbrei s talk 35

Step 2: Determine functional form We suggest 3 methods for selecting a suitable function of a continuous covariate in each study Function estimates are always adjusted for the confounder indexes xb 1,, xb n 36

Three methods of function selection 1) Overall FP. Find the best FP transformation for the pooled dataset, stratified by study 1) To avoid over-fitting when have a large dataset, select using a small significance level 2) Across studies, functions have the same overall FP transformation but different estimated regression coefficients 2) Studywise FP2. Select the best FP2 function for each study. These may differ substantially across studies 3) Studywise selected FP. Select the best FP function for each study. These may also differ substantially across studies, since power may particularly be an issue 37

Step 3: average the fitted functions Make an informal assessment of heterogeneity E.g. plot of estimated functions against covariate Decide whether it is sensible to average functions across studies If it s sensible to average, we adopt standard methods for finding an average function, adjusted for the confounder indexes xb 1,, xb n The study-specific weights w i are not constant any more Since the variance of effect size ES i depends on Z, the weights are also functions of Z, i.e. w i = w i (Z) The approach can be used with all 3 FP function selection methods 38

Further details: Step 3 averaging the functions Cox model has no intercept, so estimated function ES i (Z) in study must first be standardized Choose a suitable reference value Z 0 for all studies Subtract the function in each study i at that value ES i~ (Z) = ES i (Z) ES i (Z 0 ) Weights and tau 2 are computed pointwise on Z functions of Z affects shape of mean function Compute overall mean function as pointwise weighted average of ES i~ (Z) across studies Pointwise fixed and random effects weights are calculated in the usual way For nodes, 0 is obvious ref point; for age, we chose 60 yrs 39

Individual functions for nodes [3 methods] Overall FP Studywise FP2 Log relative hazard 0 1 2 3 4 5 Log relative hazard 0 1 2 3 4 5 0 10 20 30 40 50 Number of positive lymph nodes 0 10 20 30 40 50 Number of positive lymph nodes Studywise selected FP Log relative hazard 0 1 2 3 4 5 0 10 20 30 40 50 Number of positive lymph nodes 40

Average functions for nodes [RE weights] Overall FP Studywise FP2 Log relative hazard 0 1 2 3 0 1 2 3 0 10 20 30 40 50 Studywise selected FP 0 1 2 3 0 10 20 30 40 50 0 10 20 30 40 50 Number of positive nodes 41

Individual functions for age [3 methods] Overall FP Studywise FP2 Log relative hazard -1 0 1 2 Log relative hazard -1 0 1 2 20 40 60 80 100 Age (yr) 20 40 60 80 100 Age (yr) Studywise selected FP Log relative hazard -1 0 1 2 20 40 60 80 100 Age (yr) 42

Average functions for age [RE weights] Overall FP Studywise FP2 Log relative hazard -.5 0.5 1 1.5 -.5 0.5 1 1.5 20 40 60 80 100 Studywise selected FP -.5 0.5 1 1.5 20 40 60 80 100 20 40 60 80 100 Age (yr) 43

Comments on the results Linear approach is clearly highly misleading, both for nodes and age For age, the direction of the effect seems reversed Linear functions suggest decreasing risk with age, whereas the FP2 functions are strongly non-monotonic 44

Conclusions Meta-analysis of continuous covariates based on cutpoints and dummy variables is not sufficient MA of regression coefficients for linear functions is sometimes acceptable When non-linearity is present, MA of FP functions may offer a way forward Several critical points need more consideration, experience and investigation 45

Some selected references Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. 2009. Introduction to meta-analysis. Wiley, Chichester. DerSimonian R, Laird N. 1986. Meta-analysis in clinical trials. Controlled Clinical Trials 7: 177-188. Higgins JPT, Thompson SG, Spiegelhalter DJ. 2009. A re-evaluation of random-effects meta-analysis. JRSS(A) 172: 137-159. Royston P., Altman, D.G. (1994): Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametic Modelling. Applied Statistics, 43(3): 429 467. Royston P, Sauerbrei W. 2008. Multivariable Model-building: A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. Wiley, Chichester. Sauerbrei W., Royston P., Binder H. (2007): Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Statistics in Medicine, 26: 5512-5528. Sauerbrei W, Royston P. 2010. A new strategy for meta-analysis of continuous covariates in observational studies. Statistics in Medicine submitted. White IR. 2009. Multivariable random-effects meta-analysis. Stata J 9(1): 40-56 46