Assessing the effect of a partly unobserved, exogenous, binary time-dependent covariate on -APPENDIX-

Similar documents
UNIVERSITY OF CALIFORNIA, SAN DIEGO

Power and Sample Size Calculations with the Additive Hazards Model

Survival Regression Models

Multi-state models: prediction

Journal of Statistical Software

A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa

Survival Analysis. Stat 526. April 13, 2018

3003 Cure. F. P. Treasure

Lecture 7 Time-dependent Covariates in Cox Regression

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

CURE FRACTION MODELS USING MIXTURE AND NON-MIXTURE MODELS. 1. Introduction

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

MAS3301 / MAS8311 Biostatistics Part II: Survival

Censoring and Truncation - Highlighting the Differences

Package threg. August 10, 2015

Regression analysis of interval censored competing risk data using a pseudo-value approach

Quantile Regression for Residual Life and Empirical Likelihood

Statistical Inference and Methods

Semi-parametric Inference for Cure Rate Models 1

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

5. Parametric Regression Model

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Survival analysis in R

Instrumental variables estimation in the Cox Proportional Hazard regression model

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Multistate Modeling and Applications

Estimation for Modified Data

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Direct likelihood inference on the cause-specific cumulative incidence function: a flexible parametric regression modelling approach

Introduction to Statistical Analysis

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Extensions of Cox Model for Non-Proportional Hazards Purpose

Vertical modeling: analysis of competing risks data with a cure proportion

Longitudinal + Reliability = Joint Modeling

Cox s proportional hazards model and Cox s partial likelihood

arxiv: v2 [stat.me] 19 Sep 2015

THESIS for the degree of MASTER OF SCIENCE. Modelling and Data Analysis

ECON 721: Lecture Notes on Duration Analysis. Petra E. Todd

Modified maximum likelihood estimation of parameters in the log-logistic distribution under progressive Type II censored data with binomial removals

and Comparison with NPMLE

9 Estimating the Underlying Survival Distribution for a

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Package SimSCRPiecewise

Analysis of Time-to-Event Data: Chapter 2 - Nonparametric estimation of functions of survival time

Pricing and Risk Analysis of a Long-Term Care Insurance Contract in a non-markov Multi-State Model

Exercises. (a) Prove that m(t) =

Survival Distributions, Hazard Functions, Cumulative Hazards

Let us use the term failure time to indicate the time of the event of interest in either a survival analysis or reliability analysis.

A multistate additive relative survival semi-markov model

Analysis of incomplete data in presence of competing risks

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

Promotion Time Cure Rate Model with Random Effects: an Application to a Multicentre Clinical Trial of Carcinoma

Constrained estimation for binary and survival data

Duration Analysis. Joan Llull

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

Maximum likelihood estimation for Cox s regression model under nested case-control sampling

for Time-to-event Data Mei-Ling Ting Lee University of Maryland, College Park

Meei Pyng Ng 1 and Ray Watson 1

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Investigation of goodness-of-fit test statistic distributions by random censored samples

University of California, Berkeley

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

Lecture 4 - Survival Models

Time-dependent covariates

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000)

Multi-state Models: An Overview

ST745: Survival Analysis: Nonparametric methods

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

Accelerated Failure Time Models: A Review

Does Cox analysis of a randomized survival study yield a causal treatment effect?

Ph.D. course: Regression models. Introduction. 19 April 2012

An augmented inverse probability weighted survival function estimator

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

11 Survival Analysis and Empirical Likelihood

Fahrmeir: Discrete failure time models

Goodness-Of-Fit for Cox s Regression Model. Extensions of Cox s Regression Model. Survival Analysis Fall 2004, Copenhagen

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

A note on R 2 measures for Poisson and logistic regression models when both models are applicable

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

Estimating Causal Effects of Organ Transplantation Treatment Regimes

Variable Selection in Competing Risks Using the L1-Penalized Cox Model

Notes largely based on Statistical Methods for Reliability Data by W.Q. Meeker and L. A. Escobar, Wiley, 1998 and on their class notes.

A note on the decomposition of number of life years lost according to causes of death

Multivariate Survival Analysis

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science

Published online: 10 Apr 2012.

Survival Analysis I (CHL5209H)

Tied survival times; estimation of survival probabilities

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE

Package crrsc. R topics documented: February 19, 2015

Transcription:

Assessing the effect of a partly unobserved, exogenous, binary time-dependent covariate on survival probabilities using generalised pseudo-values Ulrike Pötschger,2, Harald Heinzl 2, Maria Grazia Valsecchi 3, Martina Mittlböck 2* -APPENDIX- Section A: Properties of the 2 pseudo-value Consider a value z, 0 search z t, and let St ˆ * T z denote the survival propability at t* estimated by Kaplan-Meier based on all r patients still at risk at z. The common pseudo-value [] for the k-th patient, k r, is now defined as ˆk Uˆ rsˆ t* Tz ( r) S t* T z. () k Let z k denote the waiting time of patient k which is set to, say, z, for patients in state 0 at z. Now Z min z, z, z z k k k is the waiting time history of patient k up to time z. Since Z k is a vector of baseline variables at z the arguments with respect to the asymptotic properties of the pseudo-values of Jacobsen and Martinussen [2] apply in analogy. Now T k z, Dk, Zk, k r, can be considered as i.i.d. replicates of the population at risk at z. Using the von Mises expansion the pseudo-value can be expressed as Uˆ T z, D o k k k p denotes the survival probabiltiy at t * of the population where P T t* T z St* T z at risk at z ; and St ˆ * T z is an asymptotically unbiased estimate of. Page of

Jacobson and Martinussen [2] showed that E T k z, Dk Z k k. Here, k P Tk t* Tk z, Zk St* T z, Zk is the survival probability at t* conditional on both, being alive at z and the waiting time history of patient k up to z. Consequently it follows that E ˆ Uk k op. By varying z between 0 and t search, a plethora of pseudo-values could be computed. However, the primary interest is in values of z that correspond to actually observed waiting times w, i m i. Given z wi and a patient with zk wi, then t* *, 2, S t T w W w v vw dv k i i i wi which is the quantity to be estimated in the main paper. Section B: Waiting time distribution in patients with a donor Here, the density f w of partly unobservable times to donor identification (waiting times) in patients with a donor available is related to the density qw of observable waiting times up to and search t not prevented by competing risks, like death and early censoring represented by 02 t C t, respectively. Now, w qw wexp v 02 v C vdv p m 0 Page 2 of

search t pm x v v v dv dx search x t 02 C, so that 0 0 0 with exp q w dw. Here, p m is the expected proportion of patients with observed transition in the population of patients with a donor available. Due to the competing risks 02 t and t C underrepresented among the m patients with observed transitions. For the estimation of S t *, the density f, longer waiting times are w of times to donor identification of all patients with a donor available (includes patients with ceased donor search) is needed. This quantity is w f w wexp vdv 0 and it is linked to qw by f w q w w exp 02 0 v p m C v dv. Given W=w, the denominator represents the probability that a transition can actually be observed at time w. Section C: Generation of simulated data Section C is concerned with the generation of simulated survival and waiting times. Parameter values used in the simulations can be found in Table S. Let SWb t;, exp t denote the survival function of a Weibull distribution for t 0, where > 0 is the shape parameter and > 0 represents the scale parameter. Let f ;, corresponding density distribution. Wb t denote the Page 3 of

For direct 02 transitions, the simulations are based on the parametric mixture survival function ;, S t S t. 0 02 Wb 02 02 02 Here, 02 is the proportion of cured patients and S t converges to 0 02 with increasing t. This mimics the typical situation in paediatric oncology where the plateau in the survival function indicates the presence of long-term survivors (cured patients). 02 is the proportion of patients that are susceptible for a direct 0 2 transition with corresponding survival function 02 02 S t;,. Until t search =t*=5 years, the hazard functions for a 02 transition are Wb t t f t;, 02 Wb 02 02 02 ' 02 in the populations with and without donor available, S0 t respectively. More details on parametric mixture survival function can be found in Sposto [4]. Additionally, times to transitions (waiting times w) need to be simulated. It is assumed that a proportion of patients have a donor available. For scenarios A-G, a log-normal waiting time distribution f ( w ) with parameters and, truncated at t search, is used. The cumulative density function of the log-normal distribution is for x>0 Page 4 of

log( x) v² 2 ² CDF x,, e dv 2 In scenario I, discrete waiting times at w=0.5, and 3 years and the probability mass function f w are assumed. 3 For the population with a donor available and for a specific waiting time w t, the following form of the hazard function for a 2 transition was used in the simulations: ttwr t t w 2, 02 T Here tt w depends on both, the time elapsed since time zero and the time elapsed since the 2, transition at time w. The first term, r 02 t, represents the long-term effect of the timedependent intervention, which is favourable when r<. The second term, T t w, allows for additional short-term risks due to the intervention and follows a Weibull mixture distribution ;, S tw S tw. The proportion T represents the specific intervention T T Wb T T T related events in state that would be observed in absence of any other competing events. Page 5 of

Section D: Software implementation The proposed method can be straightforwardly implemented using standard routines available in the majority of statistical software packages. Firstly, Kaplan-Meier estimates are repeatedly computed to derive generalised pseudo-values; subsequently, a generalised linear model is fitted. In SAS, the procedure LIFETEST provides Kaplan-Meier estimates for survival probabilities. The procedure GENMOD can be used for fitting a generalised linear model. Note, that the model specification is done identically to the original pseudo-value approach, e.g. see Klein et al. [5] for details. In R, the function survfit in the package SURVIVAL can be used for Kaplan-Meier estimates. The generalised linear model can be estimated using the object geese in the package GEEPACK. For a more detailed description of the technical implementation see Klein et al. [5]. Page 6 of

Additional file : Table S: Specification of the simulated scenarios: parameter values and true 5-year survival probabilities Survival times Waiting times True survival probabilities 2 Scenario Transition 0 2 Transition 2 Transition 0 02 02 02 r T T T S 0 5 S 5 I Discrete 0.8 0.50.5 0. 0.5 3.3 0.75 - - 0.333 0.620 A B C D E F G Balduzzi 2005 Gale 998 Goldstone 2008 Locaciulli 2007 PH No diff. Late SCTs 0.4 0.629.3 0.33 0.8 8.5 2.5 0.25 log(0.4) 0.3 0.8 0.79.5 0. 0.35 3.3 0.4 log(0.5) 0.3 0.5 0.20.8 0.3 0.6 0.5 0.4 log(0.7) 0.3 0.7 0.653.2 0.4 0.6 4 2.5 0.4 log(0.4) 0.3 0.8 0.79.5 0.75 0 - - 0.4 log(0.5) 0.3 0.5 0.20.8 0 - - 0.4 log(0.7) 0.3 0.8 0.50.5 0. 0.5 3.3 0.45 log(2) 0.8 0.404 0.562 0.29 0.547 0.5 0.659 0.703 0.703 0.29 0.390 0.5 0.5 0.333 0.569 ) t search =t*=5 years 2) True survival probabilities S 0 5 and 5 S were calculated using computations and simulations in SAS and numerical integration in Mathematica according to equation (3) and (2) of the main paper, respectively. Page 7 of

Additional file : Table S2: Model-based and Monte-Carlo standard-errors for simulation study wglm WGLM ad-hoc 2 Donor w 3 SE est 4 SD sim 3 SE est 4 SD sim n=000 5 No Donor -- 0.25 0.28 0.25 0.28 Yes -- 0.078 0.070 0.098 0.04 0.5 0.64 0.58 0.72 0.68 0.34 0.32 0.6 0.58 3 0.06 0.083 0.9 0.77 n=400 5 No Donor -- 0.20 0.26 0.20 0.26 Yes -- 0.24 0.2 0.56 0.7 0.5 0.263 0.253 0.277 0.269 0.224 0.2 0.269 0.254 3 0.70 0.29 0.309 0.286 ) The weighted generalised linear model (wglm) uses ˆ * Vi, t according to equation (6) 2) The weighted generalised linear model (wglm) uses the ad-hoc correction suggested to Vˆ t * (with one repetition per observation per simulation run) estimate i, 3) Mean of standard errors of the generalised linear model (empirical sandwich estimator) 4) Standard deviations of the parameter estimates of 000 simulation runs (Monte-Carlo standard deviations) 5) Entire sample with and without a donor with 25 % in every subgroup: without donor and donor found at w=0.5, and 3, respectively Page 8 of

Additional file : Table S3: Model-based and Monte-Carlo standard-errors for simulation study 2 n=000 3 n=400 3 uniform 0 censoring SE est 2 SD sim SE est 2 SD sim A 0-0.053 0.05 0.085 0.082 B 0-0.062 0.063 0.098 0.095 C 0-0.067 0.068 0.07 0.07 D 0-0.079 0.080 0.26 0.26 E 0-0.062 0.062 0.098 0.095 F 0-0.067 0.068 0.07 0.07 G 0-0.060 0.063 0.095 0.097 G 0-6 0.083 0.083 0.33 0.32 0 + A 0-0. 0. 0.77 0.8 B 0-0.087 0.092 0.37 0.42 C 0-0.099 0.098 0.57 0.54 D 0-0. 0.05 0.6 0.62 E 0-0.082 0.08 0.30 0.29 F 0-0.087 0.087 0.38 0.42 G 0-0.5 0.25 0.82 0.87 G 0-6 0.49 0.54 0.239 0.246 A 0-0.23 0.22 0.96 0.96 B 0-0.07 0.2 0.69 0.73 C 0-0.9 0.8 0.90 0.84 D 0-0.29 0.27 0.204 0.2 E 0-0.03 0. 0.63 0.6 F 0-0.0 0.09 0.74 0.74 G 0-0.3 0.32 0.208 0.204 G 0-6 0.73 0.69 0.278 0.268 ) Mean of standard errors of the generalised linear model (empirical sandwich estimator) 2) Standard deviations of the parameter estimates of 000 simulation runs (Monte-Carlo standard deviations) 3) entire sample with and without a donor Page 9 of

Additional file : Figure S:Survival scenarios used in simulation study 2 with t search =5 years A B C D E F G Survival without donor Survival with donor t t S0 S Cumulative waiting time distribution t 0 f v dv Page 0 of

References. Andersen PK, Pohar Perme M: Pseudo-observations in survival analysis. Statistical methods in medical research 20, 9():7-99. 2. Jacobsen M, Martinussen T: A Note on the Large Sample Properties of Estimators Based on Generalized Linear Models for Correlated Pseudo-observations. Scandinavian Journal of Statistics 26, 43(3):845-862. 3. Rotnitzky A, Robins JM: Inverse Probability Weighting in Survival Analysis. In: Encyclopedia of Biostatistics. Edited by Armitage P, Colton T: John Wiley; 2005. 4. Sposto R: Cure model analysis in cancer: an application to data from the Children's Cancer Group. Statistics in Medicine 2002, 2(2):293-32. 5. Klein JP, Gerster M, Andersen PK, Tarima S, Pohar Perme M: SAS and R functions to compute pseudo-values for censored data regression. Computer Methods and Programs in Biomedicine 2008, 89(3):289-300. Page of