Semi-parametric Inference for Cure Rate Models

Fotios S. Milienos, jointly with N. Balakrishnan, M.V. Koutras and S. Pal
University of Toronto, 2015

This research is supported by a Marie Curie International Outgoing Fellowship within the 7th European Community Framework Programme.

Outline
1. Cure rate models
2. Piecewise linear approximation
3. Simulation study
4. Illustrative example
5. Conclusions

Introduction
The primary aim of this work is the study of survival times or, more generally, the times until the occurrence of an event. This event may be:
- the occurrence or recurrence of a disease;
- the return to prison or rearrest of a released prisoner;
- the death of a patient; etc.
Under the models of traditional survival analysis, every item will experience the event of interest at some point.
[Diagram: traditional survival analysis, in which the whole population experiences the event of interest after some time.]

Introduction
Cure rate models allow for a proportion of items that will never experience the event of interest. These items are called cured, non-susceptibles, long-term survivors, immune, immortals, etc.
[Diagram: cure rate models, in which after some time the population splits into items that experience the event of interest (susceptibles) and cured items (non-susceptibles).]

Introduction
Cure rate models have been studied extensively during the last decades: more than 200 publications have appeared over the last 30-40 years, together with many applications. The reasons are the great improvement in treatments for a range of diseases and the fact that, in many social phenomena, a number of individuals are not susceptible to the event of interest. Applications may be found in biomedical studies, criminology, finance, industrial reliability, etc. (see the monographs by Maller and Zhou, 1996; Ibrahim, Chen and Sinha, 2005).

Binary cure rate model
Boag (1949) and Berkson and Gage (1952) are, to the best of my knowledge, the first works to take into account the existence of two subgroups in the population: the susceptibles and the non-susceptibles (cured). Under this assumption, the population survival function of the time-to-event $T$ is given by the mixture model
$$S_P(t) = P(T > t) = p_0 + (1 - p_0)\,S(t),$$
where $p_0$ is the probability that a patient is cured and $S(t) = P(T > t \mid \text{susceptible})$ is the survival function of the susceptibles. Note that $\lim_{t \to \infty} S_P(t) = p_0$.
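The plateau of the mixture model at $p_0$ can be checked numerically. The following is a minimal sketch, assuming a cured proportion of 0.30 and exponential susceptibles with rate 0.5; both values and the function name `population_survival` are illustrative choices, not taken from the talk.

```python
import numpy as np

def population_survival(t, p0, surv_susceptible):
    """S_P(t) = p0 + (1 - p0) * S(t) for the binary mixture cure model."""
    t = np.asarray(t, dtype=float)
    return p0 + (1.0 - p0) * surv_susceptible(t)

# Hypothetical setting: 30% cured, exponential susceptibles with rate 0.5.
p0 = 0.30
exp_surv = lambda t: np.exp(-0.5 * t)

times = np.array([0.0, 1.0, 5.0, 50.0])
sp = population_survival(times, p0, exp_surv)
# sp starts at 1, decreases, and levels off at p0 instead of going to 0.
```

Unlike a proper survival function, `sp` does not tend to zero; the height of the plateau is the cured proportion.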

Binary cure rate model
$$S_P(t) = P(T > t) = p_0 + (1 - p_0)\,S(t)$$
What problems do we have?
- What is the distribution of the susceptibles, $S(t) = P(T > t \mid \text{susceptible})$?
- How can the covariates be incorporated into the model (in $p_0$ and/or $S(t)$)?
- How should the parameters of the model be estimated?
Parametric, semi-parametric and non-parametric approaches can be found in the relevant literature.

Binary cure rate model
Assumptions for the distribution of the susceptibles:
- Exponential distribution (e.g. Berkson and Gage, 1952; Mould, 1973; Ghitany and Maller, 1992; Ghitany, 1993; Farewell, 1977a; Balakrishnan and Pal, 2013c): $S(t) = P(T > t \mid \text{susceptible}) = \exp(-\lambda t)$, $\lambda > 0$;
- Weibull distribution (e.g. Farewell, 1977b, 1982, 1986; Bentzen et al., 1989; Struthers and Farewell, 1989; Balakrishnan and Pal, 2013b,c): $S(t) = \exp\{-(\lambda t)^{\alpha}\}$, $\lambda, \alpha > 0$;
- Cox's proportional hazards model (e.g. Kuk and Chen, 1992; Sy and Taylor, 2000, 2001; Peng and Dear, 2000; Fang, Li and Sun, 2005; Zhao et al., 2014): $h(t) = h_0(t)\exp(x'\beta)$;
- Kaplan-Meier estimator (e.g. Taylor, 1995; Laska and Meisner, 1992; Maller and Zhou, 1992);
- Generalized F distribution (e.g. Peng, Dear and Denham, 1998);
- Log-normal distribution (Balakrishnan and Pal, 2013a), and others.

Binary cure rate model
The estimation depends on the available data (censoring, truncation, etc.):
- MLE and the EM algorithm;
- a marginal likelihood estimation approach (especially in Cox's PH model);
- an iterative least squares method (Berkson and Gage, 1952).
A logistic regression model is used for $p_0$ (logistic-binary mixture model), i.e.
$$p_0 = p_0(x; \beta) = \frac{1}{1 + \exp(x'\beta)}.$$

Competing cause scenario
Let $M$ denote the number of competing causes left alive after a treatment, and let $W_i$ be the random variable for the time-to-event due to the $i$th competing cause ($i = 1, 2, \ldots, M$). The random variables $W_1, W_2, \ldots, W_M$ are assumed i.i.d. with cdf $F(t) = 1 - S(t)$ and independent of $M$. Therefore, the population time-to-event $T$ is given by
$$T = \min\{W_0, W_1, \ldots, W_M\},$$
with $P(W_0 = \infty) = 1$.
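The construction $T = \min\{W_0, W_1, \ldots, W_M\}$ can be simulated directly. The sketch below assumes, purely for illustration, a Poisson number of causes with mean 1.2 and Weibull cause-specific times with shape 1.5; the function name and all numerical values are hypothetical. Individuals with $M = 0$ keep $T = \infty$, i.e. they are cured.

```python
import numpy as np

def simulate_population_times(n, theta, draw_cause_times, rng):
    """Draw T = min{W_0, W_1, ..., W_M} with W_0 = infinity and M ~ Poisson(theta)."""
    m = rng.poisson(theta, size=n)          # number of competing causes per individual
    t = np.full(n, np.inf)                  # W_0 = infinity: cured unless a cause exists
    for i in np.flatnonzero(m):
        t[i] = draw_cause_times(m[i], rng).min()
    return t, m

rng = np.random.default_rng(0)
theta = 1.2                                  # assumed Poisson mean
weibull_causes = lambda size, rng: rng.weibull(1.5, size=size)  # assumed W_i distribution

t, m = simulate_population_times(20000, theta, weibull_causes, rng)
cured_fraction = np.mean(np.isinf(t))        # should be close to P(M = 0) = exp(-theta)
```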

Competing cause scenario
Then its survival function has the following form:
$$S_P(t) = P(T > t) = P(W_0 > t, W_1 > t, \ldots, W_M > t)$$
$$= P(M = 0) + P(W_1 > t, W_2 > t, \ldots, W_M > t, M \ge 1)$$
$$= P(M = 0) + \sum_{i=1}^{\infty} S(t)^i\, P(M = i).$$
For example, let $M$ follow a Poisson distribution with mean $\theta \in (0, \infty)$; then
$$S_P(t) = \exp(-\theta) + \sum_{i=1}^{\infty} S(t)^i \exp(-\theta)\frac{\theta^i}{i!} = \exp(-\theta F(t))$$
(e.g. Tsodikov, Ibrahim and Yakovlev, 2003; Ibrahim et al., 2005).
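The closed form for the Poisson case can be verified against the series term by term. This is a small numerical check, assuming the illustrative values $\theta = 1.2$ and $S(t) = 0.4$; the function names are hypothetical.

```python
import math

def poisson_cure_survival_series(s, theta, terms=100):
    """Truncated sum over i >= 0 of s**i * P(M = i), with M ~ Poisson(theta)."""
    total = 0.0
    term = math.exp(-theta)                 # i = 0 term, P(M = 0)
    for i in range(terms):
        total += term
        term *= theta * s / (i + 1)         # moves from the i-th to the (i+1)-th term
    return total

def poisson_cure_survival_closed(s, theta):
    """exp(-theta * F(t)) written in terms of s = S(t) = 1 - F(t)."""
    return math.exp(-theta * (1.0 - s))

theta, s = 1.2, 0.4                          # assumed values for the check
series_value = poisson_cure_survival_series(s, theta)
closed_value = poisson_cure_survival_closed(s, theta)
cure_probability = poisson_cure_survival_closed(0.0, theta)  # limit as S(t) -> 0
```

Setting $S(t) = 0$ recovers the cured proportion $\exp(-\theta)$ of the Poisson model.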

Competing cause scenario
$$S_P(t) = P(M = 0) + \sum_{i=1}^{\infty} S(t)^i\, P(M = i)$$
Again, the following questions must be answered:
- what is the distribution of $M$ (the number of competing causes);
- what is the distribution of $W_i$ (the time-to-event due to the $i$th competing cause);
- how the covariates can be incorporated into the model;
- how to deal with the estimation of the parameters.

Competing cause scenario
Assumptions for the distribution of the number of competing causes $M$:
- Poisson distribution (e.g. Yakovlev et al., 1993; Cantor and Shuster, 1992; Yakovlev, Cantor and Shuster, 1994; Chen, Ibrahim and Sinha, 1999; Hashimoto, Ortega et al., 2013);
- Negative binomial distribution (e.g. Castro, Cancho and Rodrigues, 2009; Ortega et al., 2014);
- COM-Poisson distribution (e.g. Rodrigues et al., 2009; Balakrishnan and Pal, 2012, 2013a,b,c);
- Compound weighted Poisson distribution (e.g. Rodrigues et al., 2011);
- Geometric distribution (e.g. Louzada et al., 2014).

Unified representation
Taking into account the relation
$$S_P(t) = \sum_{i=0}^{\infty} S(t)^i\, P(M = i),$$
both classes of cure rate models (the binary and the competing cause model) can be expressed as
$$S_P(t) = E[S(t)^M] = G_M(S(t)),$$
where the expectation is taken w.r.t. $M$ and $G_M$ is the probability generating function of $M$ (see e.g. Tsodikov et al., 2003).
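As a worked illustration of the unified form, the two models studied here correspond to the following generating functions. The Bernoulli reading of the binary model (one cause with probability $1 - p_0$, none with probability $p_0$) is the standard interpretation and is stated here as an assumption, since it is not spelled out above.

```latex
% Binary model: M takes the value 0 with probability p_0 and 1 with probability 1 - p_0.
G_M(s) = p_0 + (1 - p_0)\,s
  \quad\Longrightarrow\quad
  S_P(t) = G_M(S(t)) = p_0 + (1 - p_0)\,S(t).

% Poisson model: M ~ Poisson(\theta).
G_M(s) = \exp\{-\theta (1 - s)\}
  \quad\Longrightarrow\quad
  S_P(t) = G_M(S(t)) = \exp\{-\theta F(t)\}.

% In both cases the cured proportion is G_M(0) = P(M = 0):
% p_0 for the binary model and \exp(-\theta) for the Poisson model.
```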

Our semi-parametric approach
In this work, we study both the logistic-binary mixture and the logistic-Poisson mixture models. The common hazard function $h(t)$ of the time-to-event due to the $i$th competing cause is approximated by a piecewise linear function.
[Figure: piecewise linear hazard $h(t)$ taking the values $\phi_0, \phi_1, \ldots, \phi_N$ at the cut points $\tau_0, \tau_1, \ldots, \tau_N$.]

Our semi-parametric approach
The piecewise linear approximation (PLA) for the common hazard function $h(t)$ of $W_i$ is based on the following decisions:
- the number of lines, $N$, to be used;
- the selection of the cut points $\tau_0 < \tau_1 < \ldots < \tau_{N-1} < \tau_N$ that form the line segments.
We further assume that the PLA is a continuous function at the cut points.

Our semi-parametric approach
Hence,
$$h_L(t) = \sum_{j=1}^{N} (c_j + s_j t)\, I_{[\tau_{j-1}, \tau_j]}(t),$$
where $I_{[\tau_{j-1}, \tau_j]}(t) = 1$ if and only if $t \in [\tau_{j-1}, \tau_j]$, and
$$c_j = \phi_j - \tau_j\,\frac{\phi_j - \phi_{j-1}}{\tau_j - \tau_{j-1}}, \qquad s_j = \frac{\phi_j - \phi_{j-1}}{\tau_j - \tau_{j-1}},$$
for $j = 1, 2, \ldots, N$, with $\phi_j = c_j + s_j \tau_j$ and $\phi_0 = c_1 + s_1 \tau_0$.
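The PLA hazard and the quantities derived from it can be sketched as follows. The cut points and $\phi$ values are made up for illustration, the function names are hypothetical, and the cumulative hazard is integrated from $\tau_0$ only; treating the hazard as zero before $\tau_0$ is an assumption of this sketch rather than something stated in the talk.

```python
import numpy as np

def pla_segments(tau, phi):
    """Intercepts c_j and slopes s_j of the N line segments (j = 1, ..., N)."""
    tau, phi = np.asarray(tau, float), np.asarray(phi, float)
    s = np.diff(phi) / np.diff(tau)
    c = phi[1:] - s * tau[1:]                # so that c_j + s_j * tau_j = phi_j
    return c, s

def pla_hazard(t, tau, phi):
    """Continuous piecewise linear hazard through the points (tau_j, phi_j)."""
    return np.interp(t, tau, phi)

def pla_cumulative_hazard(t, tau, phi):
    """Exact integral of the PLA hazard from tau_0 to t, for t in [tau_0, tau_N]."""
    t = np.atleast_1d(np.asarray(t, float))
    tau, phi = np.asarray(tau, float), np.asarray(phi, float)
    seg_area = 0.5 * (phi[1:] + phi[:-1]) * np.diff(tau)      # trapezoid per segment
    H_cut = np.concatenate(([0.0], np.cumsum(seg_area)))       # value at each cut point
    j = np.clip(np.searchsorted(tau, t, side="right") - 1, 0, len(tau) - 2)
    h_t = np.interp(t, tau, phi)
    return H_cut[j] + 0.5 * (phi[j] + h_t) * (t - tau[j])      # add the partial trapezoid

# Hypothetical example with N = 3 lines.
tau = [0.0, 1.0, 2.0, 4.0]
phi = [0.2, 0.6, 0.5, 1.0]
c, s = pla_segments(tau, phi)
H = pla_cumulative_hazard([1.5, 4.0], tau, phi)
S_L = np.exp(-H)                             # survival function implied by the PLA hazard
```

Because the hazard is linear on each segment, the trapezoid rule is exact here, so no numerical integration is needed.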

Data and Estimation
Maximum likelihood estimation, carried out through the EM algorithm, is used for estimating the parameters of the model. We consider the scenario where the time-to-event is subject to noninformative random right censoring.

Data and Estimation
[Figure: follow-up timelines between the start time and the end of the study, distinguishing non-cured (susceptible) from cured individuals.]

Data and Estimation
Denoting by $C_i$ and $T_i$ the censoring time and the lifetime of the $i$th individual, respectively, we observe
$$Y_i = \min\{T_i, C_i\} \quad \text{and} \quad \delta_i = I(T_i \le C_i),$$
i.e.
$$\delta_i = \begin{cases} 1, & \text{if } Y_i \text{ is a time-to-event} \\ 0, & \text{if } Y_i \text{ is a censoring time,} \end{cases} \qquad i = 1, 2, \ldots, n.$$
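In code, the observed pairs follow directly from the latent lifetimes and censoring times. The sketch below uses four made-up individuals, with `np.inf` standing in for the lifetime of a cured individual (an illustrative convention); the function name `observe` is hypothetical.

```python
import numpy as np

def observe(lifetimes, censoring_times):
    """Return (y, delta) with y = min(T, C) and delta = 1 if T <= C, else 0."""
    T = np.asarray(lifetimes, dtype=float)
    C = np.asarray(censoring_times, dtype=float)
    y = np.minimum(T, C)
    delta = (T <= C).astype(int)
    return y, delta

# Made-up data: the second individual is cured (infinite lifetime), hence always censored.
T = np.array([2.0, np.inf, 7.5, 1.2])
C = np.array([5.0, 4.0, 6.0, 3.0])
y, delta = observe(T, C)
```

A cured individual can only ever contribute a censoring time, which is why the cure status of censored items is not observable.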

Data and Estimation
[Figure: nine example follow-up timelines between the start time and the end of the study. Observations 1, 4 and 7 are exact lifetimes ($\delta_i = 1$); observations 2, 3, 5, 6, 8 and 9 are censoring times ($\delta_i = 0$).]

Data and Estimation: likelihood function
From $n$ pairs of times and censoring indicators $(y_1, \delta_1), \ldots, (y_n, \delta_n)$, the likelihood function can be written as
$$L = L(\theta; x, y, \delta) \propto \prod_{i=1}^{n} f_P(y_i, x_i; \theta)^{\delta_i}\, S_P(y_i, x_i; \theta)^{1 - \delta_i}.$$
Thus, the likelihood becomes
$$L(\theta; x, y, \delta) \propto \prod_{i=1}^{n} \big[(1 - p_0(x_i, \beta))\, f_U(y_i, x_i; \theta)\big]^{\delta_i} \big[p_0(x_i, \beta) + (1 - p_0(x_i, \beta))\, S_U(y_i, x_i; \theta)\big]^{1 - \delta_i},$$
where $f_U$ and $S_U$ are the probability density and survival function of the susceptibles, respectively.
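The observed-data log-likelihood of the mixture form can be sketched in a few lines. This is an illustration only: it assumes a single covariate with a logistic model for $p_0$ and, for the example call, exponential susceptibles with rate 1; the function names and the tiny data set are hypothetical, and no PLA hazard is used here.

```python
import numpy as np

def cure_prob(x, beta0, beta1):
    """p0(x; beta) = 1 / (1 + exp(beta0 + beta1 * x))."""
    return 1.0 / (1.0 + np.exp(beta0 + beta1 * np.asarray(x, float)))

def mixture_loglik(y, delta, x, beta0, beta1, dens_u, surv_u):
    """Sum of log[(1-p0) f_U] over events and log[p0 + (1-p0) S_U] over censored items."""
    y, delta = np.asarray(y, float), np.asarray(delta)
    p0 = cure_prob(x, beta0, beta1)
    event_term = np.log((1.0 - p0) * dens_u(y))
    cens_term = np.log(p0 + (1.0 - p0) * surv_u(y))
    return float(np.sum(np.where(delta == 1, event_term, cens_term)))

# Assumed susceptible distribution for the example: exponential with rate 1.
dens_u = lambda t: np.exp(-t)
surv_u = lambda t: np.exp(-t)

# One event and one censored observation, both at time 1, with p0 = 0.5.
ll = mixture_loglik(y=[1.0, 1.0], delta=[1, 0], x=[0.0, 0.0],
                    beta0=0.0, beta1=0.0, dens_u=dens_u, surv_u=surv_u)
```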

Data and Estimation: likelihood function
Let us now define
$$I_i = \begin{cases} 1, & \text{if the } i\text{th individual is susceptible} \\ 0, & \text{if the } i\text{th individual is cured,} \end{cases} \qquad i = 1, 2, \ldots, n.$$
Note that the values of $I_i$ for censored items are not observable. Then, the complete likelihood function can be written as
$$L_c = L_c(\theta; x, y, \delta) \propto \prod_{i:\,\delta_i = 1} (1 - p_0(x_i, \beta))\, f_U(y_i, x_i; \theta) \prod_{i:\,\delta_i = 0} p_0(x_i, \beta)^{1 - I_i} \big[(1 - p_0(x_i, \beta))\, S_U(y_i, x_i; \theta)\big]^{I_i}$$
(note that $\delta_i = 1$ implies $I_i = 1$).

Data and Estimation: EM algorithm
For the E-step of the EM algorithm, we have
$$w_i^{(z)} = E[I_i \mid \theta^{(z)}, O] = \frac{\exp(x_i'\beta^{(z)})\, S_U(y_i, x_i; \theta^{(z)})}{1 + \exp(x_i'\beta^{(z)})\, S_U(y_i, x_i; \theta^{(z)})},$$
where
Binary case: $S_U(t, x; \theta) = S_L(t; \phi)$,
Poisson case: $S_U(t, x; \theta) = \dfrac{\exp(-\theta(x, \beta) F_L(t; \phi)) - \exp(-\theta(x, \beta))}{1 - \exp(-\theta(x, \beta))}$.
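The E-step weight is the posterior probability that a censored individual is susceptible; for an observed event the indicator $I_i$ is known to be 1. A minimal sketch, assuming the linear predictor $x_i'\beta$ and the value of $S_U$ have already been computed for each individual (the function names and numbers are hypothetical):

```python
import numpy as np

def e_step_weights(delta, eta, surv_u):
    """w_i = 1 for events; for censored items, exp(eta) S_U / (1 + exp(eta) S_U)."""
    delta = np.asarray(delta)
    odds_times_surv = np.exp(np.asarray(eta, float)) * np.asarray(surv_u, float)
    w_censored = odds_times_surv / (1.0 + odds_times_surv)
    return np.where(delta == 1, 1.0, w_censored)

def susceptible_survival_poisson(F_t, theta):
    """Poisson case: S_U = (exp(-theta F) - exp(-theta)) / (1 - exp(-theta))."""
    F_t = np.asarray(F_t, float)
    return (np.exp(-theta * F_t) - np.exp(-theta)) / (1.0 - np.exp(-theta))

# Made-up inputs: eta = x'beta and S_U(y_i) for one event and one censored item.
eta = np.array([0.8, 0.8])
s_u = np.array([0.4, 0.4])
w = e_step_weights(delta=[1, 0], eta=eta, surv_u=s_u)

# Same weight written with p0 = 1 / (1 + exp(eta)), as a cross-check.
p0 = 1.0 / (1.0 + np.exp(0.8))
w_check = (1.0 - p0) * 0.4 / (p0 + (1.0 - p0) * 0.4)
```

The cross-check uses $(1 - p_0)/p_0 = \exp(x'\beta)$, which is why the weight can be written without $p_0$ appearing explicitly.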

Numerical results
A data set of size $n = 300$ is generated, in which the random variables $W_i$ follow a Weibull distribution. We assume that:
1. there is only one covariate, with 4 possible values/groups ($x \in \{1, 2, 3, 4\}$);
2. the censoring times $C_i$ follow Exponential distributions.
We fix the cured proportions for the first and fourth group to 0.30 and 0.15, respectively; therefore, based on the equations
$$0.30 = \frac{1}{1 + \exp(\beta_0 + \beta_1 \cdot 1)}, \qquad 0.15 = \frac{1}{1 + \exp(\beta_0 + \beta_1 \cdot 4)},$$
we have that $\beta_0 = 0.552$ and $\beta_1 = 0.296$.
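The two equations are linear in $(\beta_0, \beta_1)$ after a logit transform, so the stated values can be reproduced directly. The helper name `solve_logistic_coefficients` is hypothetical; the inputs are the cured proportions given above.

```python
import math

def solve_logistic_coefficients(p_a, x_a, p_b, x_b):
    """Solve p = 1 / (1 + exp(b0 + b1 * x)) at two (p, x) pairs for (b0, b1)."""
    # p = 1 / (1 + exp(eta))  <=>  eta = log((1 - p) / p)
    eta_a = math.log((1.0 - p_a) / p_a)
    eta_b = math.log((1.0 - p_b) / p_b)
    beta1 = (eta_b - eta_a) / (x_b - x_a)
    beta0 = eta_a - beta1 * x_a
    return beta0, beta1

beta0, beta1 = solve_logistic_coefficients(0.30, 1, 0.15, 4)
cured = [1.0 / (1.0 + math.exp(beta0 + beta1 * x)) for x in (1, 2, 3, 4)]
```

The implied cured proportions for groups 2 and 3 agree with the values .242 and .192 reported in the simulation tables.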

Numerical results
The cut points $\tau_0, \tau_1, \ldots, \tau_{N-1}$ of the PLA are taken to be the sample percentiles of the uncensored data (i.e. the $0, 1/N, 2/N, \ldots, (N-1)/N$th percentiles); the last point $\tau_N$ is the maximum of the $Y_i$.
The initial values for $\beta_0$ and $\beta_1$ can be obtained by replacing the cured proportions with the observed censoring proportions. The initial values for $\phi_0, \phi_1, \ldots, \phi_N$ may be computed from a set of hazard estimates through $\phi_j = \hat{h}(\tau_j)$, $j = 0, 1, \ldots, N$; in our case, the Kaplan-Meier estimator was used.
Moreover, we assume for each group that the censoring proportions exceed the cured proportions by $c = 0.10$ or $c = 0.20$.
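The cut-point rule can be sketched as below. The sample-percentile definition is not specified in the talk, so NumPy's default (linear interpolation) is used as an assumption; the function name and the data are made up.

```python
import numpy as np

def pla_cut_points(y, delta, n_lines):
    """tau_0..tau_{N-1}: quantiles 0, 1/N, ..., (N-1)/N of event times; tau_N: max of y."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta)
    events = y[delta == 1]                        # uncensored observations only
    probs = np.arange(n_lines) / n_lines          # 0, 1/N, ..., (N-1)/N
    inner = np.quantile(events, probs)            # assumption: default linear interpolation
    return np.append(inner, y.max())

# Made-up sample with four events and three censored observations.
y = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 9.0]
delta = [1, 1, 0, 1, 1, 0, 0]
tau = pla_cut_points(y, delta, n_lines=2)
```

With heavy ties in the event times the quantiles may coincide, in which case the cut points are no longer strictly increasing and would need to be adjusted.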

Numerical results
Table: Logistic-binary mixture model, Weibull distribution and low censored proportion (c = .10; n = 300)
Cured proportions for Groups 1-4: (.3, .242, .192, .15)
Censored proportions for Groups 1-4: (.3, .242, .192, .15) + c, c = .10
Sample sizes for Groups 1-4: (60, 105, 65, 70), n = 300
Parameters: β0 = .5515, β1 = .2958, α = 1.5, γ = 1.5 (parameters of the distribution)
Rows labelled Par give the parametric fit, for which (α̂, γ̂) is reported in place of the φ̂ values.

      N    (β̂0, β̂1)      p̂0 (Groups 1-4)           l        φ̂0    φ̂1     φ̂2     φ̂3     φ̂4
Mean  1    (.542, .310)  (.301, .240, .189, .148)  352.175  .646  16.74
      2    (.542, .311)  (.301, .239, .188, .148)  349.661  .432  1.236  5.319
      3    (.543, .311)  (.301, .239, .188, .148)  348.917  .367  1.100  1.220  7.389
      4    (.543, .311)  (.301, .239, .188, .148)  348.445  .350  .983   1.187  1.237  8.591
      Par  (.546, .296)  (.304, .244, .194, .154)  406.612  (α̂, γ̂) = (1.504, 1.510)
RMSE  1    (.362, .148)  (.050, .030, .031, .038)
      2    (.362, .148)  (.050, .030, .031, .038)
      3    (.362, .148)  (.050, .030, .031, .038)
      4    (.362, .148)  (.050, .030, .031, .038)
      Par  (.377, .144)  (.053, .031, .028, .035)           .007
Std   1    (.363, .148)  (.050, .030, .031, .038)  12.126   .098  7.218
      2    (.364, .148)  (.050, .030, .031, .038)  12.249   .109  .123   5.547
      3    (.363, .148)  (.050, .030, .031, .038)  12.285   .131  .144   .156   7.089
      4    (.363, .148)  (.050, .030, .031, .038)  12.213   .145  .143   .181   .195   9.422
      Par  (.378, .145)  (.053, .031, .028, .034)  11.378   (α̂, γ̂) = (.081, .070)

Numerical results
Table: Logistic-binary mixture model, Weibull distribution and high censored proportion (c = .20; n = 300)
Cured proportions for Groups 1-4: (.3, .242, .192, .15)
Censored proportions for Groups 1-4: (.3, .242, .192, .15) + c, c = .20
Sample sizes for Groups 1-4: (60, 105, 65, 70), n = 300
Parameters: β0 = .5515, β1 = .2958, α = 1.5, γ = 1.5 (parameters of the distribution)
Rows labelled Par give the parametric fit, for which (α̂, γ̂) is reported in place of the φ̂ values.

      N    (β̂0, β̂1)      p̂0 (Groups 1-4)           l        φ̂0    φ̂1     φ̂2     φ̂3     φ̂4
Mean  1    (.460, .343)  (.314, .244, .187, .144)  323.516  .580  9.644
      2    (.460, .348)  (.313, .242, .185, .141)  322.001  .384  1.240  3.572
      3    (.460, .349)  (.313, .242, .185, .141)  321.148  .324  1.084  1.227  4.839
      4    (.462, .349)  (.312, .242, .185, .141)  320.524  .306  .963   1.178  1.252  5.334
      Par  (.457, .336)  (.316, .247, .190, .148)  373.296  (α̂, γ̂) = (1.526, 1.503)
RMSE  1    (.491, .183)  (.073, .045, .039, .044)
      2    (.492, .185)  (.072, .044, .039, .044)
      3    (.490, .185)  (.072, .044, .039, .044)
      4    (.489, .185)  (.072, .044, .039, .044)
      Par  (.503, .193)  (.071, .040, .036, .043)           .010
Std   1    (.484, .177)  (.072, .045, .039, .044)  12.356   .113  3.924
      2    (.485, .178)  (.071, .044, .039, .043)  12.460   .123  .136   3.844
      3    (.483, .178)  (.071, .044, .039, .043)  12.342   .146  .148   .189   5.301
      4    (.483, .178)  (.071, .044, .039, .043)  12.316   .162  .168   .210   .239   5.987
      Par  (.496, .190)  (.069, .040, .036, .043)  1.928    (α̂, γ̂) = (.099, .092)

Numerical results
Table: Logistic-Poisson mixture model, Weibull distribution and low censored proportion (c = .10; n = 300)
Cured proportions for Groups 1-4: (.3, .242, .192, .15)
Censored proportions for Groups 1-4: (.3, .242, .192, .15) + c, c = .10
Sample sizes for Groups 1-4: (60, 105, 65, 70), n = 300
Parameters: β0 = .5515, β1 = .2958, α = 1.5, γ = 1.5 (parameters of the distribution)
Rows labelled Par give the parametric fit, for which (α̂, γ̂) is reported in place of the φ̂ values.

      N    (β̂0, β̂1)      p̂0 (Groups 1-4)           l        φ̂0    φ̂1     φ̂2     φ̂3     φ̂4
Mean  1    (.593, .307)  (.292, .231, .182, .143)  352.121  .263  14.46
      2    (.584, .303)  (.294, .235, .185, .146)  351.215  .200  .730   8.478
      3    (.585, .303)  (.294, .235, .185, .146)  350.655  .181  .595   .801   9.074
      4    (.585, .303)  (.294, .235, .185, .146)  350.320  .174  .524   .692   .863   9.287
      Par  (.585, .303)  (.294, .235, .185, .146)  352.204  (α̂, γ̂) = (1.477, 1.543)
RMSE  1    (.371, .143)  (.051, .032, .029, .035)
      2    (.365, .141)  (.051, .031, .028, .035)
      3    (.365, .140)  (.051, .031, .028, .035)
      4    (.365, .140)  (.051, .031, .028, .035)
      Par  (.365, .140)  (.051, .031, .028, .035)           (α̂, γ̂) = (.005, .01)
Std   1    (.369, .143)  (.051, .030, .028, .035)  12.044   .052  3.598
      2    (.365, .141)  (.051, .030, .028, .034)  12.063   .064  .085   3.954
      3    (.364, .141)  (.051, .030, .028, .034)  12.087   .069  .078   .116   4.989
      4    (.364, .140)  (.051, .030, .028, .034)  12.081   .075  .082   .105   .133   5.999
      Par  (.365, .140)  (.051, .030, .028, .035)  12.244   (α̂, γ̂) = (.067, .093)

Numerical results
One more quantity of interest is the probability
$$P(\text{cured} \mid T > t) = \frac{p_0(x'\beta)}{p_0(x'\beta) + (1 - p_0(x'\beta))\, S_U(t; \theta)}.$$
Hence, the next table reports the mean value (and the sample standard deviation, in parentheses) of the estimate of this probability at the point $t_{0.95}$ for which $P(\text{cured} \mid T > t_{0.95}) = 0.95$, at each of the four possible values of the covariate (i.e. for $x = 1, 2, 3, 4$).
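The point $t_{0.95}$ has a closed form whenever $S_U$ can be inverted. The sketch below assumes a Weibull survival function of the form $\exp\{-(\lambda t)^{\alpha}\}$ with illustrative values $\alpha = 1.5$, $\lambda = 1$ and $p_0 = 0.30$; this parametrization, the values and the function names are assumptions, not necessarily those of the simulation study.

```python
import math

def prob_cured_given_survival(t, p0, surv_u):
    """P(cured | T > t) = p0 / (p0 + (1 - p0) * S_U(t))."""
    return p0 / (p0 + (1.0 - p0) * surv_u(t))

def time_reaching_level(p0, level, inv_surv_u):
    """Solve P(cured | T > t) = level for t (requires p0 < level < 1)."""
    target_surv = p0 * (1.0 - level) / (level * (1.0 - p0))
    return inv_surv_u(target_surv)

# Assumed Weibull form S_U(t) = exp(-(lam * t) ** alpha) and its inverse.
alpha, lam = 1.5, 1.0
surv_u = lambda t: math.exp(-((lam * t) ** alpha))
inv_surv_u = lambda s: (-math.log(s)) ** (1.0 / alpha) / lam

p0 = 0.30
t95 = time_reaching_level(p0, 0.95, inv_surv_u)
check = prob_cured_given_survival(t95, p0, surv_u)
```

At time 0 the conditional probability equals $p_0$, and it increases towards 1 as $t$ grows, so a unique $t_{0.95}$ exists whenever $p_0 < 0.95$.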

Numerical results
Table: The sample mean (standard deviation) of the estimated P(cured | T > t_0.95) (n = 300, c = .10), for each possible value of the covariate x.

Weibull (α = 1, γ = 1.5) / Binary
x   PLA N=1    PLA N=2    PLA N=3    PLA N=4    Par*
1   .946(.02)  .945(.02)  .945(.02)  .946(.02)  .947(.02)
2   .947(.02)  .945(.02)  .945(.02)  .946(.02)  .947(.02)
3   .946(.02)  .944(.02)  .944(.02)  .945(.02)  .946(.02)
4   .944(.03)  .942(.03)  .942(.03)  .943(.03)  .944(.02)

Weibull (α = 1.5, γ = 1.5) / Binary
x   PLA N=1    PLA N=2    PLA N=3    PLA N=4    Par*
1   .971(.02)  .968(.01)  .967(.02)  .968(.02)  .947(.02)
2   .971(.03)  .967(.02)  .966(.02)  .967(.02)  .946(.02)
3   .970(.03)  .964(.02)  .964(.02)  .964(.02)  .945(.03)
4   .968(.04)  .961(.03)  .961(.03)  .961(.03)  .943(.03)

Weibull (α = 1, γ = 1.5) / Poisson
x   PLA N=1    PLA N=2    PLA N=3    PLA N=4    Par*
1   .935(.04)  .933(.04)  .931(.04)  .931(.04)  .941(.03)
2   .933(.04)  .930(.04)  .928(.04)  .928(.04)  .940(.03)
3   .930(.04)  .926(.05)  .925(.05)  .925(.05)  .939(.03)
4   .926(.05)  .921(.05)  .920(.05)  .920(.06)  .936(.04)

Weibull (α = 1.5, γ = 1.5) / Poisson
x   PLA N=1    PLA N=2    PLA N=3    PLA N=4    Par*
1   .958(.02)  .931(.04)  .932(.04)  .932(.04)  .931(.03)
2   .960(.02)  .928(.04)  .930(.04)  .931(.04)  .929(.03)
3   .962(.03)  .925(.05)  .927(.05)  .928(.05)  .926(.04)
4   .963(.03)  .921(.06)  .923(.06)  .924(.06)  .921(.05)

*The respective estimates using the parametric approach.

Melanoma data
The study took place from 1991 to 1995, and follow-up was conducted until 1998 (n = 427); the data, taken from Ibrahim et al. (2005), give the survival times (in years) until the patient's death or the censoring time. In our application, the nodule category (lymph nodes involved in the disease) is the only covariate, with 4 possible values $x = 1, 2, 3, 4$; the group sizes are $n_1 = 111$, $n_2 = 137$, $n_3 = 87$ and $n_4 = 82$, with respective censoring proportions 67.6%, 61.3%, 52.9% and 32.9%. The results from the PLA will be compared with those of the parametric approach.

Melanoma data
Table: The estimation of the logistic-binary mixture model.

      N = 1       N = 2       N = 3       N = 4       N = 5       N = 6       Par 1       Par 2
p01   .618(.046)  .615(.046)  .617(.046)  .615(.046)  .615(.046)  .615(.046)  .632(.048)  .667(.040)
p02   .506(.042)  .503(.042)  .504(.042)  .502(.042)  .503(.042)  .502(.042)  .506(.041)  .557(.030)
p03   .425(.053)  .422(.053)  .423(.053)  .421(.052)  .421(.052)  .421(.052)  .379(.047)  .440(.033)
p04   .276(.049)  .274(.049)  .275(.049)  .273(.049)  .273(.049)  .273(.049)  .267(.056)  .330(.046)
φ0    .223        .133        .153        .146        .127        .115
φ1    2.184       .792        .579        .469        .486        .492
φ2                1.014       .889        .905        .512        .347
φ3                            .910        .674        1.201       1.117
φ4                                        1.437       .535        .808
φ5                                                    2.088       .671
φ6                                                                1.820
l     548.30      549.19      548.56      549.01      545.66      545.78      514.47      517.59
AIC   1108.60     1112.38     1113.12     1116.03     1111.33     1113.55     1036.94     1043.18
BIC   1132.80     1140.61     1145.38     1152.32     1151.66     1157.91     1053.07     1059.31

Par 1: the parametric Log-normal/logistic-binary mixture model of Balakrishnan and Pal (2013a).
Par 2: the parametric Weibull/logistic-binary mixture model of Balakrishnan and Pal (2013b).
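The AIC and BIC rows can be cross-checked against the row l. The sketch below rests on three assumptions that are not stated in the talk: that the printed l is the magnitude of a negative log-likelihood, that AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L are the definitions used, and that n is the sum of the four group sizes. The number of parameters k is backed out from the AIC rather than taken from the model.

```python
import math

# Columns: N = 1..6, Par 1, Par 2 (values as printed in the table).
l_printed = [548.30, 549.19, 548.56, 549.01, 545.66, 545.78, 514.47, 517.59]
aic = [1108.60, 1112.38, 1113.12, 1116.03, 1111.33, 1113.55, 1036.94, 1043.18]
bic = [1132.80, 1140.61, 1145.38, 1152.32, 1151.66, 1157.91, 1053.07, 1059.31]

n = 111 + 137 + 87 + 82   # assumption: BIC uses the sum of the group sizes

n_params = []
bic_rebuilt = []
for l_abs, a in zip(l_printed, aic):
    k = round((a - 2.0 * l_abs) / 2.0)        # from AIC = 2k - 2 log L, with log L = -l_abs
    n_params.append(k)
    bic_rebuilt.append(k * math.log(n) + 2.0 * l_abs)

max_bic_gap = max(abs(b - r) for b, r in zip(bic, bic_rebuilt))
```

Under these assumptions the rebuilt BIC values match the printed ones to within rounding, and the implied parameter count grows by one for each additional line of the PLA.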

Melanoma data
[Figure: The probability P(M = 0 | T > t) = P(cured | T > t) as a function of t, for each nodule category (x = 1, 2, 3, 4); the vertical axis runs from 0.4 to 1.0 and the times 3.59, 3.96, 4.20 and 4.66 are marked on the t axis.]

Similar works
- Larson and Dinse (1985) used a piecewise constant approximation for the baseline hazard function of Cox's PH model (under a different competing cause scenario).
- In Lo et al. (1993), the baseline hazard was modelled by a continuous piecewise linear function (for the Bernoulli model and with a different estimation approach).
- Chen and Ibrahim (2001) studied the Poisson mixture model where the common cdf of the competing causes was approximated by a piecewise constant function (using a different estimation approach).
- A piecewise constant approximation under a Bayesian framework is found in Ibrahim, Chen and Sinha (2001) and Kim et al. (2007).

Conclusions
- The accuracy of the PLA for estimating the regression coefficients is very close to that achieved by a parametric approach; a similar remark was made by Taylor (1995) for a KM-based non-parametric approach.
- The last line of our PLA seems to be significantly affected by the underlying censoring mechanism.
- Lo et al. (1993) found that the choice of cut points had a minimal effect on the results, which is in accordance with our findings.

Conclusions
- The suggested approach is quite flexible: the larger $N$ is, the closer the model is to a non-parametric one.
- We suggest choosing small to moderate values of $N$ and checking the robustness of the estimates of the regression coefficients.
- Adding more lines does not improve the approximation if the additional estimates of the $\phi$'s lie very close to the existing lines.
- Finally, we noted that more than 4 or 5 lines do not offer any improvement under either the maximum likelihood or the MSE criterion.

Future work
Future work includes:
- assuming that the number of competing causes follows a wider class of discrete distributions;
- working under a fully non-parametric framework for the number of competing causes;
- a Cox proportional hazards model approach using different estimation methods;
- the asymptotic properties of these models.

References
Balakrishnan, N. and Pal, S. (2012). EM algorithm-based likelihood estimation for some cure rate models. Journal of Statistical Theory and Practice, 6, 698-724.
Balakrishnan, N. and Pal, S. (2013a). Lognormal distribution and likelihood-based inference for flexible cure rate models based on COM-Poisson family. Computational Statistics & Data Analysis, 67, 41-67.
Balakrishnan, N. and Pal, S. (2013b). Expectation maximization-based likelihood inference for flexible cure rate models with Weibull distribution. Statistical Methods in Medical Research, available online.
Balakrishnan, N. and Pal, S. (2013c). COM-Poisson cure rate models and associated likelihood-based inference with exponential and Weibull lifetimes, in Applied Reliability Engineering and Risk Analysis: Probabilistic Models and Statistical Inference (eds I. B. Frenkel, A. Karagrigoriou, A. Lisnianski and A. Kleyner), John Wiley & Sons, Chichester.
Bentzen, S.M., Thames, H.D., Travis, E.L., Kian Ang, K., Van Der Schueren, E., Dewit, L. and Dixon, D.O. (1989). Direct estimation of latent time for radiation injury in late-responding normal tissues: gut, lung, and spinal cord. International Journal of Radiation Biology, 55, 27-43.
Berkson, J. and Gage, R.P. (1952). Survival curves for cancer patients following treatment. Journal of the American Statistical Association, 47, 501-515.
Boag, J.M. (1949). Maximum likelihood estimates of the proportion of patients cured by cancer therapy. Journal of the Royal Statistical Society, Ser. B, 11, 15-44.
Cantor, A.B. and Shuster, J.J. (1992). Parametric versus non-parametric methods for estimating cure rates based on censored survival data. Statistics in Medicine, 11, 931-937.
Castro, M.D., Cancho, V.G. and Rodrigues, J. (2009). A Bayesian long-term survival model parametrized in the cured fraction. Biometrical Journal, 51, 443-455.
Chen, M.H. and Ibrahim, J.G. (2001). Maximum likelihood methods for cure rate models with missing covariates. Biometrics, 57, 43-52.
Chen, M.H., Ibrahim, J.G. and Sinha, D. (1999). A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association, 94, 909-919.
Fang, H.B., Li, G. and Sun, J. (2005). Maximum likelihood estimation in a semiparametric logistic/proportional-hazards mixture model. Scandinavian Journal of Statistics, 32, 59-75.

Farewell, V.T. (1977a). A model for a binary variable with time-censored observations. Biometrika, 64, 43-46.

Farewell, V.T. (1977b). The combined effect of breast cancer risk factors. Cancer, 40, 931-936.

Farewell, V.T. (1982). The use of mixture models for the analysis of survival data with long-term survivors. Biometrics, 38, 1041-1046.

Farewell, V.T. (1986). Mixture models in survival analysis: Are they worth the risk? Canadian Journal of Statistics, 14, 257-262.

Ghitany, M.E. (1993). On the information matrix of exponential mixture models with long-term survivors. Biometrical Journal, 35, 15-27.

Ghitany, M.E. and Maller, R.A. (1992). Asymptotic results for exponential mixture models with long-term survivors. Statistics: A Journal of Theoretical and Applied Statistics, 23, 321-336.

Hashimoto, E.M., Ortega, E.M., Cordeiro, G.M. and Cancho, V.G. (2013). The Poisson Birnbaum-Saunders model with long-term survivors. Statistics, available online.

Ibrahim, J.G., Chen, M.H. and Sinha, D. (2001). Bayesian semiparametric models for survival data with a cure fraction. Biometrics, 57, 383-388.

Ibrahim, J.G., Chen, M.H. and Sinha, D. (2005). Bayesian Survival Analysis. John Wiley & Sons, New York.

Kim, S., Chen, M.H., Dey, D.K. and Gamerman, D. (2007). Bayesian dynamic models for survival data with a cure fraction. Lifetime Data Analysis, 13, 17-35.

Kuk, A.Y. and Chen, C.H. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79, 531-541.

Larson, M.G. and Dinse, G.E. (1985). A mixture model for the regression analysis of competing risks data. Applied Statistics, 34, 201-211.

Laska, E.M. and Meisner, M.J. (1992). Nonparametric estimation and testing in a cure model. Biometrics, 48, 1223-1234.

Lo, Y.C., Taylor, J.M., McBride, W.H. and Withers, H.R. (1993). The effect of fractionated doses of radiation on mouse spinal cord. International Journal of Radiation Oncology, Biology, Physics, 27, 309-317.

Louzada, F., de Castro, M., Tomazella, V. and Gonzales, J.F. (2014). Modeling categorical covariates for lifetime data in the presence of cure fraction by Bayesian partition structures. Journal of Applied Statistics, 41, 622-634.

Maller, R.A. and Zhou, S. (1992). Estimating the proportion of immunes in a censored sample. Biometrika, 79, 731-739.

Maller, R.A. and Zhou, X. (1996). Survival Analysis with Long-term Survivors. John Wiley & Sons, New York.

Mould, R.F. (1973). Statistical Models for Studying Long Term Survival Results following Treatment for Carcinoma of the Cervix. Ph.D. Thesis, University of London.

Ortega, E.M., Barriga, G.D., Hashimoto, E.M., Cancho, V.G. and Cordeiro, G.M. (2014). A new class of survival regression models with cure fraction. Journal of Data Science, 12, 107-136.

Peng, Y. and Dear, K.B. (2000). A nonparametric mixture model for cure rate estimation. Biometrics, 56, 237-243.

Peng, Y., Dear, K.B. and Denham, J.W. (1998). A generalized F mixture model for cure rate estimation. Statistics in Medicine, 17, 813-830.

Rodrigues, J., Cancho, V.G., de Castro, M. and Louzada-Neto, F. (2009). On the unification of long-term survival models. Statistics & Probability Letters, 79, 753-759.

Rodrigues, J., de Castro, M., Balakrishnan, N. and Cancho, V.G. (2011). Destructive weighted Poisson cure rate models. Lifetime Data Analysis, 17, 333-346.

Struthers, C.A. and Farewell, V.T. (1989). A mixture model for time to AIDS data with left truncation and an uncertain origin. Biometrika, 76, 814-817.

Sy, J.P. and Taylor, J.M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56, 227-236.

Sy, J.P. and Taylor, J.M. (2001). Standard errors for the Cox proportional hazards cure model. Mathematical and Computer Modelling, 33, 1237-1251.

Taylor, J.M. (1995). Semi-parametric estimation in failure time mixture models. Biometrics, 51, 899-907.

Tsodikov, A.D., Ibrahim, J.G. and Yakovlev, A.Y. (2003). Estimating cure rates from survival data. Journal of the American Statistical Association, 98, 1063-1078.

Yakovlev, A.Y., Asselain, B., Bardou, V.J., Fourquet, A., Hoang, T., Rochefediere, A. and Tsodikov, A.D. (1993). A simple stochastic model of tumor recurrence and its applications to data on premenopausal breast cancer. Biometrie et Analyse de Donnees Spatio-temporelles, 12, 66-82.

Yakovlev, A.Y., Cantor, A.B. and Shuster, J.J. (1994). Parametric versus non-parametric methods for estimating cure rates based on censored survival data. Statistics in Medicine, 13, 983-986.

Zhao, L., Feng, D., Bellile, E.L. and Taylor, J.M. (2014). Bayesian random threshold estimation in a Cox proportional hazards cure model. Statistics in Medicine, 33, 650-661.