Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data

Columbia International Publishing
Journal of Advanced Computing (2013) 1: 43-58
doi: 10.7726/jac.2013.1004

Research Article

Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data

Bader Ahmad I. Aljawadi 1*, Mohd Rizam A. Bakar 1, Noor Akma Ibrahim 2, and Mohamad Al-Omari 2

Received 26 November 2012; Published online 15 December 2012
The author(s) 2012. Published with open access at www.uscip.org

Abstract

A significant proportion of patients in cancer clinical trials can be cured: that is, the symptoms of the disease disappear completely and the disease never recurs. In this article the focus is on estimation of the proportion of patients who are cured. The parametric maximum likelihood estimation method was used to estimate the cure fraction, based on application of the bounded cumulative hazard (BCH) model to interval-censored data. We ran the analysis using the EM algorithm, considering two cases: i) when no covariates are involved in the estimation, and ii) when some covariates are involved. The paper derives the estimation equations for the cure rate parameter and then presents a simulation study.

AMS Subject Classification: 62N02

Keywords: Cure fraction; BCH model; Interval censoring; Covariates; MLE method; EM algorithm

1. Introduction

Most survival models assume that, with sufficient follow-up, all individuals are susceptible to the event of interest. This assumption may be violated, however, when not all individuals can experience the particular event. Such a group is usually described in the literature as non-susceptible or cured. Several decades ago survival models started to include the cured portion in the analysis, and a new class of models has hence been developed, briefly called cure models, which are broadly used in analyzing data from cancer and other disease clinical trials.

*Corresponding e-mail: Bader_Aljawadi@yahoo.com
1 Institute for Mathematical Research, Universiti Putra Malaysia
2 Department of Mathematics, Universiti Putra Malaysia and Institute for Mathematical Research, Universiti Putra Malaysia

The first cure model was developed by Boag (1949) and, three years later, modified by Berkson and Gage (1952). This type of model is known as a mixture cure rate model and can be written as follows:

S(t) = π + (1 − π) S_u(t),    (1)

where S(t) and S_u(t) are the survival functions for the entire population and for the uncured patients, respectively, and π is the cured fraction. In this model, the failure time distribution of the uncured patients can be estimated either parametrically, producing parametric survival models, or non-parametrically, producing semi-parametric survival models. In the parametric group of cure models, we assume a particular distribution for the failure time of uncured patients, such as the exponential. Elaborations on the theory and applications of mixture models can be found in Goldman (1984), Farewell (1986), Gamel et al. (1990), Kuk and Chen (1992), Taylor (1995), Peng and Dear (2000), Sy and Taylor (2000), Peng and Carriere (2003), Uddin et al. (2006a, 2006b) and Abu Bakar et al. (2009), amongst others. Although the mixture model is widely used in survival analysis, Chen et al. (1999) discussed some of its drawbacks and proposed an alternative: the bounded cumulative hazard (BCH) model, originally developed by Yakovlev et al. (1993). This model has been examined and applied in medical research, for example by Aljawadi et al. (2011, 2012), where it was employed in the estimation of the cure fraction in cancer trials under various censoring types (i.e., right, left and interval censoring), with the estimation procedure handled via both parametric and nonparametric techniques. However, covariates were not involved in the analysis in either the parametric or the nonparametric approach under interval censoring, which motivates the present research. In the BCH model it is assumed that for cancer patients a number of cancer cells (N), referred to as clonogens, remains active even after the initial treatment and that the cells grow
rapidly and replace the normal tissues. In addition, this model posits that N follows a Poisson distribution with mean θ. Thus, the survival function for the BCH model is:

S(t) = P(survive beyond time t) = exp(−θ F(t)),    (2)

where F(t) is the cumulative distribution function of the time to relapse, such that F(∞) = 1 (Yakovlev et al., 1993; Chen et al., 1999). Based on the survival function given in (2), the cure fraction π can be defined as follows:

π = lim_{t→∞} S(t) = P(N = 0) = lim_{t→∞} exp(−θ F(t)) = exp(−θ).    (3)

In the BCH model, T represents the lifetime of the individual, where in common situations this lifetime is either an exact survival time or a censoring time. Other situations, however, can occur

where the individuals are followed up after a pre-fixed time period or make periodic visits a fixed number of times. In this paper, the lifetime t_i, i = 1, …, n, is not exactly known, but it is known to fall within an interval, i.e., between two visit times l_i and r_i.

One problem associated with cure rate models is identifiability. This arises in parametric and semiparametric models, as investigated by Farewell (1986), Taylor (1995) and Peng and Dear (2000), due to the lack of information at the end of the follow-up period, since a significant proportion of subjects are censored before the end of the follow-up period. A simple working solution in these models is to assume that patients with censored times greater than the last uncensored time are cured, which is quite an arbitrary decision and lacks justification. However, since the estimated survival function under the BCH model approaches zero at the largest uncensored time, these patients are treated as cured. Therefore, decreasing the number of visit times in the case of interval survival data, or increasing the length of the follow-up intervals, might cause ambiguity about the real proportion of cured patients and make estimation of the true cure fraction a challenge. This issue is revealed in the simulation study. Furthermore, cure fraction estimates can be quite sensitive to the choice of latency distribution (the distribution of the survival times for uncured patients); the cure fraction estimate from a model with generalized gamma latency has been found to be quite robust (Yu et al., 2004). Therefore, several cautions on the general use of cure models are advised.

2. Methodology

The focus of this paper is on parametric estimation of the cure rate by means of the BCH model using interval-censored data. The analysis was carried out under two scenarios: one with covariates and one without.

2.1 Data Without Covariates

Suppose that T is a random variable with probability density function f(t; λ) to be estimated and that t_1, t_2, …, t_n is a random sample of size n. Then the joint probability density function is given by

f(t_1, t_2, …, t_n; λ) = ∏_{i=1}^{n} f(t_i; λ).    (4)

In the parametric maximum likelihood method, the cumulative distribution function and the probability density function for the entire population are known up to the unknown parameter λ. Let δ_i be an indicator of censoring, having the value zero if t_i is a censoring time and one otherwise, and let y_i be an indicator of the cure status of the patient, namely y_i = 0 if the patient is cured and y_i = 1 otherwise, for i = 1, 2, …, n. If δ_i = 1, then y_i = 1. However, if δ_i = 0, y_i is not observed and can be either one or zero. We assume throughout the paper that censoring is independent of the failure times. Given δ_i and y_i (i.e., the complete data are available), the complete log-likelihood function is:

ℓ = Σ_{i=1}^{n} { δ_i [log(1 − π) + log f_u(t_i)] + (1 − δ_i) y_i [log(1 − π) + log S_u(t_i)] + (1 − δ_i)(1 − y_i) log π },    (5)
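Equations (2), (3) and the complete-data likelihood (5) can be illustrated with a short sketch. This is our own minimal Python illustration, not the authors' R code; π = exp(−θ) throughout, the latency distribution in `bch_survival` is assumed exponential, and `f_u`, `S_u` are passed in as functions:

```python
import math

def bch_survival(t, theta, lam):
    """BCH population survival, eq. (2): S(t) = exp(-theta * F(t)),
    here with exponential latency F(t) = 1 - exp(-lam * t)."""
    return math.exp(-theta * (1.0 - math.exp(-lam * t)))

def cure_fraction(theta):
    """Cure fraction, eq. (3): pi = lim_{t -> inf} S(t) = exp(-theta)."""
    return math.exp(-theta)

def complete_loglik(theta, f_u, S_u, t, delta, y):
    """Complete-data log-likelihood, eq. (5), with pi = exp(-theta):
    events add log(1-pi) + log f_u(t); censored-but-uncured subjects add
    log(1-pi) + log S_u(t); censored-and-cured subjects add log pi."""
    pi = math.exp(-theta)
    ll = 0.0
    for ti, di, yi in zip(t, delta, y):
        if di == 1:
            ll += math.log(1 - pi) + math.log(f_u(ti))
        elif yi == 1:
            ll += math.log(1 - pi) + math.log(S_u(ti))
        else:
            ll += math.log(pi)
    return ll
```

As t grows, `bch_survival` levels off at the plateau exp(−θ), which is exactly the cured proportion.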

where f_u and S_u are the probability density function and the survival function for the uncured patients, respectively. Many common distributions can be used in parametric statistical inference with survival data, including the exponential distribution, which is the most commonly used due to its lack-of-memory property and constant failure rate (Jeevanand et al., 2008). It has the survival function S_u(t) = exp(−λt) for t ≥ 0, and therefore the probability density function f_u(t) = λ exp(−λt). In the case of interval-censored data, the likelihood contribution of an event in (l_i, r_i] takes the form P(l_i < T ≤ r_i) = F(r_i) − F(l_i) (Klein and Moeschberger, 2003). Thus, for left censoring the contribution becomes F(r_i) − F(0) = F(r_i), since S(0) = 1, whereas for right censoring it is S(l_i) = 1 − F(l_i), since F(∞) = 1. For uncensored individuals the exact time t_i is not observed, and the only information available is that it falls in an interval; we therefore estimate it by the mid-point of the observed interval. As a result, the log-likelihood function can be obtained from:

ℓ = Σ_{i=1}^{n} { δ_i [log(1 − π) + log λ − λ t_i] + (1 − δ_i) y_i [log(1 − π) − λ t_i] + (1 − δ_i)(1 − y_i) log π }.    (6)

Note that the formula of the log-likelihood in equation (6) is valid only for data in which no tied events occur, i.e., for data where no two or more events occur simultaneously. Theoretically, a tied event may occur if the event time scale is discrete and/or if the event times are grouped into intervals. In such a case, an alternative formula for the likelihood function is needed; discussion of such a formula is beyond the scope of this study but could be considered in subsequent work.

The solutions of ∂ℓ/∂θ = 0 and ∂ℓ/∂λ = 0 are the desired estimates of θ and λ. With π = exp(−θ), setting ∂ℓ/∂θ = 0 gives

θ = log [ n / Σ_{i=1}^{n} (1 − δ_i)(1 − y_i) ],    (7)

and setting ∂ℓ/∂λ = 0 gives

Σ_{i=1}^{n} δ_i (1/λ − t_i) − Σ_{i=1}^{n} (1 − δ_i) y_i t_i = 0.    (8)

As the cure status y_i is not fully observed, we need to implement the EM algorithm to estimate the desired parameters. Before implementing the EM algorithm, we have to address its main weakness, which is a slow convergence rate under a poor choice of initial values. The rate of convergence depends on the proportion of missing information in the observed data: if the portion of missing data is large, the algorithm can be quite slow. However, in this work it was not difficult to assign proper initial values for the parameters, and hence to accelerate the

convergence of the EM algorithm, since we utilized simulated data to assess the efficiency of the proposed methods. The initial values could therefore be anticipated from the censoring rate used in each generated data set. For example, for a censoring rate equal to 20%, the initial value of the parameter θ would be set close to −log(0.2), since the cure fraction is defined as π = exp(−θ). For the other parameter, λ, it was possible to pick random values. In real settings, on the other hand, initial values can be assigned in the same manner, though less precisely than in simulation: the initial value of θ would be set below −log(censoring rate), since the censoring rate is known and the cure fraction cannot exceed the censoring proportion.

2.1.1 The EM Algorithm

The EM algorithm is a very general iterative algorithm for maximum likelihood parameter estimation when some of the random variables involved are not observed, i.e., are missing or incomplete. For interval survival data given in the form (t_i, δ_i, y_i), the cure indicator y_i is partially missing and hence is handled via the EM algorithm. If the complete-data vector D = (t, δ, y) were observed, it would be of interest to compute the maximum likelihood estimate based on the log-likelihood function defined by equation (5). In the presence of missing cure status for censored individuals, however, only a function of the complete-data vector is observed. We express this by writing D as (D_obs, D_mis), where D_obs denotes the observed but incomplete data and D_mis denotes the unobserved data. Thus, the probability density function can be written as follows:

f(D; θ, λ) = f(D_obs, D_mis; θ, λ) = f(D_obs; θ, λ) f(D_mis | D_obs; θ, λ),    (9)

where f(D_obs; θ, λ) is the density of the observed data and f(D_mis | D_obs; θ, λ) is the conditional density of the missing data given the observed data. It follows that the log-likelihood function can be expressed as:

ℓ(θ, λ; D) = log f(D; θ, λ) = log f(D_obs; θ, λ) + log f(D_mis | D_obs; θ, λ).    (10)

Suppose that we have n individuals, where for i = 1, …, m both t_i and y_i are observed, with δ_i = 1, while for i = m + 1, …, n only t_i is observed, with δ_i = 0, and y_i is not observed and therefore needs to be estimated (i.e., D_obs = {(l_i, r_i], δ_i, y_i : i = 1, …, m} ∪ {(l_i, r_i], δ_i : i = m + 1, …, n} and D_mis = {y_i : i = m + 1, …, n}). Since y is not completely observed, ℓ cannot be evaluated and hence cannot be maximized. The EM algorithm attempts to maximize ℓ iteratively by replacing it with its conditional expectation given the observed data. Thus, the E-step of the EM algorithm calculates the expectation of the log-likelihood function defined by equation (6) for given values of θ and λ and the observed data, such that:

E[ℓ | D_obs] = E[ℓ(θ, λ) | t_i, δ_i, y_i, i = 1, …, m] + E[ℓ(θ, λ) | t_i, δ_i, i = m + 1, …, n],    (11)

where the first (fully observed) part is

Σ_{i=1}^{m} [log(1 − π) + log f_u(t_i)],

and the second (censored) part is

Σ_{i=m+1}^{n} { y_i [log(1 − π) + log S_u(t_i)] + (1 − y_i) log π }.

This expression demonstrates that the expected value of the log-likelihood function cannot be calculated unless the expectations of y_i, 1 − y_i and y_i log S_u(t_i) are computed, since the cure status of the n − m censored individuals is not provided. These three quantities are collectively the sufficient statistics, and an estimate of the expected value of each is necessary. So, let

A = Σ_{i=m+1}^{n} E[y_i]  and  B = Σ_{i=m+1}^{n} E[y_i log S_u(t_i)].    (12)

Peng and Dear (2000) defined w_i as the expected value of y_i, i.e., the probability that the i-th patient is uncured, given the current estimates of π and of the survival function of the uncured patients, S_u. Accordingly, w_i will be used throughout this study to represent the expected value of the cure status y_i for the i-th censored individual:

w_i = E[y_i | t_i, δ_i = 0] = (1 − π) S_u(t_i) / [π + (1 − π) S_u(t_i)],  i = m + 1, …, n.    (13)

For simplicity, let c_i = 1 − w_i, i = m + 1, …, n, represent the expected value of 1 − y_i, i.e., the probability that the i-th censored patient is cured, such that

c_i = 1 − w_i = 1 − (1 − π) S_u(t_i) / [π + (1 − π) S_u(t_i)] = π / [π + (1 − π) S_u(t_i)].    (14)

Then, the formulae used to calculate the sufficient statistics can be rewritten as:

A = Σ_{i=m+1}^{n} w_i = Σ_{i=m+1}^{n} (1 − c_i)

and

B = Σ_{i=m+1}^{n} (1 − c_i) log S_u(t_i).    (15)

Given the sufficient statistics, the maximum likelihood estimates of the parameters given by equations (7) and (8) can be written as follows:

θ = log [ n / Σ_{i=m+1}^{n} (1 − w_i) ] = log [ n / Σ_{i=m+1}^{n} c_i ],    (16)

and

Σ_{i=1}^{m} (1/λ − t_i) − Σ_{i=m+1}^{n} w_i t_i = 0.    (17)

In the M-step of the EM algorithm we solve for θ and λ: for some initial values assigned to (θ, λ), θ̂ is the solution of equation (16) and λ̂ is the numerical solution of equation (17), and the sequence of steps is repeated until convergence.

2.2 Datasets with Covariates

Individual databases of cancer clinical trials contain information that may affect the event time distribution. This information can be managed through baseline variables called covariates, such as gender, type of treatment, year of cancer diagnosis, etc. The covariates of the individuals in these studies are valuable for assessment of the survival function and therefore of the cure fraction. So, this part of the study focuses on estimation of the cure fraction when covariates are involved in the analysis. It is assumed that the cure fraction does not depend on the observed covariates, which are hence modeled via the scale parameter of the latency distribution only, even though it would be more realistic to incorporate the covariates via the cure rate parameter too. When covariates are involved in the analysis, the scale parameter of the exponential distribution given these covariates can be expressed as:

λ_i = exp(x_i' β),

where x_i and β are the vectors of covariates and unknown coefficients, respectively. Therefore, the log-likelihood function given in (6) becomes:

ℓ = Σ_{i=1}^{n} { δ_i [log(1 − π) + x_i' β − exp(x_i' β) t_i] + (1 − δ_i) y_i [log(1 − π) − exp(x_i' β) t_i] + (1 − δ_i)(1 − y_i) log π }.    (18)
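The E-step (13)-(14) and the M-step updates (16)-(17) for the no-covariate case can be combined into the following sketch. This is our own Python re-implementation under the assumptions above, not the authors' R code; it assumes at least one censored subject, and with the midpoint approximation the λ-update takes the closed form below, whereas the paper describes a numerical solution of equation (17):

```python
import math

def em_bch_exponential(t_event, t_cens, theta0, lam0, tol=1e-8, max_iter=1000):
    """EM estimation of (theta, lam) for the BCH model with exponential
    latency (Section 2.1.1). t_event: interval midpoints of observed events;
    t_cens: censoring times (non-empty). Returns (theta, lam, cure fraction)."""
    n = len(t_event) + len(t_cens)
    theta, lam = theta0, lam0
    for _ in range(max_iter):
        pi = math.exp(-theta)
        # E-step, eqs. (13)-(14): posterior uncured probability per censored subject
        w = [(1 - pi) * math.exp(-lam * t) / (pi + (1 - pi) * math.exp(-lam * t))
             for t in t_cens]
        # M-step, eq. (16): theta = log(n / expected number of cured subjects)
        theta_new = math.log(n / sum(1 - wi for wi in w))
        # M-step, eq. (17): events divided by observed plus expected uncured exposure
        lam_new = len(t_event) / (sum(t_event) +
                                  sum(wi * t for wi, t in zip(w, t_cens)))
        if abs(theta_new - theta) + abs(lam_new - lam) < tol:
            theta, lam = theta_new, lam_new
            break
        theta, lam = theta_new, lam_new
    return theta, lam, math.exp(-theta)
```

Following the initial-value suggestion above, for a 20% censoring rate one would start with theta0 close to −log(0.2) ≈ 1.61.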

where the solutions of the equations ∂ℓ/∂θ = 0, ∂ℓ/∂β_1 = 0 and ∂ℓ/∂β_2 = 0 are the desired estimates of θ, β_1 and β_2, such that

[Σ_{i=1}^{n} (δ_i + (1 − δ_i) y_i)] e^{−θ} / (1 − e^{−θ}) − Σ_{i=1}^{n} (1 − δ_i)(1 − y_i) = 0,    (19)

which again gives θ = log [ n / Σ_{i=1}^{n} (1 − δ_i)(1 − y_i) ], and

∂ℓ/∂β_1 = Σ_{i=1}^{n} δ_i x_{i1} [1 − exp(x_i' β) t_i] − Σ_{i=1}^{n} (1 − δ_i) y_i x_{i1} exp(x_i' β) t_i = 0,

∂ℓ/∂β_2 = Σ_{i=1}^{n} δ_i x_{i2} [1 − exp(x_i' β) t_i] − Σ_{i=1}^{n} (1 − δ_i) y_i x_{i2} exp(x_i' β) t_i = 0,    (20)

where λ_i = exp(x_i' β). The system of non-linear equations in (20) can be simplified and re-written as:

Σ_{i=1}^{n} x_{i1} [δ_i − (δ_i + (1 − δ_i) y_i) λ_i t_i] = 0,    (21)

Σ_{i=1}^{n} x_{i2} [δ_i − (δ_i + (1 − δ_i) y_i) λ_i t_i] = 0,    (22)

where λ_i = exp(x_i' β). Since the cure status y_i is not fully observed, we employ the EM algorithm to estimate the parameters of interest.

2.2.1 The EM Algorithm

In the presence of covariates, the data vector takes the form ((l_i, r_i], δ_i, y_i, x_i). Based on the data partition defined in Section 2.1.1, the only unobserved data are the cure statuses y_i for i = m + 1, …, n.
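The system (21)-(22) has no closed form and is typically solved numerically. A Newton-Raphson sketch for two covariates might look as follows (our own illustration with hypothetical helper names, not the paper's implementation; a_i = δ_i + (1 − δ_i)w_i plays the role of the, possibly expected, cure status weight):

```python
import math

def newton_raphson_beta(x, t, delta, w, beta0, tol=1e-10, max_iter=100):
    """Newton-Raphson solution of the score system (21)-(22) for two covariates:
    U_j(beta) = sum_i x_ij * (delta_i - a_i * exp(x_i'beta) * t_i) = 0,
    with a_i = delta_i + (1 - delta_i) * w_i (w_i supplied by the E-step)."""
    b1, b2 = beta0
    for _ in range(max_iter):
        u1 = u2 = j11 = j12 = j22 = 0.0
        for (x1, x2), ti, di, wi in zip(x, t, delta, w):
            ai = di + (1 - di) * wi
            h = ai * math.exp(b1 * x1 + b2 * x2) * ti   # a_i * lambda_i * t_i
            u1 += x1 * (di - h)                          # score components
            u2 += x2 * (di - h)
            j11 -= x1 * x1 * h                           # symmetric Jacobian
            j12 -= x1 * x2 * h
            j22 -= x2 * x2 * h
        det = j11 * j22 - j12 * j12
        step1 = (j22 * u1 - j12 * u2) / det              # solve J * step = U
        step2 = (j11 * u2 - j12 * u1) / det
        b1, b2 = b1 - step1, b2 - step2
        if abs(step1) + abs(step2) < tol:
            break
    return b1, b2
```

Because the underlying log-likelihood is concave in β (the −exp(x'β)t_i terms), plain Newton steps from β = 0 converge quickly for a non-degenerate design.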

For the E-step, the expected log-likelihood corresponding to equation (18) can be obtained from:

E[ℓ | D_obs] = E[ℓ(θ, β) | t_i, δ_i, y_i, x_i, i = 1, …, m] + E[ℓ(θ, β) | t_i, δ_i, x_i, i = m + 1, …, n],    (23)

such that the first part is

Σ_{i=1}^{m} [log(1 − π) + x_i' β − exp(x_i' β) t_i],    (24)

and the second part is

Σ_{i=m+1}^{n} { y_i [log(1 − π) − exp(x_i' β) t_i] + (1 − y_i) log π },    (25)

where y_i, 1 − y_i and y_i log S_u(t_i | x_i) are the sufficient statistics. Let c_i = 1 − w_i and B = Σ_{i=m+1}^{n} E[y_i log S_u(t_i | x_i)]. Based on the definition of w_i given in Section 2.1.1, c_i can in this case be estimated using the equation:

c_i = 1 − w_i = π / [π + (1 − π) exp(−exp(x_i' β) t_i)].    (26)

Furthermore, the sufficient statistics can be re-written as follows:

A = Σ_{i=m+1}^{n} w_i = Σ_{i=m+1}^{n} (1 − c_i)  and  B = −Σ_{i=m+1}^{n} (1 − c_i) exp(x_i' β) t_i.    (27)

In light of this, the maximum likelihood estimate of θ given by equation (7) simplifies to:

θ = log [ n / Σ_{i=m+1}^{n} c_i ].    (28)

For the M-step, a pre-determined initial value for θ can be used to solve the system of non-linear equations given in (21)-(22) with respect to β, using an appropriate numerical approach such as the Newton-Raphson method. The resulting values of β, together with the initial value of θ, can then be used to compute the sufficient statistics, and equation (28) can be solved to obtain a new value of θ. The whole sequence of steps is then repeated until convergence.

3. Simulation and Results

Simulation studies based on interval-censored data involve further steps in comparison with the other types of censoring. In this study we were particularly interested in varying the censoring rate, as this enables identification of the pattern that the cure rate estimates follow. So, for the sake of flexibility in examining how the cure rate estimates behave as the censoring rate increases both slowly and rapidly, we considered several scale parameters of the exponential distribution, determined on the basis of the criteria defined below. Each data set comprised 100 interval observations, with a range of censoring rates depending on the possible values of λ. Here we ignored the left-censoring case and took the right-censoring rate to represent the entire censoring rate, which also represents the real fraction of cured individuals as a special case. To control the generation process, we assumed that the true survival time T followed an exponential distribution. The steps used for data generation were as follows:

1. Generation of the covariates from a binomial distribution with pre-assigned probabilities, where only two covariates were considered: gender, derived from a binomial distribution with probability 0.5, and type of treatment (i.e., chemotherapy or radiotherapy), with probability 0.5 too.

2. Generation of T from an exponential distribution with different scale parameter values. In the case with covariates, the values of the scale parameter were assigned on the basis of the link function defined in Section 2.2, λ = exp(x' β), where the covariate coefficients β_1 and β_2 were set to 0.1 and 0.1, respectively, so as to produce a scale parameter within the interval (0.1, 2] that yields a range of censoring rates in the subsequent steps of data generation. The same values of the scale parameter were used to generate the lifetimes when the covariates were excluded.

3. Creation of a vector of clinic visits, assuming that 20 clinic visits are possible. The first visit time, v_1, was generated from a uniform distribution U(0, 0.1); the next visit, v_2, was generated from U(v_1, v_1 + 0.1), and the subsequent visit times were generated in the same manner.

4. Production of a 100 × 2 matrix named bound for each data set. The entries of the bound matrix are the interval endpoints for each individual, obtained by comparing the true survival time with the 20 visit times. In the case of right censoring, the right end point can

be assigned in such a way as to be a large number beyond the last visit time. The formulae used for end point determination were, for i = 1, …, 100 and j = 1, …, 20:

bound[i, 1] = 0 if t_i < v_1;  v_j if v_j < t_i < v_{j+1};  v_20 if t_i > v_20,    (29)

bound[i, 2] = v_1 if t_i < v_1;  v_{j+1} if v_j < t_i < v_{j+1};  100 if t_i > v_20.    (30)

5. Construction of a 100 × 2 matrix named status. Based on the bound matrix, let:

status[i, 1] (censoring indicator δ_i) = 0 if bound[i, 2] = 100, and 1 otherwise,

status[i, 2] (cured indicator y_i) = 0 if status[i, 1] = 0, and 1 otherwise.

In this simulation we were interested in the bias, which indicates the performance of the methods under evaluation, and in the mean square error (MSE). Bias is the deviation of an estimate from the true quantity, while the MSE provides a useful measure of overall accuracy, as it incorporates both bias and variability. Using standard notation for a scalar parameter θ, the bias and MSE can be expressed as follows:

Bias(θ̂) = E(θ̂) − θ  and  MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + [Bias(θ̂)]²,    (31)

where θ̂ is the maximum likelihood estimate of θ. In light of this, the simulation was carried out using the R statistical software to generate raw and bootstrapped data, and a part of the final results is presented in Figure 3.1.
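The five generation steps can be sketched end to end as follows. This is our simplified Python re-implementation of the scheme (the paper used R), and `simulate_intervals` with its right-endpoint sentinel of 100 mirrors the bound-matrix construction above:

```python
import random

def simulate_intervals(n=100, lam=0.5, n_visits=20, seed=1, sentinel=100.0):
    """Sketch of the Section 3 generation scheme: visit 1 ~ U(0, 0.1),
    visit j ~ U(v_{j-1}, v_{j-1} + 0.1), true time T ~ Exp(lam), and the
    interval endpoints found by bracketing T among the visits (eqs. (29)-(30))."""
    rng = random.Random(seed)
    bound, status = [], []
    for _ in range(n):
        visits = [rng.uniform(0.0, 0.1)]
        for _ in range(n_visits - 1):
            visits.append(rng.uniform(visits[-1], visits[-1] + 0.1))
        t = rng.expovariate(lam)                     # true survival time
        if t <= visits[0]:                           # left-censored
            left, right = 0.0, visits[0]
        elif t > visits[-1]:                         # right-censored
            left, right = visits[-1], sentinel
        else:                                        # interval-censored event
            j = max(k for k in range(n_visits - 1) if visits[k] < t)
            left, right = visits[j], visits[j + 1]
        bound.append((left, right))
        status.append(1 if right < sentinel else 0)  # censoring indicator
    return bound, status
```

With λ = 0.5 and the 20 visits spanning roughly (0, 1), a large share of the true times falls beyond the last visit, which is the heavy right-censoring regime the bias plots explore.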

[Figure: bias (0%-14%) plotted against censoring rate (15%-54%) for the two scenarios, with and without covariates.]

Fig. 3.1. Censoring rate versus bias based on the two scenarios of parametric estimation of the cure fraction.

The previous considerations suggest the importance of introducing distinct assumptions into the data sets generated from the proposed distributions, in order to examine the performance of the maximum likelihood estimators derived under the BCH model. It is also appropriate to consider departures from the idealized assumptions and to study the behavior of the estimators under different conditions and circumstances. This was done by:

1) Changing the number of clinic visits. Different numbers of clinic visits lead to different numbers of censored individuals between each pair of visits, which in turn imply different parametric maximum likelihood estimates. Therefore, the vector of clinic visits was re-created under the assumption that only 15 clinic visits are possible, with the visit times generated in the same manner defined above. A part of the results is presented in Figure 3.2.

[Figure: bias (0%-28%) plotted against censoring rate (35%-68%) for the two scenarios, with and without covariates.]

Fig. 3.2. Censoring rate versus bias based on the 15-clinic-visits scenario.

2) Estimating the cure fraction, when covariates are involved in the analysis, under different joint regression models. A common joint model between the scale parameter of the exponential distribution and the covariates was employed to regularize the data generation. In practice, the true relationship between the parameters and the covariates is almost never known, and many other joint models of the covariates could be employed. As an alternative, we present another joint model of the covariates to describe how the estimation procedure behaves under various link functions. Within this context, the effect of accommodating covariates in the scale parameter of the exponential distribution can be described by the following model:

λ = 1 / [1 + exp(−x' β)].

Following the data generation steps defined above, this alternative link function produced a limited range of scale parameter values and hence inflexible censoring rates. In particular, accommodating gender and type of treatment in the proposed joint model yielded scale parameter values oscillating between zero and one (i.e., 0 < λ < 1) and hence resulted in high censoring rates (greater than 35%). This consequently produced defective cure fractions, i.e., cure fractions characterized by high bias and MSE values. Moreover, employing this joint model required intensive computation for the cure fraction due to the inconvenient parameter estimation.
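The two joint models can be written side by side (function names are ours); the logistic variant is confined to (0, 1), which is what drives the inflated censoring rates just described:

```python
import math

def exp_link(x, beta):
    """Section 2.2 joint model: lambda = exp(x' beta), any positive value."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

def logistic_link(x, beta):
    """Alternative joint model of this subsection:
    lambda = 1 / (1 + exp(-x' beta)), always inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-sum(xi * bi for xi, bi in zip(x, beta))))
```

Since a smaller exponential rate means longer lifetimes, capping λ below one pushes more true survival times beyond the last clinic visit and hence raises the right-censoring rate.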

4 Discussion

The figures and results above present the performance of parametric estimation of the cure fraction both when covariates were included in the analysis and when they were not. The bias and MSE values across the various censoring rates indicate that the proposed method of cure rate estimation was, in both cases, more efficient when the censoring rate was low than when it was high, and that the estimates started to diverge under heavy censoring. A large number of censored individuals decreases the equivalent number of subjects exposed (at risk), making the cure rate estimates less reliable than they would be for the same number of subjects with less censoring. Moreover, under heavy censoring the estimated cure fraction is nothing more than a poor approximation, since the sufficient statistics become considerably unreliable as a result of the extra error introduced into the estimation procedure by the overuse of the cure probability estimator defined by Equation 14. Hence, increasing the proportion of censored data distorts the estimated parameters, and vice versa.

Furthermore, cure rate estimation was more efficient when the covariates were considered in the analysis than when no covariates were incorporated in the estimation procedure. Consequently, it is essential in this situation to develop parametric estimation of the cure fraction with covariates incorporated in the analysis. Since the covariates that arise in many studies, especially clinical trials, are secondary variables that provide extra information, their inclusion in this study improved the estimates of the parameters of primary interest, especially the cure fraction.

Moreover, it should be emphasized that the proposed estimation procedure is very sensitive to the follow-up period of the cancer patients involved in the clinical trial. Insufficient follow-up may distort the estimates of the parameters of interest. Additionally, the estimated cure fraction may not be reliable when it is based on insufficient clinical follow-up data, since cancer care involves regular medical checkups, including a review of the patient's medical history and a physical examination, which may help in further prevention of the incidence of cancer and in monitoring those patients who may be at risk. In short-period trials, however, a high proportion of the patients are missing, i.e., they are censored: they are no longer considered at risk, and the researcher is not in a position to record the precise time at which the event of concern occurs. This situation is easy to detect in the bias behavior in Figure 32, where the bias pertaining to the very high censoring rates exceeds the common level, an occurrence that can result from short-period trials.

5 Conclusion

We have shown that parametric estimation based on the BCH model leads to better inferential performance (less bias and lower MSE) for the cure fraction in the interval-censoring case when the censoring rate is low than when it is high. The work examined two scenarios: one in which covariates were not included in the estimation procedures, and another in which some covariates were involved. Our results demonstrate that cure fraction estimation based on the proposed procedure was more realistic when covariates were involved in the estimation procedures than when they were not. The R codes used to perform the computations are available upon request.
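The follow-up effect discussed above can be illustrated with a small Monte Carlo sketch. The code below is a deliberately simplified stand-in for the study's procedure, written in Python rather than R: a Boag-type exponential mixture cure model with all censoring occurring at the end of follow-up (rather than interval censoring), fitted by a crude grid-search maximum likelihood. The true parameter values, sample size, and grid resolution are illustrative assumptions.

```python
import math
import random

def simulate(pi_cure, lam, tau, n, rng):
    """Mixture cure sample: cured subjects never fail and are censored at
    the end of follow-up tau; susceptibles fail at exponential(lam) times
    and are censored when the failure falls beyond tau."""
    data = []
    for _ in range(n):
        if rng.random() < pi_cure:
            data.append((tau, 0))                      # cured -> censored
        else:
            t = rng.expovariate(lam)
            data.append((t, 1) if t <= tau else (tau, 0))
    return data

def loglik(pi_c, lam, n_event, sum_t, n_cens, tau):
    """Log-likelihood of the exponential mixture cure model in terms of
    the sufficient statistics: events contribute (1-pi) * lam * exp(-lam t),
    censored subjects contribute pi + (1-pi) * exp(-lam tau)."""
    return (n_event * math.log((1 - pi_c) * lam) - lam * sum_t
            + n_cens * math.log(pi_c + (1 - pi_c) * math.exp(-lam * tau)))

def mle_grid(data, tau):
    """Crude grid-search MLE over (cure fraction, rate); a stand-in for
    the Newton/EM machinery used in the actual study."""
    n_event = sum(d for _, d in data)
    sum_t = sum(t for t, d in data if d)
    n_cens = len(data) - n_event
    grid = [0.01 + 0.02 * i for i in range(50)]        # 0.01 .. 0.99
    return max(((p, l) for p in grid for l in grid),
               key=lambda pl: loglik(pl[0], pl[1], n_event, sum_t, n_cens, tau))

rng = random.Random(7)
true_pi, true_lam = 0.30, 0.50
bias = {}
for tau in (10.0, 2.0):    # long vs short follow-up (heavier censoring)
    est = [mle_grid(simulate(true_pi, true_lam, tau, 300, rng), tau)[0]
           for _ in range(20)]
    bias[tau] = sum(est) / len(est) - true_pi
    print(f"tau={tau}: mean bias of cure-fraction estimate = {bias[tau]:+.3f}")
```

In this simplified setting the long follow-up, with only about 30% censoring, keeps the average bias of the estimated cure fraction close to zero, while the short follow-up (over 50% censoring) tends to inflate its variability and bias, mirroring the pattern in Figure 32.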

Acknowledgment

The authors would like to express their sincere thanks to the University Research Board of University Putra Malaysia (UPM) for their generous support of this study.

References

Abu Bakar, M. R., Salah, K. A., Ibrahim, N. A., Haron, K., 2009. Bayesian Approach for Joint Longitudinal and Time-to-Event Data with Survival Fraction. Bull. Malays. Math. Sci. Soc. 32, 75-100.

Aljawadi, B. A., Bakar, M. R., Ibrahim, N. A., and Midi, H., 2011. Parametric estimation of the immunes based on BCH model and exponential distribution using left censored data. Journal of Applied Sciences 11(15), 2861-2865. http://dx.doi.org/10.3923/jas.2011.2861.2865

Aljawadi, B. A., Bakar, M. R., Ibrahim, N. A., and Midi, H., 2011. Parametric estimation of the cure fraction based on BCH model using left censored data with covariates. Modern Applied Science Journal 5(3), 103-110.

Aljawadi, B. A., Bakar, M. R., and Ibrahim, N. A., 2012. Parametric versus nonparametric estimation of the cure fraction using interval censored data. Communications in Statistics - Theory and Methods 41, 4251-4275. http://dx.doi.org/10.1080/03610926.2011.569678

Berkson, J., Gage, R. P., 1952. Survival curves for cancer patients following treatment. Journal of the American Statistical Association 47, 501-515. http://dx.doi.org/10.1080/01621459.1952.10501187

Boag, J. W., 1949. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. Journal of the Royal Statistical Society 11, 15-44.

Burton, A., Altman, D. G., Royston, P., Holder, R. L., 2006. The design of simulation studies in medical statistics. Statistics in Medicine 25, 4279-4292. http://dx.doi.org/10.1002/sim.2673 PMid:16947139

Chen, M. H., Ibrahim, J. G., Sinha, D., 1999. A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association 94, 909-919. http://dx.doi.org/10.1080/01621459.1999.10474196

Davidson, R., 1997. Bootstrap Confidence Intervals Based on Inverting Hypothesis Tests. Unpublished manuscript.

Farewell, V. T., 1986. Mixture models in survival analysis: Are they worth the risk? The Canadian Journal of Statistics 14, 257-262. http://dx.doi.org/10.2307/3314804

Gamel, J. W., McLean, I. W., Rosenberg, S. H., 1990. Proportion cured and mean log survival time as functions of tumor size. Statistics in Medicine 9, 999-1006. http://dx.doi.org/10.1002/sim.4780090814 PMid:2218200

Goldman, A. I., 1984. Survivorship analysis when cure is a possibility: A Monte Carlo study. Statistics in Medicine 3, 153-163. http://dx.doi.org/10.1002/sim.4780030208 PMid:6463452

Jeevanand, E. S., Alice, P. M., and Hitha, N., 2008. Semi-parametric Estimation of P_{X,Y}(X > Y). Economic Quality Control 2, 171-180.

Klein, J. P., Moeschberger, M. L., 2003. Survival Analysis: Techniques for Censored and Truncated Data, Second Edition. Springer, New York.

Kuk, A. Y. C., Chen, C. H., 1992. A mixture model combining logistic regression with proportional hazards regression. Biometrika 79, 531-541. http://dx.doi.org/10.1093/biomet/79.3.531

Peng, Y., Dear, K. B. G., 2000. A nonparametric mixture model for cure rate estimation. Biometrics 56, 237-243. http://dx.doi.org/10.1111/j.0006-341X.2000.00237.x PMid:10783801

Peng, Y., Carriere, K. C., 2002. An Empirical Comparison of Parametric and Semiparametric Cure Models. Biometrical Journal 44, 1002-1014.

Sy, J. P., Taylor, J. M., 2000. Estimation in a Cox proportional hazards cure model. Biometrics 56, 227-236. http://dx.doi.org/10.1111/j.0006-341X.2000.00227.x

Uddin, M., Islam, M. N., Ibrahim, Q. I., 2006a. An Analytical Approach on Cure Rate Estimation Based on Uncensored Data. Journal of Applied Sciences 6(3), 548-552. http://dx.doi.org/10.3923/jas.2006.548.552

Uddin, M., Sen, A., Noor, M. S., Islam, M. N., Chowdhury, Z. I., 2006b. An Analytical Approach on Non-parametric Estimation of Cure Rate Based on Uncensored Data. Journal of Applied Sciences 6, 1258-1264. http://dx.doi.org/10.3923/jas.2006.1258.1264

Taylor, J. M. G., 1995. Semi-parametric estimation in failure time mixture models. Biometrics 51, 237-243.

Yakovlev, A. Y., Asselain, B., Bardou, V. J., Fourquet, A., Hoang, T., Rochefediere, A., and Tsodikov, A. D., 1993. A Simple Stochastic Model of Tumor Recurrence and Its Applications to Data on Premenopausal Breast Cancer. In: Biométrie et Analyse de Données Spatio-Temporelles, 12 (Eds. B. Asselain, M. Boniface, C. Duby, C. Lopez, J. P. Masson, and J. Tranchefort). Société Française de Biométrie, ENSA Rennes, France, 66-82.

Yu, B., Tiwari, R. C., and Cronin, K. Z., 2004. Cure fraction estimation from the mixture cure models for grouped survival times. Statistics in Medicine 23, 1733-1747. http://dx.doi.org/10.1002/sim.1774 PMid:15160405