Promotion Time Cure Rate Model with Random Effects: an Application to a Multicentre Clinical Trial of Carcinoma

Similar documents
Variations. ECE 6540, Lecture 02 Multivariate Random Variables & Linear Algebra

Analysis of Survival Data Using Cox Model (Continuous Type)

Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data

General Strong Polarization

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher

Extreme value statistics: from one dimension to many. Lecture 1: one dimension Lecture 2: many dimensions

Longitudinal + Reliability = Joint Modeling

Estimate by the L 2 Norm of a Parameter Poisson Intensity Discontinuous

Support Vector Machines. CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Inferring the origin of an epidemic with a dynamic message-passing algorithm

Generalized Linear Mixed Models for Longitudinal Data with Missing Values: A Monte Carlo EM Approach

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Non-Parametric Weighted Tests for Change in Distribution Function

General Strong Polarization

Indirect estimation of a simultaneous limited dependent variable model for patient costs and outcome

A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa

Lecture 3 Optimization methods for econometrics models

Statistical Learning with the Lasso, spring The Lasso

Assessing the effect of a partly unobserved, exogenous, binary time-dependent covariate on -APPENDIX-

Lecture 6. Notes on Linear Algebra. Perceptron

Edinburgh Research Explorer

Passing-Bablok Regression for Method Comparison

Survival Distributions, Hazard Functions, Cumulative Hazards

A Posteriori Error Estimates For Discontinuous Galerkin Methods Using Non-polynomial Basis Functions

Power and Sample Size Calculations with the Additive Hazards Model

CHAPTER 5 Wave Properties of Matter and Quantum Mechanics I

Radial Basis Function (RBF) Networks

Analysis of Cure Rate Survival Data Under Proportional Odds Model

A Step Towards the Cognitive Radar: Target Detection under Nonstationary Clutter

(1) Introduction: a new basis set

Semi-parametric Inference for Cure Rate Models 1

CS249: ADVANCED DATA MINING

ECE 6540, Lecture 06 Sufficient Statistics & Complete Statistics Variations

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

3003 Cure. F. P. Treasure

Dan Roth 461C, 3401 Walnut

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Comparing Different Estimators of Reliability Function for Proposed Probability Distribution

Meei Pyng Ng 1 and Ray Watson 1

General Linear Models. with General Linear Hypothesis Tests and Likelihood Ratio Tests

7.3 The Jacobi and Gauss-Seidel Iterative Methods

Tests for the Odds Ratio of Two Proportions in a 2x2 Cross-Over Design

CSC 578 Neural Networks and Deep Learning

TECHNICAL NOTE AUTOMATIC GENERATION OF POINT SPRING SUPPORTS BASED ON DEFINED SOIL PROFILES AND COLUMN-FOOTING PROPERTIES

Multivariate Survival Analysis

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

On The Cauchy Problem For Some Parabolic Fractional Partial Differential Equations With Time Delays

Gradient expansion formalism for generic spin torques

49th European Organization for Quality Congress. Topic: Quality Improvement. Service Reliability in Electrical Distribution Networks

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X

Group-Sequential Tests for One Proportion in a Fleming Design

Solving Fuzzy Nonlinear Equations by a General Iterative Method

Lecture 3. Truncation, length-bias and prevalence sampling

Introduction to time series econometrics and VARs. Tom Holden PhD Macroeconomics, Semester 2

Module 7 (Lecture 25) RETAINING WALLS

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Lecture No. 1 Introduction to Method of Weighted Residuals. Solve the differential equation L (u) = p(x) in V where L is a differential operator

STAT331. Cox s Proportional Hazards Model

Quantum state measurement

Thick Shell Element Form 5 in LS-DYNA

Technical Report - 7/87 AN APPLICATION OF COX REGRESSION MODEL TO THE ANALYSIS OF GROUPED PULMONARY TUBERCULOSIS SURVIVAL DATA

Joint Modeling of Longitudinal Item Response Data and Survival

Multi-state Models: An Overview

International Journal of Scientific & Engineering Research, Volume 7, Issue 8, August ISSN MM Double Exponential Distribution

Comparison of Stochastic Soybean Yield Response Functions to Phosphorus Fertilizer

Independent Component Analysis and FastICA. Copyright Changwei Xiong June last update: July 7, 2016

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000)

Haar Basis Wavelets and Morlet Wavelets

SECTION 8: ROOT-LOCUS ANALYSIS. ESE 499 Feedback Control Systems

A linear operator of a new class of multivalent harmonic functions

Approximate Second Order Algorithms. Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo

Lecture 11. Kernel Methods

4. Comparison of Two (K) Samples

Modeling of a non-physical fish barrier

Math 171 Spring 2017 Final Exam. Problem Worth

FAILURE-TIME WITH DELAYED ONSET

Analytics Software. Beyond deterministic chain ladder reserves. Neil Covington Director of Solutions Management GI

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

Systems of Linear Equations

Chapter 5: Spectral Domain From: The Handbook of Spatial Statistics. Dr. Montserrat Fuentes and Dr. Brian Reich Prepared by: Amanda Bell

Spatial data analysis

Adaptive Prediction of Event Times in Clinical Trials

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes

Lecture 3. Linear Regression

Module 7 (Lecture 27) RETAINING WALLS

Hybrid Extended Kalman Filtering and Noise Statistics Optimization for Produce Wash State Estimation

Quantum Mechanics. An essential theory to understand properties of matter and light. Chemical Electronic Magnetic Thermal Optical Etc.

MAS3301 / MAS8311 Biostatistics Part II: Survival

Expectation Propagation performs smooth gradient descent GUILLAUME DEHAENE

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Survival Analysis Math 434 Fall 2011

Statistical Methods for Alzheimer s Disease Studies

Heteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

arxiv: v2 [stat.me] 19 Sep 2015

General Strong Polarization

One-stage dose-response meta-analysis

Multistate models and recurrent event models

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Extensions of Cox Model for Non-Proportional Hazards Purpose

Transcription:

www.srl-journal.org Statistics Research Letters (SRL) Volume Issue, May 013 Promotion Time Cure Rate Model with Random Effects: an Application to a Multicentre Clinical Trial of Carcinoma Diego I. Gallardo *1, Heleno Bolfarine, Antonio C. Pedroso-de-Lima 3 Instituto de Matemática e Estatística Universidade de Sao Paulo, Brazil * diegog@ime.usp.br; hbolfar@gmail.com; 3 acarlos@ime.usp.br Abstract This paper extends the cure rate model considered in Lopes and Bolfarine (01) by taking into account random effects in both, the probability of cure and the survival function for individuals that are at risk. The model is parametrized in terms of the cured fraction which is then linked to covariates. The estimation is based on the restricted maximum likelihood (REML) approach proposed in McGilchrist and Yau (1995). Simulation studies are performed and results based on a real data set are presented indicating good performance of the proposed approach. Keywords Promotion Time Cure Rate Model; Random Effects; REML Introduction An implicit assumption with the ordinary survival models is that all individuals under study are susceptible to the event of interest, which is not always in the case given the improvement in disease treatments experienced in the last decades. For some types of cancer, for example, great improvement has increased the probability that an individual is considered with the disease under control (usually denoted by a cure). The amount of cured individuals after a treatment is typically known as the cure fraction. An indication of the presence of cured individuals in a specific treatment is given by the value that the Kaplan-Meier estimator takes at the maximum survival time. A formal test for the presence of a cure fraction using the Kaplan-Meier estimator can be seen in Maller and Zhou (1996). Berkson and Gage (195) developed a model that came to be known in the literature as the mixture model for cure fraction where there is a proportion pp of susceptible individuals and, hence, a proportion 1 pp of cured individuals. This model has been studied by several authors including Goldman (1984), Sposto et al. (199) and Mizoi et al. (007), among others. An alternative route was pursued by Yakolev and Tsodikov (1996) and Chen et al. (1999). Their approach is based on the assumption that each individual has a fixed unobserved (latent) number NN of factors, each capable of triggering the event of interest. This model, presented in Section, known in the literature as the promotion time cure rate model has been the subject of intense research activity. Further references can be found in Lopes and Bolfarine (01). Some literature connecting and unifying both, the mixture and promotion time models are Yin and Ibrahim (005) and Rodrigues et al. (009) This paper aims to consider situations where the collected data come from several clusters and are obtained using repeated measures. For example, in multicenter clinical trials, patients from a given clinic (cluster) can be affected by unobserved variables specific to that particular center. Therefore, it seems appropriate to include in the model random effects to take into account the dependence generated in patients treated in a same clinic. Yau and Ng (001), working with a mixture model incorporates random effects in both, the cure fraction (VV jj ) and on the survival function of the susceptible individuals (UU jj ), assuming that the random effects are normally and independently distributed. Lai and Yau (008) worked with the bivariate normal to (UU jj, VV jj ) in the mixture model, assuming a possible dependence between the two random effects in a same clinic. However, either was developed a classical approach to parameter estimation. Lopes and Bolfarine (01) incorporated random effects in the promotion time cure rate model only on the cure fraction and developed classical as well as Bayesian approaches for parameter estimation. 44

Statistics Research Letters (SRL) Volume Issue, May 013 www.srl-journal.org In this paper, the model in Lopes and Bolfarine (01) is extended to develop a classical approach to the parameter estimation in the promotion time cure rate model incorporating random effects for both, the survival time of susceptible individuals and in the cure fraction. Section presents the ordinary promotion time cure rate model. In Section 3, the model is extended, by incorporating random effects. In Section 4, the classical approach to parameter estimation is developed, based on the BLUP and REML estimation approaches. In Section 5, an application of the proposed model is considered for a real data set. In Section 6, a simulation study is presented to evaluate the proposed estimation procedures. Final considerations and discussions are presented in Section 7, where it is concluded that the proposed estimation approach is fairly satisfactory and that the model is a viable alternative to fit real data where it seems adequate to utilize random effects along with a cure fraction. The Promotion Time Cure Rate Model (PTCRM) The model proposed in Chen et al. (1999) can be defined in the following way. Suppose that an individual in the population has NN carcinogenic cells at the time entering the study, where NN ~ PPPP(θθ), θθ > 0. Moreover, WW aa is denoted as the random variable expressing the time the aa-th cell needs to produce a detectable cancer. For non-cured subjects, there are cells such as NN > 0, with WW aa, aa = 1,,, NN, conditionally independent given NN and identically distributed with common survival function FF(tt κκ) = 1 SS(tt κκ), where κκ is a set of unknown parameters, properly defined in some parameter space. For cured subjects, NN = 0 and it is assumed that PP(WW 0 = ) = 1. The distribution F is related to the susceptible individuals and, in general, it is a proper function in the sense that lim tt FF(tt κκ) = 1. Under this framework, the time for a cancer to be detected can be represented by a random variable TT = min {WW aa, 0 aa NN}. In a word, for non-cured individuals, the failure time is the minimum among the times the cells may take to, eventually, produce a detectable cancer whereas cured individuals will never experience the event of interest and the failure time in this case is infinity. Under such conditions, it can be verified that the (improper) survival function for the random variable TT, also called the population survival function, is given by SS pp (tt) = ee θθθθ(tt κκ). (1) Therefore, the cure fraction in the population is given by lim tt SS pp (tt) = ee θθ > 0. Assuming right censoring, the actual data for the ii-th subject may be represented by the random variables YY ii = min(tt ii, CC ii ) and δδ ii = II(TT ii CC ii ), where TT ii and CC ii are failure and censoring times, respectively, for ii = 1,, nn. The observed data will be denoted by DD oooooo = (yy, δδ), where yy = (yy 1,, yy nn ) and δδ = (δδ ii,, δδ nn ). It is noted that NN = (NN 1,, NN nn ) are non-observable, and the complete data will be denoted by DD = (yy, δδ, NN). Therefore, the complete likelihood function for (κκ, θθ) can be written as nn LL(κκ, θθ DD) = ii=1 SS(yy ii κκ) NN ii δδ ii [NN ii ff(yy ii κκ)] δδ ii nn exp{ ii=1 [NN ii log(θθ) log(nn ii!)] nnnn}. () The marginal likelihood with respect to NN is [Chen et al. (1999)] LL(κκ, θθ DD oooooo ) = nn θθθθ(yy ii κκ) δδii ii=1 exp θθ 1 SS(yy ii κκ). (3) Consider now the particular case where the lifetimes WW aa, aa = 1,,, have a Weibull distribution. Although other distributions could also be considered, the Weibull has a number of advantages: it is popular, easy to interpret and has the flexibility of accommodating different shapes for the hazard function, depending on the shape parameter αα. Moreover, for the practical situation motivating the model, it is expected αα > 1. For non-homogeneous populations, let xx ii = (xx ii1,, xx iiii ) TT be a covariate vector associated with the lifetimes WW aa, aa = 1,,. Following Yau and Ng (001) is considered SS(tt ξξ ii, κκ) = [SS 0 (tt κκ)] exp ξξ ii, ii = 1,, nn, (4) where SS 0 (tt κκ) = exp{ λλtt αα }, κκ = (λλ, αα) and ξξ ii = xx ii TT γγ. As for the cure fraction, a vector of covariates zz ii = (zz ii1,, zz iiii ) TT may be assumed. Consider θθ ii = exp(zz ii TT ββ), ii = 1,, nn. The quantities γγ and ββ are, respectively, rr 1 and ss 1 vectors of unknown parameters. The choice of covariates for each part depends on the practical problem being considered and on the information given by the physician. Note that the linear function xx ii TT γγ should exclude an intercept because if that is the case then, SS(tt ξξ ii, κκ) = [SS 0 (tt κκ)] exp ξξ ii = [SS 0 (tt κκ)] exp {γγ 0+ kk=1 xx iiii γγ kk } = [SS 0 (tt κκ rr exp { )] kk=1 xx iiii γγ kk }, (5) which will lead to identifiability problems, as rr 45

www.srl-journal.org Statistics Research Letters (SRL) Volume Issue, May 013 mentioned in Yau and Ng (001). Promotion Time Cure Rate Model Random Effects with Suppose patients grouped in clinic jj, jj = 1,,. The expanded notation is considered in the previous section by means of the proposition that NN iiii represents the number of carcinogenic cells associated with the iith individual from the jj-th clinic, with NN iiii ~PPPP(θθ iiii ). For jj = 1,,, define the vector NN jj = (NN 1jj, NN jj,, NN nnjj jj ) TT. Let WW aaaaaa be the random variable corresponding to the time the aa-th carcinogenic cell from the ii-th individual within the jj-th clinic needs to yield a detectable cancer. For a given subject, the random variables WW aaaaaa, aa = 1,, NN iiii, conditionally independent given NN iiii have a common survival SS(tt ξξ iiii, κκ). Hence, the time up to a detectable cancer can be defined as TT iiii = min {WW aaaaaa, 1 aa NN iiii } for non-cured individuals and PP TT iiii = = 1 if NN iiii = 0. The survival function SS( ξξ iiii, κκ) is the same as in (4), with covariates xx iiii and zz iiii included in the model in the same way as discussed in the previous section. However, in order to take into account the effect of the jj -th clinic on survival for cured and uncured subjects, random effects UU jj, VV jj, jj = 1,, are considered so that ξξ iiii = xx TT iiii γγ + UU jj and θθ iiii = exp zz TT iiii ββ + VV jj, (6) where UU jj, VV jj ~NN (00, ΣΣ), jj = 1,, with ΣΣ = diag(σσ uu, σσ vv ). For a given clinic jj, high (low) values of UU jj will lead to an increase (decrease) in the risk to activate a particular cancer cell; as a consequence subjects from that clinic will experience a greater (smaller) risk of developing a carcinogenic tumor. Similarly, low (high) values of VV jj indicate that patients treated in the jj-th clinic have greater (smaller) chance of being cured. The model in Lopes and Bolfarine (01) is a particular case of (6), when VV jj ~NN(0, σσ vv ), jj = 1,, and σσ uu = 0. It should be mentioned here that the same nonidentifiability issues discussed in Yau and Ng (001) may arise, specially in trials with very long follow-up times, when it is difficult to distinguish between cure and censoring for long-term survivors. Given that a non-intercept model is considered, the random effects UU jj, jj = 1,, will represent those quantities in this new formulation and need to be estimated. The solution proposed in Yau and Ng (001) may be considered: the remaining parameters are estimated assuming the quantity SS 0 (tt κκ) is known. Define UU = (UU 1,, UU ) TT and VV = (VV 1,, VV ) TT. The complete log-likelihood function for the model (after summing over NN = (NN 1,, NN )) is given by ll = ll 1 + ll, in which ll 1 = δδ iiii log θθ iiii + log ff tt iiii κκ θθ iiii FF tt iiii κκ, ll = ( 1/) log(ππσσ uu ) + 1 σσ uu UUTT UU +( 1/) log(ππσσ vv ) + 1 σσ vv VVTT VV, with FF(tt ξξ ii, κκ) = 1 [SS 0 (tt κκ)] exp ξξ ii and SS 0 (tt κκ) = exp{ λλtt αα }. Estimation In the previous section it was observed that the loglikelihood function ll may be expressed as a sum of two quantities ll 1 and ll, where due to the unobserved quantities UU and VV, it can not be directly maximized. One possible way to deal with this problem is to consider the quadrature procedure discussed in Davis and Rabinowitz (1975). Typically, this approach requires high computation times since in the situation discussed here there are random effects to deal with; moreover, there is the question of how many knots to consider, which can significantly affect the estimation procedure. Another alternative is to consider the restricted maximum likelihood (REML) approach proposed by McGilchrist and Yau (1995) as well as Yau and Ng (001), Lai and Yau (008) and Lopes and Bolfarine (01). Initially, define ζζ = (γγ, ββ, UU, VV). Assuming that SS 0 (tt κκ) is known and the value of ΣΣ is fixed, it is possible to find the BLUP estimator for ζζ using the Newton-Raphson algorithm [Henderson (1975)], including the predictors for UU and VV. Starting with ζζ 0, an initial value for ζζ, for l = 1,,, the iterative procedure is based on ζζ l = ζζ l 1 + BB l 1 ζζ ζζ=ζζ l 1, l = 1,,, (7) where BBis the negative of the the second derivative of ll with respect to ζζ. More details on this procedure are provided in Appendix A. Denote ζζ as the estimator for ζζ given by (7) after convergence. Denote the partitioned matrix BB and its inverse, evaluated at ζζ according to the ordering γγ ββ UU VV, respectively, as 46

Statistics Research Letters (SRL) Volume Issue, May 013 www.srl-journal.org BB 11 BB 1 BB 13 BB 14 BB BB = 1 BB BB 3 BB 4 BB 31 BB 3 BB 33 BB 34 BB 41 BB 4 BB 43 BB 44 AA 11 AA 1 AA 13 AA 14 BB 1 AA = 1 AA 31 AA AA 3 AA 3 AA 33 AA 4 AA. 34 AA 41 AA 4 AA 43 AA 44 and According to McGilchrist and Yau (1995) and Yau and Ng (001), the REML estimate for ΣΣ is then given by ΣΣ = 1 diag traa 33 + UU TT UU, traa 44 + VV TT VV. (8) Moreover, with Var ββ γγ = AA 11 AA 1 and AA 1 AA Var ΣΣ = ΣΣ 11 ΣΣ 1, ΣΣ 1 ΣΣ ΣΣ 11 = σσ 4 uu σσ uu tr AA 33 + σσ 8 uu tr AA 33, ΣΣ = σσ 4 vv σσ vv tr AA 44 + σσ 8 vv tr AA 44 and ΣΣ 1 = σσ uu 4 σσ vv 4 tr (AA 34 AA 43 ). In order to deal with the unknown SS 0 (tt κκ), plug the REML estimator obtained for ζζ and ΣΣ into ll and denote the resulting profile likelihood for κκ by ll pp. Considering a bidimensional grid for κκ, values for λλ and αα maximizing ll pp are obtained. The Newton-Raphson approach can also be used. Considering κκ 0 a initial value for κκ, the Newton-Raphson algorithm is based on κκ l = κκ l 1 + CC l 1 ll pp κκ κκ=κκ l 1, l = 1,,, (9) where CC is the negative of the second derivative of the log-likelihood ll pp with respect to κκ. Details of such procedure are given in Appendix B. As pointed out in Yau and Ng (001), the asymptotic variance of the limiting value κκ in (9) is not given by the inverse of CC, because the expected values of the partial derivatives with respect to λλ or αα considering the other parameters as fixed are not zero. However, a resampling method as the jackknife [Miller (1974)] may be considered to compute the standard deviations for the estimator of κκ. After getting such estimates, it is necessary to restart the procedure to update the estimates for ζζ and ΣΣ, with new updated values, compute the updated estimate for κκ, and so on. In summary, to find estimates for ζζ, ΣΣ and κκ, one has to iterate the procedures described by equations (7), (8) and (9) until convergence. Model diagnostics can be performed considering a graphical analysis based on the Cox-Snell residuals, defined as rr iiii = HH tt iiii ; ψψ, xx iiii, zz iiii, ii = 1,, ; jj = 1,,. (10) where HH ; ψψ, xx iiii, zz iiii is the cumulative risk function evaluated at the parameter estimates ψψ = (ζζ, ΣΣ, κκ ). If the model is correct, then rr iiii, jj = 1,, ; ii = 1,,, should behave as a censored sample from an exponential distribution with mean one. A disadvantage of such procedure is that if the proposed model is not adequate, the Cox-Snell residuals will not give any information on the source of inadequacy. Application Kalbfleisch and Prentice (00) have presented a data set related to a trial to compare treatments for oropharynx carcinoma. A total of 195 subjects, with lesions in three different sites is considered: faucial arch, tonsillar fossa and pharyngeal tongue. Each patient was enrolled in one of six participating clinics and randomly assigned to one of the treatments: radiation therapy and radiation therapy combined with chemotherapy. Information on some covariates (gender, tumor stage classification (TT-stage), lymph node stage classification ( NN -stage), among others) which can have influence on the survival of the patients are also available. The aim of the study is to identify risk factors for cancer and compare the treatments with respect to survival prognostic. Patients undergone the treatment for a period of 90 days; after that they received post-treatments. Since there were no restrictions, the clinics could use different approaches for the post-treatment, which could have some influence on patients survival times as well as on the cure rates. This also may lead to some correlation between observed quantities such as the survival times for patients belonging to same clinic. In Kalbfleisch and Prentice (00) the clinics were considered as fixed effects. Consider now the proposed approach following the same lines as in Yau and Ng (001) and Lopes and Bolfarine (01), so that only patients with cancer located in the pharyngeal were considered, corresponding to a total of 66 patients, out of which 47 died and 19 were censored (9%). Variable TT-stage measuring the tumor size is grouped in 4 categories: TT 1, TT and TT 3 which refer to primary tumor measurement and TT 4 to massive invasive tumor. For the i th individual in the j th clinic, define the xx iiii as 47

www.srl-journal.org Statistics Research Letters (SRL) Volume Issue, May 013 xx iiii = 1, if tumor is classified as T 1, T or T 3, 0, if tumor is classified as T 4. Failure is defined as death due to the tongue or oropharynx carcinoma. Deaths related to other causes were considered as censored data. Previous analyses [Kalbfleisch and Prentice (00), Yau and Ng (001), Lai and Yau (008) and Lopes and Bolfarine (01)] agree that there is strong evidence of a cure fraction in this data set. The proposed model in Section 3 is considered with XX = xx iiii, jj = 1,,6; ii = 1,, nn nn 1 jj ; nn = jj =1, ZZ = (11 nn, XX), where 11 nn is a nn-column vector of ones, ββ = (ββ 0, ββ 1 ) TT, γγ = γγ, and nn 1 = 9, nn = 14, nn 3 = 16, nn 4 = 7, nn 5 = 11, nn 6 = 9, nn = 66. Note that the same set of covariates to model the cure fraction and survival of non-cured individuals has been considered. Additionally, the cure rate of Chen et al. (1999) (i.e., with no random effects) based on (1) and an ordinary Weibull survival model were also fitted, that is, neither one took into account the effect of clinic. Results of model fitting for the models described above are presented in Table 1. In addition, random effects estimates for each one of the clinics are shown in Table. TABLE 1 ML AND REML ESTIMATES FOR THE MODEL PARAMETERS (SE). Parameter PTCRM with random Independent effects (REML) PTCRM (ML) γγ 0.58 (0.07) 0.4 (0.375) ββ 0 0.75 (0.039) 0.40(0.44) ββ 1 0.441 (0.156) 0.50(0.409) σσ uu 0.003 (0.08) - σσ vv 0.09 (0.014) - λλ 0.00006 (0.00009) 0.005 (0.00056) αα 1.478 (0.81) 1.47 (0.374) Parameter Independent Single Weibull (ML) γγ 0.573(0.177) ββ 0 - ββ 1 - σσ uu - σσ vv - λλ 0.00016 (0.00014) αα 1.073 (0.134) TABLE BLUP PREDICTORS FOR THE RANDOM EFFECTS. Clinic UU jj VV jj 1 0.000-0.0419 0.0010 0.1043 3-0.0096-0.0861 4 0.005-0.094 5 0.000 0.0498 6 0.00 0.0033 The figures in the first row of Table 1 suggest that when the cure fraction is excluded from the analysis, tumor stage seems to have a significant effect on the survival of susceptible patients. On the other hand, when a cure fraction model without random effects is considered, tumor stage no longer seems to be significant. In contrast, the results presented in Yau and Ng (001) and Lai and Yau (008) considering the mixture model have suggested that this variable is significant. For the random effect model considered here (first column of the table), tumor stage seems also to be significant on the cure fraction. In addition, the values for random effects shown in Table are close to zero and, probably, non-significant given the estimates of their variances. However, a formal test for the hypothesis HH 0 : σσ uu = σσ vv = 0 to be developed to allow a more reliable assessment will be considered in a future work. Point estimates for the cure rates ee θθ iiii for patients with small primary tumor (XX = 0) and a massive tumor (XX = 1) are presented in Table 3. Inspecting the values under an exploratory perspective, it seems that Clinic 3 is the one performing the best among all the clinics, so that in average, the activation time of the cells for susceptible individuals is higher. The estimated hazard ratio is ee γγ = 1.70, so the cells of non-cured subjects with a massive tumor have a 70% higher risk to cause death when compared to non-cured patients with tumors in other stages. TABLE 3 ESTIMATED CURE RATE FOR THE SIX CLINICS. Estimate cure rate Clinic ZZ = 0 ZZ = 1 1 0.8 0.14 0.3 0.10 3 0.30 0.15 4 0.8 0.14 5 0.5 0.1 6 0.7 0.13 Cox-Snell's residuals were computed to check model fitting, considering (10). The cumulative risk function for those residuals was computed and compared with the cumulative distribution for the unitary exponential distribution. As it can be depicted from Figure 1, the two cumulative risk functions are close, suggesting a 48

Statistics Research Letters (SRL) Volume Issue, May 013 www.srl-journal.org reasonable model fit. We expect that As αα varies from 1 to 1.3, the performance of the estimators is improved since the WW iiii will have smaller expectation making it more probable that min {WW iiii, ii = 1,..., NN iiii } is smaller than CC allowing an easier identification on cured patients. FIG. 1 CUMULATIVE HAZARD FUNCTIONS FOR COX-SNELL RESIDUALS AND EXPONENTIAL DISTRIBUTION (SOLID LINE). Simulation The results of a simulation study performed for the promotion time cure rate model with random effects have been reported so that results are comparable with the simulation study performed by Yau and Ng (001). 10 clinics each with 15 patients are taken into consideration. Patients are randomly assigned to the treatment group (xx iiii = 1) and the remaining patients are assigned to the control group (xx iiii = 0), for each clinic. Following the simulation in Yau and Ng (001), the same covariate for the cure fraction with the addition of an intercept was in consideration as well, i.e., zz iiii = [1, xx iiii ]. Values of NN iiii are simulated from the Poisson distribution with mean θθ iiii. The values of θθ iiii are given as in (6) and the VV jj are generated from NN(0, σσ vv ). For cured individuals, that is, NN iiii = 0, the failure time is infinity and for non-cured individuals, random variables WW 1jj,, WW NNiiii jj representing promotion times, with density given by ff 0 tt, xx iiii = h 0 (tt κκ)exp {xx iiii γγ + UU jj }SS 0 (tt κκ) exp {xx iiii γγ+uu jj } have been generated, with UU jj generated from the NN(0, σσ uu ) and the minimum values of these survival times considered to be failure times. If the survival time is greater than the censoring time CC, then it is considered to be a censored observation. In the simulation study, SS 0 (tt κκ) is Weibull as in (4) is regarded, with parameters λλ and αα. We set λλ = 0.01 was set in all the simulations and 3 groups for parameters γγ, ββ 0, ββ 1 were in consideration. For each group simulations were performed with {αα = 1, σσ uu = 0.5, σσ vv = 0.5, CC = 500}, and the effect of changes on αα, σσ uu, σσ vv and CC was studied. For each combination, 500 simulations were generated. As the variances of the random effects grow from 0.5 to 1 and the censoring times is reduced from 500 to 300, estimators performance will be worse since the variances of the random effects are increased (and consequently the variability of the generated failure amplifies because the censoring time will increase in the group of susceptible individuals. TABLE 4 ESTIMATED BIASES FOR 500 SIMULATIONS OF REML ESTIMATION. Parameter True value Average bias SSSS 1 SSSS (a) Sim. Set I (λλ = 0.01, γγ = 0.5, ββ 0 = 0, ββ 1 = 0.5) Sim. 1 (αα = 1, σσ uu = 0.5, σσ vv = 0.5, CC = 500) γγ -0.5 0.003 0.367 0.403 ββ 0 0.0-0.004 0.65 0.84 ββ 1 0.5 0.015 0.99 0.314 σσ uu 0.5-0.0 0.37 0.397 σσ vv 0.5 0.034 0.319 0.331 Sim. (αα = 1.3, σσ uu = 0.5, σσ vv = 0.5, CC = 500) γγ -0.5 0.005 0.73 0.90 ββ 0 0.0 0.03 0.59 0.73 ββ 1 0.5-0.001 0.44 0.40 σσ uu 0.5-0.006 0.306 0.3 σσ vv 0.5 0.010 0.301 0.31 Sim. 3 (αα = 1, σσ uu = 1, σσ vv = 1, CC = 500) γγ -0.5 0.003 0.406 0.443 ββ 0 0.0 0.004 0.35 0.344 ββ 1 0.5 0.011 0.331 0.335 σσ uu 1.0-0.005 0.600 0.766 σσ vv 1.0 0.06 0.594 0.713 Sim. 4 (αα = 1, σσ uu = 0.5, σσ vv = 0.5, CC = 300) γγ -0.5-0.036 0.553 0.565 ββ 0 0.0 0.016 0.71 0.65 ββ 1 0.5 0.056 0.446 0.46 σσ uu 0.5 0.030 0.391 0.458 σσ vv 0.5 0.051 0.338 0.376 (b) Sim. Set II (λλ = 0.01, γγ = 0.5, ββ 0 = 0.5, ββ 1 = 0.5) Sim. 5 (αα = 1, σσ uu = 0.5, σσ vv = 0.5, CC = 500) γγ -0.5 0.006 0.383 0.46 ββ 0-0.5 0.019 0.7 0.77 ββ 1 0.5-0.009 0.308 0.314 σσ uu 0.5 0.011 0.360 0.447 σσ vv 0.5 0.0 0.36 0.36 Sim. 6 (αα = 1.3, σσ uu = 0.5, σσ vv = 0.5, CC = 500) γγ -0.5 0.010 0.75 0.91 ββ 0-0.5 0.09 0.68 0.76 ββ 1 0.5-0.014 0.53 0.50 σσ uu 0.5 0.07 0.333 0.343 σσ vv 0.5 0.010 0.313 0.38 49

www.srl-journal.org Statistics Research Letters (SRL) Volume Issue, May 013 TABLE 4 ESTIMATED BIASES FOR 500 SIMULATIONS OF REML ESTIMATION (CONTINUATION). Parameter True value Average bias SSSS 1 SSSS (b) Sim. Set II (λλ = 0.01, γγ = 0.5, ββ 0 = 0.5, ββ 1 = 0.5) Sim. 7 (αα = 1, σσ uu = 1, σσ vv = 1, CC = 500) γγ -0.5 0.01 0.399 0.438 ββ 0-0.5 0.0 0.354 0.373 ββ 1 0.5-0.00 0.318 0.3 σσ uu 1.0 0.030 0.630 0.847 σσ vv 1.0 0.01 0.586 0.686 Sim. 8 (αα = 1, σσ uu = 0.5, σσ vv = 0.5, CC = 300) γγ -0.5-0.059 0.618 0.618 ββ 0-0.5 0.003 0.78 0.70 ββ 1 0.5 0.051 0.484 0.441 σσ uu 0.5 0.11 0.456 0.516 σσ vv 0.5 0.08 0.344 0.388 (c) Sim. Set III (λλ = 0.01, γγ = 1, ββ 0 = 1, ββ 1 = 1) Sim. 9 (αα = 1, σσ uu = 0.5, σσ vv = 0.5, CC = 500) γγ -1.0-0.095 0.753 0.704 ββ 0-1.0 0.018 0.85 0.88 ββ 1 1.0 0.101 0.634 0.533 σσ uu 0.5 0.080 0.49 0.530 σσ vv 0.5-0.003 0.336 0.373 Sim. 10 (αα = 1.3, σσ uu = 0.5, σσ vv = 0.5, CC = 500) γγ -1.0 0.019 0.8 0.78 ββ 0-1.0 0.06 0.85 0.88 ββ 1 1.0-0.006 0.69 0.74 σσ uu 0.5-0.007 0.333 0.356 σσ vv 0.5 0.019 0.39 0.357 Sim. 11 (αα = 1, σσ uu = 1, σσ vv = 1, CC = 500) γγ -1.0-0.033 0.566 0.614 ββ 0-1.0 0.046 0.364 0.359 ββ 1 1.0 0.06 0.454 0.443 σσ uu 1.0 0.034 0.669 0.98 σσ vv 1.0 0.009 0.60 0.715 Sim. 1 (αα = 1, σσ uu = 0.5, σσ vv = 0.5, CC = 300) γγ -1.0-0.10 1.045 0.785 ββ 0-1.0 0.00 0.89 0.310 ββ 1 1.0 0.163 0.870 0.6 σσ uu 0.5 0.4 0.566 0.63 σσ vv 0.5-0.00 0.353 0.410 Simulation results are presented in Table 4. SSSS 1 and SSSS represent, respectively, the mean of the standard deviations and the standard deviations of the estimates for the 500 simulated samples. The table reveals that the bias corresponding to the parameters related to the cure fraction, namely ββ 0 and ββ 1, is small except for the case CC = 300 for which the bias of ββ 1 is substantially increased, specially for parameter groups II and III. Concerning the standard deviation, it can be noticed that SSSS 1 is slightly smaller than SSSS indicating that the standard deviations of ββ 0 and ββ 1 are slightly underestimated. One exception occurs in the parameter group IIIIII (CC = 300) for which the standard deviation was considerably underestimated. Regarding γγ, it can be noticed that the bias is small except for group III and CC = 300. The standard deviation of γγ is slightly underestimated, except for parameter group III (CC = 300) for which the standard deviation is substantially overestimated. For the random effects it can be depicted that the bias is small, an exception being the case when σσ uu and σσ vv increase from 0.5 to 1.0 and CC decreases from 500 to 300, in which case the bias increases, although the bias of σσ vv is always smaller than that of σσ uu. The variances are slightly underestimated in all parameter groups, except in the case where σσ uu and σσ vv increase and the censoring variable diminishes. It is also worth noticing that as αα increases from 1.0 to 1.3, the performance (evaluated in terms of standard deviation) of all estimators improves. Final Discussion This paper extends the cure rate model in Lopes and Bolfarine (01) by considering random effects in both, the cure probability and the survival function for individuals that are at risk. The random effects are considered to be normally distributed. An approach to model adequacy is developed defining Cox-Snell type residuals for the proposed model. The model is parametrized in terms of the cured fraction which is then linked to covariates. The approach used for parameter estimation is the restricted maximum likelihood (REML) approach proposed in McGilchrist and Yau (1995). Simulation studies are performed and results of a real data analysis are presented indicating good performance of the proposed approach. Moreover, residual analysis performed for the model considered to analyze the real data has indicated that the methodology considered in this paper is suitable to deal with real data sets of the considered type. An important case exclusive from consideration in this work is the situation where the random effects UU jj and VV jj are correlated, which could be the case in many practical applications. This generalization is the subject of a forthcoming paper. Appendix A: The BLUP Estimation Procedure The score and matrix BB, computed based on the second derivative of ll with respect to ζζ, are as follows: XX 0 ζζ = 0 ZZ ll 1/ ξξ GG 0 ll 1 / φφ 0 GG 0 0 σσ uu UU σσ vv VV, 50

Statistics Research Letters (SRL) Volume Issue, May 013 www.srl-journal.org BB XX 0 = 0 ZZ ll 1 / ξξ ξξ ll 1 / ξξ φφ 0 GG 0 GG 0 ll 1 / φφ ξξ XX ll 1 / φφ φφ 0 ZZ 0 GG 0 GG 0 0 0 0 0 0 0 0 + 0 0 σσ uu II 0, 0 0 0 σσ vv II with ξξ = XX γγ + GG UU and φφ = ZZ ββ + GG VV. II is the identity matrix of dimension and XX, ZZ and GG are the design matrices of γγ, ββ and the random effects UU and VV respectively. Let HH 0 (tt κκ) be the baseline cumulative hazard. The first and second partial derivatives of ll 1 with respect to the elements of ξξ and φφ are given as follows: ll 1 ξξ iiii = δδ iiii 1 HH 0 tt iiii κκ ee ξξ iiii, ll 1 ξξ iiii ee φφ iiii +ξξ iiii HH 0 tt iiii κκ SS 0 tt iiii κκ exp ξξ iiii, = HH 0 tt iiii κκ ee ξξ iiii [δδ iiii +ee φφ iiii SS 0 tt iiii κκ exp ξξ iiii 1 HH 0 tt iiii κκ ee ξξ iiii ], ll 1 φφ iiii = δδ iiii ee φφ iiii 1 SS 0 tt iiii κκ exp ξξ iiii, ll 1 φφ iiii = eeφφ iiii 1 SS 0 tt iiii κκ exp ξξ iiii, ll 1 φφ iiii ξξ iiii = ee φφ iiii +ξξ iiii HH 0 tt iiii κκ SS 0 tt iiii κκ exp ξξ iiii, ll 1 ξξ iiii ξξ ii jj = 0, ll 1 φφ iiii ξξ ii jj = 0, ll 1 φφ iiii ξξ ii jj = 0, for ii ii or jj jj. Appendix B: Estimation of Parameters of Baseline Hazard For a Weibull baseline hazard, the cumulative hazard is HH 0 (tt κκ) = λλtt αα. Let SS iiii = SS 0 tt iiii κκ exp ξξ iiii and θθ iiii = ee φφ iiii. The first and second partial derivatives of ll 1 with respect to the parameters of the baseline hazard and matrix are given as follows ll pp = δδ iiii λλ 1 ee ξξ iiii tt αα iiii θθ iiii ee ξξ iiii tt αα iiii SS iiii, ll pp λλ = δδ iiii λλ θθ iiii ee ξξ iiii tt αα iiii SS iiii, ll pp = δδ iiii αα 1 + log tt iiii λλee ξξ iiii tt αα iiii log tt iiii λλθθ iiii ee ξξ iiii tt iiii αα SS iiii log tt iiii, ll pp αα = δδ iiii αα + λλee ξξ iiii tt αα iiii log tt iiii + λλθθ iiii ee ξξ iiii tt iiii αα SS iiii log tt iiii (1 λλee ξξ iiii tt iiii αα ), ll pp = eeξξ iiii tt αα iiii log tt iiii δδ iiii + θθ iiii SS iiii (1 λλee ξξ iiii tt αα iiii ), ACKNOWLEDGMENT The first author acknowledges the financial support from CONICYT-Chile. The authors also want to thank the referees whose valuable comments improved the presentation of this paper. REFERENCES Berkson, J., and Gage, R. Survival curve for cancer patients following treatment. Journal of the American Statistical Association 47 (195): 501-515. Chen, M.H., and Ibrahim, J.G., Sinha, D. A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association 94 (1999): 909-919. Davis, P., and Rabinowitz, P. Methods of numerical integration. New York: Academic Press, 1975. Goldman, A. Survivorship analysis when cure is a possibility: a Monte Carlo study. Statistics in Medicine 3 (1984): 153-163. Henderson, C.R. Best linear unbiased estimation and prediction under a selection model. Biometrics 31 (1975): 43-447. Kalbfleisch, J.D., and Prentice, R.L. The Statistical Analysis of Failure Data. New York: Wiley, 00. Kaplan, E.L., Meier, P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53 (1958): 457-481. Lai, X., and Yau, K. Long-term survivor model with bivariate random effects: Applications to bone marrow transplant and carcinoma study data. Statistics in Medicine 7 (008): 569-5708. Lopes, C.M., and Bolfarine, H. (01). Random effects in promotion time cure rate models. Computational 51

www.srl-journal.org Statistics Research Letters (SRL) Volume Issue, May 013 Statistics and Data Analysis 56 (01): 75-87. Maller, R.A., and Zhou, S. (1996). Survival analysis with long term survivors. New York: Wiley, 1996. McGilchrist, C.A., and Yau, K.K. The derivation of BLUP, ML, REML estimation methods for generalised linear mixed models. Communications in Statistics - Theory and Methods 4 (1995): 963-980. Miller, R.G. The jackknife: a review. Biometrika 61 (1974): 1-15. Mizoi, M.F., Bolfarine, H., and Lima, A.C.P. (007). Cure rate models with measurement errors. Communications in Statistics - Simulation and Computation 36(1) (007): 185-196. Rodrigues, J., Cancho, V.G., Castro, M.A., and Louzada-Neto, F. On the unification of the long-term survival models. Statistics and Probability Letters 79 (008): 753-759. Sposto, R., Sather, H., and Baker, S. A comparison of tests of the difference in the proportion of patients who are cured. Biometrics 48 (199): 87-99. Yakovlev, A.Y., and Tsodikov, A.D. Stochastic Models of Tumor Latency and their Biostatistical Applications. New Jersey: World Scientific, 1996. Yau K., and Ng, A. Long-term survivor mixture model with random effects: application to a multi-centre clinical trial of carcinoma. Statistics in Medicine 0 (001):1591-1607. Yin, G., and Ibrahim, J.G. (005). Cure rate models: a unified approach. Canadian Journal of Statistics 33(4) (005): 559-570. 5