Maximum likelihood estimation in the proportional hazards cure model

Size: px

Start display at page:

Download "Maximum likelihood estimation in the proportional hazards cure model"

Constance Lane
5 years ago
Views:

1 AISM DOI 1.17/s x Maximum likelihood estimation in the proportional hazards cure model Wenbin Lu Received: 12 January 26 / Revised: 16 November 26 The Institute of Statistical Mathematics, Tokyo 27 Abstract The proportional hazards cure model generalizes Cox s proportional hazards model which allows that a proportion of study subjects may never experience the event of interest. Here nonparametric maximum likelihood approach is proposed to estimating the cumulative hazard and the regression parameters. The asymptotic properties of the resulting estimators are established using the modern empirical process theory. And the estimators for the regression parameters are shown to be semiparametric efficient. Keywords Censoring Maximum likelihood estimation Mixture models Proportional hazards cure model Semiparametric efficiency 1 Introduction Survival models with a cure rate have received much attention in recent years (Farewell, 1982, 1986; Kuk and Chen, 1992; Sy and Taylor, 2; Peng and Dear, 2, among others. These models are useful when a proportion of study subjects never experience the event of interest. Applications of cure models can be found in many disciplines, including biomedical sciences, economics, sociology and engineering science. Maller and Zhou (1996 contains a list of such applications. Mixture modelling approach is commonly used to formulate a cure model, which assumes that the underlying population is a mixture of susceptible and nonsusceptible subjects. All susceptible subjects would eventually experience W. Lu (B Department of Statistics, North Carolina State University, 251 Founders Drive, Raleigh, NC 27695, USA wlu4@stat.ncsu.edu

2 W. Lu the event if there were no censoring, while the nonsusceptible ones are immune from the event. Under mixture modelling approach, a decomposition of the event time is given by T = ηt + (1 η, (1 where T < denotes the failure time of a susceptible subject and η indicates, by the value 1 or, whether the sampled subject is susceptible or not. Thus, one can model separately the survival distribution for susceptible individuals and the fraction of nonsusceptible ones. Parametric mixture models were explored earlier by a number of authors. Berkson and Gage (1952 used a mixture of exponential distributions and a constant cure fraction to fit survival data from studies of breast cancer and stomach cancer. This parametric approach was extended by Farewell (1982, 1986 to Weibull regression for survival and logistic regression for the cure fraction. Theoretical and empirical properties of the Weibull extension were fully studied there via the parametric maximum likelihood method. More recent attention has been focused on semiparametric mixture modelling approaches. Kuk and Chen (1992 proposed the so-called proportional hazards cure model in which the proportional hazards regression (Cox 1972 models the survival times of susceptible subjects while the logistic regression models the cure fraction. The model is specified by the following two terms: λ(t Z, X = lim P(t dt T < t + dt T t, Z, X/dt = λ(t exp(β Z, (2 + P (η = 1 X, Z = exp(γ X 1 + exp(γ X, (3 where λ(t Z, X is the hazard function for a susceptible subject with p-dimensional covariates Z and q-dimensional covariates X, and λ(t the completely unspecified baseline hazard function, and β and γ the unknown regression parameter vectors of primary interest. Recall that Z and X may share some common components and X includes 1 so that γ contains the intercept term. Furthermore, we assume that the censoring time C is independent of T and η conditional on Z and X. Define T = min{t, min(c, τ and δ = I{T min(c, τ, where τ denote the total follow-up of the study. Then the observations consist of ( T i, δ i, Z i, X i, i = 1,..., n, which are independent copies of ( T, δ, Z, X. And the observed likelihood function can be written as L n (, θ = n [ { π(γ X i λ( T i e β Z i e ( T i exp(β δ Z i i {1 π(γ X i + π(γ X i e ( T i exp(β Z i 1 δ i, (4

3 Maximum likelihood estimation in the proportional hazards cure model where θ = (β, γ, π(a = exp(a/{1 + exp(a for any real number a, and (t = λ(sds is the baseline cumulative hazard function. As noted by Kuk and Chen (1992, if the cure fraction 1 π(γ X is not equal to zero, the hazard function of T is no longer proportional and the simple form of the partial likelihood function, like that for the usual Cox proportional hazards model, can not be obtained here. To solve this difficulty, they proposed to consider the following complete but unobserved likelihood function: L nc (, θ = n [ { π(γ X i λ( T i e β Z i e ( T i exp(β δ Z i i η i { { 1 π(γ 1 ηi X i {π(γ X i e ( T i exp(β η 1 δi Z i i, where η i s are only partially observed, i.e. when δ i = 1, η i = 1, while when δ i =, η i is unobserved. Based on L nc, they developed a Monte Carlo simulation-based algorithm for approximating a rank-based likelihood function, thereby enabling them to perform maximum marginal likelihood estimation. The proportional hazards cure model was further studied by Peng and Dear (2 and Sy and Taylor (2 among others to obtain alternative methods for computing the joint semiparametric likelihood function. Their approaches largely relied on the semiparametric EM algorithm that computes estimates for both the cumulative baseline hazard and the regression parameters. However, the theoretical properties of the resulting estimators for the proportional hazards cure model remain to be established. Fang et al. (25 studied the large sample properties of the maximum likelihood estimators under the proportional hazards cure model. Recently, Lu and Ying (24 proposed an estimating equations approach for the semiparametric transformation cure models, where the class of linear transformation models are used for the failure times of susceptible subjects and the logistic regression is used for the cure fraction. Their approach was motivated by the work of Chen et al. (22 and used the martingale integral representation to construct unbiased estimating equations. The large sample properties of the resulting estimators were also studied. However, the proposed algorithm for solving the equations may not converge and the resulting estimators for the regression parameters are not efficient, even when the model specifies the proportional hazards cure model, i.e. the error term of the linear transformation models follows the extreme value distribution. In this paper, we propose to estimate the parametric and the nonparametric components in the proportional hazards cure model by using nonparametric maximum likelihood. The joint parametric/nonparametric likelihood approach to semiparametric problems was pioneered by Murphy (1994, 1995 for the frailty model and Scharfsten et al. (1998 for the generalized odds-rate class of regression models, among others. Here, we apply and extend their techniques to the proportional hazards cure model and make use of the modern empirical

4 W. Lu process theory, as elucidated in van der Vaart and Wellner (1996, to drive the large sample properties of the resulting estimators. In addition, We show that the maximum likelihood estimators for the regression parameters are semiparametrically efficient (Bickel et al., The paper is organized in a natural order. We first compute the semiparametric variance bound (Sect. 2. Then, in Sect. 3, we define our estimators and show that they are consistent, asymptotically normal, and achieve the variance bound. Consistent variance estimators are also derived here. Section 4 gives some discussions on generalization of our approach to include time dependent covariates and other semiparametric cure models. Some technical lemmas are put together and proved in the Appendix. 2 Semiparametric variance bound First, note that the log likelihood function for a single observation is l(, θ = δ[log{π(γ Xλ( T+β Z ( T exp(β Z+(1 δ log S( T,, θ, (5 where S(t,, θ = 1 π(γ X + π(γ X exp{ (t exp(β Z. Then the semiparametric variance bound is computed via the semiparametric efficiency theory of Bickel et al. (1993. To be specific, a parametric submodel corresponding to a parameterization of λ(t is considered, say λ(t, α, where λ(t, α = λ(t for some α. So the log likelihood for such parametric submodel is given by l(α, θ = δ[log{π(γ Xλ( T, α+β Z ( T, α exp(β Z +(1 δ log S{ T, (, α, θ, where (t, α = λ(s, αds. The score for θ is l τ θ = W{t, (, α, θdm{t, (, α, θ, (6 where M(t,, θ = N(t Y(sg(s,, θexp(β Zd (s and W(t,, θ = (Z [1 {1 g(t,, θ exp(β Z (t, X {1 g(t,, θ with g(t,, θ = π(γ X exp{ (t exp(β Z 1 π(γ X + π(γ X exp{ (t exp(β Z. By the usual counting process and its associated martingale theory (Fleming and Harrington 1991; Andersen et al. 1993, M(t,, θ is the F t -counting process martingale, where F t is the smallest sigma-algebra generated by {N(s, Y(s, s t. The score for α is

5 Maximum likelihood estimation in the proportional hazards cure model [ l τ α = α λ(t, α t {1 g(t,, θ exp(β Z λ(t, α λ(s, α ds dm{t, (, α, θ. α (7 Let θ and be the true values of θ and, respectively. And denote S θ and S α to be the corresponding scores for θ and α evaluated at the truth. Similar to Scharfsten et al. (1998, we can define the following tangent set in the nonparametric direction, ={f ( T, δ, Z, X : f ( T, δ, Z, X = [a(t {1 g (t exp(β Z a(sd (sdm (t, where a(t is any(p+q dimensional function of t, E[ f ( T, δ, Z, X 2 <. Here g (t = g(t,, θ and M (t = M(t,, θ. Then the efficient score S eff for θ can be defined as S eff = S θ [S θ, where [ is the projection operator. To project S θ onto, we need to find the vector a(t such that ( E [W (t a(t +{1 g (t exp(β Z a(sd (s dm (t [a (t {1 g (t exp(β Z a (sd (sdm (t =, (8 for all a.herew (t = W(t,, θ. By some simple algebra, we can show that the vector a(t which satisfies (8 is a solution to the following Fredholm integral equation of the second kind: where for t, s τ, a(t K(t, sa(sd (s = f (t, t [, τ, (9 K(t, s = E[g (s t{1 g (s t exp(2β ZY(s t E{Y(tg (t exp(β Z τ s t E[g (u{1 g (u 2 exp(3β ZY(ud (u E{Y(tg (t exp(β Z, f (t = E{W (ty(tg (t exp(β Z E{Y(tg (t exp(β Z t E[W (sg (s{1 g (s exp(2β ZY(sd (s E{Y(tg (t exp(β Z.

6 W. Lu If the kernel K(, satisfies sup t [,τ K(t, s d (s <, then according to Kress (1989, there exists a solution to this integral equation. Under mild regularity conditions, which are given in the next section, we can show that the kernel K defined above satisfies the above condition. Denote the solution by a eff (t. Then, the efficient score for θ is S eff = [W (t a eff (t +{1 g (t exp(β Z a eff (sd (sdm (t Therefore, provided that the E(S eff S eff is nonsingular, the semiparametric variance bound,,is{e(s eff S eff 1. Furthermore, we can show that ( E(S eff S eff = E W (t[w (t a eff (t+{1 g (t exp(β Z a eff (sd (s Y(tg (t exp(β Zd (t, since based on (8, ( E [W (t a eff (t +{1 g (t exp(β Z a eff (sd (s [a eff (t {1 g (t exp(β Z a eff (sd (sy(tg (t exp(β Zd (t =. 3 Estimation and asymptotic Nonparametric maximum likelihood method is used to estimate the regression parameters and the baseline cumulative hazard function. For convenience, we assume that there are no tied death times, but our results can be easily tailored to accommodate tied death times. The log likelihood function for observed data is given by n ( δ i [log{π(γ X i +log λ( T i + β Z i ( T i exp(β Z i +(1 δ i log S i ( T i,, θ, (1 where S i is obtained from S by replacing Z and X by Z i and X i, respectively. The maximum of this function does not exists if ( is restricted to be absolutely continuous. Thus, we allow ( to be discrete and replace λ(t in (1 withthe jump size of at time t, denoted by (t. Then we maximize the following modified log likelihood function to obtain our estimates:

7 Maximum likelihood estimation in the proportional hazards cure model nl n (, θ = n ( δ i [log{π(γ X i +log{ ( T i +β Z i ( T i exp(β Z i + (1 δ i log S i ( T i,, θ. (11 We observe that the maximizer ˆ n of (11 must be a step function with jumps at all the death times. Within the class of functions of this form, we can show that the maximum likelihood estimators, ˆ n and ˆθ n,of(11 exist and are finite. Then, we show that they are consistent and asymptotically normal. And the limiting variance of ˆθ n attains the semiparametric efficiency bound,. To establish our claims, we need the following conditions: Condition 1. The function (t is strictly increasing and continuously differentiable, and (τ <. Condition 2. θ lies in the interior of a compact set C and the covariate vectors Z and X are bounded in the sense that P ( Z < m and X < m = 1 for some constant m >. Condition 3. With probability one, there exists a positive constant ε such that P (C T τ Z, X >ε. Condition 4. P {Y(t = 1 Z, X is continuous in t. Remark 1 The conditions 1, 2 and 4 are the usual regularity conditions needed for establishing the large sample results for the maximum partial likelihood estimators under the proportional hazards model. Condition 3 is for the identifiability of the proportional hazards cure model on the interval [, τ, i.e. the follow-up is sufficiently long for identifying (t on the interval [, τ. 3.1 Existence Theorem 1 Assume that conditions 1 2 hold. Then the maximum likelihood estimators of l n (, θ, (, θ = ( ˆ n, ˆθ n exists and is finite. Proof Since l n is a continuous function of θ and the jump sizes of, it is equivalent to show that the jump sizes are finite (Scharfsten et al., To see this, let b 1,..., b k(n denote the jump sizes at the death times T (1 < T (2 < < T (k(n, where k(n is the total number of deaths. Then l n < 1 k(n i log{π(γ X (i +β Z (i +{log b i exp(β Z (i b j n < 1 k(n i log{π(γ X (i +β Z (i +{log b i exp( M b j n where M is a positive constant and Z (i, X (i are the covariate vectors corresponding to the ith death time, T (i. The first inequality above holds since

8 W. Lu < S i ( T i,, θ < 1 and the second inequality is due to condition 2. Therefore, l n diverges to if b j tends to for some j. This implies that the jump sizes of must be finite. Remark 2 The maximization of l n (, θin (11 can be carried out by the Nelder- Mead simplex method. Such an algorithm is available in MATLAB software with the build-in function fminsearch. Remark 3 Since ( ˆ n, ˆθ n exists and is finite, the derivative of l n with respect to the jump sizes of should be zero. This leads to the following equation for ˆ n : ˆ n (t = 1 n d N(s n Y j (s exp( ˆβ Z j {δ j + (1 δ j g j ( T j, ˆ n, ˆθ, < t τ, (12 where N(t = (1/n n N i (t and g i is obtained from g by replacing Z and X by Z i and X i, respectively. 3.2 Consistency We apply the techniques used by Murphy (1994 and Scharfsten et al. (1998 here for proving consistency. Specifically, define n (t = 1 n d N(s n Y j (s exp(β Z j{δ j + (1 δ j g j ( T j,, θ, < t τ, (13 which is a step function with jumps at each of the death times and converges uniformly to (see Lemma 2 of the Appendix. Theorem 2 Assume that conditions 1 4 hold. Then sup ˆ n (t (t a.s. and ˆθ n θ a.s. t [,τ The proof of Theorem 2 is given in Appendix. 3.3 Asymptotic distribution We extend the approach of Murphy (1995 and Scharfsten et al. (1998 to derive the asymptotic distribution of our estimators ( ˆ n, ˆθ n.herewealso work with one-dimensional submodels through the estimators and the difference at the estimators. To be specific, set d (t = {1 + dh 1(sd ˆ n (s and θ d = dh 2 + ˆθ n, where h 1 is a function and h 2 is a (p + q-dimensional vector. Furthermore, write h 2 = (h 21, h 22, where h 21 is the p-dimensional and

9 Maximum likelihood estimation in the proportional hazards cure model h 22 is the q-dimensional vectors corresponding to Z and X, respectively. Let S n ( ˆ n, ˆθ n (h 1, h 2 denote the derivative of l n with respect to d and evaluated at d =. If ( ˆ n, ˆθ n maximizes l n, then S n ( ˆ n, ˆθ n (h 1, h 2 = for all (h 1, h 2.We observe that S n ( ˆ n, ˆθ n (h 1, h 2 = S n1 ( ˆ n, ˆθ n (h 1 + S n2 ( ˆ n, ˆθ n (h 2, where S n1 ( ˆ n, ˆθ n (h 1 = 1 n [ h 1 (t {1 g i (t, ˆ n, ˆθ n exp( ˆβ Z i n { dn i (t Y i (tg i (t, ˆ n, ˆθ n exp( ˆβ Z i d ˆ (t, S n2 ( ˆ n, ˆθ n (h 2 = 1 n n h 2 W i(t, ˆ n, ˆθ n {dn i (t Y i (tg i (t, ˆ n, ˆθ n exp( ˆβ Z i d ˆ (t. h 1 (sd ˆ (s Let BV[, τ denote the space of bounded variation functions defined on [, τ. We assume that the class of h belongs to the space H = BV[, τ R p+q. For h H, we define the norm on H to be h H = h 1 v + h 2 1, where h 1 v is the absolute value of h 1 ( plus the total variation of h 1 on the interval [, τ and h 2 1 is the L 1 -norm of h 2. Define H m ={h H : h H m. Ifm =, then the inequality is strict. In addition, define, θ (h = h 1(td (t + h 2 θ. The, θ indexes the space functionals { =, θ : sup, θ <. h H m Now l (H m, where l (H m is the space of bounded real-valued functions on H m under the supremum norm U = sup h Hm U(h. The score function S n is a random map to l (H m for all finite m. Convergence in probability (denoted by P and weak convergence will be in terms of outer measure. Theorem 3 Assume that conditions 1 4 hold. Then n( ˆ n, n( ˆθ θ G (14 weakly in l (H m, where G is a tight Gaussian process in l (H m with mean zero and covariance process Cov{G(h, G(h = h 1 (tσ 1 (1 (h (td (t + h 2 σ 1 (2 (h, (15

10 W. Lu where σ = (σ 1, σ 2 is a continuous linear operator from H to H with inverse σ 1 = (σ 1 (1, σ 1 (2. The form of σ is given as follows: σ 1 (h(t = E{V(t, ϒ (hy(tg(t, ϒ exp(β [ Z τ E V(s, ϒ (hy(sg(s, ϒ {1 g(s, ϒ exp(2β Zd (s, t and [ σ 2 (h = E W(t, ϒ V(t, ϒ (hy(tg(t, ϒ exp(β Zd (t, where ϒ = (, θ and V(t,, θ = h 1 (t {1 g(t,, θ exp(β Z h 1 (sd (s + h 2W(t,, θ. The proof of Theorem 3 is given in the Appendix and follows the following theorem from (van der Vaart and Wellner 1996, Theorem In this theorem, the parameter space is a subset of l (H m and the score function is a random map S n : l (H m.letϒ = (, θ and ˆϒ n = ( ˆ n, ˆθ n. Furthermore, denote the asymptotic version of S n by S, i.e S(ϒ = E{S n (ϒ. Then we have that S n ( ˆϒ n =, S(ϒ = and ˆϒ n ϒ = o P (1 as elements in l (H m. The notation lin before a set denotes the set of all finite linear combinations of the elements of the set. Theorem 4 Assume the following: 1. (Asymptotic distribution of the score function n{s n (ϒ S(ϒ G, where G is a tight Gaussian process on l (H m. 2. (Fréchet differentiability of the asymptotic score n{s( ˆϒ n S(ϒ = nṡ(ϒ ( ˆϒ n ϒ + o P (1 + n ˆϒ n ϒ, where Ṡ(ϒ : lin{ϒ ϒ : ϒ l (H m is a continuous linear operator. 3. (Invertibility Ṡ(ϒ is continuously invertible on its range. 4. (Approximation condition n{(s n S( ˆϒ n (S n S(ϒ = o P (1 + n ˆϒ n ϒ. Then, n( ˆϒ n ϒ Ṡ(ϒ 1 G. Remark 4 From (15 of Theorem 3, we have Var{G(h = h 1 (tσ 1 (1 (h(td (t + h 2 σ 1 (2 (h, (16

11 Maximum likelihood estimation in the proportional hazards cure model which provides the asymptotic variances of many quantities of interest. For example, if we choose h 1 (t = for all t and h 2 = e i,theith unit vector, then we obtain the asymptotic variance of the ith element of ˆθ n. Alternatively, if we choose h 1 (s = I(s t and h 2 =, then we obtain the asymptotic variance of ˆ n (t. Remark 5 A natural estimator of the asymptotic variance of n( ˆθ θ, n( ˆ n (h can be given as h 1g 1 d ˆ n + h 2 g 2, where g = (g 1, g 2 is the solution to h 1 =ˆσ 1 (g and h 2 =ˆσ 2 (g, with ˆσ 1 (g and ˆσ 2 (g defined as ˆσ 1 (g(t = 1 n ˆσ 2 (g = 1 n 1 n n V i (t, ˆϒ n (gy i (tg i (t, ˆϒ n exp( ˆβ n Z i n t n V i (s, ˆϒ n (gy i (sg i (s, ˆϒ n {1 g i (s, ˆϒ n exp(2 ˆβ n Z id ˆ n (s W i (t, ˆϒ n V i (t, ˆϒ n (gy i (tg i (t, ˆϒ n exp( ˆβ n Z id ˆ n (t Here ˆσ 1 (g and ˆσ 2 (g are the empirical versions of σ 1 (g and σ 2 (g with the true parameter values θ and replaced by the maximum likelihood estimators ˆθ n and ˆ n, respectively. In the following theorem, we want to show that the asymptotic variance estimator defined above exists and converges to the true value given in (16. Theorem 5 Assume that conditions 1 4 hold, then for h H m, the solution g = ˆσ 1 (h exists with probability going to one as n increases. Furthermore, h 1g 1 d ˆ n + h 2 g 2 converges to h 1(tσ 1 (1 (h(td (t + h 2 σ 1 (2 (h in probability. The proof of Theorem 5 is given in the Appendix. Finally, we want to show that the asymptotic variance of n( ˆθ n θ achieve the semiparametric efficiency bound,, established in Sect. 2. By the Cramer-Wold device (see Serfling 198, it suffices to demonstrate that the asymptotic variance of a n( ˆθ n θ is equal to a a, where a is any vector in R p+q. To do this, we need to find an h = (h 1, h 2 such that σ 1 (h(t = for all t and σ 2 (h = a. Consider the solution, h 2 = a and h 1 (t = a eff (ta 1 (B Ia, where { B = E W 2 (t, ϒ Y(tg(t, ϒ exp(β Zd (t, ( A = E W(t, ϒ [a eff (t {1 g(t, ϒ exp(β Z a eff (sd (s Y(tg(t, ϒ exp(β Zd (t.

12 W. Lu Note that with the h define above ( σ 2 (h = E [ W(t, ϒ a eff (ta 1 (B Ia +{1 g(t, ϒ exp(β Z a eff (sa 1 (B Iad (s + W (t, ϒ a Y(tg(t, ϒ exp(β Zd (t = AA 1 (B Ia + B a = a and σ 1 (h(t = E ([ a eff (t +{1 g(t, ϒ exp(β Z a eff (sd (s Y(tg(t, ϒ exp(β Z A 1 (B Ia +E{W(t, ϒ Y(tg(t, ϒ exp(β ( Z a τ s E [ a eff (s +{1 g(s, ϒ exp(β Z a eff (ud (u t Y(sg(s, ϒ {1 g(s, ϒ exp(2β Zd (s A 1 (B Ia [ E W(s, ϒ Y(sg(s, ϒ {1 g(s, ϒ exp(2β Zd (s a t If A 1 (B I =, then σ 1 (h(t = ( E ([ a eff (t +{1 g(t, ϒ exp(β Z a eff (sd (s Y(tg(t, ϒ exp(β Z + E{W(t, ϒ Y(tg(t, ϒ exp(β Z ( s E [ a eff (s +{1 g(s, ϒ exp(β Z a eff (ud (u t Y(sg(s, ϒ {1 g(s, ϒ exp(2β Zd (s [ E W(s, ϒ Y(sg(s, ϒ {1 g(s, ϒ exp(2β Zd (s a ={a eff (t = t K(t, sa eff (sd (s f (te{y(tg(t, ϒ exp(β Z a

13 Maximum likelihood estimation in the proportional hazards cure model the last equality above holds since a eff (t is the solution to (9. To check A 1 (B I =, it is equivalent to show that B A = 1. This easily follows the definition of and some simple algebra. Therefore, we have proved that the asymptotic variance of n( ˆθ n θ achieve the semiparametric efficiency bound. 4 Concluding remarks We have developed in this paper the nonparametric maximum likelihood estimators and their associated asymptotic properties for statistical inference related to the proportional hazards cure model. And the proposed estimators for the regression parameters have been shown to be semiparametric efficient. Our methods and derivation rely on modern empirical process theory. In this paper, a logistic regression is used for modeling the cure fraction. Other parametric models, such as the probit model, can also be easily accommodated. However, the choice of such parametric models is difficult to test in practice. In addition, the covariate vector Z in the Cox model (2 isassumed to be time independent for convenience. This assumption can be relaxed to include time dependent covariates by adding the following condition on Z( (also see condition 1 of Bilias et al. (1997: Condition 2. There exists a constant B such that Z v B, where Z v is the absolute value of Z( plus the total variation of Z on the interval [, τ. Furthermore, the methods proposed here can also be applied to analysis of other semiparametric cure models. For example, the class of transformation mixture cure models considered by Lu and Ying (24, and the class of semiparametric nonmixture cure models (Tsodikov 1998, 21; Tsodikov et al. 23. In particular, Zeng et al. (26 studied a class of transformation nonmixture cure models using the nonparametric maximum likelihood estimation. The models are specified as P(T > t X = G{exp(γ XF(t where F(t is a completely unspecified distribution function and G(x is a known monotone decreasing transformation function with G( = 1. The cure probability for a subject with covariate vector X is then P(T = X = G{exp(γ X. A main difference between the above nonmixture cure model and the proportional hazards mixture cure model considered in the paper is that the parameter γ affects both the cure fraction and the underlying survival distribution of a susceptible subject in the nonmixture cure model while it only affects the cure fraction in the mixture cure model. The goodness-of-fit tests of different types of semiparametric cure models need to be further investigated and warrant future research. Acknowledgement The author is grateful to Professor Zhiliang Ying for the helpful discussion of the paper. Wenbin Lu s research was partially supported by National Science Foundation Grant DMS

14 W. Lu Appendix Lemma 1 Assume that conditions 1-4 hold, then sup n ˆ n (τ <. Proof We prove this result by contradiction. That is, suppose that sup n ˆ n (τ goes to as n increases. Then note that l n ( ˆ n, ˆθ l n ( n, θ = 1 n δ i log π( ˆγ n X i n π(γ X i + 1 n δ i ( ˆβ n β Z i n { + 1 n δ i log ˆ n ( T i n n ( T i ˆ n ( T i exp( ˆβ n Z i + n ( T i exp(β Z i + 1 n (1 δ i log S( T i, ˆ n, ˆθ n n S( T i, n, θ { = Õ p (1 + 1 n δ i log ˆ n ( T i n n ( T i ˆ n ( T i exp( ˆβ n Z i = Õ p (1 + 1 n log 1 n Y j (t exp(β n n Z j{δ j + (1 δ j g( T j, ϒ dn i (t 1 n log 1 n Y j (t exp( ˆβ n n n Z j{δ j + (1 δ j g( T j, ˆϒ n dn i (t 1 n n δ i ˆ n ( T i exp( ˆβ n Z i = Õ p (1 1 n Õ p (1 1 n n δ i ˆ n ( T i exp( ˆβ n Z i n δ i ˆ n ( T i Y i (τ exp( ˆβ n Z i Õ p (1 ˆ n (τ 1 n n δ i Y i (τ exp( ˆβ n Z i Õ p (1 exp( c ˆ n (τ 1 n n δ i Y i (τ where c is a positive constant and Õ p (1 represents quantities, which are bounded away from positive infinity with probability one as n becomes large. Since the limit of 1 n n δ i Y i (τ is E{π(γ XP(C T τ Z, X, which is bigger than according to conditions 2 and 3. Thus, as n goes to, the right-

15 Maximum likelihood estimation in the proportional hazards cure model hand side of the above inequality diverges to with probability one, which is a contradiction. Lemma 2 Assume that conditions 1 4 hold, then sup t (,τ n (t (t, a.e., and for each ω, (i (ii (iii is absolutely continuous. d ˆ nk sup t (,τ (t γ(t. d nk sup t (,τ ˆ nk (t γ(sd (s. Proof Note that M (t = N(t Y(s exp(β Zg(s, ϒ d (s is mean zero and E{Y(t exp(β Zg(t, ϒ > fort [, τ, we have (t = E{Y(s exp(β Zg(s, ϒ 1 de{n(s where E{N(t =E{ Y(s exp(β Zg(s, ϒ d (s. Then n (t (t = 1 n n Y j (s exp(β Z j{δ j + (1 δ j g j ( T j,, θ E{Y(s exp(β Zg(s, ϒ 1 de{n(s 1 d N(s By the Glivenko-Cantelli theorem, n 1 n Y j (s exp(β Z j{δ j +(1 δ j g j ( T j,, θ uniformly converges to E[Y(s exp(β Z{δ + (1 δg( T,, θ on [, τ. In addition, E{Y(t exp(β Zδ =E[I(C t exp(β ZE{I(t T C Z, X, C = E (I(C t exp(β Zπ(γ X[exp{ (t exp(β Z exp{ (C exp(β Z and E{Y(t exp(β Z(1 δg( T,, θ [ = E I(C t exp(β Zg(C,, θ E{I(T > C Z, X, C = E [I(C t exp(β Zπ(γ X exp{ (C exp(β Z

16 W. Lu Therefore, E[Y(t exp(β Z{δ + (1 δg( T,, θ = E [I(C t exp(β Zπ(γ X exp{ (t exp(β Z [ = E I(C t exp(β Zg(t,, θ E{I(T t Z, X = E{Y(t exp(β Zg(t,, θ Since inf t [,τ E{Y(t exp(β Zg(t,, θ >, by Lemma A.2 of Tsiatis (1981, we have sup t (,τ n (t (t, a.e. For (i, let f be any non-negative, bounded, continuous function. Then = f (td (t + f (td{ (t ˆ nk (t 1 f (t 1 n k Y j (t exp( ˆβ n n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk d N nk (t k 1 f (td{ 1 n k (t ˆ nk (t+m 4 f (t Y j (t d N nk (t n k where N nk (t = n k N i(t and M 4 is a positive constant. The existence of such constant M 4 is because exp( ˆβ n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk exp( c π( ˆγ n k X j exp{ ˆ nk (τ exp(c c 1 exp( c exp{ c 2 exp(c where c 1 and c 2 are two positive constants. By the Helly-Bray Lemma (p. 18 of Loeve 1963, f (td{ (t ˆ nk (t ask. Using Lemmas of A.1 and A.2 of Tsiatis (1981, we know that 1 1 n k f (t Y j (t d N nk (t n k f (t[e{y(t 1 E{Y(t exp(β Zg(t,, θ d (t as k. Since E{Y(t is bounded away from zero for all t [, τ, we know that

17 Maximum likelihood estimation in the proportional hazards cure model f (td (t M 4 f (t[e{y(t 1 E{Y(t exp(β Zg(t,, θ d (t By choosing f appropriately, this inequality implies that must be continuous at the continuity points of. Since is absolutely continuous, then so is. For(ii, since d ˆ nk (t d nk (t = 1 nk n k and we have already shown that t [,τ Y j(t exp(β Z j{δ j + (1 δ j g j ( T j,, θ 1 nk n k Y j(t exp( ˆβ n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk sup 1 n k Y j (t exp(β n Z j{δ j + (1 δ j g j ( T j,, θ k E{Y(t exp(β Zg(t,, θ converges to as k goes to. Then we need to show sup 1 n k Y j (t exp( ˆβ n n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk k E[Y(t exp(β Z{δ + (1 δg( T,, θ t [,τ converges to as k goes to. In fact, the term in above supremum norm is bounded above by 1 n k Y j (t exp( ˆβ n n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk k 1 n k Y j (t exp(β Z j {δ j + (1 δ j g j ( T j, ˆ nk, θ n k + 1 n k Y j (t exp(β Z j {δ j + (1 δ j g j ( T j, ˆ nk, θ n k 1 n k Y j (t exp(β Z j {δ j + (1 δ j g j ( T j,, θ n k

18 + 1 n k Y j (t exp(β Z j {δ j + (1 δ j g j ( T j,, θ n k E[Y(t exp(β Z{δ + (1 δg( T,, θ W. Lu M 5 ˆθ nk θ + M 6 sup ˆ nk (t (t t [,τ + 1 n k Y j (t exp(β Z j {δ j + (1 δ j g j ( T j,, θ n k E[Y(t exp(β Z{δ + (1 δg( T,, θ where M 5, M 6 are two positive constants. The first two terms converge to by the uniform consistency of ˆ nk and ˆθ nk. The third term needs to be handled more carefully. As noted by Scharfsten et al. (1998, the space of absolutely, bounded, increasing functions { (t is separable under the supremum norm. Thus, the space has a countably dense subset. Let { l, l 1, denote this set. Then we have sup 1 n k Y j (t exp(ξ β Z j {δ j + (1 δ j g j ( T j, l t [,τ n, ξ θ k E[Y(t exp(ξ β Z{δ + (1 δg( T, l, ξ θ converges to as k goes to, for each rational ξ θ = (ξ β, ξ γ and l 1. The third term can be bounded above by 1 n k Y j (t exp(β Z j {δ j + (1 δ j g j ( T j,, θ n k 1 n k Y j (t exp(ξ β Z j {δ j + (1 δ j g j ( T j,, ξ θ n k + 1 n k Y j (t exp(ξ β Z j {δ j + (1 δ j g j ( T j,, ξ θ n k 1 n k Y j (t exp(ξ β Z j {δ j + (1 δ j g j ( T j, l n, ξ θ k

19 Maximum likelihood estimation in the proportional hazards cure model + 1 n k Y j (t exp(ξ β Z j {δ j + (1 δ j g j ( T j, l n, ξ θ k E[Y(t exp(ξ β Z{δ + (1 δg( T, l, ξ θ M 7 θ ξ θ + M 8 sup (t l (t t [,τ + 1 n k Y j (t exp(ξ β Z j {δ j + (1 δ j g j ( T j, l n, ξ θ k E[Y(t exp(ξ β Z{δ + (1 δg( T, l, ξ θ where M 7, M 8 are two positive constants. By appropriate choice of ξ and l,the right-hand side of the above inequality converges to in supremum norm. So, (ii holds. For(iii, since ˆ nk (t γ(sd (s [ = 1 n k 1 Y j (s exp( ˆβ n n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk d N (s nk k γ(sd (s [ = 1 n k 1 Y j (s exp( ˆβ n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk d N (s nk n k ( E[Y(s exp(β Z{δ + (1 δg ( T 1dEN(s {[ 1 n k n k + [ sup 1 n k t [,τ 1 Y j (s exp( ˆβ n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk ( E[Y(s exp(β Z{δ + (1 δg ( T 1 d N nk (s ( E[Y(s exp(β Z{δ + (1 δg ( T 1 d{ N nk (s EN(s n k 1 Y j (t exp( ˆβ n k Z j {δ j + (1 δ j g j ( T j, ˆ nk, ˆθ nk

20 ( 1 E[Y(t exp(β Z{δ + (1 δg ( T ( 1d{ + E[Y(s exp(β Z{δ + (1 δg ( T N nk (s EN(s W. Lu the first term was shown to converge to in (ii and the second term converges to zero by the Helly-Bray Lemma. In addition, pointwise convergence can be strengthened to uniform convergence by applying the same monotonicity argument used in the proof of the Glivenko-Cantelli Theorem (p. 96 of Shorack and Wellner Proof of Theorem 2 We first show that ( ˆ n, ˆθ n converges to (, θ a.s. To do this, we need to show the following three things: (i sup n ˆ n (τ < ; (ii there exists a convergent subsequence of ( ˆ n, ˆθ n,say( ˆ nk, ˆθ nk (, θ a.s.; (iii (, θ = (, θ. The proof of (i is given in Lemma 1 of the Appendix. For (ii, since every bounded sequence in R p+q has a convergent subsequence, there exists a θ such that ˆθ mk θ. By Helly s theorem (Ash 1972, there exists a function and a subsequence { ˆ nk of { ˆ mk such that ˆ nk for all t [, τ at which is continuous. Therefore, ( ˆ nk, ˆθ nk must converge to (, θ. For (iii, we show in Lemma 2 of the Appendix that is continuous at the continuity points of. And we know that l nk ( ˆ nk, ˆθ nk l nk ( nk, θ = 1 n k log{χ nk,i(t{dn i (t Y i (tg i (t, nk, θ exp(β n Z id nk (t k where + 1 n k n k [log{χ nk,i(t {χ nk,i(t 1Y i (tg i (t, nk, θ exp(β Z id nk (t, χ nk,i(t = ˆ nk (tg i ( ˆ nk, ˆθ nk exp( ˆβ n k Z i nk (tg i ( nk, θ exp(β Z i. The second term on the right-hand side of the above inequality is less or equal to zero since for x >, log(x (x 1. Using the results and techniques of Lemma 2 of the Appendix, the first term can be shown to converge to zero and the second term converges to ( [ { g (t exp(β Z E log g (t exp(β γ(t Z Y(tg (t exp(β Zd (t { g (t exp(β Z g (t exp(β γ(t 1 Z, (17

21 Maximum likelihood estimation in the proportional hazards cure model where g (t = g(t,, θ and γ(t = E[Y(t exp(β Z{δ + (1 δg ( T E[Y(t exp(β Z{δ + (1 δg ( T. (18 Note that (17 is the negative Kullback-Leibler information, E[l(, θ E[l(, θ. Due to the above inequality, the Kullback-Leibler information must equal zero. Thus, with probability one, we have log{λ (t exp(β Zg (tdn(t Y(t exp(β Zg (td (t = log{λ (t exp(β Zg (tdn(t Y(t exp(β Zg (td (t, where λ (t = d (t/dt. This equality holds for the following two cases: (i Y(τ = 1, N(τ =, and (ii Y(τ = 1, N(t =, and N(τ = 1for t (, τ. The difference between the equalities from these two cases entails that λ (t exp(β Zg (t = λ (t exp(β Zg (t, t (, τ. After integrating form to t on both sides of the above equality, we have which implies log{s(t,, θ =log{s(t,, θ, 1 exp{ (t exp(β z 1 exp{ (t exp(β z = π(γ X π(γ X, t (, τ. Since the right-hand side of the above equality is independent of t, we have that = and β = β. This further implies that γ = γ. Therefore, ( ˆ nk, ˆθ nk must converge to (, θ a.s. By Helly s theorem, we know that ( ˆ n, ˆθ n must converge to (, θ a.s. Furthermore, the point-wise convergence can be strengthened to uniform convergence by applying the same monotonicity argument used in the proof of the Glivenko-Cantelli Theorem (Shorack and Wellner Proof of Theorem 3 To prove Theorem 3, we validate each of the four conditions of Theorem 4. First, note that S = S 1 + S 2, where ( [ S 1 (ϒ = E h 1 (t {1 g(t, ϒ exp(β Z h 1 (sd (s {dn(t Y(tg(t, ϒexp(β Zd (t, [ S 2 (ϒ = E h 2 W(t, ϒ{dN(t Y(tg(t, ϒexp(β Zd (t.

22 W. Lu For ϒ ϒ l (H m, we need the following bounds on ϒ ϒ : m θ θ 1 m ϒ ϒ m θ θ 1 2m Step 1. We want to establish condition 1 for all finite m. To do this, we show that the class of score function S {S (ϒ h : h H m is Donsker, where S (ϒh = S (ϒ(h 1 + h 2 S θ (ϒ,with S (ϒ(h 1 = [ h 1 (t {1 g(t, ϒ exp(β Z {dn(t Y(tg(t, ϒexp(β Zd (t h 1 (sd (s T = δh 1 ( T {δ + (1 δg( T, ϒ exp(β Z h 1 (td (t, { S θ (ϒ = W(t, ϒ dn(t Y(tg(t, ϒexp(β Zd (t Conditions 1 and 2 imply that S θ (ϒ is a uniformly bounded function, which implies that {h 2 S θ (ϒ : h R p+q, h 2 1 m is Donsker (see Example of van der Vaart and Wellner Since the sum of bounded Donsker classes is Donsker, the class {S (ϒ (h 1 : h 1 BV[, τ, h 1 v m is Donsker if the following two classes F 1 ={δh 1 ( T : h 1 BV[, τ, h 1 v m, { T F 2 = {δ+(1 δg( T, ϒ exp(β Z h 1 (td (t : h 1 BV[, τ, h 1 v m are Donsker and uniformly bounded. The class F 1 is uniformly bounded and Donsker since h 1 varies over bounded variation functions (see Example of van der Vaart and Wellner, In addition, F 2 equals a uniformly bounded function times the class { T h 1(td (t : h 1 BV[, τ, h 1 v m, and this class is Donsker because is a monotone function (see Example of van der Vaart and Wellner Also F 2 is uniformly bounded because h 1 varies over bounded variation functions. Thus, we conclude that S is Donsker, so the first condition holds. Step 2. To check condition 2, it suffices to show that S(ϒ S(ϒ + Ṡ(ϒ (ϒ ϒ is o( ϒ ϒ as ϒ ϒ goes to. To do this, write S(ϒ linearly in d( and θ θ. To be specific,

23 Maximum likelihood estimation in the proportional hazards cure model ( S 1 (ϒ(h = (θ θ E W(t, ϒ [h 1 (t {1 g(t, ϒ exp(β Z h 1 (sd (sy(tg(t, ϒ exp(β Zd (t ( +E [h 1 (t {1 g(t, ϒ exp(β Z h 1 (sd (s Y(tg(t, ϒ exp(β Zd{ (t (t ( E [h 1 (t {1 g(t, ϒ exp(β Z h 1 (sd (s { (t (ty(tg(t, ϒ {1 g(t, ϒ exp(2β Zd (t + error 1 (ϒ(h, { S 2 (ϒ(h = (θ θ E W(t, ϒ h 2 W(t, ϒ d (t [ +E h 2 W(t, ϒ Y(tg(t, ϒ exp(β Zd{ (t (t [ E h 2 W(t, ϒ { (t (t Y(tg(t, ϒ {1 g(t, ϒ exp(2β Zd (t + error 2 (ϒ(h. And the error terms can be very easily shown to satisfy sup h H m error i (ϒ(h θ θ 1 as θ θ 1, where i = 1, 2. This follows from the boundedness of Z, X, N, Y, θ and. Furthermore, S(ϒ S(ϒ + Ṡ(ϒ (ϒ ϒ ϒ ϒ error 1(ϒ(h + error 2 (ϒ(h m θ θ 1 m Since m θ θ 1 m as ϒ ϒ, we conclude that S(ϒ S(ϒ + Ṡ(ϒ (ϒ ϒ is o( ϒ ϒ as ϒ ϒ goes to. Note that S(ϒ =, combining S 1 and S 2 we have Ṡ(ϒ ( ˆϒ n ϒ (h = σ 1 (h(td( ˆ n (t + ( ˆθ n θ σ 2 (h. (19 Step 3. To check condition 3, we need to show that Ṡ(ϒ is continuously invertible. Following Scharfsten et al. (1998, we need to show that σ(h is invertible

24 W. Lu everywhere due to the discreteness of ˆ n. To do this, we first show that σ(h is one-to-one map on L 2 (d R p+q, i.e. σ 1 (h(th 1 (td (t + h 2 σ 2(h = (2 implies that h 2 = and h 1 (t = almost everywhere (d. Plug σ 1 (h(t and σ 2 (h into the above equation, we have ( = E t [ h 1 (t V(t, ϒ Y(tg(t, ϒ exp(β Z V(s, ϒ Y(tg(s, ϒ {1 g(s, ϒ exp(2β Zd (s d (t { + h 2 E W(t, ϒ V(t, ϒ Y(tg(t, ϒ exp(β Zd (t [ = E {h 1 (t + h 2 W(t, ϒ V(t, ϒ Y(tg(t, ϒ exp(β Zd (t [ E h 1 (sd (sv(t, ϒ Y(tg(t, ϒ {1 g(t, ϒ exp(2β Zd (t [ = E V 2 (t, ϒY(tg(t, ϒ exp(β Zd (t. Since Y(tg(t, ϒ exp(β Z>a.e.on[, τ, it implies V(t, ϒ = a.e.(d. Therefore, for almost all ω, h 1 (t +{1 g(t, ϒ [h 22 X(ω t exp{β Z(ω {h 21 Z(ω + h 1(sd (s = h 21 Z(ω a.e. (d. From this, h 21 must be zero. It further implies that h 1 (t 1 g(t, ϒ exp{β Z(ω h 1 (sd (s = h 22 X(ω a.e. (d. Similarly, h 22 must be zero. With h 2 = (h 21, h 22 =, we have h 1 (t {1 g(t, ϒ exp(β Z h 1(sd (s = a.e.(d. This is a Volterra integral equation of the first kind. It is easy to show that the solution, h 1 (, to this equation must be zero a.e. (d. Now based on the fact (2, we want to show that σ is one-to-one everywhere, i.e. set σ = and show that h 2 = and h 1 (t = for all t on [, τ. Ifσ 1 (h(t = for all t and σ 2 (h =, from (2 we have h 2 = and h 1 (t = a.e.(d. Then

25 Maximum likelihood estimation in the proportional hazards cure model = σ 1 (h(t = E{V(t, ϒ Y(tg(t, ϒ exp(β [ Z τ E V(s, ϒ g(s, ϒ {1 g(s, ϒ exp(2β Zd (s. t Based on the fact h 1 (td (t = for all t on [, τ, we have σ 1 (h(t = h 1 (te{y(tg(t, ϒ exp(β Z =. Since E{Y(tg(t, ϒ exp(β Z > for all t on [, τ, we conclude that h 1(t = for all t. Therefore, σ is one-to-one everywhere. Then, we want to show that σ, as a continuous linear operator from H to H, has a continuous inverse. Note that H is a Banach space, if σ is invertible, then the inverse will be continuous (see p. 149, Luenberger, To show that σ is invertible, since σ is one-to-one, we only need to show that it can be written as the difference of a bounded, linear operator with a bounded inverse and a compact, linear operator (see Corollary 3.8 and Theorem 3.4 of Kress, To do this, we define the following linear operator (h(t = ( h 1 (te{y(t exp(β Zg(t, ϒ, { h 2 E W 2 (t, ϒ Y(t exp(β Zg(t, ϒ d (t This is a bounded linear operator due to the boundedness of Z, X, θ, Y(t and (t. In addition, under conditions 1 4, E{Y(t exp(β Zg(t, ϒ >ɛon [, τ for some ɛ> and W 2 (t, ϒ is positive definite a.e. on [, τ, which implies { E W 2 (t, ϒ Y(t exp(β Zg(t, ϒ d (t is a positive definite matrix. Hence, the inverse of (h(t exists, which is given by 1 (h(t = ( h 1 (te{y(t exp(β Zg(t, ϒ 1, { 1 h 2 E W 2 (t, ϒ Y(t exp(β Zg(t, ϒ d (t and it is also a bounded linear operator. Now we want to show that (h σ(h is compact. Let {h n be a sequence in H m. Then it only needs to show that there exists a convergent subsequence of (h n σ(h n. Since h 1n is of bounded variation, we can write h 1n as the difference of increasing functions (see Lemma of Ash In addition, both of these increasing functions are bounded in absolute value by 2m.This

26 W. Lu implies that there exists a pointwise convergent subsequence according to Helly s theorem. Let {h nk be the convergent subsequence with limit h.wemust prove that (h nk σ(h nk converges to (h σ(h in h H norm. To do this, note that (h σ(h can be written as ( ([{ E 1 g(t, ϒ exp(β Z [ +E t ( E h 1 (sd (s h 2 W(t, ϒ Y(tg(t, ϒ exp(β Z V(s, ϒ Y(sg(s, ϒ {1 g(s, ϒ exp(2β Zd (s, W(t, ϒ [h 1 (t {1 g(t, ϒ exp(β Z Y(tg(t, ϒ exp(β Zd (t h 1 (sd (s Therefore, (h nk σ(h nk (h + σ(h H E ([{1 g(t, ϒ exp(β Z (h 1nk h 1 (sd (s (h 2nk h 2 W(t, ϒ Y(tg(t, ϒ exp(β Z ( s +E [(h 1nk h 1 (s {1 g(s, ϒ exp(β Z (h 1nk h 1 (ud (u +(h 2nk h 2 W(s, ϒ Y(sg(s, ϒ {1 g(s, ϒ exp(2β Zd (s ( + E W(t, ϒ [(h 1nk h 1 (t {1 g(t, ϒ exp(β Z (h 1nk h 1 (sd (sy(tg(t, ϒ exp(β Zd (t Due to the boundedness of Z, X, θ, Y(t and (t and the fact that < g(t, ϒ <1on[, τ, the right-hand side of the above inequality is bounded above by M 1 h 2nk h 2 1+ {M 2 (h 1nk h 1 (t +M 3 v H (h 1nk h 1 (s d (sd (t where M 1, M 2 and M 3 are some positive constants. Then applying the dominated convergence theorem, we can show that this sum converges to zero. Hence, (h σ(h is a compact linear operator from H m onto its range for all finite m. Now following the similar steps (14 and (15 of Scharfsten et al. (1998, we can show that Ṡ(ϒ is continuously invertible on its range.

27 Maximum likelihood estimation in the proportional hazards cure model Step 4. To check condition 4, it suffices to show (by Lemma 1 of van der Vaart 1995 that F ={S (ϒ(h S (ϒ (h : h H m, ϒ ϒ <ɛ is Donsker for some ɛ> and sup h Hm E[{S (ϒ(h S (ϒ (h 2 converges to as ϒ converges to ϒ. We can write F = F 3 + F 4, where { { F 3 = h 2 W(t, ϒdM(t, ϒ F 4 = W(t, ϒ dm(t, ϒ : h 2 R p+q, h 2 1 m, θ [ ɛ, ɛ p+q, nonnegative, increasing with (τ 2 (τ { {δ + (1 δg( T, ϒ exp(β Z T T h 1 (td (t {δ + (1 δg( T, ϒ exp(β Z h 1 (td (t : h 1 BV[, τ, h 1 v m, θ [ ɛ, ɛ p+q, nonnegative, increasing with (τ 2 (τ We want to show that F 3 and F 4 are Donsker with uniformly bounded envelopes. Using the result from empirical process theory, that classes of smooth functions are Donsker (see Theorem of van der Vaart and Wellner 1996, we know that the following two classes { δw(t, ϒ : θ [ ɛ, ɛ p+q, nonnegative, increasing with (τ 2 (τ and { W(t, ϒY(tg(t, ϒexp(β Zd (t : θ [ ɛ, ɛ p+q, nonnegative, increasing with (τ 2 (τ are Donsker. In addition, both classes are uniformly bounded due to the boundedness of Z, X, θ, Y(t and (t. Hence, according to the result, that classes of Lipschitz transformations of Donsker classes with integrable envelope functions are Donsker (see Theorem of van der Vaart and Wellner (1996, F 3 is Donsker with uniformly bounded envelope. Theorem of van der Vaart and Wellner (1996 also implies that the class BV M of all real-valued functions on [, τ, which are uniformly bounded by a constant M and are of variation bounded by M, is Donsker. Since T h 1 (td (t h 1 v <, v

28 W. Lu the class { T h 1 (td (t : h 1 BV[, τ, h 1 v m, nonnegative, increasing with (τ 2 (τ is Donsker and uniformly bounded. In addition, since {δ + (1 δg( T, ϒ exp(β Z is uniformly bounded, F 4 is also Donsker with uniformly bounded envelope (see Example of van der Vaart and Wellner Thus, we have shown that all the four conditions in Theorem 4 hold. Then, we know that for all finite m Ṡ(ϒ n( ˆϒ n ϒ (h = σ 1 (h(td( n ˆ n (t n( ˆθ n θ σ 2 (h = n{s( ˆϒ n S(ϒ +o P (1 = n{s n ( ˆϒ n S n (ϒ +o P (1 = n{s n (ϒ S(ϒ +o P (1 uniformly in h H m, where the last equality in the above equation holds since S n ( ˆϒ n = S(ϒ =. Hence, n( ˆϒ n ϒ Ṡ(ϒ 1 G. Following the similar steps of Scharfsten et al. (1998, we can show that Ṡ(ϒ 1 G = G, where G is defined in Theorem 3. Proof of Theorem 5 Since σ is continuously invertible on its range, ˆσ is continuously invertible on a set of probability going to 1. According to Scharfsten et al. (1998, it suffices to show that sup h H =1 ˆσ(h σ(h H converges in probability to. The most difficult part in the proof is to deal with the terms in ˆσ(h σ(h involving h 1. For example, one such term can be written as 1 n n [ h 1 (t {1 g i (t, ˆϒ n exp( ˆβ n Z i W i (t, ˆϒ n Y i (tg i (t, ˆϒ n exp( ˆβ n Z id ˆ n (t ( E [h 1 (t {1 g(t, ϒ exp(β Z W(t, ϒ Y i (tg(t, ϒ exp(β Zd (t h 1 (sd ˆ n (s h 1 (sd (s

29 Maximum likelihood estimation in the proportional hazards cure model The variation of this term can be bounded by the variation of 1 n n ( [ h 1 (t {1 g i (t, ˆϒ n exp( ˆβ n Z i W i (t, ˆϒ n Y i (tg i (t, ˆϒ n exp( ˆβ n Z id ˆ n (t [h 1 (t {1 g i (t, ϒ exp(β Z W i (t, ϒ Y i (tg i (t, ϒ exp(β Zd (t h 1 (sd ˆ n (s h 1 (sd (s plus the variation of 1 n n { [h 1 (t {1 g i (t, ϒ exp(β Z W i (t, ϒ Y i (tg i (t, ϒ exp(β ( Zd (t τ E [h 1 (t {1 g(t, ϒ exp(β Z W(t, ϒ Y i (tg(t, ϒ exp(β Zd (t h 1 (sd (s h 1 (sd (s Uniform consistency of ˆϒ implies the first term converges to zero. The variation of the second term is bounded by 1 n n W i (t, ϒ Y i (tg i (t, ϒ exp(β Z E{W(t, ϒ Y i (tg(t, ϒ exp(β Z v d (t h 1 v + 1 n n W i (t, ϒ Y i (tg i (t, ϒ {1 g i (t, ϒ exp(2β Z E[W(t, ϒ Y i (tg(t, ϒ {1 g(t, ϒ exp(2β Z v d (t h 1 v (τ The integrand converges to zero by the strong law of large numbers. Thus the term converges to zero by the dominated convergence theorem, since h 1 v 1 and (τ is finite. All other terms in ˆσ(h σ(h can be handled similarly. The details of the proof is omitted here.

30 W. Lu References Andersen, P. K., Borgan, Ø., Gill, R., Keiding, N. (1993. Statistical models based on counting processes. New York: Springer Ash, R.B. (1972. Real analysis and probability. San Diego: Academic Berkson, J., Gage, R. P. (1952. Survival curve for cancer patients following treatment. Journal of the American Statistical Association, 47, Bickel, P. J., Klaassen, C. A. J., Ritov, Y., Wellner, J. A. (1993. Efficient and adaptive estimation for semiparametric models. Baltimore: Johns Hopkins University Press. Bilias, Y., Gu, M., Ying, Z. (1997. Towards a general asymptotic theory for Cox model with staggered entry. Annals of Statistics, 25, Chen, K., Jin, Z., Ying, Z. (22. Semiparametric of transformation models with censored data. Biometrika, 89, Cox, D. R. (1972. Regression models and life tables (with Discussion. Journal of the Royal Statistical Society Series B, 34, Fang, H. B., Li, G., Sun, J. (25. Maximum likelihood estimation in a semiparametric logistic/proportional-hazards mixture model. Scandinavian Journal of Statistics, 32, Farewell, V. T. (1982. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics, 38, Farewell, V. T. (1986. Mixture models in survival analysis: are they worth the risk? Canadian Journal of Statistics, 14, Fleming, T. R., Harrington, D. P. (1991. Counting processes and survival analysis. New York: Wiley. Kress, R. (1989. Linear integral equations. New York: Springer. Kuk, A. Y. C., Chen, C.-H. (1992. A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79, Loeve, M. (1963. Probability theory. Princeton: D. Van Nordstrand Company. Lu, W., Ying, Z. (24. On semiparametric transformation cure models. Biometrika, 91, Luenberger, D. G. (1969. Optimization by vector space methods. New York: Wiley. Maller, R. A., Zhou, S. (1996. Survival analysis with long term survivors. New York: Wiley. Murphy, S. A. (1994. Consistency in a proportional hazards model incorporating a random effect. Annals of Statistics, 22, Murphy, S. A. (1995. Asymptotic theory for the frailty model. Annals of Statistics, 23, Peng, Y., Dear, K. B. G. (2. A nonparametric mixture model for cure rate estimation. Biometrics, 56, Scharfsten, D. O., Tsiatis, A. A., Gilbert, P. B. (1998. Semiparametric efficient estimation in the generalized odds-rate class of regression models for right-censored time-to-event data. Lifetime Data Analysis, 4, Serfling, R. J. (198. Approximation theorems of mathematical statistics. New York: Wiley. Shorack, G. R., Wellner, J. A. (1986. Empirical processes with applications to statistics. New York: Wiley. Sy, J. P., Taylor, J. M. G. (2. Estimation in a Cox proportional hazards cure model. Biometrics, 56, Tsiatis, A. A. (1981. A large sample study of Cox s regression model. Annals of Statistics, 9, Tsodikov, A. D. (1998. A proportional hazards model taking account of long-term survivors. Biometrics, 54, Tsodikov, A. D. (21. Estimation of survival based on proportional hazards when cure is a possibility. Mathematical and Computer Modelling, 33, Tsodikov, A. D., Ibrahim, J. G., Yakovlev, A. Y. (23. Estimating cure rates from survival data: an alternative to two-component mixture models. Journal of the American Statistical Association, 98, van der Vaart, A. W. (1995. Efficiency of infinite dimensional M-estimators. Statistica Neerlandica, 49, 9 3. van der Vaart, A. W., Wellner, J. A. (1996. Weak convergence and empirical Processes. New York: Springer. Zeng, D., Ying, G., Ibrahim, J. G. (26. Semiparametric transformation models for survival data with a cure fraction. Journal of the American Statistical Association, 11,

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows