A semiparametric generalized proportional hazards model for right-censored data

Size: px

Start display at page:

Download "A semiparametric generalized proportional hazards model for right-censored data"

Doreen Adams
5 years ago
Views:

1 Ann Inst Stat Math (216) 68: DOI 1.17/s A semiparametric generalized proportional hazards model for right-censored data M. L. Avendaño M. C. Pardo Received: 28 March 213 / Revised: 3 September 214 / Published online: 21 November 214 The Institute of Statistical Mathematics, Tokyo 214 Abstract We introduce a flexible family of semiparametric generalized logit-based regression models for survival analysis. Its hazard rates are proportional as the Cox model, but its relative risk related to a covariate is different for the values of the other covariates. The method of partial likelihood approach is applied to estimate its parameters in presence of right censoring and its asymptotic normality is established. We perform a simulation study to evaluate the finite-sample performance of these estimators. This new family of models is illustrated with lung cancer data and compared with Cox model. The importance of the conclusions obtained from the relative risk is pointed out. Keywords Survival analysis Proportional hazards Type-I generalized logistic distribution Semiparametric models Profile likelihood Partial likelihood 1 Introduction Analysis of event times (also referred as survival analysis) deals with data representing the time to a well-defined event. These data arise in engineering, economy, reliability, public health, biomedicine and other areas. Data arising from survival analysis often M. L. Avendaño (B) Faculty of Mathematics, University of Veracruz, Circuito Gonzalo Aguirre Beltrán, s/n, Zona Universitaria, 919 Xalapa, Mexico maravend@ucm.es M. C. Pardo Department of Statistics and Operational Research I, Faculty of Mathematics, Complutense University of Madrid, Plaza de las Ciencias 3, Ciudad Universitaria, 284 Madrid, Spain

2 354 M. L. Avendaño, M. C. Pardo consist of a response variable that measures the duration of time until the occurrence of a specific event and a set of variables (covariates) thought to be associated with the event-time variable, they have some features that present difficulties to traditional statistical methods. The first is that the data are generally asymmetrically distributed, while the second feature is that lifetimes are frequently censored (the end point of interest has not been observed for that individual). Regression models for survival data have traditionally been based on the Cox regression model (Cox 1972). One of the reasons for the popularity of this model is because the unknown parameter can be estimated by partial likelihood without putting a structure on baseline hazard. This model requires that the hazards between any two individuals are proportional across time. There are many works whose aim has been to extend the Cox model in different ways. Some of them appealing the necessity to allow non-proportional hazards such as Aranda-Ordaz (1983), Tibshirani and Ciampi (1983), Etezadi-Amoli and Ciampi (1987), Thomas (1986), Sasieni (1995), Clayton (1978), Hougaard (1984), Younes and Lachin (1997), MacKenzie (1996, 1997) and Devarajan and Ebrahimi (211). However, when the assumption of proportional hazards is tenable, a Cox regression model is usually the preferred model. Therefore, proportional hazards models that maintain the good properties of the Cox model, but give flexibility to the model are welcome. As mentioned above, regression models for survival data have traditionally been based on the proportional hazards model of Cox (1972) which is defined through the hazard function λ(t z) of the form λ(t β, z) = λ (t) exp(β z), (1) where λ ( ) is an arbitrary function of time called baseline hazard function, z = (z 1,...,z d ) is a vector of covariates for the individual at time t, and β = (β 1,...,β d ) is a vector of unknown parameters to be estimated. In case that the baseline hazard function is treated non-parametrically, then this model becomes a semiparametric model. In this paper, we introduce a flexible semiparametric family of proportional hazards models based on replacing the exponential link function in (1) by a generalization of the logistic distribution (see Balakrishnan 1992) called Type-I generalized logistic distribution in Sect. 2. The partial likelihood approach is proposed to estimate the covariate effects of the model considering right-censored data and an application is shown in Sects. 3 and 4, respectively. To prove the asymptotic normality of the partial likelihood estimator in our model, first we establish the equivalence of the partial likelihood estimator and the profile likelihood estimator for our model. Second, the efficiency of the profile likelihood estimator for our semiparametric model is proven. This proof is based on the method of an approximate least favorable submodel used by Murphy and van der Vaart (2) with the Cox regression model and it is described in Sect. 5, but developed in an Appendix. Finally, the small sample performance of the maximum partial likelihood estimators of the regression parameters in threeand four-covariate hazard function models is evaluated through a simulation study in Sect. 6.

3 A semiparametric generalized PH model for right-censoring The generalized proportional hazards model The Cox model given in (1) can be generalized by replacing the exponential function in (1) by a generalization of the logistic distribution called Type-I generalized logistic distribution which is given by ( ) 1 α F(y) = 1 + e y, < y <, α>, where the parameter α is a proportionality constant for the distribution. See Balakrishnan (1992). Therefore, we propose a proportional hazards model defined through the hazard function λ(t θ, z) = λ (t)k (θ, z), (2) with ( ) 1 α K (θ, z) = 1 + exp( β, α >, z) where θ = (β,α), which we call generalized logit-link proportional hazards model. In fact, if α = orβ =, the model reduces to a non-parametric form. The cumulative hazard function for the model (2) is given by (t θ, z) = (t)k (θ, z), where (t) = t λ (u)du is the baseline cumulative hazard function. Thus, the survival function corresponding to the hazard model is S(t θ, z) = exp( (t θ, z)), and Eq. (2) characterizes the model with density given by f (t θ, z) = exp( (t θ, z))λ(t θ, z). Note that, in the particular case α = 1, we obtain the proportional hazards model with a logit-link function exp(β z) λ(t β, z) = λ (t) 1 + exp(β z). For two-covariate profiles z i and z j, the hazard ratio is proportional and the relative risk does not depend on t as

4 356 M. L. Avendaño, M. C. Pardo ρ(t θ, z i, z j ) = λ(t θ, z i) λ(t θ, z j ) ( 1 + exp( β z j ) = 1 + exp( β z i ) = ρ(t β,α,z i, z j ). The interpretation of the hazard ratio of model (2) is similar to the Cox model hazard ratio since if ρ(t θ, z i, z j ) = 1, then the individuals with covariates z i and z j have the same relative risk of death and if ρ(t θ, z i, z j )<1(>1), the individual with covariate z i has lower (higher) relative risk of death than the individual with covariate z j. As particular case, when the covariate z m increases z m +1 and the rest of covariates remain equal the relative risk is equal (lower or higher) if and only if β m = (< or >), respectively, and the parameter α is a proportionality parameter for the model since the relative risk increases or decreases when α increases depending if the relative risk is less or greater than 1. Moreover, this parameter gives flexibility to the model to adapt better to the data. For illustrative purposes, we consider a two-covariate model, the first one a continuous covariate and the second one a 4-level factor covariate. The 4-level covariate is transformed in an indicator 3-dimensional vector. So, we obtain a full model given by ( ) 1 α λ(t θ, z) = λ (t), α >. 1 + exp( (β 1 z 1 + β 2 z 2 + β 3 z 3 + β 4 z 4 )) In this case, if we take the relative risk of individuals with values z 1 = z 1i and z 1 = z 1 j, respectively, at the first covariate and the same level at the second covariate, we have that the relative risk depend on the level of the second covariate, that is: ) α Level I: Level II: Level III: Level IV: ) α ( 1 + exp( z1 j β 1 ) 1 + exp( z 1i β 1 ) ( 1 + exp( z1 j β 1 β 2 ) ) α 1 + exp( z 1i ˆβ 1 β 2 ) ( ) 1 + exp( z1 j β 1 β 3 ) α 1 + exp( z 1i β 1 β 3 ) ( ) 1 + exp( z1 j β 1 β 4 ) α. 1 + exp( z 1i β 1 β 4 ) Note that, in this case, the relative risk under Cox model (1)isgivenby ρ Cox (t β, z i, z j ) = exp(β 1 (z 1i z 1 j )) which does not depend on the second covariate level. Therefore, the proposed model gives more flexibility to the relative risk.

5 A semiparametric generalized PH model for right-censoring 357 Up to now, we have not made any assumption about the baseline hazard function λ ( ), but in the following we assume an unknown functional form for the baseline hazard function of the model, so we have a semiparametric model. 3 Estimation procedure Consider a right-censored data sample of size n where the observed data are the iid triplets (X i,δ i, Z i ) where X i = min(t i, C i ) is the observed time with T i and C i the survival and censoring time, respectively, δ i = 1 {Ti C i } is the indicator variable of censoring and Z i is the vector of covariates for i = 1,...,n. In this case, analogously to Cox (1972), we can construct the partial likelihood function for the sample, and it can be written as L n (θ) = n i=1 K (θ, z i ) K (θ, z j ) j R(x i ) where R(x i ) is the risk set at time x i, and the partial log-likelihood function is given by l n (θ) = ln(l n (θ)) n = ln(k (θ, z i )) ln δ i i=1 j R(x i ) δ i, K (θ, z j ). (3) Then, the maximum partial log-likelihood estimator of the unknown vector of parameters of model (2)isgivenby ˆθ n = ( ˆβ n, ˆα n ) = arg max l n(θ), (4) θ=(β,α>) and the variance of the parameter estimator ˆθ n is var(ˆθ n ) = diag(î 1 n (ˆθ n )), where Î n is the observed information matrix. To obtain (4), we would need to use numerical methods for carrying out the required constrained (α >) optimization. However, the numerical algorithms are high sensitive to initial values of the parameters and they fall at local optimums. To avoid it, we consider {α 1,...,α l } a grid for the value of the parameter α then we maximize the partial log-likelihood function so we obtain l estimations of β, {β 1,...,β l } with partial log-likelihood function values {l n (α 1, β 1 ),...,l n (α l, β l )}. Finally, we consider arg max{l n (α 1, β 1 ),..., l n (α l, β l )} as estimations of the parameters (α, β).

6 l l l 358 M. L. Avendaño, M. C. Pardo beta alpha beta alpha alpha beta Fig. 1 Surface for partial log-likelihood function To check how the method employed works and also if there exists an identifiable problem for β and α, weshowinfig.1 a surface with the values of the partial loglikelihood function for different values of α and β 1 for a data set of size 5 generated from our model with the covariate z 1 a Bernoulli of parameter.45 and z 2 = for α = 2.5 and β 1 = β 2 = 1. It shows clearly that the method employed works very well. Furthermore, we graph the level curves for the partial log-likelihood function in Fig. 2. The true value of the parameters is the red square and the blue circle is that which achieves the maximum value for the partial log-likelihood function. It shows there is non-identifiability issue even if we have only one covariate. 4 Veteran s administration lung cancer data We present results from fitting the generalized proportional hazards model to a dataset from the Veteran s Administration Lung Cancer Study Clinical Trial (Kalbfleisch and Prentice 22). In this clinical trial, males with advanced inoperable lung cancer were randomized to either a standard or test chemotherapy. The primary end point for therapy comparison was time to death. Only nine out of the 137 survival times were censored, so the censoring rate is 6.6 %. The dataset includes six covariates: treatment (1 = standard and 2 = test), age at diagnosis (in years), Karnofsky score (from to 1), diagnosis time, cell type (1 = Squamous, 2 = Small, 3 = Adenocarcinoma and 4 = Large) and prior therapy. The primary purpose of this clinical trial was to

7 A semiparametric generalized PH model for right-censoring 359 Partial log likelihood Partial log likelihood α α β 1 5 levels β 1 1 levels Fig. 2 Level curves for partial log-likelihood function Table 1 Fit of models for veteran s administration lung cancer data Covariate PH GLPH Est SE Est SE Karnofsky Cell type Cell type Cell type α AIC investigate whether the new chemotherapy works better or worse than the standard chemotherapy after adjusting for other covariates. The Veteran data were used by Kalbfleisch and Prentice (22) to illustrate the Cox proportional hazards model and the treatment effect was found to be non-significant. The veteran cancer data can be obtained from the randomsurvivalforest Package of the R statistical software, these data are called veteran. In this example, we consider only two covariates: Karnofsky score and cell type (factor with four levels). We fit the Cox proportional hazards model (PH) and the generalized logit-link proportional hazards model (GLPH). In Table 1, we can see the parameter estimates (Est) and their standard errors (SE) for the two fitted models, for GLPH model we use parametric bootstrap with 5 replicas to obtain the SE of the parameters. We compare the fit of the two models with the the Akaike s entropy-based Information Criterion (AIC). From these results, we conclude that the fit of the two models is similar. On the other hand, the interpretation of regression coefficients is similar in both models. However, the interpretation of their relative risks differs. We consider two individuals with a difference of 1 in the Karnofsky score value with the same cell type. Under PH model the relative risk is

8 36 M. L. Avendaño, M. C. Pardo exp(1 ˆβ 1 ).73. However, under the GLPH model the relative risk depends on the Cell type of the individuals. By way of illustration, we consider individuals with a difference of Karnofsky score of 1, for example, individuals with Karnofsky score of 2 and 1, respectively: ( ) ˆα 1 + exp( 1 ˆβ 1 ) Cell type 1 : exp( 2 ˆβ 1 ) ( ) ˆα 1 + exp( 1 ˆβ 1 ˆβ 2 ) Cell type 2 : exp( 2 ˆβ 1 ˆβ 2 ) ( ) ˆα 1 + exp( 1 ˆβ 1 ˆβ 3 ) Cell type 3 : exp( 2 ˆβ 1 ˆβ 3 ) ( ) ˆα 1 + exp( 1 ˆβ 1 ˆβ 4 ) Cell type 4 : exp( 2 ˆβ 1 ˆβ 4 ) Note that, an individual with a Karnofsky score of 2 compared with an individual with a Karnofsky score of 1 has lower relative risk, but now it is different depending on the cell type. The individuals with cell type 1 have the lowest risk. The individuals with cell type 3 have the highest risk. On other hand, considering individuals with Karnofsky score of 1 and 9, respectively (also a difference of 1 ), the relative risks are: ( ) ˆα 1 + exp( 9 ˆβ 1 ) Cell type 1 : exp( 1 ˆβ 1 ) ( ) ˆα 1 + exp( 9 ˆβ 1 ˆβ 2 ) Cell type 2 : exp( 1 ˆβ 1 ˆβ 2 ) ( ) ˆα 1 + exp( 9 ˆβ 1 ˆβ 3 ) Cell type 3 : exp( 1 ˆβ 1 ˆβ 3 ) ( ) ˆα 1 + exp( 9 ˆβ 1 ˆβ 4 ) Cell type 4 :.7, 1 + exp( 1 ˆβ 1 ˆβ 4 ) thus, an individual with a Karnofsky score of 1 compared with an individual with a Karnofsky score of 9 has lower relative risk to die than when we compare to individuals with Karnofsky scores of 1 and 2, respectively, and this risk is different depending on the cell type. The individuals with cell type 1 have the lowest risk. The individuals with cell type 3 have the highest risk. To sum up, the relative risk under the Cox model is independent from the Cell type and also the concrete Karnofsky score since only it takes into account the difference

9 A semiparametric generalized PH model for right-censoring 361 between the score of the individuals. Therefore, the proposed model provides useful information from its relative risk. 5 Asymptotic normal distribution of the parameter estimators To make inference, we establish the asymptotic normal distribution of the parameter estimator of the generalized proportional hazards model. To do it, we use the main Theorem of Murphy and van der Vaart (2) and its Corollary 1, (see Kosorok 28). Note that, this theorem considers a maximum profile log-likelihood estimator and we propose a maximum partial log-likelihood estimator, then it is necessary to verify that both estimators are equivalent. Consider the semiparametric model (θ, ) P θ,, where θ is the regression parameter and the cumulative hazard function in the model (2) and P θ, is the data distribution function. Recall that an observation with right censoring is given by Y = (X,δ,Z) where X = min(t, C), δ = 1 {T C} and Z R d is the covariate vector. We assume that T is the survival time with cumulative hazard function given Z (t z) = (t)k (θ, z) and C is a censoring time independent of T given Z and uninformative of (θ, ). In this case, the density function of y = (x,δ,z) is given by p θ, (y) =[λ (x)k (θ, z)s T Z (x z)s C Z (x z)] δ [S T Z (x z)p C Z (x z)] 1 δ p Z (z). Thus, the log-likelihood for the observation y is given by l(θ, )(y) = log p θ, (y) = δ[log λ (x) + log K (θ, z)] (x)k (θ, z) + c (5) where c is a constant. Note that, the log-likelihood function depends on the parameters θ and. Furthermore, in Appendix A it is shown that its efficient score function for θ is l θ, (y) = δ( K (θ, z) h (x)) K (θ, z) x ( K (θ, z) h (s))d (s), (6) where with K (θ, z) = ([K 1 (θ, z)], K 2 (θ, z)) (7) αz K 1 (θ, z) = 1 + exp(β z), K 2 (θ, z) = log[k (θ, z)] and h : R R d+1 is the least favorable direction (see Appendix A ).

10 362 M. L. Avendaño, M. C. Pardo We consider the log-likelihood function based on a random sample of n observations y 1,...,y n which is given by l n (θ, ) = n l(θ, )(y i ), i=1 then the profile log-likelihood for θ is defined as pl n (θ) = sup l n (θ, ). (8) Note that, the maximum likelihood estimator for θ, the first component of the pair (ˆθ n, ˆ ) that maximizes l n (θ, ), is the maximizer of the profile loglikelihood function θ pl n (θ). Then, θ n denote the maximum profile log-likelihood estimator. To establish the asymptotic properties of the parameter estimates θ n we assume the following: (i) Exist τ < such that P (C τ) = P (C = τ) >, where P is the true probability measure. (ii) The observed time X [,τ]. (iii) Let H be all monotone increasing continuous functions on [,τ] with () =. We define Ĥ to be the set of all monotone increasing Cadlag functions on [,τ] with () = and Ĥ, where is the true value of the parameter. (iv) The vector Z R d is bounded. (v) Let R d R + be a compact set such as θ, where θ is the true value of the parameter. (vi) The efficient Fisher information matrix Ĩ is invertible. Theorem 1 Under regularity conditions (i) (vi), the maximum profile log-likelihood estimator θ n for the generalized proportional hazards model (2) obtained maximizing (8) is asymptotically normal distributed as n( θ n θ ) d n N (, Ĩ 1 ) where Ĩ = P [( l θ, )( l θ, ) ] with l θ, given in (6). Proof To apply Corollary 1 of the main result of Murphy and van der Vaart (2) we need to verify Conditions (8) (11) of Murphy and van der Vaart (2) and also conditions of its Theorem 1. Let us define the least favorable submodel t t (θ, )

11 A semiparametric generalized PH model for right-censoring 363 with t R d+1, where t (θ, )(s) = t (s) = s (1 + (θ t) h (u))d (u), where h : R R d+1 is the least favorable direction at the true parameter (θ, ). For t small enough t (θ, ) Ĥ. Moreover, we obtain θ (θ, )( ) = θ ( ) = ( ), that is, Condition (8) of Murphy and van der Vaart (2) is satisfied. The least favorable direction h ( ) is given by h (x) = P [K (θ, z)1 {X x} ] P [ K (θ, z)k (θ, z)1 {X x} ], with K (θ, z) as (7). Details about the construction of h ( ) is given in Appendix A at the end of this work. On the other hand, we consider the map t l(t, θ, )(y) defined by l(t, θ, )(y) = log p t, t (y), (9) thus the log-likelihood for the data y under the least favorable submodel is given by l(t, θ, )(y) = δ[log((1 + (θ t) h (x))λ (x)) + log K (t, z)] K (t, z) and its score function for t is given by x (1 + (θ t) h (s))d (s) l(t, θ, )(y) = t l(t, θ, )(y) (1) = δ K (t, z) τ δh (x) (1 + (θ t) h (x)) + τ ( = K (t, z) K (t, z)k (t, z)1 {X s} (1 + (θ t) h (s))d (s) τ h (s) 1 + (θ t) h (s) K (t, z)1 {X s} h (s)d (s) (11) ) dm(s), (12) with dm(s) = d1 {X s} δ K (t, z)1 {X s} (1 + (θ t) h (s))d (s). Thus, we have l(θ, θ, )(y) = l θ, (y), this is, Condition (9) of Murphy and van der Vaart (2) is satisfied.

12 364 M. L. Avendaño, M. C. Pardo In addition, considering the likelihood L(θ, ), the maximizer of L(θ, ) has the form t P n [dn(u)] ˆ θ (t) = P n [1 {X u} K (θ, z)], (13) with P n the empirical distribution function and N(u) = 1 {X u} δ. This estimator is equivalent to the Breslow cumulative baseline hazard estimator in Cox model. Note that, the estimator (13) depends on the parameter θ. In Theorem 5 ( Appendix B ) it P is verified that ˆ θ n is uniformly consistent for for any sequence θ n θ. Thus, Condition (1) of Murphy and van der Vaart (2) is satisfied. The non-bias Condition (11) of Murphy and van der Vaart (2) is verified in Lemma 6 in Appendix B of this work. Finally, standard arguments allow to prove Donsker and Glivenko Cantelli conditions of Theorem 1 of Murphy and van der Vaart (2) (see Appendix B ). Then, we can apply Theorem 1 of Murphy and van der Vaart (2) and we obtain log pl n ( θ n ) = log pl n (θ ) + ( θ n θ ) n l (y i ) i=1 1 2 n( θ n θ ) Ĩ ( θ n θ ) + o P ( n θ n θ +1) 2 where l is the efficient score function for θ at (θ, )givenin(6) and Ĩ is the efficient Fisher matrix for θ at (θ, ). Now, assuming that Ĩ is invertible (Assumption iv)) and Theorem 5 of Appendix B, we can apply Corollary 1 of Murphy and van der Vaart (2) and we obtain n( θ n θ ) d n where, the efficient Fisher matrix is given by N (, Ĩ 1 ), Ĩ = P [( l θ, )( l θ, ) ]. Last theorem establishes the asymptotic distribution of the maximum profile loglikelihood estimator of the unknown parameters of model (2), but we propose to use maximum partial log-likelihood estimator. Nevertheless, next lemma shows equivalence of the partial likelihood and profile likelihood estimators. Lemma 2 The estimator θ n based on the profile log-likelihood for the model (2) under right censoring and the estimator ˆθ n based on the partial log-likelihood function (3) are equivalent.

13 A semiparametric generalized PH model for right-censoring 365 Proof We define the function N i (x) = 1 {Xi x}δ i, thus the partial likelihood function for the sample is given by L n (θ) = n the partial log-likelihood is i=1 x τ ( 1 {Xi x}k (θ, z i ) nj=1 1 {X j x}k (θ, z j ) ) Ni (x), l n (θ) = log(l n (θ)) n τ n = log[1 {Xi x}k (θ, z i )] log 1 {X j x}k (θ, z j ) dn i (x), i=1 and its score function is given by [ ( τ θ l n(θ) = np n K (θ, z) P ) ] n[1 {X x} K (θ, z)k (θ, z)] dn(x). (14) P n [1 {X x} K (θ, z)] On the other hand, if we consider the second term of the score efficient function (6) and the empirical density function we have P n [ K (θ, z) [,x] ( j=1 K (θ, z) P ) ] n[1 {X x} K (θ, z)k (θ, z)] d ˆ (s) P n [1 {X x} K (θ, z)] τ ( = P n [1 {X x} K (θ, z)k (θ, z)] P n [1 {X x} K (θ, z)] P n[1 {X x} K (θ, z)k (θ, z)] P n [1 {X x} K (θ, z)] =, ) d ˆ (s) with ˆ ( ) the estimator of the cumulative baseline hazard function. Taking the first term of (6) wehave ( P n [δ K (θ, z) E )] n[1 {X x} K (θ, z)k (θ, z)] E n [1 {X x} K (θ, z)] [ ( τ = P n K (θ, z) E ) ] n[1 {X x} K (θ, z)k (θ, z)] dn(x). E n [1 {X x} K (θ, z)] Thus, the estimator ˆθ n and the estimator θ n are equivalent.

14 366 M. L. Avendaño, M. C. Pardo 6 Simulation study The aim of this section is to check the normality of the estimators of the covariate effects of model (2) for finite-sample through a simulation study. The survival times t were generated under the model λ(t θ, z) = λ (t)k (θ, z), with ( ) 1 α K (θ, z) = 1 + exp( β, α > fixed z) as t = 1 ( ) log(u), K (θ, z) where u U(, 1) (Bender et al. (25)). We assume that the cumulative baseline hazard function is a Weibull distribution with shape parameter γ = 3.2 and scale parameter μ = 1.1. We consider three different experiment designs: First Design: We consider Z β = Z 1 β 1 + Z 2 β 2 + Z 3 β 3 + Z 4 β 4 with true values of the parameters β = (.5,.2,.3,.1) and α = 8, the covariates Z 1, Z 2 and Z 3 are binary variables that were generated from a factor of 4 levels and each level has a probability of {.37,.19,.3,.14}, respectively, and Z 4 is generated from a U( 1, 1). Second Design: We consider Z β = Z 1 β 1 + Z 2 β 2 + Z 3 β 3 with true values of the parameters β = (.3,.6,.1) and α = 8, the covariates Z 1, Z 2 and Z 3 were generated from a B(.35), U( 1, 1) and N (1.5, 2), respectively. Third Design: We consider Z β = Z 1 β 1 + Z 2 β 2 with true values of the parameters β = (2, 1) and α =.2, the covariates Z 1 and Z 2 were generated from a N (, 1) and U( 1, 1), respectively. Several sample sizes n = 1, 2, 4 and different percent of censoring 15, 45 and 7 % for First and Second designs were considered and only 15 % of censoring for Third design was considered. A total of 5 simulated datasets were generated for each configuration of three designs. In Sect. 5, we proved the asymptotic normality of the estimators of the covariate effects and now we investigate this property for small and moderate sample sizes. Therefore, we fit a generalized logit-based proportional hazard model for each one of the 5 simulate datasets. In Tables 2, 3 and 5, we present the average and the median of the estimators of the covariate effects, their estimated standard errors (SE), empirical standard errors (SEe), the p value for the Shapiro Wilks test for testing normality by marginals and the coverage probabilities (CP) of a 9, 95 and 99 % confidence interval by marginals, respectively. In Table 4, the p values for the Henze Zirkler multivariate normality test are presented for the First and Second Designs for each sample size

15 A semiparametric generalized PH model for right-censoring 367 Table 2 Average and median of the parameter estimates, standard error (SE), empirical standard error (SEe), p value for Shapiro Wilks test and coverage probabilities (CP) for a nominal 9, 95 and 99 %, respectively, for the first design Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for % 1 β β β β β β β β β β β β % 1 β β β β β β β β β β

16 368 M. L. Avendaño, M. C. Pardo Table 2 continued Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for.99 β β % 1 β β β β β β β β β β β β

17 A semiparametric generalized PH model for right-censoring 369 Table 3 Average and median of the parameter estimates, standard error (SE), empirical standard error (SEe), p value for Shapiro Wilks test and coverage probabilities (CP) for a nominal 9, 95 and 99 %, respectively, for the second design Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for % 1 β β β β β β β β β % 1 β β β β β β β β β

18 37 M. L. Avendaño, M. C. Pardo Table 3 continued Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for.99 7 % 1 β β β β β β β β β

19 A semiparametric generalized PH model for right-censoring 371 Table 4 p Values for the Henze Zirkler test Censoring Sample size p values Design 1 p values Design 2 15 % % % and each censoring percentage. Moreover, Fig. 3 shows the estimated density plots of the standardized parameter estimates (by marginals) to check the normality of the estimator of β, and Fig. 4 shows the multivariate qq-plots, for sample size 4 for First and Second designs and different censoring. From Tables 2 and 3 it can be seen that the average of the estimates of the parameters is really close to the true value of the parameters for all configurations in both designs. Furthermore, the estimated standard errors are very near to the empirical ones. The empirical coverage probabilities appear to be quite close to the nominal levels. Although the p values of the Shapiro Wilks normality test increase with n for most censoring, some p values are too small for not rejecting the normality of some estimations, even for n = 4. This can be caused by the optimization algorithm used that it favors one estimator over the others. However, the p values of the Henze Zirkler multivariate normality test (Table 4) accept the multivariate normality for n = 4. Moreover, sometimes the results of the formal tests contradict with the impressions of the graphical analysis. In particular for large sample sizes the blind trust on the formal tests for normality can lead to erroneous results. Figure 3 shows that the empirical density plots (by marginals) improve when n increases and these plots are near to a normal density for moderate censoring, even for small sample size and Fig. 4 supports the approach to a multivariate normality distribution based on the qq-plots for n = 4. From Table 5, we can see that the estimation of the parameters is bad, however, it is expected because for the Third design α is near to zero, so the model approaches to a non-parametric model. Therefore, we should prevent to use this model for α less than.5 since the estimations get worse cause the model approaches to non-parametric form. Appendix A: The efficient score function and the least favorable direction To find the efficient score function for the model (2), we follow the steps in the Cox model Example of Murphy and van der Vaart (2).

20 372 M. L. Avendaño, M. C. Pardo Table 5 Average and median of the parameter estimates, standard error (SE), empirical standard error (SEe), p value for Shapiro Wilks test and coverage probabilities (CP) for a nominal 9, 95 and 99 %, respectively, for the third design Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for % 1 β β β β β β

21 A semiparametric generalized PH model for right-censoring 373 a First design β^4 Second design n=1 n=1 n=1 n=1 β^ n=2 n=2 n=2 n=2 β^ n=1 n=1 n= n=2 n=2 n= b n=4 n=4 n=4 n=4 β^4 n=4 n=4 n= n=1 n=1 n=1 n=1 β^ n=2 n=2 n=2 n=2 β^ n=1 n=1 n= n=2 n=2 n= c n=4 n=4 n=4 n=4 β^4 n=4 n=4 n= n=1 n=1 n=1 n=1 β^ n=2 n=2 n=2 n=2 β^ n=1 n=1 n= n=2 n=2 n= n=4 n=4 n=4 n=4 n=4 n=4 n=4 Fig. 3 Empirical density plots for the standardized parameter estimates for sample sizes n = 1, 2 and 4 for the First and Second design with a 15 % of censoring, b 45 % of censoring and c 7 % of censoring

22 374 M. L. Avendaño, M. C. Pardo a First design Chi Square Q Q Plot Second design Chi Square Q Q Plot Squared Mahalanobis Distance Squared Mahalanobis Distance Chi Square Quantile Chi Square Quantile b Squared Mahalanobis Distance Chi Square Q Q Plot Squared Mahalanobis Distance Chi Square Q Q Plot Chi Square Quantile Chi Square Quantile c Squared Mahalanobis Distance Chi Square Q Q Plot Squared Mahalanobis Distance Chi Square Q Q Plot Chi Square Quantile Chi Square Quantile Fig. 4 Multivariate qq-plots for sample sizes n = 4 for the First and Second design with a 15 % of censoring, b 45 % of censoring and c 7 % of censoring

23 A semiparametric generalized PH model for right-censoring 375 The efficient score function for θ at (θ, ) is defined as l θ, (y) = l θ, (y) θ, l θ, (y) (15) where l θ, (y) is the score function for θ at (θ, ) and θ, minimizes the squared distance P ( l θ, g)2 over all functions g in the closed linear span of the score functions for. In this case, as in the Cox model, we have θ, l θ, = A θ, h (16) h = (A θ, A θ, ) 1 A l θ, θ, (17) where A θ, h is the score function for at (θ, ), A is the adjoint operator θ, and h is the least favorable direction. Then, the score function for θ of the model (2) is given by From (5) wehave with and l(θ, )(y) = ([ ] β l(θ, )(y), ) α l(θ, )(y). β l(θ, )(y) = δk 1 (θ, z) (x)k 1 (θ, z)k (θ, z), α l(θ, )(y) = δk 2 (θ, z) (x)k 2 (θ, z)k (θ, z), K 1 (θ, z) = αz 1 + exp(β z) K 2 (θ, z) = ln[k (θ, z)]. Note that, the score function for θ can be expressed as l(θ, )(y) = δ K (θ, z) (x) K (θ, z)k (θ, z), (18) with K (θ, z) = (K 1 (θ, z), K 2 (θ, z)).

24 376 M. L. Avendaño, M. C. Pardo To obtain the score function for, we define the path t (x) = x (1 + t h(s))d (s) (19) for t R d+1, ( ) given and a bounded function h :[,τ] R d+1. The path t ( ) Ĥ if t. Replacing the path (19) in the log-likelihood (5) we obtain l(θ, t )(y) = δ[log((1 + t h(x))λ (x)) + log K (θ, z)] K (θ, z) x thus, its derivative with respect to t is given by t l(θ, h(x) t)(y) = δ (1 + t h(x)) and taking t =, we can get t l(θ, t)(y) = δh(x) K (θ, z) t= (1 + t h(s))d (s) + c, x K (θ, z) h(s)d (s), x h(s)d (s) = A θ, h(y). (2) Then, if we consider that h is the least favorable direction and by expressions (15), (16), (18) and (2) the efficient score function for θ is given by l θ, (y) = δ( K (θ, z) h (x)) K (θ, z) x ( K (θ, z) h (s))d (s). In addition, to find the expression of the least favorable direction h we use (17), so it is necessary to obtain the expressions of A θ, l(θ, )(y) and A θ, A θ, (x) where A θ, is the adjoint operator characterized by P θ, [(A θ, h)g] = [h(a θ, g)]. Using the last expression we can get P θ, [(A θ, g)(a θ, h)] = [g(a θ, A θ, h)] (21) and [(A θ, l(θ, ))h] =P θ, [ l(θ, )(A θ, h)]. (22)

25 A semiparametric generalized PH model for right-censoring 377 First, we calculate (A θ, g)(a θ, h) that it is given by x (A θ, g(y)) (A θ, h(y)) = δg(x) h(x) δk (θ, z) [g(x) h(s)d (s) x ] + h(x) g(s)d (s) ( x ) x + (K (θ, z)) 2 g(s)d (s) h(s)d (s). We define x x a(x) = g(x) h(s)d (s) + h(x) g(s)d (s) and integrating partially we have (A θ, g(y)) (A θ, h(y)) x = δg(x) h(x) δk (θ, z)a(x) + (K (θ, z)) 2 a(s)d (s). Furthermore, we can obtain τ P θ, [ δk (θ, z)a(x)] = a(x)e[(k (θ, z)) 2 1 {X x} ]d (x). Therefore, we have P θ, [ δg(x) h(x) δk (θ, z)a(x) + (K (θ, z)) 2 x Finally, by Eq. (21) ] a(s)d (s) = P θ, [δg(x) h(x)] τ [ ] = g(x) h(x) K (θ, z)p[t C > x z]p Z (z)dz d (x) R d τ = g(x) E[K (θ, z)1 {X x} ]h(x)d (x) = [g E[K (θ, z)1 {X x} ]h]. A θ, A θ, h(y) = E[K (θ, z)1 {X x} ]h(y). Analogously, taking the product l(θ, )A θ, h, integrating and using (22)wehave A θ, l(θ, )(y) = E[ K (θ, z)k (θ, z)1 {X x} ].

26 378 M. L. Avendaño, M. C. Pardo Then, we can define the least favorable direction at the true parameters (θ, ) as h (x) = E [ K (θ, z)k (θ, z)1 {X x} ]. E [K (θ, z)1 {X x} ] Appendix B: Technical details for the proof of Theorem 1 We follow the procedure of Murphy (1994) to establish the consistency of the estimators of the generalized proportional hazards model. First, we prove the following lemmas. Lemma 3 The estimator (13) is consistent at θ,thisis, when n. Proof We can obtain that sup ˆ θ (x) a.s. (x), x [,τ] sup ˆ θ (x) (x) sup 1 x [,τ] x [,τ] P n [1 {Xi x}k (θ, z)] 1 P [1 {Xi x}k (θ, z)] [ ] + sup (P N(x) n P ). P [1 {X x} K (θ, z)] On the other hand, the classes x [,τ] {1 {X x} K (θ, z) : x [,τ], θ } and { } N(x) : x [,τ], θ (23) P [1 {X x} K (θ, z)] are Donsker, and as consequence they are Glivenko Cantelli classes. Thus, we have sup (P n P )[1 {Xi x}k (θ, z)] a.s., x [,τ] additionally, as P [1 {Xi x}k (θ, z)] > forx [,τ] sup x [,τ] 1 P n [1 {Xi x}k (θ, z)] 1 P [1 {Xi x}k (θ, z)] a.s.. Moreover, as the class given in (23) is Glivenko Cantelli we obtain sup x [,τ] [ (P n P ) N(x) P [1 {X x} K (θ, z)] ] a.s..

27 A semiparametric generalized PH model for right-censoring 379 Then, the stated lemma follows. Lemma 4 lim sup ˆ θ (τ) < almost surely. Proof Under assumption (iv), there exist < c < such as We define the constant c 2 such as for θ given. Thus, we obtain z < c. c 2 = min K (θ, z) z <c P n [1 {X x} K (θ, z)] c 2 P n [1 {X x} ], and by the law of large numbers, almost surely we have P n [1 {X x} K (θ, z)] c 2 P [1 {X x} ]+o P (1). We assume that P [1 {X x} ] > forx [,τ], thus, P n [1 {X x} K (θ, z)] > almost surely when n. This is, the jumps of ˆ θ in τ are bounded by 1/c 2, thus, ˆ θ (τ) O(1)P n N(τ)/c 2 almost surely when n, with N(x) = 1 {X x} δ. Now, we can prove the following consistency theorem, where θ n is the maximum profile log-likelihood estimator of model (2). Theorem 5 The estimators of the generalized proportional hazards model (2) considering n right-censored data are consistent, this is sup ˆ θ n (x) (x) and θ n θ x [,τ] converge to almost surely when n. Proof By Lemma 4 and Helly s selection Theorem, we have that along of a sequence ˆ θ n (x) (x) for x [,τ], where ( ) is a continuous non-decreasing function and θ n θ with θ.

28 38 M. L. Avendaño, M. C. Pardo On the other hand, because P n [1 {X x} K (θ, z)] >ɛ>, we have that ˆ θ n ( ) is absolutely continuous with respect to ˆ θ ( ) and d ˆ θn (x) converges to a bounded d ˆ θ (x) measurable function ζ(x), thisis (x) = x ζ(s)d (s). Thus, ( ) is absolutely continuous with respect to ( ) and its derivative is denoted by λ ( ), moreover ζ(x) = λ (x) λ (x). In addition, as ( θ n, ˆ θ n ) maximize l n (θ, ) we have 1 n n [l( θ n, ˆ θ n )(y) l(θ, ˆ θ )(y)], i=1 if n, P [l(θ, )(y) l(θ, )(y)]. By the Glivenko Cantelli Theorem and as d ˆ θn (x) d ˆ converges uniformly to λ (x) θ (x) λ (x),we can conclude that the Kullback Leibler information between the density given by the parameters (θ, ) and the density given by the parameters (θ, ) is negative, thus two densities are equal almost surely. This implies that θ = θ and ( ) = ( ) in [,τ]. Then, θ n θ and ˆ θ n (x) (x) almost surely with x [,τ]. Lemma 6 Under the consistency of the estimators θ n and ˆ θ n we have P [ l(θ, θ n, ˆ θ n )]=o P ( θ n θ ). Proof From expression (11) wehave [ τ P [ l(θ, θ n, ˆ θ n )]=P K (θ, z)1 {X s} (h (s) K (θ, z))d ˆ θ n (s) τ Moreover, we can obtain the following equalities K (θ, z)k (θ, z)1 {X s} ( θ n θ ) h (s)d ˆ θ n (s) ] + δ( K (θ, z) h (x)) 1 + ( θ n θ ) h (x) + δ K (θ, z)( θ n θ ) h (x). 1 + ( θ n θ ) h (x)

29 A semiparametric generalized PH model for right-censoring 381 P [K (θ, z)1 {X x} (h (x) K (θ, z))] =, [ ] δ( K (θ, z) h (x)) P = 1 + ( θ n θ ) h (x) and [ ] [ ] δ K (θ, z)( θ n θ ) h (x) δh (x)( θ n θ ) h (x) P = P. 1 + ( θ n θ ) h (x) 1 + ( θ n θ ) h (x) Thus, τ P [ l(θ, θ n, ˆ θ n )]=P [ K (θ, z)k (θ, z)1 {X s} ( θ n θ ) h (s)d ˆ θ n (s) ] + δh (x)( θ n θ ) h (x). (24) 1 + ( θ n θ ) h (x) As s M(s) = N(s) K (θ, z)1 {X s} (1 + (θ θ ) h (s))d (s) is a martingale with media, we have P [ l(θ, θ n, )]= τ = P [ K (θ, z)k (θ, z)1 {X s} ( θ n θ ) h (s)d (s) ] + δ h (x)( θ n θ ) h (x). (25) 1 + ( θ n θ ) h (x) Finally, from expressions (24) and (25) wehave P [ l(θ, θ n, ˆ θ n )]=P [ l(θ, θ n, ˆ θ n )] P [ l(θ, θ n, )] = P [ τ K (θ, z)k (θ, z)1 {X s} ( θ n θ ) h (s)d( ˆ θ n (s) (s)) +2δh (x)( θ n θ ) h (x) 1 + ( θ n θ ) h (x) = o P ( θ n θ ). ]

30 382 M. L. Avendaño, M. C. Pardo Lemma 7 Let V be a neighborhood of (θ, θ, ) where θ and are the true values of the parameter of the generalized proportional hazards model (2) considering right-censored data and l(t, θ, ) the log-likelihood function for the submodel as (9). (i) The class of functions { l(t, θ, ) : (t, θ, ) V } is Donsker with squared-integrable envelope function. (ii) The class of functions { l(t, θ, ) : (t, θ, ) V } is Glivenko Cantelli and bounded in L 1 (P ). Proof Let F be the set of continuous distribution functions. For ρ>, we define C ρ ={P F : P P ρ}, where P is the true distribution function. On the other hand, as the vector z is bounded (assumption iv), the class {β z : β β } is Donsker, with β such as = β α. As the function exp(β z) is differentiable and its derivatives are bounded, the class {exp(β z) : β β } is Donsker. Moreover, as 1 + exp(β z)> we have that { ( K (θ, z) = exp( β z) ) α : θ }, (26) and { } αz K 1 (θ, z) = 1 + exp(β z) : θ are Donsker. In addition, as f (x) = ln(x) with domain in [c, ) where c > is Lipschitz we obtain that the class is Donsker. Thus, {K 2 (θ, z) = ln[k (θ, z)] :θ } { K (θ, z) : θ }

31 A semiparametric generalized PH model for right-censoring 383 is Donsker. As the class {1 {X x} : x [,τ]} is Donsker, we have that the classes and {K (θ, z)1 {X x} : x [,τ], θ } { K (θ, z)k (θ, z)1 {X x} : x [,τ], θ } are Donsker. For P C ρ, as the function P E P [ f ] is Lipschitz, the classes and {E P [K (θ, z)1 {X x} ]:x [,τ], θ, P C ρ } {E P [ K (θ, z)k (θ, z)1 {X x} ]:x [,τ], θ, P C ρ } are Donsker. On the other hand, as z is bounded as θ, there exist m and M such as < m < K (θ, z) <M <. (27) As the function P E P [ f ] is continuous, there exist ρ 1 > such as for each P C ρ we have E P [1 {X x} ] ρ 1 >. (28) From (27) and (28) we obtain <ρ 1 m me P [1 {X x} ] E P [K (θ, z)1 {X x} ] ME P [1 {X x} ] < thus, the class { } 1 E P [K (θ, z)1 {X x} ] : x [,τ], θ, P C ρ is Donsker and in consequence the class { K (θ, z) E } P[ K (θ, z)k (θ, z)1 {X x} ] : x [,τ], θ, P C ρ E P [K (θ, z)1 {X x} ] (29) is it. As the function f d is Lipschitz, the class { [,x] ( K (θ, z) E ) } P[ K (θ, z)k (θ, z)1 {X x} ] d : x [,τ], θ, P C ρ E P [K (θ, z)1 {X x} ] (3)

32 384 M. L. Avendaño, M. C. Pardo is Donsker. Finally, from (26), (29) and (3) we have that { l(θ, ) : x [,τ], θ, P C ρ } is Donsker with squared-integrable envelope function, this proves (i). Analogously, (ii) could be proved. Acknowledgments This work was supported by research grant MTM R and MAEC-AECID. The authors are grateful to the referees for their valuable comments and proposals which have improved the contents of this paper. References Aranda-Ordaz, F. J. (1983). An extension of the proportional-hazards model for grouped data. Biometrics, 39, Balakrishnan, N. (1992). Handbook of the logistic distribution. New York: Marcel Dekker. Bender, R., Augustin, T., Blettner, M. (25). Generating survival times to simulate Cox proportional hazards model. Statistics in Medicine, 24(11), Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society Series B, 34(2), Devarajan, K., Ebrahimi, N. (211). A semi-parametric generalization of the cox proportional hazards regression model: inference and applications. Computational Statistics and Data Analysis, 65(1), Etezadi-Amoli, J., Ciampi, A. (1987). Extended hazard regression for censored survival data with covariates: a spline approximation for the background hazard function. Biometrics, 43, Hougaard, P. (1984). Life table methods for heterogeneous populations: distributions describing heterogeneity. Biometrika, 71, Kalbfleisch, J. D., Prentice, R. L. (22). The statistical analysis of failure time data (2nd ed.). New York: Wiley. Kosorok, M. R. (28). Introduction to empirical processes and semiparametric inference. NewYork: Springer. MacKenzie, G. (1996). Regression models for survival data: the generalized time-logistic family. Statistician, 45, MacKenzie, G. (1997). On a non-proportional hazards regression model for repeated medical random counts. Statistics in Medicine, 16, Murphy, S. A. (1994). Consistency in a proportional hazards model incorporating a random effect. Annals of Statistics, 22(2), Murphy, S. A., van der Vaart, W. (2). On profile likelihood. Journal of the American Statistical Association, Theory and Methods, 95(45), Sasieni, P. D. (1995). Efficiently weighted estimating equations with application to proportional excess hazards. Lifetime data analysis, 1, Thomas, D. C. (1986). Use of auxiliary information in fitting non-proportional hazards model. In S. H. Moolgavkar R. L. Prentice (Eds.), Modern statistical methods in chronic disease epidemiology (pp ). New-York: Wiley. Tibshirani, R., Ciampi, A. (1983). A family of additive and proportional hazard models for survival data. Biometrics, 39(1), Younes, N., Lachin, J. (1997). Link-based models for survival data with interval and continuous time censoring. Biometrics, 53,

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows