A semiparametric generalized proportional hazards model for right-censored data

Similar documents
Efficiency of Profile/Partial Likelihood in the Cox Model

Power and Sample Size Calculations with the Additive Hazards Model

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Quantile Regression for Residual Life and Empirical Likelihood

STAT331. Cox s Proportional Hazards Model

University of California, Berkeley

Tests of independence for censored bivariate failure time data

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Full likelihood inferences in the Cox model: an empirical likelihood approach

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Frailty Models and Copulas: Similarities and Differences

A note on profile likelihood for exponential tilt mixture models

Survival Analysis for Case-Cohort Studies

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Lecture 22 Survival Analysis: An Introduction

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Cox s proportional hazards model and Cox s partial likelihood

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Lecture 3. Truncation, length-bias and prevalence sampling

Chapter 2 Inference on Mean Residual Life-Overview

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

Survival Analysis Math 434 Fall 2011

Verifying Regularity Conditions for Logit-Normal GLMM

MAS3301 / MAS8311 Biostatistics Part II: Survival

Attributable Risk Function in the Proportional Hazards Model

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH

Multivariate Survival Data With Censoring.

Jackknife Euclidean Likelihood-Based Inference for Spearman s Rho

Large sample theory for merged data from multiple sources

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

11 Survival Analysis and Empirical Likelihood

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Step-Stress Models and Associated Inference

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Regularization in Cox Frailty Models

Multistate Modeling and Applications

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Empirical Processes & Survival Analysis. The Functional Delta Method

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

Frailty Modeling for clustered survival data: a simulation study

Beyond GLM and likelihood

Continuous Time Survival in Latent Variable Models

Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

Multivariate Survival Analysis

Survival Analysis I (CHL5209H)

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

AFT Models and Empirical Likelihood

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky

Maximum likelihood estimation in the proportional hazards cure model

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

1 Glivenko-Cantelli type theorems

Bayesian estimation of the discrepancy with misspecified parametric models

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

On the Breslow estimator

Logistic regression model for survival time analysis using time-varying coefficients

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA

Estimation of the Bivariate and Marginal Distributions with Censored Data

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin

Hacettepe Journal of Mathematics and Statistics Volume 45 (5) (2016), Abstract

Likelihood Construction, Inference for Parametric Survival Distributions

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

On Estimation of Partially Linear Transformation. Models

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

Survival Analysis. Stat 526. April 13, 2018

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Models for Multivariate Panel Count Data

COX S REGRESSION MODEL UNDER PARTIALLY INFORMATIVE CENSORING

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. *

COMPUTATION OF THE EMPIRICAL LIKELIHOOD RATIO FROM CENSORED DATA. Kun Chen and Mai Zhou 1 Bayer Pharmaceuticals and University of Kentucky

COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky

Part III Measures of Classification Accuracy for the Prediction of Survival Times

A general mixed model approach for spatio-temporal regression data

β j = coefficient of x j in the model; β = ( β1, β2,

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions

ST495: Survival Analysis: Maximum likelihood

CTDL-Positive Stable Frailty Model

TMA 4275 Lifetime Analysis June 2004 Solution

Published online: 10 Apr 2012.

SEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE. By S. A. Murphy 1 and A. W. van der Vaart Pennsylvania State University and Free University Amsterdam

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

On consistency of Kendall s tau under censoring

ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL

LOGISTIC REGRESSION Joseph M. Hilbe

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

Reduced-rank hazard regression

M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010

STAT Sample Problem: General Asymptotic Results

1 Introduction. 2 Residuals in PH model

Empirical Likelihood in Survival Analysis

Transcription:

Ann Inst Stat Math (216) 68:353 384 DOI 1.17/s1463-14-496-3 A semiparametric generalized proportional hazards model for right-censored data M. L. Avendaño M. C. Pardo Received: 28 March 213 / Revised: 3 September 214 / Published online: 21 November 214 The Institute of Statistical Mathematics, Tokyo 214 Abstract We introduce a flexible family of semiparametric generalized logit-based regression models for survival analysis. Its hazard rates are proportional as the Cox model, but its relative risk related to a covariate is different for the values of the other covariates. The method of partial likelihood approach is applied to estimate its parameters in presence of right censoring and its asymptotic normality is established. We perform a simulation study to evaluate the finite-sample performance of these estimators. This new family of models is illustrated with lung cancer data and compared with Cox model. The importance of the conclusions obtained from the relative risk is pointed out. Keywords Survival analysis Proportional hazards Type-I generalized logistic distribution Semiparametric models Profile likelihood Partial likelihood 1 Introduction Analysis of event times (also referred as survival analysis) deals with data representing the time to a well-defined event. These data arise in engineering, economy, reliability, public health, biomedicine and other areas. Data arising from survival analysis often M. L. Avendaño (B) Faculty of Mathematics, University of Veracruz, Circuito Gonzalo Aguirre Beltrán, s/n, Zona Universitaria, 919 Xalapa, Mexico e-mail: maravend@ucm.es M. C. Pardo Department of Statistics and Operational Research I, Faculty of Mathematics, Complutense University of Madrid, Plaza de las Ciencias 3, Ciudad Universitaria, 284 Madrid, Spain

354 M. L. Avendaño, M. C. Pardo consist of a response variable that measures the duration of time until the occurrence of a specific event and a set of variables (covariates) thought to be associated with the event-time variable, they have some features that present difficulties to traditional statistical methods. The first is that the data are generally asymmetrically distributed, while the second feature is that lifetimes are frequently censored (the end point of interest has not been observed for that individual). Regression models for survival data have traditionally been based on the Cox regression model (Cox 1972). One of the reasons for the popularity of this model is because the unknown parameter can be estimated by partial likelihood without putting a structure on baseline hazard. This model requires that the hazards between any two individuals are proportional across time. There are many works whose aim has been to extend the Cox model in different ways. Some of them appealing the necessity to allow non-proportional hazards such as Aranda-Ordaz (1983), Tibshirani and Ciampi (1983), Etezadi-Amoli and Ciampi (1987), Thomas (1986), Sasieni (1995), Clayton (1978), Hougaard (1984), Younes and Lachin (1997), MacKenzie (1996, 1997) and Devarajan and Ebrahimi (211). However, when the assumption of proportional hazards is tenable, a Cox regression model is usually the preferred model. Therefore, proportional hazards models that maintain the good properties of the Cox model, but give flexibility to the model are welcome. As mentioned above, regression models for survival data have traditionally been based on the proportional hazards model of Cox (1972) which is defined through the hazard function λ(t z) of the form λ(t β, z) = λ (t) exp(β z), (1) where λ ( ) is an arbitrary function of time called baseline hazard function, z = (z 1,...,z d ) is a vector of covariates for the individual at time t, and β = (β 1,...,β d ) is a vector of unknown parameters to be estimated. In case that the baseline hazard function is treated non-parametrically, then this model becomes a semiparametric model. In this paper, we introduce a flexible semiparametric family of proportional hazards models based on replacing the exponential link function in (1) by a generalization of the logistic distribution (see Balakrishnan 1992) called Type-I generalized logistic distribution in Sect. 2. The partial likelihood approach is proposed to estimate the covariate effects of the model considering right-censored data and an application is shown in Sects. 3 and 4, respectively. To prove the asymptotic normality of the partial likelihood estimator in our model, first we establish the equivalence of the partial likelihood estimator and the profile likelihood estimator for our model. Second, the efficiency of the profile likelihood estimator for our semiparametric model is proven. This proof is based on the method of an approximate least favorable submodel used by Murphy and van der Vaart (2) with the Cox regression model and it is described in Sect. 5, but developed in an Appendix. Finally, the small sample performance of the maximum partial likelihood estimators of the regression parameters in threeand four-covariate hazard function models is evaluated through a simulation study in Sect. 6.

A semiparametric generalized PH model for right-censoring 355 2 The generalized proportional hazards model The Cox model given in (1) can be generalized by replacing the exponential function in (1) by a generalization of the logistic distribution called Type-I generalized logistic distribution which is given by ( ) 1 α F(y) = 1 + e y, < y <, α>, where the parameter α is a proportionality constant for the distribution. See Balakrishnan (1992). Therefore, we propose a proportional hazards model defined through the hazard function λ(t θ, z) = λ (t)k (θ, z), (2) with ( ) 1 α K (θ, z) = 1 + exp( β, α >, z) where θ = (β,α), which we call generalized logit-link proportional hazards model. In fact, if α = orβ =, the model reduces to a non-parametric form. The cumulative hazard function for the model (2) is given by (t θ, z) = (t)k (θ, z), where (t) = t λ (u)du is the baseline cumulative hazard function. Thus, the survival function corresponding to the hazard model is S(t θ, z) = exp( (t θ, z)), and Eq. (2) characterizes the model with density given by f (t θ, z) = exp( (t θ, z))λ(t θ, z). Note that, in the particular case α = 1, we obtain the proportional hazards model with a logit-link function exp(β z) λ(t β, z) = λ (t) 1 + exp(β z). For two-covariate profiles z i and z j, the hazard ratio is proportional and the relative risk does not depend on t as

356 M. L. Avendaño, M. C. Pardo ρ(t θ, z i, z j ) = λ(t θ, z i) λ(t θ, z j ) ( 1 + exp( β z j ) = 1 + exp( β z i ) = ρ(t β,α,z i, z j ). The interpretation of the hazard ratio of model (2) is similar to the Cox model hazard ratio since if ρ(t θ, z i, z j ) = 1, then the individuals with covariates z i and z j have the same relative risk of death and if ρ(t θ, z i, z j )<1(>1), the individual with covariate z i has lower (higher) relative risk of death than the individual with covariate z j. As particular case, when the covariate z m increases z m +1 and the rest of covariates remain equal the relative risk is equal (lower or higher) if and only if β m = (< or >), respectively, and the parameter α is a proportionality parameter for the model since the relative risk increases or decreases when α increases depending if the relative risk is less or greater than 1. Moreover, this parameter gives flexibility to the model to adapt better to the data. For illustrative purposes, we consider a two-covariate model, the first one a continuous covariate and the second one a 4-level factor covariate. The 4-level covariate is transformed in an indicator 3-dimensional vector. So, we obtain a full model given by ( ) 1 α λ(t θ, z) = λ (t), α >. 1 + exp( (β 1 z 1 + β 2 z 2 + β 3 z 3 + β 4 z 4 )) In this case, if we take the relative risk of individuals with values z 1 = z 1i and z 1 = z 1 j, respectively, at the first covariate and the same level at the second covariate, we have that the relative risk depend on the level of the second covariate, that is: ) α Level I: Level II: Level III: Level IV: ) α ( 1 + exp( z1 j β 1 ) 1 + exp( z 1i β 1 ) ( 1 + exp( z1 j β 1 β 2 ) ) α 1 + exp( z 1i ˆβ 1 β 2 ) ( ) 1 + exp( z1 j β 1 β 3 ) α 1 + exp( z 1i β 1 β 3 ) ( ) 1 + exp( z1 j β 1 β 4 ) α. 1 + exp( z 1i β 1 β 4 ) Note that, in this case, the relative risk under Cox model (1)isgivenby ρ Cox (t β, z i, z j ) = exp(β 1 (z 1i z 1 j )) which does not depend on the second covariate level. Therefore, the proposed model gives more flexibility to the relative risk.

A semiparametric generalized PH model for right-censoring 357 Up to now, we have not made any assumption about the baseline hazard function λ ( ), but in the following we assume an unknown functional form for the baseline hazard function of the model, so we have a semiparametric model. 3 Estimation procedure Consider a right-censored data sample of size n where the observed data are the iid triplets (X i,δ i, Z i ) where X i = min(t i, C i ) is the observed time with T i and C i the survival and censoring time, respectively, δ i = 1 {Ti C i } is the indicator variable of censoring and Z i is the vector of covariates for i = 1,...,n. In this case, analogously to Cox (1972), we can construct the partial likelihood function for the sample, and it can be written as L n (θ) = n i=1 K (θ, z i ) K (θ, z j ) j R(x i ) where R(x i ) is the risk set at time x i, and the partial log-likelihood function is given by l n (θ) = ln(l n (θ)) n = ln(k (θ, z i )) ln δ i i=1 j R(x i ) δ i, K (θ, z j ). (3) Then, the maximum partial log-likelihood estimator of the unknown vector of parameters of model (2)isgivenby ˆθ n = ( ˆβ n, ˆα n ) = arg max l n(θ), (4) θ=(β,α>) and the variance of the parameter estimator ˆθ n is var(ˆθ n ) = diag(î 1 n (ˆθ n )), where Î n is the observed information matrix. To obtain (4), we would need to use numerical methods for carrying out the required constrained (α >) optimization. However, the numerical algorithms are high sensitive to initial values of the parameters and they fall at local optimums. To avoid it, we consider {α 1,...,α l } a grid for the value of the parameter α then we maximize the partial log-likelihood function so we obtain l estimations of β, {β 1,...,β l } with partial log-likelihood function values {l n (α 1, β 1 ),...,l n (α l, β l )}. Finally, we consider arg max{l n (α 1, β 1 ),..., l n (α l, β l )} as estimations of the parameters (α, β).

l l l 358 M. L. Avendaño, M. C. Pardo 1445 145 1455 146 1465 147..5 1. beta1 1.5 2. 1.5 2. 3.5 3. 2.5 alpha 1445 145 beta1 1...5 1455 1.5 146 1465 2. 147 1.5 2. 2.5 3. 3.5 alpha 1445 145 1455 146 1465 147 1.5 alpha 2. 2.5.5. 3. 1.5 1. beta1 3.5 2. Fig. 1 Surface for partial log-likelihood function To check how the method employed works and also if there exists an identifiable problem for β and α, weshowinfig.1 a surface with the values of the partial loglikelihood function for different values of α and β 1 for a data set of size 5 generated from our model with the covariate z 1 a Bernoulli of parameter.45 and z 2 = for α = 2.5 and β 1 = β 2 = 1. It shows clearly that the method employed works very well. Furthermore, we graph the level curves for the partial log-likelihood function in Fig. 2. The true value of the parameters is the red square and the blue circle is that which achieves the maximum value for the partial log-likelihood function. It shows there is non-identifiability issue even if we have only one covariate. 4 Veteran s administration lung cancer data We present results from fitting the generalized proportional hazards model to a dataset from the Veteran s Administration Lung Cancer Study Clinical Trial (Kalbfleisch and Prentice 22). In this clinical trial, males with advanced inoperable lung cancer were randomized to either a standard or test chemotherapy. The primary end point for therapy comparison was time to death. Only nine out of the 137 survival times were censored, so the censoring rate is 6.6 %. The dataset includes six covariates: treatment (1 = standard and 2 = test), age at diagnosis (in years), Karnofsky score (from to 1), diagnosis time, cell type (1 = Squamous, 2 = Small, 3 = Adenocarcinoma and 4 = Large) and prior therapy. The primary purpose of this clinical trial was to

A semiparametric generalized PH model for right-censoring 359 Partial log likelihood Partial log likelihood α 1.5 2. 2.5 3. 3.5 α 1.5 2. 2.5 3. 3.5..5 1. 1.5 2. β 1 5 levels..5 1. 1.5 2. β 1 1 levels Fig. 2 Level curves for partial log-likelihood function Table 1 Fit of models for veteran s administration lung cancer data Covariate PH GLPH Est SE Est SE Karnofsky.39.518.83.16 Cell type 2.7121.25274.1869.731 Cell type 3 1.158.29286.31.986 Cell type 4.3251.27669.837.86 α 6.5.9857 AIC 96.9972 962.8318 investigate whether the new chemotherapy works better or worse than the standard chemotherapy after adjusting for other covariates. The Veteran data were used by Kalbfleisch and Prentice (22) to illustrate the Cox proportional hazards model and the treatment effect was found to be non-significant. The veteran cancer data can be obtained from the randomsurvivalforest Package of the R statistical software, these data are called veteran. In this example, we consider only two covariates: Karnofsky score and cell type (factor with four levels). We fit the Cox proportional hazards model (PH) and the generalized logit-link proportional hazards model (GLPH). In Table 1, we can see the parameter estimates (Est) and their standard errors (SE) for the two fitted models, for GLPH model we use parametric bootstrap with 5 replicas to obtain the SE of the parameters. We compare the fit of the two models with the the Akaike s entropy-based Information Criterion (AIC). From these results, we conclude that the fit of the two models is similar. On the other hand, the interpretation of regression coefficients is similar in both models. However, the interpretation of their relative risks differs. We consider two individuals with a difference of 1 in the Karnofsky score value with the same cell type. Under PH model the relative risk is

36 M. L. Avendaño, M. C. Pardo exp(1 ˆβ 1 ).73. However, under the GLPH model the relative risk depends on the Cell type of the individuals. By way of illustration, we consider individuals with a difference of Karnofsky score of 1, for example, individuals with Karnofsky score of 2 and 1, respectively: ( ) ˆα 1 + exp( 1 ˆβ 1 ) Cell type 1 :.75 1 + exp( 2 ˆβ 1 ) ( ) ˆα 1 + exp( 1 ˆβ 1 ˆβ 2 ) Cell type 2 :.77 1 + exp( 2 ˆβ 1 ˆβ 2 ) ( ) ˆα 1 + exp( 1 ˆβ 1 ˆβ 3 ) Cell type 3 :.78 1 + exp( 2 ˆβ 1 ˆβ 3 ) ( ) ˆα 1 + exp( 1 ˆβ 1 ˆβ 4 ) Cell type 4 :.76. 1 + exp( 2 ˆβ 1 ˆβ 4 ) Note that, an individual with a Karnofsky score of 2 compared with an individual with a Karnofsky score of 1 has lower relative risk, but now it is different depending on the cell type. The individuals with cell type 1 have the lowest risk. The individuals with cell type 3 have the highest risk. On other hand, considering individuals with Karnofsky score of 1 and 9, respectively (also a difference of 1 ), the relative risks are: ( ) ˆα 1 + exp( 9 ˆβ 1 ) Cell type 1 :.69 1 + exp( 1 ˆβ 1 ) ( ) ˆα 1 + exp( 9 ˆβ 1 ˆβ 2 ) Cell type 2 :.71 1 + exp( 1 ˆβ 1 ˆβ 2 ) ( ) ˆα 1 + exp( 9 ˆβ 1 ˆβ 3 ) Cell type 3 :.72 1 + exp( 1 ˆβ 1 ˆβ 3 ) ( ) ˆα 1 + exp( 9 ˆβ 1 ˆβ 4 ) Cell type 4 :.7, 1 + exp( 1 ˆβ 1 ˆβ 4 ) thus, an individual with a Karnofsky score of 1 compared with an individual with a Karnofsky score of 9 has lower relative risk to die than when we compare to individuals with Karnofsky scores of 1 and 2, respectively, and this risk is different depending on the cell type. The individuals with cell type 1 have the lowest risk. The individuals with cell type 3 have the highest risk. To sum up, the relative risk under the Cox model is independent from the Cell type and also the concrete Karnofsky score since only it takes into account the difference

A semiparametric generalized PH model for right-censoring 361 between the score of the individuals. Therefore, the proposed model provides useful information from its relative risk. 5 Asymptotic normal distribution of the parameter estimators To make inference, we establish the asymptotic normal distribution of the parameter estimator of the generalized proportional hazards model. To do it, we use the main Theorem of Murphy and van der Vaart (2) and its Corollary 1, (see Kosorok 28). Note that, this theorem considers a maximum profile log-likelihood estimator and we propose a maximum partial log-likelihood estimator, then it is necessary to verify that both estimators are equivalent. Consider the semiparametric model (θ, ) P θ,, where θ is the regression parameter and the cumulative hazard function in the model (2) and P θ, is the data distribution function. Recall that an observation with right censoring is given by Y = (X,δ,Z) where X = min(t, C), δ = 1 {T C} and Z R d is the covariate vector. We assume that T is the survival time with cumulative hazard function given Z (t z) = (t)k (θ, z) and C is a censoring time independent of T given Z and uninformative of (θ, ). In this case, the density function of y = (x,δ,z) is given by p θ, (y) =[λ (x)k (θ, z)s T Z (x z)s C Z (x z)] δ [S T Z (x z)p C Z (x z)] 1 δ p Z (z). Thus, the log-likelihood for the observation y is given by l(θ, )(y) = log p θ, (y) = δ[log λ (x) + log K (θ, z)] (x)k (θ, z) + c (5) where c is a constant. Note that, the log-likelihood function depends on the parameters θ and. Furthermore, in Appendix A it is shown that its efficient score function for θ is l θ, (y) = δ( K (θ, z) h (x)) K (θ, z) x ( K (θ, z) h (s))d (s), (6) where with K (θ, z) = ([K 1 (θ, z)], K 2 (θ, z)) (7) αz K 1 (θ, z) = 1 + exp(β z), K 2 (θ, z) = log[k (θ, z)] and h : R R d+1 is the least favorable direction (see Appendix A ).

362 M. L. Avendaño, M. C. Pardo We consider the log-likelihood function based on a random sample of n observations y 1,...,y n which is given by l n (θ, ) = n l(θ, )(y i ), i=1 then the profile log-likelihood for θ is defined as pl n (θ) = sup l n (θ, ). (8) Note that, the maximum likelihood estimator for θ, the first component of the pair (ˆθ n, ˆ ) that maximizes l n (θ, ), is the maximizer of the profile loglikelihood function θ pl n (θ). Then, θ n denote the maximum profile log-likelihood estimator. To establish the asymptotic properties of the parameter estimates θ n we assume the following: (i) Exist τ < such that P (C τ) = P (C = τ) >, where P is the true probability measure. (ii) The observed time X [,τ]. (iii) Let H be all monotone increasing continuous functions on [,τ] with () =. We define Ĥ to be the set of all monotone increasing Cadlag functions on [,τ] with () = and Ĥ, where is the true value of the parameter. (iv) The vector Z R d is bounded. (v) Let R d R + be a compact set such as θ, where θ is the true value of the parameter. (vi) The efficient Fisher information matrix Ĩ is invertible. Theorem 1 Under regularity conditions (i) (vi), the maximum profile log-likelihood estimator θ n for the generalized proportional hazards model (2) obtained maximizing (8) is asymptotically normal distributed as n( θ n θ ) d n N (, Ĩ 1 ) where Ĩ = P [( l θ, )( l θ, ) ] with l θ, given in (6). Proof To apply Corollary 1 of the main result of Murphy and van der Vaart (2) we need to verify Conditions (8) (11) of Murphy and van der Vaart (2) and also conditions of its Theorem 1. Let us define the least favorable submodel t t (θ, )

A semiparametric generalized PH model for right-censoring 363 with t R d+1, where t (θ, )(s) = t (s) = s (1 + (θ t) h (u))d (u), where h : R R d+1 is the least favorable direction at the true parameter (θ, ). For t small enough t (θ, ) Ĥ. Moreover, we obtain θ (θ, )( ) = θ ( ) = ( ), that is, Condition (8) of Murphy and van der Vaart (2) is satisfied. The least favorable direction h ( ) is given by h (x) = P [K (θ, z)1 {X x} ] P [ K (θ, z)k (θ, z)1 {X x} ], with K (θ, z) as (7). Details about the construction of h ( ) is given in Appendix A at the end of this work. On the other hand, we consider the map t l(t, θ, )(y) defined by l(t, θ, )(y) = log p t, t (y), (9) thus the log-likelihood for the data y under the least favorable submodel is given by l(t, θ, )(y) = δ[log((1 + (θ t) h (x))λ (x)) + log K (t, z)] K (t, z) and its score function for t is given by x (1 + (θ t) h (s))d (s) l(t, θ, )(y) = t l(t, θ, )(y) (1) = δ K (t, z) τ δh (x) (1 + (θ t) h (x)) + τ ( = K (t, z) K (t, z)k (t, z)1 {X s} (1 + (θ t) h (s))d (s) τ h (s) 1 + (θ t) h (s) K (t, z)1 {X s} h (s)d (s) (11) ) dm(s), (12) with dm(s) = d1 {X s} δ K (t, z)1 {X s} (1 + (θ t) h (s))d (s). Thus, we have l(θ, θ, )(y) = l θ, (y), this is, Condition (9) of Murphy and van der Vaart (2) is satisfied.

364 M. L. Avendaño, M. C. Pardo In addition, considering the likelihood L(θ, ), the maximizer of L(θ, ) has the form t P n [dn(u)] ˆ θ (t) = P n [1 {X u} K (θ, z)], (13) with P n the empirical distribution function and N(u) = 1 {X u} δ. This estimator is equivalent to the Breslow cumulative baseline hazard estimator in Cox model. Note that, the estimator (13) depends on the parameter θ. In Theorem 5 ( Appendix B ) it P is verified that ˆ θ n is uniformly consistent for for any sequence θ n θ. Thus, Condition (1) of Murphy and van der Vaart (2) is satisfied. The non-bias Condition (11) of Murphy and van der Vaart (2) is verified in Lemma 6 in Appendix B of this work. Finally, standard arguments allow to prove Donsker and Glivenko Cantelli conditions of Theorem 1 of Murphy and van der Vaart (2) (see Appendix B ). Then, we can apply Theorem 1 of Murphy and van der Vaart (2) and we obtain log pl n ( θ n ) = log pl n (θ ) + ( θ n θ ) n l (y i ) i=1 1 2 n( θ n θ ) Ĩ ( θ n θ ) + o P ( n θ n θ +1) 2 where l is the efficient score function for θ at (θ, )givenin(6) and Ĩ is the efficient Fisher matrix for θ at (θ, ). Now, assuming that Ĩ is invertible (Assumption iv)) and Theorem 5 of Appendix B, we can apply Corollary 1 of Murphy and van der Vaart (2) and we obtain n( θ n θ ) d n where, the efficient Fisher matrix is given by N (, Ĩ 1 ), Ĩ = P [( l θ, )( l θ, ) ]. Last theorem establishes the asymptotic distribution of the maximum profile loglikelihood estimator of the unknown parameters of model (2), but we propose to use maximum partial log-likelihood estimator. Nevertheless, next lemma shows equivalence of the partial likelihood and profile likelihood estimators. Lemma 2 The estimator θ n based on the profile log-likelihood for the model (2) under right censoring and the estimator ˆθ n based on the partial log-likelihood function (3) are equivalent.

A semiparametric generalized PH model for right-censoring 365 Proof We define the function N i (x) = 1 {Xi x}δ i, thus the partial likelihood function for the sample is given by L n (θ) = n the partial log-likelihood is i=1 x τ ( 1 {Xi x}k (θ, z i ) nj=1 1 {X j x}k (θ, z j ) ) Ni (x), l n (θ) = log(l n (θ)) n τ n = log[1 {Xi x}k (θ, z i )] log 1 {X j x}k (θ, z j ) dn i (x), i=1 and its score function is given by [ ( τ θ l n(θ) = np n K (θ, z) P ) ] n[1 {X x} K (θ, z)k (θ, z)] dn(x). (14) P n [1 {X x} K (θ, z)] On the other hand, if we consider the second term of the score efficient function (6) and the empirical density function we have P n [ K (θ, z) [,x] ( j=1 K (θ, z) P ) ] n[1 {X x} K (θ, z)k (θ, z)] d ˆ (s) P n [1 {X x} K (θ, z)] τ ( = P n [1 {X x} K (θ, z)k (θ, z)] P n [1 {X x} K (θ, z)] P n[1 {X x} K (θ, z)k (θ, z)] P n [1 {X x} K (θ, z)] =, ) d ˆ (s) with ˆ ( ) the estimator of the cumulative baseline hazard function. Taking the first term of (6) wehave ( P n [δ K (θ, z) E )] n[1 {X x} K (θ, z)k (θ, z)] E n [1 {X x} K (θ, z)] [ ( τ = P n K (θ, z) E ) ] n[1 {X x} K (θ, z)k (θ, z)] dn(x). E n [1 {X x} K (θ, z)] Thus, the estimator ˆθ n and the estimator θ n are equivalent.

366 M. L. Avendaño, M. C. Pardo 6 Simulation study The aim of this section is to check the normality of the estimators of the covariate effects of model (2) for finite-sample through a simulation study. The survival times t were generated under the model λ(t θ, z) = λ (t)k (θ, z), with ( ) 1 α K (θ, z) = 1 + exp( β, α > fixed z) as t = 1 ( ) log(u), K (θ, z) where u U(, 1) (Bender et al. (25)). We assume that the cumulative baseline hazard function is a Weibull distribution with shape parameter γ = 3.2 and scale parameter μ = 1.1. We consider three different experiment designs: First Design: We consider Z β = Z 1 β 1 + Z 2 β 2 + Z 3 β 3 + Z 4 β 4 with true values of the parameters β = (.5,.2,.3,.1) and α = 8, the covariates Z 1, Z 2 and Z 3 are binary variables that were generated from a factor of 4 levels and each level has a probability of {.37,.19,.3,.14}, respectively, and Z 4 is generated from a U( 1, 1). Second Design: We consider Z β = Z 1 β 1 + Z 2 β 2 + Z 3 β 3 with true values of the parameters β = (.3,.6,.1) and α = 8, the covariates Z 1, Z 2 and Z 3 were generated from a B(.35), U( 1, 1) and N (1.5, 2), respectively. Third Design: We consider Z β = Z 1 β 1 + Z 2 β 2 with true values of the parameters β = (2, 1) and α =.2, the covariates Z 1 and Z 2 were generated from a N (, 1) and U( 1, 1), respectively. Several sample sizes n = 1, 2, 4 and different percent of censoring 15, 45 and 7 % for First and Second designs were considered and only 15 % of censoring for Third design was considered. A total of 5 simulated datasets were generated for each configuration of three designs. In Sect. 5, we proved the asymptotic normality of the estimators of the covariate effects and now we investigate this property for small and moderate sample sizes. Therefore, we fit a generalized logit-based proportional hazard model for each one of the 5 simulate datasets. In Tables 2, 3 and 5, we present the average and the median of the estimators of the covariate effects, their estimated standard errors (SE), empirical standard errors (SEe), the p value for the Shapiro Wilks test for testing normality by marginals and the coverage probabilities (CP) of a 9, 95 and 99 % confidence interval by marginals, respectively. In Table 4, the p values for the Henze Zirkler multivariate normality test are presented for the First and Second Designs for each sample size

A semiparametric generalized PH model for right-censoring 367 Table 2 Average and median of the parameter estimates, standard error (SE), empirical standard error (SEe), p value for Shapiro Wilks test and coverage probabilities (CP) for a nominal 9, 95 and 99 %, respectively, for the first design Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for.99 15 % 1 β1.52896.527.1246.1366.2.872.924.976 β2.21965.21197.836.8491.5.872.93.98 β3.32589.3214.11383.11941..914.952.986 β4.1197.9954.5665.5979.1339.884.942.984 2 β1.51267.49862.8451.9725..846.912.978 β2.2341.2339.566.593.1778.878.942.984 β3.31225.3518.7773.8479.7.89.938.978 β4.9996.119.387.3866.19144.94.948.986 4 β1.563.543.5899.6881.854.822.898.978 β2.2462.2295.3981.411.4824.94.954.986 β3.3716.3274.5424.5973.33.862.922.98 β4.1179.117.2713.274.78848.876.934.99 45 % 1 β1.54146.52331.161.17754..858.9.956 β2.21324.28.1441.11171.18.87.932.982 β3.32191.3361.1475.15981..878.916.974 β4.11144.1794.7221.7659.16895.866.934.976 2 β1.5282.51629.1798.11714.66.856.918.972 β2.2995.2793.7161.7443.39918.886.938.986 β3.31986.31547.997.1533.271.862.928.978 β4.181.167.499.4939.3318.92.946.984 4 β1.5671.583.7331.8141.15.86.912.978 β2.2492.253.4959.5388.1724.896.938.982

368 M. L. Avendaño, M. C. Pardo Table 2 continued Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for.99 β3.385.3543.6759.772.389.868.918.978 β4.1229.141.3383.342.79969.9.942.984 7 % 1 β1.5634.5249.4422.51533..834.884.938 β2.24258.2871.28117.4729..864.98.962 β3.396.29561.63577.62722..86.914.956 β4.142.9984.1412.16359..866.912.964 2 β1.5215.5629.1481.1527.2.866.926.984 β2.2112.2937.988.1264.414.894.94.974 β3.3352.313.19855.1932..9.94.972 β4.1836.1876.675.6929.48689.874.926.982 4 β1.51422.5877.19.11539.2883.842.92.962 β2.2797.2566.676.7152.43157.872.922.982 β3.3877.31168.9271.9495.59518.886.942.98 β4.119.993.4614.4748.4989.874.946.992

A semiparametric generalized PH model for right-censoring 369 Table 3 Average and median of the parameter estimates, standard error (SE), empirical standard error (SEe), p value for Shapiro Wilks test and coverage probabilities (CP) for a nominal 9, 95 and 99 %, respectively, for the second design Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for.99 15 % 1 β1.31225.363.8559.9181.21.864.938.972 β2.62663.6167.945.1834..84.894.96 β3.1628.1476.2225.2382.3.864.914.97 2 β1.3951.3939.5781.659.454.884.94.978 β2.6836.6294.633.7822.3.82.912.974 β3.1175.16.1491.1739.1.854.91.958 4 β1.3166.29935.3979.4496.13.874.928.978 β2.6265.59819.4342.592.92.766.846.94 β3.11.9977.117.1121.17298.87.934.98 45 % 1 β1.31524.31323.169.11154.26.882.95.982 β2.63217.6955.11788.1287..868.932.964 β3.168.1288.2793.37..852.922.972 2 β1.3726.2993.7316.7522.38.872.932.982 β2.6233.62235.859.9249.126.826.896.966 β3.1453.1253.1888.272.94.844.898.972 4 β1.36.3154.534.5477.865.866.926.984 β2.61245.6473.5492.6542.199.82.894.964 β3.1237.115.129.1421.7.862.98.966

37 M. L. Avendaño, M. C. Pardo Table 3 continued Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for.99 7 % 1 β1.342.31814.1628.1895..864.918.956 β2.67274.64338.17912.22422..848.896.952 β3.1127.1424.4217.5451..858.98.948 2 β1.31892.31371.1125.1926.1.864.928.972 β2.62598.6825.1119.12647..846.98.962 β3.1519.1257.269.2842..868.918.972 4 β1.3947.338.6885.7721.5.884.928.976 β2.62163.61481.7544.8963.398.818.896.96 β3.1292.1166.1757.1942.317.858.916.972

A semiparametric generalized PH model for right-censoring 371 Table 4 p Values for the Henze Zirkler test Censoring Sample size p values Design 1 p values Design 2 15 % 1.25.16 2.8.11 4.183852.5988 45 % 1.. 2.1884.8949 4.125454.3476 7 % 1.. 2.. 4.118224.34833 and each censoring percentage. Moreover, Fig. 3 shows the estimated density plots of the standardized parameter estimates (by marginals) to check the normality of the estimator of β, and Fig. 4 shows the multivariate qq-plots, for sample size 4 for First and Second designs and different censoring. From Tables 2 and 3 it can be seen that the average of the estimates of the parameters is really close to the true value of the parameters for all configurations in both designs. Furthermore, the estimated standard errors are very near to the empirical ones. The empirical coverage probabilities appear to be quite close to the nominal levels. Although the p values of the Shapiro Wilks normality test increase with n for most censoring, some p values are too small for not rejecting the normality of some estimations, even for n = 4. This can be caused by the optimization algorithm used that it favors one estimator over the others. However, the p values of the Henze Zirkler multivariate normality test (Table 4) accept the multivariate normality for n = 4. Moreover, sometimes the results of the formal tests contradict with the impressions of the graphical analysis. In particular for large sample sizes the blind trust on the formal tests for normality can lead to erroneous results. Figure 3 shows that the empirical density plots (by marginals) improve when n increases and these plots are near to a normal density for moderate censoring, even for small sample size and Fig. 4 supports the approach to a multivariate normality distribution based on the qq-plots for n = 4. From Table 5, we can see that the estimation of the parameters is bad, however, it is expected because for the Third design α is near to zero, so the model approaches to a non-parametric model. Therefore, we should prevent to use this model for α less than.5 since the estimations get worse cause the model approaches to non-parametric form. Appendix A: The efficient score function and the least favorable direction To find the efficient score function for the model (2), we follow the steps in the Cox model Example of Murphy and van der Vaart (2).

372 M. L. Avendaño, M. C. Pardo Table 5 Average and median of the parameter estimates, standard error (SE), empirical standard error (SEe), p value for Shapiro Wilks test and coverage probabilities (CP) for a nominal 9, 95 and 99 %, respectively, for the third design Censoring Sample size Parameter Average Median SE SEe p value CP for.9 CP for.95 CP for.99 15 % 1 β1 2.5437 1.63975 1.22515 2.26258..666.716.786 β2 1.373.81227 1.98859 2.92753..762.81.866 2 β1 2.3593 1.67197.79797 1.79754..438.68.766 β2 1.16351.8366 1.28217 1.77248..796.838.892 4 β1 2.32239 1.88265.5646 1.49761..294.376.64 β2 1.1131.82287.91745 1.2282..79.84.91

A semiparametric generalized PH model for right-censoring 373 a First design β^4 Second design..2.4..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 n=1 n=1 n=1 n=1 β^4..2.4..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 n=2 n=2 n=2 n=2 β^4..2.4..2.4..2.4..2.4..2.4..2.4..2.4 4 2 2 4 4 2 2 4 4 2 2 4 n=1 n=1 n=1..2.4..2.4..2.4 4 2 2 4 4 2 2 4 4 2 2 4 n=2 n=2 n=2..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 4 2 2 4 4 2 2 4 4 2 2 4 b n=4 n=4 n=4 n=4 β^4 n=4 n=4 n=4..2.4..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 n=1 n=1 n=1 n=1 β^4..2.4..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 n=2 n=2 n=2 n=2 β^4..2.4..2.4..2.4..2.4..2.4..2.4..2.4 4 2 2 4 4 2 2 4 4 2 2 4 n=1 n=1 n=1..2.4..2.4..2.4 4 2 2 4 4 2 2 4 4 2 2 4 n=2 n=2 n=2..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 4 2 2 4 4 2 2 4 4 2 2 4 c n=4 n=4 n=4 n=4 β^4 n=4 n=4 n=4..2.4..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 n=1 n=1 n=1 n=1 β^4..2.4..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 n=2 n=2 n=2 n=2 β^4..2.4..2.4..2.4..2.4..2.4..2.4..2.4 4 2 2 4 4 2 2 4 4 2 2 4 n=1 n=1 n=1..2.4..2.4..2.4 4 2 2 4 4 2 2 4 4 2 2 4 n=2 n=2 n=2..2.4..2.4..2.4 4 2 4 4 2 4 4 2 4 4 2 4 4 2 2 4 4 2 2 4 4 2 2 4 n=4 n=4 n=4 n=4 n=4 n=4 n=4 Fig. 3 Empirical density plots for the standardized parameter estimates for sample sizes n = 1, 2 and 4 for the First and Second design with a 15 % of censoring, b 45 % of censoring and c 7 % of censoring

374 M. L. Avendaño, M. C. Pardo a First design Chi Square Q Q Plot Second design Chi Square Q Q Plot Squared Mahalanobis Distance 5 1 15 Squared Mahalanobis Distance 5 1 15 2 5 1 15 5 1 15 Chi Square Quantile Chi Square Quantile b Squared Mahalanobis Distance 5 1 15 Chi Square Q Q Plot Squared Mahalanobis Distance 5 1 15 Chi Square Q Q Plot 5 1 15 Chi Square Quantile 5 1 15 Chi Square Quantile c Squared Mahalanobis Distance 5 1 15 2 Chi Square Q Q Plot Squared Mahalanobis Distance 5 1 15 2 Chi Square Q Q Plot 5 1 15 Chi Square Quantile 5 1 15 Chi Square Quantile Fig. 4 Multivariate qq-plots for sample sizes n = 4 for the First and Second design with a 15 % of censoring, b 45 % of censoring and c 7 % of censoring

A semiparametric generalized PH model for right-censoring 375 The efficient score function for θ at (θ, ) is defined as l θ, (y) = l θ, (y) θ, l θ, (y) (15) where l θ, (y) is the score function for θ at (θ, ) and θ, minimizes the squared distance P ( l θ, g)2 over all functions g in the closed linear span of the score functions for. In this case, as in the Cox model, we have θ, l θ, = A θ, h (16) h = (A θ, A θ, ) 1 A l θ, θ, (17) where A θ, h is the score function for at (θ, ), A is the adjoint operator θ, and h is the least favorable direction. Then, the score function for θ of the model (2) is given by From (5) wehave with and l(θ, )(y) = ([ ] β l(θ, )(y), ) α l(θ, )(y). β l(θ, )(y) = δk 1 (θ, z) (x)k 1 (θ, z)k (θ, z), α l(θ, )(y) = δk 2 (θ, z) (x)k 2 (θ, z)k (θ, z), K 1 (θ, z) = αz 1 + exp(β z) K 2 (θ, z) = ln[k (θ, z)]. Note that, the score function for θ can be expressed as l(θ, )(y) = δ K (θ, z) (x) K (θ, z)k (θ, z), (18) with K (θ, z) = (K 1 (θ, z), K 2 (θ, z)).

376 M. L. Avendaño, M. C. Pardo To obtain the score function for, we define the path t (x) = x (1 + t h(s))d (s) (19) for t R d+1, ( ) given and a bounded function h :[,τ] R d+1. The path t ( ) Ĥ if t. Replacing the path (19) in the log-likelihood (5) we obtain l(θ, t )(y) = δ[log((1 + t h(x))λ (x)) + log K (θ, z)] K (θ, z) x thus, its derivative with respect to t is given by t l(θ, h(x) t)(y) = δ (1 + t h(x)) and taking t =, we can get t l(θ, t)(y) = δh(x) K (θ, z) t= (1 + t h(s))d (s) + c, x K (θ, z) h(s)d (s), x h(s)d (s) = A θ, h(y). (2) Then, if we consider that h is the least favorable direction and by expressions (15), (16), (18) and (2) the efficient score function for θ is given by l θ, (y) = δ( K (θ, z) h (x)) K (θ, z) x ( K (θ, z) h (s))d (s). In addition, to find the expression of the least favorable direction h we use (17), so it is necessary to obtain the expressions of A θ, l(θ, )(y) and A θ, A θ, (x) where A θ, is the adjoint operator characterized by P θ, [(A θ, h)g] = [h(a θ, g)]. Using the last expression we can get P θ, [(A θ, g)(a θ, h)] = [g(a θ, A θ, h)] (21) and [(A θ, l(θ, ))h] =P θ, [ l(θ, )(A θ, h)]. (22)

A semiparametric generalized PH model for right-censoring 377 First, we calculate (A θ, g)(a θ, h) that it is given by x (A θ, g(y)) (A θ, h(y)) = δg(x) h(x) δk (θ, z) [g(x) h(s)d (s) x ] + h(x) g(s)d (s) ( x ) x + (K (θ, z)) 2 g(s)d (s) h(s)d (s). We define x x a(x) = g(x) h(s)d (s) + h(x) g(s)d (s) and integrating partially we have (A θ, g(y)) (A θ, h(y)) x = δg(x) h(x) δk (θ, z)a(x) + (K (θ, z)) 2 a(s)d (s). Furthermore, we can obtain τ P θ, [ δk (θ, z)a(x)] = a(x)e[(k (θ, z)) 2 1 {X x} ]d (x). Therefore, we have P θ, [ δg(x) h(x) δk (θ, z)a(x) + (K (θ, z)) 2 x Finally, by Eq. (21) ] a(s)d (s) = P θ, [δg(x) h(x)] τ [ ] = g(x) h(x) K (θ, z)p[t C > x z]p Z (z)dz d (x) R d τ = g(x) E[K (θ, z)1 {X x} ]h(x)d (x) = [g E[K (θ, z)1 {X x} ]h]. A θ, A θ, h(y) = E[K (θ, z)1 {X x} ]h(y). Analogously, taking the product l(θ, )A θ, h, integrating and using (22)wehave A θ, l(θ, )(y) = E[ K (θ, z)k (θ, z)1 {X x} ].

378 M. L. Avendaño, M. C. Pardo Then, we can define the least favorable direction at the true parameters (θ, ) as h (x) = E [ K (θ, z)k (θ, z)1 {X x} ]. E [K (θ, z)1 {X x} ] Appendix B: Technical details for the proof of Theorem 1 We follow the procedure of Murphy (1994) to establish the consistency of the estimators of the generalized proportional hazards model. First, we prove the following lemmas. Lemma 3 The estimator (13) is consistent at θ,thisis, when n. Proof We can obtain that sup ˆ θ (x) a.s. (x), x [,τ] sup ˆ θ (x) (x) sup 1 x [,τ] x [,τ] P n [1 {Xi x}k (θ, z)] 1 P [1 {Xi x}k (θ, z)] [ ] + sup (P N(x) n P ). P [1 {X x} K (θ, z)] On the other hand, the classes x [,τ] {1 {X x} K (θ, z) : x [,τ], θ } and { } N(x) : x [,τ], θ (23) P [1 {X x} K (θ, z)] are Donsker, and as consequence they are Glivenko Cantelli classes. Thus, we have sup (P n P )[1 {Xi x}k (θ, z)] a.s., x [,τ] additionally, as P [1 {Xi x}k (θ, z)] > forx [,τ] sup x [,τ] 1 P n [1 {Xi x}k (θ, z)] 1 P [1 {Xi x}k (θ, z)] a.s.. Moreover, as the class given in (23) is Glivenko Cantelli we obtain sup x [,τ] [ (P n P ) N(x) P [1 {X x} K (θ, z)] ] a.s..

A semiparametric generalized PH model for right-censoring 379 Then, the stated lemma follows. Lemma 4 lim sup ˆ θ (τ) < almost surely. Proof Under assumption (iv), there exist < c < such as We define the constant c 2 such as for θ given. Thus, we obtain z < c. c 2 = min K (θ, z) z <c P n [1 {X x} K (θ, z)] c 2 P n [1 {X x} ], and by the law of large numbers, almost surely we have P n [1 {X x} K (θ, z)] c 2 P [1 {X x} ]+o P (1). We assume that P [1 {X x} ] > forx [,τ], thus, P n [1 {X x} K (θ, z)] > almost surely when n. This is, the jumps of ˆ θ in τ are bounded by 1/c 2, thus, ˆ θ (τ) O(1)P n N(τ)/c 2 almost surely when n, with N(x) = 1 {X x} δ. Now, we can prove the following consistency theorem, where θ n is the maximum profile log-likelihood estimator of model (2). Theorem 5 The estimators of the generalized proportional hazards model (2) considering n right-censored data are consistent, this is sup ˆ θ n (x) (x) and θ n θ x [,τ] converge to almost surely when n. Proof By Lemma 4 and Helly s selection Theorem, we have that along of a sequence ˆ θ n (x) (x) for x [,τ], where ( ) is a continuous non-decreasing function and θ n θ with θ.

38 M. L. Avendaño, M. C. Pardo On the other hand, because P n [1 {X x} K (θ, z)] >ɛ>, we have that ˆ θ n ( ) is absolutely continuous with respect to ˆ θ ( ) and d ˆ θn (x) converges to a bounded d ˆ θ (x) measurable function ζ(x), thisis (x) = x ζ(s)d (s). Thus, ( ) is absolutely continuous with respect to ( ) and its derivative is denoted by λ ( ), moreover ζ(x) = λ (x) λ (x). In addition, as ( θ n, ˆ θ n ) maximize l n (θ, ) we have 1 n n [l( θ n, ˆ θ n )(y) l(θ, ˆ θ )(y)], i=1 if n, P [l(θ, )(y) l(θ, )(y)]. By the Glivenko Cantelli Theorem and as d ˆ θn (x) d ˆ converges uniformly to λ (x) θ (x) λ (x),we can conclude that the Kullback Leibler information between the density given by the parameters (θ, ) and the density given by the parameters (θ, ) is negative, thus two densities are equal almost surely. This implies that θ = θ and ( ) = ( ) in [,τ]. Then, θ n θ and ˆ θ n (x) (x) almost surely with x [,τ]. Lemma 6 Under the consistency of the estimators θ n and ˆ θ n we have P [ l(θ, θ n, ˆ θ n )]=o P ( θ n θ ). Proof From expression (11) wehave [ τ P [ l(θ, θ n, ˆ θ n )]=P K (θ, z)1 {X s} (h (s) K (θ, z))d ˆ θ n (s) τ Moreover, we can obtain the following equalities K (θ, z)k (θ, z)1 {X s} ( θ n θ ) h (s)d ˆ θ n (s) ] + δ( K (θ, z) h (x)) 1 + ( θ n θ ) h (x) + δ K (θ, z)( θ n θ ) h (x). 1 + ( θ n θ ) h (x)

A semiparametric generalized PH model for right-censoring 381 P [K (θ, z)1 {X x} (h (x) K (θ, z))] =, [ ] δ( K (θ, z) h (x)) P = 1 + ( θ n θ ) h (x) and [ ] [ ] δ K (θ, z)( θ n θ ) h (x) δh (x)( θ n θ ) h (x) P = P. 1 + ( θ n θ ) h (x) 1 + ( θ n θ ) h (x) Thus, τ P [ l(θ, θ n, ˆ θ n )]=P [ K (θ, z)k (θ, z)1 {X s} ( θ n θ ) h (s)d ˆ θ n (s) ] + δh (x)( θ n θ ) h (x). (24) 1 + ( θ n θ ) h (x) As s M(s) = N(s) K (θ, z)1 {X s} (1 + (θ θ ) h (s))d (s) is a martingale with media, we have P [ l(θ, θ n, )]= τ = P [ K (θ, z)k (θ, z)1 {X s} ( θ n θ ) h (s)d (s) ] + δ h (x)( θ n θ ) h (x). (25) 1 + ( θ n θ ) h (x) Finally, from expressions (24) and (25) wehave P [ l(θ, θ n, ˆ θ n )]=P [ l(θ, θ n, ˆ θ n )] P [ l(θ, θ n, )] = P [ τ K (θ, z)k (θ, z)1 {X s} ( θ n θ ) h (s)d( ˆ θ n (s) (s)) +2δh (x)( θ n θ ) h (x) 1 + ( θ n θ ) h (x) = o P ( θ n θ ). ]

382 M. L. Avendaño, M. C. Pardo Lemma 7 Let V be a neighborhood of (θ, θ, ) where θ and are the true values of the parameter of the generalized proportional hazards model (2) considering right-censored data and l(t, θ, ) the log-likelihood function for the submodel as (9). (i) The class of functions { l(t, θ, ) : (t, θ, ) V } is Donsker with squared-integrable envelope function. (ii) The class of functions { l(t, θ, ) : (t, θ, ) V } is Glivenko Cantelli and bounded in L 1 (P ). Proof Let F be the set of continuous distribution functions. For ρ>, we define C ρ ={P F : P P ρ}, where P is the true distribution function. On the other hand, as the vector z is bounded (assumption iv), the class {β z : β β } is Donsker, with β such as = β α. As the function exp(β z) is differentiable and its derivatives are bounded, the class {exp(β z) : β β } is Donsker. Moreover, as 1 + exp(β z)> we have that { ( K (θ, z) = 1 1 + exp( β z) ) α : θ }, (26) and { } αz K 1 (θ, z) = 1 + exp(β z) : θ are Donsker. In addition, as f (x) = ln(x) with domain in [c, ) where c > is Lipschitz we obtain that the class is Donsker. Thus, {K 2 (θ, z) = ln[k (θ, z)] :θ } { K (θ, z) : θ }

A semiparametric generalized PH model for right-censoring 383 is Donsker. As the class {1 {X x} : x [,τ]} is Donsker, we have that the classes and {K (θ, z)1 {X x} : x [,τ], θ } { K (θ, z)k (θ, z)1 {X x} : x [,τ], θ } are Donsker. For P C ρ, as the function P E P [ f ] is Lipschitz, the classes and {E P [K (θ, z)1 {X x} ]:x [,τ], θ, P C ρ } {E P [ K (θ, z)k (θ, z)1 {X x} ]:x [,τ], θ, P C ρ } are Donsker. On the other hand, as z is bounded as θ, there exist m and M such as < m < K (θ, z) <M <. (27) As the function P E P [ f ] is continuous, there exist ρ 1 > such as for each P C ρ we have E P [1 {X x} ] ρ 1 >. (28) From (27) and (28) we obtain <ρ 1 m me P [1 {X x} ] E P [K (θ, z)1 {X x} ] ME P [1 {X x} ] < thus, the class { } 1 E P [K (θ, z)1 {X x} ] : x [,τ], θ, P C ρ is Donsker and in consequence the class { K (θ, z) E } P[ K (θ, z)k (θ, z)1 {X x} ] : x [,τ], θ, P C ρ E P [K (θ, z)1 {X x} ] (29) is it. As the function f d is Lipschitz, the class { [,x] ( K (θ, z) E ) } P[ K (θ, z)k (θ, z)1 {X x} ] d : x [,τ], θ, P C ρ E P [K (θ, z)1 {X x} ] (3)

384 M. L. Avendaño, M. C. Pardo is Donsker. Finally, from (26), (29) and (3) we have that { l(θ, ) : x [,τ], θ, P C ρ } is Donsker with squared-integrable envelope function, this proves (i). Analogously, (ii) could be proved. Acknowledgments This work was supported by research grant MTM213-4778-R and MAEC-AECID. The authors are grateful to the referees for their valuable comments and proposals which have improved the contents of this paper. References Aranda-Ordaz, F. J. (1983). An extension of the proportional-hazards model for grouped data. Biometrics, 39, 19 117. Balakrishnan, N. (1992). Handbook of the logistic distribution. New York: Marcel Dekker. Bender, R., Augustin, T., Blettner, M. (25). Generating survival times to simulate Cox proportional hazards model. Statistics in Medicine, 24(11), 1713 1723. Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, 141 151. Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society Series B, 34(2), 187 22. Devarajan, K., Ebrahimi, N. (211). A semi-parametric generalization of the cox proportional hazards regression model: inference and applications. Computational Statistics and Data Analysis, 65(1), 667 676. Etezadi-Amoli, J., Ciampi, A. (1987). Extended hazard regression for censored survival data with covariates: a spline approximation for the background hazard function. Biometrics, 43, 181 192. Hougaard, P. (1984). Life table methods for heterogeneous populations: distributions describing heterogeneity. Biometrika, 71, 75 83. Kalbfleisch, J. D., Prentice, R. L. (22). The statistical analysis of failure time data (2nd ed.). New York: Wiley. Kosorok, M. R. (28). Introduction to empirical processes and semiparametric inference. NewYork: Springer. MacKenzie, G. (1996). Regression models for survival data: the generalized time-logistic family. Statistician, 45, 21 34. MacKenzie, G. (1997). On a non-proportional hazards regression model for repeated medical random counts. Statistics in Medicine, 16, 1831 1843. Murphy, S. A. (1994). Consistency in a proportional hazards model incorporating a random effect. Annals of Statistics, 22(2), 712 731. Murphy, S. A., van der Vaart, W. (2). On profile likelihood. Journal of the American Statistical Association, Theory and Methods, 95(45), 449 465. Sasieni, P. D. (1995). Efficiently weighted estimating equations with application to proportional excess hazards. Lifetime data analysis, 1, 49 57. Thomas, D. C. (1986). Use of auxiliary information in fitting non-proportional hazards model. In S. H. Moolgavkar R. L. Prentice (Eds.), Modern statistical methods in chronic disease epidemiology (pp. 197 21). New-York: Wiley. Tibshirani, R., Ciampi, A. (1983). A family of additive and proportional hazard models for survival data. Biometrics, 39(1), 141 147. Younes, N., Lachin, J. (1997). Link-based models for survival data with interval and continuous time censoring. Biometrics, 53, 1199 1211.