Rank-based inference for the accelerated failure time model

Size: px

Start display at page:

Download "Rank-based inference for the accelerated failure time model"

Rosamond Parker
6 years ago
Views:

1 Biometrika (2003), 90, 2, pp Biometrika Trust Printed in reat Britain Rank-based inference for the accelerated failure time model BY ZHEZHEN JIN Department of Biostatistics, Columbia University, Ne York, Ne York 10032, U.S.A. D. Y. LIN Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina , U.S.A. L. J. WEI Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, U.S.A. AND ZHILIAN YIN Department of Statistics, Columbia University, Ne York, Ne York 10027, U.S.A. SUMMARY A broad class of rank-based monotone estimating functions is developed for the semiparametric accelerated failure time model ith censored observations. The corresponding estimators can be obtained via linear programming, and are shon to be consistent and asymptotically normal. The limiting covariance matrices can be estimated by a resampling technique, hich does not involve nonparametric density estimation or numerical derivatives. The ne estimators represent consistent roots of the non-monotone estimating equations based on the familiar eighted log-rank statistics. Simulation studies demonstrate that the proposed methods perform ell in practical settings. To real examples are provided. Some key ords: Accelerated life model; Censoring; ehan statistic; Linear programming; Rank estimator; Survival data; Weighted log-rank statistic. 1. INTRODUCTION The accelerated failure time model or accelerated life model relates the logarithm of the failure time linearly to the covariates (Kalbfleisch & Prentice, 1980, pp. 32 4; Cox & Oakes, 1984, pp. 64 5). As a result of its direct physical interpretation, this model provides an attractive alternative to the popular Cox (1972) proportional hazards model for the regression analysis of censored failure time data. The presence of censoring in failure time data creates a serious challenge in the semiparametric analysis of the accelerated failure time model. Several semiparametric estimators ere proposed around 1980, Buckley & James (1979) providing a modification of the Donloaded from at Columbia University Libraries on March 27, 2015

2 342 Z. JIN, D. Y. LIN, L. J. WEI AND Z. YIN least-squares estimator to accommodate censoring and Prentice (1978) proposing rank estimators based on the ell-knon eighted log-rank statistics. The asymptotic properties of the Buckley James and rank estimators ere rigorously studied by Ritov (1990), Tsiatis (1990), Lai & Ying (1991a, b) and Ying (1993) among others. Despite the theoretical advances, semiparametric methods for the accelerated failure time model have rarely been used in applications, mainly because of the lack of efficient and reliable computational methods. The existing semiparametric estimating functions are step functions of the regression parameters ith potentially multiple roots, and the corresponding estimators may not be ell defined. Furthermore, analytical evaluations of the covariance matrices of the estimators ould require nonparametric density function estimation. To bypass the variance-covariance estimation, Wei et al. (1990) developed an inference procedure based on the so-called minimum-dispersion statistic. Calculation of the minimum-dispersion statistics and of the semiparametric estimators involves minimisations of discrete objective functions ith potentially multiple local minima. Such minimisation problems cannot be solved by conventional optimisation algorithms. Lin & eyer (1992) suggested a computational method based on simulated annealing (Kirkpatrick et al., 1983), but their algorithm is not guaranteed to yield the true minimum. In the present paper, e provide simple and reliable methods for implementing the aforementioned rank estimators. We first sho that the rank estimator ith the ehan (1965)-type eight function can be readily obtained by minimising a convex objective function through a standard linear programming technique. We then introduce a class of monotone estimating functions to approximate the possibly non-monotone eighted logrank estimating functions around the true values of the regression parameters. These estimating equations are solved through an iterative algorithm ith the ehan-type estimator as the initial value. Each iteration can also be executed via linear programming. This procedure yields a consistent root of the eighted log-rank estimating equation, hich potentially contains inconsistent roots. The covariance matrices for both the ehantype and iterative estimators can be easily estimated by a resampling technique, hich does not involve density estimation. 2. PROPOSED METHODS 2 1. Accelerated failure time model and rank estimators For,...,n, let T be the failure time for the ith subject and let X be the associated i i p-vector of covariates. The accelerated failure time model specifies that log T i =b 0 X i +e i (,...,n), (2 1) here b is a p-vector of unknon regression parameters and e (,...,n)are independent error terms ith a common, but completely unspecified, distribution. 0 i Let C be the censoring time for T. Assume that T and C are independent conditionally i i i i on X. We impose Conditions 1 4 of Ying (1993, p. 80). The data consist of (TB, D, X ) i i i i (,...,n), here TB =T mc and D =1. Here and in the sequel, amb=min(a, b), i i i i {Ti C i } and 1 is the indicator function. {.} Define e (b)=log TB b X,N(b; t)=d 1 i i i i i {ei (b) t} and Y i (b; t)=1 {e i (b)t}. Note that N i and Y are the counting process and at-risk process on the time scale of the residual. Write i S(0)(b; t)=n 1 n Y (b; t), i S(1)(b; t)=n 1 n Y (b; t)x. i i Donloaded from at Columbia University Libraries on March 27, 2015

3 Accelerated failure time model The eighted log-rank estimating function for b takes the form 0 U (b)= n D {b; e(b)}[x X9 {b; e(b)}], i i i i or 343 U (b)= n P 2 (b; t){x X9 (b; t)} dn (b; t), (2 2) i i 2 here X9 (b; t)=s(1)(b; t)/s(0)(b; t), and is a possibly data-dependent eight function satisfying Condition 5 of Ying (1993, p. 90). The choices of =1 and =S(0) correspond to the log-rank (Mantel, 1966) and ehan (1965) statistics, respectively. Let b@ be a root of the estimating function U (b). It has been established that the random vector nd b ) is asymptotically zero-mean normal ith covariance matrix 0 A 1 B A 1, here A = lim n 1 n P 2 (b ; t){x X9 (b ;t)}e2{l<(t)/l(t)} dn (b ; t), 0 i 0 i 0 n2 2 B = lim n 1 n P 2 2(b ; t){x X9 (b ;t)}e2 dn (b ; t), 0 i 0 i 0 n2 2 l(.) is the common hazard function of the error terms, and l<(t)=dl(t)/dt (Tsiatis, 1990; Lai & Ying, 1991b; Ying, 1993). The foregoing result requires that A be nonsingular. This condition is assumed to hold in the present paper. In general, U (b) is neither continuous nor componentise monotone in b. Thus, it is difficult to solve the equation U (b)=0, especially hen b is high-dimensional. In fact, there are potentially multiple solutions to the equation, some of hich are inconsistent. Although one may define the estimator to be a minimiser of the norm of U (b), itisnot easy to locate a minimiser either, because of the discontinuity and non-monotonicity. Another difficulty lies in the variance estimation. Since A involves the derivative of the hazard function, direct evaluation of the covariance matrix ould require nonparametric estimation of the density function, hich cannot be done reliably in practical samples ith censored observations. One might estimate A by the numerical derivative of n 1U ; hoever, this kind of estimator can be highly unstable ehan-type eight function Considerable simplification arises in the special case of (b; t)=s(0)(b; t), hich is referred to as the ehan-type eight function. In this case, U can be ritten as or U (b)= n D S(0){b; e(b)}[x X9 {b; e(b)}], i i i i U (b)=n 1 n n D (X X )1 i i j {ei (b) e j=1 j (b)}, (2 3) hich is monotone in each component of b (Fygenson & Ritov, 1994). It is not difficult to see that (X X )1 i j {ei (b) e j (b)} is the gradient in b of {e i (b) e (b)}, here j Donloaded from at Columbia University Libraries on March 27, 2015

4 344 Z. JIN, D. Y. LIN, L. J. WEI AND Z. YIN a = a 1 {a<0}. Thus, the right-hand side of (2 3) is the gradient of the convex function L (b))n 1 n n D {e (b) e (b)}. i i j j=1 We denote a minimiser of L (b) by b@. Although b@ is not necessarily unique, all the minimisers of L (b) are asymptotically equivalent. The minimisation of L (b) can be carried out by linear programming: e minimise the linear function Wn Wn D u subject to the linear constraints u 0 and j=1 i ij ij u ij {e i (b) e j (b)} (i, j=1,...,n). This type of linear programming has been used extensively in the econometrics literature, especially in connection ith the regression quantiles introduced by Koenker & Bassett (1978), and as previously considered by Lin et al. (1998) for the rank estimation of the accelerated time model for counting processes. The minimisation of L (b) is equivalent to the minimisation of n n D e (b) e (b) + n n D (X X ) K i i j KM b, k l k j=1 k=1 l=1 here M is an extremely large number. The latter type of minimisation can be implemented ith the algorithm of Koenker & D Orey (1987), hich is available in S-Plus and other softare. According to the general asymptotic theory for the rank estimators stated in 2 1, the random vector nd b ) is asymptotically zero-mean normal ith covariance matrix 0 A 1 B A 1, here A and B are A and B evaluated at =. As discussed before, A involves the unknon hazard function, so that it is difficult to estimate the covariance matrix analytically. We ill develop a resampling scheme similar to those of Rao & Zhao (1992), Parzen et al. (1994) and Jin et al. (2001) to approximate the distribution of b@, particularly its variance-covariance matrix. To be specific, e define a ne loss function L *(b))n 1 n n D {e (b) e (b)} Z, i i j i j=1 here Z (,...,n)are independent positive random variables ith E(Z )=var(z )=1, i i i and are independent of the data (TB, D, X )(,...,n).this loss function is essentially i i i a perturbed version of the original L (b), perturbed by random variables Z (,...,n). i Minimisation of L * is also a linear programming problem. Let b@* be a minimiser of L *, that is a root of the estimating function U* (b)) n D S(0){b; e(b)}[x X9 {b; e(b)}]z. i i i i i Note that U* has the same mean and approximately the same variance as U. We sho in the Appendix that the asymptotic distribution of nd b ) can be approximated by 0 the conditional distribution of nd* b@ ) given the data (TB, D,X)(,...,n). i i i Conditional on the data (TB, D, X )(,...,n), the only random elements in L *(b) i i i are the Z s. To approximate the distribution of b@, e produce a large number of realisations of b@* by repeatedly generating the random sample (Z,...,Z ) hile holding the 1 n i data (TB, D, X )(,...,n) at their observed values. The covariance matrix of b@ can i i i then be approximated by the empirical covariance matrix of b@*. Confidence intervals Donloaded from at Columbia University Libraries on March 27, 2015

5 Accelerated failure time model for individual components of b 0 can be obtained from the percentiles of the empirical distribution of b@* or by the Wald method eneral eight functions The efficiency of the ehan statistic and the corresponding rank estimator b@ depends on the censoring distribution. The variance of b@ is minimised if the limit of (t) is proportional to l<(t)/l(t). Thus, it is desirable to use general eight functions. The simple conversion of the ehan estimating function U to an L 1 -type loss function does not extend directly to the general eighted log-rank estimating functions. On the basis of b@, hoever, it is possible to construct eighted ehan-type loss functions, hose minimisers can be readily obtained by the linear programming algorithm. We consider the folloing modification of (2 2): UB (b; b@)) n P 2 y{b@; t+(b b@) X }S(0)(b; t){x X9 (b; t)} dn (b; t), (2 4) i i i 2 here y(b; x)=(b; x)/s(0)(b; x), and b@ is a preliminary consistent estimator of b, b@ say. 0 Clearly, UB (b; b@)= n D y{b@; e(b)+(b b@) X }S(0){b; e(b)}[x i i i i i X9 {b; e(b)}] i =n 1 n n y{b@; e)}d (X X )1 i i i j {ei (b) e j=1 j (b)}. This is the same as (2 3) except for the eights y{b@; e)}, hich are free of b. Thus, i UB (b; b@) is monotone in each component of b and is the gradient of the folloing extension of L (b): L (b; b@))n 1 n n y{b@; e)}d {e (b) e (b)}. i i i j j=1 As in the case of L, minimisation of L (.; b@) can be implemented via linear programming. We propose the folloing iterative algorithm for estimating b : 0 b@ =b@, b@ =arg min L (b; b@ ) (k1). (0) b By comparing (2 2) and (2 4), e see that, if b@ converges to a limit as the number of iterations k2, then the limit must satisfy U (b)=0. In such situations, the proposed algorithm constitutes a numerical method for solving the original estimating equation U (b)=0. The resulting estimator is consistent although the original estimating equation may contain inconsistent roots. In all the simulated and real datasets e have tested, b@ converges as k increases and, for k as small as 3, b@ is already close to the limit. It is important to point out that b@ itself is a legitimate estimator for any k and the convergence of the algorithm is not essential to the validity of the proposed method. We sho in the Appendix that, for any k, the estimator b@ is consistent and asymptotically normal. In fact, it follos from (A 5) in the Appendix that Donloaded from at Columbia University Libraries on March 27, 2015 b@ ={(A +D ) 1D }kb@ +[I {(A +D ) 1D }k]b@ +o p (n D),

6 346 Z. JIN, D. Y. LIN, L. J. WEI AND Z. YIN here I is the p p identity matrix, D = lim n 1 n P y < (t)s(0)(b ; t){x X9 (b ;t)}e2 dn (b ; t), (2 5) 0 0 i 0 i 0 n2 and y< (t) is the derivative of y (t), the limit of y(b ;t)as n2. Thus, b@ is asymptotically a eighted average of b@ and b@, and can be regarded as an approximation to b@ If {(A +D ) 1D }k0 as k2, then the limit of b@ as k2, denoted by b@,is (2) asymptotically equivalent to b@. Note that {(A +D ) 1D }k0 as k2 is equivalent to all the eigenvalues of (A +D ) 1D being less than 1 in absolute value. This condition holds if A is positive definite and D is nonnegative definite. The most commonly used eight functions, including the log-rank, Prentice Wilcoxon (Prentice, 1978) and the more general r class of Harrington & Fleming (1982), are all nonnegative and nonincreasing. For such eight functions, A is indeed positive definite provided that the distribution of X does not concentrate on a (p 1)-dimensional hyperplane (Ying, 1993, pp. 81 2). It is evident from (2 5) that D ill be nonnegative definite if y< (.)0, hich is true of all commonly used eight functions. Therefore, b@ is asymptotically equivalent to b@, at least for most important eight functions. (2) 0 We sho in the Appendix that, under mild conditions, b@ b@ =o (n D) for any k of p order log(n) provided that {(A +D ) 1D }k0 as k2. This implies that, ith a finite number of iterations, b@ is asymptotically equivalent to the consistent roots of the original estimating function U (b). In practice, one ould terminate the iteration process hen the difference beteen successive estimates is less than a certain bound. This convergence criterion may be set on the basis of the observed b@ and its estimated standard error, as demonstrated in the next section. Although b@ can be easily obtained, it is again difficult to estimate the limiting covariance matrix analytically. As in the case of b@, e appeal to the resampling approach. To be specific, e construct the perturbed objective function L *(b; b))n 1 n n y{b; e (b)}d {e (b) e (b)} Z, i i i j i j=1 and define b@* =b@* and b@* =arg min L *(b; b@* )(k1). We sho in the Appendix (0) b that the asymptotic distribution of nd b ) can be approximated by the conditional 0 distribution of nd* b@ ) given the data (TB, D, X )(,...,n). Inference about b i i i 0 can then be carried out on the basis of the empirical distribution of b@*. 3. NUMERICAL RESULTS 3 1. Real examples The results reported here and in 3 2 are based on y=1 and y=1/s(0), hich are referred to as the ehan and log-rank eight functions. For the resampling procedure, e simulated the Z i from the unit exponential distribution and generated 500 realisations. The results ere similar hen other distributions of the Z i ere used. The Wald method as used to construct the confidence intervals. We first applied the proposed methods to a study on multiple myeloma reported by Krall et al. (1975). Out of a total of 65 patients ho ere treated ith alkylating agents, 48 died during the study and 17 survived. This is the main example in the PROC PHRE of the SAS/STAT User s uide (1999, pp , ), hich focused on the Cox model ith haemoglobin, HB, and the logarithm of blood urea nitrogen, BUN, as the Donloaded from at Columbia University Libraries on March 27, 2015

7 Accelerated failure time model covariates. To improve numerical efficiencies, e standardised the to covariates log(bun) and HB in our analysis. The parameter estimates for standardised log(bun) and HB based on the ehan eight function are and ith estimated standard errors of and The corresponding 95% confidence intervals are ( 0 818, 0 246) and ( 0 039, 0 623). We used the iterative algorithm to obtain the estimates based on the log-rank eight function. The results from the ehan eight function suggested that it is not necessary to have accurate numbers beyond the second decimal point. Thus, e set the convergence criterion for the iteration to be 0 01 beteen successive estimates. Based on this criterion, convergence as achieved at the third iteration. The resulting estimates are and ith 95% confidence intervals ( 0 880, 0 130) and ( 0 017, 0 553). The estimates are identical up to the eighth decimal point after 5 iterations. For further illustration, e considered the ell-knon Mayo primary biliary cirrhosis data (Fleming & Harrington, 1991, Appendix D.1). The database contains information about the survival time and prognostic factors for 418 patients. The data ere used by Dickson et al. (1989) to build a Cox proportional hazards model for the natural history of the disease ith five covariates, age, log(albumin), log(bilirubin), oedema and log(protime); see Fleming & Harrington (1991, Table 4.6.3). We considered the same set of covariates in our analysis. We again standardised the covariates. The estimates based on the ehan eight function are 0 270, 0 204, 0 593, and ith estimated standard errors of 0 062, 0 069, 0 071, and These results also suggested that the convergence criterion of 0 01 can be used for calculating the log-rank-type estimates. This criterion as again met at the third iteration. The estimates ere identical up to the eighth decimal point after 12 iterations. We display in Table 1 the results for the unstandardised covariates. Although the conclusions are not qualitatively different, the analysis under the accelerated failure time model provides an alternative and more direct interpretation of the effects of covariates on the survival time, as compared to the Cox regression analysis. The results based on the to eight functions are fairly similar. The point estimates for the log-rank eight function are similar to those of Lin & eyer (1992), hich ere obtained by a simulated annealing algorithm. Table 1. Accelerated failure time regression for the Mayo primary biliary cirrhosis data ehan eight function Log-rank eight function Parameter Est SE 95% CI Est SE 95% CI Age ( 0 037, 0 014) ( 0 035, 0 018) log(albumin) (0 542, 2 639) (0 934, 2 378) log(bilirubin) ( 0 716, 0 442) ( 0 674, 0 496) Oedema ( 1 421, 0 336) ( 1 083, 0 385) log(protime) ( 4 549, 0 987) ( 2 850, 1 038) Est, estimate; SE, standard error; CI, confidence interval. 347 Donloaded from at Columbia University Libraries on March 27, Simulation studies Extensive simulation studies ere conducted to assess the operating characteristics of the proposed methods in practical settings. One series of the experiments mimicked the foregoing multiple myeloma study. The failure times ere generated from the model log T = log(bun) HB+e, (3 1)

8 348 Z. JIN, D. Y. LIN, L. J. WEI AND Z. YIN here e is a normal random variable ith mean 0 and standard deviation The regression coefficients and the standard deviation of e in model (3 1) ere estimated from the parametric accelerated failure time model. The censoring times ere generated from the Un(0, t) distribution, here t as chosen to yield a desired level of censoring. We first examined the difference beteen log-rank-type estimates obtained after a fixed number of iterations versus those obtained after convergence. We generated 1000 datasets ith sample size of 65 from model (3 1). The standardised covariate values from the original myeloma study ere used. The algorithm as considered convergent if the difference beteen to consecutive estimates as less than ; convergence as usually achieved ithin 10 iterations. Figure 1(a) plots the estimates obtained after convergence versus those obtained after three iterations for the estimation of the first regression coefficient under 20% censoring; the to sets of estimates are clearly very similar. Figure 1(b) displays the corresponding plot for the log-rank estimates versus the ehan estimates, the to of hich are noticeably different. Thus, the phenomenon in Fig.1(a) is not an artifact of the closeness of the ehan and log-rank estimates. The plots for the estimation of the second regression coefficient and for other censoring levels revealed similar results. The values of the log-rank estimating functions at convergence versus after three iterations ere similar: under 20% censoring, the medians of the L 1 -norms of n 1UB among the 1000 datasets ere at convergence and after three iterations; the corresponding maxima ere 0 07 and These results demonstrated that a small number of iterations is sufficient. Log-rank-type est. after conv (a) Log-rank-type est. after 3 iter. Log-rank-type est. after conv (b) ehan-type est. Fig. 1. Comparisons of different estimates of b 01 under 20% censoring: (a) log-rank-type estimates after convergence versus log-rank-type estimates after three iterations; (b) log-rank-type estimates after convergence versus ehan-type estimates. Donloaded from at Columbia University Libraries on March 27, 2015 Most of the simulation studies ere concerned ith the performance of the proposed inference procedures based on the ehan-type estimator and the log-rank-type estimator ith three iterations. We considered sample sizes of 65 and 130. For n=65, the standardised covariate values from the multiple myeloma dataset ere used; for n=130, each covariate value in the original dataset as duplicated. For each configuration of the simulation parameters, e generated 500 datasets ith the same set of covariate values. The results of the simulation studies are summarised in Table 2. The parameter estimators

9 Accelerated failure time model 349 Table 2. Summary statistics for the simulation studies: regression coeycients b 01 and for log(bun) and HB, respectively n Censoring Weight Parameter Bias SE SEE 90% CP 95% CP 65 0% ehan b Log-rank b % ehan b Log-rank b % ehan b Log-rank b % ehan b Log-rank b % ehan b Log-rank b % ehan b Log-rank b Bias, bias of the estimator; SE, standard error of the estimator; SEE, mean of the standard error estimator; 90% CP, coverage probability of the 90% Wald-type confidence interval; 95% CP, coverage probability of the 95% Wald-type confidence interval appear to be virtually unbiased. The standard error estimators reflect ell the true variabilities of the parameter estimators, and the corresponding confidence intervals have satisfactory coverage probabilities. 4. REMARKS Fygenson & Ritov (1994) characterised the class of eight functions in (2 2) hich yields monotone estimating functions. The class is quite restrictive and involves the censoring time distribution. By contrast, the class of monotone estimating functions developed in the present paper is very broad and yields consistent roots of the original eighted logrank estimating equations. On the basis of (2 4), e can obtain a one-step approximation to b@ for any eight function, as follos. Define Donloaded from at Columbia University Libraries on March 27, 2015 ya (b; x)=s(0)(b; x) 1(b; x)i D A 1, and replace y{b@; t+(b b@) X i } in (2 4) by ya {b@ ;t+(b b@ ) X i }. It then follos from (A 4) in the Appendix that the resulting estimator is asymptotically equivalent to b@.to

10 350 Z. JIN, D. Y. LIN, L. J. WEI AND Z. YIN implement this one-step estimator, one ould need to estimate A and D. A consistent estimator of A can be easily obtained because A 1 B A 1 and B can be consistently estimated from the data. The estimation of D ould require an estimator of l(.), hich can be obtained either nonparametrically or parametrically. The proposed estimators do not achieve the semiparametric efficiency bound. By adopting the approach of Lai & Ying (1991b, 3), one can construct data-dependent eight functions hich yield asymptotically efficient estimators provided that the iterative procedure converges. ACKNOWLEDEMENT This research as supported by the National Institutes of Health, the National Science Foundation and the Ne York City Council Speaker s Fund for Public Health Research. The authors thank the revieers for their helpful comments. Softare implementing the proposed methods is available from the authors. APPENDIX Derivation of asymptotic results Asymptotic properties of b@*. Both n 1L and n 1L * are convex functions and, by the strong la of large numbers, converge almost surely to the same limiting function. The first derivative of the limiting function is zero at b and its second derivative at b is A, hich is nonsingular. Thus, 0 0 this function has a unique minimiser b. It then follos from convex analysis that b@ b and 0 0 b@* b almost surely. The latter convergence is ith respect to the joint probability space of 0 (TB, D, X )(,...,n) and Z (,...,n). We shall use F to denote the s-field generated by i i i i the original data. It follos from Theorem 2 of Ying (1993) that almost surely By similar arguments, U )=U (b 0 )+na )+o(nd+ndb@ d). (A 1) U* * )=U* )+na * b@ )+o(nd+ndb@* b@ d) (A 2) almost surely. The functions U* and U have the same asymptotic slope matrix A in (A 1) and (A 2) because E{U* (b) F}=U (b) and b@ and b@* are consistent. We have that U* )=U* ) U )+o(nd) since b@ is a root of U (b). Thus n DU* )=n D n P 2 S(0) ; t){x X9 ; t)} dn ;t)(z 1)+o(1). i i i 2 Conditionally on F, the right-hand side of the above display is a normalised sum of independent zero-mean random vectors. Since its conditional covariance matrix converges almost surely to B, the multivariate central limit theorem implies that n DU* ) converges in distribution to N(0, B ) almost surely. It then follos from (A 2) that the conditional distribution of nd* b@ ) given F converges almost surely to N(0, A 1 B A 1 ), hich is the limiting distribution of b ). 0 Asymptotic properties of b@. We first sho that b@ is strongly consistent for any given k. Recall that b@ =b@. As a result of the strong consistency of b@, the function n 1L (b; b@ ) (0) (0) (0) converges almost surely to lim n 1L (b; b ), hich reaches its minimum at b=b. Since 0 0 n 1L (b; b@ ) is convex, b@ must converge to b almost surely. Applying these arguments successively for k=2,3,...,e see that b@ b almost surely as n2 for all k. 0 (0) (1) 0 To avoid possible tail instabilities, e assume that, for any b and g hich converge almost n n Donloaded from at Columbia University Libraries on March 27, 2015

11 surely to b 0 and 0, respectively, Accelerated failure time model y(b n ;t+g n )=y(b n ;t)+y< 0 (t)g n +o(n D+g n ) uniformly in t. This condition holds for virtually all the existing eight functions for t aay from the endpoint. For it to hold for all t, e typically ould have to use hich is truncated at the tail, that is (t)=0 for t near the endpoint. Under this condition and Conditions 1 5 of Ying (1993), e can expand UB (b; b@ ) to yield 351 UB (b; b@ )= n P ; t)s(0)(b; t){x i X9 (b; t)} dn i (b; t) + n P y < (t)s(0)(b; t){x X9 (b; t)}x dn (b; t)(b b@ ) 0 i i i +o(nd+ndb b@ d) as n2 and bb 0. By Theorem 2 of Ying (1993), the first term on the right-hand side of the above display has the linear expansion U (b 0 )+na (b )+o(nd+ndb d). Thus, UB ; b@ )=U (b 0 )+n(a +D ) ) nd ) +o(nd+ndb@ d+ndb@ d). Suppose that A,A and A +D are nonsingular. By combining (A 1) and (A 3) ith k=1, e obtain nd (1) )= n D(A +D ) 1{U (b 0 )+D A 1 U (b 0 )} +o(1+nd db@ (1) d+nddb@ d). It then follos from recursive use of (A 3) that in general or nd b )= n D k {(A +D ) 1D }j 1(A +D ) 1U (b ) 0 0 j=1 n D{(A +D ) 1D }ka 1 U (b 0 )+o 1+nD k db@ b d A B (j) 0 j=0 nd b )= n D[I {(A +D ) 1D }k]a 1 0 U (b 0 ) n D{(A +D ) 1D }ka 1 U (b 0 )+o 1+nD k db@ b d A (j) 0 j=0 B. (A 5) The asymptotic normality of nd b ) then follos from that of n DU (b ) and n DU (b ) Suppose that is truncated at the tail and the error density is sufficiently smooth. Then it follos from equation (4.5) of Lai & Ying (1991b) that, for some g>0 and c>0, (A 3) holds ith o(nd g) as the remainder term uniformly for b@ and b@ in an O(n D+c) neighbourhood of b. 0 Thus, if {(A +D ) 1D }k0 as k2, then all the b@ (j k) stay in an O(n D+c) neighbourhood (j) of b so that the remainder term in (A 5) becomes o(kn g)=o(1) provided that k2 and 0 k=o(log n). Hence, nd b )=nd b )+o (1) for k2 and k=o(log n). 0 0 p Asymptotic properties of b@*. We impose the same regularity conditions as in the previous section. The convexity arguments and the strong la of large numbers can again be employed to establish the strong consistency of b@*. By definition, b@* is a root of (A 3) (A 4) Donloaded from at Columbia University Libraries on March 27, 2015 UB * (b; b@* ) ) n P ;t+(b b@* ) X i }S(0)(b; t){x i X9 (b; t)} dn i (b; t)z i.

12 352 Z. JIN, D. Y. LIN, L. J. WEI AND Z. YIN By asymptotic linear expansions similar to those used in establishing (A 3), e have UB * * ; b@* )= n P ;t+ b@ ) X }S(0) ; t){x X9 ; t)} dn ; t)z i i i i +n(a +D )* b@ ) nd * b@ )+d*, (A 6) n here d* n =o(nd+n Wk j=0 db@ (j) d+n Wk j=0 db@* (j) d). We can replace the Z i in the first term on the right-hand side of (A 6) by Z i 1 since b@ is a root of UB (b; b@ ). With this replacement, the first term has mean 0 conditionally on F, so that the integral can be approximated by (b 0 ; t){x i X9 (b 0 ; t)} dn i (b 0 ; t). Thus, hich yields Therefore, UB * * ; b@* )= n nd* b@ )= n D(A +D ) 1 n P (b 0 ; t){x i X9 (b 0 ; t)} dn i (b 0 ;t)(z i 1) +n(a +D )* b@ ) nd * b@ )+d* n, P (b 0 ; t){x i X9 (b 0 ; t)} dn i (b 0 ;t)(z i 1) +(A +D ) 1D nd* b@ )+n Dd* n. nd* b@ )= n D [I {(A +D ) 1D }k]a 1 n P (b 0 ; t){x i X9 (b 0 ; t)} dn i (b 0 ;t)(z i 1) n D{(A +D ) 1D }ka 1 n P S(0)(b 0 ; t){x i X9 (b 0 ; t)} dn i (b 0 ;t)(z i 1)+n Dd* n. Comparing the above equation ith (A 5), e see that the conditional distribution of nd* b@ ) given F converges almost surely to the limiting distribution of nd b ). 0 Under the additional conditions stated in the last paragraph of the previous section, (A 6) can be strengthened ith d* =o(n j) for some j>0. Hence, for k2 and k=o(log n), n nd* b@ ) has the same asymptotic distribution as nd b ). 0 REFERENCES BUCKLEY, J. & JAMES, I. (1979). Linear regression ith censored data. Biometrika 66, COX, D. R. (1972). Regression models and life-tables (ith Discussion). J. R. Statist. Soc. B 34, COX, D. R. & OAKES, D. (1984). Analysis of Survival Data. London: Chapman and Hall. DICKSON, E. R., RAMBSCH, P. M., FLEMIN, T. R., FISHER, L. D. & LANWORTHY, A. (1989). Prognosis in primary biliary cirrhosis: model for decision making. Hepatology 10, 1 7. FLEMIN, T. R. & HARRINTON, D. P. (1991). Counting Processes and Survival Analysis. Ne York: Wiley. FYENSON, M. & RITOV, Y. (1994). Monotone estimating equations for censored data. Ann. Statist. 22, EHAN, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily single-censored samples. Biometrika 52, HARRINTON, D. P. & FLEMIN, T. R. (1982). A class of rank test procedures for censored survival data. Biometrika 69, JIN, Z., YIN, Z. & WEI, L. J. (2001). A simple resampling method by perturbing the minimand. Biometrika 88, KALBFIEISCH, J. D. & PRENTICE, R. L. (1980). T he Statistical Analysis of Failure T ime Data. Ne York: Wiley. KIRKPATRICK, S., ELATT, C. D. & VECCHI, M. P. (1983). Optimisation by simulated annealing. Science 220, Donloaded from at Columbia University Libraries on March 27, 2015

13 Accelerated failure time model KOENKER, R. & BASSETT,. S. (1978). Regression quantiles. Econometrica 46, KOENKER, R. & D OREY, V. (1987). Computing regression quantiles. Appl. Statist. 36, KRALL, J. M., UTHOFF, V. A. & HARLEY, J. B. (1975). A step-up procedure for selecting variables associated ith survival. Biometrics 31, LAI, T. L. & YIN, Z. (1991a). Large sample theory of a modified Buckley James estimator for regression analysis ith censored data. Ann. Statist. 19, LAI, T. L. & YIN, Z. (1991b). Rank regression methods for left-truncated and right-censored data. Ann. Statist. 19, LIN, D. Y. & EYER, C. J. (1992). Computational methods for semiparametric linear regression ith censored data. J. Comp. raph. Statist. 1, LIN, D. Y., WEI, L. J. & YIN, Z. (1998). Accelerated failure time models for counting processes. Biometrika 85, MANTEL, N. (1966). Evaluation of survival data and to ne rank order statistics arising in its considerations. Cancer Chemo. Rep. 50, PARZEN, M. I., WEI, L. J. & YIN, Z. (1994). A resampling method based on pivotal estimating functions. Biometrika 81, PRENTICE, R. L. (1978). Linear rank tests ith right censored data. Biometrika 65, RAO, C. R. & ZHAO, L. C. (1992). Approximation to the distribution of M-estimates in linear models by randomly eighted bootstrap. Sankhya: A 54, RITOV, Y. (1990). Estimation in a linear regression model ith censored data. Ann. Statist. 18, SAS INSTITUTE, INC. (1999). SAS/STAT User s uide, Version 8. Cary, NC: SAS Institute Inc. TSIATIS, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. Ann. Statist. 18, WEI, L. J., YIN, Z. & LIN, D. Y. (1990). Linear regression analysis of censored survival data based on rank tests. Biometrika 77, YIN, Z. (1993). A large sample study of rank estimation for censored regression data. Ann. Statist. 21, [Received August Revised November 2002] 353 Donloaded from at Columbia University Libraries on March 27, 2015

On least-squares regression with censored data

Biometrika (2006), 93, 1, pp. 147 161 2006 Biometrika Trust Printed in Great Britain On least-squares regression with censored data BY ZHEZHEN JIN Department of Biostatistics, Columbia University, New