Efficient Estimation of Censored Linear Regression Model

2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 2 2 22 23 24 25 26 27 28 29 3 3 32 33 34 35 36 37 38 39 4 4 42 43 44 45 46 47 48 Biometrika (2), xx, x, pp. 4 C 28 Biometrika Trust Printed in Great Britain Efficient Estimation of Censored Linear Regression Model BY KANI CHEN Department of Mathematics, Hong Kong University of Science and Technology, Kowloon, Hong Kong makchen@ust.hk AND YUANYUAN LIN Department of Mathematics, Hong Kong University of Science and Technology, Kowloon, Hong Kong Corresponding author: linyy@ust.hk SUMMARY In linear regression or accelerated failure time model, the method of efficient estimation, with or without censoring, has long been overlooked. The main reason is that complications arise from multiple roots of the efficient score and density estimation. In particular, when smoothing is involved, uncertainty in the choice of bandwidth is inevitable. Zeng and Lin (27) provided a novel efficient estimation method for the accelerated failure time model by maximizing a kernelsmoothed profile likelihood function. This paper proposes a one-step efficient estimation method based on counting process martingale, which has several advantages: it avoids the multiple root problem, the initial estimator is easily available, and it is easy to implement numerically with a built-in inference procedure. The requirement on bandwidth is rather loose and less restrictive than that imposed in Zeng and Lin (27). A simple and effective data-driven bandwidth selection method is provided. The resulting estimator is proved to be semiparametric efficient with the same asymptotic variance as the efficient estimator when the error distribution is assumed to be known up to a location shift. The asymptotic properties of the proposed method are justified and the asymptotic variance matrix of the regression coefficients is provided in a closed form. Numerical studies with portive evidence are presented. Applications are illustrated with the well-known PBC data and the Colorado Plateau uranium miners data. Some key words: Linear regression model; Accelerated failure time model; Estimating equation; One-step efficient estimation; Counting process martingale.. INTRODUCTION The linear regression model is one of the most classical and widely used statistical models. For complete data, the regression parameters are usually estimated by the least squares, rank estimation and M-estimation. Likewise, in the presence of censoring, the linear model or accelerated failure time model, owing to its straightforward interpretation, is an important alternative to the Cox model for regression analysis. The popular estimation approaches are rank methods (e.g. logrank and Gehan s method) and the least squares method (e.g. Buckley-James), see Gehan (965) and Buckley and James (979). Other existing methods, such as Tarone-Ware,

49 5 5 52 53 54 55 56 57 58 59 6 6 62 63 64 65 66 67 68 69 7 7 72 73 74 75 76 77 78 79 8 8 82 83 84 85 86 87 88 89 9 9 92 93 94 95 96 2 K. CHEN AND Y. LIN Kalbfleisch-Prentice, Peto-Peto and Fleming-Harrington also have clear interpretations and are easy to compute, see Kalbfleisch and Prentice (98), Peto and Peto (972), Fleming and Harrington (99). An M-estimation method for censored linear regression can be found in Zhou (992). The co-existence of the aforementioned competing methods reflect the fact that it has not proven which method is the best. They are asymptotically efficient only under certain specific error distributions. None of them achieves the semiparametric efficiency bound. Even though some general theory for rank methods has been developed in Tsiatis (99), Lai and Ying (99b) and Ying (993), a specific construction of the semiparametric efficient estimator is not available in the literature. Recently, Zeng and Lin (27) developed an elegant efficient estimation method for the accelerated failure time model by maximizing a kernel-smoothed profile likelihood function. This approach is intuitively appealing. It leads to a semiparametric efficient estimator under certain restrictive choice of kernel function and bandwidth. An easy-to-implement semiparametric efficient estimation, with obvious advantage over the above methods in terms of estimation accuracy, is much desired. The semiparametric efficient estimation, however, is generally associated with the problem of density estimation and computational difficulties. The efficient score with estimated density and its derivative is likely to have multiple roots. Moreover, when smoothing is involved, uncertainty in the choice of bandwidth is inevitable. These and other technical/numerical complications constitute the limitations of efficient estimation. In spite of the difficulties, efficient estimation is worth pursuing. The most appealing factor is that, when censoring is absent or independent of covariate, the semiparametric efficient estimator is asymptotically as accurate as the parametric efficient estimator when the error distribution is assumed to be known up to a location shift. Consider a linear regression model y i = α + β x i + ǫ i, i n, where (y i,x i,ǫ i ) are i.i.d copies of (y,x,ε) and x is independent of ǫ. Assume the density of ǫ i, f( ), is known and let ψ( ) = f( )/f( ). The score (,x) ψ(y α β x) = (,x) ψ(ǫ) and the estimating equation n (,x i ) ψ(y i α β x i ) = provide the efficient estimation. The asymptotic variance of the estimator of β is {var(x)i}, where I = E{ψ 2 (ε)} is the Fisher information for the location parameter. When f is entirely unknown, α is not identifiable and is better absorbed into the error term for proper formulation. The score function for β is (x Ex)ψ(y β x) = (x Ex)ψ(ǫ). Heuristically, the efficient estimator of β can be obtained by solving (x i x) ˆψ(y i β x i ) =, where ˆψ( ) is a proper estimator of ψ( ). However, numerical complications arise here. First, the curve estimation itself can be difficult. Second, the above estimating function is sensitive to the curve estimation and deviation can result in bias of the estimator of β. Moreover, the estimating equation may have multiple roots, depending on the shape of the curve ˆψ. As a result, efficient estimation becomes unfavorable compared with least squares or rank estimation. We note that the asymptotic variance of the semiparametric efficient estimator {var(x)i} is the same as that of the parametric efficient estimator. With the ample theoretical and numerical development in smoothing techniques as well as in survival analysis, it is now timely to tackle the technical difficulties and find a more reliable numerical procedure for efficient estimation. This paper proposes a competitive alternative statis-

97 98 99 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 2 2 22 23 24 25 26 27 28 29 3 3 32 33 34 35 36 37 38 39 4 4 42 43 44 Efficient Estimation of Censored Linear Regression Model 3 tical procedure: a one-step semiparametric efficient estimation. There are several advantages to this method. The high order accuracy of curve estimation, such as that imposed in Zeng and Lin (27), is not required. Moreover, even if the curve estimation is not consistent, the estimator of the regression parameter is still consistent and asymptotically normal. The choice of bandwidth, which can be rather delicate in the context of curve estimation, is not restrictive. Moreover, the curve estimation is for the hazard function rather than the density function and can be naturally handled using counting process martingale. A simple and effective data-driven bandwidth selection method is also provided. Although the one-step estimation requires an appropriate initial estimator, this is not a problem as many ideal candidates exist. Using the one-step estimation, we avoid solving the estimating equation and therefore avoid the multiple root problem. This method also has a built-in variance estimation. The computation is straightforward. In summary, the proposed method provides a competitive alternative to the existing methods. It is worth noting that even though Buckley-James, the log-rank and Gehan s methods do not involve smoothing, the estimation of asymptotic variances does if the plug-in rules are used. An alternative way is by the resampling procedures proposed recently in the literature, see Jin et al. (2). In the next section, we describe the one-step efficient estimation procedure with theoretical justification. Numerical studies with portive evidence are presented in Section 3. Section 4 describes the applications to the well-known PBC data and the Colorado Plateau uranium miners data. A few closing remarks are given in Section 5. All proofs are deferred to the Appendix. 2. ONE-STEP EFFICIENT ESTIMATION Let (Y,X) be the response-covariate pair of random variables, satisfying Y = β X + ǫ, () where X is a d-dimensional covariate, β is the d-vector regression parameter to be estimated and ǫ is the unobservable error term independent of X. Y in model () could be some transformation of survival times. When the log transformation is used, the resulting model is often regarded as an accelerated failure time model. Let C be the censoring variable. As usual, we assume that C is conditionally independent of Y given X. Set Y = min(y,c), δ = I(Y C), N(t) = δi(y t), and Y (t) = I(Y t). I( ) is the indicator function throughout this paper. Then, M(t) N(t) t Y (s)λ(s β X)ds is a counting process martingale, where λ( ) is the hazard function of ǫ. Let Λ( ) be the cumulative hazard function of ǫ. For ease of presentation, we also define N(t,β) = δi(y β X t) and Y (t,β) = I(Y β X t). Then, M(t,β ) N(t,β ) t Y (s,β )λ(s)ds is also a counting process martingale based on ǫ. In the presence of censoring, the observations are (Y i,δ i,x i ), i n, which are iid copies of (Y, δ, X). A linear regression model without censoring is a special case with P(C = ) =. Consider the following estimating function { τ nj= } X j Y j (t,β) U φ (β) = φ(t) X i nj= dn i (t,β) Y j (t,β)

45 46 47 48 49 5 5 52 53 54 55 56 57 58 59 6 6 62 63 64 65 66 67 68 69 7 7 72 73 74 75 76 77 78 79 8 8 82 83 84 85 86 87 88 89 9 9 92 4 K. CHEN AND Y. LIN { τ nj= = φ(t β X j Y j (t β X i + β } X j ) X i ) X i nj= Y j (t β X i + β dn i (t), (2) X j ) where τ = inf{t : P(ǫ < t,c β X < t) = }. All the popular estimation procedures can be viewed as special cases with particular choices of φ(t). Choosing φ(t) = yields the well-known log-rank estimation and choosing φ(t) = n j= Y j (t,β) leads to Gehan s estimation. It will be shown later that, under certain regularity conditions, choosing φ(t) = λ(t)/λ(t) leads to the efficient estimation if λ( ) satisfies certain smoothness conditions. When the distribution of ǫ is unknown, it is natural to plug in a proper estimator of λ( )/λ( ). Such estimation shall inevitably involve smoothing techniques. Let K( ) be an infinitely differentiable symmetric kernel function with port in [,] and h be the bandwidth. Let K h (t) = (/h)k(t/h). Define ˆλ(t,β) = K h (s t)dˆλ(s,β), ˆ λ(t,β) τ = h 2 (s t)k h (s t)dˆλ(s,β) κ 2 and ˆψ(t,β) = ˆ λ(t,β)/ˆλ(t,β), where κ2 = u2 K(u)du. ˆΛ( ) is the nonparametric estimator of the cumulative hazard function Λ( ) introduced by Nelson (972) and generalized by Aalen (978). The kernel smoothing estimator of λ( ) is essentially a way of smoothing the increments of ˆΛ( ). Moreover, ˆλ( ) and ˆ λ( ) are consistent estimates for λ( ) and λ( ), see Ramlau-Hansen (983) and Jiang and Doksum (23). A straightforward way of computing an estimator is to directly replace φ(t,β) with ˆψ(t,β) in estimating equation (2) and solve the equation for β. However, such a direct procedure may not be numerically feasible. In general, this equation is likely to have multiple solutions. The conventional numerical tools, such as the Newton-Raphson method, become unreliable. To avoid numerical complications, we propose the following one-step efficient estimation procedure. Here and throughout the paper, υ 2 = υυ for any vector υ of dimension d. is the Euclidean norm and β be the true value of β. Let β be certain initial estimator of β. Denote nj= X j Y j (t,β)/ n j= Y j (t,β) in estimating equation (2) as X(t,β). Let V (β ) = Define the proposed estimator, ˆψ 2 (t,β ) {X j X(t,β )} 2 Y j (t,β ) dn i (t,β )/ Y j (t,β ). j= j= ˆβ = β + V (β )U ˆψ(β ). (3) Some regularity conditions are assumed here. (i) P(ǫ > τ) > and τ is assumed to be finite; (ii) The covariate X is bounded, i.e, P( X K) = for some < K < ; (iii) The hazard function λ(t) is bounded away from for t τ and is continuously differentiable up to the second order; (iv) The bandwidth h = n r with r ( /3,); (v) The initial estimator β = β + O p (n /2 ). THEOREM. Assume the above conditions (i)-(v) hold. In fact, ˆβ n = β + Σ ξ i /n + r n, (4)

93 94 95 96 97 98 99 2 2 22 23 24 25 26 27 28 29 2 2 22 23 24 25 26 27 28 29 22 22 222 223 224 225 226 227 228 229 23 23 232 233 234 235 236 237 238 239 24 Efficient Estimation of Censored Linear Regression Model 5 where the remainder r n = O p (/ n 2 h 3 + h 2 / n), and Σ = ξ i = { λ(t)/λ(t)}{x i X(t,β )}dn i (t,β ) { λ(t)/λ(t) } 2 E { X E(X Y t + β X) } 2 dλ(t), which can be consistently estimated by V (β )/n. It follows that ˆβ is consistent for β as n, and n(ˆβ β ) N(,Σ ) in distribution as n. Remark. Conditions (i)-(ii) are regularity conditions. Condition (i) implies that at the end of the study, there is a positive chance that an individual will have survived without being censored. Such restriction frequently appears in the literature and is used to avoid the usual technical difficulties of a possible tail instability. Condition (iii) is the smoothness condition for the underlying density function. V (β) and Σ are nonsingular under conditions (i)-(iii). This is basically in the same spirit as Theorem 8.4. of Fleming and Harrington (99). Condition (iv) is needed for curve estimation. Condition (v) implies that the initial estimator must be close to β to ensure convergence of the desired semiparametric efficient estimator. Condition (v) is satisfied by various existing estimators such as, for example, the logrank or Gehan s estimator. Remark 2. When assuming that the error distribution in model () is known up to a location shift, the efficient score for β is S(β) = { λ(t)/λ(t)} {X i dn i (t,β) X i Y i (t,β)λ(t)dt}. And the asymptotic variance of the efficient estimator of β is Σ, which is the same as the asymptotic variance of the semiparametric efficient estimator. The proposed estimator is semiparametric-efficient. Most importantly, it is as efficient as the parametric efficient estimator, when the error distribution is known up to a location shift. Remark 3. Zeng and Lin (27) provided a novel method to compute an efficient estimator for the accelerated failure time model. There are certain theoretical restrictive conditions on the kernel function and bandwidth. In particular, the asymptotic normality relies on the assumptions that the first (m ) moments of the kernel function are for some m > 3 and the bandwidth h = n ν with ν ( /(2m), /6). Such assumptions could be too stringent in practice. The proposed efficient estimation procedure in this paper is valid under much looser requirements on the kernel function and bandwidth. The requirement on the bandwidth h = n r with r ( /3,) is only a necessary condition for the consistency of the estimation of λ( ), as the variance of ˆ λ( ) is at the order of /(nh 3 ). And the positivity and symmetry of the kernel function is standard in the literature. Remark 4. Although the estimation of λ( ) and λ( ) may theoretically have different orders of optimal bandwidth, using different choices of bandwidth to estimate them separately only results in marginal improvement. In this paper, choosing the same bandwidth to estimate λ( ) and λ( ), due to its analytical and computational simplicity, yields desirable and favorable results. Remark 5. The remainder term in (4) is a degenerate U-statistics which is zero correlated with the main iid term plus negligible terms by a careful evaluation in the Appendix. It then induces

24 242 243 244 245 246 247 248 249 25 25 252 253 254 255 256 257 258 259 26 26 262 263 264 265 266 267 268 269 27 27 272 273 274 275 276 277 278 279 28 28 282 283 284 285 286 287 288 6 K. CHEN AND Y. LIN a data-driven bandwidth selection method by minimizing the remainder term r n over h. As the main iid terms do not depend on h, it is equivalent to minimize ξ i + r n n Σ over h. The proposed data-driven bandwidth selection method is simple and effective. Moreover, in view of (3), one can write V (β )U ˆψ(β ) = (β β ) + n n Σ ξ i + r n. (5) Under condition (v), β β = (/n) n Z i + O p (/n) where Z i are iid mean zero terms. Then, V (β )U ˆψ(β ) = Z i + n n n Σ ξ i + r n + O p ( ). (6) n Most importantly, it can be verified that Z i is also zero correlated with the degenerate U-statistics leading term of r n which is at the order of (/ n 2 h 3 + h 2 / n). Then the mean squared error of nv (β )U ˆψ(β )/ n MSE(h) = σ 2 + O( nh 3 + h4 ) + o( nh 3 + h4 ) + O( n ) (7) decreases till h is at the order of n /3, where σ 2 is the variance of n Z i / n + Σ n ξ i / n which is independent of h. As a result, it is intuitively appealing to select the bandwidth by minimizing V (β )U ˆψ(β ) in real implementation. The computation is straightforward. Supportive simulation results can be found in section 3. Remark 6. Consider a generalized linear model Y = g(β X) + ǫ, where g( ) is a known differentiable link function, Y is the response variable, X, β and ǫ are given in model (). Under certain regularity conditions, the regression parameter β is identifiable. Our proposed method can be applied to such a generalized linear model in a straightforward fashion. 3. NUMERICAL STUDIES Simulation studies are conducted to examine the finite sample performance of our proposed one-step efficient estimation method. Similar to Jin et al. (26) and Zeng and Lin (27), the studies are based on the following model log(t) = 2 + β X + β 2 X 2 + ǫ, (8) where X is Bernoulli with probability of success.5, X 2 is normal with mean and standard deviation.5 and (β,β 2 ) = (, ). X and X 2 are independent. We consider six error distributions: ǫ follows the standard normal distribution; ǫ follows (standard) extremevalue distribution; ǫ follows the distribution corresponding to the proportional odds model (the standard logistic distribution); and e ǫ follows Weibull distribution with scale parameter and shape parameter, denoted by Weibull(,); ǫ follows mixtures of N(,) and N(,9) with mixing probabilities (.5,.5) and (.95,.5), denoted by.5n(, ) +.5N(, 9) and

Efficient Estimation of Censored Linear Regression Model 7 289 29 29 292 293 294 295 296 297 298 299 3 3 32 33 34 35 36 37 38 39 3 3 32 33 34 35 36 37 38 39 32 32 322 323 324 325 326 327 328 329 33 33 332 333 334 335 336.95N(, ) +.5N(, 9). Censoring times are generated from the Uniform[, τ], where τ was chosen to produce a 25% censoring rate. We set the sample size n = 2. We use the kernel function K( ) to be the standard normal density function (Gaussian kernel) in our simulation. We let the logrank estimator and Gehan s estimator to be the initial estimator in our procedure. We used the recommended optimal bandwidths for the profile-likelihood approach in Zeng and Lin (27). The simulation results are based on replications and can be summarized in the following tables. (Insert Tables, 2, 3 here) It can be seen that the proposed estimator is asymptotically as efficient as that of Zeng and Lin (27). And the proposed estimator is generally stable and not sensitive to the choice of bandwidth. One can see from Tables and 2 that the proposed method is not sensitive to the initial estimator. Both logrank and Gehan s initial estimators provide favorable results. Table 3 reports that the proposed bandwidth selection method works well compared with the existing methods. Although for logistic and.5n(, ) +.5N(, 9) error distributions the variance is slightly underestimated with small bandwidths, the proposed data-driven bandwidth selection method gives accurate estimation. Further simulation shows that the proposed estimator is not sensitive to the choice of kernel function. Indeed, our proposed procedure performs well in finite sample studies. 4. APPLICATIONS We first present an analysis of the well-known Mayo primary biliary cirrhosis (PBC) data (Fleming and Harrington, 99, app. D.). The dataset contains information on the survival time T and prognostic factors for 48 patients. Analogous to Jin et al. (23, 26) and Zeng and Lin (27), we fit the accelerated failure time model with five covariates, age, log(albumin), log(bilirubin), edema and log(protime). The estimates of the regression parameters can be obtained using the proposed method with the Gaussian kernel function and the bandwidths σn /5, σn /7 and the bandwidth selected by the proposed data-driven selection method in Remark 5, denoted by h dd. In these two examples, σ is the sample standard deviation of log(t). The results are displayed in Table 4. The parameter estimates are generally not sensitive to the choice of bandwidth. And the variance estimation with bandwidth h dd is similar to that of Zeng and Lin (27). Table 4: Accelerated failure time regression for the Mayo PBC data. h = σn /5 h = σn /7 h dd Covariate Est SE Est SE Est SE Age.282.56.29.6.293.64 log(albumin) -.8637.4574 -.835.489 -.724.523 log(bilirubin).586.59.5847.635.5872.679 Edema.745.275.745.2248.7458.243 log(protime) 2.2743.5998 2.4225.646 2.4548.694 For further illustration, we apply the proposed method to analyze the Colorado Plateau uranium miners data. The data set is collected to study the effects of radon exposure and smoking on the rates of lung cancer. The detailed description of the data set can be found in Langholz and Goldstein (996). The study consists of 3347 Caucasian male miners who worked underground for at least one month in the uranium mines of the four-state Colorado Plateau area. For each

337 338 339 34 34 342 343 344 345 346 347 348 349 35 35 352 353 354 355 356 357 358 359 36 36 362 363 364 365 366 367 368 369 37 37 372 373 374 375 376 377 378 379 38 38 382 383 384 8 K. CHEN AND Y. LIN subject, the information includes the age at entry to the study, the cumulative radon exposure, the cumulative smoking in number of packs and the death information. In this study, a total of 258 miners died of lung cancer. Subjects who died of lung cancer are taken to be failures and all others were censored at their exit times. Let X denote the cumulative radon exposure in working level months (WLMs), Z be the cumulative smoking in packs and W be the age at entry to the study. We fit the following model log(t) = β X + β 2 W + β 3 Z + ǫ using the proposed method with the Gaussian kernel function. Table 5 presents the estimation results which are generally not sensitive to the choice of bandwidth. It is seen that the cumulative radon exposure and age have significant negative effect on the survival time and smoking is hazardous. Table 5: Analysis of Colorado Plateau Uranium Miners Data. h = σn /5 h = σn /7 h dd Covariate Est SE Est SE Est SE X -.24.8 -.223.3 -.28. W -.26.3 -.268.2 -.266.6 Z -.77.2 -.66.9 -.7.6 NOTE: Est denotes the parameter estimate; SE is the estimated standard error. 5. CONCLUDING REMARKS This paper proposes a one-step efficient estimation and inference approach for linear regression models or accelerated failure time models with censored data. The goal of the paper is to advocate such a numerical procedure of efficient estimation that is both reliable and easy to implement, as shown in the simulation studies and real examples. We understand that efficient estimation has its limitation in terms of, for example, robustness. However, a truncated version of the curve estimation could be considered. For example, when the estimator of λ( ) is too close to, we could replace the weight function by. Such modified versions may result in more stable, albeit less efficient, estimators. We aim to develope a mature program in S-plus and SAS for practitioners or non-experts to compute the semiparametric efficient estimators. Our future work will consider further extensions of the method to partial linear models or general transformation models. APPENDIX The proof relies on counting process martingale techniques, see for example, Andersen and Gill (982) and Chen (24). More notation is needed. for a vector or matrix means the sum of the absolute values of all elements. Define U tr (β) = { λ(t)/λ(t)}{x i X(t, β)}dn i (t, β) = V tr (β) = { λ(t)/λ(t)}{x i X(t, β)}dm i (t, β), } 2 n { λ(t)/λ(t) {X j X(t, β)} 2 Y j (t, β) dn i (t, β)/ Y j (t, β). j= j=

385 386 387 388 389 39 39 392 393 394 395 396 397 398 399 4 4 42 43 44 45 46 47 48 49 4 4 42 43 44 45 46 47 48 49 42 42 422 423 424 425 426 427 428 429 43 43 432 Efficient Estimation of Censored Linear Regression Model 9 Let U est (β) be defined the same as U tr (β) except with λ(t)/λ(t) replaced by ˆ λ(t)/ˆλ(t) defined in Section 2. Set N(t, β) = (/n) n N i(t, β). The following lemma provides the approximations that will be used later. LEMMA. Assume conditions (i)-(iv) hold. Then, there exists a sequence C n such that, n U est(β) n U tr(β) = O p( n2 h + h2 ); 3 n β β C nn /2 nv (β) nvtr (β) = o p (). β β C nn /2 (A ) (A 2) Proof. Denote Ỹ = Y β X. By the boundedness of (Ỹ t)/h and (Ỹ t)k h(ỹ t), var{δk h(ỹ t)} = O(/h) and var{δ(ỹ t)k h(ỹ t)} = O(h) for any fixed t. The empirical approximation analogous to the proof of Lemma of Chen (24, p. 523) can be applied to show, for every sequence B n with B n, that { K h (s t)d N(s, β) E K h (s t)d N(s, β)} = O p ( ); (A 3) β β B n, t τ nh (s t)k h (s t)d N(s, β) β β B n, t τ { E (s t)k h (s t)d β)} N(s, = O p ( h/n). (A 4) Observe that ˆ λ(t) = h 2 κ 2 ˆλ(t) = (s t)k h (s t) (/n) n j= I(Ỹj s) d N(s, β); (A 5) K h (s t) (/n) n j= I(Ỹj s) d N(s, β). (A 6) Using (A 3) and (A 4), it can be shown that, for every sequence B n with B n, β β B n, t τ β β B n, t τ ˆ λ(t) Eˆ λ(t) = Op ( ); nh 3 ˆλ(t) Eˆλ(t) = O p ( ) nh by appealing to Theorem 2.4.3 of van der Vaart and Wellner (996, p.23). Next, in order to show (A ), one can write n U est(β) n U tr(β) = n n n + n [{ˆ λ(t) λ(t)}/λ(t) ] {X i X(t, β)}dm i (t, β) { λ(t)/λ 2 (t)}{ˆλ(t) λ(t)}{x i X(t, β)}dm i (t, β) {ˆ λ(t) λ(t)}{ˆλ(t) λ(t)}/λ 2 (t){x i X(t, β)}dm i (t, β) ˆ λ(t){ˆλ(t) λ(t)} 2 /{λ 2 (t)ˆλ(t)}{x i X(t, β)}dm i (t, β) (A 7) (A 8) = Π Π 2 Π 3 + Π 4, say. (A 9)

433 434 435 436 437 438 439 44 44 442 443 444 445 446 447 448 449 45 45 452 453 454 455 456 457 458 459 46 46 462 463 464 465 466 467 468 469 47 47 472 473 474 475 476 477 478 479 48 K. CHEN AND Y. LIN We first show for any sequence B n with B n, Π E(Π ) = O p ( β β B n n2 h + h2 ). 3 n Let R n be the bias of ˆ λ(t). Using the delta method and plugging in (A 5), one can write Π = } τ {ˆ λ(t) Eˆ λ(t) + Eˆ λ(t) λ(t) {X i n λ(t) X(t, β)}dm i (t, β) = [ { }] τ δ j (Ỹj t)k h (Ỹj t) δ j (Ỹj t)k h (Ỹj t) n P(Ỹ Ỹj Ỹj) E P(Ỹ Ỹj Ỹj) n + n E j= ( δ j (Ỹj t)k h (Ỹj t) j= {P(Ỹ Ỹj Ỹj)} 2 [ { δ j (Ỹj t)k h (Ỹj t) {P(Ỹ Ỹj Ỹj)} 2 n { n X i X(t, β) nh 2 κ 2 λ(t) dm i(t, β) } I(Ỹk Ỹj) P(Ỹ Ỹj Ỹj) k= }]) I(Ỹk Ỹj) P(Ỹ Ỹj Ỹj) k= X i X(t, β) nh 2 κ 2 λ(t) dm i(t, β) { δ j (Ỹj t)k h (Ỹj t) (/n) } 2 n k= I(Ỹk Ỹj) P(Ỹ Ỹj Ỹj) j= {P(Ỹ Ỹj Ỹj)} 2 (/n) n k= I(Ỹk Ỹj) { δ j (Ỹj t)k h (Ỹj t) (/n) } 2 n k= I(Ỹk Ỹj) P(Ỹ Ỹj Ỹj) E {P(Ỹ Ỹj Ỹj)} 2 (/n) n k= I(Ỹk Ỹj) X i X(t, β) nh 2 κ 2 λ(t) dm i(t, β) + R n n X i X(t, β) dm i (t, β) λ(t) (A ) = Θ Θ 2 + Θ 3 + Θ 4, say, (A ) where R n = O(h 2 ). By expressing Θ in the form of a first order degenerate U-statistic, one can write where Θ = n i<j n g ij + n g ij = δ i{h ij E(h ij X i, Y i, δ i )}{X i X(Ỹi, β)} nh 2 κ 2 λ(ỹi) g ii,, h ij = δ j(ỹj Ỹi)K h (Ỹj Ỹi) P(Ỹ Ỹj Ỹj). (A 2) Here g ii = almost surely for i =,..., n. By conditions (i)-(iv) and the boundedness of (Ỹ t)k h (Ỹ t) for all t, var(θ ) = var(g ij ) = O(/(n 2 h 3 )). It can be easily checked that var(ξ 4 ) = O(h 4 /n). Therefore, the uniform convergence Θ E(Θ ) = O p ( β β B n n2 h ), 3 Θ 4 E(Θ 4 ) = O p ( h2 ) β β B n n can be shown by appealing to Theorem 2.4.3 of van der Vaart and Wellner (996, p.23). (A 3)

48 482 483 484 485 486 487 488 489 49 49 492 493 494 495 496 497 498 499 5 5 52 53 54 55 56 57 58 59 5 5 52 53 54 55 56 57 58 59 52 52 522 523 524 525 526 527 528 Efficient Estimation of Censored Linear Regression Model It follows from Lemma of Ying (993) that β β B n, t τ (/n) I(Ỹj t) P(Ỹ t) = O p(n /2 ). j= Using (A 4), it can be shown in a similar fashion that Θ 2 E(Θ 2 ) = o p ( β β B n n2 h ). 3 (A 4) (A 5) Consider Θ 3, under condition (i), inf t τ P(Y t) >. Then there exists a constant c > such that P(Y t) > c for t τ. In view of (A 4), (/n) n j= I(Ỹj t) > C for some constant C > and all t τ. Together with (A 7), Θ 3 β β B n C n β β B n, t τ [ { }] δ j (Ỹj t)k h (Ỹj t) nh j= 2 κ 2 {P(Ỹ Ỹj Ỹj)} E δ j (Ỹj t)k h (Ỹj t) 2 nh 2 κ 2 {P(Ỹ Ỹj Ỹj)} 2 2 I(Ỹk n Ỹj) P(Ỹ Ỹj Ỹj) = o p ( n3 h ) (A 6) 3 β β B n, j n k= for some constant C >. Thus, (A ) is proved. It can be shown in a similar fashion that Π 2 E(Π 2 ) = O p ( β β B n n2 h + n3 h + h2 ); 2 n Π i E(Π i ) = o p ( β β B n n2 h + h2 ) 3 n for i = 3, 4. Finally, let m (β) be the expectation of Π. Since m (β) is continuous in β and m (β ) =, by the Taylor expansion, for every sequence B n with B n and any < δ <, almost surely, where G n (β) = n m (β) G n (β )(β β ) = o(bn +δ ) β β B n + n } [{ˆ λ(t) λ(t) /λ(t){x i β X(t, ] β)} dm i (t, β) {ˆ λ(t) λ(t) } /λ(t){x i X(t, β)} β dm i(t, β) is the gradient of Π by formal differentiation. Thus, G n (β ) = O p (n /2 + / n 2 h 3 + h 2 / n). Then there exists a sequence C n such that Combining (A ) and (A 7), we have m (β) = o( β β C nn /2 n2 h + h2 ). 3 n Π = O p ( β β C nn /2 n2 h + h2 ). 3 n (A 7)

529 53 53 532 533 534 535 536 537 538 539 54 54 542 543 544 545 546 547 548 549 55 55 552 553 554 555 556 557 558 559 56 56 562 563 564 565 566 567 568 569 57 57 572 573 574 575 576 2 K. CHEN AND Y. LIN Likewise, it can be verified that Π i is o p (/ n 2 h 3 + h 2 / n) uniformly over β β C n n /2 for i = 2, 3, 4. Therefore, (A ) is proved. The proof of (A 2) follows along the same line as the proof of (A ) and is omitted here. The proof of Lemma is complete. Proof of Theorem Firstly, we prove (4) in Theorem. Observe that Note that, ˆβ = β + (β β ) + n{v (β ) V tr (β )} n {U est(β ) U tr (β ) + U tr (β ) U tr (β ) na n (β β ) + na n (β β ) + U tr (β )} +nv tr (β ) n {U est(β ) U tr (β )} +nv tr (β ) n {U tr(β ) U tr (β ) na n (β β )} U tr (β ) = + +nv tr (β )A n (β β ) + nv tr (β ) n U tr(β ). { λ(t)/λ(t)} {X i E(X i Y i t + β X i)} dm i (t, β ) = Π + Π 2, say. { λ(t)/λ(t)} { E(X i Y i t + β X i) X(t, β ) } dm i (t, β ) (A 8) Using the approximation established in Ying (993), E(X Y t + β X) X(t, β ) is O p (n /2 ). It follows from the fact that M(t, β ) is a martingale that var(π 2 / n) as n. Therefore Π 2 is o p ( n). Under conditions (i)-(iii), Π is the sum of i.i.d bounded random variables with mean and variance [ ] Σ = var { λ(t)/λ(t)} {X E(X Y t + β X)}dM(t, β ). By the property of the counting process martingale, } 2 Σ = { λ(t)/λ(t) E {X E(X Y t + β X)} 2 dλ(t). Therefore by the central limit theorem, U tr (β )/ n N(, Σ) in distribution as n. Observe that U tr (β) is a weighted logrank estimating function with weight λ( )/λ( ). It can be shown that, for every sequence B n with B n and any < δ <, where A n = n β β B n { Utr (β) U tr (β ) A n n(β β ) /( n + n β β +δ ) } = o p (), (A 9) { } 2 λ(t) {X i E(X i Y i t + β λ(t) X i)} 2 dm i (t, β ) = { n β U tr(β ) + o p ( } n) which is similar to Theorem 2 of Ying (993, p. 9). The proof given there essentially uses empirical approximation. Together with condition (v), U tr (β ) U tr (β ) A n n(β β ) = o p ( n). (A 2) By the law of large numbers, A n Σ in probability as n under condition (ii). By (A 2) and condition (v), nv (β ) nvtr (β ) = o p () and A n n(β β ) = O p (). It can be shown similarly to the second part of Theorem 3.2 of Anderson and Gill (982, p. 8) that V tr (β )/n Σ (A 2)

577 578 579 58 58 582 583 584 585 586 587 588 589 59 59 592 593 594 595 596 597 598 599 6 6 62 63 64 65 66 67 68 69 6 6 62 63 64 65 66 67 68 69 62 62 622 623 624 in probability as n. Therefore, Efficient Estimation of Censored Linear Regression Model 3 (β β ) + nvtr (β )A n (β β ) = o p (/ n), Vtr (β ){U tr (β ) U tr (β ) na n (β β )} = o p (/ n). By Lemma, the leading term of {U est (β ) U tr (β )}/n is O p (/ n 2 h 3 + h 2 / n). Hence, ˆβ = β + V tr (β )U tr (β ) + r n, (A 22) where the remainder r n = O p (/ n 2 h 3 + h 2 / n). In view of (A 2), (4) is proved. Thus, under condition (iv), the consistency of ˆβ is proved. Using (A 2) and the fact that U tr (β )/ n N(, Σ) in distribution as n, we have n(ˆβ β ) N(, Σ ) in distribution as n. The proof is complete. REFERENCES AALEN, O. O. (978). Nonparametric inference for a family of counting processes. Ann. Statist. 6, 7-726. ANDERSON, P. K. & GILL, R. D. (982). Cox s regression model for counting processes: A large sample study. Ann. Statist., -2. BUCKLEY, J. & JAMES, I. (979). Linear regression with censored data. Biometrika 66, 429-436. CHEN, K. (24). Statistical estimation in the proportional hazards model with risk set sampling. Ann. Statist. 32, 53-532. CHEN, K., JIN, Z. & YING, Z. (22). Semiparametric analysis of transformation models with censored data. Biometrika 89, 659-668. FLEMING, T. R. & HARRINGTON, D. P. (99). Counting processes and survival analysis. New York: Wiley. GEHAN, E. A. (965). A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52, 23-223. JIANG, J. & DOKSUM, K. A. (23). On local polynomial estimation of hazard functions and their derivatives under random censoring. In Mathematical Statistics and Applications: Festschrift for Constance van Eeden, IMS Lecture Notes and Monograph Series 42, 463-482. JIN, Z., LIN, D. Y., WEI, L. J. & YING, Z. (23). Rank-based inference for the accelerated failure time model. Biometrika 9, 34-353. JIN, Z., LIN, D. Y. & YING, Z. (26). On least-squares regression with censored data. Biometrika 93, 47-6. KALBFLEISCH, J. D. & PRENTICE, R. L. (98). Statistical analysis of failure time data. New York: Wiley. LAI, T. L. & YING, Z. (99). Rank regression methods for left-truncated and right-censored data. Ann. Statist. 9, 53-556. LANGHOLZ, B. & GOLDSTEIN, L. (996). Risk set sampling in epidemiologic cohort studies. Statist. Sci., 35-53. MILLER, R. G. & HALPERN, J. (982). Regression with censored data. Biometrika 69, 52-53. NELSON, W. (972). Theory and application of hazard plotting for censored failure data. Technometrics 4, 945-965. PETO, R. & PETO J. (972). Asymptotically efficient rank invariant test procedures. J. Roy. Statist. Soc. Ser. A 35, 85-27. RAMLAU-HANSEN, H. (983). Smoothing counting process intensities by means of kernel function. Ann. Statist., 453-466. RITOV, Y. (99). Estimation in a linear model with censored data. Ann. Statist. 8, 33-328.

625 626 627 628 629 63 63 632 633 634 635 636 637 638 639 64 64 642 643 644 645 646 647 648 649 65 65 652 653 654 655 656 657 658 659 66 66 662 663 664 665 666 667 668 669 67 67 672 4 K. CHEN AND Y. LIN TSIATIS, A. A. (99). Estimating regression parameters using linear rank tests for censored data. Ann. Statist. 8, 354-372. VAN DER VAART, A. W. & WELLNER, J. A. (996). Weak convergence of empirical processes. New York: Springer. YING, Z. (993). A large sample study of rank estimation for censored regression data. Ann. Statist. 2, 76-99. ZENG, D. & LIN, D. Y. (27). Efficient estimation for the accelerated failure time model. J. Amer. Statist. Assoc. 2, 387-396. ZHOU, M. (992). M-estimation in censored linear models. Biometrika 79, 837-84.

673 674 675 676 677 678 679 68 68 682 683 684 685 686 687 688 689 69 69 692 693 694 695 696 697 698 699 7 7 72 73 74 75 76 77 78 79 7 7 72 73 74 75 76 77 78 79 72 Table : Simulation results for the proposed method with logrank initial estimator. h = n /5 h = n /7 h = n /9 ideal estimator BIAS SE SEE CP BIAS SE SEE CP BIAS SE SEE CP BIAS SE ǫ standard normal β.3.62.6.946.6.62.67.952.6.62.7.965.8.56 β 2..62.63.95..66.69.954..65.72.964..57 ǫ Extreme-value β.3.69.57.93.4.68.65.946.3.66.68.962..69 β 2..65.58.933.8.69.65.944.2.65.69.955.4.64 ǫ the standard logistic distribution β.5.284.252.925..279.26.936.5.275.273.945.8.278 β 2..28.256.92.3.28.268.932.3.28.276.949.2.28 e ǫ Weibull(,) β.9.8.74.944..82.79.945.2.8.84.952.6.8 β 2.9.83.79.946.6.85.84.953..85.85.949.6.8 ǫ.5n(,)+.5n(,9) β.3.279.24.922.8.27.264.939.6.279.268.94.7.274 β 2.3.284.244.92.6.275.267.942.6.275.268.948.4.275 ǫ.95n(,)+.5n(,9) β.2.67.67.949.5.68.7.952.9.62.74.959..64 β 2.6.69.69.95..66.7.954..66.75.96.4.69 NOTE: SE is the standard error of the parameter estimates; SEE is the estimated standard error; CP is the coverage probability; ideal estimator is the proposed efficient estimator obtained by plugging in the true λ and λ. Efficient Estimation of Censored Linear Regression Model 5

Table 2: Simulation results for the proposed method with Gehan s initial estimator. 72 722 723 724 725 726 727 728 729 73 73 732 733 734 735 736 737 738 739 74 74 742 743 744 745 746 747 748 749 75 75 752 753 754 755 756 757 758 759 76 76 762 763 764 765 766 767 768 6 K. CHEN AND Y. LIN h = n /5 h = n /7 h = n /9 ideal estimator BIAS SE SEE CP BIAS SE SEE CP BIAS SE SEE CP BIAS SE ǫ standard normal β..62.6.95.6.6.66.956..6.7.96..6 β 2.3.65.64.95..62.68.953..59.7.964.3.59 ǫ Extreme-value β.3.78.72.94.3.73.78.947..73.8.954.2.67 β 2.6.77.76.94.4.78.75.948.6.75.8.958.4.69 ǫ the standard logistic distribution β.6.284.253.926.2.274.269.938..275.273.948.2.259 β 2.8.282.255.924.9.277.268.936..279.276.946.2.259 e ǫ Weibull(,) β.2.8.72.936.7.78.78.952.8.78.83.959.9.78 β 2.8.83.76.935.3.8.83.953.4.82.87.957.6.79 ǫ.5n(,)+.5n(,9) β.3.278.248.925.5.27.253.938.5.272.27.95..279 β 2..278.25.926.3.275.256.936.5.279.273.946.2.279 ǫ.95n(,)+.5n(,9) β.4.68.67.948.6.67.7.95.2.69.77.959.2.68 β 2.9.67.68.949.8.67.7.952.2.69.77.96.4.69

769 77 77 772 773 774 775 776 777 778 779 78 78 782 783 784 785 786 787 788 789 79 79 792 793 794 795 796 797 798 799 8 8 82 83 84 85 86 87 88 89 8 8 82 83 84 85 86 Table 3: Simulation results with the proposed data-driven bandwidth selection method. Proposed Z&L Logrank Gehan Logrank initial estimator Gehan s initial estimator BIAS SE SEE CP BIAS SE SEE CP BIAS SE BIAS SE BIAS SE ǫ standard normal β..64.67.955.4.66.64.943.9.64.3.67.2.67 β 2.4.64.68.957.3.65.67.958.7.64.5.68.7.65 ǫ Extreme-value β.4.75.8.956.5.74.78.957.9.96.6.72.6.95 β 2..76.83.957.4.76.8.956.4.99.9.7.3.2 ǫ the standard logistic distribution β.8.264.26.942.7.267.269.95.5.274..266.7.26 β 2.2.265.263.946.2.27.27.95.3.278.8.279.5.267 e ǫ Weibull(,) β.5.74.8.96.6.78.87.956.5.87.8.8.8.9 β 2.4.8.85.95.3.8.88.957.8.86.2.8.7.94 ǫ.5n(,)+.5n(,9) β.7.274.265.944.3.272.279.958.9.274..298..263 β 2.2.278.268.943.2.27.272.95.8.277..36.6.275 ǫ.95n(,)+.5n(,9) β.7.65.72.953.6.65.67.95..7.6.72.3.64 β 2.5.67.74.954..69.7.954.4.72.9.82.7.67 NOTE: Z&L is the profile-likelihood method in Zeng and Lin (27). Efficient Estimation of Censored Linear Regression Model 7