Median regression using inverse censoring weights

Sundarraman Subramanian¹
Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, New Jersey

Gerhard Dikta
Department of Applied Science and Technology, Fachhochschule Aachen, Ginsterweg 1, D-52428 Jülich, Germany

Abstract

We implement semiparametric random censorship model aided inference for censored median regression models. This is based on the idea that, when the censoring is specified by a common distribution, a semiparametric survival function estimator acts as an improved weight in the so-called inverse censoring weighted estimating function. We show that the proposed method will always produce estimates of the model parameters that are as good as or better than an existing estimator based on the traditional Kaplan-Meier weights. We also provide an illustration of the method through an analysis of a lung cancer data set.

KEY WORDS: Asymptotically normal, Least absolute deviation, Local linearity, Loewner ordering, Lung cancer study, Minimum dispersion statistic.

¹ Corresponding author's address: Sundar Subramanian, Associate Professor, Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, New Jersey (email: sundars@njit.edu).

1 Introduction

Semiparametric random censorship (SRC) models provide a more compelling framework than fully nonparametric models because they not only improve the precision of estimates but also, in some cases, obviate smoothing. Dikta (1998) proved that an SRC estimator of a survival function has improved efficiency in comparison with the Kaplan-Meier (KM) estimator. Moreover, when there are missing censoring indicators (MCIs), SRC should be the approach of choice (Subramanian 2004b), since nonparametric estimation would require pre-smoothing (Subramanian 2004a), which, apart from requiring data-driven bandwidths, can also lead to less precise fits when the censoring proportion is high. The SRC approach requires specifying a parametric model for $m(t)$, the conditional expectation of the censoring indicator given the observed (possibly censored) survival time, and estimating the model parameter by maximum likelihood.

In this article, we demonstrate the utility of the SRC approach for fitting median regression (MR) models from right censored data. Specifically, when the censoring has a covariate-free distribution, its KM estimator figures as a key element in the approach of inverse censoring weighted (ICW) median regression (Ying, Jung, and Wei 1995; Subramanian 2007b; Yin, Zeng, and Li 2008). As we will show, using an SRC estimator of the censoring distribution instead produces inference that is at least as good.

Denote by $Z$ a $(p+1)$-dimensional covariate with first component 1, and suppose $T = \beta^\top Z + \epsilon$, where $\beta$ is an unknown parameter. The joint distribution of $\epsilon$ and $Z$ is unspecified, but the conditional median of $\epsilon$ given $Z$ is known to be 0. The data are $n$ iid copies of $(X, \delta, Z)$, where $X = \min(T, C)$, $\delta = I(X = T)$, and $C$ is a censoring variable independent

of $T$. The goal is inference for $\beta$.

Censored median and, more generally, quantile regression models have received much attention (Powell 1986; Newey and Powell 1990; Ying et al. 1995; Lindgren 1997; Yang 1999; McKeague, Subramanian, and Sun 2001; Subramanian 2002, 2007a, 2007b; Honoré, Khan, and Powell 2002; Portnoy 2003; Qin and Tsao 2003; Zhou and Wang 2005; Peng and Huang 2008; Yin et al. 2008). Indeed, in many lifetime data analysis situations, an MR model is perhaps preferable to competing models such as the accelerated failure time and Cox proportional hazards models because it possesses the twin advantages of robustness and ease of extraction of target information.

A key assumption of the approaches of Ying et al. (1995) and Yin et al. (2008) is the independence between the censoring variable $C$ and the covariate $Z$, mainly designed to avoid the curse of dimensionality; see also Honoré et al. (2002). Under this assumption and that of random censoring, the global KM estimator of the distribution of $C$ is typically plugged into a censoring-adjusted least absolute deviation (LAD) estimating function, arrived at by an application of the ICW approach; see also Koul, Susarla, and Van Ryzin (1981), Robins and Rotnitzky (1992), Bang and Tsiatis (2002), and Subramanian (2007a, 2007b). The asymptotic distribution of the estimator of $\beta$ is normal but with a variance-covariance matrix that depends on the unknown error density, which is difficult to estimate. Ying et al. (1995) and, later, Subramanian (2007a) obtained confidence regions via Basawa and Koul's (1988) minimum dispersion (MD) statistic. Subramanian (2007b) implemented profile empirical likelihood for censored MR models. Yin et al. (2008), on the other hand, employed a resampling procedure based on a perturbation method (Jin, Wei, and Ying 2001).
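The plug-in idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the cited papers: the function names `km_censoring_survival`, `icw_estimating_function`, and `euclidean_norm` are hypothetical, the estimating function follows the form of Eq. (1) in Section 2.1, and evaluating the KM estimate just before $t$ (the left-continuous version) is a convention chosen here so that the weight is always positive when the indicator is 1.

```python
import math


def km_censoring_survival(times, deltas):
    """Kaplan-Meier estimate of the censoring survival function G.

    Censored observations (delta == 0) are the "events" here, because the
    target is the distribution of the censoring variable C.
    Returns a function t -> product over observed times s < t.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    steps = []  # (time, value of the estimate just after that time)
    surv = 1.0
    i = 0
    while i < n:
        t = times[order[i]]
        at_risk = n - i
        censored_here = 0
        j = i
        while j < n and times[order[j]] == t:
            censored_here += 1 - deltas[order[j]]
            j += 1
        surv *= 1.0 - censored_here / at_risk
        steps.append((t, surv))
        i = j

    def G_hat(t):
        value = 1.0
        for s, g in steps:
            if s < t:  # left-continuous convention (an assumption)
                value = g
            else:
                break
        return value

    return G_hat


def icw_estimating_function(beta, X, Z, G_hat):
    """Sum over i of {I(X_i >= beta'Z_i) / G_hat(beta'Z_i) - 1/2} Z_i."""
    total = [0.0] * len(beta)
    for x, z in zip(X, Z):
        lin = sum(b * zj for b, zj in zip(beta, z))
        weight = (1.0 / G_hat(lin) if x >= lin else 0.0) - 0.5
        for j, zj in enumerate(z):
            total[j] += weight * zj
    return total


def euclidean_norm(v):
    return math.sqrt(sum(c * c for c in v))
```

An estimate of $\beta$ would then be any minimizer of `euclidean_norm(icw_estimating_function(beta, X, Z, G_hat))` over `beta`; because the objective is a step function, a derivative-free search is the natural choice.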

Our proposed modification is driven by the rationale that, as evidenced in the homogeneous case (Dikta 1998), with the right choice of a model $m(t, \theta)$, $\theta \in \mathbb{R}^k$, for $m(t) = P(\delta = 0 \mid X = t)$, the SRC paradigm would improve the precision of the parameter estimates. We plug an SRC estimate of $G(t)$, the survival function of $C$, into the ICW estimating function and choose the minimizer as our estimate of $\beta$. We show that the estimator of $\beta$ (denoted by $\hat\beta$), obtained using the SRC weights in the ICW estimating function, is as good as or better than the estimator obtained by employing the usual KM weights.

In practice, specification of a model for $m(t)$ should pose no serious difficulty, since a plethora of choices are available for binary data (Cox and Snell 1989; Dikta, Kvesic, and Schmidt 2006). Indeed, for our illustration involving a lung cancer study, we are able to zero in on an appropriate choice, namely a generalized proportional hazards candidate (see Section 3). The adequacy of the chosen model can be checked via the bootstrap method proposed by Dikta et al. (2006), as is also done in Section 3. Finally, the proposed approach using SRC weights extends readily even when there are MCIs, since, under the assumption of missing at random, $\theta$ can be estimated based on only the complete cases (Lu and Tsiatis 2001; Subramanian 2004b). This is not possible with the ICW approach of Ying et al. (1995).

The article is organized as follows. In the next section, we investigate SRC-based ICW inference for the MR parameter. In Section 3, we present an illustration of our proposed methodology using data from a lung cancer study. We complete the article with a conclusion section. Technical details are given in the three Appendixes that appear after Section 4.
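To make the model-specification step concrete, the following Python sketch writes down the parametric form of $m(x, \theta)$ that Section 3 uses for the lung cancer data, together with the Bernoulli negative log-likelihood that a maximum likelihood fit would minimize. It is a rough sketch under stated assumptions: the names `m_gph` and `neg_log_likelihood` are hypothetical, the clipping constant `eps` is a numerical safeguard added here, and no particular optimizer is implied.

```python
import math


def m_gph(x, theta1, theta2):
    """m(x, theta) = u**theta2 / (theta1 + u**theta2), with u = 10**x / 365.

    x is the base-10 log of the observed time, so u is that time
    expressed in years (as noted in Section 3).
    """
    u = (10.0 ** x) / 365.0
    v = u ** theta2
    return v / (theta1 + v)


def neg_log_likelihood(theta, X, delta, eps=1e-12):
    """Bernoulli negative log-likelihood for the data (X_i, 1 - delta_i).

    1 - delta_i = 1 marks a censored observation, whose conditional
    probability given X_i is modeled by m(X_i, theta).
    """
    theta1, theta2 = theta
    total = 0.0
    for x, d in zip(X, delta):
        m = m_gph(x, theta1, theta2)
        m = min(max(m, eps), 1.0 - eps)  # guard against log(0); an added safeguard
        total -= (1 - d) * math.log(m) + d * math.log(1.0 - m)
    return total
```

Any general-purpose minimizer can be applied to `neg_log_likelihood`; the article itself reports using a genetic optimization algorithm for this step.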

2 Inference for the regression parameter

This section begins with a review of the approach proposed by Ying et al. (1995). We then present our modification and carry out a theoretical comparison between the asymptotic variance-covariance matrices of the estimators from the two competing approaches.

2.1 The ICW estimating function with KM weights

Let $\beta_0$ denote the true value of $\beta$ and let $\Lambda_G(t)$ denote the censoring cumulative hazard function. Ying et al. (1995) used KM weights in the estimating function for $\beta$ given by

$$T_n(\beta) = \sum_{i=1}^{n} \left\{ \frac{I(X_i \ge \beta^\top Z_i)}{\hat G_{KM}(\beta^\top Z_i)} - \frac{1}{2} \right\} Z_i, \tag{1}$$

where $\hat G_{KM}(t)$ is the KM estimator of the censoring distribution, and obtained their estimator $\hat\beta_{YJW}$ as the minimizer of $\|T_n(\beta)\|$, where $\|\cdot\|$ denotes the Euclidean norm. For any vector $a$, let $a^{\otimes 2} = a a^\top$, and let $r(t)$ and $y_1(t)$ denote the limits of $n^{-1}\sum_{i=1}^{n} I(\beta_0^\top Z_i \ge t) Z_i$ and $Y_1(t) = n^{-1}\sum_{i=1}^{n} I(X_i \ge t)$, respectively. Ying et al. (1995) proved that, under certain regularity conditions (see next subsection), $n^{-1/2} T_n(\beta_0)$ is asymptotically normal with zero mean and variance-covariance matrix given by

$$\Gamma = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{I(X_i \ge \beta_0^\top Z_i)}{G(\beta_0^\top Z_i)} - \frac{1}{2} \right]^2 Z_i^{\otimes 2} - \frac{1}{4} \int \frac{r(t)\, r^\top(t)}{y_1(t)}\, d\Lambda_G(t). \tag{2}$$

Let $f_\epsilon(t \mid z) = f(t + \beta_0^\top z \mid z)$ denote the conditional density of $\epsilon$ given $Z = z$, where $f(t \mid z)$ is the conditional density of $T$ given $Z$. The estimator $\hat\beta_{YJW}$ is consistent and $n^{1/2}(\hat\beta_{YJW} - \beta_0) \to_D N(0, \Upsilon^{-1}\Gamma\Upsilon^{-1})$, where $\Upsilon = E[Z^{\otimes 2} f_\epsilon(0 \mid Z)]$. Let $\hat\Gamma$ denote a consistent estimator of $\Gamma$. A test-based approach of computing confidence regions for $\beta_0$ uses the statistic

$V(\beta_0) = n^{-1} T_n^\top(\beta_0)\, \hat\Gamma^{-1}\, T_n(\beta_0)$, which is approximately chi-square distributed with $p+1$ degrees of freedom, leading to a $100(1-\alpha)\%$ confidence region $\{\beta : V(\beta) < \chi^2_{p+1}(1-\alpha)\}$ for $\beta_0$. Frequently, however, inference for a specific sub-vector is desired. Let $\beta = (\beta^{(1)\top}, \beta^{(2)\top})^\top$, where $\beta^{(1)}$ is the $q \times 1$ parameter sub-vector of interest, and $\beta_0 = (\beta_0^{(1)\top}, \beta_0^{(2)\top})^\top$. The MD statistic given by

$$D\big(\beta_0^{(1)}\big) = \min_{\beta^{(2)}} \left\{ n^{-1}\, T_n^\top\big(\beta_0^{(1)}, \beta^{(2)}\big)\, \hat\Gamma^{-1}\, T_n\big(\beta_0^{(1)}, \beta^{(2)}\big) \right\}$$

is used to construct confidence regions for $\beta_0^{(1)}$. The MD statistic is approximately chi-square distributed with $q$ degrees of freedom, from which a $100(1-\alpha)\%$ confidence region for $\beta_0^{(1)}$ is given by $\{\beta^{(1)} : D(\beta^{(1)}) < \chi^2_q(1-\alpha)\}$. In particular, this test-based approach also provides confidence regions for individual regression effects.

2.2 The ICW estimating function with SRC weights

Define $m(x) = P(\delta = 0 \mid X = x)$ and specify $m(x) = m(x, \theta)$, where $m$ is known up to the $k$-dimensional parameter $\theta$. Let $\theta_0$ denote the true value of $\theta$ and $m_0(s) = m(s, \theta_0)$. Denote the MLE of $\theta$ by $\hat\theta$ and write $\hat m(s) = m(s, \hat\theta)$. Let $\hat H(s)$ denote the empirical distribution function based on $X_1, \ldots, X_n$. Our proposed modification employs SRC weights in the ICW estimating function for $\beta$ and is given by

$$S_n(\beta) = \sum_{i=1}^{n} \left\{ \frac{I(X_i \ge \beta^\top Z_i)}{\hat G_{SP}(\beta^\top Z_i)} - \frac{1}{2} \right\} Z_i, \tag{3}$$

where $\hat G_{SP}(t)$ is the SRC model-based estimator of $G$ proposed and studied by Dikta (1998). Note that $\hat G_{SP}(t)$ is the product integral of the cumulative hazard estimator given by

$$\hat\Lambda_{SP}(t) = \int_{-\infty}^{t} \frac{\hat m(s)}{Y_1(s)}\, d\hat H(s). \tag{4}$$

To investigate the large sample properties of $\hat\beta$, we shall assume all the SRC model

regularity conditions (Dikta 1998), as well as the following (Ying et al. 1995):

1. The true value $\beta_0$ is in the interior of a known bounded convex region $D$.
2. The covariate vector $Z$ is bounded, say $\|Z\| \le L$.
3. For $\beta \in D$, there exists a $\tau$ such that $P(X > \tau \mid Z) > 0$ and $\beta^\top Z \le \tau$ almost surely.
4. The derivatives of the conditional survival function $F(t \mid z)$ of $T$ given $Z$ and $G(t \mid z)$ with respect to $t$ are uniformly bounded in $(t, z) \in \mathcal{F} = (-\infty, \tau] \times [-L, L]$.
5. The matrix $A = E[Z^{\otimes 2} f_\epsilon(0 \mid Z)]$ is positive definite.

We first investigate the asymptotic distribution of $n^{-1/2} S_n(\beta_0)$. We shall need additional notation. For $r = 1, \ldots, k$, write $m_r(s, \theta_0) = \partial m(s, \theta)/\partial\theta_r \,|_{\theta = \theta_0}$ and $M_0(s) = [m_1(s, \theta_0), \ldots, m_k(s, \theta_0)]^\top$. Let $p_0(x) = m_0(x)(1 - m_0(x))$ and define the information matrix $I_0 = E[M_0^{\otimes 2}(X)/p_0(X)]$. Also, let $\alpha(s, t) = M_0^\top(s)\, I_0^{-1} M_0(t)$ and $r(t) = E\big(I(\beta_0^\top Z \ge t) Z\big)$. Write

$$\hat A(t) = n^{-1} \sum_{i=1}^{n} \frac{(1 - \delta_i) - m_0(X_i)}{p_0(X_i)}\, \alpha(t, X_i).$$

Let $K(\cdot)$ be such that $\int K(s)\, dH(s) < \infty$. From Lemma 3.5 of Dikta (1998), we have that

$$n^{1/2}\{\hat m(s) - m_0(s)\} = n^{1/2} \hat A(s) + R_n(s), \tag{5}$$

where, uniformly for $t \le \tau$, the term $n^{-1/2} |R_n(t)|$ is bounded above by

$$K(t) \sum_{1 \le r, s \le k} |\hat\theta_r - \theta_{0,r}|\, |\hat\theta_s - \theta_{0,s}| = K(t)\big(O_p(n^{-1/2}) + o_p(n^{-1/2})\big)^2 = K(t)\, O_p(n^{-1}).$$

Then we have the following representation (Dikta 1998), uniformly for $t \le \tau$:

$$\hat\Lambda_{SP}(t) - \Lambda_G(t) = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{t} \frac{1}{y_1(s)} \big\{ m_0(s)\, dI(X_i \le s) - I(X_i \ge s)\, d\Lambda_G(s) \big\} + \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - \delta_i) - m_0(X_i)}{p_0(X_i)} \int_{-\infty}^{t} \frac{\alpha(s, X_i)}{y_1(s)}\, dH(s) + o_p(n^{-1/2}). \tag{6}$$
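For readers who want to compute the estimator whose expansion is given in Eq. (6), here is a minimal Python sketch of $\hat\Lambda_{SP}$ from Eq. (4) and of its product integral $\hat G_{SP}$. It rests on assumptions made here for illustration: the function name `src_estimators` is hypothetical, `m_hat` is any callable standing in for the fitted $m(\cdot, \hat\theta)$, tied observation times are grouped into a single jump, and both returned functions use the right-continuous convention.

```python
from collections import Counter


def src_estimators(times, m_hat):
    """Return (Lambda_hat, G_hat) for the SRC censoring estimators.

    The jump of Lambda_hat at an observed time s is
        (# observations equal to s) * m_hat(s) / (# observations >= s),
    which is the integral of m_hat / Y_1 against the empirical
    distribution of the observed times. G_hat multiplies (1 - jump)
    over all jump times s <= t (the product integral).
    """
    counts = Counter(times)
    at_risk = len(times)
    jumps = []
    for s in sorted(counts):
        jumps.append((s, counts[s] * m_hat(s) / at_risk))
        at_risk -= counts[s]

    def Lambda_hat(t):
        return sum(jump for s, jump in jumps if s <= t)

    def G_hat(t):
        prod = 1.0
        for s, jump in jumps:
            if s <= t:
                prod *= 1.0 - jump
        return prod

    return Lambda_hat, G_hat
```

As a sanity check on the construction, replacing `m_hat` by a lookup of the observed censoring indicators $1 - \delta_i$ (with distinct times) reproduces the KM estimator of the censoring survival function; the SRC estimator differs only in smoothing those indicators through the parametric model.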

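Define the following random quantities [note that $E(U) = E(W) = E(V_1 + V_2) = 0$]:

$$U = \left\{ \frac{I(X \ge \beta_0^\top Z)}{G(\beta_0^\top Z)} - \frac{1}{2} \right\} Z, \qquad W = \frac{1}{2}\, \frac{(1 - \delta) - m_0(X)}{p_0(X)} \int \frac{r(t)\, \alpha(t, X)}{y_1(t)}\, dH(t),$$

$$V_1 = \frac{1}{2}\, \frac{r(X)}{y_1(X)}\, m_0(X), \qquad V_2 = -\frac{1}{2} \int \frac{r(t)}{y_1(t)}\, I(X \ge t)\, d\Lambda_G(t).$$

Theorem 1 Under the conditions stated above, the distribution of $n^{-1/2} S_n(\beta_0)$ is asymptotically normal with zero mean and variance-covariance matrix given by $\Sigma = \Gamma - \Delta$, where

$$\Delta = \frac{1}{4} \int \frac{r(t)\, r^\top(t)}{y_1(t)}\, (1 - m_0(t))\, d\Lambda_G(t) - \frac{1}{4} \iint \frac{r(s)\, r^\top(t)}{y_1(s)\, y_1(t)}\, \alpha(s, t)\, dH(s)\, dH(t). \tag{7}$$

Proof We have that $S_n(\beta_0) = \tilde S_n(\beta_0) + o_p(n^{1/2})$, where

$$\tilde S_n(\beta_0) = \sum_{i=1}^{n} \left\{ \frac{I(X_i \ge \beta_0^\top Z_i)}{G(\beta_0^\top Z_i)} - \frac{1}{2} \right\} Z_i - \sum_{i=1}^{n} Z_i\, I(X_i \ge \beta_0^\top Z_i)\, \frac{\hat G_{SP}(\beta_0^\top Z_i) - G(\beta_0^\top Z_i)}{G^2(\beta_0^\top Z_i)}.$$

Let $R(t) = n^{-1} \sum_{i=1}^{n} Z_i\, I(\beta_0^\top Z_i \le X_i \wedge t)$. Denote the second term of $\tilde S_n(\beta_0)$ by $Q(\beta_0)$. Applying the Duhamel equation (Andersen, Borgan, Gill, and Keiding 1992), interchanging integrals, and then applying Lenglart's inequality (Andersen et al. 1992), we obtain

$$\begin{aligned}
Q(\beta_0) &= -n \int \frac{\hat G_{SP}(t) - G(t)}{G^2(t)}\, dR(t) \\
&= n \int \left[ \int_{-\infty}^{t} \frac{\hat G_{SP}(s-)}{G(s)}\, d\big(\hat\Lambda_{SP}(s) - \Lambda_G(s)\big) \right] \frac{dR(t)}{G(t)} \\
&= n \int \left[ \int_{t}^{\infty} \frac{dR(s)}{G(s)} \right] \frac{\hat G_{SP}(t-)}{G(t)}\, d\big(\hat\Lambda_{SP}(t) - \Lambda_G(t)\big) \\
&= n \int \left[ \int_{t}^{\infty} \frac{dR(s)}{G(s)} \right] d\big(\hat\Lambda_{SP}(t) - \Lambda_G(t)\big) + o_p(n^{1/2}).
\end{aligned}$$

Recall that $r(t) = E\big(I(\beta_0^\top Z \ge t) Z\big)$. The integral inside the square brackets may be expressed as

$$\frac{1}{n} \sum_{i=1}^{n} \frac{I(X_i \ge \beta_0^\top Z_i)}{G(\beta_0^\top Z_i)}\, I(\beta_0^\top Z_i \ge t)\, Z_i = E\left( \frac{I(X \ge \beta_0^\top Z)}{G(\beta_0^\top Z)}\, I(\beta_0^\top Z \ge t)\, Z \right) + o_p(1) = \frac{r(t)}{2} + o_p(1).$$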

Then, $2Q(\beta_0) = n \int r(t)\, d\big(\hat\Lambda_{SP}(t) - \Lambda_G(t)\big) + o_p(n^{1/2})$. From Eq. (6), we have $Q(\beta_0) = \sum_{i=1}^{n} (V_i + W_i) + o_p(n^{1/2})$, where $V_i$ and $W_i$ are independent copies of $V = V_1 + V_2$ and $W$ respectively. It follows that $n^{-1/2} S_n(\beta_0)$ is asymptotically normal with influence function $U + V + W$. The covariance matrix of the limiting distribution is $\Sigma$; see Appendix A.

It is clear from Theorem 1 that $\Delta$ is the difference between the asymptotic variance-covariance matrices of $n^{-1/2} T_n(\beta_0)$ and $n^{-1/2} S_n(\beta_0)$. Let $a \in \mathbb{R}^{p+1}$ and write $b^\top(t) = a^\top r(t)$. Then $c(t) = b^\top(t) b(t) \in \mathbb{R}$. In this case, $a^\top \Delta a$ equals the right hand side of Eq. (7), after replacing $r(t) r^\top(t)$ throughout with $c(t)$ in that equation (and $r(s) r^\top(t)$ with $b(s) b(t)$ in the double integral). It follows from Corollary 2.7 of Dikta (1998) that $a^\top \Delta a \ge 0$. That is, the covariance matrices are ordered with regard to the Loewner ordering (Shklyar and Schneeweiss, 2005) in the sense that $\Gamma \ge \Sigma$.

The proof of consistency of $\hat\beta$ will follow as for $\hat\beta_{YJW}$, if we can show that, for any $\epsilon > 0$,

$$\sup_{t \le \tau} \big| \hat G_{SP}(t) - G(t) \big| = o\big(n^{-1/2 + \epsilon}\big), \quad \text{almost surely.} \tag{8}$$

A proof of Eq. (8) is given in Appendix B. Finally, asymptotic normality of $\hat\beta$ would follow from a property called local linearity for $S_n(\beta)$. This means that for $\|\beta - \beta_0\| = O(n^{-1/3})$, the estimating function $S_n(\beta)$ should satisfy (cf. Ying et al. 1995)

$$S_n(\beta) = S_n(\beta_0) + nA(\beta - \beta_0) + o_p\big(\max(n^{1/2}, n\|\beta - \beta_0\|)\big). \tag{9}$$

From Eq. (9) and Theorem 1, we can deduce that $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically normal with mean zero and covariance matrix $A^{-1}\Sigma A^{-1}$. A proof of Eq. (9) is given in Appendix C. The asymptotic covariance matrices of $\hat\beta$ and $\hat\beta_{YJW}$, namely $A^{-1}\Sigma A^{-1}$ and $A^{-1}\Gamma A^{-1}$, also obey the Loewner ordering in the sense that $A^{-1}\Gamma A^{-1} \ge A^{-1}\Sigma A^{-1}$. This means that both the estimating function

$S_n(\beta_0)$ and the estimator $\hat\beta$ are each as good as or more efficient than the corresponding Ying et al. (1995) counterparts. This shows that, when the model for $m(t)$ is chosen well, our modification would yield the desired improved inference for the regression parameter $\beta$.

3 Illustration using a lung cancer data set

We now analyze data from a lung cancer study (Ying et al. 1995), in which patients with small cell lung cancer were assigned randomly to two treatments. The response is the base 10 log failure time, and age and treatment indicator are the two covariates. Letting $\theta = (\theta_1, \theta_2)$, we fitted the model

$$m(x, \theta) = \frac{(10^x/365)^{\theta_2}}{\theta_1 + (10^x/365)^{\theta_2}}$$

to the data $(X_i, 1 - \delta_i)$, $i = 1, \ldots, n$. This model is just one minus the generalized proportional hazards model; see Eq. (13) of Dikta et al. (2006). Note that $10^x/365$ is just the original failure time expressed in years. We used the genetic optimization algorithm of Mebane and Sekhon (1998) and obtained the estimates $\hat\theta_1 = 610$ and $\hat\theta_2 = 6.01$. The fitted model along with the $(X, 1 - \delta)$ data are plotted in Figure 1. The KM and SRC model-based survival curves are shown in Figure 2. We also tested

$$H_0 : m(\cdot) \in \mathcal{M} = \left\{ m(x, \theta) = \frac{(10^x/365)^{\theta_2}}{\theta_1 + (10^x/365)^{\theta_2}} \right\} \quad \text{against} \quad H_1 : m(\cdot) \notin \mathcal{M}.$$

We used the model-based resampling test of Dikta et al. (2006), which produced a p-value of 0.28 based on 1000 resamples, implying that $H_0$ cannot be rejected.

It was reported elsewhere (Subramanian, 2007) that, using the KM weights for the censoring, the point estimates of the intercept, age, and treatment effects were respectively 2.90, $-0.002$, and $-0.161$. Using the SRC weights for the censoring, our analysis using the proposed methodology produced the respective point estimates 3.16, $-0.006$, and $-0.169$. Treating the intercept $\beta^{(1)}$ and the treatment effect $\beta^{(2)}$ as nuisance parameters, a 95% confidence

interval for the age effect $\beta_0^{(3)}$ using the MD statistic and KM weights is $(-0.0064, 0.0025)$. The proposed methodology using the MD statistic and SRC weights produced an interval estimate of $(-0.0102, 0.0004)$ for the age effect. The null hypothesis of no age effect is still not rejected but now only barely so. Treating the intercept $\beta^{(1)}$ and the age effect $\beta^{(3)}$ as nuisance parameters, a 95% confidence interval for the treatment effect $\beta_0^{(2)}$ using the MD statistic and KM weights is $(-0.3893, 0.0285)$. The proposed methodology produced an interval estimate of $(-0.3867, 0.0282)$ for the treatment effect.

4 Conclusion

Much like the existing method, the modified ICW estimating function for censored MR models investigated in this paper is tailored for the case that the censoring distribution is free of the covariate. As such, the proposed method applies when the subjects are administratively censored, as in the case of the lung cancer study above, or when a preliminary analysis reveals that the censoring is not likely to be related to the covariates, as in a data set analyzed by Yin et al. (2008). Censored MR methods that address the general case of covariate-dependent censoring perhaps do not possess the simple motivation and formulation of the ICW approach, and hence may not be appealing to a practitioner when it is known in advance that censoring is in fact free of the covariate. Our method relies on a properly chosen parametric model for the conditional probability of censoring given the observed survival time and, given the right choice, is shown to work as well as or better than the approach that employs KM weights. For the lung cancer data analyzed in this paper, our approach suggests a possible negative association of entry age with survival time, a result also corroborated by

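Yang's (1999) analysis. Interestingly, the approach using KM weights was unable to detect this possible negative association. We conclude that the proposed modification is a useful complement to the existing censored MR methodology, including for the case of MCIs.

Appendix A Covariance calculations

Recall the definitions of $U$, $V = V_1 + V_2$, and $W$ from the expressions following Eq. (6) in Section 2.2. The assumption of a common censoring distribution implies that $E(UV^\top) = E\{Z V^\top I(X \ge \beta_0^\top Z)/G(\beta_0^\top Z)\}$. We calculate

$$\begin{aligned}
E\left\{ \frac{I(X \ge \beta_0^\top Z)}{G(\beta_0^\top Z)}\, Z V_1^\top \right\}
&= \frac{1}{2}\, E\left[ \frac{I(X \ge \beta_0^\top Z)\, m_0(X)\, Z\, r^\top(X)}{G(\beta_0^\top Z)\, y_1(X)} \right] \\
&= \frac{1}{2}\, E\left[ \frac{Z}{G(\beta_0^\top Z)} \int_{\beta_0^\top Z}^{\infty} r^\top(t)\, d\Lambda_G(t) \right] \\
&= \frac{1}{2} \int E\left[ \frac{I(\beta_0^\top Z \le t)\, Z}{G(\beta_0^\top Z)} \right] r^\top(t)\, d\Lambda_G(t).
\end{aligned}$$

Next, since $I(X \ge \beta_0^\top Z)\, I(X \ge t)\, Z = Z\, I(\beta_0^\top Z < t)\, I(X \ge t) + Z\, I(\beta_0^\top Z \ge t)\, I(X \ge \beta_0^\top Z)$, we can show using the assumption of a common censoring distribution that

$$\begin{aligned}
E\left\{ \frac{I(X \ge \beta_0^\top Z)}{G(\beta_0^\top Z)}\, Z V_2^\top \right\}
&= -\frac{1}{2} \int E\left[ \frac{I(X \ge \beta_0^\top Z)\, I(\beta_0^\top Z \ge t)\, Z}{G(\beta_0^\top Z)} \right] \frac{r^\top(t)}{y_1(t)}\, d\Lambda_G(t) - \frac{1}{2} \int E\left[ \frac{I(\beta_0^\top Z < t)\, Z}{G(\beta_0^\top Z)} \right] r^\top(t)\, d\Lambda_G(t) \\
&= -\frac{1}{4} \int \frac{r(t)\, r^\top(t)}{y_1(t)}\, d\Lambda_G(t) - \frac{1}{2} \int E\left[ \frac{I(\beta_0^\top Z < t)\, Z}{G(\beta_0^\top Z)} \right] r^\top(t)\, d\Lambda_G(t).
\end{aligned}$$

Therefore, under the assumption that the covariate $Z$ is continuous, we have

$$E(UV^\top) = -\frac{1}{4} \int \frac{r(t)\, r^\top(t)}{y_1(t)}\, d\Lambda_G(t).$$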

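Likewise, we can show that $E(VU^\top) = E(UV^\top)$. Also, it is clear that $E(UW^\top) = E(WU^\top) = 0$. Furthermore, we can show that the following expressions hold:

$$E[V_1 V_1^\top] = \frac{1}{4} \int \frac{r(t)\, r^\top(t)}{y_1(t)}\, m_0(t)\, d\Lambda_G(t), \qquad E[V_2 V_2^\top] = -E[V_1 V_2^\top] - E[V_2 V_1^\top],$$

$$E[W W^\top] = \frac{1}{4} \iint \frac{r(s)\, r^\top(t)}{y_1(s)\, y_1(t)}\, \alpha(s, t)\, dH(s)\, dH(t).$$

After some algebra, the expression for $\Sigma$ follows.

Appendix B Proof of Eq. (8)

Write $H_0(t) = P(X \le t, \delta = 0)$ and denote its empirical estimator by $\hat H_0(t)$. By the Duhamel equation and Lenglart's inequality it suffices to show that $\sup_{t \le \tau} |\hat\Lambda_{SP}(t) - \Lambda_G(t)| = o(n^{-1/2 + \epsilon})$ almost surely. From Eq. (6), we can write $\sup_{t \le \tau} |\hat\Lambda_{SP}(t) - \Lambda_G(t)| \le I_1 + I_2 + o_p(n^{-1/2})$, where

$$\begin{aligned}
I_1 &= \sup_{t \le \tau} \left| \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{t} \frac{1}{y_1(s)} \big\{ m_0(s)\, dI(X_i \le s) - I(X_i \ge s)\, d\Lambda_G(s) \big\} \right| \\
&\le \sup_{t \le \tau} \left| \int_{-\infty}^{t} \frac{m_0(s)}{y_1(s)}\, d\big\{ \hat H(s) - H(s) \big\} \right| + \sup_{t \le \tau} \left| \int_{-\infty}^{t} \frac{\big(\hat H(s) - H(s)\big)\, d\Lambda_G(s)}{y_1(s)} \right|.
\end{aligned}$$

The second term of $I_1$ is bounded above by the quantity $\sup_{t \le \tau} |\hat H(t) - H(t)|\, \Lambda_G(\tau)/y_1(\tau)$, which is $O\big((n/\log n)^{-1/2}\big)$ almost surely. After integration by parts, we can easily show that the first term of $I_1$ is bounded above by the quantity $\sup_{t \le \tau} |\hat H(t) - H(t)|/y_1(\tau)$ plus

$$\sup_{t \le \tau} |\hat H(t) - H(t)| \left[ \sup_{t \le \tau} |m_0'(t)|/y_1(\tau) + \sup_{t \le \tau} m_0(t)\, \sup_{t \le \tau} |y_1'(t)|/y_1^2(\tau) \right],$$

which would be $O\big((n/\log n)^{-1/2}\big)$ almost surely, provided $m_0(t)$, its first derivative, and the density functions $f$ and $g$ (of $F$ and $G$ respectively) are all bounded uniformly for $t \le \tau$. Next we have

$$I_2 = \sup_{t \le \tau} \left| \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - \delta_i) - m_0(X_i)}{p_0(X_i)} \int_{-\infty}^{t} \frac{\alpha(s, X_i)}{y_1(s)}\, dH(s) \right|.$$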

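Write $\rho(t, u) = \int_{-\infty}^{t} \alpha(s, u)\, dH(s)/y_1(s)$. Note that $m_0(u)\, dH(u) = dH_0(u)$. Then

$$\begin{aligned}
I_2 &= \sup_{t \le \tau} \left| \int \frac{\rho(t, u)}{p_0(u)}\, d\hat H_0(u) - \int \frac{m_0(u)\, \rho(t, u)}{p_0(u)}\, d\hat H(u) \right| \\
&= \sup_{t \le \tau} \left| \int \frac{\rho(t, u)}{p_0(u)}\, d\{\hat H_0(u) - H_0(u)\} - \int \frac{m_0(u)\, \rho(t, u)}{p_0(u)}\, d\{\hat H(u) - H(u)\} \right|.
\end{aligned}$$

Each term can be shown to be $O\big((n/\log n)^{-1/2}\big)$ almost surely, by integration by parts coupled with an appropriate uniform boundedness assumption on $\rho(t, u)$ and the assumption that $m_0(t)$ is bounded away from 0 and 1. The proof is completed.

Appendix C Proof of local linearity of proposed estimating function

It can be shown that

$$S_n(\beta) - T_n(\beta) = \sum_{i=1}^{n} Z_i\, I(X_i \ge \beta^\top Z_i) \left[ \frac{\hat G_{KM}(\beta^\top Z_i) - G(\beta^\top Z_i)}{G^2(\beta^\top Z_i)} - \frac{\hat G_{SP}(\beta^\top Z_i) - G(\beta^\top Z_i)}{G^2(\beta^\top Z_i)} \right] + o_p(n^{1/2}),$$

uniformly for $\beta \in D$. It follows that $S_n(\beta) - S_n(\beta_0) = T_n(\beta) - T_n(\beta_0) + R_1 + R_2$, where $R_2$ is just $R_1$ defined below but with $\hat G_{SP}$ replacing $\hat G_{KM}$:

$$R_1 = \sum_{i=1}^{n} Z_i\, I(X_i \ge \beta^\top Z_i)\, \frac{\hat G_{KM}(\beta^\top Z_i) - G(\beta^\top Z_i)}{G^2(\beta^\top Z_i)} - \sum_{i=1}^{n} Z_i\, I(X_i \ge \beta_0^\top Z_i)\, \frac{\hat G_{KM}(\beta_0^\top Z_i) - G(\beta_0^\top Z_i)}{G^2(\beta_0^\top Z_i)},$$

which, by Lemmas 1 and 2 of Ying et al. (1995), can be shown to be $o_p(n^{1/2})$ when $\|\beta - \beta_0\| = O(n^{-1/3})$. Define the following limit:

$$r_\beta(t) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \frac{I(X_i \ge \beta^\top Z_i)}{G(\beta^\top Z_i)}\, Z_i\, I(\beta^\top Z_i \ge t).$$

Note that $r_{\beta_0}(t)$ is just $r(t)/2$. Write $d(t) = r_\beta(t) - r_{\beta_0}(t)$. Following the method of obtaining the asymptotic representation for $Q(\beta_0)$ in the proof of Theorem 1, we can show that $R_2$ has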

influence function $\tilde V_1 + \tilde V_2 + \tilde W$, where $\tilde V_i$ and $\tilde W$ are defined like $V_i$ and $W$ but with $r(t)$ in those definitions replaced with $d(t)$. The asymptotic covariance matrix of $R_2$ can be shown to be the same as the expression given by Eq. (7), but with $r(t)$ there as well replaced with $d(t)$. Since $\beta$ is in the $n^{-1/3}$ neighborhood of $\beta_0$, the covariance matrix is arbitrarily close to the null matrix. It follows that $R_2 = o_p(n^{1/2})$. Therefore we have $S_n(\beta) - S_n(\beta_0) = T_n(\beta) - T_n(\beta_0) + o_p(n^{1/2})$. Using the local linearity for $T_n(\beta)$, we have that $S_n(\beta) = S_n(\beta_0) + \bar S_n(\beta) + o_p(n^{1/2})$, where

$$\bar S_n(\beta) = \sum_{i=1}^{n} \{0.5 - P(T_i \ge \beta^\top Z_i)\}\, Z_i = \sum_{i=1}^{n} \{0.5 - F_i((\beta - \beta_0)^\top Z_i)\}\, Z_i.$$

Local linearity follows by taking a Taylor expansion of $\bar S_n(\beta)$ about $\beta_0$.

References

1. P.K. Andersen, O. Borgan, R.D. Gill, N. Keiding, Statistical Models Based on Counting Processes, Springer-Verlag, New York, 1993.
2. H. Bang, A.A. Tsiatis, Median regression with censored cost data, Biometrics 58 (2002) 643-649.
3. I.V. Basawa, H.L. Koul, Large-sample statistics based on quadratic dispersion, Int. Statist. Rev. 56 (1988) 199-219.
4. D.R. Cox, E.J. Snell, Analysis of binary data, Chapman and Hall, London, 1989.
5. S. Csörgő, L. Horváth, The rate of strong uniform consistency for the product-limit estimator, Z. Wahrsch. Verw. Gebiete. 62 (1983) 411-426.
6. G. Dikta, On semiparametric random censorship models, J. Statist. Plann. Inference 66 (1998) 253-279.

7. G. Dikta, M. Kvesic, C. Schmidt, Bootstrap approximations in model checks for binary data, J. Amer. Statist. Assoc. 101 (2006) 521-530.
8. B. Honoré, S. Khan, J.L. Powell, Quantile regression under random censoring, J. Econometrics 109 (2002) 67-105.
9. Z. Jin, Z. Ying, L.J. Wei, A simple resampling method by perturbing the minimand, Biometrika 88 (2001) 381-390.
10. H. Koul, V. Susarla, J. Van Ryzin, Regression analysis with randomly right censored data, Ann. Statist. 9 (1981) 1276-1288.
11. A. Lindgren, Quantile regression with censored data using generalized L1 minimization, Comput. Statist. Data Anal. 23 (1997) 509-524.
12. K. Lu, A.A. Tsiatis, Multiple imputation methods for estimating regression coefficients in proportional hazards models with missing cause of failure, Biometrics 57 (2001) 1191-1197.
13. I.W. McKeague, S. Subramanian, Y. Sun, Median regression and the missing information principle, J. Nonparametr. Stat. 13 (2001) 709-727.
14. W.R. Mebane, Jr., J.S. Sekhon, Genetic optimization using derivatives: The rgenoud package for R, to appear in J. Statist. Software.
15. J.L. Powell, Censored regression quantiles, J. Econometrics 32 (1986) 143-155.
16. S. Portnoy, Censored regression quantiles, J. Amer. Statist. Assoc. 98 (2003) 1001-1012.

17. L. Peng, Y. Huang, Survival analysis with quantile regression models, J. Amer. Statist. Assoc. 103 (2008) 637-649.
18. G. Qin, M. Tsao, Empirical likelihood inference for median regression models for censored survival data, J. Multivar. Anal. 85 (2003) 416-430.
19. J.M. Robins, A. Rotnitzky, Recovery of information and adjustment for dependent censoring using surrogate markers, in AIDS Epidemiology - Methodological Issues, eds. N. Jewell, K. Dietz, and V. Farewell, Boston: Birkhauser, (1992) 297-331.
20. S. Shklyar, H. Schneeweiss, A comparison of asymptotic covariance matrices of three consistent estimators in the Poisson regression model with measurement errors, J. Multivar. Anal. 94 (2005) 250-270.
21. S. Subramanian, Median regression using nonparametric kernel estimation, J. Nonparametr. Stat. 14 (2002) 583-605.
22. S. Subramanian, Asymptotically efficient estimation of a survival function in the missing censoring indicator model, J. Nonparametr. Stat. 16 (2004a) 797-817.
23. S. Subramanian, The missing censoring-indicator model of random censorship, Handbook of Statistics 23: Advances in Survival Analysis, eds. N. Balakrishnan and C.R. Rao (2004b) 123-141.
24. S. Subramanian, Median regression analysis from data with left and right censored observations, Stat. Methodol. 4 (2007a) 121-131.
25. S. Subramanian, Censored median regression and profile empirical likelihood, Stat. Methodol. 4 (2007b) 493-503.

26. S. Yang, Censored median regression using weighted empirical survival and hazard functions, J. Amer. Statist. Assoc. 94 (1999) 137-145.
27. G. Yin, D. Zeng, H. Li, Power-transformed linear quantile regression with censored data, J. Amer. Statist. Assoc. 103 (2008) 1214-1224.
28. Z. Ying, S. Jung, L.J. Wei, Survival analysis with median regression models, J. Amer. Statist. Assoc. 90 (1995) 178-184.

[Figure 1: Plot of the fitted model for m(t) for the lung cancer data. Vertical axis: estimated m(t), from 0.0 to 1.0; horizontal axis: t, from 2.0 to 3.2.]

[Figure 2: Plot of the Kaplan-Meier and semiparametric random censorship model-based survival curves of the censoring distribution for lung cancer data. Vertical axis: estimated G(t), from 0.0 to 1.0; horizontal axis: t, from 2.0 to 3.2. Legend: Kaplan-Meier estimator, Semiparametric estimator.]