MAXIMUM LIKELIHOOD METHOD FOR LINEAR TRANSFORMATION MODELS WITH COHORT SAMPLING DATA

Size: px
Start display at page:

Download "MAXIMUM LIKELIHOOD METHOD FOR LINEAR TRANSFORMATION MODELS WITH COHORT SAMPLING DATA"

Transcription

1 Statistica Sinica 25 (215), doi: MAXIMUM LIKELIHOOD METHOD FOR LINEAR TRANSFORMATION MODELS WITH COHORT SAMPLING DATA Yuan Yao Hong Kong Baptist University Abstract: Three widely used sampling designs the nested case-control, case-cohort, and classical case-control designs can be categorized as generalized case-cohort designs. Maximum likelihood methods are used to perform regression analysis of linear transformation models with these sampling designs, and the resulting estimator is proved to be consistent, asymptotically normal and semiparametrically efficient. Simulation studies and an application to the Stanford heart transplant data are presented. Key words and phrases: Linear transformation models, maximum likelihood estimation, missing at random, nested case-control sampling. 1. Introduction Cohort sampling is a popular methodology in epidemiological studies and clinical trials. The main advantage of such a design, compared with prospective studies, is that collecting covariate information is relatively quick and inexpensive. A number of publications have addressed the analysis of cohort designs, and in particular on nested case-control (n-c-c) designs (see, e.g., Breslow and Day (198), Langholz and Goldstein (1996), and Breslow (1996)). Thomas (1977) proposed a partial likelihood approach to estimating the regression parameter of the Cox model. Because of its partial likelihood nature, his estimation method possesses useful properties similar to those of Cox s partial likelihood estimation based on full cohort data. The estimator is easy to compute, and its variance estimator is simply the negative of the derivative of the log-partial likelihood. The asymptotic properties of Thomas estimator can be formally established using the counting process martingale theory of Goldstein and Langholz (1992). However, Thomas estimator is not semiparametrically efficient, and it is likely that other more efficient and easy-to-compute estimates exist. Samuelsen (1997) presented an entirely new estimator based on maximizing pseudo-likelihood. His key observation is that the conditional probability

2 1232 YUAN YAO of a censored subject s ever being selected as a control can be explicitly computed with an expression similar to that of the Kaplan Meier estimator. The resulting estimator and its inference are also very easy to compute. Empirical evidence shows that Samuelsen s estimator is sometimes better than Thomas. As Samuelsen (1997) has pointed out, however, the weighting techniques used to construct the pseudo-likelihood can be inefficient. Chen (21) proposed an estimation method based on local averaging that is essentially an alternative sample reuse method other than those of Thomas and Samuelsen. This method is applicable to so-called generalized case-cohort designs, which includes not only n-c-c, but case-control and case-cohort designs as well. Chen (21) discusses how the accuracy of this estimator is comparable to those of Thomas and Samuelsen, but the efficient estimator can be quite difficult to obtain, as it involves estimating and computing an integral operator. There are also some other methods in the literature. Almost all use Cox model. This puts serious limitations on practitioners because of the lack of statistical tools to analyze n-c-c or more general cohort designs. More general statistical methodologies for this purpose are badly needed. The transformation model, as a natural generalization of the Cox model, is an ideal choice. Typically a linear transformation model can be written as log H(T ) = βz + ϵ, where (T, Z) is the response-covariates pair, H is an unknown monotone function, ϵ is the unobserved random variable with known distribution and β is the unknown regression parameter of main interest. When ϵ follows the extreme value distribution or the standard logistic distribution, the model reduces to the Cox s proportional hazards model or the proportional odds model, respectively. Cheng, Wei, and Ying (1995) contains a very simple and elegant idea based on pairwise comparison of the survival times. This type of rank based method relies on the assumption that the censoring variables are independent of the covariates. In this paper, we propose an efficient estimation method for linear transformation models with cohort sampling designs. The rest of the paper is organized as follows. We describe the maximum likelihood-based estimation method and asymptotic properties in Section 2. Simulation studies and an application to data are given in Sections 3 and 4. Concluding remarks are given in Section 5. Technical details are provided in the Appendix. 2. Estimation and Inference Consider a cohort of n i.i.d. individuals. Let T i and C i denote the failure time and censoring time of the ith individual respectively. We assume that C i is independent of (T i, Z i ). We set

3 MAXIMUM LIKELIHOOD METHOD 1233 (i) the event time: Y i = min(t i, C i ); (ii) the failure/censoring index: δ i = I(T i C i ); (iii) the indicator of the ith individual being sampled for covariate ascertainment: i, with conditional probability π i = P ( i = 1 (Y j, δ j ), 1 j n). It is necessary in the inference procedure that ( 1,..., n ) are conditionally independent of (Z 1,..., Z n ), given (Y j, δ j ), 1 j n. In the linear transformation model log H(T ) = βz + ϵ, (2.1) it is supposed that β is the unknown parameter of main interest, H is an unknown increasing function, and ϵ is a continuous random variable whose hazard function λ ϵ is known. We take ϵ to be independent of Z and C. The likelihood function from (2.1) can then be written as n L(β, H, F ) = f(y i, δ i ) f( 1,..., n (Y j, δ j ), 1 j n) = i=1 n f(z i (Y j, δ j, j ), 1 j n) i i=1 n f(y i, δ i ) f( 1,..., n (Y j, δ j ), 1 j n) i=1 n f(z i Y i, δ i ) i, i=1 where the second equation comes from the conditional independence of ( 1,..., n ) and (Z 1,..., Z n ), and independence between individuals. The last term in the likelihood function can be calculated as ( ) f(z i Y i, δ i ) f(yi i, δ i Z i )f(z i ) i =. f(y i, δ i ) For simplicity, denote the hazard and cumulative hazard functions of exp(ϵ) as λ and Λ, and the probability density and distribution of covariate Z as f and F, respectively. The log-likelihood then takes a final form l n (β, H, F ) = n i=1 } i {δ i [βz i + log λ(h(y i )e βz i )] Λ(H(Y i )e βz i ) { +(1 i ) log +δ i log h(y i ) + i log f(z i ) } [e βz λ(h(y i )e βz )] δ i e Λ(H(Y i)e βz) f(z)dz

4 1234 YUAN YAO + log(λ C (Y i ) 1 δ i e Λ C(Y i ) ) + log f ( 1,..., n (Y j, δ j ), 1 j n), where h( ) is the derivative function of H( ). Using the method of discretization of H and F, let q j represent the size of the increment of H at the jth smallest observed failure times, say s j, j = 1,..., n 1, where n 1 = n i=1 δ i is the number of failures. Set n 1 n 1 H(t) = q j I(s j t), and h(t) = q j I(t = s j ). To estimate F, put probability mass on all known covariates, with f(z j ) = P (Z = Z j ) = p j satisfying n 2 p j = 1, where n 2 = n i=1 i is the number of individuals with known covariates. Then the integral over Z in the likelihood can be estimated as [e βz λ(h(y i )e βz )] δ i e Λ(H(Y i)e βz) f(z)dz n 2 = [e βz j λ(h(y i )e βz j )] δ i e Λ(H(Y i)e βzj ) p j. We show that maximizing the log-likelihood function over (β, q 1,..., q n1, p 1,..., p n2 ) leads to its being consistent, asymptotically normal, and semiparametrically efficient for β under certain regularity conditions. Regularity conditions and the proof of the theorems are in the Appendix. Theorem 1. Under the regularity conditions (C1) (C5), β ˆ n β, sup t [,τ] H ˆ n (t) H (t) and sup Z M F ˆ n (Z) F (Z) almost surely. To describe the variance estimation, let τ denote the duration of the study and suppose Z lies in a bounded set M. Let Q 1 = {p BV [, τ] : p 1} and Q 2 = {q BV [M] : q 1}, where BV [D] is the set of functions on D with bounded total variation. Then H ˆ n can be treated as a bounded linear functional in L (Q 1 ) as τ Hˆ n (p) = p(t)dh ˆ n (t), and similarly, Fn ˆ can be treated as a bounded linear functional in L (Q 2 ). Theorem 2. Under the conditions (C1) (C6), n( β ˆ n β, Hn ˆ H, F ˆ n F ) converges weakly to a zero-mean Gaussian process in the metric space R d L (Q 1 ) L (Q 2 ). The limiting covariance matrix of n( β ˆ n β ) attains the semiparametric efficiency bound.

5 MAXIMUM LIKELIHOOD METHOD 1235 Theorem 3. For any (b, p, q) V Q 1 Q 2, where V = {v R d : v 1}, the asymptotic variance for nv T ( ˆβ n β ) + τ n p(t)d [ Hn ˆ (t) H (t) ] + n q(z)d [ Fn ˆ (Z) F (Z) ] Z can be consistently estimated by (v T, p T, q T )In 1 (v T, p T, q T ) T, where ni n is the negative Hessian matrix of the log-likelihood function l n (β, H, F ) with respect to (β, q 1,..., q n1, p 1,..., p n2 ), and the vectors p = (p(s 1 ),..., p(s n1 )) T, q = (q(z 1 ),..., q(z n2 )) T. Taking p = and q =, the variance matrix of n( ˆβ n β ) can be estimated by the upper left d d matrix of In 1. Remark. The computation can be carried out in many scientific computing packages. For example, the algorithm of fmincon in the optimization toolbox of Matlab can be used to find a minimizer and calculate the Hessian matrix as well. Our simulation shows that this algorithm does well when handling moderate sample size like 2. For the initial value, one can use estimates from the Cox model or try different initial values to make the maximization guaranteed. 3. Simulation Study We took independent covariates Z 1 uniform distributed over [, 1] and Z 2 Bernoulli with success probability.5. The hazard function of the error term ϵ was chosen as exp(t) λ ϵ (t) = 1 + r exp(t), with r =, 1 and 2 (Dabrowska and Doksum (1988); Chen, Jin, and Ying (22)). Here the proportional hazards and proportional odds models correspond to r = and r = 1, respectively. For the transformation function in the model log H(t) = βz + ϵ, we take H(t) = t for r =, H(t) = exp(t) 1 for r = 1, and H(t) =.5 exp(2t).5 for r = 2. The censoring time C was uniform on [, c], where c was chosen such that the expected proportion of censoring was.7 and.8, respectively. The sample size n was set at 2 and all simulations were based on 5 replications. For the case-cohort design, we selected all failures and a subcohort of size 85 and 5 when the censoring rate was.7 and.8, respectively. For the classical case-control design, we selected all failures and a group of non-failures of the same size as the failures. For the nested case-control design, we selected all failures and two non-failures in each risk set of failure times. Table 1 summarizes the simulation results estimating β 1 and β 2. The true values were β 1 = 1 and β 2 = 1. Results include the mean of the bias (Bias) of the estimates, the sample standard deviations (SSD) of the estimates, the mean

6 1236 YUAN YAO Table 1. Summary of simulation results. Censoring rate=.7 Case-cohort design r Bias SSD ESE CP RE RE* β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β Classical case-control design r Bias SSD ESE CP RE RE* β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β Nested case-control design r Bias SSD ESE CP RE RE* β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β Censoring rate=.8 Case-cohort design r Bias SSD ESE CP RE RE* β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β Classical case-control design r Bias SSD ESE CP RE RE* β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β Nested case-control design r Bias SSD ESE CP RE RE* β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β 2 β 1 β of the estimated standard errors (ESE) of the estimates, and the 95% empirical coverage probabilities (CP) for β 1 and β 2 based on the asymptotic normality in Theorem 2. The proposed estimation procedures perform well in all cases. We compared our approach with that of Chen, Sun, and Tong (212), a

7 MAXIMUM LIKELIHOOD METHOD 1237 likelihood-based method involving inverse probability that can apply to different cohort sampling schemes. A series of simulation with the same settings as above was conducted and it found that the Chen et al. estimators less efficient in most of the senarios, especially when the censoring rate was large. This is mainly because the Chen et al. estimator does not involve the event times whose covariates are not sampled. This can be seen in Table 1 by comparing the relative efficiency of our estimators (RE) with that of the Chen et al. estimators (RE*), where the relative efficiency was computed by comparing the sample variance of an estimator to the sample variance of the full-cohort estimator. 4. Application The Stanford heart transplant data, consisting of censored or uncensored survival times in February 198 of 184 patients who had received heart transplants, was reported in Miller and Halpern (1982). In the data set, the patients T5 mismatch score is a measure of the degree of tissue incompatability between the initial donor and the recipient with respect to HLA antigens. The goal of this study was to analyze the relationship among the survival time, T5 mismatch score and the age of the patients. We compared the results using the full cohort data with those from different sampling schemes. As in Miller and Halpern (1982) and Jin, Lin, and Ying (26), only the 157 patients with complete records were used. Following Miller and Halpern (1982), in Model 1 we regressed the logarithm of the survival time against the ages and T5 mismatch scores for the 157 patients. Then the T5 mismatch score was deleted due to its nonsignificance in Model 1 and a quadratic age model was tried to achieve better fit. In Model 2 we regressed the logarithm of the survival time against the age and age 2 for the 152 patients whose survival times were not less than 1 days. The sampling designs for both models were set as follows: For the case-cohort design we selected all failures and a subcohort of size 75 among the whole set of subjects. For the classical case-control design we selected all failures and 25 non-failures. For the nested case-control design we selected all failures and one non-failure in each risk set. Table 2 shows the average of the estimates of parameters with standard errors and p-values based on 4 replications. The proposed method under all the sampling designs led to the same conclusion as using the full data set. For comparison, we also present the estimates from Chen, Sun, and Tong (212) in the table, from which one can draw similar conclusions in general. It can also be seen from the comparison of standard error that the proposed method has the advantage of being asymptotically efficient.

8 1238 YUAN YAO Full data Table 2. Comparison of parameter estimation (estimated standard error, p-value) for the Stanford heart transplant data. Model 1 Age T5.295 (.115,.56).1692 (.211,.212) Model 2 Age Age (.528,.33).23 (.7,.4) Model 1 Design Proposed Chen Age T5 Age T5 C-C.281 (.179,.4).1175 (.2817,.3368).292 (.182,.677).1621 (.2756,.296) C-C-C.275 (.186,.459).1145 (.2882,.346).295 (.178,.638).1636 (.2757,.2947) N-C-C.263 (.146,.31).1825 (.2426,.2263).291 (.174,.542).1691 (.2565,.2693) Model 2 Design Proposed Chen Age Age 2 Age Age 2 C-C (.593,.294).23 (.8,.18) (.83,.599).24 (.1,.317) C-C-C (.652,.32).23 (.8,.125) (.85,.614).24 (.11,.337) N-C-C (.552,.15).23 (.7,.65) (.697,.355).24 (.9,.136) Note: Proposed represents the proposed method; Chen represents the Chen, Sun, and Tong (212) method; C-C represents case-cohort sampling; C-C-C represents classical case-control sampling; N-C-C represents nested case-control sampling. 5. Concluding Remarks This paper presents an efficient estimation technique for a broad class of sampling designs using linear transformation models. The computation procedure is based on the maximization of the discretized likelihood function. The variance estimation is obtained from the inverse of the negative Hessian matrix. We prove that such estimation method attains the semiparametric efficiency bound. The performance of the estimator is illustrated with simulation studies and a Stanford heart transplant data study. Further work will consider extending this method to partially linear transformation models, where kernel smoothing could be applied. Acknowledgement The author is grateful to the referee and an associate editor for their constructive comments which greatly improved this manuscript. The author also deeply thanks Professor Kani Chen for his guidance during the study. Appendix: Regularity conditions and proofs of theorems Some regularity conditions are needed:

9 MAXIMUM LIKELIHOOD METHOD 1239 (C1) The function H (t) and the distribution F (t) are strictly increasing and differentiable, with derivatives h(t) and f(t) that are absolutely continuous, and β lies in the interior of a known compact set in the domain of B. (C2) The covariate Z is bounded, and P (C τ) > δ > for some constant δ. (C3) λ(t) >, and λ is twice continuously differentiable. (C4) lim sup Λ(C x) 1 log(x sup y x λ(y)) = holds for every C >. x (C5) (First Identifiability) Let If Ψ i (β, H, F ) = {[e } βz i λ(h(y i )e βz i )] δ i e Λ(H(Y i)e βz i) i { } [e βz λ(h(y i )e βz )] δ i e Λ(H(Y i)e βz) (1 i ) f(z)dz. Ψ i (β, H, F )h (Y i ) δ i f (Z i ) i = Ψ i (β, H, F )h (Y i ) δ i f (Z i ) i, almost surely, then β = β, H = H and F = F. (C6) (Second Identifiability) If [ ] [ ] v T l β (β, H, F ) + l H (β, H, F ) pdh + l F (β, H, F ) qdf = almost surely for some v R d, p BV [, τ] and q BV [M], then (v, p, q) =, where v T l β, l H [g 1 ] and l F [g 2 ] denote the partial derivatives of l along direction of v, g 1, and g 2, respectively. Remark. Conditions (C1) (C3) assume certain smoothness and identifiability of the model, which are standard requirements in censored data analysis. (C4) is a technical condition on the structure of the model that is used in the proof of consistency. (C5) is the usual parameter identifiability condition. (C6) ensures that the Fisher information is non-singular. Proof of Theorem 1. The jump size of Hn ˆ must be finite, otherwise the loglikelihood would diverge to by (C4). Ĥ n is also bounded almost surely, otherwise if a new estimator H n = Ĥn/Ĥn(τ) were considered, it would contradict the maximum property of Ĥ n. Since Ĥn is uniformly bounded and monotone, Helly s Selection Theorem requires that for any subsequence of {Ĥn}, there is a further subsequence which converges to some monotone function H pointwise. Without loss of generality, assume that ˆF n converges to F and ˆβ n converges

10 124 YUAN YAO to β for the same subsequence. Then consistency is proved if we can show that H = H, F = F, and β = β with probability one. Furthermore, the continuity of H and F ensures that the convergence is uniform in t and Z. Take the derivative of the log-likelihood with respect to H along H + ϵi( Y i ), and let it be zero. Then Hence Ĥ n (t) = n i=1 t ĥ(y i ) = δ i n Ψ jh ( ˆβ n, Ĥn, ˆF. n )[I( Y i )] h(u)dn i (u) = Ψ j ( ˆβ n, Ĥn, ˆF n ) n i=1 t dn i (u) n Ψ jh ( ˆβ n, Ĥn, ˆF, n )[I( u)] Ψ j ( ˆβ n, Ĥn, ˆF n ) where we use N i (u) to denote the counting process of Y i. Now define n t dn i (u) H n (t) =. i=1 n Ψ jh (β, H, F )[I( u)] Ψ j (β, H, F ) The Glivenko-Cantelli Theorem leads to lim be written as Ĥ n (t) = t 1 n 1 n n H n (t) = H (t). Here Ĥn(t) can n Ψ jh (β, H, F )[I( u)] Ψ j (β, H, F ) n Ψ jh ( ˆβ n, Ĥn, ˆF d H n (u). n )[I( u)] Ψ j ( ˆβ n, Ĥn, ˆF n ) (A.1) It is now possible to show that H is continuously differentiable. For simplicity, define E (Ψ jh (β, H, F )[I( u)]/ψ j (β, H, F )) as g 1j (u) and E (Ψ jh (β, H, F )[I( u)]/ψ j (β, H, F )) as g 2j (u). It can be shown that they are both bounded away from zero. Taking the limit of (A.1), we get H (t) = t g 1j (u) g 2j (u) dh (u). We have now shown that H (t) is absolutely continuous with respect to H (t). By assumption, H (t) is continuously differentiable, and so is H (t). Then dh lim ˆ n (t) n dh n (t) = h (t) h (t)

11 MAXIMUM LIKELIHOOD METHOD 1241 uniformly in t [, τ], where h is the derivative of H. Repeating the same process with F similarly yields d lim ˆF n (Z) n d F n (Z) = f (Z) f (Z) uniformly. It follows from the inequality l n ( ˆβ n, Ĥn, ˆF n ) l n (β, H n, F n ) that 1 n log Ψ j( ˆβ n, Ĥn, ˆF n ) n Ψ j (β, H n, F n ) + 1 n δ j log dĥn(y j ) n d H n (Y j ) + 1 n j log d ˆF n (Z j ) n d F n (Z j ). Let n, then E ( log Ψ j(β, H, F )h (Y j ) δ j f (Z j ) ) j Ψ j (β, H, F )h (Y j ) δ jf (Z j ). j The left-hand side is the negative Kullback-Leibler distance, therefore condition C5 requires that β = β, H = H and F = F with probability one. Proof of Theorem 2. This proof is based on the argument on maximum likelihood estimators of Van der Vaart (1998, pp ). Let L(β, H, F ) = log Ψ j + δ j log h(y j ) + j log f(z j ), [ Φ n (β, H, F ) = P n {v ] [ ]} T L β + L H pdh + L F qdf, { [ ] [ ]} Φ(β, H, F ) = P v T L β + L H pdh + L F qdf, where we use v T L β, L H [g 1 ], and L F [g 2 ] to denote the partial derivatives of L along direction of β + ϵv, H + ϵg 1 and F + ϵg 2 respectively as above, and let P n denote the empirical measure based on n i.i.d. observations with P as its expectation. For any δ >, when n large enough, } {(β, H, F ) : β β + H H + F F < δ ( ˆβ n, Ĥn, ˆF n ) N = almost surely. By the Donsker theorem, n(φn Φ)( ˆβ n, Ĥn, ˆF n ) n(φ n Φ)(β, H, F ) = o p (1). Direct calculations show [ n(pn P){ ] v T L β (β, H, F ) + L H (β, H, F ) pdh

12 1242 YUAN YAO [ ]} +L F (β, H, F ) qdf = { n(p n P) v T L β ( ˆβ n, Ĥn, ˆF n ) + L H ( ˆβ n, Ĥn, ˆF n )[ ] pdĥn +L F ( ˆβ n, Ĥn, ˆF [ n ) qd ˆF ]} n + o p (1) = { np v T L β ( ˆβ n, Ĥn, ˆF n ) v T L β (β, H, F ) + L H ( ˆβ n, Ĥn, ˆF n )[ ] pdĥn [ ] L H (β, H, F ) pdh + L F ( ˆβ n, Ĥn, ˆF [ n ) qd ˆF ] n [ ]} L F (β, H, F ) qdf + o p (1) = { v T Ψ jβ ( np ˆβ n, Ĥn, ˆF Ψ n ) Ψ j ( ˆβ n, Ĥn, ˆF vt Ψ jβ (β, H, F ) jh ( ˆβ n, Ĥn, ˆF n )[ pdĥ] + n ) Ψ j (β, H, F ) Ψ j ( ˆβ n, Ĥn, ˆF n ) Ψ jh (β, H, F )[ pdh ] Ψ jf ( ˆβ n, Ĥn, ˆF n )[ qd ˆF ] + Ψ j (β, H, F ) Ψ j ( ˆβ n, Ĥn, ˆF n ) Ψ jf (β, H, F )[ qdf ] } + o p (1). (A.2) Ψ j (β, H, F ) For the continuous linear functional from BV [, τ] to R: ( ) ΨjH [p] p P (β, H, F ), Ψ j by Theorem 4.2 in Edwards and Wayment (1971), there exists a bounded function η H such that ( ) ΨjH [p] τ P (β, H, F ) = η H (t; β, H, F )dp(t). Ψ j To be specific, let p(s) = I(s t), so that ( ) ΨjH [I( t)] η H (t; β, H, F ) = P (β, H, F ). Ψ j Analogously, we define η F (z; β, H, F ) = P and second order partial derivatives functions: ( ) ΨjF [I( z)] (β, H, F ), Ψ j

13 MAXIMUM LIKELIHOOD METHOD 1243 We also set η Hβ (t; β, H, F ) = β η H(t; β, H, F ), η F β (t; β, H, F ) = β η F (t; β, H, F ), η HH (s, t; β, H, F ) = H η H(s; β, H, F )[I( t)], η HF (s, z; β, H, F ) = F η H(s; β, H, F )[I( z)], η F H (x, t; β, H, F ) = H η F (x; β, H, F )[I( t)], η F F (x, z; β, H, F ) = F η F (x; β, H, F )[I( z)]. ζ β (β, H, F ) = β P ζ H (t; β, H, F ) = H P ζ F (z; β, H, F ) = F P ( Ψjβ Ψ j ( Ψjβ Ψ j ( Ψjβ Ψ j ) (β, H, F ), ) (β, H, F )[I( t)], ) (β, H, F )[I( z)]. The right-hand side of (A.2) is now = ( n B 1 [v, p, q] T ( β ˆ n β )+ B 21 [v, p, q]d(ĥn H )+ n +o( ˆβn β + n Ĥn H + n ˆF ) n F, B 22 [v, p, q]d( ˆF ) n F ) with linear operators B 1, B 21, and B 22 defined as τ B 1 [v, p, q] = v T ζ β (β, H, F ) + η Hβ (t; β, H, F )p(t)dh (t) + η F β (z; β, H, F )q(z)df (z); M B 21 [v, p, q] = v T ζ H (t; β, H, F ) + η H (t; β, H, F )p(t) τ + + M η HH (s, t; β, H, F )p(s)dh (s) η F H (x, t; β, H, F )q(x)df (x); B 22 [v, p, q] = v T ζ F (z; β, H, F ) + η F (z; β, H, F )q(z) τ + + M η HF (s, z; β, H, F )p(s)dh (s) η F F (x, z; β, H, F )q(x)df (x).

14 1244 YUAN YAO The operator [B 1, B 21, B 22 ] can then be written as B 1 [v, p, q] T ṽ + B 21 [v, p, q] pdh + B 22 [v, p, q] qdf = d { P v T L dϵ β (β + ϵṽ, H + ϵ pdh, F + ϵ qdf ) ϵ= [ ] +L H (β + ϵṽ, H + ϵ pdh, F + ϵ qdf ) pdh [ ]} +L F (β + ϵṽ, H + ϵ pdh, F + ϵ qdf ) qdf. We now show that (B 1, B 21, B 22 ) is continuously invertible. By the Open Mapping Theorem, we need only prove that the linear operator is one-to-one. To show (B 1, B 21, B 22 ) is injective, suppose that B 1 [v, p, q] =, B 21 [v, p, q] = and B 22 [v, p, q] =. Then = B 1 [v, p, q] T v + B 21 [v, p, q]pdh + B 22 [v, p, q]qdf, where the right-hand side is the derivative of { [ ] P v T L β (β, H, F ) + L H (β, H, F ) pdh [ ]} + L F (β, H, F ) qdf along the path (β + ϵv, H + ϵ pdh, F + ϵ qdf ). This implies that the information along this path is zero, hence [ ] [ ] v T L β (β, H, F ) + L H (β, H, F ) pdh + L F (β, H, F ) qdf = almost surely. By the identifiability condition (C6), we get (v, p, q) =. To show (B 1, B 21, B 22 ) is surjective, write (B 1, B 21, B 22 )[v, p, q] = I 1 [v, p, q] + I 2 [v, p, q], where I 1 [v, p, q] is defined as v η H (t; β, H, F )p(t) η F (z; β, H, F )q(z) and I 2 [v, p, q] is defined as v T ζ β (β, H, F ) + τ η Hβ(t; β, H, F )p(t)dh (t) + M η F β(z; β, H, F )q(z)df (z) v v T ζ H (t; β, H, F ) + τ η HH(s, t; β, H, F )p(s)dh (s) + M η F H(x, t; β, H, F )q(x)df (x). v T ζ F (z; β, H, F ) + τ η HF (s, z; β, H, F )p(s)dh (s) + M η F F (x, z; β, H, F )q(x)df (x)

15 MAXIMUM LIKELIHOOD METHOD 1245 Since η H (t; β, H, F ) = P(L H (β, H, F )[I( t)]), the score function along the direction of H +ϵi( t), η H (t; β, H, F ) is negative and continuous for all t. Similarly, η F (z; β, H, F ) is the score function along the direction F +ϵi( z), hence η F (z; β, H, F ) is negative and continuous for all z. Thus I 1 [v, p, q] is a bijective continuous operator, and is continuously invertible. By the smoothness conditions (C1) and (C3), the image of I 2 is a set of uniformly bounded and equi-continuous functions, hence I 2 is a compact operator by the Arzelá-Ascoli theorem. Now we can write (B 1, B 21, B 22 ) as I 1 + I 2 = I 1 (I + I1 1 I 2), where I is identity mapping. It is clear that I1 1 I 2 is a compact operator by the continuity of I1 1. Then I + I 1 1 I 2 is a Fredholm operator. Now since ker(i + I1 1 I 2) = ker(i 1 + I 2 ) =, the Fredholm operator theory gives that I + I1 1 I 2 is surjective, and so is I 1 + I 2. The continuous invertibility of (B 1, B 21, B 22 ) has been proved. Now with (ṽ, p, q) = (B 1, B 21, B 22 ) 1 (v, p, q), from (A.2) we have { n v T ( ˆβ n β ) + pd(ĥn H ) + qd( ˆF } n F ) = { [ ] n(p n P) ṽ T L β (β, H, F ) + L H (β, H, F ) pdh (A.3) [ ]} n +L F (β, H, F ) qdf +o( ˆβn β + n Ĥn H + n ˆF ) n F. The first term of the right side of (A.3) converges in distribution to a zero-mean Gaussian process in the metric space R d L (Q 1 ) L (Q 2 ). By Slutsky s theorem, we now need only show that n ˆβn β + n Ĥn H + n ˆF n F = O p (1). (A.4) By definition, ˆβ n β + Ĥn H + ˆF n F = sup vt ( ˆβ n β ) + (v,p,q) V Q 1 Q 2 τ pd(ĥn H ) + Z qd( ˆF n F ), so (A.3) gives n ˆβn β + n Ĥn H + n ˆF n F n = O p (1) + o( ˆβn β + n Ĥn H + n ˆF ) n F, therefore (A.4) holds immediately. Now we have { n v T ( ˆβ n β ) + pd(ĥn H ) + qd( ˆF } n F )

16 1246 YUAN YAO = { [ ] n(p n P) ṽ T L β (β, H, F ) + L H (β, H, F ) pdh [ ]} +L F (β, H, F ) qdf + o p (1). (A.5) Since the right side of (A.5) converges to normal by the Central Limit Theorem, we have proved that n( ˆβ n β, Ĥn H, ˆF n F ) converges weakly to a zeromean Gaussian process. If p = q = in equality (A.5), then the estimate ˆβ n is an asymptotically linear estimator with influence function ṽ T L β (β, H, F ) + L H (β, H, F )[ pdh ] + L F (β, H, F )[ qdf ], which lies in the linear space spanned by the score functions: {ṽ T L β + L H [ pdh] + L F [ qdf ] : ṽ R d, p Q 1, q Q 2 }. By Proposition 1 in Bickel et al. (1993), the estimate ˆβ n is semiparametrically efficient. Proof of Theorem 3. Let H n (t) be a step function with a jump size h n (s i ) = H (s i ) max sj <s i H (s j ) at each failure time s i, and F n (z) be a step function with a jump size f n (Z i ) = F (Z i ) max Zj1 Z i,...,z jd Z i F (Z j1 (1),..., Z jd (d)) at each observed covariate Z i, where Z k (m) is the value on the mth dimension of the covariate Z k, and the order Z k < Z i is taken as Z k (m) < Z i (m) for all 1 m d. Then H n (s i ) = H (s i ) and F n (Z i ) = F (Z i ) holds for every s i and Z i. Choose v 1 R d and bounded variation functions p 1 Q 1, q 1 Q 2 such that v 1 p1 dĥn q1 d ˆF n = I 1 n v p, q where I n, p, and q are as defined in Theorem 3 and p 1 dĥn = (p 1 (s 1 )ĥn(s 1 ),..., p 1 (s n1 )ĥn(s n1 )) T, q 1 d ˆF n = (q 1 (Z 1 ) ˆf n (Z 1 ),..., q 1 (Z n2 ) ˆf n (Z n2 )) T. Let and ĥ n h = (ĥn(s 1 ) h(s 1 ),..., ĥn(s n1 ) h(s n1 )) T ˆf n f = ( ˆf n (Z 1 ) f(z 1 ),..., ˆf n (Z n2 ) f(z n2 )) T, then we have nv T ( ˆβ n β ) + n 1 n p(s i )(ĥn(s i ) h(s i )) + n 2 n q(z i )( ˆf n (Z i ) f(z i )) i= i=

17 = n = n ˆβ n β ĥ n h ˆf n f ˆβ n β ĥ n h ˆf n f T T MAXIMUM LIKELIHOOD METHOD 1247 v p q I n v 1 p1 dĥn q1 d ˆF n = L ββ L βh L βf v np n L Hβ L HH L HF ( ˆβ n, Ĥn, ˆF 1 ˆβ n β n ) p1 dĥn, L F β L F H L F F q1 d ˆF Ĥ n H n n ˆF n F n = L ββ L βh L βf np L Hβ L HH L HF v 1 ˆβ n β (β, H, F ) p1 dh, Ĥ n H + o p (1) L F β L F H L F F q1 df ˆF n F = { [ ] n(p n P) v T 1 L β (β, H, F ) + L H (β, H, F ) p 1 dh [ ]} +L F (β, H, F ) q 1 df + o p (1), where the last equality is from the proof of Theorem 2. Note that = { [ ] n(p n P) v T 1 L β (β, H, F ) + L H (β, H, F ) p 1 dh [ ]} +L F (β, H, F ) q 1 df = { n(p n P) v T 1 L β (β, H, F ) + L H (β, H, F )[ ] p 1 dĥ +L F (β, H, F )[ q 1 d ˆF ]} + o p (1), and the variance of the right side is consistently estimated by L ββ L βh L βf v 1 v 1 P L Hβ L HH L HF (β, H, F ) p1 dĥn, p1 dĥn. L F β L F H L F F q1 d ˆF n q1 d ˆF n This can be further estimated by L ββ L βh L βf v 1 v P n L Hβ L HH L HF ( ˆβ n, Ĥn, ˆF 1 n ) p1 dĥn, p1 dĥn L F β L F H L F F q1 d ˆF n q1 d ˆF n

18 1248 YUAN YAO v 1 = p1 dĥn q1 d ˆF n T I n v 1 p1 dĥn q1 d ˆF n = (v T, p T, q T )In 1 I n In 1 (v T, p T, q T ) T = (v T, p T, q T )In 1 (v T, p T, q T ) T. The proof of Theorem 3 is complete. References Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. John Hopkins University Press, Baltimore. Breslow, N. E. (1996). Statistics in epidemiology: The case-control studies. J. Amer. Statist. Assoc. 91, Breslow, N. E. and Day, N. E. (198). Statistical methods in cancer research. Vol.I, The Design and Analysis of Case-Control Studies. International Agency for Research on Cancer, Lyon. Chen, K. (21). Generalized case-cohort sampling. J. Roy. Statist. Soc. Ser. B 63, Chen, K., Jin, Z. and Ying, Z. (22). Semiparametric analysis of transformation models with censored data. Biometrika 89, Chen, K., Sun, L. and Tong, X. (212). Analysis of cohort survival data with transformation model. Statist. Sinica 22, Cheng, S. C., Wei, L. J. and Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika 82, Dabrowska, D. M. and Doksum, K. A. (1988). Estimation and testing in the two-sample generalized odds rate model. J. Amer. Statist. Assoc. 83, Edwards, J. R. and Wayment, S. G. (1971). Representations for transformations continuous in the BV norm. Trans. Amer. Math. Soc. 154, Goldstein, L. and Langholz, B. (1992). Asymptotic theory for nested case-control sampling in the Cox regression model. Ann. Statist. 2, Jin, Z., Lin, D. Y. and Ying, Z. (26). On least-squares regression with censored data. Biometrika 93, Langholz, B. and Goldstein, L. (1996). Risk set sampling in epidemiologic cohort studies. Statist. Sci. 11, Miller, R. G. and Halpern, J. (1982). Regression with censored data. Biometrika 69, Samuelsen, S. (1997). A pseudo-likelihood approach to analysis of nested case-control data. Biometrika 84, Thomas, D. C. (1977). Appendum to Methods of cohort analysis: appraisal by application to asbestos mining, by Liddell, F. D. K., McDonald, J. C., and Thomas, D. C. J. Roy. Statist. Soc. Ser. A 14, Van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge, U.K. Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong. yaoyuan@hkbu.edu.hk (Received August 211; accepted August 214)

END-POINT SAMPLING. Yuan Yao, Wen Yu and Kani Chen. Hong Kong Baptist University, Fudan University and Hong Kong University of Science and Technology

END-POINT SAMPLING. Yuan Yao, Wen Yu and Kani Chen. Hong Kong Baptist University, Fudan University and Hong Kong University of Science and Technology Statistica Sinica 27 (2017), 000-000 415-435 doi:http://dx.doi.org/10.5705/ss.202015.0294 END-POINT SAMPLING Yuan Yao, Wen Yu and Kani Chen Hong Kong Baptist University, Fudan University and Hong Kong

More information

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows

More information

On the Breslow estimator

On the Breslow estimator Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Analysis of transformation models with censored data

Analysis of transformation models with censored data Biometrika (1995), 82,4, pp. 835-45 Printed in Great Britain Analysis of transformation models with censored data BY S. C. CHENG Department of Biomathematics, M. D. Anderson Cancer Center, University of

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where STAT 331 Accelerated Failure Time Models Previously, we have focused on multiplicative intensity models, where h t z) = h 0 t) g z). These can also be expressed as H t z) = H 0 t) g z) or S t z) = e Ht

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

NIH Public Access Author Manuscript J Am Stat Assoc. Author manuscript; available in PMC 2015 January 01.

NIH Public Access Author Manuscript J Am Stat Assoc. Author manuscript; available in PMC 2015 January 01. NIH Public Access Author Manuscript Published in final edited form as: J Am Stat Assoc. 2014 January 1; 109(505): 371 383. doi:10.1080/01621459.2013.842172. Efficient Estimation of Semiparametric Transformation

More information

Linear life expectancy regression with censored data

Linear life expectancy regression with censored data Linear life expectancy regression with censored data By Y. Q. CHEN Program in Biostatistics, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, U.S.A.

More information

Published online: 10 Apr 2012.

Published online: 10 Apr 2012. This article was downloaded by: Columbia University] On: 23 March 215, At: 12:7 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH Jian-Jian Ren 1 and Mai Zhou 2 University of Central Florida and University of Kentucky Abstract: For the regression parameter

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

Empirical Processes & Survival Analysis. The Functional Delta Method

Empirical Processes & Survival Analysis. The Functional Delta Method STAT/BMI 741 University of Wisconsin-Madison Empirical Processes & Survival Analysis Lecture 3 The Functional Delta Method Lu Mao lmao@biostat.wisc.edu 3-1 Objectives By the end of this lecture, you will

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

On Estimation of Partially Linear Transformation. Models

On Estimation of Partially Linear Transformation. Models On Estimation of Partially Linear Transformation Models Wenbin Lu and Hao Helen Zhang Authors Footnote: Wenbin Lu is Associate Professor (E-mail: wlu4@stat.ncsu.edu) and Hao Helen Zhang is Associate Professor

More information

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling Estimation and Inference of Quantile Regression for Survival Data under Biased Sampling Supplementary Materials: Proofs of the Main Results S1 Verification of the weight function v i (t) for the lengthbiased

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Likelihood ratio confidence bands in nonparametric regression with censored data

Likelihood ratio confidence bands in nonparametric regression with censored data Likelihood ratio confidence bands in nonparametric regression with censored data Gang Li University of California at Los Angeles Department of Biostatistics Ingrid Van Keilegom Eindhoven University of

More information

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia

More information

asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data

asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data Xingqiu Zhao and Ying Zhang The Hong Kong Polytechnic University and Indiana University Abstract:

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

STAT Sample Problem: General Asymptotic Results

STAT Sample Problem: General Asymptotic Results STAT331 1-Sample Problem: General Asymptotic Results In this unit we will consider the 1-sample problem and prove the consistency and asymptotic normality of the Nelson-Aalen estimator of the cumulative

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

A note on L convergence of Neumann series approximation in missing data problems

A note on L convergence of Neumann series approximation in missing data problems A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West

More information

Product-limit estimators of the survival function with left or right censored data

Product-limit estimators of the survival function with left or right censored data Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. *

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. * Least Absolute Deviations Estimation for the Accelerated Failure Time Model Jian Huang 1,2, Shuangge Ma 3, and Huiliang Xie 1 1 Department of Statistics and Actuarial Science, and 2 Program in Public Health

More information

Asymptotic Properties of Kaplan-Meier Estimator. for Censored Dependent Data. Zongwu Cai. Department of Mathematics

Asymptotic Properties of Kaplan-Meier Estimator. for Censored Dependent Data. Zongwu Cai. Department of Mathematics To appear in Statist. Probab. Letters, 997 Asymptotic Properties of Kaplan-Meier Estimator for Censored Dependent Data by Zongwu Cai Department of Mathematics Southwest Missouri State University Springeld,

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

Full likelihood inferences in the Cox model: an empirical likelihood approach

Full likelihood inferences in the Cox model: an empirical likelihood approach Ann Inst Stat Math 2011) 63:1005 1018 DOI 10.1007/s10463-010-0272-y Full likelihood inferences in the Cox model: an empirical likelihood approach Jian-Jian Ren Mai Zhou Received: 22 September 2008 / Revised:

More information

LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE ACCELERATED FAILURE TIME MODEL

LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE ACCELERATED FAILURE TIME MODEL Statistica Sinica 17(2007), 1533-1548 LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE ACCELERATED FAILURE TIME MODEL Jian Huang 1, Shuangge Ma 2 and Huiliang Xie 1 1 University of Iowa and 2 Yale University

More information

Estimation of Time Transformation Models with. Bernstein Polynomials

Estimation of Time Transformation Models with. Bernstein Polynomials Estimation of Time Transformation Models with Bernstein Polynomials Alexander C. McLain and Sujit K. Ghosh June 12, 2009 Institute of Statistics Mimeo Series #2625 Abstract Time transformation models assume

More information

Asymptotic Properties of Nonparametric Estimation Based on Partly Interval-Censored Data

Asymptotic Properties of Nonparametric Estimation Based on Partly Interval-Censored Data Asymptotic Properties of Nonparametric Estimation Based on Partly Interval-Censored Data Jian Huang Department of Statistics and Actuarial Science University of Iowa, Iowa City, IA 52242 Email: jian@stat.uiowa.edu

More information

Efficient Estimation of Censored Linear Regression Model

Efficient Estimation of Censored Linear Regression Model 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 2 2 22 23 24 25 26 27 28 29 3 3 32 33 34 35 36 37 38 39 4 4 42 43 44 45 46 47 48 Biometrika (2), xx, x, pp. 4 C 28 Biometrika Trust Printed in Great Britain Efficient Estimation

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

LSS: An S-Plus/R program for the accelerated failure time model to right censored data based on least-squares principle

LSS: An S-Plus/R program for the accelerated failure time model to right censored data based on least-squares principle computer methods and programs in biomedicine 86 (2007) 45 50 journal homepage: www.intl.elsevierhealth.com/journals/cmpb LSS: An S-Plus/R program for the accelerated failure time model to right censored

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

Semiparametric analysis of transformation models with censored data

Semiparametric analysis of transformation models with censored data Biometrika (22), 89, 3, pp. 659 668 22 Biometrika Trust Printed in Great Britain Semiparametric analysis of transformation models with censored data BY KANI CHEN Department of Mathematics, Hong Kong University

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

The Accelerated Failure Time Model Under Biased. Sampling

The Accelerated Failure Time Model Under Biased. Sampling The Accelerated Failure Time Model Under Biased Sampling Micha Mandel and Ya akov Ritov Department of Statistics, The Hebrew University of Jerusalem, Israel July 13, 2009 Abstract Chen (2009, Biometrics)

More information

Cox Regression in Nested Case Control Studies with Auxiliary Covariates

Cox Regression in Nested Case Control Studies with Auxiliary Covariates Biometrics DOI: 1.1111/j.1541-42.29.1277.x Cox Regression in Nested Case Control Studies with Auxiliary Covariates Mengling Liu, 1, Wenbin Lu, 2 and Chi-hong Tseng 3 1 Division of Biostatistics, School

More information

Censored quantile regression with varying coefficients

Censored quantile regression with varying coefficients Title Censored quantile regression with varying coefficients Author(s) Yin, G; Zeng, D; Li, H Citation Statistica Sinica, 2013, p. 1-24 Issued Date 2013 URL http://hdl.handle.net/10722/189454 Rights Statistica

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Regression Calibration in Semiparametric Accelerated Failure Time Models

Regression Calibration in Semiparametric Accelerated Failure Time Models Biometrics 66, 405 414 June 2010 DOI: 10.1111/j.1541-0420.2009.01295.x Regression Calibration in Semiparametric Accelerated Failure Time Models Menggang Yu 1, and Bin Nan 2 1 Department of Medicine, Division

More information

Maximum likelihood estimation for Cox s regression model under nested case-control sampling

Maximum likelihood estimation for Cox s regression model under nested case-control sampling Biostatistics (2004), 5, 2,pp. 193 206 Printed in Great Britain Maximum likelihood estimation for Cox s regression model under nested case-control sampling THOMAS H. SCHEIKE Department of Biostatistics,

More information

AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

More information

Exercises. (a) Prove that m(t) =

Exercises. (a) Prove that m(t) = Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for

More information

ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL

ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL Statistica Sinica 19 (2009), 997-1011 ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL Anthony Gamst, Michael Donohue and Ronghui Xu University

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Lecture 22 Survival Analysis: An Introduction

Lecture 22 Survival Analysis: An Introduction University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 22 Survival Analysis: An Introduction There is considerable interest among economists in models of durations, which

More information

Attributable Risk Function in the Proportional Hazards Model

Attributable Risk Function in the Proportional Hazards Model UW Biostatistics Working Paper Series 5-31-2005 Attributable Risk Function in the Proportional Hazards Model Ying Qing Chen Fred Hutchinson Cancer Research Center, yqchen@u.washington.edu Chengcheng Hu

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

1 Glivenko-Cantelli type theorems

1 Glivenko-Cantelli type theorems STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then

More information

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin FITTING COX'S PROPORTIONAL HAZARDS MODEL USING GROUPED SURVIVAL DATA Ian W. McKeague and Mei-Jie Zhang Florida State University and Medical College of Wisconsin Cox's proportional hazard model is often

More information

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities Hutson Journal of Statistical Distributions and Applications (26 3:9 DOI.86/s4488-6-47-y RESEARCH Open Access Nonparametric rank based estimation of bivariate densities given censored data conditional

More information

Likelihood Construction, Inference for Parametric Survival Distributions

Likelihood Construction, Inference for Parametric Survival Distributions Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make

More information

Lecture 2: Martingale theory for univariate survival analysis

Lecture 2: Martingale theory for univariate survival analysis Lecture 2: Martingale theory for univariate survival analysis In this lecture T is assumed to be a continuous failure time. A core question in this lecture is how to develop asymptotic properties when

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints and Its Application to Empirical Likelihood

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints and Its Application to Empirical Likelihood Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints and Its Application to Empirical Likelihood Mai Zhou Yifan Yang Received:

More information

CURE MODEL WITH CURRENT STATUS DATA

CURE MODEL WITH CURRENT STATUS DATA Statistica Sinica 19 (2009), 233-249 CURE MODEL WITH CURRENT STATUS DATA Shuangge Ma Yale University Abstract: Current status data arise when only random censoring time and event status at censoring are

More information

Estimation for two-phase designs: semiparametric models and Z theorems

Estimation for two-phase designs: semiparametric models and Z theorems Estimation for two-phase designs:semiparametric models and Z theorems p. 1/27 Estimation for two-phase designs: semiparametric models and Z theorems Jon A. Wellner University of Washington Estimation for

More information

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics.

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics. Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics. Dragi Anevski Mathematical Sciences und University November 25, 21 1 Asymptotic distributions for statistical

More information

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Statistica Sinica 23 (2013), 383-408 doi:http://dx.doi.org/10.5705/ss.2011.151 A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Chi-Chung Wen and Yi-Hau Chen

More information

Statistical Inference and Methods

Statistical Inference and Methods Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data Session 6: Filtering and

More information

A semiparametric generalized proportional hazards model for right-censored data

A semiparametric generalized proportional hazards model for right-censored data Ann Inst Stat Math (216) 68:353 384 DOI 1.17/s1463-14-496-3 A semiparametric generalized proportional hazards model for right-censored data M. L. Avendaño M. C. Pardo Received: 28 March 213 / Revised:

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

More information

Empirical Likelihood in Survival Analysis

Empirical Likelihood in Survival Analysis Empirical Likelihood in Survival Analysis Gang Li 1, Runze Li 2, and Mai Zhou 3 1 Department of Biostatistics, University of California, Los Angeles, CA 90095 vli@ucla.edu 2 Department of Statistics, The

More information

M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010

M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010 M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010 Z-theorems: Notation and Context Suppose that Θ R k, and that Ψ n : Θ R k, random maps Ψ : Θ R k, deterministic

More information

1 Introduction. 2 Residuals in PH model

1 Introduction. 2 Residuals in PH model Supplementary Material for Diagnostic Plotting Methods for Proportional Hazards Models With Time-dependent Covariates or Time-varying Regression Coefficients BY QIQING YU, JUNYI DONG Department of Mathematical

More information

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models Draft: February 17, 1998 1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models P.J. Bickel 1 and Y. Ritov 2 1.1 Introduction Le Cam and Yang (1988) addressed broadly the following

More information

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Statistica Sinica 20 (2010), 441-453 GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Antai Wang Georgetown University Medical Center Abstract: In this paper, we propose two tests for parametric models

More information

Meei Pyng Ng 1 and Ray Watson 1

Meei Pyng Ng 1 and Ray Watson 1 Aust N Z J Stat 444), 2002, 467 478 DEALING WITH TIES IN FAILURE TIME DATA Meei Pyng Ng 1 and Ray Watson 1 University of Melbourne Summary In dealing with ties in failure time data the mechanism by which

More information

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data 1 Part III. Hypothesis Testing III.1. Log-rank Test for Right-censored Failure Time Data Consider a survival study consisting of n independent subjects from p different populations with survival functions

More information

arxiv:submit/ [math.st] 6 May 2011

arxiv:submit/ [math.st] 6 May 2011 A Continuous Mapping Theorem for the Smallest Argmax Functional arxiv:submit/0243372 [math.st] 6 May 2011 Emilio Seijo and Bodhisattva Sen Columbia University Abstract This paper introduces a version of

More information

Statistical Modeling and Analysis for Survival Data with a Cure Fraction

Statistical Modeling and Analysis for Survival Data with a Cure Fraction Statistical Modeling and Analysis for Survival Data with a Cure Fraction by Jianfeng Xu A thesis submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree

More information

NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA

NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Statistica Sinica 213): Supplement NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Hokeun Sun 1, Wei Lin 2, Rui Feng 2 and Hongzhe Li 2 1 Columbia University and 2 University

More information

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University

More information

Large sample theory for merged data from multiple sources

Large sample theory for merged data from multiple sources Large sample theory for merged data from multiple sources Takumi Saegusa University of Maryland Division of Statistics August 22 2018 Section 1 Introduction Problem: Data Integration Massive data are collected

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

More information

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

More information

Survival Regression Models

Survival Regression Models Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU

More information

Competing risks data analysis under the accelerated failure time model with missing cause of failure

Competing risks data analysis under the accelerated failure time model with missing cause of failure Ann Inst Stat Math 2016 68:855 876 DOI 10.1007/s10463-015-0516-y Competing risks data analysis under the accelerated failure time model with missing cause of failure Ming Zheng Renxin Lin Wen Yu Received:

More information

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Survival Analysis. Lu Tian and Richard Olshen Stanford University 1 Survival Analysis Lu Tian and Richard Olshen Stanford University 2 Survival Time/ Failure Time/Event Time We will introduce various statistical methods for analyzing survival outcomes What is the survival

More information

CENSORED QUANTILE REGRESSION WITH COVARIATE MEASUREMENT ERRORS

CENSORED QUANTILE REGRESSION WITH COVARIATE MEASUREMENT ERRORS Statistica Sinica 21 211, 949-971 CENSORED QUANTILE REGRESSION WITH COVARIATE MEASUREMENT ERRORS Yanyuan Ma and Guosheng Yin Texas A&M University and The University of Hong Kong Abstract: We study censored

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

Rank Regression Analysis of Multivariate Failure Time Data Based on Marginal Linear Models

Rank Regression Analysis of Multivariate Failure Time Data Based on Marginal Linear Models doi: 10.1111/j.1467-9469.2005.00487.x Published by Blacwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA Vol 33: 1 23, 2006 Ran Regression Analysis

More information

Goodness-of-Fit Tests With Right-Censored Data by Edsel A. Pe~na Department of Statistics University of South Carolina Colloquium Talk August 31, 2 Research supported by an NIH Grant 1 1. Practical Problem

More information

STAT331. Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model

STAT331. Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model STAT331 Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model Because of Theorem 2.5.1 in Fleming and Harrington, see Unit 11: For counting process martingales with

More information

VARIABLE SELECTION AND INDEPENDENT COMPONENT

VARIABLE SELECTION AND INDEPENDENT COMPONENT VARIABLE SELECTION AND INDEPENDENT COMPONENT ANALYSIS, PLUS TWO ADVERTS Richard Samworth University of Cambridge Joint work with Rajen Shah and Ming Yuan My core research interests A broad range of methodological

More information

Goodness-of-fit tests for the cure rate in a mixture cure model

Goodness-of-fit tests for the cure rate in a mixture cure model Biometrika (217), 13, 1, pp. 1 7 Printed in Great Britain Advance Access publication on 31 July 216 Goodness-of-fit tests for the cure rate in a mixture cure model BY U.U. MÜLLER Department of Statistics,

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information