Rank Regression Analysis of Multivariate Failure Time Data Based on Marginal Linear Models


Scand J Statist, Vol 33: 1–23, 2006. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

Z. JIN, Department of Biostatistics, Columbia University
D. Y. LIN, Department of Biostatistics, University of North Carolina
Z. YING, Department of Statistics, Columbia University

ABSTRACT. Multivariate failure time data arise when each study subject can potentially experience several types of failures or recurrences of a certain phenomenon, or when failure times are sampled in clusters. We formulate the marginal distributions of such multivariate data with semiparametric accelerated failure time models (i.e. linear regression models for log-transformed failure times with arbitrary error distributions) while leaving the dependence structures for related failure times completely unspecified. We develop rank-based monotone estimating functions for the regression parameters of these marginal models based on right-censored observations. The estimating equations can be easily solved via linear programming. The resultant estimators are consistent and asymptotically normal. The limiting covariance matrices can be readily estimated by a novel resampling approach, which does not involve non-parametric density estimation or evaluation of numerical derivatives. The proposed estimators represent consistent roots to the potentially non-monotone estimating equations based on weighted log-rank statistics. Simulation studies show that the new inference procedures perform well in small samples. Illustrations with real medical data are provided.

Key words: accelerated failure time model, censoring, correlated data, linear programming, survival data, weighted log-rank statistics

1. Introduction

Multivariate failure time data are commonly encountered in scientific investigations because each study subject can potentially experience several events or because there exists natural or artificial clustering of study units such that the failure times within the same cluster are correlated. We refer to the former situation as multiple events data and the latter as clustered failure time data. An important special form of multiple events data is recurrent events data, which represent the repetitions of the same phenomenon. Statistical analysis of multivariate failure time data is complicated by right censoring as well as by the dependence of related failure times. Lin (1994) provided a review of Cox-type regression models for such data.

An important alternative to the Cox proportional hazards model is the accelerated failure time model (Kalbfleisch & Prentice, 2002, p. 44), which linearly regresses the logarithm of failure time on covariates. Rank estimation of the accelerated failure time model has been studied by Prentice (1978), Tsiatis (1990), Wei et al. (1990) and Lai & Ying (1991) among others for univariate failure time data, and by Lin & Wei (1992), Lee et al. (1993) and Lin et al. (1998) for multivariate failure time data. The rank estimators are derived from a class of weighted log-rank statistics. It is difficult to calculate the rank estimators because the estimating functions are step functions with multiple roots, some of which are inconsistent; identification of a consistent root can be very challenging in practice. A further difficulty lies in the variance-covariance estimation: the limiting covariance matrices of the rank estimators involve the unknown hazard function of the error term and are thus not amenable to numerical evaluation.

For univariate failure time data, some efforts have been made to alleviate the aforementioned difficulties. In particular, Jin et al. (2003) proposed a class of monotone estimating functions which approximates the weighted log-rank estimating functions around the true values of the regression parameters. The corresponding estimators are consistent and asymptotically normal with covariance matrices that can be readily estimated by a simple resampling technique. Both the parameter estimation and the variance-covariance estimation can be performed via linear programming.

In this paper, we extend the work of Jin et al. (2003) to marginal accelerated failure time models for multivariate failure time data. We construct rank-based monotone estimating functions for three types of accelerated failure time models dealing with multiple events, recurrent events and clustered data. The resultant estimators are proven to be consistent and asymptotically normal. Furthermore, we develop a novel resampling approach which properly adjusts for the dependence of related failure times in the variance-covariance estimation. The proposed methods, like those of Jin et al. (2003), can be implemented efficiently through linear programming. Because of the intraclass correlation, the resampling scheme employed here is different from that of Jin et al. (2003) and entails considerable new technical challenges.

The rest of this paper is organized as follows.
In sections 2–4, we present the models and corresponding inference procedures for multiple events data, recurrent events data and clustered failure time data respectively. In section 5, we report the results of our simulation studies. In section 6, we apply the proposed methods to two medical studies. Some concluding remarks are given in section 7. All the proofs are relegated to the appendix.

2. Multiple events data

2.1. Preliminaries

Consider a random sample of $n$ subjects, each of whom can potentially experience $K$ types of events or failures. For $i = 1, \ldots, n$ and $k = 1, \ldots, K$, let $T_{ki}$ be the time to the $k$th failure of the $i$th subject; let $C_{ki}$ be the corresponding censoring time, and $X_{ki}$ be the corresponding $p \times 1$ vector of covariates. We assume that $(T_{1i}, \ldots, T_{Ki})$ and $(C_{1i}, \ldots, C_{Ki})$ are independent conditional on $(X_{1i}, \ldots, X_{Ki})$. The data consist of $(\tilde T_{ki}, \Delta_{ki}, X_{ki})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$, where $\tilde T_{ki} = T_{ki} \wedge C_{ki}$ and $\Delta_{ki} = I(T_{ki} \le C_{ki})$. Here and in the sequel, $a \wedge b = \min(a, b)$ and $I(\cdot)$ is the indicator function.

We formulate the marginal distributions of the $K$ types of events with accelerated failure time models while leaving the dependence structures unspecified, i.e.

$$\log T_{ki} = \beta_k' X_{ki} + \epsilon_{ki}, \quad i = 1, \ldots, n;\ k = 1, \ldots, K,$$

where $\beta_k \equiv (\beta_{1k}, \ldots, \beta_{pk})'$ is a $p \times 1$ vector of unknown regression parameters, and $(\epsilon_{1i}, \ldots, \epsilon_{Ki})$ $(i = 1, \ldots, n)$ are independent random vectors that are independent of the $X_{ki}$ with a common, but completely unspecified, joint distribution.

Let $e_{ki}(\beta) = \log \tilde T_{ki} - \beta' X_{ki}$, $N_{ki}(\beta; t) = \Delta_{ki} I\{e_{ki}(\beta) \le t\}$ and

$$S_k^{(r)}(\beta; t) = n^{-1} \sum_{i=1}^n I\{e_{ki}(\beta) \ge t\} X_{ki}^r \quad (r = 0, 1).$$

The weighted log-rank estimating function for $\beta_k$ is given by

$$U_{k,\phi_k}(\beta) = \sum_{i=1}^n \Delta_{ki}\, \phi_k(\beta; e_{ki}(\beta)) \{X_{ki} - \bar X_k(\beta; e_{ki}(\beta))\},$$

or

$$U_{k,\phi_k}(\beta) = \sum_{i=1}^n \int_{-\infty}^{\infty} \phi_k(\beta; t) \{X_{ki} - \bar X_k(\beta; t)\}\, dN_{ki}(\beta; t),$$

where $\bar X_k(\beta; t) = S_k^{(1)}(\beta; t)/S_k^{(0)}(\beta; t)$ and $\phi_k$ is a weight function which satisfies condition 5 of Ying (1993, p. 90). The resultant estimator is denoted by $\hat\beta_{k,\phi_k}$. Note that the choices of 1, $S_k^{(0)}(\beta; t)$ and the Kaplan–Meier estimator based on $\{e_{ki}(\beta), \Delta_{ki}\}$ $(i = 1, \ldots, n)$ as $\phi_k(\beta; t)$ correspond to the log-rank, Gehan–Wilcoxon and Prentice–Wilcoxon statistics respectively.

Let

$$M_{ki}(\beta; t) = N_{ki}(\beta; t) - \int_{-\infty}^{t} I\{e_{ki}(\beta) \ge u\} \lambda_k(u)\, du, \quad (1)$$

where $\lambda_k(\cdot)$ is the common hazard function of $\epsilon_{ki}$ $(i = 1, \ldots, n)$. Write $s_k^{(r)}(\beta; t) = \lim_{n\to\infty} S_k^{(r)}(\beta; t)$ $(r = 0, 1)$, $\bar x_k(t) = s_k^{(1)}(\beta_k; t)/s_k^{(0)}(\beta_k; t)$ and $\phi_{k0}(t) = \lim_{n\to\infty} \phi_k(\beta_k; t)$. Define

$$A_{k,\phi_k} = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n \int_{-\infty}^{\infty} \phi_{k0}(t) \{X_{ki} - \bar x_k(t)\}^{\otimes 2} \left\{\frac{d \log \lambda_k(t)}{dt}\right\} dN_{ki}(\beta_k; t)$$

and

$$V_{kl,\phi_k\phi_l} = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n u_{ki,\phi_k} u_{li,\phi_l}',$$

where $a^{\otimes 2} = aa'$ and

$$u_{ki,\phi_k} = \int_{-\infty}^{\infty} \phi_{k0}(t) \{X_{ki} - \bar x_k(t)\}\, dM_{ki}(\beta_k; t). \quad (2)$$

Write $B = (\beta_1', \ldots, \beta_K')'$ and $\hat B = (\hat\beta_{1,\phi_1}', \ldots, \hat\beta_{K,\phi_K}')'$. The random vector $n^{1/2}(\hat B - B)$ is asymptotically zero-mean normal with covariance matrix $\{A_{k,\phi_k}^{-1} V_{kl,\phi_k\phi_l} A_{l,\phi_l}^{-1};\ k, l = 1, \ldots, K\}$.
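As an illustration of the definitions above, the following minimal sketch evaluates the weighted log-rank estimating function for a single event type with a scalar covariate, under the log-rank weight ($\phi = 1$) and the Gehan weight ($\phi = S^{(0)}$). The function names and the toy data are hypothetical and are not part of the paper; the sketch also checks numerically that the Gehan-weighted form agrees with its pairwise representation.

```python
def residuals(beta, log_t, x):
    """e_i(beta) = log(observed time) - beta * x_i for a scalar covariate."""
    return [lt - beta * xi for lt, xi in zip(log_t, x)]


def risk_sums(e, x, t):
    """Return S^(0)(beta; t) and S^(1)(beta; t) for residuals e at threshold t."""
    n = len(e)
    s0 = sum(1.0 for ej in e if ej >= t) / n
    s1 = sum(xj for ej, xj in zip(e, x) if ej >= t) / n
    return s0, s1


def weighted_logrank_score(beta, log_t, delta, x, weight="logrank"):
    """U_phi(beta) = sum_i Delta_i * phi(e_i) * {x_i - S1(e_i) / S0(e_i)}."""
    e = residuals(beta, log_t, x)
    u = 0.0
    for ei, di, xi in zip(e, delta, x):
        if not di:
            continue  # censored observations contribute only through the risk sets
        s0, s1 = risk_sums(e, x, ei)  # s0 > 0: subject i is at risk at its own residual
        phi = 1.0 if weight == "logrank" else s0  # "gehan": phi = S^(0)
        u += phi * (xi - s1 / s0)
    return u


def gehan_pairwise_score(beta, log_t, delta, x):
    """n^{-1} sum_i sum_j Delta_i (x_i - x_j) I{e_i <= e_j}."""
    e = residuals(beta, log_t, x)
    n = len(e)
    return sum(di * (xi - xj)
               for ei, di, xi in zip(e, delta, x)
               for ej, xj in zip(e, x) if ei <= ej) / n


# Hypothetical toy data: log observed times, censoring indicators, covariate.
log_t = [0.1, 0.7, 1.2, 0.4, 2.0]
delta = [1, 0, 1, 1, 0]
x = [0.0, 1.0, 1.0, 0.0, 1.0]
u_gehan = weighted_logrank_score(0.0, log_t, delta, x, weight="gehan")
u_pairs = gehan_pairwise_score(0.0, log_t, delta, x)
```

Both evaluations are step functions of `beta`, which is why the paper replaces root finding by the minimization of a convex loss.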

2.2. Gehan weight function

As mentioned earlier, the choice of $\phi_k(\beta; t) = S_k^{(0)}(\beta; t)$ corresponds to the Gehan (1965) weight function. In this case, $U_{k,\phi_k}(\beta)$ can be expressed as

$$U_{k,G}(\beta) = n^{-1} \sum_{i=1}^n \sum_{j=1}^n \Delta_{ki} (X_{ki} - X_{kj}) I\{e_{ki}(\beta) \le e_{kj}(\beta)\},$$

which is the gradient of the convex function

$$L_{k,G}(\beta) \equiv n^{-1} \sum_{i=1}^n \sum_{j=1}^n \Delta_{ki} \{e_{ki}(\beta) - e_{kj}(\beta)\}^-,$$

where $a^- = I(a < 0)|a|$. Let $\hat\beta_{k,G}$ be a minimizer of $L_{k,G}(\beta)$. The minimization of $L_{k,G}(\beta)$ can be implemented by linear programming, and is equivalent to the minimization of

$$\sum_{i=1}^n \sum_{j=1}^n \Delta_{ki} |e_{ki}(\beta) - e_{kj}(\beta)| + \Big| Q - \beta' \sum_{i=1}^n \sum_{j=1}^n \Delta_{ki} (X_{kj} - X_{ki}) \Big|,$$

where $Q$ is any number which is greater than $\beta' \sum_{i=1}^n \sum_{j=1}^n \Delta_{ki}(X_{kj} - X_{ki})$. This minimization can be implemented via an $L_1$ minimization algorithm.

We shall approximate the joint distribution of the $\hat\beta_{k,G}$'s by a resampling procedure. Let

$$L^*_{k,G}(\beta) = n^{-1} \sum_{i=1}^n \sum_{j=1}^n \Delta_{ki} \{e_{ki}(\beta) - e_{kj}(\beta)\}^- Z_i Z_j, \quad k = 1, \ldots, K,$$

where $(Z_1, \ldots, Z_n)$ are independent positive random variables with $E(Z_i) = \mathrm{var}(Z_i) = 1$. It is important to note that the same set of $Z_i$ $(i = 1, \ldots, n)$ is used in all the $K$ functions $L^*_{k,G}(\beta)$ $(k = 1, \ldots, K)$. Let $\beta^*_{k,G}$ be a minimizer of $L^*_{k,G}(\beta)$ or a root of

$$U^*_{k,G}(\beta) \equiv n^{-1} \sum_{i=1}^n \sum_{j=1}^n \Delta_{ki} (X_{ki} - X_{kj}) I\{e_{ki}(\beta) \le e_{kj}(\beta)\} Z_i Z_j. \quad (3)$$

Again, $\beta^*_{k,G}$ is obtained via linear programming. Write $\hat B_G = (\hat\beta_{1,G}', \ldots, \hat\beta_{K,G}')'$ and $B^*_G = (\beta^{*\prime}_{1,G}, \ldots, \beta^{*\prime}_{K,G})'$. We state below and prove in the appendix that the conditional distribution of $n^{1/2}(B^*_G - \hat B_G)$ given the data $(\tilde T_{ki}, \Delta_{ki}, X_{ki})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$ can be used to approximate the distribution of $n^{1/2}(\hat B_G - B)$.

Conditional on the data $(\tilde T_{ki}, \Delta_{ki}, X_{ki})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$, the only random elements in $L^*_{k,G}(\beta)$ $(k = 1, \ldots, K)$ are the $Z_i$'s. To approximate the distribution of $\hat B_G$, we obtain a large number of realizations of $B^*_G$ by repeatedly generating the random sample $(Z_1, \ldots, Z_n)$ while holding the data $(\tilde T_{ki}, \Delta_{ki}, X_{ki})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$ at their observed values. The covariance matrix of $\hat B_G$ can then be approximated by the empirical covariance matrix of $B^*_G$.
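The Gehan fit and the resampling step can be sketched as follows for a scalar covariate. This is an illustration under stated assumptions rather than the authors' implementation: instead of linear programming, it uses the fact that the (perturbed) Gehan loss is convex and piecewise linear in a scalar $\beta$, so a minimizer can be found among the kinks $(y_i - y_j)/(x_i - x_j)$ with $\Delta_i = 1$; the search costs $O(n^4)$ and is meant for small examples only. The function names are hypothetical, `y` stands for $\log \tilde T$, and the $Z_i$ are drawn as standard exponential variables, which have mean and variance 1.

```python
import random


def gehan_loss(beta, y, delta, x, z=None):
    """n^{-1} sum_i sum_j Delta_i {e_i - e_j}^- Z_i Z_j, where a^- = max(-a, 0)."""
    n = len(y)
    z = [1.0] * n if z is None else z  # z = None gives the unperturbed loss
    e = [yi - beta * xi for yi, xi in zip(y, x)]
    total = 0.0
    for i in range(n):
        if delta[i]:
            for j in range(n):
                total += z[i] * z[j] * max(e[j] - e[i], 0.0)
    return total / n


def gehan_fit(y, delta, x, z=None):
    """Minimize the convex piecewise-linear loss over its kinks (scalar beta only)."""
    n = len(y)
    kinks = sorted({(y[i] - y[j]) / (x[i] - x[j])
                    for i in range(n) if delta[i]
                    for j in range(n) if x[i] != x[j]})
    return min(kinks, key=lambda b: gehan_loss(b, y, delta, x, z))


def perturbed_fits(datasets, n_rep=200, seed=0):
    """One tuple of K perturbed estimates per replicate.

    datasets is a list of K triples (y, delta, x), one per event type, all on
    the same n subjects; the SAME weights Z_1..Z_n are reused across the K types.
    """
    rng = random.Random(seed)
    n = len(datasets[0][0])
    reps = []
    for _ in range(n_rep):
        z = [rng.expovariate(1.0) for _ in range(n)]
        reps.append(tuple(gehan_fit(y, d, x, z) for y, d, x in datasets))
    return reps


def empirical_cov(reps):
    """Empirical K x K covariance matrix of the perturbed estimates."""
    m, k = len(reps), len(reps[0])
    mean = [sum(r[a] for r in reps) / m for a in range(k)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in reps) / (m - 1)
             for b in range(k)] for a in range(k)]
```

Sharing one draw of `z` across the $K$ event types is what lets the empirical covariance reflect the dependence between the estimators for different event types of the same subject.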
To make our statements precise, we impose the following regularity conditions:

Condition 1. For $k = 1, \ldots, K$ and $i = 1, \ldots, n$, the Euclidean norms $\|X_{ki}\|$ are bounded by a non-random constant.

Condition 2. Let $f_k(t)$ be the density function associated with $\lambda_k(t)$, $k = 1, \ldots, K$. Then $f_k(t)$ and $df_k(t)/dt$ are bounded and $\int (d \log f_k(s)/ds)^2 f_k(s)\, ds < \infty$.

Condition 3. The matrices $A_{k,G}$ $(k = 1, \ldots, K)$ are non-singular, where $A_{k,G}$ is $A_{k,\phi_k}$ evaluated at $\phi_{k0} = s_k^{(0)}$.

Remark 1. Conditions 1 and 2 correspond to conditions 1 and 2 of Ying (1993) that are required to ensure the asymptotic linearity of the (weighted) log-rank estimating function. As indicated by Ying (1993), condition 1 may be relaxed to $\max_{k, i \le n} \|X_{ki}\| = O(n^{\alpha})$ for any $\alpha > 0$. It can be shown that all the commonly used error distributions satisfy condition 2. Condition 3 holds if for each $k$, the vector of covariates does not lie in a lower dimensional hyperplane, which is a minimum requirement for the identifiability of the regression parameters.

Theorem 1. Under conditions 1–3, the estimator $\hat B_G$ is strongly consistent, and $n^{1/2}(\hat B_G - B)$ converges in distribution to a zero-mean multivariate normal random vector with covariance matrix $\{A_{k,G}^{-1} V_{kl,G} A_{l,G}^{-1};\ k, l = 1, \ldots, K\}$, where $V_{kl,G}$ is $V_{kl,\phi_k\phi_l}$ evaluated at $\phi_{k0} = s_k^{(0)}$ $(k = 1, \ldots, K)$. Furthermore, the conditional distribution of $n^{1/2}(B^*_G - \hat B_G)$ given the data $(\tilde T_{ki}, \Delta_{ki}, X_{ki})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$ converges almost surely to the same limiting distribution.

The resampling scheme proposed here is different from that of Jin et al. (2003) even if $K = 1$ in that each term in the summation of the perturbed function $L^*_{k,G}(\beta)$ is weighted by $Z_i Z_j$ rather than $Z_i$. This modification is required so as to properly account for the dependence of the multiple failure times within the same subject, and it creates significant technical challenges in the proofs. As shown in the proof of theorem 1, $n^{-1} U^*_{k,G}(\beta)$ has the same asymptotic slope as $n^{-1} U_{k,G}(\beta)$ for each $k$, and the conditional joint distribution of $n^{-1/2}\{U^*_{1,G}(\hat\beta_{1,G}), \ldots, U^*_{K,G}(\hat\beta_{K,G})\}$ given the data $(\tilde T_{ki}, \Delta_{ki}, X_{ki})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$ converges to a zero-mean multivariate normal distribution whose covariance matrix is the limiting covariance matrix of $n^{-1/2}\{U_{1,G}(\beta_1), \ldots, U_{K,G}(\beta_K)\}$. Thus, the conditional joint distribution of $n^{1/2}(B^*_G - \hat B_G)$ given the data is the same in the limit as the joint distribution of $n^{1/2}(\hat B_G - B)$.
If $Z_i$ instead of $Z_i Z_j$ were used in (3), then the conditional marginal distributions of $n^{-1/2}\{U^*_{1,G}(\hat\beta_{1,G}), \ldots, U^*_{K,G}(\hat\beta_{K,G})\}$ given the data would still be the same in the limit as the marginal distributions of $n^{-1/2}\{U_{1,G}(\beta_1), \ldots, U_{K,G}(\beta_K)\}$, but the two joint distributions, specifically the two covariance matrices, would be different.

2.3. General weight functions

In general, $U_{k,\phi_k}(\beta)$ is non-monotone. We consider the monotone modification of $U_{k,\phi_k}(\beta)$:

$$\tilde U_{k,\phi_k}(\beta; \hat\beta_k) = \sum_{i=1}^n \Delta_{ki}\, \psi_k(\hat\beta_k; e_{ki}(\hat\beta_k))\, S_k^{(0)}(\beta; e_{ki}(\beta)) \{X_{ki} - \bar X_k(\beta; e_{ki}(\beta))\},$$

where $\psi_k(\beta; t) = \phi_k(\beta; t)/S_k^{(0)}(\beta; t)$ and $\hat\beta_k$ is a preliminary consistent estimator of $\beta_k$. Note that $\tilde U_{k,\phi_k}(\beta; \hat\beta_k)$ is monotone componentwise and is the gradient of the convex function

$$L_{k,\phi_k}(\beta; \hat\beta_k) \equiv n^{-1} \sum_{i=1}^n \sum_{j=1}^n \psi_k(\hat\beta_k; e_{ki}(\hat\beta_k))\, \Delta_{ki} \{e_{ki}(\beta) - e_{kj}(\beta)\}^-,$$

which can again be minimized via linear programming. The minimization is carried out iteratively, i.e. $\hat\beta_{k(m)} = \arg\min_\beta L_{k,\phi_k}(\beta; \hat\beta_{k(m-1)})$ $(m \ge 1)$, where $\hat\beta_{k(0)} = \hat\beta_{k,G}$. If the iterative algorithm converges as $m \to \infty$, then the limit satisfies the original estimating equation $U_{k,\phi_k}(\beta) = 0$. For most commonly used weight functions, the algorithm converges stochastically in that, with a suitable choice of $m$ that depends on $n$, $\hat\beta_{k(m)}$ is asymptotically equivalent to the consistent roots of the original estimating equation $U_{k,\phi_k}(\beta) = 0$ (see Jin et al., 2003). Whether the algorithm converges or not, $\hat\beta_{k(m)}$ is consistent and asymptotically normal.

To approximate the joint distribution of the $\hat\beta_{k(m)}$'s, we again appeal to the resampling approach. Let $\beta^*_{k(0)} = \beta^*_{k,G}$ and $\beta^*_{k(m)} = \arg\min_\beta L^*_{k,\phi_k}(\beta; \beta^*_{k(m-1)})$ $(m \ge 1)$, where

$$L^*_{k,\phi_k}(\beta; b) = n^{-1} \sum_{i=1}^n \sum_{j=1}^n \psi_k(b; e_{ki}(b))\, \Delta_{ki} \{e_{ki}(\beta) - e_{kj}(\beta)\}^- Z_i Z_j.$$

Write $\hat B_{(m)} = (\hat\beta_{1(m)}', \ldots, \hat\beta_{K(m)}')'$ and $B^*_{(m)} = (\beta^{*\prime}_{1(m)}, \ldots, \beta^{*\prime}_{K(m)})'$. We state below and prove in the appendix that, for any $m$, the conditional distribution of $n^{1/2}(B^*_{(m)} - \hat B_{(m)})$ given the data is asymptotically equivalent to the limiting distribution of $n^{1/2}(\hat B_{(m)} - B)$. We impose two additional regularity conditions:

Condition 4. For each $k = 1, \ldots, K$, both $A_{k,\phi_k}$ and $(A_{k,\phi_k} + D_{k,\phi_k})$ are non-singular, where

$$D_{k,\phi_k} = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n \int_{-\infty}^{\infty} \dot\psi_{k0}(t)\, s_k^{(0)}(\beta_k; t) \{X_{ki} - \bar x_k(t)\}^{\otimes 2}\, dN_{ki}(\beta_k; t),$$

and $\dot\psi_{k0}(t)$ is the derivative of $\psi_{k0}(t) \equiv \lim_{n\to\infty} \psi_k(\beta_k; t)$.

Condition 5. For each $k = 1, \ldots, K$ and for any $\beta_n$ and $\eta_n$ such that $\|\beta_n - \beta_k\| + |\eta_n| = o(n^{-\epsilon})$ almost surely for some $\epsilon > 0$, $\psi_k(\beta_n; t) = \psi_k(\beta_k; t) + o(1)$ and $\psi_k(\beta_n; t + \eta_n) = \psi_k(\beta_n; t) + \dot\psi_{k0}(t)\eta_n + o(n^{-1/2} + |\eta_n|)$, both uniformly in $t$.

Theorem 2. Suppose that conditions 1–5 hold. For each $m$, the estimator $\hat B_{(m)}$ is strongly consistent, and $n^{1/2}(\hat B_{(m)} - B)$ converges to a zero-mean multivariate normal distribution. Furthermore, the conditional distribution of $n^{1/2}(B^*_{(m)} - \hat B_{(m)})$ given the data $(\tilde T_{ki}, \Delta_{ki}, X_{ki})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$ converges almost surely to the same limiting distribution.

For notational simplicity, we shall drop the subscript $(m)$ in $\hat B_{(m)}$ and $B^*_{(m)}$. To approximate the distribution of $\hat B$, we obtain a large number of realizations of $B^*$ by repeatedly generating the random sample $(Z_1, \ldots, Z_n)$ while fixing the data $(\tilde T_{ki}, \Delta_{ki}, X_{ki})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$ at their observed values.
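The iterative scheme can be sketched for the log-rank weight ($\phi = 1$, so that $\psi(b; t) = 1/S^{(0)}(b; t)$) with one event type and a scalar covariate. As before, this is a hedged illustration: the function names are hypothetical, `y` stands for $\log \tilde T$, and each convex piecewise-linear loss is minimized by searching its kinks rather than by linear programming. The default of three iterations mirrors the choice reported in the numerical studies.

```python
def residuals(beta, y, x):
    return [yi - beta * xi for yi, xi in zip(y, x)]


def logrank_psi(b, y, x):
    """psi(b; e_i(b)) = 1 / S^(0)(b; e_i(b)) for the log-rank weight phi = 1."""
    e = residuals(b, y, x)
    n = len(e)
    # The count includes subject i itself, so the denominator is at least 1.
    return [n / sum(1 for ej in e if ej >= ei) for ei in e]


def weighted_loss(beta, y, delta, x, psi):
    """n^{-1} sum_i sum_j psi_i Delta_i {e_i(beta) - e_j(beta)}^-."""
    e = residuals(beta, y, x)
    n = len(e)
    return sum(psi[i] * max(e[j] - e[i], 0.0)
               for i in range(n) if delta[i] for j in range(n)) / n


def minimise(y, delta, x, psi):
    """Search the kinks of the convex piecewise-linear loss (scalar beta only)."""
    n = len(y)
    kinks = sorted({(y[i] - y[j]) / (x[i] - x[j])
                    for i in range(n) if delta[i]
                    for j in range(n) if x[i] != x[j]})
    return min(kinks, key=lambda b: weighted_loss(b, y, delta, x, psi))


def iterate_logrank(y, delta, x, n_iter=3):
    """Return [beta_(0), ..., beta_(n_iter)], starting from the Gehan estimate."""
    beta = minimise(y, delta, x, [1.0] * len(y))  # psi = 1 gives the Gehan loss
    path = [beta]
    for _ in range(n_iter):
        # Weights are frozen at the previous iterate, so each step stays convex.
        beta = minimise(y, delta, x, logrank_psi(beta, y, x))
        path.append(beta)
    return path
```

A perturbed version would multiply each summand by $Z_i Z_j$ and start from the perturbed Gehan estimate, with one draw of the $Z_i$ held fixed across all iterations of a replicate.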
The covariance matrix of $\hat B$ can then be approximated by the empirical covariance matrix of $B^*$, denoted by $\hat V$.

The above results enable one to carry out simultaneous inference on $B$. Suppose, for example, one is interested in the effects $\eta_k \equiv \beta_{1k}$ $(k = 1, \ldots, K)$ of a particular kind of covariate on the $K$ event times. Let $\hat V_\eta$ be the part of $\hat V$ corresponding to the covariance matrix of $(\hat\eta_1, \ldots, \hat\eta_K)$, where $\hat\eta_k = \hat\beta_{1k}$. Then the null hypothesis $H_0: \eta_1 = \eta_2 = \cdots = \eta_K = 0$ can be tested by using the quadratic form $(\hat\eta_1, \ldots, \hat\eta_K) \hat V_\eta^{-1} (\hat\eta_1, \ldots, \hat\eta_K)'$. One can also determine which of the individual hypotheses $\eta_k = 0$ $(k = 1, \ldots, K)$ should be rejected by using the sequential multiple testing procedures discussed in Wei et al. (1989). Under the restriction that $\eta_1 = \eta_2 = \cdots = \eta_K = \eta$, the optimal linear estimator $\hat\eta \equiv \sum_{k=1}^K c_k \hat\eta_k$, where $(c_1, \ldots, c_K)' = (1' \hat V_\eta^{-1} 1)^{-1} \hat V_\eta^{-1} 1$ and $1 = (1, \ldots, 1)'$, has the smallest asymptotic variance among all linear estimators for $\eta$.
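A small sketch of the two summaries just described, assuming the estimates $\hat\eta_k$ and their covariance estimate $\hat V_\eta$ are already available. The helper names are hypothetical; the linear system is solved by plain Gaussian elimination to keep the example free of dependencies. The variance $1/(1' \hat V_\eta^{-1} 1)$ returned for the combined estimator is the standard result for this choice of weights and is not stated explicitly in the text.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def optimal_linear_estimator(eta, v):
    """Return (eta_hat, weights c, variance) with c = V^{-1} 1 / (1' V^{-1} 1)."""
    vinv_one = solve(v, [1.0] * len(eta))
    denom = sum(vinv_one)  # 1' V^{-1} 1
    c = [w / denom for w in vinv_one]
    est = sum(ci * ei for ci, ei in zip(c, eta))
    return est, c, 1.0 / denom


def quadratic_form(eta, v):
    """eta' V^{-1} eta, the statistic for H0: eta_1 = ... = eta_K = 0."""
    return sum(e * w for e, w in zip(eta, solve(v, eta)))


# Hypothetical inputs: two event types with uncorrelated estimates.
est, c, var = optimal_linear_estimator([1.0, 2.0], [[1.0, 0.0], [0.0, 4.0]])
```

With these made-up inputs the more precise estimate receives weight 0.8 and the other 0.2, so the weights sum to one and favour the component with the smaller variance.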

3. Recurrent events data

3.1. Preliminaries

Suppose that we have a random sample of $n$ subjects. For $i = 1, \ldots, n$ and $k = 1, 2, \ldots$, let $T_{ki}$ be the time to the $k$th recurrent event on the $i$th subject; let $C_i$ and $X_i$ be the censoring time and the $p \times 1$ vector of covariates for the $i$th subject. Assume that $C_i$ is independent of $T_{ki}$ $(k = 1, 2, \ldots)$ conditional on $X_i$. Let

$$N_i^*(t) = \sum_{k=1}^{\infty} I(T_{ki} \le t).$$

We specify the following accelerated time model for the mean frequency function:

$$E\{N_i^*(t) \mid X_i\} = \mu_0(t e^{-\beta_0' X_i}), \quad (4)$$

where $\beta_0$ is a $p \times 1$ vector of regression parameters, and $\mu_0(\cdot)$ is an unspecified baseline mean function. The weighted log-rank estimating function for $\beta_0$ takes the form

$$U_\phi(\beta) = \sum_{i=1}^n \sum_{k=1}^{\infty} I(T_{ki} \le C_i)\, \phi(\beta; T_{ki} e^{-\beta' X_i}) \{X_i - \bar X(\beta; T_{ki} e^{-\beta' X_i})\}, \quad (5)$$

where

$$\bar X(\beta; t) = \frac{S^{(1)}(\beta; t)}{S^{(0)}(\beta; t)}, \qquad S^{(r)}(\beta; t) = n^{-1} \sum_{j=1}^n I(C_j e^{-\beta' X_j} \ge t) X_j^r \quad (r = 0, 1),$$

and $\phi$ is a weight function. The resultant estimator $\hat\beta_\phi$ is consistent and asymptotically normal.

3.2. Gehan weight function

Lin et al. (1998) noted that, for the Gehan weight function, $U_\phi(\beta)$ reduces to

$$U_G(\beta) = n^{-1} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^{\infty} I(T_{ki} \le C_i)(X_i - X_j) I\{\log T_{ki} - \log C_j \le \beta'(X_i - X_j)\}.$$

Thus, the corresponding estimator $\hat\beta_G$ can be obtained by minimizing the convex function

$$L_G(\beta) = n^{-1} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^{\infty} I(T_{ki} \le C_i) \{\log T_{ki} - \log C_j - \beta'(X_i - X_j)\}^-$$

via linear programming. Define

$$L^*_G(\beta) = n^{-1} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^{\infty} I(T_{ki} \le C_i) \{\log T_{ki} - \log C_j - \beta'(X_i - X_j)\}^- Z_i Z_j,$$

where $(Z_1, \ldots, Z_n)$ are the same as in section 2. Denote a minimizer of $L^*_G(\beta)$ by $\beta^*_G$, which again is obtained via linear programming.

Let $N_i(\beta; t) = N_i^*(t e^{\beta' X_i} \wedge C_i)$ and

$$M_i(\beta; t) = N_i(\beta; t) - \int_0^t I(C_i e^{-\beta' X_i} \ge u)\, d\mu_0(u).$$

Also, let $s^{(r)}(\beta; t) = \lim_{n\to\infty} S^{(r)}(\beta; t)$ $(r = 0, 1)$ and $\bar x(t) = s^{(1)}(\beta_0; t)/s^{(0)}(\beta_0; t)$. Define

$$A_\phi = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n \int_0^{\infty} \phi_0(t)\, I(C_i e^{-\beta_0' X_i} \ge t) \{X_i - \bar x(t)\}^{\otimes 2}\, d\{\dot\mu_0(t) t\}$$

and

$$V_\phi = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n u_{i,\phi}^{\otimes 2},$$

where

$$u_{i,\phi} = \int_0^{\infty} \phi_0(t) \{X_i - \bar x(t)\}\, dM_i(\beta_0; t),$$

$\phi_0(t) = \lim_{n\to\infty} \phi(\beta_0; t)$, and $\dot\mu_0(t) = d\mu_0(t)/dt$. We impose the following conditions:

Condition 6. For all $i$, $\|X_i\| + C_i + N_i^*(C_i)$ are bounded by a non-random constant.

Condition 7. The function $\mu_0$ is continuously differentiable.

Condition 8. The matrix $A_G$ is non-singular, where $A_G$ is $A_\phi$ evaluated at $\phi_0 = s^{(0)}$.

Theorem 3. Under conditions 6–8, the estimator $\hat\beta_G$ is strongly consistent, and $n^{1/2}(\hat\beta_G - \beta_0)$ converges in distribution to a zero-mean normal random vector with covariance matrix $A_G^{-1} V_G A_G^{-1}$, where $V_G$ is $V_\phi$ evaluated at $\phi_0 = s^{(0)}$. Furthermore, the conditional distribution of $n^{1/2}(\beta^*_G - \hat\beta_G)$ given the data $(C_i, T_{ki}, X_i)$ $(T_{ki} \le C_i;\ i = 1, \ldots, n)$ converges almost surely to the limiting distribution of $n^{1/2}(\hat\beta_G - \beta_0)$.

3.3. General weight functions

To approximate $\hat\beta_\phi$ and its covariance matrix, we define

$$L_\phi(\beta; b) = n^{-1} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^{\infty} \psi(b; T_{ki} e^{-b' X_i})\, I(T_{ki} \le C_i) \{\log T_{ki} - \log C_j - \beta'(X_i - X_j)\}^-,$$

$$L^*_\phi(\beta; b) = n^{-1} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^{\infty} \psi(b; T_{ki} e^{-b' X_i})\, I(T_{ki} \le C_i) \{\log T_{ki} - \log C_j - \beta'(X_i - X_j)\}^- Z_i Z_j,$$

where $\psi(\beta; t) = \phi(\beta; t)/S^{(0)}(\beta; t)$. For $m \ge 1$, let $\hat\beta_{(m)} = \arg\min_\beta L_\phi(\beta; \hat\beta_{(m-1)})$ and $\beta^*_{(m)} = \arg\min_\beta L^*_\phi(\beta; \beta^*_{(m-1)})$, where $\hat\beta_{(0)} = \hat\beta_G$ and $\beta^*_{(0)} = \beta^*_G$. We impose the following conditions.

Condition 9. Both $A_\phi$ and $(A_\phi + D_\phi)$ are non-singular, where

$$D_\phi = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n \int_0^{\infty} \dot\psi_0(t)\, t\, s^{(0)}(\beta_0; t) \{X_i - \bar x(t)\}^{\otimes 2}\, dN_i(\beta_0; t),$$

and $\dot\psi_0(t)$ is the derivative of $\psi_0(t) \equiv \lim_{n\to\infty} \psi(\beta_0; t)$.

Condition 10. For any $\beta_n$ and $\eta_n$ such that $\|\beta_n - \beta_0\| + |\eta_n| = o(n^{-\epsilon})$ almost surely for some $\epsilon > 0$, $\psi(\beta_n; t) = \psi(\beta_0; t) + o(1)$ and $\psi(\beta_n; t(1 + \eta_n)) = \psi(\beta_n; t) + \dot\psi_0(t)\eta_n + o(n^{-1/2} + |\eta_n|)$, both uniformly in $t \le \tau$, where $\tau = \sup\{t : \Pr(C e^{-\beta_0' X} \ge t) > 0\}$.

Theorem 4. Suppose that conditions 6–10 are satisfied. For each $m$, the estimator $\hat\beta_{(m)}$ is strongly consistent, and $n^{1/2}(\hat\beta_{(m)} - \beta_0)$ converges to a zero-mean multivariate normal distribution. Furthermore, the conditional distribution of $n^{1/2}(\beta^*_{(m)} - \hat\beta_{(m)})$ given the data $(C_i, T_{ki}, X_i)$ $(T_{ki} \le C_i;\ i = 1, \ldots, n)$ converges almost surely to the same limiting distribution.

4. Clustered failure time data

4.1. Preliminaries

Suppose that we have a random sample of $n$ clusters and there are $K_i$ members in the $i$th cluster. Let $T_{ik}$ and $C_{ik}$ be the failure time and censoring time for the $k$th member of the $i$th cluster, and let $X_{ik}$ be the corresponding $p \times 1$ vector of covariates. We assume that $(T_{i1}, \ldots, T_{iK_i})$ and $(C_{i1}, \ldots, C_{iK_i})$ are independent conditional on $(X_{i1}, \ldots, X_{iK_i})$. The data consist of $(\tilde T_{ik}, \Delta_{ik}, X_{ik})$ $(k = 1, \ldots, K_i;\ i = 1, \ldots, n)$, where $\tilde T_{ik} = T_{ik} \wedge C_{ik}$ and $\Delta_{ik} = I(T_{ik} \le C_{ik})$. We specify that the marginal distributions of the $T_{ik}$ satisfy the accelerated failure time model:

$$\log T_{ik} = \beta_0' X_{ik} + \epsilon_{ik}, \quad k = 1, \ldots, K_i;\ i = 1, \ldots, n,$$

where $\beta_0$ is a $p \times 1$ vector of unknown regression parameters, and $(\epsilon_{i1}, \ldots, \epsilon_{iK_i})$ $(i = 1, \ldots, n)$ are independent random vectors. For each $i$, the error terms $\epsilon_{i1}, \ldots, \epsilon_{iK_i}$ are potentially correlated, but are assumed to be exchangeable with a common marginal distribution; for any $i$ and $j$, and $K \le K_i \wedge K_j$, the vectors $(\epsilon_{i1}, \ldots, \epsilon_{iK})$ and $(\epsilon_{j1}, \ldots, \epsilon_{jK})$ have the same distribution. Let $e_{ik}(\beta) = \log \tilde T_{ik} - \beta' X_{ik}$ and $S^{(r)}(\beta; t) = n^{-1} \sum_{i=1}^n \sum_{k=1}^{K_i} I\{e_{ik}(\beta) \ge t\} X_{ik}^r$ $(r = 0, 1)$.
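Under the independence working assumption, the weighted log-rank estimating function takes the form

$$U_\phi(\beta) = \sum_{i=1}^n \sum_{k=1}^{K_i} \Delta_{ik}\, \phi(\beta; e_{ik}(\beta)) \{X_{ik} - \bar X(\beta; e_{ik}(\beta))\}, \quad (6)$$

where $\bar X(\beta; t) = S^{(1)}(\beta; t)/S^{(0)}(\beta; t)$ and $\phi$ is a weight function. Denote the estimator by $\hat\beta_\phi$.

4.2. Gehan weight function

For $\phi(\beta; t) = S^{(0)}(\beta; t)$, we can express $U_\phi(\beta)$ as

$$U_G(\beta) = n^{-1} \sum_{i=1}^n \sum_{k=1}^{K_i} \sum_{j=1}^n \sum_{l=1}^{K_j} \Delta_{ik} (X_{ik} - X_{jl}) I\{e_{ik}(\beta) \le e_{jl}(\beta)\},$$

which is the gradient of

$$L_G(\beta) \equiv n^{-1} \sum_{i=1}^n \sum_{k=1}^{K_i} \sum_{j=1}^n \sum_{l=1}^{K_j} \Delta_{ik} \{e_{ik}(\beta) - e_{jl}(\beta)\}^-.$$

Let $\hat\beta_G$ be a minimizer of $L_G(\beta)$, which can again be obtained by linear programming. Define

$$L^*_G(\beta) = n^{-1} \sum_{i=1}^n \sum_{k=1}^{K_i} \sum_{j=1}^n \sum_{l=1}^{K_j} \Delta_{ik} \{e_{ik}(\beta) - e_{jl}(\beta)\}^- Z_i Z_j,$$

where $(Z_1, \ldots, Z_n)$ are defined in section 2. Let $\beta^*_G$ be a minimizer of $L^*_G(\beta)$.

Define $N_{ik}(\beta; t) = \Delta_{ik} I\{e_{ik}(\beta) \le t\}$ and $M_{ik}(\beta; t) = N_{ik}(\beta; t) - \int_{-\infty}^t I\{e_{ik}(\beta) \ge u\} \lambda_0(u)\, du$, where $\lambda_0(\cdot)$ is the common hazard function of the $\epsilon_{ik}$'s. Also, define

$$A_\phi = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n \sum_{k=1}^{K_i} \int_{-\infty}^{\infty} \phi_0(t) \{X_{ik} - \bar x(t)\}^{\otimes 2} \left\{\frac{d \log \lambda_0(t)}{dt}\right\} dN_{ik}(\beta_0; t)$$

and

$$V_\phi = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n \Big( \sum_{k=1}^{K_i} u_{ik,\phi} \Big)^{\otimes 2},$$

where

$$u_{ik,\phi} = \int_{-\infty}^{\infty} \phi_0(t) \{X_{ik} - \bar x(t)\}\, dM_{ik}(\beta_0; t),$$

$\phi_0(t) = \lim_{n\to\infty} \phi(\beta_0; t)$, $\bar x(t) = s^{(1)}(\beta_0; t)/s^{(0)}(\beta_0; t)$, and $s^{(r)}(\beta; t) = \lim_{n\to\infty} S^{(r)}(\beta; t)$ $(r = 0, 1)$. We impose the following regularity conditions:

Condition 11. For all $i$, $\sum_{k=1}^{K_i} \|X_{ik}\| + K_i$ are bounded by a non-random constant.

Condition 12. Let $f$ be the density function associated with $\lambda_0$. Then $f(t)$ and $df(t)/dt$ are bounded, and $\int \{d \log f(t)/dt\}^2 f(t)\, dt < \infty$.

Condition 13. The matrix $A_G$ is non-singular, where $A_G$ is $A_\phi$ evaluated at $\phi_0 = s^{(0)}$.

Theorem 5. Under conditions 11–13, the estimator $\hat\beta_G$ is strongly consistent, and $n^{1/2}(\hat\beta_G - \beta_0)$ converges in distribution to a zero-mean normal random vector with covariance matrix $A_G^{-1} V_G A_G^{-1}$, where $V_G$ is $V_\phi$ evaluated at $\phi_0 = s^{(0)}$. Furthermore, the conditional distribution of $n^{1/2}(\beta^*_G - \hat\beta_G)$ given the data $(\tilde T_{ik}, \Delta_{ik}, X_{ik})$ $(k = 1, \ldots, K_i;\ i = 1, \ldots, n)$ converges almost surely to the limiting distribution of $n^{1/2}(\hat\beta_G - \beta_0)$.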
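The point of the clustered perturbation is that one weight is drawn per cluster and shared by all of its members. The sketch below evaluates the perturbed Gehan loss for clustered data with a scalar covariate; the data layout (a list of clusters, each a list of `(log observed time, censoring indicator, covariate)` triples) and the function name are assumptions made for illustration.

```python
def clustered_gehan_loss(beta, clusters, z=None):
    """n^{-1} sum_{i,k} sum_{j,l} Delta_ik {e_ik - e_jl}^- Z_i Z_j.

    clusters[i] is a list of (y, delta, x) triples for the members of cluster i;
    z[i] is the single perturbation weight shared by every member of cluster i.
    """
    n = len(clusters)  # number of clusters, not number of individuals
    z = [1.0] * n if z is None else z  # z = None gives the unperturbed loss
    # Flatten to (cluster index, residual, censoring indicator).
    flat = [(i, y - beta * x, d)
            for i, members in enumerate(clusters)
            for (y, d, x) in members]
    total = 0.0
    for i, e_ik, d_ik in flat:
        if d_ik:
            for j, e_jl, _ in flat:
                # {a}^- = max(-a, 0) with a = e_ik - e_jl
                total += z[i] * z[j] * max(e_jl - e_ik, 0.0)
    return total / n


# Hypothetical data: three clusters of sizes 2, 1 and 3.
clusters = [
    [(0.2, 1, 0.0), (0.9, 0, 1.0)],
    [(1.5, 1, 1.0)],
    [(0.4, 1, 0.0), (0.7, 1, 1.0), (1.1, 0, 0.0)],
]
base = clustered_gehan_loss(0.3, clusters)
```

Drawing the weights at the cluster level keeps the within-cluster dependence intact in each perturbed replicate, which is how the resampling accounts for intraclass correlation.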

4.3. General weight functions

We consider

$$L_\phi(\beta; b) \equiv n^{-1} \sum_{i=1}^n \sum_{k=1}^{K_i} \sum_{j=1}^n \sum_{l=1}^{K_j} \psi(b; e_{ik}(b))\, \Delta_{ik} \{e_{ik}(\beta) - e_{jl}(\beta)\}^-,$$

$$L^*_\phi(\beta; b) \equiv n^{-1} \sum_{i=1}^n \sum_{k=1}^{K_i} \sum_{j=1}^n \sum_{l=1}^{K_j} \psi(b; e_{ik}(b))\, \Delta_{ik} \{e_{ik}(\beta) - e_{jl}(\beta)\}^- Z_i Z_j,$$

where $\psi(\beta; t) = \phi(\beta; t)/S^{(0)}(\beta; t)$. For $m \ge 1$, let $\hat\beta_{(m)} = \arg\min_\beta L_\phi(\beta; \hat\beta_{(m-1)})$ and $\beta^*_{(m)} = \arg\min_\beta L^*_\phi(\beta; \beta^*_{(m-1)})$, where $\hat\beta_{(0)} = \hat\beta_G$ and $\beta^*_{(0)} = \beta^*_G$. We impose two additional conditions:

Condition 14. Both $A_\phi$ and $(A_\phi + D_\phi)$ are non-singular, where

$$D_\phi = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n \sum_{k=1}^{K_i} \int_{-\infty}^{\infty} \dot\psi_0(t)\, s^{(0)}(\beta_0; t) \{X_{ik} - \bar x(t)\}^{\otimes 2}\, dN_{ik}(\beta_0; t),$$

and $\dot\psi_0(t)$ is the derivative of $\psi_0(t) \equiv \lim_{n\to\infty} \psi(\beta_0; t)$.

Condition 15. For any $\beta_n$ and $\eta_n$ such that $\|\beta_n - \beta_0\| + |\eta_n| = o(n^{-\epsilon})$ almost surely for some $\epsilon > 0$, $\psi(\beta_n; t) = \psi(\beta_0; t) + o(1)$ and $\psi(\beta_n; t + \eta_n) = \psi(\beta_n; t) + \dot\psi_0(t)\eta_n + o(n^{-1/2} + |\eta_n|)$, both uniformly in $t$.

Theorem 6. Suppose that conditions 11–15 are satisfied. For each $m$, the estimator $\hat\beta_{(m)}$ is strongly consistent and $n^{1/2}(\hat\beta_{(m)} - \beta_0)$ converges to a zero-mean multivariate normal distribution. Furthermore, the conditional distribution of $n^{1/2}(\beta^*_{(m)} - \hat\beta_{(m)})$ given the data $(\tilde T_{ik}, \Delta_{ik}, X_{ik})$ $(k = 1, \ldots, K_i;\ i = 1, \ldots, n)$ converges almost surely to the same limiting distribution.

5. Numerical studies

We carried out extensive simulation studies to evaluate the small-sample properties of the methods developed in sections 2–4. We focused on the Gehan and log-rank weight functions. The (approximate) log-rank estimates were obtained with three iterations. The differences between the estimates with three iterations and those at convergence are generally negligible.

For multiple events and clustered data, two failure times $T_1$ and $T_2$ were generated from Gumbel's (1960) bivariate distribution:

$$F(t_1, t_2) = F_1(t_1) F_2(t_2) [1 + \theta\{1 - F_1(t_1)\}\{1 - F_2(t_2)\}],$$

where $-1 \le \theta \le 1$. The correlation between $T_1$ and $T_2$ is $\theta/4$.
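One way to draw a pair from this bivariate distribution is conditional inversion, sketched below for exponential marginals. The paper does not say how the pairs were generated, so the method, the function names and the rate parameters are assumptions. Writing $U = F_1(T_1)$ and $V = F_2(T_2)$, the joint distribution function of $(U, V)$ is $C(u, v) = uv\{1 + \theta(1 - u)(1 - v)\}$, and $V$ given $U = u$ has distribution function $\partial C/\partial u = v\{1 + \theta(1 - 2u)(1 - v)\}$, which is a quadratic in $v$ that can be inverted in closed form.

```python
import math
import random


def conditional_inverse(u, w, theta):
    """Solve v * {1 + theta * (1 - 2u) * (1 - v)} = w for v in [0, 1]."""
    a = theta * (1.0 - 2.0 * u)
    b = 1.0 + a
    denom = b + math.sqrt(b * b - 4.0 * a * w)
    if denom == 0.0:  # only possible when a = -1 and w = 0
        return 0.0
    # Root of a v^2 - (1 + a) v + w = 0, written to avoid dividing by a near 0.
    return 2.0 * w / denom


def gumbel_exponential_pair(theta, rate1=1.0, rate2=1.0, rng=random):
    """Draw (T1, T2) with exponential marginals and dependence parameter theta."""
    u = rng.random()
    w = rng.random()
    v = conditional_inverse(u, w, theta)
    t1 = -math.log(1.0 - u) / rate1  # inverse of F1(t) = 1 - exp(-rate1 * t)
    t2 = -math.log(1.0 - v) / rate2
    return t1, t2


rng = random.Random(2006)
pairs = [gumbel_exponential_pair(0.5, rng=rng) for _ in range(5)]
```

Setting `theta = 0` makes the conditional inverse return `w` itself, so the two failure times are then independent.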
The two marginal distributions $F_k(t_k)$ $(k = 1, 2)$ were exponential with hazard rates $\lambda_k = e^{\beta_1 X_{1k} + \beta_2 X_{2k}}$, where $X_{1k}$ $(k = 1, 2)$ were Bernoulli with 0.5 success probability and $X_{2k}$ $(k = 1, 2)$ were independent standard normal truncated at $\pm 2$. For multiple events, $T_1$ and $T_2$ shared the same set of covariates, i.e. $X_{11} = X_{12}$ and $X_{21} = X_{22}$; for clustered data, the covariates were generated independently. The censoring times were generated from the uniform $(0, \tau)$ distribution, where $\tau$ was chosen to yield a desired level of censoring.

For recurrent events, the covariates were generated in the same manner as in the case of multiple events. The gap times between successive events were generated from the aforementioned Gumbel's bivariate exponential distribution. The resultant recurrent event process is Poisson under $\theta = 0$ and non-Poisson under $\theta \ne 0$. The follow-up time was an independent uniform (0, 2.5) random variable, which on average yielded approximately 2.60 and 2.86 events per subject for the Poisson and non-Poisson cases respectively.

Tables 1 and 2 summarize the results on the estimation of $\beta_1$ when $\beta_1 = 1$ and $\beta_2 = 0.5$. The results for $\beta_2$ are similar and thus omitted. Each entry in the tables was based on 1000 simulated data sets. For each data set, we approximated the limiting distribution of the parameter estimator using 1000 samples of $(Z_1, \ldots, Z_n)$, where the $Z_i$'s are standard exponential random variables. The simulation results show that the proposed methods perform well in small samples. The parameter estimators are virtually unbiased. The standard error estimators are accurate, and the confidence intervals have proper coverage probabilities.

Table 1. Simulation results for multiple events and clustered data
Columns: θ; n; Censoring (%); Weight (Gehan, Log-rank); Bias, SE, SEE and CP for multiple events; Bias, SE, SEE and CP for clustered data.
Bias and SE represent the bias and standard error of the estimator, SEE represents the mean of the standard error estimator and CP represents the coverage probability of the 95% Wald-type confidence interval. The results for multiple events pertain to the optimal linear estimator.

Table 2. Simulation results for recurrent events data
Columns: θ; n; Weight (Gehan, Log-rank); Bias; SE; SEE; CP.
For explanation see Table 1.

6. Examples

To illustrate the methods of sections 2 and 3 and to compare with the existing methods of Lin & Wei (1992) and Lin et al. (1998), we consider the well-known bladder cancer data reported by Wei et al. (1989). These data were obtained from a randomized clinical trial assessing the potential benefit of thiotepa in reducing recurrences of bladder tumours. There are 38 patients in the thiotepa group with a total of 45 observed recurrences and 48 placebo patients with a total of 87 observed recurrences.

To compare with the results of Lin & Wei (1992), we consider the first three recurrences of each patient. For $i = 1, \ldots, 86$ and $k = 1, 2, 3$, let $T_{ki}$ be the time from the initiation of treatment to the $k$th tumour recurrence on the $i$th patient, let $X_{1i}$ indicate, by the values 1 versus 0, whether the $i$th patient received thiotepa or placebo, and let $X_{2i}$ be the number of initial tumours for the $i$th patient. We regress $\log_{10} T_{ki}$ on $X_{1i}$ and $X_{2i}$. Recurrence times of 0 are replaced with 0.5. In this section, the log-rank estimates at convergence are reported, and the resampling was performed in the same manner as in section 5 except that 10,000 samples are used.

The results of our analysis are presented in Table 3. The log-rank estimates for individual recurrences are similar to those of Lin & Wei (1992). The optimally combined log-rank estimate is about 10% smaller than the estimate of Lin & Wei (1992) based on minimum-dispersion statistics. More importantly, our confidence intervals are much narrower than Lin & Wei's.
In fact, our two-sided p-value for testing no overall treatment effect is approximately whereas that of Lin & Wei (1992) is approximately These differences reflect the fact that the Lin–Wei estimator is not based on the optimal linear combination.

Following Lin et al. (1998), we regard the tumour recurrences for each patient as a single counting process and fit model (4) with three covariates: (i) treatment indicator; (ii) number of initial tumours; and (iii) the diameter of the largest initial tumour; the treatment indicator takes the value 1 for placebo and 0 for thiotepa. Table 4 displays the results of our analysis, which are similar to those of Lin et al. (1998). Incidentally, Lin et al. (1998) used an ad hoc iterative (one-dimensional) bisection search to solve the estimating functions along with a different resampling technique.

Table 3. Estimation of treatment effects on the first three tumour recurrences of bladder cancer patients
Columns: Weight function; Tumour recurrences; Parameter estimate; Estimated standard error; 95% Wald confidence interval.
Rows: for each of the Gehan and Log-rank weight functions, the First, Second, Third and First three recurrences.
The optimal linear estimator is used to estimate the overall treatment effect on the first three recurrences.

Table 4. Regression analysis on the mean frequency of tumour recurrences in bladder cancer patients
Columns: Weight function; Covariate; Parameter estimate; Estimated standard error; 95% confidence intervals (Wald, Percentile).
Rows: for each of the Gehan and Log-rank weight functions, Treatment, Initial number and Initial size.
We change $\beta$ to $-\beta$ so as to be consistent with the parameterization of Lin et al. (1998).

For a real example of clustered data, we consider the litter-matched tumorigenesis data originally reported by Mantel et al. (1977) and reproduced in Table 1 of Lee et al. (1993). There are 50 female litters in the study, each having three rats. For $i = 1, \ldots, 50$ and $k = 1, 2, 3$, let $T_{ik}$ be the time of tumour appearance for the $k$th rat in the $i$th litter, and let $X_{ik}$ indicate, by the values 0 versus 1, whether the $k$th rat in the $i$th litter was drug-treated or not. We regress $\log T_{ik}$ on $X_{ik}$. The Gehan estimate is with an estimated standard error of 0.093, and the corresponding Wald 95% confidence interval is (−0.026, 0.338). The log-rank estimate is with an estimated standard error of 0.090, and the corresponding Wald 95% confidence interval is (−0.016, 0.338). The log-rank results differ slightly from those of Lee et al. (1993).

7. Discussion

Although Cox-type regression methods for multivariate failure time data have been studied extensively, it is desirable to explore the accelerated failure time regression approach for several reasons. First, accelerated failure time models may fit the data better than proportional hazards models. Secondly, the accelerated failure time model formulates a natural and direct regression relationship, whereas the relative risk modelled by the Cox regression has no physical interpretation when the censored response variable is not failure time.
Thirdly, the regression parameters under multivariate accelerated failure time models have both the population-averaged and subject-specific interpretations. This is not true of proportional hazards models.

The proposed resampling approach differs from that of Jin et al. (2003) and entails considerable technical challenges. The fact that this approach correctly adjusts for the intraclass dependence is remarkable. In all the existing methods for multivariate failure time data, either under proportional hazards models or accelerated failure time models, each estimating function is approximated by a sum of independent and identically distributed (i.i.d.) terms and the empirical variances and covariances of these sums are calculated. These calculations lead to complicated variance-covariance expressions, which may perform poorly in small samples. The proposed resampling procedure does not involve complicated i.i.d. approximations in the variance-covariance estimation.

We have focused on the estimation of the regression parameters. A related problem is the estimation of the failure time distributions. The cumulative hazard functions for multiple events and clustered data as well as the mean frequency functions for recurrent events can be estimated consistently by the Aalen-Breslow type estimators (see Lin et al., 1998, p. 608). Upon normalizations, these estimators converge weakly to zero-mean Gaussian processes. We can approximate the limiting distributions by extending the resampling technique developed in this paper, and construct appropriate confidence intervals and confidence bands.

Residuals similar to those of proportional hazards models (Kalbfleisch & Prentice, 2002, pp ) can be used to check accelerated failure time models. It is also possible to develop formal goodness-of-fit methods based on the comparison of the rank-type estimators with different weight functions (Wei et al., 1990) or on the cumulative sums of residuals (Lin et al., 1993). The resampling technique presented in this paper will be useful in evaluating the distributions of the test statistics.

Acknowledgements

This research was supported by the New York City Council Speaker's Fund for Public Health Research, the National Institutes of Health and the National Science Foundation.

References

Gehan, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily single-censored samples. Biometrika 52.
Gumbel, E. J. (1960). Bivariate exponential distributions. J. Amer. Statist. Assoc. 55.
Jin, Z., Lin, D. Y., Wei, L. J. & Ying, Z. (2003). Rank-based inference for the accelerated failure time model. Biometrika 90.
Kalbfleisch, J. D. & Prentice, R. L. (2002). The statistical analysis of failure time data, 2nd edn. John Wiley, Hoboken, NJ.
Lai, T. L. & Ying, Z. (1991). Rank regression methods for left-truncated and right-censored data. Ann. Statist. 19.
Lee, E. W., Wei, L. J. & Ying, Z. (1993). Linear regression analysis for highly stratified failure time data. J. Amer. Statist. Assoc. 88.
Lin, D. Y. (1994). Cox regression analysis of multivariate failure time data: the marginal approach. Statist. Med. 13.
Lin, J. S. & Wei, L. J. (1992). Linear regression analysis for multivariate failure time observations. J. Amer. Statist. Assoc. 87.
Lin, D. Y., Wei, L. J. & Ying, Z. (1993). Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80.
Lin, D. Y., Wei, L. J. & Ying, Z. (1998). Accelerated failure time models for counting processes. Biometrika 85.
Lin, D. Y., Wei, L. J., Yang, I. & Ying, Z. (2000). Semiparametric regression for the mean and rate functions of recurrent events. J. Roy. Statist. Soc. Ser. B 62.
Mantel, N., Bohidar, N. R. & Ciminera, J. L. (1977). A Mantel-Haenszel analysis of litter-matched time-to-response data, with modifications for recovery of interlitter information. Cancer Res. 37.
Pollard, D. (1990). Empirical processes: theory and application. IMS, Hayward, CA.
Prentice, R. L. (1978). Linear rank tests with right censored data. Biometrika 65.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. John Wiley, New York, NY.
Tsiatis, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. Ann. Statist. 18.
Wei, L. J., Lin, D. Y. & Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J. Amer. Statist. Assoc. 84.
Wei, L. J., Ying, Z. & Lin, D. Y. (1990). Linear regression analysis of censored survival data based on rank tests. Biometrika 77.
Ying, Z. (1993). A large sample study of rank estimation for censored regression data. Ann. Statist. 21.

Received June 2004, in final form September 2005

D. Y. Lin, Department of Biostatistics, CB 7420, University of North Carolina, Chapel Hill, NC, USA. lin@bios.unc.edu

Appendix

Proofs of asymptotic results

The proofs in this appendix are more technical and more rigorous than those of Jin et al. (2003). We omit the kind of derivation given in the appendix of Jin et al. (2003) and focus on new technical issues. We first state and prove a lemma that is used repeatedly in our proofs.

Lemma 1
Let $H_n(t)$ and $W_n(t)$ be two sequences of bounded processes. Suppose that $H_n(t)$ is componentwise monotone and converges in probability to $H(t)$ uniformly in $t$ and $W_n(t)$ converges weakly to a zero-mean process with continuous sample paths. Then for any continuously differentiable function $g$,

$$\int_{-\infty}^{\infty} [g\{H_n(t)\} - g\{H(t)\}]\, dW_n(t) = o_p(1).$$

Proof of lemma 1
By the strong embedding arguments as used in Lin et al. (2000, p. 726), $H_n$ and $W_n$ can be assumed to converge to their respective limits almost surely. One can then apply lemma 1 of Lin et al. (2000) repeatedly and component-wise to obtain the desired approximation.

Proof of theorem 1
The classical strong law of large numbers for U statistics (Serfling, 1980, section 5.4) implies that, under condition 1, $L_{k,G}(\beta)$ converges almost surely for each $\beta$. Note that $L_{k,G}$ is a convex function, so that the convergence is uniform in any compact region. By condition 2, the limiting function is strictly convex at the true parameter value $\beta_k$. Therefore, $\hat\beta_{k,G} \to_{a.s.} \beta_k$. Under conditions 1-3, we can apply the arguments of Ying (1993) to obtain

$$n^{1/2}(\hat\beta_{k,G} - \beta_k) = -A_{k,G}^{-1}\, n^{-1/2} U_{k,G}(\beta_k) + o(1 + n^{1/2}\|\hat\beta_{k,G} - \beta_k\|), \quad \text{a.s.} \tag{7}$$

Recall that

$$U_{k,G}(\beta) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} S_k^{(0)}(\beta; t)\{X_{ik} - \bar X_k(\beta; t)\}\, dN_{ik}(\beta; t).$$

The simple equality $\sum_{i=1}^{n} I\{e_{ik}(\beta) \ge t\}\{X_{ik} - \bar X_k(\beta; t)\} = 0$ implies that

$$U_{k,G}(\beta) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} S_k^{(0)}(\beta; t)\{X_{ik} - \bar X_k(\beta; t)\}\, dM_{ik}(\beta; t), \tag{8}$$

where the $M_{ik}(\beta; t)$ are defined in (1). It is well known that $E\{M_{ik}(\beta_k; t)\} = 0$. By the uniform strong law of large numbers (Pollard, 1990, section 8), $S_k^{(r)}(\beta; t) \to_{a.s.} s_k^{(r)}(\beta; t)$ uniformly in $\beta$ and $t$. It then follows from lemma 1 that

$$n^{-1/2} U_{k,G}(\beta_k) = n^{-1/2} \sum_{i=1}^{n} u_{ik,G} + o_p(1), \tag{9}$$

where

$$u_{ik,G} = \int_{-\infty}^{\infty} s_k^{(0)}(\beta_k; t)\{X_{ik} - \bar x_k(t)\}\, dM_{ik}(\beta_k; t).$$

In view of (7) and (9), the convergence of $n^{1/2}(\hat B_G - B)$ stated in theorem 1 follows from the multivariate central limit theorem.

Because of the way the random perturbation is introduced, the loss function $L^*_{k,G}$ retains the convexity of $L_{k,G}$. Thus, the above arguments for the consistency of $\hat\beta_{k,G}$ can be used to show that $\hat\beta^*_{k,G} \to_{a.s.} \beta_k$. Through algebraic manipulations, we can express (3) as

$$U^*_{k,G}(\beta) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} S_k^{*(0)}(\beta; t)\{X_{ik} - \bar X^*_k(\beta; t)\}\, dN_{ik}(\beta; t)\, Z_i, \tag{10}$$

where

$$S_k^{*(r)}(\beta; t) = n^{-1} \sum_{j=1}^{n} I\{e_{jk}(\beta) \ge t\}\, X_{jk}^{r} Z_j \quad (r = 0, 1),$$

and $\bar X^*_k(\beta; t) = S_k^{*(1)}(\beta; t)/S_k^{*(0)}(\beta; t)$. This is a functional of weighted empirical processes, just like $U_{k,G}$, but with the extra weights $Z_i$. Thus, the arguments for establishing the asymptotic linearity of $U_{k,G}$ are applicable to $U^*_{k,G}$ under conditions 1-3. In particular, we can expand $U^*_{k,G}(\hat\beta^*_{k,G})$ at $\hat\beta_{k,G}$ to obtain

$$n^{1/2}(\hat\beta^*_{k,G} - \hat\beta_{k,G}) = -A_{k,G}^{-1}\, n^{-1/2} U^*_{k,G}(\hat\beta_{k,G}) + o(1 + n^{1/2}\|\hat\beta^*_{k,G} - \hat\beta_{k,G}\|), \quad \text{a.s.} \tag{11}$$

Note that (7) and (11) have the same slope matrix $A_{k,G}$. This is because $n^{-1}U_{k,G}(\beta)$ and $n^{-1}U^*_{k,G}(\beta)$ converge to the same limiting function as the $Z_i$ are independent of the data with mean 1.

As $\sum_{i=1}^{n} I\{e_{ik}(\beta) \ge t\} Z_i \{X_{ik} - \bar X^*_k(\beta; t)\} = 0$, we can rewrite (10) as

$$U^*_{k,G}(\beta) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} S_k^{*(0)}(\beta; t)\{X_{ik} - \bar X^*_k(\beta; t)\}\, dM_{ik}(\beta; t)\, Z_i. \tag{12}$$

This result arises from the specific way in which the random weights $Z_i$ are introduced into $L^*_{k,G}(\beta)$, and does not hold under the weighting scheme of Jin et al. (2003). In fact, the latter would not lead to a correct approximation.

We shall show that $S_k^{*(0)}$ and $\bar X^*_k$ in (12) can be replaced by $S_k^{(0)}$ and $\bar X_k$. This part of the proof is much more delicate than its counterpart in Jin et al. (2003). Simple algebraic manipulations of (12) yield the following decomposition:

$$\begin{aligned}
U^*_{k,G}(\hat\beta_{k,G}) ={}& \sum_{i=1}^{n} \int_{-\infty}^{\infty} S_k^{(0)}(\hat\beta_{k,G}; t)\{X_{ik} - \bar X_k(\hat\beta_{k,G}; t)\}\, dM_{ik}(\hat\beta_{k,G}; t)\, Z_i \\
&+ \int_{-\infty}^{\infty} \{S_k^{*(0)}(\hat\beta_{k,G}; t) - S_k^{(0)}(\hat\beta_{k,G}; t)\}\, d\sum_{i=1}^{n} X_{ik} M_{ik}(\hat\beta_{k,G}; t) Z_i \\
&- \int_{-\infty}^{\infty} \{S_k^{*(1)}(\hat\beta_{k,G}; t) - S_k^{(1)}(\hat\beta_{k,G}; t)\}\, d\sum_{i=1}^{n} M_{ik}(\hat\beta_{k,G}; t) Z_i.
\end{aligned} \tag{13}$$
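The two algebraic facts behind (12) and (13) are easy to check numerically. The Python sketch below is only an illustration under simplifying assumptions: a scalar covariate, a single margin, simulated residuals in place of $e_{ik}(\beta)$, and unit exponential weights $Z_i$ (our choice; the text above requires only mean 1). It verifies that $\sum_i I\{e_i \ge t\} Z_i \{X_i - \bar X^*(t)\} = 0$ and that $S^{*(0)}(X_i - \bar X^*) - S^{(0)}(X_i - \bar X) = (S^{*(0)} - S^{(0)})X_i - (S^{*(1)} - S^{(1)})$, which is the pointwise identity that produces the three terms of (13). All names are hypothetical.

```python
import random

def weighted_sums(e, x, z, t):
    """Return (S0, S1) at time t: n^-1 sum_j I(e_j >= t) x_j^r z_j, r = 0, 1."""
    n = len(e)
    at_risk = [j for j in range(n) if e[j] >= t]
    s0 = sum(z[j] for j in at_risk) / n
    s1 = sum(z[j] * x[j] for j in at_risk) / n
    return s0, s1

rng = random.Random(1)
n = 20
e = [rng.gauss(0.0, 1.0) for _ in range(n)]      # stand-ins for residuals e_i(beta)
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # scalar covariates
z = [rng.expovariate(1.0) for _ in range(n)]     # perturbation weights, mean 1
ones = [1.0] * n                                  # unperturbed case

max_identity = 0.0   # largest violation of the centring identity
max_decomp = 0.0     # largest violation of the pointwise decomposition
for t in e:          # evaluate at the observed residuals, so the risk set is non-empty
    s0_star, s1_star = weighted_sums(e, x, z, t)
    s0, s1 = weighted_sums(e, x, ones, t)
    xbar_star, xbar = s1_star / s0_star, s1 / s0
    centred = sum(z[i] * (x[i] - xbar_star) for i in range(n) if e[i] >= t)
    max_identity = max(max_identity, abs(centred))
    for i in range(n):
        lhs = s0_star * (x[i] - xbar_star) - s0 * (x[i] - xbar)
        rhs = (s0_star - s0) * x[i] - (s1_star - s1)
        max_decomp = max(max_decomp, abs(lhs - rhs))
```

Both discrepancies are zero up to floating-point rounding. The centring identity holds because the same weight $Z_i$ multiplies subject $i$ in the outer sum and in $S^{*(r)}$.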

Let $\mathcal F_n$ be the σ-algebra generated by all potential data $(T_{ik}, C_{ik}, X_{ik})$ $(k = 1, \ldots, K;\ i = 1, \ldots, n)$. For random vectors $W_n$ involving the $Z_i$'s, we use the notation $W_n = \tilde o(d_n)$ to denote the fact that $\Pr(\|d_n^{-1} W_n\| > \epsilon \mid \mathcal F_n) \to_{a.s.} 0$ for every $\epsilon > 0$. Conditional on $\mathcal F_n$, $n^{1/2}\{S_k^{*(r)}(\beta; \cdot) - S_k^{(r)}(\beta; \cdot)\}$ $(r = 0, 1)$ converge weakly, and

$$n^{-1} \sum_{i=1}^{n} M_{ik}(\hat\beta_{k,G}; t) Z_i \to 0 \quad \text{and} \quad n^{-1} \sum_{i=1}^{n} X_{ik} M_{ik}(\hat\beta_{k,G}; t) Z_i \to 0$$

uniformly in $t$. It then follows from lemma 1, together with integration by parts, that the second and third terms on the right-hand side of (13) are both of order $\tilde o(n^{1/2})$. Clearly, $U^*_{k,G}(\hat\beta_{k,G}) = U^*_{k,G}(\hat\beta_{k,G}) - U_{k,G}(\hat\beta_{k,G}) + \tilde o(n^{1/2})$ as $\hat\beta_{k,G}$ is a root of $U_{k,G}(\beta)$. By subtracting (8) evaluated at $\beta = \hat\beta_{k,G}$ from the first term on the right-hand side of (13), we have

$$U^*_{k,G}(\hat\beta_{k,G}) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} S_k^{(0)}(\hat\beta_{k,G}; t)\{X_{ik} - \bar X_k(\hat\beta_{k,G}; t)\}\, dM_{ik}(\hat\beta_{k,G}; t)\,(Z_i - 1) + \tilde o(n^{1/2}). \tag{14}$$

Conditional on $\mathcal F_n$, the first term on the right-hand side of (14) is a sum of zero-mean random vectors. Thus, the multivariate central limit theorem implies that the conditional distribution of the random vector $n^{-1/2}\{U^*_{1,G}(\hat\beta_{1,G}), \ldots, U^*_{K,G}(\hat\beta_{K,G})\}$ given $\mathcal F_n$ converges almost surely to a $pK$-variate normal random vector with mean zero and covariance matrix $\{\tilde V_{kl,G};\ k, l = 1, \ldots, K\}$, where

$$\tilde V_{kl,G} = \lim n^{-1} \sum_{i=1}^{n} \tilde u_{ik,G}\, \tilde u_{il,G}',$$

and

$$\tilde u_{ik,G} = \int_{-\infty}^{\infty} S_k^{(0)}(\hat\beta_{k,G}; t)\{X_{ik} - \bar X_k(\hat\beta_{k,G}; t)\}\, dM_{ik}(\hat\beta_{k,G}; t), \quad k = 1, \ldots, K;\ i = 1, \ldots, n.$$

As $S_k^{(r)}(\beta; t) \to_{a.s.} s_k^{(r)}(\beta; t)$ $(r = 0, 1)$ and $\hat\beta_{k,G} \to_{a.s.} \beta_k$, we have $\tilde V_{kl,G} \to_{a.s.} V_{kl,G}$. It then follows from (11) that the conditional distribution of $n^{1/2}(\hat\beta^*_{1,G} - \hat\beta_{1,G}, \ldots, \hat\beta^*_{K,G} - \hat\beta_{K,G})$ given $\mathcal F_n$ converges almost surely to a zero-mean normal distribution with covariance matrix $\{A_{k,G}^{-1} V_{kl,G} A_{l,G}^{-1};\ k, l = 1, \ldots, K\}$, which is the limiting distribution of $n^{1/2}(\hat\beta_{1,G} - \beta_1, \ldots, \hat\beta_{K,G} - \beta_K)$.

Proof of theorem 2
The convex analysis arguments for establishing the consistency of $\hat\beta_{k,G}$ and $\hat\beta^*_{k,G}$ in the proof of theorem 1 can be used repeatedly to show that both $\hat\beta_{k(m)}$ and $\hat\beta^*_{k(m)}$ are strongly consistent.
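As an aside, the resampling scheme justified in the proof of theorem 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the covariate is scalar, there are K = 2 margins sharing a subject-level random effect, the data are simulated, and the $Z_i$ are unit exponential (mean 1 and variance 1, the latter being what makes the conditional covariance of $\sum_i \tilde u_{ik,G}(Z_i - 1)$ equal to $\sum_i \tilde u_{ik,G}\tilde u_{il,G}'$). The perturbed loss uses the pairwise weights $Z_i Z_j$ implied by (10). A ternary search on the convex loss stands in for linear programming, and all function names are hypothetical.

```python
import random

def gehan_loss(beta, y, delta, x, z):
    """Perturbed Gehan loss: n^-1 sum_i sum_j delta_i Z_i Z_j {e_i - e_j}^-."""
    n = len(y)
    e = [y[i] - beta * x[i] for i in range(n)]
    total = 0.0
    for i in range(n):
        if delta[i]:
            for j in range(n):
                diff = e[i] - e[j]
                if diff < 0.0:
                    total += -diff * z[i] * z[j]
    return total / n

def argmin_convex(f, lo=-5.0, hi=5.0, iters=30):
    """Ternary search for a minimiser of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def fit_margins(x, margins, z):
    """Minimise the (perturbed) Gehan loss separately for each margin."""
    return [argmin_convex(lambda b: gehan_loss(b, y, delta, x, z))
            for y, delta in margins]

def simulate(n, betas, rng):
    """Simulated log failure times with a shared subject effect and censoring."""
    x = [float(i % 2) for i in range(n)]
    frailty = [rng.gauss(0.0, 0.5) for _ in range(n)]
    margins = []
    for beta in betas:
        y, delta = [], []
        for i in range(n):
            log_t = beta * x[i] + frailty[i] + rng.gauss(0.0, 0.5)
            log_c = rng.uniform(1.0, 3.0)
            y.append(min(log_t, log_c))
            delta.append(log_t <= log_c)
        margins.append((y, delta))
    return x, margins

def resample_covariance(x, margins, n_rep, rng):
    """Point estimates plus the empirical covariance of the perturbed estimates."""
    n, k = len(x), len(margins)
    est = fit_margins(x, margins, [1.0] * n)
    draws = []
    for _ in range(n_rep):
        z = [rng.expovariate(1.0) for _ in range(n)]   # same Z_i for every margin
        draws.append(fit_margins(x, margins, z))
    mean = [sum(d[a] for d in draws) / n_rep for a in range(k)]
    cov = [[sum((d[a] - mean[a]) * (d[b] - mean[b]) for d in draws) / (n_rep - 1)
            for b in range(k)] for a in range(k)]
    return est, cov

rng = random.Random(2006)
x, margins = simulate(25, [1.0, 0.5], rng)
est, cov = resample_covariance(x, margins, 25, rng)
```

The essential design choice is that one weight $Z_i$ is drawn per subject and reused across all K margins. This is what allows the off-diagonal blocks of the resampled covariance to reflect the dependence among failure times from the same subject without modelling it.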
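To derive the asymptotic distributions, we note that $\hat\beta_{k(m)}$ and $\hat\beta^*_{k(m)}$ are the roots of

$$\tilde U_{k,\phi}(\beta; \hat\beta_{k(m-1)}) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k\big(\hat\beta_{k(m-1)};\ t + (\beta - \hat\beta_{k(m-1)})'X_{ik}\big)\, S_k^{(0)}(\beta; t)\{X_{ik} - \bar X_k(\beta; t)\}\, dN_{ik}(\beta; t),$$

$$\tilde U^*_{k,\phi}(\beta; \hat\beta^*_{k(m-1)}) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k\big(\hat\beta^*_{k(m-1)};\ t + (\beta - \hat\beta^*_{k(m-1)})'X_{ik}\big)\, S_k^{*(0)}(\beta; t)\{X_{ik} - \bar X^*_k(\beta; t)\}\, dN_{ik}(\beta; t)\, Z_i$$

respectively. Under condition 5,

$$\begin{aligned}
\tilde U_{k,\phi}(\beta; \hat\beta_{k(m-1)}) ={}& \sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k(\hat\beta_{k(m-1)}; t)\, S_k^{(0)}(\beta; t)\{X_{ik} - \bar X_k(\beta; t)\}\, dN_{ik}(\beta; t) \\
&+ \sum_{i=1}^{n} \int_{-\infty}^{\infty} \dot\psi_{k0}(t)\, S_k^{(0)}(\beta; t)\{X_{ik} - \bar X_k(\beta; t)\}\, dN_{ik}(\beta; t)\, (\beta - \hat\beta_{k(m-1)})'X_{ik} \\
&+ o(n^{1/2} + n\|\beta - \hat\beta_{k(m-1)}\|),
\end{aligned} \tag{15}$$

$$\begin{aligned}
\tilde U^*_{k,\phi}(\beta; \hat\beta^*_{k(m-1)}) ={}& \sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k(\hat\beta^*_{k(m-1)}; t)\, S_k^{*(0)}(\beta; t)\{X_{ik} - \bar X^*_k(\beta; t)\}\, dN_{ik}(\beta; t)\, Z_i \\
&+ \sum_{i=1}^{n} \int_{-\infty}^{\infty} \dot\psi_{k0}(t)\, S_k^{*(0)}(\beta; t)\{X_{ik} - \bar X^*_k(\beta; t)\}\, dN_{ik}(\beta; t)\, Z_i\, (\beta - \hat\beta^*_{k(m-1)})'X_{ik} \\
&+ o(n^{1/2} + n\|\beta - \hat\beta^*_{k(m-1)}\|).
\end{aligned} \tag{16}$$

Given (7) and (15), we can extend the arguments for establishing (11) of Jin et al. (2003) to show that the following result holds under conditions 1-5,

$$\begin{aligned}
n^{1/2}(\hat\beta_{k(m)} - \beta_k) ={}& -n^{-1/2}\big[I - \{(A_{k,\phi} + D_{k,\phi})^{-1} D_{k,\phi}\}^m\big] A_{k,\phi}^{-1} U_{k,\phi}(\beta_k) \\
&- n^{-1/2}\{(A_{k,\phi} + D_{k,\phi})^{-1} D_{k,\phi}\}^m A_{k,G}^{-1} U_{k,G}(\beta_k) \\
&+ o\Big(1 + n^{1/2} \sum_{j=0}^{m} \|\hat\beta_{k(j)} - \beta_k\|\Big).
\end{aligned}$$

Note that condition 4 is necessary for the above equation to be meaningful. By the arguments for establishing (9) in the proof of theorem 1, we have

$$n^{-1/2} U_{k,\phi}(\beta_k) = n^{-1/2} \sum_{i=1}^{n} u_{ik,\phi} + o_p(1),$$

where the $u_{ik,\phi}$ are defined in (2). Thus,

$$\begin{aligned}
n^{1/2}(\hat\beta_{k(m)} - \beta_k) ={}& -n^{-1/2} \sum_{i=1}^{n} \Big(\big[I - \{(A_{k,\phi} + D_{k,\phi})^{-1} D_{k,\phi}\}^m\big] A_{k,\phi}^{-1} u_{ik,\phi} + \{(A_{k,\phi} + D_{k,\phi})^{-1} D_{k,\phi}\}^m A_{k,G}^{-1} u_{ik,G}\Big) \\
&+ o\Big(1 + n^{1/2} \sum_{j=0}^{m} \|\hat\beta_{k(j)} - \beta_k\|\Big).
\end{aligned} \tag{17}$$

In analogy to (12) of Jin et al. (2003), the following result follows from (16)

$$\begin{aligned}
\tilde U^*_{k,\phi}(\hat\beta^*_{k(m)}; \hat\beta^*_{k(m-1)}) ={}& \sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k\big(\hat\beta_{k(m-1)};\ t + (\hat\beta_{k(m)} - \hat\beta_{k(m-1)})'X_{ik}\big)\, S_k^{*(0)}(\hat\beta_{k(m)}; t)\{X_{ik} - \bar X^*_k(\hat\beta_{k(m)}; t)\}\, dN_{ik}(\hat\beta_{k(m)}; t)\, Z_i \\
&+ n(A_{k,\phi} + D_{k,\phi})(\hat\beta^*_{k(m)} - \hat\beta_{k(m)}) - nD_{k,\phi}(\hat\beta^*_{k(m-1)} - \hat\beta_{k(m-1)}) + d_{k,\phi},
\end{aligned} \tag{18}$$

where

$$d_{k,\phi} = \tilde o\Big(n^{1/2} + n \sum_{j=0}^{m} \big\{\|\hat\beta^*_{k(j)} - \beta_k\| + \|\hat\beta_{k(j)} - \beta_k\|\big\}\Big).$$

Under condition 5, up to an asymptotically negligible term, the first term on the right-hand side of (18) can be written as

$$\sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k(\hat\beta_{k(m-1)}; t)\, S_k^{*(0)}(\hat\beta_{k(m)}; t)\{X_{ik} - \bar X^*_k(\hat\beta_{k(m)}; t)\}\, dN_{ik}(\hat\beta_{k(m)}; t)\, Z_i + nD_{k,\phi}(\hat\beta_{k(m)} - \hat\beta_{k(m-1)}),$$

or

$$\sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k(\hat\beta_{k(m-1)}; t)\, S_k^{*(0)}(\hat\beta_{k(m)}; t)\{X_{ik} - \bar X^*_k(\hat\beta_{k(m)}; t)\}\, dM_{ik}(\hat\beta_{k(m)}; t)\, Z_i + nD_{k,\phi}(\hat\beta_{k(m)} - \hat\beta_{k(m-1)}),$$

which, up to order $\tilde o(n^{1/2})$, is equivalent to

$$\sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k(\hat\beta_{k(m-1)}; t)\, S_k^{(0)}(\hat\beta_{k(m)}; t)\{X_{ik} - \bar X_k(\hat\beta_{k(m)}; t)\}\, dM_{ik}(\hat\beta_{k(m)}; t)\, Z_i + nD_{k,\phi}(\hat\beta_{k(m)} - \hat\beta_{k(m-1)}).$$

The equivalence between the last two expressions follows from a decomposition similar to (13). On the other hand, $\tilde U_{k,\phi}(\hat\beta_{k(m)}; \hat\beta_{k(m-1)})$ can be expressed as

$$\sum_{i=1}^{n} \int_{-\infty}^{\infty} \psi_k(\hat\beta_{k(m-1)}; t)\, S_k^{(0)}(\hat\beta_{k(m)}; t)\{X_{ik} - \bar X_k(\hat\beta_{k(m)}; t)\}\, dM_{ik}(\hat\beta_{k(m)}; t) + nD_{k,\phi}(\hat\beta_{k(m)} - \hat\beta_{k(m-1)})$$

plus an asymptotically negligible term. Thus, the subtraction of $\tilde U_{k,\phi}(\hat\beta_{k(m)}; \hat\beta_{k(m-1)})$ from the right-hand side of (18) yields

$$\tilde U^*_{k,\phi}(\hat\beta^*_{k(m)}; \hat\beta^*_{k(m-1)}) = \sum_{i=1}^{n} \tilde u_{ik,\phi}(Z_i - 1) + n(A_{k,\phi} + D_{k,\phi})(\hat\beta^*_{k(m)} - \hat\beta_{k(m)}) - nD_{k,\phi}(\hat\beta^*_{k(m-1)} - \hat\beta_{k(m-1)}) + d_{k,\phi},$$

where

$$\tilde u_{ik,\phi} = \int_{-\infty}^{\infty} \psi_k(\hat\beta_{k(m-1)}; t)\, S_k^{(0)}(\hat\beta_{k(m)}; t)\{X_{ik} - \bar X_k(\hat\beta_{k(m)}; t)\}\, dM_{ik}(\hat\beta_{k(m)}; t).$$

Therefore,

$$n^{1/2}(\hat\beta^*_{k(m)} - \hat\beta_{k(m)}) = -n^{-1/2} \sum_{i=1}^{n} \Big(\big[I - \{(A_{k,\phi} + D_{k,\phi})^{-1} D_{k,\phi}\}^m\big] A_{k,\phi}^{-1} \tilde u_{ik,\phi} + \{(A_{k,\phi} + D_{k,\phi})^{-1} D_{k,\phi}\}^m A_{k,G}^{-1} \tilde u_{ik,G}\Big)(Z_i - 1) + n^{-1/2} d_{k,\phi}. \tag{19}$$

By comparing (17) and (19), we conclude that the conditional distribution of $n^{1/2}(\hat\beta^*_{1(m)} - \hat\beta_{1(m)}, \ldots, \hat\beta^*_{K(m)} - \hat\beta_{K(m)})$ given $\mathcal F_n$ converges almost surely to the limiting distribution of $n^{1/2}(\hat\beta_{1(m)} - \beta_1, \ldots, \hat\beta_{K(m)} - \beta_K)$.

Proof of theorem 3
As in the proof of theorem 1, the convexity of the loss functions, $L_G$ and $L^*_G$, together with the non-singularity of the second derivative of their common limit under condition 8, implies that both $\hat\beta_G$ and $\hat\beta^*_G$ are strongly consistent.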

We can express (5) as

$$U_\phi(\beta) = \sum_{i=1}^{n} \int_{0}^{\infty} \phi(\beta; t)\{X_i - \bar X(\beta; t)\}\, dM_i(\beta; t).$$

Under model (4), $E\{M_i(\beta_0; t)\} = 0$ $(i = 1, \ldots, n)$. It follows from the functional central limit theorem (Pollard, 1990, p. 53) that $n^{-1/2} \sum_{i=1}^{n} X_i M_i(\beta_0; \cdot)$ and $n^{-1/2} \sum_{i=1}^{n} M_i(\beta_0; \cdot)$ converge to zero-mean Gaussian processes. By the uniform strong law of large numbers, $S^{(r)}(\beta; t) \to_{a.s.} s^{(r)}(\beta; t)$ $(r = 0, 1)$ uniformly in $\beta$ and $t$. It then follows from lemma 1 that

$$n^{-1/2} U_\phi(\beta_0) = n^{-1/2} \sum_{i=1}^{n} u_{i,\phi} + o_p(1).$$

In view of this equation, the multivariate central limit theorem implies that $n^{-1/2} U_G(\beta_0)$ converges weakly to a zero-mean normal random vector with covariance matrix $V_G$.

It can be shown through algebraic manipulations that the derivative of $L^*_G(\beta)$ takes the form

$$U^*_G(\beta) = \sum_{i=1}^{n} \int_{0}^{\infty} S^{*(0)}(\beta; t)\{X_i - \bar X^*(\beta; t)\}\, dM_i(\beta; t)\, Z_i,$$

where $\bar X^*(\beta; t) = S^{*(1)}(\beta; t)/S^{*(0)}(\beta; t)$, and

$$S^{*(r)}(\beta; t) = n^{-1} \sum_{j=1}^{n} I(C_j e^{-\beta'X_j} \ge t)\, X_j^{r} Z_j \quad (r = 0, 1).$$

Let $\mathcal F_n$ denote the σ-algebra generated by $(C_i, T_{ik}, X_i)$ $(T_{ik} \le C_i;\ i = 1, \ldots, n)$. By the arguments leading to (14),

$$U^*_G(\hat\beta_G) = \sum_{i=1}^{n} \int_{0}^{\infty} S^{(0)}(\hat\beta_G; t)\{X_i - \bar X(\hat\beta_G; t)\}\, dM_i(\hat\beta_G; t)\,(Z_i - 1) + \tilde o(n^{1/2}). \tag{20}$$

Thus, the multivariate central limit theorem implies that the conditional distribution of $n^{-1/2} U^*_G(\hat\beta_G)$ converges almost surely to a zero-mean normal random vector with covariance matrix $V_G$. As both $U_G(\beta)$ and $U^*_G(\beta)$ are functionals of empirical processes, we can establish under conditions 6-8 the asymptotic linearities for $U_G(\beta)$ and $U^*_G(\beta)$ similar to (7) and (11). As $E(Z_i) = 1$ $(i = 1, \ldots, n)$, the slope matrices in the two expansions are identical. It follows that the conditional distribution of $n^{1/2}(\hat\beta^*_G - \hat\beta_G)$ given $\mathcal F_n$ converges almost surely to the limiting distribution of $n^{1/2}(\hat\beta_G - \beta_0)$.

Proof of theorem 4
The strong consistency of $\hat\beta_{(m)}$ and $\hat\beta^*_{(m)}$ again follows from the convexity arguments. Note that $\hat\beta_{(m)}$ and $\hat\beta^*_{(m)}$ are the roots of

$$\tilde U_\phi(\beta; \hat\beta_{(m-1)}) = \sum_{i=1}^{n} \int_{0}^{\infty} \psi\big(\hat\beta_{(m-1)};\ t\, e^{(\beta - \hat\beta_{(m-1)})'X_i}\big)\, S^{(0)}(\beta; t)\{X_i - \bar X(\beta; t)\}\, dN_i(\beta; t),$$

$$\tilde U^*_\phi(\beta; \hat\beta^*_{(m-1)}) = \sum_{i=1}^{n} \int_{0}^{\infty} \psi\big(\hat\beta^*_{(m-1)};\ t\, e^{(\beta - \hat\beta^*_{(m-1)})'X_i}\big)\, S^{*(0)}(\beta; t)\{X_i - \bar X^*(\beta; t)\}\, dN_i(\beta; t)\, Z_i$$

respectively. Under condition 10, we can take the expansion of $\psi$ with respect to its second argument at $t$. In doing so, the arguments for establishing (17) and (19), together with theorem 2 of Lin et al. (1998), can be used to show that

$$n^{1/2}(\hat\beta_{(m)} - \beta_0) = -n^{-1/2} \sum_{i=1}^{n} \Big(\big[I - \{(A_\phi + D_\phi)^{-1} D_\phi\}^m\big] A_\phi^{-1} u_{i,\phi} + \{(A_\phi + D_\phi)^{-1} D_\phi\}^m A_G^{-1} u_{i,G}\Big) + o\Big(1 + n^{1/2} \sum_{j=0}^{m} \|\hat\beta_{(j)} - \beta_0\|\Big),$$


More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Multistate models and recurrent event models

Multistate models and recurrent event models Multistate models Multistate models and recurrent event models Patrick Breheny December 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Multistate models In this final lecture,

More information

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Exercises. (a) Prove that m(t) =

Exercises. (a) Prove that m(t) = Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for

More information

4. Comparison of Two (K) Samples

4. Comparison of Two (K) Samples 4. Comparison of Two (K) Samples K=2 Problem: compare the survival distributions between two groups. E: comparing treatments on patients with a particular disease. Z: Treatment indicator, i.e. Z = 1 for

More information

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. *

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. * Least Absolute Deviations Estimation for the Accelerated Failure Time Model Jian Huang 1,2, Shuangge Ma 3, and Huiliang Xie 1 1 Department of Statistics and Actuarial Science, and 2 Program in Public Health

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Progress, Updates, Problems William Jen Hoe Koh May 9, 2013 Overview Marginal vs Conditional What is TMLE? Key Estimation

More information

Linear life expectancy regression with censored data

Linear life expectancy regression with censored data Linear life expectancy regression with censored data By Y. Q. CHEN Program in Biostatistics, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, U.S.A.

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Panel Count Data Regression with Informative Observation Times

Panel Count Data Regression with Informative Observation Times UW Biostatistics Working Paper Series 3-16-2010 Panel Count Data Regression with Informative Observation Times Petra Buzkova University of Washington, buzkova@u.washington.edu Suggested Citation Buzkova,

More information

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time ASYMPTOTIC PROPERTIES OF THE GMLE WITH CASE 2 INTERVAL-CENSORED DATA By Qiqing Yu a;1 Anton Schick a, Linxiong Li b;2 and George Y. C. Wong c;3 a Dept. of Mathematical Sciences, Binghamton University,

More information

Goodness-of-fit test for the Cox Proportional Hazard Model

Goodness-of-fit test for the Cox Proportional Hazard Model Goodness-of-fit test for the Cox Proportional Hazard Model Rui Cui rcui@eco.uc3m.es Department of Economics, UC3M Abstract In this paper, we develop new goodness-of-fit tests for the Cox proportional hazard

More information

Likelihood ratio confidence bands in nonparametric regression with censored data

Likelihood ratio confidence bands in nonparametric regression with censored data Likelihood ratio confidence bands in nonparametric regression with censored data Gang Li University of California at Los Angeles Department of Biostatistics Ingrid Van Keilegom Eindhoven University of

More information

Method of Conditional Moments Based on Incomplete Data

Method of Conditional Moments Based on Incomplete Data , ISSN 0974-570X (Online, ISSN 0974-5718 (Print, Vol. 20; Issue No. 3; Year 2013, Copyright 2013 by CESER Publications Method of Conditional Moments Based on Incomplete Data Yan Lu 1 and Naisheng Wang

More information

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data Outline Frailty modelling of Multivariate Survival Data Thomas Scheike ts@biostat.ku.dk Department of Biostatistics University of Copenhagen Marginal versus Frailty models. Two-stage frailty models: copula

More information

Models for Multivariate Panel Count Data

Models for Multivariate Panel Count Data Semiparametric Models for Multivariate Panel Count Data KyungMann Kim University of Wisconsin-Madison kmkim@biostat.wisc.edu 2 April 2015 Outline 1 Introduction 2 3 4 Panel Count Data Motivation Previous

More information

Full likelihood inferences in the Cox model: an empirical likelihood approach

Full likelihood inferences in the Cox model: an empirical likelihood approach Ann Inst Stat Math 2011) 63:1005 1018 DOI 10.1007/s10463-010-0272-y Full likelihood inferences in the Cox model: an empirical likelihood approach Jian-Jian Ren Mai Zhou Received: 22 September 2008 / Revised:

More information

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series Willa W. Chen Rohit S. Deo July 6, 009 Abstract. The restricted likelihood ratio test, RLRT, for the autoregressive coefficient

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

A Bivariate Weibull Regression Model

A Bivariate Weibull Regression Model c Heldermann Verlag Economic Quality Control ISSN 0940-5151 Vol 20 (2005), No. 1, 1 A Bivariate Weibull Regression Model David D. Hanagal Abstract: In this paper, we propose a new bivariate Weibull regression

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS

SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS Statistica Sinica 2 (21), 853-869 SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS Zhangsheng Yu and Xihong Lin Indiana University and Harvard School of Public

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississip

TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississip TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississippi University, MS38677 K-sample location test, Koziol-Green

More information

Machine Learning Brett Bernstein. Recitation 1: Gradients and Directional Derivatives

Machine Learning Brett Bernstein. Recitation 1: Gradients and Directional Derivatives Machine Learning Brett Bernstein Recitation 1: Gradients and Directional Derivatives Intro Question 1 We are given the data set (x 1, y 1 ),, (x n, y n ) where x i R d and y i R We want to fit a linear

More information

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,

More information

EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL

EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL Statistica Sinica 22 (2012), 295-316 doi:http://dx.doi.org/10.5705/ss.2010.190 EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL Mai Zhou 1, Mi-Ok Kim 2, and Arne C.

More information

1 Introduction. 2 Residuals in PH model

1 Introduction. 2 Residuals in PH model Supplementary Material for Diagnostic Plotting Methods for Proportional Hazards Models With Time-dependent Covariates or Time-varying Regression Coefficients BY QIQING YU, JUNYI DONG Department of Mathematical

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016 Statistics 255 - Survival Analysis Presented March 8, 2016 Dan Gillen Department of Statistics University of California, Irvine 12.1 Examples Clustered or correlated survival times Disease onset in family

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

Integrated likelihoods in survival models for highlystratified

Integrated likelihoods in survival models for highlystratified Working Paper Series, N. 1, January 2014 Integrated likelihoods in survival models for highlystratified censored data Giuliana Cortese Department of Statistical Sciences University of Padua Italy Nicola

More information

11 Survival Analysis and Empirical Likelihood

11 Survival Analysis and Empirical Likelihood 11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

On the generalized maximum likelihood estimator of survival function under Koziol Green model

On the generalized maximum likelihood estimator of survival function under Koziol Green model On the generalized maximum likelihood estimator of survival function under Koziol Green model By: Haimeng Zhang, M. Bhaskara Rao, Rupa C. Mitra Zhang, H., Rao, M.B., and Mitra, R.C. (2006). On the generalized

More information

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

More information

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Statistica Sinica 20 (2010), 441-453 GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Antai Wang Georgetown University Medical Center Abstract: In this paper, we propose two tests for parametric models

More information

LSS: An S-Plus/R program for the accelerated failure time model to right censored data based on least-squares principle

LSS: An S-Plus/R program for the accelerated failure time model to right censored data based on least-squares principle computer methods and programs in biomedicine 86 (2007) 45 50 journal homepage: www.intl.elsevierhealth.com/journals/cmpb LSS: An S-Plus/R program for the accelerated failure time model to right censored

More information

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes: Practice Exam 1 1. Losses for an insurance coverage have the following cumulative distribution function: F(0) = 0 F(1,000) = 0.2 F(5,000) = 0.4 F(10,000) = 0.9 F(100,000) = 1 with linear interpolation

More information

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics.

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics. Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics. Dragi Anevski Mathematical Sciences und University November 25, 21 1 Asymptotic distributions for statistical

More information

On robust and efficient estimation of the center of. Symmetry.

On robust and efficient estimation of the center of. Symmetry. On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract

More information

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,

More information

Multivariate Regression Analysis

Multivariate Regression Analysis Matrices and vectors The model from the sample is: Y = Xβ +u with n individuals, l response variable, k regressors Y is a n 1 vector or a n l matrix with the notation Y T = (y 1,y 2,...,y n ) 1 x 11 x

More information