Efficiency Comparison Between Mean and Log-rank Tests for Recurrent Event Time Data


Wenbin Lu
Department of Statistics, North Carolina State University, Raleigh, NC

Summary. Recurrent event time data are common in biomedical follow-up studies, in which a study subject may experience repeated occurrences of an event of interest. In this paper, we evaluate two popular nonparametric tests for recurrent event time data in terms of their relative efficiency. One is the log-rank test for classical survival data and the other is a more recently developed nonparametric test based on comparing mean recurrent rates. We show analytically that, somewhat surprisingly, the log-rank test that only makes use of time to the first occurrence could be more efficient than the test for mean occurrence rates that makes use of all available recurrence times, provided that subject-to-subject variation of recurrence times is large. Explicit formulas are derived for asymptotic relative efficiencies under the frailty model. The findings are demonstrated via extensive simulations.

Key words: Asymptotic relative efficiency, frailty model, log-rank test, proportional mean test, recurrent events, robust variance estimation.

1. Introduction

The log-rank test (Mantel and Haenszel, 1959) is perhaps the most widely used method in two-sample comparisons of treatment efficacies for time-to-event data. It is simple to use, nonparametric in nature, highly efficient under suitable assumptions, and incorporates the
usual right censorship without complication. It is also closely related to the Cox proportional hazards regression model (Cox, 1972). In fact, it is the score test of the partial likelihood (Cox, 1975) under the Cox model assumption. For historical reasons in the development of rank tests and for efficiency considerations under nonproportional hazards, weighted log-rank tests, notably the Gehan, the Peto-Prentice and the $G^\rho$-family, have also received a great deal of attention; cf. Gehan (1965), Peto and Peto (1972), Prentice (1978), Harrington and Fleming (1982) and Fleming and Harrington (1991).

In many biomedical follow-up studies, as well as studies in other disciplines such as economics, sociology and software engineering, subjects often experience repeated occurrences of the event of interest, i.e. recurrence of the same type of event. Examples include repeated infections of certain diseases, attacks of asthma and epileptic seizures, among others. In fact, the present investigation was motivated by studies sponsored by the R.W. Johnson Pharmaceutical Research Institute on the evaluation of treatments for epileptic seizures, which are known to have large subject-to-subject variation in terms of epileptic seizure counts.

A major development in the analysis of recurrent event time data is due to Andersen and Gill (1982), who introduced a multiplicative intensity model for multivariate counting processes which mimics the Cox proportional hazards model for failure time data. Under their model assumption, the method of partial likelihood can be used to obtain semiparametrically efficient estimates of the regression parameters. However, due to within-subject dependency, the requirement of a multiplicative intensity is likely to be too stringent unless complicated time-dependent covariate adjustments are included.
To avoid modelling the intensity, Pepe and Cai (1993) proposed the use of rate functions for recurrent event time data so that the regression relationship is through the rate function instead of the intensity. Further studies along this line can be found in Lawless and Nadeau (1995) and Lawless, Nadeau and Cook (1997), who also
suggested modelling the regression through the mean functions of counting processes. For a comprehensive discussion of the rate and mean function models, the asymptotic theory thereof, as well as their relationship to the Andersen-Gill multiplicative intensity model, we refer to Lin, Wei, Yang and Ying (2000). Additional approaches to dealing with recurrent event time data can be found in Wang and Chang (1999) and Chang and Wang (1999).

An important ingredient in the mean/rate function approach to recurrent event time data is that the simple score function derived under the Andersen-Gill multiplicative intensity model assumption is unbiased, at least asymptotically, and can therefore be used for hypothesis testing and parameter estimation, provided that robust variance estimation is adopted. In the case of the two-sample problem of testing treatment difference, the resulting test is nonparametric in the sense that its validity does not require any parametric or semiparametric model assumption.

Alternatively, the log-rank test can also be used for testing treatment difference when only the time to the first event is used and subsequent event times are ignored. At first glance, it appears that such an approach must be far inferior to the test based on the counting process and its mean/rate function model, since the latter utilizes not only the first, but all subsequent event times. Indeed, under the Andersen-Gill multiplicative intensity model assumption (for local alternatives), the latter is asymptotically semiparametrically efficient, analogously to the log-rank test being efficient for failure time data under the Cox model assumption. This can also be seen intuitively: under the Andersen-Gill multiplicative intensity model, the Fisher information, and thus the effective sample size, is proportional to the total number of events that all study subjects experience. The more events are included, the larger the resulting Fisher information.
With this in mind, it is somewhat surprising to find that the mean/rate function-based method for recurrent event time data as described in Lin et al. (2000) is less efficient than
the log-rank test with time-to-the-first-event data for testing treatment difference when there is large patient-to-patient variability. The main focus of this paper is to show, both analytically and numerically, that when the subject-to-subject variability is too high, the mean/rate function-based test could be quite inefficient, even less efficient than the log-rank test using a single event time only. Under a quite general setting, the asymptotic relative efficiency is expressed analytically, resulting in a threshold-type criterion for efficiency comparison of the two methods.

The rest of the paper is organized as follows. In the next section, general notation and setup are described and the main theoretical findings are given. Section 3 is devoted to simulation studies, which reinforce the theoretical results. Possible extensions and discussions of the results are given in Section 4. The Appendix contains mathematical derivations of the theoretical results.

2. Main Results

In this section, we first introduce the notation and basic assumptions for recurrent event time data. We then describe the two nonparametric tests: the mean test, based on comparing the total number of recurrent events adjusted for censoring, and the log-rank test with the first event time. Finally, we derive asymptotic efficiencies in general forms as well as under specific model assumptions.

2.1. Notation and Assumptions

Following Lin et al. (2000), let $N_i^*(t)$ be the number of events experienced by the $i$th subject during the time interval $[0, t]$, and $Z_i$ an indicator, taking values 0 or 1, indicating which of the two treatments is received by the subject. Thus $dN_i^*(t) = N_i^*(t) - N_i^*(t-)$ indicates whether or not an event occurs at $t$. The censoring time is denoted by $C_i$. In other words, $N_i(t) = N_i^*(t \wedge C_i)$ is the observed counting process, where $a \wedge b = \min(a, b)$. So the observations consist of $\{N_i(\cdot), C_i, Z_i;\ i = 1, \ldots, n\}$. Throughout, censoring time $C_i$ is assumed to be conditionally
independent of $N_i^*(\cdot)$ given $Z_i$. Clearly, the conditional independence allows heterogeneous censorship between the two treatment groups. Unlike the counting processes arising from single failure time data, which only take values 0 or 1, $N_i^*$ can take integer values greater than 1. Let $T_{ik} = \inf\{t : N_i^*(t) = k\}$, which defines the occurrence time of the $k$th event on the $i$th subject, $k \ge 1$ and $i = 1, \ldots, n$. For each $i$, $T_{ik}$, $k \ge 1$, are censored by a common censoring time $C_i$. So, expressed in terms of event times rather than counting processes, the observations consist of $\{\tilde T_{ik}, \delta_{ik}, Z_i;\ k \ge 1,\ i = 1, \ldots, n\}$, where $\tilde T_{ik} = T_{ik} \wedge C_i$, $\delta_{ik} = I(T_{ik} \le C_i)$ and $I(\cdot)$ is the usual indicator function.

The null hypothesis of no treatment difference between the two comparison groups is tantamount to the counting processes $N_i^*$, $i = 1, \ldots, n$, being independent and identically distributed (iid). By definition, it also entails that for each fixed $k$, $T_{ik}$, $i = 1, \ldots, n$, are also iid. In particular, the first event times $T_{i1}$, $i = 1, \ldots, n$, are iid random variables.

2.2. Nonparametric Test Statistics

The proportional mean function model specifies that $E\{N_i^*(t) \mid Z_i\} = e^{\beta_0 Z_i}\mu_0(t)$, where $\mu_0$ is the baseline mean function and $\beta_0$ the true value of the regression parameter. Under this model assumption, an unbiased Cox-type estimating function is

$$U_R(\beta) = \sum_{i=1}^n \int_0^\infty \{Z_i - \bar Z(\beta, t)\}\, dN_i(t), \tag{2.1}$$

where $\bar Z(\beta, t) = \sum_{i=1}^n I(C_i \ge t) e^{\beta Z_i} Z_i \big/ \sum_{i=1}^n I(C_i \ge t) e^{\beta Z_i}$. The case $\beta_0 = 0$ corresponds to the null hypothesis of no treatment difference. From the results of Pepe and Cai (1993), Lawless and Nadeau (1995) and Lin et al. (2000), it follows that under the null hypothesis $n^{-1/2} U_R(0)$ is asymptotically normal with mean 0 and variance consistently estimated by

$$\hat V = n^{-1} \sum_{i=1}^n \left[ \int_0^\infty \{Z_i - \bar Z(t)\}\{dN_i(t) - I(C_i \ge t)\, d\hat\mu_0(t)\} \right]^2,$$

where $\bar Z(t) = \sum_{i=1}^n Z_i I(C_i \ge t) \big/ \sum_{i=1}^n I(C_i \ge t)$ and $\hat\mu_0(t) = \int_0^t \sum_{i=1}^n dN_i(s) \big/ \sum_{i=1}^n I(C_i \ge s)$. Let $U_R = U_R(0)$. A two-sided $\alpha$-level test rejects the null hypothesis if $U_R^2/(n\hat V) > \chi^2_\alpha(1)$, where $\chi^2_\alpha(1)$ is the upper $\alpha$ quantile of the $\chi^2(1)$ distribution. Again, note that the validity of this test does not require the proportional mean model assumption.

Alternatively, the classical log-rank test using the first event times is also a valid nonparametric test. The log-rank statistic has the form

$$U_L = \sum_{i=1}^n \int_0^\infty \{Z_i - \bar Z^L(t)\}\, dN_i^L(t), \tag{2.2}$$

where $\bar Z^L(t) = \sum_{i=1}^n Z_i I(T_{i1} \wedge C_i \ge t) \big/ \sum_{i=1}^n I(T_{i1} \wedge C_i \ge t)$ and $N_i^L(t) = \delta_{i1} I(T_{i1} \wedge C_i \le t)$. Its variance under the null is approximately $\sum_{i=1}^n \int_0^\infty \{Z_i - \bar Z^L(t)\}^2\, dN_i^L(t)$, so $U_L^2 \big/ \sum_{i=1}^n \int_0^\infty \{Z_i - \bar Z^L(t)\}^2\, dN_i^L(t)$ approximately follows the $\chi^2(1)$ distribution.

2.3. Efficiency Comparison

To compare the efficiencies of the preceding two nonparametric tests, it is necessary to specify a parametric or a semiparametric model for the alternatives. To this end, we consider the following frailty model, which can be found in Andersen, Borgan, Gill and Keiding (1993) and Lin et al. (2000). Let $\xi_i$ be iid positive random variables that represent the random effect or frailty. For identifiability, assume $E\xi_i = 1$. Let $\sigma^2 = Var(\xi_i)$. For each $i$, conditional on $Z_i$ and $\xi_i$, the counting process $N_i^*(t)$ is assumed to have compensator $\xi_i e^{\beta_0 Z_i}\mu_0(t)$. Thus $dN_i(t)$ has intensity $I(C_i \ge t)\,\xi_i e^{\beta_0 Z_i}\, d\mu_0(t)$. This is a special case of the proportional mean function model, since taking the expectation conditional on $C_i$ and $Z_i$ only produces $I(C_i \ge t)\, e^{\beta_0 Z_i}\, d\mu_0(t)$. As usual, $\xi_i$ incorporates the within-subject dependency. A summary of such dependency is $\sigma^2$. For $\sigma^2 = 0$, the model reduces to the Andersen-Gill multiplicative intensity model and $N_i^*$ has independent increments under a monotone transformation of time. A large value of $\sigma^2$ indicates a high within-subject correlation.
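The two statistics of Section 2.2 can be computed directly from their definitions. The following is a minimal sketch, not the author's code: the function names `mean_test` and `first_event_logrank` and the list-based data layout are assumptions made for illustration, ties among event times are assumed absent, and the quadratic-time loops favour readability over speed.

```python
def mean_test(events, C, Z):
    """Chi-square statistic U_R^2 / (n * V_hat) of the mean test at beta = 0.

    events[i]: observed event times of subject i (all <= C[i]);
    C[i]: censoring time; Z[i]: treatment indicator (0 or 1).
    """
    n = len(C)

    def zbar(t):
        # Share of treated subjects among those still under observation at t,
        # together with the size of that risk set.
        at_risk = [j for j in range(n) if C[j] >= t]
        return sum(Z[j] for j in at_risk) / len(at_risk), len(at_risk)

    pooled = [t for ev in events for t in ev]
    u_r, sum_sq = 0.0, 0.0
    for i in range(n):
        # Integral of {Z_i - Zbar(t)} dN_i(t).
        observed = sum(Z[i] - zbar(t)[0] for t in events[i])
        # Integral of {Z_i - Zbar(t)} I(C_i >= t) d mu_hat(t); mu_hat jumps by
        # 1 / (risk set size) at every pooled event time.
        expected = 0.0
        for s in pooled:
            if s <= C[i]:
                zb, y = zbar(s)
                expected += (Z[i] - zb) / y
        u_r += observed
        sum_sq += (observed - expected) ** 2  # accumulates n * V_hat
    return u_r ** 2 / sum_sq


def first_event_logrank(events, C, Z):
    """Chi-square statistic of the log-rank test based on first event times."""
    n = len(C)
    # X_i = min(T_i1, C_i); delta_i = 1 when the first event is observed.
    X = [min(ev) if ev else C[i] for i, ev in enumerate(events)]
    delta = [1 if ev else 0 for ev in events]
    u_l, var = 0.0, 0.0
    for i in range(n):
        if delta[i]:
            at_risk = [j for j in range(n) if X[j] >= X[i]]
            zb = sum(Z[j] for j in at_risk) / len(at_risk)
            u_l += Z[i] - zb
            var += (Z[i] - zb) ** 2
    return u_l ** 2 / var
```

Either value would be compared with the upper $\alpha$ quantile of the $\chi^2(1)$ distribution (about 3.84 for $\alpha = 0.05$). Both functions divide by zero when no event is observed at all, a case a fuller implementation would need to handle.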

Asymptotic relative efficiency (ARE) is evaluated at contiguous alternatives (Hajek and Sidak, 1967; Serfling, 1980). Let $\beta_0 = \beta_n = b/\sqrt{n}$ be the parameter value under the frailty model as described. For two test statistics, $S_1$ and $S_2$, the ARE of $S_1$ relative to $S_2$ is defined to be

$$ARE(S_1, S_2) = \frac{e_1}{e_2}, \qquad \text{where } e_j = \lim_{n \to \infty} \frac{[E_{\beta_n}(S_j)]^2}{Var_0(S_j)}, \quad j = 1, 2,$$

where $E_{\beta_n}$ is the expectation taken under the contiguous alternatives, and $E_0$ and $Var_0$ are the expectation and variance taken under the null hypothesis ($\beta_0 = 0$), respectively. For simplicity, we assume a balanced allocation, i.e. subjects are evenly divided into the two treatment groups. Let $G_0(t)$ and $G_1(t)$ denote the censoring survival distributions for the treatment groups $Z_i = 0$ and $Z_i = 1$, respectively. Similarly, let $S_0(t)$ and $S_1(t)$ denote the survival functions of the first event times of the two treatment groups. Define

$$\Sigma_R = \sigma^2 E_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\} I(C_i \ge t)\, d\mu_0(t)\right]^2 + E_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\}^2 I(C_i \ge t)\, d\mu_0(t)\right],$$

$$A_R = E_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\}^2 I(C_i \ge t)\, d\mu_0(t)\right],$$

$$\Sigma_L = E_0\left[\int_0^\infty \{Z_i - \mu_Z^L(t)\}^2\, dN_i^L(t)\right],$$

$$A_L = E_0\left(\int_0^\infty \{Z_i - \mu_Z^L(t)\}^2 I(T_{i1} \wedge C_i \ge t)\, d\left[\mu_0(t)\,\frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}}\right]\right), \tag{2.3}$$

where $E_\xi$ is the expectation with respect to the frailty $\xi$ and

$$\mu_Z(t) = \frac{G_1(t)}{G_0(t) + G_1(t)}, \qquad \mu_Z^L(t) = \frac{G_1(t) S_1(t)}{G_0(t) S_0(t) + G_1(t) S_1(t)}.$$

Note that in (2.3), $\mu_Z^L(t) = \mu_Z(t)$ since $S_0(t) = S_1(t)$ under the null hypothesis. The following theorem gives the asymptotic relative efficiency of the mean test, $U_R$, using the counting process data versus the log-rank test, $U_L$, using the first event times only.

Theorem 1. The ARE of $U_R$ relative to $U_L$ can be expressed as

$$ARE(U_R, U_L) = \frac{A_R^2\, \Sigma_L}{A_L^2\, \Sigma_R}. \tag{2.4}$$

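In addition, under the contiguous alternatives $\beta_0 = \beta_n = b/\sqrt{n}$, $ARE(U_R, U_L)$ can be consistently estimated by $(\hat A_R^2 \hat\Sigma_L)/(\hat A_L^2 \hat\Sigma_R)$, where

$$\hat\Sigma_R = \frac{\sigma^2}{n} \sum_{i=1}^n \left[\int_0^\infty \{Z_i - \bar Z(t)\} I(C_i \ge t)\, d\hat\mu_0(t)\right]^2 + \frac{1}{n} \sum_{i=1}^n \left[\int_0^\infty \{Z_i - \bar Z(t)\}^2 I(C_i \ge t)\, d\hat\mu_0(t)\right],$$

$$\hat A_R = \frac{1}{n} \sum_{i=1}^n \left[\int_0^\infty \{Z_i - \bar Z(t)\}^2 I(C_i \ge t)\, d\hat\mu_0(t)\right],$$

$$\hat\Sigma_L = \frac{1}{n} \sum_{i=1}^n \left[\int_0^\infty \{Z_i - \bar Z^L(t)\}^2\, dN_i^L(t)\right],$$

$$\hat A_L = \frac{1}{n} \sum_{i=1}^n \left(\int_0^\infty \{Z_i - \bar Z^L(t)\}^2 I(T_{i1} \wedge C_i \ge t)\, d\left[\frac{\sum_{j=1}^n I\{N_j(t) = 1\}}{\sum_{j=1}^n I\{N_j(t) = 0\}}\right]\right).$$

The proof of Theorem 1 is given in the Appendix. Based on Theorem 1, the log-rank test is more powerful than the mean test in terms of asymptotic relative efficiency if $(A_R^2 \Sigma_L)/(A_L^2 \Sigma_R) < 1$. In practice, we may use $(\hat A_R^2 \hat\Sigma_L)/(\hat A_L^2 \hat\Sigma_R) < 1$, or equivalently,

$$\sigma^2 > \frac{n \hat A_R^2 \hat\Sigma_L / \hat A_L^2 - \sum_{i=1}^n \left[\int_0^\infty \{Z_i - \bar Z(t)\}^2 I(C_i \ge t)\, d\hat\mu_0(t)\right]}{\sum_{i=1}^n \left[\int_0^\infty \{Z_i - \bar Z(t)\} I(C_i \ge t)\, d\hat\mu_0(t)\right]^2},$$

as a threshold indicating that the log-rank test is more efficient.

To simplify the $ARE(U_R, U_L)$ given in Theorem 1, we further assume that $G_0(t) = G_1(t)$, i.e. the two treatment groups have a common censoring distribution. Thus $\mu_Z(t) \equiv 1/2$ and we have the following theorem.

Theorem 2. Suppose $G_0(t) = G_1(t) = G(t)$. Then $ARE(U_R, U_L)$ can be expressed as

$$\frac{E_0(\delta_{i1})\,[E_0\{\mu_0(C_i)\}]^2}{\left(\int_0^\infty E_0\{I(T_{i1} \wedge C_i \ge t)\}\, d\left[\mu_0(t)\,\frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}}\right]\right)^2 \left[\sigma^2 E_0\{\mu_0^2(C_i)\} + E_0\{\mu_0(C_i)\}\right]}. \tag{2.5}$$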
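As a sketch of how (2.5) follows from (2.4), each ingredient of (2.3) can be evaluated with $\mu_Z(t) = \mu_Z^L(t) \equiv 1/2$. The shorthand $h(t)$ is introduced here only for this illustration and is not part of the paper's notation.

```latex
\begin{align*}
\{Z_i - \tfrac12\}^2 &= \tfrac14, \qquad
  \int_0^\infty I(C_i \ge t)\,d\mu_0(t) = \mu_0(C_i), \qquad
  h(t) = \frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}},\\
A_R &= \tfrac14\,E_0\{\mu_0(C_i)\},\\
\Sigma_R &= \tfrac14\left[\sigma^2 E_0\{\mu_0^2(C_i)\} + E_0\{\mu_0(C_i)\}\right],\\
\Sigma_L &= \tfrac14\,E_0(\delta_{i1}),\\
A_L &= \tfrac14 \int_0^\infty E_0\{I(T_{i1}\wedge C_i \ge t)\}\,d\{\mu_0(t)h(t)\},\\
\frac{A_R^2\,\Sigma_L}{A_L^2\,\Sigma_R}
 &= \frac{\tfrac1{64}\,[E_0\{\mu_0(C_i)\}]^2\,E_0(\delta_{i1})}
         {\tfrac1{64}\left(\int_0^\infty E_0\{I(T_{i1}\wedge C_i \ge t)\}\,d\{\mu_0(t)h(t)\}\right)^2
          \left[\sigma^2 E_0\{\mu_0^2(C_i)\} + E_0\{\mu_0(C_i)\}\right]}.
\end{align*}
```

The common factor $1/64$ cancels, which leaves exactly the expression in (2.5).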

Recall that $\xi$ is the frailty with $E\xi = 1$ and $Var(\xi) = \sigma^2$. When $\sigma^2 = 0$, i.e. $\xi \equiv 1$ or there is no random effect, (2.5) reduces to

$$ARE(U_R, U_L) = \frac{E_0\{\mu_0(C_i)\}}{E_0\{\mu_0(T_{i1} \wedge C_i)\}} \tag{2.6}$$

since $E_0(\delta_{i1}) = E_0\{\xi\mu_0(T_{i1} \wedge C_i)\} = E_0\{\mu_0(T_{i1} \wedge C_i)\}$. It is clear that $ARE(U_R, U_L) > 1$, which is not surprising since $U_R(\beta)$ in this case is a semiparametrically efficient estimating function for $\beta$.

We now argue that in the other extreme case, where $\sigma^2$ is large, it is possible that the log-rank test ($U_L$) is more efficient than the mean test ($U_R$). To avoid technical complications, we will require that the frailty $\xi$ be bounded away from 0 as $\sigma^2$ becomes large. Our result is summarized by the following corollary.

Corollary 1. Under the same setting as in Theorem 2, suppose in addition that $\xi \ge c_0$, where $c_0$ is a positive constant less than 1. Then, as $\sigma^2$ becomes large, (2.5) approaches 0. In particular, $ARE(U_R, U_L) < 1$ for all large $\sigma^2$.

Corollary 1 is not obvious from the form of (2.5). A proof is given in the Appendix. The constraint imposed on the random effect $\xi$ ensures that a certain proportion of subjects experience at least one event. It certainly holds if $\xi$ is a mixture of a constant with a positive random variable. For example, we can take $\xi = p + (1 - p)\eta$, where $\eta$ is a Gamma random variable with mean 1 and variance $\sigma^2/(1 - p)^2$, with a constant $p \in (0, 1)$.

Example 1. As an illustration, consider the case in which the censoring time is uniformly distributed over the interval $[0, 3]$ and the baseline mean function is $\mu_0(t) = t$. The frailty $\xi = p + (1 - p)\eta$,
as just described. Then it can be shown that the asymptotic relative efficiency is

$$ARE(U_R, U_L) = \frac{3\left[1 - \frac{1}{3}\int_0^3 \left\{1 + \frac{\sigma^2 t}{1 - p}\right\}^{-(1-p)^2/\sigma^2} e^{-pt}\, dt\right]}{(4\sigma^2 + 2)\left[\int_0^3 \left(1 - \frac{t}{3}\right)\left\{1 + \frac{\sigma^2 t}{1 - p}\right\}^{-(1-p)^2/\sigma^2} e^{-pt} \left\{p + \frac{1 - p}{(1 + \sigma^2 t/(1 - p))^2}\right\} dt\right]^2}.$$

Figure 1 plots the ARE against $\sigma^2$ with $p = 0.2$. The crossing of the ARE from above to below the horizontal line at 1 agrees with the claim of Corollary 1.

3. Simulation Results

Simulation studies are conducted to compare the efficiencies (power) of the mean test ($U_R$) and the log-rank test ($U_L$), to see how the results of Section 2 hold under realistic sample sizes. To this end, we consider the standard two-arm, parallel and balanced design with $m = n/2$ subjects assigned randomly to each of the two treatment groups, i.e. $Z = 0$ or 1 with the same probability. Recurrent event times for each subject are generated from the frailty model described in Section 2.3, with $\mu_0(t) = t$ and $\xi = p + (1 - p)\eta$, where $p = 0.2$ or $0.5$ and $\eta$ is a gamma random variable with mean 1 and variance $\sigma_\eta^2$. Different values are chosen for $\sigma_\eta^2$, which lead to various choices for the variance $\sigma_\xi^2$ of $\xi$. The null hypothesis is $\beta_0 = 0$ and alternatives are taken to be $\beta_0 = 0.5$ when m = 2 and $\beta_0 = 0.8$ when m = .

Two kinds of follow-up or censoring time distributions are considered here. In the first case, $C$ is generated from uniform$[0, 3]$ as in Lin et al. (2000), which gives on average 1.5 events per subject in the control group ($Z = 0$) and 1.5, 2.47 and 3.34 events per subject for $\beta_0 = 0$, $0.5$ and $0.8$, respectively, in the treatment group ($Z = 1$). In the second case, an exponential distribution with hazard rate 0.8 is considered, which gives on average 1.25 events per subject in the control group and 1.25, 2.06 and 2.78 events per subject for $\beta_0 = 0$, $0.5$ and $0.8$, respectively, in the treatment group. Given the censoring time $C_i$ and the frailty $\xi_i$, the total number of events $K_i$ on the $i$th subject is generated from a Poisson distribution with mean $\xi_i e^{\beta_0 Z_i}\mu_0(C_i)$.
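The data-generating mechanism of this section can be sketched in a few lines. This is an illustrative sketch rather than the simulation code used for the paper, and the function names and arguments are assumptions. Because $\mu_0(t) = t$, the process is, given $\xi_i$, a homogeneous Poisson process with rate $\xi_i e^{\beta_0 Z_i}$, so the sketch draws exponential gaps between events until the censoring time is passed. This is a swapped-in but distributionally equivalent alternative to drawing a Poisson count first. The sketch also places the first $m$ subjects in the control arm instead of randomizing the labels.

```python
import math
import random


def simulate_subject(rng, beta, z, p, var_eta, censoring="uniform"):
    """Return (event_times, censoring_time) for one subject."""
    # Frailty xi = p + (1 - p) * eta, where eta is gamma with mean 1 and
    # variance var_eta (shape 1 / var_eta, scale var_eta).
    eta = rng.gammavariate(1.0 / var_eta, var_eta)
    xi = p + (1.0 - p) * eta
    # Censoring: uniform on [0, 3], or exponential with hazard rate 0.8.
    if censoring == "uniform":
        c = rng.uniform(0.0, 3.0)
    else:
        c = rng.expovariate(0.8)
    # Homogeneous Poisson process with rate xi * exp(beta * z) on [0, c].
    rate = xi * math.exp(beta * z)
    times = []
    t = rng.expovariate(rate)
    while t <= c:
        times.append(t)
        t += rng.expovariate(rate)
    return times, c


def simulate_trial(rng, m, beta, p, var_eta, censoring="uniform"):
    """Balanced two-arm data set with m subjects per arm."""
    events, C, Z = [], [], []
    for z in (0, 1):
        for _ in range(m):
            times, c = simulate_subject(rng, beta, z, p, var_eta, censoring)
            events.append(times)
            C.append(c)
            Z.append(z)
    return events, C, Z
```

With this parametrization the variance of $\xi$ is $(1 - p)^2 \sigma_\eta^2$, and a control subject under uniform censoring experiences 1.5 events on average, in line with the figure quoted above.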
In addition, the actual event times $(T_{i1}, T_{i2}, \ldots, T_{iK_i})$ are the order statistics of a set of $K_i$ independently and identically distributed random variables, and therefore have the following joint density function (Ross, 1983):

$$f_i(t_{i1}, t_{i2}, \ldots, t_{iK_i} \mid K_i, \xi_i, C_i) = K_i! \prod_{j=1}^{K_i} \frac{\mu_0'(t_{ij})}{\mu_0(C_i)},$$

where $\mu_0'(\cdot)$ is the derivative of $\mu_0(\cdot)$.

Results of the simulation studies are summarized in Tables 1 and 2. Each entry in the table was based on simulated data sets. The first table corresponds to the case of uniform censoring time, the second to that of exponential censoring time. They strongly support the theoretical findings of Section 2. In particular, both tables clearly show that the mean recurrence test has higher power than the log-rank test when the variance of $\xi$ is relatively small, which corresponds to small within-subject correlation. On the other hand, the log-rank test becomes more powerful if the variance of $\xi$ is large. The type I errors for both tests are close to their nominal level.

Next, as suggested by the referee, we conduct another set of simulations using the positive stable distribution for the frailty. Note that under the positive stable frailty model, the time to the first event follows the proportional hazards model (Hougaard, 2000), and thus the log-rank test should be most efficient if only the first occurrence times are used. Moreover, the positive stable distribution has infinite variance. Thus, the mean test tends to lose power. As in the first simulation study, we consider the standard two-arm, parallel and balanced design with m = 5 or, and β =,.5,.8 or.. Recurrent event times are generated from the positive stable frailty model using a method similar to that of the first simulation study. Here the positive stable frailties are generated using an R package, rstable. The censoring times are generated from uniform$[0, 3]$. Besides the mean test and the log-rank test for the first occurrence times, we also include the log-rank test based on the second occurrence times for comparison. The simulation
results are summarized in Table 3. Based on the simulation results, the type I errors (under $\beta_0 = 0$) for all three tests are close to their nominal level. But the log-rank test based on the first occurrence times has greater power than the other two tests under all the alternatives and sample sizes considered, which agrees with our expectation.

4. Discussion

This paper deals primarily with the issue of efficiency for two competing nonparametric two-sample tests with recurrent event time data. It is found that the mean test, which uses the Cox-type partial likelihood score (Andersen and Gill, 1982) for the counting process with a robust variance standardization, may not be as efficient as one would have expected when there is large variability among study subjects in terms of the number of events. Indeed, by formulating the variability using a multiplicative frailty, an analytic expression for the relative efficiency can be derived. In addition, based on the analytical result, a threshold is constructed to indicate which method is more efficient.

Finally, because the mean recurrence and the time to first event are two different kinds of endpoint, care must be taken in choosing between them so that no misleading interpretation results. In particular, it is only when a longer time to first event is likely to imply less frequent recurrence that the substitution of the mean test by the log-rank test becomes meaningful. Likewise, if delaying the first event time is the main objective, then the mean test can be used when less frequent recurrence implies a longer time to first event. Of course, when the two tests are exchangeable in terms of interpretation, efficiency will be a main consideration in deciding which one to choose.

In addition, it is also of interest to study the optimal combination of the various log-rank tests based on the first, the second and other sequential event times. This will be investigated in our future research.

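Acknowledgement

The author would like to thank the Editor, Professor Xuming He, and the referee for their insightful and constructive comments. The author also thanks Professor Zhiliang Ying for helpful discussion of the paper. Wenbin Lu's research was partially supported by National Science Foundation Grant DMS

Appendix

Proof of Theorem 1: From Serfling (1980, Section 10.2), we get $ARE(U_R, U_L) = \Gamma_L/\Gamma_R$, where

$$\Gamma_K = \lim_{n \to \infty} \frac{Var_0(n^{-1/2} U_K)}{\{E_{\beta_n}(n^{-1/2} U_K)\}^2}, \qquad K = R, L.$$

Let $\mu_1(t \mid Z_i; \beta_n)$ be the cumulative hazard function for the first event time $T_{i1}$ of the $i$th subject under the contiguous alternative. Then

$$\mu_1(t \mid Z_i; \beta_n) = -\log\{P_{\beta_n}(T_{i1} > t \mid Z_i)\} = -\log[E_\xi\{P_{\beta_n}(T_{i1} > t \mid \xi_i, Z_i)\}] = -\log[E_\xi\{e^{-\xi_i e^{\beta_n Z_i}\mu_0(t)}\}].$$

For the mean test,

$$\begin{aligned}
\lim_{n \to \infty} E_{\beta_n}(n^{-1/2} U_R)
&= \lim_{n \to \infty} n^{-1/2} \sum_{i=1}^n E_0\left[\int_0^\infty \{Z_i - \bar Z(t)\} I(C_i \ge t) e^{\beta_n Z_i}\, d\mu_0(t)\right] \\
&= \lim_{n \to \infty} n^{-1/2} \sum_{i=1}^n E_0\left[\int_0^\infty \{Z_i - \bar Z(t)\} I(C_i \ge t)\{1 + \beta_n Z_i + o(\beta_n)\}\, d\mu_0(t)\right] \\
&= \lim_{n \to \infty} n^{-1/2} \sum_{i=1}^n E_0\left[\int_0^\infty \{Z_i - \bar Z(t)\} Z_i I(C_i \ge t)\, d\mu_0(t)\right] \beta_n \\
&= b E_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\}^2 I(C_i \ge t)\, d\mu_0(t)\right] = b A_R,
\end{aligned}$$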

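where the third equality uses $\sum_{i=1}^n \{Z_i - \bar Z(t)\} I(C_i \ge t) = 0$ for every $t$. By the same identity, $U_R$ is unchanged when $dN_i(t)$ is replaced by $dN_i(t) - I(C_i \ge t)\, d\mu_0(t)$, so that

$$\begin{aligned}
\lim_{n \to \infty} Var_0(n^{-1/2} U_R)
&= \lim_{n \to \infty} Var_0\left[n^{-1/2} \sum_{i=1}^n \int_0^\infty \{Z_i - \mu_Z(t)\}\{dN_i(t) - I(C_i \ge t)\, d\mu_0(t)\}\right] \\
&= \lim_{n \to \infty} n^{-1} \sum_{i=1}^n \left\{ Var_0\left(E_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\}\{dN_i(t) - I(C_i \ge t)\, d\mu_0(t)\} \,\Big|\, \xi_i, Z_i, C_i\right]\right) \right. \\
&\qquad\qquad \left. + E_0\left(Var_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\}\, dN_i(t) \,\Big|\, \xi_i, Z_i, C_i\right]\right) \right\} \\
&= \lim_{n \to \infty} n^{-1} \sum_{i=1}^n \left\{ Var_0\left[(\xi_i - 1)\int_0^\infty \{Z_i - \mu_Z(t)\} I(C_i \ge t)\, d\mu_0(t)\right] \right. \\
&\qquad\qquad \left. + E_0\left(Var_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\}\{dN_i(t) - \xi_i I(C_i \ge t)\, d\mu_0(t)\} \,\Big|\, \xi_i, Z_i, C_i\right]\right) \right\} \\
&= \sigma^2 E_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\} I(C_i \ge t)\, d\mu_0(t)\right]^2 + E_0\left[\int_0^\infty \{Z_i - \mu_Z(t)\}^2 I(C_i \ge t)\, d\mu_0(t)\right] = \Sigma_R.
\end{aligned}$$

Therefore $\Gamma_R = \Sigma_R/(bA_R)^2$. For the log-rank test, we first apply a Taylor expansion to the cumulative hazard function $\mu_1(t \mid Z_i; \beta_n)$ to get $\mu_1(t \mid Z_i; \beta_n) = \mu_{1,0}(t) + Z_i\beta_n\mu_{1,1}(t) + o(\beta_n)$, where

$$\mu_{1,0}(t) = -\log[E_\xi\{e^{-\xi\mu_0(t)}\}] \quad \text{and} \quad \mu_{1,1}(t) = \mu_0(t)\,\frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}}.$$

Thus

$$\begin{aligned}
\lim_{n \to \infty} E_{\beta_n}(n^{-1/2} U_L)
&= \lim_{n \to \infty} n^{-1/2} \sum_{i=1}^n E_0\left[\int_0^\infty \{Z_i - \bar Z^L(t)\} I(T_{i1} \wedge C_i \ge t)\, d\mu_1(t \mid Z_i; \beta_n)\right] \\
&= \lim_{n \to \infty} n^{-1/2} \sum_{i=1}^n E_0\left[\int_0^\infty \{Z_i - \bar Z^L(t)\} I(T_{i1} \wedge C_i \ge t)\{d\mu_{1,0}(t) + Z_i\beta_n\, d\mu_{1,1}(t) + o(\beta_n)\}\right] \\
&= \lim_{n \to \infty} n^{-1/2} \sum_{i=1}^n E_0\left(\int_0^\infty \{Z_i - \bar Z^L(t)\}^2 I(T_{i1} \wedge C_i \ge t)\, \beta_n\, d\left[\mu_0(t)\,\frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}}\right]\right) \\
&= b E_0\left(\int_0^\infty \{Z_i - \mu_Z^L(t)\}^2 I(T_{i1} \wedge C_i \ge t)\, d\left[\mu_0(t)\,\frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}}\right]\right) = b A_L,
\end{aligned}$$

and, since $\sum_{i=1}^n \{Z_i - \bar Z^L(t)\} I(T_{i1} \wedge C_i \ge t) = 0$ allows the same centering of $dN_i^L(t)$ by $I(T_{i1} \wedge C_i \ge t)\, d\mu_{1,0}(t)$,

$$\lim_{n \to \infty} Var_0(n^{-1/2} U_L) = \lim_{n \to \infty} Var_0\left[n^{-1/2} \sum_{i=1}^n \int_0^\infty \{Z_i - \mu_Z^L(t)\}\{dN_i^L(t) - I(T_{i1} \wedge C_i \ge t)\, d\mu_{1,0}(t)\}\right] = E_0\left[\int_0^\infty \{Z_i - \mu_Z^L(t)\}^2\, dN_i^L(t)\right] = \Sigma_L.$$

Hence $\Gamma_L = \Sigma_L/(bA_L)^2$, and (2.4) in Theorem 1 follows. In addition, by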
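the law of large numbers we have, under the contiguous alternatives,

$$n^{-1} \sum_{i=1}^n I\{N_i(t) = k\} \to \{\mu_0(t)\}^k E_\xi\{\xi^k e^{-\xi\mu_0(t)}\}, \qquad k = 0, 1,$$

as $n \to \infty$. It is then easy to show, also by the law of large numbers, that $\Sigma_K$ and $A_K$ ($K = R, L$) can be consistently estimated by $\hat\Sigma_K$ and $\hat A_K$, respectively. Therefore, the remaining part of Theorem 1 also holds.

Proof of Corollary 1: First, $[E_0\{\mu_0(C)\}]^2 / [\sigma^2 E_0\{\mu_0^2(C)\} + E_0\{\mu_0(C)\}]$ is bounded above by $1/\sigma^2$. In addition, $E_0(\delta_{i1}) = P_0(T_{i1} \le C_i) \le 1$. Applying integration by parts, we have

$$\begin{aligned}
\int_0^\infty E_0\{I(T_{i1} \wedge C_i \ge t)\}\, d\left[\mu_0(t)\,\frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}}\right]
&= \int_0^\infty G(t) E_\xi\{e^{-\xi\mu_0(t)}\}\, d\left[\mu_0(t)\,\frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}}\right] \\
&= -\int_0^\infty \mu_0(t)\,\frac{E_\xi\{\xi e^{-\xi\mu_0(t)}\}}{E_\xi\{e^{-\xi\mu_0(t)}\}}\, d[G(t) E_\xi\{e^{-\xi\mu_0(t)}\}],
\end{aligned}$$

which is equivalent to

$$\int_0^\infty E_\xi\{\xi e^{-\xi\mu_0(t)}\}\mu_0(t)\, d\{1 - G(t)\} + \int_0^\infty \frac{[E_\xi\{\xi e^{-\xi\mu_0(t)}\}]^2}{E_\xi\{e^{-\xi\mu_0(t)}\}}\,\mu_0(t) G(t)\, d\mu_0(t).$$

Since $\xi \ge c_0$, we have $E_\xi\{\xi e^{-\xi\mu_0(t)}\} \ge c_0 E_\xi\{e^{-\xi\mu_0(t)}\}$. By Jensen's inequality, $E_\xi\{e^{-\xi\mu_0(t)}\} \ge e^{-\mu_0(t)}$, since $e^{-x}$ is a convex function. Furthermore, $E_\xi\{e^{-\xi\mu_0(t)}\} \le e^{-c_0\mu_0(t)}$. Thus the integral above is bounded below by

$$c_0 \int_0^\infty \mu_0(t) e^{-\mu_0(t)}\, d\{1 - G(t)\} + c_0^2 \int_0^\infty e^{-(2 - c_0)\mu_0(t)}\mu_0(t) G(t)\, d\mu_0(t).$$

Hence

$$ARE(U_R, U_L) \le \sigma^{-2}\left[c_0 \int_0^\infty \mu_0(t) e^{-\mu_0(t)}\, d\{1 - G(t)\} + c_0^2 \int_0^\infty e^{-(2 - c_0)\mu_0(t)}\mu_0(t) G(t)\, d\mu_0(t)\right]^{-2},$$

which goes to 0 as $\sigma^2$ goes to $\infty$. In particular, for all sufficiently large $\sigma^2$, $ARE(U_R, U_L)$ is less than 1.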
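The first bound used in the proof of Corollary 1 is stated without justification. One way to see it, offered here as a short sketch, is through Jensen's inequality applied to $\mu_0(C)$ together with the nonnegativity of $E_0\{\mu_0(C)\}$:

```latex
\begin{align*}
[E_0\{\mu_0(C)\}]^2 &\le E_0\{\mu_0^2(C)\}, \\
\sigma^2 E_0\{\mu_0^2(C)\} + E_0\{\mu_0(C)\} &\ge \sigma^2 E_0\{\mu_0^2(C)\}, \\
\frac{[E_0\{\mu_0(C)\}]^2}{\sigma^2 E_0\{\mu_0^2(C)\} + E_0\{\mu_0(C)\}}
  &\le \frac{E_0\{\mu_0^2(C)\}}{\sigma^2 E_0\{\mu_0^2(C)\}} = \frac{1}{\sigma^2}.
\end{align*}
```

Combined with $E_0(\delta_{i1}) \le 1$ and the lower bound on the integral in the denominator of (2.5), this gives the final inequality of the proof.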

References

Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer-Verlag.
Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. Ann. Statist. 10, 1100-1120.
Chang, S-H. and Wang, M-C. (1999). Conditional regression analysis for recurrence time data. J. Amer. Statist. Assoc. 94.
Cook, R. J., Lawless, J. F. and Nadeau, C. (1996). Robust tests for treatment comparisons based on recurrent event responses. Biometrics 52.
Cox, D. R. (1972). Regression models and life tables (with Discussion). J. R. Statist. Soc. B 34.
Cox, D. R. (1975). Partial likelihood. Biometrika 62.
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. New York: John Wiley and Sons.
Gail, M. H., Santner, T. J. and Brown, C. C. (1980). An analysis of comparative carcinogenesis experiments based on multiple times to tumor. Biometrics 36.
Gehan, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily singly censored samples. Biometrika 52.
Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. New York: Academic Press.
Harrington, D. P. and Fleming, T. R. (1982). A class of rank test procedures for censored survival data. Biometrika 69.
Hougaard, P. (2000). Analysis of Multivariate Survival Data. New York: Springer.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. New York: John Wiley and Sons.
Lawless, J. F. and Nadeau, C. (1995). Some simple robust methods for the analysis of recurrent events. Technometrics 37.
Lawless, J. F., Nadeau, C. and Cook, R. J. (1997). Analysis of mean and rate functions for recurrent events. In Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis, Eds. D. Y. Lin and T. R. Fleming. New York: Springer-Verlag.
Lin, D. Y., Wei, L. J., Yang, I. and Ying, Z. (2000). Robust inferences for the Andersen-Gill counting process model. J. R. Statist. Soc. B 62.
Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. J. Nat. Cancer Inst. 22.
Pepe, M. S. and Cai, J. (1993). Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. J. Amer. Statist. Assoc. 88.
Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures (with Discussion). J. R. Statist. Soc. A 135.
Prentice, R. L. (1978). Linear rank tests with right censored data. Biometrika 65.
Ross, S. M. (1983). Stochastic Processes. New York: Wiley.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley and Sons.
Wang, M-C. and Chang, S-H. (1999). Nonparametric estimation of a recurrent survival function. J. Amer. Statist. Assoc. 94.
Wei, L. J., Lin, D. Y. and Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modelling marginal distributions. J. Amer. Statist. Assoc. 84.

Table 1: Simulation results for gamma frailty model under uniform censoring

m β p $\sigma_\eta^2$ $\sigma_\xi^2$ $POW_L$ (type I error) $POW_R$ (type I error)
(.58).976 (.52) (.5).927 (.57) (.47).853 (.53) (.49).789 (.6) (.5). (.44) (.45).994 (.48) (.37).974 (.48) (.53).968 (.58) (.56).936 (.53) (.45).848 (.6) (.39).76 (.44) (.43).69 (.5) (.54).997 (.46) (.39).967 (.5) (.56).93 (.55) (.43).99 (.62)

$\sigma_\eta^2$ and $\sigma_\xi^2$ are the variances of $\eta$ and $\xi$, respectively. $POW_L$ and $POW_R$ are the powers of the log-rank test and the mean test, respectively. Note that $\xi = p + (1 - p)\eta$.

Table 2: Simulation results for gamma frailty model under exponential censoring

m β p $\sigma_\eta^2$ $\sigma_\xi^2$ $POW_L$ (type I error) $POW_R$ (type I error)
(.52).955 (.47) (.49).86 (.57) (.54).794 (.59) (.43).752 (.44) (.55).999 (.54) (.54).98 (.52) (.5).969 (.47) (.52).943 (.49) (.55).874 (.52) (.59).74 (.58) (.42).687 (.6) (.53).66 (.53) (.47).987 (.59) (.48).938 (.43) (.52).9 (.5) (.43).869 (.48)

The notation is the same as in Table 1.

Table 3: Simulation results for positive stable frailty model under uniform censoring

m β $POW_R$ $POW_L$ $POW_{L2}$

$POW_{L2}$ is the power of the log-rank test based on the second occurrence times.

Figure 1: Asymptotic relative efficiency curve of the mean test vs. the log-rank test (vertical axis: ARE value; horizontal axis: variance of the mixed Gamma frailty).
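The curve in Figure 1 can be approximated numerically from the closed-form ARE of Example 1 (uniform$[0, 3]$ censoring, $\mu_0(t) = t$, $\xi = p + (1 - p)\eta$). The sketch below is illustrative only: the function names are assumptions, and a composite Simpson rule from the standard library stands in for whatever quadrature was used to draw the figure.

```python
import math


def simpson(f, a, b, n=3000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def are_example1(sigma2, p=0.2):
    """ARE of the mean test relative to the log-rank test in Example 1."""
    a = (1 - p) ** 2 / sigma2  # exponent in the gamma Laplace transform
    b = sigma2 / (1 - p)

    def surv(t):
        # E_xi{exp(-xi * t)}: survival function of the first event time.
        return math.exp(-p * t - a * math.log1p(b * t))

    def weight(t):
        # Derivative of t * E_xi{xi e^{-xi t}} / E_xi{e^{-xi t}}.
        return p + (1 - p) / (1 + b * t) ** 2

    numerator = 3 * (1 - simpson(surv, 0.0, 3.0) / 3)
    inner = simpson(lambda t: (1 - t / 3) * surv(t) * weight(t), 0.0, 3.0)
    return numerator / ((4 * sigma2 + 2) * inner ** 2)


if __name__ == "__main__":
    for s2 in (0.01, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"sigma^2 = {s2:5.2f}  ARE = {are_example1(s2):.3f}")
```

As $\sigma^2$ tends to 0 the function approaches the value given by (2.6), $1.5 / \{1 - (1 - e^{-3})/3\} \approx 2.195$, and it falls below 1 for large $\sigma^2$, consistent with Corollary 1.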
