Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Consider a survival study consisting of $n$ independent subjects from $p$ different populations with survival functions $S_1(t), \ldots, S_p(t)$. Suppose that the goal is to test the hypothesis

$$H_0 : S_1(t) = \cdots = S_p(t)$$

based on right-censored failure time data $\{ X_i = \min(T_i, C_i),\ \delta_i = I(X_i = T_i);\ i = 1, \ldots, n \}$, where $T_i$ and $C_i$ denote the failure and censoring times of subject $i$. Let $t_1 < t_2 < \cdots < t_k$ denote the observed failure times and, for $j = 1, \ldots, k$ and $i = 1, \ldots, p$, define

$d_{ij}$ = # of failures at $t_j$ from the $i$th population,
$r_{ij}$ = # of subjects at risk at $t_j$ from the $i$th population,
$d_j$ = # of failures at $t_j$ $(= d_{1j} + \cdots + d_{pj})$,
$r_j$ = # of subjects at risk at $t_j$ $(= r_{1j} + \cdots + r_{pj})$.

To construct a test statistic, consider what happens at time $t_j$. Conditional on the failure and censoring experience up to time $t_j$, under $H_0$, the conditional distribution of $d_{1j}, \ldots, d_{pj}$ given $d_j$ is the hypergeometric distribution

$$P(d_{1j}, \ldots, d_{pj} \mid d_j, r_{1j}, \ldots, r_{pj}) = \frac{\prod_{i=1}^{p} \binom{r_{ij}}{d_{ij}}}{\binom{r_j}{d_j}} .$$

Thus we have

$$w_{ij} = E[\, d_{ij} \mid d_j \,] = r_{ij}\, d_j\, r_j^{-1},$$

$$V^{j}_{ii} = \mathrm{Var}[\, d_{ij} \mid d_j \,] = r_{ij}\,(r_j - r_{ij})\, d_j\,(r_j - d_j)\, r_j^{-2}\,(r_j - 1)^{-1},$$

$$V^{j}_{i_1 i_2} = \mathrm{cov}[\, d_{i_1 j}, d_{i_2 j} \mid d_j \,] = -\, r_{i_1 j}\, r_{i_2 j}\, d_j\,(r_j - d_j)\, r_j^{-2}\,(r_j - 1)^{-1}, \quad i_1 \neq i_2 .$$

Define the statistic $\nu_j = ( d_{1j} - w_{1j}, \ldots, d_{pj} - w_{pj} )^{\top}$ at $t_j$, which has (conditional) mean zero and covariance matrix $V_j = ( V^{j}_{i_1 i_2} )$. The log-rank statistic is defined as the simple summation over failure times, $\nu = \sum_{j=1}^{k} \nu_j = ( D_1 - E_1, \ldots, D_p - E_p )^{\top}$,

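the vector of the observed numbers of failures in each population minus the corresponding vector of the expected numbers of failures, where

$$D_i = \sum_{j=1}^{k} d_{ij}, \qquad E_i = \sum_{j=1}^{k} w_{ij} .$$

Equivalently, the statistic can be written as $\nu = D - E$, where $D = (D_1, \ldots, D_p)^{\top}$ and $E = (E_1, \ldots, E_p)^{\top}$. If the $\nu_j$'s are independent, then

$$E[\nu] = 0, \qquad \mathrm{Var}[\nu] = V = V_1 + \cdots + V_k .$$

The hypothesis $H_0$ can be tested using the statistic $\chi^2 = \nu^{\top} V^{-1} \nu$ based on a $\chi^2_{p-1}$ distribution for large samples. Because the components of $\nu$ sum to zero, $V$ has rank at most $p - 1$, so $V^{-1}$ is understood as a generalized inverse; equivalently, drop one component of $\nu$ and the corresponding row and column of $V$.

If $p = 2$, the test of the hypothesis $H_0$ can be based on the statistic

$$Z = \frac{\sum_{j=1}^{k} ( d_{1j} - r_{1j}\, d_j / r_j )}{\left[ \sum_{j=1}^{k} r_{1j}\,(r_j - r_{1j})\, d_j\,(r_j - d_j)\, r_j^{-2}\,(r_j - 1)^{-1} \right]^{1/2}}$$

with the standard normal distribution for large samples.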
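The two-sample statistic $Z$ can be computed directly from the counts $d_{1j}$, $r_{1j}$, $d_j$ and $r_j$. The following is a minimal sketch in plain Python, not a reference implementation: the function name `logrank_two_sample` and its argument layout are assumptions made for illustration, groups are assumed to be coded 1 and 2, and at least one failure time with $r_j > 1$ and both groups at risk is assumed so that the variance is positive.

```python
import math


def logrank_two_sample(times, events, groups):
    """Two-sample log-rank statistic Z (hypothetical helper, for illustration).

    times  : observed times X_i = min(T_i, C_i)
    events : delta_i, 1 if the failure was observed, 0 if censored
    groups : population label of each subject, assumed coded 1 or 2
    """
    data = list(zip(times, events, groups))
    failure_times = sorted({t for t, e, _ in data if e == 1})

    numerator = 0.0  # sum_j (d_1j - r_1j d_j / r_j)
    variance = 0.0   # sum_j of the hypergeometric variances
    for tj in failure_times:
        r1 = sum(1 for t, _, g in data if t >= tj and g == 1)
        rj = sum(1 for t, _, _ in data if t >= tj)
        d1 = sum(1 for t, e, g in data if t == tj and e == 1 and g == 1)
        dj = sum(1 for t, e, _ in data if t == tj and e == 1)

        numerator += d1 - r1 * dj / rj
        if rj > 1:  # the variance term is zero when only one subject is at risk
            variance += r1 * (rj - r1) * dj * (rj - dj) / (rj ** 2 * (rj - 1))

    z = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


# Toy data: population 1 fails at times 1 and 3, population 2 at times 2 and 4.
z, p = logrank_two_sample([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 2, 2])
```

A positive $Z$ indicates more observed failures in population 1 than expected under $H_0$; swapping the two labels changes only the sign.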

Comments:

1. The log-rank test can be seen as a censored-data generalization of linear rank statistics such as the Wilcoxon test and the Savage exponential scores test. It is also referred to as the generalized Savage test.

2. The log-rank test can also be derived as a score test from the marginal or partial likelihood under the proportional hazards model, in which the hazard functions are proportional to each other (equivalently, the survival functions are powers of one another). In this case, it can be shown that the log-rank test is the optimal, that is, the most efficient, test.

3. The log-rank test is derived based on large-sample theory under the assumption that the censoring distribution is independent of the failure distributions.

4. With $\nu = \sum_{j=1}^{k} \nu_j = ( D_1 - E_1, \ldots, D_p - E_p )^{\top}$, the components of the log-rank test statistic can be rewritten as

$$D_i - E_i = \sum_{j=1}^{k} r_{ij} \left( \frac{d_{ij}}{r_{ij}} - \frac{d_j}{r_j} \right) = \sum_{j=1}^{k} r_{ij} \left( \hat\lambda_{ij} - \hat\lambda_j \right) = \int_0^{\infty} w_i(t) \left[ d\hat\Lambda_i(t) - d\hat\Lambda(t) \right],$$

where $\hat\lambda_{ij} = d_{ij}/r_{ij}$, $\hat\lambda_j = d_j/r_j$, and the weight $w_i(t)$ equals the number of subjects at risk in population $i$ at time $t$. That is, the statistic is the summation of weighted differences between the estimates of the hazard functions for the individual populations and for the common population under $H_0$.

III.2. Other Tests for Right-censored Failure Time Data

As in the previous section, again consider the problem of comparing $p = 2$ survival functions based on right-censored data from $n$ independent subjects, that is, testing $H_0 : S_1(t) = S_2(t)$.

III.2.1. Weighted log-rank tests:

Note that we can rewrite the log-rank statistic $\nu_1$ as

$$\nu_1 = D_1 - E_1 = \sum_{j=1}^{k} r_{1j} \left( \frac{d_{1j}}{r_{1j}} - \frac{d_j}{r_j} \right) = \sum_{j=1}^{k} \frac{r_{1j}\, r_{2j}}{r_{1j} + r_{2j}} \left( \frac{d_{1j}}{r_{1j}} - \frac{d_{2j}}{r_{2j}} \right) = \int_0^{\infty} \frac{\bar Y_1(t)\, \bar Y_2(t)}{\bar Y_1(t) + \bar Y_2(t)} \left\{ d\hat\Lambda_1(t) - d\hat\Lambda_2(t) \right\},$$

where

$\bar Y_1(t)$ = # of subjects from population 1 at risk at $t$,
$\bar Y_2(t)$ = # of subjects from population 2 at risk at $t$.

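This motivated the weighted log-rank test statistics

$$WLR = \int_0^{\infty} K(t) \left\{ d\hat\Lambda_1(t) - d\hat\Lambda_2(t) \right\} = \int_0^{\infty} W(t)\, \frac{\bar Y_1(t)\, \bar Y_2(t)}{\bar Y_1(t) + \bar Y_2(t)} \left\{ d\hat\Lambda_1(t) - d\hat\Lambda_2(t) \right\} = \sum_{j=1}^{k} W(t_j)\, \frac{r_{1j}\, r_{2j}}{r_{1j} + r_{2j}} \left\{ \frac{d_{1j}}{r_{1j}} - \frac{d_{2j}}{r_{2j}} \right\},$$

where $K(t)$ or $W(t)$ is a weight process, the two being related by $K(t) = W(t)\, \bar Y_1(t)\, \bar Y_2(t) / \{ \bar Y_1(t) + \bar Y_2(t) \}$. It can be shown that under $H_0$ and some regularity conditions, $WLR$ has an asymptotic normal distribution as $n \to \infty$, with mean zero and variance that can be estimated by

$$\hat\sigma^2 = \sum_{j=1}^{k} K^2(t_j)\, \frac{1}{r_{1j}\, r_{2j}}\, \frac{r_j - d_j}{r_j - 1}\, d_j = \sum_{j=1}^{k} W^2(t_j)\, \frac{r_{1j}\, r_{2j}}{r_j^2}\, \frac{r_j - d_j}{r_j - 1}\, d_j .$$

Let $\hat S$ denote the Kaplan-Meier estimator of the survival function under $H_0$ based on the pooled samples. A common class of weight processes is given by

$$W(t) = \{ \hat S(t-) \}^{\rho}\, \{ 1 - \hat S(t-) \}^{\gamma}$$

(Harrington and Fleming, 1982), where $\rho$ and $\gamma$ are non-negative constants. In this case, the test statistics $WLR$ are referred to as $G^{\rho,\gamma}$ statistics.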
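The $G^{\rho,\gamma}$ statistic and its variance estimate can be accumulated in one pass over the ordered failure times, updating the pooled Kaplan-Meier estimate after each time so that the weight uses the left limit $\hat S(t_j-)$. The sketch below is illustrative only: the function name `g_rho_gamma` and its signature are assumptions, groups are assumed coded 1 and 2, and failure times at which one group has nobody at risk are skipped because they contribute nothing to the statistic or the variance.

```python
import math


def g_rho_gamma(times, events, groups, rho=0.0, gamma=0.0):
    """Standardized G^{rho,gamma} weighted log-rank statistic (illustrative)."""
    data = list(zip(times, events, groups))
    failure_times = sorted({t for t, e, _ in data if e == 1})

    surv_left = 1.0  # pooled Kaplan-Meier estimate just before t_j, S(t_j-)
    stat = 0.0
    var = 0.0
    for tj in failure_times:
        r1 = sum(1 for t, _, g in data if t >= tj and g == 1)
        r2 = sum(1 for t, _, g in data if t >= tj and g == 2)
        d1 = sum(1 for t, e, g in data if t == tj and e == 1 and g == 1)
        d2 = sum(1 for t, e, g in data if t == tj and e == 1 and g == 2)
        rj, dj = r1 + r2, d1 + d2

        # W(t_j) = S(t_j-)^rho * (1 - S(t_j-))^gamma; note 0.0 ** 0.0 == 1.0
        w = surv_left ** rho * (1.0 - surv_left) ** gamma

        if r1 > 0 and r2 > 0:
            stat += w * (r1 * r2 / rj) * (d1 / r1 - d2 / r2)
            if rj > 1:
                var += w ** 2 * (r1 * r2 / rj ** 2) * ((rj - dj) / (rj - 1)) * dj

        surv_left *= 1.0 - dj / rj  # Kaplan-Meier update, used at the next t_j

    return stat / math.sqrt(var)


times, events, groups = [1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 2, 2]
z_logrank = g_rho_gamma(times, events, groups)        # rho = gamma = 0
z_early = g_rho_gamma(times, events, groups, rho=1.0)  # emphasizes early times
```

With $\rho = \gamma = 0$ the weight is identically one and the statistic reduces to the ordinary log-rank test; $\rho > 0$ puts more weight on early differences and $\gamma > 0$ on late ones.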

III.2.2. Weighted Kaplan-Meier statistics:

To test $H_0$, we could also employ the weighted Kaplan-Meier statistics

$$WKM = \sqrt{\frac{n_1 n_2}{n}} \int_0^{\tau} W(t) \left[ \hat S_1(t) - \hat S_2(t) \right] dt,$$

where $n_1$ and $n_2$ are the two sample sizes ($n = n_1 + n_2$), $\tau$ is the largest observation time, $W(t)$ is a weight process, and $\hat S_1$ and $\hat S_2$ are the Kaplan-Meier estimators of the survival functions $S_1$ and $S_2$ based on the separate samples, respectively. Suppose that the weight process $W(t)$ is small when $t$ is close to $\tau$. Then it can be shown that as $n \to \infty$, the distribution of the statistics $WKM$ can be approximated by a normal distribution with mean zero and variance that can be estimated by

$$\hat\sigma^2 = - \int_0^{\tau} \frac{\left[ \int_t^{\tau} W(u)\, \hat S(u)\, du \right]^2}{\hat S^2(t)\, \hat C(t)}\, d\hat S(t),$$

where $\hat S$ and $\hat C$ are the Kaplan-Meier estimators of the common survival function under $H_0$ and of the survival function of the censoring variable based on the pooled samples, respectively.

Pepe and Fleming (1989), Biometrics, 497-507.
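Since both Kaplan-Meier curves are step functions, the integral in $WKM$ is a finite sum over the intervals between consecutive observation times. The sketch below computes the statistic only (not its variance) and rests on stated assumptions: the helper names `km_curve`, `step_value` and `wkm_statistic` are hypothetical, and the weight is treated as constant between observation times and evaluated at the left endpoint of each interval, which is exact for $W \equiv 1$ and for step-function weights that change only at observation times.

```python
import math


def km_curve(times, events):
    """Jump points (t_j, S(t_j)) of the Kaplan-Meier estimator."""
    points, surv = [], 1.0
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for t in times if t >= tj)
        deaths = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        surv *= 1.0 - deaths / at_risk
        points.append((tj, surv))
    return points


def step_value(points, t):
    """Right-continuous value of the step function at time t."""
    value = 1.0
    for tj, s in points:
        if tj > t:
            break
        value = s
    return value


def wkm_statistic(times1, events1, times2, events2, weight=lambda t: 1.0):
    """WKM = sqrt(n1 n2 / n) * integral_0^tau W(t) [S1(t) - S2(t)] dt."""
    n1, n2 = len(times1), len(times2)
    tau = max(max(times1), max(times2))  # largest observation time
    km1, km2 = km_curve(times1, events1), km_curve(times2, events2)

    # Both curves are constant between consecutive points of this grid.
    grid = sorted(set(times1) | set(times2) | {0.0, tau})
    integral = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        diff = step_value(km1, left) - step_value(km2, left)
        integral += weight(left) * diff * (right - left)

    return math.sqrt(n1 * n2 / (n1 + n2)) * integral


wkm = wkm_statistic([1, 3], [1, 1], [2, 4], [1, 1])
```

Unlike the weighted log-rank statistic, the value depends on the interval lengths `right - left`, so rescaling time changes the result.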

Comments:

1. The test statistics $WLR$, the integrated weighted differences of the estimated hazard functions, are most sensitive to the alternative of ordered hazard functions, $H_{a1} : \lambda_2(t) \ge \lambda_1(t)$ for all $t$. In contrast, the test statistics $WKM$, the integrated weighted differences between the Kaplan-Meier estimates of the survival functions, are most sensitive to the alternative of ordered survival functions, $H_{a2} : S_2(t) \le S_1(t)$ for all $t$. $H_{a2}$ does not imply $H_{a1}$.

2. The test statistics $WLR$ are constructed based on ranks and are thus invariant under all monotone transformations of time. That is, they do not depend on the scale in which time is measured. This is not true for $WKM$.

III.3. Log-rank Test for Interval-censored Data

As in the previous sections, consider a survival study that involves $n$ independent subjects from $p$ populations and in which the goal is to test the hypothesis $H_0$. Instead of right-censored data, suppose that only interval-censored data are available. Also suppose that the survival time takes discrete values $0 = t_0 < t_1 < \cdots < t_k < t_{k+1} = \infty$. For subject $i$, let $A_i = \{ L_i, L_i + 1, \ldots, U_i \} \subseteq \{ t_1, \ldots, t_{k+1} \}$ denote the interval within which the $i$th individual fails, where each time point $t_j$ is identified with its index $j$. Then the observed data have the form $\{ A_i ;\ i = 1, \ldots, n \}$.

Also let $0 = s_0 < \cdots < s_{m+1} = k + 1$ denote the smallest subset of $\{ t_0, t_1, \ldots, t_{k+1} \}$ such that each $L_i$ and $U_i$ is contained in the subset, and let $\Delta_j = \{ s_{j-1} + 1, \ldots, s_j \}$, $j = 1, \ldots, m + 1$. Define $\alpha_{ij}$ as the indicator of the event $\Delta_j \subseteq A_i$.

Note that if (i) the intervals not including $k + 1$ are not overlapping and (ii) for each interval with $U_i = k + 1$, its left endpoint coincides with a left endpoint of an interval that does not include $k + 1$, then the observed data can be treated as right-censored data by treating each interval as a single point.

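To test $H_0$, we will follow the idea behind the log-rank test for right-censored data and determine the death and risk numbers. Let $S = (S_0, \ldots, S_m)$ denote the common survival function of the $p$ populations under $H_0$ ($S_j = \Pr\{ T > s_j \}$) and $\hat S = (\hat S_0, \ldots, \hat S_m)$ the maximum likelihood estimator of $S$, with the convention $\hat S_{m+1} = 0$. Define

$$d_j = \sum_{i=1}^{n} \frac{\alpha_{ij}\, [\hat S_{j-1} - \hat S_j]}{\sum_{u=1}^{m+1} \alpha_{iu}\, [\hat S_{u-1} - \hat S_u]}$$

and

$$r_j = \sum_{r=j}^{m+1} \sum_{i=1}^{n} \frac{\alpha_{ir}\, [\hat S_{r-1} - \hat S_r]}{\sum_{u=1}^{m+1} \alpha_{iu}\, [\hat S_{u-1} - \hat S_u]} .$$

Also define

$$d_{jl} = {\sum_{i}}^{(l)} \frac{\alpha_{ij}\, [\hat S_{j-1} - \hat S_j]}{\sum_{u=1}^{m+1} \alpha_{iu}\, [\hat S_{u-1} - \hat S_u]}$$

and

$$r_{jl} = \sum_{r=j}^{m+1} {\sum_{i}}^{(l)} \frac{\alpha_{ir}\, [\hat S_{r-1} - \hat S_r]}{\sum_{u=1}^{m+1} \alpha_{iu}\, [\hat S_{u-1} - \hat S_u]},$$

where ${\sum_i}^{(l)}$ denotes the summation over subjects $i$ in population $l$. The $d_j$'s, $r_j$'s, $d_{jl}$'s and $r_{jl}$'s have meanings similar to those of the $d_j$'s, $r_j$'s, $d_{ij}$'s and $r_{ij}$'s for right-censored data, respectively: the numbers of failures and the numbers at risk.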

Motivated by the log-rank test statistic for right-censored data, we can construct a test statistic $T = (T_1, \ldots, T_p)^{\top}$ for testing $H_0$, where

$$T_l = \sum_{j=1}^{m} \left( d_{jl} - \frac{r_{jl}\, d_j}{r_j} \right) .$$

If an estimate, $V$, of the covariance matrix of $T$ is available, then the test of $H_0$ can be based on the approximation $T^{\top} V^{-1} T \sim \chi^2_{p-1}$. To obtain an estimate of the covariance matrix of $T$, see

Sun (1996), Statistics in Medicine, Vol. 15, 1387-1395.
Zhao and Sun (2004), Statistics in Medicine, Vol. 23, 1621-1629.
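The pseudo death and risk numbers and the statistic $T$ can be sketched as follows. Each subject contributes a fractional failure to every interval it could have failed in, in proportion to the estimated probability mass $\hat S_{j-1} - \hat S_j$. This is an illustration under assumptions, not the cited authors' code: the function name `interval_logrank_T` is hypothetical, the estimate $\hat S$ is taken as given (computing the maximum likelihood estimator is not shown), and $\hat S$ is assumed to put positive mass on every subject's interval.

```python
def interval_logrank_T(alpha, surv, labels):
    """Pseudo counts d_j, r_j and statistic T_l for interval-censored data.

    alpha  : alpha[i][j-1] = alpha_ij for j = 1, ..., m+1 (0/1 indicators)
    surv   : [S_0, ..., S_m], estimated common survival function, S_0 = 1
    labels : population label of each subject
    """
    m1 = len(alpha[0])                           # m + 1 intervals
    s = list(surv) + [0.0]                       # convention S_{m+1} = 0
    mass = [s[j] - s[j + 1] for j in range(m1)]  # mass[j-1] = S_{j-1} - S_j

    pops = sorted(set(labels))
    d = [0.0] * m1
    d_pop = {l: [0.0] * m1 for l in pops}
    for a_i, l in zip(alpha, labels):
        denom = sum(a * q for a, q in zip(a_i, mass))  # assumed positive
        for j in range(m1):
            share = a_i[j] * mass[j] / denom  # fractional failure in interval j
            d[j] += share
            d_pop[l][j] += share

    def tail_sums(v):
        """r_j = sum over r >= j of the pseudo failures."""
        out, running = [0.0] * len(v), 0.0
        for j in reversed(range(len(v))):
            running += v[j]
            out[j] = running
        return out

    r = tail_sums(d)
    T = {}
    for l in pops:
        r_l = tail_sums(d_pop[l])
        # Sum over j = 1, ..., m; the term for j = m + 1 is identically zero.
        T[l] = sum(d_pop[l][j] - r_l[j] * d[j] / r[j]
                   for j in range(m1 - 1) if r[j] > 0)
    return d, r, T


# Exactly observed toy data: each subject fails in exactly one interval.
alpha = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
d, r, T = interval_logrank_T(alpha, [1.0, 0.75, 0.5, 0.25], [1, 2, 1, 2])
```

In this toy example the pseudo counts are the ordinary failure and risk counts, so $T_1$ equals the numerator $D_1 - E_1$ of the right-censored log-rank statistic for the same data.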

III.4. Weighted Survival Test for Interval-censored Data

In this section, we will consider the two-sample comparison problem ($p = 2$) and use the notation given in the previous section. To test $H_0$, similar to the weighted Kaplan-Meier test statistics for right-censored data, we can construct a class of test statistics as

$$W = \sum_{j=1}^{k} w(t_j) \left[ \hat S_1(t_j) - \hat S_2(t_j) \right] \Delta t_j,$$

where $w$ is a weight function, $\hat S_1$ and $\hat S_2$ are the maximum likelihood estimates of the two survival functions $S_1$ and $S_2$ based on the separate samples, and $\Delta t_j = t_j - t_{j-1}$. The statistic $W$ can be rewritten as

$$W = \sum_{j=1}^{k} w(t_j) \left[ \sum_{l=1}^{j} \hat p^{(2)}_l - \sum_{l=1}^{j} \hat p^{(1)}_l \right] \Delta t_j,$$

where $p^{(i)}_l = S_i(t_{l-1}) - S_i(t_l)$, $i = 1, 2$, so that $S_i(t_j) = 1 - \sum_{l=1}^{j} p^{(i)}_l$. That is, $W$ is a function of estimates of the parameters $\{ p^{(1)}_l, p^{(2)}_l ;\ l = 1, \ldots, k \}$, whose covariance can be estimated using the Fisher information matrix. Also, under $H_0$, the distribution of $W$ can be approximated by the normal distribution with mean zero.

Petroni and Wolfe (1994), Biometrics, 77-87.
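The two expressions for $W$ can be checked against each other numerically. The sketch below assumes that the estimates $\hat S_1(t_j)$ and $\hat S_2(t_j)$ are already available on the finite grid $t_0 = 0 < t_1 < \cdots < t_k$; how they are obtained is not shown, and the function names are hypothetical.

```python
def weighted_survival_w(t, s1, s2, w=lambda u: 1.0):
    """W = sum_j w(t_j) [S1(t_j) - S2(t_j)] (t_j - t_{j-1}), j = 1, ..., k.

    t      : [t_0 = 0, t_1, ..., t_k], finite grid points only
    s1, s2 : estimated survival functions on the grid, s1[j] = S1(t_j)
    """
    return sum(w(t[j]) * (s1[j] - s2[j]) * (t[j] - t[j - 1])
               for j in range(1, len(t)))


def weighted_survival_w_from_masses(t, p1, p2, w=lambda u: 1.0):
    """Same statistic from the masses p_l^(i) = S_i(t_{l-1}) - S_i(t_l).

    p1[l-1] and p2[l-1] hold the estimated masses for l = 1, ..., k.
    """
    total, cum1, cum2 = 0.0, 0.0, 0.0
    for j in range(1, len(t)):
        cum1 += p1[j - 1]  # sum_{l <= j} p_l^(1) = 1 - S1(t_j)
        cum2 += p2[j - 1]  # sum_{l <= j} p_l^(2) = 1 - S2(t_j)
        total += w(t[j]) * (cum2 - cum1) * (t[j] - t[j - 1])
    return total


# Made-up estimates, for illustration only.
t = [0.0, 1.0, 2.0, 3.0]
s1 = [1.0, 0.8, 0.5, 0.2]
s2 = [1.0, 0.6, 0.4, 0.1]
p1 = [s1[j - 1] - s1[j] for j in range(1, len(t))]
p2 = [s2[j - 1] - s2[j] for j in range(1, len(t))]
w_direct = weighted_survival_w(t, s1, s2)
w_masses = weighted_survival_w_from_masses(t, p1, p2)
```

Writing $W$ in terms of the masses is what allows its variance to be derived from the estimated covariance of the $\hat p^{(i)}_l$.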