Linear rank statistics

Comparison of two groups. Consider the failure time $T_{ij}$ of the $j$-th subject in the $i$-th group for $i = 1$ or $2$. The first group is often called the control group, and the second the treatment group. Let $n_i$ be the size of the $i$-th group. We consider the respective censoring time $U_{ij}$ and assume that $T_{ij}$ and $U_{ij}$ are independent. Within a group we introduce the number of observed events by

$$
N_i(t) = \sum_{j=1}^{n_i} N_{ij}(t) = \sum_{j=1}^{n_i} I\{X_{ij} \le t,\ \delta_{ij} = 1\}, \tag{5.1}
$$

where $X_{ij} = \min(T_{ij}, U_{ij})$, $i = 1, 2$ and $j = 1, \ldots, n_i$, are the observed times, along with the indicator $\delta_{ij} = I\{T_{ij} \le U_{ij}\}$.

Martingales for grouped processes. Similarly, we can construct the number at risk by

$$
Y_i(t) = \sum_{j=1}^{n_i} Y_{ij}(t) = \sum_{j=1}^{n_i} I\{X_{ij} \ge t\}. \tag{5.2}
$$

A natural filtration $\mathcal{F}_t$ is also introduced so that the above processes are $\mathcal{F}_t$-measurable, and

$$
M_i(t) = N_i(t) - \int_0^t Y_i(u)\,\lambda_i(u)\,du \tag{5.3}
$$

becomes an $\mathcal{F}_t$-martingale, where $\lambda_i(t)$ is the hazard function of the $i$-th group, $i = 1, 2$.

Mantel's logrank statistic. We introduce $\Delta N_i(t) = N_i(t) - N_i(t^-)$ for $i = 1, 2$, and

$$
N(t) = N_1(t) + N_2(t), \qquad \Delta N(t) = \Delta N_1(t) + \Delta N_2(t), \qquad Y(t) = Y_1(t) + Y_2(t), \qquad n = n_1 + n_2.
$$

The standardized sum of differences between the observed and expected numbers of events,

$$
L_M = \left(\frac{n}{n_1 n_2}\right)^{1/2} \int_0^\infty \left[ dN_1(t) - \frac{Y_1(t)}{Y(t)}\,dN(t) \right],
$$

is called Mantel's logrank statistic.

Heuristic interpretation of the test. For a moment we assume that all the data are uncensored; thus, $X_{ij} = T_{ij}$ and $\delta_{ij} = 1$. Then the following table summarizes the data observed at time $t$:

$$
\begin{array}{l|ccc}
 & \text{Group 1} & \text{Group 2} & \text{Total} \\ \hline
\text{Observed at } t & \Delta N_1(t) & \Delta N_2(t) & \Delta N(t) \\
\text{Not observed} & Y_1(t) - \Delta N_1(t) & Y_2(t) - \Delta N_2(t) & Y(t) - \Delta N(t) \\ \hline
\text{At risk at } t & Y_1(t) & Y_2(t) & Y(t)
\end{array}
$$

Page 1 Special lecture/july 16
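As a concrete illustration of (5.1), (5.2), the table at time $t$, and $L_M$, the following Python sketch evaluates these quantities directly from records $(X_{ij}, \delta_{ij}, i)$. The record layout, the function names `at_risk_table` and `mantel_logrank`, and the toy data are hypothetical choices for this example. The integral becomes a finite sum because $N(t)$ jumps only at observed event times, and the table entries can be computed at every such time even when some data are censored.

```python
import math
from typing import Dict, List, Tuple

# One record per subject: (x, delta, group), where x = min(T, U),
# delta = I{T <= U}, and group is 1 or 2 (hypothetical layout).
Record = Tuple[float, int, int]


def at_risk_table(data: List[Record], t: float) -> Dict[str, Tuple[int, int]]:
    """Table entries at time t: (Delta N_i(t), Y_i(t)) for each group."""
    table = {}
    for g in (1, 2):
        d = sum(1 for x, delta, grp in data if grp == g and x == t and delta == 1)
        y = sum(1 for x, _, grp in data if grp == g and x >= t)  # I{X_ij >= t}, as in (5.2)
        table[f"group{g}"] = (d, y)
    return table


def mantel_logrank(data: List[Record]) -> float:
    """L_M: standardized sum over event times of observed minus expected group-1 events."""
    n1 = sum(1 for _, _, grp in data if grp == 1)
    n2 = len(data) - n1
    n = n1 + n2
    total = 0.0
    # N(t) jumps only at observed event times, so the integral reduces to a finite sum.
    for t in sorted({x for x, delta, _ in data if delta == 1}):
        tab = at_risk_table(data, t)
        d1, y1 = tab["group1"]
        d2, y2 = tab["group2"]
        total += d1 - (y1 / (y1 + y2)) * (d1 + d2)  # observed minus expected at t
    return math.sqrt(n / (n1 * n2)) * total


# Hypothetical toy data: three subjects per group, one censored time (5.0) in group 1.
example = [(1.0, 1, 1), (3.0, 1, 1), (5.0, 0, 1), (2.0, 1, 2), (4.0, 1, 2), (6.0, 1, 2)]
print(at_risk_table(example, 3.0))        # {'group1': (1, 2), 'group2': (0, 2)}
print(round(mantel_logrank(example), 4))  # 0.2177
```

Swapping the group labels changes only the sign of $L_M$, since the observed-minus-expected terms of the two groups sum to zero at each event time.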
Under the null hypothesis that there is no difference between the two groups, the number $\Delta N_1(t)$ of observed events in the first group is expected to be $\Delta N(t)\,Y_1(t)/Y(t)$; that is, the $\Delta N(t)$ events are shared in proportion to each group's share of the risk set.

Problem 1. Show that Mantel's logrank statistic can be written in the form

$$
L_M = \int_0^\infty K_M(t) \left[ \frac{dN_1(t)}{Y_1(t)} - \frac{dN_2(t)}{Y_2(t)} \right] \tag{5.4}
$$

with

$$
K_M(t) = \left(\frac{n}{n_1 n_2}\right)^{1/2} \frac{Y_1(t)\,Y_2(t)}{Y(t)}.
$$

Here the data are generally censored, and a statistic of the form (5.4) is called a linear rank statistic.

Wilcoxon rank-sum statistic. Let $T_{(1)} < \cdots < T_{(n)}$ be the order statistics of the pooled failure times $T_{ij}$. Then we can define the rank $R_j$ of the $j$-th subject in the first group by setting $R_j = k$ if $T_{1j} = T_{(k)}$. Let $S_i(t)$ be the survival function of the failure time in the $i$-th group. Under the null hypothesis $H_0: S_1 = S_2$, the Wilcoxon rank-sum statistic

$$
W = \sum_{j=1}^{n_1} R_j
$$

has mean $n_1(n+1)/2$ and variance $n_1 n_2 (n+1)/12$.

Gehan-Wilcoxon statistic. Assuming that the observed data are uncensored, the Wilcoxon rank-sum statistic $W$ can be related to a linear rank statistic

$$
L_G = \frac{n_1(n+1) - 2W}{(n n_1 n_2)^{1/2}} = \int_0^\infty K_G(t) \left[ \frac{dN_1(t)}{Y_1(t)} - \frac{dN_2(t)}{Y_2(t)} \right] \tag{5.5}
$$

with

$$
K_G(t) = \frac{Y_1(t)\,Y_2(t)}{(n n_1 n_2)^{1/2}}.
$$

This formulation allows us to incorporate censored data into the hypothesis test, and the resulting statistic is called the Gehan-Wilcoxon statistic.

Problem 2. We can verify (5.5) by completing the following questions.

(a) Show that
$$
\int_0^\infty Y_1(t)\,Y_2(t) \left[ \frac{dN_1(t)}{Y_1(t)} - \frac{dN_2(t)}{Y_2(t)} \right] = \int_0^\infty \left[ Y(t)\,dN_1(t) - Y_1(t)\,dN(t) \right].
$$

(b) Show that $\int_0^\infty Y(t)\,dN_1(t) = n_1(n+1) - W$.

(c) Show that $\int_0^\infty Y_1(t)\,dN(t) = \sum_{j=1}^{n_1} R_{(j)} = W$.
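The identities in Problem 1 and (5.5) can be checked numerically. The Python sketch below evaluates the general form $\int_0^\infty K(t)[dN_1(t)/Y_1(t) - dN_2(t)/Y_2(t)]$ as a sum over event times for the weights $K_M$ and $K_G$. The names `linear_rank`, `k_mantel`, `k_gehan`, `wilcoxon_w`, the convention of passing $K(t)$ as a function of $(Y_1(t), Y_2(t), n_1, n_2)$, and the toy data are assumptions for illustration only. Terms with an empty risk set are skipped, which is valid only because both weights contain the factor $Y_1(t)Y_2(t)$.

```python
import math
from typing import Callable, List, Tuple

Record = Tuple[float, int, int]  # (x, delta, group), hypothetical layout
# Hypothetical signature: K(t) supplied as a function of Y1(t), Y2(t), n1, n2.
Weight = Callable[[int, int, int, int], float]


def linear_rank(data: List[Record], weight: Weight) -> float:
    """Sum over event times of K(t) * [dN1(t)/Y1(t) - dN2(t)/Y2(t)]."""
    n1 = sum(1 for _, _, g in data if g == 1)
    n2 = len(data) - n1
    total = 0.0
    for t in sorted({x for x, d, _ in data if d == 1}):
        y1 = sum(1 for x, _, g in data if g == 1 and x >= t)
        y2 = sum(1 for x, _, g in data if g == 2 and x >= t)
        d1 = sum(1 for x, d, g in data if g == 1 and x == t and d == 1)
        d2 = sum(1 for x, d, g in data if g == 2 and x == t and d == 1)
        if y1 == 0 or y2 == 0:
            continue  # K_M and K_G both contain Y1*Y2, so these terms vanish
        total += weight(y1, y2, n1, n2) * (d1 / y1 - d2 / y2)
    return total


def k_mantel(y1: int, y2: int, n1: int, n2: int) -> float:
    n = n1 + n2
    return math.sqrt(n / (n1 * n2)) * y1 * y2 / (y1 + y2)


def k_gehan(y1: int, y2: int, n1: int, n2: int) -> float:
    return y1 * y2 / math.sqrt((n1 + n2) * n1 * n2)


def wilcoxon_w(data: List[Record]) -> int:
    """Rank sum of group 1 in the pooled sample (assumes no tied times)."""
    pooled = sorted(x for x, _, _ in data)
    rank = {x: k + 1 for k, x in enumerate(pooled)}
    return sum(rank[x] for x, _, g in data if g == 1)


# Hypothetical uncensored toy data: group 1 fails at 1, 3, 5 and group 2 at 2, 4, 6.
uncensored = [(1.0, 1, 1), (3.0, 1, 1), (5.0, 1, 1), (2.0, 1, 2), (4.0, 1, 2), (6.0, 1, 2)]
n1, n2 = 3, 3
n = n1 + n2
w = wilcoxon_w(uncensored)                             # ranks 1 + 3 + 5 = 9
lhs = (n1 * (n + 1) - 2 * w) / math.sqrt(n * n1 * n2)  # left-hand side of (5.5)
print(lhs, linear_rank(uncensored, k_gehan))           # the two values agree

# Same data with the time 5.0 censored: the K_M form (5.4) gives Mantel's L_M.
censored = [(x, 0 if x == 5.0 else d, g) for x, d, g in uncensored]
print(round(linear_rank(censored, k_mantel), 4))       # 0.2177
```

On the uncensored data both sides of (5.5) equal $3/\sqrt{54}$. On the censored version, the $K_M$ form agrees with the direct observed-minus-expected sum in the definition of $L_M$.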
Asymptotic properties. A linear rank statistic

$$
L_K = \int_0^\infty K(t) \left[ \frac{dN_1(t)}{Y_1(t)} - \frac{dN_2(t)}{Y_2(t)} \right] \tag{5.6}
$$

is characterized by an $\mathcal{F}_t$-measurable process $K(t)$. Under the null hypothesis $H_0: \lambda_1 = \lambda_2 = \lambda$, the test statistic $L_K$ is asymptotically normally distributed with mean $0$. The variance $\sigma^2$ of the asymptotic normal distribution can be obtained as the limit of

$$
\mathrm{Var}(L_K(t)) = E\left[ \int_0^t K(u)^2 \, \frac{Y(u)}{Y_1(u)\,Y_2(u)} \, \lambda(u)\,du \right], \tag{5.7}
$$

where $L_K(t)$ denotes the statistic (5.6) with the integral taken over $[0, t]$.

Consistency. Let $z_\alpha$ be the upper-$\alpha$ critical value of the standard normal distribution. Then the critical value of the linear rank statistic can be obtained from the asymptotic normality, so that $P(L_K > \sigma z_\alpha) \approx \alpha$ under $H_0$. Consider the alternative hypothesis $H_A: \lambda_1(t) \ge \lambda_2(t)$ for all $t$, with strict inequality on a set of positive measure. The linear rank statistic (5.6) is consistent in the sense that

$$
\lim_{n_1, n_2 \to \infty} P(L_K > \sigma z_\alpha) = 1
$$

if $H_A$ is true.

Problem 3. We derive the variance formula (5.7) of the linear rank statistic by completing the following questions.

(a) Show that
$$
E[L_K(t)] = E\left[ \int_0^t K(u)\,[\lambda_1(u) - \lambda_2(u)]\,du \right].
$$

(b) It is known that
$$
\mathrm{Cov}\left( \int_0^t \frac{K(u)}{Y_1(u)}\,dM_1(u),\ \int_0^t \frac{K(u)}{Y_2(u)}\,dM_2(u) \right) = 0.
$$
Assuming the null hypothesis $H_0: \lambda_1 = \lambda_2 = \lambda$, verify (5.7).

Problem 4. Assume that the data are all uncensored and that the null hypothesis $H_0: S_1 = S_2 = S$ is true. Then we have $Y_i(t)/n_i \to S(t)$, $i = 1, 2$, and $Y(t)/n \to S(t)$ as $n_1, n_2 \to \infty$.

(a) Show that the limit $\sigma_M^2$ of $\mathrm{Var}(L_M)$ for Mantel's logrank statistic is
$$
\sigma_M^2 = \int_0^\infty S(u)\,\lambda(u)\,du = 1.
$$

(b) Show that the limit $\sigma_G^2$ of $\mathrm{Var}(L_G)$ for the Gehan-Wilcoxon statistic is
$$
\sigma_G^2 = \int_0^\infty S(u)^3\,\lambda(u)\,du = \frac{1}{3}.
$$
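The asymptotic statements above can be explored by simulation. The Python sketch below assumes exponential failure times without censoring (a hypothetical model chosen for convenience), computes $L_M$ and $L_G$ in one pass over the pooled order, and compares their empirical variances and one-sided rejection rates with the limits $\sigma_M^2 = 1$ and $\sigma_G^2 = 1/3$ of Problem 4, using $z_\alpha$ for $\alpha = 0.05$. The helper names, sample sizes, replication counts, and seed are illustrative assumptions. A second run with $\lambda_1 = 2 > \lambda_2 = 1$ illustrates consistency.

```python
import math
import random
import statistics
from typing import List, Tuple


def sweep_stats(times1: List[float], times2: List[float]) -> Tuple[float, float]:
    """(L_M, L_G) for uncensored, tie-free samples, in one pass over the pooled order."""
    n1, n2 = len(times1), len(times2)
    n = n1 + n2
    pooled = sorted([(t, 1) for t in times1] + [(t, 2) for t in times2])
    y1, y2 = n1, n2  # risk sets just before the first failure
    lm = lg = 0.0
    for _, g in pooled:
        km = math.sqrt(n / (n1 * n2)) * y1 * y2 / (y1 + y2)  # K_M(t)
        kg = y1 * y2 / math.sqrt(n * n1 * n2)                 # K_G(t)
        if g == 1:
            lm += km / y1
            lg += kg / y1
            y1 -= 1  # the failing subject leaves the risk set after t
        else:
            lm -= km / y2
            lg -= kg / y2
            y2 -= 1
    return lm, lg


def simulate(rate1: float, rate2: float, n1: int = 50, n2: int = 50,
             reps: int = 2000, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Draw exponential failure times (hypothetical model) and collect L_M and L_G."""
    rng = random.Random(seed)
    lms, lgs = [], []
    for _ in range(reps):
        t1 = [rng.expovariate(rate1) for _ in range(n1)]
        t2 = [rng.expovariate(rate2) for _ in range(n2)]
        lm, lg = sweep_stats(t1, t2)
        lms.append(lm)
        lgs.append(lg)
    return lms, lgs


z = statistics.NormalDist().inv_cdf(0.95)  # z_alpha for alpha = 0.05
sigma_m, sigma_g = 1.0, math.sqrt(1 / 3)   # limits from Problem 4

lms, lgs = simulate(1.0, 1.0)              # H0: lambda_1 = lambda_2
print(statistics.variance(lms), statistics.variance(lgs))  # near 1 and 1/3
print(sum(l > sigma_m * z for l in lms) / len(lms))        # near alpha
print(sum(l > sigma_g * z for l in lgs) / len(lgs))        # near alpha

alt_m, alt_g = simulate(2.0, 1.0, reps=500)  # H_A: lambda_1 = 2 > lambda_2 = 1
print(sum(l > sigma_m * z for l in alt_m) / len(alt_m))    # close to 1
```

For finite samples the variances only approximate their limits; for instance, the exact null variance of $L_G$ with uncensored data is $(n+1)/(3n)$, which follows from the variance of $W$.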
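Problem solutions

Problem 1. We obtain algebraically

$$
\begin{aligned}
dN_1(t) - \frac{Y_1(t)}{Y(t)}\,dN(t) &= dN_1(t) - \frac{Y_1(t)}{Y(t)}\,[dN_1(t) + dN_2(t)] \\
&= \frac{Y_2(t)}{Y(t)}\,dN_1(t) - \frac{Y_1(t)}{Y(t)}\,dN_2(t) \\
&= \frac{Y_1(t)\,Y_2(t)}{Y(t)} \left[ \frac{dN_1(t)}{Y_1(t)} - \frac{dN_2(t)}{Y_2(t)} \right].
\end{aligned}
$$

Thus, $L_M$ has the form of (5.4), as desired.

Problem 2. (a) We can show that

$$
\begin{aligned}
\int_0^\infty Y_1(t)\,Y_2(t) \left[ \frac{dN_1(t)}{Y_1(t)} - \frac{dN_2(t)}{Y_2(t)} \right] &= \int_0^\infty \left[ Y_2(t)\,dN_1(t) - Y_1(t)\,dN_2(t) \right] \\
&= \int_0^\infty \left\{ Y_2(t)\,dN_1(t) - Y_1(t)\,[dN(t) - dN_1(t)] \right\} \\
&= \int_0^\infty \left[ Y(t)\,dN_1(t) - Y_1(t)\,dN(t) \right].
\end{aligned}
$$

(b) Observe that $Y(T_{(k)})$ counts all the subjects at risk at the time $T_{(k)}$ of the $k$-th event, and therefore that $Y(T_{(k)}) = n + 1 - k$. Then we can verify that

$$
\int_0^\infty Y(t)\,dN_1(t) = \sum_{j=1}^{n_1} Y(T_{1j}) = \sum_{j=1}^{n_1} Y(T_{(R_j)}) = \sum_{j=1}^{n_1} (n + 1 - R_j) = n_1(n+1) - W.
$$

(c) Consider the order statistics $R_{(1)} < \cdots < R_{(n_1)}$ of the ranks $R_j$, $j = 1, \ldots, n_1$. Observe that $Y_1(T_{(R_{(j)})})$ counts the subjects at risk in group 1 at the time $T_{(R_{(j)})}$ of the $j$-th event from group 1; thus, we have $Y_1(T_{(R_{(j)})}) = n_1 + 1 - j$. Moreover, $Y_1$ takes this same value at every pooled event time $T_{(k)}$ with $R_{(j-1)} < k \le R_{(j)}$, and it vanishes after $T_{(R_{(n_1)})}$. By setting $R_{(0)} = 0$ and $T_{(0)} = 0$ for convenience, we can verify that

$$
\begin{aligned}
\int_0^\infty Y_1(t)\,dN(t) &= \sum_{i=1}^{2} \sum_{j=1}^{n_i} Y_1(T_{ij}) = \sum_{j=1}^{n_1} Y_1(T_{(R_{(j)})})\,[R_{(j)} - R_{(j-1)}] \\
&= \sum_{j=1}^{n_1} (n_1 + 1 - j)\,[R_{(j)} - R_{(j-1)}] \\
&= n_1 R_{(1)} + (n_1 - 1)\,[R_{(2)} - R_{(1)}] + \cdots + 1 \cdot [R_{(n_1)} - R_{(n_1 - 1)}] \\
&= \sum_{j=1}^{n_1} R_{(j)} = W.
\end{aligned}
$$

Problem 3. (a) Since $dN_i(u) = dM_i(u) + Y_i(u)\,\lambda_i(u)\,du$ by (5.3), we can use the martingales $M_1(t)$ and $M_2(t)$ to write the stochastic process

$$
L_K(t) = \int_0^t \frac{K(u)}{Y_1(u)}\,dM_1(u) - \int_0^t \frac{K(u)}{Y_2(u)}\,dM_2(u) + \int_0^t K(u)\,[\lambda_1(u) - \lambda_2(u)]\,du.
$$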
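The counting arguments in Problem 2(b) and (c) can be checked by brute force. This Python sketch assumes uncensored, tie-free data (simulated uniform times, a hypothetical choice), evaluates $\int Y\,dN_1$ and $\int Y_1\,dN$ by summing the integrands at the jump times of $N_1$ and $N$, and compares them with $n_1(n+1) - W$ and $W$. The function name `problem2_sums` and the sample sizes are illustrative.

```python
import random
from typing import List, Tuple


def problem2_sums(t1: List[float], t2: List[float]) -> Tuple[int, int, int]:
    """Return (integral of Y dN1, integral of Y1 dN, W) for uncensored, tie-free samples."""
    pooled = sorted(t1 + t2)
    rank = {t: k + 1 for k, t in enumerate(pooled)}
    w = sum(rank[t] for t in t1)  # Wilcoxon rank sum W of group 1

    def y(t: float) -> int:       # Y(t): pooled number at risk
        return sum(1 for s in pooled if s >= t)

    def y1(t: float) -> int:      # Y1(t): group-1 number at risk
        return sum(1 for s in t1 if s >= t)

    int_y_dn1 = sum(y(t) for t in t1)       # N1 jumps once at each group-1 failure
    int_y1_dn = sum(y1(t) for t in pooled)  # N jumps once at every failure
    return int_y_dn1, int_y1_dn, w


rng = random.Random(7)  # hypothetical simulated data with n1 = 7, n2 = 5
t1 = [rng.random() for _ in range(7)]
t2 = [rng.random() for _ in range(5)]
n = len(t1) + len(t2)
a, b, w = problem2_sums(t1, t2)
print(a == len(t1) * (n + 1) - w)  # Problem 2(b)
print(b == w)                      # Problem 2(c)
```

Combining the two results with Problem 2(a) gives $n_1(n+1) - 2W$, the numerator of $L_G$ in (5.5).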
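Then, since the stochastic integrals with respect to the martingales have mean zero, we can show that

$$
E[L_K(t)] = E\left[ \int_0^t K(u)\,[\lambda_1(u) - \lambda_2(u)]\,du \right].
$$

(b) By applying the martingale variance formula, we have

$$
\mathrm{Var}\left( \int_0^t \frac{K(u)}{Y_i(u)}\,dM_i(u) \right) = E\left[ \int_0^t \frac{K(u)^2}{Y_i(u)}\,\lambda_i(u)\,du \right].
$$

Under the null hypothesis $H_0: \lambda_1 = \lambda_2 = \lambda$, the last term of $L_K(t)$ vanishes, and the covariance in Problem 3(b) is zero, so we can show that

$$
\begin{aligned}
\mathrm{Var}(L_K(t)) &= \mathrm{Var}\left( \int_0^t \frac{K(u)}{Y_1(u)}\,dM_1(u) - \int_0^t \frac{K(u)}{Y_2(u)}\,dM_2(u) \right) \\
&= E\left[ \int_0^t K(u)^2 \left( \frac{1}{Y_1(u)} + \frac{1}{Y_2(u)} \right) \lambda(u)\,du \right],
\end{aligned}
$$

which is (5.7) because $1/Y_1(u) + 1/Y_2(u) = Y(u)/[Y_1(u)\,Y_2(u)]$.

Problem 4. (a) Observe that

$$
K_M(t)^2\,\frac{Y(t)}{Y_1(t)\,Y_2(t)} = \frac{n}{n_1 n_2} \cdot \frac{Y_1(t)\,Y_2(t)}{Y(t)} \to S(t).
$$

Therefore, by taking the limit of (5.7) and using $S(u)\,\lambda(u)\,du = d[-S(u)]$, we obtain

$$
\sigma_M^2 = \int_0^\infty S(u)\,\lambda(u)\,du = \int_0^\infty d[-S(u)] = \Big[ -S(u) \Big]_{u=0}^{u=\infty} = 1.
$$

(b) Observe that

$$
K_G(t)^2\,\frac{Y(t)}{Y_1(t)\,Y_2(t)} = \frac{Y(t)\,Y_1(t)\,Y_2(t)}{n n_1 n_2} \to S(t)^3.
$$

Therefore, by taking the limit of (5.7), we obtain

$$
\sigma_G^2 = \int_0^\infty S(u)^3\,\lambda(u)\,du = \int_0^\infty S(u)^2\,d[-S(u)] = \left[ -\frac{S(u)^3}{3} \right]_{u=0}^{u=\infty} = \frac{1}{3}.
$$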
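The final integrals in Problem 4 do not depend on the particular survival function: for a continuous $S$ with $S(0) = 1$ and $S(\infty) = 0$, the substitution $s = S(u)$ gives $\int_0^\infty S(u)^k\lambda(u)\,du = \int_0^1 s^{k-1}\,ds = 1/k$. As a numerical sanity check, the Python sketch below evaluates the integral with a midpoint rule for a Weibull model $S(u) = \exp(-u^{\beta})$, $\lambda(u) = \beta u^{\beta - 1}$. The model, the default shape $\beta = 1.5$, the truncation point, the step count, and the name `weibull_integral` are assumptions for illustration.

```python
import math


def weibull_integral(power: int, shape: float = 1.5, upper: float = 10.0,
                     steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral over [0, upper] of S(u)**power * lambda(u)
    for the hypothetical Weibull model S(u) = exp(-u**shape), lambda(u) = shape * u**(shape - 1)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += math.exp(-power * u ** shape) * shape * u ** (shape - 1)
    return total * h


print(round(weibull_integral(1), 4))  # sigma_M^2, close to 1
print(round(weibull_integral(3), 4))  # sigma_G^2, close to 1/3
```

Changing `shape` (to 1.0 for the exponential model, say) leaves both values essentially unchanged, in line with the substitution argument.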