Bios 323: Applied Survival Analysis
Qingxia (Cindy) Chen
Chapter 7, Fall 2012

Chapter 7 Hypothesis testing

Hypotheses of interest:

(A) 1-sample
    $H_0: S(t) = S_0(t)$, where $S_0(\cdot)$ is a known survival function, for all $t \le \tau$;
    $H_0: H(t) = H_0(t)$, where $H_0(\cdot)$ is known;
    $H_0: h(t) = h_0(t)$, where $h_0(\cdot)$ is known.

(B) 2-sample, no known forms
    $H_0: S_1(t) = S_2(t)$, $H_1(t) = H_2(t)$, $h_1(t) = h_2(t)$.
    These are equivalent hypotheses.
    Global null: $H_0$ holds for all $t \in [0, \tau]$.

(C) K-sample, obvious extension of (B)
    $H_0: S_1(t) = S_2(t) = \cdots = S_K(t)$, $t \in [0, \tau]$.

How to test? Difficulty: censoring (and truncation). Our focus: censoring.

7.2 One sample tests with right censoring

Data: $\{(T_i, \delta_i), i = 1, \ldots, n\}$, with $T_i = \min(X_i, C_i)$, $\delta_i = I(T_i = X_i)$, where $X_i$ has hazard $h(t)$.

$H_0: h(t) = h_0(t)$, $t \in [0, \tau]$, which is equivalent to $H_0: H(t) = H_0(t)$, $t \in [0, \tau]$.

Notation: $t_1 < t_2 < \cdots < t_D$ are the $D$ unique death times.

$d_j$ = # deaths at $t_j$ $= \sum_{i=1}^n I(T_i = t_j, \delta_i = 1) = \sum_{i=1}^n dN_i(t_j)$
$Y_j$ = # at risk (alive) at $t_j$ $= \sum_{i=1}^n I(T_i \ge t_j) = Y(t_j)$, $j = 1, \ldots, D$

Recall the N-A estimator

$$\tilde H(t) = \begin{cases} 0, & t < t_1, \\ \sum_{t_j \le t} d_j / Y_j, & t_1 \le t. \end{cases}$$

Let $W(t)$ be a weight function such that $W(t) = 0$ whenever $Y(t) = 0$. Define the test statistic

$$Z(\tau) = O(\tau) - E(\tau) = \sum_{j=1}^{D} W(t_j) \frac{d_j}{Y(t_j)} - \int_0^\tau W(s) h_0(s)\, ds. \quad (7.2.1)$$

When the null hypothesis is true, the sample variance of this statistic is given by

$$V[Z(\tau)] = \int_0^\tau W^2(s) \frac{h_0(s)}{Y(s)}\, ds. \quad (7.2.2)$$

For large samples, the statistic $Z(\tau)^2 / V[Z(\tau)]$ has a central chi-squared distribution with 1 degree of freedom when the null hypothesis is true.

Choices of weight functions:

The most popular choice is the weight $W(t) = Y(t)$, which yields the one-sample log-rank test. For this weight, $O(\tau)$ is the observed number of deaths and

$$E(\tau) = V[Z(\tau)] = \sum_{i=1}^{n} [H_0(T_i) - H_0(L_i)], \quad (7.2.4)$$

where $L_i$ is the delayed entry time for the $i$th patient, and $H_0$ is the cumulative hazard under $H_0$.

Harrington and Fleming (1982) proposed

$$W_{HF}(t) = Y(t)\, S_0(t)^p\, [1 - S_0(t)]^q, \quad p \ge 0,\ q \ge 0,$$

where $S_0(t) = \exp[-H_0(t)]$ is the hypothesized survival function.

Example: IBCSG Trial 6, a multicenter trial of adjuvant chemotherapy after surgical removal of the tumor.
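The one-sample log-rank test with $W(t) = Y(t)$ reduces to comparing the observed number of deaths with the sum in (7.2.4), which can be sketched in a few lines of base R. This is a minimal illustration, not code from the course: the function name `one_sample_logrank`, the data, and the constant null hazard $h_0 = 0.1$ are all hypothetical.

```r
# One-sample log-rank test: with W(t) = Y(t), O(tau) is the total number of
# deaths and E(tau) = V[Z(tau)] = sum of H0(T_i) - H0(L_i), as in (7.2.4).
one_sample_logrank <- function(time, status, H0,
                               entry = rep(0, length(time))) {
  stopifnot(length(time) == length(status), length(time) == length(entry))
  observed <- sum(status)                      # O(tau)
  expected <- sum(H0(time) - H0(entry))        # E(tau) = V[Z(tau)]
  chisq <- (observed - expected)^2 / expected  # Z(tau)^2 / V[Z(tau)]
  list(observed = observed, expected = expected, chisq = chisq,
       p.value = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Hypothetical data (not from the notes): follow-up times and death indicators.
time   <- c(2, 3, 5, 7, 8, 10, 12, 15)
status <- c(1, 1, 0, 1, 0, 1, 0, 0)

# Assumed null model: constant hazard h0 = 0.1, so H0(t) = 0.1 * t.
H0_exp <- function(t) 0.1 * t

res <- one_sample_logrank(time, status, H0_exp)
print(res)
```

With delayed entry, the entry times $L_i$ would be passed through the `entry` argument; here every subject is assumed to enter at time 0.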

7.3 Tests for two and more samples

We shall test the following set of hypotheses:

$H_0: h_1(t) = h_2(t) = \cdots = h_K(t)$, for all $t \le \tau$, versus
$H_A$: at least one of the $h_k(t)$ is different for some $t \le \tau$,

where $\tau$ is the largest time at which all of the groups have at least one subject at risk.

Right censoring data: $\{(T_{ik}, \delta_{ik}), k = 1, \ldots, K, i = 1, \ldots, n_k\}$

Let $0 < t_1 < \cdots < t_D$ be the ordered times at which deaths occur in the pooled sample. Define:

$d_j$ = # deaths in pooled sample at $t_j$, $j = 1, \ldots, D$
$d_{jk}$ = # deaths in group $k$ at $t_j$, $j = 1, \ldots, D$
$Y_j$ = # at risk in pooled sample at $t_j$ (alive and uncensored)
$Y_{jk}$ = # at risk in group $k$ at $t_j$

Basic idea: Under $H_0: h_1(t) = \cdots = h_K(t)$, the pooled NA estimator for $H(t)$ and the within-group NA estimators should be estimating the same quantity. So we can construct a test based on a weighted average of the differences between the pooled estimator $\tilde H$ and the within-group estimators $\tilde H_k$.

Let $W_k(t)$ be a nonnegative random function that equals 0 when $Y_j = 0$ or $Y_{jk} = 0$, $k = 1, \ldots, K$. For example, $W_k(t_j) = Y_{jk} W(t_j)$, where $W$ is common to all the groups $k = 1, \ldots, K$. Define

$$Z_k(\tau) = \sum_{j=1}^{D} Y_{jk} W(t_j) \left\{ \frac{d_{jk}}{Y_{jk}} - \frac{d_j}{Y_j} \right\} = \sum_{j=1}^{D} W(t_j) \left\{ d_{jk} - Y_{jk} \frac{d_j}{Y_j} \right\}.$$

Test statistic for $H_0$:

$$\chi^2 = (Z_1(\tau), \ldots, Z_{K-1}(\tau))\, \hat\Sigma^{-1}_{(K-1) \times (K-1)}\, (Z_1(\tau), \ldots, Z_{K-1}(\tau))^t \sim \text{chi-squared dist. with df} = K - 1 \text{ under } H_0,$$

where $\hat\Sigma = (\hat\sigma_{kk'})_{(K-1) \times (K-1)}$,

$$\hat\sigma_{kk} = \sum_{j=1}^{D} W(t_j)^2 \frac{Y_{jk}}{Y_j} \left(1 - \frac{Y_{jk}}{Y_j}\right) \left(\frac{Y_j - d_j}{Y_j - 1}\right) d_j, \quad k = 1, \ldots, K, \quad (7.3.4)$$

and

$$\hat\sigma_{kk'} = -\sum_{j=1}^{D} W(t_j)^2 \frac{Y_{jk} Y_{jk'}}{Y_j^2} \left(\frac{Y_j - d_j}{Y_j - 1}\right) d_j, \quad 1 \le k \ne k' \le K. \quad (7.3.5)$$

Choice of weight functions with general form $W_k(t_j) = Y_{jk} W(t_j)$, $k = 1, \ldots, K$:

(1) $W(t) = 1$: log-rank test, optimal against proportional hazards alternatives, where $h_1(t) = c_2 h_2(t) = \cdots = c_K h_K(t)$, $t \le \tau$.
(2) $W(t) = Y(t) = \sum_{k=1}^{K} \sum_{i=1}^{n_k} I(T_{ik} \ge t)$: Gehan's test, a simple generalization of the Wilcoxon rank sum / Mann-Whitney test when $K = 2$.
(3) Tarone-Ware: $W(t) = \sqrt{Y(t)}$.
(4) Peto-Peto: $W(t) = \tilde S(t)$, where the survival estimator $\tilde S(t) = \prod_{t_j \le t} \left(1 - \frac{d_j}{Y_j + 1}\right)$ is based on the pooled sample.
(5) Modified Peto-Peto: $W(t) = \tilde S(t)\, Y(t) / (Y(t) + 1)$.
(6) Fleming-Harrington test: $W(t_j) = \hat S(t_{j-1})^p\, (1 - \hat S(t_{j-1}))^q$, $p, q \ge 0$, where $\hat S$ is the KM estimator based on the combined sample.

Example 7.2: We are interested in testing whether there is a difference in the time to cutaneous exit-site infection between patients whose catheter was placed surgically (group 1) and patients whose catheter was placed percutaneously (group 2).

Example 7.4: Comparison of disease-free survival in three groups of leukemia patients given a bone marrow transplantation. The three groups are:
Group 1: 38 ALL patients,
Group 2: 54 AML low-risk patients, and
Group 3: 45 AML high-risk patients.
We want to test $H_0: S_1(t) = S_2(t) = S_3(t)$, $t \le \tau$.

7.4 Tests for trend

We shall test the following set of hypotheses:

$H_0: h_1(t) = h_2(t) = \cdots = h_K(t)$, for all $t \le \tau$, versus
$H_A: h_1(t) \le h_2(t) \le \cdots \le h_K(t)$ for all $t \le \tau$, with at least one strict inequality.

This alternative implies the ordering $S_1(t) \ge S_2(t) \ge \cdots \ge S_K(t)$ for all $t \le \tau$.

Right censoring data: $\{(T_{ik}, \delta_{ik}), k = 1, \ldots, K, i = 1, \ldots, n_k\}$

Let $0 < t_1 < \cdots < t_D$ be the ordered times at which deaths occur in the pooled sample. Define:

$d_j$ = # deaths in pooled sample at $t_j$, $j = 1, \ldots, D$
$d_{jk}$ = # deaths in group $k$ at $t_j$, $j = 1, \ldots, D$
$Y_j$ = # at risk in pooled sample at $t_j$ (alive and uncensored)
$Y_{jk}$ = # at risk in group $k$ at $t_j$

Let $W$ be a weight function common to all the groups $k = 1, \ldots, K$ and define

$$Z_k(\tau) = \sum_{j=1}^{D} W(t_j) \left\{ d_{jk} - Y_{jk} \frac{d_j}{Y_j} \right\}.$$

Objective: Increase power by constructing a statistic sensitive to the stochastic ordering.

General form:

$$Z = \frac{\sum_{k=1}^{K} a_k Z_k(\tau)}{\sqrt{\sum_{k=1}^{K} \sum_{k'=1}^{K} a_k a_{k'} \hat\sigma_{kk'}}}, \quad (7.4.2)$$

where $\hat\sigma_{kk'}$ is from $\hat\Sigma$, the estimated covariance matrix for $(Z_1(\tau), \ldots, Z_K(\tau))$:

$$\hat\sigma_{kk} = \sum_{j=1}^{D} W(t_j)^2 \frac{Y_{jk}}{Y_j} \left(1 - \frac{Y_{jk}}{Y_j}\right) \left(\frac{Y_j - d_j}{Y_j - 1}\right) d_j, \quad k = 1, \ldots, K, \quad (7.3.4)$$

and

$$\hat\sigma_{kk'} = -\sum_{j=1}^{D} W(t_j)^2 \frac{Y_{jk} Y_{jk'}}{Y_j^2} \left(\frac{Y_j - d_j}{Y_j - 1}\right) d_j, \quad 1 \le k \ne k' \le K. \quad (7.3.5)$$
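The quantities $Z_k(\tau)$ and $\hat\Sigma$ from (7.3.4) and (7.3.5), the $K$-sample chi-squared statistic of Section 7.3, and the trend statistic (7.4.2) can all be computed directly from the risk-set counts. The base R sketch below is one way to do so under stated assumptions: the function names, the data, and the convention that `weight` maps the pooled number at risk $Y_j$ to $W(t_j)$ are hypothetical, and the scores default to $a_k = k$. It sums over all pooled death times, which leaves $Z_k$ unchanged for a group that has no one at risk because $W_k(t_j) = Y_{jk} W(t_j) = 0$ there.

```r
# K-sample statistics with W_k(t_j) = Y_jk * W(t_j).
# `weight` maps the pooled number at risk Y_j to W(t_j), e.g.
#   log-rank: function(Y) 1, Gehan: function(Y) Y, Tarone-Ware: sqrt
k_sample_stats <- function(time, status, group, weight = function(Y) 1) {
  group <- factor(group)
  K <- nlevels(group)
  death_times <- sort(unique(time[status == 1]))
  Z <- numeric(K)
  Sigma <- matrix(0, K, K)
  for (tt in death_times) {
    Yjk <- as.numeric(table(group[time >= tt]))               # at risk by group
    djk <- as.numeric(table(group[time == tt & status == 1])) # deaths by group
    Yj <- sum(Yjk)
    dj <- sum(djk)
    w <- weight(Yj)
    Z <- Z + w * (djk - Yjk * dj / Yj)
    if (Yj > 1) {  # the variance term is 0 when only one subject is at risk
      p <- Yjk / Yj
      Sigma <- Sigma +
        w^2 * dj * (Yj - dj) / (Yj - 1) * (diag(p, nrow = K) - outer(p, p))
    }
  }
  list(Z = Z, Sigma = Sigma)
}

# Chi-squared test on K - 1 df: drop the last group, since the Z_k sum to 0.
k_sample_chisq <- function(fit) {
  K <- length(fit$Z)
  z <- fit$Z[-K]
  S <- fit$Sigma[-K, -K, drop = FALSE]
  chisq <- sum(z * solve(S, z))
  list(chisq = chisq, df = K - 1,
       p.value = pchisq(chisq, df = K - 1, lower.tail = FALSE))
}

# Trend statistic (7.4.2) with scores a_k; one-sided p-value for H_A.
trend_test <- function(fit, a = seq_along(fit$Z)) {
  z <- sum(a * fit$Z) / sqrt(as.numeric(a %*% fit$Sigma %*% a))
  list(z = z, p.value = pnorm(z, lower.tail = FALSE))
}

# Hypothetical three-group data (not from the notes).
time   <- c(5, 8, 12, 15, 20,  3, 6, 9, 14, 18,  2, 4, 7, 10, 11)
status <- c(1, 0,  1,  0,  1,  1, 1, 0,  1,  1,  1, 1, 1,  0,  1)
group  <- rep(c("g1", "g2", "g3"), each = 5)

fit <- k_sample_stats(time, status, group)  # log-rank weights
print(k_sample_chisq(fit))
print(trend_test(fit))
```

Passing `weight = function(Y) Y` or `weight = sqrt` gives the Gehan and Tarone-Ware versions. The Peto-Peto and Fleming-Harrington weights depend on a pooled survival estimate and are not covered by this sketch.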

How to choose the scores $a_k$ to increase the power of detecting $H_A$?

Let $a_1 < a_2 < \cdots < a_K$ (e.g., $a_j = j$, the standard approach). We reject $H_0$ in favor of $H_A$ at Type I error rate $\alpha$ when the test statistic exceeds the upper $\alpha$ quantile of the standard normal distribution.

Exercise: Show that the test is invariant under linear transformations of the scores, $a_k \mapsto c\, a_k + b$ with $c > 0$.

Other choices of $a_k$ may have better properties.

Example 7.6: Testing for a survival difference in patients with larynx cancer across disease stage groups (I, II, III, IV).

Use the trend test only when there is strong prior information about the group ordering; otherwise the trend test may lose power.

7.5 Stratified tests

Suppose one wishes to adjust for other covariates. E.g.: compare the survival rates of patients receiving 3 months versus 6 months of chemotherapy in IBCSG, with adjustment for tumor stage (I, II, III, IV).

$H_0: h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t)$, $t \in [0, \tau]$, $s = 1, \ldots, M$ strata.

Compute $Z_{js}(\tau)$, $j = 1, \ldots, K$, $s = 1, \ldots, M$, and $\hat\Sigma_s = \widehat{\mathrm{Cov}}[Z_{1s}(\tau), \ldots, Z_{Ks}(\tau)]$ separately within each stratum. The global test is based on

$$Z_{j\cdot}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau) \quad \text{and} \quad \hat\Sigma_{\cdot} = \widehat{\mathrm{Cov}}[Z_{1\cdot}(\tau), \ldots, Z_{K\cdot}(\tau)] = \sum_{s=1}^{M} \hat\Sigma_s,$$

$$\chi^2 = [Z_{1\cdot}(\tau), \ldots, Z_{(K-1)\cdot}(\tau)]\, \hat\Sigma^{-1}_{\cdot\,(K-1) \times (K-1)}\, [Z_{1\cdot}(\tau), \ldots, Z_{(K-1)\cdot}(\tau)]^t \sim \chi^2_{K-1} \text{ under } H_0.$$

Example 7.4 (Bone marrow transplant data)

We previously compared the disease-free survival rates of patients in the three groups ALL, AML low risk and AML high risk. The subjects were also divided into those who used a graft-versus-host prophylactic (MTX) and those who did not (NOMTX). We want to perform a stratified log-rank test for differences in the hazard rates of the three disease states. To illustrate the stratified test procedure, we first perform 2 separate log-rank tests using R. The stratified test is obtained with:

    library(survival)
    # Stratified log-rank test (rho = 0): compare groups g within strata of mtx
    fit1 <- survdiff(Surv(t2, dfree) ~ g + strata(mtx), data = bmt, rho = 0)
    print(fit1)

The output repeats the call,

    survdiff(formula = Surv(t2, dfree) ~ g + strata(mtx), data = bmt, rho = 0)

followed by a table with columns N, Observed, Expected, (O-E)^2/E and (O-E)^2/V for each level of g, and the test result Chisq = 13.2 on 2 degrees of freedom.

Continued example, IBCSG data: Recall $H_0: h_{3mon}(t) = h_{6mon}(t)$. Stratify on tumor stage = 1, 2, 3, 4. In Splus/R:

    library(survival)
    # Stratified log-rank test of treatment, stratified by tumor stage
    survdiff(Surv(t, ind) ~ treatment + strata(stage), rho = 0)

Matched pairs

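Here we have paired event times $(T_{1i}, T_{2i})$ and event indicators $(\delta_{1i}, \delta_{2i})$, for $i = 1, \ldots, M$. We wish to test $H_0: h_{1i}(t) = h_{2i}(t)$, $i = 1, \ldots, M$. Treating each pair as a stratum and computing the statistic (7.3.3) and its variance (7.3.4) within pair $i$, we have

$$Z_{1i}(\tau) = \begin{cases} W(T_{1i})/2 & \text{if } T_{1i} < T_{2i},\ \delta_{1i} = 1, \text{ or } T_{1i} = T_{2i},\ \delta_{1i} = 1,\ \delta_{2i} = 0, \\ -W(T_{2i})/2 & \text{if } T_{2i} < T_{1i},\ \delta_{2i} = 1, \text{ or } T_{1i} = T_{2i},\ \delta_{1i} = 0,\ \delta_{2i} = 1, \\ 0 & \text{otherwise,} \end{cases}$$

$$\hat\sigma_{11i}(\tau) = \begin{cases} W^2(T_{1i})/4 & \text{if } T_{1i} < T_{2i},\ \delta_{1i} = 1, \text{ or } T_{1i} = T_{2i},\ \delta_{1i} = 1,\ \delta_{2i} = 0, \\ W^2(T_{2i})/4 & \text{if } T_{2i} < T_{1i},\ \delta_{2i} = 1, \text{ or } T_{1i} = T_{2i},\ \delta_{1i} = 0,\ \delta_{2i} = 1, \\ 0 & \text{otherwise.} \end{cases} \quad (7.5.5)$$

If the weight function does not depend on time, $W(t) = W$, then

$$Z_{1\cdot}(\tau) = \frac{W}{2} (D_1 - D_2),$$

where $D_1 = \sum_{i: T_{1i} < T_{2i}} \delta_{1i}$ and $D_2 = \sum_{i: T_{1i} > T_{2i}} \delta_{2i}$ count the pairs in which the member from sample 1, respectively sample 2, is observed to fail first (tied pairs are assigned as in (7.5.5)). Similarly,

$$\hat\sigma_{11\cdot}(\tau) = \sum_{i=1}^{M} \hat\sigma_{11i}(\tau) = \frac{W^2}{4} (D_1 + D_2).$$

So the test statistic is

$$\frac{D_1 - D_2}{\sqrt{D_1 + D_2}} \sim N(0, 1) \text{ under } H_0.$$

Example 7.8: Matched pairs of leukemia patients receiving 6MP and placebo treatment.

7.6 Renyi type tests

The Renyi type tests are preferable when hazard rates cross. We test

$H_0: h_1(t) = h_2(t)$, $t \le \tau$, against
$H_A: h_1(t) \ne h_2(t)$ for some $t \le \tau$.

Consider two samples of sizes $n_1$ and $n_2$ with $n = n_1 + n_2$. Let $t_1 < t_2 < \cdots < t_D$ be the distinct failure times of the pooled sample. Let $d_{j1}, d_{j2}$ be the numbers of events at $t_j$ and $Y_{j1}, Y_{j2}$ the numbers at risk at time $t_j$ in the two samples. Also let $d_j = d_{j1} + d_{j2}$ and $Y_j = Y_{j1} + Y_{j2}$.

Let $W$ be the weight function and let

$$Z(t_i) = \sum_{t_j \le t_i} W(t_j) \left[ d_{j1} - Y_{j1} \frac{d_j}{Y_j} \right], \quad i = 1, \ldots, D. \quad (7.6.1)$$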

The variance of $Z(\tau)$ is estimated by

$$\sigma^2(\tau) = \sum_{t_j \le \tau} W^2(t_j) \left(\frac{Y_{j1}}{Y_j}\right) \left(\frac{Y_{j2}}{Y_j}\right) \left(\frac{Y_j - d_j}{Y_j - 1}\right) d_j, \quad (7.6.2)$$

where $\tau$ is the largest $t_j$ with $Y_{j1}, Y_{j2} > 0$. The test statistic for a two-sided alternative is given by

$$Q = \sup\{|Z(t)|,\ t \le \tau\} / \sigma(\tau). \quad (7.6.3)$$

When $H_0$ is true, the distribution of $Q$ can be approximated by the distribution of $\sup\{|B(x)|,\ 0 \le x \le 1\}$, where $B$ is a standard Brownian motion process. Critical values of $Q$ are found in Table C.5 in Appendix C.

When the hazards cross, the supremum test should have greater power to detect differences between the hazard rates than the tests of Section 7.3, which use only $Z(\tau)$.

Example 7.9: A clinical trial of chemotherapy against chemotherapy combined with radiotherapy in the treatment of gastric cancer.
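The statistic $Q$ in (7.6.3) only requires tracking the running value of $Z(t_i)$ from (7.6.1) and the variance sum (7.6.2) up to $\tau$. The base R sketch below illustrates this with hypothetical function names and data. As an alternative to the table lookup, it approximates the p-value with the standard series for the distribution of $\sup\{|B(x)|,\ 0 \le x \le 1\}$, truncated at an assumed 200 terms; this is an addition for illustration and is not taken from the notes.

```r
# P(sup |B(x)| > y), 0 <= x <= 1, for standard Brownian motion B,
# via the standard alternating series (truncated; `terms` is an assumption).
sup_bm_pvalue <- function(y, terms = 200) {
  k <- 0:terms
  1 - (4 / pi) *
    sum((-1)^k / (2 * k + 1) * exp(-pi^2 * (2 * k + 1)^2 / (8 * y^2)))
}

# Renyi type test for two samples; `weight` maps Y_j to W(t_j).
renyi_test <- function(time, status, group, weight = function(Y) 1) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  g1 <- group == levels(group)[1]
  death_times <- sort(unique(time[status == 1]))
  Z <- 0
  sigma2 <- 0
  supZ <- 0
  for (tt in death_times) {
    Y1 <- sum(time >= tt & g1)
    Y2 <- sum(time >= tt & !g1)
    if (Y1 == 0 || Y2 == 0) break  # past tau: one sample has no one at risk
    Yj <- Y1 + Y2
    d1 <- sum(time == tt & status == 1 & g1)
    dj <- sum(time == tt & status == 1)
    w <- weight(Yj)
    Z <- Z + w * (d1 - Y1 * dj / Yj)                       # (7.6.1)
    sigma2 <- sigma2 +
      w^2 * (Y1 / Yj) * (Y2 / Yj) * (Yj - dj) / (Yj - 1) * dj  # (7.6.2)
    supZ <- max(supZ, abs(Z))
  }
  Q <- supZ / sqrt(sigma2)                                 # (7.6.3)
  list(Q = Q, Z.tau = Z, sigma2 = sigma2, p.value = sup_bm_pvalue(Q))
}

# Hypothetical data (not from the notes): sample 1 has early and late deaths,
# sample 2 has deaths in between, so the hazards cross.
time   <- c(1, 2, 3, 4, 15, 16, 18, 20,  6, 7, 8, 9, 10, 11, 12, 13)
status <- rep(1, 16)
group  <- rep(c("s1", "s2"), each = 8)

print(renyi_test(time, status, group))
```

Comparing `Q` with `abs(Z.tau) / sqrt(sigma2)` shows how much the supremum gains over the end-of-study statistic when early and late differences cancel.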
