Goodness-of-fit tests for the cure rate in a mixture cure model

Biometrika (217), 13, 1, pp. 1 7 Printed in Great Britain Advance Access publication on 31 July 216 Goodness-of-fit tests for the cure rate in a mixture cure model BY U.U. MÜLLER Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A. uschi@stat.tamu.edu 5 AND I. VAN KEILEGOM Research Centre for Operations Research and Business Statistics, KU Leuven, Naamsestraat 69, 3 Leuven, Belgium ingrid.vankeilegom@kuleuven.be S. SUPPLEMENTARY ONLINE MATERIAL 1 S.1. Proof of Theorem 1 To derive the desired result for the Härdle Mammen-type test statistic T n from equation (2) we write T n nh 1/2 [{ˆp(x) p(x)} 2 + {p(x) pˆθ(x)} 2 + 2{p(x) pˆθ(x)}{ˆp(x) p(x)}]π(x) dx. Consider local alternatives p(x) p θ (x) + n 1/2 h 1/4 n (x), where θ denotes the true parameter. This covers the null hypothesis as a special case with n. We write T n for the first term of T n. We will show that T n determines the distribution of the test statistic under the null hypothesis and that T n b h converges in distribution, 15 T n b h nh 1/2 {ˆp(x) p(x)} 2 π(x) dx b h N(, V ) (n ), (S.1) i.e., T n is under H approximately normally distributed, with mean b h O(h 1/2 ) tending to infinity and variance V. Before proving (S.1) we consider the second term of T n and, writing p(x) p θ (x) + n 1/2 h 1/4 n (x), we can split it into three parts: nh 1/2 {p θ (x) pˆθ(x) + n 1/2 h 1/4 n (x)} 2 π(x) dx nh 1/2 {p θ (x) pˆθ(x)} 2 π(x) dx + nh 1/2 {n 1/2 h 1/4 n (x)} 2 π(x) dx + 2nh 1/2 {p θ (x) pˆθ(x)}n 1/2 h 1/4 n (x) π(x) dx n (x) 2 π(x) dx + o p (1). The resulting integral comes from the middle part. To see that the first part involving (p θ pˆθ) 2 is asymptotically negligible, use assumptions (A6) and (A7), which ensure that p θ is Lipschitz C 217 Biometrika Trust

2 U.U. MÜLLER AND I. VAN KEILEGOM 2 25 3 35 in θ, uniformly in x, and that ˆθ is root-n consistent. This gives the order O p (h 1/2 ) o p (1) as h. The third part in the above display vanishes with rate O p (h 1/4 ) o p (1), which follows by the same arguments, combined with the assumption that n ( ) is uniformly bounded. Now consider the mixed third term of T n. We will show that this term is asymptotically negligible, nh 1/2 {p(x) pˆθ(x)}{ˆp(x) p(x)}π(x) dx o p (1). (S.2) The proof requires some auxiliary results, which we provide when we verify (S.1), the asymptotic normality of T n. It is therefore postponed to the end of this proof. Let Λ denote the cumulative conditional hazard function Λ(t x) log S(t x) and let ˆΛ(t x) be the nonparametric estimator provided in Xu and Peng (214). These authors show that (nh) 1/2 {ˆp(x) p(x)} S(Y 1 (n) x)(nh)1/2 {ˆΛ(Y 1 (n) x) Λ(Y 1 (n) x)} + o p (1). (S.3) They also show that Y 1 (n) is weakly consistent for sup x τ (x), and that (nh) 1/2 {S(Y 1 (n) x) p(x)} o p (1). This yields (nh) 1/2 {ˆp(x) p(x)} p(x)(nh) 1/2 {ˆΛ(Y 1 (n) x) Λ(Y 1 (n) x)} + o p (1). In order to derive asymptotic normality of T n, we will need an approximation of ˆΛ Λ, which is, for example, provided in Du and Akritas (22). They show in their Theorem 3.1 that, uniformly in x, ˆΛ(t x) Λ(t x) w hi (x)ζ i (t x) + r(t, x) i1 with sup x,t t r(t, x) O{(nh) 3/4 (log n) 3/4 } almost surely, where t satisfies sup x H(t x) < 1, with w hi as specified in Section 2, and with ζ i (t x) defined in (6). It follows from arguments in Xu and Peng that ˆΛ(Y(n) 1 x) Λ(Y 1 (n) x) is asymptotically equivalent to n i1 w hi(x)ζ i {τ (x) x}. Then, T n nh 1/2 {ˆp(x) p(x)} 2 π(x) dx + o p (1) [ nh 1/2 w hi (x)ζ i {τ (x) x}] dx i1 [ + nh 1/2 r{τ (x), x} w hi (x)ζ i {τ (x) x}] dx i1 + nh 1/2 [r{τ (x), x}] 2 π(x) dx + o p (1) [ nh 1/2 w hi (x)ζ i {τ (x) x}] dx + op (1). i1 The last two terms in the above decomposition of T n (second equality) are of order o p (1), which can be explained as follows. Consider the approximation (S.3) of ˆp(x) p(x) by Du and Akritas (22), which involves the difference ˆΛ Λ; see the proof of Theorem 3.1 in Du and Akritas

Goodness-of-fit tests for the cure rate 3 (22) for the definition of that difference. It depends on a quantity Ĥ whose order is given in Lemma 4.2 of the same paper. This yields ˆp(x) p(x) O p {(nh) 1/2 (log n) 1/2 }. Combining this order for the main term n i1 w hi(x)ζ i {τ (x) x} and the order O p {(nh) 3/4 (log n) 3/4 } for the remainder term yields the order O p {nh 1/2 (nh) 5/4 (log n) 5/4 } for the mixed second 4 term. This is o p (1) since we assume nh 3 (log n) 5. The third term is o p (1) because it contains the squared remainder term and is therefore of smaller order. Inserting the definition of w hi into the above gives Now write where T 1n h1/2 n T 2n h1/2 n T n nh 1/2 [n 1 i1 i1 j1 j i i1 K h (x X i ) ζ i {τ (x) x}] dx + op (1). (S.4) T n T 1n + T 2n + o p (1), Kh 2(x X i) f 2 ζi 2 {τ (x) x}π(x) dx, (x) K h (x X i )K h (x X j ) f 2 ζ i {τ (x) x}ζ j {τ (x) x}π(x) dx. (x) Analogously as in the proof of Proposition 1 in Härdle and Mammen (1993), but now with ζ j in place of heteroscedastic errors ε j, we obtain the bias term as follows: E(T 1n ) h 1/2 f 2 (x) E[K2 h (x X)ζ2 {τ (x) x)}]π(x) dx h 1/2 h 1/2 b h + o(1), f 2 (x) E[ 1 f 2 (x) with, using assumptions (A1) and (A5), b h h 1/2 R(K) ) E[ζ 2 {τ (x) x} X] ] π(x) dx h 2 K2( x X h h K2 (v)f(x + vh)µ xx (x + vh) dv π(x) dx π(x) µ xx (x) dx O(h 1/2 ). We have nh 3, thanks to our assumptions on the bandwidth, and therefore var(t 1n X 1,..., X n ) O p (n 1 h 3 ) o p (1), T 1n b h + o p (1). The desired normal approximation (S.1), T n b h N(, V ), now follows from this statement, 45 combined with the asymptotic normality of T 2n, T 2n N(, V ) in distribution as n, which follows analogously to the arguments outlined in Härdle and Mammen (1993), using The-

5 55 4 U.U. MÜLLER AND I. VAN KEILEGOM orem 2.1 by de Jong (1987): write T 2n h1/2 V ij, n V ij i1 j1 j i K h(x X i )K h (x X j ) f 2 ζ i {τ (x) x}ζ j {τ (x) x}π(x) dx. (x) De Jong s theorem requires that T 2n is clean in the sense that the conditional expectations of the V ij given X i (i 1,..., n) vanish. Using similar arguments as in Cao and González- Manteiga (28; pages 183-184), we see that ζ i {τ (x) x} and ζ j {τ (x) x} in the expression of V ij can be replaced by ζ i {τ (X i ) X i } and ζ j {τ (X j ) X j }. Since E[ζ{τ (x) x} X x], it follows that T 2n is indeed clean up to an asymptotically negligible term (and the asymptotic mean is zero). In order to obtain the asymptotic variance, it suffices to calculate the second moment of V ij. We have ( E(Vij) 2 1 E f 2 (x) K h(x X i )K h (x X j )ζ i {τ (x) x}ζ j {τ (x) x}π(x) p2 (z) ) f 2 (z) K h(z X i )K h (z X j )ζ i {τ (z) z}ζ j {τ (z) z}π(z) dx dz p 2 (z)π(x)π(z) f 2 (x)f 2 E[K h (x X i )K h (z X i )ζ i {τ (x) x}ζ i {τ (z) z}] (z) E[K h (x X j )K h (z X j )ζ j {τ (x) x}ζ j {τ (z) z}] dx dz p 2 (z)π(x)π(z) f 2 (x)f 2 (z) ( K h (x u)k h (z u)e[ζ{τ (x) x}ζ{τ (z) z} X u]f(u) du ) 2 dx dz p 2 (z)π(x)π(z) µ xz (x)µ xz (z) { 1 ( z x ) }2 f(z) h K K dx dz + o(1) h as h. Hence, using the fact that T 2n is clean, we obtain with ( h 1/2 ) var(t 2n ) var n 2 V ij V h 2h i<j 4h n 2 cov(v ij, V kl ) 4h n 2 i<j k<l 2hE(V 2 ij) + o(1) V h + o(1), p 2 (z)π(x)π(z){ 1 ( z x ) }2 f(z) h K K h i<j E(V 2 ij) + o(1) E[ζ{τ (x) x}ζ{τ (z) z} X x]e[ζ{τ (x) x}ζ{τ (z) z} X z] dx dz. Since p( ) is continuous and µ xz (z) is continuous in z by assumption (A5), we can approximate V h by V, as on page 1931 of Härdle and Mammen (1993), and obtain the desired formula, {p V 2K (4) 2 (x)π(x)e[ζ 2 {τ (x) x} X x] } 2dx. ()

Goodness-of-fit tests for the cure rate 5 It remains to verify equation (S.2), that the third mixed term of T n is asymptotically negligible. To see this use the arguments by Xu and Peng (214) and Du and Akritas (22), which we used above to derive approximation (S.4), namely ˆp(x) p(x) p(x)n 1 i1 K h (x X i ) ζ i {τ (x) x} + O{(nh) 3/4 (log n) 3/4 }. almost surely. Inserting the first term on the right-hand side into (S.2) gives nh 1/2 1 {p(x) pˆθ(x)}p(x)n i1 K h (x X i ) ζ i {τ (x) x}π(x) dx. (S.5) To determine the order of the part involving the remainder term remember that p(x) p θ (x) + n 1/2 h 1/4 n (x) with n uniformly bounded. Now use the assumptions on the bandwidth and assumptions (A6) and (A7) to obtain p(x) pˆθ(x) p θ (x) pˆθ(x) + n 1/2 h 1/4 n (x) O p (n 1/2 h 1/4 ). Hence the second part of the statistic has the rate O p { nh 1/2 (nh) 3/4 (log n) 3/4 n 1/2 h 1/4} O p { n 1/4 h 1/2 (log n) 3/4} o p (1), so it suffices to study statistic (S.5). Using a Taylor expansion we write it as a sum of three terms, T 31 + T 32 + T 33, where T 31 nh 1/2 (ˆθ θ) p(x)n 1 K h (x X i ) ζ i {τ (x) x}ṗ θ (x)π(x) dx, i1 T 32 nh 1/2 (ˆθ θ) p(x)n 1 K h (x X i ) ζ i {τ (x) x}{ṗ ξ (x) ṗ θ (x)}π(x) dx, i1 T 33 nh 1/2 n 1/2 h 1/4 p(x)n 1 K h (x X i ) ζ i {τ (x) x} n (x)π(x) dx. i1 Here ṗ t denotes the vector of partial derivatives with respect to the parameter evaluated at t, and 6 ξ is an intermediate value between θ and ˆθ. Consider the second term first and use the Lipschitz assumption (A6) on the gradient ṗ θ and the root-n consistency of ˆθ, assumption (A7), to obtain T 32 O p (nh 1/2 n 1/2 n 1/2 ) o p (1). The first and the third term, T 31 and T 33 are similar. We consider only T 33, which is more difficult to handle since it contains the additional factor h 1/4 and therefore converges at a slower rate. To get the desired o p (1) rate note that the order of the integral in T 33 is O(h 3 ) using the assumptions on the kernel K, and the formula of E{ζ i (t x) X i } given in equation (3.3) on page 474 in Van Keilegom and Veraverbeke (1997). Our function ζ is the function g in that paper. This gives E(T 33 ) nh 1/2 n 1/2 h 1/4 O(h 3 ) O(n 1/2 h 13/4 ) o(1).

6 U.U. MÜLLER AND I. VAN KEILEGOM The second moment, E(T33 2 ), is nh 1/2 n 2 ( E i1 p(x)p(y) K h(x X i ) K h (y X i ) f(y) E [ ] ) ζ i {τ (x) x}ζ i {τ (y) y} X i n (x) n (y)π(x)π(y) dx dy ( + nh 1/2 n 2 1 E i,j1 j i p(x)p(y) K h(x X i ) K h (y X j ) f(y) E [ ] [ ] ) ζ i {τ (x) x} X i E ζj {τ (y) y} X j n (x) n (y)π(x)π(y) dx dy Now use again the order of the integral in the expression of T 33 to obtain E(T 2 33) O(h 1/2 ) + O(nh 1/2 h 6 ) o(1) + O(nh 6.5 ) o(1). Since both the mean and the variance of T 33 tend to zero in probability, we can apply Chebyshev s inequality and obtain the desired T 33 o p (1). This completes the proof of (S.2) 65 S.2. Proof of equation (8) We sketch the proof of the asymptotc equivalence of T n and T n, that is, Tn T n o p (1). Consider T n T n nh 1/2 {ˆp(x) pˆθ(x)} 2 d{ ˆF (x) F (x)} nh 1/2 {ˆp(x) pˆθ(x)} 2 d{ ˆF (x) F (x)} +nh 1/2 {ˆp(x) pˆθ(x)} 2 d{ F (x) F (x)}, (S.6) where F (x) ˆF (x vs) dl(v), L( ) l(u) du, l is a symmetric kernel density function with mean zero and bounded support, and s s n is a bandwidth sequence controlling the smoothness of F (x). We first show sup ˆF (x) F (x) o p (n 1/2 ), (S.7) x 7 assuming ns 4 as n, using arguments similar, but in fact much simpler, than those used in the proof of Lemma A.3 in a 217 preprint by Neumeyer and Van Keilegom on distribution functions of residuals entitled Bootstrap of residual processes in regression: to smooth or not to smooth?. Write F (x) ˆF (x) { ˆF (x vs) ˆF (x)} dl(v) { ˆF (x vs) ˆF (x) F (x vs) + F (x)} dl(v) + {F (x vs) F (x)} dl(v) n 1/2 {E n (x vs) E n (x)} dl(v) + {F (x vs) F (x)} dl(v),

Goodness-of-fit tests for the cure rate 7 where E n denotes the empirical process n 1/2 n i1 {1(X i ) F ( )}. Since E n is asymptotically equicontinuous, and since l has a bounded support, it follows that the first term has the 75 order o p (n 1/2 ), uniformly in x. To see that the second term has the order O p (s 2 ) o p (n 1/2 ), use the fact that f is bounded, that ns 4 o(1) and that the kernel l has mean zero. This proves (S.7). Consider the first term of (S.6). Using integration by parts, we can write it as the sum of two integrals.the first integral can be bounded by 2nh 1/2 ˆF (x) F (x) ˆp(x) pˆθ(x) ˆp (x) p ˆθ(x) dx. To determine the rate of ˆp(x) pˆθ(x), we only need to consider ˆp(x) p(x), because it is of larger order than pˆθ(x) p(x), which converges with the parametric root-n rate, thanks to Assumptions (A6) and (A7). As explained in the paragraph after equation (S.4), we have ˆp(x) p(x) O p {(nh) 1/2 (log n) 1/2 }. Further it can be shown that ˆp (x) p ˆθ(x) O p {(nh 3 ) 1/2 (log n) 1/2 }, uniformly in x. This combined with (S.7) gives the order nh 1/2 o p (n 1/2 ) O p {(nh) 1/2 (log n) 1/2 } O p {(nh 3 ) 1/2 (log n) 1/2 } o p {(nh 3 ) 1/2 log n} o p (1) for the above term. For the last step we used the assumptions on the bandwidth h h n in Theorem 1: we assume that nh 3 (log n) 5, which implies (nh 3 ) 1/2 log n O(1). The same arguments yield for the second integral the order nh 1/2 o p (n 1/2 ) [O p {(nh) 1/2 (log n) 1/2 }] 2, which is o p (1), since it is of smaller order than the first one. This yields the desired order o p (1) for the first term on the right-handside of (S.6). 8 The second term of (S.6) is nh 1/2 {ˆp(x) pˆθ(x)} 2 { } dx, where (ns) 1 n i1 l{(x i x)/s} is the standard kernel density estimator with O p {(ns) 1/2 (log n) 1/2 }, uniformly in x. It is of order nh 1/2 (nh) 1 log n (ns) 1/2 (log n) 1/2 n 1/2 h 1/2 s 1/2 (log n) 3/2 o(1), if s is chosen such that nhs(log n) 3. Hence both terms of (S.6) are of order o p (1), which completes the proof of T n T n o p (1). REFERENCES DE JONG, P. (1987). A central limit theorem for generalized quadratic forms. Probab. Theory Related Fields., 75, 261-277. 85 DU, Y. AND AKRITAS, M.G. (22). Uniform strong representation of the conditional Kaplan Meier process. Math. Methods Statist., 11, 152-182. HÄRDLE, W. AND MAMMEN, E. (1993) Comparing nonparametric versus parametric regression fits. Ann. Statist., 21, 1926-1947. VAN KEILEGOM, I. AND VERAVERBEKE, N. (1997). Estimation and bootstrap with censored data in fixed design 9 nonparametric regression. Ann. Inst. Statist. Math., 49, 467-491. XU, J. AND PENG, Y. (214). Nonparametric cure rate estimation with covariates. Canad. J. Statist., 42, 1-17. [Received 2 January 217. Editorial decision on 1 April 217]