Inconsistency of the MLE for the joint distribution of interval censored survival times and continuous marks


By M.H. Maathuis and J.A. Wellner
Department of Statistics, University of Washington, Seattle, Washington 98195, U.S.A.
August 23, 2005

Summary

This article considers the nonparametric maximum likelihood estimator (MLE) for the joint distribution function of an interval censored survival time and a continuous mark variable. We derive a new explicit formula for the MLE that has both computational and theoretical advantages. Using this formula and the mark specific cumulative hazard function of Huang & Louis (1998), we derive the almost sure limit of the MLE. We conclude that the MLE is inconsistent in general. We show that the inconsistency can be repaired by discretizing the marks, and illustrate the behavior of the inconsistent and repaired MLE in several examples.

Some key words: Competing risk; Inconsistency; Interval censoring; Mark variable; Multivariate distribution; Nonparametric maximum likelihood; Survival analysis

1. Introduction

We consider bivariate data $(X,Y)$, where $X$ is a survival time and $Y$ is a mark variable which is observed if and only if $X$ is not right censored. This type of data arises in various situations. For example, the classical competing risks problem fits in this framework, with $X$ being the failure time and $Y$ the failure cause. Alternatively, $X$ can be the time of onset of a disease and $Y$ its incubation period, or $X$ can be the time of death and $Y$ a measure of utility or cost, such as quality adjusted lifetime or lifetime medical costs (Huang & Louis, 1998). Another example is the HIV vaccine trial data analyzed in Hudgens et al. (2005), where $X$ is the time of HIV infection and $Y$ is the viral distance between the infecting HIV virus and the virus in the vaccine.

In practice the marginal distribution of $Y$ is often of interest. However, since $Y$ is observed if and only if $X$ is not right censored, the observed values of $Y$ typically form a biased sample. It is therefore important to consider the bivariate data. We focus on the nonparametric maximum likelihood estimator (MLE) $\hat F_n(x,y)$ of the bivariate distribution function $F(x,y)$.

One can distinguish various models, with different types of censoring mechanisms for $X$, and with $Y$ being discrete or continuous. We first discuss the case that $Y$ is discrete, which gives the competing risks model. Aalen (1976, 1978) and Kalbfleisch & Prentice (1980, § 7.2) studied the MLE in this model when $X$ is subject to right censoring. The generalization to interval censored competing risks data was considered by Hudgens et al. (2001), Jewell et al. (2003) and Jewell & Kalbfleisch (2004). Maathuis (2005a) studied asymptotic properties of the MLE in the current status competing risks model and proved uniform strong consistency of the MLE under modest conditions.

We now consider the case that $Y$ is continuous, which gives the continuous mark model. Huang & Louis (1998) studied this model when $X$ is subject to right censoring. They proved uniform strong consistency of the MLE and derived its limiting distribution. Hudgens et al. (2005) considered the extension to interval censored continuous mark data. They characterized the MLE and studied its finite sample properties.

In this article we continue the study of interval censored continuous mark data. Our main focus is on asymptotic properties of the MLE, and in particular on consistency. In § 2 we use the analogy with univariate right censored data to derive a new explicit formula for the MLE. In § 3 we use this formula and the mark specific cumulative hazard function of Huang & Louis (1998) to derive the almost sure limit of the MLE. We conclude that the MLE is inconsistent in general. In § 4 we show that the inconsistency can be repaired by discretizing the marks. In § 5 we illustrate the behavior of the inconsistent and repaired MLE in four examples. Finally, § 6 gives a summary and a short discussion of some remaining issues.

2. Explicit formula for the MLE

2.1. Intermezzo: univariate right censored data

Hudgens et al. (2005) noted a close connection between the MLE for univariate right censored data and the MLE for interval censored continuous mark data. We will use this connection in § 2.2 to derive a new explicit formula for the MLE in the interval censored continuous mark model. However, we first briefly review univariate right censored data in a way that shows the similarity between the two models.

Let $X > 0$ be a survival time subject to right censoring. Let $T > 0$ be the censoring variable, with $T$ independent of $X$. Let $U = X \wedge T \equiv \min(X,T)$ and $\Delta = 1\{X \le T\}$. We are interested in the MLE $\hat F_n(x)$ of $F(x) = P(X \le x)$ based on $n$ independent and identically distributed copies $(U_1,\Delta_1),\dots,(U_n,\Delta_n)$ of $(U,\Delta)$.

We call the set of $X$ values that are consistent with an observation $(U,\Delta)$ an observed set $A$. Thus, we have $A = \{U\}$ if $\Delta = 1$ and $A = (U,\infty)$ if $\Delta = 0$. Let $U_{(1)},\dots,U_{(n)}$ be the order statistics of $U_1,\dots,U_n$, and let $\Delta_{(i)}$ and $A_{(i)}$ be the corresponding values of $\Delta$ and $A$. We assume all $A_i$ with $\Delta_i = 1$ are distinct, since this will be the case for the continuous mark data. However, we allow ties in the $T$'s and $U$'s provided this assumption is not violated. We break such ties in $U$ arbitrarily after ensuring that observations with $\Delta = 1$ are ordered before those with $\Delta = 0$.

Assuming $F$ has a density $f$ with respect to some dominating measure $\mu$, the likelihood (conditional on the censoring distribution $G$) is $L(F) = \prod_{i=1}^n q(U_i,\Delta_i)$, where
\[ q(u,\delta) = f(u)^{\delta}\{1 - F(u)\}^{1-\delta}. \]
The first term of $q$ is a density-type term, and hence $L(F)$ can be made arbitrarily large by letting $f$ peak at some value $U_i$ with $\Delta_i = 1$. This problem is usually solved by maximizing $L(F)$ over the class of distribution functions that have a density with respect to counting measure on the observed failure times. We can then write $L(F) = \prod_{i=1}^n P_F(A_i)$, where $P_F(A)$ is the probability of $A$ under $F$.

It is well-known that the MLE in censored data problems can only assign mass to a finite number of disjoint regions (Turnbull, 1976). In these regions the observed sets have maximal overlap, and hence they were called maximal intersections by Wong & Yu (1999). Maathuis (2005b) defined a height map $h : \mathbb{R}^d \to \mathbb{N}$, where $d$ is the dimension of the observed sets and $h(x)$ is the number of observed sets containing $x$. Furthermore, she defined a version of the observed sets in which ties are resolved, called canonical observed sets. She then showed that the maximal intersections correspond to the local maxima of the height map of the canonical observed sets. For univariate right censored data, this means that each $A_{(i)}$ with $i \in \mathcal{I} = \{i \in \{1,\dots,n\} : \Delta_{(i)} = 1\}$ is a maximal intersection. We denote these maximal intersections by $M_{(i)}$. This notation may seem a little redundant since $M_{(i)} = A_{(i)}$, but it will be useful in the next section. Furthermore, there is an extra maximal intersection $M_{(n+1)} = A_{(n)} = (U_{(n)},\infty)$ if and only if $\Delta_{(n)} = 0$. Let $\bar{\mathcal{I}}$ be the collection of indices of all maximal intersections. Thus, $\bar{\mathcal{I}} = \mathcal{I}$ if $\Delta_{(n)} = 1$ and $\bar{\mathcal{I}} = \mathcal{I} \cup \{n+1\}$ if $\Delta_{(n)} = 0$.

Let $p_i$ be the probability mass of maximal intersection $M_{(i)}$, $i \in \bar{\mathcal{I}}$. We can then write the likelihood in terms of the $p_i$'s:
\[ \prod_{i=1}^n P(A_i) = \prod_{i=1}^n \sum_{j \in \bar{\mathcal{I}}} p_j 1\{M_{(j)} \subseteq A_{(i)}\} = \prod_{i=1}^n p_i^{\Delta_{(i)}} \Bigl( \sum_{j \ge i+1,\, j \in \bar{\mathcal{I}}} p_j \Bigr)^{1-\Delta_{(i)}}. \qquad (1) \]
The MLE $\hat p$ maximizes this expression under the constraints
\[ \sum_{i \in \bar{\mathcal{I}}} p_i = 1 \quad \text{and} \quad p_i \ge 0 \text{ for all } i \in \bar{\mathcal{I}}. \qquad (2) \]
It is well-known that $\hat p$ is the Kaplan-Meier or product-limit estimator, given by
\[ \hat p_i = \prod_{j=1}^{i-1} \Bigl( 1 - \frac{\Delta_{(j)}}{n-j+1} \Bigr) \frac{\Delta_{(i)}}{n-i+1}, \qquad i \in \mathcal{I}, \]
and $\hat p_{n+1} = 1 - \sum_{i \in \mathcal{I}} \hat p_i$ if $\Delta_{(n)} = 0$ (see for example Shorack & Wellner (1986), Chapter 7). Equivalently, we can write
\[ \sum_{j \ge i,\, j \in \bar{\mathcal{I}}} \hat p_j = \prod_{j \le i-1} \Bigl( 1 - \frac{\Delta_{(j)}}{n-j+1} \Bigr), \qquad i \in \bar{\mathcal{I}}. \]
The vector $\hat p$ is uniquely determined. We obtain $\hat F_n(x)$ by summing all probability mass in the interval $(0,x]$. It is well-known that $\hat F_n(x)$ is non-unique for $x > U_{(n)}$ if and only if $\Delta_{(n)} = 0$. This is caused by the fact that the MLE is indifferent to the distribution of mass within a maximal intersection, called representational non-uniqueness by Gentleman & Vandal (2002). Since all $\{M_{(i)} : i \in \mathcal{I}\}$ are points, this non-uniqueness only occurs when $M_{(n+1)}$ exists, which happens if and only if $\Delta_{(n)} = 0$.
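The product-limit masses above are easy to compute directly. The following is a minimal sketch, assuming NumPy is available; the function name `kaplan_meier_masses` and its return layout are illustrative choices, not part of the original text. It sorts by $U$ with uncensored observations first among ties, as prescribed above, and returns the mass on each uncensored point together with the leftover mass on $(U_{(n)},\infty)$.

```python
import numpy as np

def kaplan_meier_masses(u, delta):
    """Masses p_hat on the maximal intersections for right censored data.

    u     : observed times U_i = min(X_i, T_i)
    delta : indicators Delta_i = 1{X_i <= T_i}
    Returns the sorted times, the sorted indicators, the mass attached to
    each sorted observation (zero for censored ones), and the leftover mass
    that belongs to the extra maximal intersection (U_(n), infinity).
    """
    u = np.asarray(u, dtype=float)
    delta = np.asarray(delta, dtype=int)
    # Sort by U; among ties, observations with Delta = 1 come first.
    order = np.lexsort((1 - delta, u))
    u_s, d_s = u[order], delta[order]
    n = len(u_s)
    at_risk = n - np.arange(n)              # n - i + 1 for i = 1, ..., n
    factors = 1.0 - d_s / at_risk           # 1 - Delta_(j) / (n - j + 1)
    # Product over j <= i - 1; the empty product for i = 1 equals one.
    surv_before = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    p_hat = surv_before * d_s / at_risk
    leftover = 1.0 - p_hat.sum()            # zero when Delta_(n) = 1
    return u_s, d_s, p_hat, leftover

if __name__ == "__main__":
    # Toy data (hypothetical): four observations, the third is censored.
    print(kaplan_meier_masses([2.0, 1.0, 3.0, 4.0], [1, 1, 0, 1]))
```

Since sorting dominates the cost, the sketch runs in $O(n \log n)$ time.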

2.2. Continuous mark data

We first introduce the model formally. Let $X \in \mathbb{R}_+ = (0,\infty)$ be a survival time, let $Y \in \mathbb{R}$ be a continuous mark variable, and let $F(x,y) = P(X \le x, Y \le y)$ be their joint distribution. Let $X$ be subject to interval censoring case $k$, using the terminology of Groeneboom & Wellner (1992). Let $T = (T_1,\dots,T_k)$ be the $k$ observation times and let $G$ be their distribution. We assume $T$ is independent of $(X,Y)$ and $G(\{0 < T_1 < \dots < T_k\}) = 1$. We use subscripts to denote the marginal distributions of $G$. For example, $G_1$ is the distribution of $T_1$ and $G_{23}$ is the distribution of $(T_2,T_3)$. Let $\Delta = (\Delta_1,\dots,\Delta_{k+1})$ be a vector of indicator functions, where $\Delta_j = 1\{T_{j-1} < X \le T_j\}$ for $j = 1,\dots,k+1$, $T_0 = 0$ and $T_{k+1} = \infty$. We say that $X$ is right censored if $\Delta_{k+1} = 1$, and we assume $Y$ is observed if and only if $X$ is not right censored. Thus, we observe $W = (T,\Delta,Z)$, where $Z = \Delta_+ Y$ and $\Delta_+ = \sum_{j=1}^k \Delta_j = 1 - \Delta_{k+1}$.

We study the nonparametric maximum likelihood estimator $\hat F_n(x,y)$ for $F(x,y)$ based on $n$ independent and identically distributed copies $W_1,\dots,W_n$ of $W$, where $W_i = (T_i,\Delta_i,Z_i)$. We allow ties between components of the vectors $T_i$ and $T_j$ for $i \ne j$. In this model an observed set $A$ is the set of $(X,Y)$ values that are consistent with an observation $W = (T,\Delta,Z)$. Thus,
\[ A = \begin{cases} (T_{j-1},T_j] \times \{Z\} & \text{if } \Delta_j = 1,\ j = 1,\dots,k, \\ (T_k,\infty) \times \mathbb{R} & \text{if } \Delta_{k+1} = 1. \end{cases} \]
Note that $A$ is a line segment if $\Delta_+ = 1$ and $A$ is a half plane if $\Delta_{k+1} = 1$.

Assuming $F$ has a density $f$ with respect to some dominating measure $\mu_X \times \mu_Y$, the likelihood (conditional on $G$) is given by $L(F) = \prod_{i=1}^n q(W_i)$, where
\[ q(w) = q(t,\delta,z) = \prod_{j=1}^k \Bigl\{ \int_{(t_{j-1},t_j]} f(s,z)\,\mu_X(ds) \Bigr\}^{\delta_j} \bigl(1 - F_X(t_k)\bigr)^{\delta_{k+1}}. \qquad (3) \]

The first term of $q$ is a density-type term. Hence, $L(F)$ can be made arbitrarily large by letting $f(s,z)$ peak at $z = Z_i$ for some observation with $\Delta_{+i} = 1$. We therefore define the MLE $\hat F_n(x,y)$ to be the maximizer of $L(F)$ over the class $\mathcal{F}$ of all bivariate distribution functions that have a marginal density $f_Y$ with respect to counting measure on the observed marks. We can then write $L(F) = \prod_{i=1}^n P_F(A_i)$.

As in Maathuis (2005b), we call the projection of $A$ on the $x$- and $y$-axis its $x$-interval and $y$-interval. We denote the left and right endpoint of the $x$-interval of $A$ by $L$ and $R$ and define a new variable $U$:
\[ L = \sum_{j=1}^{k+1} \Delta_j T_{j-1}, \qquad R = \sum_{j=1}^{k+1} \Delta_j T_j, \qquad U = \Delta_+ R + \Delta_{k+1} L. \qquad (4) \]
Note that $U$ equals $T$ if $X$ is subject to current status censoring. The variable $U$ plays an important role in this article, because it determines the order of the observations. Let $U_{(1)},\dots,U_{(n)}$ be the order statistics of $U_1,\dots,U_n$ and let $\Delta_{(i)}$, $Z_{(i)}$, $A_{(i)}$, $L_{(i)}$ and $R_{(i)}$ be the corresponding values of $\Delta$, $Z$, $A$, $L$ and $R$. Here $\Delta_{(i)} = (\Delta_{1(i)},\dots,\Delta_{k+1,(i)})$. We break ties in $U$ arbitrarily after ensuring that observations with $\Delta_+ = 1$ are ordered before those with $\Delta_+ = 0$.

Let $\mathcal{I} = \{i \in \{1,\dots,n\} : \Delta_{+(i)} = 1\}$. Recall that the maximal intersections are the local maxima of the height map of the canonical observed sets. Since $Y$ is continuous, the observed sets $A_{(i)}$, $i \in \mathcal{I}$, are completely distinct with probability one. Hence, each such $A_{(i)}$ contains exactly one maximal intersection $M_{(i)}$:
\[ M_{(i)} = \bigl( \max\{\{L_{(j)} : j \notin \mathcal{I},\, j < i\} \cup \{L_{(i)}\}\},\, R_{(i)} \bigr] \times \{Z_{(i)}\}, \qquad i \in \mathcal{I}. \qquad (5) \]
To understand this expression, let $S_{(i)}$ be the set of right censored observations $A_{(j)}$ with $L_{(i)} < L_{(j)} < R_{(i)}$. Then (5) implies that $M_{(i)} = A_{(i)}$ if $S_{(i)} = \emptyset$ and $M_{(i)} \subset A_{(i)}$ otherwise. Furthermore, in the latter case the left endpoint of $M_{(i)}$ is determined by the largest $L_{(j)}$ with $A_{(j)} \in S_{(i)}$. The right endpoints of $M_{(i)}$ and $A_{(i)}$ are always identical. Equation (5) also implies that the maximal intersections can be computed in $O(n \log n)$ time, which is faster than the height map algorithm of Maathuis (2005b) due to the special structure in the data.

We again have an extra maximal intersection $M_{(n+1)} = A_{(n)} = (U_{(n)},\infty) \times \mathbb{R}$ if and only if $\Delta_{+(n)} = 0$. Let $\bar{\mathcal{I}}$ be the collection of indices of all maximal intersections. Thus, $\bar{\mathcal{I}} = \mathcal{I}$ if $\Delta_{+(n)} = 1$ and $\bar{\mathcal{I}} = \mathcal{I} \cup \{n+1\}$ if $\Delta_{+(n)} = 0$. Let $p_i$ be the probability mass of maximal intersection $M_{(i)}$, $i \in \bar{\mathcal{I}}$. We can then write the likelihood as
\[ \prod_{i=1}^n P(A_i) = \prod_{i=1}^n \sum_{j \in \bar{\mathcal{I}}} p_j 1\{M_{(j)} \subseteq A_{(i)}\} = \prod_{i=1}^n p_i^{\Delta_{+(i)}} \Bigl( \sum_{j \ge i+1,\, j \in \bar{\mathcal{I}}} p_j \Bigr)^{1-\Delta_{+(i)}}. \qquad (6) \]
The MLE $\hat p$ maximizes this expression under the constraints (2). From the analogy with likelihood (1) it follows immediately that
\[ \hat p_i = \prod_{j=1}^{i-1} \Bigl( 1 - \frac{\Delta_{+(j)}}{n-j+1} \Bigr) \frac{\Delta_{+(i)}}{n-i+1}, \qquad i \in \mathcal{I}, \]
and $\hat p_{n+1} = 1 - \sum_{i \in \mathcal{I}} \hat p_i$ if $\Delta_{+(n)} = 0$. Equivalently, we can write
\[ \sum_{j \ge i,\, j \in \bar{\mathcal{I}}} \hat p_j = \prod_{j \le i-1} \Bigl( 1 - \frac{\Delta_{+(j)}}{n-j+1} \Bigr), \qquad i \in \bar{\mathcal{I}}. \qquad (7) \]
These formulas differ from the ones given in § 2.3 of Hudgens et al. (2005). The current formulas have several advantages. First, the tail probabilities (7) can be computed in time complexity $O(n \log n)$, since the computationally most intensive step consists of sorting the $U$'s. Furthermore, the current form provides additional insights into the behavior of the MLE. In particular, it shows that the MLE can be viewed as a right endpoint imputation estimator (see Remark 1) and it allows for an easy derivation of the almost sure limit of the MLE (see § 3).

The vector $\hat p$ is again uniquely determined. This was noted by Hudgens et al. (2005) and also follows from our derivation here. We obtain $\hat F_n(x,y)$ by summing all mass in the region $(0,x] \times (-\infty,y]$. We define a marginal MLE for $X$ by letting $\hat F_{Xn}(x) = \hat F_n(x,\infty)$. The estimators $\hat F_n$ and $\hat F_{Xn}$ can suffer considerably from representational non-uniqueness, since the maximal intersections $\{M_{(i)} : i \in \mathcal{I}\}$ are line segments and $M_{(n+1)}$ extends to infinity in two dimensions. We denote the estimator that assigns all mass to the upper right corners of the maximal intersections by $\hat F^l_n$, since it is a lower bound for the MLE. Similarly, we denote the estimator that assigns all mass to the lower left corners of the maximal intersections by $\hat F^u_n$, since it is an upper bound for the MLE. The formulas for $\hat F^l_n$ simplify considerably:
\[ 1 - \hat F^l_{Xn}(x) = \prod_{U_{(i)} \le x} \Bigl( 1 - \frac{\Delta_{+(i)}}{n-i+1} \Bigr), \qquad (8) \]
\[ \hat F^l_n(x,y) = \sum_{i=1}^n \hat p_i 1\{U_{(i)} \le x, Z_{(i)} \le y\} = \sum_{U_{(i)} \le x} \prod_{U_{(j)} < U_{(i)}} \Bigl( 1 - \frac{\Delta_{+(j)}}{n-j+1} \Bigr) \frac{\Delta_{+(i)}}{n-i+1} 1\{Z_{(i)} \le y\}, \qquad (9) \]
where $U$ was defined in (4).

Remark 1: The MLE $\hat F^l_n$ can be viewed as a right endpoint imputation estimator. Namely, replace the observed sets $A_{(i)}$ with $\Delta_{+(i)} = 1$ by their right endpoint:
\[ A'_{(i)} = \begin{cases} \{U_{(i)}\} \times \{Z_{(i)}\} & \text{if } i \in \mathcal{I}, \\ A_{(i)} & \text{if } i \notin \mathcal{I}. \end{cases} \]
Then the intersection structures of $\{A_{(i)}\}_{i=1}^n$ and $\{A'_{(i)}\}_{i=1}^n$ are identical. Furthermore, the maximal intersections of $\{A'_{(i)}\}_{i=1}^n$ are $\{M'_{(i)} = A'_{(i)} : i \in \mathcal{I}\}$. Hence, writing the likelihood for the imputed data in terms of $p$ yields exactly the same likelihood as (6). As a result the values $\hat p_i$, $i \in \bar{\mathcal{I}}$, are identical to the ones for the original data. Furthermore, since $\hat F^l_n$ assigns mass to the upper right corners of the maximal intersections, $\hat F^l_n$ is completely equivalent to the MLE for the imputed data. Since the observed sets $A'_{(i)}$ impute an $x$-value that is always at least as large as the unobserved value $X$, $\hat F^l_{Xn}$ tends to have a negative bias.

3. Inconsistency of the MLE

We now derive the almost sure limits $F^l_{X\infty}$ and $F^l_\infty$ of the MLEs $\hat F^l_{Xn}$ and $\hat F^l_n$. In some cases representational non-uniqueness disappears in the limit, so that $F_{X\infty} = F^l_{X\infty}$ and $F_\infty = F^l_\infty$. This occurs for all $(x,y) \in \mathbb{R}_+ \times \mathbb{R}$ if and only if the maximal intersections $M_{(i)}$, $i \in \mathcal{I}$, converge to points and $\sum_{i \in \mathcal{I}} \hat p_i \to 1$ as $n \to \infty$; see Examples 1 and 2 in § 5. If these conditions fail, then the upper bounds $F^u_{X\infty}$ and $F^u_\infty$ can be derived from their lower bounds by reassigning mass from the upper right corners to the lower left corners of the maximal intersections. We illustrate this in Examples 3 and 4 in § 5. However, we first derive the lower bounds $F^l_{X\infty}$ and $F^l_\infty$.

Let
\[ H_n(x) = \mathbb{P}_n 1\{U \le x\}, \quad x \ge 0, \]
\[ V_n(x,y) = \mathbb{P}_n \Delta_+ 1\{U \le x, Z \le y\}, \quad x \ge 0,\ y \in \mathbb{R}, \]
and $V_{1n}(x) \equiv V_n(x,\infty) = \mathbb{P}_n \Delta_+ 1\{U \le x\}$. Here $U$ is defined in (4) and $\mathbb{P}_n f(X) = n^{-1} \sum_{i=1}^n f(X_i)$. Furthermore, let
\[ \Lambda_n(x,y) = \int_{[0,x]} \frac{V_n(ds,y)}{1 - H_n(s-)}, \qquad \Lambda_{1n}(x) \equiv \Lambda_n(x,\infty) = \int_{[0,x]} \frac{V_{1n}(ds)}{1 - H_n(s-)}. \]
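Returning briefly to the construction of § 2.2, equations (4), (5) and the product formula for $\hat p_i$ translate into a few array operations. The sketch below is an illustration under stated assumptions: it uses NumPy, encodes a right censored observation by `right = inf`, and the function name `maximal_intersections` and its argument names are hypothetical. The left endpoint in (5) is obtained as a running maximum over the right censored observations that precede $i$ in the ordering by $U$.

```python
import numpy as np

def maximal_intersections(left, right, delta_plus, z):
    """Maximal intersections (5) and their masses for continuous mark data.

    left, right : endpoints L_i, R_i of the x-interval (right = inf if censored)
    delta_plus  : Delta_{+i}, equal to 1 when X_i is not right censored
    z           : marks Z_i (ignored for right censored observations)
    Returns, for the uncensored observations in the order of U, the left and
    right endpoints of M_(i), the mark Z_(i) and the mass p_hat_i.
    """
    L = np.asarray(left, dtype=float)
    R = np.asarray(right, dtype=float)
    d = np.asarray(delta_plus, dtype=int)
    z = np.asarray(z, dtype=float)
    # Equation (4): U = R if uncensored, U = L if right censored.
    u = np.where(d == 1, R, L)
    # Sort by U; among ties, observations with Delta_+ = 1 come first.
    order = np.lexsort((1 - d, u))
    L, R, d, z = L[order], R[order], d[order], z[order]
    n = len(d)
    # Running maximum of L over right censored observations with index j < i.
    cens_left = np.where(d == 0, L, -np.inf)
    prev_max = np.concatenate(([-np.inf], np.maximum.accumulate(cens_left)[:-1]))
    m_left = np.maximum(prev_max, L)
    # Product-limit masses, as for right censored data.
    at_risk = n - np.arange(n)
    surv_before = np.concatenate(([1.0], np.cumprod(1.0 - d / at_risk)[:-1]))
    p_hat = surv_before * d / at_risk
    keep = d == 1
    return m_left[keep], R[keep], z[keep], p_hat[keep]

if __name__ == "__main__":
    # Toy data (hypothetical): two interval censored and two right censored.
    print(maximal_intersections([0, 1, 2, 4], [3, np.inf, 5, np.inf],
                                [1, 0, 1, 0], [1.0, 0.0, 2.0, 0.0]))
```

Placing the returned mass at the right endpoints gives $\hat F^l_n$, and placing it at the left endpoints gives $\hat F^u_n$.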

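Since
\[ \Lambda_n(dx,y) = \frac{\mathbb{P}_n \Delta_+ 1\{U = x, Z \le y\}}{\mathbb{P}_n 1\{U \ge x\}} \quad \text{and} \quad \Lambda_{1n}(dx) = \frac{\mathbb{P}_n \Delta_+ 1\{U = x\}}{\mathbb{P}_n 1\{U \ge x\}}, \]
we can write equations (8) and (9) in terms of $\Lambda_{1n}$ and $\Lambda_n$:
\[ 1 - \hat F^l_{Xn}(x) = \prod_{s \le x} \{1 - \Lambda_{1n}(ds)\}, \qquad (10) \]
\[ \hat F^l_n(x,y) = \int_{s \le x} \prod_{u < s} \{1 - \Lambda_{1n}(du)\}\, \Lambda_n(ds,y). \qquad (11) \]
Note that (10) is analogous to the Kaplan-Meier estimator for right censored data, and (11) is analogous to equation (3.3) of Huang & Louis (1998). However, our functions $\Lambda_{1n}$ and $\Lambda_n$ are defined differently. As we will see in the following lemma and theorems, this difference lies at the root of the inconsistency problems of the MLE.

Lemma 3.1 For $I \subseteq \mathbb{R}^d$, $d \ge 1$, let $D(I)$ be the space of cadlag functions on $I$ (cadlag = right continuous with left limits). Let $\|\cdot\|$ be the supremum norm on $(D(\mathbb{R}_+), D(\mathbb{R}_+), D(\mathbb{R}_+ \times \mathbb{R}))$. Then
\[ \|(H_n - H,\ V_{1n} - V_1,\ V_n - V)\| \to 0 \quad \text{almost surely}, \qquad (12) \]
where
\[ V(x,y) = \sum_{j=1}^k \int_{[0,x]} F(t,y)\,dG_j(t) - \sum_{j=2}^k \int_{0 \le s \le t \le x} F(s,y)\,dG_{j-1,j}(s,t), \qquad (13) \]
\[ V_1(x) = \sum_{j=1}^k \int_{[0,x]} F_X(t)\,dG_j(t) - \sum_{j=2}^k \int_{0 \le s \le t \le x} F_X(s)\,dG_{j-1,j}(s,t), \qquad (14) \]
\[ H(x) = V_1(x) + \int_{[0,x]} \{1 - F_X(s)\}\,dG_k(s). \qquad (15) \]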

Proof: Equation (12) follows immediately from the Glivenko-Cantelli theorem, with $H(x) = E(1\{U \le x\})$, $V(x,y) = E(\Delta_+ 1\{U \le x, Z \le y\})$ and $V_1(x) = V(x,\infty) = E(\Delta_+ 1\{U \le x\})$. We now express $H$, $V$ and $V_1$ in terms of $F$ and $G$. Note that the events $[\Delta_j = 1]$, $j = 1,\dots,k+1$, are disjoint. Furthermore, $U = T_j$ and $Z = Y$ on $[\Delta_j = 1]$, $j = 1,\dots,k$, and $U = T_k$ on $[\Delta_{k+1} = 1]$. Hence,
\[ V(x,y) = E(\Delta_+ 1\{U \le x, Z \le y\}) = \sum_{j=1}^k P(\Delta_j = 1, Y \le y, T_j \le x) \]
\[ = \sum_{j=1}^k P(X \in (T_{j-1},T_j], Y \le y, T_j \le x) = \sum_{j=1}^k \int_{0 \le s \le t \le x} \{F(t,y) - F(s,y)\}\,dG_{j-1,j}(s,t) \]
\[ = \sum_{j=1}^k \int_{[0,x]} F(t,y)\,dG_j(t) - \sum_{j=2}^k \int_{0 \le s \le t \le x} F(s,y)\,dG_{j-1,j}(s,t), \]
using $T_0 = 0$, $X > 0$ and $G(\{0 < T_1 < \dots < T_k\}) = 1$ in the last equality. Taking $y = \infty$ yields the expression for $V_1(x)$. The expression for $H$ follows similarly, using
\[ H(x) = E\,1\{U \le x\} = \sum_{j=1}^k P(\Delta_j = 1, T_j \le x) + P(\Delta_{k+1} = 1, T_k \le x). \]

The differentials of $V$ and $V_1$ with respect to $x$ are
\[ V(dx,y) = \sum_{j=1}^k F(x,y)\,dG_j(x) - \sum_{j=2}^k \int_{s \in [0,x]} F(s,y)\,dG_{j-1,j}(s,x), \qquad (16) \]
\[ V_1(dx) = \sum_{j=1}^k F_X(x)\,dG_j(x) - \sum_{j=2}^k \int_{s \in [0,x]} F_X(s)\,dG_{j-1,j}(s,x). \qquad (17) \]
Let $\tau$ be such that $H(\tau) < 1$. We define $0/0 = 0$ and $f(x-) = \lim_{t \uparrow x} f(t)$.
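As a quick check of (13)-(17), consider current status censoring ($k = 1$), the case used in Corollaries 3.6 and 3.8. The sums over $j \ge 2$ are then empty, and writing $G$ for $G_1$ the expressions collapse as follows (a worked special case under the assumption $k = 1$):
\[ V(x,y) = \int_{[0,x]} F(t,y)\,dG(t), \qquad V_1(x) = \int_{[0,x]} F_X(t)\,dG(t), \]
\[ H(x) = \int_{[0,x]} F_X(t)\,dG(t) + \int_{[0,x]} \{1 - F_X(t)\}\,dG(t) = G(x). \]
In this case $U = T$, so $H$ is simply the distribution function of the observation time, and the requirement $H(\tau) < 1$ reduces to $G(\tau) < 1$.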

Theorem 3.2 Let $\|\cdot\|$ be the supremum norm on $(D[0,\tau], D([0,\tau] \times \mathbb{R}))$. Then
\[ \|(\Lambda_{1n} - \Lambda_{1\infty},\ \Lambda_n - \Lambda_\infty)\| \to 0 \quad \text{almost surely}, \]
where
\[ \Lambda_\infty(x,y) = \int_{[0,x]} \frac{V(ds,y)}{1 - H(s-)}, \qquad x \in [0,\tau],\ y \in \mathbb{R}, \qquad (18) \]
\[ \Lambda_{1\infty}(x) = \Lambda_\infty(x,\infty) = \int_{[0,x]} \frac{V_1(ds)}{1 - H(s-)}, \qquad x \in [0,\tau]. \qquad (19) \]

Proof: The proof is similar to the discussion on page 1536 of Gill & Johansen (1990). For all $x \ge 0$, let $H_n^-(x) \equiv H_n(x-)$ and consider the mappings
\[ (H_n^-, V_{1n}, V_n) \to ((1 - H_n^-)^{-1}, V_{1n}, V_n) \to (\Lambda_{1n}, \Lambda_n) \]
on the spaces
\[ (D^-[0,\tau], D[0,\tau], D([0,\tau] \times \mathbb{R})) \to (D^-[0,\tau], D[0,\tau], D([0,\tau] \times \mathbb{R})) \to (D[0,\tau], D([0,\tau] \times \mathbb{R})), \]
where $D^-[0,\tau]$ is the space of caglad (left continuous with right limits) functions on $[0,\tau]$. The first mapping is continuous with respect to the supremum norm when we restrict the domain of its first argument to elements of $D^-[0,\tau]$ that are bounded by, say, $\{1 + H(\tau)\}/2 < 1$. Strong consistency of $H_n$ ensures it satisfies this bound with probability one for $n$ large enough. The second mapping is continuous with respect to the supremum norm by the Helly-Bray lemma. Combining the continuity of these mappings with Lemma 3.1 yields the result of the theorem.

Theorem 3.3 Let $\|\cdot\|$ be the supremum norm on $(D[0,\tau], D([0,\tau] \times \mathbb{R}))$. Then
\[ \|(\hat F^l_{Xn} - F^l_{X\infty},\ \hat F^l_n - F^l_\infty)\| \to 0 \quad \text{almost surely}, \]
where
\[ F^l_{X\infty}(x) = 1 - \prod_{s \le x} \{1 - \Lambda_{1\infty}(ds)\}, \qquad (20) \]
\[ F^l_\infty(x,y) = \int_{u \le x} \prod_{s < u} \{1 - \Lambda_{1\infty}(ds)\}\, \Lambda_\infty(du,y). \qquad (21) \]

Proof: To derive the almost sure limit of $\hat F^l_{Xn}$ consider the mapping
\[ \Lambda_{1n} \to \prod_{s \le x} \{1 - \Lambda_{1n}(ds)\} = 1 - \hat F^l_{Xn}(x) \qquad (22) \]
on the space $D[0,\tau]$ to itself. This mapping is continuous with respect to the supremum norm when its domain is restricted to functions of uniformly bounded variation (Gill & Johansen (1990), Theorem 7). Note that $\Lambda_{1n} \le 1/\{1 - H_n(\tau)\} < 2/\{1 - H(\tau)\}$ with probability one for $n$ large enough. Together with the monotonicity of $\Lambda_{1n}$ this implies that with probability one $\Lambda_{1n}$ is of uniformly bounded variation for $n$ large enough. The almost sure limit of $\hat F^l_{Xn}$ now follows by combining Theorem 3.2 and the continuity of (22).

To derive the almost sure limit of $\hat F^l_n$ consider the mapping
\[ (\Lambda_{1n}, \Lambda_n) \to \int_{u \le x} \prod_{s < u} \{1 - \Lambda_{1n}(ds)\}\, \Lambda_n(du,y) = \hat F^l_n(x,y) \]
on the space $(D[0,\tau], D([0,\tau] \times \mathbb{R}))$ to $D([0,\tau] \times \mathbb{R})$. This mapping is continuous with respect to the supremum norm when its domain is restricted to functions of uniformly bounded variation (Huang & Louis (1998), Theorem 1). Note that $\Lambda_n(x,y) \le \Lambda_{1n}(x)$, so that with probability one the pair $(\Lambda_n, \Lambda_{1n})$ is uniformly bounded for $n$ large enough. The result then follows as in the first part of the proof.

Corollary 3.4 For $x \in [0,\tau]$, $y \in \mathbb{R}$, we can write
\[ F^l_\infty(x,y) = \int_{[0,x]} \frac{\Lambda_\infty(ds,y)}{\Lambda_{1\infty}(ds)}\,dF^l_{X\infty}(s) = \int_{[0,x]} \frac{V(ds,y)}{V_1(ds)}\,dF^l_{X\infty}(s). \qquad (23) \]

Proof: Combining equations (20) and (21) yields
\[ F^l_\infty(x,y) = \int_{[0,x]} \{1 - F^l_{X\infty}(s-)\}\,\Lambda_\infty(ds,y). \qquad (24) \]
Taking $y = \infty$ gives $F^l_{X\infty}(x) = F^l_\infty(x,\infty) = \int_{[0,x]} \{1 - F^l_{X\infty}(s-)\}\,\Lambda_{1\infty}(ds)$. Hence, $dF^l_{X\infty}(s) = \{1 - F^l_{X\infty}(s-)\}\,\Lambda_{1\infty}(ds)$. Combining this with equation (24) yields the first equality of (23). The second equality follows from $\Lambda_\infty(ds,y) = V(ds,y)/\{1 - H(s-)\}$ and $\Lambda_{1\infty}(ds) = V_1(ds)/\{1 - H(s-)\}$.

Corollary 3.5 Let $X$ and $Y$ be independent. Then
\[ F^l_\infty(x,y) = F^l_{X\infty}(x) F_Y(y), \qquad x \in [0,\tau],\ y \in \mathbb{R}. \qquad (25) \]

Proof: If $X$ and $Y$ are independent, equations (16) and (17) yield $V(ds,y) = F_Y(y) V_1(ds)$. Substituting this into equation (23) gives the result.

Corollary 3.6 Let $X$ be subject to current status censoring ($k = 1$). Then
\[ F^l_\infty(x,y) = \int_{[0,x]} P(Y \le y \mid X \le s)\,dF^l_{X\infty}(s), \qquad x \in [0,\tau],\ y \in \mathbb{R}. \qquad (26) \]

Proof: For $k = 1$ equations (16) and (17) reduce to $V(ds,y) = F(s,y)\,dG(s)$ and $V_1(ds) = F_X(s)\,dG(s)$. Hence, $V(ds,y)/V_1(ds) = F(s,y)/F_X(s) = P(Y \le y \mid X \le s)$.

We now consider necessary and sufficient conditions for consistency. From the one-to-one correspondence between a univariate distribution function and its cumulative hazard function it follows that $\hat F_{Xn}$ is consistent for $F_X$ if and only if $\Lambda_{1\infty}$ equals the cumulative hazard function $\Lambda_X$ of $F_X$. Similarly, it follows that $\hat F_n(x,y)$ is consistent for $F(x,y)$ if and only if $\Lambda_\infty$ equals the mark specific cumulative hazard function $\Lambda$ of $F$. This is made precise in the following corollary.

Corollary 3.7 We introduce the following conditions:
\[ \Lambda_{1\infty}(x) = \int_{[0,x]} \frac{V_1(ds)}{1 - H(s-)} = \int_{[0,x]} \frac{F_X(ds)}{1 - F_X(s-)} = \Lambda_X(x), \qquad (27) \]
\[ \Lambda_\infty(x,y) = \int_{[0,x]} \frac{V(ds,y)}{1 - H(s-)} = \int_{[0,x]} \frac{F(ds,y)}{1 - F_X(s-)} = \Lambda(x,y). \qquad (28) \]
Then $\hat F_{Xn}$ is consistent for $F_X$ on $(0,\tau]$ if and only if (27) holds for all $x \in (0,\tau]$. Furthermore, $\hat F_n$ is consistent for $F$ on $(0,\tau] \times \mathbb{R}$ if and only if (28) holds for all $x \in (0,\tau]$, $y \in \mathbb{R}$. Finally, let $x_0 \in (0,\tau]$ with $F_X(x_0) > 0$. Then $\hat F^l_n(x_0,y)/\hat F^l_{Xn}(x_0)$ is consistent for $F_Y(y)$ if $X$ and $Y$ are independent.

The last claim of the corollary follows from (25). Conditions (27) and (28) are hard to interpret in general, since $F_X$ and $F$ enter on both sides of the equations when we plug in the expressions (15), (16) and (17) for $H(s-)$, $V(ds,y)$ and $V_1(ds)$. However, it is clear that the conditions force a relation between $F$ and $G$, and such a relation will typically not hold and cannot be assumed since $F$ is unknown. The following corollary further strengthens this result when $X$ is subject to current status censoring.

Corollary 3.8 Let $X$ be subject to current status censoring, and let $F_X$ and $G$ be continuous. Then the MLE $\hat F_{Xn}$ is inconsistent for any choice of $F_X$ and $G$.

Proof: Let $\gamma = \inf\{x : F_X(x) > 0\} < \tau$. For continuous distribution functions $G$ and $F_X$ condition (27) can be rewritten as
\[ \int_{(\gamma,x]} \frac{dG(s)}{1 - G(s)} = \int_{(\gamma,x]} \frac{dF_X(s)}{F_X(s)\{1 - F_X(s)\}}, \qquad x \in (\gamma,\tau]. \]

For continuous $G$ and $F_X$ this integral equation is solved by
\[ -\log\{1 - G(x)\} + C = \log\Bigl\{ \frac{F_X(x)}{1 - F_X(x)} \Bigr\}, \qquad x \in (\gamma,\tau]. \]
This yields $F_X(x) = [1 + \exp(-C)\{1 - G(x)\}]^{-1}$ for $x \in (\gamma,\tau]$. But there is no finite $C$ such that $F_X(\gamma) = 0$ holds, and hence condition (27) fails for all continuous distributions $G$ and $F_X$.

The following corollary shows that the asymptotic bias of the MLE goes to zero as the number of observation times $k$ increases for at least one particular distribution of the $T_j$'s, namely if they are distributed as the order statistics of a uniform sample on $[0,\theta]$.

Corollary 3.9 Let $T_1,\dots,T_k$ be the order statistics of $k$ independent and identically distributed uniform random variables on $[0,\theta]$. Then we have
\[ \Lambda_{1\infty}(x) \equiv \Lambda^k_{1\infty}(x) = \int_{[0,x]} \frac{dV^k_1(s)}{1 - H^k(s-)} \to \int_{[0,x]} \frac{dF_X(s)}{1 - F_X(s-)} = \Lambda_X(x), \]
\[ \Lambda_\infty(x,y) \equiv \Lambda^k_\infty(x,y) = \int_{[0,x]} \frac{dV^k(s,y)}{1 - H^k(s-)} \to \int_{[0,x]} \frac{F(ds,y)}{1 - F_X(s-)} = \Lambda(x,y), \]
for all continuity points $x < \theta$ of $\Lambda_X(x)$ and $\Lambda(x,y)$ and for all $y \in \mathbb{R}$, as $k \to \infty$.

Proof: Since the $T_j$'s are order statistics of $k$ independent and identically distributed uniform random variables, the marginal densities $g_j$, $j = 1,\dots,k$, and the joint densities $g_{j-1,j}$, $j = 2,\dots,k$, are known (see e.g. Shorack & Wellner (1986), page 97). Summing them over $j$ yields:
\[ \sum_{j=1}^k g_j(t) = \frac{k}{\theta} 1_{[0,\theta]}(t) \sum_{j-1=0}^{k-1} \binom{k-1}{j-1} \Bigl( \frac{t}{\theta} \Bigr)^{j-1} \Bigl( 1 - \frac{t}{\theta} \Bigr)^{k-1-(j-1)} = \frac{k}{\theta} 1_{[0,\theta]}(t), \]
\[ \sum_{j=2}^k g_{j-1,j}(s,t) = \frac{k(k-1)}{\theta^2} 1_{[0 \le s \le t \le \theta]} \Bigl( 1 - \frac{t-s}{\theta} \Bigr)^{k-2}. \]

Hence we compute, using Fubini's theorem to rewrite the second term,
\[ V^k(x,y) = \sum_{j=1}^k \int_{[0,x]} F(t,y)\,dG_j(t) - \sum_{j=2}^k \int_{0 \le s \le t \le x} F(s,y)\,dG_{j-1,j}(s,t) \]
\[ = \frac{k}{\theta} \int_0^{\theta \wedge x} F(t,y)\,dt - \int_{0 \le s \le t \le (x \wedge \theta)} F(s,y) \frac{k(k-1)}{\theta^2} \Bigl( 1 - \frac{t-s}{\theta} \Bigr)^{k-2} ds\,dt \]
\[ = \frac{k}{\theta} \int_0^{\theta \wedge x} F(s,y) \Bigl( 1 - \frac{x-s}{\theta} \Bigr)^{k-1} ds = \int_0^{\theta \wedge x} F(s,y)\,dQ^k_x(s), \]
where
\[ Q^k_x(s) \equiv \int_0^s \frac{k}{\theta} \Bigl( 1 - \frac{x-r}{\theta} \Bigr)^{k-1} dr = \int_{(x-s)/\theta}^{x/\theta} k(1-v)^{k-1}\,dv \]
\[ = \Bigl\{ \Bigl( 1 - \frac{x-s}{\theta} \Bigr)^k - \Bigl( 1 - \frac{x}{\theta} \Bigr)^k \Bigr\} 1\{0 \le s < x\} + \Bigl\{ 1 - \Bigl( 1 - \frac{x}{\theta} \Bigr)^k \Bigr\} 1\{s \ge x\}. \]
Thus, $Q^k_x(s)$ converges weakly (and even uniformly) to the distribution function $1\{s \ge x\}$ corresponding to the measure with mass 1 at $x$ as $k \to \infty$. Plugging in $y = \infty$ in $V^k(x,y)$ yields $V^k_1(x) = \int_0^{\theta \wedge x} F_X(s)\,dQ^k_x(s)$. Furthermore, plugging in the expressions for $V^k_1$ and $G_k$ in (15) gives
\[ H^k(x) = \int_0^{\theta \wedge x} F_X(s)\,dQ^k_x(s) + \int_0^{\theta \wedge x} (1 - F_X(s)) \frac{k}{\theta} (s/\theta)^{k-1}\,ds. \]
Hence, $V^k(x,y) \to F(x,y)$, $V^k_1(x) \to F_X(x)$ and $1 - H^k(x) \to 1 - F_X(x)$ as $k \to \infty$ for continuity points of the limits. The corollary then follows from the extended Helly-Bray theorem.

4. Repaired MLE via discretization of marks

We now define a simple repaired estimator $\tilde F_n(x,y)$ which is consistent for $F(x,y)$ for $y$ on a grid. The idea behind the estimator is that one can define discrete competing risks based on a continuous random variable. Doing so transforms interval censored continuous mark data into interval censored competing risks data, for which the MLE is consistent.

To describe the method, we let $K > 0$ and define a grid $y_1 < \dots < y_K$. We let $y_0 = -\infty$ and $y_{K+1} = \infty$, and introduce a new random variable $C \in \{1,\dots,K+1\}$:
\[ C = \sum_{j=1}^{K+1} j\,1\{y_{j-1} < Y \le y_j\}. \]
We can determine the value of $C$ for all observations with an observed mark. Hence, we can transform the observations $(T,\Delta,Z)$ into $(T,\Delta,Z^*)$, where $Z^* = \Delta_+ C$. This gives interval censored competing risks data with $K+1$ competing risks. Since the observed sets for interval censored competing risks data form a partition of the space $\mathbb{R}_+ \times \{1,\dots,K+1\}$, global consistency of the MLE follows from Theorems 9 and 10 of Van der Vaart & Wellner (2000). We can derive local consistency from the global consistency as done in Maathuis (2005a). This means that we can consistently estimate the sub-distribution functions $F_j(x) = P(X \le x, C = j) = P(X \le x, y_{j-1} < Y \le y_j)$. Hence, we can consistently estimate $F(x,y_j) = \sum_{l=1}^j F_l(x)$ for $x \in \mathbb{R}_+$ and $y_j$ on the grid.

Since interval censored competing risks data are a special case of bivariate censored data, we can compute the MLE by methods that are available for bivariate censored data. Such methods often consist of two steps. They first compute the maximal intersections, using for example the height map algorithm of Maathuis (2005b), and then solve a high dimensional convex constrained optimization problem.

It may be tempting to choose $K$ large, such that $F(x,y)$ can be estimated for $y$ on a fine grid. However, this may result in a poor estimator. To obtain a good estimator one should choose the grid such that there are ample observations for each value of $C$.
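The discretization step itself is a one-line lookup. The sketch below assumes NumPy; the function name `discretize_marks` and the convention of coding right censored observations as 0 (mirroring $Z^* = \Delta_+ C$) are illustrative. It also counts the observations per cause, which is the quantity one would inspect when judging whether a grid leaves ample observations for each value of $C$.

```python
import numpy as np

def discretize_marks(z, delta_plus, grid):
    """Map marks to failure causes C in {1, ..., K+1}; 0 if right censored.

    z          : marks Z_i (arbitrary for right censored observations)
    delta_plus : Delta_{+i}, equal to 1 when the mark is observed
    grid       : increasing grid points y_1 < ... < y_K
    """
    z = np.asarray(z, dtype=float)
    d = np.asarray(delta_plus, dtype=int)
    grid = np.asarray(grid, dtype=float)
    # side="left" counts the grid points strictly below z, so a mark with
    # y_{j-1} < z <= y_j gets index j - 1; adding 1 gives C = j.
    c = np.searchsorted(grid, z, side="left") + 1
    return d * c

def counts_per_cause(z_star, n_causes):
    """Number of observations with each cause 1, ..., n_causes."""
    return np.bincount(z_star, minlength=n_causes + 1)[1:]

if __name__ == "__main__":
    # Toy data (hypothetical): grid with K = 3 points, last mark unobserved.
    z_star = discretize_marks([1.0, 1.5, 3.5, 0.0], [1, 1, 1, 0], [1.0, 2.0, 3.0])
    print(z_star, counts_per_cause(z_star, 4))
```

The transformed data $(T,\Delta,Z^*)$ would then be passed to whatever routine is used for interval censored competing risks data; that routine is outside the scope of this sketch.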

20 In practice, one can start with a course grid, and then refine the grid as long as the estimator stays close to the one computed on the course grid. We close this section with some general remarks about this method. First, note that the repaired MLE corresponds to an existing consistent MLE in the following two cases: (a) estimation of F(x, y) for right censored continuous mark data, and (b) estimation of F X (x) for interval censored continuous mark data. In the first case the discretization does not change the intersection structure of the data. Hence, the repaired MLE equals the consistent MLE as defined by Huang & Louis (1998) for y on the grid. In the second case we can take K = 0, thereby ignoring any information on Y. This means that we compute the MLE for univariate interval censored data (T, ) which is known to be consistent (Schick & Yu (2000), Van der Vaart & Wellner (2000)). In simulation results we found that moderate values of K tend to give better estimates for F X, and in 5 we present results for n = 10, 000 and K = 20. Finally, note that the grouping of the data that occurs in the discretization tends to yield smaller maximal intersections in the x-direction and hence diminishes problems with representational non-uniqueness. This is visible in Examples 3 and 4 in Examples We illustrate the asymptotic behavior of the inconsistent and repaired MLE in four examples. The examples are chosen to cover a range of scenarios, summarized in Table 1. In each example we compute the MLEs ˆF n l and ˆF n u and the repaired estimators F n l and F n u for sample size n = 10,000. For the repaired estimator we use an equidistant grid with K = 20 points as shown in Figure 3. We compare these estimators to the true underlying distribution F and the derived limits F l and F. u Figure 1 shows the contour lines of the MLE ˆF n, l its limit F l and the true underlying 20

21 Table 1: Summary of the examples Example 1 Example 2 Example 3 Example 4 (In)dependence of (X,Y ) independent dependent dependent dependent Censoring mechanism for X case 1 case 1 case 2 case 2 Distribution of T continuous continuous continuous discrete distribution F. Note that ˆF n l and F l are almost indistinguishable, while there is a clear difference between F l and F. The results for the upper limits ˆF n u and F u are similar and not shown. Figure 2 contains the results for F X and shows that the MLE tends to underestimate F X, which can be understood through Remark 1. However, the repaired MLE F n closely follows F X. Figure 3 shows the results for F(x 0,y) for fixed x 0. This function is often estimated as an alternative for F Y, since F Y cannot be consistently estimated if the support of T is contained in the support of X, a situation that typically occurs in practice. The values of x 0 were chosen to show a range of possible scenarios for the behavior of the MLE, and we see that ˆF n can suffer from significant positive or negative bias and non-uniqueness. However, the repaired MLE is again close to the underlying distribution. We now discuss each example in detail. Example 1: Let X and Y be independent, with X Unif(0, 1) and Y Exp(1). Let X be subject to current status censoring with observation time T Unif(0, 0.5) independent of (X,Y ). Thus, F X (x) = x, F Y (y) = 1 exp( y) and F(x,y) = x(1 exp( y)) for x [0, 1] and y 0. We derive the limits for (x,y) [0,τ] R + for τ < 0.5. Using equations (17), (19), (20) and the fact that s x {1 Λ 1 (s)} = exp{ Λ 1 (s)} when Λ 1 is continuous, 21

22 we obtain Λ 1 (x) = x 0 F x X 1 G dg = 2s 0 1 2s ds = x log 2 4x + log 2, 1 F l X (x) = exp{ Λ 1 (x)} = 1 2x exp(x) 1 F X (x) = 1 x. Since all maximal intersections M (i), i I, converge to points and FX l (0.5) = 1, the limit F X does not suffer from representational non-uniqueness. Hence, F X = F l X. Figure 2 shows that F X (x) < F X (x) for small values of x, but F X (x) > F X (x) for large values of x. In particular, F X (0.5) = 1 > F X (0.5) = 0.5. The fact that F X equals one at the upper support point of T is true in some generality and can be explained as follows. Let η = G 1 (1), let X be subject to current status censoring, let F X (η) > 0, and let F X and G be continuous at η. Then Λ 1 (x) = x 0 F X/(1 G)dG can be viewed as a scaled down version of the cumulative hazard function of G, and hence it converges to infinity for x η. This implies that F X (x) converges to one for x η. This observation is relevant in practice since it often happens in medical studies that the support of G is strictly contained in the support of X. Figure 2 also shows that the repaired estimator F Xn (x) closely follows F X (x) for x < 0.5. Neither estimator behaves well for x > 0.5, but this was to be expected since we cannot estimate outside of the support of G. Since X and Y are independent, the bivariate limit F follows from equation (25): F (x,y) = F X (x)f Y (y) = {1 1 2x exp(x)}{1 exp( y)}. In particular, this implies that F(x 0,y) for x 0 = 0.49 is overestimated by a factor F X (0.49)/F X (0.49) 1.57, as shown in Figure 3. The repaired estimator F n (0.49,y) behaves quite well, but is slightly off for larger values of x. Example 2: Let X Unif(0, 1), and let Y X be exponentially distributed with mean 1/(X + a), where a = 0.5. Let X be subject to current status censoring 22

with observation time $T \sim \mathrm{Unif}(0,1)$ independent of $(X,Y)$. Thus, $F_X(x) = x$, $F_Y(y) = 1 - \exp(-ay)\{1 - \exp(-y)\}/y$ and $F(x,y) = x - \exp(-ay)\{1 - \exp(-xy)\}/y$ for $x \in [0,1]$ and $y \ge 0$. Let $(x,y) \in [0,\tau] \times \mathbb{R}_+$ for $\tau < 1$. Equations (17), (19) and (20) yield

$$\Lambda_{\infty 1}(x) = \int_0^x \frac{F_X}{1-G}\,dG = \int_0^x \frac{s}{1-s}\,ds = -x - \log(1-x),$$

$$1 - F^l_{\infty X}(x) = \exp\{-\Lambda_{\infty 1}(x)\} = (1-x)\exp(x) \ge 1 - F_X(x) = 1 - x,$$

where the inequality in the last line is strict for all $x \in (0,1)$. As in Example 1, $F_{\infty X} = F^l_{\infty X}$ is unique. Note that $P(Y \le y \mid X \le x) = 1 - \exp(-ay)\{1 - \exp(-xy)\}/(xy)$ and $f_{\infty X}(x) = x\exp(x)$. Hence, equation (26) yields

$$F_\infty(x,y) = x\exp(x) + \frac{\exp(-ay)}{y(1-y)}\{\exp(x - xy) - 1\} - \left\{1 + \frac{\exp(-ay)}{y}\right\}\{\exp(x) - 1\}.$$

Figures 2 and 3 show that $\hat F_{Xn}(x)$ and $\hat F_n(0.5,y)$ underestimate $F_X(x)$ and $F(0.5,y)$, while the repaired MLE behaves very well.

Example 3: Let $X \sim \mathrm{Unif}(0,2)$, and let $Y = X$. Let $X$ be subject to interval censoring case 2 with observation times $T = (T_1,T_2)$, independent of $(X,Y)$ and uniformly distributed over $\{(t_1,t_2) : 0 \le t_1 \le 1,\ 1 \le t_2 \le 2\}$. Thus, $F_X(x) = \tfrac12 x$, $F_Y(y) = \tfrac12 y$ and $F(x,y) = \tfrac12 (x \wedge y)$ for $(x,y) \in [0,2]^2$. We derive the limits for $(x,y) \in [0,\tau] \times [0,2]$ for $\tau < 2$. Using equations (15), (17), (19) and (20), we get

$$\Lambda_{\infty 1}(x) = -\log\{1 - \tfrac14 x^2\}\,1\{x \le 1\} + \{-\tfrac23 x - \log(2-x) + \tfrac23 + \log\tfrac43\}\,1\{x > 1\},$$

$$F^l_{\infty X}(x) = \tfrac14 x^2\,1\{x \le 1\} + \{1 - \tfrac34 (2-x)\exp(\tfrac23 x - \tfrac23)\}\,1\{x > 1\}.$$
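The closed forms for the cumulative hazard limits in Examples 1 to 3 can be sanity-checked numerically. The following is a minimal sketch, not code from the paper: the helper names (`integrate`, `lam1_numeric`, `overestimation_factor`, and so on) are hypothetical, the quadrature is a plain midpoint rule, and the Example 3 functions simply transcribe the piecewise expressions stated for that example and check that they satisfy $1 - F^l_{\infty X} = \exp\{-\Lambda_{\infty 1}\}$.

```python
import math


def integrate(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Example 1: F_X(s) = s and G(s) = 2s on [0, 0.5], so dG = 2 ds.
def lam1_numeric(x):
    return integrate(lambda s: s / (1.0 - 2.0 * s) * 2.0, 0.0, x)


def lam1_closed(x):
    # -x - log sqrt(2 - 4x) + log sqrt(2) simplifies to -x - 0.5 * log(1 - 2x)
    return -x - 0.5 * math.log(1.0 - 2.0 * x)


# Example 2: F_X(s) = s and G(s) = s on [0, 1], so dG = ds.
def lam2_numeric(x):
    return integrate(lambda s: s / (1.0 - s), 0.0, x)


def lam2_closed(x):
    return -x - math.log(1.0 - x)


# Example 3: piecewise closed forms as stated in the text.
def lam3_closed(x):
    if x <= 1.0:
        return -math.log(1.0 - 0.25 * x * x)
    return -2.0 * x / 3.0 - math.log(2.0 - x) + 2.0 / 3.0 + math.log(4.0 / 3.0)


def surv3_closed(x):
    """1 - F^l_{infinity X}(x) for Example 3."""
    if x <= 1.0:
        return 1.0 - 0.25 * x * x
    return 0.75 * (2.0 - x) * math.exp(2.0 * x / 3.0 - 2.0 / 3.0)


def overestimation_factor(x0):
    """Ratio of the limit of the MLE to the true F_X(x0) = x0 in Example 1."""
    return (1.0 - math.exp(-lam1_closed(x0))) / x0


if __name__ == "__main__":
    print("Example 1, x = 0.4:", lam1_numeric(0.4), lam1_closed(0.4))
    print("Example 2, x = 0.9:", lam2_numeric(0.9), lam2_closed(0.9))
    print("Example 1 factor at x0 = 0.49:", overestimation_factor(0.49))
    for x in (0.5, 1.0, 1.5):
        print("Example 3, x =", x, math.exp(-lam3_closed(x)), surv3_closed(x))
```

The quadrature agrees with the closed forms in Examples 1 and 2, and the ratio at $x_0 = 0.49$ reproduces the factor of about 1.57 quoted in Example 1.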

In this example the limit $F_{\infty X}$ is non-unique and hence we also derive the upper bound $F^u_{\infty X}$. To do so, we look at the $x$-intervals of the observed sets, which take the form $(0,t_1]$, $(t_1,t_2]$ and $(t_2,\infty)$, with $t_1 \in (0,1]$ and $t_2 \in (1,2]$. Since there are no right censored observations with $L < 1$, equation (5) implies that observed sets with $x$-interval $(0,t_1]$ are maximal intersections, and these maximal intersections do not converge to points when $n$ goes to infinity. On the other hand, maximal intersections corresponding to observed sets with $x$-interval $(t_1,t_2]$ do converge to points. Hence, we can derive the upper bound $F^u_{\infty X}$ by reassigning all mass at points $t_1 \le 1$ to $x = 0+$, where $0+$ denotes a point slightly bigger than zero to account for the fact that the $x$-intervals are left-open. This yields

$$F^u_{\infty X}(x) = \tfrac14\,1\{0 < x \le 1\} + \{1 - \tfrac34 (2-x)\exp(\tfrac23 x - \tfrac23)\}\,1\{x > 1\}.$$

Note that $F^u_{\infty X}$ is left continuous at zero. We obtain $F^l_\infty$ by first computing $V_\infty(dx,y)$ using (16), and then integrating $V_\infty(dx,y)/V_{\infty 1}(dx)$ against $F^l_{\infty X}(x)$ using (23):

$$F^l_\infty(x,y) = \begin{cases}
F^l_{\infty X}(x), & x \le y,\\
F^l_{\infty X}(y) + \tfrac12 y(x-y), & y \le x \le 1,\\
F^l_{\infty X}(y) + \tfrac38 (2y-1)\{\exp(\tfrac23 x - \tfrac23) - \exp(\tfrac23 y - \tfrac23)\}, & 1 \le y \le x,\\
F^l_{\infty X}(y) + \tfrac12 y(1-y) + \tfrac38 y^2\{\exp(\tfrac23 x - \tfrac23) - 1\}, & y \le 1 \le x.
\end{cases}$$

We find $F^u_\infty$ by reassigning mass from the upper right to the lower left corners of the maximal intersections, as outlined for $F_{\infty X}$. Figure 1 shows that $F^l_\infty$ is smoother than $F$ and clearly different. Figure 2 shows that $F^l_{\infty X}(x) < F_X(x)$ for all $x \in (0,\tau]$ and $F^l_{\infty X}(x) = F^u_{\infty X}(x)$ for $x \ge 1$, and Figure 3 shows that both $F^l_\infty(0.75,y)$ and $F^u_\infty(0.75,y)$ are smaller than $F(0.75,y)$. However, the repaired estimators $\tilde F_{Xn}$ and $\tilde F_n(0.75,y)$ are unique and behave very well.
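The piecewise limits of Example 3 are easy to mistype, so a small numerical check may help. The sketch below is an illustration under stated assumptions, not the authors' code: the function names (`F_true`, `FX_lower`, `FX_upper`, `F_lower`) are hypothetical, and the bodies transcribe the formulas for $F$, $F^l_{\infty X}$, $F^u_{\infty X}$ and $F^l_\infty$ given for this example. It checks that the four branches of $F^l_\infty$ agree on their common boundaries and that $F^l_\infty(0.75,y)$ lies below $F(0.75,y)$.

```python
import math


def F_true(x, y):
    """True distribution function of Example 3: F(x, y) = min(x, y) / 2 on [0, 2]^2."""
    return 0.5 * min(x, y)


def FX_lower(x):
    """Lower limit F^l_{infinity X}(x) of the marginal MLE."""
    if x <= 1.0:
        return 0.25 * x * x
    return 1.0 - 0.75 * (2.0 - x) * math.exp(2.0 * x / 3.0 - 2.0 / 3.0)


def FX_upper(x):
    """Upper limit F^u_{infinity X}(x): mass at points t1 <= 1 is moved to 0+."""
    if x <= 0.0:
        return 0.0
    if x <= 1.0:
        return 0.25
    return FX_lower(x)


def F_lower(x, y):
    """Lower limit F^l_infinity(x, y), written branch by branch."""
    if x <= y:
        return FX_lower(x)
    if x <= 1.0:  # y <= x <= 1
        return FX_lower(y) + 0.5 * y * (x - y)
    ex = math.exp(2.0 * x / 3.0 - 2.0 / 3.0)
    if y >= 1.0:  # 1 <= y <= x
        ey = math.exp(2.0 * y / 3.0 - 2.0 / 3.0)
        return FX_lower(y) + 0.375 * (2.0 * y - 1.0) * (ex - ey)
    # y <= 1 <= x
    return FX_lower(y) + 0.5 * y * (1.0 - y) + 0.375 * y * y * (ex - 1.0)


if __name__ == "__main__":
    # Compare the limit with the truth along the slice x0 = 0.75 used in Figure 3.
    for y in (0.25, 0.5, 0.75, 1.0, 1.5, 2.0):
        print(y, F_lower(0.75, y), F_true(0.75, y))
```

Because each branch is evaluated only inside its own region, the continuity checks at $x = 1$ and $y = 1$ act as a guard against transcription errors in the coefficients.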

Example 4: Let $(X,Y)$ be uniformly distributed over $\{(x,y) : 0 \le x \le y \le 1\}$. Let $X$ be subject to interval censoring case 2 with observation times $T = (T_1,T_2)$ independent of $(X,Y)$. Let the distribution of $T$ be discrete: $G\{(0.25, 0.5)\} = 0.3$, $G\{(0.25, 0.75)\} = 0.3$ and $G\{(0.5, 0.75)\} = 0.4$. Thus, $F_X(x) = 2x - x^2$, $F_Y(y) = y^2$ and $F(x,y) = (2xy - x^2)\,1\{x \le y\} + y^2\,1\{x > y\}$ for $(x,y) \in [0,1]^2$. Since we can only expect to get sensible estimates for $F(x,y)$ for values of $x$ in the support of the observation time distribution, we derive the limits for $x \in \{0.25, 0.5, 0.75\}$ and $y \in [0,1]$. Equations (15), (17), (19) and (20) yield $F^l_{\infty X}(0.25) \approx 0.26$, $F^l_{\infty X}(0.5) \approx 0.66$ and $F^l_{\infty X}(0.75) \approx 0.94$. Since $G$ is discrete, we do not use the exponential function in (20), but compute the product.

As in Example 3, $F_{\infty X}$ is non-unique. We obtain $F^u_{\infty X}$ from $F^l_{\infty X}$ by moving the probability mass from the right endpoints to the left endpoints of the maximal intersections. The possible $x$-intervals of the maximal intersections are $(0, 0.25]$, $(0, 0.5]$, $(0.25, 0.5]$, $(0.5, 0.75]$ and $(0.75, \infty)$. Consider the interval $(0, 0.25]$ and note that moving mass from $x = 0.25$ to $x = 0+$ does not change the value of $F_{\infty X}(x)$ for $x \in \{0, 0.25, 0.5, 0.75\}$. This also holds if we move mass in the other intervals, except for the interval $(0, 0.5]$, where moving the mass from $x = 0.5$ to $x = 0+$ increases the value of $F_{\infty X}(x)$ at $x = 0.25$. Note that the mass $F^l_{\infty X}(\{0.5\})$ comes from maximal intersections with $x$-intervals $(0, 0.5]$ and $(0.25, 0.5]$. The proportion of mass coming from the latter is

$$\alpha = P(L = 0.25, R = 0.5 \mid R = 0.5) = \frac{G\{(0.25, 0.5)\}\{F_X(0.5) - F_X(0.25)\}}{G\{(0.25, 0.5)\}\{F_X(0.5) - F_X(0.25)\} + G\{(0.5, 0.75)\}F_X(0.5)} \approx 0.24.$$

Hence, we get $F^u_{\infty X}(0.25) = F^l_{\infty X}(0.25) + (1 - \alpha)F^l_{\infty X}(\{0.5\}) \approx 0.56$ and $F^u_{\infty X}(x) = F^l_{\infty X}(x)$ for $x \in \{0, 0.5, 0.75\}$. To derive the bivariate limit $F^l_\infty$, we first find $V_\infty(dx,y)$ using equation (16) and then integrate $V_\infty(dx,y)/V_{\infty 1}(dx)$ against $F^l_{\infty X}(x)$ using equation

(23). This yields $F^l_\infty(0.25,y) = 0.6F(0.25,y)$, $F^l_\infty(0.5,y) = 0.3F(0.25,y) + 0.7F(0.5,y)$ and $F^l_\infty(0.75,y) \approx 0.90F(0.75,y) + 0.19F(0.5,y) - 0.084F(0.25,y)$. The upper bound $F^u_\infty(x,y)$ can be found by reassigning mass to the lower left corners of the maximal intersections. To do so, we compute

$$\alpha(y) = P(L = 0.25, R = 0.5 \mid R = 0.5, Y \le y) = \frac{G\{(0.25, 0.5)\}\{F(0.5,y) - F(0.25,y)\}}{G\{(0.25, 0.5)\}\{F(0.5,y) - F(0.25,y)\} + G\{(0.5, 0.75)\}F(0.5,y)}.$$

We then get $F^u_\infty(0.25,y) = F^l_\infty(0.25,y) + \{1 - \alpha(y)\}\{F^l_\infty(0.5,y) - F^l_\infty(0.25,y)\}$, and the value of $F^l_\infty(x,y)$ is unchanged for $x \in \{0, 0.5, 0.75\}$. The discrete nature of the limit $F^l_\infty$ is visible in Figure 1. Figure 2 shows significant non-uniqueness in all estimators for $x$-values outside the support of $G$. However, $\tilde F_{Xn}(x)$ is unique for $x \in \{0.25, 0.5, 0.75\}$ and very close to $F_X(x)$. Finally, Figure 3 shows that $F_\infty(0.25,y)$ is non-unique, while the repaired MLE is unique and closely follows $F(0.25,y)$.

6. Discussion

In this article we studied the MLE for the bivariate distribution function of an interval censored survival time and a continuous mark variable. We derived the almost sure limit of the MLE, and showed that the MLE is inconsistent in general. We proposed a simple method to repair the inconsistency, and illustrated the behavior of the inconsistent and repaired MLE in four examples.

The MLE for the distribution function of bivariate censored data has been found to be inconsistent before, namely when $X$ and $Y$ are both right censored (van der Laan, 1996), and when $X$ is current status censored and $Y$ is uncensored (Maathuis, 2003, Section 6.2). In the latter model the inconsistency could be explained by representational non-uniqueness of the MLE. However, this is not the case for interval censored continuous mark data, where the MLE is typically inconsistent even if representational non-uniqueness plays no role in the limit. Rather, the inconsistency in this model is related to the fact that the functions $\Lambda_{1n}$ and $\Lambda_n$ that define the MLE in (8) and (9) do not converge to the true underlying cumulative hazard functions.

However, there is a similarity between these three bivariate censored data models with inconsistent MLEs. Namely, in each model the observed sets can take the form of line segments, and the likelihood contains corresponding partial density-type terms. Thus, observed line segments can be viewed as a warning sign for consistency problems, and whenever they occur consistency of the MLE should be carefully studied.

These warning signs arise in the model for HIV vaccine data in Hudgens et al. (2005). This model is slightly different from ours, since it allows the mark variable to be missing for observations that are not right censored. As a result there is no explicit formula for the MLE and hence it is more difficult to derive its almost sure limit. Consistency of the MLE in this model is currently still an open problem, but simulation results clearly point to inconsistency.

7. Acknowledgements

This research was supported by NSF grant DMS. We would like to thank Piet Groeneboom and Michael Hudgens for helpful discussions and comments.

References

Aalen, O. (1976). Nonparametric inference in connection with multiple decrement models. Scand. J. Statist. 3.

Aalen, O. (1978). Nonparametric estimation of partial transition probabilities in multiple decrement models. Ann. Statist. 6.

Gentleman, R. & Vandal, A. (2002). Nonparametric estimation of the bivariate cdf for arbitrarily censored data. Can. J. Statist. 30.

Gill, R. D. & Johansen, S. (1990). A survey of product-integration with a view toward application in survival analysis. Ann. Statist. 18.

Groeneboom, P. & Wellner, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhäuser Verlag, Basel.

Huang, Y. & Louis, T. A. (1998). Nonparametric estimation of the joint distribution of survival time and mark variables. Biometrika 85.

Hudgens, M. G., Maathuis, M. H. & Gilbert, P. B. (2005). Nonparametric estimation of the joint distribution of a survival time subject to interval censoring and a continuous mark variable. Submitted.

Hudgens, M. G., Satten, G. A. & Longini, I. M. (2001). Nonparametric maximum likelihood estimation for competing risks survival data subject to interval censoring and truncation. Biometrics 57.

Jewell, N. P. & Kalbfleisch, J. D. (2004). Maximum likelihood estimation of ordered multinomial parameters. Biostatistics 5.

Jewell, N. P., Van der Laan, M. J. & Henneman, T. (2003). Nonparametric estimation from current status data with competing risks. Biometrika 90.

Kalbfleisch, J. & Prentice, R. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York.

Maathuis, M. H. (2003). Nonparametric Maximum Likelihood Estimation For Bivariate Censored Data. Master's thesis, Delft University of Technology, The Netherlands.

Maathuis, M. H. (2005a). Nonparametric estimation for current status data with competing risks. Ph.D. thesis in preparation.

Maathuis, M. H. (2005b). Reduction algorithm for the MLE for the distribution function of bivariate interval censored data. J. Comp. Graph. Statist. 14.

Schick, A. & Yu, Q. (2000). Consistency of the GMLE with mixed case interval-censored data. Scand. J. Statist. 27.

Shorack, G. R. & Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.

Turnbull, B. (1976). The empirical distribution function with arbitrarily grouped, censored, and truncated data. J. R. Statist. Soc. B 38.

van der Laan, M. J. (1996). Efficient estimation in the bivariate censoring model and repairing NPMLE. Ann. Statist. 24.

Van der Vaart, A. & Wellner, J. A. (2000). Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli classes. In High Dimensional Probability II. Birkhäuser, Boston.

Wong, G. & Yu, Q. (1999). Generalized MLE of a joint distribution function with multivariate interval-censored data. Journal of Multivariate Analysis 69.

[Figure 1 panels, one row per example (Examples 1 to 4): $\hat F^l_n$, $F^l_\infty$ and $F$.]

Figure 1: Contour lines for the bivariate functions $\hat F^l_n$, $F^l_\infty$ and $F$. All functions were computed on an equidistant grid with mesh size 0.02, and n = 10,

[Figure 2 panels: $F_X$ for Examples 1 to 4, plotted against $x$.]

Figure 2: Dotted: $F_X$. Dashed: $F^l_{\infty X}$ and $F^u_{\infty X}$. Solid black: $\tilde F^l_{Xn}$ and $\tilde F^u_{Xn}$ using the equidistant grid with $K = 20$ shown in Figure 3. Solid grey: $\hat F^l_{Xn}$ and $\hat F^u_{Xn}$. In all cases n = 10,

[Figure 3 panels: $F(0.49,y)$ for Example 1, $F(0.5,y)$ for Example 2, $F(0.75,y)$ for Example 3 and $F(0.25,y)$ for Example 4, plotted against $y$.]

Figure 3: Dotted: $F(x_0,y)$. Dashed: $F^l_\infty(x_0,y)$ and $F^u_\infty(x_0,y)$. Circles: $\tilde F^l_n(x_0,y) = \tilde F^u_n(x_0,y)$ using an equidistant grid with $K = 20$. Solid grey: $\hat F^l_n(x_0,y)$ and $\hat F^u_n(x_0,y)$. In all cases n = 10,
