Inconsistency of the MLE for the joint distribution of interval censored survival times and continuous marks


By M.H. Maathuis and J.A. Wellner
Department of Statistics, University of Washington, Seattle, Washington 98195, U.S.A.

August 23, 2005

Summary

This article considers the nonparametric maximum likelihood estimator (MLE) for the joint distribution function of an interval censored survival time and a continuous mark variable. We derive a new explicit formula for the MLE that has both computational and theoretical advantages. Using this formula and the mark specific cumulative hazard function of Huang & Louis (1998), we derive the almost sure limit of the MLE. We conclude that the MLE is inconsistent in general. We show that the inconsistency can be repaired by discretizing the marks, and illustrate the behavior of the inconsistent and repaired MLE in several examples.

Some key words: Competing risk; Inconsistency; Interval censoring; Mark variable; Multivariate distribution; Nonparametric maximum likelihood; Survival analysis

1. Introduction

We consider bivariate data $(X,Y)$, where $X$ is a survival time and $Y$ is a mark variable which is observed if and only if $X$ is not right censored. This type of data arises in various situations. For example, the classical competing risks problem fits in this framework, with $X$ being the failure time and $Y$ the failure cause. Alternatively, $X$ can be the time of onset of a disease and $Y$ its incubation period, or $X$ can be the time of death and $Y$ a measure of utility or cost, such as quality adjusted lifetime or lifetime medical costs (Huang & Louis, 1998). Another example is the HIV vaccine trial data analyzed in Hudgens et al. (2005), where $X$ is the time of HIV infection and $Y$ is the viral distance between the infecting HIV virus and the virus in the vaccine.

In practice the marginal distribution of $Y$ is often of interest. However, since $Y$ is observed if and only if $X$ is not right censored, the observed values of $Y$ typically form a biased sample. It is therefore important to consider the bivariate data. We focus on the nonparametric maximum likelihood estimator (MLE) $\hat F_n(x,y)$ of the bivariate distribution function $F(x,y)$. One can distinguish various models, with different types of censoring mechanisms for $X$, and with $Y$ being discrete or continuous.

We first discuss the case that $Y$ is discrete, which gives the competing risks model. Aalen (1976, 1978) and Kalbfleisch & Prentice (1980, §7.2) studied the MLE in this model when $X$ is subject to right censoring. The generalization to interval censored competing risks data was considered by Hudgens et al. (2001), Jewell et al. (2003) and Jewell & Kalbfleisch (2004). Maathuis (2005a) studied asymptotic properties of the MLE in the current status competing risks model and proved uniform strong consistency of the MLE under modest conditions.

We now consider the case that $Y$ is continuous, which gives the continuous mark

model. Huang & Louis (1998) studied this model when $X$ is subject to right censoring. They proved uniform strong consistency and derived the limiting distribution of the MLE. Hudgens et al. (2005) considered the extension to interval censored continuous mark data. They characterized the MLE and studied its finite sample properties.

In this article we continue the study of interval censored continuous mark data. Our main focus is on asymptotic properties of the MLE, and in particular on consistency. In §2 we use the analogy with univariate right censored data to derive a new explicit formula for the MLE. In §3 we use this formula and the mark specific cumulative hazard function of Huang & Louis (1998) to derive the almost sure limit of the MLE. We conclude that the MLE is inconsistent in general. In §4 we show that the inconsistency can be repaired by discretizing the marks. In §5 we illustrate the behavior of the inconsistent and repaired MLE in four examples. Finally, §6 gives a summary and a short discussion of some remaining issues.

2. Explicit formula for the MLE

2.1. Intermezzo: univariate right censored data

Hudgens et al. (2005) noted a close connection between the MLE for univariate right censored data and the MLE for interval censored continuous mark data. We will use this connection in §2.2 to derive a new explicit formula for the MLE in the interval censored continuous mark model. However, we first briefly review univariate right censored data in a way that shows the similarity between the two models.

Let $X > 0$ be a survival time subject to right censoring. Let $T > 0$ be the censoring variable, with $T$ independent of $X$. Let $U = X \wedge T \equiv \min(X,T)$ and $\Delta = 1\{X \le T\}$. We are interested in the MLE $\hat F_n(x)$ of $F(x) = P(X \le x)$ based on $n$ independent and identically distributed copies $(U_1,\Delta_1),\dots,(U_n,\Delta_n)$ of $(U,\Delta)$.

We call the set of $X$ values that are consistent with an observation $(U,\Delta)$ an observed set $A$. Thus, we have $A = \{U\}$ if $\Delta = 1$ and $A = (U,\infty)$ if $\Delta = 0$. Let $U_{(1)},\dots,U_{(n)}$ be the order statistics of $U_1,\dots,U_n$, and let $\Delta_{(i)}$ and $A_{(i)}$ be the corresponding values of $\Delta$ and $A$. We assume all $A_i$ with $\Delta_i = 1$ are distinct, since this will be the case for the continuous mark data. However, we allow ties in the $T$'s and $U$'s provided this assumption is not violated. We break such ties in $U$ arbitrarily after ensuring that observations with $\Delta = 1$ are ordered before those with $\Delta = 0$.

Assuming $F$ has a density $f$ with respect to some dominating measure $\mu$, the likelihood (conditional on $G$) is $L(F) = \prod_{i=1}^n q(U_i,\Delta_i)$, where

$$q(u,\delta) = f(u)^{\delta}\{1 - F(u)\}^{1-\delta}.$$

The first term of $q$ is a density-type term, and hence $L(F)$ can be made arbitrarily large by letting $f$ peak at some value $U_i$ with $\Delta_i = 1$. This problem is usually solved by maximizing $L(F)$ over the class of distribution functions that have a density with respect to counting measure on the observed failure times. We can then write $L(F) = \prod_{i=1}^n P_F(A_i)$, where $P_F(A)$ is the probability of $A$ under $F$.

It is well-known that the MLE in censored data problems can only assign mass to a finite number of disjoint regions (Turnbull, 1976). In these regions the observed sets have maximal overlap, and hence they were called maximal intersections by Wong & Yu (1999). Maathuis (2005b) defined a height map $h : \mathbb{R}^d \to \mathbb{N}$, where $d$ is the dimension of the observed sets and $h(x)$ is the number of observed sets containing $x$. Furthermore, she defined a version of the observed sets in which ties are resolved, called canonical observed sets. She then showed that the maximal intersections correspond to the local maxima of the height map of the canonical observed sets. For univariate right censored data, this means that each $A_{(i)}$ with $i \in \mathcal{I} = \{i \in \{1,\dots,n\} : \Delta_{(i)} = 1\}$ is a maximal intersection. We denote these maximal intersections by $M_{(i)}$.

This notation may seem a little redundant since $M_{(i)} = A_{(i)}$, but it will be useful in the next section. Furthermore, there is an extra maximal intersection $M_{(n+1)} =$

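$A_{(n)} = (U_{(n)},\infty)$ if and only if $\Delta_{(n)} = 0$. Let $\bar{\mathcal{I}}$ be the collection of indices of all maximal intersections. Thus, $\bar{\mathcal{I}} = \mathcal{I}$ if $\Delta_{(n)} = 1$ and $\bar{\mathcal{I}} = \mathcal{I} \cup \{n+1\}$ if $\Delta_{(n)} = 0$. Let $p_i$ be the probability mass of maximal intersection $M_{(i)}$, $i \in \bar{\mathcal{I}}$. We can then write the likelihood in terms of the $p_i$'s:

$$\prod_{i=1}^n P(A_i) = \prod_{i=1}^n \sum_{j \in \bar{\mathcal{I}}} p_j 1\{M_{(j)} \subseteq A_{(i)}\} = \prod_{i=1}^n p_i^{\Delta_{(i)}} \Big(\sum_{j \ge i+1,\, j \in \bar{\mathcal{I}}} p_j\Big)^{1-\Delta_{(i)}}. \tag{1}$$

The MLE $\hat p$ maximizes this expression under the constraints

$$\sum_{i \in \bar{\mathcal{I}}} p_i = 1 \quad\text{and}\quad p_i \ge 0 \text{ for all } i \in \bar{\mathcal{I}}. \tag{2}$$

It is well-known that $\hat p$ is the Kaplan-Meier or product-limit estimator, given by

$$\hat p_i = \prod_{j=1}^{i-1}\left(1 - \frac{\Delta_{(j)}}{n-j+1}\right)\frac{\Delta_{(i)}}{n-i+1}, \qquad i \in \mathcal{I},$$

and $\hat p_{n+1} = 1 - \sum_{i \in \mathcal{I}} \hat p_i$ if $\Delta_{(n)} = 0$ (see for example Shorack & Wellner (1986), Chapter 7). Equivalently, we can write

$$\sum_{j \ge i,\, j \in \bar{\mathcal{I}}} \hat p_j = \prod_{j \le i-1}\left(1 - \frac{\Delta_{(j)}}{n-j+1}\right), \qquad i \in \bar{\mathcal{I}}.$$

The vector $\hat p$ is uniquely determined. We obtain $\hat F_n(x)$ by summing all probability mass in the interval $(0,x]$. It is well-known that $\hat F_n(x)$ is non-unique for $x > U_{(n)}$ if and only if $\Delta_{(n)} = 0$. This is caused by the fact that the MLE is indifferent to the distribution of mass within a maximal intersection, called representational non-uniqueness by Gentleman & Vandal (2002). Since all $\{M_{(i)} : i \in \mathcal{I}\}$ are points, this non-uniqueness only occurs when $M_{(n+1)}$ exists, which happens if and only if $\Delta_{(n)} = 0$.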
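The product-limit masses above are easy to compute directly. The following is a minimal sketch, assuming no ties among the uncensored observations; the function name `km_masses` and the argument names are hypothetical and only serve the illustration.

```python
def km_masses(u, delta):
    """Product-limit masses for right censored data (illustrative sketch).

    u     : observed times U_i = min(X_i, T_i)
    delta : 1 if X_i <= T_i (failure observed), 0 if right censored
    Returns (sorted_u, sorted_delta, p_hat, p_extra), where p_hat[i-1] is the
    mass at the i-th order statistic and p_extra is the mass of the extra
    maximal intersection (U_(n), infinity); p_extra is 0 when the largest
    observation is uncensored.
    """
    n = len(u)
    # Sort by time; among ties, failures (delta = 1) come before censored values.
    order = sorted(range(n), key=lambda i: (u[i], -delta[i]))
    sorted_u = [u[i] for i in order]
    sorted_delta = [delta[i] for i in order]

    p_hat = []
    surv = 1.0  # running product of (1 - delta_(j) / (n - j + 1)) over j < i
    for i in range(1, n + 1):
        d = sorted_delta[i - 1]
        at_risk = n - i + 1
        p_hat.append(surv * d / at_risk)
        surv *= 1.0 - d / at_risk
    return sorted_u, sorted_delta, p_hat, surv


if __name__ == "__main__":
    times = [2.0, 1.0, 3.0, 4.0]
    status = [1, 1, 0, 1]
    print(km_masses(times, status))
```

After the loop, the running product equals one minus the sum of the assigned masses, which is exactly the mass left for the extra maximal intersection when the largest observation is censored.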

2.2. Continuous mark data

We first introduce the model formally. Let $X \in \mathbb{R}_+ = (0,\infty)$ be a survival time, let $Y \in \mathbb{R}$ be a continuous mark variable, and let $F(x,y) = P(X \le x, Y \le y)$ be their joint distribution. Let $X$ be subject to interval censoring case $k$, using the terminology of Groeneboom & Wellner (1992). Let $T = (T_1,\dots,T_k)$ be the $k$ observation times and let $G$ be their distribution. We assume $T$ is independent of $(X,Y)$ and $G(\{0 < T_1 < \dots < T_k\}) = 1$. We use subscripts to denote the marginal distributions of $G$. For example, $G_1$ is the distribution of $T_1$ and $G_{23}$ is the distribution of $(T_2,T_3)$. Let $\Delta = (\Delta_1,\dots,\Delta_{k+1})$ be a vector of indicator functions, where $\Delta_j = 1\{T_{j-1} < X \le T_j\}$ for $j = 1,\dots,k+1$, $T_0 = 0$ and $T_{k+1} = \infty$. We say that $X$ is right censored if $\Delta_{k+1} = 1$, and we assume $Y$ is observed if and only if $X$ is not right censored. Thus, we observe $W = (T,\Delta,Z)$, where $Z = \Delta_+ Y$ and $\Delta_+ = \sum_{j=1}^k \Delta_j = 1 - \Delta_{k+1}$.

We study the nonparametric maximum likelihood estimator $\hat F_n(x,y)$ for $F(x,y)$ based on $n$ independent and identically distributed copies $W_1,\dots,W_n$ of $W$, where $W_i = (T_i,\Delta_i,Z_i)$. We allow ties between components of the vectors $T_i$ and $T_j$ for $i \ne j$. In this model an observed set $A$ is the set of $(X,Y)$ values that are consistent with an observation $W = (T,\Delta,Z)$. Thus,

$$A = \begin{cases} (T_{j-1},T_j] \times \{Z\} & \text{if } \Delta_j = 1,\ j = 1,\dots,k, \\ (T_k,\infty) \times \mathbb{R} & \text{if } \Delta_{k+1} = 1. \end{cases}$$

Note that $A$ is a line segment if $\Delta_+ = 1$ and $A$ is a half plane if $\Delta_{k+1} = 1$.

Assuming $F$ has a density $f$ with respect to some dominating measure $\mu_X \times \mu_Y$, the likelihood (conditional on $G$) is given by $L(F) = \prod_{i=1}^n q(W_i)$, where

$$q(w) = q(t,\delta,z) = \prod_{j=1}^k \left\{\int_{(t_{j-1},t_j]} f(s,z)\,\mu_X(ds)\right\}^{\delta_j} \{1 - F_X(t_k)\}^{\delta_{k+1}}. \tag{3}$$
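To make the observation scheme concrete, the sketch below turns a latent pair $(X,Y)$ and a vector of observation times into $W = (T,\Delta,Z)$ and the corresponding observed set. The function names `observe` and `observed_set` are hypothetical, and representing a set by a pair of intervals is an assumption made only for this illustration.

```python
def observe(x, y, times):
    """Map a latent (x, y) and sorted observation times to W = (T, Delta, Z)."""
    k = len(times)
    bounds = [0.0] + list(times) + [float("inf")]  # T_0 = 0, T_{k+1} = infinity
    # Delta_j = 1{T_{j-1} < X <= T_j} for j = 1, ..., k+1
    delta = [1 if bounds[j - 1] < x <= bounds[j] else 0 for j in range(1, k + 2)]
    delta_plus = 1 - delta[k]  # 1 unless X is right censored
    z = delta_plus * y         # the mark is seen only if X is not right censored
    return list(times), delta, z


def observed_set(times, delta, z):
    """Return (x_interval, y_interval); the x-interval is left-open, right-closed."""
    bounds = [0.0] + list(times) + [float("inf")]
    j = delta.index(1) + 1  # the unique j with Delta_j = 1
    x_interval = (bounds[j - 1], bounds[j])
    if j <= len(times):
        return x_interval, (z, z)  # line segment at height Z
    return x_interval, (float("-inf"), float("inf"))  # half plane


if __name__ == "__main__":
    w = observe(0.7, 2.5, [0.5, 1.0])
    print(w, observed_set(*w))
```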

The first term of $q$ is a density-type term. Hence, $L(F)$ can be made arbitrarily large by letting $f(s,z)$ peak at $z = Z_i$ for some observation with $\Delta_{+i} = 1$. We therefore define the MLE $\hat F_n(x,y)$ to be the maximizer of $L(F)$ over the class $\mathcal{F}$ of all bivariate distribution functions that have a marginal density $f_Y$ with respect to counting measure on the observed marks. We can then write $L(F) = \prod_{i=1}^n P_F(A_i)$.

As in Maathuis (2005b), we call the projection of $A$ on the $x$- and $y$-axis its $x$-interval and $y$-interval. We denote the left and right endpoint of the $x$-interval of $A$ by $L$ and $R$ and define a new variable $U$:

$$L = \sum_{j=1}^{k+1} \Delta_j T_{j-1}, \qquad R = \sum_{j=1}^{k+1} \Delta_j T_j, \qquad U = \Delta_+ R + \Delta_{k+1} L. \tag{4}$$

Note that $U$ equals $T$ if $X$ is subject to current status censoring. The variable $U$ plays an important role in this article, because it determines the order of the observations. Let $U_{(1)},\dots,U_{(n)}$ be the order statistics of $U_1,\dots,U_n$ and let $\Delta_{(i)}$, $Z_{(i)}$, $A_{(i)}$, $L_{(i)}$ and $R_{(i)}$ be the corresponding values of $\Delta$, $Z$, $A$, $L$ and $R$. Here $\Delta_{(i)} = (\Delta_{1(i)},\dots,\Delta_{k+1,(i)})$. We break ties in $U$ arbitrarily after ensuring that observations with $\Delta_+ = 1$ are ordered before those with $\Delta_+ = 0$. Let $\mathcal{I} = \{i \in \{1,\dots,n\} : \Delta_{+(i)} = 1\}$.

Recall that the maximal intersections are the local maxima of the height map of the canonical observed sets. Since $Y$ is continuous, the observed sets $A_{(i)}$, $i \in \mathcal{I}$, are completely distinct with probability one. Hence, each such $A_{(i)}$ contains exactly one maximal intersection $M_{(i)}$:

$$M_{(i)} = \big(\max\{\{L_{(j)} : j \notin \mathcal{I},\ j < i\} \cup \{L_{(i)}\}\},\ R_{(i)}\big] \times \{Z_{(i)}\}, \qquad i \in \mathcal{I}. \tag{5}$$

To understand this expression, let $S_{(i)}$ be the set of right censored observations $A_{(j)}$ with $L_{(i)} < L_{(j)} < R_{(i)}$. Then (5) implies that $M_{(i)} = A_{(i)}$ if $S_{(i)} = \emptyset$ and $M_{(i)} \subsetneq A_{(i)}$ otherwise. Furthermore, in the latter case the left endpoint of $M_{(i)}$ is determined by

the largest $L_{(j)}$ with $A_{(j)} \in S_{(i)}$. The right endpoints of $M_{(i)}$ and $A_{(i)}$ are always identical. Equation (5) also implies that the maximal intersections can be computed in $O(n \log n)$ time, which is faster than the height map algorithm of Maathuis (2005b) due to the special structure in the data.

We again have an extra maximal intersection $M_{(n+1)} = A_{(n)} = (U_{(n)},\infty) \times \mathbb{R}$ if and only if $\Delta_{+(n)} = 0$. Let $\bar{\mathcal{I}}$ be the collection of indices of all maximal intersections. Thus, $\bar{\mathcal{I}} = \mathcal{I}$ if $\Delta_{+(n)} = 1$ and $\bar{\mathcal{I}} = \mathcal{I} \cup \{n+1\}$ if $\Delta_{+(n)} = 0$. Let $p_i$ be the probability mass of maximal intersection $M_{(i)}$, $i \in \bar{\mathcal{I}}$. We can then write the likelihood as

$$\prod_{i=1}^n P(A_i) = \prod_{i=1}^n \sum_{j \in \bar{\mathcal{I}}} p_j 1\{M_{(j)} \subseteq A_{(i)}\} = \prod_{i=1}^n p_i^{\Delta_{+(i)}} \Big(\sum_{j \ge i+1,\, j \in \bar{\mathcal{I}}} p_j\Big)^{1-\Delta_{+(i)}}. \tag{6}$$

The MLE $\hat p$ maximizes this expression under the constraints (2). From the analogy with likelihood (1) it follows immediately that

$$\hat p_i = \prod_{j=1}^{i-1}\left(1 - \frac{\Delta_{+(j)}}{n-j+1}\right)\frac{\Delta_{+(i)}}{n-i+1}, \qquad i \in \mathcal{I},$$

and $\hat p_{n+1} = 1 - \sum_{i \in \mathcal{I}} \hat p_i$ if $\Delta_{+(n)} = 0$. Equivalently, we can write

$$\sum_{j \ge i,\, j \in \bar{\mathcal{I}}} \hat p_j = \prod_{j \le i-1}\left(1 - \frac{\Delta_{+(j)}}{n-j+1}\right), \qquad i \in \bar{\mathcal{I}}. \tag{7}$$

These formulas differ from the ones given in §2.3 of Hudgens et al. (2005). The current formulas have several advantages. First, the tail probabilities (7) can be computed in time complexity $O(n \log n)$, since the computationally most intensive step consists of sorting the $U$'s. Furthermore, the current form provides additional insights into the behavior of the MLE. In particular, it shows that the MLE can be viewed as a right endpoint imputation estimator (see Remark 1) and it allows for an easy derivation of

the almost sure limit of the MLE (see §3).

The vector $\hat p$ is again uniquely determined. This was noted by Hudgens et al. (2005) and also follows from our derivation here. We obtain $\hat F_n(x,y)$ by summing all mass in the region $(0,x] \times (-\infty,y]$. We define a marginal MLE for $X$ by letting $\hat F_{Xn}(x) = \hat F_n(x,\infty)$. The estimators $\hat F_n$ and $\hat F_{Xn}$ can suffer considerably from representational non-uniqueness, since the maximal intersections $\{M_{(i)} : i \in \mathcal{I}\}$ are line segments and $M_{(n+1)}$ extends to infinity in two dimensions. We denote the estimator that assigns all mass to the upper right corners of the maximal intersections by $\hat F^l_n$, since it is a lower bound for the MLE. Similarly, we denote the estimator that assigns all mass to the lower left corners of the maximal intersections by $\hat F^u_n$, since it is an upper bound for the MLE. The formulas for $\hat F^l_n$ simplify considerably:

$$1 - \hat F^l_{Xn}(x) = \prod_{U_{(i)} \le x}\left(1 - \frac{\Delta_{+(i)}}{n-i+1}\right), \tag{8}$$

$$\hat F^l_n(x,y) = \sum_{i=1}^n \hat p_i 1\{U_{(i)} \le x, Z_{(i)} \le y\} = \sum_{U_{(i)} \le x}\ \prod_{U_{(j)} < U_{(i)}}\left(1 - \frac{\Delta_{+(j)}}{n-j+1}\right)\frac{\Delta_{+(i)}}{n-i+1}\,1\{Z_{(i)} \le y\}, \tag{9}$$

where $U$ was defined in (4).

Remark 1: The MLE $\hat F^l_n$ can be viewed as a right endpoint imputation estimator. Namely, replace the observed sets $A_{(i)}$ with $\Delta_{+(i)} = 1$ by their right endpoint:

$$A'_{(i)} = \begin{cases} \{U_{(i)}\} \times \{Z_{(i)}\} & \text{if } i \in \mathcal{I}, \\ A_{(i)} & \text{if } i \notin \mathcal{I}. \end{cases}$$

Then the intersection structures of $\{A_{(i)}\}_{i=1}^n$ and $\{A'_{(i)}\}_{i=1}^n$ are identical. Furthermore, the maximal intersections of $\{A'_{(i)}\}_{i=1}^n$ are $\{M'_{(i)} = A'_{(i)} : i \in \mathcal{I}\}$. Hence, writing

the likelihood for the imputed data in terms of $p$ yields exactly the same likelihood as (6). As a result the values $\hat p_i$, $i \in \bar{\mathcal{I}}$, are identical to the ones for the original data. Furthermore, since $\hat F^l_n$ assigns mass to the upper right corners of the maximal intersections, $\hat F^l_n$ is completely equivalent to the MLE for the imputed data. Since the observed sets $A'_{(i)}$ impute an $x$-value that is always at least as large as the unobserved value $X$, $\hat F^l_{Xn}$ tends to have a negative bias.

3. Inconsistency of the MLE

We now derive the almost sure limits $F^l_{\infty X}$ and $F^l_\infty$ of the MLEs $\hat F^l_{Xn}$ and $\hat F^l_n$. In some cases representational non-uniqueness disappears in the limit, so that $F_{\infty X} = F^l_{\infty X}$ and $F_\infty = F^l_\infty$. This occurs for all $(x,y) \in \mathbb{R}_+ \times \mathbb{R}$ if and only if the maximal intersections $M_{(i)}$, $i \in \mathcal{I}$, converge to points and $\sum_{i \in \mathcal{I}} \hat p_i \to 1$ as $n \to \infty$; see Examples 1 and 2 in §5. If these conditions fail, then the upper bounds $F^u_{\infty X}$ and $F^u_\infty$ can be derived from their lower bounds by reassigning mass from the upper right corners to the lower left corners of the maximal intersections. We illustrate this in Examples 3 and 4 in §5. However, we first derive the lower bounds $F^l_{\infty X}$ and $F^l_\infty$. Let

$$H_n(x) = \mathbb{P}_n 1\{U \le x\}, \quad x \ge 0,$$
$$V_n(x,y) = \mathbb{P}_n \Delta_+ 1\{U \le x, Z \le y\}, \quad x \ge 0,\ y \in \mathbb{R},$$

and $V_{1n}(x) \equiv V_n(x,\infty) = \mathbb{P}_n \Delta_+ 1\{U \le x\}$. Here $U$ is defined in (4) and $\mathbb{P}_n f(X) = n^{-1}\sum_{i=1}^n f(X_i)$. Furthermore, let

$$\Lambda_n(x,y) = \int_{[0,x]} \frac{V_n(ds,y)}{1 - H_n(s-)}, \qquad \Lambda_{1n}(x) \equiv \Lambda_n(x,\infty) = \int_{[0,x]} \frac{V_{1n}(ds)}{1 - H_n(s-)}.$$
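The empirical quantities just defined translate directly into code. The sketch below evaluates the lower bound estimators of (8) and (9) from the triples $(U_i, \Delta_{+i}, Z_i)$; it assumes there are no ties among the $U_i$, in which case the jump of $V_{1n}$ at $U_{(i)}$ divided by $1 - H_n(U_{(i)}-)$ equals $\Delta_{+(i)}/(n-i+1)$. The function name `lower_bound_mle` is hypothetical.

```python
def lower_bound_mle(u, delta_plus, z):
    """Return callables (F_X_l, F_l) implementing equations (8) and (9).

    u          : values of U from (4)
    delta_plus : 1 if the mark is observed (X not right censored), else 0
    z          : observed marks (ignored where delta_plus == 0)
    Assumes no ties among the values in u.
    """
    n = len(u)
    order = sorted(range(n), key=lambda i: (u[i], -delta_plus[i]))
    su = [u[i] for i in order]
    sd = [delta_plus[i] for i in order]
    sz = [z[i] for i in order]

    masses = []
    surv = 1.0  # product of (1 - hazard) over earlier order statistics
    for i in range(1, n + 1):
        hazard = sd[i - 1] / (n - i + 1)  # jump of Lambda_1n at U_(i)
        masses.append(surv * hazard)      # mass p_hat_i placed at (U_(i), Z_(i))
        surv *= 1.0 - hazard

    def F_X_l(x):
        return sum(p for p, s in zip(masses, su) if s <= x)

    def F_l(x, y):
        return sum(p for p, s, d, m in zip(masses, su, sd, sz)
                   if d == 1 and s <= x and m <= y)

    return F_X_l, F_l


if __name__ == "__main__":
    F_X_l, F_l = lower_bound_mle([0.2, 0.4, 0.6, 0.8], [1, 0, 1, 1], [1.0, 0.0, 3.0, 2.0])
    print(F_X_l(0.6), F_l(1.0, 2.0))
```

The only non-trivial cost is the sort, which matches the $O(n \log n)$ complexity noted in §2.2; evaluating the returned functions at a single point is linear in $n$ in this simple version.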

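Since

$$\Lambda_n(dx,y) = \frac{\mathbb{P}_n \Delta_+ 1\{U = x, Z \le y\}}{\mathbb{P}_n 1\{U \ge x\}} \quad\text{and}\quad \Lambda_{1n}(dx) = \frac{\mathbb{P}_n \Delta_+ 1\{U = x\}}{\mathbb{P}_n 1\{U \ge x\}},$$

we can write equations (8) and (9) in terms of $\Lambda_{1n}$ and $\Lambda_n$:

$$1 - \hat F^l_{Xn}(x) = \prod_{s \le x}\{1 - \Lambda_{1n}(ds)\}, \tag{10}$$

$$\hat F^l_n(x,y) = \int_{s \le x}\ \prod_{u < s}\{1 - \Lambda_{1n}(du)\}\,\Lambda_n(ds,y). \tag{11}$$

Note that (10) is analogous to the Kaplan-Meier estimator for right censored data, and (11) is analogous to equation (3.3) of Huang & Louis (1998). However, our functions $\Lambda_{1n}$ and $\Lambda_n$ are defined differently. As we will see in the following lemma and theorems, this difference lies at the root of the inconsistency problems of the MLE.

Lemma 3.1 For $I \subseteq \mathbb{R}^d$, $d \ge 1$, let $D(I)$ be the space of cadlag functions on $I$ (cadlag = right continuous with left limits). Let $\|\cdot\|$ be the supremum norm on $(D(\mathbb{R}_+), D(\mathbb{R}_+), D(\mathbb{R}_+ \times \mathbb{R}))$. Then

$$(\|H_n - H\|,\ \|V_{1n} - V_1\|,\ \|V_n - V\|) \to 0 \quad\text{almost surely}, \tag{12}$$

where

$$V(x,y) = \sum_{j=1}^k \int_{[0,x]} F(t,y)\,dG_j(t) - \sum_{j=2}^k \int_{0 \le s \le t \le x} F(s,y)\,dG_{j-1,j}(s,t), \tag{13}$$

$$V_1(x) = \sum_{j=1}^k \int_{[0,x]} F_X(t)\,dG_j(t) - \sum_{j=2}^k \int_{0 \le s \le t \le x} F_X(s)\,dG_{j-1,j}(s,t), \tag{14}$$

$$H(x) = V_1(x) + \int_{[0,x]} \{1 - F_X(s)\}\,dG_k(s). \tag{15}$$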

Proof: Equation (12) follows immediately from the Glivenko-Cantelli theorem, with $H(x) = E(1\{U \le x\})$, $V(x,y) = E(\Delta_+ 1\{U \le x, Z \le y\})$ and $V_1(x) = V(x,\infty) = E(\Delta_+ 1\{U \le x\})$. We now express $H$, $V$ and $V_1$ in terms of $F$ and $G$. Note that the events $[\Delta_j = 1]$, $j = 1,\dots,k+1$, are disjoint. Furthermore, $U = T_j$ and $Z = Y$ on $[\Delta_j = 1]$, $j = 1,\dots,k$, and $U = T_k$ on $[\Delta_{k+1} = 1]$. Hence,

$$\begin{aligned}
V(x,y) &= E(\Delta_+ 1\{U \le x, Z \le y\}) = \sum_{j=1}^k P(\Delta_j = 1, Y \le y, T_j \le x) \\
&= \sum_{j=1}^k P(X \in (T_{j-1},T_j], Y \le y, T_j \le x) \\
&= \sum_{j=1}^k \int_{0 \le s \le t \le x} \{F(t,y) - F(s,y)\}\,dG_{j-1,j}(s,t) \\
&= \sum_{j=1}^k \int_{[0,x]} F(t,y)\,dG_j(t) - \sum_{j=2}^k \int_{0 \le s \le t \le x} F(s,y)\,dG_{j-1,j}(s,t),
\end{aligned}$$

using $T_0 = 0$, $X > 0$ and $G(\{0 < T_1 < \dots < T_k\}) = 1$ in the last equality. Taking $y = \infty$ yields the expression for $V_1(x)$. The expression for $H$ follows similarly, using

$$H(x) = E\,1\{U \le x\} = \sum_{j=1}^k P(\Delta_j = 1, T_j \le x) + P(\Delta_{k+1} = 1, T_k \le x).$$

The differentials of $V$ and $V_1$ with respect to $x$ are

$$V(dx,y) = \sum_{j=1}^k F(x,y)\,dG_j(x) - \sum_{j=2}^k \int_{0 \le s \le x} F(s,y)\,dG_{j-1,j}(s,x), \tag{16}$$

$$V_1(dx) = \sum_{j=1}^k F_X(x)\,dG_j(x) - \sum_{j=2}^k \int_{0 \le s \le x} F_X(s)\,dG_{j-1,j}(s,x). \tag{17}$$

Let $\tau$ be such that $H(\tau) < 1$. We define $0/0 = 0$ and $f(x-) = \lim_{t \uparrow x} f(t)$.

Theorem 3.2 Let $\|\cdot\|$ be the supremum norm on $(D[0,\tau], D([0,\tau] \times \mathbb{R}))$. Then

$$(\|\Lambda_{1n} - \Lambda_{1\infty}\|,\ \|\Lambda_n - \Lambda_\infty\|) \to 0 \quad\text{almost surely},$$

where

$$\Lambda_\infty(x,y) = \int_{[0,x]} \frac{V(ds,y)}{1 - H(s-)}, \qquad x \in [0,\tau],\ y \in \mathbb{R}, \tag{18}$$

$$\Lambda_{1\infty}(x) = \Lambda_\infty(x,\infty) = \int_{[0,x]} \frac{V_1(ds)}{1 - H(s-)}, \qquad x \in [0,\tau]. \tag{19}$$

Proof: The proof is similar to the discussion on page 1536 of Gill & Johansen (1990). For all $x \ge 0$, let $H_{n-}(x) \equiv H_n(x-)$ and consider the mappings

$$(H_{n-}, V_{1n}, V_n) \mapsto ((1 - H_{n-})^{-1}, V_{1n}, V_n) \mapsto (\Lambda_{1n}, \Lambda_n)$$

on the spaces

$$(D_-[0,\tau], D[0,\tau], D([0,\tau] \times \mathbb{R})) \to (D_-[0,\tau], D[0,\tau], D([0,\tau] \times \mathbb{R})) \to (D[0,\tau], D([0,\tau] \times \mathbb{R})),$$

where $D_-(0,\tau]$ is the space of caglad (left continuous with right limits) functions on $(0,\tau]$. The first mapping is continuous with respect to the supremum norm when we restrict the domain of its first argument to elements of $D_-[0,\tau]$ that are bounded by, say, $\{1 + H(\tau)\}/2 < 1$. Strong consistency of $H_n$ ensures it satisfies this bound with probability one for $n$ large enough. The second mapping is continuous with respect to the supremum norm by the Helly-Bray lemma. Combining the continuity of these mappings with Lemma 3.1 yields the result of the theorem.

Theorem 3.3 Let $\|\cdot\|$ be the supremum norm on $(D[0,\tau], D([0,\tau] \times \mathbb{R}))$. Then

$$(\|\hat F^l_{Xn} - F^l_{\infty X}\|,\ \|\hat F^l_n - F^l_\infty\|) \to 0 \quad\text{almost surely,}$$

where

$$F^l_{\infty X}(x) = 1 - \prod_{s \le x}\{1 - \Lambda_{1\infty}(ds)\}, \tag{20}$$

$$F^l_\infty(x,y) = \int_{u \le x}\ \prod_{s < u}\{1 - \Lambda_{1\infty}(ds)\}\,\Lambda_\infty(du,y). \tag{21}$$

Proof: To derive the almost sure limit of $\hat F^l_{Xn}$ consider the mapping

$$\Lambda_{1n} \mapsto \prod_{s \le x}\{1 - \Lambda_{1n}(ds)\} = 1 - \hat F^l_{Xn}(x) \tag{22}$$

on the space $D[0,\tau]$ to itself. This mapping is continuous with respect to the supremum norm when its domain is restricted to functions of uniformly bounded variation (Gill & Johansen (1990), Theorem 7). Note that $\|\Lambda_{1n}\| \le 1/\{1 - H_n(\tau)\} < 2/\{1 - H(\tau)\}$ with probability one for $n$ large enough. Together with the monotonicity of $\Lambda_{1n}$ this implies that with probability one $\Lambda_{1n}$ is of uniformly bounded variation for $n$ large enough. The almost sure limit of $\hat F^l_{Xn}$ now follows by combining Theorem 3.2 and the continuity of (22).

To derive the almost sure limit of $\hat F^l_n$ consider the mapping

$$(\Lambda_{1n}, \Lambda_n) \mapsto \int_{u \le x}\ \prod_{s < u}\{1 - \Lambda_{1n}(ds)\}\,\Lambda_n(du,y) = \hat F^l_n(x,y)$$

on the space $(D[0,\tau], D([0,\tau] \times \mathbb{R}))$ to $D([0,\tau] \times \mathbb{R})$. This mapping is continuous with respect to the supremum norm when its domain is restricted to functions of uniformly bounded variation (Huang & Louis (1998), Theorem 1). Note that $\Lambda_n(x,y) \le \Lambda_{1n}(x)$, so that with probability one the pair $(\Lambda_n, \Lambda_{1n})$ is of uniformly bounded variation for $n$ large enough. The result then follows as in the first part of the proof.
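The limit in (20) can be checked by simulation in the simplest setting. For current status censoring ($k = 1$) we have $U = T$, so (15) gives $H = G$ and (17) gives $V_1(ds) = F_X(s)\,dG(s)$. The sketch below assumes $X \sim \mathrm{Unif}(0,1)$ and $T \sim \mathrm{Unif}(0,0.5)$, the setting of Example 1 in §5, for which $\Lambda_{1\infty}(x) = \int_0^x 2s/(1-2s)\,ds = -x - \tfrac12\log(1-2x)$ and hence $F^l_{\infty X}(x) = 1 - \sqrt{1-2x}\,\exp(x)$. The function names, the seed and the sample size are illustrative choices.

```python
import math
import random


def simulate_lower_mle_at(x0, n, seed=0):
    """Simulate current status data and evaluate equation (8) at x0."""
    rng = random.Random(seed)
    obs = []
    for _ in range(n):
        x = rng.random()        # X ~ Unif(0, 1), never observed directly
        t = 0.5 * rng.random()  # T ~ Unif(0, 0.5), so U = T
        obs.append((t, 1 if x <= t else 0))
    # Order by U; observations with Delta_+ = 1 come first among ties.
    obs.sort(key=lambda p: (p[0], -p[1]))
    surv = 1.0
    for i, (t, d) in enumerate(obs, start=1):
        if t > x0:
            break
        surv *= 1.0 - d / (n - i + 1)
    return 1.0 - surv


def limit_lower(x):
    """Closed-form limit 1 - sqrt(1 - 2x) * exp(x), valid for 0 <= x < 0.5."""
    return 1.0 - math.sqrt(1.0 - 2.0 * x) * math.exp(x)


if __name__ == "__main__":
    x0 = 0.3
    print("estimate  :", simulate_lower_mle_at(x0, 20000))
    print("limit (20):", limit_lower(x0))
    print("true F_X  :", x0)
```

In such a run the estimate should settle near the limit from (20) rather than near the true value $F_X(0.3) = 0.3$, which is the inconsistency discussed in the remainder of this section.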

Corollary 3.4 For $x \in [0,\tau]$, $y \in \mathbb{R}$, we can write

$$F^l_\infty(x,y) = \int_{[0,x]} \frac{\Lambda_\infty(ds,y)}{\Lambda_{1\infty}(ds)}\,dF^l_{\infty X}(s) = \int_{[0,x]} \frac{V(ds,y)}{V_1(ds)}\,dF^l_{\infty X}(s). \tag{23}$$

Proof: Combining equations (20) and (21) yields

$$F^l_\infty(x,y) = \int_{[0,x]} \{1 - F^l_{\infty X}(s-)\}\,\Lambda_\infty(ds,y). \tag{24}$$

Taking $y = \infty$ gives $F^l_{\infty X}(x) = F^l_\infty(x,\infty) = \int_{[0,x]} \{1 - F^l_{\infty X}(s-)\}\,\Lambda_{1\infty}(ds)$. Hence, $dF^l_{\infty X}(s) = \{1 - F^l_{\infty X}(s-)\}\,\Lambda_{1\infty}(ds)$. Combining this with equation (24) yields the first equality of (23). The second equality follows from $\Lambda_\infty(ds,y) = V(ds,y)/\{1 - H(s-)\}$ and $\Lambda_{1\infty}(ds) = V_1(ds)/\{1 - H(s-)\}$.

Corollary 3.5 Let $X$ and $Y$ be independent. Then

$$F^l_\infty(x,y) = F^l_{\infty X}(x)\,F_Y(y), \qquad x \in [0,\tau],\ y \in \mathbb{R}. \tag{25}$$

Proof: If $X$ and $Y$ are independent, equations (16) and (17) yield $V(ds,y) = F_Y(y)\,V_1(ds)$. Substituting this into equation (23) gives the result.

Corollary 3.6 Let $X$ be subject to current status censoring ($k = 1$). Then

$$F^l_\infty(x,y) = \int_{[0,x]} P(Y \le y \mid X \le s)\,dF^l_{\infty X}(s), \qquad x \in [0,\tau],\ y \in \mathbb{R}. \tag{26}$$

Proof: For $k = 1$ equations (16) and (17) reduce to $V(ds,y) = F(s,y)\,dG(s)$ and $V_1(ds) = F_X(s)\,dG(s)$. Hence, $V(ds,y)/V_1(ds) = F(s,y)/F_X(s) = P(Y \le y \mid X \le s)$.

We now consider necessary and sufficient conditions for consistency. From the one-to-one correspondence between a univariate distribution function and its cumulative

hazard function it follows that $\hat F_{Xn}$ is consistent for $F_X$ if and only if $\Lambda_{1\infty}$ equals the cumulative hazard function $\Lambda_X$ of $F_X$. Similarly, it follows that $\hat F_n(x,y)$ is consistent for $F(x,y)$ if and only if $\Lambda_\infty$ equals the mark specific cumulative hazard function $\Lambda$ of $F$. This is made precise in the following corollary.

Corollary 3.7 We introduce the following conditions:

$$\Lambda_{1\infty}(x) = \int_{[0,x]} \frac{V_1(ds)}{1 - H(s-)} = \int_{[0,x]} \frac{F_X(ds)}{1 - F_X(s-)} = \Lambda_X(x), \tag{27}$$

$$\Lambda_\infty(x,y) = \int_{[0,x]} \frac{V(ds,y)}{1 - H(s-)} = \int_{[0,x]} \frac{F(ds,y)}{1 - F_X(s-)} = \Lambda(x,y). \tag{28}$$

Then $\hat F_{Xn}$ is consistent for $F_X$ on $(0,\tau]$ if and only if (27) holds for all $x \in (0,\tau]$. Furthermore, $\hat F_n$ is consistent for $F$ on $(0,\tau] \times \mathbb{R}$ if and only if (28) holds for all $x \in (0,\tau]$, $y \in \mathbb{R}$. Finally, let $x_0 \in (0,\tau]$ with $F_X(x_0) > 0$. Then $\hat F^l_n(x_0,y)/\hat F^l_{Xn}(x_0)$ is consistent for $F_Y(y)$ if $X$ and $Y$ are independent.

The last claim of the corollary follows from (25). Conditions (27) and (28) are hard to interpret in general, since $F_X$ and $F$ enter on both sides of the equations when we plug in the expressions (15), (16) and (17) for $H(s-)$, $V(ds,y)$ and $V_1(ds)$. However, it is clear that the conditions force a relation between $F$ and $G$, and such a relation will typically not hold and cannot be assumed since $F$ is unknown. The following corollary further strengthens this result when $X$ is subject to current status censoring.

Corollary 3.8 Let $X$ be subject to current status censoring, and let $F_X$ and $G$ be continuous. Then the MLE $\hat F_{Xn}$ is inconsistent for any choice of $F_X$ and $G$.

Proof: Let $\gamma = \inf\{x : F_X(x) > 0\} < \tau$. For continuous distribution functions $G$ and $F_X$ condition (27) can be rewritten as

$$\int_{(\gamma,x]} \frac{dG(s)}{1 - G(s)} = \int_{(\gamma,x]} \frac{dF_X(s)}{F_X(s)\{1 - F_X(s)\}}, \qquad x \in (\gamma,\tau].$$

For continuous $G$ and $F_X$ this integral equation is solved by

$$-\log\{1 - G(x)\} + C = \log\left\{\frac{F_X(x)}{1 - F_X(x)}\right\}, \qquad x \in (\gamma,\tau].$$

This yields $F_X(x) = [1 + \exp(-C)\{1 - G(x)\}]^{-1}$ for $x \in (\gamma,\tau]$. But there is no finite $C$ such that $F_X(\gamma) = 0$ holds, and hence condition (27) fails for all continuous distributions $G$ and $F_X$.

The following corollary shows that the asymptotic bias of the MLE goes to zero as the number of observation times $k$ increases for at least one particular distribution of the $T_j$'s, namely if they are distributed as the order statistics of a uniform sample on $[0,\theta]$.

Corollary 3.9 Let $T_1,\dots,T_k$ be the order statistics of $k$ independent and identically distributed uniform random variables on $[0,\theta]$. Then we have

$$\Lambda_{1\infty}(x) \equiv \Lambda^k_{1\infty}(x) = \int_{[0,x]} \frac{dV^k_1(s)}{1 - H^k(s-)} \to \int_{[0,x]} \frac{dF_X(s)}{1 - F_X(s-)} = \Lambda_X(x),$$

$$\Lambda_\infty(x,y) \equiv \Lambda^k_\infty(x,y) = \int_{[0,x]} \frac{dV^k(s,y)}{1 - H^k(s-)} \to \int_{[0,x]} \frac{F(ds,y)}{1 - F_X(s-)} = \Lambda(x,y),$$

for all continuity points $x < \theta$ of $\Lambda_X(x)$ and $\Lambda(x,y)$ and for all $y \in \mathbb{R}$, as $k \to \infty$.

Proof: Since the $T_j$'s are order statistics of $k$ independent and identically distributed uniform random variables, the marginal densities $g_j$, $j = 1,\dots,k$, and the joint densities $g_{j-1,j}$, $j = 2,\dots,k$, are known (see e.g. Shorack & Wellner (1986), page 97). Summing them over $j$ yields:

$$\sum_{j=1}^k g_j(t) = \frac{k}{\theta}\,1_{[0,\theta]}(t) \sum_{j-1=0}^{k-1} \binom{k-1}{j-1}\left(\frac{t}{\theta}\right)^{j-1}\left(1 - \frac{t}{\theta}\right)^{k-1-(j-1)} = \frac{k}{\theta}\,1_{[0,\theta]}(t),$$

$$\sum_{j=2}^k g_{j-1,j}(s,t) = \frac{k(k-1)}{\theta^2}\,1_{[0 \le s \le t \le \theta]}\left(1 - \frac{t-s}{\theta}\right)^{k-2}.$$

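Hence we compute, using Fubini's theorem to rewrite the second term,

$$\begin{aligned}
V^k(x,y) &= \sum_{j=1}^k \int_0^{\theta \wedge x} F(t,y)\,dG_j(t) - \sum_{j=2}^k \int_{0 \le s \le t \le x} F(s,y)\,dG_{j-1,j}(s,t) \\
&= \frac{k}{\theta}\int_0^{\theta \wedge x} F(t,y)\,dt - \int_{0 \le s \le t \le (x \wedge \theta)} F(s,y)\,\frac{k(k-1)}{\theta^2}\left(1 - \frac{t-s}{\theta}\right)^{k-2} ds\,dt \\
&= \frac{k}{\theta}\int_0^{\theta \wedge x} F(s,y)\left(1 - \frac{x-s}{\theta}\right)^{k-1} ds = \int_0^{\theta \wedge x} F(s,y)\,dQ^k_x(s),
\end{aligned}$$

where

$$\begin{aligned}
Q^k_x(s) &\equiv \int_0^s \frac{k}{\theta}\left(1 - \frac{x-r}{\theta}\right)^{k-1} dr = \int_{(x-s)/\theta}^{x/\theta} k(1-v)^{k-1}\,dv \\
&= \left\{\left(1 - \frac{x-s}{\theta}\right)^k - \left(1 - \frac{x}{\theta}\right)^k\right\} 1\{0 \le s < x\} + \left\{1 - \left(1 - \frac{x}{\theta}\right)^k\right\} 1\{s \ge x\}.
\end{aligned}$$

Thus, $Q^k_x(s)$ converges weakly (and even uniformly) to the distribution function $1\{s \ge x\}$ corresponding to the measure with mass 1 at $x$ as $k \to \infty$. Plugging in $y = \infty$ in $V^k(x,y)$ yields $V^k_1(x) = \int_0^{\theta \wedge x} F_X(s)\,dQ^k_x(s)$. Furthermore, plugging in the expressions for $V^k_1$ and $G_k$ in (15) gives

$$H^k(x) = \int_0^{\theta \wedge x} F_X(s)\,dQ^k_x(s) + \int_0^{\theta \wedge x} \{1 - F_X(s)\}\,\frac{k}{\theta}\left(\frac{s}{\theta}\right)^{k-1} ds.$$

Hence, $V^k(x,y) \to F(x,y)$, $V^k_1(x) \to F_X(x)$ and $1 - H^k(x) \to 1 - F_X(x)$ as $k \to \infty$ for continuity points of the limits. The corollary then follows from the extended Helly-Bray theorem.

4. Repaired MLE via discretization of marks

We now define a simple repaired estimator $\tilde F_n(x,y)$ which is consistent for $F(x,y)$ for $y$ on a grid. The idea behind the estimator is that one can define discrete competing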

risks based on a continuous random variable. Doing so transforms interval censored continuous mark data into interval censored competing risks data, for which the MLE is consistent.

To describe the method, we let $K > 0$ and define a grid $y_1 < \dots < y_K$. We let $y_0 = -\infty$ and $y_{K+1} = \infty$, and introduce a new random variable $C \in \{1,\dots,K+1\}$:

$$C = \sum_{j=1}^{K+1} j\,1\{y_{j-1} < Y \le y_j\}.$$

We can determine the value of $C$ for all observations with an observed mark. Hence, we can transform the observations $(T,\Delta,Z)$ into $(T,\Delta,Z^*)$, where $Z^* = \Delta_+ C$. This gives interval censored competing risks data with $K+1$ competing risks. Since the observed sets for interval censored competing risks data form a partition of the space $\mathbb{R}_+ \times \{1,\dots,K+1\}$, global consistency of the MLE follows from Theorems 9 and 10 of Van der Vaart & Wellner (2000). We can derive local consistency from the global consistency as done in Maathuis (2005a). This means that we can consistently estimate the sub-distribution functions $F_j(x) = P(X \le x, C = j) = P(X \le x, y_{j-1} < Y \le y_j)$. Hence, we can consistently estimate $F(x,y_j) = \sum_{l=1}^j F_l(x)$ for $x \in \mathbb{R}_+$ and $y_j$ on the grid.

Since interval censored competing risks data are a special case of bivariate censored data, we can compute the MLE by methods that are available for bivariate censored data. Such methods often consist of two steps. They first compute the maximal intersections, using for example the height map algorithm of Maathuis (2005b), and then solve a high dimensional convex constrained optimization problem.

It may be tempting to choose $K$ large, such that $F(x,y)$ can be estimated for $y$ on a fine grid. However, this may result in a poor estimator. To obtain a good estimator one should choose the grid such that there are ample observations for each value of $C$.
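The discretization step itself is a simple lookup. The sketch below maps each observed mark to its competing risk label $C$ and forms $Z^* = \Delta_+ C$; it covers only the data transformation, not the subsequent computation of the competing risks MLE. The function names and the representation of an observation as a tuple `(times, delta, z)` are assumptions made for the illustration.

```python
from bisect import bisect_left


def competing_risk_label(y, grid):
    """C = sum_j j * 1{y_{j-1} < y <= y_j}, with y_0 = -inf and y_{K+1} = +inf.

    grid must be sorted increasingly: [y_1, ..., y_K].
    """
    # bisect_left returns the number of grid points strictly below y,
    # so a mark equal to y_j is assigned to cell j, matching (y_{j-1}, y_j].
    return bisect_left(grid, y) + 1


def discretize(observations, grid):
    """Map each (T, Delta, Z) to (T, Delta, Z*), where Z* = Delta_+ * C."""
    out = []
    for times, delta, z in observations:
        delta_plus = 1 - delta[-1]  # 0 if X is right censored
        z_star = competing_risk_label(z, grid) if delta_plus == 1 else 0
        out.append((times, delta, z_star))
    return out


if __name__ == "__main__":
    grid = [0.0, 1.0, 2.0]  # K = 3, so C takes values in {1, 2, 3, 4}
    data = [([0.5], [1, 0], 0.4), ([0.7], [0, 1], 0.0), ([0.9], [1, 0], 2.0)]
    print(discretize(data, grid))
```

Counting the labels in the transformed data gives a quick check of the advice above that each value of $C$ should have ample observations.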

In practice, one can start with a coarse grid, and then refine the grid as long as the estimator stays close to the one computed on the coarse grid.

We close this section with some general remarks about this method. First, note that the repaired MLE corresponds to an existing consistent MLE in the following two cases: (a) estimation of $F(x,y)$ for right censored continuous mark data, and (b) estimation of $F_X(x)$ for interval censored continuous mark data. In the first case the discretization does not change the intersection structure of the data. Hence, the repaired MLE equals the consistent MLE as defined by Huang & Louis (1998) for $y$ on the grid. In the second case we can take $K = 0$, thereby ignoring any information on $Y$. This means that we compute the MLE for univariate interval censored data $(T,\Delta)$, which is known to be consistent (Schick & Yu (2000), Van der Vaart & Wellner (2000)). In simulation results we found that moderate values of $K$ tend to give better estimates for $F_X$, and in §5 we present results for $n = 10{,}000$ and $K = 20$. Finally, note that the grouping of the data that occurs in the discretization tends to yield smaller maximal intersections in the $x$-direction and hence diminishes problems with representational non-uniqueness. This is visible in Examples 3 and 4 in §5.

5. Examples

We illustrate the asymptotic behavior of the inconsistent and repaired MLE in four examples. The examples are chosen to cover a range of scenarios, summarized in Table 1. In each example we compute the MLEs $\hat F^l_n$ and $\hat F^u_n$ and the repaired estimators $\tilde F^l_n$ and $\tilde F^u_n$ for sample size $n = 10{,}000$. For the repaired estimator we use an equidistant grid with $K = 20$ points as shown in Figure 3. We compare these estimators to the true underlying distribution $F$ and the derived limits $F^l_\infty$ and $F^u_\infty$.

Figure 1 shows the contour lines of the MLE $\hat F^l_n$, its limit $F^l_\infty$ and the true underlying

distribution $F$. Note that $\hat F^l_n$ and $F^l_\infty$ are almost indistinguishable, while there is a clear difference between $F^l_\infty$ and $F$. The results for the upper limits $\hat F^u_n$ and $F^u_\infty$ are similar and not shown. Figure 2 contains the results for $F_X$ and shows that the MLE tends to underestimate $F_X$, which can be understood through Remark 1. However, the repaired MLE $\tilde F_{Xn}$ closely follows $F_X$. Figure 3 shows the results for $F(x_0,y)$ for fixed $x_0$. This function is often estimated as an alternative for $F_Y$, since $F_Y$ cannot be consistently estimated if the support of $T$ is contained in the support of $X$, a situation that typically occurs in practice. The values of $x_0$ were chosen to show a range of possible scenarios for the behavior of the MLE, and we see that $\hat F_n$ can suffer from significant positive or negative bias and non-uniqueness. However, the repaired MLE is again close to the underlying distribution.

Table 1: Summary of the examples

                            Example 1     Example 2     Example 3     Example 4
(In)dependence of (X,Y)     independent   dependent     dependent     dependent
Censoring mechanism for X   case 1        case 1        case 2        case 2
Distribution of T           continuous    continuous    continuous    discrete

We now discuss each example in detail.

Example 1: Let $X$ and $Y$ be independent, with $X \sim \mathrm{Unif}(0,1)$ and $Y \sim \mathrm{Exp}(1)$. Let $X$ be subject to current status censoring with observation time $T \sim \mathrm{Unif}(0,0.5)$ independent of $(X,Y)$. Thus, $F_X(x) = x$, $F_Y(y) = 1 - \exp(-y)$ and $F(x,y) = x\{1 - \exp(-y)\}$ for $x \in [0,1]$ and $y \ge 0$. We derive the limits for $(x,y) \in [0,\tau] \times \mathbb{R}_+$ for $\tau < 0.5$. Using equations (17), (19), (20) and the fact that $\prod_{s \le x}\{1 - \Lambda_{1\infty}(ds)\} = \exp\{-\Lambda_{1\infty}(x)\}$ when $\Lambda_{1\infty}$ is continuous,

we obtain

$$\Lambda_{1\infty}(x) = \int_0^x \frac{F_X}{1 - G}\,dG = \int_0^x \frac{2s}{1 - 2s}\,ds = -x - \log\sqrt{2 - 4x} + \log\sqrt{2},$$

$$1 - F^l_{\infty X}(x) = \exp\{-\Lambda_{1\infty}(x)\} = \sqrt{1 - 2x}\,\exp(x) \ne 1 - F_X(x) = 1 - x.$$

Since all maximal intersections $M_{(i)}$, $i \in \mathcal{I}$, converge to points and $F^l_{\infty X}(0.5) = 1$, the limit $F_{\infty X}$ does not suffer from representational non-uniqueness. Hence, $F_{\infty X} = F^l_{\infty X}$. Figure 2 shows that $F_{\infty X}(x) < F_X(x)$ for small values of $x$, but $F_{\infty X}(x) > F_X(x)$ for large values of $x$. In particular, $F_{\infty X}(0.5) = 1 > F_X(0.5) = 0.5$. The fact that $F_{\infty X}$ equals one at the upper support point of $T$ is true in some generality and can be explained as follows. Let $\eta = G^{-1}(1)$, let $X$ be subject to current status censoring, let $F_X(\eta) > 0$, and let $F_X$ and $G$ be continuous at $\eta$. Then $\Lambda_{1\infty}(x) = \int_0^x F_X/(1 - G)\,dG$ can be viewed as a scaled down version of the cumulative hazard function of $G$, and hence it converges to infinity for $x \uparrow \eta$. This implies that $F_{\infty X}(x)$ converges to one for $x \uparrow \eta$. This observation is relevant in practice since it often happens in medical studies that the support of $G$ is strictly contained in the support of $X$. Figure 2 also shows that the repaired estimator $\tilde F_{Xn}(x)$ closely follows $F_X(x)$ for $x < 0.5$. Neither estimator behaves well for $x > 0.5$, but this was to be expected since we cannot estimate outside of the support of $G$.

Since $X$ and $Y$ are independent, the bivariate limit $F_\infty$ follows from equation (25):

$$F_\infty(x,y) = F_{\infty X}(x)\,F_Y(y) = \{1 - \sqrt{1 - 2x}\,\exp(x)\}\{1 - \exp(-y)\}.$$

In particular, this implies that $F(x_0,y)$ for $x_0 = 0.49$ is overestimated by a factor $F_{\infty X}(0.49)/F_X(0.49) \approx 1.57$, as shown in Figure 3. The repaired estimator $\tilde F_n(0.49,y)$ behaves quite well, but is slightly off for larger values of x.

Example 2: Let $X \sim \mathrm{Unif}(0,1)$, and let $Y \mid X$ be exponentially distributed with mean $1/(X + a)$, where $a = 0.5$. Let $X$ be subject to current status censoring

with observation time $T \sim \mathrm{Unif}(0,1)$ independent of $(X,Y)$. Thus, $F_X(x) = x$, $F_Y(y) = 1 - \exp(-ay)\{1 - \exp(-y)\}/y$ and $F(x,y) = x - \exp(-ay)\{1 - \exp(-xy)\}/y$ for $x \in [0,1]$ and $y \ge 0$. Let $(x,y) \in [0,\tau] \times \mathbb{R}_+$ for $\tau < 1$. Equations (17), (19) and (20) yield

$$\Lambda_1(x) = \int_0^x \frac{F_X}{1-G}\,dG = \int_0^x \frac{s}{1-s}\,ds = -x - \log(1-x),$$

$$1 - F^l_{X\infty}(x) = \exp\{-\Lambda_1(x)\} = (1-x)\exp(x) \ge 1 - F_X(x) = 1 - x,$$

where the inequality in the last line is strict for all $x \in (0,1)$. As in Example 1, $F_{X\infty} = F^l_{X\infty}$ is unique. Note that $P(Y \le y \mid X \le x) = 1 - \exp(-ay)\{1 - \exp(-xy)\}/(xy)$ and $f_{X\infty}(x) = x\exp(x)$. Hence, equation (26) yields

$$F_\infty(x,y) = x\exp(x) + \frac{\exp(-ay)}{y(1-y)}\{\exp(x - xy) - 1\} - \left\{1 + \frac{\exp(-ay)}{y}\right\}\{\exp(x) - 1\}.$$

Figures 2 and 3 show that $\hat F_{Xn}(x)$ and $\hat F_n(0.5,y)$ underestimate $F_X(x)$ and $F(0.5,y)$, while the repaired MLE behaves very well.

Example 3: Let $X \sim \mathrm{Unif}(0,2)$, and let $Y = X$. Let $X$ be subject to interval censoring case 2 with observation times $T = (T_1,T_2)$, independent of $(X,Y)$ and uniformly distributed over $\{(t_1,t_2) : 0 \le t_1 \le 1,\ 1 \le t_2 \le 2\}$. Thus, $F_X(x) = \tfrac12 x$, $F_Y(y) = \tfrac12 y$ and $F(x,y) = \tfrac12 (x \wedge y)$ for $(x,y) \in [0,2]^2$. We derive the limits for $(x,y) \in [0,\tau] \times [0,2]$ for $\tau < 2$. Using equations (15), (17), (19) and (20), we get

$$\Lambda_1(x) = -\log\left\{1 - \tfrac14 x^2\right\} 1\{x \le 1\} + \left\{\log\tfrac43 - \tfrac23 x + \tfrac23 - \log(2-x)\right\} 1\{x > 1\},$$

$$F^l_{X\infty}(x) = \tfrac14 x^2\, 1\{x \le 1\} + \left\{1 - \tfrac34 (2-x)\exp\left(\tfrac23 x - \tfrac23\right)\right\} 1\{x > 1\}.$$
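The closed forms for the marginal limits in Examples 1 to 3 can be checked numerically. The sketch below is only an illustration, not part of the original derivation: it assumes that $\Lambda_1$ has density $V_1(ds)/\{1 - H(s)\}$, where $V_1$ is the sub-distribution of the observed right endpoints of non-right-censored observations and $H$ is the distribution of all observed endpoints. This is our reading of equations (17) to (20), which are not reproduced in this section. The helper names `simpson` and `limit_cdf` are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def limit_cdf(pieces):
    """Return 1 - exp(-Lambda_1), where Lambda_1 is the sum of the
    integrals of f over [a, b] for each piece (f, a, b)."""
    lam = sum(simpson(f, a, b) for f, a, b in pieces)
    return 1.0 - math.exp(-lam)

# Example 1: F_X(s) = s and G = Unif(0, 0.5), so dLambda_1 = 2s / (1 - 2s) ds.
x1 = 0.49
ex1_numeric = limit_cdf([(lambda s: 2 * s / (1 - 2 * s), 0.0, x1)])
ex1_closed = 1 - math.sqrt(1 - 2 * x1) * math.exp(x1)
ex1_factor = ex1_closed / x1  # overestimation factor relative to F_X(0.49)

# Example 2: F_X(s) = s and G = Unif(0, 1), so dLambda_1 = s / (1 - s) ds.
x2 = 0.5
ex2_numeric = limit_cdf([(lambda s: s / (1 - s), 0.0, x2)])
ex2_closed = 1 - (1 - x2) * math.exp(x2)

# Example 3 (assumed forms): for s <= 1, V_1(ds) = (s/2) ds and 1 - H(s) = 1 - s^2/4;
# for s > 1, V_1(ds) = (s/2 - 1/4) ds and 1 - H(s) = (3/4)(2 - s).
x3 = 1.5
ex3_numeric = limit_cdf([
    (lambda s: (s / 2) / (1 - s * s / 4), 0.0, 1.0),
    (lambda s: (s / 2 - 0.25) / (0.75 * (2 - s)), 1.0, x3),
])
ex3_closed = 1 - 0.75 * (2 - x3) * math.exp(2 * x3 / 3 - 2 / 3)

if __name__ == "__main__":
    print(ex1_numeric, ex1_closed, ex1_factor)
    print(ex2_numeric, ex2_closed)
    print(ex3_numeric, ex3_closed)
```

Under these assumptions the quadrature and the closed forms agree, and `ex1_factor` reproduces the factor of about 1.57 quoted in Example 1.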

In this example the limit $F_{X\infty}$ is non-unique and hence we also derive the upper bound $F^u_{X\infty}$. To do so, we look at the x-intervals of the observed sets, which take the form $(0,t_1]$, $(t_1,t_2]$ and $(t_2,\infty)$, with $t_1 \in (0,1]$ and $t_2 \in (1,2]$. Since there are no right censored observations with $L < 1$, equation (5) implies that observed sets with x-interval $(0,t_1]$ are maximal intersections, and these maximal intersections do not converge to points when $n$ goes to infinity. On the other hand, maximal intersections corresponding to observed sets with x-interval $(t_1,t_2]$ do converge to points. Hence, we can derive the upper bound $F^u_{X\infty}$ by reassigning all mass at points $t_1 \le 1$ to $x = 0+$, where $0+$ denotes a point slightly bigger than zero to account for the fact that the x-intervals are left-open. This yields

$$F^u_{X\infty}(x) = \tfrac14\, 1\{0 < x \le 1\} + \left\{1 - \tfrac34 (2-x)\exp\left(\tfrac23 x - \tfrac23\right)\right\} 1\{x > 1\}.$$

Note that $F^u_{X\infty}$ is left continuous at zero. We obtain $F^l_\infty$ by first computing $V(dx,y)$ using (16), and then integrating $V(dx,y)/V_1(dx)$ against $F^l_{X\infty}(x)$ using (23):

$$F^l_\infty(x,y) = \begin{cases}
F^l_{X\infty}(x), & x \le y, \\
F^l_{X\infty}(y) + \tfrac12 y(x-y), & y \le x \le 1, \\
F^l_{X\infty}(y) + \tfrac38 (2y-1)\left\{\exp\left(\tfrac23 x - \tfrac23\right) - \exp\left(\tfrac23 y - \tfrac23\right)\right\}, & 1 \le y \le x, \\
F^l_{X\infty}(y) + \tfrac12 y(1-y) + \tfrac38 y^2 \left\{\exp\left(\tfrac23 x - \tfrac23\right) - 1\right\}, & y \le 1 \le x.
\end{cases}$$

We find $F^u_\infty$ by reassigning mass from the upper right to the lower left corners of the maximal intersections, as outlined for $F_{X\infty}$. Figure 1 shows that $F^l_\infty$ is smoother than $F$ and clearly different. Figure 2 shows that $F^l_{X\infty}(x) < F_X(x)$ for all $x \in (0,\tau]$ and $F^l_{X\infty}(x) = F^u_{X\infty}(x)$ for $x \ge 1$, and Figure 3 shows that both $F^l_\infty(0.75,y)$ and $F^u_\infty(0.75,y)$ are smaller than $F(0.75,y)$. However, the repaired estimators $\tilde F_{Xn}$ and $\tilde F_n(0.75,y)$ are unique and behave very well.
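The four-case expression for the bivariate lower limit in Example 3 can be cross-checked by integrating $V(ds,y)/V_1(ds)$ against the density of $F^l_{X\infty}$ directly. The sketch below is a hedged illustration under our own assumptions about the ingredients, which are derived from the model rather than quoted from equations (16) and (23): for $s \le 1$ the ratio times the density simplifies to $\tfrac12 (s \wedge y)$, and for $s > 1$ we take $V(ds,y) = E\{F_X(s \wedge y) - F_X(T_1)\}_+\,ds$ with $T_1 \sim \mathrm{Unif}(0,1)$. All function names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def e23(t):
    """Shorthand for exp(2t/3 - 2/3)."""
    return math.exp(2 * t / 3 - 2 / 3)

def F_lX(x):
    """Closed-form lower limit of the marginal MLE in Example 3."""
    return x * x / 4 if x <= 1 else 1 - 0.75 * (2 - x) * e23(x)

def F_l_closed(x, y):
    """Piecewise closed form of the bivariate lower limit."""
    if x <= y:
        return F_lX(x)
    if x <= 1:  # y <= x <= 1
        return F_lX(y) + 0.5 * y * (x - y)
    if y >= 1:  # 1 <= y <= x
        return F_lX(y) + 0.375 * (2 * y - 1) * (e23(x) - e23(y))
    return F_lX(y) + 0.5 * y * (1 - y) + 0.375 * y * y * (e23(x) - 1)  # y <= 1 <= x

def integrand_left(s, y):
    """s <= 1: ratio {(s ^ y)/2}/(s/2) times density s/2 equals (s ^ y)/2."""
    return min(s, y) / 2

def integrand_right(s, y):
    """s > 1: ratio V(ds, y)/V_1(ds) times the density of F_lX at s."""
    m = min(s, y)
    v = m / 2 - 0.25 if m >= 1 else m * m / 4  # assumed form of V(ds, y)/ds
    v1 = s / 2 - 0.25                          # assumed form of V_1(ds)/ds
    density = 0.25 * (2 * s - 1) * e23(s)      # derivative of F_lX for s > 1
    return v / v1 * density

def F_l_numeric(x, y):
    """Integrate piecewise, splitting at the kinks s = 1 and s = y."""
    knots = sorted({0.0, x} | {k for k in (1.0, y) if 0.0 < k < x})
    total = 0.0
    for a, b in zip(knots, knots[1:]):
        piece = integrand_left if b <= 1.0 else integrand_right
        total += simpson(lambda s: piece(s, y), a, b)
    return total

if __name__ == "__main__":
    for x, y in [(0.5, 1.2), (0.9, 0.4), (1.8, 1.3), (1.6, 0.7)]:
        print((x, y), F_l_numeric(x, y), F_l_closed(x, y))
```

The four test points cover one case each; splitting the integral at $s = 1$ and $s = y$ keeps every piece smooth, so Simpson's rule is accurate there.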

Example 4: Let $(X,Y)$ be uniformly distributed over $\{(x,y) : 0 \le x \le y \le 1\}$. Let $X$ be subject to interval censoring case 2 with observation times $T = (T_1,T_2)$ independent of $(X,Y)$. Let the distribution of $T$ be discrete: $G\{(0.25,0.5)\} = 0.3$, $G\{(0.25,0.75)\} = 0.3$ and $G\{(0.5,0.75)\} = 0.4$. Thus, $F_X(x) = 2x - x^2$, $F_Y(y) = y^2$ and $F(x,y) = (2xy - x^2)1\{x \le y\} + y^2 1\{x > y\}$ for $(x,y) \in [0,1]^2$. Since we can only expect to get sensible estimates for $F(x,y)$ for values of $x$ in the support of the observation time distribution, we derive the limits for $x \in \{0.25, 0.5, 0.75\}$ and $y \in [0,1]$. Equations (15), (17), (19) and (20) yield $F^l_{X\infty}(0.25) \approx 0.26$, $F^l_{X\infty}(0.5) \approx 0.66$ and $F^l_{X\infty}(0.75) \approx 0.94$. Since $G$ is discrete, we do not use the exponential function in (20), but compute the product.

As in Example 3, $F_{X\infty}$ is non-unique. We obtain $F^u_{X\infty}$ from $F^l_{X\infty}$ by moving the probability mass from the right endpoints to the left endpoints of the maximal intersections. The possible x-intervals of the maximal intersections are $(0,0.25]$, $(0,0.5]$, $(0.25,0.5]$, $(0.5,0.75]$ and $(0.75,\infty)$. Consider the interval $(0,0.25]$ and note that moving mass from $x = 0.25$ to $x = 0+$ does not change the value of $F_{X\infty}(x)$ for $x \in \{0, 0.25, 0.5, 0.75\}$. This also holds if we move mass in the other intervals, except for the interval $(0,0.5]$, where moving the mass from $x = 0.5$ to $x = 0+$ increases the value of $F_{X\infty}(x)$ at $x = 0.25$. Note that the mass $F^l_{X\infty}(\{0.5\})$ comes from maximal intersections with x-intervals $(0,0.5]$ and $(0.25,0.5]$. The proportion of mass coming from the latter is

$$\alpha = P(L = 0.25, R = 0.5 \mid R = 0.5) = \frac{G\{(0.25,0.5)\}\{F_X(0.5) - F_X(0.25)\}}{G\{(0.25,0.5)\}\{F_X(0.5) - F_X(0.25)\} + G\{(0.5,0.75)\}F_X(0.5)}.$$

Hence, we get $F^u_{X\infty}(0.25) = F^l_{X\infty}(0.25) + (1-\alpha)F^l_{X\infty}(\{0.5\}) \approx 0.56$ and $F^u_{X\infty}(x) = F^l_{X\infty}(x)$ for $x \in \{0, 0.5, 0.75\}$. To derive the bivariate limit $F^l_\infty$, we first find $V(dx,y)$ using equation (16) and then integrate $V(dx,y)/V_1(dx)$ against $F^l_{X\infty}(x)$ using equation

(23). This yields $F^l_\infty(0.25,y) = 0.6F(0.25,y)$, $F^l_\infty(0.5,y) = 0.3F(0.25,y) + 0.7F(0.5,y)$ and $F^l_\infty(0.75,y) \approx 0.90F(0.75,y) + 0.19F(0.5,y) - 0.084F(0.25,y)$. The upper bound $F^u_\infty(x,y)$ can be found by reassigning mass to the lower left corners of the maximal intersections. To do so, we compute

$$\alpha(y) = P(L = 0.25, R = 0.5 \mid R = 0.5, Y \le y) = \frac{G\{(0.25,0.5)\}\{F(0.5,y) - F(0.25,y)\}}{G\{(0.25,0.5)\}\{F(0.5,y) - F(0.25,y)\} + G\{(0.5,0.75)\}F(0.5,y)}.$$

We then get $F^u_\infty(0.25,y) = F^l_\infty(0.25,y) + \{1 - \alpha(y)\}\{F^l_\infty(0.5,y) - F^l_\infty(0.25,y)\}$, and the value of $F_\infty(x,y)$ is unchanged for $x \in \{0, 0.5, 0.75\}$. The discrete nature of the limit $F^l_\infty$ is visible in Figure 1. Figure 2 shows significant non-uniqueness in all estimators for x-values outside the support of $G$. However, $\tilde F_{Xn}(x)$ is unique for $x \in \{0.25, 0.5, 0.75\}$ and very close to $F_X(x)$. Finally, Figure 3 shows that $F_\infty(0.25,y)$ is non-unique, while the repaired MLE is unique and closely follows $F(0.25,y)$.

6. Discussion

In this article we studied the MLE for the bivariate distribution function of an interval censored survival time and a continuous mark variable. We derived the almost sure limit of the MLE, and showed that the MLE is inconsistent in general. We proposed a simple method to repair the inconsistency, and illustrated the behavior of the inconsistent and repaired MLE in four examples.

The MLE for the distribution function of bivariate censored data has been found to be inconsistent before, namely when $X$ and $Y$ are both right censored (van der Laan, 1996), and when $X$ is current status censored and $Y$ is uncensored (Maathuis, 2003, Section 6.2). In the latter model the inconsistency could be explained by representational non-uniqueness of the MLE. However, this is not the case for interval censored continuous mark data, where the MLE is typically inconsistent even if representational non-uniqueness plays no role in the limit. Rather, the inconsistency in this model is related to the fact that the functions $\Lambda_{1n}$ and $\Lambda_n$ that define the MLE in (8) and (9) do not converge to the true underlying cumulative hazard functions.

However, there is a similarity between these three bivariate censored data models with inconsistent MLEs. Namely, in each model the observed sets can take the form of line segments, and the likelihood contains corresponding partial density-type terms. Thus, observed line segments can be viewed as a warning sign for consistency problems, and whenever they occur, consistency of the MLE should be carefully studied.

These warning signs arise in the model for HIV vaccine data in Hudgens et al. (2005). This model is slightly different from ours, since it allows the mark variable to be missing for observations that are not right censored. As a result there is no explicit formula for the MLE, and hence it is more difficult to derive its almost sure limit. Consistency of the MLE in this model is currently still an open problem, but simulation results clearly point to inconsistency.

7. Acknowledgements

This research was supported by NSF grant DMS. We would like to thank Piet Groeneboom and Michael Hudgens for helpful discussions and comments.

References

Aalen, O. (1976). Nonparametric inference in connection with multiple decrement models. Scand. J. Statist. 3.

Aalen, O. (1978). Nonparametric estimation of partial transition probabilities in multiple decrement models. Ann. Statist. 6.

Gentleman, R. & Vandal, A. (2002). Nonparametric estimation of the bivariate cdf for arbitrarily censored data. Can. J. Statist. 30.

Gill, R. D. & Johansen, S. (1990). A survey of product-integration with a view toward application in survival analysis. Ann. Statist. 18.

Groeneboom, P. & Wellner, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhäuser Verlag, Basel.

Huang, Y. & Louis, T. A. (1998). Nonparametric estimation of the joint distribution of survival time and mark variables. Biometrika 85.

Hudgens, M. G., Maathuis, M. H. & Gilbert, P. B. (2005). Nonparametric estimation of the joint distribution of a survival time subject to interval censoring and a continuous mark variable. Submitted.

Hudgens, M. G., Satten, G. A. & Longini, I. M. (2001). Nonparametric maximum likelihood estimation for competing risks survival data subject to interval censoring and truncation. Biometrics 57.

Jewell, N. P. & Kalbfleisch, J. D. (2004). Maximum likelihood estimation of ordered multinomial parameters. Biostatistics 5.

Jewell, N. P., Van der Laan, M. J. & Henneman, T. (2003). Nonparametric estimation from current status data with competing risks. Biometrika 90.

Kalbfleisch, J. & Prentice, R. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York.

Maathuis, M. H. (2003). Nonparametric Maximum Likelihood Estimation For Bivariate Censored Data. Master's thesis, Delft University of Technology, The Netherlands.

Maathuis, M. H. (2005a). Nonparametric estimation for current status data with competing risks. Ph.D. thesis in preparation.

Maathuis, M. H. (2005b). Reduction algorithm for the MLE for the distribution function of bivariate interval censored data. J. Comp. Graph. Statist. 14.

Schick, A. & Yu, Q. (2000). Consistency of the GMLE with mixed case interval-censored data. Scand. J. Statist. 27.

Shorack, G. R. & Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.

Turnbull, B. (1976). The empirical distribution function with arbitrarily grouped, censored, and truncated data. J. R. Statist. Soc. B 38.

van der Laan, M. J. (1996). Efficient estimation in the bivariate censoring model and repairing NPMLE. Ann. Statist. 24.

Van der Vaart, A. & Wellner, J. A. (2000). Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli classes. In High Dimensional Probability II. Birkhäuser, Boston.

Wong, G. & Yu, Q. (1999). Generalized MLE of a joint distribution function with multivariate interval-censored data. Journal of Multivariate Analysis 69.

[Figure 1 panels, one row per example (Examples 1 to 4): $\hat F^l_n$, $F^l_\infty$ and $F$.]

Figure 1: Contour lines for the bivariate functions $\hat F^l_n$, $F^l_\infty$ and $F$. All functions were computed on an equidistant grid with mesh size 0.02, and n = 10,

[Figure 2 panels: $F_X$ plotted against $x$ for Examples 1 to 4.]

Figure 2: Dotted: $F_X$. Dashed: $F^l_{X\infty}$ and $F^u_{X\infty}$. Solid black: $\tilde F^l_{Xn}$ and $\tilde F^u_{Xn}$ using the equidistant grid with K = 20 shown in Figure 3. Solid grey: $\hat F^l_{Xn}$ and $\hat F^u_{Xn}$. In all cases n = 10,

[Figure 3 panels, plotted against $y$: $F(0.49,y)$ for Example 1, $F(0.5,y)$ for Example 2, $F(0.75,y)$ for Example 3 and $F(0.25,y)$ for Example 4.]

Figure 3: Dotted: $F(x_0,y)$. Dashed: $F^l_\infty(x_0,y)$ and $F^u_\infty(x_0,y)$. Circles: $\tilde F^l_n(x_0,y) = \tilde F^u_n(x_0,y)$ using an equidistant grid with K = 20. Solid grey: $\hat F^l_n(x_0,y)$ and $\hat F^u_n(x_0,y)$. In all cases n = 10,
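The discrete computations of Example 4 reduce to a few lines of arithmetic, which the sketch below reproduces. It is an illustration under our own assumptions, not the authors' code: we take the observed endpoint $U$ to be the right endpoint of the x-interval when $X$ is not right censored and the last observation time otherwise, and we compute the lower limit as a product-limit expression with jumps $V_1(\{u\})/\{1 - H(u-)\}$. This is our reading of equations (15) to (20) and (23), which are not reproduced here. The names `V`, `V0`, `mass` and `F_l_biv` are hypothetical.

```python
def F_X(x):
    """True marginal distribution function of X in Example 4."""
    return 2 * x - x * x

def F(x, y):
    """True joint distribution function of (X, Y) in Example 4."""
    return 2 * x * y - x * x if x <= y else y * y

# Discrete observation time distribution G of T = (T1, T2).
G = {(0.25, 0.5): 0.3, (0.25, 0.75): 0.3, (0.5, 0.75): 0.4}
POINTS = (0.25, 0.5, 0.75)

def V(u, y=1.0):
    """P(U = u, Y <= y, not right censored); y = 1 gives V_1({u})."""
    total = 0.0
    for (t1, t2), g in G.items():
        if u == t1:  # X <= T1, observed x-interval (0, T1]
            total += g * F(t1, y)
        if u == t2:  # T1 < X <= T2, observed x-interval (T1, T2]
            total += g * (F(t2, y) - F(t1, y))
    return total

def V0(u):
    """P(U = u, right censored), i.e. X > T2 with T2 = u."""
    return sum(g * (1 - F_X(t2)) for (t1, t2), g in G.items() if u == t2)

# Product-limit computation of the lower limit of the marginal MLE.
mass, F_l = {}, {}
surv, at_risk = 1.0, 1.0
for u in POINTS:
    jump = V(u) / at_risk        # hazard jump at u
    mass[u] = surv * jump        # probability mass placed at u
    surv *= 1 - jump
    at_risk -= V(u) + V0(u)
    F_l[u] = 1 - surv

# Upper bound at x = 0.25: move the mass at 0.5 that stems from (0, 0.5] to 0+.
d = F_X(0.5) - F_X(0.25)
alpha = 0.3 * d / (0.3 * d + 0.4 * F_X(0.5))
F_u_025 = F_l[0.25] + (1 - alpha) * mass[0.5]

def F_l_biv(x, y):
    """Lower limit of the bivariate MLE at a support point x."""
    return sum(V(u, y) / V(u) * mass[u] for u in POINTS if u <= x)

# Coefficients of F(0.75, y), F(0.5, y) and F(0.25, y) in F_l_biv(0.75, y).
r = mass[0.75] / V(0.75)
coef75, coef50, coef25 = 0.7 * r, 0.7 - 0.4 * r, 0.3 - 0.3 * r

if __name__ == "__main__":
    print(F_l, F_u_025)
    print(coef75, coef50, coef25)
```

Under these assumptions the marginal values at 0.25 and 0.5 are 0.2625 and 0.65625, and the upper bound at 0.25 is 0.5625, in line with the rounded values 0.26, 0.66 and 0.56 reported in Example 4.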


More information

Likelihood Construction, Inference for Parametric Survival Distributions

Likelihood Construction, Inference for Parametric Survival Distributions Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

Editorial Manager(tm) for Lifetime Data Analysis Manuscript Draft. Manuscript Number: Title: On An Exponential Bound for the Kaplan - Meier Estimator

Editorial Manager(tm) for Lifetime Data Analysis Manuscript Draft. Manuscript Number: Title: On An Exponential Bound for the Kaplan - Meier Estimator Editorial Managertm) for Lifetime Data Analysis Manuscript Draft Manuscript Number: Title: On An Exponential Bound for the Kaplan - Meier Estimator Article Type: SI-honor of N. Breslow invitation only)

More information

Harvard University. Harvard University Biostatistics Working Paper Series

Harvard University. Harvard University Biostatistics Working Paper Series Harvard University Harvard University Biostatistics Working Paper Series Year 2008 Paper 85 Semiparametric Maximum Likelihood Estimation in Normal Transformation Models for Bivariate Survival Data Yi Li

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information
