arxiv:math/ v2 [math.st] 17 Jun 2008

The Annals of Statistics 2008, Vol. 36, No. 3, 1031–1063. DOI: 10.1214/009053607000000974. © Institute of Mathematical Statistics, 2008. arXiv:math/0609020v2 [math.ST] 17 Jun 2008.

CURRENT STATUS DATA WITH COMPETING RISKS: CONSISTENCY AND RATES OF CONVERGENCE OF THE MLE

By Piet Groeneboom, Marloes H. Maathuis (1) and Jon A. Wellner (2)

Delft University of Technology and Vrije Universiteit Amsterdam, University of Washington and University of Washington

We study nonparametric estimation of the sub-distribution functions for current status data with competing risks. Our main interest is in the nonparametric maximum likelihood estimator (MLE), and for comparison we also consider a simpler "naive estimator." Both types of estimators were studied by Jewell, van der Laan and Henneman [Biometrika (2003) 90 183–197], but little was known about their large sample properties. We have started to fill this gap, by proving that the estimators are consistent and converge globally and locally at rate $n^{1/3}$. We also show that this local rate of convergence is optimal in a minimax sense. The proof of the local rate of convergence of the MLE uses new methods, and relies on a rate result for the sum of the MLEs of the sub-distribution functions which holds uniformly on a fixed neighborhood of a point. Our results are used in Groeneboom, Maathuis and Wellner [Ann. Statist. (2008) 36 1064–1089] to obtain the local limiting distributions of the estimators.

1. Introduction. We study current status data with competing risks. Such data arise naturally in cross-sectional studies with several failure causes. Moreover, generalizations of these data arise in HIV vaccine trials (see [5]). The general framework is as follows. We analyze a system that can fail from $K$ competing risks, where $K \in \mathbb{N}$ is fixed. The random variables of interest are $(X,Y)$, where $X \in \mathbb{R}$ is the failure time of the system, and $Y \in \{1,\dots,K\}$ is the corresponding failure cause. We cannot observe $(X,Y)$ directly.
Rather, we observe the "current status" of the system at a single random time $T \in \mathbb{R}$, where $T$ is independent of $(X,Y)$. This means that at time $T$, we observe whether or not failure occurred, and if and only if failure occurred, we also observe the failure cause $Y$. We want to estimate the bivariate distribution of $(X,Y)$. Since $Y \in \{1,\dots,K\}$, this is equivalent to estimating the sub-distribution functions $F_{0k}(s) = P(X \le s, Y = k)$, $k = 1,\dots,K$. Note that the sum of the sub-distribution functions $\sum_{k=1}^{K} F_{0k}(s) = P(X \le s)$ is the overall failure time distribution. This shows that the sub-distribution functions are related to each other and should be considered as a system.

Received September 2006; revised April 2007.
(1) Supported in part by NSF Grant DMS-02-03320.
(2) Supported in part by NSF Grants DMS-02-03320 and DMS-05-03822 and by NIAID Grant 2R01 AI291968-04.
AMS 2000 subject classifications. Primary 62N01, 62G20; secondary 62G05.
Key words and phrases. Survival analysis, current status data, competing risks, maximum likelihood, consistency, rate of convergence.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2008, Vol. 36, No. 3, 1031–1063. This reprint differs from the original in pagination and typographic detail.

We consider nonparametric estimation of the sub-distribution functions. This problem, or close variants thereof, has been studied by [5, 6, 7]. These papers introduced various nonparametric estimators, including the MLE (see [5, 7]) and a "naive estimator" (see [7]). They also provided algorithms to compute the estimators, and showed simulation studies that compared them. However, until now, little was known about the large sample properties of the estimators. We have started to fill this gap by developing the local asymptotic theory for the MLE and the naive estimator. We study the MLE because it is a natural estimator that often exhibits good behavior. The simpler naive estimator was suggested to be asymptotically efficient for the estimation of smooth functionals [7], and we therefore consider it for comparison. In the present paper we prove consistency and rates of convergence. These results are used in [3] to obtain the local limiting distributions.

The outline of this paper is as follows. In Section 2 we introduce the estimators. We discuss their definitions, give existence and uniqueness results, and provide various characterizations in terms of necessary and sufficient conditions. Such characterizations are important since there is no closed form available for the MLE.
In Section 3 we show that the estimators are globally and locally consistent. In Section 4 we prove that their global and local rates of convergence are $n^{1/3}$ (Theorems 4.1 and 4.17). We also prove that $n^{1/3}$ is an asymptotic local minimax lower bound for the rate of convergence (Proposition 4.4). Hence, the estimators converge locally at the optimal rate, in a minimax sense. The proof of the local rate of convergence of the MLE uses new methods. One of the main difficulties in this proof consists of handling the system of sub-distribution functions. We solve this problem by first deriving a rate result for the sum of the MLEs of the sub-distribution functions (Theorem 4.10). This rate result is stronger than usual, since it holds uniformly on a fixed neighborhood of a point, instead of on a shrinking neighborhood of order $n^{-1/3}$ (see Remark 4.11). Such a strong result is needed to handle potential sparsity of the jump points of the MLEs of the sub-distribution functions (see Remark 4.18). Technical proofs are collected in Section 5, and computational aspects of the estimators are discussed in the companion paper [3], Section 4.
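The sampling scheme described in this introduction can be made concrete with a small simulation. The following is only a minimal sketch under assumptions that are not taken from the paper: the failure time is exponential, the failure cause is drawn independently of the failure time, the inspection time is uniform, and the function name `simulate_current_status` and its parameters are hypothetical. Each observation is encoded as an inspection time together with an indicator vector of length $K+1$: entry $k \le K$ marks failure from cause $k$ before the inspection time, and the last entry marks that no failure has occurred yet.

```python
import random


def simulate_current_status(n, cause_probs, rate=1.0, t_max=3.0, seed=0):
    """Draw n observations (t, delta) of current status data with competing risks.

    Illustrative assumptions (not from the paper): X ~ Exponential(rate),
    Y drawn independently of X with probabilities cause_probs, and
    T ~ Uniform(0, t_max) independent of (X, Y).
    """
    rng = random.Random(seed)
    K = len(cause_probs)
    data = []
    for _ in range(n):
        x = rng.expovariate(rate)                                  # failure time X (never observed)
        y = rng.choices(range(1, K + 1), weights=cause_probs)[0]  # failure cause Y
        t = rng.uniform(0.0, t_max)                                # inspection time T
        delta = [0] * (K + 1)
        if x <= t:
            delta[y - 1] = 1   # failure occurred by time T: the cause is observed
        else:
            delta[K] = 1       # no failure yet: the cause remains unknown
        data.append((t, tuple(delta)))
    return data
```

Only the pairs `(t, delta)` are returned, mirroring the fact that $(X,Y)$ itself is never observed.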

2. The estimators. We make the following assumptions: (a) the observation time $T$ is independent of the variables of interest $(X,Y)$, and (b) the system cannot fail from two or more causes at the same time. Assumption (a) is essential for the development of the theory. Assumption (b) ensures that the failure cause is well defined. This assumption can always be satisfied by defining simultaneous failure from several causes as a new failure cause. We allow ties in the observation times.

We now introduce some notation. We denote the observed data by $(T,\Delta)$, where $T$ is the observation time and $\Delta = (\Delta_1,\dots,\Delta_{K+1})$ is an indicator vector defined by $\Delta_k = 1\{X \le T, Y = k\}$ for $k = 1,\dots,K$, and $\Delta_{K+1} = 1\{X > T\}$. The observed data are illustrated in Figure 1. Let $(T_i,\Delta^i)$, $i = 1,\dots,n$, be $n$ i.i.d. observations of $(T,\Delta)$, where $\Delta^i = (\Delta^i_1,\dots,\Delta^i_{K+1})$. Note that we use the superscript $i$ as the index of an observation, and not as a power. The order statistics of $T_1,\dots,T_n$ are denoted by $T_{(1)},\dots,T_{(n)}$. Furthermore, $G$ is the distribution of $T$, $G_n$ is the empirical distribution of $T_i$, $i = 1,\dots,n$, and $\mathbb{P}_n$ is the empirical distribution of $(T_i,\Delta^i)$, $i = 1,\dots,n$. For any vector $(x_1,\dots,x_K) \in \mathbb{R}^K$ we use the shorthand notation $x_+ = \sum_{k=1}^{K} x_k$, so that, for example, $\Delta_+ = \sum_{k=1}^{K} \Delta_k$ and $F_{0+}(s) = \sum_{k=1}^{K} F_{0k}(s)$. For any $K$-tuple $F = (F_1,\dots,F_K)$ of sub-distribution functions, we define $F_{K+1}(s) = \int_{u>s} dF_+(u) = F_+(\infty) - F_+(s)$. Finally, we use the following conventions for indicator functions and integrals:

Fig. 1. Graphical representation of the observed data $(T,\Delta)$ in an example with $K = 3$ competing risks. The black sets indicate the values of $(X,Y)$ that are consistent with $(T,\Delta)$, for each of the four possible values of $\Delta$.

Definition 2.1. Let $dA$ be a Lebesgue–Stieltjes measure. Then we define for $t < t_0$:
\[ 1_{[t_0,t)}(u) = -1_{[t,t_0)}(u) \quad\text{and}\quad \int_{[t_0,t)} f(u)\,dA(u) = -\int_{[t,t_0)} f(u)\,dA(u). \]

2.1. Definitions of the estimators. We first consider the MLE.
To understand its form, let $F = (F_1,\dots,F_K) \in \mathcal{F}_K$, where $\mathcal{F}_K$ is the collection of $K$-tuples $F = (F_1,\dots,F_K)$ of sub-distribution functions on $\mathbb{R}$ with $F_+ \le 1$. Under $F$ we have $\Delta \mid T \sim \mathrm{Mult}_{K+1}(1,(F_1(T),\dots,F_{K+1}(T)))$, so that the density of a single observation is given by

\[ p_F(t,\delta) = \prod_{k=1}^{K+1} F_k(t)^{\delta_k} = \prod_{k=1}^{K} F_k(t)^{\delta_k}\,(1 - F_+(t))^{1-\delta_+}, \tag{1} \]
with respect to the dominating measure $\mu = G \times \#$, where $\#$ is the counting measure on $\{e_k : k = 1,\dots,K+1\}$ and $e_k$ is the $k$th unit vector in $\mathbb{R}^{K+1}$. Hence, the log likelihood $l_n(F) = \int \log p_F(t,\delta)\,d\mathbb{P}_n(t,\delta)$ is given by
\[ l_n(F) = \int \Big\{ \sum_{k=1}^{K} \delta_k \log F_k(t) + (1-\delta_+)\log(1 - F_+(t)) \Big\}\,d\mathbb{P}_n(t,\delta). \tag{2} \]
It then follows that the MLE $\hat F_n = (\hat F_{n1},\dots,\hat F_{nK})$ is defined by
\[ l_n(\hat F_n) = \max_{F \in \mathcal{F}_K} l_n(F). \tag{3} \]
The naive estimator $\tilde F_n = (\tilde F_{n1},\dots,\tilde F_{nK})$ is defined by
\[ l_{nk}(\tilde F_{nk}) = \max_{F_k \in \mathcal{F}} l_{nk}(F_k), \qquad k = 1,\dots,K, \tag{4} \]
where $\mathcal{F}$ is the collection of all distribution functions on $\mathbb{R}$ and $l_{nk}(\cdot)$ is the marginal log likelihood for the reduced current status data $(T_i,\Delta^i_k)$, $i = 1,\dots,n$:
\[ l_{nk}(F_k) = \int \{\delta_k \log F_k(t) + (1-\delta_k)\log(1 - F_k(t))\}\,d\mathbb{P}_n(t,\delta), \qquad k = 1,\dots,K. \]
Thus, $\tilde F_{nk}$ uses only the $k$th entry of the $\Delta$-vector. We see that the naive estimator splits the estimation problem into $K$ well-known univariate current status problems. Therefore, its computation and asymptotic theory follow straightforwardly from known results on current status data. But this simplification comes at a cost. For example, it follows immediately that the constraint $\tilde F_{n+} \le 1$ may be violated (see [7]).

We note that both $\hat F_{n+}$ and $\tilde F_{n+}$ provide estimators for the overall failure time distribution $F_{0+}$. A third estimator for this distribution is given by the MLE for the reduced current status data $(T,\Delta_+)$, ignoring information on the failure causes. These three estimators are typically not the same (see [5]).

To compare the MLE and the naive estimator, we now define the naive estimator by a single optimization problem:
\[ \tilde l_n(\tilde F_n) = \max_{F \in \mathcal{F}^K} \tilde l_n(F), \qquad\text{where } \tilde l_n(F) = \sum_{k=1}^{K} l_{nk}(F_k), \]
and $\mathcal{F}^K$ is the $K$-fold product of $\mathcal{F}$. By comparing this to the optimization problem for the MLE, we note the following differences:

(a) The objective function $l_n(F)$ for the MLE contains the term $1 - F_+$, involving the sum of the sub-distribution functions, while the objective function $\tilde l_n(F)$ for the naive estimator only contains the individual components.
(b) The space $\mathcal{F}_K$ for the MLE contains the constraint $F_+ \le 1$, while the space $\mathcal{F}^K$ for the naive estimator only involves the individual components.

The more complicated objective function for the MLE forces us to work with the system of sub-distribution functions, and poses new challenges in the derivation of the local rate of convergence of the MLE. Moreover, it gives rise to a new self-induced limiting process for the local limiting distribution of the MLE (see [3]). The constraint $F_+ \le 1$ on the space over which we maximize is important for small sample sizes, but its effect vanishes asymptotically. These observations are supported by simulations in [3], Section 4.

2.2. Existence and uniqueness. Since only values of the sub-distribution functions at the observation times appear in the log likelihoods $l_{nk}(F_k)$ and $l_n(F)$, we limit ourselves to estimating these values. This means that the optimization problems (3) and (4) reduce to finite-dimensional optimization problems. Hence, their solutions exist by [19], Corollary 38.10. For the naive estimator, the values of the sub-distribution functions at all observation times enter in the log likelihood $l_{nk}(F_k)$. Together with strict concavity of $l_{nk}(F_k)$, this implies that $\tilde F_{nk}$ is unique at all observation times, for $k = 1,\dots,K$. For the MLE, $F_k(T_i)$ appears in the log likelihood $l_n(F)$ if and only if $\Delta^i_k + \Delta^i_{K+1} > 0$. This motivates the following definition and result:

Definition 2.2. For each $k = 1,\dots,K+1$, we define the set $\mathcal{T}_k$ by
\[ \mathcal{T}_k = \{T_i,\ i = 1,\dots,n : \Delta^i_k + \Delta^i_{K+1} > 0\} \cup \{T_{(n)}\}. \tag{5} \]

Proposition 2.3. For each $k = 1,\dots,K+1$, $\hat F_{nk}(t)$ is unique at $t \in \mathcal{T}_k$. Moreover, $\hat F_{nk}(\infty)$ is unique if and only if $\Delta^i_{K+1} = 0$ for all observations with $T_i = T_{(n)}$.
Proof. We first prove uniqueness of $\hat F_{nk}(t)$ at $t \in \mathcal{T}_k$, for $k = 1,\dots,K$. Let $k \in \{1,\dots,K\}$. Strict concavity of the log likelihood immediately gives uniqueness of $\hat F_{nk}$ at points $T_i$ with $\Delta^i_k = 1$. Note that the log likelihood is not strictly concave in $\hat F_{nk}(T_i)$ if $\Delta^i_{K+1} = 1$, so that we need to do more work to prove uniqueness at these points. First, one can show that $\hat F_{nk}$ can only assign mass to intervals of the following form:
(i) $(T_i,T_j]$ where $\Delta^i_{K+1} = 1$, $\Delta^j_k = 1$ and $\Delta^l_k = \Delta^l_{K+1} = 0$ for all $l$ such that $T_i < T_l < T_j$,

(ii) $(T_i,\infty)$ where $T_i = T_{(n)}$ and $\Delta^i_{K+1} = 1$
(see [5], Lemma 1, or use the concept of the height map of [9]). Note that $\hat F_{nk}$ is unique at the right endpoints of the intervals given in (i), since $\hat F_{nk}$ is unique at points $T_i$ with $\Delta^i_k = 1$. This implies that the probability mass in each interval given in (i) is unique. In turn, this implies that $\hat F_{nk}$ is unique at all points that are not in the interior of these intervals. In particular, this gives uniqueness of $\hat F_{nk}(t)$ at $t \in \mathcal{T}_k$. The uniqueness statement about $\hat F_{n,K+1}$ follows from the uniqueness of $\hat F_{n1},\dots,\hat F_{nK}$.

We now prove the statement about $\hat F_{nk}(\infty)$. First, if $\Delta^i_{K+1} = 0$ for all observations with $T_i = T_{(n)}$, then $\hat F_{nk}$ can only assign mass to the intervals given in (i). Hence, $\hat F_{nk}(\infty) = \hat F_{nk}(T_{(n)})$, and since $\hat F_{nk}(T_{(n)})$ was already proved to be unique, it follows that $\hat F_{nk}(\infty)$ is unique. Conversely, if there is a $T_i = T_{(n)}$ with $\Delta^i_{K+1} = 1$, then the log likelihood contains the term $\log(1 - F_+(T_{(n)}))$. Hence, $\hat F_{n+}$ must assign mass to the right of $T_{(n)}$ in order to get $l_n(\hat F_n) > -\infty$. The MLE is indifferent to the distribution of this mass over $\hat F_{n1},\dots,\hat F_{nK}$, since their separate contributions do not appear in the log likelihood. Hence, $\hat F_{nk}(\infty)$ is nonunique in this case.

2.3. Characterizations. Characterizations of the naive estimators $\tilde F_{n1},\dots,\tilde F_{nK}$ follow from [4], Propositions 1.1 and 1.2, pages 39–41. Characterizations of the MLE can be derived from Karush–Kuhn–Tucker conditions, since the optimization problem can be reduced to a finite-dimensional optimization problem (see the first paragraph of Section 2.2). However, we give characterizations with direct proofs. These methods do not use the discrete nature of the problem, so that they can also be used for truly infinite-dimensional optimization problems.

Definition 2.4. We define the processes $V_{nk}$ by
\[ V_{nk}(t) = \int_{u \le t} \delta_k\,d\mathbb{P}_n(u,\delta), \qquad t \in \mathbb{R},\ k = 1,\dots,K+1. \tag{6} \]
Moreover, let $\bar{\mathcal{F}}_K$ be the collection of $K$-tuples of bounded nonnegative nondecreasing right-continuous functions.

Using this notation, we can write $l_n(F) = \sum_{k=1}^{K+1} \int \log F_k(u)\,dV_{nk}(u)$. In Lemma 2.5 we translate the optimization problem (3) into an optimization problem over a cone, by removing the constraint $F_+ \le 1$. Subsequently, we give a basic characterization in Proposition 2.6. This characterization leads to various corollaries, of which Corollary 2.10 is most important for the sequel.
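Writing the log likelihood as a sum of integrals with respect to the processes $V_{nk}$ says that each observation contributes $\log F_k(T_i)$ for the single index $k \in \{1,\dots,K+1\}$ with $\Delta^i_k = 1$, averaged over the sample. The following is a minimal sketch of that evaluation, assuming a candidate $F$ with $F_+(\infty) = 1$ so that $F_{K+1}(t) = 1 - F_+(t)$; the function name `log_likelihood` and the callable interface for the sub-distribution functions are hypothetical choices made for this illustration.

```python
import math


def log_likelihood(data, F):
    """Evaluate l_n(F) = sum_k int log F_k dV_nk for a candidate F.

    data: list of (t, delta), with delta a 0/1 tuple of length K + 1
          containing exactly one 1.
    F:    list of K callables; F[k](t) is the value of the (k+1)-th
          sub-distribution function at t (hypothetical interface).
    Assumes F_+(infinity) = 1, so that F_{K+1}(t) = 1 - F_+(t).
    """
    K = len(F)
    total = 0.0
    for t, delta in data:
        values = [F[k](t) for k in range(K)]
        values.append(1.0 - sum(values))       # F_{K+1}(t) = 1 - F_+(t)
        for d, v in zip(delta, values):
            if d:                              # only the observed cell contributes
                total += math.log(v) if v > 0 else -math.inf
    return total / len(data)                   # dV_nk puts mass 1/n on each observation
```

A candidate that assigns zero probability to an observed cell gets log likelihood $-\infty$, which is why such candidates can never be maximizers.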

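Lemma 2.5. $\hat F_n$ maximizes $l_n(F)$ over $\mathcal{F}_K$ if and only if $\hat F_n$ maximizes $\bar l_n(F)$ over $\bar{\mathcal{F}}_K$, where
\[ \bar l_n(F) = \sum_{k=1}^{K+1} \int \log F_k(u)\,dV_{nk}(u) - F_+(\infty). \]

Proof. (Necessity.) Let $\hat F_n$ maximize $l_n(F)$ over $\mathcal{F}_K$, and let $F \in \bar{\mathcal{F}}_K$. We want to show that $\bar l_n(\hat F_n) \ge \bar l_n(F)$. Note that this inequality holds trivially if $F_+(\infty) = 0$. Hence, we assume $F_+(\infty) = c > 0$. Then $F/c \in \mathcal{F}_K$, and $l_n(\hat F_n) \ge l_n(F/c)$, by the assumption that $\hat F_n$ maximizes $l_n(F)$ over $\mathcal{F}_K$. Together with $\hat F_{n+}(\infty) = 1$ this yields
\[ \bar l_n(\hat F_n) = l_n(\hat F_n) - 1 \ge l_n(F/c) - 1 = \sum_{k=1}^{K+1} \int \log F_k(u)\,dV_{nk}(u) - \log c - 1 = \bar l_n(F) + c - \log c - 1 \ge \bar l_n(F). \]
The last inequality follows since $x - \log x - 1 \ge 0$ for $x > 0$.

(Sufficiency.) Let $\hat F_n$ maximize $\bar l_n(F)$ over $\bar{\mathcal{F}}_K$, and let $\hat F_{n+}(\infty) = c$. As before, we may assume $c > 0$. Then $\bar l_n(\hat F_n) \ge \bar l_n(\hat F_n/c)$, and by the same reasoning as above this gives
\[ \bar l_n(\hat F_n) \ge \bar l_n(\hat F_n/c) = l_n(\hat F_n/c) - 1 = \bar l_n(\hat F_n) + c - \log c - 1. \]
Since $x - \log x - 1 \le 0$ if and only if $x = 1$, this yields $c = 1$. Hence, $\hat F_n \in \mathcal{F}_K$, and $\hat F_n$ maximizes $l_n(F)$ over $\mathcal{F}_K \subset \bar{\mathcal{F}}_K$.

We now obtain the following basic characterization of the MLE.

Proposition 2.6. $\hat F_n$ maximizes $l_n(F)$ over $\mathcal{F}_K$ if and only if $\hat F_n \in \bar{\mathcal{F}}_K$ and the following two conditions hold for all $k = 1,\dots,K$:
\[ \int_{u \ge t} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} + \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} \le 1, \qquad t \in \mathbb{R}, \tag{7} \]
\[ \int \Big\{ \int_{u \ge t} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} + \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} - 1 \Big\}\,d\hat F_{nk}(t) = 0. \tag{8} \]

Proof. (Necessity.) Let $\hat F_n$ maximize $l_n(F)$ over $\mathcal{F}_K$. Then $\hat F_n$ also maximizes $\bar l_n(F)$ over $\bar{\mathcal{F}}_K$, by Lemma 2.5. Fix $k \in \{1,\dots,K\}$, and define the perturbation $\hat F_n^{(h)} = (\hat F_{n1}^{(h)},\dots,\hat F_{nK}^{(h)})$ by $\hat F_{nk}^{(h)} = (1+h)\hat F_{nk}$ and $\hat F_{nj}^{(h)} = \hat F_{nj}$ for $j \ne k$. Since $\hat F_n^{(h)} \in \bar{\mathcal{F}}_K$ for $|h| < 1$, we get
\[ 0 = \lim_{h \to 0} h^{-1}\{\bar l_n(\hat F_n^{(h)}) - \bar l_n(\hat F_n)\} \]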

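\[ = \int dV_{nk}(u) + \int \frac{\hat F_{nk}(\infty) - \hat F_{nk}(u)}{\hat F_{n,K+1}(u)}\,dV_{n,K+1}(u) - \hat F_{nk}(\infty) \]
\[ = \int \Big\{ \int_{u \ge t} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} + \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} - 1 \Big\}\,d\hat F_{nk}(t), \]
using Fubini's theorem to obtain the last line. This gives condition (8). Next, let $t \in \mathbb{R}$, and define the perturbation $\hat F_n^{(h,t)} = (\hat F_{n1}^{(h,t)},\dots,\hat F_{nK}^{(h,t)})$ by $\hat F_{nk}^{(h,t)}(u) = \hat F_{nk}(u) + h\,1_{[t,\infty)}(u)$ and $\hat F_{nj}^{(h,t)} = \hat F_{nj}$ for $j \ne k$. Since $\hat F_n^{(h,t)} \in \bar{\mathcal{F}}_K$ for $h \ge 0$, we get
\[ 0 \ge \lim_{h \downarrow 0} h^{-1}\{\bar l_n(\hat F_n^{(h,t)}) - \bar l_n(\hat F_n)\} = \int_{u \ge t} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} + \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} - 1, \]
which is condition (7).

(Sufficiency.) Let $\hat F_n \in \bar{\mathcal{F}}_K$ satisfy conditions (7) and (8), and let $F \in \bar{\mathcal{F}}_K$. We want to show that $\bar l_n(\hat F_n) \ge \bar l_n(F)$. Concavity of the logarithm yields
\[ \bar l_n(F) - \bar l_n(\hat F_n) \le \sum_{k=1}^{K+1} \int \frac{F_k(u) - \hat F_{nk}(u)}{\hat F_{nk}(u)}\,dV_{nk}(u) - F_+(\infty) + \hat F_{n+}(\infty). \]
We now show that the right-hand side of this display is nonpositive. By Fubini, we have
\[ \sum_{k=1}^{K} \int \frac{F_k(u) - \hat F_{nk}(u)}{\hat F_{nk}(u)}\,dV_{nk}(u) = \sum_{k=1}^{K} \int \int_{t \le u} d(F_k - \hat F_{nk})(t)\,\frac{dV_{nk}(u)}{\hat F_{nk}(u)} = \sum_{k=1}^{K} \int \int_{u \ge t} \frac{dV_{nk}(u)}{\hat F_{nk}(u)}\,d(F_k - \hat F_{nk})(t) \]
and
\[ \int \frac{F_{K+1}(u) - \hat F_{n,K+1}(u)}{\hat F_{n,K+1}(u)}\,dV_{n,K+1}(u) = \int \int_{t > u} d(F_+ - \hat F_{n+})(t)\,\frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} = \sum_{k=1}^{K} \int \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)}\,d(F_k - \hat F_{nk})(t). \]
Combining the last three displays gives
\[ \bar l_n(F) - \bar l_n(\hat F_n) \le \sum_{k=1}^{K} \int \Big\{ \int_{u \ge t} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} + \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} - 1 \Big\}\,d(F_k - \hat F_{nk})(t) \]
\[ = \sum_{k=1}^{K} \int \Big\{ \int_{u \ge t} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} + \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} - 1 \Big\}\,dF_k(t) \le 0, \]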

where the equality follows from (8), and the final inequality follows from (7). Hence $\hat F_n$ maximizes $\bar l_n(F)$ over $\bar{\mathcal{F}}_K$, and by Lemma 2.5 this implies that $\hat F_n$ maximizes $l_n(F)$ over $\mathcal{F}_K$.

Definition 2.7. We say that $t$ is a point of increase of a right-continuous function $F$ if $F(t) > F(t - \varepsilon)$ for every $\varepsilon > 0$ (note that this definition is slightly different from the usual definition). Moreover, for $F \in \bar{\mathcal{F}}_K$, we define
\[ \beta_{nF} = 1 - \int \frac{dV_{n,K+1}(u)}{F_{K+1}(u)}. \tag{9} \]
Note that $\beta_{n\hat F_n}$ is uniquely defined, since $\hat F_{n,K+1}(t)$ is unique at points $t$ where $dV_{n,K+1}$ has mass (Proposition 2.3). We now rewrite the characterization in Proposition 2.6 in terms of $\beta_{n\hat F_n}$:

Corollary 2.8. $\hat F_n$ maximizes $l_n(F)$ over $\mathcal{F}_K$ if and only if $\hat F_n \in \bar{\mathcal{F}}_K$ and the following holds for all $k = 1,\dots,K$:
\[ \int_{u \ge t} \Big\{ \frac{dV_{nk}(u)}{\hat F_{nk}(u)} - \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} \Big\} \le \beta_{n\hat F_n}, \qquad t \in \mathbb{R}, \tag{10} \]
where equality holds if $t$ is a point of increase of $\hat F_{nk}$.

Proof. Since the integrand of (8) is a left-continuous function of $t$, conditions (7) and (8) of Proposition 2.6 are equivalent to the condition that for all $k = 1,\dots,K$,
\[ \int_{u \ge t} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} + \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} \le 1, \qquad t \in \mathbb{R}, \]
where equality must hold if $t$ is a point of increase of $\hat F_{nk}$. Combining this with
\[ \int_{u < t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} = 1 - \beta_{n\hat F_n} - \int_{u \ge t} \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)}, \qquad t \in \mathbb{R}, \]
completes the proof.

We determine the sign of $\beta_{n\hat F_n}$ in Corollary 2.9:

Corollary 2.9. Let $\hat F_n$ maximize $l_n(F)$ over $\mathcal{F}_K$. Then $\beta_{n\hat F_n} \ge 0$, and $\beta_{n\hat F_n} = 0$ if and only if there is an observation with $T_i = T_{(n)}$ and $\Delta^i_{K+1} = 1$.

Proof. Taking $t > T_{(n)}$ in Corollary 2.8 implies that $\beta_{n\hat F_n} \ge 0$. Now suppose that there is a $T_i = T_{(n)}$ with $\Delta^i_{K+1} = 1$. Then we must have $\hat F_{n+}(T_{(n)}) < 1$ to obtain $l_n(\hat F_n) > -\infty$. Hence, there must be a $k \in \{1,\dots,K\}$ such that $\hat F_{nk}$ has points of increase $t > T_{(n)}$. Corollary 2.8 then implies that $\beta_{n\hat F_n} = 0$. Next, suppose that there does not exist a $T_i = T_{(n)}$ with $\Delta^i_{K+1} = 1$. Then
\[ \int_{u \ge T_{(n)}} \Big\{ \frac{dV_{nk}(u)}{\hat F_{nk}(u)} - \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} \Big\} = \int_{u \ge T_{(n)}} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} > 0, \]
and by Corollary 2.8 this implies $\beta_{n\hat F_n} > 0$.

We now make a first step toward localizing the characterization, in Corollary 2.10. This corollary forms the basis of Proposition 4.8, which is used in the proofs of the local rate of convergence and the limiting distribution of the MLE.

Corollary 2.10. $\hat F_n$ maximizes $l_n(F)$ over $\mathcal{F}_K$ if and only if $\hat F_n \in \bar{\mathcal{F}}_K$ and the following holds for all $k = 1,\dots,K$ and each point of increase $\tau_{nk}$ of $\hat F_{nk}$:
\[ \int_{[\tau_{nk},s)} \Big\{ \frac{dV_{nk}(u)}{\hat F_{nk}(u)} - \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} \Big\} \ge \beta_{n\hat F_n}\,1_{[\tau_{nk},s)}(T_{(n)}), \qquad s \in \mathbb{R}, \tag{11} \]
where equality holds if $s$ is a point of increase of $\hat F_{nk}$, and if $s > T_{(n)}$.

Proof. Let $\hat F_n$ maximize $l_n(\cdot)$ over $\mathcal{F}_K$. Let $s > \tau_{nk}$. If $\tau_{nk} < s \le T_{(n)}$, then (11) follows by applying (10) to $t = \tau_{nk}$ and $t = s$, and subtracting the resulting equations. If $\tau_{nk} \le T_{(n)} < s$, then
\[ \int_{[\tau_{nk},s)} \Big\{ \frac{dV_{nk}(u)}{\hat F_{nk}(u)} - \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} \Big\} = \int_{u \ge \tau_{nk}} \Big\{ \frac{dV_{nk}(u)}{\hat F_{nk}(u)} - \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} \Big\}, \]
so that the statement follows by applying (10) to $t = \tau_{nk}$. If $T_{(n)} < \tau_{nk} < s$, then the left-hand side of (10) equals zero for $t = \tau_{nk}$ and $t = s$. The inequalities for $s < \tau_{nk}$ can be derived analogously. Finally, the inequality (11) and the corresponding equality condition imply (10).

3. Consistency. Hellinger and $L_r(G)$ ($r \ge 1$) consistency of the naive estimator follow from [13, 18]. Local consistency of the naive estimator follows from [4, 13]. In this section we prove similar results for the MLE.
First, note that for two vectors of functions $F = (F_1,\dots,F_K)$ and $F_0 = (F_{01},\dots,F_{0K})$

in $\mathcal{F}_K$, the Hellinger distance $h(p_F,p_{F_0})$ and the total variation distance $d_{TV}(p_F,p_{F_0})$ in our model are given by
\[ h^2(p_F,p_{F_0}) = \frac12 \int (\sqrt{p_F} - \sqrt{p_{F_0}})^2\,d\mu = \frac12 \sum_{k=1}^{K+1} \int (\sqrt{F_k} - \sqrt{F_{0k}})^2\,dG, \tag{12} \]
\[ d_{TV}(p_F,p_{F_0}) = \frac12 \sum_{k=1}^{K+1} \int |F_k - F_{0k}|\,dG, \tag{13} \]
where $\mu = G \times \#$, and $p_F$ and $\#$ are defined in (1). The MLE is Hellinger consistent:

Theorem 3.1. $h(p_{\hat F_n},p_{F_0}) \to_{a.s.} 0$.

Proof. Since $\mathcal{P} = \{p_F : F \in \mathcal{F}_K\}$ is convex, we can use the following inequality:
\[ h^2(p_{\hat F_n},p_{F_0}) \le (\mathbb{P}_n - P)\,\varphi(p_{\hat F_n}/p_{F_0}), \]
where $\varphi(t) = (t-1)/(t+1)$ ([18], Proposition 3; see also [11] and [14, 15]). Hence, it is sufficient to prove that $\{\varphi(p_F/p_{F_0}) : F \in \mathcal{F}_K\}$ is a $P$-Glivenko–Cantelli class. This can be shown by Glivenko–Cantelli preservation theorems of [18], using indicators of VC-classes of sets and monotone functions as building blocks. Alternatively, the result follows directly from [18], Theorem 9 by viewing the problem as a bivariate censored data problem for $(X,Y)$.

$L_r(G)$ consistency is given in Corollary 3.2, where the $L_r(G)$ distance is defined by
\[ \|F - F_0\|_{G,r}^r = \sum_{k=1}^{K+1} \int |F_k(t) - F_{0k}(t)|^r\,dG(t), \qquad r \ge 1. \tag{14} \]

Corollary 3.2. $\|\hat F_n - F_0\|_{G,r} \to_{a.s.} 0$ for $r \ge 1$.

Proof. Note that $\|F - F_0\|_{G,1} = 2\,d_{TV}(p_F,p_{F_0})$. Hence, the statement for $r = 1$ follows from the well-known inequality $d_{TV}(p_{F_1},p_{F_2}) \le \sqrt{2}\,h(p_{F_1},p_{F_2})$. The result for $r > 1$ follows from $|a - b|^r \le |a - b|$ for $a,b \in [0,1]$ and $r > 1$.

Note that Theorem 3.1 and Corollary 3.2 hold without any additional assumptions. The quantities in these statements are integrated with respect

to $G$, showing the importance of the observation time distribution. For example, the results do not imply consistency at intervals where $G$ has zero mass. Such issues should be taken into account if $G$ can be chosen by design.

Under some additional assumptions, Maathuis ([10], Section 4.2) proved several forms of local and uniform consistency using methods from [13], Section 3. One such result is needed in the proof of the local rate of convergence of the MLE, and is given below:

Proposition 3.3. Let $F_{01},\dots,F_{0K}$ be continuous at $t_0$, and let $G$ be continuously differentiable at $t_0$ with strictly positive derivative $g(t_0)$. Then there exists an $r > 0$ such that
\[ \sup_{t \in [t_0 - r, t_0 + r]} |\hat F_{nk}(t) - F_{0k}(t)| \to_{a.s.} 0, \qquad k = 1,\dots,K. \]

Proof. Let $k \in \{1,\dots,K\}$ and choose the constant $r > 0$ such that $F_{0k}$ is continuous on $[t_0 - 2r, t_0 + 2r]$ and $g(t) > g(t_0)/2$ for $t \in [t_0 - 2r, t_0 + 2r]$. Fix an $\omega$ for which the $L_1(G)$ consistency holds, and suppose there is an $x_0 \in [t_0 - r, t_0 + r]$ for which $\hat F_{nk}(x_0,\omega)$ does not converge to $F_{0k}(x_0)$. Then there is an $\varepsilon > 0$ such that for all $n_1 > 0$ there is an $n > n_1$ such that $|\hat F_{nk}(x_0,\omega) - F_{0k}(x_0)| > \varepsilon$. Using the monotonicity of $\hat F_{nk}$ and the continuity of $F_{0k}$, this implies there is a $\gamma > 0$ such that $|\hat F_{nk}(t,\omega) - F_{0k}(t)| > \varepsilon/2$ either for all $t \in (x_0 - \gamma, x_0]$ or for all $t \in [x_0, x_0 + \gamma)$, where $[x_0 - \gamma, x_0 + \gamma] \subset [t_0 - 2r, t_0 + 2r]$. This yields that $\int |\hat F_{nk}(t,\omega) - F_{0k}(t)|\,dG(t) > \gamma\varepsilon g(t_0)/4$, which contradicts $L_1(G)$ consistency. Uniform consistency follows since $F_{0k}$ is continuous.

4. Rate of convergence. The Hellinger rate of convergence of the naive estimator is $n^{1/3}$. This follows from [15] or [17], Theorem 3.4.4, page 327. Under certain regularity conditions, the local rate of convergence of the naive estimator is also $n^{1/3}$; see [4], Lemma 5.4, page 95. This local rate result implies that the distance between two successive jump points of $\tilde F_{nk}$ around a point $t_0$ is of order $O_p(n^{-1/3})$.
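The jump points of the naive estimator mentioned above can be inspected numerically. The following is a minimal sketch, assuming the standard characterization of the univariate current status MLE as the isotonic regression of the indicators on the ordered observation times (a known result that is not restated in this paper); it is computed here with the pool-adjacent-violators algorithm, with tied observation times grouped and weighted by their multiplicity. The function names `pava` and `naive_estimator` and the `(t, delta)` data layout are hypothetical.

```python
def pava(values, weights):
    """Weighted nondecreasing least-squares fit via pool-adjacent-violators."""
    blocks = []  # each block holds [mean, total weight, number of pooled points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool neighbouring blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit


def naive_estimator(data, k):
    """Naive estimator of the k-th sub-distribution function (k is 1-based).

    data: list of (t, delta) with delta a 0/1 tuple of length K + 1.
    Returns the distinct observation times and the fitted values there.
    """
    groups = {}
    for t, delta in data:
        s, c = groups.get(t, (0, 0))
        groups[t] = (s + delta[k - 1], c + 1)   # pool tied observation times
    times = sorted(groups)
    means = [groups[t][0] / groups[t][1] for t in times]
    counts = [groups[t][1] for t in times]
    return times, pava(means, counts)
```

The distinct values in the returned fit mark the jump points of the estimator, so the spacing between successive jumps near a point $t_0$ can be read off directly from the output.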
In this section we discuss similar results for the MLE. In Section 4.1 we show that the global rate of convergence is $n^{1/3}$. In Section 4.2 we prove that $n^{1/3}$ is an asymptotic local minimax lower bound for the rate of convergence, meaning that no estimator can converge locally at a rate faster than $n^{1/3}$, in a minimax sense. Hence, the naive estimator converges locally at the optimal rate. Since the MLE is expected to be at least as good as the naive estimator, one may expect that the MLE also converges locally at the optimal rate of $n^{1/3}$. This is indeed the case, and this is proved in Section 4.3 (Theorem 4.17). Our main tool for proving this result is Theorem 4.10, which gives a uniform rate of convergence of $\hat F_{n+}$ on a fixed neighborhood of a point, rather than on the usual shrinking neighborhood of order $n^{-1/3}$. Such a strong rate

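result is needed to handle potential sparsity of the jump points of the MLEs of the sub-distribution functions (see Remark 4.18). Some technical proofs are deferred to Section 5.

4.1. Global rate of convergence.

Theorem 4.1. $n^{1/3} h(p_{\hat F_n}, p_{F_0}) = O_p(1)$.

Proof. We use the rate theorem of Van der Vaart and Wellner ([17], Theorem 3.4.1, page 322) with

$$m_{p_F}(t, \delta) = \log\left( \frac{p_F(t, \delta) + p_{F_0}(t, \delta)}{2 p_{F_0}(t, \delta)} \right),$$

$M_n(F) = \mathbb{P}_n m_{p_F}$, $M(F) = P m_{p_F}$ and $\mathbb{G}_n m_{p_F} = \sqrt{n}(M_n - M)(F)$. The key condition to verify is $E\|\mathbb{G}_n\|_{\mathcal{M}_\gamma} \lesssim \phi_n(\gamma)$, where $\mathcal{M}_\gamma = \{m_{p_F} - m_{p_{F_0}} : h(p_F, p_{F_0}) < \gamma\}$ and $\phi_n(\gamma)/\gamma^\alpha$ is a decreasing function in $\gamma$ for some $\alpha < 2$. For this purpose we use Theorem 3.4.4 of [17], which states that the functions $m_{p_F}$ fit the setup of Theorem 3.4.1 of [17], and that

$$E\|\mathbb{G}_n\|_{\mathcal{M}_\gamma} \lesssim J_{[\,]}(\gamma, \mathcal{P}, h)\{1 + J_{[\,]}(\gamma, \mathcal{P}, h)\gamma^{-2} n^{-1/2}\}, \tag{15}$$

where

$$J_{[\,]}(\gamma, \mathcal{P}, h) = \int_0^\gamma \sqrt{1 + \log N_{[\,]}(\varepsilon, \mathcal{P}, h)} \, d\varepsilon$$

and $\log N_{[\,]}(\varepsilon, \mathcal{P}, h)$ is the $\varepsilon$-entropy with bracketing for $\mathcal{P} = \{p_F : F \in \mathcal{F}_K\}$ with respect to Hellinger distance $h$.

We first bound the bracketing number $N_{[\,]}(\varepsilon, \mathcal{P}, h)$. Let $F = (F_1, \ldots, F_K) \in \mathcal{F}_K$. For each $k = 1, \ldots, K + 1$, let $[l_k, u_k]$ be a bracket containing $F_k$, with size $\int (\sqrt{u_k} - \sqrt{l_k})^2 \, dG \le \varepsilon^2/(K + 1)$. Then

$$[p_l(t, \delta), p_u(t, \delta)] = \left[ \prod_{k=1}^{K+1} l_k(t)^{\delta_k}, \; \prod_{k=1}^{K+1} u_k(t)^{\delta_k} \right]$$

is a bracket containing $p_F$, and its Hellinger size is bounded by $\varepsilon$. Note that all $F_k$, $k = 1, \ldots, K + 1$, are contained in the class $\mathcal{F} = \{F : \mathbb{R} \to [0,1] \text{ is monotone}\}$, and it is well known that $\log N_{[\,]}(\delta, \mathcal{F}, L_2(Q)) \lesssim 1/\delta$, uniformly in $Q$. Hence, considering all possible combinations of $(K + 1)$-tuples of the brackets $[l_k, u_k]$, it follows that

$$\log N_{[\,]}(\varepsilon, \mathcal{P}, h) \le \log\bigl( \{N_{[\,]}(\varepsilon/\sqrt{K + 1}, \mathcal{F}, L_2(G))\}^{K+1} \bigr) = (K + 1) \log N_{[\,]}(\varepsilon/\sqrt{K + 1}, \mathcal{F}, L_2(G)) \lesssim (K + 1)^{3/2} \varepsilon^{-1}.$$

Dropping the dependence on $K$ (since $K$ is fixed), this implies that $J_{[\,]}(\gamma, \mathcal{P}, h) \lesssim \gamma^{1/2}$, and together with (15) we obtain

$$E\|\mathbb{G}_n\|_{\mathcal{M}_\gamma} \lesssim \sqrt{\gamma} + (\gamma \sqrt{n})^{-1}.$$

Since $\gamma \mapsto (\sqrt{\gamma} + (\gamma\sqrt{n})^{-1})/\gamma$ is decreasing in $\gamma$, it is a valid choice for $\phi_n(\gamma)$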

in Theorem 3.4.1 of [17]. We then obtain that $r_n h(p_{\hat F_n}, p_{F_0}) = O_p(1)$ provided that $h(p_{\hat F_n}, p_{F_0}) \to 0$ in outer probability, and $r_n^2 \phi_n(r_n^{-1}) \le \sqrt{n}$ for all $n$. The first condition is fulfilled by the almost sure Hellinger consistency of the MLE (Theorem 3.1). The second condition holds for $r_n = c n^{1/3}$ and $c = ((\sqrt{5} - 1)/2)^{2/3}$. Indeed, with $\phi_n(\gamma) = \sqrt{\gamma} + (\gamma\sqrt{n})^{-1}$ we have $r_n^2 \phi_n(r_n^{-1}) = r_n^{3/2} + r_n^3 n^{-1/2} = (c^{3/2} + c^3)\sqrt{n} = \sqrt{n}$, because $x = c^{3/2} = (\sqrt{5} - 1)/2$ solves $x + x^2 = 1$.

We obtain the following corollary about the $L_1(G)$ and $L_2(G)$ rates of convergence:

Corollary 4.2. $n^{1/3} \|\hat F_n - F_0\|_{G,r} = O_p(1)$ for $r = 1, 2$.

Proof. The result for $r = 1$ again follows from $d_{TV}(p_{F_1}, p_{F_2}) \le \sqrt{2}\, h(p_{F_1}, p_{F_2})$. The result for $r = 2$ follows from

$$\|F - F_0\|_{G,2}^2 = \sum_{k=1}^{K+1} \int \{\sqrt{F_k} - \sqrt{F_{0k}}\}^2 \{\sqrt{F_k} + \sqrt{F_{0k}}\}^2 \, dG \le 8 h^2(p_F, p_{F_0}),$$

using $\sqrt{F_k} + \sqrt{F_{0k}} \le 2$.

4.2. Asymptotic local minimax lower bound. In this section we prove that $n^{1/3}$ is an asymptotic local minimax lower bound for the rate of convergence. We use the set-up of [1], Section 4.1. Let $\mathcal{P}$ be a set of probability densities on a measurable space $(\Omega, \mathcal{A})$ with respect to a $\sigma$-finite dominating measure. We estimate a parameter $\theta = Up \in \mathbb{R}$, where $U$ is a real-valued functional and $p \in \mathcal{P}$. Let $U_n$, $n \ge 1$, be a sequence of estimators based on a sample of size $n$, that is, $U_n = t_n(Z_1, \ldots, Z_n)$, where $Z_1, \ldots, Z_n$ is a sample from the density $p$, and $t_n : \Omega^n \to \mathbb{R}$ is a Borel measurable function. Let $l : [0, \infty) \to [0, \infty)$ be an increasing convex loss function with $l(0) = 0$. The risk of the estimator $U_n$ in estimating $Up$ is defined by $E_{n,p}\, l(|U_n - Up|)$, where $E_{n,p}$ denotes the expectation with respect to the product measure $P^{\otimes n}$ corresponding to the sample $Z_1, \ldots, Z_n$. We now recall Lemma 4.1 of [1].

Lemma 4.3. For any $p_1, p_2 \in \mathcal{P}$ such that the Hellinger distance $h(p_1, p_2) < 1$:

$$\inf_{U_n} \max\{E_{n,p_1} l(|U_n - Up_1|), \; E_{n,p_2} l(|U_n - Up_2|)\} \ge l\left( \tfrac{1}{4} |Up_1 - Up_2| \, (1 - h^2(p_1, p_2))^{2n} \right).$$

Let $k \in \{1, \ldots, K\}$ and let $U_{nk}$, $n \ge 1$, be a sequence of estimators of $F_{0k}(t_0)$.
Furthermore, let $c > 0$ and let $F_n^k = (F_{n1}, \ldots, F_{nK})$ be a perturbation of $F_0$ where only the $k$th component is changed in the following way:

$$F_{nk}(x) = \begin{cases} F_{0k}(t_0 - c n^{-1/3}), & \text{if } x \in [t_0 - c n^{-1/3}, t_0), \\ F_{0k}(t_0 + c n^{-1/3}), & \text{if } x \in [t_0, t_0 + c n^{-1/3}), \\ F_{0k}(x), & \text{otherwise,} \end{cases}$$

and $F_{nj}(x) = F_{0j}(x)$ for $j \ne k$. Note that $F_n^k \in \mathcal{F}_K$ is a valid set of sub-distribution functions with overall survival function $F_{n,K+1} = 1 - F_{n+}$. We now apply Lemma 4.3 with $l(x) = x^r$, $p_1 = p_{F_0}$ and $p_2 = p_{F_n^k}$, where $p_F$ is defined in (1). This gives a local minimax lower bound for the rate of convergence. A detailed derivation of this result is given in [10], Section 5.2.

Proposition 4.4. Fix $k \in \{1, \ldots, K\}$. Let $0 < F_{0k}(t_0) < F_{0k}(\infty)$, and let $F_{0k}$ and $G$ be continuously differentiable at $t_0$ with strictly positive derivatives $f_{0k}(t_0)$ and $g(t_0)$. Let $d = 2^{-5/3} e^{-1/3}$. Then, for $r \ge 1$,

$$\liminf_{n \to \infty} n^{r/3} \inf_{U_n} \max\bigl\{ E_{n,p_{F_0}} |U_{nk} - F_{0k}(t_0)|^r, \; E_{n,p_{F_n^k}} |U_{nk} - F_{nk}(t_0)|^r \bigr\} \ge d^r \left[ \frac{g(t_0)}{f_{0k}(t_0)} \left\{ \frac{1}{F_{0k}(t_0)} + \frac{1}{1 - F_{0+}(t_0)} \right\} \right]^{-r/3}. \tag{16}$$

Remark 4.5. Note that the lower bound (16) consists of a part depending on the underlying distribution, and a universal constant $d$. It is not clear whether the constant depending on the underlying distribution is sharp, because it has not been proved that any estimator achieves this constant. However, we do know that the naive estimator $\tilde F_{nk}$ does generally not achieve this constant. To see this, recall that $\tilde F_{nk}$ is the MLE for the reduced data $(T_i, \Delta_k^i)$, $i = 1, \ldots, n$. Hence, its asymptotic risk is bounded below by the asymptotic local minimax lower bound for current status data:

$$d^r \left[ \frac{g(t_0)}{f_{0k}(t_0)} \left\{ \frac{1}{F_{0k}(t_0)} + \frac{1}{1 - F_{0k}(t_0)} \right\} \right]^{-r/3}$$

(see [1], (4.2), or take $K = 1$ in Proposition 4.4). Since $1 - F_{0k}(t_0) > 1 - F_{0+}(t_0)$ if $F_{0j}(t_0) > 0$ for some $j \in \{1, \ldots, K\}$, $j \ne k$, this bound is larger than the one given in (16).

4.3. Local rate of convergence. As mentioned in the introduction of this section, the $n^{1/3}$ local rate of convergence of the naive estimator and the $n^{1/3}$ local minimax lower bound for the rate of convergence suggest that the MLE converges locally at rate $n^{1/3}$. This is indeed the case, and we now give the proof of this result.
However, although this result is intuitively clear, the proof is rather involved. The two main difficulties in the proof are the lack of a closed form for the MLE and the system of sub-distribution functions. We solve the first problem by working with a characterization of the MLE in terms of necessary and sufficient conditions. This approach was also followed in [1] for case 2 interval censored data, and in [2] for convex density estimation. We handle the system of sub-distribution functions by first proving a rate result for

$\hat F_{n+}$ that holds uniformly on a fixed neighborhood around $t_0$, instead of on the usual shrinking neighborhood of order $n^{-1/3}$.

The outline of this section is as follows. In Section 4.3.1 we revisit the characterization of the MLE, and derive a localized version of the conditions (Proposition 4.8). In Section 4.3.2 we use this characterization to prove the rate result for $\hat F_{n+}$ that is discussed above (Theorem 4.10). In Section 4.3.3 we use this result to prove the local rate of convergence for the components $\hat F_{n1}, \ldots, \hat F_{nK}$ (Theorem 4.17). Some technical proofs are deferred to Section 5. Throughout, we assume that for each $k \in \{1, \ldots, K\}$, $\hat F_{nk}$ is piecewise constant and right-continuous, with jumps only at points in $\mathcal{T}_k$ (see Definition 2.2). This assumption does not affect the asymptotic properties of the MLE.

4.3.1. Revisiting the characterization. We consider the characterization given in Corollary 2.10. Since it is difficult to work with $\hat F_{nk}$ in the denominator, we start by rewriting the left-hand side of (11), using

$$\int_{[s,t)} \frac{dV_{nk}(u)}{\hat F_{nk}(u)} = \int_{[s,t)} \frac{dV_{nk}(u)}{F_{0k}(u)} + \int_{[s,t)} \frac{F_{0k}(u) - \hat F_{nk}(u)}{F_{0k}(u) \hat F_{nk}(u)} \, dV_{nk}(u).$$

This leads to the following lemma:

Lemma 4.6. For all $k = 1, \ldots, K$ and $s, t \in \mathbb{R}$,

$$\int_{[s,t)} \left\{ \frac{dV_{nk}(u)}{\hat F_{nk}(u)} - \frac{dV_{n,K+1}(u)}{\hat F_{n,K+1}(u)} \right\} = \int_{[s,t)} \left\{ \frac{dV_{nk}(u)}{F_{0k}(u)} - \frac{dV_{n,K+1}(u)}{F_{0,K+1}(u)} \right\} + \int_{[s,t)} \frac{F_{0k}(u) - \hat F_{nk}(u)}{F_{0k}(u) \hat F_{nk}(u)} \, dV_{nk}(u) - \int_{[s,t)} \frac{F_{0,K+1}(u) - \hat F_{n,K+1}(u)}{F_{0,K+1}(u) \hat F_{n,K+1}(u)} \, dV_{n,K+1}(u).$$

We now combine Corollary 2.10 and Lemma 4.6 to obtain a localized version of the characterization in Proposition 4.8. We first introduce some definitions:

Definition 4.7. Let $a_k = (F_{0k}(t_0))^{-1}$ for $k = 1, \ldots, K + 1$. Furthermore, for $k = 1, \ldots, K$, we define the processes $W_{nk}(\cdot)$ and $S_{nk}(\cdot)$ by

$$W_{nk}(t) = \int_{u \le t} \{\delta_k - F_{0k}(u)\} \, d\mathbb{P}_n(u, \delta), \tag{17}$$

$$S_{nk}(t) = a_k W_{nk}(t) + a_{K+1} W_{n+}(t). \tag{18}$$
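Since $\mathbb{P}_n$ is an empirical measure, the processes in Definition 4.7 are finite sums: $W_{nk}(t) = n^{-1} \sum_{i : T_i \le t} \{\Delta_k^i - F_{0k}(T_i)\}$. The following minimal sketch evaluates them in a simulation setting where the true $F_{0k}$ are known. It assumes that $W_{n+} = \sum_{k=1}^{K} W_{nk}$ (the same "+" convention as in $F_{0+}$) and uses $F_{0,K+1} = 1 - F_{0+}$; the function names and the zero-based cause index are hypothetical choices for illustration.

```python
from typing import Callable, Sequence


def w_nk(
    t: float,
    times: Sequence[float],
    deltas_k: Sequence[int],
    f0k: Callable[[float], float],
) -> float:
    """W_nk(t) from (17): average of (delta_k - F_0k(T_i)) over T_i <= t."""
    n = len(times)
    return sum(d - f0k(u) for u, d in zip(times, deltas_k) if u <= t) / n


def s_nk(
    t: float,
    k: int,
    times: Sequence[float],
    deltas: Sequence[Sequence[int]],
    f0: Sequence[Callable[[float], float]],
    t0: float,
) -> float:
    """S_nk(t) from (18) for the zero-based cause index k.

    deltas[j][i] is the indicator of cause j for observation i, and f0[j]
    is the true sub-distribution function of cause j (known only in
    simulations). Assumption: W_{n+} is the sum of W_{nj} over causes.
    """
    w = [w_nk(t, times, deltas[j], f0[j]) for j in range(len(f0))]
    w_plus = sum(w)
    a_k = 1.0 / f0[k](t0)
    # a_{K+1} = 1 / F_{0,K+1}(t0), with F_{0,K+1} = 1 - F_{0+}.
    a_last = 1.0 / (1.0 - sum(f(t0) for f in f0))
    return a_k * w[k] + a_last * w_plus
```

Each summand $\Delta_k^i - F_{0k}(T_i)$ has conditional mean zero given $T_i$, which is why $S_{nk}$ behaves like a centered noise term in the localized characterization.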

Proposition 4.8. For each $k = 1, \ldots, K$, let $0 < F_{0k}(t_0) < F_{0k}(\infty)$, and let $F_{0k}$ and $G$ be continuously differentiable at $t_0$ with strictly positive derivatives $f_{0k}(t_0)$ and $g(t_0)$. Then there is an $r > 0$ such that, for all $k = 1, \ldots, K$ and each jump point $\tau_{nk} < T_{(n)}$ of $\hat F_{nk}$, we have

$$\int_{\tau_{nk}}^{s} \bigl\{ a_k \{\hat F_{nk}(u) - F_{0k}(u)\} + a_{K+1} \{\hat F_{n+}(u) - F_{0+}(u)\} \bigr\} \, dG(u) \le \int_{[\tau_{nk}, s)} dS_{nk}(u) + R_{nk}(\tau_{nk}, s) \qquad \text{for } s < T_{(n)}, \tag{19}$$

where equality holds in (19) if $s$ is a jump point of $\hat F_{nk}$, and where

$$\sup_{t_0 - 2r \le s < t \le t_0 + 2r} \frac{|R_{nk}(s, t)|}{n^{-2/3} \vee n^{-1/3}(t - s)^{3/2}} = O_p(1). \tag{20}$$

Proof. Let $k \in \{1, \ldots, K\}$ and let $\tau_{nk} < T_{(n)}$ be a jump point of $\hat F_{nk}$. Note that Corollary 2.10 and Lemma 4.6 imply that for all $s < T_{(n)}$,

$$\int_{[\tau_{nk}, s)} \frac{\hat F_{nk}(u) - F_{0k}(u)}{F_{0k}(u) \hat F_{nk}(u)} \, dV_{nk}(u) - \int_{[\tau_{nk}, s)} \frac{\hat F_{n,K+1}(u) - F_{0,K+1}(u)}{F_{0,K+1}(u) \hat F_{n,K+1}(u)} \, dV_{n,K+1}(u) \le \int_{u \in [\tau_{nk}, s)} \left\{ \frac{\delta_k - F_{0k}(u)}{F_{0k}(u)} - \frac{\delta_{K+1} - F_{0,K+1}(u)}{F_{0,K+1}(u)} \right\} d\mathbb{P}_n(u, \delta), \tag{21}$$

with equality if $s$ is a jump point of $\hat F_{nk}$. We first consider the left-hand side of (21). For each $k \in \{1, \ldots, K + 1\}$, we replace $\hat F_{nk}(u)$ by $F_{0k}(u)$ in the denominator:

$$\int_{[s,t)} \frac{\hat F_{nk}(u) - F_{0k}(u)}{F_{0k}(u) \hat F_{nk}(u)} \, dV_{nk}(u) = \int_{[s,t)} \frac{\hat F_{nk}(u) - F_{0k}(u)}{F_{0k}(u)^2} \, dV_{nk}(u) + \rho^{(1)}_{nk}(s, t), \tag{22}$$

where

$$\rho^{(1)}_{nk}(s, t) = -\int_{[s,t)} \frac{\{\hat F_{nk}(u) - F_{0k}(u)\}^2}{F_{0k}(u)^2 \hat F_{nk}(u)} \, dV_{nk}(u). \tag{23}$$

Next, we replace $dV_{nk}(u)$ by $dV_k(u) = F_{0k}(u) \, dG(u)$ in the first term on the right-hand side of (22):

$$\int_{[s,t)} \frac{\hat F_{nk}(u) - F_{0k}(u)}{F_{0k}(u)^2} \, dV_{nk}(u) = \int_s^t \frac{\hat F_{nk}(u) - F_{0k}(u)}{F_{0k}(u)} \, dG(u) + \rho^{(2)}_{nk}(s, t), \tag{24}$$

where

$$\rho^{(2)}_{nk}(s, t) = \int_{[s,t)} \frac{\hat F_{nk}(u) - F_{0k}(u)}{F_{0k}(u)^2} \, d(V_{nk} - V_k)(u). \tag{25}$$

Finally, we replace the denominator $F_{0k}(u)$ by $F_{0k}(t_0)$ in the first term on the right-hand side of (24):

$$\int_s^t \frac{\hat F_{nk}(u) - F_{0k}(u)}{F_{0k}(u)} \, dG(u) = \int_s^t \frac{\hat F_{nk}(u) - F_{0k}(u)}{F_{0k}(t_0)} \, dG(u) + \rho^{(3)}_{nk}(s, t), \tag{26}$$

where

$$\rho^{(3)}_{nk}(s, t) = \int_s^t \frac{\{\hat F_{nk}(u) - F_{0k}(u)\}\{F_{0k}(t_0) - F_{0k}(u)\}}{F_{0k}(u) F_{0k}(t_0)} \, dG(u),$$

and similarly on the right-hand side of (21):

$$\int_{u \in [s,t)} \frac{\delta_k - F_{0k}(u)}{F_{0k}(u)} \, d\mathbb{P}_n(u, \delta) = \int_{u \in [s,t)} \frac{\delta_k - F_{0k}(u)}{F_{0k}(t_0)} \, d\mathbb{P}_n(u, \delta) - \rho^{(4)}_{nk}(s, t),$$

where, with $G_n$ the empirical distribution of $T_1, \ldots, T_n$ (as defined in Section 2),

$$\rho^{(4)}_{nk}(s, t) = \int_{u \in [s,t)} \frac{\{F_{0k}(u) - F_{0k}(t_0)\}\{\delta_k - F_{0k}(u)\}}{F_{0k}(u) F_{0k}(t_0)} \, d(\mathbb{P}_n - P)(u, \delta) = \int_{[s,t)} \frac{F_{0k}(u) - F_{0k}(t_0)}{F_{0k}(u) F_{0k}(t_0)} \, d(V_{nk} - V_k)(u) + \int_{[s,t)} \frac{F_{0k}(t_0) - F_{0k}(u)}{F_{0k}(t_0)} \, d(G_n - G)(u). \tag{27}$$

Inequality (19) then follows from $F_{K+1} = 1 - F_+$ for $F \in \mathcal{F}_K$, and the definition

$$R_{nk}(s, t) = \sum_{l=1}^{4} \rho^{(l)}_{n,K+1}(s, t) - \sum_{l=1}^{4} \rho^{(l)}_{nk}(s, t), \qquad k = 1, \ldots, K. \tag{28}$$

We now show that the remainder term $R_{nk}(s, t)$ is of the given order. Let $k \in \{1, \ldots, K + 1\}$, and consider $\rho^{(1)}_{nk}$. Note that $\hat F_{nk}$ and $F_{0k}$ stay away from zero with probability tending to 1 on $[t_0 - 2r, t_0 + 2r]$, by the assumption $F_{0k}(t_0) > 0$, the continuity of $F_{0k}$ at $t_0$, and the consistency of $\hat F_{nk}$ (Proposition 3.3). Furthermore,

$$\int \{\hat F_{nk}(u) - F_{0k}(u)\}^2 \, dV_{nk}(u) \le \int \{\hat F_{nk}(u) - F_{0k}(u)\}^2 \, d(G_n - G)(u) + \int \{\hat F_{nk}(u) - F_{0k}(u)\}^2 \, dG(u),$$

where the second term on the right-hand side is of order $O_p(n^{-2/3})$ by the $L_2(G)$ rate of convergence given in Corollary 4.2, and the first term is of

order $O_p(n^{-2/3})$ by a modulus of continuity result. To see the latter, define

$$\mathcal{Q} = \{q_F(u) = \{F(u) - F_{0k}(u)\}^2 : F \in \mathcal{F}\}, \qquad \mathcal{Q}(\gamma) = \left\{ q_F \in \mathcal{Q} : \int q_F(u)^2 \, dG(u) \le \gamma^2 \right\},$$

where $\mathcal{F}$ is the class of monotone functions $F : \mathbb{R} \to [0,1]$. The $L_2(G)$ rate of convergence (Corollary 4.2) implies that we can choose $C > 0$ such that $q_{\hat F_{nk}} \in \mathcal{Q}(C n^{-1/3})$ with high probability. We then apply (5.42) of [16], Lemma 5.13, with $\alpha = 1$ and $\beta = 0$ to the class $\mathcal{Q}(C n^{-1/3})$. This yields that $\rho^{(1)}_{nk}(s, t) = O_p(n^{-2/3})$ uniformly in $t_0 - 2r \le s \le t \le t_0 + 2r$. Analogously, $\rho^{(2)}_{nk}(s, t) = O_p(n^{-2/3})$ uniformly in $t_0 - 2r \le s \le t \le t_0 + 2r$, using the $L_2(G)$ rate of convergence and a modulus of continuity result.

Next, we consider $\rho^{(3)}_{nk}(s, t)$. By the Cauchy-Schwarz inequality,

$$|\rho^{(3)}_{nk}(s, t)| \le \left\{ \int_s^t \frac{\{F_{0k}(u) - F_{0k}(t_0)\}^2}{F_{0k}(t_0)^2} \, dG(u) \right\}^{1/2} \left\{ \int_s^t \frac{\{\hat F_{nk}(u) - F_{0k}(u)\}^2}{F_{0k}(u)^2} \, dG(u) \right\}^{1/2}.$$

The first term of the product is of order $O((t - s)^{3/2})$, uniformly in $t_0 - 2r \le s \le t \le t_0 + 2r$, by the continuous differentiability of $F_{0k}$. The second term is of order $O_p(n^{-1/3})$ by the $L_2(G)$ rate of convergence. Hence, $\rho^{(3)}_{nk}(s, t) = O_p(n^{-1/3}(t - s)^{3/2})$, uniformly in $t_0 - 2r \le s \le t \le t_0 + 2r$.

Finally, $\rho^{(4)}_{nk}(s, t) = O_p(n^{-1/2}(t - s))$, uniformly in $t_0 - 2r \le s \le t \le t_0 + 2r$, by writing the integral over $[s, t)$ in terms of integrals over $[s, t_0)$ and $[t, t_0)$ and using Lemma 4.9 below. Since the term $O_p(n^{-1/2}(t - s))$ is dominated by $O_p(n^{-2/3} \vee n^{-1/3}(t - s)^{3/2})$ for all $s \le t$, it can be omitted: if $t - s \le n^{-1/6}$ then $n^{-1/2}(t - s) \le n^{-2/3}$, and if $t - s > n^{-1/6}$ then $n^{-1/2}(t - s) \le n^{-1/3}(t - s)^{3/2}$.

Lemma 4.9. Let $F : \mathbb{R} \to \mathbb{R}$ be continuously differentiable at $t_0$ with derivative $f(t_0) > 0$. Then there is an $r > 0$ so that uniformly in $t_0 - 2r \le s \le t \le t_0 + 2r$,

$$\int_{[s,t)} \{F(t) - F(u)\} \, d(G_n - G)(u) = O_p(n^{-1/2}(t - s)), \tag{29}$$

$$\int_{[s,t)} \frac{F(t) - F(u)}{F(u)} \, d(V_{nk} - V_k)(u) = O_p(n^{-1/2}(t - s)), \qquad k = 1, \ldots, K. \tag{30}$$

Proof. We only prove (29), because the proof of (30) is analogous. Integration by parts yields

$$n^{1/2} \int_{[s,t)} \{F(t) - F(u)\} \, d(G_n - G)(u) = -n^{1/2} \{F(t) - F(s)\}\{G_n(s) - G(s)\} + n^{1/2} \int_{[s,t)} \{G_n(u) - G(u)\} \, dF(u).$$

Note that $n^{1/2} \sup_{u \in \mathbb{R}} |G_n(u) - G(u)|$ is tight, since it converges in distribution to $\sup_{u \in \mathbb{R}} |B(G(u))| \le \sup_{x \in [0,1]} |B(x)|$, where $B$ is a standard Brownian bridge on $[0,1]$. Hence, both terms on the right-hand side of the display are $O_p(1)\{F(t) - F(s)\} = O_p(t - s)$, uniformly in $t_0 - 2r \le s \le t \le t_0 + 2r$.

4.3.2. Uniform rate of convergence of $\hat F_{n+}$ on a fixed neighborhood of $t_0$. The main result of this section is a rate of convergence result for $\hat F_{n+}$ which holds uniformly on a fixed neighborhood $[t_0 - r, t_0 + r]$ of $t_0$, rather than on a shrinking neighborhood of the form $[t_0 - M n^{-1/3}, t_0 + M n^{-1/3}]$ (Theorem 4.10). We discuss the meaning of this result in Remark 4.11, by comparing it to several existing results for current status data without competing risks. Theorem 4.10 is used in Section 4.3.3 to prove the local rate of convergence of the components $\hat F_{n1}, \ldots, \hat F_{nK}$.

Theorem 4.10. For all $k = 1, \ldots, K$, let $0 < F_{0k}(t_0) < F_{0k}(\infty)$, and let $F_{0k}$ and $G$ be continuously differentiable at $t_0$ with strictly positive derivatives $f_{0k}(t_0)$ and $g(t_0)$. For $\beta \in (0,1)$ we define

$$v_n(t) = \begin{cases} n^{-1/3}, & \text{if } |t| \le n^{-1/3}, \\ n^{-(1-\beta)/3} |t|^{\beta}, & \text{if } |t| > n^{-1/3}. \end{cases} \tag{31}$$

Then there exists a constant $r > 0$ so that

$$\sup_{t \in [t_0 - r, t_0 + r]} \frac{|\hat F_{n+}(t) - F_{0+}(t)|}{v_n(t - t_0)} = O_p(1). \tag{32}$$

Note that the function $v_n(t)$ equals $n^{-1/3}$ for $|t| \le n^{-1/3}$. Outside an $n^{-1/3}$ neighborhood we cannot expect to get an $n^{1/3}$ rate. Therefore, for $|t| > n^{-1/3}$ we let the function $v_n(t)$ grow with $|t|$, by defining $v_n(t) = n^{-(1-\beta)/3} |t|^{\beta}$. Before giving the proof of Theorem 4.10, we discuss its meaning by comparing it to several known results for current status data without competing risks.

Remark 4.11.
By taking $K = 1$ in Theorem 4.10, it follows that the theorem holds for the MLE $\hat F_n$ for current status data without competing risks. Thus, to clarify the meaning of Theorem 4.10, we can compare it to

known results for $\hat F_n$. First, we consider the local rate of convergence given in [4], Lemma 5.4, page 95. For $M > 0$, they prove that

$$\sup_{t \in [-M, M]} |\hat F_n(t_0 + n^{-1/3} t) - F_0(t_0)| = O_p(n^{-1/3}). \tag{33}$$

We can obtain this bound by applying Theorem 4.10 to $t \in [t_0 - M n^{-1/3}, t_0 + M n^{-1/3}]$, and using the continuous differentiability of $F_{0k}$ at $t_0$ and the fact that $v_n(t - t_0) \le v_n(M n^{-1/3}) = M^{\beta} n^{-1/3}$ for $M \ge 1$, $t \in [t_0 - M n^{-1/3}, t_0 + M n^{-1/3}]$. Hence, Theorem 4.10 implies (33) for $M \ge 1$. Next, we consider the global bound of [4], Lemma 5.9:

$$\sup_{t \in \mathbb{R}} |\hat F_n(t) - F_0(t)| = O_p(n^{-1/3} \log n). \tag{34}$$

The result in Theorem 4.10 is fundamentally different from (34), since it is stronger than (34) for $|t - t_0| < n^{-1/3} (\log n)^{1/\beta}$, and it is weaker outside this region.

Fig. 2. Plot of $v_n(t)$ for various values of $\beta$. The dotted lines are $y = t$ and $y = n^{-1/3}$. Note that $\beta$ close to zero gives the sharpest bound.

Remark 4.12. Note that Theorem 4.10 gives a family of bounds in $\beta$. Choosing $\beta$ close to zero gives the tightest bound, as illustrated in Figure 2. For the proof of the local rate of convergence of $\hat F_{n1}, \ldots, \hat F_{nK}$ (Theorem 4.17), it is sufficient that Theorem 4.10 holds for one arbitrary value of $\beta \in (0,1)$. Stating the theorem for one fixed $\beta$ leads to a somewhat simpler proof. However, for completeness we present the result for all $\beta \in (0,1)$.

As an introduction to the proof of Theorem 4.10 we first note the following. Let $\varepsilon > 0$ and let $r > 0$ be small. Then the continuous differentiability

of $F_{0+}$ at $t_0$ implies

$$F_{0+}(t + M v_n(t - t_0)) \le F_{0+}(t) + 2 M v_n(t - t_0) f_{0+}(t_0), \qquad t \in [t_0 - r, t_0 + r],$$

$$F_{0+}(t - M v_n(t - t_0)) \ge F_{0+}(t) - 2 M v_n(t - t_0) f_{0+}(t_0), \qquad t \in [t_0 - r, t_0 + r].$$

Hence, it is sufficient to show that we can choose $n_1$ and $M$ such that for all $n > n_1$

$$P\bigl\{ \exists t \in [t_0 - r, t_0 + r] : \hat F_{n+}(t) \notin \bigl( F_{0+}(t - M v_n(t - t_0)), \; F_{0+}(t + M v_n(t - t_0)) \bigr) \bigr\} < \varepsilon.$$

In fact, we only prove that there exist $n_1$ and $M$ such that

$$P\{ \exists t \in [t_0, t_0 + r] : \hat F_{n+}(t) \ge F_{0+}(t + M v_n(t - t_0)) \} < \varepsilon/4, \qquad n > n_1, \tag{35}$$

since the proofs for $\hat F_{n+}(t) \le F_{0+}(t - M v_n(t - t_0))$ and the interval $[t_0 - r, t_0]$ are analogous. In the proof of (35) we use the fact that we can choose $r$, $n_1$ and $C$ such that $P(E_{nrC}^c) < \varepsilon/8$ for all $n > n_1$, where

$$E_{nrC} = \bigcap_{k=1}^{K} \left\{ \hat F_{nk} \text{ has a jump in } (t_0 - 2r, t_0 - r), \;\; T_{(n)} > t_0 + 2r, \;\; \sup_{t_0 - 2r \le w < t \le t_0 + 2r} \frac{|R_{nk}(w, t)|}{n^{-2/3} \vee n^{-1/3}(t - w)^{3/2}} \le C \right\}, \tag{36}$$

and $R_{nk}(w, t)$ is defined in Proposition 4.8. For the event involving $R_{nk}$ this follows from Proposition 4.8. For the event that $\hat F_{nk}$ has a jump point in $(t_0 - 2r, t_0 - r)$, this follows from consistency of $\hat F_{nk}$ (Proposition 3.3) and the strict monotonicity of $F_{0k}$ in a neighborhood of $t_0$. Finally, $T_{(n)} > t_0 + 2r$ for sufficiently large $n$ follows from the positivity of the density $g$ in a neighborhood of $t_0$.

Proof of Theorem 4.10. By the discussion above, and by writing

$$P\{ \exists t \in [t_0, t_0 + r] : \hat F_{n+}(t) \ge F_{0+}(t + M v_n(t - t_0)) \} \le P(E_{nrC}^c) + P\bigl( \exists t \in [t_0, t_0 + r] : \hat F_{n+}(t) \ge F_{0+}(t + M v_n(t - t_0)), \; E_{nrC} \bigr), \tag{37}$$

it is sufficient to show that we can choose $n_1$, $M$ and $C$ such that the second term of (37) is bounded by $\varepsilon/8$ for all $n > n_1$. In order to show this, we put a grid on the interval $[t_0, t_0 + r]$, analogously to [8], Lemma 4.1. The grid points $t_{nj}$ and grid cells $I_{nj}$ are denoted by

$$t_{nj} = t_0 + j n^{-1/3} \quad \text{and} \quad I_{nj} = [t_{nj}, t_{n,j+1}) \qquad \text{for } j = 0, \ldots, J_n = \lceil r n^{1/3} \rceil. \tag{38}$$
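To make the objects used so far in this argument concrete, the following sketch evaluates the rate function $v_n$ from (31) and the grid points $t_{nj}$ from (38). It is illustrative only: the function names are hypothetical, and rounding $r n^{1/3}$ up to obtain an integer $J_n$ is an assumption made here, since either rounding gives cells that cover $[t_0, t_0 + r]$.

```python
import math
from typing import List


def v_n(t: float, n: int, beta: float) -> float:
    """Rate function v_n(t) from (31), for beta in (0, 1)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    cut = n ** (-1.0 / 3.0)
    if abs(t) <= cut:
        # Flat part: the usual n^{-1/3} bound close to the centre.
        return cut
    # Growing part: n^{-(1 - beta)/3} |t|^beta away from the centre.
    return n ** (-(1.0 - beta) / 3.0) * abs(t) ** beta


def grid(t0: float, r: float, n: int) -> List[float]:
    """Grid points t_nj = t0 + j n^{-1/3} for j = 0, ..., J_n.

    Assumption: J_n is r n^{1/3} rounded up to an integer.
    """
    step = n ** (-1.0 / 3.0)
    j_n = math.ceil(r / step)
    return [t0 + j * step for j in range(j_n + 1)]
```

For example, `v_n(M * n ** (-1 / 3), n, beta)` agrees with $M^{\beta} n^{-1/3}$ for $M \ge 1$, which is the identity used in Remark 4.11, and the two branches of `v_n` match at $|t| = n^{-1/3}$.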

This yields

$$P\bigl( \exists t \in [t_0, t_0 + r] : \hat F_{n+}(t) \ge F_{0+}(t + M v_n(t - t_0)), \; E_{nrC} \bigr) \le \sum_{j=0}^{J_n} P\bigl( \exists t \in I_{nj} : \hat F_{n+}(t) \ge F_{0+}(t + M v_n(t - t_0)), \; E_{nrC} \bigr).$$

Hence, it is sufficient to show that we can choose $n_1$ and $m_1$ such that for all $n > n_1$, $M > m_1$ and $j = 0, \ldots, J_n$, we have

$$P\bigl( \exists t \in I_{nj} : \hat F_{n+}(t) \ge F_{0+}(t + M v_n(t - t_0)), \; E_{nrC} \bigr) \le p_{jM}, \tag{39}$$

where $p_{jM}$ satisfies $\limsup_{n \to \infty} \sum_{j=0}^{J_n} p_{jM} \to 0$ as $M \to \infty$. We prove (39) for

$$p_{jM} = \begin{cases} d_1 \exp\{-d_2 M^3\}, & \text{if } j = 0, \\ d_1 \exp\{-d_2 (M j^{\beta})^3\}, & \text{if } j = 1, \ldots, J_n, \end{cases} \tag{40}$$

where $d_1$ and $d_2$ are positive constants. Using the monotonicity of $\hat F_{n+}$, it is sufficient to prove that for all $n > n_1$, $M > m_1$ and $j = 0, \ldots, J_n$,

$$P\{A_{njM}, E_{nrC}\} \le p_{jM}, \tag{41}$$

where

$$A_{njM} = \{\hat F_{n+}(t_{n,j+1}) \ge F_{0+}(s_{njM})\}, \tag{42}$$

$$s_{njM} = t_{nj} + M v_n(t_{nj} - t_0). \tag{43}$$

Fix $n > 0$ and $M > 0$, and let $j \in \{0, \ldots, J_n\}$. Let $\tau_{nkj}$ be the last jump point of $\hat F_{nk}$ before $t_{n,j+1}$, for $k = 1, \ldots, K$. On the event $E_{nrC}$, these jump points exist and are in $(t_0 - 2r, t_{n,j+1}]$. Without loss of generality we assume that the sub-distribution functions are labeled so that $\tau_{n1j} \le \cdots \le \tau_{nKj}$. On the event $A_{njM}$ there must be a $k \in \{1, \ldots, K\}$ for which $\hat F_{nk}(t_{n,j+1}) \ge F_{0k}(s_{njM})$. Hence, we can define $l \in \{1, \ldots, K\}$ such that

$$\hat F_{nk}(t_{n,j+1}) < F_{0k}(s_{njM}), \qquad k = l + 1, \ldots, K, \tag{44}$$

$$\hat F_{nl}(t_{n,j+1}) \ge F_{0l}(s_{njM}). \tag{45}$$

Since $s_{njM} < t_0 + 2r$ for $n$ large, and $t_0 + 2r < T_{(n)}$ on the event $E_{nrC}$, we have

$$\int_{\tau_{nlj}}^{s_{njM}} \bigl\{ a_l \{\hat F_{nl}(u) - F_{0l}(u)\} + a_{K+1} \{\hat F_{n+}(u) - F_{0+}(u)\} \bigr\} \, dG(u) \le \int_{[\tau_{nlj}, s_{njM})} dS_{nl}(u) + R_{nl}(\tau_{nlj}, s_{njM}),$$

by Proposition 4.8. Hence, $P(A_{njM}, E_{nrC})$ equals

$$P\left( \int_{\tau_{nlj}}^{s_{njM}} \bigl\{ a_l \{\hat F_{nl}(u) - F_{0l}(u)\} + a_{K+1} \{\hat F_{n+}(u) - F_{0+}(u)\} \bigr\} \, dG(u) \le \int_{[\tau_{nlj}, s_{njM})} dS_{nl}(u) + R_{nl}(\tau_{nlj}, s_{njM}), \; A_{njM}, \; E_{nrC} \right),$$

and this is bounded above by

$$P\left( \int_{\tau_{nlj}}^{s_{njM}} a_l \{\hat F_{nl}(u) - F_{0l}(u)\} \, dG(u) \le \int_{[\tau_{nlj}, s_{njM})} dS_{nl}(u) + R_{nl}(\tau_{nlj}, s_{njM}), \; A_{njM}, \; E_{nrC} \right) \tag{46}$$

$$+ \; P\left( \int_{\tau_{nlj}}^{s_{njM}} \{\hat F_{n+}(u) - F_{0+}(u)\} \, dG(u) \le 0, \; A_{njM}, \; E_{nrC} \right). \tag{47}$$

We now show that both terms (46) and (47) are bounded above by $p_{jM}/2$. Note that (45) implies that on the event $A_{njM}$, $\hat F_{nl}(u) \ge \hat F_{nl}(\tau_{nlj}) = \hat F_{nl}(t_{n,j+1}) \ge F_{0l}(s_{njM})$ for $u \ge \tau_{nlj}$, using the definition of $\tau_{nlj}$, and the fact that $\hat F_{nl}$ is piecewise constant and monotone nondecreasing. Hence, on the event $A_{njM}$ we have

$$\int_{\tau_{nlj}}^{s_{njM}} \{\hat F_{nl}(u) - F_{0l}(u)\} \, dG(u) \ge \int_{\tau_{nlj}}^{s_{njM}} \{F_{0l}(s_{njM}) - F_{0l}(u)\} \, dG(u) \ge \tfrac{1}{4} g(t_0) f_{0l}(t_0) (s_{njM} - \tau_{nlj})^2,$$

for all $\tau_{nlj} \in [t_0 - 2r, t_{n,j+1}]$ and $r$ sufficiently small. Combining this with the definition of $E_{nrC}$ [see (36)], it follows that (46) is bounded above by

$$P\left( \inf_{w \in [t_0 - 2r, t_{n,j+1}]} \left\{ \tfrac{1}{4} g(t_0) a_l f_{0l}(t_0) (s_{njM} - w)^2 - \int_{[w, s_{njM})} dS_{nl}(u) - C\bigl( n^{-2/3} \vee n^{-1/3} (s_{njM} - w)^{3/2} \bigr) \right\} \le 0 \right). \tag{48}$$

For $m_1$ and $n_1$ sufficiently large, this probability is bounded above by $p_{jM}/2$ for all $M > m_1$, $n > n_1$ and $j \in \{0, \ldots, J_n\}$, using Lemma 4.13 below. Similarly, (47) is bounded above by $p_{jM}/2$, using Lemma 4.14 below. This proves (41) and completes the proof.

Lemmas 4.13 and 4.14 play a crucial role in the proof of Theorem 4.10. The probability statement in Lemma 4.13 consists of three terms: a deterministic parabolic drift $b(s_{njM} - w)^2$, a martingale $S_{nk}$, and a remainder

term $C(n^{-2/3} \vee n^{-1/6}(s_{njM} - w)^{3/2})$. The basic idea of the lemma is that the quadratic drift dominates the martingale and the remainder term. Lemma 4.14 controls the term that involves the sum of the components. In this lemma the key idea is to exploit the system of sub-distribution functions, and play out the different components against each other. The proofs of both lemmas are given in Section 5. Finally, we note that (48) in the proof of Theorem 4.10 contains a smaller remainder term $C(n^{-2/3} \vee n^{-1/3}(s_{njM} - w)^{3/2})$ than the one in Lemma 4.13. Hence, (48) is also bounded above by $p_{jM}$. We choose to state Lemma 4.13 in terms of the larger remainder term $C(n^{-2/3} \vee n^{-1/6}(s_{njM} - w)^{3/2})$, since we need the lemma in this form for the proof of Theorem 4.17.

Lemma 4.13. Let $C > 0$ and $b > 0$. Then there exist $r > 0$, $n_1 > 0$ and $m_1 > 0$ such that for all $k = 1, \ldots, K$, $n > n_1$, $M > m_1$ and $j \in \{0, \ldots, J_n = \lceil r n^{1/3} \rceil\}$,

$$P\left( \inf_{w \in [t_0 - 2r, t_{n,j+1}]} \left\{ b(s_{njM} - w)^2 - \int_{[w, s_{njM})} dS_{nk}(u) - C\bigl( n^{-2/3} \vee n^{-1/6} (s_{njM} - w)^{3/2} \bigr) \right\} \le 0 \right) \le p_{jM},$$

where $s_{njM} = t_{nj} + M v_n(t_{nj} - t_0)$, and $S_{nk}(\cdot)$, $v_n(\cdot)$ and $p_{jM}$ are defined by (18), (31) and (40), respectively.

Lemma 4.14. Let the conditions of Theorem 4.10 be satisfied, and let $l$ be defined by (44) and (45). Then there exist $r > 0$, $n_1 > 0$ and $m_1 > 0$ such that for all $n > n_1$, $M > m_1$ and $j \in \{0, \ldots, J_n = \lceil r n^{1/3} \rceil\}$,

$$P\left\{ \int_{\tau_{nlj}}^{s_{njM}} \{\hat F_{n+}(u) - F_{0+}(u)\} \, dG(u) \le 0, \; A_{njM}, \; E_{nrC} \right\} \le p_{jM},$$

where $\tau_{nlj}$ is the last jump point of $\hat F_{nl}$ before $t_{n,j+1}$, $s_{njM} = t_{nj} + M v_n(t_{nj} - t_0)$, and $E_{nrC}$, $p_{jM}$ and $A_{njM}$ are defined by (36), (40) and (42), respectively.

Remark 4.15. The conditions of Theorem 4.10 also hold when $t_0$ is replaced by $s$, for $s$ in a neighborhood of $t_0$. Hence, the results in this section continue to hold when $t_0$ is replaced by $s \in [t_0 - r, t_0 + r]$, for $r > 0$ sufficiently small.
To be precise, there exists an $r > 0$ such that for every $\varepsilon > 0$ there exist $C > 0$ and $n_1 > 0$ such that

$$P\left( \sup_{t \in [t_0 - r, t_0 + r]} \frac{|\hat F_{n+}(t) - F_{0+}(t)|}{v_n(t - s)} > C \right) < \varepsilon \qquad \text{for } s \in [t_0 - r, t_0 + r], \; n > n_1.$$
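Looking back at the proof of Theorem 4.10, the grid argument relies on the bounds $p_{jM}$ in (40) being summable over the cells, with a total that vanishes as $M \to \infty$ regardless of how many cells there are. The sketch below checks this numerically. The constants $d_1$ and $d_2$ are not specified in this section, so the values $d_1 = d_2 = 1$ are placeholders chosen only for illustration, and the function names are hypothetical.

```python
import math


def p_jm(j: int, m: float, beta: float, d1: float = 1.0, d2: float = 1.0) -> float:
    """Cell bound p_jM from (40); d1 and d2 are placeholder constants."""
    if j == 0:
        return d1 * math.exp(-d2 * m ** 3)
    return d1 * math.exp(-d2 * (m * j ** beta) ** 3)


def total_bound(m: float, beta: float, j_n: int, d1: float = 1.0, d2: float = 1.0) -> float:
    """Sum of p_jM over the grid cells j = 0, ..., J_n."""
    return sum(p_jm(j, m, beta, d1, d2) for j in range(j_n + 1))


if __name__ == "__main__":
    for m in (1.0, 2.0, 3.0):
        # The total barely depends on J_n because the terms decay like
        # exp(-d2 M^3 j^(3 beta)), and it shrinks rapidly as M grows.
        print(m, total_bound(m, 0.5, 100), total_bound(m, 0.5, 10_000))
```

Because the exponent grows like $j^{3\beta}$, the sum converges for every $\beta \in (0,1)$, which is consistent with the theorem being stated for the whole range of $\beta$.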

In Remark 4.12 we already mentioned that, in order to prove the local rate of convergence of the components $\hat F_{n1}, \ldots, \hat F_{nK}$, we only need Theorem 4.10 to hold for one value of $\beta \in (0,1)$. Therefore, we now fix $\beta = 1/2$ so that $v_n(t) = n^{-1/3} \vee n^{-1/6} \sqrt{|t|}$. Then Remark 4.15 leads to the following corollary:

Corollary 4.16. Let the conditions of Theorem 4.10 be satisfied. Then there exists an $r > 0$ such that for every $\varepsilon > 0$ there exist $C > 0$ and $n_1 > 0$ such that

$$P\left( \sup_{t \in [t_0 - r, s]} \frac{\bigl| \int_t^s \{\hat F_{n+}(u) - F_{0+}(u)\} \, dG(u) \bigr|}{n^{-2/3} \vee n^{-1/6} (s - t)^{3/2}} > C \right) < \varepsilon \qquad \text{for } s \in [t_0 - r, t_0 + r], \; n > n_1.$$

4.3.3. Local rate of convergence of $\hat F_{n1}, \ldots, \hat F_{nK}$. We are now ready to prove the local rate of convergence of $\hat F_{n1}, \ldots, \hat F_{nK}$. The proof is again based on the localized characterization given in Proposition 4.8, but we now use Corollary 4.16 to bound the term involving $\hat F_{n+}$ [see (52) ahead].

Theorem 4.17. Let the conditions of Theorem 4.10 be satisfied. Then there exists an $r > 0$ such that for every $\varepsilon > 0$ and $M_1 > 0$ there exist $M > 0$ and $n_1 > 0$ such that

$$P\left( \sup_{t \in [-M_1, M_1]} n^{1/3} |\hat F_{nk}(s + n^{-1/3} t) - F_{0k}(s)| > M \right) < \varepsilon, \qquad k = 1, \ldots, K,$$

for all $n > n_1$ and $s \in [t_0 - r, t_0 + r]$.

Proof. For the reasons discussed in Remark 4.15, it is sufficient to prove the result for $s = t_0$. Let $\varepsilon > 0$, $M_1 > 0$ and $k \in \{1, \ldots, K\}$. We want to show that there exist constants $M > M_1$ and $n_1 > 0$ such that for all $n > n_1$,

$$P\bigl( \hat F_{nk}(t_0 + M n^{-1/3}) \ge F_{0k}(t_0 + 2 M n^{-1/3}) \bigr) < \varepsilon, \tag{49}$$

$$P\bigl( \hat F_{nk}(t_0 - M n^{-1/3}) \le F_{0k}(t_0 - 2 M n^{-1/3}) \bigr) < \varepsilon. \tag{50}$$

We only prove (49), since the proof of (50) is analogous. Define $B_{nkM} = \{\hat F_{nk}(t_0 + M n^{-1/3}) \ge F_{0k}(s_{nM})\}$ and $s_{nM} = t_0 + 2 M n^{-1/3}$, and let $\tau_{nk}$ be the last jump point of $\hat F_{nk}$ before $t_0 + M n^{-1/3}$. Since we may assume that $s_{nM} < t_0 + r < T_{(n)}$ for $n$ sufficiently large, Proposition 4.8 yields ( snm P(B nkm ) = P {a k { F nk (u) F 0k (u)} τ nk