
Two Estimators of the Mean of a Counting Process with Panel Count Data

Jon A. Wellner^1 and Ying Zhang^2
University of Washington and University of Central Florida

November 1, 1998

Abstract

We study two estimators of the mean function of a counting process based on "panel count data". The setting for "panel count data" is one in which n independent subjects, each with a counting process with common mean function, are observed at several possibly different times during a study. Following a model proposed by Schick and Yu (1997), we allow the number of observation times, and the observation times themselves, to be random variables. Our goal is to estimate the mean function of the counting process. We show that the estimator of the mean function proposed by Sun and Kalbfleisch (1995) can be viewed as a pseudo-maximum likelihood estimator when a nonhomogeneous Poisson process model is assumed for the counting process. We establish consistency of both the nonparametric pseudo maximum likelihood estimator of Sun and Kalbfleisch (1995) and the full maximum likelihood estimator, even if the underlying counting process is not a Poisson process. We also derive the asymptotic distribution of both estimators at a fixed time t, and compare the resulting theoretical relative efficiency with finite sample relative efficiency by way of a limited Monte Carlo study.

1 Research supported in part by National Science Foundation grant DMS and NIAID grant 2R1 AI
2 Research supported in part by National Science Foundation grant DMS

AMS 1980 subject classifications. Primary: 60F05, 60F17; secondary: 60J65, 60J70.

Key words and phrases. algorithm, asymptotic distributions, consistency, convex minorant, counting process, current status data, empirical processes, interval censoring, iterative, maximum likelihood, Monte Carlo, pseudo likelihood, relative efficiency.

1. Introduction

Suppose that $N = \{N(t) : t \ge 0\}$ is a univariate counting process. In many applications, it is important to estimate the expected number of events which will occur by time $t$, $\Lambda(t) = EN(t)$, the mean function of $N$. In practice, a number of subjects are under study, and each subject is observed several times during the study. At the observation times, only the counts of events up to that time are observable; the exact times of events are unknown. The number of observation times and the observation times themselves are allowed to vary across the subjects. This kind of data often appears in demographic studies, system reliability, and clinical trials. Data of this type are sometimes referred to as panel count data. Examples are given by Kalbfleisch and Lawless (1981), (1985), Gaver and O'Muircheartaigh (1987), Thall and Lachin (1988), Thall (1988), and Sun and Kalbfleisch (1995).

In some situations, there is only one event recorded by the counting process for each subject, e.g. death or onset of disease. Then panel count data are often referred to as interval-censored data. There is a large literature on statistical methods for interval-censored data and applications thereof; for examples of interval-censored data see McDonald (1992), Diamond, McDonald and Shah (1986), Finkelstein (1986), and Finkelstein and Wolf (1985). Some theoretical results are also available. However, most of those results are limited to the special case in which each subject has the same number of observation times: see e.g. Groeneboom and Wellner (1992), Huang (1996), and Huang and Wellner (1995) for cases when each subject is only observed once or twice, while Wellner (1995) gives a preliminary study of the NPMLE for the case when each subject is observed $k$ times. Schick and Yu (1997) show the consistency of the NPMLE for interval censoring when each subject is allowed to have a different number of observation times and the counting process is a simple indicator (one jump) counting process.

In this paper, we extend the formulation of the sample space given by Schick and Yu (1997) to study panel count data for general counting processes. We study both a pseudo-likelihood estimator $\hat\Lambda_n^{ps}$ and the full maximum likelihood estimator $\hat\Lambda_n$ of the mean function based on a non-homogeneous Poisson process model, i.e.

$$P(N(t) = k) = e^{-\Lambda(t)} \frac{\Lambda(t)^k}{k!}, \qquad k = 0, 1, 2, \ldots, \tag{1.1}$$

where $\Lambda(t) = E(N(t))$ is the mean function of the counting process $N$. We also establish asymptotic properties of the proposed estimators without assuming any model for the counting process $N$, thereby demonstrating an important robustness property of both estimators. Our proofs of the asymptotic results rely strongly on the generality in choice of sample space allowed by modern empirical process theory.

The pseudo-likelihood estimator we study was proposed by Sun and Kalbfleisch (1995), who constructed the estimator based on isotonic regression considerations. The pseudo-likelihood estimator ignores the dependence between counts in the counting process at successive observation times, treating these successive counts as if they were independent random variables to form a "pseudo likelihood". On the other hand, the full Non-Parametric Maximum Likelihood Estimator (NPMLE) of the mean function takes account of the dependence of the successive counts. Under the assumption that the counting process is a (nonhomogeneous) Poisson process, this is easily accomplished via independence of the increments of the process:

$$P(N(t_1) = k_1, \ldots, N(t_s) = k_s) = \prod_{j=1}^{s} \frac{(\Lambda(t_j) - \Lambda(t_{j-1}))^{k_j - k_{j-1}}}{(k_j - k_{j-1})!} \exp[-(\Lambda(t_j) - \Lambda(t_{j-1}))], \tag{1.2}$$

where $0 < t_1 < t_2 < \cdots < t_s$, $0 \le k_1 \le k_2 \le \cdots \le k_s$, and $t_0 = k_0 = 0$ by convention. We show that the NPMLE $\hat\Lambda_n$ of the mean function based on (1.2) is consistent, regardless of the underlying true counting process. We also show that it is considerably more efficient than the Sun-Kalbfleisch estimator, both when the nonhomogeneous Poisson assumption holds and when it fails.

When the counting process for each individual is observed at just one time (current status data), the estimator of Sun and Kalbfleisch (1995) is shown to be the NPMLE if the counting process has just one jump (a simple indicator counting process) or if it is a non-homogeneous Poisson process. Thus our results extend previous results for current status data (or interval censoring case 1) due to Groeneboom (1991) and Groeneboom and Wellner (1992). When the counting process for each individual is observed exactly twice, the estimator of Sun and Kalbfleisch (1995) differs from the NPMLE, and provides an alternative to the NPMLE studied in the case of a simple counting process by Groeneboom (1991), Groeneboom and Wellner (1992), Wellner (1995), and Groeneboom (1996). When the counting process for each individual is observed a random number of times ("mixed case" interval censoring in the terminology of Schick and Yu (1997)), our results generalize and extend those of Schick and Yu (1997) in several directions.
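The difference between the two likelihoods can be made concrete for a single subject. The following is a minimal sketch, not code from the paper: `full_loglik` evaluates the logarithm of (1.2) using independent Poisson increments, while `pseudo_loglik` sums the logarithms of the marginal probabilities (1.1) as if the counts were independent. The function names and the linear mean function used in the example are illustrative assumptions.

```python
import math


def full_loglik(times, counts, mean_fn):
    """Log of (1.2) for one subject, with t_0 = k_0 = 0 and mean_fn(0) = 0."""
    total, prev_mean, prev_count = 0.0, 0.0, 0
    for t, k in zip(times, counts):
        d_mean = mean_fn(t) - prev_mean      # increment of the mean function
        d_count = k - prev_count             # increment of the observed count
        if d_count < 0 or d_mean < 0 or (d_mean == 0 and d_count > 0):
            return -math.inf                 # impossible under the model
        if d_mean > 0:
            total += d_count * math.log(d_mean) - d_mean - math.lgamma(d_count + 1)
        prev_mean, prev_count = mean_fn(t), k
    return total


def pseudo_loglik(times, counts, mean_fn):
    """Sum of marginal Poisson log-probabilities (1.1), ignoring dependence.

    Assumes mean_fn(t) > 0 at every observation time.
    """
    total = 0.0
    for t, k in zip(times, counts):
        m = mean_fn(t)
        total += k * math.log(m) - m - math.lgamma(k + 1)
    return total


# Hypothetical example: mean function 2t, counts 1 and 3 seen at times 1 and 2.
linear_mean = lambda t: 2.0 * t
print(full_loglik([1.0, 2.0], [1, 3], linear_mean))
print(pseudo_loglik([1.0, 2.0], [1, 3], linear_mean))
```

With a single observation time the two functions return the same value, which is the current status situation described above.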
The outline of the rest of the paper is as follows. In section 2, we characterize the two estimators $\hat\Lambda_n^{ps}$ and $\hat\Lambda_n$. In section 3, we provide algorithms for computation of the estimators: computation of the Non-Parametric Maximum Pseudo-Likelihood Estimator (NPMPLE) $\hat\Lambda_n^{ps}$ can be accomplished in one step, whereas computation of the NPMLE $\hat\Lambda_n$ requires an iterative convex minorant (ICM) algorithm. In section 4, we state the main asymptotic results for both estimators: strong consistency and pointwise asymptotic distributions. These results are proved in section 7 by use of tools from empirical process theory. In section 5, we present an example involving data from a bladder tumor study in which we compute and compare both estimators. In section 6, we present the results of simulation studies comparing the two estimators.

2. Characterization of Two Nonparametric Estimators

Suppose that $N = \{N(t) : t \ge 0\}$ is a counting process with mean function $EN(t) = \Lambda_0(t)$, $K$ is an integer-valued random variable, and $T = \{T_{k,j}, j = 1, \ldots, k, k = 1, 2, \ldots\}$ is a triangular array of observation times, where $N$ and $(K, T)$ are independent, and $T_{k,j-1} \le T_{k,j}$ for $j = 1, \ldots, k$ and $k = 1, 2, \ldots$. Let $X = (N_K, T_K, K)$, with a possible value $x = (n_k, t_k, k)$, where $N_k = (N_{k,1}, \ldots, N_{k,k})$ with $N_{k,j} = N(T_{k,j})$, $j = 1, 2, \ldots, k$, and $T_k$ is the $k$th row of the triangular array $T$. Suppose we observe $n$ i.i.d. copies of $X$: $X_1, X_2, \ldots, X_n$, where $X_i = (N^{(i)}_{K_i}, T^{(i)}_{K_i}, K_i)$, $i = 1, 2, \ldots, n$. Here $(N^{(i)}, T^{(i)}, K_i)$, $i = 1, 2, \ldots$ are the underlying i.i.d. copies of $(N, T, K)$.

Our goal is to construct nonparametric estimators of $\Lambda_0$ and study the properties of these estimators. If we assume the non-homogeneous Poisson process model (1.1) and ignore the dependency of the events within a subject, we can form a "pseudo log-likelihood function" for $\Lambda$ given by

$$l_n^{ps}(\Lambda | X) = \sum_{i=1}^{n} \sum_{j=1}^{K_i} \left\{ N^{(i)}_{K_i,j} \log \Lambda(T^{(i)}_{K_i,j}) - \Lambda(T^{(i)}_{K_i,j}) \right\}, \tag{2.1}$$

omitting the parts that are irrelevant in estimating $\Lambda$. Let $s_1 < s_2 < \cdots < s_m$ denote the ordered distinct observation time points in the set of all observation time points $\{T^{(i)}_{K_i,j}, j = 1, \ldots, K_i, i = 1, \ldots, n\}$. For $l \in \{1, \ldots, m\}$, we define

$$w_l = \sum_{i=1}^{n} \sum_{j=1}^{K_i} 1\{T^{(i)}_{K_i,j} = s_l\}, \qquad \bar N_l = \frac{1}{w_l} \sum_{i=1}^{n} \sum_{j=1}^{K_i} N^{(i)}_{K_i,j} 1\{T^{(i)}_{K_i,j} = s_l\},$$

and $\Lambda_l = \Lambda(s_l)$, and write $\Lambda = (\Lambda_1, \ldots, \Lambda_m)$. Then (2.1) can be rewritten, with a slight abuse of notation, as

$$l_n^{ps}(\Lambda | X) = \sum_{l=1}^{m} w_l \left\{ \bar N_l \log \Lambda_l - \Lambda_l \right\}, \tag{2.2}$$

and a nonparametric estimator $\hat\Lambda_n^{ps}$ of $\Lambda_0$ can be defined to be a nondecreasing step function, with possible jumps only occurring at $s_i$, $i = 1, \ldots, m$, that maximizes (2.2). The following two lemmas characterize the nonparametric estimator $\hat\Lambda_n^{ps}$ of $\Lambda_0$. The proofs follow the same lines as those of Propositions 1.1 and 1.2 of Groeneboom and Wellner (1992), and are therefore omitted. Let

$$\Omega_+ = \{ y \in R^m_+ : y_1 \le y_2 \le \cdots \le y_m \}. \tag{2.3}$$

Lemma 2.1. Suppose that $\bar N_1 > 0$ and $z = (z_1, \ldots, z_m) \in \Omega_+$. Then $z$ maximizes (2.2) over $\Omega_+$ if and only if

$$\sum_{j \ge l} w_j \left\{ \frac{\bar N_j}{z_j} - 1 \right\} \le 0 \quad \text{for } l = 1, \ldots, m, \tag{2.4}$$

and

$$\sum_{l=1}^{m} w_l \left\{ \frac{\bar N_l}{z_l} - 1 \right\} z_l = 0. \tag{2.5}$$

Lemma 2.2. Let $H^*$ be the greatest convex minorant of the points $(\sum_{j \le l} w_j, \sum_{j \le l} w_j \bar N_j)$, $l = 1, \ldots, m$, on $[0, \sum_{l=1}^m w_l]$; i.e. for $t \in [0, \sum_{l=1}^m w_l]$,

$$H^*(t) = \sup \Big\{ H(t) : H\Big(\sum_{j \le l} w_j\Big) \le \sum_{j \le l} w_j \bar N_j \text{ for each } l, \ 0 \le l \le m, \ H(0) = 0 \text{ and } H \text{ is convex} \Big\}.$$

Moreover, let $z_i$ be the left derivative of $H^*$ at $\sum_{j \le i} w_j$. Then $z = (z_1, \ldots, z_m)$ is the unique vector maximizing (2.2) over $\Omega_+$.

The left derivative of the greatest convex minorant in Lemma 2.2 can be explicitly solved by the "max-min" formula

$$z_l = \max_{i \le l} \min_{j \ge l} \frac{\sum_{i \le p \le j} w_p \bar N_p}{\sum_{i \le p \le j} w_p}, \qquad l = 1, \ldots, m. \tag{2.6}$$

The estimator $\hat\Lambda_n^{ps}(s_l) = z_l$ given in (2.6), based on the "pseudo log-likelihood function" (2.2), is exactly that of Sun and Kalbfleisch (1995). They derived (2.6) based on isotonic regression considerations introduced in Barlow, Bartholomew, Bremner and Brunk (1972). The relationship between the NPMLE for case 1 interval-censored data and isotonic regression is also developed in Groeneboom and Wellner (1992), page 43.

If we assume the nonhomogeneous Poisson process with joint distribution given by (1.2), it follows that the log likelihood function for $\Lambda$ is given by

$$l_n(\Lambda | X) = \sum_{i=1}^{n} \sum_{j=1}^{K_i} \left( N^{(i)}_{K_i,j} - N^{(i)}_{K_i,j-1} \right) \log \left[ \Lambda(T^{(i)}_{K_i,j}) - \Lambda(T^{(i)}_{K_i,j-1}) \right] - \sum_{i=1}^{n} \Lambda(T^{(i)}_{K_i,K_i}), \tag{2.7}$$

omitting the parts that do not depend on $\Lambda$. Let $t_1 < t_2 < \cdots < t_m$ denote the ordered distinct observation time points in the set of all observation time points $\{T^{(i)}_{K_i,j}, j = 1, \ldots, K_i, i = 1, \ldots, n\}$. Let

$$\mathcal{F} \equiv \{ \Lambda : [0, \infty) \to [0, \infty) \mid \Lambda \text{ is nondecreasing}, \ \Lambda(0) = 0 \}. \tag{2.8}$$

The NPMLE of $\Lambda_0$ is defined to be the non-decreasing, non-negative step function, with possible jumps only occurring at $t_j$, $j = 1, 2, \ldots, m$, that maximizes (2.7). Let

$$\Omega = \{ (u_1, u_2, \ldots, u_m) : 0 \le u_1 \le \cdots \le u_m < \infty \}$$

and let the map $A : \mathcal{F} \to \Omega$ be defined by

$$u = A(\Lambda) = (\Lambda(t_1), \Lambda(t_2), \ldots, \Lambda(t_m)) \quad \text{for all } \Lambda \in \mathcal{F}.$$

We note that the map $A$ does not have an inverse on $\mathcal{F}$, but an inverse map does exist on the subclass $\mathcal{F}_0$ of $\mathcal{F}$ which consists of the nondecreasing step functions with jumps only possibly occurring at $t_j$, $j = 1, 2, \ldots, m$. We use the notation $A^{-1}$ for this map on the subclass $\mathcal{F}_0$. We also define a rank function

$$R : \{ T^{(i)}_{K_i,j} : j = 1, 2, \ldots, K_i, \ i = 1, 2, \ldots, n \} \to \{1, 2, \ldots, m\}$$

such that $R(T^{(i)}_{K_i,j}) = s$ if $T^{(i)}_{K_i,j} = t_s$. Then the log likelihood function can be rewritten as

$$\phi(u | X) = l_n(A^{-1} u | X) = \sum_{i=1}^{n} \left[ \sum_{j=1}^{K_i} \left( N^{(i)}_{K_i,j} - N^{(i)}_{K_i,j-1} \right) \log \left( u_{R(T^{(i)}_{K_i,j})} - u_{R(T^{(i)}_{K_i,j-1})} \right) - u_{R(T^{(i)}_{K_i,K_i})} \right]$$

and the NPMLE $\hat\Lambda_n$ of $\Lambda_0$ is given by $\hat\Lambda_n = A^{-1} \hat u_n$, where

$$\hat u_n = \operatorname{argmax}_{u \in \Omega} \phi(u | X). \tag{2.9}$$

We note that $\Omega$ is a convex set and that $\phi(u)$ is a concave function, since it is a linear combination of concave "log" terms and linear terms. Hence the maximization problem (2.9) is well defined. The necessary and sufficient conditions that characterize the solution of the maximization problem (2.9) can be formulated via the Fenchel duality theorem. The following theorem is stated as Lemma 3.1 of Wellner and Zhan (1997); it is originally from Jongbloed (1995, 1998), and was reformulated slightly by Groeneboom (1996).

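Theorem 2.1. Let $\phi : R^n \to R \cup \{-\infty\}$ be a continuous concave function. Let $\mathcal{K} \subset R^n$ be a convex cone, and let $\mathcal{K}_0 = \mathcal{K} \cap \phi^{-1}(R)$. Suppose $\mathcal{K}_0$ is nonempty and $\phi$ is differentiable on $\mathcal{K}_0$. Then $\hat x \in \mathcal{K}_0$ maximizes $\phi$ over $\mathcal{K}$ if and only if $\hat x$ satisfies

$$\langle \hat x, \nabla \phi(\hat x) \rangle = 0 \tag{2.10}$$

and

$$\langle x, \nabla \phi(\hat x) \rangle \le 0 \quad \text{for all } x \in \mathcal{K}. \tag{2.11}$$

In view of the special structure of the convex set $\Omega$ in our problem, any $u \in \Omega$ can be expressed as

$$u = \sum_{j=1}^{m} (u_{m-j+1} - u_{m-j}) 1_j,$$

where $u_0 = 0$ and $1_j = (0, \ldots, 0, 1, \ldots, 1)$ is the vector with $m - j$ zeros followed by $j$ ones. Therefore, applying Theorem 2.1 to our problem, the condition (2.11) can be reformulated as

$$\sum_{j \ge i} \frac{\partial \phi}{\partial u_j}(\hat u) \le 0 \quad \text{for } i = 1, 2, \ldots, m. \tag{2.12}$$

The equivalence between (2.11) and (2.12) can be proved by following the same argument as given on pages 39-40 of Groeneboom and Wellner (1992). Denote

$$\phi_l(u) = \frac{\partial \phi}{\partial u_l}(u) = \sum_{i=1}^{n} \phi_{i,l}(u) \quad \text{for } l = 1, 2, \ldots, m,$$

where

$$\phi_{i,l}(u) = \sum_{j=1}^{K_i - 1} \left[ \frac{N^{(i)}_{K_i,j} - N^{(i)}_{K_i,j-1}}{u_{R(T^{(i)}_{K_i,j})} - u_{R(T^{(i)}_{K_i,j-1})}} - \frac{N^{(i)}_{K_i,j+1} - N^{(i)}_{K_i,j}}{u_{R(T^{(i)}_{K_i,j+1})} - u_{R(T^{(i)}_{K_i,j})}} \right] \delta_{i,j}(l) + \left[ \frac{N^{(i)}_{K_i,K_i} - N^{(i)}_{K_i,K_i-1}}{u_{R(T^{(i)}_{K_i,K_i})} - u_{R(T^{(i)}_{K_i,K_i-1})}} - 1 \right] \delta_{i,K_i}(l)$$

and

$$\delta_{i,j}(l) = \begin{cases} 1 & \text{if } T^{(i)}_{K_i,j} = t_l, \\ 0 & \text{otherwise.} \end{cases}$$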

Then the NPMLE of the mean function, $\hat\Lambda_n$, can be characterized by the following:

$$\hat\Lambda_n = A^{-1} \hat u, \tag{2.13}$$

$$\sum_{l=1}^{m} \left[ \sum_{i=1}^{n} \phi_{i,l}(\hat u) \right] \hat u_l = 0, \tag{2.14}$$

and

$$\sum_{l=p}^{m} \sum_{i=1}^{n} \phi_{i,l}(\hat u) \le 0 \quad \text{for all } p = 1, 2, \ldots, m. \tag{2.15}$$

3. Computation of the Estimators

As noted in section 2, the NPMPLE $\hat\Lambda_n^{ps}$ can be computed in one step via the max-min formula (2.6). Our proposed method for computation of the NPMLE $\hat\Lambda_n$ characterized by (2.13)-(2.15) involves the iterative convex minorant algorithm (ICM). This algorithm has proved to be numerically stable in many similar problems; see, e.g., Groeneboom and Wellner (1992), Jongbloed (1995, 1998), and Wellner and Zhan (1997). For any $u = (u_1, u_2, \ldots, u_m) \in \Omega$ denote

$$\phi_{ll}(u) = \frac{\partial^2 \phi}{\partial u_l^2}(u) = \sum_{i=1}^{n} \phi_{i,ll}(u) \quad \text{for } l = 1, 2, \ldots, m,$$

where

$$\phi_{i,ll}(u) = - \sum_{j=1}^{K_i - 1} \left[ \frac{N^{(i)}_{K_i,j} - N^{(i)}_{K_i,j-1}}{\left( u_{R(T^{(i)}_{K_i,j})} - u_{R(T^{(i)}_{K_i,j-1})} \right)^2} + \frac{N^{(i)}_{K_i,j+1} - N^{(i)}_{K_i,j}}{\left( u_{R(T^{(i)}_{K_i,j+1})} - u_{R(T^{(i)}_{K_i,j})} \right)^2} \right] \delta_{i,j}(l) - \frac{N^{(i)}_{K_i,K_i} - N^{(i)}_{K_i,K_i-1}}{\left( u_{R(T^{(i)}_{K_i,K_i})} - u_{R(T^{(i)}_{K_i,K_i-1})} \right)^2} \delta_{i,K_i}(l),$$

with $\delta_{i,j}(l) = 1$ if $T^{(i)}_{K_i,j} = t_l$ and $0$ otherwise. We define two processes $G(u, \cdot)$ and $V(u, \cdot)$ by:

$$G(u, 0) = 0, \tag{3.1}$$

$$G(u, p) = \sum_{l=1}^{p} (-\phi_{ll}(u)) = - \sum_{l=1}^{p} \sum_{i=1}^{n} \phi_{i,ll}(u) \quad \text{for } p = 1, 2, \ldots, m, \tag{3.2}$$

and

$$V(u, 0) = 0, \tag{3.3}$$

$$V(u, p) = \sum_{l=1}^{p} [\phi_l(u) + u_l (-\phi_{ll}(u))] = \sum_{l=1}^{p} \sum_{i=1}^{n} [\phi_{i,l}(u) - u_l \phi_{i,ll}(u)] \quad \text{for } p = 1, 2, \ldots, m. \tag{3.4}$$

Theorem 3.1. A vector $u = \hat u$ satisfies (2.14) and (2.15) if and only if $\hat u$ is the left derivative of the convex minorant of the cumulative sum diagram consisting of the following points:

$$P_0 = (0, 0), \qquad P_l = (G(\hat u, l), V(\hat u, l)), \quad l = 1, 2, \ldots, m.$$

This theorem is actually Theorem 4.3 of Wellner and Zhan (1997), page 952, upon noting that

$$G(u, p) = \sum_{j=1}^{p} \left( - \frac{\partial^2 \phi}{\partial u_j^2} \right) \quad \text{and} \quad V(u, p) = \sum_{j=1}^{p} \left( \frac{\partial \phi}{\partial u_j} - u_j \frac{\partial^2 \phi}{\partial u_j^2} \right) \quad \text{for } p = 1, 2, \ldots, m.$$

In general, Theorem 3.1 does not give an explicit solution $\hat u$ as simple as that we have seen in Wellner and Zhang (1998) for the pseudo maximum likelihood estimator. Solving for $\hat u$ often requires an iterative procedure as follows.

Iterative convex minorant algorithm: Let

$$\Delta_{i,j}(u) = \frac{V(u, i) - V(u, j)}{G(u, i) - G(u, j)}$$

and let $\eta$ be the accuracy parameter.

step 1: Select an initial guess $u^{(0)} \in \Omega$.

step 2: Update the solution by $u_l^{(k+1)} = \max_{0 \le j < l} \min_{l \le i \le m} \Delta_{i,j}(u^{(k)})$, $l = 1, 2, \ldots, m$, $k = 0, 1, 2, \ldots$.

step 3: If $\left| \sum_{l=1}^{m} \left[ \sum_{i=1}^{n} \phi_{i,l}(u^{(k+1)}) \right] u_l^{(k+1)} \right| > \eta$ or $\max_{1 \le p \le m} \sum_{l=p}^{m} \sum_{i=1}^{n} \phi_{i,l}(u^{(k+1)}) > \eta$, go back to step 2; otherwise stop the iteration here.

Although the ICM works fine in many applications, it remains a problem that the algorithm might not be globally convergent. Moreover, the iteration will have to stop if the target function $\phi(u)$ hits negative infinity (this is quite possible in our problem because of the logarithm of differences in the likelihood structure). To avoid that, Jongbloed (1995) designed a so-called "Modified Iterative Convex Minorant algorithm (MICM)" by inserting a binary line search procedure into the ICM, and proved its global convergence. The essence of the line search procedure is to keep the iteration on the right track, and also to automatically avoid negative infinity during the procedure. We sketch the MICM method for our problem as follows.

Modified iterative convex minorant algorithm: Let $\eta$ be the accuracy parameter, $\epsilon$ the line search parameter, and $u^{(0)}$ the initial point satisfying $\phi(u^{(0)}) > -\infty$.

    Begin
      u = u^{(0)}
      While | \sum_{l=1}^m [ \sum_{i=1}^n \phi_{i,l}(u) ] u_l | > \eta  or  \max_{1 \le p \le m} \sum_{l=p}^m \sum_{i=1}^n \phi_{i,l}(u) > \eta
          (violation of the characterization (2.14) or (2.15))
      Do Begin
        \tilde u_l = \max_{0 \le j < l} \min_{l \le i \le m} \Delta_{i,j}(u),  l = 1, 2, ..., m
        If \phi(\tilde u) \le \phi(u) + (1 - \epsilon) \nabla\phi(u)^T (\tilde u - u)
        Do (the binary line search; the two tests are repeated until neither holds)
        Begin
          \lambda = 1,  s = 1/2,  z = \tilde u
          If \phi(z) < \phi(u) + \epsilon \nabla\phi(u)^T (z - u)
            \lambda = \lambda - s,  z = u + \lambda (\tilde u - u),  s = s/2
          Else if \phi(z) > \phi(u) + (1 - \epsilon) \nabla\phi(u)^T (z - u)
            \lambda = \lambda + s,  z = u + \lambda (\tilde u - u),  s = s/2
          Else u = z
        End
        Else u = \tilde u
      End
    End.

Our experience with this algorithm is that inclusion of the line search (Armijo's rule) is very important. See Jongbloed (1995, 1998) for a proof of global convergence of the modified ICM algorithm.

4. Asymptotic Theory: Results

Although the estimator $\hat\Lambda_n^{ps}$ described by (2.6) is also given in Sun and Kalbfleisch (1995), the properties and behavior of this estimator are still unknown. For the consistency of the estimator $\hat\Lambda_n^{ps}$, Sun and Kalbfleisch (1995) referred to some results for isotonic regression given by Brunk (1970). In the present problem, however, the average counts $\bar N_l$, $l = 1, \ldots, m$, are not independent, and hence do not satisfy the conditions of Brunk's theorems. The estimator $\hat\Lambda_n$ is described here for the first time. Here we will use empirical process theory to study the properties of the estimators $\hat\Lambda_n^{ps}$ and $\hat\Lambda_n$. We establish consistency of both estimators in $L_2$-metrics related to the observation scheme, and establish the asymptotic distribution of the estimators at fixed time points under some mild conditions.

First some notation. Let $\mathcal{B}$ denote the collection of Borel sets in $R$, and let $\mathcal{B}_{[0,\tau]} = \{ B \cap [0, \tau] : B \in \mathcal{B} \}$. On $([0, \tau], \mathcal{B}_{[0,\tau]})$ we define measures $\mu$, $\mu_2$, and $\nu$ as follows: for $B, B_1, B_2 \in \mathcal{B}_{[0,\tau]}$, set

$$\mu(B) = \sum_{k=1}^{\infty} P(K = k) \sum_{j=1}^{k} P(T_{k,j} \in B | K = k), \tag{4.1}$$

$$\mu_2(B_1 \times B_2) = \sum_{k=1}^{\infty} P(K = k) \sum_{j=1}^{k} P(T_{k,j-1} \in B_1, T_{k,j} \in B_2 | K = k), \tag{4.2}$$

and

$$\nu(B) = \sum_{k=1}^{\infty} P(K = k) P(T_{k,k} \in B | K = k). \tag{4.3}$$

Let $\mathcal{F}$ be the class of functions defined in (2.8), and let $d$ be the $L_2(\mu)$ metric on $\mathcal{F}$; thus for $\Lambda_1, \Lambda_2 \in \mathcal{F}$,

$$d(\Lambda_1, \Lambda_2) = \left\{ \int |\Lambda_1(t) - \Lambda_2(t)|^2 \, d\mu(t) \right\}^{1/2}.$$

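We also define a metric $d_2$ on $\mathcal{F}$ in terms of $\mu_2$:

$$d_2(\Lambda_1, \Lambda_2) = \left\{ \int\!\!\int |\Lambda_1(v) - \Lambda_1(u) - (\Lambda_2(v) - \Lambda_2(u))|^2 \, d\mu_2(u, v) \right\}^{1/2}.$$

The metrics $d$ and $d_2$ are closely related, and if $P(K \le k_0) = 1$ for some $k_0 < \infty$, then they are equivalent:

$$\frac{1}{2} d_2(\Lambda_1, \Lambda_2) \le d(\Lambda_1, \Lambda_2) \le k_0 \, d_2(\Lambda_1, \Lambda_2). \tag{4.4}$$

Moreover, with $\tilde\mu$ defined by

$$\tilde\mu(B) = \sum_{k=1}^{\infty} P(K = k) \frac{1}{k^2} \sum_{j=1}^{k} P(T_{k,j} \in B | K = k), \tag{4.5}$$

the corresponding metric $\tilde d$ defined by $\tilde d^2(\Lambda_1, \Lambda_2) = \int [\Lambda_1(t) - \Lambda_2(t)]^2 \, d\tilde\mu(t)$ satisfies

$$\tilde d(\Lambda_1, \Lambda_2) \le d_2(\Lambda_1, \Lambda_2). \tag{4.6}$$

(Proofs of (4.4) and (4.6) will be given along with the proof of Theorem 4.2 in section 7.) The measure $\mu$ was introduced by Schick and Yu (1997); note that $\mu$ and $\mu_2$ are finite measures if $E(K) < \infty$. Also note that $d^2(\Lambda_1, \Lambda_2)$ can be written in terms of an expectation as:

$$d^2(\Lambda_1, \Lambda_2) = E \left[ \sum_{j=1}^{K} (\Lambda_1(T_{K,j}) - \Lambda_2(T_{K,j}))^2 \right]. \tag{4.7}$$

Let $G_k$ denote the joint conditional distribution of $T_k = (T_{k,1}, \ldots, T_{k,k})$ conditional on $K = k$, and $G_{k,j}$ the marginal conditional distribution of $T_{k,j}$ given $K = k$. Then (4.7) follows easily by the following calculation:

$$E \left[ \sum_{j=1}^{K} (\Lambda_1(T_{K,j}) - \Lambda_2(T_{K,j}))^2 \right] = \sum_{k=1}^{\infty} P(K = k) \sum_{j=1}^{k} \int (\Lambda_1(t_{k,j}) - \Lambda_2(t_{k,j}))^2 \, dG_k(t_k)$$

$$= \sum_{k=1}^{\infty} P(K = k) \sum_{j=1}^{k} \int (\Lambda_1(t_{k,j}) - \Lambda_2(t_{k,j}))^2 \, dG_{k,j}(t_{k,j}) = \int |\Lambda_1(t) - \Lambda_2(t)|^2 \, d\mu(t) = d^2(\Lambda_1, \Lambda_2).$$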

To prove consistency of the maximum pseudo likelihood estimator $\hat\Lambda_n^{ps}$ and the maximum likelihood estimator $\hat\Lambda_n$ we will use the following regularity conditions on the true mean function and the underlying distribution of observation times:

Conditions:

A. The observation times $T_{k,j}$, $j = 1, \ldots, k$, $k = 1, 2, \ldots$, are random variables taking values in the bounded set $[0, \tau]$ where $\tau \in (0, \infty)$.

B. The true mean function $\Lambda_0$ satisfies $\Lambda_0(\tau) \le M$ for some $M \in (0, \infty)$.

C. The function $M_0^{ps}$ defined by $M_0^{ps}(X) \equiv \sum_{j=1}^{K} N_{K,j} \log(N_{K,j})$ satisfies $P M_0^{ps}(X) < \infty$.

D. The function $M_0$ defined by $M_0(X) \equiv \sum_{j=1}^{K} \Delta N_{K,j} \log(\Delta N_{K,j})$, where $\Delta N_{K,j} = N_{K,j} - N_{K,j-1}$, satisfies $P M_0(X) < \infty$.

Note that Assumption C holds if

$$P \left( \sum_{j=1}^{K} (N_{K,j})^{1+\delta} \right) < \infty \tag{4.8}$$

for some $\delta > 0$. Both assumption C and the sufficient condition (4.8) can be expressed in terms of the measure $\mu$, since $P M_0^{ps} = \int E[N(t) \log(N(t))] \, d\mu(t)$ and $P \left( \sum_{j=1}^{K} N_{K,j}^{1+\delta} \right) = \int E[(N(t))^{1+\delta}] \, d\mu(t)$. Similarly, Assumption D holds if

$$P \left( \sum_{j=1}^{K} (\Delta N_{K,j})^{1+\delta} \right) < \infty \tag{4.9}$$

for some $\delta > 0$, and both assumption D and the sufficient condition (4.9) can be expressed in terms of the measure $\mu_2$, since we have

$$P M_0 = \int\!\!\int E[(N(v) - N(u)) \log(N(v) - N(u))] \, d\mu_2(u, v)$$

and

$$P \left( \sum_{j=1}^{K} (\Delta N_{K,j})^{1+\delta} \right) = \int\!\!\int E[(N(v) - N(u))^{1+\delta}] \, d\mu_2(u, v).$$

In the current setting, our two consistency theorems for the pseudo likelihood estimator $\hat\Lambda_n^{ps}$ and the full maximum likelihood estimator $\hat\Lambda_n$ are as follows:

Theorem 4.1. (Consistency of the NPMPLE). Suppose that A, B, and C hold, $E(K) < \infty$, and that $E(N(t)) = \Lambda_0(t)$. Then, for every $b < \tau$ for which $\mu([b, \tau]) > 0$,

$$d(\hat\Lambda_n^{ps} 1_{[0,b]}, \Lambda_0 1_{[0,b]}) \to 0 \quad \text{a.s. as } n \to \infty.$$

In particular, if $\mu(\{\tau\}) > 0$, then

$$d(\hat\Lambda_n^{ps}, \Lambda_0) \to 0 \quad \text{a.s. as } n \to \infty.$$

This conclusion also holds if

$$\limsup_{n \to \infty} \hat\Lambda_n^{ps}(\tau) < \infty \quad \text{almost surely}. \tag{4.10}$$

Note that by the max-min definition of $\hat\Lambda_n^{ps}$, (4.10) holds if $\limsup_{n \to \infty} \max_{1 \le i \le n} N^{(i)}(\tau) < \infty$ a.s., and this is always true if the counting processes $N^{(i)}$ have at most finitely many jumps.

Theorem 4.2. (Consistency of the NPMLE). Suppose that A, B, and D hold, $E(K) < \infty$, and that $E(N(t)) = \Lambda_0(t)$. Then, for every $b < \tau$ for which $\nu([b, \tau]) > 0$,

$$d(\hat\Lambda_n 1_{[0,b]}, \Lambda_0 1_{[0,b]}) \to 0 \quad \text{a.s. as } n \to \infty.$$

In particular, if $\nu(\{\tau\}) > 0$, then

$$d(\hat\Lambda_n, \Lambda_0) \to_p 0 \quad \text{as } n \to \infty.$$

These consistency theorems have a number of important corollaries concerning consistency of the estimators in stronger metrics under additional hypotheses on the mean function and the observation scheme, just as in the treatment by Schick and Yu (1998) of "mixed case interval censoring", but we will forego those corollaries here. We now turn to asymptotic distribution theory at a fixed point $t_0$.

Theorem 4.3. (Asymptotic distribution, NPMPLE) In addition to conditions A-C, suppose that:

E1. There is an $\epsilon > 0$ and $M_1 > 0$, such that $E(K^{2+\epsilon}) < \infty$ and $E(N^{2+\epsilon}(t)) \le M_1$ for all $t \in S$.

E2. For a fixed $t_0 \in S$, there is a neighborhood of $t_0$ such that $G_{k,j}$ is differentiable, and $G'_{k,j}(s)$ is continuous in this neighborhood. Moreover, $G'_{k,j}(t_0)$ is positive and uniformly bounded for all $j = 1, 2, \ldots, k$, $k = 1, 2, \ldots$.

E3. In the neighborhood of $t_0$ described in E2, the true mean function $\Lambda_0$ is differentiable, and its derivative $\Lambda_0'$ is continuous and strictly positive.

E4. Suppose that $G_{k,i,j}(s, t) = P(T_{k,i} \le s, T_{k,j} \le t)$ is differentiable with respect to $s$ and $t$, and in a neighborhood of $(t_0, t_0)$, $g_{k,i,j}(s, t) \equiv \partial^2 G_{k,i,j}(s, t) / \partial s \, \partial t$ exists and is uniformly bounded for all $i, j = 1, 2, \ldots, k$, $k = 1, 2, \ldots$.

E5. $\sigma^2(t) \equiv \mathrm{Var}(N(t))$ is continuous in a neighborhood of $t_0$.

Then

$$n^{1/3} \left( \hat\Lambda_n^{ps}(t_0) - \Lambda_0(t_0) \right) \to_d \left[ \frac{\sigma^2(t_0) \Lambda_0'(t_0)}{2 G'(t_0)} \right]^{1/3} 2 \operatorname{argmax}_h \{ Z(h) - h^2 \}, \tag{4.11}$$

where $G'(t) = \sum_{k=1}^{\infty} P(K = k) \sum_{j=1}^{k} G'_{k,j}(t)$, and $Z$ is a two-sided Brownian motion process, starting from zero.

Now we will study the "toy estimator" $\hat\Lambda_n^{(0)}$ corresponding to $\hat\Lambda_n$. As in Groeneboom and Wellner (1992), this is defined to be the result of carrying out one step of the iterative convex minorant algorithm starting with the truth, namely $\Lambda_0$. For progress in the study of the full NPMLE of the distribution function $F$ in the case of interval censored data, see Groeneboom (1996), section 4.2. While it seems clear that the toy version of the NPMLE and the full NPMLE will be asymptotically equivalent, at least under "strict separation hypotheses" on the observation time distributions, the complete proof of this will require more effort. We postpone this to future research. Moreover, for the moment we will establish the pointwise asymptotic distribution of $\hat\Lambda_n^{(0)}$ only when the hypothesized non-homogeneous Poisson process model used to derive the estimator is valid.
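The limit variable $2 \operatorname{argmax}_h \{ Z(h) - h^2 \}$ in (4.11) has no elementary closed form, but it is easy to get a rough feel for it by simulation. The following is a minimal sketch under assumptions of our own: the two-sided Brownian motion is approximated by Gaussian random walks on a grid, and the search is truncated to $|h| \le 3$ (the names `sample_argmax`, `half_width` and `step` are illustrative). It is an approximation, not an exact sampler.

```python
import numpy as np


def sample_argmax(rng, half_width=3.0, step=0.01):
    """One approximate draw of argmax_h {Z(h) - h^2} on a grid over [-half_width, half_width]."""
    n = int(round(half_width / step))
    h = step * np.arange(1, n + 1)
    # Two independent random-walk approximations to Brownian motion started at
    # zero, one for each side of the origin (this gives the two-sided process).
    right = np.cumsum(rng.normal(0.0, np.sqrt(step), size=n))
    left = np.cumsum(rng.normal(0.0, np.sqrt(step), size=n))
    grid = np.concatenate((-h[::-1], [0.0], h))
    path = np.concatenate((left[::-1], [0.0], right))
    # Location of the maximum of the path with parabolic drift.
    return grid[np.argmax(path - grid ** 2)]


rng = np.random.default_rng(0)
draws = 2.0 * np.array([sample_argmax(rng) for _ in range(500)])
print("sample mean:", draws.mean())
print("sample standard deviation:", draws.std())
```

By symmetry of $Z$ the limit distribution is symmetric about zero, so the sample mean should be close to zero; multiplying the draws by the constant in (4.11) gives an approximation to the limit law of the normalized estimator.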
Suppose that the distributions $G_{k,j,j'}$ have densities $g_{k,j,j'}$ with respect to Lebesgue measure on $R^2$, and define functions $H_{i,k,j}$, $i = 1, 2$, $j = 1, \ldots, k$, $k = 1, 2, \ldots$, and $H$ as follows:

$$H_{1,k,j}(t_{k,j}) = \int_0^{t_{k,j}} \frac{1}{\Lambda_0(t_{k,j}) - \Lambda_0(t_{k,j-1})} \, g_{k,j-1,j}(t_{k,j-1}, t_{k,j}) \, dt_{k,j-1},$$

$$H_{2,k,j}(t_{k,j}) = \int_{t_{k,j}}^{\infty} \frac{1}{\Lambda_0(t_{k,j+1}) - \Lambda_0(t_{k,j})} \, g_{k,j,j+1}(t_{k,j}, t_{k,j+1}) \, dt_{k,j+1},$$

and

$$H(t) = \sum_{k=1}^{\infty} P(K = k) \left\{ \sum_{j=1}^{k-1} [H_{1,k,j}(t) + H_{2,k,j}(t)] + H_{1,k,k}(t) \right\}. \tag{4.12}$$

We will assume that all the $H_{i,k,j}$'s, and hence also $H$, are finite. We also define

$$H_{1,k,j}(t_{k,j}; \delta) \equiv \int_0^{t_{k,j}} \frac{1}{\Lambda_0(t_{k,j}) - \Lambda_0(t_{k,j-1})} \, 1_{[1/(\Lambda_0(t_{k,j}) - \Lambda_0(t_{k,j-1})) > \delta]} \, g_{k,j-1,j}(t_{k,j-1}, t_{k,j}) \, dt_{k,j-1},$$

$$H_{2,k,j}(t_{k,j}; \delta) \equiv \int_{t_{k,j}}^{\infty} \frac{1}{\Lambda_0(t_{k,j+1}) - \Lambda_0(t_{k,j})} \, 1_{[1/(\Lambda_0(t_{k,j+1}) - \Lambda_0(t_{k,j})) > \delta]} \, g_{k,j,j+1}(t_{k,j}, t_{k,j+1}) \, dt_{k,j+1},$$

and

$$H(t; \delta) \equiv \sum_{k=1}^{\infty} P(K = k) \left[ \sum_{j=1}^{k-1} \{ H_{1,k,j}(t; \delta) + H_{2,k,j}(t; \delta) \} + H_{1,k,k}(t; \delta) \right].$$

Then we will assume the following asymptotic negligibility condition:

$$\int_{(t_0, \, t_0 + t/a]} H(u; \epsilon a) \, du \to 0 \quad \text{as } a \to \infty \tag{4.13}$$

for each $\epsilon > 0$ and $t > 0$.

Theorem 4.4. (Asymptotic distribution, NPMLE toy version) In addition to conditions A, B, and D, suppose that:

F1. $E(K^{2+\epsilon}) < \infty$ for some $\epsilon > 0$ and the asymptotic negligibility condition (4.13) holds.

F2. The counting process $N$ is a nonhomogeneous Poisson process with mean function $\Lambda_0$.

F3. For a fixed $t_0 \in S$, there is a neighborhood of $t_0$ such that $G_{k,j}$ is differentiable, and $G'_{k,j}(s)$ is continuous in this neighborhood. Moreover, $G'_{k,j}(t_0)$ is positive and uniformly bounded for all $j = 1, 2, \ldots, k$, $k = 1, 2, \ldots$.

F4. In the neighborhood of $t_0$ described in F3, the true mean function $\Lambda_0$ is differentiable, and its derivative $\Lambda_0'$ is continuous and strictly positive.

F5. The function $H$ defined in (4.12) is continuous and strictly positive in a neighborhood of $t_0$.

Then

$$n^{1/3} \left( \hat\Lambda_n^{(0)}(t_0) - \Lambda_0(t_0) \right) \to_d \left[ \frac{\Lambda_0'(t_0)}{2 H(t_0)} \right]^{1/3} 2 \operatorname{argmax}_h \{ Z(h) - h^2 \}, \tag{4.14}$$

where $Z$ is a two-sided Brownian motion process, starting from zero.

Remark 1. It should be emphasized that the hypotheses of Theorems 4.1-4.3 do not require that the true counting process $N$ be a (non-homogeneous) Poisson process. Thus these results yield a very strong robustness property of both the pseudo-likelihood estimator $\hat\Lambda_n^{ps}$ and the NPMLE $\hat\Lambda_n$.

Remark 2. We conjecture that Theorems 4.1 and 4.2 continue to hold without the restriction to $[0, b]$ with $b < \tau$ when $\mu(\{\tau\}) = 0$ or $\nu(\{\tau\}) = 0$ respectively, with the present $L_2(\mu)$ metrics replaced by their $L_1(\mu)$ counterparts, but we have not succeeded in proving the needed uniform integrability to deduce the stronger results. Despite this shortcoming, Theorems 4.1 and 4.2 have several useful corollaries concerning pointwise and uniform consistency of $\hat\Lambda_n^{ps}$ and $\hat\Lambda_n$, along the lines of the corresponding corollaries of Schick and Yu (1997) (e.g. their Corollary 2.8). We forego a detailed exposition of those corollaries here.

Remark 3. Suppose that for each $i = 1, 2, \ldots$, the $G'_{k,i}(t)$ are all equal to $G'_i(t)$ for $k = i, i + 1, \ldots$. Then $G'(t)$ in Theorem 4.3 can be rewritten as

$$G'(t) = \sum_{i=1}^{\infty} G'_i(t) \sum_{k=i}^{\infty} P(K = k) = \sum_{i=1}^{\infty} G'_i(t) P(K \ge i) = G'_1(t) + \sum_{i=2}^{\infty} G'_i(t) P(K \ge i) \ge G'_1(t).$$

Hence $G'(t)$ is bigger than $G'_1(t)$, the density function of the observation time in the case of current status data, the case $K = 1$ with probability 1. Hence under the above condition the estimator $\hat\Lambda_n^{ps}$ of the mean function based on panel count data has smaller variance than the estimator based only on current status data; i.e. multiple observation times help in estimating $\Lambda_0$ under the above condition. Since the numerator of the constant appearing in (4.11) depends only on the counting process $N$, whenever the denominator $G'(t_0)$ is bigger than the corresponding term $g(t_0)$ in the case of current status data, the asymptotic variance is smaller.

The following examples illustrate the generality of these theorems, and relate them to previous results in several important special cases.

Example 4.1. (Current status data, simple (one event) counting process.) Suppose that the counting process $N$ is the simple counting process $N(t) = 1_{[Y \le t]}$ where $Y$ is a nonnegative random variable with distribution function $F$. Then $\Lambda_0(t) = F(t)$, and $\sigma^2(t) = F(t)(1 - F(t))$. If $K = 1$ with probability one, then our model reduces to "current status data" for $Y$ (or "interval censoring case 1" in the terminology of Groeneboom and Wellner (1992)). Moreover it is easily seen that the estimator $\hat\Lambda_n^{ps}(t)$ of $\Lambda_0(t) = F(t)$ is exactly the NPMLE of $F$ studied in Groeneboom and Wellner (1992), and the convergence in distribution in Theorem 4.3 agrees exactly with Theorem 5.1, page 89, Groeneboom and Wellner (1992) upon noting that in this case

$$\frac{\sigma^2(t_0) \Lambda_0'(t_0)}{2 G'(t_0)} = \frac{F(t_0)(1 - F(t_0)) f(t_0)}{2 g(t_0)}$$

(with $G' \equiv g$).

Example 4.2. (Interval censoring case 2, simple counting process.) Suppose that $N$ is a simple indicator counting process (so that $\Lambda_0(t) = F(t)$ and $\sigma^2(t) = F(t)(1 - F(t))$) as in Example 4.1, but now suppose that $K = 2$ with probability one, so that $G(t) = G_{2,1}(t) + G_{2,2}(t)$, the sum of the two marginal distributions of $T_2 = (T_{2,1}, T_{2,2})$. In this case the estimator $\hat\Lambda_n^{ps}(t)$ of $\Lambda_0(t) = F(t)$ differs from the NPMLE of $F$ studied by Groeneboom (1991), Groeneboom and Wellner (1992), Wellner (1995), and Groeneboom (1996), and is somewhat inefficient, even under the "separation type" hypotheses under which the NPMLE $\hat F_n(t_0)$ converges to $F(t_0)$ at rate $n^{1/3}$; c.f. Wellner (1995) and Groeneboom (1996). (Under non-separation, positive density along the diagonal type hypotheses on the distribution $G_2$ of $T_2$, the arguments of Groeneboom (1991) and Groeneboom and Wellner (1992) suggest that $\hat F_n(t_0)$ converges to $F(t_0)$ at rate $(n \log n)^{1/3}$.) For example, if $G_2$ has density $g_2(t_1, t_2) = (\alpha + 2)(\alpha + 1)(t_2 - t_1)^{\alpha} 1_{[0 \le t_1 \le t_2 \le 1]}$ as in Example 2, page 276 of Wellner (1995), and if $F(x) = x$, $0 \le x \le 1$, then the asymptotic relative efficiency of the estimator $\hat\Lambda_n^{ps}(t_0)$ relative to the NPMLE, based on the conjectured asymptotic distribution given in Wellner (1995), is, at a point $t_0 \in (0, 1)$,

$$\mathrm{ARE}(\hat\Lambda_n^{ps}, \mathrm{NPMLE})(t_0) = \frac{t_0^{1+\alpha} + (1 - t_0)^{1+\alpha}}{t_0^{2+\alpha} + (1 - t_0)^{2+\alpha} + \alpha^{-1}(1 + \alpha) \left\{ t_0 (1 - t_0)^{1+\alpha} + t_0^{1+\alpha} (1 - t_0) \right\}} = \frac{2\alpha}{1 + 2\alpha} \quad \text{when } t_0 = 1/2,$$

and the latter ranges from $0$ to $1$ as $\alpha$ goes from $0$ to $\infty$.

Example 4.3. (Current status data, nonhomogeneous Poisson process.) Suppose that the counting process $N$ is a non-homogeneous Poisson process with mean function $\Lambda_0$. Furthermore, suppose that $K = 1$ with probability one. Then there is just one term in the inner sum in (2.1), and hence the pseudo log-likelihood $l_n^{ps}(\Lambda | X)$ is actually the true log-likelihood.
Thus the development in section 2 shows that $\hat\Lambda_n^{ps}$ is the NPMLE of $\Lambda_0(t)$ in this case. Our Theorems 4.1 and 4.2 show that $\hat\Lambda_n^{ps}$ and $\hat\Lambda_n$ are consistent estimators of $\Lambda_0$, even if the counting process $N$ is not a Poisson process.

Example 4.4. ("Mixed case" interval censored data, simple counting process.) Suppose that $N$ is a simple indicator (one event) counting process, $N(t) = 1_{[Y \le t]}$, as in Examples 4.1 and 4.2, but now suppose that $K$ is random, so that we have a variable number of observation times for each individual. This is what Schick and Yu (1997) refer to as "mixed case" interval censored data. The estimators $\hat\Lambda_n^{ps}$ and $\hat\Lambda_n$ of $\Lambda_0 = F$ both differ from the NPMLE of $F$ in this case (at least when $K_i \ge 2$ for some $i$), but Theorems 4.1 and 4.2 guarantee that they are at least consistent, and Theorems 4.3 and 4.4 give conditions under which their (pointwise) convergence rates are $n^{1/3}$.

5. An Example

The data, extracted from Andrews and Herzberg (1985), are from a bladder tumor study conducted by the Veterans Administration Co-operative Urological Research Group (VACURG). Previous studies of this data set can be found in Byar, Blackard and the VACURG (1977), Byar (1980), and Wei, Lin and Weissfeld (1989). This was a randomized clinical trial. All patients had superficial bladder tumors when they entered the trial. They were randomly divided into three groups: one placebo group and two treatment groups (one group was assigned pyridoxine pills, and the other periodic instillation of a chemotherapeutic agent, thiotepa, into the bladder). The follow-up time periods varied among patients. At each follow-up visit, any tumors noticed were counted, measured and then removed transurethrally, and the treatment was continued. The purpose of the study was to determine the effect of treatment on the frequency of tumor recurrence.

In the data set as reported in Andrews and Herzberg (1985), there were three missing values on tumor count. Patients 47 and 113 had missing tumor counts at the 5th and 6th follow-up visits, respectively. Since the tumor sizes were also missing at their corresponding follow-up visits, we simply deleted these observation times. However, for patient 65, there was a missing tumor count, but not a missing size, at the 3rd follow-up visit; we therefore called his count 1, since this is a reasonable lower bound.
Another difficulty with this data set is the truncation of the tumor counts: in the original data set, a count of 8 actually denotes that there were 8 or more tumor recurrences by the observation time. Unfortunately, we have no way to recover the true tumor counts completely, so we should expect that our estimator is biased downward from the true mean count. Ways of handling such truncation in counting process settings such as this remain a problem for further research.

This data set was also considered by Wei, Lin and Weissfeld (1989). They treated the clinical visit times as the times of tumor recurrence and applied marginal Cox regression models adapted to the multivariate data to estimate the treatment effect on the marginal distribution of tumor recurrence times. Here we treat the clinical visit times as the observation times, and take the tumor count as the main response variable. Thus the data can be treated as panel count data, and fit quite naturally in the setting for panel count data introduced in Sections 1-3. The effect of treatment can then be seen through the comparison of the mean functions of tumor counts across the treatment groups.
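A minimal sketch of how a pseudo maximum likelihood estimate of a mean function might be computed from panel count data of this kind, assuming the usual isotonic-regression characterization of the Sun-Kalbfleisch estimator: average the cumulative counts at each distinct observation time, then apply weighted pool-adjacent-violators with the number of observations at each time as weights. The function names and the input layout (a list of per-subject `(times, counts)` pairs) are hypothetical and are not taken from the authors' C implementation.

```python
from collections import defaultdict


def pava(values, weights):
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit."""
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def pseudo_mle(subjects):
    """Estimate the mean function on the grid of distinct observation times.

    subjects: list of (times, counts) pairs, where counts[j] is the
    cumulative count observed for that subject at times[j].
    """
    total = defaultdict(float)
    num = defaultdict(int)
    for times, counts in subjects:
        for t, c in zip(times, counts):
            total[t] += c
            num[t] += 1
    grid = sorted(total)
    means = [total[t] / num[t] for t in grid]   # average count at each time
    weights = [num[t] for t in grid]            # observations at each time
    return grid, pava(means, weights)
```

For example, `pseudo_mle([([1, 2], [1, 3]), ([1, 3], [3, 4]), ([2], [0])])` pools the first two time points, whose raw averages (2 and 1.5) are out of order, before returning a nondecreasing fit.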

We estimated the mean functions of tumor counts for the three groups by both the pseudo maximum likelihood method (i.e. the Sun-Kalbfleisch estimator) and the full maximum likelihood method as defined in Section 2. The three estimated mean functions using the pseudo maximum likelihood method are shown in Figure 1. We can see that the treatment using thiotepa seems to reduce the tumor recurrence, assuming that randomization has already adjusted for the effects of initial tumor count and size.

The three estimated mean functions using the full maximum likelihood method are also shown in Figure 1. They agree qualitatively with the estimated mean functions using pseudo maximum likelihood, but there are also substantial differences. As can be seen from the plot, the NPMLE $\widehat{\Lambda}_n$ tends to be smaller than the pseudo maximum likelihood estimator $\widehat{\Lambda}_n^{ps}$, though the shapes of the curves are basically the same. This makes sense because the maximum pseudo likelihood estimator $\widehat{\Lambda}_n^{ps}$ is essentially a "mean-type" estimator (it can be viewed as the "pool-adjacent-violators" estimator). The shapes of the mean functions based on the NPMLEs imply that treatment using thiotepa seems to reduce the tumor recurrence, assuming that randomization has adjusted for the effects of initial tumor count and size.

Another approach would involve development of a regression model for tumor counts, and could include the treatment, initial tumor count, and initial tumor size as covariates. This would allow for semiparametric estimation of treatment effects on tumor counts with adjustments for the initial tumor count and size. Some work on models of this type has been carried out in Zhang (1998).

6. Simulations

To compare the properties of the NPMPLE $\widehat{\Lambda}_n^{ps}$ and the NPMLE $\widehat{\Lambda}_n$, we describe two Monte Carlo studies in this section. Before describing the Monte Carlo experiments, we first explain our approach to estimation of the asymptotic relative efficiency of two estimators in this type of problem.
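To study the asymptotic relative efficiency between two estimators, let us suppose that the estimators $\hat{a}_{n_a}$ and $\hat{b}_{n_b}$ have the same asymptotic distributions up to positive constants $\sigma_a$ and $\sigma_b$:
\[
n_a^{1/3} (\hat{a}_{n_a} - a) \to_d \sigma_a^{1/3} Z
\quad \text{and} \quad
n_b^{1/3} (\hat{b}_{n_b} - b) \to_d \sigma_b^{1/3} Z ,
\]
respectively, where $Z$ is a known random variable. Then
\[
\mathrm{Var}(\hat{a}_{n_a}) = n_a^{-2/3} \sigma_a^{2/3} \mathrm{Var}(Z)
\quad \text{and} \quad
\mathrm{Var}(\hat{b}_{n_b}) = n_b^{-2/3} \sigma_b^{2/3} \mathrm{Var}(Z) .
\]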

[Figure 1: Estimators of the Mean Functions, Bladder Tumors Data. Groups: Placebo, Pyridoxine, Thiotepa; curves: NPMPLE and NPMLE for each group. Vertical axis: number of tumors. Horizontal axis: time since the beginning of trial.]

If we ask the two estimators to have the same variance, asymptotically, we find that this forces
\[
\left( \frac{n_b}{n_a} \right)^{2/3} = \left( \frac{\sigma_b}{\sigma_a} \right)^{2/3} .
\]
This implies that the asymptotic relative efficiency of two estimators with $n^{1/3}$ convergence rate is simply the cube of the ratio of the constants that appear in the asymptotic distributions:
\[
\lim \frac{n_b}{n_a} = \frac{\sigma_b}{\sigma_a} .
\]

On the other hand, we note that
\[
\mathrm{Var}(\hat{a}_n) = n^{-2/3} \sigma_a^{2/3} \mathrm{Var}(Z)
\quad \text{and} \quad
\mathrm{Var}(\hat{b}_n) = n^{-2/3} \sigma_b^{2/3} \mathrm{Var}(Z) ,
\]
and hence
\[
\lim \frac{n_b}{n_a} = \frac{\sigma_b}{\sigma_a} = \left( \frac{\mathrm{Var}(\hat{b}_n)}{\mathrm{Var}(\hat{a}_n)} \right)^{3/2} .
\]
Based on the above arguments, we plot the 3/2 power of the ratio of the sample variance of the NPMLE to the sample variance of the NPMPLE to obtain an approximation to the asymptotic relative efficiency.

Simulation 1: (Data from the Poisson process model) Let $\{(N_i, T_i, K_i) : i = 1, 2, \ldots, n\}$ be a random sample, where $K_i \in \{1, 2, \ldots, 6\}$ for each $i = 1, 2, \ldots, n$ and $P(K_i = k) = 1/6$ for $k = 1, 2, \ldots, 6$. The observation times $T_i = (T^{(i)}_{K_i,1}, T^{(i)}_{K_i,2}, \ldots, T^{(i)}_{K_i,K_i})$ are the order statistics of $K_i$ random observations generated from the distribution Unif(0, 10). To allow observation times to be tied, and to keep distinct times separated, the $T^{(i)}_{K_i,j}$, $j = 1, 2, \ldots, K_i$, are rounded to the 2nd decimal place. The panel counts $N_i = (N^{(i)}_{K_i,1}, N^{(i)}_{K_i,2}, \ldots, N^{(i)}_{K_i,K_i})$ are generated from Poisson($2t$), i.e.
\[
N^{(i)}_{K_i,j} - N^{(i)}_{K_i,j-1} \sim \mathrm{Poisson}\left[ 2 \left( T^{(i)}_{K_i,j} - T^{(i)}_{K_i,j-1} \right) \right] .
\]
We choose n = 1, = :1 and = :2 to run the simulation. Since the NPMPLE $\widehat{\Lambda}_n^{ps}$ is consistent by Theorem 4.1, it is a good choice as a starting point for the Modified Iterative Convex Minorant (MICM) algorithm to compute $\widehat{\Lambda}_n$, and starting the MICM procedure there speeds up the computation. However, the NPMPLE may yield a value of negative infinity for the log likelihood function. To avoid this difficulty, we used a piecewise linear interpolation so that the starting point for the MICM is strictly increasing. Based on these preparations, the MICM stops at step 195.

Figure 3 displays both the NPMPLE and NPMLE of the mean function, along with the true mean function $\Lambda_0(t) = 2t$. Both the maximum pseudo likelihood estimator and the NPMLE appear to converge to the true mean function. Since the NPMLE appears to have more jumps than the NPMPLE, it suggests that the NPMLE has less variability than the NPMPLE, in agreement with Theorems 4.3 and 4.4. To assess the variability of the estimators, we carried out a Monte Carlo study by repeating the simulation 1 times.
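The data-generating scheme of Simulation 1 can be sketched as follows. This is an illustrative reading of the description above, not the authors' C code: the function names are hypothetical, the defaults (rate 2, observation window (0, 10), at most six observation times) are assumptions matching that reading, and the Poisson draw uses Knuth's multiplication method, which is adequate for the small means that arise here.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method."""
    if mean <= 0:
        return 0
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_subject(rng, rate=2.0, t_max=10.0, k_max=6):
    """Generate (times, counts) for one subject under a Poisson(rate * t) model."""
    k = rng.randint(1, k_max)  # number of observation times, uniform on 1..k_max
    # Ordered observation times, rounded to two decimals so ties can occur.
    times = sorted(round(rng.uniform(0.0, t_max), 2) for _ in range(k))
    counts, total, prev = [], 0, 0.0
    for t in times:
        # Independent Poisson increment over the interval (prev, t].
        total += poisson(rng, rate * (t - prev))
        counts.append(total)
        prev = t
    return times, counts


def simulate_sample(n, seed=0):
    """Generate n independent subjects with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_subject(rng) for _ in range(n)]
```

Each subject's counts are cumulative, so they are nondecreasing in time by construction; a tied pair of observation times simply receives a zero increment.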
The pointwise variances at time points t =

1.5, 2.0, 2.5, 3.0, ..., 9.5 are calculated via the sample variances from 1 runs. Table 1 lists the bias and standard errors at those points for both the NPMPLE and NPMLE, respectively. Asymptotic unbiasedness for both estimators appears to hold, and the NPMLE does indeed have smaller variances at all these points, as expected. The estimated relative efficiency of the NPMPLE with respect to the NPMLE is plotted in Figure 3. As can be seen from Figure 3, the NPMLE tends to be more efficient than the NPMPLE: at most of those time points, the estimated relative efficiency of the NPMPLE is about 2%.

Simulation 2: (Data from a one-jump counting process) Let $\{(\Delta_i, T_i, K_i) : i = 1, 2, \ldots, n\}$ be a random sample, where $K_i$ and $T_i$ are generated following the same schemes as those in Simulation 1. Let $V_i$ denote the failure time, generated from the distribution Exp(.2). The panel counts $\Delta_i = (\Delta^{(i)}_{K_i,1}, \Delta^{(i)}_{K_i,2}, \ldots, \Delta^{(i)}_{K_i,K_i})$, $i = 1, 2, \ldots, n$, are given by
\[
\Delta^{(i)}_{K_i,j} = 1_{[V_i \le T^{(i)}_{K_i,j}]} \quad \text{for } j = 1, 2, \ldots, K_i .
\]
Because the counting process in this simulation always has just one jump, this situation can be viewed as "mixed case interval censored data," as described by Schick and Yu (1997). Because of this, at most two terms are needed in forming the likelihood function, and this means that a data reduction procedure analogous to that described in Definition 1.1 of Groeneboom and Wellner (1992) is necessary in the coding to simplify the algorithm. We choose n = 3, = :1 and = :2 to run the simulation. Again, the linearly interpolated NPMPLE $\widehat{\Lambda}_n^{ps}$ was chosen as the starting point for the MICM algorithm. Under these tolerances, the MICM procedure stopped after 174 iterative steps for a simulated sample. Figure 3 displays both the NPMPLE and NPMLE of the distribution function of the failure time, $F(t) = P(V \le t) = 1 - \exp(-.2t)$ (the mean function is just the distribution function of the failure time in a one-jump process).
The same phenomenon as in Simulation 1 occurs in this study: both the NPMPLE and the NPMLE appear to converge to the true target function $F(t)$, and the NPMLE tends to have less variation than the NPMPLE. This study serves as numerical evidence that both the NPMPLE $\widehat{\Lambda}_n^{ps}$ and the NPMLE $\widehat{\Lambda}_n$ based on the Poisson process assumption are robust with respect to the actual distribution of the underlying counting process.

A Monte Carlo study based on 1 runs of Simulation 2 was also implemented to compare the NPMPLE and NPMLE. Table 2 shows the results of the study by listing the bias and standard errors of the estimators at points t = 1.5, 2.0, 2.5, 3.0, ..., 9.5. These estimators both appear to be asymptotically unbiased, with the NPMLE being less variable. The estimated relative efficiency of the NPMPLE with respect to the NPMLE is plotted in Figure 4. As can be seen from Figure 4, the NPMLE tends to be more efficient than the NPMPLE: at most of those time points, the estimated relative efficiency of the NPMPLE is about 3% - 4% in this second situation.
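The relative efficiency estimate used in both Monte Carlo studies can be sketched in a few lines: at a fixed time point, take the sample variances of the two estimators across replications and raise their ratio to the 3/2 power, as the cube-root asymptotics suggest. The function name and the input layout (two lists of replicated estimates at one time point) are hypothetical.

```python
from statistics import variance


def estimated_relative_efficiency(npmple_estimates, npmle_estimates):
    """Estimated efficiency of the NPMPLE relative to the NPMLE at one time point.

    Both arguments are Monte Carlo replicates of the respective estimator
    evaluated at the same time t. The estimate is
    (Var(NPMLE) / Var(NPMPLE)) ** (3/2), using sample variances.
    """
    ratio = variance(npmle_estimates) / variance(npmple_estimates)
    return ratio ** 1.5
```

A value below 1 indicates that the NPMLE is the less variable estimator at that time point; for instance, a variance ratio of 1/4 corresponds to an estimated relative efficiency of 1/8.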

While the NPMLE is apparently more efficient than the NPMPLE, it should be noted that the implementation of the NPMLE is much more involved than that of the NPMPLE, requires relatively delicate programming structure, and also requires much more computing time. All these implementations were coded in the C language and the plots were produced using S-plus. For executable versions of the algorithms described in Section 3 and the above simulations (implemented in C), see Wellner's web site for software.

7. Asymptotic Results: Proofs

One of the strengths of modern empirical process theory is the generality allowed in the choice of the sample space. Here we exploit that generality. We will use several results from Van der Vaart and Wellner (1996) to prove Theorems 3.1 and 3.2; for the convenience of the reader, these results are restated in the Appendix, Section 8.

Proof of Theorem 4.1: We begin by outlining the main steps in the proof. The basic idea is to show that for almost all $\omega$ the sequence $\{\widehat{\Lambda}_n^{ps}(\cdot, \omega)\}$ is sequentially compact for the topology of pointwise convergence and that every pointwise limit $\Lambda^{\dagger}(\cdot, \omega)$ of $\{\widehat{\Lambda}_n^{ps}(\cdot, \omega)\}$ must satisfy $\mathbb{M}(\Lambda_0) \le \mathbb{M}(\Lambda^{\dagger})$. Since $\mathbb{M}$ has $\Lambda_0$ as its unique maximum, this then yields that every limit equals $\Lambda_0$ a.e. $\mu$. This then yields $\widehat{\Lambda}_n^{ps} \to \Lambda_0$ a.e. $\mu$ almost surely. By the finiteness of $\mu$ and the dominated convergence theorem, this convergence yields
\[
d(\widehat{\Lambda}_n^{ps} 1_{[0,b]}, \Lambda_0 1_{[0,b]}) \to_{a.s.} 0
\]
for every $b$ for which $\limsup_{n \to \infty} \widehat{\Lambda}_n^{ps}(b) < \infty$ almost surely. This last conclusion and sequential compactness will follow from an argument which shows that $\limsup_{n \to \infty} \widehat{\Lambda}_n^{ps}(b) \le C / \mu([b, \tau])$ for every $b < \tau$.

Now for the detailed argument. Let $S \equiv \mathrm{supp}(\mu)$, and set $r = \sup(S)$. It follows from Assumption A that $r < \infty$, and we can take $\tau = r$. Let
\[
\mathbb{M}_n(\Lambda) = \frac{1}{n} l_n^{ps}(\Lambda | X) = \mathbb{P}_n \left( \sum_{j=1}^{K} \left\{ N_{K,j} \log \Lambda(T_{K,j}) - \Lambda(T_{K,j}) \right\} \right) = \mathbb{P}_n m_{\Lambda}
\]
and
\[
\mathbb{M}(\Lambda) = P \left( \sum_{j=1}^{K} \left\{ N_{K,j} \log \Lambda(T_{K,j}) - \Lambda(T_{K,j}) \right\} \right) = P m_{\Lambda}
\]
where
\[
m_{\Lambda}(X) = \sum_{j=1}^{K} \left[ N_{K,j} \log \Lambda(T_{K,j}) - \Lambda(T_{K,j}) \right] . \tag{7.1}
\]

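Our proof of consistency will use the one-sided Glivenko-Cantelli Theorem 8.1. To apply Theorem 8.1, we first need an upper envelope for the class of functions $\mathcal{M} \equiv \{m_{\Lambda} : \Lambda \in \mathcal{F}\}$. Noting that $g(x) = a \log x - x$, $x > 0$, $a \ge 0$, is maximized by $x = a$ (with $0 \log 0 = 0$), we find that
\[
m_{\Lambda}(X) \le \sum_{j=1}^{K} \left[ N_{K,j} \log(N_{K,j}) - N_{K,j} \right] \le \sum_{j=1}^{K} N_{K,j} \log(N_{K,j}) \equiv M_0(X)
\]
where $P M_0(X) < \infty$ by Assumption C.

The pseudo NPMLE $\widehat{\Lambda}_n^{ps}$ is given by
\[
\widehat{\Lambda}_n^{ps} = \mathrm{argmax}_{\Lambda \in \mathcal{F}} \, \mathbb{M}_n(\Lambda) .
\]
Since
\[
\mathbb{M}_n(\widehat{\Lambda}_n^{ps}) \ge \mathbb{M}_n((1 - \epsilon) \widehat{\Lambda}_n^{ps} + \epsilon \Lambda_0)
\]
it follows that
\[
\lim_{\epsilon \downarrow 0} \frac{\mathbb{M}_n((1 - \epsilon) \widehat{\Lambda}_n^{ps} + \epsilon \Lambda_0) - \mathbb{M}_n(\widehat{\Lambda}_n^{ps})}{\epsilon} \le 0 .
\]
Evaluating this limit we find that
\[
\mathbb{P}_n \left( \sum_{j=1}^{K} \left( \frac{N_{K,j}}{\widehat{\Lambda}_{n,K,j}} - 1 \right) \left( \Lambda_{0,K,j} - \widehat{\Lambda}_{n,K,j} \right) \right) \le 0 \tag{7.2}
\]
where $\Lambda_{0,K,j} \equiv \Lambda_0(T_{K,j})$ and $\widehat{\Lambda}_{n,K,j} \equiv \widehat{\Lambda}_n^{ps}(T_{K,j})$. Now (7.2) can be rewritten as
\[
\mathbb{P}_n \left( \sum_{j=1}^{K} \left\{ \frac{N_{K,j}}{\widehat{\Lambda}_{n,K,j}} \Lambda_{0,K,j} + \widehat{\Lambda}_{n,K,j} \right\} \right) - \mathbb{P}_n \left( \sum_{j=1}^{K} \left( \Lambda_{0,K,j} + N_{K,j} \right) \right) \le 0
\]
where the second term on the left side converges a.s., by the strong law of large numbers and Assumptions A and B, to
\[
P \left( \sum_{j=1}^{K} \left( \Lambda_{0,K,j} + N_{K,j} \right) \right) = 2 E \left( \sum_{j=1}^{K} \Lambda_0(T_{K,j}) \right) = 2 \int \Lambda_0(t) \, d\mu(t) \equiv C < \infty .
\]
Hence it follows that for every $b \in (0, \tau]$ one has, almost surely,
\[
C = \limsup_{n \to \infty} \mathbb{P}_n \left( \sum_{j=1}^{K} \left( \Lambda_{0,K,j} + N_{K,j} \right) \right)
\]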

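\[
\begin{aligned}
&\ge \limsup_{n \to \infty} \mathbb{P}_n \left( \sum_{j=1}^{K} \left\{ \frac{N_{K,j}}{\widehat{\Lambda}_{n,K,j}} \Lambda_{0,K,j} + \widehat{\Lambda}_{n,K,j} \right\} \right) \\
&\ge \limsup_{n \to \infty} \mathbb{P}_n \left( \sum_{j=1}^{K} 1_{[b,\tau]}(T_{K,j}) \left\{ \frac{N_{K,j}}{\widehat{\Lambda}_{n,K,j}} \Lambda_{0,K,j} + \widehat{\Lambda}_{n,K,j} \right\} \right) \\
&\ge \limsup_{n \to \infty} \widehat{\Lambda}_n^{ps}(b) \, \mathbb{P}_n \left( \sum_{j=1}^{K} 1_{[b,\tau]}(T_{K,j}) \right) \\
&= \limsup_{n \to \infty} \widehat{\Lambda}_n^{ps}(b) \, \mu([b, \tau]) .
\end{aligned}
\]
Note that either $\mu(\{\tau\}) > 0$, or there exists $0 < b < \tau$ arbitrarily close to $\tau$ such that $\mu([b, \tau]) > 0$.

We first complete the proof in the case when $\mu(\{\tau\}) > 0$. In this case we conclude that on a set with probability one we have
\[
\limsup_{n \to \infty} \widehat{\Lambda}_n^{ps}(\tau) \le \frac{C}{\mu(\{\tau\})} \equiv M < \infty \tag{7.3}
\]
and hence the functions $\{\widehat{\Lambda}_n^{ps}\}$ are a.s. bounded on $[0, \tau]$. Since the sequence of functions $\{\widehat{\Lambda}_n^{ps}(t, \omega) : t \in [0, \tau]\}$ is uniformly bounded (for all $n \ge$ some $N_{\omega}$), it follows from the Helly selection theorem that the sequence $\{\widehat{\Lambda}_n^{ps}(\cdot, \omega)\}$ has a subsequence $\{\widehat{\Lambda}_{n'}^{ps}(\cdot, \omega)\}$ which converges to a nondecreasing function $\Lambda^{\dagger} = \Lambda^{\dagger}_{\omega}$, defined on $[0, \tau]$ and taking values in $[0, M]$. Consider the class of functions
\[
\mathcal{M}_0 = \{ m_{\Lambda}(X) : \Lambda \in \mathcal{F}_0 \} \tag{7.4}
\]
where
\[
\mathcal{F}_0 \equiv \{ \Lambda \in \mathcal{F} : \Lambda(\tau) \le M + 1 \} .
\]
Note that $\mathcal{F}_0$ is compact for the (pseudo-)metric $d$. Moreover, the function $\Lambda \mapsto m_{\Lambda}(x)$ is upper semicontinuous in $\Lambda$ for each fixed $x$, and $m_{\Lambda}(x) \le M_0(x)$ for all $x$ and $\Lambda \in \mathcal{F}_0$, where $M_0$ is integrable by Assumption C. Thus Theorem 8.1 yields
\[
\limsup_{n \to \infty} \sup_{\Lambda \in \mathcal{F}_0} (\mathbb{P}_n - P)(m_{\Lambda}(X)) \le 0 \tag{7.5}
\]
almost surely. Since $\mathbb{M}_n(\Lambda_0) \to_{a.s.} \mathbb{M}(\Lambda_0)$ by the strong law of large numbers and $\mathbb{M}_n(\Lambda_0) \le \mathbb{M}_n(\widehat{\Lambda}_n^{ps})$, it follows that
\[
\mathbb{M}(\Lambda_0) \le \liminf_{n \to \infty} \mathbb{M}_n(\widehat{\Lambda}_n^{ps}) \quad \text{almost surely.} \tag{7.6}
\]
Hence it follows from (7.5) and (7.3) that
\[
\limsup_{n \to \infty} \mathbb{M}_n(\widehat{\Lambda}_n^{ps}) \le P(m_{\Lambda^{\dagger}}) \quad \text{almost surely} \tag{7.7}
\]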

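where the last inequality follows from (7.5) and
\[
\limsup_{n \to \infty} P(m_{\widehat{\Lambda}_{n'}^{ps}}) \le P(m_{\Lambda^{\dagger}})
\]
for any subsequence $\widehat{\Lambda}_{n'}^{ps}$ converging almost surely to $\Lambda^{\dagger}$ on $[0, \tau]$; this follows from the upper semicontinuity of $\Lambda \mapsto P(m_{\Lambda})$, which is a consequence of Theorem 8.1, together with pointwise convergence of $\widehat{\Lambda}_{n'}^{ps}$ on $[0, \tau]$. Combining (7.6) and (7.7) yields
\[
\begin{aligned}
0 \le \mathbb{M}(\Lambda^{\dagger}) - \mathbb{M}(\Lambda_0)
&= -(P m_{\Lambda_0} - P m_{\Lambda^{\dagger}}) \\
&= -P \left( \sum_{j=1}^{K} \left\{ \Lambda_{0,K,j} \log \frac{\Lambda_{0,K,j}}{\Lambda^{\dagger}_{K,j}} - \left( \Lambda_{0,K,j} - \Lambda^{\dagger}_{K,j} \right) \right\} \right) \\
&= -P \left( \sum_{j=1}^{K} \Lambda^{\dagger}_{K,j} \, h \left( \frac{\Lambda_{0,K,j}}{\Lambda^{\dagger}_{K,j}} \right) \right) \\
&= - \int \Lambda^{\dagger}(t) \, h \left( \frac{\Lambda_0(t)}{\Lambda^{\dagger}(t)} \right) d\mu(t) \le 0
\end{aligned} \tag{7.8}
\]
since $h(x) \equiv x(\log x - 1) + 1 \ge 0$ with equality if and only if $x = 1$. This implies that $\Lambda^{\dagger}(t) = \Lambda_0(t)$ a.e. $\mu$. Since this is true for any convergent subsequence, we conclude that all the limits $\Lambda^{\dagger}$ of subsequences of $\{\widehat{\Lambda}_n^{ps}\}$ are equal to $\Lambda_0$ a.e. $\mu$. Since $\Lambda_0(t) \le \Lambda_0(\tau) \le M$ for $0 \le t \le \tau$, this implies that $\lim_{n \to \infty} \widehat{\Lambda}_n^{ps}(t) = \Lambda_0(t) \le \Lambda_0(\tau)$ almost surely for $\mu$-almost all $t \in [0, \tau]$. It follows by the dominated convergence theorem (with dominating function $\Lambda_0(\tau) + 1$, since $\mu$ is a finite measure) that $d(\widehat{\Lambda}_n^{ps}, \Lambda_0) \to_{a.s.} 0$.

Now suppose that $\mu(\{\tau\}) = 0$. By the definition of $r$ it then follows that $\mu([b, \tau]) > 0$ for every $b < \tau$. Then we conclude that on a set with probability one we have
\[
\limsup_{n \to \infty} \widehat{\Lambda}_n^{ps}(b) \le \frac{C}{\mu([b, \tau])} \equiv M_b < \infty \tag{7.9}
\]
and hence the functions $\{\widehat{\Lambda}_n^{ps}\}$ are a.s. bounded on $[0, b]$. Since the sequence of functions $\{\widehat{\Lambda}_n^{ps}(t, \omega) : t \in [0, b]\}$ is uniformly bounded (for all $n \ge$ some $N_{\omega}$), it follows from the Helly selection theorem that the sequence $\{\widehat{\Lambda}_n^{ps}(\cdot, \omega)\}$ has a subsequence $\{\widehat{\Lambda}_{n'}^{ps}(\cdot, \omega)\}$ which converges to a nondecreasing function $\Lambda^{\dagger} = \Lambda^{\dagger}_{\omega, b}$, defined on $[0, b]$ for each $b < \tau$ and taking values in $[0, M_b]$. By considering a sequence of $b$'s converging up to $\tau$ we get a function $\Lambda^{\dagger} = \Lambda^{\dagger}_{\omega}$ which is well-defined on $[0, \tau)$. Set $\chi_b(X) \equiv 1_{[0,b]}(T_{K,K})$, and for $b < \tau$ consider the class of functions
\[
\mathcal{M}_b = \{ \chi_b(X) m_{\Lambda}(X) : \Lambda \in \mathcal{F}_b \} \tag{7.10}
\]
where
\[
\mathcal{F}_b \equiv \{ \Lambda \in \mathcal{F} : \Lambda(b) \le M_b + 1, \ \Lambda(t) = \Lambda(b), \ t \ge b \} .
\]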

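Note that $\mathcal{F}_b$ is compact for every $b < \tau$. Moreover, the functions $\Lambda \mapsto \chi_b(x) m_{\Lambda}(x)$ are upper semicontinuous in $\Lambda$ for each fixed $x$, and $\chi_b(x) m_{\Lambda}(x) \le M_0(x)$ for all $x$ and $\Lambda \in \mathcal{F}_b$, where $M_0$ is integrable by Assumption C. Thus Theorem 8.1 yields
\[
\limsup_{n \to \infty} \sup_{\Lambda \in \mathcal{F}_b} (\mathbb{P}_n - P)(\chi_b(X) m_{\Lambda}(X)) \le 0 \tag{7.11}
\]
almost surely. Since $\mathbb{M}_n(\Lambda_0) \to_{a.s.} \mathbb{M}(\Lambda_0)$ by the strong law of large numbers and $\mathbb{M}_n(\Lambda_0) \le \mathbb{M}_n(\widehat{\Lambda}_n^{ps})$, it follows that
\[
\mathbb{M}(\Lambda_0) \le \liminf_{n \to \infty} \mathbb{M}_n(\widehat{\Lambda}_n^{ps}) \quad \text{almost surely.} \tag{7.12}
\]
Now we can write, for any $\Lambda \in \mathcal{F}$,
\[
\begin{aligned}
\mathbb{M}_n(\Lambda) &= \mathbb{P}_n(\chi_b m_{\Lambda}) + \mathbb{P}_n((1 - \chi_b) m_{\Lambda}) \\
&\le (\mathbb{P}_n - P)(\chi_b m_{\Lambda}) + P(\chi_b m_{\Lambda}) + \mathbb{P}_n((1 - \chi_b) M_0) .
\end{aligned}
\]
Hence it follows from (7.11) and (7.9) that
\[
\limsup_{n \to \infty} \mathbb{M}_n(\widehat{\Lambda}_n^{ps}) \le P(\chi_b m_{\Lambda^{\dagger}}) + P((1 - \chi_b) M_0(X)) \quad \text{almost surely} \tag{7.13}
\]
where the last inequality follows from (7.11) and
\[
\limsup_{n \to \infty} P(\chi_b m_{\widehat{\Lambda}_{n'}^{ps}}) \le P(\chi_b m_{\Lambda^{\dagger}})
\]
for any subsequence $\widehat{\Lambda}_{n'}^{ps}$ converging almost surely to $\Lambda^{\dagger}$ on $[0, \tau)$; this follows from the upper semicontinuity of $\Lambda \mapsto P(\chi_b m_{\Lambda})$, which is a consequence of Theorem 8.1, together with pointwise convergence of $\widehat{\Lambda}_{n'}^{ps}$ on $[0, \tau)$. Letting $b \uparrow \tau$ in (7.13) yields, using Assumptions A and C, on a set with probability one,
\[
\limsup_{n \to \infty} \mathbb{M}_n(\widehat{\Lambda}_n^{ps}) \le P(m_{\Lambda^{\dagger}}) = \mathbb{M}(\Lambda^{\dagger}) . \tag{7.14}
\]
Combining (7.12) and (7.14) yields (7.8) and hence $\Lambda^{\dagger}(t) = \Lambda_0(t)$ a.e. $\mu$ as before. Since this is true for any convergent subsequence, we conclude that all the limits $\Lambda^{\dagger}$ of subsequences of $\{\widehat{\Lambda}_n^{ps}\}$ are equal to $\Lambda_0$ a.e. $\mu$. Since $\Lambda_0(t) \le \Lambda_0(\tau) \le M$ for $0 \le t \le \tau$, this implies that $\lim_{n \to \infty} \widehat{\Lambda}_n^{ps}(t) = \Lambda_0(t) \le \Lambda_0(\tau)$ almost surely for $\mu$-almost all $t \in [0, \tau)$. It follows by the dominated convergence theorem (with dominating function $\Lambda_0(b) + 1$, since $\mu$ is a finite measure) that $d(\widehat{\Lambda}_n^{ps} 1_{[0,b]}, \Lambda_0 1_{[0,b]}) \to_{a.s.} 0$. $\Box$

Proof of Theorem 4.2: We first establish the equivalence of the metrics claimed in (4.4): since $(a - b)^2 \le 2(a^2 + b^2)$, it follows that
\[
\begin{aligned}
d_2^2(\Lambda_1, \Lambda_2)
&= \iint \left[ \Lambda_1(v) - \Lambda_1(u) - (\Lambda_2(v) - \Lambda_2(u)) \right]^2 d\mu_2(u, v) \\
&\le 2 \left\{ \iint \left[ \Lambda_1(v) - \Lambda_2(v) \right]^2 d\mu_2(u, v) + \iint \left[ \Lambda_1(u) - \Lambda_2(u) \right]^2 d\mu_2(u, v) \right\} \\
&\le 4 \int \left[ \Lambda_1(v) - \Lambda_2(v) \right]^2 d\mu(v) = 4 d^2(\Lambda_1, \Lambda_2)
\end{aligned}
\]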

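and this proves the first inequality in (4.4). To prove the second inequality, note that for nonnegative real numbers $a_j, b_j$, $j = 0, 1, \ldots, k$, with $a_0 = 0 = b_0$ we have
\[
(a_j - b_j)^2 = \left( \sum_{i=1}^{j} \left[ (a_i - a_{i-1}) - (b_i - b_{i-1}) \right] \right)^2 \le j \sum_{i=1}^{j} \left[ (a_i - a_{i-1}) - (b_i - b_{i-1}) \right]^2
\]
and hence
\[
\begin{aligned}
\sum_{j=1}^{k} (a_j - b_j)^2
&\le \sum_{j=1}^{k} j \sum_{i=1}^{j} \left[ (a_i - a_{i-1}) - (b_i - b_{i-1}) \right]^2 \\
&= \sum_{i=1}^{k} \left[ (a_i - a_{i-1}) - (b_i - b_{i-1}) \right]^2 \left( \sum_{j=i}^{k} j \right) \\
&\le k^2 \sum_{i=1}^{k} \left[ (a_i - a_{i-1}) - (b_i - b_{i-1}) \right]^2 .
\end{aligned} \tag{7.15}
\]
This yields
\[
\begin{aligned}
d^2(\Lambda_1, \Lambda_2)
&= E \left\{ \sum_{j=1}^{K} \left( \Lambda_1(T_{K,j}) - \Lambda_2(T_{K,j}) \right)^2 \right\} \\
&\le E \left\{ K^2 \sum_{j=1}^{K} \left( (\Lambda_1(T_{K,j}) - \Lambda_1(T_{K,j-1})) - (\Lambda_2(T_{K,j}) - \Lambda_2(T_{K,j-1})) \right)^2 \right\} \\
&\le k_0^2 \, d_2^2(\Lambda_1, \Lambda_2)
\end{aligned}
\]
where we have used $P(K \le k_0) = 1$ in the last inequality. Thus (4.4) is proved. Note that (4.6) follows easily from (7.15) upon dividing across by $k^2$ and integrating.

Now let
\[
\mathbb{M}_n(\Lambda) = \frac{1}{n} l_n(\Lambda | X) = \mathbb{P}_n \left( \sum_{j=1}^{K} \left\{ (N_{K,j} - N_{K,j-1}) \log(\Lambda(T_{K,j}) - \Lambda(T_{K,j-1})) - (\Lambda(T_{K,j}) - \Lambda(T_{K,j-1})) \right\} \right) = \mathbb{P}_n m_{\Lambda}
\]
and
\[
\mathbb{M}(\Lambda) = P \left( \sum_{j=1}^{K} \left[ (N_{K,j} - N_{K,j-1}) \log(\Lambda(T_{K,j}) - \Lambda(T_{K,j-1})) - (\Lambda(T_{K,j}) - \Lambda(T_{K,j-1})) \right] \right) = P m_{\Lambda} .
\]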


Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates Jon A. Wellner 1 and Ying Zhang 2 December 1, 2006 Abstract We consider estimation in a particular semiparametric