The Central Limit Theorem Under Random Truncation
WINFRIED STUTE and JANE-LING WANG

Mathematical Institute, University of Giessen, Arndtstr. 2, D-35392 Giessen, Germany
Department of Statistics, University of California, Davis, CA 95616, USA

Under left truncation, data (X_i, Y_i) are observed only when Y_i ≤ X_i. Usually the distribution function F of the X_i is the target of interest. In this paper we study linear functionals ∫ φ dF_n of the nonparametric MLE of F, the Lynden-Bell estimator F_n. A useful representation of ∫ φ dF_n is derived which yields asymptotic normality under optimal moment conditions on the score function φ. No continuity assumption on F is required. As a by-product we obtain the distributional convergence of the Lynden-Bell empirical process on the whole real line.

Running Title: The CLT under random truncation

Key words: CLT; Lynden-Bell integral; Truncated data.

1 Introduction and main results

In this paper we provide some further methodology for the statistical analysis of data which are truncated from the left. To be more specific, let (X_i, Y_i), i ∈ N, denote a sample of independent bivariate data such that, for each i, X_i is also independent of Y_i. Denote with F and G, respectively, the unknown distribution functions of X and Y. Typically, F is the target of interest. Now, under left truncation, X_i is observed only when Y_i ≤ X_i. As a consequence, the empirical distribution of the X's is unavailable and cannot serve as a basic process from which to compute other statistics. The nonparametric maximum likelihood estimator of F for left-truncated data was first derived by Lynden-Bell (1971). Its first mathematical investigation may be attributed to Woodroofe (1985), who also reviewed some examples of truncated data from astronomy and economics. See also Wang (1989) for applications in the analysis of AIDS data.
Now, denoting with n the number of data which are actually observed, we have, by the Strong Law of Large Numbers (SLLN),

    n/N → α := P(Y ≤ X)  as N → ∞,  with probability one.
Without further mention we shall assume that α > 0, because otherwise nothing will be observed. Of course, α will be unknown. Conditionally on n, the observed data are still independent, but the joint distribution of X_i and Y_i becomes

    H*(x, y) = P(X ≤ x, Y ≤ y | Y ≤ X) = α^{-1} ∫_{(-∞,x]} G(y ∧ z) F(dz).

The marginal distribution of the observed X's thus equals

    F*(x) = α^{-1} ∫_{(-∞,x]} G(z) F(dz).    (1.1)

It may be consistently estimated by the empirical distribution function of the observed X's:

    F*_n(x) = n^{-1} Σ_{i=1}^n 1_{{X_i ≤ x}},  x ∈ R.

The problem, however, is one of reconstructing F, and not F*, from the available data (X_i, Y_i), 1 ≤ i ≤ n. A crucial quantity in this context is the function

    C(z) = P(Y ≤ z ≤ X | Y ≤ X) = α^{-1} G(z)[1 − F(z−)],    (1.2)

where F(z−) = lim_{x↑z} F(x) is the left-continuous version of F and F{z} = F(z) − F(z−) is the F-mass at z. The function C may be consistently estimated by

    C_n(z) = n^{-1} Σ_{i=1}^n 1_{{Y_i ≤ z ≤ X_i}}.

It is very helpful to express the cumulative hazard function of F,

    Λ(x) = ∫_{(-∞,x]} F(dz)/[1 − F(z−)],

in terms of estimable quantities. For this, let a_G = inf{x : G(x) > 0} be the largest lower bound for the support of G. Similarly for F. From (1.1) and (1.2) we obtain

    ∫_{(a_G,x]} F(dz)/[1 − F(z−)] = ∫_{(a_G,x]} F*(dz)/C(z).    (1.3)
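As a quick illustration (a Python sketch with made-up data; the function names are ours, not the paper's), both F*_n and C_n are simple counting estimates computed from the observed pairs:

```python
import numpy as np

def F_star_n(x, X):
    """Empirical d.f. of the observed X's: F*_n(x) = n^{-1} #{i : X_i <= x}."""
    return float(np.mean(np.asarray(X) <= x))

def C_n(z, X, Y):
    """C_n(z) = n^{-1} #{i : Y_i <= z <= X_i}: the fraction 'at risk' at z."""
    X, Y = np.asarray(X), np.asarray(Y)
    return float(np.mean((Y <= z) & (z <= X)))

# Toy observed (truncated) data: each pair satisfies Y_i <= X_i.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([0.5, 1.5, 0.2, 3.5])
print(F_star_n(2.5, X))   # -> 0.5  (two of the four X's are <= 2.5)
print(C_n(2.0, X, Y))     # -> 0.5  (pairs (2.0, 1.5) and (3.0, 0.2) straddle z = 2)
```

Note that, unlike a survival function, C_n(z) is small in both tails and need not be monotone in between; this is exactly the feature the paper identifies as the source of the technical difficulties.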
Provided that a_G ≤ a_F and F{a_F} = 0, the left-hand side equals Λ(x). Otherwise this is no longer true and, as Woodroofe (1985) pointed out, F cannot be fully recovered from the available data. The situation is similar for right-censored data. See Stute and Wang (1993) for a detailed discussion of (upper) boundary effects there. Throughout the paper we shall therefore assume that

    a_G ≤ a_F  and  F{a_F} = 0.    (1.4)

If a_G < a_F the second assumption is superfluous, while it is automatically satisfied when F is continuous. For the moment, however, no other assumptions such as continuity of F or G will be needed. In differential terms, equation (1.3) leads to

    F(dx) = [1 − F(x−)] F*(dx)/C(x).    (1.5)

The Lynden-Bell (1971) estimator F_n of F is obtained as the solution of the so-called self-consistency equation, i.e., as the solution of (1.5) after having replaced F* and C with F*_n and C_n, respectively:

    F_n(dx) = [1 − F_n(x−)] F*_n(dx)/C_n(x).    (1.6)

Solving for F_n yields the product-integral representation of F_n:

    1 − F_n(x) = ∏_{distinct X_i ≤ x} [1 − F*_n{X_i}/C_n(X_i)].    (1.7)

If there are no ties among the X's, (1.7) simplifies to become

    1 − F_n(x) = ∏_{X_i ≤ x} [nC_n(X_i) − 1]/[nC_n(X_i)].    (1.8)

Note that nC_n(X_i) ≥ 1, so that each ratio is well defined. Since in this paper our objective will be to study general linear statistics based on F_n, namely Lynden-Bell integrals ∫ φ dF_n, we also introduce the Lynden-Bell weights W_in attached to each datum in the X-sample. For this, denote with X_{1:n} < ... < X_{m:n} the m distinct order statistics, with m possibly strictly less than n. From (1.7) we obtain

    W_in := F_n{X_{i:n}} = [1 − F_n(X_{i−1:n})] F*_n{X_{i:n}}/C_n(X_{i:n})    (1.9)
and

    ∫ φ dF_n = Σ_{i=1}^m W_in φ(X_{i:n}).    (1.10)

For φ = 1_{(-∞,x]} we are back at F_n(x). As to other φ's, we refer to Stute and Wang (1993) or Stute (2004), who considered possible applications of empirical integrals in the context of right-censored data. More general statistical functionals often admit expansions in which the leading term is of the form ∫ φ dF_n, with φ denoting the associated influence function. Since, for a fully observable data set with F^e_n being the classical empirical distribution function, ∫ φ dF^e_n is just a sample mean to which, under a second-moment assumption, the Central Limit Theorem (CLT) applies, distributional convergence of ∫ φ dF_n therefore constitutes an extension of the CLT to the left-truncation case. The corresponding SLLN has been studied in papers by He and Yang (1998a,b). The CLT for censored data is due to Stute (1995).

In the present situation, the CLT is much more elusive than for randomly censored data. In the censored-data case the right tails create technical difficulties, but not the left. For left truncation, however, both tails create problems. This is already seen through the functions C and C_n, which decrease to zero in the left and in the right tail, and with C_n appearing in denominators. The only trivial bound is nC_n(X_i) ≥ 1, which keeps everything from being ill-defined. Moreover, C and C_n are not monotone, again contrary to the censored-data case, where the role of the C's is played by survival functions. This non-monotone feature of C_n creates additional technical complications for truncated data. Worse than that, C_n may become zero between the data points, also in the central part. Keiding and Gill (1990), who recognized the danger of these sets, have called these holes the empty inner risk sets. Not all authors seem to be aware of these problems, since a detailed study of the C_n process is sometimes missing.
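The product form (1.7)-(1.8) and the induced weights (1.9)-(1.10) can be sketched in a few lines of Python (illustrative only; the helper names are ours):

```python
import numpy as np

def lynden_bell_weights(X, Y):
    """Lynden-Bell weights W_in = F_n{X_{i:n}} built from (1.7)/(1.9);
    ties among the X's are allowed (distinct order statistics are used)."""
    X, Y = np.asarray(X), np.asarray(Y)
    xs = np.unique(X)                 # distinct order statistics X_{1:n} < ... < X_{m:n}
    W = np.empty(len(xs))
    surv = 1.0                        # running value of 1 - F_n(X_{i-1:n})
    for i, x in enumerate(xs):
        # F*_n{x} / C_n(x); n C_n(x) >= 1, so the ratio is well defined
        jump = np.sum(X == x) / np.sum((Y <= x) & (x <= X))
        W[i] = surv * jump
        surv *= 1.0 - jump
    return xs, W

def lb_integral(phi, X, Y):
    """Lynden-Bell integral: sum_i W_in phi(X_{i:n}), as in (1.10)."""
    xs, W = lynden_bell_weights(X, Y)
    return float(np.sum(W * phi(xs)))

# Toy truncated sample (Y_i <= X_i).  With Y's this small the truncation does
# not bite, and F_n reduces to the ordinary empirical distribution function.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([0.5, 0.2, 0.1])
xs, W = lynden_bell_weights(X, Y)
print(W)                                # -> [1/3, 1/3, 1/3]
print(lb_integral(lambda x: x, X, Y))   # -> 2.0, the plain sample mean
```

When the truncation does bite, the weights become unequal and down-weight the sampling bias toward large X's.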
A consequence of these empty inner risk sets, as revealed by the representation (1.7), is the loss of mass of F_n after the first such hole. More specifically, suppose that, in the terminology of Keiding and Gill (1990), a risky hole exists at X_j. This means that X_j is not covered by any other pair (X_i, Y_i). Hence nC_n(X_j) = nF*_n{X_j}. From (1.7) we get that all data points to the right of X_j have mass zero under F_n. In proofs this precludes the use of exponential representations of the weights. To circumvent this difficulty we first study an asymptotically equivalent estimator F̂_n. This estimator is defined through

    ∫ φ dF̂_n = Σ_{i=1}^m φ(X_{i:n}) [F*_n{X_{i:n}}/C_n(X_{i:n})] ∏_{j=1}^{i−1} [1 − nF*_n{X_{j:n}}/(nC_n(X_{j:n}) + 1)].    (1.11)

The extra summand 1 in the denominator allows for contributions of data which have holes on their left. As we shall see in a small simulation study,
it may have a robustifying effect, resulting in a smaller MSE for small sample sizes. Asymptotically, ∫ φ dF_n and ∫ φ dF̂_n are equivalent at the n^{-1/2}-rate, which facilitates the asymptotic theory for ∫ φ dF_n.

Another technical problem is caused by possible ties. In such a situation (1.8) does not apply. Lemma 1.1 will show how the general case covered by (1.7) and (1.9) may be traced back to the case of a continuous F*.

The main result of this paper, Theorem 1.1, provides a representation of ∫ φ dF_n as a sum of i.i.d. random variables under minimal assumptions on φ and the truncation mechanism. Usually, linear i.i.d. representations of complicated estimators will include the Hájek projection of the statistic of interest. In the case of (1.10), however, this projection can be computed only up to remainders, and this is exactly what Theorem 1.1 does. More precisely, the proof of Theorem 1.1 proceeds in two steps. In the first step we have to identify all error terms which are negligible. The leading terms will turn out to be V-statistics. See Serfling (1980). Finally, an application of the Hájek projection to the leading terms yields the desired i.i.d. representation. Needless to say, asymptotic normality follows immediately. We shall also add some interesting comments on a so-called uniform representation. Proofs will be given in Section 3.

Theorem 1.1 will hold under the following two assumptions:

(A)  (i) ∫ φ²/G dF < ∞    (ii) ∫ dF/G < ∞

Condition (ii) already appeared in Woodroofe (1985) in his study of the Lynden-Bell process, i.e., when he considered indicators φ = 1_{(-∞,x₀]}. Non-technically speaking, it is needed to guarantee enough information in the left tails to estimate F at the rate n^{-1/2}. Under slightly stronger assumptions, Stute (1993) obtained an almost sure representation with sharp bounds on the remainder, again for indicators. See also Chao and Lo (1988). Condition (i) guarantees, among other things, that the leading terms in the i.i.d.
representation admit a finite second moment, so that asymptotic normality holds. Since G ≤ 1, it implies ∫ φ² dF < ∞, which is the standard finite-moment assumption when no truncation occurs. When φ has a finite second F-moment and is locally bounded in a neighborhood of a_G, then (i) is implied by (ii). Note also that (i) and (ii) are always satisfied when a_G < a_F and ∫ φ² dF < ∞.

A CLT for truncated data is also contained in Sellero et al. (2005). Apart from continuity assumptions, they also need conditions which, in our notation, require finiteness of the integral

    ∫ φ̃²(x) [1 − F(x)]^{-1} F(dx),

where φ̃ ≥ |φ|. Since, however, this integral equals infinity for constant φ, their result is not applicable to bounded φ's, not to mention φ's which
increase to infinity as x → ∞. Rather, finiteness of the above integral is only obtained for φ's which converge to zero fast enough in the right tails. The focus of this paper is, however, on distributional convergence, for which (A) will suffice.

Theorem 1.1 is formulated for the case when F* is continuous. This guarantees that among the observed X's there will be no ties, with probability one. In such a situation we obtain

    ∫ φ dF̂_n = ∫ [φ(x) γ_n(x)/C_n(x)] F*_n(dx),    (1.12)

with

    γ_n(x) = exp{ n ∫_{y<x} ln[1 − 1/(nC_n(y) + 1)] F*_n(dy) }.    (1.13)

At the end of this section we shall show how general Lynden-Bell integrals may be traced back to the present case.

Theorem 1.1. Under Assumptions (A) and (1.4), assume that F* is continuous. Then we have

    ∫ φ [dF_n − dF] = ∫ [ψ(y)/C(y)] [F*_n(dy) − F*(dy)] − ∫ [C_n(y) − C(y)] [ψ(y)/C²(y)] F*(dy) + o_P(n^{-1/2}),

where

    ψ(y) = φ(y)[1 − F(y)] − ∫ 1_{{y<x}} φ(x) F(dx) = ∫ 1_{{y<x}} [φ(y) − φ(x)] F(dx).

For indicators φ = 1_{(-∞,x₀]}, the leading term already appears in Stute (1993).

Remark 1.1. If one checks the proof of Theorem 1.1 step by step, the following fact will be revealed: if, rather than a single φ, one considers a collection {φ}, then the error terms are uniformly small in the sense that the remainder is o_P(n^{-1/2}) uniformly in φ, provided that |φ| ≤ φ₀ for some φ₀ satisfying ∫ φ₀²/G dF < ∞. Actually, all remainders may be bounded from above in absolute value by replacing φ with φ₀. Compared with Sellero et al. (2005), no VC property for the φ's is needed for a representation as a V-statistic process. For the i.i.d. representation one has to guarantee that the errors in the Hájek projection are also uniformly small. These errors, however, form a class of degenerate U-statistics. This kind of process was studied in Stute (1994), and the bounds achieved there are useful to handle the error terms in the second half of the proof. Details are omitted.
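To make the modified estimator of (1.11) concrete, here is a hedged Python sketch (the helper names are ours); the only change relative to the F_n-weights is the extra "+1" in the denominators of the survival factors:

```python
import numpy as np

def lb_hat_integral(phi, X, Y):
    """Integral of phi dF^_n as in (1.11): the jumps F*_n{X_{i:n}}/C_n(X_{i:n})
    are the same as for F_n, but the product factors use nC_n(X_{j:n}) + 1,
    so an empty inner risk set (a 'risky hole', where nC_n equals the local
    nF*_n-mass) no longer annihilates all mass to its right."""
    X, Y = np.asarray(X), np.asarray(Y)
    xs = np.unique(X)
    total, prod = 0.0, 1.0
    for x in xs:
        d = np.sum(X == x)                   # n * F*_n{x}
        nC = np.sum((Y <= x) & (x <= X))     # n * C_n(x)
        total += phi(x) * (d / nC) * prod
        prod *= 1.0 - d / (nC + 1.0)         # the extra "+1"
    return float(total)

X = np.array([1.0, 2.0, 3.0])
Y = np.array([0.5, 0.2, 0.1])
print(lb_hat_integral(lambda x: x, X, Y))    # -> 31/12, slightly above the
                                             # corresponding F_n-value 2.0
```

For large n the "+1" is negligible, matching the asymptotic equivalence of the two integrals at the n^{-1/2}-rate.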
Corollary 1.1. Under the assumptions of Theorem 1.1, we have

    n^{1/2} ∫ φ [dF_n − dF] → N(0, σ²)  in distribution,

with

    σ² = Var( ψ(X)/C(X) − ∫ 1_{{Y ≤ y ≤ X}} ψ(y)/C²(y) F*(dy) ).

It is not difficult to see that σ² < ∞ under (A). Finally, if in Remark 1.1 we take for {φ} the class of all indicators φ = 1_{(-∞,x]} and set φ₀ ≡ 1, we obtain the following corollary.

Corollary 1.2. Under ∫ dF/G < ∞ and (1.4), and for a continuous F*, we have

    F_n(x) − F(x) = ∫ [ψ_x/C] d[F*_n − F*] − ∫ [(C_n − C)/C²] ψ_x dF* + o_P(n^{-1/2})

uniformly in x, where ψ_x is the ψ belonging to φ = 1_{(-∞,x]}.

Remark 1.2. To make the point clear, Corollary 1.2 provides a representation which holds uniformly on the whole real line and not only on subintervals (-∞, b] with b < b_F = sup{x : F(x) < 1}, as is usually done in the literature.

A major technical problem in proving Theorem 1.1 for a general F* is caused by the fact that for a discontinuous F* ties may arise. As before, denote with X_{1:n} < ... < X_{m:n} the m ordered distinct data in the observed X-sample. To circumvent ties, one may use the fact that each X_i may be written in the form X_i = (F*)^{-1}(U_i), where U_1, ..., U_n is a sample of independent random variables with a uniform distribution on (0,1) and (F*)^{-1} is the quantile function of F*. The construction of the U's is similar to the corresponding construction in Stute and Wang (1993), with H there replaced by F* here. For the following, recall that a quantile function is continuous from the left. Furthermore, with probability one, (F*)^{-1} is also right-continuous at each U_i. With this in mind, one can see that the corresponding truncating sample for the U_i's is F*(Y_i), 1 ≤ i ≤ n. If we denote with C^U_n the C_n-function corresponding to the pseudo-observations (U_i, F*(Y_i)), 1 ≤ i ≤ n, and if we let U_{i1:n} < ... < U_{id_i:n} denote the ordered U's that satisfy (F*)^{-1}(U_{ij:n}) = X_{i:n}, with d_i = nF*_n{X_{i:n}}, then

    nC^U_n(U_{i1:n}) = nC_n(X_{i:n})    (1.14)
8 and nc U n (U ij:n ) = nc U n (U i,j :n ), for j d i. (.5) Also note that the Lynden-Bell estimator Fn U for the pseudo observations satisfies (.8), i.e., Fn U (u) = [ ] nc U n (U i ). nc U U i u n (U i ) Introducing the function ϕ = ϕ, the analogue of (.) becomes ϕ df U n = = m d i i= j= ϕ( (U ij:n )) F U n (U ij:n ) nc U n (U ij:n ) m d i ϕ(x i:n ) i= j= Fn U (U ij:n ). ncn U (U ij:n ) In Lemma. we will show that for each i m and every j d i we have F U n (U ij:n ) nc U n (U ij:n ) = F n(x i :n ), (.6) nc n (X i:n ) where X :n =. It follows from (.9), (.) and (.6) that ϕ dfn U = ϕdf n with probability one. In conclusion the study of Lynden-Bell integrals may be traced back to the case when the variables of interest are uniform on (,) and therefore, with probability one, have no ties. At the same time our handling of ties does not require external randomization but maintains the product limit structure of the weights and hence of the estimators. Lemma.. For each i m and j d i, equation (.6) holds true. Simulations It is interesting to compare the small sample size behavior of F n and ˆF n. In a simulation study we considered ϕ(x) = x, i.e., the target was the mean lifetime of X. This ϕ is the canonical score function in the classical CLT. Recall that through truncation there is a sampling bias which would result in an upward bias if we would take the empirical distribution function of the X s and not F n or ˆF n. Introducing more or less complicated weights has the effect, among other things, that compared with the empirical distribution function the bias is reduced. 8
In the following we report on some simulation results which are part of a much more extensive study. Typically, for this φ, ∫ φ dF̂_n outperforms ∫ φ dF_n in terms of the Mean Squared Error (MSE) when n ≤ 40 and truncation is heavy. For larger n (n > 40) the difference is negligible. MSE was computed through Monte Carlo. In the following, both X and Y were exponentially distributed with parameter 1.

Table 1. Comparison of MSE(∫ φ dF_n) and MSE(∫ φ dF̂_n) over a range of sample sizes n. [Table entries not recoverable from this transcription.]

3 Proofs

To prove Theorem 1.1, note that under continuity of F* formula (1.8) applies. Also, we may assume w.l.o.g. that all data are nonnegative. Hence all integrals appearing hereafter will be over the positive real line. In our first lemma we provide a bound for the function C/C_n. This will be needed in the proofs to handle negligible terms. Though, by Glivenko-Cantelli, C_n → C uniformly, bounding the ratio is more delicate in view of possible holes and the non-monotonicity of C and C_n.

Lemma 3.1. Assume F* is continuous. For any λ such that αλ > 1, one has

    P( sup_{X_i : X_i ≥ b} C(X_i)/C_n(X_i) ≥ λ ) ≤ λ e exp[−G(b)αλ].

Proof. The proof is similar to that of the corresponding lemma in Stute (1993), which provides an exponential bound for the sup extended over the left tails X_i ≤ b rather than the right tails.

Together with the above-mentioned bound from Stute (1993), Lemma 3.1 immediately implies

    sup_{1 ≤ i ≤ n} C(X_i)/C_n(X_i) = O_P(1)  as n → ∞.    (3.1)
Assertion (3.1) will be of some importance in the forthcoming proofs, as it will allow us to replace the random C_n appearing in denominators with the deterministic C. We are now ready to expand ∫ φ dF̂_n. By (1.12) and (1.13),

    ∫ φ dF̂_n = ∫ [φ(x)/C_n(x)] exp{ n ∫_{y<x} ln[1 − 1/(nC_n(y) + 1)] F*_n(dy) } F*_n(dx)
             = ∫ [φ(x)/C_n(x)] γ_n(x) F*_n(dx).

Set

    γ(x) = 1 − F(x) = exp{ − ∫_0^x F*(dy)/C(y) }.

From Taylor's expansion we obtain

    γ_n(x) = γ(x) + e^{ξ_n(x)} [B_n(x) + D_n1(x) + D_n2(x)],

where

    B_n(x) = n ∫_{y<x} ln[1 − 1/(nC_n(y) + 1)] F*_n(dy) + ∫_{y<x} F*_n(dy)/[C_n(y) + n^{-1}],

    D_n1(x) = ∫_{y<x} F*(dy)/C(y) − ∫_{y<x} F*_n(dy)/C(y),

    D_n2(x) = ∫_{y<x} { [C_n(y) + n^{-1} − C(y)] / ([C_n(y) + n^{-1}] C(y)) } F*_n(dy),

and ξ_n(x) is between the exponents of γ_n(x) and γ(x). In particular, we have ξ_n(x) ≤ 0. Setting

    S_n1 = ∫ [φ(x)/C_n(x)] e^{ξ_n(x)} B_n(x) F*_n(dx),

    S_n2 = ∫ [φ(x) γ(x)/C_n(x)] F*_n(dx),

    S_n3 = ∫ [φ(x)/C_n(x)] e^{ξ_n(x)} D_n1(x) F*_n(dx)
and

    S_n4 = ∫ [φ(x)/C_n(x)] e^{ξ_n(x)} D_n2(x) F*_n(dx),

we thus get

    ∫ φ dF̂_n = S_n1 + S_n2 + S_n3 + S_n4.    (3.2)

In the next lemma we study the functions D_n1 and D_n2 more closely. To motivate the following, note that, for each fixed x₀ such that F(x₀) < 1, D_n1(x₀) → 0 with probability one. Actually, by standard Glivenko-Cantelli arguments, D_n1(x) → 0 with probability one, uniformly in x ≤ x₀. Similarly, when we consider the standardized processes x → n^{1/2} D_n1(x), it is easy to show their distributional convergence in the Skorokhod space D[0, x₀]. Things, however, change if we study D_n1 on the whole support of F*. Since Λ(∞) = ∫ F*(dy)/C(y) = ∞, we cannot expect uniform convergence on the whole support of F*. The situation is similar for the cumulative hazard function Λ itself, where uniform convergence of the Nelson-Aalen estimator Λ_n may be obtained only on compact subsets of the support of F. When one evaluates these processes at x = X_i, it is known, though, that the uniform deviation between Λ_n and Λ does not go to zero but remains at least bounded. See Zhou (1991). Similar things turn out to be true for D_n1 and D_n2, as Lemma 3.2 will show. Our proofs are different, though, since compared with Zhou (1991) we shall apply a truncation technique which guarantees that the sups of D_n1 and D_n2 are bounded on large but not too large sets.

Lemma 3.2. We have

    sup_{1 ≤ i ≤ n} |D_n1(X_i)| = O_P(1)    (3.3)

and

    sup_{1 ≤ i ≤ n} |D_n2(X_i)| = O_P(1).    (3.4)
Proof. Assume w.l.o.g. that F* has unbounded support on the right. Otherwise, replace ∞ with b_F = sup{x : F*(x) < 1}. For a given ε > 0, one may find some small c = c_ε and a sequence a_n ↑ ∞ such that 1 − F*(a_n) = c/n and P(X_{n:n} > a_n) ≤ ε. Actually, this follows from

    P(X_{n:n} ≤ a_n) = [1 − c/n]^n → exp(−c).

It therefore suffices to bound D_n1 and D_n2 on (0, a_n]. By Lemma 3.1, we may replace the C_n in the denominator by C. Hence, with large probability and up to a constant factor, (3.4) is bounded from above by

    ∫_0^{a_n} |C_n(y) − C(y)| /C²(y) F*_n(dy) + n^{-1} ∫_0^{a_n} F*_n(dy)/C²(y).

Using (ii) of (A), it is easily seen that the expectation of the second term is bounded as n → ∞. The expectation of the first term is less than or equal to

    ∫_0^{a_n} { E|C_n(y) − C(y)| + n^{-1} } /C²(y) F*(dy) ≤ ∫_0^{a_n} { n^{-1/2} C^{1/2}(y) + n^{-1} } /C²(y) F*(dy) = O(1).

This proves (3.4). As for D_n1, we already mentioned that D_n1 converges to zero with probability one, uniformly on each interval [0, x₀] with F(x₀) < 1. Also, for each fixed x, we have ED_n1(x) = 0 and

    Var D_n1(x) ≤ n^{-1} ∫_0^x F*(dy)/C²(y).

Moreover, by the construction of a_n and the definitions of F* and C, we have Var D_n1(a_n) = O(1). Conclude that D_n1(a_n) = O_P(1). Apply standard tightness arguments for the (non-standardized) process D_n1, see Billingsley (1968), to get that D_n1 is uniformly bounded in probability on [0, a_n]. This, however, implies (3.3) and completes the proof of Lemma 3.2.

Our next lemma implies that S_n1 is negligible.

Lemma 3.3. Under the assumptions of Theorem 1.1, we have S_n1 = o_P(n^{-1/2}).
Proof. We first bound B_n(x). Since, for 0 ≤ x < 1, we have −x/(1 − x) ≤ ln(1 − x) ≤ −x, we obtain

    −n^{-1} ∫_{y<x} F*_n(dy)/C_n²(y) ≤ B_n(x) ≤ 0.    (3.5)

Recall that nC_n ≥ 1 on the support of F*_n, so that the above integrals are all well defined. Conclude from (3.5) that

    |S_n1| ≤ n^{-1} ∫ [|φ(x)|/C_n(x)] e^{ξ_n(x)} ∫_{y<x} F*_n(dy)/C_n²(y) F*_n(dx).

Now, as in the proof of Lemma 3.2, for a given ε > 0 we may choose some small c = c_ε and a sequence a_n ↑ ∞ such that 1 − F*(a_n) = c/n and P(X_{n:n} > a_n) ≤ ε. Hence on this event F*_n has all its mass on [0, a_n], and integration with respect to F*_n may thus be restricted to x ≤ a_n. Furthermore, by Lemmas 3.1 and 3.2, the processes C/C_n and D_n1 + D_n2 remain stochastically bounded when restricted to the support of F*_n, as n → ∞. Together with B_n(x) ≤ 0, it therefore suffices to show that

    n^{-1/2} ∫_0^{a_n} |φ(x)| ∫_{y<x} F*_n(dy)/C²(y) F*_n(dx) = o_P(1).

The expectation of the left-hand side is less than or equal to a constant times

    n^{-1/2} ∫_0^{a_n} |φ(x)| ∫_{y<x} F*(dy)/C²(y) F*(dx).

For any fixed x₀ with F(x₀) < 1, the integral

    ∫_{y<x₀} F*(dy)/C²(y)

is finite, since 1 − F(y−) is bounded away from zero there and, by (1.1) and (1.2),

    F*(dy)/C²(y) = α F(dy)/{G(y)[1 − F(y−)]²},

so that assumption (ii) applies. The same holds true for the integral

    ∫_{x₀}^{a_n} |φ(x)| ∫_{y<x₀} F*(dy)/C²(y) F*(dx).
It remains to study

    ∫_{x₀}^{a_n} |φ(x)| ∫_{x₀ ≤ y < x} F*(dy)/C²(y) F*(dx).

Up to a constant factor, this integral is bounded from above by

    G^{-1}(x₀) ∫_{x₀}^{a_n} |φ(x)| F(dx)/[1 − F(x)].

Now apply Cauchy-Schwarz to get

    ∫_{x₀}^{a_n} |φ(x)| F(dx)/[1 − F(x)] ≤ [ ∫_{x₀}^∞ φ²(x) F(dx) ]^{1/2} [ ∫_0^{a_n} F(dx)/[1 − F(x)]² ]^{1/2}.

The second integral is less than or equal to [1 − F(a_n)]^{-1}. Since

    c/n = 1 − F*(a_n) = α^{-1} ∫_{(a_n,∞)} G(z) F(dz) ≥ α^{-1} G(a_n)[1 − F(a_n)],

the second square root is O(n^{1/2}). On the other hand, the first integral can be made arbitrarily small by choosing x₀ large enough. This concludes the proof of Lemma 3.3.

Next we study S_n2. For this, the following lemma will be crucial.

Lemma 3.4. Under the assumptions of Theorem 1.1, we have

    I_n := ∫ [φγ (C − C_n)²/(C² C_n)] dF*_n = o_P(n^{-1/2}).

Proof. As in the proof of Lemma 3.3, we have X_i ≤ a_n for all i = 1, ..., n with large probability. Similarly, consider a sequence b_n such that F*(b_n) = c/n, c sufficiently small, so that X_i ≥ b_n for i = 1, ..., n with large probability. In other words, up to an event of small probability, we may restrict integration in I_n to the interval [b_n, a_n]. Also, in view of Lemma 3.1, the C_n in the denominator may be replaced by C (times a constant), with large probability. Hence on
a set with large probability we have, up to a constant factor,

    I_n ≤ n^{-1} Σ_{i=1}^n |φ(X_i)| γ(X_i) 1_{{b_n ≤ X_i ≤ a_n}} [C_n(X_i) − C(X_i)]²/C³(X_i).

Writing C_n(X_i) − C(X_i) = n^{-1} Σ_j [1_{{Y_j ≤ X_i ≤ X_j}} − C(X_i)] and expanding the square, the right-hand side splits into a diagonal sum

    n^{-2} Σ_{i=1}^n |φ(X_i)| γ(X_i) 1_{{b_n ≤ X_i ≤ a_n}}/C³(X_i)

and an off-diagonal sum

    n^{-3} Σ_{i=1}^n Σ_{j≠i} Σ_{k≠i,j} [1_{{Y_j ≤ X_i ≤ X_j}} − C(X_i)][1_{{Y_k ≤ X_i ≤ X_k}} − C(X_i)] |φ(X_i)| γ(X_i) 1_{{b_n ≤ X_i ≤ a_n}}/C³(X_i).    (3.6)

The first sum has an expectation not exceeding

    n^{-1} ∫_{b_n}^{a_n} |φ(x)| γ(x) F*(dx)/C³(x).

Fix a small positive x₁ and, as in the previous proof, a large x₀. Assume w.l.o.g. that b_n ≤ x₁ ≤ x₀ ≤ a_n. The integral

    ∫_{x₁}^{x₀} |φ| γ C^{-3} dF*

is finite. So the middle part contributes an error O(n^{-1}), which is more than we want. The upper part, over [x₀, a_n], is dealt with as in the proof of Lemma 3.3, yielding a bound o(n^{-1/2}) as x₀ gets large. The same holds true for the lower part, over [b_n, x₁]. Finally, the second sum in (3.6) may be studied along the same lines as the first, upon starting with the inequality nC_n(X_i) ≥ 1. The proof is complete.

Corollary 3.1. Under the assumptions of Theorem 1.1, we have

    S_n2 = ∫ [φγ/C] dF*_n + ∫ [φγ (C − C_n)/C²] dF*_n + o_P(n^{-1/2}).

We shall come back to S_n2 later, but first proceed to S_n3. Recalling D_n1, we have

    S_n3 = ∫ [φ(x) e^{ξ_n(x)}/C_n(x)] [ ∫_{y<x} F*(dy)/C(y) − ∫_{y<x} F*_n(dy)/C(y) ] F*_n(dx).

To get an expansion for S_n3, the next lemma will be crucial.
Lemma 3.5. Under the assumptions of Theorem 1.1, we have

    II_n := ∫ [φ(x)γ(x)/C_n(x)] [ e^{ξ_n(x) + ∫_{y<x} F*(dy)/C(y)} − 1 ] D_n1(x) F*_n(dx) = o_P(n^{-1/2}).

Proof. By Cauchy-Schwarz, on a set with large probability,

    n^{1/2} |II_n| ≤ { n^{1/2} ∫_0^{a_n} [φ²(x)γ²(x)/C_n²(x)] D_n1²(x) F*_n(dx) }^{1/2} { n^{1/2} ∫_0^{a_n} [ e^{ξ_n(x) + ∫_{y<x} F*(dy)/C(y)} − 1 ]² F*_n(dx) }^{1/2}.    (3.7)

By Lemma 3.1, we may again replace C_n by C. The expectation of the resulting first integral is then less than or equal to a constant times

    n^{-1/2} ∫_0^{a_n} [φ²γ²/C²](x) ∫_{y<x} F*(dy)/C²(y) F*(dx),

which is handled as in the proof of Lemma 3.3 and shown to be o(1) there. It therefore remains to show that the second factor in (3.7) is also o_P(1). Putting

    z_n(x) = ξ_n(x) + ∫_{y<x} F*(dy)/C(y),

we may write

    e^{z_n(x)} − 1 = z_n(x) e^{z̄_n(x)},

where z̄_n(x) is between zero and z_n(x). Recalling that ξ_n(x) lies between the exponents of γ_n(x) and γ(x), we may infer that z_n(x) is uniformly bounded from above in probability, on the support of F*_n, as n → ∞. Hence it suffices to bound the term

    n^{1/2} ∫_0^{a_n} z_n²(x) F*_n(dx),

which in turn is less than or equal to

    n^{1/2} ∫_0^{a_n} [B_n(x) + D_n1(x) + D_n2(x)]² F*_n(dx) ≤ 2 n^{1/2} ∫_0^{a_n} [B_n²(x) + (D_n1(x) + D_n2(x))²] F*_n(dx).
By (3.5),

    ∫_0^{a_n} B_n²(x) F*_n(dx) ≤ n^{-2} ∫_0^{a_n} [ ∫_{y<x} F*_n(dy)/C_n²(y) ]² F*_n(dx).

To bound the right-hand side, replace C_n with C first. Then the squared term may be viewed as a V-statistic, of which the leading term is a U-statistic. Thus we have to show that

    n^{-3/2} ∫_0^{a_n} [ ∫_{y<x} F*(dy)/C²(y) ]² F*(dx) = o(1).

This follows as in the proof of Lemma 3.3; only the powers of 1 − F(x) are different. Finally, the error terms involving D_n1 and D_n2 may be dealt with similarly. The proof is complete.

Lemma 3.6. Under the assumptions of Theorem 1.1, we have

    ∫ [φγ (C_n − C)/(C C_n)](x) [ ∫_{y<x} F*(dy)/C(y) − ∫_{y<x} F*_n(dy)/C(y) ] F*_n(dx) = o_P(n^{-1/2}).

Proof. As before.

Lemmas 3.5 and 3.6 immediately imply the following corollary, which brings us to the desired representation of S_n3.

Corollary 3.2. Under the assumptions of Theorem 1.1, we have the representation

    S_n3 = ∫ [φ(x)γ(x)/C(x)] [ ∫_{y<x} F*(dy)/C(y) − ∫_{y<x} F*_n(dy)/C(y) ] F*_n(dx) + o_P(n^{-1/2}).

We now proceed to S_n4. As a first step toward the desired representation, we shall need the following lemma.

Lemma 3.7. Under the assumptions of Theorem 1.1, S_n4 admits the expansion

    S_n4 = ∫ [φ(x) e^{ξ_n(x)}/C_n(x)] ∫_{y<x} [C_n(y) + n^{-1} − C(y)]/C²(y) F*_n(dy) F*_n(dx) + o_P(n^{-1/2}).    (3.8)
Proof. Recalling the definition of D_n2, the difference between S_n4 and the leading term in (3.8) becomes

    −∫ [φ(x) e^{ξ_n(x)}/C_n(x)] ∫_{y<x} [C_n(y) + n^{-1} − C(y)]² / { C²(y)[C_n(y) + n^{-1}] } F*_n(dy) F*_n(dx).

Taking Lemma 3.1 into account, the last expression is bounded in absolute value from above, with large probability and up to a constant factor, by

    ∫ |φ(x)| e^{ξ_n(x)} ∫_{y<x} [C_n(y) + n^{-1} − C(y)]²/C³(y) F*_n(dy) F*_n(dx).

In the proof of Lemma 3.5 we have argued that z_n(x) = ξ_n(x) + ∫_{y<x} F*(dy)/C(y) is uniformly bounded from above in probability on the support of F*_n, as n → ∞. Consequently, we may replace exp(ξ_n(x)) by γ(x). Also, we may again restrict the outer integral to x ≤ a_n. The expectation of the resulting term then does not exceed a constant times

    ∫_0^{a_n} |φ(x)| ∫_{y<x} [n^{-1} C(y) + 4 n^{-2}]/C³(y) F*(dy) F*(dx).

These terms already appeared in the proof of Lemma 3.3 and were shown to be o(n^{-1/2}) there. The proof therefore is complete.

In the following we shall omit the summand n^{-1} in (3.8), since its contribution is also negligible. The next lemma will enable us to replace exp(ξ_n(x)) by γ(x).

Lemma 3.8. Under the assumptions of Theorem 1.1, we have

    III_n := ∫ [φ(x)γ(x)/C_n(x)] [ e^{ξ_n(x) + ∫_{y<x} F*(dy)/C(y)} − 1 ] ∫_{y<x} [C_n(y) − C(y)]/C²(y) F*_n(dy) F*_n(dx) = o_P(n^{-1/2}).

Proof. Cauchy-Schwarz leads to a bound for n^{1/2}|III_n| similar to (3.7) for n^{1/2}|II_n|. The second factor is exactly the same as before. The first factor is dealt with in the following lemma.
Lemma 3.9. Under the assumptions of Theorem 1.1, we get

    ∫ |φ(x)| [ ∫_{y<x} [C_n(y) − C(y)]/C²(y) F*_n(dy) ]² F*_n(dx) = o_P(n^{-1/2}).

Proof. As in previous proofs, it is enough to extend the x-integral only up to a_n. It is then easily checked that the expectation of the resulting term is less than or equal to (for n > 3)

    ∫_0^{a_n} |φ(x)| ∫∫_{y,z<x} { G(y∧z)[1 − F((y∨z)−)] / [(n−3)α] + 4 n^{-2} } / [C²(y) C²(z)] F*(dy) F*(dz) F*(dx),

where y∧z = min(y, z) and y∨z = max(y, z). Neglecting the 4n^{-2} term for a moment, note that, by (1.1), (1.2) and symmetry in y and z, the rest equals

    2 (n−3)^{-1} ∫_0^{a_n} |φ(x)| ∫_{z<x} F(dz)/{ C(z)[1 − F(z)] } F(dx).

The last integral already appeared, in equivalent form, in the proof of Lemma 3.3 and was shown to be o(n^{1/2}) there. The error term 4n^{-2} yields similar bounds, so that the proof is complete.

In the final step, the C_n in the denominator of (3.8) may be replaced with C. Details are omitted. Hence we come up with the following representation of S_n4.

Corollary 3.3. Under the assumptions of Theorem 1.1, we have

    S_n4 = ∫ [φ(x)γ(x)/C(x)] ∫_{y<x} [C_n(y) − C(y)]/C²(y) F*_n(dy) F*_n(dx) + o_P(n^{-1/2}).
We are now ready to give the

Proof of Theorem 1.1. Corollaries 3.1, 3.2 and 3.3 yield representations of the relevant terms S_n2, S_n3 and S_n4 as V-statistics. As is known, see Serfling (1980), a V-statistic equals a U-statistic up to an error o_P(n^{-1/2}). If, in S_n2, S_n3 and S_n4, we replace F*_n by F*, we come up with degenerate U-statistics, which are all of the order o_P(n^{-1/2}). Collecting the leading terms, an application of Fubini's theorem yields the assertion of Theorem 1.1 with F̂_n instead of F_n. As to F_n, apply the inequality

    | ∏_{j=1}^i a_j − ∏_{j=1}^i b_j | ≤ Σ_{j=1}^i |a_j − b_j|,  0 ≤ a_j, b_j ≤ 1,

to the weights of F_n and F̂_n. It follows that

    | ∫ φ dF̂_n − ∫ φ dF_n | ≤ n^{-3} Σ_{i=1}^n [|φ(X_{i:n})|/C_n(X_{i:n})] Σ_{j=1}^{i−1} C_n^{-2}(X_{j:n})
        = n^{-1} ∫ [|φ(x)|/C_n(x)] ∫_{y<x} C_n^{-2}(y) F*_n(dy) F*_n(dx).

Use Lemma 3.1 again to replace C_n with C and apply the SLLN (for U-statistics) to get that the last term is O_P(n^{-1}). This completes the proof of Theorem 1.1.

Proof of Lemma 1.1. The proof proceeds by induction on i. For i = 1 and j = 1, the left-hand side of (1.16) equals 1/[nC^U_n(U_{11:n})], and so does the right-hand side. The assertion then follows from (1.14). For i = 1 and j = 2, the left-hand side of (1.16) equals, by (1.8) and (1.15),

    [ (nC^U_n(U_{11:n}) − 1)/nC^U_n(U_{11:n}) ] / [nC^U_n(U_{12:n})] = 1/[nC^U_n(U_{11:n})] = 1/[nC_n(X_{1:n})].

For i = 1 and j > 2, the proof follows the same pattern, by repeated use of (1.15). Hence the assertion of Lemma 1.1 holds true for i = 1. Assuming that (1.16) holds true for all indices less than or equal to i, we obtain, by (1.7),

    1 − F_n(X_{i:n}) = 1 − F^U_n(U_{id_i:n}).

This may be used as a starting point to show (1.16) for i + 1. Again make repeated use of (1.14) and (1.15).

Acknowledgement. The research of Jane-Ling Wang was supported by an NSF grant.
References

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Chao, M.-T. and Lo, S.-H. (1988). Some representations of the nonparametric maximum likelihood estimators with truncated data. Ann. Statist., 16.

He, S. and Yang, G. L. (1998a). The strong law under random truncation. Ann. Statist., 26.

He, S. and Yang, G. L. (1998b). Estimation of the truncation probability in the random truncation model. Ann. Statist., 26.

Keiding, N. and Gill, R. D. (1990). Random truncation models and Markov processes. Ann. Statist., 18.

Lynden-Bell, D. (1971). A method of allowing for known observational selection in small samples applied to 3CR quasars. Monthly Notices Roy. Astronom. Soc., 155.

Sellero, C. S., Manteiga, W. G. and Van Keilegom, I. (2005). Uniform representation of product-limit integrals with applications. Scand. J. Statist., 32.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Stute, W. (1993). Almost sure representations of the product-limit estimator for truncated data. Ann. Statist., 21.

Stute, W. (1994). U-statistic processes: a martingale approach. Ann. Probab., 22.

Stute, W. (1995). The central limit theorem under random censorship. Ann. Statist., 23.

Stute, W. (2004). Kaplan-Meier integrals. In: Handbook of Statistics, Vol. 23. Elsevier, Amsterdam.

Stute, W. and Wang, J. L. (1993). The strong law under random censorship. Ann. Statist., 21.

Wang, M. C. (1989). A semiparametric model for randomly truncated data. J. Amer. Statist. Assoc., 84.

Woodroofe, M. (1985). Estimating a distribution function with truncated data. Ann. Statist., 13.
Zhou, M. (1991). Some properties of the Kaplan-Meier estimator for independent nonidentically distributed random variables. Ann. Statist., 19.
Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June
More informationA COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky
A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More information1 Glivenko-Cantelli type theorems
STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then
More informationSmooth nonparametric estimation of a quantile function under right censoring using beta kernels
Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Chanseok Park 1 Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 Short Title: Smooth
More informationSTAT 7032 Probability Spring Wlodek Bryc
STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,
More informationWeak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria
Weak Leiden University Amsterdam, 13 November 2013 Outline 1 2 3 4 5 6 7 Definition Definition Let µ, µ 1, µ 2,... be probability measures on (R, B). It is said that µ n converges weakly to µ, and we then
More informationAsymptotic statistics using the Functional Delta Method
Quantiles, Order Statistics and L-Statsitics TU Kaiserslautern 15. Februar 2015 Motivation Functional The delta method introduced in chapter 3 is an useful technique to turn the weak convergence of random
More informationProduct-limit estimators of the survival function with left or right censored data
Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut
More informationAsymptotics of minimax stochastic programs
Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner
More informationEMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky
EMPIRICAL ENVELOPE MLE AND LR TESTS Mai Zhou University of Kentucky Summary We study in this paper some nonparametric inference problems where the nonparametric maximum likelihood estimator (NPMLE) are
More informationRegression model with left truncated data
Regression model with left truncated data School of Mathematics Sang Hui-yan 00001107 October 13, 2003 Abstract For linear regression model with truncated data, Bhattacharya, Chernoff, and Yang(1983)(BCY)
More informationis a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.
Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable
More informationChapter 4: Asymptotic Properties of the MLE
Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider
More information4 Sums of Independent Random Variables
4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables
More informationAsymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University
More informationSize and Shape of Confidence Regions from Extended Empirical Likelihood Tests
Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University
More informationLaplace s Equation. Chapter Mean Value Formulas
Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More informationSTAT Sample Problem: General Asymptotic Results
STAT331 1-Sample Problem: General Asymptotic Results In this unit we will consider the 1-sample problem and prove the consistency and asymptotic normality of the Nelson-Aalen estimator of the cumulative
More informationAsymptotic Properties of Kaplan-Meier Estimator. for Censored Dependent Data. Zongwu Cai. Department of Mathematics
To appear in Statist. Probab. Letters, 997 Asymptotic Properties of Kaplan-Meier Estimator for Censored Dependent Data by Zongwu Cai Department of Mathematics Southwest Missouri State University Springeld,
More informationarxiv: v2 [math.st] 20 Nov 2016
A Lynden-Bell integral estimator for the tail inde of right-truncated data with a random threshold Nawel Haouas Abdelhakim Necir Djamel Meraghni Brahim Brahimi Laboratory of Applied Mathematics Mohamed
More informationTheorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1
Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be
More informationExtreme Value Analysis and Spatial Extremes
Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models
More informationMarshall-Olkin Bivariate Exponential Distribution: Generalisations and Applications
CHAPTER 6 Marshall-Olkin Bivariate Exponential Distribution: Generalisations and Applications 6.1 Introduction Exponential distributions have been introduced as a simple model for statistical analysis
More informationIf g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get
18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in
More informationBayesian estimation of the discrepancy with misspecified parametric models
Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012
More informationBahadur representations for bootstrap quantiles 1
Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by
More informationInconsistency of the MLE for the joint distribution. of interval censored survival times. and continuous marks
Inconsistency of the MLE for the joint distribution of interval censored survival times and continuous marks By M.H. Maathuis and J.A. Wellner Department of Statistics, University of Washington, Seattle,
More information1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time
ASYMPTOTIC PROPERTIES OF THE GMLE WITH CASE 2 INTERVAL-CENSORED DATA By Qiqing Yu a;1 Anton Schick a, Linxiong Li b;2 and George Y. C. Wong c;3 a Dept. of Mathematical Sciences, Binghamton University,
More informationFunctional Analysis Exercise Class
Functional Analysis Exercise Class Week 9 November 13 November Deadline to hand in the homeworks: your exercise class on week 16 November 20 November Exercises (1) Show that if T B(X, Y ) and S B(Y, Z)
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL
October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach
More informationEfficiency of Profile/Partial Likelihood in the Cox Model
Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows
More informationReal Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi
Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationPractical conditions on Markov chains for weak convergence of tail empirical processes
Practical conditions on Markov chains for weak convergence of tail empirical processes Olivier Wintenberger University of Copenhagen and Paris VI Joint work with Rafa l Kulik and Philippe Soulier Toronto,
More informationDensity estimators for the convolution of discrete and continuous random variables
Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationLarge Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n
Large Sample Theory In statistics, we are interested in the properties of particular random variables (or estimators ), which are functions of our data. In ymptotic analysis, we focus on describing the
More informationTESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississip
TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississippi University, MS38677 K-sample location test, Koziol-Green
More informationQuantile-quantile plots and the method of peaksover-threshold
Problems in SF2980 2009-11-09 12 6 4 2 0 2 4 6 0.15 0.10 0.05 0.00 0.05 0.10 0.15 Figure 2: qqplot of log-returns (x-axis) against quantiles of a standard t-distribution with 4 degrees of freedom (y-axis).
More informationLimiting Distributions
Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationSome basic elements of Probability Theory
Chapter I Some basic elements of Probability Theory 1 Terminology (and elementary observations Probability theory and the material covered in a basic Real Variables course have much in common. However
More informationKrzysztof Burdzy University of Washington. = X(Y (t)), t 0}
VARIATION OF ITERATED BROWNIAN MOTION Krzysztof Burdzy University of Washington 1. Introduction and main results. Suppose that X 1, X 2 and Y are independent standard Brownian motions starting from 0 and
More informationA Conditional Approach to Modeling Multivariate Extremes
A Approach to ing Multivariate Extremes By Heffernan & Tawn Department of Statistics Purdue University s April 30, 2014 Outline s s Multivariate Extremes s A central aim of multivariate extremes is trying
More informationLecture 4 Lebesgue spaces and inequalities
Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how
More informationApproximate Self Consistency for Middle-Censored Data
Approximate Self Consistency for Middle-Censored Data by S. Rao Jammalamadaka Department of Statistics & Applied Probability, University of California, Santa Barbara, CA 93106, USA. and Srikanth K. Iyer
More informationThe best expert versus the smartest algorithm
Theoretical Computer Science 34 004 361 380 www.elsevier.com/locate/tcs The best expert versus the smartest algorithm Peter Chen a, Guoli Ding b; a Department of Computer Science, Louisiana State University,
More informationLecture 6 Basic Probability
Lecture 6: Basic Probability 1 of 17 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 6 Basic Probability Probability spaces A mathematical setup behind a probabilistic
More informationGoodness-of-fit tests for the cure rate in a mixture cure model
Biometrika (217), 13, 1, pp. 1 7 Printed in Great Britain Advance Access publication on 31 July 216 Goodness-of-fit tests for the cure rate in a mixture cure model BY U.U. MÜLLER Department of Statistics,
More informationCharacterization of Upper Comonotonicity via Tail Convex Order
Characterization of Upper Comonotonicity via Tail Convex Order Hee Seok Nam a,, Qihe Tang a, Fan Yang b a Department of Statistics and Actuarial Science, University of Iowa, 241 Schaeffer Hall, Iowa City,
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationConvergence rates in weighted L 1 spaces of kernel density estimators for linear processes
Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationExercises in Extreme value theory
Exercises in Extreme value theory 2016 spring semester 1. Show that L(t) = logt is a slowly varying function but t ǫ is not if ǫ 0. 2. If the random variable X has distribution F with finite variance,
More informationAFT Models and Empirical Likelihood
AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t
More informationBrownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539
Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationGeneral Glivenko-Cantelli theorems
The ISI s Journal for the Rapid Dissemination of Statistics Research (wileyonlinelibrary.com) DOI: 10.100X/sta.0000......................................................................................................
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationProduct measure and Fubini s theorem
Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω
More informationCURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University
CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite
More informationLIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974
LIMITS FOR QUEUES AS THE WAITING ROOM GROWS by Daniel P. Heyman Ward Whitt Bell Communications Research AT&T Bell Laboratories Red Bank, NJ 07701 Murray Hill, NJ 07974 May 11, 1988 ABSTRACT We study the
More information5 Introduction to the Theory of Order Statistics and Rank Statistics
5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order
More informationTHE LINDEBERG-FELLER CENTRAL LIMIT THEOREM VIA ZERO BIAS TRANSFORMATION
THE LINDEBERG-FELLER CENTRAL LIMIT THEOREM VIA ZERO BIAS TRANSFORMATION JAINUL VAGHASIA Contents. Introduction. Notations 3. Background in Probability Theory 3.. Expectation and Variance 3.. Convergence
More information2 Functions of random variables
2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as
More information1 + lim. n n+1. f(x) = x + 1, x 1. and we check that f is increasing, instead. Using the quotient rule, we easily find that. 1 (x + 1) 1 x (x + 1) 2 =
Chapter 5 Sequences and series 5. Sequences Definition 5. (Sequence). A sequence is a function which is defined on the set N of natural numbers. Since such a function is uniquely determined by its values
More informationExtreme Value Theory and Applications
Extreme Value Theory and Deauville - 04/10/2013 Extreme Value Theory and Introduction Asymptotic behavior of the Sum Extreme (from Latin exter, exterus, being on the outside) : Exceeding the ordinary,
More informationSelf-normalized Cramér-Type Large Deviations for Independent Random Variables
Self-normalized Cramér-Type Large Deviations for Independent Random Variables Qi-Man Shao National University of Singapore and University of Oregon qmshao@darkwing.uoregon.edu 1. Introduction Let X, X
More informationWLLN for arrays of nonnegative random variables
WLLN for arrays of nonnegative random variables Stefan Ankirchner Thomas Kruse Mikhail Urusov November 8, 26 We provide a weak law of large numbers for arrays of nonnegative and pairwise negatively associated
More informationTHE SKOROKHOD OBLIQUE REFLECTION PROBLEM IN A CONVEX POLYHEDRON
GEORGIAN MATHEMATICAL JOURNAL: Vol. 3, No. 2, 1996, 153-176 THE SKOROKHOD OBLIQUE REFLECTION PROBLEM IN A CONVEX POLYHEDRON M. SHASHIASHVILI Abstract. The Skorokhod oblique reflection problem is studied
More informationRefining the Central Limit Theorem Approximation via Extreme Value Theory
Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of
More informationChapter 2: Resampling Maarten Jansen
Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,
More informationChapter 7. Extremal Problems. 7.1 Extrema and Local Extrema
Chapter 7 Extremal Problems No matter in theoretical context or in applications many problems can be formulated as problems of finding the maximum or minimum of a function. Whenever this is the case, advanced
More informationSupplement to Quantile-Based Nonparametric Inference for First-Price Auctions
Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationA note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis
A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis Jon A. Wellner and Vladimir Koltchinskii Abstract. Proofs are given of the limiting null distributions of the
More informationRandom Bernstein-Markov factors
Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit
More informationMath212a1413 The Lebesgue integral.
Math212a1413 The Lebesgue integral. October 28, 2014 Simple functions. In what follows, (X, F, m) is a space with a σ-field of sets, and m a measure on F. The purpose of today s lecture is to develop the
More informationPrinciple of Mathematical Induction
Advanced Calculus I. Math 451, Fall 2016, Prof. Vershynin Principle of Mathematical Induction 1. Prove that 1 + 2 + + n = 1 n(n + 1) for all n N. 2 2. Prove that 1 2 + 2 2 + + n 2 = 1 n(n + 1)(2n + 1)
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationA PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS
Statistica Sinica 20 2010, 365-378 A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Liang Peng Georgia Institute of Technology Abstract: Estimating tail dependence functions is important for applications
More informationEstimation of Dynamic Regression Models
University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.
More informationASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin
The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian
More informationResearch Article Exponential Inequalities for Positively Associated Random Variables and Applications
Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 008, Article ID 38536, 11 pages doi:10.1155/008/38536 Research Article Exponential Inequalities for Positively Associated
More informationFinite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product
Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )
More informationMATH 409 Advanced Calculus I Lecture 12: Uniform continuity. Exponential functions.
MATH 409 Advanced Calculus I Lecture 12: Uniform continuity. Exponential functions. Uniform continuity Definition. A function f : E R defined on a set E R is called uniformly continuous on E if for every
More informationProblem 1: Compactness (12 points, 2 points each)
Final exam Selected Solutions APPM 5440 Fall 2014 Applied Analysis Date: Tuesday, Dec. 15 2014, 10:30 AM to 1 PM You may assume all vector spaces are over the real field unless otherwise specified. Your
More informationINDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS
INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH
FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH Jian-Jian Ren 1 and Mai Zhou 2 University of Central Florida and University of Kentucky Abstract: For the regression parameter
More informationSection 8.2. Asymptotic normality
30 Section 8.2. Asymptotic normality We assume that X n =(X 1,...,X n ), where the X i s are i.i.d. with common density p(x; θ 0 ) P= {p(x; θ) :θ Θ}. We assume that θ 0 is identified in the sense that
More information