Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes

Similar documents
Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes

PRESERVATION THEOREMS FOR GLIVENKO-CANTELLI AND UNIFORM GLIVENKO-CANTELLI CLASSES AAD VAN DER VAART AND JON A. WELLNER ABSTRACT We show that the P Gli

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

Empirical Processes: General Weak Convergence Theory

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

L p Spaces and Convexity

MATHS 730 FC Lecture Notes March 5, Introduction

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

THEOREMS, ETC., FOR MATH 515

Defining the Integral

(Y; I[X Y ]), where I[A] is the indicator function of the set A. Examples of the current status data are mentioned in Ayer et al. (1955), Keiding (199

Lecture 2: Uniform Entropy

Problem set 1, Real Analysis I, Spring, 2015.

Chapter 8. General Countably Additive Set Functions. 8.1 Hahn Decomposition Theorem

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

Maximum likelihood: counterexamples, examples, and open problems

STATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES

3 Integration and Expectation

A Law of the Iterated Logarithm. for Grenander s Estimator

Integration on Measure Spaces

A SKOROHOD REPRESENTATION THEOREM WITHOUT SEPARABILITY PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO

Banach Spaces V: A Closer Look at the w- and the w -Topologies

Maximum likelihood: counterexamples, examples, and open problems. Jon A. Wellner. University of Washington. Maximum likelihood: p.

CHAPTER V DUAL SPACES

Part II Probability and Measure

CHAPTER I THE RIESZ REPRESENTATION THEOREM

Nonparametric estimation under Shape Restrictions

Examples of Dual Spaces from Measure Theory

Math212a1413 The Lebesgue integral.

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

1 Weak Convergence in R k

Weak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria

Real Analysis Notes. Thomas Goller

7 Convergence in R d and in Metric Spaces

Analysis Comprehensive Exam Questions Fall 2008

Real Analysis Problems

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

STAT 7032 Probability Spring Wlodek Bryc

arxiv:math/ v2 [math.st] 17 Jun 2008

Efficiency of Profile/Partial Likelihood in the Cox Model

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

consists of two disjoint copies of X n, each scaled down by 1,

The International Journal of Biostatistics

2 Sequences, Continuity, and Limits

Supplementary Notes for W. Rudin: Principles of Mathematical Analysis

Functional Analysis, Stein-Shakarchi Chapter 1

EMPIRICAL PROCESSES: Theory and Applications

Product-limit estimators of the survival function with left or right censored data

The intersection axiom of

Chapter 4: Asymptotic Properties of the MLE

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

THE INVERSE FUNCTION THEOREM

Summary of Real Analysis by Royden

MATH5011 Real Analysis I. Exercise 1 Suggested Solution

Estimation of the Bivariate and Marginal Distributions with Censored Data

Empirical Processes and random projections

Commutative Banach algebras 79

CONVERGENCE OF MULTIPLE ERGODIC AVERAGES FOR SOME COMMUTING TRANSFORMATIONS

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

Problem 1: Compactness (12 points, 2 points each)

1. Bounded linear maps. A linear map T : E F of real Banach

Topological properties of Z p and Q p and Euclidean models

(U) =, if 0 U, 1 U, (U) = X, if 0 U, and 1 U. (U) = E, if 0 U, but 1 U. (U) = X \ E if 0 U, but 1 U. n=1 A n, then A M.

REAL AND COMPLEX ANALYSIS

Probability and Measure

Lebesgue-Radon-Nikodym Theorem

Spectral Continuity Properties of Graph Laplacians

Functional Analysis I

6. Duals of L p spaces

FUNCTIONAL ANALYSIS LECTURE NOTES: WEAK AND WEAK* CONVERGENCE

Integral Jensen inequality

INTRODUCTION TO MEASURE THEORY AND LEBESGUE INTEGRATION

An elementary proof of the weak convergence of empirical processes

Lecture 4 Lebesgue spaces and inequalities

MTH 404: Measure and Integration

MATS113 ADVANCED MEASURE THEORY SPRING 2016

Chapter 4. The dominated convergence theorem and applications

18.175: Lecture 3 Integration

Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics

AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE. Mona Nabiei (Received 23 June, 2015)

Functional Analysis HW #1

An introduction to some aspects of functional analysis

Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures

SOLUTIONS TO HOMEWORK ASSIGNMENT 4

Math 4121 Spring 2012 Weaver. Measure Theory. 1. σ-algebras

Approximate Self Consistency for Middle-Censored Data

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Functional Analysis HW #3

The main results about probability measures are the following two facts:

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Transcription:

Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes This is page 5 Printer: Opaque this Aad van der Vaart and Jon A. Wellner ABSTRACT We show that the P Glivenko property of classes of functions F,...,F k is preserved by a continuous function ϕ from R k to R in the sense that the new class of functions x ϕ(f (x),...,f k (x)), f i F i, i =,...,k is again a Glivenko-Cantelli class of functions if it has an integrable envelope. We also prove an analogous result for preservation of the uniform Glivenko-Cantelli property. Corollaries of the main theorem include two preservation theorems of Dudley (998). We apply the main result to reprove a theorem of Schick and Dudley 998a or b? Yu (999)concerning consistency of the NPMLE in a model for mixed case interval censoring. Finally a version of the consistency result of Schick and Yu (999)is established for a general model for mixed case interval censoring in which a general sample space Y is partitioned into sets which are members of some VC-class C of subsets of Y. Glivenko-Cantelli theorems Let (X, A,P) be a probability space, and suppose that F L (P ). For such a class of functions, let F 0,P {f Pf : f F}. We also let F F (x) sup f F f(x), the envelope function of F. If f F for all f F with F measurable, then F is an envelope for F. Suppose that X,X 2,... are i.i.d. P. The Glivenko-Cantelli theorems give conditions under which the empirical measure P n converges uniformly to P over a class F, either in probability (in which case we say that F is a weak Glivenko-Cantelli class for P ) or almost surely: P n P F sup P n f Pf a.s. 0; f F in this case we say that F is a strong Glivenko-Cantelli class for P. Useful sufficient conditions for a class F to be a strong Glivenko-Cantelli class for P are that it has an integrable envelope and either log N [] (ɛ, F,L (P )) < for all ɛ>0

6 A. van der Vaart and J. A. Wellner or log N(ɛ, F M,L (P n )) = o P (n) for every ɛ>0 and 0 <M<. Here F M = {f [F M] : f F}. In the second case some additional measurability conditions are necessary; see Van der Vaart and Wellner (996), Chapter 2.4, pp. 22 26. In particular, the condition in Theorem 2.4.3, page 23, is shown by Giné and Zinn (984) to be both necessary and sufficient, under measurability assumptions, for the class F to be a strong Glivenko-Cantelli class (see Talagrand (996) for another proof). Talagrand (987b) gives necessary and sufficient conditions for the Glivenko-Cantelli theorem without any measurability hypotheses. Theorem. (Giné and Zinn, 984). Suppose that F is L (P ) bounded and nearly linearly supremum measurable for P ; in particular this holds if F is image admissible Suslin. Then the following are equivalent: (a) F is a strong Glivenko-Cantelli class for P. (b) F has an envelope function F L (P ) and the truncated classes F M {f{f M} : f F}satisfy n E log N(ɛ, F M,L r (P n )) 0 for all ɛ>0, and for all M (0, ) for some (all) r (0, ] where f Lr(P ) f P,r {P ( f r )} r. 2 Preservation of the Glivenko Cantelli property Our goal in this section is to present several results concerning the stability of the Glivenko-Cantelli property of one or more classes of functions under composition with functions ϕ. A theorem which motivated our interest is the following result of Dudley (998a). Theorem 2. (Dudley, 998a). Suppose that F is a strong Glivenko-Cantelli class for P with PF <, J is a possibly unbounded interval including the ranges of all f F, ϕ is continuous and monotone on J, and for some finite constants c, d, ϕ(y) c y + d for all y J. Then ϕ(f) is also a strong Glivenko-Cantelli class for P. Dudley (998a) proves this via the characterization of Glivenko-Cantelli classes due to Talagrand (987b). Dudley (998b) also uses Talagrand s characterization to prove the following interesting proposition. Proposition. (Dudley, 998b). Suppose that F is a strong Glivenko- Cantelli class for P with PF <, and g is a fixed bounded function ( g < ). Then the class of functions g F {g f : f F}is a strong Glivenko-Cantelli class for P.

Glivenko-Cantelli Preservation Theorems 7 Yet another proposition in this same vein is: Proposition 2. (Giné and Zinn, 984). Suppose that F is a uniformly bounded strong Glivenko-Cantelli class for P, and g L (P ) is a fixed function. Then the class of functions g F {g f : f F}is a strong Glivenko-Cantelli class for P. Given classes F,...,F k of functions f i : X Rand a function ϕ : R k R, let ϕ(f,...,f k ) be the class of functions x ϕ(f (x),...,f k (x)), where f i F i, i =,...,k. Theorem 2 and Propositions and 2 are all corollaries of the following theorem. Theorem 3. Suppose that F,...,F k are P Glivenko-Cantelli classes of functions, and that ϕ : R k R is continuous. Then H ϕ(f,...,f k ) is P Glivenko-Cantelli provided that it has an integrable envelope function. Proof. We first assume that the classes of functions F i are appropriately measurable. Let F,...,F k and H be integrable envelopes for F,...,F k and H respectively, and set F = F... F k.form (0, ), define Now H M {ϕ(f) [F M] : f =(f,...,f k ) F F 2 F k F}. (P n P )ϕ(f) F (P n + P )H [F>M] + (P n P )h HM. The expectation of the first term on the right converges to 0 as M. Hence it suffices to show that H M is P Glivenko-Cantelli for every fixed M. Let δ = δ(ɛ) betheδ of Lemma 2 below for ϕ :[ M,M] k R, ɛ>0, and the L -norm. Then for any (f j,g j ) F j, j =,...,k, implies that It follows that P n f j g j [Fj M] δ, j =,...,k k P n ϕ(f,...,f k ) ϕ(g,...,g k ) [F M] ɛ. N(ɛ, H M,L (P n )) k N( δ k, F j [Fj M],L (P n )). Thus E log N(ɛ, H M,L (P n )) = o(n) for every ɛ>0, M<. This implies that E log N(ɛ, (H M ) N,L (P n )) = o(n) for (H M ) N the functions h{h

8 A. van der Vaart and J. A. Wellner N} for h H M.ThusH M is strong Glivenko-Cantelli for P by Theorem. This concludes the proof that H = ϕ(f) is weak Glivenko-Cantelli. Because it has an integrable envelope, it is strong Glivenko-Cantelli by, e.g., Lemma 2.4.5 of Van der Vaart and Wellner (996). This concludes the proof for appropriately measurable classes F j, j =,...,k. We extend the theorem to general Glivenko-Cantelli classes using separable versions as in Talagrand (987a). (Also see van der Vaart and Wellner (996), pp. 5 20 for a discussion.) As shown in the preceding argument, it is not a loss of generality to assume that the classes F i are uniformly bounded. Furthermore, it suffices to show that ϕ(f,...,f k )is weak Glivenko-Cantelli. We first need a lemma. Lemma. Any strong P -Glivenko Cantelli class F is totally bounded in L (P ) if and only if P F <. Furthermore for any r (, ), iff has an envelope that is contained in L r (P ), then F is also totally bounded in L r (P ). (983) or (984)? Proof. A class that is totally bounded is also bounded. Thus for the first statement we only need to prove that a strong Glivenko-Cantelli class F with P F < is totally bounded in L (P ). It is well-known that such a class has an integrable envelope. E.g. see Giné and Zinn (983) or Problem 2.4. of van der Vaart and Wellner (996) to conclude first that P f Pf F <. Next the claim follows from the triangle inequality f F f Pf F + P F. Thus it is no loss of generality to assume that the class F possesses an envelope that is finite everywhere. If F is not totally bounded in L (P ), then there is ε > 0 and G F countable such that the packing number D(ε, G,L (P )) =. This is obvious, since if F is not totally bounded, then D(ε, F,L (P )) = for some ε>0; then take a countable set of ε separated points. So, now G has all the measurablity properties, is GC and L (P )-bounded, and moreover, D(ε, G,L )=. But this is a contradiction because, as shown in Giné and Zinn (984), first half of page 984, D(ε, G,L ) must be finite for all ε>0. This completes the proof of the first part of the lemma. If F has an envelope in L r (P ), then F is totally bounded in L r (P ) if the class F M of functions f{f M} is totally bounded in L r (P ) for every fixed M. The class F M is P -GC by Theorem 3 and hence this class is totally bounded in L (P ). But then it is also totally bounded in L r (P ), because P f r P f M r for any f that is bounded by M and we can construct the ɛ-net over F M in L (P ) to consist of functions that are bounded by M. Because a Glivenko-Cantelli class F with P F < is totally bounded in L (P ) by Lemma, it is separable as a subset of L (P ). A minor generalization of Theorem 2.3.7 in van der Vaart and Wellner (996) shows that there exists a bijection f f of F onto a class F L (P ) such that

f = f P almost surely for every f F. Glivenko-Cantelli Preservation Theorems 9 there exists a countable subset G F such that for every n there exists a measurable set N n X n with P n (N n ) = 0 such that for all (x,...,x n ) / N n and f F there exists {g m } Gsuch that P g m f 0 and (g m (x ),...,g m (x n )) ( f(x ),..., f(x n )). By an adaptation of a theorem due to Talagrand (987a) (see Theorem 2.3.5 in van der Vaart and Wellner (996)) a class F is weak Glivenko- Cantelli if and only if the class F is weak Glivenko-Cantelli and sup P n f f 0 f F in outer probability. Construct a pointwise separable version F i for each of the classes F i. The classes F i possess enough measurability to make the preceding argument work; in particular pointwise separable version in the above sense is sufficient for the nearly linearly supremum measurable hypothesis of Giné and Zinn (984) for both F,...,F k and ϕ(f,...,f k ). Thus the class ϕ( F,..., F k ) is Glivenko-Cantelli for P. Now by Lemma 2 there exists for every ɛ>0aδ>0 such that P n f j f j < δ k, j =,...,k, implies P n ϕ(f,...,f k ) ϕ( f,..., f k ) <ɛ. The theorem follows. Lemma 2. Suppose that ϕ : K R is continuous and K R k is compact. Then for every ɛ>0 there exists δ>0 such that for all n and for all a,...,a n,b,...,b n K R k implies n n n a i b i <δ n ϕ(a i ) ϕ(b i ) <ɛ. Here can be any norm on R k ; in particular it can be ( k ) /r x r = x i r, r [, ) or x max i k x i for x =(x,...,x k ) R k.

20 A. van der Vaart and J. A. Wellner Proof. Let U n be uniform on {,...,n}, and set X n = a Un, Y n = b Un. Then we can write n n a i b i = E X n Y n and n ϕ(a i ) ϕ(b i ) = E ϕ(x n ) ϕ(y n ). n Hence it suffices to show that for every ɛ>0 there exists δ>0 such that for all (X, Y ) random vectors in K R k, E X Y <δ implies E ϕ(x) ϕ(y ) <ɛ. Suppose not. Then for some ɛ>0 and for all m =, 2,... there exists (X m,y m ) such that E X m Y m < m, E ϕ(x m) ϕ(y m ) ɛ. But since {(X m,y m )} is tight, there exists (X m,y m ) d (X, Y ). Then it follows that E X Y = lim E X m m Y m =0 so that X = Y a.s., while on the other hand 0=E ϕ(x) ϕ(y ) = lim E ϕ(x m m ) ϕ(y m ) ɛ>0. This contradiction means that the desired implication holds. Another potentially useful preservation theorem is one based on building up Glivenko-Cantelli classes from the restrictions of a class of functions to elements of a partition of the sample space. The following theorem is related to the results of Van der Vaart (996) for Donsker classes. Theorem 4. Suppose that F is a class of functions on (X, A,P), and {X i } is a partition of X : X i = X, X i X j = for i j. Suppose that F j {f Xj : f F}is P Glivenko-Cantelli for each j, and F has an integrable envelope function F. Then F is itself P Glivenko-Cantelli. Proof. Since f = f Xj = f Xj, it follows that E P n P F E P n P Fj 0

Glivenko-Cantelli Preservation Theorems 2 by the dominated convergence theorem, since each term in the sum converges to zero by the hypothesis that each F j is P Glivenko-Cantelli, and we have E P n P Fj E P n (F Xj )+P (F Xj ) 2P (F Xj ) where P (F X j )=P (F ) <. 3 Preservation of the uniform Glivenko Cantelli property We say that F is a strong uniform Glivenko-Cantelli class if for all ɛ>0 ( ) sup PrP sup P m P F >ɛ 0 as n P P(X,A) m n where P(X, A) is the set of all probability measures on (X, A). For x = (x,...,x n ) X n, n =, 2,..., and r (0, ), we define on F the pseudo-distances e x,r (f,g) = { n n } r f(x i ) g(x i ) r e x, (f,g) = max i n f(x i) g(x i ),, f,g F. Let N(ɛ, F,e x,r ) denote the ɛ covering number of (F,e x,r ), ɛ>0. Then define, for n =, 2,..., ɛ>0, and r (0, ], the quantities N n,r (ɛ, F) = sup x X n N(ɛ, F,e x,r ). Theorem 5. (Dudley, Giné, and Zinn (99)). Suppose that F is a class of uniformly bounded functions such that F is image admissible Suslin. Then the following are equivalent: (a) F is a strong uniform Glivenko-Cantelli class. (b) log N n,r (ɛ, F) 0 for all ɛ>0 n for some (all) r (0, ]. For the definition of the image admissible Suslin property see Dudley (984), sections 0.3 and.. The following theorem gives natural sufficient conditions for preservation of the uniform Glivenko-Cantelli theorem.

22 A. van der Vaart and J. A. Wellner Theorem 6. Suppose that F,...,F k are classes of uniformly bounded functions on (X, A) such that F,...,F k are image-admissible Suslin and strong uniform Glivenko-Cantelli classes. Suppose that ϕ : R k R is continuous. Then H ϕ(f,...,f k ) is a strong uniform Glivenko-Cantelli class. Proof. It follows from Lemma 2 that for any ɛ>0 there exists δ>0 such that for any f j,g j F j, j =,...,k, x X n, implies that It follows that and hence that e x, (f j,g j ) δ, j =,...,k k e x, (ϕ(f,...,f k ),ϕ(g,...,g k )) ɛ. N n, (ɛ, H) n log N n,(ɛ, H) k N n, ( δ k, F j), k n log N n,( δ k, F j) 0 where the convergence follows from part (b) of Theorem 5 with r =.If we show that H = ϕ(f) is image admissible Suslin, then the conclusion follows from (b) implies (a) in Theorem 5. We give a proof of a slightly stronger statement in the following lemma. Lemma 3. If F,...,F k are image admissible Suslin via (Y i, S i,t i ), i =,...,k, and φ : R k R is measurable, then ϕ(f,...,f k ) is image admissible Suslin via ( k Y i, k S i,t) for T (y,...,y k )=ϕ(t y,...,t k y k ). Proof. There exist Polish spaces Z i and measurable maps g i : Z i Y i which are onto. Then g : Z Z k Y Y k defined by g(z,...,z k )=(g (z ),...,g k (z k )) is onto and measurable for S S k. So ( i Y i, i S i) is Suslin. It suffices to check that T is onto and ψ defined by the map ψ(x, y,...,y k )=ϕ(t y (x),...,t k y k (x)) is measurable. Obviously T is onto, and ψ is measurable because each map (x, y i ) T i y i (x) is measurable on X Y i and hence on X Y Y k, and hence (x, y,...,y k ) (T y (x),...,t k y k (x)) is measurable from X Yto R k.

Glivenko-Cantelli Preservation Theorems 23 4 Nonparametric maximum likelihood estimation: a general result Now we prove a general result for nonparametricmaximum likelihood estimation in a class of densities. The main proposition in this section is related to results of Pfanzagl (988) and Van de Geer (993), (996). Suppose that P is a class of densities with respect to a fixed σ finite measure µ on a measurable space (X, A). Suppose that X,...,X n are i.i.d. P 0 with density p 0 P. Let p n argmax P n log p. For 0 <α, let ϕ α (t) =(t α )/(t α + ) for t 0, ϕ(t) = for t<0. Then ϕ α is bounded and continuous for each α (0, ]. For 0 <β< define h 2 β(p, q) p β q β dµ. Note that h /2 (p, q) h(p, q) is the Hellinger distance between p and q, and by Hölder s inequality, h β (p, q) 0 with equality if and only if p = q a.e. µ. Proposition 3. Suppose that P is convex. Then ( h 2 α/2 ( p n,p 0 ) (P n P 0 ) ϕ α ( p ) n ). p 0 In particular, when α =we have, with ϕ ϕ, ( h 2 ( p n,p 0 ) h 2 /2 ( p n,p 0 ) (P n P 0 ) ϕ( p ) n ). p 0 Corollary. Suppose that {ϕ(p/p 0 ): p P}is a P 0 Glivenko-Cantelli class. Then for each 0 <α, h α/2 ( p n,p 0 ) a.s. 0. Proof of Proposition 3. It follows from convexity of P and convexity of t ϕ α (/t) that () 0 P n ϕ α ( p n /p 0 )=(P n P 0 )ϕ α ( p n /p 0 )+P 0 ϕ α ( p n /p 0 ); see van der Vaart and Wellner (996) page 330, and Pfanzagl (988), pp. 4 43. Now we show that p α p α ( ) 0 (2) P 0 ϕ α (p/p 0 )= p α + p α dp 0 p β 0 p β dµ 0 for β = α/2. Note that this holds if and only if p α +2 p α 0 + p 0dµ + p β pα 0 p β dµ,

24 A. van der Vaart and J. A. Wellner or p β 0 p β dµ 2 p α p α 0 + pα p 0dµ. But this holds if p β 0 p β 2 pα p 0 p α 0 +. pα With β = α/2, this becomes 2 (pα 0 + p α ) p α/2 0 p α/2 = p α 0 pα, and this holds by the arithmeticmean geometricmean inequality. Thus (2) holds. Combining (2) with () yields the claim of the proposition. 5 Example:a result of Schick and Yu Our goal in this section is to give another proof of the consistency result of Schick and Yu (999) for the Non-Parametric Maximum Likelihood Estimator (NPMLE) F n for mixed case interval censored data. Our proof is based on the inequality of the preceding section, and is similar in spirit to results of Van de Geer (993), (996). Suppose that Y is a random variable taking values in R + = [0, ) with distribution function F F= {all df s F on R + }. Unfortunately we are not able to observe Y itself. What we do observe is a vector of times T K =(T K,,...,T K,K ) where K, the number of times, is itself random, and the interval (T K,j,T K,j ] into which Y falls (with T K,0 0, T K,K+ ). More formally, we assume that K is an integer-valued random variable, and T = {T k,j,j =,...,k,k =, 2,...}, is a triangular array of potential observation times, and that Y and (K, T) are independent. Let X =( K,T K,K), with a possible value x =(δ k,t k,k), where k =( k,,..., k,k ) with k,j = (Tk,j,T k,j ](Y ), j =, 2,...,k +, and T k is the kth row of the triangular array T. Suppose we observe n i.i.d. copies of X; X,X 2,...,X n, where X i =( (i),t (i),k (i) ), i = K (i) K (i), 2,...,n. Here (Y (i),t (i),k (i) ), i =, 2,... are the underlying i.i.d. copies of (Y,T,K). We first note that conditionally on K and T K, the vector K has a multinomial distribution: where ( K K, T K ) Multinomial K+ (, F K ) F K (F (T K, ),F(T K,2 ) F (T K, ),..., F (T K,K )).

Glivenko-Cantelli Preservation Theorems 25 Suppose for the moment that the distribution G k of (T K K = k) has density g k and p k P (K = k). Then a density of X is given by (3) k+ p F (x) p F (δ k,t k,k)= (F (t k,j ) F (t k,j )) δ k,j g k (t k )p k where t k,0 0, t k,k+. In general, (4) p F (x) p F (δ k,t k,k) = = k+ (F (t k,j ) F (t k,j )) δ k,j k+ δ k,j (F (t k,j ) F (t k,j )) is a density of X with respect to the dominating measure ν where ν is determined by the joint distribution of (K, T), and it is this version of the density of X with which we will work throughout the rest of the paper. Thus the log-likelihood function for F of X,...,X n is given by where n l n(f X) = n n = P n m F K (i) + ( ) (i) K,j log F (T (i) (i) ) F (T K (i),j K (i),j ) m F (X) = K+ K+ K,j log (F (T K,j ) F (T K,j )) K,j log ( F K,j ) and where we have ignored the terms not involving F. We also note that, with P 0 P F0, K+ P 0 m F (X) =P 0 F 0,K,j log ( F K,j ). The NonparametricMaximum Likelihood Estimator (NPMLE) F n is the distribution function F n (t) which puts all its mass at the observed time points and maximizes the log-likelihood l n (F X). It can be calculated via the iterative convex minorant algorithm proposed in Groeneboom and Wellner (992) for case 2 interval censored data.

26 A. van der Vaart and J. A. Wellner By Proposition 3 with α = and ϕ ϕ as before, it follows that ( ) h 2 (p Fn,p F0 ) (P n P 0 ) ϕ(p Fn /p F0 ) where ϕ is bounded and continuous from R to R. Now the collection of functions G {p F : F F} is easily seen to be a Glivenko-Cantelli class of functions: this can be seen by first applying Theorem 4 to the collections G k, k =, 2,... obtained from G by restricting to the sets K = k. Then for fixed k, the collections G k = {p F (δ, t k,k): F F}are P 0 Glivenko-Cantelli classes since F is a uniform Glivenko-Cantelli class, and since the functions p F are continuous transformations of the classes of functions x δ k,j and x F (t k,j ) for j =,...,k+, and hence G is P Glivenko-Cantelli by Theorem 3. Note that the single function p F0 is trivially P 0 Glivenko-Cantelli since it is uniformly bounded, and the single function (/p F0 ) is also P 0 GC since P 0 (/p F0 ) <. Thus by Proposition 2 with g =(/p F0 ) and F = G = {p F : F F}, it follows that G {p F /p F0 : F F}is P 0 Glivenko- Cantelli. Finally another application of Theorem 3 shows that the collection H {L (p F /p F0 ): F F}= {ϕ(p F /p F0 ): F F} is also P 0 -Glivenko-Cantelli. When combined with Proposition 3, this yields the following theorem: Theorem 7. The NPMLE F n satisfies h(p Fn,p F0 ) a.s. 0. To relate this result to a recent theorem of Schick and Yu (999), it remains only to understand the relationship between their L (µ) and the Hellinger metric h between p F and p F0. Let B denote the collection of Borel sets in R. OnB we define measures µ and µ as follows: For B B, (5) and (6) µ(b) = µ(b) = P (K = k) k= P (K = k) k k= k P (T k,j B K = k), k P (T k,j B K = k). Let d be the L (µ) metricon the class F; thus for F,F 2 F, d(f,f 2 )= F (t) F 2 (t) dµ(t).

Glivenko-Cantelli Preservation Theorems 27 The measure µ was introduced by Schick and Yu (999); note that µ is a finite measure if E(K) <. Note that d(f,f 2 ) can also be written in terms of an expectation as: K (7) d(f,f 2 )=E (K,T ) (F (T K,j ) F 2 (T K,j ). As Schick and Yu (999) observed, consistency of the NPMLE F n in L (µ) holds under virtually no further hypotheses. Theorem 8. (Schick and Yu). Suppose that E(K) <. Then d( F n,f 0 ) a.s. 0. Proof. We will show that Theorem 8 follows from Theorem 7 and the following lemma. Lemma 4. { 2 Proof. We know that F n F 0 d µ} 2 h 2 (p F n,p F 0 ). h 2 (p Fn,p F0 ) d TV (p Fn,p F0 ) 2h(p Fn,p F0 ) where, with y k,0 =, y k,k+ =, h 2 (p Fn,p F0 )= k+ P (K = k) {[ F n (y k,j ) F n (y k,j )] /2 k= [F 0 (y k,j ) F 0 (y k,j )] /2 } 2 dg k (y) while k+ d TV (p Fn,p F0 )= P (K = k) [ F n (y k,j ) F n (y k,j )] k= [F 0 (y k,j ) F 0 (y k,j )] dg k (y). Note that k+ [ F n (y k,j ) F n (y k,j )] [F 0 (y k,j ) F 0 (y k,j )] k+ = ( F n F 0 )(y k,j,y k,j ] max j k+ F n (y k,j ) F 0 (y k,j ),

28 A. van der Vaart and J. A. Wellner so integrating across this inequality with respect to G k (y) yields k+ [ F n (y k,j ) F n (y k,j )] [F 0 (y k,j ) F 0 (y k,j )] dg k (y) max j k k k F n (y k,j ) F 0 (y k,j ) dg k,j (y k,j ) F n (y k,j ) F 0 (y k,j ) dg k,j (y k,j ). By multiplying across by P (K = k) and summing over k, this yields d TV (p Fn,p F0 ) F n F 0 d µ, and hence (8) h 2 (p Fn,p F0 ) 2 { F n F 0 d µ} 2. The measure µ figuring in Lemma 4 is not the same as the measure µ of Schick and Yu (999) because of the factor /k. Note that this factor means that the measure µ is always a finite measure, even if E(K) =. It is clear that µ(b) µ(b) for every Borel set B, and that µ µ. The following lemma (Lemma 2.2 of Schick and Yu (999)) together with Lemma 4 shows that Theorem 7 implies the result of Schick and Yu once again: Lemma 5. Suppose that µ and µ are two finite measures, and that g, g, g 2,... are measurable functions with range in [0, ]. Suppose that µ is absolutely continuous with respect to µ. Then g n g d µ 0 implies that gn g dµ 0. Proof. Write g n g dµ = g n g dµ d µ d µ and use the dominated convergence theorem applied to a.e. convergent subsequences. 6 Example:generalizing the result of Schick and Yu Our goal in this section is to give a generalization of the consistency result of Schick and Yu (999).

Glivenko-Cantelli Preservation Theorems 29 Suppose that Y is a random variable taking values in Y. Suppose that Y has distribution Q on the measurable space (Y, B). Unfortunately we are not able to observe Y itself. What we do observe is a vector of random sets C K =(C K,,...,C K,K ) where K, the number of sets is itself random, and the set {C K,j } K form a partition of Y: C K,j C K,j = if j j and K C K,j = Y. More formally, we assume that K is an integer-valued random variable, and C = {C k,j,j =,...,k,k =, 2,...}, is a triangular array of random sets, and that Y and (K, C) are independent. Let X =( K,C K,K), with a possible value x =(δ k,c k,k), where k =( k,,..., k,k ) with k,j = Ck,j (Y ), j =, 2,...,k, and C k is the k th row of the triangular array C. Suppose we observe n i.i.d. copies of X; X,X 2,...,X n, where X i =( (i),c (i),k (i) ), i =, 2,...,n. Here K (i) K (i) (Y (i),c (i),k (i) ), i =, 2,... are the underlying i.i.d. copies of (Y,C,K). We first note that conditionally on K and C K, the vector K has a multinomial distribution: where ( K K, C K ) Multinomial K (,Q K ) Q K (Q(C K, ),Q(C K,2 ),...,Q(C K,K )). Suppose for the moment that the distribution G k of (C K K = k) has density g k and p k P (K = k). Then a density of X is given by (9) k p Q (x) p Q (δ k,c k,k)= Q(c k,j ) δ k,j g k (c k )p k. In general, (0) k k p Q (x) p Q (δ k,c k,k)= Q(c k,j ) δ k,j = δ k,j Q(c k,j ) is a density of X with respect to the dominating measure ν where ν is determined by the joint distribution of (K, C), and it is this version of the density of X with which we will work throughout the rest of the paper. Thus the log-likelihood function for Q of X,...,X n is given by where n l n(q X) = n n K (i) (i) K,j log Q(C(i) K (i),j )=P nm Q m Q (X) = K K,j log Q(C K,j )

30 A. van der Vaart and J. A. Wellner and where we have ignored the terms not involving Q. We also note that, with P 0 P Q0, K P 0 m Q (X) =P 0 Q 0 (C K,j ) log Q(C K,j ). The NonparametricMaximum Likelihood Estimator (NPMLE) Q n is a probability measure Q n which maximizes the log-likelihood l n (Q X). Here we will bypass the many interesting existence, characterization, and computational issues connected with the NPMLE Q n, and focus instead on the issue of consistency once we have the NPMLE in hand. By Proposition 3 with α = it follows that ( ) h 2 (p Qn,p Q0 ) (P n P 0 ) ϕ(p Qn /p Q0 ). where ϕ is bounded and continuous. Now the collection of functions G {p Q : Q Q} where Q is the collection of all probability distributions on (Y, B), is easily seen to be a Glivenko-Cantelli class of functions: this can be seen by first applying Theorem 4 to the collections G k, k =, 2,... obtained from G by restricting to the sets K = k. The next step is to show that the collections G k {p Q (,,k): Q Q} are P 0 Glivenko-Cantelli for each k. To this end, suppose that (Y, B) isa measurable space and let C be a universal GC - class of sets in Y (i.e. Q GC for every probability measure Q on (Y, B)). Let (T,T,G) be a probability space, and let t C t be a map with values in C. Lemma 6. Suppose that the collection {{t : y C t } : y Y}is G GC. Then the collection of functions t Q(C t ) where Q ranges over all probability measures Q on Y is G GC. Remark. By Assouad (983), Proposition 2.2, page 246, together with Corollary.0, page 24, it follows that {{t T : y C t } : y Y}is a VC-class of subsets of T if and only if {C t : t T } is a VC-class of subsets of Y. Thus if we assume that all the subsets C = C t arising in the partitions are elements of a VC-class C, then, subject to being suitably measurable, the hypothesis of Lemma 6 is satisfied (and the class in question is even universal Glivenko-Cantelli).

Glivenko-Cantelli Preservation Theorems 3 Proof. For Q the Diracmeasure at y (unit mass at y), the function t Q(C t ) becomes t Ct (y). By assumption, the set of all such functions, with y ranging over Y, is G GC. Then so is the set of all functions t m m Ct (y i ), y,...,y m Y,m N. Let Y,...,Y m be an i.i.d. sample from a given Q. Because C is Q GC, sup m {Y i C} P (C) 0 a.s.. C C m Then there exists a sequence y,y 2,... in Y such that sup m {y i C} P (C) 0, C C m and consequently sup m {y i C t } P(C t ) 0. t T m It follows that the maps t Q(C t ) are the (uniform) limits of sequences of functions from a G GC class. Hence they are G GC. Note on measurability: It is assumed implicitly that the sets {t : y C t } are measurable in T for every y Y. The proof shows that the maps t P (C t ) are then also measurable. It follows that for fixed k, the collections G k = {p Q (δ, c k,k) : Q Q} are P 0 -Glivenko-Cantelli, and since the functions p Q are continuous transformations of the classes of functions x δ k,j and x Q(c k,j ) for j =,...,k, and hence G is P Glivenko-Cantelli by Theorem 3. Note that the single function p Q0 is trivially P 0 Glivenko-Cantelli since it is uniformly bounded. Now another application of Theorem 3 shows that the collection H {L (p Q /p Q0 ): Q Q}= {ϕ(p Q /p Q0 ): Q Q} is also P 0 -Glivenko-Cantelli. When combined with Proposition 3, this yields the following theorem. Theorem 9. If all C K,j C, a VC collection of subsets of X, then the NPMLE Q n satisfies h(p Qn,p Q0 ) a.s. 0.

32 A. van der Vaart and J. A. Wellner To obtain a statement analogous to the theorem of Schick and Yu (999), let Σ denote a sigma algebra for the space C of subsets: we are assuming that P ( K [C K,j C])=.Thus(C, Σ) is a measurable space. On Σ we define the measure µ as follows: () k µ(b) = P (K = k) P (C k,j B K = k), B Σ. k= Note that d TV (p Qn,p Q0 ) = (2) = k P (K = k) Q n (c k,j ) Q 0 (c k,j ) dg k (c) k= Q n (c) Q 0 (c) dµ(c) d( Q n,q 0 ). Here is a generalization of the theorem of Schick and Yu (999). Theorem 0. The NPMLE Q n satisfies d( Q n,q 0 ) a.s. 0. Proof. Theorem 0 follows from Theorem 9, (2) and the observation that d( Q n,q 0 )=d TV (p Qn,p Q0 ) 2h(p Qn,p Q0 ). Example. (Mixed case interval censoring in R 2 ). Suppose that Y Q 0 takes values in R +2, and the partitions {C K,j } K are obtained as the natural rectangles formed from two increasing collections 0 S K,... S K,K and 0 T K2,... T K2,K on the two coordinate axes. Then K =(K + )(K 2 + ), and all the elements of the partitions are elements of the VC-class of rectangles in R 2. It follows that the NPMLE Q n of Q 0 satisfies Q n (s, t] Q 0 (s, t] dµ(s, t) 0 a.s.. s,t R +2 Acknowledgments. We owe thanks to Evarist Giné and Joel Zinn for suggesting the proof given for Lemma. References Assouad, P. (983). Densité et dimension. Annales de l Institut Fourier 33, 233 282. Dudley, R. M. (984). A Course on Empirical Processes. Ecole d été de probabilités de St.-Flour, 982. Lecture Notes in Mathematics 097, Springer, Berlin, 42.

Glivenko-Cantelli Preservation Theorems 33 Dudley, R. M. (998a). Consistency of M-estimators and one-sided bracketing. In Proceedings of the First International Conference on High- Dimensional Probability. E. Eberlein, M. Hahn, and M. Talagrand, eds. Birkhäuser, Basel, pp. 33 58. Dudley, R. M. (998b). Personal communication. Dudley, R. M., Giné, E., and Zinn, J. (99). Uniform and universal Glivenko-Cantelli classes. J. Theoretical Probab. 4, 485 50. Giné, E., and Zinn, J. (984). Some limit theorems for empirical processes. Ann. Probability 2, 929 989. Groeneboom, P. and Wellner, J. A. (992). Information Bounds and Nonparametric Maximum Likelihood Estimation. DMV Seminar Band 9, Birkhäuser, Basel. Pfanzagl, J. (988). Consistency of maximum likelihood estimators for certain nonparametric families, in particular: mixtures. J. Statist. Planning and Inference 9, 37 58. Schick, A. and Yu, Q. (999), Consistency of the GMLE with Mixed Case Interval-Censored Data, Scand. J. Statist., to appear. Talagrand, M. (987a). Measurability problems for empirical measures. Ann. Probability 5, 204 22. Talagrand, M. (987b). The Glivenko-Cantelli problem. Ann. Probability 5, 837 870. Talagrand, M. (996). The Glivenko-Cantelli problem, ten years later. J. Theor. Prob. 9, 37 384. Van de Geer, S. (993). Hellinger consistency of certain nonparametric maximum likelihood estimators. Ann. Statist. 2, 4 44. Van de Geer, S. (996). Rates of convergence for the maximum likelihood estimator in mixture models. J. Nonparametric Statist. 6, 293 30. Van der Vaart, A. W. (996). New Donsker classes. Ann. Probability 24, 228 240. Van der Vaart, A. W., and Wellner, J. A. (996). Weak Convergence and Empirical Processes with Applications to Statistics, Springer-Verlag, New York. update? Department of Mathematics Free University De Boelelaan 08a 08 HV Amsterdam The Netherlands e-mail: aad@cs.vu.nl Department of Statistics University of Washington P.O. Box 354322 Seattle, Washington 9895-4322 U.S.A. e-mail: jaw@stat.washington.edu