Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes


Aad van der Vaart and Jon A. Wellner

ABSTRACT We show that the $P$-Glivenko property of classes of functions $\mathcal{F}_1,\ldots,\mathcal{F}_k$ is preserved by a continuous function $\varphi$ from $\mathbb{R}^k$ to $\mathbb{R}$ in the sense that the new class of functions $x \mapsto \varphi(f_1(x),\ldots,f_k(x))$, $f_i \in \mathcal{F}_i$, $i=1,\ldots,k$, is again a Glivenko-Cantelli class of functions if it has an integrable envelope. We also prove an analogous result for preservation of the uniform Glivenko-Cantelli property. Corollaries of the main theorem include two preservation theorems of Dudley (1998). We apply the main result to reprove a theorem of Schick and Yu (1999) concerning consistency of the NPMLE in a model for mixed case interval censoring. Finally a version of the consistency result of Schick and Yu (1999) is established for a general model for mixed case interval censoring in which a general sample space $\mathcal{Y}$ is partitioned into sets which are members of some VC-class $\mathcal{C}$ of subsets of $\mathcal{Y}$.

1 Glivenko-Cantelli theorems

Let $(\mathcal{X},\mathcal{A},P)$ be a probability space, and suppose that $\mathcal{F} \subset L_1(P)$. For such a class of functions, let $\mathcal{F}_{0,P} \equiv \{f - Pf : f \in \mathcal{F}\}$. We also let $F \equiv F_{\mathcal{F}}(x) \equiv \sup_{f\in\mathcal{F}} |f(x)|$, the envelope function of $\mathcal{F}$. If $|f| \le F$ for all $f \in \mathcal{F}$ with $F$ measurable, then $F$ is an envelope for $\mathcal{F}$.

Suppose that $X_1, X_2, \ldots$ are i.i.d. $P$. The Glivenko-Cantelli theorems give conditions under which the empirical measure $\mathbb{P}_n$ converges uniformly to $P$ over a class $\mathcal{F}$, either in probability (in which case we say that $\mathcal{F}$ is a weak Glivenko-Cantelli class for $P$) or almost surely:
$$\|\mathbb{P}_n - P\|_{\mathcal{F}} \equiv \sup_{f\in\mathcal{F}} |\mathbb{P}_n f - Pf| \to_{a.s.} 0;$$
in this case we say that $\mathcal{F}$ is a strong Glivenko-Cantelli class for $P$. Useful sufficient conditions for a class $\mathcal{F}$ to be a strong Glivenko-Cantelli class for $P$ are that it has an integrable envelope and either
$$\log N_{[\,]}(\epsilon, \mathcal{F}, L_1(P)) < \infty \quad \text{for all } \epsilon > 0$$

or
$$\log N(\epsilon, \mathcal{F}_M, L_1(\mathbb{P}_n)) = o_P(n) \quad \text{for every } \epsilon > 0 \text{ and } 0 < M < \infty.$$
Here $\mathcal{F}_M = \{f 1_{[F \le M]} : f \in \mathcal{F}\}$. In the second case some additional measurability conditions are necessary; see Van der Vaart and Wellner (1996), Chapter 2.4. In particular, the condition in Theorem 2.4.3, page 123, is shown by Giné and Zinn (1984) to be both necessary and sufficient, under measurability assumptions, for the class $\mathcal{F}$ to be a strong Glivenko-Cantelli class (see Talagrand (1996) for another proof). Talagrand (1987b) gives necessary and sufficient conditions for the Glivenko-Cantelli theorem without any measurability hypotheses.

Theorem 1. (Giné and Zinn, 1984). Suppose that $\mathcal{F}$ is $L_1(P)$-bounded and nearly linearly supremum measurable for $P$; in particular this holds if $\mathcal{F}$ is image admissible Suslin. Then the following are equivalent:
(a) $\mathcal{F}$ is a strong Glivenko-Cantelli class for $P$.
(b) $\mathcal{F}$ has an envelope function $F \in L_1(P)$ and the truncated classes $\mathcal{F}_M \equiv \{f 1\{F \le M\} : f \in \mathcal{F}\}$ satisfy
$$\frac{1}{n} E \log N(\epsilon, \mathcal{F}_M, L_r(\mathbb{P}_n)) \to 0 \quad \text{for all } \epsilon > 0 \text{ and for all } M \in (0,\infty),$$
for some (all) $r \in (0,\infty]$, where $\|f\|_{L_r(P)} \equiv \|f\|_{P,r} \equiv \{P(|f|^r)\}^{r^{-1}\wedge 1}$.

2 Preservation of the Glivenko-Cantelli property

Our goal in this section is to present several results concerning the stability of the Glivenko-Cantelli property of one or more classes of functions under composition with functions $\varphi$. A theorem which motivated our interest is the following result of Dudley (1998a).

Theorem 2. (Dudley, 1998a). Suppose that $\mathcal{F}$ is a strong Glivenko-Cantelli class for $P$ with $PF < \infty$, $J$ is a possibly unbounded interval including the ranges of all $f \in \mathcal{F}$, $\varphi$ is continuous and monotone on $J$, and for some finite constants $c, d$, $|\varphi(y)| \le c|y| + d$ for all $y \in J$. Then $\varphi(\mathcal{F})$ is also a strong Glivenko-Cantelli class for $P$.

Dudley (1998a) proves this via the characterization of Glivenko-Cantelli classes due to Talagrand (1987b). Dudley (1998b) also uses Talagrand's characterization to prove the following interesting proposition.

Proposition 1. (Dudley, 1998b). Suppose that $\mathcal{F}$ is a strong Glivenko-Cantelli class for $P$ with $PF < \infty$, and $g$ is a fixed bounded function ($\|g\|_\infty < \infty$). Then the class of functions $g \cdot \mathcal{F} \equiv \{g \cdot f : f \in \mathcal{F}\}$ is a strong Glivenko-Cantelli class for $P$.
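To make the notion of a strong Glivenko-Cantelli class concrete, the following minimal sketch computes $\|\mathbb{P}_n - P\|_{\mathcal{F}}$ exactly for the classical class of half-line indicators $\mathcal{F} = \{1_{(-\infty,t]} : t \in \mathbb{R}\}$. The choice of class, the uniform distribution, and the function names are illustrative assumptions and do not come from the paper.

```python
import random


def sup_deviation_halflines(sample, cdf):
    """Return sup_t |F_n(t) - F(t)| over the class {1(-inf, t]}.

    Assumes `cdf` is a continuous distribution function, so the supremum
    is attained just before or at one of the sample points.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical df jumps from i/n to (i+1)/n at the i-th order statistic.
        worst = max(worst, abs((i + 1) / n - fx), abs(fx - i / n))
    return worst


def uniform_cdf(t):
    """Distribution function of the uniform law on [0, 1] (illustrative choice)."""
    return min(1.0, max(0.0, t))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1000, 10000):
        sample = [rng.random() for _ in range(n)]
        print(n, sup_deviation_halflines(sample, uniform_cdf))
```

Running the script prints the supremum for increasing $n$; for this class the classical Glivenko-Cantelli theorem guarantees almost sure convergence to zero.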

Yet another proposition in this same vein is:

Proposition 2. (Giné and Zinn, 1984). Suppose that $\mathcal{F}$ is a uniformly bounded strong Glivenko-Cantelli class for $P$, and $g \in L_1(P)$ is a fixed function. Then the class of functions $g \cdot \mathcal{F} \equiv \{g \cdot f : f \in \mathcal{F}\}$ is a strong Glivenko-Cantelli class for $P$.

Given classes $\mathcal{F}_1,\ldots,\mathcal{F}_k$ of functions $f_i : \mathcal{X} \to \mathbb{R}$ and a function $\varphi : \mathbb{R}^k \to \mathbb{R}$, let $\varphi(\mathcal{F}_1,\ldots,\mathcal{F}_k)$ be the class of functions $x \mapsto \varphi(f_1(x),\ldots,f_k(x))$, where $f_i \in \mathcal{F}_i$, $i=1,\ldots,k$. Theorem 2 and Propositions 1 and 2 are all corollaries of the following theorem.

Theorem 3. Suppose that $\mathcal{F}_1,\ldots,\mathcal{F}_k$ are $P$-Glivenko-Cantelli classes of functions, and that $\varphi : \mathbb{R}^k \to \mathbb{R}$ is continuous. Then $\mathcal{H} \equiv \varphi(\mathcal{F}_1,\ldots,\mathcal{F}_k)$ is $P$-Glivenko-Cantelli provided that it has an integrable envelope function.

Proof. We first assume that the classes of functions $\mathcal{F}_i$ are appropriately measurable. Let $F_1,\ldots,F_k$ and $H$ be integrable envelopes for $\mathcal{F}_1,\ldots,\mathcal{F}_k$ and $\mathcal{H}$ respectively, and set $F = F_1 \vee \cdots \vee F_k$. For $M \in (0,\infty)$, define
$$\mathcal{H}_M \equiv \{\varphi(f) 1_{[F \le M]} : f = (f_1,\ldots,f_k) \in \mathcal{F}_1 \times \mathcal{F}_2 \times \cdots \times \mathcal{F}_k \equiv \underline{\mathcal{F}}\}.$$
Now
$$\|(\mathbb{P}_n - P)\varphi(f)\|_{\underline{\mathcal{F}}} \le (\mathbb{P}_n + P) H 1_{[F > M]} + \|(\mathbb{P}_n - P)h\|_{\mathcal{H}_M}.$$
The expectation of the first term on the right converges to 0 as $M \to \infty$. Hence it suffices to show that $\mathcal{H}_M$ is $P$-Glivenko-Cantelli for every fixed $M$. Let $\delta = \delta(\epsilon)$ be the $\delta$ of Lemma 2 below for $\varphi : [-M,M]^k \to \mathbb{R}$, $\epsilon > 0$, and the $L_1$-norm $\|\cdot\|_1$. Then for any $(f_j, g_j) \in \mathcal{F}_j$, $j=1,\ldots,k$,
$$\mathbb{P}_n |f_j - g_j| 1_{[F_j \le M]} \le \frac{\delta}{k}, \quad j=1,\ldots,k,$$
implies that
$$\mathbb{P}_n |\varphi(f_1,\ldots,f_k) - \varphi(g_1,\ldots,g_k)| 1_{[F \le M]} \le \epsilon.$$
It follows that
$$N(\epsilon, \mathcal{H}_M, L_1(\mathbb{P}_n)) \le \prod_{j=1}^k N\Big(\frac{\delta}{k}, \mathcal{F}_j 1_{[F_j \le M]}, L_1(\mathbb{P}_n)\Big).$$
Thus $E \log N(\epsilon, \mathcal{H}_M, L_1(\mathbb{P}_n)) = o(n)$ for every $\epsilon > 0$, $M < \infty$. This implies that $E \log N(\epsilon, (\mathcal{H}_M)_N, L_1(\mathbb{P}_n)) = o(n)$ for $(\mathcal{H}_M)_N$ the functions $h 1\{H \le N\}$ for $h \in \mathcal{H}_M$. Thus $\mathcal{H}_M$ is strong Glivenko-Cantelli for $P$ by Theorem 1. This concludes the proof that $\mathcal{H} = \varphi(\underline{\mathcal{F}})$ is weak Glivenko-Cantelli. Because it has an integrable envelope, it is strong Glivenko-Cantelli by, e.g., a lemma of Van der Vaart and Wellner (1996). This concludes the proof for appropriately measurable classes $\mathcal{F}_j$, $j=1,\ldots,k$.

We extend the theorem to general Glivenko-Cantelli classes using separable versions as in Talagrand (1987a). (Also see van der Vaart and Wellner (1996) for a discussion.) As shown in the preceding argument, it is not a loss of generality to assume that the classes $\mathcal{F}_i$ are uniformly bounded. Furthermore, it suffices to show that $\varphi(\mathcal{F}_1,\ldots,\mathcal{F}_k)$ is weak Glivenko-Cantelli. We first need a lemma.

Lemma 1. Any strong $P$-Glivenko-Cantelli class $\mathcal{F}$ is totally bounded in $L_1(P)$ if and only if $\|P\|_{\mathcal{F}} < \infty$. Furthermore for any $r \in (1,\infty)$, if $\mathcal{F}$ has an envelope that is contained in $L_r(P)$, then $\mathcal{F}$ is also totally bounded in $L_r(P)$.

Proof. A class that is totally bounded is also bounded. Thus for the first statement we only need to prove that a strong Glivenko-Cantelli class $\mathcal{F}$ with $\|P\|_{\mathcal{F}} < \infty$ is totally bounded in $L_1(P)$. It is well-known that such a class has an integrable envelope. E.g. see Giné and Zinn (1983) or Problem 2.4.1 of van der Vaart and Wellner (1996) to conclude first that $P^*\|f - Pf\|_{\mathcal{F}} < \infty$. Next the claim follows from the triangle inequality $\|f\|_{\mathcal{F}} \le \|f - Pf\|_{\mathcal{F}} + \|P\|_{\mathcal{F}}$. Thus it is no loss of generality to assume that the class $\mathcal{F}$ possesses an envelope that is finite everywhere.

If $\mathcal{F}$ is not totally bounded in $L_1(P)$, then there is $\varepsilon > 0$ and a countable $\mathcal{G} \subset \mathcal{F}$ such that the packing number $D(\varepsilon, \mathcal{G}, L_1(P)) = \infty$. This is obvious, since if $\mathcal{F}$ is not totally bounded, then $D(\varepsilon, \mathcal{F}, L_1(P)) = \infty$ for some $\varepsilon > 0$; then take a countable set of $\varepsilon$-separated points. So, now $\mathcal{G}$ has all the measurability properties, is GC and $L_1(P)$-bounded, and moreover, $D(\varepsilon, \mathcal{G}, L_1) = \infty$. But this is a contradiction because, as shown in Giné and Zinn (1984), first half of page 984, $D(\varepsilon, \mathcal{G}, L_1)$ must be finite for all $\varepsilon > 0$. This completes the proof of the first part of the lemma.

If $\mathcal{F}$ has an envelope in $L_r(P)$, then $\mathcal{F}$ is totally bounded in $L_r(P)$ if the class $\mathcal{F}_M$ of functions $f 1\{F \le M\}$ is totally bounded in $L_r(P)$ for every fixed $M$. The class $\mathcal{F}_M$ is $P$-GC by Theorem 3 and hence this class is totally bounded in $L_1(P)$. But then it is also totally bounded in $L_r(P)$, because $P|f|^r \le P|f|\, M^{r-1}$ for any $f$ that is bounded by $M$ and we can construct the $\epsilon$-net over $\mathcal{F}_M$ in $L_1(P)$ to consist of functions that are bounded by $M$.

Because a Glivenko-Cantelli class $\mathcal{F}$ with $\|P\|_{\mathcal{F}} < \infty$ is totally bounded in $L_1(P)$ by Lemma 1, it is separable as a subset of $L_1(P)$. A minor generalization of a theorem in van der Vaart and Wellner (1996) shows that there exists a bijection $f \leftrightarrow \tilde{f}$ of $\mathcal{F}$ onto a class $\tilde{\mathcal{F}} \subset L_1(P)$ such that

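(i) $\tilde{f} = f$ $P$-almost surely for every $f \in \mathcal{F}$;
(ii) there exists a countable subset $\mathcal{G} \subset \tilde{\mathcal{F}}$ such that for every $n$ there exists a measurable set $N_n \subset \mathcal{X}^n$ with $P^n(N_n) = 0$ such that for all $(x_1,\ldots,x_n) \notin N_n$ and $\tilde{f} \in \tilde{\mathcal{F}}$ there exists $\{g_m\} \subset \mathcal{G}$ such that $P|g_m - \tilde{f}| \to 0$ and $(g_m(x_1),\ldots,g_m(x_n)) \to (\tilde{f}(x_1),\ldots,\tilde{f}(x_n))$.

By an adaptation of a theorem due to Talagrand (1987a) (see the corresponding theorem in van der Vaart and Wellner (1996)) a class $\mathcal{F}$ is weak Glivenko-Cantelli if and only if the class $\tilde{\mathcal{F}}$ is weak Glivenko-Cantelli and
$$\sup_{f \in \mathcal{F}} \mathbb{P}_n |f - \tilde{f}| \to 0$$
in outer probability.

Construct a pointwise separable version $\tilde{\mathcal{F}}_i$ for each of the classes $\mathcal{F}_i$. The classes $\tilde{\mathcal{F}}_i$ possess enough measurability to make the preceding argument work; in particular a pointwise separable version in the above sense is sufficient for the nearly linearly supremum measurable hypothesis of Giné and Zinn (1984) for both $\tilde{\mathcal{F}}_1,\ldots,\tilde{\mathcal{F}}_k$ and $\varphi(\tilde{\mathcal{F}}_1,\ldots,\tilde{\mathcal{F}}_k)$. Thus the class $\varphi(\tilde{\mathcal{F}}_1,\ldots,\tilde{\mathcal{F}}_k)$ is Glivenko-Cantelli for $P$. Now by Lemma 2 there exists for every $\epsilon > 0$ a $\delta > 0$ such that
$$\mathbb{P}_n |f_j - \tilde{f}_j| < \frac{\delta}{k}, \quad j=1,\ldots,k,$$
implies
$$\mathbb{P}_n |\varphi(f_1,\ldots,f_k) - \varphi(\tilde{f}_1,\ldots,\tilde{f}_k)| < \epsilon.$$
The theorem follows.

Lemma 2. Suppose that $\varphi : K \to \mathbb{R}$ is continuous and $K \subset \mathbb{R}^k$ is compact. Then for every $\epsilon > 0$ there exists $\delta > 0$ such that for all $n$ and for all $a_1,\ldots,a_n, b_1,\ldots,b_n \in K \subset \mathbb{R}^k$,
$$\frac{1}{n}\sum_{i=1}^n \|a_i - b_i\| < \delta$$
implies
$$\frac{1}{n}\sum_{i=1}^n |\varphi(a_i) - \varphi(b_i)| < \epsilon.$$
Here $\|\cdot\|$ can be any norm on $\mathbb{R}^k$; in particular it can be $\|x\|_r = \big(\sum_{i=1}^k |x_i|^r\big)^{1/r}$, $r \in [1,\infty)$, or $\|x\|_\infty \equiv \max_{1 \le i \le k} |x_i|$ for $x = (x_1,\ldots,x_k) \in \mathbb{R}^k$.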
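Lemma 2 can be made quantitative when a modulus of continuity $\omega$ of $\varphi$ on $K$ is known. Pairs with $\|a_i - b_i\| \le \eta$ contribute at most $\omega(\eta)$, and by Markov's inequality at most a fraction $\delta/\eta$ of the pairs are further apart, so the average of $|\varphi(a_i) - \varphi(b_i)|$ is at most $\omega(\eta) + \mathrm{osc}(\varphi)\,\delta/\eta$. The sketch below checks this bound numerically. It is an illustration under assumptions that are not in the paper: $k = 1$, $K = [-1,1]$, $\varphi(u) = \sqrt{|u|}$ with $\omega(\eta) = \sqrt{\eta}$ and oscillation 1, and hypothetical function names.

```python
import math
import random


def mean_distance(a, b):
    """Average of |a_i - b_i| (the norm on R^1)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_phi_gap(phi, a, b):
    """Average of |phi(a_i) - phi(b_i)|."""
    return sum(abs(phi(x) - phi(y)) for x, y in zip(a, b)) / len(a)


def lemma2_bound(delta, eta, modulus, oscillation):
    """Bound omega(eta) + osc * delta / eta from the Markov-type argument."""
    return modulus(eta) + oscillation * delta / eta


def phi(u):
    # Continuous but not Lipschitz on [-1, 1] (illustrative choice).
    return math.sqrt(abs(u))


def modulus(eta):
    # |sqrt|a| - sqrt|b|| <= sqrt|a - b|, so omega(eta) = sqrt(eta) works.
    return math.sqrt(eta)


def demo(n=500, seed=1):
    rng = random.Random(seed)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = [min(1.0, max(-1.0, x + rng.gauss(0.0, 0.01))) for x in a]
    b[0] = 1.0 if a[0] < 0 else -1.0  # one pair deliberately far apart
    delta = mean_distance(a, b)
    eta = delta ** (2.0 / 3.0)  # roughly balances the two terms of the bound
    return delta, mean_phi_gap(phi, a, b), lemma2_bound(delta, eta, modulus, 1.0)


if __name__ == "__main__":
    print(demo())
```

The single far-apart pair shows why the lemma is stated for averages: one large difference is allowed as long as the mean distance stays small.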

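Proof. Let $U_n$ be uniform on $\{1,\ldots,n\}$, and set $X_n = a_{U_n}$, $Y_n = b_{U_n}$. Then we can write
$$\frac{1}{n}\sum_{i=1}^n \|a_i - b_i\| = E\|X_n - Y_n\| \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^n |\varphi(a_i) - \varphi(b_i)| = E|\varphi(X_n) - \varphi(Y_n)|.$$
Hence it suffices to show that for every $\epsilon > 0$ there exists $\delta > 0$ such that for all random vectors $(X,Y)$ in $K \subset \mathbb{R}^k$, $E\|X - Y\| < \delta$ implies $E|\varphi(X) - \varphi(Y)| < \epsilon$. Suppose not. Then for some $\epsilon > 0$ and for all $m = 1, 2, \ldots$ there exists $(X_m, Y_m)$ such that
$$E\|X_m - Y_m\| < \frac{1}{m}, \qquad E|\varphi(X_m) - \varphi(Y_m)| \ge \epsilon.$$
But since $\{(X_m, Y_m)\}$ is tight, there exists a subsequence $(X_{m'}, Y_{m'}) \to_d (X, Y)$. Then it follows that
$$E\|X - Y\| = \lim_{m' \to \infty} E\|X_{m'} - Y_{m'}\| = 0$$
so that $X = Y$ a.s., while on the other hand
$$0 = E|\varphi(X) - \varphi(Y)| = \lim_{m' \to \infty} E|\varphi(X_{m'}) - \varphi(Y_{m'})| \ge \epsilon > 0.$$
This contradiction means that the desired implication holds.

Another potentially useful preservation theorem is one based on building up Glivenko-Cantelli classes from the restrictions of a class of functions to elements of a partition of the sample space. The following theorem is related to the results of Van der Vaart (1996) for Donsker classes.

Theorem 4. Suppose that $\mathcal{F}$ is a class of functions on $(\mathcal{X},\mathcal{A},P)$, and $\{\mathcal{X}_i\}$ is a partition of $\mathcal{X}$: $\bigcup_{i=1}^\infty \mathcal{X}_i = \mathcal{X}$, $\mathcal{X}_i \cap \mathcal{X}_j = \emptyset$ for $i \ne j$. Suppose that $\mathcal{F}_j \equiv \{f 1_{\mathcal{X}_j} : f \in \mathcal{F}\}$ is $P$-Glivenko-Cantelli for each $j$, and $\mathcal{F}$ has an integrable envelope function $F$. Then $\mathcal{F}$ is itself $P$-Glivenko-Cantelli.

Proof. Since $f = f \sum_{j=1}^\infty 1_{\mathcal{X}_j} = \sum_{j=1}^\infty f 1_{\mathcal{X}_j}$, it follows that
$$E^*\|\mathbb{P}_n - P\|_{\mathcal{F}} \le \sum_{j=1}^\infty E^*\|\mathbb{P}_n - P\|_{\mathcal{F}_j} \to 0$$
by the dominated convergence theorem, since each term in the sum converges to zero by the hypothesis that each $\mathcal{F}_j$ is $P$-Glivenko-Cantelli, and we have
$$E^*\|\mathbb{P}_n - P\|_{\mathcal{F}_j} \le E^*\mathbb{P}_n(F 1_{\mathcal{X}_j}) + P(F 1_{\mathcal{X}_j}) \le 2P(F 1_{\mathcal{X}_j})$$
where $\sum_{j=1}^\infty P(F 1_{\mathcal{X}_j}) = P(F) < \infty$.

3 Preservation of the uniform Glivenko-Cantelli property

We say that $\mathcal{F}$ is a strong uniform Glivenko-Cantelli class if for all $\epsilon > 0$
$$\sup_{P \in \mathcal{P}(\mathcal{X},\mathcal{A})} \mathrm{Pr}^*_P\Big(\sup_{m \ge n} \|\mathbb{P}_m - P\|_{\mathcal{F}} > \epsilon\Big) \to 0 \quad \text{as } n \to \infty$$
where $\mathcal{P}(\mathcal{X},\mathcal{A})$ is the set of all probability measures on $(\mathcal{X},\mathcal{A})$. For $x = (x_1,\ldots,x_n) \in \mathcal{X}^n$, $n = 1, 2, \ldots$, and $r \in (0,\infty)$, we define on $\mathcal{F}$ the pseudo-distances
$$e_{x,r}(f,g) = \Big\{\frac{1}{n}\sum_{i=1}^n |f(x_i) - g(x_i)|^r\Big\}^{r^{-1}\wedge 1}, \qquad e_{x,\infty}(f,g) = \max_{1 \le i \le n} |f(x_i) - g(x_i)|, \quad f, g \in \mathcal{F}.$$
Let $N(\epsilon, \mathcal{F}, e_{x,r})$ denote the $\epsilon$-covering number of $(\mathcal{F}, e_{x,r})$, $\epsilon > 0$. Then define, for $n = 1, 2, \ldots$, $\epsilon > 0$, and $r \in (0,\infty]$, the quantities
$$N_{n,r}(\epsilon, \mathcal{F}) = \sup_{x \in \mathcal{X}^n} N(\epsilon, \mathcal{F}, e_{x,r}).$$

Theorem 5. (Dudley, Giné, and Zinn (1991)). Suppose that $\mathcal{F}$ is a class of uniformly bounded functions such that $\mathcal{F}$ is image admissible Suslin. Then the following are equivalent:
(a) $\mathcal{F}$ is a strong uniform Glivenko-Cantelli class.
(b) $\dfrac{\log N_{n,r}(\epsilon, \mathcal{F})}{n} \to 0$ for all $\epsilon > 0$, for some (all) $r \in (0,\infty]$.

For the definition of the image admissible Suslin property see Dudley (1984), sections 10.3 and 11.1. The following theorem gives natural sufficient conditions for preservation of the uniform Glivenko-Cantelli theorem.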

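Theorem 6. Suppose that $\mathcal{F}_1,\ldots,\mathcal{F}_k$ are classes of uniformly bounded functions on $(\mathcal{X},\mathcal{A})$ such that $\mathcal{F}_1,\ldots,\mathcal{F}_k$ are image-admissible Suslin and strong uniform Glivenko-Cantelli classes. Suppose that $\varphi : \mathbb{R}^k \to \mathbb{R}$ is continuous. Then $\mathcal{H} \equiv \varphi(\mathcal{F}_1,\ldots,\mathcal{F}_k)$ is a strong uniform Glivenko-Cantelli class.

Proof. It follows from Lemma 2 that for any $\epsilon > 0$ there exists $\delta > 0$ such that for any $f_j, g_j \in \mathcal{F}_j$, $j=1,\ldots,k$, $x \in \mathcal{X}^n$,
$$e_{x,1}(f_j, g_j) \le \frac{\delta}{k}, \quad j=1,\ldots,k,$$
implies that
$$e_{x,1}(\varphi(f_1,\ldots,f_k), \varphi(g_1,\ldots,g_k)) \le \epsilon.$$
It follows that
$$N_{n,1}(\epsilon, \mathcal{H}) \le \prod_{j=1}^k N_{n,1}\Big(\frac{\delta}{k}, \mathcal{F}_j\Big),$$
and hence that
$$\frac{1}{n}\log N_{n,1}(\epsilon, \mathcal{H}) \le \sum_{j=1}^k \frac{1}{n}\log N_{n,1}\Big(\frac{\delta}{k}, \mathcal{F}_j\Big) \to 0$$
where the convergence follows from part (b) of Theorem 5 with $r = 1$. If we show that $\mathcal{H} = \varphi(\underline{\mathcal{F}})$ is image admissible Suslin, then the conclusion follows from (b) implies (a) in Theorem 5. We give a proof of a slightly stronger statement in the following lemma.

Lemma 3. If $\mathcal{F}_1,\ldots,\mathcal{F}_k$ are image admissible Suslin via $(Y_i, \mathcal{S}_i, T_i)$, $i=1,\ldots,k$, and $\varphi : \mathbb{R}^k \to \mathbb{R}$ is measurable, then $\varphi(\mathcal{F}_1,\ldots,\mathcal{F}_k)$ is image admissible Suslin via $\big(\prod_{i=1}^k Y_i, \prod_{i=1}^k \mathcal{S}_i, T\big)$ for $T(y_1,\ldots,y_k) = \varphi(T_1 y_1,\ldots,T_k y_k)$.

Proof. There exist Polish spaces $Z_i$ and measurable maps $g_i : Z_i \to Y_i$ which are onto. Then $g : Z_1 \times \cdots \times Z_k \to Y_1 \times \cdots \times Y_k$ defined by $g(z_1,\ldots,z_k) = (g_1(z_1),\ldots,g_k(z_k))$ is onto and measurable for $\mathcal{S}_1 \times \cdots \times \mathcal{S}_k$. So $\big(\prod_i Y_i, \prod_i \mathcal{S}_i\big)$ is Suslin. It suffices to check that $T$ is onto and that $\psi$ defined by
$$\psi(x, y_1,\ldots,y_k) = \varphi(T_1 y_1(x),\ldots,T_k y_k(x))$$
is measurable. Obviously $T$ is onto, and $\psi$ is measurable because each map $(x, y_i) \mapsto T_i y_i(x)$ is measurable on $\mathcal{X} \times Y_i$ and hence on $\mathcal{X} \times Y_1 \times \cdots \times Y_k$, and hence $(x, y_1,\ldots,y_k) \mapsto (T_1 y_1(x),\ldots,T_k y_k(x))$ is measurable from $\mathcal{X} \times Y$ to $\mathbb{R}^k$.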
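The pseudo-distances $e_{x,r}$ and the covering numbers entering Theorems 5 and 6 can be computed directly for a finite class once each function is represented by its vector of values at $x_1,\ldots,x_n$. The sketch below is illustrative only: the function names are hypothetical, the greedy construction gives an upper bound on $N(\epsilon, \mathcal{F}, e_{x,r})$ rather than its exact value, and balls are taken closed (distance at most $\epsilon$).

```python
def e_x_r(f_vals, g_vals, r):
    """Pseudo-distance e_{x,r}(f, g) from value vectors (f(x_i)) and (g(x_i)).

    r may be any positive number or float('inf'); for finite r the exponent
    is min(1/r, 1), matching the definition in Section 3.
    """
    diffs = [abs(u - v) for u, v in zip(f_vals, g_vals)]
    if r == float("inf"):
        return max(diffs)
    mean = sum(d ** r for d in diffs) / len(diffs)
    return mean ** min(1.0 / r, 1.0)


def greedy_cover_size(value_vectors, eps, r):
    """Size of a greedily built eps-net: an upper bound on the covering number."""
    centers = []
    for v in value_vectors:
        # v becomes a new center only if no existing center is within eps.
        if all(e_x_r(v, c, r) > eps for c in centers):
            centers.append(v)
    return len(centers)


if __name__ == "__main__":
    # Half-line indicators 1{x_i <= t} evaluated at x = (1, 2, 3)
    # for thresholds t = 0.5, 1.5, 2.5, 3.5 (illustrative class).
    vectors = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
    print(greedy_cover_size(vectors, 0.4, 1), greedy_cover_size(vectors, 0.1, 1))
```

Every vector is either a center or within $\epsilon$ of an earlier center, so the returned centers form an $\epsilon$-net; the net need not be of minimal size.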

4 Nonparametric maximum likelihood estimation: a general result

Now we prove a general result for nonparametric maximum likelihood estimation in a class of densities. The main proposition in this section is related to results of Pfanzagl (1988) and Van de Geer (1993), (1996).

Suppose that $\mathcal{P}$ is a class of densities with respect to a fixed $\sigma$-finite measure $\mu$ on a measurable space $(\mathcal{X},\mathcal{A})$. Suppose that $X_1,\ldots,X_n$ are i.i.d. $P_0$ with density $p_0 \in \mathcal{P}$. Let
$$\hat{p}_n \equiv \mathrm{argmax}_{p \in \mathcal{P}}\, \mathbb{P}_n \log p.$$
For $0 < \alpha \le 1$, let $\varphi_\alpha(t) = (t^\alpha - 1)/(t^\alpha + 1)$ for $t \ge 0$, $\varphi_\alpha(t) = -1$ for $t < 0$. Then $\varphi_\alpha$ is bounded and continuous for each $\alpha \in (0,1]$. For $0 < \beta < 1$ define
$$h_\beta^2(p,q) \equiv 1 - \int p^\beta q^{1-\beta}\, d\mu.$$
Note that $h_{1/2}(p,q) \equiv h(p,q)$ is the Hellinger distance between $p$ and $q$, and by Hölder's inequality, $h_\beta(p,q) \ge 0$ with equality if and only if $p = q$ a.e. $\mu$.

Proposition 3. Suppose that $\mathcal{P}$ is convex. Then
$$h^2_{\alpha/2}(\hat{p}_n, p_0) \le (\mathbb{P}_n - P_0)\Big(\varphi_\alpha\Big(\frac{\hat{p}_n}{p_0}\Big)\Big).$$
In particular, when $\alpha = 1$ we have, with $\varphi \equiv \varphi_1$,
$$h^2(\hat{p}_n, p_0) = h^2_{1/2}(\hat{p}_n, p_0) \le (\mathbb{P}_n - P_0)\Big(\varphi\Big(\frac{\hat{p}_n}{p_0}\Big)\Big).$$

Corollary 1. Suppose that $\{\varphi(p/p_0) : p \in \mathcal{P}\}$ is a $P_0$-Glivenko-Cantelli class. Then for each $0 < \alpha \le 1$, $h_{\alpha/2}(\hat{p}_n, p_0) \to_{a.s.} 0$.

Proof of Proposition 3. It follows from convexity of $\mathcal{P}$ and convexity of $t \mapsto \varphi_\alpha(1/t)$ that
$$(1)\qquad 0 \le \mathbb{P}_n \varphi_\alpha(\hat{p}_n/p_0) = (\mathbb{P}_n - P_0)\varphi_\alpha(\hat{p}_n/p_0) + P_0 \varphi_\alpha(\hat{p}_n/p_0);$$
see van der Vaart and Wellner (1996) page 330, and Pfanzagl (1988). Now we show that
$$(2)\qquad P_0 \varphi_\alpha(p/p_0) = \int \frac{p^\alpha - p_0^\alpha}{p^\alpha + p_0^\alpha}\, dP_0 \le -\Big(1 - \int p_0^\beta p^{1-\beta}\, d\mu\Big) \le 0$$
for $\beta = 1 - \alpha/2$. Note that this holds if and only if
$$-1 + 2\int \frac{p^\alpha}{p_0^\alpha + p^\alpha}\, p_0\, d\mu \le -1 + \int p_0^\beta p^{1-\beta}\, d\mu,$$
or
$$\int p_0^\beta p^{1-\beta}\, d\mu \ge 2\int \frac{p^\alpha}{p_0^\alpha + p^\alpha}\, p_0\, d\mu.$$
But this holds if
$$p_0^\beta p^{1-\beta} \ge 2\,\frac{p^\alpha p_0}{p_0^\alpha + p^\alpha}.$$
With $\beta = 1 - \alpha/2$, this becomes
$$\frac{1}{2}(p_0^\alpha + p^\alpha) \ge p_0^{\alpha/2} p^{\alpha/2} = \sqrt{p_0^\alpha p^\alpha},$$
and this holds by the arithmetic mean - geometric mean inequality. Thus (2) holds. Since $1 - \int p_0^{1-\alpha/2} p^{\alpha/2}\, d\mu = h^2_{\alpha/2}(p, p_0)$, combining (2) with (1) yields the claim of the proposition.

5 Example: a result of Schick and Yu

Our goal in this section is to give another proof of the consistency result of Schick and Yu (1999) for the Non-Parametric Maximum Likelihood Estimator (NPMLE) $\hat{F}_n$ for mixed case interval censored data. Our proof is based on the inequality of the preceding section, and is similar in spirit to results of Van de Geer (1993), (1996).

Suppose that $Y$ is a random variable taking values in $\mathbb{R}^+ = [0,\infty)$ with distribution function $F \in \mathcal{F} = \{\text{all df's } F \text{ on } \mathbb{R}^+\}$. Unfortunately we are not able to observe $Y$ itself. What we do observe is a vector of times $T_K = (T_{K,1},\ldots,T_{K,K})$ where $K$, the number of times, is itself random, and the interval $(T_{K,j-1}, T_{K,j}]$ into which $Y$ falls (with $T_{K,0} \equiv 0$, $T_{K,K+1} \equiv \infty$). More formally, we assume that $K$ is an integer-valued random variable, and $\underline{T} = \{T_{k,j},\ j=1,\ldots,k,\ k=1,2,\ldots\}$ is a triangular array of potential observation times, and that $Y$ and $(K, \underline{T})$ are independent. Let $X = (\Delta_K, T_K, K)$, with a possible value $x = (\delta_k, t_k, k)$, where $\Delta_k = (\Delta_{k,1},\ldots,\Delta_{k,k+1})$ with $\Delta_{k,j} = 1_{(T_{k,j-1}, T_{k,j}]}(Y)$, $j = 1, 2, \ldots, k+1$, and $T_k$ is the $k$th row of the triangular array $\underline{T}$. Suppose we observe $n$ i.i.d. copies of $X$; $X_1, X_2, \ldots, X_n$, where $X_i = (\Delta^{(i)}_{K^{(i)}}, T^{(i)}_{K^{(i)}}, K^{(i)})$, $i = 1, 2, \ldots, n$. Here $(Y^{(i)}, \underline{T}^{(i)}, K^{(i)})$, $i = 1, 2, \ldots$ are the underlying i.i.d. copies of $(Y, \underline{T}, K)$.

We first note that conditionally on $K$ and $T_K$, the vector $\Delta_K$ has a multinomial distribution:
$$(\Delta_K \mid K, T_K) \sim \mathrm{Multinomial}_{K+1}(1, \Delta F_K)$$
where
$$\Delta F_K \equiv (F(T_{K,1}),\ F(T_{K,2}) - F(T_{K,1}),\ \ldots,\ 1 - F(T_{K,K})).$$
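The observation scheme just described can be sketched in a few lines: given a value of $Y$ and a row of observation times, form the indicator vector $\Delta_K$ and the multinomial cell probabilities $\Delta F_K$. This is only an illustration. The function names and the exponential distribution are assumptions, and the left endpoint of the first interval is taken as $-\infty$ rather than $0$ so that $Y = 0$ also falls in a cell.

```python
import math


def delta_vector(y, times):
    """Indicators 1{T_{j-1} < y <= T_j}, j = 1..k+1, for increasing times."""
    # Assumption: lower endpoint -inf (the text uses T_{K,0} = 0) so that
    # exactly one indicator equals one for every real y.
    grid = [float("-inf")] + list(times) + [float("inf")]
    return [1 if grid[j - 1] < y <= grid[j] else 0 for j in range(1, len(grid))]


def cell_probabilities(F, times):
    """Delta F_K = (F(T_1), F(T_2) - F(T_1), ..., 1 - F(T_K))."""
    values = [0.0] + [F(t) for t in times] + [1.0]
    return [values[j] - values[j - 1] for j in range(1, len(values))]


def exponential_cdf(t):
    """Unit exponential distribution function (illustrative choice of F)."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    print(delta_vector(2.5, times))
    print(cell_probabilities(exponential_cdf, times))
```

Conditionally on the times, the indicator vector has exactly one nonzero entry, and its entries have means given by the cell probabilities, which is the multinomial statement above.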

Suppose for the moment that the distribution $G_k$ of $(T_K \mid K = k)$ has density $g_k$ and $p_k \equiv P(K = k)$. Then a density of $X$ is given by
$$(3)\qquad p_F(x) \equiv p_F(\delta_k, t_k, k) = \prod_{j=1}^{k+1} (F(t_{k,j}) - F(t_{k,j-1}))^{\delta_{k,j}}\, g_k(t_k)\, p_k$$
where $t_{k,0} \equiv 0$, $t_{k,k+1} \equiv \infty$. In general,
$$(4)\qquad p_F(x) \equiv p_F(\delta_k, t_k, k) = \prod_{j=1}^{k+1} (F(t_{k,j}) - F(t_{k,j-1}))^{\delta_{k,j}} = \sum_{j=1}^{k+1} \delta_{k,j}\,(F(t_{k,j}) - F(t_{k,j-1}))$$
is a density of $X$ with respect to the dominating measure $\nu$ where $\nu$ is determined by the joint distribution of $(K, \underline{T})$, and it is this version of the density of $X$ with which we will work throughout the rest of the paper. Thus the log-likelihood function for $F$ of $X_1,\ldots,X_n$ is given by
$$\frac{1}{n} l_n(F \mid \underline{X}) = \frac{1}{n}\sum_{i=1}^n \sum_{j=1}^{K^{(i)}+1} \Delta^{(i)}_{K,j} \log\big(F(T^{(i)}_{K^{(i)},j}) - F(T^{(i)}_{K^{(i)},j-1})\big) = \mathbb{P}_n m_F$$
where
$$m_F(X) = \sum_{j=1}^{K+1} \Delta_{K,j} \log\big(F(T_{K,j}) - F(T_{K,j-1})\big) \equiv \sum_{j=1}^{K+1} \Delta_{K,j} \log(\Delta F_{K,j})$$
and where we have ignored the terms not involving $F$. We also note that, with $P_0 \equiv P_{F_0}$,
$$P_0 m_F(X) = P_0\Big(\sum_{j=1}^{K+1} \Delta F_{0,K,j} \log(\Delta F_{K,j})\Big).$$
The Nonparametric Maximum Likelihood Estimator (NPMLE) $\hat{F}_n$ is the distribution function $\hat{F}_n(t)$ which puts all its mass at the observed time points and maximizes the log-likelihood $l_n(F \mid \underline{X})$. It can be calculated via the iterative convex minorant algorithm proposed in Groeneboom and Wellner (1992) for case 2 interval censored data.
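The criterion $\mathbb{P}_n m_F$ is straightforward to evaluate for a candidate $F$. The sketch below does only that; it is not the iterative convex minorant algorithm and does not compute the NPMLE. The data layout (a list of `(delta, times)` pairs) and the function names are assumptions made for illustration.

```python
import math


def cell_probabilities(F, times):
    """Delta F_K = (F(T_1), F(T_2) - F(T_1), ..., 1 - F(T_K))."""
    values = [0.0] + [F(t) for t in times] + [1.0]
    return [values[j] - values[j - 1] for j in range(1, len(values))]


def m_F(F, delta, times):
    """m_F(X) = sum_j Delta_{K,j} log(Delta F_{K,j}) for one observation."""
    total = 0.0
    for d, p in zip(delta, cell_probabilities(F, times)):
        if d:
            if p <= 0.0:
                return float("-inf")  # F gives no mass to the observed cell
            total += math.log(p)
    return total


def mean_log_likelihood(F, data):
    """P_n m_F for data given as a list of (delta, times) pairs."""
    return sum(m_F(F, delta, times) for delta, times in data) / len(data)


def exponential_cdf(t):
    """Unit exponential distribution function (illustrative candidate F)."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0


if __name__ == "__main__":
    data = [([1, 0], [math.log(2.0)]), ([0, 1, 0], [0.5, 1.5])]
    print(mean_log_likelihood(exponential_cdf, data))
```

Only the cell containing $Y$ contributes to each term, so a candidate $F$ that assigns zero mass to an observed cell receives criterion value $-\infty$.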

By Proposition 3 with $\alpha = 1$ and $\varphi \equiv \varphi_1$ as before, it follows that
$$h^2(p_{\hat{F}_n}, p_{F_0}) \le (\mathbb{P}_n - P_0)\big(\varphi(p_{\hat{F}_n}/p_{F_0})\big)$$
where $\varphi$ is bounded and continuous from $\mathbb{R}$ to $\mathbb{R}$. Now the collection of functions
$$\mathcal{G} \equiv \{p_F : F \in \mathcal{F}\}$$
is easily seen to be a Glivenko-Cantelli class of functions: this can be seen by first applying Theorem 4 to the collections $\mathcal{G}_k$, $k = 1, 2, \ldots$ obtained from $\mathcal{G}$ by restricting to the sets $K = k$. For fixed $k$, the collections $\mathcal{G}_k = \{p_F(\delta, t_k, k) : F \in \mathcal{F}\}$ are $P_0$-Glivenko-Cantelli classes by Theorem 3, since $\mathcal{F}$ is a uniform Glivenko-Cantelli class and the functions $p_F$ are continuous transformations of the classes of functions $x \mapsto \delta_{k,j}$ and $x \mapsto F(t_{k,j})$ for $j = 1,\ldots,k+1$; hence $\mathcal{G}$ is $P_0$-Glivenko-Cantelli. Note that the single function $p_{F_0}$ is trivially $P_0$-Glivenko-Cantelli since it is uniformly bounded, and the single function $(1/p_{F_0})$ is also $P_0$-GC since $P_0(1/p_{F_0}) < \infty$. Thus by Proposition 2 with $g = (1/p_{F_0})$ and $\mathcal{F} = \mathcal{G} = \{p_F : F \in \mathcal{F}\}$, it follows that the class $\{p_F/p_{F_0} : F \in \mathcal{F}\}$ is $P_0$-Glivenko-Cantelli. Finally another application of Theorem 3 shows that the collection
$$\mathcal{H} \equiv \{\varphi(p_F/p_{F_0}) : F \in \mathcal{F}\}$$
is also $P_0$-Glivenko-Cantelli. When combined with Proposition 3, this yields the following theorem:

Theorem 7. The NPMLE $\hat{F}_n$ satisfies
$$h(p_{\hat{F}_n}, p_{F_0}) \to_{a.s.} 0.$$

To relate this result to a recent theorem of Schick and Yu (1999), it remains only to understand the relationship between their $L_1(\mu)$ and the Hellinger metric $h$ between $p_F$ and $p_{F_0}$. Let $\mathcal{B}$ denote the collection of Borel sets in $\mathbb{R}$. On $\mathcal{B}$ we define measures $\mu$ and $\tilde{\mu}$ as follows: For $B \in \mathcal{B}$,
$$(5)\qquad \mu(B) = \sum_{k=1}^\infty P(K = k) \sum_{j=1}^k P(T_{k,j} \in B \mid K = k),$$
and
$$(6)\qquad \tilde{\mu}(B) = \sum_{k=1}^\infty P(K = k)\, \frac{1}{k} \sum_{j=1}^k P(T_{k,j} \in B \mid K = k).$$
Let $d$ be the $L_1(\mu)$ metric on the class $\mathcal{F}$; thus for $F_1, F_2 \in \mathcal{F}$,
$$d(F_1, F_2) = \int |F_1(t) - F_2(t)|\, d\mu(t).$$
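When the joint distribution of $(K, T_K)$ is discrete, the $L_1(\mu)$ and $L_1(\tilde{\mu})$ distances reduce to finite weighted sums, since integrating against $\mu$ is the same as taking the expectation of the sum over the observation times $T_{K,1},\ldots,T_{K,K}$. The sketch below assumes such a discrete design, given as a hypothetical list of `(probability, times)` pairs with every row containing at least one time; none of these names appear in the paper.

```python
def l1_design_distances(F1, F2, design):
    """Return (d, d_tilde): the L1(mu) and L1(mu-tilde) distances of F1, F2.

    `design` lists the possible rows of observation times with their
    probabilities, i.e. the joint law of (K, T_K); K is the row length.
    """
    d = 0.0
    d_tilde = 0.0
    for prob, times in design:
        k = len(times)  # assumed k >= 1
        total = sum(abs(F1(t) - F2(t)) for t in times)
        d += prob * total            # weight of mu, definition (5)
        d_tilde += prob * total / k  # extra factor 1/k of mu-tilde, definition (6)
    return d, d_tilde


if __name__ == "__main__":
    F1 = lambda t: min(max(t / 4.0, 0.0), 1.0)
    F2 = lambda t: min(max(t / 2.0, 0.0), 1.0)
    design = [(0.5, (1.0, 2.0)), (0.5, (1.0,))]
    print(l1_design_distances(F1, F2, design))
```

Because of the factor $1/k$, the second returned value never exceeds the first.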

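The measure $\mu$ was introduced by Schick and Yu (1999); note that $\mu$ is a finite measure if $E(K) < \infty$. Note that $d(F_1, F_2)$ can also be written in terms of an expectation as:
$$(7)\qquad d(F_1, F_2) = E_{(K,\underline{T})}\Big[\sum_{j=1}^K |F_1(T_{K,j}) - F_2(T_{K,j})|\Big].$$
As Schick and Yu (1999) observed, consistency of the NPMLE $\hat{F}_n$ in $L_1(\mu)$ holds under virtually no further hypotheses.

Theorem 8. (Schick and Yu). Suppose that $E(K) < \infty$. Then $d(\hat{F}_n, F_0) \to_{a.s.} 0$.

Proof. We will show that Theorem 8 follows from Theorem 7 and the following lemma.

Lemma 4.
$$\Big\{\frac{1}{2}\int |\hat{F}_n - F_0|\, d\tilde{\mu}\Big\}^2 \le h^2(p_{\hat{F}_n}, p_{F_0}).$$

Proof. We know that
$$h^2(p_{\hat{F}_n}, p_{F_0}) \le d_{TV}(p_{\hat{F}_n}, p_{F_0}) \le 2h(p_{\hat{F}_n}, p_{F_0})$$
where, with $y_{k,0} = -\infty$, $y_{k,k+1} = \infty$,
$$h^2(p_{\hat{F}_n}, p_{F_0}) = \sum_{k=1}^\infty P(K = k) \int \sum_{j=1}^{k+1} \Big\{[\hat{F}_n(y_{k,j}) - \hat{F}_n(y_{k,j-1})]^{1/2} - [F_0(y_{k,j}) - F_0(y_{k,j-1})]^{1/2}\Big\}^2\, dG_k(y)$$
while
$$d_{TV}(p_{\hat{F}_n}, p_{F_0}) = \sum_{k=1}^\infty P(K = k) \int \sum_{j=1}^{k+1} \Big|[\hat{F}_n(y_{k,j}) - \hat{F}_n(y_{k,j-1})] - [F_0(y_{k,j}) - F_0(y_{k,j-1})]\Big|\, dG_k(y).$$
(As displayed, this $h^2$ equals $\int (\sqrt{p} - \sqrt{q})^2\, d\nu$, which is twice the quantity $1 - \int \sqrt{pq}\, d\nu$ of Section 4; the constant does not affect convergence to zero.) Note that
$$\sum_{j=1}^{k+1} \Big|[\hat{F}_n(y_{k,j}) - \hat{F}_n(y_{k,j-1})] - [F_0(y_{k,j}) - F_0(y_{k,j-1})]\Big| = \sum_{j=1}^{k+1} \big|(\hat{F}_n - F_0)(y_{k,j-1}, y_{k,j}]\big| \ge \max_{1 \le j \le k+1} |\hat{F}_n(y_{k,j}) - F_0(y_{k,j})|,$$
so integrating across this inequality with respect to $G_k(y)$ yields, with $G_{k,j}$ denoting the marginal distribution of the $j$th coordinate under $G_k$,
$$\int \sum_{j=1}^{k+1} \Big|[\hat{F}_n(y_{k,j}) - \hat{F}_n(y_{k,j-1})] - [F_0(y_{k,j}) - F_0(y_{k,j-1})]\Big|\, dG_k(y) \ge \max_{1 \le j \le k} \int |\hat{F}_n(y_{k,j}) - F_0(y_{k,j})|\, dG_{k,j}(y_{k,j}) \ge \frac{1}{k}\sum_{j=1}^k \int |\hat{F}_n(y_{k,j}) - F_0(y_{k,j})|\, dG_{k,j}(y_{k,j}).$$
By multiplying across by $P(K = k)$ and summing over $k$, this yields
$$d_{TV}(p_{\hat{F}_n}, p_{F_0}) \ge \int |\hat{F}_n - F_0|\, d\tilde{\mu},$$
and hence, since $h \ge d_{TV}/2$,
$$(8)\qquad h^2(p_{\hat{F}_n}, p_{F_0}) \ge \Big\{\frac{1}{2}\int |\hat{F}_n - F_0|\, d\tilde{\mu}\Big\}^2.$$

The measure $\tilde{\mu}$ figuring in Lemma 4 is not the same as the measure $\mu$ of Schick and Yu (1999) because of the factor $1/k$. Note that this factor means that the measure $\tilde{\mu}$ is always a finite measure, even if $E(K) = \infty$. It is clear that $\tilde{\mu}(B) \le \mu(B)$ for every Borel set $B$, and that $\mu \ll \tilde{\mu}$. The following lemma (Lemma 2.2 of Schick and Yu (1999)) together with Lemma 4 shows that Theorem 7 implies the result of Schick and Yu once again:

Lemma 5. Suppose that $\mu$ and $\tilde{\mu}$ are two finite measures, and that $g, g_1, g_2, \ldots$ are measurable functions with range in $[0,1]$. Suppose that $\mu$ is absolutely continuous with respect to $\tilde{\mu}$. Then $\int |g_n - g|\, d\tilde{\mu} \to 0$ implies that $\int |g_n - g|\, d\mu \to 0$.

Proof. Write
$$\int |g_n - g|\, d\mu = \int |g_n - g|\, \frac{d\mu}{d\tilde{\mu}}\, d\tilde{\mu}$$
and use the dominated convergence theorem applied to a.e. convergent subsequences.

6 Example: generalizing the result of Schick and Yu

Our goal in this section is to give a generalization of the consistency result of Schick and Yu (1999).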

Suppose that $Y$ is a random variable taking values in $\mathcal{Y}$. Suppose that $Y$ has distribution $Q$ on the measurable space $(\mathcal{Y}, \mathcal{B})$. Unfortunately we are not able to observe $Y$ itself. What we do observe is a vector of random sets $C_K = (C_{K,1},\ldots,C_{K,K})$ where $K$, the number of sets, is itself random, and the sets $\{C_{K,j}\}_{j=1}^K$ form a partition of $\mathcal{Y}$: $C_{K,j} \cap C_{K,j'} = \emptyset$ if $j \ne j'$ and $\bigcup_{j=1}^K C_{K,j} = \mathcal{Y}$. More formally, we assume that $K$ is an integer-valued random variable, and $\underline{C} = \{C_{k,j},\ j=1,\ldots,k,\ k=1,2,\ldots\}$ is a triangular array of random sets, and that $Y$ and $(K, \underline{C})$ are independent. Let $X = (\Delta_K, C_K, K)$, with a possible value $x = (\delta_k, c_k, k)$, where $\Delta_k = (\Delta_{k,1},\ldots,\Delta_{k,k})$ with $\Delta_{k,j} = 1_{C_{k,j}}(Y)$, $j = 1, 2, \ldots, k$, and $C_k$ is the $k$th row of the triangular array $\underline{C}$. Suppose we observe $n$ i.i.d. copies of $X$; $X_1, X_2, \ldots, X_n$, where $X_i = (\Delta^{(i)}_{K^{(i)}}, C^{(i)}_{K^{(i)}}, K^{(i)})$, $i = 1, 2, \ldots, n$. Here $(Y^{(i)}, \underline{C}^{(i)}, K^{(i)})$, $i = 1, 2, \ldots$ are the underlying i.i.d. copies of $(Y, \underline{C}, K)$.

We first note that conditionally on $K$ and $C_K$, the vector $\Delta_K$ has a multinomial distribution:
$$(\Delta_K \mid K, C_K) \sim \mathrm{Multinomial}_K(1, Q_K)$$
where
$$Q_K \equiv (Q(C_{K,1}), Q(C_{K,2}), \ldots, Q(C_{K,K})).$$
Suppose for the moment that the distribution $G_k$ of $(C_K \mid K = k)$ has density $g_k$ and $p_k \equiv P(K = k)$. Then a density of $X$ is given by
$$(9)\qquad p_Q(x) \equiv p_Q(\delta_k, c_k, k) = \prod_{j=1}^k Q(c_{k,j})^{\delta_{k,j}}\, g_k(c_k)\, p_k.$$
In general,
$$(10)\qquad p_Q(x) \equiv p_Q(\delta_k, c_k, k) = \prod_{j=1}^k Q(c_{k,j})^{\delta_{k,j}} = \sum_{j=1}^k \delta_{k,j}\, Q(c_{k,j})$$
is a density of $X$ with respect to the dominating measure $\nu$ where $\nu$ is determined by the joint distribution of $(K, \underline{C})$, and it is this version of the density of $X$ with which we will work throughout the rest of the paper. Thus the log-likelihood function for $Q$ of $X_1,\ldots,X_n$ is given by
$$\frac{1}{n} l_n(Q \mid \underline{X}) = \frac{1}{n}\sum_{i=1}^n \sum_{j=1}^{K^{(i)}} \Delta^{(i)}_{K,j} \log Q(C^{(i)}_{K^{(i)},j}) = \mathbb{P}_n m_Q$$
where
$$m_Q(X) = \sum_{j=1}^K \Delta_{K,j} \log Q(C_{K,j})$$
and where we have ignored the terms not involving $Q$. We also note that, with $P_0 \equiv P_{Q_0}$,
$$P_0 m_Q(X) = P_0\Big(\sum_{j=1}^K Q_0(C_{K,j}) \log Q(C_{K,j})\Big).$$
The Nonparametric Maximum Likelihood Estimator (NPMLE) $\hat{Q}_n$ is a probability measure $\hat{Q}_n$ which maximizes the log-likelihood $l_n(Q \mid \underline{X})$. Here we will bypass the many interesting existence, characterization, and computational issues connected with the NPMLE $\hat{Q}_n$, and focus instead on the issue of consistency once we have the NPMLE in hand.

By Proposition 3 with $\alpha = 1$ it follows that
$$h^2(p_{\hat{Q}_n}, p_{Q_0}) \le (\mathbb{P}_n - P_0)\big(\varphi(p_{\hat{Q}_n}/p_{Q_0})\big)$$
where $\varphi$ is bounded and continuous. Now the collection of functions
$$\mathcal{G} \equiv \{p_Q : Q \in \mathcal{Q}\},$$
where $\mathcal{Q}$ is the collection of all probability distributions on $(\mathcal{Y}, \mathcal{B})$, is easily seen to be a Glivenko-Cantelli class of functions: this can be seen by first applying Theorem 4 to the collections $\mathcal{G}_k$, $k = 1, 2, \ldots$ obtained from $\mathcal{G}$ by restricting to the sets $K = k$. The next step is to show that the collections $\mathcal{G}_k \equiv \{p_Q(\cdot, \cdot, k) : Q \in \mathcal{Q}\}$ are $P_0$-Glivenko-Cantelli for each $k$.

To this end, suppose that $(\mathcal{Y}, \mathcal{B})$ is a measurable space and let $\mathcal{C}$ be a universal GC-class of sets in $\mathcal{Y}$ (i.e. $Q$-GC for every probability measure $Q$ on $(\mathcal{Y}, \mathcal{B})$). Let $(T, \mathcal{T}, G)$ be a probability space, and let $t \mapsto C_t$ be a map with values in $\mathcal{C}$.

Lemma 6. Suppose that the collection $\{\{t : y \in C_t\} : y \in \mathcal{Y}\}$ is $G$-GC. Then the collection of functions $t \mapsto Q(C_t)$ where $Q$ ranges over all probability measures $Q$ on $\mathcal{Y}$ is $G$-GC.

Remark. By Assouad (1983), Proposition 2.2, page 246, together with Corollary .0, page 24, it follows that $\{\{t \in T : y \in C_t\} : y \in \mathcal{Y}\}$ is a VC-class of subsets of $T$ if and only if $\{C_t : t \in T\}$ is a VC-class of subsets of $\mathcal{Y}$. Thus if we assume that all the subsets $C = C_t$ arising in the partitions are elements of a VC-class $\mathcal{C}$, then, subject to being suitably measurable, the hypothesis of Lemma 6 is satisfied (and the class in question is even universal Glivenko-Cantelli).

Proof. For $Q$ the Dirac measure at $y$ (unit mass at $y$), the function $t \mapsto Q(C_t)$ becomes $t \mapsto 1_{C_t}(y)$. By assumption, the set of all such functions, with $y$ ranging over $\mathcal{Y}$, is $G$-GC. Then so is the set of all functions
$$t \mapsto \frac{1}{m}\sum_{i=1}^m 1_{C_t}(y_i), \qquad y_1,\ldots,y_m \in \mathcal{Y},\ m \in \mathbb{N}.$$
Let $Y_1,\ldots,Y_m$ be an i.i.d. sample from a given $Q$. Because $\mathcal{C}$ is $Q$-GC,
$$\sup_{C \in \mathcal{C}} \Big|\frac{1}{m}\sum_{i=1}^m 1\{Y_i \in C\} - Q(C)\Big| \to 0 \quad \text{a.s.}$$
Then there exists a sequence $y_1, y_2, \ldots$ in $\mathcal{Y}$ such that
$$\sup_{C \in \mathcal{C}} \Big|\frac{1}{m}\sum_{i=1}^m 1\{y_i \in C\} - Q(C)\Big| \to 0,$$
and consequently
$$\sup_{t \in T} \Big|\frac{1}{m}\sum_{i=1}^m 1\{y_i \in C_t\} - Q(C_t)\Big| \to 0.$$
It follows that the maps $t \mapsto Q(C_t)$ are the (uniform) limits of sequences of functions from a $G$-GC class. Hence they are $G$-GC.

Note on measurability: It is assumed implicitly that the sets $\{t : y \in C_t\}$ are measurable in $T$ for every $y \in \mathcal{Y}$. The proof shows that the maps $t \mapsto Q(C_t)$ are then also measurable.

It follows that for fixed $k$, the collections $\mathcal{G}_k = \{p_Q(\delta, c_k, k) : Q \in \mathcal{Q}\}$ are $P_0$-Glivenko-Cantelli by Theorem 3, since the functions $p_Q$ are continuous transformations of the classes of functions $x \mapsto \delta_{k,j}$ and $x \mapsto Q(c_{k,j})$ for $j = 1,\ldots,k$; hence $\mathcal{G}$ is $P_0$-Glivenko-Cantelli. Note that the single function $p_{Q_0}$ is trivially $P_0$-Glivenko-Cantelli since it is uniformly bounded. Now another application of Theorem 3 shows that the collection
$$\mathcal{H} \equiv \{\varphi(p_Q/p_{Q_0}) : Q \in \mathcal{Q}\}$$
is also $P_0$-Glivenko-Cantelli. When combined with Proposition 3, this yields the following theorem.

Theorem 9. If all $C_{K,j} \in \mathcal{C}$, a VC collection of subsets of $\mathcal{Y}$, then the NPMLE $\hat{Q}_n$ satisfies
$$h(p_{\hat{Q}_n}, p_{Q_0}) \to_{a.s.} 0.$$

To obtain a statement analogous to the theorem of Schick and Yu (1999), let $\Sigma$ denote a sigma algebra for the space $\mathcal{C}$ of subsets: we are assuming that $P\big(\bigcap_{j=1}^K [C_{K,j} \in \mathcal{C}]\big) = 1$. Thus $(\mathcal{C}, \Sigma)$ is a measurable space. On $\Sigma$ we define the measure $\mu$ as follows:
$$(11)\qquad \mu(B) = \sum_{k=1}^\infty P(K = k) \sum_{j=1}^k P(C_{k,j} \in B \mid K = k), \qquad B \in \Sigma.$$
Note that
$$(12)\qquad d_{TV}(p_{\hat{Q}_n}, p_{Q_0}) = \sum_{k=1}^\infty P(K = k) \int \sum_{j=1}^k |\hat{Q}_n(c_{k,j}) - Q_0(c_{k,j})|\, dG_k(c) = \int |\hat{Q}_n(c) - Q_0(c)|\, d\mu(c) \equiv d(\hat{Q}_n, Q_0).$$
Here is a generalization of the theorem of Schick and Yu (1999).

Theorem 10. The NPMLE $\hat{Q}_n$ satisfies $d(\hat{Q}_n, Q_0) \to_{a.s.} 0$.

Proof. Theorem 10 follows from Theorem 9, (12) and the observation that
$$d(\hat{Q}_n, Q_0) = d_{TV}(p_{\hat{Q}_n}, p_{Q_0}) \le 2h(p_{\hat{Q}_n}, p_{Q_0}),$$
with $h$ normalized as in the proof of Lemma 4.

Example 1. (Mixed case interval censoring in $\mathbb{R}^2$). Suppose that $Y \sim Q_0$ takes values in $\mathbb{R}^{+2}$, and the partitions $\{C_{K,j}\}_{j=1}^K$ are obtained as the natural rectangles formed from two increasing collections $0 \le S_{K_1,1} \le \cdots \le S_{K_1,K_1}$ and $0 \le T_{K_2,1} \le \cdots \le T_{K_2,K_2}$ on the two coordinate axes. Then $K = (K_1 + 1)(K_2 + 1)$, and all the elements of the partitions are elements of the VC-class of rectangles in $\mathbb{R}^2$. It follows that the NPMLE $\hat{Q}_n$ of $Q_0$ satisfies
$$\int_{s,t \in \mathbb{R}^{+2}} |\hat{Q}_n(s,t] - Q_0(s,t]|\, d\mu(s,t) \to 0 \quad \text{a.s.}$$

Acknowledgments. We owe thanks to Evarist Giné and Joel Zinn for suggesting the proof given for Lemma 1.

References

Assouad, P. (1983). Densité et dimension. Annales de l'Institut Fourier 33.
Dudley, R. M. (1984). A Course on Empirical Processes. Ecole d'été de probabilités de St.-Flour, 1982. Lecture Notes in Mathematics 1097, Springer, Berlin.
Dudley, R. M. (1998a). Consistency of M-estimators and one-sided bracketing. In Proceedings of the First International Conference on High-Dimensional Probability. E. Eberlein, M. Hahn, and M. Talagrand, eds. Birkhäuser, Basel.
Dudley, R. M. (1998b). Personal communication.
Dudley, R. M., Giné, E., and Zinn, J. (1991). Uniform and universal Glivenko-Cantelli classes. J. Theoretical Probab. 4.
Giné, E., and Zinn, J. (1984). Some limit theorems for empirical processes. Ann. Probability 12.
Groeneboom, P. and Wellner, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. DMV Seminar Band 19, Birkhäuser, Basel.
Pfanzagl, J. (1988). Consistency of maximum likelihood estimators for certain nonparametric families, in particular: mixtures. J. Statist. Planning and Inference 19.
Schick, A. and Yu, Q. (1999). Consistency of the GMLE with mixed case interval-censored data. Scand. J. Statist., to appear.
Talagrand, M. (1987a). Measurability problems for empirical measures. Ann. Probability 15.
Talagrand, M. (1987b). The Glivenko-Cantelli problem. Ann. Probability 15.
Talagrand, M. (1996). The Glivenko-Cantelli problem, ten years later. J. Theor. Prob. 9.
Van de Geer, S. (1993). Hellinger consistency of certain nonparametric maximum likelihood estimators. Ann. Statist. 21.
Van de Geer, S. (1996). Rates of convergence for the maximum likelihood estimator in mixture models. J. Nonparametric Statist. 6.
Van der Vaart, A. W. (1996). New Donsker classes. Ann. Probability 24.
Van der Vaart, A. W., and Wellner, J. A. (1996). Weak Convergence and Empirical Processes with Applications to Statistics. Springer-Verlag, New York.

Department of Mathematics, Free University, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands. aad@cs.vu.nl
Department of Statistics, University of Washington, Seattle, Washington, U.S.A. jaw@stat.washington.edu

Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes

Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes Aad van der Vaart and Jon A. Wellner Free University and University of Washington ABSTRACT We show that the P Glivenko

More information

PRESERVATION THEOREMS FOR GLIVENKO-CANTELLI AND UNIFORM GLIVENKO-CANTELLI CLASSES AAD VAN DER VAART AND JON A. WELLNER ABSTRACT We show that the P Gli

PRESERVATION THEOREMS FOR GLIVENKO-CANTELLI AND UNIFORM GLIVENKO-CANTELLI CLASSES AAD VAN DER VAART AND JON A. WELLNER ABSTRACT We show that the P Gli PRESERVATION THEOREMS FOR GLIVENKO-CANTELLI AND UNIFORM GLIVENKO-CANTELLI CLASSES AAD VAN DER VAART AND JON A. WELLNER ABSTRACT We show that the P Glivenko property of classes of functions F ;::: ;F k

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

L p Spaces and Convexity

L p Spaces and Convexity L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing. 5 Measure theory II 1. Charges (signed measures). Let (Ω, A) be a σ -algebra. A map φ: A R is called a charge, (or signed measure or σ -additive set function) if φ = φ(a j ) (5.1) A j for any disjoint

More information

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

Defining the Integral

Defining the Integral Defining the Integral In these notes we provide a careful definition of the Lebesgue integral and we prove each of the three main convergence theorems. For the duration of these notes, let (, M, µ) be

More information

(Y; I[X Y ]), where I[A] is the indicator function of the set A. Examples of the current status data are mentioned in Ayer et al. (1955), Keiding (199

(Y; I[X Y ]), where I[A] is the indicator function of the set A. Examples of the current status data are mentioned in Ayer et al. (1955), Keiding (199 CONSISTENCY OF THE GMLE WITH MIXED CASE INTERVAL-CENSORED DATA By Anton Schick and Qiqing Yu Binghamton University April 1997. Revised December 1997, Revised July 1998 Abstract. In this paper we consider

More information

Lecture 2: Uniform Entropy

Lecture 2: Uniform Entropy STAT 583: Advanced Theory of Statistical Inference Spring 218 Lecture 2: Uniform Entropy Lecturer: Fang Han April 16 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal

More information

Problem set 1, Real Analysis I, Spring, 2015.

Problem set 1, Real Analysis I, Spring, 2015. Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n

More information

Chapter 8. General Countably Additive Set Functions. 8.1 Hahn Decomposition Theorem

Chapter 8. General Countably Additive Set Functions. 8.1 Hahn Decomposition Theorem Chapter 8 General Countably dditive Set Functions In Theorem 5.2.2 the reader saw that if f : X R is integrable on the measure space (X,, µ) then we can define a countably additive set function ν on by

More information

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,

More information

Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001

Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001 Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001 Some Uniform Strong Laws of Large Numbers Suppose that: A. X, X 1,...,X n are i.i.d. P on the measurable

More information

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond Measure Theory on Topological Spaces Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond May 22, 2011 Contents 1 Introduction 2 1.1 The Riemann Integral........................................ 2 1.2 Measurable..............................................

More information

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time ASYMPTOTIC PROPERTIES OF THE GMLE WITH CASE 2 INTERVAL-CENSORED DATA By Qiqing Yu a;1 Anton Schick a, Linxiong Li b;2 and George Y. C. Wong c;3 a Dept. of Mathematical Sciences, Binghamton University,

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Maximum likelihood: counterexamples, examples, and open problems

Maximum likelihood: counterexamples, examples, and open problems Maximum likelihood: counterexamples, examples, and open problems Jon A. Wellner University of Washington visiting Vrije Universiteit, Amsterdam Talk at BeNeLuxFra Mathematics Meeting 21 May, 2005 Email:

More information

STATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES

STATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES STATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES By Jon A. Wellner University of Washington 1. Limit theory for the Grenander estimator.

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

A Law of the Iterated Logarithm. for Grenander s Estimator

A Law of the Iterated Logarithm. for Grenander s Estimator A Law of the Iterated Logarithm for Grenander s Estimator Jon A. Wellner University of Washington, Seattle Probability Seminar October 24, 2016 Based on joint work with: Lutz Dümbgen and Malcolm Wolff

More information

Integration on Measure Spaces

Integration on Measure Spaces Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of

More information

A SKOROHOD REPRESENTATION THEOREM WITHOUT SEPARABILITY PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO

A SKOROHOD REPRESENTATION THEOREM WITHOUT SEPARABILITY PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO A SKOROHOD REPRESENTATION THEOREM WITHOUT SEPARABILITY PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO Abstract. Let (S, d) be a metric space, G a σ-field on S and (µ n : n 0) a sequence of probabilities

More information

Banach Spaces V: A Closer Look at the w- and the w -Topologies

Banach Spaces V: A Closer Look at the w- and the w -Topologies BS V c Gabriel Nagy Banach Spaces V: A Closer Look at the w- and the w -Topologies Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we discuss two important, but highly non-trivial,

More information

Maximum likelihood: counterexamples, examples, and open problems. Jon A. Wellner. University of Washington. Maximum likelihood: p.

Maximum likelihood: counterexamples, examples, and open problems. Jon A. Wellner. University of Washington. Maximum likelihood: p. Maximum likelihood: counterexamples, examples, and open problems Jon A. Wellner University of Washington Maximum likelihood: p. 1/7 Talk at University of Idaho, Department of Mathematics, September 15,

More information

CHAPTER V DUAL SPACES

CHAPTER V DUAL SPACES CHAPTER V DUAL SPACES DEFINITION Let (X, T ) be a (real) locally convex topological vector space. By the dual space X, or (X, T ), of X we mean the set of all continuous linear functionals on X. By the

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

CHAPTER I THE RIESZ REPRESENTATION THEOREM

CHAPTER I THE RIESZ REPRESENTATION THEOREM CHAPTER I THE RIESZ REPRESENTATION THEOREM We begin our study by identifying certain special kinds of linear functionals on certain special vector spaces of functions. We describe these linear functionals

More information

Nonparametric estimation under Shape Restrictions

Nonparametric estimation under Shape Restrictions Nonparametric estimation under Shape Restrictions Jon A. Wellner University of Washington, Seattle Statistical Seminar, Frejus, France August 30 - September 3, 2010 Outline: Five Lectures on Shape Restrictions

More information

Examples of Dual Spaces from Measure Theory

Examples of Dual Spaces from Measure Theory Chapter 9 Examples of Dual Spaces from Measure Theory We have seen that L (, A, µ) is a Banach space for any measure space (, A, µ). We will extend that concept in the following section to identify an

More information

Math212a1413 The Lebesgue integral.

Math212a1413 The Lebesgue integral. Math212a1413 The Lebesgue integral. October 28, 2014 Simple functions. In what follows, (X, F, m) is a space with a σ-field of sets, and m a measure on F. The purpose of today s lecture is to develop the

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

1 Weak Convergence in R k

1 Weak Convergence in R k 1 Weak Convergence in R k Byeong U. Park 1 Let X and X n, n 1, be random vectors taking values in R k. These random vectors are allowed to be defined on different probability spaces. Below, for the simplicity

More information

Weak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria

Weak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria Weak Leiden University Amsterdam, 13 November 2013 Outline 1 2 3 4 5 6 7 Definition Definition Let µ, µ 1, µ 2,... be probability measures on (R, B). It is said that µ n converges weakly to µ, and we then

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

Analysis Comprehensive Exam Questions Fall 2008

Analysis Comprehensive Exam Questions Fall 2008 Analysis Comprehensive xam Questions Fall 28. (a) Let R be measurable with finite Lebesgue measure. Suppose that {f n } n N is a bounded sequence in L 2 () and there exists a function f such that f n (x)

More information

Real Analysis Problems

Real Analysis Problems Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.

More information

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis Real Analysis, 2nd Edition, G.B.Folland Chapter 5 Elements of Functional Analysis Yung-Hsiang Huang 5.1 Normed Vector Spaces 1. Note for any x, y X and a, b K, x+y x + y and by ax b y x + b a x. 2. It

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

arxiv:math/ v2 [math.st] 17 Jun 2008

arxiv:math/ v2 [math.st] 17 Jun 2008 The Annals of Statistics 2008, Vol. 36, No. 3, 1031 1063 DOI: 10.1214/009053607000000974 c Institute of Mathematical Statistics, 2008 arxiv:math/0609020v2 [math.st] 17 Jun 2008 CURRENT STATUS DATA WITH

More information

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

consists of two disjoint copies of X n, each scaled down by 1,

consists of two disjoint copies of X n, each scaled down by 1, Homework 4 Solutions, Real Analysis I, Fall, 200. (4) Let be a topological space and M be a σ-algebra on which contains all Borel sets. Let m, µ be two positive measures on M. Assume there is a constant

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner

More information

2 Sequences, Continuity, and Limits

2 Sequences, Continuity, and Limits 2 Sequences, Continuity, and Limits In this chapter, we introduce the fundamental notions of continuity and limit of a real-valued function of two variables. As in ACICARA, the definitions as well as proofs

More information

Supplementary Notes for W. Rudin: Principles of Mathematical Analysis

Supplementary Notes for W. Rudin: Principles of Mathematical Analysis Supplementary Notes for W. Rudin: Principles of Mathematical Analysis SIGURDUR HELGASON In 8.00B it is customary to cover Chapters 7 in Rudin s book. Experience shows that this requires careful planning

More information

Functional Analysis, Stein-Shakarchi Chapter 1

Functional Analysis, Stein-Shakarchi Chapter 1 Functional Analysis, Stein-Shakarchi Chapter 1 L p spaces and Banach Spaces Yung-Hsiang Huang 018.05.1 Abstract Many problems are cited to my solution files for Folland [4] and Rudin [6] post here. 1 Exercises

More information

EMPIRICAL PROCESSES: Theory and Applications

EMPIRICAL PROCESSES: Theory and Applications Corso estivo di statistica e calcolo delle probabilità EMPIRICAL PROCESSES: Theory and Applications Torgnon, 23 Corrected Version, 2 July 23; 21 August 24 Jon A. Wellner University of Washington Statistics,

More information

Product-limit estimators of the survival function with left or right censored data

Product-limit estimators of the survival function with left or right censored data Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut

More information

The intersection axiom of

The intersection axiom of The intersection axiom of conditional independence: some new [?] results Richard D. Gill Mathematical Institute, University Leiden This version: 26 March, 2019 (X Y Z) & (X Z Y) X (Y, Z) Presented at Algebraic

More information

Chapter 4: Asymptotic Properties of the MLE

Chapter 4: Asymptotic Properties of the MLE Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider

More information

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

Summary of Real Analysis by Royden

Summary of Real Analysis by Royden Summary of Real Analysis by Royden Dan Hathaway May 2010 This document is a summary of the theorems and definitions and theorems from Part 1 of the book Real Analysis by Royden. In some areas, such as

More information

MATH5011 Real Analysis I. Exercise 1 Suggested Solution

MATH5011 Real Analysis I. Exercise 1 Suggested Solution MATH5011 Real Analysis I Exercise 1 Suggested Solution Notations in the notes are used. (1) Show that every open set in R can be written as a countable union of mutually disjoint open intervals. Hint:

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

Empirical Processes and random projections

Empirical Processes and random projections Empirical Processes and random projections B. Klartag, S. Mendelson School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, USA. Institute of Advanced Studies, The Australian National

More information

Commutative Banach algebras 79

Commutative Banach algebras 79 8. Commutative Banach algebras In this chapter, we analyze commutative Banach algebras in greater detail. So we always assume that xy = yx for all x, y A here. Definition 8.1. Let A be a (commutative)

More information

CONVERGENCE OF MULTIPLE ERGODIC AVERAGES FOR SOME COMMUTING TRANSFORMATIONS

CONVERGENCE OF MULTIPLE ERGODIC AVERAGES FOR SOME COMMUTING TRANSFORMATIONS CONVERGENCE OF MULTIPLE ERGODIC AVERAGES FOR SOME COMMUTING TRANSFORMATIONS NIKOS FRANTZIKINAKIS AND BRYNA KRA Abstract. We prove the L 2 convergence for the linear multiple ergodic averages of commuting

More information

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x

More information

Problem 1: Compactness (12 points, 2 points each)

Problem 1: Compactness (12 points, 2 points each) Final exam Selected Solutions APPM 5440 Fall 2014 Applied Analysis Date: Tuesday, Dec. 15 2014, 10:30 AM to 1 PM You may assume all vector spaces are over the real field unless otherwise specified. Your

More information

1. Bounded linear maps. A linear map T : E F of real Banach

1. Bounded linear maps. A linear map T : E F of real Banach DIFFERENTIABLE MAPS 1. Bounded linear maps. A linear map T : E F of real Banach spaces E, F is bounded if M > 0 so that for all v E: T v M v. If v r T v C for some positive constants r, C, then T is bounded:

More information

Topological properties of Z p and Q p and Euclidean models

Topological properties of Z p and Q p and Euclidean models Topological properties of Z p and Q p and Euclidean models Samuel Trautwein, Esther Röder, Giorgio Barozzi November 3, 20 Topology of Q p vs Topology of R Both R and Q p are normed fields and complete

More information

(U) =, if 0 U, 1 U, (U) = X, if 0 U, and 1 U. (U) = E, if 0 U, but 1 U. (U) = X \ E if 0 U, but 1 U. n=1 A n, then A M.

(U) =, if 0 U, 1 U, (U) = X, if 0 U, and 1 U. (U) = E, if 0 U, but 1 U. (U) = X \ E if 0 U, but 1 U. n=1 A n, then A M. 1. Abstract Integration The main reference for this section is Rudin s Real and Complex Analysis. The purpose of developing an abstract theory of integration is to emphasize the difference between the

More information

REAL AND COMPLEX ANALYSIS

REAL AND COMPLEX ANALYSIS REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Lebesgue-Radon-Nikodym Theorem

Lebesgue-Radon-Nikodym Theorem Lebesgue-Radon-Nikodym Theorem Matt Rosenzweig 1 Lebesgue-Radon-Nikodym Theorem In what follows, (, A) will denote a measurable space. We begin with a review of signed measures. 1.1 Signed Measures Definition

More information

Spectral Continuity Properties of Graph Laplacians

Spectral Continuity Properties of Graph Laplacians Spectral Continuity Properties of Graph Laplacians David Jekel May 24, 2017 Overview Spectral invariants of the graph Laplacian depend continuously on the graph. We consider triples (G, x, T ), where G

More information

Functional Analysis I

Functional Analysis I Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker

More information

6. Duals of L p spaces

6. Duals of L p spaces 6 Duals of L p spaces This section deals with the problem if identifying the duals of L p spaces, p [1, ) There are essentially two cases of this problem: (i) p = 1; (ii) 1 < p < The major difference between

More information

FUNCTIONAL ANALYSIS LECTURE NOTES: WEAK AND WEAK* CONVERGENCE

FUNCTIONAL ANALYSIS LECTURE NOTES: WEAK AND WEAK* CONVERGENCE FUNCTIONAL ANALYSIS LECTURE NOTES: WEAK AND WEAK* CONVERGENCE CHRISTOPHER HEIL 1. Weak and Weak* Convergence of Vectors Definition 1.1. Let X be a normed linear space, and let x n, x X. a. We say that

More information

Integral Jensen inequality

Integral Jensen inequality Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a

More information

INTRODUCTION TO MEASURE THEORY AND LEBESGUE INTEGRATION

INTRODUCTION TO MEASURE THEORY AND LEBESGUE INTEGRATION 1 INTRODUCTION TO MEASURE THEORY AND LEBESGUE INTEGRATION Eduard EMELYANOV Ankara TURKEY 2007 2 FOREWORD This book grew out of a one-semester course for graduate students that the author have taught at

More information

An elementary proof of the weak convergence of empirical processes

An elementary proof of the weak convergence of empirical processes An elementary proof of the weak convergence of empirical processes Dragan Radulović Department of Mathematics, Florida Atlantic University Marten Wegkamp Department of Mathematics & Department of Statistical

More information

Lecture 4 Lebesgue spaces and inequalities

Lecture 4 Lebesgue spaces and inequalities Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how

More information

MTH 404: Measure and Integration

MTH 404: Measure and Integration MTH 404: Measure and Integration Semester 2, 2012-2013 Dr. Prahlad Vaidyanathan Contents I. Introduction....................................... 3 1. Motivation................................... 3 2. The

More information

MATS113 ADVANCED MEASURE THEORY SPRING 2016

MATS113 ADVANCED MEASURE THEORY SPRING 2016 MATS113 ADVANCED MEASURE THEORY SPRING 2016 Foreword These are the lecture notes for the course Advanced Measure Theory given at the University of Jyväskylä in the Spring of 2016. The lecture notes can

More information

Chapter 4. The dominated convergence theorem and applications

Chapter 4. The dominated convergence theorem and applications Chapter 4. The dominated convergence theorem and applications The Monotone Covergence theorem is one of a number of key theorems alllowing one to exchange limits and [Lebesgue] integrals (or derivatives

More information

18.175: Lecture 3 Integration

18.175: Lecture 3 Integration 18.175: Lecture 3 Scott Sheffield MIT Outline Outline Recall definitions Probability space is triple (Ω, F, P) where Ω is sample space, F is set of events (the σ-algebra) and P : F [0, 1] is the probability

More information

Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics

Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee 1 and Jon A. Wellner 2 1 Department of Statistics, Department of Statistics, 439, West

More information

AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE. Mona Nabiei (Received 23 June, 2015)

AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE. Mona Nabiei (Received 23 June, 2015) NEW ZEALAND JOURNAL OF MATHEMATICS Volume 46 (2016), 53-64 AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE Mona Nabiei (Received 23 June, 2015) Abstract. This study first defines a new metric with

More information

Functional Analysis HW #1

Functional Analysis HW #1 Functional Analysis HW #1 Sangchul Lee October 9, 2015 1 Solutions Solution of #1.1. Suppose that X

An introduction to some aspects of functional analysis

Stephen Semmes, Rice University. Abstract. These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures

1 Borel Regular Measures. We now state and prove an important regularity property of Borel regular outer measures:

SOLUTIONS TO HOMEWORK ASSIGNMENT 4

Exercise. A criterion for the image under the Hilbert transform to belong to L¹. Let φ ∈ S be given. Show that Hφ ∈ L¹ if and only if ∫ φ(x) dx = 0. Solution: Suppose first that

Math 4121 Spring 2012 Weaver. Measure Theory. 1. σ-algebras

A measure is a function which gauges the size of subsets of a given set. In general we do not ask that a measure evaluate the size of every subset,

Approximate Self Consistency for Middle-Censored Data

By S. Rao Jammalamadaka, Department of Statistics & Applied Probability, University of California, Santa Barbara, CA 93106, USA, and Srikanth K. Iyer

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension.

Chapter 2. Probability measures. 1. Existence. Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field. Proof of Theorem 2.1. Let F_0 be

Functional Analysis HW #3

Sangchul Lee, October 26, 2015. 1 Solutions. Exercise 2.1. Let D = { f ∈ C([0, 1]) : f′ ∈ C([0, 1]) } and define ‖f‖_d = ‖f‖_∞ + ‖f′‖_∞. Show that D is a Banach algebra and that the Gelfand transform

The main results about probability measures are the following two facts:

Chapter 2. Probability measures. The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F_0 then it has a

Finite-dimensional spaces. C^n is the space of n-tuples x = (x_1, ..., x_n) of complex numbers. It is a Hilbert space with the inner product

Chapter 4. Hilbert Spaces. 4.1 Inner Product Spaces. Inner Product Space. A complex vector space E is called an inner product space (or a pre-Hilbert space, or a unitary space) if there is a mapping (·, ·)
