1 Weak Convergence in $\mathbb{R}^k$

Byeong U. Park

Let $X$ and $X_n$, $n \ge 1$, be random vectors taking values in $\mathbb{R}^k$. These random vectors are allowed to be defined on different probability spaces. Below, for simplicity of notation, we denote all probability measures associated with the random vectors simply by $P$, although they may be defined on different probability spaces. Define $F_n(x) = P(X_n \le x)$ and $F(x) = P(X \le x)$. For a function $f : \mathbb{R}^k \to \mathbb{R}$, let $C_f = \{x \in \mathbb{R}^k : f \text{ is continuous at } x\}$.

Definition. We say a sequence of random vectors $\{X_n\}$ converges weakly to a random vector $X$, and we write $X_n \xrightarrow{d} X$, if $P(X_n \in A)$ converges to $P(X \in A)$ for all Borel sets $A \subset \mathbb{R}^k$ with $P(X \in \partial A) = 0$.

In the case of $\mathbb{R}^k$, $X_n \xrightarrow{d} X$ if and only if $F_n(x) \to F(x)$ for all $x \in C_F$.

Theorem 1.1 (Skorokhod representation theorem). Suppose that $X_n \xrightarrow{d} X$. Then there exist $X^*$ and $\{X_n^*\}$ defined on the same probability space such that $X_n^* \overset{d}{=} X_n$, $X^* \overset{d}{=} X$ and $X_n^* \xrightarrow{a.s.} X^*$.

Theorem 1.2 (Continuous mapping theorem). If $X_n \xrightarrow{a.s.} X$ and $P(X \in C_f) = 1$ for a real-valued function $f$, then $f(X_n) \xrightarrow{a.s.} f(X)$.

Proof. From the conditions of the theorem,
$$1 = P(X \in C_f, \lim X_n = X) \le P(X \in C_f, \lim f(X_n) = f(X)) \le P(\lim f(X_n) = f(X)).$$

Theorem 1.3. If $X_n \xrightarrow{a.s.} X$, then $X_n \xrightarrow{d} X$.

Proof. Fix $x \in C_F$ and consider $f = I_{(-\infty, x]}(\cdot)$. The function $f$ is bounded and $C_f = \mathbb{R}^k \setminus \{x\}$. Since $P(X \in C_f) = P(X \ne x) = 1 - [F(x) - F(x-)] = 1$, we get from Theorem 1.2 that $f(X_n) \xrightarrow{a.s.} f(X)$. Since $f$ is bounded and $X_n \xrightarrow{a.s.} X$, the Dominated Convergence Theorem (DCT) implies
$$F_n(x) = E f(X_n) \to E f(X) = F(x).$$
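The restriction to continuity points $x \in C_F$ in the characterization $F_n(x) \to F(x)$ matters even in the simplest cases. The following is a minimal sketch for $k = 1$, assuming a hypothetical deterministic example that is not part of the notes: $X_n = 1/n$ and $X = 0$. The helper names `cdf_point_mass` and `eventually_equal` are illustrative only.

```python
def cdf_point_mass(c):
    """Distribution function x -> P(X <= x) of the constant random variable X = c."""
    return lambda x: 1.0 if x >= c else 0.0

F = cdf_point_mass(0.0)            # limit X = 0; F jumps at x = 0

def F_n(n):
    """Distribution function of X_n = 1/n (assumed example)."""
    return cdf_point_mass(1.0 / n)

# At the discontinuity point x = 0 of F, F_n(0) = 0 for every n, while F(0) = 1.
values_at_zero = [F_n(n)(0.0) for n in (1, 10, 100, 1000)]

def eventually_equal(x, n_large=10_000):
    """Check F_n(x) == F(x) for one large n; this holds at continuity points x != 0."""
    return F_n(n_large)(x) == F(x)
```

Here $X_n \to X$ surely, so $X_n \xrightarrow{d} X$, yet $F_n(0)$ does not converge to $F(0)$; convergence of the distribution functions holds at every $x \ne 0$, which is exactly $C_F$.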

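The converse of Theorem 1.3 is not true. The following theorem presents a set of equivalent definitions of weak convergence.

Theorem 1.4. The following are equivalent.
(a) $F_n(x) \to F(x)$ for all $x \in C_F$.
(b) $E f(X_n) \to E f(X)$ for all bounded $f : \mathbb{R}^k \to \mathbb{R}$ with $P(X \in C_f) = 1$.
(c) $E f(X_n) \to E f(X)$ for all bounded and continuous $f : \mathbb{R}^k \to \mathbb{R}$.
(d) $E f(X_n) \to E f(X)$ for all bounded and uniformly continuous $f : \mathbb{R}^k \to \mathbb{R}$.

Proof. The implications (b) $\Rightarrow$ (c) $\Rightarrow$ (d) are trivial. We prove the implications (a) $\Rightarrow$ (b) and (d) $\Rightarrow$ (a).

To prove (a) $\Rightarrow$ (b), we use Theorems 1.1 and 1.2. Let $X^*$ and $\{X_n^*\}$ be defined on the same probability space such that $X_n^* \overset{d}{=} X_n$, $X^* \overset{d}{=} X$ and $X_n^* \xrightarrow{a.s.} X^*$. Let $f : \mathbb{R}^k \to \mathbb{R}$ be bounded and satisfy $P(X \in C_f) = 1$. Since $X^* \overset{d}{=} X$, we also have $P(X^* \in C_f) = 1$. Then, by Theorem 1.2, we obtain $f(X_n^*) \xrightarrow{a.s.} f(X^*)$. Applying DCT to $f(X_n^*)$, we get
$$E f(X_n) = E f(X_n^*) \to E f(X^*) = E f(X).$$

To prove (d) $\Rightarrow$ (a), let $x \in C_F$ be fixed, and for such an $x$ let $f_m^+ : \mathbb{R}^k \to \mathbb{R}$ be defined by
$$f_m^+(u) = \begin{cases} 1 & \text{if } u \le x; \\ (m/k)\,\mathbf{1}^\top (x + 1/m - u) & \text{if } x \le u \le x + 1/m; \\ 0 & \text{if } u \ge x + 1/m. \end{cases}$$
The function $f_m^+$ is uniformly continuous and $0 \le I_{(-\infty, x]}(\cdot) \le f_m^+ \le 1$. Also, let $f_m^- : \mathbb{R}^k \to \mathbb{R}$ be defined by
$$f_m^-(u) = \begin{cases} 1 & \text{if } u \le x - 1/m; \\ (m/k)\,\mathbf{1}^\top (x - u) & \text{if } x - 1/m \le u \le x; \\ 0 & \text{if } u \ge x. \end{cases}$$
The function $f_m^-$ is also uniformly continuous and $0 \le f_m^- \le I_{(-\infty, x]}(\cdot) \le 1$. It follows that
$$\lim_{m \to \infty} f_m^+(u) = I_{(-\infty, x]}(u) \quad \text{and} \quad \lim_{m \to \infty} f_m^-(u) = I_{(-\infty, x)}(u)$$
for all $u \in \mathbb{R}^k$.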

Thus, (d) implies that, for all $m \ge 1$,
$$\limsup_{n \to \infty} F_n(x) \le \lim_{n \to \infty} E f_m^+(X_n) = E f_m^+(X),$$
$$\liminf_{n \to \infty} F_n(x) \ge \lim_{n \to \infty} E f_m^-(X_n) = E f_m^-(X).$$
By DCT, we also have
$$\lim_{m \to \infty} E f_m^-(X) = F(x-), \qquad \lim_{m \to \infty} E f_m^+(X) = F(x).$$
These results give
$$F(x-) \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F(x).$$
Since $x \in C_F$, so that $F(x-) = F(x)$, we obtain $\lim_{n \to \infty} F_n(x) = F(x)$.

Remark. For other equivalent definitions of weak convergence, see the portmanteau theorem on page 24 in Billingsley, P. (1968). Convergence of Probability Measures. John Wiley & Sons, New York.

2 Weak Convergence in Metric Spaces

We have studied the notion of weak convergence for random variables taking values in $\mathbb{R}^k$. Here, we extend the notion to random elements taking values in a function space. We start with weak convergence in a metric space, and then specialize the notion to the space of continuous functions defined on $[0, 1]$ and also to the space of càdlàg functions.

Let $(S, \mathcal{S})$ be a metric space, where $\mathcal{S}$ denotes the Borel σ-field of $S$. Let $X_n$ and $X$ be random elements taking values in $S$. This means that $X_n$ and $X$ are measurable mappings from a probability space $(\Omega, \mathcal{F}, P)$ to the metric space $(S, \mathcal{S})$.

Definition. We say a sequence of random elements $\{X_n\}$ converges weakly to a random element $X$, and we write $X_n \xrightarrow{d} X$, if $P(X_n \in A)$ converges to $P(X \in A)$ for all Borel sets $A \in \mathcal{S}$ with $P(X \in \partial A) = 0$. Equivalently, we say $X_n \xrightarrow{d} X$ if $E f(X_n) \to E f(X)$ for any bounded and uniformly continuous real-valued function $f$.

In the case where $S = \mathbb{R}^\infty$ is equipped with the metric $d(x, y)$ defined through the coordinate differences $|x_i - y_i|$, $1 \le i \le k$, $k \ge 1$, a sequence of random elements $X_n = (X_{n,1}, X_{n,2}, \ldots)$ converges weakly to $X = (X_1, X_2, \ldots)$ if every finite-dimensional distribution of $X_n$ converges weakly to the corresponding finite-dimensional distribution of $X$, i.e., $(X_{n,1}, \ldots, X_{n,k})$ converges weakly to $(X_1, \ldots, X_k)$ for all $k \ge 1$. But this is not true when $S$ is a function space.

Example. Consider the space of continuous functions defined on the interval $[0, 1]$ with $d(x, y) = \sup_{0 \le t \le 1} |x(t) - y(t)|$. Define
$$X_n(t) = \begin{cases} nt & \text{if } 0 \le t \le n^{-1}; \\ 2 - nt & \text{if } n^{-1} \le t \le 2n^{-1}; \\ 0 & \text{if } 2n^{-1} \le t \le 1, \end{cases}$$
and $X(t) \equiv 0$. Then, $X_n$ does not converge weakly to $X$. To see this, take $A = B(0, 1/2) = \{y : d(y, 0) \le 1/2\}$. For this Borel set $A$, we have $P(X \in \partial A) = P(d(X, 0) = 1/2) = 0$ and $P(X \in A) = P(d(X, 0) \le 1/2) = 1$, but $P(X_n \in A) = P(d(X_n, 0) \le 1/2) = 0$. Nevertheless, every finite-dimensional distribution of $X_n$ converges weakly to the corresponding finite-dimensional distribution of $X$, since for any $0 \le t_1 < t_2 < \cdots < t_k \le 1$,
$$(X_n(t_1), \ldots, X_n(t_k)) \overset{d}{=} (X(t_1), \ldots, X(t_k)),$$
provided $t_1 > 2/n$.

The main task is to find a plausible set of sufficient conditions that ensures $X_n \xrightarrow{d} X$. A useful condition for weak convergence in a metric space is given below.

Theorem 2.1. $X_n \xrightarrow{d} X$ if and only if each subsequence $\{X_{n'}\}$ contains a further subsequence $\{X_{n''}\}$ such that $X_{n''} \xrightarrow{d} X$.

Proof. We only need to prove the if part. Suppose $X_n$ does not converge weakly to $X$. Then, there exists a bounded and uniformly continuous function $f : S \to \mathbb{R}$ such that $E f(X_n)$ does not converge to $E f(X)$. This means that there exist $\epsilon > 0$ and $\{n'\} \subset \{n\}$ such that $|E f(X_{n'}) - E f(X)| > \epsilon$ for all $n'$. This contradicts the existence of $\{n''\} \subset \{n'\}$ such that $E f(X_{n''}) - E f(X) \to 0$ along $n''$.

Definition. We say that a sequence of random elements $\{X_n\}$ is relatively compact if each subsequence $\{X_{n'}\}$ contains a further subsequence $\{X_{n''}\}$ that converges weakly to a random element (which may depend on the choice of $\{X_{n'}\}$).

Theorem 2.2 (Continuous mapping theorem). Let $h$ be a measurable function that maps $(S, \mathcal{S})$ to another metric space $(S', \mathcal{S}')$. If $X_n \xrightarrow{d} X$ and $P(X \in D_h) = 0$ for the set $D_h$ of discontinuities of $h$, then $h(X_n) \xrightarrow{d} h(X)$.

Theorem 2.3. Let $H$ be a collection of measurable and continuous functions that map $(S, \mathcal{S})$ to another metric space $(S', \mathcal{S}')$. Suppose that $H^{-1}(\mathcal{S}') \equiv \bigcup_{h \in H} h^{-1}(\mathcal{S}')$ is a field generating $\mathcal{S}$. If $\{X_n\}$ is relatively compact and $h(X_n)$ converges weakly to $h(X)$ for all $h \in H$ for some random element $X$, then $X_n \xrightarrow{d} X$.

Remark. In fact, we do not need to assume at the outset that there exists a random element $X$ such that $h(X_n)$ converges weakly to $h(X)$ for all $h \in H$. What we need is that $h(X_n)$ converges weakly to a random element, say $X_h$, which takes values in $S'$, for all $h \in H$. Then, we can define a random element $X$ such that $P[X \in h^{-1}(A')] = P(X_h \in A')$ for all $A' \in \mathcal{S}'$ and for all $h \in H$. Note that the distribution of $X$ is uniquely determined by the probabilities $P(X \in A)$ for $A \in H^{-1}(\mathcal{S}')$ since $H^{-1}(\mathcal{S}')$ is a field generating $\mathcal{S}$.

Proof of Theorem 2.3. We only need to check that the weak limit of any convergent subsequence $\{X_{n'}\}$ of $\{X_n\}$ does not depend on $\{n'\}$. Let $Y$ be the weak limit of $\{X_{n'}\}$. By Theorem 2.2 it follows that $h(X_{n'}) \xrightarrow{d} h(Y)$ for all $h \in H$. On the other hand, we also have $h(X_{n'}) \xrightarrow{d} h(X)$ for all $h \in H$ by the condition of the theorem. Thus, $h(X) \overset{d}{=} h(Y)$ for all $h \in H$.

This means $P(X \in A) = P(Y \in A)$ for all $A \in H^{-1}(\mathcal{S}')$, which implies $P(X \in A) = P(Y \in A)$ for all $A \in \mathcal{S}$ since $H^{-1}(\mathcal{S}')$ is a field generating $\mathcal{S}$.

It is rather difficult to prove relative compactness directly for a given sequence of random elements $\{X_n\}$. A more convenient notion that implies relative compactness is tightness.

Definition. We say a random element $X$ is tight if for any $\epsilon > 0$ there exists a compact set $K$ such that $P(X \in K) > 1 - \epsilon$.

Definition. A set $A$ in a metric space $(S, \mathcal{S})$ is called compact if every open cover of $A$ has a finite subcover. Alternatively, a set $A$ is compact if it is totally bounded and complete.

Definition. A set $A$ in a metric space $(S, \mathcal{S})$ is called totally bounded if for any $\epsilon > 0$ the set $A$ is covered by finitely many open balls of radius $\epsilon$ in $(S, \mathcal{S})$.

Definition. A metric space $S$ is called complete if every Cauchy sequence of points in $S$ has a limit that is also in $S$; in other words, if every Cauchy sequence in $S$ converges in $S$. A set in $S$ is called complete if every Cauchy sequence in the set converges in the set.

Definition. A topological space is called separable if it contains a countable dense subset; that is, there exists a sequence of elements of the space such that every nonempty open subset of the space contains at least one element of the sequence.

Definition. A Polish space is a topological space that is separable and metrizable in such a way that it becomes complete.

The following theorem gives a sufficient condition for a single random element to be tight.

Theorem 2.4 (Theorem 1.4, Billingsley, 1968). If $S$ is separable and complete, then each random element in $(S, \mathcal{S})$ is tight.

Proof. Fix $\epsilon > 0$. By the separability of $S$, for each $k \ge 1$ there exist countably many balls $B_{k,1}, B_{k,2}, \ldots$ with radius $k^{-1}$ that cover $S$, so that $P\big(X \in \bigcup_{j=1}^{\infty} B_{k,j}\big) = 1$. We may find $J_k$ such that
$$P\Big(X \in \bigcup_{j=1}^{J_k} B_{k,j}\Big) > 1 - \epsilon 2^{-k}.$$
Let $B = \bigcap_{k=1}^{\infty} \bigcup_{j=1}^{J_k} B_{k,j}$. Then, $B$ is totally bounded. Since $S$ is complete, the closure $\bar{B}$ of $B$ is also complete, so that $\bar{B}$ is compact. We obtain
$$P(X \in \bar{B}^c) \le P(X \in B^c) \le \sum_{k=1}^{\infty} P\Big(X \in \bigcap_{j=1}^{J_k} B_{k,j}^c\Big) \le \sum_{k=1}^{\infty} \epsilon 2^{-k} = \epsilon.$$

Definition. We say a sequence of random elements $\{X_n\}$ is tight if for any $\epsilon > 0$ there exists a compact set $K$ such that $\inf_n P(X_n \in K) > 1 - \epsilon$.

Remark. In the case of $\mathbb{R}^k$, tightness of a sequence of random vectors $X_n$ means $X_n = O_p(1)$.

Theorem 2.5 (Theorem 6.1, Billingsley, 1968). If $\{X_n\}$ is tight, then it is relatively compact. The converse is also true if $S$ is separable and complete.

3 Weak Convergence in $C[0, 1]$

Here we consider weak convergence in $C \equiv C[0, 1]$, the space of real-valued continuous functions defined on the interval $[0, 1]$. Let $\mathcal{C}$ be the Borel σ-field of $C$. We endow $C$ with the uniform metric
$$d(x, y) = \sup_{t \in [0,1]} |x(t) - y(t)|.$$
With this metric, the space $C$ is separable due to the Stone-Weierstrass theorem (any continuous function can be approximated uniformly by a polynomial, and the polynomials with rational coefficients form a countable dense subset), and is also complete. Separability and completeness facilitate the derivation of a plausible set of sufficient conditions for weak convergence.
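With the uniform metric in hand, the tent-function example of Section 2 can be checked numerically. The sketch below assumes NumPy and a finite evaluation grid (both are choices made here, not part of the notes); `tent` and `sup_dist_to_zero` are illustrative names. It shows that the values at fixed time points vanish for large $n$ while the uniform distance to the zero function stays at 1.

```python
import numpy as np

def tent(n, t):
    """X_n(t) = nt on [0, 1/n], 2 - nt on [1/n, 2/n], and 0 on [2/n, 1]."""
    t = np.asarray(t, dtype=float)
    return np.clip(np.minimum(n * t, 2.0 - n * t), 0.0, None)

def sup_dist_to_zero(n, grid_size=10_001):
    """Grid approximation of d(X_n, 0) = sup_t |X_n(t)|; the peak t = 1/n is added explicitly."""
    grid = np.append(np.linspace(0.0, 1.0, grid_size), 1.0 / n)
    return float(np.max(np.abs(tent(n, grid))))

fixed_times = np.array([0.1, 0.5, 0.9])      # arbitrary fixed time points
fidi_values = tent(50, fixed_times)          # all zero once 2/n < 0.1
distance = sup_dist_to_zero(50)              # stays near 1 for every n
```

Finite-dimensional convergence therefore cannot detect the peak of height 1 that travels toward $t = 0$, which is why an additional condition such as tightness is needed in $C$.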

3.1 Projection from $C$ to $\mathbb{R}^k$

For a set of points $\{t_1, \ldots, t_k\}$ in $[0, 1]$, let $\pi_{t_1,\ldots,t_k}$ be the map that carries a point $x$ of $C$ to the point $(x(t_1), \ldots, x(t_k))$ of $\mathbb{R}^k$. It is a map from $(C, \mathcal{C})$ to $(\mathbb{R}^k, \mathcal{R}^k)$, where $\mathcal{R}^k$ is the Borel σ-field of $\mathbb{R}^k$. The following theorem demonstrates that the collection of all projections $\pi_{t_1,\ldots,t_k}$ for $t_1, \ldots, t_k \in [0, 1]$ and $k \ge 1$ satisfies the conditions of $H$ in Theorem 2.3.

Theorem 3.1. The projection $\pi_{t_1,\ldots,t_k}$ is measurable and continuous for all $t_1, \ldots, t_k \in [0, 1]$ and $k \ge 1$. Also, all sets of the form $\pi^{-1}_{t_1,\ldots,t_k}(A)$ for some $A \in \mathcal{R}^k$, $t_1, \ldots, t_k \in [0, 1]$ and $k \ge 1$ form a field that generates $\mathcal{C}$.

Proof. The first part is obvious. For the second part, let
$$\mathcal{C}_0 = \{\pi^{-1}_{t_1,\ldots,t_k}(B) : B \in \mathcal{R}^k,\ t_1, \ldots, t_k \in [0, 1],\ k \ge 1\}.$$
The fact that $\mathcal{C}_0$ is a field follows from
$$[\pi^{-1}_{t_1,\ldots,t_k}(B)]^c = \pi^{-1}_{t_1,\ldots,t_k}(B^c),$$
$$\pi^{-1}_{t_1,\ldots,t_k}(B_1) \cup \pi^{-1}_{s_1,\ldots,s_l}(B_2) = \pi^{-1}_{t_1,\ldots,t_k,s_1,\ldots,s_l}\big((B_1 \times \mathbb{R}^l) \cup (\mathbb{R}^k \times B_2)\big).$$
Now, recall that each open set in a separable space is a countable union of closed balls (or open balls). Thus, it suffices to prove that each closed ball can be obtained by the operations of countable union, countable intersection and complementation of the sets in $\mathcal{C}_0$. Let $B(x, \epsilon) = \{y : \sup_{0 \le t \le 1} |y(t) - x(t)| \le \epsilon\}$ be a closed ball in $C$. Clearly,
$$B(x, \epsilon) \subset \bigcap_{n=1}^{\infty} \bigcap_{i=1}^{n} \{y : |y(i/n) - x(i/n)| \le \epsilon\}.$$
Now, let $y$ belong to the set on the right hand side of the above inclusion. For such $y$, we may find $t_0 \in [0, 1]$ where $\sup_{0 \le t \le 1} |y(t) - x(t)| = |y(t_0) - x(t_0)|$. Given $\delta > 0$, we may also find $t_\delta \in \{i/n : 1 \le i \le n,\ n \ge 1\}$ such that $|x(t_\delta) - x(t_0)| \le \delta/2$ and $|y(t_\delta) - y(t_0)| \le \delta/2$ due to the continuity of $x$ and $y$. Thus, we have
$$\sup_{0 \le t \le 1} |y(t) - x(t)| = |y(t_0) - x(t_0)| \le |y(t_0) - y(t_\delta)| + |y(t_\delta) - x(t_\delta)| + |x(t_\delta) - x(t_0)| \le \epsilon + \delta.$$

Letting $\delta \to 0$ gives $\sup_{0 \le t \le 1} |y(t) - x(t)| \le \epsilon$. Thus,
$$B(x, \epsilon) \supset \bigcap_{n=1}^{\infty} \bigcap_{i=1}^{n} \{y : |y(i/n) - x(i/n)| \le \epsilon\}.$$
This completes the proof.

Definition. The distribution of $\pi_{t_1,\ldots,t_k} X_n = (X_n(t_1), \ldots, X_n(t_k))$ is called a finite-dimensional distribution of $X_n$.

Theorem 3.2. Let $X_n$ and $X$ be random elements in $C$. If all finite-dimensional distributions of $X_n$ converge weakly to those of $X$ and if $\{X_n\}$ is tight, then $X_n \xrightarrow{d} X$.

3.2 Conditions for tightness in $C$

Here, we study necessary and sufficient conditions for a sequence of continuous random functions to be tight. We start with the following theorem.

Theorem 3.3 (Theorem 8.2, Billingsley, 1968). A sequence $\{X_n\}$ in $C$ is tight if and only if
(i) $\{X_n(0)\}$ is tight in $\mathbb{R}$;
(ii) for any $\epsilon > 0$,
$$\lim_{\delta \to 0} \limsup_{n \to \infty} P\Big[\sup_{|s-t|<\delta} |X_n(s) - X_n(t)| \ge \epsilon\Big] = 0. \qquad (3.1)$$

The theorem follows from the following Arzelà-Ascoli characterization of compact sets. The property (3.1) is sometimes called asymptotic equicontinuity.

Lemma 3.4 (Arzelà-Ascoli characterization of compact sets). A set $A \subset C$ has compact closure if and only if (i) $\sup_{x \in A} |x(0)| < \infty$ and (ii) $\lim_{\delta \to 0} \sup_{x \in A} w_x(\delta) = 0$, where $w_x(\delta)$ is the modulus of continuity of $x$ in $C$ defined by
$$w_x(\delta) = \sup_{|s-t|<\delta} |x(s) - x(t)|.$$

Note. The conditions (i) and (ii) are in fact necessary and sufficient for $A$ to be totally bounded. Since $C$ is complete, $\bar{A}$ is complete for any $A \subset C$. Thus, $\bar{A}$ is compact if and only if $A$ is totally bounded.
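The modulus of continuity $w_x(\delta)$ in Lemma 3.4 is easy to approximate on a grid. The following is a rough sketch, assuming NumPy and equally spaced sample points (the function name `modulus_of_continuity` and the grid size are choices made here). It only compares grid points, so it gives a lower approximation of the true supremum.

```python
import numpy as np

def modulus_of_continuity(values, delta):
    """Grid approximation of w_x(delta) = sup_{|s-t|<delta} |x(s) - x(t)|
    for x sampled at equally spaced points of [0, 1]."""
    values = np.asarray(values, dtype=float)
    h = 1.0 / (len(values) - 1)                  # grid spacing
    max_lag = int(np.ceil(delta / h)) - 1        # largest k with k*h < delta
    w = 0.0
    for k in range(1, max_lag + 1):
        # largest increment over all pairs of grid points that are k steps apart
        w = max(w, float(np.max(np.abs(values[k:] - values[:-k]))))
    return w

grid = np.linspace(0.0, 1.0, 2001)
identity = grid                                   # x(t) = t, so w_x(delta) is about delta
tent_100 = np.clip(np.minimum(100 * grid, 2.0 - 100 * grid), 0.0, None)

w_identity = modulus_of_continuity(identity, 0.05)
w_tent = modulus_of_continuity(tent_100, 0.05)    # tent with n = 100 from Section 2
```

For the tent functions of Section 2, $w_{X_n}(\delta) = 1$ as soon as $1/n < \delta$, so $\sup_n w_{X_n}(\delta)$ does not tend to 0 and condition (ii) of Lemma 3.4 fails for the set $\{X_n\}$.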

Proof of Theorem 3.3. To prove the only if part, assume that $\{X_n\}$ is tight. Then, for any given $\epsilon > 0$ there exists a compact set $K \equiv K(\epsilon)$ such that $\inf_n P(X_n \in K) > 1 - \epsilon$. By Lemma 3.4, (i) $\sup_{x \in K} |x(0)| < C_0$ for some $0 < C_0 < \infty$ and (ii) there exists $\delta_0 > 0$ such that $\sup_{x \in K} \sup_{|s-t|<\delta_0} |x(s) - x(t)| < \epsilon$. Thus, from (i),
$$\inf_n P(|X_n(0)| < C_0) \ge \inf_n P(X_n \in K) > 1 - \epsilon,$$
so that $\{X_n(0)\}$ is tight in $\mathbb{R}$. Furthermore, from (ii) it also holds that
$$\inf_n P\Big[\sup_{|s-t|<\delta_0} |X_n(s) - X_n(t)| < \epsilon\Big] \ge \inf_n P(X_n \in K) > 1 - \epsilon,$$
so that $\sup_n P\big[\sup_{|s-t|<\delta_0} |X_n(s) - X_n(t)| \ge \epsilon\big] < \epsilon$. This implies that, for any $\epsilon > 0$,
$$\lim_{\delta \to 0} \limsup_{n \to \infty} P\Big[\sup_{|s-t|<\delta} |X_n(s) - X_n(t)| \ge \epsilon\Big] < \epsilon.$$
Taking $\epsilon \to 0$ gives that, for any $\epsilon_0 > 0$,
$$\lim_{\delta \to 0} \limsup_{n \to \infty} P\Big[\sup_{|s-t|<\delta} |X_n(s) - X_n(t)| \ge \epsilon_0\Big] \le \lim_{\epsilon \to 0} \lim_{\delta \to 0} \limsup_{n \to \infty} P\Big[\sup_{|s-t|<\delta} |X_n(s) - X_n(t)| \ge \epsilon\Big] = 0.$$
This completes the proof of the only if part.

To prove the if part, let $\epsilon > 0$ be fixed. We construct a totally bounded set $K$ such that $\sup_n P(X_n \in K^c) < \epsilon$. Suppose that (i) and (ii) of the theorem hold. Then, we may find $C_0$ such that
$$\sup_n P(|X_n(0)| > C_0) < \epsilon/2. \qquad (3.2)$$
Also, we may choose $\delta_j > 0$, $j \ge 1$, such that
$$\limsup_{n \to \infty} P\Big[\sup_{|s-t|<\delta_j} |X_n(s) - X_n(t)| \ge 1/j\Big] < \frac{\epsilon}{2^{j+1}}.$$
Note that we actually can choose $\delta_j > 0$, $j \ge 1$, such that
$$\sup_n P\Big[\sup_{|s-t|<\delta_j} |X_n(s) - X_n(t)| \ge 1/j\Big] < \frac{\epsilon}{2^{j+1}}. \qquad (3.3)$$
This follows since every single random element in $\{X_n\}$ is tight due to Theorem 2.4, and thus the necessity part of the theorem that we have just proved entails
$$\lim_{\delta \to 0} P\Big[\sup_{|s-t|<\delta} |X_k(s) - X_k(t)| \ge 1/j\Big] = 0$$
for each fixed $k$ and $j$. We take
$$K = \{x : |x(0)| \le C_0\} \cap \bigcap_{j=1}^{\infty} \Big\{x : \sup_{|s-t|<\delta_j} |x(s) - x(t)| < 1/j\Big\}.$$
This set is totally bounded by Lemma 3.4, so that its closure $\bar{K}$ is compact. From (3.2) and (3.3), we get
$$\sup_n P(X_n \in K^c) \le \sup_n P(|X_n(0)| > C_0) + \sum_{j=1}^{\infty} \sup_n P\Big[\sup_{|s-t|<\delta_j} |X_n(s) - X_n(t)| \ge 1/j\Big] < \epsilon.$$

Corollary 3.5. Let $X_n$ and $X$ be random elements in $C$. If all finite-dimensional distributions of $X_n$ converge weakly to those of $X$, and if for any $\epsilon > 0$ there exist $n_0$ and $\delta > 0$ such that
$$\sup_{n \ge n_0} P\Big[\sup_{|s-t|<\delta} |X_n(s) - X_n(t)| \ge \epsilon\Big] \le \epsilon, \qquad (3.4)$$
then $X_n \xrightarrow{d} X$.

Note. (3.1) holds for any $\epsilon > 0$ if and only if for any $\epsilon > 0$ there exist $n_0$ and $\delta > 0$ such that (3.4) holds.

Theorem 3.6. The inequality (3.4) follows if
$$\sup_{n \ge n_0} \sup_{0 \le t \le 1} P\Big[\sup_{s \in [t, t+\delta]} |X_n(s) - X_n(t)| \ge \epsilon/3\Big] \le \delta\epsilon. \qquad (3.5)$$

Proof. Let $t_i = \delta i$ for $i = 0, 1, \ldots, \delta^{-1}$. Without loss of generality, we may assume $I_n \equiv \delta^{-1}$ is an integer. Note that, if $|s - t| < \delta$ and $t < s$ (WLOG), then (i) there exists a grid point $t_i$ ($1 \le i \le I_n - 1$) such that $t_{i-1} \le t \le t_i \le s \le t_{i+1}$, or (ii) there exists a grid point $t_i$ ($0 \le i \le I_n - 1$) such that $t_i \le t < s \le t_{i+1}$. This means
$$\sup_{|s-t|<\delta} |X_n(s) - X_n(t)| \le 3 \max_{0 \le i \le I_n - 1} \sup_{t_i \le t \le t_{i+1}} |X_n(t) - X_n(t_i)|,$$
which with the inequality (3.5) gives
$$P\Big[\sup_{|s-t|<\delta} |X_n(s) - X_n(t)| \ge \epsilon\Big] \le P\Big[\max_{0 \le i \le I_n - 1} \sup_{t_i \le t \le t_{i+1}} |X_n(t) - X_n(t_i)| \ge \epsilon/3\Big] \le \sum_{i=0}^{I_n - 1} P\Big[\sup_{t_i \le t \le t_{i+1}} |X_n(t) - X_n(t_i)| \ge \epsilon/3\Big] \le I_n \delta \epsilon = \epsilon.$$

3.3 Donsker's Theorem

Let the random variables $\xi_j$ be iid with mean 0 and variance 1. Define a sequence of random elements $X_n$ in $C$ by
$$X_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} \xi_i + (nt - \lfloor nt \rfloor) \frac{1}{\sqrt{n}} \xi_{\lfloor nt \rfloor + 1}, \qquad (3.6)$$
where $\lfloor nt \rfloor$ denotes the largest integer which is less than or equal to $nt$. The process $X_n$ is simply the linear interpolation of $X_n(j/n) = \sum_{i=1}^{j} \xi_i / \sqrt{n}$. The following theorem due to Donsker is a generalization of the classical Central Limit Theorem. It is a functional CLT for the entire partial-sum process, not only for the $n$th partial sum that the classical CLT treats.

Definition. The standard Wiener process or Brownian motion $W$ is a Gaussian process taking values in $C$ such that $E W(t) = 0$ and $\mathrm{cov}(W(s), W(t)) = s \wedge t$. Alternatively, it is defined to be a stochastic process with the following properties: (i) for each $0 \le t \le 1$, $W(t) \sim N(0, t)$; (ii) $W$ has independent increments, i.e., $W(t_k) - W(t_{k-1}), \ldots, W(t_2) - W(t_1)$ are independent for all $0 \le t_1 \le \cdots \le t_k \le 1$.
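The construction (3.6) can be written out directly. The sketch below assumes NumPy and standard normal increments chosen purely for illustration (any iid sequence with mean 0 and variance 1 would do); `partial_sum_process` is a name introduced here. It evaluates $X_n(t)$ from the formula and can be compared with plain linear interpolation of the scaled partial sums.

```python
import numpy as np

def partial_sum_process(xi, t):
    """Evaluate X_n(t) of (3.6) for increments xi_1, ..., xi_n at times t in [0, 1]."""
    xi = np.asarray(xi, dtype=float)
    t = np.asarray(t, dtype=float)
    n = len(xi)
    S = np.concatenate(([0.0], np.cumsum(xi)))        # S[j] = xi_1 + ... + xi_j
    j = np.minimum(np.floor(n * t).astype(int), n)    # floor(nt), capped at n for t = 1
    frac = n * t - j                                  # nt - floor(nt)
    nxt = np.append(xi, 0.0)[j]                       # xi_{floor(nt)+1}; unused at t = 1
    return (S[j] + frac * nxt) / np.sqrt(n)

rng = np.random.default_rng(0)
xi = rng.standard_normal(50)                          # assumed increments, n = 50
t = np.linspace(0.0, 1.0, 777)
path = partial_sum_process(xi, t)

# Linear interpolation of the points (j/n, S_j / sqrt(n)) for comparison.
knots = np.arange(51) / 50
reference = np.interp(t, knots, np.concatenate(([0.0], np.cumsum(xi))) / np.sqrt(50))
```

The two evaluations agree up to rounding, which is the statement that $X_n$ is the linear interpolation of $X_n(j/n) = \sum_{i=1}^{j} \xi_i/\sqrt{n}$.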

Theorem 3.7 (Donsker). The partial sum process defined at (3.6) converges weakly to the standard Wiener process $W$.

Proof. Convergence of the finite-dimensional distributions of $X_n$ to those of $W$ follows immediately from the classical CLT. We prove that for any $\epsilon > 0$ there exist $n_0$ and $\delta > 0$ such that (3.5) holds. Our approach needs finite 4th moments of $\xi_j$. For a proof with only second moments, see Theorem 8.4 and the accompanying arguments in Billingsley (1968). We introduce a technique based on the assumption that the $\xi_j$ have finite 4th moment since it is more instructive and can be generalized to different settings. The basic idea is to use the following maximal inequality, which is fairly general so that it can be applied to partial sums of arbitrary random variables that may not be independent or identically distributed.

Lemma 3.8 (Theorem 12.2, Billingsley, 1968). Let $\xi_1, \ldots, \xi_m$ be random variables. Let $S_k = \xi_1 + \cdots + \xi_k$ for $k \ge 1$ and put $S_0 = 0$. If
$$E|S_j - S_i|^\gamma \le (u_{i+1} + \cdots + u_j)^\alpha$$
for some $\gamma \ge 0$, $\alpha > 1$ and $u_1, \ldots, u_m \ge 0$, then
$$P\Big(\max_{1 \le k \le m} |S_k| \ge \lambda\Big) \le C_{\gamma,\alpha} \lambda^{-\gamma} (u_1 + \cdots + u_m)^\alpha,$$
where $C_{\gamma,\alpha}$ is a constant that depends only on $\gamma$ and $\alpha$.

To prove (3.5), consider a fixed point $t \in [0, 1]$. Let $j$ be an integer such that $j/n \le t < (j+1)/n$. Suppose that $k/n < t + \delta \le (k+1)/n$ for some $j \le k \le n - 1$. In this case, $(k - j - 1)/n \le \delta$ and $s \in [t, t+\delta]$ may lie in an interval $[i/n, (i+1)/n]$ for some $i$ with $j \le i \le k$. For such an $i$,
$$|X_n(s) - X_n(t)| \le |X_n(s) - X_n(i/n)| + |X_n(i/n) - X_n(j/n)| + |X_n(t) - X_n(j/n)|$$
$$\le |X_n((i+1)/n) - X_n(i/n)| + |X_n(i/n) - X_n(j/n)| + |X_n((j+1)/n) - X_n(j/n)|,$$
where the second inequality follows from the polygonal character of $X_n$. Writing $I(\delta, j) = (j + \lceil n\delta \rceil) \wedge (n - 1)$, we establish
$$\sup_{s \in [t, t+\delta]} |X_n(s) - X_n(t)| \le \max_{j < i \le I(\delta,j)} |X_n(i/n) - X_n(j/n)| + 2 \max_{j \le i \le I(\delta,j)} |X_n((i+1)/n) - X_n(i/n)| \qquad (3.7)$$
$$= \max_{j < i \le I(\delta,j)} |\xi_{j+1} + \cdots + \xi_i| / \sqrt{n} + 2 \max_{j \le i \le I(\delta,j)} |\xi_{i+1}| / \sqrt{n}.$$
We apply Lemma 3.8 to get a probability bound for the large deviation of the first term on the RHS of (3.7). Since the $\xi_k$ are independent with $E\xi_k = 0$, we obtain for any $i, i'$ with $i' > i > j$,
$$E(S_{i'} - S_i)^4 = E(\xi_{i+1} + \cdots + \xi_{i'})^4 \le C_1 \Big(\sum_{k=i+1}^{i'} E\xi_k^2\Big)^2 + \sum_{k=i+1}^{i'} E\xi_k^4 \le (C_1 + 1)(i' - i)^2$$
for some absolute constant $C_1 > 0$. This means we can apply Lemma 3.8 with $\gamma = 4$, $\alpha = 2$ and $u_k \equiv (C_1 + 1)^{1/2}$. Thus, there exists an absolute constant $C_2 > 0$ such that
$$P\Big[\max_{j < i \le I(\delta,j)} |\xi_{j+1} + \cdots + \xi_i| / \sqrt{n} \ge \epsilon/6\Big] \le \frac{C_2 (n\delta + 1)^2}{n^2 \epsilon^4} \le \frac{4 C_2 \delta^2}{\epsilon^4}$$
for sufficiently large $n$ such that $n \ge 1/\delta$. Taking $\delta \le \epsilon^5 / (8 C_2)$ gives
$$P\Big[\max_{j < i \le I(\delta,j)} |\xi_{j+1} + \cdots + \xi_i| / \sqrt{n} \ge \epsilon/6\Big] \le \delta\epsilon/2. \qquad (3.8)$$
For the second term on the RHS of (3.7), there exist an absolute constant $C_3 > 0$ and an integer $n_0(\delta, \epsilon)$ such that for all $n \ge n_0(\delta, \epsilon)$,
$$P\Big[\max_{j \le i \le I(\delta,j)} |\xi_{i+1}| / \sqrt{n} \ge \epsilon/12\Big] \le \sum_{i=j}^{I(\delta,j)} P\big[|\xi_{i+1}| / \sqrt{n} \ge \epsilon/12\big] \le \frac{C_3 \delta}{n \epsilon^4} \le \delta\epsilon/2. \qquad (3.9)$$

The inequalities (3.7), (3.8) and (3.9) give (3.5).

4 Weak Convergence in $D[0, 1]$

We consider weak convergence in $D \equiv D[0, 1]$, the space of càdlàg functions defined on the interval $[0, 1]$.

Definition. A function defined on $A \subset \mathbb{R}$ is called a càdlàg function if it is right-continuous and has a left limit everywhere in $A$.

Note that all continuous functions are càdlàg functions. All distribution functions are also càdlàg functions. The main difficulty with this space is that it is not separable with the uniform metric $d_U$.

4.1 Non-separability of $(D, d_U)$

Define $x_\alpha \in D$, for $0 \le \alpha \le 1$, by $x_\alpha(t) = I(t \ge \alpha)$. If $\alpha \ne \alpha'$, then $d_U(x_\alpha, x_{\alpha'}) = 1$. Let $\epsilon \le 1/2$. Then, for any $\alpha \ne \alpha'$,
$$\{x \in D : d_U(x_\alpha, x) < \epsilon\} \cap \{x \in D : d_U(x_{\alpha'}, x) < \epsilon\} = \emptyset.$$
If $(D, d_U)$ is separable, there exists a countable subset, say $D_0$, of $D$ such that every open ball $\{x \in D : d_U(x_\alpha, x) < \epsilon\}$ for $\alpha \in [0, 1]$ contains a member of $D_0$. But this is impossible since the number of these open balls is uncountable and all the balls are disjoint.

Non-separability of $(D, d_U)$ causes a fundamental difficulty. If a metric space $(S, \mathcal{S}, d)$ is not separable, then functions that map a probability space $(\Omega, \mathcal{F}, P)$ to $(S, \mathcal{S}, d)$ often fail to be measurable. Important examples are empirical processes. Define $X : ([0, 1], \mathcal{B}, \mu) \to (D, \mathcal{D}, d_U)$ by $X(t, w) = I(w \le t)$, where $\mathcal{B}$ is the Borel σ-field of $[0, 1]$, $\mu$ is the Lebesgue measure and $\mathcal{D}$ is the Borel σ-field of $D$. For any subset $H$ of $[0, 1]$,
$$\bigcup_{\alpha \in H} B(x_\alpha, 1/2) = \bigcup_{\alpha \in H} \{y \in D : d_U(y, x_\alpha) < 1/2\} \in \mathcal{D}.$$
However, since $X(\cdot, w) \in B(x_\alpha, 1/2)$ if and only if $X(\cdot, w) = x_\alpha$, which is also equivalent to $w = \alpha$, we obtain
$$X^{-1}\Big(\bigcup_{\alpha \in H} B(x_\alpha, 1/2)\Big) = \Big\{w : X(\cdot, w) \in \bigcup_{\alpha \in H} B(x_\alpha, 1/2)\Big\} = \{w : X(\cdot, w) = x_\alpha \text{ for some } \alpha \in H\} = H.$$
By taking $H \notin \mathcal{B}$, we see that $X^{-1}(\mathcal{D}) \subset \mathcal{B}$ fails.

4.2 Skorokhod metric

The Skorokhod metric $d_S$ defined below makes $D$ separable. The basic idea is to allow a deformation of the time scale when defining a distance between two elements in $D$.

Definition. Let $\Lambda$ be the class of all strictly increasing and continuous mappings $\lambda$ of $[0, 1]$ onto itself such that $\lambda(0) = 0$ and $\lambda(1) = 1$. The Skorokhod metric, denoted by $d_S$, is defined by
$$d_S(x, y) = \inf_{\lambda \in \Lambda} \max\Big\{\sup_{t \in [0,1]} |\lambda(t) - t|,\ \sup_{t \in [0,1]} |x(t) - y(\lambda(t))|\Big\}.$$
A proof of the fact that $d_S$ is indeed a metric can be found on page 111 of Billingsley (1968).

Example. We compute $d_S(x_\alpha, x_{\alpha'})$ for $\alpha \ne \alpha'$. If we take $\lambda \in \Lambda$ such that $\lambda(\alpha) \ne \alpha'$, then $\sup_{t \in [0,1]} |x_\alpha(t) - x_{\alpha'}(\lambda(t))| = 1$. For $\lambda \in \Lambda$ such that $\lambda(\alpha) = \alpha'$, it holds that $\sup_{t \in [0,1]} |x_\alpha(t) - x_{\alpha'}(\lambda(t))| = 0$. Also,
$$\inf_{\lambda \in \Lambda : \lambda(\alpha) = \alpha'} \sup_{t \in [0,1]} |\lambda(t) - t| = |\alpha - \alpha'| \le 1.$$
Thus, $d_S(x_\alpha, x_{\alpha'}) = |\alpha - \alpha'|$.

Theorem 4.1. The metric space $(D, d_S)$ is separable.

A proof of the above theorem can be found on page 112 of Billingsley (1968). There is another difficulty. The space $(D, d_S)$ is not complete, as the following example illustrates. Completeness facilitates characterization of compact sets, and thus that of tight sequences of random elements.

Example. Define $x_n \in D$ by $x_n(t) = I(1/2 \le t < 1/2 + 1/n)$. Then,
$$d_S(x_m, x_n) = \Big|\frac{1}{m} - \frac{1}{n}\Big| \to 0$$
as $m, n \to \infty$. Thus, $\{x_n\}$ is a Cauchy sequence. However, there exists no $x \in D$ such that $x_n \to x$ in $d_S$. To see this, suppose that there exists such a function $x$. Then, there exist strictly increasing and continuous functions $\lambda_n$ with $\lambda_n(0) = 0$ and $\lambda_n(1) = 1$ such that
$$\sup_{t \in [0,1]} |\lambda_n(t) - t| \to 0, \qquad \sup_{t \in [0,1]} |x_n(\lambda_n(t)) - x(t)| \to 0. \qquad (4.10)$$
Note that
$$x_n(\lambda_n(t)) = I\big(\lambda_n^{-1}(1/2) \le t < \lambda_n^{-1}(1/2 + 1/n)\big).$$
Due to the second convergence in (4.10) and the fact that $x_n(\lambda_n(\cdot))$ is an indicator, the limit $x \in D$ must take the form $x(t) = I(\alpha \le t < \beta)$ for some $0 \le \alpha < \beta \le 1$. The case $\alpha = \beta$ is excluded here since then $x \equiv 0$ and thus the second convergence in (4.10) does not hold. Now, due to the first convergence in (4.10), we have
$$|\lambda_n^{-1}(1/2) - 1/2| = |\lambda_n^{-1}(1/2) - \lambda_n(\lambda_n^{-1}(1/2))| \to 0.$$
Similarly, $\lambda_n^{-1}(1/2 + 1/n) \to 1/2$. This means that $\alpha = \beta$, which is a contradiction.

There is a metric $d_S^\circ$ (written $d_0$ in Billingsley, 1968) which is equivalent to $d_S$ such that the metric space $(D, d_S^\circ)$ is complete. See Theorems 14.1 and 14.2 of Billingsley (1968). Thus, we can proceed as if the Skorokhod space $(D, d_S)$ is separable and complete.

4.3 Finite-dimensional distributions

We try to use Theorem 2.3 to get a set of sufficient conditions for weak convergence in $(D, \mathcal{D}, d_S)$. As in the case of $C$, we consider the class of all projections $\pi_{t_1,\ldots,t_k}$ for $H$.

Theorem 4.2. The projection $\pi_{t_1,\ldots,t_k}$ as a map from $(D, \mathcal{D})$ to $(\mathbb{R}^k, \mathcal{R}^k)$ is measurable for all $t_1, \ldots, t_k \in [0, 1]$ and $k \ge 1$.

For a proof of this theorem, see page 121 of Billingsley (1968). However, the projections $\pi_{t_1,\ldots,t_k}$ are not continuous everywhere in $D$ for each $(t_1, \ldots, t_k)$. This complicates matters somewhat. Recall that, when we derive Theorem 3.2 for a sequence of random elements $X_n$ in $C$ from Theorem 2.3, continuity of the projections $\pi_{t_1,\ldots,t_k}$ was used only to establish the weak convergence of $\pi_{t_1,\ldots,t_k} X_n$ to $\pi_{t_1,\ldots,t_k} X$ for $X_n$ converging weakly to $X$. In general, a measurable function $h : (S, \mathcal{S}) \to (S', \mathcal{S}')$ does not need to be continuous everywhere in $S$ for the sequence $h(X_n)$ to converge weakly to $h(X)$. According to the continuous mapping theorem (Theorem 2.2), if $P(X \in D_h) = 0$ for the set $D_h$ where $h$ is discontinuous, then $h(X_n)$ converges weakly to $h(X)$. The following theorem is a slight generalization of Theorem 2.3 that embodies this idea.

Theorem 4.3. For a random element $Y$ taking values in a metric space $(S, \mathcal{S})$, let $H_Y$ be a collection of measurable functions $h$ that map $(S, \mathcal{S})$ to another metric space $(S', \mathcal{S}')$ such that $P(Y \in D_h) = 0$ for the set $D_h$ where $h$ is discontinuous. Suppose that $\{X_n\}$ is relatively compact and $h(X_n)$ converges weakly to $h(X)$ for all $h \in H_X$ for some random element $X$. If $H^{-1}_{X,Y}(\mathcal{S}') \equiv \bigcup_{h \in H_X \cap H_Y} h^{-1}(\mathcal{S}')$ is a field generating $\mathcal{S}$ for all random elements $Y$, then $X_n$ converges weakly to $X$.

In fact, the requirement in Theorem 3.2 and Corollary 3.5 that all finite-dimensional distributions of $X_n$ converge weakly to those of $X$ is too much in the space $(D, \mathcal{D}, d_S)$, as the following example illustrates.

Example. Let $X_n \equiv I_{[0, 1/2 + 1/n)}$ and $X \equiv I_{[0, 1/2)}$. Then, $X_n \xrightarrow{d} X$ since $d_S(X_n, X) = 1/n \to 0$. However, $X_n(1/2) \equiv 1$ does not converge to $0 \equiv X(1/2)$.

Theorem 4.3 enables us to relax the condition that all finite-dimensional distributions of $X_n$ converge weakly to those of $X$. Our relaxation is founded on the following three theorems. The first one characterizes the discontinuity sets of $\pi_{t_1,\ldots,t_k}$ in $(D, d_S)$: it tells us that $\pi_{t_1,\ldots,t_k}$ for $0 < t_1 < \cdots < t_k < 1$ is discontinuous at $x$ if and only if $x$ is discontinuous at some $t_j$, $1 \le j \le k$.

Theorem 4.4. The projections $\pi_0$ and $\pi_1$ are everywhere continuous, but $\pi_t$ for $0 < t < 1$ is continuous at $x$ if and only if $x$ is continuous at $t$.

Proof. The first result is immediate. For the second, suppose that $x$ is continuous at $t$. Let $x_n$ be a sequence in $D$ such that $d_S(x_n, x) \to 0$. Then, there exists a sequence $\lambda_n$ in $\Lambda$ such that $\sup_{t \in [0,1]} |x_n(\lambda_n(t)) - x(t)| \to 0$ and $\sup_{t \in [0,1]} |\lambda_n(t) - t| \to 0$. Since $x$ is continuous at $t$,
$$|x_n(t) - x(t)| \le |x_n(t) - x(\lambda_n^{-1}(t))| + |x(\lambda_n^{-1}(t)) - x(t)| \le \sup_{s \in [0,1]} |x_n(\lambda_n(s)) - x(s)| + |x(\lambda_n^{-1}(t)) - x(t)| \to 0.$$
Suppose, on the other hand, that $x$ is discontinuous at $t$. We need to show that there exist an $\epsilon > 0$ and a sequence $x_n$ such that $d_S(x_n, x) \to 0$ but $|x_n(t) - x(t)| \ge \epsilon$ for infinitely many $n$. Take $x_n$ such that $x_n(s) = x(\lambda_n(s))$ for a sequence $\lambda_n$ which is linear on $[0, t]$ and on $[t, 1]$ and satisfies $\lambda_n(t) = t - 1/n$. Then,
$$d_S(x_n, x) = \inf_{\lambda \in \Lambda} \max\Big\{\sup_{s \in [0,1]} |\lambda(s) - s|,\ \sup_{s \in [0,1]} |x_n(\lambda(s)) - x(s)|\Big\} \le \sup_{s \in [0,1]} |\lambda_n^{-1}(s) - s| \le 1/n \to 0,$$
but $|x_n(t) - x(t)| = |x(t - 1/n) - x(t)| \to |x(t-) - x(t)| > 0$.

For a point $t \in [0, 1]$, define
$$D(\pi_t) = \{x \in D : x(t) \ne x(t-)\},$$
which is the set of all elements of $D$ which are discontinuous at $t$. The main lesson of Theorem 4.4 is that $\pi_t$ for $0 < t < 1$ is discontinuous on $D(\pi_t)$, and is continuous on $D(\pi_t)^c \cap D$.

Theorem 4.5. The complement of the set $\{t \in (0, 1) : P(X \in J_t) = 0\}$, where $J_t = D(\pi_t)$, for a random element $X$ taking values in $(D, \mathcal{D})$ is at most countable.

For a proof of this theorem, see page 124 of Billingsley (1968). The theorem tells us that the set
$$T_X \equiv \{0, 1\} \cup \{t \in (0, 1) : P(X \in D(\pi_t)) = 0\}$$
is dense in $[0, 1]$. Furthermore, it implies that $\bigcap_{i=1}^{m} T_{X_i}$ for finitely many $X_i$ is also dense in $[0, 1]$.

Theorem 4.6 (Theorem 14.5, Billingsley, 1968). For a subset $T$ of $[0, 1]$, let $\mathcal{F}_T$ be the class of all sets of the form $\pi^{-1}_{t_1,\ldots,t_k}(A)$ for some $A \in \mathcal{R}^k$, $t_1, \ldots, t_k \in T$ and $k \ge 1$. If $T$ contains 1 and is dense in $[0, 1]$, then $\mathcal{F}_T$ is a field that generates $\mathcal{D}$.

The following theorem demonstrates that the class of functions
$$\Pi_X = \{\pi_{t_1,\ldots,t_k} : t_j \in T_X \text{ for all } 1 \le j \le k,\ k \ge 1\}$$
plays the role of $H_X$ in Theorem 4.3. Recall that $T_X \subset [0, 1]$ consists of $t = 0, 1$ and those $t \in (0, 1)$ where the probability that $X$ is discontinuous at $t$ equals 0, and that $P(X \in D_\pi) = 0$ for any $\pi \in \Pi_X$ since
$$D_{\pi_{t_1,\ldots,t_k}} = \{x \in (D, d_S) : \pi_{t_1,\ldots,t_k} \text{ is discontinuous at } x\} = \bigcup_{1 \le j \le k :\, 0 < t_j < 1} D(\pi_{t_j})$$
due to Theorem 4.4.

Theorem 4.7. Let $X_n$ and $X$ be random elements in $D$. If the distribution of $(X_n(t_1), \ldots, X_n(t_k))$ converges weakly to the distribution of $(X(t_1), \ldots, X(t_k))$ for all $t_1, \ldots, t_k \in T_X$ and for all $k \ge 1$, and if $\{X_n\}$ is tight, then $X_n \xrightarrow{d} X$.

Proof. We follow the lines of the proof of Theorem 2.3. We prove that the weak limit of a convergent subsequence $\{X_{n'}\}$ of $\{X_n\}$ does not depend on $\{X_{n'}\}$. Let $Y$ be the weak limit of $\{X_{n'}\}$. Then, by the continuous mapping theorem, $\pi X_{n'} \xrightarrow{d} \pi Y$ for all $\pi \in \Pi_Y$. On the other hand, we also have $\pi X_{n'} \xrightarrow{d} \pi X$ for all $\pi \in \Pi_X$ by the condition of the theorem. This implies $\pi Y \overset{d}{=} \pi X$ for all $\pi \in \Pi_X \cap \Pi_Y$, so that $P(\pi X \in A) = P(\pi Y \in A)$ for all $A \in \mathcal{R}^k$ and $\pi \in \Pi_X \cap \Pi_Y$. Hence the distributions of $X$ and $Y$ agree on $\{\pi^{-1}(A) : A \in \mathcal{R}^k,\ \pi \in \Pi_X \cap \Pi_Y\}$, that is, on $\mathcal{F}_{T_X \cap T_Y}$ (in the notation of Theorem 4.6). By Theorem 4.5, $T_X \cap T_Y$ contains 0 and 1, and is dense in $[0, 1]$. Application of Theorem 4.6 concludes $Y \overset{d}{=} X$.
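The example with $X_n \equiv I_{[0, 1/2+1/n)}$ and $X \equiv I_{[0, 1/2)}$ shows why Theorem 4.7 only asks for convergence at time points in $T_X$: here $1/2 \notin T_X$. The sketch below uses exact rational arithmetic and a piecewise-linear time change chosen for illustration (the names `time_change` and `skorokhod_upper_bound` and the grid size are assumptions made here). It evaluates the two suprema in the definition of $d_S$ on a grid for one particular $\lambda \in \Lambda$, which yields an upper bound for $d_S(X, X_n)$, not the infimum itself.

```python
from fractions import Fraction

def time_change(a, b):
    """Piecewise-linear lambda with lambda(0) = 0, lambda(a) = b, lambda(1) = 1, for 0 < a, b < 1."""
    def lam(t):
        if t <= a:
            return t * b / a
        return b + (t - a) * (1 - b) / (1 - a)
    return lam

def X(t):
    """X = indicator of [0, 1/2)."""
    return 1 if t < Fraction(1, 2) else 0

def X_n(n):
    """X_n = indicator of [0, 1/2 + 1/n); requires n > 2."""
    return lambda t: 1 if t < Fraction(1, 2) + Fraction(1, n) else 0

def skorokhod_upper_bound(n, grid_size=1000):
    """max{sup |lambda(t) - t|, sup |X(t) - X_n(lambda(t))|} over a grid, for one lambda."""
    lam = time_change(Fraction(1, 2), Fraction(1, 2) + Fraction(1, n))
    grid = [Fraction(i, grid_size) for i in range(grid_size + 1)]
    time_dev = max(abs(lam(t) - t) for t in grid)
    space_dev = max(abs(X(t) - X_n(n)(lam(t))) for t in grid)
    return max(time_dev, space_dev)

bound_10 = skorokhod_upper_bound(10)          # matches d_S(X_n, X) = 1/n from the example
gap_at_half = X_n(10)(Fraction(1, 2)) - X(Fraction(1, 2))
```

The time change moves the jump of $X$ onto the jump of $X_n$ at a cost of $1/n$ in the time scale, so the Skorokhod distance vanishes as $n$ grows, while the values at the fixed time $t = 1/2$ stay one unit apart.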

21 Byeong U. Park Tightness in D, d S ) The following theorem is an analogue of Arzelá-Ascoli theorem which characterizes compact sets in D, D). Let w xδ) be defined by w xδ) = inf {t i } max 1 i r { xs) xt) : s, t t i 1, t i )}, where the infimum extends over all finite sets {t i } of points such that 0 = t 0 < t 1 < < t r = 1 and t i t i 1 > δ for all 1 i r and r 1. This is a modulus that plays in D the role of w x δ) in C. In fact, w xδ) 0 as δ 0 for all x D Lemma 1, page 110, Billingsley, 1968). Note. w xδ) 0 as δ 0 if and only if for any ɛ > 0 there exists a partition of 0, 1] into finitely many T i such that max i xs) xt) ɛ. s,t T i Theorem 4.8 Theorem 14.3, Billingsley, 1968). compact closure if and only if i) ii) lim x A t 0,1] δ 0 x A xt) < ; w xδ) = 0. A set A D has It is sometimes difficult to work with w xδ). The following modulus is often more convenient. Define w xδ) = { xt) xt 1 ) xt 2 ) xt) : 0 t 1 t t 2 1, t 2 t 1 δ}. Example. Recall the definition of x α : x α t) = It α). For this function, w xα δ) = 1 for all δ > 0. For w x α δ), note that w x α δ) x α s) x α t) = 0 s,t α,α+ɛ) for any δ and ɛ such that 0 < δ < ɛ < 1 α. Thus, w x α δ) = 0 for all sufficiently small δ > 0 if α < 1. Also, we have w x α δ) = 0 for all sufficiently small δ > 0 if 0 < α < 1, since w x α δ) = x α t) x α α δ ) xα α + δ ) x α t) = t α δ/2,α+δ/2]

22 Byeong U. Park 22 Fact. For all x D, it follows that w xδ) w xδ) w x 2δ). 4.11) For a proof of the first inequality, see pages of Billingsley 1968), and for a proof of the second, see page 110 of Billingsley 1968). The following theorem gives another characterization of compact sets in D, D) based on w xδ). It is sometimes more convenient to work with than the characterization in Theorem 4.8. We write w x T ) = s,t T xs) xt). Theorem 4.9 Theorem 14.4, Billingsley, 1968). compact closure if and only if i) x A t 0,1] xt) < ; ii) lim δ 0 x A w xδ) = 0; iii) lim δ 0 x A w x 0, δ) = 0; iv) lim δ 0 x A w x 1 δ, 1) = 0. A set A D has From Theorems 4.8 and 4.9, we get the following characterizations of a tight sequence in D Theorem 4.10 Theorem 15.2, Billingsley, 1968). A sequence {X n } in D is tight if and only if i) the sequence of random variables, { X n t) : t 0, 1]}, is tight in R; ii) for any ɛ > 0, lim lim P w X δ 0 n δ) ɛ ] = 0. Note. The condition ii) of Theorem 4.10 is equivalent to asking that for any ɛ, η > 0 there exists a partition of 0, 1] into finitely many T i such that lim P max i X n s) X n t) ɛ ] η. s,t T i Theorem 4.11 Theorem 15.3, Billingsley, 1968). A sequence {X n } in D is tight if and only if the following property i) and those ii), iii) and

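(iv) hold for any $\epsilon > 0$:
(i) the sequence of random variables $\{\sup_{t \in [0,1]} |X_n(t)|\}$ is tight in $\mathbb{R}$;
(ii) $\lim_{\delta \to 0} \limsup_{n \to \infty} P\big[ w''_{X_n}(\delta) \ge \epsilon \big] = 0$;
(iii) $\lim_{\delta \to 0} \limsup_{n \to \infty} P\big[ \sup_{s,t \in [0,\delta)} |X_n(s) - X_n(t)| \ge \epsilon \big] = 0$;
(iv) $\lim_{\delta \to 0} \limsup_{n \to \infty} P\big[ \sup_{s,t \in [1-\delta,1)} |X_n(s) - X_n(t)| \ge \epsilon \big] = 0$.

Homework: Prove Theorems 4.10 and 4.11, and also prove the result in the Note between them.

4.5 Weak convergence in $(D, \mathcal{D}, d_S)$

The following theorem gives a set of sufficient conditions for weak convergence in $D$. The theorem follows from Theorems 4.7 and 4.11.

Theorem 4.12 (Theorem 15.4, Billingsley, 1968). Let $X_n$ and $X$ be random elements in $D$. Suppose that $P(X(1-) \ne X(1)) = 0$. If the distribution of $(X_n(t_1), \ldots, X_n(t_k))$ converges weakly to the distribution of $(X(t_1), \ldots, X(t_k))$ for all $t_1, \ldots, t_k \in T_X$ and for all $k \ge 1$, and if
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} P\big[ w''_{X_n}(\delta) \ge \epsilon \big] = 0 \tag{4.12} \]
for any $\epsilon > 0$, then $X_n \stackrel{d}{\to} X$.

Proof. We prove (i), (iii) and (iv) in Theorem 4.11; condition (ii) there is exactly (4.12). To prove (i), let $\epsilon > 0$ be given. Then by the condition (4.12), we can take $\delta_0$ such that
\[ \limsup_{n \to \infty} P\big( w''_{X_n}(\delta_0) \ge \epsilon \big) \le \epsilon/2. \]
Choose $0 = t_1 < \cdots < t_k = 1$ from $T_X$ such that $t_i - t_{i-1} \le \delta_0$. This is possible since $T_X$ is dense in $[0, 1]$ by Theorem 4.5. Then,
\[ \sup_{t \in [0,1]} |X_n(t)| \le w''_{X_n}(\delta_0) + \max_{1 \le i \le k} |X_n(t_i)|. \]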

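Since each sequence $X_n(t_i)$ is tight, $\max_{1 \le i \le k} |X_n(t_i)|$ is also tight. Thus, there exists $C > 0$ such that
\[ \limsup_{n \to \infty} P\Big( \max_{1 \le i \le k} |X_n(t_i)| > C \Big) \le \epsilon/2. \]
This implies
\[ \limsup_{n \to \infty} P\Big( \sup_{t \in [0,1]} |X_n(t)| \ge C + \epsilon \Big) \le \limsup_{n \to \infty} P\big( w''_{X_n}(\delta_0) \ge \epsilon \big) + \limsup_{n \to \infty} P\Big( \max_{1 \le i \le k} |X_n(t_i)| > C \Big) \le \epsilon, \]
which concludes the proof of (i).

For (iii) it suffices to prove that for any $\epsilon > 0$ there exist $\delta_0$ and $n_0$ such that for all $n \ge n_0$
\[ P\Big[ \sup_{s,t \in [0,\delta_0)} |X_n(s) - X_n(t)| \ge \epsilon \Big] \le \epsilon. \tag{4.13} \]
We note that
\[ \sup_{s,t \in [0,\delta_0)} |X_n(s) - X_n(t)| \le 2 \sup_{s \in [0,\delta_0)} |X_n(s) - X_n(0)| \le 2 \big[ w''_{X_n}(\delta_0) + |X_n(\delta_0) - X_n(0)| \big]. \]
The second inequality above holds since, for each $s \in [0, \delta_0)$,
\[ |X_n(s) - X_n(0)| \le w''_{X_n}(\delta_0) \]
in case $|X_n(s) - X_n(0)| \le |X_n(s) - X_n(\delta_0)|$, and
\[ |X_n(s) - X_n(0)| \le |X_n(s) - X_n(\delta_0)| + |X_n(\delta_0) - X_n(0)| \le w''_{X_n}(\delta_0) + |X_n(\delta_0) - X_n(0)| \]
in case $|X_n(s) - X_n(0)| \ge |X_n(s) - X_n(\delta_0)|$.

Let $\epsilon > 0$ be fixed. By the second condition of the theorem, one can take a $\delta_1 \in T_X$ and $n_1 \equiv n_1(\delta_1)$ such that for all $n \ge n_1$
\[ P\big[ w''_{X_n}(\delta_1) \ge \epsilon/4 \big] \le \epsilon/4. \tag{4.14} \]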

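We claim that there exists also a $\delta_0 \le \delta_1$ such that $\delta_0 \in T_X$ and
\[ P\big[ |X(\delta_0) - X(0)| \ge \epsilon/12 \big] \le \epsilon/4. \tag{4.15} \]
Since $X_n(\delta_0)$ and $X_n(0)$ converge weakly to $X(\delta_0)$ and $X(0)$, respectively, there exists $n_1' \equiv n_1'(\delta_0, \epsilon)$ such that for all $n \ge n_1'$
\[ P\big[ |X_n(\delta_0) - X(\delta_0)| \ge \epsilon/12 \big] \le \epsilon/4, \tag{4.16} \]
\[ P\big[ |X_n(0) - X(0)| \ge \epsilon/12 \big] \le \epsilon/4. \tag{4.17} \]
In (4.15)-(4.17) and below wherever relevant, $X_n(0)$, $X(0)$ and $X_n(\delta_0)$, $X(\delta_0)$ are the versions given by the Skorokhod representation theorem (Theorem 1.1), which are defined on the same probability space and converge almost surely. Since $w''_{X_n}(\delta_0) \le w''_{X_n}(\delta_1)$ and $|X_n(\delta_0) - X_n(0)| \le |X_n(\delta_0) - X(\delta_0)| + |X(\delta_0) - X(0)| + |X(0) - X_n(0)|$, the inequalities (4.14)-(4.17) give
\[ P\Big[ \sup_{s,t \in [0,\delta_0)} |X_n(s) - X_n(t)| \ge \epsilon \Big] \le P\big[ w''_{X_n}(\delta_0) \ge \epsilon/4 \big] + P\big[ |X_n(\delta_0) - X_n(0)| \ge \epsilon/4 \big] \le \epsilon \]
for all $n \ge n_0 \equiv n_1 \vee n_1'$.

It remains to prove (4.15) to complete the proof of (iii). Since all $x$ in $D$ are right continuous, it follows that
\[ 1 = P\Big( \lim_{\delta \to 0} \sup_{s \in [0,\delta]} |X(s) - X(0)| = 0 \Big) = P\Big[ \bigcap_k \bigcup_l \Big( \sup_{s \in [0,1/l]} |X(s) - X(0)| \le 1/k \Big) \Big]. \]
This means that for any $\eta_1 > 0$
\[ 0 = P\Big[ \bigcap_l \Big( \sup_{s \in [0,1/l]} |X(s) - X(0)| \ge \eta_1 \Big) \Big] = \lim_{l \to \infty} P\Big( \sup_{s \in [0,1/l]} |X(s) - X(0)| \ge \eta_1 \Big). \]
Thus, for any $\eta_1, \eta_2 > 0$, there exists an $L > 0$ such that
\[ P\Big( \sup_{s \in [0,1/L]} |X(s) - X(0)| \ge \eta_1 \Big) \le \eta_2. \]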

Since $T_X$ is dense in $[0, 1]$, we can always take a $\delta_0 \in T_X$ such that $\delta_0 \le 1/L$ and
\[ P\big( |X(\delta_0) - X(0)| \ge \eta_1 \big) \le P\Big( \sup_{s \in [0,1/L]} |X(s) - X(0)| \ge \eta_1 \Big) \le \eta_2. \]
This completes the proof of (iii). The condition (iv) also holds by symmetry. The one point that needs care is the existence of a $\delta_0 \in T_X$ that ensures the analogue of (4.15), which is now replaced by
\[ P\big[ |X(1) - X(1 - \delta_0)| \ge \epsilon/12 \big] \le \epsilon/4. \]
This can be proved in the same way as (4.15) by using the condition $P(X \in D(\pi_1)) = 0$ and by working on the set $D(\pi_1)^c$, whose elements are all left continuous at 1.

Limit process with continuous sample paths

When the limit process $X$ in Theorem 4.12 has continuous sample paths a.e., then $P(X(1-) \ne X(1)) = 0$ is automatically satisfied and also $T_X = [0, 1]$. Thus, in this case $X_n$ converges weakly to $X$ if all finite-dimensional distributions of $X_n$ converge weakly to those of $X$ and if (4.12) holds for any $\epsilon > 0$. What if all finite-dimensional distributions of $X_n$ converge weakly and if, instead of (4.12),
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} P\Big[ \sup_{|s - t| < \delta} |X_n(s) - X_n(t)| \ge \epsilon \Big] = 0 \tag{4.18} \]
for any $\epsilon > 0$? Recall that this was a criterion for weak convergence in $C$. The following theorem tells us that the limit process in this case has continuous sample paths a.e.

Theorem 4.13 (Theorem 15.5, Billingsley, 1968). Let $\{X_n\}$ be a sequence of random elements in $(D, \mathcal{D}, d_S)$. Suppose that $\{X_n(0)\}$ is tight in $\mathbb{R}$ and (4.18) holds for any $\epsilon > 0$. Then, $\{X_n\}$ is tight, and, if $X$ is the weak limit of a subsequence $\{X_{n'}\}$, then $X$ has continuous sample paths a.e., i.e., $P(X \in D(\pi_t)^c \text{ for all } t \in [0, 1]) = 1$.

Donsker's Theorem.

Let the random variables $\xi_j$ be iid with mean 0 and variance 1. Define a sequence of random elements $X_n$ in $D$ by
\[ X_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} \xi_i, \tag{4.19} \]
where $\lfloor nt \rfloor$ denotes the largest integer which is less than or equal to $nt$.

Theorem 4.14 (Donsker: Theorem 16.1, Billingsley, 1968). The partial sum process defined at (4.19) converges weakly to the standard Wiener process $W$.

Proof. Prove (4.18) along the lines in the proof of Theorem 3.7. Then, use the inequality at (4.11) to prove (4.12) and apply Theorem 4.12. Since the limit $X$ is the standard Wiener process, $X$ has continuous sample paths a.e. so that $T_X = [0, 1]$. Thus, we need to show that all finite-dimensional distributions of $X_n$ converge to those of $X$, which is clear.

5 Weak Convergence of Empirical Processes

We discuss weak convergence of uniform empirical processes first, and then extend the discussion to more general cases.

5.1 Uniform empirical processes

Let $\xi_j$ be iid uniform$[0, 1]$, and $F_n$ be their empirical distribution function defined by
\[ F_n(t) = n^{-1} \sum_{i=1}^{n} I(\xi_i \le t). \]
A centered and scaled version of $F_n$ defines a uniform empirical process $X_n$ indexed by $t \in [0, 1]$:
\[ X_n(t) = \sqrt{n}\, \big( F_n(t) - t \big). \tag{5.20} \]
It will be shown that this process converges weakly to the Brownian bridge defined below.
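The two processes defined at (4.19) and (5.20) are easy to evaluate for a simulated sample. The following is a minimal sketch using only the Python standard library; the function names, the choice of $\pm 1$ steps for $\xi_i$ in the partial sum process, the sample size and the seed are all assumptions made for illustration.

```python
import math
import random


def partial_sum_process(xi):
    """Return t -> n^{-1/2} * sum_{i <= floor(n t)} xi_i for a fixed sample xi."""
    n = len(xi)
    cumulative = [0.0]
    for value in xi:
        cumulative.append(cumulative[-1] + value)

    def x_n(t):
        # floor(n t) ranges over 0, ..., n when t is in [0, 1]
        return cumulative[math.floor(n * t)] / math.sqrt(n)

    return x_n


def uniform_empirical_process(xi):
    """Return t -> sqrt(n) * (F_n(t) - t) for a fixed sample xi from uniform[0, 1]."""
    n = len(xi)

    def x_n(t):
        f_n = sum(1 for value in xi if value <= t) / n  # empirical distribution function
        return math.sqrt(n) * (f_n - t)

    return x_n


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
n = 1000
steps = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # iid with mean 0 and variance 1
uniforms = [rng.random() for _ in range(n)]

walk = partial_sum_process(steps)
bridge_like = uniform_empirical_process(uniforms)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, round(walk(t), 3), round(bridge_like(t), 3))
```

The partial sum process starts at 0 and is free at $t = 1$, while the empirical process vanishes at both endpoints, matching the tied-down behaviour of its Brownian bridge limit.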

Definition. A Gaussian process $B$ taking values in $C$ such that $EB(t) = 0$ and $\mathrm{cov}(B(s), B(t)) = s \wedge t - st$ is called the standard Brownian bridge. Alternatively, it is defined by $B(t) = W(t) - tW(1)$, where $W$ is the standard Brownian motion.

Note. The standard Brownian bridge $B$ is tied down at 0 and 1 with probability 1, i.e., $P[B(0) = B(1) = 0] = 1$.

Theorem 5.1. The empirical process defined at (5.20) for iid uniformly distributed random variables $\xi_j$ on $[0, 1]$ converges weakly to the standard Brownian bridge $B$.

Proof. Note that $B$ takes values in $C$ a.e., thus it has continuous sample paths a.e. Convergence of all finite-dimensional distributions of $X_n$ to those of $B$ follows from the classical CLT. We use Theorem 3.6 and prove that for any $\epsilon, \eta > 0$ there exist $n_0$ and $\delta > 0$ such that
\[ \sup_{n \ge n_0} \sup_{0 \le t \le 1} P\Big[ \sup_{s \in [t, t+\delta]} |X_n(s) - X_n(t)| \ge \eta \Big] \le \delta\epsilon. \tag{5.21} \]
For each fixed $t \in [0, 1]$, we divide the interval $[t, t + \delta]$ into $m$ subintervals of length $p = \delta/m$, i.e., $(t + (i-1)p,\, t + ip]$ for $1 \le i \le m$. Note that
\[ \sup_{s \in [t, t+\delta]} |X_n(s) - X_n(t)| \le \max_{0 \le i \le m} |X_n(t + ip) - X_n(t)| + \max_{0 \le i \le m-1} \sup_{s \in (t+ip,\, t+(i+1)p]} |X_n(s) - X_n(t + ip)|. \tag{5.22} \]
The second term on the RHS of (5.22) can be made small enough by choosing $m$ large enough. The first term, which is
\[ \max_{0 \le i \le m} \Big| \sum_{l=1}^{i} \big( X_n(t + lp) - X_n(t + (l-1)p) \big) \Big| =: \max_{0 \le i \le m} |S_i|, \]
involves only finitely many (with $p$ being determined) partial sums, so that it can be handled by a maximal inequality for partial sums such as Lemma 3.8.

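The treatment of the two terms on the RHS of (5.22) needs the following identities:
\[ E[I(\xi_i \le s) - I(\xi_i \le t)]^2 = E[I(\xi_i \le s) - I(\xi_i \le t)]^4 = s + t - 2(s \wedge t) = |s - t|. \tag{5.23} \]
This gives for the first term, for $i < j$,
\[ E(S_j - S_i)^4 = E[X_n(t + jp) - X_n(t + ip)]^4 = E\Big[ \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \big\{ I(\xi_k \le t + jp) - I(\xi_k \le t + ip) - (j - i)p \big\} \Big]^4 \le C (j - i)^2 p^2 \]
for some constant $C > 0$ as long as $n > p^{-1} = m/\delta$ and $\delta < 1$. By applying Lemma 3.8 with $\gamma = 4$, $\alpha = 2$ and $u_j \equiv \sqrt{C}\, p$, we get
\[ P\Big( \max_{0 \le i \le m} |X_n(t + ip) - X_n(t)| \ge \lambda \Big) \le C_1 \lambda^{-4} m^2 p^2 \tag{5.24} \]
for an absolute constant $C_1 > 0$. Plugging in $\lambda = \eta/2$ gives
\[ P\Big( \max_{0 \le i \le m} |X_n(t + ip) - X_n(t)| \ge \eta/2 \Big) \le 16\, C_1 \eta^{-4} m^2 p^2. \tag{5.25} \]
To treat the second term on the RHS of (5.22), we observe that for all $s \in (t + ip,\, t + (i+1)p]$
\[ X_n(s) - X_n(t + ip) \le \sqrt{n}\, [F_n(t + (i+1)p) - F_n(t + ip)] = X_n(t + (i+1)p) - X_n(t + ip) + \sqrt{n}\, p \]
since $F_n$ is a non-decreasing function. Also, we have
\[ X_n(s) - X_n(t + ip) \ge -\sqrt{n}\, p \ge -|X_n(t + (i+1)p) - X_n(t + ip)| - \sqrt{n}\, p. \]
Thus
\[ \sup_{s \in (t+ip,\, t+(i+1)p]} |X_n(s) - X_n(t + ip)| \le |X_n(t + (i+1)p) - X_n(t + ip)| + \sqrt{n}\, p. \tag{5.26} \]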
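As a numerical sanity check of the fourth-moment bound used for the first term above, note that $\sqrt{n}$ times an increment of $X_n$ over an interval of length $q = (j - i)p$ is a centered Binomial$(n, q)$ variable, so its fourth moment can be computed exactly. The sketch below does this and compares the result with $4q^2$; the constant 4 is only one admissible value of $C$ obtained from an elementary moment calculation (an assumption of this sketch, since the notes assert only that some $C$ exists), and the function name is hypothetical.

```python
import math


def fourth_moment_of_increment(n, q):
    """Exact E[(n^{-1/2} * sum_k (B_k - q))^4] for iid Bernoulli(q) variables B_k,
    computed by summing over the Binomial(n, q) probability mass function."""
    total = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * q**k * (1.0 - q) ** (n - k)
        total += pmf * ((k - n * q) / math.sqrt(n)) ** 4
    return total


# Each pair satisfies n > 1/q, mirroring the requirement n > 1/p in the proof.
for n, q in [(50, 0.1), (200, 0.01), (400, 0.05)]:
    value = fourth_moment_of_increment(n, q)
    print(n, q, value, 4 * q * q, value <= 4 * q * q)
```

The bound rests on the decomposition $E(\sum_k Y_k)^4 = n E Y_1^4 + 3n(n-1)(E Y_1^2)^2$ for iid centered $Y_k$, together with $E Y_1^2 \le q$ and $E Y_1^4 \le q$. The condition $n > 1/q$ is what turns the leading term $q/n$ into something of order $q^2$.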

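Take $m$ large so that $\sqrt{n}\,\delta/m = \sqrt{n}\, p < \eta/4$ and $\delta/m > n^{-1}$. Then, by (5.24) and (5.26)
\[ P\Big( \max_{0 \le i \le m-1} \sup_{s \in (t+ip,\, t+(i+1)p]} |X_n(s) - X_n(t + ip)| \ge \eta/2 \Big) \le P\Big( \max_{0 \le i \le m-1} |X_n(t + (i+1)p) - X_n(t + ip)| \ge \eta/4 \Big) \le 2 P\Big( \max_{0 \le i \le m} |X_n(t + ip) - X_n(t)| \ge \eta/8 \Big) \le C_2 \eta^{-4} m^2 p^2 \tag{5.27} \]
for some $0 < C_2 < \infty$. Note that there always exists $m$ such that $\sqrt{n}\,\delta/m \le \eta/4$ and $\delta/m \ge n^{-1}$ if $n > (4/\eta)^2$. From (5.22), (5.25) and (5.27), and since $mp = \delta$, we get
\[ \sup_{0 \le t \le 1} P\Big[ \sup_{s \in [t, t+\delta]} |X_n(s) - X_n(t)| \ge \eta \Big] \le C_3 \eta^{-4} \delta^2 \]
for some constant $C_3 > 0$. Taking $\delta \le \eta^4 \epsilon / C_3$ gives (5.21).

5.2 Empirical processes in $D(-\infty, \infty)$

Theorem 5.1 can be extended to a more general case where $\xi_j$ are iid with distribution function $F$. Define
\[ X_n(t) = \sqrt{n}\, \big( F_n(t) - F(t) \big). \tag{5.28} \]
The function space involved in this case is $D(-\infty, \infty)$, the space of all càdlàg functions defined on the whole real line $(-\infty, \infty)$. Also, we need to extend the definition of the Skorokhod metric to $D(-\infty, \infty)$. The weak limit of the empirical process $X_n$ in this case is the $F$-Brownian bridge $B(F(\cdot))$.

Definition. Let $\Lambda$ be the class of all strictly increasing and continuous mappings $\lambda$ of $\mathbb{R}$ onto itself. The Skorokhod metric for $D(-\infty, \infty)$ is defined by
\[ d_S(x, y) = \inf_{\lambda \in \Lambda} \max\Big\{ \sup_{-\infty < t < \infty} |\lambda(t) - t|,\ \sup_{-\infty < t < \infty} |x(t) - y(\lambda(t))| \Big\}. \]
With a slight abuse of notation, we continue to denote by $d_S$ the Skorokhod metric for $D(-\infty, \infty)$.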

Theorem 5.2. The empirical process defined at (5.28) for iid random variables $\xi_j$ with a common distribution function $F$ converges weakly to $B(F(\cdot))$.

Proof of Theorem 5.2. Define the quantile function of $F$ by $F^{-1}(t) = \inf\{s : t \le F(s)\}$. Then, we know $F^{-1}(t) \le s$ if and only if $t \le F(s)$, and thus $F^{-1}(\upsilon) \sim F$ for a uniformly distributed $\upsilon$ on $[0, 1]$. Thus, we may represent $\xi_i = F^{-1}(\upsilon_i)$ with iid uniformly distributed $\upsilon_i$ on $[0, 1]$. By Theorem 5.1, the empirical process $Y_n$ defined by
\[ Y_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} [I(\upsilon_i \le t) - t] \]
converges weakly to $B$. For the empirical process $X_n$ defined at (5.28),
\[ X_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} [I(\xi_i \le t) - F(t)] \stackrel{d}{=} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big[ I(F^{-1}(\upsilon_i) \le t) - F(t) \big] = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} [I(\upsilon_i \le F(t)) - F(t)] = Y_n(F(t)). \]
Define a map $\psi : (D[0, 1], d_S) \to (D(-\infty, \infty), d_S)$ by $\psi(x)(t) = x(F(t))$. If we prove $\psi$ is continuous on $C[0, 1]$, then by the continuous mapping theorem (Theorem 2.2) and by the fact that $P(B \in C[0, 1]^c) = 0$ we have
\[ X_n \stackrel{d}{=} \psi(Y_n) \stackrel{d}{\to} \psi(B) = B(F(\cdot)), \]
concluding the proof of the theorem.

To prove $\psi$ is continuous at all $x \in C[0, 1]$, let $x_n$ be a sequence of elements in $(D[0, 1], d_S)$ such that $d_S(x_n, x) \to 0$ as $n \to \infty$ for some $x \in C[0, 1]$. Then, $d_U(x_n, x) \to 0$. One can prove this by using the inequalities in the first half of the proof of Theorem 4.4 and the fact that all $x \in C[0, 1]$ are uniformly continuous on $[0, 1]$. This implies
\[ d_U(\psi(x_n), \psi(x)) = \sup_{-\infty < t < \infty} |x_n(F(t)) - x(F(t))| \le \sup_{t \in [0,1]} |x_n(t) - x(t)| \to 0. \]
Thus, $d_S(\psi(x_n), \psi(x)) \to 0$.
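The identity $X_n(t) = Y_n(F(t))$ behind the proof can be checked on simulated data. The following is a minimal sketch, assuming $F$ is the Exponential(1) distribution function (a hypothetical choice, as are the function names, sample size and seed); it builds $\xi_i = F^{-1}(\upsilon_i)$ from uniform draws and compares the two sides at a few points.

```python
import math
import random


def F(t):
    """Exponential(1) distribution function, a hypothetical choice of F."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0


def F_inv(u):
    """Quantile function F^{-1}(u) = inf{s : u <= F(s)} for Exponential(1)."""
    return -math.log(1.0 - u)


def empirical_process(sample, dist, t):
    """sqrt(n) * (F_n(t) - dist(t)), where F_n is the empirical df of the sample."""
    n = len(sample)
    f_n = sum(1 for value in sample if value <= t) / n
    return math.sqrt(n) * (f_n - dist(t))


rng = random.Random(1)  # fixed seed, chosen only for reproducibility
upsilon = [rng.random() for _ in range(500)]  # iid uniform draws
xi = [F_inv(u) for u in upsilon]              # xi_i = F^{-1}(upsilon_i) has df F

for t in (0.1, 0.5, 1.0, 2.0, 4.0):
    x_n = empirical_process(xi, F, t)                    # X_n(t) as in (5.28)
    y_n = empirical_process(upsilon, lambda s: s, F(t))  # Y_n(F(t))
    print(t, round(x_n, 6), round(y_n, 6))
```

The two printed columns agree because $F^{-1}(\upsilon_i) \le t$ holds exactly when $\upsilon_i \le F(t)$, so both processes count the same observations.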

Empirical processes indexed by $\mathcal{G}$

Let $\xi_j$ be iid random variables taking values in $\mathbb{R}$ with distribution function $F$. Let $\mathcal{G}$ be a collection of measurable functions $g$ in $L_2(F) = \{g : \int g^2\, dF < \infty\}$. Define
\[ X_n(g) = \sqrt{n} \int g\, d(F_n - F) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} [g(\xi_i) - E g(\xi)]. \tag{5.29} \]
The process $X_n = \{X_n(g) : g \in \mathcal{G}\}$ is called the empirical process indexed by $\mathcal{G}$. Note that taking $\mathcal{G} = \{I_{(-\infty,t]} : t \in \mathbb{R}\}$ gives the empirical process at (5.28). We assume that $X_n$ is a map from a probability space $(\Omega, \mathcal{F}, P)$ to the space $\ell^\infty(\mathcal{G})$ of all bounded real-valued functions on $\mathcal{G}$, equipped with the uniform metric
\[ d_U(x, y) = \sup_{g \in \mathcal{G}} |x(g) - y(g)|. \]
This is a restriction imposed on $\mathcal{G}$. For example, if the class $\mathcal{G}$ is enveloped by a square-integrable function $G$, i.e., if $|g(x)| \le G(x)$ for all $x$ and for all $g \in \mathcal{G}$, then $X_n$ takes values in $\ell^\infty(\mathcal{G})$.

As we have seen before, $X_n$ is not Borel-measurable when $\mathcal{G} = \{I_{(-\infty,t]} : t \in \mathbb{R}\}$. This can happen with other $\mathcal{G}$. If $X_n$ is not measurable, then the statement $Ef(X_n) \to Ef(X)$ for any bounded and uniformly continuous real-valued function $f$ does not make sense. To accommodate this situation in general, we extend the definition of weak convergence to a sequence of arbitrary maps $X_n : \Omega \to \ell^\infty(\mathcal{G})$.

Definition. For an arbitrary map $Y : \Omega \to \ell^\infty(\mathcal{G})$, define the outer expectation $E^*$ by
\[ E^* f(Y) = \inf\{E(U) : U \text{ is measurable},\ U \ge f(Y) \text{ and } E(U) \text{ exists}\}. \]

Definition. For a sequence of arbitrary maps $X_n$ and for a Borel-measurable $X$, we say $X_n$ converges weakly to $X$ and we write $X_n \stackrel{d}{\to} X$ if $E^* f(X_n) \to Ef(X)$ for any bounded and continuous real-valued function $f : \ell^\infty(\mathcal{G}) \to \mathbb{R}$.
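To make the indexing by functions concrete, the following minimal sketch evaluates $X_n(g)$ from (5.29) for a small finite class. The three functions, their means under uniform$[0, 1]$, the toy sample and the name `empirical_process_indexed` are all assumptions chosen for illustration.

```python
import math


def empirical_process_indexed(sample, functions):
    """Return {name: n^{-1/2} * sum_i [g(xi_i) - E g(xi)]} for each (g, E g) pair."""
    n = len(sample)
    return {
        name: sum(g(x) - mean for x in sample) / math.sqrt(n)
        for name, (g, mean) in functions.items()
    }


# Hypothetical class of three functions, with means computed under uniform[0, 1].
functions = {
    "indicator_0.5": (lambda x: 1.0 if x <= 0.5 else 0.0, 0.5),
    "identity": (lambda x: x, 0.5),
    "square": (lambda x: x * x, 1.0 / 3.0),
}

sample = [0.1, 0.2, 0.6, 0.9]  # a fixed toy sample, not simulated data
for name, value in empirical_process_indexed(sample, functions).items():
    print(name, value)
```

The entry for the indicator function is the value at $t = 0.5$ of the ordinary empirical process (5.28), which is how the class of indicators recovers the earlier setting.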

Definition. Let $B_F$ be a Gaussian process indexed by $\mathcal{G}$ such that $EB_F(g) = 0$ and
\[ \mathrm{cov}(B_F(g_1), B_F(g_2)) = \int g_1 g_2\, dF - \int g_1\, dF \int g_2\, dF. \]
The process $B_F$ is called the $F$-Brownian bridge.

Definition. The class $\mathcal{G}$ is called $F$-Donsker if the process $X_n$ defined at (5.29) converges weakly to $B_F$.

The following theorem gives a set of sufficient conditions for weak convergence of the empirical process $X_n$ indexed by $\mathcal{G}$. Define $P^*$ by $P^*(A) = E^* I_A$.

Theorem 5.3. If $\mathcal{G}$ equipped with the $L_2(F)$-metric is totally bounded and if for any $\epsilon > 0$
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} P^*\Big( \sup_{\|g_1 - g_2\| \le \delta} |X_n(g_1) - X_n(g_2)| \ge \epsilon \Big) = 0, \]
then $\mathcal{G}$ is $F$-Donsker.

For a proof of this theorem, see Dudley, R. M. (1984). A Course on Empirical Processes. Springer-Verlag, New York.

Definition. Let $\mathcal{G}$ be a class of functions in $L_2(F)$. The $\delta$-entropy of $\mathcal{G}$, denoted by $H(\delta, \mathcal{G}, F)$, equals the logarithm of the smallest number of balls with radius $\delta$ whose union covers $\mathcal{G}$.

Definition. Let $\mathcal{G}$ be a class of functions in $L_2(F)$. A bracket $[g_L, g_U]$ is the set of all functions $g$ in $\mathcal{G}$ such that $g_L \le g \le g_U$. A $\delta$-bracket is a bracket $[g_L, g_U]$ such that $\|g_U - g_L\| \le \delta$. The $\delta$-entropy with bracketing of $\mathcal{G}$, denoted by $H_B(\delta, \mathcal{G}, F)$, equals the logarithm of the smallest number of $\delta$-brackets that cover $\mathcal{G}$.

Note. If $H(\delta, \mathcal{G}, F) < \infty$ for any $\delta > 0$, then $\mathcal{G}$ is totally bounded. Also, it holds that $H(\delta, \mathcal{G}, F) \le H_B(\delta, \mathcal{G}, F)$.

Theorem 5.4 (Theorem 6.3, van de Geer, 2000). If
\[ \int_0^1 H_B^{1/2}(u, \mathcal{G}, F)\, du < \infty, \]
then $\mathcal{G}$ is $F$-Donsker.
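A standard illustration of the entropy integral condition, not worked out in these notes, is the class of indicators $\{I_{(-\infty,t]} : t \in \mathbb{R}\}$ under $F$ = uniform$[0, 1]$. With the grid $t_k = k/N$, the brackets $[I_{(-\infty,t_{k-1}]}, I_{(-\infty,t_k]}]$ have $L_2(F)$-width $\sqrt{1/N}$, so $N = \lceil 1/\delta^2 \rceil$ of them, plus two more for $t \le 0$ and $t > 1$, form a cover by $\delta$-brackets. The sketch below uses this count as an upper bound for the smallest number of brackets and approximates the resulting integral with a midpoint rule; the function names and the number of quadrature points are assumptions.

```python
import math


def bracket_count(delta):
    """Number of delta-brackets in the cover described above:
    ceil(1 / delta^2) interior brackets plus two for t <= 0 and t > 1."""
    return math.ceil(1.0 / delta**2) + 2


def max_bracket_width(delta):
    """L_2(F)-width sqrt(F(t_k) - F(t_{k-1})) of an interior bracket on the grid k / N."""
    n_cells = math.ceil(1.0 / delta**2)
    return math.sqrt(1.0 / n_cells)


def entropy_integral(n_points=100_000):
    """Midpoint-rule approximation of the integral over (0, 1) of
    sqrt(log(bracket_count(u))), an upper bound for the integral in Theorem 5.4."""
    h = 1.0 / n_points
    return h * sum(
        math.sqrt(math.log(bracket_count((i + 0.5) * h))) for i in range(n_points)
    )


print(bracket_count(0.3), max_bracket_width(0.3))
print(entropy_integral())
```

The integrand grows only like $\sqrt{2 \log(1/u)}$ near zero, which is integrable, so the approximation stays bounded. Under the stated assumptions this is consistent with Theorem 5.4 applying to the indicator class, in agreement with Theorem 5.2.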

Final Remark. So far we have considered the case where $\xi_j$ are random variables taking values in $\mathbb{R}$. All discussions remain valid for iid measurable maps $\xi_j : \Omega \to \mathcal{X}$, where $(\mathcal{X}, \mathcal{A})$ is a measurable space.

Suggested References.

van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge University Press.

van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press.

van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes with Applications to Statistics. Springer, New York.
