Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research University of North Carolina-Chapel Hill 1
Stochastic Convergence Today, we will discuss the following basic concepts: Spaces of Bounded Functions Other Modes of Convergence Continuous Mapping Revisited Outer Almost Sure Representation Theorem Example: The Cramér-von Mises Statistic 2
Spaces of Bounded Functions Now we consider stochastic processes X_n with index set T. The natural metric space for weak convergence in this setting is ℓ^∞(T). 3
A nice feature of this setting is the fact that asymptotic measurability of X_n follows from asymptotic measurability of X_n(t) for each t ∈ T: Lemma 7.16. Let the sequence of maps X_n in ℓ^∞(T) be asymptotically tight. Then X_n is asymptotically measurable if and only if X_n(t) is asymptotically measurable for each t ∈ T. 4
Theorem 7.17. The sequence X_n converges to a tight limit in ℓ^∞(T) if and only if X_n is asymptotically tight and all of its finite-dimensional marginals converge weakly to limits. Moreover, if X_n is asymptotically tight and all of its finite-dimensional marginals (X_n(t_1), …, X_n(t_k)) converge weakly to the marginals (X(t_1), …, X(t_k)) of a stochastic process X, then there is a version of X such that X_n ⇝ X and X resides in UC(T, ρ) for some semimetric ρ making T totally bounded. 5
Recall from Chapter 2: Theorem 2.1. X_n converges weakly to a tight X in ℓ^∞(T) if and only if: (i) For all finite {t_1, …, t_k} ⊂ T, the multivariate distribution of {X_n(t_1), …, X_n(t_k)} converges to that of {X(t_1), …, X(t_k)}. (ii) There exists a semimetric ρ for which T is totally bounded and

lim_{δ↓0} limsup_{n→∞} P*( sup_{s,t∈T: ρ(s,t)<δ} |X_n(s) − X_n(t)| > ε ) = 0,

for all ε > 0. 6
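Condition (ii) can be explored numerically. The following sketch (our own illustration; the function and variable names are not from the text) uses the uniform empirical process X_n(t) = √n(F̂_n(t) − t) on T = [0, 1] as a stand-in for a generic X_n, and estimates by Monte Carlo the probability appearing in condition (ii); it should shrink as δ decreases.

```python
import numpy as np

rng = np.random.default_rng(0)

def equicontinuity_prob(n, delta, eps, reps=500, grid_size=200):
    """Monte Carlo estimate of P( sup_{|s-t|<delta} |X_n(s)-X_n(t)| > eps )
    for the uniform empirical process X_n(t) = sqrt(n)(F_n(t) - t) on [0,1],
    with the supremum approximated over a finite grid."""
    grid = np.linspace(0.0, 1.0, grid_size)
    window = max(int(delta * grid_size), 1)   # grid lags within distance delta
    exceed = 0
    for _ in range(reps):
        x = np.sort(rng.uniform(size=n))
        Fn = np.searchsorted(x, grid, side="right") / n
        Xn = np.sqrt(n) * (Fn - grid)
        # largest increment |X_n(s) - X_n(t)| over grid pairs with |s-t| < delta
        max_inc = max(np.max(np.abs(Xn[k:] - Xn[:-k])) for k in range(1, window + 1))
        if max_inc > eps:
            exceed += 1
    return exceed / reps

# The estimated probability decreases with delta, as condition (ii) requires:
for delta in (0.2, 0.05, 0.01):
    print(delta, equicontinuity_prob(n=200, delta=delta, eps=0.5))
```

The limit process here is the Brownian bridge, whose uniform continuity is what drives the probabilities toward zero as δ ↓ 0.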
The proof of Theorem 2.1 (which we omit) shows that whenever X_n ⇝ X and X is tight, any semimetric ρ defining a σ-compact set UC(T, ρ) with P(X ∈ UC(T, ρ)) = 1 will also result in X_n being asymptotically uniformly ρ-equicontinuous in probability. What is not clear at this point is the converse: that any semimetric ρ which yields asymptotic uniform equicontinuity will also define a σ-compact set UC(T, ρ) wherein X resides with probability 1. 7
The following theorem settles this question: Theorem 7.19. Assume X_n ⇝ X in ℓ^∞(T), and let ρ be a semimetric making (T, ρ) totally bounded. Then the following are equivalent: (i) X_n is asymptotically uniformly ρ-equicontinuous in probability. (ii) P(X ∈ UC(T, ρ)) = 1. 8
An interesting consequence of Theorems 2.1 and 7.19, in conjunction with Lemma 7.4, arises when X_n ⇝ X in ℓ^∞(T) and X is a tight Gaussian process. Recall from Section 7.1 the semimetric

ρ_p(s, t) ≡ (E|X(s) − X(t)|^p)^{1/(p∨1)},

for any p ∈ (0, ∞). 9
Then for any p ∈ (0, ∞), (T, ρ_p) is totally bounded, the sample paths of X are ρ_p-continuous, and X_n is asymptotically uniformly ρ_p-equicontinuous in probability. While any value of p ∈ (0, ∞) will work, the choice p = 2 (the standard deviation metric) is often the most convenient to work with. 10
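As a concrete check on ρ_2 (this example and its function names are our own), for a standard Brownian bridge B one has ρ_2(s, t)² = E(B(s) − B(t))² = |t − s|(1 − |t − s|), which can be verified by simulating from the bridge's covariance Cov(B(s), B(t)) = s∧t − st:

```python
import numpy as np

rng = np.random.default_rng(1)

def rho2_mc(s, t, reps=200_000):
    """Monte Carlo estimate of rho_2(s, t) = sqrt(E|B(s) - B(t)|^2)
    for a standard Brownian bridge B, Cov(B(s), B(t)) = min(s,t) - s*t."""
    cov = [[s * (1 - s), min(s, t) - s * t],
           [min(s, t) - s * t, t * (1 - t)]]
    draws = rng.multivariate_normal([0.0, 0.0], cov, size=reps)
    return np.sqrt(np.mean((draws[:, 0] - draws[:, 1]) ** 2))

s, t = 0.3, 0.7
print(rho2_mc(s, t))                           # Monte Carlo estimate
print(np.sqrt(abs(t - s) * (1 - abs(t - s))))  # closed form, about 0.49
```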
We now point out an equivalent, but sometimes easier to verify, condition for X_n to be asymptotically uniformly ρ-equicontinuous in probability: Lemma 7.20. Let X_n be a sequence of stochastic processes indexed by T. Then the following are equivalent: (i) There exists a semimetric ρ making T totally bounded and for which X_n is asymptotically uniformly ρ-equicontinuous in probability. (ii) For every ε, η > 0, there exists a finite partition T = ∪_{i=1}^k T_i such that

limsup_{n→∞} P*( max_{1≤i≤k} sup_{s,t∈T_i} |X_n(s) − X_n(t)| > ε ) < η. 11
Other Modes of Convergence Recall from Chapter 2 the following modes of convergence:

convergence in probability: X_n →P X if P*(d(X_n, X) > ε) → 0 for every ε > 0.

outer almost sure convergence: X_n →as* X if there exists a sequence Δ_n of measurable random variables such that d(X_n, X) ≤ Δ_n and P(limsup_{n→∞} Δ_n = 0) = 1. 12
We now introduce two additional modes of convergence which can be useful in some settings:

almost uniform convergence: X_n converges almost uniformly to X if, for every ε > 0, there exists a measurable set A such that P(A) ≥ 1 − ε and d(X_n, X) → 0 uniformly on A.

almost sure convergence: X_n converges almost surely to X if P*(lim_{n→∞} d(X_n, X) = 0) = 1. 13
Note that an important distinction between almost sure and outer almost sure convergence is that, in the latter mode, there must exist a measurable majorant of d(X_n, X) which goes to zero. This distinction matters because, when d(X_n, X) is not measurable, almost sure convergence does not in general imply convergence in probability. For this reason, we rarely use the almost sure convergence mode. 14
Here are key relationships among the three remaining modes: Lemma 7.21. Let X_n, X: Ω → D be maps with X Borel measurable. Then (i) X_n →as* X implies X_n →P X. (ii) X_n →P X if and only if every subsequence X_{n′} has a further subsequence X_{n″} such that X_{n″} →as* X. (iii) X_n →as* X if and only if X_n converges almost uniformly to X, if and only if sup_{m≥n} d(X_m, X) →P 0 (as n → ∞). Since almost uniform convergence and outer almost sure convergence are equivalent for sequences, we will not use the almost uniform mode much. 15
The next lemma describes several important relationships between weak convergence and convergence in probability. Before presenting it, we need to extend the definition of convergence in probability, in the setting where the limit is a constant, to allow the probability spaces involved to change with n, as is already permitted for weak convergence. We denote this modified convergence X_n →P c, and distinguish it from the previous form of convergence in probability only by context. 16
Lemma 7.23. Let X_n, Y_n: Ω_n → D be maps, X: Ω → D be Borel measurable, and c ∈ D be a constant. Then (i) If X_n ⇝ X and d(X_n, Y_n) →P 0, then Y_n ⇝ X. (ii) X_n →P X implies X_n ⇝ X. (iii) X_n →P c if and only if X_n ⇝ c. 17
Proof. We first prove (i). Let F ⊂ D be closed, and fix ε > 0. Then

limsup_{n→∞} P*(Y_n ∈ F) ≤ limsup_{n→∞} P*(Y_n ∈ F, d(X_n, Y_n) ≤ ε) ≤ limsup_{n→∞} P*(X_n ∈ F^ε) ≤ P(X ∈ F^ε),

where F^ε ≡ {x ∈ D: d(x, F) ≤ ε}, the first inequality uses d(X_n, Y_n) →P 0, and the last follows from the portmanteau theorem. The result follows by letting ε ↓ 0, since F is closed. 18
Now assume X_n →P X. Since X ⇝ X and d(X, X_n) →P 0, taking X_n ≡ X and Y_n ≡ X_n in (i) yields X_n ⇝ X. Thus (ii) follows. 19
We now prove (iii). X_n →P c implies X_n ⇝ c by (ii). Now assume X_n ⇝ c, and fix ε > 0. 20
Note that P*(d(X_n, c) ≥ ε) = P*(X_n ∉ B(c, ε)), where B(c, ε) is the open ε-ball around c ∈ D. By the portmanteau theorem, limsup_{n→∞} P*(X_n ∉ B(c, ε)) ≤ P(c ∉ B(c, ε)) = 0. Thus X_n →P c since ε is arbitrary, and (iii) follows. 21
Continuous Mapping Revisited We now present a generalized continuous mapping theorem for sequences of maps g_n which converge to g in a suitable sense. For example, suppose T is high-dimensional. The computational burden of computing the supremum of X_n(t) over T may be reduced by maximizing instead over a finite mesh T_n which closely approximates T. 22
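To illustrate the mesh idea in the simplest case T = [0, 1] (this numerical example, including its function names, is our own): for the uniform empirical process the supremum over all of T can be computed exactly from the order statistics, so we can check how well a finite mesh T_n approximates it.

```python
import numpy as np

rng = np.random.default_rng(2)

def sup_exact(x):
    """Exact sup over t in [0,1] of sqrt(n)|F_n(t) - t| for Uniform(0,1)
    data, computed from the order statistics (the Kolmogorov-Smirnov form)."""
    n = len(x)
    xs = np.sort(x)
    i = np.arange(1, n + 1)
    return np.sqrt(n) * max(np.max(i / n - xs), np.max(xs - (i - 1) / n))

def sup_mesh(x, mesh):
    """Approximate the same supremum over a finite mesh T_n."""
    n = len(x)
    Fn = np.searchsorted(np.sort(x), mesh, side="right") / n
    return np.sqrt(n) * np.max(np.abs(Fn - mesh))

x = rng.uniform(size=500)
mesh = np.linspace(0.0, 1.0, 10_000)
print(sup_exact(x), sup_mesh(x, mesh))
```

The mesh value is never above the exact supremum, and the gap shrinks as the mesh is refined; Theorem 7.24 below is what licenses replacing the sup functional g by its mesh approximations g_n in the limit theory.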
Theorem 7.24 (Extended continuous mapping). Let D_n ⊂ D and g_n: D_n → E satisfy the following: if x_n → x with x_n ∈ D_n for all n ≥ 1 and x ∈ D_0, then g_n(x_n) → g(x), where D_0 ⊂ D and g: D_0 → E. Let X_n be maps taking values in D_n, and let X be Borel measurable and separable with P*(X ∈ D_0) = 1. Then (i) X_n ⇝ X implies g_n(X_n) ⇝ g(X). (ii) X_n →P X implies g_n(X_n) →P g(X). (iii) X_n →as* X implies g_n(X_n) →as* g(X). 23
The following alternative theorem does not require separability of X and is a supplement to Theorem 7.7: Theorem 7.25. Let g: D → E be continuous at all points in D_0 ⊂ D, and let X be Borel measurable with P*(X ∈ D_0) = 1. Then (i) X_n →P X implies g(X_n) →P g(X). (ii) X_n →as* X implies g(X_n) →as* g(X). 24
Outer Almost Sure Representation Theorem We now present a useful outer almost sure representation result for weak convergence. Such representations allow the transformation of certain weak convergence problems into problems about convergence of fixed sequences. We give an illustration of this approach in the sketch of the proof of Proposition 7.27 below. 25
Theorem 7.26. Let X_n: Ω_n → D be a sequence of maps, and let X be Borel measurable and separable. If X_n ⇝ X, then there exists a probability space (Ω̃, Ã, P̃) and maps X̃_n: Ω̃ → D with (i) X̃_n →as* X̃, where X̃ ≡ X̃_∞ has the same law as X; (ii) E*f(X̃_n) = E*f(X_n), for every bounded f: D → R and all 1 ≤ n ≤ ∞. Moreover, X̃_n can be chosen to be equal to X_n ∘ φ_n, for all 1 ≤ n ≤ ∞, where the φ_n: Ω̃ → Ω_n are measurable and perfect maps and P_n = P̃ ∘ φ_n^{−1}. 26
Recall our previous discussion about perfect maps. In the setting of the above theorem, if the X̃_n are constructed from the perfect maps φ_n, then f(X̃_n)* = f(X_n)* ∘ φ_n for all bounded f: D → R, where * denotes the minimal measurable majorant. Thus the equivalence between X̃_n and X_n is much stronger than simple equality in law. 27
The following proposition can be useful in studying weak convergence of certain statistics which can be expressed as stochastic integrals. For example, the Wilcoxon and Cramér-von Mises statistics can be expressed in this way. Proposition 7.27. Let X_n, G_n ∈ D[a, b] be stochastic processes with X_n ⇝ X and G_n →P G in D[a, b], where X has continuous sample paths, G is fixed, and G_n and G have total variation bounded by K < ∞. Then

∫_a^{(·)} X_n(s) dG_n(s) ⇝ ∫_a^{(·)} X(s) dG(s)

in D[a, b]. 28
Sketch of Proof. First, Slutsky's theorem and Lemma 7.23 establish that (X_n, G_n) ⇝ (X, G). Next, Theorem 7.26 tells us that there exists a new probability space and processes X̃_n, X̃, G̃_n and G̃ which have the same outer integrals for bounded functions as X_n, X, G_n and G, respectively, but which also satisfy (X̃_n, G̃_n) →as* (X̃, G̃). 29
Fix ε > 0. By the continuity of the sample paths of X̃ over the compact [a, b] (which can include extended reals), there exists an integer 1 ≤ m < ∞ and a partition of [a, b], a = t_0 < t_1 < ⋯ < t_m = b, such that

max_{1≤j≤m} sup_{s,t∈(t_{j−1},t_j]} |X̃(s) − X̃(t)| ≤ ε. 30
Define X̃_m ∈ D[a, b] such that X̃_m(a) = X̃(a) and

X̃_m(t) = Σ_{j=1}^m 1{t_{j−1} < t ≤ t_j} X̃(t_j),

for t ∈ (a, b]. Note that the integral of interest evaluated at t = a is zero since the set (a, a] is empty. 31
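A small numerical sketch of this piecewise-constant construction (our own illustration, with made-up names, using a simulated approximate Brownian-bridge path in place of X̃): by uniform continuity of the path, the approximation error sup_t |X̃(t) − X̃_m(t)| can be made small by refining the partition.

```python
import numpy as np

rng = np.random.default_rng(4)

# An approximate Brownian-bridge path on a fine grid of [0, 1],
# standing in for the continuous-path process X-tilde.
fine = np.linspace(0.0, 1.0, 10_001)
steps = rng.normal(0.0, np.sqrt(np.diff(fine)))
w = np.concatenate([[0.0], np.cumsum(steps)])   # Brownian motion W
path = w - fine * w[-1]                         # B(t) = W(t) - t W(1)

def piecewise_approx(path, m):
    """X_m(t) = X(t_j) on each cell (t_{j-1}, t_j] of the partition t_j = j/m,
    evaluated on the fine grid (small epsilon guards boundary rounding)."""
    npts = len(path) - 1                        # fine grid has npts + 1 points
    j = np.ceil(fine * m - 1e-9).astype(int)    # cell index for each grid point
    right_idx = (j * npts) // m                 # fine-grid index of endpoint t_j
    return path[right_idx]

for m in (10, 200):
    err = np.max(np.abs(path - piecewise_approx(path, m)))
    print(m, err)   # the error typically shrinks as the partition is refined
```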
We now have, for any t ∈ (a, b], that

|∫_a^t X̃_n(s) dG̃_n(s) − ∫_a^t X̃(s) dG̃(s)|
≤ ∫_a^b |X̃_n(s) − X̃(s)| |dG̃_n(s)|
+ ∫_a^b |X̃_m(s) − X̃(s)| |dG̃_n(s)|
+ |∫_a^t X̃_m(s) {dG̃_n(s) − dG̃(s)}|
+ ∫_a^b |X̃_m(s) − X̃(s)| |dG̃(s)|.

Note that the first term → 0, while the second and fourth terms are bounded by Kε. 32
By the definition of X̃_m, the third term equals

|Σ_{j=1}^m X̃(t_j) ∫_{(t_{j−1},t_j] ∩ (a,t]} {dG̃_n(s) − dG̃(s)}| ≤ 2m ‖X̃‖_∞ ‖G̃_n − G̃‖_∞ → 0.

Thus all four terms summed are asymptotically bounded by 2Kε. 33
Since ε was arbitrary, we conclude that ∫_a^{(·)} X̃_n(s) dG̃_n(s) →as* ∫_a^{(·)} X̃(s) dG̃(s). Now part (ii) of Lemma 7.23, combined with part (i) of Lemma 7.21, yields that we can replace the outer almost sure convergence with weak convergence. This in turn implies weak convergence in the original probability space, and the proof is complete. 34
Example: The Cramér-von Mises Statistic Let X_1, …, X_n be i.i.d. real random variables with continuous distribution F, and let F̂_n be the corresponding empirical distribution function. The Cramér-von Mises statistic for testing the null hypothesis H_0: F = F_0 has the form

T_n = n ∫_{−∞}^{∞} (F̂_n(t) − F_0(t))² dF̂_n(t). 35
In this case, X_n = n(F̂_n − F_0)², G_n = F̂_n, and [a, b] = [−∞, ∞]. Standard empirical process results yield that when F = F_0 (i.e., under the null), X_n ⇝ X and G_n →P G in D[a, b], where X(t) = B²(F_0(t)), B is a standard Brownian bridge, and G = F_0. Note that X has continuous sample paths and the total variation of G_n and G is at most 1. 36
Thus all of the conditions of Proposition 7.27 are met, and therefore

T_n ⇝ ∫_0^1 B²(t) dt,

which is a pivotal limit. The distribution of this pivotal quantity can be written as an infinite series which is not difficult to compute (see, e.g., Tolmatz, 2001, Annals of Probability); the critical value for α = 0.05 is approximately 0.461. 37
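The pivotal limit can be checked by simulation (our own sketch; names are illustrative). Under H_0, since dF̂_n puts mass 1/n at each observation and F̂_n(X_(i)) = i/n, the statistic reduces to T_n = Σ_{i=1}^n (i/n − U_(i))² with U_(i) the ordered values of F_0(X_i); the simulated 95th percentile of the null distribution should be near 0.461.

```python
import numpy as np

rng = np.random.default_rng(5)

def cvm_statistic(x, F0):
    """T_n = n * integral (F_n - F0)^2 dF_n: dF_n puts mass 1/n at each
    observation and F_n(x_(i)) = i/n, giving sum_i (i/n - F0(x)_(i))^2."""
    n = len(x)
    u = np.sort(F0(x))
    i = np.arange(1, n + 1)
    return np.sum((i / n - u) ** 2)

# Null distribution of T_n via simulation (F = F0 = Uniform(0,1)):
n, reps = 200, 5000
stats = [cvm_statistic(rng.uniform(size=n), lambda t: t) for _ in range(reps)]
print(np.quantile(stats, 0.95))  # close to the asymptotic critical value 0.461
```

Note that by the probability integral transform, simulating with Uniform(0, 1) data and F_0(t) = t loses no generality under the null.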