Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued


Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research University of North Carolina-Chapel Hill 1

Stochastic Convergence

Today, we will discuss the following basic concepts:

Spaces of Bounded Functions
Other Modes of Convergence
Continuous Mapping Revisited
Outer Almost Sure Representation Theorem
Example: The Cramér-von Mises Statistic

2

Spaces of Bounded Functions

Now we consider stochastic processes X_n with index set T. The natural metric space for weak convergence in this setting is ℓ^∞(T), the space of bounded real-valued functions on T equipped with the uniform norm. 3

A nice feature of this setting is the fact that asymptotic measurability of X_n follows from asymptotic measurability of X_n(t) for each t ∈ T:

Lemma 7.16. Let the sequence of maps X_n in ℓ^∞(T) be asymptotically tight. Then X_n is asymptotically measurable if and only if X_n(t) is asymptotically measurable for each t ∈ T. 4

Theorem 7.17. The sequence X_n converges weakly to a tight limit in ℓ^∞(T) if and only if X_n is asymptotically tight and all of its finite-dimensional marginals converge weakly to limits. Moreover, if X_n is asymptotically tight and all of its finite-dimensional marginals (X_n(t_1), ..., X_n(t_k)) converge weakly to the marginals (X(t_1), ..., X(t_k)) of a stochastic process X, then there is a version of X such that X_n ⇝ X and X resides in UC(T, ρ) for some semimetric ρ making T totally bounded. 5

Recall from Chapter 2:

Theorem 2.1. X_n converges weakly to a tight X in ℓ^∞(T) if and only if:

(i) For all finite {t_1, ..., t_k} ⊂ T, the multivariate distribution of (X_n(t_1), ..., X_n(t_k)) converges to that of (X(t_1), ..., X(t_k)).

(ii) There exists a semimetric ρ for which T is totally bounded and

lim_{δ↓0} limsup_{n→∞} P*( sup_{s,t ∈ T : ρ(s,t) < δ} |X_n(s) − X_n(t)| > ε ) = 0,

for all ε > 0. 6
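Condition (ii) can be explored numerically. The sketch below (an illustration, not part of the lecture) computes the modulus sup_{|s−t|<δ} |X_n(s) − X_n(t)| over a finite grid for the uniform empirical process X_n(t) = √n (F̂_n(t) − t), taking ρ(s,t) = |s − t|; the grid size and sample size are arbitrary choices.

```python
import numpy as np

def empirical_process(u, grid):
    """X_n(t) = sqrt(n) (F_hat_n(t) - t) for a Uniform(0,1) sample u."""
    u, n = np.sort(u), len(u)
    return np.sqrt(n) * (np.searchsorted(u, grid, side="right") / n - grid)

def modulus(x, grid, delta):
    """sup of |X_n(s) - X_n(t)| over pairs with |s - t| < delta, s, t in grid."""
    return max(
        float(np.max(np.abs(x[np.abs(grid - s) < delta] - x[i])))
        for i, s in enumerate(grid)
    )

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)
x = empirical_process(rng.uniform(size=2000), grid)
print(modulus(x, grid, 0.01), modulus(x, grid, 0.10))
```

Shrinking δ can only shrink the modulus for a fixed path; condition (ii) asserts the stronger uniform-in-n statement.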

The proof of Theorem 2.1 (which we omit) shows that whenever X_n ⇝ X and X is tight, any semimetric ρ defining a σ-compact set UC(T, ρ) with P(X ∈ UC(T, ρ)) = 1 will also result in X_n being asymptotically uniformly ρ-equicontinuous in probability. What is not clear at this point is the converse: whether any semimetric ρ which enables uniform asymptotic equicontinuity will also define a σ-compact set UC(T, ρ) wherein X resides with probability 1. 7

The following theorem settles this question:

Theorem 7.19. Assume X_n ⇝ X in ℓ^∞(T), and let ρ be a semimetric making (T, ρ) totally bounded. TFAE:

(i) X_n is asymptotically uniformly ρ-equicontinuous in probability.

(ii) P(X ∈ UC(T, ρ)) = 1. 8

An interesting consequence of Theorems 2.1 and 7.19, in conjunction with Lemma 7.4, arises when X_n ⇝ X in ℓ^∞(T) and X is a tight Gaussian process. Recall from Section 7.1 the semimetric

ρ_p(s, t) ≡ (E|X(s) − X(t)|^p)^{1/(p∧1)},

for any p ∈ (0, ∞). 9

Then for any p ∈ (0, ∞), (T, ρ_p) is totally bounded, the sample paths of X are ρ_p-continuous, and X_n is asymptotically uniformly ρ_p-equicontinuous in probability. While any value of p ∈ (0, ∞) will work, the choice p = 2 (the standard deviation metric) is often the most convenient to work with. 10
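As a quick sanity check of the p = 2 case (a sketch, not from the lecture), one can verify by simulation that for the standard Brownian bridge B, the standard deviation metric has the closed form ρ_2(s, t) = √((t − s)(1 − (t − s))) for s < t; the points s, t and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
s, t, n = 0.3, 0.7, 200_000

# Sample (W(s), W(t), W(1)) via independent Brownian increments, then form
# the Brownian bridge B(u) = W(u) - u W(1) at u = s and u = t.
w_s = rng.normal(scale=np.sqrt(s), size=n)
w_t = w_s + rng.normal(scale=np.sqrt(t - s), size=n)
w_1 = w_t + rng.normal(scale=np.sqrt(1.0 - t), size=n)
b_s, b_t = w_s - s * w_1, w_t - t * w_1

rho2_mc = np.sqrt(np.mean((b_s - b_t) ** 2))     # Monte Carlo rho_2(s, t)
rho2_exact = np.sqrt((t - s) * (1.0 - (t - s)))  # closed form for the bridge
print(rho2_mc, rho2_exact)
```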

We now point out an equivalent, but sometimes easier to work with, condition for X_n to be asymptotically uniformly ρ-equicontinuous in probability:

Lemma 7.20. Let X_n be a sequence of stochastic processes indexed by T. TFAE:

(i) There exists a semimetric ρ making T totally bounded and for which X_n is asymptotically uniformly ρ-equicontinuous in probability.

(ii) For every ε, η > 0, there exists a finite partition T = ∪_{i=1}^k T_i such that

limsup_{n→∞} P*( max_{1≤i≤k} sup_{s,t ∈ T_i} |X_n(s) − X_n(t)| > ε ) < η. 11

Other Modes of Convergence

Recall from Chapter 2 the following modes of convergence:

Convergence in probability: X_n →P X if P{d(X_n, X)* > ε} → 0 for every ε > 0.

Outer almost sure convergence: X_n →as* X if there exists a sequence Δ_n of measurable random variables such that d(X_n, X) ≤ Δ_n and P{limsup_{n→∞} Δ_n = 0} = 1. 12

We now introduce two additional modes of convergence which can be useful in some settings:

Almost uniform convergence: X_n converges almost uniformly to X if, for every ε > 0, there exists a measurable set A such that P(A) ≥ 1 − ε and d(X_n, X) → 0 uniformly on A.

Almost sure convergence: X_n converges almost surely to X if P*( lim_{n→∞} d(X_n, X) = 0 ) = 1. 13

Note that an important distinction between almost sure and outer almost sure convergence is that, in the latter mode, there must exist a measurable majorant of d(X_n, X) which goes to zero. This distinction is quite important because, when d(X_n, X) is not measurable, almost sure convergence does not in general imply convergence in probability. For this reason, we will only rarely use the almost sure convergence mode. 14

Here are key relationships among the three remaining modes:

Lemma 7.21. Let X_n, X : Ω → D be maps with X Borel measurable. Then

(i) X_n →as* X implies X_n →P X.

(ii) X_n →P X if and only if every subsequence X_{n'} has a further subsequence X_{n''} such that X_{n''} →as* X.

(iii) X_n →as* X if and only if X_n converges almost uniformly to X if and only if sup_{m≥n} d(X_m, X)* →P 0 as n → ∞.

Since almost uniform convergence and outer almost sure convergence are equivalent for sequences, we will not use the almost uniform mode much. 15

The next lemma describes several important relationships between weak convergence and convergence in probability. Before presenting it, we need to extend the definition of convergence in probability, in the setting where the limit is a constant, to allow the probability spaces involved to change with n, as is already permitted for weak convergence. We denote this modified convergence X_n →P c, and distinguish it from the previous form of convergence in probability only by context. 16

Lemma 7.23. Let X_n, Y_n : Ω_n → D be maps, X : Ω → D be Borel measurable, and c ∈ D be a constant. Then

(i) If X_n ⇝ X and d(X_n, Y_n) →P 0, then Y_n ⇝ X.

(ii) X_n →P X implies X_n ⇝ X.

(iii) X_n →P c if and only if X_n ⇝ c. 17

Proof. We first prove (i). Let F ⊂ D be closed, and fix ε > 0. Then, since d(X_n, Y_n) →P 0,

limsup_{n→∞} P*(Y_n ∈ F) = limsup_{n→∞} P*(Y_n ∈ F, d(X_n, Y_n) ≤ ε)
≤ limsup_{n→∞} P*(X_n ∈ F^ε)
≤ P(X ∈ F^ε),

where F^ε = {x ∈ D : d(x, F) ≤ ε} and the last step follows from the portmanteau theorem. The result follows by letting ε ↓ 0. 18

Now assume X_n →P X. Take X_n ≡ X and Y_n ≡ X_n in (i): since X ⇝ X and d(X, X_n) →P 0, part (i) yields X_n ⇝ X. Thus (ii) follows. 19

We now prove (iii). X_n →P c implies X_n ⇝ c by (ii). Now assume X_n ⇝ c, and fix ε > 0. 20

Note that P*(d(X_n, c) ≥ ε) = P*(X_n ∉ B(c, ε)), where B(c, ε) is the open ε-ball around c ∈ D. By the portmanteau theorem,

limsup_{n→∞} P*(X_n ∉ B(c, ε)) ≤ P(c ∉ B(c, ε)) = 0.

Thus X_n →P c since ε is arbitrary, and (iii) follows. 21

Continuous Mapping Revisited

We now present a generalized continuous mapping theorem for sequences of maps g_n which converge to a map g. For example, suppose T is high dimensional. The computational burden of computing the supremum of X_n(t) over T may be reduced by maximizing instead over a finite mesh T_n which closely approximates T. 22
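The mesh idea can be sketched numerically (an illustration, not from the lecture): for a fixed path of the uniform empirical process X_n(t) = √n (F̂_n(t) − t) on T = [0, 1], the maximum over progressively finer meshes T_k approaches the supremum over all of T; the sample size and mesh sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
u = np.sort(rng.uniform(size=500))
n = len(u)

def emp_proc(t):
    """X_n(t) = sqrt(n) (F_hat_n(t) - t) for the fixed Uniform(0,1) sample u."""
    return np.sqrt(n) * (np.searchsorted(u, t, side="right") / n - t)

# g_k(x) = max over a finite mesh T_k of k+1 points; refining the mesh drives
# g_k(X_n) toward g(X_n) = sup over T = [0, 1], which here is attained at the
# jump points of F_hat_n, i.e., at the order statistics.
true_sup = max(0.0, float(np.max(emp_proc(u))))
for k in (10, 100, 1000, 10000):
    mesh = np.linspace(0.0, 1.0, k + 1)
    print(k, float(np.max(emp_proc(mesh))), "sup:", true_sup)
```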

Theorem 7.24 (Extended continuous mapping). Let D_n ⊂ D and g_n : D_n → E satisfy the following: if x_n → x with x_n ∈ D_n for all n ≥ 1 and x ∈ D_0, then g_n(x_n) → g(x), where D_0 ⊂ D and g : D_0 → E. Let X_n be maps taking values in D_n, and let X be Borel measurable and separable with P(X ∈ D_0) = 1. Then

(i) X_n ⇝ X implies g_n(X_n) ⇝ g(X).

(ii) X_n →P X implies g_n(X_n) →P g(X).

(iii) X_n →as* X implies g_n(X_n) →as* g(X). 23

The following alternative theorem does not require separability of X and is a supplement to Theorem 7.7:

Theorem 7.25. Let g : D → E be continuous at all points in D_0 ⊂ D, and let X be Borel measurable with P(X ∈ D_0) = 1. Then

(i) X_n →P X implies g(X_n) →P g(X).

(ii) X_n →as* X implies g(X_n) →as* g(X). 24

Outer Almost Sure Representation Theorem We now present a useful outer almost sure representation result for weak convergence. Such representations allow the transformation of certain weak convergence problems into problems about convergence of fixed sequences. We give an illustration of this approach in the sketch of the proof of Proposition 7.27 below. 25

Theorem 7.26. Let X_n : Ω_n → D be a sequence of maps, and let X be Borel measurable and separable. If X_n ⇝ X, then there exists a probability space (Ω̃, Ã, P̃) and maps X̃_n : Ω̃ → D with

(i) X̃_n →as* X̃_∞;

(ii) E*f(X̃_n) = E*f(X_n), for every bounded f : D → R and all 1 ≤ n ≤ ∞, where X_∞ ≡ X.

Moreover, X̃_n can be chosen to be equal to X_n ∘ φ_n, for all 1 ≤ n ≤ ∞, where the φ_n : Ω̃ → Ω_n are measurable and perfect maps and P_n = P̃ ∘ φ_n^{−1}. 26

Recall our previous discussion about perfect maps. In the setting of the above theorem, if the X̃_n are constructed from the perfect maps φ_n, then [f(X̃_n)]* = [f(X_n)]* ∘ φ_n for all bounded f : D → R. Thus the equivalence between X̃_n and X_n can be made much stronger than simple equivalence in law. 27

The following proposition can be useful in studying weak convergence of certain statistics which can be expressed as stochastic integrals. For example, the Wilcoxon and Cramér-von Mises statistics can be expressed in this way.

Proposition 7.27. Let X_n, G_n ∈ D[a, b] be stochastic processes with X_n ⇝ X and G_n →P G in D[a, b], where X has continuous sample paths, G is fixed, and G_n and G have total variation bounded by K < ∞. Then

∫_a^{(·)} X_n(s) dG_n(s) ⇝ ∫_a^{(·)} X(s) dG(s)

in D[a, b]. 28
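Since G_n has bounded total variation, the integrals above are ordinary Riemann-Stieltjes integrals path by path. A minimal numerical sketch (an illustration, not part of the lecture) of such an integral on a shared grid:

```python
import numpy as np

def stieltjes(x_vals, g_vals):
    """Left-endpoint Riemann-Stieltjes sum approximating the integral of x dG:
    sum over i of x(s_i) * (G(s_{i+1}) - G(s_i)), on a common grid."""
    return float(np.sum(x_vals[:-1] * np.diff(g_vals)))

s = np.linspace(0.0, 1.0, 100_001)
print(stieltjes(s, s))       # integral of s ds over [0,1]; exact value 1/2
print(stieltjes(s ** 2, s))  # integral of s^2 ds over [0,1]; exact value 1/3
```

In applications G_n is typically a (random) empirical distribution function, and x_vals would be a path of X_n evaluated on the same grid.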

Sketch of Proof. First, Slutsky's theorem and Lemma 7.23 establish that (X_n, G_n) ⇝ (X, G). Next, Theorem 7.26 tells us that there exists a new probability space and processes X̃_n, X̃, G̃_n and G̃ which have the same outer integrals for bounded functions as X_n, X, G_n and G, respectively, but which also satisfy (X̃_n, G̃_n) →as* (X̃, G̃). 29

Fix ε > 0. By the continuity of the sample paths of X̃ over the compact interval [a, b] (whose endpoints may be extended reals), there exists an integer 1 ≤ m < ∞ and a partition a = t_0 < t_1 < ... < t_m = b of [a, b] such that

max_{1≤j≤m} sup_{s,t ∈ (t_{j−1}, t_j]} |X̃(s) − X̃(t)| ≤ ε. 30

Define X̃_m ∈ D[a, b] such that X̃_m(a) = X̃(a) and

X̃_m(t) ≡ Σ_{j=1}^m 1{t_{j−1} < t ≤ t_j} X̃(t_j),

for t ∈ (a, b]. Note that the integral of interest evaluated at t = a is zero, since the set (a, a] is empty. 31

We now have, for any t ∈ (a, b], that

| ∫_a^t X̃_n(s) dG̃_n(s) − ∫_a^t X̃(s) dG̃(s) |
≤ ∫_a^b |X̃_n(s) − X̃(s)| |dG̃_n(s)|
+ ∫_a^b |X̃(s) − X̃_m(s)| |dG̃_n(s)|
+ | ∫_a^t X̃_m(s) {dG̃_n(s) − dG̃(s)} |
+ ∫_a^b |X̃_m(s) − X̃(s)| |dG̃(s)|.

Note that the first term → 0, while the second and fourth terms are bounded by Kε. 32

By the definition of X̃_m, the third term equals

| Σ_{j=1}^m X̃(t_j) ∫_{(t_{j−1}, t_j] ∩ (a, t]} {dG̃_n(s) − dG̃(s)} | ≤ 2m ‖X̃‖_T ‖G̃_n − G̃‖_T → 0.

Thus all four terms summed are asymptotically bounded by 2Kε. 33

Since ε was arbitrary, we conclude that

∫_a^{(·)} X̃_n(s) dG̃_n(s) →as* ∫_a^{(·)} X̃(s) dG̃(s).

Now Lemma 7.21(i) and part (ii) of Lemma 7.23 yield that we can replace the outer almost sure convergence with weak convergence. This in turn implies weak convergence in the original probability space, and the proof is complete. 34

Example: The Cramér-von Mises Statistic

Let X_1, ..., X_n be i.i.d. real random variables with continuous distribution function F and corresponding empirical distribution function F̂_n. The Cramér-von Mises statistic for testing the null hypothesis H_0 : F = F_0 has the form

T_n = n ∫ ( F̂_n(t) − F_0(t) )^2 dF̂_n(t). 35
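The statistic is easy to compute directly from this display (a sketch, assuming F_0 is continuous and there are no ties): dF̂_n places mass 1/n on each observation, and F̂_n equals i/n at the i-th order statistic, so the factor n and the masses 1/n cancel and T_n reduces to a finite sum.

```python
import numpy as np

def cramer_von_mises(x, F0):
    """T_n = n * integral of (F_hat_n - F0)^2 dF_hat_n, computed as the sum
    over order statistics X_(i) of (i/n - F0(X_(i)))^2."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    f_hat = np.arange(1, n + 1) / n   # F_hat_n evaluated at the order statistics
    return float(np.sum((f_hat - F0(x)) ** 2))

rng = np.random.default_rng(0)
sample = rng.uniform(size=100)
print(cramer_von_mises(sample, lambda t: t))  # test H_0: F = Uniform(0,1)
```

Note that this version integrates against dF̂_n, exactly as on the slide; the classical variant integrating against dF_0 leads to a slightly different finite-sample formula.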

In this case, X_n = n ( F̂_n − F_0 )^2, G_n = F̂_n, and [a, b] = [−∞, ∞]. Standard empirical process results yield that when F = F_0 (i.e., under the null), X_n ⇝ X and G_n →P G in D[a, b], where X(t) = B^2(F_0(t)), B is a standard Brownian bridge, and G = F_0. Note that X has continuous sample paths and the total variation of G_n and G is at most 1. 36

Thus all of the conditions of Proposition 7.27 are met, and therefore

T_n ⇝ ∫_0^1 B^2(t) dt,

which is a pivotal limit. The distribution of this pivotal quantity can be written as an infinite series which is not difficult to compute (see, e.g., Tolmatz, 2001, Annals of Probability): the critical value for α = 0.05 is approximately 0.461. 37
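The limit distribution can also be approximated by straightforward Monte Carlo (a sketch, not from the lecture; the path count and time grid are arbitrary choices): simulate Brownian bridge paths B(t) = W(t) − t W(1) on a grid, approximate ∫_0^1 B^2(t) dt along each path, and read off the empirical 95th percentile, which can be compared with the critical value quoted above.

```python
import numpy as np

rng = np.random.default_rng(42)
paths, steps = 5000, 500
dt = 1.0 / steps
t = np.linspace(0.0, 1.0, steps + 1)

# Brownian motion paths W from independent Gaussian increments,
# then the Brownian bridge B(t) = W(t) - t W(1).
dw = rng.normal(scale=np.sqrt(dt), size=(paths, steps))
w = np.concatenate([np.zeros((paths, 1)), np.cumsum(dw, axis=1)], axis=1)
b = w - t * w[:, -1:]

# Riemann approximation of the integral of B^2(t) dt along each path.
integrals = (b[:, :-1] ** 2).sum(axis=1) * dt
print(integrals.mean())                # theory: E = integral of t(1-t) dt = 1/6
print(np.quantile(integrals, 0.95))    # Monte Carlo 5% critical value
```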