Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence

Michael R. Kosorok, Ph.D.
Professor and Chair of Biostatistics
Professor of Statistics and Operations Research
University of North Carolina-Chapel Hill

Stochastic Convergence

Today, we will discuss the following basic concepts: stochastic processes in metric spaces; tightness and separability of stochastic processes; Gaussian processes; weak convergence and the portmanteau theorem; the continuous mapping theorem; asymptotic measurability and asymptotic tightness; Prohorov's theorem; and Slutsky's theorem.

Stochastic Processes in Metric Spaces

Recall that for a stochastic process $\{X(t), t \in T\}$, $X(t)$ is a measurable real random variable for each $t \in T$ on a probability space $(\Omega, \mathcal{A}, P)$. The sample paths of such a process typically reside in the metric space $D = \ell^\infty(T)$ with the uniform metric. Often, however, when $X$ is viewed as a map from $\Omega$ to $D$, it is no longer Borel measurable.

A classic example of this issue comes from Billingsley (1968, pages 152-153). The example hinges on the existence of a set $H \subset [0, 1]$ which is not a Borel set (such a set does exist). Define the stochastic process $X(t) = 1\{U \le t\}$, where $t \in [0, 1]$ and $U$ is uniformly distributed on $[0, 1]$.

The probability space for $X$ is $(\Omega, \mathcal{B}, P)$, where $\Omega = [0, 1]$, $\mathcal{B}$ are the Borel sets on $[0, 1]$, and $P$ is the uniform probability measure on $[0, 1]$. A natural metric space for the sample paths of $X$ is $\ell^\infty([0, 1])$. Define the set $A = \bigcup_{s \in H} B_s(1/2)$, where $B_s(1/2)$ is the uniform open ball of radius $1/2$ around the function $t \mapsto f_s(t) \equiv 1\{t \ge s\}$.

Since $A$ is an open set in $\ell^\infty([0, 1])$, and since the uniform distance between $f_{s_1}$ and $f_{s_2}$ is 1 whenever $s_1 \neq s_2$, $X(\cdot) \in B_s(1/2)$ if and only if $U = s$. Thus the set $\{\omega \in \Omega : X(\omega) \in A\}$ equals $H$. Since $H$ is not a Borel set, $X$ is not Borel measurable.
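The geometric fact behind this example, that indicator paths $t \mapsto 1\{t \ge s\}$ with different jump points are at uniform distance 1, can be checked numerically. The following is only a minimal sketch: it assumes the paths are evaluated on a finite grid (so the supremum is approximated), and the helper names `indicator_path` and `uniform_distance` are hypothetical.

```python
import numpy as np

def indicator_path(s, grid):
    """Evaluate the path t -> 1{t >= s} on a finite grid of t values."""
    return (grid >= s).astype(float)

def uniform_distance(path_a, path_b):
    """Grid approximation of the uniform (sup) distance between two paths."""
    return float(np.max(np.abs(path_a - path_b)))

grid = np.linspace(0.0, 1.0, 1001)  # assumption: a grid stands in for [0, 1]

# Two paths with different jump points differ by 1 somewhere between the jumps.
d_distinct = uniform_distance(indicator_path(0.30, grid), indicator_path(0.31, grid))

# A realization U = 0.30 gives the path X = f_{0.30}, which lies in the ball
# B_s(1/2) for s = 0.30 but not for a different s such as 0.70.
x_path = indicator_path(0.30, grid)
in_ball_same = uniform_distance(x_path, indicator_path(0.30, grid)) < 0.5
in_ball_other = uniform_distance(x_path, indicator_path(0.70, grid)) < 0.5
```

On a grid, two jump points that fall between the same pair of grid values would be indistinguishable, so this illustrates the distance computation rather than the non-measurability itself.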

This lack of measurability is the usual state for biostatistics. Many of the associated technical difficulties can be resolved through the use of outer measure and outer expectation in the context of weak convergence. In contrast, most of the limiting processes in biostatistics are Borel measurable. Hence a brief study of Borel measurable processes is valuable.

Two Borel random maps $X$ and $X'$, with respective laws $L$ and $L'$, are versions of each other if $L = L'$. Recall $BL_1(D)$, the set of all functions $f : D \to \mathbb{R}$ bounded by 1 and with Lipschitz norm bounded by 1, i.e., $|f(x) - f(y)| \le d(x, y)$ for all $x, y \in D$. When the context is clear, we simply use $BL_1$.

Define a vector lattice $\mathcal{F} \subset C_b(D)$ to be a vector space such that if $f \in \mathcal{F}$ then $f \vee 0 \in \mathcal{F}$. We also say that a set $\mathcal{F}$ of real functions on $D$ separates points of $D$ if, for any $x, y \in D$ with $x \neq y$, there exists $f \in \mathcal{F}$ such that $f(x) \neq f(y)$.

LEMMA 1. Let $L_1$ and $L_2$ be Borel probability measures on a metric space $D$. TFAE:
(i) $L_1 = L_2$.
(ii) $\int f \, dL_1 = \int f \, dL_2$ for all $f \in C_b(D)$.
If $L_1$ and $L_2$ are also separable, then (i) and (ii) are both equivalent to
(iii) $\int f \, dL_1 = \int f \, dL_2$ for all $f \in BL_1$.
Moreover, if $L_1$ and $L_2$ are also tight, then (i)-(iii) are all equivalent to
(iv) $\int f \, dL_1 = \int f \, dL_2$ for all $f$ in a vector lattice $\mathcal{F} \subset C_b(D)$ that both contains the constant functions and separates points in $D$.

Tightness and Separability of Stochastic Processes

In addition to being Borel measurable, most of the limiting stochastic processes of interest are tight. A Borel probability measure $L$ on a metric space $D$ is tight if for every $\epsilon > 0$, there exists a compact $K \subset D$ so that $L(K) \ge 1 - \epsilon$. A Borel random map $X : \Omega \to D$ is tight if its law $L$ is tight.

Tightness is equivalent to there being a $\sigma$-compact set that has probability 1 under $L$ or $X$. $L$ or $X$ is separable if there is a measurable and separable set which has probability 1. $L$ or $X$ is Polish if there is a measurable Polish set having probability 1.

Note that tightness, separability and Polishness are all topological properties and do not depend on a metric. Since both $\sigma$-compact and Polish sets are also separable, separability is the weakest of the three properties. Whenever we say $X$ has any one of these three properties, we tacitly imply that $X$ is also Borel measurable. On a complete metric space, tightness, separability and Polishness are equivalent.

For a stochastic process $\{X(t), t \in T\}$, where $(T, \rho)$ is a separable, semimetric space, there is another meaning for separable: $X$ is separable (as a stochastic process) if there exists a countable subset $S \subset T$ and a null set $N$ so that, for each $\omega \notin N$ and $t \in T$, there exists a sequence $\{s_m\} \in S$ with $\rho(s_m, t) \to 0$ and $|X(s_m, \omega) - X(t, \omega)| \to 0$.

It turns out that many empirical processes are separable in this sense, even though they are not Borel measurable and therefore cannot satisfy the other meaning for separable. The distinction between these two definitions will either be explicitly stated or made clear by the context.

Most limiting processes $X$ of interest will reside in $\ell^\infty(T)$, where the index set $T$ is often a class of real functions $\mathcal{F}$ with domain equal to the sample space. When such limiting processes are tight, the following lemma demands that $X$ resides on $UC(T, \rho)$, where $\rho$ is some semimetric making $T$ totally bounded, with probability 1.

LEMMA 2. Let $X$ be a Borel measurable random element in $\ell^\infty(T)$. TFAE:
(i) $X$ is tight.
(ii) There exists a semimetric $\rho$ making $T$ totally bounded and for which $X \in UC(T, \rho)$ with probability 1.
Furthermore, if (ii) holds for any $\rho$, then it also holds for the semimetric $\rho_0(s, t) \equiv E \arctan |X(s) - X(t)|$.
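To get a feel for the semimetric $\rho_0(s, t) = E \arctan |X(s) - X(t)|$, it can be estimated by Monte Carlo for a concrete process. The sketch below assumes $X$ is standard Brownian motion on $T = [0, 1]$ (an illustrative choice, not taken from the lecture), so that $X(s) - X(t) \sim N(0, |s - t|)$; the function name `rho0_brownian` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)  # shared standard normal draws

def rho0_brownian(s, t):
    """Monte Carlo estimate of E arctan|B(s) - B(t)| for Brownian motion B.

    Uses B(s) - B(t) ~ N(0, |s - t|), so the increment is sqrt(|s - t|) * Z.
    """
    return float(np.mean(np.arctan(np.sqrt(abs(s - t)) * np.abs(z))))

r_zero = rho0_brownian(0.2, 0.2)   # identical time points
r_near = rho0_brownian(0.2, 0.3)   # nearby time points
r_far = rho0_brownian(0.2, 0.9)    # distant time points
```

Because arctan is bounded by $\pi/2$, the estimate is always finite, which is one reason this semimetric is available for any tight process regardless of moment conditions.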

A nice feature of tight processes in $\ell^\infty(T)$ is that the laws are completely defined by their finite-dimensional marginal distributions $(X(t_1), \ldots, X(t_k))$, where $t_1, \ldots, t_k \in T$:

LEMMA 3. Let $X$ and $Y$ be tight, Borel measurable stochastic processes in $\ell^\infty(T)$. Then the Borel laws of $X$ and $Y$ are equal if and only if all corresponding finite-dimensional marginal distributions are equal.

Proof. Consider the collection $\mathcal{F} \subset C_b(D)$ of all functions $f : \ell^\infty(T) \to \mathbb{R}$ of the form $f(x) = g(x(t_1), \ldots, x(t_k))$, where $g \in C_b(\mathbb{R}^k)$ and $k \ge 1$ is an integer. We leave it as an exercise to show that $\mathcal{F}$ is a vector lattice, an algebra, and separates points of $\ell^\infty(T)$. Since $X$ and $Y$ are tight, the desired result now follows from part (iv) of Lemma 1.

While the semimetric $\rho_0$ defined in Lemma 2 is always applicable when $X$ is tight, it is frequently not the most convenient choice. There are also other useful, equivalent choices. For a process $X$ in $\ell^\infty(T)$ and a semimetric $\rho$ on $T$, we say that $X$ is uniformly $\rho$-continuous in $p$th mean if $E|X(s_n) - X(t_n)|^p \to 0$ whenever $\rho(s_n, t_n) \to 0$.
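As a worked instance of this definition, consider standard Brownian motion $B$ on $T = [0, 1]$ with $\rho(s, t) = |s - t|$ (an illustrative assumption, not an example from the lecture). Since $B(s) - B(t) \sim N(0, |s - t|)$, the $p$th absolute moment scales as a power of $|s - t|$, where $Z$ denotes a standard normal variable:

```latex
\[
E\,|B(s_n) - B(t_n)|^p
  = |s_n - t_n|^{p/2}\, E|Z|^p
  = |s_n - t_n|^{p/2}\, \frac{2^{p/2}\,\Gamma\!\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}
  \;\longrightarrow\; 0
  \quad \text{whenever } |s_n - t_n| \to 0 .
\]
```

So under these assumptions $B$ is uniformly $\rho$-continuous in $p$th mean for every $p \in (0, \infty)$.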

Gaussian Processes

Perhaps the most frequently occurring limiting process in $\ell^\infty(T)$ is a Gaussian process. A stochastic process $\{X(t), t \in T\}$ is Gaussian if all finite-dimensional marginals $\{X(t_1), \ldots, X(t_k)\}$ are multivariate normal.
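A finite-dimensional marginal of a Gaussian process is fully specified by a mean vector and a covariance matrix, so it can be simulated directly. The sketch below uses the standard Brownian bridge, with mean zero and covariance $\min(s, t) - st$, purely as an assumed example; the helper name `brownian_bridge_cov` is hypothetical.

```python
import numpy as np

def brownian_bridge_cov(times):
    """Covariance matrix with entries min(s, t) - s * t (standard Brownian bridge)."""
    t = np.asarray(times, dtype=float)
    return np.minimum.outer(t, t) - np.outer(t, t)

times = [0.25, 0.5, 0.75]          # an arbitrary finite set {t_1, ..., t_k}
cov = brownian_bridge_cov(times)

rng = np.random.default_rng(1)
# Each row is one draw of the marginal vector (X(t_1), ..., X(t_k)).
draws = rng.multivariate_normal(np.zeros(len(times)), cov, size=50_000)
sample_cov = np.cov(draws, rowvar=False)
```

Comparing `sample_cov` with `cov` gives a quick sanity check that the simulated marginal has the intended covariance structure.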

If a Gaussian process $X$ is tight, then by Lemma 2, there is a semimetric $\rho$ making $T$ totally bounded and for which the sample paths $t \mapsto X(t)$ are uniformly $\rho$-continuous. An interesting feature of Gaussian processes is that this result implies that the map $t \mapsto X(t)$ is uniformly $\rho$-continuous in $p$th mean for all $p \in (0, \infty)$.

For a general Banach space $D$, a Borel measurable random element $X$ on $D$ is Gaussian if and only if $f(X)$ is Gaussian for every continuous, linear functional $f : D \to \mathbb{R}$. When $D = \ell^\infty(T)$ for some set $T$, this definition appears to contradict the definition of Gaussianity given in the preceding paragraph, since now we are using all continuous linear functionals instead of just linear combinations of coordinate projections.

These two definitions are not really reconcilable in general, and so some care must be taken in reading the literature. However, when the process in question is tight, the two definitions are equivalent, as verified in the following proposition.

PROPOSITION 1. Let $X$ be a tight, Borel measurable map into $\ell^\infty(T)$. TFAE:
(i) The vector $(X(t_1), \ldots, X(t_k))$ is multivariate normal for every finite set $\{t_1, \ldots, t_k\} \subset T$.
(ii) $\phi(X)$ is Gaussian for every continuous, linear functional $\phi : \ell^\infty(T) \to \mathbb{R}$.
(iii) $\phi(X)$ is Gaussian for every continuous, linear map $\phi : \ell^\infty(T) \to D$ into any Banach space $D$.

Weak Convergence and Portmanteau Theorem

The extremely important concept of weak convergence of sequences arises in many areas of statistics. To be as flexible as possible, we allow the probability spaces associated with the sequences to change with $n$. Let $(\Omega_n, \mathcal{A}_n, P_n)$ be a sequence of probability spaces and $X_n : \Omega_n \to D$ a sequence of maps.

We say that $X_n$ converges weakly to a Borel measurable $X : \Omega \to D$ if

$E^* f(X_n) \to E f(X)$, for every $f \in C_b(D)$. (1)

If $L$ is the law of $X$, (1) can be reexpressed as

$E^* f(X_n) \to \int_\Omega f(x) \, dL(x)$, for every $f \in C_b(D)$.

Weak convergence is denoted $X_n \rightsquigarrow X$ or, equivalently, $X_n \rightsquigarrow L$. Weak convergence is equivalent to convergence in distribution and convergence in law. By Lemma 1, this definition of weak convergence ensures that the limiting distributions are unique.

Note that the choice of probability spaces $(\Omega_n, \mathcal{A}_n, P_n)$ is important since these dictate the outer expectation. In most of the settings discussed in this book, $\Omega_n = \Omega$ for all $n \ge 1$.

The above definition of weak convergence does not appear to connect to the standard, statistical notion of convergence of probabilities. However, this connection does hold for probabilities of sets $B \subset \Omega$ which have boundaries $\delta B$ satisfying $L(\delta B) = 0$. The boundary $\delta B$ of a set $B$ in a topological space is the closure of $B$ minus the interior of $B$.

THEOREM 1. (Portmanteau) TFAE:
(i) $X_n \rightsquigarrow L$;
(ii) $\liminf P_*(X_n \in G) \ge L(G)$ for every open $G$;
(iii) $\limsup P^*(X_n \in F) \le L(F)$ for every closed $F$;
(iv) $\liminf E_* f(X_n) \ge \int_\Omega f(x) \, dL(x)$ for every lower semicontinuous $f$ bounded below;
(v) $\limsup E^* f(X_n) \le \int_\Omega f(x) \, dL(x)$ for every upper semicontinuous $f$ bounded above;
(vi) $\lim P^*(X_n \in B) = \lim P_*(X_n \in B) = L(B)$ for every Borel $B$ with $L(\delta B) = 0$;
(vii) $\liminf E_* f(X_n) \ge \int_\Omega f(x) \, dL(x)$ for every bounded, Lipschitz continuous, nonnegative $f$.

Furthermore, if $L$ is separable, then (i)-(vii) are also equivalent to
(viii) $\sup_{f \in BL_1} |E^* f(X_n) - E f(X)| \to 0$.
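The inequalities for open and closed sets in the portmanteau theorem can be strict, and a deterministic toy case makes this concrete. The sketch below assumes $X_n$ is the constant $1/n$ and $L$ is the point mass at 0 (an illustrative choice, not from the lecture); since each $X_n$ is measurable, inner and outer probabilities coincide with ordinary ones. The minimum and maximum over a finite range of $n$ stand in for lim inf and lim sup, and all function names are hypothetical.

```python
def prob_xn_in_open_interval(n, a, b):
    """P(X_n in (a, b)) when X_n is the constant 1/n."""
    return 1.0 if a < 1.0 / n < b else 0.0

def prob_xn_equals(n, point):
    """P(X_n in {point}) when X_n is the constant 1/n."""
    return 1.0 if 1.0 / n == point else 0.0

def limit_law_open_interval(a, b):
    """L((a, b)) when L is the point mass at 0."""
    return 1.0 if a < 0.0 < b else 0.0

ns = range(2, 2001)

# (ii) open G = (0, 1): lim inf P(X_n in G) = 1 >= L(G) = 0, strictly.
liminf_open = min(prob_xn_in_open_interval(n, 0.0, 1.0) for n in ns)
law_open = limit_law_open_interval(0.0, 1.0)

# (iii) closed F = {0}: lim sup P(X_n in F) = 0 <= L(F) = 1, strictly.
limsup_closed = max(prob_xn_equals(n, 0.0) for n in ns)
law_closed = 1.0

# (vi) B = (-1, 1) has boundary {-1, 1} with L-measure 0, so the limit equals L(B).
prob_continuity_set = [prob_xn_in_open_interval(n, -1.0, 1.0) for n in ns]
law_continuity_set = limit_law_open_interval(-1.0, 1.0)
```

The set $(0, 1)$ fails the condition in (vi) because its boundary contains 0, which carries all the mass of $L$; that is exactly where the probabilities fail to converge to $L(B)$.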

Continuous Mapping Theorem

THEOREM 2. (Continuous mapping) Let $g : D \to E$ be continuous at all points in $D_0 \subset D$, where $D$ and $E$ are metric spaces. Then if $X_n \rightsquigarrow X$ in $D$, with $P_*(X \in D_0) = 1$, then $g(X_n) \rightsquigarrow g(X)$.
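A small simulation can illustrate the continuous mapping theorem in the simplest setting, $D = E = \mathbb{R}$. The sketch below assumes $X_n$ is the standardized mean of $n$ Uniform(0, 1) draws, which converges weakly to $N(0, 1)$ by the central limit theorem, and takes $g(x) = x^2$; the sample sizes and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 5000

# X_n: standardized mean of n Uniform(0, 1) draws, approximately N(0, 1) by the CLT.
samples = rng.random((reps, n))
x_n = np.sqrt(n) * (samples.mean(axis=1) - 0.5) / np.sqrt(1.0 / 12.0)

# g(x) = x^2 is continuous everywhere, so g(X_n) should be close to chi-square(1).
g_x_n = x_n ** 2

mean_g = float(g_x_n.mean())                    # chi-square(1) has mean 1
frac_below_one = float((g_x_n <= 1.0).mean())   # P(|Z| <= 1) is about 0.6827
```

Both summaries should land near their chi-square(1) values, up to Monte Carlo error and the finite-$n$ approximation.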

Asymptotic Measurability and Asymptotic Tightness

A potential issue is that there may sometimes be more than one choice of metric space $D$ to work with. For example, if we are studying weak convergence of the usual empirical process $\sqrt{n}(\hat{F}_n(t) - F(t))$ based on data in $[0, 1]$, we could let $D$ be either $\ell^\infty([0, 1])$ or $D[0, 1]$.

The following lemma tells us that the choice of metric space is generally not a problem:

LEMMA 4. Let the metric spaces $D_0 \subset D$ have the same metric, and assume $X$ and $X_n$ reside in $D_0$. Then $X_n \rightsquigarrow X$ in $D_0$ if and only if $X_n \rightsquigarrow X$ in $D$.
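For the empirical process $\sqrt{n}(\hat{F}_n - F)$, the uniform norm can be computed exactly from the order statistics when $F$ is continuous. The following is a minimal sketch under that assumption; the function name `sup_empirical_process` and the Uniform(0, 1) choice of $F$ are illustrative, not part of the lecture.

```python
import numpy as np

def sup_empirical_process(sample, cdf):
    """sup_t |sqrt(n) * (F_hat_n(t) - F(t))| for a continuous cdf F.

    The supremum is attained at (or just before) an order statistic, so it
    suffices to compare F at the sorted data with i/n and (i - 1)/n.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    f = cdf(x)
    upper = np.max(np.arange(1, n + 1) / n - f)   # F_hat_n above F
    lower = np.max(f - np.arange(0, n) / n)       # F above F_hat_n
    return float(np.sqrt(n) * max(upper, lower))

uniform_cdf = lambda t: np.clip(t, 0.0, 1.0)      # assumption: F is Uniform(0, 1)
value = sup_empirical_process([0.8, 0.1, 0.4], uniform_cdf)
```

For the three-point sample shown, the largest gap is $2/3 - 0.4$, attained at the second order statistic, so `value` is $\sqrt{3}$ times that gap.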

A sequence $X_n$ is asymptotically measurable if and only if

$E^* f(X_n) - E_* f(X_n) \to 0$, (2)

for all $f \in C_b(D)$. A sequence $X_n$ is asymptotically tight if for every $\epsilon > 0$, there is a compact $K$ so that $\liminf P_*(X_n \in K^\delta) \ge 1 - \epsilon$, for every $\delta > 0$, where for a set $A \subset D$, $A^\delta = \{x \in D : d(x, A) < \delta\}$ is the $\delta$-enlargement around $A$.

Two good properties of asymptotic tightness are that it does not depend on the metric chosen, only on the topology, and that weak convergence often implies asymptotic tightness. The first of these two properties is verified in the following:

LEMMA 5. $X_n$ is asymptotically tight if and only if for every $\epsilon > 0$ there exists a compact $K$ so that $\liminf P_*(X_n \in G) \ge 1 - \epsilon$ for every open $G \supset K$.

The second good property of asymptotic tightness is part (ii) of the following lemma, part (i) of which gives the necessity of asymptotic measurability for weakly convergent sequences:

LEMMA 6. Assume $X_n \rightsquigarrow X$. Then
(i) $X_n$ is asymptotically measurable.
(ii) $X_n$ is asymptotically tight if and only if $X$ is tight.

Prohorov's Theorem

Prohorov's theorem (modernized) tells us that asymptotic measurability and asymptotic tightness together almost give us weak convergence. This almost-weak-convergence is relative compactness. A sequence $X_n$ is relatively compact if every subsequence $X_{n'}$ has a further subsequence $X_{n''}$ which converges weakly to a tight Borel law.

Weak convergence happens when all of the limiting Borel laws are the same.

THEOREM 3. (Prohorov's theorem) If the sequence $X_n$ is asymptotically measurable and asymptotically tight, then it has a subsequence $X_{n'}$ that converges weakly to a tight Borel law.

Note that the conclusion of Prohorov's theorem does not state that $X_n$ is relatively compact, and thus it appears as if we have broken our earlier promise. However, if $X_n$ is asymptotically measurable and asymptotically tight, then every subsequence $X_{n'}$ is also asymptotically measurable and asymptotically tight. Thus repeated application of Prohorov's theorem does indeed imply relative compactness of $X_n$.

The following lemma specifies the relationship between marginal and joint processes regarding asymptotic measurability and asymptotic tightness:

LEMMA 7. Let $X_n : \Omega_n \to D$ and $Y_n : \Omega_n \to E$ be sequences of maps. The following are true:
(i) $X_n$ and $Y_n$ are both asymptotically tight if and only if the same is true for the joint sequence $(X_n, Y_n) : \Omega_n \to D \times E$.
(ii) Asymptotically tight sequences $X_n$ and $Y_n$ are both asymptotically measurable if and only if $(X_n, Y_n) : \Omega_n \to D \times E$ is asymptotically measurable.

Slutsky's Theorem

A very useful consequence of Lemma 7 is Slutsky's theorem:

THEOREM 4. (Slutsky's theorem) Suppose $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$, where $X$ is separable and $c$ is a fixed constant. The following are true:
(i) $(X_n, Y_n) \rightsquigarrow (X, c)$.
(ii) If $X_n$ and $Y_n$ are in the same metric space, then $X_n + Y_n \rightsquigarrow X + c$.
(iii) Assume in addition that the $Y_n$ are scalars. Then whenever $c \in \mathbb{R}$, $Y_n X_n \rightsquigarrow cX$. Also, whenever $c \neq 0$, $X_n / Y_n \rightsquigarrow X / c$.

Proof. By completing the metric space for $X$, we can without loss of generality assume that $X$ is tight. Thus by Lemma 7, $(X_n, Y_n)$ is asymptotically tight and asymptotically measurable. Thus by Prohorov's theorem, all subsequences of $(X_n, Y_n)$ have further subsequences which converge to tight limits.

Since these limit points have marginals $X$ and $c$, and since the marginals in this case completely determine the joint distribution, we have that all limiting distributions are uniquely determined as $(X, c)$. This proves Part (i). Parts (ii) and (iii) now follow from the continuous mapping theorem.
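The ratio statement in part (iii) of Slutsky's theorem is what justifies replacing an unknown standard deviation by an estimate. The simulation below is a sketch under assumed choices (Exponential(1) data, fixed seed, arbitrary sample sizes): $X_n$ is the standardized sample mean, $Y_n$ is the sample standard deviation divided by the true one, and $X_n / Y_n$ is the usual studentized mean.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 400, 4000
mu, sigma = 1.0, 1.0                      # Exponential(1) has mean 1 and sd 1

data = rng.exponential(scale=1.0, size=(reps, n))
x_n = np.sqrt(n) * (data.mean(axis=1) - mu) / sigma   # converges weakly to N(0, 1)
y_n = data.std(axis=1, ddof=1) / sigma                # converges in probability to c = 1

t_n = x_n / y_n                                       # Slutsky (iii): limit is X / c = N(0, 1)

coverage = float((np.abs(t_n) <= 1.96).mean())        # N(0, 1) gives about 0.95
spread_y = float(np.abs(y_n - 1.0).mean())            # small when Y_n is near c
```

The coverage should be close to 0.95, with any remaining gap attributable to Monte Carlo error and the skewness of the exponential distribution at finite $n$.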