Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem

Similar documents
General Theory of Large Deviations

Gärtner-Ellis Theorem and applications.

HOW TO LOOK AT MINKOWSKI S THEOREM

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ.

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

Basic Definitions: Indexed Collections and Random Functions

25.1 Ergodicity and Metric Transitivity

Section 9.2 introduces the description of Markov processes in terms of their transition probabilities and proves the existence of such processes.

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

( f ^ M _ M 0 )dµ (5.1)

ALGEBRA. 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers

3 Integration and Expectation

Convex Optimization Notes

ON SPACE-FILLING CURVES AND THE HAHN-MAZURKIEWICZ THEOREM

One-Parameter Processes, Usually Functions of Time

BASICS OF CONVEX ANALYSIS

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous:

g 2 (x) (1/3)M 1 = (1/3)(2/3)M.

LECTURE 15: COMPLETENESS AND CONVEXITY

(a) For an accumulation point a of S, the number l is the limit of f(x) as x approaches a, or lim x a f(x) = l, iff

are Banach algebras. f(x)g(x) max Example 7.4. Similarly, A = L and A = l with the pointwise multiplication

1. Supremum and Infimum Remark: In this sections, all the subsets of R are assumed to be nonempty.

Appendix B Convex analysis

Convex Analysis and Economic Theory AY Elementary properties of convex functions

δ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by

Spring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures

Math 328 Course Notes

EECS 229A Spring 2007 * * (a) By stationarity and the chain rule for entropy, we have

Consequences of Continuity

The Almost-Sure Ergodic Theorem

CHAPTER 7. Connectedness

Large Deviations Techniques and Applications

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS

Building Infinite Processes from Finite-Dimensional Distributions

Theorem 5.3. Let E/F, E = F (u), be a simple field extension. Then u is algebraic if and only if E/F is finite. In this case, [E : F ] = deg f u.

A NICE PROOF OF FARKAS LEMMA

A minimalist s exposition of EM

Preliminary draft only: please check for final version

Finite-Dimensional Cones 1

THE INVERSE FUNCTION THEOREM

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

A SHORT INTRODUCTION TO BANACH LATTICES AND

Random Times and Their Properties

Math 210B. Artin Rees and completions

Central limit theorem. Paninski, Intro. Math. Stats., October 5, probability, Z N P Z, if

Topology of the set of singularities of a solution of the Hamilton-Jacobi Equation

A Note On Large Deviation Theory and Beyond

4.4 Uniform Convergence of Sequences of Functions and the Derivative

Optimization and Optimal Control in Banach Spaces

Problem List MATH 5143 Fall, 2013

Extreme points of compact convex sets

Notes for Functional Analysis

6.842 Randomness and Computation March 3, Lecture 8

Empirical Processes: General Weak Convergence Theory

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Indeed, if we want m to be compatible with taking limits, it should be countably additive, meaning that ( )

Cauchy s Theorem (rigorous) In this lecture, we will study a rigorous proof of Cauchy s Theorem. We start by considering the case of a triangle.

M17 MAT25-21 HOMEWORK 6

We are now going to go back to the concept of sequences, and look at some properties of sequences in R

Chapter IV Integration Theory

Measures and Measure Spaces

Filters in Analysis and Topology

Exercise 2. Prove that [ 1, 1] is the set of all the limit points of ( 1, 1] = {x R : 1 <

Alternative Characterization of Ergodicity for Doubly Stochastic Chains

REAL VARIABLES: PROBLEM SET 1. = x limsup E k

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

1 Lesson 1: Brunn Minkowski Inequality

1. Bounded linear maps. A linear map T : E F of real Banach

PICARD S THEOREM STEFAN FRIEDL

We set up the basic model of two-sided, one-to-one matching

Introduction and Preliminaries

Continuity. Chapter 4

STAT 331. Martingale Central Limit Theorem and Related Results

The small ball property in Banach spaces (quantitative results)

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Topological vectorspaces

CHAPTER 3. Sequences. 1. Basic Properties

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

2. The Concept of Convergence: Ultrafilters and Nets

Proof Techniques (Review of Math 271)

6.2 Deeper Properties of Continuous Functions

DEVELOPMENT OF MORSE THEORY

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Part III. 10 Topological Space Basics. Topological Spaces

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

An introduction to some aspects of functional analysis

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect

A lower bound for X is an element z F such that

Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence

Preliminary draft only: please check for final version

NOTES ON DIFFERENTIAL FORMS. PART 1: FORMS ON R n

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM

Metric spaces and metrizability

LECTURE 3: SMOOTH FUNCTIONS

MAT 473 Intermediate Real Analysis II

Calculus in Gauss Space

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 43, NO. 3, MARCH

Transcription:

Chapter 34 Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem This chapter proves the Gärtner-Ellis theorem, establishing an LDP for not-too-dependent processes taking values in topological vector spaces. Most of our earlier LDP results can be seen as consequences of this theorem. 34.1 The Gärtner-Ellis Theorem The Gärtner-Ellis theorem is a powerful result which establishes the existence of a large deviation principle for processes where the cumulant generating function tends towards a well-behaved limit, implying not-too-strong dependence between successive values. (Exercise 34.5 clarifies the meaning of too strong.) It will imply our LDPs for IID and Markovian sequences. I could have started with it, but its proof, as you ll see, is pretty technical, and so it seemed better to use the more elementary arguments of the preceding chapters. To fix notation, Ξ will be a real topological vector space, and Ξ will be its dual space, of continuous linear functions Ξ R. (If Ξ = R d, we can identify Ξ and Ξ by means of the inner product. In differential geometry, on the other hand, Ξ might be a space of tangent vectors, and Ξ the corresponding oneforms.) X ɛ will be a family of Ξ-valued random variables, parameterized by ɛ > 0. Refer to Definitions 423 and 424 in Section 31.1 for the definition of the cumulant generating function and its Legendre transform (respectively), which I will denote by Λ ɛ : Ξ R and Λ ɛ : Ξ R. The proof of the Gärtner-Ellis theorem goes through a number of lemmas. 233

CHAPTER 34. THE GÄRTNER-ELLIS THEOREM 234 Basically, the upper large deviation bound holds under substantially weaker conditions than the lower bound does, and it s worthwhile having the partial results available to use in estimates even if the full large deviations principle does not apply. Definition 444 The upper-limiting cumulant generating function is and its Legendre transform is written Λ (x). Λ (t) lim sup ɛλ ɛ (t/ɛ) (34.1) The point of this is that the limsup always exists, whereas the limit doesn t, necessarily. But we can show that the limsup has some reasonable properties, and in fact it s enough to give us an upper bound. Lemma 445 Λ (t) is convex, and Λ (x) is a convex rate function. Proof: The proof of the convexity of Λ ɛ (t) follows the proof in Lemma 425, and the convexity of Λ (t) by passing to the limit. To etablish Λ (x) as a rate function, we need it to be non-negative and lower semi-continuous. Since Λ ɛ (0) = 0 for all ɛ, Λ (0) = 0. This in turn implies that Λ (x) 0. Since the latter is the supremum of a class of continuous functions, namely t(x) Λ (t), it must be lower semi-continuous. Finally, its convexity is implied by its being a Legendre transform. Lemma 446 (Upper Bound in Gärtner-Ellis Theorem: Compact Sets) For any compact set K Ξ, lim sup ɛ log P (X ɛ K) Λ (K) (34.2) Proof: Entirely parallel to the proof of the upper bound in Cramér s Theorem (429), up through the point where closed sets are divided into a compact part and a remainder far from the origin, of exponentially-small probability. Because K is compact, we can proceed as though the remainder is empty. Lemma 447 (Upper Bound in Gärtner-Ellis Theorem: Closed Sets) If the family of distributions L (X ɛ ) are exponentially tight, then for all closed C Ξ, lim sup ɛ log P (X ɛ C) Λ (C) (34.3) Proof: Exponential tightness, by definition (417), will let us repeat Theorem 429 trick of dividing closed sets into a compact part, and an exponentiallyvanishing non-compact part.

CHAPTER 34. THE GÄRTNER-ELLIS THEOREM 235 Definition 448 The limiting cumulant generating function is Λ(t) lim ɛλ ɛ (t/ɛ) (34.4) when the limit exists. Its domain of finiteness D {t Ξ : Λ(t) < ty}. Its limiting Legendre transform is Λ, with domain of finiteness D. Lemma 449 If Λ(t) exists, then it is convex, Λ (x) is a convex rate function, and Eq. 34.2 applies to the latter. If in addition the process is exponentially tight, then Eq. 34.3 holds for Λ (x). Proof: Because, if a limit exists, it is equal to the limsup. Lemma 450 If Ξ = R d, then 0 intd is sufficient for exponential tightness. Proof: Exercise. Unfortunately, this is not good enough to get exponential tightness in arbitrary vector spaces. Definition 451 (Exposed Point) A point x Ξ is exposed for Λ ( ) when there is a t Ξ such that Λ (y) Λ (x) > t(y x) for all y x. t is the exposing hyper-plane for x. In R 1, a point x is exposed if the curve Λ (y) lies strictly above the line of slope t through the point (x, Λ (x)). Similarly, in R d, the Λ (y) surface must lie strictly above the hyper-plane passing through (x, Λ (x)) with surface normal t. Since Λ (y) is convex, we could generally arrange this by making this the tangent hyper-plane, but we do not, yet, have any reason to think that the tangent is well-defined. Obviously, if Λ(t) exists, we can replace Λ ( ) by Λ ( ) in the definition and the rest of this paragraph. Definition 452 An exposed point x Ξ with exposing hyper-plane t is nice, x N, if lim ɛλ ɛ(t/ɛ) (34.5) exists, and, for some r > 1, Λ (rt) < (34.6) Note: Most books on large deviations do not give this property any particular name. Lemma 453 (Lower Bound in Gärtner-Ellis Theorem) If the X ɛ are exponentially tight, then, for any open set O Ξ, lim ɛ log P (X ɛ O) Λ (O N) (34.7)

CHAPTER 34. THE GÄRTNER-ELLIS THEOREM 236 Proof: If we pick any nice, exposed point x O, we can repeat the proof of the lower bound from Cramér s Theorem (429). In fact, the point of Definition 452 is to ensure this. Taking the supremum over all such x gives the lemma. Theorem 454 (Abstract Gärtner-Ellis Theorem) If the X ɛ are exponentially tight, and Λ (O E) = Λ (O) for all open sets O Ξ, then X ɛ obey an LDP with rate 1/ɛ and good rate function Λ (x). Proof: The large deviations upper bound is Lemma 447. The large deviations lower bound is implied by Lemma 453 and the additional hypothesis of the theorem. Matters can be made a bit more concrete in Euclidean space (the original home of the theorem), using, however, one or two more bits of convex analysis. Definition 455 (Relative Interior) The relative interior of a non-empty and convex set A is rinta {x A : y A, δ > 0, x δ(y x) A} (34.8) Notice that inta rinta, since the latter, in some sense, doesn t care about points outside of A. Definition 456 (Essentially Smooth) Λ is essentially smooth if (i) intd, (ii) Λ is differentiable on intd, and (iii) Λ is steep, meaning that if D has a boundary, D, then lim t D Λ(t) =. This definition was introduced to exploit the following theorem of convex analysis. Proposition 457 If Λ is essentially smooth, then Λ is continuous on rintd, and rintd N, the set of exposed, nice points. Proof: D is non-empty, because there is at least one point x 0 where Λ (x 0 ) = 0. Moreover, it is a convex set, because Λ is a convex function (Lemma 449), so it has a relative interior. Now appeal to Rockafellar (1970, Corollary 26.4.1). Remark: You might want to try to prove that Λ is continuous on rintd. Theorem 458 (Euclidean Gärtner-Ellis Theorem) Suppose X ɛ, taking values in R d, are exponentially tight, and that Λ(t) exists and is essentially smooth. Then X ɛ obey an LDP with rate 1/ɛ and good rate function Λ (x). Proof: From the abstract Gärtner-Ellis Theorem 454, it is enough to show that Λ (O N) = Λ (O), for any open set O. That is, we want x O N Λ (x) = x O Λ (x) (34.9)

CHAPTER 34. THE GÄRTNER-ELLIS THEOREM 237 Since it s automatic that what we need to show is that x O N Λ (x) x O Λ (x) (34.10) x O N Λ (x) x O Λ (x) (34.11) In view of Proposition 457, it s really just enough to show that Λ (O rintd ) Λ (O). This is trivial when the intersection O D is empty, so assume it isn t, and pick any x in that intersection. Because O is open, and because of the definition of the relative interior, we can pick any point y rintd, and, for sufficiently small δ, δy + (1 δ)x O rintd. Since Λ is convex, Λ (δy + (1 δ)x) δλ (y) + (1 δ)λ (x). Taking the limit as δ 0, and the claim follows by taking the imum over x. 34.2 Exercises Λ (O rintd ) Λ (x) (34.12) Exercise 34.1 Show that Cramér s Theorem (429), is a special case of the Euclidean Gärtner-Ellis Theorem (458). Exercise 34.2 Let Z 1, Z 2,... be IID random variables in a discrete space, and let X 1, X 2,... be the empirical distributions they generate. Use the Gärtner- Ellis Theorem to re-prove Sanov s Theorem (434). Can you extend this to the case where the Z i take values in an arbitrary Polish space? Exercise 34.3 Let Z 1, Z 2,... be values from a stationary ergodic Markov chain (i.e. the state space is discrete). Repeat the previous exercise for the pair measure. Again, can the result be extended to arbitrary Polish state spaces? Exercise 34.4 Let Z i be real-valued mean-zero stationary Gaussian variables, with i= cov (Z 0, Z i ) <. Let X t = t 1 t i=1 Z i. Show that these time averages obey an LDP with rate t and rate function x 2 /2Γ, where Γ = i= cov (Z 0, Z i ). (Cf. the mean-square ergodic theorem 246 of chapter 21.) If Z i are not Gaussian but are weakly stationary, find an additional hypothesis such that the time averages still obey an LDP. Exercise 34.5 (Too-Strong Dependence) Let X i = X, a random variable in R, for all i. Show that the Gärtner-Ellis Theorem fails, unless X is degenerate.