The Almost-Sure Ergodic Theorem

Similar documents
25.1 Ergodicity and Metric Transitivity

Austin Mohr Math 704 Homework 6

Building Infinite Processes from Finite-Dimensional Distributions

MATH 131A: REAL ANALYSIS (BIG IDEAS)

Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem

Section 11.1 Sequences

Math 410 Homework 6 Due Monday, October 26

General Theory of Large Deviations

x + 1/2 ; 0 x 1/2 f(x) = 2 2x ; 1/2 < x 1

Math212a1413 The Lebesgue integral.

Math LM (24543) Lectures 02

Large Deviations for Small-Noise Stochastic Differential Equations

Homework 4, 5, 6 Solutions. > 0, and so a n 0 = n + 1 n = ( n+1 n)( n+1+ n) 1 if n is odd 1/n if n is even diverges.

Alternate Characterizations of Markov Processes

Lebesgue Integration: A non-rigorous introduction. What is wrong with Riemann integration?

f(s)ds, i.e. to write down the general solution in

The Borel-Cantelli Group

MATH 101, FALL 2018: SUPPLEMENTARY NOTES ON THE REAL LINE

Large Deviations for Small-Noise Stochastic Differential Equations

Random Times and Their Properties

Lecture 4: Graph Limits and Graphons

2.1 Convergence of Sequences

REAL VARIABLES: PROBLEM SET 1. = x limsup E k

REAL ANALYSIS I Spring 2016 Product Measures

Measure and Integration: Solutions of CW2

F (x) = P [X x[. DF1 F is nondecreasing. DF2 F is right-continuous

MA103 Introduction to Abstract Mathematics Second part, Analysis and Algebra

h(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote

The function graphed below is continuous everywhere. The function graphed below is NOT continuous everywhere, it is discontinuous at x 2 and

Problem set 5, Real Analysis I, Spring, otherwise. (a) Verify that f is integrable. Solution: Compute since f is even, 1 x (log 1/ x ) 2 dx 1

Alternative Characterizations of Markov Processes

C.7. Numerical series. Pag. 147 Proof of the converging criteria for series. Theorem 5.29 (Comparison test) Let a k and b k be positive-term series

c 1 v 1 + c 2 v 2 = 0 c 1 λ 1 v 1 + c 2 λ 1 v 2 = 0

CHAPTER 7. Connectedness

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19

Slope Fields: Graphing Solutions Without the Solutions

Convex Optimization Notes

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

ORBITAL SHADOWING, INTERNAL CHAIN TRANSITIVITY

Spring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures

In N we can do addition, but in order to do subtraction we need to extend N to the integers

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall.

In N we can do addition, but in order to do subtraction we need to extend N to the integers

5 Birkhoff s Ergodic Theorem

Math 140: Foundations of Real Analysis. Todd Kemp

First-Order Ordinary Differntial Equations II: Autonomous Case. David Levermore Department of Mathematics University of Maryland.

2 Measure Theory. 2.1 Measures

Introductory Analysis 2 Spring 2010 Exam 1 February 11, 2015

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution.

1 Math 241A-B Homework Problem List for F2015 and W2016

Lecture Notes Introduction to Ergodic Theory

Measures and Measure Spaces

Jim Lambers MAT 460 Fall Semester Lecture 2 Notes

MONOTONICALLY COMPACT AND MONOTONICALLY

A Few Examples of Limit Proofs

1 Functions of Several Variables 2019 v2

Lebesgue Measure on R n

Filters in Analysis and Topology

Notes for Functional Analysis

MATH 117 LECTURE NOTES

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

4 Countability axioms

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

Supremum and Infimum

Comparison of Orlicz Lorentz Spaces

EC 521 MATHEMATICAL METHODS FOR ECONOMICS. Lecture 1: Preliminaries

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES

Math 104: Homework 7 solutions

f(x)dx = lim f n (x)dx.

DOUBLE SERIES AND PRODUCTS OF SERIES

A dyadic endomorphism which is Bernoulli but not standard

Combinatorics in Banach space theory Lecture 12

1.5 Approximate Identities

FUNCTIONAL ANALYSIS CHRISTIAN REMLING

Topics in Stochastic Geometry. Lecture 4 The Boolean model

Examples of Dual Spaces from Measure Theory

4. Conditional risk measures and their robust representation

Problem List MATH 5143 Fall, 2013

1 Linear transformations; the basics

Theorem 5.3. Let E/F, E = F (u), be a simple field extension. Then u is algebraic if and only if E/F is finite. In this case, [E : F ] = deg f u.

g 2 (x) (1/3)M 1 = (1/3)(2/3)M.

Propagating terraces and the dynamics of front-like solutions of reaction-diffusion equations on R

Ergodic Theorems. 1 Strong laws of large numbers

7.1 Coupling from the Past

Problem set 1, Real Analysis I, Spring, 2015.

Abstract. 2. We construct several transcendental numbers.

Proofs for Large Sample Properties of Generalized Method of Moments Estimators

FUNDAMENTALS OF REAL ANALYSIS by. II.1. Prelude. Recall that the Riemann integral of a real-valued function f on an interval [a, b] is defined as

1 The Glivenko-Cantelli Theorem

Measure Theory and Lebesgue Integration. Joshua H. Lifton

REAL ANALYSIS LECTURE NOTES: 1.4 OUTER MEASURE

LECTURE 16: TENSOR PRODUCTS

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Defining the Integral

FIRST YEAR CALCULUS W W L CHEN

1 Lyapunov theory of stability

Lyapunov Stability Theory

Now, the above series has all non-negative terms, and hence is an upper bound for any fixed term in the series. That is to say, for fixed n 0 N,

Transcription:

Chapter 24 The Almost-Sure Ergodic Theorem This chapter proves Birkhoff s ergodic theorem, on the almostsure convergence of time averages to expectations, under the assumption that the dynamics are asymptotically mean stationary. This is not the usual proof of the ergodic theorem, as you will find in e.g. Kallenberg. Rather, it uses the AMS machinery developed in the last lecture, following Gray (1988, sec. 7.2), in turn following Katznelson and Weiss (1982). The central idea is that of blocking : break the infinite sequence up into nonoverlapping blocks, show that each block is well-behaved, and conclude that the whole sequence is too. This is a very common technique in modern ergodic theory, especially among information theorists. In pure probability theory, the usual proof of the ergodic theorem uses a result called the maximal ergodic lemma, which is clever but somewhat obscure, and doesn t seem to generalize well to non-stationary processes: see Kallenberg, ch. 10. We saw, at the end of the last chapter, that if time-averages converge in the long run, they converge on conditional expectations. Our work here is showing that they (almost always) converge. We ll do this by showing that their lim infs and lim sups are (almost always) equal. This calls for some preliminary results about the upper and lower limits of time-averages. Definition 335 (Lower and Upper Limiting Time Averages) For any observable f, define the lower and upper limits of its time averages as, respectively, Af(x) lim inf tf(x) t (24.1) Af(x) lim sup A t f(x) (24.2) t Define L f as the set of x where the limits coincide: L f { x Af(x) = Af(x) } (24.3) 198

CHAPTER 24. THE ALMOST-SURE ERGODIC THEOREM 199 Lemma 336 (Lower and Upper Limiting Time Averages are Invariant) For each obsevable f, Af and Af are invariant functions. Proof: Use our favorite trick, and write A t f(t x) = t+1 t A t+1 f(x) f(x)/t. Clearly, the lim sup and lim inf of this expression will equal the lim sup and lim inf of A t+1 f(x), which is the same as that of A t f(x). Lemma 337 ( Limits Coincide is an Invariant Event) For each f, the set L f is invariant. Proof: Since Af and Af are both invariant, they are both measurable with respect to I (Lemma 304), so the set of x such that Af(x) = Af(x) is in I, therefore it is invariant (Definition 303). Lemma 338 (Ergodic properties under AMS measures) An observable f has the ergodic property with respect to an AMS measure µ if and only if it has it with respect to the stationary limit m. Proof: By Lemma 337, L f is an invariant set. But then, by Lemma 329, m(l f ) = µ(l f ). (Take f = 1 Lf in the lemma.) f has the ergodic property with respect to µ iff µ(l f ) = 1, so f has the ergodic property with respect to µ iff it has it with respect to m. Theorem 339 (Birkhoff s Almost-Sure Ergodic Theorem) If a dynamical system is AMS with stationary mean m, then all bounded observables have the ergodic property, and with probability 1 (under both µ and m), for all f L 1 (m). Af = E m [f I (24.4) Proof: From Theorem 333 and its corollaries, it is enough to prove that all L 1 (m) observables have ergodic properties to get Eq. 24.4. From Lemma 338, it is enough to show that the observables have ergodic properties in the stationary system Ξ, X, m, T. (Accordingly, all expectations in the rest of this proof will be with respect to m.) Since any observable can be decomposed into its positive and negative parts, f = f + f, assume, without loss of generality, that f is positive. Since Af Af everywhere, it suffices to show that E [ Af Af 0. This in turn will follow from E [ Af E [f E [Af. (Since f is bounded, the integrals exist.) We ll prove that E [ Af E [f, by showing that the time average comes close to its lim sup, but from above (in the mean). Proving that E [Af E [f will be entirely parallel. Since f is bounded, we may assume that f M everywhere. For every ɛ > 0, for every x there exists a finite t such that A t f(x) f(x) ɛ (24.5)

CHAPTER 24. THE ALMOST-SURE ERGODIC THEOREM 200 This is because f is the limit of the least upper bounds. (You can see where this is going already the time-average has to be close to its lim sup, but close from above.) Define t(x, ɛ) to be the smallest t such that f(x) ɛ + A t f(x). Then, since f is invariant, we can add from from time 0 to time t(x, ɛ) 1 and get: t(x,ɛ) 1 K n f(x) + ɛt(x, ɛ) t(x,ɛ) 1 K n f(x) (24.6) Define B N {x t(x, ɛ) N}, the set of bad x, where the sample average fails to reach a reasonable (ɛ) distance of the lim sup before time N. Because t(x, ɛ) is finite, m(b N ) goes to zero as N. Chose a N such that m(b N ) ɛ/m, and, for the corresponding bad set, drop the subscript. (We ll see why this level is important presently.) We ll find it convenient to not deal directly with f, but with a related function which is better-behaved on the bad set B. Set f(x) = M when x B, and = f(x) elsewhere. Similarly, define t(x, ɛ) to be 1 if x B, and t(x, ɛ) elsewhere. Notice that t(x, ɛ) N for all x. Something like Eq. 24.6 still holds for the nice-ified function f, specifically, t(x,ɛ) 1 K n f(x) t(x,ɛ) 1 K n f(x) + ɛ t(x, ɛ) (24.7) If x B, this reduces to f(x) M + ɛ, which is certainly true because f(x) M. If x B, it will follow from Eq. 24.6, provided that T n x B, for all n t(x, ɛ) 1. To see that this, in turn, must be true, suppose that T n x B for some such n. Because (we re assuming) n < t(x, ɛ), it must be the case that A n f(x) < f(x) ɛ (24.8) Otherwise, t(x, ɛ) would not be the first time at which Eq. 24.5 held true. Similarly, because T n x B, while x B, t(t n x, ɛ) > N t(x, ɛ), and so Combining the last two displayed equations, A t(x,ɛ) n f(t n x) < f(x) ɛ (24.9) A t(x,ɛ) f(x) < f(x) ɛ (24.10) contradicting the definition of t(x, ɛ). Consequently, there can be no n < t(x, ɛ) such that T n x B. We are now ready to consider the time average A L f over a stretch of time of some considerable length L. We ll break the time indices over which we re averaging into blocks, each block ending when T t x hits B again. We need to make sure that L is sufficiently large, and it will turn out that L N/(ɛ/M) suffices, so that NM/L ɛ. The end-points of the blocks are defined recursively,

CHAPTER 24. THE ALMOST-SURE ERGODIC THEOREM 201 starting with b 0 = 0, b k+1 = b k + t(t b k x, ɛ). (Of course the b k are implicitly dependent on x and ɛ and N, but suppress that for now, since these are constant through the argument.) The number of completed blocks, C, is the largest k such that L 1 b k. Notice that L b C N, because t(x, ɛ) N, so if L b C > N, we could squeeze in another block after b C, contradicting its definition. Now let s examine the sum of the lim sup over the trajectory of length L. K n f(x) = C k=1 b k n=b k 1 K n f(x) + For each term in the inner sum, we may assert that t(t b k x,ɛ) 1 K n f(t b k x) t(t b k x,ɛ) 1 K n f(x) (24.11) K n f(t b k x) + ɛ t(t b k x, ɛ) (24.12) on the strength of Equation 24.7, so, returning to the over-all sum, K n f(x) C k=1 b k 1 n=b k 1 K n f(x) + ɛ(bk b k 1 ) + K n f(x)(24.13) b = ɛb C + K n f(x) + K n f(x) (24.14) b ɛb C + K n f(x) + M (24.15) b ɛb C + M(L 1 b C ) + K n f(x) (24.16) b ɛb C + M(N 1) + K n f(x) (24.17) ɛl + M(N 1) + K n f(x) (24.18) where the last step, going from b C to L, uses the fact that both ɛ and f are non-negative. Taking expectations of both sides, [ [ E K n f(x) E ɛl + M(N 1) + K n f(x) (24.19) E [ K n f(x) ɛl + M(N 1) + E [K n f(x) LE [ f(x) [ ɛl + M(N 1) + LE f(x) (24.20) (24.21)

CHAPTER 24. THE ALMOST-SURE ERGODIC THEOREM 202 using the fact that f is invariant on the left-hand side, and that m is stationary on the other. Now divide both sides by L. E [ f(x) M(N 1) ɛ + [ L 2ɛ + E f(x) [ + E f(x) [ since MN/L ɛ. Now let s bound E f(x) in terms of E [f: [ E f (24.22) (24.23) = f(x)dm (24.24) = f(x)dm + f(x)dm (24.25) B c B = f(x)dm + Mdm (24.26) B c B E [f + Mdm (24.27) B = E [f + Mm(B) (24.28) E [f + M ɛ M (24.29) = E [f + ɛ (24.30) using the definition of f in Eq. 24.26, the non-negativity of f in Eq. 24.27, and the bound on m(b) in Eq. 24.29. Substituting into Eq. 24.23, E [ f E [f + 3ɛ (24.31) Since ɛ can be made arbitrarily small, we conclude that E [ f E [f (24.32) as was to be shown. The proof of E [ f E [f proceeds in parallel, only the nice-ified function f is set equal to 0 on the bad set. Since E [ f E [f E [ f, we have that E [ f f 0. Since however it is always true that f f 0, we may conclude that f f = 0 m-almost everywhere. Thus m(l f ) = 1, i.e., the time average converges m-almost everywhere. Since this is an invariant event, it has the same measure under µ and its stationary limit m, and so the time average converges µ-almost-everywhere as well. By Corollary 334, Af = E m [f I, as promised. Corollary 340 (Birkhoff s Ergodic Theorem for Integrable Observables) Under the assumptions of Theorem 339, all L 1 (m) functions have ergodic properties, and Eq. 24.4 holds a.e. m and µ.

CHAPTER 24. THE ALMOST-SURE ERGODIC THEOREM 203 Proof: We need merely show that the ergodic properties hold, and then the equation follows. To do so, define f M (x) f(x) M, an upper-limited version of the lim sup. Reasoning entirely parallel to the proof of Theorem 339 leads to the conclusion that E [ f M E [f. Then let M, and apply the monotone convergence theorem to conclude that E [ f E [f; the rest of the proof goes through as before.