Lecture 10

1 Ergodic decomposition of invariant measures

Let $T : (\Omega, \mathcal{F}) \to (\Omega, \mathcal{F})$ be measurable, and let $\mathcal{M}$ denote the space of $T$-invariant probability measures on $(\Omega, \mathcal{F})$. Then $\mathcal{M}$ is a convex set, although it might be empty. We will show that any measure $\mu \in \mathcal{M}$ can be decomposed as a mixture of extremal elements of $\mathcal{M}$, which are exactly the ergodic measures for $T$.

Theorem 1.1 [Ergodicity and extremality] A probability measure $\mu$ on $(\Omega, \mathcal{F})$ is ergodic for $T$ if and only if it is an extremal point of $\mathcal{M}$.

Proof. If $\mu \in \mathcal{M}$ is not ergodic, then there exists an invariant set $A \in \mathcal{F}$ for $T$ with $\mu(A) \in (0, 1)$. Let $\mu_A$ (resp. $\mu_{A^c}$) denote the restriction of $\mu$ to $A$ (resp. $A^c$), normalized to be a probability measure, i.e.,
$$\mu_A(\cdot) = \frac{\mu(A \cap \cdot)}{\mu(A)}.$$
Then $\mu_A$ and $\mu_{A^c}$ are distinct invariant probability measures for $T$, and $\mu = \alpha \mu_A + (1-\alpha)\mu_{A^c}$, where $\alpha = \mu(A) \in (0,1)$, which shows that $\mu$ is not extremal.

Conversely, if $\mu \in \mathcal{M}$ is not extremal, then $\mu = \alpha \mu_1 + (1-\alpha)\mu_2$ for some $\alpha \in (0,1)$ and distinct $\mu_1, \mu_2 \in \mathcal{M}$. If $\mu$ were ergodic, then by the ergodic theorem, for any bounded measurable $f$ on $(\Omega, \mathcal{F})$,
$$A_n f(\omega) := \frac{f(\omega) + f(T\omega) + \cdots + f(T^{n-1}\omega)}{n} \longrightarrow \mathbb{E}_\mu[f] \qquad \mu\text{-a.s. and in } L^1(\Omega, \mathcal{F}, \mu).$$
In particular, since $\mu_1$ and $\mu_2$ are absolutely continuous w.r.t. $\mu$, $A_n f(\omega)$ also converges to $\mathbb{E}_\mu[f]$ almost surely w.r.t. $\mu_1$ (resp. $\mu_2$), and hence $\mathbb{E}_{\mu_1}[f] = \mathbb{E}_{\mu_2}[f] = \mathbb{E}_\mu[f]$. Since $f$ is an arbitrary bounded measurable function, this implies that $\mu_1 = \mu_2 = \mu$, a contradiction. Therefore, given that $\mu$ is non-extremal, it cannot be ergodic.

By applying the ergodic theorem to suitable test functions, one can prove:

Lemma 1.1 [Singularity of ergodic measures] Distinct ergodic measures $\mu_1, \mu_2 \in \mathcal{M}$ are mutually singular. More specifically, there exists a set $A$ in the invariant $\sigma$-field $\mathcal{I}$ such that $\mu_1(A) = \mu_2(A^c) = 1$.

Choquet's theorem (see Lax [2, Section 13.4]) provides a decomposition of any point of a metrizable compact convex subset $K$ of a locally convex topological vector space in terms of the extremal points of $K$. Since the set of invariant probability measures $\mathcal{M}$ in general may not be compact, we will not appeal to Choquet's theorem. Instead, we will assume that $\Omega$ is a complete separable metric space with Borel $\sigma$-algebra $\mathcal{F}$, and appeal to the existence of regular conditional probability distributions.
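To make Theorem 1.1 concrete, here is a minimal numerical sketch on a four-point system of our own choosing (not from the notes): $T$ swaps $0 \leftrightarrow 1$ and $2 \leftrightarrow 3$, the two ergodic invariant measures are uniform on $\{0,1\}$ and on $\{2,3\}$, and any strict mixture of them is invariant but not ergodic, since $A = \{0,1\}$ is an invariant set of intermediate measure.

```python
# Toy example for Theorem 1.1 (our own example, not from the notes):
# Omega = {0,1,2,3}, T swaps 0<->1 and 2<->3. The ergodic invariant
# measures are mu1 = uniform on {0,1} and mu2 = uniform on {2,3};
# mix = a*mu1 + (1-a)*mu2 is invariant but not ergodic for a in (0,1).

T = {0: 1, 1: 0, 2: 3, 3: 2}

def is_invariant(mu):
    # mu is T-invariant iff mu(T^{-1}{y}) = mu({y}) for every y
    return all(abs(mu[y] - sum(m for x, m in mu.items() if T[x] == y)) < 1e-12
               for y in mu)

def ergodic_average(f, omega, n):
    # A_n f(omega) = (1/n) * sum_{k<n} f(T^k omega)
    total, x = 0.0, omega
    for _ in range(n):
        total += f(x)
        x = T[x]
    return total / n

mu1 = {0: 0.5, 1: 0.5, 2: 0.0, 3: 0.0}
mu2 = {0: 0.0, 1: 0.0, 2: 0.5, 3: 0.5}
a = 0.3
mix = {x: a * mu1[x] + (1 - a) * mu2[x] for x in T}

f = lambda x: x  # a test function
print(is_invariant(mu1), is_invariant(mu2), is_invariant(mix))  # True True True
print(ergodic_average(f, 0, 1000))  # 0.5 = E_{mu1}[f]: orbit stays in {0,1}
print(ergodic_average(f, 2, 1000))  # 2.5 = E_{mu2}[f]: orbit stays in {2,3}
# The limit of A_n f depends on the starting component, so it is not
# mix-a.s. constant: mix is invariant but not ergodic.
```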

Theorem 1.2 [Ergodic decomposition] Let $\Omega$ be a complete separable metric space with Borel $\sigma$-algebra $\mathcal{F}$. Let $T$ be a measurable transformation on $(\Omega, \mathcal{F})$, let $\mathcal{M}$ denote the set of probability measures on $(\Omega, \mathcal{F})$ invariant w.r.t. $T$, and let $\mathcal{M}_e \subset \mathcal{M}$ denote the set of ergodic measures. Then for any $\mu \in \mathcal{M}$, there exists a probability measure $\rho_\mu$ on $\mathcal{M}_e$ such that
$$\mu = \int_{\mathcal{M}_e} \nu \, \rho_\mu(d\nu). \qquad (1.1)$$

Remark. The $\sigma$-algebra we use for defining $\rho_\mu$ on $\mathcal{M}_e$ is the Borel $\sigma$-algebra induced by the weak topology, i.e., $\mu_n \to \mu$ w.r.t. the weak topology if and only if for all bounded continuous functions $f : \Omega \to \mathbb{R}$, we have $\int f \, d\mu_n \to \int f \, d\mu$. Such convergence of probability measures on $(\Omega, \mathcal{F})$ is called weak convergence.

Proof. Since $(\Omega, \mathcal{F})$ is Polish, there exists a regular conditional probability $\mu_\omega$ of $\mu$ conditioned on the invariant $\sigma$-field $\mathcal{I}$. Provided we can show that $\mu_\omega$ is ergodic $\mu$-a.s., we can regard $\omega \mapsto \mu_\omega$ as a map from $\Omega$ to $\mathcal{M}_e$, and denote the distribution of $\mu_\omega$ by $\rho_\mu$. The decomposition (1.1) then follows readily.

We now verify that $\mu_\omega(\cdot) := \mu(\cdot \mid \mathcal{I})(\omega)$ is ergodic $\mu$-a.s. First we show invariance, i.e., $\mu$-a.s.,
$$\mu_\omega(A) = \mu_\omega(T^{-1}A) \qquad \forall \, A \in \mathcal{F}. \qquad (1.2)$$
A priori, there are uncountably many sets in $\mathcal{F}$, and the exceptional null sets may pile up. However, by our assumption that $(\Omega, \mathcal{F})$ is Polish, $\mathcal{F}$ can be generated by a countable collection of sets $\mathcal{F}_0$, and since $\mu_\omega$ is a.s. a probability measure, it suffices to verify (1.2) for $A \in \mathcal{F}_0$. Since $\mu_\omega(\cdot) = \mu(\cdot \mid \mathcal{I})$, given $A \in \mathcal{F}_0$, we have $\mu_\omega(A) = \mu_\omega(T^{-1}A)$ a.s. (i.e., $\mu(A \mid \mathcal{I}) = \mu(T^{-1}A \mid \mathcal{I})$ a.s.) if and only if
$$\mu(A \cap E) = \mu(T^{-1}A \cap E) \qquad \forall \, E \in \mathcal{I},$$
which holds since $E \in \mathcal{I}$ implies that $\mu(E \,\triangle\, T^{-1}E) = 0$, and by the invariance of $\mu$, $\mu(A \cap E) = \mu(T^{-1}(A \cap E)) = \mu(T^{-1}A \cap T^{-1}E)$. This proves the a.s. invariance of $\mu_\omega$ for $T$.

For the a.s. ergodicity of $\mu_\omega$, it suffices to show that for $\mu$-a.e. $\omega$ and every $A \in \mathcal{F}$,
$$A_n 1_A(\omega') := \frac{1_A(\omega') + 1_A(T\omega') + \cdots + 1_A(T^{n-1}\omega')}{n} \longrightarrow \mu_\omega(A) \qquad \text{a.s. w.r.t. } \mu_\omega. \qquad (1.3)$$
Approximating $A \in \mathcal{F}$ by sets that are finitely generated from $\mathcal{F}_0$, it suffices to verify (1.3) for $A \in \mathcal{F}_0$. For such an $A$, the ergodic theorem applied to $1_A$ w.r.t. $\mu$ implies that $A_n 1_A(\omega) \to \mu(A \mid \mathcal{I}) = \mu_\omega(A)$ a.s. w.r.t. $\mu$. Since $\mu_\omega$ is the regular conditional probability of $\mu$ given $\mathcal{I}$, (1.3) must hold.

2 Structure of stationary Markov chains

We now apply the ergodic decomposition theorem for stationary measures to stationary Markov chains. Let $\Pi(x, dy)$ be a transition probability kernel on the state space $(S, \mathcal{S})$; in this section, we will consider a general Polish space $(S, \mathcal{S})$. A Markov process $(X_n)_{n \in \mathbb{N}}$ is stationary if and only if its marginal distribution $\mu$ is stationary for $\Pi$. More precisely,
$$\mu \in \mathcal{M} := \Big\{ \nu : \nu(S) = 1, \ \nu(A) = \int \Pi(x, A)\,\nu(dx) \ \forall \, A \in \mathcal{S} \Big\}.$$
Given marginal law $\mu \in \mathcal{M}$, we can embed the stationary Markov process $(X_n)_{n \in \mathbb{N}}$ in a doubly infinite stationary sequence $(X_n)_{n \in \mathbb{Z}}$. The process $(X_n)_{n \in \mathbb{Z}}$ can be regarded as a random

variable taking values in the sequence space $(S^{\mathbb{Z}}, \mathcal{S}^{\mathbb{Z}})$, where $\mathcal{S}^{\mathbb{Z}}$ denotes the product $\sigma$-algebra on the product space $S^{\mathbb{Z}}$. Given marginal law $\mu \in \mathcal{M}$, let $P_\mu$ denote the law of $(X_n)_{n \in \mathbb{Z}}$ on $(S^{\mathbb{Z}}, \mathcal{S}^{\mathbb{Z}})$. Let $T$ denote the coordinate shift map on $S^{\mathbb{Z}}$. Then each $\mu \in \mathcal{M}$ determines a $P_\mu \in \widetilde{\mathcal{M}}$, where $\widetilde{\mathcal{M}}$ is the family of probability measures on $(S^{\mathbb{Z}}, \mathcal{S}^{\mathbb{Z}})$ invariant for the shift map $T$. Our goal is to show that the ergodic components of a stationary Markov process $P_\mu$ are stationary Markov processes $P_\nu$ with $\nu \in \mathcal{M}_e$, where the $\nu$ are the extremal components of $\mu$ in $\mathcal{M}$. (Note that in general, the ergodic decomposition of a stationary process gives ergodic processes which need not be Markov, so this is not automatic.)

Theorem 2.1 [Ergodic decomposition of stationary Markov processes] Given $\mu \in \mathcal{M}$, $P_\mu$ is ergodic for the shift map $T$ if and only if $\mu \in \mathcal{M}_e$, i.e., $\mu$ is extremal in the family $\mathcal{M}$ of invariant measures for the Markov chain. Furthermore, for any $\mu \in \mathcal{M}$, there exists a probability measure $\rho_\mu$ on $\mathcal{M}_e$ such that
$$\mu = \int \nu \, \rho_\mu(d\nu) \qquad \text{and} \qquad P_\mu = \int P_\nu \, \rho_\mu(d\nu). \qquad (2.1)$$
The extremal elements of $\mathcal{M}$ are called the extremal or ergodic invariant measures. When $\mathcal{M}$ is a singleton, we say the Markov chain is ergodic.

Proof. If $\mu \in \mathcal{M}$ is not extremal, then neither is $P_\mu$ extremal in $\widetilde{\mathcal{M}}$, which by Theorem 1.1 is equivalent to $P_\mu$ not being ergodic. The key to proving the converse is the following result.

Lemma 2.1 Let $\mu \in \mathcal{M}$, and let $\mathcal{I}$ be the invariant $\sigma$-field on $(S^{\mathbb{Z}}, \mathcal{S}^{\mathbb{Z}})$ for the shift map $T$ and the measure $P_\mu$ (note that we define $\mathcal{I}$ modulo sets of $P_\mu$-measure 0). Then modulo sets of $P_\mu$-measure 0, $\mathcal{I} \subset \mathcal{F}_0^0$, where $\mathcal{F}_m^n := \sigma(x_m, x_{m+1}, \ldots, x_n)$ on $S^{\mathbb{Z}} = \{(x_i)_{i \in \mathbb{Z}} : x_i \in S\}$.

Proof. The lemma asserts that, for any $E \in \mathcal{I}$, there exists $A \in \mathcal{S}$ such that $E = \{(x_n)_{n \in \mathbb{Z}} : x_0 \in A\}$ modulo sets of $P_\mu$-measure zero. The proof relies on the fact that invariant sets lie both in the infinite future $\mathcal{F}_\infty := \bigcap_n \mathcal{F}_n^\infty$ and in the infinite past $\mathcal{F}_{-\infty} := \bigcap_n \mathcal{F}_{-\infty}^{-n}$, and that the past and the future of a Markov process are independent conditioned on the present. Thus for $E \in \mathcal{I}$,
$$P_\mu[E \mid \mathcal{F}_0^0] = P_\mu[E \cap E \mid \mathcal{F}_0^0] = P_\mu[E \mid \mathcal{F}_0^0]^2.$$
Therefore $P_\mu[E \mid \mathcal{F}_0^0] \in \{0, 1\}$ $\mu$-a.s. Since $P_\mu[E \mid \mathcal{F}_0^0]$ is a function of $x_0$, let $A \in \mathcal{S}$ be the set of values of $x_0$ on which $P_\mu[E \mid \mathcal{F}_0^0] = 1$ a.s. Then by the invariance of $E$ under the shift $T$, we have $E = A^{\mathbb{Z}} := \{(x_n)_{n \in \mathbb{Z}} \in S^{\mathbb{Z}} : x_n \in A \ \forall \, n \in \mathbb{Z}\}$ modulo sets of $P_\mu$-measure zero, while $E^c = (A^c)^{\mathbb{Z}}$. In particular, for $\mu$-almost all $x \in S$, if $x \in A$ (resp. $x \in A^c$), then the Markov chain starting at $x$ never leaves $A$ (resp. $A^c$). Therefore, $E = \{(x_n)_{n \in \mathbb{Z}} \in S^{\mathbb{Z}} : x_0 \in A\}$ modulo sets of $P_\mu$-measure zero, which proves the lemma.

With Lemma 2.1, we can conclude the proof of Theorem 2.1. Suppose that $P_\mu$ is not ergodic. Then $P_\mu$ is a nontrivial mixture of the measures $P_\mu[\,\cdot \mid \mathcal{I}]$, which are ergodic measures on $(S^{\mathbb{Z}}, \mathcal{S}^{\mathbb{Z}})$. Since $\mathcal{I} \subset \mathcal{F}_0^0$ by Lemma 2.1, the $P_\mu[\,\cdot \mid \mathcal{I}]$ are almost surely mixtures of $P_\mu[\,\cdot \mid \mathcal{F}_0^0]$, which are laws of the Markov chain with specified value at time 0. Hence the $P_\mu[\,\cdot \mid \mathcal{I}]$ are stationary Markov processes with marginal laws in $\mathcal{M}$, and $\mu$ is a mixture of these marginal laws, which means that $\mu$ is not extremal in $\mathcal{M}$. The same reasoning also allows us to deduce (2.1) from the ergodic decomposition of $P_\mu$.

Remark. Note that extremal measures in $\mathcal{M}$ must be singular w.r.t. each other, since by Theorem 2.1 the associated Markov processes are ergodic, and hence mutually singular.
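Theorem 2.1 can be illustrated on a small chain of our own devising (not from the notes): a block-diagonal transition matrix on $\{0,1,2,3\}$ with closed classes $U_1 = \{0,1\}$ and $U_2 = \{2,3\}$. Each block has a unique stationary law ($\nu_1$, $\nu_2$); these are the extremal points of $\mathcal{M}$, and every invariant $\mu$ is a mixture of them.

```python
# Toy illustration of Theorem 2.1 (our own example, not from the notes):
# a block-diagonal chain with closed classes U1 = {0,1}, U2 = {2,3}.
import random

P = [[0.2, 0.8, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.9, 0.1],
     [0.0, 0.0, 0.3, 0.7]]

def is_stationary(nu, tol=1e-9):
    # nu is stationary iff (nu P)(y) = nu(y) for every state y
    return all(abs(sum(nu[x] * P[x][y] for x in range(4)) - nu[y]) < tol
               for y in range(4))

nu1 = [5/13, 8/13, 0, 0]   # stationary law of the U1 block
nu2 = [0, 0, 3/4, 1/4]     # stationary law of the U2 block
a = 0.6
mu = [a * p + (1 - a) * q for p, q in zip(nu1, nu2)]
print(is_stationary(nu1), is_stationary(nu2), is_stationary(mu))  # True True True

def occupation(x0, n, rng):
    # empirical occupation frequencies along one trajectory of length n
    freq, x = [0.0] * 4, x0
    for _ in range(n):
        freq[x] += 1 / n
        x = rng.choices(range(4), weights=P[x])[0]
    return freq

rng = random.Random(0)
# A path started from mu is trapped in one closed class; its empirical
# frequencies approach nu1 or nu2, never mu itself. Thus P_mu is a
# nontrivial mixture of the ergodic Markov processes P_nu1 and P_nu2.
print(occupation(0, 50000, rng))  # close to nu1
print(occupation(2, 50000, rng))  # close to nu2
```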

Remark. A sufficient condition to guarantee the uniqueness of a stationary distribution (if it exists) for a Markov chain is some form of irreducibility. If $\mathcal{M}$ is not a singleton, then we can find two extremal invariant measures with disjoint supports $U_1$ and $U_2$ in the state space, such that the Markov chain makes no transitions between $U_1$ and $U_2$. Any irreducibility condition that rules out such a partition of the state space guarantees the existence of at most one stationary distribution. One such condition is that $\Pi(x, dy)$ has a positive density $p(x, y)$ w.r.t. a common reference measure $\alpha(dy)$ for all $x$ in the state space.

3 Harris chains

So far we have studied mostly countable state Markov chains, although the ergodic decomposition of stationary Markov chains was developed for a general Polish space. We now briefly discuss the theory of general state space Markov chains. One class of Markov chains that admits a similar treatment to the countable state space case is the so-called Harris chains.

Definition 3.1 (Harris chains) A Markov chain $(X_n)_{n \geq 0}$ with state space $(S, \mathcal{S})$ and transition kernel $\Pi(\cdot, \cdot)$ is called a Harris chain if there exist $A, B \in \mathcal{S}$, $\epsilon > 0$, and a probability measure $\rho$ with $\rho(B) = 1$ such that:
(i) If $\tau_A := \inf\{n \geq 0 : X_n \in A\}$, then $P_z(\tau_A < \infty) > 0$ for all $z \in S$.
(ii) If $x \in A$, then $\Pi(x, C) \geq \epsilon \rho(C)$ for all $C \in \mathcal{S}$ with $C \subset B$.

The conditions of a Harris chain allow us to construct an equivalent Markov chain $\bar{X}$ with state space $\bar{S} := S \cup \{\alpha\}$ and $\sigma$-algebra $\bar{\mathcal{S}} := \{C, C \cup \{\alpha\} : C \in \mathcal{S}\}$, where $\alpha$ is an artificial atom that the chain $\bar{X}$ will visit. More precisely, define $\bar{X}$ with transition probability kernel $\bar{\Pi}$ such that:
If $x \in S \setminus A$, then $\bar{\Pi}(x, C) = \Pi(x, C)$ for $C \in \mathcal{S}$;
If $x \in A$, then $\bar{\Pi}(x, \{\alpha\}) = \epsilon$, and $\bar{\Pi}(x, C) = \Pi(x, C) - \epsilon \rho(C)$ for $C \in \mathcal{S}$;
If $x = \alpha$, then $\bar{\Pi}(\alpha, D) = \int \rho(dx)\, \bar{\Pi}(x, D)$ for $D \in \bar{\mathcal{S}}$.
$\bar{X}_n$ being in the state $\alpha$ corresponds to $X_n$ being distributed as $\rho$ on $B$. This correspondence allows us to pass from the distribution of $\bar{X}$ to that of $X$ and vice versa.
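The splitting construction above can be simulated directly. In the following sketch, the concrete kernel is a toy of our own choosing (not from the notes): on $S = [0,1]$, the chain moves uniformly on $[0,1]$ with probability $1/2$ and to $x/2$ otherwise, so $\Pi(x, C) \geq \epsilon \rho(C)$ for all $x$ with $\epsilon = 1/2$, $\rho = \mathrm{Uniform}[0,1]$, and $A = B = S$; the residual kernel $\Pi(x, \cdot) - \epsilon \rho(\cdot)$ is then the point mass at $x/2$ with mass $1/2$.

```python
# A minimal simulation of the split chain X-bar (toy kernel of our own,
# not from the notes). Since the minorization holds on the whole space,
# every step moves to the atom alpha with probability eps = 1/2, and
# otherwise follows the normalized residual kernel (here x -> x/2).
import random

EPS = 0.5
ALPHA = "alpha"  # the artificial atom

def split_step(state, rng):
    # One step of Pi-bar: from alpha, first draw x ~ rho = Uniform[0,1]
    # (being at alpha corresponds to X ~ rho); then from x in A, go to
    # alpha w.p. eps, else to the residual move x/2.
    if state == ALPHA:
        state = rng.random()
    return ALPHA if rng.random() < EPS else state / 2

def return_time_to_atom(rng):
    # tau_alpha = inf{n >= 1 : X-bar_n = alpha}, started from alpha
    state, n = ALPHA, 0
    while True:
        state, n = split_step(state, rng), n + 1
        if state == ALPHA:
            return n

rng = random.Random(1)
samples = [return_time_to_atom(rng) for _ in range(20000)]
# Here tau_alpha is geometric with success probability eps, so
# E_alpha[tau_alpha] = 1/eps = 2: this split chain is positive recurrent.
print(sum(samples) / len(samples))  # approximately 2.0
```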
Having a macroscopic atom $\alpha$ allows us to define transience, recurrence, and periodicity, to use the cycle trick to construct stationary measures for recurrent Harris chains, and to use coupling to prove convergence of positive recurrent Harris chains to their unique stationary distribution.

Definition 3.2 (Recurrence, transience, and periodicity) Let $\tau_\alpha := \inf\{n \geq 1 : \bar{X}_n = \alpha\}$. $\bar{X}$ is called a recurrent Harris chain if $P_\alpha(\tau_\alpha < \infty) = 1$, and transient otherwise. The g.c.d. $d$ of $D := \{n \geq 1 : P_\alpha(\bar{X}_n = \alpha) > 0\}$ is called the period of the Harris chain, with $d = 1$ corresponding to aperiodicity.

Note that Definition 3.1 (i) guarantees that $P_x(\tau_\alpha < \infty) > 0$ for all $x \in \bar{S}$, which is a form of irreducibility for the chain $\bar{X}$. The theory we developed for countable state Markov chains can be adapted to Harris chains; see e.g. [1] for more details.
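Recurrence of a chain toward a small set is often verified through a Lyapunov (drift) function, as in the sufficient conditions below. The following sketch, a toy reflected random walk of our own choosing (not from the notes), checks such a drift inequality and the resulting finite expected return time numerically.

```python
# Drift-condition check (toy example of our own): the reflected walk
# X_{n+1} = max(X_n + xi, 0) with P(xi=+1) = 0.3, P(xi=-1) = 0.7 and
# g(x) = x satisfies E_x[g(X_1)] = g(x) - 0.4 for x >= 1, i.e. a drift
# of eps = 0.4 toward A = {0}, which forces E_x[tau_A] <= g(x)/eps.
import random

P_UP = 0.3

def step(x, rng):
    return max(x + (1 if rng.random() < P_UP else -1), 0)

# Exact drift check for x >= 1: E_x[g(X_1)] - g(x) = P_UP - (1 - P_UP)
for x in [1, 5, 10]:
    drift = (x + 1) * P_UP + (x - 1) * (1 - P_UP) - x
    assert abs(drift - (-0.4)) < 1e-12

def hitting_time_of_zero(x0, rng):
    x, n = x0, 0
    while x != 0:
        x, n = step(x, rng), n + 1
    return n

rng = random.Random(2)
mean_tau = sum(hitting_time_of_zero(1, rng) for _ in range(20000)) / 20000
# For this walk, E_1[tau_0] = 1/(0.7 - 0.3) = 2.5, which here matches
# the Lyapunov bound g(1)/eps = 1/0.4 = 2.5 exactly.
print(mean_tau)  # approximately 2.5
```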

Theorem 3.1 (Stationary measures) If $\bar{X}$ is a recurrent Harris chain, then there exists a stationary measure that is unique up to constant multiples. If $\bar{X}$ is furthermore aperiodic with stationary distribution $\pi$, then for any $x \in \bar{S}$ with $P_x(\tau_\alpha < \infty) = 1$, we have $\|\bar{\Pi}^n(x, \cdot) - \pi(\cdot)\| \to 0$, where $\|\cdot\|$ denotes the total variation norm of a signed measure.

We next give some sufficient conditions for a Harris chain to be positive recurrent, i.e., $E_\alpha[\tau_\alpha] < \infty$, based on the existence of certain Lyapunov functions.

Theorem 3.2 (Sufficient conditions for positive recurrence) Let $\bar{X}$ be a Harris chain satisfying the conditions in Definition 3.1, where we further assume that $A = B$. Assume that there exists a function $g : S \to [0, \infty)$ with $\sup_{x \in A} E_x[g(X_1)] < \infty$, such that
(i) either $g : S \to [1, \infty)$ and there exists $r \in (0, 1)$ s.t. $E_x[g(X_1)] \leq r\, g(x)$ for all $x \in A^c$,
(ii) or $E_x[g(X_1)] \leq g(x) - \epsilon$ for all $x \in A^c$.
Then $E_\alpha[\tau_\alpha] < \infty$ and $\bar{X}$ is a positive recurrent Harris chain.

Proof. Since every time the Markov chain $\bar{X}$ enters the set $A = B$, there is probability $\epsilon$ of entering the state $\alpha$ in the next step, to show $E_\alpha[\tau_\alpha] < \infty$ it suffices to show that
$$\sup_{x \in A} E_x[\tau_A] < \infty, \qquad \text{where } \tau_A := \min\{n \geq 1 : X_n \in A\}. \qquad (3.1)$$
Note that condition (i) implies that $g(X_{n \wedge \tau_A})\, r^{-(n \wedge \tau_A)}$ is a supermartingale. Therefore
$$g(x) \geq E_x\big[g(X_{n \wedge \tau_A})\, r^{-(n \wedge \tau_A)}\big] \geq E_x\big[r^{-(n \wedge \tau_A)}\big] \qquad \forall \, x \in A^c.$$
Letting $n \to \infty$ then gives
$$E_x\big[r^{-\tau_A}\big] \leq g(x) \qquad \forall \, x \in A^c. \qquad (3.2)$$
By the Markov inequality, $P_x(\tau_A \geq n) \leq r^n g(x)$, and hence
$$E_x[\tau_A] = \sum_{n=1}^\infty P_x(\tau_A \geq n) \leq \sum_{n=1}^\infty r^n g(x) < \frac{g(x)}{1-r} \qquad \forall \, x \in A^c. \qquad (3.3)$$
Similarly, condition (ii) implies that $g(X_{n \wedge \tau_A}) + (n \wedge \tau_A)\,\epsilon$ is a supermartingale. Therefore
$$g(x) \geq E_x\big[g(X_{n \wedge \tau_A}) + (n \wedge \tau_A)\,\epsilon\big] \geq \epsilon\, E_x[n \wedge \tau_A] \qquad \forall \, x \in A^c.$$
Letting $n \to \infty$ then gives
$$E_x[\tau_A] \leq \frac{g(x)}{\epsilon} \qquad \forall \, x \in A^c. \qquad (3.4)$$
Using (3.3) or (3.4), we note that for $x \in A$,
$$E_x[\tau_A] = 1 + \int_{A^c} \Pi(x, dy)\, E_y[\tau_A] \leq 1 + \frac{1}{c} \int_{A^c} \Pi(x, dy)\, g(y) \leq 1 + \frac{1}{c}\, E_x[g(X_1)],$$
where $c = 1 - r$ under assumption (i) and $c = \epsilon$ under assumption (ii). Taking $\sup_{x \in A}$ on both sides then yields (3.1) by the assumption that $\sup_{x \in A} E_x[g(X_1)] < \infty$.

References

[1] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, Belmont, California, 1996.
[2] P. Lax, Functional Analysis, John Wiley & Sons, 2002.