Lecture 10

1 Ergodic decomposition of invariant measures

Let T : (Ω, F) → (Ω, F) be measurable, and let M denote the space of T-invariant probability measures on (Ω, F). Then M is a convex set, although it might be empty. We will show that any measure µ ∈ M can be decomposed as a mixture of extremal elements of M, which are exactly the ergodic measures for T.

Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point of M.

Proof. If µ ∈ M is not ergodic, then there exists A ∈ F with µ(A) ∈ (0, 1) such that A is an invariant set for T. Let µ_A (resp. µ_{A^c}) denote the restriction of µ to A (resp. A^c), normalized to be a probability measure, i.e.,

    µ_A(·) = µ(A ∩ ·)/µ(A).

Then µ_A and µ_{A^c} are distinct invariant probability measures for T, and µ = α µ_A + (1 − α) µ_{A^c}, where α = µ(A) ∈ (0, 1), which shows that µ is not extremal.

Conversely, if µ ∈ M is not extremal, then µ = α µ_1 + (1 − α) µ_2 for some α ∈ (0, 1) and distinct µ_1, µ_2 ∈ M. If µ were ergodic, then by the ergodic theorem, for any bounded measurable f on (Ω, F),

    A_n f(ω) := (f(ω) + f(Tω) + ⋯ + f(T^{n−1}ω))/n → E_µ[f]   µ-a.s. and in L^1(Ω, F, µ).

Since µ_1 and µ_2 are absolutely continuous w.r.t. µ, A_n f(ω) also converges to E_µ[f] almost surely w.r.t. µ_1 (resp. µ_2), and hence E_{µ_1}[f] = E_{µ_2}[f] = E_µ[f]. Since f is an arbitrary bounded measurable function, this implies that µ_1 = µ_2 = µ, a contradiction. Therefore a non-extremal µ cannot be ergodic.

By applying the ergodic theorem to suitable test functions, one can prove:

Lemma 1.1 [Singularity of ergodic measures] Distinct ergodic measures µ_1, µ_2 ∈ M are mutually singular. More specifically, there exists A ∈ I s.t. µ_1(A) = µ_2(A^c) = 1.

Choquet's theorem (see Lax [2, Section 13.4]) provides a decomposition of a metrizable compact convex subset K of a locally convex topological vector space in terms of the extremal points of K.
Since the set of invariant probability measures M in general may not be compact, we will not appeal to Choquet's theorem. Instead, we will assume that (Ω, F) is a complete separable metric space with its Borel σ-algebra, and appeal to the existence of regular conditional probability distributions.
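To make Theorem 1.1 and Lemma 1.1 concrete, here is a minimal sketch (a toy example, not from the notes): for the permutation T = (0 1)(2 3) on Ω = {0, 1, 2, 3}, the two uniform measures on the cycles are the ergodic measures, any strict mixture of them is invariant but not extremal, and the invariant set A = {0, 1} exhibits their mutual singularity.

```python
# Toy example (assumed, not from the notes): the permutation T = (0 1)(2 3)
# on Ω = {0,1,2,3} has exactly two ergodic measures, uniform on {0,1} and
# uniform on {2,3}; every invariant measure is a mixture of these two.
from fractions import Fraction

T = {0: 1, 1: 0, 2: 3, 3: 2}

def is_invariant(mu):
    # µ is T-invariant iff µ(T^{-1}{x}) = µ({x}) for every x.
    preimage = {x: [y for y in T if T[y] == x] for x in T}
    return all(sum(mu[y] for y in preimage[x]) == mu[x] for x in T)

mu1 = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(0)}
mu2 = {0: Fraction(0), 1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(1, 2)}
alpha = Fraction(1, 3)
mix = {x: alpha * mu1[x] + (1 - alpha) * mu2[x] for x in T}

assert is_invariant(mu1) and is_invariant(mu2) and is_invariant(mix)

# A = {0,1} is an invariant set (T^{-1}A = A) separating µ1 from µ2,
# as in Lemma 1.1: µ1(A) = µ2(A^c) = 1.
A = {0, 1}
assert {T[x] for x in A} == A
assert sum(mu1[x] for x in A) == 1 and sum(mu2[x] for x in A) == 0

# The mixture gives A measure strictly in (0,1), so it is not ergodic.
assert 0 < sum(mix[x] for x in A) < 1
```

The exact rational arithmetic (`Fraction`) keeps the invariance checks free of floating point noise.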
Theorem 1.2 [Ergodic decomposition] Let Ω be a complete separable metric space with Borel σ-algebra F. Let T be a measurable transformation on (Ω, F) and let M denote the set of probability measures on (Ω, F) invariant w.r.t. T. Then for any µ ∈ M, there exists a probability measure ρ_µ on the set of ergodic measures such that

    µ = ∫ ν ρ_µ(dν).   (1.1)

Remark. The σ-algebra we use for defining ρ_µ on M is the Borel σ-algebra induced by the weak topology on M, i.e., µ_n → µ in M w.r.t. the weak topology if and only if for all bounded continuous functions f : Ω → R, we have ∫ f dµ_n → ∫ f dµ. Such a convergence of probability measures on (Ω, F) is called weak convergence.

Proof. Since (Ω, F) is Polish, there exists a regular conditional probability µ_ω of µ conditional on the invariant σ-field I. Provided we can show that µ_ω is ergodic almost surely, we can regard ω ↦ µ_ω as a map from Ω to the set of ergodic measures, and denote the distribution of µ_ω by ρ_µ. The decomposition (1.1) then follows readily.

We now verify that µ_ω(·) := µ(· | I)(ω) is ergodic for µ-a.e. ω. First we show invariance, i.e., µ-a.s.,

    µ_ω(A) = µ_ω(T^{−1}A)   for all A ∈ F.   (1.2)

A priori, there are uncountably many sets in F, and the exceptional sets may pile up. However, by our assumption that (Ω, F) is Polish, F can be generated by a countable collection of sets F_0, and hence it suffices to verify (1.2) for A ∈ F_0, since µ_ω is a probability measure a.s. Since µ_ω(·) = µ(· | I), given A ∈ F_0, µ_ω(A) = µ_ω(T^{−1}A) a.s. (i.e., µ(A | I) = µ(T^{−1}A | I) a.s.) if and only if

    µ(A ∩ E) = µ(T^{−1}A ∩ E)   for all E ∈ I,

which holds since E ∈ I implies µ(E △ T^{−1}E) = 0, and µ(A ∩ E) = µ(T^{−1}(A ∩ E)) by the invariance of µ. This proves the a.s. invariance of µ_ω for T.

For the a.s. ergodicity of µ_ω, it suffices to show that for µ-a.e. µ_ω and every A ∈ F,

    A_n 1_A(ω) := (1_A(ω) + 1_A(Tω) + ⋯ + 1_A(T^{n−1}ω))/n → µ_ω(A)   a.s. w.r.t. µ_ω.   (1.3)

Approximating A ∈ F by sets that are finitely generated from F_0, it suffices to verify (1.3) for A ∈ F_0. For such an A, the ergodic theorem applied to 1_A w.r.t.
µ implies that A_n 1_A(ω) → µ(A | I)(ω) = µ_ω(A) a.s. w.r.t. µ. Since µ_ω is the regular conditional probability of µ given I, (1.3) must hold.

2 Structure of stationary Markov chains

We now apply the ergodic decomposition theorem for stationary measures to stationary Markov chains. Let Π(x, dy) be a transition probability kernel on the state space (S, S). In this section, we will consider a general Polish space (S, S). A Markov process (X_n)_{n∈N} is stationary if and only if its marginal distribution µ is stationary for Π. More precisely,

    µ ∈ M := {ν : ν(S) = 1, ν(A) = ∫ Π(x, A) ν(dx) for all A ∈ S}.

Given marginal law µ ∈ M, we can embed the stationary Markov process (X_n)_{n∈N} in a doubly infinite stationary sequence (X_n)_{n∈Z}. The process (X_n)_{n∈Z} can be regarded as a random
variable taking values in the sequence space (S^Z, S^Z), where S^Z denotes the product σ-algebra on the product space S^Z. Given marginal law µ ∈ M, let P_µ denote the law of (X_n)_{n∈Z} on (S^Z, S^Z). Let T denote the coordinate shift map on S^Z. Then each µ ∈ M determines a P_µ ∈ M̃, where M̃ is the family of probability measures on (S^Z, S^Z) invariant for the shift map T. Our goal is to show that the ergodic components of a stationary Markov process P_µ are stationary Markov processes P_ν with ν ∈ M, where the ν are the extremal components of µ in M. (Note that in general, the ergodic decomposition of a stationary process gives ergodic processes which need not be Markov.)

Theorem 2.1 [Ergodic decomposition of stationary Markov processes] Given µ ∈ M, P_µ is ergodic for the shift map T if and only if µ is extremal in the family of invariant measures M for the Markov chain. Furthermore, for any µ ∈ M, there exists a probability measure ρ_µ on the set of extremal elements of M such that

    µ = ∫ ν ρ_µ(dν)   and   P_µ = ∫ P_ν ρ_µ(dν).   (2.1)

The extremal elements of M are called the extremal or ergodic invariant measures. When M is a singleton, we say the Markov chain is ergodic.

Proof. If µ ∈ M is not extremal, then neither is P_µ extremal in M̃, which is equivalent to P_µ not being ergodic. The key to proving the converse is the following result.

Lemma 2.1 Let µ ∈ M, and let I be the invariant σ-field on (S^Z, S^Z) for the shift map T and the measure P_µ (note that we define I modulo sets of P_µ measure 0). Then modulo sets of P_µ measure 0, I ⊂ F_0^0, where F_m^n := σ(x_m, x_{m+1}, …, x_n) on S^Z = {(x_i)_{i∈Z} : x_i ∈ S}.

Proof. The lemma asserts that for any E ∈ I, there exists A ∈ S such that E = {(x_n)_{n∈Z} : x_0 ∈ A} modulo sets of P_µ measure zero. The proof relies on the fact that invariant sets lie both in the infinite future F_+ := ∩_n F_n^∞, as well as the infinite past F_− := ∩_n F_{−∞}^{−n}, and the past and the future of a Markov process are independent conditioned on the present. Thus for E ∈ I,

    P_µ[E | F_0^0] = P_µ[E ∩ E | F_0^0] = P_µ[E | F_0^0]^2.
Therefore P_µ[E | F_0^0] = 0 or 1, µ-a.s. Let A ∈ S be the set of values of x_0 on which P_µ[E | F_0^0] = 1 a.s. Then by the invariance of E under the shift T, we have E = A^Z := {(x_n)_{n∈Z} ∈ S^Z : x_n ∈ A for all n ∈ Z} modulo sets of P_µ measure zero, while E^c = (A^c)^Z. In particular, for µ-almost all x ∈ S, if x ∈ A (resp. x ∈ A^c), then the Markov chain starting at x never leaves A (resp. A^c). Therefore E = {(x_n)_{n∈Z} ∈ S^Z : x_0 ∈ A} modulo sets of P_µ measure zero, which proves the lemma.

With Lemma 2.1, we can conclude the proof of Theorem 2.1. Suppose that P_µ is not ergodic; then P_µ is a mixture of the measures P_µ[· | I], which are ergodic measures on (S^Z, S^Z). Since I ⊂ F_0^0 by Lemma 2.1, the P_µ[· | I] are almost surely mixtures of P_µ[· | F_0^0], which are laws of the Markov chain with specified value at time 0. Hence the P_µ[· | I] are stationary Markov processes with marginal laws in M, and µ is a mixture of these marginal laws, which means that µ is not extremal in M. The same reasoning also allows us to deduce (2.1) from the ergodic decomposition of P_µ.

Remark. Note that extremal measures in M must be mutually singular, since the associated ergodic Markov processes are mutually singular by Theorem 2.1.
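The decomposition in Theorem 2.1 can be seen numerically in a toy reducible chain (an assumed example, not from the notes). With two closed classes there are exactly two extremal invariant measures, each supported on one class, and every invariant ν is a mixture of them; the helper `stationary` below solves the stationarity condition νΠ = ν as a left-eigenvector problem.

```python
# Toy reducible chain (assumed example): two closed classes {0,1} and {2,3},
# with no transitions between them.
import numpy as np

P = np.zeros((4, 4))
P[:2, :2] = [[0.7, 0.3], [0.4, 0.6]]   # transitions within {0,1}
P[2:, 2:] = [[0.5, 0.5], [0.2, 0.8]]   # transitions within {2,3}

def stationary(Q):
    # Solve nu Q = nu with sum(nu) = 1: left eigenvector of Q for eigenvalue 1.
    w, v = np.linalg.eig(Q.T)
    x = np.real(v[:, np.argmin(np.abs(w - 1))])
    return x / x.sum()

# The two ergodic (extremal) invariant measures, one per closed class.
nu1 = np.concatenate([stationary(P[:2, :2]), [0.0, 0.0]])
nu2 = np.concatenate([[0.0, 0.0], stationary(P[2:, 2:])])
assert np.allclose(nu1 @ P, nu1) and np.allclose(nu2 @ P, nu2)

# Any mixture is invariant but non-extremal; here rho_mu puts mass
# alpha and 1 - alpha on the two ergodic measures, as in (2.1).
alpha = 0.25
mix = alpha * nu1 + (1 - alpha) * nu2
assert np.allclose(mix @ P, mix)

# Mutual singularity: nu1({0,1}) = 1 while nu2({2,3}) = 1.
assert np.isclose(nu1[:2].sum(), 1.0) and np.isclose(nu2[2:].sum(), 1.0)
```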
Remark. A sufficient condition guaranteeing the uniqueness of a stationary distribution (if one exists) for a Markov chain is some form of irreducibility. If M is not a singleton, then we can find two extremal invariant measures with disjoint supports U_1 and U_2 in the state space, such that the Markov chain makes no transitions between U_1 and U_2. Any irreducibility condition that rules out such a partition of the state space will guarantee the existence of at most one stationary distribution. One such condition is that Π(x, dy) has a positive density p(x, y) w.r.t. a common reference measure α(dy) for all x in the state space.

3 Harris chains

So far we have studied mostly countable state Markov chains, although the ergodic decomposition of stationary Markov chains was developed for a general Polish space. We now briefly discuss the theory of general state space Markov chains. One class of Markov chains that admits a similar treatment to the countable state space case is the so-called Harris chains.

Definition 3.1 (Harris chains) A Markov chain (X_n)_{n≥0} with state space (S, S) and transition kernel Π(·, ·) is called a Harris chain if there exist A, B ∈ S, ε > 0, and a probability measure ρ with ρ(B) = 1 such that:

(i) If τ_A := inf{n ≥ 0 : X_n ∈ A}, then P_z(τ_A < ∞) > 0 for all z ∈ S.

(ii) If x ∈ A, then Π(x, C) ≥ ε ρ(C) for all C ∈ S with C ⊂ B.

The conditions of a Harris chain allow us to construct an equivalent Markov chain X̄ with state space S̄ := S ∪ {α} and σ-algebra S̄ := {B, B ∪ {α} : B ∈ S}, where α is an artificial atom that the chain X̄ will visit. More precisely, define X̄ with transition probability kernel Π̄ such that

  If x ∈ S\A, then Π̄(x, C) = Π(x, C) for C ∈ S;
  If x ∈ A, then Π̄(x, {α}) = ε and Π̄(x, C) = Π(x, C) − ε ρ(C) for C ∈ S;
  If x = α, then Π̄(α, D) = ∫ ρ(dx) Π̄(x, D) for D ∈ S̄.

X̄_n being in the state α corresponds to X_n being distributed as ρ on B. This correspondence allows us to go from the distribution of X̄ to that of X and vice versa.
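The split-chain construction above can be sketched in code for a concrete chain (all specifics here are illustrative assumptions, not from the notes): take X_{n+1} = X_n/2 + U_n with U_n ~ Uniform(−1, 1), A = [−1, 1], B = [−1/2, 1/2], ρ = Uniform(B), and ε = 1/2. The minorization in Definition 3.1 (ii) then holds because for every x ∈ A the one-step density equals 1/2 everywhere on B.

```python
# Sketch of the split chain X-bar for the assumed example chain
# X_{n+1} = X_n/2 + Uniform(-1,1), with A = [-1,1], B = [-1/2,1/2],
# rho = Uniform(B), eps = 1/2.
import random

random.seed(0)
EPS = 0.5
ATOM = "alpha"     # the artificial atom

def step(x):
    """One step of the split chain under the kernel Pi-bar."""
    if x == ATOM:
        x = random.uniform(-0.5, 0.5)          # from alpha: redistribute per rho, then move
    if -1.0 <= x <= 1.0:                        # x in A: Nummelin-type splitting
        if random.random() < EPS:
            return ATOM                         # enter the atom with probability eps
        while True:                             # residual kernel (Pi - eps*rho)/(1 - eps):
            y = x / 2 + random.uniform(-1, 1)   # propose from Pi(x, .),
            if not (-0.5 <= y <= 0.5):          # conditioned to avoid B (valid by rejection,
                return y                        # since Pi's density is exactly 1/2 on B)
    else:                                       # x outside A: Pi-bar = Pi
        return x / 2 + random.uniform(-1, 1)

# The chain keeps regenerating at the atom.
x, atom_visits = ATOM, 0
for _ in range(5000):
    x = step(x)
    atom_visits += (x == ATOM)
assert atom_visits > 0
```

The rejection loop is a valid sampler for the residual kernel precisely because ε ρ matches Π's density on B, so the leftover density is uniform on the step interval off B.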
Having a macroscopic atom α allows us to define transience, recurrence, and periodicity, to use the cycle trick to construct stationary measures for recurrent Harris chains, and to use coupling to prove convergence of positive recurrent Harris chains to their unique stationary distribution.

Definition 3.2 (Recurrence, transience, and periodicity) Let τ_α := inf{n ≥ 1 : X̄_n = α}. X̄ is called a recurrent Harris chain if P_α(τ_α < ∞) = 1, and transient otherwise. The gcd d of D := {n ≥ 1 : P_α(X̄_n = α) > 0} is called the period of the Harris chain, with d = 1 corresponding to aperiodicity.

Note that Definition 3.1 (i) guarantees that P_x(τ_α < ∞) > 0 for all x ∈ S, which is a form of irreducibility for the chain X̄. The theory we developed for countable state Markov chains can be adapted to Harris chains. See e.g. [1] for more details.
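As a toy illustration of Definition 3.2 (an assumed example, not from the notes), consider the reflected walk on {0, 1, 2, …} that steps up with probability 0.4 and down, reflected at 0, with probability 0.6. In the countable-state setting the state 0 is already a genuine atom, the chain is recurrent and aperiodic, and since π(k) = (1/3)(2/3)^k, Kac's formula gives E_0[τ_0] = 1/π(0) = 3, which a simulation can check.

```python
# Assumed toy chain: reflected walk on {0,1,2,...} with p(k, k+1) = 0.4 and
# p(k, max(k-1, 0)) = 0.6.  The state 0 plays the role of the atom alpha;
# here pi(k) = (1/3)(2/3)^k and Kac's formula gives E_0[tau_0] = 1/pi(0) = 3.
import random

random.seed(1)

def return_time():
    # Length of one excursion from 0 back to 0: a sample of tau_0 under P_0.
    x, n = 0, 0
    while True:
        x = x + 1 if random.random() < 0.4 else max(x - 1, 0)
        n += 1
        if x == 0:
            return n

samples = [return_time() for _ in range(20000)]
est = sum(samples) / len(samples)
assert 2.5 < est < 3.5    # Monte Carlo estimate of E_0[tau_0] = 3
```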
Theorem 3.1 (Stationary measures) If X̄ is a recurrent Harris chain, then there exists a unique (up to constant multiples) stationary measure. If X̄ is furthermore aperiodic with stationary distribution π, then for any x ∈ S with P_x(τ_α < ∞) = 1, we have ‖Π^n(x, ·) − π(·)‖ → 0, where ‖·‖ denotes the total variation norm of a signed measure.

We next give some sufficient conditions for a Harris chain to be positive recurrent, i.e., E_α[τ_α] < ∞, based on the existence of certain Lyapunov functions.

Theorem 3.2 (Sufficient conditions for positive recurrence) Let X be a Harris chain satisfying the conditions in Definition 3.1, where we further assume that A = B. Assume that there exists a function g : S → [0, ∞) with sup_{x∈A} E_x[g(X_1)] < ∞ such that

(i) either g : S → [1, ∞) and there exists r ∈ (0, 1) s.t. E_x[g(X_1)] ≤ r g(x) for all x ∈ A^c,

(ii) or E_x[g(X_1)] ≤ g(x) − ε for all x ∈ A^c.

Then E_α[τ_α] < ∞ and X̄ is a positive recurrent Harris chain.

Proof. Since every time the Markov chain X enters the set A = B, there is probability ε of entering the state α in the next step, to show E_α[τ_α] < ∞ it suffices to show that

    sup_{x∈A} E_x[τ_A] < ∞,   where τ_A := min{n ≥ 1 : X_n ∈ A}.   (3.1)

Note that condition (i) implies that g(X_{n∧τ_A}) r^{−(n∧τ_A)} is a supermartingale. Therefore

    g(x) ≥ E_x[g(X_{n∧τ_A}) r^{−(n∧τ_A)}] ≥ E_x[r^{−(n∧τ_A)}].

Letting n → ∞ then gives

    E_x[r^{−τ_A}] ≤ g(x)   for all x ∈ A^c.   (3.2)

By the Markov inequality, this further implies that

    E_x[τ_A] = Σ_{n=1}^∞ P_x(τ_A ≥ n) ≤ Σ_{n=1}^∞ r^n g(x) ≤ g(x)/(1 − r)   for all x ∈ A^c.   (3.3)

Similarly, condition (ii) implies that g(X_{n∧τ_A}) + (n∧τ_A) ε is a supermartingale. Therefore

    g(x) ≥ E_x[g(X_{n∧τ_A}) + (n∧τ_A) ε] ≥ ε E_x[n∧τ_A].

Letting n → ∞ then gives

    E_x[τ_A] ≤ g(x)/ε   for all x ∈ A^c.   (3.4)

Using (3.3) or (3.4), we note that for x ∈ A,

    E_x[τ_A] = 1 + ∫_{A^c} Π(x, dy) E_y[τ_A] ≤ 1 + (1/c) ∫_{A^c} Π(x, dy) g(y) ≤ 1 + (1/c) E_x[g(X_1)],

where c = 1 − r under assumption (i) and c = ε under assumption (ii). Taking sup over x ∈ A on both sides then yields (3.1) by the assumption that sup_{x∈A} E_x[g(X_1)] < ∞.

References

[1] R.
Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, Belmont, California, 1996.

[2] P. Lax, Functional Analysis, John Wiley & Sons, 2002.