Rudiments of Ergodic Theory

Zefeng Chen

September 24, 2013

Abstract

In this note we present basic ergodic theory. We begin with the notion of a measure-preserving transformation. We then define ergodicity and provide examples. Finally we sketch Birkhoff's Ergodic Theorem and illustrate it with some examples.

Contents

1 Introduction
2 Basic notions
3 Birkhoff ergodic theorem and Applications

1 Introduction

A dynamical system formalizes mathematically the evolution of a state space over time. For instance, suppose $X$ is a topological space and $T$ is a continuous transformation of $X$. The evolution of the phase space $X$ is encoded by the action of $\mathbb{N}$ on $X$ (or of $\mathbb{Z}$, when $T$ is invertible) given by $(n, x) \mapsto T^n(x)$.

The study of dynamical systems may be motivated by specific concerns in physics. A pendulum is a physical example of a dynamical system in which the state is its angle and angular velocity, and its evolution follows Newton's Laws.

Ergodic theory concerns the behavior of a dynamical system from a measure-theoretic point of view. In this paper we present some fundamental results in ergodic theory. The first is Poincaré's recurrence theorem, which states that almost every point of a set of positive measure returns to that set infinitely many times. After presenting other basic results in ergodic theory, we turn to the Birkhoff ergodic theorem.

2 Basic notions

We begin with some definitions.
Definition 2.1. Let $(X, \mathcal{A}, m)$ be a measure space, where $X$ is a set, $\mathcal{A}$ is a sigma-algebra, and $m$ is a measure. A transformation $T$ of $X$ onto $X$ is measurable if for every set $A \in \mathcal{A}$, $T^{-1}(A)$ is a measurable set.

Definition 2.2. Let $(X, \mathcal{A}, m)$ be a measure space. A measurable transformation $T$ of $X$ onto $X$ is measure-preserving if for every set $A \in \mathcal{A}$, $m(T^{-1}(A)) = m(A)$. In this case the transformation $T$ is also called $m$-invariant.

Example 2.1. (Angle-doubling map) Consider $[0, 1)$ as $S^1$. (Identifying $[0, 1)$ with $S^1$ might look strange, since $[0, 1)$ under the subspace topology is not the same as $S^1$. However, under the topology generated by all $(a, b) \subset [0, 1)$ and all $[0, a) \cup (b, 1)$, one can show that $[0, 1)$ is homeomorphic to $S^1$.) The map $f(x) = 2x \bmod 1$ is measure-preserving on this space. Indeed, for an interval $[a, b) \subset [0, 1)$ the preimage $f^{-1}([a, b)) = [a/2, b/2) \cup [(a+1)/2, (b+1)/2)$ has measure $b - a$. This transformation helps illustrate why we do not define a measure-preserving transformation by the property $m(T(A)) = m(A)$, which may at first glance look more intuitive than our definition. The map $f(x) = 2x \bmod 1$ is measure-preserving, but taking $A = [0, 0.5]$, we have $f([0, 0.5]) = [0, 1)$, which has twice the measure of $[0, 0.5]$.

Example 2.2. Consider $[0, 1)$ as before. The translation on the space defined by $f(x) = x + b \bmod 1$ is measure-preserving. More generally, consider the $n$-torus $[0, 1)^n$. The translation on the space defined by

$$f(x_1, x_2, \ldots, x_n) = (x_1, x_2, \ldots, x_n) + (b_1, b_2, \ldots, b_n) \bmod 1$$

is also measure-preserving on this space.

Example 2.3. (Continued fraction map) Consider $[0, 1)$ as before. Consider the map

$$T(x) = \begin{cases} 0 & \text{if } x = 0, \\ \dfrac{1}{x} \bmod 1 & \text{if } 0 < x < 1. \end{cases}$$

The map does not preserve Lebesgue measure $m$: for an interval $I$ we have in general $m(T^{-1}(I)) \neq m(I)$. However, it preserves the following measure, defined for every Borel set $B$:

$$\mu(B) = \frac{1}{\log 2} \int_B \frac{1}{1 + x}\, dx.$$

To see this, we only need to check that $\mu(T^{-1}(I)) = \mu(I)$ for intervals. Let $I = (a, b)$ and note that

$$T^{-1}(a, b) = \bigcup_{n=1}^{\infty} \left( \frac{1}{b + n}, \frac{1}{a + n} \right).$$

Then
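$$\begin{aligned}
\mu(T^{-1}(I)) &= \frac{1}{\log 2} \sum_{n=1}^{\infty} \int_{\frac{1}{b+n}}^{\frac{1}{a+n}} \frac{1}{1 + x}\, dx \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left( \log\left(1 + \frac{1}{a + n}\right) - \log\left(1 + \frac{1}{b + n}\right) \right) \\
&= \frac{1}{\log 2} \lim_{N \to \infty} \sum_{n=1}^{N} \big( \log(a + n + 1) - \log(a + n) - \log(b + n + 1) + \log(b + n) \big) \\
&= \frac{1}{\log 2} \lim_{N \to \infty} \big( \log(a + N + 1) - \log(a + 1) - \log(b + N + 1) + \log(b + 1) \big) \\
&= \frac{1}{\log 2} \big( \log(b + 1) - \log(a + 1) \big) \\
&= \frac{1}{\log 2} \int_a^b \frac{dx}{1 + x} = \mu((a, b)),
\end{aligned}$$

as required.

Before turning to Poincaré's theorem, we provide two definitions.

Definition 2.3. The set $\{T^n(x) \mid n \in \mathbb{Z}^+\}$ is called the orbit of $x$. We say a point $x \in A$ is recurrent if the orbit of $x$ intersects $A$ infinitely many times, that is, if $T^n(x) \in A$ for infinitely many $n$.

Theorem 2.1. (Poincaré Recurrence Theorem) Let $(X, \mathcal{A}, m)$ be a probability space and let $T$ be a measure-preserving transformation from $X$ onto itself. For a set $A \in \mathcal{A}$ with $m(A) > 0$, and for almost every $x \in A$, we have

$$\#\{n \in \mathbb{Z}^+ \mid T^n(x) \in A\} = \infty.$$

Thus the Poincaré recurrence theorem states that if $T$ is $m$-invariant then almost every point of a measurable subset with positive measure is recurrent.

Proof. Let $E$ be the set of points in $A$ that do not return to $A$, that is,

$$E = \{x \in A \mid T^n(x) \notin A \ \ \forall n \geq 1\}.$$

It suffices to show that $m(E) = 0$. To see this, note that if $m(E) = 0$ then $m(T^{-n}(E)) = 0$ for any $n \geq 0$, and hence $\bigcup_{n=0}^{\infty} T^{-n}E$ is a set of measure 0. Then consider

$$B = A \setminus \bigcup_{n=0}^{\infty} T^{-n}E.$$

A point of $B$ never enters $E$ under iteration, so every time its orbit is in $A$ it returns to $A$ once more. The set $B$ therefore comprises elements of $A$ that return to $A$ for infinitely many iterations of $T$, and $m(B) = m(A)$.

Now we check that $m(E) = 0$. Note that for integers $0 \leq m < n$ (here $m$ is an index, not the measure),

$$T^{-m}E \cap T^{-n}E = \emptyset,$$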
since if $x \in T^{-m}E \cap T^{-n}E$ then $T^m(x) \in E \cap T^{-(n-m)}E$. But this contradicts the definition of $E$. Since $m(T^{-n}E) = m(E)$ for all $n \geq 0$, it follows that

$$\infty > m\left( \bigcup_{n=0}^{\infty} T^{-n}E \right) = \sum_{n=0}^{\infty} m(T^{-n}E) = \sum_{n=0}^{\infty} m(E).$$

An infinite sum of copies of $m(E)$ can only be finite if $m(E) = 0$.

Now we state the definition of an ergodic transformation.

Definition 2.4. Let $(X, \mathcal{A}, m)$ be a probability space and let $T$ be a measure-preserving transformation from $X$ onto itself. Then $T$ is called ergodic if for every set $A \in \mathcal{A}$ such that $T^{-1}(A) = A$, we have $m(A) = 0$ or $m(A) = 1$.

Example 2.4. Let $X = S^1$. We have shown the map $f(x) = 2x \bmod 1$ is measure-preserving and we will show it is also ergodic. Intuitively an interval has a disconnected preimage under $f$, so one expects the only $f$-invariant sets to be, up to sets of measure zero, the empty set and the whole space. However, to formally show this map is ergodic, we first need some more tools.

Example 2.5. As before let $X = S^1$. The translation $f(x) = x + b \bmod 1$ on this space is not ergodic if $b$ is rational. To see this, let $b = p/q$ where $p$ and $q$ are relatively prime. Then we divide $[0, 1)$ into $q$ equal intervals, namely $[0, \frac{1}{q})$, $[\frac{1}{q}, \frac{2}{q})$, etc. Then $f$ takes each interval exactly onto another, and thus to construct an $f$-invariant set, it is sufficient to pick any part of one interval and take the corresponding parts of the other intervals. For instance, $\bigcup_{k=0}^{q-1} [\frac{k}{q}, \frac{k}{q} + \frac{1}{2q})$ is $f$-invariant and has measure $\frac{1}{2}$. We will later show that when $b$ is irrational, $f$ is ergodic.

Example 2.6. Consider the same space $[0, 1)^n$ as an $n$-torus. The translation

$$f(x_1, x_2, \ldots, x_n) = (x_1, x_2, \ldots, x_n) + (b_1, b_2, \ldots, b_n) \bmod 1$$

is not necessarily ergodic even if every $b_i$ is irrational. To see this, we let $n = 2$ and $f(x_1, x_2) = (x_1, x_2) + (b, b)$, where $b$ is irrational. The set

$$A = \{(x, y) \mid y = x + c \bmod 1,\ c \in [0.9, 1] \cup [0, 0.1]\}$$

is $f$-invariant and has measure $0.2$, so $f$ is not ergodic.

Example 2.7. (Bernoulli shift) Let $S$ be a finite set called the alphabet, and let $X = S^{\mathbb{N}}$ be the set of all one-sided infinite sequences of elements of $S$. Let $d$ be the following metric on $X$ (with $d = 0$ when the two sequences coincide):

$$d((x_n), (y_n)) = 2^{-\min\{k \,:\, x_k \neq y_k\}}.$$

The resulting topology is generated by the collection of cylinders.
A cylinder of size $k$ centered at $(a_n)$ is:

$$[a_0, a_1, \ldots, a_{k-1}] = \{(x_n) \in X \mid x_i = a_i \ (0 \leq i < k)\}.$$
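To make the metric and the cylinders concrete, here is a minimal Python sketch that works with finite prefixes of sequences. This is an assumption made for illustration: genuine points of $X$ are infinite sequences, and the helper names `first_difference`, `dist` and `in_cylinder` are hypothetical. It illustrates that two sequences lie in a common cylinder of size $k$ exactly when their distance is at most $2^{-k}$.

```python
def first_difference(x, y):
    """Smallest index k with x[k] != y[k], or None if the prefixes agree."""
    for k, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return k
    return None


def dist(x, y):
    """d(x, y) = 2^(-min{k : x_k != y_k}); 0 if no difference is seen."""
    k = first_difference(x, y)
    return 0.0 if k is None else 2.0 ** (-k)


def in_cylinder(x, prefix):
    """True if x lies in the cylinder [a_0, ..., a_{k-1}] given by prefix."""
    return tuple(x[:len(prefix)]) == tuple(prefix)


# Two truncated sequences over the alphabet {0, 1} (illustrative data).
x = (0, 1, 1, 0, 1, 0)
y = (0, 1, 1, 1, 0, 0)

print(dist(x, y))             # the first difference is at index 3
print(in_cylinder(y, x[:3]))  # y agrees with x on indices 0, 1, 2
print(in_cylinder(y, x[:4]))  # but not on index 3
```

Here `x` and `y` share the cylinder of size 3 centered at `x` but not the one of size 4, matching $d(x, y) = 2^{-3}$.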
Note that each cylinder is both open and compact. The left shift on $X$ is the transformation

$$T : (x_0, x_1, x_2, \ldots) \mapsto (x_1, x_2, x_3, \ldots).$$

It is clear that the left shift is continuous. Fix a probability vector $(p_i)_{i \in S}$, that is, $p_i \geq 0$ and $\sum_{i \in S} p_i = 1$. Then we can construct a measure on $X$:

Definition 2.5. The Bernoulli measure corresponding to $(p_i)$ is the unique measure $m$ on the Borel $\sigma$-algebra of $X$ such that

$$m[x_0, x_1, \ldots, x_n] = p_{x_0} p_{x_1} \cdots p_{x_n}$$

for all cylinders.

The proof that such a measure exists is beyond our scope. We instead proceed to show that the left shift is ergodic.

Proposition 2.2. Suppose $X = \{0, 1\}^{\mathbb{N}}$, $m$ is the Bernoulli measure with $p_0 = p_1 = \frac{1}{2}$ (so that every cylinder $[x_0, \ldots, x_n]$ has measure $(\frac{1}{2})^{n+1}$), and $T$ is the left shift. Then $(X, m, T)$ is measure isomorphic to the map in Example 2.1,

$$f : [0, 1) \to [0, 1), \quad f(x) = 2x \bmod 1,$$

and thus $T$ is ergodic, since $f$ is ergodic (Proposition 2.6).

Proof. Let $X'$ be the set that contains all sequences in $X$ except those that are eventually 1. The isomorphism is

$$\pi(x_0, x_1, \ldots) = \sum_{n=0}^{\infty} \frac{x_n}{2^{n+1}} \bmod 1.$$

Since every number in $[0, 1)$ has a unique binary expansion that is not eventually 1, $\pi$ is a bijection from $X'$ to $[0, 1)$. It sends each cylinder $[x_0, \ldots, x_n]$ onto a dyadic interval of length $2^{-(n+1)}$, so it carries $m$ to Lebesgue measure. Let $a \in [0, 1)$ and let the binary expansion of $a$ in $X'$ be $(a_1, a_2, \ldots)$, so that

$$a = \frac{a_1}{2} + \frac{a_2}{2^2} + \frac{a_3}{2^3} + \cdots.$$

Then

$$\pi(T(a_1, a_2, \ldots)) = \pi((a_2, a_3, \ldots)) = \frac{a_2}{2} + \frac{a_3}{2^2} + \frac{a_4}{2^3} + \cdots = 2a \bmod 1.$$

Hence $\pi$ also preserves the map structure. The only thing left is to show that the set of sequences that are eventually 1 has measure 0. This is clear: each single sequence has measure 0, and there are only countably many such sequences as the alphabet has only two elements.

The following two propositions serve as tools to determine if a transformation is ergodic.

Proposition 2.3. Let $(X, \mathcal{A}, m)$ be a probability space and let $T$ be a measure-preserving transformation from $X$ onto itself. Then the following statements are equivalent (here $\triangle$ denotes the symmetric difference of two sets):

(i) $T$ is ergodic.
(ii) Every set $A \in \mathcal{A}$ with $m(T^{-1}(A) \triangle A) = 0$ has the property $m(A) = 1$ or $m(A) = 0$.
(iii) For every $A \in \mathcal{A}$ with $m(A) > 0$ we have $m\left(\bigcup_{n=1}^{\infty} T^{-n}(A)\right) = 1$.
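Proof. (i) $\Rightarrow$ (ii). Let $A \in \mathcal{A}$ and $m(A \triangle T^{-1}(A)) = 0$. For each $n \geq 0$ we have $m(T^{-n}A \triangle A) = 0$ because

$$T^{-n}A \triangle A \subset \bigcup_{i=0}^{n-1} \left( T^{-(i+1)}A \triangle T^{-i}A \right) = \bigcup_{i=0}^{n-1} T^{-i}\left( T^{-1}A \triangle A \right),$$

and each set in the last union has measure 0 since $T$ is measure-preserving. Let

$$A_\infty = \bigcap_{n=0}^{\infty} \bigcup_{i=n}^{\infty} T^{-i}A.$$

From this construction we can see that $T^{-1}A_\infty = A_\infty$. Since

$$m\left( A \triangle \bigcup_{i=n}^{\infty} T^{-i}A \right) \leq \sum_{i=n}^{\infty} m(A \triangle T^{-i}A) = 0$$

and the sets $\bigcup_{i=n}^{\infty} T^{-i}A$ decrease with $n$, we have $m(A \triangle A_\infty) = 0$ and hence $m(A_\infty) = m(A)$. By ergodicity, $m(A_\infty) = 0$ or $1$. Hence $m(A) = 0$ or $1$.

(ii) $\Rightarrow$ (iii). Let $A \in \mathcal{A}$ and $m(A) > 0$. Let $A_1 = \bigcup_{n=1}^{\infty} T^{-n}(A)$. Clearly we have $T^{-1}(A_1) \subset A_1$. Since $m(T^{-1}(A_1)) = m(A_1)$, we have $m(T^{-1}(A_1) \triangle A_1) = 0$. Since $m(A_1) \geq m(T^{-1}A) = m(A) > 0$, by (ii) we can only have $m(A_1) = 1$.

(iii) $\Rightarrow$ (i). Let $A \in \mathcal{A}$ and $T^{-1}(A) = A$. Suppose $m(A) > 0$; by (iii) we have $m\left(\bigcup_{n=1}^{\infty} T^{-n}(A)\right) = 1$. Since $T^{-1}(A) = A$, we have

$$T^{-n}(A) = T^{-(n-1)}(A) = \cdots = A,$$

hence $m\left(\bigcup_{n=1}^{\infty} T^{-n}(A)\right) = m(A) = 1$.

Proposition 2.4. Let $(X, \mathcal{A}, m)$ be a probability space and let $T$ be a measure-preserving transformation from $X$ onto itself. Then the following statements are equivalent:

(i) $T$ is ergodic.
(ii) Whenever $f$ is measurable and $f \circ T = f$ a.e., then $f$ is constant a.e.
(iii) For a fixed $p \geq 1$, whenever $f \in L^p(m)$ and $f \circ T = f$ a.e., then $f$ is constant a.e.

Proof. Trivially we have (ii) $\Rightarrow$ (iii). We first show (i) $\Rightarrow$ (ii). Let $T$ be ergodic and $f \circ T = f$ a.e. We may assume $f$ is real-valued, for if $f$ is complex-valued we can consider the real and imaginary parts separately. Define, for $k \in \mathbb{Z}$ and $n > 0$,

$$X(k, n) = \{x : k/2^n \leq f(x) < (k + 1)/2^n\} = f^{-1}\big( [k/2^n, (k + 1)/2^n) \big).$$

We have

$$T^{-1}(X(k, n)) \triangle X(k, n) \subset \{x : (f \circ T)(x) \neq f(x)\}$$

and hence

$$m\big( T^{-1}(X(k, n)) \triangle X(k, n) \big) = 0,$$

and by Proposition 2.3, $m(X(k, n)) = 0$ or $1$. For each fixed $n$, $\bigcup_{k \in \mathbb{Z}} X(k, n) = X$ is a disjoint union, so there exists a unique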
k n such that m(x(k n, n)) =. Let Y = n= X(k n, n).then m(y ) = and f is constant on Y and so that f is constant a.e. (iii) (i). Suppose T (A) = A, A A. Then the characteristic function χ A is in L p and (χ A T ) = χ A. Hence, χ A is constant a.e. and thus χ A = 0 or a.e. This means m(a) = χ A dm = 0 or. Utilizing the tools above, we will now demonstrate that the transformations introduced in our examples are in fact ergodic. First we generalize a previous result. Proposition 2.5. The rotation T (z) = az of S (written multiplicatively) under Haar measure m is ergodic iff a is not a root of unity. Proof. Suppose a is a root of unity, then a p = for some positive integer p. Let f(z) = z p. Then f T = f and f is not constant a.e. Therefore T is not ergodic by the previous theorem. On the other hand, suppose a is not a root of unity and f T = f, f L 2 (m). Let f(z) = b n z n be its Fourier series. Then f(az) = n= n= b n a n z n and therefore b n (a n ) = 0 for all n. If n 0, since a is not a root of unity, we have a n 0. Hence b n = 0, and so f is constant a.e. By the previous theorem T is ergodic. We also have the following results: Proposition 2.6. The map f(x) = ax mod on S is ergodic if a is a positive integer. Proposition 2.7. The map f(x) = x+b mod on S is ergodic if b is irrational. The proofs of these propositions also rely on Fourier series in a straightforward manner and are left to the reader. 3 Birkoff ergodic theorem and Applications Theorem 3.. (Birkhoff s ergodic theorem) Let (X, A, m) be a probability space and let T be a measure-preserving transformation from X onto itself. Then, for any f in L (m), n lim f(t i (x)) = f (x) 7
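exists a.e., $f^*$ is $T$-invariant (that is, $f^* \circ T = f^*$ a.e.), and $\int_X f \, dm = \int_X f^* \, dm$. Moreover, if $T$ is ergodic, then $f^*$ is constant a.e. and $f^* = \int_X f \, dm$.

We will first mention some applications, and later provide a sketch of the proof. The complete proof involves some techniques in hard analysis.

Corollary 3.1. (Alternative definition for ergodicity) Let $(X, \mathcal{A}, m)$ be a probability space and let $T$ be a measure-preserving transformation from $X$ onto itself. Then $T$ is ergodic iff for all $A, B \in \mathcal{A}$ we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} m(T^{-i}A \cap B) = m(A)\, m(B).$$

Proof. Suppose $T$ is ergodic, and let $A, B \in \mathcal{A}$. Consider the characteristic function $\chi_A$. By the ergodic theorem we have, for a.e. $x$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \chi_A(T^i x) = \int_X \chi_A(x) \, dm = m(A).$$

Then, since $\chi_A(T^i x) = \chi_{T^{-i}A}(x)$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \chi_{T^{-i}A \cap B}(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \chi_{T^{-i}A}(x)\, \chi_B(x) = m(A)\, \chi_B(x) \quad \text{a.e.}$$

Since for each $n$ the function $\frac{1}{n} \sum_{i=0}^{n-1} \chi_{T^{-i}A \cap B}$ is dominated by the constant function 1, by the dominated convergence theorem we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} m(T^{-i}A \cap B) = \int_X \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \chi_{T^{-i}A \cap B}(x) \, dm = \int_X \chi_B \, m(A) \, dm = m(B)\, m(A).$$

Conversely, suppose the equation holds for all $A, B \in \mathcal{A}$. Let $E \in \mathcal{A}$ be such that $T^{-1}E = E$ and $m(E) > 0$. Since $m(T^{-i}E \cap E) = m(E)$ for every $i$, we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} m(T^{-i}E \cap E) = m(E).$$

Also, by hypothesis,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} m(T^{-i}E \cap E) = m(E)^2.$$

Hence $m(E) = m(E)^2$, and since $m(E) > 0$ we have $m(E) = 1$. Then $T$ is ergodic.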
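As a numerical illustration of Theorem 3.1 (a heuristic check, not part of any proof), the following Python sketch computes a Birkhoff average for the irrational rotation $x \mapsto x + b \bmod 1$, which is ergodic by Proposition 2.7. The choices $b = \sqrt{2} - 1$, $f = \chi_{[0, 1/2)}$, the starting point $0.1$, the number of iterations, and the function name `birkhoff_average` are all assumptions made for this example. The time average should be close to the space average $\int f \, dm = 1/2$, up to floating-point round-off.

```python
import math


def birkhoff_average(f, T, x0, n):
    """Return (1/n) * sum_{i=0}^{n-1} f(T^i(x0))."""
    total = 0.0
    x = x0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n


b = math.sqrt(2) - 1  # irrational rotation angle (assumed for illustration)


def rotation(x):
    """The translation x -> x + b mod 1 on [0, 1)."""
    return (x + b) % 1.0


def indicator(x):
    """Characteristic function of [0, 1/2)."""
    return 1.0 if x < 0.5 else 0.0


avg = birkhoff_average(indicator, rotation, x0=0.1, n=100_000)
print(avg)  # expected to be close to 1/2
```

A rotation is used here rather than the doubling map because iterating $2x \bmod 1$ in binary floating point loses one bit of the starting point per step, so the computed orbit reaches $0$ after a few dozen iterations and no longer reflects typical behavior.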
Theorem 3.2. (Borel's Theorem on Normal Numbers) For a.e. $x \in [0, 1)$, the frequency of 1s in the binary expansion of $x$ is $\frac{1}{2}$.

Proof. Let $T(x) = 2x \bmod 1$. We know this map is ergodic with respect to the Lebesgue measure. Let $Y$ denote the set of points in $[0, 1)$ that have a unique binary expansion. Then $Y$ has a countable complement and thus $m(Y) = 1$. Suppose $x = \frac{a_1}{2} + \frac{a_2}{2^2} + \frac{a_3}{2^3} + \cdots$ has a unique binary expansion. Then

$$T(x) = T\left( \frac{a_1}{2} + \frac{a_2}{2^2} + \frac{a_3}{2^3} + \cdots \right) = \frac{a_2}{2} + \frac{a_3}{2^2} + \cdots.$$

Let $f(x) = \chi_{(1/2, 1)}(x)$. Then

$$f(T^n(x)) = f\left( \frac{a_{n+1}}{2} + \frac{a_{n+2}}{2^2} + \frac{a_{n+3}}{2^3} + \cdots \right) = \begin{cases} 1 & \text{if } a_{n+1} = 1, \\ 0 & \text{if } a_{n+1} = 0. \end{cases}$$

Hence for $x \in Y$ the number of 1s in the first $n$ digits of the expansion of $x$ is $\sum_{i=0}^{n-1} f(T^i x)$. Dividing by $n$ and applying the ergodic theorem we have

$$\frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) \to \int \chi_{(1/2, 1)} \, dm = \frac{1}{2} \quad \text{a.e.}$$

This is our desired result.

Here we provide a sketch of the proof of the Birkhoff ergodic theorem. The proof is based on the following theorem:

Theorem 3.3. (Maximal Ergodic Theorem) Let $(X, \mathcal{A}, m)$ be a probability space and let $T$ be a measure-preserving transformation on the space. Let $f \in L^1(X, \mathcal{A}, m)$. Let

$$B_\alpha = \left\{ x \in X \;\middle|\; \sup_{n > 0} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) > \alpha \right\}.$$

Then for all $A \in \mathcal{A}$ with $T^{-1}A = A$ we have

$$\int_{B_\alpha \cap A} f \, dm \geq \alpha \, m(B_\alpha \cap A).$$

Now we define

$$f^*(x) = \limsup_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x), \qquad f_*(x) = \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x).$$
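Observe that $f^* \circ T = f^*$ and $f_* \circ T = f_*$. The central idea of this proof is to prove $f^* = f_*$ a.e.; the remaining statements of the Birkhoff ergodic theorem are then easy. Define

$$E_{\alpha, \beta} = \{x \in X \mid f^*(x) > \alpha \text{ and } f_*(x) < \beta\}.$$

Note that

$$\{x \in X \mid f_*(x) < f^*(x)\} = \bigcup_{\beta < \alpha,\ \alpha, \beta \in \mathbb{Q}} E_{\alpha, \beta}.$$

Thus to show $f^* = f_*$ a.e., it is sufficient to show that $m(E_{\alpha, \beta}) = 0$ whenever $\beta < \alpha$. Since we have $f^* \circ T = f^*$ and $f_* \circ T = f_*$, we see that $T^{-1}E_{\alpha, \beta} = E_{\alpha, \beta}$. Let

$$B_\alpha = \left\{ x \in X \;\middle|\; \sup_{n > 0} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) > \alpha \right\};$$

then $E_{\alpha, \beta} \cap B_\alpha = E_{\alpha, \beta}$. Applying the maximal ergodic theorem, we have

$$\int_{E_{\alpha, \beta}} f \, dm = \int_{E_{\alpha, \beta} \cap B_\alpha} f \, dm \geq \alpha \, m(E_{\alpha, \beta} \cap B_\alpha) = \alpha \, m(E_{\alpha, \beta}).$$

Replacing $f$, $\alpha$ and $\beta$ by $-f$, $-\beta$ and $-\alpha$, using the facts $(-f)^* = -f_*$ and $(-f)_* = -f^*$, and applying the same theorem, we also get

$$\int_{E_{\alpha, \beta}} f \, dm \leq \beta \, m(E_{\alpha, \beta}).$$

Therefore $\alpha \, m(E_{\alpha, \beta}) \leq \beta \, m(E_{\alpha, \beta})$, and since $\beta < \alpha$ this shows that $m(E_{\alpha, \beta}) = 0$. Thus $f^* = f_*$ a.e. and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) = f^*(x)$$

as required.

Acknowledgments

It is a pleasure to thank my mentor, Rachel Vishnepolsky, for her suggestion of the topic and generous help throughout the writing process.

References

[1] Peter Walters, An Introduction to Ergodic Theory. Springer, 2nd Edition, 1982.
[2] A.N. Shiryaev, Probability. Springer, 2nd Edition, 1995.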