MEASURE-THEORETIC ENTROPY

Abstract. We introduce measure-theoretic entropy.

1. Some motivation for the formula and the logs

We want to define a function $I : [0, 1] \to \mathbb{R}$ which measures how surprised we are, or how much information we gain, when we observe a random event, so that if an event occurs with probability $p$, then $I(p)$ is the measure of our surprise. For example, if $X \sim \mathrm{Bern}(0.01)$, we would be surprised to observe $X = 1$, and not so much to observe $X = 0$. By making the following assumptions on $I$, we can determine $I$ up to a constant.

(a) $I : [0, 1] \to [0, \infty)$.
(b) For all $p, q \in [0, 1]$, we have $I(pq) = I(p) + I(q)$.
(c) $I$ is continuous.

To motivate condition (b), suppose that the events $A$ and $B$ are independent and occur with probabilities $p$ and $q$, respectively, so that $A \cap B$ occurs with probability $pq$. It seems reasonable that information should be additive for independent events, so we require that (b) holds.

Proposition 1. The assumptions on $I$ give that for some constant $k > 0$, we have $I(p) = -k \log p$, where $\log = \ln$.

Exercise 2. Prove Proposition 1.

2. Entropy with respect to a partition

Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. Let $\alpha$ be a finite (measurable) partition of $\Omega$; that is, $\alpha \subset \mathcal{F}$ is finite, the union of the elements of $\alpha$ is all of $\Omega$, and the measure of the intersection of any two distinct elements of $\alpha$ is zero. Given $\omega \in \Omega$, we let $\alpha(\omega)$ be the part that $\omega$ belongs to. The information function with respect to $\alpha$ is given by
\[ I_\alpha(\omega) := -\log \mu(\alpha(\omega)) = -\sum_{A \in \alpha} \log(\mu(A)) \mathbf{1}_A(\omega). \]
The static entropy of $(\Omega, \mathcal{F}, \mu)$ with respect to the partition $\alpha$ is defined to be the non-negative number given by
\[ H_\mu(\alpha) = H(\alpha) := \mathbb{E} I_\alpha = -\sum_{A \in \alpha} \mu(A) \log(\mu(A)). \]
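To make the definition concrete, here is a minimal sketch in Python (the function name `entropy` is ours) which computes $H_\mu(\alpha)$ from the vector of part measures $(\mu(A))_{A \in \alpha}$, using the convention $0 \log 0 = 0$:

```python
import math

def entropy(probs):
    """Static entropy H(p) = -sum_i p_i log p_i with the natural log,
    using the convention 0 * log 0 = 0 (zero-measure parts are skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

For example, `entropy([0.5, 0.5])` returns $\log 2 \approx 0.693$.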
Here we take the convention that $0 \log 0 = 0$.

Given two partitions $\alpha = \{A_1, \ldots, A_n\}$ and $\beta = \{B_1, \ldots, B_m\}$ of $\Omega$, we say that $\beta$ refines $\alpha$ if every element of $\beta$ is a subset of an element of $\alpha$. We let $\alpha \vee \beta$ denote the least common refinement of $\alpha$ and $\beta$, given by
\[ \alpha \vee \beta := \{A_i \cap B_j : 1 \le i \le n,\ 1 \le j \le m\}. \]

Example 3 (i.i.d. systems). Let $p$ be a probability measure on a finite alphabet $\mathcal{A} = \{a_1, \ldots, a_n\}$. Consider the probability space $(\mathcal{A}^{\mathbb{Z}}, \mathcal{F}, \mu)$, where $\mathcal{F}$ is the usual product sigma-algebra and $\mu = p^{\mathbb{Z}}$. Let $P_i = \{x \in \mathcal{A}^{\mathbb{Z}} : x_0 = a_i\}$ and $\mathcal{P}^0 = \{P_1, \ldots, P_n\}$. We have that $H_\mu(\mathcal{P}^0) = H(p)$.

Exercise 4. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. Let $\alpha$ and $\beta$ be partitions of $\Omega$. Show that if $\beta$ refines $\alpha$, then $H_\mu(\beta) \ge H_\mu(\alpha)$.

Given a probability vector $p = (p_1, \ldots, p_n)$, we also write $H(p) = -\sum_{i=1}^n p_i \log p_i$, and if $X$ is a random variable with probability mass function $p$, we also write $H(X) = H(p)$.

Example 5 (i.i.d. systems continued). Let $T$ be the left-shift. Define $\mathcal{P}^1 = T^{-1}\mathcal{P}^0 := \{T^{-1}P_1, \ldots, T^{-1}P_n\}$. Note that $T^{-1}P_i = \{x \in \mathcal{A}^{\mathbb{Z}} : x_1 = a_i\}$, and $\mathcal{P}^1$ is also a partition of $\mathcal{A}^{\mathbb{Z}}$. We have that
\[ H_\mu(\mathcal{P}^0 \vee \mathcal{P}^1) = -\sum_{i,j} p_i p_j \log(p_i p_j) = -\sum_{i,j} p_i p_j \log p_i - \sum_{i,j} p_i p_j \log p_j = 2H(p). \]

We say that two partitions $\alpha$ and $\beta$ are independent if $\mu(A \cap B) = \mu(A)\mu(B)$ for all $A \in \alpha$ and $B \in \beta$.

Exercise 6. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. Let $\alpha$ and $\beta$ be independent partitions of $\Omega$. Show that $H(\alpha \vee \beta) = H(\alpha) + H(\beta)$.

Exercise 7 (Markov chains). Let $X$ be a Markov chain taking values in the state space $\{0, 1\}$ with transition matrix $P$ given by $p_{00} = \tfrac{1}{2}$, $p_{01} = \tfrac{1}{2}$, $p_{10} = \tfrac{1}{4}$, and $p_{11} = \tfrac{3}{4}$. Find $\pi$, the stationary distribution for $P$. Let $\Omega = \{0, 1\}^{\mathbb{N}}$, and let $\mathcal{P}^0$ and $\mathcal{P}^1$ be defined as before. Let $\mu(\cdot) = \mathbb{P}(X \in \cdot)$. Compute $H_\mu(\mathcal{P}^0 \vee \mathcal{P}^1)$.
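A numerical check for computations like Exercise 7 can be sketched as follows. This is not part of the exercise; it assumes the chain is started from the stationary distribution $\pi$, so that $\mu(P_i \cap T^{-1}P_j) = \pi_i\, p_{ij}$, and the helper names `stationary` and `joint_entropy` are ours:

```python
import math

# Transition matrix from Exercise 7: p00 = 1/2, p01 = 1/2, p10 = 1/4, p11 = 3/4.
P = [[0.50, 0.50],
     [0.25, 0.75]]

def stationary(P, iters=10_000):
    """Stationary distribution of P via power iteration (a sketch;
    assumes the chain is irreducible and aperiodic, as it is here)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def joint_entropy(pi, P):
    """H(P^0 v P^1) = -sum_{i,j} pi_i p_ij log(pi_i p_ij), valid when
    the chain starts from pi, so mu(P_i intersect T^{-1}P_j) = pi_i p_ij."""
    return -sum(pi[i] * P[i][j] * math.log(pi[i] * P[i][j])
                for i in range(len(P)) for j in range(len(P))
                if pi[i] * P[i][j] > 0)

pi = stationary(P)  # solves pi P = pi; for this P, pi = (1/3, 2/3)
```

Power iteration is used here only for brevity; solving the linear system $\pi P = \pi$, $\sum_i \pi_i = 1$ directly works just as well.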
3. Dynamical entropy with respect to a partition

Lemma 8. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. If $\alpha$ and $\beta$ are partitions of $\Omega$, then $H(\alpha \vee \beta) \le H(\alpha) + H(\beta)$.

We will prove Lemma 8 using the following useful inequality.

Proposition 9 (Gibbs inequality). Let $p = (p_1, \ldots, p_n)$ be a probability vector. If $q = (q_1, \ldots, q_n)$ is another probability vector, then
\[ -\sum_{i=1}^n p_i \log p_i \le -\sum_{i=1}^n p_i \log q_i, \]
where equality holds iff $p = q$.

Exercise 10. Use Jensen's inequality to prove Proposition 9.

Proof of Lemma 8. Write $\alpha = \{A_1, \ldots, A_n\}$ and $\beta = \{B_1, \ldots, B_m\}$. Let $r(i,j) = \mu(A_i \cap B_j)$, $p(i) = \mu(A_i)$, and $q(j) = \mu(B_j)$. Let
\[ MI(r) := \sum_{i,j} r(i,j) \log\!\left( \frac{r(i,j)}{p(i)q(j)} \right). \]
By Proposition 9, we have that $MI(r) \ge 0$. On the other hand, $MI(r) = H(p) + H(q) - H(r)$; thus
\[ H(\alpha \vee \beta) + MI(r) = H(\alpha) + H(\beta). \qquad \Box \]

We say that a sequence of real numbers $(a_n)$ is subadditive if $a_{n+m} \le a_n + a_m$ for all $n, m$.

Exercise 11 (Subadditive sequences). Show that if $(a_n)$ is a subadditive sequence of real numbers, then $\lim_{n \to \infty} a_n/n = \inf_n a_n/n$, where the common value may be $-\infty$.

Exercise 12. Let $(X_i)_{i \in \mathbb{N}}$ be a sequence of i.i.d. random variables. Show that $a_n = -\log \mathbb{P}(S_n > n)$ is subadditive, where $S_n = X_1 + \cdots + X_n$.

Let $(\Omega, \mathcal{F}, \mu)$ be a probability space, and let $T$ be a measure-preserving map. Let $\alpha$ be a finite partition. Consider the sequence given by
\[ a_n := H(\alpha \vee T^{-1}\alpha \vee \cdots \vee T^{-(n-1)}\alpha). \tag{1} \]
It is not difficult to show that $(a_n)$ is subadditive. Hence we may define the dynamical entropy of $(\Omega, \mathcal{F}, \mu, T)$ with respect to the partition $\alpha$ to be
\[ h_\mu(\alpha, T) := \lim_{n \to \infty} \frac{a_n}{n}. \]
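For the i.i.d. system of Example 3, the sequence (1) can be computed in closed form, since the parts of the join $\alpha \vee T^{-1}\alpha \vee \cdots \vee T^{-(n-1)}\alpha$ are cylinder sets whose measures are products of entries of $p$. The following sketch (the probability vector and the name `block_entropy` are our choices) checks numerically that $a_n = nH(p)$, so $a_n/n$ is constant and the limit defining $h_\mu(\alpha, T)$ equals $H(p)$:

```python
import math
from itertools import product

p = [0.5, 0.3, 0.2]  # an illustrative probability vector (our choice)

def block_entropy(p, n):
    """a_n = H(alpha v T^{-1}alpha v ... v T^{-(n-1)}alpha) for the
    i.i.d. shift: each part of the join is a cylinder set
    {x : x_0 = a_{i_0}, ..., x_{n-1} = a_{i_{n-1}}} whose measure is
    the product p_{i_0} * ... * p_{i_{n-1}}."""
    return -sum((m := math.prod(word)) * math.log(m)
                for word in product(p, repeat=n))
```

For $n = 2$ this recovers the computation of Example 5, where $H_\mu(\mathcal{P}^0 \vee \mathcal{P}^1) = 2H(p)$.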
The dynamical entropy of $(\Omega, \mathcal{F}, \mu, T)$ is defined by
\[ h_\mu(T) := \sup_\alpha h_\mu(\alpha, T), \]
where the supremum is taken over all finite measurable partitions. Most of the time this is just referred to as the entropy, or the Kolmogorov-Sinai entropy.

Exercise 13. Show that the sequence given in (1) is subadditive.

Exercise 14. Let $(\Omega, \mathcal{F}, \mu, T)$ be a measure-preserving system. Suppose that $(\Omega', \mathcal{F}', \mu', T')$ is a factor of $(\Omega, \mathcal{F}, \mu, T)$. Show that $h_{\mu'}(T') \le h_\mu(T)$. Conclude that entropy is an isomorphism invariant.

Exercise 15. Show that the Kolmogorov-Sinai entropy of a Bernoulli shift on $N$ symbols is bounded above by $\log N$.

Exercise 16 (Infinite entropy). Let $\lambda$ be Lebesgue measure on $[0, 1]$. Consider the space of all sequences taking values in $[0, 1]$, given by $([0, 1]^{\mathbb{N}}, \mathcal{F}, \lambda^{\mathbb{N}}, T)$, where $T$ is the left-shift and $\mathcal{F}$ is the product sigma-algebra. Show that this system has infinite Kolmogorov-Sinai entropy.

4. Generating partitions

Let $(\Omega, \mathcal{F}, \mu, T)$ be an invertible measure-preserving system. Let $\alpha$ be a finite partition. We say that $\alpha$ is a generating partition if the sigma-algebra generated by
\[ \alpha,\ T^{-1}\alpha,\ T\alpha,\ T^{-2}\alpha,\ T^2\alpha,\ \ldots \]
is all of $\mathcal{F}$; we say that $\alpha$ is a one-sided generating partition if $\alpha, T^{-1}\alpha, T^{-2}\alpha, \ldots$ generates all of $\mathcal{F}$.

Theorem 17 (Kolmogorov-Sinai). Let $(\Omega, \mathcal{F}, \mu, T)$ be a measure-preserving system. If $\alpha$ is a generating partition, then $h_\mu(\alpha, T) = h_\mu(T)$.

Exercise 18 (i.i.d. systems again). Notice that in Example 5, $\mathcal{P}^0$ is a generating partition. By Exercise 6, we have that $H_\mu(\mathcal{P}^0 \vee \cdots \vee T^{-(n-1)}\mathcal{P}^0) = nH(p)$, so that $h_\mu(T) = H(p)$.

Exercise 19. Compute the entropy of the Markov chain from Exercise 7 (endowed with the left-shift).

Theorem 20 (Rokhlin's countable generator theorem). Let $(\Omega, \mathcal{F}, \mu, T)$ be an invertible ergodic measure-preserving system. If $h_\mu(T) < \infty$, then there exists a countable generating partition $\alpha$ for the system with $H_\mu(\alpha) < \infty$.

Theorem 21 (Krieger's finite generator theorem). Let $(\Omega, \mathcal{F}, \mu, T)$ be an invertible ergodic measure-preserving system. If $h_\mu(T) < \infty$, then there exists a finite generating partition for the system.
Theorems 20 and 21 make good presentation topics. The original proofs of these theorems can be quite difficult; however, there are easier proofs due to Keane and Serafin [4, 5].

Exercise 22. Let $(\Omega, \mathcal{F}, \mu, T)$ be an invertible ergodic measure-preserving system. If $h_\mu(T) < \infty$, show that there exist $N > 0$ and a bijective map $\phi : \Omega \to [N]^{\mathbb{Z}}$ (defined up to a set of measure zero) such that $\phi \circ T = S \circ \phi$, where $S$ is the left-shift.

Remark 23 (A word of caution for non-invertible systems). Theorem 21 may not hold if $T$ is not invertible. Sinai proved that every ergodic measure-preserving system, whether invertible or not, has every Bernoulli shift of no greater entropy as a factor; however, the isomorphism theorem of Ornstein may not hold for one-sided Bernoulli shifts. Although entropy is still an invariant, it is not a complete invariant for one-sided Bernoulli shifts. One can see this by considering Meshalkin's example and counting pre-images.

Presentation topic 24. Present an overview of what can and cannot be done in the one-sided case.

References

[1] T. Downarowicz. Entropy in dynamical systems, volume 18 of New Mathematical Monographs. Cambridge University Press, Cambridge, 2011.
[2] R. W. Hamming. Coding and information theory. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1980.
[3] C. Hoffman and D. Rudolph. Uniform endomorphisms which are isomorphic to a Bernoulli shift. Annals of Mathematics, 156(1):79-101, 2002.
[4] M. S. Keane and J. Serafin. On the countable generator theorem. Fund. Math., 157(2-3):255-259, 1998. Dedicated to the memory of Wiesław Szlenk.
[5] M. S. Keane and J. Serafin. Generators. Period. Math. Hungar., 44(2):187-195, 2002.
[6] W. Parry and P. Walters. Endomorphisms of a Lebesgue space. Bull. Amer. Math. Soc., 78(2):272-276, 1972.
[7] K. Petersen. Ergodic theory, volume 2 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1989. Corrected reprint of the 1983 original.