Chapter 1 Measure-Theoretic Entropy, Introduction
... nobody knows what entropy really is, so in a debate you will always have the advantage.
(attr. von Neumann)

Let $(X,\mathscr{B},\mu)$ be a probability space, and let $T\colon X\to X$ be a measurable map, which we will frequently also refer to as a transformation. We say that $T$ is measure-preserving, or equivalently that $\mu$ is $T$-invariant, if $\mu(T^{-1}B)=\mu(B)$ for every $B\in\mathscr{B}$. In this case we also say that $(X,\mathscr{B},\mu,T)$ is a measure-preserving system. A measure-preserving system is called ergodic if any set $B$ that is invariant modulo $\mu$ must have measure $\mu(B)\in\{0,1\}$, where $B\in\mathscr{B}$ is called invariant modulo $\mu$ if $\mu(B\triangle T^{-1}B)=0$. We refer to Appendix A for a brief introduction to these and further concepts of ergodic theory and for some important examples, and refer to [52] for a more thorough background.

Measure-theoretic entropy is a numerical invariant associated to a measure-preserving system. The early part of the theory described here is due essentially to Kolmogorov, Sinaĭ and Rokhlin, and dates from the late 1950s. As we will see in this chapter, and even more so in Chapter 3, there is also a close connection to information theory and the pioneering work of Shannon [184] from 1948. The name entropy for Shannon's measure of information-carrying capacity was apparently suggested by von Neumann: Shannon is quoted by Tribus and McIrvine [198] as recalling that

My greatest concern was what to call it. I thought of calling it information, but the word was overly used, so I decided to call it uncertainty. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.
The purpose of this volume is to extend this advantage to the reader, who will learn throughout these notes that entropy is a multifaceted notion of
great importance to ergodic theory, dynamical systems, and its applications. For instance, entropy can be used in order to distinguish special measures like Haar measure from other invariant measures. However, let us not jump ahead too much, and note that one of the initial motivations for entropy theory was the following kind of question.

The Bernoulli shift on 2 symbols $\sigma_2\colon\{0,1\}^{\mathbb{Z}}\to\{0,1\}^{\mathbb{Z}}$, defined by $(\sigma_2 x)_n=x_{n+1}$ for every $n\in\mathbb{Z}$ and $x\in\{0,1\}^{\mathbb{Z}}$, preserves the $(\frac12,\frac12)$ Bernoulli measure $\mu_2$, which is defined to be the product measure $(\frac12,\frac12)^{\mathbb{Z}}$ on $\{0,1\}^{\mathbb{Z}}$ (see also Appendix A.4 for more general examples of this type). Similarly, the Bernoulli shift on 3 symbols $\sigma_3\colon\{0,1,2\}^{\mathbb{Z}}\to\{0,1,2\}^{\mathbb{Z}}$ preserves the $(\frac13,\frac13,\frac13)$ Bernoulli measure $\mu_3$. These two measure-preserving systems share many properties, and in particular are unitarily equivalent in the following sense. To any invertible measure-preserving system $(X,\mathscr{B},\mu,T)$ one can associate a unitary operator $U_T\colon L^2_\mu(X)\to L^2_\mu(X)$ defined by $U_T f=f\circ T$ for all $f\in L^2_\mu(X)$ (see also Section A.1.1). The two shift maps $\sigma_2$ and $\sigma_3$ are unitarily equivalent in the sense that there is an invertible linear operator $W\colon L^2_{\mu_3}\to L^2_{\mu_2}$ with $\langle Wf,Wg\rangle_{\mu_2}=\langle f,g\rangle_{\mu_3}$ and $U_{\sigma_2}=WU_{\sigma_3}W^{-1}$ (see the exercises for a description of a much larger class of measure-preserving systems that are all spectrally indistinguishable). Are $\sigma_2$ and $\sigma_3$ isomorphic as measure-preserving transformations? To see that this is not out of the question, we note that Mešalkin [135] showed that the $(\frac14,\frac14,\frac14,\frac14)$ Bernoulli shift is isomorphic to the one defined by the probability vector $(\frac12,\frac18,\frac18,\frac18,\frac18)$; see Section 1.7 for a brief description of the isomorphism between these two maps. It turns out that entropy is preserved by measurable isomorphism, and the $(\frac12,\frac12)$ Bernoulli shift and the $(\frac13,\frac13,\frac13)$ Bernoulli shift have different entropies, so they cannot be isomorphic.
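The entropy values invoked here are easy to evaluate directly from the probability vectors. The following sketch (plain Python; the function name is our own) computes $H(p)=-\sum_i p_i\log p_i$ for the vectors above: note that the two vectors in Mešalkin's example have the same entropy, so entropy does not rule out an isomorphism there, while $(\frac12,\frac12)$ and $(\frac13,\frac13,\frac13)$ do not.

```python
import math

def entropy(p):
    """Entropy -sum p_i log p_i of a probability vector (natural log)."""
    assert abs(sum(p) - 1.0) < 1e-12
    return -sum(x * math.log(x) for x in p if x > 0)

h2 = entropy([1/2, 1/2])                  # log 2
h3 = entropy([1/3, 1/3, 1/3])             # log 3
h4 = entropy([1/4] * 4)                   # log 4 = 2 log 2
h_meshalkin = entropy([1/2] + [1/8] * 4)  # also 2 log 2

print(h2, h3)           # log 2 != log 3: the two shifts cannot be isomorphic
print(h4, h_meshalkin)  # equal entropies: an isomorphism is not ruled out
```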
The basic machinery of entropy theory will take some effort to develop, but its interpretation in terms of information theory makes the definitions easy to remember and highly intuitive.

1.1 Entropy of a Partition

Recall that a partition of a probability space $(X,\mathscr{B},\mu)$ is a finite or countably infinite collection of disjoint (and, by assumption, always measurable) subsets of $X$ whose union is $X$,
$$\xi=\{A_1,\dots,A_k\}\quad\text{or}\quad\xi=\{A_1,A_2,\dots\}.$$
We will often think of a partition as being given with an explicit enumeration of its
elements: that is, as a list of disjoint measurable sets that cover $X$. We will use the word partition both for a collection of sets and for an enumerated list of sets. This is usually a matter of notational convenience but, for example, in Sections 1.2, 1.4 and 1.7 it is essential that we work with an enumerated list.

For any partition $\xi$ we define $\sigma(\xi)$ to be the smallest $\sigma$-algebra containing the elements of $\xi$. We will call the elements of $\xi$ the atoms of the partition, and write $[x]_\xi$ for the atom of $\xi$ containing $x$. If the partition $\xi$ is finite, then the $\sigma$-algebra $\sigma(\xi)$ is also finite and comprises the unions of elements of $\xi$. If $\xi$ and $\eta$ are partitions, then $\eta$ is said to be a refinement of $\xi$, written $\xi\leqslant\eta$, if each atom of $\xi$ is a union of atoms of $\eta$. The common refinement of two partitions $\xi=\{A_1,A_2,\dots\}$ and $\eta=\{B_1,B_2,\dots\}$, denoted $\xi\vee\eta$, is the partition into all sets of the form $A_i\cap B_j$. Notice that $\sigma(\xi\vee\eta)=\sigma(\xi)\vee\sigma(\eta)$, where the right-hand side denotes the $\sigma$-algebra generated by $\sigma(\xi)$ and $\sigma(\eta)$, equivalently the intersection of all sub-$\sigma$-algebras of $\mathscr{B}$ containing both $\sigma(\xi)$ and $\sigma(\eta)$. This allows us to move from partitions to sub-$\sigma$-algebras with impunity. The notation $\bigvee_{n=0}^{\infty}\xi_n$ will always mean the smallest $\sigma$-algebra containing $\sigma(\xi_n)$ for all $n\geqslant 0$, and we will also write $\xi_n\nearrow\mathscr{B}$ as a shorthand for $\sigma(\xi_n)\nearrow\mathscr{B}$ for an increasing sequence of partitions that generate the $\sigma$-algebra $\mathscr{B}$ of $X$. For a measurable map $T\colon X\to X$ and a partition $\xi=\{A_1,A_2,\dots\}$ we write $T^{-1}\xi$ for the partition $\{T^{-1}A_1,T^{-1}A_2,\dots\}$ obtained by taking preimages.

Basic Definition

A partition $\xi=\{A_1,A_2,\dots\}$ may be thought of as giving the possible outcomes $1,2,\dots$ of an experiment, with the probability of outcome $i$ being $\mu(A_i)$. The first step is to associate a number $H(\xi)$ to $\xi$ which describes the amount of uncertainty about the outcome of the experiment, or equivalently the amount of information gained by learning the outcome of the experiment.
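For partitions of a finite set, the common refinement and the atom map are direct to compute. The sketch below (illustrative code with our own names) forms $\xi\vee\eta$ as the nonempty intersections $A_i\cap B_j$ and looks up $[x]_\xi$.

```python
def common_refinement(xi, eta):
    """All nonempty intersections A ∩ B with A in xi, B in eta."""
    return [A & B for A in xi for B in eta if A & B]

def atom(xi, x):
    """The atom [x]_xi of the partition xi containing x."""
    return next(A for A in xi if x in A)

X = set(range(8))
xi = [{0, 1, 2, 3}, {4, 5, 6, 7}]
eta = [{0, 1, 4, 5}, {2, 3, 6, 7}]

joint = common_refinement(xi, eta)
print(joint)            # [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
print(atom(joint, 6))   # {6, 7}
```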
Two extremes are clear: if one of the sets $A_i$ has $\mu(A_i)=1$ then there is no uncertainty about the outcome, and no information to be gained by performing the experiment, so $H(\xi)=0$. At the opposite extreme, if each atom $A_i$ of a partition with $k$ elements has $\mu(A_i)=\frac1k$, then we have maximal uncertainty about the outcome, and $H(\xi)$ should take on its maximum value (for given $k$) for such a partition.

Definition 1.1. The entropy of a partition $\xi=\{A_1,A_2,\dots\}$ is
$$H_\mu(\xi)=H(\mu(A_1),\dots)=-\sum_{i\geqslant 1}\mu(A_i)\log\mu(A_i)\in[0,\infty],$$
where $0\log 0$ is defined to be $0$. If $\xi=\{A_1,\dots\}$ and $\eta=\{B_1,\dots\}$ are partitions, then the conditional entropy of the outcome of $\xi$ once we have
been told the outcome of $\eta$ (briefly, the conditional entropy of $\xi$ given $\eta$) is defined to be
$$H_\mu(\xi\mid\eta)=\sum_{j=1}^{\infty}\mu(B_j)\,H\!\left(\frac{\mu(A_1\cap B_j)}{\mu(B_j)},\frac{\mu(A_2\cap B_j)}{\mu(B_j)},\dots\right).\tag{1.1}$$

The formula in (1.1) may be viewed as a weighted average of entropies of the partition $\xi$ conditioned (that is, restricted to each atom and then normalized by the measure of that atom) on individual atoms $B_j\in\eta$. Under the correspondence between partitions and $\sigma$-algebras, we may also view $H_\mu$ as being defined on any $\sigma$-algebra corresponding to a countably infinite or finite partition.

Essential Properties

Notice that the quantity $H_\mu(\xi)$ depends on the partition $\xi$ only via the probability vector $(\mu(A_1),\mu(A_2),\dots)$. Restricting to finite probability vectors, the entropy function $H$ is defined on the space of finite-dimensional simplices $\Delta=\bigcup_k\Delta_k$, where $\Delta_k=\{(p_1,\dots,p_k)\mid p_i\geqslant 0,\ \sum p_i=1\}$, by
$$H(p_1,\dots,p_k)=-\sum_{i=1}^{k}p_i\log p_i.$$
Remarkably, the function in Definition 1.1 is essentially the only function obeying a natural set of properties reflecting the idea of quantifying the uncertainty about the outcome of an experiment. We now list some basic properties of $H_\mu(\cdot)$, $H$, and $H_\mu(\cdot\mid\cdot)$. Of these properties, (1) and (2) are immediate consequences of the definition, and (3) and (4) will be shown later.

(1) $H(p_1,\dots,p_k)\geqslant 0$, and $H(p_1,\dots,p_k)=0$ if and only if some $p_i=1$.
(2) $H(p_1,\dots,p_k,0)=H(p_1,\dots,p_k)$.
(3) For each $k\geqslant 1$, $H$ restricted to $\Delta_k$ is continuous, invariant under permutation of the variables, and attains the maximum value $\log k$ at the point $(\frac1k,\dots,\frac1k)$.
(4) $H_\mu(\xi\vee\eta)=H_\mu(\eta)+H_\mu(\xi\mid\eta)$.

Khinchin [102, p. 9] showed that $H_\mu$ as defined in Definition 1.1 is the only function with these properties. In this chapter all these properties of the entropy function will be derived, but Khinchin's characterization of entropy in terms of the properties (1) to (4) above will not be used and will not be proved here.
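Definition 1.1 and the additivity property (4) can be checked numerically on a concrete example. The sketch below (our own illustrative code) computes $H_\mu(\xi)$, $H_\mu(\eta)$, $H_\mu(\xi\vee\eta)$ and $H_\mu(\xi\mid\eta)$ for partitions of a finite probability space and confirms $H_\mu(\xi\vee\eta)=H_\mu(\eta)+H_\mu(\xi\mid\eta)$.

```python
import math

def H(weights):
    """Entropy of the probability vector obtained by normalizing weights."""
    total = sum(weights)
    return -sum(w/total * math.log(w/total) for w in weights if w > 0)

# mu: point masses on the finite space {0,...,5}
mu = {0: 0.1, 1: 0.2, 2: 0.1, 3: 0.25, 4: 0.15, 5: 0.2}
xi = [{0, 1, 2}, {3, 4, 5}]
eta = [{0, 3}, {1, 4}, {2, 5}]

def mass(A):
    return sum(mu[x] for x in A)

def H_partition(p):
    return H([mass(A) for A in p])

def refine(p, q):
    return [A & B for A in p for B in q if A & B]

def H_cond(p, q):
    """H(p | q): weighted average over atoms B of q of the entropy of p restricted to B."""
    return sum(mass(B) * H([mass(A & B) for A in p if A & B]) for B in q)

lhs = H_partition(refine(xi, eta))
rhs = H_partition(eta) + H_cond(xi, eta)
print(abs(lhs - rhs) < 1e-9)  # True: H(xi v eta) = H(eta) + H(xi | eta)
```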
Convexity

Many of the most fundamental properties of entropy are a consequence of convexity, and we now recall some elementary properties of convex functions.

Definition 1.2. A function $\psi\colon(a,b)\to\mathbb{R}$ is convex if
$$\psi\!\left(\sum_{i=1}^{n}t_ix_i\right)\leqslant\sum_{i=1}^{n}t_i\psi(x_i)$$
for all $x_i\in(a,b)$ and $t_i\in[0,1]$ with $\sum_{i=1}^{n}t_i=1$, and is strictly convex if
$$\psi\!\left(\sum_{i=1}^{n}t_ix_i\right)<\sum_{i=1}^{n}t_i\psi(x_i)$$
unless $x_i=x$ for some $x\in(a,b)$ and all $i$ with $t_i>0$.

Let us recall a simple consequence of this definition. Suppose that $a<s<t<u<b$. Then convexity of $\psi$ implies that
$$\psi(t)=\psi\!\left(\frac{u-t}{u-s}\,s+\frac{t-s}{u-s}\,u\right)\leqslant\frac{u-t}{u-s}\,\psi(s)+\frac{t-s}{u-s}\,\psi(u),$$
which is equivalent to the following monotonicity of slopes:
$$\frac{\psi(t)-\psi(s)}{t-s}\leqslant\frac{\psi(u)-\psi(t)}{u-t}.\tag{1.2}$$
Note that strict convexity would give a strict inequality in (1.2).

Lemma 1.3 (Jensen's inequality). Let $\psi\colon(a,b)\to\mathbb{R}$ be a convex function and let $f\colon X\to(a,b)$ be a measurable function in $L^1_\mu$ on a probability space $(X,\mathscr{B},\mu)$. Then
$$\psi\!\left(\int f(x)\,d\mu(x)\right)\leqslant\int\psi(f(x))\,d\mu(x).\tag{1.3}$$
If in addition $\psi$ is strictly convex, then
$$\psi\!\left(\int f(x)\,d\mu(x)\right)<\int\psi(f(x))\,d\mu(x)\tag{1.4}$$
unless $f(x)=t$ for $\mu$-almost every $x\in X$, for some fixed $t\in(a,b)$.

In this lemma we permit $a=-\infty$ and $b=\infty$. Similar conclusions hold on half-open and closed intervals.
Proof of Lemma 1.3. Let $t=\int f\,d\mu$, so that $t\in(a,b)$. Let
$$\beta=\sup_{a<s<t}\left\{\frac{\psi(t)-\psi(s)}{t-s}\right\},$$
so that, by (1.2),
$$\beta\leqslant\inf_{t<u<b}\left\{\frac{\psi(u)-\psi(t)}{u-t}\right\}.$$
It follows that $\psi(s)\geqslant\psi(t)+(s-t)\beta$ for any $s$ with $a<s<b$, so
$$\psi(f(x))\geqslant\psi(t)+(f(x)-t)\beta\tag{1.5}$$
for every $x\in X$. Since $\psi$ is continuous, the map $x\mapsto\psi(f(x))$ is measurable and so we may integrate (1.5) to get
$$\int\psi\circ f\,d\mu-\psi\!\left(\int f\,d\mu\right)\geqslant\beta\int f\,d\mu-\beta\int f\,d\mu=0,$$
showing (1.3). If $\psi$ is strictly convex, then $\psi(s)>\psi(t)+\beta(s-t)$ for all $s>t$ and for all $s<t$. If $f$ is not equal almost everywhere to a constant, then $f(x)-t$ takes on both negative and positive values on sets of positive measure, proving (1.4). $\square$

We note that only (1.2) was needed to prove Lemma 1.3, which shows that $\psi''\geqslant 0$ implies convexity and $\psi''>0$ implies strict convexity, using only the mean value theorem of analysis. We now apply this to the function $x\mapsto x\log x$ in the definition of entropy. Define the function $\phi\colon[0,\infty)\to\mathbb{R}$ by
$$\phi(x)=\begin{cases}0&\text{if }x=0;\\ x\log x&\text{if }x>0.\end{cases}\tag{1.6}$$
Clearly the choice of $\phi(0)$ means that $\phi$ is continuous at $0$. The graph of $\phi$ is shown in Figure 1.1; the minimum value occurs at $x=1/e$. Since $\phi''(x)=\frac1x>0$ and $(-\log x)''=\frac1{x^2}>0$ on $(0,1]$, we get the following fundamental lemma.

Lemma 1.4 (Convexity). The function $x\mapsto\phi(x)$ is strictly convex on $[0,\infty)$ and the function $x\mapsto-\log x$ is strictly convex on $(0,\infty)$.

A consequence of this is that the maximum amount of information in a partition arises when all the atoms of the partition have the same measure.

(On open sub-intervals, continuity of $\psi$ is a consequence of the monotonicity of slopes in (1.2). On half-open intervals, continuity may fail at an end point, but this does not affect measurability.)
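The convexity of $\phi(x)=x\log x$ that drives these arguments can be probed numerically: for probability weights $t_i$ and points $x_i$, Jensen's inequality gives $\phi(\sum_i t_ix_i)\leqslant\sum_i t_i\phi(x_i)$. A small sketch (our own code) checks this on random data.

```python
import math
import random

def phi(x):
    """x log x, extended by phi(0) = 0 as in (1.6)."""
    return 0.0 if x == 0 else x * math.log(x)

random.seed(1)
for _ in range(1000):
    n = random.randint(2, 6)
    xs = [random.uniform(0.0, 5.0) for _ in range(n)]
    ts = [random.random() for _ in range(n)]
    s = sum(ts)
    ts = [t / s for t in ts]                 # normalize to a probability vector
    mean = sum(t * x for t, x in zip(ts, xs))
    # Jensen: phi of the average is at most the average of phi
    assert phi(mean) <= sum(t * phi(x) for t, x in zip(ts, xs)) + 1e-12
print("Jensen's inequality holds for phi on all sampled data")
```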
Fig. 1.1: The graph of $x\mapsto\phi(x)$.

Proposition 1.5 (Maximal entropy). If $\xi$ is a partition with $k$ atoms, then $H_\mu(\xi)\leqslant\log k$, with equality if and only if $\mu(P)=\frac1k$ for each atom $P$ of $\xi$.

This establishes property (3) of the function $H\colon\Delta\to[0,\infty)$ listed above. We also note that this proposition is a precursor of many characterizations of uniform measures as being those with maximal entropy (see Example 1.28 for the first instance of this phenomenon).

Proof of Proposition 1.5. By Lemma 1.4, if some atom $P$ has $\mu(P)\neq\frac1k$, then
$$-\frac1k\log k=\phi\!\left(\frac1k\right)=\phi\!\left(\sum_{P\in\xi}\frac1k\,\mu(P)\right)<\sum_{P\in\xi}\frac1k\,\phi(\mu(P)),$$
so $-\sum_{P\in\xi}\mu(P)\log\mu(P)<\log k$. If $\mu(P)=\frac1k$ for all $P\in\xi$, then $H_\mu(\xi)=\log k$. $\square$

Proof of Essential Properties

It will be useful to introduce a function associated to a partition $\xi$ closely related to the entropy $H_\mu(\xi)$.

Definition 1.6. The information function of a partition $\xi$ is defined by
$$I_\mu(\xi)(x)=-\log\mu([x]_\xi),$$
where $[x]_\xi\in\xi$ is the partition element with $x\in[x]_\xi$. Moreover, if $\eta$ is another partition, then the conditional information function of $\xi$ given $\eta$ is defined by
$$I_\mu(\xi\mid\eta)(x)=-\log\frac{\mu([x]_\xi\cap[x]_\eta)}{\mu([x]_\eta)}.$$

In the next proposition we give the remaining main properties of the entropy function, and in particular we prove property (4) above.

Proposition 1.7 (Additivity and Monotonicity). Let $\xi$ and $\eta$ be countable partitions of $(X,\mathscr{B},\mu)$. Then:
(1) $H_\mu(\xi)=\int I_\mu(\xi)\,d\mu$ and $H_\mu(\xi\mid\eta)=\int I_\mu(\xi\mid\eta)\,d\mu$;
(2) $I_\mu(\xi\vee\eta)=I_\mu(\eta)+I_\mu(\xi\mid\eta)$, $H_\mu(\xi\vee\eta)=H_\mu(\eta)+H_\mu(\xi\mid\eta)$ and so, if $H_\mu(\xi)<\infty$, then $H_\mu(\xi\mid\eta)=H_\mu(\xi\vee\eta)-H_\mu(\eta)$;
(3) $H_\mu(\xi\vee\eta)\leqslant H_\mu(\xi)+H_\mu(\eta)$;
(4) if $\eta$ and $\zeta$ are partitions of finite entropy, then $H_\mu(\xi\mid\eta\vee\zeta)\leqslant H_\mu(\xi\mid\zeta)$.

We note that all the properties in Proposition 1.7 fit very well with the interpretation of $I_\mu(\xi)(x)$ as the information gained about the point $x$ by learning which atom of $\xi$ contains $x$, and of $H_\mu(\xi)$ as the average information. Thus (2) says that the information gained by learning which element of the refinement $\xi\vee\eta$ contains $x$ is equal to the information gained by learning which atom of $\eta$ contains $x$, added to the information gained by learning in addition which atom of $\xi$ contains $x$, given the earlier knowledge about which atom of $\eta$ contains $x$. The reader may find it helpful to give similar interpretations of the various entropy and information identities and inequalities that come later.

Example 1.8. Notice that the relation $H_\mu(\xi\mid\eta)\leqslant H_\mu(\xi)$ for entropy (see property (4) above) does not hold for the information function $I_\mu$. For example, let $\xi$ and $\eta$ denote the partitions of $[0,1]^2$ shown in Figure 1.2, and let $m$ denote the two-dimensional Lebesgue measure on $[0,1]^2$. Then $I_m(\xi)=\log 2$ while $I_m(\xi\mid\eta)$ is
$$\begin{cases}>\log 2&\text{in the shaded region;}\\ <\log 2&\text{outside the shaded region.}\end{cases}$$

Proof of Proposition 1.7. Write $\xi=\{A_1,A_2,\dots\}$ and $\eta=\{B_1,B_2,\dots\}$; then
Fig. 1.2: Partitions $\xi$ and $\eta$ and their refinement $\xi\vee\eta$.

$$\int I_\mu(\xi\mid\eta)\,d\mu=-\sum_{A_i\in\xi,\,B_j\in\eta}\mu(A_i\cap B_j)\log\frac{\mu(A_i\cap B_j)}{\mu(B_j)}=-\sum_{B_j\in\eta}\mu(B_j)\sum_{A_i\in\xi}\frac{\mu(A_i\cap B_j)}{\mu(B_j)}\log\frac{\mu(A_i\cap B_j)}{\mu(B_j)}=\sum_{B_j\in\eta}\mu(B_j)\,H\!\left(\frac{\mu(A_1\cap B_j)}{\mu(B_j)},\dots\right)=H_\mu(\xi\mid\eta),$$
showing the second formula in (1), and hence the first by setting $\eta=\{X\}$. Notice that
$$I_\mu(\xi\vee\eta)(x)=-\log\mu([x]_\xi\cap[x]_\eta)=-\log\mu([x]_\eta)-\log\frac{\mu([x]_\xi\cap[x]_\eta)}{\mu([x]_\eta)}=I_\mu(\eta)(x)+I_\mu(\xi\mid\eta)(x),$$
which gives (2) by integration. By convexity of $\phi$,
$$H_\mu(\xi\mid\eta)=-\sum_{A_i\in\xi}\sum_{B_j\in\eta}\mu(B_j)\,\phi\!\left(\frac{\mu(A_i\cap B_j)}{\mu(B_j)}\right)\leqslant-\sum_{A_i\in\xi}\phi\!\left(\sum_{B_j\in\eta}\mu(B_j)\frac{\mu(A_i\cap B_j)}{\mu(B_j)}\right)=-\sum_{A_i\in\xi}\phi(\mu(A_i))=H_\mu(\xi),$$
which together with (2) shows (3). Finally, using the above and the already established additivity property (2) we get
$$\begin{aligned}H_\mu(\xi\mid\eta\vee\zeta)&=H_\mu(\xi\vee\eta\vee\zeta)-H_\mu(\eta\vee\zeta)\\&=H_\mu(\zeta)+H_\mu(\xi\vee\eta\mid\zeta)-H_\mu(\zeta)-H_\mu(\eta\mid\zeta)\\&=\sum_{C\in\zeta}\mu(C)\Bigl(H_{\mu(C)^{-1}\mu|_C}(\xi\vee\eta)-H_{\mu(C)^{-1}\mu|_C}(\eta)\Bigr)\\&=\sum_{C\in\zeta}\mu(C)H_{\mu(C)^{-1}\mu|_C}(\xi\mid\eta)\leqslant\sum_{C\in\zeta}\mu(C)H_{\mu(C)^{-1}\mu|_C}(\xi)=H_\mu(\xi\mid\zeta).\quad\square\end{aligned}$$

Exercises for Section 1.1

Exercise. Find countably infinite partitions $\xi,\eta$ of $[0,1]$ with $H_m(\xi)$ finite and with $H_m(\eta)$ infinite, where $m$ is Lebesgue measure.

Exercise. Show that the function $d(\xi,\eta)=H_\mu(\xi\mid\eta)+H_\mu(\eta\mid\xi)$ defines a metric on the space of all partitions (considered up to sets of measure zero) of a probability space $(X,\mathscr{B},\mu)$ with finite entropy.

Exercise. Two partitions $\xi,\eta$ are independent, denoted $\xi\perp\eta$, if $\mu(A\cap B)=\mu(A)\mu(B)$ for all $A\in\xi$ and $B\in\eta$. Prove that $\xi$ and $\eta$ with finite entropy are independent if and only if $H_\mu(\xi\vee\eta)=H_\mu(\xi)+H_\mu(\eta)$.

Exercise. Extend Proposition 1.7(2) to a conditional form by showing that
$$H_\mu(\xi\vee\eta\mid\zeta)=H_\mu(\xi\mid\zeta)+H_\mu(\eta\mid\xi\vee\zeta)\tag{1.7}$$
for countable partitions $\xi,\eta,\zeta$.

Exercise. For partitions $\xi=\{A_1,\dots,A_n\}$, $\eta=\{B_1,\dots,B_n\}$ of fixed cardinality and thought of as ordered lists, show that $(\xi,\eta)\mapsto H_\mu(\xi\mid\eta)=H_\mu(\xi\vee\eta)-H_\mu(\eta)$ is a continuous function of $\xi$ and $\eta$ with respect to the metric $d(\xi,\eta)=\sum_{i=1}^{n}\mu(A_i\triangle B_i)$.

Exercise. Define sets
$$\Psi_k(X)=\{\text{partitions of }X\text{ with }k\text{ or fewer atoms}\},\qquad\Psi_{<\infty}(X)=\bigcup_{k\geqslant 1}\Psi_k(X),$$
and $\Psi(X)=\{\text{partitions of }X\text{ with finite entropy}\}$. Prove that $\Psi_k(X)$ and $\Psi(X)$ are complete metric spaces under the entropy metric defined above. Prove that $\Psi_{<\infty}(X)$ is dense in $\Psi(X)$.
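The independence criterion in the exercises can be illustrated numerically: for a product measure and the two coordinate partitions, $H_\mu(\xi\vee\eta)=H_\mu(\xi)+H_\mu(\eta)$ holds exactly, and the inequality is strict once the partitions are correlated. A sketch with our own helper names:

```python
import math

def H(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

def entropies(joint):
    """Return (H(xi), H(eta), H(xi v eta)) where xi, eta are the coordinate
    partitions of a joint distribution dict {(i, j): probability}."""
    px, py = {}, {}
    for (i, j), p in joint.items():
        px[i] = px.get(i, 0) + p
        py[j] = py.get(j, 0) + p
    return H(px.values()), H(py.values()), H(joint.values())

# Independent case: product of (0.3, 0.7) and (0.5, 0.5)
indep = {(i, j): p * q
         for i, p in enumerate([0.3, 0.7])
         for j, q in enumerate([0.5, 0.5])}
hx, hy, hxy = entropies(indep)
print(abs(hxy - (hx + hy)) < 1e-9)   # True: additivity under independence

# Correlated case: strict subadditivity H(xi v eta) < H(xi) + H(eta)
corr = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
hx, hy, hxy = entropies(corr)
print(hxy < hx + hy)                 # True
```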
1.2 Compression Algorithms

In this section we discuss a clearly related but slightly different point of view on the notions of information and entropy for finite or countably infinite partitions. It will be important here to think of a finite or countably infinite partition $\xi=(A_1,A_2,\dots)$ as an ordered list rather than a set of subsets. We will refer to the indices $1,2,\dots$ in the chosen enumeration of $\xi$ as symbols or source symbols in the alphabet, which is a subset of $\mathbb{N}$. We wish to encode each symbol by a finite binary sequence $d_1\dots d_l$ of length $l\geqslant 1$ with $d_1,\dots,d_l\in\{0,1\}$ with the following properties:

(1) every finite binary sequence is the code of at most one symbol $i\in\{1,2,\dots\}$;
(2) if $d_1\dots d_l$ is the code of some symbol then for every $k<l$ the binary sequence $d_1\dots d_k$ is not the code of a symbol.

A code, which because of the second condition is also referred to as a prefix-free code, is then a map $S\colon\{1,2,\dots\}\to\bigcup_{l\geqslant 1}\{0,1\}^l$ with these two properties. These two properties allow the code to be decoded: given a coded sequence, the symbols encoded by it can be deduced, and if the sequence is read from the beginning it is possible to work out when the whole codeword for a symbol has been read. Clearly the last requirement is essential if we want to successfully encode and decode not just a single symbol $i$ but a list of symbols $w=i_0i_1\dots i_r$. We will call such a list of symbols a name in the alphabet $\{1,2,\dots\}$. Because of the properties assumed for a code $S$ we may extend the code from symbols to names by simply concatenating the codes of the symbols in the name to form one binary sequence $S(i_0)S(i_1)\dots S(i_r)$, without needing separators between the codes for individual symbols. The properties of the code mean that there is a well-defined decoding map defined on the set of codes of names.

Example 1.9. (1) A simple example of a code defined on the alphabet $\{1,2,3\}$ is given by $S(1)=0$, $S(2)=10$, $S(3)=11$.
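The two defining properties make greedy left-to-right decoding well defined. The sketch below (a minimal implementation of our own) encodes and decodes names with the code $S(1)=0$, $S(2)=10$, $S(3)=11$ from the example.

```python
CODE = {1: "0", 2: "10", 3: "11"}   # the prefix-free code S from the example

def encode(name):
    """Concatenate codewords; no separators are needed."""
    return "".join(CODE[i] for i in name)

def decode(bits):
    """Greedy left-to-right parse; the prefix-free property makes this unambiguous."""
    rev = {w: i for i, w in CODE.items()}
    out, current = [], ""
    for d in bits:
        current += d
        if current in rev:          # a complete codeword has been read
            out.append(rev[current])
            current = ""
    assert current == "", "trailing bits do not form a codeword"
    return out

print(encode([2, 1, 1, 3]))   # 100011
print(decode("100011"))       # [2, 1, 1, 3]
```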
In this case the binary sequence $100011$ is the code of the name $2113$, because property (2) means that the sequence may be parsed into codes of symbols uniquely as $100011=S(2)S(1)S(1)S(3)$.

(2) Consider the set of all words appearing in a given dictionary. The goal of encoding names might be to find binary representations of sentences consisting of English words chosen from the dictionary and appearing in this book.

(3) A possible code for the infinite alphabet $\{1,2,3,\dots\}$ is given by
and so on. Clearly this also gives a code for any finite alphabet.

Given that there are many possible codes, a natural question is to ask for codes that are optimal with respect to some notion of weight or cost. To explore this we need additional structure, and in particular need to make assumptions about how frequently different symbols appear. Assume that every symbol $i$ has an assigned probability $v_i\in[0,1]$, so that $\sum_{i=1}^{\infty}v_i=1$. In Example 1.9(2), we may think of $v_i$ as the relative frequency of the English word represented by $i$ in this book. Let $|S(i)|$ denote the length of the codeword $S(i)$. Then the average length of the code is
$$L(S)=\sum_i v_i\,|S(i)|,$$
which may be finite or infinite depending on the code. We wish to think of a code $S$ as a compression algorithm, and in this viewpoint a code $S$ is better (on average more efficient) than another code $S'$ if the average length of the code $S$ is smaller than the average length of the code $S'$. This allows us to give a new interpretation of the entropy of a partition in terms of the average length of an optimal code for a given distribution of relative frequencies.

Lemma 1.10 (Lower bound on average code length). For any code $S$ the average length satisfies
$$L(S)\log 2\geqslant H(v_1,v_2,\dots)=-\sum_i v_i\log v_i.$$

In other words, the entropy $H(v_1,v_2,\dots)$ of a probability vector $(v_1,v_2,\dots)$ gives a lower bound on the average effectiveness of any possible compression algorithm for the symbols $1,2,\dots$ with relative frequencies $v_1,v_2,\dots$.

Proof of Lemma 1.10. We claim that the requirements on the code $S$ imply Kraft's inequality
$$\sum_i 2^{-|S(i)|}\leqslant 1.\tag{1.8}$$
To see this relation, interpret a binary sequence $d_1\dots d_l$ as the address of the binary interval
$$I(d_1\dots d_l)=\left[\frac{d_1}{2}+\dots+\frac{d_l}{2^l},\ \frac{d_1}{2}+\dots+\frac{d_l}{2^l}+\frac{1}{2^l}\right)\tag{1.9}$$
of length $\frac1{2^l}$. The requirements on the code mean precisely that all the intervals $I(S(i))$ for $i=1,2,\dots$ are disjoint, which proves (1.8). The lemma now follows by convexity of $x\mapsto-\log x$ (see Lemma 1.4):
$$L(S)\log 2-H(v_1,v_2,\dots)=\sum_i v_i\,|S(i)|\log 2+\sum_i v_i\log v_i=-\sum_i v_i\log\frac{2^{-|S(i)|}}{v_i}\geqslant-\log\sum_i 2^{-|S(i)|}\geqslant 0.\quad\square$$

Lemma 1.10 and its proof suggest that there might always be a code that is as efficient as entropy considerations allow. As we show next, this is true if the algorithm is allowed a small amount of wastage. Starting with the probability vector $(v_1,v_2,\dots)$ we may assume, by reordering if necessary, that $v_1\geqslant v_2\geqslant\dots$. Define $l_i=\lceil-\log_2 v_i\rceil$, where $\lceil t\rceil$ denotes the smallest integer greater than or equal to $t$, so that $l_i$ is the smallest integer with $\frac1{2^{l_i}}\leqslant v_i$. Starting with the first digit, associate to $i=1$ the interval $I_1=[0,\frac1{2^{l_1}})$, to $i=2$ the interval $I_2=[\frac1{2^{l_1}},\frac1{2^{l_1}}+\frac1{2^{l_2}})$, and in general associate to $i$ the interval
$$I_i=\left[\sum_{j=1}^{i-1}\frac1{2^{l_j}},\ \sum_{j=1}^{i}\frac1{2^{l_j}}\right)$$
of length $\frac1{2^{l_i}}$. We claim that every such interval is the interval $I(S(i))$ for a unique address $S(i)=d_1\dots d_{|S(i)|}$ as in (1.9).

Lemma 1.11 (Near optimal code). Given a probability vector $(v_1,v_2,\dots)$ (permuting the indices if necessary to assume $v_1\geqslant v_2\geqslant\dots$), there exists a code $S$, called the Shannon code, such that
$$I_i=\left[\sum_{j=1}^{i-1}\frac1{2^{l_j}},\ \sum_{j=1}^{i}\frac1{2^{l_j}}\right)=I(S(i))=\left[\sum_{k=1}^{|S(i)|}\frac{d_k}{2^k},\ \sum_{k=1}^{|S(i)|}\frac{d_k}{2^k}+\frac1{2^{|S(i)|}}\right)$$
is the interval with the address $S(i)=d_1\dots d_{|S(i)|}$. The Shannon code satisfies $|S(i)|=\lceil-\log_2 v_i\rceil$ and hence
$$L(S)\log 2\leqslant H(v_1,v_2,\dots)+\log 2.$$

That is, the entropy divided by $\log 2$ is, to within one digit, the best possible average length of a code encoding the alphabet with the given probability vector describing its relative frequency distribution.

Proof of Lemma 1.11. The requirement that a binary interval $I=[\frac{a}{2^m},\frac{b}{2^n})$ with $a,b\in\mathbb{N}_0$ and $m,n\in\mathbb{N}$ has an address $d_1\dots d_l$ in the sense of (1.9) is precisely the requirement that we can choose to represent the endpoints $\frac{a}{2^m}$ and $\frac{b}{2^n}$ in such a way that $\frac{a}{2^m}=\frac{a'}{2^l}$, $\frac{b}{2^n}=\frac{b'}{2^l}$ and $b'-a'=1$. In other words, the length of the interval must be a power $\frac1{2^l}$ of $2$ with $l\geqslant m,n$. The ordering of $\xi$ chosen and the construction of the intervals ensures this
property. It follows that every interval $I_i$ constructed coincides with $I(S(i))$ for some binary sequence $S(i)$ of length $|S(i)|=l_i$. The disjointness of the intervals ensures that $S$ is a code. The average length of the code $S$ is, by definition,
$$L(S)=\sum_i v_i\,|S(i)|=-\frac1{\log 2}\sum_i v_i\log 2^{-|S(i)|}\leqslant-\frac1{\log 2}\sum_i v_i\log\frac{v_i}{2}=\frac1{\log 2}\bigl(H(v_1,v_2,\dots)+\log 2\bigr).\quad\square$$

Lemmas 1.10 and 1.11 together comprise the source coding theorem of information theory. Using the Shannon code, we interpret the information measured by the information function (up to one digit) as the number of binary digits needed to encode a symbol.

1.3 Entropy of a Measure-Preserving Transformation

In the last two sections we introduced and studied in some detail the notions of entropy and conditional entropy for partitions. In this section we start to apply this theory to the study of measure-preserving transformations, starting with the simple observation that such a transformation preserves conditional entropy in the following sense.

Lemma 1.12 (Invariance). Let $(X,\mathscr{B},\mu,T)$ be a measure-preserving system and let $\xi,\eta$ be partitions. Then
$$H_\mu(\xi\mid\eta)=H_\mu(T^{-1}\xi\mid T^{-1}\eta)\tag{1.10}$$
and
$$I_\mu(\xi\mid\eta)\circ T=I_\mu(T^{-1}\xi\mid T^{-1}\eta).$$

Proof. It is enough to show the second identity, since integrating it and using the $T$-invariance of $\mu$ gives (1.10). Notice that $T^{-1}[Tx]_\eta=[x]_{T^{-1}\eta}$ for all $x$, so
$$I_\mu(\xi\mid\eta)(Tx)=-\log\frac{\mu([Tx]_\xi\cap[Tx]_\eta)}{\mu([Tx]_\eta)}=-\log\frac{\mu([x]_{T^{-1}\xi}\cap[x]_{T^{-1}\eta})}{\mu([x]_{T^{-1}\eta})}=I_\mu(T^{-1}\xi\mid T^{-1}\eta)(x).\quad\square$$
We are going to define the notion of entropy of a measure-preserving transformation; in order to do this a standard result about convergence of subadditive sequences is needed.

Lemma 1.13 (Fekete). Let $(a_n)$ be a sequence of elements of $\mathbb{R}\cup\{-\infty\}$ with the sub-additive property $a_{m+n}\leqslant a_m+a_n$ for all $m,n\geqslant 1$. Then $\frac1n a_n$ converges (possibly to $-\infty$), and
$$\lim_{n\to\infty}\frac1n a_n=\inf_{n\geqslant 1}\frac1n a_n.$$

Proof. If $a_n=-\infty$ for some $n\geqslant 1$ then by the sub-additive property we have $a_{n+k}=-\infty$ for all $k\geqslant 1$, so the result holds with limit $-\infty$. Assume now that $a_n>-\infty$ for all $n$, and let $a=\inf_{n\in\mathbb{N}}\{\frac{a_n}{n}\}$, so $\frac{a_n}{n}\geqslant a$ for all $n\geqslant 1$. We assume here that $a>-\infty$ and leave the case $a=-\infty$ as an exercise. (In our applications, we will always have $a_n\geqslant 0$ for all $n\geqslant 1$ and hence $a\geqslant 0$.) Given $\varepsilon>0$, pick $k\geqslant 1$ such that $\frac{a_k}{k}<a+\frac12\varepsilon$. Now by the sub-additive property, for any $m\geqslant 1$ and $j$ with $0\leqslant j<k$,
$$\frac{a_{mk+j}}{mk+j}\leqslant\frac{a_{mk}}{mk+j}+\frac{a_j}{mk+j}\leqslant\frac{a_{mk}}{mk}+\frac{a_j}{mk}\leqslant\frac{ma_k}{mk}+\frac{ja_1}{mk}\leqslant\frac{a_k}{k}+\frac{a_1}{m}<a+\tfrac12\varepsilon+\frac{a_1}{m}.$$
So, if $n$ is chosen large enough and we apply division with remainder to write $n=mk+j$, then we may assume that $\frac{a_1}{m}<\frac12\varepsilon$, and then $\frac{a_n}{n}<a+\varepsilon$ as required. $\square$

This simple lemma will be applied in the following way. Let $T$ be a measure-preserving transformation of $(X,\mathscr{B},\mu)$, and let $\xi$ be a partition of $X$ with finite entropy. Recall that we can think of $\xi$ as an experiment with at most countably many possible outcomes, represented by the atoms of $\xi$. The entropy $H_\mu(\xi)$ measures the average amount of information conveyed about the points of the space by learning the outcome of this experiment. This quantity could be any non-negative number or infinity, and of course has nothing to do with the transformation $T$. If we think of $T\colon X\to X$ as representing evolution in time, then the partition $T^{-1}\xi$ corresponds to the same experiment one time unit later. In this sense the partition $\xi\vee T^{-1}\xi$ represents the joint outcome of the experiment $\xi$ carried out now and in one unit of time, so $H_\mu(\xi\vee T^{-1}\xi)$ measures the average
amount of information obtained by learning the outcome of the experiment applied twice in a row. Assume for a moment that the partition $T^{-k}\xi$ is independent (see the definition in the exercises for Section 1.1) of $\xi\vee T^{-1}\xi\vee\dots\vee T^{-(k-1)}\xi$ for all $k\geqslant 1$. Then
$$H_\mu\bigl(\xi\vee T^{-1}\xi\vee\dots\vee T^{-(n-1)}\xi\bigr)=H_\mu(\xi)+\dots+H_\mu(T^{-(n-1)}\xi)=nH_\mu(\xi)$$
for all $n\geqslant 1$, by an induction using the independence criterion and the invariance property in Lemma 1.12. In general, subadditivity of entropy (Proposition 1.7(3)) and Lemma 1.12 show that
$$H_\mu\bigl(\xi\vee T^{-1}\xi\vee\dots\vee T^{-(n-1)}\xi\bigr)\leqslant H_\mu(\xi)+\dots+H_\mu(T^{-(n-1)}\xi)=nH_\mu(\xi),$$
so the quantity $H_\mu(\xi\vee T^{-1}\xi\vee\dots\vee T^{-(n-1)}\xi)$ grows at most linearly in $n$. This asymptotic linear growth rate will in general depend on the partition $\xi$, but once this dependence is eliminated the resulting rate is an invariant associated to $T$, the dynamical entropy of $T$ with respect to $\mu$. By the same argument as above, one sees that the sequence $(a_n)$ defined by $a_n=H_\mu(\xi\vee T^{-1}\xi\vee\dots\vee T^{-(n-1)}\xi)$ is sub-additive in the sense of Lemma 1.13, which shows the claimed convergence and the second equality in the next definition.

Definition 1.14. Let $(X,\mathscr{B},\mu,T)$ be a measure-preserving system and let $\xi$ be a partition of $X$ with finite entropy. Then the entropy of $T$ with respect to $\xi$ is
$$h_\mu(T,\xi)=\lim_{n\to\infty}\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right)=\inf_{n\geqslant 1}\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right).$$
The entropy of $T$ is
$$h_\mu(T)=\sup_{\xi\,:\,H_\mu(\xi)<\infty}h_\mu(T,\xi).$$

One fact that might explain why entropy is such a useful notion is that there are many possible ways to define entropy. Let us immediately give a second possible definition. As always, we ask the reader to find reasonable descriptions of the entropy expressions in terms of information gain instead of just relying on our formal manipulations of the entropy expressions.

Proposition 1.15 (Entropy conditioned on the future). If $(X,\mathscr{B},\mu,T)$ is a measure-preserving system and $\xi$ is a countable partition with finite entropy, then
$$h_\mu(T,\xi)=\lim_{n\to\infty}H_\mu\!\left(\xi\,\Big|\,\bigvee_{i=1}^{n}T^{-i}\xi\right).$$

Proof. The limit exists by monotonicity of entropy (Proposition 1.7(4)). By additivity of entropy (Proposition 1.7(2)) we also have for any $n\geqslant 1$ that
$$H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right)=H_\mu\bigl(T^{-(n-1)}\xi\bigr)+H_\mu\!\left(T^{-(n-2)}\xi\,\Big|\,T^{-(n-1)}\xi\right)+\dots+H_\mu\!\left(\xi\,\Big|\,\bigvee_{i=1}^{n-1}T^{-i}\xi\right)=H_\mu(\xi)+H_\mu\bigl(\xi\mid T^{-1}\xi\bigr)+\dots+H_\mu\!\left(\xi\,\Big|\,\bigvee_{i=1}^{n-1}T^{-i}\xi\right),$$
where we used invariance (Lemma 1.12) in the last step. Thus
$$\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right)=\frac1n\left(H_\mu(\xi)+\sum_{j=1}^{n-1}H_\mu\!\left(\xi\,\Big|\,\bigvee_{i=1}^{j}T^{-i}\xi\right)\right),$$
showing the result, since the Cesàro limit of a convergent sequence coincides with the limit of the original sequence. $\square$

Example 1.16. Let $X_2=\{0,1\}^{\mathbb{Z}}$ with the Bernoulli $(\frac12,\frac12)$ measure $\mu_2$, preserved by the shift $\sigma_2$. Consider the state partition $\xi=\{[0]_0,[1]_0\}$, where $[0]_0=\{x\in X_2\mid x_0=0\}$ and $[1]_0=\{x\in X_2\mid x_0=1\}$ are cylinder sets. The partition $\sigma_2^{-k}\xi$ is independent of $\bigvee_{j=0}^{k-1}\sigma_2^{-j}\xi$ for all $k\geqslant 1$, so
$$h_{\mu_2}(\sigma_2,\xi)=\lim_{n\to\infty}\frac1n H_{\mu_2}\!\left(\bigvee_{i=0}^{n-1}\sigma_2^{-i}\xi\right)=\lim_{n\to\infty}\frac1n\log 2^n=\log 2.$$

Elementary Properties

Notice that we are not yet in a position to compute $h_{\mu_2}(\sigma_2)$ from Example 1.16, since this is defined as a supremum over all partitions in order to make the definition independent of the choice of $\xi$. In order to calculate $h_{\mu_2}(\sigma_2)$ the basic properties of entropy need to be developed further.
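Example 1.16 can be mirrored numerically. For a Bernoulli shift with probability vector $p$, the refinement $\bigvee_{i=0}^{n-1}\sigma^{-i}\xi$ of the state partition has as atoms the cylinder sets on $n$ coordinates, with product masses, so $\frac1n H_\mu(\cdot)$ equals $H(p)$ for every $n$. A sketch (our own naming):

```python
import math
from itertools import product

def H(ps):
    return -sum(q * math.log(q) for q in ps if q > 0)

def block_entropy(p, n):
    """Entropy of the partition into cylinder sets on coordinates 0..n-1
    for the Bernoulli measure with probability vector p."""
    masses = [math.prod(p[i] for i in word)
              for word in product(range(len(p)), repeat=n)]
    return H(masses)

p = [1/2, 1/2]
for n in (1, 2, 5):
    print(block_entropy(p, n) / n)   # log 2 = 0.6931... for every n

# For a biased coin the same identity gives h = H(p) < log 2.
p = [1/4, 3/4]
print(abs(block_entropy(p, 5) / 5 - H(p)) < 1e-9)   # True
```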
Proposition 1.17. Let $(X,\mathscr{B},\mu,T)$ be a measure-preserving system on a probability space, and let $\xi$ and $\eta$ be countable partitions of $X$ with finite entropy. Then:
(1) (Trivial bound) $h_\mu(T,\xi)\leqslant H_\mu(\xi)$;
(2) (Subadditivity) $h_\mu(T,\xi\vee\eta)\leqslant h_\mu(T,\xi)+h_\mu(T,\eta)$;
(3) (Continuity bound) $h_\mu(T,\eta)\leqslant h_\mu(T,\xi)+H_\mu(\eta\mid\xi)$.

Proof. In this proof we will make use of Proposition 1.7 without particular reference. These basic properties of entropy will be used repeatedly later.

(1): For any $n\geqslant 1$,
$$\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right)\leqslant\frac1n\sum_{i=0}^{n-1}H_\mu(T^{-i}\xi)=\frac1n\sum_{i=0}^{n-1}H_\mu(\xi)=H_\mu(\xi).$$

(2): For any $n\geqslant 1$,
$$\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}(\xi\vee\eta)\right)\leqslant\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right)+\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\eta\right).$$
Subadditivity follows by taking $n\to\infty$.

(3): For any $n\geqslant 1$,
$$\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\eta\right)\leqslant\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right)+\frac1n H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\eta\,\Big|\,\bigvee_{i=0}^{n-1}T^{-i}\xi\right),$$
and
$$H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\eta\,\Big|\,\bigvee_{i=0}^{n-1}T^{-i}\xi\right)\leqslant\sum_{i=0}^{n-1}H_\mu\bigl(T^{-i}\eta\mid T^{-i}\xi\bigr)=nH_\mu(\eta\mid\xi)$$
by the invariance property in Lemma 1.12. Taking $n\to\infty$ we obtain the continuity bound. $\square$

Proposition 1.18 (Iterates). Let $(X,\mathscr{B},\mu,T)$ be a measure-preserving system on a probability space and $\xi$ a countable partition of $X$ with finite entropy. Then:
(1) $h_\mu(T,\xi)=h_\mu\!\left(T,\bigvee_{i=0}^{k}T^{-i}\xi\right)$ for all $k\geqslant 1$;
(2) for invertible $T$, $h_\mu(T,\xi)=h_\mu(T^{-1},\xi)=h_\mu\!\left(T,\bigvee_{i=-k}^{k}T^{-i}\xi\right)$ for all $k\geqslant 1$;
(3) $h_\mu(T^k)=kh_\mu(T)$ for $k\geqslant 1$; and
(4) $h_\mu(T)=h_\mu(T^{-1})$ if $T$ is invertible.

Proof. (1): For any $k\geqslant 1$,
$$h_\mu\!\left(T,\bigvee_{i=0}^{k}T^{-i}\xi\right)=\lim_{n\to\infty}\frac1n H_\mu\!\left(\bigvee_{j=0}^{n-1}T^{-j}\bigvee_{i=0}^{k}T^{-i}\xi\right)=\lim_{n\to\infty}\frac1n H_\mu\!\left(\bigvee_{i=0}^{k+n-1}T^{-i}\xi\right)=\lim_{n\to\infty}\frac{k+n}{n}\,\frac1{k+n}H_\mu\!\left(\bigvee_{i=0}^{k+n-1}T^{-i}\xi\right)=h_\mu(T,\xi).$$

(2): For any $n\geqslant 1$, the invariance property (Lemma 1.12) shows that
$$H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{i}\xi\right)=H_\mu\!\left(T^{-(n-1)}\bigvee_{i=0}^{n-1}T^{i}\xi\right)=H_\mu\!\left(\bigvee_{i=0}^{n-1}T^{-i}\xi\right).$$
Dividing by $n$ and taking the limit gives the first statement, and the second equality follows easily along the lines of (1).

(3): For any partition $\xi$ with finite entropy,
$$\lim_{n\to\infty}\frac1n H_\mu\!\left(\bigvee_{j=0}^{n-1}T^{-kj}\bigvee_{i=0}^{k-1}T^{-i}\xi\right)=\lim_{n\to\infty}\frac{k}{nk}H_\mu\!\left(\bigvee_{i=0}^{nk-1}T^{-i}\xi\right)=kh_\mu(T,\xi).$$
It follows that
$$h_\mu\!\left(T^k,\bigvee_{i=0}^{k-1}T^{-i}\xi\right)=kh_\mu(T,\xi),$$
so $kh_\mu(T)\leqslant h_\mu(T^k)$. For the reverse inequality, notice that
$$h_\mu(T^k,\eta)\leqslant h_\mu\!\left(T^k,\bigvee_{i=0}^{k-1}T^{-i}\eta\right)=kh_\mu(T,\eta),$$
so $h_\mu(T^k)\leqslant kh_\mu(T)$.
(4): This follows from (2). $\square$

Lemma 1.19 (Finite vs. finite entropy). Entropy can be computed using finite partitions only, in the sense that
$$\sup_{\eta\text{ finite}}h_\mu(T,\eta)=\sup_{\xi\,:\,H_\mu(\xi)<\infty}h_\mu(T,\xi).$$
In fact, for every countable partition $\xi$ with finite entropy and every $\varepsilon>0$ there exists a finite partition $\eta$, measurable with respect to $\sigma(\xi)$, with $H_\mu(\xi\mid\eta)<\varepsilon$.

Proof. Any finite partition has finite entropy, so
$$\sup_{\eta\text{ finite}}h_\mu(T,\eta)\leqslant\sup_{\xi\,:\,H_\mu(\xi)<\infty}h_\mu(T,\xi).$$
For the reverse inequality, let $\xi$ be any partition with $H_\mu(\xi)<\infty$. By the continuity bound in Proposition 1.17(3) it suffices to show the last claim in the lemma. To see this, let $\xi=\{A_1,A_2,\dots\}$ and define
$$\eta=\Bigl\{A_1,A_2,\dots,A_N,\ B_N=X\smallsetminus\bigcup_{n=1}^{N}A_n\Bigr\},$$
so that $\mu(B_N)\to 0$ as $N\to\infty$. Then
$$H_\mu(\xi\mid\eta)=\mu(B_N)\,H\!\left(\frac{\mu(A_{N+1})}{\mu(B_N)},\frac{\mu(A_{N+2})}{\mu(B_N)},\dots\right)=-\sum_{j=N+1}^{\infty}\mu(A_j)\log\frac{\mu(A_j)}{\mu(B_N)}=-\sum_{j=N+1}^{\infty}\mu(A_j)\log\mu(A_j)+\underbrace{\mu(B_N)\log\mu(B_N)}_{\phi(\mu(B_N))}.$$
Hence, by the assumption that $H_\mu(\xi)<\infty$ and since $\phi(\mu(B_N))\leqslant 0$, it is possible to choose $N$ large enough to ensure that $H_\mu(\xi\mid\eta)<\varepsilon$. $\square$

Entropy as an Invariant

Recall that $(Y,\mathscr{B}_Y,\nu,S)$ is a factor of $(X,\mathscr{B},\mu,T)$ if there is a measure-preserving map $\phi\colon X\to Y$ with $\phi(Tx)=S(\phi x)$ for $\mu$-almost every $x\in X$.

Theorem 1.20 (Entropy of a factor). If $(Y,\mathscr{B}_Y,\nu,S)$ is a factor of the system $(X,\mathscr{B},\mu,T)$, then $h_\nu(S)\leqslant h_\mu(T)$. In particular, entropy is an invariant of measurable isomorphism.
Proof. Let $\phi : X \to Y$ be the factor map. Then any partition $\xi$ of $Y$ defines a partition $\phi^{-1}\xi$ of $X$, and since $\phi$ preserves the measure, $H_\nu(\xi) = H_\mu(\phi^{-1}\xi)$. This immediately implies that $h_\mu(T, \phi^{-1}\xi) = h_\nu(S, \xi)$, and hence the result.

The definition of the entropy of a measure-preserving transformation involves a supremum over the set of all finite partitions. In order to compute the entropy, it is easier to work with a single partition. The next result, the Kolmogorov-Sinaĭ theorem, gives a sufficient condition on a partition that allows this.

Theorem 1.21 (Kolmogorov-Sinaĭ). Let $(X, \mathscr{B}, \mu, T)$ be a measure-preserving system on a probability space, and let $\xi$ be a partition of finite entropy that is a one-sided generator under $T$ in the sense that
\[
\bigvee_{n=0}^{\infty} T^{-n}\xi = \mathscr{B}. \tag{1.11}
\]
Then $h_\mu(T) = h_\mu(T,\xi)$. If $T$ is invertible and $\xi$ is a partition with finite entropy that is a generator under $T$ in the sense that
\[
\bigvee_{n=-\infty}^{\infty} T^{n}\xi = \mathscr{B},
\]
then once again $h_\mu(T) = h_\mu(T,\xi)$.

Theorem 1.21 transfers some of the difficulty inherent in computing entropy onto the problem of finding a generator. We note that if a partition is found satisfying (1.11) modulo $\mu$, then (under the assumption that $\mathscr{B}$ is countably generated, which we will have whenever this is used) there is an isomorphic copy of the system for which we have found a generator satisfying (1.11) as stated. There are general results showing that generators always exist under suitable conditions (notice that the existence of a generator with $k$ atoms means the entropy cannot exceed $\log k$), but these are of little direct help in constructing a generator. In Section 1.6 a generator will be found for a non-trivial example, and in Chapter 4 we will give a proof of the existence of finite generators for any finite-entropy ergodic system.

Lemma 1.22 (Continuity). Let $(X, \mathscr{B}, \mu, T)$ be a measure-preserving system, let $\xi$ be a partition satisfying (1.11), and let $\eta$ be any partition of $X$ with finite entropy. Then
\[
H_\mu\Bigl(\eta \,\Bigm|\, \bigvee_{i=0}^{n} T^{-i}\xi\Bigr) \longrightarrow 0 \quad \text{as } n \to \infty.
\]
Proof. By the last statement in Lemma 1.19 it suffices to consider a finite partition $\eta$. By assumption, the partitions $\bigvee_{j=0}^{n} T^{-j}\xi$ for $n = 1, 2, \dots$ together generate $\mathscr{B}$. This in particular shows that for any $\delta > 0$ and $B \in \mathscr{B}$ there exist some $n \geq 1$ and some set
\[
A \in \sigma\Bigl(\bigvee_{j=0}^{n} T^{-j}\xi\Bigr)
\]
for which $\mu(A \mathbin{\triangle} B) < \delta$. In fact, it is not hard to see that the set of all measurable sets $B$ with this property is a $\sigma$-algebra containing $T^{-n}\xi$ for all $n \geq 0$, which gives the claim. Alternatively, this follows quickly from the increasing martingale theorem [52, Th. 5.5].

Applying the above to all the elements of $\eta = \{B_1, \dots, B_m\}$, we can find one $n$ with the property that there is a collection of sets $A_i \in \sigma\bigl(\bigvee_{j=0}^{n} T^{-j}\xi\bigr)$ with $\mu(A_i \mathbin{\triangle} B_i) < \delta/m^2$ for $i = 1, \dots, m-1$. Write
\[
A_1' = A_1,\quad A_2' = A_2 \smallsetminus A_1,\quad A_3' = A_3 \smallsetminus (A_1 \cup A_2),\quad \dots,\quad
A_{m-1}' = A_{m-1} \smallsetminus \bigcup_{j=1}^{m-2} A_j,
\]
and $A_m' = X \smallsetminus \bigcup_{j=1}^{m-1} A_j'$, so that $\zeta = \{A_1', \dots, A_m'\}$ forms a partition by construction. Now notice for $i = 1, \dots, m-1$ that
\[
\mu(A_i' \mathbin{\triangle} B_i) = \mu(A_i' \smallsetminus B_i) + \mu(B_i \smallsetminus A_i')
\leq \mu(A_i \smallsetminus B_i) + \mu(B_i \smallsetminus A_i) + \mu\Bigl(B_i \cap \bigcup_{j=1}^{i-1} A_j\Bigr)
\leq \frac{\delta}{m^2} + \sum_{j=1}^{i-1} \mu(A_j \mathbin{\triangle} B_j) \leq \frac{\delta}{m},
\]
since $\eta = \{B_1, \dots, B_m\}$ forms a partition. Using that both $\eta$ and $\zeta$ form partitions we also get
\[
\mu(A_m' \mathbin{\triangle} B_m) = \mu\Bigl(\bigcup_{i=1}^{m-1} A_i' \;\mathbin{\triangle}\; \bigcup_{i=1}^{m-1} B_i\Bigr)
\leq \sum_{i=1}^{m-1} \mu(A_i' \mathbin{\triangle} B_i) \leq \delta.
\]
To summarize, the two partitions $\eta$ and $\zeta$ have the property that $\mu(A_i' \mathbin{\triangle} B_i) < \delta$
for $i = 1, \dots, m$. Thus, by monotonicity of conditional entropy,
\[
H_\mu\Bigl(\eta \,\Bigm|\, \bigvee_{i=0}^{n} T^{-i}\xi\Bigr)
\leq H_\mu(\eta \mid \zeta)
= -\sum_{i=1}^{m} \mu(A_i' \cap B_i) \log \frac{\mu(A_i' \cap B_i)}{\mu(A_i')}
  \;-\; \sum_{\substack{i,j=1 \\ i \neq j}}^{m} \mu(A_i' \cap B_j) \log \frac{\mu(A_i' \cap B_j)}{\mu(A_i')}.
\]
The terms in the first sum are close to zero because $\mu(A_i' \cap B_i)/\mu(A_i')$ is close to $1$, and the terms in the second sum are close to zero because $\mu(A_i' \cap B_j)$ is close to zero. In other words, given any $\varepsilon > 0$, by choosing $\delta$ small enough (and hence $n$ large enough) we can ensure that
\[
H_\mu\Bigl(\eta \,\Bigm|\, \bigvee_{i=0}^{n} T^{-i}\xi\Bigr) < \varepsilon,
\]
as needed.

Proof of Theorem 1.21. Let $\xi$ be a one-sided generator under $T$. For any partition $\eta$, continuity of entropy (Proposition 1.17) and Lemma 1.22 show that
\[
h_\mu(T,\eta) \leq \underbrace{h_\mu\Bigl(T, \bigvee_{i=0}^{n} T^{-i}\xi\Bigr)}_{=\,h_\mu(T,\xi)}
+ \underbrace{H_\mu\Bigl(\eta \,\Bigm|\, \bigvee_{i=0}^{n} T^{-i}\xi\Bigr)}_{\to\,0 \ \text{as}\ n \to \infty},
\]
so $h_\mu(T,\eta) \leq h_\mu(T,\xi)$ as required. The proof for a generator under an invertible $T$ is similar.

Corollary 1.23. If $(X, \mathscr{B}, \mu, T)$ is an invertible measure-preserving system on a probability space with a one-sided generator, then $h_\mu(T) = 0$.

Proof. Let $\xi$ be a partition with
\[
\bigvee_{n=0}^{\infty} T^{-n}\xi = \mathscr{B},
\]
so that
\[
h_\mu(T) = h_\mu(T,\xi) = \lim_{n\to\infty} H_\mu\Bigl(\xi \,\Bigm|\, \bigvee_{i=1}^{n} T^{-i}\xi\Bigr)
\]
by the Kolmogorov-Sinaĭ theorem (Theorem 1.21) and since entropy can be expressed by conditioning on the future. On the other hand, since $T$ is invertible we may consider the partition $T\xi$ and obtain
\[
h_\mu(T) = \lim_{n\to\infty} H_\mu\Bigl(\xi \,\Bigm|\, \bigvee_{i=1}^{n} T^{-i}\xi\Bigr)
= \lim_{n\to\infty} H_\mu\Bigl(T\xi \,\Bigm|\, \bigvee_{i=0}^{n-1} T^{-i}\xi\Bigr) = 0,
\]
since $T^{-1}$ preserves the measure and by continuity of entropy (Lemma 1.22).

The Kolmogorov-Sinaĭ theorem allows the entropy of simple examples to be computed. The next examples indicate how positive entropy arises, and give some indication that the entropy of a transformation is related to the complexity of its orbits. In Examples 1.26 and 1.27 the positive entropy reflects the way in which the transformation moves nearby points apart and thereby (using the partition) chops up the space in a complicated way; in Examples 1.24 and 1.25 the transformation moves points around in a very orderly way, and this is reflected in the zero entropy.

Example 1.24. The identity map $I : X \to X$ has zero entropy on any probability space $(X, \mathscr{B}, \mu)$. This is clear, since for any partition $\xi$ we have $\bigvee_{i=0}^{n-1} I^{-i}\xi = \xi$, so $h_\mu(I,\xi) = 0$.

Example 1.25. The circle rotation $R_\alpha : \mathbb{T} \to \mathbb{T}$ has zero entropy with respect to Lebesgue measure. If $\alpha$ is rational, then there is some $q \geq 1$ with $R_\alpha^q = I$, so $h_{m_\mathbb{T}}(R_\alpha) = 0$ by the formula $h_\mu(T^k) = k\,h_\mu(T)$ and Example 1.24. If $\alpha$ is irrational, then $\xi = \{[0,\tfrac12), [\tfrac12,1)\}$ is a one-sided generator, since the point $0$ has dense orbit under $R_\alpha$. In fact, if $x_1, x_2 \in \mathbb{T}$ with $x_1 < x_2$ in $[0,\tfrac12)$ as real numbers, then there is some $n \in \mathbb{N}$ with $R_\alpha^n(0) \in (x_1, x_2)$, or (since $R_\alpha$ is just a translation) $x_2 \in R_\alpha^n[0,\tfrac12)$ but $x_1 \in R_\alpha^n[\tfrac12,1)$. This implies that the elements of $\bigvee_{i=0}^{n} R_\alpha^{-i}\xi$ are intervals whose maximal length decreases to $0$ as $n \to \infty$. Therefore the smallest $\sigma$-algebra containing $\bigvee_{n=0}^{\infty} R_\alpha^{-n}\xi$ contains all open sets, and so it follows that $h_{m_\mathbb{T}}(R_\alpha) = 0$ by Corollary 1.23.

Example 1.26. The state partition for the Bernoulli 2-shift in Example 1.16 is a two-sided generator, so we deduce that $h_{\mu_2}(\sigma_2) = \log 2$. In fact $\bigvee_{i=-n}^{n} \sigma_2^{-i}\xi$ consists of the $2^{2n+1}$ distinct cylinder sets
\[
[w]_{-n}^{n} = \{x \in X_2 \mid x_i = w_i \ \text{for}\ i = -n, \dots, n\}
\]
for $w \in X_2$. It follows that $\bigvee_{n=-\infty}^{\infty} \sigma_2^{-n}\xi$ contains all metric balls (see Example A.5 for an explicit description of the metric).
The state partition
\[
\bigl\{\{x \in X_3 \mid x_0 = 0\},\ \{x \in X_3 \mid x_0 = 1\},\ \{x \in X_3 \mid x_0 = 2\}\bigr\}
\]
of the Bernoulli 3-shift $X_3 = \{0,1,2\}^{\mathbb{Z}}$ with the $(\tfrac13, \tfrac13, \tfrac13)$-measure $\mu_3$ is a two-sided generator under the left shift $\sigma_3$, so the same argument shows that $h_{\mu_3}(\sigma_3) = \log 3$. Thus the Bernoulli 2- and 3-shifts are not measurably isomorphic.
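The entropy values in these examples can be checked numerically. The following sketch (an illustration of ours, not taken from the text) sums $-\mu(C)\log\mu(C)$ over the $s^n$ cylinder sets of the Bernoulli $(\tfrac1s,\dots,\tfrac1s)$-shift and divides by $n$, recovering $\log s$ for every $n$:

```python
import math

# Sketch (ours, not from the text): for the Bernoulli (1/s,...,1/s)-shift,
# the partition of length-n names has s**n atoms (cylinder sets), each of
# measure s**(-n), so its static entropy is n*log(s) and the entropy per
# time step is log(s) for every n.

def block_entropy_rate(s, n):
    p = s ** (-n)                                   # measure of each length-n cylinder
    H = -sum(p * math.log(p) for _ in range(s ** n))  # entropy of the refined partition
    return H / n                                     # entropy per unit time

# h_{mu_2}(sigma_2) = log 2 while h_{mu_3}(sigma_3) = log 3, so the
# Bernoulli 2- and 3-shifts cannot be measurably isomorphic.
print(abs(block_entropy_rate(2, 10) - math.log(2)) < 1e-9)
print(abs(block_entropy_rate(3, 8) - math.log(3)) < 1e-9)
```

For equidistributed cylinders the computation is exact for each $n$; the point of the theory is that the limit exists and is an isomorphism invariant in general.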
Example 1.27. The partition $\xi = \{[0,\tfrac12), [\tfrac12,1)\}$ is a one-sided generator for the circle-doubling map $T_2 : \mathbb{T} \to \mathbb{T}$. It is easy to check that $\bigvee_{i=0}^{n-1} T_2^{-i}\xi$ is the partition
\[
\bigl\{[0, \tfrac{1}{2^n}), \dots, [\tfrac{2^n - 1}{2^n}, 1)\bigr\},
\]
so $H_{m_\mathbb{T}}\bigl(\bigvee_{i=0}^{n-1} T_2^{-i}\xi\bigr) = \log 2^n$. The Kolmogorov-Sinaĭ theorem (Theorem 1.21) shows that $h_{m_\mathbb{T}}(T_2) = \log 2$.

Example 1.28. Just as in Example 1.27, the partition $\xi = \{[0,\tfrac1p), [\tfrac1p,\tfrac2p), \dots, [\tfrac{p-1}{p},1)\}$ is a generator for the map $T_p(x) = px \pmod 1$ with $p \geq 2$ on the circle, and a similar argument shows that $h_{m_\mathbb{T}}(T_p) = \log p$.

Now consider an arbitrary $T_p$-invariant probability measure $\mu$ on the circle. Since $\xi$ is a generator, we have
\[
h_\mu(T_p) = h_\mu(T_p, \xi) \leq H_\mu(\xi) \leq \log p \tag{1.12}
\]
by the trivial bound $h_\mu(T_p,\xi) \leq H_\mu(\xi)$ and by Proposition 1.5, since $\xi$ has only $p$ elements. Let us now characterize those measures for which we have equality in the estimate (1.12). By Lemma 1.13,
\[
h_\mu(T_p, \xi) = \inf_{n \geq 1} \frac{1}{n} H_\mu\bigl(\xi \vee T_p^{-1}\xi \vee \dots \vee T_p^{-(n-1)}\xi\bigr) \leq \frac{1}{n} \log p^n,
\]
where the last inequality holds again by Proposition 1.5. Hence $h_\mu(T_p) = \log p$ implies, using the equality case in Proposition 1.5, that each of the intervals $[\tfrac{j}{p^n}, \tfrac{j+1}{p^n})$ of the partition $\xi \vee T_p^{-1}\xi \vee \dots \vee T_p^{-(n-1)}\xi$ must have $\mu$-measure equal to $\tfrac{1}{p^n}$. This implies that $\mu = m_\mathbb{T}$, thus characterizing $m_\mathbb{T}$ as the only $T_p$-invariant Borel probability measure with entropy equal to $\log p$.

The phenomenon seen in Example 1.28, where maximality of entropy can be used to characterize particular measures, is important, and it holds in other situations too. In this case the geometry of the generating partition is very simple; in other contexts it is often impossible to pick a generator that is so convenient. Apart from these complications arising from the geometry of the space and the transformation, the phenomenon that maximality of entropy can be used to characterize certain measures always utilizes the strict convexity of the map $x \mapsto x \log x$ or the map $x \mapsto -\log x$. We will see other instances of this in Chapter 8.
Example 1.29. Let $(X, \mu, \sigma) = (X_v(G), \mu_{p,P}, \sigma)$ be the Markov shift defined in Section A.4.2. Then
\[
h_\mu(\sigma) = -\sum_{i,j} p_i\, p_{i,j} \log p_{i,j}.
\]
To see this, notice that the state partition $\xi = \{[i]_0\}$ is a generator, so we may apply the Kolmogorov-Sinaĭ theorem (Theorem 1.21). We have
\[
\mu\bigl([i_0]_0 \cap \sigma^{-1}[i_1]_0 \cap \dots \cap \sigma^{-(n-1)}[i_{n-1}]_0\bigr) = p_{i_0}\, p_{i_0,i_1} \cdots p_{i_{n-2},i_{n-1}},
\]
which gives the result using the properties of the logarithm, since by assumption we have $\sum_i p_i\, p_{i,j} = p_j$ for all $j$ and $\sum_j p_{i,j} = 1$ for all $i$.

Exercises for Section 1.3

Exercise. For a sequence of finite partitions $\xi_n$ with $\sigma(\xi_n) \nearrow \mathscr{B}$, prove that $h_\mu(T)$ can be expressed as $\lim_{n\to\infty} h_\mu(T, \xi_n)$.

Exercise. Prove that $h_{\mu \times \nu}(T \times S) = h_\mu(T) + h_\nu(S)$.

Exercise. Show that there exists a shift-invariant measure $\mu$ of full support on the shift space $X = \{0,1\}^{\mathbb{Z}}$ with $h_\mu(\sigma) = 0$.

Exercise. Let $(X, \mathscr{B}, \mu, T)$ be a measure-preserving system, and let $\xi$ be a countable partition of $X$ with finite entropy. Show that $\frac{1}{n} H_\mu\bigl(\bigvee_{i=0}^{n-1} T^{-i}\xi\bigr)$ decreases to $h_\mu(T,\xi)$ by the following steps.
(1) Recall that
\[
H_\mu\Bigl(\bigvee_{i=0}^{n-1} T^{-i}\xi\Bigr) = H_\mu(\xi) + \sum_{j=1}^{n-1} H_\mu\Bigl(\xi \,\Bigm|\, \bigvee_{i=1}^{j} T^{-i}\xi\Bigr)
\]
from the proof of Proposition 1.15, and deduce that
\[
H_\mu\Bigl(\bigvee_{i=0}^{n-1} T^{-i}\xi\Bigr) \geq n\, H_\mu\Bigl(\xi \,\Bigm|\, \bigvee_{i=1}^{n} T^{-i}\xi\Bigr).
\]
(2) Use (1) and additivity of entropy to show that
\[
n\, H_\mu\Bigl(\bigvee_{i=0}^{n} T^{-i}\xi\Bigr) \leq (n+1)\, H_\mu\Bigl(\bigvee_{i=0}^{n-1} T^{-i}\xi\Bigr),
\]
and deduce the result.

Exercise. Consider finite enumerated partitions of $(X, \mathscr{B}, \mu)$ with $k$ elements. Show that $h_\mu(T,\xi)$ is a continuous function of $\xi$ with respect to the $L^1(\mu)$-metric on such partitions.

Exercise. Show that for any $h \in [0, \infty]$, there is an ergodic measure-preserving transformation with entropy $h$.
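The formula of Example 1.29 is straightforward to evaluate numerically. The sketch below (our illustration; the two-state chain is hypothetical, not from the text) computes $h_\mu(\sigma) = -\sum_{i,j} p_i\, p_{i,j} \log p_{i,j}$ from a stochastic matrix and its stationary vector, and compares it with the entropy of the Bernoulli measure with the same marginal, which can only be larger since conditioning cannot increase entropy:

```python
import math

# Sketch (ours, not from the text): entropy of a Markov shift computed
# from its transition matrix P = (p_ij) and stationary vector p via
#   h_mu(sigma) = -sum_{i,j} p_i * p_ij * log(p_ij).

def markov_entropy(P, p):
    return -sum(p[i] * P[i][j] * math.log(P[i][j])
                for i in range(len(P))
                for j in range(len(P[i]))
                if P[i][j] > 0)

P = [[0.9, 0.1],
     [0.5, 0.5]]          # hypothetical stochastic matrix: rows sum to 1
p = [5/6, 1/6]            # stationary vector: p P = p, entries sum to 1

h = markov_entropy(P, p)
H = -sum(q * math.log(q) for q in p)   # Bernoulli (i.i.d.) entropy, same marginal

# Conditioning cannot increase entropy, so h <= H, with equality only
# when every row of P equals p (independence of the coordinates).
print(0 < h < H)
```

The comparison `h <= H` is the Markov-chain case of the general fact that among all shift-invariant measures with a given marginal, the product measure maximizes entropy.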
Exercise. Let $(X, \mathscr{B}, \mu, T)$ be a measure-preserving system, and let $\xi$ be a finite partition, or a countably infinite partition with $H_\mu(\xi) < \infty$. Prove that
\[
h_\mu(T,\xi) = \inf_{F} \frac{1}{|F|}\, H_\mu\Bigl(\bigvee_{n \in F} T^{-n}\xi\Bigr),
\]
where the infimum is taken over all finite subsets $F \subseteq \mathbb{N}$.

1.4 Defining Entropy using Names

We mentioned in Section 1.1 that the entropy formula in Definition 1.1 is the unique formula satisfying the basic properties of information described there. In this section we describe another way in which Definition 1.1 is forced on us, by computing a quantity related to entropy for a Bernoulli shift. (This section has motivational character, both for definitions already made and for upcoming results, but will not be needed later.)

1.4.1 Decay Rate

For a measure-preserving system $(X, \mathscr{B}, \mu, T)$ and a partition $\xi = (A_1, A_2, \dots)$, thought of as an ordered list, define the $(\xi,n)$-name $w_n^\xi(x)$ of a point $x \in X$ to be the vector $(a_0, a_1, \dots, a_{n-1})$ with the property that $T^i x \in A_{a_i}$ for $0 \leq i < n$. We also denote by $[w_n^\xi(x)]$ the set of all points that share the $(\xi,n)$-name of $x$, which is clearly the atom of $x$ with respect to $\bigvee_{i=0}^{n-1} T^{-i}\xi$. By definition, the entropy of a measure-preserving transformation is related to the distribution of the measures of the names. We claim that this relationship goes deeper: the logarithmic rate of decay of the measure of the set associated to a typical name is the entropy. In this section we compute the rate of decay of the measure of names for a Bernoulli shift, which will serve both as another motivation for Definition 1.1 and as a forerunner of Theorem 3.1. This link between the decay rate and the entropy is the content of the Shannon-McMillan-Breiman theorem (Theorem 3.1).

Lemma 1.30 (Decay rate for the Bernoulli shift). Let $(X, \mathscr{B}, \mu, T)$ be the Bernoulli shift defined by the probability vector $p = (p_1, \dots, p_s)$, which means that $X = \{1, \dots, s\}^{\mathbb{Z}}$, $\mu = \prod_{\mathbb{Z}} (p_1, \dots, p_s)$, and $T = \sigma$ is the left shift. Let $\xi$ be the state partition defined by the $0$th coordinate of the points in $X$. Then
\[
-\frac{1}{n} \log \mu\bigl([w_n^\xi(x)]\bigr) \longrightarrow H(p) = -\sum_{i=1}^{s} p_i \log p_i
\]
as $n \to \infty$, for $\mu$-almost every $x$.

Proof. The set of points with the name $w_n^\xi(x)$ is the cylinder set
\[
\{y \in X \mid y_0 = x_0, \dots, y_{n-1} = x_{n-1}\},
\]
so
\[
\mu\bigl([w_n^\xi(x)]\bigr) = p_{x_0} \cdots p_{x_{n-1}}. \tag{1.13}
\]
Now for $1 \leq j \leq s$, write $\mathbb{1}_j = \mathbb{1}_{[j]_0}$, where $[j]_0$ denotes the cylinder set of points with $0$th coordinate equal to $j$, and notice that
\[
\sum_{i=0}^{n-1} \mathbb{1}_j(T^i x) = \bigl|\{i \mid 0 \leq i \leq n-1,\ x_i = j\}\bigr|,
\]
so we may rearrange (1.13) to obtain
\[
\mu\bigl([w_n^\xi(x)]\bigr) = p_1^{\sum_{i=0}^{n-1} \mathbb{1}_1(T^i x)}\; p_2^{\sum_{i=0}^{n-1} \mathbb{1}_2(T^i x)} \cdots p_s^{\sum_{i=0}^{n-1} \mathbb{1}_s(T^i x)}. \tag{1.14}
\]
Now, by the ergodic theorem, for any $\varepsilon > 0$ and for almost every $x \in X$ there is an $N$ so that for every $n \geq N$ and $j = 1, \dots, s$ we have
\[
\Bigl| \frac{1}{n} \sum_{i=0}^{n-1} \mathbb{1}_j(T^i x) - p_j \Bigr| < \varepsilon. \tag{1.15}
\]
Taking the logarithm in (1.14) and dividing by $n$, we see (to within a small error) the familiar entropy formula in Definition 1.1. More precisely, we combine (1.14) and (1.15) and conclude that
\[
\Bigl| \log \mu\bigl([w_n^\xi(x)]\bigr) - n \log\bigl(p_1^{p_1} \cdots p_s^{p_s}\bigr) \Bigr| \leq \varepsilon n \,\bigl|\log(p_1 \cdots p_s)\bigr|,
\]
so
\[
-\frac{1}{n} \log \mu\bigl([w_n^\xi(x)]\bigr) \longrightarrow -\sum_{i=1}^{s} p_i \log p_i
\]
as $n \to \infty$.

1.4.2 Name Entropy

In fact the entropy theory for measure-preserving transformations can be built up entirely in terms of names, and this is done in the elegant monograph
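Lemma 1.30 lends itself to simulation. The sketch below (our illustration, not from the text; the probability vector is a hypothetical example) draws one long i.i.d. string, computes the measure of its $(\xi,n)$-name via (1.13), and compares the empirical decay rate with $H(p)$:

```python
import math
import random

# Simulation sketch (ours, not from the text): for the Bernoulli shift
# with the hypothetical vector p = (1/2, 1/4, 1/4), the measure of the
# (xi,n)-name of x is p_{x_0} * ... * p_{x_{n-1}} by (1.13), and
# Lemma 1.30 asserts -(1/n) log mu([w_n(x)]) -> H(p) almost surely.

random.seed(0)                             # reproducible "typical" point
p = [0.5, 0.25, 0.25]
H = -sum(q * math.log(q) for q in p)       # = (3/2) * log 2

n = 200_000
x = random.choices(range(3), weights=p, k=n)    # one long typical name
rate = -sum(math.log(p[a]) for a in x) / n      # empirical decay rate

print(abs(rate - H) < 0.01)
```

The convergence here is just the strong law of large numbers applied to the i.i.d. variables $-\log p_{x_i}$, which is exactly how the ergodic theorem enters the proof above.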
by Rudolph [180, Chap. 5]. We only discuss this approach briefly, and will not use the following discussion in the remainder of the book; entropy is such a fecund notion that similar alternative entropy notions will arise several times: see Theorem 3.1, the definition of topological entropy using open covers in Section 5.2, and Section 6.3.

Let $(X, \mathscr{B}, \mu, T)$ be an ergodic measure-preserving system, and define for each finite partition $\xi = \{A_1, \dots, A_r\}$, each $\varepsilon > 0$, and each $n \geq 1$ a quantity $N(\xi, \varepsilon, n)$ as follows. For each $(\xi,n)$-name $w^\xi \in \{1, \dots, r\}^n$ write
\[
\mu(w^\xi) = \mu\bigl(\{x \in X \mid w_n^\xi(x) = w^\xi\}\bigr)
\]
for the measure of the set of points in $X$ whose name is $w^\xi$, where $w_n^\xi(x) = (a_0, \dots, a_{n-1})$ with $T^j x \in A_{a_j}$ for $0 \leq j < n$. Starting with the names of least measure in $\{1, \dots, r\}^n$, remove as many names as possible compatible with the condition that the total measure of the remaining names exceeds $1 - \varepsilon$. Write $N(\xi, \varepsilon, n)$ for the cardinality of the set of remaining names. Then one may define
\[
h_{\mu,\mathrm{name}}(T, \xi) = \lim_{\varepsilon \to 0} \liminf_{n \to \infty} \frac{1}{n} \log N(\xi, \varepsilon, n)
\]
and
\[
h_{\mu,\mathrm{name}}(T) = \sup_{\xi} h_{\mu,\mathrm{name}}(T, \xi),
\]
where the supremum is taken over all finite partitions. Using this definition and the assumption of ergodicity, it is possible to prove directly the following basic theorems.

(1) The Shannon-McMillan-Breiman theorem (Theorem 3.1) in the form
\[
-\frac{1}{n} \log \mu\bigl([w_n^\xi(x)]\bigr) \longrightarrow h_{\mu,\mathrm{name}}(T, \xi) \tag{1.16}
\]
for $\mu$-almost every $x$.

(2) The Kolmogorov-Sinaĭ theorem: if $\bigvee_{i=-\infty}^{\infty} T^{-i}\xi = \mathscr{B}$, then
\[
h_{\mu,\mathrm{name}}(T, \xi) = h_{\mu,\mathrm{name}}(T). \tag{1.17}
\]

We shall see later that $h_{\mu,\mathrm{name}}(T,\xi) = h_\mu(T,\xi)$ as a corollary of Theorem 3.1. In contrast to the development in the earlier sections of this chapter, the formula in Definition 1.1 is not used in defining $h_{\mu,\mathrm{name}}$. Instead it appears as a consequence of the combinatorics of counting names, as in Lemma 1.30 (see the exercise for Section 1.4). To obtain an independent and equivalent definition in the way described here, ergodicity needs to be assumed initially.
Exercises for Section 1.4

Exercise. Show that $h_{\mu,\mathrm{name}}(\sigma, \xi) = H(p)$, using both the notation and the statement of Lemma 1.30.

1.5 Compression Rate

Recall from Section 1.2 the interpretation of the entropy $\frac{1}{\log 2} H_\mu(\xi)$ as the optimal average length of binary codes compressing the possible outcomes of the experiment represented by the partition $\xi$ (ignoring the failure of optimality by one digit, as in Lemma 1.11). This interpretation also helps to interpret some of the results of Section 1.3. For example, the subadditivity
\[
H_\mu\Bigl(\bigvee_{i=0}^{n-1} T^{-i}\xi\Bigr) \leq n\, H_\mu(\xi)
\]
can be interpreted to mean that the almost optimal code (as in Lemma 1.11) for $\xi = (A_1, A_2, \dots)$ can be used to code $\bigvee_{i=0}^{n-1} T^{-i}\xi$ as follows. The partition $\bigvee_{i=0}^{n-1} T^{-i}\xi$ has as a natural alphabet the names $i_0 \dots i_{n-1}$ of length $n$ in the alphabet of $\xi$. The requirements on codes ensure that the optimal Shannon code $s$ for $\xi$ induces in a natural way a code $s_n$ on names of length $n$ by concatenation,
\[
s_n(i_0 \dots i_{n-1}) = s(i_0)\, s(i_1) \cdots s(i_{n-1}). \tag{1.18}
\]
The average length of this code is $n\, H_\mu(\xi)$. However, unless the partitions $\xi, T^{-1}\xi, \dots, T^{-(n-1)}\xi$ are independent, there might be better codes for names of length $n$ than the code $s_n$ constructed in (1.18), giving the subadditivity inequality by Lemma 1.10. Thus $\frac{1}{n} H_\mu\bigl(\bigvee_{i=0}^{n-1} T^{-i}\xi\bigr)$ is the average length of the optimal code for $\bigvee_{i=0}^{n-1} T^{-i}\xi$, averaged both over the space and over a time interval of length $n$. Moreover, $h_\mu(T,\xi)$ is the lowest average length per time unit of a code describing the outcomes of the experiment $\xi$ on long pieces of trajectories that could possibly be achieved. Since $h_\mu(T,\xi)$ is defined as an infimum in Definition 1.14, this might not be attained, but any slightly worse compression rate would be attainable by working with sufficiently long blocks $T^{-km} \bigvee_{i=0}^{m-1} T^{-i}\xi$ of a much longer trajectory in $\bigvee_{i=0}^{nm-1} T^{-i}\xi$. Notice that the slight lack of optimality in Lemmas 1.10 and 1.11 vanishes on average over long time intervals (see the exercises for Section 1.5).
Example 1.31. Consider the full three-shift $\sigma_3 : \{0,1,2\}^{\mathbb{Z}} \to \{0,1,2\}^{\mathbb{Z}}$, with the generator $\xi = \{[0]_0, [1]_0, [2]_0\}$ (using the notation from Exercise 1.16 for cylinder sets). A code for $\xi$ is
\[
0 \mapsto 00,\quad 1 \mapsto 01,\quad 2 \mapsto 10,
\]
which gives a rather inefficient coding for names: the length of a ternary sequence encoded in this way doubles. Using blocks of ternary sequences of length $3$ (with a total of $27$ sequences) gives binary codes of length $5$ (out of a total of $32$ possible codes), showing the greater efficiency of longer blocks: defining a code by some injective map $\{0,1,2\}^3 \to \{0,1\}^5$ allows a ternary sequence of length $3k$ to be encoded to a binary sequence of length $5k$, giving the better ratio of $\frac{5}{3}$. Clearly these simple codes will never give a better ratio than $\frac{\log 3}{\log 2}$, but they can achieve any slightly larger ratio at the expense of working with very long blocks of sequences. One might wonder whether more sophisticated codes could, on average, be more efficient on long sequences. The results of this chapter say precisely that this is not possible if we assume that the digits in the ternary sequences considered are independently and identically distributed; equivalently, if we work with the system $(X_3, \mu_3, \sigma_3)$ with entropy $h_{\mu_3}(\sigma_3) = \log 3$. We will develop these ideas further in Section 3.2.

Exercises for Section 1.5

Exercise. Give an interpretation of the finiteness of the entropy of an infinite probability vector $(v_1, v_2, \dots)$ in terms of codes.

Exercise. Give an interpretation of conditional entropy and information in terms of codes.

Exercise. Fix a finite partition $\xi$ with corresponding alphabet $A$ in an ergodic measure-preserving system $(X, \mathscr{B}, \mu, T)$, and for each $n \geq 1$ let $s_n$ be an optimal prefix-free code for the blocks of length $n$ over $A$. Use the source coding theorem in Section 1.2 to show that
\[
\lim_{n \to \infty} \frac{\log 2}{n}\, L(s_n) = h_\mu(T, \xi),
\]
where $L(s_n)$ denotes the average length of the code $s_n$.
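The block-coding arithmetic of Example 1.31 generalizes: $k$ ternary digits need $\lceil k \log_2 3 \rceil$ binary digits, and the resulting length ratio tends to $\frac{\log 3}{\log 2} \approx 1.585$ from above. A quick sketch (ours, not from the text) checks the two ratios from the example:

```python
import math

# Sketch (ours, not from the text): fixed-length block codes for ternary
# sequences. Encoding k ternary digits by binary words of length
# ceil(k * log2(3)) is possible injectively since 2**ceil(k*log2(3)) >= 3**k,
# and the length ratio tends to log(3)/log(2) from above, matching
# Example 1.31 (k = 1 gives ratio 2, k = 3 gives 5/3).

def ratio(k):
    bits = math.ceil(k * math.log2(3))   # binary digits per block of k ternary digits
    return bits / k                      # binary digits per single ternary digit

target = math.log(3) / math.log(2)
print(ratio(1) == 2.0)                   # the naive two-bit code
print(ratio(3) == 5 / 3)                 # the 27-into-32 block code
print(all(ratio(k) >= target for k in range(1, 100)))
```

No choice of block length beats the ratio $\log_2 3$; the point of the chapter is that for i.i.d. uniform ternary digits no coding scheme of any kind does better either.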
More informationIntroduction to Dynamical Systems
Introduction to Dynamical Systems France-Kosovo Undergraduate Research School of Mathematics March 2017 This introduction to dynamical systems was a course given at the march 2017 edition of the France
More informationA Short Introduction to Ergodic Theory of Numbers. Karma Dajani
A Short Introduction to Ergodic Theory of Numbers Karma Dajani June 3, 203 2 Contents Motivation and Examples 5 What is Ergodic Theory? 5 2 Number Theoretic Examples 6 2 Measure Preserving, Ergodicity
More informationAN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION
AN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION Daniel Halpern-Leistner 6/20/08 Abstract. I propose an algebraic framework in which to study measures of information. One immediate consequence
More informationWaiting times, recurrence times, ergodicity and quasiperiodic dynamics
Waiting times, recurrence times, ergodicity and quasiperiodic dynamics Dong Han Kim Department of Mathematics, The University of Suwon, Korea Scuola Normale Superiore, 22 Jan. 2009 Outline Dynamical Systems
More informationMeasures. 1 Introduction. These preliminary lecture notes are partly based on textbooks by Athreya and Lahiri, Capinski and Kopp, and Folland.
Measures These preliminary lecture notes are partly based on textbooks by Athreya and Lahiri, Capinski and Kopp, and Folland. 1 Introduction Our motivation for studying measure theory is to lay a foundation
More informationSpring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures
36-752 Spring 2014 Advanced Probability Overview Lecture Notes Set 1: Course Overview, σ-fields, and Measures Instructor: Jing Lei Associated reading: Sec 1.1-1.4 of Ash and Doléans-Dade; Sec 1.1 and A.1
More informationConnectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ).
Connectedness 1 Motivation Connectedness is the sort of topological property that students love. Its definition is intuitive and easy to understand, and it is a powerful tool in proofs of well-known results.
More informationIn N we can do addition, but in order to do subtraction we need to extend N to the integers
Chapter The Real Numbers.. Some Preliminaries Discussion: The Irrationality of 2. We begin with the natural numbers N = {, 2, 3, }. In N we can do addition, but in order to do subtraction we need to extend
More informationDynamical Systems 2, MA 761
Dynamical Systems 2, MA 761 Topological Dynamics This material is based upon work supported by the National Science Foundation under Grant No. 9970363 1 Periodic Points 1 The main objects studied in the
More informationEconomics 204 Fall 2011 Problem Set 1 Suggested Solutions
Economics 204 Fall 2011 Problem Set 1 Suggested Solutions 1. Suppose k is a positive integer. Use induction to prove the following two statements. (a) For all n N 0, the inequality (k 2 + n)! k 2n holds.
More informationExamples of Dual Spaces from Measure Theory
Chapter 9 Examples of Dual Spaces from Measure Theory We have seen that L (, A, µ) is a Banach space for any measure space (, A, µ). We will extend that concept in the following section to identify an
More information2 Probability, random elements, random sets
Tel Aviv University, 2012 Measurability and continuity 25 2 Probability, random elements, random sets 2a Probability space, measure algebra........ 25 2b Standard models................... 30 2c Random
More informationIRRATIONAL ROTATION OF THE CIRCLE AND THE BINARY ODOMETER ARE FINITARILY ORBIT EQUIVALENT
IRRATIONAL ROTATION OF THE CIRCLE AND THE BINARY ODOMETER ARE FINITARILY ORBIT EQUIVALENT MRINAL KANTI ROYCHOWDHURY Abstract. Two invertible dynamical systems (X, A, µ, T ) and (Y, B, ν, S) where X, Y
More informationTHEOREMS, ETC., FOR MATH 515
THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every
More informationLebesgue Measure on R n
8 CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets
More informationSegment Description of Turbulence
Dynamics of PDE, Vol.4, No.3, 283-291, 2007 Segment Description of Turbulence Y. Charles Li Communicated by Y. Charles Li, received August 25, 2007. Abstract. We propose a segment description for turbulent
More informationMA359 Measure Theory
A359 easure Theory Thomas Reddington Usman Qureshi April 8, 204 Contents Real Line 3. Cantor set.................................................. 5 2 General easures 2 2. Product spaces...............................................
More informationTopological properties of Z p and Q p and Euclidean models
Topological properties of Z p and Q p and Euclidean models Samuel Trautwein, Esther Röder, Giorgio Barozzi November 3, 20 Topology of Q p vs Topology of R Both R and Q p are normed fields and complete
More informationL p Spaces and Convexity
L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function
More informationA NEW SET THEORY FOR ANALYSIS
Article A NEW SET THEORY FOR ANALYSIS Juan Pablo Ramírez 0000-0002-4912-2952 Abstract: We present the real number system as a generalization of the natural numbers. First, we prove the co-finite topology,
More informationIf the [T, Id] automorphism is Bernoulli then the [T, Id] endomorphism is standard
If the [T, Id] automorphism is Bernoulli then the [T, Id] endomorphism is standard Christopher Hoffman Daniel Rudolph September 27, 2002 Abstract For any - measure preserving map T of a probability space
More informationarxiv: v1 [math.co] 5 Apr 2019
arxiv:1904.02924v1 [math.co] 5 Apr 2019 The asymptotics of the partition of the cube into Weyl simplices, and an encoding of a Bernoulli scheme A. M. Vershik 02.02.2019 Abstract We suggest a combinatorial
More informationChapter One. The Real Number System
Chapter One. The Real Number System We shall give a quick introduction to the real number system. It is imperative that we know how the set of real numbers behaves in the way that its completeness and
More informationEntropy dimensions and a class of constructive examples
Entropy dimensions and a class of constructive examples Sébastien Ferenczi Institut de Mathématiques de Luminy CNRS - UMR 6206 Case 907, 63 av. de Luminy F3288 Marseille Cedex 9 (France) and Fédération
More informationErgodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.
Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions
More informationMATH 521, WEEK 2: Rational and Real Numbers, Ordered Sets, Countable Sets
MATH 521, WEEK 2: Rational and Real Numbers, Ordered Sets, Countable Sets 1 Rational and Real Numbers Recall that a number is rational if it can be written in the form a/b where a, b Z and b 0, and a number
More informationSolutions to Tutorial 1 (Week 2)
THE UNIVERSITY OF SYDNEY SCHOOL OF MATHEMATICS AND STATISTICS Solutions to Tutorial 1 (Week 2 MATH3969: Measure Theory and Fourier Analysis (Advanced Semester 2, 2017 Web Page: http://sydney.edu.au/science/maths/u/ug/sm/math3969/
More informationWeek 2: Sequences and Series
QF0: Quantitative Finance August 29, 207 Week 2: Sequences and Series Facilitator: Christopher Ting AY 207/208 Mathematicians have tried in vain to this day to discover some order in the sequence of prime
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationBanach Spaces V: A Closer Look at the w- and the w -Topologies
BS V c Gabriel Nagy Banach Spaces V: A Closer Look at the w- and the w -Topologies Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we discuss two important, but highly non-trivial,
More informationNotes for Math 290 using Introduction to Mathematical Proofs by Charles E. Roberts, Jr.
Notes for Math 290 using Introduction to Mathematical Proofs by Charles E. Roberts, Jr. Chapter : Logic Topics:. Statements, Negation, and Compound Statements.2 Truth Tables and Logical Equivalences.3
More informationLecture 4 Lebesgue spaces and inequalities
Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how
More informationBERNOULLI ACTIONS AND INFINITE ENTROPY
BERNOULLI ACTIONS AND INFINITE ENTROPY DAVID KERR AND HANFENG LI Abstract. We show that, for countable sofic groups, a Bernoulli action with infinite entropy base has infinite entropy with respect to every
More informationg 2 (x) (1/3)M 1 = (1/3)(2/3)M.
COMPACTNESS If C R n is closed and bounded, then by B-W it is sequentially compact: any sequence of points in C has a subsequence converging to a point in C Conversely, any sequentially compact C R n is
More informationREAL ANALYSIS LECTURE NOTES: 1.4 OUTER MEASURE
REAL ANALYSIS LECTURE NOTES: 1.4 OUTER MEASURE CHRISTOPHER HEIL 1.4.1 Introduction We will expand on Section 1.4 of Folland s text, which covers abstract outer measures also called exterior measures).
More informationProof: The coding of T (x) is the left shift of the coding of x. φ(t x) n = L if T n+1 (x) L
Lecture 24: Defn: Topological conjugacy: Given Z + d (resp, Zd ), actions T, S a topological conjugacy from T to S is a homeomorphism φ : M N s.t. φ T = S φ i.e., φ T n = S n φ for all n Z + d (resp, Zd
More informationIn N we can do addition, but in order to do subtraction we need to extend N to the integers
Chapter 1 The Real Numbers 1.1. Some Preliminaries Discussion: The Irrationality of 2. We begin with the natural numbers N = {1, 2, 3, }. In N we can do addition, but in order to do subtraction we need
More informationContinuum Probability and Sets of Measure Zero
Chapter 3 Continuum Probability and Sets of Measure Zero In this chapter, we provide a motivation for using measure theory as a foundation for probability. It uses the example of random coin tossing to
More informationTopological properties
CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological
More informationSolutions to Tutorial 11 (Week 12)
THE UIVERSITY OF SYDEY SCHOOL OF MATHEMATICS AD STATISTICS Solutions to Tutorial 11 (Week 12) MATH3969: Measure Theory and Fourier Analysis (Advanced) Semester 2, 2017 Web Page: http://sydney.edu.au/science/maths/u/ug/sm/math3969/
More information5 Set Operations, Functions, and Counting
5 Set Operations, Functions, and Counting Let N denote the positive integers, N 0 := N {0} be the non-negative integers and Z = N 0 ( N) the positive and negative integers including 0, Q the rational numbers,
More information(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define
Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that
More informationPrinciples of Real Analysis I Fall I. The Real Number System
21-355 Principles of Real Analysis I Fall 2004 I. The Real Number System The main goal of this course is to develop the theory of real-valued functions of one real variable in a systematic and rigorous
More informationAnother Riesz Representation Theorem
Another Riesz Representation Theorem In these notes we prove (one version of) a theorem known as the Riesz Representation Theorem. Some people also call it the Riesz Markov Theorem. It expresses positive
More information