Continuum Probability and Sets of Measure Zero

Theresa Rogers
6 years ago
Chapter 3 Continuum Probability and Sets of Measure Zero

In this chapter, we motivate the use of measure theory as a foundation for probability. We use the example of random coin tossing to explain why we need to move beyond discrete probability theory and to determine what the new foundation, which has yet to be developed, must provide. Presuming that we can indeed create the necessary theoretical foundation, we show some important consequences that result. This is intended to justify the investment in rigorous analysis that we make in the following chapters.

We do not show in this chapter that the required theoretical foundation exists! The chapter is meant to be a fun and engaging introduction to the thought processes involved in measure-theoretic probability. Moreover, it shows that even a vague idea of measure makes it possible to state and prove deep results. Enjoy this chapter, as the next chapter provides all the heavy-going theory and proof a reader could want! After developing rigorous measure theory, we revisit the material in this chapter to verify that everything discussed is indeed rigorously justified.

3.1 Probability and sets of real numbers

We begin by developing a connection between a probability space with an infinite number of points and an interval of real numbers. With this correspondence, we can develop a systematic method for computing probabilities of events in the probability space by measuring the sizes of the corresponding sets of real numbers. However, it turns out that perfectly reasonable probability questions correspond to very complicated sets of real numbers. Thus, we first need to develop a way to measure the size of rather unusual sets of real numbers.

Bernoulli sequences and the unit interval

Definition. Suppose an experiment has two possible outcomes and the probabilities of these outcomes are fixed. A finite number of independent trials of the experiment is called a Bernoulli trial. An infinite sequence of independent trials is called a Bernoulli sequence.

Example. Let the experiment be the toss of a two-sided coin, with heads denoted H and tails denoted T. An example of a Bernoulli sequence is

H, T, T, H, H, H, H, T, H, T, T, H, H, T, T, T, T, H, T, H, T, T, ...

Definition. We define the space of Bernoulli sequences,

$$B = \{\text{all Bernoulli sequences generated by a particular experiment}\}.$$

We use H and T to denote the two outcomes. For simplicity, we mostly treat the case where the outcomes have equal probability of occurring, i.e., the case of a fair coin. In general, the two outcomes may have different probabilities.

We show that $B$ can (almost) be represented by the real numbers in $(0,1]$, which implies that $B$ is uncountable.

Theorem. If we delete a countable subset of $B$, we can index the remaining points using the numbers in $(0,1]$.

Recall that by index, we mean there is a 1-1 correspondence between the two sets.

Proof. We construct a map from $(0,1]$ to $B$ that fails to be onto by a countable subset. Any point $x \in [0,1]$ can be written as an expansion in base 2, or binary expansion,

$$x = \sum_{i=1}^{\infty} \frac{a_i}{2^i}, \qquad a_i = 0 \text{ or } 1.$$

Each such expansion corresponds to a Bernoulli sequence. To see this, define the $n$th term of the Bernoulli sequence to be H when $a_n = 1$ and T when $a_n = 0$.

Example. $0.10111001001\ldots \leftrightarrow$ H, T, H, H, H, T, T, H, T, T, H, ...

A problem with using real numbers as an index set is that some numbers do not have a unique binary expansion, whereas we consider two Bernoulli sequences with different members to be distinct.

Example. $\tfrac{1}{2} = 0.1000\ldots$ and $\tfrac{1}{2} = 0.0111\ldots$, but H, T, T, T, T, ... $\neq$ T, H, H, H, H, ...

Thus, the method used above to generate a Bernoulli sequence does not define a function into $B$. To avoid this trouble, we adopt the convention that if the real number $x$ has both a terminating and a non-terminating binary expansion, we use the non-terminating expansion. This is the reason for using $(0,1]$ instead of $[0,1]$: the point $0$ has no non-terminating expansion. With this convention, the method above defines a 1-1 map into $B$ that is not onto, because it does not produce Bernoulli sequences ending in all T's. We claim that the set $B_T$ of such Bernoulli sequences is countable. Let $B_T^k$ be the finite set of Bernoulli sequences that have only T's after the $k$th term. We have

$$B_T = \bigcup_{k=1}^{\infty} B_T^k. \tag{3.1}$$

This implies that $B_T$ is countable and that there is a 1-1 and onto correspondence between $(0,1]$ and $B \setminus B_T$.
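The indexing map in the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bernoulli_prefix` is hypothetical, exact rational arithmetic (`fractions.Fraction`) stands in for real numbers, and only the first $n$ outcomes are produced. The strict inequality in the comparison is what selects the non-terminating expansion.

```python
from fractions import Fraction


def bernoulli_prefix(x, n):
    """First n outcomes of the Bernoulli sequence indexed by x in (0, 1].

    The digit a_k is 1 exactly when the remainder is strictly greater than
    2**-k, which produces the non-terminating binary expansion of x.
    """
    x = Fraction(x)
    if not (0 < x <= 1):
        raise ValueError("x must lie in (0, 1]")
    outcomes = []
    for k in range(1, n + 1):
        half = Fraction(1, 2 ** k)
        if x > half:  # strict: 1/2 becomes 0.0111..., not 0.1000...
            outcomes.append("H")
            x -= half
        else:
            outcomes.append("T")
    return "".join(outcomes)


if __name__ == "__main__":
    print(bernoulli_prefix(Fraction(1, 2), 5))  # THHHH
    print(bernoulli_prefix(1, 4))               # HHHH
```

Because the remainder stays strictly positive at every step, the sketch never produces a sequence ending in all T's, which matches the observation that the map misses exactly the set $B_T$.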

Proof Comment 3.1. The decomposition of a countable set as a countable union of finite sets in (3.1) is a standard measure theory argument.

Initial encounter with measure

Since $B$ is uncountable and $B_T$ is countable, we would like to ignore $B_T$ for all practical purposes and identify $B$ with $I = (0,1]$. Likewise, it turns out to be convenient to treat the size of any finite or countable subset of $I$ as negligible compared to the size of $I$, which has a number of important ramifications. This is the first motivating example for devising a way to measure the size of sets of real numbers that applies to complicated sets.

Lebesgue developed an approach to measuring the sizes of complicated sets of real numbers that is the basis for measure theory. Measure theory can be developed in a very abstract way that applies to spaces of many different kinds of objects, though we focus on spaces consisting of real numbers in this book. In that context, it is initially reasonable to think of measure as a generalization of length in one dimension, and of area and volume in higher dimensions. But we also caution that measures can have other interpretations. For example, we use measure to quantify probability later on.

To fit common conceptions of measuring the sizes of sets, a measure $\mu$ should, at a minimum, satisfy some properties.

Definition (First Wish List for Measures). A measure $\mu$ is a real-valued function defined on a collection of subsets of a space called the measurable sets. If $A$ is a measurable set, $\mu(A)$ is the measure of $A$. At a minimum, the structure must satisfy:

(Non-negativity) $\mu$ should be non-negative.

(Closed under finite unions) If $\{A_i\}_{i=1}^{n}$ is a finite collection of disjoint measurable sets, then $\bigcup_{i=1}^{n} A_i$ is measurable.

(Finite-additivity) If $\{A_i\}_{i=1}^{n}$ is a collection of disjoint measurable sets, then

$$\mu\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} \mu(A_i).$$

Thus, a measure is a non-negative, finitely additive set function, just like a probability function. There should be a connection here.

We pay particular attention to the case of real numbers:

Definition. If the space is an interval of real numbers and the measurable sets include intervals for which

$$\mu((a,b)) = \mu([a,b]) = \mu((a,b]) = \mu([a,b)) = b - a,$$

we call $\mu$ the Lebesgue measure and write $\mu = \mu_L$.

Note that this implies that the measure of a set consisting of a single point is zero, i.e., $\mu_L(\{a\}) = 0$.

Assigning probabilities to events in B

So far, we have identified $B$ with the interval of real numbers $I$, and we have introduced the desirability of a general way to measure the sizes of sets in $I$ along with some properties that such a measure should have. The next step is to set up a system for computing probabilities of events in $B$ using the measure. For simplicity, we consider the case when T and H occur with equal probability.

To start from what we know, we first consider the space consisting of a Bernoulli trial of finite length $n$. The probability of H as the first outcome in any trial is $0.5$, and likewise for the probability of T as the first outcome. This can be computed using simple counting over all possible trials of length $n$. Unfortunately, we cannot make a counting argument in the case of $B$, though intuition suggests that the probabilities are also $0.5$.

Switching to sets of real numbers, if $A_H$ is the event in $B$ consisting of sequences where H is the first outcome, the corresponding set in $I = (0,1]$ is

$$I_{A_H} = \{x \in I : x = 0.1a_2a_3a_4\ldots,\ a_i = 0 \text{ or } 1\} = (0.5, 1].$$

(Note that the largest number in $I$ that is not in $I_{A_H}$ is $0.1000\ldots = 0.0111\ldots$, while the largest number in $I_{A_H}$ is $0.1111\ldots = 1$.) We do not include $1/2$ because we use non-terminating expansions. Likewise, if $A_T$ is the event where T occurs as the first outcome, then $I_{A_T} = (0, 0.5]$. We have

$$\mu_L(I_{A_H}) = \mu_L(I_{A_T}) = 0.5.$$

In this case, based on the fact that $I_{A_H}$ and $I_{A_T}$ have equal measures, it seems reasonable to assign the probabilities

$$P(A_H) = \mu_L(I_{A_H}) = 0.5 \quad \text{and} \quad P(A_T) = \mu_L(I_{A_T}) = 0.5.$$

Next, if we consider the events $A_{HH}$, $A_{HT}$, $A_{TH}$, $A_{TT}$ in $B$ in which the first two outcomes HH, HT, TH, TT are specified, the corresponding intervals are

$$I_{A_{TT}} = (0, 0.25], \quad I_{A_{TH}} = (0.25, 0.5], \quad I_{A_{HT}} = (0.5, 0.75], \quad I_{A_{HH}} = (0.75, 1].$$

Since these intervals have equal length, we assign the probability $0.25$ to each interval and to each corresponding event.
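The correspondence between a specified initial pattern of outcomes and a half-open interval can be sketched as follows. The helper name `prefix_interval` is hypothetical, and exact fractions are used so that endpoints such as $0.25$ are represented without rounding.

```python
from fractions import Fraction


def prefix_interval(prefix):
    """Return (left, right) such that the event 'the first len(prefix)
    outcomes equal prefix' corresponds to the interval (left, right]."""
    left = Fraction(0)
    for k, outcome in enumerate(prefix, start=1):
        if outcome == "H":
            left += Fraction(1, 2 ** k)  # binary digit a_k = 1
        elif outcome != "T":
            raise ValueError("outcomes must be 'H' or 'T'")
    return left, left + Fraction(1, 2 ** len(prefix))


if __name__ == "__main__":
    for pattern in ("TT", "TH", "HT", "HH"):
        left, right = prefix_interval(pattern)
        print(pattern, f"({left}, {right}]", "length", right - left)
```

The left endpoint is excluded and the right endpoint included because of the non-terminating convention: the terminating expansion that starts the interval is always rewritten so that it belongs to the interval on its left.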
We can continue with this argument, considering the events corresponding to specification of the first three outcomes, then the first four outcomes, and so on. Considering the events in which the first $n$ outcomes are specified, we obtain $2^n$ intervals of equal length, and we assign equal probability $2^{-n}$ to each interval and thus to each event.

In this way, we obtain a sequence of binary partitions $\mathcal{T}_n$ of $I$ into $2^n$ nonoverlapping subintervals $I_{n,j}$ of equal length such that

$$I = \bigcup_{j=1}^{2^n} I_{n,j},$$

see Fig. 3.1. We assign equal probabilities to each subinterval in a given partition and to the corresponding events. Moreover, it appears that any interval $(a,b] \subset I$ can be approximated arbitrarily well by

$$\bigcup_{I_{n,j} \subset (a,b]} I_{n,j},$$

in the sense that the intervals of points not in the approximation,

$$(a,b] \setminus \bigcup_{I_{n,j} \subset (a,b]} I_{n,j},$$

shrink in size as $n$ increases, see Fig. 3.1.

Figure 3.1. Illustration of the sequence of binary partitions $\mathcal{T}_n$ of $I$. We illustrate an approximation of the interval $(a,b)$ by subintervals in $\mathcal{T}_5$.

In view of the Wish List and the fact that $\mu_L(I) = 1$, we extend these observations to a general principle of modeling.

Axiom 3.1.1 (The Measure Theory Model for Probability on B). If $A$ is an event in $B$, we let $I_A$ denote the corresponding set of real numbers in $(0,1]$. Then, we assign the probability of $A$, denoted by $P(A)$, to be $\mu_L(I_A)$.

All of this discussion is terribly vague, since we have not defined $\mu_L$, described the collection of measurable sets, or quantified the sense of approximation of sets observed above! But we verify that these ideas are useful in some simple examples below, and we show that they lead to stating and proving important theorems in the next couple of sections.

Example. Consider the event $A$ in which H is the $n$th outcome. Then,

$$I_A = \{x \in I : x = 0.a_1a_2\ldots a_{n-1}\,1\,a_{n+1}a_{n+2}\ldots,\ a_i = 0 \text{ or } 1\}.$$

Let $s = 0.a_1a_2\ldots a_{n-1}1$, so $I_A$ contains $(s, s + 2^{-n}]$. We can choose $a_1, a_2, \ldots, a_{n-1}$ in $2^{n-1}$ different ways, and the resulting intervals are disjoint from one another, so we use finite additivity to conclude that

$$P(A) = \mu_L(I_A) = 2^{n-1} \cdot 2^{-n} = 1/2.$$

As a concrete example, consider $n = 3$. Then we have the cases TTH, THH, HTH, HHH, corresponding to 4 disjoint intervals of length $1/2^3$, and $P(A) = 4/8 = 1/2$.

Example. Let $A$ be the event where exactly $i$ of the first $n$ outcomes are H, so

$$I_A = \{x \in I : x = 0.a_1a_2\ldots a_na_{n+1}\ldots,\ \text{exactly } i \text{ of the first } n \text{ digits are } 1 \text{ and the remaining digits are } 0 \text{ or } 1\}.$$

Choose $a_1, \ldots, a_n$ so that exactly $i$ are 1 and set $s = 0.a_1a_2\ldots a_n$. Then $I_A$ contains $(s, s + 2^{-n}]$. The intervals corresponding to different choices of $a_1, \ldots, a_n$ are disjoint, and there are exactly

$$\binom{n}{i} = \frac{n!}{i!\,(n-i)!}$$

such intervals. So

$$P(A) = \mu_L(I_A) = \binom{n}{i} \frac{1}{2^n}.$$
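Both examples reduce to adding up the lengths of disjoint dyadic intervals, which can be checked by brute force for small $n$. The sketch below is an illustration only: the names `measure_of_event`, `prob_nth_is_head`, and `prob_exactly_i_heads` are hypothetical, and the enumeration over all $2^n$ digit strings is practical only for small $n$.

```python
from fractions import Fraction
from itertools import product


def measure_of_event(n, predicate):
    """Add the lengths 2**-n of the dyadic intervals (s, s + 2**-n] whose
    first n binary digits (a_1, ..., a_n) satisfy predicate."""
    length = Fraction(1, 2 ** n)
    total = Fraction(0)
    for digits in product((0, 1), repeat=n):
        if predicate(digits):
            total += length  # finite additivity over disjoint intervals
    return total


def prob_nth_is_head(n):
    """P(the n-th outcome is H)."""
    return measure_of_event(n, lambda a: a[n - 1] == 1)


def prob_exactly_i_heads(n, i):
    """P(exactly i of the first n outcomes are H)."""
    return measure_of_event(n, lambda a: sum(a) == i)


if __name__ == "__main__":
    print(prob_nth_is_head(3))         # 1/2
    print(prob_exactly_i_heads(5, 2))  # 5/16, i.e. C(5,2) / 2**5
```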

Recapping the construction of the model

We note that there are actually two modeling steps involved in Axiom 3.1.1:

Step 1. The adoption of the measure formulation for probability, which gives a procedure for computing probabilities of events;

Step 2. The assignment of specific probabilities to events in $B$, i.e., $P(A) = \mu_L(I_A)$ for $A \subset B$.

Step 1 is a proposal for how to carry out stochastic computations in a probability space with an infinite number of points. This use of measure theory is not entirely free from controversy, and there are alternative proposed frameworks. But it is fair to say that the proposal of measure theory as a foundation for probability by Kolmogorov stands as one of the great mathematical achievements of the twentieth century. The worthiness of measure theory as a framework for probability is demonstrated in part by the ability to state and prove important probabilistic results. We present a couple of examples in the next two sections and many examples in later chapters.

The assignment of probabilities in Step 2 is subject to perhaps a greater degree of controversy. Partly, this is because randomness is used to model various situations, including systems that are truly stochastic in nature and systems whose state is unknown but not truly stochastic. Even if a system is random, there may be limited information on the probability values of different events, and when there is information, it is often based on a finite set of observations. Above, we extrapolated to define $P(A) = \mu_L(I_A)$ working from a finite set of examples.

We conclude by noting that the model derived in this section can be applied to a variety of situations.

Example. We can use $I$ as an index set for the points in the space corresponding to the random throw of a dart onto the interval $I$, and it can index the time of arrival of a single $\alpha$ particle during a unit interval of time.
We can also extend these ideas to higher dimensions, e.g., by considering a square dart board. [Put a 2-d dart throwing example here.]

3.2 The Weak Law of Large Numbers

Continuing the program of motivating measure theory as a model for probability in $B$, we use it to state and prove some important results in probability. Of course, we have not yet shown that it is possible to construct measures, and we have only described properties of measures under a lot of restrictions. But we tackle those issues later.

In the meantime, we begin by revisiting the Law of Large Numbers. Recall that intuition suggests that it should be possible to detect the probabilities of H and T in $B$ by examining the outcomes of many repetitions of the experiment. In particular, the number of times that H occurs in a large number of trials should be related to the probability of H. However, as discussed earlier, a precise statement of this intuition is difficult to formulate. Assuming the probability of H is $p$ and $S_n$ is the number of H's that occur in the first $n$ trials, if we could show that

$$\lim_{n \to \infty} \frac{S_n}{n} = p,$$

then this would be a mathematical statement expressing the intuition. But such a result is certainly false as a statement about every sequence: a sequence of experiments could yield outcomes of all H's, for example. So we need to create a careful formulation. To make things simple, we assume that the probabilities of H and T are both $0.5$.

To state and prove the desired result, we introduce some functions.

Definition. A random variable is a function on the outcomes of an experiment.

The name random variable is a rather disconcerting name to assign to a function! Expressing and proving results in probability by using random variables is a supremely important technique.

Definition. For $x \in I$, define the random variable

$$S_n(x) = a_1 + \cdots + a_n, \quad \text{where } x = 0.a_1a_2a_3\ldots$$

$S_n$ gives the number of heads in the first $n$ experiments of the Bernoulli sequence corresponding to $x$.

Definition. Given $\delta > 0$, define

$$I_n = \Big\{x \in I : \Big|\frac{S_n(x)}{n} - \frac{1}{2}\Big| > \delta\Big\}. \tag{3.2}$$

Roughly speaking, this is the event consisting of outcomes for which there are not approximately the same number of H's and T's after $n$ trials, where $\delta$ quantifies the discrepancy. We prove

Theorem 3.2.1 (Weak Law of Large Numbers for Bernoulli Sequences). For fixed $\delta > 0$,

$$\mu_L(I_n) \to 0 \quad \text{as } n \to \infty. \tag{3.3}$$

An observant reader should be uncomfortable with this conclusion, because $I_n$ is an apparently complicated set, and we have not yet specified a procedure for computing the measure $\mu_L$ of complicated sets. Fortunately, during the proof, it becomes apparent that $I_n$ is actually a finite collection of nonoverlapping intervals, for which $\mu_L$ is defined.

By definition, (3.3) means that for any fixed $\delta > 0$, given any $\varepsilon > 0$,

$$\mu_L\Big(\Big\{x \in I : \Big|\frac{S_n(x)}{n} - \frac{1}{2}\Big| > \delta\Big\}\Big) < \varepsilon$$

for all sufficiently large $n$. Identifying $\mu_L$ with $P$, we see that (3.3) extends the earlier Law of Large Numbers (2.4) to $B$.
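Since $S_n$ is constant on each of the $2^n$ dyadic intervals of length $2^{-n}$, the measure of $I_n$ in (3.2) is a finite sum that can be evaluated exactly. The following sketch does this by grouping the intervals according to the number $k$ of heads; the function name `measure_I_n` is hypothetical, and the code illustrates the statement of the theorem rather than proving it.

```python
from fractions import Fraction
from math import comb


def measure_I_n(n, delta):
    """Exact measure of I_n = {x : |S_n(x)/n - 1/2| > delta} for a fair coin.

    There are C(n, k) dyadic intervals of length 2**-n on which S_n = k.
    """
    delta = Fraction(delta)
    count = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - Fraction(1, 2)) > delta
    )
    return Fraction(count, 2 ** n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, float(measure_I_n(n, Fraction(1, 10))))
```

Running the loop for a fixed $\delta$ shows the measures shrinking as $n$ grows, in line with (3.3).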

Remark 3.1. The idea of measuring the size of the set where a function takes a specified range of values is central to measure theory. However, such a set is not a finite collection of disjoint intervals in general.

To prove the result, we reformulate it using two new random variables.

Definition. For $x \in I$, we define the $i$th Rademacher function by

$$R_i(x) = 2a_i - 1, \quad x = 0.a_1a_2\ldots$$

Equivalently,

$$R_i(x) = \begin{cases} 1, & a_i = 1, \\ -1, & a_i = 0. \end{cases}$$

We plot some of these functions in Fig. 3.2.

Figure 3.2. Plots of the first three Rademacher functions.

$R_i$ has a useful interpretation. Suppose we bet on a sequence of coin tosses such that at each toss, we win \$1 if it is heads and lose \$1 if it is tails. Then $R_i(x)$ is the amount won or lost at the $i$th toss in the sequence of tosses represented by $x$.

The next random variable is:

Definition. We define

$$W_n(x) = \sum_{i=1}^{n} R_i(x).$$

Following the interpretation of $R_i$, $W_n$ gives the total amount won or lost after the $n$th toss in the betting game described above. By the definition of $R_i$,

$$W_n(x) = 2(a_1 + a_2 + \cdots + a_n) - n = 2S_n(x) - n, \quad x = 0.a_1a_2a_3\ldots$$

Now, with $\varepsilon$ playing the role of $\delta$ in (3.2),

$$\Big|\frac{S_n(x)}{n} - \frac{1}{2}\Big| > \varepsilon$$

if and only if

$$|2S_n(x) - n| > 2\varepsilon n,$$

or in other words, if and only if

$$|W_n(x)| > 2\varepsilon n. \tag{3.4}$$
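The three random variables can be written down directly in terms of the binary digits of $x$. In the sketch below, a point $x$ is represented by a finite list of its leading digits $(a_1, a_2, \ldots)$, which is an assumption made for illustration; the function names are hypothetical.

```python
def rademacher(a, i):
    """R_i(x) = 2*a_i - 1, where a lists the binary digits a_1, a_2, ... of x."""
    return 2 * a[i - 1] - 1


def heads(a, n):
    """S_n(x): number of heads among the first n outcomes."""
    return sum(a[:n])


def winnings(a, n):
    """W_n(x): net amount won after n tosses of the betting game."""
    return sum(rademacher(a, i) for i in range(1, n + 1))


if __name__ == "__main__":
    a = [1, 0, 0, 1, 1, 1, 1, 0]  # H, T, T, H, H, H, H, T
    for n in range(1, len(a) + 1):
        # The identity W_n = 2 S_n - n holds for every prefix length.
        print(n, heads(a, n), winnings(a, n), 2 * heads(a, n) - n)
```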

Note that since $\varepsilon$ is arbitrary, the factor 2 is immaterial.

Definition. We define

$$A_n = \{x \in I : |W_n(x)| > n\varepsilon\}.$$

We can prove Theorem 3.2.1 by showing that

$$\mu_L(A_n) \to 0 \quad \text{as } n \to \infty. \tag{3.5}$$

To do this, we use a special version of an important result.

Theorem 3.2.2 (Special Case of Chebyshev's Inequality). Let $f$ be a non-negative, piecewise constant function on $I$ and let $\alpha > 0$ be a real number. Then,

$$\mu_L(\{x \in I : f(x) > \alpha\}) \le \frac{1}{\alpha} \int_0^1 f(x)\,dx,$$

where the integral is the standard Riemann integral, which is well defined for piecewise constant, nonnegative functions.

We illustrate the theorem in Fig. 3.3.

Figure 3.3. We illustrate a typical set in Chebyshev's inequality.

Proof. [Theorem 3.2.2] Since $f$ is piecewise constant, there is a mesh $0 = x_1 < x_2 < \cdots < x_n = 1$ such that $f(x) = c_i$ for $x_i < x \le x_{i+1}$, $1 \le i \le n-1$. Then, since $f$ is nonnegative,

$$\int_0^1 f(x)\,dx = \sum_{i=1}^{n-1} c_i (x_{i+1} - x_i) \ge \sum_{c_i > \alpha} c_i (x_{i+1} - x_i) \ge \alpha \sum_{c_i > \alpha} (x_{i+1} - x_i) = \alpha\, \mu_L(\{x \in I : f(x) > \alpha\}).$$

Now we are ready to prove Theorem 3.2.1.

Proof. We can also describe the set $A_n$ as

$$A_n = \{x \in I : W_n^2(x) > n^2\varepsilon^2\},$$

where $W_n^2(x)$ is piecewise constant and non-negative. By Theorem 3.2.2,

$$\mu_L(A_n) \le \frac{1}{n^2\varepsilon^2} \int_0^1 W_n^2(x)\,dx.$$

We compute

$$\int_0^1 W_n^2(x)\,dx = \int_0^1 \Big(\sum_{i=1}^{n} R_i(x)\Big)^2 dx = \sum_{i=1}^{n} \int_0^1 R_i^2(x)\,dx + \sum_{\substack{i,j=1 \\ i \ne j}}^{n} \int_0^1 R_i(x)R_j(x)\,dx.$$

The first sum on the right is easy since $R_i^2(x) = 1$ for all $x$, so

$$\sum_{i=1}^{n} \int_0^1 R_i^2(x)\,dx = n.$$

We consider $\int_0^1 R_i(x)R_j(x)\,dx$ when $i \ne j$. Without loss of generality, we assume $i < j$. Set $J$ to be the interval

$$J = \Big(\frac{l}{2^i}, \frac{l+1}{2^i}\Big], \quad 0 \le l < 2^i.$$

$R_i$ is constant on $J$, while $R_j$ alternates between $-1$ and $+1$ on the $2^{j-i}$ subintervals of $J$ of length $2^{-j}$. Because $2^{j-i}$ is an even number, cancellation implies

$$\int_J R_i(x)R_j(x)\,dx = R_i \int_J R_j(x)\,dx = 0.$$
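The cancellation argument can be checked numerically for small indices, because products of Rademacher functions are constant on dyadic intervals of length $2^{-n}$ and their Riemann integrals are finite sums. The sketch below uses hypothetical helper names and enumerates all $2^n$ digit strings, so it is meant only for small $n$.

```python
from fractions import Fraction
from itertools import product


def integral_of_product(indices, n):
    """Integral over (0, 1] of the product of R_i for i in indices.

    Requires n >= max(indices); the product is constant on each dyadic
    interval of length 2**-n, so the integral is an average over digits.
    """
    total = 0
    for a in product((0, 1), repeat=n):
        term = 1
        for i in indices:
            term *= 2 * a[i - 1] - 1
        total += term
    return Fraction(total, 2 ** n)


def integral_of_W_squared(n):
    """Integral over (0, 1] of W_n**2, using W_n = 2*S_n - n."""
    total = sum((2 * sum(a) - n) ** 2 for a in product((0, 1), repeat=n))
    return Fraction(total, 2 ** n)


if __name__ == "__main__":
    print(integral_of_product((1, 3), 3))  # 0: distinct indices cancel
    print(integral_of_product((2, 2), 3))  # 1: R_2**2 is identically 1
    print(integral_of_W_squared(6))        # 6: only the n diagonal terms survive
```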

Therefore,

$$\int_0^1 R_i(x)R_j(x)\,dx = 0, \quad i \ne j.$$

Thus,

$$\int_0^1 W_n^2(x)\,dx = n,$$

and

$$\mu_L(A_n) \le \frac{1}{n^2\varepsilon^2}\, n = \frac{1}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$

By (3.4), $I_n$ is a set of the same form as $A_n$ (with $2\delta$ in place of $\varepsilon$), so $\mu_L(I_n) \to 0$ as $n \to \infty$.

The random variables introduced for this proof can be used to quantify other interesting questions.

Example. Suppose in the betting game above, we start with $M$ dollars. We compute an expression that yields the probability that we lose all the money. If $A_n$ is the event where we lose the money on the $n$th toss, then the corresponding set of numbers is

$$I_{A_n} = \{x \in I : W_i(x) > -M \text{ for } i < n \text{ and } W_n(x) = -M\}.$$

The set $I_{A_n}$, determined by where a function has prescribed values, is generally complicated. The event $A$ of losing all the money, given by

$$I_A = \bigcup_{n=1}^{\infty} I_{A_n},$$

is even more complicated. The probability of $A$ is $\mu_L(I_A)$, once we figure out how that is computed.

3.3 Sets of measure zero

Theorem 3.2.1 states that the size of the event consisting of Bernoulli sequences for a fair coin for which the relative frequency of H's in the first $N$ trials is farther than a fixed distance from $1/2$ tends to $0$ as $N \to \infty$. But this leaves open the question: For a fair coin and a typical $x$, does

$$\lim_{n \to \infty} \frac{S_n(x)}{n} = \frac{1}{2}\,? \tag{3.6}$$

This is an important question from the point of view of numerical simulation, as it is quite common that we have only one numerical sequence, corresponding to one choice of $x$, in hand. Can we reliably use the computed example to try to approximate the answer to statistical questions?

Definition. The set of normal numbers in $I$ is

$$N = \Big\{x \in I : \frac{S_n(x)}{n} \to \frac{1}{2} \text{ as } n \to \infty\Big\}.$$

Another way to state the intuition behind the Law of Large Numbers is that the nonnormal numbers should be atypical in some sense.

Definition. An event in $B$ is atypical if it has probability zero or, equivalently, if the corresponding set of real numbers has Lebesgue measure $0$.

Thus, the intuition behind the Law of Large Numbers is that $N^c$ should have Lebesgue measure zero.

In this section, we characterize sets with Lebesgue measure zero. We noted above that the Lebesgue measure of a single point is zero. It follows immediately from finite additivity that finite collections of points also have Lebesgue measure zero. Infinite collections are apparently more complicated. For example, $I$ is the uncountable union of single points and does not have Lebesgue measure zero.

Working from the assumptions about measure we have made so far, we develop a general method for characterizing sets with Lebesgue measure zero. In doing so, we actually motivate several key aspects of measure theory.

The characterization is based on a fundamentally important concept for metric spaces.

Definition. Given a subset $A \subset \mathbb{R}^n$, a countable cover of $A$ is a countable collection of sets $\{A_i\}_{i=1}^{\infty}$ in $\mathbb{R}^n$ such that

$$A \subset \bigcup_{i=1}^{\infty} A_i.$$

If the sets in a countable cover are open, we call it an open cover.

We emphasize that the requirement of being countable is important.

Definition. A set $A \subset \mathbb{R}$ has Lebesgue measure zero if for every $\varepsilon > 0$, there is a countable cover $\{A_i\}_{i=1}^{\infty}$ of $A$, where each $A_i$ consists of a finite union of open intervals, such that

$$\sum_{i=1}^{\infty} \mu_L(A_i) < \varepsilon.$$

We also say that $A$ has measure zero.

Note that because each $A_i$ in the countable cover consists of a finite union of open intervals, its Lebesgue measure is computable. In this way, we sidestep the issue of computing $\mu_L(A)$ directly. This definition also uses (implicitly) another property of Lebesgue measure:

Definition. If $(c,d) \subset (a,b)$, then $\mu_L((c,d)) \le \mu_L((a,b))$. We say that Lebesgue measure is monotone.

We could use half-open or closed intervals in the definition instead of open intervals, but open intervals turn out to be convenient for compactness arguments.

Example. We show that a closed interval $[a,b]$ with $a \ne b$ cannot have measure zero. If $[a,b]$ is covered by countably many open intervals, we can extract a finite number that cover $[a,b]$ (a finite subcover) because $[a,b]$ is compact. The sum of the lengths of these intervals must be at least $b - a$.

We describe some sets of measure zero.

Theorem.

1. A measurable subset of a set of measure zero has measure zero.

2. If $\{A_i\}_{i=1}^{\infty}$ is a countable collection of sets of measure zero, then $\bigcup_{i=1}^{\infty} A_i$ has measure zero.

3. Any finite or countable set of numbers has measure zero.

Result 2 states that a countable union of sets of measure zero is a set of measure zero. In contrast, uncountable unions of sets of measure zero can have nonzero measure. The assumption in 1. that the subset of the set of measure zero is measurable is an important point that we address in later chapters.

Proof.

Result 1. This follows from the definition, since any countable cover of the larger set is also a cover of the smaller set.

Result 2. We choose $\varepsilon > 0$. Since $A_n$ has measure zero, there is a countable collection of open intervals $B_{n,1}, B_{n,2}, \ldots$ covering $A_n$ with

$$\sum_{i=1}^{\infty} \mu_L(B_{n,i}) \le \frac{\varepsilon}{2^n}.$$

The collection $\{B_{n,i}\}_{n,i=1}^{\infty}$ is countable and covers $\bigcup_{n=1}^{\infty} A_n$. Moreover,

$$\sum_{n,i=1}^{\infty} \mu_L(B_{n,i}) = \sum_{n=1}^{\infty} \Big(\sum_{i=1}^{\infty} \mu_L(B_{n,i})\Big) \le \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon.$$

Note that we use non-negativity to switch the order of summation in this argument.

Result 3. This follows from 2. and the observation that a point has measure zero.

Proof Comment 3.2. This is a classic measure theory argument that the reader should study until it is familiar.

A natural question is whether there are any interesting sets of measure zero. We next show that there are uncountable sets of measure zero. In particular, we describe the construction of a special example that is used frequently in measure theory. The set is constructed by an iterative process.

Definition.

Step 1. Beginning with the unit interval $F_0 = [0,1]$, divide $F_0$ into 3 equal parts and remove the open middle third interval $\big(\tfrac{1}{3}, \tfrac{2}{3}\big)$ to get

$$F_1 = \Big[0, \frac{1}{3}\Big] \cup \Big[\frac{2}{3}, 1\Big].$$

See Fig. 3.4.

Figure 3.4. The first step in the construction of the Cantor set.

Step 2. Working on $F_1$ next, divide each of its two pieces into equal thirds and remove the middle open intervals from the divisions to get $F_2$:

$$F_2 = \Big[0, \frac{1}{9}\Big] \cup \Big[\frac{2}{9}, \frac{1}{3}\Big] \cup \Big[\frac{2}{3}, \frac{7}{9}\Big] \cup \Big[\frac{8}{9}, 1\Big].$$

This has $2^2$ closed intervals of length $3^{-2}$, see Fig. 3.5.

Figure 3.5. The second step in the construction of the Cantor set.

Step i. Divide each of the $2^{i-1}$ pieces remaining after step $i-1$ into equal thirds and remove the middle piece from each to get $F_i$. $F_i$ has $2^i$ closed intervals of length $3^{-i}$.

End result. This procedure yields a sequence of closed sets $\{F_i\}$, where each $F_i$ is a finite union of $2^i$ closed intervals of length $3^{-i}$. The Cantor (Middle Third) Set $C$ is defined by

$$C = \bigcap_{i=1}^{\infty} F_i.$$

Theorem. Let $C$ be the Cantor set in $\mathbb{R}$. Then,

1. $C$ is closed.

2. Every point in $C$ is a limit of a sequence of points in $C$.

3. $C$ has measure zero.

4. $C$ is uncountable.

Proof.

Result 1. Exercise.

Result 2. Exercise.

Result 3. $C$ is contained in $F_i$ for any $i$. $F_i$ is a union of disjoint closed intervals whose lengths sum to $(2/3)^i$, and for any $\varepsilon > 0$, $(2/3)^i < \varepsilon$ for all sufficiently large $i$. Enlarging each closed interval slightly to an open interval increases the total length by an arbitrarily small amount, so $C$ has measure zero.
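The iterative construction and the length computation used in Result 3 can be sketched with exact rational endpoints. The function names `cantor_step`, `cantor_level`, and `total_length` are hypothetical; the sketch builds the finite stages $F_i$ only, not the limiting set $C$.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result


def cantor_level(i):
    """Closed intervals making up F_i, starting from F_0 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(i):
        intervals = cantor_step(intervals)
    return intervals


def total_length(intervals):
    """Sum of the lengths of a list of disjoint intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))


if __name__ == "__main__":
    for i in range(6):
        level = cantor_level(i)
        print(i, len(level), total_length(level))  # 2**i intervals, (2/3)**i
```

The printed totals decrease geometrically, which is exactly what makes the covering lengths smaller than any prescribed $\varepsilon$.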

Result 4. We show that every point $x \in C$ can be represented uniquely by a series of the form

$$x = \sum_{i=1}^{\infty} \frac{a_i}{3^i},$$

where $a_i = 0$ or $2$. This can be recognized as a base 3 expansion. To show uniqueness, if

$$\sum_{i=1}^{\infty} \frac{a_i}{3^i} = \sum_{i=1}^{\infty} \frac{b_i}{3^i}$$

for $a_i, b_i = 0$ or $2$, we show that $a_i = b_i$ for all $i$. Suppose $a_i \ne b_i$ for some $i$. Let $n$ be the smallest number with $a_n \ne b_n$, so $|a_n - b_n| = 2$. Since $|a_i - b_i| \le 2$ for all $i$,

$$0 = \Big|\sum_{i=1}^{\infty} \frac{a_i - b_i}{3^i}\Big| = \Big|\sum_{i=n}^{\infty} \frac{a_i - b_i}{3^i}\Big| \ge \frac{|a_n - b_n|}{3^n} - \sum_{i=n+1}^{\infty} \frac{|a_i - b_i|}{3^i} \ge \frac{2}{3^n} - 2\sum_{i=n+1}^{\infty} \frac{1}{3^i} = \frac{2}{3^n} - \frac{1}{3^n} = \frac{1}{3^n}.$$

This is a contradiction, and so every number in $C$ has a unique base 3 expansion with digits $0$ and $2$.

Now let $\{G_{i,j},\ j = 1, 2, \ldots, 2^{i-1}\}$ be the open middle third intervals removed to obtain $F_i$. Then a number given by the base 3 expansion $0.b_1b_2b_3\ldots$, $b_k = 0, 1, 2$, is in $G_{i,j}$ for some $j$ if and only if:

$b_k = 0$ or $2$ for each $k < i$, because it is in $F_{i-1}$;

$b_i = 1$, because it is in one of the discarded open intervals at this stage;

the $b_k$'s for $k > i$ are neither all $0$ nor all $2$, because the endpoints of the discarded interval are not removed.

It is a good exercise to use a variation of the Cantor diagonal argument to show that $C$ is uncountable. [Check notes on this proof.]

To give some idea of the importance of the concept of sets of measure zero, we quote a beautiful result of Lebesgue that states necessary and sufficient conditions for a function to be Riemann integrable. Recall that two aspects of Riemann integration provided significant impetus to the development of measure theory. First, there was a long search for minimal equivalent conditions on a function that would guarantee that the function is Riemann integrable. Second, the Riemann integral has some annoying flaws. We provide a theory for Riemann integration and discuss these issues in Appendix A. Here, we simply quote one of the most important results.

To explain the idea, we begin with a canonical example. First,

Definition. A property that holds except on a set of measure zero is said to hold almost everywhere (a.e.). We say that almost all points in a set have a property if all the points except those in a set of measure zero have the property.

Now, the example.

Definition. Dirichlet's function is defined by

$$D(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q}, \\ 0, & \text{if } x \notin \mathbb{Q}. \end{cases}$$

From the definition, $D$ is a bounded function and $D(x) = 0$ a.e. It is a simple exercise to show that $D$ is discontinuous at every point in $I$, and therefore $D$ is not continuous a.e.

We prove the following result in Appendix A.

Theorem (Lebesgue's Theorem on Riemann Integration). A bounded function is Riemann integrable on a closed interval if and only if it is continuous a.e. on the interval.

[Add Theorem 1.3 from Billingsley?]

3.4 The Strong Law of Large Numbers

We return to analyzing the set of normal numbers $N$.

Theorem (Strong Law of Large Numbers for Bernoulli Sequences). $N^c$ is an uncountable set with Lebesgue measure zero.

Unlike the Weak Law of Large Numbers (Theorem 3.2.1), this theorem is a statement that requires measure theory. This version of the Law of Large Numbers is called strong because it implies Theorem 3.2.1. This is a consequence of a general result on different kinds of convergence that we prove later on.

Proof. We first show that $N^c$ is uncountable and contains a Cantor-like set. Consider the map $f : I \to I$,

$$f(x) = 0.a_1 1 1 a_2 1 1 a_3 1 1 \ldots, \quad \text{for } x = 0.a_1a_2a_3\ldots$$

The map is 1-1, so its image is uncountable. Moreover, $f(I)$ is contained in $N^c$. In fact, if $y = f(x)$, then

$$S_{3n}(y) \ge 2n, \quad \text{and} \quad \frac{S_{3n}(y)}{3n} \ge \frac{2}{3}.$$

Such $y$'s clearly violate the Law of Large Numbers. The image set $f(I)$ is Cantor-like in that it is a countable nested intersection of sets, each consisting of a finite number of well-separated, disjoint intervals.

We cover the complicated set $N^c$ using a countable cover of much simpler sets. Recall the set

$$A_n = \{x \in I : |W_n(x)| > \varepsilon n\}$$

used in the proof of the Weak Law of Large Numbers. We use an equivalent definition,

$$A_n = \{x \in I : W_n^4(x) > \varepsilon^4 n^4\}.$$

By Chebyshev's Inequality (Theorem 3.2.2),

$$\mu_L(A_n) \le \frac{1}{\varepsilon^4 n^4} \int_0^1 W_n^4\,dx = \frac{1}{\varepsilon^4 n^4} \int_0^1 \Big(\sum_{i=1}^{n} R_i\Big)^4 dx.$$

The integrand yields 5 kinds of terms, where distinct letters denote pairwise distinct indices:

1. $R_i^4$ for $i = 1, \ldots, n$.

2. $R_i^2 R_j^2$ for $i \ne j$.

3. $R_i^2 R_j R_k$ for $i \ne j \ne k$.

4. $R_i^3 R_j$ for $i \ne j$.

5. $R_i R_j R_k R_l$ for $i \ne j \ne k \ne l$.

Since $R_i^4(x) = 1$ and $R_i^2(x)R_j^2(x) = 1$ for all $i, j$,

$$\int_0^1 R_i^4\,dx = \int_0^1 R_i^2 R_j^2\,dx = 1.$$

We show that the other terms integrate to zero because of cancellation. Two follow from the proof of the Weak Law of Large Numbers:

$$\int_0^1 R_i^2 R_j R_k\,dx = \int_0^1 R_j R_k\,dx = 0, \quad i \ne j \ne k,$$

$$\int_0^1 R_i^3 R_j\,dx = \int_0^1 R_i R_j\,dx = 0, \quad i \ne j.$$

Finally, assume $i < j < k < l$, and consider an interval of the form

$$J = \Big(\frac{m}{2^k}, \frac{m+1}{2^k}\Big].$$

$R_i R_j R_k$ is constant on $J$. However, $R_l$ alternates between $-1$ and $+1$ on the $2^{l-k}$ subintervals of $J$ of length $2^{-l}$, an even number, so

$$\int_J R_i R_j R_k R_l\,dx = 0.$$

There are $n$ terms of the first kind and $3n(n-1)$ terms of the second kind, so

$$\int_0^1 W_n^4(x)\,dx = n + 3n(n-1) = 3n^2 - 2n \le 3n^2,$$

and

$$\mu_L(A_n) \le \frac{3}{n^2\varepsilon^4}.$$
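The value of the fourth-moment integral can be cross-checked without expanding the fourth power, using the fact that $W_n = 2k - n$ on a set of measure $\binom{n}{k} 2^{-n}$ (the points whose first $n$ digits contain exactly $k$ ones). The sketch below is a numerical check under that observation; the function name `fourth_moment` is hypothetical.

```python
from fractions import Fraction
from math import comb


def fourth_moment(n):
    """Exact integral over (0, 1] of W_n**4.

    W_n equals 2k - n on the union of the C(n, k) dyadic intervals of
    length 2**-n whose first n digits contain exactly k ones.
    """
    return sum(
        Fraction(comb(n, k), 2 ** n) * (2 * k - n) ** 4
        for k in range(n + 1)
    )


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        # Compare with the closed form 3n^2 - 2n and the bound 3n^2.
        print(n, fourth_moment(n), 3 * n * n - 2 * n, 3 * n * n)
```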

We cover $N^c$ using a collection of sets of the form $A_n$ for increasing $n$ and decreasing $\varepsilon$, chosen in such a way that the cover has arbitrarily small measure. For a constant $C$, set $\varepsilon_n^4 = C n^{-1/2}$, so

$$\sum_{n=1}^{\infty} \frac{3}{\varepsilon_n^4 n^2} = \frac{3}{C} \sum_{n=1}^{\infty} \frac{1}{n^{3/2}}.$$

The last series converges, and the quantity can be made smaller than any $\delta > 0$ by choosing $C$ sufficiently large. Hence, given $\delta > 0$, there is a sequence $\{\varepsilon_n\}$ with $\varepsilon_n \to 0$ such that

$$\sum_{n=1}^{\infty} \frac{3}{\varepsilon_n^4 n^2} \le \delta.$$

For each $n$, set

$$\tilde{A}_n = \{x \in I : |W_n(x)| > \varepsilon_n n\}.$$

Note that $\tilde{A}_n$ is a finite union of intervals since $W_n$ is piecewise constant. We have

$$\mu_L(\tilde{A}_n) \le \frac{3}{\varepsilon_n^4 n^2},$$

and

$$\sum_{n=1}^{\infty} \mu_L(\tilde{A}_n) \le \delta.$$

If we show that

$$N^c \subset \bigcup_{n=1}^{\infty} \tilde{A}_n,$$

then we are done. This holds if

$$\bigcap_{n=1}^{\infty} \tilde{A}_n^c \subset N.$$

If $x \in \bigcap_{n=1}^{\infty} \tilde{A}_n^c$, then for each $n$, $|W_n(x)| \le \varepsilon_n n$, or

$$\Big|\frac{W_n(x)}{n}\Big| \le \varepsilon_n.$$

Since $\varepsilon_n \to 0$, $W_n(x)/n \to 0$. Because $W_n(x) = 2S_n(x) - n$, this means $S_n(x)/n \to 1/2$, or $x \in N$.

The proof of the Strong Law can be used to draw stronger conclusions. For example, a normal number has the property that no finite sequence of digits occurs more frequently than any other finite sequence of digits of the same length.

3.5 A second wish list for measure theory

With some informal experience with measure theory ideas, we make a second attempt at a wish list of desirable properties for a measure theory. We are considering the measure on $\mathbb{R}^n$ that extends the standard notions of length, area, and volume. If $E \subset \mathbb{R}^n$ for some $n$, let $\mu(E)$ denote its measure.

1. $\mu$ should be a non-negative set function from sets in $\mathbb{R}^n$ into the extended reals $\mathbb{R} \cup \{\infty\}$. $\mu(\{x\}) = 0$ for a single point. $\mu(A) = \infty$ should be possible for unbounded sets.

2. In $\mathbb{R}$, we should have $\mu([a,b]) = b - a$. In $\mathbb{R}^n$, we should have

$$\mu(Q) = (b_1 - a_1)(b_2 - a_2)\cdots(b_n - a_n)$$

for generalized rectangles (multi-intervals),

$$Q = \{x \in \mathbb{R}^n : a_i \le x_i \le b_i,\ 1 \le i \le n\}.$$

3. If $\{A_1, A_2, \ldots, A_n\}$ are disjoint sets, then

$$\mu(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} \mu(A_i).$$

What about infinite collections? Well, $\mu(\{x\}) = 0$. But in $\mathbb{R}$,

$$(0,1) = \bigcup_{x \in (0,1)} \{x\}.$$

This is a problem because we cannot have

$$1 = \mu((0,1)) = \mu\Big(\bigcup_{x \in (0,1)} \{x\}\Big) = \sum_{x \in (0,1)} \mu(\{x\}) = 0.$$

So, uncountable collections of sets are a problem and we avoid them.

What about countable collections? Countable disjoint collections of sets of measure zero should have measure zero. Also,

$$(0,1] = \Big(\frac{1}{2}, 1\Big] \cup \Big(\frac{1}{3}, \frac{1}{2}\Big] \cup \Big(\frac{1}{4}, \frac{1}{3}\Big] \cup \cdots,$$

and

$$1 = \mu((0,1]) = \mu\Big(\Big(\frac{1}{2}, 1\Big]\Big) + \mu\Big(\Big(\frac{1}{3}, \frac{1}{2}\Big]\Big) + \mu\Big(\Big(\frac{1}{4}, \frac{1}{3}\Big]\Big) + \cdots.$$

So we would like to say that if $\{A_i\}_{i=1}^{\infty}$ is a countable collection of disjoint sets, then

$$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i).$$

4. If $A \subset B$ are sets, then $\mu(A) \le \mu(B)$, i.e., $\mu$ should be monotone.

5. For the standard volume measure on $\mathbb{R}^n$, if a set $A$ is obtained from another set $B$ by rotation, translation, or reflection maps, then $\mu(A) = \mu(B)$.

It turns out that we cannot construct a desirable measure that satisfies all of these properties. We have to give up something, so we do not require that the measure be defined on all subsets of $\mathbb{R}^n$. We settle for a measure defined on a class of subsets.
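The countable decomposition of $(0,1]$ used to motivate countable additivity can be examined through its partial sums. The sketch below assumes the pieces are $(1/(k+1), 1/k]$ for $k = 1, 2, \ldots$, as in the display above; the function name `partial_measure` is hypothetical.

```python
from fractions import Fraction


def partial_measure(m):
    """Total length of the first m pieces (1/(k+1), 1/k], k = 1, ..., m."""
    return sum(
        (Fraction(1, k) - Fraction(1, k + 1) for k in range(1, m + 1)),
        Fraction(0),
    )


if __name__ == "__main__":
    for m in (1, 2, 9, 99, 999):
        # The sum telescopes to 1 - 1/(m + 1), which increases to 1.
        print(m, partial_measure(m))
```

The partial sums telescope, so finite additivity accounts for all of $(0,1]$ except the leftover piece $(0, 1/(m+1)]$, whose length tends to zero. Countable additivity is the requirement that the limit of these partial sums be the measure of the whole union.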


Chapter 4 Construction of a general measure structure

We turn to the development of general measure theory. The ingredients are a set describing the universe of points, a class of measurable subsets along