Measure Theoretic Probability
P.J.C. Spreij
this version: September 16, 2009

Contents

1 σ-algebras and measures
  1.1 σ-algebras
  1.2 Measures
  1.3 Null sets
  1.4 π- and d-systems
  1.5 Probability language
  1.6 Exercises
2 Existence of Lebesgue measure
  2.1 Outer measure and construction
  2.2 A general extension theorem
  2.3 Exercises
3 Measurable functions and random variables
  3.1 General setting
  3.2 Random variables
  3.3 Independence
  3.4 Exercises
4 Integration
  4.1 Integration of simple functions
  4.2 A general definition of integral
  4.3 Integrals over subsets
  4.4 Expectation and integral
  4.5 L^p-spaces of random variables
  4.6 L^p-spaces of functions
  4.7 Exercises
5 Product measures
  5.1 Product of two measure spaces
  5.2 Application in Probability theory
  5.3 Infinite products
  5.4 Exercises
6 Derivative of a measure
  6.1 Linear functionals on R^n
  6.2 Linear functionals on a Hilbert space
  6.3 Real and complex measures
  6.4 Absolute continuity and singularity
  6.5 The Radon-Nikodym theorem
  6.6 Additional results
  6.7 Exercises
7 Convergence and Uniform Integrability
  7.1 Modes of convergence
  7.2 Uniform integrability
  7.3 Exercises
8 Conditional expectation
  8.1 Conditional expectation for X ∈ L^1(Ω, F, P)
  8.2 Conditional probabilities
  8.3 Exercises
9 Martingales and alike
  9.1 Basic concepts and definition
  9.2 Stopping times and martingale transforms
  9.3 Doob's decomposition
  9.4 Optional sampling
  9.5 Exercises
10 Convergence theorems
  10.1 Doob's convergence theorem
  10.2 Uniformly integrable martingales and convergence
  10.3 L^p convergence results
  10.4 The strong law of large numbers
  10.5 Exercises
11 Weak convergence
  11.1 Generalities
  11.2 The Central Limit Theorem
  11.3 Exercises
12 Characteristic functions
  12.1 Definition and first properties
  12.2 Characteristic functions and weak convergence
  12.3 The Central Limit Theorem revisited
  12.4 Exercises
13 Brownian motion
  13.1 The space C[0, ∞)
  13.2 Weak convergence on metric spaces
  13.3 An invariance principle
  13.4 The proof of Theorem 13.8
  13.5 Exercises

Preface

In these notes we explain the measure theoretic foundations of modern probability. The notes are used during a course that has as one of its principal aims a swift introduction to measure theory as far as it is needed in modern probability, e.g. to define concepts such as conditional expectation and to prove limit theorems for martingales.

Everyone with a basic notion of mathematics and probability would understand what is meant by f(x) and P(A). In the former case we have the value of some function f evaluated at its argument. In the latter case, one recognizes the probability of an event A. The two notations are quite similar, and this suggests that P is also a function, defined on some domain to which A belongs. This is indeed the point of view that we follow. We will see that P is a function (a special case of a measure) on a collection of sets that satisfies certain properties, a σ-algebra. In general, a σ-algebra Σ will be defined as a suitable collection of subsets of a given set S. A measure µ will then be a map on Σ, satisfying some defining properties. This gives rise to considering a triple, to be called a measure space, (S, Σ, µ). We will develop probability theory in the context of measure spaces and, because of tradition and some distinguished features, we will write (Ω, F, P) for a probability space instead of (S, Σ, µ). Given a measure space we will develop, in a rather abstract sense, integrals of functions defined on S. In a probabilistic context, these integrals have the meaning of expectations.

The general setup provides us with two big advantages. First, in computing expectations we no longer have to distinguish between random variables having a discrete distribution and those having what is called a density. In the first case, expectations are usually computed as sums, whereas in the latter case Riemann integrals are the tools. We will see that these are special cases of the more general notion of Lebesgue integral.
Another advantage is the availability of convergence theorems. In analytic terms, we will see that integrals of functions converge to the integral of a limiting function, given appropriate conditions and an appropriate concept of convergence. In a probabilistic context, this translates to convergence of expectations of random variables. We will see many instances where the foundations of the theory can be fruitfully applied to fundamental issues in probability theory.

These lecture notes are the result of teaching the course Measure Theoretic Probability for a number of years. To a large extent this course was initially based on the book Probability with Martingales by David Williams, but other texts have been used as well. In particular we consulted Convergence of Stochastic Processes by David Pollard, Real and Complex Analysis by Walter Rudin and Foundations of Modern Probability by Olav Kallenberg. These lecture notes were first used in Fall 2008. Among the students who then took the course was Ferdinand Rolwes, who corrected (too) many typos and other annoying errors.

1 σ-algebras and measures

In this chapter we lay down the measure theoretic foundations for probability theory. We start with some general notions and show how these are instrumental in a probabilistic environment.

1.1 σ-algebras

Definition 1.1 Let S be a non-empty set. A collection Σ_0 ⊂ 2^S is called an algebra (on S) if
(i) S ∈ Σ_0,
(ii) E ∈ Σ_0 ⇒ E^c ∈ Σ_0,
(iii) E, F ∈ Σ_0 ⇒ E ∪ F ∈ Σ_0.

Notice that ∅ always belongs to an algebra, since ∅ = S^c. Of course property (iii) extends to finite unions by induction. Moreover, in an algebra we also have E, F ∈ Σ_0 ⇒ E ∩ F ∈ Σ_0, since E ∩ F = (E^c ∪ F^c)^c. Furthermore E \ F = E ∩ F^c ∈ Σ_0.

Definition 1.2 Let S be a non-empty set. A collection Σ ⊂ 2^S is called a σ-algebra (on S) if it is an algebra and ⋃_{n=1}^∞ E_n ∈ Σ as soon as E_n ∈ Σ (n = 1, 2, ...).

If Σ is a σ-algebra on S, then (S, Σ) is called a measurable space and the elements of Σ are called measurable sets. We shall measure them in the next section.

If C is any collection of subsets of S, then by σ(C) we denote the smallest σ-algebra containing C. This means that σ(C) is the intersection of all σ-algebras that contain C (see Exercise 1.1). If Σ = σ(C), we say that C generates Σ. The union of two σ-algebras Σ_1 and Σ_2 on a set S is usually not a σ-algebra. We write Σ_1 ∨ Σ_2 for σ(Σ_1 ∪ Σ_2).

One of the most relevant σ-algebras of this course is B = B(R), the Borel sets of R. Let O be the collection of all open subsets of R with respect to the usual topology (in which all intervals (a, b) are open). Then B := σ(O). Of course, one similarly defines the Borel sets of R^d, and in general, for a topological space (S, O), one defines the Borel sets as σ(O). Borel sets can in principle be rather wild, but it helps to understand them a little better once we know that they are generated by simple sets.

Proposition 1.3 Let I = {(−∞, x] : x ∈ R}. Then σ(I) = B.

Proof We prove the two obvious inclusions, starting with σ(I) ⊂ B.
Since (−∞, x] = ⋂_n (−∞, x + 1/n) ∈ B, we have I ⊂ B and then also σ(I) ⊂ B, since σ(I) is the smallest σ-algebra that contains I. (Below we will use this kind of argument repeatedly.) For the proof of the reverse inclusion we proceed in three steps. First we observe that (−∞, x) = ⋃_n (−∞, x − 1/n] ∈ σ(I). Knowing this, we conclude that (a, b) = (−∞, b) \ (−∞, a] ∈ σ(I). Let then G be an arbitrary open set. Since G is open, for every x ∈ G there is ε(x) > 0 such that (x − ε(x), x + ε(x)) ⊂ G. Hence G = ⋃_{x∈G} (x − ε(x), x + ε(x)). Since Q is dense in R, we can replace this union by restricting the x to G ∩ Q (with the ε(x) chosen suitably, e.g. as large as possible), which gives a countable union of open intervals, of which we just proved that they are elements of σ(I). Thus O ⊂ σ(I), but then also B ⊂ σ(I).

An obvious question to ask is whether every subset of R belongs to B = B(R). The answer is no, as we will show that the cardinality of B(R) is the same as the cardinality of R, from which the negative answer follows.

Let ℰ be a countable collection of subsets of some set S that contains ∅. We show that necessarily the cardinality of σ(ℰ) is at most 2^{ℵ_0}. To that end we define collections ℰ_α, for any ordinal number α less than ω, the first uncountable ordinal number. To start, we put ℰ_0 = ℰ. Let 0 < α < ω (< denotes the usual ordering of the ordinal numbers) and assume that the collections ℰ_β are defined for all β < α. Put ℰ_α^0 = ⋃_{β<α} ℰ_β. We define ℰ_α as the collection of sets that can be written as a countable union ⋃_{n=1}^∞ E_n, with E_n ∈ ℰ_α^0 or E_n^c ∈ ℰ_α^0. Finally, we define ℰ_ω := ⋃_{α<ω} ℰ_α.

The first thing we will prove is that ℰ_ω ⊂ σ(ℰ). Trivially, ℰ_0 ⊂ σ(ℰ). Suppose now that ℰ_β ⊂ σ(ℰ) for all β < α. Then also ℰ_α^0 ⊂ σ(ℰ), and if E = ⋃_{n=1}^∞ E_n ∈ ℰ_α, it follows that E ∈ σ(ℰ). Hence ℰ_α ⊂ σ(ℰ) and we conclude that ℰ_ω ⊂ σ(ℰ). Note that this also yields σ(ℰ_ω) = σ(ℰ).

We will now show that ℰ_ω is a σ-algebra, from which it then follows that ℰ_ω = σ(ℰ). It is obvious that ∅ ∈ ℰ_ω. Let E ∈ ℰ_ω, then there is some α < ω for which E ∈ ℰ_α. But E^c = ⋃_{n=1}^∞ E_n with E_n = E^c, so that E^c ∈ ℰ_β for all β > α, and thus E^c ∈ ℰ_ω. Similarly, we look at unions. Let E_n ∈ ℰ_ω (n ∈ N), so there are α_n < ω such that E_n ∈ ℰ_{α_n}. Properties of ordinal numbers yield the existence of β < ω such that α_n < β for all n. It follows that ⋃_{n=1}^∞ E_n ∈ ℰ_β ⊂ ℰ_ω. We conclude that ℰ_ω is a σ-algebra.

The next thing to show is that the cardinality of ℰ_ω is at most 2^{ℵ_0}.
The construction of ℰ_1 from ℰ_0 = ℰ implies that the cardinality of ℰ_1 is at most ℵ_0^{ℵ_0}, which is equal to 2^{ℵ_0}. Let α < ω and assume that the cardinality of ℰ_β is less than or equal to 2^{ℵ_0} for all 1 ≤ β < α, a property which then also holds for ℰ_α^0. The construction of ℰ_α from ℰ_α^0 yields, by the same argument as used above for ℰ_1, that also ℰ_α has cardinality less than or equal to 2^{ℵ_0}. This shows that the set I(ω) := {α < ω : ℰ_α has cardinality less than or equal to 2^{ℵ_0}} is what is called an inductive set. Since the ordinal numbers with the ordering < are well-ordered, I(ω) = {α : α < ω}. It follows that also the cardinality of ℰ_ω is at most equal to 2^{ℵ_0}.

Turning back to the initial question on the cardinality of B(R), we apply the above result. We can take ℰ as the set of intervals (a, b) with a, b ∈ Q, augmented with the empty set. We conclude that B(R) has cardinality at most equal to 2^{ℵ_0}. Since the power set of R has strictly larger cardinality, not every subset of R is a Borel set.
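On a finite set the generated σ-algebra σ(C) can be computed by brute force, which may help to make the definition concrete. The following is only a sketch under that finiteness assumption: closure under complements and pairwise unions is iterated until nothing new appears. The function name generate_sigma_algebra and the example collection are illustrative choices, not notation from these notes.

```python
from itertools import combinations


def generate_sigma_algebra(S, C):
    """Return sigma(C) on a finite set S as a set of frozensets.

    On a finite set, closure under complements and pairwise unions
    already gives closure under countable unions.
    """
    S = frozenset(S)
    sigma = {S, frozenset()} | {frozenset(c) for c in C}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        # Close under complements.
        for E in current:
            if S - E not in sigma:
                sigma.add(S - E)
                changed = True
        # Close under pairwise unions.
        for E, F in combinations(current, 2):
            if E | F not in sigma:
                sigma.add(E | F)
                changed = True
    return sigma


# Illustrative example: S = {1, 2, 3, 4} and C = {{1}, {1, 2}}.
S = {1, 2, 3, 4}
sigma = generate_sigma_algebra(S, [{1}, {1, 2}])
for E in sorted(sigma, key=lambda E: (len(E), sorted(E))):
    print(sorted(E))
```

In this example the generated σ-algebra consists of all unions of the three sets {1}, {2} and {3, 4}; the points 3 and 4 cannot be separated by C, so {3} is not measurable.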

1.2 Measures

Let Σ_0 be an algebra on a set S, and Σ be a σ-algebra on S. We consider mappings µ_0 : Σ_0 → [0, ∞] and µ : Σ → [0, ∞]. Note that ∞ is allowed as a possible value.

We call µ_0 additive if µ_0(∅) = 0 and if µ_0(E ∪ F) = µ_0(E) + µ_0(F) for every pair of disjoint sets E and F in Σ_0. Of course this addition rule then extends to arbitrary finite unions of disjoint sets. The mapping µ_0 is called σ-additive or countably additive, if µ_0(∅) = 0 and if µ_0(⋃_n E_n) = ∑_n µ_0(E_n) for every sequence (E_n) of disjoint sets of Σ_0 whose union is also in Σ_0. σ-additivity is defined similarly for µ, but then we don't have to require that ⋃_n E_n ∈ Σ. This is true by definition.

Definition 1.4 Let (S, Σ) be a measurable space. A countably additive mapping µ : Σ → [0, ∞] is called a measure. The triple (S, Σ, µ) is called a measure space.

Some extra terminology follows. A measure is called finite if µ(S) < ∞. It is called σ-finite if we can write S = ⋃_n S_n, where the S_n are measurable sets and µ(S_n) < ∞. If µ(S) = 1, then µ is called a probability measure.

Measures are used to measure (measurable) sets in one way or another. Here is a simple example. Let S = N and Σ = 2^N (we often take the power set as the σ-algebra on a countable set). Let τ (we write τ instead of µ for this special case) be the counting measure: τ(E) = |E|, the cardinality of E. One easily verifies that τ is a measure, and it is σ-finite, because N = ⋃_n {1, ..., n}.

A very simple measure is the Dirac measure. Consider a measurable space (S, Σ) and single out a specific x_0 ∈ S. Define δ(E) = 1_E(x_0), for E ∈ Σ (1_E is the indicator function of the set E, 1_E(x) = 1 if x ∈ E and 1_E(x) = 0 if x ∉ E). Check that δ is a measure on Σ.

Another example is Lebesgue measure, whose existence is formulated below. It is the most natural candidate for a measure on the Borel sets on the real line.
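The counting measure and the Dirac measure are easy to experiment with on finite sets. The sketch below is only an illustration: the function names counting_measure and dirac_measure and the particular sets are assumptions made for the example. It checks additivity on one pair of disjoint sets, which is of course no substitute for the proof that τ and δ are measures.

```python
def counting_measure(E):
    """tau(E) = |E|, the cardinality of the finite set E."""
    return len(E)


def dirac_measure(x0):
    """Return delta with delta(E) = 1 if x0 is in E and 0 otherwise."""
    def delta(E):
        return 1 if x0 in E else 0
    return delta


# Two disjoint subsets of N, chosen for illustration only.
E, F = {1, 2}, {5, 7, 9}
delta = dirac_measure(5)

# Additivity on this disjoint pair.
additive_tau = counting_measure(E | F) == counting_measure(E) + counting_measure(F)
additive_delta = delta(E | F) == delta(E) + delta(F)
print(additive_tau, additive_delta)
```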
Theorem 1.5 There exists a unique measure λ on (R, B) with the property that for every interval I = (a, b] with a < b it holds that λ(I) = b − a.

The proof of this theorem is deferred to later, see Theorem 2.6. For the time being, we take this existence result for granted. One remark is in order. One can show that B is not the largest σ-algebra for which the measure λ can coherently be defined. On the other hand, on the power set of R it is impossible to define a measure that coincides with λ on the intervals. We'll come back to this later.

Here are the first elementary properties of a measure.

Proposition 1.6 Let (S, Σ, µ) be a measure space. Then the following hold true (all the sets below belong to Σ).
(i) If E ⊂ F, then µ(E) ≤ µ(F).
(ii) µ(E ∪ F) ≤ µ(E) + µ(F).
(iii) µ(⋃_{k=1}^n E_k) ≤ ∑_{k=1}^n µ(E_k).
If µ is finite, we also have
(iv) If E ⊂ F, then µ(F \ E) = µ(F) − µ(E).
(v) µ(E ∪ F) = µ(E) + µ(F) − µ(E ∩ F).

Proof The set F can be written as the disjoint union F = E ∪ (F \ E). Hence µ(F) = µ(E) + µ(F \ E). Property (i) now follows and (iv) as well, provided µ is finite. To prove (ii), we note that E ∪ F = E ∪ (F \ (E ∩ F)), a disjoint union, and that E ∩ F ⊂ F. The result follows from (i), since F \ (E ∩ F) ⊂ F. Moreover, (v) also follows, if we apply (iv). Finally, (iii) follows from (ii) by induction.

Measures have certain continuity properties.

Proposition 1.7 Let (E_n) be a sequence in Σ.
(i) If the sequence is increasing, with limit E = ⋃_n E_n, then µ(E_n) ↑ µ(E) as n → ∞.
(ii) If the sequence is decreasing, with limit E = ⋂_n E_n, and if µ(E_n) < ∞ from a certain index on, then µ(E_n) ↓ µ(E) as n → ∞.

Proof (i) Define D_1 = E_1 and D_n = E_n \ ⋃_{k=1}^{n−1} E_k for n ≥ 2. Then the D_n are disjoint, E_n = ⋃_{k=1}^n D_k for n ≥ 1 and E = ⋃_{k=1}^∞ D_k. It follows that µ(E_n) = ∑_{k=1}^n µ(D_k) ↑ ∑_{k=1}^∞ µ(D_k) = µ(E). To prove (ii) we assume without loss of generality that µ(E_1) < ∞. Define F_n = E_1 \ E_n. Then (F_n) is an increasing sequence with limit F = E_1 \ E. So (i) applies, yielding µ(E_1) − µ(E_n) ↑ µ(E_1) − µ(E). The result follows.

Corollary 1.8 Let (S, Σ, µ) be a measure space. For an arbitrary sequence (E_n) of sets in Σ, we have µ(⋃_{n=1}^∞ E_n) ≤ ∑_{n=1}^∞ µ(E_n).

Proof Exercise 1.2.

Remark 1.9 The finiteness condition in the second assertion of Proposition 1.7 is essential. Consider N with the counting measure τ. Let F_n = {n, n + 1, ...}, then ⋂_n F_n = ∅ and so it has measure zero. But τ(F_n) = ∞ for all n.

1.3 Null sets

Consider a measure space (S, Σ, µ) and let E ∈ Σ be such that µ(E) = 0. If N is a subset of E, then it is fair to suppose that also µ(N) = 0. But this can only be guaranteed if N ∈ Σ. Therefore we introduce some new terminology. A set N ⊂ S is called a null set or µ-null set, if there exists E ∈ Σ with E ⊃ N and µ(E) = 0. The collection of null sets is denoted by 𝒩, or 𝒩_µ since it depends on µ.
In Exercise 1.5 you will be asked to show that Σ̄ = Σ ∨ 𝒩 is a σ-algebra and to extend µ to Σ̄. If the extension is called µ̄, then we have a new measure space (S, Σ̄, µ̄), which is complete: all µ̄-null sets belong to the σ-algebra Σ̄.
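For a finite measure space the completion can be computed directly from the description used in Exercise 1.5: a set E belongs to the completed σ-algebra if F ⊂ E ⊂ G for some F, G ∈ Σ with µ(G \ F) = 0. The sketch below is a toy illustration under assumptions made only for this example: S = {1, 2, 3, 4}, Σ generated by the partition {1}, {2}, {3, 4}, and the atom {3, 4} carrying mass zero. All names in the code are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(S):
    """All subsets of the finite collection S, as frozensets."""
    items = list(S)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]


S = frozenset({1, 2, 3, 4})
# Assumed example: Sigma is generated by three atoms; {3, 4} has no mass.
weights = {
    frozenset({1}): Fraction(1, 2),
    frozenset({2}): Fraction(1, 2),
    frozenset({3, 4}): Fraction(0),
}
# Sigma consists of all unions of atoms.
Sigma = {frozenset().union(*atoms) for atoms in powerset(weights)}


def mu(E):
    """Measure of a set E in Sigma: total weight of the atoms inside E."""
    return sum(w for atom, w in weights.items() if atom <= E)


# Null sets: subsets of some measurable set of measure zero.
null_sets = {N for N in powerset(S)
             if any(N <= E and mu(E) == 0 for E in Sigma)}

# Completion: sets squeezed between F and G in Sigma with mu(G \ F) = 0.
completion = {E for E in powerset(S)
              if any(F <= E <= G and mu(G - F) == 0
                     for F in Sigma for G in Sigma)}


def mu_bar(E):
    """Extended measure: mu(F) for any admissible pair F, G around E."""
    return next(mu(F) for F in Sigma for G in Sigma
                if F <= E <= G and mu(G - F) == 0)


print(len(Sigma), len(null_sets), len(completion))
print(mu_bar(frozenset({1, 3})))
```

In this toy example the completion is the whole power set of S, because every subset of {3, 4} is a null set.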

1.4 π- and d-systems

In general it is hard to grasp what the elements of a σ-algebra Σ are, but often collections C such that σ(C) = Σ are easier to understand. In good situations properties of Σ can easily be deduced from properties of C. This is often the case when C is a π-system, to be defined next.

Definition 1.10 A collection I of subsets of S is called a π-system, if I_1, I_2 ∈ I implies I_1 ∩ I_2 ∈ I.

It follows that a π-system is closed under finite intersections. In a σ-algebra, all familiar set operations are allowed, as long as at most countably many sets are involved. We will see that it is possible to disentangle the defining properties of a σ-algebra into taking finite intersections and the defining properties of a d-system. This is the content of Proposition 1.12 below.

Definition 1.11 A collection D of subsets of S is called a d-system, if the following hold.
(i) S ∈ D.
(ii) If E, F ∈ D such that E ⊂ F, then F \ E ∈ D.
(iii) If E_n ∈ D for n ∈ N, and E_n ⊂ E_{n+1} for all n, then ⋃_n E_n ∈ D.

Proposition 1.12 Σ is a σ-algebra iff it is a π-system and a d-system.

Proof Let Σ be a π-system and a d-system. We check the defining conditions of a σ-algebra. (i) Since Σ is a d-system, S ∈ Σ. (ii) Complements of sets in Σ are in Σ as well, again because Σ is a d-system. (iii) If E, F ∈ Σ, then E ∪ F = (E^c ∩ F^c)^c ∈ Σ, because we have just shown that complements remain in Σ and because Σ is a π-system. Then Σ is also closed under finite unions. Let E_1, E_2, ... be a sequence in Σ. We have just shown that the sets F_n = ⋃_{i=1}^n E_i belong to Σ. But since the F_n form an increasing sequence, also their union is in Σ, because Σ is a d-system. But ⋃_n F_n = ⋃_n E_n. This proves that Σ is a σ-algebra. Of course the other implication is trivial.

If C is a collection of subsets of S, then by d(C) we denote the smallest d-system that contains C. Note that it always holds that d(C) ⊂ σ(C). In one important case we have equality. This is known as Dynkin's lemma.

Lemma 1.13 Let I be a π-system. Then d(I) = σ(I).
Proof Suppose that we would know that d(I) is a π-system as well. Then Proposition 1.12 yields that d(I) is a σ-algebra, and so it contains σ(I). Since the reverse inclusion is always true, we have equality. Therefore we will prove that indeed d(I) is a π-system.

Step 1. Put D_1 = {B ∈ d(I) : B ∩ C ∈ d(I), ∀C ∈ I}. We claim that D_1 is a d-system. Given that this holds, it is obvious that I ⊂ D_1, and then also d(I) ⊂ D_1. But since D_1 is defined as a subset of d(I), we conclude that these two collections are the same. We now show that the claim holds. Evidently S ∈ D_1. Let B_1, B_2 ∈ D_1 with B_1 ⊂ B_2 and C ∈ I. Write (B_2 \ B_1) ∩ C as (B_2 ∩ C) \ (B_1 ∩ C). The last two intersections belong to d(I) by definition of D_1 and so does their difference, since d(I) is a d-system. For B_n ↑ B, B_n ∈ D_1 and C ∈ I we have (B_n ∩ C) ∈ d(I), which then increases to B ∩ C ∈ d(I). So B ∈ D_1.

Step 2. Put D_2 = {C ∈ d(I) : B ∩ C ∈ d(I), ∀B ∈ d(I)}. We claim, again, (and you check) that D_2 is a d-system. The key observation is that I ⊂ D_2. Indeed, take C ∈ I and B ∈ d(I). The latter collection is nothing else but D_1, according to step 1. But then B ∩ C ∈ d(I), which means that C ∈ D_2. It now follows that d(I) ⊂ D_2, but then we must have equality, because D_2 is defined as a subset of d(I). But, by construction, D_2 is a π-system. So we conclude that d(I) is a π-system, as desired.

Sometimes another version of Lemma 1.13 is useful.

Corollary 1.14 The assertion of Lemma 1.13 is equivalent to the following statement. Let I be a π-system and D be a d-system. If I ⊂ D, then σ(I) ⊂ D.

Proof Suppose that I ⊂ D. Then d(I) ⊂ D. But d(I) = σ(I), according to Lemma 1.13. Conversely, let I be a π-system. Then I ⊂ d(I). By hypothesis, one also has σ(I) ⊂ d(I), and the latter is always a subset of σ(I).

All these efforts lead to the following very useful theorem. It states that any finite measure on Σ is characterized by its action on a rich enough π-system. We will meet many occasions where this theorem is used.

Theorem 1.15 Let I be a π-system and Σ = σ(I). Let µ_1 and µ_2 be finite measures on Σ with the properties that µ_1(S) = µ_2(S) and that µ_1 and µ_2 coincide on I. Then µ_1 = µ_2 (on Σ).

Proof The whole idea behind the proof is to find a good d-system that contains I. The following set is a reasonable candidate. Put D = {E ∈ Σ : µ_1(E) = µ_2(E)}. The inclusions I ⊂ D ⊂ Σ are obvious. If we can show that D is a d-system, then Corollary 1.14 gives the result. The fact that D is a d-system is straightforward to check; we present only one verification. Let E, F ∈ D such that E ⊂ F.
Then (use Proposition 1.6 (iv)) µ_1(F \ E) = µ_1(F) − µ_1(E) = µ_2(F) − µ_2(E) = µ_2(F \ E) and so F \ E ∈ D.

Remark 1.16 In the above proof we have used the fact that µ_1 and µ_2 are finite. If this condition is violated, then the assertion of the theorem is not valid in general. Here is a counterexample. Take N with the counting measure µ_1 = τ and let µ_2 = 2τ. A π-system that generates 2^N is given by the sets G_n = {n, n + 1, ...} (n ∈ N). Both measures assign the value ∞ to every G_n and to N itself, yet µ_1 ≠ µ_2.
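Dynkin's lemma can be observed numerically on a small finite set. The sketch below, under the assumption that S is finite (so that closure under increasing unions is automatic), computes d(C) by iterating the operation F \ E for E ⊂ F, computes σ(C) by closing under complements and unions, and compares the two for a π-system and for a collection that is not a π-system. The function names and the two example collections are illustrative assumptions.

```python
from itertools import combinations


def sigma_algebra(S, C):
    """sigma(C) on a finite set S: close under complements and unions."""
    S = frozenset(S)
    sigma = {S, frozenset()} | {frozenset(c) for c in C}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for E in current:
            if S - E not in sigma:
                sigma.add(S - E)
                changed = True
        for E, F in combinations(current, 2):
            if E | F not in sigma:
                sigma.add(E | F)
                changed = True
    return sigma


def d_system(S, C):
    """d(C) on a finite set S: contains S, closed under F \\ E for E in F.

    For finite S an increasing sequence is eventually constant, so
    closure under increasing unions needs no separate step.
    """
    S = frozenset(S)
    D = {S} | {frozenset(c) for c in C}
    changed = True
    while changed:
        changed = False
        for E in list(D):
            for F in list(D):
                if E <= F and (F - E) not in D:
                    D.add(F - E)
                    changed = True
    return D


def is_pi_system(C):
    """Check closure under pairwise intersections."""
    C = {frozenset(c) for c in C}
    return all(E & F in C for E in C for F in C)


S = {1, 2, 3, 4}
pi = [{1, 2}, {2, 3}, {2}]      # closed under intersections
not_pi = [{1, 2}, {2, 3}]       # {1, 2} & {2, 3} = {2} is missing

print(is_pi_system(pi), len(d_system(S, pi)), len(sigma_algebra(S, pi)))
print(is_pi_system(not_pi), len(d_system(S, not_pi)), len(sigma_algebra(S, not_pi)))
```

For the π-system the two closures coincide, as Lemma 1.13 asserts. For the second collection d(C) is strictly smaller than σ(C), which shows that the π-system assumption cannot simply be dropped.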

1.5 Probability language

In Probability Theory, one usually writes (Ω, F, P) instead of (S, Σ, µ). On one hand this is merely a change of notation. We still have that Ω is a set, F a σ-algebra on it, and P a measure, but in this case P is a probability measure, P(Ω) = 1. In probabilistic language, Ω is often called the set of outcomes and elements of F are called events. So by definition, an event is a measurable subset of the set of all outcomes.

A probability space (Ω, F, P) can be seen as a mathematical model of a random experiment. Consider for example the experiment consisting of tossing two coins. Each coin has individual outcomes 0 and 1. The set Ω can then be written as {00, 01, 10, 11}, where the notation should be obvious. In this case, we take F = 2^Ω and a choice of P could be such that P assigns probability 1/4 to all singletons. Of course, from a purely mathematical point of view, other possibilities for P are conceivable as well.

A more interesting example is obtained by considering an infinite sequence of coin tosses. In this case one should take Ω = {0, 1}^N and an element ω ∈ Ω is then an infinite sequence (ω_1, ω_2, ...) with ω_n ∈ {0, 1}. It turns out that one cannot take the power set of Ω as a σ-algebra, if one wants to have a nontrivial probability measure defined on it. As a matter of fact, this holds for the same reason that one cannot take the power set on (0, 1] to have a consistent notion of Lebesgue measure. This has everything to do with the fact that one can set up a bijective correspondence between (0, 1) and {0, 1}^N. Nevertheless, there is a good candidate for a σ-algebra F on Ω. One would like to have that sets like "the 12-th outcome is 1" are events. Let C be the collection of all such sets, C = {{ω ∈ Ω : ω_n = s}, n ∈ N, s ∈ {0, 1}}. We take F = σ(C) and all sets {ω ∈ Ω : ω_n = s} are then events.
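The finite two-coin model described above can be written out explicitly. The following sketch assumes the uniform choice of P (probability 1/4 for each singleton) and uses exact fractions; the names Omega, p and P mirror the text, while the particular events are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product

# Outcomes of tossing two coins: '00', '01', '10', '11'.
Omega = ["".join(w) for w in product("01", repeat=2)]

# Uniform choice of P on singletons (one possibility among many).
p = {w: Fraction(1, 4) for w in Omega}


def P(A):
    """Probability of an event A, i.e. of a subset of Omega."""
    return sum((p[w] for w in A), Fraction(0))


# Some events, chosen for illustration.
first_is_one = {w for w in Omega if w[0] == "1"}
both_one = {"11"}
at_least_one_zero = set(Omega) - both_one

print(P(set(Omega)), P(first_is_one), P(both_one), P(at_least_one_zero))
```

Since F is the full power set here, every subset of Ω is an event, and additivity of P reduces to adding the weights of the outcomes in the event.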
One can show that there indeed exists a probability measure P on this F with the nice property that for instance the set {ω ∈ Ω : ω_1 = ω_2 = 1} (in the previous example it would have been denoted by {11}) has probability 1/4.

Having the interpretation of F as a collection of events, we now introduce two special events. Consider a sequence of events E_1, E_2, ... and define

  lim sup E_n := ⋂_{m=1}^∞ ⋃_{n=m}^∞ E_n,
  lim inf E_n := ⋃_{m=1}^∞ ⋂_{n=m}^∞ E_n.

Note that the sets F_m = ⋂_{n≥m} E_n form an increasing sequence and the sets D_m = ⋃_{n≥m} E_n form a decreasing sequence. Clearly, F is closed under taking limsup and liminf. The terminology is explained by (a) of Exercise 1.4. In probabilistic terms, lim sup E_n is described as the event that the E_n occur infinitely often, abbreviated by E_n i.o. Likewise, lim inf E_n is the event that the E_n occur eventually. The former interpretation follows by observing that ω ∈ lim sup E_n iff for all m, there exists n ≥ m such that ω ∈ E_n. In other words, a particular outcome ω belongs to lim sup E_n iff it belongs to every set of some (infinite) subsequence of (E_n).

The terminology to call ⋂_{m=1}^∞ ⋃_{n=m}^∞ E_n the lim sup of the sequence is justified in Exercise 1.4. In this exercise, indicator functions of events are used, which are defined as follows. If E is an event, then the function 1_E is defined by 1_E(ω) = 1 if ω ∈ E and 1_E(ω) = 0 if ω ∉ E.

1.6 Exercises

1.1 Prove the following statements.
(a) The intersection of an arbitrary family of d-systems is again a d-system.
(b) The intersection of an arbitrary family of σ-algebras is again a σ-algebra.
(c) If C_1 and C_2 are collections of subsets of Ω with C_1 ⊂ C_2, then d(C_1) ⊂ d(C_2).

1.2 Prove Corollary 1.8.

1.3 Prove the claim that D_2 in the proof of Lemma 1.13 forms a d-system.

1.4 Consider a measure space (S, Σ, µ). Let (E_n) be a sequence in Σ.
(a) Show that 1_{lim inf E_n} = lim inf 1_{E_n}.
(b) Show that µ(lim inf E_n) ≤ lim inf µ(E_n). (Use Proposition 1.7.)
(c) Show also that µ(lim sup E_n) ≥ lim sup µ(E_n), provided that µ is finite.

1.5 Let (S, Σ, µ) be a measure space. Call a subset N of S a (µ, Σ)-null set if there exists a set N' ∈ Σ with N ⊂ N' and µ(N') = 0. Denote by 𝒩 the collection of all (µ, Σ)-null sets. Let Σ̄ be the collection of subsets E of S for which there exist F, G ∈ Σ such that F ⊂ E ⊂ G and µ(G \ F) = 0. For E ∈ Σ̄ and F, G as above we define µ̄(E) = µ(F).
(a) Show that Σ̄ is a σ-algebra and that Σ̄ = Σ ∨ 𝒩 (= σ(𝒩 ∪ Σ)).
(b) Show that µ̄ restricted to Σ coincides with µ and that µ̄(E) doesn't depend on the specific choice of F in its definition.
(c) Show that the collection of (µ̄, Σ̄)-null sets is 𝒩.

1.6 Let 𝒢 and ℋ be two σ-algebras on Ω. Let C = {G ∩ H : G ∈ 𝒢, H ∈ ℋ}. Show that C is a π-system and that σ(C) = σ(𝒢 ∪ ℋ).

1.7 Let Ω be a countable set. Let F = 2^Ω and let p : Ω → [0, 1] satisfy ∑_{ω∈Ω} p(ω) = 1. Put P(A) = ∑_{ω∈A} p(ω) for A ∈ F. Show that P is a probability measure.

1.8 Let Ω be a countable set. Let 𝒜 be the collection of A ⊂ Ω such that A or its complement has finite cardinality. Show that 𝒜 is an algebra. What is d(𝒜)?

1.9 Show that a finitely additive map µ : Σ_0 → [0, ∞] is countably additive if µ(H_n) → 0 for every decreasing sequence of sets H_n ∈ Σ_0 with ⋂_n H_n = ∅. If µ is countably additive, do we necessarily have µ(H_n) → 0 for every decreasing sequence of sets H_n ∈ Σ_0 with ⋂_n H_n = ∅?

1.10 Consider the collection Σ_0 of subsets of R that can be written as a finite union of disjoint intervals of type (a, b] with −∞ ≤ a ≤ b < ∞ or (a, ∞). Show that Σ_0 is an algebra and that σ(Σ_0) = B(R).

2 Existence of Lebesgue measure

In this chapter we construct the Lebesgue measure on the Borel sets of R. To that end we need the concept of outer measure. Somewhat hidden in the proof of the construction is the extension of a countably additive function on an algebra to a measure on a σ-algebra. There are different versions of extension theorems, originally developed by Carathéodory. Although they are of crucial importance in measure theory, we will confine our treatment of extension theorems mainly to what is needed for the construction of Lebesgue measure on (R, B). However, see also the end of this section.

2.1 Outer measure and construction

Definition 2.1 Let S be a set. An outer measure on S is a mapping µ* : 2^S → [0, ∞] that satisfies
(i) µ*(∅) = 0,
(ii) µ* is monotone, i.e. µ*(E) ≤ µ*(F) if E ⊂ F,
(iii) µ* is subadditive, i.e. µ*(⋃_{n=1}^∞ E_n) ≤ ∑_{n=1}^∞ µ*(E_n), valid for any sequence of sets E_n.

Definition 2.2 Let µ* be an outer measure on a set S. A set E ⊂ S is called µ-measurable if

  µ*(F) = µ*(E ∩ F) + µ*(E^c ∩ F), ∀F ⊂ S.

The class of µ-measurable sets is denoted by Σ_µ.

Theorem 2.3 Let µ* be an outer measure on a set S. Then Σ_µ is a σ-algebra and the restricted mapping µ : Σ_µ → [0, ∞] of µ* is a measure on Σ_µ.

Proof It is obvious that ∅ ∈ Σ_µ and that E^c ∈ Σ_µ as soon as E ∈ Σ_µ. Let E_1, E_2 ∈ Σ_µ and F ⊂ S. The trivial identity

  F ∩ (E_1 ∩ E_2)^c = (F ∩ E_1^c) ∪ (F ∩ (E_1 ∩ E_2^c))

yields with the subadditivity of µ*

  µ*(F ∩ (E_1 ∩ E_2)^c) ≤ µ*(F ∩ E_1^c) + µ*(F ∩ (E_1 ∩ E_2^c)).

Add to both sides µ*(F ∩ (E_1 ∩ E_2)) and use that E_1, E_2 ∈ Σ_µ to obtain

  µ*(F ∩ (E_1 ∩ E_2)) + µ*(F ∩ (E_1 ∩ E_2)^c) ≤ µ*(F).

From subadditivity the reversed version of this inequality immediately follows as well, which shows that E_1 ∩ E_2 ∈ Σ_µ. We conclude that Σ_µ is an algebra.

Pick disjoint E_1, E_2 ∈ Σ_µ, then (E_1 ∪ E_2) ∩ E_1^c = E_2. If F ⊂ S, then by E_1 ∈ Σ_µ

  µ*(F ∩ (E_1 ∪ E_2)) = µ*(F ∩ (E_1 ∪ E_2) ∩ E_1) + µ*(F ∩ (E_1 ∪ E_2) ∩ E_1^c)
                      = µ*(F ∩ E_1) + µ*(F ∩ E_2).

By induction we obtain that for every sequence of disjoint sets E_i in Σ_µ it holds that for every F ⊂ S

  µ*(F ∩ ⋃_{i=1}^n E_i) = ∑_{i=1}^n µ*(F ∩ E_i).    (2.1)

If E = ⋃_{i=1}^∞ E_i, it follows from (2.1) and the monotonicity of µ* that

  µ*(F ∩ E) ≥ ∑_{i=1}^∞ µ*(F ∩ E_i).

Since subadditivity of µ* immediately yields the reverse inequality, we obtain

  µ*(F ∩ E) = ∑_{i=1}^∞ µ*(F ∩ E_i).    (2.2)

Let U_n = ⋃_{i=1}^n E_i and note that U_n ∈ Σ_µ. We obtain from (2.1) and (2.2) and monotonicity

  µ*(F) = µ*(F ∩ U_n) + µ*(F ∩ U_n^c)
        ≥ ∑_{i=1}^n µ*(F ∩ E_i) + µ*(F ∩ E^c)
        → ∑_{i=1}^∞ µ*(F ∩ E_i) + µ*(F ∩ E^c)
        = µ*(F ∩ E) + µ*(F ∩ E^c).

Combined with µ*(F) ≤ µ*(F ∩ E) + µ*(F ∩ E^c), which again is the result of subadditivity, we see that E ∈ Σ_µ. It follows that Σ_µ is a σ-algebra. Take then F = S in (2.2) to see that µ* restricted to Σ_µ is a measure.

We will use Theorem 2.3 to show the existence of Lebesgue measure on (R, B). Let E be a subset of R. By I(E) we denote a cover of E consisting of at most countably many open intervals. For any interval I, we denote by λ_0(I) its ordinary length. We now define a function λ* on 2^R by putting for every E ⊂ R

  λ*(E) = inf_{I(E)} ∑_{I_k ∈ I(E)} λ_0(I_k).    (2.3)

Lemma 2.4 The function λ* defined by (2.3) is an outer measure on R and satisfies λ*(I) = λ_0(I).

Proof Properties (i) and (ii) of Definition 2.1 are obviously true. We prove subadditivity. Let E_1, E_2, ... be arbitrary subsets of R and ε > 0. By definition of λ*, there exist covers I(E_n) of the E_n such that for all n

  λ*(E_n) ≥ ∑_{I ∈ I(E_n)} λ_0(I) − ε 2^{−n}.    (2.4)

Because ⋃_n I(E_n) is a countable open cover of ⋃_n E_n,

  λ*(⋃_n E_n) ≤ ∑_n ∑_{I ∈ I(E_n)} λ_0(I) ≤ ∑_n λ*(E_n) + ε,

in view of (2.4). Subadditivity follows upon letting ε → 0.

Turning to the next assertion, we observe that λ*(I) ≤ λ_0(I) is almost immediate (I an arbitrary interval). The reversed inequality is a little harder to prove. Without loss of generality, we may assume that I is compact. Let I(I) be a cover of I. We aim at proving

  λ_0(I) ≤ ∑_{I_k ∈ I(I)} λ_0(I_k), for every interval I.    (2.5)

If this holds, then by taking the infimum on the right hand side of (2.5), it follows that λ_0(I) ≤ λ*(I). To prove (2.5) we proceed as follows. The covering intervals are open. By compactness of I, there exists a finite subcover of I, {I_1, ..., I_n} say. So it is sufficient to show (2.5) for finite covers, which we do by induction on n. If n = 1, this is trivial. Assume it is true for covers with at most n − 1 elements. Assume that I = [a, b]. Then b is an element of some I_k = (a_k, b_k). Note that the interval I \ I_k (possibly empty) is covered by the remaining intervals, and by hypothesis we have λ_0(I \ I_k) ≤ ∑_{j≠k} λ_0(I_j). But then we deduce

  λ_0(I) = (b − a_k) + (a_k − a) ≤ (b_k − a_k) + (a_k − a) ≤ λ_0(I_k) + λ_0(I \ I_k) ≤ ∑_j λ_0(I_j).

Lemma 2.5 Any interval I_a = (−∞, a] (a ∈ R) is λ-measurable, I_a ∈ Σ_λ. Hence B ⊂ Σ_λ.

Proof Let E ⊂ R. Since λ* is subadditive, it is sufficient to show that λ*(E) ≥ λ*(E ∩ I_a) + λ*(E ∩ I_a^c). Let ε > 0 and choose a cover I(E) such that λ*(E) ≥ ∑_{I ∈ I(E)} λ*(I) − ε, which is possible by the definition of λ* and Lemma 2.4. This lemma also yields λ*(I) = λ*(I ∩ I_a) + λ*(I ∩ I_a^c). But then we have λ*(E) ≥ ∑_{I ∈ I(E)} (λ*(I ∩ I_a) + λ*(I ∩ I_a^c)) − ε, which is at least λ*(E ∩ I_a) + λ*(E ∩ I_a^c) − ε. Let ε ↓ 0.

Putting the previous results together, we obtain existence of the Lebesgue measure on B.

Theorem 2.6 The (restricted) function λ : B → [0, ∞] is the unique measure on B that satisfies λ(I) = λ_0(I).

Proof By Theorem 2.3 and Lemma 2.4 λ is a measure on Σ_λ and by Lemma 2.5 its restriction to B is a measure as well.
Moreover, Lemma 2.4 states that $\lambda(I) = \lambda_0(I)$. The only thing that remains to be shown is that $\lambda$ is the unique measure with the latter property. Suppose that a measure $\mu$ also enjoys this property. Then, for any $a \in \mathbb{R}$ and $n \in \mathbb{N}$, the set $(-\infty, a] \cap [-n, +n]$ is an interval, hence $\lambda((-\infty, a] \cap [-n, +n]) = \mu((-\infty, a] \cap [-n, +n])$. Since the intervals $(-\infty, a]$ form a π-system that generates $\mathcal{B}$, we also have $\lambda(B \cap [-n, +n]) = \mu(B \cap [-n, +n])$, for any $B \in \mathcal{B}$ and $n \in \mathbb{N}$. Since $\lambda$ and $\mu$ are measures, we obtain for $n \to \infty$ that $\lambda(B) = \mu(B)$, for all $B \in \mathcal{B}$.

The sets in $\Sigma_{\lambda^*}$ are also called Lebesgue-measurable sets. A function $f : \mathbb{R} \to \mathbb{R}$ is called Lebesgue-measurable if the sets $\{f \leq c\}$ are in $\Sigma_{\lambda^*}$ for all $c \in \mathbb{R}$. The question arises whether all subsets of $\mathbb{R}$ are in $\Sigma_{\lambda^*}$. The answer is no, but the Axiom of Choice is needed for this, see Exercise 2.6. Unlike showing that there exist sets that are not Borel-measurable, here a counting argument as in Section 1.1 is useless, since $\Sigma_{\lambda^*}$ has the same cardinality as $2^{\mathbb{R}}$. This fact can be seen as follows.

Consider the Cantor set in $[0, 1]$. Let $C_1 = [0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$, obtained from $C_0 = [0, 1]$ by deleting the middle third. From each of the components of $C_1$ we leave out the middle thirds again, resulting in $C_2 = [0, \frac{1}{9}] \cup [\frac{2}{9}, \frac{1}{3}] \cup [\frac{2}{3}, \frac{7}{9}] \cup [\frac{8}{9}, 1]$, and so on. The obtained sequence of sets $C_n$ is decreasing and its limit $C := \bigcap_{n=1}^\infty C_n$, the Cantor set, is well defined. Moreover, since $\lambda(C_n) = (2/3)^n$, we see that $\lambda(C) = 0$. On the other hand, $C$ is uncountable, since every number in it can be described by its ternary expansion $\sum_{k=1}^\infty x_k 3^{-k}$, with the $x_k \in \{0, 2\}$. By completeness of $([0, 1], \Sigma_{\lambda^*}, \lambda)$, every subset of $C$ has Lebesgue measure zero as well, and the cardinality of the power set of $C$ equals that of the power set of $[0, 1]$.

An interesting fact is that the Lebesgue-measurable sets $\Sigma_{\lambda^*}$ coincide with the σ-algebra $\mathcal{B}(\mathbb{R}) \vee \mathcal{N}$, where $\mathcal{N}$ is the collection of subsets of $[0, 1]$ with outer measure zero. This follows from Exercise 2.4.

2.2 A general extension theorem

Recall Theorem 2.6. Its content can be described by saying that there exists a measure on a σ-algebra (in this case on $\mathcal{B}$) whose restriction to a suitable subclass of sets (the intervals) has a prescribed behavior. This is basically also valid in a more general situation.
The proof of the main result of this section parallels to a large extent the development of the previous section. Let us state the theorem.

Theorem 2.7 Let $\Sigma_0$ be an algebra on a set $S$ and let $\mu_0 : \Sigma_0 \to [0, \infty]$ be finitely additive and countably subadditive. Then there exists a measure $\mu$ defined on $\Sigma = \sigma(\Sigma_0)$ such that $\mu$ restricted to $\Sigma_0$ coincides with $\mu_0$. The measure $\mu$ is thus an extension of $\mu_0$, and this extension is unique if $\mu_0$ is σ-finite on $\Sigma_0$.

Proof We only sketch the main steps. First we define an outer measure on $2^S$ by putting
\[ \mu^*(E) = \inf_{\Sigma_0(E)} \sum_{E_k \in \Sigma_0(E)} \mu_0(E_k), \]
where the infimum is taken over all $\Sigma_0(E)$, countable covers of $E$ with elements $E_k$ from $\Sigma_0$. Compare this to the definition in (2.3). It follows as in the proof of Lemma 2.4 that $\mu^*$ is an outer measure.

Let $E \in \Sigma_0$. Obviously, $\{E\}$ is a (finite) cover of $E$ and so we have that $\mu^*(E) \leq \mu_0(E)$. Let $\{E_1, E_2, \ldots\}$ be a cover of $E$ with the $E_k \in \Sigma_0$. Since $\mu_0$ is countably subadditive and $E = \bigcup_k (E \cap E_k)$, we have $\mu_0(E) \leq \sum_k \mu_0(E \cap E_k)$ and since $\mu_0$ is finitely additive, we also have $\mu_0(E \cap E_k) \leq \mu_0(E_k)$. Collecting these results we obtain
\[ \mu_0(E) \leq \sum_k \mu_0(E_k). \]
Taking the infimum in the displayed inequality over all covers $\Sigma_0(E)$, we obtain $\mu_0(E) \leq \mu^*(E)$, for $E \in \Sigma_0$. Hence $\mu_0(E) = \mu^*(E)$ and $\mu^*$ is an extension of $\mu_0$.

In order to show that $\mu^*$ restricted to $\Sigma$ is a measure, it is by virtue of Theorem 2.3 sufficient to show that $\Sigma_0 \subset \Sigma_{\mu^*}$, because we then also have $\Sigma \subset \Sigma_{\mu^*}$. We proceed to prove the former inclusion. Let $F \subset S$ be arbitrary, $\varepsilon > 0$. Then there exists a cover $\Sigma_0(F)$ such that $\mu^*(F) \geq \sum_{E_k \in \Sigma_0(F)} \mu_0(E_k) - \varepsilon$. Using the same kind of arguments as in the proof of Lemma 2.5, one obtains (using that $\mu_0$ is additive on the algebra $\Sigma_0$, where it coincides with $\mu^*$) for every $E \in \Sigma_0$
\[ \begin{aligned}
\mu^*(F) + \varepsilon &\geq \sum_k \mu_0(E_k) \\
&= \sum_k \mu_0(E_k \cap E) + \sum_k \mu_0(E_k \cap E^c) \\
&= \sum_k \mu^*(E_k \cap E) + \sum_k \mu^*(E_k \cap E^c) \\
&\geq \mu^*(F \cap E) + \mu^*(F \cap E^c),
\end{aligned} \]
by subadditivity of $\mu^*$. Letting $\varepsilon \downarrow 0$, we arrive at $\mu^*(F) \geq \mu^*(F \cap E) + \mu^*(F \cap E^c)$, which is equivalent to $\mu^*(F) = \mu^*(F \cap E) + \mu^*(F \cap E^c)$.

Below we denote the restriction of $\mu^*$ to $\Sigma$ by $\mu$. We turn to the asserted unicity. Let $\nu$ be a measure on $\Sigma$ that also coincides with $\mu_0$ on $\Sigma_0$. The key result, which we will show below, is that $\mu$ and $\nu$ also coincide on the sets $F$ in $\Sigma$ for which $\mu(F) < \infty$. Indeed, assuming that this is the case, we can write for $E \in \Sigma$ and $S_1, S_2, \ldots$ disjoint sets in $\Sigma_0$ with $\mu(S_n) < \infty$ and $\bigcup_n S_n = S$, using that also $\mu(E \cap S_n) < \infty$,
\[ \nu(E) = \sum_n \nu(E \cap S_n) = \sum_n \mu(E \cap S_n) = \mu(E). \]
Now we show the mentioned key result. Let $E \in \Sigma$. Consider a cover $\Sigma_0(E)$ of $E$.
Then we have, since $\nu$ is a measure on $\Sigma$, $\nu(E) \leq \sum_k \nu(E_k) = \sum_k \mu_0(E_k)$. By taking the infimum over such covers, we obtain $\nu(E) \leq \mu^*(E) = \mu(E)$. We proceed to prove the converse inequality for sets $E$ with $\mu(E) < \infty$.

Let $E \in \Sigma$ with $\mu(E) < \infty$. Given $\varepsilon > 0$, we can choose a cover $\Sigma_0(E)$ such that $\mu(E) > \sum_{E_k \in \Sigma_0(E)} \mu(E_k) - \varepsilon$. Let $U_n = \bigcup_{k=1}^n E_k$ and note that $U_n \in \Sigma_0$ and $U := \bigcup_{n=1}^\infty U_n = \bigcup_{k=1}^\infty E_k \in \Sigma$. Since $U \supset E$, we obtain $\mu(E) \leq \mu(U)$, whereas σ-additivity of $\mu$ yields $\mu(U) < \mu(E) + \varepsilon$, which implies $\mu(U \cap E^c) = \mu(U) - \mu(U \cap E) = \mu(U) - \mu(E) < \varepsilon$. Since it also follows that $\mu(U) < \infty$, there is $N \in \mathbb{N}$ such that $\mu(U) < \mu(U_N) + \varepsilon$. Below we use that $\mu(U_N) = \nu(U_N)$ and the already established fact that $\mu \geq \nu$ on $\Sigma$, and arrive at the following chain of (in)equalities.
\[ \begin{aligned}
\nu(E) = \nu(E \cap U) &= \nu(U) - \nu(U \cap E^c) \\
&\geq \nu(U_N) - \mu(U \cap E^c) \\
&\geq \nu(U_N) - \varepsilon \\
&= \mu(U_N) - \varepsilon \\
&> \mu(U) - 2\varepsilon \\
&\geq \mu(E) - 2\varepsilon.
\end{aligned} \]
It follows that $\nu(E) \geq \mu(E)$.

The assumption in Theorem 2.7 that the collection $\Sigma_0$ is an algebra can be weakened by only assuming that it is a semiring. This notion is beyond the scope of the present course.

Unicity of the extension fails to hold for $\mu_0$ that are not σ-finite. Here is a counterexample. Let $S$ be an infinite set and $\Sigma_0$ an arbitrary algebra consisting of the empty set and infinite subsets of $S$. Let $\mu_0(E) = \infty$, unless $E = \emptyset$, in which case we have $\mu_0(E) = 0$. Then $\mu(F)$ defined by $\mu(F) = \infty$, unless $F = \emptyset$, yields the extension of Theorem 2.7 on $2^S$, whereas the counting measure on $2^S$ also extends $\mu_0$.

2.3 Exercises

2.1 Let $\mu$ be an outer measure on some set $S$. Let $N \subset S$ be such that $\mu(N) = 0$. Show that $N \in \Sigma_\mu$.

2.2 Let $(S, \Sigma, \mu)$ be a measure space. A measurable covering of a subset $A$ of $S$ is a countable collection $\{E_i : i \in \mathbb{N}\} \subset \Sigma$ such that $A \subset \bigcup_{i=1}^\infty E_i$. Let $\mathcal{M}(A)$ be the collection of all measurable coverings of $A$. Put $\mu^*(A) = \inf\{\sum_{i=1}^\infty \mu(E_i) : \{E_1, E_2, \ldots\} \in \mathcal{M}(A)\}$. Show that $\mu^*$ is an outer measure on $S$ and that $\mu^*(E) = \mu(E)$, if $E \in \Sigma$. Show also that $\mu^*(A) = \inf\{\mu(E) : E \supset A, E \in \Sigma\}$. We call $\mu^*$ the outer measure associated to $\mu$.

2.3 Let $(S, \Sigma, \mu)$ be a measure space and let $\mu^*$ be the associated outer measure on $S$. If $A \subset S$, then there exists $E \in \Sigma$ such that $A \subset E$ and $\mu^*(A) = \mu(E)$. Prove this.

2.4 Consider a measure space $(S, \Sigma, \mu)$ with σ-finite $\mu$ and let $\mu^*$ be the associated outer measure on $S$.
Show that $\Sigma_{\mu^*} \subset \Sigma \vee \mathcal{N}$, where $\mathcal{N}$ is the collection of all $\mu^*$-null sets. Hint: Reduce the question to the case where $\mu$ is finite. Take then $A \in \Sigma_{\mu^*}$ and $E$ as in Exercise 2.3 and show that $\mu^*(E \setminus A) = 0$. (By Exercise 2.1, we even have $\Sigma_{\mu^*} = \Sigma \vee \mathcal{N}$.)

2.5 Show that the Lebesgue measure $\lambda$ is translation invariant, i.e. $\lambda(E + x) = \lambda(E)$ for all $E \in \Sigma_{\lambda^*}$, where $E + x = \{y + x : y \in E\}$.

2.6 This exercise aims at showing the existence of a set $E \notin \Sigma_{\lambda^*}$. First we define an equivalence relation $\sim$ on $\mathbb{R}$ by saying $x \sim y$ iff $x - y \in \mathbb{Q}$. By the axiom of choice there exists a set $E \subset (0, 1)$ that has exactly one point in each equivalence class induced by $\sim$. The set $E$ is our candidate.
(a) Show the following two statements. If $x \in (0, 1)$, then $\exists q \in \mathbb{Q} \cap (-1, 1) : x \in E + q$. If $q, r \in \mathbb{Q}$ and $q \neq r$, then $(E + q) \cap (E + r) = \emptyset$.
(b) Assume that $E \in \Sigma_{\lambda^*}$. Put $S = \bigcup_{q \in \mathbb{Q} \cap (-1, 1)} (E + q)$ and note that $S \subset (-1, 2)$. Use translation invariance of $\lambda$ (Exercise 2.5) to show that $\lambda(S) = 0$, whereas at the same time one should have $\lambda(S) \geq \lambda(0, 1)$.
(c) Show that $\lambda^*(E) = 1$ and $\lambda^*((0, 1) \setminus E) = 1$.

3 Measurable functions and random variables

In this chapter we define random variables as measurable functions on a probability space and derive some properties.

3.1 General setting

Let $(S, \Sigma)$ be a measurable space. Recall that the elements of $\Sigma$ are called measurable sets. Also recall that $\mathcal{B} = \mathcal{B}(\mathbb{R})$ is the collection of all the Borel sets of $\mathbb{R}$.

Definition 3.1 A mapping $h : S \to \mathbb{R}$ is called measurable if $h^{-1}[B] \in \Sigma$ for all $B \in \mathcal{B}$.

It is clear that this definition depends on $\mathcal{B}$ and $\Sigma$. When there are more σ-algebras in the picture, we sometimes speak of $\Sigma$-measurable functions, or $\Sigma/\mathcal{B}$-measurable functions, depending on the situation. If $S$ is a topological space with a topology $\mathcal{T}$ and if $\Sigma = \sigma(\mathcal{T})$, a measurable function $h$ is also called a Borel measurable function.

Remark 3.2 Consider $E \subset S$. The indicator function of $E$ is defined by $1_E(s) = 1$ if $s \in E$ and $1_E(s) = 0$ if $s \notin E$. Note that $1_E$ is a measurable function iff $E$ is a measurable set.

Sometimes one wants to extend the range of the function $h$ to $[-\infty, \infty]$. If this happens to be the case, we extend $\mathcal{B}$ with the singletons $\{-\infty\}$ and $\{\infty\}$, and work with $\bar{\mathcal{B}} = \sigma(\mathcal{B} \cup \{\{-\infty\}, \{\infty\}\})$. We call $h : S \to [-\infty, \infty]$ measurable if $h^{-1}[B] \in \Sigma$ for all $B \in \bar{\mathcal{B}}$.

Below we will often use the shorthand notation $\{h \in B\}$ for the set $\{s \in S : h(s) \in B\}$. Likewise we also write $\{h \leq c\}$ for the set $\{s \in S : h(s) \leq c\}$. Many variations on this theme are possible.

Proposition 3.3 Let $(S, \Sigma)$ be a measurable space and $h : S \to \mathbb{R}$.
(i) If $\mathcal{C}$ is a collection of subsets of $\mathbb{R}$ such that $\sigma(\mathcal{C}) = \mathcal{B}$, and if $h^{-1}[C] \in \Sigma$ for all $C \in \mathcal{C}$, then $h$ is measurable.
(ii) If $\{h \leq c\} \in \Sigma$ for all $c \in \mathbb{R}$, then $h$ is measurable.
(iii) If $S$ is topological and $h$ continuous, then $h$ is measurable with respect to the σ-algebra generated by the open sets. In particular any constant function is measurable.
(iv) If $h$ is measurable and another function $f : \mathbb{R} \to \mathbb{R}$ is Borel measurable ($\mathcal{B}/\mathcal{B}$-measurable), then $f \circ h$ is measurable as well.

Proof (i) Put $\mathcal{D} = \{B \in \mathcal{B} : h^{-1}[B] \in \Sigma\}$. One easily verifies that $\mathcal{D}$ is a σ-algebra and it is evident that $\mathcal{C} \subset \mathcal{D} \subset \mathcal{B}$. It follows that $\mathcal{D} = \mathcal{B}$.
(ii) This is an application of the previous assertion. Take $\mathcal{C} = \{(-\infty, c] : c \in \mathbb{R}\}$.
(iii) Take as $\mathcal{C}$ the collection of open sets and apply (i).
(iv) Take $B \in \mathcal{B}$, then $f^{-1}[B] \in \mathcal{B}$ since $f$ is Borel. Because $h$ is measurable, we then also have $(f \circ h)^{-1}[B] = h^{-1}[f^{-1}[B]] \in \Sigma$.
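Proposition 3.3(i) can be tried out on finite sets, where generated σ-algebras are computable by brute force. The following is only a sketch under stated assumptions: the sets `S`, `T`, the generators and the helper names `generate_sigma_algebra`, `preimage` and `measurable_on` are all hypothetical and chosen for illustration, and the finite set `T` stands in for $\mathbb{R}$.

```python
from itertools import combinations

def generate_sigma_algebra(universe, generators):
    """Close the generators under complement and union.
    On a finite universe this yields sigma(generators)."""
    universe = frozenset(universe)
    sets = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:                      # closure under complements
            if universe - a not in sets:
                sets.add(universe - a)
                changed = True
        for a, b in combinations(current, 2):  # closure under unions
            if a | b not in sets:
                sets.add(a | b)
                changed = True
    return sets

def preimage(h, domain, target_set):
    """h^{-1}[target_set] for a map h given as a dict."""
    return frozenset(s for s in domain if h[s] in target_set)

# Illustrative data (assumptions, not from the text).
S = range(6)
Sigma = generate_sigma_algebra(S, [{0, 1}, {2, 3}])  # atoms {0,1}, {2,3}, {4,5}
T = {10, 20, 30}
C = [{10}, {20}]                                     # generating class on T
B = generate_sigma_algebra(T, C)                     # here: all subsets of T

h = {0: 10, 1: 10, 2: 20, 3: 20, 4: 30, 5: 30}       # constant on the atoms
g = {0: 10, 1: 20, 2: 20, 3: 20, 4: 30, 5: 30}       # splits the atom {0,1}

def measurable_on(f, collection):
    """Check f^{-1}[A] in Sigma for every A in the collection."""
    return all(preimage(f, S, A) in Sigma for A in collection)

print(measurable_on(h, C), measurable_on(h, B))  # checking generators suffices
print(measurable_on(g, C))                       # fails already on a generator
```

For `h` the check on the two generators agrees with the check on all of `B`, in line with assertion (i); for `g` the preimage of `{10}` is `{0}`, which is not in `Sigma`.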

Remark 3.4 Many variations on the assertions of Proposition 3.3 are possible. For instance in (ii) we could also use $\{h < c\}$, or $\{h > c\}$. Furthermore, (ii) is true for $h : S \to [-\infty, \infty]$ as well. We proved (iv) by a simple composition argument, which also applies to a more general situation. Let $(S_i, \Sigma_i)$ be measurable spaces ($i = 1, 2, 3$), and suppose that $h : S_1 \to S_2$ is $\Sigma_1/\Sigma_2$-measurable and $f : S_2 \to S_3$ is $\Sigma_2/\Sigma_3$-measurable. Then $f \circ h$ is $\Sigma_1/\Sigma_3$-measurable.

The set of measurable functions will also be denoted by $\Sigma$. This notation is of course a bit ambiguous, but it turns out that no confusion can arise. Remark 3.2 in a way justifies this notation. The remark can, with the present convention, be rephrased as $1_E \in \Sigma$ iff $E \in \Sigma$. Fortunately, the set $\Sigma$ of measurable functions is closed under elementary operations.

Proposition 3.5
(i) The collection $\Sigma$ of $\Sigma$-measurable functions is a vector space and products of measurable functions are measurable as well.
(ii) Let $(h_n)$ be a sequence in $\Sigma$. Then also $\inf h_n$, $\sup h_n$, $\liminf h_n$, $\limsup h_n$ are in $\Sigma$, where we extend the range of these functions to $[-\infty, \infty]$. The set $L$, consisting of all $s \in S$ for which $\lim_n h_n(s)$ exists as a finite limit, is measurable.

Proof (i) If $h \in \Sigma$ and $\lambda \in \mathbb{R}$, then $\lambda h$ is also measurable (use (ii) of the previous proposition for $\lambda \neq 0$). To show that the sum of two measurable functions is measurable, we first note that
\[ \{(x_1, x_2) \in \mathbb{R}^2 : x_1 + x_2 > c\} = \bigcup_{q \in \mathbb{Q}} \{(x_1, x_2) \in \mathbb{R}^2 : x_1 > q, \, x_2 > c - q\} \]
(draw a picture!). But then we also have $\{h_1 + h_2 > c\} = \bigcup_{q \in \mathbb{Q}} (\{h_1 > q\} \cap \{h_2 > c - q\})$, a countable union. To show that products of measurable functions are measurable is left as Exercise 3.1.
(ii) Since $\{\inf h_n \geq c\} = \bigcap_n \{h_n \geq c\}$, it follows that $\inf h_n \in \Sigma$. A similar argument applies to $\sup h_n$, which then also yields measurability of $\liminf h_n = \sup_n \inf_{m \geq n} h_m$ and $\limsup h_n$. To show the last assertion we consider $h := \limsup h_n - \liminf h_n$. Then $h : S \to [-\infty, \infty]$ is measurable.
The assertion follows from $L = \{\limsup h_n < \infty\} \cap \{\liminf h_n > -\infty\} \cap \{h = 0\}$.

For later use we present the Monotone Class Theorem.

Theorem 3.6 Let $\mathcal{H}$ be a vector space of bounded functions, with the following properties.
(i) $1 \in \mathcal{H}$.
(ii) If $(f_n)$ is a nonnegative sequence in $\mathcal{H}$ such that $f_{n+1} \geq f_n$ for all $n$, and $f := \lim f_n$ is bounded as well, then $f \in \mathcal{H}$.
If, in addition, $\mathcal{H}$ contains the indicator functions of sets in a π-system $\mathcal{I}$, then $\mathcal{H}$ contains all bounded $\sigma(\mathcal{I})$-measurable functions.

Proof Put $\mathcal{D} = \{F \subset S : 1_F \in \mathcal{H}\}$. One easily verifies that $\mathcal{D}$ is a d-system, and that it contains $\mathcal{I}$. Hence, by Corollary 1.14, we have $\Sigma := \sigma(\mathcal{I}) \subset \mathcal{D}$. We will use this fact later in the proof.

Let $f$ be a bounded, $\Sigma$-measurable function. Without loss of generality, we may assume that $f \geq 0$ (add a constant otherwise), and $f < K$ for some real constant $K$. Introduce the functions $f_n$ defined by $f_n = 2^{-n} \lfloor 2^n f \rfloor$. In explicit terms, the $f_n$ are given by
\[ f_n(s) = \sum_{i=0}^{K 2^n - 1} i 2^{-n} \, 1_{\{i 2^{-n} \leq f(s) < (i+1) 2^{-n}\}}. \]
Then we have for all $n$ that $f_n$ is a bounded measurable function, $f_n \leq f$, and $f_n \uparrow f$ (check this!). Moreover, each $f_n$ lies in $\mathcal{H}$. To see this, observe that $\{i 2^{-n} \leq f(s) < (i+1) 2^{-n}\} \in \Sigma$, since $f$ is measurable. But then this set is also an element of $\mathcal{D}$, since $\Sigma \subset \mathcal{D}$ (see above) and hence $1_{\{i 2^{-n} \leq f(s) < (i+1) 2^{-n}\}} \in \mathcal{H}$. Since $\mathcal{H}$ is a vector space, linear combinations remain in $\mathcal{H}$ and therefore $f_n \in \mathcal{H}$. Property (ii) of $\mathcal{H}$ yields $f \in \mathcal{H}$.

3.2 Random variables

We return to the setting of Section 1.5 and so we consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. In this setting Definition 3.1 takes the following form.

Definition 3.7 A function $X : \Omega \to \mathbb{R}$ is called a random variable if it is ($\mathcal{F}$-)measurable.

By definition, random variables are nothing else but measurable functions. Following the tradition, we denote them by $X$ (or other capital letters), rather than by $h$, as in the previous sections. We often need the σ-algebra $\sigma(X)$, the smallest σ-algebra on $\Omega$ that makes $X$ a random variable. In other words, it is the intersection of all σ-algebras $\mathcal{F}$ such that $X$ is a random variable in the sense of Definition 3.7. Of course, $X$ is $\mathcal{F}$-measurable iff $\sigma(X) \subset \mathcal{F}$.

If we have a collection of mappings $\mathbf{X} := \{X_i : \Omega \to \mathbb{R}\}_{i \in I}$, then we denote by $\sigma(\mathbf{X})$ the smallest σ-algebra on $\Omega$ such that all the $X_i$ become measurable. See Exercise 3.3.

Having a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, a random variable $X$, and the measurable space $(\mathbb{R}, \mathcal{B})$, we will use these ingredients to endow the latter space with a probability measure. Define $\mu : \mathcal{B} \to [0, 1]$ by
\[ \mu(B) := \mathbb{P}(X \in B) = \mathbb{P}(X^{-1}[B]). \tag{3.1} \]
It is straightforward to check that $\mu$ is a probability measure on $\mathcal{B}$. Commonly used alternative notations for $\mu$ are $\mathbb{P}^X$, or $\mathcal{L}_X$, $\mathcal{L}^X$. This probability measure is referred to as the distribution of $X$ or the law of X.
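Definition (3.1) can be made concrete on a finite probability space, where every subset is measurable and $\mathbb{P}(X^{-1}[B])$ is a finite sum. The sketch below assumes a hypothetical example that is not part of the text: two fair dice with $X$ the sum of the outcomes. The names `omega_space`, `P`, `X` and `law` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed example: Omega = outcomes of two fair dice, uniform probability.
omega_space = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega_space}

def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

def law(B):
    """mu(B) = P(X in B) = P(X^{-1}[B]), as in (3.1)."""
    return sum((P[w] for w in omega_space if X(w) in B), Fraction(0))

print(law({7}))                  # mass of the preimage of {7}
print(law(set(range(2, 13))))    # total mass of the law
print(law({2, 3}) == law({2}) + law({3}))  # additivity on disjoint sets
```

Exact rational arithmetic is used so that the total mass and the additivity check hold without rounding error.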
Along with the distribution of $X$, we introduce its distribution function, usually denoted by $F$ (or $F_X$, or $F^X$). By definition it is the function $F : \mathbb{R} \to [0, 1]$, given by $F(x) = \mu((-\infty, x]) = \mathbb{P}(X \leq x)$.

Proposition 3.8 The distribution function of a random variable is right continuous, non-decreasing and satisfies $\lim_{x \to \infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$. The set of points where $F$ is discontinuous is at most countable.

Proof Exercise 3.4.

The fundamental importance of distribution functions in probability is based on the following proposition.

Proposition 3.9 Let $\mu_1$ and $\mu_2$ be two probability measures on $\mathcal{B}$. Let $F_1$ and $F_2$ be the corresponding distribution functions. If $F_1(x) = F_2(x)$ for all $x$, then $\mu_1 = \mu_2$.

Proof Consider the π-system $\mathcal{I} = \{(-\infty, x] : x \in \mathbb{R}\}$ and apply Theorem 1.15.

This proposition thus states, in a different wording, that for a random variable $X$, its distribution, the collection of all probabilities $\mathbb{P}(X \in B)$ with $B \in \mathcal{B}$, is determined by the distribution function $F_X$.

We call any function on $\mathbb{R}$ that has the properties of Proposition 3.8 a distribution function. Note that any distribution function is Borel measurable (sets $\{F \leq c\}$ are intervals and thus in $\mathcal{B}$). Below, in Theorem 3.10, we justify this terminology. We will see that for any distribution function $F$, it is possible to construct a random variable on some $(\Omega, \mathcal{F}, \mathbb{P})$, whose distribution function equals $F$. This theorem is founded on the existence of the Lebesgue measure $\lambda$ on the Borel sets $\mathcal{B}[0, 1]$ of $[0, 1]$, see Theorem 1.5. We now give a probabilistic translation of this theorem. Consider $(\Omega, \mathcal{F}, \mathbb{P}) = ([0, 1], \mathcal{B}[0, 1], \lambda)$. Let $U : \Omega \to [0, 1]$ be the identity map. The distribution of $U$ on $[0, 1]$ is trivially the Lebesgue measure again, in particular the distribution function $F_U$ of $U$ satisfies $F_U(x) = x$ for $x \in [0, 1]$ and so $\mathbb{P}(a < U \leq b) = F_U(b) - F_U(a) = b - a$ for $a, b \in [0, 1]$ with $a \leq b$. Hence, to the distribution function $F_U$ corresponds a probability measure on $([0, 1], \mathcal{B}[0, 1])$ and there exists a random variable $U$ on this space, such that $U$ has $F_U$ as its distribution function. The random variable $U$ is said to have the standard uniform distribution.
The proof of Theorem 3.10 (Skorokhod's representation of a random variable with a given distribution function) below is easy in the case that $F$ is continuous and strictly increasing (Exercise 3.6), given the just presented fact that a random variable with a uniform distribution exists. The proof that we give below for the general case just follows a more careful line of arguments, but is in spirit quite similar.

Theorem 3.10 Let $F$ be a distribution function on $\mathbb{R}$. Then there exists a probability space and a random variable $X : \Omega \to \mathbb{R}$ such that $F$ is the distribution function of $X$.

Proof Let $(\Omega, \mathcal{F}, \mathbb{P}) = ([0, 1], \mathcal{B}[0, 1], \lambda)$. We define $X(\omega) = \inf\{z \in \mathbb{R} : F(z) \geq \omega\}$. Then $X$ is a Borel measurable function, so a random variable, as this follows from the relation to be proven below, valid for all $c \in \mathbb{R}$ and $\omega \in [0, 1]$,
\[ X(\omega) \leq c \Leftrightarrow F(c) \geq \omega. \tag{3.2} \]
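As a numerical aside, the definition of $X$ and relation (3.2) can be checked for a purely discrete distribution function, where the infimum is attained at an atom. This is only a sketch under assumptions: the atoms and weights below are hypothetical, $\omega$ is restricted to $(0, 1]$ so that the infimum is finite, and the names `atoms`, `probs`, `F` and `X` are illustrative.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

# Assumed discrete law: atoms with their probabilities (exact rationals).
atoms = [-1, 0, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
cum = list(accumulate(probs))  # cumulative masses F(atom): 1/4, 3/4, 1

def F(z):
    """Distribution function: total mass of the atoms <= z."""
    return sum((p for a, p in zip(atoms, probs) if a <= z), Fraction(0))

def X(omega):
    """inf{z : F(z) >= omega} for omega in (0, 1]:
    the first atom whose cumulative mass reaches omega."""
    return atoms[bisect_left(cum, omega)]

# Check relation (3.2) on a grid of omega in (0, 1] and a few levels c.
omegas = [Fraction(k, 20) for k in range(1, 21)]
levels = [-2, -1, Fraction(-1, 2), 0, 1, 2, 3]
relation_holds = all((X(w) <= c) == (F(c) >= w) for w in omegas for c in levels)
print(relation_holds)
```

On this grid the two sides of (3.2) agree for every pair $(\omega, c)$; the check illustrates the relation but is of course no substitute for its proof.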