Spring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures

36-752 Spring 2014 Advanced Probability Overview Lecture Notes Set 1: Course Overview, σ-fields, and Measures Instructor: Jing Lei Associated reading: Sec 1.1-1.4 of Ash and Doléans-Dade; Sec 1.1 and A.1 of Durrett. 1 Introduction How is this course different from your earlier probability courses? There are some problems that simply can t be handled with finite-dimensional sample spaces and random variables that are either discrete or have densities. Example 1 Try to express the strong law of large numbers without using an infinite-dimensional space. Oddly enough, the weak law of large numbers requires only a sequence of finitedimensional spaces, but the strong law concerns entire infinite sequences. Example 2 Consider a distribution whose cumulative distribution function (cdf) increases continuously part of the time but has some jumps. Such a distribution is neither discrete nor continuous. How do you define the mean of such a random variable? Is there a way to treat such distributions together with discrete and continuous ones in a unified manner? General Measures Both of the above examples are accommodated by a generalization of the theories of summation and integration. Indeed, summation becomes a special case of the more general theory of integration. It all begins with a generalization of the concept of size of a set. Example 3 One way to measure the size of a set is to count its elements. All infinite sets would have the same size (unless you distinguish different infinite cardinals). Example 4 Special subsets of Euclidean spaces can be measured by length, area, volume, etc. But what about sets with lots of holes in them? For example, how large is the set of irrational numbers between 0 and 1? We will use measures to say how large sets are. First, we have to decide which sets we will measure. 1

2 σ-fields Definition 1 (fields and σ-fields) Let Ω be a set. A collection F of subsets of Ω is called a field if it satisfies Ω F, for each A F, A C F, for all A 1, A 2 F, A 1 A 2 F. A field F is a σ-field if, in addition, it satisfies for every sequence {A k } in F, A k F. We will define measures on fields and σ-field s. Definition 2 (Measurable Space) A set Ω together with a σ-field F is called a measurable space (Ω, F), and the elements of F are called measurable sets. Example 5 (Intervals on R 1 ) Let Ω = IR and define U to be the collection of all unions of finitely many disjoint intervals of the form (a, b] or (, b] or (a, ) or (, ), together with. Then U is a field. Example 6 (Power set) Let Ω be an arbitrary set. The collection of all subsets of Ω is a σ-field. It is denoted 2 Ω and is called the power set of Ω. Example 7 (Trivial σ-field) Let Ω be an arbitrary set. Let F = {Ω, }. This is the trivial σ-field. Exercise 1 Let F 1, F 2,... be classes of sets in a common space Ω such that F n F n+1 for each n. Show that if each F n is a field, then n=1f n is also a field. If each F n is a σ-field, then is n=1f n also necessarily a σ-field? Think about the following case: Ω is the set of nonnegative integers and F n σ ({ {0}, {1},..., {n} }). 2

Generated σ-fields A field is closed under finite set theoretic operations whereas a σ-field is closed under countable set theoretic operations. In a problem dealing with probabilities, one usually deals with a small class of subsets A, for example the class of subintervals of (0, 1]. It is possible that if we perform countable operations on such a class A of sets, we might end up operating on sets outside the class A. Hence, we would like to define a class denoted by σ(a) in which we can safely perform countable set-theoretic operations. This class σ(a) is called the σ-field generated by A, and it is defined as the intersection of all the σ-fields containing A (exercise: show that this is a σ-field). σ(a) is the smallest σ-field containing A. Example 8 Let C = {A} for some nonempty A that is not itself Ω. Then σ(c) = {, A, A C, Ω}. Example 9 Let Ω = IR and let C be the collection of all intervals of the form (a, b]. Then the field generated by C is U from Example 5 while σ(c) is larger. Example 10 (Borel σ-field) Let Ω be a topological space and let C be the collection of open sets. Then σ(c) is called the Borel σ-field. If Ω = IR, the Borel σ-field is the same as σ(c) in Example 9. The Borel σ-field of subsets of IR k is denoted B k. Exercise 2 Give some examples of classes of sets C such that σ(c) = B 1. Exercise 3 Are there subsets of IR which are not in B 1? 3 Measures Notation 11 (Extended Reals) The extended reals is the set of all real numbers together with and. We shall denote this set IR. The positive extended reals, denoted IR + is (0, ], and the nonnegative extended reals, denoted IR +0 is [0, ]. Definition 3 Let (Ω, F) be a measurable space. Let µ : F IR +0 satisfy µ( ) = 0, for every sequence {A k } of mutually disjoint elements of F, µ( A k) = µ(a k). Then µ is called a measure on (Ω, F) and (Ω, F, µ) is a measure space. If F is merely a field, then a µ that satisfies the above two conditions whenever A k F is called a measure on the field F. 3

Example 12 Let Ω be arbitrary with F the trivial σ-field. Define µ( ) = 0 and µ(ω) = c for arbitrary c > 0 (with c = possible). Example 13 (Counting measure) Let Ω be arbitrary and F = 2 Ω. For each finite subset A of Ω, define µ(a) to be the number of elements of A. Let µ(a) = for all infinite subsets. This is called counting measure on Ω. Definition 4 (Probability measure) Let (Ω, F, P ) be a measure space. If P (Ω) = 1, then P is called a probability, (Ω, F, P ) is a probability space, and elements of F are called events. Sometimes, if the name of the probability P is understood or is not even mentioned, we will denote P (E) by Pr(E) for events E. Infinite measures pose a few unique problems. Some infinite measures are just like finite ones. Definition 5 (σ-finite measure) Let (Ω, F, µ) be a measure space, and let C F. Suppose that there exists a sequence {A n } n=1 of elements of C such that µ(a n ) < for all n and Ω = n=1 A n. Then we say that µ is σ-finite on C. If µ is σ-finite on F, we merely say that µ is σ-finite. Example 14 Let Ω = ZZ with F = 2 Ω and µ being counting measure. σ-finite. Counting measure on an uncountable space is not σ-finite. This measure is Exercise 4 Prove the claims in Example 14. 3.1 Basic properties of measures There are several useful properties of measures that are worth knowing. First, measures are countably subadditive in the sense that ( ) µ A n µ(a n ), (5) n=1 for arbitrary sequences {A n } n=1. The proof of this uses a standard trick for dealing with countable sequences of sets. Let B 1 = A 1 and let B n = A n \ n 1 i=1 B i for n > 1. The B n s are disjoint and have the same finite and countable unions as the A n s. The proof of Equation 5 relies on the additional fact that µ(b n ) µ(a n ) for all n. Next, if µ(a n ) = 0 for all n, it follows that µ ( n=1 A n) = 0. This gets used a lot in proofs. Similarly, if µ is a probability and µ(a n ) = 1 for all n, then µ ( n=1 A n) = 1. 4 n=1

Definition 6 (Almost sure/almost everywhere) Suppose that some statement about elements of Ω holds for all ω A C where µ(a) = 0. Then we say that the statement holds almost everywhere, denoted a.e. [µ]. If P is a probability, then almost everywhere is often replaced by almost surely, denoted a.s. [P ]. Example 15 Let (Ω, F, P ) be a probability space. Let {X n } n=1 be a sequence of functions a.s. from Ω to IR. To say that X n converges to X a.s. [P ] (denoted X n X) means that there is a set A with P (A) = 0 and lim n X n (ω) = X(ω) for all ω A C. Proposition 6 (Linearity) If µ 1, µ 2,... are all measures on (Ω, F) and if {a n } n=1 is a sequence of positive numbers, then n=1 a nµ n is a measure on (Ω, F). Exercise 7 Prove Proposition 6. 3.2 Monotone sequences of sets and limits of measure Definition 7 (Monotone sequences of sets) Let (Ω, F, µ) be a measure space. A sequence {A n } n=1 of elements of F is called monotone increasing if A n A n+1 for each n. It is monotone decreasing if A n A n+1 for each n. Lemma 16 Let (Ω, F, µ) be a measure space. Let {A n } n=1 be a monotone sequence of elements of F. Then lim n µ(a n ) = µ (lim n A n ) if either of the following hold: the sequence is increasing, the sequence is decreasing and µ(a k ) < for some k. Proof: Define A = lim n A n. In the first case, write B 1 = A 1 and B n = A n \ A n 1 for n > 1. Then A n = n B k for all n (including n = ). Then µ(a n ) = n µ(b k), and ( ) µ lim A n = µ(a ) = n µ(b k ) = lim n n µ(b k ) = lim n µ(a n ). In the second case, write B n = A n \ A n+1 for all n k. Then, for all n > k, A k \ A n = A k \ A = n 1 i=k B i, B i. i=k 5

By the first case, ( ) lim µ(a k \ A n ) = µ B i = µ(a k \ A ). n Because A n A k for all n > k and A A k, it follows that It now follows that lim n µ(a n ) = µ(a ). i=k µ(a k \ A n ) = µ(a k ) µ(a n ), µ(a k \ A ) = µ(a k ) µ(a ). Exercise 8 Construct a simple counterexample to show that the condition µ(a k ) < is required in the second claim of Lemma 16. 3.3 Uniqueness of Measures There is a popular method for proving uniqueness theorems about measures. The idea is to define a function µ on a convenient class C of sets and then prove that there can be at most one extension of µ to σ(c). Example 17 Suppose it is given that for any a IR, P ((, a]) = a 1 2π exp ( u 2 /2 ) du. Does that uniquely define a probability measure on the class of Borel subsets of the line, B 1? Definition 8 (π-system and λ-system) A collection A of subsets of Ω is a π-system if, for all A 1, A 2 A, A 1 A 2 A. A class C is a λ-system if Ω C, for each A C, A C C, for each sequence {A n } n=1 of disjoint elements of C, n=1 A n C. Example 18 The collection of all intervals of the form (, a] is a π-system of subsets of IR. So too is the collection of all intervals of the form (a, b] (together with ). The collection of all sets of the form {(x, y) : x a, y b} is a π-system of subsets of IR 2. So too is the collection of all rectangles with sides parallel to the coordinate axes. Some simple results about π-systems and λ-systems are the following. 6

Proposition 9 If Ω is a set and C is both a π-system and a λ-system, then C is a σ-field. Proposition 10 Let Ω be a set and let Λ be a λ-system of subsets. If A Λ and A B Λ then A B C Λ. Exercise 11 Prove Propositions 9 and 10. Lemma 19 (π λ theorem) Let Ω be a set and let Π be a π-system and let Λ be a λ- system that contains Π. Then σ(π) Λ. Proof: Define λ(π) to be the smallest λ-system containing Π. For each A Ω, define G A to be the collection of all sets B Ω such that A B λ(π). First, we show that G A is a λ-system for each A λ(π). To see this, note that A Ω λ(π), so Ω G A. If B G A, then A B λ(π), and Proposition 10 says that A B C λ(π), so B C G A. Finally, {B n } n=1 G A with the B n disjoint implies that A B n λ(π) with A B n disjoint, so their union is in λ(π). But their union is A ( n=1 B n). So n=1 B n G A. Next, we show that λ(π) G C for every C λ(π). Let A, B Π, and notice that A B Π, so B G A. Since G A is a λ-system containing Π, it must contain λ(π). It follows that A C λ(π) for all C λ(π). If C λ(π), it then follows that A G C. So, Π G C for all C λ(π). Since G C is a λ-system containing Π, it must contain λ(π). Finally, if A, B λ(π), we just proved that B G A, so A B λ(π) and hence λ(π) is also a π-system. By Proposition 9, λ(π) is a σ-field containing Π and hence must contain σ(π). Since λ(π) Λ, the proof is complete. The uniqueness theorem is the following. Theorem 20 (Uniqueness theorem) Suppose that µ 1 and µ 2 are measures on (Ω, F) and F = σ(π), for a π-system Π. If µ 1 and µ 2 are both σ-finite on Π and they agree on Π, then they agree on F. Proof: First, let C Π be such that µ 1 (C) = µ 2 (C) <, and define G C to be the collection of all B F such that µ 1 (B C) = µ 2 (B C). It is easy to see that G C is a λ-system that contains Π, hence it equals F by Lemma 19. (For example, if B G C, so B C G C.) µ 1 (B C C) = µ 1 (C) µ 1 (B C) = µ 2 (C) µ 2 (B C) = µ 2 (B C C), Since µ 1 and µ 2 are σ-finite, there exists a sequence {C n } n=1 Π such that µ 1 (C n ) = µ 2 (C n ) <, and Ω = n=1 C n. (Since Π is only a π-system, we cannot assume that the C n are disjoint.) For each A F, ( n ) µ j (A) = lim µ j [C i A] for j = 1, 2. n i=1 7

Since µ j ( n i=1 [C i A]) can be written as a linear combination of values of µ j at sets of the form A C, where C Π is the intersection of finitely many of C 1,..., C n, it follows from A G C that µ 1 ( n i=1 [C i A]) = µ 2 ( n i=1 [C i A]) for all n, hence µ 1 (A) = µ 2 (A). Exercise 12 Return to Example 17. You should now be able to answer the question posed there. Exercise 13 Suppose that Ω = {a, b, c, d, e} and I tell you the value of P ({a, b}) and P ({b, c}). For which subset of Ω do I need to define P ( ) in order to have a unique extension of P to a σ-field of subsets of Ω? 4 Lebesgue Measure and Caratheodory s Extension Theorem Let F be a cdf (nondecreasing, right-continuous, limits equal 0 and 1 at and respectively). Let U be the field in Example 5 (unions of finitely many disjoint intervals). Define µ : U [0, 1] by µ(a) = n F (b k) F (a k ) when A = n (a k, b k ] and {(a k, b k ]} are disjoint. This set-function is well-defined and finitely additive. Is µ countably additive as probabilities are supposed to be? That is, if A = i=1 A i where the A i s are disjoint, each A i is a union of finitely many disjoint intervals, and A itself is the union of finitely many disjoint intervals (a k, b k ] for k = 1,..., n, does µ(a) = i=1 µ(a i)? First, take the collection of intervals that go into all of the A i s and split them, if necessary, so that each is a subset of at most one of the (a k, b k ] intervals. Then apply the following result to each (a k, b k ]. Lemma 21 Let (a, b] = (c k, d k ] with the (c k, d k ] s disjoint. Then F (b) F (a) = F (d k) F (c k ). Proof: Since (a, b] n (c k, d k ] for all n, it follows that F (b) F (a) n F (d k) F (c k ) (because (c k, d k ) s are disjoint), hence F (b) F (a) F (d k) F (c k ). We need to prove the opposite inequality. Suppose first that both a and b are finite. Let ɛ > 0. For each k, there is e k > d k such that F (d k ) F (e k ) F (d k ) + ɛ 2 k. Also, there is f > a such that F (a) F (f) ɛ. Now, the interval [f, b] is compact and [f, b] (c k, e k ). So there are finitely many (c k, e k ) s (suppose they are the first n) such that [f, b] n (c k, e k ). Now, n n F (b) F (a) F (b) F (f) + ɛ ɛ + F (e k ) F (c k ) 2ɛ + F (d k ) F (c k ). 8

It follows that F (b) F (a) 2ɛ + F (d k) F (c k ). Since this is true for all ɛ > 0, it is true for ɛ = 0. If = a < b <, let g > be such that F (g) < ɛ. The above argument shows that F (b) F (g) F (d k g) F (c k g) F (d k ) F (c k ). Since lim g F (g) = 0, it follows that F (b) F (d k) F (c k ). Similar arguments work when a < b = and = a < b =. In Lemma 21 you can replace F by an arbitrary nondecreasing right-continuous function with only a bit more effort. (See the supplement following at the end of this lecture.) The function µ defined in terms of a nondecreasing right-continuous function is a measure on the field U. There is an extension theorem that gives conditions under which a measure on a field can be extended to a measure on the generated σ-field. Furthermore, the extension is unique. Example 22 (Lebesgue measure) Start with the function F (x) = x, form the measure µ on the field U and extend it to the Borel σ-field. The result is called Lebesgue measure, and it extends the concept of length from intervals to more general sets. Example 23 Every distribution function for a random variable has a corresponding probability measure on the real line. Theorem 24 (Caratheodory Extension) Let µ be a σ-finite measure on the field C of subsets of Ω. Then µ has a unique extension to a measure on σ(a). Exercise 14 In this exercise, we prove Theorem 24. Note that the uniqueness of the extension is a direct consequence of Theorem 20. We only need to prove the existence. First, for each B 2 Ω, define µ (B) = inf µ(a i ), (15) where the inf is taken over all {A i } i=1 such that B i=1 A i and A i C for all i. Since C is a field, we can assume that the A i s are mutually disjoint without changing the value of µ (B). Let A = {B 2 Ω : µ (C) = µ (C B) + µ (C B C ), for all C 2 Ω }. i=1 Now take the following steps: 9

1. Show that µ extends µ, i.e. that µ (A) = µ(a) for each A C. 2. Show that µ is monotone and subadditive. 3. Show that C A. 4. Show that A is a field. 5. Show that µ is finitely additive on A. 6. Show that A is a σ-field. 7. Show that µ is countably additive on A. 10

Supplement: Measures from Increasing Functions Lemma 21 deals only with functions F that are cdf s. Suppose that F is an unbounded nondecreasing function that is continuous from the right. If < a < b <, then the proof of Lemma 21 still applies. Suppose that (, b] = (c k, d k ] with b < nd all (c k, d k ] disjoint. Suppose that lim x F (x) =. We want to show that F (d k) F (c k ) =. If one c k =, the proof is immediate, so assume that all c k >. Then there must be a subsequence {k j } j=1 such that lim j c kj =. For each j, let {(c j,n, d j,n]} n=1 be the subsequence of intervals that cover (c kj, b]. For each j, the proof of Lemma 21 applies to show that F (b) F (c kj ) = F (d j,n) F (c j,n). (16) n=1 As j, the left side of Equation 16 goes to while the right side eventually includes every interval in the original collection. A similar proof works for an interval of the form (a, ) when lim x F (x) =. A combination of the two works for (, ). 11