
Chapter 8 Measurable Functions and Random Variables

The relationship between two measurable quantities can, strictly speaking, not be found by observation. Carl Runge

What I don't like about measure theory is that you have to say almost everywhere almost everywhere. Karl Friedrichs

While writing my book [Stochastic Processes] I had an argument with Feller. He asserted that everyone said random variable and I asserted that everyone said chance variable. We obviously had to use the same name in our books, so we decided the issue by a stochastic procedure. That is, we tossed for it and he won. Joseph L. Doob

A panorama

Next, we investigate functions between measure spaces. Recall that the inverse of a map commutes with the set operations of intersection, union, and complement (Theorem 2.2.2). This makes it sensible to ask whether or not a map between two measurable spaces relates the σ-algebras of the respective spaces. Dealing with functions is the natural next step in analysis, if for no other reason than that functions are essential for applications. Recall that in Chapter 4 we use functions like the Rademacher functions to describe interesting events in probability.

We begin by explaining how the inverse of a function whose range is a measurable space induces a σ-algebra on its domain. If both the range and domain are measurable spaces, then a function is called measurable if the induced σ-algebra is a subset of the original σ-algebra. This concept is more general than continuity: continuous functions are measurable, but not every measurable function is continuous. Measurable functions also turn out to be the natural class of functions to use in measure theory, e.g. all nonnegative measurable functions can be integrated. One piece of strong evidence for this is that it is very difficult to get out of the set of measurable functions, e.g. by taking limits of sequences or by combining functions with arithmetic operations.

We prove a sequence of theorems that make this claim precise. Recall that one of the knocks against Riemann integration is that the limit of a sequence of Riemann integrable functions may not be Riemann integrable.

We then develop a number of other important properties of measurable functions. We show that a measurable function can be approximated by a sequence of particularly simple functions called, well, simple functions. When the measurable function is nonnegative, the sequence of simple functions can be made monotone. Simple functions are piecewise constant functions, but they are defined through the inverse image of a partition of the range of a measurable function, not a partition of the domain as with the standard piecewise constant functions used in Riemann integration. Such approximation results are fundamentally important for analysis, e.g. we use the approximation property to define integration through a limiting process. We also show that a general measurable function is close to a continuous function in the sense that they agree except on a set of small measure. Using that result, we show that a measurable function can be approximated by a sequence of continuous functions.

The last general result we show is a theorem that explains how the inverse of a measurable function on a measure space induces a measure on its range. Using this result, we define the concept of a random variable, which is a measurable function defined on a probability space. Of course, we already saw that random variables are a central concept in probability theory in Chapter 4. Random variables inherit all the properties of general measurable functions, of course. In particular, the induced measure of a random variable, called the law of the random variable, plays a key role in probability. Indeed, we also show that any given probability distribution corresponds to at least one random variable, so every probability distribution is the law of some random variable. We briefly discuss some familiar examples, e.g. the Cauchy and normal distributions and corresponding random variables. In the last part of this chapter, we extend the notion of random variables valued in ℝ to random vectors valued in ℝⁿ.

Almost all the proofs in this chapter follow in a straightforward way from the theory we have developed so far. The main exception is the result on approximation of measurable functions by sequences of simple functions, which is a major proof.

8.1 σ-algebras and maps

It turns out that an extremely important way to generate a σ-algebra from another σ-algebra is by means of the inverse of a map. For example, this plays a central role in probability. The reason this is possible is Theorem 2.2.2, which we restate:

Theorem 8.1.1. Let f be a map from a set X to a set Y and let {A_α : α ∈ I} be a family of subsets of Y. Then,
f⁻¹(Y \ A_α) = X \ f⁻¹(A_α),
f⁻¹(∪_α A_α) = ∪_α f⁻¹(A_α),
f⁻¹(∩_α A_α) = ∩_α f⁻¹(A_α).

The main result states that a map between spaces, where one of the spaces is measurable, implicitly defines a σ-algebra in the other one.

Theorem 8.1.2: σ-algebras Induced by Inverse Maps. Let f : X → Y be a map, where X and Y are nonempty sets.
1. If ℬ is a σ-algebra on Y, then f⁻¹(ℬ) := {f⁻¹(B) : B ∈ ℬ} is a σ-algebra on X.
2. If 𝒜 is a σ-algebra on X, then {B ⊂ Y : f⁻¹(B) ∈ 𝒜} is a σ-algebra on Y.
3. If 𝒞 is a collection of sets in Y, then σ(f⁻¹(𝒞)) = f⁻¹(σ(𝒞)).

Definition 8.1.1. The σ-algebras in 1. and 2. are said to be induced by f.

Note that the σ-algebra induced by f in 1. is on X while the σ-algebra induced by f in 2. is on Y.

Proof. Result 1. Let A ∈ f⁻¹(ℬ), i.e. A = f⁻¹(B) for some B ∈ ℬ. Then Aᶜ = (f⁻¹(B))ᶜ = f⁻¹(Bᶜ), so Aᶜ ∈ f⁻¹(ℬ). Let {A_i} be a sequence of sets in f⁻¹(ℬ). Then A_i = f⁻¹(B_i) for B_i ∈ ℬ and ∪_i A_i = ∪_i f⁻¹(B_i) = f⁻¹(∪_i B_i) ∈ f⁻¹(ℬ).

Result 2. If B is in the collection, f⁻¹(B) ∈ 𝒜, hence (f⁻¹(B))ᶜ = f⁻¹(Bᶜ) ∈ 𝒜, so Bᶜ is in the collection. Exercise: show that a countable union is treated in the same fashion.

Result 3. We first show σ(f⁻¹(𝒞)) ⊂ f⁻¹(σ(𝒞)). If B ∈ 𝒞 then B ∈ σ(𝒞), hence f⁻¹(𝒞) ⊂ f⁻¹(σ(𝒞)). By 1., f⁻¹(σ(𝒞)) is a σ-algebra, so σ(f⁻¹(𝒞)) ⊂ f⁻¹(σ(𝒞)). Next, we show f⁻¹(σ(𝒞)) ⊂ σ(f⁻¹(𝒞)) using Theorem 5.1.7. Define,
𝒟 = {B ∈ σ(𝒞) : f⁻¹(B) ∈ σ(f⁻¹(𝒞))}.
First, 𝒞 ⊂ 𝒟 since f⁻¹(𝒞) ⊂ σ(f⁻¹(𝒞)). Next, we show that 𝒟 is a σ-algebra. For example, assume B ∈ 𝒟. This means that f⁻¹(B) ∈ σ(f⁻¹(𝒞)), which in turn implies that Bᶜ ∈ 𝒟 since f⁻¹(Bᶜ) = (f⁻¹(B))ᶜ ∈ σ(f⁻¹(𝒞)). Countable unions are treated with the same approach. Thus, σ(𝒞) ⊂ 𝒟, which proves the result.
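On finite sets, the construction in Theorem 8.1.2 can be checked by brute force. The following sketch is an added illustration, not part of the original text: it computes f⁻¹(ℬ) = {f⁻¹(B) : B ∈ ℬ} for a map between finite sets and verifies the closure properties used in the proof of Result 1. The sets and map are those of Example 8.1.1 below; the helper names are my own.

```python
from itertools import combinations

X = {1, 2, 3, 4}
Y = {"a", "b"}
f = {1: "a", 2: "a", 3: "b", 4: "b"}

def preimage(B):
    """f^{-1}(B) = {x in X : f(x) in B}."""
    return frozenset(x for x in X if f[x] in B)

# Take the sigma-algebra on Y to be the power set of Y.
power_set_Y = [frozenset(s) for r in range(len(Y) + 1) for s in combinations(Y, r)]

# The induced sigma-algebra f^{-1}(B) of Theorem 8.1.2, part 1.
induced = {preimage(B) for B in power_set_Y}
print(sorted(sorted(A) for A in induced))
# [[], [1, 2], [1, 2, 3, 4], [3, 4]]

# Closure checks used in the proof of Result 1.
assert all(frozenset(X - A) in induced for A in induced)        # complements
assert all(A | B in induced for A in induced for B in induced)  # (finite) unions
```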

Example 8.1.1. Let X = {1,2,3,4} and Y = {a, b}. Define f : X → Y by f(1) = a, f(2) = a, f(3) = b, and f(4) = b. If we choose the power set ℬ = {∅, {a}, {b}, Y} for the σ-algebra on Y, the induced σ-algebra on X is f⁻¹(ℬ) = {∅, {1,2}, {3,4}, {1,2,3,4}}. If we choose the power set of X for the σ-algebra on X, the induced σ-algebra on Y is the power set of Y.

Example 8.1.2. Consider the set X with the σ-algebra 𝒜 defined in Example 5.1.7 and the set Y with the σ-algebra ℬ defined in Example 5.1.9, and f : X → Y with f(1) = a, f(2) = b, f(3) = c, and f(4) = d. The σ-algebra induced by f on Y is {B ⊂ Y : f⁻¹(B) ∈ 𝒜}. The σ-algebra induced by f on X is f⁻¹(ℬ) = {∅, {1}, {2}, {1,2}, {3,4}, {1,3,4}, {2,3,4}, {1,2,3,4}}.

Example 8.1.3. Let X = ℝ and Y = ℤ, and define f(x) = [x] = the greatest integer ≤ x. For i ∈ ℤ, f⁻¹(i) = [i, i+1), while f⁻¹(A) = ∪_{i ∈ A} [i, i+1) for A ⊂ ℤ. If we take the power set of ℤ as the σ-algebra on Y, then the induced σ-algebra on X is 𝒜 = {∅, countable unions of disjoint intervals of the form [i, i+1), and half rays (−∞, i) and [i, ∞), i ∈ ℤ}. In Example 5.1.5, we prove that this is a σ-algebra directly. If we take the Borel or Lebesgue σ-algebra on ℝ, then the induced σ-algebra on ℤ is the power set of ℤ. If instead we use the σ-algebra from Example 5.1.3, i.e., 𝒜 = {A : A is countable or Aᶜ is countable}, then the induced σ-algebra on ℤ is {∅, ℤ}.

8.2 Measurable functions

Example 8.1.3 shows that if f maps one measurable space (X, 𝒜) to another measurable space (Y, ℬ), the σ-algebra induced by f on X may not be the same as the original σ-algebra. We characterize maps between measurable spaces by the relation of the induced σ-algebra to the original σ-algebra on the input space. Assume (X, 𝒜) and (Y, ℬ) are measurable spaces. Where indicated, Y is ℝ or ℝⁿ and ℬ is the corresponding Borel σ-algebra.

Definition 8.2.1: Measurable function. A function f : X → Y is (𝒜, ℬ) measurable, or measurable, if f⁻¹(A) ∈ 𝒜 for all A ∈ ℬ.

So, f is measurable if the induced σ-algebra f⁻¹(ℬ) is a subset of 𝒜. Fortunately, it suffices to check the condition for a generating set.

Theorem 8.2.1. Assume that ℬ = σ(𝒞) for some collection 𝒞 of subsets of Y. A function f : X → Y is (𝒜, ℬ) measurable if and only if f⁻¹(A) ∈ 𝒜 for every A ∈ 𝒞.

Proof. The "only if" follows immediately from the definition. To show "if", consider the set 𝒟 = {A ⊂ Y : f⁻¹(A) ∈ 𝒜}, which is not empty since 𝒞 ⊂ 𝒟. By Theorem 8.1.2, 𝒟 is a σ-algebra on Y. Hence, ℬ = σ(𝒞) ⊂ 𝒟.

This makes it easier to prove measurability. For example, the measurable functions contain the functions we expect to be well-behaved.

Theorem 8.2.2. If X and Y are metric spaces, then every continuous map f : X → Y is (ℬ_X, ℬ_Y) measurable.

Proof. Recall from Definition 5.5.1 that ℬ_Y is generated by the collection of open sets in Y. Since f is continuous, f⁻¹(G) is open in X, and hence is in ℬ_X, for every open set G in Y. So, Theorem 8.2.1 implies the result.

Many applications of measure theory involve maps from a measure space into the real numbers, so the language is specialized for that case.

Definition 8.2.2. An extended real-valued function f on X is 𝒜 measurable, or measurable, if it is (𝒜, ℬ) measurable, where ℬ denotes the Borel sets of the extended real line. An extended real-valued function f : ℝᵐ → ℝ̄ is Borel measurable if it is (ℬ_ℝᵐ, ℬ) measurable. An extended real-valued function f : ℝᵐ → ℝ̄ is Lebesgue measurable if it is (ℒ_ℝᵐ, ℬ) measurable.

Note that Definition 8.2.2 does not define a notion of measurability for a map from a measurable space into the completion (ℝ, ℒ). In fact, such a notion is not very useful. This is not surprising, since we obtain the completion by adding previously nonmeasurable sets to the original σ-algebra, and we have little control over the inverse images of such sets. This has several annoying consequences. For example, it turns out that continuous functions would not necessarily be (ℒ, ℒ) measurable.

On occasion, it is useful to assume a complete domain for a measurable function. For example, intuition suggests measurability should not be affected by behavior on sets of measure 0. This is indeed true if the domain is complete.

Theorem 8.2.3. Assume (X, 𝒜, µ) is complete. If f is (𝒜, ℬ) measurable and f = g a.e., then g is also (𝒜, ℬ) measurable.

Proof. Let A = {x ∈ X : f(x) ≠ g(x)}, which by assumption has measure 0. Fix any B ∈ ℬ and consider the set g⁻¹(B) = {x ∈ X : g(x) ∈ B}. We can write this set as a disjoint union:

{x : g(x) ∈ B} = ({x : g(x) ∈ B} ∩ A) ∪ ({x : g(x) ∈ B} ∩ Aᶜ) = ({x : g(x) ∈ B} ∩ A) ∪ ({x : f(x) ∈ B} ∩ Aᶜ),
where we use the fact that f = g on X \ A. The first set on the right is a subset of A and therefore is in 𝒜 with measure 0, since the measure space is complete. The second set is in 𝒜 by assumption. Thus, g⁻¹(B) is in 𝒜 and g is (𝒜, ℬ) measurable.

Returning to the real-valued case, Theorems 8.2.1 and 6.2.2 imply,

Theorem 8.2.4. Let f be an extended real-valued function on X. Then, f is measurable if and only if f⁻¹(A) ∈ 𝒜 for all A in any one of the collections of sets that generate the Borel σ-algebra in Theorem 6.2.2; for example, for all open sets A, or all closed sets A, or all open intervals A, or all h-intervals A = (a, b], or all intervals A = [a, b), or all half rays A = (a, ∞), or all half rays A = [a, ∞), or all half rays A = (−∞, a), or all half rays A = (−∞, a].

Some examples.

Example 8.2.1. Consider the counting measure on a set X,
µ(A) = the number of elements in A if A is finite, and µ(A) = ∞ if A is infinite.
Every set is measurable, so any real-valued function on this space is measurable.

Example 8.2.2. Define an outer measure µ* on ℝ by µ*(A) = 0 if A = ∅ and µ*(A) = 1 otherwise. Then only ∅ and ℝ are measurable with respect to the induced measure. Consider f(x) = x. Since {x : f(x) > 0} = (0, ∞) is not measurable, f is not measurable. Note that f(x) = x is a measurable function for (ℝ, ℬ), just not for (ℝ, {∅, ℝ}).

Example 8.2.3. For a set of real numbers A, let f : (ℝ, 𝒜) → (ℝ, ℬ) be defined by f(x) = b for x ∈ A and f(x) = 0 for x ∉ A, for some constant b > 0.

Then,
{x : f(x) > a} = ∅ for a ≥ b, A for 0 ≤ a < b, and ℝ for a < 0.
Hence, f is measurable if and only if A ∈ 𝒜.

Example 8.2.4. Consider the real-valued function f on (ℝ, ℬ) defined by
f(x) = x² for x < 1, f(1) = 2, and f(x) = −2 − x for x > 1.
Then for a real number c,
{x : f(x) > c} = (−∞, −2−c) for c < −3,
(−∞, 1] for −3 ≤ c < 0,
(−∞, −√c) ∪ (√c, 1] for 0 ≤ c < 1,
(−∞, −√c) ∪ {1} for 1 ≤ c < 2,
(−∞, −√c) for c ≥ 2.
Hence, f is measurable.

We often want to consider measurability relative to a subset of a measure space.

Definition 8.2.3. Assume A ∈ 𝒜 and f : X → Y. f is measurable on A if f⁻¹(B) ∩ A ∈ 𝒜 for all B ∈ ℬ. We say that f|_A is (𝒜_A, ℬ) measurable, where 𝒜_A is the restricted σ-algebra on A yielded by Theorem 5.1.3.

The following results all have versions for a restriction to a measurable subset. The next result says that measurability is inherited through composition.

Theorem 8.2.5. If (X, 𝒜), (Y, ℬ) and (Z, 𝒞) are measurable spaces, f : X → Y is (𝒜, ℬ) measurable, and g : Y → Z is (ℬ, 𝒞) measurable, then the composition g∘f : X → Z is (𝒜, 𝒞) measurable.

Proof. Consider an arbitrary set A ∈ 𝒞. Then B = g⁻¹(A) ∈ ℬ and so f⁻¹(B) ∈ 𝒜. Thus, (g∘f)⁻¹(A) = f⁻¹(g⁻¹(A)) ∈ 𝒜.

A consequence is that any reasonable function of a measurable function is often measurable. For example,

Theorem 8.2.6. If f is 𝒜-measurable and g : ℝ → ℝ is continuous, then g∘f is 𝒜-measurable.

Example 8.2.5. If f is 𝒜-measurable, then f², cos(f), and exp(f) are 𝒜-measurable functions.

Remark 8.2.1. The composition of two Lebesgue measurable functions may not be Lebesgue measurable, even if one function is continuous.

Measurability is also preserved by arithmetic operations.

Theorem 8.2.7. If f and g are 𝒜-measurable functions, then
1. c₁f + c₂g is 𝒜-measurable for all c₁, c₂ ∈ ℝ.
2. f·g is 𝒜-measurable.
3. If g(x) ≠ 0 for all x, then f/g is 𝒜-measurable.

Proof. Result 1. Theorem 8.2.6 implies that cf is measurable for all c ∈ ℝ, so we just have to show that f + g is measurable. We check
{x : f(x) + g(x) < c} = {x : f(x) < c − g(x)} = ∪_i ({x : f(x) < r_i} ∩ {x : r_i < c − g(x)}),
where {r_i} is an enumeration of the rational numbers. Since the rationals are dense in ℝ and f, g are measurable, the sets on the right-hand side are measurable. Theorem 8.2.4 implies the result.

Result 2. This follows from the identity f·g = (1/2)[(f + g)² − f² − g²], applying Result 1 and Theorem 8.2.6.

Result 3. By Result 2, we only have to show that 1/g(x) is measurable. This follows from Theorem 8.2.6 and the assumption, since 1/x is continuous for x ≠ 0.

This implies,

Theorem 8.2.8.

8.2. Measurable functions 157 The set of -measurable functions is a vector space. Next, we consider sequences of measurable functions. Recall, Definition 8.2.4 Let { f i } be a sequence of extended real valued functions on. Then for all x, inf f i (x) = inf{ f i (x)} i sup f i (x) = sup{ f i (x)} i limsup f i (x) = inf { f i (x)} sup j 1 i> j liminf f i (x) = sup inf { f i (x)} i> j j 1 Measurability behaves very nicely with respect to sequences. Theorem 8.2.9 If { f i } is a sequence of extended real valued - measurable functions, then the functions, sup f i (x),inf f i (x),limsup f i (x),liminf f i (x) are - measurable. If f = lim f i exists for all x X, then f is - measurable. i Proof. Fix a. Let g 1 = sup f i so for each j, f j (x) g 1 (x) for all x. Choose any i function g with g(x) > g 1 (x), x. Then for each j, {x : f j (x) a} {x : g(x) a} and {x : f j (x) a} {x : f i (x) a} {x : g(x) a}. Hence, {x : g 1 (x) a} = {x : f i (x) a}. By Theorem 8.2.4, g 1 is - measurable. A similar argument shows that g 2 = inf f j is j - measurable (exercise). Now, liminf i f i = sup j inf{f i : i j } and limsup f i = infsup{f i : i j } are - j measurable by repeated application of the results for inf and sup. Finally, if f = lim i f i exists for all x then f = limsup i f i and hence f is - measurable. A particular application of this result yields the useful fact:

Theorem 8.2.10. If f, g are extended real-valued 𝒜-measurable functions, then so are max{f, g} and min{f, g}.

The following is an immediate consequence of Theorems 8.2.9 and 8.2.3.

Theorem 8.2.11. Assume the underlying measure space (X, 𝒜, µ) is complete. If {f_i} is a sequence of extended real-valued 𝒜-measurable functions and if lim_j f_j = f a.e., then f is 𝒜-measurable.

The next result turns out to be very useful in developing a theory of integration.

Definition 8.2.5. Let f be an extended real-valued function on X. Consider the sets
A⁺ = {x : f(x) ≥ 0} and A⁻ = {x : f(x) < 0}.
Then f can be decomposed into its positive and negative parts, f = f⁺ − f⁻, where f⁺ = max(f, 0) and f⁻ = max(−f, 0). Note the functions f⁺ and f⁻ are non-negative.

Theorem 8.2.12. If f is an extended real-valued function on X, then f is 𝒜-measurable if and only if f⁺, f⁻ are 𝒜-measurable.

Proof. If f is 𝒜-measurable then A⁺, A⁻ are measurable sets and hence f⁺, f⁻ are 𝒜-measurable (apply Theorem 8.2.10 with g = 0). If f⁺, f⁻ are 𝒜-measurable then f = f⁺ − f⁻ is the difference of two 𝒜-measurable functions and so is 𝒜-measurable.

So every measurable function can be written as the difference of two non-negative measurable functions. Since |f| = f⁺ + f⁻,

Theorem 8.2.13. If f is an extended real-valued 𝒜-measurable function, |f| is 𝒜-measurable.

It is natural to draw the conclusion that it is very hard to get out of the set of measurable functions.
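As a small added illustration of Definition 8.2.5, not part of the original text, the following sketch computes f⁺ = max(f, 0) and f⁻ = max(−f, 0) on a sample grid and confirms the identities f = f⁺ − f⁻ and |f| = f⁺ + f⁻ used above; the sample function is arbitrary.

```python
import math

def pos_part(y):
    return max(y, 0.0)

def neg_part(y):
    return max(-y, 0.0)

def f(x):
    return math.sin(3.0 * x) - 0.4        # any example function

xs = [k / 50.0 for k in range(-100, 101)]  # grid on [-2, 2]

assert all(abs(f(x) - (pos_part(f(x)) - neg_part(f(x)))) < 1e-12 for x in xs)
assert all(abs(abs(f(x)) - (pos_part(f(x)) + neg_part(f(x)))) < 1e-12 for x in xs)
print("f = f+ - f- and |f| = f+ + f- hold on the sample grid")
```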

8.3 Approximation by simple functions

The need to approximate a complex function by simpler functions is readily apparent in analysis, e.g. for the purpose of integration. In classic settings, we consider the approximation of smooth functions using Taylor polynomials and the approximation of continuous functions by polynomials (Theorem 3.3.1) or by piecewise polynomials as in Riemann integration. In each of these cases, the smoothness and/or continuity is essential to the construction of accurate approximations. The class of measurable functions contains the continuous functions, but is apparently much larger. This should raise a concern about whether or not it is possible to approximate a given measurable function by relatively simple functions. Recall that the Weierstrass Approximation Theorem implies that continuous functions can be approximated by polynomials. It turns out that it is possible to approximate measurable functions if the approximations are computed using a very clever idea. This result is extremely useful, e.g. we define the Lebesgue integral of a function by taking the limits of the integrals of a sequence of relatively simple functions.

Let (X, 𝒜) be a measurable space. The classic approximation used for Riemann integration is built upon intervals. In measure theory, we use measurable sets.

Definition 8.3.1: Characteristic function. For A ⊂ X, the characteristic function χ_A is defined by
χ_A(x) = 1 if x ∈ A, and χ_A(x) = 0 if x ∈ Aᶜ. (8.1)

In probability, the characteristic function of a set A is also called the indicator function of A and is denoted by I_A or 1_A.

It is easy to see that

Theorem 8.3.1. Let A ⊂ X. Then, χ_A is 𝒜-measurable if and only if A ∈ 𝒜.

There are a couple of useful properties of characteristic functions.

Theorem 8.3.2. Let A and B be measurable sets.
1. If A and B are disjoint, then χ_{A∪B} = χ_A + χ_B.
2. χ_A + χ_{Aᶜ} = 1.
3. χ_{A∩B} = χ_A · χ_B.

We define the functions used for approximation.

Definition 8.3.2: Simple function. A simple function s : X → ℝ is a finite linear combination of characteristic functions of measurable sets,
s(x) = Σ_{i=1}^k a_i χ_{A_i}(x),
where {A_i}_{i=1}^k is a collection of disjoint sets in 𝒜 such that X = ∪_{i=1}^k A_i and {a_i}_{i=1}^k is a set of real numbers.

Theorems 8.3.1 and 8.2.7 imply,

Theorem 8.3.3. A simple function on a measurable space is measurable.

The following alternate characterization of simple functions is useful,

Theorem 8.3.4. A function s : X → ℝ is a simple function if and only if s is a measurable function such that the range of s is a finite set of points in ℝ.

Proof. A simple function has a finite range, of course. For the other direction, assume s is measurable and has range {a_i : 1 ≤ i ≤ k}. Let A_i = s⁻¹({a_i}) for 1 ≤ i ≤ k. Then {A_i}_{i=1}^k is a disjoint collection of measurable sets and X = ∪_{i=1}^k A_i. If x ∈ A_i, then s(x) = a_i. Thus, s = Σ_{i=1}^k a_i χ_{A_i} is a simple function.

Note that a step function with a finite number of discontinuities is a simple function, but simple functions do not have to be step functions.

Example 8.3.1. χ_ℚ is a simple function.

Simple functions do not have a unique representation, as seen in the following example.

Example 8.3.2.

Let X = [0,3] and define
s(x) = 1 for 0 ≤ x ≤ 1, 2 for 1 < x ≤ 2, and 0 for 2 < x ≤ 3.
Let A₁ = [0,1], A₂ = (1,2], A₃ = (2,3]. Then,
s = 1·χ_{A₁} + 2·χ_{A₂} + 0·χ_{A₃}.
However, if we define B₁ = [0,1/2), B₂ = [1/2,1], B₃ = (1,2], B₄ = (2,3], then
s = 1·χ_{B₁} + 1·χ_{B₂} + 2·χ_{B₃} + 0·χ_{B₄}.

In the proof of Theorem 8.3.4, we use a special representation for s.

Definition 8.3.3: Standard representation. Given a simple function s, let the range of s be the set of distinct numbers {a_i}_{i=1}^k and define A_i = s⁻¹(a_i) for 1 ≤ i ≤ k. The standard representation of s is s = Σ_{i=1}^k a_i χ_{A_i}.

The reason for using this is,

Theorem 8.3.5. Each simple function has a unique standard representation.

Proof. Let the range of a given simple function s be the set of distinct numbers {a_i}_{i=1}^k. Define A_i = s⁻¹(a_i) for 1 ≤ i ≤ k. We observe that {A_i} is a disjoint collection of measurable sets such that X = ∪_i A_i. It is an exercise to show the standard representation is unique.

A fact that is crucial to building approximations using simple functions is:

Theorem 8.3.6. The set of simple functions is a vector space.

Proof. The definition implies that if c is a real number and s is a simple function, then c·s is a simple function. The constant function 0 is also simple. We assume that s₁ = Σ_{i=1}^k a_i χ_{A_i} and s₂ = Σ_{j=1}^l b_j χ_{B_j} are two simple functions in standard representation. First assume

l = 1, which implies that B₁ = X. Let C_i = A_i ∩ B₁ for 1 ≤ i ≤ k. Then, ∪_i C_i = X and
s₁ + s₂ = Σ_{j=1}^k c_j χ_{C_j}, where c_j = a_j + b₁ for 1 ≤ j ≤ k.
We extend this to l ≥ 1 by using the collection
{A_i ∩ B_j : 1 ≤ i ≤ k, 1 ≤ j ≤ l},
and using the sum a_i + b_j on the corresponding set. Thus,
s₁ + s₂ = Σ_{i,j} (a_i + b_j) χ_{A_i ∩ B_j},
and so is simple.

To make notation simpler, we use the following convention,

Definition 8.3.4. If f and g are functions from a set X into the extended reals, we write f ≤ g to indicate f(x) ≤ g(x) for all x.

We now state the fundamental approximation result for measurable functions.

Theorem 8.3.7: Approximation by simple functions. Let f be an extended real-valued 𝒜-measurable function.
1. If f is nonnegative, there is a sequence {s_i} of simple functions such that
0 ≤ s₁ ≤ s₂ ≤ s₃ ≤ ⋯ ≤ f,
and s_i → f pointwise.
2. There is a sequence {s_i} of simple functions such that
0 ≤ |s₁| ≤ |s₂| ≤ |s₃| ≤ ⋯ ≤ |f|,
and s_i → f pointwise.
In either case, the convergence is uniform on any measurable set on which f is bounded.

This theorem says that any nonnegative measurable function can be approximated from below by a sequence of simple functions. This kind of monotone approximating sequence is a particularly powerful tool in analysis, with the limiting function typically inheriting many properties of the functions in the approximating sequence. This motivates a general four-step approach for proving many desirable properties about measurable functions and their integrals, as discussed in the next chapter. Specifically, we first prove that a desired property holds for characteristic functions, which is often trivial. Then,

usually by exploiting linearity, we show that the desired property holds for simple functions. In the third step, we apply the first part of the above theorem, often in conjunction with other results, to extend the desired property to nonnegative measurable functions. Finally, the second part of the theorem is applied to extend the desired property to general measurable functions. We illustrate this technique in some of the worked problems at the end of this chapter.

Before beginning the proof, we emphasize the new idea behind the approximation. The theory of Riemann integration is built on the approximation of a function f by piecewise constant functions defined on partitions of the domain of f into disjoint intervals. The advantage of this approach is that the approximations are easy to compute. The disadvantage is that an arbitrary partition may not align well with the behavior of f, e.g., f may have rapid changes or discontinuities inside any given subinterval of a partition. To deal with this, we have to restrict attention to the collection of functions whose behavior can be described on partitions of decreasing size, e.g. by assuming the function is continuous. In some sense, this is the source of weakness in the Riemann theory of integration. In the Lebesgue approach, we partition the range of a function f into regular intervals and then build an approximation based on the inverse images of the subintervals in the partition of the range. These are all measurable sets if f is measurable. This approach yields approximations with many good properties, though the approximations themselves are significantly more complicated to compute.

Proof. Result 1. We approximate the values of f in a bounded interval and treat the rest of the values of f as "big". We partition the range of f disjointly as [0, ∞] = [0, m] ∪ (m, ∞] for integer m ≥ 1. We further partition [0, m] into m·2^m intervals
I_0^m = [0, 2^(-m)], I_i^m = (i·2^(-m), (i+1)·2^(-m)], 1 ≤ i ≤ m·2^m − 1.
Then,
f⁻¹([0, ∞]) = f⁻¹([0, m]) ∪ f⁻¹((m, ∞]) = (∪_{i=0}^{m·2^m−1} f⁻¹(I_i^m)) ∪ f⁻¹((m, ∞]).
With
A_i^m = f⁻¹(I_i^m), 0 ≤ i ≤ m·2^m − 1, and A_m = f⁻¹((m, ∞]),
we define the simple function
s_m = Σ_{i=0}^{m·2^m−1} i·2^(-m) χ_{A_i^m} + m·χ_{A_m}. (8.2)
We approximate the values of f on A_i^m by i·2^(-m) and on A_m by m. Each interval I_i^m splits into 2 intervals of length 2^(-(m+1)) at the next level m+1, and correspondingly A_i^m splits into two measurable sets; hence when x ∈ A_i^m, s_m(x) ≤ s_{m+1}(x). When f(x) ∈ (m, m+1], then s_m(x) = m whereas s_{m+1}(x) ≥ m. When f(x) ∈ (m+1, ∞], then s_{m+1}(x) = m+1 ≥ s_m(x) = m. For x in the set {x : f(x) ≤ m}, f(x) ∈ I_i^m for some 0 ≤ i ≤ m·2^m − 1. Thus, f(x) − s_m(x) ≤ (i+1)·2^(-m) − i·2^(-m) = 2^(-m). We can make this arbitrarily small by taking m large. Suppose f is bounded on a measurable set A, so there is an integer M such that f(x) < M for all x ∈ A. Then for all x ∈ A, f(x) − s_m(x) < 2^(-m) for m ≥ M. We can make the difference arbitrarily small by taking m large. Thus, the convergence is uniform.

Result 2. This follows by applying Result 1 to f⁺ and f⁻ separately and setting s_i = s_i⁺ − s_i⁻, where {s_i⁺} and {s_i⁻} are the resulting approximating sequences.
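The staircase construction in Result 1 is concrete enough to compute directly. The following sketch is an added illustration rather than part of the proof: it evaluates s_m from (8.2) pointwise for a sample nonnegative function, and checks the monotonicity s_m ≤ s_{m+1} and the 2^(-m) error bound on a set where f is bounded. The function f used here is only an example.

```python
import math

def s_m(f, x, m):
    """Evaluate the simple function s_m of (8.2) at x for a nonnegative f.

    s_m(x) = i * 2**-m if f(x) lies in I^m_i = (i*2**-m, (i+1)*2**-m],
    s_m(x) = 0 if f(x) = 0, and s_m(x) = m if f(x) > m.
    """
    y = f(x)
    if y > m:
        return float(m)
    if y == 0.0:
        return 0.0
    i = math.ceil(y * 2**m) - 1      # index of the dyadic interval containing f(x)
    return i * 2.0**(-m)

def f(x):
    return x * x                     # sample nonnegative measurable function

xs = [k / 100.0 for k in range(201)]  # grid on [0, 2]; f is bounded by 4 here
for m in (1, 2, 4, 8):
    max_err = max(f(x) - s_m(f, x, m) for x in xs)
    monotone = all(s_m(f, x, m) <= s_m(f, x, m + 1) for x in xs)
    # once m >= 4 (so f <= m on the grid), the error is at most 2**-m
    print(f"m={m}: monotone={monotone}, max error={max_err:.6f}")
```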

Example 8.3.3. For f(x) = |x|, I_0^m = [0, 2^(-m)] and I_i^m = (i·2^(-m), (i+1)·2^(-m)] for 1 ≤ i ≤ m·2^m − 1. Then,
A_i^m = f⁻¹(I_i^m) = [−(i+1)·2^(-m), −i·2^(-m)) ∪ (i·2^(-m), (i+1)·2^(-m)]
and A_m = f⁻¹((m, ∞)) = (−∞, −m) ∪ (m, ∞). Then, the approximation is defined by (8.2). We plot s₂ in Figure 8.1. This approximation is fundamentally different from the usual piecewise constant approximation used for Riemann integration because the subsets of the x-axis used to build the approximation consist of pairs of disjoint intervals.

Figure 8.1. Plot of s₂ for f(x) = |x|.

We also have a converse that follows from Theorem 8.2.11.

Theorem 8.3.8. If {s_i} is a sequence of simple 𝒜-measurable functions such that 0 ≤ s₁ ≤ s₂ ≤ ⋯ and lim_i s_i(x) = f(x) a.e., then f is 𝒜-measurable.

Theorem 8.3.8 says that the set of simple measurable functions is dense in the set of all measurable functions. This provides another way to define the set of measurable functions. Namely, we can define the set of measurable functions to be the completion of the set of simple functions. In this approach, properties of simple functions are verified directly and then extended to the limits of sequences of simple functions.

8.4 Measurability and continuity

In this section, we explore the relation between measurability and continuity. This is an interesting issue in itself. Moreover, measurability is the basis for the Lebesgue integral while continuity is bound closely to the Riemann integral, so this is useful when we explore the relation of the two integrals. We begin by proving a classic theorem that states that measurable functions are nearly continuous.

Theorem 8.4.1: Lusin's Theorem. Let (X, 𝒜, µ) be a measure space, where X ⊂ ℝⁿ is bounded and µ is a finite Borel measure. Let f : (X, 𝒜) → (ℝ, ℬ) be measurable. For any ε > 0, there is a compact set F ⊂ X such that µ(X \ F) < ε and f is continuous on F.

Again, we find that it is the behavior on sets of small measure that distinguishes measurable functions from continuous functions.

Remark 8.4.1. Lusin proved a version of this theorem for measure on the real line. We state this result in terms of Euclidean spaces, but it has a quite general analog. We restrict the result to bounded sets in order to give a particularly elementary proof. This can be weakened. Note that any countable dense set of points can be used to build the approximation in the proof, but using rationals is convenient for simplifying notation.

Proof. Let {r_i} be an enumeration of ℚ. For integer j > 0, define the approximating function f_j such that for each x,
f_j(x) = r_i, where i = inf{k : |f(x) − r_k| < 1/j}.
f_j approximates f on the set A_i^j = f⁻¹({y : |y − r_i| < 1/j}) by the value r_i. f_j is defined for every x because ℚ is dense in ℝ. Moreover, f_j is measurable since A_i^j is measurable for all i and j. Finally, f_j satisfies |f_j(x) − f(x)| < 1/j for all x, so f_j → f uniformly on X.

We truncate to create an approximation. This means that f is not approximated for some values of x because those function values are not close to any of the chosen rational numbers. We control the measure of the set of such x. For each j, choose an integer N_j sufficiently large so that
µ({x : |f(x) − r_i| ≥ 1/j for all i ≤ N_j}) < 2^(-j).
By the regularity of Lebesgue measure (Theorem 6.5.1), for i = 1, 2, …, N_j, there is a compact set F_i^j ⊂ f_j⁻¹(r_i) with
µ(f_j⁻¹(r_i) \ F_i^j) < 2^(-j)/N_j.
For each j, the sets {F_i^j}_{i=1}^{N_j} are disjoint. Set F^j = ∪_{i=1}^{N_j} F_i^j. f_j is continuous on F^j since it is constant on each F_i^j. The choice of N_j and F_i^j implies that
µ(X \ F^j) < N_j · 2^(-j)/N_j + 2^(-j) = 2·2^(-j).
For integer k > 0, let F_k = ∩_{j=k}^∞ F^j. F_k is compact for each k and µ(X \ F_k) ≤ 4·2^(-k) because X \ F_k = ∪_{j=k}^∞ (X \ F^j). We choose k so large that 4·2^(-k) < ε and set F = F_k.

F ⊂ F^j for all j ≥ k. {f_j}_{j=k}^∞ is a sequence of continuous functions that converges uniformly to f on the compact set F, hence f is continuous on F.

We can also approximate measurable functions by sequences of continuous functions in measure.

Theorem 8.4.2: Approximation by Continuous Functions. Let (X, 𝒜, µ) be a measure space, where X ⊂ ℝⁿ is bounded and µ is a finite Borel measure. Let f : (X, 𝒜) → (ℝ, ℬ) be measurable. There is a sequence of continuous functions {g_i} on X such that for any δ > 0,
lim_{i→∞} µ({x : |f(x) − g_i(x)| ≥ δ}) = 0.

Proof. Following the proof of Theorem 8.4.1, for each integer i > 0, we choose a compact set F_i and a function g_i such that g_i is continuous on F_i (and can be extended to a continuous function on X), g_i = f on F_i, and µ(X \ F_i) < 1/i. {g_i} is the desired sequence.

8.5 The measure induced by a measurable function

The property of measurability is defined in terms of reconciling σ-algebra structures, but a measurable function also induces a measure on its range. This is a central concept for probability theory. Let (X, 𝒜, µ_X) be a measure space and (Y, ℬ) be a measurable space.

Theorem 8.5.1: Measure Induced by a Measurable Function. Let f be an (𝒜, ℬ) measurable function. For any A ∈ ℬ, set µ_f(A) = µ_X(f⁻¹(A)). Then µ_f is a measure on (Y, ℬ).

Definition 8.5.1. The measure µ_f associated with an (𝒜, ℬ) measurable function f is called the measure induced by f, the pull back measure for f, and the image measure.

Proof. Since f⁻¹(∅) = ∅, µ_f(∅) = 0. Let {A_i} be a sequence of pairwise disjoint sets in ℬ.

Then,
µ_f(∪_i A_i) = µ_X(f⁻¹(∪_i A_i)) = µ_X(∪_i f⁻¹(A_i)) = Σ_i µ_X(f⁻¹(A_i)) = Σ_i µ_f(A_i),
where we use the fact that the sets f⁻¹(A_i) are pairwise disjoint. Hence, (Y, ℬ, µ_f) is a measure space.

Example 8.5.1. Consider the function f(x) = x² mapping ([−1,1], ℒ_{[−1,1]}, µ_L) to ([0,1], ℬ_{[0,1]}); see Figure 8.2. For any h-interval with 0 ≤ a ≤ b ≤ 1,
f⁻¹((a, b]) = [−√b, −√a) ∪ (√a, √b].
Hence, µ_f((a, b]) = 2(√b − √a).

Figure 8.2. Illustration of the inverse image of an h-interval for Example 8.5.1.

Example 8.5.2. Consider f(x₁, x₂) = x₁² + x₂² mapping (B₁(0), ℬ_{B₁(0)}) to ([0,1], ℬ_{[0,1]}), where B₁(0) is the circle of radius 1 centered at the origin. f⁻¹(y) is a circle of radius √y centered at the origin and f⁻¹((a, b]) is the annulus with inner radius √a and outer radius √b for 0 ≤ a ≤ b ≤ 1. See Figure 8.3. If we assume the uniform probability measure on B₁(0), which is µ_L/π, then
µ_f((a, b]) = (1/π)(π(√b)² − π(√a)²) = b − a.

Figure 8.3. Illustration of the inverse image of an h-interval for Example 8.5.2.
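A quick numerical sanity check of Example 8.5.1, added here as an illustration (the sampling scheme and names are my own, not from the text): since µ_L([−1,1]) = 2, the induced measure µ_f((a, b]) can be approximated by 2 times the fraction of uniform samples from [−1,1] whose squares land in (a, b], and compared with the exact value 2(√b − √a).

```python
import random

def induced_measure_estimate(a, b, n=200_000, seed=1):
    """Monte Carlo estimate of mu_f((a,b]) for f(x) = x**2 on ([-1,1], Lebesgue).

    mu_f((a,b]) = mu_L(f^{-1}((a,b])); since mu_L([-1,1]) = 2, this is
    2 times the fraction of uniform samples x in [-1,1] with a < x**2 <= b.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if a < rng.uniform(-1.0, 1.0) ** 2 <= b)
    return 2.0 * hits / n

a, b = 0.25, 0.64
print(induced_measure_estimate(a, b))   # Monte Carlo estimate
print(2 * (b ** 0.5 - a ** 0.5))        # exact value 2*(sqrt(b) - sqrt(a)) = 0.6
```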

Example 8.5.3. In this example, we construct maps from ([0,1], ℒ_{[0,1]}, µ_L) to a subspace of C([0,1]). In particular, for a given m, we define a map taking x ∈ [0,1] to a set of Bernstein polynomials of degree m or less. Assuming we use the nonterminating binary expansion when possible, x = .x₁x₂x₃… has a unique binary expansion. Given m, define the map
f_m(x) = b_{m,x}(t) = Σ_{i=0}^m (S_i(x)/m) p_{m,i}(t), 0 ≤ t ≤ 1,
where S_i(x) = Σ_{j=1}^i x_j counts the number of heads in the first i tosses in the Bernoulli sequence corresponding to x, and S₀(x) = 0. This is equivalent to choosing the Bernstein polynomial b_{m,x} corresponding to a function that has values {S_i(x)/m} at {i/m}. We show images of f_m(x) for two different x and a sequence of m in Figure 8.4. The polynomials have value 0 at 0 by construction. The Weak Law of Large Numbers implies the value at 1 should be around 1/2 for most polynomials corresponding to large m.

To compute the inverse, we let B_m denote the set of Bernstein polynomials of degree m or less in the range of f_m. Since S_i(x)/m ∈ {0, 1/m, 2/m, …, 1} and S_i(x) ≤ S_{i+1}(x) for all x, there is a one-to-one and onto correspondence between B_m and the collection of nondecreasing finite sequences of m+1 integers beginning with 0 and with the other values drawn from {0, 1, …, m}. So, B_m has finite cardinality. Suppose that b_m ∈ B_m has coefficients {y_k}_{k=0}^m ⊂ {0, 1/m, 2/m, …, 1} with respect to the basis {p_{m,k}} for B_m. Since S_i(x) = S_{i−1}(x) + δ_i(x), where δ_i(x) = 1 if x_i = 1 and is 0 otherwise, specifying {y_k} determines the first m digits of a binary expansion .x₁x₂…x_m. Therefore, f_m⁻¹(b_m) is the set of numbers x ∈ [0,1] whose first m digits are .x₁x₂…x_m, namely the h-interval
(.x₁x₂…x_{m−1}x_m, .x₁x₂…x_{m−1}x_m + 2^(-m)].
Therefore, the inverse image of any particular Bernstein polynomial in B_m is measurable. Since B_m has finite cardinality and all members of the set are measurable, the induced σ-algebra is the power set of B_m. Given a set A ⊂ B_m with cardinality |A| < ∞, we have µ_f(A) = |A|·2^(-m).

Figure 8.4. Bernstein polynomials f_m(x) corresponding to a sequence of m for two different x.
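The map f_m of Example 8.5.3 is easy to evaluate numerically. The following sketch is an added illustration with hypothetical helper names: it extracts the first m binary digits of x, forms the partial sums S_i, and evaluates the polynomial, assuming p_{m,i}(t) = C(m,i) t^i (1−t)^(m−i) is the standard Bernstein basis. For simplicity it uses the terminating binary expansion, whereas the example specifies the nonterminating one.

```python
from math import comb

def bernstein_map(x, m, t):
    """Evaluate f_m(x)(t) = sum_i (S_i(x)/m) * p_{m,i}(t) from Example 8.5.3."""
    digits = []
    y = x
    for _ in range(m):          # first m binary digits of x
        y *= 2
        d = int(y)
        digits.append(d)
        y -= d
    S = [0]
    for d in digits:
        S.append(S[-1] + d)     # S_0 = 0, S_i = x_1 + ... + x_i
    return sum((S[i] / m) * comb(m, i) * t**i * (1 - t)**(m - i)
               for i in range(m + 1))

# Value at t = 0 is S_0/m = 0 by construction; at t = 1 it is S_m/m,
# the fraction of 1s among the first m binary digits of x.
print(bernstein_map(0.6, 8, 0.0), bernstein_map(0.6, 8, 1.0))
```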

8.6 Random variables and their distributions

We turn to the discussion of measurable functions on probability spaces, which, in what has to be one of the stranger abuses of mathematical language, are known as random variables. Let (Ω, ℱ, P) be a probability space. All measurable functions are defined on this space.

Definition 8.6.1: Random variable. A random variable (r.v.) is an extended real-valued measurable function. "r.v." is used to indicate the plural as well as the singular. R.v. are denoted with capital letters.

Example 8.6.1. In the probability model for Bernoulli sequences, we use the r.v. S_i, R_i and W_i. We note that the probability questions we address in Chapter 4 are all expressed in terms of conditions on the values of one or more of these r.v., i.e. in terms of inverses of r.v.

As always, we generally discard behavior on sets of measure zero.

Definition 8.6.2: Equal almost surely. Assume that X and Y are two r.v. Then X = Y almost surely (a.s.) if P({ω ∈ Ω : X(ω) ≠ Y(ω)}) = 0.

We restate the key properties of measurable functions in terms of r.v.

Theorem 8.6.1: Properties of Random Variables.
1. If X is a r.v. and g : ℝ → ℝ is Borel measurable, then g∘X is a r.v. In particular, this applies for continuous g.
2. If X and Y are r.v., then X + Y, X − Y, and X·Y are r.v. Also, cX is a r.v. for any constant c. If Y is never zero, then X/Y is a r.v.
3. The set of r.v. is a vector space.
4. If {X_i} is a sequence of r.v., then the functions sup X_i(ω), inf X_i(ω), limsup X_i(ω), liminf X_i(ω) are r.v. If X = lim_i X_i exists a.s., then X is a r.v.
5. If X and Y are r.v., then max{X, Y} and min{X, Y} are r.v.
6. If X is a r.v., then X⁺, X⁻, and |X| are r.v.
7. Let X be a r.v.
   1. If X is nonnegative, there is a sequence {S_i} of simple r.v. such that 0 ≤ S₁ ≤ S₂ ≤ S₃ ≤ ⋯ ≤ X, and S_i → X pointwise.
   2. There is a sequence {S_i} of simple r.v. such that 0 ≤ |S₁| ≤ |S₂| ≤ |S₃| ≤ ⋯ ≤ |X|, and S_i → X pointwise.
   In either case, the convergence is uniform on any set on which X is bounded.
   3. Assume in addition that Ω is bounded and let X be a r.v. For any ε > 0, there is a compact set K ⊂ Ω with P(Ω \ K) < ε such that X is continuous on K.

We now explore the close link between random variables, probability measures, and distribution functions. Recall that Theorem 8.5.1 implies that a measurable function on a measure space induces a measure on its range. This is the r.v. version.

Theorem 8.6.2. Let X be a r.v. and let P_X be the induced measure on ℝ, defined by P_X(A) = P(X⁻¹(A)) for A ∈ ℬ. Then, (ℝ, ℬ, P_X) is a probability space.

Proof. By Theorem 8.5.1, we know that (ℝ, ℬ, P_X) is a measure space. Moreover, P_X(ℝ) = P(X⁻¹(ℝ)) = P(Ω) = 1.

Definition 8.6.3: Law of a random variable. Let X be a r.v. The probability measure P_X is called the law of X and is induced by X.
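To make Definition 8.6.3 concrete, here is a small added sketch (not from the text) that computes the law P_X of a random variable on a finite probability space by summing P over the inverse images X⁻¹({a}); the two-coin space is just an example.

```python
from collections import defaultdict
from fractions import Fraction

# A finite probability space: outcomes of two fair coin tosses.
Omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
P = {omega: Fraction(1, 4) for omega in Omega}

def X(omega):
    """The random variable X = number of heads."""
    return sum(1 for toss in omega if toss == "H")

# The law P_X: for each value a, P_X({a}) = P(X^{-1}({a})).
law = defaultdict(Fraction)
for omega in Omega:
    law[X(omega)] += P[omega]

print(dict(law))   # P_X({0}) = 1/4, P_X({1}) = 1/2, P_X({2}) = 1/4
```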

Example 8.6.2. Consider the function Y(ω) = √ω mapping ([0,1], ℒ_{[0,1]}, µ_L) to ([0,1], ℬ_{[0,1]}). For any h-interval with 0 ≤ a ≤ b ≤ 1, Y⁻¹((a, b]) = (a², b²]. Hence, P_Y((a, b]) = b² − a².

Example 8.6.3. Consider Y(ω₁, ω₂) = √(ω₁² + ω₂²) mapping (B₁(0), ℬ_{B₁(0)}) to ([0,1], ℬ_{[0,1]}), where B₁(0) is the circle of radius 1 centered at the origin. Y⁻¹((a, b]) is the annulus with inner radius a and outer radius b for 0 ≤ a ≤ b ≤ 1. If we assume the uniform probability measure on B₁(0), which is µ_L/π, then
P_Y((a, b]) = (1/π)(πb² − πa²) = b² − a².

On a complete measure space, altering the values of a r.v. on a set of probability zero does not affect the distribution.

Theorem 8.6.3. Assume that (Ω, ℱ, P) is complete. If X and Y are r.v. with X = Y a.s., then X and Y have the same distribution.

The existence of a distribution for a r.v. provides a new way to equate two r.v.

Definition 8.6.4: Equal in distribution. Two r.v. X and Y are equal in distribution if P_X(A) = P_Y(A) for all A ∈ ℬ.

Theorem 8.6.3 implies that two r.v. that are equal a.s. are equal in distribution. But two r.v. can be equal in distribution and not equal a.s. Indeed, two r.v. that are equal in distribution may not even be defined on the same probability space!

Example 8.6.4. Let Ω = {0,1}, let ℱ be the power set of Ω, and define P so that P(0) = P(1) = 1/2. Define the r.v. on Ω,
X(ω) = 0 for ω = 0 and 1 for ω = 1; Y(ω) = 1 for ω = 0 and 0 for ω = 1.
Both X and Y map (Ω, ℱ, P) to (Ω, ℱ). By definition, P_X(·) = P(X⁻¹(·)) and P_Y(·) = P(Y⁻¹(·)). This implies that P_X(0) = P_Y(0) = 1/2 and P_X(1) = P_Y(1) = 1/2. However, X ≠ Y.
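Example 8.6.4 can be checked mechanically. The following added sketch (my own illustration, not part of the text) verifies that X and Y differ at every outcome yet induce the same measure, i.e. they are equal in distribution but not equal a.s.

```python
from fractions import Fraction

Omega = [0, 1]
P = {0: Fraction(1, 2), 1: Fraction(1, 2)}

X = {0: 0, 1: 1}   # X(omega) = omega
Y = {0: 1, 1: 0}   # Y(omega) = 1 - omega

def law(Z):
    """Induced measure: law(Z)[a] = P(Z^{-1}({a}))."""
    out = {}
    for omega in Omega:
        out[Z[omega]] = out.get(Z[omega], Fraction(0)) + P[omega]
    return out

print(law(X) == law(Y))                      # True: equal in distribution
print(all(X[w] == Y[w] for w in Omega))      # False: X != Y pointwise
```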

Since a r.v. X induces a new probability space (ℝ, ℬ, P_X), we can define a new r.v. Y on (ℝ, ℬ, P_X). It is natural to wonder about the connection between such a Y and the composition Y∘X. There is a remarkable relationship.

Theorem 8.6.4. Let X be a r.v. with distribution P_X. Let Y be a measurable function from (ℝ, ℬ) to (ℝ, ℬ). Then, Y is a r.v. on (ℝ, ℬ, P_X) with the same distribution as the r.v. Y∘X on (Ω, ℱ, P).

Proof. This is a good exercise in using the ideas underlying Theorem 8.2.5.

Example 8.6.5. In Example 8.5.2, we deal with the r.v. X(ω₁, ω₂) = ω₁² + ω₂² mapping (B₁(0), ℬ_{B₁(0)}, P) to ([0,1], ℬ_{[0,1]}), where P = µ_L/π. We find that P_X = µ_L is the uniform distribution. In Example 8.6.3, we consider the composition Y∘X, where Y(y) = √y maps ([0,1], ℬ_{[0,1]}) to ([0,1], ℬ_{[0,1]}). We compute P_{Y∘X}((a, b]) = b² − a². In Example 8.6.2, we consider the r.v. Y(y) = √y mapping ([0,1], ℒ_{[0,1]}, µ_L) to ([0,1], ℬ_{[0,1]}) and found that P_Y((a, b]) = b² − a².

Recall that Theorem 7.2.1 says that each probability measure on (ℝ, ℬ) is associated with a cdf. Applying this to the probability space induced by a r.v., we get

Definition 8.6.5: Cumulative distribution function. Let X be a r.v. The cumulative distribution function (cdf) or probability distribution function F_X of X is defined by F_X(x) = P({ω : X(ω) ≤ x}) for x ∈ ℝ.

Example 8.6.6: Delta distribution. Suppose X(ω) = a for all ω. Then,
F(x) = 0 for x < a and 1 for x ≥ a.
This is called the unit point mass or delta distribution at a.

We summarize key properties of a distribution function.

Theorem 8.6.5: Properties of a probability distribution function. Let X be a r.v. with distribution function F_X. Then, F_X is a non-decreasing, right continuous, real-valued function with left-hand limits given by
F_X(x⁻) = lim_{x_i ↑ x} F_X(x_i).
In addition, lim_{x → −∞} F_X(x) = 0 and lim_{x → ∞} F_X(x) = 1. Finally, F_X has at most a countable number of discontinuities.

Proof. Exercise.

So, each r.v. defines a distribution function. The following result establishes a converse, namely we can associate a particular random variable with a given distribution function. This turns out to be important for generating random numbers from a specified distribution, among a number of other applications.

Theorem 8.6.6: A Theorem of Skorohod. Let F : ℝ → [0,1] be a cdf. There is a r.v. X : [0,1] → ℝ defined on the probability space ([0,1], ℒ_{[0,1]}, P) such that F = F_X, where ℒ_{[0,1]} is the family of Lebesgue measurable sets on [0,1] and P = µ_L.

Proof. We define the map X : [0,1] → ℝ by
X(ω) = inf{x : F(x) ≥ ω}, 0 ≤ ω ≤ 1.
We first show that X is an (ℒ_{[0,1]}, ℬ) measurable function and hence a r.v. For a ∈ ℝ, consider the set {ω : X(ω) ≤ a}. Since F is increasing, this is [0, b] with b = sup{ω : ω ≤ F(a)}. The result follows from Theorem 8.2.4.

We know that X induces a distribution P_X on ℝ. We show that F is the distribution function for P_X, so F = F_X, by showing that
F(x) = P({ω : X(ω) ≤ x}), x ∈ ℝ.
Since X is an increasing function on [0,1], the event A = {ω : X(ω) ≤ x} is the interval [0, sup A]. Since P is the uniform distribution, we want to show that F(x) = sup A = P({ω : X(ω) ≤ x}). The definition of X and the right continuity of F imply that F(X(ω)) ≥ ω, so if ω ∈ A, then F(x) ≥ F(X(ω)) ≥ ω. Therefore, F(x) is an upper bound for A. But F(x) ∈ A since X(F(x)) ≤ x. So, F(x) = sup A.

Example 8.6.7. In Example 7.2.4, we define the Cauchy distribution
F(x) = 1/2 + (1/π) arctan((x − x₀)/γ)
on (ℝ, ℬ), for real constants x₀ and γ. We set x₀ = 0 and γ = 1. By the proof of Theorem 8.6.6, X(ω) = inf{x : F(x) ≥ ω}. Since F is strictly monotone, and hence invertible, X(ω) = {x : F(x) = ω}. Solving ω = 1/2 + (1/π) arctan(x) gives
X(ω) = tan(π(ω − 1/2)).
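The construction in the proof of Theorem 8.6.6 is exactly the inverse-transform method for generating random numbers mentioned above. The following sketch is an added illustration using only the Python standard library (the function names are my own): draw ω uniformly from [0,1], apply X(ω) = tan(π(ω − 1/2)) from Example 8.6.7, and compare the empirical fraction of samples below a threshold with the Cauchy cdf F.

```python
import math
import random

def cauchy_cdf(x):
    """F(x) = 1/2 + arctan(x)/pi, the standard Cauchy cdf of Example 8.6.7."""
    return 0.5 + math.atan(x) / math.pi

def skorohod_sample(n, seed=0):
    """Draw n samples using X(omega) = tan(pi*(omega - 1/2)), omega ~ Uniform[0,1]."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

samples = skorohod_sample(100_000)
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(f"x={x:+.1f}: empirical {empirical:.4f} vs F(x) {cauchy_cdf(x):.4f}")
```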

Example 8.6.8. In Example 7.2.5, we define the normal distribution
F(x) = (1/2)[1 + erf((x − µ)/(σ√2))]
on (ℝ, ℬ), for real constants µ and σ > 0. Since F is strictly monotone, we find that the r.v. from Theorem 8.6.6 is
X(y) = µ + σ√2 erf⁻¹(2y − 1).
We plot this in Figure 8.5.

Figure 8.5. Plot of the r.v. for the normal distribution given by Theorem 8.6.6 for µ = 0 and σ = 1.

It is an exercise to establish further properties of the cdf of a r.v.

Theorem 8.6.7. Let F : ℝ → [0,1] be a cdf and X : [0,1] → ℝ the r.v. guaranteed by Theorem 8.6.6. If F is strictly increasing, then X and F are inverse functions to each other, and X is strictly increasing and continuous. In general: X is left continuous; jumps of F correspond to intervals of constancy for X; bounded intervals of constancy of F correspond to jumps of X; unbounded intervals of constancy of F correspond to finite limits of X at 0 and 1.

Since the continuous case is the easiest to deal with, we distinguish it with a definition.

Definition 8.6.6: Continuous random variable. A r.v. is said to be continuous if its distribution function is continuous.

8.7 Random vectors

We can extend the concept of random variables to vector-valued functions. First, assume (X, 𝒜, µ) is a measure space.

Definition 8.7.1. A function f : X → ℝⁿ is 𝒜 measurable, or measurable, if it is (𝒜, ℬ_ℝⁿ) measurable. A function f : ℝᵐ → ℝⁿ is Borel measurable if it is (ℬ_ℝᵐ, ℬ_ℝⁿ) measurable. A function f : ℝᵐ → ℝⁿ is Lebesgue measurable if it is (ℒ_ℝᵐ, ℬ_ℝⁿ) measurable.

Theorem 8.2.1 implies that it is sufficient to check measurability on a generating set,

Theorem 8.7.1. Let f : X → ℝⁿ. Then, f is measurable if and only if f⁻¹(A) ∈ 𝒜 for all A in any one of the standard collections generating ℬ_ℝⁿ, e.g. for all open sets A, or all closed sets A, or all open rectangles A, or all h-rectangles A = (a, b].

In practice, we deal with the components of a vector-valued function.

Definition 8.7.2. The i-th projection map p_i : ℝⁿ → ℝ is defined by p_i(x₁, x₂, …, x_n) = x_i for 1 ≤ i ≤ n.

Theorem 8.7.2. The projection maps from (ℝⁿ, ℬ_ℝⁿ) to (ℝ, ℬ) are measurable.

Proof. p_i⁻¹((a, b]) = {x ∈ ℝⁿ : a < x_i ≤ b and x_j ∈ ℝ for j ≠ i} is a measurable set for all a, b ∈ ℝ.

We only have to check measurability component-wise,

Theorem 8.7.3. Let f : X → ℝⁿ be given as f(x) = (f₁(x), …, f_n(x)), x ∈ X. f is measurable if and only if f_i is measurable for all 1 ≤ i ≤ n.

Proof. If f is measurable, then f_i = p_i ∘ f is measurable by Theorems 8.7.2 and 8.2.5. If each f_i is measurable, then
f⁻¹({y ∈ ℝⁿ : a_i < y_i ≤ b_i, 1 ≤ i ≤ n}) = {x ∈ X : a_i < f_i(x) ≤ b_i, 1 ≤ i ≤ n} = ∩_{i=1}^n {x ∈ X : a_i < f_i(x) ≤ b_i}
is a measurable set in 𝒜 for all a, b ∈ ℝⁿ.

Example 8.7.1. We consider the vector-valued function f(x) = (x₂ sin(x₁), x₁ + x₂, e^(x₂)) on ([0,1] × [0,1], ℬ_{[0,1]×[0,1]}, µ_L). Each of the component functions is continuous and hence measurable, so f is measurable.

The image of the domain is contained in [0, sin(1)] × [0, 2] × [1, e], which we equip with the Borel σ-algebra.

Now we specialize to probability spaces. Let (Ω, ℱ, P) be a probability space.

Definition 8.7.3: Random vector. A random vector is a measurable function from (Ω, ℱ, P) to (ℝⁿ, ℬ_ℝⁿ).

Theorem 8.7.3 implies,

Theorem 8.7.4. A map X on (Ω, ℱ, P) is a random vector if and only if each component of X is a random variable.

Next, we extend measures and distribution functions to random vectors,

Definition 8.7.4: Distribution function of a random vector. Let X be a random vector on (Ω, ℱ, P). The probability measure induced by X is defined by P_X(A) = P({ω ∈ Ω : X(ω) ∈ A}). The probability distribution function or cumulative distribution function of X, from ℝⁿ to [0,1], is
F_X(x) = P_X((−∞, x]) = P({ω ∈ Ω : X_i(ω) ≤ x_i, 1 ≤ i ≤ n}),
where x = (x₁, x₂, …, x_n). F_X is called the joint distribution of X₁, X₂, …, X_n.

Theorem 6.4.8 and continuity of measure imply some important properties of distribution functions analogous to Theorem 8.6.5 for random variables.

Theorem 8.7.5. Let X be a random vector on (Ω, ℱ, P). The distribution function F_X is increasing and right-continuous on ℝⁿ and,
lim_{x_i → −∞} F_X(x₁, …, x_i, …, x_n) = 0 for any particular i, holding the other variables fixed,
lim_{x_i → ∞ for all i} F_X(x₁, …, x_i, …, x_n) = 1. (8.3)

Note the conditions on the limits at ±∞ are more complicated than in the one-dimensional case.

Example 8.7.2. We consider the random vector X(ω) = (tan(π(ω − 1/2)), √2 erf⁻¹(2ω − 1)) mapping ((0,1), ℬ_{(0,1)}, P) to (ℝ², ℬ_ℝ²), where P is the standard Lebesgue measure. Recall Examples 8.6.7 and 8.6.8. Then,
F_X(x) = P({ω ∈ [0,1] : tan(π(ω − 1/2)) ≤ x₁, √2 erf⁻¹(2ω − 1) ≤ x₂})
= P({ω ∈ [0,1] : ω ≤ 1/2 + (1/π) arctan(x₁), ω ≤ (1/2)(1 + erf(x₂/√2))})
= P({ω ∈ [0,1] : ω ≤ 1/2 + (1/π) arctan(x₁)} ∩ {ω ∈ [0,1] : ω ≤ (1/2)(1 + erf(x₂/√2))}).
Thus, if (1/2)(1 + erf(x₂/√2)) ≤ 1/2 + (1/π) arctan(x₁), then F_X(x) = (1/2)(1 + erf(x₂/√2)), and otherwise F_X(x) = 1/2 + (1/π) arctan(x₁). In general, F_X is not smooth across the curve where the two values coincide. We plot F_X in Figure 8.6.

Example 8.7.3. We consider the random vector X(ω) = (ω², ω) mapping ([0,1], ℬ_{[0,1]}, P) to ([0,1] × [0,1], ℬ_{[0,1]×[0,1]}), where P is the standard Lebesgue measure. Then,
F_X(x) = P({ω ∈ [0,1] : ω² ≤ x₁, ω ≤ x₂}) = P({ω ∈ [0,1] : ω ≤ √x₁, ω ≤ x₂})
= P({ω ∈ [0,1] : ω ≤ √x₁} ∩ {ω ∈ [0,1] : ω ≤ x₂}) = P({ω ∈ [0,1] : ω ≤ min{√x₁, x₂}}).
Thus, if x₂ ≤ √x₁, F_X(x) = x₂, and otherwise F_X(x) = √x₁. We plot F_X in Figure 8.6.

Figure 8.6. Left: Plot of F_X for Example 8.7.2. Right: Plot of F_X for Example 8.7.3. In both cases, the viewpoint for the plot has an unusual orientation so the two parts of F_X are visible.

There is a converse showing that a distribution function can be related to at least one random vector,

Theorem 8.7.6. If F is a probability distribution function on (ℝⁿ, ℬ_ℝⁿ) that satisfies (8.3), there is a probability space (Ω, ℱ, P) and a random vector X such that F is the distribution function for X.

Proof. Set Ω = ℝⁿ, ℱ = ℬ_ℝⁿ, and P equal to the Lebesgue-Stieltjes measure defined by F. Finally, define X = ω (the identity) on Ω. Then P_X(A) = P({ω : X(ω) ∈ A}) = P(A), so X has distribution P. P is a probability measure by the assumption on F. The distribution function for X satisfies,
F_X(x) = P_X((−∞, x]) = P((−∞, x]) = lim_{a → −∞} P((a, x]) = lim_{a → −∞} F((a, x]) = F(x).

8.8 References

8.9 Worked problems

If {f_i} is a sequence of extended real-valued 𝒜-measurable functions, and f = lim_i f_i exists for all x ∈ X, then Theorem 8.2.9 implies that f is 𝒜-measurable. Theorem 8.2.11 extended this result to the case where f = lim_i f_i a.e. Suppose instead that {f_i} is a sequence of extended real-valued 𝒜-measurable functions whose limit fails to exist a.e., and let A denote the set of points for which {f_i} converges. The following problem states that A ∈ 𝒜. This is an interesting result, as it implies that even if the limit of a sequence of 𝒜-measurable functions does not exist a.e., the set of points for which the limit function does exist is a measurable set.

Problem 8.9.1. Let (X, 𝒜) be a measurable space and {f_i} be any sequence of extended real-valued 𝒜-measurable functions. Prove that A = {x : lim_i f_i(x) exists} ∈ 𝒜.
Hint: There are at least two completely different ways to prove this that are worth exploring. One could use Theorems 8.2.9 and 8.2.7 and recall a useful result that relates limits of real numbers to the lim inf and lim sup. Alternatively, one could exploit the fact that the reals are complete and observe that we can rewrite the set A as
∩_{k=1}^∞ ∪_{m=1}^∞ ∩_{i,j ≥ m} {x : |f_i(x) − f_j(x)| ≤ 1/k}.

We now consider a problem where the general four-step technique can be used to prove the existence of a function with a desired property. The previous problem is useful in the third step of the proof if we restrict the functions to be real-valued instead of extended

178 Chapter 8. Measurable Functions and Random Variables Theorem 8.7.6 If F is a probability distribution function on ( n, n ) that satisfies (8.3), there is a probability space (Ω,, P) and random vector X such that F is the distribution function for X. Proof. Set Ω = n, = n, and P equal to the Lebesgue-Stieljes measure defined by F. Finally, define X = ω (the identity) on Ω. Then P X (A) = P({ω : X (ω) A}) = P(A), so X has distribution P. P is a probability measure by assumption on F. The distribution function for X satisfies, F X (x) = P X ((, x]) = P((, x]) = lim P((a, x]) = lim F ((a, x]) = F (x). a a 8.8 References 8.9 Worked problems If { f i } is a sequence of extended real valued - measurable functions, and f = lim f i exists for all x X, then Theorem 8.2.9 implies that f is - measurable. Theorem 8.2.11 i extended this result to the case where f = lim f i a.e. Suppose instead that { f i } is a sequence of extended real valued - measurable functions whose limit fails to exist a.e., i and let A denote the set of points for which { f i } converges. The following problem states that A. This is an interesting result as it implies that even if the limit of a sequence of - measurable functions does not exist a.e., the set of points for which the limit function does exist is a measurable set. Problem 8.9.1 Let (, ) be a measurable space and { f i } be any sequence of extended real valued - measurable functions. Prove that A = {x : lim i f i (x) exists}. Hint: There are at two completely different ways to prove this that are worth exploring. One could use Theorems 8.2.9 and 8.2.7 and recall a useful result that relates limits of real numbers to the lim inf and lim sup. Alternatively, one could exploit the fact that the reals are complete and observe that we can rewrite the set A as k=1 m=1 i, j m {x : f i (x) f j (x) 1/k}. We now consider a problem where the general four-step technique can be used to prove the existence of a function with a desired property. The previous problem is useful in the third step of the proof if we restrict the functions to be real-valued instead of extended