Part II Probability and Measure (Theorems)
Based on lectures by J. Miller
Notes taken by Dexter Chua
Michaelmas 2016

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Analysis II is essential

Measure spaces, σ-algebras, π-systems and uniqueness of extension, statement *and proof* of Carathéodory's extension theorem. Construction of Lebesgue measure on R. The Borel σ-algebra of R. Existence of non-measurable subsets of R. Lebesgue–Stieltjes measures and probability distribution functions. Independence of events, independence of σ-algebras. The Borel–Cantelli lemmas. Kolmogorov's zero-one law. [6]

Measurable functions, random variables, independence of random variables. Construction of the integral, expectation. Convergence in measure and convergence almost everywhere. Fatou's lemma, monotone and dominated convergence, differentiation under the integral sign. Discussion of product measure and statement of Fubini's theorem. [6]

Chebyshev's inequality, tail estimates. Jensen's inequality. Completeness of L^p for 1 ≤ p ≤ ∞. The Hölder and Minkowski inequalities, uniform integrability. [4]

L^2 as a Hilbert space. Orthogonal projection, relation with elementary conditional probability. Variance and covariance. Gaussian random variables, the multivariate normal distribution. [2]

The strong law of large numbers, proof for independent random variables with bounded fourth moments. Measure preserving transformations, Bernoulli shifts. Statements *and proofs* of maximal ergodic theorem and Birkhoff's almost everywhere ergodic theorem, proof of the strong law. [4]

The Fourier transform of a finite measure, characteristic functions, uniqueness and inversion. Weak convergence, statement of Lévy's convergence theorem for characteristic functions. The central limit theorem. [2]
Contents

0 Introduction 3
1 Measures 4
  1.1 Measures 4
  1.2 Probability measures 4
2 Measurable functions and random variables 5
  2.1 Measurable functions 5
  2.2 Constructing new measures 5
  2.3 Random variables 6
  2.4 Convergence of measurable functions 6
  2.5 Tail events 7
3 Integration 8
  3.1 Definition and basic properties 8
  3.2 Integrals and limits 8
  3.3 New measures from old 9
  3.4 Integration and differentiation 9
  3.5 Product measures and Fubini's theorem 10
4 Inequalities and L^p spaces 12
  4.1 Four inequalities 12
  4.2 L^p spaces 12
  4.3 Orthogonal projection in L^2 13
  4.4 Convergence in L^1(P) and uniform integrability 13
5 Fourier transform 14
  5.1 The Fourier transform 14
  5.2 Convolutions 14
  5.3 Fourier inversion formula 14
  5.4 Fourier transform in L^2 15
  5.5 Properties of characteristic functions 15
  5.6 Gaussian random variables 15
6 Ergodic theory 17
  6.1 Ergodic theorems 17
7 Big theorems 18
  7.1 The strong law of large numbers 18
  7.2 Central limit theorem 18
0 Introduction
1 Measures

1.1 Measures

Proposition. A collection A is a σ-algebra if and only if it is both a π-system and a d-system.

Lemma (Dynkin's π-system lemma). Let A be a π-system. Then any d-system which contains A contains σ(A).

Theorem (Carathéodory extension theorem). Let A be a ring on E, and µ a countably additive set function on A. Then µ extends to a measure on the σ-algebra generated by A.

Theorem. Suppose that µ_1, µ_2 are measures on (E, E) with µ_1(E) = µ_2(E) < ∞. If A is a π-system with σ(A) = E, and µ_1 agrees with µ_2 on A, then µ_1 = µ_2.

Theorem. There exists a unique Borel measure µ on R with µ([a, b]) = b - a.

Proposition. The Lebesgue measure is translation invariant, i.e.

  µ(A + x) = µ(A)

for all A ∈ B and x ∈ R, where A + x = {y + x : y ∈ A}.

Proposition. Let µ be a Borel measure on R that is translation invariant with µ([0, 1]) = 1. Then µ is the Lebesgue measure.

1.2 Probability measures

Proposition. Events (A_n) are independent iff the σ-algebras σ(A_n) are independent.

Theorem. Suppose A_1 and A_2 are π-systems in F. If

  P[A_1 ∩ A_2] = P[A_1] P[A_2]

for all A_1 ∈ A_1 and A_2 ∈ A_2, then σ(A_1) and σ(A_2) are independent.

Lemma (Borel–Cantelli lemma). If ∑_n P[A_n] < ∞, then P[A_n i.o.] = 0.

Lemma (Borel–Cantelli lemma II). Let (A_n) be independent events. If ∑_n P[A_n] = ∞, then P[A_n i.o.] = 1.
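As a numerical sanity check of the first Borel–Cantelli lemma (an illustration, not part of the lectured material): take independent events A_n with P[A_n] = 1/n^2, so that ∑ P[A_n] < ∞ and almost surely only finitely many A_n occur. All names below are illustrative.

```python
import random

# Independent events A_n with P[A_n] = 1/n^2; since sum P[A_n] < infinity,
# Borel-Cantelli I says only finitely many occur almost surely.
random.seed(0)
N = 10_000
occurrences = [n for n in range(1, N + 1) if random.random() < 1.0 / n**2]
count = len(occurrences)
# Expected number of occurrences is sum_{n<=N} 1/n^2 ~ pi^2/6 ~ 1.64 (plus
# A_1, which has probability 1), so `count` stays tiny even over 10,000 events.
```

The point of the sketch is that the total number of occurrences does not grow with N, in contrast with Borel–Cantelli II, where a divergent sum (e.g. P[A_n] = 1/n) forces infinitely many occurrences almost surely.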
2 Measurable functions and random variables

2.1 Measurable functions

Lemma. Let (E, E) and (G, G) be measurable spaces, and G = σ(Q) for some Q. If f^{-1}(A) ∈ E for all A ∈ Q, then f is measurable.

Proposition. Let f_i : E → F_i be functions. Then the {f_i} are all measurable iff (f_i) : E → ∏ F_i is measurable, where the function (f_i) is defined by setting the i-th component of (f_i)(x) to be f_i(x).

Proposition. Let (E, E) be a measurable space. Let (f_n : n ∈ N) be a sequence of non-negative measurable functions on E. Then the following are measurable:

  f_1 + f_2,  f_1 f_2,  max{f_1, f_2},  min{f_1, f_2},
  inf_n f_n,  sup_n f_n,  lim inf_n f_n,  lim sup_n f_n.

The same is true with "non-negative" replaced by "real", provided the new functions are real (i.e. not infinity).

Theorem (Monotone class theorem). Let (E, E) be a measurable space, and A ⊆ E be a π-system with σ(A) = E. Let V be a vector space of functions such that

(i) The constant function 1 = 1_E is in V.
(ii) The indicator functions 1_A ∈ V for all A ∈ A.
(iii) V is closed under bounded, monotone limits. More explicitly, if (f_n) is a bounded non-negative sequence in V, f_n ↗ f (pointwise) and f is also bounded, then f ∈ V.

Then V contains all bounded measurable functions.

2.2 Constructing new measures

Lemma. Let g : R → R be non-constant, non-decreasing and right-continuous. We set

  g(±∞) = lim_{x → ±∞} g(x).

We set I = (g(-∞), g(∞)). Since g is non-constant, this is non-empty. Then there is a non-decreasing, left-continuous function f : I → R such that for all x ∈ I and y ∈ R, we have

  x ≤ g(y)  iff  f(x) ≤ y.

Thus, taking the negation of this, we have

  x > g(y)  iff  f(x) > y.

Explicitly, for x ∈ I, we define

  f(x) = inf{y ∈ R : x ≤ g(y)}.

Theorem. Let g : R → R be non-constant, non-decreasing and right-continuous. Then there exists a unique Radon measure dg on B such that

  dg((a, b]) = g(b) - g(a).

Moreover, we obtain all non-zero Radon measures on R in this way.
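The generalised inverse f(x) = inf{y : x ≤ g(y)} from the lemma above can be sketched numerically on a grid (an illustration only; the function names and the example g are mine, not from the notes).

```python
# Approximate f(x) = inf{y : x <= g(y)} for a non-decreasing g over a
# discrete grid of candidate values ys.
def gen_inverse(g, x, ys):
    """Smallest grid point y with x <= g(y)."""
    return min(y for y in ys if x <= g(y))

# Example: g(y) = y^3 is continuous and strictly increasing, so the
# generalised inverse reduces to the ordinary inverse g^{-1}.
g = lambda y: y ** 3
ys = [k / 1000 for k in range(-2000, 2001)]   # grid on [-2, 2]
f_val = gen_inverse(g, 0.008, ys)             # the exact inverse gives 0.2
```

For a g with jumps, the same formula picks out the left-continuous inverse the lemma describes; this is also the device behind the construction of a random variable with a prescribed distribution function in Section 2.3.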
2.3 Random variables

Proposition. We have

  F_X(x) → 0 as x → -∞,  F_X(x) → 1 as x → +∞.

Also, F_X(x) is non-decreasing and right-continuous.

Proposition. Let F be any distribution function. Then there exists a probability space (Ω, F, P) and a random variable X such that F_X = F.

Proposition. Two real-valued random variables X, Y are independent iff

  P[X ≤ x, Y ≤ y] = P[X ≤ x] P[Y ≤ y]

for all x, y. More generally, if (X_n) is a sequence of real-valued random variables, then they are independent iff

  P[X_1 ≤ x_1, …, X_n ≤ x_n] = ∏_{j=1}^n P[X_j ≤ x_j]

for all n and x_j.

Proposition. Let

  (Ω, F, P) = ((0, 1), B(0, 1), Lebesgue)

be our probability space. Then there exists a sequence (R_n) of independent Bernoulli(1/2) random variables.

Proposition. Let (Ω, F, P) = ((0, 1), B(0, 1), Lebesgue). Given any sequence (F_n) of distribution functions, there is a sequence (X_n) of independent random variables with F_{X_n} = F_n for all n.

2.4 Convergence of measurable functions

Theorem.
(i) If µ(E) < ∞, then f_n → f a.e. implies f_n → f in measure.
(ii) For any E, if f_n → f in measure, then there exists a subsequence (f_{n_k}) such that f_{n_k} → f a.e.

Theorem (Skorokhod representation theorem of weak convergence).
(i) If (X_n), X are defined on the same probability space, and X_n → X in probability, then X_n → X in distribution.
(ii) If X_n → X in distribution, then there exist random variables (X̃_n) and X̃ defined on a common probability space with F_{X̃_n} = F_{X_n} and F_{X̃} = F_X such that X̃_n → X̃ a.s.
2.5 Tail events

Theorem (Kolmogorov 0-1 law). Let (X_n) be a sequence of independent (real-valued) random variables. If A ∈ T, then P[A] = 0 or 1.

Moreover, if X is a T-measurable random variable, then there exists a constant c such that P[X = c] = 1.
3 Integration

3.1 Definition and basic properties

Proposition. A function is simple iff it is measurable, non-negative, and takes on only finitely-many values.

Proposition. Let f : [0, 1] → R be Riemann integrable. Then it is also Lebesgue integrable, and the two integrals agree.

Theorem (Monotone convergence theorem). Suppose that (f_n), f are non-negative measurable with f_n ↗ f. Then µ(f_n) ↗ µ(f).

Theorem. Let f, g be non-negative measurable, and α, β ≥ 0. We have that
(i) µ(αf + βg) = αµ(f) + βµ(g).
(ii) f ≤ g implies µ(f) ≤ µ(g).
(iii) f = 0 a.e. iff µ(f) = 0.

Theorem. Let f, g be integrable, and α, β ∈ R. We have that
(i) µ(αf + βg) = αµ(f) + βµ(g).
(ii) f ≤ g implies µ(f) ≤ µ(g).
(iii) f = 0 a.e. implies µ(f) = 0.

Proposition. If A is a π-system with E ∈ A and σ(A) = E, and f is an integrable function with µ(f 1_A) = 0 for all A ∈ A, then f = 0 a.e.

Proposition. Suppose that (g_n) is a sequence of non-negative measurable functions. Then we have

  µ(∑_{n=1}^∞ g_n) = ∑_{n=1}^∞ µ(g_n).

3.2 Integrals and limits

Theorem (Fatou's lemma). Let (f_n) be a sequence of non-negative measurable functions. Then

  µ(lim inf f_n) ≤ lim inf µ(f_n).

Theorem (Dominated convergence theorem). Let (f_n), f be measurable with f_n(x) → f(x) for all x ∈ E. Suppose that there is an integrable function g such that |f_n| ≤ g for all n. Then

  µ(f_n) → µ(f)

as n → ∞.
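A toy computation can make the last two theorems concrete (an illustration over counting measure, not part of the lectured material; the truncation level M and the function 1/k^2 are my choices).

```python
# Counting measure on {1, ..., M} as a finite stand-in for counting
# measure on the naturals.
M = 100_000
f = [1.0 / k**2 for k in range(1, M + 1)]

# Monotone convergence: f_n = f * 1_{k <= n} increases pointwise to f, and
# the integrals (partial sums) increase to mu(f) = sum_k 1/k^2 ~ pi^2/6.
mu_f = sum(f)
mu_f_n = [sum(f[:n]) for n in (10, 100, 1000)]
increasing = mu_f_n == sorted(mu_f_n) and mu_f_n[-1] <= mu_f

# Fatou can be strict: with f_n = 1_{{n}} (a unit mass escaping to infinity),
# mu(f_n) = 1 for every n while liminf_n f_n = 0 pointwise, so
# mu(liminf f_n) = 0 < 1 = liminf mu(f_n).  The domination |f_n| <= g fails
# for every integrable g, which is why dominated convergence does not apply.
```

The escaping-mass example is also the standard reason the inequality in Fatou's lemma cannot in general be an equality.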
3.3 New measures from old

Lemma. For (E, E, µ) a measure space and A ∈ E, the restriction (A, E_A, µ_A) to A is a measure space.

Proposition. Let (E, E, µ) and (F, F, µ′) be measure spaces and A ∈ E. Let f : E → F be a measurable function. Then f|_A is E_A-measurable.

Proposition. If f is integrable, then f|_A is µ_A-integrable and µ_A(f|_A) = µ(f 1_A).

Proposition. If g is a non-negative measurable function on G, then

  ν(g) = µ(g ∘ f),

where ν = µ ∘ f^{-1} is the image measure of µ under a measurable f : E → G.

Proposition. The ν defined above is indeed a measure.

3.4 Integration and differentiation

Proposition (Change of variables formula). Let φ : [a, b] → R be continuously differentiable and increasing. Then for any bounded Borel function g, we have

  ∫_{φ(a)}^{φ(b)} g(y) dy = ∫_a^b g(φ(x)) φ′(x) dx.

Theorem (Differentiation under the integral sign). Let (E, E, µ) be a measure space, U ⊆ R an open set, and f : U × E → R. We assume that
(i) For any t ∈ U fixed, the map x ↦ f(t, x) is integrable;
(ii) For any x ∈ E fixed, the map t ↦ f(t, x) is differentiable;
(iii) There exists an integrable function g such that

  |∂f/∂t (t, x)| ≤ g(x)

for all x ∈ E and t ∈ U.

Then the map x ↦ ∂f/∂t (t, x) is integrable for all t. Moreover, the function

  F(t) = ∫_E f(t, x) dµ

is differentiable, and

  F′(t) = ∫_E ∂f/∂t (t, x) dµ.
3.5 Product measures and Fubini's theorem

Lemma. Let (E, E) = (E_1 × E_2, E_1 ⊗ E_2) be the product measurable space. Suppose f : E → R is an E-measurable function. Then
(i) For each x_2 ∈ E_2, the function x_1 ↦ f(x_1, x_2) is E_1-measurable.
(ii) If f is bounded or non-negative measurable, then

  f_2(x_2) = ∫_{E_1} f(x_1, x_2) µ_1(dx_1)

is E_2-measurable.

Theorem. There exists a unique measure µ = µ_1 ⊗ µ_2 on E such that

  µ(A_1 × A_2) = µ_1(A_1) µ_2(A_2)

for all A_1 × A_2 ∈ A.

Theorem (Fubini's theorem).
(i) If f is non-negative measurable, then

  µ(f) = ∫_{E_1} ( ∫_{E_2} f(x_1, x_2) µ_2(dx_2) ) µ_1(dx_1).

In particular, we have

  ∫_{E_1} ( ∫_{E_2} f(x_1, x_2) µ_2(dx_2) ) µ_1(dx_1) = ∫_{E_2} ( ∫_{E_1} f(x_1, x_2) µ_1(dx_1) ) µ_2(dx_2).

This is sometimes known as Tonelli's theorem.

(ii) If f is integrable, and

  A = { x_1 ∈ E_1 : ∫_{E_2} |f(x_1, x_2)| µ_2(dx_2) < ∞ },

then µ_1(E_1 \ A) = 0. If we set

  f_1(x_1) = ∫_{E_2} f(x_1, x_2) µ_2(dx_2)  for x_1 ∈ A,  and  f_1(x_1) = 0  for x_1 ∉ A,

then f_1 is a µ_1-integrable function and µ_1(f_1) = µ(f).

Proposition. Let X_1, …, X_n be random variables on (Ω, F, P) with values in (E_1, E_1), …, (E_n, E_n) respectively. We define

  E = E_1 × ⋯ × E_n,  E = E_1 ⊗ ⋯ ⊗ E_n.

Then X = (X_1, …, X_n) is E-measurable and the following are equivalent:
(i) X_1, …, X_n are independent.
(ii) µ_X = µ_{X_1} ⊗ ⋯ ⊗ µ_{X_n}.
(iii) For any f_1, …, f_n bounded and measurable, we have

  E[ ∏_{k=1}^n f_k(X_k) ] = ∏_{k=1}^n E[f_k(X_k)].
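In the discrete case Tonelli's theorem is just the statement that a non-negative double sum can be evaluated in either order; a quick numerical sketch (illustrative only, with an arbitrary choice of f):

```python
# Non-negative f on a product of two counting measures, represented as a
# matrix; Tonelli says the two iterated "integrals" (sums) agree.
f = [[1.0 / (i + j + 1) for j in range(50)] for i in range(40)]

rows_first = sum(sum(row) for row in f)                        # x2 then x1
cols_first = sum(sum(f[i][j] for i in range(40)) for j in range(50))  # x1 then x2
max_err = abs(rows_first - cols_first)
```

For signed f the integrability hypothesis in part (ii) is essential: the classical counterexample f(i, j) = 1 if i = j, -1 if i = j + 1, 0 otherwise has iterated sums 1 and 0 respectively.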
4 Inequalities and L^p spaces

4.1 Four inequalities

Proposition (Chebyshev's/Markov's inequality). Let f be non-negative measurable and λ > 0. Then

  µ({f ≥ λ}) ≤ (1/λ) µ(f).

Proposition (Jensen's inequality). Let X be an integrable random variable with values in I. If c : I → R is convex, then we have

  E[c(X)] ≥ c(E[X]).

Lemma. If c : I → R is a convex function and m is in the interior of I, then there exist real numbers a, b such that

  c(x) ≥ ax + b

for all x ∈ I, with equality at x = m.

Proposition (Hölder's inequality). Let p, q ∈ (1, ∞) be conjugate, i.e. 1/p + 1/q = 1. Then for f, g measurable, we have

  µ(|fg|) = ‖fg‖_1 ≤ ‖f‖_p ‖g‖_q.

When p = q = 2, this is the Cauchy-Schwarz inequality.

Lemma. Let a, b ≥ 0 and p ≥ 1. Then

  (a + b)^p ≤ 2^p (a^p + b^p).

Theorem (Minkowski inequality). Let p ∈ [1, ∞] and f, g measurable. Then

  ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

4.2 L^p spaces

Theorem. Let 1 ≤ p ≤ ∞. Then L^p is a Banach space. In other words, if (f_n) is a sequence in L^p with the property that ‖f_n - f_m‖_p → 0 as n, m → ∞, then there is some f ∈ L^p such that ‖f_n - f‖_p → 0 as n → ∞.
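Markov's inequality can be checked directly on a finite probability space (an illustration only; the particular non-negative function below is an arbitrary choice):

```python
# Uniform probability measure on N points; verify mu({f >= lam}) <= mu(f)/lam.
N = 1000
f = [(k % 37) ** 2 / 100.0 for k in range(N)]   # an arbitrary non-negative f
mu_f = sum(f) / N                                # mu(f) = E[f] under uniform measure
lam = 5.0
tail = sum(1 for v in f if v >= lam) / N         # mu({f >= lam})
markov_holds = tail <= mu_f / lam
```

The proof is one line on such a space: λ 1_{f ≥ λ} ≤ f pointwise, and integrating both sides gives the inequality; the code simply evaluates both sides of that comparison.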
4.3 Orthogonal projection in L^2

Theorem. Let V be a closed subspace of L^2. Then each f ∈ L^2 has an orthogonal decomposition

  f = u + v,

where v ∈ V and u ∈ V^⊥. Moreover,

  ‖f - v‖_2 ≤ ‖f - g‖_2

for all g ∈ V, with equality iff g = v.

Lemma (Pythagoras identity).

  ‖f + g‖^2 = ‖f‖^2 + ‖g‖^2 + 2⟨f, g⟩.

Lemma (Parallelogram law).

  ‖f + g‖^2 + ‖f - g‖^2 = 2(‖f‖^2 + ‖g‖^2).

Proposition. The conditional expectation of X given G is the projection of X onto the subspace L^2(G, P) of G-measurable L^2 random variables in the ambient space L^2(P).

4.4 Convergence in L^1(P) and uniform integrability

Theorem (Bounded convergence theorem). Suppose X, (X_n) are random variables. Assume that there exists a (non-random) constant C > 0 such that |X_n| ≤ C. If X_n → X in probability, then X_n → X in L^1.

Proposition. Finite unions of uniformly integrable sets are uniformly integrable.

Proposition. Let X be an L^p-bounded family for some p > 1. Then X is uniformly integrable.

Lemma. Let X be a family of random variables. Then X is uniformly integrable if and only if

  sup{ E[|X| 1_{|X| > K}] : X ∈ X } → 0

as K → ∞.

Corollary. Let X = {X}, where X ∈ L^1(P). Then X is uniformly integrable. Hence, a finite collection of L^1 functions is uniformly integrable.

Theorem. Let X, (X_n) be random variables. Then the following are equivalent:
(i) X_n, X ∈ L^1 for all n and X_n → X in L^1.
(ii) {X_n} is uniformly integrable and X_n → X in probability.
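The proposition identifying conditional expectation with orthogonal projection is easy to see on a finite uniform space, where a σ-algebra G is generated by a partition and G-measurable functions are exactly the block-wise constant ones (a sketch with my own illustrative data, not from the notes):

```python
# E[X | G] on a uniform 6-point space: average X over each block of the
# partition generating G.  This block-wise average is the orthogonal
# projection of X onto the block-wise constant functions.
X = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]
blocks = [[0, 1], [2, 3], [4, 5]]   # partition generating G

cond_exp = list(X)
for b in blocks:
    avg = sum(X[i] for i in b) / len(b)
    for i in b:
        cond_exp[i] = avg

def l2_dist2(Y):
    """Squared L^2 distance from X under the uniform measure."""
    return sum((x - y) ** 2 for x, y in zip(X, Y)) / len(X)

# Any other G-measurable candidate is at least as far from X in L^2.
best = l2_dist2(cond_exp)
perturbed = list(cond_exp)
perturbed[0] += 0.5
perturbed[1] += 0.5                  # still constant on the first block
worse = l2_dist2(perturbed)
```

Here cond_exp is [2, 2, 4, 4, 6, 6], and any perturbation that stays block-wise constant strictly increases the L^2 distance, matching the minimising property in the projection theorem.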
5 Fourier transform

5.1 The Fourier transform

Proposition. ‖f̂‖_∞ ≤ ‖f‖_1 and ‖µ̂‖_∞ ≤ µ(R^d).

Proposition. The functions f̂, µ̂ are continuous.

5.2 Convolutions

Proposition. For any f ∈ L^p and ν a probability measure, we have

  ‖f ∗ ν‖_p ≤ ‖f‖_p.

Proposition. (f ∗ ν)^(u) = f̂(u) ν̂(u).

Proposition. Let µ, ν be probability measures, and X, Y be independent random variables with laws µ, ν respectively. Then

  (µ ∗ ν)^(u) = µ̂(u) ν̂(u).

5.3 Fourier inversion formula

Theorem (Fourier inversion formula). Let f, f̂ ∈ L^1. Then

  f(x) = (2π)^{-d} ∫ f̂(u) e^{i(u,x)} du  a.e.

Proposition. Let Z ~ N(0, 1). Then

  φ_Z(u) = e^{-u^2/2}.

Proposition. Let Z = (Z_1, …, Z_d) with the Z_j ~ N(0, 1) independent. Then √t Z has density

  g_t(x) = (2πt)^{-d/2} e^{-|x|^2/(2t)},

with

  ĝ_t(u) = e^{-|u|^2 t/2}.

Lemma. The Fourier inversion formula holds for the Gaussian density function.

Proposition. ‖f ∗ g_t‖_1 ≤ ‖f‖_1.

Proposition. ‖f ∗ g_t‖_∞ ≤ (2πt)^{-d/2} ‖f‖_1.

Proposition. ‖(f ∗ g_t)^‖_1 = ‖f̂ ĝ_t‖_1 ≤ (2π)^{d/2} t^{-d/2} ‖f‖_1, and ‖(f ∗ g_t)^‖_∞ ≤ ‖f̂‖_∞.
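The convolution identity (f ∗ ν)^ = f̂ ν̂ has an exact discrete analogue for the DFT, which can be checked in a few lines (illustrative only; the notes work with the continuous Fourier transform, and the vectors below are arbitrary choices, with b playing the role of a probability measure ν):

```python
import cmath

def dft(a):
    """Discrete Fourier transform of a length-n sequence."""
    n = len(a)
    return [sum(a[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def circ_conv(a, b):
    """Circular convolution (a * b)[j] = sum_k a[k] b[j - k mod n]."""
    n = len(a)
    return [sum(a[k] * b[(j - k) % n] for k in range(n)) for j in range(n)]

a = [1.0, 2.0, 0.0, -1.0]
b = [0.5, 0.25, 0.25, 0.0]   # a probability vector, standing in for nu

# DFT of the convolution vs. pointwise product of the DFTs.
lhs = dft(circ_conv(a, b))
rhs = [x * y for x, y in zip(dft(a), dft(b))]
max_err = max(abs(x - y) for x, y in zip(lhs, rhs))
```

The identity holds exactly here (up to floating-point error), which is the discrete shadow of the proposition that convolution of measures corresponds to multiplication of their transforms.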
Lemma. The Fourier inversion formula holds for Gaussian convolutions.

Theorem (Fourier inversion formula). Let f ∈ L^1 and

  f_t(x) = (2π)^{-d} ∫ f̂(u) e^{-|u|^2 t/2} e^{i(u,x)} du = (2π)^{-d} ∫ (f ∗ g_t)^(u) e^{i(u,x)} du.

Then ‖f_t - f‖_1 → 0 as t → 0, and the Fourier inversion formula holds whenever f, f̂ ∈ L^1.

Lemma. Suppose that f ∈ L^p with p ∈ [1, ∞). Then ‖f ∗ g_t - f‖_p → 0 as t → 0.

5.4 Fourier transform in L^2

Theorem (Plancherel identity). For any function f ∈ L^1 ∩ L^2, the Plancherel identity holds:

  ‖f̂‖_2 = (2π)^{d/2} ‖f‖_2.

Theorem. There exists a unique Hilbert space automorphism F : L^2 → L^2 such that

  F([f]) = [(2π)^{-d/2} f̂]

whenever f ∈ L^1 ∩ L^2. Here [f] denotes the equivalence class of f in L^2, and we say F : L^2 → L^2 is a Hilbert space automorphism if it is a linear bijection that preserves the inner product.

5.5 Properties of characteristic functions

Theorem. The characteristic function φ_X of the distribution µ_X of a random variable X determines µ_X. In other words, if X and X′ are random variables and φ_X = φ_{X′}, then µ_X = µ_{X′}.

Theorem. If φ_X is integrable, then µ_X has a bounded, continuous density function

  f_X(x) = (2π)^{-d} ∫ φ_X(u) e^{-i(u,x)} du.

Theorem. Let X, (X_n) be random variables with values in R^d. If φ_{X_n}(u) → φ_X(u) for each u ∈ R^d, then µ_{X_n} → µ_X weakly.

5.6 Gaussian random variables

Proposition. Let X ~ N(µ, σ^2). Then

  E[X] = µ,  var(X) = σ^2.

Also, for any a, b ∈ R, we have

  aX + b ~ N(aµ + b, a^2 σ^2).

Lastly, we have

  φ_X(u) = e^{iµu - u^2 σ^2/2}.
Theorem. Let X be Gaussian on R^n, let A be an m × n matrix and b ∈ R^m. Then
(i) AX + b is Gaussian on R^m.
(ii) X ∈ L^2, and its law µ_X is determined by µ = E[X] and V = var(X), the covariance matrix.
(iii) We have

  φ_X(u) = e^{i(u,µ) - (u,V u)/2}.

(iv) If V is invertible, then X has a density

  f_X(x) = (2π)^{-n/2} (det V)^{-1/2} exp( -(1/2) (x - µ, V^{-1}(x - µ)) ).

(v) If X = (X_1, X_2), where X_i ∈ R^{n_i}, then cov(X_1, X_2) = 0 iff X_1 and X_2 are independent.
6 Ergodic theory

Proposition. If f is integrable and Θ is measure-preserving, then f ∘ Θ is integrable and

  ∫ f ∘ Θ dµ = ∫ f dµ.

Proposition. If Θ is ergodic and f is invariant, then there exists a constant c such that f = c a.e.

Theorem. The shift map Θ is an ergodic, measure-preserving transformation.

6.1 Ergodic theorems

Lemma (Maximal ergodic lemma). Let f be integrable, and

  S* = sup_{n ≥ 0} S_n(f),

where S_0(f) = 0 by convention. Then

  ∫_{S* > 0} f dµ ≥ 0.

Theorem (Birkhoff's ergodic theorem). Let (E, E, µ) be σ-finite and f be integrable. There exists an invariant function f̄ such that

  µ(|f̄|) ≤ µ(|f|)

and

  S_n(f)/n → f̄  a.e.

If Θ is ergodic, then f̄ is a constant.

Theorem (von Neumann's ergodic theorem). Let (E, E, µ) be a finite measure space. Let p ∈ [1, ∞) and assume that f ∈ L^p. Then there is some function f̄ ∈ L^p such that

  S_n(f)/n → f̄  in L^p.
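Birkhoff's theorem can be watched in action for the irrational rotation Θ(x) = x + α mod 1 on [0, 1) with Lebesgue measure, a standard ergodic example (illustrative only; the rotation is not the shift map treated in the notes, and α, f, and the starting point are my choices):

```python
import math

# Ergodic average S_n(f)/n = (1/n) sum_{k<n} f(Theta^k x) for the rotation
# Theta(x) = x + alpha mod 1.  Since the rotation is ergodic for irrational
# alpha, the average converges to the space average integral of f, here 0.
alpha = math.sqrt(2) - 1                  # irrational rotation number
f = lambda x: math.cos(2 * math.pi * x)   # integral over [0, 1) is 0

x, total = 0.1, 0.0
n = 100_000
for _ in range(n):
    total += f(x)
    x = (x + alpha) % 1.0
avg = total / n                           # close to the space average 0
```

For this particular f the orbit sums are in fact uniformly bounded (the geometric-series bound for e^{2πi k α}), so the average decays like 1/n, much faster than the theorem guarantees in general.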
7 Big theorems

7.1 The strong law of large numbers

Theorem (Strong law of large numbers, assuming finite fourth moments). Let (X_n) be a sequence of independent random variables such that there exist µ ∈ R and M > 0 with

  E[X_n] = µ,  E[X_n^4] ≤ M

for all n. With S_n = X_1 + ⋯ + X_n, we have that

  S_n/n → µ  a.s.

as n → ∞.

Theorem (Strong law of large numbers). Let (Y_n) be an iid sequence of integrable random variables with mean ν. With S_n = Y_1 + ⋯ + Y_n, we have

  S_n/n → ν  a.s.

7.2 Central limit theorem

Theorem. Let (X_n) be a sequence of iid random variables with E[X_1] = 0 and E[X_1^2] = 1. Then if we set

  S_n = X_1 + ⋯ + X_n,

then for all x ∈ R, we have

  P[S_n/√n ≤ x] → ∫_{-∞}^x e^{-y^2/2}/√(2π) dy = P[N(0, 1) ≤ x]

as n → ∞.
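Both big theorems are easy to see numerically with uniform random variables (a Monte Carlo sketch, not part of the lectured material; the sample sizes and tolerances are my choices):

```python
import math
import random

random.seed(42)

# (1) Strong law: the sample mean of iid Uniform(0,1) tends a.s. to 1/2.
n = 100_000
mean_n = sum(random.random() for _ in range(n)) / n

# (2) CLT: centre and scale Uniform(0,1) to mean 0 and variance 1
# (variance of Uniform(0,1) is 1/12), so S_n / sqrt(n) is approximately
# N(0,1); in particular P[S_n/sqrt(n) <= 0] should be close to 1/2.
def normalised_sum(n):
    s = sum((random.random() - 0.5) * math.sqrt(12.0) for _ in range(n))
    return s / math.sqrt(n)

trials = 2000
frac_nonpos = sum(1 for _ in range(trials) if normalised_sum(200) <= 0) / trials
```

The strong law controls a single long trajectory (one running average), whereas the CLT is a statement about the distribution across many independent trials, which is why the two checks are structured differently.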