Probability Theory. Muhammad Waliji. August 11, 2006


Abstract

This paper introduces some elementary notions in measure-theoretic probability theory. Several probabilistic notions of the convergence of a sequence of random variables are discussed. The theory is then used to prove the Law of Large Numbers. Finally, the notions of conditional expectation and conditional probability are introduced.

1 Heuristic Introduction

Probability theory is concerned with the outcomes of experiments that are random in nature, that is, experiments whose outcomes cannot be predicted in advance. The set of possible outcomes, $\omega$, of an experiment is called the sample space, denoted by $\Omega$. For instance, if our experiment consists of rolling a die, we will have $\Omega = \{1, 2, 3, 4, 5, 6\}$. A subset, $A$, of $\Omega$ is called an event. For instance, $A = \{1, 3, 5\}$ corresponds to the event "an odd number is rolled".

In elementary probability theory, one is normally concerned with sample spaces that are either finite or countable. In this case, one often assigns a probability to every single outcome. That is, we have a probability function $P : \Omega \to [0, 1]$, where $P(\omega)$ is the probability that $\omega$ occurs. Here, we insist that

$$\sum_{\omega \in \Omega} P(\omega) = 1.$$

However, if the sample space is uncountable, then this condition becomes nonsensical. Two elementary types of problems fall into this category and hence cannot be dealt with by elementary probability theory: an infinite number of repeated coin tosses (or dice rolls), and a number drawn at random from $[0, 1]$. This illustrates the importance of uncountable sample spaces.

The solution to this problem is to use the theory of measures. Instead of assigning probabilities to outcomes in the sample space, one can restrict oneself to a certain class of events that form a structure known as a σ-field, and assign probabilities to these special kinds of events.
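The finite case described above can be made concrete with a small sketch. It assumes a fair die (the text does not fix the outcome probabilities), and the names `OMEGA`, `P_OUTCOME`, and `prob` are illustrative only.

```python
from fractions import Fraction

# Sample space for one roll of a die; fairness is an assumption of this sketch.
OMEGA = {1, 2, 3, 4, 5, 6}
P_OUTCOME = {w: Fraction(1, 6) for w in OMEGA}


def prob(event):
    """Probability of an event (a subset of OMEGA): sum of its outcome probabilities."""
    if not set(event) <= OMEGA:
        raise ValueError("an event must be a subset of the sample space")
    return sum((P_OUTCOME[w] for w in event), Fraction(0))


odd = {1, 3, 5}          # the event "an odd number is rolled"
total = prob(OMEGA)      # the normalization condition: should equal 1
p_odd = prob(odd)
```

Exact rational arithmetic is used so that the normalization condition can be checked with equality instead of a floating-point tolerance.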

2 σ-fields, Probability Measures, and Distribution Functions

Definition 2.1. A class of subsets of $\Omega$, $\mathcal{F}$, is a σ-field if the following hold:
(i) $\emptyset \in \mathcal{F}$ and $\Omega \in \mathcal{F}$
(ii) $A \in \mathcal{F} \implies A^c \in \mathcal{F}$
(iii) $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_n A_n \in \mathcal{F}$
Note that this implies that σ-fields are also closed under countable intersections.

Definition 2.2. The σ-field generated by a class of sets, $\mathcal{A}$, is the smallest σ-field containing $\mathcal{A}$. It is denoted $\sigma(\mathcal{A})$.

Definition 2.3. Let $\mathcal{F}$ be a σ-field. A function $P : \mathcal{F} \to [0, 1]$ is a probability measure if $P(\emptyset) = 0$, $P(\Omega) = 1$, and whenever $(A_n)_{n \in \mathbb{N}}$ is a disjoint collection of sets in $\mathcal{F}$, we have

$$P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P(A_n).$$

Throughout this paper, unless otherwise noted, the words increasing, decreasing, and monotone are always meant in their weak sense. Suppose $\{A_n\}$ is a sequence of sets. We say that $\{A_n\}$ is an increasing sequence if $A_1 \subseteq A_2 \subseteq \cdots$. We say that $\{A_n\}$ is a decreasing sequence if $A_1 \supseteq A_2 \supseteq \cdots$. In both of these cases, the sequence $\{A_n\}$ is said to be monotone. If $A_n$ is increasing, then set $\lim A_n := \bigcup_n A_n$. If $A_n$ is decreasing, then set $\lim A_n := \bigcap_n A_n$.

The following properties follow immediately from the definitions.

Lemma 2.4. Let $\mathcal{F}$ be a σ-field, and let $P$ be a probability measure on it.
(i) $P(A^c) = 1 - P(A)$.
(ii) If $A \subseteq B$, then $P(A) \le P(B)$.
(iii) $P\big(\bigcup_{i=1}^{\infty} A_i\big) \le \sum_{i=1}^{\infty} P(A_i)$.
(iv) If $\{A_n\}$ is a monotone sequence in $\mathcal{F}$, then $\lim_{n \to \infty} P(A_n) = P(\lim A_n)$.

Definition 2.5. Suppose $\Omega$ is a set, $\mathcal{F}$ is a σ-field on $\Omega$, and $P$ is a probability measure on $\mathcal{F}$. Then, the ordered pair $(\Omega, \mathcal{F})$ is called a measurable space. The triple $(\Omega, \mathcal{F}, P)$ is called a probability space. A probability space is called finitely additive or countably additive depending on whether $P$ is finitely or countably additive.

Definition 2.6. Let $(X, \tau)$ be a topological space. The σ-field $\mathcal{B}(X, \tau)$ generated by $\tau$ is called the Borel σ-field. In particular, $\mathcal{B}(X, \tau)$ is the smallest σ-field containing all open and closed sets of $X$. The sets of $\mathcal{B}(X, \tau)$ are called Borel sets.
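On a finite sample space, the generated σ-field of Definition 2.2 can be computed by brute force. The following is a minimal sketch under that finiteness assumption (so closure under pairwise unions already gives closure under countable unions). The function name `generate_sigma_field` is hypothetical.

```python
from itertools import combinations


def generate_sigma_field(omega, generators):
    """Return sigma(generators) on a finite set omega, as a set of frozensets."""
    omega = frozenset(omega)
    gens = [frozenset(g) for g in generators]
    if any(not g <= omega for g in gens):
        raise ValueError("every generator must be a subset of omega")
    family = {frozenset(), omega} | set(gens)
    changed = True
    while changed:
        changed = False
        current = list(family)
        # Close under complements (Definition 2.1 (ii)).
        for a in current:
            c = omega - a
            if c not in family:
                family.add(c)
                changed = True
        # Close under pairwise unions; on a finite set this suffices for (iii).
        for a, b in combinations(current, 2):
            u = a | b
            if u not in family:
                family.add(u)
                changed = True
    return family


OMEGA = {1, 2, 3, 4, 5, 6}
sigma_parity = generate_sigma_field(OMEGA, [{1, 3, 5}])
sigma_small = generate_sigma_field(OMEGA, [{1}, {1, 2}])
```

Here `sigma_parity` consists of the empty set, the odd outcomes, the even outcomes, and the whole space. The second family has atoms {1}, {2}, and {3, 4, 5, 6}, so it contains all unions of those atoms.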

When the topology, $\tau$, or even the space $X$, is obvious from the context, $\mathcal{B}(X, \tau)$ will often be abbreviated $\mathcal{B}(X)$ or even just $\mathcal{B}$. A particularly important situation in probability theory is when $\Omega = \mathbb{R}$ and $\mathcal{F}$ is the class of Borel sets in $\mathbb{R}$.

Definition 2.7. A distribution function is an increasing right-continuous function $F : \mathbb{R} \to [0, 1]$ such that

$$\lim_{x \to -\infty} F(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F(x) = 1.$$

We can associate probability functions on $(\mathbb{R}, \mathcal{B})$ with distribution functions. Namely, the distribution function associated with $P$ is $F(x) := P((-\infty, x])$. Conversely, each distribution function defines a probability function on the reals.

3 Random Variables, Transformations, and Expectation

We have now stated the basic objects that we will be studying and discussed their elementary properties. We now introduce the concept of a random variable. Let $\Omega$ be the set of all possible drawings of lottery numbers. The function $X : \Omega \to \mathbb{R}$ which indicates the payoff $X(\omega)$ to a player associated with a drawing $\omega$ is an example of a random variable. The expectation of a random variable is the average or expected value of $X$.

Definition 3.1. Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be measurable spaces. A function $T : \Omega_1 \to \Omega_2$ is a measurable transformation if the preimage of any measurable set is a measurable set. That is, $T$ is a measurable transformation if $(\forall A \in \mathcal{F}_2)(T^{-1}(A) \in \mathcal{F}_1)$.

Lemma 3.2. It is sufficient to check the condition in Definition 3.1 for those $A$ in a class that generates $\mathcal{F}_2$. More precisely, suppose that $\mathcal{A}$ generates $\mathcal{F}_2$. Then, if $(\forall A \in \mathcal{A})(T^{-1}(A) \in \mathcal{F}_1)$, then $T$ is a measurable transformation.

Proof. Let $\mathcal{C} := \{A \in 2^{\Omega_2} : T^{-1}(A) \in \mathcal{F}_1\}$. Then, $\mathcal{C}$ is a σ-field, and $\mathcal{A} \subseteq \mathcal{C}$. But then, $\sigma(\mathcal{A}) = \mathcal{F}_2 \subseteq \mathcal{C}$, which is exactly what we wanted.

Definition 3.3. Let $(\Omega, \mathcal{F})$ be a measurable space. A measurable function or a random variable is a measurable transformation from $(\Omega, \mathcal{F})$ into $(\mathbb{R}, \mathcal{B})$.

Lemma 3.4. If $f : \mathbb{R} \to \mathbb{R}$ is a continuous function, then $f$ is a measurable transformation from $(\mathbb{R}, \mathcal{B})$ to $(\mathbb{R}, \mathcal{B})$.

Definition 3.5. Given a set $A$, the indicator function for $A$ is the function

$$I_A(\omega) := \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$
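Lemma 3.2 and the indicator function of Definition 3.5 can be tried out on a finite space, where preimages can be enumerated. The sketch below makes two assumptions that are not in the text. The target space is the finite set of values the function takes, with its power set standing in for $(\mathbb{R}, \mathcal{B})$. The generating class is the collection of singletons. All names are illustrative.

```python
def preimage(T, domain, target_set):
    """T^{-1}(target_set), computed by enumeration over a finite domain."""
    return frozenset(w for w in domain if T(w) in target_set)


def is_measurable(T, domain, F1, generators_of_F2):
    """Lemma 3.2 on finite spaces: only preimages of a generating class are checked."""
    return all(preimage(T, domain, A) in F1 for A in generators_of_F2)


OMEGA = frozenset({1, 2, 3, 4, 5, 6})
ODD = frozenset({1, 3, 5})
# The sigma-field generated by the event "odd".
F_PARITY = {frozenset(), ODD, OMEGA - ODD, OMEGA}


def indicator_odd(w):
    """Indicator function I_A for A = ODD."""
    return 1 if w in ODD else 0


def identity(w):
    return w


# Singletons generate the power set of a finite target space.
gen_01 = [{0}, {1}]
gen_omega = [{w} for w in OMEGA]

indicator_ok = is_measurable(indicator_odd, OMEGA, F_PARITY, gen_01)
identity_ok = is_measurable(identity, OMEGA, F_PARITY, gen_omega)
```

The indicator of the odd outcomes passes because both of its level sets lie in `F_PARITY`. The identity map fails because the preimage of `{1}` is not in that σ-field.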

If $A \in \mathcal{F}$, then $I_A$ is a measurable function. Note that many elementary operations, including composition, arithmetic, max, min, and others, when performed upon measurable functions, again yield measurable functions.

Let $(\Omega_1, \mathcal{F}_1, P)$ be a probability space and $(\Omega_2, \mathcal{F}_2)$ a measurable space. A measurable transformation $T : \Omega_1 \to \Omega_2$ naturally induces a probability measure $PT^{-1}$ on $(\Omega_2, \mathcal{F}_2)$. In the case of a random variable, $X$, the induced measure on $\mathbb{R}$ will generally be denoted $\alpha$. The distribution function associated with $\alpha$ will be denoted $F_X$. $\alpha$ will sometimes be called a probability distribution.

Now that we have a notion of measure and of measurable functions, we can develop a notion of the integral of a function. The integral will have the probabilistic interpretation of being an expected (or average) value. For the precise definition of the Lebesgue integral, see any textbook on measure theory.

Definition 3.6. Suppose $X$ is a random variable. Then the expectation of $X$ is

$$EX := \int_{\Omega} X(\omega)\, dP.$$

We conclude this section with a useful change of variables formula for integrals.

Proposition 3.7. Let $(\Omega_1, \mathcal{F}_1, P)$ be a probability space and let $(\Omega_2, \mathcal{F}_2)$ be a measurable space. Suppose $T : \Omega_1 \to \Omega_2$ is a measurable transformation. Suppose $f : \Omega_2 \to \mathbb{R}$ is a measurable function. Then, $PT^{-1}$ is a probability measure on $(\Omega_2, \mathcal{F}_2)$ and $fT := f \circ T : \Omega_1 \to \mathbb{R}$ is a measurable function. Furthermore, $f$ is integrable iff $fT$ is integrable, and

$$\int_{\Omega_1} fT(\omega_1)\, dP = \int_{\Omega_2} f(\omega_2)\, dPT^{-1}.$$

4 Notions of Convergence

We will now introduce some notions of the convergence of random variables. Note that we will often not explicitly state the dependence of a function $X(\omega)$ on $\omega$. Hence, sets of the form $\{\omega : X(\omega) > 0\}$ will often be abbreviated $\{X > 0\}$. For the remainder of this section, let $X_n$ be a sequence of random variables.

Definition 4.1. The sequence $X_n$ converges almost surely (almost everywhere) to a random variable $X$ if $X_n(\omega) \to X(\omega)$ for all $\omega$ outside of a set of probability 0.

Definition 4.2. The sequence $X_n$ converges in probability (in measure) to a function $X$ if, for every $\epsilon > 0$,

$$\lim_{n \to \infty} P\{\omega : |X_n(\omega) - X(\omega)| \ge \epsilon\} = 0.$$

This is denoted $X_n \xrightarrow{P} X$.
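The induced measure $PT^{-1}$ and the change of variables formula of Proposition 3.7 can be checked by direct enumeration on a finite space. The sketch assumes two fair dice, takes $T$ to be their sum, and takes $f(s) = s^2$. These choices and the helper names are illustrative and not from the text.

```python
from collections import defaultdict
from fractions import Fraction

# Omega_1: ordered pairs of die faces, with the uniform measure (an assumption).
OMEGA1 = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in OMEGA1}


def T(w):
    """Measurable transformation: the sum of the two faces."""
    return w[0] + w[1]


def pushforward(P, T):
    """Induced measure P T^{-1}, stored as {point of Omega_2: mass}."""
    induced = defaultdict(Fraction)
    for w, p in P.items():
        induced[T(w)] += p
    return dict(induced)


def f(s):
    return s * s


PT_inv = pushforward(P, T)
# Left side: integral of f(T(w)) over Omega_1 with respect to P.
lhs = sum(f(T(w)) * p for w, p in P.items())
# Right side: integral of f over Omega_2 with respect to P T^{-1}.
rhs = sum(f(s) * q for s, q in PT_inv.items())
```

Both sides compute $E[(X_1 + X_2)^2]$. The right side only needs the eleven possible sums and their induced probabilities.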

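Proposition 4.3. If $X_n$ converges almost surely to $X$, then $X_n$ converges in probability to $X$.

Proof. We have $\{\omega : X_n(\omega) \not\to X(\omega)\} \subseteq N$, where $P(N) = 0$. That is, for all $\epsilon > 0$,

$$\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} \{|X_m - X| \ge \epsilon\} \subseteq N.$$

Therefore, given $\epsilon > 0$, we have

$$\lim_{n \to \infty} P\{|X_n - X| \ge \epsilon\} \le \lim_{n \to \infty} P\Big(\bigcup_{m=n}^{\infty} \{|X_m - X| \ge \epsilon\}\Big) = P\Big(\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} \{|X_m - X| \ge \epsilon\}\Big) \le P(N) = 0,$$

thereby completing the proof.

Note, however, that the converse is not true. Let $\Omega = [0, 1]$ with Lebesgue measure. Consider the sequence of sets $A_1 = [0, \tfrac12]$, $A_2 = [\tfrac12, 1]$, $A_3 = [0, \tfrac13]$, $A_4 = [\tfrac13, \tfrac23]$, and so on. Then, the indicator functions, $I_{A_n}$, converge in probability to 0. However, $I_{A_n}(\omega)$ does not converge for any $\omega$, and in particular the sequence does not converge almost surely.

However, the following holds as a sort of converse:

Proposition 4.4. Suppose $f_n$ converges in probability to $f$. Then, there is a subsequence $f_{n_k}$ of $f_n$ such that $f_{n_k}$ converges almost surely to $f$.

Proof. Let $B_n^{\epsilon} := \{\omega : |f_n(\omega) - f(\omega)| \ge \epsilon\}$. Then, $f_{n_i} \to f$ almost surely iff, for every $\epsilon > 0$,

$$P\Big(\bigcap_{i} \bigcup_{j > i} B_{n_j}^{\epsilon}\Big) = 0.$$

We know that for any $\epsilon$, $\lim_{n \to \infty} P(B_n^{\epsilon}) = 0$. Now, notice that

$$P\Big(\bigcap_{n} \bigcup_{m \ge n} B_m^{\epsilon}\Big) \le \inf_n P\Big(\bigcup_{m \ge n} B_m^{\epsilon}\Big) \le \inf_n \sum_{m=n}^{\infty} P(B_m^{\epsilon}) = \lim_{n \to \infty} \sum_{m=n}^{\infty} P(B_m^{\epsilon}).$$

Furthermore,

$$\epsilon_1 < \epsilon_2 \implies B_n^{\epsilon_1} \supseteq B_n^{\epsilon_2} \implies P(B_n^{\epsilon_1}) \ge P(B_n^{\epsilon_2}).$$

Let $\delta_i := 1/2^i$. Now, note that $(\forall i)(\exists n_i^{\epsilon})(\forall n \ge n_i^{\epsilon})(P(B_n^{\epsilon}) < \delta_i)$. Let $n_i := n_i^{\delta_i}$, chosen so that the $n_i$ are strictly increasing. Choose $\epsilon > 0$. Note, $(\exists m)(\delta_m < \epsilon)$. Hence,

$$P\Big(\bigcap_{i} \bigcup_{j \ge i} B_{n_j}^{\epsilon}\Big) \le \lim_{i \to \infty} \sum_{j=i}^{\infty} P(B_{n_j}^{\epsilon}) \le \lim_{i \to \infty} \sum_{j=i}^{\infty} P(B_{n_j}^{\delta_j}) \le \lim_{i \to \infty} \sum_{j=i}^{\infty} \delta_j = 0,$$

which is what we wanted.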
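The sliding-interval counterexample above, and the subsequence promised by Proposition 4.4, can be inspected exactly. The sketch below assumes that "and so on" means block $k$ splits $[0, 1]$ into $k$ closed intervals of length $1/k$. It tracks a single sample point $\omega = 1/3$, chosen arbitrarily. The names are illustrative.

```python
from fractions import Fraction


def intervals(num_blocks):
    """A_1, A_2, ... as closed intervals (lo, hi); block k has k pieces of length 1/k."""
    out = []
    for k in range(2, num_blocks + 2):
        for j in range(k):
            out.append((Fraction(j, k), Fraction(j + 1, k)))
    return out


def indicator(interval, w):
    lo, hi = interval
    return 1 if lo <= w <= hi else 0


A = intervals(50)                      # blocks k = 2, ..., 51
lengths = [hi - lo for lo, hi in A]    # P(A_n) = P{|I_{A_n} - 0| >= eps} for 0 < eps <= 1
w = Fraction(1, 3)
values = [indicator(a, w) for a in A]  # the path n -> I_{A_n}(w)

# Subsequence: the first interval [0, 1/k] of each block.
first_in_block = [(Fraction(0), Fraction(1, k)) for k in range(2, 52)]
sub_values = [indicator(a, w) for a in first_in_block]
```

The lengths shrink to zero, which is convergence in probability. Every block still contains an interval that covers $\omega$ and one that misses it, so the full path keeps oscillating between 0 and 1. Along the subsequence of first intervals the path is eventually constant at 0 for every $\omega > 0$.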

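Definition 4.5. A sequence of probability measures $\{\alpha_n\}$ on $\mathbb{R}$ converges weakly to $\alpha$ if whenever $\alpha(\{a\}) = \alpha(\{b\}) = 0$, for $a < b$, we have

$$\lim_{n \to \infty} \alpha_n[a, b] = \alpha[a, b].$$

A sequence of random variables $\{X_n\}$ converges weakly to $X$ if the induced probability measures $\{\alpha_n\}$ converge weakly to $\alpha$. This is denoted $\alpha_n \Rightarrow \alpha$ or $X_n \Rightarrow X$.

Lemma 4.6. Suppose $\alpha_n$ and $\alpha$ are probability measures on $\mathbb{R}$ with associated distribution functions $F_n$ and $F$. Then, $\alpha_n \Rightarrow \alpha$ iff $F_n(x) \to F(x)$ for each continuity point $x$ of $F$.

Proof. First, note that $x$ is a continuity point of $F$ iff $\alpha(\{x\}) = 0$. Let $a < b$ be continuity points of $F$. Suppose $F_n(x) \to F(x)$ for each continuity point $x$ of $F$. Then,

$$\lim_{n \to \infty} \alpha_n[a, b] = \lim_{n \to \infty} \big(F_n(b) - F_n(a)\big) = F(b) - F(a) = \alpha[a, b].$$

For the converse, suppose $\alpha_n \Rightarrow \alpha$. Then,

$$\lim_{n \to \infty} \big(F_n(b) - F_n(a)\big) = \lim_{n \to \infty} \alpha_n[a, b] = \alpha[a, b].$$

Now, we can let $a \to -\infty$ in such a way that $a$ is always a continuity point of $F$. Then, we get

$$\lim_{n \to \infty} F_n(b) = \alpha(-\infty, b] = F(b).$$

The next result shows that weak convergence is actually weak:

Proposition 4.7. Suppose $X_n$ converges in probability to $X$. Then, $X_n$ converges weakly to $X$.

Proof. Let $F_n$, $F$ be the distribution functions of $X_n$, $X$ respectively. Suppose $x$ is a continuity point of $F$. Note that

$$\{X \le x - \epsilon\} \subseteq \{X_n \le x\} \cup \{|X_n - X| \ge \epsilon\}$$

and

$$\{X_n \le x\} = \{X_n \le x \text{ and } X \le x + \epsilon\} \cup \{X_n \le x \text{ and } X > x + \epsilon\} \subseteq \{X \le x + \epsilon\} \cup \{|X_n - X| \ge \epsilon\}.$$

Therefore,

$$P\{X \le x - \epsilon\} - P\{|X_n - X| \ge \epsilon\} \le P\{X_n \le x\} \le P\{X \le x + \epsilon\} + P\{|X_n - X| \ge \epsilon\}.$$

Since for each $\epsilon > 0$, $\lim_{n \to \infty} P\{|X_n - X| \ge \epsilon\} = 0$, when we let $n \to \infty$, we have

$$F(x - \epsilon) \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F(x + \epsilon).$$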

Finally, since $F$ is continuous at $x$, letting $\epsilon \to 0$, we have

$$\lim_{n \to \infty} F_n(x) = F(x),$$

so that $X_n \Rightarrow X$.

The converse is not true in general. However, if $X$ is a degenerate distribution (takes a single value with probability one), then the converse is true.

Proposition 4.8. Suppose $X_n \Rightarrow X$, and $X$ is a degenerate distribution such that $P\{X = a\} = 1$. Then, $X_n \xrightarrow{P} X$.

Proof. Let $\alpha_n$ and $\alpha$ be the distributions on $\mathbb{R}$ induced by $X_n$ and $X$ respectively. Given $\epsilon > 0$, we have

$$\lim_{n \to \infty} \alpha_n[a - \epsilon, a + \epsilon] = \alpha[a - \epsilon, a + \epsilon] = 1.$$

Hence,

$$\lim_{n \to \infty} P\{|X_n - X| \le \epsilon\} = 1,$$

and so

$$\lim_{n \to \infty} P\{|X_n - X| > \epsilon\} = 0.$$

5 Product Measures and Independence

Suppose $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ are two measurable spaces. We want to construct a product measurable space with sample space $\Omega_1 \times \Omega_2$.

Definition 5.1. Let $\mathcal{A} = \{A \times B : A \in \mathcal{F}_1, B \in \mathcal{F}_2\}$. Let $\mathcal{F}_1 \times \mathcal{F}_2$ be the σ-field generated by $\mathcal{A}$. $\mathcal{F}_1 \times \mathcal{F}_2$ is called the product σ-field of $\mathcal{F}_1$ and $\mathcal{F}_2$.

If $P_1$ and $P_2$ are probability measures on the measurable spaces above, then $P_1 \times P_2(A \times B) := P_1(A) P_2(B)$ defines $P_1 \times P_2$ on $\mathcal{A}$. This can be extended in a canonical way to the σ-field $\mathcal{F}_1 \times \mathcal{F}_2$.

Definition 5.2. $P_1 \times P_2$ is called the product probability measure of $P_1$ and $P_2$.

Let $\Omega := \Omega_1 \times \Omega_2$, $\mathcal{F} := \mathcal{F}_1 \times \mathcal{F}_2$, and $P := P_1 \times P_2$. Note that when calculating integrals with respect to a product probability measure, we can normally perform an iterated integral in any order with respect to the component probability measures. This result is known as Fubini's Theorem.

Before we define a notion of independence, we will give some heuristic considerations. Two events $A$ and $B$ should be independent if $A$ occurring has nothing to do with $B$ occurring. If we denote by $P_A(X)$ the probability that $X$ occurs given that $A$ has occurred, then we see that $P_A(X) = P(A \cap X) / P(A)$. Now, suppose that $A$ and $B$ are indeed independent. This means that $P_A(B) = P(B)$. But then, $P(B) = P(A \cap B) / P(A)$, so that $P(A \cap B) = P(A) P(B)$. This leads us to define,

Definition 5.3. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $A_i \in \mathcal{F}$ for every $i$. Let $X_i$ be a random variable for every $i$.
(i) $A_1, \ldots, A_n$ are independent if $P(A_1 \cap \cdots \cap A_n) = P(A_1) \cdots P(A_n)$.
(ii) A collection of events $\{A_i\}_{i \in I}$ is independent if every finite subcollection is independent.
(iii) $X_1, \ldots, X_n$ are independent if for any sets $A_1, \ldots, A_n \in \mathcal{B}(\mathbb{R})$, the events $\{X_i \in A_i\}_{i=1}^{n}$ are independent.
(iv) A collection of random variables $\{X_i\}_{i \in I}$ is independent if every finite subcollection is independent.

Lemma 5.4. Suppose $X$, $Y$ are random variables on $(\Omega, \mathcal{F}, P)$, with induced distributions $\alpha$, $\beta$ on $\mathbb{R}$ respectively. Then, $X$ and $Y$ are independent if and only if the distribution induced on $\mathbb{R}^2$ by $(X, Y)$ is $\alpha \times \beta$.

Lemma 5.5. Suppose $X$, $Y$ are independent random variables, and suppose that $f$, $g$ are measurable functions. Then, $f(X)$ and $g(Y)$ are also independent random variables.

Proposition 5.6. Let $X$, $Y$ be independent random variables, and let $f$, $g$ be measurable functions. Suppose that $E|f(X)|$ and $E|g(Y)|$ are both finite. Then, $E[f(X) g(Y)] = E[f(X)] E[g(Y)]$.

Proof. Let $\alpha$ be the distribution on $\mathbb{R}$ induced by $f(X)$, and let $\beta$ be the distribution induced by $g(Y)$. Then, the distribution on $\mathbb{R}^2$ induced by $(f(X), g(Y))$ is $\alpha \times \beta$. So,

$$E[f(X) g(Y)] = \int_{\Omega} f(X(\omega))\, g(Y(\omega))\, dP = \int_{\mathbb{R}} \int_{\mathbb{R}} uv\, d\alpha\, d\beta = \int_{\mathbb{R}} u\, d\alpha \int_{\mathbb{R}} v\, d\beta = E[f(X)] E[g(Y)].$$

6 Characteristic Functions

The inverse Fourier transform of a probability distribution plays a central role in probability theory.

Definition 6.1. Let $\alpha$ be a probability measure on $\mathbb{R}$. Then, the characteristic function of $\alpha$ is

$$\phi_{\alpha}(t) = \int_{\mathbb{R}} e^{itx}\, d\alpha.$$

If $X$ is a random variable, the characteristic function of the distribution on $\mathbb{R}$ induced by $X$ will sometimes be denoted $\phi_X$.

The following results demonstrate the importance of the characteristic function in probability.

Proposition 6.2. Suppose $\alpha$ and $\beta$ are probability measures on $\mathbb{R}$ with characteristic functions $\phi$ and $\psi$ respectively. Suppose further that for each $t \in \mathbb{R}$, $\phi(t) = \psi(t)$. Then, $\alpha = \beta$.

Theorem 6.3. Let $\alpha_n$, $\alpha$ be probability measures on $\mathbb{R}$ with distribution functions $F_n$ and $F$ and characteristic functions $\phi_n$ and $\phi$. Then, the following are equivalent:
(i) $\alpha_n \Rightarrow \alpha$.
(ii) For any bounded continuous function $f : \mathbb{R} \to \mathbb{R}$, $\lim_{n \to \infty} \int_{\mathbb{R}} f(x)\, d\alpha_n = \int_{\mathbb{R}} f(x)\, d\alpha$.
(iii) For every $t \in \mathbb{R}$, $\lim_{n \to \infty} \phi_n(t) = \phi(t)$.

Theorem 6.4. Suppose $\alpha_n$ is a sequence of probability measures on $\mathbb{R}$, with characteristic functions $\phi_n$. Suppose that for each $t \in \mathbb{R}$, $\lim_{n \to \infty} \phi_n(t) =: \phi(t)$ exists and $\phi$ is continuous at 0. Then, there is a probability distribution $\alpha$ such that $\phi$ is the characteristic function of $\alpha$. Furthermore, $\alpha_n \Rightarrow \alpha$.

Next, we show how to recover the moments of a random variable from its characteristic function.

Definition 6.5. Suppose $X$ is a random variable. Then, the $k$th moment of $X$ is $EX^k$. The $k$th absolute moment of $X$ is $E|X|^k$.

Proposition 6.6. Let $X$ be a random variable. Suppose that the $k$th moment of $X$ exists. Then, the characteristic function $\phi$ of $X$ is $k$ times continuously differentiable, and $\phi^{(k)}(0) = i^k EX^k$.

Now, a result on affine transforms of a random variable:

Proposition 6.7. Suppose $X$ is a random variable, and $Y = aX + b$. Let $\phi_X$ and $\phi_Y$ be the characteristic functions of $X$ and $Y$. Then, $\phi_Y(t) = e^{itb} \phi_X(at)$.

We will often be interested in the sums of independent random variables. Suppose that $X$ and $Y$ are independent random variables with induced distributions $\alpha$ and $\beta$ on $\mathbb{R}$ respectively. Then, the induced distribution of $(X, Y)$ on $\mathbb{R}^2$ is $\alpha \times \beta$. Consider the map $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = x + y$. Then, the distribution on $\mathbb{R}$ induced by $\alpha \times \beta$ is denoted $\alpha * \beta$, and is called the convolution of $\alpha$ and $\beta$. $\alpha * \beta$ is the distribution of the sum of $X$ and $Y$.

Proposition 6.8. Suppose $X$ and $Y$ are independent random variables with distributions $\alpha$ and $\beta$ respectively. Then, $\phi_{X+Y}(t) = \phi_X(t) \phi_Y(t)$.
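Propositions 6.6 and 6.8 can be sanity-checked numerically for discrete distributions, where the integral in Definition 6.1 becomes a finite sum. The sketch assumes a fair die and a fair coin as the two independent variables. The derivative at 0 is approximated by a central difference, so the check is approximate. The helper names are illustrative.

```python
import cmath


def char_fn(dist, t):
    """phi(t) = E[exp(i t X)] for a finite distribution {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())


def convolve(dist_a, dist_b):
    """Distribution of X + Y for independent X ~ dist_a and Y ~ dist_b."""
    out = {}
    for x, p in dist_a.items():
        for y, q in dist_b.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out


die = {k: 1 / 6 for k in range(1, 7)}   # assumed fair
coin = {0: 0.5, 1: 0.5}                 # assumed fair
total = convolve(die, coin)

t = 0.7
lhs = char_fn(total, t)                       # phi_{X+Y}(t)
rhs = char_fn(die, t) * char_fn(coin, t)      # phi_X(t) * phi_Y(t)

# Proposition 6.6 with k = 1: phi'(0) = i * EX, and EX = 3.5 for a fair die.
h = 1e-5
deriv_at_0 = (char_fn(die, h) - char_fn(die, -h)) / (2 * h)
```

The two sides of the product formula agree up to floating-point rounding, and the numerical derivative is close to $3.5i$.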

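Proof.

$$\phi_{\alpha * \beta}(t) = \int_{\mathbb{R}} e^{itz}\, d(\alpha * \beta) = \int_{\mathbb{R}} \int_{\mathbb{R}} e^{it(x+y)}\, d\alpha\, d\beta = \int_{\mathbb{R}} \int_{\mathbb{R}} e^{itx} e^{ity}\, d\alpha\, d\beta = \int_{\mathbb{R}} e^{itx}\, d\alpha \int_{\mathbb{R}} e^{ity}\, d\beta = \phi_{\alpha}(t) \phi_{\beta}(t).$$

7 Useful Bounds and Inequalities

Here, we will prove some useful bounds regarding random variables and their moments.

Definition 7.1. Let $X$ be a random variable. Then, the variance of $X$ is

$$\mathrm{Var}(X) := E[(X - EX)^2] = EX^2 - (EX)^2.$$

The variance is a measure of how far spread $X$ is on average from its mean. It exists if $X$ has a finite second moment. It is often denoted $\sigma^2$.

Lemma 7.2. Suppose $X$, $Y$ are independent random variables. Then, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

Proposition 7.3 (Markov's Inequality). Let $\epsilon > 0$. Suppose $X$ is a random variable with finite $k$th absolute moment. Then,

$$P\{|X| \ge \epsilon\} \le \frac{1}{\epsilon^k} E|X|^k.$$

Proof.

$$P\{|X| \ge \epsilon\} = \int_{\{|X| \ge \epsilon\}} dP \le \frac{1}{\epsilon^k} \int_{\{|X| \ge \epsilon\}} |X|^k\, dP \le \frac{1}{\epsilon^k} \int_{\Omega} |X|^k\, dP = \frac{1}{\epsilon^k} E|X|^k.$$

Corollary 7.4 (Chebyshev's Inequality). Suppose $X$ is a random variable with finite 2nd moment. Then,

$$P\{|X - EX| \ge \epsilon\} \le \frac{1}{\epsilon^2} \mathrm{Var}(X).$$

The following is also a useful fact:

Lemma 7.5. Suppose $X$ is a nonnegative random variable. Then,

$$\sum_{m=1}^{\infty} P\{X \ge m\} \le EX.$$

Proof.

$$\sum_{m=1}^{\infty} P\{X \ge m\} = \sum_{m=1}^{\infty} \sum_{n=m}^{\infty} P\{n \le X < n + 1\} = \sum_{n=1}^{\infty} n\, P\{n \le X < n + 1\} \le EX.$$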
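The bounds of this section can be checked exactly on a small example. The sketch assumes $X$ is the outcome of a fair die and uses the thresholds 5 (Markov, $k = 1$) and 2 (Chebyshev). These values and the helper names are chosen for illustration.

```python
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}   # assumed fair


def expect(dist, g=lambda x: x):
    """E[g(X)] for a finite distribution {value: probability}."""
    return sum(g(x) * p for x, p in dist.items())


def prob(dist, pred):
    """P{pred(X)} for a finite distribution."""
    return sum((p for x, p in dist.items() if pred(x)), Fraction(0))


mean = expect(die)
var = expect(die, lambda x: (x - mean) ** 2)

# Markov's inequality with k = 1 and eps = 5.
markov_lhs = prob(die, lambda x: abs(x) >= 5)
markov_rhs = expect(die, abs) / 5

# Chebyshev's inequality with eps = 2.
eps = 2
cheb_lhs = prob(die, lambda x: abs(x - mean) >= eps)
cheb_rhs = var / eps ** 2

# Lemma 7.5: the sum over m >= 1 of P{X >= m}; terms with m > 6 vanish.
tail_sum = sum(prob(die, lambda x, m=m: x >= m) for m in range(1, 7))
```

For an integer-valued variable such as this one, the sum in Lemma 7.5 equals $EX$, so the bound is attained. The Markov and Chebyshev bounds hold with room to spare.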

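8 The Borel-Cantelli Lemma

First, let us introduce some terminology. Let $A_1, A_2, \ldots$ be sets. Then,

$$\limsup A_n := \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m.$$

$\limsup A_n$ consists of those $\omega$ that appear in $A_n$ infinitely often (i.o.). Also,

$$\liminf A_n := \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m.$$

$\liminf A_n$ consists of those $\omega$ that appear in all but finitely many $A_n$.

Theorem 8.1 (Borel-Cantelli Lemma). Let $A_1, A_2, \ldots \in \mathcal{F}$. If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(\limsup A_n) = 0$. Furthermore, suppose that the $A_i$ are independent. Then, if $\sum_{n=1}^{\infty} P(A_n) = \infty$, then $P(\limsup A_n) = 1$.

Proof. Suppose $\sum_{n=1}^{\infty} P(A_n) < \infty$. Then,

$$P\Big(\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m\Big) = \lim_{n \to \infty} P\Big(\bigcup_{m=n}^{\infty} A_m\Big) \le \lim_{n \to \infty} \sum_{m=n}^{\infty} P(A_m) = 0.$$

For the converse, it is enough to show that

$$P\Big(\bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m^c\Big) = 0,$$

and so it is also enough to show that

$$P\Big(\bigcap_{m=n}^{\infty} A_m^c\Big) = 0$$

for all $n$. By independence, and since $1 - x \le e^{-x}$, we have

$$P\Big(\bigcap_{m=n}^{\infty} A_m^c\Big) \le P\Big(\bigcap_{m=n}^{n+k} A_m^c\Big) = \prod_{m=n}^{n+k} \big(1 - P(A_m)\big) \le \exp\Big\{-\sum_{m=n}^{n+k} P(A_m)\Big\}.$$

Since the last sum diverges, taking the limit as $k \to \infty$, we get

$$P\Big(\bigcap_{m=n}^{\infty} A_m^c\Big) = 0.$$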
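The two estimates used in the proof can be evaluated for concrete sequences. The sketch assumes $P(A_m) = 1/m^2$ for the summable case and independent events with $P(A_m) = 1/m$ for the divergent case. Both choices and the function names are illustrative.

```python
import math
from fractions import Fraction


def tail_bound(n, upto):
    """Sum of P(A_m) = 1/m^2 over n <= m <= upto: a truncated union bound."""
    return sum(Fraction(1, m * m) for m in range(n, upto + 1))


def prob_none_occur(n, k):
    """P(no A_m occurs, n <= m <= n + k) for independent A_m with P(A_m) = 1/m."""
    p = Fraction(1)
    for m in range(n, n + k + 1):
        p *= 1 - Fraction(1, m)
    return p


def exp_bound(n, k):
    """The bound exp(-sum of P(A_m)) obtained from 1 - x <= exp(-x)."""
    return math.exp(-sum(1.0 / m for m in range(n, n + k + 1)))


summable_tail = tail_bound(100, 1000)
none_short = prob_none_occur(2, 98)      # window m = 2, ..., 100
none_long = prob_none_occur(2, 9998)     # window m = 2, ..., 10000
```

In the divergent case the product telescopes to $(n-1)/(n+k)$, which tends to 0 as $k$ grows, matching the second half of the proof. In the summable case the tail sums themselves become small, matching the first half.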

9 The Law of Large Numbers

Let $X_1, X_2, \ldots$ be random variables that are independent and identically distributed (iid). Let $S_n := X_1 + \cdots + X_n$. We will be interested in the asymptotic behavior of the average $S_n / n$. If $X_i$ has a finite expectation, then we would expect $S_n / n$ to settle down to $EX_i$. This is known as the Law of Large Numbers. There are two varieties of this law: the Weak Law of Large Numbers and the Strong Law of Large Numbers. The weak law states that the average converges in probability to $EX_i$. The strong law, however, states that the average converges almost surely to $EX_i$. The strong law is significantly harder to prove, and requires a bit of additional machinery.

For the rest of this section, fix a probability space $(\Omega, \mathcal{F}, P)$.

Theorem 9.1 (The Weak Law of Large Numbers). Suppose $X_1, X_2, \ldots$ are iid random variables with mean $EX_i = m < \infty$. Then, $S_n / n \xrightarrow{P} m$.

Proof. Let $\phi$ be the characteristic function of $X_i$. Then, by Proposition 6.8, the characteristic function of $S_n$ is $[\phi(t)]^n$. Then, by Proposition 6.7, the characteristic function of $S_n / n$ is $\psi_n(t) = [\phi(t/n)]^n$. Furthermore, by Proposition 6.6, $\phi$ is differentiable, and $\phi'(0) = im$. Therefore, we can form the Taylor expansion

$$\phi\Big(\frac{t}{n}\Big) = 1 + \frac{imt}{n} + o\Big(\frac{1}{n}\Big),$$

and so

$$\psi_n(t) = \Big[1 + \frac{imt}{n} + o\Big(\frac{1}{n}\Big)\Big]^n.$$

Taking the limit as $n \to \infty$, we get

$$\lim_{n \to \infty} \psi_n(t) = e^{imt},$$

which is the characteristic function for the distribution degenerate at $m$. By Theorem 6.3, $S_n / n$ therefore converges weakly to $m$, and so, by Proposition 4.8, $S_n / n$ converges in probability to $m$.

Theorem 9.2 (The Strong Law of Large Numbers). Suppose $X_1, X_2, \ldots$ are iid random variables with $EX_i = m < \infty$. Let $S_n = X_1 + \cdots + X_n$. Then, $S_n / n$ converges to $m$ almost surely.

Proof. We can decompose an arbitrary random variable $X_i$ into its positive and negative parts: $X_i^+ := X_i I_{\{X_i \ge 0\}}$ and $X_i^- := -X_i I_{\{X_i < 0\}}$, so that $X_i = X_i^+ - X_i^-$. Then, we have

$$S_n = X_1^+ + \cdots + X_n^+ - (X_1^- + \cdots + X_n^-) =: S_n^+ - S_n^-.$$

Hence, it is enough to prove the Theorem for nonnegative $X_i$.

Now, let $Y_i := X_i I_{\{X_i \le i\}}$. Let $S_n^* := Y_1 + \cdots + Y_n$. Furthermore, let $\alpha > 1$, and set $u_n := \lfloor \alpha^n \rfloor$. We shall first establish the inequality

$$\sum_{n=1}^{\infty} P\Big\{\Big|\frac{S_{u_n}^* - ES_{u_n}^*}{u_n}\Big| \ge \epsilon\Big\} < \infty \quad \forall \epsilon > 0. \tag{9.1}$$

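Since the $X_i$ are independent, we have

$$\mathrm{Var}(S_n^*) = \sum_{k=1}^{n} \mathrm{Var}(Y_k) \le \sum_{k=1}^{n} EY_k^2 = \sum_{k=1}^{n} E[X_i^2 I_{\{X_i \le k\}}] \le n\, E[X_i^2 I_{\{X_i \le n\}}].$$

By Chebyshev's inequality, we have

$$\sum_{n=1}^{\infty} P\Big\{\Big|\frac{S_{u_n}^* - ES_{u_n}^*}{u_n}\Big| \ge \epsilon\Big\} \le \sum_{n=1}^{\infty} \frac{\mathrm{Var}(S_{u_n}^*)}{\epsilon^2 u_n^2} \le \frac{1}{\epsilon^2} \sum_{n=1}^{\infty} \frac{E[X_i^2 I_{\{X_i \le u_n\}}]}{u_n} = \frac{1}{\epsilon^2} E\Big[X_i^2 \sum_{n=1}^{\infty} \frac{1}{u_n} I_{\{X_i \le u_n\}}\Big]. \tag{9.2}$$

Now, let $K := \frac{2\alpha}{\alpha - 1}$. Let $x > 0$, and let $N := \inf\{n : u_n \ge x\}$. Then, $\alpha^N \ge x$. Also, note that $\alpha^n \le 2u_n$, and so $u_n^{-1} \le 2\alpha^{-n}$. Then,

$$\sum_{u_n \ge x} \frac{1}{u_n} \le 2 \sum_{n \ge N} \alpha^{-n} = 2\alpha^{-N} \sum_{n=0}^{\infty} \Big(\frac{1}{\alpha}\Big)^n = K\alpha^{-N} \le Kx^{-1},$$

and hence,

$$\sum_{n=1}^{\infty} \frac{1}{u_n} I_{\{X_i \le u_n\}} \le K X_i^{-1} \quad \text{if } X_i > 0,$$

and so, putting this into (9.2), we get

$$\frac{1}{\epsilon^2} E\Big[X_i^2 \sum_{n=1}^{\infty} \frac{1}{u_n} I_{\{X_i \le u_n\}}\Big] \le \frac{1}{\epsilon^2} E\big[X_i^2\, K X_i^{-1}\big] = \frac{K}{\epsilon^2} EX_i < \infty,$$

thereby establishing inequality (9.1). Therefore, by the Borel-Cantelli Lemma, we have

$$P\Big(\limsup_{n} \Big\{\Big|\frac{S_{u_n}^* - ES_{u_n}^*}{u_n}\Big| \ge \epsilon\Big\}\Big) = 0 \quad \forall \epsilon > 0.$$

Taking a union of these null sets over all rational $\epsilon > 0$, we get that

$$\frac{S_{u_n}^* - ES_{u_n}^*}{u_n} \to 0 \quad \text{almost surely}.$$

However, $\frac{1}{n} ES_n^* = \frac{1}{n} \sum_{k=1}^{n} EY_k$, and since $EY_k \to EX_i$, taking the limit as $n \to \infty$, we have that $\frac{1}{n} ES_n^* \to EX_i$. Therefore, we have that

$$\frac{S_{u_n}^*}{u_n} \to EX_i \quad \text{almost surely}. \tag{9.3}$$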

Now, notice that by Lemma 7.5,

$$\sum_{n=1}^{\infty} P\{X_n \ne Y_n\} = \sum_{n=1}^{\infty} P\{X_i > n\} \le EX_i < \infty.$$

Again, by the Borel-Cantelli Lemma, we have

$$P\big(\limsup \{X_n \ne Y_n\}\big) = 0.$$

Therefore, $\frac{S_n - S_n^*}{n} \to 0$ almost surely, and so by (9.3),

$$\frac{S_{u_n}}{u_n} \to EX_i \quad \text{almost surely}. \tag{9.4}$$

Now, to get that the entire sequence $S_n / n \to EX_i$ almost surely, note that $S_m$ is an increasing sequence. Suppose $u_n \le k \le u_{n+1}$. Then,

$$\frac{u_n}{u_{n+1}} \cdot \frac{S_{u_n}}{u_n} \le \frac{S_k}{k} \le \frac{u_{n+1}}{u_n} \cdot \frac{S_{u_{n+1}}}{u_{n+1}},$$

and so, by (9.4),

$$\frac{1}{\alpha} EX_i \le \liminf_{k \to \infty} \frac{S_k}{k} \le \limsup_{k \to \infty} \frac{S_k}{k} \le \alpha\, EX_i \quad \text{almost surely}.$$

Taking $\alpha \to 1$, we get

$$\lim_{k \to \infty} \frac{S_k}{k} = EX_i \quad \text{almost surely}.$$

10 Conditional Expectation and Probability

Before defining conditional expectation and probability, we will make a few observations about the probabilistic interpretation of σ-fields. Consider a process where a random number between zero and one is chosen. More precisely, an outcome $\omega$ is chosen according to some probability law from the set of all possible outcomes, $\Omega = [0, 1)$. We may be able to observe this number $\omega$ to some amount of precision, say up to one digit. The σ-field that represents this amount of precision is $\mathcal{F}_1 := \sigma\{[0, .1), [.1, .2), \ldots, [.9, 1)\}$. The σ-field $\mathcal{F}_1$ represents all the information that we can know about $\omega$ by observing it to one digit of precision. That is, an observer who can observe the number $\omega$ to one digit will be able to determine exactly which sets $A \in \mathcal{F}_1$ contain $\omega$, but will not be able to give any information more precise than that. Similarly, if we can observe $\omega$ up to $n$ digits of precision, the σ-field which corresponds to this is:

$$\mathcal{F}_n := \sigma\Big\{\Big[\frac{i}{10^n}, \frac{i+1}{10^n}\Big) : 0 \le i < 10^n\Big\}.$$

This example illustrates a general concept: the σ-field that is used represents the amount of information that an observer has about the random process.
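The digit-precision example can be made concrete by computing the generating interval of $\mathcal{F}_n$ that contains a given outcome. The sketch below assumes $\omega$ is supplied as an exact rational number, to avoid floating-point rounding at interval endpoints. The function name `atom` and the sample value are illustrative.

```python
from fractions import Fraction


def atom(omega, n):
    """Endpoints (lo, hi) of the interval [i/10^n, (i+1)/10^n) containing omega."""
    if not 0 <= omega < 1:
        raise ValueError("omega must lie in [0, 1)")
    scale = 10 ** n
    i = int(omega * scale)   # truncation equals floor because omega >= 0
    return Fraction(i, scale), Fraction(i + 1, scale)


omega = Fraction(314159, 1000000)   # the outcome 0.314159
one_digit = atom(omega, 1)          # what an F_1-observer can see
three_digits = atom(omega, 3)       # what an F_3-observer can see
```

An $\mathcal{F}_1$-observer learns only that $\omega$ lies in $[0.3, 0.4)$, while an $\mathcal{F}_3$-observer learns that it lies in $[0.314, 0.315)$. The second interval is contained in the first, which reflects that more digits mean more information.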

Definition 10.1. If $\mathcal{F}$ is a σ-field, an $\mathcal{F}$-observer is an observer who can determine precisely which sets $A \in \mathcal{F}$ contain a random outcome $\omega$, but has no more information about $\omega$.

Therefore, a $2^{\Omega}$-observer has complete information about the outcome $\omega$, whereas an $\mathcal{F}$-observer has less information. Similarly, if $\Sigma \subseteq \mathcal{F}$, then an $\mathcal{F}$-observer has more information than a $\Sigma$-observer.

Suppose that a random variable $X$ is $\mathcal{F}$-measurable. This means that the preimage of any Borel set under $X$ is in $\mathcal{F}$. Therefore, an $\mathcal{F}$-observer will have complete information about $X$, or any other $\mathcal{F}$-measurable random variable. Note that if $\Sigma \subseteq \mathcal{F}$, a $\Sigma$-measurable function is also $\mathcal{F}$-measurable.

Suppose that $X$ is an $\mathcal{F}$-measurable random variable, and that you are a $\Sigma$-observer. You do not have complete information about $X$. However, given your information $\Sigma$, you would like to make a best guess about the value of $X$. That is, you want to create another random variable, $Y$, that is $\Sigma$-measurable, but which approximates $X$. $Y$ is called the conditional expectation of $X$ with respect to $\Sigma$, and is denoted $E[X \mid \Sigma]$. We will require that

$$\int_A X(\omega)\, dP = \int_A E[X \mid \Sigma](\omega)\, dP \quad \text{for all } A \in \Sigma. \tag{10.1}$$

Lemma 10.2. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\Sigma$ be a sub-σ-field of $\mathcal{F}$. Let $P_{\Sigma}$ denote the restriction of $P$ to $\Sigma$. Suppose $f$ is a $\Sigma$-measurable function and $A \in \Sigma$. Then,

$$\int_A f(\omega)\, dP_{\Sigma} = \int_A f(\omega)\, dP.$$

Justified by the previous lemma, we will often be sloppy and not explicitly say which σ-field a particular integral is taken over.

In order to prove that a function satisfying (10.1) exists, we will have to discuss the Radon-Nikodym Theorem. First, a definition.

Definition 10.3. A signed measure $\lambda$ on a measurable space $(\Omega, \mathcal{F})$ is a function $\lambda : \mathcal{F} \to \mathbb{R}$ such that whenever $A_1, A_2, \ldots$ is a finite or countable sequence of disjoint sets in $\mathcal{F}$, we have

$$\lambda\Big(\bigcup_i A_i\Big) = \sum_i \lambda(A_i).$$

In particular, we have for a signed measure, $\lambda(\emptyset) = 0$. All probability measures are also signed measures. Note that $\lambda$ is permitted to take on negative values. However, it is not permitted to take on the values $+\infty$ or $-\infty$.

Definition 10.4. A signed measure $\lambda$ on $(\Omega, \mathcal{F})$ is absolutely continuous with respect to a probability measure $P$ if, whenever $P(A) = 0$, we have also $\lambda(A) = 0$. This is denoted $\lambda \ll P$.

For example, if $f$ is an integrable function with respect to $P$, then

$$\lambda(A) = \int_A f(\omega)\, dP$$

is a signed measure that is absolutely continuous with respect to $P$. In fact, all absolutely continuous signed measures arise in this way.

Theorem 10.5 (Radon-Nikodym). Suppose $\lambda \ll P$. Then, there is an integrable function $f$ such that

$$\lambda(A) = \int_A f(\omega)\, dP. \tag{10.2}$$

Furthermore, if $f'$ is another function satisfying (10.2), then $f = f'$ $P$-almost everywhere.

Definition 10.6. The function $f$ in Theorem 10.5 is called the Radon-Nikodym derivative of $\lambda$ with respect to $P$. It is denoted $\frac{d\lambda}{dP}$.

Note that the Radon-Nikodym derivative is only defined up to equality almost everywhere. We can use the Radon-Nikodym derivative to define the conditional expectation satisfying (10.1).

Definition 10.7. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\Sigma$ be a sub-σ-field of $\mathcal{F}$. Let $X$ be an integrable $\mathcal{F}$-measurable random variable. Let $\lambda$ be the signed measure defined by $\lambda(A) = \int_A X(\omega)\, dP$, and let $\lambda_{\Sigma}$ and $P_{\Sigma}$ denote the restrictions of $\lambda$ and $P$ to $\Sigma$. The conditional expectation of $X$ with respect to $\Sigma$ is

$$E[X \mid \Sigma] := \frac{d\lambda_{\Sigma}}{dP_{\Sigma}}.$$

We now state some of the elementary properties of conditional expectation.

Lemma 10.8. Let $X$ and $X_i$ be random variables on $(\Omega, \mathcal{F}, P)$. Let $\Sigma$ be a sub-σ-field of $\mathcal{F}$.
(i) $E[E[X \mid \Sigma]] = E[X]$.
(ii) If $X$ is nonnegative, then $E[X \mid \Sigma]$ is nonnegative almost surely.
(iii) Suppose $a_1, a_2 \in \mathbb{R}$. Then $E[a_1 X_1 + a_2 X_2 \mid \Sigma] = a_1 E[X_1 \mid \Sigma] + a_2 E[X_2 \mid \Sigma]$ almost surely.
(iv) $\int |E[X \mid \Sigma]|\, dP \le \int |X|\, dP$.
(v) If $Y$ is bounded and $\Sigma$-measurable, then $E[XY \mid \Sigma] = Y E[X \mid \Sigma]$ almost surely.
(vi) If $\Sigma_2 \subseteq \Sigma_1 \subseteq \mathcal{F}$ are sub-σ-fields, then $E[X \mid \Sigma_2] = E[E[X \mid \Sigma_1] \mid \Sigma_2]$ almost surely.

As a special case of conditional expectation, we have conditional probability.
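When $\Sigma$ is generated by a finite partition into atoms of positive probability, the conditional expectation is the $P$-weighted average of $X$ over each atom. This lets (10.1) and parts of Lemma 10.8 be verified directly. The sketch assumes a fair die with $X(\omega) = \omega$. The partitions and helper names are illustrative.

```python
from fractions import Fraction

OMEGA = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in OMEGA}   # assumed uniform


def cond_exp(X, partition):
    """E[X | Sigma] as {omega: value}, where the partition generates Sigma."""
    result = {}
    for atom in partition:
        p_atom = sum(P[w] for w in atom)
        avg = sum(X(w) * P[w] for w in atom) / p_atom   # weighted average on the atom
        for w in atom:
            result[w] = avg
    return result


def integral(Y, A):
    """Integral over A of a function stored as {omega: value}, with respect to P."""
    return sum(Y[w] * P[w] for w in A)


def X(w):
    return w


parity = [{1, 3, 5}, {2, 4, 6}]         # generates Sigma_2
finer = [{1}, {3, 5}, {2, 4}, {6}]      # generates Sigma_1, which contains Sigma_2

X_vals = {w: X(w) for w in OMEGA}
Y = cond_exp(X, parity)
Y1 = cond_exp(X, finer)
tower = cond_exp(lambda w: Y1[w], parity)                      # E[E[X|Sigma_1] | Sigma_2]
cond_prob = cond_exp(lambda w: 1 if w >= 5 else 0, parity)     # P[{5, 6} | Sigma_2]
```

The parity observer's best guess is 3 on odd outcomes and 4 on even ones. Integrating `Y` and `X_vals` over either atom gives the same value, as (10.1) requires. Conditioning first on the finer partition and then on parity reproduces `Y`, as in Lemma 10.8 (vi). The last line treats a conditional probability as the conditional expectation of an indicator.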

Definition 10.9. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\Sigma$ be a sub-σ-field of $\mathcal{F}$. Then, the conditional probability of an event $A \in \mathcal{F}$ given $\Sigma$ is

$$P[A \mid \Sigma] := E[I_A \mid \Sigma].$$

$P[A \mid \Sigma](\omega)$ is also sometimes written $P(\omega, A)$.

We now state some of the elementary properties of conditional probability.

Lemma 10.10. The following hold almost surely:
(i) $P(\omega, \Omega) = 1$ and $P(\omega, \emptyset) = 0$.
(ii) For any $A \in \mathcal{F}$, $0 \le P(\omega, A) \le 1$.
(iii) If $A_1, A_2, \ldots$ is a finite or countable sequence of disjoint sets in $\mathcal{F}$, then $P\big(\omega, \bigcup_i A_i\big) = \sum_i P(\omega, A_i)$.
(iv) If $A \in \Sigma$, then $P(\omega, A) = I_A(\omega)$.

Lemma 10.10 in particular implies that given $\omega \in \Omega$, $P(\omega, \cdot)$ is a probability measure on $(\Omega, \mathcal{F})$.

References

[1] P. Billingsley, Probability and Measure, John Wiley & Sons, Inc., 1995.
[2] S.R.S. Varadhan, Probability Theory, Courant Lecture Notes, 7. American Mathematical Society, 2001.