Probability Theory

Muhammad Waliji

August 11, 2006

Abstract

This paper introduces some elementary notions in Measure-Theoretic Probability Theory. Several probabilistic notions of the convergence of a sequence of random variables are discussed. The theory is then used to prove the Law of Large Numbers. Finally, the notions of conditional expectation and conditional probability are introduced.

1 Heuristic Introduction

Probability theory is concerned with the outcomes of experiments that are random in nature, that is, experiments whose outcomes cannot be predicted in advance. The set of possible outcomes $\omega$ of an experiment is called the sample space, denoted by $\Omega$. For instance, if our experiment consists of rolling a die, we have $\Omega = \{1, 2, 3, 4, 5, 6\}$. A subset $A$ of $\Omega$ is called an event. For instance, $A = \{1, 3, 5\}$ corresponds to the event "an odd number is rolled."

In elementary probability theory, one is normally concerned with sample spaces that are either finite or countable. In this case, one often assigns a probability to every single outcome. That is, we have a probability function $P : \Omega \to [0, 1]$, where $P(\omega)$ is the probability that $\omega$ occurs. Here, we insist that

$$\sum_{\omega \in \Omega} P(\omega) = 1.$$

However, if the sample space is uncountable, then this condition becomes meaningless. For a number drawn uniformly from $[0, 1]$, for example, every single outcome has probability $0$, so the outcome probabilities cannot sum to $1$. Two elementary types of problems fall into this category and hence cannot be dealt with by elementary probability theory:
- an infinite sequence of repeated coin tosses (or dice rolls);
- a number drawn at random from $[0, 1]$.

This illustrates the importance of uncountable sample spaces.

The solution to this problem is to use the theory of measures. Instead of assigning probabilities to outcomes in the sample space, one restricts attention to a certain class of events that form a structure known as a σ-field. Probabilities are then assigned to these special kinds of events.
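To make the finite case concrete, here is a minimal Python sketch for a single roll of a fair die. The fairness assumption and the names `P` and `prob` are our own illustrative choices. Exact fractions are used so that the condition $\sum_{\omega} P(\omega) = 1$ can be checked without rounding error.

```python
from fractions import Fraction

# Sample space for one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}

# Probability function on single outcomes: P(w) = 1/6 for every w.
P = {w: Fraction(1, 6) for w in omega}

# The normalisation condition: outcome probabilities sum to 1.
assert sum(P.values()) == 1

def prob(event):
    """Probability of an event (a subset of omega): sum P over its outcomes."""
    return sum((P[w] for w in event), Fraction(0))

odd = {1, 3, 5}        # the event "an odd number is rolled"
print(prob(odd))       # 1/2
```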
2 σ-fields, Probability Measures, and Distribution Functions

Definition 2.1. A class $\mathcal{F}$ of subsets of $\Omega$ is a σ-field if the following hold:
(i) $\emptyset \in \mathcal{F}$ and $\Omega \in \mathcal{F}$;
(ii) $A \in \mathcal{F} \implies A^c \in \mathcal{F}$;
(iii) $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{n=1}^\infty A_n \in \mathcal{F}$.

Note that, by De Morgan's laws, this implies that σ-fields are also closed under countable intersections.

Definition 2.2. The σ-field generated by a class of sets $\mathcal{A}$ is the smallest σ-field containing $\mathcal{A}$. It is denoted $\sigma(\mathcal{A})$.

Definition 2.3. Let $\mathcal{F}$ be a σ-field. A function $P : \mathcal{F} \to [0, 1]$ is a probability measure if $P(\emptyset) = 0$, $P(\Omega) = 1$, and whenever $(A_n)_{n \in \mathbb{N}}$ is a disjoint collection of sets in $\mathcal{F}$, we have

$$P\Big(\bigcup_{n=1}^\infty A_n\Big) = \sum_{n=1}^\infty P(A_n).$$

Throughout this paper, unless otherwise noted, the words increasing, decreasing, and monotone are always meant in their weak sense. Suppose $\{A_n\}$ is a sequence of sets.
- We say that $\{A_n\}$ is an increasing sequence if $A_1 \subseteq A_2 \subseteq \cdots$.
- We say that $\{A_n\}$ is a decreasing sequence if $A_1 \supseteq A_2 \supseteq \cdots$.
- In both of these cases, the sequence $\{A_n\}$ is said to be monotone.

If $\{A_n\}$ is increasing, set $\lim_n A_n := \bigcup_n A_n$. If $\{A_n\}$ is decreasing, set $\lim_n A_n := \bigcap_n A_n$.

The following properties follow immediately from the definitions.

Lemma 2.4. Let $\mathcal{F}$ be a σ-field, and let $P$ be a probability measure on it.
(i) $P(A^c) = 1 - P(A)$.
(ii) If $A \subseteq B$, then $P(A) \le P(B)$.
(iii) For $A_1, A_2, \ldots \in \mathcal{F}$, $P\big(\bigcup_{i=1}^\infty A_i\big) \le \sum_{i=1}^\infty P(A_i)$.
(iv) If $\{A_n\}$ is a monotone sequence in $\mathcal{F}$, then $\lim_n P(A_n) = P(\lim_n A_n)$.

Definition 2.5. Suppose $\Omega$ is a set, $\mathcal{F}$ is a field on $\Omega$, and $P$ is a probability measure on $\mathcal{F}$.
- The ordered pair $(\Omega, \mathcal{F})$ is called a measurable space.
- The triple $(\Omega, \mathcal{F}, P)$ is called a probability space.

A probability space is called finitely additive or countably additive depending on whether $P$ is finitely or countably additive.

Definition 2.6. Let $(X, \tau)$ be a topological space. The σ-field $\mathcal{B}(X, \tau)$ generated by $\tau$ is called the Borel σ-field. In particular, $\mathcal{B}(X, \tau)$ is the smallest σ-field containing all open and closed sets of $X$. The sets of $\mathcal{B}(X, \tau)$ are called Borel sets.
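Definition 2.2 is not constructive in general. When $\Omega$ is finite, however, the generated σ-field can be computed directly. Start from $\mathcal{A} \cup \{\emptyset, \Omega\}$ and close under complements and unions until nothing new appears (on a finite set, countable unions are finite unions). The Python sketch below assumes a finite $\Omega$. `generated_sigma_field` is a hypothetical helper of ours, not a library function.

```python
from itertools import combinations

def generated_sigma_field(omega, generators):
    """Smallest sigma-field on the finite set `omega` containing `generators`.

    On a finite set, closing under complements and pairwise unions is enough:
    any countable union of subsets of a finite set is a finite union."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = set(field)
        new |= {omega - a for a in field}                    # complements
        new |= {a | b for a, b in combinations(field, 2)}    # pairwise unions
        if new == field:                                     # nothing new: closed
            return field
        field = new

# Generated by "odd" and {1, 2}; its atoms are {1}, {2}, {3, 5}, {4, 6}.
F = generated_sigma_field(range(1, 7), [{1, 3, 5}, {1, 2}])
print(len(F))   # 16 = 2**4, one set for each union of atoms
```

The loop terminates because there are only finitely many subsets of $\Omega$. Every set it adds must belong to any σ-field containing the generators, so the result is the smallest one.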
When the topology $\tau$, or even the space $X$, is obvious from the context, $\mathcal{B}(X, \tau)$ will often be abbreviated $\mathcal{B}(X)$ or even just $\mathcal{B}$. A particularly important situation in probability theory is when $\Omega = \mathbb{R}$ and $\mathcal{F}$ is the σ-field of Borel sets in $\mathbb{R}$.

Definition 2.7. A distribution function is an increasing right-continuous function $F : \mathbb{R} \to [0, 1]$ such that

$$\lim_{x \to -\infty} F(x) = 0 \quad \text{and} \quad \lim_{x \to +\infty} F(x) = 1.$$

We can associate probability measures on $(\mathbb{R}, \mathcal{B})$ with distribution functions. Namely, the distribution function associated with $P$ is $F(x) := P((-\infty, x])$. Conversely, each distribution function defines a probability measure on the reals.

3 Random Variables, Transformations, and Expectation

We have now stated the basic objects that we will be studying and discussed their elementary properties. We now introduce the concept of a random variable.

Let $\Omega$ be the set of all possible drawings of lottery numbers. The function $X : \Omega \to \mathbb{R}$ which indicates the payoff $X(\omega)$ to a player associated with a drawing $\omega$ is an example of a random variable. The expectation of a random variable $X$ is the average or expected value of $X$.

Definition 3.1. Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be measurable spaces. A function $T : \Omega_1 \to \Omega_2$ is a measurable transformation if the preimage of any measurable set is a measurable set. That is, $T$ is a measurable transformation if $(\forall A \in \mathcal{F}_2)(T^{-1}(A) \in \mathcal{F}_1)$.

Lemma 3.2. It is sufficient to check the condition in Definition 3.1 for those $A$ in a class that generates $\mathcal{F}_2$. More precisely, suppose that $\mathcal{A}$ generates $\mathcal{F}_2$. Then, if $(\forall A \in \mathcal{A})(T^{-1}(A) \in \mathcal{F}_1)$, $T$ is a measurable transformation.

Proof. Let $\mathcal{C} := \{A \in 2^{\Omega_2} : T^{-1}(A) \in \mathcal{F}_1\}$. Then $\mathcal{C}$ is a σ-field, since preimages preserve complements and countable unions. Moreover, $\mathcal{A} \subseteq \mathcal{C}$. But then $\mathcal{F}_2 = \sigma(\mathcal{A}) \subseteq \mathcal{C}$, which is exactly what we wanted.

Definition 3.3. Let $(\Omega, \mathcal{F})$ be a measurable space. A measurable function or a random variable is a measurable transformation from $(\Omega, \mathcal{F})$ into $(\mathbb{R}, \mathcal{B})$.

Lemma 3.4. If $f : \mathbb{R} \to \mathbb{R}$ is a continuous function, then $f$ is a measurable transformation from $(\mathbb{R}, \mathcal{B})$ to $(\mathbb{R}, \mathcal{B})$.

Definition 3.5. Given a set $A$, the indicator function for $A$ is the function

$$I_A(\omega) := \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A. \end{cases}$$
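As an illustration of Definitions 2.7 and 3.5, the sketch below computes $F(x) = P((-\infty, x])$ for the fair-die measure on $\mathbb{R}$, together with an indicator function. The measure (mass $1/6$ at each of $1, \ldots, 6$) is an assumption chosen for the example, and the helper names `F` and `indicator` are ours. The printed values show the jump of $F$ at an atom and its right-continuity there.

```python
from fractions import Fraction
import math

# Fair-die distribution on the real line: mass 1/6 at each of 1, ..., 6.
masses = {k: Fraction(1, 6) for k in range(1, 7)}

def F(x):
    """Distribution function F(x) = P((-inf, x]) of the fair-die measure."""
    return sum((p for k, p in masses.items() if k <= x), Fraction(0))

def indicator(A):
    """Indicator function I_A of the set A, returned as a function of omega."""
    return lambda w: 1 if w in A else 0

print(F(-math.inf), F(2.999), F(3), F(3.001), F(math.inf))  # 0 1/3 1/2 1/2 1

I_odd = indicator({1, 3, 5})
print(I_odd(3), I_odd(4))   # 1 0
```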
If $A \in \mathcal{F}$, then $I_A$ is a measurable function. Note that many elementary operations, when performed upon measurable functions, again yield measurable functions. These include composition, arithmetic, max, min, and others.

Let $(\Omega_1, \mathcal{F}_1, P)$ be a probability space and $(\Omega_2, \mathcal{F}_2)$ a measurable space. A measurable transformation $T : \Omega_1 \to \Omega_2$ naturally induces a probability measure $PT^{-1}$ on $(\Omega_2, \mathcal{F}_2)$. In the case of a random variable $X$:
- the induced measure on $\mathbb{R}$ will generally be denoted $\alpha$;
- the distribution function associated with $\alpha$ will be denoted $F_X$;
- the measure $\alpha$ will sometimes be called a probability distribution.

Now that we have a notion of measure and of measurable functions, we can develop a notion of the integral of a function. The integral will have the probabilistic interpretation of being an expected (or average) value. For the precise definition of the Lebesgue integral, see any textbook on Measure Theory.

Definition 3.6. Suppose $X$ is a random variable. Then the expectation of $X$ is

$$EX := \int_\Omega X(\omega)\, dP.$$

We conclude this section with a useful change of variables formula for integrals.

Proposition 3.7. Let $(\Omega_1, \mathcal{F}_1, P)$ be a probability space and let $(\Omega_2, \mathcal{F}_2)$ be a measurable space. Suppose $T : \Omega_1 \to \Omega_2$ is a measurable transformation, and suppose $f : \Omega_2 \to \mathbb{R}$ is a measurable function. Then $PT^{-1}$ is a probability measure on $(\Omega_2, \mathcal{F}_2)$ and $fT : \Omega_1 \to \mathbb{R}$ is a measurable function. Furthermore, $f$ is $PT^{-1}$-integrable iff $fT$ is $P$-integrable, and

$$\int_{\Omega_1} fT(\omega_1)\, dP = \int_{\Omega_2} f(\omega_2)\, d(PT^{-1}).$$

4 Notions of Convergence

We will now introduce some notions of the convergence of random variables. Note that we will often not explicitly state the dependence of a function $X(\omega)$ on $\omega$. Hence, sets of the form $\{\omega : X(\omega) > 0\}$ will often be abbreviated $\{X > 0\}$. For the remainder of this section, let $\{X_n\}$ be a sequence of random variables.

Definition 4.1. The sequence $X_n$ converges almost surely (almost everywhere) to a random variable $X$ if $X_n(\omega) \to X(\omega)$ for all $\omega$ outside of a set of probability $0$.

Definition 4.2. The sequence $X_n$ converges in probability (in measure) to a random variable $X$ if, for every $\varepsilon > 0$,

$$\lim_{n \to \infty} P\{\omega : |X_n(\omega) - X(\omega)| \ge \varepsilon\} = 0.$$

This is denoted $X_n \xrightarrow{P} X$.
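Proposition 3.7 can be checked by hand on a finite space. In the Python sketch below (an illustrative setup of ours):
- $\Omega_1$ is the set of ordered outcomes of two fair dice;
- $T$ is their sum;
- $f(s) = s^2$.

The integral of $fT$ against $P$ is compared with the integral of $f$ against the induced measure $PT^{-1}$. That measure is built by pushing each outcome's mass forward.

```python
from fractions import Fraction
from collections import defaultdict
from itertools import product

# Omega_1: ordered pairs from two fair dice, each with probability 1/36.
omega1 = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega1}

def T(w):
    """Measurable transformation Omega_1 -> Omega_2 = {2, ..., 12}: the sum."""
    return w[0] + w[1]

def f(s):
    """A function on Omega_2."""
    return s * s

# Left side: integrate f(T(w)) against P on Omega_1.
lhs = sum(f(T(w)) * P[w] for w in omega1)

# Induced measure P T^{-1} on Omega_2: push each outcome's mass forward.
pushforward = defaultdict(Fraction)
for w in omega1:
    pushforward[T(w)] += P[w]

# Right side: integrate f against P T^{-1} on Omega_2.
rhs = sum(f(s) * p for s, p in pushforward.items())

print(lhs, rhs)   # 329/6 329/6
```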
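Proposition 4.3. If $X_n$ converges almost surely to $X$, then $X_n$ converges in probability to $X$.

Proof. We have $\{\omega : X_n(\omega) \not\to X(\omega)\} \subseteq N$, where $P(N) = 0$. That is, for every $\varepsilon > 0$,

$$\bigcap_{n=1}^\infty \bigcup_{m=n}^\infty \{|X_m - X| \ge \varepsilon\} \subseteq N.$$

Therefore, given $\varepsilon > 0$, we have the chain below. The equality uses Lemma 2.4 (iv) for the decreasing sets $\bigcup_{m \ge n} \{|X_m - X| \ge \varepsilon\}$.

$$\limsup_{n \to \infty} P\{|X_n - X| \ge \varepsilon\} \le \lim_{n \to \infty} P\Big(\bigcup_{m=n}^\infty \{|X_m - X| \ge \varepsilon\}\Big) = P\Big(\bigcap_{n=1}^\infty \bigcup_{m=n}^\infty \{|X_m - X| \ge \varepsilon\}\Big) \le P(N) = 0,$$

thereby completing the proof.

Note, however, that the converse is not true. Let $\Omega = [0, 1]$ with Lebesgue measure. Consider the sequence of sets

$$A_1 = [0, \tfrac12], \quad A_2 = [\tfrac12, 1], \quad A_3 = [0, \tfrac13], \quad A_4 = [\tfrac13, \tfrac23], \quad \ldots$$

Then the indicator functions $I_{A_n}$ converge in probability to $0$, since $P\{I_{A_n} \ge \varepsilon\} \le P(A_n) \to 0$. However, $I_{A_n}(\omega)$ does not converge for any $\omega$: every $\omega$ lies in infinitely many of the $A_n$ and outside infinitely many of them. In particular, the sequence does not converge almost surely.

However, the following holds as a sort of converse:

Proposition 4.4. Suppose $f_n$ converges in probability to $f$. Then there is a subsequence $f_{n_k}$ of $f_n$ such that $f_{n_k}$ converges almost surely to $f$.

Proof. Let $B_n^\varepsilon := \{\omega : |f_n(\omega) - f(\omega)| \ge \varepsilon\}$. Then $f_{n_i} \to f$ almost surely iff, for every $\varepsilon > 0$,

$$P\Big(\bigcap_i \bigcup_{j > i} B_{n_j}^\varepsilon\Big) = 0.$$

We know that for any $\varepsilon$, $\lim_n P(B_n^\varepsilon) = 0$. Now, notice that

$$P\Big(\bigcap_n \bigcup_{m \ge n} B_m^\varepsilon\Big) \le \inf_n P\Big(\bigcup_{m \ge n} B_m^\varepsilon\Big) \le \inf_n \sum_{m=n}^\infty P(B_m^\varepsilon) = \lim_{n \to \infty} \sum_{m=n}^\infty P(B_m^\varepsilon).$$

Furthermore, $\varepsilon_1 < \varepsilon_2$ implies $B_n^{\varepsilon_1} \supseteq B_n^{\varepsilon_2}$, and hence $P(B_n^{\varepsilon_1}) \ge P(B_n^{\varepsilon_2})$.

Let $\delta_i := 1/2^i$. Since $\lim_n P(B_n^{\delta_i}) = 0$ for each $i$, we can choose indices $n_1 < n_2 < \cdots$ such that $P(B_{n_i}^{\delta_i}) < \delta_i$ for every $i$. Fix $\varepsilon > 0$. There is an $m$ with $\delta_m < \varepsilon$, and then $B_{n_j}^\varepsilon \subseteq B_{n_j}^{\delta_j}$ for all $j \ge m$. Hence,

$$P\Big(\bigcap_i \bigcup_{j \ge i} B_{n_j}^\varepsilon\Big) \le \lim_{i \to \infty} \sum_{j=i}^\infty P(B_{n_j}^\varepsilon) \le \lim_{i \to \infty} \sum_{j=i}^\infty P(B_{n_j}^{\delta_j}) \le \lim_{i \to \infty} \sum_{j=i}^\infty \delta_j = 0,$$

which is what we wanted.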
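The counterexample to the converse of Proposition 4.3 can be explored numerically. The Python sketch below lists the intervals $A_n$ block by block with exact rationals; the helper names are ours. It confirms that $P(A_n)$, the length of $A_n$, shrinks to $0$. It also checks, for one illustrative $\omega$, that each block $k \ge 3$ contains both an interval holding $\omega$ and one missing it, so $I_{A_n}(\omega)$ keeps switching between $1$ and $0$. A finite computation can only illustrate, not prove, this limiting behaviour.

```python
from fractions import Fraction

def block(k):
    """Block k of the example: the intervals [j/k, (j+1)/k] for j = 0, ..., k-1."""
    return [(Fraction(j, k), Fraction(j + 1, k)) for j in range(k)]

def sequence(num_blocks):
    """A_1, A_2, ...: block 2, then block 3, and so on, as (left, right) pairs."""
    return [iv for k in range(2, num_blocks + 2) for iv in block(k)]

def indicator(interval, w):
    """I_A(w) for the closed interval A = [left, right]."""
    left, right = interval
    return 1 if left <= w <= right else 0

A = sequence(50)   # blocks k = 2, ..., 51

# In probability: P{I_{A_n} >= eps} = length(A_n) = 1/k on block k, which tends to 0.
print(A[-1][1] - A[-1][0])   # 1/51

# Not almost surely: for k >= 3, omega lies in at most two of the k intervals of
# block k but in at least one, so I_{A_n}(omega) takes both values in every block.
omega = Fraction(1, 3)       # an arbitrary illustrative point of [0, 1]
for k in range(3, 52):
    values = [indicator(iv, omega) for iv in block(k)]
    assert 1 in values and 0 in values
```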
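Definition 4.5. A sequence of probability measures $\{\alpha_n\}$ on $\mathbb{R}$ converges weakly to $\alpha$ if, whenever $\alpha\{a\} = \alpha\{b\} = 0$ for $a < b$, we have

$$\lim_{n \to \infty} \alpha_n[a, b] = \alpha[a, b].$$

A sequence of random variables $\{X_n\}$ converges weakly to $X$ if the induced probability measures $\{\alpha_n\}$ converge weakly to $\alpha$. This is denoted $\alpha_n \Rightarrow \alpha$ or $X_n \Rightarrow X$.

Lemma 4.6. Suppose $\alpha_n$ and $\alpha$ are probability measures on $\mathbb{R}$ with associated distribution functions $F_n$ and $F$. Then $\alpha_n \Rightarrow \alpha$ iff $F_n(x) \to F(x)$ for each continuity point $x$ of $F$.

Proof. First, note that $x$ is a continuity point of $F$ iff $\alpha\{x\} = 0$. Let $a < b$ be continuity points of $F$. Suppose $F_n(x) \to F(x)$ for each continuity point $x$ of $F$. Then

$$\lim_{n \to \infty} \alpha_n[a, b] = \lim_{n \to \infty} \big(F_n(b) - F_n(a)\big) = F(b) - F(a) = \alpha[a, b].$$

The first equality uses $\alpha_n[a, b] = F_n(b) - F_n(a) + \alpha_n\{a\}$ together with $\alpha_n\{a\} \to 0$. Indeed, $\alpha_n\{a\} \le F_n(a) - F_n(a')$ for every continuity point $a' < a$ of $F$. The right-hand side tends to $F(a) - F(a')$, which is arbitrarily small for $a'$ close to $a$.

For the converse, suppose $\alpha_n \Rightarrow \alpha$. Then

$$\lim_{n \to \infty} \big(F_n(b) - F_n(a)\big) = \lim_{n \to \infty} \alpha_n[a, b] = \alpha[a, b].$$

Again $\alpha_n\{a\} \to 0$, now because $\alpha_n\{a\} \le \alpha_n[a', a] \to \alpha[a', a]$ for continuity points $a' < a$. Now, we can let $a \to -\infty$ in such a way that $a$ is always a continuity point of $F$. Then we get $\lim_n F_n(b) = \alpha(-\infty, b] = F(b)$.

The next result shows that weak convergence is indeed "weak":

Proposition 4.7. Suppose $X_n$ converges in probability to $X$. Then $X_n$ converges weakly to $X$.

Proof. Let $F_n$ and $F$ be the distribution functions of $X_n$ and $X$ respectively, and suppose $x$ is a continuity point of $F$. Note that

$$\{X \le x - \varepsilon\} \subseteq \{|X_n - X| \ge \varepsilon\} \cup \{X_n \le x\}$$

and

$$\{X_n \le x\} = \{X_n \le x \text{ and } X \le x + \varepsilon\} \cup \{X_n \le x \text{ and } X > x + \varepsilon\} \subseteq \{X \le x + \varepsilon\} \cup \{|X_n - X| \ge \varepsilon\}.$$

Therefore,

$$P\{X \le x - \varepsilon\} - P\{|X_n - X| \ge \varepsilon\} \le P\{X_n \le x\} \le P\{X \le x + \varepsilon\} + P\{|X_n - X| \ge \varepsilon\}.$$

Since $\lim_n P\{|X_n - X| \ge \varepsilon\} = 0$ for each $\varepsilon > 0$, letting $n \to \infty$ gives

$$F(x - \varepsilon) \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F(x + \varepsilon).$$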
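Finally, since $F$ is continuous at $x$, letting $\varepsilon \to 0$ we have

$$\lim_{n \to \infty} F_n(x) = F(x),$$

so that $X_n \Rightarrow X$.

The converse is not true in general. However, if $X$ has a degenerate distribution (takes a single value with probability one), then the converse is true.

Proposition 4.8. Suppose $X_n \Rightarrow X$, and $X$ has a degenerate distribution with $P\{X = a\} = 1$. Then $X_n \xrightarrow{P} X$.

Proof. Let $\alpha_n$ and $\alpha$ be the distributions on $\mathbb{R}$ induced by $X_n$ and $X$ respectively. Given $\varepsilon > 0$, the points $a \pm \varepsilon$ carry no $\alpha$-mass, so by weak convergence

$$\lim_{n \to \infty} \alpha_n[a - \varepsilon, a + \varepsilon] = \alpha[a - \varepsilon, a + \varepsilon] = 1.$$

Hence, since $X = a$ almost surely,

$$\lim_{n \to \infty} P\{|X_n - X| \le \varepsilon\} = 1,$$

and so

$$\lim_{n \to \infty} P\{|X_n - X| > \varepsilon\} = 0.$$

Since $\varepsilon > 0$ is arbitrary, this gives $X_n \xrightarrow{P} X$.

5 Product Measures and Independence

Suppose $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ are two measurable spaces. We want to construct a product measurable space with sample space $\Omega_1 \times \Omega_2$.

Definition 5.1. Let $\mathcal{A} = \{A \times B : A \in \mathcal{F}_1, B \in \mathcal{F}_2\}$, and let $\mathcal{F}_1 \times \mathcal{F}_2$ be the σ-field generated by $\mathcal{A}$. Then $\mathcal{F}_1 \times \mathcal{F}_2$ is called the product σ-field of $\mathcal{F}_1$ and $\mathcal{F}_2$.

If $P_1$ and $P_2$ are probability measures on the measurable spaces above, then

$$P_1 \times P_2(A \times B) := P_1(A)\, P_2(B)$$

gives a probability measure on $\mathcal{A}$. This can be extended in a canonical way to the σ-field $\mathcal{F}_1 \times \mathcal{F}_2$.

Definition 5.2. $P_1 \times P_2$ is called the product probability measure of $P_1$ and $P_2$. Let $\Omega := \Omega_1 \times \Omega_2$, $\mathcal{F} := \mathcal{F}_1 \times \mathcal{F}_2$, and $P := P_1 \times P_2$.

Integrals with respect to a product probability measure can normally be computed as iterated integrals, in either order, with respect to the component probability measures. This holds, for instance, when the integrand is nonnegative or integrable. This result is known as Fubini's Theorem.

Before we define a notion of independence, we will give some heuristic considerations. Two events $A$ and $B$ should be independent if $A$ occurring has nothing to do with $B$ occurring. Denote by $P_A(X)$ the probability that $X$ occurs given that $A$ has occurred. Then we see that

$$P_A(X) = \frac{P(A \cap X)}{P(A)}.$$

Now, suppose that $A$ and $B$ are indeed independent. This means that $P_A(B) = P(B)$. But then

$$P(B) = \frac{P(A \cap B)}{P(A)},$$

so that $P(A \cap B) = P(A)P(B)$. This leads us to define: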
Definition 5.3. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $A_i \in \mathcal{F}$ for every $i$, and let $X_i$ be a random variable for every $i$.
(i) $A_1, \ldots, A_n$ are independent if $P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$ for every choice of indices $1 \le i_1 < \cdots < i_k \le n$.
(ii) A collection of events $\{A_i\}_{i \in I}$ is independent if every finite subcollection is independent.
(iii) $X_1, \ldots, X_n$ are independent if, for any sets $A_1, \ldots, A_n \in \mathcal{B}(\mathbb{R})$, the events $\{X_i \in A_i\}_{i=1}^n$ are independent.
(iv) A collection of random variables $\{X_i\}_{i \in I}$ is independent if every finite subcollection is independent.

Lemma 5.4. Suppose $X, Y$ are random variables on $(\Omega, \mathcal{F}, P)$, with induced distributions $\alpha, \beta$ on $\mathbb{R}$ respectively. Then $X$ and $Y$ are independent if and only if the distribution induced on $\mathbb{R}^2$ by $(X, Y)$ is $\alpha \times \beta$.

Lemma 5.5. Suppose $X, Y$ are independent random variables, and suppose that $f, g$ are measurable functions. Then $f(X)$ and $g(Y)$ are also independent random variables.

Proposition 5.6. Let $X, Y$ be independent random variables, and let $f, g$ be measurable functions. Suppose that $E|f(X)|$ and $E|g(Y)|$ are both finite. Then

$$E[f(X)g(Y)] = E[f(X)]\, E[g(Y)].$$

Proof. Let $\alpha$ be the distribution on $\mathbb{R}$ induced by $f(X)$, and let $\beta$ be the distribution induced by $g(Y)$. By Lemmas 5.4 and 5.5, the distribution on $\mathbb{R}^2$ induced by $(f(X), g(Y))$ is $\alpha \times \beta$. So, by Proposition 3.7 and Fubini's Theorem,

$$E[f(X)g(Y)] = \int_\Omega f(X(\omega))\, g(Y(\omega))\, dP = \iint_{\mathbb{R}^2} uv \, d(\alpha \times \beta) = \int_{\mathbb{R}} u \, d\alpha \int_{\mathbb{R}} v \, d\beta = E[f(X)]\, E[g(Y)].$$

6 Characteristic Functions

The inverse Fourier transform of a probability distribution plays a central role in probability theory.

Definition 6.1. Let $\alpha$ be a probability measure on $\mathbb{R}$. Then the characteristic function of $\alpha$ is

$$\varphi_\alpha(t) = \int_{\mathbb{R}} e^{itx}\, d\alpha.$$

If $X$ is a random variable, the characteristic function of the distribution on $\mathbb{R}$ induced by $X$ will sometimes be denoted $\varphi_X$.

The following results demonstrate the importance of the characteristic function in probability.
Proposition 6.2. Suppose $\alpha$ and $\beta$ are probability measures on $\mathbb{R}$ with characteristic functions $\varphi$ and $\psi$ respectively. Suppose further that $\varphi(t) = \psi(t)$ for each $t \in \mathbb{R}$. Then $\alpha = \beta$.

Theorem 6.3. Let $\alpha_n$ and $\alpha$ be probability measures on $\mathbb{R}$ with distribution functions $F_n$ and $F$ and characteristic functions $\varphi_n$ and $\varphi$. Then the following are equivalent:
(i) $\alpha_n \Rightarrow \alpha$;
(ii) for any bounded continuous function $f : \mathbb{R} \to \mathbb{R}$, $\lim_n \int f(x)\, d\alpha_n = \int f(x)\, d\alpha$;
(iii) for every $t \in \mathbb{R}$, $\lim_n \varphi_n(t) = \varphi(t)$.

Theorem 6.4. Suppose $\alpha_n$ is a sequence of probability measures on $\mathbb{R}$, with characteristic functions $\varphi_n$. Suppose that for each $t \in \mathbb{R}$, $\lim_n \varphi_n(t) =: \varphi(t)$ exists, and that $\varphi$ is continuous at $0$. Then there is a probability distribution $\alpha$ such that $\varphi$ is the characteristic function of $\alpha$. Furthermore, $\alpha_n \Rightarrow \alpha$.

Next, we show how to recover the moments of a random variable from its characteristic function.

Definition 6.5. Suppose $X$ is a random variable. Then the $k$th moment of $X$ is $EX^k$, and the $k$th absolute moment of $X$ is $E|X|^k$.

Proposition 6.6. Let $X$ be a random variable, and suppose that the $k$th absolute moment of $X$ is finite. Then the characteristic function $\varphi$ of $X$ is $k$ times continuously differentiable, and

$$\varphi^{(k)}(0) = i^k\, EX^k.$$

Now, a result on affine transforms of a random variable:

Proposition 6.7. Suppose $X$ is a random variable, and $Y = aX + b$. Let $\varphi_X$ and $\varphi_Y$ be the characteristic functions of $X$ and $Y$. Then $\varphi_Y(t) = e^{itb}\, \varphi_X(at)$.

We will often be interested in the sums of independent random variables. Suppose that $X$ and $Y$ are independent random variables with induced distributions $\alpha$ and $\beta$ on $\mathbb{R}$ respectively. Then the induced distribution of $(X, Y)$ on $\mathbb{R}^2$ is $\alpha \times \beta$. Consider the map $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = x + y$. The distribution on $\mathbb{R}$ induced by $f$ from $\alpha \times \beta$ is denoted $\alpha * \beta$, and is called the convolution of $\alpha$ and $\beta$. Thus $\alpha * \beta$ is the distribution of the sum of $X$ and $Y$.

Proposition 6.8. Suppose $X$ and $Y$ are independent random variables with distributions $\alpha$ and $\beta$ respectively. Then

$$\varphi_{X+Y}(t) = \varphi_X(t)\, \varphi_Y(t).$$
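Proof. By Proposition 3.7 and Fubini's Theorem,

$$\varphi_{\alpha * \beta}(t) = \int_{\mathbb{R}} e^{itz}\, d(\alpha * \beta) = \iint_{\mathbb{R}^2} e^{it(x+y)}\, d(\alpha \times \beta) = \iint_{\mathbb{R}^2} e^{itx} e^{ity}\, d(\alpha \times \beta) = \int_{\mathbb{R}} e^{itx}\, d\alpha \int_{\mathbb{R}} e^{ity}\, d\beta = \varphi_\alpha(t)\, \varphi_\beta(t).$$

7 Useful Bounds and Inequalities

Here, we will prove some useful bounds regarding random variables and their moments.

Definition 7.1. Let $X$ be a random variable. Then the variance of $X$ is

$$\operatorname{Var}(X) := E[(X - EX)^2] = EX^2 - (EX)^2.$$

The variance measures how far $X$ is spread, on average, from its mean. It exists if $X$ has a finite second moment, and it is often denoted $\sigma^2$.

Lemma 7.2. Suppose $X, Y$ are independent random variables. Then

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).$$

Proposition 7.3 (Markov's Inequality). Let $\varepsilon > 0$. Suppose $X$ is a random variable with finite $k$th absolute moment. Then

$$P\{|X| \ge \varepsilon\} \le \frac{1}{\varepsilon^k} E|X|^k.$$

Proof.

$$P\{|X| \ge \varepsilon\} = \int_{\{|X| \ge \varepsilon\}} dP \le \frac{1}{\varepsilon^k} \int_{\{|X| \ge \varepsilon\}} |X|^k\, dP \le \frac{1}{\varepsilon^k} \int_\Omega |X|^k\, dP = \frac{1}{\varepsilon^k} E|X|^k.$$

Corollary 7.4 (Chebyshev's Inequality). Suppose $X$ is a random variable with finite second moment. Then

$$P\{|X - EX| \ge \varepsilon\} \le \frac{1}{\varepsilon^2} \operatorname{Var}(X).$$

The following is also a useful fact:

Lemma 7.5. Suppose $X$ is a nonnegative random variable. Then

$$\sum_{m=1}^\infty P\{X \ge m\} \le EX.$$

Proof.

$$EX = \sum_{n=0}^\infty \int_{\{n \le X < n+1\}} X\, dP \ge \sum_{n=1}^\infty n\, P\{n \le X < n+1\} = \sum_{n=1}^\infty \sum_{m=1}^{n} P\{n \le X < n+1\} = \sum_{m=1}^\infty \sum_{n=m}^\infty P\{n \le X < n+1\} = \sum_{m=1}^\infty P\{X \ge m\}.$$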
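The bounds of this section can be sanity-checked exactly on a small example. The Python sketch below assumes $X$ is the outcome of a fair die, which is our choice. It verifies:
- Markov's inequality for a few values of $\varepsilon$ and $k$;
- Chebyshev's inequality;
- Lemma 7.5, which for a nonnegative integer-valued $X$ in fact holds with equality.

```python
from fractions import Fraction

# X is the outcome of a fair die: each value 1, ..., 6 has probability 1/6.
dist = {x: Fraction(1, 6) for x in range(1, 7)}

def E(g):
    """Expectation of g(X)."""
    return sum((g(x) * p for x, p in dist.items()), Fraction(0))

def P(pred):
    """Probability of the event {pred(X)}."""
    return sum((p for x, p in dist.items() if pred(x)), Fraction(0))

mean = E(lambda x: x)                  # EX
var = E(lambda x: x * x) - mean ** 2   # Var(X) = EX^2 - (EX)^2

for eps in [Fraction(1), Fraction(2), Fraction(5, 2)]:
    for k in [1, 2, 3]:
        # Markov (Proposition 7.3): P{|X| >= eps} <= E|X|^k / eps^k
        assert P(lambda x: abs(x) >= eps) <= E(lambda x: abs(x) ** k) / eps ** k
    # Chebyshev (Corollary 7.4): P{|X - EX| >= eps} <= Var(X) / eps^2
    assert P(lambda x: abs(x - mean) >= eps) <= var / eps ** 2

# Lemma 7.5: sum over m >= 1 of P{X >= m} <= EX; only m <= 6 contribute here.
tail_sum = sum(P(lambda x: x >= m) for m in range(1, 7))
print(mean, var, tail_sum)   # 7/2 35/12 7/2
```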
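8 The Borel-Cantelli Lemma

First, let us introduce some terminology. Let $A_1, A_2, \ldots$ be sets. Then

$$\limsup_n A_n := \bigcap_{n=1}^\infty \bigcup_{m=n}^\infty A_m.$$

The set $\limsup_n A_n$ consists of those $\omega$ that appear in $A_n$ infinitely often (i.o.). Also,

$$\liminf_n A_n := \bigcup_{n=1}^\infty \bigcap_{m=n}^\infty A_m.$$

The set $\liminf_n A_n$ consists of those $\omega$ that appear in all but finitely many $A_n$.

Theorem 8.1 (Borel-Cantelli Lemma). Let $A_1, A_2, \ldots \in \mathcal{F}$. If $\sum_{n=1}^\infty P(A_n) < \infty$, then $P(\limsup_n A_n) = 0$. Furthermore, suppose that the $A_n$ are independent. Then, if $\sum_{n=1}^\infty P(A_n) = \infty$, we have $P(\limsup_n A_n) = 1$.

Proof. Suppose $\sum_{n=1}^\infty P(A_n) < \infty$. Then

$$P\Big(\bigcap_{n=1}^\infty \bigcup_{m=n}^\infty A_m\Big) = \lim_{n \to \infty} P\Big(\bigcup_{m=n}^\infty A_m\Big) \le \lim_{n \to \infty} \sum_{m=n}^\infty P(A_m) = 0.$$

For the second statement, it is enough to show that the complement of $\limsup_n A_n$ is null:

$$P\Big(\bigcup_{n=1}^\infty \bigcap_{m=n}^\infty A_m^c\Big) = 0.$$

So it is also enough to show that

$$P\Big(\bigcap_{m=n}^\infty A_m^c\Big) = 0$$

for all $n$. We use independence (the complements of independent events are again independent) and the inequality $1 - x \le e^{-x}$. For every $k$, this gives

$$P\Big(\bigcap_{m=n}^\infty A_m^c\Big) \le P\Big(\bigcap_{m=n}^{n+k} A_m^c\Big) = \prod_{m=n}^{n+k} \big(1 - P(A_m)\big) \le \exp\Big\{-\sum_{m=n}^{n+k} P(A_m)\Big\}.$$

Since the last sum diverges, taking the limit as $k \to \infty$, we get

$$P\Big(\bigcap_{m=n}^\infty A_m^c\Big) = 0.$$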
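Both halves of Theorem 8.1 can be illustrated with exact arithmetic. The Python sketch below assumes independent events with either $P(A_m) = 1/m$ (divergent sum) or $P(A_m) = 1/m^2$ (convergent sum). These choices and the helper names are ours. With $P(A_m) = 1/m$, the product $\prod_{m=n}^{n+k} (1 - 1/m)$ from the proof telescopes to $(n-1)/(n+k)$, which tends to $0$ as $k \to \infty$. It also stays below $\exp\{-\sum_{m=n}^{n+k} P(A_m)\}$, as the proof requires.

```python
from fractions import Fraction
import math

def prob_none(p, n, k):
    """P(A_n^c ∩ ... ∩ A_{n+k}^c) for independent events with P(A_m) = p(m)."""
    result = Fraction(1)
    for m in range(n, n + k + 1):
        result *= 1 - p(m)          # independence of the complements
    return result

def tail_sum(p, n, k):
    """Partial tail sum p(n) + ... + p(n + k)."""
    return sum((p(m) for m in range(n, n + k + 1)), Fraction(0))

def divergent(m):
    return Fraction(1, m)           # sum of P(A_m) diverges

def convergent(m):
    return Fraction(1, m * m)       # sum of P(A_m) converges

# Second half: with P(A_m) = 1/m the product telescopes to (n - 1)/(n + k),
# which tends to 0, and it obeys the bound exp(-sum P(A_m)) used in the proof.
n = 10
for k in [10, 100, 1000]:
    none = prob_none(divergent, n, k)
    assert none <= math.exp(-float(tail_sum(divergent, n, k)))
    print(k, none)   # 10 9/20, then 100 9/110, then 1000 9/1010

# First half: tails of sum 1/m^2 stay below 1/(n - 1), so they tend to 0,
# which bounds P(A_n ∪ A_{n+1} ∪ ...) as in the proof.
for n0 in [10, 100, 1000]:
    assert tail_sum(convergent, n0, 1000) < Fraction(1, n0 - 1)
```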
9 The Law of Large Numbers

Let $X_1, X_2, \ldots$ be random variables that are independent and identically distributed (iid), and let $S_n := X_1 + \cdots + X_n$. We will be interested in the asymptotic behavior of the average $S_n/n$. If $X_i$ has a finite expectation, then we would expect $S_n/n$ to settle down to $EX_i$. This is known as the Law of Large Numbers. There are two varieties of this law:
- the Weak Law of Large Numbers states that the average converges in probability to $EX_i$;
- the Strong Law of Large Numbers states that the average converges almost surely to $EX_i$.

The strong law is significantly harder to prove, and requires a bit of additional machinery.

For the rest of this section, fix a probability space $(\Omega, \mathcal{F}, P)$.

Theorem 9.1 (The Weak Law of Large Numbers). Suppose $X_1, X_2, \ldots$ are iid random variables with $E|X_i| < \infty$ and mean $EX_i = m$. Then $S_n/n \xrightarrow{P} m$.

Proof. Let $\varphi$ be the characteristic function of $X_i$. By Proposition 6.8, the characteristic function of $S_n$ is $[\varphi(t)]^n$. Then, by Proposition 6.7, the characteristic function of $S_n/n$ is

$$\psi_n(t) = \Big[\varphi\Big(\frac{t}{n}\Big)\Big]^n.$$

Furthermore, by Proposition 6.6, $\varphi$ is differentiable and $\varphi'(0) = im$. Therefore, we can form the Taylor expansion

$$\varphi\Big(\frac{t}{n}\Big) = 1 + \frac{imt}{n} + o\Big(\frac{1}{n}\Big),$$

and so

$$\psi_n(t) = \Big[1 + \frac{imt}{n} + o\Big(\frac{1}{n}\Big)\Big]^n.$$

Taking the limit as $n \to \infty$, we get

$$\lim_{n \to \infty} \psi_n(t) = e^{imt},$$

which is the characteristic function of the distribution degenerate at $m$. Therefore, by Theorem 6.3, $S_n/n$ converges weakly to the constant $m$. By Proposition 4.8, it then converges in probability to $m$.

Theorem 9.2 (The Strong Law of Large Numbers). Suppose $X_1, X_2, \ldots$ are iid random variables with $E|X_i| < \infty$ and $EX_i = m$. Let $S_n = X_1 + \cdots + X_n$. Then $S_n/n$ converges to $m$ almost surely.

Proof. We can decompose an arbitrary random variable $X_i$ into its positive and negative parts,

$$X_i^+ := X_i I_{\{X_i \ge 0\}} \quad \text{and} \quad X_i^- := -X_i I_{\{X_i < 0\}},$$

so that $X_i = X_i^+ - X_i^-$. Then we have

$$S_n = X_1^+ + \cdots + X_n^+ - (X_1^- + \cdots + X_n^-) =: S_n^+ - S_n^-.$$

Hence, it is enough to prove the Theorem for nonnegative $X_i$.

Now, let $Y_i := X_i I_{\{X_i \le i\}}$ and $S_n^* := Y_1 + \cdots + Y_n$. Furthermore, let $\alpha > 1$, and set $u_n := \lfloor \alpha^n \rfloor$. We shall first establish the inequality

$$\sum_{n=1}^\infty P\Big\{\frac{|S^*_{u_n} - ES^*_{u_n}|}{u_n} \ge \varepsilon\Big\} < \infty \quad \text{for all } \varepsilon > 0. \tag{9.1}$$
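Since the $X_i$ are independent, so are the $Y_k$ (Lemma 5.5). Below, since the $X_i$ are identically distributed, $X_i$ inside an expectation stands for any one of them. By Lemma 7.2, we have

$$\operatorname{Var}(S_n^*) = \sum_{k=1}^n \operatorname{Var}(Y_k) \le \sum_{k=1}^n EY_k^2 = \sum_{k=1}^n E[X_i^2 I_{\{X_i \le k\}}] \le n\, E[X_i^2 I_{\{X_i \le n\}}].$$

By Chebyshev's inequality and monotone convergence, we have

$$\sum_{n=1}^\infty P\Big\{\frac{|S^*_{u_n} - ES^*_{u_n}|}{u_n} \ge \varepsilon\Big\} \le \sum_{n=1}^\infty \frac{\operatorname{Var}(S^*_{u_n})}{\varepsilon^2 u_n^2} \le \frac{1}{\varepsilon^2} \sum_{n=1}^\infty \frac{E[X_i^2 I_{\{X_i \le u_n\}}]}{u_n} = \frac{1}{\varepsilon^2} E\Big[X_i^2 \sum_{n=1}^\infty \frac{1}{u_n} I_{\{X_i \le u_n\}}\Big]. \tag{9.2}$$

Now, let $K := 2\alpha/(\alpha - 1)$. Let $x > 0$, and let $N := \inf\{n : u_n \ge x\}$. Then $\alpha^N \ge x$. Also, note that $\alpha^n \le 2u_n$, and so $u_n^{-1} \le 2\alpha^{-n}$. Since $u_n$ is increasing, we get

$$\sum_{n : u_n \ge x} \frac{1}{u_n} \le 2 \sum_{n \ge N} \alpha^{-n} = 2\alpha^{-N} \sum_{n=0}^\infty \alpha^{-n} = \frac{2\alpha}{\alpha - 1}\, \alpha^{-N} = K\alpha^{-N} \le K x^{-1},$$

and hence

$$\sum_{n=1}^\infty \frac{1}{u_n} I_{\{X_i \le u_n\}} \le K X_i^{-1} \quad \text{if } X_i > 0.$$

Putting this into (9.2), we get

$$\frac{1}{\varepsilon^2} E\Big[X_i^2 \sum_{n=1}^\infty \frac{1}{u_n} I_{\{X_i \le u_n\}}\Big] \le \frac{1}{\varepsilon^2} E\big[X_i^2 K X_i^{-1}\big] = \frac{K}{\varepsilon^2} EX_i < \infty,$$

thereby establishing inequality (9.1). Therefore, by the Borel-Cantelli Lemma, we have

$$P\Big(\limsup_n \Big\{\frac{|S^*_{u_n} - ES^*_{u_n}|}{u_n} \ge \varepsilon\Big\}\Big) = 0 \quad \text{for all } \varepsilon > 0.$$

Taking the union of these null sets over all rational $\varepsilon > 0$, we get that

$$\frac{S^*_{u_n} - ES^*_{u_n}}{u_n} \to 0 \quad \text{almost surely}.$$

However, $\frac{1}{n} ES_n^* = \frac{1}{n} \sum_{k=1}^n EY_k$. Since $EY_k \uparrow EX_i$ by monotone convergence, taking the limit as $n \to \infty$ we have that $\frac{1}{n} ES_n^* \to EX_i$. Since $u_n \to \infty$, we conclude that

$$\frac{S^*_{u_n}}{u_n} \to EX_i \quad \text{almost surely}. \tag{9.3}$$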
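Now, notice that by Lemma 7.5,

$$\sum_{n=1}^\infty P\{X_n \ne Y_n\} = \sum_{n=1}^\infty P\{X_i > n\} \le EX_i < \infty.$$

Again, by the Borel-Cantelli Lemma, we have

$$P\Big(\limsup_n \{X_n \ne Y_n\}\Big) = 0.$$

So, almost surely, $X_n = Y_n$ for all but finitely many $n$, and the difference $S_n - S_n^*$ is eventually constant. Therefore $(S_n - S_n^*)/n \to 0$ almost surely, and so by (9.3),

$$\frac{S_{u_n}}{u_n} \to EX_i \quad \text{almost surely}. \tag{9.4}$$

Now, to get that the entire sequence $S_n/n \to EX_i$ almost surely, note that $S_m$ is an increasing sequence because the $X_i$ are nonnegative. Suppose $u_n \le k \le u_{n+1}$. Then

$$\frac{u_n}{u_{n+1}} \cdot \frac{S_{u_n}}{u_n} \le \frac{S_k}{k} \le \frac{u_{n+1}}{u_n} \cdot \frac{S_{u_{n+1}}}{u_{n+1}}.$$

Since $u_{n+1}/u_n \to \alpha$, (9.4) gives

$$\frac{1}{\alpha} EX_i \le \liminf_{k \to \infty} \frac{S_k}{k} \le \limsup_{k \to \infty} \frac{S_k}{k} \le \alpha\, EX_i \quad \text{almost surely}.$$

Taking $\alpha \downarrow 1$ along a countable sequence, we get

$$\lim_{k \to \infty} \frac{S_k}{k} = EX_i \quad \text{almost surely}.$$

10 Conditional Expectation and Probability

Before defining conditional expectation and probability, we will make a few observations about the probabilistic interpretation of σ-fields.

Consider a process where a random number between zero and one is chosen. More precisely, an outcome $\omega$ is chosen according to some probability law from the set of all possible outcomes, $\Omega = [0, 1)$. We may be able to observe this number $\omega$ to some amount of precision, say up to one digit. The σ-field that represents this amount of precision is

$$\mathcal{F}_1 := \sigma\{[0, .1), [.1, .2), \ldots, [.9, 1)\}.$$

The σ-field $\mathcal{F}_1$ represents all the information that we can know about $\omega$ by observing it to one digit of precision. That is, an observer who can observe the number $\omega$ to one digit will be able to determine exactly which sets $A \in \mathcal{F}_1$ contain $\omega$. However, such an observer will not be able to give any information more precise than that. Similarly, if we can observe $\omega$ up to $n$ digits of precision, the corresponding σ-field is

$$\mathcal{F}_n := \sigma\Big\{\Big[\frac{i}{10^n}, \frac{i+1}{10^n}\Big) : 0 \le i < 10^n\Big\}.$$

This example illustrates a general concept: the σ-field that is used represents the amount of information that an observer has about the random process.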
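The information carried by $\mathcal{F}_n$ can be phrased computationally: an observer with $n$ digits of precision learns exactly which interval $[i/10^n, (i+1)/10^n)$ contains $\omega$. The Python sketch below computes that interval for $n = 1, 2, 3$. The helper name is ours, and $\omega$ is taken rational so the arithmetic is exact. The intervals shrink and are nested, reflecting $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots$.

```python
from fractions import Fraction
import math

def observed_cell(omega, n):
    """Atom [i/10^n, (i+1)/10^n) of F_n containing omega in [0, 1).

    This interval (equivalently, the first n decimal digits of omega) is
    exactly what an observer with n digits of precision learns about omega."""
    i = math.floor(omega * 10 ** n)
    return Fraction(i, 10 ** n), Fraction(i + 1, 10 ** n)

omega = Fraction(271828, 10 ** 6)   # an illustrative outcome, 0.271828
for n in range(1, 4):
    lo, hi = observed_cell(omega, n)
    print(n, float(lo), float(hi))  # 1 0.2 0.3, then 2 0.27 0.28, then 3 0.271 0.272
```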
Definition 10.1. If $\mathcal{F}$ is a σ-field, an $\mathcal{F}$-observer is an observer who can determine precisely which sets $A \in \mathcal{F}$ a random outcome $\omega$ belongs to, but who has no more information about $\omega$.

Therefore, a $2^\Omega$-observer has complete information about the outcome $\omega$, whereas an $\mathcal{F}$-observer may have less information. Similarly, if $\Sigma \subseteq \mathcal{F}$, then an $\mathcal{F}$-observer has at least as much information as a $\Sigma$-observer.

Suppose that a random variable $X$ is $\mathcal{F}$-measurable. This means that the preimage under $X$ of any Borel set is in $\mathcal{F}$. Therefore, an $\mathcal{F}$-observer will have complete information about $X$, or about any other $\mathcal{F}$-measurable random variable. Note that if $\Sigma \subseteq \mathcal{F}$, a $\Sigma$-measurable function is also $\mathcal{F}$-measurable.

Suppose that $X$ is an $\mathcal{F}$-measurable random variable, and that you are a $\Sigma$-observer. You do not have complete information about $X$. However, given your information $\Sigma$, you would like to make a "best guess" about the value of $X$. That is, you want to create another random variable $Y$ that is $\Sigma$-measurable, but which approximates $X$. This $Y$ is called the conditional expectation of $X$ with respect to $\Sigma$, and is denoted $E[X \mid \Sigma]$. We will require that

$$\int_A X(\omega)\, dP = \int_A E[X \mid \Sigma](\omega)\, dP \quad \text{for all } A \in \Sigma. \tag{10.1}$$

Lemma 10.2. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\Sigma$ be a sub-σ-field of $\mathcal{F}$. Let $P_\Sigma$ denote the restriction of $P$ to $\Sigma$. Suppose $f$ is a $\Sigma$-measurable function and $A \in \Sigma$. Then

$$\int_A f(\omega)\, dP_\Sigma = \int_A f(\omega)\, dP.$$

Justified by the previous lemma, we will often be sloppy and not explicitly say which σ-field a particular integral is taken over.

In order to prove that a function satisfying (10.1) exists, we will have to discuss the Radon-Nikodym Theorem. First, a definition.

Definition 10.3. A signed measure $\lambda$ on a measurable space $(\Omega, \mathcal{F})$ is a function $\lambda : \mathcal{F} \to \mathbb{R}$ such that whenever $A_1, A_2, \ldots$ is a finite or countable sequence of disjoint sets in $\mathcal{F}$, we have

$$\lambda\Big(\bigcup_i A_i\Big) = \sum_i \lambda(A_i).$$

In particular, a signed measure satisfies $\lambda(\emptyset) = 0$, and all probability measures are signed measures. Note that $\lambda$ is permitted to take on negative values. However, it is not permitted to take on the values $+\infty$ or $-\infty$.

Definition 10.4.
A signed measure $\lambda$ on $(\Omega, \mathcal{F})$ is absolutely continuous with respect to a probability measure $P$ if, whenever $P(A) = 0$, we also have $\lambda(A) = 0$. This is denoted $\lambda \ll P$.
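Take a finite sample space whose σ-field is all of $2^\Omega$. There, absolute continuity reduces to a pointwise check, and a density as in Theorem 10.5 below can be written down directly. Set $f(\omega) = \lambda\{\omega\}/P\{\omega\}$ where $P\{\omega\} > 0$, with arbitrary values on $P$-null points. The Python sketch below uses a hypothetical four-point example of ours and verifies $\lambda(A) = \sum_{\omega \in A} f(\omega) P\{\omega\}$ for every event $A$.

```python
from fractions import Fraction
from itertools import chain, combinations

# A hypothetical finite example: every subset of omega is an event.
omega = ["a", "b", "c", "d"]
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
# A signed measure, specified by its values on single points (it may be negative).
lam = {"a": Fraction(-1, 3), "b": Fraction(1, 2), "c": Fraction(0), "d": Fraction(0)}

def measure(m, A):
    """Value m(A) of a (signed) measure given by point masses."""
    return sum((m[w] for w in A), Fraction(0))

def events(points):
    """All subsets of `points`, i.e. the sigma-field 2^omega."""
    return chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))

def absolutely_continuous(lam, P):
    """lam << P: every P-null event is lam-null (checking single points suffices)."""
    return all(lam[w] == 0 for w in P if P[w] == 0)

def density(lam, P):
    """A version of d(lam)/dP: lam{w}/P{w} where P{w} > 0, and 0 (arbitrary) elsewhere."""
    return {w: lam[w] / P[w] if P[w] > 0 else Fraction(0) for w in P}

assert absolutely_continuous(lam, P)
f = density(lam, P)
# lam(A) equals the integral of f over A with respect to P, for every event A.
assert all(measure(lam, A) == sum((f[w] * P[w] for w in A), Fraction(0))
           for A in events(omega))
print(f["a"], f["b"])   # -2/3 2
```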
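For example, if $f$ is an integrable function with respect to $P$, then

$$\lambda(A) = \int_A f(\omega)\, dP$$

is a signed measure that is absolutely continuous with respect to $P$. In fact, all absolutely continuous signed measures arise in this way.

Theorem 10.5 (Radon-Nikodym). Suppose $\lambda \ll P$. Then there is an integrable function $f$ such that

$$\lambda(A) = \int_A f(\omega)\, dP \quad \text{for all } A \in \mathcal{F}. \tag{10.2}$$

Furthermore, if $f'$ is another function satisfying (10.2), then $f = f'$ $P$-almost everywhere.

Definition 10.6. The function $f$ in Theorem 10.5 is called the Radon-Nikodym derivative of $\lambda$ with respect to $P$. It is denoted $d\lambda/dP$. Note that the Radon-Nikodym derivative is only defined up to equality almost everywhere.

We can use the Radon-Nikodym derivative to define the conditional expectation satisfying (10.1).

Definition 10.7. Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $\Sigma$ be a sub-σ-field of $\mathcal{F}$, and let $X$ be an $\mathcal{F}$-integrable random variable. Let $\lambda$ be the signed measure defined by

$$\lambda(A) = \int_A X(\omega)\, dP,$$

and let $\lambda_\Sigma$ and $P_\Sigma$ denote the restrictions of $\lambda$ and $P$ to $\Sigma$, so that $\lambda_\Sigma \ll P_\Sigma$. The conditional expectation of $X$ with respect to $\Sigma$ is

$$E[X \mid \Sigma] := \frac{d\lambda_\Sigma}{dP_\Sigma}.$$

By construction this function is $\Sigma$-measurable, and by Lemma 10.2 it satisfies (10.1).

We now state some of the elementary properties of conditional expectation.

Lemma 10.8. Let $X$ and $X_i$ be integrable random variables on $(\Omega, \mathcal{F}, P)$, and let $\Sigma$ be a sub-σ-field of $\mathcal{F}$.
(i) $E[E[X \mid \Sigma]] = E[X]$.
(ii) If $X$ is nonnegative, then $E[X \mid \Sigma]$ is nonnegative almost surely.
(iii) Suppose $a_1, a_2 \in \mathbb{R}$. Then $E[a_1 X_1 + a_2 X_2 \mid \Sigma] = a_1 E[X_1 \mid \Sigma] + a_2 E[X_2 \mid \Sigma]$ almost surely.
(iv) $\int |E[X \mid \Sigma]|\, dP \le \int |X|\, dP$.
(v) If $Y$ is bounded and $\Sigma$-measurable, then $E[XY \mid \Sigma] = Y\, E[X \mid \Sigma]$ almost surely.
(vi) If $\Sigma_2 \subseteq \Sigma_1 \subseteq \mathcal{F}$ are sub-σ-fields, then $E[X \mid \Sigma_2] = E[E[X \mid \Sigma_1] \mid \Sigma_2]$ almost surely.

As a special case of conditional expectation, we have conditional probability.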
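Suppose $\Sigma$ is generated by a finite partition whose cells all have positive probability. Then Definition 10.7 reduces to averaging: on each cell, $E[X \mid \Sigma]$ equals the $P$-weighted mean of $X$ over that cell. Conditional probability is the special case $X = I_A$. The Python sketch below assumes a discretised version of the one-digit observer from the start of this section, with $100$ equally likely points (our choice). It checks property (10.1) over every $A \in \Sigma$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite model: w in 0..99 stands for the point w/100 in [0, 1),
# each with probability 1/100 (a discretised uniform draw).
omega = range(100)
P = {w: Fraction(1, 100) for w in omega}

def first_digit(w):
    """What a one-digit observer learns: the first decimal digit of w/100."""
    return w // 10

# Sigma is generated by the ten cells {10d, ..., 10d + 9}, d = 0, ..., 9.
cells = {d: [w for w in omega if first_digit(w) == d] for d in range(10)}

def cond_exp(X):
    """E[X | Sigma] for this partition sigma-field: on each cell, the
    P-weighted average of X over the cell (all cells have positive mass)."""
    avg = {}
    for d, cell in cells.items():
        mass = sum(P[w] for w in cell)
        avg[d] = sum(X(w) * P[w] for w in cell) / mass
    return lambda w: avg[first_digit(w)]

def X(w):
    return Fraction(w, 100) ** 2    # a random variable not measurable w.r.t. Sigma

Y = cond_exp(X)

# Property (10.1): the integrals of X and Y agree over every A in Sigma,
# i.e. over every union of cells (2**10 sets in total).
for r in range(11):
    for ds in combinations(range(10), r):
        A = [w for d in ds for w in cells[d]]
        assert sum(X(w) * P[w] for w in A) == sum(Y(w) * P[w] for w in A)

# Conditional probability as a special case: P[B | Sigma] = E[I_B | Sigma].
B = {w for w in omega if w % 2 == 0}    # "the second digit is even"
cond_prob = cond_exp(lambda w: 1 if w in B else 0)
print(cond_prob(37))                    # 1/2
```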
Definition 10.9. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\Sigma$ be a sub-σ-field of $\mathcal{F}$. Then the conditional probability of an event $A \in \mathcal{F}$ given $\Sigma$ is

$$P[A \mid \Sigma] := E[I_A \mid \Sigma].$$

$P[A \mid \Sigma](\omega)$ is also sometimes written $P(\omega, A)$.

We now state some of the elementary properties of conditional probability.

Lemma 10.10. The following hold almost surely:
(i) $P(\omega, \Omega) = 1$ and $P(\omega, \emptyset) = 0$.
(ii) For any $A \in \mathcal{F}$, $0 \le P(\omega, A) \le 1$.
(iii) If $A_1, A_2, \ldots$ is a finite or countable sequence of disjoint sets in $\mathcal{F}$, then
$$P\Big(\omega, \bigcup_i A_i\Big) = \sum_i P(\omega, A_i).$$
(iv) If $A \in \Sigma$, then $P(\omega, A) = I_A(\omega)$.

Lemma 10.10 suggests that, for a fixed $\omega \in \Omega$, $P(\omega, \cdot)$ behaves like a probability measure on $(\Omega, \mathcal{F})$. Strictly speaking, each property holds only outside a null set that may depend on the events involved. Obtaining a genuine probability measure for every $\omega$ therefore requires choosing a suitable (regular) version of the conditional probability.