LECTURE NOTES ON PROBABILITY

OMER TAMUZ

Date: November 29,
Partially adapted from Williams [6]. Any comments or suggestions are welcome.

Contents
Disclaimer
1. Why we need measure theory
1.1. Riddle 1
1.2. Riddle 2 (Gabay-O'Connor game)
1.3. Riddle 3
1.4. Why we need measure theory
1.5. Bonus riddle
2. Measure theory
2.1. π-systems, algebras and sigma-algebras
3. Hahn-Kolmogorov Theorem and constructing measures
4. Events and random variables
5. Independence and the Borel-Cantelli Lemmas
6. The tail sigma-algebra
7. Expectations
8. A strong law of large numbers and the Chernoff bound
9. The weak law of large numbers
10. Conditional expectations
10.1. Why things are not as simple as they seem
10.2. Conditional expectations in finite spaces
10.3. Conditional expectations in $L^2$
10.4. Conditional expectations in $L^1$
10.5. Some properties of conditional expectation
11. The Galton-Watson process
12. Markov chains
13. Martingales
14. Stopping times
15. Harmonic and superharmonic functions
16. The Choquet-Deny Theorem

17. Characteristic functions and the Central Limit Theorem
18. Scenery Reconstruction: I
19. Stationary distributions and processes
20. Scenery reconstruction: II
21. Stationary processes and measure preserving transformations
22. The Ergodic Theorem
23. The Radon-Nikodym derivative
24. The weak topology and the simplex of invariant measures
25. Percolation
26. Large deviations
27. The mass transport principle
28. Majority dynamics
References

Disclaimer
This is not a textbook. These are lecture notes.

1. Why we need measure theory

1.1. Riddle 1. There are $N$ people standing in a line. Each person $n \in \{1, \ldots, N\}$ has a bit $X_n \in \{0, 1\}$ written above her head. Each person can see the bits of the people in front of her but not her own or the bits of those behind, so that person $n$ can see $(X_{n+1}, X_{n+2}, \ldots, X_N)$. Starting with person 1, each person $n$ declares in turn a bit $Y_n$, and this declaration is heard by the rest. $Y_n$ has to be a function of what is known to person $n$. Hence $Y_n = f_n(Y_1, \ldots, Y_{n-1}, X_{n+1}, \ldots, X_N)$ for some function $f_n\colon \{0,1\}^{N-1} \to \{0,1\}$. Show that there exist functions $(f_1, \ldots, f_N)$ such that for any assignment of bits to $(X_1, \ldots, X_N)$ it holds that $Y_n = X_n$ for all $n > 1$.

1.2. Riddle 2 (Gabay-O'Connor game). This time there is a countably infinite line of people, so that person $n$ can see $(X_{n+1}, X_{n+2}, \ldots)$, and, as before, hears $(Y_1, \ldots, Y_{n-1})$. Thus $Y_n = f_n(Y_1, \ldots, Y_{n-1}, X_{n+1}, \ldots)$ for some $f_n\colon \{0,1\}^{\mathbb{N}} \to \{0,1\}$. Show that there exist functions $(f_1, f_2, \ldots)$ such that for any assignment of bits to $(X_1, X_2, \ldots)$ it holds that $Y_n = X_n$ for all $n > 1$.

1.3. Riddle 3. Now there are again countably infinitely many people, but they do not hear the declarations, and so $Y_n = f_n(X_{n+1}, X_{n+2}, \ldots)$ for some $f_n\colon \{0,1\}^{\mathbb{N}} \to \{0,1\}$. Show that there exist functions $(f_1, f_2, \ldots)$ such that for any assignment of bits to $(X_1, X_2, \ldots)$ the set of $n \in \mathbb{N}$ for which $Y_n \neq X_n$ is finite.

1.4. Why we need measure theory. Assume that the $X_n$'s are i.i.d. random variables with $P[X_n = 0] = P[X_n = 1] = 1/2$. In the setting of riddle 3, fix any functions $(f_1, f_2, \ldots)$. Since $Y_n$ is a function of $(X_{n+1}, X_{n+2}, \ldots)$, it is independent of $X_n$. Thus
$$\begin{aligned} P[Y_n = X_n] &= P[Y_n = X_n, X_n = 0] + P[Y_n = X_n, X_n = 1] \\ &= P[Y_n = 0, X_n = 0] + P[Y_n = 1, X_n = 1] \\ &= P[Y_n = 0]\,P[X_n = 0] + P[Y_n = 1]\,P[X_n = 1] \\ &= \big(P[Y_n = 0] + P[Y_n = 1]\big)\tfrac{1}{2} = \tfrac{1}{2}. \end{aligned}$$
Define $K \in \{1, 2, \ldots, \infty\}$ by $K = \max\{n : X_n \neq Y_n\}$, with $K = \infty$ if this maximum does not exist.

If $Y_n \neq X_n$ then $K \geq n$. Hence, and since $P[Y_n \neq X_n] = 1/2$, we have that $P[K \geq n] \geq 1/2$ for all $n$. Hence $P[K < n] \leq 1/2$ for all $n$. Thus
$$\sum_{m=1}^{n-1} P[K = m] = P[K < n] \leq \tfrac{1}{2}.$$
Taking the limit we have shown that
$$\sum_{m=1}^{\infty} P[K = m] \leq \tfrac{1}{2},$$
and so $P[K < \infty] \leq 1/2$. Thus $P[K = \infty] \geq 1/2$, and in particular with positive probability infinitely many people guess wrongly.¹

Exercise 1.1. Show that in the setting of riddle 2, $P[Y_n = X_n \text{ for all } n > 1] < 1$.

Another similar (and better known) example is the Banach-Tarski paradox.

1.5. Bonus riddle. Prove or disprove: every subset of $\mathbb{R}^2$ of size 9 is contained in the disjoint union of 9 closed disks of radius …

¹ In fact this happens w.p. 1.
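The independence computation in 1.4 can be checked numerically. The following is a minimal Python sketch, not part of the original notes: it truncates the line to finitely many people (an assumption, since only finitely many bits can be simulated) and uses a hypothetical rule `guess` under which person $n$ guesses the majority of the next five bits. Any rule that only looks at $(X_{n+1}, X_{n+2}, \ldots)$ should be right about half of the time.

```python
import random

def guess(window):
    """Hypothetical rule: the majority bit of the visible window (ties -> 1).
    It only uses bits of people standing in front, i.e. X_{n+1}, X_{n+2}, ..."""
    return 1 if 2 * sum(window) >= len(window) else 0

def success_rate(num_people=100, trials=1000, seed=0):
    """Fraction of pairs (trial, n) with Y_n == X_n, for i.i.d. fair bits.
    Only people who can see at least five bits ahead are counted."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(trials):
        bits = [rng.randint(0, 1) for _ in range(num_people)]
        for n in range(num_people - 5):
            hits += guess(bits[n + 1:n + 6]) == bits[n]
            total += 1
    return hits / total

print(success_rate())  # should be close to 1/2
```

Replacing `guess` by any other function of the window leaves the success rate near $1/2$, which is exactly the point of the computation above.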

2. Measure theory

A probability measure $\mu$ on a finite space $\Omega$ assigns to each $\omega \in \Omega$ a number between 0 and 1, and has the property that these numbers sum to 1. We can also think about it as a function $\mu\colon 2^\Omega \to [0,1]$ that assigns to each subset of $\Omega$ a number, and has the properties that
(1) $\mu(\Omega) = 1$.
(2) $\mu$ is additive. That is, if $A_1, A_2$ are disjoint (i.e., $A_1 \cap A_2 = \emptyset$) then $\mu(A_1 \cup A_2) = \mu(A_1) + \mu(A_2)$.
For example, when $\Omega = \{0,1\}^n$, the i.i.d. fair coin toss measure can be defined by letting $\mu(\{\omega\}) = 2^{-n}$ for each $\omega \in \Omega$; then, for each $k \leq n$,
$$\mu(\{\omega : \omega_1 = 1, \omega_2 = 1, \ldots, \omega_k = 1\}) = 2^{-k}.$$
We would like to define the same object for a countable number of coin tosses. That is, when $\Omega = \{0,1\}^{\mathbb{N}}$, we would like to define a map $\mu\colon 2^\Omega \to [0,1]$ that has the above properties, satisfies
$$\mu(\{\omega : \omega_1 = 1, \omega_2 = 1, \ldots, \omega_k = 1\}) = 2^{-k},$$
and is furthermore countably additive: if $(A_1, A_2, \ldots)$ is a sequence of disjoint sets then $\mu\big(\bigcup_n A_n\big) = \sum_n \mu(A_n)$. As we saw in the riddle from the previous lecture, this is impossible. In order to solve this problem we will introduce some measure theoretical concepts.

2.1. π-systems, algebras and sigma-algebras. Given a set $\Omega$, a π-system on $\Omega$ is a collection $\mathcal{P}$ of subsets of $\Omega$ such that if $A, B \in \mathcal{P}$ then $A \cap B \in \mathcal{P}$.

Example 2.1. Let $\Omega = \mathbb{R}$, and let $\mathcal{P} = \{(-\infty, x] : x \in \mathbb{R}\}$. This is a π-system because $(-\infty, x] \cap (-\infty, y] = (-\infty, \min\{x, y\}]$.

Example 2.2. Let $\Omega = \{0,1\}^{\mathbb{N}}$, and let $\mathcal{P}$ be the collection of sets $\{A_S\}$ indexed by finite $S \subset \mathbb{N}$, where $A_S = \{\omega \in \Omega : \omega_k = 1 \text{ for all } k \in S\}$. This is a π-system because $A_S \cap A_T = A_{S \cup T}$.
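For finitely many tosses the coin-toss measure at the start of this section can be written down explicitly. A minimal Python sketch (the function names are hypothetical): outcomes are tuples in $\{0,1\}^n$, events are Python sets of outcomes, and every outcome gets mass $2^{-n}$. Exact rational arithmetic via `fractions.Fraction` makes the identities checkable without rounding.

```python
from fractions import Fraction
from itertools import product

def coin_space(n):
    """All outcomes omega in {0,1}^n, as tuples."""
    return list(product((0, 1), repeat=n))

def mu(event, n):
    """I.i.d. fair coin measure on {0,1}^n: each outcome has mass 2^-n."""
    return Fraction(len(event), 2 ** n)

def leading_ones(n, k):
    """The event {omega : omega_1 = ... = omega_k = 1} (k = 0 gives Omega)."""
    return {w for w in coin_space(n) if all(w[:k])}

n = 6
print([mu(leading_ones(n, k), n) for k in range(n + 1)])
```

This brute-force representation is only feasible for small $n$; it is meant to make the finite definition concrete, not to suggest how the infinite product measure is built.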

Example 2.3. Let $X$ be a topological space. Then the set of closed sets in $X$ is a π-system.

An algebra of subsets of $\Omega$ is a π-system $\mathcal{A}$ on $\Omega$ with the following additional properties:
(1) $\Omega \in \mathcal{A}$.
(2) If $A \in \mathcal{A}$ then its complement $A^c \in \mathcal{A}$.
It is easy to see that if $\mathcal{A}$ is an algebra of subsets of $\Omega$ then
(1) $\emptyset \in \mathcal{A}$.
(2) If $A, B \in \mathcal{A}$ then $A \cup B \in \mathcal{A}$.

Example 2.4. Let $\Omega$ be any set. Then the collection of all subsets of $\Omega$ is an algebra.

Example 2.5. Let $\Omega = \{0,1\}^{\mathbb{N}}$, and let $\mathcal{A}_{\text{clopen}}$ be the algebra of clopen sets. That is, $\mathcal{A}_{\text{clopen}}$ is the collection of finite unions of sets $A_x$ indexed by finite binary strings $x = (x_1, \ldots, x_{|x|})$, where $A_x = \{\omega \in \Omega : \omega_k = x_k \text{ for all } k \leq |x|\}$.

Exercise 2.6. Show that $\mathcal{A}_{\text{clopen}}$ is the collection of finite disjoint unions of sets of the form $A_x$.

Example 2.7. Let $\Omega = \mathbb{N}$, and let $\mathcal{A}$ be the collection of sets $A$ such that either $A$ is finite, or else $A^c$ is finite.

Exercise 2.8. Prove that $\mathcal{A}_{\text{clopen}}$ and $\mathcal{A}$ are algebras.

Given an algebra $\mathcal{A}$, a finitely additive probability measure is a function $\mu\colon \mathcal{A} \to [0,1]$ with the following properties:
(1) $\mu(\Omega) = 1$.
(2) $\mu$ is additive. That is, if $A_1, A_2 \in \mathcal{A}$ are disjoint (i.e., $A_1 \cap A_2 = \emptyset$) then $\mu(A_1 \cup A_2) = \mu(A_1) + \mu(A_2)$.

Exercise 2.9. Show that $\mu(\emptyset) = 0$.

Exercise. Define a finitely additive measure on the algebra $\mathcal{A}$ from Example 2.7.

An algebra $\mathcal{F}$ of subsets of $\Omega$ is a sigma-algebra if for any sequence $(A_1, A_2, \ldots)$ of elements of $\mathcal{F}$ it holds that $\bigcup_n A_n \in \mathcal{F}$. It follows that $\bigcap_n A_n \in \mathcal{F}$.

Exercise.
(1) Let $I$ be a set, and let $\{\mathcal{F}_i\}_{i \in I}$ be a collection of sigma-algebras of subsets of $\Omega$. Show that $\bigcap_{i \in I} \mathcal{F}_i$ is a sigma-algebra.

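(2) Let $\mathcal{C}$ be a collection of subsets of $\Omega$. Show that there exists a unique minimal (under inclusion) sigma-algebra $\mathcal{F} \supseteq \mathcal{C}$. This $\mathcal{F}$ is called the sigma-algebra generated by $\mathcal{C}$, which we write as $\mathcal{F} = \sigma(\mathcal{C})$.

Exercise. Prove that $\mathcal{A}$ (Example 2.7) is not a sigma-algebra.

Given a topological space, the Borel sigma-algebra $\mathcal{B}$ is the sigma-algebra generated by the open sets. Hence, when the topology has a countable basis, $\mathcal{B}$ is also generated by that basis, since every open set is then a countable union of basis elements.

A measurable space is a pair $(\Omega, \mathcal{F})$, where $\mathcal{F}$ is a sigma-algebra of subsets of $\Omega$. A probability measure on $(\Omega, \mathcal{F})$ is a function $\mu\colon \mathcal{F} \to [0,1]$ with the following properties:
(1) $\mu(\Omega) = 1$.
(2) $\mu$ is countably additive. That is, if $(A_1, A_2, \ldots)$ is a sequence of disjoint sets (i.e., $A_n \cap A_m = \emptyset$ for all $n \neq m$) then $\mu\big(\bigcup_n A_n\big) = \sum_n \mu(A_n)$.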
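On a finite set $\Omega$, countable unions reduce to finite ones, so $\sigma(\mathcal{C})$ can be computed by repeatedly closing $\mathcal{C}$ under complements and pairwise unions until nothing new appears. A minimal Python sketch of this closure, assuming $\Omega$ is small (the family can have up to $2^{|\Omega|}$ members); the function name is hypothetical.

```python
from itertools import combinations

def generated_sigma_algebra(omega, collection):
    """Return sigma(collection) for a finite set omega, as a set of frozensets.
    It is the smallest family containing `collection`, the empty set and
    omega that is closed under complements and finite unions."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(c) for c in collection}
    while True:
        new = {omega - a for a in family}
        new |= {a | b for a, b in combinations(family, 2)}
        if new <= family:      # closed: nothing new was produced
            return family
        family |= new

sigma = generated_sigma_algebra({1, 2, 3}, [{1}])
print(sorted(sorted(a) for a in sigma))  # [[], [1], [1, 2, 3], [2, 3]]
```

The loop terminates because the family only grows and is bounded by the power set of $\Omega$.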

3. Hahn-Kolmogorov Theorem and constructing measures

Theorem 3.1 (Hahn-Kolmogorov Theorem). Let $\mathcal{C}$ be a collection of subsets of $\Omega$, and let $\mathcal{F} = \sigma(\mathcal{C})$. Let $\mu_0\colon \mathcal{C} \to [0,1]$ be a countably additive map with $\mu_0(\Omega) = 1$. We say that a probability measure $\mu\colon \mathcal{F} \to [0,1]$ extends $\mu_0$ if $\mu(A) = \mu_0(A)$ for all $A \in \mathcal{C}$.
(1) If $\mathcal{C}$ is a π-system then there exists at most one probability measure $\mu$ that extends $\mu_0$.
(2) If $\mathcal{C}$ is an algebra then there exists exactly one probability measure $\mu$ that extends $\mu_0$.

Example 3.2. Let $\mathcal{A} = \mathcal{A}_{\text{clopen}}$ be the algebra defined in Example 2.5. Then there is a unique map $\mu_0\colon \mathcal{A} \to [0,1]$ that is additive and satisfies $\mu_0(A_x) = 2^{-|x|}$. Furthermore, this map is countably additive. Hence $\mu_0$ has a unique extension $\mu\colon \mathcal{B} \to [0,1]$ (where $\mathcal{B} = \sigma(\mathcal{A})$ is the Borel sigma-algebra on $\{0,1\}^{\mathbb{N}}$, equipped with the product topology). The probability measure $\mu$ is sometimes called the Bernoulli measure on $\{0,1\}^{\mathbb{N}}$.

Exercise 3.3. Prove that $\mu_0\colon \mathcal{A}_{\text{clopen}} \to [0,1]$ is countably additive.

Example 3.4. Let $\mathcal{P}$ be the π-system on the interval $[0,1]$ given by $\mathcal{P} = \{[0,x] : x \in [0,1]\}$, and let $\mu_0\colon \mathcal{P} \to [0,1]$ be given by $\mu_0([0,x]) = x$. Then there exists a probability measure $\mu\colon \mathcal{B} \to [0,1]$ (where $\mathcal{B} = \sigma(\mathcal{P})$ is the Borel sigma-algebra on $[0,1]$) that extends $\mu_0$, and by Theorem 3.1 it is unique; it is called the Lebesgue measure. To prove existence we naturally extend $\mu_0$ to the algebra generated by $\mathcal{P}$, and then show that this extension is countably additive.

Example 3.5. Let $\mathcal{P}$ be the π-system from Example 2.1. Choose some monotone increasing, right continuous $F\colon \mathbb{R} \to [0,1]$ with $\inf_x F(x) = 0$ and $\sup_x F(x) = 1$. Let $\mu_0\colon \mathcal{P} \to [0,1]$ be given by $\mu_0((-\infty, x]) = F(x)$. Then if there exists a probability measure $\mu\colon \mathcal{B} \to [0,1]$ (where $\mathcal{B} = \sigma(\mathcal{P})$ is the Borel sigma-algebra on $\mathbb{R}$) that extends $\mu_0$, then it is unique. Such a probability measure also always exists.

Theorem 3.6. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space.

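(1) If $(F_1, F_2, \ldots)$ is a sequence of sets in $\mathcal{F}$ such that $F_n \subseteq F_{n+1}$ then
$$\mu\Big(\bigcup_n F_n\Big) = \lim_n \mu(F_n).$$
(2) If $(F_1, F_2, \ldots)$ is a sequence of sets in $\mathcal{F}$ such that $F_n \supseteq F_{n+1}$ then
$$\mu\Big(\bigcap_n F_n\Big) = \lim_n \mu(F_n).$$

Proof. (1) Let $G_1 = F_1$, and for $n > 1$ let $G_n = F_n \setminus F_{n-1}$. Then $\bigcup_n F_n = \bigcup_n G_n$, and additionally the $G_n$'s are disjoint. Hence
$$\mu\Big(\bigcup_n F_n\Big) = \sum_n \mu(G_n) = \lim_n \sum_{k=1}^{n} \mu(G_k) = \lim_n \mu\Big(\bigcup_{k=1}^{n} G_k\Big) = \lim_n \mu(F_n).$$
(2) Left as an exercise.

Corollary 3.7. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space, and let $(F_1, F_2, \ldots)$ be a sequence of sets in $\mathcal{F}$.
(1) If $\mu(F_n) = 0$ for all $n$ then $\mu\big(\bigcup_n F_n\big) = 0$.
(2) If $\mu(F_n) = 1$ for all $n$ then $\mu\big(\bigcap_n F_n\big) = 1$.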
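Example 3.2 says that $\mu_0$ is determined on $\mathcal{A}_{\text{clopen}}$ by $\mu_0(A_x) = 2^{-|x|}$ and additivity. One way to evaluate it on a finite union of cylinders, sketched below in Python under the assumption that each $x$ is given as a string over `'01'`, is to refine every cylinder to a common length $m$: $A_x$ is the disjoint union of the $2^{m-|x|}$ cylinders $A_{xy}$ with $|y| = m - |x|$, so the union becomes a disjoint union of distinct length-$m$ cylinders, each of mass $2^{-m}$. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def clopen_measure(strings):
    """mu_0 of the union of the cylinders A_x, x in `strings`, where each x is
    a finite string over '01' and A_x = {omega : omega starts with x}."""
    if not strings:
        return Fraction(0)
    m = max(len(x) for x in strings)
    # Refine: A_x is the disjoint union of A_{xy} over all y of length m - |x|.
    refined = set()
    for x in strings:
        for tail in product("01", repeat=m - len(x)):
            refined.add(x + "".join(tail))
    return Fraction(len(refined), 2 ** m)

print(clopen_measure(["0", "01", "11"]))  # 3/4
```

Overlapping cylinders (such as $A_{0} \supseteq A_{01}$) are counted once, because the refinement collects distinct strings in a set.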

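4. Events and random variables

Given a measurable space $(\Omega, \mathcal{F})$, an event $A$ is an element of $\mathcal{F}$. We sometimes call events measurable sets. A sub-sigma-algebra of $\mathcal{F}$ is a subset of $\mathcal{F}$ that is also a sigma-algebra.

Given another measurable space $(\Theta, \mathcal{G})$, a function $f\colon \Omega \to \Theta$ is measurable if for all $A \in \mathcal{G}$ it holds that $f^{-1}(A) \in \mathcal{F}$.

Exercise 4.1. Prove that $f$ is measurable iff the collection
$$\sigma(f) = \{f^{-1}(A) : A \in \mathcal{G}\} = f^{-1}(\mathcal{G}) \tag{4.1}$$
is a sub-sigma-algebra of $\mathcal{F}$.

Hence (assuming $f$ is onto, otherwise restrict to its image), $f^{-1}\colon \mathcal{G} \to \sigma(f)$ is an isomorphism of sigma-algebras.

Fix a measurable space $(\Omega, \mathcal{F})$, and let $f$ be a measurable function to some other measurable space. Given a sub-sigma-algebra $\mathcal{G} \subseteq \mathcal{F}$, we say that $f$ is $\mathcal{G}$-measurable if $\sigma(f)$ is a sub-sigma-algebra of $\mathcal{G}$.

We say that a sigma-algebra $\mathcal{F}$ is separable if it is generated by a countable subset. That is, if there exists some countable $\mathcal{C} \subseteq \mathcal{F}$ such that $\mathcal{F} = \sigma(\mathcal{C})$. We say that $\mathcal{F}$ separates points if for all $\omega_1 \neq \omega_2$ there exists some $A \in \mathcal{F}$ such that $\omega_1 \in A$ and $\omega_2 \notin A$.

Theorem 4.2. Let $(\Omega, \mathcal{F})$, $(\Theta_1, \mathcal{G}_1)$ and $(\Theta_2, \mathcal{G}_2)$ be measurable spaces with sigma-algebras that separate points. Let $f\colon \Omega \to \Theta_1$ and $g\colon \Omega \to \Theta_2$ be measurable functions. Then $g$ is $\sigma(f)$-measurable iff there exists a measurable $h\colon \Theta_1 \to \Theta_2$ such that $g = h \circ f$.

Exercise 4.3. Prove the "if" direction: if $g = h \circ f$ for a measurable $h$, then $g$ is $\sigma(f)$-measurable.

Measurable functions to $(\mathbb{R}, \mathcal{B})$ will be of particular interest.

Claim 4.4. Let $(\Omega, \mathcal{F})$ be a measurable space, and let $f\colon \Omega \to \mathbb{R}$. Then
(1) If $\mathcal{C} \subseteq \mathcal{B}$ satisfies $\sigma(\mathcal{C}) = \mathcal{B}$, and if $f^{-1}(A) \in \mathcal{F}$ for all $A \in \mathcal{C}$, then $f$ is measurable.
(2) For each $x \in \mathbb{R}$ let $A_x \subseteq \Omega$ be given by $A_x = \{\omega : f(\omega) \leq x\}$. If each $A_x$ is in $\mathcal{F}$ then $f$ is measurable.
(3) If $\Omega$ is a topological space with Borel sigma-algebra $\mathcal{F}$, and if $f$ is continuous, then it is measurable.
(4) If $g$ is a measurable function from $(\mathbb{R}, \mathcal{B})$ to itself and $f$ is measurable then $g \circ f$ is measurable.

Claim 4.5. Let $(\Omega, \mathcal{F})$ be a measurable space, and let $\{f_n\}$ be a sequence of measurable functions to $(\mathbb{R}, \mathcal{B})$ with $0 \leq f_n \leq 1$ for all $n$. Then the following are measurable: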

(1) $\inf_n f_n$.
(2) $\liminf_n f_n$.
(3) The set $\{\omega : \lim_n f_n(\omega) \text{ exists}\}$.

Claim 4.6. The measurable functions $(\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$ are a vector space over the reals:
(1) If $f$ is measurable then $\lambda f$ is measurable, for all $\lambda \in \mathbb{R}$.
(2) If $f_1$ and $f_2$ are measurable, then $f_1 + f_2$ is measurable.

Given a probability space $(\Omega, \mathcal{F}, \mu)$ and a measurable space $(\Theta, \mathcal{G})$, we say that two measurable functions $f, g\colon \Omega \to \Theta$ are equivalent if $\mu(\{\omega : f(\omega) = g(\omega)\}) = 1$. A random variable is an equivalence class of measurable functions. We will often consider the case that $(\Theta, \mathcal{G}) = (\mathbb{R}, \mathcal{B})$, in which case we will call $X$ a real random variable. In fact, we will do this so often that we will often refer to real random variables as just random variables.

A few notes:
(1) We will often just think of random variables as measurable functions. We will say, for example, that a real random variable is non-negative, by which we will mean that there is a non-negative function in the equivalence class. We will also define random variables by just describing one element of the equivalence class.
(2) It is easy to verify that sums, products, limits etc. of random variables are well defined, in the sense that (for example) the equivalence class of $f + g$ is equal to the equivalence class of $f' + g'$ whenever $f$ and $f'$ are equivalent and $g$ and $g'$ are equivalent.
(3) We will later need to verify that the expectation of a random variable is well defined, i.e., is independent of the choice of representative.

Example 4.7. Let $\Omega = \{0,1\}^{\mathbb{N}}$, and let $P$ be the Bernoulli measure defined in Example 3.2. Define the random variable $X\colon \Omega \to \mathbb{R}$ by $X(\omega) = \max\{n \geq 0 : \omega_k = 0 \text{ for all } k \leq n\}$ (the condition is vacuous for $n = 0$). Note that $X$ is not well defined at a single point in $\Omega$, the all zeros sequence. We accordingly extend $\mathbb{R}$ to include $\infty$ (and $-\infty$) and assign $X(\omega) = \infty$ in this case.

Given a random variable $X\colon \Omega \to \Theta$, we define the pushforward measure $\nu = X_*\mu$ on $(\Theta, \mathcal{G})$ by $\nu(A) = \mu(X^{-1}(A))$.

The measure $\nu$ is also called the law of $X$. When $\Theta = \mathbb{R}$ we define the cumulative distribution function $F\colon \mathbb{R} \to [0,1]$ of $X$ by
$$F(x) = \nu((-\infty, x]) = \mu(\{\omega : X(\omega) \leq x\}).$$
As we noted in Example 3.5, $\nu$ is uniquely determined by $F$.

Exercise 4.8. Calculate the cumulative distribution function of the random variable defined in Example …
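The pushforward $\nu = X_*\mu$ and its CDF can be approximated by sampling. A minimal Python sketch for the random variable $X$ of Example 4.7 (the number of leading zeros of an i.i.d. fair bit sequence); it assumes we may stop after a fixed number of bits, which only matters on an event of tiny probability, and the function names are hypothetical. It estimates $F(x) = P[X \leq x]$ by the empirical CDF and does not give a closed form, which is left for Exercise 4.8.

```python
import bisect
import random

def sample_X(rng, max_bits=1000):
    """One draw of X from Example 4.7: the number of leading zeros of an
    i.i.d. fair bit sequence. We stop after max_bits bits (an assumption;
    the all-zeros sequence, where X is infinite, has probability zero)."""
    n = 0
    while n < max_bits and rng.random() < 0.5:   # next bit is 0
        n += 1
    return n

def empirical_cdf(samples):
    """Return the function x -> (number of samples <= x) / len(samples)."""
    data = sorted(samples)
    return lambda x: bisect.bisect_right(data, x) / len(data)

rng = random.Random(1)
F = empirical_cdf([sample_X(rng) for _ in range(10_000)])
print([round(F(x), 3) for x in range(6)])
```

The empirical CDF is itself the CDF of a probability measure (mass $1/N$ at each sample), so it is automatically non-decreasing with values in $[0,1]$.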

5. Independence and the Borel-Cantelli Lemmas

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $(\mathcal{F}_1, \mathcal{F}_2, \ldots)$ be sub-sigma-algebras. We say that these sigma-algebras are independent if for any $(A_1, A_2, \ldots)$ with $A_n \in \mathcal{F}_n$ and any finite sequence of distinct indices $n_1, \ldots, n_k$ it holds that
$$P\Big[\bigcap_{i=1}^{k} A_{n_i}\Big] = \prod_{i=1}^{k} P[A_{n_i}]. \tag{5.1}$$
We say that the random variables $(X_1, X_2, \ldots)$ are independent if $(\sigma(X_1), \sigma(X_2), \ldots)$ are independent. We say that the events $(A_1, A_2, \ldots)$ are independent if their indicator functions $(1_{\{A_1\}}, 1_{\{A_2\}}, \ldots)$ are independent. Note that $\sigma(1_{\{A\}}) = \{\emptyset, A, A^c, \Omega\}$.

Claim 5.1. Let the events $(A_1, A_2, \ldots)$ be independent. Then
$$P\Big[\bigcap_n A_n\Big] = \prod_n P[A_n].$$

Proof. By independence we have that for any $m \in \mathbb{N}$
$$P\Big[\bigcap_{n=1}^{m} A_n\Big] = \prod_{n=1}^{m} P[A_n].$$
Denote $B_m = \bigcap_{n=1}^{m} A_n$. Then $B_m$ is a decreasing sequence with $\bigcap_m B_m = \bigcap_n A_n$, and so by Theorem 3.6 we have that
$$P\Big[\bigcap_n A_n\Big] = P\Big[\bigcap_m B_m\Big] = \lim_m P[B_m] = \lim_m \prod_{n=1}^{m} P[A_n] = \prod_{n=1}^{\infty} P[A_n].$$

It turns out that to prove independence it suffices to show (5.1) for generating π-systems. The proof is by Carathéodory's Theorem.

Theorem 5.2. Let $(X_1, X_2, \ldots)$ be a sequence of independent real random variables, each with the distribution $P[X_n > x] = e^{-x}$ for $x \geq 0$. Let
$$L = \limsup_n \frac{X_n}{\log n}.$$
Then $P[L = 1] = 1$.

To prove this Theorem we will need the Borel-Cantelli Lemmas.

Lemma 5.3 (Borel-Cantelli Lemmas). Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $(A_1, A_2, \ldots)$ be a sequence of events.
(1) If $\sum_n P[A_n] < \infty$ then $P[\{\omega \in \Omega : \omega \in A_n \text{ for infinitely many } n\}] = 0$.

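(2) If $\sum_n P[A_n] = \infty$ and $(A_1, A_2, \ldots)$ are independent then $P[\{\omega \in \Omega : \omega \in A_n \text{ for infinitely many } n\}] = 1$.

To see why independence is needed for the second part, consider the case that all the events $A_n$ are equal to some event $A$ with $0 < P[A] < 1$.

Proof of Lemma 5.3. (1) Note that
$$\{\omega : \omega \in A_n \text{ for infinitely many } n\} = \bigcap_n \bigcup_{m \geq n} A_m.$$
Let $B_n = \bigcup_{m \geq n} A_m$, so that we want to show that $P\big[\bigcap_n B_n\big] = 0$. Note that $B_n$ is a decreasing sequence (i.e., if $n' > n$ then $B_{n'} \subseteq B_n$) and therefore by Theorem 3.6 we have that $P\big[\bigcap_n B_n\big] = \lim_n P[B_n]$. Since $B_n = \bigcup_{m \geq n} A_m$, we have that $P[B_n] \leq \sum_{m \geq n} P[A_m]$. But the latter converges to 0, and so we are done.

(2) Note that
$$\begin{aligned} \{\omega : \omega \in A_n \text{ for infinitely many } n\}^c &= \{\omega : \omega \in A_n \text{ for finitely many } n\} \\ &= \{\omega : \omega \in A_n^c \text{ for all } n \text{ large enough}\} = \bigcup_n \bigcap_{m \geq n} A_m^c. \end{aligned}$$
We would hence like to show that $P\big[\bigcup_n \bigcap_{m \geq n} A_m^c\big] = 0$. Let $C_n = \bigcap_{m \geq n} A_m^c$. Then by independence and Claim 5.1 we have that
$$P[C_n] = P\Big[\bigcap_{m \geq n} A_m^c\Big] = \prod_{m \geq n} (1 - P[A_m]).$$
Since $1 - x \leq e^{-x}$ this implies that
$$P[C_n] \leq \exp\Big(-\sum_{m \geq n} P[A_m]\Big) = 0.$$
Finally, by Corollary 3.7, $P\big[\bigcup_n C_n\big] = 0$.

Proof of Theorem 5.2. Let $A_n$ be the event that $X_n \geq \alpha \log n$. Then $P[A_n] = n^{-\alpha}$, and the events $(A_1, A_2, \ldots)$ are independent (exercise!). Also, note that
$$\sum_n P[A_n] \begin{cases} = \infty & \text{if } \alpha \leq 1, \\ < \infty & \text{if } \alpha > 1. \end{cases}$$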

Thus, from the Borel-Cantelli Lemmas it follows that
$$P[X_n \geq \alpha \log n \text{ for infinitely many } n] = \begin{cases} 1 & \text{if } \alpha \leq 1, \\ 0 & \text{if } \alpha > 1. \end{cases}$$
Now, note that the event $\{L \geq \alpha\}$ is identical to the event
$$\bigcap_{m > 0} \{X_n \geq (\alpha - 1/m) \log n \text{ for infinitely many } n\},$$
and so $P[L \geq 1] = 1$, by Corollary 3.7. It also follows that $P[L \geq 1 + 1/n] = 0$ for any $n > 0$, and so we have that $P[L > 1] = 0$, again by Corollary 3.7. Hence $P[L \leq 1] = 1$, and so $P[L = 1] = 1$.
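The dichotomy in the proof of Theorem 5.2 can be seen in simulation. A Python sketch, assuming a finite horizon $n \leq$ `n_max` (so it only hints at the "infinitely many" statement): it counts the indices $n$ with $X_n \geq \alpha \log n$ for i.i.d. standard exponential $X_n$. The expected count is $\sum_{n \leq n_{\max}} n^{-\alpha}$, which keeps growing with `n_max` for $\alpha \leq 1$ and stays bounded for $\alpha > 1$. The function name and parameters are illustrative.

```python
import math
import random

def count_exceedances(alpha, n_max=100_000, seed=0):
    """Number of n <= n_max with X_n >= alpha * log(n), where the X_n are
    i.i.d. with P[X_n > x] = e^{-x} (standard exponential)."""
    rng = random.Random(seed)
    return sum(1 for n in range(1, n_max + 1)
               if rng.expovariate(1.0) >= alpha * math.log(n))

for alpha in (0.5, 1.0, 2.0):
    print(alpha, count_exceedances(alpha))
```

For $\alpha = 1/2$ the count is of order $2\sqrt{n_{\max}}$, while for $\alpha = 2$ it is typically one or two.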

6. The tail sigma-algebra

Consider a sequence of independent real random variables $(X_1, X_2, \ldots)$ such that there exists some $M \geq 0$ such that $P[|X_n| \leq M] = 1$ for all $n$. That is, the sequence is uniformly bounded. Define the random variables
$$Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k \quad \text{and} \quad L = \limsup_n Y_n.$$

Claim 6.1. $P[|L| \leq M] = 1$.

Proof. Clearly $P[|Y_n| \leq M] = 1$. Hence $P[|Y_n| \leq M \text{ for all } n] = 1$, and thus $P[|L| \leq M] = 1$.

Define the event $A = \{\lim_n Y_n \text{ exists}\}$.

Theorem 6.2. There exists some $c \in [-M, M]$ such that $P[L = c] = 1$, and $P[A] \in \{0, 1\}$.

An interesting observation is that $L$ is independent of $X_1$. To see this, define
$$L' = \limsup_n \frac{1}{n} \sum_{k=2}^{n} X_k,$$
which is clearly independent of $X_1$, since it is a function of $(X_2, X_3, \ldots)$. But
$$L = \limsup_n \frac{1}{n} \sum_{k=1}^{n} X_k = \limsup_n \Big(\frac{X_1}{n} + \frac{1}{n} \sum_{k=2}^{n} X_k\Big) = L'.$$
In fact, by the same argument, $L$ is independent of $(X_1, X_2, \ldots, X_n)$ for any $n$. This makes $L$ a tail random variable, as we now explain.

For each $n \in \mathbb{N}$ define the sigma-algebra $\mathcal{T}_n = \sigma(X_n, X_{n+1}, \ldots)$, which is the smallest sigma-algebra that contains $(\sigma(X_n), \sigma(X_{n+1}), \ldots)$. Define the tail sigma-algebra by $\mathcal{T} = \bigcap_n \mathcal{T}_n$. A random variable is a tail random variable if it is $\mathcal{T}$-measurable.

Claim 6.3. $L$ is a tail random variable.

Proof. Using a construction similar to the $L'$ construction above, it is easy to see that for every $n$ there exists a function $f_n$ such that $L = f_n(X_n, X_{n+1}, \ldots)$. It follows that $L$ is $\mathcal{T}_n$-measurable. Thus for every Borel set $B$ it holds that $L^{-1}(B) \in \mathcal{T}_n$, for every $n$. Thus $L^{-1}(B) \in \bigcap_n \mathcal{T}_n = \mathcal{T}$.

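Let $(Z_1, Z_2, \ldots)$ be i.i.d. random variables, each distributed uniformly over the set of symbols $S = \{a, b, c\}$. Let $S^*$ be the set of finite strings over $S$, and define the random variable $W_n$ taking values in $S^*$ as follows: $W_1 = Z_1$. If $W_n$ is empty, or if the last symbol in $W_n$ is different from $Z_{n+1}$, then $W_{n+1}$ is the concatenation $W_n Z_{n+1}$. If the last symbol in $W_n$ is $Z_{n+1}$ then $W_{n+1}$ is equal to $W_n$ with this last symbol removed. We will prove later in the course that with probability one it holds that $\lim_n |W_n| = \infty$, and hence we can define the random variable $T$ to be the eventual first symbol in all $W_n$ for $n$ high enough. It is immediate that $T$ is measurable in the tail sigma-algebra of the sequence $(W_1, W_2, \ldots)$. It is also easy to see that $P[T = a] = 1/3$, since by the symmetry of the definitions, $P[T = a] = P[T = b] = P[T = c]$, and these must sum to one. By the same argument, the probability that $W_n$ starts with some string $w$ (with no two consecutive equal symbols) for all $n$ high enough is $\frac{1}{3 \cdot 2^{|w| - 1}}$.

Theorem 6.4 (Kolmogorov's Zero-One Law). Let $\mathcal{T}$ be the tail sigma-algebra of a sequence of independent random variables. Then $P[A] \in \{0, 1\}$ for any $A \in \mathcal{T}$.

Before proving this theorem we will prove a lemma.

Lemma 6.5. Let the event $A$ be independent of itself. Then $P[A] \in \{0, 1\}$.

Proof. $P[A] = P[A \cap A] = P[A] \cdot P[A]$, and the only solutions of $x = x^2$ are $0$ and $1$.

Proof of Theorem 6.4. Let $\mathcal{G}_n = \sigma(X_1, \ldots, X_{n-1})$, $\mathcal{T}_n = \sigma(X_n, X_{n+1}, \ldots)$ and $\mathcal{T} = \bigcap_n \mathcal{T}_n$. We first claim that $\mathcal{G}_n$ and $\mathcal{T}_n$ are independent. To see this, define $\mathcal{T}_n^m = \sigma(X_n, \ldots, X_{n+m})$, and note that $\mathcal{T}_n^m$ and $\mathcal{G}_n$ are independent, and so $P[A \cap B] = P[A]\,P[B]$ for any $A \in \mathcal{G}_n$ and any $B \in \mathcal{T}_n^m$. Now $\mathcal{C}_n = \bigcup_m \mathcal{T}_n^m$ is not a sigma-algebra, but it is a π-system. Since $P[A \cap B] = P[A]\,P[B]$ for any $A \in \mathcal{G}_n$ and any $B \in \mathcal{C}_n$, it follows that $\mathcal{G}_n$ and $\sigma(\mathcal{C}_n) = \mathcal{T}_n$ are independent. Since $\mathcal{T} \subseteq \mathcal{T}_n$, it follows that $\mathcal{G}_n$ and $\mathcal{T}$ are independent. Hence $\mathcal{T}$ is independent of
$$\sigma\Big(\bigcup_n \mathcal{G}_n\Big) = \sigma\Big(\bigcup_n \sigma(X_n)\Big) = \sigma(X_1, X_2, \ldots).$$
Since $\mathcal{T} \subseteq \sigma(X_1, X_2, \ldots)$ it follows that $\mathcal{T}$ is independent of itself, and so $P[A] \in \{0, 1\}$ for any $A \in \mathcal{T}$.

Proof of Theorem 6.2. Since $A$ is a tail event, $P[A] \in \{0, 1\}$.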

For any $q \in \mathbb{Q}$ define the tail event $A_q = \{L \geq q\}$. By Kolmogorov's zero-one law, the probability of each of these is either 0 or 1, and so there is some
$$c = \sup\{q : P[A_q] = 1\} = \inf\{q : P[A_q] = 0\}.$$
Since $\mathbb{Q}$ is countable, Corollary 3.7 gives $P[L \geq c] = P[L \leq c] = 1$, and so $P[L = c] = 1$. Finally, $c \in [-M, M]$, since $P[L \in [-M, M]] = 1$.
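The process $W_n$ from this section is easy to simulate. The Python sketch below (function names hypothetical) runs it for a fixed number of steps and records the first symbol of $W_n$ at the final step, as a finite-time stand-in for $T$; this is an approximation, since $T$ is defined through $n \to \infty$. By the symmetry argument in the text each symbol should appear about one third of the time, and $|W_n|$ should grow.

```python
import random
from collections import Counter

def run_word(steps, rng, symbols="abc"):
    """Return W_steps, where W_1 = Z_1 and each new symbol Z_{n+1} either
    cancels the last symbol of W_n (if equal) or is appended to it."""
    w = [rng.choice(symbols)]                  # W_1 = Z_1
    for _ in range(steps - 1):
        z = rng.choice(symbols)                # Z_{n+1}
        if w and w[-1] == z:
            w.pop()
        else:
            w.append(z)
    return "".join(w)

def first_symbol_frequencies(trials=3000, steps=200, seed=0):
    """Empirical distribution of the first symbol of W_steps ('' if empty)."""
    rng = random.Random(seed)
    counts = Counter(run_word(steps, rng)[:1] for _ in range(trials))
    return {s: counts[s] / trials for s in "abc"}

print(first_symbol_frequencies())
```

Each $W_n$ produced this way has no two consecutive equal symbols, which is why the prefix probabilities in the text are stated for such strings.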

7. Expectations

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Choose events $(A_1, \ldots, A_k)$ and non-negative numbers $(x_1, \ldots, x_k)$, and let $\bar{f} = \sum_{n=1}^{k} x_n 1_{\{A_n\}}$. We call such a measurable function simple, and define its expectation $E[\bar{f}]$ by
$$E[\bar{f}] = \sum_{n=1}^{k} x_n P[A_n].$$
Note that one needs to check that $E[\cdot]$ is well defined, as there might be more than one way to write a simple function as a finite sum of indicators.

Given a (non-simple) non-negative real function $f$, we define its expectation by
$$E[f] = \sup\big\{E[\bar{f}] : \bar{f} \text{ is simple and } \bar{f} \leq f\big\}.$$
Note that this supremum may be infinite.

It is straightforward to verify that for any non-negative functions $f, g$ such that $P[f = g] = 1$ it holds that $E[f] = E[g]$. We can therefore define the expectation of a random variable $X$ as the expectation of any $f$ in the equivalence class. We will henceforth consider expectations of random variables.

It is likewise straightforward to verify that for any two non-negative random variables $X, Y$:
(1) Linearity of expectation: For any $\lambda > 0$ it holds that $E[X + \lambda Y] = E[X] + \lambda E[Y]$.
(2) If $X \leq Y$ then $E[X] \leq E[Y]$.

Theorem 7.1 (Markov's Inequality). If $X$ is a non-negative random variable with $E[X] < \infty$ then for every $\lambda > 0$
$$P[X \geq \lambda] \leq \frac{E[X]}{\lambda}.$$

Proof. Let $A = \{X \geq \lambda\}$, and let $Y$ be given by
$$Y(\omega) = \lambda \cdot 1_{\{A\}}(\omega) = \begin{cases} \lambda & \text{if } \omega \in A, \\ 0 & \text{otherwise.} \end{cases}$$
Then $Y \leq X$, and so $E[Y] \leq E[X]$. Since $E[Y] = \lambda P[A]$, we get that $\lambda P[X \geq \lambda] \leq E[X]$, and the claim follows by dividing both sides by $\lambda$.
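Markov's inequality applies to any probability measure, including the empirical distribution of a finite sample (each sample point having mass $1/N$). The Python sketch below uses that to check the inequality exactly on exponential samples; the sample distribution and the function name are arbitrary choices for illustration.

```python
import random

def markov_pairs(samples, lambdas):
    """For the empirical distribution of the non-negative samples, return
    (P[X >= lam], E[X] / lam) for each lam; Markov says the first <= second."""
    n = len(samples)
    mean = sum(samples) / n
    return [(sum(x >= lam for x in samples) / n, mean / lam) for lam in lambdas]

rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(10_000)]
for lam, (p, bound) in zip([0.5, 1, 2, 4], markov_pairs(samples, [0.5, 1, 2, 4])):
    print(f"lambda={lam}: P[X >= lambda] = {p:.4f} <= {bound:.4f}")
```

Equality is attained when all the mass sits at $0$ and $\lambda$, as in the proof, e.g. for the samples `[0.0, 2.0]` and $\lambda = 2$.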

Consider the non-negative random variables $(X_1, X_2, \ldots)$ defined on the interval $(0, 1]$ (equipped with the Borel sigma-algebra and Lebesgue measure) which are given by
$$X_n(x) = \begin{cases} n & \text{if } x \leq 1/n, \\ 0 & \text{otherwise.} \end{cases}$$
Then
(1) $E[X_n] = 1$.
(2) For every $x \in (0, 1]$ it holds that $\lim_n X_n(x) = X(x)$, where $X$ is the constant function $X(x) = 0$.
(3) $\lim_n E[X_n] \neq E[X]$.
Hence it is not necessarily true that if $X_n \to X$ then $E[X_n] \to E[X]$.

Theorem 7.2 (Monotone Convergence Theorem). Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $(X_1, X_2, \ldots)$ be a sequence of non-negative random variables such that $X_n(\omega)$ is increasing for every $\omega \in \Omega$. Let $X(\omega) = \lim_n X_n(\omega) \in [0, \infty]$. Then
$$\lim_n E[X_n] = E[X] \in [0, \infty].$$

Theorem 7.3 (Dominated Convergence Theorem). Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $(X_1, X_2, \ldots)$ be a sequence of non-negative random variables. Let $X, Y$ be non-negative random variables with $E[Y] < \infty$, and such that $\lim_n X_n(\omega) = X(\omega)$ for every $\omega \in \Omega$, and $X_n(\omega) \leq Y(\omega)$ for every $\omega \in \Omega$ and $n \in \mathbb{N}$. Then $\lim_n E[X_n] = E[X]$.

Given a random variable $X$, we define the random variables $X^+$ and $X^-$ by $X^+(\omega) = \max\{X(\omega), 0\}$ and $X^-(\omega) = \max\{-X(\omega), 0\}$, so that $X^+$ and $X^-$ are both non-negative, and $X = X^+ - X^-$. If $E[X^+]$ and $E[X^-]$ are both finite, we define
$$E[X] = E[X^+] - E[X^-],$$
and say that $X \in L^1(\Omega, \mathcal{F}, P)$, or just $X \in L^1$. Note that $X \in L^1$ iff $E[|X|] < \infty$ iff $|X| \in L^1$. For $p \geq 1$ we say that $X \in L^p$ if $|X|^p \in L^1$.

Exercise 7.4. Show that $L^p$ is a vector space.

The map $X \mapsto \|X\|_p := E[|X|^p]^{1/p}$ defines a norm on $L^p$.

Theorem 7.5. If $r > p \geq 1$ and $X \in L^r$ then $X \in L^p$ and
$$E[|X|^r]^{1/r} \geq E[|X|^p]^{1/p}.$$
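A quick numerical look at the example $X_n = n \cdot 1_{\{x \leq 1/n\}}$ on $(0, 1]$ with Lebesgue measure: the Python sketch below estimates $E[X_n]$ by Monte Carlo with uniform samples (an assumption of the sketch; the exact value is $n \cdot \frac{1}{n} = 1$) and shows that for a fixed $x$ the values $X_n(x)$ are eventually $0$. Function names are hypothetical.

```python
import random

def X(n, x):
    """X_n(x) = n if x <= 1/n, and 0 otherwise, for x in (0, 1]."""
    return n if x <= 1 / n else 0

def monte_carlo_mean(n, draws=100_000, seed=0):
    """Estimate E[X_n] by averaging X_n over uniform samples from (0, 1]."""
    rng = random.Random(seed)
    return sum(X(n, 1.0 - rng.random()) for _ in range(draws)) / draws

print([round(monte_carlo_mean(n), 2) for n in (1, 10, 100)])   # each close to 1
print([X(n, 0.3) for n in range(1, 8)])   # [1, 2, 3, 0, 0, 0, 0]
```

The mass of $X_n$ concentrates on a shrinking interval near $0$, which is why neither monotone nor dominated convergence applies here.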

In fact, if we equip $L^p$ with the norm $\|X\|_p = E[|X|^p]^{1/p}$, then it is a Banach space; that is, it is complete with respect to the metric induced by this norm.

Theorem 7.6. Let $(X_1, X_2, \ldots)$ be a sequence of random variables in $L^p$ such that
$$\lim_r \sup_{n, m \geq r} E[|X_n - X_m|^p] = 0.$$
Then there exists an $X \in L^p$ such that $\lim_n E[|X_n - X|^p] = 0$.

A particularly interesting case is $p = 2$. In this case we can define an inner product $(X, Y) := E[X \cdot Y]$, which makes $L^2$ a Hilbert space, with completeness given by Theorem 7.6.

Theorem 7.7. Let $X, Y \in L^2$. Then $X \cdot Y \in L^1$.

Proof. Note first that $|X|, |Y| \in L^2$. Since $L^2$ is a vector space, $E[(|X| + |Y|)^2] < \infty$, and so
$$E[|X|^2 + 2|X||Y| + |Y|^2] < \infty.$$
By the linearity of expectation
$$E[(|X| + |Y|)^2] = E[|X|^2] + 2E[|X||Y|] + E[|Y|^2],$$
and so we have that $E[|X||Y|] < \infty$. Now, $E[|X||Y|] = E[|X \cdot Y|]$, and so $|X \cdot Y| \in L^1$. Finally, since $|X \cdot Y| = (X \cdot Y)^+ + (X \cdot Y)^-$ it follows that $E[(X \cdot Y)^\pm] < \infty$ and so $X \cdot Y \in L^1$.

It follows from Theorems 7.6 and 7.7 that $L^2$ is a real Hilbert space, when equipped with the inner product $(X, Y) := E[X \cdot Y]$. We can therefore immediately conclude that for any $X, Y \in L^2$
(1) $E[X \cdot Y]^2 \leq E[X^2]\,E[Y^2]$, with equality iff for some $\lambda \in \mathbb{R}$ it a.s. holds that $X = \lambda Y$.
(2) $E[(X + Y)^2] = E[X^2] + E[Y^2]$ iff $E[X \cdot Y] = 0$.

Given $X \in L^2$, we define the random variable $\tilde{X} := X - E[X]$, and denote $\mathrm{Var}(X) = E[\tilde{X}^2]$ and $\mathrm{Cov}(X, Y) = E[\tilde{X} \cdot \tilde{Y}]$. We say that $X$ and $Y$ are uncorrelated if $\mathrm{Cov}(X, Y) = 0$. Using these definitions the facts above become
(1) $\mathrm{Cov}(X, Y)^2 \leq \mathrm{Var}(X)\,\mathrm{Var}(Y)$, with equality iff for some $\lambda \in \mathbb{R}$ it a.s. holds that $\tilde{X} = \lambda \tilde{Y}$.
(2) $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ iff $X$ and $Y$ are uncorrelated.
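The identities above hold for any probability measure, in particular for the empirical measure of a finite sample, so they can be checked up to floating-point error. A Python sketch with hypothetical helper names; the Gaussian samples are an arbitrary choice. It also uses the standard expansion $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$, from which fact (2) follows.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Cov(X, Y) = E[(X - E X)(Y - E Y)] under the empirical measure."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def var(xs):
    return cov(xs, xs)

rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(5_000)]
ys = [x + rng.gauss(0, 1) for x in xs]          # correlated with xs
sums = [x + y for x, y in zip(xs, ys)]
print(cov(xs, ys) ** 2, "<=", var(xs) * var(ys))
print("Var(X+Y) matches:",
      math.isclose(var(sums), var(xs) + var(ys) + 2 * cov(xs, ys), rel_tol=1e-9))
```

Taking `ys` to be a multiple of `xs` gives the equality case of the Cauchy-Schwarz inequality.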

8. A strong law of large numbers and the Chernoff bound

Theorem 8.1. Let $X, Y \in L^1$ be independent. Then $X \cdot Y \in L^1$ and $E[X \cdot Y] = E[X] \cdot E[Y]$.

To prove this, we first note that it holds for indicator functions by the definition of independence, then show that it holds for simple functions, and apply the monotone convergence theorem to show that it holds in general.

Theorem 8.2. Let $(X_1, X_2, \ldots)$ be a sequence of independent random variables uniformly bounded in $L^4$ (so that $E[X_n^4] < K$ for all $n$ and some $K > 0$), and with $E[X_n] = 0$. Let $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$. Then $\lim_n Y_n = 0$ a.s.

Proof. By independence, for distinct indices $j, k, l, m$,
$$E[X_k X_l^3] = E[X_k X_l^2 X_m] = E[X_j X_k X_l X_m] = 0,$$
and so, by linearity we have that
$$E[Y_n^4] = E\Big[\Big(\frac{1}{n} \sum_{k=1}^{n} X_k\Big)^4\Big] = \frac{1}{n^4}\Big(\sum_{k=1}^{n} E[X_k^4] + 6 \sum_{k < l} E[X_k^2 X_l^2]\Big).$$
By Theorem 7.5 we have that $E[X_k^2]^2 < K$, and so, using independence once more, $E[X_k^2 X_l^2] = E[X_k^2]\,E[X_l^2] < K$ for $k \neq l$. Hence
$$E[Y_n^4] \leq \frac{K}{n^3} + \frac{6K}{n^2} \leq \frac{7K}{n^2}.$$
It follows from Markov's inequality that for any $\varepsilon > 0$
$$P[Y_n^4 \geq \varepsilon^4] \leq \frac{7K}{\varepsilon^4 n^2},$$
and so, since $\sum_n n^{-2} < \infty$, by Borel-Cantelli $\limsup_n |Y_n| \leq \varepsilon$ for any $\varepsilon > 0$ (almost surely, which we drop for the remainder of the proof). Intersecting these probability one events for $\varepsilon = 1/2, 1/3, 1/4, \ldots$ yields that $\limsup_n |Y_n| = 0$ and thus $\lim_n Y_n = 0$.

With a little additional effort we can prove that if $E[X_n] = \mu$ then $\lim_n Y_n = \mu$. A natural question is: what is the probability that $Y_n$ is significantly far from $\mu$, for finite $n$? For example, for $\eta > \mu$, what is the probability that $Y_n \geq \eta$?

Theorem 8.3 (Chernoff Bound). Let $(X_1, X_2, \ldots)$ be a sequence of i.i.d. random variables in $L^\infty$, with $E[X_n] = \mu$. Then for every $\eta > \mu$ there is an $r > 0$ such that
$$P[Y_n \geq \eta] \leq e^{-rn}.$$

Proof. Denote $p_n = P[Y_n \geq \eta]$; we want to show that $p_n \leq e^{-rn}$. Note that the event $\{Y_n \geq \eta\}$ is identical to the event $\{e^{tnY_n} \geq e^{tn\eta}\}$, for any $t > 0$. Since $e^{tnY_n}$ is a positive random variable, by the Markov inequality we have that
$$p_n = P\big[e^{tnY_n} \geq e^{tn\eta}\big] \leq \frac{E[e^{tnY_n}]}{e^{tn\eta}}.$$
Now,
$$E[e^{tnY_n}] = E\Big[\prod_{k=1}^{n} e^{tX_k}\Big] = \prod_{k=1}^{n} E[e^{tX_k}],$$
where the second equality uses independence. Let $X$ be a random variable with the same distribution as each $X_k$. Then we have shown that
$$E[e^{tnY_n}] = E[e^{tX}]^n.$$
We now define the moment generating function of $X$ by $M(t) := E[e^{tX}]$. The name comes from the fact that
$$M(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!} E[X^n]. \tag{8.1}$$
Note that this means that $M'(0) = E[X]$. Using $M$ we can write
$$E[e^{tnY_n}] = M(t)^n,$$
and so
$$p_n \leq \exp\big(-n(t\eta - \log M(t))\big).$$
If we define the cumulant generating function of $X$ by $K(t) := \log M(t)$, then
$$p_n \leq \exp\big(-n(t\eta - K(t))\big).$$
Since $K(0) = 0$, $K'(0) = M'(0)/M(0) = E[X]$, and since $K$ is smooth (as it turns out), it follows that for $t > 0$ small enough,
$$t\eta - K(t) = t\eta - t\mu - O(t^2) > 0.$$
Hence, if we define
$$r = \sup_{t > 0}\{t\eta - K(t)\}$$

we get that $r > 0$ and $p_n \leq e^{-rn}$.

Note that we did not really need $X_k$ to be in $L^\infty$, but only that it is in $L^1$ and that its moment (or cumulant) generating function is defined and smooth around zero.

Claim 8.4. Let $X \in L^1$ have a cumulant generating function $K$ that is well defined and finite for some $t > 0$. Then
$$P[X \geq a] \leq e^{-ta + K(t)}.$$

Proof. By Markov's inequality
$$P[X \geq a] = P\big[e^{tX} \geq e^{ta}\big] \leq \frac{E[e^{tX}]}{e^{ta}} = e^{-ta + K(t)}.$$

It turns out that the Chernoff bound is asymptotically tight. We show this in
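For Bernoulli variables everything in the proof of Theorem 8.3 is explicit, which makes a numerical check possible. The Python sketch below assumes $X_k \sim \mathrm{Bernoulli}(p)$, so that $K(t) = \log(1 - p + pe^t)$, approximates $r = \sup_{t > 0}(t\eta - K(t))$ by a grid search over $(0, t_{\max}]$ (every $t > 0$ gives a valid bound, so a grid that underestimates the supremum still yields a valid bound), and compares $e^{-rn}$ with the exact binomial tail $P[Y_n \geq \eta]$. Function and parameter names are hypothetical.

```python
import math

def cumulant(t, p):
    """K(t) = log E[e^{tX}] for X ~ Bernoulli(p)."""
    return math.log(1 - p + p * math.exp(t))

def chernoff_rate(eta, p, t_max=20.0, steps=20_000):
    """Grid approximation of r = sup_{t > 0} (t * eta - K(t))."""
    grid = (t_max * i / steps for i in range(1, steps + 1))
    return max(t * eta - cumulant(t, p) for t in grid)

def exact_tail(n, eta, p):
    """P[Y_n >= eta], where n * Y_n ~ Binomial(n, p)."""
    k0 = math.ceil(n * eta - 1e-12)      # smallest k with k / n >= eta
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

p, eta = 0.5, 0.75
r = chernoff_rate(eta, p)
for n in (20, 40, 100):
    print(n, exact_tail(n, eta, p), "<=", math.exp(-r * n))
```

The exact tail and the bound decay at the same exponential rate; the gap between them is only a sub-exponential factor in $n$.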

9. The weak law of large numbers

Theorem 9.1. Let $(X_1, X_2, \ldots)$ be a sequence of independent real random variables in $L^2$, let $E[X_n] = \mu$, $\mathrm{Var}(X_n) \leq \sigma^2$, and let $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$. Then for every $\varepsilon > 0$ and $n \in \mathbb{N}$
$$P[|Y_n - \mu| \geq \varepsilon] \leq \frac{\sigma^2}{\varepsilon^2 n},$$
and in particular
$$\lim_n P[|Y_n - \mu| \geq \varepsilon] = 0.$$
In this case we say that $Y_n$ converges in probability to $\mu$. More generally, we say that a sequence of real random variables $Y_n$ converges in probability to a real random variable $Y$ if for every $\varepsilon > 0$
$$\lim_n P[|Y_n - Y| \geq \varepsilon] = 0.$$

Exercise 9.2. Does convergence in probability imply pointwise convergence? Does pointwise convergence imply convergence in probability?

To prove this Theorem we will need Chebyshev's inequality, which is just Markov's inequality in disguise.

Lemma 9.3 (Chebyshev's Inequality). For every $X \in L^2$ and for every $\lambda > 0$ it holds that
$$P[|X - E[X]| \geq \lambda] \leq \frac{\mathrm{Var}(X)}{\lambda^2}.$$

Proof of Theorem 9.1. Note that $E[Y_n] = \mu$, and that, by independence,
$$\mathrm{Var}(Y_n) = \mathrm{Var}\Big(\frac{1}{n} \sum_{k} X_k\Big) = \frac{1}{n^2} \mathrm{Var}\Big(\sum_{k} X_k\Big) = \frac{1}{n^2} \sum_{k} \mathrm{Var}(X_k) \leq \frac{\sigma^2}{n}.$$
Hence Chebyshev's inequality yields that for every $\varepsilon > 0$ we have that
$$P[|Y_n - \mu| \geq \varepsilon] \leq \frac{\sigma^2}{\varepsilon^2 n}.$$

We can relax the assumption $X_n \in L^2$ to $X_n \in L^1$ and still prove the weak law of large numbers. In fact, even the strong law holds in this setting (for i.i.d. random variables), but we will leave the proof of that for after we prove the Ergodic Theorem.
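A quick Monte Carlo comparison of the Chebyshev bound of Theorem 9.1 with actual deviation frequencies. The Python sketch assumes $X_k$ uniform on $[0, 1]$ (so $\mu = 1/2$ and $\sigma^2 = 1/12$); the choice of distribution, of $\varepsilon$, and the function names are illustrative. The bound is typically far from tight, but it is enough to force the frequencies to $0$ as $n$ grows.

```python
import random

def deviation_frequency(n, eps, trials=2000, seed=0):
    """Fraction of trials with |Y_n - 1/2| >= eps, where Y_n is the mean of
    n i.i.d. Uniform[0, 1] samples (so mu = 1/2 and sigma^2 = 1/12)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        y = sum(rng.random() for _ in range(n)) / n
        bad += abs(y - 0.5) >= eps
    return bad / trials

def chebyshev_bound(n, eps, sigma2=1 / 12):
    """The bound sigma^2 / (eps^2 n) from Theorem 9.1."""
    return sigma2 / (eps ** 2 * n)

for n in (25, 100, 400):
    print(n, deviation_frequency(n, 0.05), "<=", chebyshev_bound(n, 0.05))
```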

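Theorem 9.4. Let $(X_1, X_2, \ldots)$ be a sequence of i.i.d. real random variables in $L^1$, let $E[X_n] = \mu$, and let $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$. Then for every $\varepsilon > 0$
$$\lim_n P[|Y_n - \mu| \geq \varepsilon] = 0.$$

We show a proof adapted from [5].

Proof. We assume $\mu = 0$; the reduction is straightforward. Let $X = X_1$. For $N \in \mathbb{N}$ and any random variable $Z$ denote $Z^{\leq N} = Z \cdot 1_{\{|Z| \leq N\}}$ and $Z^{>N} = Z \cdot 1_{\{|Z| > N\}}$, so that $Z = Z^{\leq N} + Z^{>N}$. By the Dominated Convergence Theorem,
$$E[|X^{>N}|] \to 0 \quad \text{and} \quad E[X^{\leq N}] \to E[X] = 0 \quad (N \to \infty), \tag{9.1}$$
since both are dominated by $|X|$.

Fix $\varepsilon, \delta > 0$. To prove the claim (under our assumption that $\mu = 0$) we show that $P[|Y_n| \geq \varepsilon] < \delta$ for all $n$ large enough. For any $N \in \mathbb{N}$ we can write $Y_n$ as
$$Y_n = \frac{1}{n} \sum_{k} X_k^{\leq N} + \frac{1}{n} \sum_{k} X_k^{>N} = Y_n^{\leq} + Y_n^{>},$$
where $Y_n^{\leq} := \frac{1}{n} \sum_{k} X_k^{\leq N}$ and $Y_n^{>} := \frac{1}{n} \sum_{k} X_k^{>N}$. Note that $Y_n^{\leq}$ is not the same as $Y_n^{\leq N}$; we will not need the latter. Likewise, $Y_n^{>}$ is not the same as $Y_n^{>N}$.

Choose $N$ large enough so that $E[|X^{>N}|] < \varepsilon\delta/4$ and $|E[X^{\leq N}]| < \varepsilon/4$; both are possible by (9.1). Now,
$$E[|Y_n^{>}|] = E\Big[\Big|\frac{1}{n} \sum_{k} X_k^{>N}\Big|\Big] \leq E\Big[\frac{1}{n} \sum_{k} |X_k^{>N}|\Big] = E[|X^{>N}|] < \varepsilon\delta/4.$$
Therefore, by Markov's inequality, we have that $P[|Y_n^{>}| \geq \varepsilon/2] < \delta/2$.

Since $X_k^{\leq N}$ is bounded it is in $L^2$. Therefore, by independence,
$$\mathrm{Var}(Y_n^{\leq}) = \frac{1}{n} \mathrm{Var}(X^{\leq N}) \leq \frac{N^2}{n}.$$
By linearity of expectation $E[Y_n^{\leq}] = E[X^{\leq N}]$, whose absolute value is less than $\varepsilon/4$ by our choice of $N$. It thus follows from Chebyshev's inequality that
$$P[|Y_n^{\leq}| \geq \varepsilon/2] \leq P\big[|Y_n^{\leq} - E[Y_n^{\leq}]| \geq \varepsilon/4\big] \leq \frac{16 N^2}{\varepsilon^2 n},$$
which is less than $\delta/2$ for all $n$ large enough. Since $P[|Y_n| \geq \varepsilon] \leq P[|Y_n^{\leq}| \geq \varepsilon/2 \text{ or } |Y_n^{>}| \geq \varepsilon/2]$, the claim follows by the union bound.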
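Theorem 9.4 only needs $X_k \in L^1$. The Python sketch below illustrates this with Pareto variables of shape $\alpha = 1.5$ and minimum $1$ (generated by the standard library's `random.paretovariate`), which have mean $\alpha/(\alpha - 1) = 3$ but infinite variance, so Theorem 9.1 does not apply directly. The deviation frequency still decreases with $n$, although slowly. The distribution, $\varepsilon$, and the function name are illustrative assumptions.

```python
import random

def pareto_deviation_frequency(n, eps=0.5, alpha=1.5, trials=400, seed=0):
    """Fraction of trials with |Y_n - mu| >= eps, where Y_n averages n i.i.d.
    Pareto(alpha) samples with minimum 1 (in L^1 but not L^2 for 1 < alpha <= 2).
    The mean of such a sample is mu = alpha / (alpha - 1)."""
    rng = random.Random(seed)
    mu = alpha / (alpha - 1)
    bad = 0
    for _ in range(trials):
        y = sum(rng.paretovariate(alpha) for _ in range(n)) / n
        bad += abs(y - mu) >= eps
    return bad / trials

for n in (20, 200, 2000):
    print(n, pareto_deviation_frequency(n))
```

The slow decay reflects the truncation argument in the proof: rare large values (the $X^{>N}$ part) are what keep $Y_n$ away from $\mu$ for moderate $n$.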

10. Conditional expectations

10.1. Why things are not as simple as they seem. Consider a point chosen uniformly from the surface of the (idealized, spherical) earth, so that the probability of falling in a set is proportional to its area. Say we condition on the point falling on the equator. What is the conditional distribution? It obviously has to be uniform: by symmetry, there cannot be a reason that it is more likely to be in one time zone than another.

Say now that we condition on the point falling on a particular meridian $m$. By the same reasoning, the conditional distribution is uniform, and so, for example, the probability that we are within 2 meters of the north pole is the same as the probability that we are within 1 meter of the equator. Integrating over $m$ we get that regardless of the meridian, the probability of being within 2 meters of the north pole is the same as the probability of being within 1 meter of the equator. But the area within 2 meters of the north pole is about $4\pi\,\mathrm{m}^2$, whereas the area within 1 meter of the equator is about $8 \cdot 10^7\,\mathrm{m}^2$.

10.2. Conditional expectations in finite spaces. Consider a probability space $(\Omega, \mathcal{F}, P)$ with $|\Omega| < \infty$, $\mathcal{F} = 2^\Omega$, and $P[\omega] > 0$ for all $\omega \in \Omega$. Let $\Omega = \{1, \ldots, n\}^2$, let $Y$ be the random variable given by $Y(\omega_1, \omega_2) = \omega_1$, and let $\mathcal{G} = \sigma(Y)$ be the sigma-algebra generated by the sets $A_k = \{k\} \times \{1, \ldots, n\}$. Let $X$ be a real random variable. Then the usual definition is that $E[X|Y]$ is the random variable $\Omega \to \mathbb{R}$ given by
$$E[X|Y](\omega) = \frac{\sum_{\omega' \in Y^{-1}(Y(\omega))} X(\omega')\,P[\omega']}{\sum_{\omega' \in Y^{-1}(Y(\omega))} P[\omega']}.$$
This notation can be confusing: $E[X|Y]$ is a random variable and not a number! Indeed, given $A \in \mathcal{F}$ with $P[A] > 0$, we denote by $E[X|A]$ the number
$$E[X|A] = \frac{1}{P[A]} E[X \cdot 1_{\{A\}}].$$

Exercise. Show that:
(1) $E[X|Y] = \operatorname*{argmin}_{Z \in L^2(\Omega, \mathcal{G}, P)} E[(X - Z)^2]$.
(2) $E[X|Y]$ is $\mathcal{G}$-measurable.
(3) If $A \in \mathcal{G}$ with $P[A] > 0$ then $E[X \cdot 1_{\{A\}}] = E[E[X|Y] \cdot 1_{\{A\}}]$.

10.3. Conditional expectations in $L^2$. Fix a probability space $(\Omega, \mathcal{F}, P)$. Given a sub-sigma-algebra $\mathcal{G} \subseteq \mathcal{F}$, we know by Theorem 7.6 that the

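subspace $L^2(\Omega, \mathcal{G}, \mathbb{P}) \subseteq L^2(\Omega, \mathcal{F}, \mathbb{P})$ is complete, and hence closed. We can therefore define the projection operator
$$P_{\mathcal{G}}\colon L^2(\Omega, \mathcal{F}, \mathbb{P}) \to L^2(\Omega, \mathcal{G}, \mathbb{P}), \qquad P_{\mathcal{G}}(X) = \operatorname{argmin}_{Y \in L^2(\Omega, \mathcal{G}, \mathbb{P})} \mathbb{E}\left[(X - Y)^2\right].$$
Some immediate observations:
(1) $P_{\mathcal{G}}(X)$ is $\mathcal{G}$-measurable.
(2) If $Y \in L^2(\Omega, \mathcal{G}, \mathbb{P})$ then $\mathbb{E}[(X - P_{\mathcal{G}}(X))\cdot Y] = 0$, or $\mathbb{E}[X\cdot Y] = \mathbb{E}[P_{\mathcal{G}}(X)\cdot Y]$. Thus given $A \in \mathcal{G}$ with $\mathbb{P}[A] > 0$ we have that $\mathbb{E}[X\cdot 1_{A}] = \mathbb{E}[P_{\mathcal{G}}(X)\cdot 1_{A}]$.

10.4. Conditional expectations in $L^1$.

Theorem. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space with a r.v. $X \in L^1$ and a sub-sigma-algebra $\mathcal{G} \subseteq \mathcal{F}$. Then there exists a random variable $Y$, unique up to almost sure equality, with the following properties:
(1) $Y \in L^1(\Omega, \mathcal{G}, \mathbb{P})$.
(2) For every $A \in \mathcal{G}$ it holds that $\mathbb{E}[Y\cdot 1_{A}] = \mathbb{E}[X\cdot 1_{A}]$.
We denote $\mathbb{E}[X|\mathcal{G}] := Y$. For $A \in \mathcal{F}$ with $\mathbb{P}[A] > 0$ we denote $\mathbb{E}[X|A] = \mathbb{E}[X\cdot 1_{A}]/\mathbb{P}[A]$.

Proof. We first prove uniqueness. Let $Y$ and $Z$ both satisfy the two conditions in the theorem, and assume by contradiction that $\mathbb{P}[Y > Z] > 0$. Then there is some $\varepsilon > 0$ such that $\mathbb{P}[Y - \varepsilon > Z] > 0$. Let $A = \{Y - \varepsilon > Z\}$, and note that $A \in \mathcal{G}$. Then
$$\mathbb{E}[Y\cdot 1_{A}] = \mathbb{E}[(Y - \varepsilon)\cdot 1_{A}] + \varepsilon\,\mathbb{P}[A] \geq \mathbb{E}[Z\cdot 1_{A}] + \varepsilon\,\mathbb{P}[A] > \mathbb{E}[Z\cdot 1_{A}].$$
But since $A \in \mathcal{G}$ we have that both $\mathbb{E}[Y\cdot 1_{A}]$ and $\mathbb{E}[Z\cdot 1_{A}]$ are equal to $\mathbb{E}[X\cdot 1_{A}]$, which is a contradiction. By symmetry also $\mathbb{P}[Z > Y] = 0$, and so $Y = Z$ almost surely.

We prove the remainder under the assumption that $X \geq 0$; the reduction is straightforward (write $X = X^+ - X^-$). Let $X_n = X\cdot 1_{\{X \leq n\}}$. Then $X_n$ is bounded, and in particular is in $L^2$. Let $Y_n = P_{\mathcal{G}}(X_n)$. We claim that $Y_n$ is non-negative almost surely. To see this, assume by contradiction that $\mathbb{P}[Y_n < -\varepsilon] > 0$ for some $\varepsilon > 0$, and let $A = \{Y_n < -\varepsilon\} \in \mathcal{G}$. Then $\mathbb{E}[Y_n\cdot 1_{A}] < -\varepsilon\,\mathbb{P}[A] < 0$, but $\mathbb{E}[Y_n\cdot 1_{A}] = \mathbb{E}[X_n\cdot 1_{A}] \geq 0$.

Now, $Y_n$ is a monotone increasing sequence. To see this, note that $X_n$ is monotone increasing, and that $P_{\mathcal{G}}$ is a linear operator, and so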

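$Y_{n+1} - Y_n = P_{\mathcal{G}}(X_{n+1} - X_n)$ is non-negative, by the same proof as above. Since $Y_n$ is monotone increasing, so is $Y_n\cdot 1_{A}$ for any $A \in \mathcal{G}$. Define $Y = \lim_n Y_n$. Since $X_n\cdot 1_{A}$ is also monotone increasing with $X\cdot 1_{A} = \lim_n X_n\cdot 1_{A}$, the Monotone Convergence Theorem gives
$$\mathbb{E}[Y\cdot 1_{A}] = \lim_n \mathbb{E}[Y_n\cdot 1_{A}] = \lim_n \mathbb{E}[X_n\cdot 1_{A}] = \mathbb{E}[X\cdot 1_{A}].$$
Taking $A = \Omega$ shows that $\mathbb{E}[Y] = \mathbb{E}[X] < \infty$, so $Y \in L^1$. Finally, each $Y_n$ is $\mathcal{G}$-measurable by construction, and therefore so is $Y$.

10.5. Some properties of conditional expectation.

Exercise.
(1) If $X$ is $\mathcal{G}$-measurable (i.e., $\sigma(X) \subseteq \mathcal{G}$) then $\mathbb{E}[X|\mathcal{G}] = X$.
(2) The Law of Total Expectation. If $\mathcal{G}_2 \subseteq \mathcal{G}_1$ then $\mathbb{E}[\mathbb{E}[X|\mathcal{G}_1]|\mathcal{G}_2] = \mathbb{E}[X|\mathcal{G}_2]$. In particular $\mathbb{E}[\mathbb{E}[X|\mathcal{G}]] = \mathbb{E}[X]$.
(3) If $Z \in L^\infty(\Omega, \mathcal{G}, \mathbb{P})$ then $\mathbb{E}[Z\cdot X|\mathcal{G}] = Z\cdot\mathbb{E}[X|\mathcal{G}]$.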
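On a finite space, the definition of $\mathbb{E}[X|Y]$ as a weighted average over level sets can be checked exactly with rational arithmetic, together with the properties listed in the exercises. The sketch below uses a hypothetical example:
- $\Omega = \{1,\ldots,4\}^2$ with arbitrary positive weights;
- $X(\omega) = \omega_1\omega_2 - \omega_2^2$;
- $\mathcal{G}_1 = \sigma(Y)$ for $Y(\omega) = \omega_1$, and the coarser $\mathcal{G}_2 = \sigma(Y \bmod 2)$.

The helper names `cond_exp`, `expect` and `mean_square_error` are illustrative and not from the notes.

```python
from fractions import Fraction
from itertools import product

def cond_exp(X, P, label):
    """E[X | G] on a finite space, where G is generated by the partition {label == c}.

    X, P and label are dicts keyed by outcome, with P[w] > 0 for every w.
    On each block of the partition the result is the P-weighted average of X.
    """
    num, den = {}, {}
    for w, p in P.items():
        c = label[w]
        num[c] = num.get(c, 0) + X[w] * p
        den[c] = den.get(c, 0) + p
    return {w: num[label[w]] / den[label[w]] for w in P}

def expect(X, P):
    return sum(X[w] * p for w, p in P.items())

def mean_square_error(X, Z, P):
    return expect({w: (X[w] - Z[w]) ** 2 for w in P}, P)

# Hypothetical example space: Omega = {1, ..., 4}^2 with arbitrary positive weights.
n = 4
omega = list(product(range(1, n + 1), repeat=2))
raw = {w: Fraction(w[0] + 2 * w[1]) for w in omega}
total = sum(raw.values())
P = {w: raw[w] / total for w in omega}
X = {w: Fraction(w[0] * w[1] - w[1] ** 2) for w in omega}

by_first = {w: w[0] for w in omega}        # G_1 = sigma(Y) with Y(w1, w2) = w1
by_parity = {w: w[0] % 2 for w in omega}   # G_2 = sigma(Y mod 2), contained in G_1

E1 = cond_exp(X, P, by_first)    # E[X | G_1] = E[X | Y]
E2 = cond_exp(X, P, by_parity)   # E[X | G_2]

if __name__ == "__main__":
    # Tower property and E[E[X | G_1]] = E[X]
    print(cond_exp(E1, P, by_parity) == E2, expect(E1, P) == expect(X, P))
```

Because all weights are `Fraction`s, each identity is checked exactly rather than up to rounding.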

11. The Galton-Watson process

Consider an asexual organism (in the original work these were Victorian men) whose number of offspring $X_1$ is chosen at random from some distribution on $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$. Each of its descendants $i$ (assuming it has any) has $X_i$ offspring, with the random variables $(X_1, X_2, \ldots)$ independent and identically distributed. An interesting question is: what is the probability that the organism's progeny will live forever, and what is the probability that there will be a last one to bear its name?

Formally, consider generations $n \in \{1, 2, \ldots\}$, and to each generation associate an infinite sequence of random variables $(X_{n,1}, X_{n,2}, \ldots)$, with all the random variables $(X_{n,i})$ independent and identically distributed on $\mathbb{N}_0$. To simplify some expressions, we define $X = X_{1,1}$. We assume that $0 < \mathbb{E}[X] < \infty$, and denote $\mu = \mathbb{E}[X]$. We also assume that $\mathbb{P}[X = 0] > 0$.

To each generation $n$ we associate the number of organisms $Z_n$, which is also a random variable. It is defined recursively by $Z_1 = 1$ and
$$Z_{n+1} = \sum_{i=1}^{Z_n} X_{n,i}.$$
Clearly $Z_n = 0$ implies $Z_{n+1} = 0$. We are interested in the event that $Z_n = 0$ for some $n$, or, equivalently, that $Z_n = 0$ for all $n$ large enough. This is again equivalent to the event $\sum_n Z_n < \infty$, since each $Z_n$ is an integer. We denote this event by $E$ (for extinction), and denote $E_n = \{Z_n = 0\}$, so that the sequence $E_n$ is increasing and $E = \bigcup_n E_n$. Therefore, by Theorem 3.6, $\mathbb{P}[E] = \lim_n \mathbb{P}[Z_n = 0]$.

We first calculate the expectation of $Z_{n+1}$. Since $Z_n$ is independent of $(X_{n,1}, X_{n,2}, \ldots)$, it holds that
$$\mathbb{E}[Z_{n+1}] = \mathbb{E}\left[\sum_{i=1}^{Z_n}X_{n,i}\right] = \mathbb{E}\left[\mathbb{E}\left[\sum_{i=1}^{Z_n}X_{n,i}\,\middle|\,Z_n\right]\right] = \mathbb{E}\left[Z_n\,\mathbb{E}[X]\right] = \mathbb{E}[Z_n]\,\mathbb{E}[X],$$
and so
$$\mathbb{E}[Z_{n+1}] = \mu^n.$$

Claim. If $\mu < 1$ then $\mathbb{P}[E] = 1$.

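Proof 1. By Markov's inequality, $\mathbb{P}[Z_{n+1} \geq 1] \leq \mathbb{E}[Z_{n+1}] = \mu^n$. Since $\sum_n \mu^n < \infty$, by the Borel-Cantelli Lemma, with probability 1 only finitely many $n$ have $Z_n \geq 1$. In particular there will be some $n$ with $Z_n < 1$, and thus $Z_n = 0$.

Proof 2. Note that $\mathbb{E}[\sum_n Z_n] = \sum_n \mathbb{E}[Z_n] < \infty$, and so $\mathbb{P}[\sum_n Z_n = \infty] = 0$.

It is also true that $\mathbb{P}[E] = 1$ when $\mu = 1$. Note that in this case
$$\mathbb{E}[Z_{n+1}|Z_1, Z_2, \ldots, Z_n] = \mathbb{E}[Z_{n+1}|Z_n] = Z_n\,\mathbb{E}[X] = Z_n.$$
The first equality makes $Z_n$ a Markov chain. The second makes it a martingale; we will discuss both concepts formally. By the Martingale Convergence Theorem (note that $Z_n \geq 0$ and $\mathbb{E}[Z_n] = 1$) we have that $Z_n$ converges almost surely to some r.v. $Z_\infty$. But $Z_n$ cannot converge to anything but 0. Since $Z_n$ is integer-valued, convergence means that $Z_n$ is eventually constant. From any state $k \geq 1$ the process moves to 0 at the next step with probability $\mathbb{P}[X = 0]^k > 0$, so it cannot stay at $k$ forever. Hence $\mathbb{P}[E] = 1$.

Note that the event $E$ is equal to the union of the event that $X_{1,1} = 0$ with the event that $X_{1,1} > 0$ but each of the subtrees of the $Z_2$ offspring goes extinct. The subtrees evolve independently and each behaves like the whole process. So the probability that all of $k$ such offspring trees go extinct is $\mathbb{P}[E]^k$, and $\mathbb{P}[E]$ must satisfy
$$\mathbb{P}[E] = \sum_{k\in\mathbb{N}_0}\mathbb{P}[X = k]\,\mathbb{P}[E]^k. \qquad (11.1)$$
We accordingly define $f\colon [0,1] \to [0,1]$, the generating function of $X$, by
$$f(t) = \sum_{k\in\mathbb{N}_0}\mathbb{P}[X = k]\,t^k = \mathbb{E}\left[t^X\right],$$
where we take $0^0 = 1$. Then (11.1) is equivalent to observing that $\mathbb{P}[E]$ is a fixed point of $f$. Note that 1 is always a fixed point, but in general there might be more. Some observations:
(1) $f(0) = \mathbb{P}[X = 0]$ and $f(1) = 1$.
(2) $f'(t) = \sum_{k\in\mathbb{N}_0}\mathbb{P}[X = k]\,k\,t^{k-1} = \mathbb{E}\left[X\,t^{X-1}\right]$. Hence $f'(1) = \mathbb{E}[X] = \mu$. Note also that $f'(t) > 0$ for $t \in (0,1]$, since $\mu > 0$.
(3) Likewise, $f''(t) = \mathbb{E}[X(X-1)\,t^{X-2}] \geq 0$, and more generally the $k$-th derivative of $f$ is $\mathbb{E}[X(X-1)\cdots(X-k+1)\,t^{X-k}] \geq 0$. Thus $f$ is convex, and it is strictly convex whenever $\mathbb{P}[X \geq 2] > 0$.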

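Let $f_n(t) = \mathbb{E}[t^{Z_n}]$ be the generating function of $Z_n$. Then
$$\begin{aligned}f_{n+1}(t) &= \mathbb{E}\left[t^{Z_{n+1}}\right] = \mathbb{E}\left[\mathbb{E}\left[t^{Z_{n+1}}\,\middle|\,Z_n\right]\right] = \mathbb{E}\left[\mathbb{E}\left[t^{\sum_{k=1}^{Z_n}X_{n,k}}\,\middle|\,Z_n\right]\right]\\ &= \mathbb{E}\left[\mathbb{E}\left[t^X\right]^{Z_n}\right] = \mathbb{E}\left[f(t)^{Z_n}\right] = f_n(f(t)),\end{aligned}$$
where we again used the fact that $Z_n$ is independent of $(X_{n,1}, X_{n,2}, \ldots)$. Since $f_1(t) = t$, $f_{n+1}$ is the $n$-fold composition of $f$ with itself: $f_{n+1} = f \circ f \circ \cdots \circ f$.

Now $\mathbb{P}[Z_n = 0] = f_n(0)$. Hence $\mathbb{P}[E] = \lim_n \mathbb{P}[E_n] = \lim_n f_n(0)$ is the limit obtained by applying $f$ repeatedly to 0. Since $f$ is continuous and increasing, this limit is the smallest fixed point of $f$ in $[0,1]$: if $f(s) = s$ then $f_n(0) \leq s$ for all $n$, by induction. Furthermore, $f(0) = \mathbb{P}[X = 0] > 0$, $f(1) = 1$, and $f$ is increasing and convex. Thus, besides 1, $f$ has at most one other fixed point in $[0,1]$.

Finally, since $f'(1) = \mu$, the smallest fixed point is 1 iff $\mu \leq 1$:
- If $\mu \leq 1$, convexity gives $f(t) \geq 1 - \mu(1-t) \geq t$. Equality at some $t < 1$ would force $f(s) = s$ on $[t,1]$. Since $f$ is a power series, that would mean $f(s) = s$ everywhere, contradicting $f(0) > 0$.
- If $\mu > 1$, then $f(t) < t$ for $t$ slightly less than 1. Since $f(0) > 0$, there is a fixed point in $(0,1)$.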
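The identities $\mathbb{E}[Z_{n+1}] = \mu^n$ and $\mathbb{P}[Z_n = 0] = f_n(0)$, and the description of $\mathbb{P}[E]$ as the limit of iterating $f$ from 0, can be checked numerically. The sketch assumes a hypothetical offspring law with $\mathbb{P}[X=0] = \mathbb{P}[X=1] = 1/4$ and $\mathbb{P}[X=2] = 1/2$. Then $\mu = 5/4$ and $f(t) = \frac14 + \frac14 t + \frac12 t^2$, whose fixed points are $1/2$ and $1$. The function names, the seed, the number of runs and the tolerances are illustrative choices.

```python
import random

# Hypothetical offspring law: P[X=0] = 1/4, P[X=1] = 1/4, P[X=2] = 1/2, so mu = 5/4.
PMF = {0: 0.25, 1: 0.25, 2: 0.5}

def pgf(pmf, t):
    """f(t) = E[t^X].  Python evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1."""
    return sum(p * t ** k for k, p in pmf.items())

def prob_extinct_by(pmf, n):
    """P[Z_n = 0] = f_n(0), where f_n is f composed with itself n - 1 times (Z_1 = 1)."""
    t = 0.0
    for _ in range(n - 1):
        t = pgf(pmf, t)
    return t

def extinction_probability(pmf, tol=1e-15, max_iter=100_000):
    """lim_n f_n(0): iterate f from 0 until the increments fall below tol."""
    t = 0.0
    for _ in range(max_iter):
        t_next = pgf(pmf, t)
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    return t

def simulate_mean_sizes(pmf, generations, runs, seed=0):
    """Monte Carlo estimates of E[Z_1], ..., E[Z_generations]."""
    rng = random.Random(seed)
    values, weights = list(pmf), list(pmf.values())
    totals = [0] * generations
    for _ in range(runs):
        z = 1                                                    # Z_1 = 1
        for i in range(generations):
            totals[i] += z                                       # adds Z_{i+1}
            z = sum(rng.choices(values, weights=weights, k=z))   # Z_{i+2}
    return [s / runs for s in totals]

if __name__ == "__main__":
    print(extinction_probability(PMF))           # close to the fixed point 1/2
    print(simulate_mean_sizes(PMF, 6, 20_000))   # roughly 1, 1.25, 1.5625, ...
```

For a subcritical law such as $\mathbb{P}[X=0] = 1/2$, $\mathbb{P}[X=1] = \mathbb{P}[X=2] = 1/4$ (so $\mu = 3/4$), the same iteration converges to 1, in line with the Claim above.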

12. Markov chains

Let the state space $S$ be a countable or finite set. A sequence of $S$-valued random variables $(X_0, X_1, X_2, \ldots)$ is said to be a Markov chain if for all $x \in S$ and $n > 0$
$$\mathbb{P}[X_n = x|X_0, X_1, \ldots, X_{n-1}] = \mathbb{P}[X_n = x|X_{n-1}].$$
A Markov chain is said to be time homogeneous if $\mathbb{P}[X_n = y|X_{n-1} = x]$ does not depend on $n$. In this case it will be useful to study the associated stochastic $S$-indexed matrix $P(x,y) = \mathbb{P}[X_{n+1} = y|X_n = x]$. It is easy to see that $\mathbb{P}[X_{n+m} = y|X_n = x] = P^m(x,y)$, where $P^m$ denotes the usual matrix power. We call $P$ the transition matrix of the Markov chain. In the context of a transition matrix $P$, we will denote by $\mathbb{P}_x$ the measure of the Markov chain for which $\mathbb{P}[X_0 = x] = 1$, and by $\mathbb{E}_x$ the corresponding expectation.

The next claim is needed to formally apply the Markov property.

Claim. Let $(X_0, X_1, \ldots)$ be a time homogeneous Markov chain. Fix some bounded measurable $f\colon S^{\mathbb{N}} \to \mathbb{R}$ and denote $Y_n = f(X_n, X_{n+1}, \ldots)$. Then for any $n, m \in \mathbb{N}$ and $x \in S$ such that $\mathbb{P}[X_n = x] > 0$ and $\mathbb{P}[X_m = x] > 0$ it holds that
$$\mathbb{E}[Y_{n+1}|X_n = x] = \mathbb{E}[Y_{m+1}|X_m = x].$$

Example: let $S = \mathbb{Z}$, let $X_0 = 0$, and let $P(x,y) = \frac{1}{2}1_{\{|x-y| = 1\}}$. This is called the simple random walk on $\mathbb{Z}$. More generally (in some direction), one can consider a graph $G = (S, E)$ with finite positive out-degrees $d(x) = |E \cap (\{x\} \times S)|$ and let
$$P(x,y) = \frac{1_{\{(x,y)\in E\}}}{d(x)}.$$
The lazy random walk on $\mathbb{Z}$ has transition probabilities $P(x,y) = \frac{1}{3}1_{\{|x-y| \leq 1\}}$.

We say that a (time homogeneous) Markov chain is irreducible if for all $x, y \in S$ there exists some $m$ so that $P^m(x,y) > 0$. We say that an irreducible chain is aperiodic if for some (equivalently, every) $x \in S$ it holds that $P^m(x,x) > 0$ for all $m$ large enough.

Exercise. Show that if an irreducible chain is not aperiodic then for every $x \in S$ there is a $k \geq 2$ so that $P^m(x,x) = 0$ for all $m$ not divisible by $k$.

Exercise.
(1) Show that the simple random walk on $\mathbb{Z}$ is irreducible but not aperiodic.

(2) Show that the lazy random walk on $\mathbb{Z}$ is irreducible and aperiodic.
(3) Show that the simple random walk on a directed graph is irreducible iff the graph is strongly connected.
(4) Show that the simple random walk on a connected, undirected graph is aperiodic iff the graph is not bipartite.

We define the hitting time to $x \in S$ by $T_x = \min\{n > 0 : X_n = x\}$. This is a random variable taking values in $\mathbb{N} \cup \{\infty\}$. An irreducible Markov chain is said to be recurrent if $\mathbb{P}[T_x < \infty] = 1$ whenever $\mathbb{P}[X_0 = x] > 0$. A non-recurrent random walk is called transient.

Theorem. Fix an irreducible Markov chain with $\mathbb{P}[X_0 = x] > 0$ for all $x \in S$. Then the following are equivalent.
(i) The Markov chain is recurrent.
(ii) For some (all) $x \in S$ it holds that $\mathbb{P}[X_n = x \text{ i.o.}] = 1$.
(iii) For some (all) $x \in S$ it holds that $\sum_m P^m(x,x) = \infty$.

Proof. Choose any $x \in S$. Since $\mathbb{P}[T_x < \infty] = 1$, and since $\mathbb{P}[X_0 = y] > 0$ for any $y \in S$, we have that $\mathbb{P}[T_x < \infty|X_0 = y] = 1$, or that
$$\mathbb{P}[X_n = x \text{ for some } n > 0|X_0 = y] = 1.$$
By irreducibility every state has a predecessor, so every $y$ can be reached in exactly $m$ steps from some state. Hence $\mathbb{P}[X_m = y] > 0$ for any $m$, and so by the Markov property it follows that
$$\mathbb{P}[X_n = x \text{ for some } n > m|X_m = y] = 1.$$
Summing over $y$ yields that
$$\mathbb{P}[X_n = x \text{ for some } n > m] = 1$$
for every $m$, and so
$$\mathbb{P}[X_n = x \text{ i.o.}] = 1.$$
We have thus shown that (i) implies (ii).

Note that $P^m(x,x) = \mathbb{P}[X_m = x|X_0 = x]$. Now, (ii) implies that $\mathbb{P}[X_n = x \text{ i.o.}|X_0 = x] = 1$ and so, by Borel-Cantelli, (ii) implies (iii).

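Finally, to show that (iii) implies (i), assume that the Markov chain is transient. Then $\mathbb{P}[T_x < \infty] < 1$ for some $x$, and so $\mathbb{P}[T_x < \infty|X_0 = x] < 1$. Otherwise the chain, started from $x$, would return to $x$ infinitely often, and by irreducibility and the Markov property it would then reach $x$ almost surely from every starting state. Denote the latter probability by $p$. Hence, by the Markov property, $p = \mathbb{P}[X_n = x \text{ for some } n > m|X_m = x]$. Therefore, conditioned on $X_0 = x$, the probability that $x$ is visited exactly $k$ more times is $p^k(1-p)$. In particular the expected number of visits is finite, and since this expectation is equal to $\sum_m P^m(x,x)$, the proof is complete.

Exercise. Prove that every irreducible Markov chain over a finite state space is recurrent.

Exercise. Let $P$ be the transition matrix of a Markov chain over $S$, and for $\varepsilon > 0$ let $P_\varepsilon = (1-\varepsilon)P + \varepsilon I$, where $I$ is the identity matrix. Thus $P_\varepsilon$ is the $\varepsilon$-lazified version of $P$. Consider two Markov chains over $S$, both with $X_0 = x$, one with transition matrix $P$ and the other with transition matrix $P_\varepsilon$. Prove that either both are recurrent or both are transient.

Corollary. The simple random walk on $\mathbb{Z}$ is recurrent.

Proof. Note that $\mathbb{P}[X_{2n+1} = 0] = 0$ and that
$$\mathbb{P}[X_{2n} = 0] = 2^{-2n}\binom{2n}{n}.$$
By Stirling's formula $\binom{2n}{n} \approx \frac{4^n}{\sqrt{\pi n}}$, and so
$$\mathbb{P}[X_{2n} = 0] \approx \frac{1}{\sqrt{\pi n}}.$$
Hence
$$\sum_m P^m(0,0) = \sum_n \mathbb{P}[X_{2n} = 0] = \infty,$$
and the claim follows by the theorem above.

Consider now a random walk with a drift on $\mathbb{Z}$. For example, let $P(x,y) = p$ if $y = x+1$ and $P(x,y) = 1-p$ if $y = x-1$, for some $p > 1/2$. In this case, assuming $X_0 = 0$, $X_n = \sum_{k=1}^n Y_k$ where the $Y_k$ are i.i.d. r.v.s with $\mathbb{P}[Y_k = 1] = p$ and $\mathbb{P}[Y_k = -1] = 1-p$. It follows from the strong law of large numbers that a.s. $\lim_n X_n/n = 2p - 1 > 0$, and so in particular $\lim_n X_n = \infty$, and the random walk is transient. The same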

argument holds whenever the transition probabilities correspond to an $L^1$ random variable with non-zero expectation (although we have yet to prove an $L^1$ SLLN).

Exercise. Prove that the simple random walk on $\mathbb{Z}^2$ (given by $P(x,y) = \frac{1}{4}1_{\{|x-y|=1\}}$) is recurrent, but that the simple random walk on $\mathbb{Z}^d$ (given by $P(x,y) = \frac{1}{2d}1_{\{|x-y|=1\}}$) is transient for all $d \geq 3$.
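The periodicity exercises and the Stirling estimate in the recurrence proof are easy to check numerically. The sketch below tests three chains:
- the simple walk on the cycle $\mathbb{Z}/6\mathbb{Z}$, which is bipartite, so returns to 0 happen only at even times;
- the simple walk on $\mathbb{Z}/5\mathbb{Z}$, which is not bipartite;
- the lazy walk on $\mathbb{Z}/6\mathbb{Z}$.

The return probabilities $p_n = 2^{-2n}\binom{2n}{n}$ of the simple walk on $\mathbb{Z}$ are accumulated with the recursion $p_n = p_{n-1}\frac{2n-1}{2n}$. All helper names are hypothetical.

```python
from math import comb, pi, sqrt

def mat_mult(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_power(P, m):
    """P^m by repeated multiplication (fine for small matrices)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(m):
        result = mat_mult(result, P)
    return result

def cycle_walk(n, lazy=False):
    """Simple random walk on Z/nZ (prob. 1/2 to each neighbor), or the lazy
    version (prob. 1/3 to each of x - 1, x, x + 1).  Assumes n >= 3."""
    P = [[0.0] * n for _ in range(n)]
    w = 1 / 3 if lazy else 1 / 2
    for x in range(n):
        P[x][(x + 1) % n] += w
        P[x][(x - 1) % n] += w
        if lazy:
            P[x][x] += w
    return P

def return_probability(n):
    """P[X_{2n} = 0] = binom(2n, n) / 4^n for the simple random walk on Z, X_0 = 0."""
    return comb(2 * n, n) / 4 ** n

def partial_return_sum(N):
    """sum_{n=1}^{N} P[X_{2n} = 0], via p_n = p_{n-1} (2n - 1) / (2n) with p_0 = 1."""
    p, total = 1.0, 0.0
    for n in range(1, N + 1):
        p *= (2 * n - 1) / (2 * n)
        total += p
    return total

if __name__ == "__main__":
    print(return_probability(1000) * sqrt(pi * 1000))           # close to 1 (Stirling)
    print(partial_return_sum(100), partial_return_sum(10_000))  # grows like 2 sqrt(N / pi)
```

The partial sums keep growing without bound, which is the divergence used in the proof of the Corollary.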

13. Martingales

A filtration $\Phi = (\mathcal{F}_1, \mathcal{F}_2, \ldots)$ is a sequence of increasing sigma-algebras $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots$. A natural (and in some sense the only) example is the case that $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$ for some sequence of random variables $(Y_1, Y_2, \ldots)$. A process $(X_1, X_2, \ldots)$ is said to be adapted to $\Phi$ if each $X_n$ is $\mathcal{F}_n$-measurable.

A sequence of real random variables $(X_1, X_2, \ldots)$ that is adapted to $\Phi$ and is in $L^1$ is called a martingale with respect to $\Phi$ if for all $n \geq 1$
$$\mathbb{E}[X_{n+1}|\mathcal{F}_n] = X_n.$$
It is called a supermartingale if $\mathbb{E}[X_{n+1}|\mathcal{F}_n] \leq X_n$.

Note that if $(X_1, X_2, \ldots)$ is a martingale then $\mathbb{E}[X_n] = \mathbb{E}[X_1]$. Subtract the constant $\mathbb{E}[X_1]$ from all the $X_n$s and set $X_0 = 0$, with $\mathcal{F}_0$ trivial. This gives a martingale $(X_0, X_1, \ldots)$ with $X_0 = 0$. A similar statement holds for supermartingales.

As a first example, let $W_n$ be i.i.d. r.v.s with $\mathbb{P}[W_n = +1] = \mathbb{P}[W_n = -1] = 1/2$, let $X_n = \sum_{k=1}^n W_k$, and let $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. Then $X_n$ is the amount of money made in fair bets (or the location of a simple random walk on $\mathbb{Z}$) and is a martingale with respect to $(\mathcal{F}_1, \mathcal{F}_2, \ldots)$. If we set $\mathbb{P}[W_n = +1] = 1/2 - \varepsilon$ and $\mathbb{P}[W_n = -1] = 1/2 + \varepsilon$ for some $\varepsilon > 0$ then $X_n$ is a supermartingale.

As a second example we introduce Pólya's urn. Consider an urn in which there are initially a single black ball and a single white ball. In each time period we reach in, pull out a ball uniformly at random, and then put back two balls of the same color. Formally, let $(Y_1, Y_2, \ldots)$ be i.i.d. random variables distributed uniformly over $[0,1]$, and let the number of black balls at time $n$ be $B_n$, given by $B_1 = 1$ and
$$B_{n+1} = B_n + 1_{\{Y_n < B_n/(n+1)\}}.$$
Denote by $R_n = B_n/(n+1)$ the fraction of black balls. Then
$$\mathbb{E}[R_{n+1}|B_1, \ldots, B_n] = \mathbb{E}[R_{n+1}|B_n],$$

since the process $(B_1, B_2, \ldots)$ is a Markov chain. Furthermore
$$\mathbb{E}[R_{n+1}|B_n] = \frac{\mathbb{E}[B_{n+1}|B_n]}{n+2} = \frac{B_n + \frac{B_n}{n+1}}{n+2} = \frac{B_n}{n+1} = R_n,$$
and so $R_n$ is a martingale with respect to $\mathcal{F}_n = \sigma(B_1, \ldots, B_n)$.

Theorem 13.1. $R_n$ converges pointwise: there is a random variable $R$ such that $\mathbb{P}[\lim_n R_n = R] = 1$.

To prove this theorem we will prove a much more general theorem.

Theorem 13.2 (Martingale Convergence in $L^2$). Let $\Phi = (\mathcal{F}_1, \mathcal{F}_2, \ldots)$ be a filtration, and let $(X_1, X_2, \ldots)$ be a martingale w.r.t. $\Phi$. Furthermore, assume that there exists a $K$ such that $\mathbb{E}[X_n^2] < K$ for all $n$. Then there exists a random variable $X \in L^2$ such that $\mathbb{E}[(X_n - X)^2] \to 0$.

Proof. Set $X_0 = 0$, and for $n \geq 1$ let $Y_n = X_n - X_{n-1}$. For $n \geq 2$, since $X_{n-1} = \mathbb{E}[X_n|\mathcal{F}_{n-1}]$, we have that $Y_n$ is orthogonal to any $\mathcal{F}_{n-1}$-measurable r.v. in $L^2$, and in particular is orthogonal to $Y_m$ for any $m < n$. Now, $X_n = \sum_{k=1}^n Y_k$, and so by the orthogonality of the $Y_k$s it follows that
$$\sum_{k=1}^n \mathbb{E}[Y_k^2] = \mathbb{E}[X_n^2] < K.$$
Thus $\sum_k \mathbb{E}[Y_k^2] \leq K$. Since $\mathbb{E}[(X_m - X_n)^2] = \sum_{k=n+1}^m \mathbb{E}[Y_k^2]$ for $m > n$, $(X_n)$ is a Cauchy sequence in $L^2$. Therefore, since $L^2$ is complete (Theorem 7.6), there exists some $X \in L^2$ such that $\mathbb{E}[(X_n - X)^2] \to 0$.

This theorem still does not imply pointwise convergence, which we would need to prove Theorem 13.1.

Theorem 13.3 (Martingale Pointwise Convergence). Let $\Phi = (\mathcal{F}_1, \mathcal{F}_2, \ldots)$ be a filtration, and let $(X_1, X_2, \ldots)$ be a supermartingale w.r.t. $\Phi$. Furthermore, assume that there exists a $K$ such that $\mathbb{E}[|X_n|] < K$ for all

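$n$. Then there exists a random variable $X \in L^1$ such that almost surely $\lim_n X_n = X$.

Before proving this theorem we will need the following lemmas.

Lemma 13.4. Let $(X_0, X_1, X_2, \ldots)$ be a supermartingale w.r.t. $\Phi = (\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \ldots)$ with $X_0 = 0$, let $B_n$ be $\{0,1\}$-valued random variables adapted to $\Phi$, and let
$$Y_n = \sum_{k=1}^n B_{k-1}(X_k - X_{k-1}).$$
Then $Y_n$ is a supermartingale and $\mathbb{E}[Y_n] \leq 0$.

The idea behind this lemma is the following. Imagine that you are gambling at a casino with non-positive expected winnings from every gamble. Say that you have some system for deciding when to gamble and when to stay out (i.e., the $B_n$s). Then you still cannot expect to make money.

Proof.
$$\begin{aligned}\mathbb{E}[Y_{n+1}|\mathcal{F}_n] &= \mathbb{E}\left[\sum_{k=1}^{n+1}B_{k-1}(X_k - X_{k-1})\,\middle|\,\mathcal{F}_n\right] = \mathbb{E}[Y_n + B_n(X_{n+1} - X_n)|\mathcal{F}_n]\\ &= Y_n + B_n\,\mathbb{E}[X_{n+1} - X_n|\mathcal{F}_n] = Y_n + B_n\left(\mathbb{E}[X_{n+1}|\mathcal{F}_n] - X_n\right) \leq Y_n.\end{aligned}$$
Thus $Y_n$ is a supermartingale, and by induction $\mathbb{E}[Y_n] \leq \mathbb{E}[Y_0] = 0$.

Lemma 13.5. Let $(X_0, X_1, X_2, \ldots)$ be a supermartingale w.r.t. $\Phi = (\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \ldots)$ with $X_0 = 0$. Fix some $a < b$, and let $B_n$ be defined as follows: $B_0 = 0$, and $B_{n+1}$ is the indicator of the union of the events
(1) $B_n = 1$ and $X_n \leq b$.
(2) $B_n = 0$ and $X_n < a$.
Let $U_n^{a,b}$ be the number of $k \leq n$ such that $B_k = 0$ and $B_{k-1} = 1$. Then
$$\mathbb{E}\left[U_n^{a,b}\right] \leq \frac{\mathbb{E}[(X_n - a)^-]}{b - a}.$$

Proof. Let $Y_n = \sum_{k=1}^n B_k(X_k - X_{k-1})$. Since $B_k$ is $\mathcal{F}_{k-1}$-measurable, Lemma 13.4 applies to the adapted sequence $B'_n = B_{n+1}$. By picture, it is clear that for all $n$ it holds that
$$Y_n \geq (b - a)\,U_n^{a,b} - (X_n - a)^-.$$
Each completed stretch of betting starts below $a$ and ends above $b$, and so gains more than $b - a$. A stretch still in progress at time $n$ started below $a$, and so loses at most $(X_n - a)^-$. By Lemma 13.4 we have that $\mathbb{E}[Y_n] \leq 0$, and so the claim follows by taking expectations.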
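The inequality $Y_n \geq (b-a)U_n^{a,b} - (X_n - a)^-$ from the proof of Lemma 13.5 holds for every path, not just on average. So it can be checked directly along simulated paths. The sketch follows the betting scheme exactly as defined in the lemma and runs it on simple random walk paths started at $X_0 = 0$. The levels $a = -2$, $b = 2$ and the function names are hypothetical choices.

```python
import random

def upcrossing_strategy(path, a, b):
    """Run the betting scheme of Lemma 13.5 along a path X_0, X_1, ..., X_N.

    B_k is decided from B_{k-1} and X_{k-1}, i.e. before the k-th step.  Returns a
    list of triples (Y_n, U_n, (X_n - a)^-) for n = 0, ..., N, where
    Y_n = sum_{k<=n} B_k (X_k - X_{k-1}) and U_n = #{k <= n : B_{k-1} = 1, B_k = 0}.
    """
    b_prev = 0                     # B_0 = 0
    y, u = 0, 0
    out = [(y, u, max(a - path[0], 0))]
    for k in range(1, len(path)):
        x_prev = path[k - 1]
        b_k = 1 if (b_prev == 1 and x_prev <= b) or (b_prev == 0 and x_prev < a) else 0
        y += b_k * (path[k] - x_prev)
        if b_prev == 1 and b_k == 0:
            u += 1                 # a betting stretch just ended because X_{k-1} > b
        out.append((y, u, max(a - path[k], 0)))
        b_prev = b_k
    return out

def random_walk_path(steps, rng):
    path = [0]
    for _ in range(steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

if __name__ == "__main__":
    rng = random.Random(0)
    path = random_walk_path(1000, rng)
    y, u, neg = upcrossing_strategy(path, -2, 2)[-1]
    print(y, u, neg, y >= 4 * u - neg)
```

Taking expectations of the checked inequality, and using $\mathbb{E}[Y_n] \leq 0$ from Lemma 13.4, gives the bound on $\mathbb{E}[U_n^{a,b}]$.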

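Proof of Theorem 13.3. For a given $a < b$, let $U^{a,b} = \lim_n U_n^{a,b}$. The limit exists since this is a monotone increasing sequence, and it also follows that
$$\mathbb{E}\left[U^{a,b}\right] = \lim_n \mathbb{E}\left[U_n^{a,b}\right] \leq \limsup_n \frac{\mathbb{E}[(X_n - a)^-]}{b - a} \leq \frac{|a| + K}{b - a} < \infty.$$
Thus $\mathbb{P}[U^{a,b} < \infty] = 1$. It follows that the event $\{\liminf_n X_n < a \text{ and } \limsup_n X_n > b\}$ has probability zero, since on it there would be infinitely many upcrossings of $[a,b]$. Applying this to all pairs of rationals $a < b$ (a countable dense set) we get that with probability zero $\limsup_n X_n > \liminf_n X_n$, and so $\limsup_n X_n = \liminf_n X_n$ almost surely. Call the common limit $X$. By Fatou's Lemma (Lemma 14.3 below), $\mathbb{E}[|X|] \leq \liminf_n \mathbb{E}[|X_n|] \leq K$, so $X$ is almost surely finite and in $L^1$.

Theorem 13.1 is now a direct consequence, since $R_n \in [0,1]$.

Exercise. Let $R_n$ be the fraction of black balls in Pólya's urn. Show that $\lim_n R_n$ is distributed uniformly on $(0,1)$. Hint: calculate the distribution of $R_n$.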
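The martingale computation for Pólya's urn can be verified exactly with rational arithmetic. The sketch below propagates the exact law of $B_n$ using the transition rule $\mathbb{P}[B_{n+1} = B_n + 1 \mid B_n] = B_n/(n+1)$. It then checks $\mathbb{E}[R_{n+1} \mid B_n] = R_n$ state by state, and $\mathbb{E}[R_n] = 1/2$ for each $n$. The helper names are hypothetical. The sketch deliberately stops short of the exercise above.

```python
from fractions import Fraction

def polya_law(n):
    """Exact law of B_n (black balls at time n, out of n + 1 balls), with B_1 = 1."""
    law = {1: Fraction(1)}
    for m in range(1, n):                      # move from time m to time m + 1
        new = {}
        for b, p in law.items():
            black = Fraction(b, m + 1)         # P[draw a black ball | B_m = b]
            new[b + 1] = new.get(b + 1, 0) + p * black
            new[b] = new.get(b, 0) + p * (1 - black)
        law = new
    return law

def conditional_next_fraction(b, m):
    """E[R_{m+1} | B_m = b], computed directly from the transition rule."""
    black = Fraction(b, m + 1)
    return (black * (b + 1) + (1 - black) * b) / (m + 2)

def expected_fraction(n):
    """E[R_n] = E[B_n] / (n + 1)."""
    return sum(b * p for b, p in polya_law(n).items()) / (n + 1)

if __name__ == "__main__":
    print([expected_fraction(n) for n in range(1, 6)])   # each entry equals 1/2
```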

14. Stopping times

Let $\Phi = (\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \ldots)$ be a filtration, and let $(X_0, X_1, X_2, \ldots)$ be a supermartingale w.r.t. $\Phi$. Denote $\mathcal{F}_\infty = \sigma(\bigcup_n \mathcal{F}_n)$. A random variable $T$ taking values in $\mathbb{N} \cup \{\infty\}$ is called a stopping time if for all $n$ the event $\{T \leq n\}$ is in $\mathcal{F}_n$.

Example: $(X_1, X_2, \ldots)$ is a Markov chain over the state space $S$, and $T_x$ is the hitting time to $x \in S$ given by $T_x = \min\{n > 0 : X_n = x\}$.

Example: $(X_1, X_2, \ldots)$ is the simple random walk on $\mathbb{Z}$, and $T$ is the first $n \geq 3$ such that $X_n < X_{n-1} < X_{n-2}$.

Given a stopping time $T$, we define the stopped process
$$(X_1^T, X_2^T, \ldots) = (X_1, X_2, \ldots, X_{T-1}, X_T, X_T, \ldots).$$
That is, $X_n^T = X_n$ if $n \leq T$, and $X_n^T = X_T$ if $n \geq T$. Equivalently, $X_n^T = X_{\min\{T,n\}}$. Intuitively, the stopped process corresponds to a gambler's bank account, when the gambler decides to stop at time $T$.

Theorem 14.1. If $(X_0, X_1, X_2, \ldots)$ is a (super)martingale with $X_0 = 0$ then $(X_0^T, X_1^T, X_2^T, \ldots)$ is a (super)martingale.

Proof. We prove the case of supermartingales; the proof for martingales is identical. Let $B_n = 1_{\{T > n\}}$. This is $\mathcal{F}_n$-measurable, since $\{T > n\}$ is the complement of $\{T \leq n\}$. Let $Y_n = \sum_{k=1}^n B_{k-1}(X_k - X_{k-1})$. Then by Lemma 13.4 we have that $Y_n$ is a supermartingale. But $Y_n = \sum_{k=1}^{\min\{T,n\}}(X_k - X_{k-1}) = X_n^T$.

So the gambler's bank account is still a (super)martingale, no matter what the stopping time is, and in particular $\mathbb{E}[X_n^T] \leq 0$ (with equality for martingales). However, consider a simple random walk on $\mathbb{Z}$, with stopping time $T_1$. That is, the gambler stops once she has earned a dollar. Then clearly $\mathbb{E}[X_{T_1}] = 1$ (note that $T_1 < \infty$ almost surely, by recurrence). The following theorem gives conditions under which $\mathbb{E}[X_T] \leq \mathbb{E}[X_0]$.

Theorem 14.2 (Doob's Optional Stopping Theorem). Let $(X_0, X_1, \ldots)$ be a supermartingale, and let $T$ be a stopping time. Assume one of the following holds:
(1) $\exists N$ s.t. $\mathbb{P}[T \leq N] = 1$.
(2) $\exists K$ s.t. $\mathbb{P}[|X_n| \leq K \text{ for all } n] = 1$, and $\mathbb{P}[T < \infty] = 1$.
(3) $\mathbb{E}[T] < \infty$ and $\exists K$ s.t. $\mathbb{P}[|X_{n+1} - X_n| \leq K \text{ for all } n] = 1$.
(4) $\mathbb{P}[T < \infty] = 1$ and $X_n$ is non-negative.
Then $\mathbb{E}[X_T] \leq \mathbb{E}[X_0]$, with equality if $(X_0, X_1, \ldots)$ is a martingale and one of (1)-(3) holds.

To prove this theorem we will need the following important lemma.

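Lemma 14.3 (Fatou's Lemma). Let $(Z_1, Z_2, \ldots)$ be a sequence of non-negative real random variables. Then
$$\mathbb{E}\left[\liminf_n Z_n\right] \leq \liminf_n \mathbb{E}[Z_n].$$
Recall from the Galton-Watson example that this may indeed be a strict inequality: when $\mu = 1$, $Z_n \to 0$ almost surely but $\mathbb{E}[Z_n] = 1$.

Exercise. Prove Fatou's Lemma. Hint: use the Monotone Convergence Theorem.

Proof of Theorem 14.2. We prove that $\mathbb{E}[X_T] \leq \mathbb{E}[X_0]$. The equality for martingales under (1)-(3) follows by applying this to the supermartingale $(-X_n)$. Note that $\mathbb{E}[X_n^T] \leq \mathbb{E}[X_0]$, by Theorem 14.1 applied to the supermartingale $(X_n - X_0)$. Also $\lim_n X_n^T = X_T$, since $\mathbb{P}[T < \infty] = 1$ under all conditions.
(1) $X_T = X_N^T$.
(2) By the Bounded Convergence Theorem $\mathbb{E}[X_T] = \lim_n \mathbb{E}[X_n^T] \leq \mathbb{E}[X_0]$.
(3) $$\left|X_n^T - X_0\right| = \left|\sum_{k=1}^{\min\{T,n\}}(X_k - X_{k-1})\right| \leq KT.$$ Hence by the Dominated Convergence Theorem (with dominating variable $|X_0| + KT \in L^1$) $\mathbb{E}[X_T] = \lim_n \mathbb{E}[X_n^T] \leq \mathbb{E}[X_0]$.
(4) By Fatou's Lemma, $\mathbb{E}[X_T] \leq \liminf_n \mathbb{E}[X_n^T] \leq \mathbb{E}[X_0]$.

Corollary. Let $T_1$ be the hitting time to 1 of the simple random walk on $\mathbb{Z}$ started at 0. Then $\mathbb{E}[T_1] = \infty$. Indeed, otherwise condition (3) (with $K = 1$) would give $\mathbb{E}[X_{T_1}] = 0$, whereas $X_{T_1} = 1$.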
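The simple random walk stopped at $T_1$ illustrates both the stopped-process theorem and why the conditions of Theorem 14.2 matter. For every $n$, $\mathbb{E}[X_{\min\{T_1,n\}}] = 0$, even though $X_{T_1} = 1$; meanwhile $\mathbb{E}[\min\{T_1,n\}]$ keeps growing. The exact sketch below tracks the law of the stopped walk with rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction

def stopped_walk(n):
    """Simple random walk on Z from X_0 = 0, stopped at T_1 = min{k > 0 : X_k = 1}.

    Returns (law of X_{min(T_1, n)} as a dict, P[T_1 <= n], E[min(T_1, n)]), all exact.
    """
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}       # alive[x] = P[X_k = x, T_1 > k]
    stopped = Fraction(0)          # P[T_1 <= k]
    expected_min = Fraction(0)     # E[min(T_1, k)] = sum_{j<k} P[T_1 > j]
    for _ in range(n):
        expected_min += sum(alive.values())
        new = {}
        for x, p in alive.items():
            for y in (x - 1, x + 1):
                if y == 1:
                    stopped += p * half
                else:
                    new[y] = new.get(y, 0) + p * half
        alive = new
    law = dict(alive)
    law[1] = law.get(1, 0) + stopped
    return law, stopped, expected_min

if __name__ == "__main__":
    for n in (10, 100):
        law, p_stop, e_min = stopped_walk(n)
        print(n, sum(x * p for x, p in law.items()), float(p_stop), float(e_min))
```

In the output the mean of the stopped walk is exactly 0 while $\mathbb{P}[T_1 \leq n]$ approaches 1. This is why condition (3) must fail, that is, $\mathbb{E}[T_1] = \infty$.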

15. Harmonic and superharmonic functions

Let $(X_0, X_1, \ldots)$ be a Markov chain over the state space $S$ with transition matrix $P$. We say that a function $f\colon S \to \mathbb{R}$ is $P$-harmonic if $Pf = f$. Here $Pf\colon S \to \mathbb{R}$ is given by
$$[Pf](x) = \sum_{y\in S}P(x,y)f(y).$$
We say that $f$ is $P$-superharmonic if $[Pf](x) \leq f(x)$ for all $x \in S$.

Claim 15.1. Assume that for all $x$ there exists an $n$ such that $\mathbb{P}[X_n = x] > 0$. Let $Z_n = f(X_n)$. Then $Z_n$ is a (super)martingale iff $f$ is (super)harmonic.

Proof. We prove the (super) case:
$$\mathbb{E}[f(X_{n+1})|X_0, \ldots, X_n] = \mathbb{E}[f(X_{n+1})|X_n] = \sum_{y\in S}P(X_n, y)f(y),$$
and, since every $x$ is visited with positive probability, this is at most $f(X_n)$ for all $n$ iff $f$ is superharmonic.

Theorem. Let $P$ be irreducible. Then the following are equivalent.
(i) Every Markov chain with transition matrix $P$ is recurrent.
(ii) Some Markov chain with transition matrix $P$ is recurrent.
(iii) Every non-negative $P$-superharmonic function is constant.

Proof. The equivalence of (i) and (ii) follows easily from the recurrence theorem for Markov chains. To see that (i) implies (iii), let $T_y$ be the hitting time to $y$, and note that $\mathbb{P}_x[T_y < \infty] = 1$, by recurrence. Let $f$ be a non-negative superharmonic function, and let $Z_n = f(X_n)$, which is a supermartingale under $\mathbb{P}_x$. Then we can apply the Optional Stopping Theorem (condition (4)) to $Z_n$ and $T_y$ to get that
$$\mathbb{E}_x\left[Z_{T_y}\right] \leq \mathbb{E}_x[Z_0].$$
The l.h.s. is equal to $f(y)$ and the r.h.s. is equal to $f(x)$. Hence $f(y) \leq f(x)$ for all $x, y \in S$, and so $f$ is constant.

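Assume (iii), and note that
$$\begin{aligned}\mathbb{P}_x[T_y < \infty] &= \mathbb{P}_x[X_1 = y, T_y < \infty] + \mathbb{P}_x[X_1 \neq y, T_y < \infty]\\ &= \mathbb{P}_x[X_1 = y] + \sum_{z\neq y}\mathbb{P}_x[X_1 = z, T_y < \infty]\\ &= \mathbb{P}_x[X_1 = y] + \sum_{z\neq y}\mathbb{P}_x[T_y < \infty|X_1 = z]\,\mathbb{P}_x[X_1 = z]\\ &= P(x,y) + \sum_{z\neq y}P(x,z)\,\mathbb{P}_z[T_y < \infty]\\ &\geq \sum_z P(x,z)\,\mathbb{P}_z[T_y < \infty].\end{aligned}$$
Hence $f(x) = \mathbb{P}_x[T_y < \infty]$ is superharmonic, and thus constant by assumption. Say $f \equiv p$. By irreducibility $p > 0$. Hence, by the Markov property, for every $N$ the expected number of visits to $y$ at times $> N$ is at least $p$, and so the expected number of visits to $y$ is infinite. Taking $X_0 = y$, this gives $\sum_m P^m(y,y) = \infty$, and thus the random walk is recurrent.

The following claim is a direct consequence of Claim 15.1 and the Martingale Convergence Theorem.

Claim. Let $f\colon S \to \mathbb{R}$ be bounded and superharmonic. Then $Z_n = f(X_n)$ is a bounded supermartingale and therefore converges almost surely to $Z_\infty := \lim_n Z_n$.

Recall that $\mathcal{T}_n = \sigma(X_n, X_{n+1}, \ldots)$ and that $\mathcal{T} = \bigcap_n \mathcal{T}_n$ is the tail sigma-algebra. We think of our probability space as being $(\Omega, \mathcal{F}, \mathbb{P})$ with $\Omega = S^{\mathbb{N}}$ and $\mathcal{F}$ the Borel sigma-algebra of the product of the discrete topologies. Then $A \in \mathcal{T}_n$ iff $A$ is of the form $S^n \times B$ for some measurable $B$. Equivalently, a measurable $A$ is in $\mathcal{T}_n$ iff for every $(x_0, x_1, \ldots) \in A$ and $(y_0, \ldots, y_{n-1}) \in S^n$ it holds that $(y_0, \ldots, y_{n-1}, x_n, x_{n+1}, \ldots) \in A$.

Another important sigma-algebra is the shift-invariant sigma-algebra $\mathcal{I}$. To define it, let $\varphi\colon S^{\mathbb{N}} \to S^{\mathbb{N}}$ be the shift map given by $\varphi(x_0, x_1, x_2, \ldots) = (x_1, x_2, \ldots)$. Then $\mathcal{I}$ is the collection of measurable subsets $A$ of $S^{\mathbb{N}}$ with $\varphi^{-1}(A) = A$. That is, $A \in \mathcal{I}$ iff for every $(x_0, x_1, x_2, \ldots)$ it holds that $(x_0, x_1, x_2, \ldots) \in A$ iff $(x_1, x_2, \ldots) \in A$.

Exercise. Find an irreducible Markov chain on the state space $\mathbb{N}$ that has a random variable that is $\mathcal{T}$-measurable but not $\mathcal{I}$-measurable.

Claim. $Z_\infty$ is $\mathcal{I}$-measurable and $\mathcal{T}$-measurable.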

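Proof. Fix some $z \in \mathbb{R}$, and let $A = \{Z_\infty \leq z\}$. To show that $Z_\infty$ is $\mathcal{I}$-measurable it suffices to show that $A \in \mathcal{I}$. Now $(x_0, x_1, \ldots) \in A$ iff $\lim_n f(x_n) \leq z$ iff $\lim_n f(x_{n+1}) \leq z$ iff $\varphi(x_0, x_1, \ldots) \in A$. Thus $A \in \mathcal{I}$. $Z_\infty$ is clearly $\mathcal{T}_n$-measurable for every $n$, and therefore is also $\mathcal{T}$-measurable. Since $Z_\infty$ is bounded we have that $Z_\infty \in L^\infty(\mathcal{I})$.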
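Consider again the random walk with drift on $\mathbb{Z}$: $P(x,x+1) = p$ and $P(x,x-1) = 1-p$ with $p > 1/2$. The function $f(x) = ((1-p)/p)^x$ is positive, non-constant and $P$-harmonic, since $p\,r^{x+1} + (1-p)\,r^{x-1} = r^x\left((1-p) + p\right)$ for $r = (1-p)/p$. By criterion (iii) of the theorem above, the walk therefore cannot be recurrent, in agreement with the law-of-large-numbers argument for transience. The sketch below is a numerical check of harmonicity; the helper names are hypothetical.

```python
def drift_harmonic(p):
    """f(x) = ((1 - p) / p) ** x, a positive harmonic function for the drifted walk."""
    r = (1 - p) / p
    return lambda x: r ** x

def apply_P(f, p, x):
    """[P f](x) = p f(x + 1) + (1 - p) f(x - 1) for P(x, x+1) = p, P(x, x-1) = 1 - p."""
    return p * f(x + 1) + (1 - p) * f(x - 1)

if __name__ == "__main__":
    p = 0.7
    f = drift_harmonic(p)
    for x in range(-3, 4):
        print(x, f(x), apply_P(f, p, x))   # the two columns agree up to rounding
```

For $p = 1/2$ the same formula gives the constant function 1. That is consistent with the recurrence of the simple random walk.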

16. The Choquet-Deny Theorem

As motivation, consider the simple random walk $(X_1, X_2, \ldots)$ on $\mathbb{Z}^3$. Let $P_n = X_n/|X_n|$ be the projection of $X_n$ to the unit sphere (and set $P_n = 0$ whenever $X_n = 0$). Since this random walk is transient, it is easy to deduce that $\lim_n |X_n| = \infty$. It follows that $\lim_n |P_{n+1} - P_n| = 0$; that is, the projection moves more and more slowly. A natural question is: does $P_n$ converge?

Assume that $(Z_0, Z_1, \ldots)$, with $Z_n = f(X_n)$ for a bounded function $f$, is a martingale (that is, $f$ is harmonic). Then by the martingale and Markov properties we have that $Z_n = \mathbb{E}[Z_\infty|X_n]$, and so $f(x) = \mathbb{E}[Z_\infty|X_n = x]$ whenever $\mathbb{P}[X_n = x] > 0$. Conversely, choose any $W \in L^\infty(\mathcal{T})$, and define a function $f(x) = \mathbb{E}[W|X_n = x]$ for some $n$ s.t. $\mathbb{P}[X_n = x] > 0$. We claim that $f$ is well defined, since for any such $n$
$$f(x) = \mathbb{E}[W|X_n = x] = \mathbb{E}_x[W].$$
It is easy to verify that $f$ is harmonic. Denote by $h^\infty(S,P) \subseteq \ell^\infty(S)$ the bounded $P$-harmonic functions. Then the map $\Phi\colon L^\infty(\mathcal{T}) \to h^\infty(S,P)$ given by $\Phi\colon W \mapsto f$ is a linear isometry. Its inverse is the map $\Phi^{-1}\colon f \mapsto \lim_n f(X_n)$.

Theorem 16.1 (Choquet-Deny Theorem). Let $(Y_1, Y_2, \ldots)$ be i.i.d. random variables taking values in some countable abelian group $G$. Let $X_n = \sum_{k=1}^n Y_k$. Then $(X_1, X_2, \ldots)$ is a time homogeneous Markov chain over the state space $G$. If $(X_1, X_2, \ldots)$ is also irreducible then every $W \in L^\infty(\mathcal{T})$ is constant.

For the proof of this theorem we will need an important classical result from convex analysis.

Theorem 16.2 (Krein-Milman Theorem). Let $X$ be a Hausdorff locally convex topological vector space, and let $C$ be a compact convex subset of $X$. A point $x \in C$ is extreme if whenever $x$ is equal to a non-trivial convex combination $\alpha y + (1-\alpha)z$ of $y, z \in C$ (with $0 < \alpha < 1$) then $y = z$. Then every $x \in C$ can be written as the limit of convex combinations of extreme points in $C$.

Proof of Theorem 16.1. Denote by $P$ the transition matrix of $(X_1, X_2, \ldots)$, and let $\mu(g) = \mathbb{P}[Y_n = g]$. Then $P(g,k) = \mu(k - g)$. Thus, if $f$ is $P$-harmonic then
$$f(g) = \sum_{k\in G}f(k)P(g,k) = \sum_{k\in G}f(k)\mu(k - g) = \sum_{k\in G}f(g + k)\mu(k).$$
Let $H = h^{[0,1]}(G,P)$ be the set of all $P$-harmonic functions with range in $[0,1]$. We note that harmonicity is invariant to multiplication by a constant and to addition of a constant. Hence, if we show that every $f \in H$

is constant, then we have shown that every $f \in h^\infty(G,P)$ is constant. It then follows that every $W \in L^\infty(\mathcal{T})$ is constant, by the fact that $\Phi$ is an isometry.

We state three properties of $H$ that are easy to verify.
(1) $H$ is invariant to the $G$ action: for any $f \in H$ and $g \in G$, the function $f^g\colon G \to \mathbb{R}$ given by $[f^g](k) = f(k - g)$ is also in $H$.
(2) $H$ is compact in the topology of pointwise convergence.
(3) $H$ is convex.

As a convex compact space, $H$ is the closed convex hull of its extreme points; this is the Krein-Milman Theorem. Thus $H$ has extreme points. Let $f \in H$ be an extreme point. Then, since $f$ is harmonic,
$$f(g) = \sum_{k\in G}f(g + k)\mu(k) = \sum_{k\in G}f^{-k}(g)\mu(k).$$
By the first property of $H$ each $f^{-k}$ is also in $H$, and thus we have written $f$ as a convex combination of functions in $H$. But $f$ is extreme, and so $f = f^{-k}$ for all $k$ in the support of $\mu$. (For infinite sums, write $f = \mu(k)f^{-k} + (1 - \mu(k))h$ with $h \in H$.) But since the Markov chain is irreducible, the support of $\mu$ generates $G$. Hence $f$ is invariant to the $G$-action, and therefore constant. Since every element of $H$ is a limit of convex combinations of extreme points, which are constant, every $f \in H$ is constant.

An immediate corollary of the Choquet-Deny Theorem is that every event in $\mathcal{T}$ has probability either 0 or 1.

As an application, consider the question on the simple random walk on $\mathbb{Z}^3$. We would like to show that $P_n$ does not converge pointwise. Note that the event that $P_n$ converges is a shift-invariant event, and therefore has measure in $\{0,1\}$. Assume by contradiction that it has measure 1, and let $P_\infty = \lim_n P_n$. For each Borel subset $B$ of the sphere, the event that $P_\infty \in B$ is shift-invariant, and therefore has measure in $\{0,1\}$. For each $k \in \mathbb{N}$, disjointly partition the sphere into finitely many Borel sets with diameter at most $1/k$. Then $\mathbb{P}[P_\infty \in B] = 1$ for exactly one of these sets, which we call $B_k$. The intersection $\bigcap_k B_k$ has probability 1 and diameter 0, so it is a singleton containing some point $b$ of the sphere. We have thus shown that $P_\infty$ is equal to $b$, almost surely. Note that so far we have not used the fact that the random walk is simple. Finally, because the random walk is simple, by the symmetry of the problem it must hold that such a point $b$ is invariant to reflection about the $x$-$y$, $y$-$z$ and $x$-$z$ planes, which is impossible for a point on the sphere.

Claim 16.3. Every event in $\mathcal{T}$ has measure either 0 or 1.
Exercise. Derive Kolmogorov's zero-one law from Claim 16.3.

17. Characteristic functions and the Central Limit Theorem

Let $X$ be a real random variable. The characteristic function $\varphi_X\colon \mathbb{R} \to \mathbb{C}$ of $X$ is given by
$$\varphi_X(t) = \mathbb{E}\left[e^{itX}\right] = \mathbb{E}[\cos(tX)] + i\,\mathbb{E}[\sin(tX)].$$
This expectation exists for any real random variable $X$ and any real $t$, since the sine and cosine functions are bounded. Note that
$$\varphi_{aX+b}(t) = \mathbb{E}\left[e^{it(aX+b)}\right] = \mathbb{E}\left[e^{itaX}e^{itb}\right] = \varphi_X(at)\,e^{itb}.$$

Exercise. $\varphi_X$ is continuous, and is differentiable $n$ times if $X \in L^n$. In this case $\varphi_X^{(n)}(0) = i^n\,\mathbb{E}[X^n]$.

If $X$ and $Y$ are independent, then
$$\varphi_{X+Y}(t) = \mathbb{E}\left[e^{it(X+Y)}\right] = \mathbb{E}\left[e^{itX}\right]\mathbb{E}\left[e^{itY}\right] = \varphi_X(t)\,\varphi_Y(t).$$

A real random variable $X$ is said to have a probability density function (or p.d.f.) $f_X\colon \mathbb{R} \to \mathbb{R}$ if for any measurable $h\colon \mathbb{R} \to \mathbb{R}$ it holds that
$$\mathbb{E}[h(X)] = \int_{-\infty}^{\infty}h(x)f_X(x)\,dx$$
whenever the l.h.s. exists. In this case
$$\varphi_X(t) = \int_{-\infty}^{\infty}e^{itx}f_X(x)\,dx,$$
so that $\varphi_X$ is the Fourier transform of $f_X$.

Let $X$ be a real random variable. Recall that the Cumulative Distribution Function (or c.d.f.) $F\colon \mathbb{R} \to [0,1]$ is given by $F(x) = \mathbb{P}[X \leq x]$. We saw in Example 3.5 that $F$ uniquely determines the distribution of $X$.

Theorem 17.2 (Lévy's Inversion Formula). Let $X$ be a real random variable. For every $b > a$ such that $\mathbb{P}[X = a] = \mathbb{P}[X = b] = 0$ it holds that
$$F(b) - F(a) = \lim_{T\to\infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita} - e^{-itb}}{it}\,\varphi_X(t)\,dt.$$
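Lévy's formula can be checked numerically for the standard Gaussian, whose characteristic function $e^{-t^2/2}$ is computed in the paragraph that follows. The sketch assumes a truncation level $T = 20$, beyond which $e^{-t^2/2}$ is negligible. It uses a midpoint rule with an even number of steps, so the removable singularity at $t = 0$ is never evaluated. The exact c.d.f. is written with `math.erf`. The function names and numerical parameters are illustrative.

```python
import cmath
import math

def gaussian_cf(t):
    return math.exp(-t * t / 2)

def gaussian_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def levy_inversion(cf, a, b, T=20.0, steps=40_000):
    """(1 / (2 pi)) * integral_{-T}^{T} (e^{-ita} - e^{-itb}) / (it) * cf(t) dt, midpoint rule.

    With an even number of steps the midpoints avoid t = 0.  The imaginary part
    integrates to zero for a real characteristic function, so the real part is returned.
    """
    h = 2 * T / steps
    total = 0j
    for j in range(steps):
        t = -T + (j + 0.5) * h
        total += (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t) * cf(t)
    return (total * h / (2 * math.pi)).real

if __name__ == "__main__":
    print(levy_inversion(gaussian_cf, -1.0, 1.0), gaussian_cdf(1.0) - gaussian_cdf(-1.0))
```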

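Since there are at most countably many $c \in \mathbb{R}$ such that $\mathbb{P}[X = c] > 0$, $F$ is determined by $\varphi_X$.

Let $X$ be a standard Gaussian (or normal) random variable. This is a real random variable with p.d.f.
$$f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}.$$
It is easy to calculate that $\varphi_X(t) = e^{-\frac{1}{2}t^2}$. Thus if $X_1$ and $X_2$ are independent standard Gaussians then
$$\varphi_{(X_1+X_2)/\sqrt{2}}(t) = e^{-\frac{1}{2}t^2},$$
and more generally the same holds for $(X_1 + \cdots + X_n)/\sqrt{n}$.

If $(X_1, X_2, \ldots)$ are (not necessarily Gaussian) i.i.d. and $Y_n = \sum_{k=1}^n X_k$ then $\varphi_{Y_n}(t) = \varphi_X(t)^n$. If we define
$$Z_n = \frac{1}{\sqrt{n}}Y_n = \frac{1}{\sqrt{n}}\sum_{k=1}^n X_k$$
then
$$\varphi_{Z_n}(t) = \varphi_X(t/\sqrt{n})^n.$$

Now, let $\mathbb{E}[X] = 0$ and $\mathbb{E}[X^2] = 1$. Since $X \in L^2$, $\varphi_X$ is twice differentiable and
$$\varphi_X(0) = 1, \qquad \varphi_X'(0) = 0, \qquad \varphi_X''(0) = -1.$$
It is an exercise to show that it follows that $\varphi_X(t) = 1 - \frac{1}{2}t^2 + o(t^2)$, where here we mean by $o(t^2)$ that as $t \to 0$ it holds that
$$\frac{\varphi_X(t) - 1 + \frac{1}{2}t^2}{t^2} \to 0.$$
Thus we have that
$$\varphi_{Z_n}(t) = \varphi_X(t/\sqrt{n})^n = \left(1 - \frac{1}{2}\frac{t^2}{n} + o\!\left(\frac{t^2}{n}\right)\right)^n,$$
and thus
$$\lim_{n\to\infty}\varphi_{Z_n}(t) = e^{-\frac{1}{2}t^2}.$$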

As we know, $e^{-\frac{1}{2}t^2}$ is the characteristic function of a standard Gaussian. Thus we have proved that if $G$ is a standard Gaussian then for any $t \in \mathbb{R}$ it holds that
$$\mathbb{E}\left[e^{itZ_n}\right] \to \mathbb{E}\left[e^{itG}\right].$$
This is almost the Central Limit Theorem.
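The convergence $\varphi_X(t/\sqrt{n})^n \to e^{-t^2/2}$ can be observed directly for distributions whose characteristic function is known in closed form. The sketch assumes two hypothetical examples, both with $\mathbb{E}[X] = 0$ and $\mathbb{E}[X^2] = 1$:
- a fair $\pm 1$ coin, with $\varphi_X(t) = \cos t$;
- the uniform distribution on $[-\sqrt3, \sqrt3]$, with $\varphi_X(t) = \sin(\sqrt3 t)/(\sqrt3 t)$.

```python
import math

def cf_normalized_sum(cf, t, n):
    """phi_{Z_n}(t) = phi_X(t / sqrt(n)) ** n for Z_n = (X_1 + ... + X_n) / sqrt(n)."""
    return cf(t / math.sqrt(n)) ** n

def coin_cf(t):
    # X = +1 or -1 with probability 1/2 each: E[e^{itX}] = cos(t)
    return math.cos(t)

def uniform_cf(t):
    # X uniform on [-sqrt(3), sqrt(3)] (mean 0, variance 1): E[e^{itX}] = sin(s) / s, s = sqrt(3) t
    s = math.sqrt(3) * t
    return 1.0 if s == 0 else math.sin(s) / s

def gaussian_cf(t):
    return math.exp(-t * t / 2)

if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, abs(cf_normalized_sum(coin_cf, 2.0, n) - gaussian_cf(2.0)))
```

For fixed $t$ the error decreases roughly like $1/n$, consistent with the $o(t^2/n)$ term in the expansion above. Pointwise convergence of characteristic functions is exactly what this section establishes; the step from there to convergence of distributions is not covered here.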

18. Scenery Reconstruction: I

Fix $n$, and let $(X = X_1, X_2, \ldots)$ be i.i.d. random variables on the abelian group $\mathbb{Z}/n\mathbb{Z}$. Denote by $\mu(k) = \mathbb{P}[X = k]$ their law. Let $X_0$ be uniformly distributed on $\mathbb{Z}/n\mathbb{Z}$, and let $Z_m = \sum_{k=0}^{m}X_k$ be the corresponding random walk. We assume throughout that the support of $\mu$ generates $\mathbb{Z}/n\mathbb{Z}$. Some important examples to keep in mind:
(1) $\mu(1) = 1$.
(2) $\mu(1) = 1 - \varepsilon$, $\mu(2) = \varepsilon$.

Fix some $f \in \{0,1\}^n$, and let $F_m = f(Z_m)$. The law of $(F_1, F_2, \ldots)$ depends on $f$; we think of these distributions as a family indexed by $f$. We denote by $\mathbb{P}_f[\cdot]$ the distribution when we fix a particular $f$. Note that $\mathbb{P}_f[\cdot]$ does not change if we shift $f$.

Exercise. Prove this.

Denote by $[f]$ the equivalence class of $f$ under shifts. That is, $f' \in [f]$ if there is some $k \in \mathbb{Z}/n\mathbb{Z}$ such that for every $m \in \mathbb{Z}/n\mathbb{Z}$ it holds that $f'(k + m) = f(m)$. The question of scenery reconstruction is the following: is it possible to determine $[f]$ given $(F_1, F_2, \ldots)$? In particular we say that we can reconstruct $f$ if there is some measurable $\hat{f}\colon \{0,1\}^{\mathbb{N}} \to \{0,1\}^n$ such that for every $f \in \{0,1\}^n$ it holds that
$$\mathbb{P}_f\left[\hat{f}(F_1, F_2, \ldots) \in [f]\right] = 1. \qquad (18.1)$$
Equivalently, if
$$\mathbb{P}\left[\hat{f}(f(Z_1), f(Z_2), \ldots) \in [f]\right] = 1.$$
In statistics, $\hat{f}$ is called an estimator of $f$, and the existence of such an $\hat{f}$ is called identifiability (of $f$). This clearly depends on $\mu$, and so we say that $\mu$ is reconstructive if this holds.

One can reformulate (18.1) in finitary terms. It is equivalent to the existence of a sequence $(\hat{f}_1, \hat{f}_2, \ldots)$ with $\hat{f}_k$ being $\sigma(F_1, \ldots, F_k)$-measurable and with
$$\lim_{k\to\infty}\mathbb{P}_f\left[\hat{f}_k(F_1, \ldots, F_k) \in [f]\right] = 1$$
for all $f \in \{0,1\}^n$.
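The exercise above (that $\mathbb{P}_f$ is unchanged when $f$ is shifted) can be sanity-checked exactly on a small example. The sketch computes the probability of every finite observation word with a forward recursion over the position of the walk. It uses the fact that $Z_1 = X_0 + X_1$ is uniform, because $X_0$ is. The scenery $f = (1,0,0,1,0)$ on $\mathbb{Z}/5\mathbb{Z}$ and the step law $\mu(1) = 9/10$, $\mu(2) = 1/10$ are hypothetical choices, as are the helper names.

```python
from fractions import Fraction
from itertools import product

def word_probability(f, mu, word):
    """P_f[F_1 = w_1, ..., F_k = w_k], where Z_1 is uniform on Z/nZ.

    f: tuple of 0/1 values indexed by Z/nZ; mu: dict step -> probability.
    v[z] holds P[Z_j = z and the first j observations match the word].
    """
    n = len(f)
    v = [Fraction(1, n) if f[z] == word[0] else Fraction(0) for z in range(n)]
    for letter in word[1:]:
        new = [Fraction(0)] * n
        for z, p in enumerate(v):
            if p:
                for step, q in mu.items():
                    z2 = (z + step) % n
                    if f[z2] == letter:
                        new[z2] += p * q
        v = new
    return sum(v)

def shift(f, k):
    """The shifted scenery m -> f(m + k), which lies in [f]."""
    n = len(f)
    return tuple(f[(m + k) % n] for m in range(n))

if __name__ == "__main__":
    f = (1, 0, 0, 1, 0)
    mu = {1: Fraction(9, 10), 2: Fraction(1, 10)}
    print(word_probability(f, mu, (1, 0, 0, 1)), word_probability(shift(f, 2), mu, (1, 0, 0, 1)))
```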

A very interesting question is how quickly this converges to one (when it does), for $\mu$ chosen uniformly over […]; for example for $\mu(1) = 0.99$, $\mu(2) = 0.01$.

Question. Let $N(n,\varepsilon)$ be the smallest $k$ such that there is an $\hat{f}_k\colon \{0,1\}^k \to \{0,1\}^n$ with
$$\mathbb{P}_f\left[\hat{f}_k(F_1, \ldots, F_k) \in [f]\right] \geq 1 - \varepsilon$$
for all $f$. For fixed $\varepsilon$ (say 1/3), how does $N(n,\varepsilon)$ grow with $n$?

This is not known; it is not even known whether $N(n,\varepsilon)$ is exponential or polynomial. The question of whether a given $\mu$ is reconstructive is much better understood.

Theorem. Let $n$ be a prime $> 5$, and let $\mu \in \mathbb{Q}^n$. Then $\mu$ is reconstructive iff $\varphi_\mu(k) \neq \varphi_\mu(m)$ for all $k \neq m$. Here $\varphi_\mu$ is given by
$$\varphi_\mu(k) = \varphi_X(k) = \mathbb{E}\left[e^{2\pi i\frac{k\cdot X}{n}}\right] = \sum_{l\in\mathbb{Z}/n\mathbb{Z}}e^{2\pi i\frac{k\cdot l}{n}}\mu(l),$$
where $k\cdot X$ is multiplication mod $n$.

The first direction (the case that $\varphi_\mu(k) \neq \varphi_\mu(m)$ for all $k \neq m$) does not require the extra assumptions on $n$ and $\mu$. This is due to Matzinger and Lember [3]. To prove this theorem we will need to study a few new concepts.

19. Stationary distributions and processes

Given a transition matrix $P$ on some state space $S$, and given a Markov chain $(X_1, X_2, \ldots)$ with this $P$, the law of $X_2$ is given by
$$\mathbb{P}[X_2 = t] = \sum_s\mathbb{P}[X_1 = s, X_2 = t] = \sum_s\mathbb{P}[X_1 = s]\,\mathbb{P}[X_2 = t|X_1 = s] = \sum_s\mathbb{P}[X_1 = s]\,P(s,t).$$
Thus, if we think of the distributions of $X_1$ and $X_2$ as (row) vectors $v_1, v_2 \in \ell^1(S)$, then we have that $v_2 = v_1P$. A non-negative left eigenvector of $P$ with eigenvalue 1, normalized to sum to 1, is called a stationary distribution of $P$. It corresponds to a distribution of $X_1$ that induces the same distribution on $X_2$. By the Perron-Frobenius Theorem, if $S$ is finite then $P$ has a stationary distribution. Furthermore, if $P$ is also irreducible then this distribution is unique.

Exercise. The uniform distribution on $\mathbb{Z}/n\mathbb{Z}$ is the unique stationary distribution of the $\mu$ random walk (recall that $\mu$ is generating).

Let $(Y_1, Y_2, \ldots)$ be a general process. We say that this process is stationary (or shift-invariant) if its law is the same as the law of $(Y_2, Y_3, \ldots)$. Equivalently, for every $n$, the law of $(Y_{k+1}, \ldots, Y_{k+n})$ is independent of $k$.

Exercise. Show that the two definitions are indeed equivalent.

Claim. If $(Y_1, Y_2, \ldots)$ is a Markov chain, and if the distribution of $Y_1$ is stationary, then $(Y_1, Y_2, \ldots)$ is a stationary process.

Returning to our scenery reconstruction problem, we can use what we learned above to deduce that $(Z_1, Z_2, \ldots)$ is a stationary process. It easily follows that
$$(F_1, F_2, \ldots) = (f(Z_1), f(Z_2), \ldots)$$
is also a stationary process.
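For a finite, irreducible and aperiodic chain, the stationary distribution can be approximated by repeatedly applying $v \mapsto vP$ to any starting distribution. Aperiodicity is needed for this iteration to converge, but not for existence or uniqueness. The sketch tests two hypothetical chains:
- the $\mu$ random walk on $\mathbb{Z}/5\mathbb{Z}$ with $\mu(1) = 0.9$, $\mu(2) = 0.1$, which should give the uniform distribution;
- a three-state birth-death chain, which should give $(1/4, 1/2, 1/4)$.

The helper names and the iteration count are illustrative.

```python
def step(v, P):
    """One step of the law: (v P)(t) = sum_s v(s) P(s, t)."""
    size = len(v)
    return [sum(v[s] * P[s][t] for s in range(size)) for t in range(size)]

def stationary_by_iteration(P, iterations=5000):
    """Apply v -> v P repeatedly, starting from a point mass at state 0."""
    v = [1.0] + [0.0] * (len(P) - 1)
    for _ in range(iterations):
        v = step(v, P)
    return v

def mu_walk(n, mu):
    """Transition matrix P(x, y) = mu(y - x) of the mu random walk on Z/nZ."""
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for s, p in mu.items():
            P[x][(x + s) % n] += p
    return P

if __name__ == "__main__":
    print(stationary_by_iteration(mu_walk(5, {1: 0.9, 2: 0.1})))   # close to uniform
    P3 = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    print(stationary_by_iteration(P3))                             # close to (1/4, 1/2, 1/4)
```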

20. Scenery reconstruction: II

Fix $n \in \mathbb{N}$, $f \in \{0,1\}^n$ and $\mu$ a generating probability measure on $\mathbb{Z}/n\mathbb{Z}$. Recall our process: $X_0$ is uniform on $\mathbb{Z}/n\mathbb{Z}$, $(X_1, X_2, \ldots)$ are i.i.d. with law $\mu$, $Z_m = X_0 + X_1 + \cdots + X_m$ and $F_m = f(Z_m)$. Recall also that we are interested in guessing (correctly, almost surely) what $[f]$ is (the equivalence class of functions that are shifts of $f$) from a single random instance of $(F_1, F_2, \ldots)$.

Define the autocorrelation of $f$, $a\colon \mathbb{Z}/n\mathbb{Z} \to \mathbb{R}$, by
$$a(k) = \frac{1}{n}\sum_{m=0}^{n-1}f(m)\,f(m+k),$$
and note that $a$ is the same for any $f' \in [f]$. Imagine that we are willing to settle for reconstructing $a$ rather than $[f]$. We will show that if the values of the characteristic function $\varphi_\mu$ are distinct then we can reconstruct the $a(k)$s.

To this end, we define $\alpha\colon \mathbb{N} \to \mathbb{R}$, the autocorrelation of $F$, by
$$\alpha_k = \mathbb{E}[F_T\,F_{T+k}]$$
for some $T \in \mathbb{N}$; by stationarity, the choice of $T$ is immaterial. We will show that if we know the $\alpha_k$s then we can infer the $a_k$s. But this will not help us unless there is some measurable $\hat{\alpha}_k\colon \{0,1\}^{\mathbb{N}} \to \mathbb{R}$ such that
$$\mathbb{P}_f\left[\hat{\alpha}_k(F_1, F_2, \ldots) = \alpha_k\right] = 1.$$
A natural candidate for $\hat{\alpha}_k$ is the empirical average; we take $\limsup$ rather than $\lim$ to make sure $\hat{\alpha}_k$ is well defined:
$$\hat{\alpha}_k = \limsup_{m\to\infty}\frac{1}{m}\sum_{T=1}^{m}F_T\,F_{T+k}.$$
A statement such as "$\hat{\alpha}_k = \alpha_k$ almost surely" sounds a lot like the strong law of large numbers. We will show later that this is indeed true, and that it follows from the Ergodic Theorem, which is a generalization of the SLLN.

Let $\mu * \mu$ be the convolution of $\mu$ with itself, which is given by
$$[\mu * \mu](k) = \sum_m\mu(k - m)\,\mu(m).$$
This is a probability distribution which is exactly the law of $X_1 + X_2$. Define analogously the $k$-fold convolution $\mu^{(k)}$, which is the law of $X_1 + \cdots + X_k$.
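Anticipating the Ergodic Theorem, one can check in simulation that the empirical averages defining $\hat{\alpha}_k$ approach $\alpha_k = \mathbb{E}[f(Z_0)f(Z_k)]$. The exact value is computed by conditioning on $Z_0$, which is uniform, and on the increment $Z_k - Z_0$, whose law is the $k$-fold convolution $\mu^{(k)}$. The sketch assumes:
- the scenery $f = (1,0,1,1,0,0,0)$ on $\mathbb{Z}/7\mathbb{Z}$;
- the step law $\mu(1) = 0.8$, $\mu(3) = 0.2$;
- a finite run of length 200,000 in place of the $\limsup$.

All of these, and the helper names, are hypothetical.

```python
import random

def simulate_scenery(f, mu, length, rng):
    """F_1, ..., F_length for the walk Z_m = X_0 + ... + X_m on Z/nZ, X_0 uniform."""
    n = len(f)
    steps, weights = list(mu), list(mu.values())
    z = rng.randrange(n)                               # X_0 uniform
    out = []
    for _ in range(length):
        z = (z + rng.choices(steps, weights=weights)[0]) % n
        out.append(f[z])
    return out

def empirical_alpha(F, k):
    """(1/m) sum_{T=1}^{m} F_T F_{T+k} with m = len(F) - k (a finite-sample alpha-hat)."""
    m = len(F) - k
    return sum(F[t] * F[t + k] for t in range(m)) / m

def exact_alpha(f, mu, k):
    """alpha_k = E[f(Z_0) f(Z_k)], with Z_0 uniform and Z_k - Z_0 distributed as mu^{(k)}."""
    n = len(f)
    law = [0.0] * n
    law[0] = 1.0                                       # mu^{(0)}: point mass at 0
    for _ in range(k):
        new = [0.0] * n
        for x, p in enumerate(law):
            for s, q in mu.items():
                new[(x + s) % n] += p * q
        law = new
    return sum(f[z] * f[(z + m) % n] * law[m] for z in range(n) for m in range(n)) / n

if __name__ == "__main__":
    rng = random.Random(7)
    f, mu = (1, 0, 1, 1, 0, 0, 0), {1: 0.8, 3: 0.2}
    F = simulate_scenery(f, mu, 200_000, rng)
    for k in range(4):
        print(k, empirical_alpha(F, k), exact_alpha(f, mu, k))
```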

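Claim 20.1. For every $k \in \mathbb{N}$ it holds that
$$\alpha_k = \sum_m \mu^{(k)}(m)\, a_m.$$

Proof. We set $T = 0$ and condition on $X_0$ and $Z_k$; recall that $Z_0 = X_0$. Since $X_0$ is uniform, and $Z_k - X_0 = X_1 + \cdots + X_k$ is independent of $X_0$ with law $\mu^{(k)}$, we have
$$\alpha_k = \mathbb{E}[f(Z_0)f(Z_k)] = \sum_{m,l}\mathbb{E}[f(X_0)f(Z_k) \mid X_0 = l, Z_k = l+m]\,\mathbb{P}[X_0 = l, Z_k = l+m] = \sum_{m,l} f(l)f(l+m)\,\frac{1}{n}\,\mu^{(k)}(m) = \sum_m a_m\,\mu^{(k)}(m).$$

Denote by $\alpha$ the column vector $(\alpha_0, \ldots, \alpha_{n-1})$ and by $a$ the column vector $(a_0, \ldots, a_{n-1})$. Denote by $M$ the $n \times n$ matrix $M_{k,m} = \mu^{(k)}(m)$, where $\mu^{(0)}$ is the point mass at $0$. It follows that $\alpha = Ma$. Assuming (as we will show later) that we can determine $\alpha$, it follows that we can determine $a$ if $M$ is invertible.

Claim 20.2. $M$ is invertible iff the values of the characteristic function $\varphi_\mu$ are distinct.

Proof. We apply the Fourier transform to each row of $M$. The Fourier transform is an invertible linear transformation (unitary, up to normalization), so the resulting matrix $\hat M$ is invertible iff $M$ is invertible. Now, over $\mathbb{Z}/n\mathbb{Z}$ the Fourier transform is identical to the characteristic function. The $k$th row of $M$ is the law of $X_1 + \cdots + X_k$, so the $k$th row of $\hat M$ is given by
$$\varphi_{X_1+\cdots+X_k}(m) = \mathbb{E}\left[e^{2\pi i \frac{m}{n}(X_1+\cdots+X_k)}\right] = \varphi_X(m)^k,$$
where $X$ has law $\mu$, so that $\varphi_X = \varphi_\mu$. Thus $\hat M_{k,m} = \varphi_X(m)^k$, so $\hat M$ is a Vandermonde matrix. It is therefore invertible iff the values $\varphi_X(0), \ldots, \varphi_X(n-1)$ are distinct.

Recall that we are interested in reconstructing $[f]$ rather than $a$. To this end we need to define the two-fold autocorrelation
$$a_{k,l} = \frac{1}{n}\sum_{m=0}^{n-1} f(m)\,f(m+k)\,f(m+k+l),$$
and its analogue
$$\alpha_{k,l} = \mathbb{E}[F_T F_{T+k} F_{T+k+l}].$$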
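Claims 20.1 and 20.2 can be checked by exact computation on small examples. The sketch below does this in two steps. First, it computes $\alpha_k = \mathbb{E}[f(Z_0)f(Z_k)]$ by brute-force enumeration of $X_0$ and the steps $X_1, \ldots, X_k$, and compares the result with $(Ma)_k$. Second, it compares the invertibility of $M$ (via an exact rank computation) with the distinctness of the values $\varphi_\mu(0), \ldots, \varphi_\mu(n-1)$. The helper names, the floating-point tolerance, and the example measures are our own choices. The examples are a lazy walk with steps $0$ or $+1$, and a symmetric $\pm 1$ walk for which $\varphi_\mu(1) = \varphi_\mu(4)$.

```python
import cmath
from fractions import Fraction
from itertools import product


def convolution_power(mu, k):
    """mu^(k) on Z/nZ (mu is a list of probabilities); mu^(0) is the point mass at 0."""
    n = len(mu)
    result = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(k):
        result = [sum(result[(j - m) % n] * mu[m] for m in range(n)) for j in range(n)]
    return result


def autocorrelation(f):
    """a(k) = (1/n) * sum_m f(m) f(m + k), indices mod n."""
    n = len(f)
    return [Fraction(sum(f[m] * f[(m + k) % n] for m in range(n)), n) for k in range(n)]


def alpha_bruteforce(f, mu, k):
    """E[f(Z_0) f(Z_k)] with X_0 uniform, by enumerating X_0 and all steps X_1..X_k."""
    n = len(f)
    support = [s for s in range(n) if mu[s] != 0]
    total = Fraction(0)
    for x0 in range(n):
        for steps in product(support, repeat=k):
            prob = Fraction(1, n)
            for s in steps:
                prob *= mu[s]
            total += prob * f[x0] * f[(x0 + sum(steps)) % n]
    return total


def matrix_M(mu):
    """M[k][m] = mu^(k)(m) for k, m = 0..n-1."""
    return [convolution_power(mu, k) for k in range(len(mu))]


def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [list(r) for r in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def char_values_distinct(mu, tol=1e-9):
    """Are phi_mu(m) = E[exp(2 pi i m X / n)], m = 0..n-1, pairwise distinct (up to tol)?"""
    n = len(mu)
    phi = [sum(float(mu[x]) * cmath.exp(2j * cmath.pi * m * x / n) for x in range(n))
           for m in range(n)]
    return all(abs(phi[i] - phi[j]) > tol for i in range(n) for j in range(i + 1, n))
```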

It is then easy to show that there is also a linear relation between $a_{k,l}$ and $\alpha_{k,l}$, with the corresponding matrix being $M \otimes M$, the tensor product of $M$ with itself. This matrix is invertible iff $M$ is invertible, and so we get the same result. However, this still does not suffice, and we need to add still more indices and calculate $n$-fold autocorrelations. The appropriate matrices are again invertible iff $M$ is, and moreover $[f]$ is uniquely determined by the $n$-fold autocorrelation.
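The text does not write out the two-fold linear relation. We read it as
$$\alpha_{k,l} = \sum_{m,m'} M_{k,m} M_{l,m'}\, a_{m,m'},$$
so that rows are indexed by $(k,l)$, columns by $(m,m')$, and the matrix is $M \otimes M$. This explicit form is our own derivation. It follows as in Claim 20.1, using that $Z_k - Z_0$ and $Z_{k+l} - Z_k$ are independent with laws $\mu^{(k)}$ and $\mu^{(l)}$. The sketch below checks it by brute force on small examples. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def convolution_power(mu, k):
    """mu^(k) on Z/nZ as a list; mu^(0) is the point mass at 0."""
    n = len(mu)
    result = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(k):
        result = [sum(result[(j - m) % n] * mu[m] for m in range(n)) for j in range(n)]
    return result


def two_fold_autocorrelation(f):
    """a_{k,l} = (1/n) sum_m f(m) f(m+k) f(m+k+l), indices mod n, keyed by (k, l)."""
    n = len(f)
    return {
        (k, l): Fraction(sum(f[m] * f[(m + k) % n] * f[(m + k + l) % n] for m in range(n)), n)
        for k in range(n)
        for l in range(n)
    }


def alpha_two_fold_bruteforce(f, mu, k, l):
    """E[F_0 F_k F_{k+l}]: enumerate X_0 (uniform) and all step sequences X_1..X_{k+l}."""
    n = len(f)
    support = [s for s in range(n) if mu[s] != 0]
    total = Fraction(0)
    for x0 in range(n):
        for steps in product(support, repeat=k + l):
            prob = Fraction(1, n)
            for s in steps:
                prob *= mu[s]
            z_k = (x0 + sum(steps[:k])) % n
            z_kl = (x0 + sum(steps)) % n
            total += prob * f[x0] * f[z_k] * f[z_kl]
    return total


def alpha_two_fold_linear(f, mu, k, l):
    """Row (k, l) of (M tensor M) applied to a: sum_{m,m'} mu^(k)(m) mu^(l)(m') a_{m,m'}."""
    n = len(f)
    a = two_fold_autocorrelation(f)
    mk, ml = convolution_power(mu, k), convolution_power(mu, l)
    return sum(mk[m] * ml[mp] * a[(m, mp)] for m in range(n) for mp in range(n))
```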

21. Stationary processes and measure preserving transformations

We say that a stationary process $(Y_1, Y_2, \ldots)$ is ergodic if its shift-invariant sigma-algebra is trivial. That is, if for every shift-invariant event $A$ it holds that $\mathbb{P}[A] \in \{0,1\}$.

Some examples:
- An i.i.d. process is obviously stationary. By Kolmogorov's zero-one law its tail sigma-algebra is trivial. Its shift-invariant sigma-algebra is contained in the tail sigma-algebra, so it is also trivial. Thus the process is ergodic.
- Let $(Y_1, Y_2, \ldots)$ be binary random variables such that $\mathbb{P}[(Y_1, Y_2, \ldots) = (1,1,\ldots)] = 1/2$ and $\mathbb{P}[(Y_1, Y_2, \ldots) = (0,0,\ldots)] = 1/2$. This process is stationary but not ergodic; the event $\lim_n Y_n = 1$ is shift-invariant and has probability $1/2$.
- Let $(Y_1, Y_2, \ldots)$ be binary random variables such that $\mathbb{P}[(Y_1, Y_2, \ldots) = (1,0,1,0,\ldots)] = 1/2$ and $\mathbb{P}[(Y_1, Y_2, \ldots) = (0,1,0,1,\ldots)] = 1/2$. This process is stationary and ergodic.
- Let $P$ be chosen uniformly over $[0,1]$, and let $(Y_1, Y_2, \ldots)$ be binary random variables which, conditioned on $P$, are i.i.d. Bernoulli with parameter $P$. This process is stationary but not ergodic. For example, the event that $\lim_n \frac{1}{n}\sum_{k=1}^n Y_k \le 1/2$ is a shift-invariant event that has probability $1/2$.
- Let $(Y_1, Y_2, \ldots)$ be a Markov chain, with the distribution of $Y_1$ equal to some stationary distribution. Then this process is stationary. It is ergodic iff the distribution of $Y_1$ is not a non-trivial convex combination of two different stationary distributions.
- Let $Y_1$ be distributed uniformly on $[0,1)$. Fix some $0 < \alpha < 1$, and let $Y_{n+1} = Y_n + \alpha \bmod 1$. This is a stationary process, and it is ergodic iff $\alpha$ is irrational.

A generalization of the last example is the following. Let $(\Omega, \mathcal{F}, \nu)$ be a probability space, and let $T \colon \Omega \to \Omega$ be a measurable transformation that preserves $\nu$. That is, $\nu(A) = \nu(T^{-1}(A))$ for all $A \in \mathcal{F}$. We say

that $A \in \mathcal{F}$ is $T$-invariant if $T^{-1}(A) = A$, and note that the collection of $T$-invariant sets is a sub-sigma-algebra. Let $Y_1$ have law $\nu$, and let each $Y_{n+1} = T(Y_n)$. Then $(Y_1, Y_2, \ldots)$ is a stationary process.

Claim 21.1. $(Y_1, Y_2, \ldots)$ is ergodic iff for every $T$-invariant $A \in \mathcal{F}$ it holds that $\nu(A) \in \{0,1\}$.

Proof. The map $\pi \colon \Omega \to \Omega^{\mathbb{N}}$ given by $\pi(\omega) = (\omega, T(\omega), T^2(\omega), \ldots)$ is injective. It pushes the measure $\nu$ to the law $P$ of $(Y_1, Y_2, \ldots)$, which is supported on the image of $\pi$, and thus these two probability spaces are isomorphic. Furthermore, denote the shift by $\sigma \colon \Omega^{\mathbb{N}} \to \Omega^{\mathbb{N}}$. Then $\pi$ is equivariant, in the sense that $\pi \circ T = \sigma \circ \pi$. It follows that the $T$-invariant sigma-algebra is mapped to the shift-invariant sigma-algebra, and thus one is trivial iff the other is trivial.

Of course, suppose we have a process $(Y_1, Y_2, \ldots)$ taking values in $\Omega$. Then stationarity is precisely invariance w.r.t. the shift transformation $T \colon \Omega^{\mathbb{N}} \to \Omega^{\mathbb{N}}$ given by $T(x_1, x_2, x_3, \ldots) = (x_2, x_3, \ldots)$. Thus stationary processes and measure preserving transformations are two manifestations of the same object. We say that $T$ is ergodic if the $T$-invariant sigma-algebra is trivial.

Claim 21.2. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, with $T \colon \Omega \to \Omega$ an ergodic measure preserving transformation. Let $Z \colon \Omega \to \mathbb{R}$ be a $T$-invariant random variable, i.e., $Z(\omega) = Z(T(\omega))$ for all $\omega \in \Omega$. Then there is some $z \in \mathbb{R}$ such that $\mathbb{P}[Z = z] = 1$.

Exercise 21.3. Prove this claim. Hint: For any $a < b \in \mathbb{R}$, the event $Z \in [a,b]$ is $T$-invariant, and thus has measure either 0 or 1.

Let $(\Omega, \mathcal{F})$ be a measurable space, let $T \colon \Omega \to \Omega$ be measurable, and let $\nu_1$ and $\nu_2$ be $T$-invariant. Then it is easy to see that any convex combination of $\nu_1$ and $\nu_2$ is also $T$-invariant. Thus the set of $T$-invariant probability measures on $(\Omega, \mathcal{F})$ is convex. An extreme $T$-invariant probability measure on $(\Omega, \mathcal{F})$ is one that cannot be written as a non-trivial convex combination of two different invariant measures. We will later show (Proposition 24.2) that the extreme measures are precisely the ergodic ones.
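The rotation example of this section can be explored numerically. The sketch below computes Birkhoff averages $\frac{1}{n}\sum_{k=0}^{n-1}\mathbf{1}_{[0,1/2)}(T^k y)$ for $T(y) = y + \alpha \bmod 1$. The function name, the test interval $[0,1/2)$, and the starting points are our own choices. For $\alpha = \sqrt{2} - 1$ the average from $y = 0.1$ comes out close to $1/2 = \mathbb{P}[Y_1 \in [0,1/2)]$. For $\alpha = 1/3$ the average depends on the starting point ($2/3$ from $y = 0.1$, $1/3$ from $y = 0.2$), which could not happen for an ergodic process. Floating point of course cannot distinguish rational from irrational $\alpha$; this is only an illustration.

```python
import math


def birkhoff_average(y, alpha, n, a=0.0, b=0.5):
    """(1/n) * #{0 <= k < n : T^k(y) in [a, b)} for the rotation T(y) = y + alpha mod 1."""
    hits = 0
    for k in range(n):
        point = (y + k * alpha) % 1.0  # T^k(y), computed directly to avoid accumulated drift
        hits += a <= point < b
    return hits / n
```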

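22. The Ergodic Theorem

Theorem 22.1 (The Pointwise Ergodic Theorem). Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, with $T \colon \Omega \to \Omega$ a measure preserving transformation. If $T$ is ergodic then for every $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ it holds that for $\mathbb{P}$-almost every $\omega \in \Omega$
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} X(T^k(\omega)) = \mathbb{E}[X].$$

In the language of stationary processes, the theorem reads as follows. Let $(Y_1, Y_2, \ldots)$ be a stationary process with trivial shift-invariant sigma-algebra, and let $f(Y_1, Y_2, \ldots) \in L^1$. Then almost surely
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} f(Y_k, Y_{k+1}, \ldots) = \mathbb{E}[f(Y_1, Y_2, \ldots)].$$

This Theorem was originally proved by Birkhoff [1]. We give a proof due to Katznelson and Weiss [2].

Proof. We assume without loss of generality that $X$ is non-negative; otherwise apply the proof separately to $X^+$ and $X^-$. Define $X^* \colon \Omega \to \mathbb{R}$ by
$$X^*(\omega) = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} X(T^k(\omega))$$
whenever this limit exists. We want to show that it exists with probability 1, and that $\mathbb{P}[X^* = \mathbb{E}[X]] = 1$. Define $\overline{X} \colon \Omega \to \mathbb{R}$ by
$$\overline{X}(\omega) = \limsup_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}X(T^k(\omega)),$$
and likewise
$$\underline{X}(\omega) = \liminf_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}X(T^k(\omega)).$$
Note that both are $T$-invariant. Hence, by Claim 21.2, there are some $\overline{x}$ and $\underline{x}$ such that
$$\mathbb{P}[\overline{X} = \overline{x}, \underline{X} = \underline{x}] = 1.$$
Since $\underline{x} \le \overline{x}$, the proof will be finished once we prove
$$\overline{x} \le \mathbb{E}[X] \le \underline{x}. \qquad (22.1)$$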

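Fix some $\varepsilon > 0$. Let $N(\omega)$ be the first positive integer such that
$$\frac{1}{N(\omega)}\sum_{k=0}^{N(\omega)-1} X(T^k(\omega)) + \varepsilon \ge \overline{x}. \qquad (22.2)$$
Since $N(\omega)$ is a.s. finite, there is some $K \in \mathbb{N}$ such that the set $A = \{\omega : N(\omega) > K\}$ has measure less than $\varepsilon/\overline{x}$. Define
$$\tilde X(\omega) = \begin{cases} X(\omega) & \omega \notin A \\ \max\{X(\omega), \overline{x}\} & \omega \in A,\end{cases}$$
and also
$$\tilde N(\omega) = \begin{cases} N(\omega) & \omega \notin A \\ 1 & \omega \in A.\end{cases}$$
Note that, in analogy to (22.2), we have
$$\frac{1}{\tilde N(\omega)}\sum_{k=0}^{\tilde N(\omega)-1}\tilde X(T^k(\omega)) + \varepsilon \ge \overline{x}.$$
For $\omega \notin A$ this follows from (22.2), since $\tilde X \ge X$; for $\omega \in A$ it follows from $\tilde X(\omega) \ge \overline{x}$. Rearranging, we get
$$\sum_{k=0}^{\tilde N(\omega)-1}\tilde X(T^k(\omega)) \ge \tilde N(\omega)(\overline{x} - \varepsilon). \qquad (22.3)$$
Now $X$ and $\tilde X$ only differ on $A$, and where they differ they differ by at most $\overline{x}$, since $X$ is non-negative. Hence
$$\mathbb{E}[\tilde X] = \mathbb{E}[X + (\tilde X - X)] = \mathbb{E}[X] + \mathbb{E}[\tilde X - X] = \mathbb{E}[X] + \mathbb{E}[(\tilde X - X)\mathbf{1}_A] \le \mathbb{E}[X] + \mathbb{E}[\overline{x}\,\mathbf{1}_A] \le \mathbb{E}[X] + \overline{x}\cdot\varepsilon/\overline{x} = \mathbb{E}[X] + \varepsilon. \qquad (22.4)$$
Now, let $L = K\overline{x}/\varepsilon$ (rounded up to an integer). For each $\omega \in \Omega$, let $\omega_0 = \omega$ and let
$$\omega_{j+1} = T^{\tilde N(\omega_j)}(\omega_j).$$
It follows that
$$\omega_j = T^{\tilde N(\omega_0) + \tilde N(\omega_1) + \cdots + \tilde N(\omega_{j-1})}(\omega).$$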

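Let $J(\omega)$ be the maximal $j$ such that $\tilde N(\omega_0) + \tilde N(\omega_1) + \cdots + \tilde N(\omega_j) < L$, and let
$$\tilde N_L(\omega) = \tilde N(\omega_0) + \tilde N(\omega_1) + \cdots + \tilde N(\omega_{J(\omega)}).$$
Note that $\tilde N_L(\omega) \ge L - K$, since each $\tilde N(\omega_j)$ is at most $K$. Then we can write
$$\sum_{k=0}^{L-1}\tilde X(T^k(\omega)) = \sum_{k=0}^{\tilde N(\omega_0)-1}\tilde X(T^k(\omega_0)) + \cdots + \sum_{k=0}^{\tilde N(\omega_{J(\omega)})-1}\tilde X(T^k(\omega_{J(\omega)})) + \sum_{k=\tilde N_L(\omega)}^{L-1}\tilde X(T^k(\omega)).$$
Applying (22.3) to each term but the last yields
$$\sum_{k=0}^{L-1}\tilde X(T^k(\omega)) \ge \tilde N_L(\omega)(\overline{x}-\varepsilon) + \sum_{k=\tilde N_L(\omega)}^{L-1}\tilde X(T^k(\omega)),$$
and since $\tilde X$ is non-negative, this gives
$$\sum_{k=0}^{L-1}\tilde X(T^k(\omega)) \ge \tilde N_L(\omega)(\overline{x}-\varepsilon).$$
Since $\tilde N_L(\omega) \ge L - K$ we can apply this estimate too, and, rearranging, arrive at
$$\frac{1}{L}\sum_{k=0}^{L-1}\tilde X(T^k(\omega)) \ge \frac{L-K}{L}(\overline{x}-\varepsilon) \ge \overline{x} - \frac{K}{L}\overline{x} - \varepsilon.$$
By the choice of $L$, this can be written as
$$\frac{1}{L}\sum_{k=0}^{L-1}\tilde X(T^k(\omega)) \ge \overline{x} - 2\varepsilon.$$
Now, by $T$-invariance of $\mathbb{P}$, the expectation of the l.h.s. is just equal to the expectation of $\tilde X$. Hence
$$\mathbb{E}[\tilde X] \ge \overline{x} - 2\varepsilon.$$
Putting this together with (22.4) yields
$$\overline{x} \le \mathbb{E}[\tilde X] + 2\varepsilon \le \mathbb{E}[X] + 3\varepsilon,$$
and taking $\varepsilon$ to zero yields $\overline{x} \le \mathbb{E}[X]$. This completes the first half of the proof of (22.1); the second follows by a similar argument.

Exercise 22.2. Use the Ergodic Theorem to prove the strong law of large numbers.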
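Here is an illustration of Theorem 22.1 for the Markov chain example of Section 21. It is not a proof. We take a two-state chain with transition matrix $\begin{pmatrix}1/2 & 1/2\\ 1/4 & 3/4\end{pmatrix}$ (our choice of example), started from its unique stationary distribution $(1/3, 2/3)$. The chain is then stationary and ergodic. The theorem predicts that the time average of $f(Y_k) = \mathbf{1}\{Y_k = 1\}$ approaches $\mathbb{E}[f(Y_1)] = 2/3$ for almost every sample path. The function name and the fixed seed are assumptions of the sketch.

```python
import random


def ergodic_average(P, pi, f, n, seed=0):
    """(1/n) * sum_{k=1}^{n} f(Y_k) along one sample path of the Markov chain with
    transition matrix P, started from pi (assumed stationary, so the chain is stationary)."""
    rng = random.Random(seed)
    states = range(len(P))
    y = rng.choices(states, weights=pi)[0]  # Y_1 ~ pi
    total = 0.0
    for _ in range(n):
        total += f(y)
        y = rng.choices(states, weights=P[y])[0]  # Y_{k+1} ~ P(Y_k, .)
    return total / n
```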

23. The Radon-Nikodym derivative

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Given a non-negative r.v. $X$ with $\mathbb{E}[X] = 1$, we can define the measure $Q = X \cdot \mathbb{P}$ by
$$Q[A] = \mathbb{E}[\mathbf{1}_A X] = \int_\Omega \mathbf{1}_A(\omega)\,X(\omega)\,d\mathbb{P}(\omega).$$
It is easy to show that $X$ is the unique r.v. (up to $\mathbb{P}$-almost sure equality) such that $Q = X \cdot \mathbb{P}$. In this case we call $X$ the Radon-Nikodym derivative of $Q$ with respect to $\mathbb{P}$, and denote
$$\frac{dQ}{d\mathbb{P}}(\omega) = X(\omega).$$
Note that
$$\mathbb{P}[A] = 0 \text{ implies } Q[A] = 0, \qquad (23.1)$$
so that not every measure $Q$ can be written as $X \cdot \mathbb{P}$ for some $X$. When $\mathbb{P}$ and $Q$ satisfy (23.1), we say that $Q$ is absolutely continuous relative to $\mathbb{P}$.

Example 23.1.
- The uniform distribution on $[0,1]$ is absolutely continuous relative to the uniform distribution on $[0,2]$.
- If $\mathbb{P}[A] > 0$ then $\mathbb{P}[\cdot \mid A]$ is absolutely continuous relative to $\mathbb{P}$.
- The point mass $\delta_{1/2}$ is not absolutely continuous relative to the uniform distribution on $[0,1]$.
- The i.i.d. $q$ measure on $\{0,1\}^{\mathbb{N}}$ is not absolutely continuous relative to the i.i.d. $p$ measure on $\{0,1\}^{\mathbb{N}}$, unless $p = q$.

Lemma 23.2. If $Q$ is absolutely continuous relative to $\mathbb{P}$, then for each $\varepsilon > 0$ there exists a $\delta > 0$ such that, for every measurable $A$, $\mathbb{P}[A] < \delta$ implies $Q[A] < \varepsilon$.

Proof. Assume the contrary, so that there is some $\varepsilon$ and a sequence of events $(A_1, A_2, \ldots)$ with $\mathbb{P}[A_n] < 2^{-n}$ and $Q[A_n] \ge \varepsilon$. Let $A = \bigcap_n \bigcup_{m>n} A_m$ be the event that infinitely many of these events occur. Then by Borel-Cantelli $\mathbb{P}[A] = 0$. On the other hand, $Q[\bigcup_{m>n} A_m] \ge \varepsilon$ for every $n$, and these sets decrease to $A$, so $Q[A] \ge \varepsilon$. This contradicts absolute continuity.

Recall that $\mathcal{F}$ is separable if it is generated by a countable subset $\{F_1, F_2, \ldots\}$. We can assume w.l.o.g. that this subset is a $\pi$-system.

Theorem 23.3 (Radon-Nikodym Theorem). Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space with $\mathcal{F}$ separable, and let $Q$ be absolutely continuous relative to $\mathbb{P}$. Then there exists a r.v. $X$ such that $Q = X \cdot \mathbb{P}$.

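Proof. Let $\mathcal{F}_n = \sigma(F_1, \ldots, F_n)$. Then $\mathcal{F}_n$ is a finite sigma-algebra. As such, it is the set of all possible unions of $\{B^n_1, \ldots, B^n_{k_n}\}$, a finite partition of $\Omega$. Define the $\mathcal{F}_n$-measurable r.v. $X_n$ as follows. For a given $\omega \in \Omega$ there is a unique $B^n \in \{B^n_1, \ldots, B^n_{k_n}\}$ such that $\omega \in B^n$. Set
$$X_n(\omega) = X_n(B^n) = \frac{Q[B^n]}{\mathbb{P}[B^n]},$$
where we take $0/0 = 0$; by absolute continuity, $\mathbb{P}[B^n] = 0$ implies $Q[B^n] = 0$. It is easy to verify that $\mathbb{E}[X_n] = 1$, and that $Q[B] = \mathbb{E}[\mathbf{1}_B X_n]$ for every $B \in \mathcal{F}_n$. Thus, on $\mathcal{F}_n$, $X_n$ is the Radon-Nikodym derivative $dQ/d\mathbb{P}$.

Now, since $(\mathcal{F}_1, \mathcal{F}_2, \ldots)$ is a filtration, $B^n$ is the disjoint union of (at most) two sets $B^{n+1}_i$ and $B^{n+1}_j$. Hence
$$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n](\omega) = \frac{X_{n+1}(B^{n+1}_i)\,\mathbb{P}[B^{n+1}_i] + X_{n+1}(B^{n+1}_j)\,\mathbb{P}[B^{n+1}_j]}{\mathbb{P}[B^{n+1}_i] + \mathbb{P}[B^{n+1}_j]} = \frac{\frac{Q[B^{n+1}_i]}{\mathbb{P}[B^{n+1}_i]}\mathbb{P}[B^{n+1}_i] + \frac{Q[B^{n+1}_j]}{\mathbb{P}[B^{n+1}_j]}\mathbb{P}[B^{n+1}_j]}{\mathbb{P}[B^n]} = \frac{Q[B^{n+1}_i] + Q[B^{n+1}_j]}{\mathbb{P}[B^n]} = \frac{Q[B^n]}{\mathbb{P}[B^n]} = X_n(\omega),$$
and thus $(X_1, X_2, \ldots)$ is a martingale w.r.t. the filtration $(\mathcal{F}_1, \mathcal{F}_2, \ldots)$. Since it is non-negative, it converges almost surely to some r.v. $X$.

We now claim that $(X_1, X_2, \ldots)$ are uniformly integrable, in the sense that for every $\varepsilon$ there exists a $K$ such that for all $n$ it holds that $\mathbb{E}[X_n \mathbf{1}_{\{X_n > K\}}] < \varepsilon$. To see this, recall that $\mathbb{E}[X_n] = 1$ and that $X_n$ is non-negative. Markov's inequality then gives
$$\mathbb{P}[X_n > K] \le \frac{1}{K}.$$
Now, by Lemma 23.2, if we choose $K$ large enough then this implies that $Q[X_n > K] < \varepsilon$. But the event $\{X_n > K\}$ is in $\mathcal{F}_n$, since $X_n$ is $\mathcal{F}_n$-measurable. Hence
$$\mathbb{E}[X_n \mathbf{1}_{\{X_n > K\}}] = Q[X_n > K] < \varepsilon.$$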

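This proves that $(X_1, X_2, \ldots)$ are uniformly integrable. We will use an important result, which is not hard but which we will not prove. If $X_n \to X$ almost surely, then uniform integrability implies that the convergence also holds in $L^1$, in the sense that $\mathbb{E}[|X_n - X|] \to 0$. It follows that for any $F_i$ in $\{F_1, F_2, \ldots\}$
$$\lim_n \left|\mathbb{E}[\mathbf{1}_{F_i}(X_n - X)]\right| \le \lim_n \mathbb{E}[\mathbf{1}_{F_i}\,|X_n - X|] = 0,$$
and thus, since $F_i \in \mathcal{F}_n$ for all $n \ge i$,
$$\mathbb{E}[\mathbf{1}_{F_i} X] = \lim_n \mathbb{E}[\mathbf{1}_{F_i} X_n] = Q[F_i].$$
Thus the measure $X \cdot \mathbb{P}$ agrees with $Q$ on the generating $\pi$-system $\{F_1, F_2, \ldots\}$. It also agrees with $Q$ on $\Omega$, since $\mathbb{E}[X] = \lim_n \mathbb{E}[X_n] = 1$. Therefore $Q = X \cdot \mathbb{P}$.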
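The martingale in the proof can be computed by hand in a simple example of our own choosing. Take $\mathbb{P}$ to be Lebesgue measure on $[0,1)$ and $Q$ the measure with density $2x$, so that $Q[[a,b)] = b^2 - a^2$. Take the dyadic intervals of level $n$ as the partition generating $\mathcal{F}_n$. This is a slight variant of the proof's $\mathcal{F}_n$, which adds one generator at a time; each dyadic atom still splits into two atoms at the next level. Then $X_n(\omega) = a + b$ for the dyadic $[a,b) \ni \omega$, which converges to $dQ/d\mathbb{P}(\omega) = 2\omega$. The sketch below (hypothetical function name) checks the martingale property on atoms and the limit.

```python
from fractions import Fraction


def rn_martingale(omega, n, q_cdf):
    """X_n(omega) = Q[B] / P[B], where B is the level-n dyadic interval containing omega,
    P is Lebesgue measure on [0, 1), and Q has distribution function q_cdf."""
    j = int(omega * 2**n)  # B = [j / 2^n, (j + 1) / 2^n)
    left, right = Fraction(j, 2**n), Fraction(j + 1, 2**n)
    return (q_cdf(right) - q_cdf(left)) / (right - left)
```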

24. The weak topology and the simplex of invariant measures

Let $X$ be a compact metrizable topological space. By the Riesz Representation Theorem we can identify $\mathcal{P}(X)$, the set of probability measures on $X$, with the positive bounded linear functionals on $C(X)$ that assign 1 to the constant function $1$. The space $X^*$ of bounded linear functionals on $C(X)$ comes equipped with the weak* topology, under which $\varphi_n \to \varphi$ if $\varphi_n(f) \to \varphi(f)$ for all $f \in C(X)$. Consider the restriction of this topology to the set of probability measures. This set is closed and, by the Banach-Alaoglu Theorem, compact; it is also metrizable, since $C(X)$ is separable. The restricted topology is what probabilists call the weak topology on the probability measures on $X$.

In the important case that $X = \{0,1\}^{\mathbb{N}}$, we have that $\nu_n \to \nu$ weakly iff $\nu_n(A) \to \nu(A)$ for every clopen $A$. In the case $X = \mathbb{R} \cup \{-\infty, \infty\}$, we have that $\nu_n \to \nu$ iff $\limsup_n \nu_n(A) \le \nu(A)$ for all closed $A$. Equivalently, $\nu_n \to \nu$ iff $\liminf_n \nu_n(A) \ge \nu(A)$ for all open $A$.

Let $X = \{0,1\}^{\mathbb{Z}}$, and denote by $I(X)$ the set of stationary (or shift-invariant) probability measures on $X$.

Claim 24.1. $I(X)$ is a closed subset of $\mathcal{P}(X)$.

Proof. Denote the shift by $\sigma \colon X \to X$. Assume that $\nu_n$ is a sequence in $I(X)$ that converges to some $\nu \in \mathcal{P}(X)$. We prove the claim by showing that $\nu$ is stationary. Let $A$ be a clopen subset of $X$. Then
$$\nu(A) = \lim_n \nu_n(A) = \lim_n \nu_n(\sigma(A)) = \nu(\sigma(A)),$$
where the last equality follows from the fact that $A$ being clopen implies that $\sigma(A)$ is clopen. Thus $\nu$ is invariant on a generating sub-algebra of the sigma-algebra, and by a standard argument it is invariant.

Clearly, $I(X)$ is a convex set. The next proposition proves a more general claim, which implies that the extreme points $I_e(X)$ of $I(X)$ are the ergodic measures.

Proposition 24.2. A $T$-invariant measure $\nu$ on $(\Omega, \mathcal{F})$ is ergodic iff it is extreme.

Proof. Assume that $\nu$ is not ergodic. Then there is some $T$-invariant $A \in \mathcal{F}$ such that $p := \nu(A) \in (0,1)$. Let $\nu_1$ be given by $\nu_1(B) = \nu(B \mid A) = \frac{1}{p}\nu(B \cap A)$, and let $\nu_2(B) = \nu(B \mid A^c)$. Then
$$\nu_1(T^{-1}B) = \frac{1}{p}\nu(T^{-1}(B) \cap A) = \frac{1}{p}\nu(T^{-1}(B) \cap T^{-1}(A)) = \frac{1}{p}\nu(T^{-1}(B \cap A)) = \frac{1}{p}\nu(B \cap A) = \nu_1(B).$$
And thus $\nu_1$ is $T$-invariant. The same argument applies to $\nu_2$, since $A^c$ is also $T$-invariant. Finally, $\nu = p\nu_1 + (1-p)\nu_2$.

For the other direction, assume $\nu = p\nu_1 + (1-p)\nu_2$ for some $p \in (0,1)$ and $T$-invariant $\nu_1, \nu_2$. Clearly, $\nu_1$ is absolutely continuous relative to $\nu$, and so we write $\nu_1 = X \cdot \nu$ for some $X \in L^1(\nu)$. We now claim that $X$ is $T$-invariant. We prove this for the case that $T$ is invertible, although it is true in general. In this case, for any $A \in \mathcal{F}$
$$\nu_1(A) = \nu_1(T(A)) = \int_\Omega \mathbf{1}_A(T^{-1}(\omega))\,X(\omega)\,d\nu(\omega) = \int_\Omega \mathbf{1}_A(T^{-1}(\omega))\,X(\omega)\,d\nu(T^{-1}(\omega)) = \int_\Omega \mathbf{1}_A(\omega)\,X(T(\omega))\,d\nu(\omega),$$
and so $X \circ T$ is also a Radon-Nikodym derivative $d\nu_1/d\nu$. But by the uniqueness of this derivative, $X$ and $X \circ T$ agree almost everywhere. It is now a nice exercise to show that there exists some $X'$ that is equal to $X$ almost everywhere and is $T$-invariant. It then follows by Claim 21.2, and by the fact that $\mathbb{E}[X] = 1$, that $\nu[X' = 1] = 1$, and thus $\nu = \nu_1$.

This Proposition has an interesting consequence.

Exercise 24.3. Assume $\nu \neq \mu$ are both $T$-invariant ergodic measures on $(\Omega, \mathcal{F})$. Show that there exist two disjoint sets $A, B \in \mathcal{F}$ such that $\nu(A) = 0$ and $\mu(A) = 1$, while $\nu(B) = 1$ and $\mu(B) = 0$.

Thus $\nu$ and $\mu$ live in different places. In fact, it is possible to show that there is a map $\beta \colon I_e(X) \to \mathcal{F}$ with the properties that

(1) $\mu(\beta_\mu) = 1$ for all $\mu \in I_e(X)$.
(2) For all $\mu \neq \nu \in I_e(X)$ it holds that $\beta_\mu \cap \beta_\nu = \emptyset$.

Using this, it is possible to show that $I(X)$ is in fact a simplex: a compact convex set in which there is a unique way to write each element as a convex integral of the extreme points.

Proposition 24.4. The ergodic measures $I_e(X)$ are dense in $I(X)$.

Thus the simplex $I(X)$ has the interesting property that its extreme points are dense. It turns out that there is only one such simplex (up to affine homeomorphisms), which is called the Poulsen simplex.

Proof. It suffices to show that for $\nu, \mu \in I_e(X)$ and $\theta = \frac{1}{2}\nu + \frac{1}{2}\mu$ it is possible to find $\theta_n \in I_e(X)$ s.t. $\lim_n \theta_n = \theta$. To this end, fix $n$ and define $\theta_n$ as follows. Let the law of the r.v.s $(X_k)_{k\in\mathbb{Z}}$ be $\mu$, and the law of $(Y_k)_{k\in\mathbb{Z}}$ be $\nu$. For $m \in \mathbb{Z}$, let $(X^m_0, \ldots, X^m_{n-1})$ be independent of all previously defined random variables, and with law equal to that of $(X_0, \ldots, X_{n-1})$. Define $(Y^m_0, \ldots, Y^m_{n-1})$ analogously. Define $(W_k)_{k\in\mathbb{Z}}$ by
$$W_k = \begin{cases} X^{\lfloor k/n\rfloor}_{k \bmod n} & \text{if } \lfloor k/n\rfloor \text{ is even,} \\ Y^{\lfloor k/n\rfloor}_{k \bmod n} & \text{if } \lfloor k/n\rfloor \text{ is odd.}\end{cases}$$
Finally, choose $N$ uniformly at random from $\{0, 1, \ldots, 2n-1\}$, and define $(Z_k)_{k\in\mathbb{Z}}$ by $Z_k = W_{k+N}$. Let $\theta_n$ be the law of $(Z_k)$. It is straightforward (if tedious) to verify that $(Z_k)$ is stationary. We leave it as an exercise to show that it is ergodic.

Thus to finish the proof we have to show that $\lim_n \theta_n = \theta$. Fix $M \in \mathbb{N}$, and consider the event that the window $(Z_1, \ldots, Z_M) = (W_{N+1}, \ldots, W_{N+M})$ straddles two blocks, i.e., that $\{N+2, \ldots, N+M\}$ contains a multiple of $n$. As $n$ tends to infinity, the probability of this event tends to zero. Outside this event, conditioned on $N$, the window lies in a single block. With probability $1/2$ this is an $X$-block, in which case (by stationarity of $\mu$) the law of $(Z_1, \ldots, Z_M)$ is equal to the law of $(X_1, \ldots, X_M)$; otherwise it is equal to the law of $(Y_1, \ldots, Y_M)$. This completes the proof.
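To make the construction of $\theta_n$ concrete, the sketch below samples $(Z_1, \ldots, Z_{\text{length}})$ from $\theta_n$ in one illustrative special case. It assumes $\mu$ and $\nu$ are the i.i.d. Bernoulli($p$) and Bernoulli($q$) measures, which are ergodic; the function name and parameters are hypothetical. For these measures the independent block copies $X^m, Y^m$ are just fresh coins, so the sampler only needs to track whether $\lfloor (k+N)/n \rfloor$ is even or odd. A general $\mu, \nu$ would require sampling genuine blocks of length $n$. For small $n$, the frequency of ones in a long sample is close to $(p+q)/2$, the one-dimensional marginal of $\theta = \frac{1}{2}\mu + \frac{1}{2}\nu$.

```python
import random


def sample_theta_n(n, length, p, q, seed=0):
    """Sample (Z_1, ..., Z_length) from theta_n when mu, nu are i.i.d. Bernoulli(p), Bernoulli(q).

    W consists of independent length-n blocks, drawn from mu when the block index is even
    and from nu when it is odd; Z_k = W_{k+N} with N uniform on {0, ..., 2n-1}."""
    rng = random.Random(seed)
    shift = rng.randrange(2 * n)  # the random offset N
    z = []
    for k in range(1, length + 1):
        block = (k + shift) // n  # floor((k + N) / n): which block W_{k+N} lies in
        bias = p if block % 2 == 0 else q
        # In the i.i.d. case, each coordinate of an independent block copy is a fresh coin.
        z.append(1 if rng.random() < bias else 0)
    return z
```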

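25. Percolation

Let $V$ be a countable set, and let $G = (V, E)$ be a locally finite, simple symmetric graph. That is, $E$ is a symmetric relation on $V$ with $E \cap (\{v\} \times V)$ finite for each $v \in V$. We also assume that $G$ is connected, so that the transitive closure of $E$ is $V \times V$. The i.i.d. $p$ percolation measure on $\{0,1\}^E$ is simply the product Bernoulli measure, in which we choose each edge independently with probability $p$. We will denote this measure by $\mathbb{P}_p[\cdot]$, and will denote by $\mathcal{E}$ the random edge set with this law. $\mathcal{G} = (V, \mathcal{E})$ will be the corresponding random graph. Note that $\mathcal{G}$ will in general not be connected. For each $v \in V$ we denote by $K(v)$ the (random) connected component that $v$ belongs to in $\mathcal{G}$. We denote by $\{v \leftrightarrow \infty\}$ the event that $K(v)$ is infinite. We denote by $K_\infty$ the event that there is some $v$ for which $K(v)$ is infinite.

Claim 25.1. The probability of $K_\infty$ is either 0 or 1. In the former case, $\mathbb{P}_p[v \leftrightarrow \infty] = 0$ for every $v \in V$, while in the latter case $\mathbb{P}_p[v \leftrightarrow \infty] > 0$ for every $v \in V$.

Proof. Enumerate $E = (e_1, e_2, \ldots)$, and let $A_n = \{e_n \in \mathcal{E}\}$. Then $(A_1, A_2, \ldots)$ is a sequence of independent events. Clearly $K_\infty$ is $\sigma(A_1, A_2, \ldots)$-measurable. It is also clearly a tail event, since changing finitely many edges can neither create nor destroy an infinite component. Hence the first part of the claim follows by Kolmogorov's 0-1 law.

Since the event $K_\infty$ contains $\{v \leftrightarrow \infty\}$, it is immediate that $\mathbb{P}_p[K_\infty] = 0$ implies $\mathbb{P}_p[v \leftrightarrow \infty] = 0$. Assume now that $\mathbb{P}_p[K_\infty] = 1$. Since $V$ is countable and $K_\infty = \bigcup_{w \in V}\{w \leftrightarrow \infty\}$, there is some $w \in V$ such that $\mathbb{P}_p[w \leftrightarrow \infty] > 0$. Let $\pi$ be a path (a finite sequence of edges) between $v$ and $w$. Consider the random variable $\tilde{\mathcal{E}}$ taking values in $\{0,1\}^E$ defined as follows. For every edge $e \notin \pi$, we set $e \in \tilde{\mathcal{E}}$ iff $e \in \mathcal{E}$. For every $e \in \pi$, we set $e \in \tilde{\mathcal{E}}$. We (you) prove in the exercise below that the law of $\tilde{\mathcal{E}}$ is absolutely continuous relative to the law of $\mathcal{E}$.

Denote $\tilde{\mathcal{G}} = (V, \tilde{\mathcal{E}})$, and denote by $\tilde K(v)$ the connected component of $v$ in $\tilde{\mathcal{G}}$. Now, $\tilde K(v) = \tilde K(w)$, since $v$ and $w$ are connected in $\tilde{\mathcal{G}}$. Also, $\tilde K(w)$ contains $K(w)$, since $\tilde{\mathcal{E}}$ contains $\mathcal{E}$. Hence the event $\{|\tilde K(w)| = \infty\}$ occurs with positive probability, and so the same holds for $\tilde K(v) = \tilde K(w)$. Finally, by absolute continuity, the same holds for $K(v)$, and so $\mathbb{P}_p[v \leftrightarrow \infty] > 0$.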
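Exercise 25.2. Prove that the law of $\tilde{\mathcal{E}}$ is absolutely continuous relative to the law of $\mathcal{E}$.

Claim 25.3. If $q > p$ then $\mathbb{P}_q[K_\infty] \ge \mathbb{P}_p[K_\infty]$.

To prove this claim we prove a stronger theorem, and in the process introduce the technique of coupling. Let $\Omega = \{0,1\}^{\mathbb{N}}$, with $\mathcal{F}$ the Borel sigma-algebra. We consider the natural partial order on $\Omega$ given by $\omega \le \omega'$ if $\omega_n \le \omega'_n$ for all $n \in \mathbb{N}$. We say that $A \in \mathcal{F}$ is increasing if for all $\omega \le \omega'$ it holds that $\omega \in A$ implies $\omega' \in A$. Let $\mathbb{P}_p[\cdot]$ denote the i.i.d. $p$ measure on $\Omega$.

Theorem 25.4. If $A$ is increasing then $q > p$ implies $\mathbb{P}_q[A] \ge \mathbb{P}_p[A]$.

Proof. Let $(X_1, X_2, \ldots)$ be i.i.d. random variables, each distributed uniformly on $[0,1]$. For each $n$ let $Q_n = \mathbf{1}_{\{X_n \le q\}}$ and $P_n = \mathbf{1}_{\{X_n \le p\}}$. Note that $\mathbb{P}[Q_n = 1] = q$ and $\mathbb{P}[P_n = 1] = p$, and that $(Q_1, Q_2, \ldots)$ is i.i.d., as is $(P_1, P_2, \ldots)$. Hence the law of $(Q_1, Q_2, \ldots)$ (resp., $(P_1, P_2, \ldots)$) is $\mathbb{P}_q[\cdot]$ (resp., $\mathbb{P}_p[\cdot]$). Note also that $(Q_1, Q_2, \ldots) \ge (P_1, P_2, \ldots)$, since $q > p$. Hence for any increasing event $A \subseteq \{0,1\}^{\mathbb{N}}$ it holds that $(P_1, P_2, \ldots) \in A$ implies $(Q_1, Q_2, \ldots) \in A$, and thus $\mathbb{P}_q[A] \ge \mathbb{P}_p[A]$.

The construction in this proof is an example of coupling. Formally, a coupling of two probability spaces $(\Omega, \mathcal{F}, \mathbb{P})$ and $(\Omega', \mathcal{F}', \mathbb{P}')$ is a probability space $(\Omega \times \Omega', \sigma(\mathcal{F} \times \mathcal{F}'), Q)$ such that the projections to the two coordinates push $Q$ forward to $\mathbb{P}$ and $\mathbb{P}'$.

We know that $\mathbb{P}_p[K_\infty] \in \{0,1\}$. Moreover, $\mathbb{P}_p[K_\infty]$ is weakly increasing in $p$, by Theorem 25.4, since $K_\infty$ is an increasing event. We are therefore interested in the critical percolation probability
$$p_c = \sup\{p : \mathbb{P}_p[K_\infty] = 0\}.$$
An interesting (and often hard) question is whether $\mathbb{P}_{p_c}[K_\infty]$ is zero or one.

Let $G$ be the infinite $k$-ary tree with root $o$. In this case we can calculate $p_c$. The event $\{o \leftrightarrow \infty\}$ can be thought of as the event that the Galton-Watson tree with children distribution $B(k, p)$ is infinite. We know that this happens with positive probability iff $p > 1/k$. Hence in this case $p_c = 1/k$, and $\mathbb{P}_{p_c}[K_\infty] = 0$.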
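The last paragraph can be checked numerically. We use the standard fact that the extinction probability of a Galton-Watson tree is the smallest fixed point in $[0,1]$ of the offspring generating function, here $g(s) = (1 - p + ps)^k$, and that iterating $g$ from $s = 0$ converges to it. The resulting survival probability, $\mathbb{P}_p[o \leftrightarrow \infty]$ on the $k$-ary tree, is $0$ for $p \le 1/k$ and positive for $p > 1/k$. At $p = 1/k$ the convergence is slow, of order $1/\text{iterations}$. The function name and iteration count are our own choices.

```python
def gw_survival(k, p, iterations=100_000):
    """Survival probability of a Galton-Watson tree with Binomial(k, p) offspring,
    i.e. P_p[o <-> infinity] on the infinite k-ary tree.

    Iterates the offspring generating function g(s) = (1 - p + p*s)^k from s = 0; the
    iterates increase to the extinction probability (the smallest fixed point of g)."""
    s = 0.0
    for _ in range(iterations):
        s = (1 - p + p * s) ** k
    return 1 - s
```

For example, with $k = 2$ and $p = 3/4$ the fixed-point equation has roots $1$ and $1/9$, so the survival probability is $8/9$.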

26. Large deviations

Let $(X_1, X_2, \ldots)$ be i.i.d. real random variables. Denote $X = X_1$ and let $\mu = \mathbb{E}[X]$. Let
$$Y_n = \frac{1}{n}\sum_{k=1}^n X_k.$$
By the law of large numbers we expect that $Y_n$ should be close to $\mu$ for large $n$. What is the probability that it is larger than some $\eta > \mu$? We already proved the Chernoff bound, which is an upper bound on this probability. We here prove an asymptotically matching lower bound.

Recall that the moment generating function of $X$ is
$$M(t) = \mathbb{E}[e^{tX}],$$
and that its cumulant generating function is
$$K(t) = \log M(t) = \log\mathbb{E}[e^{tX}].$$
Of course, these may be infinite for some $t$. Let $I$, the domain of both, be the set on which they are finite, and note that $0 \in I$.

Claim 26.1. $I$ is an interval, and $K$ is convex on $I$.

For the proof of this claim we will need Hölder's inequality. For $p \in [1,\infty]$ and a real r.v. $X$ denote
$$\|X\|_p = \mathbb{E}[|X|^p]^{1/p},$$
with $\|X\|_\infty$ the essential supremum of $|X|$.

Lemma 26.2 (Hölder's inequality). For any $p, q \in [1,\infty]$ with $1/p + 1/q = 1$ and r.v.s $X, Y$ it holds that
$$\|X \cdot Y\|_1 \le \|X\|_p\,\|Y\|_q.$$

Exercise 26.3. Prove Hölder's inequality. Hint: use Young's inequality, which states that for every real $x, y \ge 0$ and $p, q > 1$ with $1/p + 1/q = 1$ it holds that
$$xy \le \frac{x^p}{p} + \frac{y^q}{q}.$$

Proof of Claim 26.1. Assume $a, b \in I$. Then for any $r \in (0,1)$
$$K(ra + (1-r)b) = \log\mathbb{E}\left[e^{(ra+(1-r)b)X}\right] = \log\mathbb{E}\left[(e^{aX})^r\,(e^{bX})^{1-r}\right].$$

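By Hölder's inequality with $p = 1/r$ and $q = 1/(1-r)$,
$$K(ra + (1-r)b) \le \log\mathbb{E}\left[\left((e^{aX})^r\right)^{1/r}\right]^r + \log\mathbb{E}\left[\left((e^{bX})^{1-r}\right)^{1/(1-r)}\right]^{1-r} = \log\mathbb{E}[e^{aX}]^r + \log\mathbb{E}[e^{bX}]^{1-r} = r\log\mathbb{E}[e^{aX}] + (1-r)\log\mathbb{E}[e^{bX}] = rK(a) + (1-r)K(b).$$
Since $M > 0$, $K$ never takes the value $-\infty$. It follows that $K$ is finite at $ra + (1-r)b$, and thus $I$ is an interval on which $K$ is convex.

Applying the Dominated Convergence Theorem inductively, one can show that $M$ and $K$ are smooth (i.e., infinitely differentiable) on the interior of $I$. Let the Legendre transform of $K$ be given by
$$K^*(\eta) = \sup_{t>0}\,(t\eta - K(t)).$$
It turns out that the fact that $K$ is smooth and convex implies that $K^*$ is also smooth and convex. Therefore, if the supremum in this definition is attained at some $t$, then $K'(t) = \eta$. Conversely, if $K'(t) = \eta$ for some $t$, then this $t$ is unique and $K^*(\eta) = t\eta - K(t)$.

Theorem 26.4 (Chernoff bound).
$$\mathbb{P}[Y_n \ge \eta] \le e^{-nK^*(\eta)}.$$

Proof. For any $t \ge 0$, by Markov's inequality,
$$\mathbb{P}[Y_n \ge \eta] \le \mathbb{P}[tnY_n \ge tn\eta] = \mathbb{P}\left[e^{t\sum_k X_k} \ge e^{tn\eta}\right] \le \frac{\mathbb{E}\left[e^{t\sum_k X_k}\right]}{e^{tn\eta}} = e^{-n(t\eta - K(t))}.$$
Optimizing over $t$ yields the claim.

Theorem 26.5. If $\eta = K'(t)$ for some $t$ in the interior of $I$ then
$$\mathbb{P}[Y_n \ge \eta] = e^{-nK^*(\eta) + o(n)}.$$

Proof. One side is given by the Chernoff bound. It thus remains to prove the lower bound. Let $Z_n = \sum_{k=1}^n X_k$. We want to prove that
$$\mathbb{P}[Z_n \ge \eta n] \ge e^{-nK^*(\eta) + o(n)}.$$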

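Denote the law of $X$ by $\nu$, and for a $t \in I$ (to be determined later) define $\tilde\nu$ by
$$\frac{d\tilde\nu}{d\nu}(x) = \frac{e^{tx}}{\mathbb{E}[e^{tX}]} = e^{tx - K(t)}.$$
Let $(\tilde X_1, \tilde X_2, \ldots)$ be i.i.d. with law $\tilde\nu$, and let $\tilde Z_n = \sum_{k=1}^n \tilde X_k$. The law of $Z_n$ is $\nu^{(n)}$, the $n$-fold convolution of $\nu$, and likewise the law of $\tilde Z_n$ is $\tilde\nu^{(n)}$. We claim that
$$\frac{d\tilde\nu^{(n)}}{d\nu^{(n)}}(x) = \frac{e^{tx}}{\mathbb{E}[e^{tX}]^n} = e^{tx - nK(t)}.$$
This is left as an exercise. Note also that
$$\mathbb{E}[\tilde X] = \frac{\mathbb{E}[Xe^{tX}]}{\mathbb{E}[e^{tX}]} = K'(t).$$
Now, for any $\eta' > \eta$ and any $t \ge 0$,
$$\mathbb{P}[Z_n \ge \eta n] \ge \mathbb{P}[\eta n \le Z_n \le \eta' n] = \int_{\eta n}^{\eta' n} 1\,d\nu^{(n)}(x) = e^{nK(t)}\int_{\eta n}^{\eta' n} e^{-tx}\,d\tilde\nu^{(n)}(x) \ge e^{-n(t\eta' - K(t))}\int_{\eta n}^{\eta' n} d\tilde\nu^{(n)}(x) = e^{-n(t\eta' - K(t))}\,\mathbb{P}\left[\eta \le \frac{\tilde Z_n}{n} \le \eta'\right].$$
Recall that $\mathbb{E}[\tilde X] = K'(t)$. Choose $t$ so that $\eta' > K'(t) > \eta$, which we can do by the hypothesis of the theorem and the smoothness of $K$. Such a $t$ is positive, since $K'$ is non-decreasing and $K'(0) = \mu < \eta$. Then, by the law of large numbers,
$$\mathbb{P}\left[\eta \le \frac{\tilde Z_n}{n} \le \eta'\right] \to 1,$$
and so
$$\liminf_{n\to\infty}\frac{1}{n}\log\mathbb{P}[Z_n \ge \eta n] \ge -(t\eta' - K(t)).$$
This holds for any $\eta' > \eta$ and any $t$ with $\eta' > K'(t) > \eta$. Taking limits, it also holds for $\eta' = \eta$ and $t$ such that $K'(t) = \eta$. So
$$\liminf_{n\to\infty}\frac{1}{n}\log\mathbb{P}[Z_n \ge \eta n] \ge -(t\eta - K(t)),$$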

or
$$\mathbb{P}[Z_n \ge \eta n] \ge e^{-n(t\eta - K(t)) + o(n)}.$$
Finally, $K$ is convex and smooth and $K'(t) = \eta$, so $t$ is the maximizer of $z\eta - K(z)$. Thus $t\eta - K(t) = K^*(\eta)$, and
$$\mathbb{P}[Z_n \ge \eta n] \ge e^{-nK^*(\eta) + o(n)}.$$
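The following sketch illustrates this section for a fair coin $X \in \{0,1\}$, an example of our own choosing; the function names are hypothetical. It does four things:
- evaluate $K^*(\eta) = \sup_{t>0}(t\eta - K(t))$ numerically by ternary search on the concave map $t \mapsto t\eta - K(t)$;
- compare the result with the closed form $\log 2 + \eta\log\eta + (1-\eta)\log(1-\eta)$ for $\eta \in (1/2, 1)$;
- check the Chernoff bound against the exact binomial tail;
- observe that $-\frac{1}{n}\log\mathbb{P}[Y_n \ge \eta]$ approaches $K^*(\eta)$, as Theorem 26.5 predicts.

It also checks that the tilted law, which here is Bernoulli($e^t/(1+e^t)$), has mean $K'(t) = \eta$ when $t = \log(\eta/(1-\eta))$. The search range $t \in (0, 50]$ is an assumption that suffices for this example.

```python
import math


def K(t):
    """Cumulant generating function log E[e^{tX}] of a fair coin X in {0, 1}."""
    return math.log((1 + math.exp(t)) / 2)


def legendre(eta, t_max=50.0, steps=200):
    """K*(eta) = sup_{t > 0} (t*eta - K(t)), by ternary search; t -> t*eta - K(t) is concave."""
    lo, hi = 0.0, t_max
    for _ in range(steps):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if m1 * eta - K(m1) < m2 * eta - K(m2):
            lo = m1  # the maximum lies in [m1, hi]
        else:
            hi = m2  # the maximum lies in [lo, m2]
    t = (lo + hi) / 2
    return t * eta - K(t)


def log_tail(n, eta):
    """Exact log P[Y_n >= eta], where Y_n is the mean of n fair coins."""
    count = sum(math.comb(n, j) for j in range(math.ceil(eta * n), n + 1))
    return math.log(count) - n * math.log(2)


def tilted_mean(t):
    """Mean of the tilted law d(nu~)/d(nu)(x) = e^{tx - K(t)} for a fair coin: e^t / (1 + e^t)."""
    return math.exp(t) / (1 + math.exp(t))
```

For $\eta = 0.7$ the gap $-\frac{1}{n}\log\mathbb{P}[Y_n \ge \eta] - K^*(\eta)$ is positive (by the Chernoff bound) and shrinks roughly like $(\log n)/n$.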

27. The mass transport principle

Let $G = (V, E)$ be a locally finite, countable graph. A graph automorphism is a bijection $f \colon V \to V$ such that $(v, w) \in E$ iff $(f(v), f(w)) \in E$. The automorphisms of a graph form the group $\mathrm{Aut}(G)$ under composition. We say that $G$ is transitive if its automorphism group acts on it transitively. That is, if for all $v, w \in V$ there is a graph automorphism $f$ s.t. $f(v) = w$. Intuitively, this means that the geometry of the graph looks the same from the point of view of every vertex.

An important example is when $\Gamma$ is a group finitely generated by a symmetric finite subset $S$, and $G = (V, E)$ with $V = \Gamma$ is the corresponding Cayley graph. In this case it is easy to see that the action of $\Gamma$ on itself by left multiplication is an action by graph automorphisms, which is furthermore already transitive. We will restrict our discussion to this setting, even though it all extends to unimodular transitive graphs; these are graphs with a unimodular automorphism group.

A map $f \colon \Gamma \times \Gamma \to [0,\infty)$ is a mass transport if it is invariant under the diagonal $\Gamma$-action: $f(h, k) = f(gh, gk)$ for all $g, h, k \in \Gamma$. It is useful to think about $f$ as indicating how much mass is passed from $h$ to $k$. The amount passed can depend on the identities of $h$ and $k$, but in a way that (in some sense) only depends on the geometry of the graph and not on their names.

Theorem 27.1 (Mass Transport Principle for Groups). For every mass transport $f \colon \Gamma \times \Gamma \to [0,\infty)$ and $g \in \Gamma$ it holds that
$$\sum_{k\in\Gamma} f(g, k) = \sum_{k\in\Gamma} f(k, g).$$
That is, the total mass flowing out of $g$ is equal to the total mass flowing in.

Proof. By invariance (multiplying both arguments on the left by $k^{-1}$)
$$\sum_{k\in\Gamma} f(g,k) = \sum_{k\in\Gamma} f(k^{-1}g, e).$$
Changing variables to $h = k^{-1}g$ yields
$$= \sum_{h\in\Gamma} f(h, e).$$
Applying invariance again yields
$$= \sum_{h\in\Gamma} f(gh, g),$$

and again changing variables to $k = gh$ yields the desired result.

Figure 1. Spanning forest.

As an application, consider the following random subgraph $\mathcal{E}$ of the standard Cayley graph of $\mathbb{Z}^2$. For each $z, w_1, w_2 \in \mathbb{Z}^2$ such that $w_1 = z + (0,1)$ and $w_2 = z + (1,0)$, we make an independent choice. With probability $1/2$ we set $(z, w_1) \in \mathcal{E}$ and $(z, w_2) \notin \mathcal{E}$; with probability $1/2$ we set $(z, w_1) \notin \mathcal{E}$ and $(z, w_2) \in \mathcal{E}$. For distinct $z, w \in \mathbb{Z}^2$, we say that $w$ is a descendant of $z$ (and $z$ is an ancestor of $w$) in $\mathcal{E}$ if there is a path between $w$ and $z$, and if $w \le z$ in both coordinates. Note that, by construction,
(1) $\mathcal{E}$ has no cycles, and each node is adjacent to at least one edge, and so $\mathcal{E}$ is a spanning forest.
(2) Each $w$ has infinitely many ancestors.
(3) If $w$ $z$ then the number of descendants of $w$ is independent of the number of descendants of $z$.

Proposition 27.2. The number of descendants of each $w \in \mathbb{Z}^2$ is almost surely finite, with infinite expectation.

Proof. Let $f(w, z)$ equal the probability that $z$ is an ancestor of $w$, and note that by the invariance of the definitions $f$ is a mass transport.
