Probability Theory and Stochastic Processes UAB. Paul Jung



Notation used in the text:

Throughout the text, $:=$ denotes a definition or "set equal to".
We use the symbol $\equiv$ to indicate two different notations for the same object or, in the case of functions, identically equal to a constant.
$\sqcup$ denotes a disjoint union.
Lebesgue measure is denoted by $m$, $m(dx)$, $dx$, or $dt$.
$\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, \infty\}$.
$\{X < x\} := \{\omega : X(\omega) < x\}$.
For events $A$ and $B$, we often use a comma for intersection: $\{A, B\} := \{A \cap B\}$.
For events $A, B$ or random variables $X, Y$, we denote independence by $A \perp\!\!\!\perp B$ or $X \perp\!\!\!\perp Y$.
$X \in \mathcal{F}$ denotes that $X$ is $\mathcal{F}$-measurable.
$\mathbb{N}_0 := \mathbb{N} \cup \{0\}$.
u.i. stands for uniformly integrable.
i.i.d. stands for independent and identically distributed.
$\mu_n \Rightarrow \mu$ denotes weak convergence of distributions.
$a_n \sim b_n$ denotes $\lim_{n \to \infty} a_n / b_n = 1$.

Contents

Measure Theory
Random Variables
Expectation
Distributions
Bernoulli's Laws of Large Numbers
Independence and Convolution
Weak Law of Large Numbers
Strong Law of Large Numbers
Uniform Integrability and the $L^1$ Law of Large Numbers
The Ergodic Theorem
Conditional Expectation
Stationary Sequences
Birkhoff's Ergodic Theorem
The Central Limit Theorem
Convergence in Distribution
Characteristic Functions
The Central Limit Theorem
The Moment Method
Bochner's Theorem
The Law of Small Numbers
Poisson Convergence
Poisson Processes
Bernstein's Block Method
Random Walk
Recurrence and Transience
Stopping Times
The Markov and Martingale Properties
Large Deviations
Brownian Motion
Construction of the Process
Properties of Brownian Motion

1 Measure Theory

We will begin with a brief review of measure theory. As this is meant only for review, many proofs will be omitted, but can be found in [RF10] or [Rud87].

Definition 1.1. Given a set $\Omega$, a σ-algebra or σ-field is a subset $\mathcal{F} \subseteq \mathcal{P}(\Omega)$ (the power set of $\Omega$) which satisfies
(i) $\emptyset \in \mathcal{F}$ (thus $\mathcal{F}$ is nonempty),
(ii) $A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$,
(iii) $A_n \in \mathcal{F}$ for all $n \in \mathbb{N}$ implies that $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}$.

Definition 1.2. A measure is a nonnegative function $\mu : \mathcal{F} \to [0, \infty]$ which satisfies
(i) $\mu(\emptyset) = 0$,
(ii) countable additivity: $\sum_{n \in \mathbb{N}} \mu(A_n) = \mu\left(\bigsqcup_{n \in \mathbb{N}} A_n\right)$, where $\sqcup$ denotes the disjoint union.

Definition 1.3. A measurable space is a pair $(\Omega, \mathcal{F})$, and a measure space is a triplet $(\Omega, \mathcal{F}, \mu)$.

Example 1.4 (Lebesgue measure). Let $\mathcal{B}$ be the Borel σ-field on $\Omega = \mathbb{R}$, i.e., the smallest σ-field containing all open sets. Thus, any open set $G$ is in $\mathcal{B}$, and any closed set $F$ is in $\mathcal{B}$. Also, $G_\delta \in \mathcal{B}$ and $F_\sigma \in \mathcal{B}$, where $G_\delta$ is a countable intersection of open sets and $F_\sigma$ is a countable union of closed sets. Furthermore, countable unions of $G_\delta$ sets form the collection $G_{\delta\sigma} := (G_\delta)_\sigma$ and countable intersections of $F_\sigma$ sets form $F_{\sigma\delta}$. This procedure can be recursively applied to get all of $\mathcal{B}$.

Let $m(\cdot)$ be the measure which assigns to any open interval its length, $m((a, b)) = b - a$. This is Lebesgue measure, and $(\mathbb{R}, \mathcal{B}, m)$ is a measure space. One can extend the σ-field to include sets which are not in $\mathcal{B}$, but such a discussion is better left for a course in real analysis.

One can also restrict Lebesgue measure to obtain a measure space of the form $(\mathbb{R}, \sigma(\{A\}), m)$. Here $\sigma(\{\cdot\})$ is the smallest σ-field containing $\{\cdot\}$, e.g. $\sigma(\{A\}) = \{\emptyset, A, A^c, \Omega\}$.

Example 1.5 (Counting measure). Let $\Omega = \mathbb{Z}$. We can define the counting measure space $(\mathbb{Z}, \mathcal{P}(\mathbb{Z}), \mu)$ where, for $A \in \mathcal{P}(\mathbb{Z})$, $\mu(A) := \#\{x \in \mathbb{Z} : x \in A\}$. Again, we can also define measures by restricting the σ-field, for example $(\mathbb{Z}, \{\emptyset, A, A^c, \mathbb{Z}\}, \mu)$.

Definition 1.6. The norm of a measure $\mu$, denoted $\|\mu\|$, is $\mu(\Omega)$. If $\|\mu\| < \infty$, this corresponds to a true norm for the vector space of finite signed measures on $\Omega$.
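On a finite set, the conditions of Definition 1.1 can be checked mechanically, since countable unions reduce to finite ones. The following is a minimal sketch under that assumption; the helper names `is_sigma_field` and `counting_measure` are illustrative and do not come from the text.

```python
from itertools import combinations


def is_sigma_field(omega, family):
    """Check Definition 1.1 on a FINITE set omega (assumption of this sketch)."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # (i) the empty set belongs to the family
    if frozenset() not in family:
        return False
    # (ii) closed under complements
    if any(omega - a not in family for a in family):
        return False
    # (iii) closed under pairwise unions; on a finite family this
    # gives closure under all (finite = countable) unions by induction
    return all(a | b in family for a, b in combinations(family, 2))


def counting_measure(a):
    """Counting measure of a finite set, as in Example 1.5."""
    return len(a)


omega = {1, 2, 3, 4}
A = {1, 2}
sigma_A = [set(), A, omega - A, omega]  # sigma({A}) from Example 1.4
not_closed = [set(), A, omega]          # missing the complement of A
```

Here `is_sigma_field(omega, sigma_A)` holds while `is_sigma_field(omega, not_closed)` fails, and the counting measure is additive over the disjoint pieces $A$ and $A^c$.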

1.1 Random Variables

Definition 1.7. A probability space is a measure space where the norm of the measure is 1. We generally denote probability spaces as $(\Omega, \mathcal{F}, P)$. Any element $\omega \in \Omega$ is called an outcome and any $A \in \mathcal{F}$ is called an event. The set $\Omega$ is often called the sample space.

Any measure $\mu$ with a nonzero, finite norm can be made into a probability measure by normalizing, i.e., scaling the measure by $\|\mu\|^{-1}$.

Example 1.8. Consider the set of all possible results from $n$ fair coin tosses which result in H or T. Let $\Omega = \{H, T\}^n$, $\mathcal{F} = \mathcal{P}(\Omega)$, and $P(A) = \#A / 2^n$. Then for $A = \{\text{exactly one } T\}$ we have $P(A) = n / 2^n$. Note that $P$ is a normalized counting measure.

Example 1.9. If we count the number of H's appearing in each $\omega \in \Omega$ of the previous example and identify (or merge into a single element) all elements with the same number of H's, then we get the sample space $\tilde{\Omega} = \{0, 1, \ldots, n\}$ with $\tilde{\mathcal{F}} = \mathcal{P}(\tilde{\Omega})$. One can then check that
\[ \tilde{P}(A) = \sum_{k \in A} \binom{n}{k} \frac{1}{2^n} \]
is the measure induced by $P$ under this identification.

Example 1.10. Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}([0, 1])$, and $P(A) = m(A)$ (Lebesgue measure). This can be thought of as the measure corresponding to infinitely many fair coin tosses, with $\omega \in \Omega$ given by $\omega = (\omega_1, \omega_2, \ldots)$ and $\omega_i = 0$ or $1$, with the association that $H = 1$ and $T = 0$. In other words, we use the dyadic expansion $\omega = \sum_{i=1}^{\infty} \omega_i 2^{-i}$ of $\omega \in [0, 1]$.

Definition 1.11. Given two measurable spaces $(\Omega, \mathcal{F})$ and $(\mathbb{R}, \mathcal{B})$, we say that $X : \Omega \to \mathbb{R}$ is an $\mathcal{F}$-measurable function if $X^{-1}(B) \in \mathcal{F}$ for all $B \in \mathcal{B}$. We often just say measurable when our choice of $\mathcal{F}$ is implicit. Probabilists call measurable functions on $(\Omega, \mathcal{F}, P)$ random variables.

Remarks 1.12.
1. It is important to look at inverse images of Borel sets from the range of $X$ and not forward images of $A \in \mathcal{F}$ from the domain of $X$. In general, the image of a measurable function $X$ may not be measurable with respect to the Borel σ-field on $\mathbb{R}$. For instance, take $\Omega \subset \mathbb{R}$ such that $\Omega \notin \mathcal{B}$ and consider the σ-field $\mathcal{F} = \{\Omega \cap B : B \in \mathcal{B}\}$. Then the identity map restricted to $\Omega$ (or inclusion map $X : \Omega \to \mathbb{R}$) is $\mathcal{F}$-measurable, but the forward image of $\Omega$ (under the map $X$) is $\Omega$, which is not a Borel set.

2. Since $\{(-\infty, c), c \in \mathbb{R}\}$, or alternatively $\{(-\infty, c], c \in \mathbb{R}\}$, generate the Borel sets, the measurability condition is equivalent to $X^{-1}((-\infty, c)) \in \mathcal{F}$ for all $c \in \mathbb{R}$, or alternatively $X^{-1}((-\infty, c]) \in \mathcal{F}$ for all $c \in \mathbb{R}$.

Exercise 1.1. If $X$ and $Y$ are random variables, show that $X + Y$ and $XY$ are random variables.

Example 1.13 (Binomial and Bernoulli random variables). The probability space described in Example 1.9 is a canonical one for the following important random variable. A random variable which counts the number of heads out of $n$ independent coin tosses is called a binomial random variable. In general, the coin tosses come up heads with probability $0 < p < 1$, in which case a binomial random variable $X \sim \mathrm{Bin}(n, p)$ is described by the probabilities
\[ P(\{\omega : X(\omega) = k\}) = \binom{n}{k} p^k (1 - p)^{n - k}. \]
The special instance $n = 1$ is called a Bernoulli random variable, in which case we write $X \sim \mathrm{Ber}(p)$.

Example 1.14 (Geometric random variables). Consider a probability space similar to Example 1.10, except that we toss an infinite number of coins which may not be fair. Suppose each coin comes up heads independently with probability $0 < p < 1$. A random variable $X$ which counts the number of tosses required in order to get our first H is called a geometric random variable, and we write $X \sim \mathrm{Geom}(p)$. It is described by the probabilities
\[ P(\{\omega : X(\omega) = k\}) = p (1 - p)^{k - 1}. \]

Definition 1.15. If $X : (\Omega, \mathcal{F}) \to (\overline{\mathbb{R}}, \mathcal{B}(\overline{\mathbb{R}}))$ is measurable, then we say that $X$ is an extended random variable (here, $\overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$). If $X : (\Omega, \mathcal{F}) \to (\mathbb{R}^n, \mathcal{B}^n)$ is measurable, then we say that $X$ is a random vector ($\mathcal{B}^n$ is just the Borel σ-field on $\mathbb{R}^n$).

Proposition 1.16. For a countable set of random variables $\{X_n, n \in \mathbb{N}\}$, we have that $\inf_{n \in \mathbb{N}} X_n$, $\sup_{n \in \mathbb{N}} X_n$, $\liminf_{n \in \mathbb{N}} X_n$ and $\limsup_{n \in \mathbb{N}} X_n$ are all extended random variables.

Proof. We shall only prove the first claim. We begin by noting that
\[ \{\omega : \inf_{n \in \mathbb{N}} X_n(\omega) < a\} = \bigcup_{n \in \mathbb{N}} \{\omega : X_n(\omega) < a\}. \tag{1.1} \]
Each set $\{\omega : X_n(\omega) < a\} \in \mathcal{F}$ since $X_n$ is measurable by assumption. Thus, $\bigcup_{n \in \mathbb{N}} \{\omega : X_n(\omega) < a\} \in \mathcal{F}$, as it is a countable union of measurable sets.
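The binomial and geometric probabilities of Examples 1.13 and 1.14 can be checked with exact rational arithmetic. This is a small sketch; the function names and the choice $p = 1/3$, $n = 5$ are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p), as in Example 1.13."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def geometric_pmf(p, k):
    """P(X = k) for X ~ Geom(p), k = 1, 2, ..., as in Example 1.14."""
    return p * (1 - p) ** (k - 1)


p = Fraction(1, 3)
n = 5

# The binomial probabilities over k = 0, ..., n sum to one.
binom_total = sum(binomial_pmf(n, p, k) for k in range(n + 1))

# A partial geometric sum equals 1 - (1 - p)^K, which tends to one.
K = 50
geom_partial = sum(geometric_pmf(p, k) for k in range(1, K + 1))
```

With a fair coin, `binomial_pmf(n, Fraction(1, 2), k)` reduces to $\binom{n}{k} 2^{-n}$, the induced measure of Example 1.9.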

Example 1.17. It is important that the index set of $\inf_{n \in \mathbb{N}} X_n$ is countable. Consider the probability space $([0, 1], \mathcal{B}, m)$, a fixed subset $A \subset [0, 1]$, and the random variables
\[ X_t(\omega) = \begin{cases} 0, & \text{if } t = \omega \text{ and } \omega \in A^c \\ 1, & \text{otherwise.} \end{cases} \]
Then $\inf_{t \in [0,1]} X_t(\omega) = 1_A(\omega)$, which is measurable only if $A \in \mathcal{B}$.

1.2 Expectation

In the rest of this section we consider general measure spaces $(E, \mathcal{F}, \mu)$ which are finite (unless otherwise stated), so that in particular $\mu(E) < \infty$.

Definition 1.18. A measurable function $\varphi(x)$ is simple if $\varphi$ takes finitely many values. A simple function $\varphi$ is called an indicator function if $\varphi(x) \in \{0, 1\}$ for all $x \in E$. We write
\[ 1_A(x) = \begin{cases} 1, & x \in A \\ 0, & x \in A^c. \end{cases} \]

Definition 1.19 (Integral of simple functions). If $\varphi : E \to \mathbb{R}$ is simple and takes values $\{a_1, \ldots, a_n\}$, then we define its Lebesgue integral as
\[ \int_E \varphi \, d\mu \equiv \int_E \varphi \, \mu(dx) := \sum_{j=1}^{n} a_j \, \mu(\varphi^{-1}(a_j)). \tag{1.2} \]

Example 1.20. Let $A$ be a measurable subset of $E$. Then
\[ \int_E 1_A \, d\mu = 1 \cdot \mu(A) + 0 \cdot \mu(A^c) = \mu(A). \]

The following lemma is what makes the Lebesgue integral work and what gives it power.

Lemma 1.21 (Simple Approximation Lemma). A function $f$ is measurable if and only if there exists a sequence of simple functions $(\varphi_n, n \in \mathbb{N})$ such that $\varphi_n(x) \to f(x)$ for all $x \in E$. Moreover, this sequence can be chosen so that $|\varphi_n| \le |f|$ for all $n$. If $f$ is nonnegative, the sequence $(\varphi_n, n \in \mathbb{N})$ can be chosen to be nondecreasing.

Definition 1.22 (Integral of measurable functions). We proceed in three steps, by defining the integral first for bounded and nonnegative functions, then for nonnegative functions, and finally for general measurable functions.

(i) If $f \ge 0$ is a bounded, measurable function on $E$, we define
\[ \int_E f \, d\mu := \sup \left\{ \int_E \varphi \, d\mu : \varphi \text{ is simple and } \varphi \le f \right\}. \tag{1.3} \]
(ii) If $g \ge 0$ is measurable on $E$, we define
\[ \int_E g \, d\mu := \sup \left\{ \int_E f \, d\mu : f \text{ is a bounded, measurable function and } 0 \le f \le g \right\}. \]
Note that $\int_E g \, d\mu = \infty$ is possible.
(iii) If $g$ is measurable, then there exist measurable functions $g_1$ and $g_2$ such that $g_1 \ge 0$, $g_2 \le 0$, and $g = g_1 + g_2$. For example, let $A = g^{-1}([0, \infty))$. Then $A$ is measurable and $g = 1_A g + 1_{A^c} g$. Using this decomposition, we can then define
\[ \int_E g \, d\mu := \int_E g_1 \, d\mu - \int_E (-g_2) \, d\mu \tag{1.4} \]
whenever (at least) one of the integrals on the right is finite. If both integrals on the right are infinite, the integral is undefined.

Exercise 1.2. For bounded, measurable $f$,
\[ \int_E f \, d\mu = \inf \left\{ \int_E \varphi \, d\mu : \varphi \text{ is simple and } \varphi \ge f \right\}. \]

Exercise 1.3. If $(E, \mu) = ([0, 1], m)$ and $f$ is bounded and Riemann integrable, then
\[ \int_E f \, m(dx) = \int_0^1 f \, dx. \]

Henceforth, all integrals are with respect to measures, and $dx$ or $dt$ will denote Lebesgue measure (we also continue to sometimes use $m(dx)$ to denote Lebesgue measure).

Theorem 1.23 (Linearity and monotonicity). If $f$ and $g$ are measurable functions on $(E, \mu)$, then
\[ \int_E (af + bg) \, d\mu = a \int_E f \, d\mu + b \int_E g \, d\mu. \]
If $f \le g$, then
\[ \int_E f \, d\mu \le \int_E g \, d\mu. \]

In a typical course in measure theory, one would prove linearity and monotonicity in three steps for measurable $f$ and $g$: first for bounded functions, next for nonnegative functions, and finally for general measurable functions.
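Formula (1.2) can be tried out on a finite measure space, where every function is simple and a measure on the power set is determined by the masses of single points. The sketch below assumes that setting; the name `integrate_simple` and the dictionary representation are illustrative choices.

```python
from fractions import Fraction


def integrate_simple(phi, mu):
    """Integral of a simple function via (1.2).

    phi: dict mapping each point of E to the value phi(x).
    mu:  dict mapping each point of E to the mass mu({x}).
    """
    total = 0
    for a in set(phi.values()):
        # mu(phi^{-1}(a)): total mass of the preimage of the value a
        preimage_mass = sum(mu[x] for x in phi if phi[x] == a)
        total += a * preimage_mass
    return total


E = range(1, 7)
mu = {x: Fraction(1, 6) for x in E}          # normalized counting measure
evens = {x: 1 if x % 2 == 0 else 0 for x in E}  # indicator of {2, 4, 6}
ident = {x: x for x in E}                    # phi(x) = x

# Linear combination 2 * evens + 3 * ident, to compare with Theorem 1.23
combo = {x: 2 * evens[x] + 3 * ident[x] for x in E}
```

The integral of the indicator `evens` returns $\mu(\{2, 4, 6\}) = 1/2$, as in Example 1.20, and the integral of `combo` agrees with the linear combination of the two separate integrals.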

Corollary 1.24 (Triangle Type Inequality). If $f$ is measurable, then
\[ \left| \int_E f \, d\mu \right| \le \int_E |f| \, d\mu. \]

Proof. Since $-|f| \le f \le |f|$, we have by monotonicity that
\[ -\int_E |f| \, d\mu \le \int_E f \, d\mu \le \int_E |f| \, d\mu. \]

Definition 1.25 (Integral over subsets). For any measurable subset $A \subseteq E$, we set
\[ \int_A f \, d\mu := \int_E f 1_A \, d\mu. \]

Proposition 1.26. If $E = \bigsqcup_{n=1}^{\infty} A_n$, then
\[ \int_E f \, d\mu = \sum_{n=1}^{\infty} \int_{A_n} f \, d\mu. \]

Proof. This follows from linearity of the integral and the countable additivity of $\mu$.

Definition 1.27. We say that $f(x) = g(x)$ almost everywhere on $E$ with respect to the measure $\mu$, denoted a.e., if $\mu(\{x : f(x) \ne g(x)\}) = 0$. If $\|\mu\| = 1$, then we say that $f = g$ almost surely, denoted a.s., on $E$. We say that two sets $A$ and $B$ are equal a.e. or a.s. if their indicator functions are equal a.e. or a.s.

Remark 1.28. It is often the case in both probability theory and measure theory that one refers to a function $f$ when really one means the equivalence class of functions which are equal to $f$ a.s. or a.e., since typically one only wants to know things up to measure zero.

Corollary 1.29. If $f = g$ a.e., then $\int_E f \, d\mu = \int_E g \, d\mu$.

Proof. Let $A = \{x : f(x) = g(x)\}$. Then
\[ \int_E f \, d\mu = \int_A f \, d\mu + \int_{A^c} f \, d\mu = \int_A g \, d\mu + \int_{A^c} f \, d\mu = \int_A g \, d\mu = \int_A g \, d\mu + \int_{A^c} g \, d\mu = \int_E g \, d\mu. \]

Definition 1.30 (Convergence almost everywhere, almost surely). We say $(f_n, n \in \mathbb{N})$ converges almost everywhere to $f$, and write $f_n \xrightarrow{\text{a.e.}} f$, whenever it converges pointwise except on a set of measure zero:
\[ \mu(\{x : \lim_{n \to \infty} f_n(x) \ne f(x)\}) = 0. \]
If $\mu$ is a probability measure, we say almost surely and write $f_n \xrightarrow{\text{a.s.}} f$.

Theorem 1.31 (Bounded Convergence Theorem). If $(f_n, n \in \mathbb{N})$ is a sequence of measurable functions for which there exists an $M \in \mathbb{R}$ such that $|f_n| \le M$ for all $n$, and $f_n \xrightarrow{\text{a.e.}} f$, then
\[ \int_E f \, d\mu = \lim_{n \to \infty} \int_E f_n \, d\mu. \]

It is crucial in the above theorem that $(E, \mathcal{F}, \mu)$ is a finite measure space. This is not so for the next three results, but for consistency let us continue to assume the setting of finite measure spaces.

Lemma 1.32 (Fatou's Lemma). If $(f_n, n \in \mathbb{N})$ is a sequence of nonnegative, measurable functions on $E$, then
\[ \int_E \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int_E f_n \, d\mu. \]

Theorem 1.33 (Monotone Convergence Theorem). If $(f_n, n \in \mathbb{N})$ is a sequence of nonnegative measurable functions on $E$ such that $f_n \uparrow f$ a.e., then
\[ \int_E f \, d\mu = \lim_{n \to \infty} \int_E f_n \, d\mu. \]

Theorem 1.34 (Dominated Convergence Theorem). Suppose $(f_n, n \in \mathbb{N})$ is a sequence of measurable functions on $E$ for which $|f_n| \le g$ for some $g$ such that $\int_E g \, d\mu < \infty$. If $f_n \xrightarrow{\text{a.e.}} f$, then
\[ \lim_{n \to \infty} \int_E f_n \, d\mu = \int_E f \, d\mu. \]

Definition 1.35. If $\|\mu\| = 1$, then we call the integral (if it exists) of a random variable $X$ the expectation of $X$, denoted
\[ EX := \int_E X(x) \, d\mu(x). \]
Our typical notation for a probability space is $(\Omega, \mathcal{F}, P)$, in which case the above becomes
\[ EX = \int_\Omega X(\omega) \, dP. \]

Exercise 1.4. Show that if $g : \mathbb{R} \to \mathbb{R}$ is a Borel measurable function and $X$ is a random variable, then $g(X)$ is a random variable. In particular, the following is well-defined:
\[ E(g(X)) = \int_\Omega g(X(\omega)) \, dP. \]

Having introduced the probabilist's notation for integration, in the rest of the section we shall couch some typical theorems from real analysis in this setting. First, however, let us introduce two more terms used in the language of probability.

Definition 1.36. Since $g(x) = x^p$ is a Borel measurable function, $g(X) = X^p$ is a random variable. The $p$th moment of $X$ is given by
\[ EX^p = \int_\Omega X^p(\omega) \, dP. \]

Definition 1.37. $E(X^2) - (EX)^2$ is called the variance of $X$, denoted $\operatorname{Var} X$. It is easy to see that $\operatorname{Var} X$ is also equal to $E(X - EX)^2$.

Because $EX$ and $\operatorname{Var} X$ play a central role, they are often denoted by $\mu = EX$ and $\sigma^2 = \operatorname{Var} X$. We acknowledge that $\mu$ is being used for several different objects, but the reader should be able to deduce the meaning in each case from context.

Theorem 1.38 (Hölder's Inequality). Suppose $p$ and $q$ are conjugate exponents, i.e.,
\[ \frac{1}{p} + \frac{1}{q} = 1, \quad 1 < p, q < \infty. \]
If $X$ and $Y$ are random variables, then
\[ E|XY| \le \|X\|_p \|Y\|_q, \]
where $\|X\|_p := (E|X|^p)^{1/p}$.

In the case where $p = q = 2$, Hölder's Inequality is just the Cauchy-Schwarz Inequality (one can check that $E(XY)$ defines an inner product between $X$ and $Y$).

Exercise 1.5 (Paley-Zygmund Inequality). Let $Y \ge 0$ with $EY^2 < \infty$. For $\theta \in [0, 1]$,
\[ P(Y > \theta EY) \ge (1 - \theta)^2 \frac{(EY)^2}{EY^2}. \]
Hint: Use Hölder's Inequality on the product $Y 1_{\{Y > \theta EY\}}$.

Exercise 1.6. Let $E|X|^k < \infty$. Then for $0 < j < k$ we have
\[ E|X|^j \le (E|X|^k)^{j/k} < \infty. \]

Exercise 1.7 (Minkowski's Inequality). For $p \ge 1$ and random variables $X$ and $Y$, show that
\[ \|X\|_p + \|Y\|_p \ge \|X + Y\|_p. \]

Minkowski's Inequality shows that $\|\cdot\|_p$ is a norm for the space of all random variables with finite $p$th moment. In fact, it turns out that this normed linear space is complete, thus making $L^p(\Omega)$ a Banach space. This motivates the following definition.

Definition 1.39. We say that a random variable $X$ is integrable with respect to $P$ if $E|X| < \infty$, and write $X \in L^1(\Omega) \equiv L^1(\Omega, \mathcal{F}, P)$. If $E|X|^p < \infty$, then we say $X \in L^p(\Omega)$.

Example 1.40. If $X \in L^p(\Omega)$ and $Y \in L^q(\Omega)$ where $p$ and $q$ are conjugate exponents, then by Hölder's Inequality, $XY \in L^1(\Omega)$.

Example 1.41. Let $Z$ be a random variable. Define random variables $Y \equiv 1$ and $X = |Z|^\alpha$ for $\alpha \ge 1$. Then by Hölder's Inequality we have that
\[ E|X \cdot 1| \le (E 1^q)^{1/q} (E|X|^p)^{1/p}. \]
Thus,
\[ (E|Z|^\alpha)^{1/\alpha} \le (E|Z|^{\alpha p})^{1/(\alpha p)} \quad \text{or} \quad \|Z\|_\alpha \le \|Z\|_{\alpha p}. \]
Thus, for $1 \le p < q < \infty$, we have that $\|Z\|_p \le \|Z\|_q$. In particular, if $X \in L^q(\Omega)$, then $X \in L^p(\Omega)$.

An immediate result of this fact is that for a random variable $X$, $\operatorname{Var} X < \infty$ implies $E|X| < \infty$ and thus $|EX| < \infty$.

Note: This sort of result does not hold for general measure spaces. It relies on the assumption that we are working on a probability (and thus finite) measure space.

Exercise 1.8. (a) Use summation-by-parts (the discrete analog of integration-by-parts) to show that when $X$ takes values in $\mathbb{N}$,
\[ EX = \sum_{n \in \mathbb{N}} P(\{\omega : X(\omega) \ge n\}). \]
(b) For a general random variable, use Example 1.41 to show also that $\operatorname{Var} X < \infty$ if and only if
\[ \sum_{n \in \mathbb{N}} E(|X| 1_{\{|X| > n\}}) < \infty. \]

Definition 1.42. A function $\varphi : \mathbb{R} \to \mathbb{R}$ is said to be convex on $\mathbb{R}$ if for all $x, y \in \mathbb{R}$ and $\lambda \in [0, 1]$,
\[ \varphi(\lambda x + (1 - \lambda) y) \le \lambda \varphi(x) + (1 - \lambda) \varphi(y). \]
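The monotonicity $\|Z\|_p \le \|Z\|_q$ from Example 1.41 and Hölder's Inequality (Theorem 1.38) can be observed numerically on a small finite probability space. The three-point space and the random variables `Z` and `W` below are made-up illustrations, not taken from the text.

```python
def lp_norm(values, probs, p):
    """(E|X|^p)^(1/p) for a random variable on a finite probability space."""
    return sum(abs(v) ** p * w for v, w in zip(values, probs)) ** (1.0 / p)


# Hypothetical three-point probability space with two random variables.
probs = [0.25, 0.25, 0.5]
Z = [-2.0, 0.0, 1.0]
W = [1.0, 3.0, -1.0]

# L^p norms of Z for increasing p; Example 1.41 predicts a nondecreasing list.
norms = [lp_norm(Z, probs, p) for p in (1, 2, 3, 4)]

# Hoelder with p = q = 2 (Cauchy-Schwarz): E|ZW| <= ||Z||_2 ||W||_2.
e_abs_zw = sum(abs(z * w) * q for z, w, q in zip(Z, W, probs))
holder_rhs = lp_norm(Z, probs, 2) * lp_norm(W, probs, 2)
```

The same check fails in general on an infinite measure space, which is the point of the note following Example 1.41.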

Theorem 1.43 (Jensen's Inequality). If $\varphi : \mathbb{R} \to \mathbb{R}$ is a convex function, then
\[ E(\varphi(X)) \ge \varphi(EX), \]
provided that both are finite.

Proof. Consider a line $l(x) = ax + b$ satisfying $l(x) \le \varphi(x)$ and $l(EX) = \varphi(EX)$. For convex functions, such a line always exists. Then $\varphi(X) \ge aX + b$; thus by monotonicity and linearity of expectation,
\[ E\varphi(X) \ge E(aX + b) = aEX + b = l(EX) = \varphi(EX). \]

Example 1.44. Let $\varphi(t) = t^p$ for $t \ge 0$ and $p \ge 1$. Then $\varphi$ is a convex function. Let $Z$ be a random variable and define the random variable $X$ as $X = |Z|^\alpha$, $\alpha \ge 1$. Then Jensen's Inequality gives us that $E\varphi(X) \ge \varphi(EX)$. Therefore,
\[ E|Z|^{\alpha p} \ge (E|Z|^\alpha)^p. \]
By taking the $\alpha p$th root of both sides we get
\[ (E|Z|^{\alpha p})^{1/(\alpha p)} \ge (E|Z|^\alpha)^{1/\alpha}. \]
Thus, we again have the result that $\|Z\|_{\alpha p} \ge \|Z\|_\alpha$.

Theorem 1.45 (Markov's Inequality). For a random variable $X$ and $u > 0$, we have that
\[ P(\{\omega : |X(\omega)| \ge u\}) \le \frac{E|X|}{u}. \]

Proof. We begin by noting that $u 1_{\{\omega : |X(\omega)| \ge u\}} \le |X|$. We can then take the expectation of both sides to get
\[ u E(1_{\{\omega : |X(\omega)| \ge u\}}) \le E|X|. \]
By rewriting the expectation on the left as a probability and dividing both sides by $u$, we get
\[ P(\{\omega : |X(\omega)| \ge u\}) \le \frac{E|X|}{u}. \]

Remark 1.46. When $u > 0$, the event $\{|X| \ge u\}$ is the same as $\{X^2 \ge u^2\}$; thus an easy corollary is Chebyshev's Inequality:
\[ P(\{\omega : |X(\omega)| \ge u\}) \le \frac{EX^2}{u^2}. \]
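As a concrete check of Theorem 1.45 and Remark 1.46, the sketch below compares exact tail probabilities of a fair six-sided die with the Markov and Chebyshev bounds. The die and all helper names are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical example: X is the outcome of a fair six-sided die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}


def expectation(g, pmf):
    """E g(X) for a discrete distribution given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


def tail(u, pmf):
    """P(|X| >= u)."""
    return sum(p for x, p in pmf.items() if abs(x) >= u)


def markov_bound(u, pmf):
    """E|X| / u, the right-hand side of Markov's Inequality."""
    return expectation(abs, pmf) / u


def chebyshev_bound(u, pmf):
    """E X^2 / u^2, the right-hand side in Remark 1.46."""
    return expectation(lambda x: x * x, pmf) / u**2


# Both bounds dominate the exact tail probability for every u tested.
bounds_hold = all(
    tail(u, pmf) <= markov_bound(u, pmf) and tail(u, pmf) <= chebyshev_bound(u, pmf)
    for u in range(1, 10)
)
```

For $u = 5$ the exact tail is $1/3$, while the Markov and Chebyshev bounds are $7/10$ and $91/150$; here the second-moment bound is the sharper of the two.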

Exercise 1.9. If $\varphi$ is a strictly convex function, then $\varphi(EX) = E\varphi(X)$ implies that $X$ is a.s. constant.

Theorem 1.47 (Transformation Theorem). If $f : \mathbb{R}^n \to \mathbb{R}$ is a Borel measurable function and $\{X_i, 1 \le i \le n\}$ are random variables, then $f(X_1, \ldots, X_n)$ is a random variable.

We now want to discuss the idea of product measures. In order to simplify the mechanics while still providing insight, we will consider the case where $n = 2$.

Lemma 1.48 (Product measure). Let $(E_1, \mathcal{F}_1, \mu_1)$ and $(E_2, \mathcal{F}_2, \mu_2)$ be measure spaces. There is a unique measure $\mu$ on $E = E_1 \times E_2$ with σ-field
\[ \mathcal{F}_1 \otimes \mathcal{F}_2 := \sigma(\{A \times B : A \in \mathcal{F}_1, B \in \mathcal{F}_2\}) \]
such that
\[ \mu(A \times B) = \mu_1 \otimes \mu_2 (A \times B) := \mu_1(A) \mu_2(B). \]

Using the lemma above, we may define the notion of a product σ-field and product measure as the ones described in the lemma. One should check that, in the case of Lebesgue measure, these coincide with our previous notion of $(\mathbb{R}^n, \mathcal{B}^n, m)$. This is done by confirming the countable additivity of Lebesgue measure on $\mathcal{B}^n$, and then utilizing Caratheodory's Extension Theorem (see [RF10]).

Theorem 1.49 (Fubini-Tonelli Theorem). Suppose $E = E_1 \times E_2$ is endowed with a product σ-field and product measure. For a measurable function $f$ on $E$, if $f \ge 0$ or $f \in L^1(E, \mathcal{F}, \mu)$, then
\[ \int_E f(x, y) \, \mu(dx, dy) = \int_{E_2} \underbrace{\int_{E_1} f(x, y) \, \mu_1(dx)}_{(*)} \, \mu_2(dy). \]

Remarks 1.50.
1. It is possible that $f(x, \cdot)$ be measurable for each $x$ and $f(\cdot, y)$ be measurable for each $y$, but $f$ is still not measurable. In such cases, the theorem cannot hold since the integral on the left is not even well-defined.
2. For Tonelli ($f \ge 0$), both sides may be infinite.
3. Part of the theorem is that $(*)$, as a function of $y$, is measurable with respect to $(E_2, \mathcal{F}_2, \mu_2)$.
4. Note that the iterated integral on the right may be done in either order since the designation of $E_1$ was arbitrary.
5. Similar to Fatou, Monotone Convergence, and Dominated Convergence, it is not necessary here that the measure spaces be finite.

1.3 Distributions

"A property is probability-theoretical if and only if it is described in terms of a distribution." M. Loève [Loè77]

Definition 1.51. The distribution¹ of a random variable $X$ is a measure $\mu_X$ on $(\mathbb{R}, \mathcal{B})$ such that for $A \in \mathcal{B}$,
\[ \mu_X(A) := P(X^{-1}(A)). \]
In other words, it is the probability measure on $\mathbb{R}$ induced by the measure $P$ on $\Omega$. We will also sometimes say that a measure $\mu$ on $(\mathbb{R}, \mathcal{B})$ is a distribution if it is a probability measure, even if there is no a priori associated random variable.

Remark 1.52. A less widely used term for the induced measure on $\mathbb{R}$ is the law of a random variable $X$. However, the term law is also sometimes used in reference to the measure $P$ on the measurable space $(\Omega, \mathcal{F})$. Due to its ambiguity, we do not use this terminology in the sequel.

The quote at the beginning of this section is in the context of M. Loève writing on the conceptual difference between probability theory and general measure theory. We add to this the claim that the notion of a distribution is the heart of a random variable.

To explain this claim, consider that a measurable space $(\Omega, \mathcal{F})$ is required to have very little structure, just enough to define a measure on it (for example, we cannot discuss addition or continuity in the set $\Omega$ since it may not be a group and may not have a topology). This is reasonable since events which occur in real life typically do not come with natural algebraic, topological, or geometric structures.

Introducing a random variable $X$ into the picture (or a random vector $\vec{X}$) allows us to push probabilities into a space in which we have a great deal of structure, namely $\mathbb{R}$ (respectively $\mathbb{R}^n$). We can then forget about $(\Omega, \mathcal{F}, P)$ and work with $(\mathbb{R}, \mathcal{B}, \mu_X)$ (respectively $(\mathbb{R}^n, \mathcal{B}^n, \mu_{\vec{X}})$). The values $X$ takes are now thought of as varying according to the induced measure $\mu_X$, which is the motivation behind calling it a random variable rather than a function. This is embodied by the shorthand notation
\[ P(X \in A) \equiv P(\{\omega : X(\omega) \in A\}). \]

Remark 1.53. Since $\Omega$ is typically an abstract space to begin with, one often just sets $\Omega = \mathbb{R}$. Then we may also set $\mu_X = P$, in which case we might as well set $X(\omega) = \omega$ (the identity function). This is taking the above discussion to the extreme, but this sort of thinking is sometimes helpful, particularly when we later encounter sequences of random variables taking values in an infinite product space $\mathbb{R}^{\mathbb{N}} = \prod_{i=1}^{\infty} \mathbb{R}$.

Exercise 1.10. Show that $\mu_X$ is a probability measure.

Definition 1.54. A collection of random variables $\{X_i, i \in I\}$ is said to be identically distributed (i.d.) if the distribution of $X_i$ is the same for all $i \in I$.

¹ These should not be confused with distributions in the theory of partial differential equations.

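Example 1.55. Let $\Omega = \{H, T\}^3$ and consider the Bernoulli random variables
\[ X_i = \begin{cases} 1, & \text{if H on the } i\text{th toss} \\ 0, & \text{otherwise} \end{cases} \]
for $i \in \{1, 2, 3\}$. Then
\[ \Omega = \left\{ \begin{array}{ll} \omega_1 = (H, H, H), & \omega_5 = (H, T, T), \\ \omega_2 = (H, H, T), & \omega_6 = (T, H, T), \\ \omega_3 = (H, T, H), & \omega_7 = (T, T, H), \\ \omega_4 = (T, H, H), & \omega_8 = (T, T, T) \end{array} \right\}. \]
Then the distribution of $X_i$ for $i \in \{1, 2, 3\}$ is given by
\[ \mu = \tfrac{1}{2} \delta_0 + \tfrac{1}{2} \delta_1. \]
Thus, these random variables are identically distributed with distribution $\mu = \mu_{X_i}$.

Example 1.56 (Exponential distribution with rate $\lambda$). We say that $X$ has an exponential distribution with rate $\lambda$, and write $X \sim \mathrm{Exp}(\lambda)$, if
\[ \mu_X(A) = \int_{A \cap [0, \infty)} \lambda e^{-\lambda x} \, dx \quad \text{for all } A \in \mathcal{B}. \]

Example 1.57 (Normal or Gaussian distribution). We say that $X$ has a normal or Gaussian distribution with mean $\mu_0$ and variance $\sigma^2$, and write $X \sim N(\mu_0, \sigma^2)$, if
\[ \mu_X(A) = \int_A \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu_0)^2}{2\sigma^2} \right) dx. \]

Definition 1.58. The distribution function of $X$ is defined to be
\[ F_X(x) := P(X \le x) \equiv P(\{\omega : X(\omega) \le x\}) = \mu_X((-\infty, x]). \]

One should not confuse distribution functions, which are actual functions on $\mathbb{R}$, with distributions, which are probability measures on $\mathbb{R}$. To make the distinction clear, one often calls $F_X$ a cumulative distribution function or, more simply, a cdf.

The following is a standard result from measure theory.

Lemma 1.59 (Continuity of measure). Suppose $A_n, A \in \mathcal{F}$ for some (possibly infinite) measure space $(E, \mathcal{F}, \mu)$. If $A_n \uparrow A$, then
\[ \mu(A) = \lim_{n \to \infty} \mu(A_n). \]
If $\mu$ is a finite measure and $A_n \downarrow A$, then
\[ \mu(A) = \lim_{n \to \infty} \mu(A_n). \]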
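The induced measure of Definition 1.51 and the distribution function of Definition 1.58 can be computed by brute force for the three coin tosses of Example 1.55. The sketch assumes the uniform measure on $\Omega$ (fair coins), which is what the stated distribution $\tfrac12\delta_0 + \tfrac12\delta_1$ implies; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space {H, T}^3 with the uniform probability measure (assumption).
omega = list(product("HT", repeat=3))
P = {w: Fraction(1, 8) for w in omega}


def X(i):
    """Bernoulli random variable: 1 if the i-th toss is H, else 0."""
    return lambda w: 1 if w[i - 1] == "H" else 0


def distribution(rv, P):
    """Induced measure mu_X as {value: mass}, i.e. mu_X({v}) = P(X^{-1}({v}))."""
    mu = {}
    for w, p in P.items():
        mu[rv(w)] = mu.get(rv(w), 0) + p
    return mu


def cdf(mu, x):
    """F_X(x) = mu_X((-inf, x])."""
    return sum(p for v, p in mu.items() if v <= x)


mu_1, mu_2, mu_3 = (distribution(X(i), P) for i in (1, 2, 3))
```

All three induced measures coincide, which is exactly what "identically distributed" means in Definition 1.54, even though $X_1, X_2, X_3$ are different functions on $\Omega$.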

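In the following, we use the notation
\[ f(x+) := \lim_{y \downarrow x} f(y) \quad \text{and} \quad f(x-) := \lim_{y \uparrow x} f(y). \]

Proposition 1.60 (Distribution function properties).
(i) $F_X$ is nondecreasing, i.e., $x \le y$ implies $F_X(x) \le F_X(y)$.
(ii) $F_X$ is right-continuous, i.e., $F_X(x+) = F_X(x)$ for all $x$.
(iii) $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.
(iv) $P(X = x) = \mu_X(\{x\}) = F_X(x) - F_X(x-)$.

Proof. (i) Since $x \le y$, we have $(-\infty, x] \subseteq (-\infty, y]$, which implies
\[ F_X(x) = \mu_X((-\infty, x]) \le \mu_X((-\infty, y]) = F_X(y). \]
(ii) Suppose $x_n \downarrow x$. Then $\bigcap_n (-\infty, x_n] = (-\infty, x]$, which implies
\[ F_X(x+) = \lim_{n \to \infty} \mu_X((-\infty, x_n]) = \mu_X\left( \bigcap_n (-\infty, x_n] \right) = \mu_X((-\infty, x]) = F_X(x), \]
where the second equality follows from continuity of measure.
(iii) From the fact that $\bigcap_{n \in \mathbb{N}} (-\infty, -n] = \emptyset$, we conclude that
\[ \lim_{x \to -\infty} F_X(x) = \lim_{n \to \infty} \mu_X((-\infty, -n]) = \mu_X\left( \bigcap_{n \in \mathbb{N}} (-\infty, -n] \right) = \mu_X(\emptyset) = 0. \]
From $\bigcup_{n \in \mathbb{N}} (-\infty, n] = \mathbb{R}$, we conclude that
\[ \lim_{x \to \infty} F_X(x) = \lim_{n \to \infty} \mu_X((-\infty, n]) = \mu_X\left( \bigcup_{n \in \mathbb{N}} (-\infty, n] \right) = \mu_X(\mathbb{R}) = 1, \]
where the second equalities in both of the above follow from continuity of measure.
(iv) This follows from the fact that $F_X(x-) = P(X < x)$, which one can easily check.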

Theorem 1.61 (Characterization by distribution functions). If a function $F$ satisfies properties (i), (ii), and (iii) of Proposition 1.60, then it is the distribution function of some random variable $X$, i.e., there is an $X$ such that $F = F_X$.

Proof. We will construct an $X$ using $(\Omega, \mathcal{F}, P) = ([0, 1], \mathcal{B}([0, 1]), m)$, i.e., Lebesgue measure on the Borel subsets of $[0, 1]$. We define
\[ X(\omega) := \sup \{y : F(y) < \omega\}. \]
When $F$ is continuous and strictly increasing, $X$ is its inverse, and this is how one should think about it even when $F$ is discontinuous or not strictly increasing. If we can show that
\[ \{\omega : X(\omega) \le x\} = \{\omega : \omega \le F(x)\}, \tag{1.5} \]
then
\[ P(X \le x) = P(\omega \le F(x)) = m([0, F(x)]) = F(x), \]
as desired. So now we show (1.5).

[$\supseteq$] Suppose that $\omega \le F(x)$. Since $X$ is nondecreasing, we have $X(\omega) \le X(F(x)) \le x$. The latter inequality is equivalent to saying $\sup \{y : F(y) < F(x)\} \le x$, which is true since if $y$ is such that $F(y) < F(x)$, then $y \le x$ since $F$ is nondecreasing.

[$\subseteq$] Suppose that $X(\omega) \le x$. We must show $\omega \le F(x)$. By way of contradiction, suppose $\omega > F(x)$. Let $x_n \downarrow x$ with $x_n > x$ for all $n$. By right-continuity, $F(x_n) \downarrow F(x)$. Choose an $N \in \mathbb{N}$ such that $\omega > F(x_N) \ge F(x)$. Then we have
\[ \sup \{y : F(y) < \omega\} \ge x_N \]
because $x_N$ is one such $y$. Then by definition, $X(\omega) \ge x_N > x$, a contradiction.

Remark 1.62. Since $F_X$ is nondecreasing, all discontinuities are jump-discontinuities. Therefore, any $F_X$ with a discontinuity of height $c$ at point $x$ must be associated to a distribution $\mu_X$ which is partly made up of the point mass $c\delta_x$. The point $x$ is sometimes referred to as an atom.

Example 1.63 (Uniform distribution). Perhaps the simplest distribution functions are those of continuous uniform random variables $X \sim \mathrm{Unif}[a, b]$ and discrete uniform random variables $X \sim \mathrm{Unif}\{x_1, \ldots, x_n\}$. For the continuous case we have
\[ F_X(x) = \begin{cases} 0, & x \le a \\ \frac{x - a}{b - a}, & x \in (a, b) \\ 1, & x \ge b. \end{cases} \]
In the discrete case we have
\[ F_X(x) = \begin{cases} 0, & x < x_1 \\ \frac{k}{n}, & x_k \le x < x_{k+1} \text{ and } 1 \le k < n \\ 1, & x \ge x_n. \end{cases} \]

The use of the words discrete and continuous applies to more than just uniform random variables. In particular, we say that a random variable is discrete if its distribution can be written in the form
\[ \sum_{n \in \mathbb{N}} p_n \delta_{x_n} \]
for a countable set of values $\{x_n, n \in \mathbb{N}\}$ which occur with probabilities $\{p_n, n \in \mathbb{N}\}$ summing to one (in the finite case, infinitely many of the probabilities are zero). A random variable is continuous if its distribution function $F_X$ is continuous. However, it is often the case that when one says $X$ is continuous, one really means the slightly stronger statement that its distribution is absolutely continuous, which we now discuss.

Definition 1.64. If $\mu$ and $\nu$ are measures on $(\mathbb{R}, \mathcal{B})$, we say that $\nu$ is absolutely continuous with respect to $\mu$ if for all $A \in \mathcal{B}$,
\[ \mu(A) = 0 \implies \nu(A) = 0, \]
and we write $\nu \ll \mu$.

Before moving on, let us motivate the above definition. Given a random variable $X$, we have up until now three different yet equivalent ways of describing a probability measure. Firstly, the abstract way, $(\Omega, \mathcal{F}, P)$. Secondly, using the measure $\mu_X$ on $(\mathbb{R}, \mathcal{B})$ induced by $X$. Finally, Theorem 1.61 tells us that the distribution function $F_X$ uniquely determines the measure $\mu_X$. In the rest of the section we will show that the notion of absolute continuity provides a fourth description of the probability measure that holds whenever a distribution is absolutely continuous with respect to Lebesgue measure.

Definition 1.65. A measure space $(E, \mathcal{F}, \mu)$ is σ-finite if there exists a countable collection $\{E_n, n \in \mathbb{N}\} \subseteq \mathcal{F}$ such that $E = \bigcup_{n \in \mathbb{N}} E_n$ and $\mu(E_n) < \infty$ for all $n$.

Theorem 1.66 (Radon-Nikodym Theorem). Suppose $(E, \mathcal{B}, \mu)$ is a σ-finite measure space. Then $\nu \ll \mu$ if and only if there is a measurable function $f \ge 0$ such that
\[ \nu(B) = \int_B f \, d\mu \quad \text{for all } B \in \mathcal{B}. \tag{1.6} \]
The function $f$ is called the Radon-Nikodym derivative and is unique $\mu$-a.e.

Proof. For the proof we refer to [Rud87]. However, one can easily see the $\mu$-a.e. uniqueness of the function $f$, for if $f$ and $g$ both satisfy (1.6) and differ on a set of positive $\mu$-measure, then there is a set $B \in \mathcal{B}$ such that
\[ \int_B (f - g) \, d\mu \ne 0. \]
Hence
\[ \nu(B) = \int_B f \, d\mu \ne \int_B g \, d\mu = \nu(B), \]
a contradiction.

Definition 1.67. We say that a function $F$ on $\mathbb{R}$ is absolutely continuous on an interval $[a, b]$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that for every $n \in \mathbb{N}$ and every collection $\{(a_k, b_k)\}_{k=1}^{n}$ of disjoint open subintervals of $[a, b]$ such that $\sum_{k=1}^{n} (b_k - a_k) < \delta$, we have
\[ \sum_{k=1}^{n} |F(b_k) - F(a_k)| < \epsilon. \]

Note that absolute continuity implies uniform continuity, which in turn implies continuity.

Theorem 1.68 (Fundamental Theorem of Calculus). A function $F$ is absolutely continuous on $[a, b]$ if and only if $F'(x)$ exists a.e., $F' \in L^1([a, b])$, and
\[ F(x) - F(a) = \int_a^x F'(t) \, dt \quad \text{for all } x \in [a, b]. \]
As usual, $dt$ denotes Lebesgue measure.

Definition 1.69. If $X$ is a random variable with an absolutely continuous distribution function $F_X$, then its density (sometimes called probability density function or pdf) is
\[ f_X(x) := F_X'(x). \]

The previous theorem shows that a probability distribution on $\mathbb{R}$ is absolutely continuous with respect to Lebesgue measure precisely when its associated distribution function is absolutely continuous (thus affording us the dual use of this terminology). In particular, one gets that
\[ \mu_X(A) = \int_A f_X(x) \, dx; \]
thus when $F_X$ is absolutely continuous on $\mathbb{R}$, we have four equivalent ways of interpreting the probability measure.

Example 1.70 (Exponential and Normal densities). If $X$ has an exponential distribution with rate $\lambda$, then
\[ f_X(x) = \lambda e^{-\lambda x} 1_{[0, \infty)}(x), \]
and thus for $c > 0$,
\[ F_X(c) = P(X \le c) = \int_0^c \lambda e^{-\lambda x} \, dx. \]
If $X$ has a normal distribution with mean $\mu_0$ and variance $\sigma^2$, then
\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu_0)^2}{2\sigma^2} \right). \]
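The exponential distribution function of Example 1.70 evaluates to $F_X(c) = 1 - e^{-\lambda c}$, which is continuous and strictly increasing on $[0, \infty)$, so the random variable $X(\omega) = \sup\{y : F(y) < \omega\}$ from the proof of Theorem 1.61 is simply its inverse. The following sketch illustrates that construction; the function names, the rate $\lambda = 2$, and the test grids are illustrative assumptions.

```python
import math


def exp_cdf(x, lam):
    """F_X(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_quantile(omega, lam):
    """X(omega) = sup{y : F(y) < omega} for 0 < omega < 1.

    For the exponential cdf this is the ordinary inverse of F.
    """
    return -math.log(1.0 - omega) / lam


lam = 2.0

# Check the event identity (1.5): {X <= x} = {omega <= F(x)} on a grid
# chosen so that no grid point sits exactly on the boundary.
omegas = [k / 10 for k in range(1, 10)]
xs = [0.25, 0.5, 1.0, 2.0]
identity_holds = all(
    (exp_quantile(w, lam) <= x) == (w <= exp_cdf(x, lam))
    for w in omegas
    for x in xs
)

# Feeding evenly spaced points of (0, 1) through X approximates EX = 1 / lam.
grid = [(k + 0.5) / 1000 for k in range(1000)]
approx_mean = sum(exp_quantile(u, lam) for u in grid) / len(grid)
```

Replacing the deterministic grid with uniform random numbers on $(0, 1)$ gives the usual inverse-transform way of simulating an exponential random variable.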

Example 1.71. Clearly, $F_X$ must be continuous in order for it to be absolutely continuous, so there is no density for a Bernoulli random variable $X$ with values in $\{0, 1\}$, which is a coin flip assigning 1 to heads and 0 to tails. It is not enough, however, that $F_X$ is continuous in order for $X$ to have a density. Consider the continuous Cantor-Lebesgue function $F$ on $[0, 1]$, i.e., the Devil's Staircase, which can be extended to all of $\mathbb{R}$ by setting its value to 1 for $x > 1$ and 0 for $x < 0$. Then it is a distribution function. Since $F'(x) = 0$ a.e.,
\[ F(1) - F(0) = 1 \ne 0 = \int_0^1 F'(x) \, dx. \]
We see that $F$ is not absolutely continuous and thus has no associated density.

Exercise 1.11. If the distribution $\mu_X$ is absolutely continuous with density $f_X$, show that for any Borel measurable function $h$,
\[ Eh(X) = \int_{\mathbb{R}} h(x) f_X(x) \, dx. \]

2 Bernoulli's Laws of Large Numbers

2.1 Independence and Convolution

"Measure theory ends and probability begins with the definition of independence." R. Durrett [Dur10]

Before defining independence, let us consider the motivation behind the way it is defined. Let us suppose some event $B \subseteq \Omega$ (with $P(B) > 0$) is known to occur (say, to someone with extra or inside information). Conditioned on the information that $B$ has occurred, the probability that $B^c$ occurs must then be 0. On the other hand, it may be that $P(B) < 1$, yet we know $B$ occurs. It is natural then, under the assumption that $B$ occurs, to normalize $P$ by dividing by $P(B)$. This gives us a new probability
\[ \tilde{P}(A) \equiv P(A \mid B) := \frac{P(A \cap B)}{P(B)} \]
on $\Omega$, called the conditional probability, where $\tilde{P}(A)$ and $P(A \mid B)$ are just two different notations for the same thing and are defined by the right-hand side. We read $P(A \mid B)$ as the probability that $A$ occurs given that $B$ occurs.

With this in mind, if we think of $A$ and $B$ as being independent of each other, then knowledge of $B$ should not affect the probability of $A$ occurring. So, we should expect $P(A \mid B) = P(A)$. This would then imply that
\[ P(A \cap B) = P(A) P(B). \]

We now define independence in a variety of contexts. These considerations motivate us to make the following definition of independence on the space $(\Omega, \mathcal{F}, P)$: we say that two events $A$ and $B$ are independent (with respect to $P$), and we write $A \perp\!\!\!\perp B$, if
\[ P(A \cap B) = P(A) P(B). \]

Given $(\Omega, \mathcal{F}, P)$, two sub-σ-fields of $\mathcal{F}$, say $\mathcal{G}$ and $\mathcal{H}$, are said to be independent if $A \perp\!\!\!\perp B$ for all $A \in \mathcal{G}$ and $B \in \mathcal{H}$.

The notion of independent σ-fields is just an extension of independent events. To see this, note that if $A \perp\!\!\!\perp B$, then $A \perp\!\!\!\perp B^c$, for we have
\[ P(A \cap B^c) = P(A) - P(A \cap B) = P(A) - P(A) P(B) = P(A)(1 - P(B)) = P(A) P(B^c). \]
Since $\sigma(\{A\}) = \{\emptyset, A, A^c, \Omega\}$ and $\sigma(\{B\}) = \{\emptyset, B, B^c, \Omega\}$, it follows that $A \perp\!\!\!\perp B$ implies that every element of $\sigma(\{A\})$ is independent of every element of $\sigma(\{B\})$.

We say that two random variables $X$ and $Y$ are independent if $\sigma(X) \perp\!\!\!\perp \sigma(Y)$, where for a random variable $Z$ we define the σ-field generated by $Z$ to be
\[ \sigma(Z) := \{Z^{-1}(B) : B \in \mathcal{B}\} \]
(one can check that this is a σ-field). Note that $\sigma(Z)$ is in fact the smallest σ-field that can be constructed from $\Omega$ that allows $Z$ to be measurable. Also, notice that if $X \perp\!\!\!\perp Y$, then
\[ P(X \in [a, b], Y \in (c, d)) = P\big( \underbrace{\{\omega : X(\omega) \in [a, b]\}}_{= X^{-1}([a, b]) \in \sigma(X)} \cap \underbrace{\{\omega : Y(\omega) \in (c, d)\}}_{= Y^{-1}((c, d)) \in \sigma(Y)} \big) = P(X \in [a, b]) P(Y \in (c, d)). \]
We could of course use any two Borel sets in place of $[a, b]$ and $(c, d)$.

We say that a finite number of events $A_1, \ldots, A_n$ are independent if for every index set $I \subseteq \{1, \ldots, n\}$, we have
\[ P\left( \bigcap_{i \in I} A_i \right) = \prod_{i \in I} P(A_i). \]
The events $A_1, \ldots, A_n$ are pairwise independent if for every $1 \le i < j \le n$,
\[ P(A_i \cap A_j) = P(A_i) P(A_j). \]
It is clear that independence implies pairwise independence.

Exercise 2.1. Show that the converse is not true. In other words, construct events which are pairwise independent, but not independent.

We say that finitely many σ-fields $\mathcal{F}_1, \ldots, \mathcal{F}_n$ are independent if
\[ P\left( \bigcap_{i=1}^{n} A_i \right) = \prod_{i=1}^{n} P(A_i) \]
for all $A_i \in \mathcal{F}_i$, $1 \le i \le n$. Defining this in terms of arbitrary index sets $I$, as above, is not necessary because we can always let some $A_i = \Omega$.

Finitely many random variables $X_1, \ldots, X_n$ are independent if $\sigma(X_1), \ldots, \sigma(X_n)$ are independent.

Lastly, whether we are talking about events, σ-fields, or random variables, an infinite collection is said to be independent if every finite subcollection is independent. For now, we assume that such infinite collections exist and address the existence issue in a later theorem.

Having defined independence eight times, let us develop some properties and see how it is useful. First, a result:

Theorem 2.1 (Independence Transformation Theorem). Suppose $\{X_i, i \in \mathbb{N}\}$ are independent random variables and $f_i : \mathbb{R} \to \mathbb{R}$, $i \in \mathbb{N}$, are Borel measurable. Then $\{f_i(X_i), i \in \mathbb{N}\}$ are independent.

Proof. Since we must show that $\{\sigma(f_i(X_i)), i \in \mathbb{N}\}$ are independent, it suffices to show that $\sigma(f_i(X_i)) \subseteq \sigma(X_i)$, since $\{\sigma(X_i), i \in \mathbb{N}\}$ are independent. Dropping the index $i$, this means we must show
\[ \{X^{-1}(f^{-1}(B)) : B \in \mathcal{B}\} \subseteq \{X^{-1}(B) : B \in \mathcal{B}\}, \]
but this is clear, for if $B$ is a Borel set, so is $f^{-1}(B)$ since $f$ is Borel measurable. Hence $X^{-1}(f^{-1}(B))$ is of the form $X^{-1}(B')$ where $B' \in \mathcal{B}$.
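The product rule $P(A \cap B) = P(A)P(B)$ and its extension to the generated σ-fields $\sigma(\{A\})$ and $\sigma(\{B\})$ can be verified by enumeration on a small sample space. The sketch below assumes two fair coin tosses with the uniform measure; the events and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses with the uniform probability measure (assumption).
omega = set(product("HT", repeat=2))


def prob(event):
    """Normalized counting measure of an event."""
    return Fraction(len(event), len(omega))


def independent(a, b):
    """Independence of two events: P(A and B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


def sigma(a):
    """The sigma-field generated by a single event a."""
    return [set(), a, omega - a, omega]


A = {w for w in omega if w[0] == "H"}  # first toss is heads
B = {w for w in omega if w[1] == "H"}  # second toss is heads

# Every element of sigma({A}) is independent of every element of sigma({B}).
sigma_fields_independent = all(
    independent(a, b) for a in sigma(A) for b in sigma(B)
)
```

By contrast, `independent(A, A)` is false here, since $P(A \cap A) = 1/2$ while $P(A)^2 = 1/4$: an event is independent of itself only when its probability is 0 or 1.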

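By applying the same arguments, one can show

Corollary 2.2. If $\{X_{ij}, (i, j) \in \mathbb{N}^2\}$ are independent, and $f_i : \mathbb{R}^{m_i} \to \mathbb{R}$, $i \in \mathbb{N}$, are Borel measurable, then $\{f_i(X_{i1}, \ldots, X_{i m_i}), i \in \mathbb{N}\}$ are independent.

Proposition 2.3 (Product measures and independence). If $\{X_i, 1 \le i \le n\}$ are independent with distributions $\{\mu_i, 1 \le i \le n\}$ respectively, then the random vector $\vec{X} = (X_1, \ldots, X_n)$ has distribution $\mu := \bigotimes_{i=1}^{n} \mu_i$ on $(\mathbb{R}^n, \mathcal{B}^n)$.

Proof. First we check that $\mu$ coincides with the distribution of $\vec{X}$ when applied to product sets $A_1 \times \cdots \times A_n$, where $A_i \in \mathcal{B}$. We have

\[
\begin{aligned}
P((X_1, \ldots, X_n) \in A_1 \times \cdots \times A_n) &= P(X_1 \in A_1, \ldots, X_n \in A_n) \\
&= P(X_1 \in A_1) \cdots P(X_n \in A_n) \\
&= \mu_1(A_1) \cdots \mu_n(A_n) \\
&= \mu(A_1 \times \cdots \times A_n).
\end{aligned}
\]

Now we must show that $\mu(B) = P(\vec{X} \in B)$ for arbitrary $B \in \mathcal{B}^n$. We have thus far only shown this for very special $B$; however, note that the Borel σ-field in $\mathbb{R}^n$ is generated by products $A_1 \times \cdots \times A_n$ of Borel sets in $\mathbb{R}$, and that this generating class is closed under finite intersections. Hence, since $\mu$ and the distribution of $\vec{X}$ coincide on such a generating set, they coincide on all elements of $\mathcal{B}^n$.

Example 2.4. If $X \sim N(0, 1)$ and $Y \sim N(0, 1)$ are independent, then they are distributed as

\[ \mu_X(A) = \int_A \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, m(dx), \qquad \mu_Y(B) = \int_B \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\, m(dy). \]

By Proposition 2.3, the distribution of $(X, Y)$ is given by

\[ \mu_{(X,Y)}(A \times B) = (\mu_X \otimes \mu_Y)(A \times B) = \int_B \int_A \frac{1}{2\pi} e^{-\frac{x^2 + y^2}{2}}\, dx\, dy. \]

As one might expect, $\mu_{(X,Y)}$ turns out to be the two-dimensional standard Gaussian distribution.

More generally, if $X$ and $Y$ have absolutely continuous distribution functions $F_X$ and $F_Y$, then they have densities $f_X$ and $f_Y$. If $X \perp Y$, by Proposition 2.3 one has

\[ \mu_{(X,Y)}(B) = \iint_B f_X(x) f_Y(y)\, dx\, dy. \tag{2.1} \]

Conversely, if $\mu_{(X,Y)}$ is given by the above equation, it is an easy exercise to see that $X \perp Y$.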
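The product formula of Example 2.4 can be checked numerically. The following is a minimal sketch, not part of the original notes: it approximates the two-dimensional Gaussian integral over a rectangle with a midpoint rule and compares it with the product of the one-dimensional probabilities. The rectangle, the step count, and the helper names `std_normal_cdf` and `joint_gaussian_mass` are illustrative assumptions.

```python
import math

def std_normal_cdf(t):
    """Distribution function of N(0, 1), written via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def joint_gaussian_mass(a, b, c, d, steps=200):
    """Midpoint-rule approximation of the integral of
    (1 / (2*pi)) * exp(-(x**2 + y**2) / 2) over [a, b] x [c, d]."""
    hx = (b - a) / steps
    hy = (d - c) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * hx
        for j in range(steps):
            y = c + (j + 0.5) * hy
            total += math.exp(-(x * x + y * y) / 2.0)
    return total * hx * hy / (2.0 * math.pi)

# Illustrative rectangle A x B = [-1, 1] x [0, 2] (an arbitrary choice).
a, b, c, d = -1.0, 1.0, 0.0, 2.0
joint = joint_gaussian_mass(a, b, c, d)
product = ((std_normal_cdf(b) - std_normal_cdf(a))
           * (std_normal_cdf(d) - std_normal_cdf(c)))
print(joint, product)
```

The two printed numbers should agree up to the discretization error of the midpoint rule, which is what $\mu_{(X,Y)}(A \times B) = \mu_X(A)\mu_Y(B)$ predicts for independent $X$ and $Y$.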

Definition 2.5. The joint distribution for a collection of random variables $\{X_1, \ldots, X_n\}$ is the distribution $\mu$ on $(\mathbb{R}^n, \mathcal{B}^n)$ of the random vector $\vec{X} = (X_1, \ldots, X_n)$, i.e., $\mu(A) = P(\vec{X} \in A)$ for all $A \in \mathcal{B}^n$. Likewise, if

\[ \mu(A) = \int_A f_{\vec{X}}(x_1, \ldots, x_n)\, m(d\vec{x}) \]

for all $A \in \mathcal{B}^n$, then $f_{\vec{X}}$ is said to be a joint density for the collection $\{X_1, \ldots, X_n\}$.

Remark 2.6. We often abuse notation by switching indiscriminately between $\vec{X}$ and $\{X_1, \ldots, X_n\}$. Similarly, we sometimes say joint distribution (when thought of as a collection) and other times simply say distribution (when thought of as a vector). Note that in the above definition, independence of the random variables is not required. See the examples below.

Corollary 2.7. If $\{X_1, \ldots, X_n\}$ are independent and have densities $f_{X_1}, \ldots, f_{X_n}$, then the random vector $\vec{X} = (X_1, \ldots, X_n)$ has joint density $f_{\vec{X}} : \mathbb{R}^n \to \mathbb{R}$ defined by

\[ f_{\vec{X}}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i). \]

Example 2.8. If $X_1 \sim \mathrm{Unif}(0, 1)$ and $X_1 = X_2$, i.e., the two random variables are not only identically distributed but in fact identical, then the random vector $\vec{X} = (X_1, X_2)$ has a joint distribution but no joint density, since the distribution concentrates on the one-dimensional line $y = x$. This is despite the fact that $X_1$ and $X_2$ both have densities.

Example 2.9. For $X_1 \perp X_2$, let $X_1 \sim N(0, 1)$ and let $X_2$ have a density defined by

\[ f_{X_2}(x) = \begin{cases} \dfrac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, & \text{if } x < 0, \\ 0, & \text{otherwise.} \end{cases} \]

Also, let $X_3 = X_1$ if $X_1 > 0$ and $X_3 = X_2$ if $X_1 < 0$. Then $X_3 \sim N(0, 1)$, but $X_3$ is not independent of $X_1$. Again, the random vector $\vec{X} = (X_1, X_3)$ has a joint distribution but no joint density, since part of the distribution concentrates on the line $y = x$ for $x > 0$. However, if one conditions on the event $X_1 < 0$ (or $X_3 < 0$), then the conditional distribution has a joint density. In fact, this conditional density is proportional to the density of a standard two-dimensional Gaussian random variable, restricted to the quadrant $x < 0$, $y < 0$.
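A small simulation can make Example 2.9 concrete. This is only a sketch under stated assumptions: $X_2$ is sampled as $-|Z|$ for an independent standard normal $Z$ (which has the density $f_{X_2}$ above), and the seed, the sample size, and the helper name `sample_example` are arbitrary choices, not part of the notes.

```python
import random

def sample_example(rng):
    """Draw one realisation of (X1, X2, X3) as in Example 2.9."""
    x1 = rng.gauss(0.0, 1.0)          # X1 ~ N(0, 1)
    x2 = -abs(rng.gauss(0.0, 1.0))    # X2: standard normal folded onto x < 0
    x3 = x1 if x1 > 0 else x2         # X3 = X1 on {X1 > 0}, X2 on {X1 < 0}
    return x1, x2, x3

rng = random.Random(0)                # fixed seed, chosen arbitrarily
draws = [sample_example(rng) for _ in range(100_000)]
n = len(draws)

# Empirical mean and variance of X3, to compare with N(0, 1).
mean_x3 = sum(x3 for _, _, x3 in draws) / n
var_x3 = sum(x3 * x3 for _, _, x3 in draws) / n - mean_x3 ** 2

# Dependence: P(X3 > 0) versus P(X3 > 0 | X1 > 0).
p_x3_pos = sum(x3 > 0 for _, _, x3 in draws) / n
p_x3_pos_given_x1_pos = (sum(x3 > 0 for x1, _, x3 in draws if x1 > 0)
                         / sum(x1 > 0 for x1, _, _ in draws))

# Fraction of samples lying exactly on the line y = x.
on_diagonal = sum(x1 == x3 for x1, _, x3 in draws) / n

print(mean_x3, var_x3, p_x3_pos, p_x3_pos_given_x1_pos, on_diagonal)
```

The conditional frequency equals one while the unconditional one is close to one half, which is the dependence claimed in the example, and roughly half of the samples sit on the diagonal, which is why no joint density can exist.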

Definition 2.10. Suppose the vector $\vec{X} = (X_1, X_2)$ has distribution $\mu$. The probabilities $\mu(B)$ for all $B$ of the form $(a, b) \times \mathbb{R}$ determine a marginal distribution, $\mu_1$, defined by

\[ \mu_1((a, b)) := \mu((a, b) \times \mathbb{R}). \]

Remarks 2.11.

1. Given $\mu((a_1, b_1) \times \mathbb{R})$ and $\mu(\mathbb{R} \times (a_2, b_2))$ for all open $(a_1, b_1)$ and $(a_2, b_2)$, we then know the marginal distributions $\mu_1$ and $\mu_2$, but we still do not know the distribution $\mu$ for the random vector. However, in the special case where $X_1$ and $X_2$ are independent, one can easily extract $\mu$, since then we obtain that

\[ \mu\big(((a_1, b_1) \times \mathbb{R}) \cap (\mathbb{R} \times (a_2, b_2))\big) = \mu((a_1, b_1) \times (a_2, b_2)) \underbrace{=}_{\text{by } \perp} \mu_1((a_1, b_1))\,\mu_2((a_2, b_2)). \]

2. If $B_i$ is the support of $\mu_i$, then $\mu$ is supported on $B_1 \times B_2$, but $B_1 \times B_2$ may not be its support. In other words, even if $\mu_i(A_i) > 0$ for both $A_i \subset B_i$, this does not imply that $\mu(A_1 \times A_2) > 0$.

3. If for $B \in \mathcal{B} \times \mathcal{B}$, the distribution $\mu$ is given by a density function

\[ \mu(B) = \iint_B f_{\vec{X}}(x_1, x_2)\, dx_1\, dx_2, \]

then for $B_1 \in \mathcal{B}$, the marginal distribution is $\mu_1(B_1) = \int_{B_1} f_{X_1}\, dx_1$, where

\[ f_{X_1}(x_1) = \int_{\mathbb{R}} f_{\vec{X}}(x_1, x_2)\, dx_2 \]

is called the marginal density. If in addition $X_1 \perp X_2$, then $f_{\vec{X}} = f_{X_1} f_{X_2}$. In fact, when marginal densities exist, the previous statement is an if and only if.

4. If $\vec{X} = (X_1, \ldots, X_n) \in \mathbb{R}^n$ with $n \ge 3$, then the marginal distributions described above are the one-point or one-dimensional marginals. More generally, the k-point or k-dimensional marginals are the distributions of vectors $(X_{j_1}, X_{j_2}, \ldots, X_{j_k})$ where $1 \le k < n$.

Example 2.12. Suppose the distribution of $(X, Y)$ is

\[ \mu(A) = \iint_A e^{-y}\, 1_{\{0 < x < y\}}\, dx\, dy. \]

Then

\[ f_X(x) = \int_x^{\infty} e^{-y}\, dy = e^{-x} \quad \text{for } x > 0 \]

and

\[ f_Y(y) = \int_0^{y} e^{-y}\, dx = e^{-y} \int_0^{y} 1\, dx = y e^{-y} \quad \text{for } y > 0. \]

Hence, since $e^{-y}\, 1_{\{0 < x < y\}} \ne y e^{-y} e^{-x}\, 1_{\{x > 0,\, y > 0\}}$, we can conclude that $X$ and $Y$ are not independent.
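The marginal densities of Example 2.12 can be recovered numerically by integrating out one variable. The sketch below uses a plain midpoint rule; the cutoff for the infinite integral, the step count, the evaluation point, and the helper names are assumptions made for illustration only.

```python
import math

def joint_density(x, y):
    """Joint density of Example 2.12: exp(-y) on the wedge 0 < x < y."""
    return math.exp(-y) if 0.0 < x < y else 0.0

def integrate(f, lo, hi, steps=20_000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

def marginal_x(x, cutoff=40.0):
    # Integrate out y over (x, x + cutoff); the neglected tail is below exp(-cutoff).
    return integrate(lambda y: joint_density(x, y), x, x + cutoff)

def marginal_y(y):
    # Integrate out x over (0, y), where the joint density is constant in x.
    return integrate(lambda x: joint_density(x, y), 0.0, y)

x0, y0 = 1.0, 2.0                        # arbitrary point with 0 < x0 < y0
fx, fy = marginal_x(x0), marginal_y(y0)
print(fx, math.exp(-x0))                 # numerical value vs. closed form e^{-x}
print(fy, y0 * math.exp(-y0))            # numerical value vs. closed form y e^{-y}
print(joint_density(x0, y0), fx * fy)    # joint density vs. product of marginals
```

The last line shows the joint density and the product of the marginals taking visibly different values at the chosen point, in line with the conclusion that $X$ and $Y$ are not independent.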

Proposition 2.13. If $X$ and $Y$ are independent and $E|X| < \infty$ and $E|Y| < \infty$, then $E|XY| < \infty$ and

\[ EXY \equiv E(XY) = EX\,EY. \]

Proof. First note that if we let $X = 1_A$ and $Y = 1_B$, then

\[ EXY = E1_{A \cap B} = P(A \cap B) = P(A)P(B) = EX\,EY. \]

The idea is that the class of indicator random variables is in some sense the set of building blocks of all random variables. The next step is to use linearity to extend the result to all simple functions of the type

\[ X = \sum_{i=1}^{n} c_i 1_{A_i}. \]

Then, using the Simple Approximation Lemma and the Monotone Convergence Theorem, one can show that the theorem holds for all $X \ge 0$ and $Y \ge 0$. Finally, considering the negative and nonnegative parts of arbitrary $X$ and $Y$ separately completes the proof.

Independence is hugely important in probability theory, and most of the fundamental theorems and basic models in the sequel are built on independent random variables at some level. These theorems and models are, however, only starting points for more complex models which require some level of dependence in order to be more realistic. Thus, we now quickly introduce the most basic tools for measuring dependence.

Definition 2.14.
(i) If $-\infty < EXY = EX\,EY < \infty$, then $X$ and $Y$ are said to be uncorrelated. Note that uncorrelated does not imply independence.
(ii) If $\infty > EXY > EX\,EY > -\infty$, then $X$ and $Y$ are positively correlated.
(iii) If $-\infty < EXY < EX\,EY < \infty$, then $X$ and $Y$ are negatively correlated.

Footnote 2. The notion of correlation is extremely important in probability; it represents one of the first ways of measuring dependence, hence providing a foil to independence. The modern form of correlation is due to Pearson, but it is generally recognized (including by Pearson himself) that Galton invented this concept in 1888, after many writings on similar ideas. On a related note, Pearson is also the first person to use the term standard deviation [Sti89].

Definition 2.15.
(i) The covariance of $X$ and $Y$ is

\[ \mathrm{Cov}(X, Y) := E[(X - EX)(Y - EY)] = EXY - EX\,EY. \]

(ii) The correlation (see footnote 2) of $X$ and $Y$ is

\[ \mathrm{Corr}(X, Y) := \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}\,X\;\mathrm{Var}\,Y}}. \]

(iii) The covariance matrix $\Sigma$ associated to a random vector $(X_1, \ldots, X_n) \in \mathbb{R}^n$ is

\[
\Sigma = \begin{pmatrix}
\mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Cov}(X_2, X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Cov}(X_n, X_n)
\end{pmatrix},
\]

where each entry is $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$.

Remarks 2.16.

1. It is easy to see that $\mathrm{Cov}(X, X) = \mathrm{Var}\,X$.

2. We have $\mathrm{Cov}(aX, bY) = ab\,\mathrm{Cov}(X, Y)$, which implies $\mathrm{Var}(cX) = c^2\,\mathrm{Var}\,X$ and also

\[ \underbrace{\sqrt{\mathrm{Var}(cX)}}_{\sigma_{cX}} = \underbrace{|c|\sqrt{\mathrm{Var}(X)}}_{|c|\,\sigma_X}, \]

where $\sigma_X$ is called the standard deviation of $X$.

3. We have $\mathrm{Cov}(X, Y + Z) = \mathrm{Cov}(Y + Z, X) = \mathrm{Cov}(X, Y) + \mathrm{Cov}(X, Z)$, which together with Remark 2 shows that covariance is a symmetric, bilinear form.

4. A very common formula is

\[ \mathrm{Var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i,j} \mathrm{Cov}(X_i, X_j), \]

which is the same as summing all the entries of the covariance matrix for $(X_1, \ldots, X_n)$.

Exercise 2.2. Show that $|\mathrm{Corr}(X, Y)| \le 1$. Also, find when $\mathrm{Corr}(X, Y) = 1$ and when $\mathrm{Corr}(X, Y) = -1$.

Exercise 2.3. Show that the covariance matrix $\Sigma$ of any random vector $\vec{X}$ must be positive semi-definite, i.e., $v^T \Sigma v \ge 0$ for all $v \in \mathbb{R}^n$. One way to do this is to consider the variance of the scalar random variable given by the dot product $v \cdot \vec{X}$. Conversely, show that any positive semi-definite matrix $\Sigma$ is the covariance matrix of some random vector. Hint: for the converse direction, take a random vector $\vec{X}$ whose marginals are all independent and which each have variance one. Since $\Sigma$ is positive semi-definite, the square-root matrix $\sqrt{\Sigma}$ is well-defined. Calculate the covariance matrix for the random vector $\sqrt{\Sigma}\,\vec{X}$.
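The covariance matrix and the variance-of-a-sum formula can be computed exactly for a random vector with finitely many outcomes. The sketch below is an illustration rather than part of the notes: the example vector $(X, X^2)$ with $X$ uniform on $\{-1, 0, 1\}$ and the helper names `expectation` and `covariance_matrix` are assumptions. The same example also shows a pair that is uncorrelated yet not independent.

```python
from fractions import Fraction

def expectation(pmf, g):
    """E[g(X)] for a random vector X with finite pmf {outcome tuple: probability}."""
    return sum(p * g(x) for x, p in pmf.items())

def covariance_matrix(pmf):
    """Matrix with entries Cov(X_i, X_j) = E[X_i X_j] - E[X_i] E[X_j]."""
    dim = len(next(iter(pmf)))
    mean = [expectation(pmf, lambda x, i=i: x[i]) for i in range(dim)]
    return [[expectation(pmf, lambda x, i=i, j=j: x[i] * x[j]) - mean[i] * mean[j]
             for j in range(dim)] for i in range(dim)]

# Illustrative vector (X, Y) with X uniform on {-1, 0, 1} and Y = X**2.
third = Fraction(1, 3)
pmf = {(-1, 1): third, (0, 0): third, (1, 1): third}

sigma = covariance_matrix(pmf)

# Variance of the sum X + Y, computed directly from the pmf.
mean_sum = expectation(pmf, lambda x: x[0] + x[1])
var_sum = expectation(pmf, lambda x: (x[0] + x[1]) ** 2) - mean_sum ** 2
sum_of_entries = sum(sum(row) for row in sigma)

# v^T Sigma v for a sample direction v (the variance of v . X, hence >= 0).
v = (1, -2)
quad_form = sum(v[i] * sigma[i][j] * v[j] for i in range(2) for j in range(2))

# Dependence despite zero covariance: compare P(X=0, Y=0) with P(X=0) P(Y=0).
p_joint = pmf[(0, 0)]
p_x0 = sum(p for (x, _), p in pmf.items() if x == 0)
p_y0 = sum(p for (_, y), p in pmf.items() if y == 0)

print(sigma, var_sum, sum_of_entries, quad_form, p_joint, p_x0 * p_y0)
```

Exact rational arithmetic is used so that the identity between `var_sum` and `sum_of_entries` holds without rounding error. The off-diagonal entries vanish, yet the joint probability of $\{X = 0, Y = 0\}$ differs from the product of the marginal probabilities.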

Example 2.17 (Multivariate Gaussian distribution). Covariance matrices are especially useful when dealing with Gaussian distributions in high dimensions. In particular, if $\mu = (\mu_1, \ldots, \mu_n) \in \mathbb{R}^n$ and $\Sigma$ is a positive definite matrix, then the $n$-dimensional Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$ has the density

\[ (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\Big(-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\Big) \quad \text{for } x \in \mathbb{R}^n. \]

We are nearing our first big result in probability theory (both historically and in traditional pedagogy), which uses independence to analyze the limiting behavior of normalized sums of i.i.d. random variables. The limiting behavior is described by the so-called Law of Large Numbers attributed to Jakob Bernoulli [Ber13]. However, before moving to the limiting behavior, let us first present a method for obtaining a complete description of the distribution of sums of a fixed finite number of independent random variables.

Definition 2.18. If $\mu_X$ and $\mu_Y$ are distributions on $\mathbb{R}$ corresponding to independent random variables $X$ and $Y$, then their convolution, defined in terms of the product measure by

\[ \mu_X * \mu_Y(A) := \mu_X \otimes \mu_Y(\{(x, y) : x + y \in A\}), \]

is the distribution of the sum $X + Y$.

Remarks 2.19.

1. It is very important that $X + Y$ is a sum of independent random variables.

2. The convolution is a distribution on $\mathbb{R}$ even though it is defined in terms of a distribution on $\mathbb{R}^2$.

3. This notion extends to random vectors $\vec{X}, \vec{Y} \in \mathbb{R}^n$ (as long as they are both in the same dimension $n$).

Exercise 2.4 (Poisson distribution). We say that $X \sim \mathrm{Poiss}(\lambda)$ has a Poisson (see footnote 3) distribution with mean $\lambda$ if

\[ P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!} \quad \text{for integers } k \in [0, \infty). \]

If $X \sim \mathrm{Poiss}(\lambda)$ and $Y \sim \mathrm{Poiss}(\kappa)$ are independent, use convolution to show $X + Y \sim \mathrm{Poiss}(\lambda + \kappa)$. If $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(m, p)$ are independent, use convolution to show $X + Y \sim \mathrm{Bin}(n + m, p)$.

Footnote 3. As noted in [JKK05, p. 157], this distribution and its corresponding convergence theorem were most likely first discovered by A. De Moivre in 1711, well before Poisson's time. This is an example of Stigler's Law, the notion that the name attached to a mathematical theorem is never the person that actually discovered the theorem.
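For distributions concentrated on finitely many points, the convolution of Definition 2.18 reduces to a double sum over the atoms. The following sketch assumes pmfs stored as dictionaries; the function name `convolve` and the dice example are illustrative choices and do not come from the notes.

```python
from fractions import Fraction

def convolve(p, q):
    """Convolution of two finitely supported pmfs given as {value: probability}.

    Returns the pmf of X + Y for independent X ~ p and Y ~ q.
    """
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            # Mass of the product measure at (x, y) is pushed to the point x + y.
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

# Illustrative example: a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)

print(two_dice[7])              # mass that the sum of two dice puts on 7
print(sum(two_dice.values()))   # total mass of the convolution
```

The same function can be used to experiment with the binomial part of Exercise 2.4 for small parameters, although a numerical check is of course not a proof.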

Proposition 2.20 (Convolution is a semigroup). As an operation, convolution is commutative and associative. Moreover, $\delta_0$ is an identity with respect to convolution of distributions.

Proof. Commutativity and associativity follow from these properties for addition (of independent random variables). Similarly, since $X \equiv 0$ is an identity for addition of independent random variables, its distribution $\delta_0$ is the identity for the operation of convolution.

Remarks 2.21.

1. Distributions under convolution do not form a group, since the only possible inverse of $X$ would be $-X$, but these are clearly not independent.

2. The $n$-fold convolution of $\mu_X$ with itself corresponds to the sum of $n$ i.i.d. random variables which have the same distribution as $X$. It is denoted by $\nu = \mu_X^{*n}$. Equivalently, we say that $\mu_X$ is the $n$th convolution root of $\nu$, and we may write $\nu^{*1/n} = \mu_X$.

3. If the $n$th convolution root of $\nu$ exists for every $n \in \mathbb{N}$, we say that $\nu$ is an infinitely divisible distribution.

Exercise 2.5. Show that the $n$th convolution root of the Gaussian distribution $N(\mu, \sigma^2)$ is the Gaussian distribution $N(\mu/n, \sigma^2/n)$. In particular, every Gaussian distribution is infinitely divisible.

Proposition 2.22. Let $A - y = \{z : z + y \in A\}$. The convolution of $\mu_X$ and $\mu_Y$ is given by

\[ \mu_X * \mu_Y(A) = \int_{\mathbb{R}} \mu_X(A - y)\, \mu_Y(dy). \]

If $\mu_X$ and $\mu_Y$ have associated densities $f_X$ and $f_Y$, then $\mu_X * \mu_Y$ also has a density, which is given by

\[ f_{X+Y}(z) = \int_{\mathbb{R}} f_X(z - y) f_Y(y)\, dy. \]

Proof. Set $B := \{(x, y) : x + y \in A\}$. Then, using Tonelli's Theorem, we have

\[ \mu_X * \mu_Y(A) = \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} 1_B(x, y)\, \mu_X(dx) \Big) \mu_Y(dy) = \int_{\mathbb{R}} \mu_X(A - y)\, \mu_Y(dy). \]

For the second part, integrate the function

\[ g(z) := \int_{\mathbb{R}} f_X(z - y) f_Y(y)\, dy \]

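over a set $A \in \mathcal{B}$ to get

\[
\begin{aligned}
\int_A g(x)\, dx &= \int_A \Big( \int_{\mathbb{R}} f_X(x - y) f_Y(y)\, dy \Big) dx \\
&= \int_{\mathbb{R}} \Big( \int_A f_X(x - y)\, dx \Big) f_Y(y)\, dy \\
&= \int_{\mathbb{R}} \mu_X(A - y) f_Y(y)\, dy.
\end{aligned}
\]

The right side is equal to $\mu_X * \mu_Y(A)$ by the first part of the proposition, thus $g$ must be the density corresponding to the distribution $\mu_X * \mu_Y$.

Example 2.23 (Gamma distribution). We say that $X$ has a Gamma distribution with rate $\lambda > 0$ and shape parameter $\nu > 0$, and write $X \sim \mathrm{Gamma}(\nu, \lambda)$, if

\[ \mu_X(A) = \int_{A \cap [0, \infty)} \frac{\lambda^{\nu} x^{\nu - 1}}{\Gamma(\nu)} e^{-\lambda x}\, dx \quad \text{for all } A \in \mathcal{B}. \]

Note that when $\nu = 1$, this is just an exponential distribution with rate $\lambda$. For $\nu \in \mathbb{N}$, $\Gamma(\nu) = (\nu - 1)!$, whereas for other values this is the well-known Gamma function. If $X \sim \mathrm{Gamma}(\nu_1, \lambda)$ and $Y \sim \mathrm{Gamma}(\nu_2, \lambda)$ are independent, then by Proposition 2.22, $X + Y$ has a density given by

\[
\begin{aligned}
f_{X+Y}(z) &= \int_0^z \frac{\lambda^{\nu_1 + \nu_2}}{\Gamma(\nu_1)\Gamma(\nu_2)} (z - y)^{\nu_1 - 1} e^{-\lambda(z - y)}\, y^{\nu_2 - 1} e^{-\lambda y}\, dy \\
&= \frac{\lambda^{\nu_1 + \nu_2} e^{-\lambda z}}{\Gamma(\nu_1 + \nu_2)} \int_0^z \frac{\Gamma(\nu_1 + \nu_2)}{\Gamma(\nu_1)\Gamma(\nu_2)} (z - y)^{\nu_1 - 1} y^{\nu_2 - 1}\, dy \\
&= \frac{\lambda^{\nu_1 + \nu_2} z^{\nu_1 + \nu_2 - 1} e^{-\lambda z}}{\Gamma(\nu_1 + \nu_2)},
\end{aligned}
\]

where the integral is seen to equal $z^{\nu_1 + \nu_2 - 1}$ by substituting $y = zu$ and $dy = z\, du$, which turns the integrand into that of a Beta function (or simply use the fact that the right side must be a density). Thus we see that the sum of two independent Gamma random variables with the same rate gives us another Gamma random variable, namely $X + Y \sim \mathrm{Gamma}(\nu_1 + \nu_2, \lambda)$.

2.2 Weak Law of Large Numbers

Definition 2.24. We say the sequence $(X_n, n \in \mathbb{N})$ converges in probability to $X$, denoted by $X_n \xrightarrow{pr} X$, if for every $\epsilon > 0$, there exists an $N$ such that $n \ge N$ implies that

\[ P(|X_n - X| > \epsilon) < \epsilon. \]

Exercise 2.6. Show that Fatou's Lemma, the Dominated Convergence Theorem, and the Monotone Convergence Theorem all remain valid if we replace convergence a.s. with convergence in probability.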

Exercise 2.7. Suppose a function $h : \mathbb{R} \to \mathbb{R}$ is continuous. If $X_n \xrightarrow{pr} X$, then $h(X_n) \xrightarrow{pr} h(X)$. If $X_n \xrightarrow{a.s.} X$, then $h(X_n) \xrightarrow{a.s.} h(X)$. These are versions of what is known as the Continuous Mapping Theorem (see footnote 4).

Example 2.25 (The shrinking and revolving interval). Set

\[ f_1(x) = 1_{[0,1]}, \quad f_2(x) = 1_{[1,\, 1\frac{1}{2}]}, \quad \ldots, \quad f_n(x) = 1_{\left[\sum_{k=1}^{n-1} \frac{1}{k},\ \sum_{k=1}^{n} \frac{1}{k}\right]}. \]

Considering the intervals above modulo 1, we set

\[ g_1(x) = 1_{[0,1]}, \quad g_2(x) = 1_{[0,\frac{1}{2}]}, \quad \ldots, \quad g_n(x) = 1_{\left[\sum_{k=1}^{n-1} \frac{1}{k} \ (\mathrm{mod}\ 1),\ \sum_{k=1}^{n} \frac{1}{k} \ (\mathrm{mod}\ 1)\right]}, \]

where modulo 1 simply means that we slide back to $[0, 1]$ (if the left endpoint becomes greater than the right endpoint, modulo 1, we split the interval in two in the natural way). Then for any fixed $\omega \in [0, 1]$, $g_n(\omega) = 1$ for infinitely many $n$. The sequence $(g_n, n \in \mathbb{N})$ converges pointwise nowhere, but if we let $g \equiv 0$, then for $0 < \epsilon < 1$,

\[ P(|g_n - g| > \epsilon) = \frac{1}{n}. \]

Hence, the sequence converges in probability.

Example 2.26. Let the random variable $g_i = 1_{[i, i+1]}$. If $(\mathbb{R}, \mathcal{B}, \mu)$ is a probability space, then for all $x$, $\lim_{i \to \infty} g_i(x) = 0$, and hence $g_i \xrightarrow{a.s.} 0$. Also, $g_i \xrightarrow{pr} 0$. For each $\epsilon > 0$, simply choose $N_\epsilon$ large enough such that $\mu([-N_\epsilon, N_\epsilon]) > 1 - \epsilon$. Then for $n > N_\epsilon$, $P(|g_n - 0| > \epsilon) < \epsilon$. Note that in this example, if one uses Lebesgue measure instead of the probability measure $\mu$, then the sequence does not converge in measure; see for example [RF10].

Theorem 2.27 (Weak Law of Large Numbers, finite 2nd moments). If $\{X_n, n \in \mathbb{N}\}$ are independent and identically distributed (i.i.d.) and $EX_1^2 < \infty$, then

\[ \frac{S_n}{n} \xrightarrow{pr} EX_1, \]

where $S_n = X_1 + \cdots + X_n$. In fact, the assumption of independence in the above can be weakened to $\mathrm{Cov}(X_i, X_j) \le 0$ for all $i \ne j$.

Footnote 4. This result and analogs were proved in [MW43].
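As a concrete illustration of Theorem 2.27, consider i.i.d. fair coin flips, for which $S_n \sim \mathrm{Bin}(n, 1/2)$ and the tail probability $P(|S_n/n - 1/2| > \epsilon)$ can be computed exactly. The sketch below compares it with the Chebyshev-type bound $\mathrm{Var}\,X_1 / (n\epsilon^2)$. The fair-coin setting, the value of $\epsilon$, and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, eps):
    """Exact P(|S_n / n - 1/2| > eps) for S_n ~ Bin(n, 1/2)."""
    favourable = sum(comb(n, k) for k in range(n + 1)
                     if abs(Fraction(k, n) - Fraction(1, 2)) > eps)
    return Fraction(favourable, 2 ** n)

def chebyshev_bound(n, eps):
    """Var(X_1) / (n * eps**2) with Var(X_1) = 1/4 for a fair coin."""
    return Fraction(1, 4) / (n * eps ** 2)

eps = Fraction(1, 10)
for n in (10, 100, 1000):
    exact = tail_probability(n, eps)
    bound = chebyshev_bound(n, eps)
    print(n, float(exact), float(bound))
```

Both columns tend to zero as $n$ grows. The exact tail decays much faster than the bound, which is typical: Chebyshev's inequality only uses the second moment.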

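Proof. By Chebyshev's Inequality, we obtain

\[
\begin{aligned}
P\Big(\Big|\frac{S_n}{n} - EX_1\Big| > \epsilon\Big) &\le \frac{E\big|\frac{S_n}{n} - EX_1\big|^2}{\epsilon^2} \\
&= \frac{\mathrm{Var}(S_n/n)}{\epsilon^2} \quad (\text{since } E(S_n/n) = nEX_1/n = EX_1 < \infty) \\
&= \frac{1}{n^2 \epsilon^2} \mathrm{Var}(X_1 + \cdots + X_n) \\
&= \frac{1}{n^2 \epsilon^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j) \\
&\le \frac{1}{n^2 \epsilon^2}\, n\, \mathrm{Var}\,X_1 \quad (\text{since } \mathrm{Cov}(X_i, X_j) \le 0 \text{ for all } i \ne j) \\
&= \frac{1}{n \epsilon^2} \mathrm{Var}\,X_1 \to 0.
\end{aligned}
\]

Remarks 2.28.

1. The result is not generally valid for positively correlated random variables. If $EX_1^2 < \infty$ and all $X_i$ are identical, i.e., $X_i = X_1$ (think of $n$ people observing the same coin toss), then $\frac{S_n}{n} = \frac{nX_1}{n} = X_1 \ne EX_1$, unless $X_1 \equiv$ constant.

2. The assumption of being identically distributed can also be relaxed. For example, one can use $\{X_n, n \in \mathbb{N}\}$ which have bounded variances and are pairwise uncorrelated or negatively correlated. Then, without changing the proof too much, one can obtain

\[ \frac{S_n - E(S_n)}{n} \xrightarrow{pr} 0. \]

Definition 2.29. We say that $(X_n, n \in \mathbb{N})$ converges in $L^p$ and write $X_n \xrightarrow{L^p} X$, if

\[ \lim_{n \to \infty} E|X_n - X|^p = 0. \]

Remarks 2.30.

1. By Chebyshev's Inequality, for $p > 0$,

\[ P(|X_n - X| > \epsilon) \le \frac{E|X_n - X|^p}{\epsilon^p}, \]

and thus convergence in $L^p$ implies convergence in probability.

2. In the proof of the Weak Law of Large Numbers (WLLN), we actually proved the stronger result that $(\frac{S_n}{n}, n \in \mathbb{N})$ converges in $L^2$ to $EX_1$.

3. The shrinking, revolving interval in Example 2.25 shows that it is possible for a sequence to converge in $L^p$, but not almost surely.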
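The behaviour described in the last remark can be observed directly. The sketch below tracks the shrinking, revolving interval of Example 2.25 at one fixed point $\omega$ and records the indices $n$ at which $g_n(\omega) = 1$. The choice $\omega = 0.3$, the horizon of 10000 steps, and the function name are illustrative assumptions.

```python
import math

def revolving_interval_hits(omega, n_max):
    """Indices n <= n_max with g_n(omega) = 1, where g_n is the indicator of
    [H_{n-1}, H_n] taken modulo 1 and H_n = 1 + 1/2 + ... + 1/n."""
    hits = []
    left = 0.0                       # H_0 = 0
    for n in range(1, n_max + 1):
        right = left + 1.0 / n       # H_n
        # omega lies in the wrapped interval iff some integer m has
        # left <= m + omega <= right.
        if math.floor(right - omega) >= math.ceil(left - omega):
            hits.append(n)
        left = right
    return hits

omega = 0.3                          # arbitrary sample point in [0, 1]
hits = revolving_interval_hits(omega, 10_000)
print(hits)                          # indices at which g_n(omega) = 1
print(1.0 / 10_000)                  # E|g_n|^p = P(g_n = 1) = 1/n at n = 10000
```

The hits become sparser but never stop, because the harmonic series diverges and the interval keeps sweeping past $\omega$. So $g_n(\omega)$ does not converge, while $E|g_n - 0|^p = 1/n$ tends to zero.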
