Lecture 9: The Strong Law of Large Numbers

Reading: Grimmett-Stirzaker 7.2; David Williams "Probability with Martingales" 7.2
Further reading: Grimmett-Stirzaker 7.1, 7.3-7.5

With the Convergence Theorem (Theorem 54) and the Ergodic Theorem (Theorem 55) we have two very different statements of convergence of something to a stationary distribution. We are looking at a recurrent Markov chain (X_t)_{t≥0}, i.e. one that visits every state at arbitrarily large times, so clearly X_t itself does not converge as t → ∞. In this lecture, we look more closely at the different types of convergence and develop methods to show the so-called almost sure convergence, of which the statement of the Ergodic Theorem is an example.

9.1 Modes of convergence

Definition 59 Let X_n, n ≥ 1, and X be random variables. Then we define

1. X_n → X in probability, if for all ε > 0, P(|X_n − X| > ε) → 0 as n → ∞.

2. X_n → X in distribution, if P(X_n ≤ x) → P(X ≤ x) as n → ∞, for all x ∈ R at which x ↦ P(X ≤ x) is continuous.

3. X_n → X in L^1, if E(|X_n|) < ∞ for all n ≥ 1 and E(|X_n − X|) → 0 as n → ∞.

4. X_n → X almost surely (a.s.), if P(X_n → X as n → ∞) = 1.

Almost sure convergence is the notion that we will study in more detail here. It helps to consider random variables as functions X_n : Ω → R on a sample space Ω, or at least as functions of a common, typically infinite, family of independent random variables. What is different here from previous parts of the course (except for the Ergodic Theorem, which we yet have to inspect more thoroughly) is that we want to calculate probabilities that fundamentally depend on an infinite number of random variables. So far, we have been able to revert to events depending on only finitely many random variables by conditioning. This will not work here.
Lecture Notes, Part B Applied Probability, Oxford MT 2007

Let us start by recalling the definition of convergence of sequences: as n → ∞,

x_n → x  ⟺  ∀m ≥ 1 ∃n_m ≥ 1 ∀n ≥ n_m : |x_n − x| < 1/m.

If we want to consider all sequences (x_n)_{n≥1} of possible values of the random variables (X_n)_{n≥1}, then n_m = inf{k ≥ 1 : ∀n ≥ k |x_n − x| < 1/m} ∈ N ∪ {∞} will vary as a function of the sequence (x_n)_{n≥1}, and so it will become a random variable

N_m = inf{k ≥ 1 : ∀n ≥ k |X_n − X| < 1/m} ∈ N ∪ {∞}

as a function of (X_n)_{n≥1}. This definition of N_m permits us to write

P(X_n → X) = P(∀m ≥ 1 : N_m < ∞).

This will occasionally help when we are given almost sure convergence, but is not much use when we want to prove almost sure convergence. To prove almost sure convergence, we can transform as follows:

P(X_n → X) = P(∀m ≥ 1 ∃N ≥ 1 ∀n ≥ N : |X_n − X| < 1/m) = 1
⟺  P(∃m ≥ 1 ∀N ≥ 1 ∃n ≥ N : |X_n − X| ≥ 1/m) = 0.

We are used to events such as A_{m,n} = {|X_n − X| ≥ 1/m}, and we understand events as subsets of Ω, or loosely identify this event as the set of all ((x_k)_{k≥1}, x) for which |x_n − x| ≥ 1/m. This is useful, because we can now translate ∃m ≥ 1 ∀N ≥ 1 ∃n ≥ N into set operations and write

P(⋃_{m≥1} ⋂_{N≥1} ⋃_{n≥N} A_{m,n}) = 0.

This event can only have zero probability if all events ⋂_{N≥1} ⋃_{n≥N} A_{m,n}, m ≥ 1, have zero probability (formally, this follows from the sigma-additivity of the measure P). The Borel-Cantelli lemma will give a criterion for this.

Proposition 60 The following implications hold:

X_n → X almost surely  ⟹  X_n → X in probability  ⟹  X_n → X in distribution,
X_n → X in L^1  ⟹  X_n → X in probability,  and  X_n → X in L^1  ⟹  E(X_n) → E(X).

No other implications hold in general.

Proof: Most of this is Part A material. Some counterexamples are on Assignment 5. It remains to prove that almost sure convergence implies convergence in probability. Suppose X_n → X almost surely; then the above considerations yield P(∀m ≥ 1 : N_m < ∞) = 1, i.e. P(N_k < ∞) ≥ P(∀m ≥ 1 : N_m < ∞) = 1 for all k ≥ 1. Now fix ε > 0. Choose m ≥ 1 such that 1/m < ε. Then clearly |X_n − X| > ε > 1/m implies N_m > n, so that

P(|X_n − X| > ε) ≤ P(N_m > n) → P(N_m = ∞) = 0, as n → ∞,

for any ε > 0. Therefore, X_n → X in probability. □
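The gap between convergence in probability and almost sure convergence can be made concrete by a standard counterexample (of the kind treated on Assignment 5): the "typewriter" sequence of indicator variables on Ω = [0, 1) with Lebesgue measure, which converges to 0 in probability but not almost surely. The following Python sketch is purely illustrative; the function names are made up here.

```python
def typewriter_interval(n):
    """Support of the n-th 'typewriter' indicator on Omega = [0, 1):
    write n = 2**k + j with 0 <= j < 2**k; the interval is [j/2^k, (j+1)/2^k)."""
    k = n.bit_length() - 1
    j = n - 2 ** k
    return j / 2 ** k, (j + 1) / 2 ** k

def X(n, omega):
    """X_n(omega) = 1 if omega lies in the n-th interval, else 0."""
    a, b = typewriter_interval(n)
    return 1 if a <= omega < b else 0

# P(X_n = 1) is the interval length 2^{-k}, which tends to 0 as n -> infinity,
# so X_n -> 0 in probability; yet every omega is hit exactly once in each block
# {2^k, ..., 2^{k+1} - 1}, so X_n(omega) = 1 infinitely often: no a.s. convergence.
lengths = [typewriter_interval(n)[1] - typewriter_interval(n)[0] for n in (1, 2, 4, 8)]
hits_per_block = [sum(X(n, 0.3) for n in range(2 ** k, 2 ** (k + 1))) for k in range(6)]
```

Here N_m = ∞ for every ω and every m ≥ 2, even though each individual event {X_n = 1} becomes arbitrarily unlikely.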
9.2 The first Borel-Cantelli lemma

Let us now work on a sample space Ω. It is safe to think of Ω = R^N × R and ω ∈ Ω as ω = ((x_n)_{n≥1}, x), the set of possible outcomes for an infinite family of random variables (and a limiting variable).

The Borel-Cantelli lemmas are useful to prove almost sure results. Particularly, limiting results often require certain events to happen infinitely often (i.o.) or only a finite number of times. Logically, this can be expressed as follows. Consider events A_n ⊆ Ω, n ≥ 1. Then

ω ∈ {A_n i.o.}  ⟺  ∀n ≥ 1 ∃m ≥ n : ω ∈ A_m  ⟺  ω ∈ ⋂_{n≥1} ⋃_{m≥n} A_m.

Lemma 61 (Borel-Cantelli (first lemma)) Let A = ⋂_{n≥1} ⋃_{m≥n} A_m be the event that infinitely many of the events A_n occur. Then

Σ_{n≥1} P(A_n) < ∞  ⟹  P(A) = 0.

Proof: We have that A ⊆ ⋃_{m≥n} A_m for all n ≥ 1, and so

P(A) ≤ P(⋃_{m≥n} A_m) ≤ Σ_{m≥n} P(A_m) → 0 as n → ∞

whenever Σ_{n≥1} P(A_n) < ∞. □

9.3 The Strong Law of Large Numbers

Theorem 62 Let (X_n)_{n≥1} be a sequence of independent and identically distributed (iid) random variables with E(X_1^4) < ∞ and E(X_1) = μ. Then

S_n/n := (1/n) Σ_{i=1}^n X_i → μ almost surely.

Fact 63 Theorem 62 remains valid without the assumption E(X_1^4) < ∞, just assuming E(|X_1|) < ∞.

The proof for the general result is hard, but under the extra moment condition E(X_1^4) < ∞ there is a nice proof.

Lemma 64 In the situation of Theorem 62, there is a constant K < ∞ such that for all n

E((S_n − nμ)^4) ≤ Kn^2.
Proof: Let Z_k = X_k − μ and T_n = Z_1 + ... + Z_n = S_n − nμ. Then

E(T_n^4) = E((Σ_{i=1}^n Z_i)^4) = n E(Z_1^4) + 3n(n−1) E(Z_1^2 Z_2^2) ≤ Kn^2

by expanding the fourth power and noting that most terms vanish, such as E(Z_1 Z_2^3) = E(Z_1) E(Z_2^3) = 0. K was chosen appropriately, say K = 4 max{E(Z_1^4), (E(Z_1^2))^2}. □

Proof of Theorem 62: By the lemma,

E((S_n/n − μ)^4) ≤ K n^{-2}.

Now, by Tonelli's theorem,

E(Σ_{n≥1} (S_n/n − μ)^4) = Σ_{n≥1} E((S_n/n − μ)^4) < ∞,

and a random variable with finite expectation is almost surely finite, so

Σ_{n≥1} (S_n/n − μ)^4 < ∞ a.s.

But if a series converges, the underlying sequence converges to zero, and so

(S_n/n − μ)^4 → 0 almost surely  ⟹  S_n/n → μ almost surely. □

This proof did not use the Borel-Cantelli lemma, but we can also conclude by the Borel-Cantelli lemma:

Proof of Theorem 62: We know by Markov's inequality that

P(|S_n/n − μ| ≥ n^{-γ}) ≤ n^{4γ} E((S_n/n − μ)^4) ≤ K n^{-2+4γ}.

Define for γ ∈ (0, 1/4)

A_n = {|S_n/n − μ| ≥ n^{-γ}}  ⟹  Σ_{n≥1} P(A_n) < ∞  ⟹  P(A) = 0

by the first Borel-Cantelli lemma, where A = ⋂_{n≥1} ⋃_{m≥n} A_m. But now, the event A^c happens if and only if

∃N ≥ 1 ∀n ≥ N : |S_n/n − μ| < n^{-γ},

which implies S_n/n → μ. □
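Theorem 62 is easy to observe numerically. The Python sketch below tracks the running sample means S_k/k; the choice of iid Exp(1) variables (so μ = 1, with all moments finite, in particular E(X_1^4) < ∞) and the sample size are illustrative assumptions.

```python
import random

def running_means(n, seed=0):
    """Running sample means S_k / k, k = 1..n, for iid Exp(1) variables
    (so mu = E(X_1) = 1 and E(X_1^4) < infinity, as Theorem 62 requires)."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for k in range(1, n + 1):
        total += rng.expovariate(1.0)
        means.append(total / k)
    return means

means = running_means(100_000)
# The whole tail of the sequence settles near mu = 1: almost sure convergence
# is a statement about the path, not just about the marginal distributions.
```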
9.4 The second Borel-Cantelli lemma

We won't need the second Borel-Cantelli lemma in this course, but include it for completeness.

Lemma 65 (Borel-Cantelli (second lemma)) Let A = ⋂_{n≥1} ⋃_{m≥n} A_m be the event that infinitely many of the events A_n occur. Then

Σ_{n≥1} P(A_n) = ∞ and (A_n)_{n≥1} independent  ⟹  P(A) = 1.

Proof: The conclusion is equivalent to P(A^c) = 0. By de Morgan's laws,

A^c = ⋃_{n≥1} ⋂_{m≥n} A_m^c.

However,

P(⋂_{m≥n} A_m^c) = lim_{r→∞} P(⋂_{m=n}^r A_m^c) = Π_{m≥n} (1 − P(A_m)) ≤ Π_{m≥n} exp(−P(A_m)) = exp(−Σ_{m≥n} P(A_m)) = 0

whenever Σ_{n≥1} P(A_n) = ∞. Thus P(A^c) = lim_{n→∞} P(⋂_{m≥n} A_m^c) = 0. □

As a technical detail: to justify some of the limiting probabilities, we use continuity of P along increasing and decreasing sequences of events, which follows from the sigma-additivity of P, cf. Grimmett-Stirzaker, Lemma 1.3.(5).

9.5 Examples

Example 66 (Arrival times in a Poisson process) A Poisson process has independent and identically distributed inter-arrival times (Z_n)_{n≥0} with Z_n ~ Exp(λ). We denoted the partial sums (arrival times) by T_n = Z_0 + ... + Z_{n−1}. The Strong Law of Large Numbers yields

T_n/n → 1/λ almost surely, as n → ∞.
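The hypothesis of the second lemma can be illustrated by simulation. Below, the events A_n are realised (a modelling choice made here for concreteness) as independent Bernoulli(1/n) trials, so that Σ_{n≥1} P(A_n) = Σ 1/n = ∞; the lemma predicts that infinitely many occur almost surely, and a finite simulation indeed keeps producing occurrences, roughly log N of them among the first N events.

```python
import random

def occurred(N, seed=0):
    """Indices n <= N at which independent events A_n with P(A_n) = 1/n occur."""
    rng = random.Random(seed)
    return [n for n in range(1, N + 1) if rng.random() < 1 / n]

hits = occurred(10_000)
# A_1 has probability 1, and occurrences keep appearing at ever larger indices;
# their expected number up to N is the harmonic sum 1 + 1/2 + ... + 1/N ~ log N.
```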
Example 67 (Return times of Markov chains) For a positive-recurrent discrete-time Markov chain we denoted by

N_i = N_i^(1) = inf{n > 0 : M_n = i},  N_i^(m+1) = inf{n > N_i^(m) : M_n = i},  m ∈ N,

the successive return times to i. By the strong Markov property, the random variables N_i^(m+1) − N_i^(m), m ≥ 1, are independent and identically distributed. If we define N_i^(0) = 0 and start from i, then this holds for m ≥ 0. The Strong Law of Large Numbers yields

N_i^(m)/m → E_i(N_i) almost surely, as m → ∞.

Similarly, in continuous time, for

H_i = H_i^(1) = inf{t ≥ T_1 : X_t = i},  H_i^(m) = T_{N_i^(m)},  m ∈ N,

we get

H_i^(m)/m → E_i(H_i) = m_i almost surely, as m → ∞.

Example 68 (Empirical distributions) If (Y_n)_{n≥1} is an infinite sample (independent and identically distributed random variables) from a discrete distribution ν on S, then the random variables B_n^(i) = 1_{Y_n = i}, n ≥ 1, are also independent and identically distributed for each fixed i ∈ S, as functions of independent variables. The Strong Law of Large Numbers yields

ν_i^(n) = #{k = 1,...,n : Y_k = i}/n = (B_1^(i) + ... + B_n^(i))/n → E(B_1^(i)) = P(Y_1 = i) = ν_i almost surely, as n → ∞.

The probability mass function ν^(n) is called the empirical distribution. It lists relative frequencies in the sample and, for a specific realisation, can serve as an approximation of the true distribution. In applications in statistics, it is the sample distribution associated with a population distribution. The result that empirical distributions converge to the true distribution is true uniformly in i and in higher generality; it is usually referred to as the Glivenko-Cantelli theorem.

Remark 69 (Discrete ergodic theorem) If (M_n)_{n≥0} is a positive-recurrent discrete-time Markov chain, the Ergodic Theorem is a statement very similar to the example of empirical distributions:

#{k = 0,...,n−1 : M_k = i}/n → P_η(M_0 = i) = η_i almost surely, as n → ∞,

for a stationary distribution η, but of course the M_n, n ≥ 0, are not independent (in general). Therefore, we need to work a bit harder to deduce the Ergodic Theorem from the Strong Law of Large Numbers.
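A quick numerical check of Example 68; the distribution ν below, on S = {0, 1, 2}, is a made-up example, and the sample size is an arbitrary illustrative choice.

```python
import random
from collections import Counter

def empirical_distribution(nu, n, seed=0):
    """Empirical frequencies nu^(n)_i from n iid draws of a discrete
    distribution nu on S = {0, ..., len(nu)-1}."""
    rng = random.Random(seed)
    counts = Counter(rng.choices(range(len(nu)), weights=nu, k=n))
    return [counts[i] / n for i in range(len(nu))]

nu = [0.5, 0.3, 0.2]          # hypothetical true distribution
emp = empirical_distribution(nu, 200_000)
# Each coordinate emp[i] is a sample mean of Bernoulli(nu_i) indicators,
# so by the SLLN it is close to nu_i for large n.
```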
Lecture 10: Renewal processes and equations

10.1 Motivation and definition

Reading: Grimmett-Stirzaker 10.1-10.2; Ross 7.1-7.3

So far, the topic has been continuous-time Markov chains, and we've introduced them as discrete-time Markov chains with exponential holding times. In this setting we have a theory very much similar to the discrete-time theory, with independence of future and past given the present (Markov property), transition probabilities, invariant distributions, class structure, convergence to equilibrium, ergodic theorem, time reversal, detailed balance etc. A few odd features can occur, mainly due to explosion. These parallels are due to the exponential holding times and their lack of memory property, which is the key to the Markov property in continuous time. In practice, this assumption is often not reasonable.

Example 70 Suppose that you count the changing of batteries for an electrical device. Given that the battery has been in use for time t, is its residual lifetime distributed as its total lifetime? We would assume this if we were modelling with a Poisson process.

We may wish to replace the exponential distribution by other distributions, e.g. one that cannot take arbitrarily large values or, for other applications, one that can produce clustering effects (many short holding times separated by significantly longer ones).

We started the discussion of continuous-time Markov chains with birth processes as generalised Poisson processes. Similarly, we start here generalising the Poisson process to have non-exponential but independent identically distributed inter-arrival times.

Definition 71 Let (Z_n)_{n≥0} be a sequence of independent identically distributed positive random variables, and T_n = Σ_{k=0}^{n−1} Z_k, n ≥ 1, the partial sums. Then the process X = (X_t)_{t≥0} defined by

X_t = #{n ≥ 1 : T_n ≤ t}

is called a renewal process. The common distribution of Z_n, n ≥ 0, is called the inter-arrival distribution.
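Definition 71 translates directly into a simulation; in the Python sketch below the Uniform(0, 2) inter-arrival distribution (mean μ = 1) is an arbitrary non-exponential choice for illustration.

```python
import random

def renewal_count(t, draw_z, seed=0):
    """X_t = #{n >= 1 : T_n <= t}, where T_n = Z_0 + ... + Z_{n-1} and each
    inter-arrival time Z_n is produced by draw_z(rng)."""
    rng = random.Random(seed)
    s = 0.0        # current partial sum T_n
    count = 0
    while True:
        s += draw_z(rng)
        if s > t:
            return count
        count += 1

# Uniform(0, 2) inter-arrival times, mean mu = 1, so X_t is of order t/mu = t:
x = renewal_count(1000.0, lambda rng: rng.uniform(0.0, 2.0))
```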
Example 72 If (Y_t)_{t≥0} is a continuous-time Markov chain with Y_0 = i, then Z_n = H_i^(n+1) − H_i^(n), the times between successive returns to i by Y, are independent and identically distributed (by the strong Markov property). The associated counting process X_t = #{n ≥ 1 : H_i^(n) ≤ t} counting the visits to i is thus a renewal process.

10.2 The renewal function

Definition 73 The function m(t) := E(X_t) is called the renewal function.

It plays an important role in renewal theory. Remember that for Z_n ~ Exp(λ) we had X_t ~ Poi(λt) and in particular m(t) = E(X_t) = λt. To calculate the renewal function for general renewal processes, we should investigate the distribution of X_t. Note that, as for birth processes,

X_t = k  ⟺  T_k ≤ t < T_{k+1},

so that we can express

P(X_t = k) = P(T_k ≤ t < T_{k+1}) = P(T_k ≤ t) − P(T_{k+1} ≤ t)

in terms of the distributions of T_k = Z_0 + ... + Z_{k−1}, k ≥ 1. Recall that for two independent continuous random variables S and T with densities f and g, the random variable S + T has density

(f ∗ g)(u) = ∫_{−∞}^{∞} f(u − t) g(t) dt, u ∈ R,

the convolution (product) of f and g, and if S ≥ 0 and T ≥ 0, then

(f ∗ g)(u) = ∫_0^u f(u − t) g(t) dt, u ≥ 0.

It is not hard to check that the convolution product is symmetric, associative and distributes over sums of functions. While the first two of these properties translate as S + T = T + S and (S + T) + U = S + (T + U) for associated random variables, the third property has no such meaning, since sums of densities are no longer probability densities. However, the definition of the convolution product makes sense for general nonnegative integrable functions, and we will meet other relevant examples soon.

We can define convolution powers f^∗(1) = f and f^∗(k+1) = f ∗ f^∗(k), k ≥ 1. Then

P(T_k ≤ t) = ∫_0^t f_{T_k}(s) ds = ∫_0^t f^∗(k)(s) ds,

if Z_n, n ≥ 0, are continuous with density f.
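The convolution formula can be checked numerically: for Exp(λ) inter-arrival times, f ∗ f is the density of T_2 = Z_0 + Z_1, i.e. the Gamma(2, λ) density λ^2 u e^{−λu}. In this sketch the midpoint quadrature rule and the parameter values are illustrative choices.

```python
import math

def convolve(f, g, u, n=2000):
    """(f * g)(u) = integral_0^u f(u - t) g(t) dt, computed by the midpoint
    rule with n panels, for densities of nonnegative random variables."""
    h = u / n
    return h * sum(f(u - (i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

lam = 1.0
f = lambda s: lam * math.exp(-lam * s)       # Exp(lam) density
# f * f should agree with the Gamma(2, lam) density lam^2 u e^{-lam u}:
val = convolve(f, f, 2.0)
exact = lam ** 2 * 2.0 * math.exp(-lam * 2.0)
```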
Proposition 74 Let X be a renewal process with inter-arrival density f. Then m(t) = E(X_t) is differentiable in the weak sense that it is the integral function of

m′(s) := Σ_{k=1}^∞ f^∗(k)(s).

Lemma 75 Let X be an N-valued random variable. Then E(X) = Σ_{k≥1} P(X ≥ k).

Proof: We use Tonelli's Theorem:

Σ_{k≥1} P(X ≥ k) = Σ_{k≥1} Σ_{j≥k} P(X = j) = Σ_{j≥1} Σ_{k=1}^j P(X = j) = Σ_{j≥1} j P(X = j) = E(X). □

Proof of Proposition 74: Let us integrate Σ_{k=1}^∞ f^∗(k)(s) using Tonelli's Theorem:

∫_0^t Σ_{k=1}^∞ f^∗(k)(s) ds = Σ_{k=1}^∞ ∫_0^t f^∗(k)(s) ds = Σ_{k=1}^∞ P(T_k ≤ t) = Σ_{k=1}^∞ P(X_t ≥ k) = E(X_t) = m(t). □

10.3 The renewal equation

For continuous-time Markov chains, conditioning on the first transition time was a powerful tool. We can do this here and get what is called the renewal equation.

Proposition 76 Let X be a renewal process with inter-arrival density f. Then m(t) = E(X_t) is the unique (locally bounded) solution of

m(t) = F(t) + ∫_0^t m(t − s) f(s) ds,  i.e. m = F + f ∗ m,

where F(t) = ∫_0^t f(s) ds = P(Z_1 ≤ t).

Proof: Conditioning on the first arrival will involve the process X̃_u = X_{T_1 + u}, u ≥ 0. Note that X̃_0 = 1 and that X̃_u − 1 is a renewal process with inter-arrival times Z̃_n = Z_{n+1}, n ≥ 0, independent of T_1. Therefore

E(X_t) = ∫_0^∞ f(s) E(X_t | T_1 = s) ds = ∫_0^t f(s) E(X̃_{t−s}) ds = F(t) + ∫_0^t f(s) m(t − s) ds.
For uniqueness, suppose that also l = F + f ∗ l; then α = l − m is locally bounded and satisfies α = f ∗ α = α ∗ f. Iteration gives α = α ∗ f^∗(k) for all k ≥ 1 and, summing over k, gives for the right-hand side something finite:

Σ_{k≥1} (α ∗ f^∗(k))(t) = (α ∗ Σ_{k≥1} f^∗(k))(t) = (α ∗ m′)(t) = ∫_0^t α(t − s) m′(s) ds ≤ sup_{u∈[0,t]} |α(u)| · m(t) < ∞,

but the left-hand side is infinite unless α(t) = 0. Therefore l(t) = m(t), for all t ≥ 0. □

Example 77 We can express m as follows: m = F + F ∗ Σ_{k≥1} f^∗(k). Indeed, we check that l = F + F ∗ Σ_{k≥1} f^∗(k) satisfies the renewal equation:

F + f ∗ l = F + F ∗ f + F ∗ Σ_{j≥2} f^∗(j) = F + F ∗ Σ_{k≥1} f^∗(k) = l,

just using properties of the convolution product. By Proposition 76, l = m.

Unlike Poisson processes, general renewal processes do not have a linear renewal function, but it will be asymptotically linear (Elementary Renewal Theorem, as we will see). In fact, renewal functions are in one-to-one correspondence with inter-arrival distributions; we do not prove this, but it should not be too surprising given that m = F + f ∗ m is almost symmetric in f and m.

Unlike the Poisson process, increments of general renewal processes are not stationary (unless we change the distribution of Z_0 in a clever way, as we will see) nor independent. Some of the important results in renewal theory are asymptotic results. These asymptotic results will, in particular, allow us to prove the Ergodic Theorem for Markov chains.

10.4 Strong Law and Central Limit Theorem of renewal theory

Theorem 78 (Strong Law of renewal theory) Let X be a renewal process with mean inter-arrival time μ ∈ (0, ∞). Then

X_t/t → 1/μ almost surely, as t → ∞.

Proof: Note that X is constant on [T_n, T_{n+1}) for all n, and therefore constant on [T_{X_t}, T_{X_t+1}). Therefore, as soon as X_t > 0,

T_{X_t}/X_t ≤ t/X_t < T_{X_t+1}/X_t = (T_{X_t+1}/(X_t + 1)) · ((X_t + 1)/X_t).
Now P(X_t → ∞) = 1, since X_t ≤ n for all t would mean T_{n+1} = ∞, which is absurd, since T_{n+1} = Z_0 + ... + Z_n is a finite sum of finite random variables. Therefore, we conclude from the Strong Law of Large Numbers for T_n that

T_{X_t}/X_t → μ almost surely, as t → ∞.

Therefore, if X_t → ∞ and T_n/n → μ, then the sandwich above gives

μ ≤ liminf_{t→∞} t/X_t ≤ limsup_{t→∞} t/X_t ≤ μ,

but this means P(X_t/t → 1/μ) ≥ P(X_t → ∞, T_n/n → μ) = 1, as required. □

Try to do this proof for convergence in probability. The nasty ε expressions are not very useful in this context, and the proof is very much harder. But we can now deduce a corresponding Weak Law of Renewal Theory, because almost sure convergence implies convergence in probability. We also have a Central Limit Theorem:

Theorem 79 (Central Limit Theorem of Renewal Theory) Let X = (X_t)_{t≥0} be a renewal process whose inter-arrival times (Y_n)_{n≥0} satisfy 0 < σ^2 = Var(Y_1) < ∞ and μ = E(Y_1). Then

(X_t − t/μ) / √(tσ^2/μ^3) → N(0, 1) in distribution, as t → ∞.

The proof is not difficult and left as an exercise on Assignment 5.

10.5 The elementary renewal theorem

Theorem 80 Let X be a renewal process with mean inter-arrival times μ and m(t) = E(X_t). Then

m(t)/t = E(X_t)/t → 1/μ as t → ∞.

Note that this does not follow easily from the Strong Law of renewal theory since almost sure convergence does not imply convergence of means (cf. Proposition 60, see also the counterexample on Assignment 5). In fact, the proof is longer and not examinable; we start with a lemma.

Lemma 81 For a renewal process X with arrival times (T_n)_{n≥1}, we have

E(T_{X_t+1}) = μ(m(t) + 1),  where m(t) = E(X_t), μ = E(T_1).
This ought to be true, because T_{X_t+1} is the sum of X_t + 1 inter-arrival times, each with mean μ. Taking expectations, we should get m(t) + 1 times μ. However, if we condition on X_t, we have to know the distribution of the residual inter-arrival time after t, but without lack of memory, we are stuck.

Proof: Let us do a one-step analysis on the quantity of interest g(t) = E(T_{X_t+1}):

g(t) = ∫_0^∞ E(T_{X_t+1} | T_1 = s) f(s) ds = ∫_0^t (s + E(T̃_{X̃_{t−s}+1})) f(s) ds + ∫_t^∞ s f(s) ds = μ + (g ∗ f)(t).

This is almost the renewal equation. In fact, g_1(t) = g(t)/μ − 1 satisfies the renewal equation:

g_1(t) = (1/μ) ∫_0^t g(t − s) f(s) ds = ∫_0^t (g_1(t − s) + 1) f(s) ds = F(t) + (g_1 ∗ f)(t),

and, by Proposition 76, g_1(t) = m(t), i.e. g(t) = μ(1 + m(t)) as required. □

Proof of Theorem 80: Clearly t < E(T_{X_t+1}) = μ(m(t) + 1) gives the lower bound

liminf_{t→∞} m(t)/t ≥ 1/μ.

For the upper bound we use a truncation argument and introduce

Z̃_j = Z_j ∧ a = { Z_j if Z_j < a;  a if Z_j ≥ a }

with associated renewal process X̃. Z̃_j ≤ Z_j for all j implies X̃_t ≥ X_t for all t, hence m̃(t) ≥ m(t). Putting things together, we get from the lemma again

t ≥ E(T̃_{X̃_t}) = E(T̃_{X̃_t+1}) − E(Z̃_{X̃_t+1}) = μ̃(m̃(t) + 1) − E(Z̃_{X̃_t+1}) ≥ μ̃(m(t) + 1) − a.

Therefore

m(t)/t ≤ 1/μ̃ + a/(tμ̃) − 1/t,  so that limsup_{t→∞} m(t)/t ≤ 1/μ̃.

Now μ̃ = E(Z̃_1) = E(Z_1 ∧ a) → E(Z_1) = μ as a → ∞ (by monotone convergence). Therefore limsup_{t→∞} m(t)/t ≤ 1/μ.

Note that truncation was necessary to get E(Z̃_{X̃_t+1}) ≤ a. It would have been enough if we had E(Z_{X_t+1}) = E(Z_1) = μ, but this is not true. Look at the Poisson process as an example. We know that the residual lifetime has already mean μ = 1/λ, but there is also the part of Z_{X_t+1} before time t. We will explore this in Lecture 11 when we discuss residual lifetimes in renewal theory.
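The renewal equation of Proposition 76 also gives a practical way to compute m: discretise the convolution integral and solve for m on a grid. The sketch below (the step size and the right-endpoint quadrature rule are illustrative choices) checks the Exp(λ) case, where m(t) = λt exactly, and thereby also illustrates m(t)/t → 1/μ = λ from Theorem 80.

```python
import math

def solve_renewal(f, F, t_max, h=0.002):
    """Solve m = F + f * m on [0, t_max] by discretising the convolution
    integral with step h (right-endpoint rule); returns grid values m(ih)."""
    n = int(round(t_max / h))
    m = [0.0] * (n + 1)
    for i in range(n + 1):
        conv = h * sum(m[i - j] * f(j * h) for j in range(1, i + 1))
        m[i] = F(i * h) + conv
    return m

lam = 2.0
f = lambda s: lam * math.exp(-lam * s)
F = lambda t: 1.0 - math.exp(-lam * t)
m = solve_renewal(f, F, 2.0)
# For Exp(lam) inter-arrivals m(t) = lam * t, so m(2) should be close to 4,
# up to the O(h) discretisation error of the quadrature rule.
```

For a non-exponential inter-arrival density the same scheme approximates the genuinely non-linear renewal function, whose slope still settles at 1/μ.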