MASSACHUSETTS INSTITUTE OF TECHNOLOGY
6.436J/15.085J Fall 2008
Lecture 19  11/17/2008

LAWS OF LARGE NUMBERS II: THE STRONG LAW OF LARGE NUMBERS

Contents
1. The strong law of large numbers
2. The Chernoff bound

THE STRONG LAW OF LARGE NUMBERS

While the weak law of large numbers establishes convergence of the sample mean in probability, the strong law establishes almost sure convergence. Before we proceed, we point out two common methods for proving almost sure convergence.

Proposition 1: Let $\{X_n\}$ be a sequence of random variables, not necessarily independent.

(i) If $\sum_{n=1}^{\infty} E[|X_n|^s] < \infty$ for some $s > 0$, then $X_n \to 0$, a.s.

(ii) If $\sum_{n=1}^{\infty} P(|X_n| > \epsilon) < \infty$, for every $\epsilon > 0$, then $X_n \to 0$, a.s.

Proof. (i) By the monotone convergence theorem, we obtain $E\big[\sum_{n=1}^{\infty} |X_n|^s\big] < \infty$, which implies that the random variable $\sum_{n=1}^{\infty} |X_n|^s$ is finite, with probability 1. Therefore, $|X_n|^s \to 0$, a.s., which also implies that $X_n \to 0$, a.s.

(ii) Setting $\epsilon = 1/k$, for any positive integer $k$, the Borel-Cantelli Lemma shows that the event $\{|X_n| > 1/k\}$ occurs only a finite number of times, with probability 1. Thus, $P(\limsup_n |X_n| > 1/k) = 0$, for every positive integer $k$. Note that the sequence of events $\{\limsup_n |X_n| > 1/k\}$ is monotone and converges to the event $\{\limsup_n |X_n| > 0\}$. The continuity of probability measures implies that $P(\limsup_n |X_n| > 0) = 0$. This establishes that $X_n \to 0$, a.s.
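Criterion (ii) can be illustrated numerically. Take, as a hypothetical example, $X_n = 1$ with probability $1/n^2$ and $X_n = 0$ otherwise: then $\sum_n P(|X_n| > \epsilon) \leq \sum_n 1/n^2 < \infty$, so by Borel-Cantelli each sample path has only finitely many nonzero $X_n$. A minimal simulation sketch (the horizon and path count are illustrative choices):

```python
import random

random.seed(0)
N = 10_000  # horizon; a stand-in for "all n"

def exceedances():
    # X_n = 1 w.p. 1/n^2, else 0; sum_n P(|X_n| > eps) = sum_n 1/n^2 < infinity
    return sum(1 for n in range(1, N + 1) if random.random() < 1.0 / n**2)

# Borel-Cantelli: each path sees only finitely many exceedances;
# the mean count is close to sum_n 1/n^2 = pi^2/6 ~ 1.645
counts = [exceedances() for _ in range(500)]
print(sum(counts) / len(counts))
```

Every simulated path has a small, finite number of exceedances, consistent with $X_n \to 0$ almost surely.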

Theorem 1: Let $X, X_1, X_2, \ldots$ be i.i.d. random variables, and assume that $E[|X|] < \infty$. Let $S_n = X_1 + \cdots + X_n$. Then, $S_n/n$ converges almost surely to $E[X]$.

Proof, assuming finite fourth moments. Let us make the additional assumption that $E[X^4] < \infty$. Note that this implies $E[|X|] < \infty$. Indeed, using the inequality $|x| \leq 1 + x^4$, we have $E[|X|] \leq 1 + E[X^4] < \infty$.

Let us assume first that $E[X] = 0$. We will show that

$$\sum_{n=1}^{\infty} E\Big[\frac{(X_1+\cdots+X_n)^4}{n^4}\Big] < \infty.$$

We have

$$E\Big[\frac{(X_1+\cdots+X_n)^4}{n^4}\Big] = \frac{1}{n^4}\sum_{i_1=1}^{n}\sum_{i_2=1}^{n}\sum_{i_3=1}^{n}\sum_{i_4=1}^{n} E[X_{i_1} X_{i_2} X_{i_3} X_{i_4}].$$

Let us consider the various terms in this sum. If one of the indices is different from all of the other indices, the corresponding term is equal to zero. For example, if $i_1$ is different from $i_2$, $i_3$, and $i_4$, the assumption $E[X_{i_1}] = 0$ yields

$$E[X_{i_1} X_{i_2} X_{i_3} X_{i_4}] = E[X_{i_1}]\, E[X_{i_2} X_{i_3} X_{i_4}] = 0.$$

Therefore, the nonzero terms in the above sum are either of the form $E[X_i^4]$ (there are $n$ such terms), or of the form $E[X_i^2 X_j^2]$, with $i \neq j$. Let us count the number of terms of the second type. Such terms are obtained in three different ways: by setting $i_1 = i_2 \neq i_3 = i_4$, or by setting $i_1 = i_3 \neq i_2 = i_4$, or by setting $i_1 = i_4 \neq i_2 = i_3$. For each one of these three ways, we have $n$ choices for the first pair of indices, and $n-1$ choices for the second pair. We conclude that there are $3n(n-1)$ terms of this type. Thus,

$$E\Big[\frac{(X_1+\cdots+X_n)^4}{n^4}\Big] = \frac{n E[X^4] + 3n(n-1) E[X_1^2 X_2^2]}{n^4}.$$

Using the inequality $xy \leq (x^2+y^2)/2$, we obtain $E[X_1^2 X_2^2] \leq E[X^4]$, and

$$E\Big[\frac{(X_1+\cdots+X_n)^4}{n^4}\Big] \leq \frac{\big(n + 3n(n-1)\big) E[X^4]}{n^4} \leq \frac{3n^2 E[X^4]}{n^4} = \frac{3 E[X^4]}{n^2}.$$

It follows that

$$E\Big[\sum_{n=1}^{\infty} \frac{(X_1+\cdots+X_n)^4}{n^4}\Big] = \sum_{n=1}^{\infty} E\Big[\frac{(X_1+\cdots+X_n)^4}{n^4}\Big] \leq \sum_{n=1}^{\infty} \frac{3 E[X^4]}{n^2} < \infty,$$

where the last step uses the well-known property $\sum_{n=1}^{\infty} 1/n^2 < \infty$. This implies that $(X_1+\cdots+X_n)^4/n^4$ converges to zero with probability 1, and therefore, $(X_1+\cdots+X_n)/n$ also converges to zero with probability 1, which is the strong law of large numbers. For the more general case where the mean of the random variables $X_i$ is nonzero, the preceding argument establishes that $\big(X_1+\cdots+X_n - nE[X]\big)/n$ converges to zero, which is the same as $(X_1+\cdots+X_n)/n$ converging to $E[X]$, with probability 1.

Proof, assuming finite second moments. We now consider the case where we only assume that $E[X^2] < \infty$. We have

$$E\Big[\Big(\frac{S_n}{n} - \mu\Big)^2\Big] = \frac{\mathrm{var}(X)}{n}.$$

If we only consider values of $n$ that are perfect squares, we obtain

$$\sum_{i=1}^{\infty} E\Big[\Big(\frac{S_{i^2}}{i^2} - \mu\Big)^2\Big] = \sum_{i=1}^{\infty} \frac{\mathrm{var}(X)}{i^2} < \infty,$$

which implies that $\big(S_{i^2}/i^2 - E[X]\big)^2$ converges to zero, with probability 1. Therefore, $S_{i^2}/i^2$ converges to $E[X]$, with probability 1.

Suppose now that the random variables $X_i$ are nonnegative. Consider some $n$ such that $i^2 \leq n < (i+1)^2$. We then have $S_{i^2} \leq S_n \leq S_{(i+1)^2}$. It follows that

$$\frac{S_{i^2}}{(i+1)^2} \leq \frac{S_n}{n} \leq \frac{S_{(i+1)^2}}{i^2},$$

or

$$\frac{i^2}{(i+1)^2} \cdot \frac{S_{i^2}}{i^2} \leq \frac{S_n}{n} \leq \frac{(i+1)^2}{i^2} \cdot \frac{S_{(i+1)^2}}{(i+1)^2}.$$

As $n \to \infty$, we also have $i \to \infty$. Since $i/(i+1) \to 1$, and since $S_{i^2}/i^2$ converges to $E[X]$ with probability 1, we see that for almost all sample points, $S_n/n$ is sandwiched between two sequences that converge to $E[X]$. This proves that $S_n/n \to E[X]$, with probability 1.

For a general random variable $X$, we write it in the form $X = X^+ - X^-$, where $X^+$ and $X^-$ are nonnegative. The strong law applied to $X^+$ and $X^-$ separately implies the strong law for $X$ as well.
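A quick Monte Carlo sketch of Theorem 1, using Uniform(0,1) variables (an illustrative choice with $E[X] = 1/2$ and all moments finite):

```python
import random

random.seed(1)

def sample_mean(n):
    # mean of n i.i.d. Uniform(0,1) draws; E[X] = 1/2 and E[X^4] < infinity
    return sum(random.random() for _ in range(n)) / n

# the error |S_n/n - E[X]| shrinks as n grows, as Theorem 1 predicts
for n in (100, 10_000, 1_000_000):
    print(n, abs(sample_mean(n) - 0.5))
```

Each printed error is a single sample path; rerunning with different seeds shows the same shrinking behavior, which is the almost-sure convergence in action.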

The proof for the most general case (finite mean, but possibly infinite variance) is omitted. It involves truncating the distribution of $X$, so that its moments are all finite, and then verifying that the errors due to such truncation are not significant in the limit.

2 The Chernoff bound

Let again $X_1, X_2, \ldots$ be i.i.d., and $S_n = X_1 + \cdots + X_n$. Let us assume, for simplicity, that $E[X] = 0$. According to the weak law of large numbers, we know that $P(S_n \geq na) \to 0$, for every $a > 0$. We are interested in a more detailed estimate of $P(S_n \geq na)$, involving the rate at which this probability converges to zero. It turns out that if the moment generating function of $X$ is finite on some interval $[0, c]$ (where $c > 0$), then $P(S_n \geq na)$ decays exponentially with $n$, and much is known about the precise rate of exponential decay.

2.1 Upper bound

Let $M(s) = E[e^{sX}]$, and assume that $M(s) < \infty$, for $s \in [0, c]$, where $c > 0$. Recall that

$$M_{S_n}(s) = E[e^{s(X_1+\cdots+X_n)}] = (M(s))^n.$$

For any $s > 0$, the Markov inequality yields

$$P(S_n \geq na) = P(e^{sS_n} \geq e^{sna}) \leq e^{-sna} E[e^{sS_n}] = e^{-sna} (M(s))^n.$$

Every nonnegative value of $s$ gives us a particular bound on $P(S_n \geq na)$. To obtain the tightest possible bound, we minimize over $s$, and obtain the following result.

Theorem 2.1 (Chernoff upper bound): Suppose that $E[e^{sX}] < \infty$ for some $s > 0$, and that $a > 0$. Then,

$$P(S_n \geq na) \leq e^{-n\varphi(a)},$$

where

$$\varphi(a) = \sup_{s \geq 0}\big(sa - \log M(s)\big).$$

For $s = 0$, we have $sa - \log M(s) = 0 - \log 1 = 0$, where we have used the generic property $M(0) = 1$. Furthermore,

$$\frac{d}{ds}\big(sa - \log M(s)\big)\Big|_{s=0} = a - \frac{1}{M(s)}\frac{dM(s)}{ds}\Big|_{s=0} = a - E[X] > 0.$$
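The supremum defining $\varphi(a)$ can be evaluated numerically by maximizing $sa - \log M(s)$ over a grid. A minimal sketch for a Rademacher variable ($\pm 1$ with probability $1/2$ each, so $M(s) = \cosh s$ and $E[X] = 0$); the closed-form rate used for comparison is the standard fair-coin large-deviations rate, stated here as an assumption for the check:

```python
import math

def phi(a, log_M, s_max=5.0, steps=200_000):
    # sup over s in [0, s_max] of s*a - log M(s), by brute-force grid search
    # (a sketch: assumes the maximizer lies below s_max)
    grid = (s_max * i / steps for i in range(steps + 1))
    return max(s * a - log_M(s) for s in grid)

# Rademacher X (+1 or -1 with probability 1/2 each): M(s) = cosh(s)
a = 0.5
approx = phi(a, lambda s: math.log(math.cosh(s)))
# assumed closed form for the Rademacher rate:
# ((1+a)/2) log(1+a) + ((1-a)/2) log(1-a)
exact = ((1 + a) / 2) * math.log(1 + a) + ((1 - a) / 2) * math.log(1 - a)
print(approx, exact)  # both ~ 0.1308
```

The grid maximum agrees with the closed form, and is strictly positive, matching the positivity of $\varphi(a)$ argued next.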

Since the function $sa - \log M(s)$ is zero and has a positive derivative at $s = 0$, it must be positive when $s$ is positive and small. It follows that the supremum $\varphi(a)$ of the function $sa - \log M(s)$ over all $s \geq 0$ is also positive. In particular, for any fixed $a > 0$, the probability $P(S_n \geq na)$ decays at least exponentially fast with $n$.

Example: For a standard normal random variable $X$, we have $M(s) = e^{s^2/2}$. Therefore, $sa - \log M(s) = sa - s^2/2$. To maximize this expression over all $s \geq 0$, we form the derivative, which is $a - s$, and set it to zero, resulting in $s = a$. Thus, $\varphi(a) = a^2/2$, which leads to the bound

$$P(X \geq a) \leq e^{-a^2/2}.$$

2.2 Lower bound

Remarkably, it turns out that the estimate $\varphi(a)$ of the decay rate is tight, under minimal assumptions. To keep the argument simple, we introduce some simplifying assumptions.

Assumption 1:
(i) $M(s) = E[e^{sX}] < \infty$, for all $s \in \mathbb{R}$.
(ii) The random variable $X$ is continuous, with PDF $f_X$.
(iii) The random variable $X$ does not admit finite upper and lower bounds. (Formally, $0 < F_X(x) < 1$, for all $x \in \mathbb{R}$.)

We then have the following lower bound.

Theorem 2.2 (Chernoff lower bound): Under Assumption 1, we have

$$\lim_{n\to\infty} \frac{1}{n}\log P(S_n \geq na) = -\varphi(a), \qquad (1)$$

for every $a > 0$.

We note two consequences of our assumptions, whose proof is left as an exercise:

(a) $\lim_{s\to\infty} \dfrac{\log M(s)}{s} = \infty$;

(b) $M(s)$ is differentiable at every $s$.

The first property guarantees that for any $a > 0$ we have $\lim_{s\to\infty} (\log M(s) - sa) = \infty$. Since $M(s) > 0$ for all $s$, and since $M(s)$ is differentiable, it follows that $\log M(s)$ is also differentiable and that there exists some $s^* \geq 0$ at which

$\log M(s) - sa$ is minimized over all $s \geq 0$. Taking derivatives, we see that such an $s^*$ satisfies $a = M'(s^*)/M(s^*)$, where $M'$ stands for the derivative of $M$. In particular,

$$\varphi(a) = s^* a - \log M(s^*). \qquad (2)$$

Let us introduce a new PDF

$$f_Y(x) = \frac{e^{s^* x} f_X(x)}{M(s^*)}.$$

This is a legitimate PDF because

$$\int f_Y(x)\,dx = \frac{1}{M(s^*)}\int e^{s^* x} f_X(x)\,dx = \frac{M(s^*)}{M(s^*)} = 1.$$

The moment generating function associated with the new PDF is

$$M_Y(s) = \frac{1}{M(s^*)}\int e^{sx} e^{s^* x} f_X(x)\,dx = \frac{M(s + s^*)}{M(s^*)}.$$

Thus,

$$E[Y] = \frac{1}{M(s^*)}\frac{d}{ds} M(s + s^*)\Big|_{s=0} = \frac{M'(s^*)}{M(s^*)} = a,$$

where the last equality follows from our definition of $s^*$. The distribution of $Y$ is called a tilted version of the distribution of $X$.

Let $Y_1, \ldots, Y_n$ be i.i.d. random variables with PDF $f_Y$. Because of the close relation between $f_X$ and $f_Y$, approximate probabilities of events involving $Y_1, \ldots, Y_n$ can be used to obtain approximate probabilities of events involving $X_1, \ldots, X_n$.

We keep assuming that $a > 0$, and fix some $\delta > 0$. Let

$$B = \Big\{(x_1, \ldots, x_n) \in \mathbb{R}^n : n(a - \delta) \leq \sum_{i=1}^{n} x_i \leq n(a + \delta)\Big\}.$$

Let $S_n = X_1 + \cdots + X_n$ and $T_n = Y_1 + \cdots + Y_n$. We have

$$P\big(S_n \geq n(a - \delta)\big) \geq P\big(n(a - \delta) \leq S_n \leq n(a + \delta)\big)$$
$$= \int_{(x_1,\ldots,x_n)\in B} f_X(x_1)\cdots f_X(x_n)\,dx_1\cdots dx_n$$
$$= (M(s^*))^n \int_{(x_1,\ldots,x_n)\in B} e^{-s^* x_1} f_Y(x_1)\cdots e^{-s^* x_n} f_Y(x_n)\,dx_1\cdots dx_n$$
$$\geq (M(s^*))^n e^{-n s^*(a+\delta)} \int_{(x_1,\ldots,x_n)\in B} f_Y(x_1)\cdots f_Y(x_n)\,dx_1\cdots dx_n$$
$$= (M(s^*))^n e^{-n s^*(a+\delta)} P(T_n \in B). \qquad (3)$$

The second inequality above was obtained because for every $(x_1, \ldots, x_n) \in B$, we have $x_1 + \cdots + x_n \leq n(a + \delta)$, so that $e^{-s^* x_1} \cdots e^{-s^* x_n} \geq e^{-n s^*(a+\delta)}$. By the weak law of large numbers, we have

$$P(T_n \in B) = P\Big(\frac{Y_1 + \cdots + Y_n}{n} \in [a - \delta, a + \delta]\Big) \to 1,$$

as $n \to \infty$. Taking logarithms, dividing by $n$, then taking the limit of the two sides of Eq. (3), and finally using Eq. (2), we obtain

$$\liminf_{n\to\infty} \frac{1}{n}\log P\big(S_n \geq n(a - \delta)\big) \geq \log M(s^*) - s^* a - s^* \delta = -\varphi(a) - s^* \delta.$$

This inequality is true for every $\delta > 0$, which establishes the lower bound in Eq. (1).
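The tilting construction above can be checked numerically. For a standard normal $X$ we have $M(s) = e^{s^2/2}$, so the condition $a = M'(s^*)/M(s^*)$ gives $s^* = a$, and the tilted density $f_Y$ should integrate to 1 and have mean $a$ (indeed, it is the $N(a, 1)$ density). A sketch using a midpoint Riemann sum (the truncation range and step count are arbitrary choices):

```python
import math

def tilted_moments(s_star, lo=-20.0, hi=20.0, steps=200_000):
    # f_Y(x) = e^{s* x} f_X(x) / M(s*) with X ~ N(0,1) and M(s) = e^{s^2/2};
    # returns (total mass, mean) of f_Y via a midpoint Riemann sum
    h = (hi - lo) / steps
    M = math.exp(s_star ** 2 / 2)
    mass = mean = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        f_y = math.exp(s_star * x - x * x / 2) / (math.sqrt(2 * math.pi) * M)
        mass += f_y * h
        mean += x * f_y * h
    return mass, mean

a = 1.5  # for the normal case, s* = a
mass, mean = tilted_moments(a)
print(mass, mean)  # ~ (1.0, 1.5): the tilted density is N(a, 1)
```

The unit mass confirms $f_Y$ is a legitimate PDF, and the mean equals $a$, which is exactly why the weak law drives $P(T_n \in B) \to 1$ in the argument above.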

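Combining the two bounds: for i.i.d. standard normal variables, $\varphi(a) = a^2/2$ as in the example above, so $P(S_n \geq na)$ must stay below $e^{-n a^2/2}$ while decaying at essentially that exponential rate. A small Monte Carlo sketch (the values of $a$, $n$, and the trial count are illustrative):

```python
import math
import random

random.seed(2)
a, n, trials = 0.5, 50, 100_000

# empirical P(S_n >= n a) for i.i.d. N(0,1), vs the Chernoff bound e^{-n a^2/2}
hits = sum(1 for _ in range(trials)
           if sum(random.gauss(0.0, 1.0) for _ in range(n)) >= n * a)
empirical = hits / trials
bound = math.exp(-n * a * a / 2)
print(empirical, bound)  # empirical tail stays below the bound
```

The empirical tail probability is a few times smaller than the bound but of a comparable exponential order, consistent with Theorems 2.1 and 2.2.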
MIT OpenCourseWare
http://ocw.mit.edu

6.436J / 15.085J Fundamentals of Probability
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.