MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

Similar documents
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

Advanced Stochastic Processes.

Convergence of random variables. (telegram style notes) P.J.C. Spreij

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall Midterm Solutions

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Sequences and Series of Functions

7.1 Convergence of sequences of random variables

Lecture 19: Convergence

Seunghee Ye Ma 8: Week 5 Oct 28

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

1 Convergence in Probability and the Weak Law of Large Numbers

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number

Distribution of Random Samples & Limit theorems

Chapter 6 Infinite Series

Math 341 Lecture #31 6.5: Power Series

STAT Homework 1 - Solutions

This section is optional.

Lecture 3 : Random variables and their distributions

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities

Notes 5 : More on the a.s. convergence of sums

Lecture 8: Convergence of transformations and law of large numbers

It is often useful to approximate complicated functions using simpler ones. We consider the task of approximating a function by a polynomial.

Math 113 Exam 3 Practice

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

4x 2. (n+1) x 3 n+1. = lim. 4x 2 n+1 n3 n. n 4x 2 = lim = 3

PROBLEM SET 5 SOLUTIONS 126 = , 37 = , 15 = , 7 = 7 1.

sin(n) + 2 cos(2n) n 3/2 3 sin(n) 2cos(2n) n 3/2 a n =

Different kinds of Mathematical Induction

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT

LECTURE 8: ASYMPTOTICS I

6.3 Testing Series With Positive Terms

Measure and Measurable Functions

Lecture 4. We also define the set of possible values for the random walk as the set of all x R d such that P(S n = x) > 0 for some n.

MAS111 Convergence and Continuity

ST5215: Advanced Statistical Theory

4. Partial Sums and the Central Limit Theorem

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)].

INFINITE SEQUENCES AND SERIES

Solution. 1 Solutions of Homework 1. Sangchul Lee. October 27, Problem 1.1

7.1 Convergence of sequences of random variables

Fall 2013 MTH431/531 Real analysis Section Notes

Spring Information Theory Midterm (take home) Due: Tue, Mar 29, 2016 (in class) Prof. Y. Polyanskiy. P XY (i, j) = α 2 i 2j

Lecture 3 The Lebesgue Integral

The natural exponential function

Limit superior and limit inferior c Prof. Philip Pennance 1 -Draft: April 17, 2017

Math 104: Homework 2 solutions

ECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002

MATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED

Notes 19 : Martingale CLT

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

Lecture 7: Properties of Random Samples

1.010 Uncertainty in Engineering Fall 2008

The log-behavior of n p(n) and n p(n)/n

Chapter 0. Review of set theory. 0.1 Sets

Math 525: Lecture 5. January 18, 2018

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

Math 140A Elementary Analysis Homework Questions 3-1

Solutions to Tutorial 3 (Week 4)

Solutions to HW Assignment 1

We are mainly going to be concerned with power series in x, such as. (x)} converges - that is, lims N n

1 Approximating Integrals using Taylor Polynomials

Estimation for Complete Data

lim za n n = z lim a n n.

M17 MAT25-21 HOMEWORK 5 SOLUTIONS

Mathematical Statistics - MS

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology

Mathematics 170B Selected HW Solutions.

Additional Notes on Power Series

Problem Set 4 Due Oct, 12

Math 113 Exam 4 Practice

1 Lecture 2: Sequence, Series and power series (8/14/2012)

Power series are analytic

MAT1026 Calculus II Basic Convergence Tests for Series

Power series are analytic

Lecture 3: August 31

PRACTICE FINAL/STUDY GUIDE SOLUTIONS

Assignment 5: Solutions

Entropy Rates and Asymptotic Equipartition

Lecture 10 October Minimaxity and least favorable prior sequences

Probability for mathematicians INDEPENDENCE TAU

CHAPTER 10 INFINITE SEQUENCES AND SERIES

y X F n (y), To see this, let y Y and apply property (ii) to find a sequence {y n } X such that y n y and lim sup F n (y n ) F (y).

1+x 1 + α+x. x = 2(α x2 ) 1+x

BINOMIAL COEFFICIENT AND THE GAUSSIAN

The Wasserstein distances

7 Sequences of real numbers

Sequences and Series

MIT Spring 2016

Application to Random Graphs


Review Questions, Chapters 8, 9. f(y) = 0, elsewhere. F (y) = f Y(1) = n ( e y/θ) n 1 1 θ e y/θ = n θ e yn

2.2. Central limit theorem.

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

MATH 413 FINAL EXAM. f(x) f(y) M x y. x + 1 n

Exponential Functions and Taylor Series

Transcription:

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique. Cramér s Theorem We have established i the previous lecture that uder some assumptios o the Momet Geeratig Fuctio (MGF) M(θ), a i.i.d. sequece of radom variables X i, ; i with mea µ satisfies P(S a) exp( I(a)), where S = i X i, ad I(a) sup θ (θa log M(θ)) is the Legedre trasform. The fuctio I(a) is also commoly called the rate fuctio i the theory of Large Deviatios. The boud implies log P(S a) lim sup I(a), ad we have idicated that the boud is tight. Namely, ideally we would like to establish the limit log P(S a) lim sup = I(a), Furthermore, we might be iterested i more complicated rare evets, beyod the iterval [a, ). For example, the likelihood that P(S A) for some set A R ot cotaiig the mea value µ. The Large Deviatios theory says that roughly speakig lim P(S A) = if I(x), () x A

but ufortuately this statemet is ot precisely correct. Cosider the followig example. Let X be a iteger-valued radom variable, ad A = { m p : m Z, p is odd prime.}. The for prime, we have P(S A) = ; but for = 2 k, log P (S A) we have P (S A) = 0. As a result, the limit lim i this case does ot exist. The sese i which the idetity () is give by the Cramér s Theorem below. Theorem (Cram er s Theorem). Give a sequece of i.i.d. real valued ra dom variables X i, i with a commo momet geeratig fuctio M(θ) = E[exp(θX )] the followig holds: (a) For ay closed set F R, (b) For ay ope set U R, lim sup log P(S F ) if I(x), x F lim if log P(S U) if I(x). We will prove the theorem oly for the special case whe D(M) = R (amely, the MGF is fiite everywhere) ad whe the support of X is etire R. Namely for every K > 0, P(X > K) > 0 ad P(X < K) > 0. For example a Gaussia radom variable satisfies this property. To see the power of the theorem, let us apply it to the tail of S. I the followig sectio we will establish that I(x) is a o-decreasig fuctio o the iterval [µ, ). Furthermore, we will establish that if it is fiite i some iterval cotaiig x it is also cotiuous at x. Thus fix a ad suppose I is fiite i a iterval cotaiig a. Takig F to be the closed set [a, ) with a > µ, we obtai from the lim sup log P(S [a, )) mi I(x) x U x a = I(a). Applyig the secod part of Cramér s Theorem, we obtai lim if log P(S [a, )) lim if log P(S (a, )) if I(x) x>a = I(a). 2

Thus i this special case ideed the large deviatios limit exists: lim log P(S a) = I(a). The limit is isesitive to whether the iequality is strict, i the sese that we also have lim log P(S > a) = I(a). 2 Properties of the rate fuctio I Before we prove this theorem, we will eed to establish several properties of I(x) ad M(θ). Propositio. The rate fuctio I satisfies the followig properties (a) I is a covex o-egative fuctio satisfyig I(µ) = 0. Furthermore, it is a icreasig fuctio o [µ, ) ad a decreasig fuctio o (, µ]. Fially I(x) = sup θ 0 (θx log M(θ)) for every x µ ad I(x) = sup θ 0 (θx log M(θ)) for every x µ. (b) Suppose i additio that D(M) = R ad the support of X is R. The, I is a fiite cotiuous fuctio o R. Furthermore, for every x R we have I(x) = θ 0 x log M(θ 0 ), for some θ 0 = θ 0 (x) satisfyig Ṁ(θ 0 ) x =. (2) M(θ 0 ) Proof of part (a). Covexity is due to the fact that I(x) is poit-wise supremum. Precisely, cosider λ (0, ) I(λx + ( λ)y) = sup[θ(λx + ( x)y) log M(θ)] θ = sup[λ(x log M(θ)) + ( λ)(y log M(θ))] λ sup(x log M(θ)) + ( λ) sup (y log M(θ)) θ θ =λi(x) + ( λ)i(y). This establishes the covexity. Now sice M(0) = the I(x) 0 x log M(0) = 0 ad the o-egativity is established. By Jese s iequality, we have that M(θ) = E[exp(θX )] exp(θe[x ]) = exp(θµ). 3

Therefore, log M(θ) θµ, amely, θµ log M(θ) 0, implyig I(µ) = 0 = mi x R I(x). Furthermore, if x > µ, the for θ < 0 we have θx log M(θ) θ(x µ) < 0. This meas that sup θ (θx log M(θ)) must be equal to sup θ 0 (θx log M(θ)). Similarly we show that whe x < µ, we have I(x) = sup θ 0 (θx log M(θ)). Next, the mootoicity follows from covexity. Specifically, the existece of real umbers µ x < y such that I(x) > I(y) I(µ) = 0 violates covexity (check). This completes the proof of part (a). Proof of part (b). For ay K > 0 we have log M(θ) log exp(θx) dp (x) lim if = lim if θ θ θ θ lim if log exp(θx) dp (x) Sice K is arbitrary, Similarly, Therefore, θ θ lim if log (exp(kθ)p([k, ])) θ θ K = K + lim if log P([K, ]) θ θ = K (sice supp(x ) = R, we have P([K, )) > 0.) lim if log M(θ) = θ θ lim if log M(θ) = θ θ lim θx log M(θ) = lim θ(x log M(θ)) θ θ θ Therefore, for each x as θ, we have that lim θx log M(θ) = θ From the previous lecture we kow that M(θ) is differetiable (hece cotiuous). Therefore the supremum of θx log M(θ) is achieved at some fiite value θ 0 = θ 0 (x), amely, I(x) = θ 0 x log M(θ 0 ) <, 4

where θ 0 is foud by settig the derivative of θx log M(θ) to zero. Namely, θ 0 must satisfy (2). Sice I is a fiite covex fuctio o R it is also cotiuous (verify this). This completes the proof of part (b). 3 Proof of Cramér s Theorem Now we are equipped to provig the Cramér s Theorem. Proof of Cramer s Theorem. Part (a). Fix a closed set F R. Let α + = mi{x [µ, + ) F } ad α = max{x (, µ] F }. Note that α + ad α exist sice F is closed. If α + = µ the I(µ) = 0 = mi x R I(x). Note that log P(S F ) 0, ad the statemet (a) follows trivially. Similarly, if α = µ, we also have statemet (a). Thus, assume α < µ < α +. The Defie P (S F ) P (S [α +, )) + P (S (, α ]) We already showed that from which we have x P (S [α +, )), y P (S (, α ]). P (S α + ) exp( (θα + log M(θ))), θ 0. log P (S α + ) (θα + log M(θ)), θ 0. log P (S α + ) sup(θα + log M(θ)) = I(α + ) θ 0 The secod equality i the last equatio is due to the fact that the supremum i I(x) is achieved at θ 0, which was established as a part of Propositio. Thus, we have Similarly, we have lim sup lim sup log P (S α + ) I(α + ) (3) log P (S α ) I(α ) (4) Applyig Propositio we have I(α + ) = mi x α+ I(x) ad I(α ) = mi x α I(x). Thus mi{i(α + ), I(α )} = if I(x) (5) x F 5

From (3)-(5), we have that lim sup log x if I(x), lim sup log y if I(x), (6) x F x F which implies that lim sup log(x + y ) if I(x). x F (you are asked to establish the last implicatio as a exercise). We have established lim sup log P (S F ) if I(x) (7) x F Proof of the upper boud i statemet (a) is complete. Proof of Cram er s Theorem. Part (b). Fix a ope set U R. Fix E > 0 ad fid y such that I(y) if x U ((x). It is sufficiet to show that lim if P (S U) I(y), (8) sice it will imply lim if P (S U) if I(x) + E, ad sice E > 0 was arbitrary, it will imply the result. Thus we ow establish (8). Assume y > µ. The case y < µ is treated similarly. Fid θ 0 = θ 0 (y) such that x U I(y) = θ 0 y log M(θ 0 ). Such θ 0 exists by Propositio. Sice y > µ, the agai by Propositio we may assume θ 0 0. We will use the chage-of-measure techique to obtai the cover boud. For this, cosider a ew radom variable let X θ0 be a radom variable defied by z P(X θ0 z) = exp(θ 0 x) dp (x) M(θ 0 ) Now, E[X θ0 ] = x exp(θ 0 x) dp (x) M(θ 0 ) Ṁ(θ 0 ) = M(θ0 ) = y, 6

where the secod equality was established i the previous lecture, ad the last equality follows by the choice of θ 0 ad Propositio. Sice U is ope we ca fid δ > 0 be small eough so that (y δ, y + δ) U. Thus, we have P(S U) P(S (y δ, y + δ)) = dp (x ) dp (x ) x i y <δ = exp( θ 0 x i )M (θ 0 ) M (θ 0 ) exp(θ 0 x i )dp (x i ). x i y <δ i i (9) Sice θ 0 is o-egative, we obtai a boud P(S (y δ, y + δ)) exp( θ 0 y θ 0 δ)m (θ 0 ) M (θ 0 ) exp(θ 0 x i )dp (x i ) x i y <δ i However, we recogize the itegral ; o the right-had side of the iequality above as the that the average i Y i of i.i.d. radom variables Y i, i distributed accordig to the distributio of X θ0 belogs to the iterval (y δ, y + δ). Recall, however that E[Y i ] = E[X θ0 ] = y (this is how X θ0 was desiged). Thus by the Weak Law of Large Numbers, this probability coverges to uity. As a result lim log M (θ 0 ) exp(θ 0 x i )dp (x i ) = 0. We obtai x i y <δ i lim if log P(S U) θ 0 y θ 0 δ + log M(θ 0 ) = I(y) θ 0 δ. Recallig that θ 0 depeds o y oly ad sedig δ to zero, we obtai (8). This completes the proof of part (b). 7

MIT OpeCourseWare http://ocw.mit.edu 5.070J / 6.265J Advaced Stochastic Processes Fall 203 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.