MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique. Cramér s Theorem We have established i the previous lecture that uder some assumptios o the Momet Geeratig Fuctio (MGF) M(θ), a i.i.d. sequece of radom variables X i, ; i with mea µ satisfies P(S a) exp( I(a)), where S = i X i, ad I(a) sup θ (θa log M(θ)) is the Legedre trasform. The fuctio I(a) is also commoly called the rate fuctio i the theory of Large Deviatios. The boud implies log P(S a) lim sup I(a), ad we have idicated that the boud is tight. Namely, ideally we would like to establish the limit log P(S a) lim sup = I(a), Furthermore, we might be iterested i more complicated rare evets, beyod the iterval [a, ). For example, the likelihood that P(S A) for some set A R ot cotaiig the mea value µ. The Large Deviatios theory says that roughly speakig lim P(S A) = if I(x), () x A

but ufortuately this statemet is ot precisely correct. Cosider the followig example. Let X be a iteger-valued radom variable, ad A = { m p : m Z, p is odd prime.}. The for prime, we have P(S A) = ; but for = 2 k, log P (S A) we have P (S A) = 0. As a result, the limit lim i this case does ot exist. The sese i which the idetity () is give by the Cramér s Theorem below. Theorem (Cram er s Theorem). Give a sequece of i.i.d. real valued ra dom variables X i, i with a commo momet geeratig fuctio M(θ) = E[exp(θX )] the followig holds: (a) For ay closed set F R, (b) For ay ope set U R, lim sup log P(S F ) if I(x), x F lim if log P(S U) if I(x). We will prove the theorem oly for the special case whe D(M) = R (amely, the MGF is fiite everywhere) ad whe the support of X is etire R. Namely for every K > 0, P(X > K) > 0 ad P(X < K) > 0. For example a Gaussia radom variable satisfies this property. To see the power of the theorem, let us apply it to the tail of S. I the followig sectio we will establish that I(x) is a o-decreasig fuctio o the iterval [µ, ). Furthermore, we will establish that if it is fiite i some iterval cotaiig x it is also cotiuous at x. Thus fix a ad suppose I is fiite i a iterval cotaiig a. Takig F to be the closed set [a, ) with a > µ, we obtai from the lim sup log P(S [a, )) mi I(x) x U x a = I(a). Applyig the secod part of Cramér s Theorem, we obtai lim if log P(S [a, )) lim if log P(S (a, )) if I(x) x>a = I(a). 2

Thus i this special case ideed the large deviatios limit exists: lim log P(S a) = I(a). The limit is isesitive to whether the iequality is strict, i the sese that we also have lim log P(S > a) = I(a). 2 Properties of the rate fuctio I Before we prove this theorem, we will eed to establish several properties of I(x) ad M(θ). Propositio. The rate fuctio I satisfies the followig properties (a) I is a covex o-egative fuctio satisfyig I(µ) = 0. Furthermore, it is a icreasig fuctio o [µ, ) ad a decreasig fuctio o (, µ]. Fially I(x) = sup θ 0 (θx log M(θ)) for every x µ ad I(x) = sup θ 0 (θx log M(θ)) for every x µ. (b) Suppose i additio that D(M) = R ad the support of X is R. The, I is a fiite cotiuous fuctio o R. Furthermore, for every x R we have I(x) = θ 0 x log M(θ 0 ), for some θ 0 = θ 0 (x) satisfyig Ṁ(θ 0 ) x =. (2) M(θ 0 ) Proof of part (a). Covexity is due to the fact that I(x) is poit-wise supremum. Precisely, cosider λ (0, ) I(λx + ( λ)y) = sup[θ(λx + ( x)y) log M(θ)] θ = sup[λ(x log M(θ)) + ( λ)(y log M(θ))] λ sup(x log M(θ)) + ( λ) sup (y log M(θ)) θ θ =λi(x) + ( λ)i(y). This establishes the covexity. Now sice M(0) = the I(x) 0 x log M(0) = 0 ad the o-egativity is established. By Jese s iequality, we have that M(θ) = E[exp(θX )] exp(θe[x ]) = exp(θµ). 3

Therefore, log M(θ) θµ, amely, θµ log M(θ) 0, implyig I(µ) = 0 = mi x R I(x). Furthermore, if x > µ, the for θ < 0 we have θx log M(θ) θ(x µ) < 0. This meas that sup θ (θx log M(θ)) must be equal to sup θ 0 (θx log M(θ)). Similarly we show that whe x < µ, we have I(x) = sup θ 0 (θx log M(θ)). Next, the mootoicity follows from covexity. Specifically, the existece of real umbers µ x < y such that I(x) > I(y) I(µ) = 0 violates covexity (check). This completes the proof of part (a). Proof of part (b). For ay K > 0 we have log M(θ) log exp(θx) dp (x) lim if = lim if θ θ θ θ lim if log exp(θx) dp (x) Sice K is arbitrary, Similarly, Therefore, θ θ lim if log (exp(kθ)p([k, ])) θ θ K = K + lim if log P([K, ]) θ θ = K (sice supp(x ) = R, we have P([K, )) > 0.) lim if log M(θ) = θ θ lim if log M(θ) = θ θ lim θx log M(θ) = lim θ(x log M(θ)) θ θ θ Therefore, for each x as θ, we have that lim θx log M(θ) = θ From the previous lecture we kow that M(θ) is differetiable (hece cotiuous). Therefore the supremum of θx log M(θ) is achieved at some fiite value θ 0 = θ 0 (x), amely, I(x) = θ 0 x log M(θ 0 ) <, 4

where θ 0 is foud by settig the derivative of θx log M(θ) to zero. Namely, θ 0 must satisfy (2). Sice I is a fiite covex fuctio o R it is also cotiuous (verify this). This completes the proof of part (b). 3 Proof of Cramér s Theorem Now we are equipped to provig the Cramér s Theorem. Proof of Cramer s Theorem. Part (a). Fix a closed set F R. Let α + = mi{x [µ, + ) F } ad α = max{x (, µ] F }. Note that α + ad α exist sice F is closed. If α + = µ the I(µ) = 0 = mi x R I(x). Note that log P(S F ) 0, ad the statemet (a) follows trivially. Similarly, if α = µ, we also have statemet (a). Thus, assume α < µ < α +. The Defie P (S F ) P (S [α +, )) + P (S (, α ]) We already showed that from which we have x P (S [α +, )), y P (S (, α ]). P (S α + ) exp( (θα + log M(θ))), θ 0. log P (S α + ) (θα + log M(θ)), θ 0. log P (S α + ) sup(θα + log M(θ)) = I(α + ) θ 0 The secod equality i the last equatio is due to the fact that the supremum i I(x) is achieved at θ 0, which was established as a part of Propositio. Thus, we have Similarly, we have lim sup lim sup log P (S α + ) I(α + ) (3) log P (S α ) I(α ) (4) Applyig Propositio we have I(α + ) = mi x α+ I(x) ad I(α ) = mi x α I(x). Thus mi{i(α + ), I(α )} = if I(x) (5) x F 5

From (3)-(5), we have that lim sup log x if I(x), lim sup log y if I(x), (6) x F x F which implies that lim sup log(x + y ) if I(x). x F (you are asked to establish the last implicatio as a exercise). We have established lim sup log P (S F ) if I(x) (7) x F Proof of the upper boud i statemet (a) is complete. Proof of Cram er s Theorem. Part (b). Fix a ope set U R. Fix E > 0 ad fid y such that I(y) if x U ((x). It is sufficiet to show that lim if P (S U) I(y), (8) sice it will imply lim if P (S U) if I(x) + E, ad sice E > 0 was arbitrary, it will imply the result. Thus we ow establish (8). Assume y > µ. The case y < µ is treated similarly. Fid θ 0 = θ 0 (y) such that x U I(y) = θ 0 y log M(θ 0 ). Such θ 0 exists by Propositio. Sice y > µ, the agai by Propositio we may assume θ 0 0. We will use the chage-of-measure techique to obtai the cover boud. For this, cosider a ew radom variable let X θ0 be a radom variable defied by z P(X θ0 z) = exp(θ 0 x) dp (x) M(θ 0 ) Now, E[X θ0 ] = x exp(θ 0 x) dp (x) M(θ 0 ) Ṁ(θ 0 ) = M(θ0 ) = y, 6

where the secod equality was established i the previous lecture, ad the last equality follows by the choice of θ 0 ad Propositio. Sice U is ope we ca fid δ > 0 be small eough so that (y δ, y + δ) U. Thus, we have P(S U) P(S (y δ, y + δ)) = dp (x ) dp (x ) x i y <δ = exp( θ 0 x i )M (θ 0 ) M (θ 0 ) exp(θ 0 x i )dp (x i ). x i y <δ i i (9) Sice θ 0 is o-egative, we obtai a boud P(S (y δ, y + δ)) exp( θ 0 y θ 0 δ)m (θ 0 ) M (θ 0 ) exp(θ 0 x i )dp (x i ) x i y <δ i However, we recogize the itegral ; o the right-had side of the iequality above as the that the average i Y i of i.i.d. radom variables Y i, i distributed accordig to the distributio of X θ0 belogs to the iterval (y δ, y + δ). Recall, however that E[Y i ] = E[X θ0 ] = y (this is how X θ0 was desiged). Thus by the Weak Law of Large Numbers, this probability coverges to uity. As a result lim log M (θ 0 ) exp(θ 0 x i )dp (x i ) = 0. We obtai x i y <δ i lim if log P(S U) θ 0 y θ 0 δ + log M(θ 0 ) = I(y) θ 0 δ. Recallig that θ 0 depeds o y oly ad sedig δ to zero, we obtai (8). This completes the proof of part (b). 7

MIT OpeCourseWare http://ocw.mit.edu 5.070J / 6.265J Advaced Stochastic Processes Fall 203 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.