PhD short course "Information Theory and Statistics", Siena, 15-19 September 2014. IT and large deviation theory. Mauro Barni, University of Siena
Outline of the short course Part 1: Information theory in a nutshell Part 2: The method of types and its relationship with statistics Part 3: Information theory and large deviation theory Part 4: Information theory and hypothesis testing Part 5: Application to adversarial signal processing
Outline of Part 3 Large Deviation Theory Sanov Theorem Conditional limit theorem Examples
Large deviation theory
LDT studies the probability of rare events, i.e. events falling outside the scope of the law of large numbers.
Examples:
- What is the probability that in 1000 fair coin tosses heads appears 800 times?
- Compute the probability that the mean value of a sequence (emitted by a DMS X) is larger than T, with T much larger than E[X].
- Rare events in statistical physics or economics.
Large deviation theory
More formally: let S be a set of pmfs and let Q be a source. We want to compute the probability that Q emits a sequence whose type belongs to S:

Q^n(S) = \sum_{x^n : P_{x^n} \in S} Q^n(x^n)

Example: what is the probability that the average value of a sequence drawn from Q is larger than 4? This is the above problem with S = \{P : E_P[X] > 4\}.
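For a concrete feel of how Q^n(S) behaves, here is a small Python sketch (mine, not part of the slides) that computes this probability exactly for a fair coin with S = {pmfs with head frequency at least 0.8}: a length-n sequence has type in S iff it contains at least ⌈0.8n⌉ heads.

```python
import math

def q_prob_type_in_S(n, thresh=0.8):
    """Exact probability that a fair-coin sequence of length n has a
    type in S, i.e. empirical head frequency >= thresh."""
    k_min = math.ceil(thresh * n)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

# The probability decays exponentially fast with n:
for n in (10, 50, 100):
    print(n, q_prob_type_in_S(n))
```

Already at n = 100 the probability is astronomically small, which is exactly the exponential decay that Sanov's theorem quantifies.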
Large deviation theory
If S contains a KL neighborhood of Q, then Q^n(S) -> 1. If S does not contain Q or a KL neighborhood of Q, then Q^n(S) -> 0. The question is: how fast?
[Figure: two cases, Q inside S and Q outside S]
More formally: Sanov's theorem
Theorem (Sanov). Let S be a regular set of pmfs, i.e. cl(int(S)) = S. Then

Q^n(S) \doteq 2^{-n D(P^* \| Q)} (equality to first order in the exponent), where P^* = \arg\min_{P \in S} D(P \| Q)

[Figure: P^* is the point of S closest to Q in divergence]
Sanov's theorem
Proof (upper bound).

Q^n(S) = \sum_{P \in S \cap \mathcal{P}_n} Q^n(T(P))
\le \sum_{P \in S \cap \mathcal{P}_n} 2^{-n D(P \| Q)}
\le \sum_{P \in S \cap \mathcal{P}_n} 2^{-n \min_{P \in S \cap \mathcal{P}_n} D(P \| Q)}
\le \sum_{P \in S \cap \mathcal{P}_n} 2^{-n \min_{P \in S} D(P \| Q)}
= \sum_{P \in S \cap \mathcal{P}_n} 2^{-n D(P^* \| Q)}
\le (n+1)^{|\mathcal{X}|} \, 2^{-n D(P^* \| Q)}
Sanov's theorem
Proof (lower bound). Due to the regularity of S and the density of \bigcup_n \mathcal{P}_n in the set of all pmfs, we can find a sequence P_n \in \mathcal{P}_n such that P_n \to P^*, and hence D(P_n \| Q) \to D(P^* \| Q). Then for large n we can write:

Q^n(S) = \sum_{P \in S \cap \mathcal{P}_n} Q^n(T(P)) \ge Q^n(T(P_n))
\ge \frac{1}{(n+1)^{|\mathcal{X}|}} \, 2^{-n D(P_n \| Q)}
\approx \frac{1}{(n+1)^{|\mathcal{X}|}} \, 2^{-n D(P^* \| Q)}
Example
Compute the probability that in 1000 fair coin tosses, heads shows more than 800 times.
S = \{B(p, 1-p) : p \ge 0.8\}
Q = B(0.5, 0.5)
P^* = B(0.8, 0.2)
D(P^* \| Q) = 1 - H(P^*) = 1 - h(0.8) \approx 0.278
P(S) \approx 2^{-n D(P^* \| Q)} \approx 2^{-278}!
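The exponent above is easy to check numerically. A minimal Python sketch (the helper name is mine, not from the slides):

```python
import math

def kl(p, q):
    """KL divergence D(P||Q) in bits between two pmfs given as lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

D = kl([0.8, 0.2], [0.5, 0.5])   # = 1 - h(0.8), approx. 0.278 bits
print(D)            # Sanov exponent per toss
print(1000 * D)     # for n = 1000: P(S) is roughly 2^-278
```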
A more general example
We may want to compute

\Pr\left\{ \frac{1}{n} \sum_{i=1}^{n} g_j(X_i) \ge \alpha_j, \; j = 1, \dots, k \right\}

Apply Sanov's theorem with

S = \left\{ P : \sum_{x \in \mathcal{X}} P(x) g_j(x) \ge \alpha_j, \; j = 1, \dots, k \right\}

We can use Lagrange multipliers to minimize D(P \| Q) subject to P \in S.
A more general example
Unconstrained minimization of

L(P) = \sum_x P(x) \log \frac{P(x)}{Q(x)} + \sum_{j=1}^{k} \lambda_j \left( \sum_x P(x) g_j(x) - \alpha_j \right) + \beta \left( \sum_x P(x) - 1 \right)

yields (after some algebra):

P^*(x) = \frac{1}{K} Q(x) \, e^{\sum_j \lambda_j g_j(x)}, \quad K = \sum_{x \in \mathcal{X}} Q(x) \, e^{\sum_j \lambda_j g_j(x)}
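The minimizer is an exponentially tilted version of Q. A hedged Python sketch of the formula above (the function and its argument layout are my own, not from the slides):

```python
import math

def tilted_pmf(alphabet, Q, gs, lams):
    """P*(a) = Q(a) * exp(sum_j lam_j * g_j(a)) / K, with K normalizing."""
    w = [q * math.exp(sum(l * g(a) for l, g in zip(lams, gs)))
         for a, q in zip(alphabet, Q)]
    K = sum(w)
    return [wi / K for wi in w]

# Fair die with a single constraint g(x) = x (a mean constraint):
P_star = tilted_pmf(range(1, 7), [1/6] * 6, [lambda x: x], [0.3])
print(P_star, sum(P_star))
```

With all multipliers set to zero the tilt disappears and P* reduces to Q, as the formula requires.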
A numerical example
Compute the probability that the average of n tosses of a fair die is larger than 4 (instead of 3.5).

S = \left\{ P : \sum_{x=1}^{6} x P(x) \ge 4 \right\}

From the previous result we have

P^*(x) = \frac{2^{\lambda x}}{\sum_{i=1}^{6} 2^{\lambda i}}

with \lambda chosen in such a way that \sum_{x=1}^{6} x P^*(x) = 4, which can be solved numerically (e.g. in Matlab).
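The slide mentions Matlab; an equivalent numerical solution in Python might use plain bisection, since the tilted mean increases monotonically with λ (a sketch, not the slides' code):

```python
def tilted_mean(lam):
    """Mean of the tilted die pmf P*(x) = 2**(lam*x) / sum_i 2**(lam*i)."""
    w = [2 ** (lam * x) for x in range(1, 7)]
    return sum(x * wx for x, wx in zip(range(1, 7), w)) / sum(w)

# Bisection: tilted_mean(0) = 3.5 (the fair die), and the mean grows with lam.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if tilted_mean(mid) < 4:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

# The resulting minimizing pmf P*:
P_star = [2 ** (lam * x) for x in range(1, 7)]
P_star = [p / sum(P_star) for p in P_star]
print(lam, tilted_mean(lam), P_star)
```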
Homework: how lucky do you need to be?
Is it better to bet that heads will show up in 3/5 of the tosses of a fair coin, or that face 6 will show up in 5/18 of the tosses of a fair die?
Conditional limit theorem
Not only is the probability of S determined by P^*: P^* also determines the conditional distribution of the elements of a sequence x^n whose type falls in S.
Theorem. Let S be a closed convex set of pmfs. Let X_i be a sequence of iid RVs generated by Q, and let P^* be defined as in Sanov's theorem. Then

\Pr\{ X_1 = a \mid P_{x^n} \in S \} \to P^*(a), \quad a \in \mathcal{X}
Conditional limit theorem (extension)
Theorem. Let S be a closed convex set of pmfs. Let X_i be a sequence of iid RVs generated by Q, and let P^* be defined as in Sanov's theorem. Let m be fixed. Then

\Pr\{ X_1 = a_1, X_2 = a_2, \dots, X_m = a_m \mid P_{x^n} \in S \} \to \prod_{i=1}^{m} P^*(a_i)

Remark: the theorem holds for any fixed m, but not for m = n.
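The conditional limit theorem can be illustrated empirically. The Monte Carlo sketch below (mine, not from the slides) draws fair-die sequences, keeps only those whose empirical mean is at least 4, and tabulates the conditional distribution of X_1; the result concentrates near the tilted P* rather than the uniform 1/6.

```python
import random

random.seed(1)
n, trials = 25, 100_000
counts = [0] * 6
accepted = 0
for _ in range(trials):
    seq = [random.randint(1, 6) for _ in range(n)]
    if sum(seq) >= 4 * n:           # the type of seq lies in S
        accepted += 1
        counts[seq[0] - 1] += 1     # record the first symbol
freqs = [c / accepted for c in counts]
print(accepted, freqs)
```

With the short block length n = 25 the conditioning event is still frequent enough to sample, yet the conditional law of X_1 already tilts visibly toward the larger faces.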
Homework: a lucky friend
You are told that your friend was so lucky that, in a whole night spent tossing a die, face 6 showed up ¼ of the time. Estimate the probability that face 1 never showed in the first 10 tosses. Do the same for the first 100 tosses (assuming that over the whole night your friend tossed the die much more than 100 times).
References
1. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley.
2. I. Csiszár, "The method of types," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2505-2523, Oct. 1998.
3. I. Csiszár and P. C. Shields, Information Theory and Statistics: A Tutorial, Foundations and Trends in Communications and Information Theory, NOW Publishers, 2004.