Lecture 15: Strong, Conditional, & Joint Typicality


EE376A/STATS376A Information Theory, Lecture 15 (02/27/2018)
Lecturer: Tsachy Weissman    Scribes: Nimit Sohoni, William McCloskey, Halwest Mohammad

In this lecture, we will continue developing tools that will be useful going forward, in particular in the context of lossy compression. We will introduce the notions of strong, conditional, and joint typicality. [Optional reading: Chapter 2 of El Gamal and Kim, Network Information Theory.]

1 Notation

A quick recap of the notation:

1. Random variables: e.g. $X$
2. Alphabet: e.g. $\mathcal{X}$
3. Specific values: e.g. $x$
4. Sequence of values: e.g. $x^n$
5. Set of all probability mass functions on alphabet $\mathcal{X}$: $M(\mathcal{X})$
6. Empirical distribution of a sequence $x^n$: $P_{x^n}(a) := N(a \mid x^n)/n$, where $N(a \mid x^n)$ is the number of times symbol $a$ appears in $x^n$

2 Typicality

2.1 Strong Typicality

Definition 1. A sequence $x^n \in \mathcal{X}^n$ is strongly $\delta$-typical with respect to a probability mass function $P \in M(\mathcal{X})$ if

$$|P_{x^n}(a) - P(a)| \le \delta P(a) \quad \forall a \in \mathcal{X} \qquad (1)$$

In words, a sequence is strongly $\delta$-typical with respect to $P$ if its empirical distribution is close to the probability mass function $P$. [Here $\delta$ is some fixed number, typically small.]

Definition 2. The strongly $\delta$-typical set [or simply strongly typical set] of $P$, denoted $T_\delta(P)$, is the set of all sequences that are strongly $\delta$-typical with respect to $P$, i.e.

$$T_\delta(P) = \{x^n : |P_{x^n}(a) - P(a)| \le \delta P(a), \ \forall a \in \mathcal{X}\} \qquad (2)$$

Recall: the weakly $\epsilon$-typical set of an IID source $P$ is defined as $A_\epsilon(P) := \left\{x^n : \left|-\frac{1}{n} \log P(x^n) - H(P)\right| \le \epsilon\right\}$.

Note: The condition for inclusion in the weakly $\epsilon$-typical set is indeed weaker than the condition to be in the strongly $\delta$-typical set. To see this, write

$$-\frac{1}{n} \log P(x^n) = \frac{1}{n} \sum_{i=1}^n \log \frac{1}{P(x_i)} = \sum_{a \in \mathcal{X}} \frac{N(a \mid x^n)}{n} \log \frac{1}{P(a)} = \sum_{a \in \mathcal{X}} P_{x^n}(a) \log \frac{1}{P(a)}.$$

This is $\approx \sum_{a \in \mathcal{X}} P(a) \log \frac{1}{P(a)} = H(P)$ if $P_{x^n} \approx P$, i.e. if the empirical distribution induced by $x^n$ is close to $P$, i.e. if the sequence is strongly typical. Thus $P_{x^n} \approx P$ implies $-\frac{1}{n} \log P(x^n) \approx H(P)$; that is, strong typicality implies weak typicality.
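To make Definitions 1 and 2 concrete, here is a minimal Python sketch (my own illustration, not part of the original notes; the helper names are hypothetical). It computes the empirical distribution $P_{x^n}$ and tests membership in $T_\delta(P)$:

```python
# A minimal sketch of Definitions 1 and 2 (illustration only): compute the
# empirical distribution P_{x^n} and test |P_{x^n}(a) - P(a)| <= delta * P(a).
from collections import Counter

def empirical_distribution(xn, alphabet):
    """P_{x^n}(a) = N(a | x^n) / n for each symbol a in the alphabet."""
    counts = Counter(xn)
    n = len(xn)
    return {a: counts[a] / n for a in alphabet}

def is_strongly_typical(xn, P, delta):
    """Membership test for T_delta(P): every symbol's empirical frequency
    lies within a multiplicative (1 +/- delta) band around P(a)."""
    P_hat = empirical_distribution(xn, P)
    return all(abs(P_hat[a] - P[a]) <= delta * P[a] for a in P)

# Example: a fair-coin source. This balanced sequence is 0.1-typical.
P = {0: 0.5, 1: 0.5}
print(is_strongly_typical([0, 1, 1, 0, 0, 1, 0, 1, 1, 0], P, delta=0.1))  # True
```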

In the homework, we will show more precisely that $T_\delta(P) \subseteq A_\epsilon(P)$ for $\epsilon = \delta H(P)$.

Example: Here is an example of a sequence that is weakly typical but not strongly typical. Let $P$ be the uniform distribution over $\mathcal{X}$, i.e. $P(a) = \frac{1}{|\mathcal{X}|}$ for all $a \in \mathcal{X}$. Then $P(x^n) = \left(\frac{1}{|\mathcal{X}|}\right)^n$, so $-\frac{1}{n} \log P(x^n) = \log |\mathcal{X}| = H(P)$ for every $x^n \in \mathcal{X}^n$. Thus $A_\epsilon(P) = \mathcal{X}^n$, while $T_\delta(P) = \{x^n : |P_{x^n}(a) - \frac{1}{|\mathcal{X}|}| \le \frac{\delta}{|\mathcal{X}|}, \ \forall a \in \mathcal{X}\}$. In other words, the weakly typical set is the set of all sequences over $\mathcal{X}$, whereas the strongly typical set contains only those sequences in which each symbol appears roughly the same number of times.

We have already shown that the probability of a sequence drawn from the source being in $A_\epsilon(P)$ approaches 1 as $n \to \infty$. In the homework, we will investigate the probability of such a sequence being in $T_\delta(P)$, i.e. $P(T_\delta(P))$. In fact, this also approaches 1:

$$\lim_{n \to \infty} P(T_\delta(P)) = 1$$

This is also a manifestation of the law of large numbers, which tells us that for every symbol $a$, the fraction of times it appears in the sequence approaches its true probability under the source $P$, with probability close to 1.

Finally, we will show that the size of the strongly $\delta$-typical set $T_\delta(P)$ is roughly $2^{nH(P)}$; more precisely, for all sufficiently large $n$:

$$2^{n[H(P) - \epsilon(\delta)]} \le |T_\delta(P)| \le 2^{n[H(P) + \epsilon(\delta)]} \qquad (3)$$

where $\epsilon(\delta) \to 0$ as $\delta \to 0$. The lower bound follows from the previously shown fact that any set of size smaller than $2^{nH(P)}$ has vanishing probability. The upper bound simply follows from the fact that $T_\delta(P) \subseteq A_\epsilon(P)$.

2.2 Joint Typicality

In the following, we refer to the sequences $x^n = (x_1, x_2, \ldots, x_n)$, $x_i \in \mathcal{X}$, and $y^n = (y_1, y_2, \ldots, y_n)$, $y_i \in \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are finite alphabets.

Definition 3. The joint empirical distribution of $(x^n, y^n)$ is:

$$P_{x^n, y^n}(x, y) = \frac{1}{n} N(x, y \mid x^n, y^n) \qquad (4)$$

Definition 4. $(x^n, y^n)$ is jointly $\delta$-typical with respect to $P \in M(\mathcal{X} \times \mathcal{Y})$ if

$$|P_{x^n, y^n}(x, y) - P(x, y)| \le \delta P(x, y) \quad \forall x \in \mathcal{X}, y \in \mathcal{Y} \qquad (5)$$

Definition 5. The jointly $\delta$-typical set with respect to $P \in M(\mathcal{X} \times \mathcal{Y})$ is

$$T_\delta(P) = \{(x^n, y^n) : (x^n, y^n) \text{ is jointly } \delta\text{-typical with respect to } P\} \qquad (6)$$
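Here is a sketch of Definitions 3-5 (again my own illustration, with hypothetical names). The joint empirical distribution is just the empirical distribution of the pair sequence over $\mathcal{X} \times \mathcal{Y}$, so joint $\delta$-typicality is the same multiplicative test applied pair by pair:

```python
# A sketch of Definitions 3-5 (illustration only): joint delta-typicality
# is the strong-typicality test over the product alphabet X x Y.
from collections import Counter

def joint_empirical_distribution(xn, yn, pairs):
    """P_{x^n,y^n}(x, y) = N(x, y | x^n, y^n) / n for each pair (x, y)."""
    counts = Counter(zip(xn, yn))
    n = len(xn)
    return {pq: counts[pq] / n for pq in pairs}

def is_jointly_typical(xn, yn, P_xy, delta):
    """Membership test for T_delta(P) with P a joint PMF on X x Y."""
    P_hat = joint_empirical_distribution(xn, yn, P_xy)
    return all(abs(P_hat[pq] - P_xy[pq]) <= delta * P_xy[pq] for pq in P_xy)

# Example: a symmetric binary joint PMF; this pair's empirical joint
# distribution matches P_xy exactly, so it is delta-typical for any delta.
P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
xn = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
yn = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(is_jointly_typical(xn, yn, P_xy, delta=0.1))  # True
```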

Observe that these definitions are just special cases of the definitions of the empirical distribution, strong $\delta$-typicality, and the strongly $\delta$-typical set, since a pair of a sequence in $\mathcal{X}^n$ and a sequence in $\mathcal{Y}^n$ is simply a sequence over the alphabet of pairs $\mathcal{X} \times \mathcal{Y}$.

Notation: For convenience, we will sometimes write $T_\delta(X)$ in place of $T_\delta(P)$ when $X \sim P$, or $T_\delta(X, Y)$ in place of $T_\delta(P)$ when $(X, Y) \sim P$.

In the homework, we will show that for every nonnegative $g : \mathcal{X} \to \mathbb{R}$ and every $x^n \in T_\delta(X)$,

$$(1 - \delta) E[g(X)] \le \frac{1}{n} \sum_{i=1}^n g(x_i) \le (1 + \delta) E[g(X)]$$

In other words, for strongly typical sequences, the average value of $g$ computed on the components of the sequence is close to the expected value of $g(X)$. Observe that $\frac{1}{n} \sum_{i=1}^n g(x_i) = \sum_{a \in \mathcal{X}} P_{x^n}(a) g(a)$; the latter is the expectation of $g(X)$ when $X$ is distributed according to the empirical distribution $P_{x^n}$. But since $x^n \in T_\delta(X)$, $P_{x^n}$ is close to the true PMF of $X$ [i.e. $P$], which is why this expectation is close to the true expectation $E[g(X)]$. This property will be important for the rate distortion theorem, where $g$ will be replaced by the distortion function. In the homework, you will find cases where this does not hold for weak typicality.

2.3 Conditional Typicality

Definition 6. Fix $x^n$. The conditionally $\delta$-typical set is

$$T_\delta(Y \mid x^n) = \{y^n : (x^n, y^n) \in T_\delta(X, Y)\} \qquad (7)$$

In other words, it is the set of all sequences $y^n$ such that the pair $(x^n, y^n)$ is jointly $\delta$-typical.

Observe that if $x^n \notin T_\delta(X)$, then $T_\delta(Y \mid x^n) = \emptyset$, because for a pair $(x^n, y^n)$ to be jointly typical, each individual sequence must be typical with respect to $P_X$ and $P_Y$, respectively (shown in the homework).

In the homework, we will show that, assuming $x^n \in T_{\delta'}(X)$,

$$(1 - \delta) 2^{n[H(Y|X) - \epsilon(\delta)]} \le |T_\delta(Y \mid x^n)| \le 2^{n[H(Y|X) + \epsilon(\delta)]}$$

for all $0 < \delta' < \delta$ and $n$ sufficiently large, where $\epsilon(\delta) = \delta H(Y|X)$. In short, for a typical sequence $x^n$, the number of sequences $y^n$ that are jointly typical with $x^n$ is approximately $2^{nH(Y|X)}$. A starting point of the proof will be the Conditional Typicality Lemma.

Lemma 7 (Conditional Typicality Lemma). For $0 < \delta' < \delta$, $x^n \in T_{\delta'}(X)$, and $Y^n \sim P(y^n \mid x^n) = \prod_{i=1}^n P_{Y|X}(y_i \mid x_i)$, we have

$$\lim_{n \to \infty} P(Y^n \in T_\delta(Y \mid x^n)) = 1 \qquad (8)$$

In other words, we fix an individual sequence $x^n$ and generate the sequence $Y^n$ stochastically, with independent components, according to the distribution conditioned on $x^n$: we generate $Y_i \sim P_{Y|X=x_i}$ [according to the joint probability mass function $P_{X,Y}$, which gives rise to the conditional probability mass function $P_{Y|X}$]. One can think of this in communication terminology: the sequence $Y^n$ is generated by taking the individual sequence $x^n$ and passing it through the memoryless channel $P_{Y|X}$. The probability that the sequence $Y^n$ thus generated is conditionally typical approaches 1 as $n$ becomes large.

To prove the Conditional Typicality Lemma, we will employ the fact [to be proved earlier in the homework] that $P(T_\delta(P)) \to 1$.
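Lemma 7 is easy to check numerically. Below is a Monte Carlo sketch (my own illustration, not from the notes; it reuses `is_jointly_typical` and the binary joint PMF from the previous sketch): fix a typical $x^n$, pass it through the memoryless channel $P_{Y|X}$, and estimate $P(Y^n \in T_\delta(Y \mid x^n))$ for growing $n$:

```python
# A Monte Carlo sketch of Lemma 7 (illustration only; assumes
# is_jointly_typical and P_xy from the previous sketch are in scope).
import random

P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
P_y_given_x = {0: (0.8, 0.2), 1: (0.2, 0.8)}  # P_{Y|X}(.|x); P_X is uniform

def estimate_lemma7(n, delta=0.25, trials=500):
    xn = [0] * (n // 2) + [1] * (n - n // 2)  # exactly typical for uniform P_X
    hits = 0
    for _ in range(trials):
        # Pass x^n through the memoryless channel: Y_i ~ P_{Y|X=x_i}.
        yn = [random.choices((0, 1), weights=P_y_given_x[x])[0] for x in xn]
        hits += is_jointly_typical(xn, yn, P_xy, delta)
    return hits / trials

for n in (50, 200, 1000):
    print(n, estimate_lemma7(n))
# The estimated probability climbs toward 1, as Lemma 7 predicts.
```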

Fix some $a \in \mathcal{X}$ and consider the subsequence of all components $x_i$ of $x^n$ that are equal to $a$. Consider the subsequence of $y_i$'s corresponding to the same indices. This subsequence is generated IID from the PMF $P_{Y|X=a}$. We will apply the aforementioned result separately to each such subsequence corresponding to a symbol $a \in \mathcal{X}$.

To prove the bounds on the size of $T_\delta(Y \mid x^n)$, we will take a similar approach: we will use Equation (3) [which will also be proved earlier in the homework] and apply it to each subsequence associated with a symbol $a \in \mathcal{X}$.

We can interpret the Conditional Typicality Lemma qualitatively with the help of the following picture:

[Figure 1: Illustration of the relationships between strongly $\delta$-typical and conditionally $\delta$-typical sets. Left: $T_\delta(X) \subseteq \mathcal{X}^n$, of size $\approx 2^{nH(X)}$, containing a point $x^n$. Right: $T_\delta(Y) \subseteq \mathcal{Y}^n$, of size $\approx 2^{nH(Y)}$, containing the subset $T_\delta(Y \mid x^n)$ of size $\approx 2^{nH(Y|X)}$. A dashed line connects $x^n$ to $T_\delta(Y \mid x^n)$.]

The dashed line denotes that, given channel input $x^n$, the channel output will fall within the dark gray set $T_\delta(Y \mid x^n)$ with high probability. $T_\delta(Y \mid x^n)$ can be thought of as the "noise ball" around the particular channel input sequence $x^n$. Recall that in Lecture 11, we used this picture to give intuition for the channel coding converse.

Lemma 8 (Joint Typicality Lemma). For all $0 < \delta' < \delta$, if $\tilde{Y}_i \stackrel{\text{IID}}{\sim} P_Y$, then for all sufficiently large $n$ and all $x^n \in T_{\delta'}(X)$,

$$2^{-n[I(X;Y) + \tilde{\epsilon}(\delta)]} \le P(\tilde{Y}^n \in T_\delta(Y \mid x^n)) \le 2^{-n[I(X;Y) - \tilde{\epsilon}(\delta)]} \qquad (9)$$

where $\tilde{\epsilon}(\delta) \to 0$ as $\delta \to 0$.

The proof of the Joint Typicality Lemma will also be a homework problem. Intuitively speaking, since the sequence $\tilde{Y}^n$ is generated IID with respect to $Y$, on an exponential scale it is roughly uniformly distributed over the set $T_\delta(Y)$. Thus, the probability that it falls within $T_\delta(Y \mid x^n)$ for some particular $x^n$ is, on an exponential scale, roughly the ratio of the size of this set to the size of $T_\delta(Y)$, since $T_\delta(Y \mid x^n) \subseteq T_\delta(Y)$.
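For the running binary example, the probability in Lemma 8 can be computed exactly rather than sampled: with $\tilde{Y}_i \stackrel{\text{IID}}{\sim} \text{Bernoulli}(1/2)$ and $x^n$ half zeros and half ones, the pair counts in the two halves of $x^n$ are independent binomials. The following sketch (my own, not from the notes) compares the exponent $-\frac{1}{n}\log_2 P(\tilde{Y}^n \in T_\delta(Y \mid x^n))$ with $I(X;Y)$:

```python
# Lemma 8 computed exactly for the running binary example (illustration
# only). The probability of joint delta-typicality is a short binomial sum.
from math import comb, log2

P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def mutual_information(P):
    """I(X;Y) in bits for a joint PMF P over pairs (x, y)."""
    Px = {x: sum(p for (a, _), p in P.items() if a == x) for x in (0, 1)}
    Py = {y: sum(p for (_, b), p in P.items() if b == y) for y in (0, 1)}
    return sum(p * log2(p / (Px[x] * Py[y])) for (x, y), p in P.items())

def prob_cond_typical(n, delta=0.05):
    """Exact P(Ytilde^n in T_delta(Y | x^n)) for x^n = 0^(n/2) 1^(n/2)."""
    half = n // 2
    ok = lambda count, target: abs(count / n - target) <= delta * target
    # x = 0 half: k components have y = 1, so N(0,1) = k, N(0,0) = half - k.
    p0 = sum(comb(half, k) * 0.5 ** half
             for k in range(half + 1) if ok(k, 0.1) and ok(half - k, 0.4))
    # x = 1 half: k components have y = 1, so N(1,1) = k, N(1,0) = half - k.
    p1 = sum(comb(half, k) * 0.5 ** half
             for k in range(half + 1) if ok(k, 0.4) and ok(half - k, 0.1))
    return p0 * p1  # the two halves are independent under Ytilde IID

print("I(X;Y) =", round(mutual_information(P_xy), 4))  # ~0.278
for n in (100, 200, 400):
    print(n, round(-log2(prob_cond_typical(n)) / n, 4))
# The exponent -(1/n) log2 P decreases toward I(X;Y) as n grows, within
# the eps(delta) slack allowed by Lemma 8.
```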

Again referring to Figure 1 as a visual aid, we get

$$P(\tilde{Y}^n \in T_\delta(Y \mid x^n)) \approx \frac{2^{nH(Y|X)}}{2^{nH(Y)}} = 2^{-nI(X;Y)}.$$

So the probability that a randomly generated sequence $\tilde{Y}^n$ looks jointly typical with a particular sequence $x^n$ is exponentially small.

In the next lecture, we will see why these notions are significant in the context of lossy compression. We will use them to prove the main achievability result of lossy compression.
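To put numbers on this closing computation, here is a short check (my addition, using the same binary joint PMF as in the earlier sketches) of $H(Y)$, $H(Y|X)$, and the resulting probability scale $2^{-nI(X;Y)}$:

```python
# A short numeric companion (illustration only) for the counting heuristic
# 2^{n H(Y|X)} / 2^{n H(Y)} = 2^{-n I(X;Y)}.
from math import log2

P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
Px = {0: 0.5, 1: 0.5}
Py = {0: 0.5, 1: 0.5}

H_Y = -sum(p * log2(p) for p in Py.values())                         # H(Y)
H_Y_given_X = -sum(p * log2(p / Px[x]) for (x, _), p in P_xy.items())
I = H_Y - H_Y_given_X

print(round(H_Y, 4), round(H_Y_given_X, 4), round(I, 4))  # 1.0 0.7219 0.2781
for n in (100, 200, 400):
    print(n, 2 ** (-n * I))  # the vanishing probability scale in Lemma 8
```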