Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP

Size: px

Start display at page:

Download "Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP"

Ashlee King
5 years ago
Views:

1 Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively. The p X,Y Prob(A B), ad we may write it usig the multiplicatio rule for coditioal probabilities: p X,Y (a, b) = p X (a) p Y X (b a). I vaious applicatios of the theory, we eed to fix oe of the two factors o the right of this equatio, ad cosider all the possible joit distributios that ca be obtaied by varyig the other factor. Defiitio 1.1. A kerel from A to B is a A B matrix θ = (θ(b a)) a,b such that the row vector (θ(b a)) b B is a probability vector o B for each a A. Later i this course we exted this defiitio to more geeral measurable spaces A ad B. The th extesio of a kerel θ is the kerel θ from A to B defied by θ (y 1 y x 1 x ) = θ(y i x i ). Each of the fuctios θ ( x) for x A is a product probability distributio o B. A kerel θ may be see as specifyig a collectio of coditioal probabilities θ( a), a A, but ot a probability distributio o A itself. If (X, Y ) are as above, the we say that p Y X agrees with a kerel θ if p Y X (b a) = θ(b a) wheever p X (a) > 0. 1

2 If p Prob(A) ad θ is a kerel from A to B, the they ca be combied to give a probability distributio o A B: q(a, b) := p(a) θ(b a). (1) This is sometimes called the iput-output distributio of p ad θ, or the hookup of p with θ. We sometimes deote if by the (o-stadard) otatio p θ, which is meat to suggest a kid of geeralizatio of a product measure. If (X, Y ) are a (A B)-valued RV with this distributio, the p X = p ad p Y X agrees with θ. Ay pair of RVs (X, Y ) with this distributio is called a radomizatio of p ad θ. If (X, Y ) is a radomizatio of p ad θ, the the coditioal etropy H(Y X) clearly depeds oly o the joit distributio p θ. We sometimes use the alterative otatio H(θ p) := H(Y X) = a A p(a) H(θ( a)). Sice this equals H(X, Y ) H(X) = H(p θ) H(p), it is clearly a cotiuous fuctio of the joit distributio p θ. 2 Joit typicality ad the coditioal AEP Let q Prob(A B), ad let p A ad p B its margials o A ad B. I order to emphasize that q is a distributio o a product of two alphabets, we sometimes refer to elemets of T,δ (q) as joitly δ-typical for q. This omeaclature also emphasizes the followig importat poit: If x T,δ (p A ) ad y T,δ (p B ), it does ot follow that (x, y) is joitly ε-typical for q for some small ε. Example. Let x {0, 1} be δ-typical for the uiform distributio o {0, 1}, ad let y = x. The (x, y) is very far from typical for the uiform distributio o {0, 1} {0, 1}, sice N(01 (x, y)) = N(10 (x, y)) = 0 /4. Istead, (x, y) is typical for the distributio p = 1 2 (δ 00 +δ 11 ) o {0, 1} {0, 1}. 2

3 Our ext goal is a coutig-problem itepretatio of coditioal etropy, aalogous to the meaig established for ucoditioal etropy i Lecture 1. I provig it, we will also meet coditioal versios of the LLN for types ad the AEP. We build up all these results followig roughly the same sequece as i the ucoditioal settig of Lecture 1. Let us ow write q = p A θ for some kerel θ from A to B. We fix all of these otatios for the rest of the lecture. 2.1 Coditioal typicality Defiitio 2.1. Let x A ad y B. The the coditioal type of y give x is the collectio of values p y x (b a) := N( (a, b) (x, y) ), N(a x) defied for all a A such that N(a x) > 0. Heceforth, we usually abbreviate N(ab xy) := N ( (a, b) (x, y) ). Ituitively, the quatity p y x (b a) does the followig: amog all i {1, 2,..., }, we cosider oly those for which x i = a, ad amog those we record the fractio which also satisfy y i = b. Sometimes we refer to the coditioal type as beig idexed by all a A, without worryig about whether N(a x) > 0. We do this oly i cases where it causes o real problems. The first part of our iterpretatio of coditioal etropy is the followig. It is the aalog of the upper boud i Theorem 3.1 from Lecture 1. Lemma 2.2. Fix a kerel θ ad a strig x A. The umber of strigs y B for which p y x agrees with θ is at most 2 H(θ px). Proof. Exercise: mimic the proof of the upper boud i Lecture 1, Theorem 3.1. The key fact is that, if p y x agrees with θ, the N(ab xy) = θ(b a)p x (a) for a A, b B. 3

4 Like Theorem 3.1 i Lecture 1, there is a matchig lower boud for this lemma, but we leave it aside here. Istead let us itroduce a approximate form of the problem, ad prove upper ad lower bouds for that. The approximate form is more importat for later applicatios. Defiitio 2.3. Let δ > 0 ad x A. The y B is coditioally δ-typical for θ give x if px,y p x θ < δ. The set of all such y is deoted by T,δ (θ, x). More expasively, y T,δ (θ, x) if N(ab xy) a,b N(a x) θ(b a) < δ. Beware that some authors use slightly differet defiitios, as i the case of ucoditioal tylicality. This defiitio requires just a little care for the followig reaso: if θ ad λ are two kerels satisfyig θ(b a) = λ(b a) wheever N(a x) > 0, the y is coditioally δ-typical for θ give x if ad oly if the same holds for λ. As i the ucoditioal settig, the upper boud of Lemma 2.2 is easily tured ito a upper boud for the problem of coutig approximately coditioally typical strigs. Corollary 2.4. For x A ad δ > 0 we have T,δ (θ, x) 2 H(θ px)+ (δ)+o(), where the estimates i the two error terms of the expoet are idepedet of x. Proof. The quatity H(θ p) is cotiuous as a fuctio of the joit distributio p θ. From this ad Lemma 2.2 it follows that {y : p y x agrees with λ} 2 H(λ px) H(θ px)+ (δ) 2 wheever p x λ p x θ < δ. The values p y x (b a) take by the coditioal type of y give x are all ratios of itegers betwee 0 ad, ad there are at most A B of these values. Therefore 4

5 we may choose a set C of at most ( + 1) 2 A B ratioal stochastic matrices such that every coditioal type betwee strigs of legth agrees with some λ C. Havig doe so, we obtai T,δ (θ, x) λ C p x λ p x θ < δ λ C p x λ p x θ < δ H(λ px) 2 2 H(θ px)+ (δ) 2 H(θ px)+ (δ)+o(). I the remaider of this sectio, we itroduce a LLN ad AEP for coditioal types, ad obtai a lower boud which complemets the previous corollary. 2.2 A LLN for coditioal types We eed a versio of the weak law of large umbers for idepedet RVs of possibly differet distributios. We give a very simple versio which assumes a uiform boud o the RVs. It ca be proved by the same applicatio of Chebyshev s iequality as i the i.i.d. case (which i fact requires oly a uiform boud o the variaces). Propositio 2.5 (Weak LLN for RVs with differig distributios). Fix M > 0. For ay ε > 0 there is a 0 N, depedig o M ad ε, with the followig property: wheever 0 ad X 1,... X are idepedet [ M, M]-valued RVs, we have { 1 P [ 1 X i E ] } X i > ε < ε. Propositio 2.6 (LLN for coditioal types). For ay δ > 0, we have as. mi θ( T,δ (θ, x) ) x 1 x A 5

6 Proof. Fix x A, ad let Y B be draw from the distributio θ ( x). This meas that Y 1,..., Y are idepedet ad that Y i has distributio θ( x i ). For ay (a, b) A B we have p x,y (a, b) = N(ab xy) = 1 1 {xi =a, Y i =b}. The RVs 1 {xi =a, Y i =b} are idepedet, but i geeral ot idetically distributed: for those i which satisfy x i = a the distributio is Beroulli(θ(b a)), ad for the other i this RV is idetically zero. However, it does follow that they are all bouded by 1, ad so Propositio 2.5 gives { px,y P (a, b) E [ p x,y (a, b) ] } < δ 1 δ > 0, where the rate of covergece does ot deped o the specific choice of x. This gives the correct limitig behaviour for p x,y (a, b) because E [ p x,y (a, b) ] = 1 i: x i =a E[1 {Yi =b}] = 1 N(a x)θ(b a) = (p x θ)(a, b). Combiig this covergece for the fiitely may choices of (a, b) completes the proof. 2.3 Coditioal etropy typicality ad AEP There is also a coditioal versio of etropy typicality. Defiitio 2.7. A strig y B is coditioally ε-etropy typical for θ give x if 2 H(θ px) ε < θ (y x) < 2 H(θ px)+ε. The set of all such y is deoted by T et,ε (θ, x). The choice of the costat H(θ p x ) here is justified by the followig. Theorem 2.8 (Coditioal AEP). For ay ε > 0 we have as. mi θ( T et x A,ε (θ, x) ) x 1 6

7 Proof. Let Y θ ( x) as before. Takig logarithms gives log 2 θ (Y x) = Z i, where Z i = log 2 θ(y i x i ). These are idepedet, fiite-valued RVs. Amog them, there are at most A -may differet possible distributios, oe for each of the possible values of x i. This list of possible distributios depeds o θ, but the specific choice of x determies oly which distributio is chose for each Z i. Therefore Propositio 2.5 gives { 1 [ 1 } P log 2 θ (Y x) E log 2 θ (Y x)] < δ 1 δ > 0, where the rate of covergece agai does ot deped o the specific choice of x. This completes the proof, because E[log 2 θ (Y x)] = = m E[log 2 θ(y i x i )] = H(θ( x i )) = a θ(b x i ) log 2 θ(b x i ) b p x (a) H(θ( a)) = H(θ P x ). From this we obtai a coverig-umber corollary exactly as i the ucoditioal settig. Corollary 2.9. For ay ε > 0, we have cov ε (θ ( x)) = 2 H(θ px)+oε(), where the estimate i o ε () also depeds o θ but ot o x. This ow implies the lower boud o the umber of coditioally typical strigs which accompaies Corollary 2.4 Corollary For ay x A we have T,δ (θ, x) 2 H(θ px) o δ(), where the estimate i o δ () depeds o θ but ot o x. Proof. Combie Propositio 2.6 with the previous corollary. 7

8 2.4 Ituitive picture of the above results Let (X, Y ) be a radomizatio of q. The origial AEP lets us roughly visualize q as a uiform distributio o typical strigs. If we wish to thik about q as a joit distributio for pairs of strigs (x, y), the the results of this sectio give us a ehaced versio of that picture. To explai this, we start with the followig. The followig lemma gives some simple but useful relatios betwee coditioal ad ucoditioal typicality. Lemma If x T,δ (p) ad y T,δ (θ, x) the (x, y) T,2δ (p θ). 2. O the other had, if (x, y) T,δ (p θ), the y T,2δ (θ, x). Proof. The key is that for ay p Prob(A) we have p θ p θ = p (a)θ(b a) p(a)θ(b a) a,b = a ( ) p (a) p(a) θ(b a) = p p. b Part 1. Usig the calculatio above, our first pair of assumptios gives p x,y p θ p x,y p x θ + p x θ p θ = p x,y p x θ + p x p < 2δ. Part 2. Clearly if (x, y) T,δ (p θ), the x T,δ (p). Give this, we obtai p x,y p x θ p x,y p θ + p x θ p θ < 2δ. More succictly, this lemma asserts that T,δ/2 (p θ) ( {x} T,δ (θ, x) ) T,2δ (p θ) δ > 0. (2) x T,δ (p) Usig (2), we see that q is somethig like the uiform distributio o the subset of the Cartesia product where moreover T,δ (q) T,δ (p A ) T,δ (p B ), 8

9 1. most vertical slices T,δ (q) ({x} B ) have size roughly 2 H(Y X) (where we restrict attetio to x which is itself typical for p A ), ad similarly 2. most horizotal slices T,δ (q) (A {y}) have size roughly 2 H(X Y ). This ituitive picture gives us a ice way to visualize the chai rule. There are some messy εs ad δs to keep track of, but the idea is this: 2 H(X,Y ) T,δ (q) = T,δ (q) ({x} B ) x T,δ (p A ) T,δ (p A ) (typical size T,δ (q) ({x} B ) ) 2 H(X) 2 H(Y X), where the approximatio betwee the first ad the secod lies uses property 1 above. 3 Notes ad remarks Basic sources for this lecture: [CT91, Chapter 10, Exercise 16]. Further readig: the book [CK11] icludes a much more thorough accout of typicality ad coditioal typicality. Refereces [CK11] Imre Csiszár ad Jáos Körer. Iformatio theory. Cambridge Uiversity Press, Cambridge, secod editio, Codig theorems for discrete memoryless systems. [CT91] Thomas M. Cover ad Joy A. Thomas. Elemets of iformatio theory. Wiley Series i Telecommuicatios. Joh Wiley & Sos, Ic., New York, A Wiley-Itersciece Publicatio. TIM AUSTIN COURANT INSTITUTE OF MATHEMATICAL SCIENCES, NEW YORK UNIVERSITY, 251 MERCER ST, NEW YORK NY 10012, USA tim@cims.yu.edu URL: cims.yu.edu/ tim 9

Lecture 6: Source coding, Typicality, and Noisy channels and capacity

Lecture 6: Source coding, Typicality, and Noisy channels and capacity 15-859: Iformatio Theory ad Applicatios i TCS CMU: Sprig 2013 Lecture 6: Source codig, Typicality, ad Noisy chaels ad capacity Jauary 31, 2013 Lecturer: Mahdi Cheraghchi Scribe: Togbo Huag 1 Recap Uiversal