Refinement of Two Fundamental Tools in Information Theory
Raymond W. Yeung, Institute of Network Coding, The Chinese University of Hong Kong
Joint work with Siu-Wai Ho and Sergio Verdú
Discontinuity of Shannon's Information Measures
- Shannon's information measures: H(X), H(X|Y), I(X;Y) and I(X;Y|Z).
- They are described as continuous functions [Shannon 1948] [Csiszár & Körner 1981] [Cover & Thomas 1991] [McEliece 2002] [Yeung 2002].
- In fact, all Shannon's information measures are discontinuous everywhere when random variables take values from countably infinite alphabets [Ho & Yeung 2005].
- e.g., X can be any positive integer.
Discontinuity of Entropy
- Let P_X = {1, 0, 0, ...} and P_{X_n} = {1 - 1/log n, 1/(2^n log n), ..., 1/(2^n log n), 0, 0, ...}, where the mass 1/log n is spread evenly over 2^n atoms.
- As n -> infinity, sum_i |P_{X_n}(i) - P_X(i)| = 2/log n -> 0.
- However, lim_{n->infinity} H(X_n) = infinity.
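The two limits above can be checked numerically. The sketch below works in base-2 logarithms and assumes the construction stated on this slide (tail mass 1/log n spread over 2^n atoms); everything is evaluated in closed form so the 2^n atoms are never enumerated.

```python
import math

def dist_params(n):
    """P_{X_n}: mass 1 - 1/log2(n) on symbol 1, the remaining 1/log2(n)
    spread evenly over 2**n further symbols (assumed construction)."""
    log_n = math.log2(n)
    p_big = 1 - 1 / log_n
    p_small = 1 / (2**n * log_n)
    return p_big, p_small, 2**n

def variational_distance(n):
    # V(P_{X_n}, P_X) = |P_{X_n}(1) - 1| + (sum of the tail) = 2/log2(n)
    p_big, p_small, k = dist_params(n)
    return (1 - p_big) + k * p_small

def entropy(n):
    # H(X_n) in bits, grouping the k identical tail atoms
    p_big, p_small, k = dist_params(n)
    return -p_big * math.log2(p_big) - k * p_small * math.log2(p_small)

for n in (4, 16, 64, 256):
    print(n, variational_distance(n), entropy(n))
```

As n grows, the variational distance shrinks while the entropy keeps increasing, which is the claimed discontinuity.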
Discontinuity of Entropy
- Theorem 1: For any c >= 0 and any X taking values from a countably infinite alphabet with H(X) < infinity, there exist distributions P_{X_n} such that V(P_{X_n}, P_X) = sum_i |P_{X_n}(i) - P_X(i)| -> 0 but H(X_n) -> H(X) + c.
[Figure: H(X_n) approaches H(X) + c while V(P_{X_n}, P_X) approaches 0.]
Discontinuity of Entropy
- Theorem 2: For any c >= 0 and any X taking values from a countably infinite alphabet with H(X) < infinity, there exist distributions P_{X_n} such that D(P_{X_n} || P_X) = sum_i P_{X_n}(i) log (P_{X_n}(i)/P_X(i)) -> 0 but H(X_n) -> H(X) + c.
[Figure: H(X_n) approaches H(X) + c while D(P_{X_n} || P_X) approaches 0.]
Pinsker's Inequality
- D(p||q) >= (1/(2 ln 2)) V^2(p, q).
- By Pinsker's inequality, convergence w.r.t. D(.||.) implies convergence w.r.t. V(., .).
- Therefore, Theorem 2 implies Theorem 1.
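Pinsker's inequality is easy to sanity-check numerically. A minimal sketch over random distribution pairs, with divergence measured in bits so the 1/(2 ln 2) constant appears explicitly:

```python
import math
import random

def kl_divergence(p, q):
    """D(p||q) in bits; assumes q(i) > 0 wherever p(i) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variational_distance(p, q):
    """V(p, q) = sum_i |p(i) - q(i)|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_dist(m, rng):
    w = [rng.random() + 1e-9 for _ in range(m)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
for _ in range(10000):
    p, q = random_dist(6, rng), random_dist(6, rng)
    # Pinsker: D(p||q) >= V(p, q)^2 / (2 ln 2)
    assert kl_divergence(p, q) >= variational_distance(p, q) ** 2 / (2 * math.log(2)) - 1e-12
```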
Discontinuity of Entropy
[Figure: an illustration with probability masses 1, 1/2, 1/4, 1/4.]
Discontinuity of Shannon's Information Measures
- Theorem 3: For any X, Y and Z taking values from a countably infinite alphabet with I(X;Y|Z) < infinity, there exist distributions P_{X_n Y_n Z_n} such that lim_{n->infinity} D(P_{X_n Y_n Z_n} || P_{XYZ}) = 0 but lim_{n->infinity} I(X_n; Y_n | Z_n) = infinity.
Discontinuity of Shannon's Information Measures
[Diagram: applications (channel coding theorem, lossless/lossy source coding theorems, etc.) rest on typicality and Fano's inequality, which in turn rest on Shannon's information measures.]
To Find the Capacity of a Communication Channel
[Diagram: Alice -> Channel -> Bob. Typicality leads to a capacity C1; Fano's inequality leads to a capacity C2.]
On Countably Infinite Alphabets
[Diagram: applications (channel coding theorem, lossless/lossy source coding theorems, etc.) rest on typicality and Fano's inequality, which rest on Shannon's information measures: discontinuous!]
Typicality
- Weak typicality was first introduced by Shannon [1948] to establish the source coding theorem.
- Strong typicality was first used by Wolfowitz [1964] and then by Berger [1978]. It was further developed into the method of types by Csiszár and Körner [1981].
- Strong typicality possesses stronger properties compared with weak typicality.
- However, it can be used only for random variables with finite alphabets.
Notation
- Consider an i.i.d. source {X_k, k >= 1}, where X_k takes values in a countable alphabet X.
- Let P_X = P_{X_k} for all k, and assume H(P_X) < infinity.
- Let X = (X_1, X_2, ..., X_n).
- For a sequence x = (x_1, x_2, ..., x_n) in X^n:
  - N(x; x) is the number of occurrences of x in x;
  - q(x; x) = n^{-1} N(x; x);
  - Q_X = {q(x; x)} is the empirical distribution of x.
- e.g., x = (1, 3, 2, 1, 1): N(1; x) = 3, N(2; x) = N(3; x) = 1, so Q_X = {3/5, 1/5, 1/5}.
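The empirical distribution of the example sequence takes only a few lines to compute; a sketch (`empirical_distribution` is a hypothetical helper name):

```python
from collections import Counter

def empirical_distribution(x):
    """Q_X = {q(a; x)} where q(a; x) = N(a; x) / n."""
    n = len(x)
    return {a: count / n for a, count in Counter(x).items()}

q = empirical_distribution((1, 3, 2, 1, 1))
# N(1; x) = 3 and N(2; x) = N(3; x) = 1, so Q_X = {3/5, 1/5, 1/5}
```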
Weak Typicality
- Definition (Weak typicality): For any eps > 0, the weakly typical set W^n_{[X]eps} with respect to P_X is the set of sequences x = (x_1, x_2, ..., x_n) in X^n such that
  | -n^{-1} log p(x) - H(P_X) | <= eps,
  where p(x) = prod_k P_X(x_k).
Weak Typicality
- Definition 1 (Weak typicality): For any eps > 0, the weakly typical set W^n_{[X]eps} with respect to P_X is the set of sequences x = (x_1, x_2, ..., x_n) in X^n such that
  | D(Q_X || P_X) + H(Q_X) - H(P_X) | <= eps.
- Note that -n^{-1} log p(x) = -sum_x Q_X(x) log P_X(x) = D(Q_X || P_X) + H(Q_X), where H(Q_X) = -sum_x Q_X(x) log Q_X(x) is the empirical entropy of x.
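The identity behind Definition 1, -n^{-1} log p(x) = D(Q_X || P_X) + H(Q_X), can be verified numerically; a sketch in bits with hypothetical helper names:

```python
import math
from collections import Counter

def normalized_log_prob(x, p):
    """-n^{-1} log2 p(x) for an i.i.d. sequence x under distribution p."""
    return -sum(math.log2(p[a]) for a in x) / len(x)

def divergence_plus_empirical_entropy(x, p):
    """D(Q_X || P_X) + H(Q_X) for the empirical distribution Q_X of x."""
    n = len(x)
    q = {a: c / n for a, c in Counter(x).items()}
    d = sum(qa * math.log2(qa / p[a]) for a, qa in q.items())
    h = -sum(qa * math.log2(qa) for a, qa in q.items())
    return d + h

x = (1, 3, 2, 1, 1)
p = {1: 0.5, 2: 0.25, 3: 0.25}
assert abs(normalized_log_prob(x, p) - divergence_plus_empirical_entropy(x, p)) < 1e-12
```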
Asymptotic Equipartition Property
- Theorem 4 (Weak AEP): For any eps > 0:
  1) If x in W^n_{[X]eps}, then 2^{-n(H(X)+eps)} <= p(x) <= 2^{-n(H(X)-eps)}.
  2) For sufficiently large n, Pr{ X in W^n_{[X]eps} } > 1 - eps.
  3) For sufficiently large n, (1 - eps) 2^{n(H(X)-eps)} <= |W^n_{[X]eps}| <= 2^{n(H(X)+eps)}.
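Property 2 can be observed by simulation: the empirical rate -n^{-1} log p(X) concentrates around H(X). A sketch; the distribution and the parameters n, eps are illustrative choices, not from the slides:

```python
import math
import random

p = {0: 0.5, 1: 0.25, 2: 0.25}
entropy = -sum(v * math.log2(v) for v in p.values())  # H(X) = 1.5 bits

rng = random.Random(1)
n, trials, eps = 2000, 200, 0.05
symbols, weights = list(p), list(p.values())

typical = 0
for _ in range(trials):
    x = rng.choices(symbols, weights=weights, k=n)
    rate = -sum(math.log2(p[a]) for a in x) / n
    typical += abs(rate - entropy) <= eps  # is x in W^n_{[X]eps}?

print(typical / trials)  # close to 1 for large n
```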
Illustration of AEP
[Figure: within the set X^n of all n-sequences, the typical set carries probability close to 1, and the distribution on it is roughly uniform.]
Strong Typicality
- Strong typicality has been defined in slightly different forms in the literature.
- Definition 2 (Strong typicality): For |X| < infinity and any delta > 0, the strongly typical set T^n_{[X]delta} with respect to P_X is the set of sequences x = (x_1, x_2, ..., x_n) in X^n such that
  V(P_X, Q_X) = sum_x |P_X(x) - q(x; x)| <= delta,
  i.e., the variational distance between the empirical distribution of x and P_X is small.
Asymptotic Equipartition Property
- Theorem 5 (Strong AEP): For a finite alphabet X and any delta > 0:
  1) If x in T^n_{[X]delta}, then 2^{-n(H(X)+delta)} <= p(x) <= 2^{-n(H(X)-delta)}.
  2) For sufficiently large n, Pr{ X in T^n_{[X]delta} } > 1 - delta.
  3) For sufficiently large n, (1 - delta) 2^{n(H(X)-gamma)} <= |T^n_{[X]delta}| <= 2^{n(H(X)+gamma)}.
Breakdown of Strong AEP
- If strong typicality is extended (in the natural way) to countably infinite alphabets, the strong AEP no longer holds.
- Specifically, Property 2 holds but Properties 1 and 3 do not.
Typicality (X a finite alphabet)
- Weak typicality: | D(Q_X || P_X) + H(Q_X) - H(P_X) | <= eps.
- Strong typicality: V(P_X, Q_X) <= delta.
Unified Typicality (X a countably infinite alphabet)
- Weak typicality: | D(Q_X || P_X) + H(Q_X) - H(P_X) | <= eps.
- Strong typicality: V(P_X, Q_X) <= delta.
- There exist x such that D(Q_X || P_X) is small but |H(Q_X) - H(P_X)| is large.
Unified Typicality (X a countably infinite alphabet)
- Weak typicality: | D(Q_X || P_X) + H(Q_X) - H(P_X) | <= eps.
- Strong typicality: V(P_X, Q_X) <= delta.
- Unified typicality: D(Q_X || P_X) + |H(Q_X) - H(P_X)| <= eta.
Unified Typicality
- Definition 3 (Unified typicality): For any eta > 0, the unified typical set U^n_{[X]eta} with respect to P_X is the set of sequences x = (x_1, x_2, ..., x_n) in X^n such that
  D(Q_X || P_X) + |H(Q_X) - H(P_X)| <= eta.
- Weak typicality: | D(Q_X || P_X) + H(Q_X) - H(P_X) | <= eps. Strong typicality: V(P_X, Q_X) <= delta.
- Each notion of typicality corresponds to a distance measure.
- Entropy is continuous w.r.t. the distance measure induced by unified typicality.
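A membership test follows directly from Definition 3. The sketch below (hypothetical helper names, base-2 logs, finitely supported P_X for simplicity) computes the three quantities that the weak, strong, and unified definitions compare against their thresholds:

```python
import math
from collections import Counter

def typicality_scores(x, p):
    """Return D(Q_X||P_X), |H(Q_X) - H(P_X)| and V(P_X, Q_X)."""
    n = len(x)
    q = {a: c / n for a, c in Counter(x).items()}
    d = sum(qa * math.log2(qa / p[a]) for a, qa in q.items())
    h_q = -sum(qa * math.log2(qa) for a, qa in q.items())
    h_p = -sum(pa * math.log2(pa) for pa in p.values() if pa > 0)
    v = sum(abs(p.get(a, 0) - q.get(a, 0)) for a in set(p) | set(q))
    return d, abs(h_q - h_p), v

def is_unified_typical(x, p, eta):
    d, dh, _ = typicality_scores(x, p)
    return d + dh <= eta

# A sequence whose empirical distribution matches P_X exactly is typical for any eta
assert is_unified_typical((0, 1, 0, 1), {0: 0.5, 1: 0.5}, eta=1e-9)
```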
Asymptotic Equipartition Property
- Theorem 6 (Unified AEP): For any eta > 0:
  1) If x in U^n_{[X]eta}, then 2^{-n(H(X)+eta)} <= p(x) <= 2^{-n(H(X)-eta)}.
  2) For sufficiently large n, Pr{ X in U^n_{[X]eta} } > 1 - eta.
  3) For sufficiently large n, (1 - eta) 2^{n(H(X)-mu)} <= |U^n_{[X]eta}| <= 2^{n(H(X)+mu)}.
Unified Typicality
- Theorem 7: For any x in X^n, if x in U^n_{[X]eta}, then x in W^n_{[X]eps} and x in T^n_{[X]delta}, where eps = eta and delta = sqrt(2 eta ln 2).
Unified Joint Typicality
- Consider a bivariate information source {(X_k, Y_k), k >= 1} where the (X_k, Y_k) are i.i.d. with generic distribution P_XY.
- We use (X, Y) to denote the pair of generic random variables.
- Let (X, Y) = ((X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)).
- For a pair of sequences (x, y), the empirical distribution is Q_XY = {q(x, y; x, y)}, where q(x, y; x, y) = n^{-1} N(x, y; x, y).
Unified Joint Typicality
- Definition 4 (Unified joint typicality): For any eta > 0, the unified jointly typical set U^n_{[XY]eta} with respect to P_XY is the set of sequences (x, y) in X^n x Y^n such that
  D(Q_XY || P_XY) + |H(Q_X) - H(P_X)| + |H(Q_Y) - H(P_Y)| + |H(Q_XY) - H(P_XY)| <= eta.
- This definition cannot be simplified.
Conditional AEP
- Definition 5: For any x in U^n_{[X]eta}, the conditional typical set of Y is defined as
  U^n_{[Y|X]eta}(x) = { y in U^n_{[Y]eta} : (x, y) in U^n_{[XY]eta} }.
- Theorem 8: For x in U^n_{[X]eta}, if |U^n_{[Y|X]eta}(x)| >= 1, then
  2^{n(H(Y|X)-nu)} <= |U^n_{[Y|X]eta}(x)| <= 2^{n(H(Y|X)+nu)},
  where nu -> 0 as eta -> 0 and n -> infinity.
Illustration of Conditional AEP
[Figure: about 2^{nH(X)} typical x-sequences and 2^{nH(Y)} typical y-sequences, of which about 2^{nH(X,Y)} pairs (x, y) are jointly typical.]
Applications
- Rate-distortion theory: a version of the rate-distortion theorem was proved by strong typicality [Cover & Thomas 1991] [Yeung 2008]. It can be easily generalized to countably infinite alphabets.
- Multi-source network coding: the achievable information rate region in the multi-source network coding problem was proved by strong typicality [Yeung 2008]. It can be easily generalized to countably infinite alphabets.
Fano's Inequality
- Fano's inequality: For discrete random variables X and Y taking values in the same alphabet X = {1, 2, ...}, let
  eps = P[X != Y] = 1 - sum_{w in X} P_XY(w, w).
- Then H(X|Y) <= eps log(|X| - 1) + h(eps), where
  h(x) = x log(1/x) + (1 - x) log(1/(1 - x)) for 0 < x < 1 and h(0) = h(1) = 0.
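The right-hand side of Fano's inequality is straightforward to evaluate; a small sketch in bits (`fano_bound` is a hypothetical name):

```python
import math

def binary_entropy(x):
    """h(x) in bits, with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def fano_bound(eps, alphabet_size):
    """The upper bound eps * log2(|X| - 1) + h(eps) on H(X|Y)."""
    return eps * math.log2(alphabet_size - 1) + binary_entropy(eps)

print(fano_bound(0.1, 4))  # ~0.627 bits
```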
Motivation 1
- This upper bound eps log(|X| - 1) + h(eps) on H(X|Y) is not tight.
- For fixed eps and X, one can always find X such that
  H(X|Y) <= H(X) < eps log(|X| - 1) + h(eps).
- Then we can ask: for fixed P_X and eps, what is
  max_{Y : P[X != Y] = eps} H(X|Y)? (It is < eps log(|X| - 1) + h(eps).)
Motivation 2
- If X is countably infinite, Fano's inequality no longer gives an upper bound on H(X|Y).
- It is possible that eps -> 0 while H(X|Y) does not tend to 0, which can be explained by the discontinuity of entropy.
- e.g., let P_{X_n} = {1 - 1/log n, ...} be as in the earlier example and let P_{Y_n} = {1, 0, 0, ...}. Then H(X_n|Y_n) = H(X_n), which does not vanish, but eps = P[X_n != Y_n] = 1/log n -> 0.
- Under what conditions does eps -> 0 imply H(X|Y) -> 0 for countably infinite alphabets?
Tight Upper Bound on H(X|Y)
- Theorem 9: Suppose eps = P[X != Y] <= 1 - P_X(1). Then
  H(X|Y) <= eps H(Q(P_X, eps)) + h(eps),
  where the right side is a tight bound depending on eps and P_X. (This is the simplest of the 3 cases.)
- Here Q(P_X, eps) = {eps^{-1} q_1, eps^{-1} q_2, eps^{-1} q_3, ...}, where the q_i are a tail portion of P_X with total mass eps. [Figure: the masses q_1, q_2, q_3, q_4 carved out of P_X.]
- Let Phi_X(eps) = eps H(Q(P_X, eps)) + h(eps).
Generalizing Fano's Inequality
- Fano's inequality [Fano 1952] gives an upper bound on the conditional entropy H(X|Y) in terms of the error probability eps = Pr{X != Y}.
- e.g., P_X = [0.4, 0.4, 0.1, 0.1].
[Figure: H(X|Y) vs eps, comparing the bound of Fano 1952 with the tight bound of Ho & Verdú 2008.]
Generalizing Fano's Inequality
- e.g., X is a Poisson random variable with mean equal to 10.
- Fano's inequality no longer gives an upper bound on H(X|Y).
[Figure: H(X|Y) vs eps.]
Generalizing Fano's Inequality
- e.g., X is a Poisson random variable with mean equal to 10.
- Fano's inequality no longer gives an upper bound on H(X|Y).
[Figure: H(X|Y) vs eps, with the tight bound of Ho & Verdú 2008.]
Joint Source-Channel Coding
(S_1, S_2, ..., S_k) -> Encoder -> (X_1, X_2, ..., X_n) -> Channel -> (Y_1, Y_2, ..., Y_n) -> Decoder -> (S^_1, S^_2, ..., S^_k)
A k-to-n joint source-channel code.
Error Probabilities
- The average symbol error probability is defined as
  lambda_k = (1/k) sum_{i=1}^k P[S^_i != S_i].
- The block error probability is defined as
  mu_k = P[(S_1, S_2, ..., S_k) != (S^_1, S^_2, ..., S^_k)].
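The two criteria can be estimated empirically from decoded blocks; a minimal sketch (`error_rates` is a hypothetical helper name):

```python
def error_rates(sent_blocks, decoded_blocks):
    """Empirical average symbol error rate (lambda) and block error
    rate (mu) over a list of length-k blocks and their reconstructions."""
    k = len(sent_blocks[0])
    m = len(sent_blocks)
    lam = sum(
        sum(a != b for a, b in zip(s, s_hat)) / k
        for s, s_hat in zip(sent_blocks, decoded_blocks)
    ) / m
    mu = sum(s != s_hat for s, s_hat in zip(sent_blocks, decoded_blocks)) / m
    return lam, mu

lam, mu = error_rates([(0, 0, 0), (1, 1, 1)], [(0, 0, 1), (1, 1, 1)])
# one wrong symbol out of six, one wrong block out of two
```

Since a single wrong symbol already makes the whole block wrong, mu is always at least lam.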
Symbol Error Rate
- Theorem 10: For any discrete memoryless source and general channel, a k-to-n joint source-channel code with symbol error probability lambda_k satisfies
  (1/k) sup_X I(X; Y) >= H(S) - Phi_{S*}(lambda_k),
  where S* is constructed from {S_1, S_2, ..., S_k} according to
  P_{S*}(1) = min_j P_{S_j}(1),
  P_{S*}(a) = min_j sum_{i=1}^{a} P_{S_j}(i) - sum_{i=1}^{a-1} P_{S*}(i) for a >= 2.
Block Error Rate
- Theorem 11: For any general discrete source and general channel, the block error probability mu_k of a k-to-n joint source-channel code is lower bounded by
  mu_k >= Phi^{-1}_{S^k}( H(S^k) - sup_X I(X; Y) ).
Information-Theoretic Security
- Weak secrecy, lim_{n->infinity} (1/n) I(X^n; Y^n) = 0, has been considered in [Csiszár & Körner 78, broadcast channel] and some seminal papers.
- [Wyner 75, wiretap channel I] only stated that a large value of the equivocation implies a large value of P_ew, where the equivocation refers to (1/k) H(X^k | Y^n) and P_ew means mu_k.
- It is important to clarify what exactly weak secrecy implies.
Weak Secrecy
- e.g., P_X = (0.4, 0.4, 0.1, 0.1).
[Figure: H(X) and the bounds on H(X|Y) of Fano 1952 and Ho & Verdú 2008, plotted against eps = P[X != Y].]
Weak Secrecy
- Theorem 12: For any discrete stationary memoryless source (i.i.d. source) with distribution P_X, if
  lim_{n->infinity} (1/n) I(X^n; Y^n) = 0,
  then lim_{n->infinity} lambda_n = lambda_max and lim_{n->infinity} mu_n = 1.
- Remarks:
  - Weak secrecy together with only a stationarity assumption on the source is insufficient to show the maximum error probability.
  - The proof is based on the tight upper bound on H(X|Y) in terms of the error probability.
Summary
[Diagram: applications (channel coding theorem, lossless/lossy source coding theorems) rest on typicality (weak and strong) and Fano's inequality, which rest on Shannon's information measures.]
On Countably Infinite Alphabets
[Diagram: applications (channel coding theorem, lossless/lossy source coding theorem) rest on typicality, now only weak typicality, which rests on Shannon's information measures: discontinuous!]
Unified Typicality
[Diagram: applications (channel coding theorem, MSNC/lossy SC theorems) rest on typicality, now unified typicality, which rests on Shannon's information measures.]
Generalized Fano's Inequality
[Diagram: applications (results on JSCC, IT security, MSNC/lossy SC theorems) rest on unified typicality and the generalized Fano's inequality, which rest on Shannon's information measures.]
Perhaps... a lot of fundamental research in information theory is still waiting for us to investigate.
References
- S.-W. Ho and R. W. Yeung, "On the Discontinuity of the Shannon Information Measures," IEEE Trans. Inform. Theory, vol. 55, no. 12, pp. 5362-5374, Dec. 2009.
- S.-W. Ho and R. W. Yeung, "On Information Divergence Measures and a Unified Typicality," IEEE Trans. Inform. Theory, vol. 56, no. 12, pp. 5893-5905, Dec. 2010.
- S.-W. Ho and S. Verdú, "On the Interplay between Conditional Entropy and Error Probability," IEEE Trans. Inform. Theory, vol. 56, no. 12, pp. 5930-5942, Dec. 2010.
- S.-W. Ho, "On the Interplay between Shannon's Information Measures and Reliability Criteria," in Proc. 2009 IEEE Int. Symposium Inform. Theory (ISIT 2009), Seoul, Korea, June 28-July 3, 2009.
- S.-W. Ho, "Bounds on the Rates of Joint Source-Channel Codes for General Sources and Channels," in Proc. 2009 IEEE Inform. Theory Workshop (ITW 2009), Taormina, Italy, Oct. 11-16, 2009.
Q & A
Why Countably Infinite Alphabets?
- An important mathematical theory can provide insights that cannot be obtained by other means.
- Many problems involve random variables taking values from countably infinite alphabets; the finite alphabet is the special case.
- Benefits: tighter bounds, faster convergence rates, etc.
- In source coding, the alphabet size can be very large, infinite, or unknown.
Discontinuity of Entropy
- Entropy is a measure of uncertainty.
- We can be more and more sure that a particular event will happen as time goes on, while at the same time the uncertainty of the whole picture keeps increasing.
- If one finds this statement counter-intuitive, he/she may have the concept that entropy is continuous rooted in his/her mind.
- The limiting probability distribution may not fully characterize the asymptotic behavior of a Markov chain.
Discontinuity of Entropy
Suppose a child hides in a shopping mall whose floor plan is shown on the next slide. In each case, the chance of hiding in a room is directly proportional to the size of the room. We are interested only in which room the child is in, not his exact position inside the room. Which case do you expect to be the easiest to locate the child?
Case A: 1 blue room + 2 green rooms
Case B: 1 blue room + 16 green rooms
Case C: 1 blue room + 256 green rooms
Case D: 1 blue room + 4096 green rooms

                          Case A   Case B   Case C    Case D
Chance in the blue room   0.5      0.622    0.698     0.742
Chance in a green room    0.25     0.0236   0.00118   0.000063
Discontinuity of Entropy
From Case A to Case D, the difficulty increases. By the Shannon entropy, the uncertainty is increasing, although the probability that the child is in the blue room is also increasing. We can continue this construction and make the chance in the blue room approach 1! The critical assumption is that the number of rooms can be unbounded. So we have seen that a very sure event and a large uncertainty of the whole picture can exist at the same time. Imagine a city where everyone has a normal life each day with probability 0.99; with probability 0.01, however, any kind of accident beyond our imagination can happen. Would you feel a big uncertainty about your life if you were living in that city?
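The four cases can be checked directly with the Shannon entropy; a sketch using the room counts and blue-room probabilities from the floor-plan cases above (`mall_entropy` is a hypothetical helper name):

```python
import math

def mall_entropy(p_blue, n_green):
    """H in bits: blue room w.p. p_blue, otherwise uniform over n_green green rooms."""
    p_green = (1 - p_blue) / n_green
    return -p_blue * math.log2(p_blue) - n_green * p_green * math.log2(p_green)

cases = {"A": (0.5, 2), "B": (0.622, 16), "C": (0.698, 256), "D": (0.742, 4096)}
entropies = {name: mall_entropy(pb, ng) for name, (pb, ng) in cases.items()}
# The chance of the blue room rises from A to D, yet the entropy rises too
```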
Weak Secrecy: lim_{n->infinity} (1/n) I(X^n; Y^n) = 0
- Weak secrecy is insufficient to show the maximum error probability.
- Example 1: Let W, V and the X_i be binary random variables, with W and V independent and uniform, and X_i = W for all i. Then
  lambda~_max = 1 - lim_k max_x P_{X_k}(x) = 1 - 1/2 = 1/2,
  mu~_max = 1 - (1/2)(1/2) = 3/4.
Example 1 (cont.)
Let
  Y_1  Y_2  Y_3  Y_4  ...
= X_1  X_4  X_9  X_16 ...
  X_2  X_3  X_8  X_15 ...
  X_5  X_6  X_7  X_14 ...
  X_10 X_11 X_12 X_13 ...
so that lim_{n->infinity} (1/n) I(X^k; Y^n) = 0.
Choose x^ = (0, 0, ..., 0) if Y_i = 0 and x^ = (1, 1, ..., 1) if Y_i = 1. Then
  lim mu = 1/2 < 3/4 = mu~_max and lim lambda = 1/4 < 1/2 = lambda~_max.
Joint Unified Typicality
- Can the definition be changed to
  D(Q_XY || P_XY) + |H(Q_XY) - H(P_XY)| <= eta?
- Answer: no. [Counterexample: a pair of joint distributions Q = {q(x, y)} and P = {p(x, y)} with D(Q || P) << 1 that satisfy this condition yet have different marginal behavior.]
Joint Unified Typicality
- Can the definition be changed to
  D(Q_XY || P_XY) + |H(Q_X) - H(P_X)| + |H(Q_Y) - H(P_Y)| <= eta?
- Answer: no. [Counterexample: a pair of joint distributions Q = {q(x, y)} and P = {p(x, y)} with D(Q || P) << 1 that satisfy this condition yet differ in joint entropy.]
Asymptotic Equipartition Property
- Theorem 5 (Consistency): For any (x, y) in X^n x Y^n, if (x, y) in U^n_{[XY]eta}, then x in U^n_{[X]eta} and y in U^n_{[Y]eta}.
- Theorem 6 (Unified JAEP): For any eta > 0:
  1) If (x, y) in U^n_{[XY]eta}, then 2^{-n(H(X,Y)+eta)} <= p(x, y) <= 2^{-n(H(X,Y)-eta)}.
  2) For sufficiently large n, Pr{ (X, Y) in U^n_{[XY]eta} } > 1 - eta.
  3) For sufficiently large n, (1 - eta) 2^{n(H(X,Y)-eta)} <= |U^n_{[XY]eta}| <= 2^{n(H(X,Y)+eta)}.