Lecture 14 (03/27/18). Channels. Decoding. Preview of the Capacity Theorem. A. Barg

The concept of a communication channel in information theory is an abstraction for transmitting digital (and analog) information from the sender to the recipient over a noisy medium. Examples of physical channels are wireless links, cable communications (optical, coaxial, etc.), writing onto digital media (flash, magnetic), and many more.

Let $X, Y$ be finite sets. A mapping $W : X \to Y$ is called stochastic if the image of $x \in X$ is a random variable taking values in $Y$. Denote by $W(y|x)$ the probability of $y$ conditional on the given input $x$.

Definition: A discrete memoryless channel (DMC) is a stochastic mapping $W : X \to Y$. We use the letter $W$ to refer both to the channel itself and to the probability distribution $W(y|x)$. The sets $X$ and $Y$ are called the input alphabet and the output alphabet of $W$, respectively. The channel is represented by a stochastic matrix whose rows are labelled by the elements of $X$ (input letters) and columns by the elements of $Y$ (output letters). By definition $\sum_{y \in Y} W(y|x) = 1$ for any $x \in X$.

Examples.

1. Z-channel (called so because its diagram resembles the letter Z):
$$W = \begin{pmatrix} 1-\varepsilon & \varepsilon \\ 0 & 1 \end{pmatrix}$$

2. Binary symmetric channel (BSC(p)), $W : \{0,1\} \to \{0,1\}$:
$$W = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$$

3. Binary erasure channel (BEC(p)), $W : \{0,1\} \to \{0,1,?\}$:
$$W = \begin{pmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{pmatrix}$$

There are more examples in the textbook.

Definition: Let $\mathcal{M}$ be a finite set of cardinality $M$ and let $f : \mathcal{M} \to X^n$ be a mapping. A code $C$ of length $n$ over the alphabet $X$ is the image of $f$ in $X^n$. We say that a message $m \in \mathcal{M}$ is encoded into a codeword $x_m \in C$ if $f(m) = x_m$. The set of codewords $\{x_1, \dots, x_M\}$ is called a channel code.¹ The number $R = \frac{1}{n}\log M$ is called the rate of the code $C$.

Below we denote general $n$-vectors by $x^n, y^n$ and keep the above notation for the codewords. The codewords are transmitted over the channel. This means the following. The mapping $W$ is extended from $X$ to $X^n$ using the memoryless property of $W$:
$$W^n(y^n|x^n) = \prod_{i=1}^{n} W(y_i|x_i), \quad \text{where } x^n = (x_1, \dots, x_n),\ y^n = (y_1, \dots, y_n).$$
The result of transmitting the codeword $x_m$ over the channel $W$ is a vector $y^n \in Y^n$ with probability $W^n(y^n|x_m)$.

Messages are encoded and transmitted as codewords to provide the recipient with the functionality of correcting errors that may occur in the channel. Error correction is performed by a decoder, i.e., a mapping $g : Y^n \to \mathcal{M}$. The decoder is a deterministic mapping constructed so as to minimize the probability of incorrect recovery of transmitted messages.

¹Sometimes the term code is used to refer to $f$, and then the set of codewords $C$ is called the codebook.
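As an illustration (not part of the original notes; the names `transmit` and `W_n` are ad hoc), here is a minimal Python sketch that represents a DMC by its stochastic matrix and implements the memoryless extension $W^n(y^n|x^n)$ for the BSC and BEC examples above:

```python
import numpy as np

rng = np.random.default_rng(1)

# A DMC is a stochastic matrix: rows indexed by input letters, columns by
# output letters; each row sums to 1, as required by sum_y W(y|x) = 1.
p = 0.1
BSC = np.array([[1 - p, p],
                [p, 1 - p]])        # outputs: 0, 1
BEC = np.array([[1 - p, p, 0.0],    # outputs: 0, ?, 1 (index 1 = erasure '?')
                [0.0, p, 1 - p]])

def transmit(W, x):
    """Pass the vector x through the channel letter by letter (memorylessness)."""
    return np.array([rng.choice(W.shape[1], p=W[xi]) for xi in x])

def W_n(W, y, x):
    """Memoryless extension: W^n(y^n | x^n) = prod_i W(y_i | x_i)."""
    return float(np.prod([W[xi, yi] for xi, yi in zip(x, y)]))

x = np.array([0, 1, 1, 0, 1])       # a codeword of length n = 5
y = transmit(BSC, x)
print(y, W_n(BSC, y, x))
```

Because the channel acts on each letter independently, sampling the output coordinate by coordinate is exactly the memoryless property used in the product formula for $W^n$.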
Optimal decoders. We briefly discuss optimal decoding rules. Let $\Pr(m)$ be a probability distribution on $\mathcal{M}$. Let $y^n$ be the received vector, i.e., the output of the channel. The posterior probability that the transmitted message was $m$ equals

(1) $P(m|y^n) = \dfrac{\Pr(m)\, W^n(y^n|x_m)}{P(y^n)}$,

where $P(y^n) = \sum_{m=1}^{M} \Pr(m) W^n(y^n|x_m)$. Assume that $g(y^n) = m$; then the error probability is $p_e = 1 - P(m|y^n)$. To minimize $p_e$, decode $y^n$ to the message $m$ such that

(2) $P(m|y^n) \ge P(m'|y^n)$ for all $m' \ne m$

(ties are broken arbitrarily). This rule is called the maximum a posteriori probability (MAP) decoder. If $\Pr(m) = 1/M$ is uniform, then the MAP decoder is equivalent to the maximum likelihood (ML) decoder $g_{\mathrm{ML}}$ given by $g(y^n) = m$ if $W^n(y^n|x_m) \ge W^n(y^n|x_{m'})$ for all $m' \ne m$. To see this, use the Bayes formula (1) in (2). If $\Pr(m)$ is not uniform, then the ML decoder is generally suboptimal. ML and MAP decoders are computationally very hard because of the large search involved in finding $g(y^n)$.

Preview of the Shannon capacity theorem. The following discussion is informal. It uses the simple case of the BSC to explain the nature of channel capacity in geometric terms. Consider transmission over $W = \mathrm{BSC}(p)$, $p < 1/2$. Let $d_H(x^n, y^n) = |\{i : x_i \ne y_i\}|$ be the Hamming distance between the (binary $n$-dimensional) vectors $x^n$ and $y^n$. Let $x^n$ be the transmitted vector and $y^n$ the received vector. The typical value of the distance is $d_H(x^n, y^n) \approx np$. In other words, $\Pr\{|d_H(x^n, y^n) - np| \ge n\alpha\}$ is small, where $\alpha > 0$ is a small number. Therefore define the decoder value $g(y^n)$ as follows: if there is a unique codevector $x_m \in C$ such that $|d_H(x_m, y^n) - np| \le n\alpha$, then $g(y^n) = x_m$; otherwise put $g(y^n) = x_1$ (or any other arbitrary codevector). Below we call vectors $y^n$ whose distance from $x_m$ is about $np$ typical for $x_m$. The number of typical vectors $y^n \in \{0,1\}^n$ for a given $x^n$ is

(3) $|\{y^n \in \{0,1\}^n : |d_H(x^n, y^n) - np| \le n\lambda\}| = \sum_{i : |i - np| \le n\lambda} \binom{n}{i}$.

Lemma 1. Let $0 < \lambda \le 1/2$; then

(4) $\dfrac{1}{n+1}\, 2^{nh(\lambda)} \le \sum_{i=0}^{\lfloor \lambda n \rfloor} \binom{n}{i} \le 2^{nh(\lambda)}$

(here $h(\lambda) = -\lambda \log \lambda - (1-\lambda)\log(1-\lambda)$ is the binary entropy function, also written $h_2$).

Proof: For the upper bound,
$$1 = (\lambda + (1-\lambda))^n = \sum_{i=0}^{n} \binom{n}{i} \lambda^i (1-\lambda)^{n-i} \ge \sum_{i=0}^{\lfloor \lambda n \rfloor} \binom{n}{i} \lambda^i (1-\lambda)^{n-i} \ge \sum_{i=0}^{\lfloor \lambda n \rfloor} \binom{n}{i} (1-\lambda)^n \Big(\frac{\lambda}{1-\lambda}\Big)^{\lambda n} = 2^{-nh(\lambda)} \sum_{i=0}^{\lfloor \lambda n \rfloor} \binom{n}{i},$$
where the second inequality holds because $\lambda/(1-\lambda) \le 1$, so $(\lambda/(1-\lambda))^i \ge (\lambda/(1-\lambda))^{\lambda n}$ for $i \le \lambda n$, and the last equality follows from $(1-\lambda)^n \big(\frac{\lambda}{1-\lambda}\big)^{\lambda n} = \lambda^{\lambda n}(1-\lambda)^{(1-\lambda)n} = 2^{-nh(\lambda)}$. For the lower bound, $\binom{n}{\lfloor \lambda n \rfloor} \lambda^{\lambda n}(1-\lambda)^{(1-\lambda)n}$ is the largest of the $n+1$ terms in the expansion of $1 = (\lambda + (1-\lambda))^n$, whence $\binom{n}{\lfloor \lambda n \rfloor} \ge \frac{1}{n+1}\, 2^{nh(\lambda)}$.
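For the BSC the ML rule has a simple geometric form: $W^n(y^n|x_m) = p^{d}(1-p)^{n-d}$ with $d = d_H(x_m, y^n)$, and for $p < 1/2$ this is decreasing in $d$, so ML decoding is nearest-codeword decoding in Hamming distance. A brute-force sketch of this (my own illustration; `ml_decode_bsc` is a hypothetical name):

```python
import numpy as np

def hamming(x, y):
    """d_H(x, y) = number of positions where the vectors differ."""
    return int(np.sum(x != y))

def ml_decode_bsc(code, y):
    """ML decoding on a BSC(p), p < 1/2: W^n(y|x) = p^d (1-p)^(n-d) decreases
    in d = d_H(x, y), so ML = minimum-distance decoding. Exhaustive search
    over all M codewords; ties broken by the first index (i.e., arbitrarily)."""
    return min(range(len(code)), key=lambda m: hamming(code[m], y))

# Toy code: the length-3 repetition code, M = 2 messages, rate R = 1/3.
code = [np.array([0, 0, 0]), np.array([1, 1, 1])]
print(ml_decode_bsc(code, np.array([0, 1, 0])))   # -> 0 (one flip corrected)
```

The exhaustive search over $M = 2^{nR}$ codewords is exponential in $n$, which is exactly the "large search" that makes ML and MAP decoders computationally hard.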
We would like the sets $T_\alpha(x_m) \triangleq \{y^n \in Y^n : |d_H(x_m, y^n) - np| \le n\alpha\}$ for different $x_m$ to be disjoint. Note that $|T_\alpha(\cdot)| = \sum_{i=n(p-\alpha)}^{n(p+\alpha)} \binom{n}{i}$. Suppose first that

(5) $M\, |T_\alpha(\cdot)| \overset{(4)}{\ge} \dfrac{1}{n+1}\, M 2^{nh(p-\alpha)} > 2^n$;

then a point $y^n$ in the output space is typical on average for an exponentially large number of codevectors, namely for
$$A = 2^{n(R + h_2(p-\alpha) - o(1))}/2^n = 2^{n(R + h_2(p-\alpha) - 1 - o(1))} = 2^{n\varepsilon}$$
codevectors, where we denoted $\varepsilon = R + h_2(p-\alpha) - 1$. This means that a significant proportion of points $y^n$ is typical for exponentially many codewords. In this case decoding with low error probability is impossible (for instance, the maximum error probability is close to 1). We observe that (5) implies the following inequality: $R > 1 - h(p) + \alpha'$, where $\alpha'$ is small if so is $\alpha$. Thus if this inequality is true, the error probability is large.

At the same time, if the cumulative volume of the typical sets around the codewords satisfies
$$M\, |T_\alpha(\cdot)| = 2^{nR}\, |T_\alpha(\cdot)| \overset{(4)}{\le} 2^{n(R + h_2(p+\alpha))},$$
i.e., is less than $2^n$ (the total size of the output space), then there can exist codes in which decoding is correct with large probability. We will show by random choice that such codes indeed exist.

We notice that there is a dividing point $R = 1 - h_2(p)$ between the low and high error probability of decoding. This value of the rate is called the capacity of the channel $W$.

We can also flip the question by asking about the

Threshold noise level: We are given a code $C$ of rate $R$. Assuming that $C$ is chosen optimally for transmission over a BSC(p), what is the largest $p$ for which the code can guarantee reliable transmission? The above argument shows that the maximum $p$ is $p_{\mathrm{th}} = h_2^{-1}(1-R)$. (Plot the function $h_2^{-1}(z)$ to visualize the dependence on the rate.)
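A quick numerical sketch of the threshold noise level $p_{\mathrm{th}} = h_2^{-1}(1-R)$ (my own; `h2` and `h2_inv` are assumed helper names, with $h_2^{-1}$ computed by bisection on $[0, 1/2]$, where $h_2$ is increasing):

```python
import numpy as np

def h2(x):
    """Binary entropy function h_2(x), in bits; h_2(0) = h_2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def h2_inv(y, tol=1e-12):
    """Inverse of h_2 on [0, 1/2] by bisection (h_2 is increasing there)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Threshold crossover probability p_th = h_2^{-1}(1 - R) for a few rates R.
for R in (0.1, 0.3, 0.5, 0.9):
    print(f"R = {R:.1f}  ->  p_th = {h2_inv(1 - R):.4f}")
```

As $R \to 0$ the tolerable noise level approaches $1/2$, and as $R \to 1$ it approaches $0$, in agreement with the dividing point $R = 1 - h_2(p)$.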
Lecture 15 (03/29/18). Channel capacity

The following is a formalization of the discussion in the previous section. Consider BSC(p), i.e., a stochastic mapping $W : X \to Y$, $X = Y = \{0,1\}$, such that
$$W(y|x) = (1-p)\, 1_{\{x=y\}} + p\, 1_{\{x \ne y\}}.$$
Let
$$T_\alpha(x^n) = \{y^n : |d_H(x^n, y^n) - np| \le n\alpha\}, \quad \alpha > 0.$$
Let $C = \{x_1^n, \dots, x_M^n\}$ be a code (below we omit the superscript $n$ from the notation for the codewords). Let
$$D(x_m) \triangleq \{y^n \in \{0,1\}^n : W^n(y^n|x_m) \ge W^n(y^n|x_{m'}) \text{ for all } m' \ne m\}$$
be the decision region for the codeword $x_m$. Denote by
$$\lambda_m = \sum_{y^n \in D(x_m)^c} W^n(y^n|x_m)$$
the error probability of decoding conditional on transmitting the $m$th codeword, and let
$$\lambda_{\max}(C) = \max_m \lambda_m$$
be the maximum error probability of the code $C$.

Theorem 2. (Shannon's capacity theorem for the BSC, lower bound) Given $\varepsilon > 0$, $\gamma > 0$, $p < 1/2$ and $R \le 1 - h(p) - \gamma$, there exists $n_0 = n_0(\varepsilon, \gamma)$ such that for any $n \ge n_0$ there exists a code $C \subset \{0,1\}^n$ of cardinality $2^{Rn}$ whose maximum error probability of decoding on a BSC(p) satisfies $\lambda_{\max} \le \varepsilon$.

Proof: Let $M = 2^{R'n}$, where $R' = R + \frac{1}{n}$. Choose $C = \{x_1, \dots, x_M\}$ by randomly assigning codewords to the messages with uniform probability $\Pr(f(m) = x_m) = 2^{-n}$, independently of each other (below we use the notation $x_m$ for codewords, omitting the superscript in $x_m^n$). Suppose that $y^n$ is the received vector. Let us use the following decoder $g : Y^n \to \mathcal{M}$: if there is a unique codeword $x_m \in C$ such that $y^n \in T_\alpha(x_m)$, assign $g(y^n) = m$; in all other situations put $g(y^n) = 1$. (This mapping is called the typical pairs decoder.)

Let $Z_m = 1_{\{y^n \in T_\alpha(x_m)\}}$, $m = 1, \dots, M$, be the indicator random variable of the event $\{y^n \text{ is typical for } x_m\}$. Suppose that the transmitted vector is $x_1$. The probability of error $\lambda_1$ satisfies
$$\lambda_1 \le \Pr\{Z_1 = 0\} + \Pr\Big\{\sum_{m=2}^{M} Z_m \ge 1\Big\}.$$
We have²
$$\Pr\{Z_1 = 0\} = \Pr\{y^n \notin T_\alpha(x_1)\} = \Pr\{|d_H(x_1, y^n) - np| > n\alpha\} \overset{\text{Chebyshev}}{\le} \frac{np(1-p)}{(n\alpha)^2} = p(1-p)\, n^{-\delta}$$
by taking $\alpha = n^{-(1-\delta)/2}$, $\delta > 0$.

²The Chebyshev inequality states that for any random variable $X$ with finite expectation and variance $\mathrm{Var}(X)$ we have $\Pr\{|X - EX| \ge a\} \le \mathrm{Var}(X)/a^2$.
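Since $d_H(x_1, y^n) \sim \mathrm{Binomial}(n, p)$ on the BSC, the Chebyshev step above is easy to check empirically. A small Monte Carlo sketch (my own illustration, not from the notes):

```python
import numpy as np

# Empirical check of Pr{|d_H - np| > n*alpha} <= np(1-p)/(n*alpha)^2,
# with the choice alpha = n^{-(1-delta)/2} used in the proof.
rng = np.random.default_rng(0)
n, p, delta = 10_000, 0.1, 0.5
alpha = n ** (-(1 - delta) / 2)

d = rng.binomial(n, p, size=100_000)      # distances d_H(x_1, y^n) over many trials
empirical = np.mean(np.abs(d - n * p) > n * alpha)
chebyshev = n * p * (1 - p) / (n * alpha) ** 2   # = p(1-p) * n^{-delta}

print(f"empirical Pr    = {empirical:.6f}")
print(f"Chebyshev bound = {chebyshev:.6f}")
```

The empirical probability is far below the bound (Chebyshev is loose here), but the bound already decays with $n$, which is all the proof needs.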
By taking $n$ sufficiently large, we can guarantee that $\Pr\{Z_1 = 0\} \le \beta$, where $\beta > 0$ is arbitrarily small. Next, for $m \ge 2$,
$$\Pr\{Z_m = 1\} = \frac{|T_\alpha(\cdot)|}{2^n} \overset{(4)}{\le} 2^{-n(1 - h(p+\alpha))}.$$
Use the union bound:
$$\Pr\Big\{\sum_{m=2}^{M} Z_m \ge 1\Big\} \le M \Pr\{Z_m = 1\} \le 2^{-n(1 - R' - h(p+\alpha))} \le 2^{n(\alpha' - \gamma + \frac{1}{n})},$$
where we write $h(p+\alpha) = h(p) + \alpha'$, and $\alpha'$ is small if $\alpha$ is small (note that $\alpha' > 0$ since $p < 1/2$). By taking a sufficiently large $n$ we can ensure that $\alpha' < \gamma - \frac{1}{n}$, and so $\Pr\{\sum_{m=2}^{M} Z_m \ge 1\} \le \beta$.

Now let us compute the average probability of error over all codeword assignments $f$:

(6) $P_e = E_F \bar{\lambda}(C) = \sum_C \Pr(C) \dfrac{1}{M} \sum_{m=1}^{M} \lambda_m(C) = \dfrac{1}{M} \sum_{m=1}^{M} \sum_C \Pr(C) \lambda_m(C) = \sum_C \Pr(C) \lambda_1(C) \le 2\beta,$

where $\Pr(C) = \prod_{m=1}^{M} \Pr\{f(m) = x_m\}$. Here $F$ is the random mapping. Since we go over all the mappings, the sum $\sum_C \Pr(C) \lambda_m(C)$ does not depend on $m$.

By (6) there exists a code $C^*$ for which the error probability averaged over the $M$ codewords satisfies $\bar{\lambda}(C^*) \le 2\beta$. By the Markov inequality, $|\{m \in \mathcal{M} : \lambda_m(C^*) \le 2\bar{\lambda}(C^*)\}| \ge M/2$. Thus, there exist at least $M/2$ messages³ whose codewords in $C^*$ are decoded with error probability $\lambda_m(C^*) \le 4\beta$. Denote this set of codewords by $\tilde{C}$ and take $\beta = \varepsilon/4$. Thus there is a code $\tilde{C}$ of cardinality $\frac{M}{2} = 2^{n(R' - \frac{1}{n})}$, i.e., of rate $R$, with $\lambda_{\max}(\tilde{C}) \le \varepsilon$.

³This idea is called expurgation.

Observe that the probability $\lambda_{\max}$ falls only polynomially in $n$ (as $n^{-\delta}$). By using the optimal (i.e., MAP) decoder rather than the typical pairs decoder it is possible to show that there exist much better codes for the BSC.

Theorem 3. For any rate $R$, $0 \le R < 1 - h_2(p)$, there exists a sequence of codes $C_i$, $i = 1, 2, \dots$, of growing length $n_i$ such that $\frac{\log |C_i|}{n_i} \ge R$ and

(7) $\lambda_{\max}(C_i) \le 2^{-n_i D(\delta(R) \| p)(1 - o(1))}$,

where $D(x \| y) = x \log \frac{x}{y} + (1-x) \log \frac{1-x}{1-y}$, $\delta(R) \triangleq h_2^{-1}(1-R)$, and $o(1) \to 0$ as $i \to \infty$.

We will omit the proof. Note that the decline rate of the maximum error probability is a much faster (exponential) function of the code length $n$ than in the above argument. We took a loss by using a suboptimal decoder (and a simpler proof).

Finite-length scaling. The efficiency of our transmission design can be measured by the number of messages that can be transmitted reliably. Suppose that the code rate is $R = 1 - h_2(p) - \gamma$, where $\gamma > 0$ is small and $p$ is the transition probability of the BSC.
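To get a feel for the exponent in (7), here is a small sketch (my own; the helpers `h2`, `h2_inv`, `D` repeat the earlier bisection sketch so the block is self-contained) evaluating $D(\delta(R)\|p)$ on a BSC(0.1):

```python
import numpy as np

def h2(x):
    """Binary entropy h_2(x) in bits, for 0 < x < 1."""
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def h2_inv(y):
    """Inverse of h_2 on [0, 1/2] by bisection (h_2 is increasing there)."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h2(mid) < y else (lo, mid)
    return lo

def D(x, y):
    """Binary KL divergence D(x || y), in bits."""
    return x * np.log2(x / y) + (1 - x) * np.log2((1 - x) / (1 - y))

# Error exponent in (7): D(delta(R) || p), with delta(R) = h2_inv(1 - R).
p = 0.1
print(f"capacity 1 - h2(p) = {1 - h2(p):.4f}")
for R in (0.1, 0.3, 0.5):
    delta = h2_inv(1 - R)
    print(f"R = {R:.1f}: delta(R) = {delta:.4f}, exponent = {D(delta, p):.4f}")
```

The exponent shrinks to zero as $R$ approaches the capacity $1 - h_2(p) \approx 0.531$, which is exactly the regime analyzed next.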
Suppose moreover that we require that $\lambda_{\max} = \varepsilon$. We already understand that we will have to choose a sufficiently long code. What is the smallest $n$ that can guarantee this?

As $R \to 1 - h(p)$, we have $\delta(R) \to p$. We have
$$D(\delta(1 - h(p)) \| p) = D(p \| p) = 0, \qquad \frac{\partial D(\delta \| p)}{\partial \delta}\Big|_{\delta = p} = \log \frac{\delta(1-p)}{(1-\delta)p}\Big|_{\delta = p} = 0, \qquad \frac{\partial^2 D(\delta \| p)}{\partial \delta^2} = \frac{1}{\ln 2}\, \frac{1}{\delta(1-\delta)}.$$
If $R = 1 - h_2(p) - \gamma$, then $\delta = p + \gamma'$, where $\gamma'$ is some small number. Expanding $D$ into a power series in the neighborhood of $\delta = p$, we obtain
$$D(\delta \| p) = \frac{1}{2} \frac{\partial^2 D(\delta \| p)}{\partial \delta^2}\Big|_{\delta = p} (\delta - p)^2 + o((\delta - p)^2) = O((\delta - p)^2).$$

[Figure: plot of $D(\delta \| p)$ as a function of $\delta$ near $\delta = p$, shown for $p = 0.5$.]

From (7), setting $\varepsilon \approx 2^{-nD(\delta \| p)}$ and using $D(\delta \| p) = O((\delta - p)^2)$, we obtain

(8) $n \approx \dfrac{\log(1/\varepsilon)}{(\delta - p)^2}$ (constants omitted).

Let us rephrase this by finding how $\gamma$ (the gap to capacity) depends on the code length $n$. To answer this, rewrite (8) as follows:
$$\delta - p \approx (\log(1/\varepsilon))^{1/2}\, \frac{1}{\sqrt{n}}.$$
Now substitute $\delta = h^{-1}(1 - R)$ to find that $R \approx 1 - h(p + O(n^{-1/2}))$, or
$$R \approx 1 - h(p) - O\Big(\frac{1}{\sqrt{n}}\Big).$$
To conclude:

Proposition 4. The gap to capacity for optimal codes scales as $n^{-1/2}$.

The outcome of this calculation is called finite-length scaling of the code sequence on the BSC. The same order of scaling is true for optimal codes used on any binary-input discrete memoryless channel (DMC).

We return to the textbook and state the Shannon capacity theorem for a DMC (pp. 192–195, 200). Then we consider examples (pp. 187–191), and then (next class) prove the capacity theorem.
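As a closing numerical companion to the scaling estimate (8) (my own sketch; since (8) omits constants and identifies $\delta - p$ with the gap $\gamma$ only up to a constant factor, treat these lengths as order-of-magnitude illustrations of the $n^{-1/2}$ scaling, not exact code lengths):

```python
import numpy as np

# Order-of-magnitude length estimate from (8): n ~ log(1/eps) / (delta - p)^2,
# taking delta - p proportional to the gap-to-capacity gamma (constants omitted).
eps = 1e-3                          # target maximum error probability
for gamma in (0.05, 0.02, 0.01):    # gap to capacity: R = 1 - h2(p) - gamma
    n = np.log2(1 / eps) / gamma ** 2
    print(f"gamma = {gamma:.2f}  ->  n ~ {n:,.0f}")
```

Halving the gap $\gamma$ quadruples the required block length, which is Proposition 4 read in reverse.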