ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE School of Computer and Communication Sciences Handout 8 Information Theor and Coding Notes on Random Coding December 2, 2003 Random Coding In this note we prove the achievabilit part of the channel coding theorem for discrete memorless channels without feedback. The discussion is largel based on chapter 5 of R. G. Gallager, Information Theor and Reliable Communication, Wile, 968.. Discrete Memorless Channels Without Feedback Throughout this note we will fix the channel we wish to communicate over. Let X and Y denote the input and output alphabets of this channel. We will assume that the channel is discrete, i.e., that X and Y are finite sets. The behavior of the channel will be completel described b specifing for each k the function P k : X k Y k R, (x k, k ) P k ( k x k, k ), which gives the probabilit of receiving the letter k at the output at time k, given the past and current inputs to the channel, and the past outputs of the channel. We make the two following assumptions: the channel is memorless and used without feedback. We will call a channel memorless if P k ( k x k, k ) = P ( k x k ), which in words means that the channel keeps no memor of the past inputs and outputs in determining the output of time k. Also note that the channel does not behave differentl at different times: the function P on the right hand side is not an explicit function of k. If a memorless channel is used without feedback, i.e., if P Xk X k,y k (x k x k, k ) = P Xk X k (x k x k ) (in words: if the channel inputs do not depend on the past channel outputs) then P Y n X n(n x n ) = P X n,y n(xn, n ) P X n (x n ) n k= = P X k,y k X k,y (x k k, k x k, k ) P X n (x n ) n k= = P X k X k,y (x k k x k, k )P Yk X k k,y ( k x k, k ) P X n (x n ) n k= = P X k X (x k k x k )P Yk X k ( k x k ) P X n (x n ) n = P ( k x k ) k=
where we use the memorless and without feedback conditions at the fourth equalit. From now on, we will restrict our attention of channels used without feedback. With some abuse of notation we will let P ( x) denote the probabilit of receiving the sequence = n at the output of the channel when the channel input is the sequence x = x n. If the channel is memorless, we see from above that 2. Block Codes P ( x) = n P ( k x k ). k= A block code with M messages and block length n is a mapping from a set of M messages {,..., M} to channel input sequences of length n. Thus, a block code is specified when we specif the M channel input sequences c = (c,,..., c,n ),..., c M = (c M,,..., c M,n ) the messages are mapped into. We will call c m the codeword for message m. To send message m with such a block code we simpl give the sequence c m to the channel as input. A decoder for such a block code is a mapping from channel output sequences Y n to the set of M messages {,..., M}. For a given decoder, let D m Y n denote the set of channel outputs which are mapped to message m. Since an output sequence is mapped to exactl one message, D m s form a collection of disjoint sets whose union is Y n. We define the rate of a block code with M messages and block length n as ln M n, and given such a code and a decoder we define P e,m = D m P ( c m ), the probabilit of a decoding error when message m is sent. Further define P e,ave = M M m= P e,m and P e,max = max m M P e,m as the average and maximal (both over the possible messages) error probabilit of such a code and decoder. Among man possible decoding methods, the rule that minimizes P e,ave is the maximum likelihood rule. Given a channel output sequence, the maximum likelihood rule decodes a message m for which P ( c m ) P ( c m ) for ever m m, and if there are more than one such m chooses one of them arbitraril. We will restrict ourselves in the following to the maximum likelihood rule. 3. Error probabilit for two codewords Consider now the case when M = 2, so the block code consists of two codewords, c and c 2. We will find a bound on P e,m for the maximum likelihood decoding rule. 2
Suppose message is to be sent. The channel input is then c, and the probabilit of receiving at the channel output is P ( c ). An error will occur if the received sequence is not in D. Since for ever not in D, P ( c 2 ) P ( c ), (but there ma be s for which P ( c 2 ) = P ( c ) and D ) we have P e, = D P ( c ) : P ( c 2 ) P ( c ) P ( c ) = P ( c ) P ( c2 ) P ( c ) :P ( c 2 ) P ( c ) P ( c ) P ( c 2) s P ( c ) s P ( c ) P ( c 2) s P ( c ) s for an s 0 = P ( c ) s P ( c 2 ) s The choice s = /2 gives us P ( c )P ( c 2 ), P e, and b smmetr, the same quantit also upper bounds P e,2. For a memorless channel P ( c m ) = n k= P ( k c m,k ) and we obtain P e,m P ( c )P ( c 2 ) = P ( c, )P ( c 2, )... P ( n c,n P ( n c 2,n ) n [ ] [ ] = P ( c, )P ( c 2, )... P ( n c,n )P ( n c 2,n ) = n [ ] P ( c,k )P ( c 2,k ). k= For example, for a binar smmetric channel with crossover probabilit ɛ, we see that { if c,k = c 2,k P ( c,k )P ( c 2,k ) = 2 ɛ( ɛ) else, and we obtain n P e,m [4ɛ( ɛ)] d/2 where d is the number of places c and c 2 are different (i.e., d = {k : c,k c 2,k } ). 4. Error Probabilit for two randoml chosen codewords Suppose now that the codewords c and c 2 are chosen randoml and independentl each from a distribution Q on X n. Observe that the codewords are random variables C and 3
C 2 and the probabilit that the block code is the particular block code with codewords c and c 2 is given b Q(c )Q(c 2 ). The error probabilities P e,m are now random variables P e,m since the are functions of C and C 2. Let P e,m denote the expectation of P e,m. We will now give an upper bound on P e,m in two different was. The first is the more straightforward, but the second wa will turn out to be conceptuall more useful. We will take m =, but is is clear b smmetr that P e, = P e,2. Method : Write P e, = E[P e, ] = c c 2 Q(c )Q(c 2 )E[P e, C = c, C 2 = c 2 ]. But when we are given c and c 2, P e, is no longer random, and we can use the bound of the last section on the probabilit of error to get P e, Q(c )Q(c 2 ) P ( c ) P ( c 2 ) c c 2 = [ Q(c ) P ( c )][ Q(c 2 ) P ( c2 )] c c 2 = [ Q(c) ] 2 P ( c) c where the last equalit follows b noting that the sums in the brackets in the next to last line are identical since the differ onl b the index of summation. Method 2: Write P e, = E[P e, ] = Q(c )P ( c )E[P e, C = c, Y = ]. c Note that here we are computing the expectation b first conditioning on the transmitted and received sequences. Now, observe that given C = c and the received sequence, an error is sure not to occur if C 2 is chosen such that P ( C 2 ) < P ( c ), and otherwise we can upper bound P e, b. Thus, given C = c and, we have { if P ( C 2 ) P ( c ) P e, 0 if P ( C 2 ) < P ( c ) Taking expectations we see that = P ( C2 ) P ( c ). E[P e, C = c, Y = ] Pr(B 2 ) where B 2 is the event that C 2 is chosen such that P ( C 2 ) P ( c ). We can bound Pr(B 2 ) b Pr(B 2 ) = c 2 Q(c 2 ) P ( c2 ) P ( c ) c 2 Q(c 2 ) P ( c 2) s P ( c ) s for an s 0. 4
Taking s = /2 and substituting back we obtain the same bound as before, P e, [ Q(c) 2. P ( c)] c For memorless channels the bound simplifies if Q(c) is chosen to be Q(c) = n k= Q(c k). In this case we obtain [ [ P e, Q(x) ] 2 n P ( x)]. 5. Average error probabilit of a randoml chosen code x Consider now a code with M codewords, each codeword c m chosen independentl according to the probabilit distribution Q. Just as in the above section, the probabilit that the block code constructed is a particular block code with codewords c,..., c M is M m= Q(c m). The error probabilities P e,m are also random variables, and again let P e,m denote the expectation of P e,m. B the smmetr with respect to the permutation of the codewords in the construction, we see that P e,m does not depend on m, and it will suffice to analze P e,. We will take the extension of method 2 in the previous section as our analsis method, and write P e, = c Q(c )P ( c )E[P e, C = c, Y = ]. Now, for a given c and define for each m 2, B m as the event that codeword C m is chosen such that P ( C m ) P ( c ), i.e., that codeword m is at least as likel as the transmitted codeword. Then just as in the previous section, ( M E[P e, C = c, Y = ] Pr m=2 { min, B m ) M m=2 } Pr(B m ) [ M ρ Pr(B m )] for all ρ [0, ]. m=2 The second inequalit above is just the union bound, the third inequalit is because for ρ [0, ], x x ρ when x [0, ], and x ρ when x. Observe now that Pr(B m ) = c m Q(c m ) P ( cm) P ( c ) c m = c Q(c m ) P ( c m) s P ( c ) s for an s 0 Q(c) P ( c)s P ( c ) s and thus E[P e, C = c, Y = ] [ (M ) c 5 Q(c) P ] ρ ( c)s. P ( c ) s
Substituting back we get P e, (M ) ρ [ ][ ] ρ. Q(c )P ( c ) sρ Q(c)P ( c) s c c Choosing now s = /( + ρ) (this choice in fact minimizes the bound) and observing that for this choice sρ = s, and that [ ] [ ] Q(c )P ( c ) sρ = Q(c)P ( c) s c since the two summations differ onl b the summation index, we get P e, (M ) [ ] +ρ. ρ Q(c)P ( c) /(+ρ) c If we now specialize this theorem to discrete memorless channels and if we choose Q(c) = n i= Q(c i), we get that for ever ρ [0, ], P e, (M ) ρ [ c [ ] ] +ρ n Q(x)P ( x) /(+ρ) If M = e nr, then (M ) e nr, and we can summarize the above as x Theorem. Given a discrete memorless channel described b P ( x), for an blocklength n, and an R 0 consider constructing a random block code with M = e nr codewords b choosing each letter of each codeword independentl according to a distribution Q on X. Then, the expected average error probabilit of this random code satisfies P e,ave exp { n max [E 0 (ρ, Q) ρr] } ρ [0,] where E 0 (ρ, Q) = ln [ ] +ρ. Q(x)P ( x) /(+ρ) x Since the expected error probabilit cannot be better than the error probabilit of the best code we also get Corollar. Given a discrete memorless channel described b P ( x), for an distribution Q on X, an blocklength n and an R > 0, there exists a code of block length n and rate at least R with P e,ave exp { ne r (R, Q) }. where E r (R, Q) = max [E 0 (ρ, Q) ρr]. ρ [0,] Note that the above corollar establishes the existence of codes of a certain rate with a guarantee on their P e,ave, but suggests no mechanism to find them. However, if one does carr out the experiment of constructing a code b randoml choosing its codewords, the probabilit that the code obtained will be much worse than the average is small, in particular, Markov inequalit tells us that the probabilit that a code constructed as such has P e,ave larger than α P e,ave is small: Pr[P e,ave α P e,ave ] /α for α >. 6
ρi(q) E 0 (ρ, Q) ρ Figure : E 0 (ρ, Q) Thus, the difficult in finding good practical codes is not that the good codes are rare, but the difficult is in find good codes for which there are practical methods to encode and decode. The corollar above makes guarantees on P e,ave, but for a code constructed b random coding the value of P e,max ma be much higher than P e,ave. Can we ensure the existence of codes with small P e,max? Theorem 2. Given a discrete memorless channel described b P ( x), for an distribution Q on X, an blocklength n and an R > 0, there exists a block code of length n and rate at least R with P e,max 4 exp { ne r (R, Q) }. Proof. Let M = 2e nr. We know that there is a code with M codewords and satisfies P e,ave = M M m= P e,m (M ) ρ exp { ne 0 (ρ, Q) } for ever ρ [0, ]. Since we see that for this code (M ) ρ 2 ρ e nρr 2e nρr, P e,ave = M M m= P e,m 2 exp { ne r (R, Q) }. Now, among the M numbers P e,,..., P e,m there cannot be more than M /2 which exceed twice the average value P e,ave. Thus, among these M numbers there exist at least e nr of them which satisf P e,m 2P e,ave 4 exp { ne r (R, Q) }. Keeping onl these codewords, and noticing that in the maximum likelihood decoder that corresponds to this code the decoding sets can onl be enlarged, we see that we have constructed a code with the desired properties. 5.. Properties of E 0 and E r The value of the existence theorems we just proved depend on whether the bound we have on the probabilit of error can be made arbitraril small for a set of rates of interest. Define I(Q) = x Q(x)P ( x) ln P ( x) x Q(x )P ( x ) 7
as the value of mutual information between the input and output when the input distribution is Q. We will now show that for ever rate R < I(Q), the exponent E r (R, Q) > 0, and thus for ever rate R < I(Q) we can find codes with arbitraril small probabilit of error (b taking n large enough). Since this statement holds for ever input distribution Q, it also follows that for ever rate R < max Q I(Q) = max px I(X; Y ) there exist codes of arbitraril small probabilit of error. This proves that an rate R below capacit is achievable (achievabilit part of the coding theorem). A tedious but straightforward calculation also ields that E 0 (ρ, Q) ρ = I(Q). ρ=0 Using the fact that whenever Q and A are nonnegative, the function t [0, ) [ x Q(x)A(x)/t] t is nonincreasing, it is eas to show that E0 (ρ, Q) is nondecreasing in ρ. We do not need it here, but it is also possible to show that E 0 (ρ, Q) is a concave function of ρ. Thus, Figure shows the tpical behavior of E 0 as a function of ρ. Observe now that since the function R E r (R, Q) is a maximum of the linear functions R E 0 (ρ, Q) ρr, (see Figure 2), we see that E r (R, Q) > 0 whenever for some ρ [0, ], E 0 (ρ, Q) ρr > 0 is satisfied. E 0 (, Q) E 0 ( 2, Q) E 0 ( 4, Q) E r (R, Q) I(Q) R Figure 2: E r (R) as a maximum of linear functions. It now follows that if R < E 0 (ρ, Q)/ρ for some ρ (0, ] and Q, then E 0 (ρ, Q) ρr > 0 and thus E r (R, Q) > 0. But, I(Q) = E 0(ρ, Q) E 0 (ρ, Q) E 0 (0, Q) E 0 (ρ, Q) ρ = lim = lim. ρ 0 ρ=0 ρ ρ 0 ρ where the last equalit follows from fact that E 0 (0, Q) = ln x, Q(x)P ( x) = ln = 0. Thus, b the definition of the limit above, for ever R < I(Q), we can find a ρ such that E 0 (ρ, Q)/ρ > R. Hence for ever R < I(Q) we have E r (R, Q) > 0. Since the above argument holds for an input distribution Q, it holds in particular for the Q that maximizes I(Q). This proves the existence of codes with arbitraril small probabilit of error for ever rate less than the capacit max Q I(Q). The approach we used here ields to the function E r (R, Q) called random coding error exponent. To prove the achievabilit part of the coding theorem it would suffice to show that for an rate R below capacit there exists a sequence of rate R codes of length n with corresponding error probabilit that tends to zero as n tends to infinit. The random coding error exponent argument in addition to prove it, also characterizes the deca of the probabilit of error as n tends to infinit. 8