Lecture 4: Channel Coding
Capacity and the Weak Converse

I-Hsiang Wang
Department of Electrical Engineering
National Taiwan University
ihwang@ntu.edu.tw

October 15, 2014
The Channel Coding Problem

w → Encoder → x^N → Noisy Channel → y^N → Decoder → ŵ

Meta description:
1. Message: a random message $W \sim \mathrm{Unif}[1:2^K]$.
2. Channel: consists of an input alphabet $\mathcal{X}$, an output alphabet $\mathcal{Y}$, and a family of conditional distributions $\{p(y_k \mid x^k, y^{k-1}),\ k \in \mathbb{N}\}$ determining the stochastic relationship between the output symbol $y_k$ and the input symbol $x_k$, along with all past signals $(x^{k-1}, y^{k-1})$.
3. Encoder: encodes the message $w$ into a length-$N$ codeword $x^N \in \mathcal{X}^N$.
4. Decoder: reconstructs a message $\hat{w}$ from the channel output $y^N$.
5. Efficiency: maximize the code rate $R := K/N$ bits/channel use, subject to a given decoding criterion.
Decoding Criterion: Vanishing Error Probability

w → Encoder → x^N → Noisy Channel → y^N → Decoder → ŵ

A key performance measure is the error probability $P_e^{(N)} := \Pr\{W \neq \hat{W}\}$.

Question: Is it possible to achieve zero error probability?
Answer: Probably not, unless the channel noise has some special structure.

Following the development of lossless source coding, Shannon turned his attention to the following question: Is it possible to have a sequence of encoder/decoder pairs such that $P_e^{(N)} \to 0$ as $N \to \infty$? If so, what is the largest possible code rate $R$ at which vanishing error probability is possible?

Note: In lossless source coding, we saw that the infimum of compression rates at which vanishing error probability is possible is the entropy rate $H(\{S_i\})$.
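To see the tension between driving $P_e^{(N)}$ to zero and keeping the rate positive, consider the $n$-fold repetition code over a binary symmetric channel (an illustration, not drawn from the slide): majority-vote decoding makes errors rare as $n$ grows, but the rate $1/n$ vanishes. A minimal Monte Carlo sketch, with a hypothetical crossover probability and trial count:

```python
import random

def simulate_repetition(p, n_rep, trials=10000):
    """Monte Carlo estimate of the bit error probability of an
    n_rep-fold repetition code over a BSC(p) with majority-vote decoding."""
    errors = 0
    for _ in range(trials):
        bit = random.randint(0, 1)
        # The channel flips each of the n_rep copies independently.
        received = [bit ^ (random.random() < p) for _ in range(n_rep)]
        decoded = int(sum(received) > n_rep / 2)
        errors += (decoded != bit)
    return errors / trials

p = 0.1  # hypothetical crossover probability
for n_rep in (1, 3, 5, 11, 21):
    pe = simulate_repetition(p, n_rep)
    print(f"n_rep={n_rep:2d}  rate={1/n_rep:.3f}  P_e ~ {pe:.4f}")
```

The error probability decays with $n_{\text{rep}}$, but the code rate $1/n_{\text{rep}} \to 0$: repetition alone cannot answer Shannon's question about the largest rate with vanishing error.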
Three regimes for the tradeoff among rate $R$, block length $N$, and probability of error $P_e^{(N)}$:

- Take $N \to \infty$, require $P_e^{(N)} \to 0$:  $\sup R = C$.  (capacity)
- Take $N \to \infty$ with fixed $R < C$:  $\min P_e^{(N)} \approx 2^{-N E(R)}$.  (error exponent)
- Fix $N$, require $P_e^{(N)} \leq \epsilon$:  $\sup R \approx C - \sqrt{V/N}\, Q^{-1}(\epsilon)$.  (finite block length)
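The finite-blocklength line can be made concrete for a BSC. Below is a minimal sketch of the normal approximation, using the standard BSC capacity and dispersion formulas from the finite-blocklength literature (Polyanskiy–Poor–Verdú) and dropping the $O(\log N / N)$ correction; the crossover probability, blocklengths, and $\epsilon$ are hypothetical choices.

```python
from math import log2
from statistics import NormalDist

def bsc_normal_approx(p, n, eps):
    """Normal approximation C - sqrt(V/n) * Q^{-1}(eps) to the maximal
    rate of a BSC(p) at blocklength n and error probability eps."""
    h2 = -p * log2(p) - (1 - p) * log2(1 - p)    # binary entropy function
    C = 1 - h2                                   # BSC capacity
    V = p * (1 - p) * log2((1 - p) / p) ** 2     # BSC channel dispersion
    Qinv = NormalDist().inv_cdf(1 - eps)         # Q^{-1}(eps)
    return C - (V / n) ** 0.5 * Qinv

p, eps = 0.11, 1e-3  # hypothetical values
for n in (100, 1000, 10000):
    print(f"n={n:6d}  rate ~ {bsc_normal_approx(p, n, eps):.4f}")
```

As $n$ grows, the backoff term $\sqrt{V/n}\,Q^{-1}(\epsilon)$ shrinks and the achievable rate approaches $C$, consistent with the capacity regime above.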
In this course we focus only on capacity. In other words, we ignore the issue of delay and do not pursue finer analyses of the error probability via large deviation techniques.
Discrete Memoryless Channels (DMC)

To demonstrate the key ideas in channel coding, in this lecture we shall focus on discrete memoryless channels (DMC), defined below.

Definition 1 (Discrete Memoryless Channel)
A discrete channel $(\mathcal{X}, \{p(y_k \mid x^k, y^{k-1}),\ k \in \mathbb{N}\}, \mathcal{Y})$ is memoryless if
$$\forall k \in \mathbb{N},\quad p(y_k \mid x^k, y^{k-1}) = p_{Y|X}(y_k \mid x_k).$$
In other words, $Y_k - X_k - (X^{k-1}, Y^{k-1})$ forms a Markov chain. Here the conditional p.m.f. $p_{Y|X}$ is called the channel law or channel transition function.

Question: Is our definition of a channel sufficient to specify $p(y^N \mid x^N)$, the stochastic relationship between the channel input (codeword) $x^N$ and the channel output $y^N$?
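To make the definition concrete, here is a minimal sketch of a DMC represented by a transition matrix that can be sampled one symbol at a time; the BSC instance and its crossover probability 0.1 are illustrative choices, not from the lecture.

```python
import random

class DMC:
    """A discrete memoryless channel specified by a transition matrix
    W[x][y] = p_{Y|X}(y | x) over finite alphabets X and Y."""
    def __init__(self, W):
        self.W = W

    def transmit(self, x):
        # Sample one output symbol y ~ p_{Y|X}(. | x); memorylessness
        # means the draw depends only on the current input x, not on
        # past inputs or outputs.
        return random.choices(range(len(self.W[x])), weights=self.W[x])[0]

# Example: binary symmetric channel with crossover probability 0.1.
bsc = DMC([[0.9, 0.1],
           [0.1, 0.9]])
y_seq = [bsc.transmit(x) for x in [0, 1, 1, 0]]  # one channel use per symbol
print(y_seq)
```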
Note that
$$p(y^N \mid x^N) = \frac{p(x^N, y^N)}{p(x^N)}, \qquad
p(x^N, y^N) = \prod_{k=1}^{N} p(x_k, y_k \mid x^{k-1}, y^{k-1})
= \prod_{k=1}^{N} p(y_k \mid x^k, y^{k-1})\, p(x_k \mid x^{k-1}, y^{k-1}).$$

Hence, we need to further specify $\{p(x_k \mid x^{k-1}, y^{k-1}),\ k \in \mathbb{N}\}$, which cannot be obtained from $p(x^N)$.

Interpretation: $\{p(x_k \mid x^{k-1}, y^{k-1}),\ k \in \mathbb{N}\}$ is induced by the encoding function, which implies that the encoder can potentially make use of past channel outputs, i.e., feedback.
DMC without Feedback

(a) No feedback: w → Encoder → x_k → Noisy Channel → y_k
(b) With feedback: w → Encoder → x_k → Noisy Channel → y_k, with y_{k-1} fed back to the encoder through a unit delay D

Suppose the encoder has no knowledge about the realization of the channel output. Then $p(x_k \mid x^{k-1}, y^{k-1}) = p(x_k \mid x^{k-1})$ for all $k \in \mathbb{N}$, and the channel is said to have no feedback. In this case, specifying $\{p(y_k \mid x^k, y^{k-1}),\ k \in \mathbb{N}\}$ suffices to specify $p(y^N \mid x^N)$.

Proposition 1 (DMC without Feedback)
For a DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ without feedback,
$$p(y^N \mid x^N) = \prod_{k=1}^{N} p_{Y|X}(y_k \mid x_k).$$
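A quick numerical sketch of Proposition 1: without feedback, the conditional likelihood of an output sequence is just the product of single-letter channel laws. The BSC transition matrix and the input/output sequences below are hypothetical.

```python
from math import prod

# Transition matrix W[x][y] = p_{Y|X}(y | x) for a BSC(0.1).
W = [[0.9, 0.1],
     [0.1, 0.9]]

def likelihood(xs, ys):
    """p(y^N | x^N) for a DMC without feedback: by Proposition 1 it
    factorizes into a product of single-letter channel laws."""
    return prod(W[x][y] for x, y in zip(xs, ys))

x_seq = [0, 1, 1, 0]
y_seq = [0, 1, 0, 0]
print(likelihood(x_seq, y_seq))  # 0.9 * 0.9 * 0.1 * 0.9 = 0.0729
```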
Overview

In this lecture, we would like to establish the following (informally stated) noisy channel coding theorem due to Shannon:

For a DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$, the maximum code rate with vanishing error probability is the channel capacity
$$C := \max_{p_X(\cdot)} I(X; Y).$$
The above holds regardless of the availability of feedback.

To demonstrate this beautiful result, we organize this lecture as follows:
1. Give the problem formulation, state the main theorem, and visit a couple of examples to show how to compute channel capacity.
2. Prove the converse part: an achievable rate cannot exceed $C$.
3. Prove the achievability part with a random coding argument.
Outline

1. Capacity and the Weak Converse
2. …
3. …
Channel Coding: Problem Setup

w → Encoder → x^N → Noisy Channel → y^N → Decoder → ŵ

1. A $(2^{NR}, N)$ channel code consists of
   - an encoding function (encoder) $\mathrm{enc}_N : [1:2^K] \to \mathcal{X}^N$ that maps each message $w$ to a length-$N$ codeword $x^N$, where $K := \lceil NR \rceil$;
   - a decoding function (decoder) $\mathrm{dec}_N : \mathcal{Y}^N \to [1:2^K] \cup \{*\}$ that maps a channel output sequence $y^N$ to a reconstructed message $\hat{w}$ or an error message $*$.
2. The error probability is defined as $P_e^{(N)} := \Pr\{W \neq \hat{W}\}$.
3. A rate $R$ is said to be achievable if there exists a sequence of $(2^{NR}, N)$ codes such that $P_e^{(N)} \to 0$ as $N \to \infty$. The channel capacity is defined as $C := \sup\{R \in \mathbb{R} : R \text{ achievable}\}$.
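To make the definitions concrete, here is a toy sketch of a $(2^{NR}, N)$ code: a randomly drawn lookup-table encoder paired with a minimum-distance decoder, which is maximum likelihood for a BSC with crossover below 1/2. The parameters are hypothetical, and the brute-force decoder scales exponentially in $K$, so this is illustration only.

```python
import random

def make_code(N, R, seed=0):
    """A toy (2^{NR}, N) code: a random binary codebook (the encoder is
    a lookup table w -> x^N) with a minimum-distance decoder."""
    rng = random.Random(seed)
    K = int(N * R)  # floor instead of the ceil in the definition, for simplicity
    book = [[rng.randint(0, 1) for _ in range(N)] for _ in range(2 ** K)]
    enc = lambda w: book[w]
    # Min Hamming distance = ML decoding over a BSC(p) with p < 1/2.
    dec = lambda y: min(range(2 ** K),
                        key=lambda w: sum(a != b for a, b in zip(book[w], y)))
    return enc, dec, K

p = 0.05  # hypothetical crossover probability
enc, dec, K = make_code(N=24, R=0.25)
w = random.randrange(2 ** K)
y = [b ^ (random.random() < p) for b in enc(w)]  # pass codeword through BSC(p)
print(w, dec(y), "error" if dec(y) != w else "ok")
```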
Channel Coding Theorem for Discrete Memoryless Channels

w → Encoder → x^N → Noisy Channel → y^N → Decoder → ŵ

Theorem 1 (Channel Coding Theorem for DMC)
The capacity of the DMC $p(y \mid x)$ is given by
$$C = \max_{p(x)} I(X; Y),$$
regardless of the availability of feedback.
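As a sanity check on the theorem's formula, here is a minimal sketch that computes $C = \max_{p(x)} I(X;Y)$ for a binary-input channel by brute-force search over the input distribution, compared against the closed form $1 - H_b(p)$ for a BSC. The grid size and crossover probability are arbitrary choices; for larger alphabets one would use the Blahut–Arimoto algorithm instead.

```python
from math import log2

def mutual_information(px, W):
    """I(X;Y) in bits for input pmf px and channel law W[x][y]."""
    py = [sum(px[x] * W[x][y] for x in range(len(px)))
          for y in range(len(W[0]))]
    return sum(px[x] * W[x][y] * log2(W[x][y] / py[y])
               for x in range(len(px)) for y in range(len(W[0]))
               if px[x] > 0 and W[x][y] > 0)

def capacity_binary_input(W, grid=10001):
    """C = max_{p(x)} I(X;Y) by grid search over q = Pr{X=1}
    (feasible only for a binary input alphabet)."""
    return max(mutual_information([1 - q, q], W)
               for q in (i / (grid - 1) for i in range(grid)))

p = 0.1
W = [[1 - p, p], [p, 1 - p]]                 # BSC(p)
h2 = -p * log2(p) - (1 - p) * log2(1 - p)
print(capacity_binary_input(W), 1 - h2)      # both ~ 0.5310 bits/channel use
```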
Proof of the (Weak) Converse (1)

We would like to show that for every sequence of $(2^{NR}, N)$ codes such that $P_e^{(N)} \to 0$ as $N \to \infty$, the rate $R \leq \max_{p(x)} I(X; Y)$.

pf: Note that $W \sim \mathrm{Unif}[1:2^K]$ and hence $K = H(W)$.

$$
\begin{aligned}
NR &\leq H(W) = I(W; \hat{W}) + H(W \mid \hat{W}) && (1) \\
&\leq I(W; Y^N) + \left(1 + P_e^{(N)} \log(2^K + 1)\right) && (2) \\
&\leq \sum_{k=1}^{N} I(W; Y_k \mid Y^{k-1}) + \left(1 + P_e^{(N)} (NR + 2)\right) && (3)
\end{aligned}
$$

(1) is due to $K = \lceil NR \rceil \geq NR$ and the chain rule.
(2) is due to data processing over $W - Y^N - \hat{W}$ and Fano's inequality.
(3) is due to the chain rule and $2^K + 1 \leq 2^{NR+1} + 1 \leq 2 \cdot 2^{NR+1} = 2^{NR+2}$, so $\log(2^K + 1) \leq NR + 2$.
Proof of the (Weak) Converse (2)

Set $\epsilon_N := \frac{1}{N}\left(1 + P_e^{(N)}(NR + 2)\right)$; we see that $\epsilon_N \to 0$ as $N \to \infty$ because $\lim_{N \to \infty} P_e^{(N)} = 0$.

The next step is to relate $\sum_{k=1}^{N} I(W; Y_k \mid Y^{k-1})$ to $I(X; Y)$, by the following manipulation:

$$
\begin{aligned}
I(W; Y_k \mid Y^{k-1}) &\leq I(W, Y^{k-1}; Y_k) \leq I(W, Y^{k-1}, X_k; Y_k) && (4) \\
&= I(X_k; Y_k) \leq \max_{p(x)} I(X; Y) && (5)
\end{aligned}
$$

(4) is due to the fact that conditioning reduces entropy.
(5) is due to the DMC property: the Markov chain $Y_k - X_k - (W, X^{k-1}, Y^{k-1})$, since $X_k$ is a function of $(W, Y^{k-1})$.

Hence, we have $R \leq \max_{p(x)} I(X; Y) + \epsilon_N$ for all $N$. Taking $N \to \infty$, we conclude that $R \leq \max_{p(x)} I(X; Y)$ if $R$ is achievable.
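To see how quickly the slack term vanishes, here is a tiny numeric sketch of $\epsilon_N$; the rate and the decaying error probabilities are hypothetical values chosen for illustration.

```python
def eps_N(N, R, Pe):
    """The slack term eps_N = (1 + Pe * (N*R + 2)) / N from the weak
    converse; it vanishes as N -> infinity whenever Pe -> 0."""
    return (1 + Pe * (N * R + 2)) / N

R = 0.6
# Hypothetical error probabilities decaying with blocklength.
for N, Pe in [(100, 1e-1), (1000, 1e-2), (10000, 1e-3)]:
    print(f"N={N:6d}  Pe={Pe:.0e}  eps_N={eps_N(N, R, Pe):.5f}")
```

Note that if $P_e^{(N)}$ stayed bounded away from zero, $\epsilon_N$ would converge to $P_e^{(N)} R$ rather than zero, which is why the weak converse requires vanishing error probability.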