Lecture 5: Channel Capacity
Copyright G. Caire (Sample Lectures)
Definitions and Problem Setup

Message $M$ → Encoder → $X^n$ → Channel $p(y|x)$ → $Y^n$ → Decoder → Estimate $\hat M$.

Definition 11 (Discrete Memoryless Channel, DMC). A (stationary) DMC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ consists of an input alphabet $\mathcal{X}$, an output alphabet $\mathcal{Y}$, and a transition pmf $P_{Y|X}$ such that
$$P(Y^n = y \mid X^n = x) = \prod_{i=1}^n P_{Y|X}(y_i \mid x_i).$$
The memoryless property implies that
$$P\big(Y_i = y_i \mid M = m,\ X^i = (x_1(m),\dots,x_i(m)),\ Y^{i-1} = (y_1,\dots,y_{i-1})\big) = P_{Y|X}(y_i \mid x_i(m)).$$
Channel Coding

Definition 12. A block code $\mathcal{C}$ with rate $R$ and block length $n$ (an $(R,n)$-code) consists of:
1. A message set $\mathcal{M} = [1:2^{nR}] = \{1,\dots,2^{nR}\}$.
2. A codebook $\{x(1),\dots,x(2^{nR})\}$, i.e., an array of dimension $2^{nR}\times n$ over $\mathcal{X}$, each row of which is a codeword.
3. An encoding function $f:\mathcal{M}\to\mathcal{X}^n$ such that $f(m) = x(m)$ for $m\in\mathcal{M}$.
4. A decoding function $g:\mathcal{Y}^n\to\mathcal{M}$ such that $\hat m = g(y)$ is the decoded message.
Probability of Error

Definition 13 (Individual message probability of error). The conditional probability of error given that message $m$ is transmitted is
$$P_{e,m}(\mathcal{C}) = P\big(g(Y^n)\neq m \mid X^n = x(m)\big).$$
Definition 14 (Maximal probability of error).
$$P_{e,\max}(\mathcal{C}) = \max_{m\in\mathcal{M}} P_{e,m}(\mathcal{C}).$$
Definition 15 (Average probability of error).
$$P_e(\mathcal{C}) = 2^{-nR}\sum_{m=1}^{2^{nR}} P_{e,m}(\mathcal{C}).$$
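The error probabilities above can be estimated empirically for any concrete code. As a sketch (not from the lecture), the following snippet estimates $P_{e,m}$ by Monte Carlo for a hypothetical 3-bit repetition code over a BSC with crossover probability 0.1, decoded by majority vote; all parameter values are illustrative assumptions.

```python
import random

def bsc(x, p, rng):
    """Pass a bit sequence through a BSC with crossover probability p."""
    return [b ^ (rng.random() < p) for b in x]

def majority(y):
    """Majority-vote decoder for a binary repetition code."""
    return 1 if sum(y) > len(y) / 2 else 0

def estimate_pe(m, n=3, p=0.1, trials=100_000, seed=0):
    """Monte Carlo estimate of P_{e,m}: send the codeword x(m) = (m, ..., m)."""
    rng = random.Random(seed)
    x = [m] * n
    errors = sum(majority(bsc(x, p, rng)) != m for _ in range(trials))
    return errors / trials
```

For $n=3$, $p=0.1$ the exact value is $3p^2(1-p)+p^3\approx 0.028$, and the estimate should land close to it.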
Achievable Rates and Capacity

Definition 16 (Achievable rate). A rate $R$ is said to be achievable if there exists a sequence of $(R,n)$-codes $\{\mathcal{C}_n\}$ with probability of error $P_{e,\max}(\mathcal{C}_n)\to 0$ as $n\to\infty$.
Definition 17 (Channel capacity). The channel capacity $C$ is the supremum of all achievable rates.
The above is an operational definition of capacity. A coding theorem (in information theory) consists of finding a formula, i.e., an explicit expression, for $C$ in terms of the characteristics of the problem, i.e., in terms of $P_{Y|X}$.
Role of Mutual Information

When the input $X^n$ is i.i.d. $\sim P_X$, then $(X^n, Y^n)$ is i.i.d. with $(X_i,Y_i)\sim P_X P_{Y|X}$, and $Y^n$ has the induced marginal distribution $Y_i\sim P_Y$.
There are $\approx 2^{nH(Y)}$ typical output sequences. If the input is typical, the probability of a non-typical output is negligible.
For $\epsilon > \epsilon' > 0$ and $x\in T_{\epsilon'}^{(n)}(X)$, there are $\approx 2^{nH(Y|X)}$ typical outputs in $T_\epsilon^{(n)}(Y|x)$.
How many non-overlapping conditionally typical output sets can we pack in $T_\epsilon^{(n)}(Y)$?
$$\frac{2^{nH(Y)}}{2^{nH(Y|X)}} = 2^{n(H(Y)-H(Y|X))} = 2^{nI(X;Y)}.$$
The Channel Coding Theorem

Theorem 11 (Channel Coding Theorem). The capacity of the DMC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ is given by
$$C = \max_{P_X} I(X;Y).$$
Example (Capacity of the BSC): A BSC is defined by $Y = X\oplus Z$, where $\mathcal{X}=\mathcal{Y}=\{0,1\}$, addition is modulo 2, and $Z\sim$ Bernoulli-$p$. We have
$$C = \max_{P_X} I(X;Y) = \max_{P_X}\{H(Y)-H(Y|X)\} = \max_{P_X}\{H(Y)-H(X\oplus Z\mid X)\} = \max_{P_X} H(Y) - H_2(p) = 1 - H_2(p).$$
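The BSC claim can be checked numerically by maximizing $I(X;Y)$ over Bernoulli-$\alpha$ inputs. This is a sketch, not part of the lecture; the grid resolution and $p=0.1$ are arbitrary choices.

```python
from math import log2

def h2(p):
    """Binary entropy function H_2(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_mutual_info(alpha, p):
    """I(X;Y) for a BSC(p) with input X ~ Bernoulli(alpha)."""
    # Output is Bernoulli(alpha(1-p) + (1-alpha)p); H(Y|X) = H_2(p).
    py1 = alpha * (1 - p) + (1 - alpha) * p
    return h2(py1) - h2(p)

p = 0.1
grid = [k / 1000 for k in range(1001)]
best = max(grid, key=lambda a: bsc_mutual_info(a, p))
capacity = bsc_mutual_info(best, p)
```

The maximizer is the uniform input $\alpha = 1/2$, and the maximum matches $1 - H_2(p) \approx 0.531$ for $p = 0.1$.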
Capacity of the BEC

Example (Capacity of the BEC): A BEC with erasure probability $e$ maps input 0 to output 0 and input 1 to output 1, each with probability $1-e$, and either input to the erasure symbol with probability $e$. We have
$$C = \max_{P_X} I(X;Y) = \max_{P_X}\{H(X)-H(X|Y)\} = \max_{P_X}\{H(X)-eH(X)\} = \max_{P_X}(1-e)H(X) = 1-e,$$
where the last equality holds by choosing $X$ to be Bernoulli-$\frac12$.
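The same maximization can be done from the raw transition matrix, without using the closed form $(1-e)H(X)$. This is an illustrative sketch (not from the lecture); the value $e = 0.3$ is an assumed example.

```python
from math import log2

def mutual_info(p_x, P):
    """I(X;Y) in bits for input pmf p_x and transition matrix P[x][y]."""
    p_y = [sum(p_x[x] * P[x][y] for x in range(len(p_x)))
           for y in range(len(P[0]))]
    I = 0.0
    for x, px in enumerate(p_x):
        for y, pyx in enumerate(P[x]):
            if px > 0 and pyx > 0:
                I += px * pyx * log2(pyx / p_y[y])
    return I

e = 0.3  # assumed erasure probability
# Rows: inputs 0, 1; columns: outputs 0, erasure, 1.
P = [[1 - e, e, 0.0],
     [0.0, e, 1 - e]]
grid = [k / 1000 for k in range(1001)]
capacity = max(mutual_info([a, 1 - a], P) for a in grid)
```

The numerical maximum agrees with $C = 1 - e = 0.7$.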
Symmetric Channels

Strongly symmetric channels: the transition matrix $P$ with elements $P_{r,s} = P_{Y|X}(y=s\mid x=r)$ has the property that every row is a permutation of the first row, and every column is a permutation of the first column.
Weakly symmetric channels: every row of $P$ is a permutation of the first row.
For strongly symmetric channels,
$$C = \log|\mathcal{Y}| - H(P_{1,1},\dots,P_{1,|\mathcal{Y}|}),$$
achieved by $X\sim$ Uniform on $\mathcal{X}$.
For weakly symmetric channels,
$$C = \max_{P_X} H(Y) - H(P_{1,1},\dots,P_{1,|\mathcal{Y}|}).$$
Additive-Noise Channels

A discrete memoryless additive-noise channel is defined by $\mathcal{X}=\mathcal{Y}=\mathbb{F}_q$ (or, more generally, some additive group). $P_{Y|X}$ is induced by the random mapping
$$Y_i = x_i + Z_i.$$
It follows that $P_{Y|X}(y\mid x) = P_Z(y-x)$. Hence, additive-noise channels over $\mathbb{F}_q$ are always strongly symmetric, and have capacity $C = \log q - H(Z)$.
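As a sketch (not from the lecture), the snippet below builds the transition matrix of an additive-noise channel over $\mathbb{Z}_3$ from an assumed noise pmf, checks that it is strongly symmetric, and evaluates $C = \log q - H(Z)$.

```python
from math import log2

def additive_channel_matrix(p_z):
    """Transition matrix P[x][y] = P_Z((y - x) mod q) over Z_q."""
    q = len(p_z)
    return [[p_z[(y - x) % q] for y in range(q)] for x in range(q)]

def entropy(p):
    """Entropy in bits of a pmf given as a list."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

p_z = [0.7, 0.2, 0.1]  # assumed example noise pmf on Z_3
P = additive_channel_matrix(p_z)
# Every row (and every column) is a cyclic shift of the noise pmf,
# so the channel is strongly symmetric.
assert all(sorted(row) == sorted(p_z) for row in P)
capacity = log2(len(p_z)) - entropy(p_z)
```

For this noise pmf, $C = \log_2 3 - H(Z) \approx 0.428$ bits per channel use.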
Computing Capacity: Convex Maximization

Maximization of the mutual information is a convex optimization problem:
$$\text{maximize}\quad I(\mathbf{p}, P) = \sum_r \sum_s p_r P_{r,s} \log \frac{P_{r,s}}{\sum_{r'} p_{r'} P_{r',s}}$$
$$\text{subject to}\quad \sum_r p_r = 1,\qquad 0\le p_r\le 1\ \ \forall\, r.$$
Recall that we have proved that the mutual information $I(\mathbf{p},P)$, seen as a function of the input probability vector $\mathbf{p}$ for fixed transition matrix $P$, is a concave function.
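The lecture poses the maximization but does not give an algorithm; the classical method is the Blahut-Arimoto iteration, sketched below under the assumption that all entries of the input pmf start positive. This is an illustrative implementation, not the lecture's.

```python
from math import exp, log

def blahut_arimoto(P, iters=200):
    """Blahut-Arimoto iteration for C = max_p I(p, P), in bits.

    P is the transition matrix P[r][s]; returns (capacity estimate, input pmf).
    """
    R, S = len(P), len(P[0])
    p = [1.0 / R] * R  # start from the uniform input
    for _ in range(iters):
        # Induced output distribution q_s = sum_r p_r P_{r,s}.
        q = [sum(p[r] * P[r][s] for r in range(R)) for s in range(S)]
        # c_r = exp( sum_s P_{r,s} ln(P_{r,s}/q_s) ), then rescale p.
        c = [exp(sum(P[r][s] * log(P[r][s] / q[s])
                     for s in range(S) if P[r][s] > 0)) for r in range(R)]
        Z = sum(p[r] * c[r] for r in range(R))
        p = [p[r] * c[r] / Z for r in range(R)]
    # Evaluate I(p, P) at the final input pmf (nats -> bits).
    q = [sum(p[r] * P[r][s] for r in range(R)) for s in range(S)]
    I = sum(p[r] * P[r][s] * log(P[r][s] / q[s])
            for r in range(R) for s in range(S) if P[r][s] > 0)
    return I / log(2), p
```

For the BSC with $p = 0.1$ this returns $\approx 0.531 = 1 - H_2(0.1)$ with the uniform input, as the theory predicts.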
Proof of the Channel Coding Theorem (1)

Direct part (achievability): we wish to prove that for any $R < C$ there exists a sequence of $(R,n)$-codes with vanishing error probability.
Random coding: instead of building a specific family of codes (very difficult), we average over a random ensemble of codes.
Fix $P_X$ and generate a $2^{nR}\times n$ codebook at random with i.i.d. entries $\sim P_X$. The codebook (with the natural ordering as encoding function) is revealed to transmitter and receiver before the communication takes place.
Encoding: $x(m)$ is the $m$-th row of the generated codebook.
Decoding: joint typicality decoding. Let $y$ denote the observed channel output. Then $g(y) = \hat m\in\mathcal{M}$ if this is the unique index such that $(x(\hat m), y)\in T_\epsilon^{(n)}(X,Y)$; declare an error otherwise.
Proof of the Channel Coding Theorem (2)

Analysis of the probability of error: averaging over the ensemble, and using the symmetry of the random codebook construction with respect to the message index,
$$P_e^{(n)} = \sum_{\mathcal{C}} P(\mathcal{C})\, P_e^{(n)}(\mathcal{C}) = \sum_{\mathcal{C}} P(\mathcal{C})\, 2^{-nR}\sum_{m=1}^{2^{nR}} P_{e,m}(\mathcal{C}) = 2^{-nR}\sum_{m=1}^{2^{nR}} \sum_{\mathcal{C}} P(\mathcal{C})\, P_{e,m}(\mathcal{C}) = \sum_{\mathcal{C}} P(\mathcal{C})\, P_{e,1}(\mathcal{C}) = P(g(Y^n)\neq 1\mid M=1).$$
Proof of the Channel Coding Theorem (3)

We let $E = \{g(Y^n)\neq 1\}$ denote the conditional error event, and notice that $E\subseteq E_1\cup E_2$, where
$$E_1 = \{(X^n(1), Y^n)\notin T_\epsilon^{(n)}(X,Y)\}$$
and
$$E_2 = \{(X^n(m), Y^n)\in T_\epsilon^{(n)}(X,Y)\ \text{for some } m\neq 1\}.$$
By the union bound we have
$$P(g(Y^n)\neq 1\mid M=1) = P(E\mid M=1)\le P(E_1\cup E_2\mid M=1)\le P(E_1\mid M=1) + P(E_2\mid M=1).$$
Proof of the Channel Coding Theorem (4)

For the first term, notice that since $(X^n(1), Y^n)$ is jointly distributed according to $\prod_{i=1}^n P_X(x_i)P_{Y|X}(y_i\mid x_i)$, by the LLN
$$P(E_1\mid M=1)\to 0\quad\text{as } n\to\infty.$$
For the second term, for $m\neq 1$ we have that $X^n(m)$ and $Y^n$ are distributed as the product of marginals $\prod_{i=1}^n P_X(x_i)P_Y(y_i)$; therefore
$$P(E_2\mid M=1)\le 2^{nR}\, 2^{-n(I(X;Y)-\delta(\epsilon))}$$
by the Packing Lemma. It follows that for any $\epsilon>0$ and sufficiently large $n$,
$$P_e^{(n)}\le \epsilon + 2^{-n(I(X;Y)-R-\delta(\epsilon))}\le 2\epsilon,\quad\text{for } R < I(X;Y)-\delta(\epsilon).$$
Proof of the Channel Coding Theorem (5)

Consequences:
1. For any $n$ there exists at least one code that performs no worse than the ensemble average;
2. We can choose $P_X$ in order to maximize $I(X;Y)$;
3. $\delta(\epsilon)$ vanishes by considering smaller and smaller $\epsilon$.
From average to maximal error probability (expurgation): fix $\epsilon>0$, and let $\mathcal{C}_n^\star$ be a code with $P_e(\mathcal{C}_n^\star)\le\epsilon$ and rate $R > C - \delta(\epsilon)$. Sort the codewords such that
$$P_{e,1}(\mathcal{C}_n^\star)\le P_{e,2}(\mathcal{C}_n^\star)\le\cdots\le P_{e,2^{nR}}(\mathcal{C}_n^\star).$$
Proof of the Channel Coding Theorem (6)

Define the expurgated code $\tilde{\mathcal{C}}_n^\star = \{x(1),\dots,x(2^{nR-1})\}$ (the best half of the codewords). It follows that
$$P_{e,\max}(\tilde{\mathcal{C}}_n^\star) = P_{e,2^{nR-1}}(\mathcal{C}_n^\star)\le 2\epsilon,$$
since otherwise more than half of the codewords of $\mathcal{C}_n^\star$ would have error probability exceeding $2\epsilon$, contradicting $P_e(\mathcal{C}_n^\star)\le\epsilon$.
END OF THE DIRECT PART
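The Markov-inequality argument behind expurgation can be checked on any list of per-message error probabilities: if the average is at most $\epsilon$, the best half never exceeds $2\epsilon$. Below is a quick numerical check with hypothetical (randomly generated) values, not data from an actual code.

```python
import random

# Hypothetical per-message error probabilities with average near eps.
rng = random.Random(1)
eps = 0.05
pe = [rng.uniform(0, 2 * eps) for _ in range(1024)]
avg = sum(pe) / len(pe)

# Expurgation: keep the half of the messages with the smallest error probability.
best_half = sorted(pe)[: len(pe) // 2]

# By Markov's inequality at most half the values can exceed 2*avg,
# so the maximal error over the kept half is at most 2*avg.
assert max(best_half) <= 2 * avg
```

This inequality holds deterministically for any such list, which is exactly why discarding the worst half converts an average-error guarantee into a maximal-error one.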
Proof of the Channel Coding Theorem (7)

Proof of the converse part: there exist no codes with rate $R>C$ and arbitrarily small error probability. It is more convenient to prove the following equivalent statement: suppose that a sequence of $(R,n)$-codes $\{\mathcal{C}_n\}$ exists such that $P_e(\mathcal{C}_n) = P_e^{(n)}\to 0$. Then it must be that $R\le C$.
We start from Fano's inequality: consider the joint $n$-letter distribution induced by the message $M\sim$ Uniform on $\mathcal{M}$, by the encoding function $f$, and by the channel $P_{Y|X}$. Then...
Proof of the Channel Coding Theorem (8)

$$\begin{aligned}
nR &= H(M) = H(M) - H(M\mid Y^n) + H(M\mid Y^n)\\
&\le I(M; Y^n) + 1 + nP_e^{(n)} R\\
&= \sum_{i=1}^n I(M; Y_i\mid Y^{i-1}) + n\epsilon_n\\
&\le \sum_{i=1}^n I(M, Y^{i-1}; Y_i) + n\epsilon_n\\
&= \sum_{i=1}^n I(M, Y^{i-1}, X_i; Y_i) + n\epsilon_n\\
&= \sum_{i=1}^n I(X_i; Y_i) + n\epsilon_n\\
&\le nC + n\epsilon_n,
\end{aligned}$$
where $\epsilon_n = 1/n + P_e^{(n)} R \to 0$.
END OF THE CONVERSE PART
Feedback Capacity (1)

Channel model with feedback: Message $M$ → Encoder → $X_i$ → Channel $P_{Y|X}$ → $Y_i$ → Decoder → Estimate $\hat M$, where the past channel outputs $Y^{i-1}$ are fed back causally to the encoder.

An $(R,n)$-code with feedback is defined by a sequence of encoding functions
$$f_i : \mathcal{M}\times\mathcal{Y}^{i-1}\to\mathcal{X},\qquad x_i = f_i(m, y_1,\dots,y_{i-1}),$$
for $i = 1,\dots,n$. The message set and the decoding function are as in the case without feedback.
Feedback Capacity (2)

This model, referred to as the Shannon feedback channel, can be seen as an idealization of several protocols implemented today (e.g., ARQ, incremental redundancy, power control, rate allocation in wireless channels).
Theorem 12. The feedback capacity of a discrete memoryless channel is given by
$$C_{\mathrm{fb}} = C = \max_{P_X} I(X;Y).$$
Proof: The converse for the channel without feedback holds verbatim for the case with feedback (check!). The achievability obviously also holds, since any code without feedback is a special case of a code with feedback.
Feedback Capacity (3)

Memoryless channels: feedback may greatly simplify operations and achieve a much better behavior of the error probability versus $n$ at fixed rate $R<C$. Example: the BEC with Automatic Repeat reQuest (ARQ).
Channels with memory: feedback may achieve a higher capacity.
Multiuser networks: feedback may achieve a (much) larger capacity.
Source-Channel Separation Theorem (1)

We wish to transmit a stationary ergodic information source $\{V_i\}$ over the finite alphabet $\mathcal{V}$ through a discrete memoryless channel $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$.
We fix the compression ratio $\rho = n/k$, in terms of channel uses per source symbol.
A joint source-channel code for this setup is defined by an encoding function
$$\phi : \mathcal{V}^k\to\mathcal{X}^n$$
and by a decoding function
$$\psi : \mathcal{Y}^n\to\mathcal{V}^k.$$
Source-Channel Separation Theorem (2)

The error probability is defined by
$$P_e^{(k,n)} = P\big(V^k\neq \psi(Y^n)\big),\qquad\text{with } X^n = \phi(V^k).$$
We say that the source is transmissible over the channel with compression ratio $\rho$ if there exists a sequence of codes for $k\to\infty$ and $n=\rho k$ such that $P_e^{(k,n)}\to 0$.
Source-Channel Separation Theorem (3)

Theorem 13 (Source-channel coding). A discrete memoryless source $\{V_i\}$ with $V_i\in\mathcal{V}$ is transmissible over the discrete memoryless channel $(\mathcal{X},P_{Y|X},\mathcal{Y})$ with compression ratio $\rho$ if
$$H(V) < \rho\, C.$$
Conversely, if $H(V) > \rho\, C$, the source is not transmissible over the channel.
Source-Channel Separation Theorem (4)

Proof of achievability (separation approach): we concatenate an almost-lossless source code with a channel code.
Almost-lossless source code: if $V^k\in T_\epsilon^{(k)}(V)$, encode it with $k(H(V)+\delta)$ bits; otherwise, declare an error.
Choose a sequence of capacity-achieving channel codes $\mathcal{C}_n^\star$ of rate $R > C - \delta'$ such that $nR \ge k(H(V)+\delta)$. It follows that an error probability not larger than $2\epsilon$ can be achieved if
$$\rho\,(C-\delta')\ge H(V)+\delta.$$
If $H(V)<\rho C$, we can find $\delta,\delta'$ small enough and $k$ large enough such that the above conditions can be satisfied.
Source-Channel Separation Theorem (5)

Proof of converse: we let $\hat V^k = \psi(Y^n)$ denote the decoder output; then Fano's inequality yields
$$H(V^k\mid \hat V^k)\le 1 + P_e^{(k,n)}\, k\log|\mathcal{V}|.$$
Assume that there exists a sequence of source-channel codes for $k\to\infty$ and $n=\rho k$ such that $P_e^{(k,n)}\to 0$. Then...
Source-Channel Separation Theorem (6)

$$\begin{aligned}
H(V) &= \frac{1}{k}H(V^k)\\
&= \frac{1}{k}I(V^k;\hat V^k) + \frac{1}{k}H(V^k\mid\hat V^k)\\
&\le \frac{1}{k}I(V^k;\hat V^k) + \frac{1}{k} + P_e^{(k,n)}\log|\mathcal{V}|\\
&\le \frac{1}{k}I(X^n;Y^n) + \epsilon_k\\
&\le \frac{n}{k}\,C + \epsilon_k = \rho\, C + \epsilon_k,
\end{aligned}$$
from which we conclude that if such a sequence of codes exists, then $H(V)\le\rho\, C$.
Capacity-Cost Function (1)

In certain problems it is meaningful to associate a cost function with the channel input. Let $b:\mathcal{X}\to\mathbb{R}_+$, such that the per-letter cost of an input sequence $x$ is defined as
$$b(x) = \frac{1}{n}\sum_{i=1}^n b(x_i).$$
Example (Hamming weight cost): for $\mathcal{X}=\mathbb{F}_q$, $b(x) = 1\{x\neq 0\}$.
Example (Quadratic cost, related to transmit power): for $\mathcal{X}\subseteq\mathbb{R}$, $b(x) = x^2$.
Capacity-Cost Function (2)

Theorem 14 (Capacity-Cost Function). The capacity-cost function of the DMC $(\mathcal{X},P_{Y|X},\mathcal{Y})$ with input cost function $b:\mathcal{X}\to\mathbb{R}_+$ is given by
$$C(B) = \max_{P_X:\ E[b(X)]\le B} I(X;Y).$$
Proof of achievability (sketch): for $\delta'>0$, choose an input distribution $P_X$ such that $E[b(X)]\le B-\delta'$. Use $P_X$ through the random-coding argument. Define an additional encoding error as follows: if the selected codeword $x(m)$ violates the input cost, i.e., if $\frac{1}{n}\sum_{i=1}^n b(x_i(m)) > B$, then declare an error.
Capacity-Cost Function (3)

Include this error event in the union bound, and use the typical average lemma: if $x(m)\in T_\epsilon^{(n)}(X)$, then
$$(1-\epsilon)(B-\delta')\le \frac{1}{n}\sum_{i=1}^n b(x_i(m))\le (1+\epsilon)(B-\delta').$$
Choose $\epsilon$ and $\delta'$ such that $(1+\epsilon)(B-\delta') < B$, and conclude that the probability of encoding error can be made less than $\epsilon$ for sufficiently large $n$.
Capacity-Cost Function (4)

Proof of converse (sketch): assume that there exists a sequence of codes $\{\mathcal{C}_n\}$ with rate $R$, such that $P_e^{(n)}\to 0$, and such that
$$2^{-nR}\sum_{m=1}^{2^{nR}} \frac{1}{n}\sum_{i=1}^n b(x_i(m))\le B$$
(notice that here we consider a relaxed version of the input constraint, which holds on average over all codewords, and not for each individual codeword).
Define the function
$$C(\beta) = \max_{P_X:\ E[b(X)]\le\beta} I(X;Y).$$
Notice that $C(\beta)$ is non-decreasing and concave in $\beta$; in fact,
$$\lambda\, C(\beta_1) + (1-\lambda)\, C(\beta_2)\le C\big(\lambda\beta_1 + (1-\lambda)\beta_2\big).$$
Capacity-Cost Function (5)

For the $n$-letter distribution induced by using the codewords of $\mathcal{C}_n$ with uniform probability over the message $M$, using Fano's inequality as before, we obtain
$$\begin{aligned}
nR &\le I(M;Y^n) + 1 + nP_e^{(n)} R\\
&\le \sum_{i=1}^n I(X_i;Y_i) + n\epsilon_n\\
&\le \sum_{i=1}^n C\big(E[b(X_i)]\big) + n\epsilon_n\\
&\le n\, C\!\left(\frac{1}{n}\sum_{i=1}^n E[b(X_i)]\right) + n\epsilon_n\\
&\le n\, C(B) + n\epsilon_n,
\end{aligned}$$
where the second-to-last step uses the concavity of $C(\cdot)$ (Jensen's inequality) and the last step uses its monotonicity together with the average cost constraint.
Capacity-Cost Function (6)

Example (Capacity of the BSC with a Hamming weight input constraint): we can write
$$I(X;Y) = H(Y) - H(Y|X) = H(Y) - H_2(p).$$
Hence, we have to maximize $H(Y)$ subject to $E[1\{X=1\}]\le B$. Assume $P_X(1)=\alpha\in[0,1]$; then
$$P_Y(0) = (1-\alpha)(1-p) + \alpha p,\qquad P_Y(1) = (1-\alpha)p + \alpha(1-p).$$
We use the compact notation $\alpha * p = (1-\alpha)p + \alpha(1-p)$, indicating the binary (cyclic) convolution of the probability vectors. Then
$$H(Y) = H_2(\alpha * p) = H_2\big((1-\alpha)p + \alpha(1-p)\big).$$
It can be checked that this is monotonically increasing for $\alpha\in[0,1/2]$ and decreasing for $\alpha\in[1/2,1]$. Hence
$$C(B) = \begin{cases} H_2(B * p) - H_2(p), & 0\le B\le \tfrac12,\\[2pt] 1 - H_2(p), & \tfrac12 < B\le 1.\end{cases}$$
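The closed form above is easy to evaluate and sanity-check numerically. As a sketch (not from the lecture), the snippet below computes $C(B)$ for an assumed $p = 0.1$ and verifies that it is non-decreasing in $B$ and saturates at the unconstrained capacity $1 - H_2(p)$.

```python
from math import log2

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bconv(a, p):
    """Binary convolution a*p = (1-a)p + a(1-p)."""
    return (1 - a) * p + a * (1 - p)

def bsc_capacity_cost(B, p):
    """C(B) for a BSC(p) under the Hamming-weight constraint E[1{X=1}] <= B."""
    a = min(B, 0.5)  # the optimal input puts mass min(B, 1/2) on the symbol 1
    return h2(bconv(a, p)) - h2(p)

p = 0.1
vals = [bsc_capacity_cost(k / 100, p) for k in range(101)]
# Non-decreasing in B, zero at B = 0, and equal to 1 - H_2(p) for B >= 1/2.
assert all(v2 >= v1 - 1e-12 for v1, v2 in zip(vals, vals[1:]))
```

At $B = 0$ the only admissible input is the all-zero one, so $C(0) = 0$; beyond $B = 1/2$ the constraint is inactive.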
End of Lecture 5