Information Theory
Lecture 10: Network Information Theory (CT Ch. 15), with a focus on channel capacity results

- The (two-user) multiple access channel (CT 15.3)
- The (two-user) broadcast channel (CT 15.6)
- The relay channel (CT 15.7)
- Some remarks on general multiterminal channels (CT 15.10)

Mikael Skoglund, Information Theory

Joint Typicality

Extension of previous results to an arbitrary number of variables (only the most basic definitions here; many additional results in CT).

Notation:

- For any k-tuple x_1^k = (x_1, x_2, ..., x_k) ∈ X_1 × X_2 × ... × X_k and any subset of indices S ⊆ {1, 2, ..., k}, let x_S = (x_i)_{i ∈ S} denote the restriction of x_1^k to the components with indices in S.
- Assume x_i ∈ X_i^n for every i, and let x_S be the matrix with the x_i, i ∈ S, as rows. Let the |S|-tuple x_{S,j} denote the jth column of x_S.
- As in CT, a_n ≐ 2^{n(c ± ε)} means |(1/n) log a_n − c| < ε for all sufficiently large n.
For random variables X_1^k with joint distribution p(x_1^k): generate X_S via n independent copies of X_{S,j}, j = 1, ..., n. Then

  Pr(X_S = x_S) = ∏_{j=1}^n p(x_{S,j}) ≜ p(x_S)

For S ⊆ {1, 2, ..., k}, define the set of ε-typical n-sequences

  A_ε^{(n)}(S) = { x_S : Pr(X_{S'} = x_{S'}) ≐ 2^{−n[H(X_{S'}) ± ε]} for all S' ⊆ S }

Then, for any ε > 0, sufficiently large n, and S ⊆ {1, ..., k}:

  P(A_ε^{(n)}(S)) ≥ 1 − ε
  p(x_S) ≐ 2^{−n[H(X_S) ± ε]}  if x_S ∈ A_ε^{(n)}(S)
  |A_ε^{(n)}(S)| ≐ 2^{n[H(X_S) ± 2ε]}

The Multiple Access Channel

[Figure: encoder 1 maps W_1 via α_1(·) to X_1, encoder 2 maps W_2 via α_2(·) to X_2; both feed the channel p(y|x_1, x_2), whose output Y is mapped by the decoder β(·) to (Ŵ_1, Ŵ_2).]

Two users communicating over a common channel. (The generalization to more than two users is straightforward.)
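As a numerical sanity check of the typicality definitions (our own illustration, not part of the lecture), one can draw n i.i.d. copies of a pair (X_1, X_2) and verify that −(1/n) log Pr(X_S = x_S) concentrates around H(X_S); the joint pmf below is an assumed toy example:

```python
import math
import random

random.seed(0)

# Assumed toy joint pmf p(x1, x2) on {0,1} x {0,1}
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def entropy(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)

n = 100_000
# n independent copies of the pair (the columns X_{S,j} of X_S)
sample = random.choices(list(p), weights=list(p.values()), k=n)

# -(1/n) log2 Pr(X_S = x_S) for the full index set S = {1, 2}
rate = -sum(math.log2(p[xy]) for xy in sample) / n

H = entropy(p)
print(f"H(X1,X2) = {H:.4f} bits, empirical rate = {rate:.4f} bits")
# The AEP says the drawn sequence is typical w.h.p.: the empirical rate
# lies within a small epsilon of H, i.e. Pr(X_S = x_S) = 2^{-n[H +- eps]}
assert abs(rate - H) < 0.05
```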
Coding:

- Memoryless pmf (or pdf): p(y|x_1, x_2), x_1 ∈ X_1, x_2 ∈ X_2, y ∈ Y
- Data: W_1 ∈ I_1 = {1, ..., M_1} and W_2 ∈ I_2 = {1, ..., M_2}; assume W_1 and W_2 uniformly distributed and independent
- Encoders: α_1 : I_1 → X_1^n and α_2 : I_2 → X_2^n
- Rates: R_1 = (1/n) log M_1 and R_2 = (1/n) log M_2
- Decoder: β : Y^n → I_1 × I_2, β(Y^n) = (Ŵ_1, Ŵ_2)
- Error probability: P_e^{(n)} = Pr((Ŵ_1, Ŵ_2) ≠ (W_1, W_2))

Capacity:

We now have two (or more) rates, R_1 and R_2 ⇒ we cannot consider a single maximum achievable rate ⇒ we study sets of achievable rate pairs (R_1, R_2) ⇒ a trade-off between R_1 and R_2.

Achievable rate pair: (R_1, R_2) is achievable if a sequence of codes (α_1, α_2, β) exists such that P_e^{(n)} → 0 as n → ∞.

Capacity region: the closure of the set of all achievable rate pairs (R_1, R_2).
Capacity Region for the Multiple Access Channel

Fix π(x_1, x_2) = p_1(x_1) p_2(x_2) on X_1 × X_2. Draw the codebooks {X_1^n(i) : i ∈ I_1} and {X_2^n(j) : j ∈ I_2} in an i.i.d. manner according to p_1 and p_2. The symmetry of the codebook generation gives

  P_e^{(n)} = Pr((Ŵ_1, Ŵ_2) ≠ (W_1, W_2)) = Pr{(Ŵ_1, Ŵ_2) ≠ (1, 1) | (W_1, W_2) = (1, 1)}

where the second probability is with respect to both the channel and the random codebook design.

Also, with everything conditioned on (W_1, W_2) = (1, 1),

  Pr((Ŵ_1, Ŵ_2) ≠ (1, 1)) = Pr(Ŵ_1 ≠ 1, Ŵ_2 ≠ 1) + Pr(Ŵ_1 ≠ 1, Ŵ_2 = 1) + Pr(Ŵ_1 = 1, Ŵ_2 ≠ 1)
                           = P_12^{(n)} + P_1^{(n)} + P_2^{(n)}

Joint typicality decoding: declare (Ŵ_1, Ŵ_2) = (1, 1) if (X_1^n(i), X_2^n(j), Y^n) ∈ A_ε^{(n)} only for i = j = 1. Then

  P_12^{(n)} ≤ 2^{n[R_1 + R_2 − I(X_1, X_2; Y) + 4ε]}
  P_1^{(n)} ≤ 2^{n[R_1 − I(X_1; Y | X_2) + 3ε]}
  P_2^{(n)} ≤ 2^{n[R_2 − I(X_2; Y | X_1) + 3ε]}
[Figure: the achievable region in the (R_1, R_2)-plane, a pentagon bounded by R_1 ≤ I(X_1; Y | X_2), R_2 ≤ I(X_2; Y | X_1) and R_1 + R_2 ≤ I(X_1, X_2; Y), with corner points A = (I(X_1; Y), I(X_2; Y | X_1)) and B = (I(X_1; Y | X_2), I(X_2; Y)).]

Hence, for a fixed π(x_1, x_2) = p_1(x_1) p_2(x_2), the capacity region contains at least all pairs (R_1, R_2) in the set Π defined by

  R_1 < I(X_1; Y | X_2)
  R_2 < I(X_2; Y | X_1)
  R_1 + R_2 < I(X_1, X_2; Y)

The corner points

Consider the point A:

  R_1 = I(X_1; Y), R_2 = I(X_2; Y | X_1)  ⇒  R_1 + R_2 = I(X_1, X_2; Y)

- User 1 ignores the presence of user 2 ⇒ R_1 = I(X_1; Y)
- Decode user 1's codeword; user 2 then sees an equivalent channel with input X_2^n and output (Y^n, X_1^n) ⇒
  R_2 = I(X_2; Y, X_1) = I(X_2; Y | X_1) + I(X_1; X_2) = I(X_2; Y | X_1)
- The above can be repeated with 1 ↔ 2 and A ↔ B
- Points on the line A–B can be achieved by time sharing
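The successive-decoding interpretation of corner point A is easy to check numerically for a Gaussian MAC; the powers P1, P2 and noise variance N below are our own assumed example values:

```python
import math

def C(snr):
    """0.5*log2(1+snr): Gaussian capacity per real channel use."""
    return 0.5 * math.log2(1 + snr)

P1, P2, N = 10.0, 5.0, 1.0  # assumed transmit powers and noise variance

# Corner point A: decode user 1 first, treating user 2's signal as noise,
# then cancel it so that user 2 sees a clean single-user channel.
R1 = C(P1 / (N + P2))
R2 = C(P2 / N)

# A lies on the sum-rate boundary R1 + R2 = I(X1,X2;Y):
sum_capacity = C((P1 + P2) / N)
print(f"R1 = {R1:.4f}, R2 = {R2:.4f}, R1+R2 = {R1 + R2:.4f}")
assert math.isclose(R1 + R2, sum_capacity)
```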
Each particular choice of distribution π gives an achievable region Π; for two different π's:

[Figure: two overlapping pentagon regions in the (R_1, R_2)-plane, one for π_1 and one for π_2.]

For fixed π, the region Π is convex. Varying π, the union of the regions can be non-convex. However, all rates on a line connecting two achievable rate pairs are achievable by time sharing.

The capacity region for the multiple access channel is the closure of the convex hull of the set of points defined by the three inequalities

  R_1 < I(X_1; Y | X_2)
  R_2 < I(X_2; Y | X_1)
  R_1 + R_2 < I(X_1, X_2; Y)

over all possible product distributions p_1(x_1) p_2(x_2) for (X_1, X_2).
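As a concrete instance of the theorem (a standard textbook channel, worked out here as our own sketch), the noiseless binary adder MAC Y = X_1 + X_2 with independent uniform inputs gives the pentagon R_1 < 1, R_2 < 1, R_1 + R_2 < 1.5:

```python
import math
from itertools import product

def H(pmf):
    """Entropy in bits of {outcome: probability}."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)

p1 = {0: 0.5, 1: 0.5}   # uniform input of user 1
p2 = {0: 0.5, 1: 0.5}   # uniform input of user 2

# Output distribution of Y = X1 + X2 (values 0, 1, 2)
pY = {}
for x1, x2 in product(p1, p2):
    pY[x1 + x2] = pY.get(x1 + x2, 0) + p1[x1] * p2[x2]

# The channel is deterministic, so H(Y|X1,X2) = 0 and I(X1,X2;Y) = H(Y).
# Given X2 = x2, Y = X1 + x2 determines X1, so I(X1;Y|X2) = H(X1)
# (and symmetrically for user 2).
I12 = H(pY)
I1 = H(p1)
I2 = H(p2)

print(f"R1 < {I1}, R2 < {I2}, R1 + R2 < {I12}")
assert (I1, I2, I12) == (1.0, 1.0, 1.5)
```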
Example: A Gaussian Channel

Bandlimited AWGN channel with two additive users,

  Y(t) = X_1(t) + X_2(t) + Z(t)

The noise Z(t) is zero-mean Gaussian with power spectral density N_0/2, and X_1(t) and X_2(t) are subject to the power constraints P_1 and P_2, respectively. The available bandwidth is W. The capacity of the corresponding single-user channel (with power constraint P) is

  W C(P/(N_0 W))  [bits/second],  where C(x) = log(1 + x)

Time-Division Multiple-Access (TDMA): Let user 1 use all of the bandwidth with power P_1/α a fraction α ∈ [0, 1] of the time, and let user 2 use all the bandwidth with power P_2/(1 − α) the remaining fraction 1 − α of the time. The achievable rates then are

  R_1 < αW C(P_1/(α N_0 W))
  R_2 < (1 − α)W C(P_2/((1 − α) N_0 W))

Frequency-Division Multiple-Access (FDMA): Let user 1 transmit with power P_1 using a fraction αW of the available bandwidth W, and let user 2 transmit with power P_2 in the remaining fraction (1 − α)W. The achievable rates are

  R_1 < αW C(P_1/(α N_0 W))
  R_2 < (1 − α)W C(P_2/((1 − α) N_0 W))

TDMA and FDMA are equivalent from a capacity perspective!
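The equivalence claim is easy to verify numerically; the bandwidth, noise level, and powers below are assumed example values:

```python
import math

def C(x):
    return math.log2(1 + x)

W, N0 = 1.0e6, 1e-9   # assumed bandwidth [Hz] and noise level N0
P1, P2 = 1e-3, 2e-3   # assumed transmit powers [W]

for alpha in (0.1, 0.25, 0.5, 0.8):
    # TDMA: full band, boosted power P1/alpha, a fraction alpha of the time
    R1_tdma = alpha * W * C((P1 / alpha) / (N0 * W))
    R2_tdma = (1 - alpha) * W * C((P2 / (1 - alpha)) / (N0 * W))
    # FDMA: power P1 in bandwidth alpha*W, so noise power N0*alpha*W
    R1_fdma = alpha * W * C(P1 / (N0 * alpha * W))
    R2_fdma = (1 - alpha) * W * C(P2 / (N0 * (1 - alpha) * W))
    assert math.isclose(R1_tdma, R1_fdma) and math.isclose(R2_tdma, R2_fdma)
print("TDMA and FDMA rate pairs coincide for every alpha")
```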
Code-Division Multiple-Access (CDMA): Defined, in our context, as all schemes that can be implemented to achieve the rates in the true capacity region

  R_1 < W C(P_1/(N_0 W)) = W log(1 + P_1/(N_0 W))
  R_2 < W C(P_2/(N_0 W)) = W log(1 + P_2/(N_0 W))
  R_1 + R_2 < W C((P_1 + P_2)/(N_0 W)) = W log(1 + (P_1 + P_2)/(N_0 W))

[Figure: the CDMA capacity region and the T/FDMA region in the (R_1, R_2)-plane; left panel: capacity region for P_1 = P_2, right panel: capacity region for P_1 = 2P_2.]

Note that T/FDMA is only optimal when α/(1 − α) = P_1/P_2.
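A small scan over α (our own numerical sketch, with normalized assumed parameters) confirms that the T/FDMA sum rate meets the CDMA sum-rate bound exactly at α = P_1/(P_1 + P_2) and falls below it everywhere else:

```python
import math

def C(x):
    return math.log2(1 + x)

W, N0, P1, P2 = 1.0, 1.0, 4.0, 2.0   # normalized assumed values

def tfdma_sum(alpha):
    """T/FDMA sum rate for bandwidth (or time) split alpha."""
    return (alpha * W * C(P1 / (N0 * alpha * W))
            + (1 - alpha) * W * C(P2 / (N0 * (1 - alpha) * W)))

cap_sum = W * C((P1 + P2) / (N0 * W))   # the CDMA sum-rate bound

# At alpha* = P1/(P1+P2) both users see SNR (P1+P2)/(N0*W), so the
# T/FDMA sum rate reaches the boundary of the capacity region:
alpha_star = P1 / (P1 + P2)
assert math.isclose(tfdma_sum(alpha_star), cap_sum)

# Everywhere else it is strictly smaller
best = max((a / 1000 for a in range(1, 1000)), key=tfdma_sum)
print(f"optimal alpha ~ {best:.3f}, P1/(P1+P2) = {alpha_star:.3f}")
assert abs(best - alpha_star) < 1e-2
assert all(tfdma_sum(a / 1000) <= cap_sum + 1e-12 for a in range(1, 1000))
```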
The Broadcast Channel

[Figure: the messages (W_0, W_1, W_2) are mapped by the encoder α(·) to X; the channel p(y_1, y_2|x) produces Y_1 and Y_2; decoder 1 β_1(·) outputs (Ŵ_0, Ŵ_1) and decoder 2 β_2(·) outputs (Ŵ_0, Ŵ_2).]

One transmitter, several receivers. Message W_0 is a public message intended for both receivers, whereas W_1 and W_2 are private messages.

The Degraded Broadcast Channel

[Figure: the channel X → p(y_1, y_2|x) → (Y_1, Y_2) split as the cascade X → p(y_1|x) → Y_1 → p(y_2|y_1) → Y_2.]

A broadcast channel is degraded if it can be split as in the figure. That is, Y_2 is a noisier version of X than Y_1:

  p(y_1, y_2|x) = p(y_2|y_1) p(y_1|x)

The Gaussian and the binary symmetric broadcast channels are degraded (see the examples in CT).
Superposition Coding for the Degraded Broadcast Channel

[Figure: the codebook drawn as clouds of codewords; W_2 selects a cloud and W_1 a codeword within it; the Y_1-receiver resolves individual codewords, the Y_2-receiver only clouds.]

Assume there is no common information (for simplicity). Let W_2 choose a cloud of possible W_1-codewords. The Y_1-receiver sees all codewords, whereas the Y_2-receiver is only able to distinguish between clouds.

The capacity region of the degraded broadcast channel (with no common information) is the closure of the convex hull of all rate pairs satisfying

  R_2 < I(U; Y_2)
  R_1 < I(X; Y_1 | U)

for some distribution p_1(u) p_2(x|u).

Proof: Choose W_2-codewords i.i.d. according to p_1(u) and, for each one, choose W_1-codewords i.i.d. according to p_2(x|u). The overall channel from W_2 to Y_2 (the clouds) can be made error-free as long as R_2 < I(U; Y_2), and conditioned on W_2 the channel from W_1 to Y_1 can be made error-free as long as R_1 < I(X; Y_1 | U). The converse is proved in Problem 15.11 (based on Fano's inequality, as usual).
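For the Gaussian degraded broadcast channel (the standard example in CT), the region can be evaluated by splitting the power as βP for the W_1-layer and (1 − β)P for the cloud centres U; the power and noise values below are our own assumed numbers:

```python
import math

def C(snr):
    """0.5*log2(1+snr): Gaussian capacity per real channel use."""
    return 0.5 * math.log2(1 + snr)

P, N1, N2 = 10.0, 1.0, 4.0   # assumed total power and noise variances, N1 < N2

print(" beta     R1      R2")
for beta in (0.0, 0.25, 0.5, 0.75, 1.0):
    # Good receiver decodes the cloud centre (U) first and cancels it:
    R1 = C(beta * P / N1)
    # Weak receiver treats the W1-layer (power beta*P) as extra noise:
    R2 = C((1 - beta) * P / (beta * P + N2))
    print(f"{beta:5.2f}  {R1:6.3f}  {R2:6.3f}")
```

Sweeping β from 0 to 1 traces the boundary of the region: β = 1 gives the single-user rate to the good receiver, β = 0 gives everything to the weak one.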
The capacity region of the degraded broadcast channel (with common information): If the pair (R_1 = a, R_2 = b) is achievable for independent messages, as before, then the triple (R_0, R_1 = a, R_2 = b − R_0) is achievable with common information at rate R_0 (as long as R_0 < b). Since the better receiver can decode both W_1 and W_2, part of the W_2-message can be made to include common information!

The Relay Channel

[Figure: W is mapped by the encoder α(·) to X; the channel p(y, y_1|x, x_1) has inputs X (from the encoder) and X_1 (from the relay) and outputs Y_1 (observed by the relay) and Y (observed by the decoder β(·), which outputs Ŵ); the relay encoder forms X_{1,i} = f_i(Y_{1,1}^{i−1}), i = 1, ..., n.]

One sender, one receiver, and one intermediate relay node. The problem formulation does not define the set of relay functions {f_i(·)}_{i=1}^n: the relay's strategy might be to decode the message, or to compress its channel observation, or to amplify it and retransmit it, or ...
Capacity is not known in general. Some known bounds:

Cut-set upper bound (obtained by assuming the relay is co-located with the transmitter or with the receiver):

  R ≤ max_{p(x, x_1)} min{ I(X, X_1; Y), I(X; Y, Y_1 | X_1) }

Decode-and-forward lower bound:

  R ≥ max_{p(x, x_1)} min{ I(X, X_1; Y), I(X; Y_1 | X_1) }

Proof: Split the transmission into b blocks. Choose 2^{nR̃} relay codewords i.i.d. according to p(x_1) and, for each one, choose 2^{nR} codewords i.i.d. according to p(x|x_1), distributed into 2^{nR̃} bins. The relay can decode the message if R < I(X; Y_1 | X_1), and it then sends the bin index in the next block. The receiver can decode the bin index if R̃ < I(X_1; Y) and, knowing this index, it can decode the message from the previous block if R − R̃ < I(X; Y | X_1).

These bounds coincide if the relay channel is degraded:

  p(y, y_1|x, x_1) = p(y_1|x, x_1) p(y|y_1, x_1)

General Multiterminal Systems

M different nodes, each transmitting X_m and receiving Y_m. The message from node i to node j is W_{i,j}, with rate R_{i,j}. The channel between the nodes is p(y_1, ..., y_M | x_1, ..., x_M). Despite significant progress in recent years, still only a few general results are known. One of them is El Gamal's cut-set bound: if the rates {R_{i,j}} are achievable, there exists a p(x_1, ..., x_M) such that

  Σ_{i ∈ S, j ∈ S^c} R_{i,j} < I(X^{(S)}; Y^{(S^c)} | X^{(S^c)})

for all S ⊆ {1, ..., M}.
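To get a feel for the two relay bounds, here is a brute-force evaluation on a toy binary relay channel of our own construction (not from the lecture): Y = X ⊕ X_1 ⊕ N and Y_1 = X ⊕ N_1 with assumed crossover probabilities, maximizing over a grid of joint input distributions p(x, x_1):

```python
import math
from itertools import product

eps, delta = 0.1, 0.05   # assumed crossover probabilities for N and N1

def H(vals):
    return -sum(p * math.log2(p) for p in vals if p > 0)

def marg(joint, idx):
    """Marginal pmf over the coordinates listed in idx."""
    out = {}
    for key, q in joint.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0) + q
    return out

def cond_mi(joint, A, B, Cond):
    """I(A;B|Cond) for coordinate index lists of a joint pmf."""
    return (H(marg(joint, A + Cond).values()) + H(marg(joint, B + Cond).values())
            - H(marg(joint, A + B + Cond).values()) - H(marg(joint, Cond).values()))

def channel_joint(pxx1):
    """Joint pmf of (X, X1, Y, Y1) for Y = X^X1^N, Y1 = X^N1."""
    j = {}
    for (x, x1), q in pxx1.items():
        for n, pn in ((0, 1 - eps), (1, eps)):
            for n1, pn1 in ((0, 1 - delta), (1, delta)):
                key = (x, x1, x ^ x1 ^ n, x ^ n1)
                j[key] = j.get(key, 0) + q * pn * pn1
    return j

up = lo = 0.0
steps = 10
for i, j, k in product(range(steps + 1), repeat=3):
    if i + j + k > steps:
        continue
    a, b, c, d = i / steps, j / steps, k / steps, (steps - i - j - k) / steps
    pj = channel_joint({(0, 0): a, (0, 1): b, (1, 0): c, (1, 1): d})
    i_xx1_y = cond_mi(pj, [0, 1], [2], [])      # I(X,X1;Y)
    i_x_y1 = cond_mi(pj, [0], [3], [1])         # I(X;Y1|X1)
    i_x_yy1 = cond_mi(pj, [0], [2, 3], [1])     # I(X;Y,Y1|X1)
    up = max(up, min(i_xx1_y, i_x_yy1))         # cut-set bound
    lo = max(lo, min(i_xx1_y, i_x_y1))          # decode-and-forward

print(f"decode-and-forward >= {lo:.3f}, cut-set <= {up:.3f}")
assert 0.0 < lo <= up <= 1.0
```

Since I(X; Y, Y_1 | X_1) ≥ I(X; Y_1 | X_1), the decode-and-forward value can never exceed the cut-set value, which the final assertion confirms numerically.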
The source–channel separation principle: For general multiterminal networks, the source and channel codes cannot be designed separately without loss. Essentially, the source code needs to know the channel in order to provide optimal dependencies between the channel input variables.

Feedback: Feedback can increase the capacity of a multiterminal channel, since it can help the transmitters cooperate to reduce interference.