EE/Stat 376B: Network Information Theory
Handout #5
October 14, 2014

Homework Set #2 Solutions

1. Problem 2.4, parts (b) and (c).

(b) Consider

    h(X + Y) >= h(X + Y | Y) = h(X | Y) = h(X).

(c) Let aY = Y_1 + Y_2, where Y_1 ~ N(0, 1) and Y_2 ~ N(0, a^2 - 1) are independent (note that a >= 1). Then by part (b),

    h(X + aY) = h(X + Y_1 + Y_2) >= h(X + Y_1) = h(X + Y).

2. Problem 2.11. Since the (nonlinear) MMSE is upper bounded by the linear MMSE,

    E[(X - E(X | X + Z))^2] <= E[(X - (P/(P+N))(X + Z))^2]
                            = E[((N/(P+N)) X - (P/(P+N)) Z)^2]
                            = PN/(P+N),

with equality if Z is Gaussian.

3. Problem 3.10. Consider

    C(B) = max_{p(x): E(X) <= B} I(X; Y)
         = max_{p(x): E(X) <= B} H(Y) - H(Y | X)
         = max_{p(x): E(X) <= B} H(Y) - H(p)
     (a) = max_{0 <= alpha <= min{1, B}} H(alpha * p) - H(p)
     (b) = 1 - H(p)          if B >= 1/2,
           H(B * p) - H(p)   if B < 1/2,

where (a) follows by setting alpha = P{X = 1}, so that H(Y) = H(alpha * p) with alpha * p = alpha(1 - p) + (1 - alpha)p, and (b) follows since H(alpha * p) is monotonically increasing for alpha in [0, 1/2] and decreasing for alpha in [1/2, 1].

4. Problem 3.12.
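The capacity expression in Problem 3.10 can be checked numerically. The sketch below (the function names and the choice p = 0.1 are illustrative, not from the handout) evaluates C(B) for a BSC(p) under the cost constraint E(X) <= B and verifies the monotonicity of H(alpha * p) used in the last step:

```python
import math

def Hb(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def conv(a, p):
    """Binary convolution a * p = P{X xor Z = 1} for X ~ Bern(a), Z ~ Bern(p)."""
    return a * (1 - p) + (1 - a) * p

def capacity(B, p):
    """C(B) for the BSC(p) under the input cost constraint E(X) <= B."""
    if B >= 0.5:
        return 1 - Hb(p)
    return Hb(conv(B, p)) - Hb(p)

# H(a * p) is increasing in a on [0, 1/2], so the constraint binds only
# when B < 1/2.
p = 0.1
vals = [Hb(conv(a, p)) for a in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)]
assert all(x < y for x, y in zip(vals, vals[1:]))
print(capacity(0.3, p), capacity(0.7, p))
```

For B >= 1/2 the constraint is inactive and C(B) collapses to the unconstrained BSC capacity 1 - H(p).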
It follows immediately from the operational definition of capacity. Alternatively, note that the output becomes Y = agX + aZ, where a != 0. Thus,

    I(X; Y) = I(X; agX + aZ)
            = h(agX + aZ) - h(aZ)
            = (1/2) log(2*pi*e a^2 (g^2 P + 1)) - (1/2) log(2*pi*e a^2)
            = (1/2) log(1 + g^2 P),

which does not depend on a.

5. Problem 3.14.

(a) Consider the random codebook generation as in the standard achievability proof for the DMC. For the decoding, let A = {m : (x^n(m), y^n) in T_eps} and declare L = A if |A| <= 2^{nL} (and take an arbitrary L otherwise). Suppose M = 1. Then the probability of error is bounded as

    P(e) = P{M not in L(Y^n) | M = 1}
         <= P{(X^n(1), Y^n) not in T_eps | M = 1} + P{|A| > 2^{nL} | M = 1}.

By the LLN, the first term tends to zero as n -> infinity. By the Markov inequality and the joint typicality lemma, the second term is bounded as

    P{|A| > 2^{nL} | M = 1} <= E(|A| | M = 1) / 2^{nL}
                            <= 2^{-nL} (1 + sum_{m=2}^{2^{nR}} P{(X^n(m), Y^n) in T_eps})
                            <= 2^{-nL} + 2^{n(R - L - I(X;Y) + delta(eps))},

which tends to zero as n -> infinity if R < L + I(X; Y) - delta(eps).

Alternatively, we can partition the 2^{nR} messages into 2^{n(R-L)} equal-size groups and map each group into a single codeword. The encoder sends the group index k in [1 : 2^{n(R-L)}] by transmitting x^n(k) for m in [(k-1) 2^{nL} + 1 : k 2^{nL}]. The decoder finds the group index k-hat and simply forms the list of messages associated with k-hat, i.e., L = [(k-hat - 1) 2^{nL} + 1 : k-hat 2^{nL}]. Finally, by the channel coding theorem for the standard DMC, the group index can be recovered if R - L < C, which completes the proof of achievability.

(b) Note that we have a new definition of error in this problem. An error occurs if M not in L(Y^n). Define an error random variable E, which takes value 1 if there is an
error, and 0 otherwise. Then,

    H(M | Y^n) = H(M | Y^n) + H(E | M, Y^n)
               = H(E | Y^n) + H(M | E, Y^n)
               <= H(E) + H(M | E, Y^n)
               <= 1 + P{E = 0} H(M | Y^n, E = 0) + P{E = 1} H(M | Y^n, E = 1)
               <= 1 + (1 - P(e)) log 2^{nL} + P(e) log(2^{nR} - 1)
               <= 1 + nL + nR P(e)
               = nL + n*eps_n,

where the first equality holds since E is a function of (M, Y^n). This implies that given any sequence of (2^{nR}, 2^{nL}, n) codes with P(e) -> 0 as n -> infinity, we have

    nR = H(M)
       = I(M; Y^n) + H(M | Y^n)
       <= I(M; Y^n) + nL + n*eps_n
   (a) <= sum_{i=1}^n I(X_i; Y_i) + nL + n*eps_n
       <= nC + nL + n*eps_n,

where (a) follows as in the proof of the converse for the DMC. Therefore, R <= C + L.

6. Problem 10.1.

(a) The optimal rate is R* = H(X | Y).

(b) Since both the encoder and the decoder know the side information, the encoder only needs to describe the sequences x^n such that x^n in T_eps(X | y^n) for each observed sequence y^n, which requires n(H(X | Y) + delta(eps)) bits. An error occurs only if (X^n, Y^n) not in T_eps(X, Y). By the LLN, the probability of this event tends to zero as n -> infinity.

(c) Consider

    nR >= H(M)
       >= I(M; X^n | Y^n)
       = H(X^n | Y^n) - H(X^n | M, Y^n)
   (a) >= sum_{i=1}^n H(X_i | Y^n, X^{i-1}) - n*eps_n
       = sum_{i=1}^n H(X_i | Y_i) - n*eps_n
       = nH(X | Y) - n*eps_n,

where (a) follows by Fano's inequality.
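The optimal rate H(X | Y) from parts (a)-(c) can be evaluated for a concrete source. A small sketch (the helper name and the doubly symmetric binary example are illustrative assumptions, not from the problem):

```python
import math

def H_cond(pXY):
    """H(X|Y) in bits from a joint pmf given as dict {(x, y): prob}."""
    pY = {}
    for (x, y), pr in pXY.items():
        pY[y] = pY.get(y, 0.0) + pr
    h = 0.0
    for (x, y), pr in pXY.items():
        if pr > 0:
            h -= pr * math.log2(pr / pY[y])
    return h

# Doubly symmetric binary source: X ~ Bern(1/2), Y = X xor Z, Z ~ Bern(p),
# for which the optimal rate is H(X | Y) = H(p).
p = 0.2
pXY = {(0, 0): (1 - p) / 2, (0, 1): p / 2, (1, 0): p / 2, (1, 1): (1 - p) / 2}
Hp = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
assert abs(H_cond(pXY) - Hp) < 1e-12
print(H_cond(pXY))
```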
(d) For distributed lossless source coding, suppose that a genie provides the side information X_2^n to both encoder 1 and the decoder. From part (c), the rate required in this setting is R_1 >= H(X_1 | X_2). Since the genie can only help, this remains a valid lower bound on R_1 for the case without the genie. Similarly, by introducing a genie providing the side information X_1^n to both encoder 2 and the decoder, we have R_2 >= H(X_2 | X_1).

7. Problem 10.6.

(a) By identifying X_1 = (X, Y) and X_2 = Y in the distributed lossless source coding setting, the optimal rate region is the set of rate pairs (R_1, R_2) such that

    R_1 >= H(X, Y | Y),
    R_2 >= H(Y | X, Y),
    R_1 + R_2 >= H(X, Y),

or equivalently,

    R_1 >= H(X | Y),
    R_1 + R_2 >= H(X, Y).

(b) The optimal rate region is the set of rate pairs (R_1, R_2) such that R_1 + R_2 >= H(Y). This can be achieved by ignoring X^n at encoder 1. For the converse, consider

    n(R_1 + R_2) >= H(M_1, M_2)
                 >= I(Y^n; M_1, M_2)
             (a) >= H(Y^n) - n*eps_n
                 = nH(Y) - n*eps_n,

where (a) follows by Fano's inequality.

Extra problems.

8. Problem 2.6.

(a) Let X' ~ N(0, sigma^2) with sigma^2 = E(X^2). Since

    f_{X'}(x) = (1/sqrt(2*pi*sigma^2)) exp(-x^2/(2 sigma^2)),

we have

    -int f_X(x) log f_{X'}(x) dx = int f_X(x) ((1/2) log(2*pi*sigma^2) + (x^2/(2 sigma^2)) log e) dx
                                 = int f_{X'}(x) ((1/2) log(2*pi*sigma^2) + (x^2/(2 sigma^2)) log e) dx
                                 = -int f_{X'}(x) log f_{X'}(x) dx
                                 = h(X'),

where the second equality follows since E(X^2) = E(X'^2).
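The key step in part (a) above is that the cross-entropy -int f_X log f_{X'} depends on f_X only through E(X^2); once the second moments match, it equals h(X'). A numerical sketch of this identity in bits (the value of sigma^2 is an arbitrary assumption):

```python
import math

sigma2 = 2.5   # assumed variance; any f_X with E(X^2) = sigma2 gives the same value
EX2 = sigma2   # matched second moment E(X^2) = E(X'^2)

# Cross-entropy -E[log2 f_{X'}(X)] for X' ~ N(0, sigma2), evaluated using only
# E(X^2), versus the Gaussian differential entropy h(X') = (1/2) log2(2 pi e sigma2).
cross = 0.5 * math.log2(2 * math.pi * sigma2) + (EX2 / (2 * sigma2)) * math.log2(math.e)
hXp = 0.5 * math.log2(2 * math.pi * math.e * sigma2)
assert abs(cross - hXp) < 1e-12
print(cross, hXp)
```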
(b) By the nonnegativity of relative entropy,

    h(X) = -int f_X(x) log f_X(x) dx
         = -int f_X(x) (log (f_X(x)/f_{X'}(x)) + log f_{X'}(x)) dx
         = -D(f_X || f_{X'}) - int f_X(x) log f_{X'}(x) dx
         <= -int f_X(x) log f_{X'}(x) dx
         = h(X'),

where the last equality follows by part (a).

(c) Let X' ~ N(0, K) with K = E(XX^T). Since

    f_{X'}(x) = (2*pi)^{-N/2} |K|^{-1/2} exp(-(1/2) x^T K^{-1} x),

we have

    -int f_X(x) log f_{X'}(x) dx
        = int f_X(x) ((N/2) log(2*pi) + (1/2) log|K| + ((1/2) x^T K^{-1} x) log e) dx
        = int f_{X'}(x) ((N/2) log(2*pi) + (1/2) log|K| + ((1/2) x^T K^{-1} x) log e) dx
        = -int f_{X'}(x) log f_{X'}(x) dx
        = h(X'),

where the second equality follows since E(XX^T) = E(X'X'^T). Following similar steps as in (b), we have

    h(X) = -D(f_X || f_{X'}) + h(X') <= h(X').

9. Problem 2.12.

(a) By Problem 2.6, we have

    h(X | X + Z) <= (1/2) log(2*pi*e (E(X^2) - (E(X(X + Z)))^2 / E((X + Z)^2)))
                 = (1/2) log(2*pi*e PN/(P + N)),

with equality iff both X and Z are Gaussian.

(b) Let X* ~ N(0, P), let Z* ~ N(0, N), and let Z be an arbitrary noise with E(Z^2) = N, independent of X*. Consider

    I(X*; X* + Z) = h(X*) - h(X* | X* + Z)
              (a) >= (1/2) log(2*pi*e P) - (1/2) log(2*pi*e PN/(P + N))
                  = h(X*) - h(X* | X* + Z*)
                  = I(X*; X* + Z*),

where (a) follows by part (a). Thus, Gaussian noise is the worst additive noise for a Gaussian input.
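The chain in part (b) of Problem 2.12 can be sanity-checked numerically: with Gaussian X* and Z*, the difference h(X*) - h(X* | X* + Z*) should equal the familiar (1/2) log(1 + P/N). The powers P and N below are arbitrary assumed values:

```python
import math

P, N = 4.0, 1.0  # assumed signal and noise powers

h_X = 0.5 * math.log2(2 * math.pi * math.e * P)                        # h(X*)
h_X_given_Y = 0.5 * math.log2(2 * math.pi * math.e * P * N / (P + N))  # h(X*|X*+Z*)
I_direct = h_X - h_X_given_Y
I_formula = 0.5 * math.log2(1 + P / N)

assert abs(I_direct - I_formula) < 1e-12
print(I_direct)
```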
10. Problem 10.2. Given a sequence of (2^{nR_1}, 2^{nR_2}, n) codes with lim_{n->inf} P(e) = 0, let (M_1, M_2) be the random variables corresponding to the indices generated by the two encoders.

The outer bound (10.1) can be shown as follows. By Fano's inequality,

    H(X_1^n, X_2^n | M_1, M_2) <= H(X_1^n, X_2^n | Xhat_1^n, Xhat_2^n) <= n*eps_n,

where eps_n tends to zero as n -> infinity by the assumption that lim_{n->inf} P(e) = 0. Now consider

    n(R_1 + R_2) >= H(M_1, M_2)
                 >= I(X_1^n, X_2^n; M_1, M_2)
                 = nH(X_1, X_2) - H(X_1^n, X_2^n | M_1, M_2)
                 >= nH(X_1, X_2) - n*eps_n.

By taking n -> infinity, we conclude that R_1 + R_2 >= H(X_1, X_2).

The outer bound (10.2) can be shown as follows. By Fano's inequality,

    H(X_1^n | M_1, M_2) <= H(X_1^n | Xhat_1^n) <= n*eps_n,

where eps_n tends to zero as n -> infinity by the assumption that lim_{n->inf} P(e) = 0. Now consider

    nR_1 >= H(M_1)
         >= H(M_1 | X_2^n)
         >= I(X_1^n; M_1 | X_2^n)
         = nH(X_1 | X_2) - H(X_1^n | M_1, X_2^n)
     (a) >= nH(X_1 | X_2) - H(X_1^n | M_1, M_2)
         >= nH(X_1 | X_2) - n*eps_n,

where (a) holds since M_2 is a function of X_2^n, so that H(X_1^n | M_1, X_2^n) = H(X_1^n | M_1, M_2, X_2^n) <= H(X_1^n | M_1, M_2). By taking n -> infinity, we conclude that R_1 >= H(X_1 | X_2). Similarly, we can show that R_2 >= H(X_2 | X_1).

11. Problem 10.3.

Codebook generation. Generate a random codebook such that each index m(x^n) is a linear function of x^n (in binary field arithmetic). In particular, let m(x^n) = Hx^n, where m in [1 : 2^{nR}] is represented by a vector of nR bits and the elements of the nR x n random binary parity-check matrix H are generated i.i.d. Bern(1/2). The chosen parity-check matrix is revealed to both the encoder and the decoder.

Encoding. Upon observing x^n, the encoder sends m(x^n) = Hx^n.

Decoding. Upon receiving m, the decoder declares xhat^n to be the estimate of the source sequence if it is the unique typical sequence such that m = H xhat^n; otherwise, it declares an error.
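The encoding and decoding maps above can be illustrated at toy sizes. Everything below (the matrix H, the "typical set", and the dimensions) is an assumed example, not part of the problem:

```python
# Toy illustration of the linear binning encoder m(x^n) = H x^n over GF(2).
n, nR = 4, 2
H = [(1, 0, 1, 1),
     (0, 1, 1, 0)]  # an assumed 2 x 4 parity-check matrix

def encode(x):
    """Compute the bin index (bit vector) m = Hx^n over GF(2)."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

def decode(m, typical_set):
    """Return the unique typical sequence consistent with m, else None."""
    candidates = [x for x in typical_set if encode(x) == m]
    return candidates[0] if len(candidates) == 1 else None

# Suppose the (assumed) typical set contains these two sequences; they fall
# in different bins, so decoding succeeds.
typical = [(0, 1, 1, 0), (1, 1, 0, 1)]
x = (0, 1, 1, 0)
m = encode(x)
assert decode(m, typical) == x
```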
Analysis of the probability of error. We show that the probability of error, averaged over parity-check matrices, tends to zero as n -> infinity if R > H(X). Let M = HX^n denote the random message and write M = [M_1, M_2, ..., M_{nR}]. Since the elements of H are independent, it can be easily shown that the bits M_i(x^n), i in [1 : nR], are i.i.d. Bern(1/2) for each x^n != 0, and that M(x^n) and M(xtilde^n) are independent for each xtilde^n != x^n. The decoder makes an error iff one or both of the following events occur:

    E_1 = {X^n not in T_eps},
    E_2 = {H xtilde^n = HX^n for some xtilde^n != X^n, xtilde^n in T_eps}.

Then by the union of events bound, the average probability of error is upper bounded as

    P(E) <= P(E_1) + P(E_2).

We now bound each term. By the LLN, P(E_1) tends to zero as n -> infinity. For the second term, note that H xtilde^n = HX^n iff H(xtilde^n xor X^n) = 0. Since xtilde^n xor X^n != 0, each of the nR elements of H(xtilde^n xor X^n) is i.i.d. Bern(1/2), so the probability that they are all zero is (1/2)^{nR}. Hence,

    P(E_2) <= sum_{x^n} p(x^n) sum_{xtilde^n in T_eps : xtilde^n != x^n} P{H(xtilde^n xor x^n) = 0}
           <= 2^{n(H(X) + delta(eps))} 2^{-nR},

which tends to zero as n -> infinity if R > H(X) + delta(eps). Hence, there must exist a parity-check matrix with lim_{n->inf} P(e) = 0.
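The key collision fact, that P{H xtilde^n = H x^n} = 2^{-nR} for a fixed xtilde^n != x^n, can be verified exhaustively at toy sizes (n = 3, nR = 2 and the difference vector d are arbitrary choices):

```python
from itertools import product

# For a fixed nonzero binary vector d = xtilde xor x, a random k x n matrix H
# with i.i.d. Bern(1/2) entries satisfies P{Hd = 0} = 2^{-k}. Tiny sizes are
# chosen so that all matrices can be enumerated.
n, k = 3, 2
d = (1, 0, 1)  # arbitrary nonzero difference vector

def times(H, v):
    """Matrix-vector product over GF(2)."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)

collisions = 0
total = 0
for flat in product((0, 1), repeat=k * n):
    H = [flat[i * n:(i + 1) * n] for i in range(k)]
    total += 1
    if times(H, d) == (0,) * k:
        collisions += 1

assert collisions / total == 2 ** (-k)  # fraction of matrices causing a collision
print(collisions, total)
```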