EE376A: Homework #4 Solutions
Due on Thursday, February 22, 2018. Please submit on Gradescope. Start every question on a new page.


1. Maximum Differential Entropy

(a) Show that among all distributions supported in an interval $[a, b]$, the uniform distribution maximizes differential entropy.

(b) Let $X$ be a continuous random variable with $E[X^4] \le \sigma^4$ and let $Y$ be a continuous random variable with probability density function
$$g(y) = c \exp\left(-\frac{y^4}{4\sigma^4}\right), \qquad \text{where } c = \frac{1}{\int \exp\left(-\frac{y^4}{4\sigma^4}\right) dy}.$$
Show that $h(X) \le h(Y)$, with equality if and only if $X$ is distributed as $Y$. [Hint: you can use the fact that $E[Y^4] = \sigma^4$.]

Solution: Maximum Differential Entropy

(a) Denote by $u(x)$ the uniform density on $[a, b]$, so that $u(x) = \frac{1}{b-a}$ if $x \in [a, b]$ and $u(x) = 0$ otherwise. Let $g(x)$ be any density supported in the interval $[a, b]$, and let $X \sim g$. Then
$$0 \le D(g \| u) = \int g(x) \log \frac{g(x)}{u(x)} \, dx = \int g(x) \log\big((b-a) g(x)\big) \, dx = \log(b-a) + \int g(x) \log g(x) \, dx = \log(b-a) - h(X),$$
which implies $h(X) \le \log(b-a)$. On the other hand, if $X$ is uniformly distributed on $[a, b]$, then
$$h(X) = -\int u(x) \log u(x) \, dx = \int u(x) \log(b-a) \, dx = \log(b-a),$$
which finishes the proof.
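As a quick numerical sanity check of part (a) (not part of the original solution), the following minimal sketch compares the differential entropy of the uniform density on $[a, b]$ against two other densities supported on the same interval. The interval and the Beta densities used for comparison are arbitrary choices.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Numerical check for Problem 1(a): among densities supported on [a, b],
# the uniform density attains the maximum differential entropy log(b - a).
a, b = 0.0, 2.0

def diff_entropy(pdf, lo, hi):
    """Numerically integrate -f(x) log f(x) over [lo, hi] (in nats)."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    value, _ = quad(integrand, lo, hi)
    return value

# Uniform on [a, b] and two Beta densities rescaled to [a, b] for comparison.
candidates = {
    "Uniform[a, b]": stats.uniform(loc=a, scale=b - a).pdf,
    "Beta(2, 2) rescaled to [a, b]": stats.beta(2, 2, loc=a, scale=b - a).pdf,
    "Beta(5, 1) rescaled to [a, b]": stats.beta(5, 1, loc=a, scale=b - a).pdf,
}

print(f"log(b - a) = {np.log(b - a):.4f} nats")
for name, pdf in candidates.items():
    print(f"h({name}) = {diff_entropy(pdf, a, b):.4f} nats")
# The uniform entry matches log(b - a); the other two come out strictly smaller.
```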

(b) Since $E[X^4] \le \sigma^4 = E[Y^4]$, we have
$$0 \le D(f_X \| g) = E\left[\log \frac{f_X(X)}{g(X)}\right] = -h(X) + E[-\log g(X)] = -h(X) + E\left[-\log c + \frac{X^4}{4\sigma^4} \log e\right] \le -h(X) + E\left[-\log c + \frac{Y^4}{4\sigma^4} \log e\right] = -h(X) + E[-\log g(Y)] = -h(X) + h(Y).$$
Therefore $h(Y) \ge h(X) + D(f_X \| g) \ge h(X)$, with equality if and only if $D(f_X \| g) = 0$, i.e., $X$ is distributed as $Y$.
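As an illustrative numerical check of part (b) (not in the original solution), the sketch below evaluates $h(Y)$ for the quartic-exponential density with $\sigma = 1$ and compares it to the differential entropy of a zero-mean Gaussian chosen so that its fourth moment also equals $\sigma^4$ (variance $\sigma^2/\sqrt{3}$, since a Gaussian with variance $s^2$ has fourth moment $3s^4$). The choice of comparison density is mine; any density with $E[X^4] \le \sigma^4$ would do.

```python
import numpy as np
from scipy.integrate import quad

# Check for Problem 1(b) with sigma = 1: h(Y) for g(y) = c * exp(-y^4 / 4)
# should exceed h(X) for any density with fourth moment <= sigma^4.
sigma = 1.0

# Normalizing constant c = 1 / integral of exp(-y^4 / (4 sigma^4)).
Z, _ = quad(lambda y: np.exp(-y**4 / (4 * sigma**4)), -np.inf, np.inf)
c = 1.0 / Z

def g(y):
    return c * np.exp(-y**4 / (4 * sigma**4))

def neg_g_log_g(y):
    gy = g(y)
    return -gy * np.log(gy) if gy > 0 else 0.0   # guard against underflow

h_Y, _ = quad(neg_g_log_g, -np.inf, np.inf)      # h(Y) in nats

# Gaussian with the same fourth moment: E[X^4] = 3 s^4 = sigma^4.
s2 = sigma**2 / np.sqrt(3.0)
h_gauss = 0.5 * np.log(2 * np.pi * np.e * s2)

print(f"h(Y)        = {h_Y:.4f} nats")     # about 1.19
print(f"h(Gaussian) = {h_gauss:.4f} nats")  # about 1.14, strictly smaller
```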

2. Cascaded BSCs

Consider the two discrete memoryless channels $(\mathcal{X}, p_1(y|x), \mathcal{Y})$ and $(\mathcal{Y}, p_2(z|y), \mathcal{Z})$. Let $p_1(y|x)$ and $p_2(z|y)$ be binary symmetric channels with crossover probabilities $\lambda_1$ and $\lambda_2$ respectively.

[Figure: the cascade $X \to \mathrm{BSC}(\lambda_1) \to Y \to \mathrm{BSC}(\lambda_2) \to Z$.]

(a) What is the capacity $C_1$ of $p_1(y|x)$?

(b) What is the capacity $C_2$ of $p_2(z|y)$?

(c) We now cascade these channels. Thus $p_3(z|x) = \sum_y p_1(y|x) \, p_2(z|y)$. What is the capacity $C_3$ of $p_3(z|x)$?

(d) Now let us actively intervene between channels 1 and 2, rather than passively transmit $y^n$. What is the capacity of channel 1 followed by channel 2 if you are allowed to decode the output $y^n$ of channel 1 and then re-encode it as $\tilde{y}^n(y^n)$ for transmission over channel 2? (Think $W \to x^n(W) \to y^n \to \tilde{y}^n(y^n) \to z^n \to \hat{W}$.)

(e) What is the capacity of the cascade in part (c) if the receiver can view both $Y$ and $Z$?

Solution: Cascaded BSCs

(a) $C_1$ is just the capacity of a $\mathrm{BSC}(\lambda_1)$. Thus, $C_1 = 1 - H(\lambda_1)$.

(b) Similarly, $C_2 = 1 - H(\lambda_2)$.

(c) First observe that the cascaded channel is also a BSC. Since the new BSC has crossover probability $p_3 = \lambda_1(1 - \lambda_2) + (1 - \lambda_1)\lambda_2 = \lambda_1 + \lambda_2 - 2\lambda_1\lambda_2$,
$$C_3 = 1 - H(\lambda_1 + \lambda_2 - 2\lambda_1\lambda_2).$$
Note that the new channel is noisier than either of the original two, since by concavity of $H(p)$,
$$H\big((1 - \lambda_1)\lambda_2 + \lambda_1(1 - \lambda_2)\big) \ge \lambda_2 H(1 - \lambda_1) + (1 - \lambda_2) H(\lambda_1) = H(\lambda_1),$$
and similarly $H(\lambda_1 + \lambda_2 - 2\lambda_1\lambda_2) \ge H(\lambda_2)$. Thus, $C_3 \le \min\{C_1, C_2\}$.

(d) Since we are allowed to decode the intermediate outputs and re-encode them prior to the second transmission, any rate less than both $C_1$ and $C_2$ is achievable, and at the same time any rate greater than either $C_1$ or $C_2$ will cause the probability of error $P_e^{(n)}$ to approach 1 exponentially fast. Hence, the overall capacity is the minimum of the two capacities,
$$\min(C_1, C_2) = \min(1 - H(\lambda_1), \, 1 - H(\lambda_2)).$$

(e) Note that $Z$ becomes irrelevant once we observe $Y$. Thus, the capacity of this channel is just $C_1 = 1 - H(\lambda_1)$. Alternatively, $X \to Y \to (Y, Z)$ forms a Markov chain, so that $I(X; Y) \ge I(X; Y, Z)$. On the other hand, $I(X; Y) \le I(X; Y, Z)$, since we can always ignore the observation $Z$. (Or: $X \to (Y, Z) \to Y$ also forms a Markov chain.) Hence $I(X; Y) = I(X; Y, Z)$, and the capacity in this case is $C_1$.
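A short numerical illustration of parts (a)-(c) (not part of the original solution): for a sample pair of crossover probabilities, chosen arbitrarily, the sketch below evaluates $C_1$, $C_2$, and $C_3$ and confirms that the cascade is no better than the weaker of the two links.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

lam1, lam2 = 0.1, 0.2                      # example crossover probabilities

C1 = 1 - binary_entropy(lam1)              # part (a)
C2 = 1 - binary_entropy(lam2)              # part (b)
p3 = lam1 + lam2 - 2 * lam1 * lam2         # crossover of the cascaded BSC
C3 = 1 - binary_entropy(p3)                # part (c)

print(f"C1 = {C1:.4f}, C2 = {C2:.4f}, C3 = {C3:.4f} bits")
assert C3 <= min(C1, C2) + 1e-12           # cascading cannot beat the weaker link
```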

3. Tensor Power Trick

We have seen the proof of Kraft's inequality for uniquely decodable codes via the tensor power trick: we upper bound $\left(\sum_i 2^{-l_i}\right)^k$ and then let $k \to \infty$. This is a powerful tool in various problems (e.g., harmonic analysis) where some product structure is available. In this problem we look at another application in information theory.

Let $(X_1, Y_1), \ldots, (X_n, Y_n) \sim (X, Y)$ be i.i.d. discrete random variables. For any $\epsilon > 0$, define the following $\epsilon$-typical sets:
$$A_\epsilon^{(n)}(X) = \left\{ (x^n, y^n) : \left| -\tfrac{1}{n} \log p(x^n) - H(X) \right| \le \epsilon \right\}$$
$$A_\epsilon^{(n)}(Y) = \left\{ (x^n, y^n) : \left| -\tfrac{1}{n} \log p(y^n) - H(Y) \right| \le \epsilon \right\}$$
$$A_\epsilon^{(n)}(X, Y) = \left\{ (x^n, y^n) : \left| -\tfrac{1}{n} \log p(x^n, y^n) - H(X, Y) \right| \le \epsilon \right\}$$
and define $A_\epsilon^{(n)} = A_\epsilon^{(n)}(X) \cap A_\epsilon^{(n)}(Y) \cap A_\epsilon^{(n)}(X, Y)$.

(a) Show that $P\big((X^n, Y^n) \in A_\epsilon^{(n)}\big) \to 1$ as $n \to \infty$.

(b) Show that for $n$ large enough, we have
$$(1 - \epsilon) \, 2^{n(H(X,Y) - \epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X) + \epsilon)} \, 2^{n(H(Y) + \epsilon)}.$$

(c) Conclude from (b) that $H(X, Y) \le H(X) + H(Y)$ by taking $n \to \infty$ and $\epsilon \to 0$. This gives another proof of $I(X; Y) \ge 0$ without using any convexity/concavity of mutual information and/or KL divergence.

Solution: Tensor Power Trick

(a) (From Lecture 9, courtesy of the scribes.) We apply the weak law of large numbers, i.e., convergence in probability, to each of the three conditions defining the jointly typical set. That is, there exist $n_1, n_2, n_3$ such that for all $n > n_1$ we have
$$P\left( \left| -\tfrac{1}{n} \log p(X^n) - H(X) \right| > \epsilon \right) < \epsilon/3,$$
for all $n > n_2$ we have
$$P\left( \left| -\tfrac{1}{n} \log p(Y^n) - H(Y) \right| > \epsilon \right) < \epsilon/3,$$
and for all $n > n_3$ we have
$$P\left( \left| -\tfrac{1}{n} \log p(X^n, Y^n) - H(X, Y) \right| > \epsilon \right) < \epsilon/3.$$
All three hold for $n$ greater than the largest of $n_1, n_2, n_3$. Therefore, by the union bound, the probability that $(X^n, Y^n)$ violates at least one of these conditions is less than $\epsilon$, so for $n$ sufficiently large the probability of the set $A_\epsilon^{(n)}$ is greater than $1 - \epsilon$. Since $\epsilon > 0$ is arbitrary, $P\big((X^n, Y^n) \in A_\epsilon^{(n)}\big) \to 1$.
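Below is a small simulation (not in the original solution) illustrating part (a): for a fixed joint pmf on $\{0,1\}^2$ and a fixed $\epsilon$, the empirical probability of the jointly typical set $A_\epsilon^{(n)}$ climbs toward 1 as $n$ grows. The specific pmf, $\epsilon$, and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint pmf of (X, Y) on {0,1} x {0,1} (arbitrary example).
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

H_X = -np.sum(p_x * np.log2(p_x))
H_Y = -np.sum(p_y * np.log2(p_y))
H_XY = -np.sum(p_xy * np.log2(p_xy))

eps, trials = 0.1, 2000

def prob_typical(n):
    """Empirical P((X^n, Y^n) in A_eps^(n)) over `trials` sampled sequences."""
    hits = 0
    flat = p_xy.ravel()
    for _ in range(trials):
        idx = rng.choice(4, size=n, p=flat)      # n i.i.d. pairs, encoded 0..3
        x, y = idx // 2, idx % 2
        lx = -np.mean(np.log2(p_x[x]))           # -(1/n) log p(x^n)
        ly = -np.mean(np.log2(p_y[y]))           # -(1/n) log p(y^n)
        lxy = -np.mean(np.log2(p_xy[x, y]))      # -(1/n) log p(x^n, y^n)
        if abs(lx - H_X) <= eps and abs(ly - H_Y) <= eps and abs(lxy - H_XY) <= eps:
            hits += 1
    return hits / trials

for n in [25, 100, 400, 1600]:
    print(f"n = {n:5d}: empirical P(A_eps^(n)) ~ {prob_typical(n):.3f}")
```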

(b) (The lower bound is from Lecture 9, courtesy of the scribes.)

Upper bound: First suppose we have $S_x \subseteq \mathcal{X}^n$ and $S_y \subseteq \mathcal{Y}^n$. Then
$$S_x \times S_y = \{ (x^n, y^n) : x^n \in S_x, \, y^n \in S_y \} \subseteq \mathcal{X}^n \times \mathcal{Y}^n \qquad \text{and} \qquad |S_x \times S_y| = |S_x| \, |S_y|.$$
Now define
$$S_x = \left\{ x^n : \left| -\tfrac{1}{n} \log p(x^n) - H(X) \right| \le \epsilon \right\}, \qquad S_y = \left\{ y^n : \left| -\tfrac{1}{n} \log p(y^n) - H(Y) \right| \le \epsilon \right\}.$$
Then by the AEP, we know that $|S_x| \le 2^{n(H(X)+\epsilon)}$ and $|S_y| \le 2^{n(H(Y)+\epsilon)}$. Also observe that $S_x \times S_y = A_\epsilon^{(n)}(X) \cap A_\epsilon^{(n)}(Y)$, and hence
$$A_\epsilon^{(n)} \subseteq A_\epsilon^{(n)}(X) \cap A_\epsilon^{(n)}(Y) = S_x \times S_y \quad \Longrightarrow \quad |A_\epsilon^{(n)}| \le |S_x| \, |S_y| \le 2^{n(H(X)+\epsilon)} \, 2^{n(H(Y)+\epsilon)}.$$

Lower bound: By part (a), $P\big((X^n, Y^n) \in A_\epsilon^{(n)}\big) \ge 1 - \epsilon$ for $n$ large enough. Since every $(x^n, y^n) \in A_\epsilon^{(n)}$ satisfies $p(x^n, y^n) \le 2^{-n(H(X,Y)-\epsilon)}$, for such $n$ we have
$$1 - \epsilon \le P\big((X^n, Y^n) \in A_\epsilon^{(n)}\big) = \sum_{(x^n, y^n) \in A_\epsilon^{(n)}} p(x^n, y^n) \le \sum_{(x^n, y^n) \in A_\epsilon^{(n)}} 2^{-n(H(X,Y)-\epsilon)} = 2^{-n(H(X,Y)-\epsilon)} \, |A_\epsilon^{(n)}|,$$
so $|A_\epsilon^{(n)}| \ge (1 - \epsilon) \, 2^{n(H(X,Y)-\epsilon)}$.

(c) Combining the two bounds from (b) and taking logarithms,
$$\log(1 - \epsilon) + n(H(X,Y) - \epsilon) \le n(H(X) + \epsilon) + n(H(Y) + \epsilon).$$
Dividing by $n$,
$$\frac{\log(1 - \epsilon)}{n} + H(X,Y) - \epsilon \le H(X) + \epsilon + H(Y) + \epsilon.$$
Letting $n \to \infty$ for fixed $\epsilon$,
$$H(X,Y) - \epsilon \le H(X) + H(Y) + 2\epsilon.$$
This holds for all $\epsilon > 0$, so letting $\epsilon \to 0$ we get $H(X,Y) \le H(X) + H(Y)$.
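The conclusion of part (c), $H(X,Y) \le H(X) + H(Y)$, is easy to spot-check numerically. The sketch below (not part of the original solution) draws a random joint pmf and compares the joint entropy against the sum of the marginal entropies; the alphabet sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    """Shannon entropy in bits of a pmf given as a flat array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf over a 4 x 5 alphabet.
p_xy = rng.random((4, 5))
p_xy /= p_xy.sum()

H_XY = entropy(p_xy.ravel())
H_X = entropy(p_xy.sum(axis=1))
H_Y = entropy(p_xy.sum(axis=0))

print(f"H(X,Y) = {H_XY:.4f} bits, H(X) + H(Y) = {H_X + H_Y:.4f} bits")
assert H_XY <= H_X + H_Y + 1e-12   # subadditivity, i.e. I(X;Y) >= 0
```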

and h(z) = log 2, since Z U(, ). The output Y is a sum a of a discrete and a continuous random variable, and if the probabilities of X are p 2, p,..., p 2, then the output distribution of Y has a uniform distribution with weight p 2 /2 for 3 Y 2, uniform with weight (p 2 + p )/2 for 2 Y, etc. Given that Y ranges from -3 to 3, the maximum entropy that it can have is an uniform over this range. This can be achieved if the distribution of X is (/3,, /3,,/3). Then h(y ) = log 6 and the capacity of this channel is C = log 6 log 2 = log 3 bits. Homework 4 Page 6 of 9

5. Exponential Noise Channel and Exponential Source

Recall that $X \sim \mathrm{Exp}(\lambda)$ is to say that $X$ is a continuous non-negative random variable with density
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$
or, equivalently, that $X$ is a random variable with characteristic function
$$\varphi_X(t) = E\left[e^{itX}\right] = \frac{1}{1 - it/\lambda}.$$
Recall also that in this case $E[X] = 1/\lambda$.

(a) Find the differential entropy of $X \sim \mathrm{Exp}(\lambda)$.

(b) Prove that $\mathrm{Exp}(\lambda)$ uniquely maximizes the differential entropy among all non-negative random variables confined to $E[X] \le 1/\lambda$. Hint: Recall our proof of an analogous fact for the Gaussian distribution.

Fix positive scalars $a$ and $b$. Let $X^*$ be the non-negative random variable of mean $a$ formed by taking $X^* = 0$ with probability $\frac{b}{a+b}$ and, with probability $\frac{a}{a+b}$, drawing from an exponential distribution $\mathrm{Exp}(1/(a+b))$. Equivalently stated, $X^*$ is the random variable with characteristic function
$$\varphi_{X^*}(t) = \frac{b}{a+b} + \frac{a}{a+b} \cdot \frac{1}{1 - it(a+b)}.$$
Let $N \sim \mathrm{Exp}(1/b)$ be independent of $X^*$.

(c) What is the distribution of $X^* + N$? Tip: simplest would be to compute the characteristic function of $X^* + N$ by recalling the relation $\varphi_{X^*+N}(t) = \varphi_{X^*}(t) \, \varphi_N(t)$.

(d) Find $I(X^*; X^* + N)$.

(e) Consider the problem of communication over the additive exponential noise channel $Y = X + N$, where $N \sim \mathrm{Exp}(1/b)$ is independent of the channel input $X$, which is confined to being non-negative and satisfying the moment constraint $E[X] \le a$. Find
$$C(a) = \max I(X; X + N),$$
where the maximization is over all non-negative $X$ satisfying $E[X] \le a$. What is the capacity-achieving distribution? Hint: Using findings from previous parts, show that for any non-negative random variable $X$, independent of $N$, with $E[X] \le a$, we have $I(X; X+N) \le I(X^*; X^*+N)$.

Solution: Exponential Noise Channel and Exponential Source

(a) Working in nats,
$$h(X) = -\int_0^\infty f_X(x) \log f_X(x) \, dx = -\int_0^\infty \lambda e^{-\lambda x} \log\big(\lambda e^{-\lambda x}\big) \, dx = -\log \lambda \int_0^\infty \lambda e^{-\lambda x} \, dx + \lambda \int_0^\infty x \, \lambda e^{-\lambda x} \, dx = 1 - \log \lambda.$$

(b) Let $f_X$ be the density of any such non-negative random variable with $E[X] \le 1/\lambda$, and let $g_X$ be the density of $\mathrm{Exp}(\lambda)$ as in part (a). Then
$$h(X) = -\int_0^\infty f_X(x) \log f_X(x) \, dx = -\int_0^\infty f_X(x) \log \frac{f_X(x)}{g_X(x)} \, dx + \int_0^\infty f_X(x) \log g_X(x) \, dx = -D(f_X \| g_X) - \log \lambda \int_0^\infty f_X(x) \, dx + \lambda \int_0^\infty x f_X(x) \, dx \le -D(f_X \| g_X) + 1 - \log \lambda \le 1 - \log \lambda,$$
where the first inequality uses $E[X] \le 1/\lambda$ and the last inequality is due to the fact that $D(f_X \| g_X) \ge 0$. Equality holds if and only if $X \sim \mathrm{Exp}(\lambda)$, since that is the unique case in which $D(f_X \| g_X) = 0$ and $E[X] = 1/\lambda$.

(c)
$$\varphi_{X^*+N}(t) = \varphi_{X^*}(t) \, \varphi_N(t) = \left( \frac{b}{a+b} + \frac{a}{(a+b)\big(1 - it(a+b)\big)} \right) \frac{1}{1 - itb} = \frac{b\big(1 - it(a+b)\big) + a}{(a+b)\big(1 - it(a+b)\big)(1 - itb)} = \frac{(a+b)(1 - itb)}{(a+b)\big(1 - it(a+b)\big)(1 - itb)} = \frac{1}{1 - it(a+b)},$$
which is the characteristic function of $\mathrm{Exp}(1/(a+b))$. Thus $X^* + N$ is distributed as $\mathrm{Exp}(1/(a+b))$.

(d) Since $N$ is independent of $X^*$, $h(X^* + N \mid X^*) = h(N)$, and so
$$I(X^*; X^* + N) = h(X^* + N) - h(X^* + N \mid X^*) = h(X^* + N) - h(N) = \big(1 + \log(a+b)\big) - \big(1 + \log b\big) = \log\left(1 + \frac{a}{b}\right).$$

(e) For any feasible $X$, note that $X + N$ is a non-negative random variable with $E[X + N] = E[X] + E[N] \le a + b$; thus, by the result of part (b), $h(X + N) \le 1 + \log(a+b)$. Hence, using the independence of $X$ and $N$,
$$I(X; X+N) = h(X+N) - h(X+N \mid X) = h(X+N) - h(N) \stackrel{(*)}{\le} \big(1 + \log(a+b)\big) - h(N) = \big(1 + \log(a+b)\big) - \big(1 + \log b\big) = \log\left(1 + \frac{a}{b}\right) = I(X^*; X^* + N).$$
Thus $C(a) \le \log\left(1 + \frac{a}{b}\right)$. Equality in $(*)$ holds if $X = X^*$, since by part (c) $X^* + N \sim \mathrm{Exp}(1/(a+b))$, which attains the maximum in part (b). This proves $C(a) = \log\left(1 + \frac{a}{b}\right)$, and the capacity-achieving distribution is that of $X^*$.
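To close, a small Monte Carlo sketch (not part of the original solution) checking parts (c)-(e): samples of $X^* + N$ are compared against an $\mathrm{Exp}(1/(a+b))$ distribution via a Kolmogorov-Smirnov statistic, and the resulting capacity $\log(1 + a/b)$ is printed. The values of $a$, $b$, and the sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, n = 2.0, 1.0, 200_000

# X*: zero with probability b/(a+b), otherwise Exp with mean (a+b).
is_zero = rng.random(n) < b / (a + b)
x_star = np.where(is_zero, 0.0, rng.exponential(scale=a + b, size=n))

# Noise N ~ Exp(1/b), i.e. mean b, independent of X*.
noise = rng.exponential(scale=b, size=n)
y = x_star + noise

# Part (c): Y = X* + N should be Exp(1/(a+b)), i.e. exponential with mean a+b.
print(f"sample mean of Y = {y.mean():.3f}  (expected {a + b:.3f})")
ks = stats.kstest(y, "expon", args=(0, a + b))
print(f"KS statistic vs Exp(mean={a + b}): {ks.statistic:.4f}  (small = good fit)")

# Parts (d)-(e): capacity of the additive exponential noise channel.
print(f"C(a) = log(1 + a/b) = {np.log(1 + a / b):.4f} nats")
```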