Electrical and Information Technology
Information Theory: Problems and Solutions

Contents: Problems, Solutions

Problems

30. In Problem ?? the binomial coefficient was estimated with Stirling's approximation. Use the strong form of Stirling's approximation,

    √(2πn) (n/e)^n ≤ n! ≤ √(2πn) (n/e)^n e^{1/(12n)},

and justify the steps in the following calculations, where n is an integer and 0 < p < 1, q = 1 − p:

    (a) (n choose np) ≤ 1/√(2πnpq) · p^{−np} q^{−nq} e^{1/(12n)}
    (b)              = 1/√(2πnpq) · 2^{nh(p)} e^{1/(12n)}
    (c)              ≤ 1/√(πnpq) · 2^{nh(p)}
    (d) (n choose np) ≥ 1/√(2πnpq) · p^{−np} q^{−nq} e^{−1/(12npq)} = 1/√(2πnpq) · 2^{nh(p)} e^{−1/(12npq)}
    (e)              ≥ 1/√(8npq) · 2^{nh(p)}

where, in the last inequality, it can be assumed that npq ≥ 9. The above calculations give a useful bound on the binomial coefficient,

    2^{nh(p)}/√(8npq) ≤ (n choose np) ≤ 2^{nh(p)}/√(πnpq).

31. Encode the text

    IF IF = THEN THEN THEN = ELSE ELSE ELSE = IF;

using the LZ77 algorithm with S = 7 and B = 7. How many code symbols were generated? If each letter in the text is translated to binary form with eight bits, what is the compression ratio?

32. Encode the text

    IF IF = THEN THEN THEN = ELSE ELSE ELSE = IF;

using the LZ78 algorithm. How many code symbols were generated? If each letter in the text is translated to binary form with eight bits, what is the compression ratio?

33. Consider the sequence

    Nat the bat swat at Matt the gnat

Encode and calculate the compression rate using
(a) LZ77 with S = 10 and B = 3.
(b) LZSS with S = 10 and B = 3.
(c) LZ78.
(d) LZW with a predefined alphabet of size 256.

34. Consider the sequence

    gegeven.een.eend

(a) What source alphabet should be used?
(b) Use LZ77 with a window size N = 8 to encode and decode the sequence with a binary code alphabet.
(c) Use LZ78 to encode and decode the sequence with a binary code alphabet.
(d) How many code symbols were generated?

35. Use the LZ78 algorithm to encode and decode the string

    THE FRIEND IN NEED IS THE FRIEND INDEED

with a binary code alphabet. What is the minimal source alphabet? How many code symbols were generated?

36. Show that for all jointly ε-typical sequences, (x, y) ∈ A_ε(X, Y), we get

    2^{−n(H(X|Y)+2ε)} ≤ p(x|y) ≤ 2^{−n(H(X|Y)−2ε)}.

37. One is given a communication channel with transition probabilities p(y|x) and channel capacity C = max_{p(x)} I(X; Y). A helpful statistician preprocesses the output by forming Ỹ = g(Y). He claims that this will strictly improve the capacity.
(a) Show that he is wrong.
(b) Under what conditions does he not strictly decrease the capacity?

38. Let X ∈ Z_11 = {0, 1, ..., 10} be a random variable used as input to an additive channel, Y = X + Z mod 11, where

    p(Z = 1) = p(Z = 2) = p(Z = 3) = 1/3.

Assume that X and Z are statistically independent.

(a) What is the capacity of the channel?
(b) What distribution on p(x) gives the capacity?

39. Consider the discrete memoryless channel Y = X · Z, where X and Z are independent binary random variables. Let P(Z = 1) = α. Find the capacity of this channel and the maximizing distribution on X.

40. In Shannon's original paper from 1948, the following discrete memoryless channels are given. Calculate their channel capacities.
(a) Noisy typewriter: four inputs and four outputs, where each input symbol is received either as itself or as the (cyclically) next symbol, each with probability 1/2.
(b) Soft decoding: a binary-input channel with four outputs, whose transition probability matrix has rows that are permutations of (1/3, 1/3, 1/6, 1/6).
(c) 3-ary channel: a ternary channel whose transition probability matrix has rows and columns that are permutations of (1/2, 1/3, 1/6).

41. Consider the binary erasure channel below, where each input symbol is received correctly with probability 1 − p − q, inverted with probability p, and erased with probability q. Calculate the channel capacity.

42. Determine the channel capacity for the following Z-channel: input 0 is always received correctly, while input 1 is received as 1 with probability 1/2 and as 0 with probability 1/2.

    Hint: d h(p)/dp = log((1 − p)/p).

43. Cascade two binary symmetric channels, each with crossover probability p, as in the following picture (X → Y → Z). Determine the channel capacity.

44. Consider a linear encoder where three information bits u = (u0, u1, u2) are complemented with three parity bits according to

    v0 = u0 ⊕ u1
    v1 = u0 ⊕ u2
    v2 = u1 ⊕ u2

Hence, an information word u = (u0, u1, u2) is encoded to the codeword x = (u0, u1, u2, v0, v1, v2).
(a) What is the code rate R?
(b) Find a generator matrix G.
(c) What is the minimum distance, d_min, of the code?
(d) Find a parity check matrix H, such that GH^T = 0.
(e) Construct a syndrome table for decoding.
(f) Make an example where a three-bit vector is encoded, transmitted over a channel and decoded.

45. Show that if d_min ≥ λ + γ + 1 for a linear code, it is capable of correcting λ errors and simultaneously detecting γ errors, where γ > λ.

46. Derive the differential entropy for the following distributions:
(a) Rectangular distribution: f(x) = 1/(b − a), a ≤ x ≤ b.
(b) Normal distribution: f(x) = 1/√(2πσ²) · e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.
(c) Exponential distribution: f(x) = λ e^{−λx}, x ≥ 0.
(d) Laplace distribution: f(x) = (λ/2) e^{−λ|x|}, −∞ < x < ∞.

47. Let X1 and X2 be two independent normally distributed random variables with distributions N(µ1, σ1²) and N(µ2, σ2²), respectively. Construct a new random variable X = X1 + X2.
(a) What is the distribution of X?
(b) Derive the differential entropy of X.

48. An additive channel, Y = X + Z, has the input alphabet X ∈ {−2, −1, 0, 1, 2}. The additive random variable Z is uniformly distributed over the interval [−1, 1]. Thus, the input is a discrete random variable and the output is a continuous random variable. Derive the capacity C = max_{p(x)} I(X; Y).

49. The length X of a stick that is manufactured in a poorly managed company is uniformly distributed.
(a) The length varies between 0 and 1 meter, i.e.

    f(x) = 1, 0 ≤ x ≤ 1, and 0 otherwise.

Derive the differential entropy H(X).
(b) The length varies between 0 and 100 cm, i.e.

    f(x) = 0.01, 0 ≤ x ≤ 100, and 0 otherwise.

Derive the differential entropy H(X).

50. Consider an additive channel where the output is Y = X + Z, and the noise is normally distributed, Z ∈ N(0, σ²). The channel has an output power constraint E[Y²] ≤ P. Derive the channel capacity for the channel.

51. Consider a channel with binary input, with P(X = 0) = p and P(X = 1) = 1 − p. During the transmission, uniformly distributed noise Z on the interval [0, a], where a > 1, is added to X, i.e. Y = X + Z.
(a) Calculate the mutual information according to I(X; Y) = H(X) − H(X|Y).
(b) Calculate the mutual information according to I(X; Y) = H(Y) − H(Y|X).
(c) Calculate the capacity by maximizing over p.

52. Consider four independent, parallel, time-discrete, additive Gaussian channels. The variance of the noise in the i-th channel is σi² = i², i = 1, 2, 3, 4. The total power of the used signals is limited by

    P1 + P2 + P3 + P4 ≤ 17.

Determine the channel capacity for this parallel combination.

Solutions

30. (a) Follows from Stirling's approximation.
(b) p^{−np} q^{−nq} = 2^{−n log(p^p q^q)} = 2^{n(−p log p − q log q)} = 2^{nh(p)}.
(c) Follows from e^{1/(12n)} ≤ e^{1/12} < √2.
(d) Follows from Stirling's approximation.
(e) With npq ≥ 9 we can use e^{−1/(12npq)} ≥ e^{−1/108} > √π/2, which gives 1/√(2πnpq) · e^{−1/(12npq)} ≥ 1/√(8npq).

31. The encoding procedure can be viewed in the following table. The colon in the B-buffer denotes the stop of the encoded letters for that codeword.

    S-buffer    B-buffer      Codeword
    [IF IF =]   [ T:HEN T]    (2,1,T)
    [ IF = T]   [H:EN THE]    (0,0,H)
    [IF = TH]   [E:N THEN]    (0,0,E)
    [F = THE]   [N: THEN ]    (0,0,N)
    [ = THEN]   [ THEN TH:]   (5,7,H)
    [THEN TH]   [EN =: EL]    (5,3,=)
    [ THEN =]   [ E:LSE E]    (2,1,E)
    [HEN = E]   [L:SE ELS]    (0,0,L)
    [EN = EL]   [S:E ELSE]    (0,0,S)
    [N = ELS]   [E :ELSE ]    (3,1, )
    [= ELSE ]   [ELSE ELS:]   (5,7,S)
    [LSE ELS]   [E =: IF;]    (5,2,=)
    [ ELSE =]   [ I:F;]       (2,1,I)
    [LSE = I]   [F:;]         (0,0,F)
    [SE = IF]   [;:]          (0,0,;)

There are 15 codewords. In the uncoded text there are 45 letters, which corresponds to 360 bits. In the coded sequence we first have the buffer of 7 letters, which gives 56 bits. Then, each codeword requires 3 + 3 + 8 = 14 bits. With 15 codewords we get 7 · 8 + 15(3 + 3 + 8) = 266 bits. The compression rate becomes R = 266/360 ≈ 0.7389.
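Remark: the table above can be checked with a short Python sketch of the LZ77 variant assumed here (the first S characters are sent uncoded, the offset counts backwards from the current position, ties are broken in favour of the closest match, and every codeword carries one literal character). This is only an illustration of the convention used in this solution, not the only possible LZ77 variant.

    def lz77_encode(text, S=7, B=7):
        # Sketch of the LZ77 variant assumed above.
        codewords = []
        pos = S                                    # the first S characters are sent uncoded
        while pos < len(text):
            max_len = min(B, len(text) - pos - 1)  # leave room for the literal
            best_len, best_off = 0, 0
            for start in range(pos - S, pos):      # search window; overlapping matches allowed
                length = 0
                while length < max_len and text[start + length] == text[pos + length]:
                    length += 1
                if length > 0 and length >= best_len:
                    best_len, best_off = length, pos - start
            codewords.append((best_off, best_len, text[pos + best_len]))
            pos += best_len + 1
        return codewords

    text = "IF IF = THEN THEN THEN = ELSE ELSE ELSE = IF;"
    cw = lz77_encode(text)
    bits = 7 * 8 + len(cw) * (3 + 3 + 8)           # raw buffer + 14 bits per codeword
    print(cw)
    print(len(cw), "codewords,", bits, "bits, rate", round(bits / (8 * len(text)), 4))

This should give 15 codewords, 266 bits and a rate of about 0.739, matching the calculation above.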

32. The encoding procedure can be viewed in the following table. The colon in the binary representation of the codeword shows where the index stops and the character code begins; this separator is not necessary in the final code string. Here each character is written as its 8-bit ASCII code.

    Index   Codeword   Dictionary   Binary
    1       (0,I)      [I]          :01001001
    2       (0,F)      [F]          0:01000110
    3       (0, )      [ ]          00:00100000
    4       (1,F)      [IF]         01:01000110
    5       (3,=)      [ =]         011:00111101
    6       (3,T)      [ T]         011:01010100
    7       (0,H)      [H]          000:01001000
    8       (0,E)      [E]          000:01000101
    9       (0,N)      [N]          0000:01001110
    10      (6,H)      [ TH]        0110:01001000
    11      (8,N)      [EN]         1000:01001110
    12      (10,E)     [ THE]       1010:01000101
    13      (9, )      [N ]         1001:00100000
    14      (0,=)      [=]          0000:00111101
    15      (3,E)      [ E]         0011:01000101
    16      (0,L)      [L]          0000:01001100
    17      (0,S)      [S]          00000:01010011
    18      (8, )      [E ]         01000:00100000
    19      (8,L)      [EL]         01000:01001100
    20      (17,E)     [SE]         10001:01000101
    21      (15,L)     [ EL]        01111:01001100
    22      (20, )     [SE ]        10100:00100000
    23      (14, )     [= ]         01110:00100000
    24      (4,;)      [IF;]        00100:00111011

In the uncoded text there are 45 letters, which corresponds to 360 bits. In the coded sequence there are in total 0 + 1 + 2·2 + 4·3 + 8·4 + 8·5 = 89 bits for the indexes and 24 · 8 = 192 bits for the characters of the codewords. In total the code sequence is 89 + 192 = 281 bits. The compression rate becomes R = 281/360 ≈ 0.7806.
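Remark: the parse and the bit count above can be checked with a short LZ78 sketch in Python (assuming, as above, that the i-th codeword spends ⌈log2 i⌉ bits on the index and eight bits on the character):

    from math import ceil, log2

    def lz78_encode(text):
        # Sketch of LZ78: emit (index of the longest known phrase, next character)
        # and add the extended phrase to the dictionary.
        dictionary = {}              # phrase -> index; index 0 is the empty phrase
        codewords = []
        phrase = ""
        for ch in text:
            if phrase + ch in dictionary:
                phrase += ch
            else:
                codewords.append((dictionary.get(phrase, 0), ch))
                dictionary[phrase + ch] = len(dictionary) + 1
                phrase = ""
        if phrase:                   # unfinished phrase at the end (encoded as in Problem 33(c))
            codewords.append((dictionary.get(phrase[:-1], 0), phrase[-1]))
        return codewords

    text = "IF IF = THEN THEN THEN = ELSE ELSE ELSE = IF;"
    cw = lz78_encode(text)
    index_bits = sum(ceil(log2(i)) for i in range(1, len(cw) + 1))
    total = index_bits + 8 * len(cw)
    print(len(cw), "codewords,", total, "bits, rate", round(total / (8 * len(text)), 4))

This should give 24 codewords and 89 + 192 = 281 bits, i.e. a rate of about 0.78.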

33. (a)

    S-buffer        B-buffer    Codeword
    [Nat the ba]    [t s:]      (8,2,s)
    [ the bat s]    [w:at]      (0,0,w)
    [the bat sw]    [at a:]     (5,3,a)
    [bat swat a]    [t M:]      (3,2,M)
    [ swat at M]    [att:]      (4,2,t)
    [at at Matt]    [ t:h]      (5,1,t)
    [ at Matt t]    [h:e ]      (0,0,h)
    [at Matt th]    [e: g]      (0,0,e)
    [t Matt the]    [ g:n]      (4,1,g)
    [Matt the g]    [n:at]      (0,0,n)
    [att the gn]    [at:]       (10,1,t)

Text: 264 bits, Code: 10·8 + 11·(4 + 2 + 8) = 234 bits, Rate: 234/264 ≈ 0.8864

(b)

    S-buffer        B-buffer    Codeword
    [Nat the ba]    [t :s]      (1,8,2)
    [t the bat ]    [s:wa]      (0,s)
    [ the bat s]    [w:at]      (0,w)
    [the bat sw]    [at :]      (1,5,3)
    [ bat swat ]    [at :]      (1,3,3)
    [t swat at ]    [M:at]      (0,M)
    [ swat at M]    [at:t]      (1,4,2)
    [wat at Mat]    [t :t]      (1,5,2)
    [t at Matt ]    [t:he]      (1,2,1)
    [ at Matt t]    [h:e ]      (0,h)
    [at Matt th]    [e: g]      (0,e)
    [t Matt the]    [ :gn]      (1,4,1)
    [ Matt the ]    [g:na]      (0,g)
    [Matt the g]    [n:at]      (0,n)
    [att the gn]    [at:]       (1,10,2)

Text: 264 bits, Code: 10·8 + 8·(1 + 4 + 2) + 7·(1 + 8) = 199 bits, Rate: 199/264 ≈ 0.7538

(c)

    Index   Codeword   Dictionary
    1       (0,N)      [N]
    2       (0,a)      [a]
    3       (0,t)      [t]
    4       (0, )      [ ]
    5       (3,h)      [th]
    6       (0,e)      [e]
    7       (4,b)      [ b]
    8       (2,t)      [at]
    9       (4,s)      [ s]
    10      (0,w)      [w]
    11      (8, )      [at ]
    12      (11,M)     [at M]
    13      (8,t)      [att]
    14      (4,t)      [ t]
    15      (0,h)      [h]
    16      (6, )      [e ]
    17      (0,g)      [g]
    18      (0,n)      [n]
    19      (2,t)      --

Text: 264 bits, Code: 216 bits, Rate: 216/264 ≈ 0.8182

(d)

    Index   Codeword   Dictionary
    32                 [ ]
    77                 [M]
    78                 [N]
    97                 [a]
    98                 [b]
    101                [e]
    103                [g]
    104                [h]
    110                [n]
    115                [s]
    116                [t]
    119                [w]
    256     78         [Na]
    257     97         [at]
    258     116        [t ]
    259     32         [ t]
    260     116        [th]
    261     104        [he]
    262     101        [e ]
    263     32         [ b]
    264     98         [ba]
    265     257        [at ]
    266     32         [ s]
    267     115        [sw]
    268     119        [wa]
    269     265        [at a]
    270     265        [at M]
    271     77         [Ma]
    272     257        [att]
    273     258        [t t]
    274     260        [the]
    275     262        [e g]
    276     103        [gn]
    277     110        [na]
    278     257        --

Text: 264 bits, Code: 206 bits, Rate: 206/264 ≈ 0.7803
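Remark: part (d) can be checked with a short LZW sketch in Python (assuming, as in the table above, a predefined 256-entry alphabet and an output code width that grows with the dictionary, i.e. 8 bits for the first code and 9 bits afterwards):

    def lzw_encode(text):
        # Sketch of LZW with a predefined 256-entry alphabet.
        dictionary = {chr(i): i for i in range(256)}
        codes, widths = [], []
        phrase = text[0]
        for ch in text[1:]:
            if phrase + ch in dictionary:
                phrase += ch
            else:
                codes.append(dictionary[phrase])
                widths.append(max(8, (len(dictionary) - 1).bit_length()))
                dictionary[phrase + ch] = len(dictionary)
                phrase = ch
        codes.append(dictionary[phrase])
        widths.append(max(8, (len(dictionary) - 1).bit_length()))
        return codes, widths

    text = "Nat the bat swat at Matt the gnat"
    codes, widths = lzw_encode(text)
    print(len(codes), "codes,", sum(widths), "bits, rate",
          round(sum(widths) / (8 * len(text)), 4))

This should give 23 output codes and 206 bits, i.e. a rate of about 0.78.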

34.

35. Encoding:

    Step   Lexicon   Prefix   New symbol   Codeword (pointer, new symbol)
    0      --        --       [T]          (0,T)
    1      [T]       --       [H]          (0,H)
    2      [H]       --       [E]          (0,E)
    3      [E]       --       [ ]          (0, )
    4      [ ]       --       [F]          (0,F)
    5      [F]       --       [R]          (0,R)
    6      [R]       --       [I]          (0,I)
    7      [I]       [E]      [N]          (3,N)
    8      [EN]      --       [D]          (0,D)
    9      [D]       [ ]      [I]          (4,I)
    10     [ I]      --       [N]          (0,N)
    11     [N]       [ ]      [N]          (4,N)
    12     [ N]      [E]      [E]          (3,E)
    13     [EE]      [D]      [ ]          (9, )
    14     [D ]      [I]      [S]          (7,S)
    15     [IS]      [ ]      [T]          (4,T)
    16     [ T]      [H]      [E]          (2,E)
    17     [HE]      [ ]      [F]          (4,F)
    18     [ F]      [R]      [I]          (6,I)
    19     [RI]      [EN]     [D]          (8,D)
    20     [END]     [ I]     [N]          (10,N)
    21     [ IN]     [D]      [E]          (9,E)
    22     [DE]      [E]      [D]          (3,D)

The length of the code sequence is 268 bits (84 bits for the pointers and 23 · 8 = 184 bits for the new symbols). Assume that the source alphabet is ASCII; then the source sequence is of length 39 · 8 = 312 bits. There are only ten different symbols in the sequence, therefore we can use a 10-letter alphabet, {T,H,E,-,F,R,I,N,D,S}, with four bits per symbol. In that case we get 39 · 4 = 156 bits as the source sequence.

36. The definition of jointly typical sequences can be rewritten as

    2^{−n(H(X,Y)+ε)} ≤ p(x, y) ≤ 2^{−n(H(X,Y)−ε)}

and

    2^{−n(H(Y)+ε)} ≤ p(y) ≤ 2^{−n(H(Y)−ε)}.

Dividing these and using the chain rule concludes the proof.

37. (a) According to the data processing inequality we have that I(X; Y) ≥ I(X; Ỹ), where X → Y → Ỹ forms a Markov chain. Now, if p̃(x) maximizes I(X; Ỹ), we have that

    C = max_{p(x)} I(X; Y) ≥ I(X; Y)|_{p(x)=p̃(x)} ≥ I(X; Ỹ)|_{p(x)=p̃(x)} = max_{p(x)} I(X; Ỹ) = C̃,

so the preprocessing cannot increase the capacity.
(b) The capacity is not decreased only if we have equality in the data processing inequality, that is, when X → Ỹ → Y also forms a Markov chain.

38. (a) Since X and Z are independent, H(Y|X) = H(X + Z|X) = H(Z|X) = H(Z) = log 3. The capacity becomes

    C = max_{p(x)} I(X; Y) = max_{p(x)} H(Y) − log 3 = log 11 − log 3 = log(11/3).

(b) The maximum is achieved for uniform Y, which by symmetry is achieved for uniform X, i.e. p(x_i) = 1/11.

39. Assume that P(X = 1) = p and P(X = 0) = 1 − p. Then

    P(Y = 1) = P(X = 1)P(Z = 1) = αp,  P(Y = 0) = 1 − αp.

Then

    I(X; Y) = H(Y) − H(Y|X) = h(αp) − ((1 − p)h(0) + p h(α)) = h(αp) − p h(α).

Differentiating with respect to p gives the maximizing

    p* = 1 / (α(2^{h(α)/α} + 1)).

The capacity is

    C = h(αp*) − p* h(α) = log(2^{h(α)/α} + 1) − h(α)/α.

40. (a) C = log 4 − h(1/2) = 2 − 1 = 1.
(b) C = log 4 − H(1/3, 1/3, 1/6, 1/6) ≈ 0.0817.
(c) C = log 3 − H(1/2, 1/3, 1/6) ≈ 0.126.

41. By assuming that P(X = 0) = π and P(X = 1) = 1 − π we get the following:

    H(Y) = H(π(1 − p − q) + (1 − π)p, πq + (1 − π)q, (1 − π)(1 − p − q) + πp)
         = H(π − 2pπ − qπ + p, q, 1 − p − q − π + 2pπ + qπ)
         = h(q) + (1 − q) H((π − 2pπ − qπ + p)/(1 − q), (1 − p − q − π + 2pπ + qπ)/(1 − q))
         ≤ h(q) + (1 − q),

with equality if π = 1/2, where H(1/2, 1/2) = 1. Hence

    C = max_{p(x)} I(X; Y) = max_{p(x)} (H(Y) − H(Y|X)) = h(q) + (1 − q) − H(p, q, 1 − p − q)
      = (1 − q)(1 − H(p/(1 − q), (1 − p − q)/(1 − q))).
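Remark: the capacities in solutions 38-40 can be checked numerically. The Python sketch below evaluates the closed forms and, for solution 39, compares the closed-form capacity with a brute-force maximization over p (α = 0.3 is just an example value, not part of the problem):

    from math import log2

    def h(p):                       # binary entropy in bits
        return 0.0 if p in (0.0, 1.0) else -p*log2(p) - (1 - p)*log2(1 - p)

    def H(*probs):                  # entropy of a finite distribution in bits
        return -sum(p*log2(p) for p in probs if p > 0)

    # Solution 38: C = log 11 - log 3
    print(round(log2(11) - log2(3), 4))                    # about 1.8745

    # Solution 39: closed form vs. grid search over p
    alpha = 0.3
    closed = log2(2**(h(alpha)/alpha) + 1) - h(alpha)/alpha
    grid = max(h(alpha*p) - p*h(alpha) for p in (i/100000 for i in range(100001)))
    print(round(closed, 6), round(grid, 6))                # the two should agree closely

    # Solution 40: symmetric channels, C = log|Y| - H(row)
    print(round(2 - h(1/2), 4))                            # (a) 1.0
    print(round(2 - H(1/3, 1/3, 1/6, 1/6), 4))             # (b) about 0.0817
    print(round(log2(3) - H(1/2, 1/3, 1/6), 4))            # (c) about 0.126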

42. Assume that P(X = 1) = A and P(X = 0) = 1 − A. Then

    H(Y) = H((1 − A) + A/2, A/2) = H(1 − A/2, A/2) = h(A/2)
    H(Y|X) = P(X = 0) H(Y|X = 0) + P(X = 1) H(Y|X = 1) = A h(1/2) = A

and we conclude

    C = max_{p(x)} { h(A/2) − A }.

Differentiation with respect to A gives the optimal à = 2/5 and

    C = h(Ã/2) − à = h(1/5) − 2/5 ≈ 0.3219.

43. By cascading two BSCs we get the following transition probabilities:

    P(Z = 0 | X = 0) = (1 − p)² + p²
    P(Z = 1 | X = 0) = p(1 − p) + (1 − p)p = 2p(1 − p)
    P(Z = 0 | X = 1) = 2p(1 − p)
    P(Z = 1 | X = 1) = (1 − p)² + p²

This channel can be seen as a new BSC with crossover probability ε = 2p(1 − p). The capacity for this channel becomes C = 1 − h(ε) = 1 − h(2p(1 − p)).
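Remark: both solutions can be checked numerically with the Python sketch below; the grid search should find the optimum à = 2/5 for the Z-channel, and the cascade of two BSCs is verified to be a BSC with crossover probability 2p(1 − p). Here p = 0.1 is just an example value, not part of the problem.

    from math import log2

    def h(p):
        return 0.0 if p in (0.0, 1.0) else -p*log2(p) - (1 - p)*log2(1 - p)

    # Solution 42: grid search over A = P(X = 1) for the Z-channel
    best_I, best_A = max((h(A/2) - A, A) for A in (i/100000 for i in range(100001)))
    print(round(best_A, 3), round(best_I, 4), round(log2(5) - 2, 4))   # 0.4, ~0.3219, ~0.3219

    # Solution 43: cascade of two BSCs with crossover probability p
    p = 0.1
    bsc = [[1 - p, p], [p, 1 - p]]
    comp = [[sum(bsc[i][k]*bsc[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    eps = 2*p*(1 - p)
    print(comp)                         # composite channel: crossover 0.18
    print(round(1 - h(eps), 4))         # capacity about 0.32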

44. (a) R = 3/6 = 1/2.
(b) Find the codewords for u1 = (100), u2 = (010) and u3 = (001) and form the generator matrix

    G = ( 1 0 0 1 1 0 )
        ( 0 1 0 1 0 1 ) = (I P).
        ( 0 0 1 0 1 1 )

(c) List all codewords:

    u      x
    000    000 000
    100    100 110
    010    010 101
    001    001 011
    110    110 011
    101    101 101
    011    011 110
    111    111 000

Then we get d_min = min_{x ≠ 0} w_H(x) = 3.
(d) From part (b) we note that G = (I P). Since

    (I P)(P^T I)^T = P + P = 0

we get

    H = (P^T I) = ( 1 1 0 1 0 0 )
                  ( 1 0 1 0 1 0 ).
                  ( 0 1 1 0 0 1 )

(e) List the most probable error patterns and their syndromes s = eH^T:

    e         s
    000 000   000
    100 000   110
    010 000   101
    001 000   011
    000 100   100
    000 010   010
    000 001   001
    100 001   111

where the last row is one of the weight-two vectors that gives the syndrome (111).
(f) One (correctable) error:

    u = (101), x = (101 101), e = (000 100), y = x ⊕ e = (101 001),
    s = yH^T = (100), ê = (000 100), x̂ = y ⊕ ê = (101 101), û = (101) = u.

An uncorrectable error:

    u = (101), x = (101 101), e = (110 000), y = x ⊕ e = (011 101),
    s = yH^T = (011), ê = (001 000), x̂ = y ⊕ ê = (010 101), û = (010) ≠ u.

45. Consider the geometry of F_2^n around two codewords x_i and x_j at distance d_min.

(Figure: two codewords x_i and x_j at distance d_min, each surrounded by a decoding sphere of radius λ; the gap between the sphere of one codeword and the other codeword is at least γ + 1.)

A received word that is at Hamming distance at most λ from a codeword is corrected to that codeword. This is indicated by a sphere with radius λ around each codeword. Received words that lie outside every sphere are detected to be erroneous. The distance from one codeword to the sphere around another codeword must then be at least γ + 1, the number of detected errors, and the minimal distance between two codewords must be at least γ + 1 + λ. Hence, d_min ≥ λ + γ + 1.
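Remark: solution 44 can be verified with the small Python sketch below (using the generator and parity check matrices given above); it recomputes d_min, builds the syndrome table and repeats the decoding example with one correctable error.

    import itertools

    G = [[1,0,0, 1,1,0],
         [0,1,0, 1,0,1],
         [0,0,1, 0,1,1]]
    H = [[1,1,0, 1,0,0],
         [1,0,1, 0,1,0],
         [0,1,1, 0,0,1]]

    def encode(u):                       # x = uG over GF(2)
        return tuple(sum(u[i]*G[i][j] for i in range(3)) % 2 for j in range(6))

    def syndrome(y):                     # s = yH^T over GF(2)
        return tuple(sum(y[j]*H[i][j] for j in range(6)) % 2 for i in range(3))

    codewords = [encode(u) for u in itertools.product((0, 1), repeat=3)]
    print("d_min =", min(sum(c) for c in codewords if any(c)))        # 3

    # Syndrome table: the six single-bit error patterns, the all-zero pattern,
    # and one weight-2 pattern for the remaining syndrome (1,1,1).
    table = {syndrome(e): e
             for e in [tuple(int(i == k) for i in range(6)) for k in range(6)]}
    table[(0, 0, 0)] = (0,) * 6
    table[(1, 1, 1)] = (1, 0, 0, 0, 0, 1)

    u = (1, 0, 1)
    x = encode(u)
    e = (0, 0, 0, 1, 0, 0)               # a single error in a parity position
    y = tuple((xi + ei) % 2 for xi, ei in zip(x, e))
    x_hat = tuple((yi + ei) % 2 for yi, ei in zip(y, table[syndrome(y)]))
    print(x, y, x_hat, x_hat == x)       # the single error is corrected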

46. According to the definition of differential entropy, H(X) = −∫ f(x) log f(x) dx, we get:

(a) H(X) = −∫_a^b 1/(b − a) · log(1/(b − a)) dx = log(b − a) · [x/(b − a)]_a^b = log(b − a).

(b) H(X) = −∫ 1/√(2πσ²) e^{−(x−µ)²/(2σ²)} log(1/√(2πσ²) e^{−(x−µ)²/(2σ²)}) dx
         = ½ log(2πσ²) ∫ f(x) dx + (log e)/(2σ²) ∫ (x − µ)² f(x) dx
         = ½ log(2πσ²) + ½ log e = ½ log(2πeσ²).

(c) H(X) = −∫_0^∞ λe^{−λx} (log λ − λx log e) dx = −log λ + λ log e · E[X] = −log λ + log e = log(e/λ), since E[X] = 1/λ.

(d) H(X) = −∫ (λ/2) e^{−λ|x|} (log(λ/2) − λ|x| log e) dx = −log(λ/2) + λ log e · E[|X|] = −log(λ/2) + log e = log(2e/λ), since E[|X|] = 1/λ.

47. (a) The sum of two independent normally distributed random variables is normally distributed, X ∈ N(µ1 + µ2, σ1² + σ2²).
(b) According to Problem 46(b) the entropy becomes ½ log 2πe(σ1² + σ2²).

48. The mutual information is I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(X + Z|X) = H(Y) − H(Z), where H(Z) = log(1 − (−1)) = log 2. The density of Y is piecewise constant on [−3, 3]: p_{−2}/2 on −3 ≤ Y ≤ −2, (p_{−2} + p_{−1})/2 on −2 ≤ Y ≤ −1, etc., so the maximum of H(Y) is obtained for a uniform Y. This can be achieved if the distribution of X is (1/3, 0, 1/3, 0, 1/3). Then H(Y) = log(3 − (−3)) = log 6. We conclude that C = log 6 − log 2 = log 3.
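Remark: the closed-form differential entropies in solution 46 can be checked by numerical integration. The Python sketch below does this for example parameter values (a uniform density on [0, 3], σ = 1.5, λ = 2 and µ = 0, chosen here only for illustration); each printed pair should agree to several decimal places.

    from math import log2, pi, e, sqrt, exp

    def diff_entropy(f, a, b, n=100000):
        # Midpoint-rule approximation of -integral of f(x) log2 f(x) over [a, b].
        dx = (b - a) / n
        total = 0.0
        for i in range(n):
            fx = f(a + (i + 0.5) * dx)
            if fx > 0:
                total -= fx * log2(fx) * dx
        return total

    sigma, lam = 1.5, 2.0
    print(round(diff_entropy(lambda x: 1/3, 0, 3), 4), round(log2(3), 4))
    print(round(diff_entropy(lambda x: exp(-(x/sigma)**2/2) / sqrt(2*pi*sigma**2), -20, 20), 4),
          round(0.5*log2(2*pi*e*sigma**2), 4))
    print(round(diff_entropy(lambda x: lam*exp(-lam*x), 0, 20), 4),
          round(log2(e/lam), 4))
    print(round(diff_entropy(lambda x: lam/2*exp(-lam*abs(x)), -20, 20), 4),
          round(log2(2*e/lam), 4))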

49. The differential entropy for a uniformly distributed variable between a and b is H(X) = log(b − a).
(a) H(X) = log(1 − 0) = log 1 = 0.
(b) H(X) = log(100 − 0) = log 100 ≈ 6.644.

50. The capacity of this additive white Gaussian noise channel with the output power constraint E[Y²] ≤ P is

    C = max_{f(x): E[Y²] ≤ P} I(X; Y) = max_{f(x): E[Y²] ≤ P} (H(Y) − H(Y|X)) = max_{f(x): E[Y²] ≤ P} (H(Y) − H(Z)).

Here the maximum differential entropy is achieved by a normal distribution, and the power constraint on Y is satisfied if we choose the distribution of X as N(0, P − σ²). The capacity is

    C = ½ log(2πe(P − σ² + σ²)) − ½ log(2πeσ²) = ½ log(2πeP) − ½ log(2πeσ²) = ½ log(P/σ²).

51. From the problem we have that P(X = 0) = p, P(X = 1) = 1 − p and that f(z) = 1/a, 0 ≤ z ≤ a, where a > 1. This gives the conditional densities

    f(y|X = 0) = 1/a, 0 ≤ y ≤ a
    f(y|X = 1) = 1/a, 1 ≤ y ≤ a + 1,

which gives the density for Y as

    f(y) = Σ_x f(y|X = x) P(X = x) = { p/a,        0 ≤ y ≤ 1
                                       1/a,        1 ≤ y ≤ a
                                       (1 − p)/a,  a ≤ y ≤ a + 1.

(a) H(X) = h(p). For 0 ≤ y ≤ 1 only X = 0 is possible and for a ≤ y ≤ a + 1 only X = 1 is possible, so the corresponding conditional entropies are zero. For 1 ≤ y ≤ a we have P(X = 0|y) = p, so

    H(X|Y) = h(p) · P(1 ≤ Y ≤ a) = h(p) ∫_1^a (1/a) dy = h(p)(a − 1)/a
    I(X; Y) = H(X) − H(X|Y) = h(p) − h(p)(a − 1)/a = h(p)/a.

(b)

    H(Y) = −∫_0^1 (p/a) log(p/a) dy − ∫_1^a (1/a) log(1/a) dy − ∫_a^{a+1} ((1 − p)/a) log((1 − p)/a) dy
         = (p/a) log(a/p) + ((a − 1)/a) log a + ((1 − p)/a) log(a/(1 − p))
         = h(p)/a + log a
    H(Y|X) = Σ_x H(Y|X = x) P(X = x) = log a,

since both conditional densities are uniform over an interval of length a. Hence

    I(X; Y) = H(Y) − H(Y|X) = h(p)/a.

(c) C = max_p I(X; Y) = 1/a, attained for p = 1/2.

52. We can use the total power P1 + P2 + P3 + P4 = 17, and for the four channels the noise powers are N1 = 1, N2 = 4, N3 = 9 and N4 = 16. Let B = Pi + Ni for the used channels. Since (16 − 1) + (16 − 4) + (16 − 9) > 17 we should not use channel four when reaching capacity. Similarly, since (9 − 1) + (9 − 4) < 17, we should use the remaining three channels. (These tests correspond to the dashed lines in the water-filling figure.) Hence, B = P1 + 1 = P2 + 4 = P3 + 9, which leads to B = (1/3)(P1 + P2 + P3 + 14) = (1/3)(17 + 14) = 31/3. The capacity becomes

    C = Σ_{i=1}^{3} ½ log(1 + Pi/Ni) = Σ_{i=1}^{3} ½ log(B/Ni) = ½ log(31/3) + ½ log(31/12) + ½ log(31/27) ≈ 2.4689.

(Figure: water-filling over the four channels; the common level B = 31/3 lies above N1, N2 and N3 but below N4 = 16, so P4 = 0.)
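Remark: the water-filling solution can be reproduced with a short Python sketch; the function below raises the common level B until the active channels use exactly the total power.

    from math import log2

    def water_fill(noise, total_power):
        # Water-filling sketch: find B such that sum over channels of max(B - N, 0) = P.
        active = sorted(noise)
        while True:
            B = (total_power + sum(active)) / len(active)
            if B >= active[-1]:              # every remaining channel gets positive power
                break
            active.pop()                     # drop the noisiest channel and retry
        powers = [max(B - n, 0.0) for n in noise]
        C = sum(0.5 * log2(1 + p / n) for p, n in zip(powers, noise))
        return B, powers, C

    B, powers, C = water_fill([1, 4, 9, 16], 17)
    print(B, powers, round(C, 4))
    # B = 31/3, powers approximately [9.33, 6.33, 1.33, 0], C about 2.4689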