Solutions to Homework Set #3 Channel and Source coding

1. Rates
(a) Channel coding rate: Assume you are sending 4 different messages using 2 usages of a channel. What is the rate (in bits per channel use) at which you send?
(b) Source coding rate: Assume you have a file with 10^6 ASCII characters, where the alphabet of ASCII characters is of size 256, so that each ASCII character is represented by 8 bits (binary alphabet before compression). After compressing the file we get 4·10^6 bits. What is the compression rate?

Solution: Rates.
(a) R = log(4)/2 = 1 bit per channel use.
(b) R = 4·10^6 / (10^6 · log 256) = 4/8 = 0.5 compressed bits per source bit.
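As a quick numerical sanity check of both rates, here is a minimal sketch (the message count, channel uses, file size and the 8-bit representation are the values given in the problem):

```python
import math

# (a) channel coding rate: log2(#messages) per channel use
num_messages, channel_uses = 4, 2
R_channel = math.log2(num_messages) / channel_uses           # = 1.0 bit per use

# (b) compression rate: compressed bits per uncompressed bit
num_chars = 10**6
bits_per_char = math.log2(256)                               # = 8 bits per ASCII character
compressed_bits = 4 * 10**6
R_compress = compressed_bits / (num_chars * bits_per_char)   # = 0.5

print(R_channel, R_compress)
```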

2. Preprocessing the output. One is given a communication channel with transition probabilities p(y|x) and channel capacity C = max_{p(x)} I(X;Y). A helpful statistician preprocesses the output by forming Ỹ = g(Y), yielding a channel p(ỹ|x). He claims that this will strictly improve the capacity.
(a) Show that he is wrong.
(b) Under what conditions does he not strictly decrease the capacity?

Solution: Preprocessing the output.
(a) The statistician calculates Ỹ = g(Y). Since X → Y → Ỹ forms a Markov chain, we can apply the data processing inequality. Hence for every distribution on X,

I(X;Y) ≥ I(X;Ỹ).

Let p̃(x) be the distribution on X that maximizes I(X;Ỹ). Then

C = max_{p(x)} I(X;Y) ≥ I(X;Y)|_{p̃(x)} ≥ I(X;Ỹ)|_{p̃(x)} = max_{p(x)} I(X;Ỹ) = C̃.

Thus, the helpful suggestion is wrong and processing the output does not increase capacity.

(b) We have equality (no decrease in capacity) in the above sequence of inequalities only if we have equality in the data processing inequality, i.e., for the distribution p̃(x) that maximizes I(X;Ỹ), X → Ỹ → Y also forms a Markov chain. Thus, Ỹ should be a sufficient statistic.

3. The Z channel. The Z-channel has binary input and output alphabets and transition probabilities p(y|x) given by the following matrix:

Q = [ 1     0
      1/2   1/2 ],    x, y ∈ {0,1}.

Find the capacity of the Z-channel and the maximizing input probability distribution.

Solution: The Z channel. First we express I(X;Y), the mutual information between the input and output of the Z-channel, as a function of α = Pr(X = 1):

H(Y|X) = Pr(X = 0)·0 + Pr(X = 1)·1 = α
H(Y) = H_b(Pr(Y = 1)) = H_b(α/2)
I(X;Y) = H(Y) − H(Y|X) = H_b(α/2) − α.

Since I(X;Y) is strictly concave in α (why?) and I(X;Y) = 0 when α = 0 and α = 1, the maximum mutual information is obtained for some value of α such that 0 < α < 1.

Using elementary calculus, we determine that

d/dα I(X;Y) = (1/2) log( (1 − α/2) / (α/2) ) − 1,

which is equal to zero for α = 2/5. (It is reasonable that Pr(X = 1) < 1/2, since X = 1 is the noisy input to the channel.) So the capacity of the Z-channel in bits is C = H_b(1/5) − 2/5 ≈ 0.722 − 0.4 = 0.322.
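A short numerical verification of this maximization, as a sketch (only the crossover probability 1/2 from the transition matrix is assumed):

```python
import math

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mi_z(alpha, eps=0.5):
    """I(X;Y) = H_b(alpha*(1-eps)) - alpha*H_b(eps) for the Z-channel."""
    return hb(alpha * (1 - eps)) - alpha * hb(eps)

# grid search over the input distribution Pr(X = 1) = alpha
value, alpha_star = max((mi_z(a / 10000), a / 10000) for a in range(10001))
print(value, alpha_star)   # ≈ 0.322, attained at alpha = 0.4 = 2/5
```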

4. Using two channels at once. Consider two discrete memoryless channels (X_1, p(y_1|x_1), Y_1) and (X_2, p(y_2|x_2), Y_2) with capacities C_1 and C_2 respectively. A new channel (X_1 × X_2, p(y_1|x_1)·p(y_2|x_2), Y_1 × Y_2) is formed in which x_1 ∈ X_1 and x_2 ∈ X_2 are sent simultaneously, resulting in y_1, y_2. Find the capacity of this channel.

Solution: Using two channels at once. To find the capacity of the product channel (X_1 × X_2, p(y_1,y_2|x_1,x_2), Y_1 × Y_2), we have to find the distribution p(x_1,x_2) on the input alphabet X_1 × X_2 that maximizes I(X_1,X_2; Y_1,Y_2). Since the transition probabilities are given as p(y_1,y_2|x_1,x_2) = p(y_1|x_1) p(y_2|x_2),

p(x_1,x_2,y_1,y_2) = p(x_1,x_2) p(y_1,y_2|x_1,x_2) = p(x_1,x_2) p(y_1|x_1) p(y_2|x_2).

Therefore, Y_1 → X_1 → X_2 → Y_2 forms a Markov chain and

I(X_1,X_2; Y_1,Y_2) = H(Y_1,Y_2) − H(Y_1,Y_2|X_1,X_2)
= H(Y_1,Y_2) − H(Y_1|X_1,X_2) − H(Y_2|X_1,X_2)      (1)
= H(Y_1,Y_2) − H(Y_1|X_1) − H(Y_2|X_2)              (2)
≤ H(Y_1) + H(Y_2) − H(Y_1|X_1) − H(Y_2|X_2)         (3)
= I(X_1;Y_1) + I(X_2;Y_2),

where Eqs. (1) and (2) follow from Markovity, and Eq. (3) is met with equality if X_1 and X_2 are independent and hence Y_1 and Y_2 are independent. Therefore

C = max_{p(x_1,x_2)} I(X_1,X_2; Y_1,Y_2)
≤ max_{p(x_1,x_2)} I(X_1;Y_1) + max_{p(x_1,x_2)} I(X_2;Y_2)
= max_{p(x_1)} I(X_1;Y_1) + max_{p(x_2)} I(X_2;Y_2)
= C_1 + C_2,

with equality iff p(x_1,x_2) = p*(x_1) p*(x_2), where p*(x_1) and p*(x_2) are the distributions that achieve C_1 and C_2 respectively.

5. A channel with two independent looks at Y. Let Y_1 and Y_2 be conditionally independent and conditionally identically distributed given X. Thus p(y_1,y_2|x) = p(y_1|x) p(y_2|x).
(a) Show I(X; Y_1,Y_2) = 2 I(X;Y_1) − I(Y_1;Y_2).
(b) Conclude that the capacity of the channel X → (Y_1,Y_2) is less than twice the capacity of the channel X → Y_1.

Solution: A channel with two independent looks at Y.
(a)

I(X; Y_1,Y_2) = H(Y_1,Y_2) − H(Y_1,Y_2|X)                                  (4)
= H(Y_1) + H(Y_2) − I(Y_1;Y_2) − H(Y_1|X) − H(Y_2|X)                       (5)
  (since Y_1 and Y_2 are conditionally independent given X)                (6)
= I(X;Y_1) + I(X;Y_2) − I(Y_1;Y_2)                                          (7)
= 2 I(X;Y_1) − I(Y_1;Y_2)  (since Y_1 and Y_2 are conditionally identically distributed).   (8)

(b) The capacity of the single-look channel X → Y_1 is

C_1 = max_{p(x)} I(X;Y_1).                                                  (9)

The capacity of the channel X → (Y_1,Y_2) is

C_2 = max_{p(x)} I(X; Y_1,Y_2)                                              (10)
= max_{p(x)} [ 2 I(X;Y_1) − I(Y_1;Y_2) ]                                    (11)
≤ 2 max_{p(x)} I(X;Y_1)                                                     (12)
= 2 C_1.                                                                    (13)

Hence, two independent looks cannot be more than twice as good as one look.
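The identity in (a) can also be checked numerically. The sketch below assumes, purely for illustration, that each look is a BSC with crossover probability 0.1 and that Pr(X = 1) = 0.3; neither value is part of the problem:

```python
import itertools, math

def H(masses):
    """Entropy in bits of a collection of probability masses."""
    return -sum(v * math.log2(v) for v in masses if v > 0)

p = 0.1   # assumed crossover probability of each look (illustration only)
q = 0.3   # assumed Pr(X = 1) (illustration only)

flip = lambda x, y: 1 - p if x == y else p
pjoint = {(x, y1, y2): (q if x else 1 - q) * flip(x, y1) * flip(x, y2)
          for x, y1, y2 in itertools.product((0, 1), repeat=3)}

def marginal(idx):
    out = {}
    for k, v in pjoint.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out.values()

H_X, H_Y1, H_Y2 = H(marginal([0])), H(marginal([1])), H(marginal([2]))
H_Y1Y2, H_XY1 = H(marginal([1, 2])), H(marginal([0, 1]))
H_XY1Y2 = H(pjoint.values())

I_X_Y1Y2 = H_X + H_Y1Y2 - H_XY1Y2
I_X_Y1 = H_X + H_Y1 - H_XY1
I_Y1_Y2 = H_Y1 + H_Y2 - H_Y1Y2

# identity from part (a): I(X;Y1,Y2) = 2 I(X;Y1) - I(Y1;Y2) <= 2 I(X;Y1)
print(I_X_Y1Y2, 2 * I_X_Y1 - I_Y1_Y2)
```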

6. Choice of channels. Find the capacity C of the union of the two channels (X_1, p_1(y_1|x_1), Y_1) and (X_2, p_2(y_2|x_2), Y_2), where, at each time, one can send a symbol over channel 1 or over channel 2 but not both. Assume the output alphabets are distinct and do not intersect.
(a) Show 2^C = 2^{C_1} + 2^{C_2}.
(b) What is the capacity of the following channel?

[Figure: inputs 1 and 2 form a BSC with crossover probability p between outputs 1 and 2; input 3 is connected noiselessly to output 3.]

Solution: Choice of channels.
(a) Let θ = 1 if the signal is sent over channel 1, and θ = 2 if the signal is sent over channel 2.

Consider the following communication scheme: the sender chooses between the two channels according to a Bern(α) coin flip, and the channel input is X = (θ, X_θ). Since the output alphabets Y_1 and Y_2 are disjoint, θ is a function of Y, i.e., X → Y → θ. Therefore,

I(X;Y) = I(X; Y,θ)
= I(X_θ, θ; Y, θ)
= I(θ; Y, θ) + I(X_θ; Y, θ | θ)
= I(θ; Y, θ) + I(X_θ; Y | θ)
= H(θ) + α I(X_θ; Y | θ = 1) + (1 − α) I(X_θ; Y | θ = 2)
= H_b(α) + α I(X_1; Y_1) + (1 − α) I(X_2; Y_2).

Thus, it follows that

C = sup_α { H_b(α) + α C_1 + (1 − α) C_2 },

where the expression inside the supremum is a strictly concave function of α. Hence the maximum exists and, by elementary calculus, one can easily show that

C = log_2( 2^{C_1} + 2^{C_2} ),

which is attained with α = 2^{C_1} / ( 2^{C_1} + 2^{C_2} ).

If one interprets M = 2^C as the effective number of noise-free symbols, then the above result follows in a rather intuitive manner: we have M_1 = 2^{C_1} noise-free symbols from channel 1 and M_2 = 2^{C_2} noise-free symbols from channel 2. Since at each step we get to choose which channel to use, we essentially have M_1 + M_2 = 2^{C_1} + 2^{C_2} noise-free symbols for the new channel. Therefore, the capacity of this channel is C = log_2( 2^{C_1} + 2^{C_2} ). This argument is very similar to the effective alphabet argument given in Problem 9 of the text.

(b) From part (a), the capacity of the channel in the figure is C = log_2( 2^{1 − H_b(p)} + 2^0 ) = log_2( 2^{1 − H_b(p)} + 1 ).
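The closed form C = log_2(2^{C_1} + 2^{C_2}) can be checked against a direct maximization over the time-sharing bias α. In the sketch below the sub-channel capacities C1 = 0.7 and C2 = 0.3 are arbitrary illustrative values:

```python
import math

def hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

C1, C2 = 0.7, 0.3   # arbitrary sub-channel capacities, for illustration only

# direct maximization of H_b(alpha) + alpha*C1 + (1 - alpha)*C2 over the channel-choice bias
C_numeric = max(hb(a / 100000) + (a / 100000) * C1 + (1 - a / 100000) * C2
                for a in range(100001))

C_closed = math.log2(2**C1 + 2**C2)
alpha_star = 2**C1 / (2**C1 + 2**C2)
print(C_numeric, C_closed, alpha_star)   # the first two agree to grid precision
```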

7. Cascaded BSCs. Consider the two discrete memoryless channels (X, p_1(y|x), Y) and (Y, p_2(z|y), Z). Let p_1(y|x) and p_2(z|y) be binary symmetric channels with crossover probabilities λ_1 and λ_2 respectively.

[Figure: a cascade of two BSCs, X → Y with crossover probability λ_1 followed by Y → Z with crossover probability λ_2.]

(a) What is the capacity C_1 of p_1(y|x)?
(b) What is the capacity C_2 of p_2(z|y)?
(c) We now cascade these channels, so that p_3(z|x) = Σ_y p_1(y|x) p_2(z|y). What is the capacity C_3 of p_3(z|x)? Show C_3 ≤ min{C_1, C_2}.
(d) Now let us actively intervene between channels 1 and 2, rather than passively transmitting y^n. What is the capacity of channel 1 followed by channel 2 if you are allowed to decode the output y^n of channel 1 and then re-encode it as ỹ^n for transmission over channel 2? (Think W → x^n(W) → y^n → ỹ^n(y^n) → z^n → Ŵ.)
(e) What is the capacity of the cascade in part (c) if the receiver can view both Y and Z?

Solution: Cascaded BSCs.
(a) This is a simple BSC with capacity C_1 = 1 − H_b(λ_1).
(b) Similarly, C_2 = 1 − H_b(λ_2).
(c) This is also a BSC, with transition probability λ_1 ⋆ λ_2 = λ_1(1 − λ_2) + (1 − λ_1)λ_2, and thus C_3 = 1 − H_b(λ_1 ⋆ λ_2). From the Markov chain X → Y → Z we can see that

C_3 = I(X;Z) ≤ I(X;Y,Z) = I(X;Y) ≤ C_1,

and additionally

C_3 = I(X;Z) ≤ I(X,Y;Z) = I(Y;Z) ≤ C_2,

and thus C_3 ≤ min{C_1, C_2}.

(d) If one can re-encode Y, then we obtain C = min{C_1, C_2}, since only one of the channels is the bottleneck between X and Z.

(e) From the Markov chain X → Y → Z we can see that

C = max_{p(x)} I(X; Y,Z) = max_{p(x)} I(X;Y) = C_1.
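A quick numerical check of part (c), as a sketch with arbitrary illustrative crossover probabilities λ1 = 0.1 and λ2 = 0.2:

```python
import math

def hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

lam1, lam2 = 0.1, 0.2                                  # illustrative crossover probabilities
C1, C2 = 1 - hb(lam1), 1 - hb(lam2)

lam_cascade = lam1 * (1 - lam2) + (1 - lam1) * lam2    # effective crossover lam1 * lam2
C3 = 1 - hb(lam_cascade)

print(C1, C2, C3, C3 <= min(C1, C2))                   # C3 stays below both single-hop capacities
```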

8. Channel capacity
(a) What is the capacity of the following channel?

[Figure: a channel with input alphabet {1,2,3,4} and output alphabet {1,2,3,4}. Inputs 1 and 2 are connected noiselessly to outputs 1 and 2, input 3 is connected noiselessly to output 3, and input 4 goes to output 4 with probability 1/2 and to output 3 with probability 1/2 (a Z-channel on the symbols 3 and 4).]

(b) Provide a simple scheme that can transmit at rate R = log 3 bits per channel use through this channel.

Solution for Channel capacity
(a) We can use the solution of the previous question on the union of channels:

C = log_2( 2^{C_1} + 2^{C_2} + 2^{C_3} ).

Now we need to calculate the capacity of each sub-channel. Sub-channel 1 (the single noiseless symbol 1) gives

C_1 = max I(X;Y) = H(Y) − H(Y|X) = 0 − 0 = 0,

C = maxi(x;y) = H(Y) H(Y X) = = C 3 = max I(X;Y) = max = max {H(Y) H(Y X)} [ ( ) p log p ( ) ( )] p +p 3 log p +p 3 p Assigning p 3 = p and derive against p : di(x;y) dp = p p ( log p ) + p p + ( ) log p = And as result p = 5 : C 3.3 And, finally: C = log( + +.3 ).7 (b) Here is a simple code that achieves capacity. Encoding: You just use ternary representation of the message and send using,, but no 3 (or,,3 but no ) of the input channel. Decoding: map the ternary output into the message. 9

9. A channel with a switch. Consider the channel depicted in Figure 1: there are two channels with conditional probabilities p(y_1|x) and p(y_2|x). These two channels have a common input alphabet X and two disjoint output alphabets Y_1, Y_2 (a symbol that appears in Y_1 cannot appear in Y_2). The position of the switch is determined by a random variable Z which is independent of X, where Pr(Z = 1) = λ.

[Figure 1: The channel. The input X feeds both p(y_1|x) and p(y_2|x); the switch selects Y = Y_1 when Z = 1 and Y = Y_2 when Z = 2.]

(a) Show that

I(X;Y) = λ I(X;Y_1) + (1 − λ) I(X;Y_2).     (4)

(b) The capacity of this system is given by C = max_{p(x)} I(X;Y). Show that

C ≤ λ C_1 + (1 − λ) C_2,     (5)

where C_i = max_{p(x)} I(X;Y_i). When is equality achieved?

(c) The sub-channels defined by p(y_1|x) and p(y_2|x) are now given in Figure 2, where p = 1/2. Find the input probability distribution that maximizes I(X;Y). For this case, does the equality C = λ C_1 + (1 − λ) C_2 hold? Explain!

[Figure 2: (a) describes channel 1, a BSC with crossover probability p; (b) describes channel 2, a Z-channel with crossover probability p.]

Solution: A channel with a switch.
(a) Since the output alphabets Y_1 and Y_2 are disjoint, Z is determined by Y, so the Markov chain X → Y → Z holds. Additionally, X and Z are independent. Hence

I(X;Y) = I(X; Y, Z)    (Z is a function of Y)
= I(X;Z) + I(X;Y|Z)
= I(X;Y|Z)             (X and Z are independent)
= Pr(Z = 1) I(X;Y|Z = 1) + Pr(Z = 2) I(X;Y|Z = 2)
= λ I(X;Y|Z = 1) + (1 − λ) I(X;Y|Z = 2)
= λ I(X;Y_1) + (1 − λ) I(X;Y_2).

(b) Consider the following:

C = max_{p(x)} I(X;Y)
= max_{p(x)} { λ I(X;Y_1) + (1 − λ) I(X;Y_2) }
≤ max_{p(x)} { λ I(X;Y_1) } + max_{p(x)} { (1 − λ) I(X;Y_2) }
= λ C_1 + (1 − λ) C_2.

Equality is achieved when a single input distribution p(x) maximizes I(X;Y_1) and I(X;Y_2) simultaneously.

(c) Since p = 1/2, the capacity of the first channel (a BSC with crossover probability 1/2) is C_1 = 0, and thus we only need to maximize over the second channel. That is, we would like to find the α that maximizes

(1 − λ) I(X;Y_2) = (1 − λ) ( H_b(α/2) − α H_b(1/2) ),

where Pr(X = 1) = α ∈ [0,1]. In this case the solution is α = 0.4, as in Question 3. Moreover, since I(X;Y_1) = 0 for every input distribution, this choice gives C = (1 − λ) C_2 = λ C_1 + (1 − λ) C_2, so we can see that the equality in (5) holds.
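A small numerical illustration of part (c), as a sketch (λ = 0.3 is an arbitrary illustrative value for the switch probability):

```python
import math

def hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

lam = 0.3   # Pr(Z = 1), illustrative only

I1 = lambda a: hb(a * 0.5 + (1 - a) * 0.5) - hb(0.5)   # BSC(1/2): H(Y) - H(Y|X) = 0 for every input
I2 = lambda a: hb(a / 2) - a * hb(0.5)                  # Z-channel with crossover 1/2

grid = [a / 10000 for a in range(10001)]
C = max(lam * I1(a) + (1 - lam) * I2(a) for a in grid)
C1 = max(I1(a) for a in grid)    # = 0
C2 = max(I2(a) for a in grid)    # ≈ 0.322, attained at a = 0.4

print(C, lam * C1 + (1 - lam) * C2)   # both sides of (5) coincide here
```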

10. Channel with state. A discrete memoryless (DM) state-dependent channel with state space S is defined by an input alphabet X, an output alphabet Y and a set of channel transition matrices {p(y|x,s)}_{s∈S}. Namely, for each s ∈ S the transmitter sees a different channel. The capacity of such a channel, where the state is known causally to both encoder and decoder, is given by

C = max_{p(x|s)} I(X;Y|S).     (6)

Let |S| = 3 and let the three different channels (one for each state s ∈ S) be as illustrated in the following figure.

[Figure 3: The three state-dependent channels: an S-channel with parameter ε (S = 1), a BSC with crossover probability δ (S = 2), and a Z-channel with parameter ε (S = 3).]

The state process is i.i.d. according to the distribution p(s).

(a) Find an expression for the capacity of the S-channel (the channel the transmitter sees given S = 1) as a function of ε.
(b) Find an expression for the capacity of the BSC (the channel the transmitter sees given S = 2) as a function of δ.
(c) Find an expression for the capacity of the Z-channel (the channel the transmitter sees given S = 3) as a function of ε.
(d) Find an expression for the capacity of the DM state-dependent channel (using formula (6)) for p(s) = [1/2 1/3 1/6] as a function of ε and δ.
(e) Let us define a conditional probability matrix P_{X|S} for two random variables X and S with X = {0,1} and S = {1,2,3} by

[P_{X|S}]_{i,j} = p(X = j | S = i),   i = 1,2,3,  j = 0,1.     (7)

What is the input conditional probability matrix P_{X|S} that achieves the capacity you have found in (d)?

Solution: Channel with state.
(a) Denote the capacity of the S-channel by

C_S = max_{p(x|s=1)} I(X;Y|S = 1) = max_{p(x|s=1)} [ H(Y|S = 1) − H(Y|X, S = 1) ].

Assume that the input X is distributed according to X ~ Bernoulli(α) for s = 1, and let us calculate the entropy terms:

H(Y|X, S = 1) = Σ_{x∈X} p(x|s=1) H(Y|X = x, S = 1)
= α H(Y|X = 1, S = 1) + (1 − α) H(Y|X = 0, S = 1)
= α H_b(ε) + (1 − α)·0 = α H_b(ε).

Consider

p(y = 1|s = 1) = p(y = 1, x = 1|s = 1) + p(y = 1, x = 0|s = 1) = α(1 − ε).

Then

H(Y|S = 1) = H_b( α(1 − ε) ).

We want to maximize

I(X;Y|S = 1) = H_b( α(1 − ε) ) − α H_b(ε).

Taking the derivative with respect to α and finding the root of the derivative gives the capacity.

(b) For the binary symmetric channel (BSC) the capacity is given by C_BSC = 1 − H_b(δ).

(c) The capacities of the Z-channel and the S-channel are equal because the channels are equivalent up to switching 0 with 1 and vice versa. Hence C_Z = C_S.

(d) The capacity of the state-dependent channel is

C = max_{p(x|s)} I(X;Y|S)
= max_{p(x|s)} [ p(s=1) I(X;Y|S=1) + p(s=2) I(X;Y|S=2) + p(s=3) I(X;Y|S=3) ]
= (1/2) C_S + (1/3) C_BSC + (1/6) C_Z.

(e) The rows of the matrix P_{X|S} are the conditional probability functions that achieve capacity for each of the sub-channels:

P_{X|S} = [ p(x=0|s=1)   p(x=1|s=1)
            p(x=0|s=2)   p(x=1|s=2)
            p(x=0|s=3)   p(x=1|s=3) ].
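A numerical sketch of parts (a)-(d) for concrete parameter values (ε = 0.2 and δ = 0.1 are arbitrary illustrative choices, not values from the problem):

```python
import math

def hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

eps, delta = 0.2, 0.1   # illustrative channel parameters

grid = [a / 10000 for a in range(10001)]
# S-channel capacity: maximize H_b(alpha*(1 - eps)) - alpha*H_b(eps) over alpha
C_S = max(hb(a * (1 - eps)) - a * hb(eps) for a in grid)
C_Z = C_S                # identical capacity up to relabeling 0 <-> 1
C_BSC = 1 - hb(delta)

C = 0.5 * C_S + (1 / 3) * C_BSC + (1 / 6) * C_Z
print(C_S, C_BSC, C)
```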

11. Modulo channel
(a) Consider the DMC defined as follows: the output is Y = X ⊕ Z, where X, taking values in {0,1}, is the channel input, ⊕ is the modulo-2 summation operation, and Z is binary channel noise, uniform over {0,1} and independent of X. What is the capacity of this channel?
(b) Consider the channel of the previous part, but suppose that instead of modulo-2 addition Y = X ⊕ Z, we perform modulo-3 addition, Y = X ⊕_3 Z. Now what is the capacity?

Solution: Modulo channel.
(a) This is a simple case of a BSC with transition probability p = 1/2, and thus the capacity in this case is C_BSC = 0.
(b) In this case we can model the channel as a BEC, as studied in class. The input has a binary alphabet and the output has a ternary alphabet: the outputs Y = 0 and Y = 2 identify the input exactly, while Y = 1 can be produced by either input and acts as an erasure. The erasure probability is Pr(Z = 1) = p = 1/2, and thus the capacity of this channel is C_BEC = 1 − p = 1/2.
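A brute-force check of the modulo-3 case, as a sketch (the uniform input used here is the capacity-achieving distribution for this channel):

```python
import itertools, math

def H(masses):
    return -sum(v * math.log2(v) for v in masses if v > 0)

# X uniform on {0,1}, Z uniform on {0,1} and independent of X, Y = (X + Z) mod 3
pj = {}
for x, z in itertools.product((0, 1), repeat=2):
    y = (x + z) % 3
    pj[(x, y)] = pj.get((x, y), 0.0) + 0.25

py = {}
for (x, y), v in pj.items():
    py[y] = py.get(y, 0.0) + v

I = H([0.5, 0.5]) + H(py.values()) - H(pj.values())   # I(X;Y) = H(X) + H(Y) - H(X,Y)
print(I)   # 0.5 bits, matching the BEC formula 1 - Pr(erasure)
```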

12. Cascaded BSCs. Given is a cascade of k identical and independent binary symmetric channels, each with crossover probability α.
(a) In the case where no encoding or decoding is allowed at the intermediate terminals, what is the capacity of this cascaded channel as a function of k and α?
(b) Now assume that encoding and decoding are allowed at the intermediate points. What is the capacity as a function of k and α?
(c) What is the capacity of each of the above settings in the case where the number of cascaded channels, k, goes to infinity?

Solution: Cascaded BSCs.
(a) Cascaded BSCs result in a new BSC with a new crossover parameter β. Therefore, the capacity is C_a = 1 − H_b(β), and the parameter β can be found as follows:

β = Σ_{1 ≤ i ≤ k, i odd} (k choose i) α^i (1 − α)^{k−i}
= (1/2) [ Σ_{i=0}^{k} (k choose i) α^i (1 − α)^{k−i} − Σ_{i=0}^{k} (k choose i) (−α)^i (1 − α)^{k−i} ]
= (1/2) ( 1 − (1 − 2α)^k ).

Answers leaving β as the initial sum over odd indices, or as the binary convolution of k identical parameters α, have been accepted as well.

(b) We have seen in the HW that, in the case of encoding and decoding at the intermediate points, the capacity of the cascaded channel equals C_b = min_i {C_i}. Since all the channels are identical with capacity 1 − H_b(α), we have C_b = 1 − H_b(α).

(c) In (a), β → 0.5 as k → ∞, so C_a → 0. For (b), the number of cascaded channels does not change the capacity, which remains C_b = 1 − H_b(α).
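A short numerical illustration of parts (a) and (c), as a sketch with an arbitrary illustrative crossover probability α = 0.1:

```python
import math

def hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

alpha = 0.1   # illustrative crossover probability
for k in (1, 2, 5, 10, 50, 200):
    beta = 0.5 * (1 - (1 - 2 * alpha)**k)   # effective crossover of the cascade
    C_a = 1 - hb(beta)                      # no coding at intermediate terminals
    C_b = 1 - hb(alpha)                     # with intermediate decode/re-encode
    print(k, round(beta, 4), round(C_a, 4), round(C_b, 4))
# beta -> 0.5 and C_a -> 0 as k grows, while C_b stays fixed at 1 - H_b(alpha)
```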