EE5139R: Problem Set 7
Assigned: 30/09/15, Due: 07/10/15

1. Cascade of Binary Symmetric Channels

The conditional probability distribution $p(y \mid x)$ for each of the BSCs may be expressed by the transition probability matrix $A$, given by
$$A = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$
The transition matrix for the cascade of $n$ BSCs is $A_n = A^n$; it is possible to exploit the eigendecomposition of $A$ to compute $A^n$ easily:
$$A = T^{-1} \begin{pmatrix} 1 & 0 \\ 0 & 1-2p \end{pmatrix} T, \qquad \text{where } T = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
Hence,
$$A_n = T^{-1} \begin{pmatrix} 1 & 0 \\ 0 & (1-2p)^n \end{pmatrix} T = \begin{pmatrix} \frac12\bigl(1+(1-2p)^n\bigr) & \frac12\bigl(1-(1-2p)^n\bigr) \\ \frac12\bigl(1-(1-2p)^n\bigr) & \frac12\bigl(1+(1-2p)^n\bigr) \end{pmatrix}.$$
Hence the probability of error of the cascade is $\frac12\bigl(1-(1-2p)^n\bigr)$, and the cascade is equivalent to a single BSC with this crossover probability. Now let $E$ be a random variable that indicates whether an error occurs in the cascade channel. $E$ takes values in $\{0,1\}$ with
$$\Pr[E=1] = \tfrac12\bigl(1-(1-2p)^n\bigr),$$
and the capacity is $C = I(X_0; X_n) = 1 - H(E)$, achieved using a uniform input distribution. Now, as $n \to \infty$, $\Pr[E=1] \to 1/2$, so the distribution of the error becomes uniform; we can then compute the limit:
$$\lim_{n\to\infty} I(X_0; X_n) = \lim_{n\to\infty} \bigl(1 - H(E)\bigr) = 1 - 1 = 0.$$

2. Channel with Two Independent Looks at $Y$

(a) We have
\begin{align*}
I(X; Y_1, Y_2) &= H(Y_1, Y_2) - H(Y_1, Y_2 \mid X) \\
&\overset{(a)}{=} H(Y_1, Y_2) - H(Y_1 \mid X) - H(Y_2 \mid X) \\
&= H(Y_1) + H(Y_2) - I(Y_1; Y_2) - H(Y_1 \mid X) - H(Y_2 \mid X) \\
&= I(X; Y_1) + I(X; Y_2) - I(Y_1; Y_2) \\
&\overset{(b)}{=} 2 I(X; Y_1) - I(Y_1; Y_2),
\end{align*}
where equality (a) is due to the Markov chain $Y_1 - X - Y_2$ and equality (b) holds because $Y_1$ and $Y_2$ are conditionally identically distributed given $X$.
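The chain of identities in part (a) of Problem 2 holds for any joint law with the Markov structure $Y_1 - X - Y_2$, and it can be sanity-checked numerically. The sketch below (Python; the BSC with crossover $p = 0.1$ and the uniform input are illustrative assumptions, not part of the problem) builds the joint distribution and verifies the identity, along with the resulting bound $I(X; Y_1, Y_2) \le 2 I(X; Y_1)$:

```python
import itertools
from math import log2

p = 0.1                                    # illustrative BSC crossover probability
px = {0: 0.5, 1: 0.5}                      # uniform input (any input would do)
pyx = lambda y, x: 1 - p if y == x else p  # BSC transition law

# Joint law of (X, Y1, Y2) with Y1, Y2 conditionally independent given X,
# i.e. the Markov chain Y1 - X - Y2 used in equality (a).
joint = {(x, y1, y2): px[x] * pyx(y1, x) * pyx(y2, x)
         for x, y1, y2 in itertools.product((0, 1), repeat=3)}

def H(dist):
    """Shannon entropy (bits) of a pmf given as a dict."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)

def marginal(keep):
    """Marginal pmf of the coordinates in `keep` (0 = X, 1 = Y1, 2 = Y2)."""
    out = {}
    for triple, q in joint.items():
        key = tuple(triple[i] for i in keep)
        out[key] = out.get(key, 0.0) + q
    return out

I_x_y1   = H(marginal([0])) + H(marginal([1])) - H(marginal([0, 1]))
I_y1_y2  = H(marginal([1])) + H(marginal([2])) - H(marginal([1, 2]))
I_x_y1y2 = H(marginal([0])) + H(marginal([1, 2])) - H(joint)

# The identity proved in part (a): I(X; Y1, Y2) = 2 I(X; Y1) - I(Y1; Y2),
# which in turn gives I(X; Y1, Y2) <= 2 I(X; Y1).
assert abs(I_x_y1y2 - (2 * I_x_y1 - I_y1_y2)) < 1e-9
assert I_x_y1y2 <= 2 * I_x_y1 + 1e-9
```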
(b) The capacity of the single-look channel $X \to Y_1$ is
$$C_1 = \max_{P_X} I(X; Y_1).$$
The capacity of the channel $X \to (Y_1, Y_2)$ is
$$C_2 = \max_{P_X} I(X; Y_1, Y_2) = \max_{P_X} \bigl[2 I(X; Y_1) - I(Y_1; Y_2)\bigr] \le \max_{P_X} 2 I(X; Y_1) = 2 C_1.$$

3. Tall, Fat People

(a) The average height of the individuals in the population is 5 feet, i.e., $\frac1n \sum_i h_i = 5$, where $n$ is the population size and $h_i$ is the height of the $i$-th person. If more than $1/3$ of the population were at least 15 feet tall, then the average would exceed $\frac13 \cdot 15 = 5$ feet, since each person's height is at least 0 feet. Thus no more than $1/3$ of the population is at least 15 feet tall.

(b) By the same reasoning as in part (a), at most $1/2$ of the population is at least 10 feet tall, and at most $1/3$ of the population weighs at least 300 lbs (since $\frac13 \cdot 300 = 100$ lbs, the average weight). Therefore, at most $1/3$ are both at least 10 feet tall and at least 300 lbs in weight.

4. Noise Alphabets

(a) The maximum capacity is 2 bits, achieved with $\mathcal{Z} = \{10, 20, 30\}$ and $P_X = (1/4, 1/4, 1/4, 1/4)$.

(b) The minimum capacity is 1 bit, achieved with $\mathcal{Z} = \{0, 1, 2\}$ and $P_X = (1/2, 0, 0, 1/2)$.

5. Joint Typicality

(a) Consider
\begin{align*}
\Pr\bigl[(\tilde X^n, \tilde Y^n, \tilde Z^n) \in A_\epsilon^{(n)}\bigr] &= \sum_{(x^n, y^n, z^n) \in A_\epsilon^{(n)}} p(x^n)\, p(y^n)\, p(z^n) \\
&\le \sum_{(x^n, y^n, z^n) \in A_\epsilon^{(n)}} 2^{-n(H(X)-\epsilon)}\, 2^{-n(H(Y)-\epsilon)}\, 2^{-n(H(Z)-\epsilon)} \\
&= |A_\epsilon^{(n)}|\, 2^{-n(H(X)-\epsilon)}\, 2^{-n(H(Y)-\epsilon)}\, 2^{-n(H(Z)-\epsilon)} \\
&\le 2^{n(H(X,Y,Z)+\epsilon)}\, 2^{-n(H(X)-\epsilon)}\, 2^{-n(H(Y)-\epsilon)}\, 2^{-n(H(Z)-\epsilon)} \\
&= 2^{-n(H(X)+H(Y)+H(Z)-H(X,Y,Z)-4\epsilon)}.
\end{align*}
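The two capacity values claimed in Problem 4 can be checked numerically. The sketch below (Python, not part of the original solutions) assumes the usual setup for this problem: $Y = X + Z$ with $X \in \{0,1,2,3\}$ and $Z$ uniform on the chosen noise alphabet, independent of $X$, so that $H(Y \mid X) = H(Z)$:

```python
from math import log2

def mutual_information(px, pz):
    """I(X;Y) in bits for Y = X + Z with X and Z independent.

    Given X = x, the map z -> x + z is a bijection, so H(Y|X) = H(Z)."""
    py = {}
    for x, qx in px.items():
        for z, qz in pz.items():
            py[x + z] = py.get(x + z, 0.0) + qx * qz
    h = lambda d: -sum(q * log2(q) for q in d.values() if q > 0)
    return h(py) - h(pz)

# Part (a): with Z = {10, 20, 30} every pair (x, z) yields a distinct sum,
# so X is recoverable from Y and the uniform input gives I(X;Y) = 2 bits.
I_max = mutual_information({0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25},
                           {10: 1/3, 20: 1/3, 30: 1/3})

# Part (b): with Z = {0, 1, 2}, the input P_X = (1/2, 0, 0, 1/2) makes Y
# uniform over {0, ..., 5} and achieves I(X;Y) = 1 bit.
I_min = mutual_information({0: 0.5, 1: 0.0, 2: 0.0, 3: 0.5},
                           {0: 1/3, 1: 1/3, 2: 1/3})

assert abs(I_max - 2.0) < 1e-9
assert abs(I_min - 1.0) < 1e-9
```

Note that this only verifies that the stated inputs attain the claimed values; the optimality over all noise alphabets and inputs is the content of the solution above.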
We may reverse all the inequalities to obtain
\begin{align*}
\Pr\bigl[(\tilde X^n, \tilde Y^n, \tilde Z^n) \in A_\epsilon^{(n)}\bigr] &= \sum_{(x^n, y^n, z^n) \in A_\epsilon^{(n)}} p(x^n)\, p(y^n)\, p(z^n) \\
&\ge \sum_{(x^n, y^n, z^n) \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)}\, 2^{-n(H(Y)+\epsilon)}\, 2^{-n(H(Z)+\epsilon)} \\
&= |A_\epsilon^{(n)}|\, 2^{-n(H(X)+\epsilon)}\, 2^{-n(H(Y)+\epsilon)}\, 2^{-n(H(Z)+\epsilon)} \\
&\ge (1-\epsilon)\, 2^{n(H(X,Y,Z)-\epsilon)}\, 2^{-n(H(X)+\epsilon)}\, 2^{-n(H(Y)+\epsilon)}\, 2^{-n(H(Z)+\epsilon)} \\
&= (1-\epsilon)\, 2^{-n(H(X)+H(Y)+H(Z)-H(X,Y,Z)+4\epsilon)},
\end{align*}
where the lower bound on the size of $A_\epsilon^{(n)}$ holds for all $n$ large enough depending on $\epsilon$.

6. Information Spectrum Analysis

In class, we saw how to do typical set decoding and proved that for all rates $R$ smaller than the capacity $C = \max_{P_X} I(X; Y)$, there exists a sequence of $(2^{nR}, n)$-codes with vanishing error probabilities. Here, we consider a refined version of this analysis, leading to better bounds on the error probability in decoding. We can also derive a general formula for channel capacity. Let $\mathcal{X}$ and $\mathcal{Y}$ be the input and output alphabets of a channel; these alphabets need not be discrete. Let $P_{Y|X}$ be a channel from $\mathcal{X}$ to $\mathcal{Y}$.

(a) Suppose we use the channel once. Show that there exists a code with $M$ codewords and average error probability $\varepsilon$ satisfying
$$\varepsilon \le \Pr\left[\log\frac{P_{Y|X}(Y \mid X)}{P_Y(Y)} \le \log M + \gamma\right] + 2^{-\gamma}$$
for any choice of $\gamma > 0$ and any input distribution $P_X$, where $P_Y(y) = \sum_x P_{Y|X}(y \mid x)\, P_X(x)$.

Hint: Generate codewords independently according to $P_X$. Instead of using typical set decoding, declare that $\hat m \in \{1, \ldots, M\}$ is the transmitted message if it is the unique one satisfying
$$\log\frac{P_{Y|X}(y \mid x(\hat m))}{P_Y(y)} > \log M + \gamma.$$
If there is no unique $\hat m$ satisfying the above condition, declare an error. The analysis to arrive at the one-shot finite blocklength bound above is very similar to typical set decoding. A stronger version of this bound for maximum error was shown by Feinstein [Fei54].

As provided in the hint, we generate $M$ codewords $x(m)$ independently from $P_X$. To send message $m$, transmit codeword $x(m)$. Decode using the rule given above. Assume $m = 1$ was sent.
We make an error if and only if one or more of the following events occurs:
\begin{align*}
\mathcal{E}_1 &:= \left\{\log\frac{P_{Y|X}(Y \mid X(1))}{P_Y(Y)} \le \log M + \gamma\right\} \\
\mathcal{E}_2 &:= \left\{\exists\, m \ne 1 : \log\frac{P_{Y|X}(Y \mid X(m))}{P_Y(Y)} > \log M + \gamma\right\}
\end{align*}
The probability of error can be bounded as
$$\Pr[\mathcal{E}] \le \Pr[\mathcal{E}_1] + \Pr[\mathcal{E}_2].$$
Now note that $(X(1), Y) \sim P_X \times P_{Y|X}$, and so $\Pr[\mathcal{E}_1]$ gives the first term in the bound we have to show. We simply have to show that $\Pr[\mathcal{E}_2] \le 2^{-\gamma}$. For this, consider
\begin{align*}
\Pr[\mathcal{E}_2] &= \Pr\left[\exists\, m \ne 1 : \log\frac{P_{Y|X}(Y \mid X(m))}{P_Y(Y)} > \log M + \gamma\right] \\
&\overset{(a)}{\le} \sum_{m=2}^{M} \Pr\left[\log\frac{P_{Y|X}(Y \mid X(m))}{P_Y(Y)} > \log M + \gamma\right] \\
&\overset{(b)}{=} \sum_{m=2}^{M} \sum_{x,y} P_X(x)\, P_Y(y)\, \mathbf{1}\left\{\log\frac{P_{Y|X}(y \mid x)}{P_Y(y)} > \log M + \gamma\right\} \\
&\overset{(c)}{\le} \sum_{m=2}^{M} \sum_{x,y} P_X(x)\, P_{Y|X}(y \mid x)\, M^{-1} 2^{-\gamma}\, \mathbf{1}\left\{\log\frac{P_{Y|X}(y \mid x)}{P_Y(y)} > \log M + \gamma\right\} \\
&\overset{(d)}{\le} \sum_{m=2}^{M} \sum_{x,y} P_X(x)\, P_{Y|X}(y \mid x)\, M^{-1} 2^{-\gamma} \\
&\overset{(e)}{\le} 2^{-\gamma},
\end{align*}
where (a) is due to the union bound, (b) is due to the fact that for $m \ne 1$ the codeword $X(m)$ and the channel output $Y$ are independent, (c) is due to the fact that on the event in the indicator we have $P_Y(y) \le P_{Y|X}(y \mid x)\, M^{-1} 2^{-\gamma}$, in (d) we drop the indicator, and in (e) we use the fact that $\sum_{x,y} P_X(x)\, P_{Y|X}(y \mid x) = 1$ and that there are $M-1$ terms in the outer sum.

(b) Based on part (a), prove the channel coding theorem for finite $\mathcal{X}$, $\mathcal{Y}$ and memoryless channels. Hint: Set $P_{X^n}$ above to be the $n$-fold product distribution corresponding to a capacity-achieving input distribution $P_X^* \in \arg\max_{P_X} I(X; Y)$. Set $\gamma$ above to be $n\gamma'$ for some $\gamma' > 0$. Set $\log M = n(C - 2\gamma')$. Apply the law of large numbers to the first term to see that there exists a sequence of $(n, 2^{n(C-2\gamma')})$-codes with vanishing average error probabilities.

Going to the setting of $n$ channel uses, we have that there exists a code with blocklength $n$, $M_n$ codewords, and average error probability $\varepsilon_n$ satisfying
$$\varepsilon_n \le \Pr\left[\log\frac{P_{Y^n|X^n}(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le \log M_n + n\gamma'\right] + 2^{-n\gamma'}.$$
Choose $P_{X^n}$ to be the $n$-fold product distribution corresponding to a capacity-achieving input distribution $P_X^* \in \arg\max_{P_X} I(X; Y)$. Since the channel is a DMC, we have
\begin{align*}
\Pr\left[\log\frac{P_{Y^n|X^n}(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le \log M_n + n\gamma'\right] &= \Pr\left[\sum_{i=1}^n \log\frac{P_{Y|X}(Y_i \mid X_i)}{P_Y(Y_i)} \le n(C - 2\gamma') + n\gamma'\right] \\
&= \Pr\left[\frac1n \sum_{i=1}^n \log\frac{P_{Y|X}(Y_i \mid X_i)}{P_Y(Y_i)} \le C - \gamma'\right].
\end{align*}
Since
$$\mathbb{E}\left[\log\frac{P_{Y|X}(Y_i \mid X_i)}{P_Y(Y_i)}\right] = I(X; Y) = C$$
for all $i$, we have that the probability above tends to zero by the weak law of large numbers. Clearly, the second term in the bound, $2^{-\gamma} = 2^{-n\gamma'}$, also tends to zero because $\gamma' > 0$. So we have demonstrated a sequence of codes for which $\varepsilon_n \to 0$ and whose rate is $C - 2\gamma'$, which is arbitrarily close to $C$.

(c) Again consider the setup in (b). Let
$$V := \mathrm{Var}\left[\log\frac{P_{Y|X}(Y \mid X)}{P_Y(Y)}\right]$$
evaluated at the (say, unique) capacity-achieving input distribution. Based on part (a), show using the central limit theorem that there exists a sequence of codes indexed by blocklength $n$, with sizes $M_n$ satisfying
$$\log M_n = nC + \sqrt{nV}\, \Phi^{-1}(\varepsilon) + o(\sqrt n),$$
such that the average error probability is no larger than $\varepsilon + o(1)$. This exercise demonstrates a cool refinement of the channel coding theorem we have seen. For more information about this class of results (information theory problems with non-vanishing error probabilities), you may refer to my monograph [Tan14]. This result was first shown by Strassen [Str62]. See also Hayashi [Hay09] and Polyanskiy-Poor-Verdú [PPV10].

Now we again use the bound
$$\varepsilon_n \le \Pr\left[\log\frac{P_{Y^n|X^n}(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le \log M_n + \gamma\right] + 2^{-\gamma}$$
with $\gamma = \log n$, so the final term is $1/n$. Plug the value of $M_n$ into the probability in the bound above. We have
\begin{align*}
\Pr\left[\log\frac{P_{Y^n|X^n}(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le \log M_n + \gamma\right] &= \Pr\left[\sum_{i=1}^n \log\frac{P_{Y|X}(Y_i \mid X_i)}{P_Y(Y_i)} \le nC + \sqrt{nV}\, \Phi^{-1}(\varepsilon) + \log n\right] \\
&= \Pr\left[\frac{1}{\sqrt{nV}} \sum_{i=1}^n \left(\log\frac{P_{Y|X}(Y_i \mid X_i)}{P_Y(Y_i)} - C\right) \le \Phi^{-1}(\varepsilon) + O\!\left(\frac{\log n}{\sqrt{nV}}\right)\right].
\end{align*}
Now note that for all $i \in [n]$,
$$\mathbb{E}\left[\frac{1}{\sqrt V}\left(\log\frac{P_{Y|X}(Y_i \mid X_i)}{P_Y(Y_i)} - C\right)\right] = 0, \qquad \mathrm{Var}\left[\frac{1}{\sqrt V}\left(\log\frac{P_{Y|X}(Y_i \mid X_i)}{P_Y(Y_i)} - C\right)\right] = 1,$$
so the random variable in the probability converges to a standard Gaussian by the central limit theorem. Consequently,
$$\Pr\left[\log\frac{P_{Y^n|X^n}(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le \log M_n + \gamma\right] \to \varepsilon$$
and we are done.

(d) Now consider a general channel $\{P_{Y^n|X^n} : \mathcal{X}^n \to \mathcal{Y}^n\}_{n \ge 1}$, which is simply a sequence of stochastic maps from $\mathcal{X}^n$ to $\mathcal{Y}^n$. Show that the capacity of this channel is bounded from below as follows:
$$C \ge \sup_{\{P_{X^n}\}} \sup\left\{a \in \mathbb{R} : \lim_{n\to\infty} \Pr\left[\frac1n \log\frac{P_{Y^n|X^n}(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le a\right] = 0\right\}$$
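For a DMC with an i.i.d. capacity-achieving input, the threshold in the bound of part (d) is governed by exactly the law-of-large-numbers argument of part (b): the single-letter information density has mean $I(X; Y) = C$, so its normalized sum concentrates at $C$. A minimal numerical sketch (Python; the BSC with crossover $0.11$ is an illustrative assumption, not part of the problem):

```python
from math import log2

p = 0.11                                   # illustrative BSC crossover probability
px = {0: 0.5, 1: 0.5}                      # capacity-achieving (uniform) input
pyx = lambda y, x: 1 - p if y == x else p  # BSC transition law
py = {y: sum(px[x] * pyx(y, x) for x in px) for y in (0, 1)}

# Single-letter information density i(x; y) = log2 P(y|x) / P(y).
density = {(x, y): log2(pyx(y, x) / py[y]) for x in px for y in py}

# Its mean under the joint law is I(X;Y) = 1 - h(p) = C, so by the weak law
# of large numbers (1/n) sum_i i(X_i; Y_i) -> C, and the supremum over
# admissible thresholds a in part (d) evaluates to C for this DMC.
mean_i = sum(px[x] * pyx(y, x) * d for (x, y), d in density.items())

hb = lambda q: -q * log2(q) - (1 - q) * log2(1 - q)  # binary entropy
assert abs(mean_i - (1 - hb(p))) < 1e-9
```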
In fact, this bound is tight; that is, there is a matching upper bound. What is cool is that the lower bound above is a generalization of the notion of convergence in probability. See Verdú-Han's beautiful paper on general formulas [VH94].

This follows directly from the definition and the bound in part (a).

7. List Decoding for Channel Coding

(a) Fano's inequality for list decoding: Define the error random variable
$$E = \begin{cases} 1 & W \notin L(Y) \\ 0 & W \in L(Y) \end{cases}$$
Now consider
$$H(W, E \mid L(Y)) = H(W \mid E, L(Y)) + H(E \mid L(Y)) = H(E \mid W, L(Y)) + H(W \mid L(Y)).$$
Let $P_e := \Pr[W \notin L(Y)]$. Now clearly $H(E \mid W, L(Y)) = 0$, and $H(E \mid L(Y)) \le H(E) = H_b(P_e)$. Next, we examine the term $H(W \mid E, L(Y))$. We have
\begin{align*}
H(W \mid E, L(Y)) &= \Pr[E=0]\, H(W \mid E=0, L(Y)) + \Pr[E=1]\, H(W \mid E=1, L(Y)) \\
&\le (1 - P_e) \log l + P_e \log\bigl(|\mathcal{W}| - l\bigr),
\end{align*}
since if we know that $E = 0$, the number of values that $W$ can take on is no more than $l$, and if $E = 1$, the number of values that $W$ can take on is no more than $|\mathcal{W}| - l$. Putting everything together and upper bounding $H_b(P_e)$ by 1, we have
$$P_e \ge \frac{H(W \mid L(Y)) - 1 - \log l}{\log\frac{|\mathcal{W}| - l}{l}}.$$

(b) We have that the minimum such $a$ is the capacity $C$.

(c) We have
$$nR = H(W) = H(W \mid L(Y^n)) + I(W; L(Y^n)).$$
Also,
\begin{align*}
I(X^n; Y^n) &= H(Y^n) - H(Y^n \mid X^n) \\
&= H(Y^n) - \sum_{i=1}^n H(Y_i \mid X_i) \\
&\le \sum_{i=1}^n H(Y_i) - \sum_{i=1}^n H(Y_i \mid X_i) \\
&= \sum_{i=1}^n I(X_i; Y_i) \le nC.
\end{align*}
Now, from Fano's inequality for list decoding (with $|\mathcal{W}| = 2^{nR}$ and list size $2^{nL}$),
$$H(W \mid L(Y^n)) \le P_e \log\bigl(|\mathcal{W}| - 2^{nL}\bigr) + 1 + \log 2^{nL} = nL + n\epsilon_n,$$
where $\epsilon_n \to 0$ as $n \to \infty$. Furthermore,
$$I(W; L(Y^n)) \le I(X^n; Y^n) \le nC,$$
where the first inequality follows from data processing; cf. $W \to X^n \to Y^n \to L(Y^n)$ forms a Markov chain. So we have
$$nR \le nL + n\epsilon_n + nC,$$
which, upon dividing by $n$ and taking $\limsup$ on both sides, yields
$$R \le L + C =: R^+.$$

8. Capacity Calculation for Symmetric Channels

The capacity is $\log_2 m - h(1/4)$. Consider
$$I(X; Y) = H(Y) - H(Y \mid X) = H(Y) - h(1/4).$$
Note that $H(Y)$ is maximized at the value $\log_2 m$, and this is achievable using the uniform input distribution $p(x) = 1/m$ for all $x \in \{0, 1, \ldots, m-1\}$.

References

[Fei54] A. Feinstein. A new basic theorem of information theory. IEEE Transactions on Information Theory, 4(4):2-22, 1954.

[Hay09] M. Hayashi. Information spectrum approach to second-order coding rate in channel coding. IEEE Transactions on Information Theory, 55(11):4947-4966, 2009.

[PPV10] Y. Polyanskiy, H. V. Poor, and S. Verdú. Channel coding rate in the finite blocklength regime. IEEE Transactions on Information Theory, 56(5):2307-2359, 2010.

[Str62] V. Strassen. Asymptotische Abschätzungen in Shannons Informationstheorie. In Trans. Third Prague Conf. Inf. Theory, pages 689-723, Prague, 1962. http://www.math.cornell.edu/ pmlut/strassen.pdf.

[Tan14] V. Y. F. Tan. Asymptotic estimates in information theory with non-vanishing error probabilities. Foundations and Trends in Communications and Information Theory, 11(1-2):1-184, 2014.

[VH94] S. Verdú and T. S. Han. A general formula for channel capacity. IEEE Transactions on Information Theory, 40(4):1147-1157, 1994.
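As a numerical appendix to Problem 8: $\log_2 m - h(1/4)$ is the capacity of any symmetric channel whose transition-matrix rows have entropy $h(1/4)$. The sketch below (Python) uses one hypothetical such channel, the additive-noise channel $Y = (X + Z) \bmod m$ with $\Pr[Z = 1] = 1/4 = 1 - \Pr[Z = 0]$; this is an assumed instance for illustration, not necessarily the exact channel in the problem statement:

```python
from math import log2

def capacity_mod_additive(m, pz):
    """Capacity of Y = (X + Z) mod m for a given noise pmf pz.

    By symmetry the uniform input is capacity-achieving: it makes Y
    uniform, so H(Y) = log2(m), while H(Y|X) = H(Z) for any input."""
    hz = -sum(q * log2(q) for q in pz.values() if q > 0)
    return log2(m) - hz

m = 8                                      # illustrative alphabet size
# Hypothetical noise with H(Z) = h(1/4): Z = 1 w.p. 1/4, Z = 0 w.p. 3/4.
C = capacity_mod_additive(m, {0: 3/4, 1: 1/4})

hb = lambda q: -q * log2(q) - (1 - q) * log2(1 - q)  # binary entropy h(.)
assert abs(C - (log2(m) - hb(1/4))) < 1e-9
```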