EE5139R: Problem Set 4
Assigned: 31/08/16, Due: 07/09/16

1. Cover and Thomas, Problem 3.5 (Sets defined by probabilities): Define the set $C_n(t) = \{x^n : P_{X^n}(x^n) \ge 2^{-nt}\}$.

(a) We have
\[ 1 = \sum_{x^n} P_{X^n}(x^n) \ge \sum_{x^n \in C_n(t)} P_{X^n}(x^n) \ge |C_n(t)| \, 2^{-nt}, \]
from which the desired result $|C_n(t)| \le 2^{nt}$ follows.

(b) We want to find the set of values of $t$ for which $\Pr(X^n \in C_n(t)) \to 1$. Consider the probability therein:
\[ \Pr(X^n \in C_n(t)) = \Pr\big(P_{X^n}(X^n) \ge 2^{-nt}\big) = \Pr\Big({-\frac{1}{n}}\log P_{X^n}(X^n) \le t\Big) = \Pr\Big(\frac{1}{n}\sum_{i=1}^{n} -\log P_X(X_i) \le t\Big). \]
Now note that the mean of $-\log P_X(X_i)$ is $H(X)$ and this random variable has finite variance. If $t = H(X) + \delta$ for any $\delta > 0$, then by the law of large numbers the probability converges to one. Hence the required set is the open interval $(H(X), \infty)$.

2. (Optional) Cover and Thomas, Problem 3.7 (AEP and source coding):

(a) The number of 100-bit binary sequences with three or fewer ones is
\[ \binom{100}{0} + \binom{100}{1} + \binom{100}{2} + \binom{100}{3} = 1 + 100 + 4950 + 161700 = 166751, \]
so the required codelength is $\lceil \log_2 166751 \rceil = 18$.

(b) The probability that a 100-bit sequence has three or fewer ones is
\[ \sum_{i=0}^{3} \binom{100}{i} (0.005)^i (0.995)^{100-i} \approx 0.99833. \]
Thus, the probability that the generated sequence cannot be encoded is $1 - 0.99833 = 0.00167$.

(c) If $S_n$ is the sum of $n$ iid random variables $X_1, \ldots, X_n$, Chebyshev's inequality states that
\[ \Pr(|S_n - n\mu| \ge \epsilon) \le \frac{n\sigma^2}{\epsilon^2}, \]
where $\mu$ and $\sigma^2$ are the mean and variance of the $X_i$'s. In this problem, $n = 100$, $\mu = 0.005$, and $\sigma^2 = 0.005 \times 0.995$. Note that $S_{100} \ge 4$ if and only if $S_{100} - 100 \times 0.005 \ge 3.5$, so we should choose $\epsilon = 3.5$. Then
\[ \Pr(S_{100} \ge 4) \le \frac{100 \times 0.005 \times 0.995}{3.5^2} \approx 0.0406. \]
This bound is much larger than the actual probability $0.00167$.

3. Cover and Thomas, Problem 3.9 (AEP and divergence):

(a) We have that $X_1, \ldots, X_n$ are iid according to $p(x)$. We are asked to evaluate the limit in probability of
\[ L = -\frac{1}{n} \log q(X_1, \ldots, X_n). \]
First note from memorylessness (independence) that
\[ L = -\frac{1}{n} \sum_{i=1}^{n} \log q(X_i). \]
We may also write
\[ L = \underbrace{\frac{1}{n} \sum_{i=1}^{n} \log \frac{p(X_i)}{q(X_i)}}_{=L_1} \; \underbrace{{} - \frac{1}{n} \sum_{i=1}^{n} \log p(X_i)}_{=L_2}. \]
We know by the usual AEP that the second term $L_2$ converges to $H(p) = H(X)$ in probability. The first term $L_1$ converges in probability to
\[ \mathbb{E}[L_1] = \sum_{x} p(x) \log \frac{p(x)}{q(x)} = D(p \| q), \]
so $L$ converges in probability to
\[ \mathbb{E}[L] = D(p \| q) + H(p). \]

(b) The limit in probability of the log-likelihood ratio $\frac{1}{n} \log \frac{q(X^n)}{p(X^n)}$ is
\[ \mathbb{E}\Big[\log \frac{q(X)}{p(X)}\Big] = \sum_{x} p(x) \log \frac{q(x)}{p(x)} = -D(p \| q). \]

4. From Last Year's Exam:

(a) One value for $c$ is
\[ c = \frac{H(X_1) + H(X_2)}{2}. \]
This is because
\[ -\frac{1}{n} \log \Pr(X^n = x^n) = -\frac{1}{n} \sum_{i \,\text{odd}} \log \Pr(X_i = x_i) \; - \; \frac{1}{n} \sum_{i \,\text{even}} \log \Pr(X_i = x_i). \]
By the law of large numbers, the first sum tends to $H(X_1)/2$ since the distribution of $X_i$ for odd $i$ is $p_{X_1}$, while the second sum tends to $H(X_2)/2$ since the distribution of $X_i$ for even $i$ is $p_{X_2}$.
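The arithmetic in Problem 3.7 above can be verified numerically. The following is an illustrative check (our addition, not part of the original solutions), using only the Python standard library:

```python
from math import comb, log2, ceil

# (a) number of 100-bit sequences with at most three ones
count = sum(comb(100, i) for i in range(4))
bits = ceil(log2(count))  # codelength needed to index them all

# (b) probability that a length-100 Bernoulli(0.005) sequence
#     has at most three ones, and its complement
p_ok = sum(comb(100, i) * 0.005**i * 0.995**(100 - i) for i in range(4))
p_fail = 1 - p_ok

# (c) Chebyshev bound with n = 100, mu = 0.005, eps = 3.5
cheby = 100 * 0.005 * 0.995 / 3.5**2

print(count, bits)       # 166751 18
print(round(p_fail, 5))  # 0.00167
print(round(cheby, 4))   # 0.0406
```

The Chebyshev bound in (c) is indeed more than twenty times the true probability from (b).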
(b) The optimal compression rate is $c$. We code as follows. If the encoder observes a sequence in $T_\varepsilon^{n}(c)$, represent it using $c + \varepsilon + 2/n$ bits per symbol (with a single-bit prefix to indicate that the sequence belongs to the typical set). If the encoder observes a sequence not in $T_\varepsilon^{n}(c)$, encode it with an arbitrary string of length $nc + n\varepsilon + 2$. This ensures that the compression rate is no more than $c + \varepsilon + 2/n$ and the probability of decoding error goes to zero as $n$ becomes large. Since $c + \varepsilon + 2/n$ is arbitrarily close to $c$, the claim is proved.

(c) The minimum compression rate is $H(X|Y)$. We code as follows, assuming $\mathcal{Y} = \{0, 1\}$. Let $I_0 := \{i = 1, \ldots, n : y_i = 0\}$ be the indices at which the side information is $y_i = 0$ and let $I_1 := \{1, \ldots, n\} \setminus I_0$. Then we use an optimal length-$|I_0|$ source code for the source $\Pr(X = x \mid Y = 0)$ and an optimal length-$|I_1|$ source code for the source $\Pr(X = x \mid Y = 1)$. The code rate is thus
\[ \frac{|I_0|}{n} H(X|Y=0) + \frac{|I_1|}{n} H(X|Y=1). \]
The decoder also knows the indices $I_0$ and $I_1$, so it can partition its observations into these two subblocks and decode each as per normal. Moreover, by the law of large numbers, $|I_j|/n \to p_Y(j)$ for $j = 0, 1$. Thus, for large enough $n$, with very high probability the code rate above is close to
\[ p_Y(0) H(X|Y=0) + p_Y(1) H(X|Y=1), \]
which is the conditional entropy $H(X|Y)$.

(d) The source has uncertainty (entropy) $H(X)$. The side information reduces the uncertainty from $H(X)$ to $H(X|Y)$. The difference is the mutual information $I(X;Y)$, which is the reduction in the uncertainty of $X$ given knowledge of $Y$.

(e) Yes. The code rate will be increased to
\[ \frac{9}{10} H(X) + \frac{1}{10} H(X|Y), \]
since the side information is only available one-tenth of the time.

5. (Optional) 2015/16 Quiz 1: (10 points) Weighted source coding: In class, we saw that the minimum rate of compression for an i.i.d. source $X^n$ with distribution $P_X$ is $H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x)$. Now suppose that there are costs to encoding each symbol. Consider a cost function $c : \mathcal{X} \to [0, \infty)$.
For any length-$n$ string, let
\[ c^{(n)}(x^n) := \prod_{i=1}^{n} c(x_i), \]
and let the size of any set $A \subset \mathcal{X}^n$ be
\[ c^{(n)}(A) := \sum_{x^n \in A} c^{(n)}(x^n). \]
We say that a rate $R$ is achievable if there exists a sequence (in $n$) of sets $A_n \subset \mathcal{X}^n$ whose sizes $c^{(n)}(A_n)$ satisfy
\[ \frac{1}{n} \log c^{(n)}(A_n) \le R \quad \text{and} \quad \Pr(X^n \notin A_n) \to 0 \quad \text{as } n \to \infty. \]
We also define the optimal weighted source coding rate to be
\[ R^*(X; c) := \inf\{R \in \mathbb{R} : R \text{ is achievable}\}. \]
Define
\[ H(P; c) := \sum_{x \in \mathcal{X}} P_X(x) \log \frac{c(x)}{P_X(x)}, \]
and for a small $\epsilon > 0$, the set
\[ B_\epsilon^{(n)}(X; c) := \Big\{ x^n : H(P; c) - \epsilon \le \frac{1}{n} \log \frac{c^{(n)}(x^n)}{P_{X^n}(x^n)} \le H(P; c) + \epsilon \Big\}. \]

(a) What is $R^*(X; c)$ when $c(x) = 1$ for all $x \in \mathcal{X}$? No justification is needed; only an information quantity needs to be stated.

The answer is $H(X)$, the entropy.

(b) Now for general $c : \mathcal{X} \to [0, \infty)$, is it true that
\[ \Pr\big(X^n \in B_\epsilon^{(n)}(X; c)\big) \to 1, \quad \text{as } n \to \infty? \]
Prove it or argue why it is not true.

Yes, it is true, by the law of large numbers. Indeed, by Chebyshev's inequality,
\[ \Pr\big(X^n \notin B_\epsilon^{(n)}(X; c)\big) = \Pr\Big( \Big| \frac{1}{n} \sum_{i=1}^{n} \log \frac{c(X_i)}{P_X(X_i)} - H(P; c) \Big| > \epsilon \Big) \le \frac{\operatorname{Var}\big(\log \frac{c(X)}{P_X(X)}\big)}{n \epsilon^2} \to 0. \]

(c) Show carefully that
\[ c^{(n)}\big(B_\epsilon^{(n)}(X; c)\big) \le 2^{n(H(P;c) + \epsilon)}. \]

We have
\[ c^{(n)}\big(B_\epsilon^{(n)}(X; c)\big) = \sum_{x^n \in B_\epsilon^{(n)}(X;c)} c^{(n)}(x^n) \le \sum_{x^n \in B_\epsilon^{(n)}(X;c)} 2^{n(H(P;c)+\epsilon)} P_{X^n}(x^n) \le 2^{n(H(P;c)+\epsilon)}, \]
where the first inequality holds because $c^{(n)}(x^n) \le 2^{n(H(P;c)+\epsilon)} P_{X^n}(x^n)$ for every $x^n \in B_\epsilon^{(n)}(X; c)$, by definition of the set.

(d) Using part (c), find the best possible upper bound for $R^*(X; c)$. You need to prove an achievability result, i.e., specify the sets $A_n$ and provide a clear reason for your upper bound. You do not need to prove any converse.

An upper bound is $R^*(X; c) \le H(P; c) + \epsilon$. Take $A_n := B_\epsilon^{(n)}(X; c)$: by part (c), $\frac{1}{n} \log c^{(n)}(A_n) \le H(P; c) + \epsilon$, and by part (b), $\Pr(X^n \notin A_n) \to 0$.

6. Bad Huffman codes: Which of these codes cannot be Huffman codes for any probability assignment?

(a) $\{0, 10, 11\}$.
Solution: $\{0, 10, 11\}$ is a Huffman code for the distribution $(1/2, 1/4, 1/4)$.
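As a cross-check of the claim just made, here is a minimal Huffman construction in Python (an illustrative sketch, not part of the original solutions; the helper name `huffman_lengths` is ours). It recovers the codeword lengths $(1, 2, 2)$ for the distribution $(1/2, 1/4, 1/4)$:

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return Huffman codeword lengths for the given probabilities."""
    tiebreak = count()  # avoids comparing lists when probabilities tie
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # every merge adds one bit to each merged symbol
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), s1 + s2))
    return lengths

print(huffman_lengths([0.5, 0.25, 0.25]))  # [1, 2, 2]
```

Tracking only codeword lengths (each merge adds one bit to every symbol in the merged subtree) suffices here, since any set of lengths satisfying Kraft's inequality with equality can be converted into actual codewords.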
(b) $\{00, 01, 10, 110\}$.
Solution: $\{00, 01, 10, 110\}$ is not a Huffman code because there is a unique longest codeword.

(c) $\{01, 10\}$.
Solution: The code $\{01, 10\}$ can be shortened to $\{01, 1\}$ without losing its instantaneous property, and therefore it is not optimal and not a Huffman code.

7. (Optional) Suffix-free codes: Define a suffix-free code as a code in which no codeword is a suffix of any other codeword.

(a) Show that suffix-free codes are uniquely decodable. Use the definition of unique decodability, rather than the intuitive but vague idea of decodability with initial synchronization.

Solution: Assume the contrary, i.e., that a suffix-free code is not uniquely decodable. Then there must exist two distinct sequences of source letters, say $(x_1, x_2, \ldots, x_n)$ and $(x'_1, x'_2, \ldots, x'_m)$, such that
\[ C(x_1) C(x_2) \cdots C(x_n) = C(x'_1) C(x'_2) \cdots C(x'_m). \]
Then one of the following must hold: (i) $C(x_n) = C(x'_m)$, (ii) $C(x_n)$ is a suffix of $C(x'_m)$, or (iii) $C(x'_m)$ is a suffix of $C(x_n)$. In the last two cases we arrive at a contradiction since our code is suffix-free. In the first case, simply delete the last source letter from each sequence and repeat the argument until one of the latter two cases holds and a contradiction is reached. Hence, suffix-free codes are uniquely decodable.

Alternatively, the fact that the code is uniquely decodable can be seen easily by reversing the order of the code. For any received sequence, we work backwards from the end and look for the reversed codewords. Since the codewords satisfy the suffix condition, the reversed codewords satisfy the prefix condition, and so we can uniquely decode the reversed code.

(b) Find an example of a suffix-free code with codeword lengths $(1, 2, 2)$ that is not a prefix-free code. Can a codeword be decoded as soon as its last bit arrives at the decoder? Show that a decoder might have to wait for an arbitrarily long time before decoding (this is why a careful definition of unique decodability is required).
Solution: The $\{0, 01, 11\}$ code discussed in the lecture is an example of a suffix-free code with codeword lengths $(1, 2, 2)$ that is not a prefix-free code. Clearly, a codeword cannot always be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder: $0111111\ldots$ Assuming that the source letters $\{a, b, c\}$ map to $\{0, 01, 11\}$, we cannot distinguish between the two possible source sequences $accccccc\ldots$ and $bccccccc\ldots$ until the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding.

8. (Optional) Kraft for uniquely decodable codes: Assume a uniquely decodable code has lengths $l_1, \ldots, l_M$.

(a) Prove the following identity (this is easy):
\[ \Big( \sum_{j=1}^{M} 2^{-l_j} \Big)^{n} = \sum_{j_1=1}^{M} \sum_{j_2=1}^{M} \cdots \sum_{j_n=1}^{M} 2^{-(l_{j_1} + l_{j_2} + \cdots + l_{j_n})}. \]
Solution: This is trivial. Simply expand the product of sums.

(b) Show that there is one term on the right for each concatenation of $n$ codewords (i.e., for the encoding of the $n$-tuple $x = (x_1, \ldots, x_n)$), where $l_{j_1} + l_{j_2} + \cdots + l_{j_n}$ is the aggregate length of that concatenation.
Solution: Each index tuple $(j_1, \ldots, j_n)$ corresponds to exactly one concatenation of $n$ codewords from the code, and $l_{j_1} + l_{j_2} + \cdots + l_{j_n}$ is the aggregate length of that concatenation.
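The identity in part (a) of Problem 8 is easy to confirm numerically before proceeding; the lengths below are an arbitrary illustrative choice (this check is ours, not part of the original solutions):

```python
from itertools import product

lengths = [1, 2, 2]  # e.g. the code {0, 10, 11}; any lengths work here
n = 5

# left-hand side: n-th power of the Kraft sum
lhs = sum(2.0**-l for l in lengths) ** n
# right-hand side: one term per n-fold concatenation of codewords
rhs = sum(2.0**-sum(tup) for tup in product(lengths, repeat=n))
print(abs(lhs - rhs) < 1e-12)  # True
```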
(c) Let $A_i$ be the number of concatenations that have overall length $i$, and show that
\[ \Big( \sum_{j=1}^{M} 2^{-l_j} \Big)^{n} = \sum_{i=n}^{n l_{\max}} A_i \, 2^{-i}. \]
Solution: The smallest value the exponent can take is $n$, which would happen if all codewords had length $1$. The largest value the exponent can take is $n l_{\max}$, where $l_{\max}$ is the maximal codeword length. Grouping the terms of the sum in part (a) according to their overall length then gives the expression above.

(d) Using unique decodability, show that $A_i \le 2^i$ and hence
\[ \Big( \sum_{j=1}^{M} 2^{-l_j} \Big)^{n} \le n l_{\max}. \]
Solution: The number of possible binary sequences of length $i$ is $2^i$. Since the code is uniquely decodable, distinct concatenations must produce distinct binary sequences, so we must have $A_i \le 2^i$ in order to be able to decode. Plugging this into the above bound yields
\[ \Big( \sum_{j=1}^{M} 2^{-l_j} \Big)^{n} \le \sum_{i=n}^{n l_{\max}} 2^{i} \, 2^{-i} = n(l_{\max} - 1) + 1 \le n l_{\max}. \]

(e) By taking the $n$-th root and letting $n \to \infty$, recover Kraft's inequality for uniquely decodable codes.
Solution: We have
\[ \sum_{j=1}^{M} 2^{-l_j} \le \big( n l_{\max} \big)^{1/n} = \exp\Big( \frac{1}{n} \log(n l_{\max}) \Big). \]
The exponent goes to zero as $n \to \infty$, and hence
\[ \sum_{j=1}^{M} 2^{-l_j} \le 1, \]
which is Kraft's inequality for UD codes.

9. (Optional) Infinite alphabet optimal code: Let $X$ be an i.i.d. random variable with an infinite alphabet $\mathcal{X} = \{1, 2, 3, \ldots\}$. In addition let $P(X = i) = 2^{-i}$.

(a) What is the entropy of $X$?
Solution: By direct calculation,
\[ H(X) = \sum_{i=1}^{\infty} 2^{-i} \log(2^{i}) = \sum_{i=1}^{\infty} i \, 2^{-i} = 2. \]
This is because
\[ \sum_{i=1}^{\infty} i x^{i-1} = \frac{1}{(1-x)^2} \]
for all $|x| < 1$, which can be shown by differentiating the geometric series; multiplying by $x$ and substituting $x = 1/2$ gives $\sum_{i=1}^{\infty} i \, 2^{-i} = \frac{1/2}{(1/2)^2} = 2$.

(b) Find an optimal variable-length code, and show that it is indeed optimal.
Solution: Take the codelengths to be $\log(2^1), \log(2^2), \log(2^3), \ldots$, i.e., $l_i = i$. Codewords can be
\[ C(1) = 0, \quad C(2) = 10, \quad C(3) = 110, \quad \ldots \]
This prefix-free code has expected length $\sum_{i=1}^{\infty} i \, 2^{-i} = 2 = H(X)$, and since the entropy is a lower bound on the expected length of any uniquely decodable code, the code is optimal.
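The two claims in Problem 9, $H(X) = 2$ and expected codeword length equal to the entropy, can be checked by truncating the infinite sums (an illustrative check, not part of the original solutions; the truncation point $N$ is arbitrary since the tail is negligible):

```python
from math import log2

N = 200  # truncation point; the tail beyond N is astronomically small
probs = [2.0**-i for i in range(1, N + 1)]

entropy = -sum(p * log2(p) for p in probs)            # H(X) = sum_i i 2^-i
avg_len = sum(i * p for i, p in enumerate(probs, 1))  # codeword i has length i

print(round(entropy, 9), round(avg_len, 9))  # 2.0 2.0
```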