Lecture 16

Agenda for the lecture
- Error-free variable-length schemes (contd.): Shannon-Fano-Elias code, Huffman code
- Variable-length source codes with error

16.1 Error-free coding schemes

16.1.1 The Shannon-Fano-Elias code

Our next coding scheme is Elias's twist on the Shannon-Fano code of the previous class, which gets rid of the requirement to sort the probabilities. However, this algorithmic efficiency comes at the cost of 1 bit: we now need $H(X) + 2$ bits instead of $H(X) + 1$ as before.

Input: Source distribution $P = (p_1, \dots, p_m)$
Output: Code $C = \{c_1, \dots, c_m\}$.

1. for $i = 1, \dots, m$

   (i) Let $\ell(i) = \lceil \log \frac{1}{p_i} \rceil + 1$.
   (ii) Compute $F_i = \sum_{j<i} p_j + \frac{p_i}{2}$ (so that $F_1 = \frac{p_1}{2}$). Let $c$ denote the infinite sequence corresponding to the binary representation of $F_i$. If $F_i$ has a terminating binary representation, append 0s at the end to make it an infinite sequence.

   (iii) The codeword $c_i$ is given by the first $\ell(i)$ bits of $c$, i.e., by the approximation of $c$ to $\ell(i)$ bits.

For illustration, consider the example from the last class once more:

Symbol   P_X    F_i in binary   l(i)   Codeword
a        1/8    0.0001          4      0001
b        1/2    0.011           2      01
c        1/4    0.11            3      110
d        1/8    0.1111          4      1111

Table 1: An illustration of the Shannon-Fano-Elias code.

From the length assignments it is clear that the average length of this code is less than $H(X) + 2$. It only remains to verify that this code is prefix-free.

Theorem 16.1. A Shannon-Fano-Elias code is prefix-free.

Proof. Consider $F_i$ and $F_j$ such that $i < j$. Then, as noted in the analysis of Shannon-Fano codes, the codeword $c_j$ satisfies
$$0.c_j \le F_j \le 0.c_j + 2^{-\ell(j)} \le 0.c_j + 2^{\log p_j - 1} = 0.c_j + \frac{p_j}{2}.$$
Thus,
$$0.c_i \le F_i \le F_j - \frac{p_i + p_j}{2} \le 0.c_j - \frac{p_i}{2}.$$
In particular, $0.c_j > 0.c_i$, and so $c_j$ cannot be a prefix of $c_i$. On the other hand, the bound above gives $0.c_j \ge 0.c_i + \frac{p_i}{2} \ge 0.c_i + 2^{-\ell(i)}$, since $2^{-\ell(i)} \le 2^{\log p_i - 1} = \frac{p_i}{2}$. Note that since both $c_i$ and $c_j$ are of finite lengths and $c_i$ is of length $\ell(i)$, $c_i$ can be a prefix of $c_j$ only if $0.c_j < 0.c_i + 2^{-\ell(i)}$, which does not hold.
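As a sanity check, here is a short Python sketch of the construction above. The function name and the float-based arithmetic are our own illustrative choices, not part of the lecture; exact arithmetic (e.g., with fractions) would avoid rounding issues for non-dyadic pmfs.

```python
import math

def sfe_code(probs):
    """Shannon-Fano-Elias codewords for a pmf given as a list (a sketch)."""
    codewords = []
    cumulative = 0.0                    # sum of p_j for j < i
    for p in probs:
        F = cumulative + p / 2          # the midpoint F_i
        length = math.ceil(math.log2(1 / p)) + 1   # l(i) = ceil(log 1/p_i) + 1
        # Take the first l(i) bits of the binary expansion of F_i;
        # terminating expansions are naturally padded with 0s here.
        bits, frac = [], F
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            bits.append(str(bit))
            frac -= bit
        codewords.append("".join(bits))
        cumulative += p
    return codewords

# The example of Table 1: P = (1/8, 1/2, 1/4, 1/8) for symbols a, b, c, d.
print(sfe_code([1/8, 1/2, 1/4, 1/8]))  # ['0001', '01', '110', '1111']
```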
16.1.2 Huffman code

We next present the Huffman code, which still requires us to sort the probabilities, but has average length exactly equal to $L_p(X)$, i.e., it is of optimal average length. Unlike the two earlier coding schemes, we now represent the prefix-free code by a binary tree.

Input: Source distribution $P = (p_1, \dots, p_m)$
Output: Code $C = \{c_1, \dots, c_m\}$.

1. Associate with each symbol a leaf node.

2. while $m \ge 2$, do

   (i) If $m = 2$, assign the two symbols as the left and the right child of the root. Update $m$ to 1.

   (ii) Sort the probabilities of the symbols in descending order. Let $p_1 \ge p_2 \ge \dots \ge p_m$ be the sorted sequence of probabilities.

   (iii) If $m > 2$, assign the $(m-1)$th and the $m$th symbols as the left and the right child of a new node. Treat this node and its subtree as a new symbol which occurs with probability $p_{m-1} + p_m$ and associate a leaf with it. Update $m \leftarrow m - 1$ and $P \leftarrow (p_1, \dots, p_{m-2}, p_{m-1} + p_m)$.

3. Generate a binary code from the tree by putting a 0 over each edge to a left child and a 1 over each edge to a right child.

For illustration, consider our foregoing example once more. The algorithm proceeds as follows (the reader can easily decode our representation of the evolving tree):

((a, 0.125); (b, 0.5); (c, 0.25); (d, 0.125))
((b, 0.5); (c, 0.25); (ad, 0.25))
((b, 0.5); (c(ad), 0.5))
((b(c(ad)), 1))
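The following Python sketch mirrors the procedure above, using a min-heap instead of repeated sorting (an implementation choice of ours, not of the lecture). Ties are broken by insertion order, which here happens to reproduce the code of Table 2 below.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return a Huffman codeword for each index of the pmf `probs` (a sketch)."""
    # Heap entries are (probability, tiebreaker, tree), where a tree is
    # either a symbol index (a leaf) or a pair (left, right).
    tiebreak = count()
    heap = [(p, next(tiebreak), i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # smallest remaining probability
        p2, _, t2 = heapq.heappop(heap)   # second smallest
        # The lecture puts the larger of the two on the left; under ties,
        # either order yields an optimal code.
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))
    _, _, root = heap[0]

    codes = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):          # internal node: recurse
            assign(tree[0], prefix + "0")    # 0 on the edge to the left child
            assign(tree[1], prefix + "1")    # 1 on the edge to the right child
        else:
            codes[tree] = prefix             # leaf: record the codeword
    assign(root, "")
    return [codes[i] for i in range(len(probs))]

print(huffman_code([1/8, 1/2, 1/4, 1/8]))   # ['110', '0', '10', '111']
```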
In summary, we have the following code:

Symbol   P_X    Codeword
a        1/8    110
b        1/2    0
c        1/4    10
d        1/8    111

Table 2: An illustration of the Huffman code.

The average length for this example is $\frac{1}{8}\cdot 3 + \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 = \frac{7}{4}$, which is equal to the entropy. Therefore, the average length must be optimal.
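A quick numerical check of this claim (an illustrative snippet of ours, not part of the lecture): for the dyadic pmf of Table 2 the average codeword length indeed equals the entropy $H(X)$.

```python
import math

probs   = [1/8, 1/2, 1/4, 1/8]
lengths = [3, 1, 2, 3]                 # codeword lengths from Table 2

avg_len = sum(p * l for p, l in zip(probs, lengths))
entropy = -sum(p * math.log2(p) for p in probs)
print(avg_len, entropy)                # 1.75 1.75
```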
In fact, the Huffman code always attains the optimal average length.

Theorem 16.2. A Huffman code has optimal average length.

We only provide a sketch of the proof. An interested reader can find the full proof in Cover and Thomas.

Proof sketch. The proof relies on the following observation: there exists a prefix-free code of optimal average length which assigns the two symbols with the least probabilities to the two longest codewords, of length $\ell_{\max}$, such that the first $\ell_{\max} - 1$ bits of the two codewords are the same.

Next, we show the optimality of the Huffman code by induction on the number of symbols. Denote by $L(P)$ the minimum average length of a prefix-free code for $P$. Suppose the Huffman code attains the optimal length for every probability distribution over an alphabet of size $m - 1$. For $P = (p_1, \dots, p_m)$, consider the prefix-free code $C$ of average length $L(P)$ guaranteed by the observation above. Let $L_H(P)$ denote the average length of the Huffman code. Then,
$$L((p_1, \dots, p_m)) \le L_H((p_1, \dots, p_m)) = (p_{m-1} + p_m) + L_H((p_1, \dots, p_{m-2}, p_{m-1} + p_m)) = (p_{m-1} + p_m) + L((p_1, \dots, p_{m-2}, p_{m-1} + p_m)),$$
where the last equality is by the induction hypothesis. On the other hand, by the property of the optimal code $C$, merging its two longest codewords yields a prefix-free code for $(p_1, \dots, p_{m-2}, p_{m-1} + p_m)$ of average length $L((p_1, \dots, p_m)) - (p_{m-1} + p_m)$. But then
$$L((p_1, \dots, p_m)) \ge (p_{m-1} + p_m) + L((p_1, \dots, p_{m-2}, p_{m-1} + p_m)).$$
Thus, by combining all the bounds above, all inequalities must hold with equality. In particular, $L((p_1, \dots, p_m)) = L_H((p_1, \dots, p_m))$.

16.2 Variable-length source codes with error

Thus far we have seen performance bounds for error-free codes and specific schemes which come close to these bounds. We now examine what we have to gain if we allow a small probability of error. For fixed-length codes, we have already addressed this question: there the gain depends on the large-probability upper bound for the entropy density $h(X)$. However, asymptotically the gain due to error was negligible, since the optimal asymptotic rate is independent of $\epsilon$ (strong converse).

In the case of variable-length codes the situation is quite different. Allowing error results in significant gains when we are trying to compress a single symbol, and it also yields a rate gain asymptotically. To keep our exposition simple, we allow randomized encoding, where in addition to the source symbol the encoder also has access to a biased coin toss. We show that $L_\epsilon(X) \le (1 - \epsilon) L(X)$. In fact, we prove a stronger statement.
Lemma 16.3. Consider a source with pmf $P$ over a discrete alphabet $\mathcal{X}$. Given an error-free code of average length $l$ and minimum codeword length $l_{\min}$, there exists a variable-length code which allows a probability of error of at most $\epsilon$ and has average length no more than $(1 - \epsilon) l + \epsilon l_{\min}$.

Proof. Consider a code $C$ with average length $l$, and let $c_0$ be a codeword of minimum length $l_{\min}$. We define a new code where for each symbol $x \in \mathcal{X}$ the encoder flips a coin which shows heads with probability $1 - \epsilon$, and outputs the codeword in $C$ for $x$ if it shows heads or the codeword $c_0$ if it shows tails. The decoder, upon observing a codeword in $C$, simply outputs the corresponding symbol for $C$. Note that an error can occur only if the coin used in encoding showed tails. Therefore, the probability of error is at most $\epsilon$. Also, the average length of the code must now be averaged over both the coin toss and the source distribution. Given that the coin showed heads, the average length of the code is $l$. Given that the coin showed tails, the average length of the code is $l_{\min}$. Therefore, the overall average length equals $(1 - \epsilon) l + \epsilon l_{\min}$.

Note that $l_{\min}$ for an optimal nonsingular code is 0, since the empty codeword may be assigned to one symbol. Therefore, there exists a code with probability of error at most $\epsilon$ and average length no more than $(1 - \epsilon) L(X)$.
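A minimal simulation of the randomized scheme in this proof, assuming the error-free code of Table 2 (so $l_{\min} = 1$ here, since that code is prefix-free rather than nonsingular); the names and parameters are our own illustrative choices.

```python
import random

code  = {"a": "110", "b": "0", "c": "10", "d": "111"}   # the code of Table 2
probs = {"a": 1/8, "b": 1/2, "c": 1/4, "d": 1/8}
c0  = "0"          # a codeword of minimum length, l_min = 1
eps = 0.1

def encode(x):
    # Heads (prob. 1 - eps): send the true codeword; tails: send c0.
    # An error can occur only on tails, so P(error) <= eps.
    return code[x] if random.random() < 1 - eps else c0

n = 100_000
symbols = random.choices(list(probs), weights=list(probs.values()), k=n)
avg = sum(len(encode(x)) for x in symbols) / n
print(avg)         # close to (1 - eps) * 7/4 + eps * 1 = 1.675
```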