On the redundancy of optimum fixed-to-variable length codes



Peter R. Stubley*
Bell-Northern Research

* Bell-Northern Research Ltd., 16, Place du Commerce, Montreal, Québec, H3E 1H6, Phone: (514) , Fax: (514) , stubley@bnr.ca. IEEE

Abstract

There has been much interest in recent years in bounds on the redundancy of Huffman codes, given only partial information about the source word distribution, such as the probability of the most likely source word. This work determines upper and lower bounds for the redundancy of Huffman codes of source words which are binomially distributed. Since the complete distribution is known, it is possible to determine bounds which are much tighter than other bounds in the literature, given only $p$, the probability of the most likely symbol of the binary source, and $K$, where there are $2^K$ source words. The upper and lower bounds will be shown to converge to the same value as $K$ becomes large, resulting in a simple approximation which can be used to predict the redundancy of the Huffman code, given $p$ and $K$, without constructing the code.

1 Introduction

Let $S$ be an independent, identically distributed binary source, which generates a 0 with probability $p$, $p \ge 1/2$. Let $E[K]$ be the average code word length of the Huffman code of the alphabet obtained by taking $K$-bit blocks of input symbols, called source words, where the code words are also assumed to be binary, with equal cost symbols. The redundancy of the code is defined as

$$r(p,K) = E[K] - K H(S), \tag{1}$$

where $H(S) = -p \log p - (1-p) \log(1-p)$ (the logarithms are base 2). The compression ratio is

$$\rho = \frac{E[K]}{K} = H(S) + \frac{r(p,K)}{K}. \tag{2}$$

It is clear from Equation 2 that $\lim_{K \to \infty} \rho = H(S)$, since $0 \le r(p,K) \le 1$ for the Huffman code [7], and $\lim_{K \to \infty} 1/K = 0$. However, $\lim_{K \to \infty} r(p,K)$ is not as well understood. Numerous upper and lower bounds have been obtained for $r$ in recent years (for example, [1, 2, 9, 10, 11]), given partial information about the source alphabet, such as the probability of the most likely source word. However, for the source words described above, obtained by taking $K$-bit blocks, the complete distribution is known, given $p$ and $K$.

The source words are binomially distributed, with probabilities of the form $p(i) = p^{K-i}(1-p)^i$, such that $\sum_{i=0}^{K} \binom{K}{i} p(i) = 1$. There are $\binom{K}{i}$ source words with probability $p(i)$.
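To make the definition in Equation 1 concrete, the following is a minimal sketch that builds the binomial source word distribution for small $K$ and measures the redundancy of its Huffman code directly. The function names are illustrative and not taken from the paper, and the average length is computed with the standard identity that it equals the sum of the weights of all merged nodes.

```python
import heapq
from math import comb, log2


def binary_entropy(p):
    """H(S) in bits for a binary source with P(0) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def block_probabilities(p, K):
    """All 2**K source word probabilities: C(K, i) words of probability p(i)."""
    probs = []
    for i in range(K + 1):
        probs.extend([p ** (K - i) * (1 - p) ** i] * comb(K, i))
    return probs


def huffman_average_length(probs):
    """Average code word length of a binary Huffman code.

    Each merge adds one bit to every leaf below the merged node, so the
    average length is the sum of the weights of all merged nodes.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def redundancy(p, K):
    """r(p, K) = E[K] - K * H(S), as in Equation 1."""
    return huffman_average_length(block_probabilities(p, K)) - K * binary_entropy(p)


if __name__ == "__main__":
    for K in range(1, 13):
        print(K, round(redundancy(0.8, K), 5))
```

Because the list of source words has $2^K$ entries, this brute-force approach is only practical for small block lengths, which is the motivation for a closed-form estimate.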

Jakobsson [8] has analyzed $E[K]$ for the binomial distribution for a class of values of $p$ satisfying the condition of Equation 3. However, observe that the values of $p$ which satisfy Equation 3 become closer and closer to 1 as $K$ becomes large. This approach does not allow us to consider the behaviour of $E[K]$ for a particular value of $p$ as $K$ becomes large. Huffman codes for binomially distributed source words have also been considered by Montgomery and Kumar [12]. They are particularly interested in the conditions which are necessary to ensure that the Huffman code of a multi-symbol block is more efficient than simply constructing the Huffman code for the original source alphabet.

In this work, $r(p,K)$ is analyzed for the binomial distribution for arbitrary values of $p$, with simple closed forms for particular values of $p$. In particular, the problem of interest is what happens to $r(p,K)$ as $K$ becomes large. The approach taken is to derive upper and lower bounds on $r(p,K)$, and to show that the upper and lower bounds converge as $K$ becomes large. Using this approach, it will be shown that $r(p,K)$ does not, in general, vanish, and does not monotonically increase or decrease, but tends to oscillate between values nearer to and farther from 0.

2 Properties of Huffman codes of binomial distributions

This section assumes that a modified form of the Huffman algorithm is used (see Appendix). The modification ensures that the Huffman code produced by the algorithm is unique (there is only one code for a given source word distribution, which is not always the case for Huffman's original algorithm), and is constructed such that the longest code words have the shortest possible length. This assumption does not affect the optimality of the code produced by the modified algorithm, since all the Huffman codes for the same source word distribution have the same average length.
If there are $M$ source words, each with probability $1/M$, then their code words will have length $\lfloor \log_2 M \rfloor$ or $\lfloor \log_2 M \rfloor + 1$ [4, 6]. A similar property holds for binomial distributions: if two source words have equal probabilities, then their code words will differ in length by at most 1, which follows from Gallager's sibling property [5]. Furthermore, the least likely source word, with probability $p(K)$, will have a code word with the same length as the source word with probability $p(K-1)$ with the longest code word, which is a direct result of the Huffman algorithm. Let $l_j$ be the length of the code word for the $j$-th source word, and let $S_i$ be the set of source words with probability $p(i)$. Then,

$$E[K] = \sum_{j} p_j l_j = \sum_{i=0}^{K} \binom{K}{i} p(i) L(i), \tag{4}$$

where $p_j$ is the probability of the $j$-th source word, and

$$L(i) = \frac{1}{\binom{K}{i}} \sum_{j \in S_i} l_j$$

is the average length of the code words for $S_i$. Using Equation 4, the redundancy of the code is

$$r(p,K) = \sum_{i=0}^{K} \binom{K}{i} p(i) \left[ L(i) + \log_2 p(i) \right]. \tag{7}$$

When $E[K]$ is expressed in the form of Equation 4, it is easily shown that the redundancy can be zero only if the source word distribution is dyadic; that is, $p_j = 2^{-z_j}$, for $z_j$ a positive integer, and $l_j = z_j$. Equation 7 appears to relax this constraint, since zero redundancy may be obtained if $L(i) = -\log_2 p(i)$ for all $i$, and $L(i)$ may take non-integer values. In fact, as will be shown, zero redundancy is not, in general, attainable, unless $p = 1/2$.

2.1 Determining the code word lengths

In order to find the redundancy, it is necessary to determine the lengths of the code words. From the above properties of the code word lengths, the code words for $S_i$ must have length $l(i)$ or $l(i) + 1$, where $l(i)$ is a positive integer. Let $n_i$ be the number of code words with length $l(i) + 1$. Then, from the definition of $L(i)$,

$$L(i) = l(i) + \frac{n_i}{\binom{K}{i}}, \quad i = 0, 1, \ldots, K.$$

It is necessary to determine $n_i$ and $l(i)$. The Huffman code minimizes $r(p,K) = E[K] - K H(S)$. If all the constraints on $\{L(i), i = 0, 1, \ldots, K\}$ are removed except $\sum_{i=0}^{K} \binom{K}{i} 2^{-L(i)} = 1$, then it is easy to show using Lagrange multipliers that $L_{opt}(i) = -\log_2 p(i)$ achieves the minimum, with $r = 0$. Similarly, letting $L(i) = l(i) + n(i)/\binom{K}{i}$, where $l(i)$ is a positive integer and $0 \le n(i) \le \binom{K}{i}$, but with no integer constraint on $n(i)$, then clearly $l(i) = \lfloor -\log_2 p(i) \rfloor$ and $n(i) = \binom{K}{i} \left( -\log_2 p(i) - l(i) \right)$ is also an optimum solution, with $r = 0$. The integer values for $n(i)$ will be considered in the next section.

In fact, setting $l(i) = \lfloor -\log_2 p(i) \rfloor$ will only achieve a Huffman code if $K$ is sufficiently large, and $p < 1$. If $p = 1$, then $r = 1$ for any $K$. When $K$ is small, and $p$ is near 1, then $p(K-1)$ may be much less than $2^{-(2^K - 1)}$, where $2^K - 1$ is the length of the longest possible Huffman code word for a block length of $K$. Since $p \ge 1/2$, $2^K - 1$ grows faster than $p(K-1)$ decays, and the above value for $l(i)$ will become the Huffman code length as $K$ gets large, when $p < 1$. Also, using the properties of Huffman codes, $L(K) = l(K-1)$ or $l(K-1) + 1$.
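The Lagrange multiplier step referred to above can be written out as follows. This is a sketch of the standard argument under the relaxed constraint only; the symbol $\lambda$ and the name $J$ for the Lagrangian are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
J &= \sum_{i=0}^{K} \binom{K}{i} p(i) L(i)
     + \lambda \left( \sum_{i=0}^{K} \binom{K}{i} 2^{-L(i)} - 1 \right), \\
\frac{\partial J}{\partial L(i)} &= \binom{K}{i} p(i)
     - \lambda \ln 2 \, \binom{K}{i} 2^{-L(i)} = 0
     \quad \Longrightarrow \quad 2^{-L(i)} = \frac{p(i)}{\lambda \ln 2}, \\
1 &= \sum_{i=0}^{K} \binom{K}{i} 2^{-L(i)}
   = \frac{1}{\lambda \ln 2} \sum_{i=0}^{K} \binom{K}{i} p(i)
   = \frac{1}{\lambda \ln 2}
     \quad \Longrightarrow \quad L_{opt}(i) = -\log_2 p(i), \\
r &= \sum_{i=0}^{K} \binom{K}{i} p(i) \left[ L_{opt}(i) + \log_2 p(i) \right] = 0 .
\end{align*}
```

The last line uses the fact that $K H(S) = -\sum_{i=0}^{K} \binom{K}{i} p(i) \log_2 p(i)$, since the entropy of a block of $K$ independent symbols is $K$ times the entropy of one symbol.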

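3 Determining the redundancy

The number of code words of length $l(i) + 1$, $n_i$, will not be exactly determined, but will be upper and lower bounded, using the Kraft inequality. Let $d_i = \sum_{j \in S_i} 2^{-l_j}$. If $d_i \le \binom{K}{i} p(i)$ for all $i$, then the Kraft inequality will be satisfied, which ensures that a prefix code with lengths $l_j$ exists. The average length of this code is an upper bound on that of the Huffman code. Note that from this point on, the constraint that $L(K)$ is determined from $\lfloor -\log_2 p(K-1) \rfloor$ and not $\lfloor -\log_2 p(K) \rfloor$ will be ignored. This approximation simplifies the ensuing development, but does not affect the final answer, since both $2^{-L(K)}$ and $p(K) \log_2 p(K)$ vanish as $K$ becomes large. Observe that

$$d_i = 2^{-l(i)} \left[ \binom{K}{i} - \frac{n_i}{2} \right].$$

If $n_i = \left\lceil 2 \binom{K}{i} \left[ 1 - p(i) 2^{l(i)} \right] \right\rceil$, then $d_i \le \binom{K}{i} p(i)$, and the Kraft inequality will be satisfied. Let $f_i = l(i) + \log_2 p(i) = \lfloor -\log_2 p(i) \rfloor + \log_2 p(i)$, and let

$$L^l(i) = l(i) + \frac{1}{\binom{K}{i}} \left\lfloor 2 \binom{K}{i} \left( 1 - 2^{f_i} \right) \right\rfloor$$

and

$$L^u(i) = l(i) + \frac{1}{\binom{K}{i}} \left\lceil 2 \binom{K}{i} \left( 1 - 2^{f_i} \right) \right\rceil. \tag{9}$$

If $L(i)$ is the value obtained by the Huffman code, then $L^l(i) \le L(i) \le L^u(i)$, $i \ne K$. From the definition of redundancy, substituting $L^l(i)$ and $L^u(i)$ into Equation 7 gives lower and upper bounds on $r(p,K)$ (Equations 10 to 12).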
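As an illustration of the construction above, the sketch below computes $l(i) = \lfloor -\log_2 p(i) \rfloor$ and $n_i = \lceil 2 \binom{K}{i} (1 - p(i) 2^{l(i)}) \rceil$ for each group of source words and evaluates the resulting Kraft sum. The function names are hypothetical, and floating-point arithmetic is assumed to be accurate enough for the small block lengths used here.

```python
from math import ceil, comb, floor, log2


def upper_bound_groups(p, K):
    """Return a list of (l_i, n_i, count_i) for i = 0..K.

    In group i, n_i code words have length l_i + 1 and the remaining
    count_i - n_i code words have length l_i.
    """
    groups = []
    for i in range(K + 1):
        prob = p ** (K - i) * (1 - p) ** i           # p(i)
        count = comb(K, i)                           # size of S_i
        l_i = floor(-log2(prob))                     # l(i)
        n_i = ceil(2 * count * (1 - prob * 2.0 ** l_i))
        groups.append((l_i, n_i, count))
    return groups


def kraft_sum(p, K):
    """Sum of 2**(-length) over all code words of the bounding code."""
    total = 0.0
    for l_i, n_i, count in upper_bound_groups(p, K):
        total += (count - n_i) * 2.0 ** (-l_i) + n_i * 2.0 ** (-(l_i + 1))
    return total


def upper_bound_average_length(p, K):
    """Average length of the bounding code: sum of p(i) * (count * l_i + n_i)."""
    total = 0.0
    for i, (l_i, n_i, count) in enumerate(upper_bound_groups(p, K)):
        prob = p ** (K - i) * (1 - p) ** i
        total += prob * (count * l_i + n_i)
    return total


if __name__ == "__main__":
    print(upper_bound_groups(0.8, 2))
    print(kraft_sum(0.8, 2))
```

For $p = 0.8$ and $K = 2$ this assigns lengths 1, 3, 3 and 5, whose Kraft sum is $1/2 + 2/8 + 1/32 \le 1$, so a prefix code with these lengths exists.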

Observe that $-1 < f_i \le 0$, which implies that $\lfloor 2(1 - 2^{f_i}) \rfloor = 0$ and, for $f_i < 0$, $\lceil 2(1 - 2^{f_i}) \rceil = 1$. For $i = 0$ and $i = K$, where $\binom{K}{i} = 1$, the upper and lower bounds of Equation 12 therefore differ by at most $p(0) + p(K)$ (Equation 13). As $K$ becomes large, for $p < 1$, the term $p(0) + p(K)$ becomes negligibly small. This, combined with the observation that $\lim_{a \to \infty} \lfloor a x \rfloor / a = \lim_{a \to \infty} \lceil a x \rceil / a = x$, implies that the upper and lower bounds of Equation 13 converge, and

$$\lim_{K \to \infty} r(p,K) = \lim_{K \to \infty} \sum_{i=1}^{K-1} \binom{K}{i} p(i) \left[ f_i + 2 \left( 1 - 2^{f_i} \right) \right]. \tag{14}$$

Equation 14 is the fundamental result of this correspondence, but it is difficult to completely resolve the limit without further constraints on $p$.

3.1 Closed-form results for a class of values of p

Numerical results for $r(p,K)$ can be easily obtained from Equation 14 for any value of $p$. However, a more intuitive closed form can be obtained for a particular class of values of $p$, where $p = (1 + 2^{-M})^{-1}$, for $M$ a positive integer. For these values of $p$, $\log_2[(1-p)/p]$ is an integer, and $f_i = K \log_2 p + \lfloor -K \log_2 p \rfloor = f$ for all $i$. Equation 14 becomes

$$\lim_{K \to \infty} r(p,K) = \lim_{K \to \infty} \left[ f + 2(1 - 2^{f}) \right] \left[ 1 - p(0) - p(K) \right] = \lim_{K \to \infty} \left[ f + 2(1 - 2^{f}) \right].$$

It is clear that $\lim_{K \to \infty} [f + 2(1 - 2^{f})]$ does not exist, and that the expression does not decay as $K$ becomes large. Instead, it shows that $r(p,K)$ has a behaviour which is determined by the numerical properties of $K \log_2 p$, becoming closer to 0 when $-K \log_2 p$ is close to $\lfloor -K \log_2 p \rfloor$. It is interesting to note that

$$\max_{-1 < f < 0} \left[ f + 2(1 - 2^{f}) \right] = \log_2 \frac{2}{e \ln 2} \approx 0.086,$$

which is the number obtained by Gallager in [5] in determining upper bounds on $r$, given the probability of the most likely source word. Gallager obtains this value as the maximum of the function $H(q_1, q_2, \ldots, q_L)$, where the Huffman code tree is full at depth $l$, the $q_i$'s are the probabilities of the nodes at depth $l$, $L = 2^l$, and $H()$ is the entropy function. The maximum value occurs at an extreme point, where $q_i = q_1$ for $i \le n$, for some $1 \le n \le L$, and $q_i = q_1/2$ for $n < i \le L$. A distribution of this form is similar to the binomial distributions which are being considered here, where there are blocks of nodes with the same probabilities.
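The closed form for $p = (1 + 2^{-M})^{-1}$ is simple enough to evaluate in a few lines. The sketch below, with illustrative function names, computes $f + 2(1 - 2^{f})$ for a given $M$ and $K$, and locates the maximum of the expression by setting its derivative $1 - 2^{f+1} \ln 2$ to zero, which gives $f^* = -\log_2(2 \ln 2)$.

```python
from math import e, floor, log, log2


def g(f):
    """The limiting expression f + 2 * (1 - 2**f), for -1 < f <= 0."""
    return f + 2.0 * (1.0 - 2.0 ** f)


def closed_form_redundancy(M, K):
    """Estimate of r(p, K) for p = 1 / (1 + 2**-M)."""
    p = 1.0 / (1.0 + 2.0 ** (-M))
    x = -K * log2(p)
    f = floor(x) - x          # f = K*log2(p) + floor(-K*log2(p))
    return g(f)


# Stationary point of g and the corresponding peak value.
F_STAR = -log2(2.0 * log(2.0))
PEAK = g(F_STAR)
PEAK_CLOSED_FORM = log2(2.0 / (e * log(2.0)))   # about 0.086

if __name__ == "__main__":
    print(F_STAR, PEAK, PEAK_CLOSED_FORM)
    for K in range(1, 21):
        print(K, round(closed_form_redundancy(2, K), 5))   # M = 2, p = 0.8
```

Printing the estimate for successive values of $K$ shows the oscillation described above: the value returns towards 0 whenever $-K \log_2 p$ is close to an integer, and never exceeds the peak.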

[Table 1: Comparison of redundancy $r$ and bound $r_b$ for $p = 0.66667$, $p = 0.8$ and a third value of $p$.]

As well, numerical results show that the redundancy of the Huffman codes for $K$-bit blocks converges quite quickly to $f + 2(1 - 2^{f})$ (see Table 1), which makes it useful as a rule of thumb for estimating $r(p,K)$ with a simple calculation.

This approach extends naturally to source probabilities of a more general form, where $\log_2[(1-p)/p] = -A/B$, where $A$ and $B$ are positive, relatively prime integers. In this case, $p = (1 + 2^{-A/B})^{-1}$. Let

$$f_i = K \log_2 p - i\frac{A}{B} + \left\lfloor -K \log_2 p + i\frac{A}{B} \right\rfloor, \quad 0 \le i \le B - 1.$$

Then, for large $K$, the redundancy converges to

$$\frac{1}{B} \sum_{i=0}^{B-1} \left[ f_i + 2 \left( 1 - 2^{f_i} \right) \right]. \tag{15}$$

4 Conclusions

This work has investigated the properties of the redundancy of Huffman codes for $K$-bit blocks of the binary source. It has been shown that for $p = (1 + 2^{-M})^{-1}$, the compression ratio for large $K$ is $\rho = H(S) + [f + 2(1 - 2^{f})]/K$, and that the redundancy of the code does not converge to 0, no matter how large $K$ becomes. Numerically, it has been shown that the redundancy of the Huffman code converges quite quickly to $f + 2(1 - 2^{f})$, which means that the redundancy can be determined for a given $K$ and $p$ with only a simple calculation. In fact, the redundancies for arbitrary values of $p$ can also be approximated using Equation 15, since $r$ is a continuous function of $p$, and $A$ and $B$ can be chosen so that $(1 + 2^{-A/B})^{-1}$ is as close to any $p$ as required.
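Equation 15 can be evaluated in the same way as the single-$f$ closed form. The following is a minimal sketch, assuming $A$ and $B$ are supplied as positive, relatively prime integers; the function name is illustrative and not from the paper.

```python
from math import floor, log2


def general_redundancy_estimate(A, B, K):
    """Equation 15 for p = 1 / (1 + 2**(-A/B)).

    Averages f_i + 2 * (1 - 2**f_i) over the B residues i = 0..B-1,
    where f_i = floor(x_i) - x_i and x_i = -K*log2(p) + i*A/B.
    """
    p = 1.0 / (1.0 + 2.0 ** (-A / B))
    total = 0.0
    for i in range(B):
        x_i = -K * log2(p) + i * A / B     # equals -log2 p(i)
        f_i = floor(x_i) - x_i             # -1 < f_i <= 0
        total += f_i + 2.0 * (1.0 - 2.0 ** f_i)
    return total / B


if __name__ == "__main__":
    # A/B = 3/2 gives p = 1 / (1 + 2**-1.5), roughly 0.739.
    for K in (8, 16, 32):
        print(K, round(general_redundancy_estimate(3, 2, K), 5))
```

With $B = 1$ and $A = M$ the sum has a single term and the function reduces to $f + 2(1 - 2^{f})$.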

5 Appendix: The modified Huffman code

It is well known that the Huffman code for a given source distribution is not necessarily unique. In fact, it is possible to construct a Huffman code which has code word lengths that are in some cases longer than the corresponding Shannon-Fano code word lengths (for example, see Problem 5.14 in [3]). Recall that the Huffman algorithm is iterative: at each step, the two lowest weight nodes are combined into a new node with a weight that is the sum of the weights of its two children. If at some point during the construction of the Huffman code there are two or more nodes that have the same weight, then it is possible to construct more than one Huffman code. All the codes for the same distribution will have the same average length, but the topologies of their code trees may differ.

It is often convenient when studying the redundancy of Huffman codes to be able to use the Shannon-Fano code word lengths as upper bounds for the code word lengths of the Huffman code. If it is possible to construct Huffman codes that have code word lengths longer than the Shannon-Fano code words, then this approach is not feasible. Fortunately, with a simple modification to Huffman's algorithm, it is possible to construct a Huffman code whose code word lengths are less than or equal to those of the Shannon-Fano code, by changing the algorithm so that the longest code words are as short as possible.

This is achieved as follows. Assume that the nodes at each step of the Huffman algorithm are kept in a list which is ordered largest to smallest, in terms of weight, and, at each step, the bottom two nodes in the list are combined. The new node is moved up the list until the node which is immediately above it has a larger weight. Huffman's original algorithm specified that the new node should be moved up the list until the node which is immediately above it has the same or larger weight.
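The list-based procedure just described can be sketched as follows. This is an illustrative implementation, not the author's code: the function name and the `modified` flag are assumptions, and integer weights are used in the example so that ties are exact.

```python
def huffman_lengths(weights, modified=True):
    """Code word lengths from the list-based Huffman procedure.

    modified=True:  the merged node moves up past nodes of equal weight,
                    stopping only below a strictly larger weight.
    modified=False: the merged node stops below a node of the same or
                    larger weight.
    """
    # Each node is (weight, leaf indices); the list is ordered largest first.
    nodes = sorted(((w, [k]) for k, w in enumerate(weights)),
                   key=lambda node: -node[0])
    lengths = [0] * len(weights)
    while len(nodes) > 1:
        w1, leaves1 = nodes.pop()            # bottom node
        w2, leaves2 = nodes.pop()            # next node up
        for k in leaves1 + leaves2:
            lengths[k] += 1                  # one more bit for every leaf below
        new = (w1 + w2, leaves1 + leaves2)
        pos = len(nodes)
        if modified:
            while pos > 0 and nodes[pos - 1][0] <= new[0]:
                pos -= 1
        else:
            while pos > 0 and nodes[pos - 1][0] < new[0]:
                pos -= 1
        nodes.insert(pos, new)
    return lengths


if __name__ == "__main__":
    w = [4, 2, 2, 1, 1]
    print(huffman_lengths(w, modified=True))    # [2, 2, 2, 3, 3]
    print(huffman_lengths(w, modified=False))   # [1, 2, 3, 4, 4]
```

For the weights 4, 2, 2, 1, 1 both variants give the same weighted total length of 22, but the modified rule reduces the longest code word from 4 bits to 3.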
If the modification is used, then the longest code words will have the shortest possible length, and the code word lengths of the Huffman code will indeed be less than or equal to those of the Shannon-Fano code.

References

[1] R.M. Capocelli and A. de Santis, Tight upper bounds on the redundancy of Huffman codes, IEEE Trans. Inf. Theory, Vol. 35, No. 5, September 1989.
[2] R.M. Capocelli and A. de Santis, New bounds on the redundancy of Huffman codes, IEEE Trans. Inf. Theory, Vol. 37, No. 4, July 1991.
[3] T.M. Cover and J.A. Thomas, Elements of information theory, New York: Wiley.
[4] R.G. Gallager, Information theory and reliable communication, New York: Wiley.
[5] R.G. Gallager, Variations on a theme by Huffman, IEEE Trans. Inf. Theory, IT-24, No. 6, November 1978.
[6] N.C. Geckinli, Two corollaries to the Huffman coding procedure, IEEE Trans. Inf. Theory, May 1975.
[7] D.A. Huffman, A method for the construction of minimum redundancy codes, Proc. IRE, Vol. 40, 1952.
[8] M. Jakobsson, Huffman coding in bit-vector compression, Information Processing Letters, Vol. 7, No. 6, October 1978.
[9] O. Johnsen, On the redundancy of binary Huffman codes, IEEE Trans. Inf. Theory, IT-26, No. 2, March 1980.
[10] D. Manstetten, Tight upper bounds on the redundancy of Huffman codes, IEEE Trans. Inf. Theory, IT-38, No. 1, January 1992.
[11] B.L. Montgomery and J. Abrahams, On the redundancy of optimal prefix-condition codes for finite and infinite sources, IEEE Trans. Inf. Theory, IT-33, No. 1, January 1987.
[12] B.L. Montgomery and B.V.K.V. Kumar, On the average codeword length of optimal binary codes for extended sources, IEEE Trans. Inf. Theory, IT-33, No. 2, March 1987.
