Generalized Kraft Inequality and Arithmetic Coding

J. J. Rissanen

Abstract: Algorithms for encoding and decoding finite strings over a finite alphabet are described. The coding operations are arithmetic, involving rational numbers $l_i$ as parameters such that $\sum_i 2^{-l_i} \le 2^{-\varepsilon}$. This coding technique requires no blocking, and the per-symbol length of the encoded string approaches the associated entropy within $\varepsilon$. The coding speed is comparable to that of conventional coding methods.

Introduction

The optimal conventional instantaneous Huffman code [1] for an independent information source with symbol probabilities $(p_1, \ldots, p_m)$ may be viewed as a solution to the integer programming problem: find $m$ natural numbers $l_i$, as lengths of binary code words, such that $\sum_i p_i l_i$ is minimized under the constraining Kraft inequality $\sum_i 2^{-l_i} \le 1$. Then the minimized sum $\sum_i p_i l_i$ approximates the Shannon-Boltzmann entropy function $H(p_1, \ldots, p_m) = -\sum_i p_i \log p_i$ from above with an error of no more than one. If a better approximation is required, blocking is needed; e.g., a $k$th extension of the alphabet must be encoded, which reduces the least upper bound of the error to $1/k$ [2].

We describe another coding technique in which $m$ positive rational numbers $l_1, \ldots, l_m$ are selected such that a generalized Kraft inequality holds: $\sum_i 2^{-l_i} \le 2^{-\varepsilon}$, $\varepsilon > 0$. (For $\varepsilon = 0$, the rationality requirement would have to be relaxed.) The length of the code of a sequence $s$ with $n_i$ occurrences of the $i$th symbol is given by the sum $L(s) = \sum_i n_i l_i$, which, when minimized over the $l_i$ subject to the preceding inequality and divided by $n = \sum_i n_i$, is never larger than the empirical entropy plus $\varepsilon$, where $\varepsilon$ is determined by the differences $l_i - \log(n/n_i)$. This means that, if the strings are generated by an independent information source, the mean of $n^{-1} L(s)$ approaches the entropy function from above within an error $\varepsilon + O(1/n)$.

The coding operations are arithmetic, and they resemble the concatenation operations in conventional coding in that the code of the string $s a_k$, where $a_k$ is a new symbol, is obtained by replacing only a few (always fewer than some fixed number) of the left-most bits of the code representing $s$ by a new, longer string. As a result, the coding operations can be accomplished with a speed comparable to that of conventional coding.

The primary advantage of the resulting "arithmetic coding" is that no blocking is needed, even for the binary alphabet. In addition, for small alphabet sizes the tables to be stored and searched are smaller than the code word tables in conventional coding methods. For a binary alphabet a special choice of the parameters $l_1$ and $l_2$ leads to a particularly simple coding technique that is closely related to one due to Fano [3]. The coding method described here is reminiscent of the enumerative coding techniques of Schalkwijk [4] and Cover [5], and perhaps also of that due to Elias, as sketched in [2]. All of these are inferior, however, in one or more crucial respects, especially speed.
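
The role of the parameters $l_i$ can be made concrete with a small numerical sketch. The code below is an editorial illustration rather than anything from the paper: it picks each $l_i$ a little above $\log(n/n_i)$, rounds it to $q$ fractional bits, checks that the generalized Kraft inequality $\sum_i 2^{-l_i} \le 2^{-\varepsilon}$ then holds for some $\varepsilon > 0$, and compares the per-symbol length $n^{-1}\sum_i n_i l_i$ with the empirical entropy. The symbol counts, $q$, and the slack are arbitrary choices.

```python
from math import ceil, log2

def kraft_demo(counts, q=8, slack=1 / 64):
    """Illustrative check of the generalized Kraft inequality (not from the paper).

    counts : occurrences n_i of each symbol in the string to be coded
    q      : number of fractional bits kept in each rational length l_i
    slack  : amount added to log(n/n_i) before rounding, an arbitrary choice
    """
    n = sum(counts)
    # l_i = log(n/n_i) + slack, rounded up to q fractional bits (kept rational)
    lengths = [ceil((log2(n / ni) + slack) * 2**q) / 2**q for ni in counts]
    kraft_sum = sum(2.0 ** -li for li in lengths)
    eps = -log2(kraft_sum)                  # largest eps with sum 2^-l_i <= 2^-eps
    per_symbol = sum(ni * li for ni, li in zip(counts, lengths)) / n
    entropy = -sum((ni / n) * log2(ni / n) for ni in counts)
    return lengths, eps, per_symbol, entropy

if __name__ == "__main__":
    # arbitrary illustrative symbol counts
    lengths, eps, per_symbol, entropy = kraft_demo([60, 25, 10, 5])
    print("lengths l_i        :", lengths)
    print("eps (Kraft margin) :", round(eps, 4))        # positive, so the inequality holds
    print("per-symbol length  :", round(per_symbol, 4)) # exceeds entropy by at most ~ eps + 2^-q
    print("empirical entropy  :", round(entropy, 4))
```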

Binary alphabet

The coding algorithm is derived and studied for any finite alphabet, including the binary case. But because of the special nature of the binary alphabet, which admits certain simplifications, we study it separately. The importance of applications of the binary alphabet in data storage also warrants separate study.

Let $l_1$ and $l_2$ be two positive rational numbers such that $l_1 \le l_2$ and, when written in binary notation, they have $q$ binary digits in their fractional parts. Further, for $x$ a rational number, $0 \le x < 1$, with $q$ binary digits in its fractional part, let $e(x)$ be a rational-valued approximation to $2^x$ such that

$e(x) = 2^{x + \delta_x}, \qquad \varepsilon/3 \ge \delta_x \ge 0, \qquad (1)$

and such that $e(x)$ has $r > 0$ binary digits in its fractional part. Clearly, the minimum size for $r$ depends on $\varepsilon$ and $q$. For a choice of these, see Theorem 2.

Let $s$ denote a binary string and $\Lambda$ the empty string. Write the concatenation of $s$ and $k$, $k = 0$ or $1$, as $sk$. Define the rational-valued function $L$ by the recursion

$L(sk) = L(s) + l_{k+1}, \qquad L(\Lambda) = r. \qquad (2)$

Write the numbers $L(s)$ as

$L(s) = y(s) + x(s), \qquad (3)$

where $y(s)$ is the integer part $\lfloor L(s) \rfloor$ and $x(s)$ is the fraction with $q$ fractional bits.

The encoding function $C$ transforms binary strings into nonnegative integers by the recursive formula

$C(s0) = C(s), \quad C(s1) = C(s) + \Phi(s1), \quad C(\Lambda) = 0, \qquad (4)$

where $\Phi(s) = 2^{y(s)} e[x(s)]$. Because the code increases only when a 1 has been appended to the string, $C$ takes a binary string $s_m$, described by the list $(k_1, \ldots, k_m)$, where $k_i$ denotes the position of its $i$th 1 from the left, to the sum

$C(s_m) = \sum_{i=1}^{m} \Phi(s_i), \qquad \Phi(s_i) = 2^{y(s_i)} e[x(s_i)], \qquad (5)$

where $s_i$ denotes the prefix of $s_m$ ending at position $k_i$, and

$y(s_i) = \lfloor (k_i - i) l_1 + i l_2 + r \rfloor, \quad x(s_i) = (k_i - i) l_1 + i l_2 + r - y(s_i). \qquad (6)$

The code for $s$ consists of the pair $(C(s), L(s))$, where $L(s)$ may be replaced by the length $n = |s|$ of $s$ and by $m$, the number of 1's in $s$.

The decoding function $D$ recovers $s$ from right to left recursively as follows:

If $C(s) < 2^{y(s)} e[x(s)]$, then generate a 0, i.e., $s = s'0$; else generate a 1, i.e., $s = s'1$. $\qquad (7)$

In addition, in the former case make $C(s') = C(s)$, $L(s') = L(s) - l_1$, and in the latter case make $C(s') = C(s) - 2^{y(s)} e[x(s)]$, $L(s') = L(s) - l_2$, to complete the recursion.

As a practical matter, the test in (7) is done by truncating $C(s)$ to its $r + 1$ left-most bits $\bar{C}(s)$ and writing $\bar{C}(s) = 2^{\alpha(s)} \varphi(s)$, where $1 \le \varphi(s) < 2$ and $\alpha(s) = |C(s)| + 1$. Then the order inequality in (7) is equivalent to a lexicographic order inequality between the pair $(\alpha(s), \varphi(s))$ and the corresponding pair formed from $y(s)$ and $e[x(s)]$, with priority on the first component.

We next give a Kraft-inequality-type condition for the numbers $l_1$ and $l_2$ which guarantees successful decoding.

Theorem 1  If for $\varepsilon > 0$

$2^{-l_1} + 2^{-l_2} \le 2^{-\varepsilon}, \qquad (11)$

then $D(C(s), L(s)) = s$ for all binary strings $s$.

Proof  We have to prove that $s = s'1$ if and only if $C(s) \ge 2^{y(s)} e[x(s)] = \Phi(s)$. By (4) the implication from left to right is clear. Proving the converse is equivalent to proving that for all strings $s$

$C(s) < \Phi(s0). \qquad (12)$

By (1) and (2), for $k = 0, 1$,

$2^{l_{k+1} - \varepsilon/3} \Phi(s) \le \Phi(sk) \le 2^{l_{k+1} + \varepsilon/3} \Phi(s), \qquad (13)$

and, because by (11) $l_1 > \varepsilon$ and $l_2 > \varepsilon$,

$\Phi(s) < \Phi(sk). \qquad (14)$

We prove (12) by induction. For the induction base, $C(\Lambda) = 0 < 2^{y(0)} e[x(0)]$, because $L(0) = l_1 + r > 0$. Assume then that (12) holds for all $s$ with length $|s| \le n$.

Case 1: $s' = s0$. By (4), (12), and (14), in turn, $C(s') = C(s) < \Phi(s0) < \Phi(s'0)$, and (12) holds for all strings of length $n + 1$ ending in 0.

Case 2: $s' = s1$. By (4), (12), and (13), in turn,

$C(s') = C(s) + \Phi(s1) < \Phi(s0) + \Phi(s1).$

Again by (13), applied twice, $\Phi(s0) \le 2^{-l_2 + \varepsilon} \Phi(s10)$ and $\Phi(s1) \le 2^{-l_1 + \varepsilon/3} \Phi(s10)$, and we have

$C(s') < (2^{-l_1} + 2^{-l_2}) \, 2^{\varepsilon} \, \Phi(s'0).$

By (11), finally, $C(s') < \Phi(s'0)$, which completes the induction and the proof of the theorem.
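
The recursions (2)-(7) translate almost directly into a program. The sketch below is an editorial illustration, not the implementation discussed in the paper: it uses exact rational arithmetic (Python's Fraction) in place of the truncated fixed-width operations, builds the table of $e(x)$ by rounding $2^x$ up to $r$ fractional bits so that $\delta_x \ge 0$ as in (1), and leaves it to the caller to choose $l_1 \le l_2$, $q$, and $r$ so that (1) and (11) hold. The parameter values in the usage example are arbitrary.

```python
from fractions import Fraction
from math import ceil

def e_table(q, r):
    """e(x) ~ 2^x for the q-bit fractions x, rounded up to r fractional bits, so that
    e(x) = 2^(x + delta_x) with delta_x >= 0 (Eq. (1)); delta_x <= eps/3 must be checked
    separately for the chosen eps (float precision is adequate for small q and r)."""
    return {Fraction(j, 2**q): Fraction(ceil(2.0 ** (j / 2**q) * 2**r), 2**r)
            for j in range(2**q)}

def phi(L, e):
    """Phi(s) = 2^y(s) * e[x(s)], with y(s) = floor(L(s)) and x(s) its q-bit fraction."""
    y = L.numerator // L.denominator
    return 2**y * e[L - y]

def encode(bits, l1, l2, r, e):
    """C(s), L(s) by (2) and (4): C(s0) = C(s), C(s1) = C(s) + Phi(s1), L(sk) = L(s) + l_{k+1}."""
    C, L = 0, Fraction(r)                 # C(Lambda) = 0, L(Lambda) = r
    for b in bits:
        L += l1 if b == 0 else l2
        if b == 1:                        # only an appended 1 increases the code
            C += phi(L, e)
    return C, L

def decode(C, L, n, l1, l2, e):
    """Recover s right to left by the test (7): emit 0 if C(s) < Phi(s), else emit 1."""
    out = []
    for _ in range(n):
        p = phi(L, e)
        if C < p:
            out.append(0); L -= l1
        else:
            out.append(1); C -= p; L -= l2
    return out[::-1]

if __name__ == "__main__":
    # Arbitrary illustrative parameters (an assumption, not the paper's example):
    # l1 = 0.1 and l2 = 10.1 in binary (q = 1); r = 6 keeps delta_x below
    # eps/3 ~ 0.059 for eps = -log2(2^-l1 + 2^-l2) ~ 0.178, so Theorem 1 applies.
    l1, l2, q, r = Fraction(1, 2), Fraction(5, 2), 1, 6
    e = e_table(q, r)
    s = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
    C, L = encode(s, l1, l2, r, e)
    assert decode(C, L, len(s), l1, l2, e) == s
    print("C(s) =", C, " L(s) =", L)
```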

Theorem 2  If $s$ is a binary string of length $n$ with $m$ 1's, then

$\log C(s) \le m l_2 + (n - m) l_1 + r + 1 + \varepsilon/3. \qquad (15)$

With $\varepsilon \le \varepsilon_1, \varepsilon_2 \le \varepsilon + 2^{-q}$ and

$l_1 = \log \frac{n}{n - m} + \varepsilon_1, \qquad l_2 = \log \frac{n}{m} + \varepsilon_2,$

inequality (11) holds and

$\frac{1}{n} \log C(s) \le H\left(\frac{m}{n}\right) + \varepsilon + 2^{-q} + O\left(\frac{1}{n}\right), \qquad (16)$

where $H(p) = p \log p^{-1} + (1 - p) \log (1 - p)^{-1}$.

Proof  By (1), (5), and (3),

$C(s) \le 2^{r + \varepsilon/3} \sum_{i=1}^{m} 2^{(k_i - i) l_1 + i l_2}.$

The sum is clearly at its maximum when $k_i = n - m + i$, and hence it is at most $2^{(n-m) l_1 + m l_2}\,[1 + 1/(2^{l_2} - 1)]$. By (11) and from the fact that $l_1 \le l_2$, $2 \cdot 2^{-l_2} \le 2^{-\varepsilon} < 1$, or $2^{l_2} > 2$. Therefore $1/(2^{l_2} - 1) < 1$, and the claim (15) follows. The rest follows by a direct calculation.

Remark  If the symbols 0 and 1 are generated by an independent or memoryless information source, then, because $E(m/n) = p$, inequality (16) holds also for the means of both sides.

Numerical considerations, example, and special case

We next study the addition in (9), the bitwise form of the recursion (4). From (4), (3), and (12), in the cases of interest, where $l_1 < 1$ and $l_2 > 2$, it follows that never more than $r - \lfloor l_2 \rfloor + 2$ of the left-most bits of $C(s)$ in (9) need be replaced in order to get $C(s1)$. This means that the recursion (9) is reminiscent of conventional coding methods such as Huffman coding, in which a new code word is appended to the string of the previous code words. Here, there is only a small overlap between the manipulated strings.

What, then, are the practical choices for $q$, $\varepsilon$, and $r$? Suppose that $p = m/n < 1/4$, which are the values that provide worthwhile savings in data storage ($H(1/4) \approx 0.81$). Then $\varepsilon$ should be smaller than, say, $p$. This makes $q$ not smaller than about $\log p^{-1}$. The function $2^x$ is such that (1) in general can be satisfied approximately with $r = q + 2$. The function $e(x)$ may be tabulated for small or moderate values of $q$, or it may be calculated from a standard approximation formula.

With a table lookup for $e(x)$, the coding operations involve only two arithmetic operations per bit with the version (7)-(10), and approximately $6(m/n)$ operations per bit when the formulas (5), (6) are used. In the latter case the decoding test (10) is not done for every bit; rather, the $k_i$ in $s_i \leftrightarrow (k_1, \ldots, k_i)$ are solved directly: $k_i$ is the largest position consistent with the truncated code, and in fact $k_i = \delta$ or $\delta + 1$, where

$\delta = l_1^{-1}\,[\alpha(s_i) + i(l_1 - l_2) - r]$

can be calculated recursively with two arithmetic operations. If $\log e(x)$ is also approximated by a table, then $k_i$ can be calculated without the above ambiguity with three arithmetic operations. Thus the coding operations can be done at a rate faster than or comparable to that of Huffman coding. If the function $e(x)$ and its logarithm are tabulated, the sizes of the tables are of the order of $p^{-1}$, which gives the error $\varepsilon \approx p$ as the "penalty" term in (16).
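
Theorem 2's choice of the lengths is easy to check numerically. The sketch below is an illustration under stated assumptions (rounding up to $q$ fractional bits is one way to keep $\varepsilon \le \varepsilon_1, \varepsilon_2 \le \varepsilon + 2^{-q}$), not a procedure given in the paper; it verifies inequality (11) for the resulting $l_1, l_2$ and compares the leading term of the bound (15), divided by $n$, with $H(m/n)$.

```python
from math import ceil, log2

def theorem2_parameters(n, m, eps, q):
    """Choose l1, l2 as in Theorem 2: l_i = log(...) + eps_i with eps <= eps_i <= eps + 2^-q,
    obtained here by rounding up to q fractional bits (an illustrative choice)."""
    scale = 2 ** q
    l1 = ceil((log2(n / (n - m)) + eps) * scale) / scale
    l2 = ceil((log2(n / m) + eps) * scale) / scale
    kraft = 2 ** -l1 + 2 ** -l2
    assert kraft <= 2 ** -eps                               # inequality (11)
    H = (m / n) * log2(n / m) + ((n - m) / n) * log2(n / (n - m))
    per_symbol_bound = (m / n) * l2 + ((n - m) / n) * l1    # leading term of (15) divided by n
    return l1, l2, kraft, H, per_symbol_bound

if __name__ == "__main__":
    # arbitrary illustrative values of n, m, eps, q
    l1, l2, kraft, H, bound = theorem2_parameters(n=1000, m=150, eps=0.01, q=8)
    print(f"l1={l1}, l2={l2}, sum 2^-li={kraft:.5f} <= 2^-eps={2**-0.01:.5f}")
    print(f"H(m/n)={H:.4f}, per-symbol length bound ~ {bound:.4f} (within about eps + 2^-q)")
```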

With Huffman coding the same penalty term is $k^{-1}$, where $k$ is the block size [2]. Therefore, in order to guarantee the same penalty term, we must put $k \approx \varepsilon^{-1} \approx p^{-1}$, which gives $2^k \approx 2^{1/p}$ for the number of code words. In typical cases this is much larger than the number of entries $p^{-1}$ in the table of $e(x)$ with the present coding method.

Example  We now illustrate the coding technique by a simple example. Let $l_1 = 0.01$ and $l_2 = 11.11$, both numbers being in binary form. Then

$2^{-l_1} + 2^{-l_2} \approx 0.915 < 2^{-0.128}.$

Hence $\varepsilon = 0.128$ will do, and we get $r = q + 2 = 4$. The function $e$ is defined by a table with one entry for each of the four 2-bit fractions $x = 0.00, 0.01, 0.10, 0.11$. Take $s$ to be the string with 1's in the positions $(1, 3, 20, 22, 42)$. Then from (6) and the table we obtain the terms $\Phi(s_i) = 2^{y(s_i)} e[x(s_i)]$, and by adding all of these we get the code $C(s)$ as in (5).

Because of the complications involved in blocking, Huffman coding is rarely used for a binary alphabet. Instead, binary strings are often coded by various other techniques, such as run-length coding, in which runs of consecutive zeros are encoded as their number. We describe a special case of arithmetic coding that is particularly fast and simple. This method turns out to be related to one due to Fano [3], and the resulting compression is much better than in ordinary run-length coding.

Suppose that we wish to store $m$ binary integers,

$Z_m = (z_1, z_2, \ldots, z_m), \qquad 0 \le z_1 \le z_2 \le \cdots \le z_m < 2^w,$

each written with $w$ bits. Let $n = \lceil \log m \rceil$, and let $n < w$. The list $Z_m$ may be encoded as a binary string $s$ whose $i$th 1 is in position $k_i = z_i + i$, as in (5). Then $p$ is approximately given by $2^{n-w}$, and we may put

$l_1 = 2^{n-w}, \qquad l_2 = w - n + 1.$

Then inequality (11) can be shown to hold for some $\varepsilon$ whose size we do not need. By setting $e(x) = 1 + x$ and $r = w - n$ we get from (6)

$y(s_i) = q_i + i(r + 1) + r, \quad x(s_i) = r_i 2^{-r}, \quad \Phi(s_i) = (2^r + r_i)\, 2^{q_i + i(r+1)},$

where $r_i$ consists of the $w - n$ right-most bits of $z_i$ and $q_i$ of the rest; i.e., $z_i = q_i 2^{w-n} + r_i$. The encoding of $Z_m$ or, equivalently, of $s$ is now straightforward by (5). It can be seen to be a binary string in which the $m$ strings $r_i$ and $q_i$ appear as follows:

$1\, r_m\, 0 \cdots 0 \;\; \cdots \;\; 1\, r_2\, 0 \cdots 0 \;\; 1\, r_1\, 0 \cdots 0,$

where the first run of zeros counted from the right is of length $q_1$, the next of length $q_2 - q_1$, and so on; the last or left-most run is of length $q_m - q_{m-1}$. From this the $r_i$ and the $q_i$ can be recovered in the evident manner. The length of the code is seen to be not greater than $m(w - n + 2)$, which is an excellent approximation to the optimum, about $2^w H(2^{n-w})$, whenever $w - n > 3$.
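
Because the code of this special case is just the displayed bit pattern, it can be produced without evaluating the sums (5) at all. The sketch below is an illustration, not the paper's program: it splits each $z_i$ into its high part $q_i$ and low part $r_i$, emits a 1, the $w - n$ bits of $r_i$, and a run of $q_i - q_{i-1}$ zeros, building the string from $z_1$ outward to the left, and then parses the pattern back from the left end; the sample list and word length are arbitrary.

```python
def encode_sorted(zs, w):
    """Code a nondecreasing list Z_m (each z_i < 2^w) as the bit pattern
    1 r_m 0..0  ...  1 r_2 0..0  1 r_1 0..0  described in the text,
    where z_i = q_i * 2^(w-n) + r_i and n = ceil(log2 m); assumes n < w."""
    m = len(zs)
    n = max((m - 1).bit_length(), 1)           # n = ceil(log2 m) for m >= 2
    low = w - n                                # each r_i takes w - n bits
    s, prev_q = "", 0
    for z in zs:                               # z_1, z_2, ..., z_m
        q, r_i = divmod(z, 1 << low)
        s = "1" + format(r_i, f"0{low}b") + "0" * (q - prev_q) + s
        prev_q = q
    return s

def decode_sorted(code, m, w):
    """Recover Z_m by reading from the left: a '1' marker, the w-n bits of r_i,
    then a run of zeros of length q_i - q_{i-1}, for i = m, m-1, ..., 1."""
    n = max((m - 1).bit_length(), 1)
    low = w - n
    rs, runs, i = [], [], 0
    for _ in range(m):                         # blocks for z_m, z_{m-1}, ..., z_1
        assert code[i] == "1"                  # each block starts with a '1' marker
        rs.append(int(code[i + 1:i + 1 + low], 2))
        i += 1 + low
        j = i
        while j < len(code) and code[j] == "0":
            j += 1
        runs.append(j - i)                     # q_m - q_{m-1}, ..., q_2 - q_1, q_1
        i = j
    zs, q = [], 0
    for r_i, run in zip(reversed(rs), reversed(runs)):
        q += run                               # accumulate q_1, q_2, ..., q_m
        zs.append(q * (1 << low) + r_i)        # z_i = q_i * 2^(w-n) + r_i
    return zs

if __name__ == "__main__":
    zs, w = [3, 17, 18, 40, 41, 47], 6         # arbitrary nondecreasing 6-bit integers
    code = encode_sorted(zs, w)
    assert decode_sorted(code, len(zs), w) == zs
    print(code)
    print(len(code), "code bits vs", len(zs) * w, "bits for the raw list")
```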

Finite alphabet

Let $(a_1, a_2, \ldots, a_m)$ be an ordered alphabet, and let $\varepsilon$ be a positive number, which we will show determines the accuracy within which the optimum per-symbol length, the entropy, is achieved. Let $l_1, \ldots, l_m$ be positive rational numbers with $q$ binary digits in their fractional parts such that

$\sum_{i=1}^{m} 2^{-l_i} \le 2^{-\varepsilon}. \qquad (21)$

Further, let $p_k$ be a rational number such that

$p_k = 2^{-l_k + \varepsilon_k}, \qquad 2\varepsilon/3 \le \varepsilon_k \le \varepsilon, \qquad (22)$

and define

$P_k = \sum_{i=1}^{k} p_i, \qquad P_0 = 0. \qquad (23)$

Finally, for $x$ a $q$-bit fraction, let $e(x)$ be an approximation to $2^x$ such that

$e(x) = 2^{x + \delta_x}, \qquad 0 \le \delta_x \le \varepsilon/2. \qquad (24)$

Write $p_i$, $P_k$, and $e(x)$ with $r$ fractional bits. Because of (21) and (22), $P_m \ge m 2^{-r}$, so that $r \ge \log m$.

The encoding function $C$ is defined as follows:

$C(s a_k) = C(s) + P_{k-1} \Phi(s a_k), \qquad C(\Lambda) = 0,$
$L(s a_k) = L(s) + l_k = y(s a_k) + x(s a_k), \qquad L(\Lambda) = 2r,$
$\Phi(s a_k) = 2^{y(s a_k)} e[x(s a_k)], \qquad (25)$

where $y(s a_k) = \lfloor L(s a_k) \rfloor$ and $x(s a_k)$ is the fractional part of $L(s a_k)$.

Let $u(t)$ be the largest number in $\{1, \ldots, m\}$ such that

$P_{u(t)-1} \le t \quad \text{for } t \le 1. \qquad (26)$

Further, for the decoding we need an upper bound for a truncation of $C(s)$. We pick this as follows:

$\bar{C}(s) = 2^{\alpha(s)} [\varphi(s) + 2^{-\gamma + 1}], \qquad (27)$

where $\alpha(s) = |C(s)| + 1$ and $\varphi(s)$, $1 \le \varphi(s) < 2$, is defined by the $\gamma$ left-most bits of $C(s)$. Put

$\gamma = \lceil 5.6 - \log \varepsilon \rceil. \qquad (28)$

The decoding function $D$ recovers $s$ from $(C(s), L(s))$ as follows:

$s = s' a_{u[t(s)]}, \qquad t(s) = \bar{C}(s)/\Phi(s),$
$C(s') = C(s) - P_{u[t(s)]-1} \Phi(s), \qquad L(s') = L(s) - l_{u[t(s)]}. \qquad (29)$

Theorem 3  If for $\varepsilon > 0$

$\sum_{i=1}^{m} 2^{-l_i} \le 2^{-\varepsilon}, \qquad (30)$

then for every $s$, $D(C(s), L(s)) = s$.

Proof  We prove first that

$C(s) \le \Phi(s) \qquad (31)$

for every $s$. This inequality holds for $s = \Lambda$ by (25). Arguing by induction, suppose that it holds for all strings $s$ of length no more than $n$. By (25) and (31),

$C(s a_k) \le \Phi(s) + P_{k-1} \Phi(s a_k). \qquad (32)$

By (24) and (25),

$2^{l_k - \varepsilon/2} \Phi(s) \le \Phi(s a_k) \le 2^{l_k + \varepsilon/2} \Phi(s), \qquad (33)$

which with (32) leads to

$C(s a_k) \le (2^{-l_k + \varepsilon/2} + P_{k-1}) \Phi(s a_k).$

This with (22) and (23) gives $C(s a_k) \le P_k \Phi(s a_k)$, and because by (22)

$P_m \le 2^{\varepsilon} \sum_{i=1}^{m} 2^{-l_i} \le 1,$

the desired inequality, $C(s a_k) \le \Phi(s a_k)$, follows with (30). The induction and the proof of (31) are complete.

When we apply the decoding algorithm to the pair $(C(s a_k), L(s a_k))$, we find that $u[t(s a_k)] = k$ if and only if

$P_{k-1} \le t(s a_k) < P_k. \qquad (34)$

If we put $\bar{C}(s) = C(s) + \delta(s)$, where

$0 < \delta(s) \le 2^{-\gamma + 1} C(s) \le 2^{-\gamma + 1} \Phi(s), \qquad (35)$

the inequalities (34) become

$P_{k-1} \le \frac{C(s) + P_{k-1} \Phi(s a_k) + \delta(s a_k)}{\Phi(s a_k)} < P_k.$

The former inequality holds by (25) and the fact that $\delta(s a_k) \ge 0$. With the first equation of (25) and with (23), the second inequality translates into

$C(s) + \delta(s a_k) < p_k \Phi(s a_k). \qquad (36)$

By (31), (33), and (35),

$C(s) + \delta(s a_k) \le 2^{-l_k + \varepsilon/2} (1 + 2^{-\gamma + 1}) \Phi(s a_k) = \left(2^{\varepsilon/2 - \varepsilon_k} + 2^{\varepsilon/2 - \varepsilon_k - \gamma + 1}\right) p_k \Phi(s a_k).$

By (22), $\varepsilon_k - \varepsilon/2 \ge \varepsilon/6$. Thus, by a well-known exponential inequality, the coefficient of $p_k \Phi(s a_k)$ is smaller than one for the given value (28) of $\gamma$, and (36) holds. The proof is complete.

From (31) and (25) we immediately obtain a bound of the form $n^{-1} \log C(s) \le n^{-1} \sum_i n_i l_i + (2r + 1)/n$, up to the approximation error in $e$. Inequality (30) holds if we put $l_i = \log p_i^{-1} + \varepsilon$, in which case $n^{-1} \log C(s)$ is bounded by $H(p_1, \ldots, p_m) + \varepsilon$ plus an $O(1/n)$ term, where $H(p_1, \ldots, p_m) = \sum_i p_i \log p_i^{-1}$.

By (33) and the fact that $P_{k-1} \Phi(s a_k)$ is, for $k > 1$, within a bounded factor of $\Phi(s)$, it follows that in the first equation of (25) no more than $2r$ plus a bounded number of additional left-most bits of $C(s)$ need be changed in order to get $C(s a_k)$.
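
As in the binary case, the recursion (25) and the decoding rule (29) can be transcribed into a short program. The sketch below is an editorial illustration under stated assumptions, not the paper's implementation: it works with exact rationals, replaces the truncated upper bound $\bar{C}(s)$ of (27)-(28) by the exact ratio $C(s)/\Phi(s)$, and chooses each $p_k$ by rounding $2^{-l_k + 2\varepsilon/3}$ up to $r$ fractional bits, which is one way of meeting (22); the alphabet, the lengths $l_k$, and the values of $q$ and $r$ in the usage example are arbitrary.

```python
from fractions import Fraction
from math import ceil, log2

def finite_alphabet_code(lengths, q, r):
    """Build the tables for the code of Eqs. (25)-(29): cumulative P_k and e(x).
    lengths are the l_k, rationals with q fractional bits; sum 2^-l_k <= 2^-eps (Eq. (30))."""
    ls = [Fraction(lk) for lk in lengths]
    eps = -log2(sum(2.0 ** -float(lk) for lk in ls))
    assert eps > 0, "the generalized Kraft inequality (30) must hold strictly"
    # p_k slightly above 2^(-l_k + eps/2), here rounded up from 2^(-l_k + 2*eps/3) (cf. (22))
    p = [Fraction(ceil(2.0 ** (-float(lk) + 2 * eps / 3) * 2**r), 2**r) for lk in ls]
    P = [Fraction(0)]
    for pk in p:
        P.append(P[-1] + pk)
    assert P[-1] <= 1, "increase eps (or r) so that the cumulative P_m stays below one"
    # e(x) = 2^(x + delta_x), 0 <= delta_x <= eps/2, with r fractional bits (cf. (24))
    e = {}
    for j in range(2**q):
        x = Fraction(j, 2**q)
        e[x] = Fraction(ceil(2.0 ** float(x) * 2**r), 2**r)
        assert float(e[x]) <= 2.0 ** (float(x) + eps / 2)
    return ls, P, e, Fraction(2 * r)          # L(Lambda) = 2r

def phi(L, e):
    y = L.numerator // L.denominator          # y(s) = floor(L(s))
    return 2**y * e[L - y]                    # Phi(s) = 2^y(s) e[x(s)]

def encode(symbols, ls, P, e, L0):
    """C(s a_k) = C(s) + P_{k-1} Phi(s a_k), L(s a_k) = L(s) + l_k (Eq. (25)); symbols are 1-based."""
    C, L = Fraction(0), L0
    for k in symbols:
        L += ls[k - 1]
        C += P[k - 1] * phi(L, e)
    return C, L

def decode(C, L, n, ls, P, e):
    """Recover s from (C(s), L(s)) by rule (29), with t(s) = C(s)/Phi(s) computed exactly."""
    out = []
    for _ in range(n):
        t = C / phi(L, e)
        k = 1
        while k < len(ls) and P[k] <= t:      # u(t): largest k with P_{k-1} <= t (Eq. (26))
            k += 1
        out.append(k)
        C -= P[k - 1] * phi(L, e)
        L -= ls[k - 1]
    return out[::-1]

if __name__ == "__main__":
    # Arbitrary lengths for a 3-symbol alphabet with q = 2 fractional bits (an assumption):
    lengths = [Fraction(3, 4), Fraction(9, 4), Fraction(13, 4)]   # 0.11, 10.01, 11.01 in binary
    ls, P, e, L0 = finite_alphabet_code(lengths, q=2, r=10)
    s = [1, 2, 1, 3, 1, 1, 2, 3, 1, 2]
    C, L = encode(s, ls, P, e, L0)
    assert decode(C, L, len(s), ls, P, e) == s
    print("C(s) =", C, " L(s) =", L)
```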

The crux of the decoding process is to calculate $u(t)$ by (26) for the $r$-bit fractions $t$. This can either be tabulated or calculated from a suitable approximation to

$P_k = \sum_{i=1}^{k} 2^{-l_i + \varepsilon_i}.$

Further, the products $P_{k-1} \times e[x(s a_k)]$ appearing in (25) can be tabulated, which is practicable for small or moderate sized alphabets. Then only two arithmetic operations per symbol are needed in the coding operations.

The general case does not reduce to the binary case when $m = 2$, because there the factor $P_1 = p_1$, approximated as $2^{-l_1}$, can be absorbed by $\Phi(s1)$, which is cheaper.

Finally, if for large alphabets the symbols are ordered such that the numbers $l_i$ form an increasing sequence, the function $k \mapsto P_k$ is concave. Then both are easy to approximate by an analytic function, and the overhead storage consists mainly of the table of symbols. The coding operations then require about the time for the table lookups $k \leftrightarrow a_k$. In addition, the $l_i$ may be chosen, except for small values of $i$, in a universal way, e.g., as $l_i \cong \log i$ plus lower-order terms, and still give a near-entropy compression. (For universal coding, see [6].) This means that there is no need to collect elaborate statistics about the symbol probabilities. We leave the details to another paper.

References

1. D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proc. IRE 40, 1098 (1952).
2. N. Abramson, Information Theory and Coding, McGraw-Hill Book Co., Inc., New York, 1963.
3. R. M. Fano, "On the Number of Bits Required to Implement an Associative Memory," Memorandum 61, Computer Structures Group, Project MAC, Massachusetts, 1972.
4. J. P. M. Schalkwijk, "An Algorithm for Source Coding," IEEE Trans. Information Theory IT-18, 395 (1972).
5. T. M. Cover, "Enumerative Source Encoding," IEEE Trans. Information Theory IT-19, 73 (1973).
6. P. Elias, "Universal Codeword Sets and Representations of the Integers," IEEE Trans. Information Theory IT-21, 194 (March 1975).

Received May 6, 1975; revised October 5, 1975

The author is located at the IBM Research Laboratory, Monterey and Cottle Roads, San Jose, California 95193.