On the redundancy of optimum fixed-to-variable length codes


Peter R. Stubley¹
Bell-Northern Research

Abstract

There has been much interest in recent years in bounds on the redundancy of Huffman codes, given only partial information about the source word distribution, such as the probability of the most likely source word. This work determines upper and lower bounds for the redundancy of Huffman codes of source words which are binomially distributed. Since the complete distribution is known, it is possible to determine bounds which are much tighter than other bounds in the literature, given only $p$, the probability of the most likely symbol of the binary source, and $K$, where there are $2^K$ source words. The upper and lower bounds will be shown to converge to the same value as $K$ becomes large, resulting in a simple approximation which can be used to predict the redundancy of the Huffman code, given $p$ and $K$, without constructing the code.

1 Introduction

Let $S$ be an independent, identically distributed binary source, which generates a 0 with probability $p$, $p \ge 1/2$. Let $E[K]$ be the average code word length of the Huffman code of the alphabet obtained by taking $K$-bit blocks of input symbols, called source words, where the code words are also assumed to be binary, with equal cost symbols. The redundancy of the code is defined as

$$r(p,K) = E[K] - K H(S), \tag{1}$$

where $H(S) = -p\log p - (1-p)\log(1-p)$ (the logarithms are base 2). The compression ratio is

$$\rho = \frac{E[K]}{K} = H(S) + \frac{r(p,K)}{K}. \tag{2}$$

It is clear from Equation 2 that $\lim_{K\to\infty}\rho = H(S)$, since $0 \le r(p,K) \le 1$ for the Huffman code [7], and $\lim_{K\to\infty} 1/K = 0$. However, $\lim_{K\to\infty} r(p,K)$ is not as well understood.

There have been numerous upper and lower bounds obtained for $r$ in recent years (for example, [1, 2, 9, 10, 11]), given partial information about the source alphabet, such as the probability of the most likely source word. However, for the source words described above, obtained by taking $K$-bit blocks, the complete distribution is known, given $p$ and $K$.
The source words are binomially distributed, with probabilities of the form $p(i) = p^{K-i}(1-p)^{i}$, such that $\sum_{i=0}^{K}\binom{K}{i}p(i) = 1$. There are $\binom{K}{i}$ source words with probability $p(i)$.

¹ Bell-Northern Research Ltd., 16, Place du Commerce, Montreal, Québec, H3E 1H6, Phone: (514) , Fax: (514) , stubley@bnr.ca. © IEEE
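The definitions above can be checked numerically for small block lengths. The following is a minimal sketch, not the author's procedure: it enumerates all $2^K$ source words by brute force, so it is only practical for small $K$, and it relies on the standard fact that the average length of a Huffman code equals the sum of the weights of the internal nodes created during merging. The function names (`binary_entropy`, `huffman_average_length`, `block_redundancy`) are illustrative choices, not names from the paper.

```python
import heapq
from math import comb, log2


def binary_entropy(p):
    """H(S) in bits for a binary source with P(0) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def huffman_average_length(probs):
    """Average code word length of a binary Huffman code for `probs`.

    Each merge adds one bit to every leaf below the new node, so the
    average length is the sum of the weights of all merged nodes.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def block_redundancy(p, K):
    """r(p, K) = E[K] - K * H(S) for K-bit blocks (brute force, small K)."""
    probs = []
    for i in range(K + 1):
        # comb(K, i) source words share the probability p(i).
        probs.extend([p ** (K - i) * (1 - p) ** i] * comb(K, i))
    return huffman_average_length(probs) - K * binary_entropy(p)


if __name__ == "__main__":
    for K in range(1, 11):
        print(K, round(block_redundancy(0.8, K), 5))
```

Printing the values for a fixed $p$ and increasing $K$ gives a direct view of how $r(p,K)$ behaves as the block length grows.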

Jakobssen [8] has analyzed $E[K]$ for the binomial distribution for a class of values of $p$ defined by a condition, Equation 3, that is missing from this copy of the text. However, observe that the values of $p$ which satisfy Equation 3 become closer and closer to 1 as $K$ becomes large. This approach does not allow us to consider the behaviour of $E[K]$ for a particular value of $p$ as $K$ becomes large.

Huffman codes for binomially distributed source words have also been considered by Montgomery and Kumar [12]. They are particularly interested in the conditions which are necessary to ensure that the Huffman code of a $K$-symbol block is more efficient than simply constructing the Huffman code for the original source alphabet.

In this work, $r(p,K)$ is analyzed for the binomial distribution for arbitrary values of $p$, with simple closed forms for particular values of $p$. In particular, the problem of interest is what happens to $r(p,K)$ as $K$ becomes large. The approach taken will be to derive upper and lower bounds on $r(p,K)$, and to show that the upper and lower bounds converge as $K$ becomes large. Using this approach, it will be shown that $r(p,K)$ does not, in general, vanish, and does not monotonically increase or decrease, but tends to oscillate between values nearer and farther from 0.

2 Properties of Huffman codes of binomial distributions

This section assumes that a modified form of the Huffman algorithm is used (see Appendix). The modification ensures that the Huffman code produced by the algorithm is unique (there is only one code for a given source word distribution, which is not always the case for Huffman's original algorithm), and is constructed such that the longest code words have the shortest possible length. This assumption does not affect the optimality of the code produced by the modified algorithm, since all the Huffman codes for the same source word distribution have the same average length.
If there are $M$ source words, each with probability $1/M$, then their code words will have length $\lfloor\log_2 M\rfloor$ or $\lfloor\log_2 M\rfloor + 1$ [4, 6]. A similar property holds for binomial distributions: if two source words have equal probabilities, then their code words will differ in length by at most 1, which follows from Gallager's sibling property [5]. Furthermore, the least likely source word, with probability $p(K)$, will have a code word with the same length as the longest code word among the source words with probability $p(K-1)$, which is a direct result of the Huffman algorithm. Let $l_j$ be the length of the code word for the $j$-th source word, and let $S_i$ be the set of source words with probability $p(i)$. Then,

$$E[K] = \sum_{j=1}^{2^K} p_j l_j \tag{4}$$

$$\phantom{E[K]} = \sum_{i=0}^{K}\binom{K}{i}p(i)L(i), \tag{5}$$

where $p_j$ is the probability of the $j$-th source word, and

$$L(i) = \binom{K}{i}^{-1}\sum_{j\in S_i} l_j. \tag{6}$$

Using Equation 4, the redundancy of the code is

$$r(p,K) = \sum_{i=0}^{K}\binom{K}{i}p(i)\left[L(i) + \log_2 p(i)\right]. \tag{7}$$

When $E[K]$ is expressed in the form of Equation 4, it is easily shown that the redundancy can be zero only if the source word distribution is dyadic; that is, $p_j = 2^{-z_j}$, for $z_j$ a positive integer, and $l_j = z_j$. Equation 7 appears to relax this constraint, since zero redundancy may be obtained if $L(i) = -\log_2 p(i)$ for all $i$, and $L(i)$ may take non-integer values. In fact, as will be shown, zero redundancy is not, in general, attainable, unless $p = 1/2$.

2.1 Determining the code word lengths

In order to find the redundancy, it is necessary to determine the lengths of the code words. From the above properties of the code word lengths, the code words for $S_i$ must have length $l(i)$ or $l(i) + 1$, where $l(i)$ is a positive integer. Let $n_i$ be the number of code words with length $l(i) + 1$. Then, from the definition of $L(i)$,

$$L(i) = l(i) + \frac{n_i}{\binom{K}{i}}, \quad i = 0, 1, \ldots, K.$$

It is necessary to determine $n_i$ and $l(i)$.

The Huffman code minimizes $r(p,K) = E[K] - KH(S)$. By removing all the constraints on $\{L(i), i = 0, 1, \ldots, K\}$, except $\sum_{i=0}^{K}\binom{K}{i}2^{-L(i)} = 1$, it is easy to show using Lagrange multipliers that $L_{opt}(i) = -\log_2 p(i)$ achieves the minimum solution, with $r = 0$. Similarly, letting $L(i) = l(i) + n(i)/\binom{K}{i}$, where $l(i)$ is a positive integer, and $0 \le n(i) \le \binom{K}{i}$, but with no integer constraint on $n(i)$, then clearly $l(i) = \lfloor -\log_2 p(i)\rfloor$ and $n(i) = \binom{K}{i}\left(-\log_2 p(i) - l(i)\right)$ is also an optimum solution, with $r = 0$. The integer values for $n(i)$ will be considered in the next section.

In fact, setting $l(i) = \lfloor -\log_2 p(i)\rfloor$ will only achieve a Huffman code if $K$ is sufficiently large, and $p < 1$. If $p = 1$, then $r = 1$ for any $K$. When $K$ is small, and $p$ is near 1, then $-\log_2 p(K-1)$ may be much greater than $2^K - 1$, which is the length of the longest possible Huffman code word for a block length of $K$.
Since $p \ge 1/2$, $2^K - 1$ grows faster than $-\log_2 p(K-1)$, and the above value for $l(i)$ will become the Huffman code length as $K$ gets large, when $p < 1$. Also, using the properties of Huffman codes, $L(K) = l(K-1)$ or $l(K-1) + 1$.

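3 Determining the redundancy

The number of code words of length $l(i) + 1$, $n(i)$, will not be exactly determined, but will be upper and lower bounded, using the Kraft inequality. Let $d_i = \sum_{j\in S_i} 2^{-l_j}$. If $d_i \le \binom{K}{i}p(i)$ for all $i$, then the Kraft inequality will be satisfied, which will ensure that a prefix code with lengths $l_j$ exists. The average length of this code will be an upper bound on that of the Huffman code.

Note that from this point on, the constraint that $L(K)$ is determined from $\lfloor -\log_2 p(K-1)\rfloor$ and not $\lfloor -\log_2 p(K)\rfloor$ will be ignored. This approximation will simplify the ensuing development, but will not affect the final answer, since both $2^{-L(K)}$ and $p(K)\log_2 p(K)$ vanish as $K$ becomes large.

Observe that

$$d_i = 2^{-l(i)}\left[\binom{K}{i} - \frac{n_i}{2}\right]. \tag{8}$$

If $n_i = \left\lceil 2\binom{K}{i}\left[1 - p(i)2^{l(i)}\right]\right\rceil$, then $d_i \le \binom{K}{i}p(i)$, and the Kraft inequality will be satisfied. Let $f_i = l(i) + \log_2 p(i) = \lfloor -\log_2 p(i)\rfloor + \log_2 p(i)$, and let

$$L^{l}(i) = l(i) + \binom{K}{i}^{-1}\left\lfloor 2\binom{K}{i}\left(1 - 2^{f_i}\right)\right\rfloor \tag{9}$$

and

$$L^{u}(i) = l(i) + \binom{K}{i}^{-1}\left\lceil 2\binom{K}{i}\left(1 - 2^{f_i}\right)\right\rceil. \tag{10}$$

If $L(i)$ is the value obtained by the Huffman code, then $L^{l}(i) \le L(i) \le L^{u}(i)$, $i \ne K$. From the definition of redundancy,

$$r(p,K) = \sum_{i=0}^{K}\binom{K}{i}p(i)\left[L(i) + \log_2 p(i)\right] \tag{11}$$

and

$$\sum_{i=0}^{K}\binom{K}{i}p(i)\left[L^{l}(i) + \log_2 p(i)\right] \le r(p,K) \le \sum_{i=0}^{K}\binom{K}{i}p(i)\left[L^{u}(i) + \log_2 p(i)\right]. \tag{12}$$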
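The Kraft-inequality construction described above can be sketched numerically. The code below is an illustration under stated assumptions rather than the paper's implementation: it takes $l(i) = \lfloor -\log_2 p(i)\rfloor$, $f_i = l(i) + \log_2 p(i)$ and $n_i = \lceil 2\binom{K}{i}(1 - 2^{f_i})\rceil$ as in the text, assumes $1/2 \le p < 1$, works in floating point, and uses a hypothetical function name, `kraft_upper_bound_code`.

```python
from math import ceil, comb, floor, log2


def kraft_upper_bound_code(p, K):
    """Return the per-class pairs (l_i, n_i) and the Kraft sum.

    Class S_i holds comb(K, i) source words of probability
    p(i) = p**(K - i) * (1 - p)**i.  Within the class, n_i words get
    length l_i + 1 and the remaining words get length l_i.
    Assumes 1/2 <= p < 1.
    """
    classes = []
    kraft_sum = 0.0
    for i in range(K + 1):
        c = comb(K, i)                          # size of class S_i
        log_p_i = (K - i) * log2(p) + i * log2(1 - p)
        l_i = floor(-log_p_i)                   # shorter of the two lengths
        f_i = l_i + log_p_i                     # lies in (-1, 0]
        n_i = ceil(2 * c * (1 - 2.0 ** f_i))    # words of length l_i + 1
        classes.append((l_i, n_i))
        kraft_sum += (c - n_i) * 2.0 ** (-l_i) + n_i * 2.0 ** (-(l_i + 1))
    return classes, kraft_sum


if __name__ == "__main__":
    classes, kraft_sum = kraft_upper_bound_code(0.8, 2)
    print(classes, kraft_sum)
```

For $p = 0.8$ and $K = 2$ the four code words get lengths 1, 3, 3 and 5, so the Kraft sum is $1/2 + 2/8 + 1/32 = 0.78125$. A prefix code with these lengths exists but is not optimal, which is why the construction only yields an upper bound.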

Observe that $-1 < f_i \le 0$, which implies that $\lfloor 2(1 - 2^{f_i})\rfloor = 0$ and $\lceil 2(1 - 2^{f_i})\rceil \le 1$. Since $\binom{K}{0} = \binom{K}{K} = 1$, Equation 12 becomes

$$\sum_{i=0}^{K}\binom{K}{i}p(i)f_i + \sum_{i=1}^{K-1}p(i)\left\lfloor 2\binom{K}{i}\left(1 - 2^{f_i}\right)\right\rfloor \le r(p,K) \le \sum_{i=0}^{K}\binom{K}{i}p(i)f_i + \sum_{i=1}^{K-1}p(i)\left\lceil 2\binom{K}{i}\left(1 - 2^{f_i}\right)\right\rceil + p(0) + p(K). \tag{13}$$

As $K$ becomes large, for $p < 1$, the term $p(0) + p(K)$ becomes negligibly small. This, combined with the observation that $\lim_{a\to\infty}\lfloor ax\rfloor/a = \lim_{a\to\infty}\lceil ax\rceil/a = x$, implies that the upper and lower bounds of Equation 13 converge, and

$$\lim_{K\to\infty} r(p,K) = \lim_{K\to\infty}\sum_{i=1}^{K-1}\binom{K}{i}p(i)\left[f_i + 2\left(1 - 2^{f_i}\right)\right]. \tag{14}$$

Equation 14 is the fundamental result of this correspondence, but it is difficult to completely resolve the limit without further constraints on $p$.

3.1 Closed-form results for a class of values of p

Numerical results for $r(p,K)$ can be easily obtained from Equation 14 for any value of $p$. However, a more intuitive closed form can be obtained for a particular class of values for $p$, where $p = (1 + 2^{-M})^{-1}$, for $M$ a positive integer. For these values of $p$, $\log_2[(1-p)/p]$ is an integer, and $f_i = K\log_2 p + \lfloor -K\log_2 p\rfloor = f$ for all $i$. Equation 14 becomes

$$\lim_{K\to\infty} r(p,K) = \lim_{K\to\infty}\left[\left(f + 2(1 - 2^{f})\right)\left(1 - p(0) - p(K)\right)\right] = \lim_{K\to\infty}\left[f + 2(1 - 2^{f})\right].$$

It is clear that $\lim_{K\to\infty}[f + 2(1 - 2^{f})]$ does not exist, and that the quantity does not decay as $K$ becomes large. Instead, it shows that $r(p,K)$ has a behaviour which is determined by the numerical properties of $K\log_2 p$, becoming closer to 0 when $-K\log_2 p$ is close to $\lfloor -K\log_2 p\rfloor$.

It is interesting to note that $\max_{-1<f\le 0}[f + 2(1 - 2^{f})] = \log_2[2/(e\ln 2)] \approx 0.086$, which is the number obtained by Gallager in [5] in determining upper bounds on $r$, given the probability of the most likely source word. Gallager obtains this value as the maximum of the function $H(q_1, q_2, \ldots, q_L)$, where the Huffman code tree is full at depth $l$, the $q_i$'s are the probabilities of the nodes at depth $l$, $L = 2^{l}$, and $H()$ is the entropy function. The maximum value occurs at an extreme point, where $q_i = q_1$ for $i \le n$, for some $1 \le n \le L$, and $q_i = q_1/2$ for $n < i \le L$. A distribution of this form is similar to the binomial distributions which are being considered here, where there are blocks of nodes with the same probabilities.
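The closed form $f + 2(1 - 2^{f})$ and its maximum are easy to evaluate directly. The sketch below is only an illustration; the names `fractional_offset`, `limiting_redundancy`, `F_STAR` and `G_MAX` are introduced here and do not come from the paper. Setting the derivative $1 - 2\ln 2\cdot 2^{f}$ to zero gives the maximizer $f^{*} = -\log_2(2\ln 2)$, and substituting it back gives $\log_2[2/(e\ln 2)]$.

```python
from math import e, floor, log, log2


def fractional_offset(p, K):
    """f = K*log2(p) + floor(-K*log2(p)), a value in (-1, 0]."""
    x = -K * log2(p)
    return floor(x) - x


def limiting_redundancy(f):
    """Closed-form approximation f + 2*(1 - 2**f) for -1 < f <= 0."""
    return f + 2.0 * (1.0 - 2.0 ** f)


# Maximizer and maximum obtained by setting the derivative to zero.
F_STAR = -log2(2.0 * log(2.0))
G_MAX = log2(2.0 / (e * log(2.0)))

if __name__ == "__main__":
    p = 0.8  # p = 1 / (1 + 2**-M) with M = 2
    for K in range(1, 11):
        f = fractional_offset(p, K)
        print(K, round(f, 4), round(limiting_redundancy(f), 5))
    print("maximum:", round(G_MAX, 4))
```

The approximation is zero at both ends of the interval ($f = 0$ and $f \to -1$) and peaks in between, which is the oscillation in $K$ described above.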

Table 1: Comparison of redundancy and bound. [The numerical entries did not survive extraction. The surviving headers show column pairs $r$ and $r_b$ for $p = 0.66667$, $p = 0.8$ and a third value of $p$.]

As well, numerical results show that the redundancy of the Huffman codes for $K$-bit blocks converges quite quickly to $f + 2(1 - 2^{f})$ (see Table 1), which makes it useful as a rule of thumb for estimating $r(p,K)$ with a simple calculation.

This approach extends naturally to source probabilities of a more general form, where $\log_2[(1-p)/p] = -A/B$, where $A$ and $B$ are positive, relatively prime integers. In this case, $p = (1 + 2^{-A/B})^{-1}$. Let

$$f_i = K\log_2 p - i\frac{A}{B} + \left\lfloor -K\log_2 p + i\frac{A}{B}\right\rfloor, \quad 0 \le i \le B - 1.$$

Then, for large $K$, the redundancy converges to

$$\frac{1}{B}\sum_{i=0}^{B-1}\left[f_i + 2\left(1 - 2^{f_i}\right)\right]. \tag{15}$$

4 Conclusions

This work has investigated the properties of the redundancy of Huffman codes for $K$-bit blocks of the binary source. It has been shown that for $p = (1 + 2^{-M})^{-1}$, the compression ratio for large $K$ is $\rho = H(S) + [f + 2(1 - 2^{f})]/K$, and that the redundancy of the code never goes to 0, no matter how large $K$ becomes. Numerically, it has been shown that the redundancy of the Huffman code converges quite quickly to $f + 2(1 - 2^{f})$, which means that the redundancy can be determined for a given $K$ and $p$ with only a simple calculation. In fact, the redundancies for arbitrary values of $p$ can also be approximated using Equation 15, since $r$ is a continuous function of $p$, and $A$ and $B$ can be chosen so that $(1 + 2^{-A/B})^{-1}$ is as close to any $p$ as required.
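The approximation of Equation 15 can be evaluated with a few lines of code. This is a minimal sketch assuming the form of $f_i$ given in the text; the helper names `p_from_ratio` and `eq15_redundancy` are hypothetical.

```python
from math import floor, log2


def p_from_ratio(A, B):
    """p = (1 + 2**(-A/B))**-1, so that log2((1 - p)/p) = -A/B."""
    return 1.0 / (1.0 + 2.0 ** (-A / B))


def eq15_redundancy(A, B, K):
    """Average of f_i + 2*(1 - 2**f_i) over the B residue classes."""
    p = p_from_ratio(A, B)
    total = 0.0
    for i in range(B):
        x = -K * log2(p) + i * A / B   # equals -log2 p(i)
        f_i = floor(x) - x             # lies in (-1, 0]
        total += f_i + 2.0 * (1.0 - 2.0 ** f_i)
    return total / B


if __name__ == "__main__":
    # A = 1, B = 2 gives p = 1 / (1 + 2**-0.5), roughly 0.586.
    for K in (4, 8, 16, 32):
        print(K, round(eq15_redundancy(1, 2, K), 5))
```

With $B = 1$ the sum has a single term and the function reduces to the closed form $f + 2(1 - 2^{f})$ for $p = (1 + 2^{-A})^{-1}$.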

5 Appendix: The modified Huffman code

It is well known that the Huffman code for a given source distribution is not necessarily unique. In fact, it is possible to construct a Huffman code which has code word lengths that are in some cases longer than the corresponding Shannon-Fano code word lengths (for example, see Problem 5.14 in [3]). Recall that the Huffman code is constructed iteratively: the two lowest weight nodes are combined into a new node with a weight that is the sum of the weights of its two children. If at some point during the construction of the Huffman code there are two or more nodes that have the same weight, then it is possible to construct more than one Huffman code. All the codes for the same distribution will have the same average length, but the topologies of their code trees may differ.

It is often convenient when studying the redundancy of Huffman codes to be able to use the Shannon-Fano code word lengths as upper bounds for the code word lengths of the Huffman code. If it is possible to construct Huffman codes that have code word lengths longer than the Shannon-Fano code words, then this approach is not feasible. Fortunately, with a simple modification to Huffman's algorithm, it is possible to construct a Huffman code whose code word lengths are less than or equal to those of the Shannon-Fano code, by changing the algorithm so that the longest code words are as short as possible.

This is achieved as follows. Assume that the nodes at each step of the Huffman algorithm are kept in a list which is ordered largest to smallest, in terms of weight, and, at each step, the bottom two nodes in the list are combined. The new node is moved up the list until the node which is immediately above it has a larger weight. Huffman's original algorithm specified that the new node should be moved up the list until the node which is immediately above it has the same or larger weight.
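The tie-breaking rule just described can be sketched with a priority queue instead of an explicit ordered list. This is an illustrative reading of the appendix, not the author's code: a merged node is given a sequence number larger than every existing node, so among nodes of equal weight it is combined last, which corresponds to placing it above all equal-weight nodes in the list. The function name `modified_huffman_lengths` is hypothetical, and ties between the original leaves are broken by input order, which is an assumption. Weights should be exact (integers or fractions) if ties are to be detected reliably.

```python
import heapq
from itertools import count


def modified_huffman_lengths(weights):
    """Code word lengths of a Huffman code with the longest words kept short.

    Heap entries are (weight, sequence number, leaves below the node).
    Sequence numbers are unique, so the leaf lists are never compared.
    """
    n = len(weights)
    lengths = [0] * n
    if n < 2:
        return lengths
    order = count()
    heap = [(w, next(order), [j]) for j, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        merged = leaves1 + leaves2
        for j in merged:
            lengths[j] += 1            # every leaf below moves one level down
        # The new node gets the largest sequence number so far, so it is
        # popped after all existing nodes of the same weight.
        heapq.heappush(heap, (w1 + w2, next(order), merged))
    return lengths


if __name__ == "__main__":
    print(modified_huffman_lengths([4, 2, 2, 1, 1]))
```

For the weights 4, 2, 2, 1, 1 this yields lengths 2, 2, 2, 3, 3. Breaking the ties the other way can give lengths 1, 2, 3, 4, 4, which has the same average length but a longer maximum code word.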
If the modification is used, then the longest code words will have the shortest possible length, and the code word lengths of the Huffman code will indeed be less than or equal to those of the Shannon-Fano code.

References

[1] R.M. Capocelli and A. de Santis, Tight upper bounds on the redundancy of Huffman codes, IEEE Trans. Inf. Theory, Vol. 35, No. 5, September 1989.
[2] R.M. Capocelli and A. de Santis, New bounds on the redundancy of Huffman codes, IEEE Trans. Inf. Theory, Vol. 37, No. 4, July 1991.
[3] T.M. Cover and J.A. Thomas, Elements of information theory, New York: Wiley.
[4] R.G. Gallager, Information theory and reliable communication, New York: Wiley.
[5] R.G. Gallager, Variations on a theme by Huffman, IEEE Trans. Inf. Theory, IT-24, No. 6, November 1978.

[6] N.C. Geckinli, Two corollaries to the Huffman coding procedure, IEEE Trans. Inf. Theory, May 1975.
[7] D.A. Huffman, A method for the construction of minimum redundancy codes, Proc. IRE, Vol. 40, 1952.
[8] M. Jakobssen, Huffman coding in bit-vector compression, Information Processing Letters, Vol. 7, No. 6, October 1978.
[9] O. Johnsson, On the redundancy of binary Huffman codes, IEEE Trans. Inf. Theory, IT-26, No. 2, March 1980.
[10] D. Manstetten, Tight upper bounds on the redundancy of Huffman codes, IEEE Trans. Inf. Theory, IT-38, No. 1, January 1992.
[11] B.L. Montgomery and J. Abrahams, On the redundancy of optimal prefix-condition codes for finite and infinite sources, IEEE Trans. Inf. Theory, IT-33, No. 1, January 1987.
[12] B.L. Montgomery and B.V.K.V. Kumar, On the average codeword length of optimal binary codes for extended sources, IEEE Trans. Inf. Theory, IT-33, No. 2, March 1987.


The memory centre IMUJ PREPRINT 2012/03. P. Spurek The memory centre IMUJ PREPRINT 202/03 P. Spurek Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland J. Tabor Faculty of Mathematics and Computer

More information

Lecture 1: Shannon s Theorem

Lecture 1: Shannon s Theorem Lecture 1: Shannon s Theorem Lecturer: Travis Gagie January 13th, 2015 Welcome to Data Compression! I m Travis and I ll be your instructor this week. If you haven t registered yet, don t worry, we ll work

More information

An introduction to basic information theory. Hampus Wessman

An introduction to basic information theory. Hampus Wessman An introduction to basic information theory Hampus Wessman Abstract We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on

More information

3F1 Information Theory, Lecture 3

3F1 Information Theory, Lecture 3 3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2013, 29 November 2013 Memoryless Sources Arithmetic Coding Sources with Memory Markov Example 2 / 21 Encoding the output

More information

Maximum Likelihood Decoding of Codes on the Asymmetric Z-channel

Maximum Likelihood Decoding of Codes on the Asymmetric Z-channel Maximum Likelihood Decoding of Codes on the Asymmetric Z-channel Pål Ellingsen paale@ii.uib.no Susanna Spinsante s.spinsante@univpm.it Angela Barbero angbar@wmatem.eis.uva.es May 31, 2005 Øyvind Ytrehus

More information

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE General e Image Coder Structure Motion Video x(s 1,s 2,t) or x(s 1,s 2 ) Natural Image Sampling A form of data compression; usually lossless, but can be lossy Redundancy Removal Lossless compression: predictive

More information



More information

Exercises with solutions (Set B)

Exercises with solutions (Set B) Exercises with solutions (Set B) 3. A fair coin is tossed an infinite number of times. Let Y n be a random variable, with n Z, that describes the outcome of the n-th coin toss. If the outcome of the n-th

More information

lossless, optimal compressor

lossless, optimal compressor 6. Variable-length Lossless Compression The principal engineering goal of compression is to represent a given sequence a, a 2,..., a n produced by a source as a sequence of bits of minimal possible length.

More information

arxiv: v1 [cs.it] 5 Sep 2008

arxiv: v1 [cs.it] 5 Sep 2008 1 arxiv:0809.1043v1 [cs.it] 5 Sep 2008 On Unique Decodability Marco Dalai, Riccardo Leonardi Abstract In this paper we propose a revisitation of the topic of unique decodability and of some fundamental

More information

Coding for Discrete Source

Coding for Discrete Source EGR 544 Communication Theory 3. Coding for Discrete Sources Z. Aliyazicioglu Electrical and Computer Engineering Department Cal Poly Pomona Coding for Discrete Source Coding Represent source data effectively

More information

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University Chapter 4 Data Transmission and Channel Capacity Po-Ning Chen, Professor Department of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 30050, R.O.C. Principle of Data Transmission

More information

Motivation for Arithmetic Coding

Motivation for Arithmetic Coding Motivation for Arithmetic Coding Motivations for arithmetic coding: 1) Huffman coding algorithm can generate prefix codes with a minimum average codeword length. But this length is usually strictly greater

More information

Lecture 1 : Data Compression and Entropy

Lecture 1 : Data Compression and Entropy CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for

More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

A One-to-One Code and Its Anti-Redundancy

A One-to-One Code and Its Anti-Redundancy A One-to-One Code and Its Anti-Redundancy W. Szpankowski Department of Computer Science, Purdue University July 4, 2005 This research is supported by NSF, NSA and NIH. Outline of the Talk. Prefix Codes

More information

Infinite anti-uniform sources

Infinite anti-uniform sources Infinite anti-uniform sources Daniela Tarniceriu, Valeriu Munteanu, Gheorghe Zaharia To cite this version: Daniela Tarniceriu, Valeriu Munteanu, Gheorghe Zaharia. Infinite anti-uniform sources. AEU - International

More information

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H. Problem sheet Ex. Verify that the function H(p,..., p n ) = k p k log p k satisfies all 8 axioms on H. Ex. (Not to be handed in). looking at the notes). List as many of the 8 axioms as you can, (without

More information

Using an innovative coding algorithm for data encryption

Using an innovative coding algorithm for data encryption Using an innovative coding algorithm for data encryption Xiaoyu Ruan and Rajendra S. Katti Abstract This paper discusses the problem of using data compression for encryption. We first propose an algorithm

More information

Source Coding Techniques

Source Coding Techniques Source Coding Techniques. Huffman Code. 2. Two-pass Huffman Code. 3. Lemple-Ziv Code. 4. Fano code. 5. Shannon Code. 6. Arithmetic Code. Source Coding Techniques. Huffman Code. 2. Two-path Huffman Code.

More information

Entropy and Ergodic Theory Lecture 3: The meaning of entropy in information theory

Entropy and Ergodic Theory Lecture 3: The meaning of entropy in information theory Entropy and Ergodic Theory Lecture 3: The meaning of entropy in information theory 1 The intuitive meaning of entropy Modern information theory was born in Shannon s 1948 paper A Mathematical Theory of

More information

CSE 421 Greedy: Huffman Codes

CSE 421 Greedy: Huffman Codes CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits better: 2.52 bits/char 74%*2 +26%*4:

More information

Chaitin Ω Numbers and Halting Problems

Chaitin Ω Numbers and Halting Problems Chaitin Ω Numbers and Halting Problems Kohtaro Tadaki Research and Development Initiative, Chuo University CREST, JST 1 13 27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan E-mail: tadaki@kc.chuo-u.ac.jp Abstract.

More information

1 Introduction to information theory

1 Introduction to information theory 1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through

More information

arxiv: v4 [cs.it] 17 Oct 2015

arxiv: v4 [cs.it] 17 Oct 2015 Upper Bounds on the Relative Entropy and Rényi Divergence as a Function of Total Variation Distance for Finite Alphabets Igal Sason Department of Electrical Engineering Technion Israel Institute of Technology

More information

arxiv: v2 [cs.ds] 28 Jan 2009

arxiv: v2 [cs.ds] 28 Jan 2009 Minimax Trees in Linear Time Pawe l Gawrychowski 1 and Travis Gagie 2, arxiv:0812.2868v2 [cs.ds] 28 Jan 2009 1 Institute of Computer Science University of Wroclaw, Poland gawry1@gmail.com 2 Research Group

More information

Upper Bounds on the Capacity of Binary Intermittent Communication

Upper Bounds on the Capacity of Binary Intermittent Communication Upper Bounds on the Capacity of Binary Intermittent Communication Mostafa Khoshnevisan and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame, Indiana 46556 Email:{mhoshne,

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 8 Greedy Algorithms V Huffman Codes Adam Smith Review Questions Let G be a connected undirected graph with distinct edge weights. Answer true or false: Let e be the

More information

Antipodal Gray Codes

Antipodal Gray Codes Antipodal Gray Codes Charles E Killian, Carla D Savage,1 Department of Computer Science N C State University, Box 8206 Raleigh, NC 27695, USA Abstract An n-bit Gray code is a circular listing of the 2

More information

An Entropy Bound for Random Number Generation

An Entropy Bound for Random Number Generation 244 An Entropy Bound for Random Number Generation Sung-il Pae, Hongik University, Seoul, Korea Summary Many computer applications use random numbers as an important computational resource, and they often

More information

Performance Analysis and Code Optimization of Low Density Parity-Check Codes on Rayleigh Fading Channels

Performance Analysis and Code Optimization of Low Density Parity-Check Codes on Rayleigh Fading Channels Performance Analysis and Code Optimization of Low Density Parity-Check Codes on Rayleigh Fading Channels Jilei Hou, Paul H. Siegel and Laurence B. Milstein Department of Electrical and Computer Engineering

More information

Lec 03 Entropy and Coding II Hoffman and Golomb Coding

Lec 03 Entropy and Coding II Hoffman and Golomb Coding CS/EE 5590 / ENG 40 Special Topics Multimedia Communication, Spring 207 Lec 03 Entropy and Coding II Hoffman and Golomb Coding Zhu Li Z. Li Multimedia Communciation, 207 Spring p. Outline Lecture 02 ReCap

More information

Homework Set #2 Data Compression, Huffman code and AEP

Homework Set #2 Data Compression, Huffman code and AEP Homework Set #2 Data Compression, Huffman code and AEP 1. Huffman coding. Consider the random variable ( x1 x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0.11 0.04 0.04 0.03 0.02 (a Find a binary Huffman code

More information

On bounded redundancy of universal codes

On bounded redundancy of universal codes On bounded redundancy of universal codes Łukasz Dębowski Institute of omputer Science, Polish Academy of Sciences ul. Jana Kazimierza 5, 01-248 Warszawa, Poland Abstract onsider stationary ergodic measures

More information

Lower Bounds on the Graphical Complexity of Finite-Length LDPC Codes

Lower Bounds on the Graphical Complexity of Finite-Length LDPC Codes Lower Bounds on the Graphical Complexity of Finite-Length LDPC Codes Igal Sason Department of Electrical Engineering Technion - Israel Institute of Technology Haifa 32000, Israel 2009 IEEE International

More information

1 Background on Information Theory

1 Background on Information Theory Review of the book Information Theory: Coding Theorems for Discrete Memoryless Systems by Imre Csiszár and János Körner Second Edition Cambridge University Press, 2011 ISBN:978-0-521-19681-9 Review by

More information

Optimal Power Control in Decentralized Gaussian Multiple Access Channels

Optimal Power Control in Decentralized Gaussian Multiple Access Channels 1 Optimal Power Control in Decentralized Gaussian Multiple Access Channels Kamal Singh Department of Electrical Engineering Indian Institute of Technology Bombay. arxiv:1711.08272v1 [eess.sp] 21 Nov 2017

More information

EE376A: Homework #2 Solutions Due by 11:59pm Thursday, February 1st, 2018

EE376A: Homework #2 Solutions Due by 11:59pm Thursday, February 1st, 2018 Please submit the solutions on Gradescope. Some definitions that may be useful: EE376A: Homework #2 Solutions Due by 11:59pm Thursday, February 1st, 2018 Definition 1: A sequence of random variables X

More information

Optimal prefix codes for pairs of geometrically-distributed random variables

Optimal prefix codes for pairs of geometrically-distributed random variables Author manuscript, published in IEEE International Symposium on Information Theory (ISIT'06), United States (2006) Optimal prefix codes for pairs of geometrically-distributed random variables Frédériue

More information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information 204 IEEE International Symposium on Information Theory Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information Omur Ozel, Kaya Tutuncuoglu 2, Sennur Ulukus, and Aylin Yener

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 AEP Asymptotic Equipartition Property AEP In information theory, the analog of

More information

Information and Entropy

Information and Entropy Information and Entropy Shannon s Separation Principle Source Coding Principles Entropy Variable Length Codes Huffman Codes Joint Sources Arithmetic Codes Adaptive Codes Thomas Wiegand: Digital Image Communication

More information

Information Theory in Intelligent Decision Making

Information Theory in Intelligent Decision Making Information Theory in Intelligent Decision Making Adaptive Systems and Algorithms Research Groups School of Computer Science University of Hertfordshire, United Kingdom June 7, 2015 Information Theory

More information

Sample solutions to Homework 4, Information-Theoretic Modeling (Fall 2014)

Sample solutions to Homework 4, Information-Theoretic Modeling (Fall 2014) Sample solutions to Homework 4, Information-Theoretic Modeling (Fall 204) Jussi Määttä October 2, 204 Question [First, note that we use the symbol! as an end-of-message symbol. When we see it, we know

More information

And for polynomials with coefficients in F 2 = Z/2 Euclidean algorithm for gcd s Concept of equality mod M(x) Extended Euclid for inverses mod M(x)

And for polynomials with coefficients in F 2 = Z/2 Euclidean algorithm for gcd s Concept of equality mod M(x) Extended Euclid for inverses mod M(x) Outline Recall: For integers Euclidean algorithm for finding gcd s Extended Euclid for finding multiplicative inverses Extended Euclid for computing Sun-Ze Test for primitive roots And for polynomials

More information

Data Structures in Java

Data Structures in Java Data Structures in Java Lecture 20: Algorithm Design Techniques 12/2/2015 Daniel Bauer 1 Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways of

More information

Introduction to information theory and coding

Introduction to information theory and coding Introduction to information theory and coding Louis WEHENKEL Set of slides No 5 State of the art in data compression Stochastic processes and models for information sources First Shannon theorem : data

More information