Lecture 6: Kraft-McMillan Inequality and Huffman Coding

EE376A/STATS376A: Information Theory                              Lecture 6 - 01/25/2018
Lecturer: Tsachy Weissman                              Scribe: Akhil Prakash, Kai Yee Wan

In this lecture, we begin with the observation that Shannon coding is not optimal, in the sense that it does not in general achieve the lowest possible codeword length. We then show that entropy is a lower bound on the expected codeword length of any variable-length uniquely decodable code. Finally, we introduce the Huffman code, a method that produces optimal prefix codes for lossless compression.

1  Suboptimality of Shannon coding

Recall from the previous lecture that for a given source distribution p(u), the Shannon code is a prefix code with length function and expected code length as follows:

    l(u) = \left\lceil \log \frac{1}{p(u)} \right\rceil, \quad u \in U                  (1)

    E[l(U)] = \bar{l} \le H(U) + 1                                                      (2)

Example: Consider a binary source U for which one symbol is much more likely to occur than the other:

    U = \{a, b\}, \quad p(a) = 0.99, \quad p(b) = 0.01                                  (3)

We can construct the Shannon code for this source:

    l(a) = \left\lceil \log \frac{1}{p(a)} \right\rceil = \lceil 0.0145 \rceil = 1      (4)

    l(b) = \left\lceil \log \frac{1}{p(b)} \right\rceil = \lceil 6.64 \rceil = 7        (5)

Now consider a fictitious source p'(u) with symbol probabilities as follows:

    p'(a) = 2^{-1} = \frac{1}{2}, \quad p'(b) = 2^{-7} = \frac{1}{128}                  (6)

Recall that we can extend this source with additional symbols to U' \supseteq U such that p'(u) is dyadic and \sum_u p'(u) = 1:

    p'(c) = 2^{-2}, \; p'(d) = 2^{-3}, \; p'(e) = 2^{-4}, \; p'(f) = 2^{-5}, \; p'(g) = 2^{-6}, \; p'(h) = 2^{-7}   (7)

[Figure: binary code tree for the dyadic source p', with leaves a, c, d, e, f, g, h, b at depths 1, 2, 3, 4, 5, 6, 7, 7.]

The expected code length and entropy are:

    \bar{l} = E[l(U)] = 0.99 \cdot 1 + 0.01 \cdot 7 = 1.06                              (8)

    H(U) = h(0.01) \approx 0.08                                                         (9)
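For concreteness, here is a minimal Python sketch that reproduces the numbers of this example: the Shannon code lengths, the resulting expected length, and the entropy. It is an illustration added for these notes; the function names are ours.

    import math

    # Source from the example: p(a) = 0.99, p(b) = 0.01
    p = {"a": 0.99, "b": 0.01}

    def shannon_lengths(p):
        """Shannon code lengths l(u) = ceil(log2(1/p(u)))."""
        return {u: math.ceil(math.log2(1.0 / pu)) for u, pu in p.items()}

    def expected_length(p, lengths):
        """Expected codeword length: sum_u p(u) * l(u)."""
        return sum(p[u] * lengths[u] for u in p)

    def entropy(p):
        """Entropy H(U) in bits."""
        return -sum(pu * math.log2(pu) for pu in p.values() if pu > 0)

    lengths = shannon_lengths(p)
    print(lengths)                      # {'a': 1, 'b': 7}
    print(expected_length(p, lengths))  # 1.06
    print(round(entropy(p), 4))         # ~0.0808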

As seen here, even though we are within one bit of the entropy, for a binary source this is not really an advantage in practice: the Shannon code spends 1.06 bits per symbol on average, whereas the obvious optimal code for a binary source simply uses one bit per symbol (i.e. 0 for symbol a and 1 for symbol b). We will soon learn a method to construct a code for a given source such that it is optimal, i.e. achieving the lowest possible expected code length.

2  Average codelength bound for uniquely decodable codes

Theorem (Kraft-McMillan Inequality): For all uniquely decodable (UD) codes:

    \sum_{u \in U} 2^{-l(u)} \le 1                                                      (10)

Conversely, any integer-valued function satisfying this inequality is the length function of some UD code.

Note: The converse has already been shown to be true in earlier discussions, because if \sum_u 2^{-l(u)} \le 1 holds then we can construct a Shannon prefix code for the dyadic source p(u) = 2^{-l(u)}. Thus, this theorem implies that any set of lengths achievable by a uniquely decodable code can also be achieved by a prefix code. This statement is even stronger than the statement that prefix codes can achieve the same expected length as uniquely decodable codes. It justifies our focus on the class of prefix codes.

We will now prove the first claim of the Kraft-McMillan Inequality.

Proof: Take any uniquely decodable code and let l_{max} = \max_u l(u). Fix any integer k and observe:

    \left( \sum_{u} 2^{-l(u)} \right)^k
        = \left( \sum_{u_1} 2^{-l(u_1)} \right) \left( \sum_{u_2} 2^{-l(u_2)} \right) \cdots \left( \sum_{u_k} 2^{-l(u_k)} \right)   (11)
        = \sum_{u_1} \sum_{u_2} \cdots \sum_{u_k} 2^{-\sum_{i=1}^{k} l(u_i)}                                                         (12)
        = \sum_{u_1, \ldots, u_k} 2^{-\sum_{i=1}^{k} l(u_i)}                                                                         (13)
        = \sum_{u^k} 2^{-l(u^k)}                                                                                                     (14)
        = \sum_{i=1}^{k \, l_{max}} \left| \{ u^k : l(u^k) = i \} \right| \, 2^{-i}                                                  (15)

where u^k = (u_1, \ldots, u_k) and l(u^k) = \sum_{i=1}^{k} l(u_i) is the length of the codeword assigned to the sequence u^k. Since the code is uniquely decodable, the extended code is one-to-one, so there are at most 2^i sequences u^k whose codewords have length i. This is the one place where we use the assumption that the code is uniquely decodable. So we get the upper bound:

    \left( \sum_{u} 2^{-l(u)} \right)^k \le \sum_{i=1}^{k \, l_{max}} 2^{i} \, 2^{-i} = k \, l_{max}                                 (16)

Notice that the exponential term \left( \sum_u 2^{-l(u)} \right)^k is upper-bounded by k \, l_{max}, which is only linear in k. From this we arrive at the theorem's claim (note that the inequality holds for all k > 0):

    \sum_{u} 2^{-l(u)} \le \lim_{k \to \infty} (k \, l_{max})^{1/k} = 1                                                              (17)
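The converse direction is constructive: given integer lengths whose Kraft sum is at most 1, codewords can be assigned greedily from shortest to longest. The following Python sketch (our illustration, not part of the original notes) checks the Kraft sum and builds such a prefix code; applied to the dyadic lengths of Section 1 it recovers a code equivalent to the tree shown there.

    def kraft_sum(lengths):
        """Kraft sum: sum over symbols of 2^(-l(u))."""
        return sum(2.0 ** -l for l in lengths.values())

    def prefix_code_from_lengths(lengths):
        """Greedily assign codewords, shortest lengths first.
        This canonical construction succeeds whenever the Kraft sum is <= 1."""
        assert kraft_sum(lengths) <= 1.0, "lengths violate the Kraft-McMillan inequality"
        code, next_val, prev_len = {}, 0, 0
        for u, l in sorted(lengths.items(), key=lambda kv: kv[1]):
            next_val <<= (l - prev_len)            # extend the running value to length l
            code[u] = format(next_val, "0{}b".format(l))
            next_val += 1                          # skip past the subtree below this codeword
            prev_len = l
        return code

    # Codeword lengths of the dyadic example in Section 1
    lengths = {"a": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "h": 7, "b": 7}
    print(kraft_sum(lengths))                 # 1.0
    print(prefix_code_from_lengths(lengths))  # a -> '0', c -> '10', ..., h -> '1111110', b -> '1111111'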

This leads us to an important conclusion relating the length of UD codes and the entropy.

Theorem: For all UD codes:

    \bar{l} \ge H(U)                                                                    (18)

Proof: Let q(u) = c \, 2^{-l(u)} be a P.M.F., where the normalizing constant c is chosen such that the probabilities sum to 1. By the Kraft-McMillan Inequality, we know that c \ge 1, since

    1 = c \sum_{u} 2^{-l(u)} \;\Rightarrow\; c = \frac{1}{\sum_{u} 2^{-l(u)}} \ge 1     (19)

Next, consider the relative entropy between the source distribution p and q, taking advantage of the non-negativity of relative entropy:

    0 \le D(p \| q)                                                                     (20)
       = \sum_{u} p(u) \log \frac{p(u)}{q(u)}                                           (21)
       = \sum_{u} p(u) \log p(u) + \sum_{u} p(u) \log \frac{1}{q(u)}                    (22)
       = -H(U) + \sum_{u} p(u) \, [\, l(u) - \log c \,]                                 (23)
       \le -H(U) + \bar{l}                                                              (24)

where the last step uses \log c \ge 0. Rearranging gives \bar{l} \ge H(U).

Note: Beyond its role in this proof, the relative entropy has significance in the context of compression. Suppose we have length function l(u) \approx \log \frac{1}{q(u)} for some P.M.F. q. This code will be near optimal if the source distribution is q. However, if the true source distribution is p(u), then

    \bar{l} - H(U) \approx E\left[ \log \frac{1}{q(U)} \right] - E\left[ \log \frac{1}{p(U)} \right]   (25)
                  = \sum_{u} p(u) \log \frac{p(u)}{q(u)}                                               (26)
                  = D(p \| q)                                                                          (27)

The relative entropy can thus be thought of as the cost of mismatch in lossless compression, i.e. the expected number of excess bits expended due to optimizing the code for distribution q when the true distribution is p.
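As a quick numerical illustration of this mismatch interpretation (our sketch, with a hypothetical design distribution q and the integer constraint on lengths ignored), the following code compares the expected length of an idealized code with l(u) = log(1/q(u)) against H(U), and checks that the gap equals D(p||q).

    import math

    def entropy(p):
        """H(U) = -sum_u p(u) log2 p(u)."""
        return -sum(pu * math.log2(pu) for pu in p.values() if pu > 0)

    def kl_divergence(p, q):
        """Relative entropy D(p||q) = sum_u p(u) log2(p(u)/q(u))."""
        return sum(pu * math.log2(pu / q[u]) for u, pu in p.items() if pu > 0)

    # True source p and mismatched design distribution q (illustrative numbers)
    p = {"a": 0.99, "b": 0.01}
    q = {"a": 0.50, "b": 0.50}

    # Idealized codeword lengths l(u) = log2(1/q(u)), ignoring integer rounding
    expected_len = sum(pu * math.log2(1.0 / q[u]) for u, pu in p.items())

    print(expected_len - entropy(p))  # expected excess bits, about 0.92 here
    print(kl_divergence(p, q))        # D(p||q): the same value, up to float rounding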

3  Huffman coding

We earlier looked at the Shannon code, which is a pretty good construction of a prefix code for a given distribution. However, the best prefix code for a general source distribution is the Huffman code. Our construction will be exactly like that for dyadic sources: we recursively merge the two symbols with the smallest probabilities. To find the code c(u), we follow these steps:

1. Find the 2 symbols with the smallest probabilities and merge them to create a new node, which we treat as a new symbol.
2. Then merge the next 2 symbols with the smallest probabilities to create a new node.
3. Repeat steps 1 and 2 until there is only one symbol left.

At the end of this process, we obtain a binary tree. The paths traversed from the root to the leaves give the codewords of the prefix code. Recall that this is a prefix code since we only assign source symbols to the leaf nodes. Also note that the Huffman code is not necessarily unique, since it depends on the order of the merge operations.

Aside (source arity nomenclature): binary (2 symbols), ternary (3 symbols), quaternary (4 symbols), quinary (5 symbols), senary (6 symbols), septenary (7 symbols), octal (8 symbols), nonary (9 symbols), decimal (10 symbols), ...

Example: Senary (6-symbol) source:

    u    p(u)    c(u)    l(u)
    a    0.25    10      2
    b    0.25    00      2
    c    0.2     01      2
    d    0.15    110     3
    e    0.1     1110    4
    f    0.05    1111    4

The corresponding merge tree, with the probabilities shown in parentheses:

    abcdef (1.0)
        cb (0.45)
            c (0.2)
            b (0.25)
        adef (0.55)
            a (0.25)
            def (0.3)
                d (0.15)
                ef (0.15)
                    e (0.1)
                    f (0.05)
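The merge procedure above translates directly into code. Below is a minimal Python sketch of the algorithm using a heap (our illustration of the procedure described in the notes, not the scribes' code). Applied to the senary source, it reproduces the codeword lengths in the table; the exact bits may differ because ties and 0/1 branch labels can be chosen differently.

    import heapq
    from itertools import count

    def huffman_code(p):
        """Build a binary Huffman code for distribution p by repeatedly merging
        the two least probable nodes until only the root remains."""
        tiebreak = count()  # avoids comparing dicts when probabilities are equal
        heap = [(prob, next(tiebreak), {sym: ""}) for sym, prob in p.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, code1 = heapq.heappop(heap)  # smallest probability
            p2, _, code2 = heapq.heappop(heap)  # second smallest
            # Prepend a branch bit: one subtree gets '0', the other gets '1'
            merged = {s: "0" + c for s, c in code1.items()}
            merged.update({s: "1" + c for s, c in code2.items()})
            heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
        return heap[0][2]

    # Senary source from the example
    p = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.1, "f": 0.05}
    code = huffman_code(p)
    avg = sum(p[s] * len(code[s]) for s in p)
    print(code)  # codewords may differ from the table, but the lengths agree
    print(avg)   # expected length 2.45 bits/symbol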

Theorem: The Huffman code is an optimal prefix code.

Proof: Assume without loss of generality that U \sim p, U = \{1, 2, \ldots, r\}, and p(1) \ge p(2) \ge \ldots \ge p(r). Note that we are just ordering the probabilities; if the probabilities are not ordered, we can simply relabel the elements so that they are. Let V denote the random variable with alphabet \{1, 2, \ldots, r-1\} obtained from U by merging symbols r-1 and r. Let \{c(i)\}_{i=1}^{r-1} be a prefix code for V. We can obtain a prefix code \{\tilde{c}(i)\}_{i=1}^{r} for U by splitting the last codeword; \tilde{c} is a prefix code because c is a prefix code:

    \tilde{c}(i) = c(i)                                i = 1, 2, \ldots, r-2
    \tilde{c}(i) = c(r-1) \, 0   (concatenate a 0)     i = r-1
    \tilde{c}(i) = c(r-1) \, 1   (concatenate a 1)     i = r

Note: If l(\cdot) is the length function of \{c(i)\}_{i=1}^{r-1} and l_{split}(\cdot) is the length function of \{\tilde{c}(i)\}_{i=1}^{r}, then

    E[l_{split}(U)] = E[l(V)] + p(r-1) + p(r)

Proof of Note:

    E[l_{split}(U)] = \sum_{u=1}^{r} p(u) \, l_{split}(u)
                    = \sum_{u=1}^{r-2} p(u) \, l_{split}(u) + p(r-1) \, l_{split}(r-1) + p(r) \, l_{split}(r)
                    = \sum_{u=1}^{r-2} p(u) \, l(u) + p(r-1) \, (l(r-1) + 1) + p(r) \, (l(r-1) + 1)
                    = \sum_{v=1}^{r-1} p_V(v) \, l(v) + p(r-1) + p(r)
                    = E[l(V)] + p(r-1) + p(r)

Note that in the third step, for symbols r-1 and r we add one bit to the codeword while for u \le r-2 the length function is unchanged, and in the fourth step we use p_V(r-1) = p(r-1) + p(r). Thus, we have proven that E[l_{split}(U)] = E[l(V)] + p(r-1) + p(r).

Observation: The Huffman code for U can be obtained from the Huffman code for V by splitting.

The proof will be completed in the next lecture.
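To see the splitting step concretely, here is a small sketch (our illustration under the setup above) that splits the last codeword of a prefix code for V and checks the identity E[l_{split}(U)] = E[l(V)] + p(r-1) + p(r) on the senary example, where V is obtained by merging e and f.

    def expected_length(p, code):
        """E[l] = sum_u p(u) * len(c(u))."""
        return sum(p[u] * len(code[u]) for u in p)

    # Source U ordered so that the two least likely symbols, e and f, come last
    p_U = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.1, "f": 0.05}

    # V is obtained from U by merging e and f into the single symbol "ef"
    p_V = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "ef": 0.15}
    code_V = {"a": "10", "b": "00", "c": "01", "d": "110", "ef": "111"}  # a prefix code for V

    # Split the codeword of the merged symbol: append 0 for e and 1 for f
    code_U = {u: c for u, c in code_V.items() if u != "ef"}
    code_U["e"] = code_V["ef"] + "0"
    code_U["f"] = code_V["ef"] + "1"

    lhs = expected_length(p_U, code_U)                        # E[l_split(U)]
    rhs = expected_length(p_V, code_V) + p_U["e"] + p_U["f"]  # E[l(V)] + p(r-1) + p(r)
    print(lhs, rhs)  # both 2.45, up to floating-point rounding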