Lecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code

© Himanshu Tyagi. Feel free to use with acknowledgement.

Agenda for the lecture:
- Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code
- Variable-length source codes with error

16.1 Error-free coding schemes

16.1.1 The Shannon-Fano-Elias code

Our next coding scheme is Elias's twist on the Shannon-Fano code of the previous class, which gets rid of the requirement to sort the probabilities. However, this algorithmic efficiency comes at the cost of 1 bit: we now need $H(X) + 2$ bits on average instead of $H(X) + 1$ as before.

Input: Source distribution $P = (p_1, \dots, p_m)$
Output: Code $C = \{c_1, \dots, c_m\}$

1. for $i = 1, \dots, m$:

   (i) Let $l(i) = \lceil \log \frac{1}{p_i} \rceil + 1$.

   (ii) Compute $F_i = \sum_{j < i} p_j + \frac{p_i}{2}$ (in particular, $F_1 = p_1/2$). Let $c$ denote the infinite sequence corresponding to the binary representation of $F_i$; if $F_i$ has a terminating binary representation, append 0s at the end to make it an infinite sequence.

   (iii) The codeword $c_i$ is given by the first $l(i)$ bits of $c$, i.e., by the approximation of $c$ to $l(i)$ bits.

For illustration, consider the example from the last class once more:

Alphabet   P_X   F_i in binary   l(i)   codeword
a          1/8   0.0001          4      0001
b          1/2   0.011           2      01
c          1/4   0.11            3      110
d          1/8   0.1111          4      1111

Table 1: An illustration of the Shannon-Fano-Elias code.

From the length assignments it is clear that the average length of this code is less than $H(X) + 2$. It only remains to verify that this code is prefix-free.

Theorem 16.1. A Shannon-Fano-Elias code is prefix-free.

Proof. Consider $F_i$ and $F_j$ such that $i < j$. Then, as noted in the analysis of Shannon-Fano codes, the codeword $c_j$ satisfies
$$0.c_j \le F_j \le 0.c_j + 2^{-l(j)} \le 0.c_j + 2^{\log p_j - 1} = 0.c_j + \frac{p_j}{2}.$$
Thus,
$$0.c_i \le F_i \le F_j - \frac{p_i + p_j}{2} \le 0.c_j - \frac{p_i}{2}.$$
In particular, $0.c_j > 0.c_i$, and so $c_j$ cannot be a prefix of $c_i$. On the other hand, since $2^{-l(i)} \le p_i/2$, the same bound gives $0.c_j \ge 0.c_i + 2^{-l(i)}$. Note that since both $c_i$ and $c_j$ are of finite lengths and $c_i$ is of length $l(i)$, $c_i$ can be a prefix of $c_j$ only if $0.c_j < 0.c_i + 2^{-l(i)}$, which does not hold.
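To make the construction concrete, the following is a minimal Python sketch of the encoding procedure above, applied to the source of Table 1. The function name sfe_code, the bit-extraction loop, and the use of floating-point arithmetic are illustrative choices (adequate only for small examples) and are not part of the notes.

```python
import math

def sfe_code(probs):
    """Shannon-Fano-Elias code for probs, a list of (symbol, p_i) pairs
    given in a fixed (not necessarily sorted) order."""
    code = {}
    cum = 0.0                                # running sum of p_j for j < i
    for symbol, p in probs:
        F = cum + p / 2                      # F_i = sum_{j<i} p_j + p_i/2
        l = math.ceil(math.log2(1 / p)) + 1  # l(i) = ceil(log 1/p_i) + 1
        bits, frac = "", F                   # first l(i) bits of F_i in binary
        for _ in range(l):
            frac *= 2
            bit = int(frac)
            bits += str(bit)
            frac -= bit
        code[symbol] = bits
        cum += p
    return code

# The source of Table 1:
print(sfe_code([("a", 1/8), ("b", 1/2), ("c", 1/4), ("d", 1/8)]))
# -> {'a': '0001', 'b': '01', 'c': '110', 'd': '1111'}
```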

16.1.2 Huffman code

We next present the Huffman code, which still requires us to sort the probabilities, but has average length exactly equal to $L_p(X)$, i.e., it is of optimal average length. Unlike the two coding schemes earlier, we now represent the prefix-free code by a binary tree.

Input: Source distribution $P = (p_1, \dots, p_m)$
Output: Code $C = \{c_1, \dots, c_m\}$

1. Associate with each symbol a leaf node.

2. while $m \ge 2$, do

   (i) If $m = 2$, assign the two symbols as the left and the right child of the root. Update $m$ to 1.

   (ii) Sort the probabilities of the symbols in descending order; let $p_1 \ge p_2 \ge \dots \ge p_m$ be the sorted sequence of probabilities.

   (iii) If $m > 2$, assign the $(m-1)$th and the $m$th symbols as the left and the right child of a new node. Treat this node and its subtree as a new symbol which occurs with probability $p_{m-1} + p_m$ and associate a leaf with it. Update $m \leftarrow m - 1$ and $P \leftarrow (p_1, \dots, p_{m-2}, p_{m-1} + p_m)$.

3. Generate a binary code from the tree by putting a 0 over each edge to a left child and a 1 over each edge to a right child.

For illustration, consider our foregoing example once more. The algorithm proceeds as follows (the reader can easily decode our representation of the evolving tree):

((a, 0.125); (b, 0.5); (c, 0.25); (d, 0.125))
((b, 0.5); (c, 0.25); (ad, 0.25))
((b, 0.5); (c(ad), 0.5))
(b(c(ad)), 1).

In summary, we have the following code:

Alphabet   P_X   codeword
a          1/8   110
b          1/2   0
c          1/4   10
d          1/8   111

Table 2: An illustration of Huffman code.

The average length for this example is 7/4, which is equal to the entropy. Therefore, the average length must be optimal. In fact, the Huffman code always attains the optimal average length.
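As a concrete counterpart, here is a minimal Python sketch of the Huffman procedure, using a binary heap in place of repeated sorting; the function name huffman_code and the nested-tuple representation of the tree are illustrative choices, not from the notes. For the source above it reproduces the codewords of Table 2 (when probabilities tie, the 0/1 labels assigned within a merge may differ).

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Huffman code for probs = {symbol: p}: repeatedly merge the two least
    likely nodes, then read codewords off the resulting binary tree."""
    tiebreak = count()  # unique counter so tied probabilities never compare trees
    heap = [(p, next(tiebreak), symbol) for symbol, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # least likely node
        p2, _, right = heapq.heappop(heap)   # second least likely node
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    _, _, tree = heap[0]                     # nested tuples: (left, right)

    code = {}
    def assign(node, prefix):
        if isinstance(node, tuple):          # internal node: 0 to the left, 1 to the right
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                # leaf: record the codeword
            code[node] = prefix or "0"       # degenerate single-symbol source
    assign(tree, "")
    return code

# The source of Table 2:
print(huffman_code({"a": 1/8, "b": 1/2, "c": 1/4, "d": 1/8}))
# -> {'b': '0', 'c': '10', 'a': '110', 'd': '111'}
```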

Theorem 16.2. A Huffman code has optimal average length.

We only provide a sketch of the proof. An interested reader can find the complete proof in Cover and Thomas.

Proof sketch. The proof relies on the following observation: there exists a prefix-free code of optimal average length which assigns the two symbols with the least probabilities to the two longest codewords, of length $l_{\max}$, such that the first $l_{\max} - 1$ bits of the two codewords are the same.

Next, we show the optimality of the Huffman code by induction on the number of symbols. Denote by $L(P)$ the minimum average length of a prefix-free code for $P$. Suppose that the Huffman code attains the optimal length for every probability distribution over an alphabet of size $m - 1$. For $P = (p_1, \dots, p_m)$, consider the prefix-free code $C$ of average length $L(P)$ guaranteed by the observation above. Let $L_H(P)$ denote the average length of the Huffman code. Then,

$$L((p_1, \dots, p_m)) \le L_H((p_1, \dots, p_m)) = (p_{m-1} + p_m) + L_H((p_1, \dots, p_{m-2}, p_{m-1} + p_m)) = (p_{m-1} + p_m) + L((p_1, \dots, p_{m-2}, p_{m-1} + p_m)),$$
where the last equality holds by the induction hypothesis. On the other hand, by the property of the optimal code $C$ noted above, it also yields a prefix-free code for $(p_1, \dots, p_{m-2}, p_{m-1} + p_m)$ of average length $L((p_1, \dots, p_m)) - (p_{m-1} + p_m)$. But then
$$L((p_1, \dots, p_m)) \ge (p_{m-1} + p_m) + L((p_1, \dots, p_{m-2}, p_{m-1} + p_m)).$$
Thus, by combining all the bounds above, all inequalities must hold with equality. In particular, $L((p_1, \dots, p_m)) = L_H((p_1, \dots, p_m))$.

16.2 Variable-length source codes with error

Thus far we have seen performance bounds for error-free codes and specific schemes which come close to these bounds. We now examine what we have to gain if we allow a small probability of error. For fixed-length codes, we have already addressed this question: there, the gain depends on the large-probability upper bound for the entropy density $h(X)$. However, asymptotically the gain due to error was negligible, since the optimal asymptotic rate is independent of $\epsilon$ (strong converse). In the case of variable-length codes the situation is quite different: allowing error results in significant gains when we are trying to compress a single symbol, as well as in a rate gain asymptotically. To make our exposition simple, we allow randomized encoding, where in addition to the source symbol the encoder also has access to a biased coin toss. We show that $L_\epsilon(X) \le (1 - \epsilon) L(X)$. In fact, we prove a stronger statement.

Lemma 16.3. Consider a source with pmf $P$ over a discrete alphabet $\mathcal{X}$. Given an error-free code of average length $l$ and minimum codeword length $l_{\min}$, there exists a variable-length code which allows a probability of error of at most $\epsilon$ and has average length no more than $(1 - \epsilon) l + \epsilon l_{\min}$.

Proof. Consider a code $C$ with average length $l$, and let $c_0$ be a codeword of minimum length $l_{\min}$. We define a new code where, for each symbol $x \in \mathcal{X}$, the encoder flips a coin which shows heads with probability $1 - \epsilon$ and outputs the codeword in $C$ for $x$ if the coin shows heads, or the codeword $c_0$ if it shows tails. The decoder, upon observing a codeword in $C$, simply outputs the corresponding symbol for $C$. Note that an error can occur only if the coin used in encoding showed tails. Therefore, the probability of error is less than $\epsilon$. Also, the average length of the code must now be averaged over both the coin toss and the source distribution. Given that the coin showed heads, the average length of the code is $l$; given that the coin showed tails, the average length of the code is $l_{\min}$. Therefore, the overall average length equals $(1 - \epsilon) l + \epsilon l_{\min}$.

Note that $l_{\min}$ for an optimal nonsingular code is 0, since the empty codeword can be assigned to one of the symbols. Therefore, there exists a code with probability of error less than $\epsilon$ and average length less than $(1 - \epsilon) L(X)$.
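As a sanity check on the proof, here is a small Python simulation of the randomized encoder, using the Huffman code of Table 2 as the error-free code $C$; note that this code is prefix-free, so $l_{\min} = 1$ here, unlike the nonsingular code with $l_{\min} = 0$ used in the final remark. The function name randomized_encoder, the choice $\epsilon = 0.1$, and the random seed are illustrative assumptions, not part of the notes.

```python
import random

def randomized_encoder(x, code, c0, eps, rng):
    """Encoder from the proof of Lemma 16.3: output the error-free codeword of x
    with probability 1 - eps (heads), and the short codeword c0 otherwise (tails)."""
    return code[x] if rng.random() < 1 - eps else c0

# Error-free code C: the Huffman code of Table 2; c0 is its shortest codeword.
code = {"a": "110", "b": "0", "c": "10", "d": "111"}
pmf = {"a": 1/8, "b": 1/2, "c": 1/4, "d": 1/8}
c0, eps, rng = "0", 0.1, random.Random(0)    # eps and the seed are arbitrary choices
decode = {w: x for x, w in code.items()}     # decoder: map each codeword back to its symbol

n, errors, total_len = 100_000, 0, 0
for _ in range(n):
    x = rng.choices(list(pmf), weights=pmf.values())[0]
    w = randomized_encoder(x, code, c0, eps, rng)
    total_len += len(w)
    errors += (decode[w] != x)               # an error is possible only on tails

l = sum(pmf[x] * len(code[x]) for x in pmf)  # average length of C (= 7/4 here)
l_min = len(c0)
print(total_len / n, "vs", (1 - eps) * l + eps * l_min)  # empirical vs. (1 - eps) l + eps l_min
print(errors / n, "<=", eps)                             # empirical error probability
```

With these choices the empirical average length should be close to $(1 - \epsilon) l + \epsilon l_{\min} = 1.675$ bits, and the empirical error probability stays below $\epsilon$, as the proof predicts.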