
Huffman coding

Gabriele Monfardini - Corso di Basi di Dati Multimediali, a.a. 2005-2006

Optimal codes - I

A code is optimal if it has the shortest average codeword length

$$L = \sum_{i=1}^{m} p_i l_i$$

This can be seen as an optimization problem:

$$\min \sum_{i=1}^{m} p_i l_i \quad \text{subject to} \quad \sum_{i=1}^{m} D^{-l_i} \le 1$$

Optimal codes - II

Let's make two simplifying assumptions:
- no integer constraint on the codelengths
- the Kraft inequality holds with equality

Lagrange-multiplier problem:

$$J = \sum_{i=1}^{m} p_i l_i + \lambda \left( \sum_{i=1}^{m} D^{-l_i} \right)$$

$$\frac{\partial J}{\partial l_j} = p_j - \lambda D^{-l_j} \log D = 0 \quad \Rightarrow \quad D^{-l_j} = \frac{p_j}{\lambda \log D}$$

Optimal codes - III

Substitute $D^{-l_j} = \dfrac{p_j}{\lambda \log D}$ into the Kraft inequality:

$$\sum_{i=1}^{m} \frac{p_i}{\lambda \log D} = 1 \quad \Rightarrow \quad \lambda = \frac{1}{\log D} \quad \Rightarrow \quad D^{-l_i} = p_i$$

that is

$$l_i^* = -\log_D p_i$$

Note that the resulting average length is the entropy, when we use base D for logarithms:

$$L^* = \sum_{i=1}^{m} p_i l_i^* = -\sum_{i=1}^{m} p_i \log_D p_i = H_D(X)$$

Optimal codes - IV

In practice the codeword lengths must be integer values, so the result obtained above is only a lower bound.

Theorem: The expected length of any instantaneous D-ary code for a r.v. X satisfies

$$L \ge H_D(X)$$

This fundamental result derives from the work of Shannon.

Optimal codes - V

What about the upper bound?

Theorem: Given a source alphabet (i.e. a r.v.) of entropy H(X), it is possible to find an instantaneous binary code whose length satisfies

$$H(X) \le L < H(X) + 1$$

A similar theorem can be stated if we use the wrong probabilities $\{q_i\}$ instead of the true ones $\{p_i\}$; the only difference is a term which accounts for the relative entropy.
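As a quick numerical check of these bounds (an illustrative sketch, not part of the original slides), the snippet below assigns the Shannon codeword lengths $l_i = \lceil \log_2(1/p_i) \rceil$ to an arbitrary distribution and verifies that the Kraft inequality holds and that $H(X) \le L < H(X) + 1$:

import math

def shannon_lengths(probs):
    # l_i = ceil(log2(1/p_i)): these lengths always satisfy the Kraft inequality
    return [math.ceil(math.log2(1 / p)) for p in probs]

probs = [0.35, 0.17, 0.17, 0.16, 0.15]            # any source distribution
lengths = shannon_lengths(probs)

H = -sum(p * math.log2(p) for p in probs)          # entropy in bits
L = sum(p * l for p, l in zip(probs, lengths))     # expected codeword length
kraft = sum(2.0 ** -l for l in lengths)            # must be <= 1

print(f"H = {H:.4f} bits, L = {L:.4f} bits, Kraft sum = {kraft:.4f}")
assert kraft <= 1 and H <= L < H + 1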

The redundancy

It is defined as the average codeword length minus the entropy:

$$\text{redundancy} = L + \sum_i p_i \log p_i = L - H(X)$$

Note that (why?)

$$\text{redundancy} \ge 0$$

Compression ratio

It is the ratio between the average number of bits per symbol in the original message and the same quantity for the coded message, i.e.

$$C = \frac{\langle \text{average original symbol length} \rangle}{\langle \text{average compressed symbol length} \rangle}$$

where the average compressed symbol length is L(X).

Uniquely decodable codes

The set of instantaneous codes is a small subset of the uniquely decodable codes. Is it possible to obtain a lower average codeword length L using a uniquely decodable code that is not instantaneous? NO. So we use instantaneous codes, which are easier to decode.

Summary

Average codeword length L:

$$L \ge H(X)$$

for uniquely decodable codes (and for instantaneous codes). In practice, for each r.v. with entropy H(X) we can build a code whose average codeword length satisfies

$$H(X) \le L < H(X) + 1$$

Shannon-Fano coding

The main advantage of the Shannon-Fano technique is its simplicity.
- Source symbols are listed in order of non-increasing probability.
- The list is divided in such a way as to form two groups with total probabilities as nearly equal as possible.
- Each symbol in the first group receives a 0 as the first digit of its codeword, while the others receive a 1.
- Each of these groups is then divided according to the same criterion, and additional code digits are appended.
- The process is continued until each group contains only one message.
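A minimal Python sketch of this recursive procedure (illustrative only, not from the original slides; the split point is chosen to minimize the difference between the two groups' total probabilities):

def shannon_fano(symbols):
    # symbols: list of (symbol, probability) sorted by non-increasing probability
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        acc, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):        # find the most balanced split point
            acc += group[i - 1][1]
            diff = abs(2 * acc - total)       # |first-group prob - second-group prob|
            if diff < best_diff:
                best_diff, best_i = diff, i
        first, second = group[:best_i], group[best_i:]
        for s, _ in first:
            codes[s] += "0"                   # first group gets a 0
        for s, _ in second:
            codes[s] += "1"                   # second group gets a 1
        split(first)
        split(second)

    split(symbols)
    return codes

src = [("a", 1/2), ("b", 1/4), ("c", 1/8), ("d", 1/16), ("e", 1/32), ("f", 1/32)]
print(shannon_fano(src))   # a:0  b:10  c:110  d:1110  e:11110  f:11111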

Shannon-Fano coding - example

Symbol  Prob.  Codeword
a       1/2    0
b       1/4    10
c       1/8    110
d       1/16   1110
e       1/32   11110
f       1/32   11111

H = 1.9375 bits
L = 1.9375 bits
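For this dyadic source every probability is a negative power of 2, so each codeword length equals $\log_2(1/p_i)$ and the average length matches the entropy exactly:

$$H(X) = L = \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{16}\cdot 4 + \tfrac{1}{32}\cdot 5 + \tfrac{1}{32}\cdot 5 = 1.9375 \text{ bits}$$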

Shannon-Fano coding - exercise

Symb.   Prob.
*       2%
?       5%
!       3%
&       2%
$       29%
        3%
        0%
        6%
@       0%

Encode using the Shannon-Fano algorithm.

Is Shannon-Fano coding optimal?

Symbol  Prob.  Shannon-Fano  Huffman
a       0.35   00            0
b       0.17   01            100
c       0.17   10            101
d       0.16   110           110
e       0.15   111           111

H = 2.2328 bits
L(Shannon-Fano) = 2.31 bits
L(Huffman) = 2.30 bits

Huffman coding - I

There is another algorithm whose performance is slightly better than Shannon-Fano's: the famous Huffman coding.

It works by constructing a tree bottom-up, with the symbols at the leaves. The two leaves with the smallest probabilities become siblings under a parent node, whose probability is equal to the sum of the two children's probabilities.

Huffman coding - II

The operation is then repeated, considering also the new parent node and ignoring its children. The process continues until there is only one parent node, with probability 1, which is the root of the tree.

Then the two branches of every non-leaf node are labeled 0 and 1 (typically, 0 on the left branch, but the order is not important).
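The construction can be sketched in a few lines of Python (illustrative, not from the original slides; a heap repeatedly extracts the two least probable nodes, and ties may be broken differently than in the slides, giving different but equally optimal codewords):

import heapq
from itertools import count

def huffman(probabilities):
    # probabilities: dict symbol -> probability; returns dict symbol -> codeword
    tiebreak = count()                          # keeps heap entries comparable on ties
    heap = [(p, next(tiebreak), s) for s, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)       # smallest probability
        p2, _, right = heapq.heappop(heap)      # second smallest
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))  # new parent node
    codes = {}
    def assign(node, prefix=""):
        if isinstance(node, tuple):             # internal node: label branches 0 and 1
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                   # leaf: record the codeword
            codes[node] = prefix or "0"
    assign(heap[0][2])
    return codes

print(huffman({"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}))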

Huffman coding - example

Symbol  Prob.
a       0.05
b       0.05
c       0.1
d       0.2
e       0.3
f       0.2
g       0.1

[Huffman tree: a+b = 0.1, (a+b)+c = 0.2, f+g = 0.3, 0.2+d = 0.4, 0.3+e = 0.6, 0.4+0.6 = 1.0 (root).]

Huffman coding - example

Symbol  Prob.  Codeword
a       0.05   0000
b       0.05   0001
c       0.1    001
d       0.2    01
e       0.3    10
f       0.2    110
g       0.1    111

Exercise: evaluate H(X) and L(X).

H(X) = 2.5464 bits
L(X) = 2.6 bits
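The two requested quantities follow directly from the table:

$$H(X) = -\left(2 \cdot 0.05\log_2 0.05 + 2 \cdot 0.1\log_2 0.1 + 2 \cdot 0.2\log_2 0.2 + 0.3\log_2 0.3\right) \approx 2.5464 \text{ bits}$$

$$L(X) = 2 \cdot 0.05 \cdot 4 + 0.1 \cdot 3 + 0.2 \cdot 2 + 0.3 \cdot 2 + 0.2 \cdot 3 + 0.1 \cdot 3 = 2.6 \text{ bits}$$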

Huffman coding - exercise

Symbol  Prob.  Codeword
a       0.05   0000
b       0.05   0001
c       0.1    001
d       0.2    01
e       0.3    10
f       0.2    110
g       0.1    111

Code the sequence aeebcddegfced and calculate the compression ratio.

Sol: 0000 10 10 0001 001 01 01 10 111 110 001 10 01
Average original symbol length = 3 bits
Average compressed symbol length = 34/13 bits
C = 3 / (34/13) = 39/34 ≈ 1.15

Huffman coding - exercise

Symbol  Prob.  Codeword
a       0.05   0000
b       0.05   0001
c       0.1    001
d       0.2    01
e       0.3    10
f       0.2    110
g       0.1    111

Decode the sequence 0111001001000001111110

Sol: dfdcadgf
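Both exercises can be checked with a short illustrative sketch that encodes and decodes with the table above (simple prefix matching is enough for a code this small):

CODE = {"a": "0000", "b": "0001", "c": "001", "d": "01",
        "e": "10", "f": "110", "g": "111"}
DECODE = {w: s for s, w in CODE.items()}

def encode(text):
    return "".join(CODE[ch] for ch in text)

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:          # prefix code: the first match is a whole codeword
            out.append(DECODE[buf])
            buf = ""
    return "".join(out)

bits = encode("aeebcddegfced")
print(len(bits))                           # 34 bits, i.e. 34/13 bits per symbol
print(decode("0111001001000001111110"))    # -> dfdcadgf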

Huffman coding - exercise

Symb.  Prob.
a      0.10
b      0.03
c      0.14
0      0.40
1      0.22
2      0.04
$      0.07

Encode with Huffman the sequence 0$cc0a02ba0 and evaluate entropy, average codeword length and compression ratio.

Huffman coding - exercise

Symb.  Prob.
0      0.16
1      0.02
2      0.15
3      0.29
4      0.17
5      0.04
%      0.17

Decode (if possible) the Huffman-coded bit stream 000000000...

Huffman coding - notes

In Huffman coding, if at any time there is more than one way to choose the smallest pair of probabilities, any such pair may be chosen.

Sometimes the list of probabilities is initialized to be non-increasing and reordered after each node creation. This detail doesn't affect the correctness of the algorithm, but it allows a more efficient implementation.

Huffman coding - notes

There are cases in which Huffman coding does not uniquely determine the codeword lengths, because of the arbitrary choice among equal minimum probabilities. For example, for a source with probabilities {0.4, 0.2, 0.2, 0.1, 0.1} it is possible to obtain codeword lengths {1, 2, 3, 4, 4} as well as {2, 2, 2, 3, 3}.

It would be better to have a code whose codelengths have minimum variance, as this solution needs the minimum buffer space in the transmitter and in the receiver.
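Both length assignments indeed give the same (optimal) average length; only the variance of the lengths differs:

$$L_1 = 0.4 \cdot 1 + 0.2 \cdot 2 + 0.2 \cdot 3 + 0.1 \cdot 4 + 0.1 \cdot 4 = 2.2 \text{ bits}$$
$$L_2 = 0.4 \cdot 2 + 0.2 \cdot 2 + 0.2 \cdot 2 + 0.1 \cdot 3 + 0.1 \cdot 3 = 2.2 \text{ bits}$$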

Huffman coding - notes

Schwarz defines a variant of the Huffman algorithm that allows one to build the code with minimum l_max. There are several other variants; we will explain the most important ones in a while.

Optimality of Huffman coding - I

It is possible to prove that, in the case of character coding (one symbol, one codeword), Huffman coding is optimal. In other terms, the Huffman code has minimum redundancy.

An upper bound for the redundancy has been found:

$$\text{redundancy} \le p_1 + 1 - \log_2 e + \log_2 \log_2 e \approx p_1 + 0.086$$

where $p_1$ is the probability of the most likely symbol.

Optimality of Huffman coding - II

Why does the Huffman code suffer when there is one symbol with very high probability? Remember the notion of uncertainty: as $p(x) \to 1$, $-\log p(x) \to 0$.

The main problem is the integer constraint on the codelengths! This consideration opens the way to a more powerful coding technique... we will see it later.

Huffman coding - implementation

A Huffman code can be generated in O(n) time, where n is the number of source symbols, provided that the probabilities have been presorted (this sort, however, costs O(n log n)...).

Nevertheless, encoding is very fast.

Huffman coding - implementation

However, the space and time complexity of the decoding phase is far more important, because, on average, decoding will happen much more frequently.

Consider a Huffman tree with n symbols:
- it has n leaves and n-1 internal nodes
- each leaf stores a pointer to a symbol and the information that it is a leaf
- each internal node stores two pointers

2n + 2(n-1) ≈ 4n words (of 32 bits)

Huffman coding - implementation

1 million symbols → 16 MB of memory!

Moreover, traversing the tree from the root to a leaf involves following a lot of pointers, with little locality of reference. This causes several page faults or cache misses.

To solve this problem a variant of Huffman coding has been proposed: canonical Huffman coding.

canonical Huffman coding - I

Symb.  Prob.
a      0.11
b      0.12
c      0.13
d      0.14
e      0.24
f      0.26

[Huffman tree: a+b = 0.23, c+d = 0.27, 0.23+e = 0.47, 0.27+f = 0.53, 0.47+0.53 = 1.0 (root), giving codeword lengths 3, 3, 3, 3, 2, 2. The slide compares three codes (Code 1, Code 2, Code 3) with these lengths; Code 3 is shown on the next slide.]

canonical Huffman coding - II

Symb.  Code 3
a      000
b      001
c      010
d      011
e      10
f      11

This code cannot be obtained through a Huffman tree! We still call it a Huffman code because it is instantaneous and its codeword lengths are the same as those of a valid Huffman code.

Numerical sequence property:
- codewords with the same length are ordered lexicographically
- when the codewords are sorted in lexical order they are also in order from the longest to the shortest codeword

canonical Huffman coding - III

The main advantage is that it is not necessary to store a tree in order to decode.

We only need:
- a list of the symbols, ordered according to the lexical order of the codewords
- an array with the first codeword of each distinct length

canonical Huffman coding - IV

Encoding. Suppose there are n distinct symbols and that for symbol i we have calculated the Huffman codelength l_i, with 1 <= l_i <= maxlength.

- numl[k] = number of codewords with length k
- firstcode[k] = integer for the first code of length k
- nextcode[k] = integer for the next codeword of length k to be assigned
- symbol[-,-] = table used for decoding
- codeword[i] = the l_i rightmost bits of this integer are the code for symbol i

for k = 1 to maxlength { numl[k] = 0 }
for i = 1 to n { numl[l_i] = numl[l_i] + 1 }
firstcode[maxlength] = 0
for k = maxlength - 1 downto 1 { firstcode[k] = ⌈(firstcode[k+1] + numl[k+1]) / 2⌉ }
for k = 1 to maxlength { nextcode[k] = firstcode[k] }
for i = 1 to n {
    codeword[i] = nextcode[l_i]
    symbol[l_i, nextcode[l_i] - firstcode[l_i]] = i
    nextcode[l_i] = nextcode[l_i] + 1
}
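A direct Python translation of these arrays (an illustrative sketch, assuming the codeword lengths l_i have already been obtained, e.g. from a Huffman construction):

from math import ceil

def canonical_code(lengths):
    # lengths: dict symbol -> codeword length; returns (codeword, firstcode, symbol)
    maxlength = max(lengths.values())
    numl = [0] * (maxlength + 2)                    # numl[k]: number of codewords of length k
    for l in lengths.values():
        numl[l] += 1
    firstcode = [0] * (maxlength + 2)               # firstcode[maxlength] = 0
    for k in range(maxlength - 1, 0, -1):
        firstcode[k] = ceil((firstcode[k + 1] + numl[k + 1]) / 2)
    nextcode = firstcode[:]                         # next integer code to assign per length
    codeword, symbol = {}, {}
    for s, l in lengths.items():                    # symbols in their original order
        codeword[s] = format(nextcode[l], f"0{l}b") # the l rightmost bits of the integer
        symbol[(l, nextcode[l] - firstcode[l])] = s # decoding table symbol[length, offset]
        nextcode[l] += 1
    return codeword, firstcode, symbol

lengths = {"a": 2, "b": 5, "c": 5, "d": 3, "e": 2, "f": 5, "g": 5, "h": 2}
codes, firstcode, symtab = canonical_code(lengths)
print(codes)           # a:01  b:00000  c:00001  d:001  e:10  f:00010  g:00011  h:11
print(firstcode[1:6])  # [2, 1, 1, 2, 0]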

canonical Huffman - example

Symb.  l_i  code (integer)  codeword (bits)
a      2    1               01
b      5    0               00000
c      5    1               00001
d      3    1               001
e      2    2               10
f      5    2               00010
g      5    3               00011
h      2    3               11

1. Evaluate the array numl:       numl = [0 3 1 0 4]
2. Evaluate the array firstcode:  firstcode = [2 1 1 2 0]
3. Construct the arrays codeword and symbol:

symbol      0  1  2  3
length 1    -  -  -  -
length 2    a  e  h  -
length 3    d  -  -  -
length 4    -  -  -  -
length 5    b  c  f  g

canonical Huffman coding - V

Decoding. We have the arrays firstcode and symbol.

- nextinputbit() = function that returns the next input bit
- firstcode[k] = integer for the first code of length k
- symbol[k, n] = returns the symbol number n with codelength k

v = nextinputbit()
k = 1
while v < firstcode[k] {
    v = 2*v + nextinputbit()
    k = k + 1
}
return symbol[k, v - firstcode[k]]

canonical Huffman - example

Decode the bit stream 00111100000001001, using firstcode = [2 1 1 2 0] and the symbol table of the previous example:

001    → symbol[3,0] = d
11     → symbol[2,2] = h
10     → symbol[2,1] = e
00000  → symbol[5,0] = b
01     → symbol[2,0] = a
001    → symbol[3,0] = d

Decoded: dhebad
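The same trace can be reproduced with a self-contained illustrative sketch of the decoding loop, using the arrays computed in the example above:

FIRSTCODE = {1: 2, 2: 1, 3: 1, 4: 2, 5: 0}                 # firstcode[k] from the example
SYMBOL = {(2, 0): "a", (2, 1): "e", (2, 2): "h", (3, 0): "d",
          (5, 0): "b", (5, 1): "c", (5, 2): "f", (5, 3): "g"}

def decode(bits):
    # bits: a string of '0'/'1' characters
    it = iter(bits)
    out = []
    try:
        while True:
            v = int(next(it))                # first bit of the next codeword
            k = 1
            while v < FIRSTCODE[k]:          # v is not yet a valid length-k code
                v = 2 * v + int(next(it))    # shift in one more bit
                k += 1
            out.append(SYMBOL[(k, v - FIRSTCODE[k])])
    except StopIteration:                    # input exhausted
        return "".join(out)

print(decode("00111100000001001"))   # -> dhebad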