Huffman Coding. C.M. Liu, Perceptual Lab, College of Computer Science, National Chiao-Tung University


Huffman Coding. C.M. Liu, Perceptual Lab, College of Computer Science, National Chiao-Tung University. http://www.csie.nctu.edu.tw/~cmliu/courses/compression/ Office: EC538, (03)573877, cmliu@cs.nctu.edu.tw

Overview: the Huffman coding algorithm and the procedure to build Huffman codes; extended Huffman codes; adaptive Huffman coding (update procedure, decoding procedure); Golomb codes.

Shannon-Fano Coding. The first code based on Shannon's theory; suboptimal (it took a graduate student to fix it!). Algorithm: Start with empty codes. Compute frequency statistics for all symbols. Order the symbols in the set by frequency. Split the set into two subsets so as to minimize the difference of their total frequencies. Append 0 to the codes in the first subset and 1 to the codes in the rest. Recursively assign the remaining code bits within the two subsets, until the sets cannot be split further.
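A minimal Python sketch of this recursion (not from the slides); the demo frequencies are a common textbook example, not the ones used on the next slides:

```python
# A minimal Shannon-Fano sketch following the algorithm above.

def shannon_fano(freqs):
    """freqs: list of (symbol, frequency) -> {symbol: code}."""
    group = sorted(freqs, key=lambda sf: sf[1], reverse=True)
    codes = {s: "" for s, _ in group}

    def split(group):
        if len(group) < 2:
            return
        total = sum(f for _, f in group)
        run, cut, best = 0, 1, total
        for i in range(1, len(group)):
            run += group[i - 1][1]
            diff = abs(total - 2 * run)     # |first half - second half|
            if diff < best:
                best, cut = diff, i
        first, rest = group[:cut], group[cut:]
        for s, _ in first:
            codes[s] += "0"
        for s, _ in rest:
            codes[s] += "1"
        split(first)
        split(rest)

    split(group)
    return codes

print(shannon_fano([("a", 15), ("b", 7), ("c", 6), ("d", 6), ("e", 5)]))
# {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
```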

Shannon-Fano Coding, worked example: a sequence of tree diagrams for six symbols a-f, sorted by frequency. Each step splits the current set into two subsets of nearly equal total frequency, labels one branch 0 and the other 1, and recurses, so every symbol ends in its own leaf with a complete codeword.

Shannon-Fano Coding: Remarks. Shannon-Fano does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example where its expected length is longer than necessary. Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected codeword length, under the constraint that each symbol is represented by an integral number of bits. Symbol-by-symbol Huffman coding reaches the entropy exactly only if the symbol probabilities are independent and each is a power of a half, i.e. 1/2^n.

Optimum Prefix Codes. Key observations on optimal codes: 1. Symbols that occur more frequently have shorter codewords. 2. The two least frequent symbols have codewords of the same length. Proofs: 1. Assume the opposite; swapping the two codewords shortens the average length, so the code is clearly sub-optimal. 2. Assume the opposite. Let x, y be the least frequent symbols with |code(x)| = k and |code(y)| = k+1. By unique decodability (UD), code(x) cannot be a prefix of code(y); all other codewords are shorter still, so none of them is a prefix of code(y) either. Dropping the last bit of code(y) would therefore generate a new, shorter, uniquely decodable code, which contradicts the optimality assumption!

Huffman Coding. David Huffman (1952), grad student of Robert M. Fano (MIT); invented for a term paper(!). Explained by example, with letters and probabilities: a 0.2, b 0.4, c 0.2, d 0.1, e 0.1.

Huffman Coding by Example. Init: create a set out of each letter: {a} 0.2, {b} 0.4, {c} 0.2, {d} 0.1, {e} 0.1. Each round then: 1. sort the sets by probability (lowest first); 2. insert prefix 1 into the codes of the top set's letters; 3. insert prefix 0 into the codes of the second set's letters; 4. merge the top two sets.
Round 1: order d (0.1), e (0.1), a (0.2), c (0.2), b (0.4); d gets code 1, e gets code 0; merge into de (0.2).
Round 2: order de (0.2), a (0.2), c (0.2), b (0.4); d becomes 11, e becomes 10, a gets 0; merge into dea (0.4).
Round 3: order c (0.2), dea (0.4), b (0.4); c gets 1; d becomes 011, e becomes 010, a becomes 00; merge into cdea (0.6).
Round 4: order b (0.4), cdea (0.6); b gets 1; c becomes 01, d becomes 0011, e becomes 0010, a becomes 000; merge into the full set (probability 1.0). The END.
Final code: a 000, b 1, c 01, d 0011, e 0010.

Example Summary. Average code length l = 0.4x1 + 0.2x2 + 0.2x3 + 0.1x4 + 0.1x4 = 2.2 bits/symbol. Entropy H = -Σ_{s=a..e} P(s) log2 P(s) = 2.122 bits/symbol. Redundancy l - H = 0.078 bits/symbol.
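A compact implementation of this construction, assuming the reconstructed probabilities above. Ties between equal-probability sets are broken by creation order here, so the individual codewords differ from the slides' (Huffman codes are not unique), but the average length and redundancy match:

```python
import heapq
from math import log2

def huffman(probs):
    # heap entries: (probability, tiebreak, {symbol: code so far})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # least probable set: prefix 1
        p2, _, c2 = heapq.heappop(heap)          # runner-up set: prefix 0
        merged = {s: "1" + c for s, c in c1.items()}
        merged.update({s: "0" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

P = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}
codes = huffman(P)
avg = sum(P[s] * len(codes[s]) for s in P)
H = -sum(p * log2(p) for p in P.values())
print(codes)              # {'a': '11', 'c': '10', 'd': '011', 'e': '010', 'b': '00'}
print(avg, H, avg - H)    # 2.2, ~2.122, ~0.078
```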

Huffman Tree. The code above corresponds to a binary tree with 0/1 branch labels: b (0.4) hangs one level below the root, c two levels down, a three, and d and e four.
Building a Huffman Tree. The same tree built bottom-up: merge d (0.1) and e (0.1) into a node of weight 0.2, merge that with a (0.2) into 0.4, then with c (0.2) into 0.6, and finally with b (0.4) into the root (1.0); reading off the branch labels gives a 000, b 1, c 01, d 0011, e 0010.
An Alternative Huffman Tree. Breaking ties differently, merge d and e into 0.2 and a with c into 0.4 before combining; b keeps a 1-bit code and a, c, d, e all get 3-bit codes. Average code length l = 0.4x1 + (0.2 + 0.2 + 0.1 + 0.1)x3 = 2.2 bits/symbol.
Yet Another Tree. Pairing the de node with b instead gives a 00, b 11, c 01, d 101, e 100. Average code length l = 0.4x2 + (0.2 + 0.2)x2 + (0.1 + 0.1)x3 = 2.2 bits/symbol. All of these trees are optimal: the average length is always 2.2 bits/symbol.

Design Examples

Min Variance Huffman Trees. Huffman codes are not unique, and all versions yield the same average length. Which one should we choose? The one with the minimum variance in codeword lengths, i.e. with the minimum-height tree. Why? It ensures the least amount of variability in the encoded stream. How to achieve it? During sorting, break ties by placing smaller sets higher; alternatively, place newly merged sets as low as possible. The sketch below contrasts the two choices.
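A sketch (not from the slides) comparing the two tie-breaking rules on the example distribution: favoring the original sets in ties, so newly merged sets sink in the list, yields the minimum-variance length profile here; favoring newly merged sets yields the tallest tree. Both average 2.2 bits/symbol.

```python
import heapq

def lengths(probs, newest_first=False):
    sign = -1 if newest_first else 1
    heap = [(p, sign * i, {s: 0}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    t = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: n + 1 for s, n in {**d1, **d2}.items()}  # one level deeper
        heapq.heappush(heap, (p1 + p2, sign * t, merged))
        t += 1
    return heap[0][2]

P = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}
print(sorted(lengths(P).values()))                     # [2, 2, 2, 3, 3]
print(sorted(lengths(P, newest_first=True).values()))  # [1, 2, 3, 4, 4]
```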

Extended Huffman Codes. Consider the source A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18. H = 0.816 bits/symbol. Huffman code: a -> 0, b -> 11, c -> 10; l = 1.2 bits/symbol. Redundancy = 0.384 bits/symbol (47%!). Q: Could we do better?

Extended Huffman Codes (2). Idea: encode sequences of two letters as opposed to single letters.
Letter Probability Code
aa 0.6400 0
ab 0.0160 10101
ac 0.1440 11
ba 0.0160 101000
bb 0.0004 10100101
bc 0.0036 1010011
ca 0.1440 100
cb 0.0036 10100100
cc 0.0324 1011
l = 1.7228/2 = 0.8614 bits/symbol. Redundancy = 0.8614 - 0.816 ≈ 0.045 bits/symbol.

Extended Huffman Codes (3). The idea can be extended further: consider all possible n^m sequences (we did 3^2). In theory, considering longer sequences brings the rate ever closer to the entropy. In reality, the exponential growth of the alphabet makes this impractical; e.g., for length-3 ASCII sequences, 256^3 = 2^24 = 16M codewords, and most sequences would have zero frequency. Other methods are needed.
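A quick check of the pair-code rate under the stated probabilities. It avoids building codes at all by using the fact that the expected Huffman codeword length equals the sum of the weights of all merged nodes:

```python
import heapq
from itertools import product
from math import log2

P = {"a": 0.8, "b": 0.02, "c": 0.18}
pairs = [P[x] * P[y] for x, y in product(P, repeat=2)]   # 9 two-letter blocks

def expected_length(probs):
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        w = heapq.heappop(heap) + heapq.heappop(heap)
        total += w        # each merge adds one bit above every symbol below it
        heapq.heappush(heap, w)
    return total

H = -sum(p * log2(p) for p in P.values())
print(expected_length(pairs) / 2, H)   # ~0.8614 bits/symbol vs H ~0.8159
```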

Adaptive Huffman Coding. Problem: Huffman coding requires probability estimates, which can turn it into a two-pass procedure: 1. collect statistics and generate codewords; 2. perform the actual encoding. Not practical in many situations, e.g. compressing network transmissions. Theoretical solution: start with equal probabilities; based on the statistics of the first k symbols (k = 1, 2, ...), regenerate the codewords and encode the (k+1)-st symbol. Too expensive in practice.
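A sketch of this theoretical scheme, with the assumption that every count starts at 1 so each symbol always has a codeword; the decoder runs the same loop on its decoded output to stay in sync. Correct, but it rebuilds the code for every symbol:

```python
import heapq

def codes_from_counts(counts):
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    t = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: "1" + b for s, b in d1.items()}
        merged.update({s: "0" + b for s, b in d2.items()})
        heapq.heappush(heap, (c1 + c2, t, merged))
        t += 1
    return heap[0][2]

def encode(msg, alphabet):
    counts = {s: 1 for s in alphabet}     # "start with equal probabilities"
    out = []
    for s in msg:
        out.append(codes_from_counts(counts)[s])
        counts[s] += 1                    # decoder applies the same update
    return "".join(out)

print(encode("aardvark", "abcdefghijklmnopqrstuvwxyz"))
```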

Adaptive Huffman Coding (2). Basic idea: alphabet A = {a_1, ..., a_n}. Pick fixed default binary codes for all symbols and start with an empty Huffman tree. Then, until done: read symbol s from the source; if s is Not Yet Transmitted (NYT), send the NYT codeword followed by default(s); else send the current codeword for s; in either case update the tree (and keep it Huffman). Notes: codewords change as a function of the symbol frequencies; encoder and decoder follow the same procedure, so they stay in sync.

Adaptive Huffman Tree. The tree has at most 2n - 1 nodes. Node attributes: symbol, left, right, parent, weight, and an implicit id. If x_k is a leaf then weight(x_k) is the frequency of symbol(x_k); else weight(x_k) = weight(left(x_k)) + weight(right(x_k)). Ids are assigned so that if weight(x_1) <= weight(x_2) <= ... <= weight(x_{2n-1}) then id(x_1) <= id(x_2) <= ... <= id(x_{2n-1}), and parent(x_{2k-1}) = parent(x_{2k}) for 1 <= k <= n - 1, i.e. nodes with adjacent ids are siblings. This is the sibling property.

Updating the Tree. Start from a single NYT node and assign id(root) = 2n - 1, weight(NYT) = 0. Whenever a new symbol is seen, a new leaf is formed by splitting the NYT node. Maintaining the sibling property: whenever a node x is updated, repeat: if weight(x) < weight(y) for all other nodes y with x's weight, just increment weight(x); else first swap x with z, the highest-id node (other than x's parent) such that weight(x) == weight(z), then increment weight(x); set x = parent(x); until x moves past the root. A runnable sketch follows.
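Below is a sketch of this procedure (FGK-style; not the slides' own code). Assumptions beyond the slides: the list self.order stands in for the implicit ids (kept highest-id first, so the first node found with a given weight is its block leader), the NYT node always splits off to the left, and first occurrences are sent as the NYT path plus the 5-bit default codes used on the next slides. The exact output bits depend on these conventions.

```python
class Node:
    def __init__(self, symbol=None, parent=None):
        self.weight, self.symbol = 0, symbol
        self.parent, self.left, self.right = parent, None, None

class AdaptiveHuffman:
    def __init__(self):
        self.root = self.nyt = Node()   # the tree starts as a lone NYT node
        self.order = [self.root]        # nodes by decreasing implicit id
        self.leaf = {}                  # symbol -> leaf node

    def path(self, node):               # codeword = root-to-node branch labels
        bits = ""
        while node.parent:
            bits = ("0" if node is node.parent.left else "1") + bits
            node = node.parent
        return bits

    def _swap(self, a, b):              # exchange two subtrees and id slots
        ia, ib = self.order.index(a), self.order.index(b)
        self.order[ia], self.order[ib] = b, a
        pa, sa = a.parent, "left" if a.parent.left is a else "right"
        pb, sb = b.parent, "left" if b.parent.left is b else "right"
        setattr(pa, sa, b)
        setattr(pb, sb, a)
        a.parent, b.parent = pb, pa

    def _update(self, node):
        while node:
            # block leader: highest-id node with the same weight
            leader = next(n for n in self.order if n.weight == node.weight)
            if leader is not node and leader is not node.parent:
                self._swap(leader, node)   # keep the sibling property
            node.weight += 1
            node = node.parent

    def encode(self, s):
        if s in self.leaf:
            out = self.path(self.leaf[s])
            self._update(self.leaf[s])
        else:
            out = self.path(self.nyt) + format(ord(s) - ord("a"), "05b")
            old = self.nyt               # split NYT: new NYT + leaf for s
            old.left, old.right = Node(parent=old), Node(s, old)
            i = self.order.index(old)
            self.order[i + 1:i + 1] = [old.right, old.left]
            self.nyt, self.leaf[s] = old.left, old.right
            self._update(old.right)
        return out

coder = AdaptiveHuffman()
print([coder.encode(s) for s in "aardvark"])
# ['00000', '1', '010001', '0000011', '00010101', '0', '10', '110001010']
```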

Adaptive Huffman Encoding. Input: aardvark. Default codes: plain 5-bit binary indices (slightly more efficient 4-/5-bit combination codes are possible):
a 00000, b 00001, c 00010, d 00011, e 00100, f 00101, g 00110, h 00111, i 01000, j 01001, k 01010, l 01011, m 01100, n 01101, o 01110, p 01111, q 10000, r 10001, s 10010, t 10011, u 10100, v 10101, w 10110, x 10111, y 11000, z 11001.
The slides then step through the tree diagrams one input symbol at a time (root id 51; new nodes take the next lower ids 50, 49, ..., 41):
1. a: new symbol; send default(a) = 00000; the NYT node splits into a new NYT and a leaf for a.
2. a: already in the tree; send its current codeword (a single bit) and increment its weight.
3. r: new; send the NYT codeword followed by default(r) = 10001; split NYT again.
4. d: new; send the NYT codeword followed by default(d) = 00011.
5. v: new; send the NYT codeword followed by default(v) = 10101; this update triggers node swaps that restore the sibling property.
6. a: send a's codeword; update.
7. r: send r's codeword; update.
8. k: new; send the NYT codeword followed by default(k) = 01010.
Each slide shows the tree after one step, with node weights and ids; codeword lengths shrink for frequent symbols as the tree adapts.

Adaptive Huffman Decoding. Input: the bit stream produced above. The decoder mirrors the encoder: it starts with the same empty tree and the same default codes. It first reads 5 bits (00000) and outputs a; after every decoded symbol it performs the identical tree update, so its tree always matches the encoder's. From then on it walks the current tree bit by bit: reaching the a leaf outputs a; reaching the NYT leaf means a new symbol follows, so it reads 5 more bits, looks the symbol up in the default table (r, then d, then v, then k), and splits NYT exactly as the encoder did. The decoded output is aardvark.

Dealing with Counter Overflow. Over time counters can overflow: e.g. a 32-bit counter holds ~4 billion, which is big but still finite and can overflow on long network connections. Solution? Rescale all frequency counts (of leaf nodes) when a limit is reached, e.g. divide all of them by two, and re-compute the rest of the tree (keeping it Huffman!). Note: after rescaling, new symbols count twice as much as old ones! This is mostly a feature, not a bug: data tends to have strong local correlation, i.e. what happened a long time ago is not as important as what happened more recently.
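A minimal sketch of the rescaling step, with an illustrative cap (a real implementation would then rebuild the tree from the halved counts):

```python
# Halve all leaf counts, keeping each at least 1, when the largest hits a cap.
CAP = 1 << 16          # illustrative limit, far below a real 32-bit overflow

def rescale(counts):
    if max(counts.values()) >= CAP:
        counts = {s: max(1, c // 2) for s, c in counts.items()}
    return counts
```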

Huffman Image Compression. Example images: 256x256 pixels, 8 bits/pixel, 65,536 bytes: Sena, Sensin, Earth, Omaha. Huffman coding of pixel values:
Image / Bits/pixel / Size (bytes) / Compression Ratio
Sena 7.01 57,504 1.14
Sensin 7.49 61,430 1.07
Earth 4.94 40,534 1.62
Omaha 7.12 58,374 1.12

Huffman Image Compression (2). Basic observations: plain Huffman coding yields modest gains, except in the Earth case, where lots of black skews the pixel distribution nicely; we are not yet taking into account the obvious correlations between neighboring pixel values. Huffman coding of pixel differences:
Image / Bits/pixel / Size (bytes) / Compression Ratio
Sena 4.02 32,968 1.99
Sensin 4.70 38,541 1.70
Earth 4.13 33,880 1.93
Omaha 6.42 52,643 1.24

Two-pass Huffman vs. Adaptive Huffman. Two-pass (pixel differences):
Image / Bits/pixel / Size (bytes) / Compression Ratio
Sena 4.02 32,968 1.99
Sensin 4.70 38,541 1.70
Earth 4.13 33,880 1.93
Omaha 6.42 52,643 1.24
Adaptive:
Sena 3.93 32,261 2.03
Sensin 4.63 37,896 1.73
Earth 4.82 39,504 1.66
Omaha 6.39 52,321 1.25

Huffman Text Compression. PDF(letters): a bar chart comparing the letter probabilities (A-Z) of the US Constitution against those of Chapter 3, probability axis from 0.00 to about 0.12.

Huffman Audio Compression. Huffman coding of 16-bit CD audio (44,100 Hz x 2 channels):
File Name / Original File Size (bytes) / Entropy (bits) / Est. Compressed File Size (bytes) / Compression Ratio
Mozart 939,862 12.8 725,420 1.30
Cohn 402,442 13.8 349,300 1.15
Mir 884,020 13.7 759,540 1.16
Difference Huffman coding:
File Name / Original File Size (bytes) / Entropy of Diff. (bits) / Est. Compressed File Size (bytes) / Compression Ratio
Mozart 939,862 9.7 569,792 1.65
Cohn 402,442 10.4 261,590 1.54
Mir 884,020 10.9 602,240 1.47

Golomb Codes. Invented by Solomon W. Golomb in the 1960s; Golomb coding is optimal for the geometric distribution. Rice coding: the Golomb code has a tunable parameter that can be any positive integer; Rice codes are those in which the tunable parameter is a power of two. Unary code: n is represented as n 1s followed by a 0 (0 -> 0, 1 -> 10, 2 -> 110, 3 -> 1110). This is identical to the Huffman code for {1, 2, 3, ...} with P(k) = 1/2^k, i.e. optimal for that probability model.
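For concreteness, the unary code in two lines:

```python
# n ones and a closing zero: n + 1 bits, the ideal length when P(n) = 2^-(n+1)
# (equivalently P(k) = 2^-k for k = n + 1 = 1, 2, 3, ...).
def unary(n: int) -> str:
    return "1" * n + "0"

print([unary(n) for n in range(4)])   # ['0', '10', '110', '1110']
```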

Golomb Codes (2). Use a tunable parameter m to divide an input value n into a quotient and a remainder: q = floor(n/m), r = n - qm. Represent q in unary, followed by r in floor(log2 m) bits when m is a power of two. If m is not a power of two, we need not always spend ceil(log2 m) bits on r: truncated binary encoding uses a floor(log2 m)-bit representation for 0 <= r < 2^ceil(log2 m) - m, and a ceil(log2 m)-bit representation of r + 2^ceil(log2 m) - m for the rest.

Golomb Codes, truncated binary coding. Truncated binary coding is an entropy encoding typically used for uniform probability distributions with a finite alphabet; a more general form of binary encoding for alphabet sizes n that are not a power of two. Coding (a prefix code): for 2^k <= n < 2^(k+1), there are u = 2^(k+1) - n unused entries; use k-bit codes for 0 <= r <= u - 1 and (k+1)-bit codes, encoding r + u, for the rest.
Example, n = 5 (k = 2, u = 3): 0 -> 00, 1 -> 01, 2 -> 10, 3 -> 110, 4 -> 111.
Example, n = 7 (k = 2, u = 1): 0 -> 00, 1 -> 010, 2 -> 011, 3 -> 100, 4 -> 101, 5 -> 110, 6 -> 111 (standard binary would spend 3 bits on every value).
Example, n = 10 (k = 3, u = 6): 0 -> 000, 1 -> 001, 2 -> 010, 3 -> 011, 4 -> 100, 5 -> 101, 6 -> 1100, 7 -> 1101, 8 -> 1110, 9 -> 1111.
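A sketch combining the pieces, unary quotient plus truncated-binary remainder; with m = 6 it reproduces the table built step by step on the next slide:

```python
from math import floor, log2

def truncated_binary(r: int, m: int) -> str:
    k = floor(log2(m))
    u = (1 << (k + 1)) - m            # number of unused (k+1)-bit entries
    if r < u:
        return format(r, f"0{k}b")    # the short, k-bit codes come first
    return format(r + u, f"0{k + 1}b")

def golomb(n: int, m: int) -> str:
    q, r = divmod(n, m)
    return "1" * q + "0" + truncated_binary(r, m)

print([golomb(n, 6) for n in range(8)])
# ['000', '001', '0100', '0101', '0110', '0111', '1000', '1001']
```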

Golomb Code Example, m = 6. floor(log2 6) = 2 and ceil(log2 6) = 3, so: 2-bit codes for 0 <= r <= 1, and 3-bit codes of r + 2 for 2 <= r <= 5. The slides build the table one value at a time; in full:
n q r Codeword
0 0 0 000
1 0 1 001
2 0 2 0100
3 0 3 0101
4 0 4 0110
5 0 5 0111
6 1 0 1000
7 1 1 1001
8 1 2 10100
9 1 3 10101
10 1 4 10110
11 1 5 10111

Golomb Codes: Choosing m. Assume a binary string of zeroes and ones. It can be encoded by counting the runs of identical bits, a.k.a. run-length encoding (RLE), and Golomb-coding the run lengths. In the slide's example string there are 35 zeroes and 12 ones, so P(0) = p = 35/(35+12) = 0.745, and the parameter is chosen as m = ceil( log(1+p) / log(1/p) ) = ceil( log(1.745) / log(1/0.745) ) = 2.
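The same choice in code; the logarithm base cancels, so natural logs work:

```python
from math import ceil, log

def golomb_m(p: float) -> int:
    return ceil(log(1 + p) / log(1 / p))

print(golomb_m(35 / 47))   # 2
```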

Summary. Early: the Shannon-Fano code. Huffman code, original (two-pass) version: collect symbol statistics and assign codes, then perform the actual encoding of the source. Extended version: group multiple symbols to drive the rate closer to the entropy. Adaptive version: the most practical; builds the Huffman tree on the fly in a single pass, using escape codes (NYT) for not-yet-transmitted symbols; encoder and decoder stay synchronized; more sensitive to local variation, tends to forget older data. Homeworks (pp. 78): 4, 5, 6, 10.