Lecture 3 : Algorithms for source coding September 30, 2016

Outline 1. Huffman code; proof of optimality. 2. Coding with intervals: Shannon-Fano-Elias code and Shannon code. 3. Arithmetic coding. 1/39

1. Coding and decoding with a prefix code Coding: use a table indexed by the letters. Decoding: use the associated binary tree. Start at the root; for each bit turn left or right; when a leaf is reached, output the corresponding letter and go back to the root (see the sketch below). 2/39
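To make the table/tree description concrete, here is a minimal Python sketch of both directions. The code table used is a small hypothetical prefix code chosen for illustration, not one fixed by the lecture.

```python
# Minimal sketch of prefix-code encoding (table lookup) and decoding (tree walk).
CODE = {"a": "1", "b": "000", "c": "001", "d": "01"}   # hypothetical prefix code

def encode(message, code=CODE):
    """Coding: concatenate the codewords read from the table."""
    return "".join(code[letter] for letter in message)

def build_tree(code):
    """Build the binary decoding tree: each codeword is a root-to-leaf path."""
    root = {}
    for letter, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = letter                        # leaf stores the letter
    return root

def decode(bits, code=CODE):
    """Decoding: walk the tree bit by bit; at each leaf output the letter and restart at the root."""
    root = build_tree(code)
    node, out = root, []
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):                      # reached a leaf
            out.append(node)
            node = root
    return "".join(out)

if __name__ == "__main__":
    assert decode(encode("abacad")) == "abacad"        # round trip
```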

Optimal Code Corollary of the Kraft and McMillan theorems: Corollary 1. If there exists a uniquely decodable code with K words of lengths $n_1, n_2, \ldots, n_K$, then $\sum_{k=1}^{K} \frac{1}{2^{n_k}} \le 1$ and there exists a prefix code with the same lengths. Definition. A uniquely decodable code of a source X is optimal if there exists no other uniquely decodable code with a smaller average length. Proposition 1. For every source there exists an optimal prefix code. 3/39

Huffman Code Let X be a discrete source with alphabet $\mathcal{X} = \{a_1, \ldots, a_{K-2}, a_{K-1}, a_K\}$ and probability distribution p. W.l.o.g. we can assume that $p(a_1) \ge \ldots \ge p(a_{K-1}) \ge p(a_K) > 0$. Let Y be the source with alphabet $\mathcal{Y} = \{a_1, \ldots, a_{K-2}, b_{K-1}\}$ and probability distribution $q(a_k) = p(a_k)$ for $k = 1, \ldots, K-2$ and $q(b_{K-1}) = p(a_{K-1}) + p(a_K)$. Algorithm (Huffman). We compute recursively a prefix code of X. When K = 2, the codewords are $\varphi_K(a_1) = 0$ and $\varphi_K(a_2) = 1$. If K > 2, let $\varphi_{K-1}$ be a Huffman code for Y; then $\varphi_K(a_k) = \varphi_{K-1}(a_k)$ for $k = 1, \ldots, K-2$, $\varphi_K(a_{K-1}) = \varphi_{K-1}(b_{K-1})\,0$, $\varphi_K(a_K) = \varphi_{K-1}(b_{K-1})\,1$. 4/39
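The recursion above is usually implemented iteratively with a priority queue, merging the two least likely groups at each step. Below is a minimal Python sketch (function and variable names are ours); applied to the probabilities of the next slide it reproduces the codeword lengths and the average length 2.28, although the individual codewords may differ from the table by a 0/1 relabelling at each merge.

```python
import heapq

# Probabilities from the example slide (a ... f).
PROBS = {"a": 0.43, "b": 0.17, "c": 0.15, "d": 0.11, "e": 0.09, "f": 0.05}

def huffman_code(probs):
    """Binary Huffman code: repeatedly merge the two least likely groups,
    prepending one more bit to the codewords inside the merged groups."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # the two smallest probabilities,
        p2, _, c2 = heapq.heappop(heap)          # as in the merge of a_{K-1} and a_K
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

if __name__ == "__main__":
    code = huffman_code(PROBS)
    avg = sum(PROBS[s] * len(w) for s, w in code.items())
    print(code, avg)                             # average length ≈ 2.28 for this source
```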

Huffman Code: example [Figure: Huffman tree. The root (weight 1) splits into a (0.43) and an internal node of weight 0.57; 0.57 splits into 0.32 (= 0.17 + 0.15, leaves b and c) and 0.25; 0.25 splits into d (0.11) and 0.14 (= 0.09 + 0.05, leaves e and f).]

x | P(x) | -log2 P(x) | ϕ(x) | n_x
a | 0.43 | 1.22 | 1 | 1
b | 0.17 | 2.56 | 000 | 3
c | 0.15 | 2.74 | 001 | 3
d | 0.11 | 3.18 | 011 | 3
e | 0.09 | 3.47 | 0100 | 4
f | 0.05 | 4.32 | 0101 | 4

H = 2.248, average length n̄ = 2.28, efficiency E = 98.6%. 5/39

The Huffman code is optimal Lemma 1. For every source, there exists an optimal prefix code such that 1. if $p(x_j) > p(x_k)$ then $|\varphi(x_j)| \le |\varphi(x_k)|$; 2. the two longest codewords have the same length and correspond to the least likely symbols; 3. these two codewords differ only in their last bit. 6/39

Proof of 1 1. If $p(x_j) > p(x_k)$ then $|\varphi(x_j)| \le |\varphi(x_k)|$. Let $\varphi$ be an optimal code and assume that $|\varphi(x_j)| > |\varphi(x_k)|$. Let $\varphi'$ be obtained by swapping the codewords of $x_j$ and $x_k$. Then
$$\overline{|\varphi'|} - \overline{|\varphi|} = \sum_x p(x)\,|\varphi'(x)| - \sum_x p(x)\,|\varphi(x)| = p(x_j)|\varphi(x_k)| + p(x_k)|\varphi(x_j)| - p(x_j)|\varphi(x_j)| - p(x_k)|\varphi(x_k)| = (p(x_j) - p(x_k))(|\varphi(x_k)| - |\varphi(x_j)|) < 0.$$
This contradicts the optimality of $\varphi$. 7/39

Proof of 2 2. The two longest codewords have the same length. If they did not have the same length, a bit could be removed from the longest one and the resulting code would still be a prefix code: a strictly shorter prefix code would be obtained, contradicting optimality. By the previous point we know that they correspond to the least likely symbols. 8/39

Proof of 3 3. The two longest codewords differ only in their last bit. Every $\varphi(x)$ of maximal length has a sibling (otherwise it would be possible to shorten the code). Among the codewords of maximal length, two correspond to the least likely symbols. By swapping leaves we can arrange these two symbols so that they become siblings. 9/39

The Huffman code is optimal Proof (induction on the alphabet size k). Let $f$ be an optimal prefix code for X with $p(a_1) \ge \ldots \ge p(a_{k-1}) \ge p(a_k)$, chosen as in Lemma 1, so that $f(a_{k-1}) = m\,0$ and $f(a_k) = m\,1$ for some common prefix $m$. We derive from $f$ a code $g$ on $\mathcal{Y} = \{a_1, \ldots, a_{k-2}, b_{k-1}\}$: $g(a_j) = f(a_j)$ for $j \le k-2$, and $g(b_{k-1}) = m$. Then $\overline{|f|} - \overline{|g|} = p(a_{k-1}) + p(a_k)$ and $\overline{|\varphi_k|} - \overline{|\varphi_{k-1}|} = p(a_{k-1}) + p(a_k)$. Hence $\overline{|f|} - \overline{|\varphi_k|} = \overline{|g|} - \overline{|\varphi_{k-1}|}$. Optimality of $\varphi_{k-1}$ gives $\overline{|g|} - \overline{|\varphi_{k-1}|} \ge 0$, hence $\overline{|f|} - \overline{|\varphi_k|} \ge 0$. Optimality of $f$ gives $\overline{|f|} - \overline{|\varphi_k|} \le 0$, so $\overline{|f|} - \overline{|\varphi_k|} = 0$: $\varphi_k$ is optimal. 10/39

Drawbacks of the Huffman code Need to know the source statistics beforehand. If a wrong probability distribution q is used instead of the true distribution p, then instead of compressing at a rate of H(p) we compress at a rate of $H(p) + D(p\|q)$. This can be dealt with by an adaptive algorithm that computes the statistics on the fly and updates them during the coding process. Based on a memoryless source model. It is better to group letters into blocks of size l, for l large enough: (1) the memoryless model becomes sufficiently realistic; (2) the coding efficiency tends to 1. But the space complexity is $O(|\mathcal{X}|^l)$. 11/39

Huffman code in practice Almost everywhere... 1. in the compression algorithms gzip, pkzip, winzip, bzip2; 2. in compressed image formats such as JPEG and PNG; 3. in compressed audio formats such as MP3. Even the rather recent Google compression algorithm Brotli uses it... 12/39

2. Interval coding Gives a scheme in which, in a natural way, each $x_i$ is encoded with about $-\log p(x_i)$ bits and therefore $\overline{|\varphi|} \approx H(X)$. This is the idea behind arithmetic coding. 13/39

2-adic numbers In the interval [0, 1], they are of the form $\sum_{i=1}^{\infty} d_i 2^{-i}$, where $d_i \in \{0, 1\}$. We write $0.d_1 d_2 d_3 \ldots$ For instance 0.25 ↦ 0.01, 0.125 ↦ 0.001, 0.625 ↦ 0.101, 0.43 ↦ 0.0110111000..., 0.71 ↦ 0.1011010111..., $1/\sqrt{2}$ ↦ 0.1011010100... Certain numbers have two different 2-adic representations, for instance 0.25 ↦ 0.01000... and 0.25 ↦ 0.00111...; in that case we choose the shorter (terminating) representation. 14/39
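A small sketch of how such an expansion can be computed by repeated doubling (the helper name is ours); this is the routine implicitly used by the interval codes below.

```python
def binary_expansion(x, nbits):
    """First `nbits` bits of the 2-adic expansion 0.d1 d2 d3 ... of x in [0, 1)."""
    bits = []
    for _ in range(nbits):
        x *= 2                       # shift the expansion one position to the left
        bit = int(x)                 # the integer part is the next digit d_i
        bits.append(str(bit))
        x -= bit
    return "0." + "".join(bits)

print(binary_expansion(0.25, 5))     # 0.01000
print(binary_expansion(0.43, 10))    # 0.0110111000
print(binary_expansion(0.71, 10))    # 0.1011010111
```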

Shannon-Fano-Elias Code Let X be a memoryless source with alphabet $\mathcal{X}$ and probability distribution p. Define the cumulative function $S(x) \stackrel{\text{def}}{=} \sum_{x' < x} p(x')$. [Figure: the unit interval partitioned into the sub-intervals $[S(x_i), S(x_i) + p(x_i))$, with endpoints $p(x_1)$, $p(x_1) + p(x_2)$, ..., $p(x_1) + \cdots + p(x_{n-1})$.] The width of the interval of $x_i$ is $p(x_i)$. Each $x_i$ is encoded by its interval. 15/39

Shannon-Fano-Elias Coding Consider the middle of the interval, $\bar{S}(x) \stackrel{\text{def}}{=} S(x) + \frac{p(x)}{2}$. For all $x \in \mathcal{X}$, let $\varphi(x)$ be defined as the first $\lceil \log \frac{1}{p(x)} \rceil + 1$ bits of the 2-adic representation of $\bar{S}(x)$. Proposition 2. The code $\varphi$ is prefix and its average length $\overline{|\varphi|}$ satisfies $H(X) \le \overline{|\varphi|} < H(X) + 2$. 16/39
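A minimal sketch of this construction in Python, assuming the alphabet is processed in a fixed order so that the cumulative sums S(x) are well defined; function and variable names are ours. On the source of the earlier Huffman example it reproduces the example table below (a ↦ 001, b ↦ 1000, ...).

```python
from math import ceil, log2

def sfe_code(probs):
    """Shannon-Fano-Elias code: for each x take the first ceil(log2(1/p(x))) + 1
    bits of the 2-adic expansion of the interval midpoint Sbar(x) = S(x) + p(x)/2."""
    code, cumulative = {}, 0.0
    for x, p in probs.items():                 # any fixed order of the alphabet
        midpoint = cumulative + p / 2
        length = ceil(log2(1 / p)) + 1
        bits = []
        for _ in range(length):                # first `length` bits of the expansion
            midpoint *= 2
            bits.append(str(int(midpoint)))
            midpoint -= int(midpoint)
        code[x] = "".join(bits)
        cumulative += p
    return code

probs = {"a": 0.43, "b": 0.17, "c": 0.15, "d": 0.11, "e": 0.09, "f": 0.05}
print(sfe_code(probs))    # a -> 001, b -> 1000, c -> 1010, d -> 11001, e -> 11100, f -> 111110
```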

A simple explanation Lemma 2. For all reals u and v in [0, 1[ and for every integer l > 0, if $|u - v| \ge 2^{-l}$ then the first l bits of the 2-adic representations of u and v are distinct. 17/39

The Shannon-Fano-Elias code is prefix Proof W.l.o.g. we may assume y > x. Then
$$\bar{S}(y) - \bar{S}(x) = \frac{p(y)}{2} + \sum_{x' < y} p(x') - \frac{p(x)}{2} - \sum_{x' < x} p(x') = \frac{p(y)}{2} - \frac{p(x)}{2} + \sum_{x \le x' < y} p(x') = \frac{p(y)}{2} + \frac{p(x)}{2} + \sum_{x < x' < y} p(x') \ge \max\left(\frac{p(y)}{2}, \frac{p(x)}{2}\right) \ge \max\left(2^{-\lceil \log \frac{1}{p(x)} \rceil - 1}, 2^{-\lceil \log \frac{1}{p(y)} \rceil - 1}\right).$$
By Lemma 2, $\bar{S}(x)$ and $\bar{S}(y)$ therefore differ in their first $\min(\lceil \log \frac{1}{p(x)} \rceil + 1, \lceil \log \frac{1}{p(y)} \rceil + 1)$ bits: $\varphi(x)$ and $\varphi(y)$ cannot be prefixes of each other. 18/39

Shannon-Fano-Elias Code: example

x | p(x) | ⌈log 1/p(x)⌉ + 1 | S̄(x) | 2-adic representation of S̄(x) | ϕ(x) | Huffman
a | 0.43 | 3 | 0.215 | 0.0011011... | 001 | 0
b | 0.17 | 4 | 0.515 | 0.1000001... | 1000 | 100
c | 0.15 | 4 | 0.675 | 0.1010110... | 1010 | 101
d | 0.11 | 5 | 0.805 | 0.1100111... | 11001 | 110
e | 0.09 | 5 | 0.905 | 0.1110011... | 11100 | 1110
f | 0.05 | 6 | 0.975 | 0.1111100... | 111110 | 1111

19/39

Shannon-Fano-Elias code: another example Symbols in alphabetical order:

x | p(x) | ⌈log 1/p(x)⌉ + 1 | S̄(x) | 2-adic representation of S̄(x) | ϕ(x) | Huffman
a | 0.25 | 3 | 0.125 | 0.001 | 001 | 10
b | 0.5 | 2 | 0.5 | 0.1 | 10 | 0
c | 0.125 | 4 | 0.8125 | 0.1101 | 1101 | 110
d | 0.125 | 4 | 0.9375 | 0.1111 | 1111 | 111

If the last bit of ϕ is deleted, the code is not prefix anymore.

Symbols ordered by decreasing probability:

x | p(x) | ⌈log 1/p(x)⌉ + 1 | S̄(x) | 2-adic representation of S̄(x) | ϕ(x) | Huffman
b | 0.5 | 2 | 0.25 | 0.01 | 01 | 0
a | 0.25 | 3 | 0.625 | 0.101 | 101 | 10
c | 0.125 | 4 | 0.8125 | 0.1101 | 1101 | 110
d | 0.125 | 4 | 0.9375 | 0.1111 | 1111 | 111

If the last bit of ϕ is deleted, the code is still a prefix code. 20/39

Shannon Code The Shannon code is defined in the same way as the Shannon-Fano-Elias code, except that: the symbols are ordered by decreasing probability; the codeword of x is formed by the first $\lceil \log \frac{1}{p(x)} \rceil$ bits of $S(x)$. (In particular, the codeword of the most likely letter consists of $\lceil \log \frac{1}{p(x)} \rceil$ zeros.) Proposition 3. The Shannon code is prefix and its average length $\overline{|\varphi|}$ satisfies $H(X) \le \overline{|\varphi|} < H(X) + 1$. 21/39
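For comparison, a sketch of the Shannon code under the same conventions (sort by decreasing probability, take ⌈log 1/p(x)⌉ bits of S(x) itself rather than of the midpoint); names are ours. On the dyadic source of the previous example it returns a prefix code whose lengths equal log 1/p(x).

```python
from math import ceil, log2

def shannon_code(probs):
    """Shannon code: sort by decreasing probability, encode x by the first
    ceil(log2(1/p(x))) bits of the cumulative probability S(x)."""
    code, cumulative = {}, 0.0
    for x, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        length = ceil(log2(1 / p))
        value, bits = cumulative, []
        for _ in range(length):          # first `length` bits of S(x)
            value *= 2
            bits.append(str(int(value)))
            value -= int(value)
        code[x] = "".join(bits)
        cumulative += p
    return code

print(shannon_code({"a": 0.25, "b": 0.5, "c": 0.125, "d": 0.125}))
# {'b': '0', 'a': '10', 'c': '110', 'd': '111'}: the most likely letter gets all zeros
```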

The Shannon code is prefix Proof For all $x, y \in \mathcal{X}$ such that $x < y$ (and therefore $\lceil \log \frac{1}{p(y)} \rceil \ge \lceil \log \frac{1}{p(x)} \rceil$),
$$S(y) - S(x) = \sum_{z < y} p(z) - \sum_{z < x} p(z) = \sum_{x \le z < y} p(z) = p(x) + \sum_{x < z < y} p(z) \ge p(x) \ge 2^{-\lceil \log \frac{1}{p(x)} \rceil}.$$
By Lemma 2, the first $\lceil \log \frac{1}{p(x)} \rceil$ bits of $S(x)$ and $S(y)$ differ, hence $\varphi(x)$ and $\varphi(y)$ cannot be prefixes of each other. 22/39

Proof for the lengths Proposition 4. The Shannon-Fano-Elias code ϕ satisfies $H(X) \le \overline{|\varphi|} < H(X) + 2$. Proof. $|\varphi(x)| = \lceil \log \frac{1}{p(x)} \rceil + 1 < \log_2 \frac{1}{p(x)} + 2$. The conclusion follows by taking the average. Proposition 5. The Shannon code ϕ satisfies $H(X) \le \overline{|\varphi|} < H(X) + 1$. Proof. $|\varphi(x)| = \lceil \log \frac{1}{p(x)} \rceil < \log_2 \frac{1}{p(x)} + 1$. The conclusion follows by taking the average. 23/39

The Shannon code is optimal for a dyadic distribution In the dyadic case the probabilities are negative powers of 2, i.e. $\log_2 \frac{1}{p(x)}$ is an integer for every x. In this case the encoding lengths satisfy $|\varphi(x)| = \lceil \log \frac{1}{p(x)} \rceil = \log_2 \frac{1}{p(x)}$. By taking the average, $\sum_x p(x)\,|\varphi(x)| = -\sum_x p(x) \log_2 p(x) = H(X)$. As for the Huffman code: average coding length = entropy. 24/39

Non-optimality of Shannon's code For the following source: 1. output 1 with probability $1 - 2^{-10}$; 2. output 0 with probability $2^{-10}$. 1. The Shannon code encodes 1 with a codeword of length $\lceil -\log(1 - 2^{-10}) \rceil = \lceil 0.0014 \rceil = 1$. 2. The Shannon code encodes 0 with a codeword of length $\lceil -\log 2^{-10} \rceil = \lceil 10 \rceil = 10$. The Huffman code instead uses the codewords 1 and 0, which is of course optimal. 25/39

3. Arithmetic coding One of the main drawbacks of the Huffman code is its space complexity when the alphabet letters are grouped into packets of length l (space complexity $C = O(|\mathcal{X}|^l)$). Redundancy: $r \stackrel{\text{def}}{=} 1 - \text{efficiency}$. Huffman code on blocks of size l: $r \le O\left(\frac{1}{l}\right) = O\left(\frac{1}{\log C}\right)$. Arithmetic coding allows one to work with packets of arbitrary size at an acceptable algorithmic cost. Here $r \le O\left(\frac{1}{l}\right)$, but the dependency of the space complexity on l is much better. 26/39

The fundamental idea Instead of encoding the symbols of $\mathcal{X}$, work directly on $\mathcal{X}^l$, where l is the length of the word to be encoded. To encode $x_1, \ldots, x_l$: 1. compute the interval $[S(x_1, \ldots, x_l), S(x_1, \ldots, x_l) + p(x_1, \ldots, x_l))$, with $S(x_1, \ldots, x_l) \stackrel{\text{def}}{=} \sum_{(y_1, \ldots, y_l) < (x_1, \ldots, x_l)} p(y_1, \ldots, y_l)$; 2. encode $(x_1, \ldots, x_l)$ by an element of the interval whose 2-adic representation has length $\lceil \log \frac{1}{p(x_1, \ldots, x_l)} \rceil + 1$. This identifies the interval, from which $x_1 \ldots x_l$ can be deduced at the decoding step. 27/39

Advantage of arithmetic coding codelength$(x_1, \ldots, x_l) = \lceil -\log p(x_1, \ldots, x_l) \rceil + O(1)$, i.e. only O(1) extra bits are used to encode l symbols, and not O(l) as before! Very interesting in the previous example where 0's are generated with probability $2^{-10}$: the average coding length becomes $h(2^{-10})\,l + O(1) \approx 0.011\,l$ instead of $l$ for the Huffman code! 28/39

The major problem for implementing arithmetic coding Need to compute $S(x_1, \ldots, x_l)$ with a precision of $\lceil \log \frac{1}{p(x_1, \ldots, x_l)} \rceil + 1$ bits, i.e. perform computations with arbitrarily large precision. 29/39

The key idea which allows one to work with packets of size l Principle: efficient computation of $p(x_1, \ldots, x_l)$ and $S(x_1, \ldots, x_l)$. Computation of $p(x_1, \ldots, x_l)$ for a memoryless source: $p(x_1, \ldots, x_l) = p(x_1) \cdots p(x_l)$. Iterative computation of $S(x_1, \ldots, x_l)$, using the lexicographic order on $\mathcal{X}^l$: $S(x_1, \ldots, x_l) \stackrel{\text{def}}{=} \sum_{(y_1, \ldots, y_l) < (x_1, \ldots, x_l)} p(y_1, \ldots, y_l)$. Proposition 6. $S(x_1, \ldots, x_l) = S(x_1, \ldots, x_{l-1}) + p(x_1, \ldots, x_{l-1})\, S(x_l)$. 30/39
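A tiny numerical check of Proposition 6 with exact rationals, comparing the recursive formula with the brute-force definition of S, on the source of the "fractals" example below; all names are ours.

```python
from fractions import Fraction
from itertools import product

# Memoryless source of the "fractals" example: Prob(a)=1/2, Prob(b)=1/6, Prob(c)=1/3.
P = {"a": Fraction(1, 2), "b": Fraction(1, 6), "c": Fraction(1, 3)}
ALPHABET = sorted(P)                                   # lexicographic order a < b < c

def p_word(word):
    prod = Fraction(1)
    for x in word:
        prod *= P[x]
    return prod

def S_bruteforce(word):
    """S(x_1..x_l) = sum of p(y_1..y_l) over all words (y_1..y_l) < (x_1..x_l)."""
    l = len(word)
    return sum((p_word(y) for y in product(ALPHABET, repeat=l) if "".join(y) < word),
               Fraction(0))

def S_recursive(word):
    """Proposition 6 applied repeatedly: S(x_1..x_l) = S(x_1..x_{l-1}) + p(x_1..x_{l-1}) S(x_l)."""
    S, prob = Fraction(0), Fraction(1)
    for x in word:
        S += prob * sum((P[y] for y in ALPHABET if y < x), Fraction(0))
        prob *= P[x]
    return S

word = "cab"
assert S_bruteforce(word) == S_recursive(word)
print(S_recursive(word), p_word(word))                 # interval start 3/4 and width 1/36
```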

Proof Let m be the smallest element of the alphabet. Then $S(x_1, \ldots, x_l) = \sum_{(y_1, \ldots, y_l) < (x_1, \ldots, x_l)} p(y_1, y_2, \ldots, y_l) = A + B$, where
$$A \stackrel{\text{def}}{=} \sum_{(y_1, \ldots, y_l) < (x_1, \ldots, x_{l-1}, m)} p(y_1, y_2, \ldots, y_l), \qquad B \stackrel{\text{def}}{=} \sum_{(x_1, \ldots, x_{l-1}, m) \le (y_1, \ldots, y_l) < (x_1, \ldots, x_l)} p(y_1, y_2, \ldots, y_l).$$
31/39

Proof (cont'd)
$$A = \sum_{(y_1, \ldots, y_l) < (x_1, \ldots, x_{l-1}, m)} p(y_1, y_2, \ldots, y_l) = \sum_{y_1, \ldots, y_l\,:\,(y_1, \ldots, y_{l-1}) < (x_1, \ldots, x_{l-1})} p(y_1, y_2, \ldots, y_{l-1})\, p(y_l) = \sum_{(y_1, \ldots, y_{l-1}) < (x_1, \ldots, x_{l-1})} p(y_1, y_2, \ldots, y_{l-1}) \sum_{y_l} p(y_l) = \sum_{(y_1, \ldots, y_{l-1}) < (x_1, \ldots, x_{l-1})} p(y_1, y_2, \ldots, y_{l-1}) = S(x_1, \ldots, x_{l-1}).$$
32/39

Proof (end)
$$B = \sum_{(x_1, \ldots, x_{l-1}, m) \le (y_1, \ldots, y_l) < (x_1, \ldots, x_l)} p(y_1, y_2, \ldots, y_l) = \sum_{m \le y_l < x_l} p(x_1, \ldots, x_{l-1}, y_l) = \sum_{m \le y_l < x_l} p(x_1, \ldots, x_{l-1})\, p(y_l) = p(x_1, \ldots, x_{l-1})\, S(x_l).$$
33/39

fractals... Example: Prob(a) = 1/2, Prob(b) = 1/6, Prob(c) = 1/3. [Figure: the unit interval [0, 1) partitioned into the three sub-intervals a, b, c of widths 1/2, 1/6 and 1/3.] 34/39

fractals Example: Prob(a) = 1/2, Prob(b) = 1/6, Prob(c) = 1/3. [Figure: each of the intervals a, b, c is subdivided in the same proportions, giving the nine intervals aa, ab, ac, ba, bb, bc, ca, cb, cc.] 35/39

coding $S(x_1, \ldots, x_l) = S(x_1, \ldots, x_{l-1}) + p(x_1, \ldots, x_{l-1})\, S(x_l)$, where $S(x_1, \ldots, x_{l-1})$ is the beginning of the previous interval and $p(x_1, \ldots, x_{l-1})$ its length. Therefore, if the current interval is $[a, b[$ and $x_i$ is read, then $a_{\text{new}} = a + (b - a) S(x_i)$ and $b_{\text{new}} = a_{\text{new}} + (b - a) p(x_i)$. The width of the interval becomes $(b - a)\, p(x_i) = p(x_1) \cdots p(x_{i-1})\, p(x_i) = p(x_1, \ldots, x_i)$. 36/39
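A sketch of the resulting encoder in Python with exact rational arithmetic (so no zooming yet; see the next slide): the interval [a, a + width) is updated symbol by symbol and a 2-adic point with ⌈log 1/p(x_1...x_l)⌉ + 1 bits inside the final interval is output. Names are ours; the source is the "fractals" example.

```python
from fractions import Fraction
from math import ceil, log2

P = {"a": Fraction(1, 2), "b": Fraction(1, 6), "c": Fraction(1, 3)}
S, acc = {}, Fraction(0)
for x in sorted(P):                      # S(x) = sum of p(y) over letters y < x
    S[x] = acc
    acc += P[x]

def encode(word):
    """Arithmetic encoding with exact rationals: shrink [a, a + width)
    symbol by symbol, then output a 2-adic point inside the final interval."""
    a, width = Fraction(0), Fraction(1)
    for x in word:
        a += width * S[x]                # a_new = a + (b - a) * S(x_i)
        width *= P[x]                    # new width = (b - a) * p(x_i) = p(x_1 ... x_i)
    nbits = ceil(log2(1 / width)) + 1    # ceil(log 1/p(x_1...x_l)) + 1 bits suffice
    point = a + width / 2                # truncating the midpoint stays inside [a, a + width)
    n = int(point * (1 << nbits))        # floor(point * 2^nbits)
    return f"{n:0{nbits}b}"

print(encode("cab"))                     # "1100001": 7 bits for a word of probability 1/36
```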

Zooming Let $[a, b[$ be the current interval. If $b \le 1/2$: $[a, b[ \to [2a, 2b[$, output "0". If $a \ge 1/2$: $[a, b[ \to [2a - 1, 2b - 1[$, output "1". If $1/4 \le a < b \le 3/4$: $[a, b[ \to [2a - 1/2, 2b - 1/2[$, no output (underflow expansion), but keep this in mind with the help of a counter. 37/39
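A sketch of these three rules as they would sit inside the encoder loop (called after each interval update). The convention of flushing the underflow counter with bits opposite to the next emitted bit is the standard way to resolve the counter, not spelled out on the slide; names are ours, and a real implementation would keep a and b as fixed-width integers rather than floats.

```python
def renormalize(a, b, pending, output):
    """Zooming rules for the current interval [a, b) contained in [0, 1).
    `pending` counts underflow expansions whose bits are still owed."""
    while True:
        if b <= 0.5:                          # b <= 1/2: [a,b) -> [2a, 2b), output "0"
            output.append("0" + "1" * pending)
            pending = 0
            a, b = 2 * a, 2 * b
        elif a >= 0.5:                        # a >= 1/2: [a,b) -> [2a-1, 2b-1), output "1"
            output.append("1" + "0" * pending)
            pending = 0
            a, b = 2 * a - 1, 2 * b - 1
        elif 0.25 <= a and b <= 0.75:         # 1/4 <= a < b <= 3/4: no output, count it
            pending += 1
            a, b = 2 * a - 0.5, 2 * b - 0.5
        else:
            return a, b, pending
```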

Decoding Let $r = \varphi(x_1, \ldots, x_l)$ be the real number whose 2-adic representation encodes $(x_1, \ldots, x_l) \in \mathcal{X}^l$. The letters $x_1, \ldots, x_l$ are the only ones for which
$$S(x_1) \le r < S(x_1) + p(x_1)$$
$$S(x_1, x_2) \le r < S(x_1, x_2) + p(x_1, x_2)$$
$$\vdots$$
$$S(x_1, \ldots, x_l) \le r < S(x_1, \ldots, x_l) + p(x_1, \ldots, x_l)$$
38/39

Procedure $S(x_1) \le r < S(x_1) + p(x_1)$. Suppose that $x_1$ has been found. We have
$$S(x_1, x_2) \le r < S(x_1, x_2) + p(x_1, x_2)$$
$$S(x_1) + p(x_1) S(x_2) \le r < S(x_1) + p(x_1) S(x_2) + p(x_1) p(x_2)$$
$$p(x_1) S(x_2) \le r - S(x_1) < p(x_1)\,(S(x_2) + p(x_2))$$
$$S(x_2) \le \frac{r - S(x_1)}{p(x_1)} < S(x_2) + p(x_2)$$
We therefore have to find out in which interval $r_1 = \frac{r - S(x_1)}{p(x_1)}$ lies. 39/39
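A sketch of this decoding procedure with exact rationals (names are ours): at each step find the letter whose interval contains r, output it, and rescale r as above. It recovers the word produced by the earlier encoding sketch.

```python
from fractions import Fraction

P = {"a": Fraction(1, 2), "b": Fraction(1, 6), "c": Fraction(1, 3)}
S, acc = {}, Fraction(0)
for x in sorted(P):
    S[x] = acc
    acc += P[x]

def decode(bits, l):
    """Decode l symbols from the 2-adic number r = 0.bits (exact arithmetic)."""
    r = Fraction(int(bits, 2), 1 << len(bits))
    out = []
    for _ in range(l):
        for y in sorted(P):                   # find y with S(y) <= r < S(y) + p(y)
            if S[y] <= r < S[y] + P[y]:
                out.append(y)
                r = (r - S[y]) / P[y]         # r_{i+1} = (r_i - S(y)) / p(y)
                break
    return "".join(out)

print(decode("1100001", 3))                   # recovers "cab" from the encoder sketch above
```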