Lecture 4: Adaptive source coding algorithms


Outline

1. Motivation
2. Adaptive Huffman encoding
3. Gallager and Knuth's method
4. Dictionary methods: Lempel-Ziv 78, Lempel-Ziv-Welch, Lempel-Ziv 77

1. Motivation

Huffman/arithmetic encoding needs two passes:
1. a first pass to compute the source statistics;
2. a second pass for the Huffman/arithmetic encoding itself.

Moreover, additional information is needed by the decoder: either the statistics or the encoding table must be known.

Universal source coding

Universal source coding: no assumption is made on the source.

Idea: at each step, compute a dynamic source model that could have produced the text observed so far, and use this model to compress it.

Illustrations: the adaptive Huffman algorithm (memoryless source model), the adaptive arithmetic coding algorithm, and the Lempel-Ziv algorithm and its variants (which exploit the dependencies between successive symbols).

2. Adaptive Huffman: Outline

Assume that n letters have been read so far and that they comprise K distinct letters. Let $X_n$ be the source over K + 1 symbols formed by the K letters observed so far, each with probability proportional to its number of occurrences in the text, plus a void symbol of probability 0. Compute the Huffman code tree $T_n$ of this source. The (n+1)-th letter is then read and encoded with its codeword when it exists, and with the (K+1)-th codeword followed by the ASCII code of the letter otherwise.

Adaptive Huffman: Coding

The initial Huffman tree has a single leaf corresponding to the void symbol. Each time a new letter x is read:
- if x has already been seen: print its codeword, then update the Huffman tree;
- else: print the codeword of the void symbol followed by an unencoded version of x (its ASCII code, for instance), add a leaf to the Huffman tree, then update the tree.

Adaptive Huffman: Decoding

The initial tree is formed by a single leaf corresponding to the void symbol. Until the whole encoded text has been read, walk down the tree, going left on a 0 and right on a 1, until a leaf is reached:
- if the leaf is not the void symbol: print the corresponding letter, then update the tree;
- else: read the next 8 bits, print the letter with that ASCII code, add a leaf to the tree, then update the tree.
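
A minimal runnable sketch of this adaptive scheme (in Python; an illustration, not part of the lecture). Instead of maintaining the tree with the Gallager/Knuth update presented later, it simply rebuilds a Huffman code from the current counts before each symbol, which is slow but keeps encoder and decoder synchronized while following the void-symbol escape mechanism of the two slides above. All names are hypothetical.

```python
# Naive adaptive Huffman sketch: rebuild the code from counts at every step
# (an assumption made for brevity; the lecture's update is far more efficient).
import heapq

NYT = "void"   # the void (not-yet-transmitted) symbol of the slides

def build_code(counts):
    """Deterministically build a Huffman code over the seen bytes + the void symbol."""
    heap, tie = [], 0
    for sym in sorted(counts):                        # deterministic order
        heap.append((counts[sym], tie, sym, None, None)); tie += 1
    heap.append((0, tie, NYT, None, None)); tie += 1  # void symbol, weight 0
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, (a[0] + b[0], tie, None, a, b)); tie += 1
    code = {}
    def walk(node, prefix):
        _, _, sym, left, right = node
        if left is None:                              # leaf
            code[sym] = prefix or "0"                 # tree reduced to one leaf
        else:
            walk(left, prefix + "0")
            walk(right, prefix + "1")
    walk(heap[0], "")
    return code

def adaptive_encode(data: bytes) -> str:
    counts, out = {}, []
    for byte in data:
        code = build_code(counts)
        if byte in counts:
            out.append(code[byte])                        # known letter: its codeword
        else:
            out.append(code[NYT] + format(byte, "08b"))   # escape + raw 8 bits
        counts[byte] = counts.get(byte, 0) + 1
    return "".join(out)

def adaptive_decode(bits: str) -> bytes:
    counts, out, i = {}, [], 0
    while i < len(bits):
        inv = {cw: sym for sym, cw in build_code(counts).items()}
        j = i + 1
        while bits[i:j] not in inv:                   # prefix code: unique match
            j += 1
        sym, i = inv[bits[i:j]], j
        if sym == NYT:                                # new letter: read its 8 bits
            sym = int(bits[i:i + 8], 2)
            i += 8
        out.append(sym)
        counts[sym] = counts.get(sym, 0) + 1
    return bytes(out)

msg = b"abracadabra"
assert adaptive_decode(adaptive_encode(msg)) == msg
```

The deterministic tie-breaking in build_code (sorting the seen symbols and numbering heap entries) is what guarantees that encoder and decoder derive exactly the same code from the same counts.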

Adaptive Huffman: Preliminary definitions

Identify a prefix code of a discrete source X with a binary tree with |X| leaves labeled by the letters of X. Reminder: a code is irreducible if each tree node has either 0 or 2 children.

Definition. Consider a prefix code:
- the weight of a leaf is the probability of the corresponding letter;
- the weight of an internal node is the sum of the weights of its children.

Definition. A Gallager order $u_1, \dots, u_{2K-1}$ on the nodes of an irreducible code (of a source of cardinality K) satisfies:
1. the weights of the $u_i$ are non-increasing,
2. $u_{2i}$ and $u_{2i+1}$ are brothers for all i such that $1 \le i < K$.

Example (figure): a Huffman code tree on the letters a, b, c, d, e, f, of total weight 32, with its nodes numbered according to a Gallager order.

Example (figure): the same tree with its nodes numbered according to another Gallager order.

Adaptive Huffman: Properties

Theorem 1 (Gallager). Let T be a binary tree corresponding to a prefix code of a source X. Then T is a Huffman tree of X if and only if there exists a Gallager order on the nodes of T.

Proof (T Huffman implies that T admits a Gallager order). The two codewords of minimum weight are brothers; remove them and keep only their common parent. The tree obtained is a Huffman tree which, by induction, admits a Gallager order $u_1, \dots, u_{2K-3}$. The parent appears somewhere in this sequence; take its two children and put them at the end. This gives a Gallager order $u_1, \dots, u_{2K-3}, u_{2K-2}, u_{2K-1}$.

Proof (T admits a Gallager order implies that T is a Huffman tree). If T has a Gallager order, its nodes are ordered as $u_1, \dots, u_{2K-3}, u_{2K-2}, u_{2K-1}$, where $u_{2K-2}$ and $u_{2K-1}$ are brothers, are leaves, and have minimum weight. Let $T'$ be the tree corresponding to $u_1, \dots, u_{2K-3}$. It has the Gallager order $u_1, \dots, u_{2K-3}$, so by induction it is a Huffman tree. By Huffman's algorithm, the binary tree corresponding to $u_1, \dots, u_{2K-1}$ is then a Huffman tree, since one of its nodes is the merge of $u_{2K-2}$ and $u_{2K-1}$.

Update

Proposition 1. Let $X_n$ be the source corresponding to the n-th step and let $T_n$ be the corresponding Huffman tree. Let x be the (n+1)-th letter and let $u_1, \dots, u_{2K-1}$ be the Gallager order on the nodes of $T_n$. If $x \in X_n$ and if each of the nodes $u_{i_1}, u_{i_2}, \dots, u_{i_l}$ on the path between the root and x is the first node of its weight in the Gallager order, then $T_n$ is a Huffman tree for $X_{n+1}$.

Proof: take the same Gallager order.

Adaptive Huffman: Updating the tree

Let $T_n$ be the Huffman tree at step n and let $u_1, \dots, u_{2K-1}$ be its corresponding Gallager order. Assumption: x is already in $T_n$, at leaf u.

Repeat while u is not the root:
- let ũ be the first node in the Gallager order with the same weight as u;
- exchange u and ũ in the tree, and exchange them in the Gallager order;
- increment the weight of u (weight = number of occurrences);
- set u to the parent of u.

The weight of the root is incremented as well. This algorithm is due to D. Knuth.

Example (figures): step-by-step update of the Huffman tree of total weight 32 after reading the letter d; the weight of the leaf d goes from 2 to 3, nodes are exchanged so as to restore a Gallager order, and the total weight of the updated tree becomes 33.

Adaptive Huffman: Adding a leaf

When the current letter x does not belong to the tree, the update uses the void symbol: its leaf is replaced by an internal node with two children, one for the void symbol and one for x (figure). The two new nodes are added at the end of the Gallager sequence $(u_i)$, as $u_{2K}$ and $u_{2K+1}$.

Methods based on dictionaries

Idea: maintain a dictionary of (key, string) pairs. The keys are written to the output rather than the strings; the hope is that the keys are shorter than the strings.

Lempel-Ziv 78: Outline

The Lempel-Ziv algorithm (1978) reads a text composed of symbols from an alphabet A. Assume that N symbols have been read and that a dictionary of the words seen so far has been constructed. Read the text starting from the (N+1)-th symbol until a word of length n which is not in the dictionary is found; print the index of the last seen word (which has length n-1) together with the last symbol. Add the new word (of length n) to the dictionary and start again at the (N+n+1)-th symbol.

Lempel-Ziv 78: Data structure

We need an efficient way of representing the dictionary. Useful property: when a word is in the dictionary, all of its prefixes are in it as well. Hence the dictionaries that we want to represent are |A|-ary trees. Such a representation gives a simple and efficient implementation of the two operations that are needed, namely: checking whether a word is in the tree, and adding a new word.
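
A minimal sketch of this tree (trie) representation, specialized to a binary alphabet; the class and method names are illustrative, not from the lecture.

```python
# Hypothetical trie-based LZ78 dictionary: each node is a word, its children
# extend that word by one symbol, and indices are assigned in insertion order.
class TrieNode:
    def __init__(self, index):
        self.index = index           # index of the word ending at this node
        self.children = {}           # symbol -> TrieNode

class LZ78Dictionary:
    def __init__(self):
        self.root = TrieNode(0)      # index 0: the empty word
        self.size = 1

    def walk(self, node, symbol):
        """Return the child reached by `symbol`, or None if that extension is not in the dictionary."""
        return node.children.get(symbol)

    def add(self, node, symbol):
        """Add the word of `node` extended by `symbol`, with the next free index."""
        node.children[symbol] = TrieNode(self.size)
        self.size += 1

d = LZ78Dictionary()
assert d.walk(d.root, "1") is None   # "1" is not in the dictionary yet
d.add(d.root, "1")                   # "1" gets index 1
assert d.walk(d.root, "1").index == 1
```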

Lempel-Ziv 78: Coding

Initially the dictionary contains only the empty word; its size is K = 1.

Repeat the following until the text is exhausted:
- starting at the root, walk down the tree by reading the text letters for as long as this is possible;
- let $b_1, \dots, b_n, b_{n+1}$ be the symbols read and let i, $0 \le i < K$ (K being the current size of the dictionary), be the index of the word $(b_1, \dots, b_n)$ in the dictionary;
- add $(b_1, \dots, b_n, b_{n+1})$ to the dictionary with index K;
- print the binary representation of i on $\lceil \log_2 K \rceil$ bits, followed by the symbol $b_{n+1}$.

Lempel-Ziv 78: Example

Text: 1 0 0 1 0 1 1 1 0 0 0 1 1 1 0 0

index  word  (index, symbol)  codeword
0      ε
1      1     (0, 1)           1
2      0     (0, 0)           00
3      01    (2, 1)           101
4      011   (3, 1)           111
5      10    (1, 0)           0010
6      00    (2, 0)           0100
7      11    (1, 1)           0011
8      100   (5, 0)           1010

Lempel-Ziv 78: Decoding

The dictionary is now a table which initially contains only the empty word: $M_0 = \varepsilon$ and K = 1.

Repeat until the whole encoded text has been read:
- read the next $\lceil \log_2 K \rceil$ bits of the encoded text to obtain an index i, and let $M_i$ be the word of index i in the dictionary;
- read the next symbol b;
- add a K-th entry to the table, $M_K = M_i \cdot b$, and print $M_K$.
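
The two procedures above can be summarized by the following sketch (a binary alphabet is assumed, so the extra symbol costs one clear bit, and the index is written on ⌈log2 K⌉ bits as on the coding slide). On the text of the example slide it reproduces the codewords 1, 00, 101, 111, 0010, 0100, 0011, 1010.

```python
# LZ78 coding/decoding sketch for a binary source text given as a string of '0'/'1'.
from math import ceil, log2

def lz78_encode(text: str) -> str:
    dictionary = {"": 0}                       # index 0: the empty word
    out, i = [], 0
    while i < len(text):
        word = ""
        while i < len(text) and word + text[i] in dictionary:
            word += text[i]
            i += 1
        k = len(dictionary)                    # size K before the insertion
        bits = ceil(log2(k)) if k > 1 else 0   # an index < K fits on these bits
        out.append(format(dictionary[word], "b").zfill(bits) if bits else "")
        if i < len(text):                      # the symbol that made the word new
            out.append(text[i])
            dictionary[word + text[i]] = k
            i += 1
    return "".join(out)

def lz78_decode(code: str) -> str:
    dictionary = [""]                          # M_0 = empty word
    out, i = [], 0
    while i < len(code):
        k = len(dictionary)
        bits = ceil(log2(k)) if k > 1 else 0
        idx = int(code[i:i + bits], 2) if bits else 0
        i += bits
        word = dictionary[idx]
        if i < len(code):                      # the last phrase may lack its symbol
            word += code[i]
            i += 1
        dictionary.append(word)                # M_K = M_i . b
        out.append(word)
    return "".join(out)

msg = "1001011100011100"
assert lz78_decode(lz78_encode(msg)) == msg
```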

The Welch variant

Initially, all words of length 1 are in the dictionary. Instead of printing the pair (i, b), print only i; add (i, b) to the dictionary and start reading again from symbol b.

It is slightly more efficient. It is used in the Unix compress command and in GIF87. In practice, English text is compressed by roughly a factor of 2.

Lempel-Ziv-Welch: Example

Text: 1 0 0 1 0 1 1 1 0 0 0 1 1 1 0 0 1 0 1...

Initial dictionary: 0 -> 0, 1 -> 1.

step  word read  index printed  codeword  word added (index)
1     1          1              1         10   (2)
2     0          0              00        00   (3)
3     0          0              00        01   (4)
4     10         2              010       101  (5)
5     1          1              001       11   (6)
6     11         6              110       110  (7)
7     00         3              011       000  (8)
8     01         4              0100      011  (9)
9     110        7              0111      1100 (10)
10    01         4              0100      010  (11)

Lempel-Ziv-Welch: encoding/decoding

Encoding algorithm:
1. read the text until finding a word m followed by a letter a such that m is in the dictionary but m·a is not;
2. print the index of m in the dictionary;
3. add m·a to the dictionary;
4. continue, starting again from the letter a.

Decoding algorithm:
1. read an index i;
2. print the word m of index i;
3. add m·a at the end of the dictionary, where a is the first letter of the following word (a will be known the next time a word is formed).
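
A sketch of the encoding side for a binary alphabet; the conversion of each index into bits and the decoder with its deferred dictionary entry are left out for brevity, and the names are illustrative. On the example text, the first ten indices match the table above; the final 4 comes from the truncated tail of the text.

```python
# Lempel-Ziv-Welch encoder sketch: emits the sequence of dictionary indices.
def lzw_encode(text: str, alphabet=("0", "1")):
    dictionary = {a: i for i, a in enumerate(alphabet)}   # all words of length 1
    out, i = [], 0
    while i < len(text):
        word = text[i]
        # grow the match while the extended word is still in the dictionary
        while i + len(word) < len(text) and word + text[i + len(word)] in dictionary:
            word += text[i + len(word)]
        out.append(dictionary[word])            # print only the index of m
        if i + len(word) < len(text):           # register m followed by the next letter
            dictionary[word + text[i + len(word)]] = len(dictionary)
        i += len(word)                          # continue from that letter
    return out

print(lzw_encode("1001011100011100101"))        # [1, 0, 0, 2, 1, 6, 3, 4, 7, 4, 4]
```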

Lempel-Ziv 77

This variant appeared before the previous one and is more difficult to implement. Assume that N bits have been read. Starting from the (N+1)-th bit, read the longest word (say of length n) that already occurs in the previously read text (which plays the role of the dictionary) and print the triple (i, n, b), where i is the position of such an occurrence and b is the next bit.

In practice it is implemented with a sliding window of fixed size: the dictionary is the set of words occurring in this window. It is used in gzip and zip.

Lempel-Ziv 77: Example

Text: 0 0 1 0 1 1 1 0 0 0 1 1 1 0 0 1

The text is parsed as 0 | 0 1 | 0 1 1 | 1 0 0 | 0 1 1 1 0 0 1, and the successive triples printed are

(0,0,0) (1,1,1) (2,2,1) (3,2,0) (4,6,1)
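
A naive sketch of this parsing, using the whole prefix read so far as the dictionary and a brute-force search rather than the fixed sliding window of gzip/zip; on the example text it reproduces the five triples above.

```python
# LZ77 parsing sketch: emit triples (i, n, b) with i the 1-based position of the
# longest match in the already-read prefix, n its length, and b the next bit.
def lz77_encode(text: str):
    out, pos = [], 0
    while pos < len(text):
        prefix = text[:pos]                     # everything read so far
        best_i, best_n = 0, 0
        for n in range(1, len(text) - pos):     # keep one symbol for b
            j = prefix.find(text[pos:pos + n])
            if j < 0:
                break
            best_i, best_n = j + 1, n           # record the longest match found
        out.append((best_i, best_n, text[pos + best_n]))
        pos += best_n + 1
    return out

print(lz77_encode("0010111000111001"))
# [(0, 0, '0'), (1, 1, '1'), (2, 2, '1'), (3, 2, '0'), (4, 6, '1')]
```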

Stationary Source

Definition. A source is stationary if its behavior does not change under a time shift: for all nonnegative integers n and j, and for all $(x_1, \dots, x_n) \in X^n$,
$$p_{X_1 \dots X_n}(x_1, \dots, x_n) = p_{X_{1+j} \dots X_{n+j}}(x_1, \dots, x_n).$$

Theorem 2. For all stationary sources the following limits exist and are equal:
$$H(\mathbf{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, \dots, X_n) = \lim_{n \to \infty} H(X_n \mid X_1, \dots, X_{n-1}).$$
The quantity $H(\mathbf{X})$ is called the entropy per symbol.
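
Two standard special cases, not stated on the slide, make the theorem concrete: for a memoryless source the entropy per symbol is the single-letter entropy, and for a stationary Markov chain it is the one-step conditional entropy.

```latex
% Illustration of Theorem 2 (standard special cases, not from the slides)
H(\mathbf{X}) \;=\; \lim_{n\to\infty} \tfrac{1}{n}\, H(X_1,\dots,X_n)
              \;=\; \lim_{n\to\infty} H(X_n \mid X_1,\dots,X_{n-1})
              \;=\;
\begin{cases}
  H(X_1) & \text{memoryless (i.i.d.) source},\\
  H(X_2 \mid X_1) & \text{stationary Markov chain}.
\end{cases}
```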

The fundamental property of Lempel-Ziv

Theorem 3. For every stationary and ergodic source X, the compression rate tends to $H(\mathbf{X})$ with probability 1 as the size of the text tends to infinity.