Lossless Compression
Multimedia Systems (Module 2 Lesson 2)

Summary:
- Adaptive Coding
- Adaptive Huffman Coding: Sibling Property, Update Algorithm
- Arithmetic Coding: Coding and Decoding; Issues: EOF problem, Zero-frequency problem

Sources:
- The Data Compression Book, 2nd Ed., Mark Nelson and Jean-Loup Gailly
- Introduction to Data Compression, by Khalid Sayood
- The Squeeze Page at SFU

Adaptive Coding
Motivations:
- The previous algorithms (both Shannon-Fano and Huffman) require statistical knowledge of the source, which is often not available (e.g., live audio or video).
- Even when it is available, gathering and transmitting it can be a heavy overhead.
- Higher-order models incur even more overhead. For example, a 255-entry probability table would be required for an order-0 model; an order-1 model would require 255 such probability tables. (An order-1 model considers the probabilities of occurrences of pairs of symbols.)
The solution is to use adaptive algorithms. Adaptive Huffman Coding is one such mechanism, and the one we will study; the idea of adaptiveness is, however, applicable to other compression algorithms as well.

Adaptive Coding

ENCODER:
    Initialize_model();
    do {
        c = getc( input );
        encode( c, output );
        update_model( c );
    } while ( c != eof );

DECODER:
    Initialize_model();
    while ( (c = decode( input )) != eof ) {
        putc( c, output );
        update_model( c );
    }

The key is that both encoder and decoder use exactly the same initialize_model and update_model routines.
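To make the shared-model idea concrete, here is a minimal, self-contained C sketch (an illustration, not the slides' code). It assumes a simple order-0 count model over bytes, and the encode routine here is only a stand-in that prints the probability estimate the model would hand to a real entropy coder. The decoder side would be the mirror image, driving the very same initialize_model and update_model.

    /* Minimal sketch of the adaptive-coding skeleton (assumption: an order-0
     * count model over bytes; encode() is a stand-in that only reports the
     * probability estimate the model would give a real entropy coder). */
    #include <stdio.h>

    #define ALPHABET 256

    typedef struct { unsigned count[ALPHABET]; unsigned total; } Model;

    static void initialize_model(Model *m) {
        for (int i = 0; i < ALPHABET; i++)
            m->count[i] = 1;              /* start at 1 to avoid zero frequencies */
        m->total = ALPHABET;
    }

    static void update_model(Model *m, int c) {
        m->count[c]++;
        m->total++;
    }

    static void encode(const Model *m, int c) {
        /* stand-in for a real encoder: show the current estimate for c */
        printf("symbol %3d  p = %u/%u\n", c, m->count[c], m->total);
    }

    int main(void) {
        Model m;
        int c;
        initialize_model(&m);
        while ((c = getchar()) != EOF) {  /* same shape as the ENCODER loop above */
            encode(&m, c);
            update_model(&m, c);          /* the decoder calls the same routine */
        }
        return 0;
    }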

The Sibling Property
Node numbers are assigned in such a way that:
1. A node with a higher weight has a higher node number.
2. A parent node always has a higher node number than its children.
In a nutshell, the sibling property requires that the nodes (internal and leaf) be arranged in order of increasing weight. The update procedure swaps nodes that are in violation of the sibling property. Nodes in violation are identified using the notion of a block: all nodes that have the same weight are said to belong to one block.

Flowchart of the update procedure
START. If this is the first appearance of the symbol, the NYT node gives birth to a new NYT node and a new external node for the symbol; the weights of the new external node and of the old NYT node are incremented and the node numbers adjusted; the procedure then continues from the old NYT node. Otherwise, go to the symbol's external node. From there, repeatedly: if the node's number is not the maximum in its block, switch the node with the highest-numbered node in the block; increment the node's weight; if this is the root node, STOP; otherwise go to the parent node and repeat.

The Huffman tree is initialized with a single node, known as the Not-Yet-Transmitted (NYT) or escape code. This code is sent every time a new character, one not yet in the tree, is encountered, followed by the ASCII encoding of the character. This allows the decompressor to distinguish between a code and a new character. The procedure also creates a new node for the character and a new NYT from the old NYT node. The root node will have the highest node number because it has the highest weight.

Example: Initial Huffman Tree
Counts (number of occurrences) of the four symbols already in the tree: 2, 2, 2, and 10; total weight W = 16. [Figure: the Huffman tree after these symbols have been processed, arranged in accordance with the sibling property.]
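The two numbering rules can be checked mechanically. The following small C sketch (illustrative only; the array layout and the example tree are assumptions, not from the slides) verifies both conditions of the sibling property on a tree stored as weight[] and parent[] arrays indexed by node number.

    /* Sketch: check the two sibling-property conditions on an array
     * representation of a Huffman tree. Node i has weight[i] and parent[i]
     * (0 means "no parent", i.e., the root); nodes are indexed 1..n by
     * their node number. The 9-node example tree below is hypothetical. */
    #include <stdio.h>

    #define N 9

    static int satisfies_sibling_property(const int weight[], const int parent[], int n) {
        for (int i = 1; i <= n; i++) {
            /* 1. a higher node number never carries a smaller weight */
            if (i < n && weight[i] > weight[i + 1]) return 0;
            /* 2. a parent always has a higher node number than its child */
            if (parent[i] != 0 && parent[i] <= i) return 0;
        }
        return 1;
    }

    int main(void) {
        /* leaves 1..5, internal nodes 6..9, root = 9 (weights in node order) */
        int weight[N + 1] = {0, 1, 1, 2, 2, 2, 2, 4, 4, 8};
        int parent[N + 1] = {0, 6, 6, 7, 7, 8, 8, 9, 9, 0};
        printf("sibling property holds: %s\n",
               satisfies_sibling_property(weight, parent, N) ? "yes" : "no");
        return 0;
    }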

Example (continued)
[Figure: the tree after the first appearance of a new symbol. Counts: 1 for the new symbol and 2, 2, 2, 10 for the others; root weight W = 16+1. The old NYT node has given birth to a new NYT node (weight 0) and a new external node for the symbol (weight 1); the old NYT position becomes an internal node of weight 1.]
[Figure: Increment. Counts: 1+1, 2, 2, 2, 10; root weight W = 17+1. An increment in the count for the new symbol propagates up to the root.]
[Figure: Swapping. Counts: 2+1, 2, 2, 2, 10; root weight W = 18. Another increment in the count for the new symbol results in a swap: swap nodes 1 and 5.]
[Figure: Counts: 3, 2, 2, 2, 10 after the swap, with the increment propagated to the root (W = 18+1).]

Swapping (continued)
[Figure: Counts: 3+1, 2, 2, 2, 10; root weight W = 19+1. Another increment in the count for the new symbol propagates up.]
[Figure: Counts: 4+1, 2, 2, 2, 10. Another increment in the count causes a swap of a sub-tree: swap nodes 5 and 6.]
[Figure: Counts: 4+1, 2, 2, 2, 10. Further swapping is needed to fix the tree: swap nodes 8 and 9.]

Swapping (continued)
[Figure: Counts: 5, 2, 2, 2, 10. The tree after the final swap, again satisfying the sibling property.]

Arithmetic Coding
Arithmetic coding is based on the concept of interval subdividing:
- In arithmetic coding, a source ensemble is represented by an interval between 0 and 1 on the real number line.
- Each symbol of the ensemble narrows this interval.
- As the interval becomes smaller, the number of bits needed to specify it grows.
Arithmetic coding assumes an explicit probabilistic model of the source. It uses the probabilities of the source messages to successively narrow the interval used to represent the ensemble. A high-probability message narrows the interval less than a low-probability message, so that high-probability messages contribute fewer bits to the coded ensemble.

Arithmetic Coding: Description
In the following discussion we will use:
- M: the size of the alphabet of the data source;
- N[x]: symbol x's probability;
- Q[x]: symbol x's cumulative probability, i.e., Q[i] = N[0] + N[1] + ... + N[i].
Assuming we know the probabilities of each symbol of the data source, we can allocate to each symbol an interval whose width is proportional to its probability, with no two intervals overlapping. This can be done by using the cumulative probabilities as the two ends of each interval: the two ends of the interval for symbol x are Q[x-1] and Q[x], and symbol x is said to own the range [Q[x-1], Q[x]).
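As a small illustration of these definitions, the following C sketch (not from the slides) builds the cumulative probabilities Q[] from the symbol probabilities N[] and prints the range each symbol owns. It uses the four-symbol table of the encoder example in the next section, with a zero-based index shift: Q[x]..Q[x+1] here plays the role of Q[x-1]..Q[x] in the slides' notation.

    /* Minimal sketch: build cumulative probabilities Q[] from the symbol
     * probabilities N[] and print the range each symbol owns.
     * Alphabet and probabilities are those of the encoder example below. */
    #include <stdio.h>

    #define M 4                            /* alphabet size */

    int main(void) {
        const char symbol[M] = {'A', 'B', 'C', 'D'};
        const double N[M]    = {0.4, 0.3, 0.2, 0.1};  /* symbol probabilities */
        double Q[M + 1];                   /* Q[0] = 0; Q[x+1] = N[0] + ... + N[x] */

        Q[0] = 0.0;
        for (int x = 0; x < M; x++)
            Q[x + 1] = Q[x] + N[x];

        for (int x = 0; x < M; x++)        /* symbol x owns [Q[x], Q[x+1]) here */
            printf("%c owns [%.1f, %.1f)\n", symbol[x], Q[x], Q[x + 1]);
        return 0;
    }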

Arithmetic Coding: Encoder
We begin with the interval [0,1) and subdivide it iteratively. For each symbol read, the current interval is divided according to the probabilities of the alphabet, and the sub-interval corresponding to that symbol is picked as the interval to be subdivided next. The procedure continues until all symbols in the message have been processed. Since the symbols' intervals do not overlap, each possible message is assigned a unique interval, and we can represent the message by the interval's two ends [L, H). In fact, any single value in the interval is enough as the encoded code, and usually the lower end L is selected.

Arithmetic Coding Algorithm

L = 0.0;  H = 1.0;
while ( (x = getc( input )) != EOF ) {
    R = (H - L);
    H = L + R * Q[x];
    L = L + R * Q[x-1];
}
Output(L);

R is the interval range, and L and H are the two ends of the current code interval; x is the new symbol to be encoded. L and H are initialized to 0 and 1 respectively.

Arithmetic Coding: Encoder example

Symbol x    Probability N[x]    [Q[x-1], Q[x])
A           0.4                 [0.0, 0.4)
B           0.3                 [0.4, 0.7)
C           0.2                 [0.7, 0.9)
D           0.1                 [0.9, 1.0)

String: BCAB

Step        L         H
(start)     0         1
B           0.4       0.7
C           0.61      0.67
A           0.61      0.634
B           0.6196    0.6268

Code sent: 0.6196 (the lower end of the final interval).
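The algorithm above is short enough to run directly. The following self-contained C sketch (an illustration, assuming the A-D labels of the table above) reproduces the BCAB trace, printing L and H after each symbol and the final code; ordinary doubles stand in for the fixed-point arithmetic a production coder would use.

    /* Minimal sketch of the encoding loop above, run on the example string
     * BCAB with the assumed table A:0.4, B:0.3, C:0.2, D:0.1. */
    #include <stdio.h>

    static const double Q[5] = {0.0, 0.4, 0.7, 0.9, 1.0};  /* cumulative probabilities */

    int main(void) {
        const char *message = "BCAB";
        double L = 0.0, H = 1.0;

        for (const char *p = message; *p; p++) {
            int x = *p - 'A';              /* map 'A'..'D' to 0..3 */
            double R = H - L;              /* width of the current interval */
            H = L + R * Q[x + 1];          /* Q[x]   in the slides' notation */
            L = L + R * Q[x];              /* Q[x-1] in the slides' notation */
            printf("%c: L = %.4f  H = %.4f\n", *p, L, H);
        }
        printf("code sent: %.4f\n", L);    /* 0.6196, the lower end */
        return 0;
    }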

Decoding Algorithm
When decoding, the code value v is located within the current code interval to find the symbol x such that Q[x-1] <= v < Q[x]. The procedure iterates until all symbols have been decoded.

v = input_code();
for (;;) {
    x = find_symbol_straddling_this_range( v );
    putc( x, output );
    R = Q[x] - Q[x-1];
    v = (v - Q[x-1]) / R;
}

Decoder example (for the code 0.6196 above):

v         Output char x    Q[x-1]    Q[x]    R
0.6196    B                0.4       0.7     0.3
0.732     C                0.7       0.9     0.2
0.16      A                0.0       0.4     0.4
0.4       B                0.4       0.7     0.3

After the last symbol, v = 0.0.

Arithmetic Coding: Issues

The zero-frequency problem: no symbol's predicted probability may be zero, or its interval collapses to zero width and interval renormalization fails. This is called the zero-frequency problem. Models that adapt online may run into it when decaying (scaling down) their counts.

The EOF problem: assume we pick the lower end of the interval as the encoded code. Two messages may then yield the same code if one message is identical to the other except for a finite number of repetitions of the first symbol (first in the table, not first in the sequence) appended as a suffix. For example, A, AA, and AAA all have the same lower interval end but different upper ends (try it). The simplest solution is to let the decoder know the length of the encoded message: the message size may be fixed, or it can be transmitted first. However, this is not feasible if the data size is not known beforehand, such as for live broadcast data, or when it is too costly to transmit, such as for tapes whose size is unknown at the beginning. Another solution is to introduce a special EOF symbol into the alphabet. This symbol takes a small interval and is used only at the end of the message; when the decoder detects the EOF symbol, it knows the end of the message has been reached.
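A matching C sketch of the decoding loop is below (again illustrative, with the same assumed A-D table). It stops after a known message length of four symbols, which is exactly the first workaround for the EOF problem described above, and the range lookup uses a tiny tolerance because the true value of v on the last step (0.4) sits exactly on a range boundary, where floating-point rounding could otherwise tip it into the wrong range.

    /* Minimal sketch of the decoding loop: recovers BCAB from the code 0.6196
     * using the assumed table A:0.4, B:0.3, C:0.2, D:0.1. The message length
     * (4 symbols) is taken as known, sidestepping the EOF problem. */
    #include <stdio.h>

    static const double Q[5] = {0.0, 0.4, 0.7, 0.9, 1.0};  /* cumulative probabilities */
    static const char symbol[4] = {'A', 'B', 'C', 'D'};

    /* Find x with Q[x] <= v < Q[x+1] (Q[x-1] <= v < Q[x] in the slides' notation).
     * The tolerance keeps rounding error from flipping a value that is
     * mathematically on a range boundary (as happens on the last symbol here). */
    static int find_symbol_straddling_this_range(double v) {
        const double eps = 1e-9;
        int x = 0;
        while (x < 3 && v >= Q[x + 1] - eps)
            x++;
        return x;
    }

    int main(void) {
        double v = 0.6196;                 /* the received code */
        for (int i = 0; i < 4; i++) {      /* known message length */
            int x = find_symbol_straddling_this_range(v);
            putchar(symbol[x]);
            double R = Q[x + 1] - Q[x];    /* width of symbol x's range */
            v = (v - Q[x]) / R;            /* rescale v for the next symbol */
        }
        putchar('\n');                     /* prints BCAB */
        return 0;
    }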