Introduction to information theory and data compression

Adel Magra, Emma Gouné, Irène Woo
March 18, 2017

This is the augmented transcript of a lecture given by Luc Devroye on March 9th, 2017 for a Data Structures and Algorithms class (COMP 252). Data compression involves encoding information using fewer bits than the original representation.

Information Theory

Information theory is the study of the quantification, storage, and communication of information [Cover and Thomas, 2006]. Claude Shannon developed the mathematical theory that describes the basic aspects of communication systems. It is concerned with the construction and study of mathematical models using probability theory. In 1948, Shannon published his paper "A Mathematical Theory of Communication" in the Bell System Technical Journal [Shannon, 1948]. The paper provided a blueprint for the digital age.

Figure 1 illustrates a general communication system as Shannon proposed in his paper.

[Figure 1: Communication system diagram]

If an input sequence A is compressed into an output sequence B, we can calculate the compression ratio C as:

    C = length(B) / length(A).

Shannon's Theory

Shannon imagined that every possible input sequence i that may have to be compressed has a given probability p_i, where the p_i's sum to one. So if we transform the i-th input sequence into one having l_i bits, the expected length of the output bit sequence is Σ_i p_i l_i. One can reverse engineer a transformation (or compression) algorithm and construct a binary tree that maps every binary output back to an input. It is similar to the old decision tree we saw when arguing about lower bounds. Leaves correspond to possible inputs.

What matters is to find a compression method that minimizes Σ_i p_i l_i. Theoretically, this can be done by finding the Huffman tree using the Hu-Tucker algorithm (see the section "Practice"). Since the number of possible inputs is incredibly large, one cannot possibly use Huffman to actually do it. In addition, p_i is generally unknown.
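To make these quantities concrete, here is a minimal Python sketch (not part of the lecture): it computes the expected output length Σ_i p_i l_i for a hypothetical source, and the compression ratio that results if every symbol of A originally occupied 8 bits; all numbers are made up for illustration.

    # Minimal sketch (illustrative values only): expected output length and
    # the resulting compression ratio when each input symbol of A uses 8 bits.
    p = [0.5, 0.25, 0.125, 0.125]   # hypothetical probabilities p_i, summing to one
    l = [1, 2, 3, 3]                # hypothetical codeword lengths l_i, in bits

    expected_length = sum(pi * li for pi, li in zip(p, l))
    print(expected_length)          # sum_i p_i l_i = 1.75 bits per symbol

    C = expected_length / 8         # length(B) / length(A), per symbol
    print(C)                        # 0.21875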

However, one can still learn things from Shannon about Σ_i p_i l_i. His main theorem is that:

    E + 1 ≥ min Σ_i p_i l_i ≥ E,

where E is the binary entropy, Σ_i p_i log2(1/p_i), and the minimum is over all possible binary trees (and thus, all compression algorithms). As a special case, putting p_i = 1/n, where n is the total number of possible answers for a particular algorithmic problem (such as sorting), we rediscover the decision tree lower bound seen earlier in the course: the expected number of binary oracle comparisons must be at least E = log2(n). We will prove Shannon's theorem in the next section.

Entropy (symbol E)

In information theory, entropy (E) is a number defined to be the measure of the average information content delivered by a message. It measures the unpredictability of the outcome. The binary entropy is defined by

    E = Σ_i p_i log2(1/p_i) ≥ 0,

where the p_i's are the probabilities of the input sequences. We will prove that

    E + 1 ≥ min Σ_i p_i l_i ≥ E,

where the minimum is over all binary trees. Recall Kraft's inequality, which is valid for all binary trees:

    Σ_i 2^(-l_i) ≤ 1.

Remark: The converse of Kraft's inequality is also true, i.e., given numbers l_1, l_2, ... with Σ_i 2^(-l_i) ≤ 1, there exists a binary tree such that its leaves have those depths.

We first show: Σ_i p_i l_i ≥ E. Observe that:

    Σ_i p_i l_i = Σ_i p_i log2(2^(l_i))
                = Σ_i p_i log2(2^(l_i) p_i (1/p_i))
                = Σ_i p_i log2(2^(l_i) p_i) + Σ_i p_i log2(1/p_i)
                = Σ_i p_i log2(2^(l_i) p_i) + E.

Now,

    Σ_i p_i log2(1/(2^(l_i) p_i)) ≤ log2(e) Σ_i p_i (1/(2^(l_i) p_i) - 1)
                                  = log2(e) (Σ_i 2^(-l_i) - 1)
                                  ≤ 0,

since log(x) ≤ x - 1 and by Kraft's inequality. Thus Σ_i p_i log2(2^(l_i) p_i) ≥ 0, and clearly Σ_i p_i l_i ≥ E.

Now we show: E + 1 ≥ Σ_i p_i l_i for the so-called Shannon-Fano code. In this code, we take l_i = ceil(log2(1/p_i)). We have 2^(-l_i) ≤ p_i, so Σ_i 2^(-l_i) ≤ Σ_i p_i = 1. Thus, by the converse of Kraft's inequality, there exists a code that has length l_i for input i. That is the Shannon-Fano code. Now,

    Σ_i p_i l_i = Σ_i p_i ceil(log2(1/p_i)) ≤ Σ_i p_i log2(1/p_i) + Σ_i p_i = E + 1.

So we are done. In conclusion, E, measured in "bits," corresponds to how well one can hope to compress a file given the assumption on the p_i's.
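As a quick numerical check (a sketch, not from the lecture), the following Python fragment computes E and the Shannon-Fano lengths l_i = ceil(log2(1/p_i)) for a hypothetical distribution, and verifies both Shannon's bounds and Kraft's inequality:

    import math

    p = [0.4, 0.3, 0.2, 0.1]                          # hypothetical probabilities

    E = sum(pi * math.log2(1 / pi) for pi in p)       # binary entropy
    l = [math.ceil(math.log2(1 / pi)) for pi in p]    # Shannon-Fano lengths

    expected = sum(pi * li for pi, li in zip(p, l))   # sum_i p_i l_i
    kraft = sum(2 ** (-li) for li in l)               # sum_i 2^(-l_i)

    print(E, expected)             # about 1.85 and 2.4
    assert E <= expected <= E + 1  # Shannon's bounds
    assert kraft <= 1              # Kraft's inequality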

Practice

In practice, we will compress either symbols or small chunks of symbols. There is a separation problem on the part of the receiver: how do we separate a bit sequence if we transform each symbol in the input, symbol per symbol, into a small bit sequence? Indeed, when we send an encoded string, a concatenation of "codewords," the receiver might not be able to correctly parse the string in order to decode each string portion, or codeword.

Example 1. Suppose we want to send a sequence of integers such as 8 23. Its standard bit representation is 1000 and 10111. Sending the sequence 100010111 is not very useful. Where does the first portion start or end? How large is the portion? A bit number always starts with a 1, hence one can send a prefix that starts with a 0 and indicates the length of the codeword. We then have (spaces added only for readability):

    0 100 1000 0 101 10111

Another method is to add a prefix of 0's of the same length as the codeword. We have:

    0000 1000 00000 10111

Still, this method is inconvenient and slightly wasteful.
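Both schemes of Example 1 can be sketched in a few lines of Python (the function names are made up, and the first scheme is read here as a 0, then the length in binary, then the number itself):

    def encode_length_prefix(n):
        # a 0, then the length of bin(n) in binary, then bin(n) itself
        word = format(n, "b")                  # e.g. 8 -> "1000"
        return "0" + format(len(word), "b") + word

    def encode_zero_run_prefix(n):
        # as many 0's as bin(n) has bits, then bin(n) itself
        word = format(n, "b")
        return "0" * len(word) + word

    print(encode_length_prefix(8), encode_length_prefix(23))      # 01001000 010110111
    print(encode_zero_run_prefix(8), encode_zero_run_prefix(23))  # 00001000 0000010111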

Fixed-width coding, as for example in the standard 8-bits-per-character coding, gives another solution. It can be vastly improved. A first such improvement is (variable-width) prefix coding. A codeword assigns a sequence of bits to a symbol. A code is a set of codewords. One can picture a code as a trie (defined below). In a prefix code, each leaf uniquely corresponds to an input symbol. In this manner, the separation problem will be solved.

Tries

A trie (pronounced "try") is a tree-based data structure for storing strings in order to support fast pattern matching. The main application of a trie is information retrieval, thus it is not surprising that the name trie comes from the word retrieval.

In a binary trie, each edge represents a bit in a codeword, with an edge to a left child representing a 0 and an edge to a right child representing a 1. Each leaf is associated with a specific character. Using a trie we can develop a strategy for the coder and decoder. The coder would search for a character and then go backwards to the root while recording the path. This step can be done using pointers to the parent node. The decoder does not need a table of codewords: she/he only needs to process the code bit by bit, following it down the trie until a leaf is found, and can then go back to the root for the next codeword. For efficiency we usually send the trie along with the coded message.

For prefix coding, the sender (or coder) has a prefix coding tree and uses either a table of codewords or parent pointers in the tree to do his coding. The receiver needs the tree (which is sent in some way), and with the tree, one can easily decode the sequence symbol by symbol as leaves correspond to input symbols.

Example 2. A simple example we can make is to encode the alphabet a, b, c with bits. The leaves of the trie correspond to all the possible inputs. Here 0 maps to a, 10 maps to b and 11 maps to c.

Prefix Coding

Definition 3. Prefix coding is a coding system in which variables (words in English text) are differentiated by their prefix attribute.

We give an example with a trie and an alphabet composed of 5 letters a, b, c, d, e. Each letter is attributed a prefix code which is a proper binary sequence. Let P be the prefix code:

    P = {a: 00, b: 01, c: 10, d: 110, e: 111}.

We clearly see that P is a valid prefix code as no binary sequence is the prefix of another in P. We can view this prefix code as a code tree. In this example, the string abbba is transformed into 0001010100. This bit sequence can be uniquely interpreted and decoded back into abbba; in other words, we have solved the separation problem quite elegantly.
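Here is a small sketch (not from the lecture) of trie-based decoding for the code P above; a nested Python dictionary stands in for the code tree:

    # Prefix-code decoding for P = {a:00, b:01, c:10, d:110, e:111}.
    CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}

    def build_trie(code):
        root = {}
        for symbol, word in code.items():
            node = root
            for bit in word:
                node = node.setdefault(bit, {})
            node["symbol"] = symbol            # mark the leaf
        return root

    def decode(bits, trie):
        out, node = [], trie
        for bit in bits:
            node = node[bit]                   # follow the edge for this bit
            if "symbol" in node:               # reached a leaf
                out.append(node["symbol"])
                node = trie                    # back to the root for the next codeword
        return "".join(out)

    trie = build_trie(CODE)
    encoded = "".join(CODE[s] for s in "abbba")
    print(encoded)                             # 0001010100
    print(decode(encoded, trie))               # abbba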

If we assume a certain probability p_i on symbol i (possibly approximated by the relative frequency of symbols in general input sequences that we would like to compress), then the expected length of a codeword, which ultimately tells us about the expected length of the compressed sequence, is again Σ_i p_i l_i. We should first of all design the code by using the Huffman tree. Such codes are called Huffman codes. By Shannon's theorem, observe that for the Huffman code, Σ_i p_i l_i ≤ E + 1.

Huffman Codes

We can optimize a prefix code by taking into consideration the probability of different codewords to occur. We could then construct a Huffman tree [Cormen et al., 2009]. The Huffman coding algorithm constructs a solution step by step by picking the locally optimal choice. It is called a greedy algorithm. Given a fixed tree with leaf distances l_i and a certain assignment of symbols to the leaves, Σ_i p_i l_i is minimized by placing the symbols i and j with the smallest p values furthest from the root. Therefore, since single-child nodes are obviously suboptimal, the optimal tree has i and j as children of an internal node. This permits us to create one internal node and reduce the problem by one. The algorithm proceeds in a series of rounds.

Algorithm: First make each of the distinct characters of the string to encode the root node of a single-node binary tree. In each round, take the two binary trees with the smallest frequencies and merge them into a single binary tree. Repeat this process until only one tree is left.

Example 4. Let us show the algorithm with the following alphabet and probabilities p_i.
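Since the table of probabilities used in the lecture is not reproduced in this transcript, here is a small worked run with made-up values a: 0.35, b: 0.25, c: 0.20, d: 0.12, e: 0.08. Each round merges the two trees of smallest total probability:

    d (0.12) + e (0.08)      -> de   (0.20)
    c (0.20) + de (0.20)     -> cde  (0.40)
    b (0.25) + a (0.35)      -> ab   (0.60)
    cde (0.40) + ab (0.60)   -> root (1.00)

The resulting code lengths are 2 bits for a, b, c and 3 bits for d, e, so Σ_i p_i l_i = 0.35·2 + 0.25·2 + 0.20·2 + 0.12·3 + 0.08·3 = 2.2 bits, which indeed lies between E ≈ 2.15 and E + 1.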

As seen in the previous lecture (March 7, 2017), there is a complete algorithm for building a Huffman tree using binary heaps. Let us recall the method we used. The Huffman tree has n leaves and n - 1 internal nodes, for 2n - 1 nodes in total. We build the Huffman tree by filling the internal nodes with left and right children. We use the Hu-Tucker algorithm, which uses a priority queue (H).

Hu-Tucker(n symbols with key i and probability p_i are given)
    MAKE_EMPTY_PRIORITY_QUEUE(H)
    for i from 1 to n do                (to insert the leaves first)
        LEFT[i]  = 0
        RIGHT[i] = 0
        INSERT((p_i, i), H)
    for i from n+1 to 2n-1 do           (to implement the internal nodes)
        (p_a, a) = DELETEMIN(H)
        (p_b, b) = DELETEMIN(H)
        LEFT[i]  = a
        RIGHT[i] = b
        INSERT((p_a + p_b, i), H)

This algorithm outputs the Huffman tree. The root is node 2n - 1. Left and right children of nodes are stored in the arrays LEFT and RIGHT. The construction of a Huffman tree takes O(n log n) time.

Finally, the entropy tells us how well we can do. For example, E depends upon the language when the input consists of long texts.
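The same construction can be sketched in runnable Python, with the standard heapq module playing the role of the priority queue H; the arrays LEFT and RIGHT mirror the pseudocode above, and the probabilities are the made-up values from the worked example:

    import heapq

    def huffman_tree(p):
        """Build a Huffman tree for probabilities p[1..n] (index 0 unused).
        Returns (LEFT, RIGHT, root), with children arrays indexed 1..2n-1."""
        n = len(p) - 1
        LEFT = [0] * (2 * n)
        RIGHT = [0] * (2 * n)
        prob = p[:] + [0.0] * (n - 1)          # probabilities of internal nodes too
        H = []
        for i in range(1, n + 1):              # insert the leaves first
            heapq.heappush(H, (p[i], i))
        for i in range(n + 1, 2 * n):          # create the internal nodes
            pa, a = heapq.heappop(H)
            pb, b = heapq.heappop(H)
            LEFT[i], RIGHT[i] = a, b
            prob[i] = pa + pb
            heapq.heappush(H, (prob[i], i))
        return LEFT, RIGHT, 2 * n - 1          # the root is node 2n-1

    def code_lengths(LEFT, RIGHT, root):
        """Depth of every leaf, obtained by walking down from the root."""
        lengths, stack = {}, [(root, 0)]
        while stack:
            node, depth = stack.pop()
            if LEFT[node] == 0:                # a leaf
                lengths[node] = depth
            else:
                stack.append((LEFT[node], depth + 1))
                stack.append((RIGHT[node], depth + 1))
        return lengths

    p = [0.0, 0.35, 0.25, 0.20, 0.12, 0.08]    # symbols 1..5, made-up probabilities
    LEFT, RIGHT, root = huffman_tree(p)
    l = code_lengths(LEFT, RIGHT, root)
    print(sum(p[i] * l[i] for i in l))         # expected length: 2.2 bits here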

Final Remarks

Improvements are possible by grouping input symbols in groups of two, three, or more. One can also employ adaptive Huffman coding, where the code is changed as a text is being processed (and the frequencies of the symbols change).

References

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, Cambridge, MA, 2009.

Claude E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379-423, 623-656, 1948.

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 2nd edition, New York, 2006.