Algorithms in the Real World: Compression in the Real World

15-853: Algorithms in the Real World. Data Compression: Lectures 1 and 2.

Compression in the Real World

Generic File Compression
Files: gzip (LZ77), bzip (Burrows-Wheeler), BOA (PPM)
Archivers: ARC (LZW), PKZip (LZW+)
File systems: NTFS

Communication
Fax: ITU-T Group 3 (run-length + Huffman)
Modems: V.42bis protocol (LZW), MNP5 (run-length + Huffman)
Virtual Connections

Multimedia
Images: gif (LZW), jbig (context), jpeg-ls (residual), jpeg (transform + RL + arithmetic)
TV: HDTV (mpeg-4)
Sound: mp3

Other structures
Indexes: Google, Lycos
Meshes (for graphics): edgebreaker
Graphs
Databases

Compression Outline
Introduction: Lossless vs. lossy, model and coder, benchmarks
Information Theory: Entropy, etc.
Probability Coding: Huffman + Arithmetic Coding
Applications of Probability Coding: PPM + others
Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
Other Lossless Algorithms: Burrows-Wheeler
Lossy algorithms for images: JPEG, MPEG, ...
Compressing graphs and meshes: BBK

Encoding/Decoding
We will use "message" in a generic sense to mean the data to be compressed.
Input Message -> Encoder -> Compressed Message -> Decoder -> Output Message
The encoder and decoder need to understand a common compressed format.

Lossless vs. Lossy
Lossless: Input message = Output message
Lossy: Input message ≠ Output message
Lossy does not necessarily mean loss of quality. In fact the output could be better than the input:
Drop random noise in images (dust on lens)
Drop background in music
Fix spelling errors in text, put it into better form
"Writing is the art of lossy text compression."

How much can we compress?
For lossless compression, assuming all input messages are valid, if even one string is compressed, some other must expand.

Model vs. Coder
To compress we need a bias on the probability of messages. The model determines this bias.
Messages -> Model -> Probabilities -> Coder -> Bits
Example models:
Simple: character counts, repeated strings
Complex: models of a human face
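
To make the counting argument concrete, here is a tiny sketch (not from the slides): there are 2^n distinct n-bit messages but only 2^n - 1 bit strings that are strictly shorter, so no lossless encoder can shrink every n-bit input.

```python
# Pigeonhole behind "if even one string is compressed, some other must expand".
n = 8
inputs = 2 ** n                                   # distinct n-bit messages: 256
shorter_outputs = sum(2 ** k for k in range(n))   # strings of length 0..n-1: 255
print(inputs, shorter_outputs)                    # one input always has nowhere shorter to go
```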

Quality of Compression
Runtime vs. compression vs. generality.
Several standard corpuses are used to compare algorithms, e.g. the Calgary Corpus: 2 books, 5 papers, 1 bibliography, 1 collection of news articles, 3 programs, 1 terminal session, 2 object files, 1 geophysical data set, 1 bitmap b/w image.
The Archive Comparison Test maintains a comparison of just about all algorithms publicly available.

Comparison of Algorithms
Program  Algorithm  Time     BPC   Score
RK       LZ + PPM   111+115  1.79  430
BOA      PPM Var.   94+97    1.91  407
PPMD     PPM        11+20    2.07  265
IMP      BW         10+3     2.14  254
BZIP     BW         20+6     2.19  273
GZIP     LZ77 Var.  19+5     2.59  318
LZ77     LZ77       ?        3.94  ?

Compression Outline
Introduction: Lossy vs. lossless, benchmarks, ...
Information Theory: Entropy, conditional entropy, entropy of the English language
Probability Coding: Huffman + Arithmetic Coding
Applications of Probability Coding: PPM + others
Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
Other Lossless Algorithms: Burrows-Wheeler
Lossy algorithms for images: JPEG, MPEG, ...
Compressing graphs and meshes: BBK

Information Theory
An interface between modeling and coding.
Entropy: a measure of information content.
Conditional Entropy: information content based on a context.
Entropy of the English Language: how much information does each character in typical English text contain?

Entropy (Shannon 1948)
For a set of messages S with probability p(s), s ∈ S, the self information of s is
i(s) = log(1/p(s)) = -log p(s)
Measured in bits if the log is base 2. The lower the probability, the higher the information.
Entropy is the weighted average of self information:
H(S) = Σ_{s∈S} p(s) log(1/p(s))

Entropy Example
p(S) = {0.25, 0.25, 0.25, 0.125, 0.125}
H(S) = 3 × 0.25 log 4 + 2 × 0.125 log 8 = 2.25
p(S) = {0.5, 0.125, 0.125, 0.125, 0.125}
H(S) = 0.5 log 2 + 4 × 0.125 log 8 = 2
p(S) = {0.75, 0.0625, 0.0625, 0.0625, 0.0625}
H(S) = 0.75 log(4/3) + 4 × 0.0625 log 16 ≈ 1.3

Conditional Entropy
The conditional probability p(s|c) is the probability of s in a context c. The conditional self information is
i(s|c) = log(1/p(s|c)) = -log p(s|c)
The conditional information can be either more or less than the unconditional information.
The conditional entropy is the weighted average of the conditional self information:
H(S|C) = Σ_{c∈C} p(c) Σ_{s∈S} p(s|c) log(1/p(s|c))

Example of a Markov Chain
Two states w and b with transition probabilities p(w|w) = 0.9, p(b|w) = 0.1, p(w|b) = 0.2, p(b|b) = 0.8.
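
The entropy figures above are easy to verify numerically. A minimal sketch (not from the slides); for the Markov chain it weights each context by its stationary probability (w: 2/3, b: 1/3), which is an assumption since the slide only draws the chain.

```python
from math import log2

def H(ps):
    # Entropy in bits: weighted average of the self information -log2 p.
    return sum(p * log2(1 / p) for p in ps if p > 0)

print(H([0.25, 0.25, 0.25, 0.125, 0.125]))        # 2.25
print(H([0.5, 0.125, 0.125, 0.125, 0.125]))       # 2.0
print(H([0.75, 0.0625, 0.0625, 0.0625, 0.0625]))  # ~1.31

# Conditional entropy H(S|C) of the two-state chain, contexts weighted by
# the stationary distribution (assumed here): pi(w) = 2/3, pi(b) = 1/3.
contexts = [(2 / 3, [0.9, 0.1]), (1 / 3, [0.8, 0.2])]
print(sum(pc * H(cond) for pc, cond in contexts))  # ~0.55
```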

Entropy of the English Language
How can we measure the information per character?
ASCII code = 7
Entropy = 4.5 (based on character probabilities)
Huffman codes (average) = 4.7
Unix Compress = 3.5
Gzip = 2.6
Bzip = 1.9
Entropy = 1.3 (for "text compression test")
Must be less than 1.3 for the English language.

Shannon's Experiment
Asked humans to predict the next character given the whole previous text. He used these as conditional probabilities to estimate the entropy of the English language. The number of guesses required for the right answer:

# of guesses   1    2    3    4    5    >5
Probability    .79  .08  .03  .02  .02  .05

From the experiment he predicted H(English) = 0.6 - 1.3.

Compression Outline
Introduction: Lossy vs. lossless, benchmarks, ...
Information Theory: Entropy, etc.
Probability Coding: Prefix codes and relationship to entropy, Huffman codes, Arithmetic codes
Applications of Probability Coding: PPM + others
Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
Other Lossless Algorithms: Burrows-Wheeler
Lossy algorithms for images: JPEG, MPEG, ...
Compressing graphs and meshes: BBK

Assumptions and Definitions
Communication (or a file) is broken up into pieces called messages.
Each message comes from a message set S = {s_1, ..., s_n} with a probability distribution p(s). Probabilities must sum to 1. The set can be infinite.
Code C(s): a mapping from a message set to codewords, each of which is a string of bits.
Message sequence: a sequence of messages.
Note: adjacent messages might be of different types and come from different probability distributions.
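
As a rough sanity check (this is not Shannon's actual bound derivation, which brackets the entropy from both sides), the entropy of the guess distribution alone already lands inside the quoted range:

```python
from math import log2

guess_probs = [0.79, 0.08, 0.03, 0.02, 0.02, 0.05]   # guesses 1, 2, 3, 4, 5, >5
H_guess = sum(p * log2(1 / p) for p in guess_probs)
print(round(H_guess, 2))                              # ~1.15 bits, within 0.6 - 1.3
```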

Discrete or Blended
We will consider two types of coding:
Discrete: each message is a fixed set of bits (Huffman coding, Shannon-Fano coding). For example, messages 1, 2, 3, 4 each get their own codeword.
Blended: bits can be shared among messages (Arithmetic coding). For example, a single bit string codes messages 1, 2, 3, and 4 together.

Uniquely Decodable Codes
A variable length code assigns a bit string (codeword) of variable length to every message value.
e.g. a = 1, b = 01, c = 101, d = 011
What if you get the sequence of bits 1011? Is it aba, ca, or ad?
A uniquely decodable code is a variable length code in which bit strings can always be uniquely decomposed into its codewords.

Prefix Codes
A prefix code is a variable length code in which no codeword is a prefix of another word.
e.g. a = 0, b = 110, c = 111, d = 10
All prefix codes are uniquely decodable.

Prefix Codes: as a tree
A prefix code can be viewed as a binary tree with message values at the leaves and 0s or 1s on the edges. For the code a = 0, b = 110, c = 111, d = 10: a hangs off the 0 edge of the root, d off 1-0, b off 1-1-0, and c off 1-1-1.
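
A brute-force sketch (assuming the codewords reconstructed above) that shows why the first code is not uniquely decodable while the prefix code is:

```python
def parses(bits, code):
    # All ways to split `bits` into codewords of `code` (exponential brute force).
    if not bits:
        return [""]
    out = []
    for sym, cw in code.items():
        if bits.startswith(cw):
            out += [sym + rest for rest in parses(bits[len(cw):], code)]
    return out

ambiguous = {'a': '1', 'b': '01', 'c': '101', 'd': '011'}
print(parses("1011", ambiguous))       # ['aba', 'ad', 'ca'] -- three decodings

prefix = {'a': '0', 'b': '110', 'c': '111', 'd': '10'}
print(parses("0110100111", prefix))    # ['abdac'] -- exactly one, as for any prefix code
```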

Some Prefix Codes for Integers

n   Binary   Unary    Gamma
1   ..001    0        0
2   ..010    10       100
3   ..011    110      101
4   ..100    1110     11000
5   ..101    11110    11001
6   ..110    111110   11010

(Gamma: the unary code for the number of binary digits of n, followed by those digits with the leading 1 removed.)
Many other fixed prefix codes: Golomb, phased-binary, subexponential, ...

Average Length
For a code C with associated probabilities p(c) the average length is defined as
l_a(C) = Σ_{c∈C} p(c) l(c)
where l(c) = length of the codeword c (a positive integer).
We say that a prefix code C is optimal if for all prefix codes C', l_a(C) ≤ l_a(C').

Relationship to Entropy
Theorem (lower bound): For any probability distribution p(S) with associated uniquely decodable code C,
H(S) ≤ l_a(C)
Theorem (upper bound): For any probability distribution p(S) with associated optimal prefix code C,
l_a(C) ≤ H(S) + 1

Kraft-McMillan Inequality
Theorem (Kraft-McMillan): For any uniquely decodable code C,
Σ_{c∈C} 2^(-l(c)) ≤ 1
Also, for any set of lengths L such that Σ_{l∈L} 2^(-l) ≤ 1, there is a prefix code C such that l(c_i) = l_i (i = 1, ..., |L|).
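
A sketch of the unary and gamma constructions behind the table (assuming the conventions above: unary(n) is n-1 ones followed by a zero), plus a Kraft-McMillan check on the resulting codeword lengths:

```python
def unary(n):
    # Unary code used in the table: n-1 ones followed by a terminating zero.
    return "1" * (n - 1) + "0"

def gamma(n):
    # Gamma code: unary code for the number of binary digits, then those digits minus the leading 1.
    b = bin(n)[2:]
    return unary(len(b)) + b[1:]

for n in range(1, 7):
    print(n, unary(n), gamma(n))

# Kraft-McMillan sum for the gamma codewords of 1..6: must be <= 1 for a prefix code.
print(sum(2.0 ** -len(gamma(n)) for n in range(1, 7)))   # 0.84375
```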

Proof of the Upper Bound (Part 1)
Assign each message a length: l(s) = ⌈log(1/p(s))⌉. We then have
Σ_{s∈S} 2^(-l(s)) = Σ_{s∈S} 2^(-⌈log(1/p(s))⌉) ≤ Σ_{s∈S} 2^(-log(1/p(s))) = Σ_{s∈S} p(s) = 1
So by the Kraft-McMillan inequality there is a prefix code with lengths l(s).

Proof of the Upper Bound (Part 2)
Now we can calculate the average length given l(s):
l_a(S) = Σ_{s∈S} p(s) l(s) = Σ_{s∈S} p(s) ⌈log(1/p(s))⌉ ≤ Σ_{s∈S} p(s) (1 + log(1/p(s))) = 1 + Σ_{s∈S} p(s) log(1/p(s)) = 1 + H(S)
And we are done.

Another Property of Optimal Codes
Theorem: If C is an optimal prefix code for the probabilities {p_1, ..., p_n} then p_i > p_j implies l(c_i) ≤ l(c_j).
Proof (by contradiction): Assume l(c_i) > l(c_j). Consider switching codewords c_i and c_j. If l_a is the average length of the original code, the length of the new code is
l_a' = l_a + p_j (l(c_i) - l(c_j)) + p_i (l(c_j) - l(c_i)) = l_a + (p_j - p_i)(l(c_i) - l(c_j)) < l_a
This contradicts the assumption that l_a was optimal.

Huffman Codes
Invented by Huffman as a class assignment in 1950.
Used in many, if not most, compression algorithms: gzip, bzip, jpeg (as option), fax compression, ...
Properties:
Generates optimal prefix codes
Cheap to generate codes
Cheap to encode and decode
l_a = H if probabilities are powers of 2
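
The construction in the proof can be checked numerically: assign each message the length ⌈log(1/p(s))⌉, confirm the Kraft sum stays at most 1, and confirm the resulting average length sits between H(S) and H(S) + 1. A minimal sketch using one of the earlier example distributions:

```python
from math import ceil, log2

p = [0.75, 0.0625, 0.0625, 0.0625, 0.0625]
lengths = [ceil(log2(1 / pi)) for pi in p]          # l(s) = ceil(log 1/p(s)) -> [1, 4, 4, 4, 4]
kraft = sum(2 ** -l for l in lengths)               # 0.75 <= 1, so such a prefix code exists
l_a = sum(pi * li for pi, li in zip(p, lengths))    # average length 1.75
H = sum(pi * log2(1 / pi) for pi in p)              # entropy ~1.31
print(kraft, H <= l_a <= H + 1)                     # 0.75 True
```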

Huffman Codes
Huffman Algorithm:
Start with a forest of trees, each consisting of a single vertex corresponding to a message s with weight p(s).
Repeat until one tree is left:
Select the two trees with minimum weight roots p_1 and p_2.
Join them into a single tree by adding a root with weight p_1 + p_2.

Example
p(a) = 0.1, p(b) = 0.2, p(c) = 0.2, p(d) = 0.5
Step 1: join a(.1) and b(.2) into a tree of weight (.3).
Step 2: join (.3) and c(.2) into a tree of weight (.5).
Step 3: join (.5) and d(.5) into the final tree of weight (1.0).
The resulting codewords are a = 000, b = 001, c = 01, d = 1.

Encoding and Decoding
Encoding: start at the leaf of the Huffman tree and follow the path to the root. Reverse the order of the bits and send.
Decoding: start at the root of the Huffman tree and take a branch for each bit received. When at a leaf, output the message and return to the root.
There are even faster methods that can process 8 or 32 bits at a time.

Huffman Codes are Optimal
Theorem: The Huffman algorithm generates an optimal prefix code.
Proof outline: induction on the number of messages n. Consider a message set S with n+1 messages.
1. Can make it so the two least probable messages of S are neighbors in the Huffman tree.
2. Replace the two messages with one message with probability p(m_1) + p(m_2), making S'.
3. Show that if S' is optimal, then S is optimal.
4. S' is optimal by induction.
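
A compact sketch of the algorithm above using a heap for the forest (not the lecture's own code). Tie-breaking may assign different 0/1 labels than the hand-drawn tree, but the codeword lengths, and hence the average length, come out the same as in the example.

```python
import heapq

def huffman(probs):
    # Forest of single-vertex trees; repeatedly join the two minimum-weight roots.
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)                           # tie-breaker so the dicts are never compared
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        joined = {s: "0" + c for s, c in left.items()}
        joined.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, tie, joined))
        tie += 1
    return heap[0][2]

print(huffman({'a': 0.1, 'b': 0.2, 'c': 0.2, 'd': 0.5}))
# codeword lengths 3, 3, 2, 1 -- matching a=000, b=001, c=01, d=1 from the example
```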

Problem with Huffman Coding
Consider a message with probability 0.999. The self information of this message is
-log(0.999) = 0.00144
If we were to send 1000 such messages we might hope to use 1000 × 0.00144 ≈ 1.4 bits. Using Huffman codes we require at least one bit per message, so we would require 1000 bits.

Arithmetic Coding: Introduction
Allows "blending" of bits in a message sequence. Only requires 3 bits for the example above.
Can bound the total bits required based on the sum of self information:
l < 2 + Σ_{i=1}^{n} s_i
Used in PPM, JPEG/MPEG (as option), DMM.
More expensive than Huffman coding, but the integer implementation is not too bad.

Arithmetic Coding: Message Intervals
Assign each probability distribution to an interval range from 0 (inclusive) to 1 (exclusive), with
f(i) = Σ_{j=1}^{i-1} p(j)
e.g. for a = 0.2, b = 0.5, c = 0.3: f(a) = 0.0, f(b) = 0.2, f(c) = 0.7.
The interval for a particular message will be called the message interval (e.g. for b the interval is [0.2, 0.7)).

Arithmetic Coding: Sequence Intervals
Code a message sequence by composing intervals. For example, for bac: b narrows [0, 1) to [0.2, 0.7), a narrows that to [0.2, 0.3), and c narrows that to [0.27, 0.3).
The final interval is [0.27, 0.3). We call this the sequence interval.
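
A short sketch (assuming the reconstructed distribution p(a) = .2, p(b) = .5, p(c) = .3) that composes message intervals exactly as above and reproduces the sequence interval for bac:

```python
p = {'a': 0.2, 'b': 0.5, 'c': 0.3}
f = {'a': 0.0, 'b': 0.2, 'c': 0.7}                 # cumulative probabilities

def sequence_interval(msg):
    # Narrow [0,1): each message moves the bottom by s*f(m) and shrinks the size by p(m).
    l, s = 0.0, 1.0
    for m in msg:
        l, s = l + s * f[m], s * p[m]
    return l, l + s

print(sequence_interval("bac"))                    # approximately (0.27, 0.3)
```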

Arithmetic Coding: Sequence Intervals
To code a sequence of messages with probabilities p_i (i = 1..n) use the following recurrences:
l_1 = f_1,  l_i = l_{i-1} + s_{i-1} × f_i   (bottom of interval)
s_1 = p_1,  s_i = s_{i-1} × p_i             (size of interval)
Each message narrows the interval by a factor of p_i.
Final interval size: s_n = Π_{i=1}^{n} p_i

Warning
Three types of interval:
message interval: interval for a single message
sequence interval: composition of message intervals
code interval: interval for a specific code used to represent a sequence interval (discussed later)

Uniquely Defining an Interval
Important property: the sequence intervals for distinct message sequences of length n will never overlap.
Therefore, specifying any number in the final interval uniquely determines the sequence.
Decoding is similar to encoding, but on each step we need to determine what the message value is and then reduce the interval.

Arithmetic Coding: Decoding Example
Decoding the number 0.49, knowing the message is of length 3 (with a = 0.2, b = 0.5, c = 0.3 as before):
0.49 lies in b's interval [0.2, 0.7), so the first message is b; within [0.2, 0.7), 0.49 lies in b's sub-interval [0.3, 0.55), so the second message is b; within [0.3, 0.55), 0.49 lies in c's sub-interval [0.475, 0.55), so the third message is c.
The message is bbc.
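
And the decoding direction, following the recipe above: at each step find which message interval contains the number, output that message, and rescale. A sketch reproducing the 0.49 example (same assumed distribution):

```python
p = {'a': 0.2, 'b': 0.5, 'c': 0.3}
f = {'a': 0.0, 'b': 0.2, 'c': 0.7}

def decode(x, n):
    # Decode n messages from a number x in the final sequence interval.
    out = []
    for _ in range(n):
        for m in p:                                # find the message interval containing x
            if f[m] <= x < f[m] + p[m]:
                out.append(m)
                x = (x - f[m]) / p[m]              # rescale x back to [0, 1)
                break
    return "".join(out)

print(decode(0.49, 3))                             # 'bbc'
```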

Representing Fractions
Binary fractional representation:
.75   = .11
1/3   = .0101...
11/16 = .1011
So how about just using the smallest binary fractional representation in the sequence interval?
e.g. [0, .33) = .01, [.33, .66) = .1, [.66, 1) = .11
But what if you receive a 1? Should we wait for another 1?

Representing an Interval
Can view binary fractional numbers as intervals by considering all completions, e.g.

code   min       max       interval
.11    .1100...  .1111...  [.750, 1.0)
.101   .1010...  .1011...  [.625, .75)

We will call this the code interval.

Code Intervals: Example
For [0, .33) = .01, [.33, .66) = .1, [.66, 1) = .11, the code intervals are [.25, .5) for .01, [.5, 1) for .1, and [.75, 1) for .11; the intervals of .1 and .11 overlap, and indeed .1 is a prefix of .11.
Note that if code intervals overlap then one code is a prefix of the other.
Lemma: If a set of code intervals do not overlap then the corresponding codes form a prefix code.

Selecting the Code Interval
To find a prefix code, find a binary fractional number whose code interval is contained in the sequence interval; e.g. the sequence interval [.61, .79) contains the code interval [.625, .75), i.e. the code .101.
Can use the fraction l + s/2 truncated to
⌈-log(s/2)⌉ = 1 + ⌈-log s⌉ bits
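
A small sketch of code intervals: the minimum completion of a bit string appends 0s and the maximum appends 1s, so the string covers a half-open interval of width 2^-length; a code works for a sequence interval exactly when its code interval is contained in it. The [.61, .79) figures are the reconstructed example values from above.

```python
def code_interval(bits):
    # All real numbers whose binary fraction begins with `bits`: from .b000... up to .b111...
    lo = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits))
    return lo, lo + 2.0 ** -len(bits)

print(code_interval("11"))      # (0.75, 1.0)
print(code_interval("101"))     # (0.625, 0.75)

def fits(bits, seq_lo, seq_hi):
    lo, hi = code_interval(bits)
    return seq_lo <= lo and hi <= seq_hi

print(fits("101", 0.61, 0.79))  # True: .101 can represent the sequence interval [.61, .79)
print(fits("1", 0.61, 0.79))    # False: [.5, 1.0) spills outside it
```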

Selecting a Code Interval: Example
[0, .33) = .001, [.33, .66) = .100, [.66, 1) = .110
e.g. for [.33, .66): l = .33, s = .33, l + s/2 = .5 = .1000..., truncated to 1 + ⌈-log s⌉ = 1 + ⌈-log(.33)⌉ = 3 bits, giving .100.
Is this the best we can do for [0, .33)?

RealArith Encoding and Decoding
RealArithEncode:
Determine l and s using the original recurrences.
Code using l + s/2 truncated to 1 + ⌈-log s⌉ bits.
RealArithDecode:
Read bits as needed so the code interval falls within a message interval, and then narrow the sequence interval.
Repeat until n messages have been decoded.

Bound on Length
Theorem: For n messages with self information {s_1, ..., s_n}, RealArithEncode will generate at most 2 + Σ_{i=1}^{n} s_i bits.
Proof:
1 + ⌈-log s⌉ = 1 + ⌈-log(Π_{i=1}^{n} p_i)⌉ = 1 + ⌈Σ_{i=1}^{n} -log p_i⌉ = 1 + ⌈Σ_{i=1}^{n} s_i⌉ < 2 + Σ_{i=1}^{n} s_i

Integer Arithmetic Coding
The problem with RealArithCode is that operations on arbitrary-precision real numbers are expensive.
Key ideas of the integer version:
Keep integers in the range [0..R) where R = 2^k.
Use rounding to generate an integer sequence interval.
Whenever the sequence interval falls into the top, bottom or middle half, expand the interval by a factor of 2.
This integer algorithm is an approximation of the real algorithm.
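
Putting the pieces together, a minimal RealArithEncode sketch using exact rationals (deliberately the expensive arbitrary-precision arithmetic that the integer version avoids). It outputs l + s/2 truncated to 1 + ⌈-log s⌉ bits, e.g. 7 bits for bac under the running distribution, which respects the 2 + Σ s_i bound:

```python
from fractions import Fraction
from math import ceil, log2

p = {'a': Fraction(1, 5), 'b': Fraction(1, 2), 'c': Fraction(3, 10)}
f = {'a': Fraction(0), 'b': Fraction(1, 5), 'c': Fraction(7, 10)}

def real_arith_encode(msg):
    l, s = Fraction(0), Fraction(1)
    for m in msg:                            # sequence-interval recurrences from earlier
        l, s = l + s * f[m], s * p[m]
    nbits = 1 + ceil(-log2(s))               # truncate l + s/2 to this many bits
    x, bits = l + s / 2, ""
    for _ in range(nbits):                   # binary expansion of x, truncated
        x *= 2
        bits += "1" if x >= 1 else "0"
        x -= int(x)
    return bits

code = real_arith_encode("bac")
print(code, len(code))                       # e.g. '0100100', 7 bits; the self information sums to ~5.06
```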

Integer Arithmetic (contracting)
The probability distribution as integers, with R = 256:
Probabilities as counts, e.g. c(a) = 11, c(b) = 7, c(c) = 30.
T is the sum of the counts, e.g. T = 48 (11 + 7 + 30).
Partial sums f as before, e.g. f(a) = 0, f(b) = 11, f(c) = 18.
Require that R > 4T so that probabilities do not get rounded to zero.
The contracting step keeps an integer interval [l, u] of size s = u - l + 1, starting from l = 0, s = R:
u_i = l_{i-1} + ⌊s_{i-1} (f_i + c_i) / T⌋ - 1
l_i = l_{i-1} + ⌊s_{i-1} f_i / T⌋

Integer Arithmetic (scaling)
If l ≥ R/2 then (in top half)
Output 1 followed by m 0s
m = 0
Scale the message interval by expanding by 2
If u < R/2 then (in bottom half)
Output 0 followed by m 1s
m = 0
Scale the message interval by expanding by 2
If l ≥ R/4 and u < 3R/4 then (in middle half)
Increment m
Scale the message interval by expanding by 2
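
A sketch of the encoder side under the reconstruction above (counts a = 11, b = 7, c = 30, so T = 48 and R = 256 > 4T). The contracting step uses the rounded recurrences and the scaling step implements the three cases, with the middle-half case deferring bits via the counter m. This is an illustrative sketch, not the lecture's reference implementation; the final flush is one common convention.

```python
R = 1 << 8                                   # R = 2^k = 256; requires R > 4T
counts = {'a': 11, 'b': 7, 'c': 30}
T = sum(counts.values())                     # 48
f, run = {}, 0
for sym, c in counts.items():                # partial sums: f(a)=0, f(b)=11, f(c)=18
    f[sym], run = run, run + c

def int_arith_encode(msg):
    l, u, m, out = 0, R - 1, 0, []           # integer interval [l, u], pending middle-half count m

    def emit(bit):
        nonlocal m
        out.append(bit)
        out.extend([1 - bit] * m)            # release the deferred opposite bits
        m = 0

    for sym in msg:
        s = u - l + 1                        # contracting step (rounded recurrences)
        u = l + (s * (f[sym] + counts[sym])) // T - 1
        l = l + (s * f[sym]) // T
        while True:                          # scaling step
            if u < R // 2:                   # bottom half: output 0 (plus m pending 1s)
                emit(0)
            elif l >= R // 2:                # top half: output 1 (plus m pending 0s)
                emit(1)
                l -= R // 2; u -= R // 2
            elif l >= R // 4 and u < 3 * R // 4:   # middle half: defer the decision
                m += 1
                l -= R // 4; u -= R // 4
            else:
                break
            l, u = 2 * l, 2 * u + 1          # expand the interval by a factor of 2
    m += 1                                   # flush enough bits to pin down a point in [l, u]
    emit(0 if l < R // 4 else 1)
    return out

print(int_arith_encode("bac"))
```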