ELEC 515 Information Theory. Distortionless Source Coding

ELEC 515 Information Theory Distortionless Source Coding 1

Source Coding [diagram: source encoder mapping source symbols to codewords over the output alphabet Y = {y_1, ..., y_J}, with codeword lengths l_1, ..., l_K] 2

Source Coding Two coding requirements: (1) the source sequence can be recovered from the encoded sequence with no ambiguity; (2) the average number of output symbols per source symbol is as small as possible. 3

Typical Sequences Consider the following discrete memoryless binary source: p(1) = 1/4 p(0) = 3/4 Sequences of 20 symbols 1. 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 2. 1,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1 3. 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 4

Tchebycheff Inequality For a random variable X with mean m and variance σ², P(|X − m| ≥ ε) ≤ σ²/ε² for any ε > 0. 5

Weak Law of Large Numbers Sequence of N i.i.d. RVs X_1, ..., X_N. Define a new RV Y = (1/N) Σ_{n=1}^{N} X_n (the sample average). 6

Weak Law of Large Numbers The sample average approaches the statistical mean: for any ε > 0, P(|Y − E[X]| ≥ ε) → 0 as N → ∞. 7

Asymptotic Equipartition Property N i.i.d. random variables X_1, ..., X_N, so p(x_1, x_2, ..., x_N) = p(x_1)p(x_2)···p(x_N). Then −(1/N) log p(X_1, X_2, ..., X_N) = −(1/N) Σ_{n=1}^{N} log p(X_n) → E[−log p(X)] = H(X) as N → ∞. 8
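
To make the AEP concrete, here is a short simulation (added for illustration, not part of the original slides) of the binary memoryless source used earlier, p(1) = 1/4 and p(0) = 3/4. It estimates the per-symbol sample information −(1/N) Σ log₂ p(X_n) for increasing N and compares it with H(X) ≈ 0.811 bits; the sequence lengths and random seed are arbitrary choices.

```python
import math
import random

# Binary memoryless source from the earlier example: p(1) = 1/4, p(0) = 3/4
p = {1: 0.25, 0: 0.75}
H = -sum(q * math.log2(q) for q in p.values())   # entropy, about 0.811 bits

random.seed(1)

def sample_information(N):
    """Draw N i.i.d. symbols and return -(1/N) * sum(log2 p(x_n))."""
    total = 0.0
    for _ in range(N):
        x = 1 if random.random() < p[1] else 0
        total -= math.log2(p[x])
    return total / N

for N in (20, 100, 1000, 100000):
    print(f"N = {N:6d}: sample information = {sample_information(N):.3f} bits "
          f"(H(X) = {H:.3f} bits)")
```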

Typical Sequences RV X where p(x_k) = p_k. Consider a sequence x of length N where x_k appears Np_k times: p(x) = p_1^{Np_1} p_2^{Np_2} ··· p_K^{Np_K} = ∏_{k=1}^{K} p_k^{Np_k} = ∏_{k=1}^{K} 2^{Np_k log_2 p_k} = 2^{N Σ_{k=1}^{K} p_k log_2 p_k} = 2^{−NH(X)}. 9

Summary The Tchebycheff inequality was used to prove the weak law of large numbers (WLLN): the sample average approaches the statistical mean as N → ∞. The WLLN was used to prove the AEP: −(1/N) Σ_{n=1}^{N} log p(X_n) → H(X) as N → ∞. A typical sequence has probability p(x_1, x_2, ..., x_N) ≈ 2^{−NH(X)}. There are about 2^{NH(X)} typical sequences of length N. 10

Typical Sequences [figure: the per-symbol sample information of a typical sequence lies close to H(X)] 11

Typical Sequences [figure: the set of typical sequences shown as a subset of the set of all sequences of length N; a sequence outside it is nontypical or atypical] 12

Interpretation Although there are very many results that may be produced by a random process, the one actually produced is most probably from a set of outcomes that all have approximately the same chance of being the one actually realized. Although there are individual outcomes which may have a higher probability than outcomes in this set, the vast number of outcomes in the set almost guarantees that the outcome will come from the set. "Almost all events are almost equally surprising" (Cover and Thomas) 13

Typical Sequences From the definition, the probability of occurrence of a typical sequence p(x) satisfies 2^{−N[H(X)+δ]} < p(x) < 2^{−N[H(X)−δ]}. 14

Example p(x 1 ) = p(1) = 1/4 p(x 2 ) = p(0) = 3/4 H(X) = 0.811 bit N = 3 p(x 1,x 1,x 1 ) = 1/64 p(x 1,x 1,x 2 ) = p(x 1,x 2,x 1 ) = p(x 2,x 1,x 1 ) = 3/64 p(x 1,x 2,x 2 ) = p(x 2,x 2,x 1 ) = p(x 2,x 1,x 2 ) = 9/64 p(x 2,x 2,x 2 ) = 27/64 15

Example H(X) = 0.811 bit, N = 3, b = 2: 2^{−3[0.811+δ]} < p(x_1, x_2, x_3) < 2^{−3[0.811−δ]}
(x_1,x_1,x_1): 1/64 = 2^{−3[0.811+1.189]}
(x_1,x_1,x_2): 3/64 = 2^{−3[0.811+0.661]}
(x_1,x_2,x_2): 9/64 = 2^{−3[0.811+0.132]}
(x_2,x_2,x_2): 27/64 = 2^{−3[0.811−0.395]} 16

Example If δ = 0.2 the typical sequences are (x 1,x 2,x 2 ), (x 2,x 1,x 2 ), (x 2,x 2,x 1 ) with probability 0.422 (1,0,0), (0,1,0), (0,0,1) If δ = 0.4 the typical sequences are (x 1,x 2,x 2 ), (x 2,x 1,x 2 ), (x 2,x 2,x 1 ), (x 2,x 2,x 2 ) with probability 0.844 (1,0,0), (0,1,0), (0,0,1), (0,0,0) 17

Typical Sequences Random variable X, alphabet size K, entropy H(X), arbitrary number δ > 0, sequences x of blocklength N ≥ N_0 and probability p(x). The typical set X(δ) and the atypical set X_c(δ) together account for all sequences: |X(δ)| + |X_c(δ)| = K^N. 19

Shannon-McMillan Theorem Given δ > 0, for N sufficiently large the sequences of length N can be divided into a typical set X(δ) and an atypical set X_c(δ) such that: (1) the probability of the atypical set is less than δ; (2) every typical sequence satisfies 2^{−N[H(X)+δ]} < p(x) < 2^{−N[H(X)−δ]}; (3) the number of typical sequences satisfies (1−δ)2^{N[H(X)−δ]} ≤ |X(δ)| ≤ 2^{N[H(X)+δ]}. 20

The essence of source coding or data compression is that as N → ∞, atypical sequences almost never appear as the output of the source. Therefore, one can focus on representing typical sequences with codewords and ignore atypical sequences. Since there are only 2^{NH(X)} typical sequences of length N, and they are approximately equiprobable, it takes about NH(X) bits to represent them. On average it takes H(X) bits per source output symbol. 21

Variable Length Codes [diagram: source encoder with output alphabet Y = {y_1, ..., y_J} and codeword lengths l_1, ..., l_K] 22

Instantaneous Codes Definition: A uniquely decodable code is said to be instantaneous if it is possible to decode each codeword in a sequence without reference to succeeding codewords. A necessary and sufficient condition for a code to be instantaneous is that no codeword is a prefix of some other codeword. 24

Variable Length Codes Prefix code (also prefix-free or instantaneous): C_1 = {0, 10, 110, 111}. Uniquely decodable code (which is not prefix): C_2 = {0, 01, 011, 0111}. Non-singular code (which is not uniquely decodable): C_3 = {0, 1, 00, 11}. Singular code: C_4 = {0, 10, 11, 10}. Example sequence of codewords: 001110100110 25
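
A small check (added here for illustration; the helper name is arbitrary) of which of the four example codes above satisfy the prefix condition, and which are singular.

```python
def is_prefix_free(code):
    """Return True if no codeword is a prefix of another (instantaneous code)."""
    for i, u in enumerate(code):
        for j, v in enumerate(code):
            if i != j and v.startswith(u):
                return False
    return True

codes = {
    "C1": ["0", "10", "110", "111"],     # prefix (instantaneous)
    "C2": ["0", "01", "011", "0111"],    # uniquely decodable but not prefix
    "C3": ["0", "1", "00", "11"],        # non-singular, not uniquely decodable
    "C4": ["0", "10", "11", "10"],       # singular (repeated codeword)
}

for name, code in codes.items():
    singular = len(set(code)) < len(code)
    print(f"{name}: prefix-free = {is_prefix_free(code)}, singular = {singular}")
```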

Average Codeword Length L(C) = Σ_{k=1}^{K} p(x_k) l_k 27

Two Binary Prefix Codes Five source symbols: x 1, x 2, x 3, x 4, x 5 K = 5, J = 2 c 1 = 0, c 2 = 10, c 3 = 110, c 4 = 1110, c 5 = 1111 codeword lengths 1,2,3,4,4 c 1 = 00, c 2 = 01, c 3 = 10, c 4 = 110, c 5 = 111 codeword lengths 2,2,2,3,3 28

Kraft Inequality for Prefix Codes A prefix code with codeword lengths l_1, ..., l_K over a code alphabet of size J exists if and only if Σ_{k=1}^{K} J^{−l_k} ≤ 1. 29

Code Tree 30

Five Binary Codes [table: five codes A through E for source symbols x_1, x_2, x_3, x_4] 31

Ternary Code Example Ten source symbols x_1, x_2, ..., x_9, x_10; K = 10, J = 3. Candidate codeword lengths: l_k = 1,2,2,2,2,2,3,3,3,3; l_k = 1,2,2,2,2,2,3,3,3; l_k = 1,2,2,2,2,2,3,3,4,4 32
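
A quick Kraft-inequality check of the three ternary length assignments above, added for illustration: the first set violates the inequality and so cannot belong to any prefix code, while the other two satisfy it.

```python
from fractions import Fraction

def kraft_sum(lengths, J):
    """Compute the Kraft sum  sum_k J**(-l_k)  exactly."""
    return sum(Fraction(1, J ** l) for l in lengths)

length_sets = [
    [1, 2, 2, 2, 2, 2, 3, 3, 3, 3],   # 10 lengths
    [1, 2, 2, 2, 2, 2, 3, 3, 3],      # 9 lengths
    [1, 2, 2, 2, 2, 2, 3, 3, 4, 4],   # 10 lengths
]

for lengths in length_sets:
    s = kraft_sum(lengths, J=3)
    print(f"lengths {lengths}: Kraft sum = {s} "
          f"({'satisfied' if s <= 1 else 'violated'})")
```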

Average Codeword Length Bound L(C) ≥ H(X) / log_b J 33

Four Symbol Source p(x_1) = 1/2, p(x_3) = 1/4, p(x_2) = p(x_4) = 1/8, H(X) = 1.75 bits. First code: x_1 → 0, x_2 → 110, x_3 → 10, x_4 → 111, L(C) = 1.75 bits. Second code: x_1 → 00, x_2 → 01, x_3 → 10, x_4 → 11, L(C) = 2 bits. 34

Code Efficiency ζ = H(X) / (L(C) log_b J) ≤ 1. First code: ζ = 1.75/1.75 = 100%. Second code: ζ = 1.75/2.0 = 87.5% 35

Compact Codes A code C is called compact for a source X if its average codeword length L(C) is less than or equal to the average length of all other uniquely decodable codes for the same source and alphabet J. 36

Upper and Lower Bounds for a Compact Code H(X)/log_b J ≤ L(C) < H(X)/log_b J + 1 37

The Shannon Algorithm Order the symbols from largest to smallest probability. Choose the codeword lengths according to l_k = ⌈−log_J p(x_k)⌉. Construct the codewords from the cumulative probabilities P_k = Σ_{i=1}^{k−1} p(x_i): codeword c_k consists of the first l_k digits of the base-J expansion of P_k. 38

Example K = 10, J = 2 p(x 1 ) = p(x 2 ) = 1/4 p(x 3 ) = p(x 4 ) = 1/8 p(x 5 ) = p(x 6 ) = 1/16 p(x 7 ) = p(x 8 ) = p(x 9 ) = p(x 10 ) = 1/32 39

[table: cumulative probabilities P_k and the resulting Shannon codewords for this example] 40

Shannon Algorithm p(x 1 ) =.4 p(x 2 ) =.3 p(x 3 ) =.2 p(x 4 ) =.1 H(X) = 1.85 bits Shannon Code x 1 00 x 2 01 x 3 101 x 4 1110 L(C) = 2.4 bits ζ = 77.1% Alternate Code x 1 0 x 2 10 x 3 110 x 4 111 L(C) = 1.9 bits ζ = 97.4% 41
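
A minimal sketch of the Shannon algorithm (added for illustration; the function name and float handling are my own), reproducing the Shannon code shown above for the probabilities 0.4, 0.3, 0.2, 0.1.

```python
import math

def shannon_code(probs):
    """Binary Shannon code: probs must be sorted in decreasing order."""
    codewords = []
    P = 0.0                                   # cumulative probability P_k
    for p in probs:
        l = math.ceil(-math.log2(p))          # codeword length l_k
        # Take the first l bits of the binary expansion of P.
        bits, frac = [], P
        for _ in range(l):
            frac *= 2
            bit, frac = divmod(frac, 1)
            bits.append(str(int(bit)))
        codewords.append("".join(bits))
        P += p
    return codewords

probs = [0.4, 0.3, 0.2, 0.1]
for p, c in zip(probs, shannon_code(probs)):
    print(f"p = {p}: codeword {c}")
# Expected: 00, 01, 101, 1110, giving an average length of 2.4 bits
```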

Shannon's Noiseless Coding Theorem By encoding blocks of N source symbols, the average codeword length per source symbol L_N(C)/N of a compact code satisfies H(X)/log_b J ≤ L_N(C)/N < H(X)/log_b J + 1/N, so it can be made arbitrarily close to H(X)/log_b J by choosing N large enough. 42

Robert M. Fano (1917-2016) 43

The Fano Algorithm Arrange the symbols in order of decreasing probability Divide the symbols into J approximately equally probable groups Each group receives one of the J code symbols as the first symbol This division process is repeated within the groups as many times as possible 44
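
A sketch of the binary Fano algorithm, added for illustration. Assuming the division step picks the split point that makes the two group probabilities as equal as possible, it reproduces the Fano code shown in the comparison that follows (0, 10, 110, 111).

```python
def fano_code(probs):
    """Binary Fano code; probs must be sorted in decreasing order."""
    codes = [""] * len(probs)

    def split(lo, hi):                       # work on symbols lo..hi-1
        if hi - lo <= 1:
            return
        total = sum(probs[lo:hi])
        best_i, best_diff, left = lo + 1, float("inf"), 0.0
        for i in range(lo + 1, hi):          # try every division point
            left += probs[i - 1]
            diff = abs(left - (total - left))
            if diff < best_diff:
                best_diff, best_i = diff, i
        for k in range(lo, hi):              # first group gets 0, second gets 1
            codes[k] += "0" if k < best_i else "1"
        split(lo, best_i)
        split(best_i, hi)

    split(0, len(probs))
    return codes

probs = [0.4, 0.3, 0.2, 0.1]
print(dict(zip(probs, fano_code(probs))))    # expected: 0, 10, 110, 111
```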

Shannon Algorithm vs Fano Algorithm p(x 1 ) =.4 p(x 2 ) =.3 p(x 3 ) =.2 p(x 4 ) =.1 H(X) = 1.85 bits Shannon Code x 1 00 x 2 01 x 3 101 x 4 1110 L(C) = 2.4 bits ζ = 77.1% Fano Code x 1 0 x 2 10 x 3 110 x 4 111 L(C) = 1.9 bits ζ = 97.4% 45

David Huffman (1925-1999) 46

"It was the most singular moment in my life. There was the absolute lightning of sudden realization." (David Huffman) "Is that all there is to it!" (Robert Fano) 47

The Binary Huffman Algorithm 1. Arrange the symbols in order of decreasing probability. 2. Assign a 1 to the last digit of the Kth codeword c_K and a 0 to the last digit of the (K−1)th codeword c_{K−1}. Note that this assignment is arbitrary. 3. Form a new source X' with x'_k = x_k, k = 1, ..., K−2, and x'_{K−1} = x_{K−1} ∪ x_K, so that p(x'_{K−1}) = p(x_{K−1}) + p(x_K). 4. Rearrange the new set of probabilities, set K = K−1. 5. Repeat Steps 2 to 4 until all symbols have been combined. To obtain the codewords, trace back to the original symbols. 48

Five Symbol Source p(x 1 )=.35 p(x 2 )=.22 p(x 3 )=.18 p(x 4 )=.15 p(x 5 )=.10 H(X) = 2.2 bits 49
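
A minimal heap-based sketch of the binary Huffman algorithm above, added for illustration. Applied to this five-symbol source it gives codeword lengths 2, 2, 2, 3, 3 and L(C) = 2.25 bits; the individual bits depend on how the 0/1 assignments are made, so they may differ from the tree drawn in class, but the lengths do not.

```python
import heapq

def huffman_code(probs):
    """Binary Huffman code for {symbol: probability}; returns {symbol: codeword}."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)       # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"x1": 0.35, "x2": 0.22, "x3": 0.18, "x4": 0.15, "x5": 0.10}
code = huffman_code(probs)
for s in probs:
    print(s, code[s])
print("L(C) =", sum(probs[s] * len(w) for s, w in code.items()), "bits")
```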

Midterm Test Tuesday, October 24, 2017. In-class test (11:30-12:20). 20 marks total, 20% of final grade. One 8.5" × 11" sheet of notes allowed, both sides. Calculator allowed but no cellphones. 50

Huffman Code for the English Alphabet 52

Six Symbol Source p(x 1 )=.4 p(x 2 )=.3 p(x 3 )=.1 p(x 4 )=.1 p(x 5 )=.06 p(x 6 )=.04 H(X) = 2.1435 bits 53

Second Five Symbol Source p(x 1 )=.4 p(x 2 )=.2 p(x 3 )=.2 p(x 4 )=.1 p(x 5 )=.1 H(X) = 2.1219 bits 54

Two Huffman Codes [figure: two Huffman code trees, C_1 and C_2, for the five symbols x_1, ..., x_5] 55

Two Huffman Codes
C_1: x_1 → 0, x_2 → 10, x_3 → 111, x_4 → 1101, x_5 → 1100
C_2: x_1 → 11, x_2 → 01, x_3 → 00, x_4 → 101, x_5 → 100 56

Second Five Symbol Source p(x_1) = .4, p(x_2) = .2, p(x_3) = .2, p(x_4) = .1, p(x_5) = .1, H(X) = 2.1219 bits, L(C) = 2.2 bits. Variance of code C_1: 0.4(1−2.2)² + 0.2(2−2.2)² + 0.2(3−2.2)² + 0.2(4−2.2)² = 1.36. Variance of code C_2: 0.8(2−2.2)² + 0.2(3−2.2)² = 0.16. Which code is preferable? 57

Nonbinary Codes The Huffman algorithm for nonbinary codes (J > 2) follows the same procedure as for binary codes except that J symbols are combined at each stage. This requires that the number of symbols in the source X' is K' = J + c(J−1) ≥ K, where c = ⌈(K−J)/(J−1)⌉. 58

Nonbinary Example J = 3, K = 6: c = ⌈(K−J)/(J−1)⌉ = ⌈3/2⌉ = 2, so K' = 7. Add an extra symbol x_7 with p(x_7) = 0. p(x_1) = 1/3, p(x_2) = 1/6, p(x_3) = 1/6, p(x_4) = 1/9, p(x_5) = 1/9, p(x_6) = 1/9, p(x_7) = 0. H(X) = 1.54 trits 59

Codes for Different Output Alphabets K=13 p(x 1 )=1/4 p(x 2 )=1/4 p(x 3 )=1/16 p(x 4 )=1/16 p(x 5 )=1/16 p(x 6 )=1/16 p(x 7 )=1/16 p(x 8 )=1/16 p(x 9 )=1/16 p(x 10 )=1/64 p(x 11 )=1/64 p(x 12 )=1/64 p(x 13 )=1/64 J=2 to 13 60

Codes for Different Output Alphabets [table: Huffman codewords for x_1 through x_13 and the resulting L(C) for each output alphabet size J] 61

Code Efficiency [figure: code efficiency ζ versus output alphabet size J] 62

The Binary and Quaternary Codes (binary codeword / quaternary codeword)
x_1: 00 / 0; x_2: 01 / 1; x_3: 1000 / 20; x_4: 1001 / 21; x_5: 1010 / 22; x_6: 1011 / 23; x_7: 1100 / 30; x_8: 1101 / 31; x_9: 1110 / 32; x_10: 111100 / 330; x_11: 111101 / 331; x_12: 111110 / 332; x_13: 111111 / 333 63

Huffman Codes Symbol probabilities must be known a priori The redundancy of the code L(C)-H(X) (for J=b) is typically nonzero Error propagation can occur Codewords have variable length 64

Variable to Fixed Length Codes [diagram: variable-to-fixed-length encoder] M sourcewords with M ≤ J^L, sourceword lengths m_1, m_2, ..., m_M 65

Variable to Fixed Length Codes Two questions: 1. What is the best mapping from sourcewords to codewords? 2. How to ensure unique encodability? 66

Average Bit Rate ABR = (average codeword length)/(average sourceword length) = L / L(S), where L(S) = Σ_{i=1}^{M} p(s_i) m_i, M is the number of sourcewords, s_i is sourceword i, and m_i is the length of sourceword i. 67

Variable to Fixed Length Codes Design criterion: minimize the Average Bit Rate ABR = L / L(S). ABR ≥ H(X) (vs. L(C) ≥ H(X) for fixed to variable length codes), so L(S) should be as large as possible so that the ABR is close to H(X). 68

Variable to Fixed Length Codes Fixed to variable length codes: ζ = H(X)/L(C) ≤ 1. Variable to fixed length codes: ζ = H(X)/ABR ≤ 1. 69

Binary Tunstall Code K=3, L=3 Let x 1 = a, x 2 = b and x 3 = c Unused codeword is 111 70

Tunstall Codes Tunstall codes must satisfy the Kraft inequality Σ_{i=1}^{M} K^{−m_i} ≤ 1, where M is the number of sourcewords, K is the source alphabet size, and m_i is the length of sourceword i. 71

Binary Tunstall Code Construction Source X with K symbols Choose a codeword length L where 2 L > K 1. Form a tree with a root and K branches labelled with the symbols 2. If the number of leaves is greater than 2 L - (K-1), go to Step 4 3. Find the leaf with the highest probability and extend it to have K branches, go to Step 2 4. Assign codewords to the leaves 72

K=3, L=3 p(a) =.7, p(b) =.2, p(c) =.1 73

H(X) ζ = H(X)/ABR = 84.7% 74

The Codewords aaa 000 aab 001 aac 010 ab 011 ac 100 b 101 c 110 75
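
A sketch of the Tunstall construction described above, added for illustration. For p(a) = .7, p(b) = .2, p(c) = .1 and L = 3 it produces exactly this sourceword set; assigning 3-bit codewords in lexicographic order matches the table, with 111 left unused.

```python
def tunstall_sourcewords(p, L):
    """Grow the Tunstall parse tree: extend the most probable leaf until
    more than 2**L - (K-1) leaves exist, as in the construction above."""
    K = len(p)
    leaves = dict(p)                                   # start with one leaf per source symbol
    while len(leaves) <= 2 ** L - (K - 1):
        best = max(leaves, key=leaves.get)             # most probable leaf
        prob = leaves.pop(best)
        for sym, q in p.items():                       # give it K branches
            leaves[best + sym] = prob * q
    return leaves

p = {"a": 0.7, "b": 0.2, "c": 0.1}
leaves = tunstall_sourcewords(p, L=3)
for i, (word, prob) in enumerate(sorted(leaves.items())):   # lexicographic order
    print(f"{word:3s} -> {i:03b}   p = {prob:.3f}")          # codeword 111 is unused
avg_len = sum(prob * len(word) for word, prob in leaves.items())
print(f"ABR = L / L(S) = {3 / avg_len:.3f} bits per source symbol")
```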

What if a or aa is left at the end of the sequence of source symbols? There are no corresponding codewords. Solution: use the unused codeword 111: a → 1110, aa → 1111. 76

Tunstall Code for a Binary Source L = 3, K = 2, J = 2, p(x_1) = 0.7, p(x_2) = 0.3, J^L = 8
Seven sourcewords / Eight sourcewords / Codeword:
x_1x_1x_1x_1x_1 / x_1x_1x_1x_1x_1 / 000
x_1x_1x_1x_1x_2 / x_1x_1x_1x_1x_2 / 001
x_1x_1x_1x_2 / x_1x_1x_1x_2 / 010
x_1x_1x_2 / x_1x_1x_2 / 011
x_1x_2 / x_1x_2x_1 / 100
x_2x_1 / x_1x_2x_2 / 101
x_2x_2 / x_2x_1 / 110
(none) / x_2x_2 / 111 77

Huffman Code for a Binary Source N = 3, K = 2, p(x_1) = 0.7, p(x_2) = 0.3. Eight sourcewords:
A = x_1x_1x_1, p(A) = .343, 00
B = x_1x_1x_2, p(B) = .147, 11
C = x_1x_2x_1, p(C) = .147, 010
D = x_2x_1x_1, p(D) = .147, 011
E = x_2x_2x_1, p(E) = .063, 1000
F = x_2x_1x_2, p(F) = .063, 1001
G = x_1x_2x_2, p(G) = .063, 1010
H = x_2x_2x_2, p(H) = .027, 1011 78

Code Comparison H(X) =.8813 Tunstall Code (7 codewords) ABR =.9762 ζ = 90.3% Tunstall Code (8 codewords) ABR =.9138 ζ = 96.4% Huffman Code L(C) =.9087 ζ = 97.0% 79

Error Propagation Received Huffman codeword sequence 00 11 00 11 00 11 A B A B A B Sequence with one bit error 011 1001 1001 1 D F F 80

Huffman Coding The length of Huffman codewords has to be an integer number of symbols, while the self-information of the source symbols is almost always a non-integer. Thus the theoretical minimum message compression cannot always be achieved. For a binary source with p(x_1) = .1 and p(x_2) = .9, H(X) = .469, so the optimal average codeword length is .469 bits. An x_1 symbol should be encoded to l_1 = −log_2(0.1) = 3.32 bits and an x_2 symbol to l_2 = −log_2(0.9) = .152 bits. 81

Improving Huffman Coding One way to overcome the redundancy limitation is to encode blocks of several symbols. In this way the per-symbol inefficiency is spread over an entire block. N = 1: ζ = 46.9% N = 2: ζ = 72.7% However, using blocks is difficult to implement as there is a block for every possible combination of symbols, so the number of blocks increases exponentially with their length. 82
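
A small sketch (added for illustration) that reproduces these block-coding efficiencies by building a binary Huffman code over blocks of N symbols from the source p(x_1) = .1, p(x_2) = .9. It uses the fact that the average Huffman codeword length equals the sum of the merged node probabilities.

```python
import heapq
import itertools
import math

def huffman_avg_length(probs):
    """Average length of a binary Huffman code: each merge of the two least
    probable groups adds (p1 + p2) to the average length."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        p1 = heapq.heappop(heap)
        p2 = heapq.heappop(heap)
        total += p1 + p2
        heapq.heappush(heap, p1 + p2)
    return total

p = [0.1, 0.9]                                   # source from the slide above
H = -sum(q * math.log2(q) for q in p)            # about 0.469 bits/symbol

for N in (1, 2, 3, 4):
    block_probs = [math.prod(combo) for combo in itertools.product(p, repeat=N)]
    L_per_symbol = huffman_avg_length(block_probs) / N
    print(f"N = {N}: L(C)/N = {L_per_symbol:.4f} bits/symbol, "
          f"efficiency = {100 * H / L_per_symbol:.1f}%")
```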

Peter Elias (1923-2001) 83

Arithmetic Coding Arithmetic coding bypasses the idea of replacing an input symbol (or groups of symbols) with a specific codeword. Instead, a stream of input symbols is encoded to a single number in [0,1). Useful when dealing with sources with small alphabets, such as binary sources, and alphabets with highly skewed probabilities. 84

Arithmetic Coding Applications JPEG, MPEG-1, MPEG-2: Huffman and arithmetic coding. JPEG2000, MPEG-4: arithmetic coding only. ZIP: arithmetic coding in the prediction by partial matching (PPMd) algorithm. H.263, H.264. 85

Arithmetic Coding The output of an arithmetic encoder is a stream of bits. However, we can think that there is a prefix 0, and the stream represents a fractional binary number between 0 and 1: 01101010 → 0.01101010. To explain the algorithm, numbers will be shown as decimal, but they are always binary in practice. 86

Example 1 Encode string bccb from the alphabet {a,b,c} p(a) = p(b) = p(c) = 1/3 The arithmetic coder maintains two numbers, low and high, which represent a subinterval [low,high) of the range [0,1) Initially low=0 and high=1 87

Example 1 The range between low and high is divided between the symbols of the source alphabet according to their probabilities. [figure: the interval [0,1) divided into a: [0, 0.3333), b: [0.3333, 0.6667), c: [0.6667, 1), each of probability 1/3] 88

Example 1 Encoding b selects the middle subinterval [0.3333, 0.6667): low = 0.3333, high = 0.6667. 89

Example 1 The interval [0.3333, 0.6667) is subdivided again; encoding c selects its top third: low = 0.5556, high = 0.6667. 90

Example 1 The interval [0.5556, 0.6667) is subdivided; encoding c gives low = 0.6296, high = 0.6667. 91

Example 1 The interval [0.6296, 0.6667) is subdivided; encoding b gives low = 0.6420, high = 0.6543. 92

Symbol Ranges Determine the cumulative probabilities P_k = Σ_{i=1}^{k} p(x_i), k = 1, ..., K, where P_0 = 0 and P_K = 1. The range for symbol x_i is low = P_{i−1}, high = P_i. The ranges divide [0,1). 93

Arithmetic Coding Algorithm
Set low to 0.0
Set high to 1.0
While there are still input symbols Do
    get next input symbol
    range = high − low
    high = low + range × symbol_high_range
    low = low + range × symbol_low_range
End While
output number between high and low 94
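
A direct transcription of this algorithm into Python, added for illustration. It uses ordinary floating point, so it ignores the finite-precision and renormalization issues discussed later; run on Example 2 below it ends with the interval [0.385, 0.394).

```python
def arithmetic_encode(sequence, ranges):
    """Return the final [low, high) interval for a sequence of symbols.

    ranges maps each symbol to its (symbol_low_range, symbol_high_range) pair.
    """
    low, high = 0.0, 1.0
    for symbol in sequence:
        rng = high - low
        sym_low, sym_high = ranges[symbol]
        high = low + rng * sym_high
        low = low + rng * sym_low
    return low, high

# Example 2 from the slides: p(x1) = 0.5, p(x2) = 0.3, p(x3) = 0.2
ranges = {"x1": (0.0, 0.5), "x2": (0.5, 0.8), "x3": (0.8, 1.0)}
low, high = arithmetic_encode(["x1", "x2", "x3", "x2"], ranges)
print(f"low = {low}, high = {high}")   # approximately 0.385 and 0.394
```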

Arithmetic Coding Example 2 p(x_1) = 0.5, p(x_2) = 0.3, p(x_3) = 0.2. Symbol ranges: 0 ≤ x_1 < .5, .5 ≤ x_2 < .8, .8 ≤ x_3 < 1. low = 0.0, high = 1.0. Symbol sequence x_1 x_2 x_3 x_2.
Iteration 1 (x_1): range = 1.0 − 0.0 = 1.0, high = 0.0 + 1.0 × 0.5 = 0.5, low = 0.0 + 1.0 × 0.0 = 0.0
Iteration 2 (x_2): range = 0.5 − 0.0 = 0.5, high = 0.0 + 0.5 × 0.8 = 0.40, low = 0.0 + 0.5 × 0.5 = 0.25
Iteration 3 (x_3): range = 0.4 − 0.25 = 0.15, high = 0.25 + 0.15 × 1.0 = 0.40, low = 0.25 + 0.15 × 0.8 = 0.37 95

Arithmetic Coding Example 2
Iteration 3 (x_3): range = 0.4 − 0.25 = 0.15, high = 0.25 + 0.15 × 1.0 = 0.40, low = 0.25 + 0.15 × 0.8 = 0.37
Iteration 4 (x_2): range = 0.4 − 0.37 = .03, high = 0.37 + 0.03 × 0.8 = 0.394, low = 0.37 + 0.03 × 0.5 = 0.385
0.385 ≤ x_1 x_2 x_3 x_2 < 0.394. In binary, 0.385 = 0.0110001... and 0.394 = 0.0110010..., so the first 5 bits of the codeword are 01100. If there are no additional symbols to be encoded the codeword is 011001. 96

Arithmetic Coding Example 3 Suppose that we want to encode the message BILL GATES
Character / Probability / Range:
SPACE 1/10 0.00 ≤ r < 0.10
A 1/10 0.10 ≤ r < 0.20
B 1/10 0.20 ≤ r < 0.30
E 1/10 0.30 ≤ r < 0.40
G 1/10 0.40 ≤ r < 0.50
I 1/10 0.50 ≤ r < 0.60
L 2/10 0.60 ≤ r < 0.80
S 1/10 0.80 ≤ r < 0.90
T 1/10 0.90 ≤ r < 1.00 97

[figure: successive subdivision of [0,1) as each character of BILL GATES is encoded, narrowing from [0.2, 0.3) down to [0.2572167752, 0.2572167756)] 98

Arithmetic Coding Example 3
New Symbol / Low Value / High Value:
(initial) 0.0 1.0
B 0.2 0.3
I 0.25 0.26
L 0.256 0.258
L 0.2572 0.2576
SPACE 0.25720 0.25724
G 0.257216 0.257220
A 0.2572164 0.2572168
T 0.25721676 0.2572168
E 0.257216772 0.257216776
S 0.2572167752 0.2572167756 99

Binary Codeword 0.2572167752 in binary is 0.01000001110110001111010101100101 0.2572167756 in binary is 0.01000001110110001111010101100111 The transmitted binary codeword is then 0100000111011000111101010110011 31 bits long 100

Decoding Algorithm
get encoded number
Do
    find symbol whose range contains the encoded number
    output the symbol
    subtract symbol_low_range from the encoded number
    divide by the probability of the output symbol
Until no more symbols 101
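
A sketch of this decoding loop, added for illustration. It works in exact rational arithmetic (fractions.Fraction) so the repeated subtract-and-divide steps do not accumulate rounding error, and it stops after a known number of symbols since this example has no terminating symbol. Run on 0.2572167752 with the BILL GATES ranges it recovers the message.

```python
from fractions import Fraction

# Character ranges from the BILL GATES example (low, high); probability = high - low
ranges = {
    " ": (Fraction(0, 10), Fraction(1, 10)), "A": (Fraction(1, 10), Fraction(2, 10)),
    "B": (Fraction(2, 10), Fraction(3, 10)), "E": (Fraction(3, 10), Fraction(4, 10)),
    "G": (Fraction(4, 10), Fraction(5, 10)), "I": (Fraction(5, 10), Fraction(6, 10)),
    "L": (Fraction(6, 10), Fraction(8, 10)), "S": (Fraction(8, 10), Fraction(9, 10)),
    "T": (Fraction(9, 10), Fraction(1, 1)),
}

def arithmetic_decode(value, ranges, n_symbols):
    out = []
    for _ in range(n_symbols):
        for sym, (lo, hi) in ranges.items():
            if lo <= value < hi:                     # symbol whose range contains the value
                out.append(sym)
                value = (value - lo) / (hi - lo)     # subtract low, divide by probability
                break
    return "".join(out)

print(arithmetic_decode(Fraction("0.2572167752"), ranges, 10))   # BILL GATES
```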

Decoding BILL GATES
Encoded Number / Output Symbol / Low / High / Probability:
0.2572167752 B 0.2 0.3 0.1
0.572167752 I 0.5 0.6 0.1
0.72167752 L 0.6 0.8 0.2
0.6083876 L 0.6 0.8 0.2
0.041938 SPACE 0.0 0.1 0.1
0.41938 G 0.4 0.5 0.1
0.1938 A 0.1 0.2 0.1
0.938 T 0.9 1.0 0.1
0.38 E 0.3 0.4 0.1
0.8 S 0.8 0.9 0.1
Remaining value: 0.0 102

Finite Precision
Symbol / Probability (fraction) / Interval (8-bit precision, fraction) / Interval (8-bit precision, binary) / Range in binary:
a 1/3 [0, 85/256) [0.00000000, 0.01010101) 00000000-01010100
b 1/3 [85/256, 171/256) [0.01010101, 0.10101011) 01010101-10101010
c 1/3 [171/256, 1) [0.10101011, 1.00000000) 10101011-11111111 103

Renormalization
Symbol / Probability (fraction) / Range / Digits that can be output / Range after renormalization:
a 1/3 00000000-01010100 0 00000000-10101001
b 1/3 01010101-10101010 none 01010101-10101010
c 1/3 10101011-11111111 1 01010110-11111111 104

Terminating Symbol
Symbol / Probability (fraction) / Interval (8-bit precision, fraction) / Interval (8-bit precision, binary) / Range in binary:
a 1/3 [0, 85/256) [0.00000000, 0.01010101) 00000000-01010100
b 1/3 [85/256, 170/256) [0.01010101, 0.10101010) 01010101-10101001
c 1/3 [170/256, 255/256) [0.10101010, 0.11111111) 10101010-11111110
term 1/256 [255/256, 1) [0.11111111, 1.00000000) 11111111 105

Huffman vs Arithmetic Codes X = {a,b,c,d} p(a) =.5, p(b) =.25, p(c) =.125, p(d) =.125 Huffman code a 0 b 10 c 110 d 111 106

Huffman vs Arithmetic Codes X = {a,b,c,d} p(a) =.5, p(b) =.25, p(c) =.125, p(d) =.125 Arithmetic code ranges a [0,.5) b [.5,.75) c [.75,.875) d [.875, 1) 107

Huffman vs Arithmetic Codes H(X) = 1.75 bits, L(C) = 1.75 bits, ζ = 100%. Encode abcdc: Huffman codewords 010110111110, arithmetic codeword 010110111110, both 12 bits. 108

Huffman vs Arithmetic Codes X = {a,b,c,d} p(a) =.7, p(b) =.12, p(c) =.10, p(d) =.08 Huffman code a 0 b 10 c 110 d 111 109

Huffman vs Arithmetic Codes X = {a,b,c,d} p(a) =.7, p(b) =.12, p(c) =.10, p(d) =.08 Arithmetic code ranges a [0,.7) b [.7,.82) c [.82,.92) d [.92, 1) 110

Huffman vs Arithmetic Codes H(X)=1.351 bits L(C)=1.48 bits ζ=91.3% Encode aaab Huffman 00010 5 bits Arithmetic 01 2 bits Encode abcdaaa Huffman 010110111000 12 bits Arithmetic 100100010001 12 bits 111

Gadsby by Ernest Vincent Wright If youth, throughout all history, had had a champion to stand up for it; to show a doubting world that a child can think; and, possibly, do it practically; you wouldn't constantly run across folks today who claim that a child don't know anything. A child's brain starts functioning at birth; and has, amongst its many infant convolutions, thousands of dormant atoms, into which God has put a mystic possibility for noticing an adult's act, and figuring out its purport. Up to about its primary school days a child thinks, naturally, only of play. But many a form of play contains disciplinary factors. You can't do this, or that puts you out, shows a child that it must think, practically or fail. Now, if, throughout childhood, a brain has no opposition, it is plain that it will attain a position of status quo, as with our ordinary animals. Man knows not why a cow, dog or lion was not born with a brain on a par with ours; why such animals cannot add, subtract, or obtain from books and schooling, that paramount position which Man holds today. 112

Lossless Compression Techniques 1 Model and code The source is modelled as a random process. The probabilities (or statistics) are given or acquired. 2 Dictionary-based There is no explicit model and no explicit statistics gathering. Instead, a codebook (or dictionary) is used to map sourcewords into codewords. 113

Model and Code Huffman code Tunstall code Fano code Shannon code Arithmetic code 114

Dictionary-based Techniques Lempel-Ziv: LZ77 (sliding window), LZ78 (explicit dictionary); Adaptive Huffman coding. Due to patents, LZ77 and LZ78 led to many variants. LZ77 variants: LZR, LZSS, DEFLATE, LZH, LZB. LZ78 variants: LZW, LZC, LZT, LZMW, LZJ, LZFG. Zip methods use LZH and LZR among other techniques. UNIX compress uses LZC (a variant of LZW). 115

Dictionary-based Techniques 116

Lempel-Ziv Compression Source symbol sequences are replaced by codewords that are dynamically determined. The code table is encoded into the compressed data so it can be reconstructed during decoding. 117
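
To make the dynamically determined code table concrete, here is a minimal LZ78-style encoder and decoder, added for illustration (the slides' own worked example is on the following slides and is not reproduced here). Each output pair is (index of the longest previously seen phrase, next symbol), and the decoder rebuilds the same table as it reads the pairs; the example string is arbitrary.

```python
def lz78_encode(text):
    """LZ78-style encoding: output (dictionary index, next symbol) pairs."""
    table = {}                       # phrase -> index (index 0 is the empty phrase)
    pairs, phrase = [], ""
    for ch in text:
        if phrase + ch in table:
            phrase += ch             # keep extending the current match
        else:
            pairs.append((table.get(phrase, 0), ch))
            table[phrase + ch] = len(table) + 1
            phrase = ""
    if phrase:                       # leftover match at the end of the input
        pairs.append((table[phrase], ""))
    return pairs

def lz78_decode(pairs):
    table = {0: ""}
    out = []
    for index, ch in pairs:
        phrase = table[index] + ch
        out.append(phrase)
        table[len(table)] = phrase   # rebuild the same dictionary as the encoder
    return "".join(out)

message = "abababcbababaa"
encoded = lz78_encode(message)
print(encoded)
print(lz78_decode(encoded) == message)   # True
```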

Lempel-Ziv Example 118

Lempel-Ziv Codeword 121

Compression Comparison Compression as a percentage of the original file size
File Type / UNIX Compact (Adaptive Huffman) / UNIX Compress (Lempel-Ziv-Welch):
ASCII File 66% 44%
Speech File 65% 64%
Image File 94% 88% 122

Compression Comparison 123