Chapter 5: Data Compression


Definition. A source code $C$ for a random variable $X$ is a mapping from the range of $X$ to the set of finite-length strings of symbols from a $D$-ary alphabet. $\hat{X}$: source alphabet, $\hat{D}$: code alphabet. For each $x$, $C(x)$ is its codeword and $l(x)$ is the length of $C(x)$.

Example: $\hat{X} = \{\text{Red}, \text{Blue}, \text{Yellow}\}$, $\hat{D} = \{0, 1\}$, with

$C(\text{Red}) = 0,\ C(\text{Blue}) = 1,\ C(\text{Yellow}) = 00$  or  $C(\text{Red}) = 0,\ C(\text{Blue}) = 10,\ C(\text{Yellow}) = 11$.

Definition. The expected length $L(C)$ of a source code $C$ for a random variable $X$ with probability mass function $p(x)$ is
$$L(C) = \sum_{x \in \mathcal{X}} p(x)\, l(x).$$
Without loss of generality, we assume $\hat{D} = \{0, 1, \ldots, D-1\}$.

Example: $P(X=1) = \tfrac{1}{2}$, $C(1) = 0$; $P(X=2) = \tfrac{1}{4}$, $C(2) = 10$; $P(X=3) = \tfrac{1}{8}$, $C(3) = 110$; $P(X=4) = \tfrac{1}{8}$, $C(4) = 111$. Here $H(X) = 1.75$ bits and $L(C) = E[l(X)] = 1.75$ bits.

Definition. A code is said to be nonsingular if $x_i \neq x_j$ implies $C(x_i) \neq C(x_j)$.

Definition. The extension $C^*$ of a code $C$ is the mapping from finite-length strings of $\mathcal{X}$ to finite-length strings of $\mathcal{D}$, defined by
$$C(x_1 x_2 \cdots x_n) = C(x_1) C(x_2) \cdots C(x_n),$$
the concatenation of the codewords $C(x_1), \ldots, C(x_n)$.
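To make the example concrete, here is a minimal Python check (the dictionaries `p` and `code` simply restate the example above) that the expected length equals the entropy for this code:

```python
from math import log2

p = {1: 1/2, 2: 1/4, 3: 1/8, 4: 1/8}
code = {1: "0", 2: "10", 3: "110", 4: "111"}

H = -sum(px * log2(px) for px in p.values())   # entropy H(X) in bits
L = sum(p[x] * len(code[x]) for x in p)        # expected codeword length L(C)

print(H, L)   # both equal 1.75
```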

Example: $C(x_1) = 00$, $C(x_2) = 11$, so $C(x_1 x_2) = 0011$.

Definition. A code is called uniquely decodable if its extension is nonsingular.

Definition. A uniquely decodable code is said to be instantaneous if it is possible to decode each codeword in a sequence without referring to succeeding code symbols.

Definition. Let $x = (x_1, x_2, \ldots, x_n)$ be a sequence. A sequence $(x_1, x_2, \ldots, x_i)$ with $i \le n$ is called a prefix of $x$. A necessary and sufficient condition for a code to be instantaneous is that no codeword be a prefix of any other codeword.
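The prefix condition is easy to test mechanically. A short sketch (the function name `is_prefix_free` is mine) checks whether any codeword is a prefix of another:

```python
def is_prefix_free(codewords):
    """Return True iff no codeword is a prefix of another (instantaneous code)."""
    for i, u in enumerate(codewords):
        for j, v in enumerate(codewords):
            if i != j and v.startswith(u):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))   # True  (instantaneous)
print(is_prefix_free(["0", "1", "00", "11"]))      # False ("0" is a prefix of "00")
```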

Code tree:
1. The code alphabet consists of $D$ symbols.
2. The maximum number of branches emanating from each node is $D$.
3. Each node, except the initial node, has exactly one branch entering it.
4. A node with no branch emanating from it is called a terminal node; otherwise a node is called an intermediate node.
5. A path is defined as a sequence of consecutive branches starting from the initial node and entering a certain node.
6. Each branch is labeled by a code symbol $b_j$.
7. A path therefore corresponds to a sequence of code symbols.
8. A node is named by the sequence of code symbols corresponding to the path $\Gamma$ entering it.
9. A node which is entered by a path of $n$ consecutive branches is called an $n$th-order node.

Theorem 5.2.1 (Kraft inequality): For any instantaneous code over an alphabet of size $D$, the codeword lengths $l_1, l_2, \ldots, l_m$ must satisfy the inequality
$$\sum_i D^{-l_i} \le 1.$$
Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.

Proof. ($\Rightarrow$) There are at most $D^l$ nodes of order $l$. A terminal node of order $l_i$ ($l_i < l$) eliminates $D^{l - l_i}$ of the possible nodes of order $l$. Hence, we have $\sum_{i=1}^m D^{l - l_i} \le D^l$, which implies $\sum_{i=1}^m D^{-l_i} \le 1$.

($\Leftarrow$) Let $L = \max_{1 \le i \le m} l_i$ and let $N_l$ be the number of codewords of length $l$. Then $\sum_{i=1}^m D^{-l_i} \le 1$ can be written as $\sum_{l=1}^{L} N_l D^{-l} \le 1$. Consider the codewords of length $t$, where $1 \le t \le L$. Then
$$\sum_{l=1}^{t-1} N_l D^{-l} + N_t D^{-t} + \sum_{l=t+1}^{L} N_l D^{-l} \le 1,$$
or equivalently,
$$N_t \le \Big\{ D^t - \sum_{l=1}^{t-1} N_l D^{t-l} \Big\} - \sum_{l=t+1}^{L} N_l D^{-(l-t)}.$$
Note that $D^t$ is the maximum number of nodes of order $t$ and $\sum_{l=1}^{t-1} N_l D^{t-l}$ is the number of nodes of order $t$ eliminated by the presence of terminal nodes of lower orders. Therefore, $D^t - \sum_{l=1}^{t-1} N_l D^{t-l}$ is the number of available nodes of order $t$ in the tree. For every $t$ ($1 \le t \le L$), the number of available nodes of order $t$ is at least the number of required terminal nodes $N_t$. Thus, a tree with the required terminal nodes can always be constructed.
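Both directions of the theorem can be illustrated in a few lines of Python: checking the Kraft sum, and, when it is at most 1, assigning codewords of the requested lengths in order of increasing length (a standard construction equivalent to filling the code tree; the function names are illustrative):

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    return sum(Fraction(1, D ** l) for l in lengths)

def build_prefix_code(lengths, D=2):
    """Assign codewords of the given lengths, shortest first, by taking the
    first l D-ary digits of the running cumulative sum (exact arithmetic)."""
    assert kraft_sum(lengths, D) <= 1, "lengths violate the Kraft inequality"
    codewords = {}
    cum = Fraction(0)
    for idx, l in sorted(enumerate(lengths), key=lambda t: t[1]):
        digits, v = [], cum
        for _ in range(l):
            v *= D
            d = int(v)            # next D-ary digit of the cumulative sum
            digits.append(str(d))
            v -= d
        codewords[idx] = "".join(digits)
        cum += Fraction(1, D ** l)
    return codewords

print(kraft_sum([1, 2, 3, 3]))            # 1 -> the inequality holds with equality
print(build_prefix_code([1, 2, 3, 3]))    # {0: '0', 1: '10', 2: '110', 3: '111'}
```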

Theorem 5.2.2 (Extended Kraft Inequality): For any countably infinite set of codewords that form a prefix code, the codeword lengths satisfy the extended Kraft inequality
$$\sum_{i=1}^{\infty} D^{-l_i} \le 1.$$
Conversely, given any $l_1, l_2, \ldots$ satisfying the extended Kraft inequality, we can construct a prefix code with these codeword lengths.

Proof. Let the $D$-ary alphabet be $\{0, 1, \ldots, D-1\}$. Consider the $i$th codeword $\eta_1 \eta_2 \cdots \eta_{l_i}$. Let $0.\eta_1 \eta_2 \cdots \eta_{l_i}$ represent the real number given by the $D$-ary expansion
$$0.\eta_1 \eta_2 \cdots \eta_{l_i} = \sum_{j=1}^{l_i} \eta_j D^{-j}.$$

This codeword corresponds to the interval $(0.\eta_1 \eta_2 \cdots \eta_{l_i},\ 0.\eta_1 \eta_2 \cdots \eta_{l_i} + D^{-l_i})$, i.e., the set of all real numbers whose $D$-ary expansion begins with $0.\eta_1 \eta_2 \cdots \eta_{l_i}$. This is a subinterval of $[0, 1]$. By the prefix condition, all these intervals are disjoint. Hence, the sum of their lengths is less than or equal to 1, that is,
$$\sum_{i=1}^{\infty} D^{-l_i} \le 1.$$
We can reverse the procedure to construct a prefix code with lengths $l_1, l_2, \ldots$ (or use the code tree, which can be extended to infinite order).

Theorem 5.3.1 The expected length $L$ of an instantaneous $D$-ary code for a random variable $X$ is greater than or equal to the entropy $H_D(X)$, i.e., $L \ge H_D(X)$, with equality if and only if $D^{-l_i} = p_i$.

Proof.
$$
\begin{aligned}
L - H_D(X) &= \sum_i p_i l_i - \sum_i p_i \log_D \frac{1}{p_i} \\
&= -\sum_i p_i \log_D D^{-l_i} + \sum_i p_i \log_D p_i \\
&= \sum_i p_i \log_D \frac{p_i}{r_i} - \log_D c \\
&= D(p \,\|\, r) + \log_D \frac{1}{c} \ \ge\ 0,
\end{aligned}
$$
where $r_i = D^{-l_i} / \sum_j D^{-l_j}$ and $c = \sum_i D^{-l_i} \le 1$. Equality holds if and only if $p_i = D^{-l_i}$ for all $i$ (so that $c = \sum_i D^{-l_i} = 1$).
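The identity used in the proof can be checked numerically. The sketch below (probabilities and lengths chosen arbitrarily for illustration, base 2 assumed) verifies that $L - H(X)$ equals $D(p\|r) + \log_2(1/c)$:

```python
from math import log2

p = [0.5, 0.3, 0.2]        # an arbitrary source distribution
l = [1, 2, 3]              # codeword lengths of an instantaneous code

c = sum(2 ** (-li) for li in l)            # c = sum 2^{-l_i} <= 1
r = [2 ** (-li) / c for li in l]           # r_i = 2^{-l_i} / c

L = sum(pi * li for pi, li in zip(p, l))
H = -sum(pi * log2(pi) for pi in p)
D_pr = sum(pi * log2(pi / ri) for pi, ri in zip(p, r))

print(L - H)               # ~0.2145
print(D_pr + log2(1 / c))  # the same value, as the proof asserts
```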

Definition. A probability distribution is called $D$-adic if each of the probabilities is equal to $D^{-n}$ for some integer $n$. Thus, we have equality in the theorem if and only if the distribution of $X$ is $D$-adic.

Theorem 5.4.1 Let $l_1^*, l_2^*, \ldots, l_m^*$ be optimal codeword lengths for a source distribution $p_1, p_2, \ldots, p_m$ and a $D$-ary alphabet, and let $L^*$ be the associated expected length of an optimal code ($L^* = \sum_i p_i l_i^*$). Then
$$H_D(X) \le L^* < H_D(X) + 1.$$

Proof. Let $l_i = \lceil \log_D \frac{1}{p_i} \rceil$, so that $\log_D \frac{1}{p_i} \le l_i < \log_D \frac{1}{p_i} + 1$. Then $H_D(X) \le L = \sum_i p_i l_i < H_D(X) + 1$. Since the lengths $l_i = \lceil \log_D \frac{1}{p_i} \rceil$ satisfy the Kraft inequality, there exists an instantaneous code with these lengths. An optimal code has average length $L^* = \sum_i p_i l_i^* \le L$. Since $L^* \ge H_D(X)$ by Theorem 5.3.1, we have $H_D(X) \le L^* < H_D(X) + 1$.
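A small sketch of the length assignment used in the proof, applied to an arbitrary example distribution (base 2 assumed): the lengths $\lceil \log_2(1/p_i) \rceil$ satisfy the Kraft inequality and give an expected length within one bit of the entropy:

```python
from math import ceil, log2

p = [0.45, 0.25, 0.15, 0.10, 0.05]                 # example distribution
lengths = [ceil(log2(1 / pi)) for pi in p]         # l_i = ceil(log2 1/p_i)

kraft = sum(2 ** (-l) for l in lengths)
L = sum(pi * li for pi, li in zip(p, lengths))
H = -sum(pi * log2(pi) for pi in p)

print(lengths, kraft)      # [2, 2, 3, 4, 5], Kraft sum <= 1
print(H, L, H + 1)         # H(X) <= L < H(X) + 1
```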

Extended source $X^n$:
$$H(X_1, X_2, \ldots, X_n) \le \sum p(x_1, \ldots, x_n)\, l(x_1, \ldots, x_n) < H(X_1, X_2, \ldots, X_n) + 1.$$
Suppose that $X_1, \ldots, X_n$ are i.i.d. Then $H(X_1, \ldots, X_n) = \sum_i H(X_i) = n H(X)$. Hence, with $L_n = \frac{1}{n} \sum p(x_1, \ldots, x_n)\, l(x_1, \ldots, x_n)$ the expected codeword length per source symbol,
$$H(X) \le L_n < H(X) + \frac{1}{n}.$$
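The effect of coding longer blocks can be seen numerically. The sketch below (an assumed i.i.d. binary source with $p(0)=0.9$) uses lengths $\lceil \log_2(1/p(x^n)) \rceil$ per block and shows the per-symbol rate dropping toward $H(X)$:

```python
from math import ceil, log2
from itertools import product

p = {0: 0.9, 1: 0.1}
H = -sum(px * log2(px) for px in p.values())   # ~0.469 bits

for n in (1, 2, 4, 8):
    Ln = 0.0
    for block in product(p, repeat=n):         # all 2^n source blocks
        prob = 1.0
        for s in block:
            prob *= p[s]
        Ln += prob * ceil(log2(1 / prob))      # Shannon-type length for the block
    print(n, Ln / n)                           # per-symbol rate, below H(X) + 1/n

print("H(X) =", H)
```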

Theorem 5.4.2 In general,
$$\frac{H(X_1, \ldots, X_n)}{n} \le L_n < \frac{H(X_1, \ldots, X_n)}{n} + \frac{1}{n}.$$
Note that $\frac{H(X_1, \ldots, X_n)}{n} \to H(\mathcal{X})$, the entropy rate, if the random process is stationary.

Theorem 5.4.3 The expected length under $p(x)$ of the code assignment $l(x) = \lceil \log \frac{1}{q(x)} \rceil$ satisfies
$$H(p) + D(p \,\|\, q) \le E_p[l(X)] < H(p) + D(p \,\|\, q) + 1.$$

Proof.
$$
\begin{aligned}
E_p[l(X)] &= \sum_x p(x) \left\lceil \log \frac{1}{q(x)} \right\rceil \\
&< \sum_x p(x) \left( \log \frac{1}{q(x)} + 1 \right) \\
&= \sum_x p(x) \log \frac{p(x)}{q(x)} + \sum_x p(x) \log \frac{1}{p(x)} + 1 \\
&= D(p \,\|\, q) + H(p) + 1.
\end{aligned}
$$
The lower bound can be derived similarly.
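The bound of Theorem 5.4.3 can be checked on a small example (distributions chosen for illustration, base 2 assumed): designing codeword lengths for the wrong distribution $q$ costs roughly $D(p\|q)$ extra bits per symbol:

```python
from math import ceil, log2

p = [0.5, 0.25, 0.125, 0.125]      # true distribution
q = [0.25, 0.25, 0.25, 0.25]       # assumed (wrong) distribution

lengths = [ceil(log2(1 / qi)) for qi in q]       # l(x) = ceil(log2 1/q(x))
E_l = sum(pi * li for pi, li in zip(p, lengths))

H_p = -sum(pi * log2(pi) for pi in p)
D_pq = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))

print(H_p + D_pq, E_l, H_p + D_pq + 1)   # lower bound <= E_p[l(X)] < upper bound
```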

Theorem 5.5.1 (McMillan Inequality) The condition $\sum_{i=1}^{k} D^{-l_i} \le 1$ is necessary and sufficient for the existence of a base-$D$ uniquely decodable code with lengths $l_1, l_2, \ldots, l_k$.

Proof. Consider
$$\left( \sum_{i=1}^{k} D^{-l_i} \right)^m = \left( D^{-l_1} + \cdots + D^{-l_k} \right)^m.$$
There are $k^m$ terms, each of the form $D^{-l_{i_1} - l_{i_2} - \cdots - l_{i_m}} = D^{-l}$, where $l = l_{i_1} + l_{i_2} + \cdots + l_{i_m}$. Then
$$\left( \sum_{i=1}^{k} D^{-l_i} \right)^m = \sum_{l=m}^{mn} N_l D^{-l}, \qquad \text{where } l_i \le n \text{ for all } i.$$
Here $N_l$ is the number of strings of $m$ codewords that can be formed so that each string has a length of exactly $l$ code symbols. If the code is uniquely decodable, $N_l$ cannot exceed $D^l$, the number of distinct $D$-ary sequences of length $l$. Thus,
$$\left( \sum_{i=1}^{k} D^{-l_i} \right)^m \le \sum_{l=m}^{mn} D^l D^{-l} = mn - m + 1 \le mn.$$
If $x > 1$, then $x^m > mn$ when $m$ is large enough. Thus, we must have $\sum_{i=1}^{k} D^{-l_i} \le 1$.
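The counting argument can also be run directly. The sketch below (function name is mine) tallies, with a small dynamic program, how many ordered codeword sequences have total length $l$; if that count ever exceeds $D^l$, two distinct sequences must map to the same string, so the code cannot be uniquely decodable:

```python
def first_collision_length(lengths, D=2, l_max=40):
    """Return a total length l at which the number of codeword sequences
    exceeds D^l (forcing a collision), or None if none is found up to l_max."""
    ways = [0] * (l_max + 1)   # ways[l] = # ordered codeword sequences of total length l
    ways[0] = 1
    for l in range(1, l_max + 1):
        ways[l] = sum(ways[l - li] for li in lengths if li <= l)
        if ways[l] > D ** l:
            return l           # pigeonhole: some D-ary string has two parses
    return None

print(first_collision_length([1, 2, 2, 3]))   # Kraft sum 1.125 > 1 -> collision at l = 7
print(first_collision_length([1, 2, 3, 3]))   # Kraft sum 1 -> None, no contradiction
```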

Huffman Codes: Example. [The Huffman construction example is given as a figure in the original slides and is not reproduced in this transcription.]

For a $D$-ary Huffman code, the total number of symbols must be of the form $D + k(D-1)$; if it is not, dummy symbols of probability 0 are added. Example: $D = 3$ [construction figure not reproduced].

Optimality of Huffman Codes

Lemma 5.8.1 For any distribution, there exists an optimal instantaneous code (with minimum expected length) that satisfies the following properties:
1. If $p_j > p_k$, then $l_j \le l_k$.
2. The two longest codewords have the same length.
3. The two longest codewords differ only in the last bit and correspond to the two least likely symbols.

Proof: Omitted.

Proof of the optimality of the Huffman code (binary case). Let $C_m$ be a Huffman code for $m$ symbols with probabilities $p_1, p_2, \ldots, p_m$. Then $C_m$ satisfies the properties of Lemma 5.8.1. Let $C_{m-1}$ be the reduced code of $C_m$: a code for $m-1$ symbols obtained by taking the common prefix of the two longest codewords and assigning it to a new symbol with probability $p_{m-1} + p_m$, while all the other codewords remain the same.

In $C_{m-1}$, the symbol with probability $p_i$ ($1 \le i \le m-2$) has codeword $w_i$ of length $l_i$, and the merged symbol with probability $p_{m-1} + p_m$ has codeword $w_{m-1}$ of length $l_{m-1}$. In $C_m$, the codewords are $w_i' = w_i$ with $l_i' = l_i$ for $1 \le i \le m-2$, and $w_{m-1}' = w_{m-1}0$, $w_m' = w_{m-1}1$ with $l_{m-1}' = l_m' = l_{m-1} + 1$.

The expected length of $C_m$ is
$$
\begin{aligned}
L(C_m) &= \sum_{i=1}^{m} p_i l_i' \\
&= \sum_{i=1}^{m-2} p_i l_i + p_{m-1}(l_{m-1} + 1) + p_m(l_{m-1} + 1) \\
&= \sum_{i=1}^{m-2} p_i l_i + (p_{m-1} + p_m)\, l_{m-1} + p_{m-1} + p_m \\
&= L(C_{m-1}) + p_{m-1} + p_m.
\end{aligned}
$$

We will show that optimality of $C_{m-1}$ implies optimality of $C_m$. Suppose there is a code $\hat{C}_m$ such that $L(\hat{C}_m) < L(C_m)$. Let the codewords of $\hat{C}_m$ be $\hat{w}_1, \ldots, \hat{w}_m$ with lengths $\hat{l}_1, \ldots, \hat{l}_m$, ordered so that $\hat{l}_1 \le \hat{l}_2 \le \cdots \le \hat{l}_m$. One of the codewords, $\hat{w}_{m-1}$, of $\hat{C}_m$ must be identical with $\hat{w}_m$ except in its last digit. We form $\hat{C}_{m-1}$ by combining $\hat{w}_{m-1}$ and $\hat{w}_m$, dropping their last digit, and leaving all other codewords unchanged. Then $L(\hat{C}_m) = L(\hat{C}_{m-1}) + \hat{p}_{m-1} + \hat{p}_m$. Note that $\hat{p}_{m-1} = p_{m-1}$ and $\hat{p}_m = p_m$. Hence, we have $L(\hat{C}_{m-1}) < L(C_{m-1})$ for the distribution $(p_1, \ldots, p_{m-2}, p_{m-1} + p_m)$, which is a contradiction.

Example of a Huffman code over an extended source alphabet: let $\mathcal{X} = \{0, 1\}$ with $p(X=0) = 0.9$ and $p(X=1) = 0.1$ (the construction appears on the next page of the original slides).
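Since that construction figure is not reproduced here, the following sketch (a standard heap-based binary Huffman construction, not taken from the original notes) builds a Huffman code for pairs of symbols from the extended source above and reports the expected length per source symbol:

```python
import heapq
from itertools import product

def huffman(probs):
    """probs: dict symbol -> probability; returns dict symbol -> binary codeword."""
    heap = [(pr, i, [sym]) for i, (sym, pr) in enumerate(probs.items())]
    heapq.heapify(heap)
    code = {sym: "" for sym in probs}
    counter = len(heap)                       # tie-breaker so symbol lists are never compared
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # two least likely groups
        p2, _, group2 = heapq.heappop(heap)
        for s in group1:
            code[s] = "0" + code[s]
        for s in group2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return code

p = {0: 0.9, 1: 0.1}
blocks = {b: p[b[0]] * p[b[1]] for b in product(p, repeat=2)}   # extended source X^2
code = huffman(blocks)
L2 = sum(blocks[b] * len(code[b]) for b in blocks)

print(code)      # codeword lengths 1, 2, 3, 3 (the exact bit labels may differ)
print(L2 / 2)    # 0.645 bits per source symbol, already below 1 bit
```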

Shannon code: codeword length $l_i = \lceil \log \frac{1}{p_i} \rceil$. The Shannon code is not always optimal. Example: $\hat{D} = \{0, 1\}$, $p(0) = 0.9999$, $p(1) = 0.0001$. A Huffman code uses one bit per symbol, e.g. $C(0) = 0$, $C(1) = 1$, whereas the Shannon code assigns $C(0)$ a length of $\lceil \log_2 \frac{1}{0.9999} \rceil = 1$ bit but $C(1)$ a length of $\lceil \log_2 \frac{1}{0.0001} \rceil = 14$ bits.

Shannon-Fano-Elias coding: Let $\mathcal{X} = \{1, 2, \ldots, m\}$ and $p(x) > 0$ for all $x$. Let
$$F(x) = \sum_{a \le x} p(a) \qquad \text{and} \qquad \bar{F}(x) = \sum_{a < x} p(a) + \frac{1}{2} p(x).$$
Round off $\bar{F}(x)$ to $l(x)$ bits, denoted by $\lfloor \bar{F}(x) \rfloor_{l(x)}$.

We use the first $l(x)$ bits of $\bar{F}(x)$ as the codeword for $x$. Note that if $l(x) = \lceil \log \frac{1}{p(x)} \rceil + 1$, then
$$\bar{F}(x) - \lfloor \bar{F}(x) \rfloor_{l(x)} < \frac{1}{2^{l(x)}} \le \frac{p(x)}{2} = \bar{F}(x) - F(x-1).$$
Hence, $\lfloor \bar{F}(x) \rfloor_{l(x)}$ lies within the step corresponding to $x$. Let each codeword $z_1 z_2 \cdots z_l$ represent the interval $[\,0.z_1 z_2 \cdots z_l,\ 0.z_1 z_2 \cdots z_l + \frac{1}{2^l}\,]$. The code is prefix-free if and only if the intervals corresponding to the codewords are disjoint.

Note that the interval $[\,0.z_1 z_2 \cdots z_l,\ 0.z_1 z_2 \cdots z_l + \frac{1}{2^l}\,]$ falls within the step $(F(x-1), F(x))$. Hence all the intervals are disjoint, since all the steps are disjoint.

The average length is
$$L = \sum_x p(x)\, l(x) = \sum_x p(x) \left( \left\lceil \log \frac{1}{p(x)} \right\rceil + 1 \right) < H(X) + 2.$$

Example:

x   p(x)   F(x)   $\bar{F}(x)$   $\bar{F}(x)$ in binary   l(x)   codeword
1   0.25   0.25   0.125          0.001                    3      001
2   0.25   0.5    0.375          0.011                    3      011
3   0.2    0.7    0.6            0.10011                  4      1001
4   0.15   0.85   0.775          0.1100011                4      1100
5   0.15   1.0    0.925          0.1110110                4      1110
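The table can be reproduced with a short sketch of the Shannon-Fano-Elias procedure (the function name `sfe_code` is mine):

```python
from math import ceil, log2

p = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}

def sfe_code(p):
    code, F = {}, 0.0
    for x in sorted(p):
        Fbar = F + p[x] / 2                    # midpoint of the step of x
        l = ceil(log2(1 / p[x])) + 1           # l(x) = ceil(log 1/p(x)) + 1
        bits, v = [], Fbar
        for _ in range(l):                     # first l(x) bits of the binary expansion
            v *= 2
            b = int(v)
            bits.append(str(b))
            v -= b
        code[x] = "".join(bits)
        F += p[x]
    return code

print(sfe_code(p))   # {1: '001', 2: '011', 3: '1001', 4: '1100', 5: '1110'}
```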

For small source alphabets, we have efficient coding only if we use long blocks of source symbols. Hence, it is desirable to have an efficient coding procedure that works for long blocks of source symbols. Huffman coding is not ideal for this situation, since it is a bottom-up procedure that requires the calculation of the probabilities of all source sequences of a particular block length and the construction of the corresponding complete code tree. Arithmetic coding is a direct extension of Shannon-Fano-Elias coding that is suitable for long block lengths without having to redo all the calculations. The essential idea of arithmetic coding is to efficiently calculate the probability mass function $p(x^n)$ and the cumulative distribution function $F(x^n)$ for the source sequence $x^n$.
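As a rough illustration of that idea, the sketch below (floating-point, for exposition only; a practical arithmetic coder uses scaled integer arithmetic and incremental output) narrows a subinterval of $[0,1)$ symbol by symbol according to the cumulative distribution and then emits just enough bits to identify the final interval:

```python
from math import ceil, log2

def arithmetic_encode(sequence, p):
    """Encode a symbol sequence under the i.i.d. model p (dict symbol -> prob)."""
    symbols = sorted(p)
    low, high = 0.0, 1.0
    for s in sequence:
        width = high - low
        cum = 0.0
        for t in symbols:                 # locate the sub-step for symbol s
            if t == s:
                break
            cum += p[t]
        low, high = low + width * cum, low + width * (cum + p[s])
    # enough bits of a point inside [low, high) to pin down the interval
    n_bits = ceil(log2(1 / (high - low))) + 1
    point, bits = (low + high) / 2, []
    for _ in range(n_bits):
        point *= 2
        b = int(point)
        bits.append(str(b))
        point -= b
    return "".join(bits)

p = {0: 0.9, 1: 0.1}
print(arithmetic_encode([0, 0, 1, 0, 0, 0, 0, 0], p))   # 6 bits for these 8 skewed-source symbols
```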