Chapter 2: Source coding


meghdadi@ensil.unilim.fr, University of Limoges

Entropy of a Markov Source

Markov model for information sources: given the present, the future is independent of the past. This can be stated as follows. For a source producing $X_1$, $X_2$ and $X_3$ we can always write
$$P(X_1, X_2, X_3) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2).$$
For a Markov source, however, the history of the source is summarized in its present state:
$$P(X_k = s_q \mid X_1, X_2, \ldots, X_{k-1}) = P(X_k = s_q \mid S_k),$$
i.e. the state contains all the past of the source.

Markov source modeling: the source can be in one of $n$ possible states. At each symbol generation, the source moves from state $i$ to state $j$. This state change occurs with probability $p_{ij}$, which depends only on the initial state $i$ and the final state $j$ and remains constant over time. The generated symbol depends only on the pair of states $(i, j)$. The transition matrix $P = [p_{ij}]$ collects the probabilities of transition from state $i$ to state $j$.

Example: a symbol is sent out at each transition. [Figure: three-state Markov chain; each transition is labeled with the emitted symbol (A, B or C) and its probability (1/2 or 1/4).] What is the probability of message A? Of message B? Of message AB? Give the transition matrix.

Entropy of a Markov source: each state can be considered as a source in its own right, and its entropy can be calculated. Averaging these entropies over all the states gives the entropy of the whole source:
$$H = \sum_{i=1}^{n} P_i H_i,$$
where $n$ is the number of states, $P_i$ is the (stationary) probability of state $i$, and
$$H_i = \sum_{j=1}^{n} P_{ij} \log_2 \frac{1}{P_{ij}}.$$
The information rate is then $R = r_s H$, where $r_s$ is the symbol rate.
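As a small numerical sketch (not from the slides), the Python snippet below evaluates $H = \sum_i P_i H_i$ for a hypothetical two-state chain whose rows contain the probabilities 3/4 and 1/4, in the spirit of the two-state example that follows; the exact matrix is an assumption made for illustration.

```python
import numpy as np

# Hypothetical two-state Markov source (assumed for illustration):
# from each state, stay with probability 3/4 and switch with probability 1/4.
P = np.array([[0.75, 0.25],
              [0.25, 0.75]])

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
eigvals, eigvecs = np.linalg.eig(P.T)
stat = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
stat = stat / stat.sum()                      # here: [0.5, 0.5]

# Per-state entropies H_i = sum_j P_ij * log2(1/P_ij), skipping zero entries.
H_i = np.array([-(row[row > 0] * np.log2(row[row > 0])).sum() for row in P])

H = float(stat @ H_i)                         # H = sum_i P_i * H_i
print(f"H = {H:.3f} bits/symbol")             # ~0.811 bits/symbol
```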

Example: for the following source, calculate the entropy of the source and the information per symbol contained in the messages of size 1, 2 and 3 symbols. Is it true that the information per symbol is smaller for longer messages? [Figure: two-state Markov chain with transition probabilities 3/4 and 1/4, emitted symbols A, B, C, and state probabilities $P_1 = 1/2$, $P_2 = 1/2$.]

Theorem: consider the messages $m_i$ of size $N$ at the output of a Markov source. Since the information content of a sequence $m_i$ is $-\log_2 p(m_i)$, the mean information per symbol is
$$G_N = -\frac{1}{N} \sum_i p(m_i) \log_2 p(m_i).$$
It can be shown that $G_N$ is a monotone decreasing function of $N$, and
$$\lim_{N \to \infty} G_N = H \quad \text{bits/symbol}.$$
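A tiny sketch of the same formula, with hypothetical message probabilities:

```python
import math

def mean_info_per_symbol(message_probs, N):
    """G_N = -(1/N) * sum_i p(m_i) * log2 p(m_i) over the messages of length N."""
    return -sum(p * math.log2(p) for p in message_probs if p > 0) / N

# Hypothetical distribution over four messages of length N = 2:
print(mean_info_per_symbol([0.4, 0.3, 0.2, 0.1], N=2))  # ~0.923 bits/symbol
```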

Application to compression: suppose a source is modeled by a Markov model. If we only gather statistics for sequences of one symbol, the correlation between consecutive symbols is not exploited, so the estimated information content is larger than it really is. If we consider sequences of $n$ symbols, more of the correlation is taken into account, so the estimated entropy decreases and gets closer to the real source entropy. Conclusion: to compress better, we should code longer sequences.

Two types of compression can be considered: lossless coding, where the information can be reconstructed exactly from the coded sequence (Shannon, Huffman, LZW, PKZIP, GIF, TIFF, ...), and lossy coding, where information is lost in the coding process (JPEG, MPEG, wavelet, transform coding, subband coding, ...). In this course we only consider lossless coding, and we further assume that the sequences at the output of the encoder are binary.

Definition: a source code $C$ for a random variable $X$ is a mapping from $\mathcal{X}$, the range of $X$, to the set of finite-length strings of symbols from a $D$-ary alphabet. $C(x)$ denotes the codeword corresponding to $x$ and $l(x)$ denotes the length of $C(x)$.
Example: if you toss a coin, $\mathcal{X} = \{\text{tail}, \text{head}\}$, and we can take $C(\text{head}) = 0$, $C(\text{tail}) = 11$, so $l(\text{head}) = 1$, $l(\text{tail}) = 2$.

Definition: the expected (average) length of a code $C$ for a random variable $X$ with probability mass function $p(x)$ is
$$L(C) = \sum_{x \in \mathcal{X}} p(x)\, l(x).$$
For the coin example above, the expected length is $L(C) = \frac{1}{2} \cdot 2 + \frac{1}{2} \cdot 1 = 1.5$ bits.
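A minimal sketch of the same computation (the code table is the coin example above):

```python
def expected_length(probabilities, codewords):
    """L(C) = sum over x of p(x) * l(x), with l(x) the codeword length."""
    return sum(p * len(c) for p, c in zip(probabilities, codewords))

# Coin example: C(head) = '0', C(tail) = '11', each with probability 1/2.
print(expected_length([0.5, 0.5], ['0', '11']))  # 1.5
```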

Definition: a code is singular if $C(x_1) = C(x_2)$ for some $x_1 \neq x_2$.
Definition: the extension of a code $C$ is the code obtained as
$$C(x_1 x_2 \ldots x_n) = C(x_1)\,C(x_2) \ldots C(x_n),$$
i.e. a long message is coded by concatenating the codewords of its symbols. For example, if $C(x_1) = 11$ and $C(x_2) = 00$, then $C(x_1 x_2) = 1100$.

Definition: a code is uniquely decodable if its extension is non-singular. In other words, any encoded string corresponds to only one possible source string; there is no ambiguity.
Definition: a code is called a prefix code (or instantaneous code) if no codeword is a prefix of any other codeword.

Example: the following code is a prefix code: $C(x_1) = 1$, $C(x_2) = 01$, $C(x_3) = 001$, $C(x_4) = 000$. Any encoded sequence is uniquely decodable, and each source symbol can be recovered as soon as its codeword has been received. In other words, an instantaneous code can be decoded without reference to future codewords, since the end of a codeword is immediately recognizable. For example, the sequence 001100000001 is decoded as $x_3 x_1 x_4 x_4 x_2$.
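A minimal decoding sketch for this prefix code (the codebook is the one from the example; the function and symbol names are illustrative):

```python
def prefix_decode(bits, codebook):
    """Decode a bit string with a prefix code by matching codewords greedily.

    Because no codeword is a prefix of another, the match at each position is unique.
    """
    inverse = {word: symbol for symbol, word in codebook.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:          # end of a codeword is immediately recognizable
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols

codebook = {"x1": "1", "x2": "01", "x3": "001", "x4": "000"}
print(prefix_decode("001100000001", codebook))  # ['x3', 'x1', 'x4', 'x4', 'x2']
```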

Example: the following code is not instantaneous but is uniquely decodable: $C(x_1) = 1$, $C(x_2) = 10$, $C(x_3) = 100$, $C(x_4) = 000$. Why? Here you have to wait for the next 1 before you can decode the current codeword. Note that if the encoded sequence is read from right to left, the code becomes instantaneous.

Kraft-McMillan inequality: for any uniquely decodable code $C$,
$$\sum_{w \in C} D^{-l(w)} \leq 1,$$
where $w$ is a codeword of $C$, $l(w)$ is its length, and $D$ is the alphabet size ($D = 2$ for binary sequences).
Lemma: for any source $X$ with a probability mass function and an associated uniquely decodable code $C$,
$$H(X) \leq L(C).$$
The proof uses the Kraft-McMillan inequality and Jensen's inequality for the concave logarithm function: $\sum_i p_i f(x_i) \leq f\!\left(\sum_i p_i x_i\right)$.
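A quick sketch of the Kraft-McMillan sum, using the codeword lengths 1, 2, 3, 3 of the tree example that follows:

```python
def kraft_sum(lengths, D=2):
    """Return sum over codewords of D^(-l(w)); it is <= 1 for any uniquely decodable code."""
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0   -> complete code, equality holds
print(kraft_sum([1, 2, 3]))      # 0.875 -> strict inequality (tree not complete)
```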

Kraft-McMillan inequality, example. [Figure: binary code tree for the codewords A = 1, B = 01, C = 001, D = 000; the nodes below a codeword are marked as codewords that cannot be used.] Here $D = 2$ and there are four codewords A = 1, B = 01, C = 001, D = 000, with lengths $l_i = 1, 2, 3, 3$, so $\sum_{w \in C} 2^{-l(w)} = 1$. If all branches of the tree are used, the code is complete and equality holds, as above. If the tree is not complete, for example if the codeword C in the figure is not used, then equality cannot be reached in the Kraft inequality.

Results: the average code length is lower bounded by the source entropy, $L(C) \geq H(X)$. It can also be shown that an optimal prefix code satisfies the entropy-based upper bound
$$L(C) < H(X) + 1.$$

A systematic method to design the code: the input of the encoder is one of the $q$ possible sequences of $N$ symbols, denoted $m_i$, with corresponding probabilities $p_1, p_2, \ldots, p_q$. The encoder transforms $m_i$ into a binary sequence $c_i$, trying to minimize the average output length $L(C)$. The average number of bits per symbol at the output is
$$L(C) = \frac{1}{N} \sum_{i=1}^{q} n_i p_i \quad \text{bits/symbol},$$
where $n_i$ is the number of bits in the coded sequence $c_i$. For a good encoder, $L(C)$ should be very close to the input entropy
$$G_N = \frac{1}{N} \sum_{i=1}^{q} p_i \log_2 \frac{1}{p_i}.$$

The idea is to assign shorter codewords to more probable messages; the result is a variable-length code.
Theorem: if $C$ is an optimal prefix code for the probabilities $\{p_1, p_2, \ldots, p_n\}$, then $p_i > p_j$ implies $l(c_i) \leq l(c_j)$.

1. Order the messages from the most probable to the least probable, from $m_1$ to $m_q$.
2. Put these messages in the first column of a table.
3. In the second column, put $n_i$ such that $\log_2(1/p_i) \leq n_i < 1 + \log_2(1/p_i)$.
4. In the third column, write $F_i = \sum_{k=1}^{i-1} p_k$, with $F_1 = 0$.
5. In the fourth column, write the binary representation of $F_i$ to $n_i$ bits. This column directly gives the codeword for the message in the first column.
A sketch of this construction in code follows.
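A minimal sketch of the table construction above (this is the classical Shannon construction; the function name and the example probabilities are my own):

```python
from math import ceil, log2

def shannon_code(probs):
    """Build codewords following the table construction above.

    probs must be sorted in decreasing order. For message i,
    n_i = ceil(log2(1/p_i)) and the codeword is the binary expansion of
    F_i = p_1 + ... + p_{i-1} truncated to n_i bits.
    """
    codewords = []
    F = 0.0
    for p in probs:
        n = ceil(log2(1.0 / p))            # log2(1/p) <= n < 1 + log2(1/p)
        bits = ""
        frac = F
        for _ in range(n):                 # binary expansion of F to n bits
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        codewords.append(bits)
        F += p
    return codewords

# Hypothetical example: probabilities 1/2, 1/4, 1/8, 1/8 (already ordered).
print(shannon_code([0.5, 0.25, 0.125, 0.125]))  # ['0', '10', '110', '111']
```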

Properties: this is a uniquely decodable prefix code, the more probable messages get shorter codewords, and all codewords are distinct. The average number of bits per symbol at the output satisfies
$$G_N \leq L(C) < G_N + \frac{1}{N}.$$
As $N \to \infty$, $L(C) \to G_N$ and, since $G_N \to H$, also $L(C) \to H$. The code performance is quantified by the efficiency $e = H / L(C)$.

Example. [Figure: the two-state Markov source of the earlier example, with transition probabilities 3/4 and 1/4 and state probabilities $P_1 = 1/2$, $P_2 = 1/2$.] For this source, give the code for (1) the messages of one symbol, (2) the messages of two symbols, (3) the messages of three symbols. Show that the code efficiencies are $e_1 = 40.56\%$, $e_2 = 56.34\%$ and $e_3 = 62.40\%$.

Huffman algorithm: constructs the coding tree with a systematic method. Suppose the messages are ordered by probability, $m_1$ being the most probable and $m_q$ the least probable. We take the two least probable messages and sum their probabilities to obtain a new message, building the tree at the same time (the two merged messages become the branches labeled 0 and 1 of a node with probability $p_q + p_{q-1}$). We then repeat the procedure on the new set of messages.

Huffman algorithm, example. Seven symbols with the probabilities below are encoded by repeatedly merging the two least probable entries (0.08 + 0.10 = 0.18, 0.10 + 0.12 = 0.22, 0.15 + 0.18 = 0.33, 0.20 + 0.22 = 0.42, 0.25 + 0.33 = 0.58, 0.42 + 0.58 = 1), which gives the codewords:

symbol   probability   codeword
a        0.25          01
b        0.20          11
c        0.15          001
d        0.12          101
e        0.10          100
f        0.10          0001
g        0.08          0000
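A compact Huffman sketch run on the probabilities of this example (my own illustration; Huffman codes are not unique, so the exact bits can differ from the table when probabilities tie, but the codeword lengths and the average length are the same):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return a {symbol: codeword} dict built by repeatedly merging the two
    least probable entries, as in the Huffman procedure described above."""
    tiebreak = count()  # avoids comparing dicts when probabilities are equal
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable -> branch 0
        p1, _, code1 = heapq.heappop(heap)   # second least probable -> branch 1
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.25, "b": 0.20, "c": 0.15, "d": 0.12, "e": 0.10, "f": 0.10, "g": 0.08}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)
print(code)   # two 2-bit, three 3-bit and two 4-bit codewords, as in the table (up to ties)
print(f"average length = {avg:.2f} bits/symbol")   # 2.73
```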

Lempel-Ziv algorithm: Huffman encoding is optimal (the average codeword length is minimal) but requires the probability distribution of the source. In Huffman coding the input blocks have fixed length and the output codewords have variable length. In LZ coding no knowledge of the probabilities is required, so LZ coding belongs to the class of universal source codes; here the input sequences can have variable length while the output codewords have fixed length.

Dictionary construction: in this algorithm we construct a dictionary on the fly, which is why the code is called dictionary coding. Suppose the binary message 10101101001001110101000011001110101100011011 is to be coded. Initially there is no entry in the table. The encoder creates the first entry from the first letter of the sequence, 1, at location 1. The second letter, 0, is not in the table, so it is added at location 2. The next letter is a 1, which is already in the table, so the encoder also reads the following letter; now 10 is not in the table, so it is added at location 3, and so on.

For this example, the dictionary for the sequence 10101101001001110101000011001110101100011011 will be:
1, 0, 10, 11, 01, 00, 100, 111, 010, 1000, 011, 001, 110, 101, 10001, 1011
Note that each phrase is the concatenation of a previous phrase in the table with one new letter appended. To encode the sequence, each codeword is the position of that previous phrase in the dictionary followed by the new letter. Initially, the location 0000 is used for a phrase whose prefix has not appeared previously.

Encoding: assuming codewords of length 5 (4 bits for the dictionary location plus 1 new letter), the constructed dictionary and the corresponding codewords are:

location   location (binary)   content   codeword
1          0001                1         00001
2          0010                0         00000
3          0011                10        00010
4          0100                11        00011
5          0101                01        00101
6          0110                00        00100
7          0111                100       00110
...        ...                 ...       ...
15         1111                10001     10101

A sketch of this encoder in code is given below.
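A minimal sketch of this encoder (my own illustration of the scheme above, with a fixed 4-bit location field and one new letter per codeword; a practical implementation would let the field width grow with the dictionary):

```python
def lz_encode(message, loc_bits=4):
    """Parse the message into phrases (previous phrase + one new letter) and emit
    one codeword per phrase: the location of the previous phrase in loc_bits bits,
    followed by the new letter. Location 0 means 'no previous phrase'."""
    dictionary = {}          # phrase -> location (1-based)
    codewords = []
    phrase = ""
    for letter in message:
        if phrase + letter in dictionary:
            phrase += letter             # keep extending while the phrase is known
            continue
        prev_loc = dictionary.get(phrase, 0)
        codewords.append(format(prev_loc, f"0{loc_bits}b") + letter)
        dictionary[phrase + letter] = len(dictionary) + 1
        phrase = ""
    return dictionary, codewords

seq = "10101101001001110101000011001110101100011011"
table, codes = lz_encode(seq)
print(list(table))     # ['1', '0', '10', '11', '01', '00', '100', ...]
print(codes[:7])       # ['00001', '00000', '00010', '00011', '00101', '00100', '00110']
```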

Lempel-Ziv decoding: the decoder must also construct the table on the fly. For the previous sequence, the receiver only needs to know that the codeword length is 5. It receives 00001, so the first entry of the table is 1; this is the first decoded phrase, stored at location 1. It then receives 00000, which says that the second entry of the table is 0, so the decoded phrase is 0. The third codeword is 00010, which means that a 0 is appended to the phrase at location 0001: the decoded phrase is the content of the first row, 1, with a 0 at the end, i.e. 10, and so on.
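A matching decoder sketch, under the same assumptions as the encoder sketch above:

```python
def lz_decode(codewords, loc_bits=4):
    """Rebuild the dictionary on the fly: each codeword names a previously decoded
    phrase (location 0 = empty phrase) and appends one new letter to it."""
    dictionary = [""]                      # location 0 is the empty phrase
    decoded = []
    for word in codewords:
        prev_loc = int(word[:loc_bits], 2)
        phrase = dictionary[prev_loc] + word[loc_bits:]
        dictionary.append(phrase)          # stored at the next free location
        decoded.append(phrase)
    return "".join(decoded)

codes = ["00001", "00000", "00010", "00011", "00101", "00100", "00110"]
print(lz_decode(codes))    # "1010110100100" = first 13 bits of the original message
```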