3F1 Information Theory, Lecture 3

3F1 Information Theory, Lecture 3
Jossy Sayir, Department of Engineering
Michaelmas 2011, 28 November 2011
Outline: Memoryless Sources, Arithmetic Coding, Sources with Memory

Summary of last lecture (2/19)
Prefix-free code: a rooted tree with probabilities.
Kraft's inequality: Σ_i D^{-w_i} ≤ 1.
Shannon-Fano coding: w_i = ⌈-log_D p_i⌉.
Huffman's optimal code.
Performance of Shannon-Fano and Huffman coding, and Shannon's converse theorem:
H(X)/log D ≤ E[W] < H(X)/log D + 1
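As a quick numerical illustration of these bounds (my own sketch, not part of the slides), the snippet below computes the Shannon-Fano lengths w_i = ⌈-log_2 p_i⌉ for the six-symbol source that reappears on slide 7 of this lecture, then checks Kraft's inequality and the bound H(X) ≤ E[W] < H(X) + 1 for the binary case D = 2.

```python
# Sketch: Shannon-Fano lengths, Kraft's inequality and the coding bounds for D = 2.

import math

def shannon_fano_lengths(probs, D=2):
    """Codeword lengths w_i = ceil(-log_D p_i) for a D-ary Shannon-Fano code."""
    return [math.ceil(-math.log(p, D)) for p in probs]

def entropy(probs, D=2):
    """Source entropy in D-ary units."""
    return -sum(p * math.log(p, D) for p in probs if p > 0)

probs = [0.30, 0.20, 0.20, 0.15, 0.10, 0.05]   # the F, D, E, C, B, A example of slide 7
w = shannon_fano_lengths(probs)
kraft = sum(2.0 ** -wi for wi in w)
EW = sum(p * wi for p, wi in zip(probs, w))
H = entropy(probs)
print(w)                                       # [2, 3, 3, 3, 4, 5]
print(kraft <= 1.0, H <= EW < H + 1)           # Kraft holds, bounds hold
```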

Encoding the output of a source (3/19)
Binary Memoryless Source X_1, X_2, X_3, ...
X_1, X_2, ... independent and identically distributed, with P_{X_i}(1) = 1 - P_{X_i}(0) = 0.01.
What is the optimal binary variable-length code for X_i?
E[W] = 1, but H(X) = 0.081 bits.
Redundancy: ρ_1 = E[W] - H(X) = 0.919.
Can we do better?
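A quick check of these figures (a sketch of mine, not from the slides): with P(1) = 0.01 the binary entropy function gives H(X) ≈ 0.081 bits, so spending one code bit per symbol is almost entirely redundant.

```python
# Sketch: binary entropy and redundancy for the P(1) = 0.01 source.

import math

def h(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

HX = h(0.01)
print(round(HX, 4), round(1 - HX, 4))   # ~0.0808 bits and ~0.919 bits of redundancy
```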


Parsing a source into blocks of 2 (4/19)

x_1 x_2   P_{X_1 X_2}(x_1, x_2)   Codeword
00        0.9801                  0
01        0.0099                  10
10        0.0099                  110
11        0.0001                  111

E[W] = 1.0299, H(X_1 X_2) = 0.1616 bits, where
H(X_1 X_2) = - Σ_{x_1 x_2} P_{X_1 X_2}(x_1, x_2) log P_{X_1 X_2}(x_1, x_2)
Redundancy per symbol: ρ_2 = (E[W] - H(X_1 X_2)) / 2 = 0.4342
Can we do better?
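The table above can be reproduced programmatically. The sketch below (my own code, not the lecturer's) builds a Huffman code for the four pairs with a standard heap-based construction and recovers E[W] = 1.0299 and ρ_2 ≈ 0.4342; the codeword assignment may differ from the table on ties, but the lengths and averages are the same.

```python
# Sketch: Huffman code for pairs of symbols from the P(0) = 0.99, P(1) = 0.01 source.

import heapq, itertools, math

def huffman(probs):
    """Return a dict symbol -> binary codeword for a dict of probabilities."""
    heap = [(p, i, (sym,)) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    code = {sym: "" for sym in probs}
    counter = itertools.count(len(heap))          # tiebreaker for equal probabilities
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1:
            code[s] = "0" + code[s]               # extend codewords of the merged subtree
        for s in syms2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p1 + p2, next(counter), syms1 + syms2))
    return code

p = {"0": 0.99, "1": 0.01}
pairs = {a + b: p[a] * p[b] for a in p for b in p}
code = huffman(pairs)
EW = sum(pairs[s] * len(code[s]) for s in pairs)
H2 = -sum(q * math.log2(q) for q in pairs.values())
print(code)                                       # e.g. {'00': '0', '01': '10', ...}
print(round(EW, 4), round((EW - H2) / 2, 4))      # 1.0299 and 0.4342, as on the slide
```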

Encoding a block of length N (5/19)
If a block of length N is encoded using Huffman or Shannon-Fano coding,
H(X_1 ... X_N) ≤ E[W] < H(X_1 ... X_N) + 1
where H(X_1 ... X_N) = - Σ_{x_1 ... x_N} P(x_1, ..., x_N) log P(x_1, ..., x_N).
Thus,
ρ_N = (E[W] - H(X_1 ... X_N)) / N < (H(X_1 ... X_N) + 1 - H(X_1 ... X_N)) / N = 1/N
and lim_{N → ∞} ρ_N = 0, i.e., the redundancy per symbol tends to zero.

The problem with block compression (6/19)
The alphabet size grows exponentially in the block length N.
Huffman's and Fano's algorithms become infeasible for large N, as does storing the codebook.
In Shannon's version of Shannon-Fano coding, the cumulative probability can be computed recursively: we can compute the cumulative probability of a specific block without computing those of all the other blocks.
If only there weren't the need for the alphabet to be ordered...

Reminder: Shannon's version of S-F coding (7/19)

x   P_X(x)   P(X < x)   [P(X < x)] in binary   ⌈-log_2 P_X(x)⌉   Codeword
F   0.30     0.00       0.00000000             2                  00
D   0.20     0.30       0.01001101             3                  010
E   0.20     0.50       0.10000000             3                  100
C   0.15     0.70       0.10110011             3                  101
B   0.10     0.85       0.11011010             4                  1101
A   0.05     0.95       0.11110011             5                  11110

Order the symbols by non-increasing probability.
Compute the cumulative probabilities P(X < x).
Express the cumulative probabilities in D-ary.
The codeword is the fractional part of the cumulative probability, truncated to ⌈-log_D P_X(x)⌉ digits.
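The recipe on this slide is short enough to code directly. Below is a sketch of my own (not the lecturer's implementation) that sorts by non-increasing probability, accumulates P(X < x), expands it in binary and truncates to ⌈-log_2 P_X(x)⌉ digits; applied to the table above it reproduces the codewords 00, 010, 100, 101, 1101, 11110.

```python
# Sketch: Shannon's version of Shannon-Fano coding for D = 2.
# Floating point is adequate at this precision; an exact version would use fractions.

import math

def shannon_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    code = {}
    cumulative = 0.0
    for sym, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        length = math.ceil(-math.log2(p))
        # binary expansion of the cumulative probability, truncated to `length` digits
        frac, bits = cumulative, ""
        for _ in range(length):
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        code[sym] = bits
        cumulative += p
    return code

probs = {"F": 0.30, "D": 0.20, "E": 0.20, "C": 0.15, "B": 0.10, "A": 0.05}
print(shannon_code(probs))
# expected (up to the tie between D and E): F->00, D->010, E->100, C->101, B->1101, A->11110
```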

Source intervals / code intervals (8/19)

x   P(X < x)   [P(X < x)] in binary, truncated
F   0.00       0.00
D   0.30       0.010
E   0.50       0.100
C   0.70       0.101
B   0.85       0.1101
A   0.95       0.11110

[Figure: the unit interval divided into the source intervals F, D, E, C, B, A, with the corresponding code intervals drawn alongside.]

What if we don't order the source probabilities? (9/19)

x   P(X < x)   [P(X < x)] in binary, truncated
A   0.00       0.00000
B   0.05       0.000
C   0.15       0.001
D   0.30       0.010
E   0.50       0.100
F   0.70       0.11

[Figure: the source intervals A to F on the unit interval and the resulting code intervals; some code intervals now overlap, so the code is no longer prefix-free.]

New approach (10/19)

x   [a, b]
A   [0.00, 0.05]
B   [0.05, 0.15]
C   [0.15, 0.30]
D   [0.30, 0.50]
E   [0.50, 0.70]
F   [0.70, 1.00]

where a = P(X < x) and b = P(X < x) + P(X = x).
Draw the source intervals in no particular order.
Pick the largest interval [k 2^{-w_i}, (k+1) 2^{-w_i}] that fits in each source interval.
How large is w_i? w_i ≤ ⌈-log p_i⌉ + 1.
[Figure: the source intervals on the unit interval with the chosen dyadic code intervals drawn inside them.]
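The rule "pick the largest dyadic interval that fits" can be made concrete as follows. This is a sketch of my own; when several dyadic intervals of the largest size fit, it simply takes the leftmost one. Applied to the unordered intervals above, it always succeeds with w_i ≤ ⌈-log_2 p_i⌉ + 1 digits.

```python
# Sketch: find the largest dyadic interval [k 2^-w, (k+1) 2^-w] inside a source
# interval [a, b]; the codeword is the w-bit binary expansion of k.

import math

def dyadic_codeword(a, b):
    """Smallest w (hence largest dyadic cell) such that some cell fits inside [a, b]."""
    w = 0
    while True:
        w += 1
        k = math.ceil(a * 2 ** w)               # first grid point >= a at resolution 2^-w
        if (k + 1) * 2.0 ** -w <= b:            # does the whole dyadic cell fit?
            return format(k, "0{}b".format(w))  # w-bit codeword for k

intervals = {"A": (0.0, 0.05), "B": (0.05, 0.15), "C": (0.15, 0.3),
             "D": (0.3, 0.5), "E": (0.5, 0.7), "F": (0.7, 1.0)}
for x, (a, b) in intervals.items():
    print(x, dyadic_codeword(a, b))             # e.g. A -> 00000, D -> 011, F -> 11
```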


Arithmetic Coding (11/19)
Recursive computation of the source interval:
[Figure: the source interval being narrowed step by step as the successive source symbols (0 or 1) of the block are read.]
At the end of the source block, generate the shortest possible codeword whose interval fits inside the computed source interval:
E[W] < H(X_1 ... X_N) + 2
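To make the recursion explicit, here is a minimal encoder sketch (my own, not the lecture's implementation) for a memoryless source, using Python's Fraction type so that the "infinite precision" recursion is taken literally: the source interval is narrowed symbol by symbol, and at the end the shortest codeword whose dyadic interval fits inside it is emitted, reusing the rule from slide 10.

```python
# Sketch: exact-arithmetic encoder for a memoryless source.

from fractions import Fraction
from math import ceil

def arithmetic_encode(block, p_low, p_sym):
    """p_low[x] = P(X < x), p_sym[x] = P(X = x), given as exact Fractions."""
    a, width = Fraction(0), Fraction(1)
    for x in block:                      # recursive narrowing of the source interval
        a += width * p_low[x]
        width *= p_sym[x]
    b = a + width
    w = 0                                # shortest codeword whose interval fits in [a, b)
    while True:
        w += 1
        k = ceil(a * 2 ** w)             # leftmost dyadic grid point >= a at resolution 2^-w
        if Fraction(k + 1, 2 ** w) <= b:
            return format(k, "0{}b".format(w))

# Binary memoryless source with P(0) = 0.99, P(1) = 0.01, as on slide 3:
p_sym = {"0": Fraction(99, 100), "1": Fraction(1, 100)}
p_low = {"0": Fraction(0), "1": Fraction(99, 100)}
print(arithmetic_encode("0000000100", p_low, p_sym))   # a codeword of ~8 bits for 10 symbols
```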

Arithmetic Coding = recursive Shannon-Fano coding (12/19)
Claude E. Shannon, Robert L. Fano, Peter Elias, Richard C. Pasco, Jorma Rissanen.
H(X_1 ... X_N) ≤ E[W] < H(X_1 ... X_N) + 2, so ρ_N < 2/N.
There is no need to wait until the whole source block has been received before starting to generate code symbols.
Arithmetic coding is an infinite state machine whose states need ever-growing precision.
A finite-precision implementation requires a few tricks (and loses some performance).

English Text (13/19)

Letter  Frequency   Letter  Frequency   Letter  Frequency
a       0.08167     j       0.00153     s       0.06327
b       0.01492     k       0.00772     t       0.09056
c       0.02782     l       0.04025     u       0.02758
d       0.04253     m       0.02406     v       0.00978
e       0.12702     n       0.06749     w       0.02360
f       0.02228     o       0.07507     x       0.00150
g       0.02015     p       0.01929     y       0.01974
h       0.06094     q       0.00095     z       0.00074
i       0.06966     r       0.05987

H(X) = 4.17576 bits
P("and") = 0.08167 x 0.06749 x 0.04253 = 0.0002344216
P("eee") = 0.12702^3 = 0.00204935
P("eee") >> P("and"). Is this right? No!
Can we do better than H(X) for sources with memory?
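The numbers on this slide are easy to reproduce from the frequency table (a sketch of my own below): the memoryless letter model has an entropy of about 4.18 bits per letter and gives "eee" almost nine times the probability of "and", which is exactly the failure the following slides address.

```python
# Sketch: per-letter entropy and word probabilities under the memoryless letter model.

from math import log2

freq = {"a": 0.08167, "b": 0.01492, "c": 0.02782, "d": 0.04253, "e": 0.12702,
        "f": 0.02228, "g": 0.02015, "h": 0.06094, "i": 0.06966, "j": 0.00153,
        "k": 0.00772, "l": 0.04025, "m": 0.02406, "n": 0.06749, "o": 0.07507,
        "p": 0.01929, "q": 0.00095, "r": 0.05987, "s": 0.06327, "t": 0.09056,
        "u": 0.02758, "v": 0.00978, "w": 0.02360, "x": 0.00150, "y": 0.01974,
        "z": 0.00074}

H = -sum(p * log2(p) for p in freq.values())

def p_word(word):
    """Probability of a word if letters were independent with these frequencies."""
    prob = 1.0
    for c in word:
        prob *= freq[c]
    return prob

print(round(H, 5), p_word("and"), p_word("eee"))   # ~4.18 bits, ~2.34e-4, ~2.05e-3
```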


Entropy revisited (14/19)
We already know the symbol and block entropies:
H(X) = - Σ_x P_X(x) log P_X(x)
H(XY) = - Σ_{x,y} P_XY(x, y) log P_XY(x, y)
Entropy conditioned on an event:
H(X | Y = y) = - Σ_x P_{X|Y}(x|y) log P_{X|Y}(x|y)
Equivocation, or conditional entropy:
H(X|Y) = H(XY) - H(Y) = E_Y[H(X | Y = y)] = Σ_y P_Y(y) H(X | Y = y)
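These definitions translate directly into a few lines of code. The sketch below (my own, using a made-up joint distribution rather than one from the lecture) computes H(X), H(XY) and H(X|Y) and checks the identity H(X|Y) = H(XY) - H(Y).

```python
# Sketch: entropies and conditional entropy from a joint distribution (toy example).

from math import log2
from collections import defaultdict

P_XY = {("0", "a"): 0.4, ("0", "b"): 0.1,
        ("1", "a"): 0.1, ("1", "b"): 0.4}      # hypothetical joint distribution

def H(dist):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

P_X, P_Y = defaultdict(float), defaultdict(float)
for (x, y), p in P_XY.items():                  # marginalise the joint distribution
    P_X[x] += p
    P_Y[y] += p

H_XY, H_X, H_Y = H(P_XY), H(P_X), H(P_Y)
H_X_given_Y = H_XY - H_Y                        # equivocation H(X|Y)
print(H_X, round(H_XY, 4), round(H_X_given_Y, 4))   # 1.0, ~1.7219, ~0.7219 < H(X)
```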

Does conditioning reduce uncertainty? (15/19)
X: colour of a randomly picked ball; Y: row of the picked ball (1-10).
(In the picture, 5 of the 10 rows contain two equally likely colours and the other 5 contain balls of a single colour.)
H(X) = h(1/4) = 2 - (3/4) log 3 = 0.811 bits
H(X | Y = 1) = 1 bit > H(X)
H(X | Y = 2) = 0 bits < H(X)
H(X|Y) = (5 x 1 + 5 x 0) / 10 = 1/2 < H(X)
Conditioning Theorem: 0 ≤ H(X|Y) ≤ H(X)

Discrete Stationary Source (DSS) properties (16/19)
Define H_N(X) := (1/N) H(X_1 ... X_N).
Then H(X_N | X_1 ... X_{N-1}) ≤ H_N(X), and both H(X_N | X_1 ... X_{N-1}) and H_N(X) are non-increasing functions of N.
Entropy rate:
lim_{N → ∞} H(X_N | X_1 ... X_{N-1}) = lim_{N → ∞} H_N(X) := H_∞(X)
H_∞(X) is the entropy rate of a DSS.
Shannon's converse source coding theorem for a DSS:
E[W] / N ≥ H_∞(X) / log D
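For a concrete source with memory, the sketch below (my own example: a stationary symmetric binary Markov source with flip probability 0.1, not a source from the lecture) uses the chain rule H(X_1 ... X_N) = H(X_1) + (N-1) H(X_2|X_1) to show H_N(X) decreasing with N towards the entropy rate H_∞(X) = H(X_2|X_1).

```python
# Sketch: H_N(X) for a symmetric binary Markov source, converging to the entropy rate.

from math import log2

def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

flip = 0.1                   # P(X_{n+1} != X_n); stationary distribution is (1/2, 1/2)
H1 = h(0.5)                  # H(X_1) = 1 bit
H_cond = h(flip)             # H(X_2 | X_1) = h(0.1) ~ 0.469 bits = entropy rate

for N in (1, 2, 4, 8, 16):
    H_N = (H1 + (N - 1) * H_cond) / N
    print(N, round(H_N, 4))  # non-increasing in N, tending to ~0.469
```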

Coding for discrete stationary sources (17/19)
Arithmetic coding can use conditional probabilities: the intervals will be different at every step, depending on the source context.
Prediction by Partial Matching (PPM) and Context Tree Weighting (CTW) are techniques that build the context-tree-based source model on the fly, achieving compression rates of approximately 2.2 binary symbols per ASCII character.
What is H_∞(X) for English text? (This assumes language is a stationary source, which is a disputed proposition.)
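As a toy illustration of the idea (this is neither PPM nor CTW, just a minimal adaptive order-1 context model of my own), the sketch below estimates conditional probabilities from running counts with add-one smoothing and accumulates the ideal code length -log_2 P that an arithmetic coder driven by this model would spend. The numbers it prints say nothing about real English text; they only demonstrate the mechanics.

```python
# Sketch: ideal code length of an adaptive order-1 context model (toy, not PPM/CTW).

from math import log2
from collections import defaultdict

def ideal_code_length(text, alphabet):
    counts = defaultdict(lambda: {a: 1 for a in alphabet})   # add-one smoothed counts
    bits, prev = 0.0, None
    for c in text:
        ctx = counts[prev]
        p = ctx[c] / sum(ctx.values())      # conditional probability in this context
        bits += -log2(p)                    # what an arithmetic coder would spend
        ctx[c] += 1                         # update the model *after* coding, so the
        prev = c                            # decoder can make the same update in sync
    return bits / len(text)

text = "the theory of information the theory of coding " * 20   # hypothetical sample text
print(round(ideal_code_length(text, sorted(set(text))), 3), "bits/char")
```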

Shannon's Twin Experiment (18/19)
[Figure: block diagram Source → Encoder → Decoder → Sink, with a "?" box (one of the twins) attached to the encoder and another to the decoder.]

Shannon's twin experiment (19/19)
Shannon 1 and Shannon 2 are hypothetical, fully identical twins: they look alike, talk alike, and think exactly alike.
An operator at the transmitter asks Shannon 1 to guess the next source symbol based on the context.
The operator counts the number of guesses until Shannon 1 gets it right, and transmits this number.
An operator at the receiver asks Shannon 2 to guess the next source symbol based on the same context. Shannon 2 will arrive at the same answer as Shannon 1 after the same number of guesses.
An upper bound on the entropy rate of English is the entropy of the number of guesses.
A better bound would take dependencies between successive numbers of guesses into account: if Shannon 1 needed many guesses for a symbol, chances are he will need many for the next one as well, whereas if he guessed right on the first try, he is probably in the middle of a word and will guess the next symbol correctly too.
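The experiment is easy to mimic in code. The sketch below is my own toy version, using an order-1 frequency predictor in place of a human twin and a made-up repeated sentence in place of English text: it replaces each character by the number of guesses the predictor needs and, in the spirit of the bound on this slide, reports the entropy of that guess sequence. The printed figure characterises only this toy predictor and text, not English.

```python
# Sketch: guessing-game bound with an order-1 frequency predictor standing in for the twin.

from math import log2
from collections import Counter, defaultdict

def guess_counts(text, alphabet):
    """Number of guesses the predictor needs for each character (both 'twins' update alike)."""
    counts = defaultdict(Counter)
    prev, guesses = None, []
    for c in text:
        seen = [s for s, _ in counts[prev].most_common()]
        ranking = seen + [s for s in alphabet if s not in seen]   # full guessing order
        guesses.append(ranking.index(c) + 1)
        counts[prev][c] += 1
        prev = c
    return guesses

def entropy(seq):
    """Empirical entropy (bits/symbol) of the sequence of guess counts."""
    n = len(seq)
    return -sum((k / n) * log2(k / n) for k in Counter(seq).values())

text = "information theory is the theory of compression and transmission " * 40
g = guess_counts(text, sorted(set(text)))
print(round(entropy(g), 3), "bits per character for this toy predictor and text")
```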