Codes and Cryptography, PART III: Optimal Codes (I). Jorge L. Villar. MAMME, Fall 2015

Outline

1 Information Sources and Optimal Codes
2 Building Optimal Codes: Huffman Codes
3 Shannon Entropy and Mutual Information

Sources

Information source: a sequence of random variables taking values on a finite source alphabet.
Stateless and stationary source: the random variables are independent and identically distributed.
Mathematical model of a source S:
  Source alphabet A = {α_1, ..., α_n}
  Probability distribution (p_1, ..., p_n), where p_i is the probability of symbol α_i
We simply say S = (p_1, ..., p_n).

Optimal Codes (I)

Optimal code: an r-ary uniquely-decodable code C that minimizes the average codeword length
  L(C, S) = \sum_i l(w_i) p_i
where w_i is the codeword associated to the source symbol α_i.
It is the meaningful efficiency measure when long sequences of input symbols are to be encoded.
From McMillan's inequality we have the additional constraint
  \sum_i r^{-l(w_i)} \le 1
There always exists a prefix-free optimal code.
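As a quick illustration of these two quantities, here is a minimal Python sketch (the 4-symbol source and the code lengths below are hypothetical, not from the slides) that evaluates the average codeword length L(C, S) and the McMillan sum for a candidate set of lengths:

```python
def kraft_mcmillan_sum(lengths, r=2):
    """Sum of r^(-l_i); McMillan: any uniquely-decodable code satisfies sum <= 1."""
    return sum(r ** (-l) for l in lengths)

def average_length(probs, lengths):
    """L(C, S) = sum_i l(w_i) * p_i."""
    return sum(p * l for p, l in zip(probs, lengths))

# Hypothetical 4-symbol source with a binary prefix-free code of lengths
# (1, 2, 3, 3), e.g. C = {0, 10, 110, 111}.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]
print(kraft_mcmillan_sum(lengths, r=2))  # 1.0 <= 1, so such a code can exist
print(average_length(probs, lengths))    # 1.75 binary digits per source symbol
```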

Optimal Codes (II): Shannon's Lower Bound

We solve the continuous optimization problem: minimize L = \sum_i l_i p_i over l_1, ..., l_n subject to \sum_i r^{-l_i} \le 1.
As L is a linear function, its minima are on the boundary. Now, using Lagrange multipliers,
  \nabla L = (p_1, ..., p_n) = \lambda \nabla\big( \sum_i r^{-l_i} - 1 \big) = -\lambda \ln r \, (r^{-l_1}, ..., r^{-l_n})
Thus -\lambda \ln r = 1, since on the boundary \sum_i r^{-l_i} = \sum_i p_i = 1, and therefore p_i = r^{-l_i}.
Hence l_i = -\log_r p_i and L_opt = -\sum_i p_i \log_r p_i.
We have found the following lower bound.

Theorem (Shannon). For any stateless stationary source S = (p_1, ..., p_n) and any r-ary uniquely-decodable code C,
  L(C, S) \ge L_r(S) = -\sum_i p_i \log_r p_i
However, this lower bound is achievable if and only if all l_i = -\log_r p_i are integers (i.e., the p_i are negative powers of r).

Example: The Uniform Source

Consider as an example the case p_1 = ... = p_n = 1/n, where n = r^d for some integer d.
Now L_r(S) = -n (1/n) \log_r (1/n) = \log_r n = d, which is achievable by taking l_i = d.
The optimal code is the constant-length code C = {0, ..., r-1}^d.

Shannon-Fano Codes

A good approximation of the optimal (continuous) solution is taking l_i = \lceil -\log_r p_i \rceil.
There always exists a prefix-free code with this codeword length sequence, the Shannon-Fano code, since
  \sum_i r^{-\lceil -\log_r p_i \rceil} \le \sum_i r^{\log_r p_i} = \sum_i p_i = 1
On the other hand, the code is near-optimal:
  L(C, S) = \sum_i p_i \lceil -\log_r p_i \rceil < \sum_i p_i (-\log_r p_i + 1) = L_r(S) + 1
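A minimal Python sketch of the Shannon-Fano length assignment l_i = ⌈−log_r p_i⌉ just described (the source distribution is hypothetical); it checks the Kraft-McMillan sum and the bounds L_r(S) ≤ L(C, S) < L_r(S) + 1:

```python
from math import ceil, log

def shannon_fano_lengths(probs, r=2):
    """Codeword lengths l_i = ceil(-log_r p_i) of a Shannon-Fano code.
    (p_i that are exact powers of r may need care due to floating point.)"""
    return [ceil(-log(p, r)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]            # hypothetical source S
r = 2
lengths = shannon_fano_lengths(probs, r)
L = sum(p * l for p, l in zip(probs, lengths))
Lr = -sum(p * log(p, r) for p in probs)
print(lengths)                           # [2, 2, 3, 4]
print(sum(r ** (-l) for l in lengths))   # Kraft sum <= 1: a prefix-free code exists
print(Lr <= L < Lr + 1)                  # True: L_r(S) <= L(C, S) < L_r(S) + 1
```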

Source Coalescence

Consider two sources S_1 = (p_1, ..., p_n) and S_2 = (q_1, ..., q_m). We can define a new source
  S_1 S_2 = (p_1 q_1, p_1 q_2, ..., p_n q_m)
on the cartesian product of the two source alphabets. The two information sources are merged into a single one.
A code C for S_1 S_2 is trivially obtained by concatenating C_1 for S_1 and C_2 for S_2 (but better codes can exist!).
However,
  L_r(S_1 S_2) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \log_r (p_i q_j) = -\sum_{i=1}^{n} p_i \log_r p_i - \sum_{j=1}^{m} q_j \log_r q_j = L_r(S_1) + L_r(S_2)

Source Extensions (I)

Given a source S = (p_1, ..., p_n) we can define its k-th extension S^k = S \cdots S (k times).
Idea: encode k-tuples instead of separate symbols.
Now L_r(S^k) = k L_r(S), and the Shannon-Fano code C_k for S^k satisfies
  L_r(S^k) \le L(C_k, S^k) < L_r(S^k) + 1
But the correct performance measure for k-tuples is per source symbol:
  L_r(S) \le L(C_k, S^k) / k < L_r(S) + 1/k
In the context of source extensions, Shannon's lower bound is asymptotically achieved by Shannon-Fano codes.

Source Extensions (II)

Theorem (Noiseless Coding Theorem [Shannon, 1948]). For any stateless stationary source S and any family {C_k}_{k \in Z^+} of r-ary optimal uniquely-decodable codes for {S^k}_{k \in Z^+},
  \lim_{k \to \infty} L(C_k, S^k) / k = L_r(S)
The proof simply uses the fact that any optimal code performs at least as well as any other code, in particular the Shannon-Fano codes.

Outline

1 Information Sources and Optimal Codes
2 Building Optimal Codes: Huffman Codes
3 Shannon Entropy and Mutual Information
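Returning to the per-symbol bound for source extensions above: a small Python sketch (the binary source S = (0.9, 0.1) is hypothetical) that builds S^k, assigns Shannon-Fano lengths to the k-tuples, and prints L(C_k, S^k)/k, which stays within [L_r(S), L_r(S) + 1/k):

```python
from itertools import product
from math import ceil, log

def extension(probs, k):
    """Probabilities of the k-th extension S^k (i.i.d. k-tuples)."""
    ext = []
    for tup in product(probs, repeat=k):
        q = 1.0
        for p in tup:
            q *= p
        ext.append(q)
    return ext

def per_symbol_length(probs, k, r=2):
    """L(C_k, S^k) / k for a Shannon-Fano code C_k on the k-th extension."""
    return sum(q * ceil(-log(q, r)) for q in extension(probs, k)) / k

S = [0.9, 0.1]                              # hypothetical binary source
Lr = -sum(p * log(p, 2) for p in S)         # L_2(S), about 0.469 bits per symbol
for k in (1, 2, 4, 8):
    # each value lies in [L_2(S), L_2(S) + 1/k)
    print(k, round(per_symbol_length(S, k), 4))
print("L_r(S) =", round(Lr, 4))
```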

Construction of Optimal Codes (I)

... Coming back to the discrete optimization problem ...
Shannon-Fano codes are guaranteed to be optimal only when all p_i are negative powers of r, but we still need a construction for the general case.
Given S = (p_1, ..., p_n) we look for an optimal r-ary prefix-free code C with some codeword lengths l_1, ..., l_n. Let us assume that p_1 \ge ... \ge p_n.

Lemma (1). In an optimal code, p_i > p_j \Rightarrow l_i \le l_j.
Otherwise, swapping the i-th and j-th codewords in C would result in a better code.

Construction of Optimal Codes (II)

Let T(C) be the subtree of a complete r-ary tree formed by the paths from the root node to the nodes of the codewords in C. For example, C = {001, 01, 10010, 110, 111}.
We call available the nodes in the complement of T(C) that have siblings in T(C). After adding a new codeword corresponding to an available node, C remains prefix-free!

Construction of Optimal Codes (III)

Lemma (2). If C is optimal, then available nodes of T(C) can only occur at the lowest level.
Otherwise, we could replace one codeword at the lowest level by the codeword associated to the available node, thus improving the code.
Running example: C = {2, 00, 02, 10, 12, 010, 012, 110, 111}

Lemma (3). There exists an optimal code C with (1 − n) mod (r − 1) available nodes, all of them siblings at the lowest level.
In the binary case, r = 2, there are no available nodes.

In the running example C = {2, 00, 02, 10, 12, 010, 012, 110, 111}, all available nodes are at the lowest level.
Rearrange the lowest-level nodes to make all available nodes siblings: C = {2, 00, 02, 10, 12, 010, 012, 110, 011}.
A single child can always be replaced by its parent: C = {2, 00, 02, 10, 11, 12, 010, 011, 012}. Now the code can be optimal!
Still, the lowest-level nodes can be rearranged so that the set of r − ((1 − n) mod (r − 1)) siblings has the lowest probabilities.

If those siblings are replaced by their parent, the code should remain optimal: C = {2, 00, 02, 10, 11, 12, 01}.

Huffman Codes: Algorithm

A recursive construction algorithm (a runnable Python sketch is given after the example below).
Input: a sequence of probabilities p_1 \ge ... \ge p_n.
Output: an optimal r-ary code {w_1, ..., w_n}.
If n \le r, then output C = {0, ..., n − 1}. Otherwise:
  Compute m = r − ((1 − n) mod (r − 1)).
  Replace (p_{n−m+1}, ..., p_n) by p'_{n−m+1} = p_{n−m+1} + ... + p_n.
  Sort the new sequence and compute an optimal code C' for it.
  Undo the above sorting and replace the last codeword w by the m codewords w0, ..., w(m−1).
Remark: m = r except perhaps in the first step, for r \ge 3.

Huffman Codes: Example (I)

Example: r = 3, n = 10,
S = (0.1821, 0.1161, 0.0510, 0.0130, 0.0124, 0.4355, 0.0864, 0.0411, 0.0104, 0.0520)

Reduction steps (at each step the last m probabilities, the smallest ones, are merged):
[1] (0.4355, 0.1821, 0.1161, 0.0864, 0.0520, 0.0510, 0.0411, 0.0130, 0.0124, 0.0104)
[2] (0.4355, 0.1821, 0.1161, 0.0864, 0.0520, 0.0510, 0.0411, 0.0228, 0.0130)
[3] (0.4355, 0.1821, 0.1161, 0.0864, 0.0769, 0.0520, 0.0510)
[4] (0.4355, 0.1821, 0.1799, 0.1161, 0.0864)
[5] (0.4355, 0.3824, 0.1821)

Codeword assignment, going back from [5] to [1]:
[5] (0, 1, 2)
[4] (0, 2, 10, 11, 12)
[3] (0, 2, 11, 12, 100, 101, 102)
[2] (0, 2, 11, 12, 101, 102, 1000, 1001, 1002)
[1] (0, 2, 11, 12, 101, 102, 1000, 1002, 10010, 10011)

Undoing the initial sorting, C = (2, 11, 102, 1002, 10010, 0, 12, 1000, 10011, 101).

The same build-up shown on the code trees, step by step:
C_5 = {0, 1, 2}

C_4 = {0, 2, 10, 11, 12}
C_3 = {0, 2, 11, 12, 100, 101, 102}
C_2 = {0, 2, 11, 12, 101, 102, 1000, 1001, 1002}
C_1 = {0, 2, 11, 12, 101, 102, 1000, 1002, 10010, 10011}
(At each step, the codewords that replace the merged symbol are the last ones listed.)
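The sketch announced in the algorithm above: a compact Python implementation of r-ary Huffman coding. It is not the slides' recursive pseudocode verbatim; it uses the equivalent heap-based formulation with zero-probability padding, and tie-breaking may differ from the slides, so individual codewords can vary while the average length stays the same:

```python
import heapq
from itertools import count

def huffman_lengths(probs, r=2):
    """Codeword lengths of an r-ary Huffman code for the given probabilities."""
    n = len(probs)
    # Pad with zero-probability dummy symbols so every merge takes exactly r
    # nodes; this matches m = r - ((1 - n) mod (r - 1)) in the first step.
    pad = (1 - n) % (r - 1)
    tie = count()                                  # heap tie-breaker
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heap += [(0.0, next(tie), []) for _ in range(pad)]
    heapq.heapify(heap)
    depth = [0] * n
    while len(heap) > 1:
        total, leaves = 0.0, []
        for _ in range(r):                         # merge the r least likely nodes
            p, _, ids = heapq.heappop(heap)
            total += p
            leaves += ids
        for i in leaves:                           # each leaf below moves one level deeper
            depth[i] += 1
        heapq.heappush(heap, (total, next(tie), leaves))
    return depth

S = [0.1821, 0.1161, 0.0510, 0.0130, 0.0124,
     0.4355, 0.0864, 0.0411, 0.0104, 0.0520]
lengths = huffman_lengths(S, r=3)
print(lengths)                                     # e.g. [1, 2, 3, 4, 5, 1, 2, 4, 5, 3]
print(round(sum(p * l for p, l in zip(S, lengths)), 4))  # 1.662, matching L(C, S) = 1.6620
```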

Comparison of the Huffman code C and the Shannon-Fano code C_Sh-F on the example source:
  C = {2, 11, 102, 1002, 10010, 0, 12, 1000, 10011, 101}        L(C, S) = 1.6620
  C_Sh-F = {10, 11, 120, 2010, 2011, 0, 121, 122, 20120, 200}   L(C_Sh-F, S) = 1.8770
  L_r(S) = 1.5736
There is still one available node.

Outline

1 Information Sources and Optimal Codes
2 Building Optimal Codes: Huffman Codes
3 Shannon Entropy and Mutual Information

Shannon's Entropy of a Discrete Random Variable (I)

It measures the average amount of information in a random variable.

Definition (Shannon's Entropy). Let X be a random variable with range a finite set \mathcal{X}.
  H(X) = -\sum_{x \in \mathcal{X}} \Pr[X = x] \log_2 \Pr[X = x]

Relationship with optimal codes: L_r(S) = H(S) / \log_2 r.
The minimum possible average length for a lossless code for a source S is actually the average amount of (r-ary) information per source symbol.

Shannon's Entropy of a Discrete Random Variable (II)

Properties of Shannon's entropy:
Main inequality: H(X, Y, Z) + H(Z) \le H(X, Z) + H(Y, Z)
  (equality holds only if, for every z \in \mathcal{Z}, X and Y are independent conditioned on Z = z)
0 \le H(X) \le \log_2 |\mathcal{X}|
  (r.h.s. equality holds only for uniform X; l.h.s. only for a trivial X)
H(X) \le H(X, Y) \le H(X) + H(Y)
  (r.h.s. equality holds only for independent X, Y; l.h.s. only if Y = g(X))
H(g(X)) \le H(X)   (equality holds only for injective g)
H(X, g(X, Y)) \le H(X, Y)   (equality holds only if y \mapsto g(x, y) is injective for every x \in \mathcal{X})
H(X, g(X)) = H(X)
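A short Python check of the relationship L_r(S) = H(S)/log_2 r on the example source from the Huffman section (probabilities taken from Example (I) above):

```python
from math import log

S = [0.1821, 0.1161, 0.0510, 0.0130, 0.0124,
     0.4355, 0.0864, 0.0411, 0.0104, 0.0520]

H = -sum(p * log(p, 2) for p in S)   # Shannon entropy of the source, in bits
Lr = H / log(3, 2)                   # = -sum_i p_i log_3 p_i, the lower bound for r = 3
print(round(H, 4))                   # about 2.4941 bits per symbol
print(round(Lr, 4))                  # about 1.5736 ternary digits, as on the slides
```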

Shannon's Entropy: Main Inequality (Submodularity)

The map f(t) = t \log_2 t is convex on [0, +\infty) (because f'' > 0).
By the discrete Jensen inequality, for all a_i > 0, t_i > 0 with \sum_{i \in I} a_i t_i = 1,
  \sum_{i \in I} a_i t_i \log_2 t_i \ge -\log_2 \sum_{i \in I} a_i,
and equality implies that all the t_i are equal.
The main inequality is proven by letting
  t_{x,y,z} = \Pr[X = x, Y = y, Z = z] \Pr[Z = z] / (\Pr[X = x, Z = z] \Pr[Y = y, Z = z]),
  a_{x,y,z} = \Pr[X = x, Z = z] \Pr[Y = y, Z = z] / \Pr[Z = z],
so that \sum_{x,y,z} a_{x,y,z} t_{x,y,z} = 1 and \sum_{x,y,z} a_{x,y,z} = 1.

Conditional Entropy

Definition (Conditional Shannon Entropy). Let X, Y be random variables with respective ranges \mathcal{X}, \mathcal{Y}.
  H(Y | X) = H(X, Y) - H(X)
It measures the average amount of information still in Y after revealing X. Equivalently,
  H(Y | X) = \sum_{x \in \mathcal{X}} \Pr[X = x] H(Y | X = x)
or
  H(Y | X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \Pr[X = x, Y = y] \log_2 \Pr[Y = y | X = x].

Conditional Entropy: Properties

Similarly to (non-conditional) Shannon entropy:
Main inequality: H(Y, Z, T | X) + H(T | X) \le H(Y, T | X) + H(Z, T | X)
0 \le H(Y | X) \le H(Y)
  (r.h.s. equality holds only for independent X, Y; l.h.s. only if Y = g(X))
H(Z | X, Y) = H(Y, Z | X) - H(Y | X)
H(Y | X) \le H(Y, Z | X) \le H(Y | X) + H(Z | X)
H(g(X, Y) | X) \le H(Y | X)
H(Y, g(X, Y, Z) | X) \le H(Y, Z | X)
H(Y, g(X, Y) | X) = H(Y | X)

Mutual Information

Definition (Mutual Information). Let X, Y be random variables with respective ranges \mathcal{X}, \mathcal{Y}.
  I(Y; X) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y | X)
It measures the amount of information about Y conveyed by X.
Some properties:
  I(Y, Z, T; X) + I(T; X) \le I(Y, T; X) + I(Z, T; X)
  I(X; Y) = I(Y; X)
  0 \le I(Y; X) \le \min(H(X), H(Y))

Definition (Conditional Mutual Information). Let X, Y, Z be random variables with respective ranges \mathcal{X}, \mathcal{Y}, \mathcal{Z}.
  I(Z; Y | X) = H(Z | X) - H(Z | X, Y)
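A small Python sketch (the joint distribution is made up) that computes H(X, Y), the conditional entropy H(Y | X) = H(X, Y) − H(X), and the mutual information from a joint probability table, checking that both expressions for I(Y; X) agree:

```python
from math import log

def H(dist):
    """Shannon entropy (in bits) of a probability distribution given as a list."""
    return -sum(p * log(p, 2) for p in dist if p > 0)

# Hypothetical joint distribution Pr[X = x, Y = y]: rows indexed by x, columns by y.
joint = [[0.30, 0.10],
         [0.05, 0.25],
         [0.10, 0.20]]

p_xy = [p for row in joint for p in row]
p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]

H_XY, H_X, H_Y = H(p_xy), H(p_x), H(p_y)
H_Y_given_X = H_XY - H_X                             # H(Y | X) = H(X, Y) - H(X)
I_YX = H_X + H_Y - H_XY                              # I(Y; X)

print(round(H_Y_given_X, 4))
print(round(I_YX, 4), round(H_Y - H_Y_given_X, 4))   # both expressions of I(Y; X) agree
print(0 <= I_YX <= min(H_X, H_Y))                    # True
```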

Codes and Cryptography MAMME, Fall 2015 END OF PART III