Variable-to-Variable Codes with Small Redundancy Rates

Variable-to-Variable Codes with Small Redundancy Rates
M. Drmota (Institut für Diskrete Mathematik und Geometrie, TU Wien, Austria)
W. Szpankowski (Department of Computer Science, Purdue University, U.S.A.)
September 25, 2004
This research is supported by NSF, NSA and NIH.

Outline of the Talk 1. Types of Codes 2. Redundancy and Redundancy Rates 3. Main Results 4. Sketch of Proof 5. Algorithm

Types of Codes

Fixed-to-Variable Code: Fixed-length blocks are mapped into variable-length binary code strings. Example: Shannon and Huffman codes.

Variable-to-Fixed Code: The encoder partitions strings into phrases. Phrases belong to a dictionary $\mathcal D$ (a complete tree). The encoder represents each dictionary string by a fixed-length code word. Example: Tunstall code.

Variable-to-Variable (VV) code: The encoder consists of a parser and a string encoder. The parser works as in the VF code (it partitions a sequence into phrases). The string encoder encodes each dictionary string $d$ into a codeword $C(d)$ of length $|C(d)| = l(d)$.
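To make the parser/string-encoder split concrete, here is a minimal sketch, not from the talk: the binary dictionary, the codeword table, and the input string are all illustrative choices of mine.

```python
# Toy VV encoder: the parser splits the input along a complete prefix-free
# dictionary; the string encoder replaces each phrase by its codeword C(d).
dictionary_code = {          # illustrative dictionary -> codeword table (assumed)
    "0":   "0",
    "10":  "10",
    "110": "1110",
    "111": "1111",
}

def vv_encode(sequence, dict_code):
    out, phrase = [], ""
    for symbol in sequence:
        phrase += symbol
        if phrase in dict_code:      # unambiguous: the dictionary is prefix free and complete
            out.append(dict_code[phrase])
            phrase = ""
    assert phrase == "", "input must end on a phrase boundary"
    return "".join(out)

print(vv_encode("0101100111", dictionary_code))   # parses as 0 | 10 | 110 | 0 | 111
```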

Redundancy and Redundancy Rates

Let $A = \{a_1,\dots,a_m\}$ be the input alphabet of $m \ge 2$ symbols with probabilities $p_1,\dots,p_m$. A source $S$ generates a sequence $x$ with the underlying probability $P_S$. The average and worst case (maximal) redundancy of a fixed-to-variable code are defined, respectively, as
$$\bar R = \sum_{x \in A^n} P_S(x)\,[L(x) + \log P_S(x)], \qquad R^* = \max_{x \in A^n}\,[L(x) + \log P_S(x)],$$
where $L(x)$ is the code length assigned to $x \in A^n$. The redundancy rates are, respectively,
$$\bar r = \frac{\bar R}{n}, \qquad r^* = \frac{R^*}{n}.$$
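As a quick numerical illustration of these definitions (my own sketch, not part of the talk), the following computes $\bar R$, $R^*$ and the rates for a Shannon code, i.e. assuming $L(x) = \lceil -\log_2 P_S(x)\rceil$, on a binary memoryless source:

```python
from itertools import product
from math import log2, ceil

def shannon_fv_redundancy(probs, n):
    """Average and worst-case redundancy (and the rates R/n) of a Shannon
    fixed-to-variable code on blocks of length n from a memoryless source."""
    avg, worst = 0.0, 0.0
    for block in product(range(len(probs)), repeat=n):
        p = 1.0
        for sym in block:
            p *= probs[sym]
        L = ceil(-log2(p))          # Shannon code length for this block
        excess = L + log2(p)        # pointwise redundancy L(x) + log P_S(x)
        avg += p * excess
        worst = max(worst, excess)
    return avg, worst, avg / n, worst / n

# Example: p = (0.9, 0.1), blocks of length n = 8
print(shannon_fv_redundancy([0.9, 0.1], 8))
```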

Redundancy Rates for VV Codes

Let $P := P_D$ be the probability induced by the dictionary $\mathcal D$. Define the average delay $\bar D$ as
$$\bar D = \sum_{d \in \mathcal D} P_D(d)\,|d|.$$
The (asymptotic) average redundancy rate $\bar r$ is
$$\bar r = \lim_{n\to\infty} \frac{\sum_{|x|=n} P_S(x)\,(L(x) + \log P_S(x))}{n}.$$
Using renewal theory (regeneration theory) we find
$$\lim_{n\to\infty} \frac{\sum_{|x|=n} P_S(x)\,L(x)}{n} = \frac{\sum_{d\in\mathcal D} P_D(d)\,l(d)}{\bar D},$$
where $l(d)$ is the length of the codeword assigned to the phrase $d \in \mathcal D$.

A New Definition

Denote by $H_S$ the (binary) source entropy and by $H_D$ the dictionary entropy. Then
$$\frac{\sum_{d\in\mathcal D} P_D(d)\,l(d)}{\bar D} - H_S = \frac{H_D + \left(\sum_{d\in\mathcal D} P_D(d)\,l(d) - H_D\right)}{\bar D} - H_S.$$
By the Conservation of Entropy Theorem (Savari 99), $H_D = H_S \bar D$, hence
$$\bar r = \frac{\sum_{d\in\mathcal D} P_D(d)\,l(d) - H_D}{\bar D},$$
which we adopt as our definition of the average redundancy rate.

Maximum Redundancy

The maximum (asymptotic) redundancy rate $r^*$ is
$$r^* = \lim_{|x|\to\infty} \frac{\max_x\,[L(x) + \log P_S(x)]}{|x|}.$$
Assuming the source sequence is partitioned into phrases $x^1,\dots,x^k$, as $k\to\infty$ we have
$$r^* = \lim_{k\to\infty} \frac{\max_{x^1,\dots,x^k} \sum_{i=1}^k [\,l(x^i) + \log P_S(x^i)\,]}{\sum_{i=1}^k |x^i|}
     = \lim_{k\to\infty} \frac{\sum_{i=1}^k \max_{x^i} [\,l(x^i) + \log P_S(x^i)\,]}{\sum_{i=1}^k |x^i|}
     = \lim_{k\to\infty} \frac{k \max_{d\in\mathcal D} [\,l(d) + \log P(d)\,]}{\sum_{i=1}^k |x^i|}
     = \frac{\max_{d\in\mathcal D} [\,l(d) + \log P(d)\,]}{\bar D} \quad \text{(a.s.)}.$$
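These quantities are easy to evaluate for a concrete dictionary. The following sketch (my own example; the dictionary, the symbol probabilities and the choice of Shannon lengths $l(d) = \lceil -\log_2 P_D(d)\rceil$ are assumptions, not from the talk) computes $\bar D$, $H_D$, $\bar r$ and $r^*$ for a small complete prefix-free dictionary:

```python
from math import log2, ceil

# Toy complete prefix-free dictionary over the binary alphabet {0, 1};
# phrase probabilities are products of symbol probabilities p = (p0, p1).
p = (0.8, 0.2)
dictionary = ["0", "10", "110", "111"]           # leaves of a complete binary tree

def phrase_prob(d):
    return p[0] ** d.count("0") * p[1] ** d.count("1")

P = {d: phrase_prob(d) for d in dictionary}
assert abs(sum(P.values()) - 1.0) < 1e-12        # completeness: probabilities sum to 1

l = {d: ceil(-log2(P[d])) for d in dictionary}   # Shannon code lengths on the dictionary

D_bar = sum(P[d] * len(d) for d in dictionary)                # average delay
H_D   = -sum(P[d] * log2(P[d]) for d in dictionary)           # dictionary entropy
r_avg = (sum(P[d] * l[d] for d in dictionary) - H_D) / D_bar  # average redundancy rate
r_max = max(l[d] + log2(P[d]) for d in dictionary) / D_bar    # worst-case redundancy rate

print(D_bar, H_D, r_avg, r_max)
```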

Main Result for Average Redundancy

Theorem 1. Let $m \ge 2$ and let $S$ be a memoryless or a Markov source. There exists a variable-to-variable code such that its average redundancy rate satisfies
$$\bar r = O\!\left(\bar D^{-5/3}\right).$$
There also exists a variable-to-variable code such that the worst case redundancy rate satisfies
$$r^* = O\!\left(\bar D^{-4/3}\right),$$
however, the maximal code length might be infinite.

The estimate for $\bar r$ for the memoryless source is the same as in Khodak's 1972 paper. However, the method presented in Khodak's paper is difficult to follow, and it is not clear how one can construct such a VV code from it.

Main Result for Maximal Redundancy

Theorem 2. Under the same assumptions as in Theorem 1, for almost all source parameters $p_j$ (resp. $p_{ij}$): There exists a variable-to-variable code such that its average redundancy rate is bounded by
$$\bar r \le \bar D^{-\frac{4}{3} - \frac{m}{3} + \varepsilon},$$
where $\varepsilon > 0$, and whose maximal length is $O(\bar D \log \bar D)$. There also exists a variable-to-variable code with worst case redundancy rate
$$r^* \le \bar D^{-1 - \frac{m}{3} + \varepsilon}$$
for $\varepsilon > 0$.

Lower Bound

Theorem 3. For every variable-to-variable code and almost all parameters $p_j$, the following lower bound holds for $\varepsilon > 0$:
$$r^* \ge \bar r \ge \bar D^{-2m - 1 - \varepsilon}.$$

Some comments:
1. Typically the best possible average and worst case redundancy rates are measured in terms of negative powers of $\bar D$ whose exponents decrease linearly in the alphabet size $m$.
2. It seems to be very difficult to obtain the optimal exponent (almost surely). Nevertheless, the bounds we obtain are the best possible with respect to the methods we use.
3. Note that Theorem 4 of Khodak's 1972 paper states a lower bound for the redundancy of the form $\bar r \ge \bar D^{-9} (\log \bar D)^{8}$ (for almost all memoryless sources). In view of our Theorem 3 this cannot be true for large $m$.

Rough Idea...

Let $A = \{1, 2, \dots, m\}$ and $p_1 + \cdots + p_m = 1$. By $k_i$ we denote the number of occurrences of symbol $i$ in a word $x$. For the Shannon code the redundancy is
$$\bar R = \sum_x P(x)\left(\lceil -\log(p_1^{k_1}\cdots p_m^{k_m})\rceil + \log(p_1^{k_1}\cdots p_m^{k_m})\right)
        = 1 - \sum_x P(x)\,\langle -k_1 \log p_1 - \cdots - k_m \log p_m\rangle
        = \sum_x P(x)\,\langle k_1 \log p_1 + \cdots + k_m \log p_m\rangle,$$
where $\langle a\rangle = a - \lfloor a\rfloor$ is the fractional part of $a$.

Basic Thrust of our Approach: In order to minimize the redundancy of a code we must find $k_1,\dots,k_m$ such that $k_1\log p_1 + \cdots + k_m\log p_m$ is as close as possible to an integer.
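The search this "basic thrust" calls for can be done by brute force for small alphabets. The sketch below is mine (the probabilities and the search range kmax are arbitrary assumptions); it looks for counts $k_1,\dots,k_m$ that bring $k_1\log_2 p_1+\cdots+k_m\log_2 p_m$ closest to an integer:

```python
from itertools import product
from math import log2

def best_type(probs, kmax):
    """Exhaustively search counts 0 <= k_i < kmax (not all zero) for the type
    (k_1,...,k_m) whose value k_1*log2(p_1) + ... + k_m*log2(p_m) is closest
    to an integer; returns (distance, counts, value)."""
    best = None
    for ks in product(range(kmax), repeat=len(probs)):
        if not any(ks):
            continue
        s = sum(k * log2(q) for k, q in zip(ks, probs))
        dist = abs(s - round(s))        # distance to the nearest integer
        if best is None or dist < best[0]:
            best = (dist, ks, s)
    return best

# Example: m = 2, p = (0.7, 0.3), counts below 25
print(best_type([0.7, 0.3], 25))
```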

Proof by Picture

[Figure: the dictionary tree grown level by level. Level $j$ consists of words of length $jN^2 + O(jN)$; the "good" words at level $j$ form the set $\mathcal D_j$ with $P(\mathcal D_1) = c/N$, $P(\mathcal D_2) \approx (1-c/N)\,c/N$, ..., $P(\mathcal D_k) \approx (1-c/N)^{k-1}\,c/N$, while the remaining "bad" words in $\mathcal B_j$ are extended to the next level. The construction is cut after $K = N\log N$ levels, at depth $KN^2 + O(KN)$.]

Algorithm

Input: $m$, an integer $\ge 2$; positive rational numbers $p_1,\dots,p_m$ with $p_1 + \cdots + p_m = 1$ such that $p_m$ is not a power of 2; and $\varepsilon$, a positive real number with $\varepsilon < 1$.

Output: A VV code (given by a dictionary $\mathcal D$, a complete prefix-free set over an $m$-ary alphabet, and by a prefix code $C : \mathcal D \to \{0,1\}^*$) with average redundancy rate $\bar r \le \varepsilon/\bar D$, where the average dictionary length $\bar D$ satisfies $\bar D \le c(m, p_1,\dots,p_m)/\varepsilon^3$.

Algorithm

1. Calculate a convergent $M/N = [c_0, c_1, \dots, c_n]$ of $\log_2 p_m$ for which $N > 4/\varepsilon$.
2. Set $k^0_j = \lceil p_j N^2 \rceil$, $x = \sum_{j=1}^m k^0_j \log_2 p_j$, $n_0 = \sum_{j=1}^m k^0_j$.
3. Set $\mathcal D = \emptyset$, $\mathcal B = \{\text{empty word}\}$, and $p = 0$.
   while $p < 1 - \varepsilon/4$ do
     Take $r \in \mathcal B$ of minimal length and set $b = \log_2 P(r)$.
     Determine $0 \le k < N$ that solves $kM \equiv 1 - \lfloor (x+b)N \rfloor \pmod N$ (i.e., $1/N \le \langle kM/N + x + b\rangle \le 2/N$; see the sketch after this slide).
     $n \leftarrow n_0 + k$
     $\mathcal D' \leftarrow \{ d \in A^n : \mathrm{type}(d) = (k^0_1, \dots, k^0_{m-1}, k^0_m + k) \}$
     $\mathcal D \leftarrow \mathcal D \cup r\mathcal D'$
     $\mathcal B \leftarrow (\mathcal B \setminus \{r\}) \cup r(A^n \setminus \mathcal D')$
     $p \leftarrow p + P(r)\,P(\mathcal D')$, where
     $$P(\mathcal D') = \frac{n!}{k^0_1! \cdots k^0_{m-1}!\,(k^0_m + k)!}\; p_1^{k^0_1} \cdots p_{m-1}^{k^0_{m-1}}\, p_m^{k^0_m + k}.$$
   end while
4. $\mathcal D \leftarrow \mathcal D \cup \mathcal B$
5. Construct a Shannon code with $l(d) = \lceil -\log_2 P(d) \rceil$ for $d \in \mathcal D$.
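The crucial step inside the while loop is the choice of $k$ from the convergent $M/N$ of $\log_2 p_m$. The following sketch isolates just that step, under my own reading of the algorithm: compute a convergent with a large enough denominator, then search for a $k$ that places the fractional part of $kM/N + x + b$ into the window $[1/N, 2/N]$ (the brute-force loop stands in for solving the congruence directly). The function names and the numerical example are illustrative assumptions.

```python
from math import log2, floor

def convergent_with_large_denominator(gamma, min_den):
    """Walk the continued-fraction convergents M/N of gamma and return the
    first one whose denominator exceeds min_den."""
    a, frac = floor(gamma), gamma - floor(gamma)
    M_prev, M = 1, a            # convergent numerators
    N_prev, N = 0, 1            # convergent denominators
    while N <= min_den and frac != 0:
        gamma = 1.0 / frac
        a, frac = floor(gamma), gamma - floor(gamma)
        M_prev, M = M, a * M + M_prev
        N_prev, N = N, a * N + N_prev
    return M, N

def choose_k(M, N, y):
    """Find 0 <= k < N with 1/N <= frac(k*M/N + y) <= 2/N; such a k exists
    because gcd(M, N) = 1, so the values k*M/N mod 1 form a 1/N-grid."""
    for k in range(N):
        f = (k * M / N + y) % 1.0
        if 1.0 / N <= f <= 2.0 / N:
            return k
    return None

# Example: p_m = 0.3, eps = 0.05, and y standing in for x + b
eps = 0.05
M, N = convergent_with_large_denominator(log2(0.3), min_den=4 / eps)
print(M, N, choose_k(M, N, y=0.4142))
```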

Some Definitions Needed in the Proof

Define the dispersion of a set $X \subseteq [0,1)$ as
$$\delta(X) = \sup_{0 \le y < 1} \inf_{x \in X} \|y - x\|,$$
where $\|x\| = \min(\langle x\rangle, 1 - \langle x\rangle)$ is the distance to the nearest integer. That is, for every $y \in [0,1)$ there exists $x \in X$ with $\|y - x\| \le \delta(X)$.

Lemma 1. Suppose that $\gamma$ is an irrational number. Then there exists $N$ such that
$$\delta\left(\{\langle k\gamma\rangle : 0 \le k < N\}\right) \le \frac{2}{N}.$$

Lemma 2. Let $(\gamma_1,\dots,\gamma_m) = (\log p_1, \dots, \log p_m)$ be an $m$-vector of real numbers such that at least one of its coordinates is irrational. Then there exists $N$ such that the dispersion of the set
$$X = \{\langle k_1\gamma_1 + \cdots + k_m\gamma_m\rangle : 0 \le k_j < N \ (1 \le j \le m)\}$$
is bounded by $\delta(X) \le \frac{2}{N}$.
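A quick numerical check of the dispersion bound (my own sketch; the value of $\gamma$ and the sample sizes $N$ are arbitrary, and the $2/N$ bound of Lemma 1 is only guaranteed for suitably chosen $N$, e.g. denominators of continued-fraction convergents):

```python
from math import log2

def dispersion(points):
    """Dispersion of a finite set of points in [0,1): half of the largest gap
    between neighbouring points on the circle, i.e. the largest distance any
    y in [0,1) can be from the set in the ||.|| metric."""
    xs = sorted(x % 1.0 for x in points)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    gaps.append(xs[0] + 1.0 - xs[-1])        # wrap-around gap
    return max(gaps) / 2.0

gamma = log2(0.3)                            # gamma = log2(p_m), irrational
for N in (10, 50, 200, 1000):
    X = [(k * gamma) % 1.0 for k in range(N)]
    print(N, dispersion(X), 2.0 / N)         # compare delta(X) with 2/N
```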

Two Important Lemmas

Lemma 3. Let $\mathcal D$ be a finite set with probability distribution $P$ such that for every $d \in \mathcal D$ the length $l_d$ satisfies $|l_d + \log_2 P(d)| \le 1$. If
$$\sum_{d\in\mathcal D} P(d)\,(l_d + \log_2 P(d)) \ge 2 \sum_{d\in\mathcal D} P(d)\,(l_d + \log_2 P(d))^2,$$
then there exists an injective mapping $C : \mathcal D \to \{0,1\}^*$ such that $C(\mathcal D)$ is a prefix-free set and $|C(d)| = l_d$ for all $d \in \mathcal D$.

Proof: Kraft's inequality and Taylor's expansion.

Lemma 4. Let $\mathcal D$ be a finite set with probability distribution $P$. Then
$$\bar r \ge \frac{c}{2}\,\frac{1}{\bar D} \sum_{d\in\mathcal D} P(d)\,\|\log_2 P(d)\|^2,$$
for a certain constant $c > 0$.
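Lemma 3 is in the spirit of the standard fact that codeword lengths satisfying Kraft's inequality can always be realized by a prefix code. A minimal sketch of that standard construction (a canonical code; this is not the lemma's proof, and the example probabilities are my own) follows:

```python
from math import log2, ceil

def canonical_prefix_code(lengths):
    """Given codeword lengths satisfying Kraft's inequality, build a binary
    prefix code with exactly those lengths (canonical construction)."""
    assert sum(2.0 ** -l for l in lengths) <= 1.0 + 1e-12, "Kraft's inequality violated"
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    code, next_val, prev_len = {}, 0, 0
    for i in order:
        next_val <<= lengths[i] - prev_len          # move to the new length
        code[i] = format(next_val, "0{}b".format(lengths[i]))
        next_val += 1
        prev_len = lengths[i]
    return code

# Example: Shannon lengths ceil(-log2 P(d)) always satisfy Kraft's inequality
P = [0.4, 0.3, 0.2, 0.1]
lengths = [ceil(-log2(q)) for q in P]
print(lengths, canonical_prefix_code(lengths))
```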

Main Step of the Proof

Theorem 4. Suppose that for some $N \ge 1$ and $\eta \ge 1$ the set
$$X = \{\langle k_1 \log_2 p_1 + \cdots + k_m \log_2 p_m\rangle : 0 \le k_j < N\}$$
has dispersion
$$\delta(X) \le \frac{2}{N^\eta}.$$
Then there exists a variable-to-variable code (with $\bar D = \Theta(N^3)$) such that the average redundancy rate is
$$\bar r \le c_m\, \bar D^{-\frac{4+\eta}{3}}.$$
There exists also another variable-to-variable code with the worst case redundancy rate bounded by
$$r^* \le c'_m\, \bar D^{-1 - \frac{\eta}{3}},$$
where the constants $c_m, c'_m > 0$ depend only on $m$.

Observe that the above theorem and the previous lemmas immediately imply our main results after setting $\eta = 1$.

Sketch of the Proof of Theorem 4

1. Set $k^0_i := \lceil p_i N^2\rceil$ ($1 \le i \le m$) and $x = k^0_1 \log_2 p_1 + \cdots + k^0_m \log_2 p_m$. By the dispersion assumption there exist integers $0 \le k^1_j < N$ such that
$$\left\| x + k^1_1 \log_2 p_1 + \cdots + k^1_m \log_2 p_m \right\| = \left\| (k^0_1 + k^1_1)\log_2 p_1 + \cdots + (k^0_m + k^1_m)\log_2 p_m \right\| < \frac{4}{N^\eta}.$$

2. Build an $m$-ary tree starting at the root with $k^0_1 + k^1_1$ edges of type 1, $k^0_2 + k^1_2$ edges of type 2, ..., and $k^0_m + k^1_m$ edges of type $m$. Let $\mathcal D_1$ denote the set of the corresponding words. Then
$$\frac{c}{N} \le P(\mathcal D_1) = \binom{(k^0_1+k^1_1)+\cdots+(k^0_m+k^1_m)}{k^0_1+k^1_1, \dots, k^0_m+k^1_m}\; p_1^{k^0_1+k^1_1} \cdots p_m^{k^0_m+k^1_m} \le \frac{c'}{N}$$
for certain positive constants $c, c'$.

Constructing the Prefix Code

3. By construction, all words $d \in \mathcal D_1$ satisfy
$$\| \log_2 P(d) \| < \frac{4}{N^\eta}$$
and have the same length
$$n_1 = (k^0_1 + k^1_1) + \cdots + (k^0_m + k^1_m) = N^2 + O(N).$$

4. Consider the words not in $\mathcal D_1$, that is, $\mathcal B_1 = A^{n_1} \setminus \mathcal D_1$, which by the above satisfy
$$1 - \frac{c'}{N} \le P(\mathcal B_1) \le 1 - \frac{c}{N}.$$

Second Step...

5. Take a word $r \in \mathcal B_1$ and concatenate it with a word $d_2$ of length $\approx N^2$ such that $\log_2 P(r d_2)$ is close to an integer with high probability. For every word $r \in \mathcal B_1$ we set
$$x(r) = \log_2 P(r) + k^0_1 \log_2 p_1 + \cdots + k^0_m \log_2 p_m.$$
By the dispersion assumption again there exist integers $0 \le k^2_j(r) < N$ ($1 \le j \le m$) such that
$$\left\| x(r) + k^2_1(r) \log_2 p_1 + \cdots + k^2_m(r) \log_2 p_m \right\| < \frac{4}{N^\eta}.$$

Proving Some Properties...

6. We continue extending the tree $T$ by adding a path starting at $r \in \mathcal B_1$ with $k^0_1 + k^2_1(r)$ edges of type 1, $k^0_2 + k^2_2(r)$ edges of type 2, ..., and $k^0_m + k^2_m(r)$ edges of type $m$. We observe that
$$P(r)\,\frac{c}{N} \le P(\mathcal D_2(r)) \le P(r)\,\frac{c'}{N}.$$
Furthermore, by construction we have
$$\| \log_2 P(d) \| < \frac{4}{N^\eta}$$
for all $d \in \mathcal D_2(r)$.

Last Step...

7. This construction is cut after $K = O(N \log N)$ steps so that
$$P(\mathcal B_K) \le \left(1 - \frac{c}{N}\right)^K \le \frac{1}{N^\beta}$$
for some $\beta > 0$. This also ensures that
$$P(\mathcal D_1 \cup \cdots \cup \mathcal D_K) > 1 - \frac{1}{N^\beta}.$$

8. The complete prefix-free set $\mathcal D$ is
$$\mathcal D = \mathcal D_1 \cup \cdots \cup \mathcal D_K \cup \mathcal B_K.$$
By the construction, the average delay of $\mathcal D$ satisfies
$$c_1 N^3 \le \bar D = \sum_{d\in\mathcal D} P(d)\,|d| \le c_2 N^3,$$
while the maximal code length satisfies
$$\max_{d\in\mathcal D} |d| = O(N^3 \log N) = O(\bar D \log \bar D).$$

Finishing the Proof...

9. For every $d \in \mathcal D_1 \cup \cdots \cup \mathcal D_K$ we can choose a non-negative integer $l_d$ with
$$| l_d + \log_2 P(d) | < \frac{2}{N^\eta}.$$
Indeed, we take
$$0 \le l_d + \log_2 P(d) < \frac{2}{N^\eta} \quad \text{if } \langle \log_2 P(d)\rangle < 2/N^\eta,$$
and
$$-\frac{2}{N^\eta} < l_d + \log_2 P(d) \le 0 \quad \text{if } 1 - \langle \log_2 P(d)\rangle < 2/N^\eta.$$
For $d \in \mathcal B_K$ we set $l_d = \lceil -\log_2 P(d)\rceil$.

10. The final step is to verify that the condition of Lemma 3 is satisfied, which is an easy step.