A One-to-One Code and Its Anti-Redundancy

A One-to-One Code and Its Anti-Redundancy W. Szpankowski Department of Computer Science, Purdue University July 4, 2005 This research is supported by NSF, NSA and NIH.

Outline of the Talk
1. Prefix Codes
2. Redundancy
3. Our One-to-One Code
4. Asymptotic Results for Anti-Redundancy
5. Sketch of Proof
   - Sums involving the floor function
   - Saddle point method
   - Sums and sequences modulo 1

Some Definitions

A block code $C_n : \mathcal{A}^n \to \{0,1\}^*$ is an injective mapping from the set $\mathcal{A}^n$ of all sequences $x^n = x_1 \ldots x_n$ of length $n$ over the alphabet $\mathcal{A}$ to the set $\{0,1\}^*$ of binary sequences.

For a given source $P$, the pointwise redundancy and the average redundancy are defined, respectively, as

$$R_n(C_n, P; x^n) = L(C_n, x^n) + \lg P(x^n),$$
$$\bar{R}_n(C_n, P) = \mathbf{E}_{X^n}\left[R_n(C_n, P; X^n)\right] = \mathbf{E}[L(C_n, X^n)] - H_n(P),$$

where $L(C_n, x^n)$ is the code length, $H_n(P) = -\sum_{x^n} P(x^n)\lg P(x^n)$ is the source entropy, and $\mathbf{E}$ denotes the expectation.

Prefix Codes

Usually we deal with prefix codes, defined as codes in which no codeword is a prefix of another codeword. Prefix codes satisfy Kraft's inequality: $\sum_{x^n} 2^{-L(x^n)} \le 1$.

Shannon Lower Bound: For any prefix code, $\mathbf{E}[L(C_n, X^n)] \ge H_n(P)$.

Indeed, let $K = \sum_{x^n} 2^{-L(x^n)} \le 1$ (Kraft). Then

$$\mathbf{E}[L(C_n, X^n)] - H_n(P) = \sum_{x^n \in \mathcal{A}^n} P(x^n)L(x^n) + \sum_{x^n \in \mathcal{A}^n} P(x^n)\log P(x^n) = \sum_{x^n \in \mathcal{A}^n} P(x^n)\log\frac{P(x^n)}{2^{-L(x^n)}/K} - \log K \ge 0,$$

since the first term is a divergence and cannot be negative, and $-\log K \ge 0$ (or use $\log x \le x - 1$ for $0 < x \le 1$).
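As a quick numerical illustration (my own sketch, not part of the original slides), the following Python snippet builds Shannon code lengths $\lceil -\log_2 P(x^n)\rceil$ for a binary memoryless source and checks both Kraft's inequality and the Shannon lower bound; the values of p and n are arbitrary choices.

```python
import math
from itertools import product

def check_kraft_and_entropy(p=0.3, n=8):
    """Shannon code lengths L(x^n) = ceil(-log2 P(x^n)) for a binary
    memoryless source: verify Kraft's inequality and E[L] >= H_n(P)."""
    probs = [p ** x.count(0) * (1 - p) ** (n - x.count(0))
             for x in product((0, 1), repeat=n)]
    lengths = [math.ceil(-math.log2(P)) for P in probs]
    kraft = sum(2.0 ** (-L) for L in lengths)           # should be <= 1
    avg_len = sum(P * L for P, L in zip(probs, lengths))
    entropy = -sum(P * math.log2(P) for P in probs)     # H_n(P) = n*h(p)
    print(f"Kraft sum = {kraft:.6f}, E[L] = {avg_len:.6f}, H_n(P) = {entropy:.6f}")

check_kraft_and_entropy()
```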

Redundancy for Prefix Codes

Throughout this talk we assume that the source $P$ is given and is binary memoryless with probability $p$ of transmitting a 0. That is, $P(x^n) = p^k(1-p)^{n-k}$, where $k$ is the number of 0's. Let

$$\alpha = \log_2\frac{1-p}{p}, \qquad \beta = \log_2\frac{1}{1-p},$$

and let $\langle x\rangle = x - \lfloor x\rfloor$ denote the fractional part of $x$.

Redundancy of the Shannon-Fano Code:

$$\bar{R}_n^{SF} = \begin{cases} \frac{1}{2} + o(1) & \alpha \text{ irrational}, \\[2pt] \frac{1}{2} - \frac{1}{M}\left(\langle Mn\beta\rangle - \frac{1}{2}\right) + O(\rho^n) & \alpha = \frac{N}{M},\ \gcd(N,M)=1. \end{cases}$$

Redundancy of the Huffman Code:

$$\bar{R}_n^{H} = \begin{cases} \frac{3}{2} - \frac{1}{\ln 2} + o(1) \approx 0.057304 & \alpha \text{ irrational}, \\[2pt] \frac{3}{2} - \frac{1}{M}\left(\langle \beta M n\rangle - \frac{1}{2}\right) - \frac{1}{M\left(2^{1/M}-1\right)}\,2^{-\langle n\beta M\rangle/M} + O(\rho^n) & \alpha = \frac{N}{M}, \end{cases}$$

where $N, M$ are integers such that $\gcd(N,M)=1$ and $\rho < 1$.
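These oscillations are easy to observe numerically. The sketch below is my own illustration, not from the talk; it takes the "Shannon-Fano" code to mean the Shannon code with lengths $\lceil -\log_2 P(x^n)\rceil$, and the choices p = 1/π and p = 1/9 mirror the figures that follow. For p = 1/9 (rational α = 3) the exact average redundancy oscillates around 1/2, while for p = 1/π the oscillations damp out and the values approach 1/2 as n grows.

```python
import math

def shannon_fano_redundancy(p, n):
    """Exact average redundancy of the code with lengths ceil(-log2 P(x^n))
    for a binary memoryless source, P(x^n) = p^k q^(n-k) with k zeros."""
    q = 1.0 - p
    total = 0.0
    for k in range(n + 1):
        logP = k * math.log2(p) + (n - k) * math.log2(q)
        total += math.comb(n, k) * p**k * q**(n - k) * (math.ceil(-logP) + logP)
    return total

for p in (1 / math.pi, 1 / 9):   # irrational vs. rational alpha = log2((1-p)/p)
    vals = [f"{shannon_fano_redundancy(p, n):.4f}" for n in range(10, 71, 10)]
    print(f"p = {p:.4f}:", " ".join(vals))
```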

Oscillations

Figure 1: Shannon-Fano code redundancy.

Figure 2: Huffman's code redundancy versus block size $n$ for: (a) irrational $\alpha = \log_2((1-p)/p)$ with $p = 1/\pi$; (b) rational $\alpha = \log_2((1-p)/p)$ with $p = 1/9$.

One-to-One Codes

One-to-one codes are not prefix codes. In a one-to-one code a distinct codeword is assigned to each source symbol, and unique decodability is not required. Such codes are usually one-shot codes, and there is a designated end-of-message channel symbol.

Wyner in 1972 proved that $\bar{L} \le H(X)$, which was further improved by Alon and Orlitsky, who showed $\bar{L} \ge H(X) - \log_2(H(X)+1) - \log_2 e$.

Can we establish more precise bounds? Where are the oscillations observed in prefix codes?
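The following sketch (my own illustration, not from the talk) makes the comparison concrete: the optimal one-to-one code sorts the probabilities in nonincreasing order and gives the i-th most probable outcome the i-th shortest binary string, i.e. length ⌊log₂ i⌋ (the empty codeword is allowed); its average length is then compared with Wyner's upper bound H(X) and with the Alon-Orlitsky lower bound. The random 64-point distribution is an arbitrary test case.

```python
import math, random

def one_to_one_avg_length(probs):
    """Average length of the optimal one-to-one code: the i-th most
    probable outcome (i = 1, 2, ...) gets a codeword of length floor(log2 i)."""
    return sum(P * (i.bit_length() - 1)
               for i, P in enumerate(sorted(probs, reverse=True), start=1))

random.seed(1)
weights = [random.random() for _ in range(64)]
probs = [w / sum(weights) for w in weights]
H = -sum(P * math.log2(P) for P in probs)
L = one_to_one_avg_length(probs)
print(f"H(X) = {H:.4f}")
print(f"E[L] = {L:.4f}   (Wyner: E[L] <= H(X))")
print(f"lower bound = {H - math.log2(H + 1) - math.log2(math.e):.4f}   (Alon-Orlitsky)")
```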

Block One-to-One Codes

We consider a block one-to-one code for $x^n = x_1\ldots x_n \in \mathcal{A}^n$ generated by a memoryless source with $p$ the probability of generating a 0, and $q = 1-p$. We write $P(x^n) = p^k q^{n-k}$, where $k$ is the number of 0's. Throughout we assume $p \le q$.

We now list all $2^n$ probabilities in nonincreasing order and assign code lengths as follows:

probabilities: $q^n \ \ge\ p\,q^{n-1} \ \ge\ \cdots \ \ge\ p^n$
code lengths:  $\lfloor\log_2(1)\rfloor,\ \lfloor\log_2(2)\rfloor,\ \ldots,\ \lfloor\log_2(2^n)\rfloor$

that is, the $j$-th most probable string receives a codeword of length $\lfloor\log_2 j\rfloor$.

Average Code Length

There are $\binom{n}{k}$ equal probabilities $p^k q^{n-k}$. Define

$$A_k = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{k-1}, \qquad A_0 = 0.$$

Starting from position $A_k + 1$, the next $\binom{n}{k}$ probabilities $P(x^n)$ are the same. The average code length is

$$\bar{L}_n = \sum_{k=0}^{n} p^k q^{n-k} \sum_{j=A_k+1}^{A_{k+1}} \lfloor\log_2 j\rfloor = \sum_{k=0}^{n} p^k q^{n-k} \sum_{i=1}^{\binom{n}{k}} \lfloor\log_2(A_k + i)\rfloor.$$

Our goal is to estimate $\bar{L}_n$ asymptotically for large $n$.
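For moderate n the sum can be evaluated exactly. The sketch below (my own check, not from the talk) computes L̄_n both from the formula above and by brute-force enumeration of all 2ⁿ strings, which agree up to floating-point rounding; p and n are arbitrary small choices.

```python
import math
from itertools import product

def avg_len_formula(p, n):
    """L_n = sum_k p^k q^(n-k) * sum_{i=1..C(n,k)} floor(log2(A_k + i))."""
    q = 1.0 - p
    A, total = 0, 0.0
    for k in range(n + 1):
        block = sum((A + i).bit_length() - 1 for i in range(1, math.comb(n, k) + 1))
        total += p**k * q**(n - k) * block
        A += math.comb(n, k)
    return total

def avg_len_bruteforce(p, n):
    """Sort all 2^n string probabilities; the j-th gets length floor(log2 j)."""
    probs = sorted((p ** x.count(0) * (1 - p) ** (n - x.count(0))
                    for x in product((0, 1), repeat=n)), reverse=True)
    return sum(P * (j.bit_length() - 1) for j, P in enumerate(probs, start=1))

p, n = 0.3, 12
print(avg_len_formula(p, n), avg_len_bruteforce(p, n))   # the two values agree
```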

An Ugly Sum

To evaluate the inner part of the sum for $\bar{L}_n$ we apply the following identity (cf. Knuth, Ex. 1.2.4-42),

$$\sum_{j=1}^{N} a_j = N a_N - \sum_{j=1}^{N-1} j\,(a_{j+1} - a_j),$$

valid for any sequence $a_j$. With $a_j = \lfloor\log_2 j\rfloor$ (so that $a_{j+1} - a_j = 1$ exactly when $j+1$ is a power of 2) this yields the closed form

$$T(N) := \sum_{j=1}^{N}\lfloor\log_2 j\rfloor = (N+1)\lfloor\log_2 N\rfloor - 2^{\lfloor\log_2 N\rfloor+1} + 2, \qquad T(0) = 0,$$

and hence

$$\bar{L}_n = \sum_{k=0}^{n} p^k q^{n-k}\left[T(A_{k+1}) - T(A_k)\right].$$

Writing $\lfloor\log_2 A\rfloor = \log_2 A - \langle\log_2 A\rangle$ and $2^{\lfloor\log_2 A\rfloor} = A\,2^{-\langle\log_2 A\rangle}$, where $\langle x\rangle = x - \lfloor x\rfloor$ is the fractional part of $x$, this expresses $\bar{L}_n$ through sums, weighted by $p^k q^{n-k}$, of $\log_2 A_k$, of the fractional parts $\langle\log_2 A_k\rangle$, and of $A_k\,2^{-\langle\log_2 A_k\rangle}$.
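A small sketch (my own check, assuming the closed form stated above) verifies T(N) against a brute-force sum and recomputes L̄_n as Σ_k p^k q^(n−k)[T(A_{k+1}) − T(A_k)], reproducing the value obtained from the direct formula.

```python
import math

def T_closed(N):
    """T(N) = sum_{j=1..N} floor(log2 j) = (N+1)*floor(log2 N) - 2^(floor(log2 N)+1) + 2."""
    if N == 0:
        return 0
    f = N.bit_length() - 1                  # floor(log2 N), exact for integers
    return (N + 1) * f - 2 ** (f + 1) + 2

# check the closed form against the definition
assert all(T_closed(N) == sum(j.bit_length() - 1 for j in range(1, N + 1))
           for N in range(0, 2000))

def avg_len_via_T(p, n):
    """L_n = sum_k p^k q^(n-k) * [T(A_{k+1}) - T(A_k)]."""
    q = 1.0 - p
    A, total = 0, 0.0
    for k in range(n + 1):
        A_next = A + math.comb(n, k)
        total += p**k * q**(n - k) * (T_closed(A_next) - T_closed(A))
        A = A_next
    return total

print(avg_len_via_T(0.3, 12))   # same value as the direct computation above
```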

Main Result

Theorem. Consider a binary memoryless source and the one-to-one block code described above. Then for $p < \frac{1}{2}$

$$\bar{L}_n = nH(p) - \frac{1}{2}\log_2 n + C_p + F(n) + G(n) + o(1),$$

where $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$, $C_p$ is an explicit constant depending only on $p$ (involving $\frac{1}{2\ln 2}$, $\log_2\frac{p}{1-2p}$ and $\sqrt{2\pi pq}$), and $G(n) = F(n) = 0$ if $\alpha = \log_2\frac{1-p}{p}$ is irrational. If $\alpha = N/M$ for some integers $M, N$ such that $\gcd(N,M) = 1$, then $G(n)$ and $F(n)$ are bounded oscillating functions of a complicated nature. For example, $F(n)$ is equal to

$$\frac{1}{M\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\left(\left\langle M\left(n\beta - \log_2\frac{(1-2p)\sqrt{2\pi pqn}}{p} - \frac{x^2}{2\ln 2}\right)\right\rangle - \frac{1}{2}\right)dx,$$

where $\beta = \log_2\frac{1}{1-p}$. For $p = \frac{1}{2}$ we have, for every $n$, $\bar{L}_n = nH(1/2) - 2 + (n+2)\,2^{-n}$.

Oscillations

Figure 3: The constant part of the average anti-redundancy versus $n$ for: (a) irrational $\alpha = \log_2((1-p)/p)$ with $p = 1/\pi$; (b) rational $\alpha = \log_2((1-p)/p)$ with $p = 1/9$.

The anti-redundancy $\bar{R}_n = \bar{L}_n - nH(p)$ of our one-to-one code is therefore

$$\bar{R}_n = -\frac{1}{2}\log_2 n + O(1),$$

where the $O(1)$ term contains the oscillations.
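A numerical sketch (my own, reusing the exact computation of L̄_n from the previous slides) of this behaviour: the quantity L̄_n − nH(p) + ½log₂ n stays bounded as n grows, and for rational α it visibly oscillates.

```python
import math

def T_closed(N):
    """T(N) = sum_{j<=N} floor(log2 j) in closed form."""
    if N == 0:
        return 0
    f = N.bit_length() - 1
    return (N + 1) * f - 2 ** (f + 1) + 2

def anti_redundancy_plus_half_log(p, n):
    """L_n - n*H(p) + (1/2)*log2(n): should remain O(1) as n grows."""
    q = 1.0 - p
    H = -p * math.log2(p) - q * math.log2(q)
    A, L = 0, 0.0
    for k in range(n + 1):
        A_next = A + math.comb(n, k)
        L += p**k * q**(n - k) * (T_closed(A_next) - T_closed(A))
        A = A_next
    return L - n * H + 0.5 * math.log2(n)

for n in (16, 32, 64, 128, 256):
    print(n, round(anti_redundancy_plus_half_log(1 / 9, n), 4))
```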

Sketch of Proof

1. We only deal with the sum

$$S_n = \sum_{k} \binom{n}{k} p^k q^{n-k}\,\lfloor\log_2 A_k\rfloor = \sum_{k} \binom{n}{k} p^k q^{n-k}\log_2 A_k - \sum_{k} \binom{n}{k} p^k q^{n-k}\langle\log_2 A_k\rangle = a_n + b_n,$$

where

$$a_n = \sum_{k} \binom{n}{k} p^k q^{n-k}\log_2 A_k, \qquad b_n = -\sum_{k} \binom{n}{k} p^k q^{n-k}\langle\log_2 A_k\rangle.$$

Asymptotics of $A_n$

2. We need the saddle point approximation of $A_k$.

Lemma 1. For large $n$ and $p < 1/2$,

$$A_{np} = \frac{p}{1-2p}\,\frac{2^{nH(p)}}{\sqrt{2\pi np(1-p)}}\left(1 + O(n^{-1/2})\right).$$

More precisely, for an $\varepsilon > 0$ and $k = np + \Theta(n^{1/2+\varepsilon})$ we have, for some $\delta > 0$,

$$A_k = \frac{p}{1-2p}\,\frac{1}{\sqrt{2\pi np(1-p)}}\,\frac{1}{p^k(1-p)^{n-k}}\,\exp\!\left(-\frac{(k-np)^2}{2p(1-p)n}\right)\left(1 + O(n^{-\delta})\right).$$

Proof. Notice that

$$A_n(z) = \sum_{k=0}^{n+1} A_k z^k = \frac{z\left[(1+z)^n - 2^n z^{n+1}\right]}{1-z}$$

and apply the saddle point method to the Cauchy formula.
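The accuracy of Lemma 1 can be probed directly. The sketch below (my own check, leading term only) compares the exact partial binomial sum A_{np} = Σ_{j<np} C(n,j) with the saddle-point estimate; the ratio slowly approaches 1 as n grows. The values of p and n are arbitrary, chosen so that np is an integer.

```python
import math

def A_exact(n, k):
    """A_k = C(n,0) + ... + C(n,k-1)."""
    return sum(math.comb(n, j) for j in range(k))

def A_saddle(n, p):
    """Leading term of Lemma 1: p/(1-2p) * 2^(n*H(p)) / sqrt(2*pi*n*p*(1-p))."""
    q = 1.0 - p
    H = -p * math.log2(p) - q * math.log2(q)
    return p / (1 - 2 * p) * 2 ** (n * H) / math.sqrt(2 * math.pi * n * p * q)

p = 0.3
for n in (50, 100, 200, 400):
    k = round(n * p)                       # integer for these n
    print(n, A_exact(n, k) / A_saddle(n, p))
```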

Binomial Distribution Approximation

3. Using Stirling's approximation we find a good approximation for the binomial distribution.

Lemma 2. Let $p_n(k) = \binom{n}{k} p^k q^{n-k}$, where $q = 1-p$, be the binomial distribution. Then for $|k - pn| \le n^{1/2+\varepsilon}$ we have

$$p_n(k) = \frac{1}{\sqrt{2\pi p(1-p)n}}\,\exp\!\left(-\frac{(k-pn)^2}{2p(1-p)n}\right)\left(1 + O(n^{-\delta})\right)$$

uniformly as $n \to \infty$. Furthermore, for large $n$,

$$\sum_{k:\,|k-np| > \sqrt{np}\,n^{\varepsilon}} p_n(k) < 2 n^{\varepsilon} e^{-n^{2\varepsilon}/2}.$$
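Lemma 2 is the local central limit theorem for the binomial distribution; the sketch below (my own check) compares the exact probabilities with the Gaussian approximation near k = np for one arbitrary choice of n and p.

```python
import math

def binomial_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def gaussian_approx(n, k, p):
    """Local CLT approximation exp(-(k-np)^2/(2npq)) / sqrt(2*pi*n*p*q)."""
    q = 1 - p
    return math.exp(-(k - n * p) ** 2 / (2 * n * p * q)) / math.sqrt(2 * math.pi * n * p * q)

n, p = 1000, 0.3
for k in (285, 295, 300, 305, 315):
    print(k, f"{binomial_pmf(n, k, p):.6e}", f"{gaussian_approx(n, k, p):.6e}")
```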

Asymptotics of $a_n$

4. From the above lemmas we find

$$\log_2 A_k = \log_2 A_{np} + \alpha(k - np) - \frac{(k-np)^2}{2pqn\ln 2} + O(n^{-\delta}),$$

and then, since the linear term has zero mean and $\mathbf{E}[(K - np)^2] = npq$ for $K$ binomially distributed,

$$a_n = \log_2 A_{np} - \frac{1}{2\ln 2} + O(n^{-\delta}),$$

which is the desired result.

Returning to $b_n$

5. Recall we need the asymptotics of

$$b_n = -\sum_{k} \binom{n}{k} p^k q^{n-k}\,\langle\log_2 A_k\rangle.$$

From the previous lemmas we conclude that

$$\log_2 A_k = \alpha k + n\beta - \log_2(\omega\sqrt{n}) - \frac{(k-np)^2}{2pqn\ln 2} + O(n^{-\delta})$$

for some $\omega > 0$. Thus we need the asymptotics of the following sum:

$$\sum_{k} \binom{n}{k} p^k q^{n-k}\left\langle \alpha k + n\beta - \log_2(\omega\sqrt{n}) - \frac{(k-np)^2}{2pqn\ln 2}\right\rangle.$$

Final Lemma

6. To complete the proof we need the following lemma.

Lemma 3. Let $0 < p < 1$ be a fixed real number and let $f : [0,1] \to \mathbb{R}$ be a Riemann integrable function.

(i) If $\alpha$ is irrational, then

$$\lim_{n\to\infty} \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k}\, f\!\left(\left\langle k\alpha + y - \frac{(k-np)^2}{2pqn\ln 2}\right\rangle\right) = \int_0^1 f(t)\,dt,$$

where the convergence is uniform for all shifts $y \in \mathbb{R}$.

Continue... (ii) Suppose that α = N M is a rational number with integers N, M such that gcd(n, M) =. Then uniformly for all y R Z = p k ( p) n k f 0 f(t) dt + H M (y) D E kα + y (k np) 2 /(2pqn ln 2) where Z H M (y) := M 2π Z «f(t) dt 0 e x2 /2 dx *!+ M y x2 2ln2 is a periodic function with period M.