A Comparison of Methods for Redundancy Reduction in Recurrence Time Coding

Hidetoshi Yokoo, Member, IEEE

Abstract—The recurrence time of a symbol in a string is defined as the number of symbols that have appeared since the last previous occurrence of the same symbol. It is one of the most fundamental quantities that can be used in universal source coding. If we count only the minimum required number of symbols occurring in the recurrence period, we can reduce some of the redundancy contained in recurrence time coding. The MTF (move-to-front) scheme is a typical example that shares this idea. In this correspondence, we establish three such schemes and make a basic comparison among them from the viewpoint that they can be thought of as different attempts to realize the above idea.

Index Terms—MTF, data compression, recency rank, recurrence time, source coding, universal codes

I. INTRODUCTION

Recurrence time coding, or interval coding, proposed by P. Elias [7], is a method for universal lossless compression which encodes the recurrence times of symbols in a string. The idea is extended from symbols to strings in [12], where its asymptotic optimality with respect to the string length is shown. The well-known Lempel-Ziv code [14] is another realization of recurrence time coding [13].

It is known that recurrence time coding is redundant, although one kind of redundancy contained in symbolwise recurrence time coding is easy to reduce. The MTF (move-to-front) scheme, which was developed independently of recurrence time coding under the name of book stack [10] and has long been known under various names in various contexts [8], can be regarded as an improvement of recurrence time coding. The recurrence time of a symbol is defined as the number of symbols that have appeared since the last previous occurrence of the same symbol. If we encode the number of distinct symbols occurring in that interval, we obtain the MTF scheme [3], [7]. Another improvement of recurrence time coding, which counts the number of alphabetically bigger or smaller symbols in the interval, is also known [2], [9]. These improvements of recurrence time coding share the idea that one should encode the minimum number of symbols required to specify the current symbol. Since the minimum required number naturally depends on how symbols are specified, redundancy reduction in recurrence time coding should be analyzed in relation to how the symbols in a string are specified.

In this correspondence, we begin with three different representations for specifying symbols via recurrence times, and establish their respective schemes for reducing redundancy, which include the MTF scheme. The three schemes are already known and have actually been applied to the block-sorting compression algorithm [4] as its second-step component [2], [6]. However, they have not been analyzed and compared from the viewpoint that they can be thought of as different attempts to realize redundancy reduction in recurrence time coding. In particular, very little attention has been paid to theoretical aspects of the two newer schemes, whereas a redundancy analysis with no symbol extension has been given for the MTF scheme [1]. This correspondence explores these three improved schemes by highlighting their common nature, compares the expectations of their outputs for memoryless sources, and reveals the entropies assumed by the three schemes on binary memoryless sources.

H. Yokoo is with the Department of Computer Science, Gunma University, Kiryu 376-8515, Japan.

II. REPRESENTATIONS OF RECURRENCE TIME CODING

When we consider redundancy reduction in recurrence time coding, the representation of a recurrence time itself plays an essential role. In this section, we give three different representations of a recurrence time, which will be used to derive three different methods for redundancy reduction.

Let A = {a_1, a_2, ..., a_α} denote a source alphabet of finite size α. Elements of the alphabet are called symbols. We assume that the symbols are totally ordered by some ordering relation. We say that the symbol a_i is alphabetically smaller than the symbol a_m for 1 ≤ i < m ≤ α, and conversely that the symbol a_m is alphabetically bigger than the symbol a_i. The ordering relation is usually inherent in the alphabet. However, as we will see later, it may also be determined after the data string is processed; for example, we can use actual symbol frequencies to define the order of symbols.

Suppose that we are going to encode an n-tuple of symbols (i.e., a string) x_1^n = x_1 x_2 ... x_n, x_k ∈ A, k = 1, 2, ..., n, on which we measure how far apart two positions are using the distance d(k, j) = k − j, 1 ≤ j < k ≤ n. For the kth symbol x_k = a ∈ A, if there is no a in {x_1, x_2, ..., x_{k−1}}, then we say that the symbol x_k is the initial occurrence of the symbol a. If x_k is not the initial occurrence of any symbol, then there exists a position j such that

    x_j = x_k (= a),  1 ≤ j < k ≤ n,  a ∉ {x_{j+1}, x_{j+2}, ..., x_{k−1}} = I_j.    (1)

Then, the recurrence time of the kth symbol x_k is defined as

    r(k) = d(k, j) − 1,    (2)

in which the second term −1 is introduced only for convenience, so that the value of a recurrence time is greater than or equal to zero. When we regard this value as a quantity associated with the position j instead of k on a fixed string x_1^n, we write

    r̃(j) = d(k, j) − 1.    (3)

As long as the treatment of initial occurrences of symbols is properly defined, the original string x_1^n is unambiguously recovered from the sequence r(1), r(2), r(3), ..., r(n). Also, the sequence r̃(1), r̃(2), r̃(3), ... can be used to recover the same string.

If we gather recurrence times for every symbol, we have another representation that can be used for a similar purpose. For x_k being the initial occurrence of a symbol a, let s_1(a) = k − 1. For the second and subsequent occurrences of the same symbol, when x_j is the (m−1)st occurrence of a and x_k is the mth occurrence of the same symbol, we represent the interval between these two symbols by

    s_m(a) = d(k, j) − 1.    (4)

According to (2), (3), and (4), we have three different representations of recurrence time coding. These representations produce the same multiset of integers, but in different orders. In this correspondence, however, we will not investigate the difference in such orders; instead we regard the three representations as starting points from which we consider redundancy reduction in recurrence time coding.

[Fig. 1. Recovery of x_1^n from a list of ort_k(a_i)'s.]

III. REDUNDANCY REDUCTION IN RECURRENCE TIME REPRESENTATIONS

A. Redundancy Reduction in {r(k)}

The most typical method for reducing redundancy in recurrence time coding is the MTF scheme. As seen above, a usual recurrence time is defined as the number of symbols that have appeared since the last previous occurrence of the current symbol. If we encode the number of distinct symbols in that interval, we obtain the MTF scheme. In this correspondence, we represent the output of the scheme at time k by

    mtf(k) = |I_j| = number of distinct symbols occurring between x_j = a and x_k = a.
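To make these two quantities concrete, the following Python sketch (an illustration added here, not part of the original correspondence) computes r(k) and mtf(k) for every non-initial occurrence in a hypothetical string; positions are 0-based, whereas the text uses 1-based positions.

    # Illustrative sketch only (not the author's code): r(k) and mtf(k) of
    # Sections II and III-A, computed for every non-initial occurrence.
    def recurrence_and_mtf(x):
        last = {}                 # most recent position of each symbol seen so far
        r, mtf = {}, {}
        for k, a in enumerate(x):
            if a in last:
                j = last[a]
                between = x[j + 1:k]         # the interval between the two occurrences
                r[k] = len(between)          # r(k) = d(k, j) - 1
                mtf[k] = len(set(between))   # number of distinct symbols in I_j
            last[a] = k
        return r, mtf

    # Hypothetical example string; note that 0 <= mtf(k) <= r(k) at every position.
    print(recurrence_and_mtf("abracadabra"))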
Usually, the output of the MTF scheme is regarded as a property associated with the position k, provided that the condition (1) holds. The scheme reduces redundancy contained in the r(k)-representation in the sense that 0 ≤ mtf(k) ≤ r(k). The MTF scheme has been extensively studied from various viewpoints (see, e.g., [8], [1]); we will not discuss those details here. We simply consider the MTF scheme qualified to serve as the benchmark when we compare the related schemes.

B. Redundancy Reduction in {s_m(a)}

We proceed to redundancy reduction in the {s_m(a)}-based representation. Let us first consider a simple example. Assume that we have A = {a, b, c, d} with the ordering relation a < b < c < d. The coding scheme based on (4) first encodes a list of recurrence times of symbol a, which is followed by the encoding of all occurrences of symbol b, and so on. Therefore, its corresponding decoder initially decodes all a's in x_1^n, then decodes all b's, all c's, and all d's, in this order. In this scheme, as seen from Fig. 1, after decoding all a's we can use a list of quantities obtained by removing the contribution of symbol a from s_m(b), instead of the original list {s_m(b)} itself, to recover all the positions of symbol b. That quantity is called an inversion rank [2] (also inversion frequency), or ordered recurrence time (ort) [9], which is now defined formally.

For a given string x_1^{k−1} and for the next symbol x_k = a, provided that the condition (1) holds for some j, let ort_k(a) represent the number of symbols in x_{j+1}^{k−1} that are alphabetically bigger than a. When x_k is the initial occurrence of a, ort_k(a) is defined as the number of symbols in x_1^{k−1} that are alphabetically bigger than a.

As an example, suppose that we are given the following string over the same four-symbol alphabet as above:

    position:   1 2 3 4 5 6 7 8 9 10 11
    x_1^11   =  a d c a a b c a d a  b

When we compute ort_11(b), for example, we first observe the interval between x_6 and x_11, noting that x_6 = x_11 = b. In this interval we count the number of symbols that are alphabetically bigger than x_11 = b to yield ort_11(b) = 2. Thus, we have

    x_1^11     =  a d c a a b c a d a b
    ort_k(x_k) =  0 0 1 2 0 2 0 2 0 1 2
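As an independent check of the table above (an illustration added here, not the author's code), the following sketch computes ort_k(x_k) for every position of the example string, using Python's character ordering for a < b < c < d:

    # Illustrative sketch only: ort_k(x_k) for each position of the example string.
    def ort_sequence(x):
        last, out = {}, []
        for k, a in enumerate(x):
            start = last[a] + 1 if a in last else 0   # interval after the previous occurrence,
                                                      # or the whole prefix for an initial occurrence
            out.append(sum(1 for s in x[start:k] if s > a))  # symbols alphabetically bigger than a
            last[a] = k
        return out

    print(ort_sequence("adcaabcadab"))   # -> [0, 0, 1, 2, 0, 2, 0, 2, 0, 1, 2]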

In general, we cannot uniquely recover an original string of symbols from the sequence {ort_k(x_k)}_{k=1}^n alone. For instance, if we proceed to x_12 after the above example, we may have ort_12(a) = ort_12(c) = 1 or ort_12(b) = ort_12(d) = 0. In either case, we cannot unambiguously determine x_12 from ort_12(x_12) = 0 or ort_12(x_12) = 1. This difficulty can be overcome if we add a list of the numbers of occurrences of all symbols in x_1^n, or if we append to the end of the original string an imaginary sentinel that can match any symbol in A. Since the scheme adopting the former idea has already been reported [9], [2], we present coding and decoding procedures based on the latter idea. We assume that the string length n is shared by the procedures in advance.

(Footnote: An anonymous referee pointed out that it is possible to omit some of the outputs for the sentinel. The method suggested by the referee can be extended so that all the outputs representing the sentinel are excluded. To do so, it is enough to redefine the ort value for the initial occurrence of symbol a_i (i ≥ 2) as the sum of ort_{n+1}(a_{i−1}) and ort_k(a_i), where k is the position of the initial occurrence of a_i. The author thanks the referee for the suggestion.)

[Encoder 1]
1) Set i ← 1.
2) Set x_{n+1} ← a_i.
3) Compute and output the ordered multiset {ort_k(a_i) | x_k = a_i, 1 ≤ k ≤ n+1}.
4) If i < α, then set i ← i + 1 and go to Step 2. Otherwise (i = α), terminate.

If we apply this procedure to the same example as above, Step 3 produces the following lists of ort values:

    a_i in x_1^12:  a a a a a a   b b b   c c c   d d d
    ort_k(a_i):     0 2 0 2 1 1   2 2 0   1 0 1   0 0 0

Note that we encode {ort_k(a_i) | x_k = a_i, 1 ≤ k ≤ n+1} as an ordered multiset (i.e., a linear list), not as a simple set, as shown above. The decoding can be performed by the following procedure.

[Decoder 1]
1) Set x_k ← nil for k = 1, 2, ..., n+1.
2) Set i ← 1 and S ← the concatenation of {ort_k(a_i) | x_k = a_i, 1 ≤ k ≤ n+1} for i = 1, 2, ..., α.
3) Set k ← 1.
4) Let t be the first element of S. Delete it from S.
5) While x_k ≠ nil, set k ← k + 1.
6) If k = n+1, then go to Step 8.
7) If t = 0, then set x_k ← a_i, k ← k + 1, and go to Step 4. Otherwise (t ≠ 0), set t ← t − 1, k ← k + 1, and go to Step 5.
8) If i < α, then set i ← i + 1 and go to Step 3. Otherwise (i = α), terminate.

Note that the positions of the last symbol a_α in the alphabet can be determined automatically even if the encoding of its positions is omitted. We will refer to the method that encodes {ort_k(a_i)} for i < α as the ORT scheme.
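The following Python sketch is a hedged, 0-based rendering of Encoder 1 and Decoder 1 (my illustration, not the author's implementation); for simplicity it also keeps the list of the last symbol a_α, which, as noted above, could be omitted:

    # Hedged sketch (not from the correspondence): Encoder 1 / Decoder 1 with a sentinel.
    def ort_encode(x, alphabet):
        lists = []
        for a in alphabet:                       # one pass per symbol a_1, ..., a_alpha
            y = x + a                            # Step 2: x_{n+1} <- a_i (the sentinel)
            last, out = -1, []
            for k, s in enumerate(y):
                if s == a:
                    # ort_k(a): bigger symbols since the previous occurrence of a
                    out.append(sum(1 for t in y[last + 1:k] if t > a))
                    last = k
            lists.append(out)                    # Step 3: one ordered list per symbol
        return lists

    def ort_decode(lists, alphabet, n):
        x = [None] * (n + 1)                     # index n plays the role of the sentinel slot
        for a, S in zip(alphabet, lists):
            k = 0
            for t in S:
                placed = False
                while not placed and k < n:
                    if x[k] is not None:         # position already recovered: skip it
                        k += 1
                    elif t == 0:                 # t bigger symbols skipped: a goes here
                        x[k] = a
                        k += 1
                        placed = True
                    else:                        # unrecovered position holding a bigger symbol
                        t -= 1
                        k += 1
                if not placed:                   # sentinel slot reached: done with this symbol
                    break
        return "".join(x[:n])

    s = "adcaabcadab"                            # the example string of Section III-B
    lists = ort_encode(s, "abcd")
    print(lists)                                 # [[0, 2, 0, 2, 1, 1], [2, 2, 0], [1, 0, 1], [0, 0, 0]]
    assert ort_decode(lists, "abcd", len(s)) == s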
C. Redundancy Reduction in {r̃(j)}

The difference between the methods based on (2) and on (3) lies in the output order of recurrence times. The method based on (3) encodes r̃(j) when it reads x_j. At that time, although the encoder must read x_k in advance in order to calculate r̃(j), its current position is still at the jth symbol. After emitting r̃(j), it proceeds to the (j+1)st symbol x_{j+1}.

[Fig. 2. Recovery of x_1^n from a list of r̃(j)'s.]

Figure 2 gives a decoding example, in which the topmost row shows that the (j−1)st symbol has just been processed. Continuing from this state, we can recover x_1^n from the sequence r̃(j), r̃(j+1), and so on. This method can be improved by an idea similar to that used to derive the ORT scheme. For example, in the topmost state in Fig. 2, we use x_j = a and r̃(j) = 5 to fix x_{j+r̃(j)+1} = x_{j+6} = a. But at this moment, since we already have x_{j+1} = b, x_{j+2} = c, and x_{j+5} = d, we can remove the contribution of these symbols from r̃(j) to obtain x_{j+6} = a. Namely, we can introduce a new method that encodes

    urc(j) = number of unrecovered symbols between x_j and x_{j+r̃(j)+1}

at the processing of x_j, instead of r̃(j). Then, we can show the following equality.

Theorem 1: For any string x_1^n, if there is no a ∈ A in the interval between x_j = a and x_k = a (1 ≤ j < k ≤ n), then we have

    urc(j) = r(k) − mtf(k).    (5)

Proof: When the jth symbol x_j is processed in decoding, those symbols between x_j and x_{j+r̃(j)+1} that have already been recovered are all distinct, and they cover all the different symbols appearing in that interval. Hence, the number of these symbols equals mtf(k) = mtf(j + r̃(j) + 1). Since the number of unrecovered symbols in the interval is equal to the difference between the interval length r(k) and the number of recovered symbols, it is given by (5).

The most characteristic point of the method for encoding {urc(j)} is that we can correctly decode x_1^n even if we give no codeword to urc(j) = 0 when we have r(k) = mtf(k) = 0 in (5). That is to say, if W denotes the full sequence of urc(j)'s, and Ŵ the subsequence of W obtained by removing every urc(j) = 0 whose corresponding mtf(k) is equal to zero, then the original string can be completely recovered from Ŵ, provided that the initial occurrences of all the symbols in A are given in advance.
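As an illustration of Theorem 1 (added here, not part of the correspondence), the following sketch computes urc(j) as r(k) − mtf(k) and forms both W and the reduced sequence Ŵ for the example string of Section III-B:

    # Hedged sketch: urc(j) computed via Theorem 1 as r(k) - mtf(k), where k is
    # the position of the next occurrence of x_j; W is the full sequence and
    # W_hat the reduced one used by the URC scheme. 0-based positions.
    def urc_sequences(x):
        nxt, succ = {}, [None] * len(x)
        for j in range(len(x) - 1, -1, -1):      # next occurrence of x_j, filled right to left
            succ[j] = nxt.get(x[j])
            nxt[x[j]] = j
        W, W_hat = [], []
        for j, k in enumerate(succ):
            if k is None:
                continue                          # x_j never recurs; nothing to encode
            between = x[j + 1:k]
            r_k, mtf_k = len(between), len(set(between))
            u = r_k - mtf_k                       # urc(j) by Theorem 1
            W.append(u)
            if not (u == 0 and mtf_k == 0):       # W_hat drops the zeros caused by runs
                W_hat.append(u)
        return W, W_hat

    print(urc_sequences("adcaabcadab"))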

The following are concrete encoding and decoding procedures, in which Ŵ is used explicitly for clarity. Actually, however, we can emit the codeword representing the value of u directly, not via Ŵ. An auxiliary array f of 1-bit flags is required only in the encoding procedure. Again, the string length n is assumed to be known to the procedures in advance.

[Encoder 2]
1) Set f_k ← false for k = 1, 2, ..., n.
2) Encode the positions of the initial occurrences of all symbols. (This requires O(α log n) space.)
3) Set f_k ← true for every k such that x_k is the initial occurrence of some symbol.
4) Set j ← 1.
5) Set c ← x_j.
6) If x_{j+1} = c, then set f_{j+1} ← true and go to Step 9.
7) Set u ← 0.
8) Starting from l = 1, iterate the following.
   a) If j + l > n, then if u > 0 output u at the end of Ŵ, and go to Step 9.
   b) If x_{j+l} = c, then set f_{j+l} ← true, output u at the end of Ŵ, and go to Step 9.
   c) If f_{j+l} = false, then set u ← u + 1.
   d) Set l ← l + 1 and go to Step 8a.
9) Set j ← j + 1, and if j = n then terminate, else go to Step 5.

[Decoder 2]
1) Set x_k ← nil for k = 1, 2, ..., n.
2) Restore the initial occurrences of symbols to their appropriate positions in x_1^n.
3) Set j ← 1.
4) Set c ← x_j.
5) If x_{j+1} = nil, then set x_{j+1} ← c and go to Step 9.
6) If Ŵ = ∅, then go to Step 9.
7) Let u be the first element of Ŵ. Delete it from Ŵ.
8) Starting from l = 1, iterate the following.
   a) If j + l > n, then go to Step 9.
   b) If (u = 0) and (x_{j+l} = nil), then set x_{j+l} ← c and go to Step 9.
   c) If x_{j+l} = nil, then set u ← u − 1.
   d) Set l ← l + 1 and go to Step 8a.
9) Set j ← j + 1, and if j = n then terminate, else go to Step 4.

The above encoding and decoding procedures are essentially the same as those of distance coding (see, for example, Deorowicz [6]). In the following, we will call the pair of procedures above the URC scheme.

IV. OUTPUT STATISTICS FOR MEMORYLESS SOURCES

The three improved recurrence time coding schemes, MTF, ORT, and URC, may each have their own characteristics for various sources with or without memory. However, it is not easy to reveal or analyze such characteristics for sources with memory, so we compare their average behaviors for memoryless sources.

Let {X_k}_{k=−∞}^{+∞} be a stationary, memoryless source over A, with positive symbol probabilities p_i = Pr{X_k = a_i} > 0 for i = 1, 2, ..., α, and Σ_i p_i = 1. We have assumed that a source sequence is infinitely long also in the past, so that we can ignore the initial occurrences of symbols; thus, the position j of x_j in the condition (1) always exists somewhere in x_{−∞}^{k−1}. Corresponding to {X_k}_{k=−∞}^{+∞}, the following random variables are introduced:

    R_k    : recurrence time r(k) of X_k,
    S_i,m  : mth recurrence time s_m(a_i) of a_i,
    M_k    : mtf value mtf(k) of X_k,
    T_k    : ort value ort_k(X_k) of X_k,
    U_j    : urc value urc(j) of X_j.

Considering the average behaviors of these random variables on a double-sided stationary source means that we are performing steady-state analyses. In a finite-source case, however, the above random variables depend on the initial state and are not stationary even when the source is, which should be noted [1].

A. Comparison of Expected Values

We first compare the expected values of the above random variables, taking into account that redundancy reduction amounts to decreasing these values toward zero. We will not discuss concrete coding details for any of the schemes. If a random variable Z taking values on the non-negative integers is encoded into a codeword of length f(Z), and if f is continuous, convex-∩ (concave), and monotonically nondecreasing, then the expected codeword length E[f(Z)] is upper bounded by f(E[Z]). Based on this, we concentrate on the expectations of the integer outputs of the coding schemes.

As a special case of Kac's lemma [12], [13], the expectation of S_i,m is, independently of m, given by

    E[S_i] = E[S_{i,m}] = 1/p_i − 1.    (6)

Thus, the expectation of R_k for the source is

    E[R_k] = Σ_{i=1}^{α} p_i E[S_i] = α − 1.    (7)
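As an informal numerical check of (6) and (7), added here and not part of the correspondence, the following sketch estimates the mean recurrence time of an i.i.d. source by simulation and compares it with α − 1:

    # Hedged simulation check of (6)-(7): for an i.i.d. source with probabilities p_i,
    # the mean of S_i is 1/p_i - 1 and the mean of R_k is alpha - 1.
    import random

    def mean_recurrence_time(p, n=200_000, seed=1):
        rng = random.Random(seed)
        x = rng.choices(range(len(p)), weights=p, k=n)
        last, total, count = {}, 0, 0
        for k, a in enumerate(x):
            if a in last:
                total += k - last[a] - 1          # r(k) = d(k, j) - 1
                count += 1
            last[a] = k
        return total / count

    p = [0.5, 0.25, 0.25]                         # hypothetical alphabet of size alpha = 3
    print(mean_recurrence_time(p), "vs alpha - 1 =", len(p) - 1)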
The expectation of T_k is also independent of k, and is given by the following theorem [9].

Theorem 2: The expectation of T_k for a memoryless source is given by

    E[T_k] = Σ_{i=1}^{α} (i−1) p_i = Σ_{i=1}^{α} i p_i − 1.    (8)

Proof: Assume that the condition (1) holds at time k for a = a_i. The probabilities that an arbitrary symbol in I_j is alphabetically smaller and bigger, respectively, than a_i are given by

    P_s = Σ_{l=1}^{i−1} p_l  and  P_b = Σ_{l=i+1}^{α} p_l.

Obviously, we have

    P_s + p_i + P_b = 1.    (9)

Suppose that in I_j we have t symbols that are alphabetically bigger than a_i. The probability that such a situation occurs, so that T_k = t, is given by

    Pr{T_k = t | X_k = a_i} = Pr{(X_j = a_i) ∧ (t bigger symbols in I_j) ∧ (an arbitrary number of smaller symbols in I_j)}
                            = Σ_{s=0}^{∞} C(s+t, s) p_i P_s^s P_b^t
                            = p_i P_b^t Σ_{s=0}^{∞} ((s+t)! / (t! s!)) P_s^s.    (10)

Define

    g_t(x) = Σ_{s=0}^{∞} ((s+t)! / s!) x^s  for t = 0, 1, ....

Noting that

    g_0(x) = Σ_{s=0}^{∞} x^s = 1 / (1−x)  for |x| < 1,

and

    g_{t+1}(x) = Σ_{s=1}^{∞} ((s+t)! / (s−1)!) x^{s−1} = g_t'(x),

we can show that

    g_t(x) = t! / (1 − x)^{t+1}  for |x| < 1.

Therefore, equation (10) now becomes

    Pr{T_k = t | X_k = a_i} = p_i P_b^t / (1 − P_s)^{t+1}.

Using this, we can derive the expectation of T_k given X_k = a_i as

    Σ_{t=0}^{∞} Pr{T_k = t | X_k = a_i} t = (p_i / (1 − P_s)) Σ_{t=0}^{∞} t (P_b / (1 − P_s))^t
                                          = (p_i / (1 − P_s)) · P_b (1 − P_s) / (1 − P_s − P_b)^2
                                          = P_b / p_i    (11)
                                          = (1 / p_i) Σ_{l=i+1}^{α} p_l.

Summed over all symbols, the expectation of T_k is given by

    E[T_k] = Σ_{i=1}^{α} Pr{X_k = a_i} Σ_{t=0}^{∞} Pr{T_k = t | X_k = a_i} t = Σ_{i=1}^{α−1} Σ_{l=i+1}^{α} p_l = Σ_{i=1}^{α} (i − 1) p_i.

This completes the proof.

The above theorem is intuitively obvious. It follows from the equalities (6) and (9) that the expectation of a simple recurrence time S_i of symbol a_i is given by

    E[S_i] = 1/p_i − 1 = (1 − p_i)/p_i = (P_s + P_b)/p_i.

If we remove from this the contribution of the symbols that are alphabetically smaller than a_i, then we obtain the expectation (11) of T_k for a_i, which includes only the symbols that are alphabetically bigger than a_i. From this, we can easily reach the final result of the theorem.

If the ORT scheme incorporates the numbers of symbols actually occurring in the original string, we can make use of the symbol frequencies to define the alphabet order. When we define the alphabet in this way, it is natural to assume that

    p_1 ≥ p_2 ≥ ··· ≥ p_α > 0.    (12)

In the rest of this subsection, we assume (12) to continue our analysis.

The expectation (8) has another interpretation. Let us turn to the MTF scheme. Assume that the symbols are linked linearly in the MTF list, in which the position of every symbol represents the corresponding mtf value. When the symbols are arranged in alphabetic order in this list, the expectation of the mtf value is equal to (8). Under the assumption (12), the expectation E[M_k] is minimized in this state; if the MTF list is in a state different from this, the expectation E[M_k] is never smaller than (8). Namely, we have

    E[M_k] ≥ Σ_{i=1}^{α} (i − 1) p_i,    (13)

which is combined with (8) to yield

    E[T_k] ≤ E[M_k].    (14)

Let E[U_j] denote the expectation of an element of W. Note that we do not use Ŵ when we consider E[U_j]. Then, the expectations of the three schemes are related in the following way.

Theorem 3: For a memoryless source with (12), we have

    E[T_k] ≤ E[M_k] ≤ (α − 1)/2 ≤ E[U_j].    (15)

Proof: From the equations (5), (7), and (14), we have

    E[U_j] = E[R_k] − E[M_k] = α − 1 − E[M_k] ≥ α − 1 − E[T_k].    (16)

It is known [8, Eq. (5.3)] that the average mtf value is given by

    E[M_k] = Σ_{1 ≤ i < l ≤ α} 2 p_i p_l / (p_i + p_l).

Since we have 4 p_i p_l ≤ (p_i + p_l)^2 for any real p_i's, E[M_k] is upper bounded as

    E[M_k] ≤ (1/2) Σ_{1 ≤ i < l ≤ α} (p_i + p_l)
           = (1/2) Σ_{i=1}^{α−1} Σ_{l=i+1}^{α} (p_i + p_l)
           = (1/2) { Σ_{i=1}^{α} (α − i) p_i + Σ_{i=1}^{α} (i − 1) p_i }
           = (α − 1)/2,    (17)

where the equality holds when p_1 = p_2 = ··· = p_α = 1/α. The inequality (17) can be combined with (16) to show E[U_j] ≥ (α − 1)/2. Summarizing (14), (17), and this, we have the inequalities in (15).

Theorem 4: For a memoryless source with (12),

    E[T_k] ≤ (1 − p_1) α / 2 ≤ (α − 1)/2.    (18)

Proof: The first inequality holds with equality for α = 2, which follows from (8). For α ≥ 3, subtracting the left-hand side of (18) from the middle term, we have

    α (p_2 + ··· + p_α)/2 − p_2 − 2 p_3 − ··· − (α − 1) p_α
      = (1/2) { (α − 2) p_2 + (α − 4) p_3 + (α − 6) p_4 + ··· + (−α + 4) p_{α−1} + (−α + 2) p_α }
      = (1/2) { (α − 2)(p_2 − p_α) + (α − 4)(p_3 − p_{α−1}) + ··· }
      ≥ 0,

since, pairing p_m with p_{α+2−m}, every term in the last line is a nonnegative coefficient multiplying a nonnegative difference under (12). The second inequality is obvious from p_1 ≥ 1/α.

The above results for a memoryless source with (12) can be summarized in words:

- The ORT scheme is superior to the other two schemes in the average sense, although it requires the encoding of symbol frequencies, or of sentinels. The extra space required to encode these overheads is O(α log n) in both cases.
- The outputs of the MTF scheme are upper bounded by a constant (= α − 1), while the outputs of the other two schemes may take arbitrarily large nonnegative integer values.
- Comparing the schemes on the expectations of their outputs is valid only when the outputs of every scheme are encoded by the same code. This presupposition is not satisfied in many situations, which tends to make the comparison unfair to the MTF scheme.
- The URC scheme can eliminate some of its codewords, although the equations (15) and (16) do not benefit from this fact. When mtf(k) = 0, no codeword is required for the corresponding urc(j) = 0. The probability that M_k = 0 is given by Σ_{i=1}^{α} p_i^2 = α σ^2 + 1/α, where σ^2 = Σ_{i=1}^{α} (p_i − 1/α)^2 / α.

We know from the above observations that the three schemes have their respective advantages, which are, however, difficult to compare with one another. Although a detailed redundancy analysis of the MTF scheme has been carried out even for sources with memory [1], it seems considerably harder to apply a similar technique to the other schemes. Since we cannot conclude that one scheme always outperforms another, we investigate them further in a more restricted case.

B. Comparison on a Binary Memoryless Source

Consider a binary memoryless source {X_k : Pr{X_k = a_0} = p, Pr{X_k = a_1} = 1 − p}, where 0 ≤ p ≤ 1. For this source, we now compute the model entropies assumed by the three schemes. By model entropy, we mean the shortest codeword length per input bit that can be attained by the scheme when its output is ideally encoded. In the following, all logarithms are taken to the base 2. The binary entropy function is denoted by H(p) = −p log p − (1−p) log(1−p).

The MTF scheme produces only 0's and 1's for this source. Since their probabilities are

    Pr{M_k = m} = 1 − 2p(1−p)  for m = 0,
                = 2p(1−p)      for m = 1,

the entropy assumed by the MTF scheme is given by H(2p(1−p)).

Next, we proceed to the ORT scheme. It is sufficient for the scheme to encode the a_0's only. For X_k = a_0, its ort value has probabilities Pr{T_k = t} = p(1−p)^t, t = 0, 1, 2, .... Therefore, the entropy per single a_0 is given by

    −Σ_{t=0}^{∞} Pr{T_k = t} log Pr{T_k = t}
        = −Σ_{t=0}^{∞} p(1−p)^t log ( p(1−p)^t )
        = −p log p Σ_{t=0}^{∞} (1−p)^t − p log(1−p) Σ_{t=0}^{∞} t(1−p)^t
        = −log p − ((1−p)/p) log(1−p)
        = (1/p) H(p).

Hence, the per-bit entropy assumed by the ORT scheme is equal to H(p). This means that, for a binary memoryless source, the ORT scheme attains the source entropy. However, in order for the scheme to actually attain the entropy of the empirical distribution, it must encode the frequencies of the a_0's and a_1's beforehand. In this situation, its performance on a binary source will be equivalent to that of enumerative codes (see, for example, Schalkwijk [11] and Cover [5]).

Finally, as seen in Fig. 3, we have for the URC scheme

    Pr{U_j = 0, M_k = 0} = 1 − 2p(1−p),
    Pr{U_j = u, M_k = 1} = p(1−p){ p(1−p)^u + (1−p)p^u },   u = 0, 1, 2, ...,
    Pr{U_j = u | M_k = 1} = (1/2){ p(1−p)^u + (1−p)p^u },   u = 0, 1, 2, ...
                          = (1/2){ 1 + (p/(1−p))^{u−1} } Pr{T_k = u},   p ≠ 1/2.

[Fig. 3. The URC scheme for a binary string: (a) urc(j) = 0, mtf(k) = 0; (b) urc(j) = u, mtf(k) = 1.]

[Fig. 4. Comparison of the model entropies per bit assumed by the three schemes: URC (H(U_j)), MTF, URC (H(Ŵ)), and ORT (source entropy). The entropies of the URC scheme, which cannot be represented in closed form, are evaluated numerically.]

Thus, the marginal probabilities are given by

    Pr{U_j = u} = 1 − p(1−p)                       for u = 0,
                = p(1−p){ p(1−p)^u + (1−p)p^u }    for u = 1, 2, ....

Because their entropy

    H(U_j) = −Σ_{u=0}^{∞} Pr{U_j = u} log Pr{U_j = u}

cannot be represented in closed form, we have evaluated it numerically and plotted it as a function of p. The result, included in Fig. 4, shows relatively large values of H(U_j), as may be predicted from Theorem 3. However, we can improve it by eliminating redundant urc values. When the URC scheme uses Encoder 2, it emits its outputs only in the cases corresponding to Q_u = Pr{U_j = u | M_k = 1}. Therefore, the entropy per bit is given by

    H(Ŵ) = −2p(1−p) Σ_{u=0}^{∞} Q_u log Q_u,

which equals the entropy assumed by the ORT scheme when p = 0, 1/2, and 1.
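These model entropies can be evaluated numerically; the following sketch, added here only to reproduce the qualitative comparison of Fig. 4 from the formulas above, computes H(2p(1−p)), H(p), H(U_j), and the per-bit H(Ŵ) for a few values of p:

    # Hedged numerical sketch of the model-entropy comparison for a binary
    # memoryless source with Pr{a_0} = p (an illustration, not the paper's script).
    import math

    def H(q):                                    # binary entropy function, base 2
        return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

    def model_entropies(p, terms=2000):
        mtf = H(2 * p * (1 - p))                 # MTF: Pr{M_k = 1} = 2p(1-p)
        ort = H(p)                               # ORT attains the source entropy
        # URC, full sequence W: marginal distribution of U_j
        pu = [1 - p * (1 - p)] + [p * (1 - p) * (p * (1 - p) ** u + (1 - p) * p ** u)
                                  for u in range(1, terms)]
        urc_w = -sum(q * math.log2(q) for q in pu if q > 0)
        # URC, reduced sequence W_hat: Q_u = Pr{U_j = u | M_k = 1}
        qu = [0.5 * (p * (1 - p) ** u + (1 - p) * p ** u) for u in range(terms)]
        urc_w_hat = -2 * p * (1 - p) * sum(q * math.log2(q) for q in qu if q > 0)
        return mtf, ort, urc_w, urc_w_hat

    for p in (0.1, 0.25, 0.5):
        print(p, [round(v, 4) for v in model_entropies(p)])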
When the parameter p takes other values, the difference between the entropies assumed by the URC and ORT schemes increases (see Fig. 4). In summary, the URC scheme shows a remarkable improvement when it eliminates redundant codewords, but it is still inferior in model entropy to the ORT scheme, at least on binary memoryless sources.

V. CONCLUSION

We have compared three methods for reducing redundancy in recurrence time coding. The three methods are the MTF, ORT,

and URC schemes. Although these schemes are difficult to analyze individually, we have overcome the difficulty by making a relative comparison. We conclude that the ORT scheme is relatively favorable, which is consistent with empirical observations [2]. However, the results are not sophisticated; they correspond only to a zeroth-order analysis of memoryless sources. In addition, while the MTF scheme can be incorporated into a real-time system, the other two schemes may be restricted to applications that admit unbounded coding/decoding delays. We have also analyzed the schemes on the assumption that they produce multisets of integers, whereas they actually produce ordered linear lists of integers. Thus, much work remains for future exploration in considering the orders of their elements.

ACKNOWLEDGMENTS

This correspondence is an extension of our previous work [9], which was performed jointly with Shingo Mamada. His contribution to that work is gratefully acknowledged. The author would also like to thank the referees for their constructive comments.

REFERENCES

[1] M. Arimura and H. Yamamoto, "Asymptotic redundancy of the MTF scheme for stationary ergodic sources," IEEE Trans. Inf. Theory, vol. 51, no. 11, pp. 3742-3752, Nov. 2005.
[2] Z. Arnavut, "Inversion coding," The Computer Journal, vol. 47, no. 1, pp. 46-57, 2004.
[3] J. L. Bentley, D. D. Sleator, R. E. Tarjan, and V. K. Wei, "A locally adaptive data compression scheme," Comm. ACM, vol. 29, no. 4, pp. 320-330, 1986.
[4] M. Burrows and D. J. Wheeler, "A block-sorting lossless data compression algorithm," SRC Research Report 124, May 1994.
[5] T. M. Cover, "Enumerative source encoding," IEEE Trans. Inf. Theory, vol. IT-19, no. 1, pp. 73-77, 1973.
[6] S. Deorowicz, "Second step algorithms in the Burrows-Wheeler compression algorithm," Software: Practice and Experience, vol. 32, no. 2, pp. 99-111, 2002.
[7] P. Elias, "Interval and recency rank source coding: Two on-line adaptive variable-length schemes," IEEE Trans. Inf. Theory, vol. IT-33, no. 1, pp. 3-10, 1987.
[8] J. A. Fill, "An exact formula for the move-to-front rule for self-organizing lists," Journal of Theoretical Probability, vol. 9, no. 1, pp. 113-159, 1996.
[9] S. Mamada and H. Yokoo, "A new interval source coding scheme over an ordered alphabet," Proc. 2003 IEEE Int. Symp. Inf. Theory, Yokohama, 2003.
[10] B. Y. Ryabko, "Data compression by means of a book stack," Problems of Information Transmission, vol. 16, no. 4, pp. 265-269, 1980.
[11] J. P. M. Schalkwijk, "An algorithm for source coding," IEEE Trans. Inf. Theory, vol. IT-18, no. 3, pp. 395-399, 1972.
[12] F. M. J. Willems, "Universal data compression and repetition times," IEEE Trans. Inf. Theory, vol. 35, no. 1, pp. 54-58, 1989.
[13] A. D. Wyner, J. Ziv, and A. J. Wyner, "On the role of pattern matching in information theory," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2045-2056, 1998.
[14] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inf. Theory, vol. IT-23, no. 3, pp. 337-343, 1977.

Hidetoshi Yokoo received the B.Eng. degree in instrumentation physics, the M.Eng. degree in information engineering, and the D.Eng. degree in mathematical engineering from the University of Tokyo, in 1978, 1980, and 1987, respectively. From 1980 to 1989 he was a Research Associate at the Department of Electrical Engineering, Yamagata University, Japan. In 1989, he joined Gunma University, Kiryu, Japan, where he is currently Associate Professor in the Department of Computer Science.
His research interests are in data compression and its application to computer science. Dr. Yokoo is a member of the Information Processing Society of Japan, the Institute of Electronics, Information and Communication Engineers, and the Society of Information Theory and Its Applications.