On bounded redundancy of universal codes
Łukasz Dębowski
Institute of Computer Science, Polish Academy of Sciences
ul. Jana Kazimierza 5, Warszawa, Poland

Abstract

Consider stationary ergodic measures for which the difference between the expected length of a uniquely decodable code and the block entropy is asymptotically bounded by a constant. Using ergodic decomposition, it is shown that the number of such measures is less than the base of the logarithm raised to the power of that constant. In consequence, an analogous statement is derived for excess lengths of universal codes. The latter was previously communicated without proof.

Keywords: uniquely decodable codes, entropy, ergodic decomposition

The work was partly supported by the Polish Ministry of Scientific Research and Information Technology, grant no. 1/P03A/045/28.
1 Introduction

The aim of this note is to establish an impossibility result concerning uniquely decodable codes and stationary ergodic measures. Consider measures for which the difference between the expected length of a code and the block entropy is asymptotically bounded by a constant. We will show that the number of such measures is less than the base of the logarithm raised to the power of that constant. In other words, the expected length of a uniquely decodable code cannot be close to its lower bound for all ergodic measures. This simple but novel result is a showcase application of Shannon information measures for $\sigma$-algebras, a neat and powerful tool developed by Pinsker (1964), Wyner (1978), and Dębowski (2009). As a corollary, we will derive an analogous statement for excess lengths of universal codes, which was communicated by Dębowski (2009) in a weaker form without a proof.

The preliminaries are as follows. Let $\mathbb{X} = \{0, 1, \ldots, D_X - 1\}$ be a finite alphabet. For the measurable space $(\mathbb{X}^{\mathbb{Z}}, \mathcal{X}^{\mathbb{Z}})$, consider the shift transformation $T : \mathbb{X}^{\mathbb{Z}} \ni (x_k)_{k \in \mathbb{Z}} \mapsto (x_{k+1})_{k \in \mathbb{Z}} \in \mathbb{X}^{\mathbb{Z}}$, where $x_k \in \mathbb{X}$. The shift-invariant $\sigma$-algebra is $\mathcal{I} := \{ A \in \mathcal{X}^{\mathbb{Z}} : TA = A \}$. Let $(S, \mathcal{S})$ be the measurable space of stationary probability measures on $(\mathbb{X}^{\mathbb{Z}}, \mathcal{X}^{\mathbb{Z}})$ (i.e., $\mu \circ T^{-1} = \mu$ for $\mu \in S$) and let $(E, \mathcal{E}) \subset (S, \mathcal{S})$ be the subspace of ergodic measures (i.e., $\mu(A) \in \{0, 1\}$ for $\mu \in E$ and $A \in \mathcal{I}$). To be precise, $\mathcal{S}$ and $\mathcal{E}$ are defined as the smallest $\sigma$-algebras containing all cylinder sets $\{\mu \in S : \mu(A) \le r\}$ and $\{\mu \in E : \mu(A) \le r\}$, $A \in \mathcal{X}^{\mathbb{Z}}$, $r \in \mathbb{R}$, respectively. Since $\mathcal{X}^{\mathbb{Z}}$ is countably generated, all respective singletons $\{\mu\}$ belong to $\mathcal{S}$ and $\mathcal{E}$.

Define random variables $X_i((x_k)_{k \in \mathbb{Z}}) := x_i$ on $(\mathbb{X}^{\mathbb{Z}}, \mathcal{X}^{\mathbb{Z}})$ and write blocks as $X_{n:m} := (X_i)_{n \le i \le m}$. For a stationary measure $\mu \in S$, consider the block entropy
$$H_\mu(n) := H_\mu(X_{i+1:i+n}) := E_\mu \left[ -\log \mu(X_{i+1:i+n} = \cdot\,) \right],$$
where $E_\mu$ is the expectation with respect to $\mu$ and $\log$ is the natural logarithm.
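As a toy illustration, the block entropy of an i.i.d. Bernoulli($p$) source can be computed by direct enumeration of blocks; in this simplest case $H_\mu(n) = n H_\mu(1)$ exactly. A minimal Python sketch:

```python
import math
from itertools import product

def block_entropy(p, n):
    """Block entropy H_mu(n) in nats of an i.i.d. Bernoulli(p) source,
    computed by summing -mu(x_{1:n}) * log mu(x_{1:n}) over all 2^n blocks."""
    h = 0.0
    for block in product((0, 1), repeat=n):
        prob = 1.0
        for x in block:
            prob *= p if x == 1 else 1 - p
        h -= prob * math.log(prob)
    return h

# For an i.i.d. source the block entropy is exactly linear in n:
# H_mu(n) = n * H_mu(1).
print(block_entropy(0.3, 4), 4 * block_entropy(0.3, 1))
```

For a general stationary measure the block entropy need not be linear, but $H_\mu(n)/n$ still converges to the entropy rate.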
Moreover, we consider another finite alphabet $\mathbb{Y} = \{0, 1, \ldots, D_Y - 1\}$ and a code $C : \mathbb{X}^+ \to \mathbb{Y}^+$, where $\mathbb{X}^+ = \bigcup_{n=1}^{\infty} \mathbb{X}^n$. Denote the expected length of the code as
$$H^C_\mu(n) := E_\mu \left| C(X_{1:n}) \right| \log D_Y.$$
The code $C$ is called uniquely decodable if its extension $C^*(u_1, \ldots, u_k) := C(u_1) \ldots C(u_k)$, $u_i \in \mathbb{X}^n$, $k \in \mathbb{N}$, is an injection for any $n$. As shown by, e.g., Cover and Thomas (2006), this property implies the source coding inequality
$$H^C_\mu(n) \ge H_\mu(n). \qquad (1)$$

The main result of this work is as follows.

Theorem 1. Let $C$ be a uniquely decodable code. Then
$$\operatorname{card} \left\{ \mu \in E : \limsup_{n \to \infty} \left[ H^C_\mu(n) - H_\mu(n) \right] \le K \right\} \le \exp(K). \qquad (2)$$

This theorem will be proved in the following section. As we have said, the proposition states that codes cannot be too good. Whereas there are uncountably many ergodic measures, the difference $H^C_\mu(n) - H_\mu(n)$ is bounded only
for countably many of them. Indeed, there exists a code $C$ satisfying the latter condition for measures in an arbitrary countable subset $A \subset E$. For instance, let $C$ be the Shannon-Fano code for the measure $P = \sum_{\mu \in A} c_\mu \mu$, where $\sum_{\mu \in A} c_\mu = 1$ and $c_\mu > 0$. Then
$$\limsup_{n \to \infty} \left[ H^C_\mu(n) - H_\mu(n) \right] \le -\log c_\mu + \log D_Y < \infty$$
for all $\mu \in A$. Here we show that this is not true if $A$ is uncountable.

Shields (1993) demonstrated, as another related result, that for any $\beta \in [0, 1)$ and any universal code there exists an ergodic measure $\mu$ such that
$$\sup_{n \in \mathbb{N}} \left[ H^C_\mu(n) - H_\mu(n) \right] / n^\beta = \infty. \qquad (3)$$
The code $C$ is called here universal if it is uniquely decodable and
$$\lim_{n \to \infty} \left[ H^C_\mu(n) - H_\mu(n) \right] / n = 0$$
holds for any stationary measure $\mu \in S$, cf. Cover and Thomas (2006). Thus Theorem 1 strengthens Shields' result for $\beta = 0$.

Next, we discuss the result mentioned in Dębowski (2009). Denote the mutual information between blocks of length $n$ as
$$E_\mu(n) := 2 H_\mu(n) - H_\mu(2n) = I_\mu(X_{1:n}; X_{n+1:2n}) := H_\mu(X_{1:n}) + H_\mu(X_{n+1:2n}) - H_\mu(X_{1:2n}),$$
and the expected excess length of the code as
$$E^C_\mu(n) := 2 H^C_\mu(n) - H^C_\mu(2n) = E_\mu \left( \left| C(X_{1:n}) \right| + \left| C(X_{n+1:2n}) \right| - \left| C(X_{1:2n}) \right| \right) \log D_Y.$$

There are a few universal codes for which excess lengths have a natural interpretation. Firstly, let $C(u)$ be the shortest program generating string $u$ on a prefix Turing machine. This code is universal, and thus $E^C_\mu(n)$ is the expectation of the algorithmic mutual information between blocks $X_{1:n}$ and $X_{n+1:2n}$, cf. Li and Vitányi (2008). While the shortest program for generating a string cannot be efficiently found, there also exist computable universal codes, such as the Lempel-Ziv code (Ziv and Lempel, 1977) and grammar-based codes (Kieffer and Yang, 2000; Dębowski, 2011). In particular, the excess length $E^C_\mu(n)$ of admissibly minimal grammar-based codes is bounded above by the number of distinct nonterminal symbols in the grammar used for compression (Dębowski, 2011).

We claim this proposition, announced without proof as Theorem 6 in Dębowski (2009).

Theorem 2. Let $C$ be a universal code.
We have
$$\operatorname{card} \left\{ \mu \in E : \limsup_{n \to \infty} \left[ E^C_\mu(n) - E_\mu(n) \right] \le K \right\} \le \exp(K).$$

The original statement in Dębowski (2009) was weaker, namely, the term $E_\mu(n)$ was missing. (Note that $E_\mu(n)$ is nonnegative.) To prove Theorem 2, we use a lemma which resembles Lemma 1 in Dębowski (2011).
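Before the lemma, a toy illustration of the quantity $E_\mu(n)$: for a stationary two-state Markov chain, $E_\mu(n) = I_\mu(X_{1:n}; X_{n+1:2n})$ reduces to $I_\mu(X_n; X_{n+1})$ by the Markov property and is therefore constant in $n$. A minimal Python sketch computing it by block enumeration (the transition matrix below is an arbitrary example):

```python
import math
from itertools import product

def markov_block_entropy(pi, P, n):
    """Block entropy H_mu(n) in nats of a stationary two-state Markov chain
    with stationary distribution pi and transition matrix P, by enumeration."""
    h = 0.0
    for block in product((0, 1), repeat=n):
        prob = pi[block[0]]
        for a, b in zip(block, block[1:]):
            prob *= P[a][b]
        h -= prob * math.log(prob)
    return h

pi = (0.5, 0.5)                   # stationary distribution of the symmetric chain
P = ((0.9, 0.1), (0.1, 0.9))      # arbitrary example transition matrix
# E_mu(n) = 2 H_mu(n) - H_mu(2n) = I(X_{1:n}; X_{n+1:2n}); constant in n here.
for n in (1, 2, 3):
    e_n = 2 * markov_block_entropy(pi, P, n) - markov_block_entropy(pi, P, 2 * n)
    print(n, e_n)
```

In particular, $E_\mu(n)$ stays bounded for such a chain, while for more complex ergodic processes it may grow without bound (sublinearly).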
Lemma 1 (Excess-bounding lemma II). Consider a function $G : \mathbb{N} \to \mathbb{R}$ such that $\lim_{k \to \infty} G(k)/k = 0$ and $G(n) \ge 0$ for all $n$. For any integer $A \ge 2$ and any real number $\beta \in [0, 1)$, the following statements are equivalent:

(i) $G(n) \le \gamma n^\beta$ holds for some $\gamma > 0$ and all but finitely many $n \in \mathbb{N}$;
(ii) $A G(n) - G(An) \le \delta n^\beta$ holds for some $\delta > 0$ and all but finitely many $n \in \mathbb{N}$.

The exact relationship between the constants $\gamma$ and $\delta$ is given in the proof.

Proof of Lemma 1: If (i) holds then (ii) holds with $\delta = A\gamma$ because $G(n) \ge 0$. Conversely, if (ii) is true then, for sufficiently large $n$, we obtain
$$G(n) = G(n) - n \lim_{k \to \infty} \frac{G(k)}{k} = \sum_{k=0}^{\infty} \frac{A G(A^k n) - G(A^{k+1} n)}{A^{k+1}} \le \sum_{k=0}^{\infty} \frac{\delta (A^k n)^\beta}{A^{k+1}} = \frac{\delta n^\beta}{A(1 - A^{\beta - 1})},$$
so (i) holds with $\gamma = \delta / [A(1 - A^{\beta - 1})]$.

Proof of Theorem 2: We apply Lemma 1 (ii) $\Rightarrow$ (i) with $G(n) = H^C_\mu(n) - H_\mu(n)$ and $A = 2$, noting that $2 G(n) - G(2n) = E^C_\mu(n) - E_\mu(n)$. Consider an ergodic measure $\mu$ such that $\limsup_{n \to \infty} [E^C_\mu(n) - E_\mu(n)] \le K$. This implies $\limsup_{n \to \infty} [H^C_\mu(n) - H_\mu(n)] \le K + \epsilon$ for any $\epsilon > 0$. Thus the claim follows from Theorem 1 and the arbitrariness of $\epsilon$.

Lemma 1 implies that the redundancy $H^C_\mu(n) - H_\mu(n)$ of universal codes is always bounded in a similar fashion to the excess redundancy $E^C_\mu(n) - E_\mu(n)$. In particular, the result (3) given by Shields is equivalent to saying that for any $\beta \in [0, 1)$ and any universal code there exists an ergodic measure such that $\sup_{n \in \mathbb{N}} [E^C_\mu(n) - E_\mu(n)] / n^\beta = \infty$. This remark concludes the Introduction. In the remaining part of the work we prove Theorem 1.

2 The proof of Theorem 1

Recall that $\mathcal{I}$ is the shift-invariant $\sigma$-algebra. Consider a stationary measure $P \in S$. According to the ergodic decomposition theorem (Kallenberg, 1997), if $\mathbb{X}$ is countable then there exists a random ergodic measure $F : (\mathbb{X}^{\mathbb{Z}}, \mathcal{I}) \to (E, \mathcal{E})$ such that
$$F(A) = P(A \mid \mathcal{I}) \qquad (4)$$
$P$-almost surely for all $A \in \mathcal{X}^{\mathbb{Z}}$. Because $E_P P(A \mid \mathcal{I}) = P(A)$, we have
$$H^C_P(n) = E_P H^C_F(n). \qquad (5)$$
In contrast, by Theorem 7 from Dębowski (2011), the analogous decomposition for the block entropy reads
$$H_P(n) = I_P(X_{1:n}; \mathcal{I}) + E_P H_F(n), \qquad (6)$$
where $I_P(\mathcal{A}; \mathcal{B})$ is the mutual information between $\sigma$-algebras defined by Pinsker (1964), Wyner (1978), or Dębowski (2009). By the source coding inequality (1) for $\mu = P$, formulas (5) and (6) imply
$$E_P \left[ H^C_F(n) - H_F(n) \right] \ge I_P(X_{1:n}; \mathcal{I}).$$
Because the $P$-completion of $\mathcal{I}$ is contained in the $P$-completion of the tail $\sigma$-algebra by Lemma 3 from Dębowski (2009), we also have $\lim_{n \to \infty} I_P(X_{1:n}; \mathcal{I}) = I_P(\mathcal{I}; \mathcal{I})$ by Theorems 1(iii)-(v) and 2(i) from Dębowski (2009). Hence
$$\limsup_{n \to \infty} E_P \left[ H^C_F(n) - H_F(n) \right] \ge I_P(\mathcal{I}; \mathcal{I}) = H_P(\mathcal{I}), \qquad (7)$$
where $H_P(\mathcal{A}) := \sup \sum_{i=1}^{I} \left[ -P(A_i) \log P(A_i) \right]$ is the entropy of a $\sigma$-algebra $\mathcal{A}$, with the supremum taken over all finite partitions $\{A_i\}_{i=1}^{I}$, $A_i \in \mathcal{A}$.

Write now
$$N(K) := \operatorname{card} \left\{ \mu \in E : \limsup_{n \to \infty} \left[ H^C_\mu(n) - H_\mu(n) \right] \le K \right\}.$$
Observe that if $N(K) = 0$ then (2) holds trivially. Thus it suffices to prove (2) for $N(K) \ge 1$. Consider a natural number $M \ge 1$ such that $M \le N(K)$. Let $A \subset E$ be a subset of $M$ distinct ergodic measures $\mu$ such that $\limsup_{n \to \infty} [H^C_\mu(n) - H_\mu(n)] \le K$. Put the measure $P = M^{-1} \sum_{\mu \in A} \mu$. By the uniqueness of its ergodic decomposition (Kallenberg, 1997, Theorem 9.12), we have $P(F = \mu) = 1/M$ for $\mu \in A$ and $P(F = \mu) = 0$ otherwise.

Let $\mathcal{F}$ be the smallest $\sigma$-algebra with respect to which $F$ is measurable. This $\sigma$-algebra is generated by the finite partition $\{(F = \mu)\}_{\mu \in A}$, and $H_P(\mathcal{F}) = \log M$. Because the $P$-completion of $\mathcal{F}$ equals the $P$-completion of $\mathcal{I}$ by Lemma 3 from Dębowski (2009), we also have $H_P(\mathcal{I}) = H_P(\mathcal{F})$ by Theorem 2(i) from Dębowski (2009).

Take an $\epsilon > 0$. The random variables $K + \epsilon - [H^C_F(n) - H_F(n)]$ are nonnegative $P$-almost surely for sufficiently large $n$ because $F$ assumes only finitely many values. Thus, by the Fatou lemma,
$$K + \epsilon \ge E_P \limsup_{n \to \infty} \left[ H^C_F(n) - H_F(n) \right] \ge \limsup_{n \to \infty} E_P \left[ H^C_F(n) - H_F(n) \right].$$
Hence from inequality (7) we obtain
$$\log M = H_P(\mathcal{I}) \le \limsup_{n \to \infty} E_P \left[ H^C_F(n) - H_F(n) \right] \le E_P \limsup_{n \to \infty} \left[ H^C_F(n) - H_F(n) \right] \le K + \epsilon.$$
Since $\epsilon > 0$ was arbitrary, $\log M \le K$, i.e., $M \le \exp(K)$. Since this holds for any $M \le N(K)$, inequality (2) follows.

Acknowledgments

I would like to thank Jan Mielniczuk and an anonymous referee for their comments.

References

Cover, T. M., Thomas, J. A., 2006. Elements of Information Theory, 2nd ed. John Wiley, New York.
Dębowski, Ł., 2009. A general definition of conditional information and its application to ergodic decomposition. Statist. Probab. Lett. 79.

Dębowski, Ł., 2011. On the vocabulary of grammar-based codes and the logical consistency of texts. IEEE Trans. Inform. Theor. 57.

Kallenberg, O., 1997. Foundations of Modern Probability. Springer, New York.

Kieffer, J. C., Yang, E., 2000. Grammar-based codes: A new class of universal lossless source codes. IEEE Trans. Inform. Theor. 46.

Li, M., Vitányi, P. M. B., 2008. An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed. Springer, New York.

Pinsker, M. S., 1964. Information and Information Stability of Random Variables and Processes. Holden-Day, San Francisco.

Shields, P. C., 1993. Universal redundancy rates don't exist. IEEE Trans. Inform. Theor. IT-39.

Wyner, A. D., 1978. A definition of conditional mutual information for arbitrary ensembles. Inform. Control 38.

Ziv, J., Lempel, A., 1977. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theor. 23.
ECE 587 / STA 563: Lecture 5 Lossless Compression Information Theory Duke University, Fall 2017 Author: Galen Reeves Last Modified: October 18, 2017 Outline of lecture: 5.1 Introduction to Lossless Source
More informationUndergraduate Notes in Mathematics. Arkansas Tech University Department of Mathematics
Undergraduate Notes in Mathematics Arkansas Tech University Department of Mathematics An Introductory Single Variable Real Analysis: A Learning Approach through Problem Solving Marcel B. Finan c All Rights
More informationGLUING LEMMAS AND SKOROHOD REPRESENTATIONS
GLUING LEMMAS AND SKOROHOD REPRESENTATIONS PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO Abstract. Let X, E), Y, F) and Z, G) be measurable spaces. Suppose we are given two probability measures γ and
More information2 Plain Kolmogorov Complexity
2 Plain Kolmogorov Complexity In this section, we introduce plain Kolmogorov Complexity, prove the invariance theorem - that is, the complexity of a string does not depend crucially on the particular model
More informationMATH 117 LECTURE NOTES
MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set
More informationOn minimal models of the Region Connection Calculus
Fundamenta Informaticae 69 (2006) 1 20 1 IOS Press On minimal models of the Region Connection Calculus Lirong Xia State Key Laboratory of Intelligent Technology and Systems Department of Computer Science
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More informationECE 587 / STA 563: Lecture 5 Lossless Compression
ECE 587 / STA 563: Lecture 5 Lossless Compression Information Theory Duke University, Fall 28 Author: Galen Reeves Last Modified: September 27, 28 Outline of lecture: 5. Introduction to Lossless Source
More informationKolmogorov complexity and its applications
CS860, Winter, 2010 Kolmogorov complexity and its applications Ming Li School of Computer Science University of Waterloo http://www.cs.uwaterloo.ca/~mli/cs860.html We live in an information society. Information
More informationCoding of memoryless sources 1/35
Coding of memoryless sources 1/35 Outline 1. Morse coding ; 2. Definitions : encoding, encoding efficiency ; 3. fixed length codes, encoding integers ; 4. prefix condition ; 5. Kraft and Mac Millan theorems
More informationFunctions of several variables of finite variation and their differentiability
ANNALES POLONICI MATHEMATICI LX.1 (1994) Functions of several variables of finite variation and their differentiability by Dariusz Idczak ( Lódź) Abstract. Some differentiability properties of functions
More informationarxiv: v1 [cs.it] 5 Sep 2008
1 arxiv:0809.1043v1 [cs.it] 5 Sep 2008 On Unique Decodability Marco Dalai, Riccardo Leonardi Abstract In this paper we propose a revisitation of the topic of unique decodability and of some fundamental
More informationMEASURE-THEORETIC ENTROPY
MEASURE-THEORETIC ENTROPY Abstract. We introduce measure-theoretic entropy 1. Some motivation for the formula and the logs We want to define a function I : [0, 1] R which measures how suprised we are or
More informationE-Companion to The Evolution of Beliefs over Signed Social Networks
OPERATIONS RESEARCH INFORMS E-Companion to The Evolution of Beliefs over Signed Social Networks Guodong Shi Research School of Engineering, CECS, The Australian National University, Canberra ACT 000, Australia
More informationMeasure and Integration: Concepts, Examples and Exercises. INDER K. RANA Indian Institute of Technology Bombay India
Measure and Integration: Concepts, Examples and Exercises INDER K. RANA Indian Institute of Technology Bombay India Department of Mathematics, Indian Institute of Technology, Bombay, Powai, Mumbai 400076,
More informationActa Universitatis Carolinae. Mathematica et Physica
Acta Universitatis Carolinae. Mathematica et Physica František Žák Representation form of de Finetti theorem and application to convexity Acta Universitatis Carolinae. Mathematica et Physica, Vol. 52 (2011),
More informationUsing Information Theory to Study Efficiency and Capacity of Computers and Similar Devices
Information 2010, 1, 3-12; doi:10.3390/info1010003 OPEN ACCESS information ISSN 2078-2489 www.mdpi.com/journal/information Article Using Information Theory to Study Efficiency and Capacity of Computers
More informationChapter 8. General Countably Additive Set Functions. 8.1 Hahn Decomposition Theorem
Chapter 8 General Countably dditive Set Functions In Theorem 5.2.2 the reader saw that if f : X R is integrable on the measure space (X,, µ) then we can define a countably additive set function ν on by
More information