A lower bound on compression of unknown alphabets

Theoretical Computer Science — www.elsevier.com/locate/tcs

A lower bound on compression of unknown alphabets

Nikola Jevtić (a), Alon Orlitsky (a,b), Narayana P. Santhanam (a)

(a) ECE Department, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, USA
(b) CSE Department, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, USA

Abstract

Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength $n$, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least $1.5 \log e \cdot n^{1/3}$ bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman's saddle-point approximation technique to determine the asymptotic behavior of the coefficients.

© 2004 Published by Elsevier B.V.

Keywords: Large and unknown alphabets; Patterns; Saddle point approximations; Hayman's theorem; Universal compression

1. Introduction

Many applications require compression of data generated by an unknown distribution. For example, while data often needs to be compressed to accommodate bandwidth constraints in wireless communications, its distribution is rarely known.

A typical approach to this problem assumes that the underlying distribution, though unknown, belongs to a known collection $\mathcal P$ of possible distributions, for example, the set of i.i.d. or Markov distributions. When the underlying distribution is known, sources can be compressed to essentially their entropy. Even with the distribution unknown, we attempt to compress the data with a universal code so that the number of bits used is not much larger than the entropy of the underlying distribution, no matter which one in $\mathcal P$ it may be. The minimum number of extra bits used by any universal code in the worst case is the redundancy, $\hat R(\mathcal P)$, of the collection $\mathcal P$ of distributions.

Let $\mathcal P$ be a collection of distributions over a set $X$. Shtarkov [26] showed that

$$\hat R(\mathcal P) = \log \sum_{x \in X} \max_{p \in \mathcal P} p(x),$$

where throughout the paper, logarithms are taken to base 2. For the collection $\mathcal I_m^n$ of i.i.d. distributions over length-$n$ strings from an alphabet of a fixed size $m$, a number of researchers have shown [5-7,18,24,27,29,35] that as $n$ increases,

$$\hat R(\mathcal I_m^n) = \frac{m-1}{2} \log \frac{n}{2\pi} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1), \qquad (1)$$

where $\Gamma$ is the gamma function, and the $o_m(1)$ term diminishes with increasing $n$ at a rate determined by $m$. This redundancy grows logarithmically with the blocklength $n$, hence as $n$ increases, the per-symbol redundancy $\hat R(\mathcal I_m^n)/n$ diminishes, implying that asymptotically, strings generated by an unknown distribution in $\mathcal I_m^n$ can be compressed essentially as well as when the underlying distribution is known.

However, in many applications, such as language modeling, text, and image compression, the alphabet size $m$ is large, often comparable to the blocklength, and the redundancy calculated above is high. In the limit, Kieffer [17] showed that universal compression of i.i.d. sequences over infinite alphabets requires infinite per-symbol redundancy, and specified the condition for a collection of distributions to have negligible per-symbol redundancy. Motivated by these applications and by Kieffer's result, several researchers have attempted to get around this negative result. One line of work, along the lines of [8,11,13,15,31], constructed compression algorithms for collections satisfying Kieffer's condition. For example, [8] considered the collection of i.i.d. distributions that assign non-increasing probabilities to positive integers. A second approach [1,16] does not restrict the collection of distributions, but separates the description of the sequence into two parts: a description of the symbols appearing in the string, and of the pattern they form.
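To make Shtarkov's sum concrete, here is a small brute-force computation (a Python sketch of ours, not part of the paper; the function name is our own). For the i.i.d. class, the distribution maximizing the probability of a given string is its empirical distribution, so $\max_p p(\bar x) = \prod_a (\mu_a/n)^{\mu_a}$, where $\mu_a$ is the number of occurrences of symbol $a$.

```python
import math
from itertools import product

def shtarkov_redundancy_iid(m, n):
    """Brute-force Shtarkov sum for the class I_m^n of i.i.d. distributions
    over an m-symbol alphabet: R(I_m^n) = log2 sum_x max_p p(x).
    For each string x, the maximizing i.i.d. p uses empirical frequencies,
    so max_p p(x) = prod_a (mu_a / n)^(mu_a)."""
    total = 0.0
    for x in product(range(m), repeat=n):
        counts = [x.count(a) for a in range(m)]
        total += math.prod((mu / n) ** mu for mu in counts if mu > 0)
    return math.log2(total)

# Grows like ((m-1)/2) log2 n for fixed m, in line with Eq. (1).
for n in (2, 4, 8):
    print(n, shtarkov_redundancy_iid(2, n))
```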

For example, the string abracadabra can be described by conveying the pattern

$$\Psi(\text{abracadabra}) = 12314151231$$

and the dictionary

Index:  1 2 3 4 5
Letter: a b r c d

Together, the pattern and the dictionary specify that the string abracadabra consists of the first letter to appear (a), followed by the second letter to appear (b), then by the third to appear (r), the first that appeared (a) again, the fourth (c), the first (a), etc.

In many applications [2-4,12,28,32,36], the description of patterns is more important than the dictionary. For example, in language modeling, the pattern reflects the structure of the language while the dictionary plays a less important part. Consequently, we concentrate on the redundancy of compressing the patterns.

Any distribution induces a distribution on patterns, assigning to a pattern $\psi$ the probability

$$p(\psi) \stackrel{\mathrm{def}}{=} p\{\bar x : \Psi(\bar x) = \psi\}$$

of all sequences whose pattern is $\psi$. Letting $\Psi^n$ denote the set of all length-$n$ patterns, Shtarkov's sum implies that the pattern redundancy of $\mathcal I^n$, i.e., the redundancy of the collection of distributions induced on patterns by $\mathcal I^n$, is

$$\hat R(\mathcal I_\Psi^n) = \log \sum_{\psi \in \Psi^n} \max_{p \in \mathcal I^n} p(\psi).$$

It has been shown [22] that, irrespective of the alphabet size, patterns of i.i.d.-distributed strings can be compressed with redundancy of at most

$$\hat R(\mathcal I_\Psi^n) \le \pi \sqrt{2n/3}\, \log e \qquad (2)$$

bits. Hence as the blocklength $n$ grows, the redundancy of patterns increases sublinearly with $n$, and the per-symbol redundancy diminishes to zero, even for infinite alphabets.

In this paper we improve on a lower bound on $\hat R(\mathcal I_\Psi^n)$ presented in [22]. To do so, we lower bound the highest probability of a pattern $\psi$ by the highest probability of any single i.i.d. string whose pattern is $\psi$. We obtain

$$\hat R(\mathcal I_\Psi^n) \ \ge\ \hat R'(\mathcal I_\Psi^n) \stackrel{\mathrm{def}}{=} \log \sum_{\psi \in \Psi^n} \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu(\psi)},$$

where $\varphi_\mu(\psi)$ is the number of symbols appearing $\mu$ times in $\psi$. $\hat R'(\mathcal I_\Psi^n)$ is of mathematical interest in its own right, and its simple formulation allows for a precise evaluation of its growth order. In Theorem 10, we use Hayman's saddle point analysis on its generating function to show that

$$\hat R'(\mathcal I_\Psi^n) = \frac{3}{2} \log e \cdot n^{1/3} - \frac{1}{3} \log n - \frac{2}{3} \log e - \frac{1}{2} \log 3 + o(1). \qquad (3)$$

This bound is related to the bound for fixed $m$,

$$\hat R(\mathcal I_m^n) \ \ge\ \frac{m-1}{2} \log \frac{n}{2\pi} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1), \qquad (4)$$

presented in the Appendix of [27]. However, it is not clear whether the latter bound holds when $m$ grows with $n$ (see the discussion in Section 5 of [29]). If Bound (4) held for $m$ growing as $n^{1/3}$, then it could be applied to obtain a lower bound on $\hat R(\mathcal I_\Psi^n)$, as described in [22]. However, this approach of [22] would yield only the matching leading coefficient of Bound (3), and it can be shown that if additional coefficients were calculated, they would not exceed those in Bound (3). We note that recently Shamir [25] showed that a weaker form of the lower bound in Corollary 11 applies to average-case redundancy.

An interesting property of patterns arises also in connection with Good-Turing estimators. Applications considered here, like language modeling, text compression, etc., typically involve distributions over a large alphabet, with most letters in the alphabet having insignificant probabilities. The maximum-likelihood estimates of the distribution from a given data sample are known to be unreliable for these applications, and of the several alternatives proposed, the Good-Turing estimate and its modifications are known to perform well. In contrast to the maximum-likelihood estimate, which explains the count of each symbol, these estimates look at the number of symbols appearing once, twice, and so on. It can be shown, e.g., [22], that the i.i.d. distribution assigning the highest probability to a pattern is the best i.i.d. distribution to explain this statistic. For a detailed discussion along this angle, see [20,21].

2. Patterns and their redundancy

We formally define patterns and discuss their compression. Let $A$ be any alphabet. For $\bar x = x_1 \cdots x_n \stackrel{\mathrm{def}}{=} x_1^n \in A^n$,

$$A(\bar x) \stackrel{\mathrm{def}}{=} \{x_1, \ldots, x_n\}$$

is the set of symbols appearing in $\bar x$. The index of $x \in A(\bar x)$ is

$$\iota(x) \stackrel{\mathrm{def}}{=} \min\{|A(x_1^i)| : 1 \le i \le n \text{ and } x_i = x\},$$

one more than the number of distinct symbols preceding $x$'s first appearance in $\bar x$. The pattern of $\bar x$ is the concatenation

$$\Psi(\bar x) \stackrel{\mathrm{def}}{=} \iota(x_1)\, \iota(x_2) \cdots \iota(x_n)$$

of all indices. For example, if $\bar x$ = abracadabra, then $\iota(a) = 1$, $\iota(b) = 2$, $\iota(r) = 3$, $\iota(c) = 4$, and $\iota(d) = 5$, hence

$$\Psi(\text{abracadabra}) = 12314151231.$$
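The pattern map is easy to compute; the following is a minimal Python sketch (ours, not from the paper, with the hypothetical helper name pattern).

```python
def pattern(x):
    """Return the pattern Psi(x): each symbol is replaced by the order
    of its first appearance (1 for the first distinct symbol, etc.)."""
    index = {}
    out = []
    for symbol in x:
        if symbol not in index:
            index[symbol] = len(index) + 1   # iota(symbol)
        out.append(index[symbol])
    return out

print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```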

Let

$$\Psi(A^n) \stackrel{\mathrm{def}}{=} \{\Psi(\bar x) : \bar x \in A^n\}$$

denote the set of patterns of all strings in $A^n$. For example, if $A$ consists of two elements, then $\Psi(A^1) = \{1\}$, $\Psi(A^2) = \{11, 12\}$, $\Psi(A^3) = \{111, 112, 121, 122\}$, etc. Let

$$\Psi^n \stackrel{\mathrm{def}}{=} \bigcup_A \Psi(A^n)$$

be the set of all length-$n$ patterns, and let

$$\Psi \stackrel{\mathrm{def}}{=} \bigcup_{n=0}^\infty \Psi^n$$

be the set of all patterns. For example, $\Psi^0 = \{\lambda\}$, $\Psi^1 = \{1\}$, $\Psi^2 = \{11, 12\}$, $\Psi^3 = \{111, 112, 121, 122, 123\}$, where $\lambda$ is the empty string, and so on. It is easy to see that a string $\psi$ is a pattern if and only if the first occurrence of any $i \in \mathbb Z^+$ in $\psi$ precedes that of $i+1$. For example, 1, 12, and 123 are patterns, while 2 and 13 are not.

Every probability distribution $p$ over $A^*$, the collection of all strings of symbols from $A$, induces a distribution $p_\Psi$ over patterns on $\Psi$, where

$$p_\Psi(\psi) \stackrel{\mathrm{def}}{=} p\{\bar x \in A^* : \Psi(\bar x) = \psi\}$$

is the probability that a string generated according to $p$ has pattern $\psi$. When pattern probabilities $p_\Psi(\psi)$ are evaluated, the subscript $\Psi$ can be inferred, and is hence omitted. For example, let $p$ be a uniform distribution over $\{a, b\}^2$. Then $p$ induces on $\Psi^2$ the distribution

$$p(11) = p\{aa, bb\} = \tfrac12, \qquad p(12) = p\{ab, ba\} = \tfrac12.$$

For a collection $\mathcal P$ of distributions over $A^*$, let

$$\mathcal P_\Psi \stackrel{\mathrm{def}}{=} \{p_\Psi : p \in \mathcal P\}$$

denote the collection of distributions over $\Psi$ induced by probability distributions in $\mathcal P$. The pattern redundancy of $\mathcal P$, namely the worst-case redundancy of universally coding patterns generated according to an unknown distribution in $\mathcal P_\Psi$, is

$$\hat R(\mathcal P_\Psi) = \inf_q \sup_{p \in \mathcal P_\Psi} \sup_{\psi \in \Psi} \log \frac{p(\psi)}{q(\psi)},$$

where $q$ is any distribution on $\Psi$. Clearly, the pattern redundancy of any $\mathcal P$ is non-negative. We will be mainly interested in the pattern redundancy of the collection $\mathcal I^n$ of all i.i.d. distributions; that is, we compare any distribution $q$'s probabilities to the maximum i.i.d. probabilities of patterns,

$$\hat p(\psi) \stackrel{\mathrm{def}}{=} \sup_{p \in \mathcal I_\Psi^n} p(\psi).$$
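The induced pattern distribution can be tabulated by enumeration for tiny cases; the sketch below (ours, reusing pattern() from the earlier sketch) reproduces the uniform-over-$\{a,b\}$ example.

```python
from itertools import product
from collections import defaultdict
from fractions import Fraction

def induced_pattern_distribution(probs, n):
    """Distribution p_Psi induced on length-n patterns by an i.i.d. source
    with symbol probabilities `probs` (dict: symbol -> probability)."""
    dist = defaultdict(Fraction)
    for x in product(probs, repeat=n):
        p = Fraction(1)
        for s in x:
            p *= probs[s]
        dist[tuple(pattern(x))] += p        # pattern() from the sketch above
    return dict(dist)

half = Fraction(1, 2)
print(induced_pattern_distribution({"a": half, "b": half}, 2))
# {(1, 1): Fraction(1, 2), (1, 2): Fraction(1, 2)} -- matches the example
```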

We illustrate the computation of maximum probabilities for a few simple patterns. Observe that since any distribution $p$ has $p(1) = 1$, we have $\hat p(1) = 1$. Since any distribution $p$ concentrated on a single element has $p(11 \cdots 1) = 1$ for any number of 1s, we obtain $\hat p(11 \cdots 1) = 1$, and, since any continuous distribution $p$ has $p(12 \cdots n) = 1$, we derive $\hat p(12 \cdots n) = 1$.

In general it is difficult to determine the maximum probability of a pattern. For example, some work [3] is needed to show that $\hat p(112) = \tfrac14$. Since it is difficult to obtain the maximum probability of patterns, it is difficult to compute the pattern redundancy of $\mathcal I^n$ exactly. In [22], an upper bound was obtained for the redundancy of patterns, showing that the per-symbol pattern redundancy of $\mathcal I^n$ diminishes to zero with increasing blocklengths. However, we prove here that the pattern redundancy of $\mathcal I^n$ grows at least as fast as $n^{1/3}$.

3. The generating function

As mentioned earlier, it is difficult to obtain the maximum probability of patterns. Instead, we lower bound these probabilities of patterns, and use Shtarkov's sum to derive a lower bound on redundancy. Let

$$\Psi_p(\psi) = \{\bar x \in A^n : \Psi(\bar x) = \psi \text{ and } p(\bar x) > 0\}$$

be the support of a pattern $\psi$ with respect to a distribution $p$. For every $\psi \in \Psi^n$,

$$\sup_{p \in \mathcal I_\Psi^n} p(\psi) = \sup_{p \in \mathcal I^n} p(\Psi_p(\psi)) \ \ge\ \max_{p \in \mathcal I^n} \max_{\bar x \in \Psi_p(\psi)} p(\bar x).$$

Let the number of symbols occurring $\mu$ times in $\psi$ be $\varphi_\mu$. Standard maximum-likelihood arguments imply that

$$\max_{p \in \mathcal I^n} \max_{\bar x \in \Psi_p(\psi)} p(\bar x) = \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu},$$

hence

$$\sup_{p \in \mathcal I_\Psi^n} p(\psi) \ \ge\ \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu}. \qquad (5)$$

Let $\Phi_n = \{(\varphi_1, \ldots, \varphi_n) : \varphi_i \ge 0,\ \sum_{\mu=1}^n \mu \varphi_\mu = n\}$, and

$$\Psi_\varphi = \{\psi : \varphi_\mu \text{ symbols appear } \mu \text{ times in pattern } \psi\}.$$
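The number of patterns sharing a profile $\varphi$ equals $n!/\prod_\mu (\mu!)^{\varphi_\mu} \varphi_\mu!$, the count used in the next section's sum; a quick enumeration (our sketch, reusing pattern() from the earlier sketch) confirms this for $n = 4$.

```python
import math
from collections import Counter
from itertools import product

n = 4
# All length-n patterns arise from strings over an n-letter alphabet.
patterns_n = {tuple(pattern(x)) for x in product(range(n), repeat=n)}

by_profile = Counter()
for psi in patterns_n:
    # phi_mu: number of index symbols appearing mu times in psi
    profile = tuple(sorted(Counter(Counter(psi).values()).items()))
    by_profile[profile] += 1

for profile, count in sorted(by_profile.items()):
    formula = math.factorial(n)
    for mu, phi_mu in profile:
        formula //= math.factorial(mu) ** phi_mu * math.factorial(phi_mu)
    print(profile, count, formula)    # the two counts agree
```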

Incorporating (5) into Shtarkov's sum, we obtain

$$\hat R(\mathcal I_\Psi^n) = \log \sum_{\varphi \in \Phi_n} \sum_{\psi \in \Psi_\varphi} \sup_{p \in \mathcal I_\Psi^n} p(\psi) \ \ge\ \log \sum_{\varphi \in \Phi_n} \sum_{\psi \in \Psi_\varphi} \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu} = \log \sum_{\varphi \in \Phi_n} \frac{n!}{\prod_{\mu=1}^n (\mu!)^{\varphi_\mu}\, \varphi_\mu!} \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu} \stackrel{\mathrm{def}}{=} \log g(n). \qquad (6)$$

Direct computation of $g(n)$ appears to be difficult. Instead, we evaluate a generating function of $g(n)$,

$$G(z) \stackrel{\mathrm{def}}{=} \sum_{n=0}^\infty g(n) \frac{n^n}{n!} z^n, \qquad (7)$$

from which the asymptotics of $g(n)$ can be obtained using Hayman's analysis [14]. To express the generating function $G(z)$ in a more explicit form, observe that

$$G(z) = \sum_{n=0}^\infty \sum_{(\varphi_1, \ldots, \varphi_n) \in \Phi_n} \prod_\mu \frac{(\mu^\mu z^\mu / \mu!)^{\varphi_\mu}}{\varphi_\mu!} = \prod_\mu \sum_{\varphi_\mu \ge 0} \frac{(\mu^\mu z^\mu / \mu!)^{\varphi_\mu}}{\varphi_\mu!},$$

thus yielding

$$G(z) = \exp\left(\sum_{k=1}^\infty \frac{k^k z^k}{k!}\right). \qquad (8)$$
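Both Eq. (6) and Eqs. (7)-(8) can be verified numerically for small $n$ (a Python sketch of ours; partitions(), g() and coeffs_G() are our own helper names). The profiles in $\Phi_n$ are exactly the integer partitions of $n$, and the Taylor coefficients of $\exp(S)$ follow from the standard recurrence $n f_n = \sum_k k\, s_k f_{n-k}$.

```python
from fractions import Fraction
import math

def partitions(n, max_part=None):
    """Yield multiplicity profiles phi (dict mu -> phi_mu) of the integer
    partitions of n, i.e. the elements of Phi_n."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in partitions(n - part, part):
            phi = dict(rest)
            phi[part] = phi.get(part, 0) + 1
            yield phi

def g(n):
    """The sum inside the log of Eq. (6)."""
    total = 0.0
    for phi in partitions(n):
        count = math.factorial(n)      # |Psi_phi| = n!/prod(mu!^phi_mu phi_mu!)
        prob = 1.0
        for mu, phi_mu in phi.items():
            count //= math.factorial(mu) ** phi_mu * math.factorial(phi_mu)
            prob *= (mu / n) ** (mu * phi_mu)
        total += count * prob
    return total

def coeffs_G(N):
    """First N+1 Taylor coefficients of G(z) = exp(sum_k k^k z^k/k!) (Eq. (8)),
    via the recurrence n f_n = sum_{k=1}^n k s_k f_{n-k} for F = exp(S)."""
    s = [Fraction(0)] + [Fraction(k ** k, math.factorial(k)) for k in range(1, N + 1)]
    f = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        f[n] = sum(k * s[k] * f[n - k] for k in range(1, n + 1)) / n
    return f

# Eq. (7): the n-th coefficient of G should equal g(n) n^n / n!.
f = coeffs_G(8)
for n in range(1, 9):
    print(n, float(f[n]), g(n) * n ** n / math.factorial(n))
```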

4. Hayman's analysis

In the last section, we lower bounded $\hat R(\mathcal I_\Psi^n)$ in terms of the coefficients of a generating function $G(z)$. Hayman [14] developed a technique to compute the asymptotics of the coefficients of power series that satisfy certain properties, which, as shown later, $G(z)$ also satisfies. In this section we describe Hayman's analysis. We follow the terminology used in [30].

Theorem 1 (Hayman). For

$$f(z) = \sum_{n=0}^\infty a_n z^n,$$

let

$$a(z) \stackrel{\mathrm{def}}{=} \frac{d \log f(z)}{d \log z} \qquad \text{and} \qquad b(z) \stackrel{\mathrm{def}}{=} \frac{d^2 \log f(z)}{d (\log z)^2} = z\, a'(z), \qquad (9)$$

and let the saddle point $r_n$ be the solution of $a(r_n) = n$. If for some real $R$, the following three conditions hold:

Nonnegativity: there exists $R_0 < R$ such that for $R_0 < x < R$, $f(x) > 0$;

Fast growth: as $x \nearrow R$, namely, as $x$ approaches $R$ from below, $b(x) \to \infty$;

Basic split: there exists $\theta_x > 0$, called the basic split, such that

(Local approximation) for $|\theta| \le \theta_x$, uniformly in $\theta$ as $x \nearrow R$,
$$f(xe^{i\theta}) \sim f(x) \exp\left(i\, a(x)\, \theta - \frac{\theta^2}{2}\, b(x)\right);$$

(Fast taper) for $\theta_x < |\theta| < \pi$, uniformly in $\theta$ as $x \nearrow R$,
$$f(xe^{i\theta}) = o\left(\frac{f(x)}{\sqrt{b(x)}}\right);$$

then

$$a_n \sim \frac{r_n^{-n}\, f(r_n)}{\sqrt{2\pi\, b(r_n)}}.$$

Hayman's analysis can also be viewed as a special case of the class of saddle point approximations. It exploits the fact that for functions satisfying the conditions of Theorem 1, the value of Cauchy's integral $\frac{1}{2\pi i}\oint_C f(z)/z^{n+1}\, dz$ around a contour $C$ through the saddle point $r_n$ is captured by a short arc around $r_n$. For more details on the saddle point approximation and related results, see [10,14,30].

For the generating function $G$ defined in Eq. (8), the functions $a(z)$ and $b(z)$ of Eq. (9) are

$$a(z) = \sum_{k=1}^\infty \frac{k^{k+1}}{k!} z^k \qquad \text{and} \qquad b(z) = \sum_{k=1}^\infty \frac{k^{k+2}}{k!} z^k. \qquad (10)$$

We pick $R = 1/e$. The first two conditions are clearly satisfied for $G(z)$. For $\theta_x = (1 - ex)^{6/5}$, we show in Theorem 6 that the local approximation for $G$ holds, and in Theorem 7 that $G(z)$ does drop rapidly for $|\theta| \ge \theta_x$.
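Theorem 1's recipe can be exercised numerically for $G$. The sketch below (ours; log_deriv_sum and saddle_point are our own helper names) evaluates the series in Eq. (10) and solves $a(r_n) = n$ by bisection, anticipating the closed form for $r_n$ derived in Section 9.

```python
import math

def log_deriv_sum(x, l, terms=3000):
    """sum_k k^(k+l) x^k / k!, the series behind Eq. (10), summed with
    logarithms to avoid overflow; valid for 0 < x < 1/e."""
    return sum(math.exp((k + l) * math.log(k) + k * math.log(x)
                        - math.lgamma(k + 1)) for k in range(1, terms))

def saddle_point(n):
    """Solve a(r_n) = n by bisection on (0, 1/e)."""
    lo, hi = 1e-9, 1 / math.e - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        if log_deriv_sum(mid, 1, terms=20000) < n:
            lo = mid
        else:
            hi = mid
    return lo

for n in (10, 100, 1000):
    r = saddle_point(n)
    print(n, r, (1 - 0.5 * n ** (-2 / 3)) / math.e)  # vs. asymptotic r_n
```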

5. Preliminaries

We outline some results that will be extensively used in this paper. Observe that we can expand $G(xe^{i\theta})$ in $\theta$ as

$$G(xe^{i\theta}) = G(x) \exp\left(\sum_{l=1}^\infty \frac{(i\theta)^l}{l!} \left.\frac{d^l \log G(z)}{d(\log z)^l}\right|_{z=x}\right) = G(x) \exp\left(\sum_{l=1}^\infty \frac{(i\theta)^l}{l!} \sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!}\right).$$

We first check for convergence of each of the summations over $k$.

Lemma 2. For any $l$, $\sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!}$ converges for $x < 1/e$.

Proof. By the Cauchy ratio test, e.g., [33].

Therefore, in order to evaluate the $n$th coefficient in the Taylor series, Hayman's theorem approximates the value of $G(z)$ in the complex integration over the circle $|z| = x$ by a correction over the value $G(x)$ for points on the circle near the positive real line, and by a term much smaller than $G(x)$ for points on the circle away from the positive real line. Intuitively speaking, it follows that at the basic split, the contribution of the higher-order terms is negligible and the contribution of the second coefficient is large enough to satisfy fast taper. We choose a basic split based on these criteria, and then prove that our choice indeed works.

We also use Feller's bounds [9] on Stirling's approximation,

$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{1/(12n+1)} \ \le\ n! \ \le\ \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{1/(12n)} \qquad \text{for all } n,$$

extensively in the paper. Further, we shall denote by $C$ positive constants that are, in particular, independent of $x$, $\theta$ and $l$.

6. Locating the basic split

We locate the basic split for

$$G(xe^{i\theta}) = G(x) \exp\left(\sum_{l=1}^\infty \frac{(i\theta)^l}{l!} \sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!}\right).$$

To do so, we estimate the magnitude of the coefficients of $\theta^l$, and ensure that at our choice of $\theta_x$, the second term is unbounded, and the contribution of any term beyond the second is negligible. In Theorems 6 and 7, we show that this choice works. We upper bound the magnitude of the coefficients of $\theta^l$ as follows.

Lemma 3. For integers $l \ge 1$ and $x < 1/e$,

$$\sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!} \ \le\ \sqrt{\frac{(2l)!}{2\pi\, 2^l\, (1 - ex)^{2l+1}}}.$$

Proof. From Feller's bounds,

$$\sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!} \ \le\ \frac{1}{\sqrt{2\pi}} \sum_{k=1}^\infty k^{l - 1/2} (xe)^k.$$

Squaring the right-side sum,

$$\left(\sum_k k^{l-1/2} (xe)^k\right)^2 = \sum_k (xe)^k \sum_{m=1}^{k-1} \big(m(k-m)\big)^{l-1/2} \ \le\ \sum_k (xe)^k\, k \left(\frac{k^2}{4}\right)^{l-1/2} = \frac{2}{4^l} \sum_k k^{2l} (xe)^k \ \le\ \frac{2}{4^l}\, \frac{(2l)!}{(1-ex)^{2l+1}} \ \le\ \frac{(2l)!}{2^l (1-ex)^{2l+1}},$$

using $m(k-m) \le k^2/4$ and $\sum_k k^{2l} y^k \le (2l)! \sum_k \binom{k+2l}{2l} y^k = \frac{(2l)!}{(1-y)^{2l+1}}$. Taking the positive square root proves the lemma.

We lower bound the magnitude of the coefficient of $\theta^2$ as follows.

Lemma 4. For $\frac{5}{6e} < x < \frac{1}{e}$,

$$\sum_{k=1}^\infty \frac{k^{k+2} x^k}{k!} \ \ge\ \frac{C}{(1-ex)^{5/2}}.$$

Proof. From Feller's bounds,

$$\sum_{k=1}^\infty \frac{k^{k+2} x^k}{k!} \ \ge\ C \sum_{k=1}^\infty k^{3/2} (xe)^k.$$

Squaring the right side,

$$\left(\sum_k k^{3/2} (xe)^k\right)^2 = \sum_k (xe)^k \sum_{m=1}^{k-1} \big(m(k-m)\big)^{3/2} \ \ge\ \sum_k (xe)^k \sum_{m = k/4}^{3k/4} \big(m(k-m)\big)^{3/2} \ \ge\ C \sum_k k^4 (xe)^k \ \ge\ C\, 4! \sum_{k \ge 4} \binom{k}{4} (xe)^k = \frac{C\, 4!\, (xe)^4}{(1 - xe)^5} \ \ge\ \frac{C}{(1-ex)^5}.$$

In the last step we observed that $5/6 < xe < 1$, and thus included $(xe)^4$ in the constant. Taking the positive square root proves the lemma.
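Lemma 4 (together with Lemma 3 at $l = 2$) pins the growth of $b(x)$ at the order $(1-ex)^{-5/2}$. Numerically (our sketch, reusing log_deriv_sum() from the earlier sketch), $(1-ex)^{5/2}\, b(x)$ indeed settles near a constant, $3/2^{5/2} \approx 0.53$, which also follows from the tree-function form of $b$ derived in Section 9.

```python
import math  # log_deriv_sum() as defined in the sketch after Theorem 1

# Lemma 4: sum_k k^(k+2) x^k / k! grows like (1 - ex)^(-5/2) as x -> 1/e,
# so (1 - ex)^(5/2) times the sum should approach a constant.
for u in (0.1, 0.03, 0.01, 0.003):     # u = 1 - ex
    x = (1 - u) / math.e
    print(u, u ** 2.5 * log_deriv_sum(x, 2, terms=20000))
```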

The following lemma locates the basic split.

Lemma 5. There exists $\theta_x$ so that

$$\lim_{x \nearrow 1/e} \theta_x^2 \sum_{k=1}^\infty \frac{k^{k+2} x^k}{k!} = \infty,$$

and simultaneously, for every $l \ge 3$,

$$\lim_{x \nearrow 1/e} \theta_x^l \sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!} = 0.$$

Proof. Take $\theta_x = (1 - ex)^\alpha$ with $7/6 < \alpha < 5/4$. From Lemma 4,

$$\lim_{x \nearrow 1/e} \theta_x^2 \sum_k \frac{k^{k+2} x^k}{k!} \ \ge\ \lim_{x \nearrow 1/e} \frac{C (1-ex)^{2\alpha}}{(1-ex)^{5/2}} = \infty$$

because $\alpha < 5/4$, and from Lemma 3, for every $l \ge 3$,

$$\lim_{x \nearrow 1/e} \theta_x^l \sum_k \frac{k^{k+l} x^k}{k!} \ \le\ \lim_{x \nearrow 1/e}\ (1-ex)^{\alpha l} \sqrt{\frac{(2l)!}{2\pi\, 2^l (1-ex)^{2l+1}}} = 0$$

because $\alpha > \max_{l \ge 3} \left(1 + \frac{1}{2l}\right) = \frac{7}{6}$. Therefore all $\theta_x = (1-ex)^\alpha$ with $7/6 < \alpha < 5/4$ satisfy the lemma. In particular we will use $\theta_x = (1-ex)^{6/5}$.

7. Local approximation

We show that all points on the circle $|z| = x$ with argument $|\theta| \le \theta_x = (1-ex)^{6/5}$ can be approximated by a small correction over the value on the positive real line.

Theorem 6. Let $\theta_x = (1-ex)^{6/5}$. Uniformly in $\theta$, for $0 \le |\theta| \le \theta_x$,

$$G(xe^{i\theta}) \sim G(x) \exp\left(i\theta\, a(x) - \frac{\theta^2}{2}\, b(x)\right),$$

i.e., for $0 \le |\theta| \le \theta_x$ and every $\varepsilon > 0$, there exists $\delta(\varepsilon)$ such that if $0 < 1/e - x < \delta$,

$$\left|\frac{G(xe^{i\theta})}{G(x) \exp\left(i\theta\, a(x) - \frac{\theta^2}{2}\, b(x)\right)} - 1\right| < \varepsilon.$$

Proof. Observe that

$$G(xe^{i\theta}) = \exp\left(\sum_k \frac{k^k x^k e^{ik\theta}}{k!}\right) = \exp\left(\sum_k \frac{k^k x^k}{k!} \sum_{l=0}^\infty \frac{(ik\theta)^l}{l!}\right) = \exp\left(\sum_{l=0}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!}\right).$$

The rearrangement can be done for all $x < 1/e$, as the original series is absolutely convergent for $x < 1/e$. Split the term in the exponent as

$$\sum_{l=0}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!} = \sum_k \frac{k^k x^k}{k!} + i\theta \sum_k \frac{k^{k+1} x^k}{k!} - \frac{\theta^2}{2} \sum_k \frac{k^{k+2} x^k}{k!} + \sum_{l=3}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!} = \log G(x) + i\theta\, a(x) - \frac{\theta^2}{2}\, b(x) + \sum_{l=3}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!}.$$

Observing that if $|t| \le \varepsilon$ then $|e^t - 1| \le e^{|t|} - 1 \le e^\varepsilon - 1$, an equivalent statement of the local approximation is that for $|\theta| \le \theta_x$, given $\varepsilon > 0$, there exists $\delta(\varepsilon)$ such that for $1/e - x < \delta$,

$$\left|\sum_{l=3}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!}\right| < \varepsilon.$$

To bound the above expression, note that

$$\left|\sum_{l=3}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!}\right| \overset{(a)}{\le} \sum_{l=3}^\infty \frac{\theta_x^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!} \overset{(b)}{\le} \sum_{l=3}^\infty \frac{\theta_x^l}{l!} \sqrt{\frac{(2l)!}{2\pi\, 2^l (1-ex)^{2l+1}}} = \frac{(1-ex)^{1/10}}{\sqrt{2\pi}} \sum_{m=0}^\infty (1-ex)^{m/5}\, \frac{1}{(m+3)!} \sqrt{\frac{(2m+6)!}{2^{m+3}}},$$

where (a) is the triangle inequality and (b) follows from Lemma 3. The series multiplying $(1-ex)^{1/10}$ converges for all $x$ sufficiently close to $1/e$; to see this, use Cauchy's root test [34]. Since the right side can therefore be made smaller than $\varepsilon$ by taking $x$ close enough to $1/e$, the theorem follows.
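The proof's remainder term decays only like $(1-ex)^{1/10}$, which can be observed numerically. The sketch below (ours; log_G is our own helper, and log_deriv_sum comes from the earlier sketch) compares $\log G(xe^{i\theta_x})$ with the quadratic approximation of the theorem.

```python
import cmath, math

def log_G(z, terms):
    """log G(z) = sum_k k^k z^k / k! for complex |z| < 1/e (Eq. (8))."""
    return sum(cmath.exp(k * cmath.log(z) + k * math.log(k)
                         - math.lgamma(k + 1)) for k in range(1, terms))

for u in (0.03, 0.01, 0.003):        # u = 1 - ex
    x = (1 - u) / math.e
    terms = int(200 / u)
    a = log_deriv_sum(x, 1, terms)   # a(x), from the sketch after Theorem 1
    b = log_deriv_sum(x, 2, terms)   # b(x)
    theta = u ** 1.2                 # theta_x = (1 - ex)^(6/5)
    remainder = log_G(x * cmath.exp(1j * theta), terms) \
        - (log_G(x, terms) + 1j * theta * a - theta ** 2 * b / 2)
    print(u, abs(remainder))   # shrinks slowly, like (1-ex)^(1/10), cf. the proof
```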

8. Fast taper

We prove that our choice $\theta_x = (1-ex)^{6/5}$ from Lemma 5 is indeed a basic split.

Theorem 7. Let $\theta_x = (1-ex)^{6/5}$. Uniformly in $\theta$ as $x \nearrow 1/e$,

$$G(xe^{i\theta}) = o\left(\frac{G(x)}{\sqrt{b(x)}}\right) \qquad \text{for all } \theta \text{ with } \theta_x \le |\theta| \le \pi,$$

i.e., for $\theta_x \le |\theta| \le \pi$ and every $\varepsilon > 0$, there exists $\delta(\varepsilon)$ such that if $1/e - x < \delta$,

$$\frac{b(x)\, |G(xe^{i\theta})|^2}{G(x)^2} = \left(\sum_k \frac{k^{k+2} x^k}{k!}\right) \exp\left(-4 \sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2}\right) < \varepsilon.$$

Proof. We upper bound $b(x)$ using Lemma 3. We bound the exponential factor separately in the regions $(1-ex)^{6/5} \le |\theta| \le (1-ex)^{1/8}$ and $(1-ex)^{1/8} \le |\theta| \le \pi$. The bound for the second region will apply uniformly in any range lower bounded by $(1-ex)^\beta$ with $\beta < 1/4$; in particular, we choose $\beta = 1/8$.

We first consider the second region. Let $\theta_1 = (1-ex)^{1/8}$. In the sum $\sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2}$, reject all terms for which $k\theta \bmod 2\pi$ lies within $\theta_1/4$ of $0$ or of $2\pi$. Since $\theta_1 \le |\theta| \le \pi$, the sequence $\theta, 2\theta, \ldots, k\theta, \ldots$ never has two consecutive terms rejected; consequently, among any $M$ consecutive terms, after this rejection process at least $M/2$ terms remain. Lower bounding all remaining $\sin^2 \frac{k\theta}{2}$ by $\sin^2 \frac{\theta_1}{8}$ allows us to factor the $\sin^2 \frac{\theta_1}{8}$ term out of the summation; call the sum of the remaining terms the residual summation. The terms $\frac{k^k x^k}{k!}$ decrease monotonically with $k$ for $x < 1/e$, so the residual summation is lower bounded, using Lemma 8, by

$$\sin^2 \frac{\theta_1}{8} \sum_{\substack{k \ge 2 \\ k \text{ even}}} \frac{k^k x^k}{k!} \ \ge\ \sin^2 \frac{\theta_1}{8} \cdot \frac{C}{\sqrt{1-ex}} \ \ge\ \frac{C\, \theta_1^2}{\sqrt{1-ex}} = C\, (1-ex)^{-1/4}.$$

Define $v = (1-ex)^{-1/4}$. Combining all that has been proved so far with the bound $b(x) \le C(1-ex)^{-5/2} = C v^{10}$ from Lemma 3,

$$\left(\sum_k \frac{k^{k+2} x^k}{k!}\right) \exp\left(-4 \sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2}\right) \ \le\ C v^{10} e^{-cv},$$

which can be made smaller than any $\varepsilon > 0$, for all $|\theta| \ge (1-ex)^{1/8}$, by choosing $1/e - x < \delta_1(\varepsilon)$; note that $\sin^2(\theta_1/8) \ge C \theta_1^2$ holds uniformly for any $\beta \le 1/8$.

To tackle the remaining region, i.e., $(1-ex)^{6/5} \le |\theta| \le (1-ex)^{1/8}$, we use the inequality

$$\sin \frac{k\theta}{2} \ \ge\ \frac{k\theta}{\pi} \qquad \text{for } 0 \le k\theta \le \pi.$$

In this region, there are $\lfloor \pi/\theta \rfloor$ terms for which the inequality holds with both sides being positive. We write $|\theta| = (1-ex)^\alpha$, so that $1/8 \le \alpha \le 6/5$.

Squaring and substituting the above inequality into the exponent,

$$\sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2} \ \ge\ \frac{\theta^2}{\pi^2} \sum_{k \le (1-ex)^{-\alpha}} \frac{k^{k+2} x^k}{k!} \ \ge\ \frac{C\, (1-ex)^{2\alpha}}{\pi^2\, (1-ex)^{5\min(\alpha,1)/2}},$$

where we lower bounded the partial sum using Lemma 9 (note $\pi/\theta \ge (1-ex)^{-\alpha}$). The exponent $2\alpha - \frac52\min(\alpha,1)$ equals $-\alpha/2 \le -1/16$ for $\alpha \le 1$, and $2\alpha - \frac52 \le -\frac1{10}$ for $1 < \alpha \le \frac65$; in either case it is at most $-1/16$. Define $v = (1-ex)^{-1/16}$, so the exponential factor is at most $e^{-cv}$ while $b(x) \le C(1-ex)^{-5/2} = C v^{40}$. We conclude

$$\left(\sum_k \frac{k^{k+2} x^k}{k!}\right) \exp\left(-4 \sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2}\right) \ \le\ C v^{40} e^{-cv},$$

which, for $\theta_x \le |\theta| \le (1-ex)^{1/8}$, can be made smaller than $\varepsilon > 0$ by taking $1/e - x < \delta_2(\varepsilon)$, uniformly in $\theta$. Picking $\delta = \min(\delta_1, \delta_2)$ concludes the proof for all $(1-ex)^{6/5} \le |\theta| \le \pi$.

We now prove Lemmas 8 and 9, which were used in Theorem 7.

Lemma 8. For $\frac{5}{6e} < x < \frac{1}{e}$,

$$\sum_{\substack{k \ge 2 \\ k \text{ even}}} \frac{k^k x^k}{k!} \ \ge\ \frac{C}{\sqrt{1-ex}}.$$

Proof. From Feller's bounds,

$$\sum_{\substack{k \text{ even}}} \frac{k^k x^k}{k!} \ \ge\ C \sum_{\substack{k \ge 4 \\ k \text{ even}}} \frac{(ex)^k}{\sqrt k}.$$

To lower bound the sum on the right, observe that

$$\left(\sum_{\substack{k \ge 4 \\ k \text{ even}}} \frac{(ex)^k}{\sqrt k}\right)^2 = \sum_{\substack{k \ge 8 \\ k \text{ even}}} (ex)^k \sum_{\substack{4 \le l \le k-4 \\ l \text{ even}}} \frac{1}{\sqrt{l(k-l)}} \ \ge\ C \sum_{\substack{k \ge 8 \\ k \text{ even}}} (ex)^k = \frac{C\, (ex)^8}{1 - (ex)^2} \ \ge\ \frac{C}{1-ex},$$

since each inner sum has order $k$ terms, each at least $2/k$, and is therefore bounded below by a constant. By observing that $5/6 < ex < 1$, we incorporated $(ex)^8$ and $1 + ex$ into the constant. Taking the positive square root proves the lemma.

Lemma 9. For $x > \frac{5}{6e}$ and $\alpha \le 1$,

$$\sum_{k \le (1-ex)^{-\alpha}} \frac{k^{k+2} x^k}{k!} \ \ge\ \frac{C}{(1-ex)^{5\alpha/2}},$$

while for $\alpha > 1$ the sum is at least $C (1-ex)^{-5/2}$.

Proof. For any $m$, from Feller's bounds,

$$\sum_{k=1}^m \frac{k^{k+2} x^k}{k!} \ \ge\ C \sum_{k=1}^m k^{3/2} (xe)^k.$$

We first show that if $k < \frac{3 e x}{2(1-ex)}$, the $k$th term is less than the $(k+1)$th term in the right-side summation. To see that, observe that the ratio of the $(k+1)$th to the $k$th term is $\left(1 + \frac1k\right)^{3/2} xe$, and that

$$\left(1 + \frac1k\right)^{3/2} \ \ge\ 1 + \frac{3}{2k},$$

where the right side exceeds $1/(xe)$ precisely when $k < \frac{3ex}{2(1-ex)}$. Since $ex > 5/6$, we have $\frac{3ex}{2(1-ex)} \ge \frac{1}{1-ex} \ge (1-ex)^{-\alpha}$ for $\alpha \le 1$, so the terms before $k = (1-ex)^{-\alpha}$ in the summation are nondecreasing.

For $\alpha \le 1$, since the terms are nondecreasing throughout the range of summation, retaining only the upper half of the range and replacing each of its terms by the first retained term gives

$$\sum_{k \le (1-ex)^{-\alpha}} k^{3/2} (xe)^k \ \overset{(a)}{\ge}\ \frac{(1-ex)^{-\alpha}}{2} \left(\frac{(1-ex)^{-\alpha}}{2}\right)^{3/2} (xe)^{(1-ex)^{-\alpha}} \ \overset{(b)}{\ge}\ \frac{C}{(1-ex)^{5\alpha/2}},$$

where (a) follows by replacing all terms of the retained summation with its first (smallest) term, and (b) because for all $5/6 < y < 1$ and $\alpha \le 1$, $y^{(1-y)^{-\alpha}} \ge y^{(1-y)^{-1}} \ge C$.

We complete the proof for $\alpha > 1$ by using the lemma for $\alpha = 1$, which we just proved. For $\alpha > 1$, observe that $(1-ex)^{-\alpha} > (1-ex)^{-1}$, so

$$\sum_{k \le (1-ex)^{-\alpha}} \frac{k^{k+2} x^k}{k!} \ \ge\ \sum_{k \le (1-ex)^{-1}} \frac{k^{k+2} x^k}{k!} \ \ge\ \frac{C}{(1-ex)^{5/2}}.$$

9. Evaluation of coefficients

Using Hayman's analysis, we evaluate the lower bound on $\hat R'(\mathcal I_\Psi^n)$, namely the logarithm of $\frac{n!}{n^n}$ times the $n$th coefficient of the expansion of $G(z)$.

Theorem 10.

$$\hat R'(\mathcal I_\Psi^n) = \frac{3}{2} \log e \cdot n^{1/3} - \frac{1}{3} \log n - \frac{2}{3} \log e - \frac{1}{2} \log 3 + o(1).$$

Proof. From (6)-(8), we have that

$$\sum_{n=0}^\infty 2^{\hat R'(\mathcal I_\Psi^n)}\, \frac{n^n}{n!}\, z^n = G(z) = \exp\left(\sum_{k=1}^\infty \frac{k^k z^k}{k!}\right).$$

From the observations following (10) and from Theorems 6 and 7, we conclude that $G(z)$ satisfies the conditions of Theorem 1. To use it, we need to evaluate the function $a(z)$, shown in (10) to be $\sum_k \frac{k^{k+1}}{k!} z^k$. We do so using the related tree function [30]

$$T(z) = \sum_{k=1}^\infty \frac{k^{k-1}}{k!} z^k,$$

which satisfies [30] the equation

$$T(z) = z\, e^{T(z)}. \qquad (13)$$

Therefore,

$$\sum_{k=1}^\infty \frac{k^k}{k!} z^k = \frac{T(z)}{1 - T(z)}. \qquad (14)$$

By differentiating Eqs. (13) and (14) and using the absolute convergence of the series, we obtain

$$a(z) = \frac{T(z)}{(1 - T(z))^3} \qquad \text{and} \qquad b(z) = \frac{T(z)\,(1 + 2\,T(z))}{(1 - T(z))^5}. \qquad (15)$$

At $z = 1/e$, we have the following singular expansion [30]:

$$T(z) = 1 - \sqrt{2(1 - ez)} + \frac{2}{3}(1 - ez) + O\left((1 - ez)^{3/2}\right).$$

Consequently, it can be verified that

$$a(z) = \frac{1}{(2(1 - ez))^{3/2}} + O\left(\frac{1}{1 - ez}\right),$$

and the solution to $a(r_n) = n$ is

$$r_n = \frac{1}{e}\left(1 - \frac{1}{2 n^{2/3}} + O\left(\frac{1}{n^{4/3}}\right)\right).$$
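Since $T(z) = -W(-z)$ for the principal branch of the Lambert $W$ function, Eqs. (13)-(15) and the singular expansion can be checked numerically. The sketch below is ours (it assumes SciPy is available, and reuses log_deriv_sum() from the earlier sketch).

```python
import math
from scipy.special import lambertw   # assumes SciPy is available

def T(z):
    """Tree function: T(z) = -W(-z), W the principal Lambert branch."""
    return -lambertw(-z).real

z = 0.3
t = T(z)
print(t / (1 - t) ** 3, log_deriv_sum(z, 1))                 # a(z): Eq. (15) vs Eq. (10)
print(t * (1 + 2 * t) / (1 - t) ** 5, log_deriv_sum(z, 2))   # b(z), both ways

# Singular expansion of T near z = 1/e:
for u in (0.01, 0.001):              # u = 1 - ez
    z = (1 - u) / math.e
    print(T(z), 1 - math.sqrt(2 * u) + 2 * u / 3)
```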

The $n$th coefficient of $G(z)$ therefore equals

$$\frac{r_n^{-n}\, G(r_n)}{\sqrt{2\pi\, b(r_n)}}\, (1 + o(1)).$$

We evaluate the terms to be

$$G(r_n) = \exp\left(n^{1/3} - \frac{2}{3} + O(n^{-1/3})\right), \qquad r_n^{-n} = \exp\left(n + \frac{n^{1/3}}{2} + O(n^{-1/3})\right),$$

and

$$b(r_n) = 3\, n^{5/3}\left(1 + O(n^{-1/3})\right),$$

and use them, together with $\frac{n!}{n^n} = \sqrt{2\pi n}\, e^{-n} (1 + o(1))$, to evaluate $\hat R'(\mathcal I_\Psi^n)$:

$$\hat R'(\mathcal I_\Psi^n) = \frac{3}{2} \log e \cdot n^{1/3} - \frac{1}{3} \log n - \frac{2}{3} \log e - \frac{1}{2} \log 3 + o(1).$$

We note that this is the highest accuracy of the asymptotic expansion allowed by Hayman's theorem, limited by the form of Eq. (15).
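For moderate $n$, the exact lower bound $\log g(n)$ can be compared with Theorem 10's formula. The sketch below is ours (reusing g() from the Section 3 sketch); the gap between the two printed values is the slowly vanishing $o(1)$ term.

```python
import math  # g() as defined in the sketch after Eq. (8)

def R_prime_asymptotic(n):
    """The right side of Theorem 10, in bits, without the o(1) term."""
    le = math.log2(math.e)
    return (1.5 * le * n ** (1 / 3) - math.log2(n) / 3
            - 2 * le / 3 - math.log2(3) / 2)

# Exact lower bound log2 g(n) versus the asymptotic formula; they should
# differ by o(1), though convergence at such small n is slow.
for n in (10, 20, 40):
    print(n, math.log2(g(n)), R_prime_asymptotic(n))
```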

Corollary 11.

$$\hat R(\mathcal I_\Psi^n) \ \ge\ \frac{3}{2} \log e \cdot n^{1/3} - \frac{1}{3} \log n - \frac{2}{3} \log e - \frac{1}{2} \log 3 + o(1).$$

Acknowledgements

We thank Wojciech Szpankowski for sharing his intuition behind saddle point approximations.

References

[1] J. Åberg, Y.M. Shtarkov, B.J.M. Smeets, Multialphabet coding with separate alphabet description, in: Proc. Compression and Complexity of Sequences, 1997.
[2] N. Cesa-Bianchi, G. Lugosi, Minimax regret under log loss for general classes of experts, in: Proc. Twelfth Ann. Conf. on Computational Learning Theory, 1999.
[3] S.F. Chen, J. Goodman, An empirical study of smoothing techniques for language modeling, in: Proc. Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, 1996.
[4] K.W. Church, W.A. Gale, Probability scoring for spelling correction, Statist. and Comput. (1991).
[5] T.M. Cover, Universal portfolios, Math. Finance (1991).
[6] T.M. Cover, E. Ordentlich, Universal portfolios with side information, IEEE Trans. Inform. Theory (1996).
[7] M. Drmota, W. Szpankowski, The precise minimax redundancy, in: Proc. IEEE Symp. Inform. Theory, 2002.
[8] P. Elias, Universal codeword sets and representations of the integers, IEEE Trans. Inform. Theory (1975).
[9] W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, 1968.
[10] P. Flajolet, R. Sedgewick, Average case analysis of algorithms: saddle point asymptotics, Technical Report 2376, INRIA, 1994.
[11] D.P. Foster, R.A. Stine, A.J. Wyner, Universal codes for finite sequences of integers drawn from a monotone distribution, IEEE Trans. Inform. Theory (2002).
[12] W.A. Gale, K.W. Church, D. Yarowsky, A method for disambiguating word senses, Computers and the Humanities (1992).
[13] L. Györfi, I. Pali, E.C. van der Meulen, On universal noiseless source coding for infinite source alphabets, European Trans. Telecommunications and Related Technol. (1993).
[14] W.K. Hayman, A generalization of Stirling's formula, Journal für die reine und angewandte Mathematik (1956).
[15] D. He, E. Yang, On the universality of grammar-based codes for sources with countably infinite alphabets, in: Proc. IEEE Symp. Inform. Theory, 2003.
[16] N. Jevtić, A. Orlitsky, N.P. Santhanam, Universal compression of unknown alphabets, in: Proc. IEEE Symp. Inform. Theory, 2002.
[17] J.C. Kieffer, A unified approach to weak universal source coding, IEEE Trans. Inform. Theory (1978).
[18] R.E. Krichevsky, V.K. Trofimov, The performance of universal coding, IEEE Trans. Inform. Theory (1981).
[19] A. Orlitsky, N.P. Santhanam, Speaking of infinity, IEEE Trans. Inform. Theory (2004).
[20] A. Orlitsky, N.P. Santhanam, J. Zhang, Always Good Turing: asymptotically optimal probability estimation, in: Proc. 44th Ann. Symp. on Foundations of Computer Science, October 2003.
[21] A. Orlitsky, N.P. Santhanam, J. Zhang, Always Good Turing: asymptotically optimal probability estimation, Science (2003).
[22] A. Orlitsky, N.P. Santhanam, J. Zhang, Universal compression of memoryless sources over unknown alphabets, IEEE Trans. Inform. Theory (2004).
[23] A. Orlitsky, K. Viswanathan, One-way communication and error-correcting codes, IEEE Trans. Inform. Theory (2003).
[24] J. Rissanen, Fisher information and stochastic complexity, IEEE Trans. Inform. Theory (1996).
[25] G. Shamir, Universal lossless compression with unknown alphabets — the average case, IEEE Trans. Inform. Theory, 2003, submitted for publication.
[26] Y.M. Shtarkov, Universal sequential coding of single messages, Problems of Inform. Transmission (1987).
[27] Y.M. Shtarkov, T.J. Tjalkens, F.M.J. Willems, Multialphabet universal coding of memoryless sources, Problems of Inform. Transmission (1995).
[28] F. Song, W.B. Croft, A general language model for information retrieval (poster abstract), in: Research and Development in Information Retrieval, ACM Press, New York, 1999.
[29] W. Szpankowski, On asymptotics of certain recurrences arising in universal coding, Problems of Inform. Transmission (1998).
[30] W. Szpankowski, Average Case Analysis of Algorithms on Sequences, Wiley, New York, 2001.
[31] T. Uyematsu, F. Kanaya, Asymptotic optimality of two variations of Lempel-Ziv codes for sources with countably infinite alphabet, in: Proc. IEEE Symp. Inform. Theory, 2002.
[32] V.G. Vovk, A game of prediction with expert advice, J. Comput. and System Sci. (1998).
[33] E.W. Weisstein, Ratio Test, from MathWorld — A Wolfram Web Resource, http://mathworld.wolfram.com/RatioTest.html.
[34] E.W. Weisstein, Root Test, from MathWorld — A Wolfram Web Resource, http://mathworld.wolfram.com/RootTest.html.
[35] Q. Xie, A.R. Barron, Asymptotic minimax regret for data compression, gambling and prediction, IEEE Trans. Inform. Theory (2000).
[36] K. Yamanishi, A decision-theoretic extension of stochastic complexity and its application to learning, IEEE Trans. Inform. Theory (1998).
