A lower bound on compression of unknown alphabets

Theoretical Computer Science — www.elsevier.com/locate/tcs

A lower bound on compression of unknown alphabets

Nikola Jevtić (a), Alon Orlitsky (a,b), Narayana P. Santhanam (a)

(a) ECE Department, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, USA
(b) CSE Department, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, USA

Abstract

Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength $n$, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least $1.5 \log e \cdot n^{1/3}$ bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman's saddle-point approximation technique to determine the asymptotic behavior of the coefficients.

© 2004 Published by Elsevier B.V.

Keywords: Large and unknown alphabets; Patterns; Saddle point approximations; Hayman's theorem; Universal compression

1. Introduction

Many applications require compression of data generated by an unknown distribution. For example, while data often needs to be compressed to accommodate bandwidth constraints in wireless communications, its distribution is rarely known.

A typical approach to this problem assumes that the underlying distribution, though unknown, belongs to a known collection $\mathcal P$ of possible distributions, for example, the set of i.i.d. or Markov distributions. When the underlying distribution is known, sources can be compressed to essentially their entropy. Even with the distribution unknown, we attempt to compress the data with a universal code so that the number of bits used is not much larger than the entropy of the underlying distribution, no matter which one in $\mathcal P$ it may be. The minimum number of extra bits used by any universal code in the worst case is the redundancy, $\hat R(\mathcal P)$, of the collection $\mathcal P$ of distributions.

Let $\mathcal P$ be a collection of distributions over a set $X$. Shtarkov [26] showed that

$$\hat R(\mathcal P) = \log \sum_{x \in X} \max_{p \in \mathcal P} p(x),$$

where throughout the paper, logarithms are taken to base 2. For the collection $\mathcal I_m^n$ of i.i.d. distributions over length-$n$ strings from an alphabet of a fixed size $m$, a number of researchers have shown [5-7,18,24,27,29,35] that as $n$ increases,

$$\hat R(\mathcal I_m^n) = \frac{m-1}{2} \log \frac{n}{2\pi} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1), \qquad (1)$$

where $\Gamma$ is the gamma function, and the $o_m(1)$ term diminishes with increasing $n$ at a rate determined by $m$. This redundancy grows logarithmically with the blocklength $n$, hence as $n$ increases, the per-symbol redundancy $\hat R(\mathcal I_m^n)/n$ diminishes, implying that asymptotically, strings generated by an unknown distribution in $\mathcal I_m^n$ can be compressed essentially as well as when the underlying distribution is known.

However, in many applications, such as language modeling, text, and image compression, the alphabet size $m$ is large, often comparable to the blocklength, and the redundancy calculated above is high. In the limit, Kieffer [17] showed that universal compression of i.i.d. sequences over infinite alphabets requires infinite per-symbol redundancy, and specified the condition for a collection of distributions to have negligible per-symbol redundancy. Motivated by these applications and by Kieffer's result, several researchers have attempted to get around this negative result. One line of work, along the lines of [8,11,13,15,31], constructed compression algorithms for collections satisfying Kieffer's condition. For example, [8] considered the collection of i.i.d. distributions that assign non-increasing probabilities to positive integers. A second approach [1,16] does not restrict the collection of distributions, but separates the description of the sequence into two parts: a description of the symbols appearing in the string, and of the pattern they form.
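To make Shtarkov's sum concrete, here is a small brute-force computation (a Python sketch of ours, not part of the paper; the function name is our own). For the i.i.d. class, the distribution maximizing the probability of a given string is its empirical distribution, so $\max_p p(\bar x) = \prod_a (\mu_a/n)^{\mu_a}$, where $\mu_a$ is the number of occurrences of symbol $a$.

```python
import math
from itertools import product

def shtarkov_redundancy_iid(m, n):
    """Brute-force Shtarkov sum for the class I_m^n of i.i.d. distributions
    over an m-symbol alphabet: R(I_m^n) = log2 sum_x max_p p(x).
    For each string x, the maximizing i.i.d. p uses empirical frequencies,
    so max_p p(x) = prod_a (mu_a / n)^(mu_a)."""
    total = 0.0
    for x in product(range(m), repeat=n):
        counts = [x.count(a) for a in range(m)]
        total += math.prod((mu / n) ** mu for mu in counts if mu > 0)
    return math.log2(total)

# Grows like ((m-1)/2) log2 n for fixed m, in line with Eq. (1).
for n in (2, 4, 8):
    print(n, shtarkov_redundancy_iid(2, n))
```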

For example, the string abracadabra can be described by conveying the pattern

$$\Psi(\text{abracadabra}) = 12314151231$$

and the dictionary

Index:  1 2 3 4 5
Letter: a b r c d

Together, the pattern and the dictionary specify that the string abracadabra consists of the first letter to appear (a), followed by the second letter to appear (b), then by the third to appear (r), the first that appeared (a) again, the fourth (c), the first (a), etc.

In many applications [2-4,12,28,32,36], the description of patterns is more important than the dictionary. For example, in language modeling, the pattern reflects the structure of the language while the dictionary plays a less important part. Consequently, we concentrate on the redundancy of compressing the patterns.

Any distribution induces a distribution on patterns, assigning to a pattern $\psi$ the probability

$$p(\psi) \stackrel{\mathrm{def}}{=} p\{\bar x : \Psi(\bar x) = \psi\}$$

of all sequences whose pattern is $\psi$. Letting $\Psi^n$ denote the set of all length-$n$ patterns, Shtarkov's sum implies that the pattern redundancy of $\mathcal I^n$, i.e., the redundancy of the collection of distributions induced on patterns by $\mathcal I^n$, is

$$\hat R(\mathcal I_\Psi^n) = \log \sum_{\psi \in \Psi^n} \max_{p \in \mathcal I^n} p(\psi).$$

It has been shown [22] that, irrespective of the alphabet size, patterns of i.i.d.-distributed strings can be compressed with redundancy of at most

$$\hat R(\mathcal I_\Psi^n) \le \pi \sqrt{2n/3}\, \log e \qquad (2)$$

bits. Hence as the blocklength $n$ grows, the redundancy of patterns increases sublinearly with $n$, and the per-symbol redundancy diminishes to zero, even for infinite alphabets.

In this paper we improve on a lower bound on $\hat R(\mathcal I_\Psi^n)$ presented in [22]. To do so, we lower bound the highest probability of a pattern $\psi$ by the highest probability of any single i.i.d. string whose pattern is $\psi$. We obtain

$$\hat R(\mathcal I_\Psi^n) \ \ge\ \hat R'(\mathcal I_\Psi^n) \stackrel{\mathrm{def}}{=} \log \sum_{\psi \in \Psi^n} \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu(\psi)},$$

where $\varphi_\mu(\psi)$ is the number of symbols appearing $\mu$ times in $\psi$. $\hat R'(\mathcal I_\Psi^n)$ is of mathematical interest in its own right, and its simple formulation allows for a precise evaluation of its growth order. In Theorem 10, we use Hayman's saddle point analysis on its generating function to show that

$$\hat R'(\mathcal I_\Psi^n) = \frac{3}{2} \log e \cdot n^{1/3} - \frac{1}{3} \log n - \frac{2}{3} \log e - \frac{1}{2} \log 3 + o(1). \qquad (3)$$

This bound is related to the bound for fixed $m$,

$$\hat R(\mathcal I_m^n) \ \ge\ \frac{m-1}{2} \log \frac{n}{2\pi} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1), \qquad (4)$$

presented in the Appendix of [27]. However, it is not clear whether the latter bound holds when $m$ grows with $n$ (see the discussion in Section 5 of [29]). If Bound (4) held for $m$ growing as $n^{1/3}$, then it could be applied to obtain a lower bound on $\hat R(\mathcal I_\Psi^n)$, as described in [22]. However, this approach of [22] would yield only the matching leading coefficient of Bound (3), and it can be shown that if additional coefficients were calculated, they would not exceed those in Bound (3). We note that recently Shamir [25] showed that a weaker form of the lower bound in Corollary 11 applies to average-case redundancy.

An interesting property of patterns arises also in connection with Good-Turing estimators. Applications considered here, like language modeling, text compression, etc., typically involve distributions over a large alphabet, with most letters in the alphabet having insignificant probabilities. The maximum-likelihood estimates of the distribution from a given data sample are known to be unreliable for these applications, and of the several alternatives proposed, the Good-Turing estimate and its modifications are known to perform well. In contrast to the maximum-likelihood estimate, which explains the count of each symbol, these estimates look at the number of symbols appearing once, twice, and so on. It can be shown, e.g., [22], that the i.i.d. distribution assigning the highest probability to a pattern is the best i.i.d. distribution to explain this statistic. For a detailed discussion along this angle, see [20,21].

2. Patterns and their redundancy

We formally define patterns and discuss their compression. Let $A$ be any alphabet. For $\bar x = x_1 \cdots x_n \stackrel{\mathrm{def}}{=} x_1^n \in A^n$,

$$A(\bar x) \stackrel{\mathrm{def}}{=} \{x_1, \ldots, x_n\}$$

is the set of symbols appearing in $\bar x$. The index of $x \in A(\bar x)$ is

$$\iota(x) \stackrel{\mathrm{def}}{=} \min\{|A(x_1^i)| : 1 \le i \le n \text{ and } x_i = x\},$$

one more than the number of distinct symbols preceding $x$'s first appearance in $\bar x$. The pattern of $\bar x$ is the concatenation

$$\Psi(\bar x) \stackrel{\mathrm{def}}{=} \iota(x_1)\, \iota(x_2) \cdots \iota(x_n)$$

of all indices. For example, if $\bar x$ = abracadabra, then $\iota(a) = 1$, $\iota(b) = 2$, $\iota(r) = 3$, $\iota(c) = 4$, and $\iota(d) = 5$, hence

$$\Psi(\text{abracadabra}) = 12314151231.$$
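The pattern map is easy to compute; the following is a minimal Python sketch (ours, not from the paper, with the hypothetical helper name pattern).

```python
def pattern(x):
    """Return the pattern Psi(x): each symbol is replaced by the order
    of its first appearance (1 for the first distinct symbol, etc.)."""
    index = {}
    out = []
    for symbol in x:
        if symbol not in index:
            index[symbol] = len(index) + 1   # iota(symbol)
        out.append(index[symbol])
    return out

print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```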

Let

$$\Psi(A^n) \stackrel{\mathrm{def}}{=} \{\Psi(\bar x) : \bar x \in A^n\}$$

denote the set of patterns of all strings in $A^n$. For example, if $A$ consists of two elements, then $\Psi(A^1) = \{1\}$, $\Psi(A^2) = \{11, 12\}$, $\Psi(A^3) = \{111, 112, 121, 122\}$, etc. Let

$$\Psi^n \stackrel{\mathrm{def}}{=} \bigcup_A \Psi(A^n)$$

be the set of all length-$n$ patterns, and let

$$\Psi \stackrel{\mathrm{def}}{=} \bigcup_{n=0}^\infty \Psi^n$$

be the set of all patterns. For example, $\Psi^0 = \{\lambda\}$, $\Psi^1 = \{1\}$, $\Psi^2 = \{11, 12\}$, $\Psi^3 = \{111, 112, 121, 122, 123\}$, where $\lambda$ is the empty string, and so on. It is easy to see that a string $\psi$ is a pattern if and only if the first occurrence of any $i \in \mathbb Z^+$ in $\psi$ precedes that of $i+1$. For example, 1, 12, and 123 are patterns, while 2 and 13 are not.

Every probability distribution $p$ over $A^*$, the collection of all strings of symbols from $A$, induces a distribution $p_\Psi$ over patterns on $\Psi$, where

$$p_\Psi(\psi) \stackrel{\mathrm{def}}{=} p\{\bar x \in A^* : \Psi(\bar x) = \psi\}$$

is the probability that a string generated according to $p$ has pattern $\psi$. When pattern probabilities $p_\Psi(\psi)$ are evaluated, the subscript $\Psi$ can be inferred, and is hence omitted. For example, let $p$ be a uniform distribution over $\{a, b\}^2$. Then $p$ induces on $\Psi^2$ the distribution

$$p(11) = p\{aa, bb\} = \tfrac12, \qquad p(12) = p\{ab, ba\} = \tfrac12.$$

For a collection $\mathcal P$ of distributions over $A^*$, let

$$\mathcal P_\Psi \stackrel{\mathrm{def}}{=} \{p_\Psi : p \in \mathcal P\}$$

denote the collection of distributions over $\Psi$ induced by probability distributions in $\mathcal P$. The pattern redundancy of $\mathcal P$, namely the worst-case redundancy of universally coding patterns generated according to an unknown distribution in $\mathcal P_\Psi$, is

$$\hat R(\mathcal P_\Psi) = \inf_q \sup_{p \in \mathcal P_\Psi} \sup_{\psi \in \Psi} \log \frac{p(\psi)}{q(\psi)},$$

where $q$ is any distribution on $\Psi$. Clearly, the pattern redundancy of any $\mathcal P$ is non-negative. We will be mainly interested in the pattern redundancy of the collection $\mathcal I^n$ of all i.i.d. distributions; that is, we compare any distribution $q$'s probabilities to the maximum i.i.d. probabilities of patterns,

$$\hat p(\psi) \stackrel{\mathrm{def}}{=} \sup_{p \in \mathcal I_\Psi^n} p(\psi).$$
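The induced pattern distribution can be tabulated by enumeration for tiny cases; the sketch below (ours, reusing pattern() from the earlier sketch) reproduces the uniform-over-$\{a,b\}$ example.

```python
from itertools import product
from collections import defaultdict
from fractions import Fraction

def induced_pattern_distribution(probs, n):
    """Distribution p_Psi induced on length-n patterns by an i.i.d. source
    with symbol probabilities `probs` (dict: symbol -> probability)."""
    dist = defaultdict(Fraction)
    for x in product(probs, repeat=n):
        p = Fraction(1)
        for s in x:
            p *= probs[s]
        dist[tuple(pattern(x))] += p        # pattern() from the sketch above
    return dict(dist)

half = Fraction(1, 2)
print(induced_pattern_distribution({"a": half, "b": half}, 2))
# {(1, 1): Fraction(1, 2), (1, 2): Fraction(1, 2)} -- matches the example
```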

We illustrate the computation of maximum probabilities for a few simple patterns. Observe that since any distribution $p$ has $p(1) = 1$, we have $\hat p(1) = 1$. Since any distribution $p$ concentrated on a single element has $p(11 \cdots 1) = 1$ for any number of 1s, we obtain $\hat p(11 \cdots 1) = 1$, and, since any continuous distribution $p$ has $p(12 \cdots n) = 1$, we derive $\hat p(12 \cdots n) = 1$.

In general it is difficult to determine the maximum probability of a pattern. For example, some work [3] is needed to show that $\hat p(112) = \tfrac14$. Since it is difficult to obtain the maximum probability of patterns, it is difficult to compute the pattern redundancy of $\mathcal I^n$ exactly. In [22], an upper bound was obtained for the redundancy of patterns, showing that the per-symbol pattern redundancy of $\mathcal I^n$ diminishes to zero with increasing blocklengths. However, we prove here that the pattern redundancy of $\mathcal I^n$ grows at least as fast as $n^{1/3}$.

3. The generating function

As mentioned earlier, it is difficult to obtain the maximum probability of patterns. Instead, we lower bound these probabilities of patterns, and use Shtarkov's sum to derive a lower bound on redundancy. Let

$$\Psi_p(\psi) = \{\bar x \in A^n : \Psi(\bar x) = \psi \text{ and } p(\bar x) > 0\}$$

be the support of a pattern $\psi$ with respect to a distribution $p$. For every $\psi \in \Psi^n$,

$$\sup_{p \in \mathcal I_\Psi^n} p(\psi) = \sup_{p \in \mathcal I^n} p(\Psi_p(\psi)) \ \ge\ \max_{p \in \mathcal I^n} \max_{\bar x \in \Psi_p(\psi)} p(\bar x).$$

Let the number of symbols occurring $\mu$ times in $\psi$ be $\varphi_\mu$. Standard maximum-likelihood arguments imply that

$$\max_{p \in \mathcal I^n} \max_{\bar x \in \Psi_p(\psi)} p(\bar x) = \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu},$$

hence

$$\sup_{p \in \mathcal I_\Psi^n} p(\psi) \ \ge\ \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu}. \qquad (5)$$

Let $\Phi_n = \{(\varphi_1, \ldots, \varphi_n) : \varphi_i \ge 0,\ \sum_{\mu=1}^n \mu \varphi_\mu = n\}$, and

$$\Psi_\varphi = \{\psi : \varphi_\mu \text{ symbols appear } \mu \text{ times in pattern } \psi\}.$$
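The number of patterns sharing a profile $\varphi$ equals $n!/\prod_\mu (\mu!)^{\varphi_\mu} \varphi_\mu!$, the count used in the next section's sum; a quick enumeration (our sketch, reusing pattern() from the earlier sketch) confirms this for $n = 4$.

```python
import math
from collections import Counter
from itertools import product

n = 4
# All length-n patterns arise from strings over an n-letter alphabet.
patterns_n = {tuple(pattern(x)) for x in product(range(n), repeat=n)}

by_profile = Counter()
for psi in patterns_n:
    # phi_mu: number of index symbols appearing mu times in psi
    profile = tuple(sorted(Counter(Counter(psi).values()).items()))
    by_profile[profile] += 1

for profile, count in sorted(by_profile.items()):
    formula = math.factorial(n)
    for mu, phi_mu in profile:
        formula //= math.factorial(mu) ** phi_mu * math.factorial(phi_mu)
    print(profile, count, formula)    # the two counts agree
```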

Incorporating (5) into Shtarkov's sum, we obtain

$$\hat R(\mathcal I_\Psi^n) = \log \sum_{\varphi \in \Phi_n} \sum_{\psi \in \Psi_\varphi} \sup_{p \in \mathcal I_\Psi^n} p(\psi) \ \ge\ \log \sum_{\varphi \in \Phi_n} \sum_{\psi \in \Psi_\varphi} \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu} = \log \sum_{\varphi \in \Phi_n} \frac{n!}{\prod_{\mu=1}^n (\mu!)^{\varphi_\mu}\, \varphi_\mu!} \prod_{\mu=1}^n \left(\frac{\mu}{n}\right)^{\mu \varphi_\mu} \stackrel{\mathrm{def}}{=} \log g(n). \qquad (6)$$

Direct computation of $g(n)$ appears to be difficult. Instead, we evaluate a generating function of $g(n)$,

$$G(z) \stackrel{\mathrm{def}}{=} \sum_{n=0}^\infty g(n) \frac{n^n}{n!} z^n, \qquad (7)$$

from which the asymptotics of $g(n)$ can be obtained using Hayman's analysis [14]. To express the generating function $G(z)$ in a more explicit form, observe that

$$G(z) = \sum_{n=0}^\infty \sum_{(\varphi_1, \ldots, \varphi_n) \in \Phi_n} \prod_\mu \frac{(\mu^\mu z^\mu / \mu!)^{\varphi_\mu}}{\varphi_\mu!} = \prod_\mu \sum_{\varphi_\mu \ge 0} \frac{(\mu^\mu z^\mu / \mu!)^{\varphi_\mu}}{\varphi_\mu!},$$

thus yielding

$$G(z) = \exp\left(\sum_{k=1}^\infty \frac{k^k z^k}{k!}\right). \qquad (8)$$
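Both Eq. (6) and Eqs. (7)-(8) can be verified numerically for small $n$ (a Python sketch of ours; partitions(), g() and coeffs_G() are our own helper names). The profiles in $\Phi_n$ are exactly the integer partitions of $n$, and the Taylor coefficients of $\exp(S)$ follow from the standard recurrence $n f_n = \sum_k k\, s_k f_{n-k}$.

```python
from fractions import Fraction
import math

def partitions(n, max_part=None):
    """Yield multiplicity profiles phi (dict mu -> phi_mu) of the integer
    partitions of n, i.e. the elements of Phi_n."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in partitions(n - part, part):
            phi = dict(rest)
            phi[part] = phi.get(part, 0) + 1
            yield phi

def g(n):
    """The sum inside the log of Eq. (6)."""
    total = 0.0
    for phi in partitions(n):
        count = math.factorial(n)      # |Psi_phi| = n!/prod(mu!^phi_mu phi_mu!)
        prob = 1.0
        for mu, phi_mu in phi.items():
            count //= math.factorial(mu) ** phi_mu * math.factorial(phi_mu)
            prob *= (mu / n) ** (mu * phi_mu)
        total += count * prob
    return total

def coeffs_G(N):
    """First N+1 Taylor coefficients of G(z) = exp(sum_k k^k z^k/k!) (Eq. (8)),
    via the recurrence n f_n = sum_{k=1}^n k s_k f_{n-k} for F = exp(S)."""
    s = [Fraction(0)] + [Fraction(k ** k, math.factorial(k)) for k in range(1, N + 1)]
    f = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        f[n] = sum(k * s[k] * f[n - k] for k in range(1, n + 1)) / n
    return f

# Eq. (7): the n-th coefficient of G should equal g(n) n^n / n!.
f = coeffs_G(8)
for n in range(1, 9):
    print(n, float(f[n]), g(n) * n ** n / math.factorial(n))
```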

4. Hayman's analysis

In the last section, we lower bounded $\hat R(\mathcal I_\Psi^n)$ in terms of the coefficients of a generating function $G(z)$. Hayman [14] developed a technique to compute the asymptotics of the coefficients of power series that satisfy certain properties, which, as shown later, $G(z)$ also satisfies. In this section we describe Hayman's analysis. We follow the terminology used in [30].

Theorem 1 (Hayman). For

$$f(z) = \sum_{n=0}^\infty a_n z^n,$$

let

$$a(z) \stackrel{\mathrm{def}}{=} \frac{d \log f(z)}{d \log z} \qquad \text{and} \qquad b(z) \stackrel{\mathrm{def}}{=} \frac{d^2 \log f(z)}{d (\log z)^2} = z\, a'(z), \qquad (9)$$

and let the saddle point $r_n$ be the solution of $a(r_n) = n$. If for some real $R$, the following three conditions hold:

Nonnegativity: there exists $R_0 < R$ such that for $R_0 < x < R$, $f(x) > 0$;

Fast growth: as $x \nearrow R$, namely, as $x$ approaches $R$ from below, $b(x) \to \infty$;

Basic split: there exists $\theta_x > 0$, called the basic split, such that

(Local approximation) for $|\theta| \le \theta_x$, uniformly in $\theta$ as $x \nearrow R$,
$$f(xe^{i\theta}) \sim f(x) \exp\left(i\, a(x)\, \theta - \frac{\theta^2}{2}\, b(x)\right);$$

(Fast taper) for $\theta_x < |\theta| < \pi$, uniformly in $\theta$ as $x \nearrow R$,
$$f(xe^{i\theta}) = o\left(\frac{f(x)}{\sqrt{b(x)}}\right);$$

then

$$a_n \sim \frac{r_n^{-n}\, f(r_n)}{\sqrt{2\pi\, b(r_n)}}.$$

Hayman's analysis can also be viewed as a special case of the class of saddle point approximations. It exploits the fact that for functions satisfying the conditions of Theorem 1, the value of Cauchy's integral $\frac{1}{2\pi i}\oint_C f(z)/z^{n+1}\, dz$ around a contour $C$ through the saddle point $r_n$ is captured by a short arc around $r_n$. For more details on the saddle point approximation and related results, see [10,14,30].

For the generating function $G$ defined in Eq. (8), the functions $a(z)$ and $b(z)$ of Eq. (9) are

$$a(z) = \sum_{k=1}^\infty \frac{k^{k+1}}{k!} z^k \qquad \text{and} \qquad b(z) = \sum_{k=1}^\infty \frac{k^{k+2}}{k!} z^k. \qquad (10)$$

We pick $R = 1/e$. The first two conditions are clearly satisfied for $G(z)$. For $\theta_x = (1 - ex)^{6/5}$, we show in Theorem 6 that the local approximation for $G$ holds, and in Theorem 7 that $G(z)$ does drop rapidly for $|\theta| \ge \theta_x$.
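Theorem 1's recipe can be exercised numerically for $G$. The sketch below (ours; log_deriv_sum and saddle_point are our own helper names) evaluates the series in Eq. (10) and solves $a(r_n) = n$ by bisection, anticipating the closed form for $r_n$ derived in Section 9.

```python
import math

def log_deriv_sum(x, l, terms=3000):
    """sum_k k^(k+l) x^k / k!, the series behind Eq. (10), summed with
    logarithms to avoid overflow; valid for 0 < x < 1/e."""
    return sum(math.exp((k + l) * math.log(k) + k * math.log(x)
                        - math.lgamma(k + 1)) for k in range(1, terms))

def saddle_point(n):
    """Solve a(r_n) = n by bisection on (0, 1/e)."""
    lo, hi = 1e-9, 1 / math.e - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        if log_deriv_sum(mid, 1, terms=20000) < n:
            lo = mid
        else:
            hi = mid
    return lo

for n in (10, 100, 1000):
    r = saddle_point(n)
    print(n, r, (1 - 0.5 * n ** (-2 / 3)) / math.e)  # vs. asymptotic r_n
```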

5. Preliminaries

We outline some results that will be extensively used in this paper. Observe that we can expand $G(xe^{i\theta})$ in $\theta$ as

$$G(xe^{i\theta}) = G(x) \exp\left(\sum_{l=1}^\infty \frac{(i\theta)^l}{l!} \left.\frac{d^l \log G(z)}{d(\log z)^l}\right|_{z=x}\right) = G(x) \exp\left(\sum_{l=1}^\infty \frac{(i\theta)^l}{l!} \sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!}\right).$$

We first check for convergence of each of the summations over $k$.

Lemma 2. For any $l$, $\sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!}$ converges for $x < 1/e$.

Proof. By the Cauchy ratio test, e.g., [33].

Therefore, in order to evaluate the $n$th coefficient in the Taylor series, Hayman's theorem approximates the value of $G(z)$ in the complex integration over the circle $|z| = x$ by a correction over the value $G(x)$ for points on the circle near the positive real line, and by a term much smaller than $G(x)$ for points on the circle away from the positive real line. Intuitively speaking, it follows that at the basic split, the contribution of the higher-order terms is negligible and the contribution of the second coefficient is large enough to satisfy fast taper. We choose a basic split based on these criteria, and then prove that our choice indeed works.

We also use Feller's bounds [9] on Stirling's approximation,

$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{1/(12n+1)} \ \le\ n! \ \le\ \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{1/(12n)} \qquad \text{for all } n,$$

extensively in the paper. Further, we shall denote by $C$ positive constants that are, in particular, independent of $x$, $\theta$ and $l$.

6. Locating the basic split

We locate the basic split for

$$G(xe^{i\theta}) = G(x) \exp\left(\sum_{l=1}^\infty \frac{(i\theta)^l}{l!} \sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!}\right).$$

To do so, we estimate the magnitude of the coefficients of $\theta^l$, and ensure that at our choice of $\theta_x$, the second term is unbounded, and the contribution of any term beyond the second is negligible. In Theorems 6 and 7, we show that this choice works. We upper bound the magnitude of the coefficients of $\theta^l$ as follows.

Lemma 3. For integers $l \ge 1$ and $x < 1/e$,

$$\sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!} \ \le\ \sqrt{\frac{(2l)!}{2\pi\, 2^l\, (1 - ex)^{2l+1}}}.$$

Proof. From Feller's bounds,

$$\sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!} \ \le\ \frac{1}{\sqrt{2\pi}} \sum_{k=1}^\infty k^{l - 1/2} (xe)^k.$$

Squaring the right-side sum,

$$\left(\sum_k k^{l-1/2} (xe)^k\right)^2 = \sum_k (xe)^k \sum_{m=1}^{k-1} \big(m(k-m)\big)^{l-1/2} \ \le\ \sum_k (xe)^k\, k \left(\frac{k^2}{4}\right)^{l-1/2} = \frac{2}{4^l} \sum_k k^{2l} (xe)^k \ \le\ \frac{2}{4^l}\, \frac{(2l)!}{(1-ex)^{2l+1}} \ \le\ \frac{(2l)!}{2^l (1-ex)^{2l+1}},$$

using $m(k-m) \le k^2/4$ and $\sum_k k^{2l} y^k \le (2l)! \sum_k \binom{k+2l}{2l} y^k = \frac{(2l)!}{(1-y)^{2l+1}}$. Taking the positive square root proves the lemma.

We lower bound the magnitude of the coefficient of $\theta^2$ as follows.

Lemma 4. For $\frac{5}{6e} < x < \frac{1}{e}$,

$$\sum_{k=1}^\infty \frac{k^{k+2} x^k}{k!} \ \ge\ \frac{C}{(1-ex)^{5/2}}.$$

Proof. From Feller's bounds,

$$\sum_{k=1}^\infty \frac{k^{k+2} x^k}{k!} \ \ge\ C \sum_{k=1}^\infty k^{3/2} (xe)^k.$$

Squaring the right side,

$$\left(\sum_k k^{3/2} (xe)^k\right)^2 = \sum_k (xe)^k \sum_{m=1}^{k-1} \big(m(k-m)\big)^{3/2} \ \ge\ \sum_k (xe)^k \sum_{m = k/4}^{3k/4} \big(m(k-m)\big)^{3/2} \ \ge\ C \sum_k k^4 (xe)^k \ \ge\ C\, 4! \sum_{k \ge 4} \binom{k}{4} (xe)^k = \frac{C\, 4!\, (xe)^4}{(1 - xe)^5} \ \ge\ \frac{C}{(1-ex)^5}.$$

In the last step we observed that $5/6 < xe < 1$, and thus included $(xe)^4$ in the constant. Taking the positive square root proves the lemma.
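Lemma 4 (together with Lemma 3 at $l = 2$) pins the growth of $b(x)$ at the order $(1-ex)^{-5/2}$. Numerically (our sketch, reusing log_deriv_sum() from the earlier sketch), $(1-ex)^{5/2}\, b(x)$ indeed settles near a constant, $3/2^{5/2} \approx 0.53$, which also follows from the tree-function form of $b$ derived in Section 9.

```python
import math  # log_deriv_sum() as defined in the sketch after Theorem 1

# Lemma 4: sum_k k^(k+2) x^k / k! grows like (1 - ex)^(-5/2) as x -> 1/e,
# so (1 - ex)^(5/2) times the sum should approach a constant.
for u in (0.1, 0.03, 0.01, 0.003):     # u = 1 - ex
    x = (1 - u) / math.e
    print(u, u ** 2.5 * log_deriv_sum(x, 2, terms=20000))
```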

The following lemma locates the basic split.

Lemma 5. There exists $\theta_x$ so that

$$\lim_{x \nearrow 1/e} \theta_x^2 \sum_{k=1}^\infty \frac{k^{k+2} x^k}{k!} = \infty,$$

and simultaneously, for every $l \ge 3$,

$$\lim_{x \nearrow 1/e} \theta_x^l \sum_{k=1}^\infty \frac{k^{k+l} x^k}{k!} = 0.$$

Proof. Take $\theta_x = (1 - ex)^\alpha$ with $7/6 < \alpha < 5/4$. From Lemma 4,

$$\lim_{x \nearrow 1/e} \theta_x^2 \sum_k \frac{k^{k+2} x^k}{k!} \ \ge\ \lim_{x \nearrow 1/e} \frac{C (1-ex)^{2\alpha}}{(1-ex)^{5/2}} = \infty$$

because $\alpha < 5/4$, and from Lemma 3, for every $l \ge 3$,

$$\lim_{x \nearrow 1/e} \theta_x^l \sum_k \frac{k^{k+l} x^k}{k!} \ \le\ \lim_{x \nearrow 1/e}\ (1-ex)^{\alpha l} \sqrt{\frac{(2l)!}{2\pi\, 2^l (1-ex)^{2l+1}}} = 0$$

because $\alpha > \max_{l \ge 3} \left(1 + \frac{1}{2l}\right) = \frac{7}{6}$. Therefore all $\theta_x = (1-ex)^\alpha$ with $7/6 < \alpha < 5/4$ satisfy the lemma. In particular we will use $\theta_x = (1-ex)^{6/5}$.

7. Local approximation

We show that all points on the circle $|z| = x$ with argument $|\theta| \le \theta_x = (1-ex)^{6/5}$ can be approximated by a small correction over the value on the positive real line.

Theorem 6. Let $\theta_x = (1-ex)^{6/5}$. Uniformly in $\theta$, for $0 \le |\theta| \le \theta_x$,

$$G(xe^{i\theta}) \sim G(x) \exp\left(i\theta\, a(x) - \frac{\theta^2}{2}\, b(x)\right),$$

i.e., for $0 \le |\theta| \le \theta_x$ and every $\varepsilon > 0$, there exists $\delta(\varepsilon)$ such that if $0 < 1/e - x < \delta$,

$$\left|\frac{G(xe^{i\theta})}{G(x) \exp\left(i\theta\, a(x) - \frac{\theta^2}{2}\, b(x)\right)} - 1\right| < \varepsilon.$$

Proof. Observe that

$$G(xe^{i\theta}) = \exp\left(\sum_k \frac{k^k x^k e^{ik\theta}}{k!}\right) = \exp\left(\sum_k \frac{k^k x^k}{k!} \sum_{l=0}^\infty \frac{(ik\theta)^l}{l!}\right) = \exp\left(\sum_{l=0}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!}\right).$$

The rearrangement can be done for all $x < 1/e$, as the original series is absolutely convergent for $x < 1/e$. Split the term in the exponent as

$$\sum_{l=0}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!} = \sum_k \frac{k^k x^k}{k!} + i\theta \sum_k \frac{k^{k+1} x^k}{k!} - \frac{\theta^2}{2} \sum_k \frac{k^{k+2} x^k}{k!} + \sum_{l=3}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!} = \log G(x) + i\theta\, a(x) - \frac{\theta^2}{2}\, b(x) + \sum_{l=3}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!}.$$

Observing that if $|t| \le \varepsilon$ then $|e^t - 1| \le e^{|t|} - 1 \le e^\varepsilon - 1$, an equivalent statement of the local approximation is that for $|\theta| \le \theta_x$, given $\varepsilon > 0$, there exists $\delta(\varepsilon)$ such that for $1/e - x < \delta$,

$$\left|\sum_{l=3}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!}\right| < \varepsilon.$$

To bound the above expression, note that

$$\left|\sum_{l=3}^\infty \frac{(i\theta)^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!}\right| \overset{(a)}{\le} \sum_{l=3}^\infty \frac{\theta_x^l}{l!} \sum_k \frac{k^{k+l} x^k}{k!} \overset{(b)}{\le} \sum_{l=3}^\infty \frac{\theta_x^l}{l!} \sqrt{\frac{(2l)!}{2\pi\, 2^l (1-ex)^{2l+1}}} = \frac{(1-ex)^{1/10}}{\sqrt{2\pi}} \sum_{m=0}^\infty (1-ex)^{m/5}\, \frac{1}{(m+3)!} \sqrt{\frac{(2m+6)!}{2^{m+3}}},$$

where (a) is the triangle inequality and (b) follows from Lemma 3. The series multiplying $(1-ex)^{1/10}$ converges for all $x$ sufficiently close to $1/e$; to see this, use Cauchy's root test [34]. Since the right side can therefore be made smaller than $\varepsilon$ by taking $x$ close enough to $1/e$, the theorem follows.
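The proof's remainder term decays only like $(1-ex)^{1/10}$, which can be observed numerically. The sketch below (ours; log_G is our own helper, and log_deriv_sum comes from the earlier sketch) compares $\log G(xe^{i\theta_x})$ with the quadratic approximation of the theorem.

```python
import cmath, math

def log_G(z, terms):
    """log G(z) = sum_k k^k z^k / k! for complex |z| < 1/e (Eq. (8))."""
    return sum(cmath.exp(k * cmath.log(z) + k * math.log(k)
                         - math.lgamma(k + 1)) for k in range(1, terms))

for u in (0.03, 0.01, 0.003):        # u = 1 - ex
    x = (1 - u) / math.e
    terms = int(200 / u)
    a = log_deriv_sum(x, 1, terms)   # a(x), from the sketch after Theorem 1
    b = log_deriv_sum(x, 2, terms)   # b(x)
    theta = u ** 1.2                 # theta_x = (1 - ex)^(6/5)
    remainder = log_G(x * cmath.exp(1j * theta), terms) \
        - (log_G(x, terms) + 1j * theta * a - theta ** 2 * b / 2)
    print(u, abs(remainder))   # shrinks slowly, like (1-ex)^(1/10), cf. the proof
```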

8. Fast taper

We prove that our choice $\theta_x = (1-ex)^{6/5}$ from Lemma 5 is indeed a basic split.

Theorem 7. Let $\theta_x = (1-ex)^{6/5}$. Uniformly in $\theta$ as $x \nearrow 1/e$,

$$G(xe^{i\theta}) = o\left(\frac{G(x)}{\sqrt{b(x)}}\right) \qquad \text{for all } \theta \text{ with } \theta_x \le |\theta| \le \pi,$$

i.e., for $\theta_x \le |\theta| \le \pi$ and every $\varepsilon > 0$, there exists $\delta(\varepsilon)$ such that if $1/e - x < \delta$,

$$\frac{b(x)\, |G(xe^{i\theta})|^2}{G(x)^2} = \left(\sum_k \frac{k^{k+2} x^k}{k!}\right) \exp\left(-4 \sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2}\right) < \varepsilon.$$

Proof. We upper bound $b(x)$ using Lemma 3. We bound the exponential factor separately in the regions $(1-ex)^{6/5} \le |\theta| \le (1-ex)^{1/8}$ and $(1-ex)^{1/8} \le |\theta| \le \pi$. The bound for the second region will apply uniformly in any range lower bounded by $(1-ex)^\beta$ with $\beta < 1/4$; in particular, we choose $\beta = 1/8$.

We first consider the second region. Let $\theta_1 = (1-ex)^{1/8}$. In the sum $\sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2}$, reject all terms for which $k\theta \bmod 2\pi$ lies within $\theta_1/4$ of $0$ or of $2\pi$. Since $\theta_1 \le |\theta| \le \pi$, the sequence $\theta, 2\theta, \ldots, k\theta, \ldots$ never has two consecutive terms rejected; consequently, among any $M$ consecutive terms, after this rejection process at least $M/2$ terms remain. Lower bounding all remaining $\sin^2 \frac{k\theta}{2}$ by $\sin^2 \frac{\theta_1}{8}$ allows us to factor the $\sin^2 \frac{\theta_1}{8}$ term out of the summation; call the sum of the remaining terms the residual summation. The terms $\frac{k^k x^k}{k!}$ decrease monotonically with $k$ for $x < 1/e$, so the residual summation is lower bounded, using Lemma 8, by

$$\sin^2 \frac{\theta_1}{8} \sum_{\substack{k \ge 2 \\ k \text{ even}}} \frac{k^k x^k}{k!} \ \ge\ \sin^2 \frac{\theta_1}{8} \cdot \frac{C}{\sqrt{1-ex}} \ \ge\ \frac{C\, \theta_1^2}{\sqrt{1-ex}} = C\, (1-ex)^{-1/4}.$$

Define $v = (1-ex)^{-1/4}$. Combining all that has been proved so far with the bound $b(x) \le C(1-ex)^{-5/2} = C v^{10}$ from Lemma 3,

$$\left(\sum_k \frac{k^{k+2} x^k}{k!}\right) \exp\left(-4 \sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2}\right) \ \le\ C v^{10} e^{-cv},$$

which can be made smaller than any $\varepsilon > 0$, for all $|\theta| \ge (1-ex)^{1/8}$, by choosing $1/e - x < \delta_1(\varepsilon)$; note that $\sin^2(\theta_1/8) \ge C \theta_1^2$ holds uniformly for any $\beta \le 1/8$.

To tackle the remaining region, i.e., $(1-ex)^{6/5} \le |\theta| \le (1-ex)^{1/8}$, we use the inequality

$$\sin \frac{k\theta}{2} \ \ge\ \frac{k\theta}{\pi} \qquad \text{for } 0 \le k\theta \le \pi.$$

In this region, there are $\lfloor \pi/\theta \rfloor$ terms for which the inequality holds with both sides being positive. We write $|\theta| = (1-ex)^\alpha$, so that $1/8 \le \alpha \le 6/5$.

Squaring and substituting the above inequality into the exponent,

$$\sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2} \ \ge\ \frac{\theta^2}{\pi^2} \sum_{k \le (1-ex)^{-\alpha}} \frac{k^{k+2} x^k}{k!} \ \ge\ \frac{C\, (1-ex)^{2\alpha}}{\pi^2\, (1-ex)^{5\min(\alpha,1)/2}},$$

where we lower bounded the partial sum using Lemma 9 (note $\pi/\theta \ge (1-ex)^{-\alpha}$). The exponent $2\alpha - \frac52\min(\alpha,1)$ equals $-\alpha/2 \le -1/16$ for $\alpha \le 1$, and $2\alpha - \frac52 \le -\frac1{10}$ for $1 < \alpha \le \frac65$; in either case it is at most $-1/16$. Define $v = (1-ex)^{-1/16}$, so the exponential factor is at most $e^{-cv}$ while $b(x) \le C(1-ex)^{-5/2} = C v^{40}$. We conclude

$$\left(\sum_k \frac{k^{k+2} x^k}{k!}\right) \exp\left(-4 \sum_k \frac{k^k x^k}{k!} \sin^2 \frac{k\theta}{2}\right) \ \le\ C v^{40} e^{-cv},$$

which, for $\theta_x \le |\theta| \le (1-ex)^{1/8}$, can be made smaller than $\varepsilon > 0$ by taking $1/e - x < \delta_2(\varepsilon)$, uniformly in $\theta$. Picking $\delta = \min(\delta_1, \delta_2)$ concludes the proof for all $(1-ex)^{6/5} \le |\theta| \le \pi$.

We now prove Lemmas 8 and 9, which were used in Theorem 7.

Lemma 8. For $\frac{5}{6e} < x < \frac{1}{e}$,

$$\sum_{\substack{k \ge 2 \\ k \text{ even}}} \frac{k^k x^k}{k!} \ \ge\ \frac{C}{\sqrt{1-ex}}.$$

Proof. From Feller's bounds,

$$\sum_{\substack{k \text{ even}}} \frac{k^k x^k}{k!} \ \ge\ C \sum_{\substack{k \ge 4 \\ k \text{ even}}} \frac{(ex)^k}{\sqrt k}.$$

To lower bound the sum on the right, observe that

$$\left(\sum_{\substack{k \ge 4 \\ k \text{ even}}} \frac{(ex)^k}{\sqrt k}\right)^2 = \sum_{\substack{k \ge 8 \\ k \text{ even}}} (ex)^k \sum_{\substack{4 \le l \le k-4 \\ l \text{ even}}} \frac{1}{\sqrt{l(k-l)}} \ \ge\ C \sum_{\substack{k \ge 8 \\ k \text{ even}}} (ex)^k = \frac{C\, (ex)^8}{1 - (ex)^2} \ \ge\ \frac{C}{1-ex},$$

since each inner sum has order $k$ terms, each at least $2/k$, and is therefore bounded below by a constant. By observing that $5/6 < ex < 1$, we incorporated $(ex)^8$ and $1 + ex$ into the constant. Taking the positive square root proves the lemma.

Lemma 9. For $x > \frac{5}{6e}$ and $\alpha \le 1$,

$$\sum_{k \le (1-ex)^{-\alpha}} \frac{k^{k+2} x^k}{k!} \ \ge\ \frac{C}{(1-ex)^{5\alpha/2}},$$

while for $\alpha > 1$ the sum is at least $C (1-ex)^{-5/2}$.

Proof. For any $m$, from Feller's bounds,

$$\sum_{k=1}^m \frac{k^{k+2} x^k}{k!} \ \ge\ C \sum_{k=1}^m k^{3/2} (xe)^k.$$

We first show that if $k < \frac{3 e x}{2(1-ex)}$, the $k$th term is less than the $(k+1)$th term in the right-side summation. To see that, observe that the ratio of the $(k+1)$th to the $k$th term is $\left(1 + \frac1k\right)^{3/2} xe$, and that

$$\left(1 + \frac1k\right)^{3/2} \ \ge\ 1 + \frac{3}{2k},$$

where the right side exceeds $1/(xe)$ precisely when $k < \frac{3ex}{2(1-ex)}$. Since $ex > 5/6$, we have $\frac{3ex}{2(1-ex)} \ge \frac{1}{1-ex} \ge (1-ex)^{-\alpha}$ for $\alpha \le 1$, so the terms before $k = (1-ex)^{-\alpha}$ in the summation are nondecreasing.

For $\alpha \le 1$, since the terms are nondecreasing throughout the range of summation, retaining only the upper half of the range and replacing each of its terms by the first retained term gives

$$\sum_{k \le (1-ex)^{-\alpha}} k^{3/2} (xe)^k \ \overset{(a)}{\ge}\ \frac{(1-ex)^{-\alpha}}{2} \left(\frac{(1-ex)^{-\alpha}}{2}\right)^{3/2} (xe)^{(1-ex)^{-\alpha}} \ \overset{(b)}{\ge}\ \frac{C}{(1-ex)^{5\alpha/2}},$$

where (a) follows by replacing all terms of the retained summation with its first (smallest) term, and (b) because for all $5/6 < y < 1$ and $\alpha \le 1$, $y^{(1-y)^{-\alpha}} \ge y^{(1-y)^{-1}} \ge C$.

We complete the proof for $\alpha > 1$ by using the lemma for $\alpha = 1$, which we just proved. For $\alpha > 1$, observe that $(1-ex)^{-\alpha} > (1-ex)^{-1}$, so

$$\sum_{k \le (1-ex)^{-\alpha}} \frac{k^{k+2} x^k}{k!} \ \ge\ \sum_{k \le (1-ex)^{-1}} \frac{k^{k+2} x^k}{k!} \ \ge\ \frac{C}{(1-ex)^{5/2}}.$$

9. Evaluation of coefficients

Using Hayman's analysis, we evaluate the lower bound on $\hat R'(\mathcal I_\Psi^n)$, namely the logarithm of $\frac{n!}{n^n}$ times the $n$th coefficient of the expansion of $G(z)$.

Theorem 10.

$$\hat R'(\mathcal I_\Psi^n) = \frac{3}{2} \log e \cdot n^{1/3} - \frac{1}{3} \log n - \frac{2}{3} \log e - \frac{1}{2} \log 3 + o(1).$$

Proof. From (6)-(8), we have that

$$\sum_{n=0}^\infty 2^{\hat R'(\mathcal I_\Psi^n)}\, \frac{n^n}{n!}\, z^n = G(z) = \exp\left(\sum_{k=1}^\infty \frac{k^k z^k}{k!}\right).$$

From the observations following (10) and from Theorems 6 and 7, we conclude that $G(z)$ satisfies the conditions of Theorem 1. To use it, we need to evaluate the function $a(z)$, shown in (10) to be $\sum_k \frac{k^{k+1}}{k!} z^k$. We do so using the related tree function [30]

$$T(z) = \sum_{k=1}^\infty \frac{k^{k-1}}{k!} z^k,$$

which satisfies [30] the equation

$$T(z) = z\, e^{T(z)}. \qquad (13)$$

Therefore,

$$\sum_{k=1}^\infty \frac{k^k}{k!} z^k = \frac{T(z)}{1 - T(z)}. \qquad (14)$$

By differentiating Eqs. (13) and (14) and using the absolute convergence of the series, we obtain

$$a(z) = \frac{T(z)}{(1 - T(z))^3} \qquad \text{and} \qquad b(z) = \frac{T(z)\,(1 + 2\,T(z))}{(1 - T(z))^5}. \qquad (15)$$

At $z = 1/e$, we have the following singular expansion [30]:

$$T(z) = 1 - \sqrt{2(1 - ez)} + \frac{2}{3}(1 - ez) + O\left((1 - ez)^{3/2}\right).$$

Consequently, it can be verified that

$$a(z) = \frac{1}{(2(1 - ez))^{3/2}} + O\left(\frac{1}{1 - ez}\right),$$

and the solution to $a(r_n) = n$ is

$$r_n = \frac{1}{e}\left(1 - \frac{1}{2 n^{2/3}} + O\left(\frac{1}{n^{4/3}}\right)\right).$$
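Since $T(z) = -W(-z)$ for the principal branch of the Lambert $W$ function, Eqs. (13)-(15) and the singular expansion can be checked numerically. The sketch below is ours (it assumes SciPy is available, and reuses log_deriv_sum() from the earlier sketch).

```python
import math
from scipy.special import lambertw   # assumes SciPy is available

def T(z):
    """Tree function: T(z) = -W(-z), W the principal Lambert branch."""
    return -lambertw(-z).real

z = 0.3
t = T(z)
print(t / (1 - t) ** 3, log_deriv_sum(z, 1))                 # a(z): Eq. (15) vs Eq. (10)
print(t * (1 + 2 * t) / (1 - t) ** 5, log_deriv_sum(z, 2))   # b(z), both ways

# Singular expansion of T near z = 1/e:
for u in (0.01, 0.001):              # u = 1 - ez
    z = (1 - u) / math.e
    print(T(z), 1 - math.sqrt(2 * u) + 2 * u / 3)
```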

The $n$th coefficient of $G(z)$ therefore equals

$$\frac{r_n^{-n}\, G(r_n)}{\sqrt{2\pi\, b(r_n)}}\, (1 + o(1)).$$

We evaluate the terms to be

$$G(r_n) = \exp\left(n^{1/3} - \frac{2}{3} + O(n^{-1/3})\right), \qquad r_n^{-n} = \exp\left(n + \frac{n^{1/3}}{2} + O(n^{-1/3})\right),$$

and

$$b(r_n) = 3\, n^{5/3}\left(1 + O(n^{-1/3})\right),$$

and use them, together with $\frac{n!}{n^n} = \sqrt{2\pi n}\, e^{-n} (1 + o(1))$, to evaluate $\hat R'(\mathcal I_\Psi^n)$:

$$\hat R'(\mathcal I_\Psi^n) = \frac{3}{2} \log e \cdot n^{1/3} - \frac{1}{3} \log n - \frac{2}{3} \log e - \frac{1}{2} \log 3 + o(1).$$

We note that this is the highest accuracy of the asymptotic expansion allowed by Hayman's theorem, limited by the form of Eq. (15).
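For moderate $n$, the exact lower bound $\log g(n)$ can be compared with Theorem 10's formula. The sketch below is ours (reusing g() from the Section 3 sketch); the gap between the two printed values is the slowly vanishing $o(1)$ term.

```python
import math  # g() as defined in the sketch after Eq. (8)

def R_prime_asymptotic(n):
    """The right side of Theorem 10, in bits, without the o(1) term."""
    le = math.log2(math.e)
    return (1.5 * le * n ** (1 / 3) - math.log2(n) / 3
            - 2 * le / 3 - math.log2(3) / 2)

# Exact lower bound log2 g(n) versus the asymptotic formula; they should
# differ by o(1), though convergence at such small n is slow.
for n in (10, 20, 40):
    print(n, math.log2(g(n)), R_prime_asymptotic(n))
```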

Corollary 11.

$$\hat R(\mathcal I_\Psi^n) \ \ge\ \frac{3}{2} \log e \cdot n^{1/3} - \frac{1}{3} \log n - \frac{2}{3} \log e - \frac{1}{2} \log 3 + o(1).$$

Acknowledgements

We thank Wojciech Szpankowski for sharing his intuition behind saddle point approximations.

References

[1] J. Åberg, Y.M. Shtarkov, B.J.M. Smeets, Multialphabet coding with separate alphabet description, in: Proc. Compression and Complexity of Sequences, 1997.
[2] N. Cesa-Bianchi, G. Lugosi, Minimax regret under log loss for general classes of experts, in: Proc. Twelfth Ann. Conf. on Computational Learning Theory, 1999.
[3] S.F. Chen, J. Goodman, An empirical study of smoothing techniques for language modeling, in: Proc. Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, 1996.
[4] K.W. Church, W.A. Gale, Probability scoring for spelling correction, Statist. and Comput. (1991).
[5] T.M. Cover, Universal portfolios, Math. Finance (1991).
[6] T.M. Cover, E. Ordentlich, Universal portfolios with side information, IEEE Trans. Inform. Theory (1996).
[7] M. Drmota, W. Szpankowski, The precise minimax redundancy, in: Proc. IEEE Symp. Inform. Theory, 2002.
[8] P. Elias, Universal codeword sets and representations of the integers, IEEE Trans. Inform. Theory (1975).
[9] W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, 1968.
[10] P. Flajolet, R. Sedgewick, Average case analysis of algorithms: saddle point asymptotics, Technical Report 2376, INRIA, 1994.
[11] D.P. Foster, R.A. Stine, A.J. Wyner, Universal codes for finite sequences of integers drawn from a monotone distribution, IEEE Trans. Inform. Theory (2002).
[12] W.A. Gale, K.W. Church, D. Yarowsky, A method for disambiguating word senses, Computers and the Humanities (1992).
[13] L. Györfi, I. Pali, E.C. van der Meulen, On universal noiseless source coding for infinite source alphabets, European Trans. Telecommunications and Related Technol. (1993).
[14] W.K. Hayman, A generalization of Stirling's formula, Journal für die reine und angewandte Mathematik (1956).
[15] D. He, E. Yang, On the universality of grammar-based codes for sources with countably infinite alphabets, in: Proc. IEEE Symp. Inform. Theory, 2003.
[16] N. Jevtić, A. Orlitsky, N.P. Santhanam, Universal compression of unknown alphabets, in: Proc. IEEE Symp. Inform. Theory, 2002.
[17] J.C. Kieffer, A unified approach to weak universal source coding, IEEE Trans. Inform. Theory (1978).
[18] R.E. Krichevsky, V.K. Trofimov, The performance of universal coding, IEEE Trans. Inform. Theory (1981).
[19] A. Orlitsky, N.P. Santhanam, Speaking of infinity, IEEE Trans. Inform. Theory (2004).
[20] A. Orlitsky, N.P. Santhanam, J. Zhang, Always Good Turing: asymptotically optimal probability estimation, in: Proc. 44th Ann. Symp. on Foundations of Computer Science, October 2003.
[21] A. Orlitsky, N.P. Santhanam, J. Zhang, Always Good Turing: asymptotically optimal probability estimation, Science (2003).
[22] A. Orlitsky, N.P. Santhanam, J. Zhang, Universal compression of memoryless sources over unknown alphabets, IEEE Trans. Inform. Theory (2004).
[23] A. Orlitsky, K. Viswanathan, One-way communication and error-correcting codes, IEEE Trans. Inform. Theory (2003).
[24] J. Rissanen, Fisher information and stochastic complexity, IEEE Trans. Inform. Theory (1996).
[25] G. Shamir, Universal lossless compression with unknown alphabets — the average case, IEEE Trans. Inform. Theory, 2003, submitted for publication.
[26] Y.M. Shtarkov, Universal sequential coding of single messages, Problems of Inform. Transmission (1987).
[27] Y.M. Shtarkov, T.J. Tjalkens, F.M.J. Willems, Multialphabet universal coding of memoryless sources, Problems of Inform. Transmission (1995).
[28] F. Song, W.B. Croft, A general language model for information retrieval (poster abstract), in: Research and Development in Information Retrieval, ACM Press, New York, 1999.
[29] W. Szpankowski, On asymptotics of certain recurrences arising in universal coding, Problems of Inform. Transmission (1998).
[30] W. Szpankowski, Average Case Analysis of Algorithms on Sequences, Wiley, New York, 2001.
[31] T. Uyematsu, F. Kanaya, Asymptotic optimality of two variations of Lempel-Ziv codes for sources with countably infinite alphabet, in: Proc. IEEE Symp. Inform. Theory, 2002.
[32] V.G. Vovk, A game of prediction with expert advice, J. Comput. and System Sci. (1998).
[33] E.W. Weisstein, Ratio Test, from MathWorld — A Wolfram Web Resource, http://mathworld.wolfram.com/RatioTest.html.
[34] E.W. Weisstein, Root Test, from MathWorld — A Wolfram Web Resource, http://mathworld.wolfram.com/RootTest.html.
[35] Q. Xie, A.R. Barron, Asymptotic minimax regret for data compression, gambling and prediction, IEEE Trans. Inform. Theory (2000).
[36] K. Yamanishi, A decision-theoretic extension of stochastic complexity and its application to learning, IEEE Trans. Inform. Theory (1998).
