Analytic Information Theory: From Shannon to Knuth and Back. Knuth80: Piteå, Sweden, 2018. Dedicated to Don E. Knuth


1 Analytic Information Theory: From Shannon to Knuth and Back
Wojciech Szpankowski, Center for Science of Information, Purdue University
January 7, 2018. Knuth80: Piteå, Sweden, 2018. Dedicated to Don E. Knuth.
Joint work with M. Drmota, P. Flajolet, P. Jacquet, M. Weinberger.

2 Outline
1. Shannon & Knuth Legacy
2. Huffman Code and Its Redundancy
3. Universal Codes and Lambert's Function
4. Graph Compression and Knuth's Recurrences
Algorithms: are at the heart of virtually all computing technologies;
Combinatorics: provides indispensable tools for finding patterns and structures;
Information: is a measure of distinguishability.

3 Shannon Legacy: Information Theory
Theorems 1 & 3. [Shannon 1948; Lossless & Lossy Data Compression]
compression bit rate ≥ source entropy $H(X)$; for distortion level $D$: lossy bit rate ≥ rate distortion function $R(D)$.
Theorem 2. [Shannon 1948; Channel Coding] In Shannon's words: It is possible to send information at the capacity through the channel with as small a frequency of errors as desired by proper (long) encoding. This statement is not true for any rate greater than the capacity.

4 Knuth's Legacy: Analytic Combinatorics
Following Hadamard's precept¹, analytic combinatorics applies techniques of complex analysis (e.g., generating functions, combinatorial calculus, Rice's formula, Mellin transform, Fourier series, sequences distributed modulo 1, saddle point methods, analytic poissonization and depoissonization, and singularity analysis) to analyze algorithms (and combinatorial structures). D. E. Knuth initiated it in the 1970s, while Flajolet and his followers developed it through the next three decades, culminating in the publication of the Flajolet-Sedgewick magnum opus in 2009, which defines the field and has stimulated a blossoming research area since.
In his 1997 Shannon Lecture, Jacob Ziv presented compelling arguments for backing off from first-order asymptotics in information theory. The program that applies complex-analytic tools to information theory constitutes analytic information theory.
¹ The shortest path between two truths on the real line passes through the complex plane.

5 Outline Update
1. Shannon & Knuth Legacy
2. Huffman Code and Its Redundancy
3. Universal Codes and Lambert's Function
4. Graph Compression and Knuth's Recurrences

6 Source Coding vel Data Compression
A source code is a bijective mapping $C : \mathcal{A}^* \to \{0,1\}^*$ from sequences over the alphabet $\mathcal{A}$ to the set $\{0,1\}^*$ of binary sequences. The basic problem of source coding (i.e., data compression) is to find codes with shortest descriptions either on average or for individual sequences.

7 Source Coding vel Data Compression
A source code is a bijective mapping $C : \mathcal{A}^* \to \{0,1\}^*$ from sequences over the alphabet $\mathcal{A}$ to the set $\{0,1\}^*$ of binary sequences. The basic problem of source coding (i.e., data compression) is to find codes with shortest descriptions either on average or for individual sequences.
For a probabilistic source model $S$ and a code $C_n$ we let:
$P(x_1^n)$ be the probability of $x_1^n = x_1 \ldots x_n$;
$L(C_n, x_1^n)$ be the code length for $x_1^n$;
Entropy $H_n(P) = -\sum_{x_1^n} P(x_1^n) \log_2 P(x_1^n)$.
Prefix Codes: no codeword is a prefix of another codeword (Kraft's inequality).

8 Source Coding vel Data Compression
A source code is a bijective mapping $C : \mathcal{A}^* \to \{0,1\}^*$ from sequences over the alphabet $\mathcal{A}$ to the set $\{0,1\}^*$ of binary sequences. The basic problem of source coding (i.e., data compression) is to find codes with shortest descriptions either on average or for individual sequences.
For a probabilistic source model $S$ and a code $C_n$ we let:
$P(x_1^n)$ be the probability of $x_1^n = x_1 \ldots x_n$;
$L(C_n, x_1^n)$ be the code length for $x_1^n$;
Entropy $H_n(P) = -\sum_{x_1^n} P(x_1^n) \log_2 P(x_1^n)$.
Prefix Codes: no codeword is a prefix of another codeword (Kraft's inequality).
Shannon First Theorem: For any prefix code the average code length $E[L(C_n, X_1^n)]$ cannot be smaller than the entropy of the source $H_n(P)$:
$$E[L(C_n, X_1^n)] \ge H_n(P).$$
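
To keep these definitions concrete, here is a small Python sketch (my own illustration, not part of the talk) that computes the block entropy $H_n(P)$ of a binary memoryless source and checks that the lengths $\lceil -\log_2 P(x_1^n)\rceil$ satisfy Kraft's inequality and average out between $H_n(P)$ and $H_n(P)+1$; the block length n = 8 and bias p = 0.3 are arbitrary choices.

```python
from itertools import product
from math import log2, ceil

def block_probs(n, p):
    """Probabilities of all binary blocks x_1^n for a memoryless source
    that emits a 1 with probability p."""
    return {x: p**sum(x) * (1 - p)**(n - sum(x)) for x in product((0, 1), repeat=n)}

n, p = 8, 0.3
P = block_probs(n, p)
H = -sum(q * log2(q) for q in P.values())            # block entropy H_n(P)
lengths = {x: ceil(-log2(q)) for x, q in P.items()}  # Shannon code lengths
kraft = sum(2.0 ** -l for l in lengths.values())     # Kraft sum, must be <= 1
avg_len = sum(P[x] * lengths[x] for x in P)
print(f"H_n(P) = {H:.4f} bits, E[L] = {avg_len:.4f}, Kraft sum = {kraft:.4f}")
```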

9 Redundancy: Rate of Convergence
Known Source $P$: Assume that $P$ is known to us. It is known that the shortest code length $L_{op}(x_1^n) \approx -\log P(x_1^n)$. Thus, the pointwise redundancy $R_n(C_n, P; x_1^n)$ and the average redundancy $\bar{R}_n(C_n, P)$ are defined as
$$R_n(C_n, P; x_1^n) = L(C_n, x_1^n) - (-\log_2 P(x_1^n)),$$
$$\bar{R}_n(C_n, P) = E[L(C_n, X_1^n)] - H_n(P) \ \ge 0.$$
The maximal or worst case redundancy is
$$R_n^*(C_n, P) = \max_{x_1^n}\{R_n(C_n, P; x_1^n)\} \ (\ge 0).$$
Huffman Code (1952):
$$\bar{R}_n(P) = \min_{C_n\in\mathcal{C}} E_{x_1^n}[L(C_n, x_1^n) + \log_2 P(x_1^n)].$$

10 Redundancy: Rate of Convergence
Known Source $P$: Assume that $P$ is known to us. It is known that the shortest code length $L_{op}(x_1^n) \approx -\log P(x_1^n)$. Thus, the pointwise redundancy $R_n(C_n, P; x_1^n)$ and the average redundancy $\bar{R}_n(C_n, P)$ are defined as
$$R_n(C_n, P; x_1^n) = L(C_n, x_1^n) - (-\log_2 P(x_1^n)),$$
$$\bar{R}_n(C_n, P) = E[L(C_n, X_1^n)] - H_n(P) \ \ge 0.$$
The maximal or worst case redundancy is
$$R_n^*(C_n, P) = \max_{x_1^n}\{R_n(C_n, P; x_1^n)\} \ (\ge 0).$$
Huffman Code (1952):
$$\bar{R}_n(P) = \min_{C_n\in\mathcal{C}} E_{x_1^n}[L(C_n, x_1^n) + \log_2 P(x_1^n)].$$
D. E. Knuth, Dynamic Huffman Coding, J. Algorithms, 1985.

11 Redundancy: Rate of Convergence
Known Source $P$: Assume that $P$ is known to us. It is known that the shortest code length $L_{op}(x_1^n) \approx -\log P(x_1^n)$. Thus, the pointwise redundancy $R_n(C_n, P; x_1^n)$ and the average redundancy $\bar{R}_n(C_n, P)$ are defined as
$$R_n(C_n, P; x_1^n) = L(C_n, x_1^n) - (-\log_2 P(x_1^n)),$$
$$\bar{R}_n(C_n, P) = E[L(C_n, X_1^n)] - H_n(P) \ \ge 0.$$
The maximal or worst case redundancy is
$$R_n^*(C_n, P) = \max_{x_1^n}\{R_n(C_n, P; x_1^n)\} \ (\ge 0).$$
Huffman Code (1952):
$$\bar{R}_n(P) = \min_{C_n\in\mathcal{C}} E_{x_1^n}[L(C_n, x_1^n) + \log_2 P(x_1^n)].$$
D. E. Knuth, Dynamic Huffman Coding, J. Algorithms, 1985.
Question: How does the average redundancy of the Huffman code behave asymptotically as $n \to \infty$?

12 Redundancy of the Huffman Code
[Figure 1: The average redundancy of Huffman codes versus block size n for: (a) irrational $\alpha = \log_2\frac{1-p}{p}$ with $p = 1/\pi$; (b) rational $\alpha = \log_2\frac{1-p}{p}$ with $p = 1/9$.]

13 Redundancy of the Huffman Code
[Figure 1: The average redundancy of Huffman codes versus block size n for: (a) irrational $\alpha = \log_2\frac{1-p}{p}$ with $p = 1/\pi$; (b) rational $\alpha = \log_2\frac{1-p}{p}$ with $p = 1/9$.]
Theorem 1 (W.S., 2000). Consider the Huffman block code of length $n$ over a binary memoryless source in which a 1 is transmitted with probability $p < \frac{1}{2}$. Then as $n \to \infty$
$$\bar{R}_n^H = \begin{cases} \dfrac{3}{2} - \dfrac{1}{\ln 2} + o(1) \approx 0.057, & \alpha = \log_2\frac{1-p}{p} \text{ irrational},\\[6pt] \dfrac{3}{2} - \dfrac{1}{M}\left(\langle \beta M n\rangle - \dfrac{1}{2}\right) - \dfrac{1}{M(1 - 2^{-1/M})}\, 2^{-\langle n\beta M\rangle/M} + O(\rho^n), & \alpha = \dfrac{N}{M}, \end{cases}$$
where $\gcd(N, M) = 1$, $\beta = -\log_2(1-p)$, $\langle x\rangle = x - \lfloor x\rfloor$, and $\rho < 1$.
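
The two modes of Theorem 1 can be probed numerically. The sketch below is my own illustration (the choice of p and the range of block sizes are arbitrary): it builds the Huffman code over all $2^n$ blocks of a memoryless source with $p = 1/\pi$ and prints the average redundancy next to the irrational-case limit $3/2 - 1/\ln 2 \approx 0.057$.

```python
import heapq
from math import log2, comb, log, pi

def huffman_avg_length(probs):
    """Average codeword length of a binary Huffman code for the given distribution."""
    heap = [(q, i) for i, q in enumerate(probs)]
    heapq.heapify(heap)
    total, next_id = 0.0, len(probs)
    while len(heap) > 1:
        q1, _ = heapq.heappop(heap)
        q2, _ = heapq.heappop(heap)
        total += q1 + q2              # each merge adds one bit to every leaf below it
        heapq.heappush(heap, (q1 + q2, next_id))
        next_id += 1
    return total

p = 1 / pi                            # alpha = log2((1-p)/p) is irrational
limit = 1.5 - 1 / log(2)              # 3/2 - 1/ln 2 = 0.0573..., the irrational-case limit
for n in range(4, 15):
    # 2^n block probabilities, grouped by the number of ones k
    probs = [p**k * (1 - p)**(n - k) for k in range(n + 1) for _ in range(comb(n, k))]
    Hn = -sum(q * log2(q) for q in probs)
    Rn = huffman_avg_length(probs) - Hn
    print(f"n={n:2d}  average redundancy = {Rn:.4f}   (limit {limit:.4f})")
```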

14 Why Two Modes: Shannon Code
To simplify, we consider the Shannon code that assigns the length
$$L(C_n^S, x_1^n) = \lceil -\log_2 P(x_1^n)\rceil \quad \text{where} \quad P(x_1^n) = p^k (1-p)^{n-k},$$
with $p$ being the known probability of generating 0 and $k$ the number of 0s. The Shannon code redundancy is
$$\bar{R}_n^S = \sum_{k=0}^{n}\binom{n}{k} p^k (1-p)^{n-k}\left(\lceil -\log_2 (p^k(1-p)^{n-k})\rceil + \log_2 (p^k(1-p)^{n-k})\right) = \sum_{k=0}^{n}\binom{n}{k} p^k (1-p)^{n-k}\langle \alpha k + \beta n\rangle$$
$$= \begin{cases} \dfrac{1}{2} + o(1), & \alpha = \log_2\frac{1-p}{p} \text{ irrational},\\[6pt] \dfrac{1}{2} - \dfrac{1}{M}\left(\langle M n\beta\rangle - \dfrac{1}{2}\right) + O(\rho^n), & \alpha = \dfrac{N}{M} \text{ rational}, \end{cases}$$
where $\langle x\rangle = x - \lfloor x\rfloor$ is the fractional part of $x$, and
$$\alpha = \log_2\frac{1-p}{p}, \qquad \beta = \log_2\frac{1}{1-p}.$$
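
A direct numerical evaluation of the Shannon-code redundancy (a sketch of my own, using the two values of p from Figure 1; nothing here is code from the talk) shows the two modes: convergence to 1/2 when $\alpha$ is irrational, persistent oscillation when $\alpha$ is rational.

```python
from math import comb, log2, ceil, pi

def shannon_code_redundancy(n, p):
    """Average redundancy of the Shannon code, E[ceil(-log2 P) + log2 P],
    for a binary memoryless source (P(x) depends only on the number of ones k)."""
    total = 0.0
    for k in range(n + 1):
        logP = k * log2(p) + (n - k) * log2(1 - p)
        total += comb(n, k) * 2.0**logP * (ceil(-logP) + logP)
    return total

for n in (10, 50, 200, 1000):
    print(f"n={n:5d}  p=1/pi (alpha irrational): {shannon_code_redundancy(n, 1/pi):.4f}"
          f"   p=1/9 (alpha rational): {shannon_code_redundancy(n, 1/9):.4f}")
```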

15 Sketch of Proof: Sequences Modulo 1
To analyze redundancy for known sources one needs to understand the asymptotic behavior of the following sum
$$\sum_{k=0}^{n}\binom{n}{k} p^k (1-p)^{n-k}\, f(\langle \alpha k + y\rangle)$$
for fixed $p$ and some Riemann integrable function $f : [0,1] \to \mathbb{R}$. The proof follows from the following two lemmas.
Lemma 1. Let $0 < p < 1$ be a fixed real number and $\alpha$ be an irrational number. Then for every Riemann integrable function $f : [0,1]\to\mathbb{R}$
$$\lim_{n\to\infty}\sum_{k=0}^{n}\binom{n}{k} p^k (1-p)^{n-k}\, f(\langle \alpha k + y\rangle) = \int_0^1 f(t)\,dt,$$
where the convergence is uniform for all shifts $y \in \mathbb{R}$.
Lemma 2. Let $\alpha = \frac{N}{M}$ be a rational number with $\gcd(N, M) = 1$. Then for every bounded function $f : [0,1]\to\mathbb{R}$
$$\sum_{k=0}^{n}\binom{n}{k} p^k (1-p)^{n-k}\, f(\langle \alpha k + y\rangle) = \frac{1}{M}\sum_{l=0}^{M-1} f\!\left(\frac{l}{M} + \frac{\langle M y\rangle}{M}\right) + O(\rho^n)$$
uniformly for all $y \in \mathbb{R}$ and some $\rho < 1$.

16 Outline Update
1. Shannon & Knuth Legacy
2. Huffman Code and Its Redundancy
3. Universal Codes and Lambert's Function
   Finite Alphabet
   Unbounded Alphabet
4. Graph Compression and Knuth's Recurrences

17 Minimax Redundancy For Unknown Sources
Following Davisson, the average and the maximal minimax redundancy for a family of sources $\mathcal{S}$ are:
$$\bar{R}_n(\mathcal{S}) = \min_{C_n}\sup_{P\in\mathcal{S}} E[L(C_n, X_1^n) + \log P(X_1^n)],$$
$$R_n^*(\mathcal{S}) = \min_{C_n}\sup_{P\in\mathcal{S}}\max_{x_1^n}[L(C_n, x_1^n) + \log P(x_1^n)].$$

18 Minimax Redundancy For Unknown Sources
Following Davisson, the average and the maximal minimax redundancy for a family of sources $\mathcal{S}$ are:
$$\bar{R}_n(\mathcal{S}) = \min_{C_n}\sup_{P\in\mathcal{S}} E[L(C_n, X_1^n) + \log P(X_1^n)],$$
$$R_n^*(\mathcal{S}) = \min_{C_n}\sup_{P\in\mathcal{S}}\max_{x_1^n}[L(C_n, x_1^n) + \log P(x_1^n)].$$
Shtarkov's Bound for the maximal minimax $R_n^*(\mathcal{S})$ (using the maximum likelihood distribution):
$$d_n(\mathcal{S}) := \log \sum_{x_1^n\in\mathcal{A}^n}\sup_{P\in\mathcal{S}} P(x_1^n) \;\le\; R_n^*(\mathcal{S}) \;\le\; \log \underbrace{\sum_{x_1^n\in\mathcal{A}^n}\sup_{P\in\mathcal{S}} P(x_1^n)}_{D_n(\mathcal{S})} + 1,$$
using the following maximum likelihood distribution
$$Q^*(x_1^n) := \frac{\sup_{P\in\mathcal{S}} P(x_1^n)}{\sum_{y_1^n\in\mathcal{A}^n}\sup_{P\in\mathcal{S}} P(y_1^n)}.$$
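
For the binary memoryless class the Shtarkov sum can be computed directly, since $\sup_P P(x_1^n)$ depends only on the number of ones in $x_1^n$. The following minimal sketch (illustration only, not from the talk) evaluates $\log_2 D_n$ and recalls the sandwich $\log_2 D_n \le R_n^* \le \log_2 D_n + 1$.

```python
from math import comb, log2

def shtarkov_sum_binary(n):
    """D_n for the class of binary memoryless sources:
    sup_p P(x_1^n) = (k/n)^k ((n-k)/n)^(n-k), where k is the number of ones."""
    return sum(comb(n, k) * (k / n)**k * ((n - k) / n)**(n - k) for k in range(n + 1))

for n in (10, 100, 1000):
    Dn = shtarkov_sum_binary(n)
    print(f"n={n:5d}  log2 D_n = {log2(Dn):.3f}  "
          f"(Shtarkov: log2 D_n <= R_n* <= log2 D_n + 1)")
```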

19 Maximal Minimax for Memoryless Sources
We shall analyze $D_n(\mathcal{M}_0)$ for the memoryless source class $\mathcal{M}_0$ over the alphabet $\mathcal{A} = \{1,2,\ldots,m\}$ with symbol probabilities $p_i$, $i = 1,\ldots,m$. Observe
$$P(x_1^n) = p_1^{k_1}\cdots p_m^{k_m}, \qquad k_1 + \cdots + k_m = n,$$
where $k_i$ is the number of occurrences of symbol $i$ in $x_1^n$. Since
$$\sup_{p_1,\ldots,p_m} P(x_1^n) = \sup_{p_1,\ldots,p_m} p_1^{k_1}\cdots p_m^{k_m} = \left(\frac{k_1}{n}\right)^{k_1}\cdots\left(\frac{k_m}{n}\right)^{k_m},$$
we find
$$D_n(\mathcal{M}_0) := \sum_{x_1^n}\sup P(x_1^n) = \sum_{k_1+\cdots+k_m=n}\binom{n}{k_1,\ldots,k_m}\sup_{p_1,\ldots,p_m} p_1^{k_1}\cdots p_m^{k_m} = \sum_{k_1+\cdots+k_m=n}\binom{n}{k_1,\ldots,k_m}\left(\frac{k_1}{n}\right)^{k_1}\cdots\left(\frac{k_m}{n}\right)^{k_m}.$$

20 Tree Generating Function for $D_n(\mathcal{M}_0)$
We write
$$D_n(\mathcal{M}_0) = \sum_{k_1+\cdots+k_m=n}\binom{n}{k_1,\ldots,k_m}\left(\frac{k_1}{n}\right)^{k_1}\cdots\left(\frac{k_m}{n}\right)^{k_m} = \frac{n!}{n^n}\sum_{k_1+\cdots+k_m=n}\frac{k_1^{k_1}}{k_1!}\cdots\frac{k_m^{k_m}}{k_m!}.$$
Let us introduce a tree-generating function
$$B(z) = \sum_{k=0}^{\infty}\frac{k^k}{k!}z^k = \frac{1}{1 - T(z)}, \qquad T(z) = \sum_{k=1}^{\infty}\frac{k^{k-1}}{k!}z^k,$$
where $T(z) = z e^{T(z)}$ ($= -W(-z)$, Lambert's $W$-function) enumerates all rooted labeled trees. Let now
$$D_m(z) = \sum_{n=0}^{\infty}\frac{z^n n^n}{n!}\, D_n(\mathcal{M}_0).$$
Then by the convolution formula: $D_m(z) = [B(z)]^m$.

21 Tree Generating Function for $D_n(\mathcal{M}_0)$
We write
$$D_n(\mathcal{M}_0) = \sum_{k_1+\cdots+k_m=n}\binom{n}{k_1,\ldots,k_m}\left(\frac{k_1}{n}\right)^{k_1}\cdots\left(\frac{k_m}{n}\right)^{k_m} = \frac{n!}{n^n}\sum_{k_1+\cdots+k_m=n}\frac{k_1^{k_1}}{k_1!}\cdots\frac{k_m^{k_m}}{k_m!}.$$
Let us introduce a tree-generating function
$$B(z) = \sum_{k=0}^{\infty}\frac{k^k}{k!}z^k = \frac{1}{1 - T(z)}, \qquad T(z) = \sum_{k=1}^{\infty}\frac{k^{k-1}}{k!}z^k,$$
where $T(z) = z e^{T(z)}$ ($= -W(-z)$, Lambert's $W$-function) enumerates all rooted labeled trees. Let now
$$D_m(z) = \sum_{n=0}^{\infty}\frac{z^n n^n}{n!}\, D_n(\mathcal{M}_0).$$
Then by the convolution formula: $D_m(z) = [B(z)]^m$.
D. E. Knuth and B. Pittel, A Recurrence Related to Trees, Proc. AMS, 1989.
D. E. Knuth, et al., On the Lambert W Function, Adv. Comp. Math., 1996.
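
The convolution identity can be checked numerically for small n and m. The sketch below is my own illustration (the function names are made up): it compares the direct multinomial sum for $D_n(\mathcal{M}_0)$ with $\frac{n!}{n^n}[z^n]B(z)^m$, obtained by convolving the coefficient sequence $k^k/k!$ with itself m times.

```python
from math import factorial
from itertools import product

def dn_direct(n, m):
    """D_n(M_0) = sum over k_1+...+k_m=n of multinomial(n;k) * prod (k_i/n)^k_i."""
    total = 0.0
    for ks in product(range(n + 1), repeat=m):
        if sum(ks) != n:
            continue
        mult, term = factorial(n), 1.0
        for k in ks:
            mult //= factorial(k)      # exact integer division at every step
            term *= (k / n)**k         # 0^0 = 1 by convention
        total += mult * term
    return total

def dn_via_tree_function(n, m):
    """D_n(M_0) = (n!/n^n) [z^n] B(z)^m with B(z) = sum_k (k^k/k!) z^k."""
    b = [k**k / factorial(k) for k in range(n + 1)]    # 0^0 = 1
    coeffs = [1.0] + [0.0] * n                         # B(z)^0 = 1
    for _ in range(m):                                 # multiply by B(z), truncated at z^n
        coeffs = [sum(coeffs[j] * b[i - j] for j in range(i + 1)) for i in range(n + 1)]
    return factorial(n) / n**n * coeffs[n]

for n, m in [(4, 2), (5, 3), (6, 4)]:
    print(n, m, round(dn_direct(n, m), 6), round(dn_via_tree_function(n, m), 6))
```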

22 Outline Update
1. Shannon & Knuth Legacy
2. Huffman Code and Its Redundancy
3. Universal Codes and Lambert's Function
   Finite Alphabet
   Unbounded Alphabet
4. Graph Compression and Knuth's Recurrences

23 Asymptotics for FINITE m
The function $B(z)$ has an algebraic singularity at $z = e^{-1}$, and
$$\beta(z) = B(z/e) = \frac{1}{\sqrt{2(1-z)}} + \frac{1}{3} + O(\sqrt{1-z}).$$
By Cauchy's coefficient formula
$$D_n(\mathcal{M}_0) = \frac{n!}{n^n}[z^n][B(z)]^m = \sqrt{2\pi n}\,(1 + O(1/n))\,\frac{1}{2\pi i}\oint \frac{\beta(z)^m}{z^{n+1}}\,dz.$$
For finite $m$, the singularity analysis of Flajolet and Odlyzko,
$$[z^n](1-z)^{-\alpha} \sim \frac{n^{\alpha-1}}{\Gamma(\alpha)}, \qquad \alpha \notin \{0,-1,-2,\ldots\},$$
finally yields (cf. W.S., 1998)
$$R_n^*(\mathcal{M}_0) = \frac{m-1}{2}\log\frac{n}{2} + \log\frac{\sqrt{\pi}}{\Gamma(m/2)} + \frac{\Gamma(m/2)\,m}{3\,\Gamma(\frac{m}{2}-\frac{1}{2})}\cdot\frac{\sqrt{2}}{\sqrt{n}} + \left(\frac{3 + m(m-2)(2m+1)}{36} - \frac{\Gamma^2(m/2)\,m^2}{9\,\Gamma^2(\frac{m}{2}-\frac{1}{2})}\right)\frac{1}{n} + \cdots$$
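
A quick check of this expansion is possible with a short sketch (mine, not from the talk): only the first two asymptotic terms are used below, and the exact value of $\log_2 D_n$ is computed through the tree-function convolution of the previous slide (in logarithms, to avoid underflow of $n!/n^n$).

```python
from math import factorial, lgamma, log2, log, pi

def log2_dn_exact(n, m):
    """Exact log2 D_n(M_0) via (n!/n^n)[z^n]B(z)^m, B(z) = sum_k (k^k/k!) z^k."""
    b = [k**k / factorial(k) for k in range(n + 1)]
    coeffs = [1.0] + [0.0] * n
    for _ in range(m):
        coeffs = [sum(coeffs[j] * b[i - j] for j in range(i + 1)) for i in range(n + 1)]
    return lgamma(n + 1) / log(2) - n * log2(n) + log2(coeffs[n])

def log2_dn_asym(n, m):
    """First two terms of the expansion: (m-1)/2 log(n/2) + log(sqrt(pi)/Gamma(m/2))."""
    return (m - 1) / 2 * log2(n / 2) + (0.5 * log(pi) - lgamma(m / 2)) / log(2)

for m in (2, 3, 5):
    for n in (50, 200, 500):
        print(f"m={m} n={n:4d}  exact={log2_dn_exact(n, m):7.3f}  "
              f"asymptotic={log2_dn_asym(n, m):7.3f}")
```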

24 Outline Update
1. Shannon & Knuth Legacy
2. Huffman Code and Its Redundancy
3. Universal Codes and Lambert's Function
   Finite Alphabet
   Unbounded Alphabet
4. Graph Compression and Knuth's Recurrences

25 Redundancy for UNBOUNDED m
Now assume that $m$ is unbounded and may vary with $n$. Then
$$D_{n,m}(\mathcal{M}_0) = \sqrt{2\pi n}\,\frac{1}{2\pi i}\oint \frac{\beta(z)^m}{z^{n+1}}\,dz = \sqrt{2\pi n}\,\frac{1}{2\pi i}\oint e^{g(z)}\,dz,$$
where $g(z) = m\ln\beta(z) - (n+1)\ln z$. The saddle point $z_0$ is a solution of $g'(z_0) = 0$, where $0 \le z_0 \le 1$.

26 Redundancy for UNBOUNDED m
Now assume that $m$ is unbounded and may vary with $n$. Then
$$D_{n,m}(\mathcal{M}_0) = \sqrt{2\pi n}\,\frac{1}{2\pi i}\oint \frac{\beta(z)^m}{z^{n+1}}\,dz = \sqrt{2\pi n}\,\frac{1}{2\pi i}\oint e^{g(z)}\,dz,$$
where $g(z) = m\ln\beta(z) - (n+1)\ln z$. The saddle point $z_0$ is a solution of $g'(z_0) = 0$, where $0 \le z_0 \le 1$.
[Figure: location of the saddle point $z_0$ in the three regimes m = o(n), m = n, n = o(m).]

27 Redundancy for UNBOUNDED m
Now assume that $m$ is unbounded and may vary with $n$. Then
$$D_{n,m}(\mathcal{M}_0) = \sqrt{2\pi n}\,\frac{1}{2\pi i}\oint \frac{\beta(z)^m}{z^{n+1}}\,dz = \sqrt{2\pi n}\,\frac{1}{2\pi i}\oint e^{g(z)}\,dz,$$
where $g(z) = m\ln\beta(z) - (n+1)\ln z$. The saddle point $z_0$ is a solution of $g'(z_0) = 0$, where $0 \le z_0 \le 1$.
[Figure: location of the saddle point $z_0$ in the three regimes m = o(n), m = n, n = o(m).]
D. Greene, D. E. Knuth, Mathematics for the Analysis of Algorithms, 1990.

28 Main Results for LARGE m
Theorem 2 (W.S. and Weinberger, 2010). For memoryless sources $\mathcal{M}_0$ over an $m$-ary alphabet, where $m\to\infty$ as $n$ grows, we have:
(i) For $m = o(n)$
$$R_{n,m}^*(\mathcal{M}_0) = \frac{m-1}{2}\log\frac{n}{m} + \frac{m}{2}\log e + \frac{m\log e}{3}\sqrt{\frac{m}{n}} - \frac{1}{2} + O\!\left(\sqrt{\frac{m}{n}}\right).$$
(ii) For $m = \alpha n + \ell(n)$, where $\alpha$ is a positive constant and $\ell(n) = o(n)$,
$$R_{n,m}^*(\mathcal{M}_0) = n\log B_\alpha + \ell(n)\log C_\alpha - \frac{1}{2}\log A_\alpha + O(\ell(n)^2/n),$$
where $C_\alpha := \frac{1}{2} + \frac{1}{2}\sqrt{1 + \frac{4}{\alpha}}$, $A_\alpha := C_\alpha + \frac{2}{\alpha}$, $B_\alpha := \alpha C_\alpha^{\alpha+2} e^{-\frac{1}{C_\alpha}}$.
(iii) For $n = o(m)$
$$R_{n,m}^*(\mathcal{M}_0) = n\log\frac{m}{n} + \frac{3n^2}{2m}\log e - \frac{3n}{2m}\log e + O\!\left(\frac{1}{n} + \frac{n^3}{m^2}\right).$$

29 Outline Update
1. Shannon & Knuth Legacy
2. Huffman Code and Its Redundancy
3. Universal Codes and Lambert's Function
4. Graph Compression and Knuth's Recurrences

30 Graph and Structural Entropies
A structure model $S$ of a graph $G$ is defined as its unlabeled version.
[Figure: eight labeled graphs $G_1,\ldots,G_8$ and the structures $S_1, S_2, \ldots$ they collapse to.]
The probability of a structure $S$ is $P(S) = N(S)\,P(G)$, where $N(S)$ is the number of labeled graphs with the same structure.
$$H_G = E[-\log P(G)] = -\sum_{G\in\mathcal{G}} P(G)\log P(G) \quad \text{(graph entropy)},$$
$$H_S = E[-\log P(S)] = -\sum_{S\in\mathcal{S}} P(S)\log P(S) \quad \text{(structural entropy)}.$$

31 Graph and Structural Entropies
A structure model $S$ of a graph $G$ is defined as its unlabeled version.
[Figure: eight labeled graphs $G_1,\ldots,G_8$ and the structures $S_1, S_2, \ldots$ they collapse to.]
The probability of a structure $S$ is $P(S) = N(S)\,P(G)$, where $N(S)$ is the number of labeled graphs with the same structure.
$$H_G = E[-\log P(G)] = -\sum_{G\in\mathcal{G}} P(G)\log P(G) \quad \text{(graph entropy)},$$
$$H_S = E[-\log P(S)] = -\sum_{S\in\mathcal{S}} P(S)\log P(S) \quad \text{(structural entropy)}.$$
Graph Automorphism: For a graph $G$, an automorphism is an adjacency-preserving permutation of the vertices of $G$.
$$H_S = H_G - \log n! + \sum_{S\in\mathcal{S}} P(S)\log|\mathrm{Aut}(S)|, \qquad N(S) = \frac{n!}{|\mathrm{Aut}(S)|}.$$
[Figure: a small graph on vertices a, b, c, d, e illustrating $N(S) = n!/|\mathrm{Aut}(S)|$.]
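
The identity $N(S) = n!/|\mathrm{Aut}(S)|$ is easy to verify by brute force for a tiny graph. In the sketch below the 5-vertex example is made up (the slide's own picture is not reproduced here): it counts adjacency-preserving permutations and the number of distinct labeled copies.

```python
from itertools import permutations
from math import factorial

def aut_size(n, edges):
    """Number of adjacency-preserving permutations of {0,...,n-1} (brute force)."""
    E = {frozenset(e) for e in edges}
    return sum(all(frozenset((s[u], s[v])) in E for u, v in edges)
               for s in permutations(range(n)))

def labeled_copies(n, edges):
    """Number of distinct labeled graphs isomorphic to the given one."""
    images = set()
    for s in permutations(range(n)):
        images.add(frozenset(frozenset((s[u], s[v])) for u, v in edges))
    return len(images)

# hypothetical example: a path 0-1-2-3 with a pendant edge 2-4
n, edges = 5, [(0, 1), (1, 2), (2, 3), (2, 4)]
a = aut_size(n, edges)
print(f"|Aut(S)| = {a},  n!/|Aut(S)| = {factorial(n) // a},  labeled copies = {labeled_copies(n, edges)}")
```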

32 Erdős–Rényi Graph Model
Erdős and Rényi model: $\mathcal{G}(n,p)$ generates graphs with $n$ vertices, where edges are chosen independently with probability $p$:
$$P(G) = p^k (1-p)^{\binom{n}{2} - k},$$
where $k$ is the number of edges of $G$.
Lemma (Kim et al., 2002). For Erdős–Rényi graphs, $P(|\mathrm{Aut}(G)| = 1) \to 1$.

33 Erdős–Rényi Graph Model
Erdős and Rényi model: $\mathcal{G}(n,p)$ generates graphs with $n$ vertices, where edges are chosen independently with probability $p$:
$$P(G) = p^k (1-p)^{\binom{n}{2} - k},$$
where $k$ is the number of edges of $G$.
Lemma (Kim et al., 2002). For Erdős–Rényi graphs, $P(|\mathrm{Aut}(G)| = 1) \to 1$.
Theorem 3 (Y. Choi and W.S., 2012). (i) For large $n$ and all $p$ satisfying $\frac{\ln n}{n} \ll p$ and $1 - p \gg \frac{\ln n}{n}$ (i.e., the graph is connected w.h.p.),
$$H_S = \binom{n}{2} h(p) - \log n! + o(1) = \binom{n}{2} h(p) - n\log n + n\log e - \frac{1}{2}\log n + O(1),$$
where $h(p) = -p\log p - (1-p)\log(1-p)$ is the entropy rate.
(ii) CONVERSE: There is an algorithm, called SZIP, whose code length $L(S)$ is upper bounded by
$$E[L(S)] \le \binom{n}{2} h(p) - n\log n + n(c + \Phi(\log n)) + o(n).$$

34 Erdős–Rényi Graph Model
Erdős and Rényi model: $\mathcal{G}(n,p)$ generates graphs with $n$ vertices, where edges are chosen independently with probability $p$:
$$P(G) = p^k (1-p)^{\binom{n}{2} - k},$$
where $k$ is the number of edges of $G$.
Lemma (Kim et al., 2002). For Erdős–Rényi graphs, $P(|\mathrm{Aut}(G)| = 1) \to 1$.
Theorem 3 (Y. Choi and W.S., 2012). (i) For large $n$ and all $p$ satisfying $\frac{\ln n}{n} \ll p$ and $1 - p \gg \frac{\ln n}{n}$ (i.e., the graph is connected w.h.p.),
$$H_S = \binom{n}{2} h(p) - \log n! + o(1) = \binom{n}{2} h(p) - n\log n + n\log e - \frac{1}{2}\log n + O(1),$$
where $h(p) = -p\log p - (1-p)\log(1-p)$ is the entropy rate.
(ii) CONVERSE: There is an algorithm, called SZIP, whose code length $L(S)$ is upper bounded by
$$E[L(S)] \le \binom{n}{2} h(p) - n\log n + n(c + \Phi(\log n)) + o(n).$$
Sketch of Proof: $N(S) = \frac{n!}{|\mathrm{Aut}(S)|}$ and $\sum_{S\in\mathcal{S}} P(S)\log|\mathrm{Aut}(S)| = o(1)$.
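
The relation $H_S = H_G - \log n! + \sum_S P(S)\log|\mathrm{Aut}(S)|$ holds exactly for $\mathcal{G}(n,p)$, since all labeled copies of a structure are equally likely. The following sketch (illustrative; n = 4 and p = 0.3 are arbitrary choices of mine) verifies it by enumerating all $2^6$ labeled graphs on four vertices.

```python
from itertools import combinations, permutations
from math import log2, factorial

n, p = 4, 0.3
pairs = list(combinations(range(n), 2))            # the 6 possible edges

def canon(edge_set):
    """Canonical form of a labeled graph: lexicographically smallest relabeling."""
    best = None
    for sigma in permutations(range(n)):
        img = tuple(sorted(tuple(sorted((sigma[u], sigma[v]))) for u, v in edge_set))
        if best is None or img < best:
            best = img
    return best

H_G, struct_prob = 0.0, {}
for mask in range(1 << len(pairs)):
    edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
    prob = p**len(edges) * (1 - p)**(len(pairs) - len(edges))   # P(G) = p^k (1-p)^(C(n,2)-k)
    H_G -= prob * log2(prob)
    c = canon(edges)
    struct_prob[c] = struct_prob.get(c, 0.0) + prob

H_S = -sum(q * log2(q) for q in struct_prob.values())

def aut_size(edges):
    """Brute-force |Aut(G)| for a graph given by a list of edges."""
    E = set(tuple(sorted(e)) for e in edges)
    return sum(all(tuple(sorted((s[u], s[v]))) in E for u, v in edges)
               for s in permutations(range(n)))

aut_term = sum(q * log2(aut_size(list(c))) for c, q in struct_prob.items())
print(f"H_G = {H_G:.4f}")
print(f"H_S = {H_S:.4f}")
print(f"H_G - log2(4!) + E[log|Aut|] = {H_G - log2(factorial(n)) + aut_term:.4f}")
```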

35 Structural Zip (SZIP) Algorithm

36 Recurrences for E[B₁] and E[B₂]
Let $N_x$ be the number of vertices that passed through node $x$ in $T_n$.
[Figure: the tree $T_n$ built from the vertex set {a,b,c,d,e,f,g,h,j}, with each node labeled by the subset of vertices passing through it.]
$$B_1 = \sum_{x\in T_n,\ N_x > 1}\log(N_x + 1), \qquad B_2 = \sum_{x\in T_n,\ N_x = 1}\log(N_x + 1) = \sum_{x\in T_n,\ N_x = 1} 1.$$

37 Recurrences for E[B₁] and E[B₂]
Let $N_x$ be the number of vertices that passed through node $x$ in $T_n$.
[Figure: the tree $T_n$ built from the vertex set {a,b,c,d,e,f,g,h,j}, with each node labeled by the subset of vertices passing through it.]
$$B_1 = \sum_{x\in T_n,\ N_x > 1}\log(N_x + 1), \qquad B_2 = \sum_{x\in T_n,\ N_x = 1}\log(N_x + 1) = \sum_{x\in T_n,\ N_x = 1} 1.$$
Both $E[B_1]$ and $E[B_2]$ satisfy two-dimensional recurrences: for some $d \ge 0$
$$b(n+1, 0) = n + \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\,[\,b(k,0) + b(n-k,k)\,],$$
$$b(n, d) = n + \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\,[\,b(k,d-1) + b(n-k,k+d-1)\,], \qquad \text{for } d \ge 1.$$
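
A direct implementation of this two-dimensional recurrence is straightforward with memoization. The sketch below is only an illustration: the boundary values b(0,d) = b(1,d) = 0 and the reading of the d = 0 case (which the slide states for b(n+1,0)) are my assumptions, and p = 0.3 is arbitrary.

```python
from functools import lru_cache
from math import comb

p, q = 0.3, 0.7   # assumed symbol probabilities, chosen for illustration only

@lru_cache(maxsize=None)
def b(n, d):
    """Two-dimensional recurrence from the slide.
    Assumptions (not stated on the slide): b(0,d) = b(1,d) = 0, and the d = 0
    case -- written on the slide for b(n+1,0) -- is evaluated here by shifting n."""
    if n <= 1:
        return 0.0
    if d == 0:
        m = n - 1   # b(m+1,0) = m + sum_k C(m,k) p^k q^(m-k) [b(k,0) + b(m-k,k)]
        return m + sum(comb(m, k) * p**k * q**(m - k) * (b(k, 0) + b(m - k, k))
                       for k in range(m + 1))
    return n + sum(comb(n, k) * p**k * q**(n - k) * (b(k, d - 1) + b(n - k, k + d - 1))
                   for k in range(n + 1))

for n in (5, 10, 20, 40):
    print(f"n={n:2d}   b(n,0) = {b(n, 0):10.3f}   b(n,2) = {b(n, 2):10.3f}")
```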

38 Regular Tries: d = ∞ (Knuth, 1968)
Regular Trie Recurrence: set $d = \infty$:
$$b(n,\infty) = n + \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\,[\,b(k,\infty) + b(n-k,\infty)\,].$$
Asymptotically (Knuth '70, Jacquet '88, W.S. '89):
$$b(n,\infty) = \frac{1}{h}\, n\log n + \frac{1}{h}\left[\gamma + \frac{h_2}{2h} + \Phi(\log_p n)\right] n + o(n),$$
where $\Phi(x)$ is the periodic function $\Phi(x) = \sum_{k=-\infty,\,k\ne 0}^{\infty}\Gamma\!\left(\frac{2k\pi i r}{\log p}\right) e^{2k\pi r i x}$; when $\log p/\log(1-p)$ is irrational, then $\Phi(x)\to 0$ as $x\to\infty$.

39 Regular Tries: d = ∞ (Knuth, 1968)
Regular Trie Recurrence: set $d = \infty$:
$$b(n,\infty) = n + \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\,[\,b(k,\infty) + b(n-k,\infty)\,].$$
Asymptotically (Knuth '70, Jacquet '88, W.S. '89):
$$b(n,\infty) = \frac{1}{h}\, n\log n + \frac{1}{h}\left[\gamma + \frac{h_2}{2h} + \Phi(\log_p n)\right] n + o(n),$$
where $\Phi(x)$ is the periodic function $\Phi(x) = \sum_{k=-\infty,\,k\ne 0}^{\infty}\Gamma\!\left(\frac{2k\pi i r}{\log p}\right) e^{2k\pi r i x}$; when $\log p/\log(1-p)$ is irrational, then $\Phi(x)\to 0$ as $x\to\infty$.
D. E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, 1973.

40 Regular Tries: d = ∞ (Knuth, 1968)
Regular Trie Recurrence: set $d = \infty$:
$$b(n,\infty) = n + \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\,[\,b(k,\infty) + b(n-k,\infty)\,].$$
Asymptotically (Knuth '70, Jacquet '88, W.S. '89):
$$b(n,\infty) = \frac{1}{h}\, n\log n + \frac{1}{h}\left[\gamma + \frac{h_2}{2h} + \Phi(\log_p n)\right] n + o(n),$$
where $\Phi(x)$ is the periodic function $\Phi(x) = \sum_{k=-\infty,\,k\ne 0}^{\infty}\Gamma\!\left(\frac{2k\pi i r}{\log p}\right) e^{2k\pi r i x}$; when $\log p/\log(1-p)$ is irrational, then $\Phi(x)\to 0$ as $x\to\infty$.
D. E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, 1973.
Define $\bar b(n,d) := b(n,d) - b(n,\infty)$. Then we have our main result.
Theorem 4. For $n\to\infty$ and $d = O(1)$ we have $\bar b(n,d) = O(\log^2 n)$; that is,
$$\bar b(n,d) = \frac{1}{2h\log p}\log^2 n + \frac{d}{h}\log n + \left[\frac{1}{2h} + \frac{1}{h\log p}\left(\gamma - \frac{h_2}{2h}\right) + \Psi(\log_p n)\right]\log n + O(1),$$
where $\Psi(\cdot)$ is a periodic function, as above.
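
For d = ∞ the recurrence can be solved exactly by pulling the k = 0 and k = n terms (which contain b(n,∞) itself) to the left-hand side. The sketch below is my own illustration (boundary b(0) = b(1) = 0 and p = 0.3 assumed); it compares the exact solution with the leading term (1/h) n log n from the slide.

```python
from math import comb, log2

def trie_cost(N, p):
    """Solve b(n) = n + sum_k C(n,k) p^k q^(n-k) [b(k) + b(n-k)], with b(0) = b(1) = 0,
    by moving the k = 0 and k = n terms (which contain b(n)) to the left-hand side."""
    q = 1 - p
    b = [0.0, 0.0]
    for n in range(2, N + 1):
        rhs = float(n)
        for k in range(n + 1):
            w = comb(n, k) * p**k * q**(n - k)
            rhs += w * ((b[k] if k < n else 0.0) + (b[n - k] if k > 0 else 0.0))
        b.append(rhs / (1.0 - p**n - q**n))
    return b

p = 0.3
h = -(p * log2(p) + (1 - p) * log2(1 - p))      # entropy of the source in bits
cost = trie_cost(200, p)
for n in (10, 50, 100, 200):
    print(f"n={n:3d}   b(n,inf) = {cost[n]:10.2f}   (1/h) n log2 n = {n * log2(n) / h:10.2f}")
```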

41 Sketch of Proof
1. We analyze $b^*(n,d)$ instead of $b(n,d)$, satisfying
$$b^*(n,d) = \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\, b^*(k, d-1).$$
2. The Poisson transform $A_d(z) = \sum_{n\ge 2} b^*(n,d)\frac{z^n}{n!}e^{-z}$ satisfies the functional recurrence
$$A_d(z) = A_{d-1}(pz),$$
which can be solved as $A_d(z) = A_0(p^d z)$.
3. Define the Mellin transform $M(s) = \int_0^{\infty} A_0(z)\, z^{s-1}\,dz$, which leads to the following functional equation:
$$(s-1)\,M(s-1) + (1 - p^{-s})\,M(s) = \frac{(s-1)\,\Gamma(s)}{1 - p^{1-s} - q^{1-s}}.$$
This can be solved explicitly for $M(s)$: the solution is an infinite sum over $i \ge 0$ whose terms involve the products $\prod_{k}(1 - p^{k-s})$ and the denominators $1 - p^{1+i-s} - q^{1+i-s}$. Residue theory and depoissonization complete the proof.

42 Standing on the Shoulders of Giants...
