Analytic Algorithmics, Combinatorics, and Information Theory


1 Analytic Algorithmics, Combinatorics, and Information Theory
W. Szpankowski
Department of Computer Science, Purdue University, W. Lafayette, IN
September 11, 2006
Research supported by NSF and NIH. Joint work with M. Drmota, P. Flajolet, and P. Jacquet.

2 Outline
1. Basic Facts about Source Coding
2. The Redundancy Rate Problem
(a) Known Sources (Sequences mod 1)
(b) Universal Memoryless Sources (Tree-like generating functions)
(c) Universal Markov Sources (Balance matrices, Eulerian paths)
(d) Universal Renewal Sources (Combinatorial calculus, Mellin, asymptotics)
3. Conclusions: Information Beyond Shannon
(a) Computer Science and Information Theory Interplay
(b) Today's Challenges
(c) Vision

3 Goals of Source Coding
The basic problem of source coding (i.e., data compression) is to find codes with the shortest descriptions (lengths), either on average or for individual sequences, when the source (i.e., the statistics of the underlying probability distribution) is unknown (the so-called universal source coding).
Goals:
Find a universal lower bound on the compression ratio (bit rate).
Construct universal source codes that achieve this lower bound up to second-order asymptotics (i.e., match the redundancy, which is basically a measure of the second-order term).
As pointed out by Rissanen, universal coding evolved into universal modeling, where the purpose is no longer restricted to coding but rather to finding optimal models for data.

4 Some Definitions
Definition: A block-to-variable (BV) length code $C_n : \mathcal{A}^n \to \{0,1\}^*$ is a bijective mapping from the set of all sequences of length $n$ over the alphabet $\mathcal{A}$ to the set $\{0,1\}^*$ of binary sequences.
For a probabilistic source model $S$ and a code $C_n$ we let:
$P(x_1^n)$ be the probability of $x_1^n = x_1 \cdots x_n$;
$L(C_n, x_1^n)$ be the code length for $x_1^n$;
Entropy: $H_n(P) = -\sum_{x_1^n} P(x_1^n)\, \lg P(x_1^n)$.
Information-theoretic quantities are expressed in binary logarithms, written $\lg := \log_2$.

5 Two Facts
A prefix code is a code in which no codeword is a prefix of another codeword.
Kraft's Inequality: a prefix code with codeword lengths $l_1, l_2, \ldots, l_m$ exists if and only if
$$\sum_{i=1}^{m} 2^{-l_i} \le 1.$$
Proof: an easy exercise on trees.
Shannon's First Theorem: for any prefix code the average code length $E[L(C_n, X_1^n)]$ cannot be smaller than the entropy of the source $H_n(P)$, that is,
$$E[L(C_n, X_1^n)] \ge H_n(P).$$
Proof: use $\log x \le x - 1$ for $0 < x \le 1$ and Kraft's inequality.
Optimal length: minimizing $E[L(C_n, X_1^n)]$ over codes $C_n$ subject to Kraft's inequality gives the (ideal) lengths $\tilde{l}(x_1^n) = -\log P(x_1^n)$.
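A quick numerical check of both facts (a minimal sketch; the Bernoulli source, the block length, and all helper names are illustrative assumptions, not part of the talk): Shannon-type lengths $\lceil \lg 1/P(x_1^n)\rceil$ satisfy Kraft's inequality, and their average stays within one bit of the block entropy $H_n(P)$.

```python
import itertools
import math

def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality: sum of 2^(-l_i)."""
    return sum(2.0 ** -l for l in lengths)

def shannon_lengths(p, n):
    """Lengths ceil(lg 1/P(x)) for all binary strings of length n
    under a memoryless Bernoulli(p) source (p = probability of a '1')."""
    probs, lengths = [], []
    for x in itertools.product("01", repeat=n):
        k = x.count("1")
        prob = (p ** k) * ((1 - p) ** (n - k))
        probs.append(prob)
        lengths.append(math.ceil(-math.log2(prob)))
    return probs, lengths

p, n = 0.3, 8                      # illustrative values
probs, lengths = shannon_lengths(p, n)
entropy = -sum(q * math.log2(q) for q in probs)
avg_len = sum(q * l for q, l in zip(probs, lengths))
print("Kraft sum          :", kraft_sum(lengths))   # <= 1, so a prefix code with these lengths exists
print("block entropy H_n  :", entropy)
print("average code length:", avg_len)              # H_n <= E[L] < H_n + 1
```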

6 Outline Update
1. Basic Facts of Source Coding
2. The Redundancy Rate Problem
(a) Known Sources (Sequences mod 1)
(b) Universal Memoryless Sources
(c) Universal Markov Sources
(d) Universal Renewal Sources
3. Conclusions

7 Redundancy: Known Source P
The pointwise redundancy $R_n(C_n, P; x_1^n)$ and the average redundancy $\bar{R}_n(C_n, P)$ are defined as
$$R_n(C_n, P; x_1^n) = L(C_n, x_1^n) + \lg P(x_1^n),$$
$$\bar{R}_n(C_n, P) = E[L(C_n, X_1^n)] - H_n(P) \ge 0.$$
The maximal or worst-case redundancy is
$$R_n^*(C_n, P) = \max_{x_1^n} R_n(C_n, P; x_1^n) \ (\ge 0).$$
The pointwise redundancy can be negative; the maximal and average redundancies cannot. The smaller the redundancy, the better (closer to optimal) the code.

8 Redundancy for Known Sources
Huffman Code. The following optimization problem is solved by Huffman's code:
$$\bar{R}_n(P) = \min_{C_n \in \mathcal{C}} E_{x_1^n}\big[L(C_n, x_1^n) + \log_2 P(x_1^n)\big].$$
Generalized Shannon Code. Drmota and W.S. (2001) proved that
$$R_n^*(P) = \min_{C_n} \max_{x_1^n}\big[L(C_n, x_1^n) + \lg P(x_1^n)\big]$$
is solved by the generalized Shannon code $C_n^{GS}$, defined as
$$L(C_n^{GS}, x_1^n) = \begin{cases} \lfloor \lg 1/P(x_1^n) \rfloor & \text{if } \langle \lg 1/P(x_1^n)\rangle \le s_0, \\ \lceil \lg 1/P(x_1^n) \rceil & \text{if } \langle \lg 1/P(x_1^n)\rangle > s_0, \end{cases}$$
where $s_0$ is a constant and $\langle x \rangle = x - \lfloor x \rfloor$ denotes the fractional part of $x$.

9 Two Modes of $\bar{R}_n^H(P)$ Behavior
Figure: the average redundancy of Huffman codes versus block size $n$ for (a) irrational $\alpha = \log_2((1-p)/p)$ with $p = 1/\pi$; (b) rational $\alpha = \log_2((1-p)/p)$ with $p = 1/9$.

10 Why Two Modes: Shannon Code
Consider the Shannon code that assigns the length $L(C_n^S, x_1^n) = \lceil \lg 1/P(x_1^n) \rceil$ to the source sequence $x_1^n$. Observe that $P(x_1^n) = p^k (1-p)^{n-k}$, where $p$ is the known probability of generating a 0 and $k$ is the number of 0s.
The Shannon code redundancy is
$$\bar{R}_n^S = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \left( \big\lceil -\log_2\!\big(p^k(1-p)^{n-k}\big)\big\rceil + \log_2\!\big(p^k(1-p)^{n-k}\big) \right) = 1 - \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \langle \alpha k + \beta n \rangle,$$
where $\langle x \rangle = x - \lfloor x \rfloor$ is the fractional part of $x$, and
$$\alpha = \log_2\!\left(\frac{1-p}{p}\right), \qquad \beta = \log_2\!\left(\frac{1}{1-p}\right).$$
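The last formula is easy to evaluate exactly, which makes the two modes visible numerically (a minimal sketch; the sample values of $p$ follow the figure on the previous slide, and the function name is an illustrative assumption):

```python
import math
from math import comb, ceil, log2

def shannon_avg_redundancy(p, n):
    """Average redundancy E[L] - H_n of the Shannon code L(x) = ceil(lg 1/P(x))
    for a binary memoryless source; p is the probability of a 0 and k counts 0s."""
    total = 0.0
    for k in range(n + 1):
        logP = k * log2(p) + (n - k) * log2(1 - p)
        total += comb(n, k) * (p ** k) * ((1 - p) ** (n - k)) * (ceil(-logP) + logP)
    return total

# p = 1/pi: alpha = lg((1-p)/p) is irrational, the redundancy converges;
# p = 1/9 : alpha = 3 is rational, the redundancy keeps oscillating with n.
for p in (1 / math.pi, 1 / 9):
    vals = [shannon_avg_redundancy(p, n) for n in range(50, 60)]
    print(f"p = {p:.4f}:", " ".join(f"{v:.4f}" for v in vals))
```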

11 Sequences Modulo 1
Asymptotic redundancy depends on the behavior of the sequence $x_k = \alpha k + \beta n$. One needs asymptotics of the following sum:
$$\sum_{k=0}^{n}\binom{n}{k}p^k(1-p)^{n-k}\, f(\langle x_k + y\rangle) \to \begin{cases}\displaystyle\int_0^1 f(t)\,dt & \text{if } x_k \text{ is a B.u.d. mod 1 sequence},\\[1mm] \text{oscillating} & \text{otherwise},\end{cases}$$
for fixed $p$ and some Riemann integrable function $f:[0,1]\to\mathbb{R}$.
For the Huffman code the following is known [W.S. (2000)]:
$$\bar{R}_n^H = \begin{cases}\dfrac{3}{2}-\dfrac{1}{\ln 2}+o(1) & \alpha \text{ irrational},\\[2mm] \dfrac{3}{2}-\dfrac{1}{M}\left(\langle \beta M n\rangle-\dfrac{1}{2}\right)-\dfrac{2^{-\langle \beta M n\rangle/M}}{M\left(1-2^{-1/M}\right)}+O(\rho^n) & \alpha=\dfrac{N}{M},\end{cases}$$
where $N, M$ are integers such that $\gcd(N, M)=1$ and $\rho<1$.
For the generalized Shannon code, Drmota and W.S. (2004) proved that, when $\alpha$ is irrational,
$$R_n^*(P) = \lg\frac{1}{\ln 2} + o(1) \approx 0.5287 + o(1).$$

12 Outline Update
1. Basic Facts of Source Coding
2. The Redundancy Rate Problem
(a) Known Sources
(b) Unknown Sources (memoryless, Markov, renewal)
3. Conclusions

13 Minimax Redundancy: Unknown Source P
In practice, one can only hope to have some knowledge about a family of sources $\mathcal{S}$ that generate real data.
Following Davisson, we define the average minimax redundancy $\bar{R}_n(\mathcal{S})$ and the worst-case (maximal) minimax redundancy $R_n^*(\mathcal{S})$ for a family of sources $\mathcal{S}$ as
$$\bar{R}_n(\mathcal{S}) = \min_{C_n} \sup_{P \in \mathcal{S}} E\big[L(C_n, X_1^n) + \lg P(X_1^n)\big],$$
$$R_n^*(\mathcal{S}) = \min_{C_n} \sup_{P \in \mathcal{S}} \max_{x_1^n} \big[L(C_n, x_1^n) + \lg P(x_1^n)\big].$$
In the minimax scenario we look for the best code for the worst source.
Source Coding Goal: Find data compression algorithms that match the optimal redundancy rates either on average or for individual sequences.

14 Another Representation for the Minimax Redundancy
For the maximal minimax redundancy define the maximum-likelihood distribution
$$Q^*(x_1^n) := \frac{\sup_{P\in\mathcal{S}} P(x_1^n)}{\sum_{y_1^n\in\mathcal{A}^n} \sup_{P\in\mathcal{S}} P(y_1^n)}.$$
Observe that
$$R_n^*(\mathcal{S}) = \min_{C_n\in\mathcal{C}} \sup_{P\in\mathcal{S}} \max_{x_1^n}\big(L(C_n, x_1^n) + \lg P(x_1^n)\big)$$
$$= \min_{C_n\in\mathcal{C}} \max_{x_1^n}\Big(L(C_n, x_1^n) + \sup_{P\in\mathcal{S}} \lg P(x_1^n)\Big)$$
$$= \min_{C_n\in\mathcal{C}} \max_{x_1^n}\Big[L(C_n, x_1^n) + \lg Q^*(x_1^n) + \lg\!\Big(\sum_{y_1^n\in\mathcal{A}^n}\sup_{P\in\mathcal{S}} P(y_1^n)\Big)\Big]$$
$$= R_n^{GS}(Q^*) + \lg\!\Big(\sum_{y_1^n\in\mathcal{A}^n}\sup_{P\in\mathcal{S}} P(y_1^n)\Big),$$
with $R_n^{GS}(Q^*)$ being the maximal redundancy of a generalized Shannon code built for $Q^*$. Define
$$D_n(\mathcal{S}) = \lg\!\Big(\sum_{x_1^n\in\mathcal{A}^n}\sup_{P\in\mathcal{S}} P(x_1^n)\Big) =: \lg d_n(\mathcal{S}).$$

15 Maximal Minimax Redundancy
We consider the following classes of sources $\mathcal{S}$:
Memoryless sources $\mathcal{M}_0$ over an $m$-ary (finite) alphabet, that is, $P(x_1^n) = p_1^{k_1}\cdots p_m^{k_m}$ with $k_1+\cdots+k_m = n$, where the $p_i$ are unknown!
Markov sources $\mathcal{M}_r$ over a binary alphabet, of order $r$. Observe that for $r = 1$
$$P(x_1^n) = p_{x_1}\, p_{00}^{k_{00}}\, p_{01}^{k_{01}}\, p_{10}^{k_{10}}\, p_{11}^{k_{11}},$$
where $k_{ij}$ is the number of pairs $(x_k, x_{k+1}) = (i,j) \in \{0,1\}^2$ and $k_{00}+k_{01}+k_{10}+k_{11} = n-1$; note that $k_{01} = k_{10}$ if $x_1 = x_n$ and $k_{01} = k_{10} \pm 1$ if $x_1 \ne x_n$.
Renewal sources $\mathcal{R}_0$, where a 1 is introduced after a run of 0's whose length is distributed according to some distribution.

16 Outline Update
1. Basic Facts of Source Coding
2. The Redundancy Rate Problem
(a) Known Sources
(b) Universal Memoryless Sources
(c) Universal Markov Sources
(d) Universal Renewal Sources
3. Conclusions

17 Maximal Minimax for Memoryless Sources
We first consider the maximal minimax redundancy $R_n^*(\mathcal{M}_0)$ for the class of memoryless sources over a finite $m$-ary alphabet. Observe that
$$d_n(\mathcal{M}_0) = \sum_{x_1^n} \sup_{p_1,\ldots,p_m} p_1^{k_1}\cdots p_m^{k_m} = \sum_{k_1+\cdots+k_m=n} \binom{n}{k_1,\ldots,k_m} \sup_{p_1,\ldots,p_m} p_1^{k_1}\cdots p_m^{k_m} = \sum_{k_1+\cdots+k_m=n} \binom{n}{k_1,\ldots,k_m} \left(\frac{k_1}{n}\right)^{k_1}\!\!\cdots\left(\frac{k_m}{n}\right)^{k_m}.$$
The summation set is $I(k_1,\ldots,k_m) = \{(k_1,\ldots,k_m) : k_1+\cdots+k_m = n\}$.
The number $N_{\mathbf{k}}$ of sequences of type $\mathbf{k} = (k_1,\ldots,k_m)$ is $N_{\mathbf{k}} = \binom{n}{k_1,\ldots,k_m}$.
The (unnormalized) maximum likelihood is
$$\sup_{p_1,\ldots,p_m} p_1^{k_1}\cdots p_m^{k_m} = \left(\frac{k_1}{n}\right)^{k_1}\!\!\cdots\left(\frac{k_m}{n}\right)^{k_m}.$$
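For moderate $n$ the sum $d_n(\mathcal{M}_0)$ can be computed exactly by running over all types (a minimal sketch; the recursion, the function name, and the sample values of $n$ and $m$ are illustrative assumptions):

```python
from math import comb, log2

def d_n_memoryless(n, m):
    """Shtarkov sum d_n(M_0): sum over types (k_1,...,k_m) with k_1+...+k_m = n of
    multinomial(n; k_1,...,k_m) * prod_i (k_i/n)^{k_i}."""
    def rec(symbols_left, positions_left, weight):
        if symbols_left == 1:
            k = positions_left
            return weight * ((k / n) ** k if k > 0 else 1.0)
        total = 0.0
        for k in range(positions_left + 1):
            term = weight * comb(positions_left, k) * ((k / n) ** k if k > 0 else 1.0)
            total += rec(symbols_left - 1, positions_left - k, term)
        return total
    return rec(m, n, 1.0)

for m in (2, 3):
    for n in (10, 100, 500):
        print(f"m = {m}, n = {n}: lg d_n =", log2(d_n_memoryless(n, m)))
```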

18 Generating Function for $d_n(\mathcal{M}_0)$
We write
$$d_n(\mathcal{M}_0) = \frac{n!}{n^n}\sum_{k_1+\cdots+k_m=n} \frac{k_1^{k_1}}{k_1!}\cdots\frac{k_m^{k_m}}{k_m!}.$$
Let us introduce the tree-generating function
$$B(z) = \sum_{k=0}^{\infty}\frac{k^k}{k!}z^k = \frac{1}{1-T(z)},$$
where $T(z)$ satisfies $T(z) = z e^{T(z)}$ (i.e., $T(z) = -W(-z)$ with Lambert's $W$-function) and
$$T(z) = \sum_{k=1}^{\infty}\frac{k^{k-1}}{k!}z^k$$
enumerates all rooted labeled trees.
Let now
$$D_m(z) = \sum_{n=0}^{\infty}\frac{n^n}{n!}\, d_n(\mathcal{M}_0)\, z^n.$$
Then, by the convolution formula, $D_m(z) = [B(z)]^m$.
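A quick consistency check of the convolution formula for the binary case $m = 2$ (a minimal sketch; the truncation order $N$ and all names are illustrative assumptions, and the two columns printed below should agree):

```python
from math import comb, factorial

def tree_B_coeffs(N):
    """Coefficients of B(z) = sum_{k>=0} k^k/k! z^k (with 0^0 = 1), up to z^N."""
    return [1.0] + [k ** k / factorial(k) for k in range(1, N + 1)]

def series_mul(a, b, N):
    """Product of two power series (coefficient lists), truncated at order N."""
    c = [0.0] * (N + 1)
    for i, ai in enumerate(a):
        for j in range(min(len(b), N + 1 - i)):
            c[i + j] += ai * b[j]
    return c

N, m = 30, 2
B = tree_B_coeffs(N)
D = [1.0] + [0.0] * N
for _ in range(m):
    D = series_mul(D, B, N)                     # D_m(z) = [B(z)]^m

for n in (5, 10, 20, 30):
    d_from_gf = factorial(n) / n ** n * D[n]    # d_n = n!/n^n [z^n] B(z)^m
    d_direct = sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                   for k in range(n + 1))       # direct type sum for m = 2 (0^0 = 1)
    print(n, d_from_gf, d_direct)
```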

19 Asymptotics
The function $B(z)$ has an algebraic singularity at $z = e^{-1}$ (it becomes a multi-valued function), and one finds
$$B(z) = \frac{1}{\sqrt{2(1-ez)}} + \frac{1}{3} + O\big(\sqrt{1-ez}\big).$$
The singularity analysis of Flajolet & Odlyzko yields (cf. W.S., 1998)
$$\lg d_n(\mathcal{M}_0) = \frac{m-1}{2}\log\frac{n}{2} + \log\frac{\sqrt{\pi}}{\Gamma(\frac{m}{2})} + \frac{\Gamma(\frac{m}{2})\,m}{3\,\Gamma(\frac{m}{2}-\frac{1}{2})}\cdot\frac{\sqrt{2}}{\sqrt{n}} + \left(\frac{3 + m(m-2)(2m+1)}{36} - \frac{\Gamma^2(\frac{m}{2})\,m^2}{9\,\Gamma^2(\frac{m}{2}-\frac{1}{2})}\right)\frac{1}{n} + \cdots$$
To complete the analysis, we need $R_n^{GS}(Q^*)$. Drmota & W.S. (2001) proved that $R_n^{GS}(Q^*)$ converges to an explicitly computable constant depending only on $m$; in general, the $o(1)$ error term cannot be improved. Thus
$$R_n^*(\mathcal{M}_0) = \frac{m-1}{2}\log\frac{n}{2} + \log\frac{\sqrt{\pi}}{\Gamma(\frac{m}{2})} + \lim_{n\to\infty} R_n^{GS}(Q^*) + o(1).$$

20 Outline Update
1. Basic Facts of Source Coding
2. The Redundancy Rate Problem
(a) Known Sources
(b) Universal Memoryless Sources
(c) Universal Markov Sources
(d) Universal Renewal Sources
3. Conclusions

21 Maximal Minimax for Markov Sources
(i) Consider now the class $\mathcal{M}_1$ of Markov sources of order $r = 1$, (ii) with transition matrix $\mathbf{P} = \{p_{ij}\}_{i,j=1}^m$, (iii) and circular sequences (the first symbol follows the last). Then
$$d_n(\mathcal{M}_1) = \sum_{x_1^n} \sup_{\mathbf{P}} p_{11}^{k_{11}}\cdots p_{mm}^{k_{mm}} = \sum_{\mathbf{k}\in\mathcal{F}_n} M_{\mathbf{k}} \left(\frac{k_{11}}{k_1}\right)^{k_{11}}\!\!\cdots\left(\frac{k_{mm}}{k_m}\right)^{k_{mm}},$$
where $M_{\mathbf{k}}$ is the number of strings $x_1^n$ of type $\mathbf{k}$, $k_{ij}$ is the number of pairs $ij\in\mathcal{A}^2$ in $x_1^n$, and $k_i = \sum_{j=1}^m k_{ij}$.
The matrix $\mathbf{k} = \{k_{ij}\}_{i,j=1}^m$ is an integer matrix satisfying the balance property
$$\mathcal{F}_n: \qquad \sum_{i,j=1}^m k_{ij} = n, \qquad \sum_{j=1}^m k_{ij} = \sum_{j=1}^m k_{ji} \ \text{ for every } i.$$
A matrix $\mathbf{k}$ satisfying the above conditions is called the frequency matrix or the Markov type matrix.

22 Markov Types
For a binary Markov source, $P(x_1^n) = p_{00}^{k_{00}}\, p_{01}^{k_{01}}\, p_{10}^{k_{10}}\, p_{11}^{k_{11}}$; thus all strings $x_1^n$ with the same matrix $\mathbf{k} = \{k_{ij}\}_{i,j\in\mathcal{A}}$ have the same empirical distribution. These sequences belong to the same type $\mathbf{k}$.
Let $N_{\mathbf{k}}^a$ be the number of (cyclic) strings $x_1^n$ belonging to the same type $\mathbf{k}$ and starting with the symbol $a$.
Observe: $N_{\mathbf{k}}^a$ is equal to the number of Eulerian paths (starting from vertex $a\in\mathcal{A}$) in a multigraph built on $|\mathcal{A}|$ vertices with $k_{ij}$ edges between the $i$th and the $j$th vertices.
Example: Let $\mathcal{A} = \{0, 1\}$ and $\mathbf{k} = \ldots$
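For small $n$ the type classes can be enumerated by brute force, which makes the correspondence between $N_{\mathbf{k}}^a$ and Eulerian paths easy to inspect (a minimal sketch; the length $n = 8$ and the helper names are illustrative assumptions):

```python
from collections import Counter
from itertools import product

def markov_type(x):
    """Frequency (Markov type) matrix of a cyclic binary string x:
    counts of pairs (x_t, x_{t+1}), with the first symbol following the last."""
    pairs = Counter((x[t], x[(t + 1) % len(x)]) for t in range(len(x)))
    return tuple(pairs[(i, j)] for i in (0, 1) for j in (0, 1))   # (k00, k01, k10, k11)

n = 8
counts = Counter()                                  # N_k^a, indexed by (first symbol a, type k)
for x in product((0, 1), repeat=n):
    counts[(x[0], markov_type(x))] += 1

for (a, k), N in sorted(counts.items())[:6]:
    print(f"a = {a}, (k00,k01,k10,k11) = {k}: N_k^a = {N}")
print("distinct binary Markov types for n =", n, ":", len({k for (_, k) in counts}))
```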

23 Main Technical Tool
Let $g_{\mathbf{k}}$ be a sequence of scalars indexed by matrices $\mathbf{k}$, let $g(\mathbf{z}) = \sum_{\mathbf{k}} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}}$ be its regular generating function, and let
$$\mathcal{F}g(\mathbf{z}) = \sum_{\mathbf{k}\in\mathcal{F}} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}} = \sum_{n\ge 0} \sum_{\mathbf{k}\in\mathcal{F}_n} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}}$$
be the $\mathcal{F}$-generating function of $g_{\mathbf{k}}$ restricted to $\mathbf{k}\in\mathcal{F}$.
Lemma 1. Let $g(\mathbf{z}) = \sum_{\mathbf{k}} g_{\mathbf{k}}\,\mathbf{z}^{\mathbf{k}}$. Then
$$\mathcal{F}g(\mathbf{z}) := \sum_{n\ge 0}\sum_{\mathbf{k}\in\mathcal{F}_n} g_{\mathbf{k}}\,\mathbf{z}^{\mathbf{k}} = \frac{1}{(2\pi i)^m}\oint\cdots\oint \frac{dx_1}{x_1}\cdots\frac{dx_m}{x_m}\; g\!\left(\left[z_{ij}\frac{x_j}{x_i}\right]\right),$$
where the $ij$-th entry of $[z_{ij}\, x_j/x_i]$ is $z_{ij}\, x_j / x_i$.
Proof. It suffices to observe that
$$g\!\left(\left[z_{ij}\frac{x_j}{x_i}\right]\right) = \sum_{\mathbf{k}} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}} \prod_{i=1}^m x_i^{\sum_j k_{ji} - \sum_j k_{ij}}.$$
Thus $\mathcal{F}g(\mathbf{z})$ is the coefficient of $g([z_{ij}\, x_j/x_i])$ at $x_1^0 x_2^0\cdots x_m^0$.

24 Main Results
Theorem 1 (Jacquet and W.S., 2004). (i) Let $\mathcal{M}_1$ be the class of Markov sources over an $m$-ary alphabet. Then
$$d_n(\mathcal{M}_1) = \left(\frac{n}{2\pi}\right)^{m(m-1)/2} A_m \left(1 + O\!\left(\frac{1}{n}\right)\right)$$
with
$$A_m = \int_{\mathcal{K}(1)} \prod_i \frac{\sqrt{\sum_j y_{ij}}}{\prod_j \sqrt{y_{ij}}}\; F_m(y_{ij})\, d[y_{ij}],$$
where $\mathcal{K}(1) = \{y_{ij} : \sum_{ij} y_{ij} = 1\}$ and $F_m(\cdot)$ is a polynomial expression of degree $m-1$.
(ii) Let $\mathcal{M}_r$ be the class of Markov sources of order $r$. Then
$$d_n(\mathcal{M}_r) = \left(\frac{n}{2\pi}\right)^{m^r(m-1)/2} A_m^r \left(1 + O\!\left(\frac{1}{n}\right)\right),$$
where $A_m^r$ is a constant defined in a similar fashion as $A_m$ above.
In particular, for $m = 2$, $A_2 = 16\, C$, where $C$ is Catalan's constant $\sum_{i\ge 0} \frac{(-1)^i}{(2i+1)^2}$.
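For $m = 2$ the sum $d_n(\mathcal{M}_1)$ over circular strings can be checked by brute force against the leading term $(n/2\pi)\, A_2$ (a minimal sketch; the string lengths and the truncation used for Catalan's constant are illustrative assumptions, and the $O(1/n)$ correction is still visible at these sizes):

```python
from itertools import product
from math import pi

def sup_markov_prob(x):
    """sup over binary Markov transition matrices of prod_{ij} p_ij^{k_ij}
    for a circular string x, i.e. prod_{ij} (k_ij / k_i)^{k_ij}."""
    n = len(x)
    k = [[0, 0], [0, 0]]
    for t in range(n):
        k[x[t]][x[(t + 1) % n]] += 1              # cyclic pair counts
    p = 1.0
    for i in (0, 1):
        k_i = k[i][0] + k[i][1]
        for j in (0, 1):
            if k[i][j]:
                p *= (k[i][j] / k_i) ** k[i][j]
    return p

catalan = sum((-1) ** i / (2 * i + 1) ** 2 for i in range(200000))  # Catalan's constant
for n in (8, 12, 16):
    d_n = sum(sup_markov_prob(x) for x in product((0, 1), repeat=n))
    print(n, d_n, "   leading term (n/2pi)*16*C =", (n / (2 * pi)) * 16 * catalan)
```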

25 Outline Update
1. Basic Facts of Source Coding
2. The Redundancy Rate Problem
(a) Known Sources
(b) Universal Memoryless Sources
(c) Universal Markov Sources
(d) Universal Renewal Sources
3. Conclusions

26 Renewal Sources
The renewal process is defined as follows: let $T_1, T_2, \ldots$ be a sequence of i.i.d. positive-valued random variables with distribution $Q(j) = \Pr\{T_i = j\}$. The process $T_0,\ T_0 + T_1,\ T_0 + T_1 + T_2, \ldots$ is called the renewal process.
With a renewal process we associate a binary renewal sequence in which the positions of the 1's are at the renewal epochs $T_0,\ T_0 + T_1, \ldots$ (the gaps between them are runs of zeros).
Csiszár and Shields (1996) proved that $R_n^*(\mathcal{R}_0) = \Theta(\sqrt{n})$.
Theorem 2 (Flajolet and W.S., 1998). Let $c = \frac{\pi^2}{6} - 1$. Then
$$R_n^*(\mathcal{R}_0) = \frac{2}{\log 2}\sqrt{cn} + O(\log n).$$

27 Maximal Minimax Redundancy
For a sequence
$$x_0^n = 1\, 0^{\alpha_1}\, 1\, 0^{\alpha_2} \cdots 1\, 0^{\alpha_k}\underbrace{0\cdots 0}_{k^*},$$
let $k_m$ be the number of $i$ such that $\alpha_i = m$. Then
$$P(x_1^n) = Q(0)^{k_0}\, Q(1)^{k_1}\cdots Q(n-1)^{k_{n-1}}\,\Pr\{T_1 > k^*\}.$$
It can be proved that
$$r_{n+1} - 1 \le d_n(\mathcal{R}_0) \le \sum_{m=0}^{n} r_m,$$
where
$$r_n = \sum_{k=0}^{n} r_{n,k}, \qquad r_{n,k} = \sum_{\mathcal{P}(n,k)} \binom{k}{k_0, \ldots, k_{n-1}} \left(\frac{k_0}{k}\right)^{k_0}\left(\frac{k_1}{k}\right)^{k_1}\!\!\cdots\left(\frac{k_{n-1}}{k}\right)^{k_{n-1}},$$
and $\mathcal{P}(n,k)$ is the summation set, which happens to describe the partitions of $n$ into $k$ terms, i.e.,
$$n = k_0 + 2k_1 + \cdots + n k_{n-1}, \qquad k = k_0 + \cdots + k_{n-1}.$$
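For small $n$ the partition sum can be evaluated directly (a minimal sketch; the partition generator, the function names, and the range of $n$ are illustrative assumptions, and only small $n$ are practical since the number of partitions grows quickly):

```python
from math import comb

def partitions(n, k, max_part=None):
    """Partitions of n into exactly k positive parts, in non-increasing order."""
    if max_part is None:
        max_part = n
    if k == 0:
        if n == 0:
            yield []
        return
    for p in range(min(n - k + 1, max_part), 0, -1):
        for rest in partitions(n - p, k - 1, p):
            yield [p] + rest

def r(n):
    """r_n = sum_k sum over partitions of n into k parts of
    multinomial(k; k_0,...,k_{n-1}) * prod_i (k_i/k)^{k_i},
    where k_i is the multiplicity of part size i+1."""
    total = 1.0 if n == 0 else 0.0               # n = 0: only the empty partition
    for k in range(1, n + 1):
        for part in partitions(n, k):
            mult = {}
            for p in part:
                mult[p] = mult.get(p, 0) + 1
            coeff, left, weight = 1, k, 1.0
            for c in mult.values():
                coeff *= comb(left, c)           # builds k!/(k_0! ... k_{n-1}!)
                left -= c
                weight *= (c / k) ** c           # (k_i/k)^{k_i}
            total += coeff * weight
    return total

for n in range(1, 16):
    print(n, r(n))
```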

28 Main Asymptotic Results
Theorem 3 (Flajolet and W.S., 1998). The quantity $r_n$ attains the following asymptotics:
$$\lg r_n = \frac{2}{\log 2}\sqrt{cn} - \frac{5}{8}\lg n + \frac{1}{2}\lg\log n + O(1),$$
where $c = \frac{\pi^2}{6} - 1$.
The asymptotic analysis is sophisticated and follows these steps:
first, we transform $r_n$ into another quantity $s_n$; using a probabilistic technique we later recover $r_n$ from $s_n$;
by combinatorial calculus we find the generating function of $s_n$, which turns out to be an infinite product of tree functions $B(z)$;
we transform this product into a harmonic sum that can be analyzed asymptotically by the Mellin transform;
we find an asymptotic expansion of the generating function around $z = 1$;
finally, we estimate $R_n^*(\mathcal{R}_0)$ by the saddle point method.

29 Asymptotics: The Main Idea
The quantity $r_n$ is too hard to analyze directly due to the factor $k!/k^k$, hence we define a new quantity $s_n$ as
$$s_n = \sum_{k=0}^{n} s_{n,k}, \qquad s_{n,k} = e^{-k}\sum_{\mathcal{P}(n,k)} \frac{k_0^{k_0}}{k_0!}\cdots\frac{k_{n-1}^{k_{n-1}}}{k_{n-1}!}.$$
To analyze it, we introduce the random variable $K_n$ with
$$\Pr\{K_n = k\} = \frac{s_{n,k}}{s_n}.$$
Stirling's formula yields
$$\frac{r_n}{s_n} = \sum_{k=0}^{n}\frac{r_{n,k}}{s_{n,k}}\cdot\frac{s_{n,k}}{s_n} = E\!\left[K_n!\, K_n^{-K_n} e^{K_n}\right] = E\big[\sqrt{2\pi K_n}\big] + O\big(E[K_n^{-1/2}]\big).$$

30 Fundamental Lemmas
Lemma 2. Let $\mu_n = E[K_n]$ and $\sigma_n^2 = \mathrm{Var}(K_n)$. Then
$$s_n = \exp\!\left(2\sqrt{cn} - \frac{7}{8}\log n + d + o(1)\right),$$
$$\mu_n = \frac{1}{4}\sqrt{\frac{n}{c}}\,\log\frac{n}{c} + o(\sqrt{n}),$$
$$\sigma_n^2 = O(n\log n) = o(\mu_n^2),$$
where $c = \pi^2/6 - 1$ and $d = \frac{3}{8}\log c - \log 2 - \frac{3}{4}\log\pi$.
Lemma 3. For large $n$,
$$E\big[\sqrt{K_n}\big] = \mu_n^{1/2}(1 + o(1)), \qquad E[K_n^{-1/2}] = o(1),$$
where $\mu_n = E[K_n]$. Thus
$$r_n = s_n\, E\big[\sqrt{2\pi K_n}\big]\,(1 + o(1)) = s_n\sqrt{2\pi\mu_n}\,(1 + o(1)).$$

31 Sketch of a Proof: Generating Functions
1. Define $\beta(z)$ as
$$\beta(z) = \sum_{k=0}^{\infty}\frac{k^k}{k!}e^{-k}z^k.$$
By Lagrange inversion,
$$\beta(z) = \frac{1}{1 - T(ze^{-1})},$$
where the tree generating function satisfies $T(z) = ze^{T(z)}$.
2. Define
$$S_n(u) = \sum_{k=0}^{\infty} s_{n,k}\, u^k, \qquad S(z,u) = \sum_{n=0}^{\infty} S_n(u)\, z^n.$$
Since $s_{n,k}$ involves convolutions of sequences of the form $k^k/k!$, we have
$$S(z,u) = \sum_{n,k} z^{1\cdot k_0 + 2\cdot k_1 + \cdots}\left(\frac{u}{e}\right)^{k_0+\cdots+k_{n-1}}\frac{k_0^{k_0}}{k_0!}\cdots\frac{k_{n-1}^{k_{n-1}}}{k_{n-1}!} = \prod_{i=1}^{\infty}\beta(z^i u).$$
We need $s_n = [z^n]S(z,1)$, where $[z^n]F(z)$ denotes the coefficient of $z^n$ in $F(z)$.
3. Let $L(z) = \log S(z,1)$ and $z = e^{-t}$, so that
$$L(e^{-t}) = \sum_{k=1}^{\infty}\log\beta(e^{-kt}),$$
which falls under the harmonic sum paradigm $L(t) = \sum_k \lambda_k f(\mu_k t)$.
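The product $S(z,1) = \prod_{i\ge 1}\beta(z^i)$ is easy to truncate and expand numerically, which gives the exact $s_n$ for moderate $n$ and lets one compare $\log s_n$ with the exponent from Lemma 2 (a minimal sketch; the truncation order $N$ and the variable names are illustrative assumptions, and the constant $d$ is omitted from the comparison):

```python
from math import exp, factorial, log, pi, sqrt

N = 40                                                    # truncation order (illustrative)
beta = [1.0] + [k ** k * exp(-k) / factorial(k) for k in range(1, N + 1)]

def series_mul(a, b):
    """Product of two power series (coefficient lists), truncated at order N."""
    c = [0.0] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j in range(min(len(b), N + 1 - i)):
                c[i + j] += ai * b[j]
    return c

# S(z,1) = prod_{i>=1} beta(z^i); factors with i > N contribute only 1 modulo z^{N+1}
S = [1.0] + [0.0] * N
for i in range(1, N + 1):
    factor = [0.0] * (N + 1)
    for k in range(N // i + 1):
        factor[k * i] = beta[k]                           # beta(z^i): substitute z -> z^i
    S = series_mul(S, factor)

c = pi ** 2 / 6 - 1
for n in (10, 20, 40):
    s_n = S[n]
    print(n, s_n, "  log s_n =", log(s_n),
          "  2*sqrt(c*n) - (7/8)*log(n) =", 2 * sqrt(c * n) - 7 / 8 * log(n))
```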

32 Mellin Asymptotics
4. Computing the Mellin transform $L^*(s) = \mathcal{M}(L(e^{-t}); s)$ of the above harmonic sum we find
$$L^*(s) = \zeta(s)\Lambda(s),$$
where $\zeta(s) = \sum_{n\ge 1} n^{-s}$ is the Riemann zeta function and
$$\Lambda(s) = \int_0^{\infty}\log\beta(e^{-t})\, t^{s-1}\, dt.$$
This leads to the singular expansion (at $s = 1$ and $s = 0$)
$$L^*(s) \asymp \frac{\Lambda(1)}{s-1} - \frac{1}{4s^2} - \frac{\log\pi}{4s}.$$
5. By the converse mapping property of the Mellin transform,
$$L(e^{-t}) = \frac{\Lambda(1)}{t} + \frac{1}{4}\log t - \frac{1}{4}\log\pi + O(\sqrt{t}),$$
which translates into
$$L(z) = \frac{\Lambda(1)}{1-z} + \frac{1}{4}\log(1-z) - \frac{1}{4}\log\pi - \frac{1}{2}\Lambda(1) + O\big(\sqrt{1-z}\big),$$
where $c = \Lambda(1) = \pi^2/6 - 1$.
6. In summary, we just proved that, as $z \to 1$,
$$S(z,1) = e^{L(z)} = a\,(1-z)^{1/4}\exp\!\left(\frac{c}{1-z}\right)(1 + o(1)),$$
where $a = \exp\!\left(-\frac{1}{4}\log\pi - \frac{1}{2}c\right)$.
7. To extract the asymptotics of $s_n$ we apply the saddle point method.

33 Outline Update
1. Basic Facts of Source Coding
2. The Redundancy Rate Problem
(a) Universal Memoryless Sources
(b) Universal Markov Sources
(c) Universal Renewal Sources
3. Conclusions

34 Analytic Information Theory
In his 1997 Shannon Lecture, Jacob Ziv presented compelling arguments for backing off from first-order asymptotics in order to predict the behavior of real systems with finite-length descriptions.
To overcome these difficulties, we propose replacing first-order analyses by full asymptotic expansions and more accurate analyses (e.g., large deviations, central limit laws).
Following Knuth's and Hadamard's precept¹, we study information theory problems using techniques of complex analysis such as generating functions, combinatorial calculus, Rice's formula, the Mellin transform, Fourier series, sequences distributed modulo 1, saddle point methods, analytic poissonization and depoissonization, and singularity analysis.
This program, which applies complex-analytic tools to information theory, constitutes analytic information theory.
¹ The shortest path between two truths on the real line passes through the complex plane.

35 Information Theory and Computer Science Interface
The interplay between IT and CS dates back to the founding father of information theory, Claude E. Shannon. In 2003, the first Workshop on the Information Theory and Computer Science Interface was held in Chicago.
Examples of IT and CS interplay:
Lempel-Ziv schemes (Ziv, Lempel, Louchard, Jacquet, Szpankowski);
LDPC coding, Tornado and Raptor codes (Gallager, Luby, Mitzenmacher, Shokrollahi, Urbanke);
List-decoding algorithms for error-correcting codes (Gallager, Sudan, Guruswami, Koetter, Vardy);
Kolmogorov complexity (Kolmogorov, Cover, Li, Vitanyi, Lempel, Ziv);
Analytic information theory (Jacquet, Flajolet, Drmota, Savari, Szpankowski);
Quantum computing and information (Shor, Grover, Schumacher, Bennett, Deutsch, Calderbank);
Network coding and wireless computing (Kumar, Yang, Effros, Bruck, Ephremides, Shroff, Verdu).

36 Information Beyond Shannon
Participants of the Information Beyond Shannon workshop (Orlando, 2005) listed the following research issues:
Delay: In computer networks, the delay incurred is a nontrivial issue not yet addressed in information theory (e.g., complete information arriving late may be useless).
Space: In networks, the spatially distributed components raise fundamental issues of limitations in information exchange, since the available resources must be shared, allocated and re-used.
Information and Control: Again in networks, our objective is to reliably send data with high bit rate and small delay (control). For example, in wireless/ad-hoc networks, information is exchanged in space and time for decision making; thus timeliness of information delivery, along with reliability and complexity, constitutes the basic objective.
Dynamic information: In a complex network in a space-time-control environment (e.g., in the human brain, information is not simply communicated but also processed), how can the consideration of such dynamical sources be incorporated into the Shannon-theoretic model?

37 Today's Challenges
We still lack measures and meters to define and appraise the amount of structure and organization embodied in artifacts and natural objects.
Information accumulates at a rate faster than it can be sifted through, so that the bottleneck, traditionally represented by the medium, is drifting towards the receiving end of the channel.
Timeliness, space and control are important dimensions of Information. Time- and space-varying situations are rarely studied in Shannon Information Theory.
In a growing number of situations, the overhead in accessing Information makes information itself practically unattainable or obsolete.
Microscopic systems do not seem to obey Shannon's postulates of Information. In the quantum world and on the level of living cells, traditional Information often fails to accurately describe reality.

38 Vision
Perhaps it is time to initiate an Information Science Institute integrating research and teaching activities aimed at investigating the role of information from various viewpoints: from the fundamental theoretical underpinnings of information to the science and engineering of novel information substrates, biological pathways, communication networks, economics, and complex social systems.
The specific means and goals for the Center are:
initiate the Science Lecture Series on Information to collectively ponder short- and long-term goals;
study dynamic information theory that extends information theory to time- and space-varying situations;
advance information algorithmics that develops new algorithms and data structures for the application of information;
encourage and facilitate interdisciplinary collaborations;
provide scholarships and fellowships for the best students, and support the development of new interdisciplinary courses.
