Analytic Information Theory: Redundancy Rate Problems

1 Analytic Information Theory: Redundancy Rate Problems. W. Szpankowski, Department of Computer Science, Purdue University, W. Lafayette, IN. November 3, 2015. Research supported by the NSF Science & Technology Center and the Humboldt Foundation.

2 Outline
1. Shannon Information Theory: Three Theorems of Shannon
2. Glance at Channel Coding Theorem: Asymptotic Equipartition Theorem, Jointly Typical Sequences, Capacity
3. Source Coding
4. Redundancy: Known Sources: Shannon and Huffman Coding (sequences modulo 1); Non-Prefix Codes (saddle point method); Tunstall Code (Mellin transform)
5. Minimax Redundancy: Universal (Unknown) Sources: (a) Universal Memoryless Sources (tree-like generating functions); (b) Universal Markov Sources (balance matrices); (c) Universal Renewal Sources (combinatorial calculus)
6. Appendix A: Mellin Transform
7. Appendix B: Saddle Point Method

3 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem Asymptotic Equipartition Theorem Jointly typical Sequences Capacity 3. Source Coding 4. Redundancy: Known Sources 5. Minimax Redundancy: Universal (unknown) Sources

4 Three Jewels of Shannon
Theorem 1 & 3 [Shannon 1948; Lossless & Lossy Data Compression]. Lossless compression: compression bit rate ≥ source entropy H(X). Lossy compression: for distortion level D, lossy bit rate ≥ rate distortion function R(D).
Theorem 2 [Shannon 1948; Channel Coding]. In Shannon's words: "It is possible to send information at the capacity through the channel with as small a frequency of errors as desired by proper (long) encoding. This statement is not true for any rate greater than the capacity."

5 Theorem 1: AEP and Typical Sequences
Shannon-McMillan-Breiman: −(1/n) log P(X_1^n) → H(X) (pr.), where H(X) is the entropy rate.
Asymptotic Equipartition Property: sequences of length n can be partitioned into a good set G_n^ε, with P(w) ≈ 2^{−nH(X)} for w ∈ G_n^ε, and a bad set B_n^ε, with P(B_n^ε) < ε. Also, |G_n^ε| ≈ 2^{nH(X)}.

6 Theorem 2: Shannon Random Decoding Rule
There are 2^{nH(X)} X-typical sequences; there are 2^{nH(Y)} Y-typical sequences; there are 2^{nH(X,Y)} jointly (X,Y)-typical pairs of sequences.
Decoding rule: declare that the sequence sent, X, is the one that is jointly typical with the received sequence Y, provided there is a unique X satisfying this property!

7 Sketch of Proof: Channel Capacity Theorem
1. With high probability (whp), there is a jointly typical pair (X,Y).
2. The probability that there is another jointly typical pair is 2^{−nI(X;Y)}. Indeed: there are 2^{nH(X)} and 2^{nH(Y)} typical sequences X^n and Y^n, respectively, and 2^{nH(X,Y)} jointly typical pairs (X,Y); hence the probability that a fixed pair is jointly typical is 2^{nH(X,Y)} · 2^{−n(H(X)+H(Y))} = 2^{−nI(X;Y)}.
3. The probability of error when 2^{nR} messages are sent is approximately
min P(error) ≈ 2^{−n(sup_{P(X)} I(X;Y) − R)} = 2^{−n(C − R)}.
4. In conclusion: R < C implies P(error) ≤ 2^{−nδ}; R > C implies P(error) → 1.

8 Capacity of BSC
Capacity: I(X;Y) = H(Y) − H(Y|X) = H(Y) − H(p) ≤ 1 − H(p). The capacity is achieved for the uniform input distribution; thus C = 1 − H(p).
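A one-line numerical check of this formula (our illustration, not part of the slides):

```python
from math import log2

def H(p):
    """Binary entropy in bits; H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - H(p)

for p in (0.0, 0.11, 0.5):
    print(f"p = {p:.2f}  C = {bsc_capacity(p):.4f} bits/use")
```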

9 Temporal Capacity for BSC (Exercise)
1. Each bit incurs a delay T with known probability distribution F(t) = P(T < t). If a bit arrives after a given deadline τ, it is dropped.
2. The longer it takes to send a bit, the lower the probability of success, which we denote by Φ(ε,t) for t < τ (e.g., Φ(ε,t) = (1 − ε)^t).
3. Define P(x|x) = ∫_0^τ Φ(ε,t) dF(t), the probability of a successful transmission, and the channel
P(y|x) = α := 1 − F(τ) if y = erasure; P(x|x) if y = x; 1 − α − P(x|x) if y ≠ x.
Define α = 1 − F(τ) and ρ := P(x|x)/(1 − α). Then H(Y|X) = H(α) + (1 − α)H(ρ), H(Y) = H(α) + (1 − α)H(pρ + (1 − p)(1 − ρ)), and
C(τ) = [1 − P(T > τ)] [1 − H(ρ)].
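For concreteness, a small numerical sketch of the exercise (our illustration; the exponential delay with rate lam and the choice Φ(ε,t) = (1 − ε)^t are assumptions, not from the slides):

```python
# Evaluate C(tau) = [1 - P(T > tau)][1 - H(rho)] for an assumed exponential delay.
from math import exp, log2
from scipy.integrate import quad

def H(p):
    return 0.0 if p in (0.0, 1.0) else -p*log2(p) - (1-p)*log2(1-p)

def temporal_capacity(tau, eps, lam):
    F = lambda t: 1 - exp(-lam * t)              # F(t) = P(T < t)
    phi = lambda t: (1 - eps) ** t               # success probability for delay t
    # P(x|x) = int_0^tau Phi(eps,t) dF(t), with dF(t) = lam * exp(-lam t) dt
    p_xx, _ = quad(lambda t: phi(t) * lam * exp(-lam * t), 0, tau)
    alpha = 1 - F(tau)                           # erasure probability
    rho = p_xx / (1 - alpha)                     # success probability given not erased
    return (1 - alpha) * (1 - H(rho))

for tau in (0.5, 1.0, 2.0, 5.0):
    print(tau, round(temporal_capacity(tau, eps=0.05, lam=1.0), 4))
```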

12 Noisy Constrained Channel
Let S denote the set of binary constrained sequences of length n. Here: S_{d,k} = {(d,k) sequences}, i.e., no sequence contains a run of zeros shorter than d or longer than k. A sequence X ∈ S_{(d,k)} can be represented as a Markov process.
The noisy constrained capacity C(S,ε) is defined as
C(S,ε) = sup_{X∈S} I(X;Y) = lim_{n→∞} (1/n) sup_{X_1^n ∈ S} I(X_1^n; Y_1^n).
This is/was an open problem since Shannon.

13 Entropy of HMM
1. Mutual information I(X;Y) = H(Y) − H(Y|X), where Y_k = X_k ⊕ E_k, k ≥ 1 (⊕ is exclusive-or), and E = {E_k}_{k≥1} is a binary i.i.d. sequence representing noise such that P(E_i = 1) = ε.
2. Observe that H(Y|X) = H(E) = H(ε); hence we must find the entropy H(Y), that is, the entropy of a hidden Markov process, since a (d,k) sequence can be generated as the output of a kth-order Markov process.
How to compute the entropy H(Y)?

14 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources 5. Minimax Redundancy: Universal (unknown) Sources

15 Source Coding
A source code is a bijective mapping C : A* → {0,1}* from sequences over the alphabet A to the set {0,1}* of binary sequences. The basic problem of source coding (i.e., data compression) is to find codes with the shortest descriptions (lengths), either on average or for individual sequences.
Three basic types of source coding: Fixed-to-Variable (FV) length codes (e.g., Huffman and Shannon codes); Variable-to-Fixed (VF) length codes (e.g., Tunstall and Khodak codes); Variable-to-Variable (VV) length codes (e.g., Khodak VV code).

16 Preliminary Results
A prefix code is such that no codeword is a prefix of another codeword (tree and lattice representations).
Notation: for a source model S and a code C we let P(x) be the probability of x ∈ A*; L(C,x) be the code length for the source sequence x ∈ A*; and the entropy H(P) = −Σ_{x∈A*} P(x) lg P(x). Quantities are expressed in binary logarithms, written lg := log_2.

17 Prefix Codes
Kraft's Inequality. A binary code is a prefix code iff the code lengths l_1, l_2, ..., l_N satisfy
Σ_{i=1}^N 2^{−l_i} ≤ 1
(equivalently, counting descendants at level l_max = max_i l_i: Σ_i 2^{l_max − l_i} ≤ 2^{l_max}).
Barron's lemma. For any sequence a_n of positive constants satisfying Σ_n 2^{−a_n} < ∞,
Pr{L(X) < −log P(X) − a_n} ≤ 2^{−a_n}, and therefore L(X) ≥ −log P(X) − a_n (a.s.).
Proof: We argue as follows:
Pr{L(X) < −log_2 P(X) − a_n} = Σ_{x: P(x) < 2^{−L(x)−a_n}} P(x) ≤ Σ_{x: P(x) < 2^{−L(x)−a_n}} 2^{−L(x)−a_n} ≤ 2^{−a_n} Σ_x 2^{−L(x)} ≤ 2^{−a_n}.
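A tiny illustration (ours) of checking Kraft's inequality for a proposed set of code lengths:

```python
def kraft_sum(lengths):
    """Return sum_i 2^{-l_i}; a binary prefix code with these lengths exists iff the sum is <= 1."""
    return sum(2.0 ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> realizable, e.g. 0, 10, 110, 111
print(kraft_sum([1, 1, 2]))      # 1.25 -> no prefix code with these lengths
```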

18 Shannon Lower Bound
Shannon's First Theorem. For any prefix code, the average code length E[L(C,X)] cannot be smaller than the entropy of the source H(P), that is, E[L(C_n,X)] ≥ H(P).
Proof: Let K = Σ_x 2^{−L(x)} ≤ 1 and write L(x) := L(C,x). Then
E[L(C,X)] − H(P) = Σ_{x∈A*} P(x) L(x) + Σ_{x∈A*} P(x) log P(x) = Σ_{x∈A*} P(x) log[ P(x) / (2^{−L(x)}/K) ] − log K ≥ 0,
since log x ≤ x − 1 for 0 < x ≤ 1 (i.e., the divergence is nonnegative), while K ≤ 1 by Kraft's inequality.
Exercise: There exists at least one sequence x̃_1^n such that L(x̃_1^n) ≥ −log_2 P(x̃_1^n).

19 Redundancy: Known Source P
The pointwise redundancy R(x) and the average redundancy R̄:
R(x) = L(C,x) + lg P(x),   R̄ = E[L(C,X)] − H(P) ≥ 0.
Optimal code: min_L Σ_x L(x) P(x) subject to Σ_x 2^{−L(x)} ≤ 1. Solution: by Lagrangian multipliers we find L^{opt}(x) = −lg P(x).
The smaller the redundancy is, the better (closer to optimal) the code is.

20 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources Shannon and Huffman Coding Non-Prefix Codes Tunstall Code 5. Minimax Redundancy: Universal (unknown) Sources

21 Redundancy for Huffman's Code
We consider fixed-to-variable length codes; in particular, Huffman's code. For a known source P, we consider fixed-length sequences x_1^n = x_1 ... x_n.
Huffman code: the following optimization problem is solved by Huffman's code:
R̄_n = min_{C_n ∈ C} E_{x_1^n}[ L(C_n, x_1^n) + log_2 P(x_1^n) ].
We study the average redundancy for a binary memoryless source with p denoting the probability of generating 0 and q = 1 − p. In 1994 Stubley proposed the following expression for Huffman's average redundancy:
R̄_n^H = 2 − Σ_{k=0}^n C(n,k) p^k q^{n−k} ⟨αk + βn⟩ − 2 Σ_{k=0}^n C(n,k) p^k q^{n−k} 2^{−⟨αk+βn⟩} + o(1),
where α = log_2((1−p)/p), β = log_2(1/(1−p)), and ⟨x⟩ = x − ⌊x⌋ is the fractional part of x.

22 Main Result
Theorem 1 (W.S., 2000). Consider the Huffman block code of length n over a binary memoryless source with p < 1/2. Then, as n → ∞,
R̄_n^H = 3/2 − 1/ln 2 + o(1) ≈ 0.057   if α is irrational;
R̄_n^H = 3/2 − (1/M)( ⟨βMn⟩ − 1/2 ) − 2^{−⟨nβM⟩/M} / ( M(1 − 2^{−1/M}) ) + O(ρ^n)   if α = N/M,
where N, M are integers such that gcd(N,M) = 1 and ρ < 1.
Figure 1: The average redundancy of Huffman codes versus block size n for (a) irrational α = log_2((1−p)/p) with p = 1/π; (b) rational α = log_2((1−p)/p) with p = 1/9.

23 Why Two Modes: Shannon Code
Consider the Shannon code that assigns the length L(C_n^S, x_1^n) = ⌈−lg P(x_1^n)⌉ to the source sequence x_1^n. Observe that P(x_1^n) = p^k (1−p)^{n−k}, where p is the known probability of generating 0 and k is the number of 0s. The Shannon code redundancy is
R̄_n^S = Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} ( ⌈−log_2(p^k (1−p)^{n−k})⌉ + log_2(p^k (1−p)^{n−k}) )
= 1 − Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} ⟨αk + βn⟩,
where ⟨x⟩ = x − ⌊x⌋ is the fractional part of x, and α = log_2((1−p)/p), β = log_2(1/(1−p)).
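A short numerical illustration (ours) of the two modes: the sum above evaluated for p = 1/π (irrational α, redundancy converging) and p = 1/9 (rational α = 3, redundancy oscillating):

```python
# R_n^S = 1 - sum_k C(n,k) p^k q^(n-k) <alpha*k + beta*n>
from math import log2, pi, comb

def shannon_redundancy(p, n):
    q = 1 - p
    alpha = log2(q / p)            # alpha = log2((1-p)/p)
    beta = log2(1 / q)             # beta  = log2(1/(1-p))
    s = sum(comb(n, k) * p**k * q**(n - k) * ((alpha * k + beta * n) % 1.0)
            for k in range(n + 1))
    return 1 - s

for n in (100, 200, 400, 800):
    print(n, round(shannon_redundancy(1/pi, n), 4), round(shannon_redundancy(1/9, n), 4))
```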

24 Sketch of Proof
We need to understand the asymptotic behavior of the following sum (cf. Bernoulli distributed sequences modulo 1):
Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} f( ⟨αk + y⟩ )
for fixed p and some Riemann integrable function f : [0,1] → R.
Lemma 1. Let 0 < p < 1 be a fixed real number and α be an irrational number. Then for every Riemann integrable function f : [0,1] → R,
lim_{n→∞} Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} f( ⟨αk + y⟩ ) = ∫_0^1 f(t) dt,
where the convergence is uniform for all shifts y ∈ R.
Lemma 2. Let α = N/M be a rational number with gcd(N,M) = 1. Then for every bounded function f : [0,1] → R,
Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} f( ⟨αk + y⟩ ) = (1/M) Σ_{l=0}^{M−1} f( ⟨ l/M + ⟨My⟩/M ⟩ ) + O(ρ^n),
uniformly for all y ∈ R and some ρ < 1.

25 Uniformly Distributed Sequences Mod 1
1. Uniformly distributed sequences mod 1: A sequence x_n ∈ R is said to be Bernoulli uniformly distributed modulo 1 (in short: B-u.d. mod 1) if for 0 < p < 1,
lim_{n→∞} Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} χ_I( ⟨x_k⟩ ) = λ(I)
holds for every interval I ⊂ [0,1), where χ_I(x) is the characteristic function of I (i.e., it equals 1 if x ∈ I and 0 otherwise) and λ(I) is the Lebesgue measure of I.
2. Weyl's criterion: A sequence x_n is B-u.d. mod 1 if and only if
lim_{n→∞} Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} e^{2πi m x_k} = 0
holds for all nonzero m ∈ Z.
Proof. The proof is standard. Basically, it is based on the fact that, by Weierstrass's approximation theorem, every Riemann integrable function f of period 1 can be uniformly approximated by a trigonometric polynomial (i.e., a finite combination of functions of the type e^{2πimx}).

26 Shannon Code: The Irrational Case
3. Let us return to the Shannon code redundancy. Two cases: α irrational and α rational.
4. We first consider α irrational. To apply our previous results, we must show that ⟨αk⟩ is B-u.d. mod 1. By Weyl's criterion,
lim_{n→∞} Σ_{k=0}^n C(n,k) p^k q^{n−k} e^{2πim(kα)} = lim_{n→∞} ( p e^{2πimα} + q )^n = 0,
provided α is irrational. Hence, by the previous theorem, with f(t) = t and y = βn, we immediately obtain
lim_{n→∞} Σ_{k=0}^n C(n,k) p^k q^{n−k} ⟨αk + βn⟩ = ∫_0^1 t dt = 1/2.
This proves that for α irrational, R̄_n^S = 1/2 + o(1).

27 Shannon Redundancy: Rational Case
Assume α = N/M where gcd(N,M) = 1. Denote p_{n,k} = C(n,k) p^k q^{n−k}. Then
S_n = Σ_{k=0}^n p_{n,k} ⟨ kN/M + βn ⟩ = Σ_{l=0}^{M−1} ⟨ lN/M + βn ⟩ Σ_{m: k=l+mM ≤ n} p_{n,k}.
Lemma 3. For fixed l ≤ M and M, there exists ρ < 1 such that
Σ_{m: k=l+mM ≤ n} C(n,k) p^k (1−p)^{n−k} = 1/M + O(ρ^n).
Proof. Let ω_k = e^{2πik/M}, k = 0,1,...,M−1, be the Mth roots of unity. Then
(1/M) Σ_{k=0}^{M−1} ω_k^n = 1 if M | n, and 0 otherwise,
where M | n means that M divides n. Therefore
Σ_{m: k=l+mM ≤ n} C(n,k) p^k q^{n−k} = (1/M)[ 1 + ω_1^{−l}(pω_1 + q)^n + ... + ω_{M−1}^{−l}(pω_{M−1} + q)^n ] = 1/M + O(ρ^n),
since |pω_r + q| = √( p² + q² + 2pq cos(2πr/M) ) < 1 for r ≠ 0.

28 Finishing the Rational Case
We shall use the following Fourier series; for real x,
⟨x⟩ = 1/2 − Σ_{m=1}^∞ sin(2πmx)/(mπ) = 1/2 − Σ_{m∈Z∖{0}} c_m e^{2πimx},  c_m = 1/(2πim).
Continuing the derivation and using the above lemma we obtain
S_n = (1/M) Σ_{l=0}^{M−1} ⟨ lN/M + βn ⟩ + O(ρ^n)
= 1/2 − Σ_{m≠0} c_m e^{2πimβn} (1/M) Σ_{l=0}^{M−1} e^{2πimlN/M} + O(ρ^n)
= 1/2 − Σ_{k≠0} c_{kM} e^{2πikMβn} + O(ρ^n)
= 1/2 + (1/M)( ⟨Mnβ⟩ − 1/2 ) + O(ρ^n).
Now, having the above two results, we easily establish that
R̄_n^S = 1/2 + o(1) if α is irrational, and R̄_n^S = 1/2 − (1/M)( ⟨Mnβ⟩ − 1/2 ) + O(ρ^n) if α = N/M.
Exercise. Using Stubley's formula and the tools discussed above, derive the redundancy of the Huffman code.

29 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources Shannon and Huffman Coding Non-Prefix Codes Tunstall Code 5. Minimax Redundancy: Universal (unknown) Sources

31 Do We Really Need Prefix FV Codes?
As argued in the XXVIII Shannon Lecture, it is possible to attain average compression for FV codes lower than Huffman's code (P and n are known):
1. Avoiding symbol-by-symbol compression, we design a fixed-to-variable code for the whole file, hence eliminating the need for prefix codes.
2. Applying the prefix condition to symbol-by-symbol or n-block supersymbols is only optimal for the linear term, but not for the sublinear term.
3. The optimal FV code performs no blocking but encodes a table that lists sequences in decreasing probabilities. Define π_X(x) = l if x is the l-th most probable element according to distribution P_X. Then the minimum average code length is L̄(X) = E[ ⌊log_2 π_X(X)⌋ ].
Example: binary source with p < 1 − p =: q; sequence x_1^n = x_1 ... x_n. The probabilities, in decreasing order, are q^n (p/q)^0, q^n (p/q)^1, ..., q^n (p/q)^n, and the assigned code lengths are ⌊log_2(1)⌋, ⌊log_2(2)⌋, ..., ⌊log_2(2^n)⌋.

32 Bounds on the Average Rate of Fixed-to-Variable Codes
Upper bound: E[L(X)] ≤ H(X) (Wyner).
Lower bounds: Theorem 2 (W.S., Verdú, 2009). Define the monotonically increasing function ψ by ψ(x) = x + (1 + x) log_2(1 + x) − x log_2 x. Then ψ^{−1}(H(X)) ≤ L̄(X).
Looser bounds: Noticing that ψ(x) ≤ x + log_2(e + ex), the Alon & Orlitsky bound follows:
H(X) − log_2(H(X) + 1) − log_2 e ≤ L̄(X).
Since h(x) = (1 + x) log(1 + x) − x log x is monotonically increasing, we conclude
L̄(X) ≥ H(X) − (1 + H(X)) log_2(1 + H(X)) + H(X) log_2 H(X).
This can also be found in Blundo & de Prisco.

33 Binary Memoryless Source (W.S., 2005)
Consider again a binary memoryless source with p the probability of transmitting a 1. Let x_1^n = x_1 ... x_n. There are C(n,k) equal probabilities P(x_1^n) = p^k q^{n−k}, where k is the number of 1s. Define
A_k = C(n,0) + C(n,1) + ... + C(n,k),  A_{−1} = 0.
Starting from A_{k−1}, the next C(n,k) probabilities P(x_1^n) are the same. The average code length is
L̄_n = Σ_{k=0}^n p^k q^{n−k} Σ_{j=A_{k−1}+1}^{A_k} ⌊log_2 j⌋ = Σ_{k=0}^n p^k q^{n−k} Σ_{i=1}^{C(n,k)} ⌊log_2( A_{k−1} + i )⌋.
In 2005 we proved the following asymptotic results:
for p = 1/2: L̄_n = n − 2 + (n + 2) 2^{−n};
for p < 1/2: L̄_n = nH(p) − (1/2) log n + O(1).
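The exact formula above can be evaluated for moderate n (our sketch; it assumes p < 1/2, so that listing types by increasing k orders sequences by decreasing probability):

```python
# Exact average length of the optimal one-to-one (non-prefix) code for a binary
# memoryless source, compared with the nH(p) - (1/2)log2(n) approximation.
from math import comb, log2

def one_to_one_length(p, n):
    q = 1 - p
    total, A_prev = 0.0, 0
    for k in range(n + 1):           # k = number of 1s; all C(n,k) such strings tie
        c = comb(n, k)
        block = sum(int(log2(A_prev + i)) for i in range(1, c + 1))  # sum of floor(log2 j)
        total += p**k * q**(n - k) * block
        A_prev += c                  # A_k = C(n,0) + ... + C(n,k)
    return total

p = 0.3
H = -(p*log2(p) + (1-p)*log2(1-p))
for n in (8, 12, 16, 20):
    print(n, round(one_to_one_length(p, n), 3), round(n*H - 0.5*log2(n), 3))
```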

34 Binary Case: More Precise Main Result
Theorem 3 (W.S., 2005). For a binary memoryless source with p < 1/2,
L̄_n = nH(p) − (1/2) log_2 n + C(p) + F(n) + o(1),
where H(p) = −p log_2 p − (1−p) log_2(1−p) and C(p) is an explicit constant depending on p (it involves (3 + ln 2)/(2 ln 2), log_2[ (1−p)/((1−2p)√(2πp(1−p))) ], and (p/(1−2p)) log_2( p/(2(1−p)) )).
Moreover, F(n) = 0 if α = log_2((1−p)/p) is irrational, while for rational α = N/M, F(n) is a bounded oscillating function: an explicit combination, with coefficients built from (1−p)/(1−2p) and p/(1−2p), of the periodic functions
H_M(y)[f] := (1/√(2π)) ∫_{−∞}^{∞} e^{−x²/2} ( f_M( y − log_2( (1−2p)√(2πpqn)/(1−p) ) − x²/(2 ln 2) ) − ∫_0^1 f(t) dt ) dx,
evaluated at y = nβ and y = nβ − α for f(x) = x and f(x) = 2^{−x}, where f_M denotes the 1/M-periodic average of f (cf. Lemma 2) and β = log_2(1/(1−p)).

35 Some Oscillations
Figure 2: Plots of L̄_n − nH(p) + 0.5 log(n) (y-axis) versus n (x-axis) for: (a) irrational α = log_2((1−p)/p) with p = 1/π; (b) rational α = log_2((1−p)/p) with p = 1/9.

36 Sketch of Proof
1. Using the following identity to handle floor functions (partial summation; cf. Knuth),
Σ_{j=1}^N a_j = N a_N − Σ_{j=1}^{N−1} j ( a_{j+1} − a_j ),
we can reduce L̄_n to sums of the following form:
S_n = Σ_{k=0}^n C(n,k) p^k q^{n−k} ⌊log_2 A_k⌋ = Σ_{k=0}^n C(n,k) p^k q^{n−k} log_2 A_k − Σ_{k=0}^n C(n,k) p^k q^{n−k} ⟨log_2 A_k⟩ = a_n − b_n,
where
a_n = Σ_{k=0}^n C(n,k) p^k q^{n−k} log_2 A_k,  b_n = Σ_{k=0}^n C(n,k) p^k q^{n−k} ⟨log_2 A_k⟩.

37 Asymptotics of A_n: Saddle Point Method
Lemma 5. For large n and p < 1/2,
A_{np} = ( (1−p)/(1−2p) ) · 2^{nH(p)} / √(2πnp(1−p)) · ( 1 + O(n^{−1/2}) ).
More precisely, for an ε > 0 and k = np + Θ(n^{1/2+ε}) we have (Exercise)
A_k = ( (1−p)/(1−2p) ) · (2πnp(1−p))^{−1/2} · p^{−k} (1−p)^{−(n−k)} · exp( −(k−np)²/(2p(1−p)n) ) · ( 1 + O(n^{−δ}) ).
Proof. Notice that A_n(z) = Σ_{k=0}^n A_k z^k = [ (1+z)^n − 2^n z^{n+1} ] / (1 − z). Apply the saddle point method to the Cauchy formula A_k = [z^k] A_n(z):
A_k = (1/(2πi)) ∮ [ (1+z)^n − 2^n z^{n+1} ] / (1 − z) · dz/z^{k+1} = (1/(2πi)) ∮ (1/(1 − z)) e^{n log(1+z) − (k+1) log z} dz + (negligible term).
The saddle point is z_0 = (k+1)/(n−k+1) ≈ p/(1−p), and
A_k = (1/(1 − z_0)) · e^{h(z_0)} / √(2π h''(z_0)) · ( 1 + O(n^{−1/2}) ),
where h(z) = n log(1+z) − (k+1) log z.
Heuristic: Note that C(n, k−i) ≈ (p/q)^i C(n,k) for k ≈ np, so A_k ≈ C(n,k) Σ_{i≥0} (p/q)^i = C(n,k) (1−p)/(1−2p).

38 Returning to b_n
3. We also need the asymptotics of b_n = Σ_{k=0}^n C(n,k) p^k q^{n−k} ⟨log_2 A_k⟩. From the previous lemma we conclude that
log_2 A_k = αk + nβ − log_2( ω √n ) − (k−np)²/(2pqn ln 2) + O(n^{−δ})
for some ω > 0, with α = log_2((1−p)/p). Thus we need the asymptotics of the following sum:
Σ_{k=0}^n C(n,k) p^k q^{n−k} ⟨ αk + nβ − log_2( ω √n ) − (k−np)²/(2pqn ln 2) ⟩.
We must now resort to the theory of Bernoulli sequences modulo 1.

39 Final Lemma
Lemma 6 (Drmota, Hwang, W.S., 2004). Let 0 < p < 1 be a fixed real number and f : [0,1] → R a Riemann integrable function.
(i) If α is irrational, then
lim_{n→∞} Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} f( ⟨ kα + y − (k−np)²/(2pqn ln 2) ⟩ ) = ∫_0^1 f(t) dt,
where the convergence is uniform for all shifts y ∈ R.
(ii) If α = N/M (rational) with gcd(N,M) = 1, then uniformly in y ∈ R,
Σ_{k=0}^n C(n,k) p^k (1−p)^{n−k} f( ⟨ kα + y − (k−np)²/(2pqn ln 2) ⟩ ) = ∫_0^1 f(t) dt + H_M(y),
where
H_M(y) := (1/√(2π)) ∫_{−∞}^{∞} e^{−x²/2} ( f_M( y − x²/(2 ln 2) ) − ∫_0^1 f(t) dt ) dx,
with f_M the 1/M-periodic average of f over the residues l/M (cf. Lemma 2), is a periodic function with period 1/M.

40 General Memoryless Source: Main Results
We consider an m-ary alphabet A with symbol probabilities p_1 ≥ p_2 ≥ ... ≥ p_{m−1} ≥ p_m. Entropy: H(X) = −Σ_i p_i log p_i. Define also B_i = log(p_m/p_i).
Our main results are:
Theorem 4. For a memoryless source over a finite alphabet A, the minimum expected length of a lossless binary encoding is
L̄_n = n log_2 |A| + O(1) if the source is equiprobable, and
L̄_n = nH(X) − (1/2) log_2 n + O(1) if the source is not equiprobable.

41 Sketch of the Proof
1. Type: T_{n,m} = { (k_1,...,k_m) ∈ N^m : k_1 + ... + k_m = n }, where k_i is the number of symbols i in a sequence; all sequences of the same type have the same probability p^k = p_1^{k_1} ... p_m^{k_m}, k ∈ T_{n,m}.
2. Order among types: l ≼ k iff p^l ≥ p^k, sorted from the smallest index to the largest; equivalently
l_1 B_1 + ... + l_{m−1} B_{m−1} ≤ k_1 B_1 + ... + k_{m−1} B_{m−1}, where B_i = log(p_m/p_i).
3. There are C(n; k) = n!/(k_1! ... k_m!) sequences of type k. Define
A_k := Σ_{l ≼ k} C(n; l).
Starting from position A_k, the next C(n; k+1) sequences have the same probability p^{k+1}, where k+1 denotes the next type.

42 Sketch of Proof
4. The average code length is
L̄_n = Σ_{k∈T_{n,m}} p^k Σ_{i=A_{k−1}+1}^{A_k} ⌊log i⌋ = Σ_{k∈T_{n,m}} p^k Σ_{i=1}^{C(n;k)} ⌊log( A_{k−1} + i )⌋ = Σ_{k∈T_{n,m}} C(n;k) p^k log A_k + O(1) = log A_{np} + O(1),
where the last step uses the concentration of the multinomial sum around k = np.
5. We need to estimate A_{np} = Σ_{l: p^l ≥ p^{np}} C(n; l). For l_i = np_i + x_i,
A_{np} = Σ_{B_1 x_1 + ... + B_{m−1} x_{m−1} ≤ 0} C(n; np + x).

43 Sketch of the Proof
6. By Stirling's formula,
C(n; np + x) ≈ ( 1 + O(1/√n) ) · C · 2^{nH(X)} n^{−(m−1)/2} · exp( B_1 x_1 + ... + B_{m−1} x_{m−1} ) · exp( −(1/(2n)) x^T Σ^{−1} x ),
where Σ is an invertible (m−1) × (m−1) covariance matrix.
7. Observe that (with b = (B_1, ..., B_{m−1}))
A_{np} = C 2^{nH(X)} n^{−(m−1)/2} [ Σ_{b^T x = 0} exp( −(1/(2n)) x^T Σ^{−1} x ) + Σ_{b^T x < 0} exp( b^T x − (1/(2n)) x^T Σ^{−1} x ) ],
which leads to
log A_{np} = log( C 2^{nH(X)} n^{−(m−1)/2} n^{(m−2)/2} ( 1 + O(1) ) ) = nH(X) − (1/2) log n + O(1),
since Σ_{x∈D_{m−2}} exp( −(1/(2n)) x^T Σ^{−1} x ) = C' n^{(m−2)/2}.

44 Example for m = 3
Assume now m = 3. Then
A_{np} = Σ_{B_1 x + B_2 y ≤ 0} C(n; np_1 + x, np_2 + y, np_3 − x − y)
≈ ( 2^{nH(X)} / (2πn √(p_1 p_2 p_3)) ) Σ_{B_1 x + B_2 y = 0} exp( −x²/(2np_1) − y²/(2np_2) − (x+y)²/(2np_3) ) · (1 + geometric tail)
= O(√n) · 2^{nH(X)}/n = C 2^{nH(X)}/√n.
Figure 3: Illustration for m = 3 (in the (k_1, k_2) plane: normal behavior, of total weight O(√n), along the boundary line through (np_1, np_2); geometric decay away from it).

45 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources Shannon and Huffman Coding Non-Prefix Codes Tunstall Code 5. Minimax Redundancy: Universal (unknown) Sources

46 Variable-to-Fixed Codes
A VF coder consists of a parser and a dictionary (represented by a parsing tree).
1. A variable-to-fixed length encoder partitions the source string into a concatenation of variable-length phrases.
2. Each phrase belongs to a given dictionary D of source strings.
3. A dictionary can be represented by a complete parsing tree T; the dictionary entries d ∈ D correspond to the leaves of the parsing tree.
4. The encoder represents phrases by fixed-length binary codewords: a dictionary D of M entries requires log_2 M bits per entry.
Average redundancy rate:
r = lim_{n→∞} Σ_{|x|=n} P_S(x) ( L(x) + log P_S(x) ) / n = log M / E[D] − h,
where h is the entropy rate of the source.

47 Tunstall and Khodak Codes
(Example: p = 0.6, q = 0.4; Tunstall's construction with M = 5; Khodak's construction with r = 0.25.)
Tunstall code:
1. Start with a root and its leaves.
2. In the J-th iteration, select a leaf with the highest probability and grow children out of it. At the J-th step, the parsing tree has J internal nodes and M = J + 1 leaves corresponding to dictionary entries.
Khodak construction:
1. Pick a real number r and grow a complete parsing tree satisfying r · min{p, 1−p} ≤ P(d) < r for all d ∈ D_r.
2. The resulting parsing tree is exactly the same as the Tunstall tree.
3. If y is a proper prefix of entries of D_r, i.e., y is an internal node of T_r, then P(y) ≥ r.
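A small Python sketch of Tunstall's construction (our illustration, for a binary memoryless source; the max-heap keyed by phrase probability mirrors the rule "select the leaf with the highest probability"), also reporting the redundancy rate r = log_2 M / E[D] − h from the previous slide:

```python
import heapq
from math import log2

def tunstall(p, M):
    """Grow the parsing tree until it has M leaves; return the dictionary {phrase: prob}."""
    q = 1 - p
    heap = [(-p, "0"), (-q, "1")]                # leaves, keyed by -probability
    while len(heap) < M:
        neg_prob, phrase = heapq.heappop(heap)   # most probable leaf
        prob = -neg_prob
        heapq.heappush(heap, (-prob * p, phrase + "0"))
        heapq.heappush(heap, (-prob * q, phrase + "1"))
    return {phrase: -neg for neg, phrase in heap}

p, M = 0.6, 5
D = tunstall(p, M)
ED = sum(prob * len(phrase) for phrase, prob in D.items())   # E[D] = average phrase length
h = -(p * log2(p) + (1 - p) * log2(1 - p))                   # entropy rate in bits
print(D)
print("redundancy rate r =", log2(M) / ED - h)
```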

48 Phrase Length
We study the phrase length D = |d|, i.e., the path length in the parsing tree.
Moment generating functions: define
D(r, z) := E[z^D] = Σ_{d∈D_r} P(d) z^{|d|}
and the corresponding internal-nodes generating function
S(r, z) = Σ_{y: P(y) ≥ r} P(y) z^{|y|}.
Simple fact on trees: let D be a dictionary (leaves of T) and Ỹ the collection of proper prefixes of dictionary entries (internal nodes of T). Then (Exercise)
Σ_{d∈D} P(d) ( z^{|d|} − 1 )/( z − 1 ) = Σ_{y∈Ỹ} P(y) z^{|y|}.
Thus D(r, z) = 1 + (z − 1) S(r, z), and
E[D] = S(r, 1) = Σ_{y∈Ỹ} P(y),  E[D(D−1)] = 2 S'_z(r, 1) = 2 Σ_{y∈Ỹ} P(y) |y|.

49 Recurrences
Define v = 1/r, z complex, and S̃(v, z) = S(v^{−1}, z). Let A(v) = Σ_{y: P(y) ≥ 1/v} 1 be the number of strings of probability at least v^{−1}, i.e., the number of internal nodes. In fact, M_r = A(v) + 1. We have
A(v) = 0 for v < 1,  A(v) = 1 + A(vp) + A(vq) for v ≥ 1,
and
S̃(v, z) = 0 for v < 1,  S̃(v, z) = 1 + zp S̃(vp, z) + zq S̃(vq, z) for v ≥ 1,
since every binary string is either the empty string, a string starting with the first symbol, or a string starting with the second symbol.

50 Mellin Transform
The Mellin transform F*(s) of a function F(v) is
F*(s) = ∫_0^∞ F(v) v^{s−1} dv.
From the recurrence on S̃(v, z) we find
D*(s, z) = (1 − z) / ( s (1 − z p^{1−s} − z q^{1−s}) ) − 1/s,  R(s) < s_0(z),
where s_0(z) denotes the real solution of z p^{1−s} + z q^{1−s} = 1.
To find the asymptotics of D̃(v, z) as v → ∞ we compute the inverse transform of D*(s, z):
D̃(v, z) = (1/(2πi)) lim_{T→∞} ∫_{σ−iT}^{σ+iT} D*(s, z) v^{−s} ds,  where σ < s_0(z).
To determine the polar singularities of the meromorphic continuation of D*(s, z), we have to analyze the set
Z(z) = { s ∈ C : z p^{1−s} + z q^{1−s} = 1 }.

51 [Figure: the complex s-plane with the vertical integration line R(s) = σ, the real root s_0, and the zeros of 1 − z p^{1−s} − z q^{1−s} marked along a vertical line up to height T.]
Formally, moving the line of integration past the poles s ∈ Z(z) would express D̃(v, z) as a sum of residues of D*(s, z) v^{−s} plus a remainder contour integral (1/(2πi)) ∫ over the shifted line, provided that the series of residues converges and the last integral exists. But they don't!

52 Tauberian Rescue
Therefore, we analyze (as in analytic number theory; cf. also Vallée)
D̃_1(v, z) = ∫_0^v D̃(w, z) dw,
whose Mellin transform is
D̃_1*(s, z) = −D*(s + 1, z)/s = O(1/s²).
Lemma 7 (Tauberian). Let f(v, λ) be a non-negative increasing function such that
F(v, λ) = ∫_0^v f(w, λ) dw
has the asymptotic expansion
F(v, λ) = ( v^{λ+1}/(λ + 1) ) ( 1 + λ o(1) )
as v → ∞, uniformly in λ. Then, as v → ∞ and uniformly in λ,
f(v, λ) = v^λ ( 1 + λ^{1/2} o(1) ).

53 Main Results
Theorem 5 (Central Limit Theorem). For large M_r,
( D_r − (1/H) ln M_r ) / √( ((H_2 − H²)/H³) ln M_r ) → N(0,1),
the standard normal distribution, where H is the (natural) entropy and H_2 = p ln² p + q ln² q.
If ln p / ln q is irrational, then
M_r = A(v) + 1 = v/H + o(v),  E[D_r] = ln M_r/H + ln H/H + H_2/(2H²) + o(1);
if ln p / ln q is rational, then
M_r = Q_1(log v) v + O(v^{1−η}),  Q_1(x) = ( L/(H(1 − e^{−L})) ) e^{−L⟨x/L⟩},
E[D_r] = ln M_r/H + ln H/H + H_2/(2H²) + ( ln L + ln(1 − e^{−L}) + L/2 )/H + O(M_r^{−η}),
where L is the largest real number such that ln(1/p) and ln(1/q) are integer multiples of L.

54 Redundancy Rate
The average redundancy rate of the Tunstall/Khodak VF code is defined as
r̄_{M_r} = ln M_r / E[D] − h.
Case 1: ln p / ln q irrational:
r̄_{M_r} = (H/ln M_r) ( −ln H − H_2/(2H) ) + o( 1/ln M_r ).
Case 2: ln p / ln q rational:
r̄_{M_r} = (H/ln M_r) ( −ln H − H_2/(2H) − ln L − ln(e^L − 1) + L/2 ) + O( M_r^{−η} )
for some η > 0, where L > 0 is the largest real number for which ln(1/p) and ln(1/q) are integer multiples of L. (No oscillation!)

55 Random Walk
Consider a random walk that corresponds to a path in the associated parsing tree. For Khodak's code we studied sums of the form
Σ_{y: P(y) ≥ 1/v} f(v)
for some function f(v). But P(y) = p^k q^l (k, l ≥ 0); set v = 2^V, so that −log P(y) = k lg(1/p) + l lg(1/q) ≤ V. This corresponds to a random walk in the first quadrant with the linear boundary condition ak + bl = V, where a = log(1/p) and b = log(1/q). [Figure: the boundary line crosses the k-axis at log r / log p and the l-axis at log r / log q.] The phrase length coincides with the exit time of such a random walk.

56 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources 5. Minimax Redundancy: Universal (unknown) Sources Universal Memoryless Sources Universal Markov Sources Universal Renewal Sources

57 Minimax Redundancy: Unknown Source P
In practice, one can only hope to have some knowledge about a family of sources S that generates real data. Following Davisson, we define the average minimax redundancy R̄_n(S) and the worst case (maximal) minimax redundancy R*_n(S) for a family of sources S as
R̄_n(S) = min_{C_n} sup_{P∈S} E[ L(C_n, X_1^n) + lg P(X_1^n) ],
R*_n(S) = min_{C_n} sup_{P∈S} max_{x_1^n} [ L(C_n, x_1^n) + lg P(x_1^n) ].
In the minimax scenario we look for the best code for the worst source.
Source coding goal: Find data compression algorithms that match optimal redundancy rates, either on average or for individual sequences.

58 Maximal Minimax Redundancy
We consider the following classes of sources S:
Memoryless sources M_0 over an m-ary (finite) alphabet, that is, P(x_1^n) = p_1^{k_1} ... p_m^{k_m} with k_1 + ... + k_m = n, where the p_i are unknown.
Markov sources M_r over a binary alphabet of order r. Observe that for r = 1,
P(x_1^n) = p_{x_1} p_{00}^{k_{00}} p_{01}^{k_{01}} p_{10}^{k_{10}} p_{11}^{k_{11}},
where k_{ij} is the number of pairs (x_k, x_{k+1}) = (i,j) ∈ {0,1}², k_{00} + k_{01} + k_{10} + k_{11} = n − 1, and k_{01} = k_{10} if x_1 = x_n, while k_{01} = k_{10} ± 1 if x_1 ≠ x_n.
Renewal sources R_0, where a 1 is introduced after a run of 0s distributed according to some distribution.

59 Improved Shtarkov Bounds
For the maximal minimax redundancy define the maximum likelihood distribution
Q*(x_1^n) := sup_{P∈S} P(x_1^n) / Σ_{y_1^n∈A^n} sup_{P∈S} P(y_1^n).
Observe that
R*_n(S) = min_{C_n∈C} max_{x_1^n} ( L(C_n, x_1^n) + sup_{P∈S} lg P(x_1^n) )
= min_{C_n∈C} max_{x_1^n} [ L(C_n, x_1^n) + lg Q*(x_1^n) + lg Σ_{y_1^n∈A^n} sup_{P∈S} P(y_1^n) ]
= R_n^{GS}(Q*) + lg Σ_{y_1^n∈A^n} sup_{P∈S} P(y_1^n),
where R_n^{GS}(Q*) is the maximal redundancy of a generalized Shannon code built for the (known) distribution Q*. We also write
D_n(S) = lg Σ_{x_1^n∈A^n} sup_{P∈S} P(x_1^n) := lg d_n(S).

60 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources 5. Minimax Redundancy: Universal (unknown) Sources Universal Memoryless Sources Universal Markov Sources Universal Renewal Sources

61 Maximal Minimax for Memoryless Sources
We first consider the maximal minimax redundancy R*_n(M_0) for the class of memoryless sources over a finite m-ary alphabet. Observe that
d_n(M_0) = Σ_{x_1^n} sup_{p_1,...,p_m} p_1^{k_1} ... p_m^{k_m}
= Σ_{k_1+...+k_m=n} C(n; k_1,...,k_m) sup_{p_1,...,p_m} p_1^{k_1} ... p_m^{k_m}
= Σ_{k_1+...+k_m=n} C(n; k_1,...,k_m) (k_1/n)^{k_1} ... (k_m/n)^{k_m}.
The summation set is I(k_1,...,k_m) = { (k_1,...,k_m) : k_1 + ... + k_m = n }; the number N_k of sequences of type k = (k_1,...,k_m) is N_k = C(n; k_1,...,k_m); and the (unnormalized) maximized likelihood is
sup_{p_1,...,p_m} p_1^{k_1} ... p_m^{k_m} = (k_1/n)^{k_1} ... (k_m/n)^{k_m}.

62 Generating Function for d_n(M_0)
We write
d_n(M_0) = ( n!/n^n ) Σ_{k_1+...+k_m=n} ( k_1^{k_1}/k_1! ) ... ( k_m^{k_m}/k_m! ).
Let us introduce the tree-generating function
B(z) = Σ_{k≥0} (k^k/k!) z^k = 1/(1 − T(z)),
where T(z) satisfies T(z) = z e^{T(z)} (= −W(−z), Lambert's W function) and
T(z) = Σ_{k≥1} (k^{k−1}/k!) z^k
enumerates all rooted labeled trees. Let now
D_m(z) = Σ_{n≥0} (n^n/n!) d_n(M_0) z^n.
Then, by the convolution formula, D_m(z) = [B(z)]^m.

63 Asymptotics
The function B(z) has an algebraic singularity at z = e^{−1}, and
B(z) = 1/√(2(1 − ez)) + O(1).
Singularity analysis yields
[z^n] (1 − ez)^{−1/2} = ( e^n/√(πn) ) ( 1 − 1/(8n) + O(1/n²) ),
[z^n] √(1 − ez) = −( e^n/(2√(πn³)) ) ( 1 + O(1/n) ),
[z^n] (1 − ez)^{−1} = e^n,
n^n/n! = ( e^n/√(2πn) ) ( 1 − 1/(12n) + O(1/n²) ),
which leads to (cf. Clarke & Barron, 1990; W.S., 1998)
lg d_n(M_0) = ((m−1)/2) lg(n/2) + lg( √π/Γ(m/2) ) + ( Γ(m/2) m / (3 Γ(m/2 − 1/2)) ) · √2/√n + ( (3 + m(m−2)(2m+1))/36 − Γ²(m/2) m² / (9 Γ²(m/2 − 1/2)) ) · 1/n + ...
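As a sanity check (ours, not from the slides), for a binary alphabet (m = 2) the exact d_n(M_0) can be computed directly and compared with the leading terms, which reduce to (1/2) lg(nπ/2):

```python
# d_n = sum_k C(n,k) (k/n)^k ((n-k)/n)^(n-k), with 0^0 taken as 1.
from math import comb, log2, pi

def d_n(n):
    total = 0.0
    for k in range(n + 1):
        term = comb(n, k)
        if k > 0:
            term *= (k / n) ** k
        if k < n:
            term *= ((n - k) / n) ** (n - k)
        total += term
    return total

for n in (10, 100, 1000):
    exact = log2(d_n(n))
    approx = 0.5 * log2(n * pi / 2)      # (m-1)/2 lg(n/2) + lg(sqrt(pi)/Gamma(m/2)) for m = 2
    print(n, round(exact, 4), round(approx, 4))
```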

64 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources 5. Minimax Redundancy: Universal (unknown) Sources Universal Memoryless Sources Universal Markov Sources Universal Renewal Sources

65 Maximal Minimax for Markov Sources
Assume (i) M_1 is a Markov source of order r = 1, (ii) the transition matrix is P = {p_{ij}}_{i,j=1}^m, and (iii) we deal with circular sequences (the first symbol follows the last). Then
d_n(M_1) = Σ_{x_1^n} sup_P p_{11}^{k_{11}} ... p_{mm}^{k_{mm}} = Σ_{k∈F_n} M_k ( k_{11}/k_1 )^{k_{11}} ... ( k_{mm}/k_m )^{k_{mm}},
where k_{ij} is the number of pairs ij ∈ A² in x_1^n, k_i = Σ_{j=1}^m k_{ij}, and k = {k_{ij}}_{i,j=1}^m is an integer matrix such that
F_n : Σ_{i,j=1}^m k_{ij} = n and Σ_{j=1}^m k_{ij} = Σ_{j=1}^m k_{ji} for every i.
A matrix k satisfying the above conditions is called a frequency matrix or Markov type; M_k is the number of strings x_1^n of type k. We come back to Markov types soon.

66 Main Technical Tool
Let g_k be a sequence of scalars indexed by matrices k, let g(z) = Σ_k g_k z^k be its regular generating function, and let
Fg(z) = Σ_{k∈F} g_k z^k = Σ_{n≥0} Σ_{k∈F_n} g_k z^k
be the F-generating function of g_k, restricted to k ∈ F.
Lemma 8. Let g(z) = Σ_k g_k z^k. Then
Fg(z) := Σ_{n≥0} Σ_{k∈F_n} g_k z^k = ( 1/(2πi) )^m ∮ ... ∮ g( [z_{ij} x_j/x_i] ) dx_1/x_1 ... dx_m/x_m,
where [z_{ij} x_j/x_i] is the matrix whose ij-th entry is z_{ij} x_j/x_i.
Proof. It suffices to observe that
g( [z_{ij} x_j/x_i] ) = Σ_k g_k z^k Π_{i=1}^m x_i^{ Σ_j k_{ji} − Σ_j k_{ij} },
so Fg(z) is the coefficient of g( [z_{ij} x_j/x_i] ) at x_1^0 x_2^0 ... x_m^0.

67 Main Results
Theorem 6. Let M_1 be a Markov source over an m-ary alphabet. Then
d_n(M_1) = ( n/(2π) )^{m(m−1)/2} A_m ( 1 + O(1/n) ),
where A_m is a constant given by an explicit integral over K(1) = { y_{ij} ≥ 0 : Σ_{i,j} y_{ij} = 1 } of a polynomial expression F_m(y_{ij}) of degree m − 1. In particular, for m = 2,
A_2 = 16 × Catalan, where Catalan is Catalan's constant Σ_{i≥0} (−1)^i/(2i+1)².
Theorem 7. Let M_r be a Markov source of order r. Then
d_n(M_r) = ( n/(2π) )^{m^r(m−1)/2} A_m^r ( 1 + O(1/n) ),
where A_m^r is a constant defined in a similar fashion as A_m above.

68 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources 5. Minimax Redundancy: Universal (unknown) Sources Universal Memoryless Sources Universal Markov Sources Universal Renewal Sources

69 Renewal Sources
The renewal process is defined as follows: let T_1, T_2, ... be a sequence of i.i.d. positive-valued random variables with distribution Q(j) = Pr{T_i = j}. The process T_0, T_0 + T_1, T_0 + T_1 + T_2, ... is called a renewal process.
With a renewal process we associate a binary renewal sequence in which the positions of the 1s are at the renewal epochs T_0, T_0 + T_1, ... (runs of zeros in between). We start with x_0 = 1.
Csiszár and Shields (1996) proved that R*_n(R_0) = Θ(√n). We prove the following result.
Theorem 8 (Flajolet and W.S., 1998). Consider the class of renewal processes as defined above. Then
R*_n(R_0) = (2/log 2) √(cn) + O(log n), where c = π²/6 − 1.

70 Maximal Minimax Redundancy
For a sequence x_0^n = 1 0^{α_1} 1 0^{α_2} ... 1 0^{α_k} 0^{k*} (the last k* symbols form an incomplete run of 0s), let k_m be the number of i such that α_i = m. Then
P(x_1^n) = Q(0)^{k_0} Q(1)^{k_1} ... Q(n−1)^{k_{n−1}} Pr{T_1 > k*}.
It can be proved that
r_{n+1} − 1 ≤ d_n(R_0) ≤ Σ_{m=0}^n r_m,
where
r_n = Σ_{k=0}^n r_{n,k},  r_{n,k} = Σ_{P(n,k)} C(k; k_0,...,k_{n−1}) (k_0/k)^{k_0} (k_1/k)^{k_1} ... (k_{n−1}/k)^{k_{n−1}},
and P(n,k) is the summation set, which happens to be the set of partitions of n into k terms, i.e., n = k_0 + 2k_1 + ... + n k_{n−1} with k = k_0 + ... + k_{n−1}.
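For small n, r_n can be computed by brute force over integer partitions (our sketch; it uses SymPy's partition iterator and compares only against the leading term (2/log 2)√(cn), so lower-order corrections are still visible at these sizes):

```python
# r_n = sum over partitions of n of C(k; k_0, ...) * prod_i (k_i/k)^{k_i},
# where k_i is the multiplicity of part i+1 and k is the number of parts.
from math import factorial, log, log2, sqrt, pi
from sympy.utilities.iterables import partitions

def r_n(n):
    total = 0.0
    for part in partitions(n):              # dict: part size -> multiplicity
        k = sum(part.values())              # k = k_0 + ... + k_{n-1}
        multinom = factorial(k)
        prod = 1.0
        for mult in part.values():
            multinom //= factorial(mult)    # builds C(k; k_0, ..., k_{n-1})
            prod *= (mult / k) ** mult
        total += multinom * prod
    return total

c = pi**2 / 6 - 1
for n in (10, 20, 30, 40):
    print(n, round(log2(r_n(n)), 3), round(2 / log(2) * sqrt(c * n), 3))
```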

71 Main Results
Theorem 9 (Flajolet and W.S., 1998). Consider the class of renewal processes defined above. The quantity r_n attains the following asymptotics:
lg r_n = (2/log 2) √(cn) − (5/8) lg n + (1/2) lg lg n + O(1), where c = π²/6 − 1.
The asymptotic analysis is sophisticated and follows these steps:
first, we transform r_n into another quantity s_n that we know how to handle, and (using a probabilistic technique) we know how to read back results for r_n from s_n;
we use combinatorial calculus to find the generating function of s_n, which turns out to be an infinite product of the tree functions B(z) defined above;
we transform this product into a harmonic sum that can be analyzed asymptotically by the Mellin transform;
we obtain an asymptotic expansion of the generating function around z = 1, which is the starting point for extracting the asymptotics of the coefficients;
finally, we estimate R*_n(R_0) by the saddle point method.

72 Asymptotics: The Main Idea
The quantity r_n is too hard to analyze directly due to the factor k!/k^k, hence we define a new quantity s_n:
s_n = Σ_{k=0}^n s_{n,k},  s_{n,k} = e^{−k} Σ_{P(n,k)} ( k_0^{k_0}/k_0! ) ... ( k_{n−1}^{k_{n−1}}/k_{n−1}! ).
To analyze it, we introduce the random variable K_n defined by
Pr{K_n = k} = s_{n,k}/s_n.
Stirling's formula yields
r_n/s_n = Σ_{k=0}^n ( r_{n,k}/s_{n,k} ) ( s_{n,k}/s_n ) = E[ K_n! K_n^{−K_n} e^{K_n} ] = E[ √(2πK_n) ] + O( E[K_n^{−1/2}] ).

73 Fundamental Lemmas
Lemma 9. Let μ_n = E[K_n] and σ_n² = Var(K_n). Then
s_n = exp( 2√(cn) − (7/8) log n + d + o(1) ),
μ_n = (1/4) √(n/c) log(n/c) + o(√n),
σ_n² = O(n log n) = o(μ_n²),
where c = π²/6 − 1 and d = log 2 − (3/8) log c − (3/4) log π.
Lemma 10. For large n,
E[√K_n] = μ_n^{1/2} (1 + o(1)) and E[K_n^{−1/2}] = o(1),
where μ_n = E[K_n]. Thus
r_n = s_n E[ √(2πK_n) ] (1 + o(1)) = s_n √(2πμ_n) (1 + o(1)).

74 Sketch of a Proof: Generating Functions
1. Define the function β(z) as
β(z) = Σ_{k≥0} (k^k/k!) e^{−k} z^k.
One has (e.g., by Lagrange inversion or otherwise) β(z) = 1/(1 − T(z e^{−1})).
2. Define
S_n(u) = Σ_{k≥0} s_{n,k} u^k,  S(z, u) = Σ_{n≥0} S_n(u) z^n.
Since s_{n,k} involves convolutions of sequences of the form k^k/k!, we have
S(z, u) = Σ_{P(n,k)} z^{1·k_0 + 2·k_1 + ...} (u/e)^{k_0 + ... + k_{n−1}} ( k_0^{k_0}/k_0! ) ... ( k_{n−1}^{k_{n−1}}/k_{n−1}! ) = Π_{i≥1} β(z^i u).
We need to compute s_n = [z^n] S(z, 1), where [z^n] F(z) denotes the coefficient of z^n in F(z).

75 Mellin Asymptotics
3. Let L(z) = log S(z, 1) and z = e^{−t}, so that
L(e^{−t}) = Σ_{k≥1} log β(e^{−kt}).
Mellin transform techniques provide an expansion of L(e^{−t}) around t = 0 (or equivalently z = 1), since the sum falls under the harmonic sum paradigm.
4. The Mellin transform L*(s) = M(L(e^{−t}); s) of L(e^{−t}) is computed by the harmonic sum property (M3). For R(s) ∈ (1, ∞), the transform evaluates to
L*(s) = ζ(s) Λ(s),
where ζ(s) = Σ_{n≥1} n^{−s} is the Riemann zeta function and Λ(s) = ∫_0^∞ log β(e^{−t}) t^{s−1} dt.
This leads to the singular expansions
L*(s) ≍ Λ(1)/(s − 1) near s = 1, and L*(s) ≍ −1/(4s²) − (log π)/(4s) near s = 0.

76 What's Next?
5. An application of the converse mapping property (M4) allows us to come back to the original function:
L(e^{−t}) = Λ(1)/t + (1/4) log t − (1/4) log π + O(√t),
which translates into
L(z) = Λ(1)/(1 − z) + (1/4) log(1 − z) − (1/4) log π − (1/2) Λ(1) + O(√(1 − z)),
where
c = Λ(1) = −∫_0^1 log( 1 − T(x/e) ) dx/x = π²/6 − 1.
6. In summary, we have just proved that, as z → 1,
S(z, 1) = e^{L(z)} = a (1 − z)^{1/4} exp( c/(1 − z) ) ( 1 + o(1) ),
where a = exp( −(1/4) log π − (1/2) c ).
7. To extract the asymptotics we need to apply the saddle point method.

77 Saddle Point Method
Lemma 11. For positive A > 0 and reals B and C, define f(z) = f_{A,B,C}(z) as
f(z) = exp( A/(1 − z) + B log( 1/(1 − z) ) + C log( (1/(1 − z)) log(1/(1 − z)) ) ).
Then
[z^n] f_{A,B,C}(z) = exp( 2√(An) + (B/2 − 3/4) log(n/A) + C log log(n/A) + d_{A,B,C} + o(1) ),
for an explicit constant d_{A,B,C}.
Proof: We start with Cauchy's formula
[z^n] f(z) = (1/(2πi)) ∮ e^{h(z)} dz, where h(z) = log f_{A,B,C}(z) − (n + 1) log z.
The saddle point r, defined by h'(r) = 0, is asymptotically r = 1 − √(A/n) + O(1/n), and
h(r) = 2√(An) + (B/2) log(n/A) + C log log(n/A) + O(1).

78 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources 5. Minimax Redundancy: Universal (unknown) Sources 6. Appendix A: Mellin Transform 7. Appendix B: Saddle Point Method

79 Appendix A: Mellin Properties
(M1) DIRECT AND INVERSE MELLIN TRANSFORMS. Let c belong to the fundamental strip defined below. Then
f*(s) := M(f(x); s) = ∫_0^∞ f(x) x^{s−1} dx and f(x) = (1/(2πi)) ∫_{c−i∞}^{c+i∞} f*(s) x^{−s} ds.
(M2) FUNDAMENTAL STRIP. The Mellin transform of f(x) exists in the fundamental strip R(s) ∈ (−α, −β), where f(x) = O(x^α) as x → 0 and f(x) = O(x^β) as x → ∞.
(M3) HARMONIC SUM PROPERTY. By linearity and the scale rule M(f(ax); s) = a^{−s} M(f(x); s),
if f(x) = Σ_{k≥0} λ_k g(μ_k x), then f*(s) = g*(s) Σ_{k≥0} λ_k μ_k^{−s}.

80 (M4) MAPPING PROPERTIES (asymptotic expansion of f(x) and singularities of f*(s)). If
f(x) = Σ_{(ξ,k)∈A} c_{ξ,k} x^ξ (log x)^k + O(x^M),
then
f*(s) ≍ Σ_{(ξ,k)∈A} c_{ξ,k} (−1)^k k! / (s + ξ)^{k+1}.
(i) Direct mapping. Assume that f(x) admits, as x → 0^+, the asymptotic expansion above for some −M < −α and k > 0. Then for R(s) ∈ (−M, −β), the transform f*(s) satisfies the singular expansion above.
(ii) Converse mapping. Assume that f*(s) = O(|s|^{−r}) with r > 1 as |s| → ∞, and that f*(s) admits the singular expansion above for R(s) ∈ (−M, −α). Then f(x) satisfies the asymptotic expansion above at x = 0^+.
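For illustration (ours, not from the slides), SymPy can verify (M1) and the scale rule underlying (M3):

```python
from sympy import symbols, exp, mellin_transform

x, s = symbols('x s', positive=True)

# (M1): the Mellin transform of e^{-x} is Gamma(s), with fundamental strip Re(s) in (0, oo).
print(mellin_transform(exp(-x), x, s))      # (gamma(s), (0, oo), True)

# Scale rule behind (M3): M(f(a x); s) = a^{-s} f*(s), here with f(x) = e^{-x} and a = 3.
print(mellin_transform(exp(-3*x), x, s))    # (3**(-s)*gamma(s), (0, oo), True)
```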

81 Outline Update 1. Shannon Information Theory: Three Theorems of Shannon 2. Glance at Channel Coding Theorem 3. Source Coding 4. Redundancy: Known Sources 5. Minimax Redundancy: Universal (unknown) Sources 6. Appendix A: Mellin Transform 7. Appendix B: Saddle Point Method

82 Appendix B: Saddle Point Method
Input: A function g(z) analytic in |z| < R (0 < R ≤ +∞) with nonnegative Taylor coefficients and fast growth as z → R^−. Let h(z) := log g(z) − (n + 1) log z.
Output: An asymptotic formula for g_n := [z^n] g(z), derived from the Cauchy coefficient integral
g_n = (1/(2πi)) ∮_γ g(z) dz/z^{n+1} = (1/(2πi)) ∮_γ e^{h(z)} dz,
where γ is a loop around z = 0.
(S1) SADDLE POINT CONTOUR. Require that g'(z)/g(z) → +∞ as z → R^−. Let r = r(n) be the unique positive root of the saddle point equation
h'(r) = 0, i.e., r g'(r)/g(r) = n + 1,
so that r → R as n → ∞. The integral above is evaluated on γ = { z : |z| = r }.

83 (S2) BASIC SPLIT. Require that h'''(r)^{1/3}/h''(r)^{1/2} → 0. Define ϕ = ϕ(n), called the range of the saddle point, by
ϕ = h'''(r)^{−1/6} h''(r)^{−1/4},
so that ϕ → 0, h''(r)ϕ² → ∞, and h'''(r)ϕ³ → 0. Split γ = γ_0 ∪ γ_1, where
γ_0 = { z ∈ γ : |arg(z)| ≤ ϕ },  γ_1 = { z ∈ γ : |arg(z)| ≥ ϕ }.
(S3) ELIMINATION OF TAILS. Require that |g(re^{iθ})| ≤ |g(re^{iϕ})| on γ_1. Then the tail integral satisfies the trivial bound
| ∫_{γ_1} e^{h(z)} dz | = O( |e^{h(re^{iϕ})}| ).

84 (S4) LOCAL APPROXIMATION. Require that
h(re^{iθ}) − h(r) + (1/2) r²θ² h''(r) = O( h'''(r)ϕ³ ) on γ_0.
Then the central integral is asymptotic to a complete Gaussian integral, and
(1/(2πi)) ∫_{γ_0} e^{h(z)} dz = ( g(r) r^{−n} / √(2π h''(r)) ) ( 1 + O(h'''(r)ϕ³) ).
(S5) COLLECTION. Requirements (S1), (S2), (S3), (S4) imply the estimate
[z^n] g(z) = ( g(r) r^{−n} / √(2π h''(r)) ) ( 1 + O(h'''(r)ϕ³) ) ~ g(r) r^{−n} / √(2π h''(r)).
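A minimal worked instance of the recipe (ours, not from the slides) for g(z) = e^z, whose coefficients [z^n] e^z = 1/n! are known exactly; the saddle point r = n + 1 solves h'(r) = 0 for h(z) = z − (n+1) log z:

```python
from math import exp, log, sqrt, pi, factorial

def saddle_point_coeff(n):
    # h(z) = log g(z) - (n+1) log z = z - (n+1) log z;  h'(r) = 1 - (n+1)/r = 0  =>  r = n+1
    r = n + 1.0
    h = r - (n + 1) * log(r)
    h2 = (n + 1) / r**2                     # h''(r)
    return exp(h) / sqrt(2 * pi * h2)       # Gaussian (central) approximation of [z^n] e^z

for n in (5, 10, 50, 100):
    print(n, saddle_point_coeff(n) * factorial(n))   # ratio to the exact 1/n!; tends to 1
```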


More information

(Classical) Information Theory III: Noisy channel coding

(Classical) Information Theory III: Noisy channel coding (Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract What is the best possible way

More information

Lecture 15: Conditional and Joint Typicaility

Lecture 15: Conditional and Joint Typicaility EE376A Information Theory Lecture 1-02/26/2015 Lecture 15: Conditional and Joint Typicaility Lecturer: Kartik Venkat Scribe: Max Zimet, Brian Wai, Sepehr Nezami 1 Notation We always write a sequence of

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 AEP Asymptotic Equipartition Property AEP In information theory, the analog of

More information

A Master Theorem for Discrete Divide and Conquer Recurrences

A Master Theorem for Discrete Divide and Conquer Recurrences A Master Theorem for Discrete Divide and Conquer Recurrences MICHAEL DRMOTA and WOJCIECH SZPANKOWSKI TU Wien and Purdue University Divide-and-conquer recurrences are one of the most studied equations in

More information

RENEWAL THEORY IN ANALYSIS OF TRIES AND STRINGS: EXTENDED ABSTRACT

RENEWAL THEORY IN ANALYSIS OF TRIES AND STRINGS: EXTENDED ABSTRACT RENEWAL THEORY IN ANALYSIS OF TRIES AND STRINGS: EXTENDED ABSTRACT SVANTE JANSON Abstract. We give a survey of a number of simple applications of renewal theory to problems on random strings, in particular

More information

10-704: Information Processing and Learning Fall Lecture 9: Sept 28

10-704: Information Processing and Learning Fall Lecture 9: Sept 28 10-704: Information Processing and Learning Fall 2016 Lecturer: Siheng Chen Lecture 9: Sept 28 Note: These notes are based on scribed notes from Spring15 offering of this course. LaTeX template courtesy

More information

Solutions to Homework Set #1 Sanov s Theorem, Rate distortion

Solutions to Homework Set #1 Sanov s Theorem, Rate distortion st Semester 00/ Solutions to Homework Set # Sanov s Theorem, Rate distortion. Sanov s theorem: Prove the simple version of Sanov s theorem for the binary random variables, i.e., let X,X,...,X n be a sequence

More information

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is

More information

Lecture 11: Polar codes construction

Lecture 11: Polar codes construction 15-859: Information Theory and Applications in TCS CMU: Spring 2013 Lecturer: Venkatesan Guruswami Lecture 11: Polar codes construction February 26, 2013 Scribe: Dan Stahlke 1 Polar codes: recap of last

More information

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy Haykin_ch05_pp3.fm Page 207 Monday, November 26, 202 2:44 PM CHAPTER 5 Information Theory 5. Introduction As mentioned in Chapter and reiterated along the way, the purpose of a communication system is

More information

MTH4101 CALCULUS II REVISION NOTES. 1. COMPLEX NUMBERS (Thomas Appendix 7 + lecture notes) ax 2 + bx + c = 0. x = b ± b 2 4ac 2a. i = 1.

MTH4101 CALCULUS II REVISION NOTES. 1. COMPLEX NUMBERS (Thomas Appendix 7 + lecture notes) ax 2 + bx + c = 0. x = b ± b 2 4ac 2a. i = 1. MTH4101 CALCULUS II REVISION NOTES 1. COMPLEX NUMBERS (Thomas Appendix 7 + lecture notes) 1.1 Introduction Types of numbers (natural, integers, rationals, reals) The need to solve quadratic equations:

More information

Quiz 2 Date: Monday, November 21, 2016

Quiz 2 Date: Monday, November 21, 2016 10-704 Information Processing and Learning Fall 2016 Quiz 2 Date: Monday, November 21, 2016 Name: Andrew ID: Department: Guidelines: 1. PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED. 2. Write your name,

More information

Lecture 4 : Adaptive source coding algorithms

Lecture 4 : Adaptive source coding algorithms Lecture 4 : Adaptive source coding algorithms February 2, 28 Information Theory Outline 1. Motivation ; 2. adaptive Huffman encoding ; 3. Gallager and Knuth s method ; 4. Dictionary methods : Lempel-Ziv

More information

LECTURE 10. Last time: Lecture outline

LECTURE 10. Last time: Lecture outline LECTURE 10 Joint AEP Coding Theorem Last time: Error Exponents Lecture outline Strong Coding Theorem Reading: Gallager, Chapter 5. Review Joint AEP A ( ɛ n) (X) A ( ɛ n) (Y ) vs. A ( ɛ n) (X, Y ) 2 nh(x)

More information

Coding for Discrete Source

Coding for Discrete Source EGR 544 Communication Theory 3. Coding for Discrete Sources Z. Aliyazicioglu Electrical and Computer Engineering Department Cal Poly Pomona Coding for Discrete Source Coding Represent source data effectively

More information

Chapter 5: Data Compression

Chapter 5: Data Compression Chapter 5: Data Compression Definition. A source code C for a random variable X is a mapping from the range of X to the set of finite length strings of symbols from a D-ary alphabet. ˆX: source alphabet,

More information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information 4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk

More information

Department of Mathematics, University of California, Berkeley. GRADUATE PRELIMINARY EXAMINATION, Part A Fall Semester 2016

Department of Mathematics, University of California, Berkeley. GRADUATE PRELIMINARY EXAMINATION, Part A Fall Semester 2016 Department of Mathematics, University of California, Berkeley YOUR 1 OR 2 DIGIT EXAM NUMBER GRADUATE PRELIMINARY EXAMINATION, Part A Fall Semester 2016 1. Please write your 1- or 2-digit exam number on

More information

Entropies & Information Theory

Entropies & Information Theory Entropies & Information Theory LECTURE I Nilanjana Datta University of Cambridge,U.K. See lecture notes on: http://www.qi.damtp.cam.ac.uk/node/223 Quantum Information Theory Born out of Classical Information

More information

CHAPTER 3. P (B j A i ) P (B j ) =log 2. j=1

CHAPTER 3. P (B j A i ) P (B j ) =log 2. j=1 CHAPTER 3 Problem 3. : Also : Hence : I(B j ; A i ) = log P (B j A i ) P (B j ) 4 P (B j )= P (B j,a i )= i= 3 P (A i )= P (B j,a i )= j= =log P (B j,a i ) P (B j )P (A i ).3, j=.7, j=.4, j=3.3, i=.7,

More information

PART III. Outline. Codes and Cryptography. Sources. Optimal Codes (I) Jorge L. Villar. MAMME, Fall 2015

PART III. Outline. Codes and Cryptography. Sources. Optimal Codes (I) Jorge L. Villar. MAMME, Fall 2015 Outline Codes and Cryptography 1 Information Sources and Optimal Codes 2 Building Optimal Codes: Huffman Codes MAMME, Fall 2015 3 Shannon Entropy and Mutual Information PART III Sources Information source:

More information

3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions

3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions Engineering Tripos Part IIA THIRD YEAR 3F: Signals and Systems INFORMATION THEORY Examples Paper Solutions. Let the joint probability mass function of two binary random variables X and Y be given in the

More information

EE5585 Data Compression January 29, Lecture 3. x X x X. 2 l(x) 1 (1)

EE5585 Data Compression January 29, Lecture 3. x X x X. 2 l(x) 1 (1) EE5585 Data Compression January 29, 2013 Lecture 3 Instructor: Arya Mazumdar Scribe: Katie Moenkhaus Uniquely Decodable Codes Recall that for a uniquely decodable code with source set X, if l(x) is the

More information

ECE Information theory Final (Fall 2008)

ECE Information theory Final (Fall 2008) ECE 776 - Information theory Final (Fall 2008) Q.1. (1 point) Consider the following bursty transmission scheme for a Gaussian channel with noise power N and average power constraint P (i.e., 1/n X n i=1

More information

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1 Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,

More information

Homework Set #2 Data Compression, Huffman code and AEP

Homework Set #2 Data Compression, Huffman code and AEP Homework Set #2 Data Compression, Huffman code and AEP 1. Huffman coding. Consider the random variable ( x1 x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0.11 0.04 0.04 0.03 0.02 (a Find a binary Huffman code

More information

An introduction to basic information theory. Hampus Wessman

An introduction to basic information theory. Hampus Wessman An introduction to basic information theory Hampus Wessman Abstract We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on

More information

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.

More information

7 Asymptotics for Meromorphic Functions

7 Asymptotics for Meromorphic Functions Lecture G jacques@ucsd.edu 7 Asymptotics for Meromorphic Functions Hadamard s Theorem gives a broad description of the exponential growth of coefficients in power series, but the notion of exponential

More information

Considering our result for the sum and product of analytic functions, this means that for (a 0, a 1,..., a N ) C N+1, the polynomial.

Considering our result for the sum and product of analytic functions, this means that for (a 0, a 1,..., a N ) C N+1, the polynomial. Lecture 3 Usual complex functions MATH-GA 245.00 Complex Variables Polynomials. Construction f : z z is analytic on all of C since its real and imaginary parts satisfy the Cauchy-Riemann relations and

More information

Conformal maps. Lent 2019 COMPLEX METHODS G. Taylor. A star means optional and not necessarily harder.

Conformal maps. Lent 2019 COMPLEX METHODS G. Taylor. A star means optional and not necessarily harder. Lent 29 COMPLEX METHODS G. Taylor A star means optional and not necessarily harder. Conformal maps. (i) Let f(z) = az + b, with ad bc. Where in C is f conformal? cz + d (ii) Let f(z) = z +. What are the

More information

or E ( U(X) ) e zx = e ux e ivx = e ux( cos(vx) + i sin(vx) ), B X := { u R : M X (u) < } (4)

or E ( U(X) ) e zx = e ux e ivx = e ux( cos(vx) + i sin(vx) ), B X := { u R : M X (u) < } (4) :23 /4/2000 TOPIC Characteristic functions This lecture begins our study of the characteristic function φ X (t) := Ee itx = E cos(tx)+ie sin(tx) (t R) of a real random variable X Characteristic functions

More information

Revision of Lecture 5

Revision of Lecture 5 Revision of Lecture 5 Information transferring across channels Channel characteristics and binary symmetric channel Average mutual information Average mutual information tells us what happens to information

More information

10-704: Information Processing and Learning Fall Lecture 10: Oct 3

10-704: Information Processing and Learning Fall Lecture 10: Oct 3 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of

More information

Solutions to Homework Set #4 Differential Entropy and Gaussian Channel

Solutions to Homework Set #4 Differential Entropy and Gaussian Channel Solutions to Homework Set #4 Differential Entropy and Gaussian Channel 1. Differential entropy. Evaluate the differential entropy h(x = f lnf for the following: (a Find the entropy of the exponential density

More information

Notes by Zvi Rosen. Thanks to Alyssa Palfreyman for supplements.

Notes by Zvi Rosen. Thanks to Alyssa Palfreyman for supplements. Lecture: Hélène Barcelo Analytic Combinatorics ECCO 202, Bogotá Notes by Zvi Rosen. Thanks to Alyssa Palfreyman for supplements.. Tuesday, June 2, 202 Combinatorics is the study of finite structures that

More information

EE5585 Data Compression May 2, Lecture 27

EE5585 Data Compression May 2, Lecture 27 EE5585 Data Compression May 2, 2013 Lecture 27 Instructor: Arya Mazumdar Scribe: Fangying Zhang Distributed Data Compression/Source Coding In the previous class we used a H-W table as a simple example,

More information

LECTURE 3. Last time:

LECTURE 3. Last time: LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate

More information

Lecture 8: Shannon s Noise Models

Lecture 8: Shannon s Noise Models Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 8: Shannon s Noise Models September 14, 2007 Lecturer: Atri Rudra Scribe: Sandipan Kundu& Atri Rudra Till now we have

More information

Lecture 14 February 28

Lecture 14 February 28 EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables

More information

18.310A Final exam practice questions

18.310A Final exam practice questions 18.310A Final exam practice questions This is a collection of practice questions, gathered randomly from previous exams and quizzes. They may not be representative of what will be on the final. In particular,

More information

EE376A - Information Theory Midterm, Tuesday February 10th. Please start answering each question on a new page of the answer booklet.

EE376A - Information Theory Midterm, Tuesday February 10th. Please start answering each question on a new page of the answer booklet. EE376A - Information Theory Midterm, Tuesday February 10th Instructions: You have two hours, 7PM - 9PM The exam has 3 questions, totaling 100 points. Please start answering each question on a new page

More information

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm 1. Feedback does not increase the capacity. Consider a channel with feedback. We assume that all the recieved outputs are sent back immediately

More information