Lossy Compression Coding Theorems for Arbitrary Sources

Lossy Compression Coding Theorems for Arbitrary Sources
Ioannis Kontoyiannis, University of Cambridge
Joint work with M. Madiman, M. Harrison, J. Zhang
Beyond IID Workshop, Cambridge, UK, July 23, 2018

Outline

Motivation and Background: Lossless Data Compression; Lossy Data Compression: the MDL Point of View

1. Ideal Compression: Kolmogorov Distortion-Complexity
2. Codes as Probability Distributions: Lossy Kraft Inequality
3. Coding Theorems: Asymptotics & Non-Asymptotics
4. Code Performance: Generalized AEP
5. Solidarity with Shannon Theory: Stationary Ergodic Sources
6. Aside: Lossy Compression & Inference: What IS the best code?
7. Choosing a Code: Lossy MLE & Lossy MDL
8. Application: Pre-processing in VQ Design

Emphasis

Concentrate on:
- General sources, general distortion measures
- Nonasymptotic, pointwise results
- Precise performance bounds
- Systematic development of the MDL point of view, parallel to the lossless case
- Connections with applications

Background questions:
- Why is lossy compression so much harder?
- What is so different (mathematically) between them?
- How much do the right models matter for real data?

Lossy Compression: The Basic Problem

Consider a data string $x_1^n = (x_1, x_2, \ldots, x_n)$ to be compressed, with each $x_i$ taking values in the source alphabet $A$, e.g., $A = \{0,1\}$, $A = \mathbb{R}$, $A = \mathbb{R}^k$, ...

Problem: Find an efficient approximate representation $y_1^n = (y_1, y_2, \ldots, y_n)$ for $x_1^n$, with each $y_i$ taking values in the reproduction alphabet $\hat{A}$.
- Efficient means simple or compressible.
- Approximate means that the distortion $d_n(x_1^n, y_1^n)$ is $\le$ some level $D$, where $d_n$ on $A^n \times \hat{A}^n$ is an arbitrary distortion measure.

Step 1. Ideal Compression: Kolmogorov Distortion-Complexity

For computable data and distortion, define (Muramatsu-Kanaya 1994) the (Kolmogorov) distortion-complexity at distortion level $D$:

  $K_D(x_1^n) = \min\{\, \ell(p) : p \text{ such that } U(p) \in B(x_1^n, D) \,\}$ bits,

where $U(\cdot)$ is a universal Turing machine and $B(x_1^n, D)$ is the distortion-ball of radius $D$ around $x_1^n$:

  $B(x_1^n, D) = \{\, y_1^n \in \hat{A}^n : d_n(x_1^n, y_1^n) \le D \,\}$.

Properties of $K_D(x_1^n)$:
(a) machine-independent;
(b) $\approx nR(D)$ for stationary ergodic data $\Rightarrow$ THE fundamental limit of compression;
but (c) not computable...

A Hint From Lossless Compression

Data: $X_1^n = X_1, X_2, \ldots, X_n$ with distribution $P_n$ on $A^n$.

Lossless code $(e_n, L_n)$:
- Encoder: $e_n : A^n \to \{0,1\}^*$
- Code-lengths: $L_n(X_1^n)$ = length of $e_n(X_1^n)$ in bits

Kraft Inequality. There exists a uniquely decodable code $(e_n, L_n)$ iff $\sum_{x_1^n \in A^n} 2^{-L_n(x_1^n)} \le 1$.

$\Rightarrow$ To any code $(e_n, L_n)$ we can associate a probability distribution $Q_n(x_1^n) \propto 2^{-L_n(x_1^n)}$.
$\Leftarrow$ From any probability distribution $Q_n$ we can get a code $e_n$ with $L_n(x_1^n) = \lceil -\log Q_n(x_1^n) \rceil$ bits.
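As a small illustration of this lossless codes-measures correspondence, here is a minimal Python sketch (my own, not part of the talk) on a hypothetical four-letter toy distribution: it builds Shannon code-lengths $\lceil -\log_2 Q(x) \rceil$, checks the Kraft sum, and maps the lengths back to a probability measure.

```python
# Minimal sketch (not from the talk): the lossless codes <-> measures
# correspondence on a hypothetical four-letter alphabet.
import math

Q = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # assumed toy distribution

# Distribution -> Shannon code-lengths.
lengths = {x: math.ceil(-math.log2(q)) for x, q in Q.items()}

# Kraft sum: uniquely decodable code-lengths satisfy sum_x 2^{-L(x)} <= 1.
kraft_sum = sum(2.0 ** -L for L in lengths.values())
assert kraft_sum <= 1.0

# Code-lengths -> probability measure Q'(x) proportional to 2^{-L(x)}.
Q_back = {x: 2.0 ** -L / kraft_sum for x, L in lengths.items()}
print(lengths, kraft_sum, Q_back)
```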

Lossy Compression in More Detail

Data: $X_1^n = X_1, X_2, \ldots, X_n$ with distribution $P_n$ on $A^n$.
Quantizer: $q_n : A^n \to$ codebook $B_n \subset \hat{A}^n$.
Encoder: $e_n : B_n \to \{0,1\}^*$ (uniquely decodable).
Code-length: $L_n(X_1^n) = L_n(q_n(X_1^n))$ = length of $e_n(q_n(X_1^n))$ in bits.

[Block diagram: source $\to$ quantizer $q_n$ $\to$ encoder $e_n$ $\to$ decoder $e_n^{-1}$.]

Step 2. Codes as Probability Distributions

The code $(C_n, L_n) = (B_n, q_n, e_n, L_n)$ operates at distortion level $D$ if $d_n(x_1^n, q_n(x_1^n)) \le D$ for every $x_1^n \in A^n$.

Kraft Inequality (lossless):
($\Leftarrow$) For every lossless code $(C_n, L_n)$ there is a probability measure $Q_n$ on $A^n$ such that $L_n(x_1^n) \ge -\log Q_n(x_1^n)$ bits, for all $x_1^n$.
($\Rightarrow$) For any probability measure $Q_n$ on $A^n$ there is a code $(C_n, L_n)$ such that $L_n(x_1^n) \le -\log Q_n(x_1^n) + 1$ bits, for all $x_1^n$.

Theorem: Lossy Kraft Inequality
($\Leftarrow$) For every code $(C_n, L_n)$ operating at distortion level $D$ there is a probability measure $Q_n$ on $\hat{A}^n$ such that $L_n(x_1^n) \ge -\log Q_n(B(x_1^n, D))$ bits, for all $x_1^n$.
($\Rightarrow$) For any admissible sequence of measures $\{Q_n\}$ on $\hat{A}^n$ there are codes $\{(C_n, L_n)\}$ at distortion level $D$ such that $L_n(X_1^n) \le -\log Q_n(B(X_1^n, D)) + \log n$ bits, eventually, w.p.1.

Remarks on the Codes-Measures Correspondence

- The converse part is a finite-$n$ result, as in the lossless case.
- The direct part is asymptotic (random coding), but with (near) optimal convergence rate.
- Both results are valid without any assumptions on the source or the distortion measure.
- Similar results hold in expectation, with a $\tfrac{1}{2}\log n$ rate.
- Admissibility means the $\{Q_n\}$ yield codes with finite rate:
  $\limsup_{n\to\infty} -\tfrac{1}{n}\log Q_n(B(X_1^n, D)) \le$ some finite $R$ bits/symbol, w.p.1.

This suggests a natural lossy analog of the Shannon code-lengths:
  $L_n(X_1^n) = -\log Q_n(B(X_1^n, D))$ bits.

All codes are random codes.

Proof Outline

($\Leftarrow$) Given a code $(C_n, L_n)$, let $Q_n(y_1^n) \propto 2^{-L_n(y_1^n)}$ if $y_1^n \in B_n$, and $0$ otherwise. Then for all $x_1^n$:
  $L_n(x_1^n) = L_n(q_n(x_1^n)) \ge -\log Q_n(q_n(x_1^n)) \ge -\log Q_n(B(x_1^n, D))$ bits.

($\Rightarrow$) Given $Q_n$, generate IID codewords $Y_1^n(1), Y_1^n(2), Y_1^n(3), \ldots \sim Q_n$, and encode $X_1^n$ as the position of the first $D$-close match:
  $W_n = \min\{\, i : d_n(X_1^n, Y_1^n(i)) \le D \,\}$.
This takes
  $L_n(X_1^n) \approx \log W_n$ bits $= \log[\text{waiting time for a match}] \approx \log[1/\text{prob of a match}] = -\log Q_n(B(X_1^n, D))$ bits. $\Box$
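To make the random-coding direction concrete, here is a small Python sketch (my own illustration, with assumed toy parameters) of the waiting-time encoder for a Bernoulli source under Hamming distortion: it draws IID codewords and records the index $W_n$ of the first one within distortion $D$.

```python
# Minimal sketch (not the talk's code): random coding at distortion level D
# for a Bernoulli(p) source under Hamming distortion, with IID Bernoulli(q)
# codewords playing the role of Q_n.
import numpy as np

rng = np.random.default_rng(0)
n, p, q, D = 20, 0.5, 0.5, 0.25           # assumed toy parameters

x = rng.random(n) < p                      # source block X_1^n

def first_match_index(x, q, D, rng, max_tries=2_000_000):
    """Return W_n, the index of the first IID codeword within distortion D of x."""
    for i in range(1, max_tries + 1):
        y = rng.random(len(x)) < q         # codeword Y_1^n(i) ~ Q_n
        if np.mean(x != y) <= D:           # d_n(x, y) <= D ?
            return i
    raise RuntimeError("no D-close codeword found")

W = first_match_index(x, q, D, rng)
print("W_n =", W, " code length ~ log2(W_n) =", round(float(np.log2(W)), 2), "bits")
```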

Step 3. Coding Theorems: Best Achievable Performance

Let $Q_n^*$ achieve
  $K_n(D) \triangleq \inf_{Q_n} E\big[-\log Q_n(B(X_1^n, D))\big]$.

Theorem: One-Shot Converses
i. For any code $(C_n, L_n)$ operating at distortion level $D$:
   $E[L_n(X_1^n)] \ge K_n(D) \ge R_n(D)$ bits.
ii. For any (other) probability measure $Q_n$ on $\hat{A}^n$ and any $K$:
   $\Pr\big\{ -\log Q_n(B(X_1^n, D)) \le -\log Q_n^*(B(X_1^n, D)) - K \text{ bits} \big\} \le 2^{-K}$.

Proof. Selection in convex families: the Bell-Cover version of the Kuhn-Tucker conditions. $\Box$
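For orientation, here is a one-line sketch (my own, hedged) of how part ii would follow once the Bell-Cover/Kuhn-Tucker step supplies the key property $E\big[ Q_n(B(X_1^n, D)) / Q_n^*(B(X_1^n, D)) \big] \le 1$. By Markov's inequality,

  $\Pr\Big\{ \frac{Q_n(B(X_1^n, D))}{Q_n^*(B(X_1^n, D))} \ge 2^{K} \Big\} \;\le\; 2^{-K}\, E\Big[ \frac{Q_n(B(X_1^n, D))}{Q_n^*(B(X_1^n, D))} \Big] \;\le\; 2^{-K},$

and the event on the left is exactly $\{ -\log Q_n(B(X_1^n, D)) \le -\log Q_n^*(B(X_1^n, D)) - K \}$.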

Coding Theorems Continued

Theorem: Asymptotic Bounds
i. For any sequence of codes $\{(C_n, L_n)\}$ operating at distortion level $D$:
   $L_n(X_1^n) \ge -\log Q_n^*(B(X_1^n, D)) - \log n$ bits, eventually, w.p.1, with $Q_n^*$ as before.
ii. There is a sequence of codes $\{(C_n^*, L_n^*)\}$ operating at distortion level $D$ such that
   $L_n^*(X_1^n) \le -\log Q_n^*(B(X_1^n, D)) + \log n$ bits, eventually, w.p.1.

Proof. Finite-$n$ bounds + Markov inequality + Borel-Cantelli + extra care. $\Box$

Remarks

1. Gaussian approximation. This is a good starting point for developing the well-known Gaussian approximation results for the fundamental limits of lossy compression [K2000, Kostina-Verdú, ...].

2. More fundamentally. We wish to approximate the performance of the optimal Shannon code: find $\{Q_n\}$ that yield code-lengths
   $L_n(X_1^n) = -\log Q_n(B(X_1^n, D))$ bits
close to those of the optimal Shannon code,
   $L_n^*(X_1^n) = -\log Q_n^*(B(X_1^n, D))$ bits.

3. Performance?

Step 4. Code Performance: Generalized AEP

Suppose:
- The source $\{X_n\}$ is stationary ergodic with distribution $P$.
- $\{Q_n\}$ are the marginals of a stationary ergodic $Q$.
- $d_n(x_1^n, y_1^n) = \frac{1}{n}\sum_{i=1}^n d(x_i, y_i)$ is a single-letter distortion measure.

Theorem: Generalized AEP [L. & Szpan.], [Dembo & K.], [Chi], [...]
If $Q$ is mixing enough and $d(x,y)$ is not wild:
  $-\frac{1}{n}\log Q_n(B(X_1^n, D)) \to R(P, Q, D)$ bits/symbol, w.p.1,
where
  $R(P, Q, D) = \lim_{n\to\infty} \frac{1}{n} \inf\big\{ H(P_{X_1^n, Y_1^n} \,\|\, P_n \times Q_n) : P_{X_1^n} = P_n,\ E[d_n(X_1^n, Y_1^n)] \le D \big\}$.

Proof. Based on very technical large deviations. $\Box$
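As a toy numerical check of the generalized AEP (my addition, with assumed parameters), take an IID Bernoulli(1/2) source and an IID Bernoulli(1/2) $Q$ under Hamming distortion. Then $Q_n(B(x_1^n, D)) = \Pr\{\mathrm{Binomial}(n, 1/2) \le nD\}$ for every $x_1^n$, and $-\frac{1}{n}\log_2 Q_n(B(x_1^n, D))$ should approach $R(P, Q, D) = 1 - h(D)$ bits/symbol, where $h$ is the binary entropy.

```python
# Toy illustration (my addition): convergence in the generalized AEP for a
# Bernoulli(1/2) source and IID Bernoulli(1/2) Q under Hamming distortion.
import math

def h(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

D = 0.25
for n in (20, 100, 500, 2000):
    # Q_n(B(x, D)) = P(Binomial(n, 1/2) <= n*D), the same for every x.
    ball_prob = sum(math.comb(n, k) for k in range(int(n * D) + 1)) / 2 ** n
    print(n, round(-math.log2(ball_prob) / n, 4))

print("limit R(P, Q, D) = 1 - h(D) =", round(1 - h(D), 4))
```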

Step 5. Sanity Check: Stationary Ergodic Sources

Suppose:
- The source $\{X_n\}$ is stationary ergodic with distribution $P$.
- $d_n(x_1^n, y_1^n) = \frac{1}{n}\sum_{i=1}^n d(x_i, y_i)$ is a single-letter distortion measure.
- As before, $Q_n^*$ achieves $K_n(D) = \inf_{Q_n} E[-\log Q_n(B(X_1^n, D))]$.

Theorem ([Kieffer], [K & Zhang])
i. $K(D) = \lim_{n\to\infty} \frac{1}{n} K_n(D) = \lim_{n\to\infty} \frac{1}{n} R_n(D) = R(D)$.
ii. $-\frac{1}{n}\log Q_n^*(B(X_1^n, D)) \to R(D)$ bits/symbol, w.p.1.

Outline

Recall our program:
1. Ideal Compression: Kolmogorov Distortion-Complexity
2. Codes as Probability Distributions: Lossy Kraft Inequality
3. Coding Theorems: Asymptotics & Non-Asymptotics
4. Code Performance: Generalized AEP
5. Solidarity with Shannon Theory: Stationary Ergodic Sources
6. Aside: Lossy Compression & Inference: What IS the best code?
7. Choosing a Code: Lossy MLE & Lossy MDL
8. Toward Practicality: Pre-processing in VQ Design

So far

Setting: We identified codes with probability measures,
  $L_n(X_1^n) = -\log Q_n(B(X_1^n, D))$ bits.

Design: What is $Q_n^*$? What are good codes like? How do we find them? How can we empirically design/choose a good code? Given a parametric family $\{Q_\theta : \theta \in \Theta\}$ of codes, how do we choose $\theta$?

Step 6. Lossy Compression & Inference: What is $Q_n^*$?

An HMM example: noisy data. Suppose
- $\{Y_n\}$ is a finite-state Markov chain,
- $\{Z_n\}$ is IID $N(0, \sigma^2)$ noise, and the data are $X_n = Y_n + Z_n$,
- $d(x, y)$ is MSE: $d_n(x_1^n, y_1^n) = \frac{1}{n}\sum_{i=1}^n (x_i - y_i)^2$.

[Diagram: hidden Markov model $Y_1 \to Y_2 \to \cdots \to Y_n$ with observations $X_i = Y_i + Z_i$, $Z_i$ IID $N(0, \sigma^2)$.]

Then:
- For $D = \sigma^2$, we have $Q_n^* \approx$ the distribution of $Y_1^n$.
- For $D < \sigma^2$, we have $Q_n^* \approx$ the distribution of $[\,Y_1^n + \text{IID } N(0, \sigma^2 - D)\,]$.

Lossy Compression & Inference: Denoising

Idea: Lossy compression of noisy data $\Rightarrow$ denoising.

Suppose:
- $X_1^n$ is noisy data, $X_1^n = Y_1^n + Z_1^n$,
- the noise is maximally random with strength $\sigma^2$,
- the distortion measure $d(x, y)$ is matched to the noise,
- $0 \le D \le \sigma^2$ [i.e., if the Shannon lower bound is tight].

Then $Q_n^* \approx$ the distribution of $Y_1^n$ + noise with strength $(\sigma^2 - D)$.

Examples: binary noise with Hamming distortion; Gaussian noise with MSE.
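A back-of-the-envelope check of the Gaussian/MSE case (my own sketch, not from the talk). When the Shannon lower bound is tight, the optimal reproduction $Y^*$ satisfies the backward test channel $X \stackrel{d}{=} Y^* + W$ with $W \sim N(0, D)$ independent of $Y^*$. If $X = Y + Z$ with $Z \sim N(0, \sigma^2)$ independent of $Y$, take $Y^* = Y + Z_1$ and $W = Z_2$, where $Z_1 \sim N(0, \sigma^2 - D)$, $Z_2 \sim N(0, D)$, and $Y, Z_1, Z_2$ are independent. Then

  $Y^* + W = Y + Z_1 + Z_2 \stackrel{d}{=} Y + N(0, \sigma^2) \stackrel{d}{=} X,$

so the law of $Y + N(0, \sigma^2 - D)$ is consistent with the backward channel, matching the form of $Q_n^*$ stated above.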

Step 7. Choosing a Code: The Lossy MLE & a Lossy MDL Proposal

Greedy empirical code selection. Given a parametric family $\{Q_\theta : \theta \in \Theta\}$ of codes, define:
  $\hat{\theta}_{\mathrm{MLE}} \triangleq \arg\inf_{\theta \in \Theta} \big[ -\log Q_\theta(B(X_1^n, D)) \big]$.

Problems: As with the classical MLE, this
1. encourages overfitting, and
2. does not lead to real codes.

Solution: Follow coding intuition. For the MLE to be useful, it needs to be described as well
$\Rightarrow$ consider two-part codes:
  $L_n(X_1^n) = -\log Q_\theta(B(X_1^n, D)) + \ell_n(\theta)$,
where $\ell_n(\theta)$ is the description length of $\theta$: either $(c_n, \ell_n)$ is a prefix-free code on $\Theta$, or $\ell_n(\theta)$ is an appropriate penalization term.

Two Lossy MDL Proposals

Two specific MDL-like proposals. Define:
  $\hat{\theta}_{\mathrm{MDL}} \triangleq \arg\inf_{\theta \in \Theta} \big[ -\log Q_\theta(B(X_1^n, D)) + \ell_n(\theta) \big]$,
with either:
(a) $\ell_n(\theta) = \frac{\dim(Q_\theta)}{2} \log n$; or
(b) $\ell_n(\theta) = \ell(\theta)$ for $\theta$ in some countable subset of $\Theta$, and $= \infty$ otherwise.

Consistency? Does $\hat{\theta}_{\mathrm{MLE}}/\hat{\theta}_{\mathrm{MDL}}$ asymptotically lead to optimal compression? What is the optimal $\theta^*$? What if $\theta^*$ is not unique? (the typical case) As in the classical case, the proof is often hard and problem-specific.

Justification for the Lossy MDL Estimate

1. It leads to realistic code selection.
2. An example:

Theorem. Let $\{X_n\}$ be real-valued, stationary, ergodic, with $E(X_n) = 0$ and $\mathrm{Var}(X_n) = 1$. Take $d_n(x_1^n, y_1^n)$ = MSE, let $D \in (0,1)$ be fixed, and let
  $Q_\theta$ = IID $\tfrac{1}{2} N(0, 1-D) + \tfrac{1}{2} N(0, \theta)$, $\theta \in [0,1]$.
With $\ell_n(\theta) = \frac{\dim(Q_\theta)}{2} \log n$ we have:
(a) $\hat{\theta}_{\mathrm{MLE}}$ and $\hat{\theta}_{\mathrm{MDL}}$ both $\to \theta^* = 1 - D$, w.p.1;
(b) $\hat{\theta}_{\mathrm{MLE}} \ne (1-D)$ infinitely often, w.p.1;
(c) $\hat{\theta}_{\mathrm{MDL}} = (1-D)$ eventually, w.p.1.
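To see the lossy likelihood in action numerically, here is a simplified Python sketch (my own; it uses a one-parameter family $Q_\tau$ = IID $N(0, \tau)$ rather than the theorem's two-component mixture, and assumed toy parameters). For IID $N(0, \tau)$ codewords under MSE the ball probability has a closed form via a noncentral chi-square, so $-\log Q_\tau(B(x_1^n, D))$ can be evaluated exactly and minimized over a grid; the minimizer should land near $1 - D$, the optimal reproduction variance for a unit-variance source.

```python
# Simplified sketch (my addition): the lossy MLE over Q_tau = IID N(0, tau)
# for an IID N(0,1) source under MSE. If Y_i ~ N(0, tau) then
# sum_i (x_i - Y_i)^2 / tau is noncentral chi-square with n degrees of
# freedom and noncentrality sum_i x_i^2 / tau, so the ball probability
# Q_tau(B(x, D)) = P(sum_i (x_i - Y_i)^2 <= n*D) is exactly computable.
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(1)
n, D = 50, 0.5                      # assumed toy blocklength and distortion
x = rng.standard_normal(n)          # source block X_1^n ~ IID N(0, 1)

def neg_log_ball_prob(tau):
    """-log Q_tau(B(x, D)) in nats."""
    return -ncx2.logcdf(n * D / tau, df=n, nc=np.sum(x ** 2) / tau)

taus = np.linspace(0.05, 2.0, 200)
objective = [neg_log_ball_prob(t) for t in taus]
tau_hat = taus[int(np.argmin(objective))]
print("lossy MLE estimate:", round(float(tau_hat), 3),
      "(population optimum is 1 - D =", 1 - D, ")")
```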

Example Interpretation and Details

Although artificial, the above example illustrates a general phenomenon: $\hat{\theta}_{\mathrm{MLE}}$ overfits, whereas $\hat{\theta}_{\mathrm{MDL}}$ doesn't.

Proof. To compare $\hat{\theta}_{\mathrm{MLE}}$ with $\hat{\theta}_{\mathrm{MDL}}$, we need estimates of the log-likelihood $-\log Q_\theta(B(X_1^n, D))$ with accuracy better than $O(\log n)$, uniformly in $\theta$. This involves very intricate large deviations:

STEPS 1 & 2: Expand
  $-\log Q_\theta(B(X_1^n, D)) = -\log \Pr\big\{ \tfrac{1}{n}\sum_{i=1}^n d(X_i, Y_i) \le D \,\big|\, X_1^n \big\} = nR(\hat{P}_{X_1^n}, Q_\theta, D) + \tfrac{1}{2}\log n + O(1)$, w.p.1,
  $= nR(P, Q_\theta, D) + \sum_{i=1}^n g_\theta(X_i) + \tfrac{1}{2}\log n + O(\log\log n)$, w.p.1.
STEP 3: Implicitly identify $g_\theta(x)$ as the derivative of a convex dual.
STEP 4: Expand $g_\theta(x)$ in a Taylor series around $\theta^*$.
STEP 5: Use the LIL to compare $\hat{\theta}_{\mathrm{MLE}}$ with $\hat{\theta}_{\mathrm{MDL}}$.
STEP 6: Justify the a.s.-uniform approximation. $\Box$

Another Example of Consistency: Gaussian Mixtures

Let:
- The source $\{X_n\}$ be $\mathbb{R}^k$-valued, stationary, ergodic, with finite mean and covariance and otherwise arbitrary distribution.
- The $Q_\theta$ be Gaussian mixtures: $Q_\theta$ = IID $\sum_{i=1}^L p_i N(\mu_i, K_i)$, where $\theta = \big( (p_1, \ldots, p_L), (\mu_1, \ldots, \mu_L), (K_1, \ldots, K_L) \big)$, with $k, L$ fixed, $\mu_i \in [-M, M]^k$, and each $K_i$ having eigenvalues in some fixed interval.
- $d_n(x_1^n, y_1^n)$ = MSE, $D > 0$ fixed.

Motivation: Practical quantization/clustering schemes, e.g., Gray's Gauss mixture VQ and MDI selection.

Theorem: There is a unique optimal $\theta^*$, characterized by a $k$-dimensional variational problem,
  $\inf_{\theta \in \Theta} R(P_k, Q_{\theta,k}, D) = R(P_k, Q_{\theta^*,k}, D)$,
and $\hat{\theta}_{\mathrm{MLE}}, \hat{\theta}_{\mathrm{MDL}}$ both $\to \theta^*$, w.p.1 and in $L^1$.

Weak Consistency: A General Theorem

Suppose:
(a) The generalized AEP holds for all $Q_\theta$ in $\Theta$.
(b) The rate function $R(P, Q_\theta, D)$ of the AEP is lower semicontinuous in $\theta$.
(c) The lower bound of the AEP holds uniformly on compacts in $\Theta$.
(d) $\hat{\theta}$ stays in a compact set, eventually, w.p.1.
Then
  $\mathrm{dist}(\hat{\theta}_{\mathrm{MLE}}, \{\theta^*\}) \to 0$ w.p.1 and $\mathrm{dist}(\hat{\theta}_{\mathrm{MDL}}, \{\theta^*\}) \to 0$ w.p.1.

Proof. Based on epi-convergence (or $\Gamma$-convergence); quite technical. $\Box$

Conditions: (a) we saw; (c) is often checked by Chernoff-bound-like arguments; (b) and (d) need to be checked case-by-case.
$\Rightarrow$ These sufficient conditions can be substantially weakened.

Strong Consistency: A Conjecture

Under conditions (a)-(d) above:

Conjecture. For smooth enough parametric families, under regularity conditions:
- always: $\dim(\hat{\theta}_{\mathrm{MDL}}) = \dim(\theta^*)$ eventually, w.p.1;
- typically: $\dim(\hat{\theta}_{\mathrm{MLE}}) \ne \dim(\theta^*)$ infinitely often, w.p.1.

Proof? We saw the many technicalities already present in the simple Gaussian case; the general result is still open.

Step 8. An Application: Pre-processing in VQ Design

Remarks: So far, everything is based on the measures $\leftrightarrow$ (random) codes correspondence. Practical implications?

Candidate application: Gaussian-mixture VQ.
- Stationary, ergodic, $\mathbb{R}^k$-valued source $\{X_n\}$ with finite mean and covariance and otherwise arbitrary distribution $P$.
- The $Q_\theta$ are Gaussian mixtures: $Q_\theta$ = IID $\sum_{i=1}^L p_i N(\mu_i, K_i)$, with $\mu_i \in [-M, M]^k$ and each $K_i$ having eigenvalues in some fixed interval.
- $d_n(x_1^n, y_1^n)$ = MSE, $D > 0$ fixed.

Main problem: Choose $L$.

MDL estimate: $\hat{L} = [\#$ of components in $\hat{\theta}_{\mathrm{MDL}}]$.
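As a rough practical stand-in (my addition, not the talk's method), the same "choose $L$ by penalized likelihood" idea can be mimicked with an ordinary BIC-penalized Gaussian mixture fit; the lossy MDL criterion would instead use the ball likelihood $-\log Q_\theta(B(X_1^n, D))$ plus the $\frac{\dim(Q_\theta)}{2}\log n$ penalty.

```python
# Rough stand-in sketch (my addition): selecting the number of mixture
# components L by a penalized-likelihood (BIC) criterion on assumed toy data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical R^2-valued source: a two-component Gaussian mixture.
X = np.vstack([rng.normal(-2.0, 1.0, size=(500, 2)),
               rng.normal(+2.0, 1.0, size=(500, 2))])

bic = {L: GaussianMixture(n_components=L, random_state=0).fit(X).bic(X)
       for L in range(1, 6)}
L_hat = min(bic, key=bic.get)
print("selected number of components L_hat =", L_hat)
```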

Final Remarks: The MDL Point of View

Guideline. Kolmogorov distortion-complexity: not computable.

Codes-measures correspondence: all codes are random codes,
  $L_n(X_1^n) = -\log Q_n(B(X_1^n, D))$ bits.

Optimal code: $Q_n^* = \arg\inf_{Q_n} E\big[-\log Q_n(B(X_1^n, D))\big]$.

Generalized AEP(s): $-\frac{1}{n}\log Q_n(B(X_1^n, D)) \to R(P, Q, D)$.

Lossy MLE: consistent but overfits,
  $\hat{\theta}_{\mathrm{MLE}} \triangleq \arg\inf_{\theta \in \Theta} \big[ -\log Q_\theta(B(X_1^n, D)) \big]$.

Lossy MDL: consistent and does NOT overfit,
  $\hat{\theta}_{\mathrm{MDL}} \triangleq \arg\inf_{\theta \in \Theta} \big[ -\log Q_\theta(B(X_1^n, D)) + \ell_n(\theta) \big]$.

VQ design: pre-processing with the lossy MDL reduces problem dimensionality.
