EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16


1. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set $C_n(t) = \{x^n : P_{X^n}(x^n) \ge 2^{-nt}\}$.

(a) We have
$$1 = \sum_{x^n} P_{X^n}(x^n) \ge \sum_{x^n \in C_n(t)} P_{X^n}(x^n) \ge \sum_{x^n \in C_n(t)} 2^{-nt} = |C_n(t)|\, 2^{-nt},$$
from which the desired result $|C_n(t)| \le 2^{nt}$ follows.

(b) We want to find the set of values $t$ for which $\Pr(X^n \in C_n(t)) \to 1$. Consider the probability therein:
$$\Pr(X^n \in C_n(t)) = \Pr\big(P_{X^n}(X^n) \ge 2^{-nt}\big) = \Pr\Big(-\tfrac{1}{n}\log P_{X^n}(X^n) \le t\Big) = \Pr\Big(-\tfrac{1}{n}\textstyle\sum_{i=1}^n \log P_X(X_i) \le t\Big).$$
Now note that the mean of $-\tfrac{1}{n}\sum_{i=1}^n \log P_X(X_i)$ is $H(X)$ and this rv has finite variance. If $t = H(X) + \delta$ for any $\delta > 0$, then by the law of large numbers the probability converges to one. Hence the required set is the open interval $(H(X), \infty)$.

2. (Optional) Cover and Thomas: Problem 3.7 AEP and Source Coding:

(a) The number of 100-bit binary sequences with three or fewer ones is
$$\binom{100}{0} + \binom{100}{1} + \binom{100}{2} + \binom{100}{3} = 1 + 100 + 4950 + 161700 = 166751,$$
so the required codelength is $\lceil \log_2 166751 \rceil = 18$.

(b) The probability that a 100-bit sequence has three or fewer ones is
$$\sum_{i=0}^{3} \binom{100}{i} (0.005)^i (0.995)^{100-i} = 0.99833.$$
Thus, the probability that the sequence that is generated cannot be encoded is $1 - 0.99833 = 0.00167$.
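The counting and the probabilities in parts (a) and (b) are easy to recompute. Below is a minimal Python sketch (my own check, not part of the original solutions) that reproduces the numbers above.

```python
from math import comb, ceil, log2

n, p = 100, 0.005

# Part (a): number of 100-bit sequences with three or fewer ones,
# and the codelength needed to index all of them.
count = sum(comb(n, k) for k in range(4))          # 1 + 100 + 4950 + 161700 = 166751
print(count, ceil(log2(count)))                    # 166751 18

# Part (b): probability that a Bernoulli(0.005) sequence has <= 3 ones,
# i.e. the probability that the generated sequence can be encoded.
p_ok = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4))
print(round(p_ok, 5), round(1 - p_ok, 5))          # 0.99833 0.00167
```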

(c) If $S_n$ is the sum of $n$ iid random variables $X_1, \ldots, X_n$, Chebyshev's inequality states that
$$\Pr(|S_n - n\mu| \ge \epsilon) \le \frac{n\sigma^2}{\epsilon^2},$$
where $\mu$ and $\sigma^2$ are the mean and variance of the $X_i$'s. In this problem, $n = 100$, $\mu = 0.005$, and $\sigma^2 = 0.005 \times 0.995$. Note that $S_{100} \ge 4$ if and only if $|S_{100} - 100 \times 0.005| \ge 3.5$, so we should choose $\epsilon = 3.5$. Then
$$\Pr(S_{100} \ge 4) \le \frac{100 \times 0.005 \times 0.995}{3.5^2} \approx 0.0406.$$
This bound is much larger than the actual probability $0.00167$.

3. Cover and Thomas: Problem 3.9 AEP and Divergence:

(a) We have that $X_1, \ldots, X_n$ are iid according to $p(x)$. We are asked to evaluate the limit in probability of
$$L = -\frac{1}{n} \log q(X_1, \ldots, X_n).$$
First note from memorylessness (independence) that
$$L = -\frac{1}{n} \sum_{i=1}^n \log q(X_i).$$
We may also write
$$L = \underbrace{\frac{1}{n} \sum_{i=1}^n \log \frac{p(X_i)}{q(X_i)}}_{=L_1} \; \underbrace{-\, \frac{1}{n} \sum_{i=1}^n \log p(X_i)}_{=L_2}.$$
We know by the usual AEP that the second term $L_2$ converges to $H(p) = H(X)$ in probability. The first term $L_1$ converges in probability to
$$\mathbb{E}[L_1] = \sum_x p(x) \log \frac{p(x)}{q(x)} = D(p \| q),$$
so $L$ converges in probability to $\mathbb{E}[L] = D(p \| q) + H(p)$.

(b) The limit of the log-likelihood ratio $\frac{1}{n} \log \frac{q(X_1, \ldots, X_n)}{p(X_1, \ldots, X_n)}$ in probability is
$$\mathbb{E}\left[\log \frac{q(X)}{p(X)}\right] = \sum_x p(x) \log \frac{q(x)}{p(x)} = -D(p \| q).$$
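As an illustration of Problem 3(a), the following Python sketch (my own, with arbitrarily chosen Bernoulli pmfs $p$ and $q$) draws a long i.i.d. sample from $p$ and checks that $-\frac{1}{n}\log q(X_1,\ldots,X_n)$ is close to $D(p\|q) + H(p)$.

```python
import random, math

# Toy check of Problem 3(a): -(1/n) log q(X^n) ~ D(p||q) + H(p) for large n.
# p and q are Bernoulli pmfs on {0, 1}; the values below are arbitrary choices.
p = {0: 0.7, 1: 0.3}
q = {0: 0.4, 1: 0.6}

n = 200_000
random.seed(0)
xs = [0 if random.random() < p[0] else 1 for _ in range(n)]

L = -sum(math.log2(q[x]) for x in xs) / n                 # empirical -(1/n) log q(X^n)
H_p = -sum(p[x] * math.log2(p[x]) for x in p)             # H(p)
D_pq = sum(p[x] * math.log2(p[x] / q[x]) for x in p)      # D(p||q)

print(round(L, 4), round(D_pq + H_p, 4))   # the two numbers should nearly agree
```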

4. From Last Year's Exam:

(a) One value for $c$ is
$$c = \frac{H(X_1) + H(X_2)}{2}.$$
This is because
$$-\frac{1}{n} \log \Pr(X^n = x^n) = -\frac{1}{n} \sum_{i \text{ odd}} \log \Pr(X_i = x_i) \; - \; \frac{1}{n} \sum_{i \text{ even}} \log \Pr(X_i = x_i).$$
By the law of large numbers, the first sum tends to $H(X_1)/2$ since the distribution of $X_i$ for odd $i$ is $p_{X_1}$, while the second sum tends to $H(X_2)/2$ since the distribution of $X_i$ for even $i$ is $p_{X_2}$.

(b) The optimal compression rate is $c$. We code as follows. If the encoder observes a sequence in $T_\varepsilon^n(c)$, represent it using $c + \varepsilon + 2/n$ bits per symbol (with a single-bit prefix to indicate that the sequence belongs to the typical set). If the encoder observes a sequence not in $T_\varepsilon^n(c)$, encode it with an arbitrary string of length $nc + n\varepsilon + 2$. This ensures that the compression rate is no more than $c + \varepsilon + 2/n$ and that the probability of decoding error goes to zero as $n$ becomes large. Since $c + \varepsilon + 2/n$ is arbitrarily close to $c$, the claim is proved.

(c) The minimum compression rate is $H(X|Y)$. We code as follows, assuming $\mathcal{Y} = \{0, 1\}$. Let $I_0 := \{i = 1, \ldots, n : y_i = 0\}$ be the indices at which the side information $y_i = 0$, and let $I_1 := \{1, \ldots, n\} \setminus I_0$. Then we use an optimum length-$|I_0|$ source code for the source $\Pr(X = x \mid Y = 0)$ and an optimum length-$|I_1|$ source code for the source $\Pr(X = x \mid Y = 1)$. The code rate is thus
$$\frac{|I_0|}{n} H(X \mid Y = 0) + \frac{|I_1|}{n} H(X \mid Y = 1).$$
The decoder also knows the indices $I_0$ and $I_1$, so it can partition its observations into these two subblocks and decode as normal. Moreover, by the law of large numbers, $|I_j|/n \to p_Y(j)$ for $j = 0, 1$. Thus, for large enough $n$, with very high probability the code rate above is
$$p_Y(0) H(X \mid Y = 0) + p_Y(1) H(X \mid Y = 1),$$
which is the conditional entropy $H(X|Y)$.

(d) The source has uncertainty, or entropy, $H(X)$. The side information reduces the uncertainty from $H(X)$ to $H(X|Y)$. The difference is the mutual information $I(X; Y)$, which is the reduction in the uncertainty of $X$ obtained from knowing $Y$.

(e) Yes. The code rate will be increased to $\frac{9}{10} H(X) + \frac{1}{10} H(X|Y)$, since the side information is only available one-tenth of the time.
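To make the quantities in parts (c) and (d) concrete, here is a small Python sketch (my own illustration; the joint pmf is an arbitrary toy choice) that computes $H(X)$, $H(X|Y)$ and $I(X;Y) = H(X) - H(X|Y)$.

```python
from math import log2

# My own toy joint pmf p(x, y) on {0,1} x {0,1}, chosen only to illustrate the
# quantities in Problem 4(c)-(d): H(X), H(X|Y) and I(X;Y) = H(X) - H(X|Y).
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x = {x: sum(v for (a, _), v in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p_xy.items() if b == y) for y in (0, 1)}

H_X = -sum(v * log2(v) for v in p_x.values())

# H(X|Y) = sum_y p(y) * H(X | Y = y)
H_X_given_Y = 0.0
for y in (0, 1):
    for x in (0, 1):
        p_x_given_y = p_xy[(x, y)] / p_y[y]
        H_X_given_Y -= p_y[y] * p_x_given_y * log2(p_x_given_y)

print(round(H_X, 4), round(H_X_given_Y, 4), round(H_X - H_X_given_Y, 4))
```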

5. (Optional) 2015/16 Quiz 1: (10 points) Weighted Source Coding: In class, we saw that the minimum rate of compression for an i.i.d. source $X^n$ with distribution $P_X$ is $H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x)$. Now suppose that there are costs to encoding each symbol. Consider a cost function $c : \mathcal{X} \to [0, \infty)$. For any length-$n$ string, let
$$c^{(n)}(x^n) := \prod_{i=1}^n c(x_i),$$
and let the size of any set $A \subset \mathcal{X}^n$ be
$$c^{(n)}(A) := \sum_{x^n \in A} c^{(n)}(x^n).$$
We say that a rate $R$ is achievable if there exists a sequence (in $n$) of sets $A_n \subset \mathcal{X}^n$ with sizes $c^{(n)}(A_n)$ that satisfy
$$\frac{1}{n} \log c^{(n)}(A_n) \le R \quad \text{and} \quad \Pr(X^n \notin A_n) \to 0 \quad \text{as } n \to \infty.$$
We also define the optimal weighted source coding rate to be $R^*(X; c) := \inf\{R \in \mathbb{R} : R \text{ is achievable}\}$. Define
$$H(P \| c) := \sum_{x \in \mathcal{X}} P_X(x) \log \frac{c(x)}{P_X(x)},$$
and, for a small $\epsilon > 0$, the set
$$B_\epsilon^{(n)}(X; c) := \left\{ x^n : H(P \| c) - \epsilon \le \frac{1}{n} \log \frac{c^{(n)}(x^n)}{P_{X^n}(x^n)} \le H(P \| c) + \epsilon \right\}.$$

(a) What is $R^*(X; c)$ when $c(x) = 1$ for all $x \in \mathcal{X}$? No justification is needed; only an information quantity needs to be stated.

The answer is $H(X)$, the entropy.

(b) Now, for general $c : \mathcal{X} \to [0, \infty)$, is it true that
$$\Pr\big(X^n \in B_\epsilon^{(n)}(X; c)\big) \to 1 \quad \text{as } n \to \infty?$$
Prove it, or argue why it is not true.

Yes, it is true, by the law of large numbers. Consider
$$\Pr\big(X^n \notin B_\epsilon^{(n)}(X; c)\big) = \Pr\left( \left| \frac{1}{n} \sum_{i=1}^n \log \frac{c(X_i)}{P_X(X_i)} - H(P \| c) \right| > \epsilon \right) \le \frac{\mathrm{Var}\big(\log \frac{c(X)}{P_X(X)}\big)}{n \epsilon^2} \to 0.$$

(c) Show carefully that
$$c^{(n)}\big(B_\epsilon^{(n)}(X; c)\big) \le 2^{n(H(P\|c) + \epsilon)}.$$

We have
$$c^{(n)}\big(B_\epsilon^{(n)}(X; c)\big) = \sum_{x^n \in B_\epsilon^{(n)}(X; c)} c^{(n)}(x^n) \le \sum_{x^n \in B_\epsilon^{(n)}(X; c)} 2^{n(H(P\|c) + \epsilon)} P_{X^n}(x^n) \le 2^{n(H(P\|c) + \epsilon)} \sum_{x^n} P_{X^n}(x^n) = 2^{n(H(P\|c) + \epsilon)},$$
where the first inequality uses the upper bound in the definition of $B_\epsilon^{(n)}(X; c)$.

(d) Using part (c), find the best possible upper bound for $R^*(X; c)$. You need to prove an achievability result, i.e., specify the sets $A_n$ and provide a clear reason for your upper bound. You do not need to prove any converse.

An upper bound is $R^*(X; c) \le H(P \| c) + \epsilon$. Take $A_n := B_\epsilon^{(n)}(X; c)$.
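The concentration claim in part (b) above can be illustrated numerically. The sketch below (my own, with an arbitrary toy source and cost function) estimates the fraction of sampled sequences that fall in $B_\epsilon^{(n)}(X;c)$; for large $n$ it should be close to one.

```python
import random
from math import log2

# My own toy illustration of Problem 5(b): the empirical average
# (1/n) log( c^(n)(X^n) / P_{X^n}(X^n) ) concentrates around H(P||c),
# so Pr(X^n in B_eps^(n)(X;c)) -> 1.  The source and cost values are arbitrary.
P = {'a': 0.5, 'b': 0.3, 'c': 0.2}
cost = {'a': 1.0, 'b': 2.0, 'c': 4.0}

H_P_c = sum(P[x] * log2(cost[x] / P[x]) for x in P)      # H(P||c)

def in_B(xn, eps):
    # Check membership of the sequence xn in B_eps^(n)(X;c).
    avg = sum(log2(cost[x] / P[x]) for x in xn) / len(xn)
    return abs(avg - H_P_c) <= eps

random.seed(1)
n, eps, trials = 10_000, 0.05, 200
symbols, weights = list(P), list(P.values())
hits = sum(in_B(random.choices(symbols, weights=weights, k=n), eps)
           for _ in range(trials))
print(round(H_P_c, 4), hits / trials)   # H(P||c), and the fraction of sequences in B
```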

6. Bad Huffman Codes: Which of these codes cannot be Huffman codes for any probability assignment?

(a) $\{0, 10, 11\}$.

Solution: $\{0, 10, 11\}$ is a Huffman code for the distribution $(1/2, 1/4, 1/4)$.

(b) $\{00, 01, 10, 110\}$.

Solution: $\{00, 01, 10, 110\}$ is not a Huffman code because there is a unique longest codeword.

(c) $\{01, 10\}$.

Solution: The code $\{01, 10\}$ can be shortened to $\{01, 1\}$ without losing its instantaneous property, and therefore it is not optimal and not a Huffman code.

7. (Optional) Suffix-Free Codes: Define a suffix-free code as a code in which no codeword is a suffix of any other codeword.

(a) Show that suffix-free codes are uniquely decodable. Use the definition of unique decodability, rather than the intuitive but vague idea of decodability with initial synchronization.

Solution: Assume the contrary, i.e., that a suffix-free code is not uniquely decodable. Then there must exist two distinct sequences of source letters, say $(x_1, x_2, \ldots, x_n)$ and $(x'_1, x'_2, \ldots, x'_m)$, such that
$$C(x_1) C(x_2) \cdots C(x_n) = C(x'_1) C(x'_2) \cdots C(x'_m).$$
Then one of the following must hold: (i) $C(x_n) = C(x'_m)$, (ii) $C(x_n)$ is a suffix of $C(x'_m)$, or (iii) $C(x'_m)$ is a suffix of $C(x_n)$. In the last two cases we arrive at a contradiction since our code is suffix-free. In the first case, simply delete the last source letter from each of the two sequences and repeat the argument until one of the latter two cases holds and a contradiction is reached. Hence, suffix-free codes are uniquely decodable.

Alternatively, the fact that the code is uniquely decodable can be seen easily by reversing the order of the code. For any received sequence, we work backwards from the end and look for the reversed codewords. Since the codewords satisfy the suffix condition, the reversed codewords satisfy the prefix condition, and then we can uniquely decode the reversed code.

(b) Find an example of a suffix-free code with codeword lengths $(1, 2, 2)$ that is not a prefix-free code. Can a codeword be decoded as soon as its last bit arrives at the decoder? Show that a decoder might have to wait for an arbitrarily long time before decoding (this is why a careful definition of unique decodability is required).

Solution: The $\{0, 01, 11\}$ code discussed in the lecture is an example of a suffix-free code with codeword lengths $(1, 2, 2)$ that is not a prefix-free code. Clearly, a codeword cannot always be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder: $0111111\ldots$ Assuming that the source letters $\{a, b, c\}$ map to $\{0, 01, 11\}$, we cannot distinguish between the two possible source sequences $accccccc\ldots$ and $bccccccc\ldots$ until the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding.
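The "reverse the code" argument in Problem 7(a) suggests a simple decoder. The sketch below (my own illustration, using the $\{0, 01, 11\}$ code from part (b)) reverses the received string and the codewords and then decodes greedily.

```python
# My own sketch of the "reverse the code" idea in Problem 7(a): a suffix-free code
# becomes prefix-free once every codeword and the received string are reversed,
# so the reversed stream can be decoded greedily.
code = {'a': '0', 'b': '01', 'c': '11'}            # suffix-free code from Problem 7(b)
rev_to_symbol = {cw[::-1]: s for s, cw in code.items()}

def decode_suffix_free(bits: str) -> str:
    out, buf = [], ''
    for bit in reversed(bits):                     # scan from the end of the string
        buf += bit
        if buf in rev_to_symbol:                   # reversed codewords are prefix-free
            out.append(rev_to_symbol[buf])
            buf = ''
    assert buf == '', 'ill-formed codeword stream'
    return ''.join(reversed(out))

encoded = ''.join(code[s] for s in 'bcccc')        # '01' + '11' * 4 = '0111111111'
print(decode_suffix_free(encoded))                 # 'bcccc'
```

Note that the encodings of $acc\ldots c$ and $bcc\ldots c$ are both a $0$ followed by a run of $1$s, differing only in the parity of that run, so this decoder (like any decoder for this code) can only produce output once the whole string is available, consistent with part (b).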

8. (Optional) Kraft for Uniquely Decodable Codes: Assume a uniquely decodable code has lengths $l_1, \ldots, l_M$.

(a) Prove the following identity (this is easy):
$$\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n = \sum_{j_1=1}^{M} \sum_{j_2=1}^{M} \cdots \sum_{j_n=1}^{M} 2^{-(l_{j_1} + l_{j_2} + \cdots + l_{j_n})}.$$

Solution: This is trivial. Simply expand the product of sums.

(b) Show that there is one term on the right for each concatenation of $n$ codewords (i.e., for the encoding of the $n$-tuple $x = (x_1, \ldots, x_n)$), where $l_{j_1} + l_{j_2} + \cdots + l_{j_n}$ is the aggregate length of that concatenation.

Solution: $l_{j_1} + l_{j_2} + \cdots + l_{j_n}$ is the length of $n$ codewords from the code.

(c) Let $A_i$ be the number of concatenations that have overall length $i$ and show that
$$\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n = \sum_{i=n}^{n l_{\max}} A_i \, 2^{-i}.$$

Solution: The smallest value the exponent $l_{j_1} + \cdots + l_{j_n}$ can take is $n$, which would happen if all codewords had length $1$. The largest value it can take is $n l_{\max}$, where $l_{\max}$ is the maximal codeword length. Grouping the terms of the sum in part (a) by overall length $i$ then gives the expression above.

(d) Using unique decodability, show that $A_i \le 2^i$ and hence
$$\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n \le n(l_{\max} - 1) + 1.$$

Solution: The number of possible binary sequences of length $i$ is $2^i$. Since the code is uniquely decodable, distinct concatenations must produce distinct binary sequences, so we must have $A_i \le 2^i$ in order to be able to decode. Plugging this into the bound from part (c) yields
$$\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n \le \sum_{i=n}^{n l_{\max}} 2^i \, 2^{-i} = n(l_{\max} - 1) + 1.$$

(e) By taking the $n$-th root and letting $n \to \infty$, recover Kraft's inequality for uniquely decodable codes.

Solution: We have
$$\sum_{j=1}^{M} 2^{-l_j} \le \big[ n(l_{\max} - 1) + 1 \big]^{1/n} = \exp\left( \frac{1}{n} \log\big( n(l_{\max} - 1) + 1 \big) \right).$$
The exponent goes to zero as $n \to \infty$, and hence $\sum_{j=1}^{M} 2^{-l_j} \le 1$, Kraft's inequality for UD codes.

9. (Optional) Infinite Alphabet Optimal Code: Let $X$ be an i.i.d. random variable with an infinite alphabet $\mathcal{X} = \{1, 2, 3, \ldots\}$. In addition, let $P(X = i) = 2^{-i}$.

(a) What is the entropy of $X$?

Solution: By direct calculation,
$$H(X) = -\sum_{i=1}^{\infty} 2^{-i} \log(2^{-i}) = \sum_{i=1}^{\infty} i \, 2^{-i} = 2.$$
This is because
$$\sum_{i=1}^{\infty} i x^{i-1} = \frac{1}{(1-x)^2}$$
for all $|x| < 1$, which can be shown by differentiating the geometric series.

(b) Find an optimal variable-length code, and show that it is indeed optimal.

Solution: Take the codelengths to be $\log(2^1), \log(2^2), \log(2^3), \ldots$, i.e., $l_i = i$. Codewords can be
$$C(1) = 0, \quad C(2) = 10, \quad C(3) = 110, \quad \ldots$$
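As a final numerical sanity check for Problem 9 (my own, not part of the solutions), the sketch below truncates the infinite alphabet and verifies that the Kraft sum of the lengths $l_i = i$ is one and that the expected codeword length equals $H(X) = 2$ bits, which is the standard entropy lower bound argument for optimality.

```python
from math import log2

# My own numerical check for Problem 9: truncate the infinite alphabet, with
# P(X = i) = 2^{-i} and codeword lengths l_i = i
# (e.g. C(i) = '1' * (i - 1) + '0', matching C(1)=0, C(2)=10, C(3)=110, ...).
N = 60                                              # truncation level
p = {i: 2.0 ** -i for i in range(1, N + 1)}

H = -sum(pi * log2(pi) for pi in p.values())        # entropy, approx. 2 bits
kraft = sum(2.0 ** -i for i in p)                   # Kraft sum, approx. 1
exp_len = sum(pi * i for i, pi in p.items())        # expected length, approx. 2 bits

print(round(H, 6), round(kraft, 6), round(exp_len, 6))
```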