EE376A: Homework #2 Solutions Due by 11:59pm Thursday, February 1st, 2018

Please submit the solutions on Gradescope.

Some definitions that may be useful:

Definition 1: A sequence of random variables $X_1, X_2, \dots, X_n, \dots$ converges to a random variable $X$ in probability if for any $\epsilon > 0$,
$$\lim_{n \to \infty} P\{|X_n - X| < \epsilon\} = 1. \qquad (1)$$

1. Inequalities.
Let $X$, $Y$ and $Z$ be jointly distributed random variables. Prove the following inequalities and find conditions for equality.

(a) $H(X, Y \mid Z) \ge H(X \mid Z)$.
(b) $I(X, Y; Z) \ge I(X; Z)$.
(c) $H(X, Y, Z) - H(X, Y) \le H(X, Z) - H(X)$.
(d) $I(X; Z \mid Y) \ge I(Z; Y \mid X) - I(Z; Y) + I(X; Z)$.

Solution: Inequalities.

(a) Using the chain rule for conditional entropy,
$$H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z) \ge H(X \mid Z),$$
with equality iff $H(Y \mid X, Z) = 0$, that is, when $Y$ is a function of $X$ and $Z$.

(b) Using the chain rule for mutual information,
$$I(X, Y; Z) = I(X; Z) + I(Y; Z \mid X) \ge I(X; Z),$$
with equality iff $I(Y; Z \mid X) = 0$, that is, when $Y$ and $Z$ are conditionally independent given $X$.

(c) Using first the chain rule for entropy and then the definition of conditional mutual information,
$$H(X, Y, Z) - H(X, Y) = H(Z \mid X, Y) = H(Z \mid X) - I(Y; Z \mid X) \le H(Z \mid X) = H(X, Z) - H(X),$$
with equality iff $I(Y; Z \mid X) = 0$, that is, when $Y$ and $Z$ are conditionally independent given $X$.

(d) Using the chain rule for mutual information,
$$I(X; Z \mid Y) + I(Z; Y) = I(X, Y; Z) = I(Z; Y \mid X) + I(X; Z),$$
and therefore
$$I(X; Z \mid Y) = I(Z; Y \mid X) - I(Z; Y) + I(X; Z).$$
We see that this inequality is actually an equality in all cases.
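As a quick numerical sanity check (not part of the original solution), the Python sketch below evaluates the four quantities for a small, arbitrarily chosen joint pmf on $\{0,1\}^3$; the pmf values and the helper function are illustrative assumptions, not anything specified in the problem.

```python
import itertools
from math import log2

# Hypothetical joint pmf p(x, y, z) on {0,1}^3, chosen only for illustration.
p = {(x, y, z): 0.0 for x, y, z in itertools.product([0, 1], repeat=3)}
p[(0, 0, 0)] = 0.25; p[(0, 1, 1)] = 0.25
p[(1, 0, 1)] = 0.20; p[(1, 1, 0)] = 0.15; p[(1, 1, 1)] = 0.15

def H(vars_idx):
    """Joint entropy (in bits) of the coordinates listed in vars_idx."""
    marg = {}
    for xyz, prob in p.items():
        key = tuple(xyz[i] for i in vars_idx)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

X, Y, Z = 0, 1, 2
# (a) H(X,Y|Z) >= H(X|Z)
print(H([X, Y, Z]) - H([Z]) >= H([X, Z]) - H([Z]))
# (b) I(X,Y;Z) >= I(X;Z), using I(A;B) = H(A) + H(B) - H(A,B)
print(H([X, Y]) + H([Z]) - H([X, Y, Z]) >= H([X]) + H([Z]) - H([X, Z]))
# (c) H(X,Y,Z) - H(X,Y) <= H(X,Z) - H(X)
print(H([X, Y, Z]) - H([X, Y]) <= H([X, Z]) - H([X]))
# (d) holds with equality: I(X;Z|Y) = I(Z;Y|X) - I(Z;Y) + I(X;Z)
IXZ_Y = H([X, Y]) + H([Y, Z]) - H([Y]) - H([X, Y, Z])
IZY_X = H([X, Z]) + H([X, Y]) - H([X]) - H([X, Y, Z])
IZY = H([Z]) + H([Y]) - H([Y, Z])
IXZ = H([X]) + H([Z]) - H([X, Z])
print(abs(IXZ_Y - (IZY_X - IZY + IXZ)) < 1e-12)
```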

2. AEP.
Let $X_i$ be i.i.d. $\sim p(x)$, $x \in \{1, 2, \dots, m\}$. For $\epsilon > 0$, let $\mu = EX$ and $H = -\sum_x p(x) \log p(x)$. Let
$$A^n = \left\{ x^n \in \mathcal{X}^n : \left| -\tfrac{1}{n} \log p(x^n) - H \right| \le \epsilon \right\} \quad \text{and} \quad B^n = \left\{ x^n \in \mathcal{X}^n : \left| \tfrac{1}{n} \textstyle\sum_{i=1}^n x_i - \mu \right| \le \epsilon \right\}.$$

(a) Does $P\{X^n \in A^n\} \to 1$?
(b) Does $P\{X^n \in A^n \cap B^n\} \to 1$?
(c) Show that $|A^n \cap B^n| \le 2^{n(H+\epsilon)}$ for all $n$.
(d) Show that $|A^n \cap B^n| \ge \left(\tfrac{1}{2}\right) 2^{n(H-\epsilon)}$ for $n$ sufficiently large.

Solution: AEP.

(a) Yes, by the AEP for discrete random variables the probability that $X^n$ is typical goes to 1.

(b) Yes. By the law of large numbers, $P(X^n \in B^n) \to 1$, and by part (a), $P(X^n \in A^n) \to 1$. Fix any $\delta > 0$. Then there exists $N_1$ such that $P(X^n \in A^n) > 1 - \delta/2$ for all $n > N_1$, and there exists $N_2$ such that $P(X^n \in B^n) > 1 - \delta/2$ for all $n > N_2$. So for all $n > \max(N_1, N_2)$:
$$P(X^n \in A^n \cap B^n) = P(X^n \in A^n) + P(X^n \in B^n) - P(X^n \in A^n \cup B^n) > 1 - \tfrac{\delta}{2} + 1 - \tfrac{\delta}{2} - 1 = 1 - \delta.$$
So for any $\delta > 0$ there exists $N = \max(N_1, N_2)$ such that $P(X^n \in A^n \cap B^n) > 1 - \delta$ for all $n > N$; therefore $P(X^n \in A^n \cap B^n) \to 1$.

(c) By the law of total probability, $\sum_{x^n \in A^n \cap B^n} p(x^n) \le 1$. Also, for $x^n \in A^n$, Theorem 3.1.2 in the text gives $p(x^n) \ge 2^{-n(H+\epsilon)}$. Combining these two gives
$$1 \ge \sum_{x^n \in A^n \cap B^n} p(x^n) \ge \sum_{x^n \in A^n \cap B^n} 2^{-n(H+\epsilon)} = |A^n \cap B^n|\, 2^{-n(H+\epsilon)}.$$
Multiplying through by $2^{n(H+\epsilon)}$ gives the result $|A^n \cap B^n| \le 2^{n(H+\epsilon)}$.

(d) Since from (b) $P\{X^n \in A^n \cap B^n\} \to 1$, there exists $N$ such that $P\{X^n \in A^n \cap B^n\} \ge \tfrac{1}{2}$ for all $n > N$. From Theorem 3.1.2 in the text, for $x^n \in A^n$, $p(x^n) \le 2^{-n(H-\epsilon)}$. Combining these two gives
$$\tfrac{1}{2} \le \sum_{x^n \in A^n \cap B^n} p(x^n) \le \sum_{x^n \in A^n \cap B^n} 2^{-n(H-\epsilon)} = |A^n \cap B^n|\, 2^{-n(H-\epsilon)}.$$
Multiplying through by $2^{n(H-\epsilon)}$ gives the result $|A^n \cap B^n| \ge \left(\tfrac{1}{2}\right) 2^{n(H-\epsilon)}$ for $n$ sufficiently large.
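The cardinality bound in (c) can be checked by brute force for a toy source. The sketch below is only an illustration: the ternary pmf, block length and $\epsilon$ are arbitrary assumptions, and it simply enumerates all $m^n$ sequences, builds $A^n \cap B^n$, and compares its size and probability mass to the statements above.

```python
import itertools
from math import log2

# Illustrative assumptions: a small ternary source, modest n and epsilon.
alphabet = [1, 2, 3]
pmf = {1: 0.5, 2: 0.3, 3: 0.2}
n, eps = 10, 0.2

H = -sum(q * log2(q) for q in pmf.values())
mu = sum(a * q for a, q in pmf.items())

count, prob_mass = 0, 0.0
for xn in itertools.product(alphabet, repeat=n):
    p_xn = 1.0
    for x in xn:
        p_xn *= pmf[x]
    in_A = abs(-log2(p_xn) / n - H) <= eps
    in_B = abs(sum(xn) / n - mu) <= eps
    if in_A and in_B:
        count += 1
        prob_mass += p_xn

print("|A^n ∩ B^n| =", count)
print("upper bound 2^{n(H+eps)} =", 2 ** (n * (H + eps)))
print("P(X^n in A^n ∩ B^n) =", prob_mass)   # grows toward 1 as n increases
```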

3. An AEP-like limit and the AEP.

(a) Let $X_1, X_2, \dots$ be i.i.d. drawn according to probability mass function $p(x)$. Find the limit in probability as $n \to \infty$ of $p(X_1, X_2, \dots, X_n)^{1/n}$.

(b) Let $X_1, X_2, \dots$ be an i.i.d. sequence of discrete random variables with entropy $H(X)$. Let
$$C_n(t) = \{x^n \in \mathcal{X}^n : p(x^n) \ge 2^{-nt}\}$$
denote the subset of $n$-length sequences with probabilities at least $2^{-nt}$.
  i. Show that $|C_n(t)| \le 2^{nt}$.
  ii. What is $\lim_{n \to \infty} P(X^n \in C_n(t))$ when $t < H(X)$? And when $t > H(X)$?

Solution: An AEP-like limit and the AEP.

(a) By the AEP, we know that for every $\delta > 0$,
$$\lim_{n \to \infty} P\left( H(X) - \delta \le -\tfrac{1}{n} \log p(X_1, X_2, \dots, X_n) \le H(X) + \delta \right) = 1.$$
Now, fix $\epsilon > 0$ (sufficiently small) and choose $\delta = \min\{\log(1 + 2^{H(X)}\epsilon),\, -\log(1 - 2^{H(X)}\epsilon)\}$. Then $2^{-H(X)}(2^{\delta} - 1) \le \epsilon$ and $2^{-H(X)}(1 - 2^{-\delta}) \le \epsilon$. Thus,
$$H(X) - \delta \le -\tfrac{1}{n} \log p(X_1, X_2, \dots, X_n) \le H(X) + \delta$$
$$\Longrightarrow\ 2^{-H(X)} 2^{-\delta} \le p(X_1, X_2, \dots, X_n)^{1/n} \le 2^{-H(X)} 2^{\delta}$$
$$\Longrightarrow\ -2^{-H(X)}(1 - 2^{-\delta}) \le p(X_1, X_2, \dots, X_n)^{1/n} - 2^{-H(X)} \le 2^{-H(X)}(2^{\delta} - 1)$$
$$\Longrightarrow\ \left| p(X_1, X_2, \dots, X_n)^{1/n} - 2^{-H(X)} \right| \le \epsilon.$$
This along with the AEP implies that $P\left( \left| p(X_1, X_2, \dots, X_n)^{1/n} - 2^{-H(X)} \right| \le \epsilon \right) \to 1$ for all $\epsilon > 0$, and hence $p(X_1, X_2, \dots, X_n)^{1/n}$ converges to $2^{-H(X)}$ in probability.

This proof can be shortened by directly invoking the continuous mapping theorem, which says that if $Z_n$ converges to $Z$ in probability and $f$ is a continuous function, then $f(Z_n)$ converges to $f(Z)$ in probability.

Alternate proof (using the strong LLN): $X_1, X_2, \dots$ are i.i.d. $\sim p(x)$. Hence the $\log p(X_i)$ are also i.i.d., and
$$\lim p(X_1, X_2, \dots, X_n)^{1/n} = \lim 2^{\frac{1}{n} \log p(X_1, X_2, \dots, X_n)} = 2^{\lim \frac{1}{n} \sum_i \log p(X_i)} = 2^{E[\log p(X)]} = 2^{-H(X)},$$
where the second equality uses the continuity of the function $2^x$ and the third equality uses the strong law of large numbers. Thus, $p(X_1, X_2, \dots, X_n)^{1/n}$ converges to $2^{-H(X)}$ almost surely, and hence in probability.

(b) i.
$$1 \ge \sum_{x^n \in C_n(t)} p(x^n) \ge \sum_{x^n \in C_n(t)} 2^{-nt} = |C_n(t)|\, 2^{-nt}.$$
Thus, $|C_n(t)| \le 2^{nt}$.

ii. The AEP immediately implies that $\lim_{n \to \infty} P(X^n \in C_n(t)) = 0$ for $t < H(X)$ and $\lim_{n \to \infty} P(X^n \in C_n(t)) = 1$ for $t > H(X)$.
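A short Monte Carlo sketch illustrating part (a): for an assumed Bernoulli(0.3) source (any pmf would do), $p(X_1, \dots, X_n)^{1/n}$ concentrates around $2^{-H(X)}$ as $n$ grows. This is only an empirical illustration of the limit, not part of the proof.

```python
import random
from math import log2

random.seed(0)
q = 0.3                       # illustrative Bernoulli parameter, P(X = 1) = q
H = -q * log2(q) - (1 - q) * log2(1 - q)
target = 2 ** (-H)

def root_prob(n):
    """Draw X_1..X_n i.i.d. Bernoulli(q) and return p(X_1,...,X_n)^(1/n)."""
    log_p = 0.0
    for _ in range(n):
        x = 1 if random.random() < q else 0
        log_p += log2(q) if x == 1 else log2(1 - q)
    return 2 ** (log_p / n)

for n in [10, 100, 1000, 10000]:
    samples = [root_prob(n) for _ in range(200)]
    spread = max(abs(s - target) for s in samples)
    print(f"n={n:6d}  2^-H={target:.4f}  max |p^(1/n) - 2^-H| over 200 runs = {spread:.4f}")
```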

4. Prefix and Uniquely Decodable Codes.
Consider the following code:

  u   Codeword
  a   10
  b   00
  c   11
  d   110

(a) Is this a prefix code?
(b) Argue that this code is uniquely decodable, by providing an algorithm for the decoding.

Solution: Prefix and Uniquely Decodable Codes.

(a) No. The codeword of c (11) is a prefix of the codeword of d (110).

(b) We decode the encoded bits from left to right. At any stage:
- If the next two bits are 10, output a and move to the third bit.
- If the next two bits are 00, output b and move to the third bit.
- If the next two bits are 11, look at the third bit:
  - If it is 1, output c and move to the third bit.
  - If it is 0, count the number of 0s after the 11:
    - If even (say 2m zeros), decode to cb...b with m b's and move to the bit after the 0s.
    - If odd (say 2m+1 zeros), decode to db...b with m b's and move to the bit after the 0s.

Some examples with their decoding:
- 11011: It is not possible to split this string as 11|0|11 because there is no codeword 0. Therefore the only way is 110|11 (d, c).
- 1110: It is not possible to split this string as 1|11|0 or 1|110 because there is no codeword 0 or 1. Therefore the only way is 11|10 (c, a).
- 110010: It is not possible to split this string as 110|0|10 because there is no codeword 0. Therefore the only way is 11|00|10 (c, b, a).

For a more elaborate discussion of this topic, read Problem 5.27 in T.M. Cover and J.A. Thomas, Elements of Information Theory, Second Edition, 2006, where the Sardinas-Patterson test of unique decodability is explained.
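The decoding rule in part (b) translates directly into code. Below is a minimal Python sketch of the left-to-right decoder (the function names and the scanning details are my own, not part of the original solution); the asserts replay an encode/decode round trip on a few messages.

```python
def decode(bits: str) -> str:
    """Decode a bit string produced by the code a->10, b->00, c->11, d->110,
    using the left-to-right rule from part (b)."""
    out, i, n = [], 0, len(bits)
    while i < n:
        pair = bits[i:i + 2]
        if pair == "10":
            out.append("a"); i += 2
        elif pair == "00":
            out.append("b"); i += 2
        elif pair == "11":
            if i + 2 >= n or bits[i + 2] == "1":
                out.append("c"); i += 2          # next codeword starts at the third bit
            else:
                # Count the run of 0s following the 11.
                j = i + 2
                while j < n and bits[j] == "0":
                    j += 1
                zeros = j - (i + 2)
                if zeros % 2 == 0:               # 2m zeros: c followed by m b's
                    out.append("c" + "b" * (zeros // 2))
                else:                            # 2m+1 zeros: d followed by m b's
                    out.append("d" + "b" * (zeros // 2))
                i = j
        else:
            raise ValueError("invalid encoding")
    return "".join(out)

codebook = {"a": "10", "b": "00", "c": "11", "d": "110"}
for msg in ["dc", "ca", "cba", "dbbca"]:
    enc = "".join(codebook[s] for s in msg)
    assert decode(enc) == msg, (msg, enc, decode(enc))
print("examples decoded correctly")
```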

5. Information in the Sequence Length.
How much information does the length of a sequence give about the content of a sequence? Consider an i.i.d. Bernoulli(1/2) process $\{X_i\}$. Stop the process when the first 1 appears. Let $N$ designate this stopping time. Thus $X^N$ is a random element of the set of all finite-length binary sequences $\{0,1\}^* = \{0, 1, 00, 01, 10, 11, 000, \dots\}$.

(a) Find $H(X^N \mid N)$.
(b) Find $H(X^N)$.
(c) Find $I(N; X^N)$.

Let us now consider a different stopping time. For this part, again assume $X_i \sim$ Bernoulli(1/2), but stop at time $N = 6$ with probability 1/3 and stop at time $N = 12$ with probability 2/3. Let this stopping time be independent of the sequence $X_1 X_2 \cdots X_{12}$.

(d) Find $H(X^N \mid N)$.
(e) Find $H(X^N)$.
(f) Find $I(N; X^N)$.

Solution: Sequence length.

(a) Given $N$, the only possible $X^N$ is $00\cdots01$ with $N-1$ zeros. Thus, $H(X^N \mid N) = 0$.

(b) $H(X^N) = I(X^N; N) + H(X^N \mid N) = H(N) - H(N \mid X^N) + H(X^N \mid N)$. Now, $H(N \mid X^N) = H(X^N \mid N) = 0$ and
$$H(N) = -\sum_{i=1}^{\infty} (1/2)^i \log (1/2)^i = \sum_{i=1}^{\infty} i\, (1/2)^i = 2.$$
Thus, $H(X^N) = 2$.

(c) $I(N; X^N) = H(N) - H(N \mid X^N) = 2$.

(d) $H(X^N \mid N) = P(N=6) H(X^N \mid N=6) + P(N=12) H(X^N \mid N=12) = (1/3) \cdot 6 + (2/3) \cdot 12 = 10$.

(e) Again, we have $H(N \mid X^N) = 0$. Therefore,
$$I(N; X^N) = H(N) - H(N \mid X^N) = H(N) = -(1/3)\log(1/3) - (2/3)\log(2/3) = \log_2 3 - 2/3,$$
and
$$H(X^N) = I(X^N; N) + H(X^N \mid N) = \log_2 3 - 2/3 + 10.$$

(f) From above, $I(N; X^N) = \log_2 3 - 2/3$.
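A small numerical check of parts (b), (d) and (e) (my own addition), computing the quantities directly from the formulas above; the truncation level of the geometric sum is an arbitrary choice.

```python
from math import log2

# First stopping rule: N is geometric(1/2); truncate the sum at a large cutoff.
CUTOFF = 200  # arbitrary truncation level; the tail contribution is negligible
H_N = -sum((0.5 ** i) * log2(0.5 ** i) for i in range(1, CUTOFF))
print("H(N)  =", round(H_N, 6))           # ≈ 2, and H(X^N) = I(N;X^N) = H(N)

# Second stopping rule: N = 6 w.p. 1/3, N = 12 w.p. 2/3, independent of the X_i.
H_XN_given_N = (1/3) * 6 + (2/3) * 12     # H(X^N | N = n) = n bits for fair coin flips
H_N2 = -(1/3) * log2(1/3) - (2/3) * log2(2/3)
print("H(X^N|N) =", H_XN_given_N)          # 10
print("I(N;X^N) =", round(H_N2, 6))        # log2(3) - 2/3 ≈ 0.918
print("H(X^N)   =", round(H_N2 + H_XN_given_N, 6))
```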

6. Coin Toss Experiment and Golomb Codes.
Kedar, Mikel and Naroa have been instructed to record the outcomes of a coin toss experiment. Consider the coin toss experiment $X_1, X_2, X_3, \dots$ where the $X_i$ are i.i.d. Bern($p$) (the probability of an H (head) is $p$), with $p = 15/16$.

(a) Kedar decides to use Huffman coding to represent the outcome of each coin toss separately. What is the resulting scheme? What compression rate does it achieve?

(b) Mikel suggests he can do a better job by applying Huffman coding on blocks of $r$ tosses. Construct the Huffman code for $r = 2$.

(c) Will Mikel's scheme approach the optimum expected number of bits per description of source symbol (coin toss outcome) with increasing $r$? How does the space required to represent the codebook increase as we increase $r$?

(d) Naroa suggests that, as the occurrence of T is so rare, we should just record the number of tosses it takes for each T to occur. To be precise, if $Y_k$ represents the number of trials until the $k$-th T occurred (inclusive), then Naroa records the sequence
$$Z_k = Y_k - Y_{k-1}, \quad k \ge 1, \qquad (2)$$
where $Y_0 = 0$.
  i. What is the distribution of $Z_k$, i.e., $P(Z_k = j)$, $j = 1, 2, 3, \dots$?
  ii. Compute the entropy and expectation of $Z_k$.
  iii. How does the ratio between the entropy and the expectation of $Z_k$ compare to the entropy of $X_k$? Give an intuitive interpretation.

(e) Consider the following scheme for encoding $Z_k$, which is a specific case of Golomb coding. We are showing the first 10 codewords.

  Z    Quotient   Remainder   Code
  1    0          1           1 01
  2    0          2           1 10
  3    0          3           1 11
  4    1          0           0 1 00
  5    1          1           0 1 01
  6    1          2           0 1 10
  7    1          3           0 1 11
  8    2          0           00 1 00
  9    2          1           00 1 01
  10   2          2           00 1 10

  i. Can you guess the general coding scheme? Compute the expected codelength of this code.
  ii. What is the decoding rule for this Golomb code?
  iii. How can you efficiently encode and decode with this codebook?

Solution:

(a) As we have just 2 source symbols (H and T), the optimal Huffman code is $\{0, 1\}$ for $\{H, T\}$. The compression rate (expected codelength) achieved is 1 bit per source symbol.

(b) One possible Huffman code for $r = 2$ is HH - 0, HT - 10, TH - 110, TT - 111. This has an average codelength of $303/512$ bits per source symbol (about 0.59).

(c) We know that, using Shannon codes for extended codewords of length $r$, the expected average codelength per source symbol satisfies
$$H(X) \le \ell_{\mathrm{avg}} \le H(X) + \frac{1}{r}. \qquad (3)$$
As Huffman coding is the optimal prefix code, it will always perform at least as well as the Shannon codes. Thus, as $r$ increases, we will approach the optimal average codelength of $H(X) = H_b(1/16)$. As $r$ increases, the number of codewords increases as $2^r$. Also, the average size of the codewords increases as $r H(X)$. Thus, effectively, the amount of space required to store the codebook increases exponentially in $r$.
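To see parts (b) and (c) concretely, here is a sketch that builds a Huffman code over blocks of $r$ tosses with the standard merge procedure (the heap-based routine below is my own implementation, not the one used in the solution) and reports the resulting bits per toss; it reproduces $303/512$ for $r = 2$ and approaches $H_b(1/16)$ as $r$ grows.

```python
import heapq
import itertools
from math import log2

p = 15 / 16  # probability of H

def huffman_lengths(probs):
    """Return optimal prefix-code lengths via the standard Huffman merge."""
    # Heap entries: (probability, tie-breaker, list of symbol indices in this subtree).
    heap = [(q, i, [i]) for i, q in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, t, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # every symbol under the merge gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (q1 + q2, t, s1 + s2))
    return lengths

Hb = -p * log2(p) - (1 - p) * log2(1 - p)
for r in [1, 2, 4, 8]:
    blocks = list(itertools.product("HT", repeat=r))
    probs = [p ** b.count("H") * (1 - p) ** b.count("T") for b in blocks]
    L = huffman_lengths(probs)
    rate = sum(q * l for q, l in zip(probs, L)) / r
    print(f"r={r}:  {rate:.4f} bits/toss   (entropy H_b(1/16) = {Hb:.4f})")
```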

(d) First of all, let us convince ourselves that if $Y_k = n$, then $\{Z_1, \dots, Z_k\}$ represents the same information as $\{X_1, \dots, X_n\}$. Also, the random variables $Z_k$ are independent of each other (since the coin tosses are independent) and have identical distributions.

  i. $Z_k$ has a geometric distribution. Specifically,
$$P(Z_k = j) = p^{j-1}(1-p), \quad j = 1, 2, 3, \dots$$
To see this, consider $Z_1$ first for ease. For $Z_1 = j$, the first $j-1$ tosses must be H, while the $j$-th toss must be T. This gives $P(Z_1 = j) = p^{j-1}(1-p)$. Since the $Z_k$ are i.i.d., the result generalizes to every $Z_k$.

  ii. $H(Z_k) = \frac{H_b(p)}{1-p}$. As $Z_k$ has a geometric distribution, $E[Z_k] = \frac{1}{1-p}$.

  iii. We observe that $H(X) = \frac{H(Z_k)}{E[Z_k]}$. Intuitively, we can explain this as follows: $Z_k$ represents the same amount of information as $X_{Y_{k-1}+1}, \dots, X_{Y_k}$. As the $X_i$ are i.i.d., the information content of $Z_k$ is the same as the information content of $E[Z_k]$ i.i.d. random variables $X_i$. Thus, $H(Z_k) = E[Z_k]\, H(X)$.
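A quick check of parts (d)ii-iii for $p = 15/16$ (my own addition): truncating the geometric sums, with an arbitrary cutoff, recovers $H(Z) \approx 5.397$, $E[Z] = 16$, and the ratio $H(Z)/E[Z] = H_b(p)$.

```python
from math import log2

p = 15 / 16
CUTOFF = 2000  # arbitrary truncation; the geometric tail beyond this is negligible

pmf = [(p ** (j - 1)) * (1 - p) for j in range(1, CUTOFF)]
H_Z = -sum(q * log2(q) for q in pmf)
E_Z = sum(j * q for j, q in enumerate(pmf, start=1))
H_b = -p * log2(p) - (1 - p) * log2(1 - p)

print("H(Z) =", round(H_Z, 4), " (closed form H_b(p)/(1-p) =", round(H_b / (1 - p), 4), ")")
print("E[Z] =", round(E_Z, 4), " (closed form 1/(1-p) =", round(1 / (1 - p), 4), ")")
print("H(Z)/E[Z] =", round(H_Z / E_Z, 4), " vs  H_b(p) =", round(H_b, 4))
```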

(e) As an aside, the aim of this question was to introduce Golomb codes. Golomb codes are in fact optimal for geometrically distributed sources, despite having a very simple construction. We have presented a specific case of Golomb codes in our example.

  i. We write the quotient of $Z$ with respect to 4 in unary form and the remainder in binary form, and concatenate them using a separator bit (the bit 1 in this case). To calculate the expected codelength, observe that the length of the codeword for $Z = 4k + m$ (where $0 \le m \le 3$) is $3 + k$. Thus, adding terms in groups of 4 and subtracting one term corresponding to $Z = 0$, we get
$$\bar{\ell} = \sum_{k=0}^{\infty} (1-p)\, p^{4k-1} (1 + p + p^2 + p^3)(3 + k) - 3\,\frac{1-p}{p}$$
$$= (1-p)(1 + p + p^2 + p^3) \sum_{k=0}^{\infty} p^{4k-1}(3 + k) - 3\,\frac{1-p}{p}$$
$$= (1-p)(1 + p + p^2 + p^3)\, \frac{3 - 2p^4}{p(1 - p^4)^2} - 3\,\frac{1-p}{p}$$
$$= \frac{3 - 2p^4}{p(1 - p^4)} - 3\,\frac{1-p}{p}.$$
For the current source, this evaluates to 6.62 bits, as compared to the entropy $H(Z) = 5.397$ bits.

  ii. It is easy to show that the code is prefix-free. Thus, decoding can proceed by the standard technique of constructing a prefix tree and using it for decoding. However, note that in this case the codebook size itself is unbounded, so we might need to truncate the tree at some maximum depth. There is, however, a better and simpler way, which does not require us to store the codebook explicitly. We read bits until we see a 1; the number of 0s read before it gives us the quotient $q$. We then read the next 2 bits, which give us the remainder $r$. Using $q$ and $r$, we can decode $Z = 4q + r$.

  iii. The good thing about Golomb codes is that, even though the codebook size is unbounded, we do not need to store the codebook explicitly; a simple procedure both generates the codewords and decodes them.
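The scheme above (quotient with respect to 4 in unary, a separator bit 1, then a 2-bit remainder) can be written as a short encoder/decoder pair. This is a sketch with function names of my own choosing, not code from the assignment; the last lines reproduce the table and the 6.62-bit expected length.

```python
def golomb4_encode(z: int) -> str:
    """Encode z >= 1: quotient of z by 4 in unary (as 0s), a separator 1,
    then the remainder as 2 bits."""
    q, r = divmod(z, 4)
    return "0" * q + "1" + format(r, "02b")

def golomb4_decode(bits: str):
    """Decode a concatenation of codewords back to the list of Z values."""
    out, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "0":          # unary part: count 0s up to the separator
            q += 1
            i += 1
        r = int(bits[i + 1:i + 3], 2)  # skip the separator, read the 2-bit remainder
        out.append(4 * q + r)
        i += 3
    return out

# Reproduce the first 10 codewords from the table above.
for z in range(1, 11):
    print(z, golomb4_encode(z))
assert golomb4_decode("".join(golomb4_encode(z) for z in [1, 4, 8, 10])) == [1, 4, 8, 10]

# Expected codelength for p = 15/16 (truncated sum), matching the closed form.
p = 15 / 16
avg = sum((1 - p) * p ** (z - 1) * len(golomb4_encode(z)) for z in range(1, 3000))
print(round(avg, 2))  # ≈ 6.62 bits
```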

7. Bad Codes.
Which of these codes cannot be a Huffman code for any probability assignment? Justify your answers.

(a) $\{0, 10, 11\}$.
(b) $\{00, 01, 10, 110\}$.
(c) $\{01, 10\}$.

Solution: Bad codes.

(a) $\{0, 10, 11\}$ is a Huffman code for the distribution $\left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\right)$.

(b) The code $\{00, 01, 10, 110\}$ can be shortened to $\{00, 01, 10, 11\}$ without losing its prefix-free property, and is therefore not optimal, so it cannot be a Huffman code. Alternatively, it is not a Huffman code because there is a unique longest codeword.

(c) The code $\{01, 10\}$ can be shortened to $\{0, 1\}$ without losing its prefix-free property, and is therefore not optimal and not a Huffman code.
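As a complementary check (my own addition, not part of the solution): a binary Huffman code is always complete, so its Kraft sum is exactly 1, whereas a code that can be shortened, as in (b) and (c), has Kraft sum strictly less than 1.

```python
from math import isclose

def kraft_sum(code):
    """Kraft sum of a binary code; a Huffman code is complete, so its sum is exactly 1."""
    return sum(2 ** -len(w) for w in code)

for label, code in [("(a)", ["0", "10", "11"]),
                    ("(b)", ["00", "01", "10", "110"]),
                    ("(c)", ["01", "10"])]:
    s = kraft_sum(code)
    print(label, code, "Kraft sum =", s, "-> complete:", isclose(s, 1.0))
# (a) sums to 1 and is realized by the distribution (1/2, 1/4, 1/4);
# (b) and (c) sum to less than 1, so each can be shortened and cannot be a Huffman code.
```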