
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Department of Electrical Engineering and Computer Science
6.441 Transmission of Information, Spring 2006

Homework 2 Solution
April 4, 2006

Reading: Chapters 3 to 5 of Cover.

1. Initial conditions [prob 7 on p75 of Cover]. Show, for a Markov chain, that
   $$H(X_0 \mid X_n) \ge H(X_0 \mid X_{n-1}).$$
   Thus the initial condition $X_0$ becomes more difficult to recover as the future $X_n$ unfolds.

2. AEP [modified prob 5 on p58 of Cover]. Let $X_1, X_2, \ldots$ be independent, identically distributed random variables drawn according to the probability mass function $p(x)$, $x \in \{1, 2, \ldots, m\}$. Thus $p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i)$. We know that $-\frac{1}{n} \log p(X_1, \ldots, X_n) \to H(X)$ in probability. Let $q(x_1, \ldots, x_n) = \prod_{i=1}^{n} q(x_i)$, where $q$ is another probability mass function on $\{1, 2, \ldots, m\}$.

   (a) Evaluate $\lim_{n\to\infty} -\frac{1}{n} \log q(X_1, \ldots, X_n)$, where $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$.

   (b) Now evaluate the limit of the log likelihood ratio $\frac{1}{n} \log \frac{q(X_1, \ldots, X_n)}{p(X_1, \ldots, X_n)}$ when $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$. Thus the odds favoring $q$ are exponentially small when $p$ is true. Give and interpret the scenario in which the log likelihood ratio is infinite.

3. Random box size [prob 6 on p58 of Cover]. An $n$-dimensional rectangular box with sides $X_1, X_2, X_3, \ldots, X_n$ is to be constructed. The volume is $V_n = \prod_{i=1}^{n} X_i$. The edge length $l$ of an $n$-cube with the same volume as the random box is $l = V_n^{1/n}$. Let $X_1, X_2, \ldots$ be i.i.d. uniform random variables over the unit interval $[0, 1]$. Find $\lim_{n\to\infty} V_n^{1/n}$ and compare it to $(E V_n)^{1/n}$. Clearly, the expected edge length does not capture the idea of the volume of the box.
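The following quick Monte Carlo sketch is not part of the original problem set; it is only a numerical companion to Problem 3, contrasting the geometric-mean edge length $V_n^{1/n}$ with the naive value $(E V_n)^{1/n}$ for i.i.d. Uniform$[0,1]$ sides. The variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0.0, 1.0, size=n)        # i.i.d. Uniform[0, 1] edge lengths

# Compute V_n^{1/n} in log space to avoid underflow of the product.
# By the strong law of large numbers this tends to exp(E[ln X]) = e^{-1}.
edge_length = np.exp(np.mean(np.log(x)))

# E[V_n] = (E[X])^n = 2^{-n}, so (E V_n)^{1/n} = 1/2 for every n.
naive_edge = 0.5

print(f"V_n^(1/n)     ~ {edge_length:.4f}   (e^-1 = {np.exp(-1.0):.4f})")
print(f"(E V_n)^(1/n) = {naive_edge:.4f}")
```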

4. The AEP and source coding [prob 3 on p57 of Cover]. (optional) A discrete memoryless source emits a sequence of statistically independent binary digits with probabilities $p(1) = 0.005$ and $p(0) = 0.995$. The digits are taken 100 at a time, and a binary codeword is provided for every sequence of 100 digits containing three or fewer ones.

   (a) Assuming that all codewords are the same length, find the minimum length required to provide codewords for all sequences with three or fewer ones.

   (b) Calculate the probability of observing a source sequence for which no codeword has been assigned.

   (c) Use Chebyshev's inequality to bound the probability of observing a source sequence for which no codeword has been assigned. Compare this bound with the actual probability computed in part (b).

5. Proof of Theorem 3.3.1 [modified prob 7 on p58 of Cover]. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\sim p(x)$. Let $B_\delta^{(n)} \subset \mathcal{X}^n$ be such that $\Pr(B_\delta^{(n)}) > 1 - \delta$, $\delta \in [0, 1)$.

   (a) Given any two sets $A$, $B$ such that $\Pr(A) > 1 - \epsilon_1$ and $\Pr(B) > 1 - \epsilon_2$, show that $\Pr(A \cap B) > 1 - \epsilon_1 - \epsilon_2$. Hence, $\Pr(A_\epsilon^{(n)} \cap B_\delta^{(n)}) > 1 - \epsilon - \delta$.

   (b) Using the above result, show that
   $$(\forall \delta \in [0,1))\,(\forall \epsilon_B > 0)\,(\exists N \in \mathbb{N})\,(\forall n > N) \qquad \frac{1}{n} \log \bigl| B_\delta^{(n)} \bigr| > H - \epsilon_B.$$
   Hence,
   $$(\forall \delta \in [0,1)) \qquad \min\Bigl\{ \bigl| B_\delta^{(n)} \bigr| : B_\delta^{(n)} \subset \mathcal{X}^n,\ \Pr(B_\delta^{(n)}) > 1 - \delta \Bigr\} \doteq \bigl| A_\epsilon^{(n)} \bigr| \quad \text{as } \epsilon \to 0.$$

6. Entropy rates of a Markov chain [prob 5 on p74 of Cover]. (optional)

   (a) Find the entropy rate of the two-state Markov chain with transition matrix
   $$P = \begin{bmatrix} 1 - p_{01} & p_{01} \\ p_{10} & 1 - p_{10} \end{bmatrix}.$$
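As a numerical companion to part (a) (this sketch is mine and not part of the original assignment), the entropy rate of a stationary Markov chain can be evaluated as the stationary-distribution-weighted entropy of the rows of the transition matrix. The function name and the example parameter values below are my own choices.

```python
import numpy as np

def entropy_rate(p01, p10):
    """Entropy rate (bits/symbol) of the stationary two-state chain of part (a)."""
    P = np.array([[1.0 - p01, p01],
                  [p10, 1.0 - p10]])
    # Stationary distribution mu solving mu P = mu.
    mu = np.array([p10, p01]) / (p01 + p10)
    # For a stationary Markov chain, H = sum_i mu_i * H(row i of P).
    safe = np.where(P > 0, P, 1.0)                 # avoid log2(0); 0*log2(0) := 0
    row_entropy = -(P * np.log2(safe)).sum(axis=1)
    return float(mu @ row_entropy)

print(entropy_rate(0.2, 0.3))   # an arbitrary example
print(entropy_rate(0.5, 0.5))   # one candidate worth trying when thinking about part (b)
```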

   (b) What values of $p_{01}$, $p_{10}$ maximize the rate of part (a)?

   (c) Find the entropy rate of the two-state Markov chain with transition matrix
   $$P = \begin{bmatrix} 1 - p & p \\ 1 & 0 \end{bmatrix}.$$

   (d) Find the maximum value of the entropy rate of the Markov chain of part (c). We expect that the maximizing value of $p$ should be less than $1/2$, since the 0 state permits more information to be generated than the 1 state.

   (e) Let $N(t)$ be the number of allowable state sequences of length $t$ for the Markov chain of part (c). Find $N(t)$ and calculate
   $$H_0 := \lim_{t\to\infty} \frac{1}{t} \log N(t).$$
   [Hint: Find a linear recurrence that expresses $N(t)$ in terms of $N(t-1)$ and $N(t-2)$.] Why is $H_0$ an upper bound on the entropy rate of the Markov chain? Compare $H_0$ with the maximum entropy rate found in part (d).

   (f) How could we optimally encode an arbitrary random source with the allowable state sequences, assuming that the AEP holds? What are the minimum and expected codeword lengths per source symbol in terms of the source entropy rate $H$ and the quantity $H_0$?

7. The entropy rate of a dog looking for a bone [prob 4 on p75 of Cover]. A dog walks on the integers, possibly reversing direction at each step with probability $p = 0.1$. Let $X_0 = 0$. The first step is equally likely to be positive or negative. A typical walk might look like this:
   $$(X_0, X_1, \ldots) = (0, -1, -2, -3, -4, -3, -2, -1, 0, 1, \ldots).$$

   (a) Find $H(X^n)$, where $X^n := (X_1, X_2, \ldots, X_n)$.
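To make the setup of Problem 7 concrete, here is a small simulation of the dog's walk (my own sketch, not part of the original problem set). It reproduces a sample path like the one displayed above and empirically estimates the quantity asked for in part (c) below; the variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_steps = 0.1, 200_000

first = rng.choice([-1, 1])                 # first step equally likely to be +1 or -1
flips = rng.random(n_steps - 1) < p         # reverse direction with probability p
signs = np.where(flips, -1, 1)
directions = first * np.concatenate(([1], np.cumprod(signs)))
walk = np.concatenate(([0], np.cumsum(directions)))     # (X_0, X_1, ..., X_n)

print("sample path:", walk[:10])

# Steps between successive reversals are geometric with mean 1/p (cf. part (c) below).
gaps = np.diff(np.flatnonzero(flips))
print(f"empirical mean steps before reversing ~ {gaps.mean():.2f}   (1/p = {1/p:.0f})")
```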

   (b) Find the entropy rate of this browsing dog.

   (c) What is the expected number of steps the dog takes before reversing direction?

8. General AEP. Consider the stochastic process $\{X_i\}_{i \in \mathbb{Z}^+}$.

   (a) Independent sequence. Suppose that the $X_i$ are independent (not necessarily identically distributed) with $\mathrm{Var}[\log p(X_i)]$ bounded above by some constant $\alpha$.

      i. Express the expectation $H_n := E\bigl[-\frac{1}{n} \log p(X^n)\bigr]$ in terms of the entropies $H(X_i)$.

      ii. Argue that the AEP holds if the limit $H := \lim_{n\to\infty} H_n$ exists. Construct an example where the limit $H$ does not exist (i.e. specify the sequence $\{H_n\}$ in terms of $n$ such that the average does not converge).

      iii. Consider the $X_i$ to be deterministically 0 if $i \in \bigcup_{k \in \mathbb{N}} \{2^{2k}, \ldots, 2^{2k+1}\}$, and i.i.d. Bernoulli random variables with probability $1/2$ otherwise. Give $H$ if it exists. If not, briefly explain why.

   (b) (difficult) Hidden Markov. Let $\{S_i\}_{i \in \mathbb{Z}^+}$ be the stationary 2-state Markov process with transition probabilities $p_{01} = 2 p_{10} = 0.5$. If $S_i = 0$, the $X_i$ are independent Bernoulli distributed random variables with probability $1/2$. Otherwise, $X_i$ is deterministically $0.5$. Calculate the entropy rate of $\{X_i\}$. [Hint: apply the chain rule to $H(X^n, S^n)$ and see whether any term is 0.]

   (c) Briefly explain whether each of the following statements is true or false.

      i. The optimal compression scheme that minimizes the expected codeword length for a general stochastic process is not known if the process does not satisfy the AEP.

      ii. (See the historical notes on p59 of Cover.) The most general process for which the AEP holds is the continuous stationary ergodic process, and the type of convergence is almost sure convergence.

   (d) (difficult and optional) Random length. Let $\{L_i\}_{i \in \mathbb{Z}^+}$ be a set of i.i.d. geometric random variables with probability $1/2$. $X_j$ is a Bernoulli random variable with probability $1/L_k$ for $j \in \bigl[1 + \sum_{i=1}^{k-1} L_i,\ \sum_{i=1}^{k} L_i\bigr]$, where $k \in \mathbb{Z}^+$. Briefly explain what is wrong with the following calculation of the entropy rate of $\{X_i\}$, assuming that the AEP holds for this process.

      Consider $L^m$. As $m \to \infty$, there will almost surely be a fraction $2^{-l}$ of the $L_i$ taking the value $l$. Then, there will almost surely be a fraction $\frac{l 2^{-l}}{\sum_{l>0} l 2^{-l}}$ of the $X_i$ being Bernoulli($1/l$) random variables. Hence, we can obtain the entropy rate as follows:
      $$H(X) = \frac{\sum_{l>0} l\, 2^{-l} H[1/l]}{\sum_{l>0} l\, 2^{-l}} = \frac{1}{2} \sum_{l>0} l\, 2^{-l} H[1/l] \approx 0.447.$$

9. Information divergence. Let $\{X_i\}_{i \in \mathbb{Z}^+}$ be a set of i.i.d. random variables with probability distribution $Q$, which is unknown. We would like to compress $\{X_i\}$ by defining the hypothesized typical set $A_\epsilon^{(n)}$ based on the estimate $P_{x^n}$ of $Q$ after observing the $n$-sequence $x^n$.

   (a) Suppose the $X_i$ are Bernoulli random variables, where the actual generating distribution is $Q := \text{Bernoulli}(q)$, the hypothesized distribution based on $x^n$ is $P_{x^n} := \text{Bernoulli}\bigl(p := \frac{1}{n}\sum_i x_i\bigr)$, and the typical set defined using $P_{x^n}$ is
      $$A_\epsilon^{(n)} := \Bigl\{ x^n : \Bigl| -\tfrac{1}{n} \log P_{x^n}(x^n) - H[p] \Bigr| \le \epsilon \Bigr\},$$
      where $H[p] := -p \log p - (1-p) \log (1-p)$ is the binary entropy.

      i. Show that $x^n \in A_\epsilon^{(n)}$.

      ii. If $p \ne 1/2$, show that as $\epsilon \to 0$,
      $$\Pr\bigl(A_\epsilon^{(n)}\bigr) := Q\bigl(A_\epsilon^{(n)}\bigr) \doteq 2^{-n D[p \| q]}.$$
      Briefly explain whether this result holds for $p = 1/2$.

   (b) (difficult and optional) Strong typicality. Consider a general distribution $Q$ on a finite support set $\mathcal{X}$. Let $N(\alpha; x^n)$ be the number of occurrences of $\alpha \in \mathcal{X}$ in the sample sequence $x^n$. We define the distribution $P_{x^n} : \alpha \in \mathcal{X} \mapsto \frac{N(\alpha; x^n)}{n}$ to be the type (or empirical distribution) of $x^n$. We define $A_\epsilon^{(n)}$ to be the strong typical set based on $P_{x^n}$ as follows:
      $$A_\epsilon^{(n)} := \Bigl\{ \tilde{x}^n : \forall \alpha \in \mathcal{X},\ \bigl| P_{\tilde{x}^n}(\alpha) - P_{x^n}(\alpha) \bigr| < \frac{\epsilon}{|\mathcal{X}|} \Bigr\}.$$
      As a side note, the strong typical set has all the desired properties of the (weak) typical set. With this stronger version, the following parts will generalize the interpretation of information divergence beyond the Bernoulli distribution, and avoid the problem raised in part (a)ii.
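Before the formal parts i-iv, here is a tiny enumeration sketch (mine, not part of the original document) that lists the types of binary sequences of a small length $n$. It previews the counting result of part i and the type-class sizes that enter part iii; the helper names and the choice $n = 6$ are my own.

```python
from itertools import product
from collections import Counter
from math import comb

n, alphabet = 6, (0, 1)

# Group every sequence of length n by its type, i.e. its vector of symbol counts.
type_classes = Counter(
    tuple(seq.count(a) for a in alphabet) for seq in product(alphabet, repeat=n)
)

print(f"|P_n| = {len(type_classes)}, "
      f"C(n+|X|-1, n) = {comb(n + len(alphabet) - 1, n)}, "
      f"(n+1)^|X| = {(n + 1) ** len(alphabet)}")
for counts, size in sorted(type_classes.items()):
    k = counts[1]   # number of ones in the type
    print(f"type with {k} ones: |T_P| = {size} = C({n},{k}) = {comb(n, k)}")
```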

      i. Define $\mathcal{P}_n := \{ P : \exists\, x^n \in \mathcal{X}^n,\ P = P_{x^n} \}$ to be the class of all types. Show that
      $$|\mathcal{P}_n| = \binom{n + |\mathcal{X}| - 1}{n} \le (n + 1)^{|\mathcal{X}|}.$$

      ii. Define $T_P := \{ \tilde{x}^n \in \mathcal{X}^n : P_{\tilde{x}^n} = P \}$ to be the type class of $P \in \mathcal{P}_n$. Show that
      $$P_{x^n}(T_{P_{x^n}}) = \max_{P \in \mathcal{P}_n} P_{x^n}(T_P) \ge \frac{1}{|\mathcal{P}_n|}.$$

      iii. Show, using the previous result, that
      $$|T_{P_{x^n}}| = P_{x^n}(T_{P_{x^n}})\, 2^{n H[P_{x^n}]} \doteq 2^{n H[P_{x^n}]}.$$

      iv. Finally, show that
      $$Q(T_{P_{x^n}}) \doteq 2^{-n D[P_{x^n} \| Q]},$$
      and so $\Pr\bigl(A_\epsilon^{(n)}\bigr) := Q\bigl(A_\epsilon^{(n)}\bigr) \doteq 2^{-n D[P_{x^n} \| Q]}$ as $\epsilon \to 0$.
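As a closing numerical note (not part of the original problem set), the exponent in part iv can be checked directly for a binary alphabet: for a type $P = \text{Bernoulli}(k/n)$, $Q(T_P)$ is an exact binomial term, and its normalized logarithm approaches $D[P \| Q]$ as $n$ grows. The function names and the parameter choices ($q = 0.1$, $p = 0.3$) are my own.

```python
from math import lgamma, log, log2

def log2_comb(n, k):
    """log2 of the binomial coefficient C(n, k), via lgamma to avoid huge integers."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2.0)

def exponents(n, k, q):
    """Return -(1/n) log2 Q(T_P) and D[P || Q] for the binary type P = Bernoulli(k/n)."""
    p = k / n
    log_q_tp = log2_comb(n, k) + k * log2(q) + (n - k) * log2(1.0 - q)
    divergence = p * log2(p / q) + (1.0 - p) * log2((1.0 - p) / (1.0 - q))
    return -log_q_tp / n, divergence

for n in (100, 1_000, 10_000):
    lhs, rhs = exponents(n, int(0.3 * n), q=0.1)
    print(f"n = {n:6d}:  -(1/n) log2 Q(T_P) = {lhs:.4f}   D[P||Q] = {rhs:.4f}")
```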