EE5139R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16


1. Cover and Thomas: Problem 2.30 (Maximum Entropy):

Solution: We are required to maximize $H(P_X)$ over all distributions $P_X$ on the non-negative integers satisfying

$$\sum_{n=0}^{\infty} n P_X(n) = A,$$

and also the normalization constraint $\sum_{n=0}^{\infty} P_X(n) = 1$ (which we ignore without loss of generality). Now, construct the Lagrangian:

$$L(P_X, \lambda) = -\sum_{n=0}^{\infty} P_X(n) \log P_X(n) + \lambda \left( \sum_{n=0}^{\infty} n P_X(n) - A \right).$$

Differentiating with respect to $P_X(n)$ (assuming natural logs and interchanging differentiation and the infinite sum), we obtain

$$-\log P_X^*(n) - 1 + \lambda n = 0,$$

so we have

$$P_X^*(n) = \exp(-1 + \lambda n), \quad n \ge 0.$$

We immediately recognize that this is a geometric distribution with mean $A$, i.e., $P_X^*$ can be written alternatively as

$$P_X^*(n) = (1-p)^n p, \quad n \ge 0, \qquad \text{where} \qquad A = \frac{1-p}{p}.$$

From direct calculations, the entropy is

$$H(P_X^*) = \frac{H_b(p)}{p}.$$

2. (Optional): Cover and Thomas: Problem 2.38 (The Value of a Question):

$$H(X) - H(X|Y) = I(X;Y) = H(Y) - H(Y|X) = H_b(\alpha) - H(Y|X) = H_b(\alpha),$$

since $H(Y|X) = 0$.
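As a quick numerical sanity check (not part of the original solution), the short Python sketch below truncates the geometric pmf at an arbitrary cut-off n_max and confirms that its mean is $(1-p)/p$ and its entropy is $H_b(p)/p$:

```python
import math

def check_geometric(p, n_max=10_000):
    """Compare mean and entropy of P(n) = (1-p)^n * p with the closed forms."""
    pmf = [(1 - p) ** n * p for n in range(n_max)]
    mean = sum(n * q for n, q in enumerate(pmf))
    entropy = -sum(q * math.log2(q) for q in pmf if q > 0)
    hb = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy H_b(p)
    print(f"p = {p}: mean = {mean:.4f} vs (1-p)/p = {(1 - p) / p:.4f}, "
          f"entropy = {entropy:.4f} vs H_b(p)/p = {hb / p:.4f}")

for p in (0.2, 0.5, 0.8):
    check_geometric(p)
```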

3. Fano's inequality for list decoding: Recall the proof of Fano's inequality. Now develop a generalization of Fano's inequality for list decoding. Let $(X, Y) \sim P_{XY}$ and let $\mathcal{L}(Y) \subseteq \mathcal{X}$ be a set of size $L$ (compare this to an estimator $\hat{X}(Y) \in \mathcal{X}$, which is a set of size $L = 1$). Lower bound the probability of error $\Pr(X \notin \mathcal{L}(Y))$ in terms of $L$, $H(X|\mathcal{L}(Y))$ and $|\mathcal{X}|$. You should be able to recover the standard Fano inequality if you set $L = 1$.

Solution: Define the error random variable

$$E = \begin{cases} 1 & X \notin \mathcal{L}(Y) \\ 0 & X \in \mathcal{L}(Y) \end{cases}$$

Now consider

$$H(X, E \mid \mathcal{L}(Y)) = H(X \mid E, \mathcal{L}(Y)) + H(E \mid \mathcal{L}(Y)) = H(E \mid X, \mathcal{L}(Y)) + H(X \mid \mathcal{L}(Y)).$$

Let $P_e := \Pr(X \notin \mathcal{L}(Y))$. Now clearly $H(E \mid X, \mathcal{L}(Y)) = 0$, and $H(E \mid \mathcal{L}(Y)) \le H(E) = H_b(P_e)$. Next, we examine the term $H(X \mid E, \mathcal{L}(Y))$. We have

$$H(X \mid E, \mathcal{L}(Y)) = \Pr(E=0) H(X \mid E=0, \mathcal{L}(Y)) + \Pr(E=1) H(X \mid E=1, \mathcal{L}(Y)) \le (1 - P_e) \log L + P_e \log(|\mathcal{X}| - L),$$

since if we know that $E = 0$, the number of values that $X$ can take on is no more than $L$, and if $E = 1$, the number of values that $X$ can take on is no more than $|\mathcal{X}| - L$. Putting everything together and upper bounding $H_b(P_e)$ by $1$, we have

$$P_e \ge \frac{H(X \mid \mathcal{L}(Y)) - \log L - 1}{\log \frac{|\mathcal{X}| - L}{L}}.$$

4. (Optional): Data Processing Inequality for KL Divergence: Let $P_X, Q_X$ be pmfs on the same alphabet $\mathcal{X}$. Assume for the sake of simplicity that $P_X(x), Q_X(x) > 0$ for all $x \in \mathcal{X}$. Let $W(y|x) = \Pr(Y = y \mid X = x)$ be a channel from $\mathcal{X}$ to $\mathcal{Y}$. Define

$$P_Y(y) = \sum_x W(y|x) P_X(x), \qquad Q_Y(y) = \sum_x W(y|x) Q_X(x).$$

Show that

$$D(P_X \| Q_X) \ge D(P_Y \| Q_Y).$$

You may use the log-sum inequality. This problem shows that processing does not increase divergence.

Solution: Starting from the definition of $D(P_Y \| Q_Y)$, we have

$$\begin{aligned}
D(P_Y \| Q_Y) &= \sum_y P_Y(y) \log \frac{P_Y(y)}{Q_Y(y)} \\
&= \sum_y \left( \sum_x W(y|x) P_X(x) \right) \log \frac{\sum_x W(y|x) P_X(x)}{\sum_x W(y|x) Q_X(x)} \\
&\le \sum_y \sum_x W(y|x) P_X(x) \log \frac{W(y|x) P_X(x)}{W(y|x) Q_X(x)} \\
&= \sum_y \sum_x W(y|x) P_X(x) \log \frac{P_X(x)}{Q_X(x)} \\
&= \sum_x P_X(x) \log \frac{P_X(x)}{Q_X(x)} = D(P_X \| Q_X),
\end{aligned}$$

where the inequality follows from the log-sum inequality.
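The divergence data-processing inequality is easy to illustrate numerically. The sketch below is my own illustration (not part of the original solution); it draws random pmfs and a random channel and checks that $D(P_Y \| Q_Y) \le D(P_X \| Q_X)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """KL divergence D(p || q) in bits for strictly positive pmfs."""
    return float(np.sum(p * np.log2(p / q)))

nx, ny = 4, 3
P_X = rng.dirichlet(np.ones(nx))
Q_X = rng.dirichlet(np.ones(nx))
W = rng.dirichlet(np.ones(ny), size=nx)   # W[x, y] = Pr(Y = y | X = x)

P_Y = P_X @ W
Q_Y = Q_X @ W

print(f"D(P_X || Q_X) = {kl(P_X, Q_X):.4f}")
print(f"D(P_Y || Q_Y) = {kl(P_Y, Q_Y):.4f}")
assert kl(P_Y, Q_Y) <= kl(P_X, Q_X) + 1e-12
```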

5. Typical-Set Calculations 1:

(a) Suppose a DMS emits h and t with probability 1/2 each. For $\epsilon = 0.01$ and $n = 5$, what is $A_\epsilon^{(n)}$?

Solution: In this case, $H(X) = 1$. All source sequences are equally likely, each with probability $2^{-5} = 2^{-nH(X)}$. Hence, all sequences satisfy the condition for being typical,

$$2^{-n(H(X)+\epsilon)} \le p_{X^n}(x^n) \le 2^{-n(H(X)-\epsilon)},$$

for any $\epsilon > 0$. Hence, all 32 sequences are typical.

(b) Repeat if $\Pr(h) = 0.2$, $\Pr(t) = 0.8$, and $n = 5$ (any $\epsilon < 0.4$ gives the same answer below).

Solution: Consider a sequence with $m$ heads and $n - m$ tails. The probability of occurrence of this sequence is $p^m (1-p)^{n-m}$, where $p = \Pr(h)$. For such a sequence to be typical,

$$2^{-n(H(X)+\epsilon)} \le p^m (1-p)^{n-m} \le 2^{-n(H(X)-\epsilon)},$$

which translates to

$$\left| \left( \frac{m}{n} - p \right) \log \frac{1-p}{p} \right| \le \epsilon.$$

Plugging in the value of $p = 0.2$, we get $|m - 1| \le 5\epsilon/2$. Since $m \in \{0, 1, \ldots, 5\}$, this condition is satisfied for the given $\epsilon$ only for $m = 1$, i.e., when there is exactly one H in the sequence. Thus,

$$A_\epsilon^{(n)} = \{HTTTT,\; THTTT,\; TTHTT,\; TTTHT,\; TTTTH\}.$$
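Part (b) can also be verified by brute force over all $2^5$ strings. The value of $\epsilon$ for part (b) is not legible in the source, so the sketch below assumes $\epsilon = 0.1$; any $\epsilon < 0.4$ produces the same typical set:

```python
import itertools
import math

p, n, eps = 0.2, 5, 0.1            # eps is an assumed value (any eps < 0.4 works)
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

typical = []
for seq in itertools.product("HT", repeat=n):
    m = seq.count("H")
    prob = p ** m * (1 - p) ** (n - m)
    # typicality condition: |-(1/n) log2 p(x^n) - H(X)| <= eps
    if abs(-math.log2(prob) / n - H) <= eps:
        typical.append("".join(seq))

print(typical)                      # exactly the five strings with a single 'H'
```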

6. Typical-Set Calculations 2: Consider a DMS with a two-symbol alphabet $\{a, b\}$ where $p_X(a) = 2/3$ and $p_X(b) = 1/3$. Let $X^n = (X_1, \ldots, X_n)$ be a string of chance variables from the source with $n = 100{,}000$.

(a) Let $W(X_j)$ be the log pmf random variable for the $j$-th source output, i.e., $W(X_j) = -\log 2/3$ for $X_j = a$ and $-\log 1/3$ for $X_j = b$. Find the variance of $W(X_j)$.

Solution: For notational convenience, we will denote the log pmf random variable by $W$. Note that $W$ takes on the value $\log 3/2$ with probability 2/3 and $\log 3$ with probability 1/3. Hence,

$$\mathrm{Var}(W) = E[W^2] - E[W]^2 = \frac{2}{9}.$$

(b) For $\epsilon = 0.01$, evaluate the bound on the probability of the typical set using $\Pr(X^n \notin A_\epsilon^{(n)}) \le \sigma_W^2 / (n \epsilon^2)$.

Solution: The bound on the typical set, as derived using Chebyshev's inequality, is

$$\Pr(X^n \in A_\epsilon^{(n)}) \ge 1 - \frac{\sigma_W^2}{n \epsilon^2}.$$

Substituting $n = 10^5$ and $\epsilon = 0.01$, we obtain

$$\Pr(X^n \in A_\epsilon^{(n)}) \ge 1 - \frac{1}{45} = 0.978.$$

Loosely speaking, this means that if we were to look at sequences of length 100,000 generated from our DMS, more than 97% of the time the sequence will be typical.

(c) Let $N_a$ be the number of a's in the string $X^n = (X_1, \ldots, X_n)$. The random variable (rv) $N_a$ is the sum of $n$ iid rvs. Show what these rvs are.

Solution: The rv $N_a$ is the sum of $n$ iid rvs $Y_i$, that is, $N_a = \sum_{i=1}^n Y_i$, where the $Y_i$ are Bernoulli with $\Pr(Y_i = 1) = 2/3$.

(d) Express the rv $W(X^n)$ as a function of the rv $N_a$. Note how this depends on $n$.

Solution: The probability of a particular sequence $x^n$ with $N_a$ a's is $(2/3)^{N_a} (1/3)^{n - N_a}$. Hence,

$$W(X^n) = -\log p_{X^n}(X^n) = -\log\left[ (2/3)^{N_a} (1/3)^{n - N_a} \right] = n \log 3 - N_a.$$

(e) Express the typical set in terms of bounds on $N_a$ (i.e., $A_\epsilon^{(n)} = \{x^n : \alpha < N_a < \beta\}$) and calculate $\alpha$ and $\beta$.

Solution: For a sequence $x^n$ to be typical, it must satisfy

$$\left| \frac{1}{n} \log \frac{1}{p_{X^n}(x^n)} - H(X) \right| < \epsilon.$$

From (a), the source entropy is $H(X) = E[W(X)] = \log 3 - 2/3$, and substituting in $\epsilon$ and $W(X^n)$ from part (d), we get

$$\left| \frac{N_a}{n} - \frac{2}{3} \right| < \epsilon.$$

Note the intuitive appeal of this condition! It says that for a sequence to be typical, the proportion of a's in that sequence must be very close to the probability that the DMS generates an a. Plugging in the value of $n$, we get the bounds $65{,}667 \le N_a \le 67{,}666$.

(f) Find the mean and variance of $N_a$. Approximate $\Pr(X^n \in A_\epsilon^{(n)})$ by the central limit theorem approximation. The central limit theorem approximation is to evaluate $\Pr(X^n \in A_\epsilon^{(n)})$ assuming that $N_a$ is Gaussian with the mean and variance of the actual $N_a$. Recall that for a sequence of iid rvs $C_1, \ldots, C_n$, the central limit theorem asserts that

$$\Pr\left( \frac{1}{\sqrt{n}} \sum_{i=1}^n (C_i - \mu_C) \le t \right) \to \Phi\left( \frac{t}{\sigma_C} \right),$$

where $\mu_C$ and $\sigma_C$ are the mean and standard deviation of the $C_i$'s and $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right) du$ is the cdf of the standard Gaussian.

Solution: $N_a$ is a binomial rv (a sum of independent Bernoulli rvs, as we have shown in part (c)). The mean and variance are

$$E[N_a] = \frac{2n}{3} \approx 66{,}666.7, \qquad \mathrm{Var}(N_a) = n \cdot \frac{2}{3} \cdot \frac{1}{3} \approx 22{,}222.2.$$

Note that we can calculate the exact probability of the typical set $A_\epsilon^{(n)}$:

$$\Pr(A_\epsilon^{(n)}) = \Pr(65{,}667 \le N_a \le 67{,}666) = \sum_{N_a = 65{,}667}^{67{,}666} \binom{10^5}{N_a} \left( \frac{2}{3} \right)^{N_a} \left( \frac{1}{3} \right)^{10^5 - N_a}.$$

But this is computationally intensive, so we approximate $\Pr(A_\epsilon^{(n)})$ with the central limit theorem. We can use the CLT because $N_a$ is the sum of $n$ iid rvs, so in the limit of large $n$ its cumulative distribution approaches that of a Gaussian rv with the mean and variance of $N_a$:

$$\Pr(65{,}667 \le N_a \le 67{,}666) \approx \int_\alpha^\beta \frac{1}{\sqrt{2\pi \mathrm{Var}(N_a)}} \exp\left( -\frac{(x - E[N_a])^2}{2 \mathrm{Var}(N_a)} \right) dx = \Phi(6.704) - \Phi(-6.706) \approx 1,$$

where $\Phi(z)$ is the integral of the standard Gaussian density over $(-\infty, z)$. Thus the CLT approximation tells us that essentially all of the sequences we observe from the output of the DMS will be typical, whereas Chebyshev gave us a bound stating that more than 97% of the sequences that we observe will be typical.
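For part (f), a library binomial cdf makes the "exact" sum painless, so it can be compared directly against the Gaussian approximation. The sketch below is an illustration only and assumes scipy is available:

```python
from math import erf, sqrt
from scipy.stats import binom

n, p = 100_000, 2 / 3
lo, hi = 65_667, 67_666

# Exact binomial probability of the typical set.
exact = binom.cdf(hi, n, p) - binom.cdf(lo - 1, n, p)

# Gaussian (CLT) approximation with the mean and variance of N_a.
mean, var = n * p, n * p * (1 - p)
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard Gaussian cdf
approx = Phi((hi - mean) / sqrt(var)) - Phi((lo - mean) / sqrt(var))

print(f"exact P(typical)  = {exact:.12f}")
print(f"CLT approximation = {approx:.12f}")
```

Both values come out indistinguishable from 1 at the printed precision, consistent with $\Phi(6.704) - \Phi(-6.706) \approx 1$ above.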

7. (Optional): Typical-Set Calculations 3: For the random variables in the previous problem, find $\Pr(N_a = i)$ for $i = 0, 1, 2$. Find the probability of each individual string $x^n$ for those values of $i$. Find the particular string $x^n$ that has maximum probability over all sample values of $X^n$. What are the next most probable $n$-strings? Give a brief discussion of why the most probable $n$-strings are not regarded as typical strings.

Solution: We know from the previous problem that

$$\Pr(N_a = i) = \binom{10^5}{i} \left( \frac{2}{3} \right)^i \left( \frac{1}{3} \right)^{10^5 - i}.$$

For $i = 0, 1, 2$, $\Pr(N_a = i)$ is approximately zero. The string with the maximal probability is the string of all a's. The next most probable strings are the sequences with $n - 1$ a's and one b, and so forth. From the definition of the typical set, we see that the typical set is a fairly small set which contains most of the probability, and the probability of each sequence in the typical set is almost the same. The most probable sequences and the least probable sequences lie in the tails of the distribution of the sample mean of the log pmf (they are furthest from the mean), so they are not regarded as typical strings. In fact, the aggregate probability of all the most likely sequences and all the least likely sequences is very small. The only case where the most likely sequence is regarded as typical is when every sequence is typical and every sequence is most likely (as in the problem Typical-Set Calculations 1). However, this is not the case in general. As we have seen in the problem Typical-Set Calculations 2, for very long sequences a typical sequence will contain roughly the same proportion of each symbol as the probability of that symbol.

8. (Optional): AEP and Mutual Information: Let $(X_i, Y_i)$ be i.i.d. $\sim p_{X,Y}(x, y)$. We form the log-likelihood ratio of the hypothesis that $X$ and $Y$ are independent vs. the hypothesis that $X$ and $Y$ are dependent. What is the limit of

$$\frac{1}{n} \log \frac{p_{X^n}(X^n) p_{Y^n}(Y^n)}{p_{X^n, Y^n}(X^n, Y^n)}?$$

What is the limit of $\frac{p_{X^n}(X^n) p_{Y^n}(Y^n)}{p_{X^n, Y^n}(X^n, Y^n)}$ if $X_i$ and $Y_i$ are independent for all $i$?

Solution: Let

$$L = \frac{1}{n} \log \frac{p_{X^n}(X^n) p_{Y^n}(Y^n)}{p_{X^n, Y^n}(X^n, Y^n)}.$$

Since the $(X_i, Y_i)$ are i.i.d. $\sim p_{X,Y}(x, y)$, we have

$$L = \frac{1}{n} \sum_{i=1}^{n} \underbrace{\log \frac{p_X(X_i) p_Y(Y_i)}{p_{X,Y}(X_i, Y_i)}}_{W(X_i, Y_i)}.$$

Each of the terms is a function of $(X_i, Y_i)$, which are independent across $i = 1, \ldots, n$. Thus, the following convergence in probability is observed:

$$L \to E[W(X, Y)] = E_{(X,Y) \sim p_{X,Y}}\left[ \log \frac{p_X(X) p_Y(Y)}{p_{X,Y}(X, Y)} \right] = -I(X; Y).$$

Hence, the limit of $2^{nL} = \frac{p_{X^n}(X^n) p_{Y^n}(Y^n)}{p_{X^n, Y^n}(X^n, Y^n)}$ is $2^{-nI(X;Y)}$, which converges to one if $X$ and $Y$ are independent because $I(X; Y) = 0$.
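The convergence in Problem 8 can be illustrated by simulation. The joint pmf below is an arbitrary assumption (not from the problem statement); the empirical normalized log-likelihood ratio should land close to $-I(X;Y)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed joint pmf on {0,1} x {0,1} (any joint pmf with I(X;Y) > 0 will do).
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
Px, Py = P.sum(axis=1), P.sum(axis=0)
I = sum(P[x, y] * np.log2(P[x, y] / (Px[x] * Py[y]))
        for x in range(2) for y in range(2))

n = 100_000
idx = rng.choice(4, size=n, p=P.flatten())   # draw iid pairs (X_i, Y_i)
xs, ys = idx // 2, idx % 2

L = float(np.mean(np.log2(Px[xs] * Py[ys] / P[xs, ys])))
print(f"(1/n) log2 LR = {L:.4f}   -I(X;Y) = {-I:.4f}")
```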

9. Piece of Cake: A cake is sliced roughly in half, the largest piece being chosen each time, the other pieces discarded. We will assume that a random cut creates pieces of proportions

$$P = \begin{cases} (2/3, 1/3) & \text{w.p. } 3/4 \\ (2/5, 3/5) & \text{w.p. } 1/4 \end{cases}$$

Thus, for example, the first cut (and choice of largest piece) may result in a piece of size 3/5. Cutting and choosing from this piece might reduce it to size (3/5)(2/3) at time 2, and so on. Let $T_n$ be the fraction of cake left after $n$ cuts. Find the limit (in probability) of

$$\lim_{n \to \infty} \frac{1}{n} \log T_n.$$

Solution: Let $C_i$ be the fraction of the piece of cake that is retained at the $i$-th cut, and let $T_n$ be the fraction of cake left after $n$ cuts. Then we have $T_n = C_1 C_2 \cdots C_n$. Hence, by the weak law of large numbers,

$$\lim_{n \to \infty} \frac{1}{n} \log T_n = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \log C_i \to E[\log C_1] = \frac{3}{4} \log \frac{2}{3} + \frac{1}{4} \log \frac{3}{5}.$$
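A short simulation (not part of the original solution) illustrates the limit in Problem 9; logs are taken base 2 here, but any fixed base gives the corresponding limit:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Fraction of cake retained at each cut: 2/3 w.p. 3/4, and 3/5 w.p. 1/4.
C = rng.choice([2 / 3, 3 / 5], size=n, p=[3 / 4, 1 / 4])
empirical = float(np.mean(np.log2(C)))                 # equals (1/n) log2 T_n
theory = 0.75 * np.log2(2 / 3) + 0.25 * np.log2(3 / 5)

print(f"(1/n) log2 T_n = {empirical:.4f}   E[log2 C_1] = {theory:.4f}")
```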

10. Two Typical Sets: Let $X_1, X_2, \ldots$ be a sequence of real-valued random variables, independent and identically distributed according to $P_X(x)$, $x \in \mathcal{X}$. Let $\mu = E[X]$ and denote the entropy of $X$ as $H(X) = -\sum_x P_X(x) \log P_X(x)$. Define the two sets

$$A_n = \left\{ x^n \in \mathcal{X}^n : \left| \frac{1}{n} \log \frac{1}{P_{X^n}(x^n)} - H(X) \right| \le \epsilon \right\}, \qquad B_n = \left\{ x^n \in \mathcal{X}^n : \left| \frac{1}{n} \sum_{i=1}^{n} x_i - \mu \right| \le \epsilon \right\}.$$

(a) (1 point) $\Pr(X^n \in A_n) \to 1$ as $n \to \infty$. True or false? Justify your answer.

Solution: True. This follows by Chebyshev's inequality. Indeed,

$$\Pr(X^n \in A_n^c) \le \frac{\sigma_0^2}{n \epsilon^2} \to 0,$$

where $\sigma_0^2 = \mathrm{Var}(-\log P_X(X))$. Consequently, $\Pr(X^n \in A_n) \to 1$, as desired.

(b) (1 point) $\Pr(X^n \in A_n \cap B_n) \to 1$ as $n \to \infty$. True or false? Justify your answer.

Solution: True. By Chebyshev's inequality and the same logic as above, $\Pr(X^n \in B_n) \to 1$. So by De Morgan's theorem and the union bound,

$$\Pr(X^n \in A_n \cap B_n) = 1 - \Pr(X^n \in A_n^c \cup B_n^c) \ge 1 - \Pr(X^n \in A_n^c) - \Pr(X^n \in B_n^c).$$

Since the latter two terms tend to zero, we conclude that $\Pr(X^n \in A_n \cap B_n) \to 1$, as desired.

(c) (1 point) Show that $|A_n \cap B_n| \le 2^{n(H(X)+\epsilon)}$ for all $n$.

Solution: $|A_n \cap B_n| \le |A_n| \le 2^{n(H+\epsilon)}$, where the final inequality comes from the AEP, shown in class.

(d) (1 point) Show that $|A_n \cap B_n| \ge \frac{1}{2} \, 2^{n(H(X)-\epsilon)}$ for $n$ sufficiently large.

Solution: $\Pr(X^n \in A_n \cap B_n) \ge 1/2$ for $n$ sufficiently large. Thus,

$$\frac{1}{2} \le \sum_{x^n \in A_n \cap B_n} P_{X^n}(x^n) \le \sum_{x^n \in A_n \cap B_n} 2^{-n(H-\epsilon)} = |A_n \cap B_n| \, 2^{-n(H-\epsilon)},$$

and we are done.

11. Entropy Inequalities: Let $X$ and $Y$ be real-valued random variables that take on discrete values in $\mathcal{X} = \{1, \ldots, r\}$ and $\mathcal{Y} = \{1, \ldots, s\}$. Let $Z = X + Y$.

(a) (1 point) Show that $H(Z|X) = H(Y|X)$. Justify your answer carefully.

Solution: Consider

$$\begin{aligned}
H(Z|X) &= \sum_x P_X(x) H(Z|X = x) \\
&= -\sum_x P_X(x) \sum_z P_{Z|X}(z|x) \log P_{Z|X}(z|x) \\
&= -\sum_x P_X(x) \sum_z P_{Y|X}(z - x|x) \log P_{Y|X}(z - x|x) \\
&= -\sum_x P_X(x) \sum_y P_{Y|X}(y|x) \log P_{Y|X}(y|x) \\
&= H(Y|X).
\end{aligned}$$

(b) (1 point) It is now known that $X$ and $Y$ are independent. Which of the following is true in general: (i) $H(X) \le H(Z)$; (ii) $H(X) \ge H(Z)$? Justify your answer.

Solution: From the above, note that $X$ and $Y$ play symmetric roles. So given what we have proved in (a), we also know that $H(Z|Y) = H(X|Y)$. Now, we have

$$H(Z) \ge H(Z|Y) = H(X|Y) = H(X),$$

where the inequality holds because conditioning reduces entropy, and the final equality follows from the independence of $X$ and $Y$. So the first assertion is true.

(c) (1 point) Now, in addition to $Z = X + Y$ and that $X$ and $Y$ are independent, it is also known that $X = f_1(Z)$ and $Y = f_2(Z)$ for some functions $f_1$ and $f_2$. Find $H(Z)$ in terms of $H(X)$ and $H(Y)$.

Solution:

$$H(Z) = H(X + Y) \le H(X, Y) = H(X) + H(Y),$$

where the inequality holds since $Z$ is a function of $(X, Y)$ and the final equality is by independence of $X$ and $Y$. On the other hand,

$$H(X) + H(Y) = H(X, Y) = H(f_1(Z), f_2(Z)) \le H(Z).$$

Hence all inequalities above are equalities and we have $H(Z) = H(X) + H(Y)$.
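Parts (a) and (b) of Problem 11 can be checked numerically for a concrete pair of distributions. The pmfs below are arbitrary assumptions chosen only for illustration:

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf given as a 1-D array."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

px = np.array([0.5, 0.3, 0.2])     # pmf of X on {1, 2, 3}  (assumed values)
py = np.array([0.6, 0.4])          # pmf of Y on {1, 2}     (assumed values)

# pmf of Z = X + Y under independence, indexed by z = 2, ..., 5
pz = np.zeros(len(px) + len(py) - 1)
for i, a in enumerate(px):
    for j, b in enumerate(py):
        pz[i + j] += a * b

# (a): given X = x, Z = x + Y is a relabelling of Y, so H(Z|X) = H(Y|X) = H(Y) here
HZ_given_X = float(np.sum(px * H(py)))
print(f"H(Z|X) = {HZ_given_X:.4f}   H(Y|X) = {H(py):.4f}")

# (b): H(Z) >= H(X) when X and Y are independent
print(f"H(Z) = {H(pz):.4f} >= H(X) = {H(px):.4f}")
```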
