EE5139R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16
1. Cover and Thomas: Problem 2.30 (Maximum Entropy): Solution: We are required to maximize $H(P_X)$ over all distributions $P_X$ on the non-negative integers satisfying $\sum_{n=0}^\infty n P_X(n) = A$ and also the normalization constraint $\sum_{n=0}^\infty P_X(n) = 1$ (which we ignore without loss of generality). Now, construct the Lagrangian:

$L(P_X, \lambda) = -\sum_{n=0}^\infty P_X(n) \log P_X(n) + \lambda \Big( \sum_{n=0}^\infty n P_X(n) - A \Big)$

Differentiating with respect to $P_X(n)$ (assuming natural logs and interchanging differentiation and the infinite sum), we obtain

$-\log P_X(n) - 1 + \lambda n = 0,$

so we have

$P_X(n) = \exp(-1 + \lambda n), \quad n \ge 0.$

We immediately recognize that this is a geometric distribution with mean $A$, i.e., $P_X$ can be written alternatively as

$P_X(n) = (1-p)^n p, \quad n \ge 0,$

where

$A = \frac{1-p}{p}.$

From direct calculations, the entropy is

$H(P_X) = \frac{H_b(p)}{p}.$

2. (Optional): Cover and Thomas: Problem 2.38 (The Value of a Question):

$H(X) - H(X \mid Y) = I(X; Y) = H(Y) - H(Y \mid X) = H_b(\alpha) - H(Y \mid X) = H_b(\alpha),$

since $H(Y \mid X) = 0$.
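The closed form in Problem 1 is easy to check numerically. The sketch below is illustrative and not part of the original solution; the mean $A = 3$ and the truncation of the support at 2000 terms are arbitrary choices made here. It computes the entropy of the geometric pmf with mean $A$, compares it with $H_b(p)/p$, and verifies that a small mean-preserving perturbation does not increase the entropy.

    # Numerical check: the geometric pmf with mean A has entropy H_b(p)/p,
    # and perturbing it while keeping the mean fixed cannot increase the entropy.
    import numpy as np

    A = 3.0
    p = 1.0 / (A + 1.0)            # so that (1 - p) / p = A
    n = np.arange(0, 2000)         # truncated support; the tail mass is negligible
    P = (1 - p) ** n * p

    def entropy_bits(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))

    Hb = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    print(entropy_bits(P), Hb / p)              # the two numbers should agree

    # Mean-preserving perturbation: shift a little mass among n = 0, 1, 2.
    eps = 1e-3
    Q = P.copy()
    Q[0] += eps; Q[1] -= 2 * eps; Q[2] += eps   # total mass and mean unchanged
    print(entropy_bits(Q) <= entropy_bits(P))   # expect True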
3. Fano's inequality for list decoding: Recall the proof of Fano's inequality. Now develop a generalization of Fano's inequality for list decoding. Let $(X, Y) \sim P_{XY}$ and let $L(Y) \subseteq \mathcal{X}$ be a set of size $L$ (compare this to an estimator $\hat{X}(Y) \in \mathcal{X}$, which is a set of size $L = 1$). Lower bound the probability of error $\Pr(X \notin L(Y))$ in terms of $L$, $H(X \mid L(Y))$ and $|\mathcal{X}|$. You should be able to recover the standard Fano inequality if you set $L = 1$.

Solution: Define the error random variable

$E = \begin{cases} 1 & X \notin L(Y) \\ 0 & X \in L(Y) \end{cases}$

Now consider

$H(X, E \mid L(Y)) = H(X \mid E, L(Y)) + H(E \mid L(Y)) = H(E \mid X, L(Y)) + H(X \mid L(Y)).$

Let $P_e := \Pr(X \notin L(Y))$. Now clearly, $H(E \mid X, L(Y)) = 0$, and $H(E \mid L(Y)) \le H(E) = H_b(P_e)$. Now, we examine the term $H(X \mid E, L(Y))$. We have

$H(X \mid E, L(Y)) = \Pr(E = 0) H(X \mid E = 0, L(Y)) + \Pr(E = 1) H(X \mid E = 1, L(Y)) \le (1 - P_e) \log L + P_e \log(|\mathcal{X}| - L),$

since if we know that $E = 0$, the number of values that $X$ can take on is no more than $L$, and if $E = 1$, the number of values that $X$ can take on is no more than $|\mathcal{X}| - L$. Putting everything together and upper bounding $H_b(P_e)$ by $1$, we have

$P_e \ge \frac{H(X \mid L(Y)) - \log L - 1}{\log \frac{|\mathcal{X}| - L}{L}}.$

4. (Optional): Data Processing Inequality for KL Divergence: Let $P_X, Q_X$ be pmfs on the same alphabet $\mathcal{X}$. Assume for the sake of simplicity that $P_X(x), Q_X(x) > 0$ for all $x \in \mathcal{X}$. Let $W(y \mid x) = \Pr(Y = y \mid X = x)$ be a channel from $\mathcal{X}$ to $\mathcal{Y}$. Define

$P_Y(y) = \sum_x W(y \mid x) P_X(x), \quad \text{and} \quad Q_Y(y) = \sum_x W(y \mid x) Q_X(x).$

Show that

$D(P_X \| Q_X) \ge D(P_Y \| Q_Y).$

You may use the log-sum inequality. This problem shows that processing does not increase divergence.

Solution: Starting from the definition of $D(P_Y \| Q_Y)$, we have

$D(P_Y \| Q_Y) = \sum_y P_Y(y) \log \frac{P_Y(y)}{Q_Y(y)} = \sum_y \Big( \sum_x W(y \mid x) P_X(x) \Big) \log \frac{\sum_x W(y \mid x) P_X(x)}{\sum_x W(y \mid x) Q_X(x)} \le \sum_y \sum_x W(y \mid x) P_X(x) \log \frac{W(y \mid x) P_X(x)}{W(y \mid x) Q_X(x)} = \sum_y \sum_x W(y \mid x) P_X(x) \log \frac{P_X(x)}{Q_X(x)} = \sum_x P_X(x) \log \frac{P_X(x)}{Q_X(x)} = D(P_X \| Q_X),$

where the inequality follows from the log-sum inequality.
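Problem 4's inequality can also be probed numerically. The sketch below is an illustration rather than part of the original solution; the alphabet sizes and the random channel are arbitrary choices. It draws random pmfs $P_X, Q_X$ and a random channel $W$ and checks that $D(P_X \| Q_X) \ge D(P_Y \| Q_Y)$.

    # Sanity check of the data-processing inequality for KL divergence.
    import numpy as np

    rng = np.random.default_rng(0)

    def rand_pmf(k):
        v = rng.random(k)
        return v / v.sum()

    def kl(p, q):
        return np.sum(p * np.log2(p / q))

    nx, ny = 4, 3
    P_X, Q_X = rand_pmf(nx), rand_pmf(nx)
    W = np.array([rand_pmf(ny) for _ in range(nx)])   # W[x, y] = Pr(Y = y | X = x)

    P_Y = P_X @ W
    Q_Y = Q_X @ W
    print(kl(P_X, Q_X) >= kl(P_Y, Q_Y))               # expect True on every draw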
5. Typical-Set Calculations 1:

(a) Suppose a DMS emits h and t with probability 1/2 each. For $\epsilon = 0.01$ and $n = 5$, what is $A_\epsilon^{(n)}$?

Solution: In this case, $H(X) = 1$. All source sequences are equally likely, each with probability $2^{-5} = 2^{-nH(X)}$. Hence, all sequences satisfy the condition for being typical,

$2^{-n(H(X)+\epsilon)} \le p_{X^n}(x^n) \le 2^{-n(H(X)-\epsilon)}$

for any $\epsilon > 0$. Hence, all 32 sequences are typical.

(b) Repeat if $\Pr(h) = 0.2$, $\Pr(t) = 0.8$, $n = 5$, and $\epsilon$ small.

Solution: Consider a sequence with $m$ heads and $n - m$ tails. Then, the probability of occurrence of this sequence is $p^m (1-p)^{n-m}$, where $p = \Pr(h)$. For such a sequence to be typical,

$2^{-n(H(X)+\epsilon)} \le p^m (1-p)^{n-m} \le 2^{-n(H(X)-\epsilon)},$

which translates to

$\Big| \Big(\frac{m}{n} - p\Big) \log \frac{1-p}{p} \Big| \le \epsilon.$

Plugging in the value of $p = 0.2$, we get

$\Big| \frac{m}{5} - \frac{1}{5} \Big| \le \frac{\epsilon}{2}.$

Since $m = 0, \ldots, 5$, this condition is satisfied for a small $\epsilon$ (any $\epsilon < 0.4$) only for $m = 1$, i.e., when there is exactly one H in the sequence. Thus,

$A_\epsilon^{(n)} = \{(HTTTT), (THTTT), (TTHTT), (TTTHT), (TTTTH)\}.$

6. Typical-Set Calculations 2: Consider a DMS with a two-symbol alphabet $\{a, b\}$ where $p_X(a) = 2/3$ and $p_X(b) = 1/3$. Let $X^n = (X_1, \ldots, X_n)$ be a string of chance variables from the source with $n = 100{,}000$.

(a) Let $W(X_j)$ be the log-pmf random variable for the $j$-th source output, i.e., $W(X_j) = -\log 2/3$ for $X_j = a$ and $-\log 1/3$ for $X_j = b$. Find the variance of $W(X_j)$.

Solution: For notational convenience, we will denote the log-pmf random variable by $W$. Now, note that $W$ takes on the value $\log(3/2)$ with probability $2/3$ and $\log 3$ with probability $1/3$. Hence,

$\mathrm{Var}(W) = E[W^2] - E[W]^2 = \frac{2}{9}.$

(b) For $\epsilon = 0.01$, evaluate the bound on the probability of the typical set using $\Pr(X^n \notin A_\epsilon^{(n)}) \le \sigma_W^2/(n\epsilon^2)$.

Solution: The bound on the typical set, as derived using Chebyshev's inequality, is

$\Pr(X^n \in A_\epsilon^{(n)}) \ge 1 - \frac{\sigma_W^2}{n\epsilon^2}.$

Substituting the values of $n = 10^5$ and $\epsilon = 0.01$, we obtain

$\Pr(X^n \in A_\epsilon^{(n)}) \ge 1 - \frac{1}{45} \approx 0.978.$

Loosely speaking, this means that if we were to look at sequences of length 100,000 generated from our DMS, more than 97% of the time the sequence will be typical.
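Part 5(b) is small enough to verify by brute force. The sketch below is illustrative only; $\epsilon = 0.01$ is one admissible choice of a small $\epsilon$. It enumerates all $2^5$ sequences and keeps the typical ones.

    # Enumerate the typical set of problem 5(b) directly.
    import itertools
    import numpy as np

    p, n, eps = 0.2, 5, 0.01
    H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    typical = []
    for seq in itertools.product('HT', repeat=n):
        m = seq.count('H')
        prob = p ** m * (1 - p) ** (n - m)
        if abs(-np.log2(prob) / n - H) <= eps:
            typical.append(''.join(seq))

    print(typical)   # expect the five strings containing exactly one H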
(c) Let $N_a$ be the number of $a$'s in the string $X^n = (X_1, \ldots, X_n)$. The random variable (rv) $N_a$ is the sum of $n$ iid rvs. Show what these rvs are.

Solution: The rv $N_a$ is the sum of $n$ iid rvs $Y_i$, $N_a = \sum_{i=1}^n Y_i$, where the $Y_i$ are Bernoulli with $\Pr(Y_i = 1) = 2/3$.

(d) Express the rv $W(X^n)$ as a function of the rv $N_a$. Note how this depends on $n$.

Solution: The probability of a particular sequence $x^n$ with $N_a$ $a$'s is $(2/3)^{N_a}(1/3)^{n-N_a}$. Hence,

$W(X^n) = -\log p_{X^n}(x^n) = -\log\big[(2/3)^{N_a}(1/3)^{n-N_a}\big] = n \log 3 - N_a.$

(e) Express the typical set in terms of bounds on $N_a$ (i.e., $A_\epsilon^{(n)} = \{x^n : \alpha < N_a < \beta\}$) and calculate $\alpha$ and $\beta$.

Solution: For a sequence $x^n$ to be typical, it must satisfy

$\Big| \frac{1}{n} \log \frac{1}{p_{X^n}(x^n)} - H(X) \Big| < \epsilon.$

From (a) the source entropy is $H(X) = E[W(X)] = \log 3 - 2/3$, and substituting in $\epsilon$ and $W(X^n)$ from part (d), we get

$\Big| \frac{N_a}{n} - \frac{2}{3} \Big| < \epsilon, \quad \text{i.e.,} \quad n\Big(\frac{2}{3} - \epsilon\Big) < N_a < n\Big(\frac{2}{3} + \epsilon\Big).$

Note the intuitive appeal of this condition! It says that for a sequence to be typical, the proportion of $a$'s in that sequence must be very close to the probability that the DMS generates an $a$. Plugging in the value of $n$ in the above equation, we get the bounds $65{,}667 \le N_a \le 67{,}666$.

(f) Find the mean and variance of $N_a$. Approximate $\Pr(X^n \in A_\epsilon^{(n)})$ by the central limit theorem approximation. The central limit theorem approximation is to evaluate $\Pr(X^n \in A_\epsilon^{(n)})$ assuming that $N_a$ is Gaussian with the mean and variance of the actual $N_a$. Recall that for a sequence of iid rvs $C_1, \ldots, C_n$, the central limit theorem asserts that

$\Pr\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n (C_i - \mu_C) \le t \Big) \to \Phi\Big( \frac{t}{\sigma_C} \Big),$

where $\mu_C$ and $\sigma_C$ are the mean and standard deviation of the $C_i$'s and $\Phi(z) = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} \exp(-\frac{u^2}{2})\, du$ is the cdf of the standard Gaussian.

Solution: $N_a$ is a binomial rv (a sum of $n$ independent Bernoulli rvs, as we have shown in part (c)). The mean and variance are

$E[N_a] = \frac{2n}{3} \approx 66{,}667, \quad \mathrm{Var}(N_a) = n \cdot \frac{2}{3} \cdot \frac{1}{3} \approx 22{,}222.$

Note that we can calculate the exact probability of the typical set $A_\epsilon^{(n)}$:

$\Pr(A_\epsilon^{(n)}) = \Pr(65{,}667 \le N_a \le 67{,}666) = \sum_{N_a = 65{,}667}^{67{,}666} \binom{10^5}{N_a} \Big(\frac{2}{3}\Big)^{N_a} \Big(\frac{1}{3}\Big)^{10^5 - N_a}.$

But this is computationally intensive, so we approximate $\Pr(A_\epsilon^{(n)})$ with the central limit theorem. We can use the CLT because $N_a$ is the sum of $n$ iid rvs, so in the limit of large $n$ the cumulative distribution approaches that of a Gaussian rv with the mean and variance of $N_a$:

$\Pr(65{,}667 \le N_a \le 67{,}666) \approx \int_\alpha^\beta \frac{1}{\sqrt{2\pi \mathrm{Var}(N_a)}} \exp\Big( -\frac{(x - E[N_a])^2}{2\,\mathrm{Var}(N_a)} \Big)\, dx = \Phi(6.704) - \Phi(-6.706) \approx 1,$
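The three estimates in problem 6 (the Chebyshev bound, the CLT approximation, and the exact binomial sum) can be compared directly; the exact sum can be evaluated with a library binomial cdf. A sketch, relying on scipy, which is an assumption of this illustration rather than part of the assignment:

    # Chebyshev bound vs. CLT approximation vs. exact binomial probability.
    from scipy.stats import binom, norm

    n, p, eps = 100_000, 2/3, 0.01
    var_W = 2/9                                    # from part (a), in bits^2
    lo, hi = 65_667, 67_666                        # bounds on N_a from part (e)
    mean, var = n * p, n * p * (1 - p)

    chebyshev = 1 - var_W / (n * eps**2)
    clt = norm.cdf((hi - mean) / var**0.5) - norm.cdf((lo - mean) / var**0.5)
    exact = binom.cdf(hi, n, p) - binom.cdf(lo - 1, n, p)

    print(chebyshev, clt, exact)                   # roughly 0.978, ~1, ~1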
where $\Phi(x)$ is the integral of the standard Gaussian density over $(-\infty, x)$. Thus the CLT approximation tells us that approximately all of the sequences we observe from the output of the DMS will be typical, whereas Chebyshev only gave us a bound that more than 97% of the sequences that we observe will be typical.

7. (Optional): Typical-Set Calculations 3: For the random variables in the previous problem, find $\Pr(N_a = i)$ for $i = 0, 1, 2$. Find the probability of each individual string $x^n$ for those values of $i$. Find the particular string $x^n$ that has maximum probability over all sample values of $X^n$. What are the next most probable $n$-strings? Give a brief discussion of why the most probable $n$-strings are not regarded as typical strings.

Solution: We know from the previous problem that

$\Pr(N_a = i) = \binom{10^5}{i} \Big(\frac{2}{3}\Big)^i \Big(\frac{1}{3}\Big)^{10^5 - i}.$

For $i = 0, 1, 2$, $\Pr(N_a = i)$ is approximately zero. The string with the maximal probability is the string with all $a$'s. The next most probable strings are the sequences with $n - 1$ $a$'s and one $b$, and so forth. From the definition of the typical set, we see that the typical set is a fairly small set which contains most of the probability, and the probability of each sequence in the typical set is almost the same. The most probable sequences and the least probable sequences lie in the tails of the distribution of the sample mean of the log-pmf (they are the furthest from the mean), so they are not regarded as typical strings. In fact, the aggregate probability of all the most likely sequences and all the least likely sequences is very small. The only case where the most likely sequence is regarded as typical is when every sequence is typical and every sequence is equally likely (as in problem Typical-Set Calculations 1). However, this is not the case in general. As we have seen in problem Typical-Set Calculations 2, for very long sequences a typical sequence will contain roughly the same proportion of each symbol as the probability of that symbol.

8. (Optional): AEP and Mutual Information: Let $(X_i, Y_i)$ be i.i.d. $\sim p_{X,Y}(x, y)$. We form the log-likelihood ratio of the hypothesis that $X$ and $Y$ are independent vs. the hypothesis that $X$ and $Y$ are dependent. What is the limit of

$\frac{1}{n} \log \frac{p_{X^n}(X^n)\, p_{Y^n}(Y^n)}{p_{X^n, Y^n}(X^n, Y^n)}?$

What is the limit of $\frac{p_{X^n}(X^n)\, p_{Y^n}(Y^n)}{p_{X^n, Y^n}(X^n, Y^n)}$ if $X_i$ and $Y_i$ are independent for all $i$?

Solution: Let

$L = \frac{1}{n} \log \frac{p_{X^n}(X^n)\, p_{Y^n}(Y^n)}{p_{X^n, Y^n}(X^n, Y^n)}.$

Since $(X_i, Y_i)$ are i.i.d. $\sim p_{X,Y}(x, y)$, we have

$L = \frac{1}{n} \sum_{i=1}^n \underbrace{\log \frac{p_X(X_i)\, p_Y(Y_i)}{p_{X,Y}(X_i, Y_i)}}_{W(X_i, Y_i)}.$

Each of the terms is a function of $(X_i, Y_i)$, which are independent across $i = 1, \ldots, n$. Thus, the following convergence in probability is observed:

$L \to E[W(X, Y)] = E_{(X,Y) \sim p_{X,Y}}\Big[ \log \frac{p_X(X)\, p_Y(Y)}{p_{X,Y}(X, Y)} \Big] = -I(X; Y).$

Hence, the limit of $2^{nL} = \frac{p_{X^n}(X^n)\, p_{Y^n}(Y^n)}{p_{X^n, Y^n}(X^n, Y^n)}$ is $2^{-nI(X;Y)}$, which converges to one if $X$ and $Y$ are independent because then $I(X; Y) = 0$.
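Problem 8's conclusion can be illustrated by simulation. In the sketch below, the particular $2 \times 2$ joint pmf and the sample size are arbitrary choices made here; the normalized log-likelihood ratio should concentrate around $-I(X;Y)$.

    # Simulation: (1/n) log [p(X^n) p(Y^n) / p(X^n, Y^n)] concentrates around -I(X;Y).
    import numpy as np

    rng = np.random.default_rng(1)
    P = np.array([[0.4, 0.1],                       # joint pmf p_{X,Y} on {0,1} x {0,1}
                  [0.1, 0.4]])
    Px, Py = P.sum(axis=1), P.sum(axis=0)
    I = np.sum(P * np.log2(P / np.outer(Px, Py)))   # mutual information in bits

    n = 200_000
    idx = rng.choice(4, size=n, p=P.ravel())        # sample (X_i, Y_i) pairs
    x, y = np.unravel_index(idx, P.shape)

    L = np.mean(np.log2(Px[x] * Py[y] / P[x, y]))
    print(L, -I)                                    # the two values should be close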
9. Piece of Cake: A cake is sliced roughly in half, the largest piece being chosen each time, the other pieces discarded. We will assume that a random cut creates pieces of proportions:

$P = \begin{cases} (2/3, 1/3) & \text{w.p. } 3/4 \\ (2/5, 3/5) & \text{w.p. } 1/4 \end{cases}$

Thus, for example, the first cut (and choice of largest piece) may result in a piece of size 3/5. Cutting and choosing from this piece might reduce it to size (3/5)(2/3) at time 2, and so on. Let $T_n$ be the fraction of cake left after $n$ cuts. Find the limit (in probability) of

$\lim_{n \to \infty} \frac{1}{n} \log T_n.$

Solution: Let $C_i$ be the fraction of the cake that is retained at the $i$-th cut, and let $T_n$ be the fraction of cake left after $n$ cuts. Then we have $T_n = C_1 C_2 \cdots C_n$. Hence,

$\lim_{n \to \infty} \frac{1}{n} \log T_n = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \log C_i \to E[\log C_1] = \frac{3}{4} \log \frac{2}{3} + \frac{1}{4} \log \frac{3}{5}.$

10. Two Typical Sets: Let $X_i$ be a sequence of real-valued random variables independent and identically distributed according to $P_X(x)$, $x \in \mathcal{X}$. Let $\mu = E[X]$ and denote the entropy of $X$ as $H(X) = -\sum_x P_X(x) \log P_X(x)$. Define the two sets

$A_n = \Big\{ x^n \in \mathcal{X}^n : \Big| -\frac{1}{n} \log P_{X^n}(x^n) - H(X) \Big| \le \epsilon \Big\}, \quad B_n = \Big\{ x^n \in \mathcal{X}^n : \Big| \frac{1}{n} \sum_{i=1}^n x_i - \mu \Big| \le \epsilon \Big\}.$

(a) (1 point) $\Pr(X^n \in A_n) \to 1$ as $n \to \infty$. True or false? Justify your answer.

Solution: True. This follows by Chebyshev's inequality: indeed,

$\Pr(X^n \in A_n^c) \le \frac{\sigma_0^2}{n\epsilon^2} \to 0,$

where $\sigma_0^2 = \mathrm{Var}(-\log P_X(X))$. Consequently,

$\Pr(X^n \in A_n) \to 1$

as desired.

(b) (1 point) $\Pr(X^n \in A_n \cap B_n) \to 1$ as $n \to \infty$. True or false? Justify your answer.

Solution: True. By Chebyshev's inequality and the same logic as above,

$\Pr(X^n \in B_n) \to 1.$

So by De Morgan's law and the union bound,

$\Pr(X^n \in A_n \cap B_n) = 1 - \Pr(X^n \in A_n^c \cup B_n^c) \ge 1 - \Pr(X^n \in A_n^c) - \Pr(X^n \in B_n^c).$

Since the latter two terms tend to zero, we know that

$\Pr(X^n \in A_n \cap B_n) \to 1$

as desired.

(c) (1 point) Show that $|A_n \cap B_n| \le 2^{n(H(X)+\epsilon)}$ for all $n$.

Solution: $|A_n \cap B_n| \le |A_n| \le 2^{n(H+\epsilon)}$, where the final inequality comes from the AEP, shown in class.
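A quick Monte Carlo check of the limit in problem 9 (a sketch; the horizon $n$ and the seed are arbitrary choices made here): by the law of large numbers, $\frac{1}{n}\log T_n$ should be close to $\frac{3}{4}\log\frac{2}{3} + \frac{1}{4}\log\frac{3}{5}$ for large $n$.

    # Monte Carlo check of (1/n) log T_n for the piece-of-cake process.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000
    # fraction retained at each cut: 2/3 w.p. 3/4, 3/5 w.p. 1/4
    C = rng.choice([2/3, 3/5], size=n, p=[3/4, 1/4])

    empirical = np.sum(np.log2(C)) / n
    limit = 0.75 * np.log2(2/3) + 0.25 * np.log2(3/5)
    print(empirical, limit)   # the two values should be close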
(d) (1 point) Show that $|A_n \cap B_n| \ge \frac{1}{2} \cdot 2^{n(H(X)-\epsilon)}$ for $n$ sufficiently large.

Solution: From part (b), $\Pr(X^n \in A_n \cap B_n) \ge \frac{1}{2}$ for $n$ sufficiently large. Thus

$\frac{1}{2} \le \sum_{x^n \in A_n \cap B_n} P_{X^n}(x^n) \le \sum_{x^n \in A_n \cap B_n} 2^{-n(H-\epsilon)} = |A_n \cap B_n|\, 2^{-n(H-\epsilon)}$

and we are done.

11. Entropy Inequalities: Let $X$ and $Y$ be real-valued random variables that take on discrete values in $\mathcal{X} = \{1, \ldots, r\}$ and $\mathcal{Y} = \{1, \ldots, s\}$. Let $Z = X + Y$.

(a) (1 point) Show that $H(Z \mid X) = H(Y \mid X)$. Justify your answer carefully.

Solution: Consider

$H(Z \mid X) = \sum_x P_X(x) H(Z \mid X = x) = -\sum_x P_X(x) \sum_z P_{Z|X}(z \mid x) \log P_{Z|X}(z \mid x) = -\sum_x P_X(x) \sum_z P_{Y|X}(z - x \mid x) \log P_{Y|X}(z - x \mid x) = -\sum_x P_X(x) \sum_y P_{Y|X}(y \mid x) \log P_{Y|X}(y \mid x) = H(Y \mid X).$

(b) (1 point) It is now known that $X$ and $Y$ are independent. Which of the following is true in general: (i) $H(X) \le H(Z)$; (ii) $H(X) \ge H(Z)$? Justify your answer.

Solution: From the above, note that $X$ and $Y$ play symmetric roles, so given what we have proved in (a), we also know that

$H(Z \mid Y) = H(X \mid Y).$

Now, we have

$H(Z) \ge H(Z \mid Y) = H(X \mid Y) = H(X),$

where the inequality holds because conditioning reduces entropy and the final equality is by the independence of $X$ and $Y$. So the first assertion is true.

(c) (1 point) Now, in addition to $Z = X + Y$ and the independence of $X$ and $Y$, it is also known that $X = f_1(Z)$ and $Y = f_2(Z)$ for some functions $f_1$ and $f_2$. Find $H(Z)$ in terms of $H(X)$ and $H(Y)$.

Solution: Since $Z$ is a function of $(X, Y)$,

$H(Z) = H(X + Y) \le H(X, Y) = H(X) + H(Y),$

where the final equality is by independence of $X$ and $Y$. On the other hand,

$H(X) + H(Y) = H(X, Y) = H(f_1(Z), f_2(Z)) \le H(Z).$

Hence all inequalities above are equalities and we have $H(Z) = H(X) + H(Y)$.
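A small numerical check of parts (a) and (b) of problem 11 (a sketch; the particular pmfs below are arbitrary examples, and $X, Y$ are taken independent as in part (b)):

    # Check H(Z|X) = H(Y|X) and H(Z) >= H(X) for Z = X + Y with X, Y independent.
    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    px = np.array([0.5, 0.3, 0.2])   # X takes values 1, 2, 3
    py = np.array([0.6, 0.4])        # Y takes values 1, 2

    # joint pmf of (X, Z) with Z = X + Y: P(x, z) = P_X(x) P_Y(z - x)
    Pxz = np.zeros((3, 6))
    for i, a in enumerate(px, start=1):
        for j, b in enumerate(py, start=1):
            Pxz[i - 1, i + j] = a * b

    HZ_given_X = entropy(Pxz.ravel()) - entropy(px)   # chain rule: H(Z|X) = H(X,Z) - H(X)
    HY_given_X = entropy(py)                          # = H(Y) since X, Y are independent
    pz = Pxz.sum(axis=0)                              # pmf of Z

    print(np.isclose(HZ_given_X, HY_given_X))   # part (a): expect True
    print(entropy(pz) >= entropy(px))           # part (b): expect True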