
2. Information measures: mutual information

2.1 Divergence: main inequality

Theorem 2.1 (Information Inequality). D(P ‖ Q) ≥ 0; D(P ‖ Q) = 0 iff P = Q.

Proof. Let ϕ(x) ≜ x log x, which is strictly convex, and use Jensen's Inequality:

    D(P ‖ Q) = Σ_{x∈X} P(x) log (P(x)/Q(x)) = Σ_{x∈X} Q(x) ϕ(P(x)/Q(x)) ≥ ϕ(Σ_{x∈X} Q(x) P(x)/Q(x)) = ϕ(1) = 0.

2.2 Conditional divergence

The main objects in our course are random variables. The main operation for creating new random variables, and also for defining relations between random variables, is that of a random transformation:

Definition 2.1. A conditional probability distribution (aka random transformation, transition probability kernel, Markov kernel, channel) K(·|·) has two arguments: the first argument is a measurable subset of Y, the second argument is an element of X. It must satisfy:

1. For any x ∈ X: K(·|x) is a probability measure on Y.
2. For any measurable A: the function x ↦ K(A|x) is measurable on X.

In this case we will say that K acts from X to Y. In fact, we will abuse notation and write P_{Y|X} instead of K to suggest what the spaces X and Y are.¹ Furthermore, if X and Y are connected by the random transformation P_{Y|X} we will write X →^{P_{Y|X}} Y.

Remark 2.1. (Very technical!) Unfortunately, condition 2 (standard for probability textbooks) will frequently not be sufficiently strong for this course. The main reason is that we want Radon-Nikodym derivatives such as (dP_{Y|X=x}/dQ_Y)(y) to be jointly measurable in (x, y). See Section 2.6* for more.

Examples:

1. deterministic system: Y = f(X) ⟺ P_{Y|X=x} = δ_{f(x)}
2. decoupled system: Y ⊥ X ⟺ P_{Y|X=x} = P_Y
3. additive noise (convolution): Y = X + Z with Z ⊥ X ⟺ P_{Y|X=x} = P_{x+Z}

¹ Another reason for writing P_{Y|X} is that from any joint distribution P_{XY} (on standard Borel spaces) one can extract a random transformation by conditioning on X.
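For concreteness (an illustration of mine, not part of the notes): on a finite alphabet a kernel P_{Y|X} can be stored as a row-stochastic matrix W[x, y] = P_{Y|X}(y|x), and the three examples above become particular matrices. All names and the noise law P_Z below are arbitrary choices.

```python
import numpy as np

n = 4
f = np.array([2, 0, 3, 1])                      # some function f: X -> Y
W_deterministic = np.eye(n)[f]                  # row x is the point mass delta_{f(x)}
W_decoupled = np.tile(np.ones(n) / n, (n, 1))   # every row equals the same P_Y
P_Z = np.array([0.7, 0.2, 0.1, 0.0])            # a noise law on Z_4
W_additive = np.array([np.roll(P_Z, x) for x in range(n)])  # Y = X + Z (mod 4)

for W in (W_deterministic, W_decoupled, W_additive):
    assert np.allclose(W.sum(axis=1), 1.0)      # each row is a probability measure
```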

Multiplication: take P_X and P_{Y|X} to get the joint distribution P_{XY} = P_X P_{Y|X}:

    P_{XY}(x, y) = P_{Y|X}(y|x) P_X(x).

Composition (Marginalization): P_Y = P_{Y|X} ∘ P_X, that is, P_{Y|X} acts on P_X to produce P_Y:

    P_Y(y) = Σ_{x∈X} P_{Y|X}(y|x) P_X(x).

We will also write P_X → P_{Y|X} → P_Y.

Definition 2.2 (Conditional divergence).

    D(P_{Y|X} ‖ Q_{Y|X} | P_X) = E_{x∼P_X}[D(P_{Y|X=x} ‖ Q_{Y|X=x})]                       (2.1)
                               = Σ_{x∈X} P_X(x) D(P_{Y|X=x} ‖ Q_{Y|X=x}).                  (2.2)

Note: H(X|Y) = log|A| − D(P_{X|Y} ‖ U_X | P_Y), where U_X is the uniform distribution on X.

Theorem 2.2 (Properties of Divergence).

1. D(P_{Y|X} ‖ Q_{Y|X} | P_X) = D(P_X P_{Y|X} ‖ P_X Q_{Y|X})

2. (Simple chain rule) D(P_{XY} ‖ Q_{XY}) = D(P_{Y|X} ‖ Q_{Y|X} | P_X) + D(P_X ‖ Q_X)

3. (Monotonicity) D(P_{XY} ‖ Q_{XY}) ≥ D(P_Y ‖ Q_Y)

4. (Full chain rule)

       D(P_{X_1⋯X_n} ‖ Q_{X_1⋯X_n}) = Σ_{i=1}^n D(P_{X_i|X^{i−1}} ‖ Q_{X_i|X^{i−1}} | P_{X^{i−1}}).

   In the special case of Q_{X^n} = Π_i Q_{X_i} we have

       D(P_{X_1⋯X_n} ‖ Q_{X_1} ⋯ Q_{X_n}) = D(P_{X_1⋯X_n} ‖ P_{X_1} ⋯ P_{X_n}) + Σ_{i=1}^n D(P_{X_i} ‖ Q_{X_i}).

5. (Conditioning increases divergence) Let P_{Y|X} and Q_{Y|X} be two kernels, let P_Y = P_{Y|X} ∘ P_X and Q_Y = Q_{Y|X} ∘ P_X. Then

       D(P_Y ‖ Q_Y) ≤ D(P_{Y|X} ‖ Q_{Y|X} | P_X),

   with equality iff D(P_{X|Y} ‖ Q_{X|Y} | P_Y) = 0. Pictorially: the same input P_X fed into the channel P_{Y|X} produces P_Y, and fed into Q_{Y|X} produces Q_Y.

6. (Data-processing for divergences) Let

       P_Y(·) = ∫ P_{Y|X}(·|x) dP_X ,    Q_Y(·) = ∫ P_{Y|X}(·|x) dQ_X .

   Then

       D(P_Y ‖ Q_Y) ≤ D(P_X ‖ Q_X).                                                        (2.3)

   Pictorially: pushing P_X and Q_X through the same channel P_{Y|X} yields P_Y and Q_Y, and D(P_X ‖ Q_X) ≥ D(P_Y ‖ Q_Y).
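Before the proof, items 2 and 3 are easy to sanity-check numerically on a random finite example. The NumPy sketch below is mine (helper name kl, alphabet sizes, and the random seed are arbitrary choices).

```python
import numpy as np

def kl(p, q):
    """D(P||Q) in bits for finite distributions given as arrays of the same shape."""
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

rng = np.random.default_rng(1)
nx, ny = 3, 4
P_X, Q_X = rng.dirichlet(np.ones(nx)), rng.dirichlet(np.ones(nx))
P_YgX = rng.dirichlet(np.ones(ny), size=nx)        # row x is P_{Y|X=x}
Q_YgX = rng.dirichlet(np.ones(ny), size=nx)

P_XY = P_X[:, None] * P_YgX                        # "multiplication": P_XY = P_X * P_{Y|X}
Q_XY = Q_X[:, None] * Q_YgX

cond_div = sum(P_X[x] * kl(P_YgX[x], Q_YgX[x]) for x in range(nx))   # (2.1)-(2.2)
assert np.isclose(kl(P_XY, Q_XY), cond_div + kl(P_X, Q_X))           # simple chain rule (item 2)
assert kl(P_XY, Q_XY) >= kl(P_XY.sum(axis=0), Q_XY.sum(axis=0))      # monotonicity (item 3)
```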

Proof. We only illustrate these results for the case of finite alphabets. The general case follows by a careful analysis of Radon-Nikodym derivatives, introduction of regular branches of conditional probability, etc. For certain cases (e.g. separable metric spaces), however, we can simply discretize the alphabets and take the granularity of the discretization to 0. This method will become clearer in Lecture 4, once we understand continuity of D.

1. E_{x∼P_X}[D(P_{Y|X=x} ‖ Q_{Y|X=x})] = E_{(X,Y)∼P_X P_{Y|X}}[log (P_{Y|X}/Q_{Y|X})].

2. Disintegration: E_{(X,Y)}[log (P_{XY}/Q_{XY})] = E_{(X,Y)}[log (P_{Y|X}/Q_{Y|X}) + log (P_X/Q_X)].

3. Apply 2 with X and Y interchanged and use D ≥ 0.

4. Telescoping: P_{X^n} = Π_{i=1}^n P_{X_i|X^{i−1}} and Q_{X^n} = Π_{i=1}^n Q_{X_i|X^{i−1}}.

5. The inequality follows from monotonicity. To get the conditions for equality, notice that by the chain rule for D (here Q_X = P_X):

       D(P_{XY} ‖ Q_{XY}) = D(P_{Y|X} ‖ Q_{Y|X} | P_X) + D(P_X ‖ P_X) = D(P_{Y|X} ‖ Q_{Y|X} | P_X)
                          = D(P_{X|Y} ‖ Q_{X|Y} | P_Y) + D(P_Y ‖ Q_Y),

   and hence we get the claimed result from positivity of D.

6. This again follows from monotonicity.

Corollary 2.1. D(P_{X_1⋯X_n} ‖ Q_{X_1} ⋯ Q_{X_n}) ≥ Σ_{i=1}^n D(P_{X_i} ‖ Q_{X_i}), with equality iff P_{X^n} = Π_{j=1}^n P_{X_j}.

Note: In general we can have D(P_{XY} ‖ Q_{XY}) ≷ D(P_X ‖ Q_X) + D(P_Y ‖ Q_Y). For example, if X = Y under both P and Q, then D(P_{XY} ‖ Q_{XY}) = D(P_X ‖ Q_X) < 2 D(P_X ‖ Q_X). Conversely, if P_X = Q_X and P_Y = Q_Y but P_{XY} ≠ Q_{XY}, we have D(P_{XY} ‖ Q_{XY}) > 0 = D(P_X ‖ Q_X) + D(P_Y ‖ Q_Y).

Corollary 2.2. Y = f(X) ⟹ D(P_Y ‖ Q_Y) ≤ D(P_X ‖ Q_X), with equality if f is 1-to-1.

Note: D(P_Y ‖ Q_Y) = D(P_X ‖ Q_X) does not imply that f is 1-to-1. Example: P_X = Gaussian, Q_X = Laplace, Y = |X|.
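The data-processing inequality (2.3) can likewise be checked numerically on finite alphabets. The sketch below (my own construction: the channel W, alphabet sizes, and seed are arbitrary) pushes random input pairs through a fixed channel.

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

rng = np.random.default_rng(2)
nx, ny = 5, 3
W = rng.dirichlet(np.ones(ny), size=nx)      # a fixed channel P_{Y|X}: rows sum to 1
for _ in range(100):
    P_X, Q_X = rng.dirichlet(np.ones(nx)), rng.dirichlet(np.ones(nx))
    P_Y, Q_Y = P_X @ W, Q_X @ W              # composition: push both inputs through P_{Y|X}
    assert kl(P_Y, Q_Y) <= kl(P_X, Q_X) + 1e-12   # data-processing inequality (2.3)
```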

Corollary 2.3 (Large deviations estimate). For any subset E ⊆ X we have

    d(P_X[E] ‖ Q_X[E]) ≤ D(P_X ‖ Q_X),

where d(·‖·) denotes the binary divergence.

Proof. Consider Y = 1{X ∈ E}.

2.3 Mutual information

Definition 2.3 (Mutual information).

    I(X;Y) = D(P_{XY} ‖ P_X P_Y).

Intuition: I(X;Y) measures the dependence between X and Y, or, the information about X (resp. Y) provided by Y (resp. X).

Defined by Shannon (in a different form), in this form by Fano.

Note: not restricted to discrete alphabets. I(X;Y) is a functional of the joint distribution P_{XY}, or equivalently, of the pair (P_X, P_{Y|X}).

Theorem 2.3 (Properties of I).

1. I(X;Y) = D(P_{XY} ‖ P_X P_Y) = D(P_{Y|X} ‖ P_Y | P_X) = D(P_{X|Y} ‖ P_X | P_Y)
2. Symmetry: I(X;Y) = I(Y;X)
3. Positivity: I(X;Y) ≥ 0; I(X;Y) = 0 iff X ⊥ Y
4. I(f(X);Y) ≤ I(X;Y); if f is one-to-one, then I(f(X);Y) = I(X;Y)
5. "More data ⟹ more info": I(X_1, X_2; Z) ≥ I(X_1; Z)

Proof.

1. I(X;Y) = E log (P_{XY}/(P_X P_Y)) = E log (P_{Y|X}/P_Y) = E log (P_{X|Y}/P_X).

2. Apply the data-processing inequality twice to the map (x, y) ↦ (y, x) to get D(P_{XY} ‖ P_X P_Y) = D(P_{YX} ‖ P_Y P_X).

3. By definition.

4. We will use the data-processing property of mutual information (to be proved shortly, see Theorem 2.5). Consider the chain of data processing: (x, y) ↦ (f(x), y) ↦ (f^{−1}(f(x)), y). Then

       I(X;Y) ≥ I(f(X);Y) ≥ I(f^{−1}(f(X));Y) = I(X;Y).

5. Consider f(X_1, X_2) = X_1.
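A small numerical illustration of Theorem 2.3 (again a sketch of mine, finite alphabets assumed): positivity, symmetry, and the inequality of item 4 for a non-injective f that merges two X-symbols.

```python
import numpy as np

def mi(P_XY):
    """I(X;Y) in bits, computed as D(P_XY || P_X P_Y) for a finite joint pmf matrix."""
    P_X = P_XY.sum(axis=1, keepdims=True)
    P_Y = P_XY.sum(axis=0, keepdims=True)
    m = P_XY > 0
    return float(np.sum(P_XY[m] * np.log2(P_XY[m] / (P_X * P_Y)[m])))

rng = np.random.default_rng(7)
P = rng.random((4, 3)); P /= P.sum()               # a random joint pmf of (X, Y)
assert mi(P) >= 0                                  # positivity
assert np.isclose(mi(P), mi(P.T))                  # symmetry: I(X;Y) = I(Y;X)
P_fX_Y = np.vstack([P[0] + P[1], P[2:]])           # f merges two X-symbols (not 1-to-1)
assert mi(P_fX_Y) <= mi(P) + 1e-12                 # I(f(X);Y) <= I(X;Y)
```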

Theorem 2.4 (I vs. H).

1. I(X;X) = H(X) if X is discrete, and I(X;X) = +∞ otherwise.

2. If X, Y are discrete then

       I(X;Y) = H(X) + H(Y) − H(X,Y).

   If only X is discrete then I(X;Y) = H(X) − H(X|Y).

3. If X, Y are real-valued vectors, have a joint pdf and all three differential entropies are finite, then

       I(X;Y) = h(X) + h(Y) − h(X,Y).

   If X has marginal pdf p_X and conditional pdf p_{X|Y}(x|y), then I(X;Y) = h(X) − h(X|Y).

4. If X or Y is discrete then I(X;Y) ≤ min(H(X), H(Y)), with equality iff H(X|Y) = 0 or H(Y|X) = 0, i.e., one is a deterministic function of the other.

Proof.

1. By definition, I(X;X) = D(P_{X,X} ‖ P_X P_X) = E_{x∼P_X} D(δ_x ‖ P_X). If P_X is discrete, then D(δ_x ‖ P_X) = log (1/P_X(x)) and I(X;X) = H(X). If P_X is not discrete, then let A = {x : P_X(x) > 0} denote the set of atoms of P_X. Let Δ = {(x,x) : x ∉ A} ⊆ X × X. Then P_{X,X}(Δ) = P_X(A^c) > 0, but since

       (P_X × P_X)(E) = ∫∫_{X×X} P_X(dx_1) P_X(dx_2) 1{(x_1, x_2) ∈ E},

   we have, by taking E = Δ, that (P_X × P_X)(Δ) = 0. Thus P_{X,X} is not absolutely continuous w.r.t. P_X × P_X, and thus I(X;X) = D(P_{X,X} ‖ P_X P_X) = ∞.

2. E log (P_{XY}/(P_X P_Y)) = E[log (1/P_X) + log (1/P_Y) − log (1/P_{XY})] = H(X) + H(Y) − H(X,Y).

3. Similarly, when P_{XY}, P_X and P_Y have densities p_{XY}, p_X and p_Y, we have

       D(P_{XY} ‖ P_X P_Y) = E[log (p_{XY}/(p_X p_Y))] = h(X) + h(Y) − h(X,Y).

4. Follows from 2.

Corollary 2.4 (Conditioning reduces entropy). For X discrete: H(X|Y) ≤ H(X), with equality iff X ⊥ Y.

Intuition: the amount of entropy reduction equals the mutual information.

Example: X = U OR Y, where U, Y are i.i.d. Bern(1/2). Then X ∼ Bern(3/4) and H(X) = h(1/4) < 1 bit = H(X|Y = 0), i.e., conditioning on the event Y = 0 increases entropy. But on average, H(X|Y) = P[Y = 0] H(X|Y = 0) + P[Y = 1] H(X|Y = 1) = 1/2 bit < H(X), by the strong concavity of h(·).
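The OR example can be reproduced in a few lines (a sketch of mine, with the joint pmf written out by hand):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else float(-p*np.log2(p) - (1-p)*np.log2(1-p))

# Joint pmf of (X, Y) for X = U OR Y with U, Y i.i.d. Bern(1/2); rows indexed by X.
P_XY = np.array([[0.25, 0.00],     # X=0 forces U=Y=0
                 [0.25, 0.50]])    # X=1: either (U=1, Y=0) or Y=1
H_X = h2(P_XY[1].sum())            # X ~ Bern(3/4), so H(X) = h(1/4)
H_X_given_Y0 = h2(0.5)             # given Y=0, X = U ~ Bern(1/2): 1 bit
H_X_given_Y1 = 0.0                 # given Y=1, X = 1 deterministically
H_X_given_Y = 0.5*H_X_given_Y0 + 0.5*H_X_given_Y1
assert H_X_given_Y0 > H_X          # conditioning on the event Y=0 increases entropy
assert H_X_given_Y < H_X           # but on average conditioning reduces it (Corollary 2.4)
```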

Note: Information, entropy and Venn diagrams:

1. The following Venn diagram illustrates the relationship between entropy, conditional entropy, joint entropy, and mutual information.

   [Venn diagram: two overlapping circles for H(X) and H(Y); their union is H(X,Y), the overlap is I(X;Y), and the non-overlapping parts are H(X|Y) and H(Y|X).]

2. If you do the same for 3 variables, you will discover that the triple intersection corresponds to

       H(X_1) + H(X_2) + H(X_3) − H(X_1,X_2) − H(X_2,X_3) − H(X_1,X_3) + H(X_1,X_2,X_3),    (2.4)

   which is sometimes denoted I(X;Y;Z). It can be both positive and negative (why?).

3. In general, one can treat random variables as sets (so that the r.v. X_i corresponds to a set E_i and (X_1,X_2) corresponds to E_1 ∪ E_2). Then we can define a unique signed measure μ on the finite algebra generated by these sets so that every information quantity is found by replacing I/H → μ; , → ∪; ; → ∩; | → ∖. As an example, we have

       H(X_1 | X_2, X_3) = μ(E_1 ∖ (E_2 ∪ E_3)),                                            (2.5)
       I(X_1, X_2; X_3 | X_4) = μ(((E_1 ∪ E_2) ∩ E_3) ∖ E_4).                               (2.6)

   By inclusion-exclusion, quantity (2.4) corresponds to μ(E_1 ∩ E_2 ∩ E_3), which explains why μ is not necessarily a positive measure.

Example: Bivariate Gaussian. For X, Y jointly Gaussian,

    I(X;Y) = (1/2) log (1/(1 − ρ_{XY}²)),

where ρ_{XY} ≜ E[(X − EX)(Y − EY)]/(σ_X σ_Y) ∈ [−1, 1] is the correlation coefficient.

[Figure: I(X;Y) plotted as a function of the correlation coefficient ρ.]

Proof. WLOG, by shifting and scaling if necessary, we can assume EX = EY = 0 and EX² = EY² = 1. Then ρ = E[XY]. By joint Gaussianity, Y = ρX + Z for some Z ∼ N(0, 1 − ρ²) independent of X. Then, using the divergence formula for Gaussians (1.16), we get

    I(X;Y) = D(P_{Y|X} ‖ P_Y | P_X) = E D(N(ρX, 1 − ρ²) ‖ N(0, 1))
           = E[(1/2) log (1/(1 − ρ²)) + (log e / 2)((ρX)² + 1 − ρ² − 1)]
           = (1/2) log (1/(1 − ρ²)).
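The closed form can be cross-checked against I(X;Y) = h(X) + h(Y) − h(X,Y) using the Gaussian differential-entropy formula. A numerical sketch of mine (the helper gaussian_h and the value of ρ are arbitrary):

```python
import numpy as np

def gaussian_h(cov):
    """Differential entropy (in nats) of a N(0, cov) vector: 0.5*log((2*pi*e)^d det cov)."""
    cov = np.atleast_2d(np.asarray(cov, float))
    d = cov.shape[0]
    return 0.5 * np.log((2*np.pi*np.e)**d * np.linalg.det(cov))

rho = 0.8
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
I_entropies = gaussian_h(Sigma[:1, :1]) + gaussian_h(Sigma[1:, 1:]) - gaussian_h(Sigma)
I_formula = 0.5 * np.log(1.0 / (1.0 - rho**2))
assert np.isclose(I_entropies, I_formula)
```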

Note: Similar to the role of mutual information, the correlation coefficient also measures, in a certain sense, the dependency between random variables which are real-valued (more generally, valued in an inner-product space). However, mutual information is invariant to bijections and more general: it can be defined not just for numerical random variables, but also for apples and oranges.

Example: Additive white Gaussian noise (AWGN) channel. Let X and N be independent Gaussians and Y = X + N. [Diagram: X enters an adder together with the noise N, producing Y = X + N.] Then

    I(X; X + N) = (1/2) log (1 + σ_X²/σ_N²),

where σ_X²/σ_N² is the signal-to-noise ratio (SNR).

Example: Gaussian vectors. For X ∈ R^m, Y ∈ R^n jointly Gaussian,

    I(X;Y) = (1/2) log (det Σ_X det Σ_Y / det Σ_{[X,Y]}),

where Σ_X ≜ E[(X − EX)(X − EX)^T] denotes the covariance matrix of X ∈ R^m, and Σ_{[X,Y]} denotes the covariance matrix of the random vector [X,Y] ∈ R^{m+n}.

In the special case of additive noise, Y = X + N with N ⊥ X, we have

    I(X; X + N) = (1/2) log (det(Σ_X + Σ_N) / det Σ_N),

since (why?) det Σ_{[X,X+N]} = det [Σ_X, Σ_X; Σ_X, Σ_X + Σ_N] = det Σ_X det Σ_N.

Example: Binary symmetric channel (BSC). Let X ∼ Bern(1/2), N ∼ Bern(δ), and Y = X + N. [Diagram: BSC with crossover probability δ: each input bit is flipped with probability δ.] Then

    I(X;Y) = log 2 − h(δ).

Example: Addition over finite groups. If X is uniform on a finite group G and independent of Z, then

    I(X; X + Z) = log |G| − H(Z).

Proof. Show that X + Z is uniform on G regardless of the distribution of Z.
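The finite-group example is easy to verify numerically for G = Z_m (a sketch of mine; the group size, seed, and helper names are arbitrary). It also confirms that X + Z is uniform regardless of P_Z.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

m = 5
rng = np.random.default_rng(8)
P_Z = rng.dirichlet(np.ones(m))                     # arbitrary noise law on Z_m
P_X = np.ones(m) / m                                # X uniform on G, independent of Z
# Joint pmf of (X, Y) with Y = X + Z (mod m):
P_XY = np.array([[P_X[x] * P_Z[(y - x) % m] for y in range(m)] for x in range(m)])
P_Y = P_XY.sum(axis=0)
assert np.allclose(P_Y, 1.0 / m)                    # X + Z is uniform regardless of P_Z
I = sum(P_XY[x, y] * np.log2(P_XY[x, y] / (P_X[x] * P_Y[y]))
        for x in range(m) for y in range(m) if P_XY[x, y] > 0)
assert np.isclose(I, np.log2(m) - entropy(P_Z))     # I(X; X+Z) = log|G| - H(Z), in bits
```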

2.4 Conditional mutual information and conditional independence

Definition 2.4 (Conditional mutual information).

    I(X;Y|Z) = D(P_{XY|Z} ‖ P_{X|Z} P_{Y|Z} | P_Z)                                          (2.7)
             = E_{z∼P_Z}[I(X;Y|Z = z)],                                                     (2.8)

where the product of two random transformations is (P_{X|Z=z} × P_{Y|Z=z})(x, y) ≜ P_{X|Z}(x|z) P_{Y|Z}(y|z), under which X and Y are independent conditioned on Z.

Note: I(X;Y|Z) is a functional of P_{XYZ}.

Remark 2.2 (Conditional independence). A family of distributions can be represented by a directed acyclic graph. A simple example is a Markov chain (line graph), which represents distributions that factor as P_{XYZ} = P_X P_{Y|X} P_{Z|Y}:

    X → Y → Z  ⟺  P_{XYZ} = P_X P_{Y|X} P_{Z|Y}  ⟺  P_{Z|XY} = P_{Z|Y}  ⟺  P_{XZ|Y} = P_{X|Y} P_{Z|Y}.

The last condition is the conditional independence of X and Z given Y; when it holds we say that X, Y, Z form a Markov chain, with notation X → Y → Z.

Theorem 2.5 (Further properties of Mutual Information).

1. I(X;Z|Y) ≥ 0, with equality iff X → Y → Z

2. (Kolmogorov identity or small chain rule)

       I(X,Y;Z) = I(X;Z) + I(Y;Z|X) = I(Y;Z) + I(X;Z|Y)

3. (Data Processing) If X → Y → Z, then
   a) I(X;Z) ≤ I(X;Y)
   b) I(X;Y|Z) ≤ I(X;Y)

4. (Full chain rule) I(X^n;Y) = Σ_{k=1}^n I(X_k;Y|X^{k−1})
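Before the proof, the Kolmogorov identity (item 2) can be checked on a random finite joint pmf. The sketch below is mine (the helper mi and the alphabet sizes are arbitrary choices).

```python
import numpy as np

def mi(P):
    """I between axis 0 and axis 1 of a 2-D joint pmf, in bits."""
    Px, Py = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
    m = P > 0
    return float(np.sum(P[m] * np.log2(P[m] / (Px * Py)[m])))

rng = np.random.default_rng(3)
P = rng.random((2, 3, 2)); P /= P.sum()             # a random joint pmf of (X, Y, Z)

I_XY_Z = mi(P.reshape(6, 2))                        # I(X,Y;Z): group (x,y) pairs vs z
I_X_Z = mi(P.sum(axis=1))                           # I(X;Z)
I_Y_Z_given_X = sum(P[x].sum() * mi(P[x] / P[x].sum())   # I(Y;Z|X) = E_x I(Y;Z|X=x)
                    for x in range(P.shape[0]))
assert np.isclose(I_XY_Z, I_X_Z + I_Y_Z_given_X)    # Kolmogorov identity
```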

Proof.

1. By definition and the nonnegativity of divergence (Theorem 2.1).

2. Disintegrate:

       P_{XYZ}/(P_{XY} P_Z) = (P_{XZ}/(P_X P_Z)) · (P_{YZ|X}/(P_{Y|X} P_{Z|X})),

   and take expectations of the logarithm under P_{XYZ}.

3. Apply the Kolmogorov identity to I(Y,Z;X):

       I(Y,Z;X) = I(X;Y) + I(X;Z|Y) = I(X;Y)          (since I(X;Z|Y) = 0 by the Markov property)
                = I(X;Z) + I(X;Y|Z).

4. Recursive application of the Kolmogorov identity.

Example: For a 1-to-1 function f, I(X;Y) = I(X;f(Y)).

Note: In general, I(X;Y|Z) ≷ I(X;Y). Examples:

a) ">": Conditioning does not always decrease M.I. To find counterexamples when X, Y, Z do not form a Markov chain, notice that there is only one directed acyclic graph non-isomorphic to X → Y → Z, namely X → Y ← Z. Then a counterexample is X, Z i.i.d. Bern(1/2) and Y = X ⊕ Z:

       I(X;Y|Z) = I(X; X ⊕ Z | Z) = H(X) = log 2,
       I(X;Y) = 0  since X ⊥ Y.

b) "<": take Z = Y. Then I(X;Y|Z) = I(X;Y|Y) = 0 ≤ I(X;Y).

Note: (Chain rule for I ⟹ chain rule for H) Set Y = X^n. Then

    H(X^n) = I(X^n;X^n) = Σ_{k=1}^n I(X_k;X^n|X^{k−1}) = Σ_{k=1}^n H(X_k|X^{k−1}),

since H(X_k | X^n, X^{k−1}) = 0.

Remark 2.3 (Data processing for mutual information via data processing of divergence). We proved data processing for mutual information in Theorem 2.5 using Kolmogorov's identity. In fact, data processing for mutual information is implied by data processing for divergence:

    I(X;Z) = D(P_{Z|X} ‖ P_Z | P_X) ≤ D(P_{Y|X} ‖ P_Y | P_X) = I(X;Y),

where we note that for each x we have P_{Y|X=x} → P_{Z|Y} → P_{Z|X=x} and P_Y → P_{Z|Y} → P_Z. Therefore, if we have a bivariate functional of distributions D(P‖Q) which satisfies data processing, then we can define an M.I.-like quantity via

    I_D(X;Y) ≜ D(P_{Y|X} ‖ P_Y | P_X) ≜ E_{x∼P_X} D(P_{Y|X=x} ‖ P_Y),

which will satisfy data processing on Markov chains. A rich class of examples arises by taking D = D_f (an f-divergence, defined in (1.15)). That f-divergences satisfy data-processing is going to be shown in Remark 4.2.

2.5 Strong data-processing inequalities

For many random transformations P_{Y|X}, it is possible to improve the data-processing inequality (2.3): for any P_X, Q_X we have

    D(P_Y ‖ Q_Y) ≤ η_{KL} D(P_X ‖ Q_X),

where η_{KL} < 1 and depends on the channel P_{Y|X} only. Similarly, this gives an improvement in the data-processing inequality for mutual information: for any P_{U,X} we have

    U → X → Y  ⟹  I(U;Y) ≤ η_{KL} I(U;X).

For example, for P_{Y|X} = BSC(δ) we have η_{KL} = (1 − 2δ)². Strong data-processing inequalities quantify the intuitive observation that noise inside the channel P_{Y|X} must reduce the information that Y carries about the data U, regardless of how smart the hookup U → X is. This is an active area of research; see [PW15] for a short summary.
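As a numerical illustration (a sketch of mine; the value of δ, the seed, and the helper kl_bern are arbitrary), the contraction coefficient (1 − 2δ)² stated above for the BSC can be checked on random pairs of binary input distributions:

```python
import numpy as np

def kl_bern(p, q):
    """D(Bern(p) || Bern(q)) in nats, for p, q strictly inside (0, 1)."""
    return float(p*np.log(p/q) + (1-p)*np.log((1-p)/(1-q)))

delta = 0.11
eta = (1 - 2*delta)**2                          # eta_KL for BSC(delta), as stated above
bsc = lambda p: p*(1 - delta) + (1 - p)*delta   # push Bern(p) through BSC(delta)

rng = np.random.default_rng(4)
for _ in range(1000):
    p, q = rng.uniform(0.01, 0.99, size=2)
    assert kl_bern(bsc(p), bsc(q)) <= eta * kl_bern(p, q) + 1e-12
```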

2.6* How to avoid measurability problems?

As we mentioned in Remark 2.1, the conditions imposed by Definition 2.1 on P_{Y|X} are insufficient. Namely, we get the following two issues:

1. Radon-Nikodym derivatives such as (dP_{Y|X=x}/dQ_Y)(y) may not be jointly measurable in (x, y).
2. The set {x : P_{Y|X=x} ≪ Q_Y} may not be measurable.

The easiest way to avoid all such problems is the following:

Agreement A1: All conditional kernels P_{Y|X} : X → Y in these notes will be assumed to be defined by choosing a σ-finite measure μ_2 on Y and a measurable function ρ(y|x) ≥ 0 on X × Y such that

    P_{Y|X}(A|x) = ∫_A ρ(y|x) μ_2(dy)

for all x and all measurable sets A, and ∫_Y ρ(y|x) μ_2(dy) = 1 for all x.

Notes:

1. Given another kernel Q_{Y|X} specified via ρ′(y|x) and μ′_2, we may first replace μ_2 and μ′_2 by μ″_2 ≜ μ_2 + μ′_2 and thus assume that both P_{Y|X} and Q_{Y|X} are specified in terms of the same dominating measure μ_2. (This modifies ρ(y|x) to ρ(y|x) (dμ_2/dμ″_2)(y).)

2. Given two kernels P_{Y|X} and Q_{Y|X} specified in terms of the same dominating measure μ_2 and functions ρ_P(y|x) and ρ_Q(y|x), respectively, we may set

       dP_{Y|X}/dQ_{Y|X} ≜ ρ_P(y|x)/ρ_Q(y|x)

   outside of {ρ_Q = 0}. When P_{Y|X=x} ≪ Q_{Y|X=x}, the above gives a version of the Radon-Nikodym derivative, which is automatically measurable in (x, y).

3. Given Q_Y specified as dQ_Y = q(y) dμ_2, we may set

       A_0 ≜ {x : ∫_{{q=0}} ρ(y|x) μ_2(dy) = 0}.

   This set plays the role of {x : P_{Y|X=x} ≪ Q_Y}. Unlike the latter, A_0 is guaranteed to be measurable, by Fubini [Ç11, Prop. 6.9]. By "plays the role" we mean that it allows us to prove statements like: for any P_X, we have P_{Y|X} ∘ P_X ≪ Q_Y if and only if P_X[A_0] = 1.

So, while our agreement resolves the two measurability problems above, it introduces a new one. Indeed, given a joint distribution P_{XY} on standard Borel spaces, it is always true that one can extract a conditional distribution P_{Y|X} satisfying Definition 2.1 (this is called disintegration). However, it is not guaranteed that P_{Y|X} will satisfy Agreement A1.
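Notes 2 and 3 become very concrete when μ_2 is the counting measure on a finite alphabet. The following sketch is entirely my own construction (the alphabet sizes, the choice of q, and variable names are arbitrary), intended only to mimic the two recipes.

```python
import numpy as np

# Agreement A1 with mu_2 = counting measure on Y = {0,...,4}: kernels are density
# matrices rho[x, y] whose rows integrate (sum) to 1.
rng = np.random.default_rng(5)
nx, ny = 4, 5
rho_P = rng.dirichlet(np.ones(ny), size=nx)        # rho_P[x] is the density of P_{Y|X=x}
rho_P[0, 0] = 0.0                                  # make P_{Y|X=0} avoid the symbol y=0
rho_P /= rho_P.sum(axis=1, keepdims=True)

# Note 3: Q_Y with density q that vanishes on {y=0}; A_0 plays the role of
# {x : P_{Y|X=x} << Q_Y} and is measurable by construction.
q = np.array([0.0, 0.25, 0.25, 0.25, 0.25])
A_0 = [x for x in range(nx) if rho_P[x, q == 0].sum() == 0]
print("A_0 =", A_0)                                # here only x = 0 qualifies

# Note 2: with a common dominating measure, rho_P / rho_Q (defined off {rho_Q = 0})
# is a jointly measurable version of dP_{Y|X}/dQ_{Y|X}.
rho_Q = np.tile(q, (nx, 1))                        # the kernel Q_{Y|X=x} = Q_Y for all x
dP_dQ = np.divide(rho_P, rho_Q, out=np.zeros_like(rho_P), where=rho_Q > 0)
# For x in A_0 the ratio integrates to 1 against Q_Y, as a Radon-Nikodym derivative should.
assert all(np.isclose((dP_dQ[x] * q).sum(), 1.0) for x in A_0)
```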

To work around this issue as well, we add another agreement:

Agreement A2: All joint distributions P_{XY} are specified by means of the following data: σ-finite measures μ_1 and μ_2 on X and Y, respectively, and a measurable function λ(x, y) such that

    P_{XY}(E) ≜ ∫_E λ(x, y) μ_1(dx) μ_2(dy).

Notes:

1. Again, given a finite or countable collection of joint distributions P_{XY}, Q_{XY}, ... satisfying A2, we may without loss of generality assume they are all defined in terms of a common μ_1, μ_2.

2. Given P_{XY} satisfying A2, we can disintegrate it into a conditional kernel (satisfying A1) and a marginal:

       P_{Y|X}(A|x) = ∫_A ρ(y|x) μ_2(dy),    ρ(y|x) ≜ λ(x, y)/p(x),                          (2.9)
       P_X(A) = ∫_A p(x) μ_1(dx),            p(x) ≜ ∫_Y λ(x, η) μ_2(dη),                     (2.10)

   with ρ(y|x) defined arbitrarily for those x for which p(x) = 0.

Remark 2.4. The first problem can also be resolved with the help of Doob's version of the Radon-Nikodym theorem [Ç11, Chapter V.4, Theorem 4.44]: If the σ-algebra on Y is separable (satisfied whenever Y is a Polish space, for example) and P_{Y|X=x} ≪ Q_{Y|X=x}, then there exists a jointly measurable version of the Radon-Nikodym derivative (x, y) ↦ (dP_{Y|X=x}/dQ_{Y|X=x})(y).
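The disintegration (2.9)–(2.10) can be mirrored on finite alphabets with counting measures; here is a small sketch of mine (the joint density and the zeroed-out row are arbitrary choices made to exercise the p(x) = 0 edge case).

```python
import numpy as np

rng = np.random.default_rng(6)
lam = rng.random((3, 4))
lam[2, :] = 0.0                     # give one x zero marginal mass
lam /= lam.sum()                    # lambda is now a joint pmf matrix
p = lam.sum(axis=1)                 # (2.10): p(x) = integral of lambda(x, .) over Y
rho = np.empty_like(lam)            # (2.9): rho(y|x) = lambda(x, y)/p(x) where p(x) > 0
rho[p > 0] = lam[p > 0] / p[p > 0, None]
rho[p == 0] = 1.0 / lam.shape[1]    # arbitrary (here uniform) where p(x) = 0, as allowed
assert np.allclose(rho.sum(axis=1), 1.0)      # each rho(.|x) is a probability measure
assert np.allclose(p[:, None] * rho, lam)     # multiplying back recovers the joint
```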

MIT OpenCourseWare
Information Theory, Spring 2016

For information about citing these materials or our Terms of Use, visit:
