ECE 4400:693 - Information Theory

1 ECE 4400:693 - Information Theory
Dr. Nghi Tran
Lecture 8: Differential Entropy
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43

2 Outline
1. Review: Entropy of discrete RVs
2. Differential Entropy: Motivation
3. Continuous RVs: A Review
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 2 / 43

5 Review: Entropy of discrete RVs - Self-Information of a RV
What is entropy? Are there intuitive notions behind it? We first define the so-called self-information.
Consider a discrete RV X with PMF P(X = x). For convenience, hereafter we denote the PMF by p(x); p(x) and p(y) refer to two different RVs with different PMFs. Note that x is the outcome of an experiment, not necessarily a number.
For a RV X, the self-information of the event X = x is defined as
I(x) = \log \frac{1}{p(x)} = -\log p(x)
If the base of the logarithm is e, it is measured in nats. Unless otherwise stated, we take the logarithm to base 2 and the measurement is in bits.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 3 / 43

6 Review: Entropy of discrete RVs - Self-Information of a RV (Continued)
I(x) = \log \frac{1}{p(x)} = -\log p(x)
Let us see a very simple example: suppose a discrete information source emits binary bits 0 and 1, each with probability 1/2. The self-information of each bit is 1 bit. If the source emits a block of k bits over k time intervals, the self-information is k bits. So we already have an appropriate measure of information.
Observe that a high-probability event conveys less information than a low-probability one.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 4 / 43
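
To make the definition concrete, here is a small Python sketch (my addition, not part of the slides) that evaluates I(x) = -log p(x) in a chosen base; the fair-bit case recovers the 1-bit value above.

```python
import math

def self_information(p, base=2.0):
    """Self-information I(x) = -log_base p(x) of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p) / math.log(base)

# A fair bit (p = 1/2) carries 1 bit of self-information;
# a rarer outcome (p = 1/8) carries 3 bits.
print(self_information(0.5))               # 1.0 bit
print(self_information(0.125))             # 3.0 bits
print(self_information(0.5, base=math.e))  # ~0.693 nats
```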

7 Review: Entropy of discrete RVs - Entropy of a RV
So what is the entropy of a RV? Simply speaking, it is the average self-information, or equivalently a measure of the uncertainty of a RV.
Definition: The entropy H(X) of a discrete RV X with PMF p(x) is given by
H(X) = -\sum_x p(x) \log p(x)
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 5 / 43

8 Review: Entropy of discrete RVs - Entropy of a RV (Continued)
H(X) = -\sum_x p(x) \log p(x)
Note that H(X) is a functional of the distribution of X: it does not depend on the actual values taken by X, only on the probabilities.
The entropy H(X) can therefore be interpreted as the expected value of the RV \log \frac{1}{p(X)}: an average self-information.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 6 / 43
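
As a quick illustration (my own sketch, not from the slides), the entropy of a finite PMF can be computed directly from the definition:

```python
import math

def entropy(pmf, base=2.0):
    """H(X) = -sum_x p(x) log p(x); terms with p(x) = 0 contribute 0."""
    return -sum(p * math.log(p) / math.log(base) for p in pmf if p > 0)

# A fair coin has 1 bit of entropy; a biased coin has less.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # ~0.469
print(entropy([1.0, 0.0]))   # 0.0 (no uncertainty)
```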

9 Differential Entropy: Motivation
What we considered earlier applies to discrete RVs. For continuous RVs, we also need to define entropy and mutual information. In fact, most of the time we need to work with continuous RVs, e.g., the Gaussian channel.
Given H(X) = -\sum_x p(x) \log p(x) for discrete X, can you guess what it would be for a continuous one?
H(X) = -\int_S f_X(x) \log f_X(x)\, dx
where f_X(x), or for simplicity f(x), is the probability density function of RV X and S is the support set of X.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 7 / 43

12 Continuous RVs: A Review - CDF and PDF
The cumulative distribution function (CDF) gives a complete description of the random variable:
F_X(x) = P(X \le x)
The probability density function (PDF) is defined as the derivative of the CDF:
f_X(x) = \frac{dF_X(x)}{dx}
Then one has the following relationship:
P(x_1 \le X \le x_2) = P(X \le x_2) - P(X \le x_1) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x)\, dx
Some properties: f_X(x) \ge 0 (the set of x where f_X(x) > 0 is referred to as the support set), and \int_{-\infty}^{+\infty} f_X(x)\, dx = 1.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 8 / 43
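
A small numerical illustration of the CDF/PDF relationship, assuming NumPy/SciPy are available (my addition, not from the slides):

```python
from scipy.integrate import quad
from scipy.stats import norm

# P(x1 <= X <= x2) computed two ways for a standard Gaussian X:
# via the CDF difference F(x2) - F(x1), and by integrating the PDF.
x1, x2 = -1.0, 2.0
p_cdf = norm.cdf(x2) - norm.cdf(x1)
p_pdf, _ = quad(norm.pdf, x1, x2)
print(p_cdf, p_pdf)   # both ~0.8186
```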

13 Continuous RVs: A Review - Joint PDF
For two RVs X and Y defined on a sample space Ω, one has the joint CDF:
F_{X,Y}(x, y) = P(X \le x, Y \le y)
The joint PDF is:
f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}
The marginal PDFs can be obtained from the joint PDF as:
f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dy;   f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dx
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 9 / 43

14 Continuous RVs: A Review - The Conditional PDF
The conditional PDF of the RV Y, given that the RV X takes the value x, is defined as
f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} if f_X(x) \ne 0, and 0 otherwise.
Two RVs X and Y are statistically independent if and only if
f_{Y|X}(y|x) = f_Y(y), or equivalently, f_{X,Y}(x, y) = f_X(x) f_Y(y)
This means that knowledge of X does not affect the statistics of Y, and vice versa. As we will see later, if X and Y are independent, then X provides no information about Y and vice versa.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 10 / 43

16 Quantized Random Variables
Before proceeding further with differential entropy, let us first consider quantizing a RV with a continuous PDF. We divide the range of x into bins of width Δ (quantization).
For the i-th bin, by the mean-value theorem there exists x_i such that
f(x_i) \Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\, dx
We then define the following (discrete) RV:
X^\Delta = x_i with probability p_i = f(x_i) \Delta
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 11 / 43

19 Quantized Random Variable
[Figure: a continuous PDF f(x) and its quantized approximation with bin width Δ]
X^\Delta = x_i with probability p_i = f(x_i) \Delta
It is a scaled, quantized version of f(x), with unevenly spaced x_i.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 12 / 43

20 Entropy of Quantized Random Variable
X^\Delta = x_i with probability p_i = f(x_i) \Delta
H(X^\Delta) = -\sum_i f(x_i)\Delta \log(f(x_i)\Delta) = -\log\Delta - \sum_i f(x_i) \log(f(x_i))\, \Delta
Now, as \Delta \to 0, we have:
H(X^\Delta) \approx -\log\Delta - \int_{-\infty}^{+\infty} f(x) \log f(x)\, dx
The quantity
h(X) = -\int_{-\infty}^{+\infty} f(x) \log f(x)\, dx
is defined as the differential entropy of a continuous RV X.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 13 / 43
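
A numerical sanity check of H(X^Δ) ≈ h(X) - log Δ, using a standard Gaussian and bin width Δ = 0.01 (my own sketch; NumPy assumed):

```python
import numpy as np

# h(X) = 0.5*log2(2*pi*e*sigma^2) bits for a Gaussian; here sigma = 1.
sigma = 1.0
h_true = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

delta = 0.01
edges = np.arange(-10, 10, delta)
x_mid = edges + delta / 2
pdf = np.exp(-x_mid**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
p = pdf * delta                      # bin probabilities f(x_i) * Delta
H_quant = -np.sum(p * np.log2(p))    # entropy of the quantized RV, in bits

print(H_quant)                  # ~8.69 = h(X) - log2(delta)
print(h_true - np.log2(delta))  # ~2.047 + 6.644
```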

23 Differential Entropy: Definition
Definition: The differential entropy h(X) of a continuous random variable X with density f(x) is defined as
h(X) = -\int_S f(x) \log f(x)\, dx = -E\{\log f(X)\}
where S is the support set of the random variable.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 14 / 43

24 Differential Entropy
With \Delta \to 0, H(X^\Delta) \approx -\log\Delta + h(X), where
h(X) = -\int_{-\infty}^{+\infty} f(x) \log f(x)\, dx = -E\{\log f(X)\}
h(X) does not give the absolute amount of information in X, and it is not necessarily positive. However, one can still compare the uncertainty of two continuous RVs (quantized to the same precision). Relative entropy and mutual information still work well.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 15 / 43

26 Example: Uniform Distribution
Uniform distribution X ~ U(a, b): f(x) = \frac{1}{b-a} for x \in (a, b), and 0 elsewhere.
h(X) = -\int_a^b \frac{1}{b-a} \log \frac{1}{b-a}\, dx = \log(b-a)
We can observe that h(X) < 0 when (b-a) < 1.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 16 / 43
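
A quick check of h(X) = log(b - a), including a case where it is negative (my sketch; SciPy's quad assumed):

```python
import numpy as np
from scipy.integrate import quad

# h(X) = -integral of f log2 f over (a, b) for X ~ U(a, b) equals log2(b - a);
# it is negative whenever b - a < 1.
def h_uniform(a, b):
    f = 1.0 / (b - a)
    val, _ = quad(lambda x: -f * np.log2(f), a, b)
    return val

print(h_uniform(0, 2), np.log2(2))        # 1.0, 1.0
print(h_uniform(0, 0.25), np.log2(0.25))  # -2.0, -2.0 (negative)
```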

29 Example: Gaussian Distribution
Gaussian distribution X ~ N(\mu, \sigma^2):
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 17 / 43
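
The differential entropy of this Gaussian is \frac{1}{2}\log 2\pi e\sigma^2 (see the summary slide). A Monte Carlo estimate of -E[log f(X)] agrees with the closed form (my sketch, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)
# log f(X) for the Gaussian density, then h(X) = -E[log f(X)] converted to bits
log_f = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
h_mc = -np.mean(log_f) / np.log(2)
h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(h_mc, h_closed)   # both ~3.047 bits
```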

30 Joint and Conditional Entropy
Definition (Joint Entropy): The differential entropy of a set X_1, ..., X_n of random variables with joint pdf f(x_1, ..., x_n) is defined as
h(X_1, ..., X_n) = -\int f(x^n) \log f(x^n)\, dx^n
Definition (Conditional Entropy): If X and Y have a joint density function f(x, y), we can define the conditional entropy h(X|Y) as
h(X|Y) = -\int f(x, y) \log f(x|y)\, dx\, dy
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 18 / 43

32 Multivariate Gaussian
Theorem (Entropy of a multivariate Gaussian distribution): Let X_1, ..., X_n have a multivariate Gaussian distribution with mean \mu and covariance matrix Q, denoted N_n(\mu, Q). Then
h(X_1, ..., X_n) = \frac{1}{2} \log\left((2\pi e)^n |Q|\right)
where |Q| is the determinant of Q.
Proof.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 19 / 43
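
A direct evaluation of the theorem's formula with NumPy (my sketch; the matrix Q below is just an illustrative choice):

```python
import numpy as np

def gaussian_entropy_bits(Q):
    """h(N_n(mu, Q)) = 0.5 * log2((2*pi*e)^n * det(Q)) bits (independent of mu)."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    sign, logdet = np.linalg.slogdet(Q)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (n * np.log2(2 * np.pi * np.e) + logdet / np.log(2))

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(gaussian_entropy_bits(Q))                 # joint entropy in bits
print(gaussian_entropy_bits(np.eye(1) * 4.0))   # matches 0.5*log2(2*pi*e*4)
```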

33 Relative Entropy and Mutual Information
Definition (Relative Entropy): The relative entropy (or Kullback-Leibler distance) D(f||g) between two densities f and g is defined by
D(f||g) = \int f \log \frac{f}{g}
Note that D(f||g) is finite only if the support set of f is contained in the support set of g.
Definition (Mutual Information): The mutual information I(X; Y) between two random variables with joint density f(x, y) is defined as
I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)}\, dx\, dy
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 20 / 43
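
A small check of D(f||g) by numerical integration for two Gaussian densities, compared with the standard closed-form Gaussian-vs-Gaussian KL divergence (my sketch; SciPy assumed):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# D(f || g) = integral of f log(f/g), in nats, computed numerically and compared
# with the closed form log(s2/s1) + (s1^2 + (m1-m2)^2) / (2*s2^2) - 1/2.
m1, s1 = 0.0, 1.0
m2, s2 = 1.0, 2.0
f = lambda x: norm.pdf(x, m1, s1)
g = lambda x: norm.pdf(x, m2, s2)
d_num, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), -20, 20)
d_closed = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
print(d_num, d_closed)   # both ~0.443 nats
```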

34 Information Inequality
Theorem: D(f||g) \ge 0, with equality iff f = g almost everywhere (a.e.).
Proof.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 21 / 43

35 Properties of Mutual Information
I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)}\, dx\, dy
From the definition, it is clear that:
I(X; Y) = h(X) - h(X|Y) = h(Y) - h(Y|X) = h(X) + h(Y) - h(X, Y)
Also, I(X; Y) = D(f(x, y) || f(x) f(y)).
The properties of D(f||g) and I(X; Y) are the same as in the discrete case. Why?
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 22 / 43

38 Mutual Information with Finite Partitions
Definition (Partition): Let \mathcal{X} be the range of X. A partition P of \mathcal{X} is a finite collection of disjoint sets P_i such that \cup_i P_i = \mathcal{X}.
Definition (Quantization): The quantization of X by P, denoted [X]_P, is a discrete RV defined by
Pr([X]_P = i) = Pr(X \in P_i) = \int_{P_i} f(x)\, dx
We can now give a general definition of mutual information between two arbitrary RVs X and Y with partitions P and Q, using the mutual information between the quantized versions of X and Y, [X]_P and [Y]_Q.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 23 / 43

41 Mutual Information with Finite Partitions
Definition (Mutual Information): The mutual information between two RVs X and Y is given by
I(X; Y) = \sup_{P, Q} I([X]_P; [Y]_Q)
where the supremum is over all finite partitions P and Q.
In fact, the above definition applies whether the RVs have a pdf or a pmf: it is more general.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 24 / 43

43 Example: Mutual Information between Correlated Gaussian RVs
Let (X, Y) ~ N(0, K), where
K = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}
That is, the correlation coefficient is \rho. What is I(X; Y)?
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 25 / 43
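
The slide leaves the answer for the lecture; combining the identity I(X;Y) = h(X) + h(Y) - h(X,Y) with the Gaussian entropy theorem gives I(X;Y) = -\frac{1}{2}\log(1-\rho^2). A short sketch (my addition) evaluating both forms:

```python
import numpy as np

def gaussian_mi_bits(sigma2, rho):
    """I(X;Y) for jointly Gaussian (X, Y) with equal variances and correlation rho,
    via I = h(X) + h(Y) - h(X, Y) and the Gaussian entropy formula."""
    K = np.array([[sigma2, rho * sigma2],
                  [rho * sigma2, sigma2]])
    h_x = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
    h_xy = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(K))
    return 2 * h_x - h_xy

rho = 0.9
print(gaussian_mi_bits(1.0, rho))   # ~1.198 bits
print(-0.5 * np.log2(1 - rho**2))   # same: -0.5*log2(1 - rho^2)
```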

44 More Properties of Differential Entropy
Corollary: h(X|Y) \le h(X), with equality iff X and Y are independent.
Theorem (Chain rule for differential entropy):
h(X_1, ..., X_n) = \sum_{i=1}^{n} h(X_i | X_1, ..., X_{i-1})
Corollary: h(X_1, ..., X_n) \le \sum_i h(X_i), with equality iff X_1, ..., X_n are independent.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 26 / 43

45 Changing Variable
Now let Y = g(X). We know that
f_Y(y) = \frac{dF_Y(y)}{dy} = f_X(g^{-1}(y)) \left| \frac{dg^{-1}(y)}{dy} \right| = f_X(g^{-1}(y)) \left| \frac{dx}{dy} \right|
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 27 / 43

46 Changing Variable - Example
Theorem: Translation does not change differential entropy: h(X + c) = h(X).
Theorem: h(aX) = h(X) + \log |a|
Corollary: For a vector-valued RV, h(AX) = h(X) + \log |\det(A)|
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 28 / 43
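
A quick consistency check of h(aX) = h(X) + log|a| using the closed-form Gaussian entropy (my sketch):

```python
import numpy as np

# If X ~ N(0, sigma^2), then aX ~ N(0, a^2 * sigma^2), so both sides can be
# evaluated with h(N(0, v)) = 0.5 * log2(2*pi*e*v).
def h_gauss_bits(var):
    return 0.5 * np.log2(2 * np.pi * np.e * var)

sigma2, a = 1.0, 3.0
lhs = h_gauss_bits(a**2 * sigma2)             # h(aX)
rhs = h_gauss_bits(sigma2) + np.log2(abs(a))  # h(X) + log2|a|
print(lhs, rhs)   # equal
```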

47 Concavity and Convexity
Same properties as with discrete RVs:
Differential Entropy: h(X) is a concave function of f_X(x).
Mutual Information: I(X; Y) is a concave function of f_X(x) for fixed f_{Y|X}(y|x); I(X; Y) is a convex function of f_{Y|X}(y|x) for fixed f_X(x).
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 29 / 43

48 Maximum Entropy Distribution
Going back to discrete RVs: for a discrete random variable taking on K values, what distribution maximizes the entropy?
For continuous RVs, we are interested in maximizing the entropy h(f) over all f that satisfy:
1. f(x) \ge 0, with equality outside the support set S
2. \int_S f(x)\, dx = 1
3. Moment constraints: \int_S f(x) r_i(x)\, dx = \alpha_i for 1 \le i \le m.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 30 / 43

50 Maximum Entropy Distribution
Maximizing the entropy h(f) over all f that satisfy:
1. f(x) \ge 0, with equality outside the support set S
2. \int_S f(x)\, dx = 1
3. Moment constraints: \int_S f(x) r_i(x)\, dx = \alpha_i for 1 \le i \le m.
Theorem (Maximum entropy distribution): Let
f(x) = f_\lambda(x) = \exp\left(\lambda_0 + \sum_{i=1}^{m} \lambda_i r_i(x)\right), x \in S,
where \lambda_0, ..., \lambda_m are chosen so that f_\lambda satisfies the above constraints. Then f_\lambda uniquely maximizes h(f) over all probability densities f satisfying the above constraints.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 31 / 43

51 Example 1
First, let S = [a, b]. What is f?
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 32 / 43

52 Example 2
Now consider the constraints E(X) = 0 and E(X^2) = \sigma^2. What is f?
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 33 / 43
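
The maximizer here turns out to be the Gaussian N(0, \sigma^2) (the n = 1 case of the next slide). As an illustration (my sketch; SciPy assumed), a Laplace density with the same variance has strictly smaller differential entropy:

```python
import numpy as np
from scipy.integrate import quad

# Compare h(f) = -integral of f log2 f for a Gaussian and a Laplace density,
# both zero-mean with variance sigma2 = 1.
sigma2 = 1.0

def h_bits(pdf, lo=-12, hi=12):
    val, _ = quad(lambda x: -pdf(x) * np.log2(pdf(x)), lo, hi)
    return val

gauss = lambda x: np.exp(-x**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
b = np.sqrt(sigma2 / 2)        # Laplace scale: variance 2*b^2 = sigma2
laplace = lambda x: np.exp(-abs(x) / b) / (2 * b)

print(h_bits(gauss))    # ~2.047 bits
print(h_bits(laplace))  # ~1.943 bits, strictly smaller
```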

53 Example 3
What zero-mean distribution maximizes the entropy on (-\infty, \infty)^n for a given covariance matrix K? Answer: the multivariate Gaussian
\phi(x) = \frac{1}{(\sqrt{2\pi})^n |K|^{1/2}} \exp\left(-\frac{1}{2} x^\top K^{-1} x\right)
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 34 / 43

55 Estimation Error and Differential Entropy
Recall, for discrete RVs, the problem: assume we know RV Y and wish to guess the value of a correlated RV X. Fano's inequality relates the probability of error in guessing X to its conditional entropy H(X|Y). As we shall see later, this problem is crucial in proving the converse to Shannon's channel capacity theorem.
In one of the assignments, we will see that H(X|Y) = 0 if and only if X is a function of Y. It means that when H(X|Y) = 0, we can estimate X from Y with zero probability of error. Fano's inequality quantifies the following idea: X can be estimated with a small probability of error only if H(X|Y) is small.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 35 / 43

59 Fano's Inequality
Assume we wish to estimate X and observe Y, related to X by p(y|x). From Y, we calculate an estimate \hat{X} = g(Y). We thus have X \to Y \to \hat{X} and wish to bound the error probability P_e = Pr(X \ne \hat{X}). Also, let E be the binary error RV (E = 1 if \hat{X} \ne X, E = 0 otherwise), so that H(E) = H(P_e). Then:
Theorem (Fano's Inequality): For any estimate \hat{X} such that X \to Y \to \hat{X}, with P_e = Pr(X \ne \hat{X}), we have
H(P_e) + P_e \log|\mathcal{X}| \ge H(X|\hat{X}) \ge H(X|Y)
This implies that
P_e \ge \frac{H(X|Y) - 1}{\log|\mathcal{X}|}
P_e cannot be too small if H(X|Y) is large, i.e., correct estimation happens only when the residual randomness of X after observing Y is small.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 36 / 43
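
A tiny helper (my addition) for the weakened bound P_e >= (H(X|Y) - 1)/log|X|; the numbers below are hypothetical:

```python
import math

def fano_lower_bound(H_X_given_Y_bits, alphabet_size):
    """Weakened Fano bound P_e >= (H(X|Y) - 1) / log2|X|, clipped at 0."""
    return max(0.0, (H_X_given_Y_bits - 1.0) / math.log2(alphabet_size))

# Example: X over 16 symbols; suppose observing Y leaves H(X|Y) = 3 bits.
print(fano_lower_bound(3.0, 16))   # P_e >= 0.5
```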

62 Estimation Error and Differential Entropy
Now we have the estimation counterpart to Fano's inequality:
Theorem (Estimation error and differential entropy): For any RV X and estimator \hat{X},
E(X - \hat{X})^2 \ge \frac{1}{2\pi e} \exp(2 h(X))
with equality iff X is Gaussian and \hat{X} is the mean of X; here h(X) is in nats.
E(X - \hat{X})^2 is the expected prediction error.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 37 / 43
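
For the equality case, a Gaussian X with \hat{X} = E[X] gives E(X - \hat{X})^2 = \sigma^2, which matches (1/2\pi e)\exp(2h(X)) exactly (my sketch):

```python
import numpy as np

# X ~ N(mu, sigma^2), estimator X_hat = mu: the mean squared error is sigma^2,
# and with h(X) in nats, exp(2*h(X)) / (2*pi*e) equals sigma^2 as well.
mu, sigma2 = 0.0, 2.5
h_nats = 0.5 * np.log(2 * np.pi * np.e * sigma2)
bound = np.exp(2 * h_nats) / (2 * np.pi * np.e)
print(bound, sigma2)   # both 2.5
```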

63 AEP for Continuous RVs
Theorem (AEP - Discrete): If X_1, ..., X_n are i.i.d. with pmf p(x), then
-\frac{1}{n} \log p(X_1, ..., X_n) \to H(X) in probability.
Theorem (AEP - Continuous): If X_1, ..., X_n are i.i.d. with density f(x), then
-\frac{1}{n} \log f(X_1, ..., X_n) \to h(X) in probability.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 38 / 43
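
An empirical illustration of the continuous AEP (my sketch, NumPy assumed): the sample average of -log2 f(X_i) concentrates around h(X) as n grows.

```python
import numpy as np

# For i.i.d. X_i ~ N(0, 1), -(1/n) * sum(log2 f(X_i)) should approach
# h(X) = 0.5*log2(2*pi*e) ~ 2.047 bits as n increases.
rng = np.random.default_rng(1)
h_true = 0.5 * np.log2(2 * np.pi * np.e)
for n in (10, 1_000, 100_000):
    x = rng.standard_normal(n)
    log2_f = (-0.5 * np.log(2 * np.pi) - x**2 / 2) / np.log(2)
    print(n, -np.mean(log2_f), h_true)
```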

65 Typical Set
Definition (Typical Set - Discrete): The typical set A_\epsilon^{(n)} with respect to p(x) is the set of sequences (x_1, ..., x_n) \in \mathcal{X}^n with the property
2^{-n(H(X)+\epsilon)} \le p(x_1, ..., x_n) \le 2^{-n(H(X)-\epsilon)}
Definition (Typical Set - Continuous): For \epsilon > 0 and any n, the typical set A_\epsilon^{(n)} with respect to f(x) is defined as
A_\epsilon^{(n)} = \left\{ (x_1, ..., x_n) \in S^n : \left| -\frac{1}{n} \log f(x_1, ..., x_n) - h(X) \right| \le \epsilon \right\}
where f(x_1, ..., x_n) = \prod_{i=1}^{n} f(x_i).
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 39 / 43

67 Typical Set and Volume
Definition: The volume Vol(A) of a set A \subseteq R^n is defined as Vol(A) = \int_A dx_1 \cdots dx_n.
Theorem (Typical Set Properties):
1. Pr(A_\epsilon^{(n)}) > 1 - \epsilon for n sufficiently large.
2. Vol(A_\epsilon^{(n)}) \le 2^{n(h(X)+\epsilon)} for all n.
3. Vol(A_\epsilon^{(n)}) \ge (1-\epsilon) 2^{n(h(X)-\epsilon)} for n sufficiently large.
Theorem: The set A_\epsilon^{(n)} is the smallest-volume set with probability greater than or equal to 1-\epsilon, to first order in the exponent.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 40 / 43

69 Differential Entropy: Summary
h(X) = h(f) = -\int_S f(x) \log f(x)\, dx
f(X^n) \doteq 2^{-nh(X)}
Vol(A_\epsilon^{(n)}) \doteq 2^{nh(X)}
H([X]_{2^{-n}}) \approx h(X) + n
h(N(0, \sigma^2)) = \frac{1}{2} \log 2\pi e \sigma^2
h(N_n(\mu, K)) = \frac{1}{2} \log\left((2\pi e)^n |K|\right)
D(f||g) = \int f \log \frac{f}{g} \ge 0
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 41 / 43

70 Differential Entropy: Summary
h(X_1, X_2, ..., X_n) = \sum_{i=1}^{n} h(X_i | X_1, X_2, ..., X_{i-1})   (8.88)
h(X|Y) \le h(X)   (8.89)
h(aX) = h(X) + \log |a|   (8.90)
I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)} \ge 0   (8.91)
\max_{E[XX^t] = K} h(X) = \frac{1}{2} \log\left((2\pi e)^n |K|\right)   (8.92)
E(X - \hat{X}(Y))^2 \ge \frac{1}{2\pi e} e^{2 h(X|Y)}
2^{nH(X)} is the effective alphabet size for a discrete random variable.
2^{nh(X)} is the effective support set size for a continuous random variable.
2^C is the effective alphabet size of a channel of capacity C.
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 42 / 43

71 Thank you!
Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 43 / 43


More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

ECE 587 / STA 563: Lecture 2 Measures of Information Information Theory Duke University, Fall 2017

ECE 587 / STA 563: Lecture 2 Measures of Information Information Theory Duke University, Fall 2017 ECE 587 / STA 563: Lecture 2 Measures of Information Information Theory Duke University, Fall 207 Author: Galen Reeves Last Modified: August 3, 207 Outline of lecture: 2. Quantifying Information..................................

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 AEP Asymptotic Equipartition Property AEP In information theory, the analog of

More information

National Sun Yat-Sen University CSE Course: Information Theory. Maximum Entropy and Spectral Estimation

National Sun Yat-Sen University CSE Course: Information Theory. Maximum Entropy and Spectral Estimation Maximum Entropy and Spectral Estimation 1 Introduction What is the distribution of velocities in the gas at a given temperature? It is the Maxwell-Boltzmann distribution. The maximum entropy distribution

More information

p. 6-1 Continuous Random Variables p. 6-2

p. 6-1 Continuous Random Variables p. 6-2 Continuous Random Variables Recall: For discrete random variables, only a finite or countably infinite number of possible values with positive probability (>). Often, there is interest in random variables

More information

3. Review of Probability and Statistics

3. Review of Probability and Statistics 3. Review of Probability and Statistics ECE 830, Spring 2014 Probabilistic models will be used throughout the course to represent noise, errors, and uncertainty in signal processing problems. This lecture

More information

Lecture 11: Quantum Information III - Source Coding

Lecture 11: Quantum Information III - Source Coding CSCI5370 Quantum Computing November 25, 203 Lecture : Quantum Information III - Source Coding Lecturer: Shengyu Zhang Scribe: Hing Yin Tsang. Holevo s bound Suppose Alice has an information source X that

More information

Machine Learning Srihari. Information Theory. Sargur N. Srihari

Machine Learning Srihari. Information Theory. Sargur N. Srihari Information Theory Sargur N. Srihari 1 Topics 1. Entropy as an Information Measure 1. Discrete variable definition Relationship to Code Length 2. Continuous Variable Differential Entropy 2. Maximum Entropy

More information

4. CONTINUOUS RANDOM VARIABLES

4. CONTINUOUS RANDOM VARIABLES IA Probability Lent Term 4 CONTINUOUS RANDOM VARIABLES 4 Introduction Up to now we have restricted consideration to sample spaces Ω which are finite, or countable; we will now relax that assumption We

More information