ECE 4400:693 - Information Theory
- Derick Harris
- 5 years ago
Transcription
1 ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43
2 Outline
1. Review: Entropy of discrete RVs
2. Differential Entropy: Motivation
3. Continuous RVs: A Review
5 Review: Entropy of discrete RVs: Self-Information of a RV
What is entropy? Are there intuitive notions about it? We first define the so-called self-information.
Consider a discrete RV X with PMF P(X = x). For convenience, we hereafter denote the PMF by p(x); p(x) and p(y) refer to two different RVs with different PMFs. Note that x is the outcome of an experiment, not necessarily a number.
For a RV X, the self-information of the event X = x is defined as
$$I(x) = \log \frac{1}{p(x)} = -\log p(x).$$
If the base of the logarithm is e, information is measured in nats. Unless otherwise stated, we take logarithms to base 2 and measure information in bits.
6 Review: Entropy of discrete RVs: Self-Information of a RV (Continued)
$$I(x) = \log \frac{1}{p(x)} = -\log p(x)$$
A very simple example: suppose a discrete information source emits the binary digits 0 and 1 with equal probability 1/2. The self-information of each digit is 1 bit. If the source emits a block of k independent digits over k time intervals, the self-information of the block is k bits. So we already have an appropriate measure of information.
Observe that a high-probability event conveys less information than a low-probability one.
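The definition of self-information can be tried out numerically. The following is a minimal sketch, not part of the lecture; the function name `self_information` and the block length 8 are illustrative choices.

```python
import math


def self_information(p, base=2.0):
    """Self-information I(x) = -log p(x) of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p) / math.log(base)


# One fair binary digit, then a block of 8 independent fair digits.
print(self_information(0.5))
print(self_information(0.5 ** 8))
# A likely event carries less information than an unlikely one.
print(self_information(0.9), self_information(0.1))
```

Passing `base=math.e` would report the same quantities in nats instead of bits.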
7 Review: Entropy of discrete RVs: Entropy of a RV
What is the entropy of a RV? Simply put, it is the average self-information. It can also be understood as a measure of the uncertainty of a RV.
Definition: The entropy H(X) of a discrete RV X with PMF p(x) is given by
$$H(X) = -\sum_{x} p(x) \log p(x).$$
8 Review: Entropy of discrete RVs: Entropy of a RV (Continued)
$$H(X) = -\sum_{x} p(x) \log p(x)$$
Note that H(X) is a functional of the distribution of X: it does not depend on the actual values taken by X, only on their probabilities.
The entropy H(X) can be interpreted as the expected value of the RV $\log \frac{1}{p(X)}$, i.e., the average self-information.
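As a small illustration of the definition, the sketch below computes H(X) directly from a list of probabilities. The function name `entropy` and the example PMFs are assumptions made for illustration; zero-probability outcomes are skipped, following the usual convention 0 log 0 = 0.

```python
import math


def entropy(pmf, base=2.0):
    """H(X) = -sum_x p(x) log p(x); zero-probability terms contribute 0."""
    if any(p < 0 for p in pmf) or not math.isclose(sum(pmf), 1.0):
        raise ValueError("pmf must be non-negative and sum to 1")
    return -sum(p * math.log(p) for p in pmf if p > 0) / math.log(base)


# Only the probabilities matter, not the labels of the outcomes.
print(entropy([0.5, 0.5]))                # fair binary source
print(entropy([0.25, 0.25, 0.25, 0.25]))  # four equally likely outcomes
print(entropy([0.9, 0.1]))                # biased source, less uncertain
```

Because the function takes only probabilities as input, it makes the "functional of the distribution" remark concrete: relabelling the outcomes cannot change the result.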
9 Differential Entropy: Motivation
What we considered earlier applies to discrete RVs. For continuous RVs, we also need to define entropy and mutual information. In fact, most of the time we need to work with continuous RVs, e.g., for the Gaussian channel.
Given $H(X) = -\sum_x p(x) \log p(x)$ for discrete X, can you guess what it would be for a continuous one?
$$H(X) = -\int_S f_X(x) \log f_X(x)\,dx$$
where $f_X(x)$, or simply f(x), is the probability density function of the RV X and S is the support set of X.
12 Continuous RVs: A Review: CDF and PDF
The cumulative distribution function (CDF) gives a complete description of the random variable:
$$F_X(x) = P(X \le x)$$
The probability density function (PDF) is defined as the derivative of the CDF:
$$f_X(x) = \frac{dF_X(x)}{dx}$$
One then has the following relationship:
$$P(x_1 < X \le x_2) = P(X \le x_2) - P(X \le x_1) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x)\,dx$$
Some properties: $f_X(x) \ge 0$ (the set of x where $f_X(x) > 0$ is referred to as the support set), and $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$.
13 Joint PDF
For two RVs X and Y defined on a sample space Ω, one has the joint CDF:
$$F_{X,Y}(x, y) = P(X \le x, Y \le y)$$
The joint PDF is:
$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\,\partial y}$$
The marginal PDFs can be obtained from the joint PDF as:
$$f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\,dy; \quad f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\,dx$$
14 The Conditional PDF
The conditional PDF of the RV Y, given that the value of the RV X is equal to x, is defined as
$$f_Y(y|x) = \begin{cases} \dfrac{f_{X,Y}(x, y)}{f_X(x)}, & f_X(x) \ne 0 \\ 0, & \text{otherwise} \end{cases}$$
Two RVs X and Y are statistically independent if and only if $f_Y(y|x) = f_Y(y)$ or, equivalently, $f_{X,Y}(x, y) = f_X(x) f_Y(y)$.
This means that knowledge of X does not affect the statistics of Y, and vice versa. As we will see later, if X and Y are independent, then X provides no information about Y and vice versa.
16 Quantized Random Variables
Before proceeding further with differential entropy, let us first consider quantized RVs obtained from a continuous PDF. We divide the range of x into bins of width Δ (quantization).
[Figure: PDF f(x) versus x, with the x-axis divided into bins of width Δ]
For the i-th bin, the mean-value theorem guarantees a point $x_i$ in the bin such that
$$f(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx.$$
We then define the following (discrete) RV:
$$X^\Delta = \{x_i\}, \quad p_{X^\Delta} = \{f(x_i)\Delta\}$$
19 Quantized Random Variable
[Figure: two panels showing f(x) versus x with bins of width Δ]
$$X^\Delta = \{x_i\}, \quad p_{X^\Delta} = \{f(x_i)\Delta\}$$
It is a scaled, quantized version of f(x), with unevenly spaced $x_i$.
20 Entropy of Quantized Random Variable
$$X^\Delta = \{x_i\}, \quad p_{X^\Delta} = \{f(x_i)\Delta\}$$
Using $\sum_i f(x_i)\Delta = 1$,
$$H(X^\Delta) = -\sum_i f(x_i)\Delta \log\left(f(x_i)\Delta\right) = -\log\Delta - \sum_i f(x_i) \log\left(f(x_i)\right)\Delta$$
Now, as Δ → 0, we have
$$H(X^\Delta) \approx -\log\Delta - \int_{-\infty}^{+\infty} f(x) \log f(x)\,dx$$
The quantity
$$h(X) = -\int_{-\infty}^{+\infty} f(x) \log f(x)\,dx$$
is defined as the differential entropy of a continuous RV X.
23 Differential Entropy: Definition
Definition: The differential entropy h(X) of a continuous random variable X with density f(x) is defined as
$$h(X) = -\int_S f(x) \log f(x)\,dx = -E\{\log f(X)\}$$
where S is the support set of the random variable.
24 Differential Entropy
As Δ → 0,
$$H(X^\Delta) \approx -\log\Delta + h(X), \quad h(X) = -\int_{-\infty}^{+\infty} f(x) \log f(x)\,dx = -E\{\log f(X)\}$$
h(X) does not give the amount of information in X.
h(X) is not necessarily positive.
However, one can still compare the uncertainty of two continuous RVs (quantized to the same precision).
Relative entropy and mutual information still work well.
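The relation between the entropy of the quantized RV and the differential entropy can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a standard Gaussian density, takes $x_i$ to be the bin midpoint rather than the exact mean-value-theorem point, truncates the range to [-12, 12], and compares against the Gaussian value $\frac{1}{2}\log_2(2\pi e)$ bits listed in the lecture summary. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, sigma=1.0):
    """Density of N(0, sigma^2)."""
    return math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def quantized_entropy_bits(pdf, delta, lo=-12.0, hi=12.0):
    """Entropy (bits) of the quantized RV with bin probabilities f(x_i) * delta.

    Assumption: x_i is the bin midpoint, which agrees with the
    mean-value-theorem point as delta -> 0.
    """
    n_bins = int(round((hi - lo) / delta))
    h = 0.0
    for i in range(n_bins):
        p = pdf(lo + (i + 0.5) * delta) * delta
        if p > 0:
            h -= p * math.log2(p)
    return h


h_true = 0.5 * math.log2(2 * math.pi * math.e)  # h of N(0, 1) in bits
for delta in (0.5, 0.1, 0.01):
    H = quantized_entropy_bits(gaussian_pdf, delta)
    # Columns: delta, H(X^delta), H(X^delta) + log2(delta), h(X)
    print(delta, H, H + math.log2(delta), h_true)
```

The discrete entropy itself grows without bound as Δ shrinks, while the third column stays close to h(X); this is why h(X) alone does not measure the information in X.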
26 Example: Uniform Distribution
Uniform distribution X ~ U(a, b): $f(x) = \frac{1}{b-a}$ for x ∈ (a, b) and 0 elsewhere.
$$h(X) = -\int_a^b \frac{1}{b-a} \log \frac{1}{b-a}\,dx = \log(b-a)$$
We can observe that h(X) < 0 when (b − a) < 1.
29 Example: Gaussian Distribution
Gaussian distribution X ~ N(μ, σ²):
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
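The slide gives only the density; the entropy itself follows from the definition of h(X). A sketch of the standard calculation, working in nats (natural logarithm) for convenience:

```latex
\begin{align*}
h(X) &= -\int f(x) \ln f(x)\,dx \\
     &= -\int f(x) \left[ -\tfrac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2} \right] dx \\
     &= \tfrac{1}{2}\ln(2\pi\sigma^2) + \frac{E\{(X-\mu)^2\}}{2\sigma^2} \\
     &= \tfrac{1}{2}\ln(2\pi\sigma^2) + \tfrac{1}{2}
      = \tfrac{1}{2}\ln(2\pi e \sigma^2) \ \text{nats}.
\end{align*}
```

In bits this is $\frac{1}{2}\log_2(2\pi e\sigma^2)$, which matches the value listed in the lecture summary. The result does not depend on μ.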
30 Joint and Conditional Entropy
Definition (Joint Entropy): The differential entropy of a set $X_1, \ldots, X_n$ of random variables with joint pdf $f(x_1, \ldots, x_n)$ is defined as
$$h(X_1, \ldots, X_n) = -\int f(x^n) \log f(x^n)\,dx^n,$$
where $x^n = (x_1, \ldots, x_n)$.
Definition (Conditional Entropy): If X and Y have a joint density function f(x, y), we can define the conditional entropy h(X|Y) as
$$h(X|Y) = -\int f(x, y) \log f(x|y)\,dx\,dy.$$
32 Multivariate Gaussian
Theorem (Entropy of a multivariate Gaussian distribution): Let $X_1, \ldots, X_n$ have a multivariate Gaussian distribution with mean µ and covariance matrix Q, denoted $N_n(\mu, Q)$. Then
$$h(X_1, \ldots, X_n) = \frac{1}{2} \log\left((2\pi e)^n |Q|\right)$$
where |Q| is the determinant of Q.
Proof.
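The transcription does not include the proof. One standard argument, sketched here in nats and assuming Q is invertible, evaluates $-E\{\ln f(\mathbf{X})\}$ directly and uses the trace identity for the quadratic form:

```latex
\begin{align*}
f(\mathbf{x}) &= \frac{1}{\sqrt{(2\pi)^n |Q|}}
  \exp\left( -\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^T Q^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right) \\
h(X_1,\ldots,X_n) &= -E\{\ln f(\mathbf{X})\} \\
  &= \tfrac{1}{2}\ln\left((2\pi)^n |Q|\right)
     + \tfrac{1}{2} E\left\{ (\mathbf{X}-\boldsymbol{\mu})^T Q^{-1} (\mathbf{X}-\boldsymbol{\mu}) \right\} \\
  &= \tfrac{1}{2}\ln\left((2\pi)^n |Q|\right)
     + \tfrac{1}{2} \operatorname{tr}\left( Q^{-1} E\left\{ (\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T \right\} \right) \\
  &= \tfrac{1}{2}\ln\left((2\pi)^n |Q|\right) + \tfrac{n}{2}
   = \tfrac{1}{2}\ln\left((2\pi e)^n |Q|\right) \ \text{nats}.
\end{align*}
```

The last step uses $E\{(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T\} = Q$, so the trace equals $\operatorname{tr}(I_n) = n$.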
33 Relative Entropy and Mutual Information
Definition (Relative Entropy): The relative entropy (or Kullback-Leibler distance) $D(f\|g)$ between two densities f and g is defined by
$$D(f\|g) = \int f \log \frac{f}{g}.$$
Note that $D(f\|g)$ is finite only if the support set of f is contained in the support set of g.
Definition (Mutual Information): The mutual information I(X; Y) between two random variables with joint density f(x, y) is defined as
$$I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)}\,dx\,dy$$
34 Information Inequality
Theorem: $D(f\|g) \ge 0$, with equality iff f = g almost everywhere (a.e.).
Proof.
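The proof is not included in the transcription. The usual argument, sketched here with S denoting the support set of f, applies Jensen's inequality to the concave logarithm:

```latex
\begin{align*}
-D(f\|g) &= \int_S f \log \frac{g}{f} \\
         &\le \log \int_S f \, \frac{g}{f}
           && \text{(Jensen's inequality, $\log$ is concave)} \\
         &= \log \int_S g \;\le\; \log 1 = 0.
\end{align*}
```

Equality in Jensen's inequality requires g/f to be constant a.e. on S, and equality in the last step requires $\int_S g = 1$; together these force f = g almost everywhere.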
35 Properties of Mutual Information
$$I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)}\,dx\,dy$$
From the definition, it is clear that
$$I(X; Y) = h(X) - h(X|Y) = h(Y) - h(Y|X) = h(X) + h(Y) - h(X, Y)$$
Also, $I(X; Y) = D\left(f(x, y) \,\|\, f(x) f(y)\right)$.
The properties of $D(f\|g)$ and I(X; Y) are the same as in the discrete case. Why?
38 Mutual Information with Finite Partitions
Definition (Partition): Let $\mathcal{X}$ be the range of X. A partition $\mathcal{P}$ of $\mathcal{X}$ is a finite collection of disjoint sets $P_i$ such that $\bigcup_i P_i = \mathcal{X}$.
Definition (Quantization): The quantization of X by $\mathcal{P}$, denoted $[X]_{\mathcal{P}}$, is a discrete RV defined by
$$\Pr([X]_{\mathcal{P}} = i) = \Pr(X \in P_i) = \int_{P_i} dF(x).$$
We can now give a general definition of the mutual information between two arbitrary RVs X and Y, using the mutual information between their quantized versions $[X]_{\mathcal{P}}$ and $[Y]_{\mathcal{Q}}$ for partitions $\mathcal{P}$ and $\mathcal{Q}$.
41 Mutual Information with Finite Partitions
Definition (Mutual Information): The mutual information between two RVs X and Y is given by
$$I(X; Y) = \sup_{\mathcal{P}, \mathcal{Q}} I\left([X]_{\mathcal{P}}; [Y]_{\mathcal{Q}}\right)$$
where the supremum is over all finite partitions $\mathcal{P}$ and $\mathcal{Q}$.
In fact, this definition applies both to RVs with a pdf and to RVs with a pmf, so it is more general.
43 Example: Mutual Information between Correlated Gaussian RVs
Let (X, Y) ~ N(0, K) where
$$K = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}$$
That is, the correlation coefficient is ρ. What is I(X; Y)?
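One way to answer the question is to combine the identity I(X; Y) = h(X) + h(Y) − h(X, Y) with the multivariate Gaussian entropy formula. The sketch below does this numerically and compares the result with the closed form $-\frac{1}{2}\log_2(1-\rho^2)$ that the same algebra yields; the function names and the test values of ρ are illustrative assumptions.

```python
import math


def gaussian_entropy_bits(n, det_cov):
    """h = 0.5 * log2((2*pi*e)^n * |K|) for an n-dimensional Gaussian."""
    return 0.5 * (n * math.log2(2 * math.pi * math.e) + math.log2(det_cov))


def mutual_information_bits(rho, sigma2=1.0):
    """I(X;Y) = h(X) + h(Y) - h(X,Y) for K = sigma2 * [[1, rho], [rho, 1]]."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    det_k = sigma2 ** 2 * (1.0 - rho ** 2)  # |K| = sigma^4 (1 - rho^2)
    h_x = gaussian_entropy_bits(1, sigma2)
    h_y = gaussian_entropy_bits(1, sigma2)
    h_xy = gaussian_entropy_bits(2, det_k)
    return h_x + h_y - h_xy


for rho in (0.0, 0.5, 0.9):
    # Columns: rho, I(X;Y) from entropies, closed form -0.5 * log2(1 - rho^2)
    print(rho, mutual_information_bits(rho), -0.5 * math.log2(1 - rho ** 2))
```

The mutual information is zero for ρ = 0, grows without bound as |ρ| → 1, and does not depend on σ².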
44 More Properties of Differential Entropy
Corollary: $h(X|Y) \le h(X)$, with equality iff X and Y are independent.
Theorem (Chain rule for differential entropy):
$$h(X_1, \ldots, X_n) = \sum_{i=1}^{n} h(X_i | X_1, \ldots, X_{i-1})$$
Corollary: $h(X_1, \ldots, X_n) \le \sum_{i=1}^{n} h(X_i)$, with equality iff $X_1, \ldots, X_n$ are independent.
45 Changing Variable
Now let Y = g(X), with g one-to-one and differentiable. We know that
$$f_Y(y) = \frac{dF_Y(y)}{dy} = f_X\left(g^{-1}(y)\right) \left|\frac{dg^{-1}(y)}{dy}\right| = f_X\left(g^{-1}(y)\right) \left|\frac{dx}{dy}\right|$$
46 Changing Variable - Example
Theorem: Translation does not change differential entropy: h(X + c) = h(X).
Theorem: $h(aX) = h(X) + \log|a|$.
Corollary: For a vector-valued RV, $h(A\mathbf{X}) = h(\mathbf{X}) + \log|\det(A)|$.
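The scaling property can be checked numerically by building the density of Y = aX from the change-of-variable formula and integrating. This is a rough sketch under stated assumptions: X is standard Gaussian, the scale factor a = −3 is an arbitrary illustrative value, and the integral is approximated by a midpoint rule on a truncated range.

```python
import math


def differential_entropy_bits(pdf, lo, hi, steps=20_000):
    """Midpoint-rule estimate of h = -integral of f log2 f over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0:
            total -= fx * math.log2(fx) * dx
    return total


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


a = -3.0  # hypothetical scale factor, negative to exercise |a|


def scaled_pdf(y):
    # Change of variable for Y = aX: f_Y(y) = f_X(y / a) / |a|
    return std_normal_pdf(y / a) / abs(a)


h_x = differential_entropy_bits(std_normal_pdf, -12.0, 12.0)
h_ax = differential_entropy_bits(scaled_pdf, -36.0, 36.0)
# Columns: h(X), h(aX), h(X) + log2|a|
print(h_x, h_ax, h_x + math.log2(abs(a)))
```

Stretching the density by |a| spreads the same probability over a support |a| times wider, which is where the additive log|a| term comes from.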
47 Concavity and Convexity
The same properties hold as for discrete RVs.
Differential entropy: h(X) is a concave function of $f_X(x)$.
Mutual information: I(X; Y) is a concave function of $f_X(x)$ for fixed $f_{Y|X}(y|x)$, and a convex function of $f_{Y|X}(y|x)$ for fixed $f_X(x)$.
48 Maximum Entropy Distribution
Going back to discrete RVs: for a discrete random variable taking on K values, what distribution maximizes the entropy?
For continuous RVs, we are interested in maximizing the entropy h(f) over all f that satisfy:
1. $f(x) \ge 0$, with equality outside the support set S
2. $\int_S f(x)\,dx = 1$
3. Moment constraints $\int_S f(x) r_i(x)\,dx = \alpha_i$ for $1 \le i \le m$.
50 Maximum Entropy Distribution
Maximize the entropy h(f) over all f that satisfy:
1. $f(x) \ge 0$, with equality outside the support set S
2. $\int_S f(x)\,dx = 1$
3. Moment constraints $\int_S f(x) r_i(x)\,dx = \alpha_i$ for $1 \le i \le m$.
Theorem (Maximum entropy distribution): Let $f^*(x) = f_\lambda(x) = \exp\left(\lambda_0 + \sum_{i=1}^{m} \lambda_i r_i(x)\right)$, x ∈ S, where $\lambda_0, \ldots, \lambda_m$ are chosen so that $f^*$ satisfies the above constraints. Then $f^*$ uniquely maximizes h(f) over all probability densities f satisfying the above constraints.
51 Continuous RVs: A Review: Example 1
First, let S = [a, b]. What is the maximizing f?
52 Continuous RVs: A Review: Example 2
Now consider the constraints E(X) = 0 and E(X²) = σ². What is the maximizing f?
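A sketch of how the maximum entropy theorem answers this question, assuming the support is the whole real line and taking $r_1(x) = x$ and $r_2(x) = x^2$ (so m = 2):

```latex
\begin{align*}
f^*(x) &= \exp\left(\lambda_0 + \lambda_1 x + \lambda_2 x^2\right),
  \quad x \in (-\infty, \infty), \ \lambda_2 < 0 \\
E(X) = 0 &\implies \lambda_1 = 0 \\
E(X^2) = \sigma^2 &\implies \lambda_2 = -\frac{1}{2\sigma^2} \\
\int f^*(x)\,dx = 1 &\implies e^{\lambda_0} = \frac{1}{\sqrt{2\pi\sigma^2}} \\
f^*(x) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right)
\end{align*}
```

The exponential of a quadratic with $\lambda_2 < 0$ is a Gaussian density with mean $-\lambda_1/(2\lambda_2)$ and variance $-1/(2\lambda_2)$, so matching the two moment constraints pins down $N(0, \sigma^2)$.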
53 Continuous RVs: A Review: Example 3
What zero-mean distribution maximizes the entropy on $(-\infty, \infty)^n$ for a given covariance matrix K?
Answer: a multivariate Gaussian
$$\phi(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n |K|}} \exp\left(-\frac{1}{2} \mathbf{x}^T K^{-1} \mathbf{x}\right)$$
55 Estimation Error and Differential Entropy
Recall for discrete RVs:
The problem: assume we know the RV Y and wish to guess the value of a correlated RV X.
Fano's inequality relates the probability of error in guessing X to its conditional entropy H(X|Y). As we shall see later, this is crucial in proving the converse to Shannon's channel capacity theorem.
In one of the assignments, we will see that H(X|Y) = 0 if and only if X is a function of Y. This means that when H(X|Y) = 0, we can estimate X from Y with zero probability of error.
Fano's inequality quantifies the following idea: we can estimate X with a small probability of error only if H(X|Y) is small.
59 Fano's Inequality
Assume we wish to estimate X and observe Y, related to X by p(y|x). From Y, we calculate an estimate $\hat{X} = g(Y)$. We observe that $X \to Y \to \hat{X}$ forms a Markov chain, and we wish to bound the probability $P_e = \Pr(X \ne \hat{X})$. Also, define an error RV E taking values in {1, 0} that indicates whether $\hat{X} \ne X$.
Theorem (Fano's Inequality): For any estimate $\hat{X}$ such that $X \to Y \to \hat{X}$, with $P_e = \Pr(X \ne \hat{X})$ and $H(P_e) = H(E)$, we have
$$H(P_e) + P_e \log|\mathcal{X}| \ge H(X|\hat{X}) \ge H(X|Y)$$
This implies that
$$P_e \ge \frac{H(X|Y) - 1}{\log|\mathcal{X}|}$$
$P_e$ cannot be too small if H(X|Y) is large, i.e., correct estimation only happens when the residual randomness of X is small after observing Y.
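To get a feel for the numbers, the weakened form of Fano's inequality can be evaluated directly. The following is a minimal sketch with entropies in bits; the function names, the alphabet size 16 and the conditional entropy values are hypothetical examples, and the bound is clipped at zero because a negative lower bound on a probability carries no information.

```python
import math


def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_lower_bound(h_x_given_y_bits, alphabet_size):
    """Weakened Fano bound P_e >= (H(X|Y) - 1) / log2|X|, clipped at 0."""
    if alphabet_size < 2:
        raise ValueError("alphabet must contain at least two symbols")
    return max(0.0, (h_x_given_y_bits - 1.0) / math.log2(alphabet_size))


def fano_holds(p_e, h_x_given_y_bits, alphabet_size):
    """Check H(P_e) + P_e log2|X| >= H(X|Y) for a candidate error probability."""
    return binary_entropy(p_e) + p_e * math.log2(alphabet_size) >= h_x_given_y_bits


# Hypothetical example: 16 symbols, 3 bits of residual uncertainty about X.
print(fano_lower_bound(3.0, 16))
# With less than 1 bit of residual uncertainty the weakened bound is vacuous.
print(fano_lower_bound(0.5, 16))
print(fano_holds(0.05, 3.0, 16), fano_holds(0.75, 3.0, 16))
```

The helper `fano_holds` tests the unweakened inequality, which can rule out error probabilities that the simpler bound still allows.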
62 Estimation Error and Differential Entropy
Now we have the estimation counterpart to Fano's inequality:
Theorem (Estimation error and differential entropy): For any RV X and estimator $\hat{X}$,
$$E(X - \hat{X})^2 \ge \frac{1}{2\pi e} \exp\left(2h(X)\right)$$
with equality iff X is Gaussian and $\hat{X}$ is the mean of X. Here h(X) is in nats.
$E(X - \hat{X})^2$: the expected prediction error.
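The bound can be evaluated for the two example distributions from this lecture. The sketch below assumes the estimator is the mean, so the actual error is the variance; the variance 2.5 and the interval (0, 1) are arbitrary illustrative values, and the uniform variance (b − a)²/12 is a standard fact not stated on the slides.

```python
import math


def mse_lower_bound(h_nats):
    """Lower bound (1 / (2*pi*e)) * exp(2 h(X)) on E(X - Xhat)^2, h in nats."""
    return math.exp(2.0 * h_nats) / (2.0 * math.pi * math.e)


def gaussian_entropy_nats(sigma2):
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma2)


def uniform_entropy_nats(a, b):
    return math.log(b - a)


# Gaussian: estimating X by its mean gives error sigma^2, and the bound is tight.
sigma2 = 2.5  # hypothetical variance
print(mse_lower_bound(gaussian_entropy_nats(sigma2)), sigma2)

# Uniform on (0, 1): the bound is strictly below the variance (b - a)^2 / 12.
a, b = 0.0, 1.0
print(mse_lower_bound(uniform_entropy_nats(a, b)), (b - a) ** 2 / 12.0)
```

The Gaussian case meets the bound with equality, while the uniform case leaves a gap, consistent with the equality condition in the theorem.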
63 AEP for Continuous RVs
Theorem (AEP - Discrete): If $X_1, \ldots, X_n$ are i.i.d. with PMF p(x), then
$$-\frac{1}{n} \log p(X_1, \ldots, X_n) \to H(X) \quad \text{in probability.}$$
Theorem (AEP - Continuous): If $X_1, \ldots, X_n$ are i.i.d. with density f(x), then
$$-\frac{1}{n} \log f(X_1, \ldots, X_n) \to h(X) \quad \text{in probability.}$$
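The continuous AEP can be observed in a small simulation. This is an illustrative sketch, not part of the lecture: it draws i.i.d. standard Gaussian samples with a fixed (arbitrary) seed and compares $-\frac{1}{n}\ln f(X_1, \ldots, X_n)$ with $h(X) = \frac{1}{2}\ln(2\pi e)$ nats.

```python
import math
import random


def empirical_rate_nats(samples, log_pdf):
    """-(1/n) * ln f(x_1, ..., x_n) for i.i.d. samples (the joint log-density is a sum)."""
    return -sum(log_pdf(x) for x in samples) / len(samples)


def std_normal_log_pdf(x):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x


rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
h_true = 0.5 * math.log(2.0 * math.pi * math.e)  # h of N(0, 1) in nats
for n in (10, 1000, 100000):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Columns: n, empirical -(1/n) ln f(X_1..X_n), h(X)
    print(n, empirical_rate_nats(xs, std_normal_log_pdf), h_true)
```

The convergence is just the weak law of large numbers applied to the i.i.d. terms $-\ln f(X_i)$, whose mean is h(X).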
65 Continuous RVs: A Review: Typical Set
Definition (Typical Set - Discrete): The typical set $A_\epsilon^{(n)}$ with respect to p(x) is the set of sequences $(x_1, \ldots, x_n) \in \mathcal{X}^n$ with the property
$$2^{-n(H(X)+\epsilon)} \le p(x_1, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}$$
Definition (Typical Set - Continuous): For ε > 0 and any n, the typical set $A_\epsilon^{(n)}$ with respect to f(x) is defined as
$$A_\epsilon^{(n)} = \left\{ (x_1, \ldots, x_n) \in S^n : \left| -\frac{1}{n} \log f(x_1, \ldots, x_n) - h(X) \right| \le \epsilon \right\}$$
where $f(x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i)$.
67 Typical Set and Volume
Definition: The volume Vol(A) of a set $A \subset \mathbb{R}^n$ is defined as
$$\mathrm{Vol}(A) = \int_A dx_1 \cdots dx_n.$$
Theorem (Typical Set Properties):
1. $\Pr\left(A_\epsilon^{(n)}\right) > 1 - \epsilon$ for n sufficiently large.
2. $\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \le 2^{n(h(X)+\epsilon)}$ for all n.
3. $\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \ge (1-\epsilon)\,2^{n(h(X)-\epsilon)}$ for n sufficiently large.
Theorem: The set $A_\epsilon^{(n)}$ is the smallest-volume set with probability greater than or equal to 1 − ε, to first order in the exponent.
69 Differential Entropy: Summary
$$h(X) = h(f) = -\int_S f(x) \log f(x)\,dx$$
$$f(X^n) \doteq 2^{-nh(X)}$$
$$\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \doteq 2^{nh(X)}$$
$$H\left([X]_{2^{-n}}\right) \approx h(X) + n$$
$$h\left(N(0, \sigma^2)\right) = \frac{1}{2} \log 2\pi e \sigma^2$$
$$h\left(N_n(\mu, K)\right) = \frac{1}{2} \log (2\pi e)^n |K|$$
$$D(f\|g) = \int f \log \frac{f}{g} \ge 0$$
70 Differential Entropy: Summary
$$h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i | X_1, X_2, \ldots, X_{i-1}) \quad (8.88)$$
$$h(X|Y) \le h(X) \quad (8.89)$$
$$h(aX) = h(X) + \log|a| \quad (8.90)$$
$$I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)} \ge 0 \quad (8.91)$$
$$\max_{E\mathbf{X}\mathbf{X}^t = K} h(\mathbf{X}) = \frac{1}{2} \log (2\pi e)^n |K| \quad (8.92)$$
$$E\left(X - \hat{X}(Y)\right)^2 \ge \frac{1}{2\pi e} e^{2h(X|Y)}$$
$2^{nH(X)}$ is the effective alphabet size for a discrete random variable.
$2^{nh(X)}$ is the effective support set size for a continuous random variable.
$2^{C}$ is the effective alphabet size of a channel of capacity C.
71 Thank you!
Machine Learning Lecture 02.2: Basics of Information Theory Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology Nevin L. Zhang
More informationRecitation 2: Probability
Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions
More informationReview of Probability Theory
Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving
More informationInformation Theory Primer:
Information Theory Primer: Entropy, KL Divergence, Mutual Information, Jensen s inequality Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationReview: mostly probability and some statistics
Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random
More informationLecture 2: August 31
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy
More informationconditional cdf, conditional pdf, total probability theorem?
6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random
More information5 Mutual Information and Channel Capacity
5 Mutual Information and Channel Capacity In Section 2, we have seen the use of a quantity called entropy to measure the amount of randomness in a random variable. In this section, we introduce several
More informationRandom Variables and Their Distributions
Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital
More informationELEC546 Review of Information Theory
ELEC546 Review of Information Theory Vincent Lau 1/1/004 1 Review of Information Theory Entropy: Measure of uncertainty of a random variable X. The entropy of X, H(X), is given by: If X is a discrete random
More informationP (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n
JOINT DENSITIES - RANDOM VECTORS - REVIEW Joint densities describe probability distributions of a random vector X: an n-dimensional vector of random variables, ie, X = (X 1,, X n ), where all X is are
More informationEE/Stat 376B Handout #5 Network Information Theory October, 14, Homework Set #2 Solutions
EE/Stat 376B Handout #5 Network Information Theory October, 14, 014 1. Problem.4 parts (b) and (c). Homework Set # Solutions (b) Consider h(x + Y ) h(x + Y Y ) = h(x Y ) = h(x). (c) Let ay = Y 1 + Y, where
More informationLECTURE 13. Last time: Lecture outline
LECTURE 13 Last time: Strong coding theorem Revisiting channel and codes Bound on probability of error Error exponent Lecture outline Fano s Lemma revisited Fano s inequality for codewords Converse to
More informationContinuous Random Variables
1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables
More informationECE353: Probability and Random Processes. Lecture 7 -Continuous Random Variable
ECE353: Probability and Random Processes Lecture 7 -Continuous Random Variable Xiao Fu School of Electrical Engineering and Computer Science Oregon State University E-mail: xiao.fu@oregonstate.edu Continuous
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationLECTURE 2. Convexity and related notions. Last time: mutual information: definitions and properties. Lecture outline
LECTURE 2 Convexity and related notions Last time: Goals and mechanics of the class notation entropy: definitions and properties mutual information: definitions and properties Lecture outline Convexity
More informationBASICS OF PROBABILITY
October 10, 2018 BASICS OF PROBABILITY Randomness, sample space and probability Probability is concerned with random experiments. That is, an experiment, the outcome of which cannot be predicted with certainty,
More informationENGG2430A-Homework 2
ENGG3A-Homework Due on Feb 9th,. Independence vs correlation a For each of the following cases, compute the marginal pmfs from the joint pmfs. Explain whether the random variables X and Y are independent,
More informationChapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University
Chapter 3, 4 Random Variables ENCS6161 - Probability and Stochastic Processes Concordia University ENCS6161 p.1/47 The Notion of a Random Variable A random variable X is a function that assigns a real
More informationCh. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited
Ch. 8 Math Preliminaries for Lossy Coding 8.4 Info Theory Revisited 1 Info Theory Goals for Lossy Coding Again just as for the lossless case Info Theory provides: Basis for Algorithms & Bounds on Performance
More informationFormulas for probability theory and linear models SF2941
Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms
More informationExercises with solutions (Set D)
Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where
More informationChapter I: Fundamental Information Theory
ECE-S622/T62 Notes Chapter I: Fundamental Information Theory Ruifeng Zhang Dept. of Electrical & Computer Eng. Drexel University. Information Source Information is the outcome of some physical processes.
More informationLecture 5 - Information theory
Lecture 5 - Information theory Jan Bouda FI MU May 18, 2012 Jan Bouda (FI MU) Lecture 5 - Information theory May 18, 2012 1 / 42 Part I Uncertainty and entropy Jan Bouda (FI MU) Lecture 5 - Information
More informationComputing and Communications 2. Information Theory -Entropy
1896 1920 1987 2006 Computing and Communications 2. Information Theory -Entropy Ying Cui Department of Electronic Engineering Shanghai Jiao Tong University, China 2017, Autumn 1 Outline Entropy Joint entropy
More informationChapter 2: Random Variables
ECE54: Stochastic Signals and Systems Fall 28 Lecture 2 - September 3, 28 Dr. Salim El Rouayheb Scribe: Peiwen Tian, Lu Liu, Ghadir Ayache Chapter 2: Random Variables Example. Tossing a fair coin twice:
More informationQuick Tour of Basic Probability Theory and Linear Algebra
Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions
More informationLecture 2: Repetition of probability theory and statistics
Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:
More informationAlgorithms for Uncertainty Quantification
Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example
More informationECE598: Information-theoretic methods in high-dimensional statistics Spring 2016
ECE598: Information-theoretic methods in high-dimensional statistics Spring 06 Lecture : Mutual Information Method Lecturer: Yihong Wu Scribe: Jaeho Lee, Mar, 06 Ed. Mar 9 Quick review: Assouad s lemma
More information2 (Statistics) Random variables
2 (Statistics) Random variables References: DeGroot and Schervish, chapters 3, 4 and 5; Stirzaker, chapters 4, 5 and 6 We will now study the main tools use for modeling experiments with unknown outcomes
More informationCapacity of AWGN channels
Chapter 3 Capacity of AWGN channels In this chapter we prove that the capacity of an AWGN channel with bandwidth W and signal-tonoise ratio SNR is W log 2 (1+SNR) bits per second (b/s). The proof that
More information1 Joint and marginal distributions
DECEMBER 7, 204 LECTURE 2 JOINT (BIVARIATE) DISTRIBUTIONS, MARGINAL DISTRIBUTIONS, INDEPENDENCE So far we have considered one random variable at a time. However, in economics we are typically interested
More informationCHAPTER 3. P (B j A i ) P (B j ) =log 2. j=1
CHAPTER 3 Problem 3. : Also : Hence : I(B j ; A i ) = log P (B j A i ) P (B j ) 4 P (B j )= P (B j,a i )= i= 3 P (A i )= P (B j,a i )= j= =log P (B j,a i ) P (B j )P (A i ).3, j=.7, j=.4, j=3.3, i=.7,
More informationEE376A - Information Theory Final, Monday March 14th 2016 Solutions. Please start answering each question on a new page of the answer booklet.
EE376A - Information Theory Final, Monday March 14th 216 Solutions Instructions: You have three hours, 3.3PM - 6.3PM The exam has 4 questions, totaling 12 points. Please start answering each question on
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationEntropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information
Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture
More informationThe binary entropy function
ECE 7680 Lecture 2 Definitions and Basic Facts Objective: To learn a bunch of definitions about entropy and information measures that will be useful through the quarter, and to present some simple but
More informationInformation Theory. Coding and Information Theory. Information Theory Textbooks. Entropy
Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is
More informationBivariate distributions
Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient
More informationLecture 3: Channel Capacity
Lecture 3: Channel Capacity 1 Definitions Channel capacity is a measure of maximum information per channel usage one can get through a channel. This one of the fundamental concepts in information theory.
More informationSDS 321: Introduction to Probability and Statistics
SDS 321: Introduction to Probability and Statistics Lecture 17: Continuous random variables: conditional PDF Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin
More informationx log x, which is strictly convex, and use Jensen s Inequality:
2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and
More informationLecture Notes 3 Multiple Random Variables. Joint, Marginal, and Conditional pmfs. Bayes Rule and Independence for pmfs
Lecture Notes 3 Multiple Random Variables Joint, Marginal, and Conditional pmfs Bayes Rule and Independence for pmfs Joint, Marginal, and Conditional pdfs Bayes Rule and Independence for pdfs Functions
More informationChapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued
Chapter 3 sections 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions 3.6 Conditional
More informationHands-On Learning Theory Fall 2016, Lecture 3
Hands-On Learning Theory Fall 016, Lecture 3 Jean Honorio jhonorio@purdue.edu 1 Information Theory First, we provide some information theory background. Definition 3.1 (Entropy). The entropy of a discrete
More informationIntroduction to Machine Learning
Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB
More informationAn instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1
Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,
More informationCapacity of a channel Shannon s second theorem. Information Theory 1/33
Capacity of a channel Shannon s second theorem Information Theory 1/33 Outline 1. Memoryless channels, examples ; 2. Capacity ; 3. Symmetric channels ; 4. Channel Coding ; 5. Shannon s second theorem,
More informationSolutions to Homework Set #1 Sanov s Theorem, Rate distortion
st Semester 00/ Solutions to Homework Set # Sanov s Theorem, Rate distortion. Sanov s theorem: Prove the simple version of Sanov s theorem for the binary random variables, i.e., let X,X,...,X n be a sequence
More informationChapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University
Chapter 4 Data Transmission and Channel Capacity Po-Ning Chen, Professor Department of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 30050, R.O.C. Principle of Data Transmission
More informationLECTURE 3. Last time:
LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate
More informationMultiple Random Variables
Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions
More informationMath 416 Lecture 2 DEFINITION. Here are the multivariate versions: X, Y, Z iff P(X = x, Y = y, Z =z) = p(x, y, z) of X, Y, Z iff for all sets A, B, C,
Math 416 Lecture 2 DEFINITION. Here are the multivariate versions: PMF case: p(x, y, z) is the joint Probability Mass Function of X, Y, Z iff P(X = x, Y = y, Z =z) = p(x, y, z) PDF case: f(x, y, z) is
More informationChapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued
Chapter 3 sections Chapter 3 - continued 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions
More informationLecture 5: Asymptotic Equipartition Property
Lecture 5: Asymptotic Equipartition Property Law of large number for product of random variables AEP and consequences Dr. Yao Xie, ECE587, Information Theory, Duke University Stock market Initial investment
More informationIntroduction to Machine Learning
What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes
More informationLecture 14 February 28
EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables
More information1 Random Variable: Topics
Note: Handouts DO NOT replace the book. In most cases, they only provide a guideline on topics and an intuitive feel. 1 Random Variable: Topics Chap 2, 2.1-2.4 and Chap 3, 3.1-3.3 What is a random variable?
More informationCOMPSCI 650 Applied Information Theory Jan 21, Lecture 2
COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationRandom Variables. Cumulative Distribution Function (CDF) Amappingthattransformstheeventstotherealline.
Random Variables Amappingthattransformstheeventstotherealline. Example 1. Toss a fair coin. Define a random variable X where X is 1 if head appears and X is if tail appears. P (X =)=1/2 P (X =1)=1/2 Example
More informationECE Lecture #9 Part 2 Overview
ECE 450 - Lecture #9 Part Overview Bivariate Moments Mean or Expected Value of Z = g(x, Y) Correlation and Covariance of RV s Functions of RV s: Z = g(x, Y); finding f Z (z) Method : First find F(z), by
More informationIntroduction to Probability Theory
Introduction to Probability Theory Ping Yu Department of Economics University of Hong Kong Ping Yu (HKU) Probability 1 / 39 Foundations 1 Foundations 2 Random Variables 3 Expectation 4 Multivariate Random
More informationLecture 6 I. CHANNEL CODING. X n (m) P Y X
6- Introduction to Information Theory Lecture 6 Lecturer: Haim Permuter Scribe: Yoav Eisenberg and Yakov Miron I. CHANNEL CODING We consider the following channel coding problem: m = {,2,..,2 nr} Encoder
More informationStatistics for scientists and engineers
Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3
More information1 Presessional Probability
1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional
More informationCS 591, Lecture 2 Data Analytics: Theory and Applications Boston University
CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.
More informationIntroduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.
L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate
More informationDept. of Linguistics, Indiana University Fall 2015
L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission
More informationIntroduction to Computational Finance and Financial Econometrics Probability Review - Part 2
You can t see this text! Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2 Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Probability Review - Part 2 1 /
More information2 Functions of random variables
2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as
More informationEE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018
Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code
More informationA Probability Review
A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in
More informationECE 587 / STA 563: Lecture 2 Measures of Information Information Theory Duke University, Fall 2017
ECE 587 / STA 563: Lecture 2 Measures of Information Information Theory Duke University, Fall 207 Author: Galen Reeves Last Modified: August 3, 207 Outline of lecture: 2. Quantifying Information..................................
More informationCommunications Theory and Engineering
Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 AEP Asymptotic Equipartition Property AEP In information theory, the analog of
More informationNational Sun Yat-Sen University CSE Course: Information Theory. Maximum Entropy and Spectral Estimation
Maximum Entropy and Spectral Estimation 1 Introduction What is the distribution of velocities in the gas at a given temperature? It is the Maxwell-Boltzmann distribution. The maximum entropy distribution
More informationp. 6-1 Continuous Random Variables p. 6-2
Continuous Random Variables Recall: For discrete random variables, only a finite or countably infinite number of possible values with positive probability (>). Often, there is interest in random variables
More information3. Review of Probability and Statistics
3. Review of Probability and Statistics ECE 830, Spring 2014 Probabilistic models will be used throughout the course to represent noise, errors, and uncertainty in signal processing problems. This lecture
More informationLecture 11: Quantum Information III - Source Coding
CSCI5370 Quantum Computing November 25, 203 Lecture : Quantum Information III - Source Coding Lecturer: Shengyu Zhang Scribe: Hing Yin Tsang. Holevo s bound Suppose Alice has an information source X that
More informationMachine Learning Srihari. Information Theory. Sargur N. Srihari
Information Theory Sargur N. Srihari 1 Topics 1. Entropy as an Information Measure 1. Discrete variable definition Relationship to Code Length 2. Continuous Variable Differential Entropy 2. Maximum Entropy
More information4. CONTINUOUS RANDOM VARIABLES
IA Probability Lent Term 4 CONTINUOUS RANDOM VARIABLES 4 Introduction Up to now we have restricted consideration to sample spaces Ω which are finite, or countable; we will now relax that assumption We
More information