On Upper Bounding Discrete Entropy


On Upper Bounding Discrete Entropy

by

Razan Al-nakhli

A report submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree of Master of Science.

Queen's University
Kingston, Ontario, Canada
December 2011

Copyright © Razan Al-nakhli, 2011

Abstract

Two upper bounds on the entropy of a discrete random variable are studied. The standard upper bound is derived from the differential entropy bound for a Gaussian random variable. A tighter bound is proved using the transformation formula of the Jacobi theta function and Shannon's inequality. Numerical examples are provided to illustrate the tightness of both bounds.

Acknowledgments

First and foremost, I would like to express my sincerest gratitude to my supervisor Dr. Fady Alajaji for his patience and guidance throughout my journey here at Queen's University. His support has encouraged me to learn more than I had expected to be possible. I would also like to thank Dr. David Wehlau.

Many thanks go to my beloved parents for their unconditional love, patience and support all through my life. To my adorable husband Mohammad: I thank you for your endless love and for standing by my side every step of the way. I could never have imagined how this experience would have been without you.

Last, but not least, I am thankful to all members and staff of the Department of Mathematics and Statistics at Queen's University. I am proud to have considered some of them as my friends here in Kingston.

Contents

Abstract
Acknowledgments
Contents
List of Figures

Chapter 1: Introduction
 1.1 Overview
 1.2 Motivation
 1.3 Organization of Thesis

Chapter 2: Preliminaries
 2.1 Entropy of a Discrete Random Variable
 2.2 Differential Entropy of a Continuous Random Variable

Chapter 3: Two Bounds on Discrete Entropy
 3.1 The Standard Upper Bound on Discrete Entropy
 3.2 A Tight Upper Bound on Discrete Entropy

Chapter 4: Numerical Examples

Chapter 5: Conclusion

Bibliography

List of Figures

4.1 Entropy and the different bounds for p = .1
4.2 Entropy and the different bounds for p = .3
4.3 Entropy and the different bounds for p = .5

Chapter 1

Introduction

Entropy is a key measure in information theory. For discrete alphabet sources, it represents the ultimate rate below which error-free compression cannot be achieved [2].

1.1 Overview

The standard upper bound on discrete entropy (established independently by Cover and Thomas, Willems, and Massey) is derived from the differential entropy of the Gaussian random variable [3]. In 1975, Djackov published a similar bound in his work on coin-weighing [4]. In his 1998 work [5], Mow proved a tightened bound on discrete entropy under certain conditions on the probability mass function of the discrete random variable.

1.2 Motivation

This report addresses the problem of finding bounds for the entropy of discrete random variables. Entropies are ubiquitous in information theory, and the task of deriving bounds for the entropy of

discrete random variables is sometimes crucial because the exact entropy value may not be readily available (particularly when the full distribution of the random variables is not known). Thus, finding entropy bounds that can be easily evaluated from partial knowledge about the random variables (such as their mean and variance) is an interesting and worthwhile task.

1.3 Organization of Thesis

We proceed in Chapter 2 by introducing background on discrete entropy and differential entropy, along with some important definitions and properties. In Chapter 3, we first prove the standard upper bound on discrete entropy and then present in detail a tighter bound due to Mow [5]. In Chapter 4 we apply both bounds to several examples to illustrate their tightness. Chapter 5 concludes the report and outlines future work. Throughout the report, all logarithms are taken to base 2 unless otherwise specified.

Chapter 2

Preliminaries

2.1 Entropy of a Discrete Random Variable

In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually measured in units such as bits. Here, a message means a specific realization of a random variable X with discrete (finite or countable) alphabet 𝒳 and probability mass function (pmf) p(·).

Definition 1. The entropy H(X) of a discrete random variable X is defined by

H(X) = −Σ_{x∈𝒳} p(x) log_b p(x),

where b is the base of the logarithm and specifies the units. Common values of b are 2, Euler's number e, and 10; the unit of entropy is the bit for b = 2, the nat for b = e, and the digit for b = 10. If p(i) = 0 for some i ∈ 𝒳, the value of the corresponding summand 0 log_b 0 is taken to be 0, which is consistent with the limit lim_{p→0⁺} p log p = 0.

Note that the entropy is a function of the distribution p(·) of X and is sometimes written as H(p); it does not depend on the actual values that the random variable X takes, but only on the probabilities of its outcomes.

Definition 2. The joint entropy of two random variables X and Y with joint pmf p(x, y), where x ∈ 𝒳 and y ∈ 𝒴, is defined as

H(X, Y) = −Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) log p(x, y),

where p(x, y) log p(x, y) is defined to be 0 whenever p(x, y) = 0. For more than two variables X₁, ..., Xₙ, the joint entropy is similarly defined as

H(X₁, ..., Xₙ) = −Σ_{x₁} ··· Σ_{xₙ} p(x₁, ..., xₙ) log p(x₁, ..., xₙ),

where p(x₁, ..., xₙ) is the joint pmf of X₁, ..., Xₙ.
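As a quick illustration of Definitions 1 and 2, the following minimal Python sketch computes the entropy of a pmf and the joint entropy of a joint pmf given as a matrix; the function name and the example pmfs are illustrative only.

```python
import numpy as np

def entropy(p, base=2.0):
    """Entropy -sum p_i log_b p_i of a pmf, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]                      # drop zero masses (0 log 0 = 0)
    return -np.sum(nz * np.log(nz)) / np.log(base)

# Entropy of a discrete random variable with pmf (1/2, 1/4, 1/8, 1/8)
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits

# Joint entropy H(X, Y) of a joint pmf given as a matrix p(x, y)
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])
print(entropy(p_xy))                        # matrix is flattened internally; H(X, Y) in bits
```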

Definition 3. Given a discrete random variable X with alphabet 𝒳 and a random variable Y with alphabet 𝒴, the conditional entropy of Y given X is defined as

H(Y|X) = Σ_{x∈𝒳} p(x) H(Y|X = x)                                  (2.1)
       = −Σ_{x∈𝒳} p(x) Σ_{y∈𝒴} p(y|x) log p(y|x)                  (2.2)
       = −Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) log p(y|x),                      (2.3)

where the conditional pmf p(y|x) = p(x, y)/p(x) for x ∈ 𝒳 with p(x) > 0 and y ∈ 𝒴.

Theorem 1 (Chain rule for entropy). H(X, Y) = H(X) + H(Y|X) and, symmetrically, H(X, Y) = H(Y) + H(X|Y).

Proof.

H(X, Y) = −Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) log p(x, y)
        = −Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) log[p(x) p(y|x)]
        = −Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) log p(x) − Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) log p(y|x)
        = −Σ_{x∈𝒳} p(x) log p(x) − Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) log p(y|x)
        = H(X) + H(Y|X). ∎
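The chain rule is easy to verify numerically. A small sketch that computes H(X, Y) directly and via H(X) + H(Y|X); the joint pmf below is just an example.

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf given as an array (zero masses allowed)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

# An example joint pmf p(x, y); rows index x, columns index y
p_xy = np.array([[1/8,  1/16, 1/32, 1/32],
                 [1/16, 1/8,  1/32, 1/32],
                 [1/16, 1/16, 1/16, 1/16],
                 [1/4,  0,    0,    0   ]])

p_x = p_xy.sum(axis=1)                               # marginal pmf of X
# H(Y|X) = sum_x p(x) H(Y | X = x), as in (2.1)
H_Y_given_X = sum(px * H(row / px) for px, row in zip(p_x, p_xy) if px > 0)

print(H(p_xy), H(p_x) + H_Y_given_X)                 # identical: the chain rule
```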

Corollary 1. H(X, Y|Z) = H(X|Z) + H(Y|X, Z).

Properties of Entropy

- H(X) ≥ 0.
- Conditioning reduces entropy: for any two random variables X and Y, we have H(X|Y) ≤ H(X), with equality if and only if X and Y are independent.
- H(X) ≤ log|𝒳| for any random variable X with finite alphabet 𝒳, with equality if and only if X is distributed uniformly over 𝒳.
- H(p) is concave in p.
- The joint entropy of a set of random variables is greater than or equal to all of the individual entropies of the random variables in the set: H(X, Y) ≥ max[H(X), H(Y)] and H(X₁, ..., Xₙ) ≥ max[H(X₁), ..., H(Xₙ)].
- The joint entropy of a set of random variables is less than or equal to the sum of the individual entropies of the random variables in the set, with equality if and only if the random variables are independent of each other: H(X, Y) ≤ H(X) + H(Y) and H(X₁, ..., Xₙ) ≤ H(X₁) + ··· + H(Xₙ).

2.2 Differential Entropy of a Continuous Random Variable

We now deal with continuous (real-valued) random variables that admit a probability density function (pdf).

Definition 4. Let X be a random variable with pdf f(·) and support S = {x ∈ ℝ : f(x) > 0}. The differential entropy of X is denoted by h(X) and defined as

h(X) = −∫_S f(x) log f(x) dx,

assuming that the integral exists.

As with its discrete analog, the units of differential entropy depend on the base of the logarithm, which is usually 2 (i.e., the units are bits). Related concepts such as joint and conditional differential entropy are defined in a similar fashion. However, differential entropy does not share all the properties of discrete entropy: for example, it can be negative, and it is not invariant under invertible maps (such as scaling).

Definition 5. The joint differential entropy of n random variables X₁, X₂, ..., Xₙ with joint pdf f(x₁, x₂, ..., xₙ) is defined as

h(X₁, X₂, ..., Xₙ) = −∫ f(xⁿ) log f(xⁿ) dxⁿ,

assuming that the integral exists, where xⁿ = (x₁, ..., xₙ) and dxⁿ denotes dx₁ ··· dxₙ.

Definition 6. If X and Y have joint pdf f(x, y), we can define the conditional differential entropy h(X|Y) as

h(X|Y) = −∫∫ f(x, y) log f(x|y) dx dy,

assuming that the integral exists.

Properties of differential entropy

- The chain rule for differential entropy holds as in the discrete case:
  h(X₁, ..., Xₙ) = Σ_{i=1}^{n} h(Xᵢ|X₁, ..., X_{i−1}) ≤ Σ_{i=1}^{n} h(Xᵢ).
- Differential entropy is translation invariant: h(X + c) = h(X) for any constant c.
- Differential entropy is not invariant under invertible maps. In particular, for a constant a ≠ 0, h(aX) = h(X) + log|a|. Also, for a random vector X = (X₁, X₂, ..., Xₙ)ᵀ and an n × n invertible matrix A, h(AX) = h(X) + log|det A|, where ᵀ denotes transposition.
- If a random vector Xⁿ ∈ ℝⁿ has zero mean and covariance matrix K, then

  h(Xⁿ) ≤ ½ log[(2πe)ⁿ det K]

with equality if and only if Xⁿ is Gaussian, i.e., the random variables X₁, ..., Xₙ are jointly Gaussian with covariance matrix K. In particular, for a scalar random variable X with variance σ², h(X) ≤ ½ log(2πeσ²), with equality if and only if X is Gaussian.
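To make the scalar case concrete, the following sketch compares a brute-force numerical evaluation of h(X) = −∫ f log₂ f with the closed form ½ log₂(2πeσ²) for a Gaussian density; the variance value is arbitrary.

```python
import numpy as np

sigma2 = 2.5                                       # an arbitrary Gaussian variance
x = np.linspace(-30.0, 30.0, 200_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Numerical differential entropy h(X) = -int f log2 f dx (integrand vanishes in the tails)
integrand = np.where(f > 0, -f * np.log2(f), 0.0)
h_numeric = np.sum(integrand) * dx

h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
print(h_numeric, h_closed)                         # both ~2.71 bits
```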

Chapter 3

Two Bounds on Discrete Entropy

3.1 The Standard Upper Bound on Discrete Entropy

As noted in the previous chapter, the Gaussian distribution yields the largest differential entropy among all densities with the same variance. We next use this fact to derive an upper bound on the entropy of a discrete random variable.

Theorem 2. Let X be a discrete random variable with countable alphabet 𝒳 = {a₁, a₂, ...} and pmf P[X = aᵢ] = pᵢ, i = 1, 2, 3, .... Then

H(X) ≤ ½ log[ 2πe ( Σᵢ i²pᵢ − (Σᵢ ipᵢ)² + 1/12 ) ],          (3.1)

where H(X) = −Σᵢ pᵢ log pᵢ is the entropy of X.

Proof. Let X₀ be an integer-valued discrete random variable with pmf P[X₀ = i] = pᵢ for i = 1, 2, .... Let U be a continuous random variable uniformly distributed on the interval [0, 1] and independent of X₀. Now define the continuous random variable X̃ = X₀ + U. Since X₀ and U are independent, Var(X̃) = Var(X₀) + Var(U), where Var(·) denotes the variance.

We then have

H(X) = H(X₀) = −Σᵢ pᵢ log pᵢ
     = −Σᵢ [ ∫ᵢ^{i+1} f_X̃(x) dx ] log [ ∫ᵢ^{i+1} f_X̃(x) dx ]
     = −Σᵢ ∫ᵢ^{i+1} f_X̃(x) log f_X̃(x) dx
     = −∫ f_X̃(x) log f_X̃(x) dx
     = h(X̃)
     ≤ ½ log[ 2πe Var(X̃) ]
     = ½ log[ 2πe ( Var(X₀) + Var(U) ) ]
     = ½ log[ 2πe ( Σᵢ i²pᵢ − (Σᵢ ipᵢ)² + 1/12 ) ].

The first equality holds because the discrete entropy depends only on the probabilities and not on the values of the outcomes, and the third equality holds since f_X̃(x) = pᵢ for i ≤ x < i + 1. The inequality follows from the fact that the Gaussian density maximizes the differential entropy among all densities with the same variance, and the last equality uses Var(U) = 1/12. ∎

Note: A tighter bound is possible by appropriately reordering the given set of probability masses. It is conjectured that a good bound is achieved by the assignment ..., p₅, p₃, p₁, p₂, p₄, ... for p₁ ≥ p₂ ≥ p₃ ≥ ....
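A short sketch of the bound (3.1) in code, applied to a truncated geometric pmf so it can be compared with Example 1 below; the truncation length is arbitrary.

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def standard_bound_bits(p):
    """Standard upper bound (3.1) for a pmf p on the integers 1, 2, ..., N."""
    p = np.asarray(p, dtype=float)
    i = np.arange(1, len(p) + 1)
    var = np.sum(i**2 * p) - np.sum(i * p)**2
    return 0.5 * np.log2(2 * np.pi * np.e * (var + 1.0 / 12.0))

# Geometric(p = 1/2) truncated to {1, ..., 60} and renormalized
q = 0.5 ** np.arange(1, 61)
q /= q.sum()
print(entropy_bits(q), standard_bound_bits(q))   # ~2.0 bits vs ~2.58 bits
```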

Example 1. Let X be a discrete geometric random variable on the set 𝒳 = {1, 2, ...} with parameter p, i.e., P[X = k] = (1 − p)^{k−1} p, and entropy

H(X) = [ −(1 − p) log(1 − p) − p log p ] / p.

The standard upper bound of Theorem 2 gives

H(X) ≤ ½ log[ 2πe ( (1 − p)/p² + 1/12 ) ].

Letting p = 1/2, we get H(X) ≤ ½ log[ 2πe (2 + 1/12) ] ≈ 2.58 bits, while H(X) = 2 bits.

Example 2. Let X be a negative binomial random variable on the set 𝒳 = {0, 1, ...} with pmf

P[X = k] = C(k + r − 1, k) (1 − p)^r p^k,  k ∈ 𝒳,

where the probability p = 1/4 and the integer parameter r > 0. The standard upper bound yields

H(X) ≤ ½ log[ 2πe ( r(1/4)/(1 − 1/4)² + 1/12 ) ] = ½ log[ 2πe ( 16r/36 + 1/12 ) ].

3.2 A Tight Upper Bound on Discrete Entropy

The bound in the previous theorem can be tightened by eliminating the term 1/12. However, it is clear that this is not always possible, because the resulting upper bound would become negative if Var(X) < 1/(2πe).

Lemma 1 (Shannon's inequality). If {aᵢ} and {bᵢ} are sequences of N positive numbers such that a₁ + a₂ + ··· + a_N = 1 and b₁ + b₂ + ··· + b_N ≤ 1, then

−Σ_{i=1}^{N} aᵢ log aᵢ ≤ −Σ_{i=1}^{N} aᵢ log bᵢ,          (3.2)

with equality if and only if aᵢ = bᵢ for i = 1, 2, ..., N.

Proof. To prove (3.2), it is sufficient to prove that

Σ_{i=1}^{N} aᵢ log(aᵢ/bᵢ) ≥ 0.          (3.3)

Inequality (3.3) follows directly from the log-sum inequality, which states that for nonnegative numbers a₁, a₂, ..., a_N and b₁, b₂, ..., b_N,

Σ_{i=1}^{N} aᵢ log(aᵢ/bᵢ) ≥ ( Σ_{i=1}^{N} aᵢ ) log[ ( Σ_{i=1}^{N} aᵢ ) / ( Σ_{i=1}^{N} bᵢ ) ],          (3.4)

with equality if and only if aᵢ/bᵢ = ( Σ_{i=1}^{N} aᵢ ) / ( Σ_{i=1}^{N} bᵢ ) for all i = 1, ..., N. Indeed, using (3.4) and the fact that Σ_{i=1}^{N} aᵢ = 1, we get

Σ_{i=1}^{N} aᵢ log(aᵢ/bᵢ) ≥ (1) log[ 1 / ( Σ_{i=1}^{N} bᵢ ) ] ≥ (1) log(1) = 0,

since Σ_{i=1}^{N} bᵢ ≤ 1. Thus (3.3) is proved, and (3.2) follows directly using the properties of the logarithm. The equality condition for (3.2) is obtained by noting that equality holds in the above inequalities if and only if

aᵢ/bᵢ = ( Σ_{i=1}^{N} aᵢ ) / ( Σ_{i=1}^{N} bᵢ ) = 1 / ( Σ_{i=1}^{N} bᵢ ) for i = 1, 2, ..., N,  and  Σ_{i=1}^{N} bᵢ = 1,

i.e., if and only if aᵢ = bᵢ for i = 1, 2, ..., N. ∎
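Lemma 1 can be checked numerically in a few lines; in the sketch below the sequences are generated at random, with {bᵢ} scaled so that it sums to less than one.

```python
import numpy as np

rng = np.random.default_rng(0)

a = rng.random(10); a /= a.sum()            # a pmf: positive, sums to 1
b = rng.random(10); b /= (b.sum() + 0.3)    # positive, sums to less than 1

lhs = -np.sum(a * np.log2(a))               # -sum a_i log a_i
rhs = -np.sum(a * np.log2(b))               # -sum a_i log b_i
print(lhs <= rhs, lhs, rhs)                 # True: Shannon's inequality (3.2)
```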

Lemma 2. The Jacobi theta function [5], defined by

θ₃(ν|t) = Σ_{i=−∞}^{∞} exp( jπt i² + j2πν i ),

where j = √−1 and ν, t are two complex numbers with Im(t) > 0, satisfies the identity

√(t/j) exp( jπν²/t ) θ₃(ν|t) = θ₃( ν/t | −1/t ).          (3.5)

Theorem 3. Let {p₁, p₂, ..., p_N} be a set of N probability masses. The discrete entropy H(p₁, p₂, ..., p_N) satisfies

H(p₁, p₂, ..., p_N) ≤ ½ log(2πeσ²)          (3.6)

if the following three conditions are satisfied:

2πσ² > 1,
(2πσ²)² i₁² ≥ (i₁ − 1 + µ)² + σ² ln(2πσ²),  and
(2πσ²)² i₂² ≥ (i₂ + N − µ)² + σ² ln(2πσ²),

where

µ = µ(p₁, p₂, ..., p_N) = Σ_{i=1}^{N} i pᵢ,
σ² = σ²(p₁, p₂, ..., p_N) = Σ_{i=1}^{N} pᵢ (i − µ)²,

and i₁, i₂ are the two positive integers nearest to (µ − 1)/((2πσ²)² − 1) and (N − µ)/((2πσ²)² − 1), respectively.
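The conditions of Theorem 3 are mechanical to check numerically. The sketch below evaluates µ, σ², i₁ and i₂ for a pmf on {1, ..., N} and reports whether the tightened bound ½ log₂(2πeσ²) is guaranteed to apply; the natural logarithm is used inside the conditions, the bound is reported in bits, and the example pmf is the one of Example 8 below.

```python
import numpy as np

def tight_bound_applies(p):
    """Check the three sufficient conditions of Theorem 3 for a pmf p on {1, ..., N}."""
    p = np.asarray(p, dtype=float)
    N = len(p)
    i = np.arange(1, N + 1)
    mu = np.sum(i * p)
    var = np.sum(p * (i - mu) ** 2)
    t = 2 * np.pi * var                            # 2*pi*sigma^2
    if t <= 1:
        return False, mu, var
    denom = t**2 - 1
    i1 = max(1, int(round((mu - 1) / denom)))      # nearest positive integer
    i2 = max(1, int(round((N - mu) / denom)))
    c1 = t**2 * i1**2 >= (i1 - 1 + mu) ** 2 + var * np.log(t)
    c2 = t**2 * i2**2 >= (i2 + N - mu) ** 2 + var * np.log(t)
    return (c1 and c2), mu, var

ok, mu, var = tight_bound_applies([0.5, 0.25, 0.125, 0.125])   # pmf of Example 8
print(ok, 0.5 * np.log2(2 * np.pi * np.e * var))               # True, ~2.12 bits
```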

Proof. Let X be an integer-valued random variable with alphabet 𝒳 = {1, 2, ..., N} such that µ and σ² are the mean and the variance of X, respectively. Choosing aᵢ = pᵢ and

bᵢ = (1/√(2πσ²)) exp( −(i − µ)²/(2σ²) )

in Lemma 1 yields

H(X) = H(p₁, p₂, ..., p_N) = −Σ_{i=1}^{N} pᵢ log pᵢ
     ≤ −Σ_{i=1}^{N} pᵢ log[ (1/√(2πσ²)) exp( −(i − µ)²/(2σ²) ) ]
     = Σ_{i=1}^{N} pᵢ [ ((i − µ)²/(2σ²)) log e + ½ log(2πσ²) ]
     = (log e)/(2σ²) Σ_{i=1}^{N} pᵢ (i − µ)² + ½ log(2πσ²) Σ_{i=1}^{N} pᵢ
     = (σ²/(2σ²)) log e + ½ log(2πσ²)
     = ½ log e + ½ log(2πσ²)
     = ½ log(2πeσ²),

provided that

(1/√(2πσ²)) Σ_{i=1}^{N} exp( −(i − µ)²/(2σ²) ) ≤ 1.          (3.7)

Now, using Lemma 2 (note that Im(t) > 0), we let t = j/(2πσ²) and ν = −tµ = −(j/(2πσ²))µ, and substitute these values in (3.5). This yields

√(t/j) exp( jπν²/t ) θ₃(ν|t)
  = (1/√(2πσ²)) exp( −µ²/(2σ²) ) Σ_{i=−∞}^{∞} exp( −(i² − 2µi)/(2σ²) )
  = (1/√(2πσ²)) Σ_{i=−∞}^{∞} exp( −(i² − 2µi + µ²)/(2σ²) )
  = (1/√(2πσ²)) Σ_{i=−∞}^{∞} exp( −(i − µ)²/(2σ²) )
  = θ₃( −µ | j2πσ² ).          (3.8)

Using (3.8), we can rewrite (3.7) as

θ₃( −µ | j2πσ² ) − (1/√(2πσ²)) Σ_{i≤0 or i≥N+1} exp( −(i − µ)²/(2σ²) ) ≤ 1.          (3.9)
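Identity (3.8) is essentially a Poisson-summation (theta-transformation) statement and can be sanity-checked numerically; the sketch below compares the Gaussian sum over the integers with the rapidly converging cosine series that appears in (3.9) and (3.10). The mean and variance values are arbitrary.

```python
import numpy as np

mu, var = 1.75, 0.6875                        # an arbitrary mean and variance

# Left-hand side of (3.8): the Gaussian density sampled at all integers
i = np.arange(-2000, 2001)
lhs = np.sum(np.exp(-(i - mu) ** 2 / (2 * var))) / np.sqrt(2 * np.pi * var)

# Right-hand side: theta_3(-mu | j*2*pi*var) = 1 + 2 sum_{k>=1} exp(-2 pi^2 var k^2) cos(2 pi mu k)
k = np.arange(1, 50)
rhs = 1 + 2 * np.sum(np.exp(-2 * np.pi**2 * var * k**2) * np.cos(2 * np.pi * mu * k))

print(lhs, rhs)                               # the two sums agree to many decimal places
```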

The Jacobi theta function on the left-hand side of (3.9) can be written as

θ₃( −µ | j2πσ² ) = Σ_{i=−∞}^{∞} exp( jπ(j2πσ²)i² − j2πµi )
  = Σ_{i=−∞}^{∞} exp(−2π²σ²i²) exp(−j2πµi)
  = Σ_{i=−∞}^{∞} exp(−2π²σ²i²) [ cos(2πµi) − j sin(2πµi) ]
  = Σ_{i=−∞}^{∞} exp(−2π²σ²i²) cos(2πµi) − j Σ_{i=−∞}^{∞} exp(−2π²σ²i²) sin(2πµi)
  = Σ_{i=−∞}^{∞} exp(−2π²σ²i²) cos(2πµi)
  = 2 Σ_{i=1}^{∞} exp(−2π²σ²i²) cos(2πµi) + 1,

where the imaginary part vanishes since sin(2πµi) + sin(2πµ(−i)) = 0 for every i. Now substituting the above expression in (3.9) yields

2 Σ_{i=1}^{∞} exp(−2π²σ²i²) cos(2πµi) ≤ (1/√(2πσ²)) Σ_{i≤0 or i≥N+1} exp( −(i − µ)²/(2σ²) )
  = (1/√(2πσ²)) Σ_{i=1}^{∞} [ exp( −(−i + 1 − µ)²/(2σ²) ) + exp( −(i + N − µ)²/(2σ²) ) ],          (3.10)

where the last equality is obtained by shifting the support of the index i as follows:

Σ_{i≥N+1} exp( −(i − µ)²/(2σ²) ) = Σ_{i=1}^{∞} exp( −(i + N − µ)²/(2σ²) )  and
Σ_{i≤0} exp( −(i − µ)²/(2σ²) ) = Σ_{i=1}^{∞} exp( −(−i + 1 − µ)²/(2σ²) ).

Sufficient conditions for (3.10) are as follows: for i = 1, 2, ..., we require

exp(−2π²σ²i²) ≤ (1/√(2πσ²)) exp( −max{ (−i + 1 − µ)², (i + N − µ)² } / (2σ²) ),

which, after taking natural logarithms and multiplying by 2σ², becomes

(2πσ²)² i² ≥ max{ (i − 1 + µ)², (i + N − µ)² } + σ² ln(2πσ²),

i.e.,

((2πσ²)² − 1) i² − 2(µ − 1) i − (µ − 1)² − σ² ln(2πσ²) ≥ 0  and
((2πσ²)² − 1) i² − 2(N − µ) i − (N − µ)² − σ² ln(2πσ²) ≥ 0.          (3.11)

Completing the square in (3.11) gives

((2πσ²)² − 1) ( i − (µ − 1)/((2πσ²)² − 1) )² − (µ − 1)² ( 1 + 1/((2πσ²)² − 1) ) − σ² ln(2πσ²) ≥ 0,
((2πσ²)² − 1) ( i − (N − µ)/((2πσ²)² − 1) )² − (N − µ)² ( 1 + 1/((2πσ²)² − 1) ) − σ² ln(2πσ²) ≥ 0.          (3.12)

The left-hand sides of the first and second inequalities in (3.12), viewed as functions of the positive integer i, achieve their minimum values precisely when i = i₁ and i = i₂, respectively, provided that 2πσ² > 1. Substituting these values in (3.11), we obtain the conditions stated in the theorem. ∎

Chapter 4

Numerical Examples

Before illustrating the bounds of the previous chapter with examples, we first prove the following result.

Corollary 2. The discrete entropy bound in (3.6) holds if

max{µ, N + 1 − µ}² < (2πσ²)² − σ² ln(2πσ²).          (4.1)

Proof. If (µ − 1)/((2πσ²)² − 1) and (N − µ)/((2πσ²)² − 1) are both less than 3/2, then i₁ = i₂ = 1. Equivalently, i₁ = i₂ = 1 if

max{µ, N + 1 − µ} < (3/2)(2πσ²)² − 1/2.          (4.2)

Note that max{µ, N + 1 − µ} ≥ (µ + N + 1 − µ)/2 implies that

max{µ, N + 1 − µ} ≥ (N + 1)/2.          (4.3)

From (4.2) and (4.3) we get that (3/2)(2πσ²)² − 1/2 > (N + 1)/2, so that (2πσ²)² > (N + 2)/3 ≥ 1 and hence 2πσ² > 1.

Now, for i₁ = i₂ = 1, the sufficient conditions of Theorem 3 become

(2πσ²)² ≥ max{µ, N + 1 − µ}² + σ² ln(2πσ²),

where ln(2πσ²) is positive; this is exactly condition (4.1). It remains to verify that (4.1) implies (4.2). Let θ = (2πσ²)² > 1; then

( (3θ − 1)/2 )² − θ = (1/4)(9θ − 1)(θ − 1) > 0,

so that ( (3(2πσ²)² − 1)/2 )² > (2πσ²)² ≥ (2πσ²)² − σ² ln(2πσ²) > max{µ, N + 1 − µ}², which gives (4.2). ∎

We now present several examples to evaluate the tightness of the upper bounds derived in the previous chapter.
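Condition (4.1) is simple to evaluate in code; the following sketch checks it for pmfs used in the examples below, with the Bernoulli alphabet relabelled to {1, 2}.

```python
import numpy as np

def corollary_condition(p):
    """Return (lhs, rhs, holds) for condition (4.1), with pmf p on {1, ..., N}."""
    p = np.asarray(p, dtype=float)
    N = len(p)
    i = np.arange(1, N + 1)
    mu = np.sum(i * p)
    var = np.sum(p * (i - mu) ** 2)
    lhs = max(mu, N + 1 - mu) ** 2
    rhs = (2 * np.pi * var) ** 2 - var * np.log(2 * np.pi * var)
    return lhs, rhs, lhs < rhs

for p in ([0.5, 0.5],                         # Bernoulli(1/2) relabelled to {1, 2}
          [0.25, 0.25, 0.25, 0.25],           # uniform on {1, 2, 3, 4}
          [0.5, 0.25, 0.125, 0.125]):         # the pmf of Example 8
    print(corollary_condition(p))
```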

Example 3. Let X be a Bernoulli random variable on the set 𝒳 = {0, 1} with parameter p = 1/2. Then H(X) = 1 bit, µ = 1/2 and σ² = (1/2)(1 − 1/2) = 1/4. The standard upper bound of Theorem 2 yields

H(X) ≤ ½ log[ 2πe ( Var(X) + 1/12 ) ] = ½ log[ 2πe ( 1/4 + 1/12 ) ] ≈ 1.255 bits.

To apply Theorem 3, relabel the alphabet as {1, 2} (which leaves the entropy and the variance unchanged), so that µ = 3/2 and N = 2. Corollary 2 is satisfied since max{µ², (N + 1 − µ)²} = max{2.25, 2.25} = 2.25 and (2πσ²)² − σ² ln(2πσ²) ≈ 2.35. Thus, the upper bound of Theorem 3 yields

H(X) ≤ ½ log( 2πe · 1/4 ) ≈ 1.047 bits,

which is closer to the true entropy H(X) = 1 bit than the standard bound.
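The three numbers in Example 3 can be reproduced directly; a minimal sketch, where the printed values are the exact entropy, the standard bound (3.1) and the tightened bound (3.6), all in bits.

```python
import numpy as np

var = 0.25                                          # variance of a Bernoulli(1/2)
H_exact = 1.0                                       # H(X) = 1 bit
std_bound = 0.5 * np.log2(2 * np.pi * np.e * (var + 1 / 12))
tight_bound = 0.5 * np.log2(2 * np.pi * np.e * var)
print(H_exact, std_bound, tight_bound)              # 1.0, ~1.255, ~1.047
```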

Example 4. Let X be a uniformly distributed discrete random variable on the set 𝒳 = {1, 2, 3, 4}. Then

H(X) = −4 · (1/4) log(1/4) = 2 bits.

Also, µ = E(X) = 5/2 and σ² = E(X²) − (E(X))² = 15/12 = 1.25. Corollary 2 is satisfied since

max{µ², (N + 1 − µ)²} = max{6.25, 6.25} = 6.25 < (2πσ²)² − σ² ln(2πσ²) ≈ 59.1.

The standard upper bound of Theorem 2 yields

H(X) ≤ ½ log( 2πe ( Var(X) + 1/12 ) ) = ½ log( 2πe ( 1.25 + 1/12 ) ) ≈ 2.255 bits.

On the other hand, the bound of Theorem 3 is tighter, as

H(X) ≤ ½ log( 2πe (1.25) ) ≈ 2.208 bits.

Example 5. Let X be a binomial random variable on the set 𝒳 = {1, 2, 3, 4, ..., n + 1} with parameter p. The mean and the variance are µ = np + 1 and σ² = npq, respectively, where q = 1 − p. Examining (4.1), we have

max{µ, N + 1 − µ} = max{np + 1, (n + 2) − (np + 1)} = max{np + 1, n + 1 − np} = max{np + 1, n(1 − p) + 1} = max{np + 1, nq + 1} = n·max(p, q) + 1.

Thus, Corollary 2 holds if

( n·max(p, q) + 1 )² < (2πnpq)² − npq ln(2πnpq).          (4.4)

This condition is satisfied for n sufficiently large if max{p, q} < 2πpq, or equivalently if min{p, 1 − p} > 1/(2π) ≈ 0.159. Calculating the standard bound and the new bound, we get

H(X) ≤ ½ log[ 2πe ( np(1 − p) + 1/12 ) ]  and  H(X) ≤ ½ log[ 2πe np(1 − p) ],

respectively. When p equals 1/2, condition (4.4) reduces to

( n/2 + 1 )² < ( πn/2 )² − (n/4) ln(πn/2).

Note that this is satisfied for any n ≥ 1. In this case, the tightened bound becomes

H(X) ≤ ½ log( πen/2 ),  n = 1, 2, 3, ....

Example 6. Let X be a discrete random variable defined on the set 𝒳 = {1, 2, 3, 4}

with the pmf (1/2, 1/4, 1/4, 0). Then H(X) = 1.5 bits. The mean and the variance are µ = E(X) = Σₓ x p(x) = 1.75 and σ² = E(X²) − (E(X))² = 0.6875, respectively. The standard upper bound yields

H(X) ≤ ½ log( 2πe ( 0.6875 + 1/12 ) ) ≈ 1.86 bits.

Corollary 2 is satisfied since max{µ², (N + 1 − µ)²} = max{3.06, 10.56} < 17.65. Thus, the tightened upper bound yields

H(X) ≤ ½ log( 2πe (0.6875) ) ≈ 1.77 bits,

which is slightly closer to the real value of H(X) than the standard upper bound.

Example 7. Let X be a discrete random variable on the set 𝒳 = {1, 2, 3, 4, 5} with pmf (1/2, 1/8, 1/8, 1/8, 1/8). Then H(X) = 2 bits. The mean is µ = 2.25 and the variance is σ² = 2.1875. The standard upper bound gives H(X) ≤ 2.64 bits. Here again Corollary 2 holds, since max{5.0625, 14.0625} < 183.2. This yields the tighter upper bound H(X) ≤ 2.61 bits.

Example 8. Let X be a discrete random variable on the set 𝒳 = {1, 2, 3, 4} with pmf (1/2, 1/4, 1/8, 1/8). Then H(X) = 1.75 bits. The mean µ = 1.875 and the variance

σ² = 1.1094. The standard upper bound gives H(X) ≤ 2.17 bits. Corollary 2 holds since max{3.52, 9.77} < 46.43, yielding the tightened bound H(X) ≤ 2.12 bits.

Example 9. Consider X to be a binomial random variable on the set {1, 2, ..., n + 1} with n ≤ 30. Figures 4.1–4.3 show the exact value and the various bounds of the discrete entropy of X with p = .1, p = .3, and p = .5. The bounds obtained after reordering the set of probability masses, as described in Section 3.1, are also included. Note that all values in the figures are normalized by subtracting the term (1/2) log(n).

Observations

- For n larger than a certain threshold, which is a function of p, the entropy of the discrete random variable increases monotonically towards the tightened bound. The convergence rate is smaller for smaller values of p.
- Reordering the set of probability masses gives improved bounds except for p = .5, in which case the bounds for the reordered set of probabilities are identical to the original ones. The improvement seems to be greater for smaller values of p.
- For p = .3, the tighter bound is guaranteed to be applicable for n ≥ 2, and Fig. 4.2 shows that it is also valid for n = 1. The conditions guaranteeing the applicability of the new bound are violated for the case of p = .1; however, Fig. 4.1 shows that the bound is valid for n ≥ 4. This suggests the possibility of generalizing the sufficient conditions under which the bound is applicable.
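The computation behind Figures 4.1–4.3 can be reproduced along the following lines; a sketch in which the binomial pmf is built from math.comb, the parameter values are illustrative, and each quantity is normalized by ½ log₂ n as in the text.

```python
import math
import numpy as np

def entropy_bits(p):
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

p_succ = 0.3
for n in (5, 10, 20, 30):
    pmf = np.array([math.comb(n, k) * p_succ**k * (1 - p_succ)**(n - k)
                    for k in range(n + 1)])          # support {0,...,n}, a shift of {1,...,n+1}
    var = n * p_succ * (1 - p_succ)
    norm = 0.5 * np.log2(n)
    print(n,
          entropy_bits(pmf) - norm,                                  # exact entropy
          0.5 * np.log2(2 * np.pi * np.e * (var + 1 / 12)) - norm,   # standard bound (3.1)
          0.5 * np.log2(2 * np.pi * np.e * var) - norm)              # tightened bound (3.6)
```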

Figure 4.1: Entropy and the different bounds for p = .1

Figure 4.2: Entropy and the different bounds for p = .3

Figure 4.3: Entropy and the different bounds for p = .5

Chapter 5

Conclusion

In this report, two upper bounds on discrete entropy were derived. The standard upper bound on discrete entropy was shown using the differential entropy bound for a Gaussian random variable [3]. The new upper bound [5] was developed based on Shannon's inequality [1] and the transformation formula of the Jacobi theta function [6]. The tightened bound, which is an improvement over the standard bound, is only applicable if the probability masses of the discrete random variable satisfy certain conditions. Some examples were provided in the last chapter to show the applicability and the practical value of the two bounds.

A possible direction for future work is to calculate the tightened bound for a set of discrete random variables using their joint probability mass function. Another interesting direction is to relax the sufficient conditions under which the tightened bound holds.

Bibliography

[1] J. Aczél. On Shannon's inequality, optimal coding, and characterizations of Shannon's and Rényi's entropies. Technical Report CS-73-05, Dept. of Applied Analysis and Computer Science, University of Waterloo, Waterloo, Ont., Canada, Jan. 1973.

[2] T. M. Cover and J. A. Thomas. Elements of Information Theory.

[3] T. M. Cover and J. A. Thomas. Elements of Information Theory. New York: Wiley.

[4] A. G. Djackov. On a search model of false coins. In Topics in Information Theory (Colloquia Mathematica Societatis János Bolyai 16, Keszthely, Hungary), 1975.

[5] W. H. Mow. A tight upper bound on discrete entropy. IEEE Transactions on Information Theory, vol. 44, no. 2, Mar. 1998.

[6] H. Rademacher. Topics in Analytic Number Theory. Berlin, Germany: Springer-Verlag, 1973.
