H(X) = p log(1/p) + (1 − p) log(1/(1 − p)). With a slight abuse of notation, we denote this quantity by H(p) and refer to it as the binary entropy function.


LECTURE 2
Information Measures

2.1 ENTROPY

Let X be a discrete random variable on an alphabet 𝒳 drawn according to the probability mass function (pmf) p(x) = P(X = x), x ∈ 𝒳, denoted in short as X ~ p(x). The uncertainty about the outcome of X, or equivalently, the amount of information gained by observing X, is measured by its entropy

H(X) = −∑_{x∈𝒳} p(x) log p(x) = E[log(1/p(X))].

By continuity, we use the convention 0 log 0 = 0 in the above summation. Sometimes we denote H(X) by H(p(x)), highlighting the fact that H(X) is a functional of the pmf p(x).

Example 2.1. If X is a Bernoulli random variable with parameter p = P{X = 1} ∈ [0,1] (in short, X ~ Bern(p)), then

H(X) = p log(1/p) + (1 − p) log(1/(1 − p)).

With a slight abuse of notation, we denote this quantity by H(p) and refer to it as the binary entropy function.

The entropy H(X) satisfies the following properties.

1. H(X) ≥ 0.
2. H(X) is a concave function in p(x).
3. H(X) ≤ log |𝒳|.

The first property is trivial. The proof of the second property is left as an exercise. For the proof of the third property, we recall the following.

Lemma 2.1 (Jensen's inequality). If f(x) is convex, then E(f(X)) ≥ f(E(X)). If f(x) is concave, then E(f(X)) ≤ f(E(X)).
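As a quick numerical illustration, here is a minimal Python sketch (not part of the lecture itself; the helper names entropy and binary_entropy are ours) that computes H(X) for a pmf and the binary entropy function H(p), and checks Property 3 against the uniform pmf.

```python
import numpy as np

def entropy(p, base=2):
    """H(X) = sum_x p(x) log(1/p(x)), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                          # drop zero-probability symbols
    return -np.sum(p[nz] * np.log(p[nz])) / np.log(base)

def binary_entropy(p, base=2):
    """H(p) = p log(1/p) + (1-p) log(1/(1-p))."""
    return entropy([p, 1.0 - p], base=base)

if __name__ == "__main__":
    print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits
    print(binary_entropy(0.11))                 # ~0.5 bits
    # Property 3: H(X) <= log |X|, with equality for the uniform pmf.
    print(entropy([0.25] * 4), np.log2(4))      # both equal 2.0
```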

Now by the concavity of the logarithm function and Jensen's inequality,

H(X) = E[log(1/p(X))] ≤ log E[1/p(X)] = log ∑_{x: p(x)≠0} p(x)·(1/p(x)) = log |{x : p(x) ≠ 0}| ≤ log |𝒳|,

where the last inequality follows since |{x : p(x) ≠ 0}| ≤ |𝒳|.

Let (X,Y) be a pair of discrete random variables. Then the conditional entropy of Y given X is defined as

H(Y|X) = ∑_x p(x) H(p(y|x)) = E[log(1/p(Y|X))],

where p(y|x) = p(x,y)/p(x) is the conditional pmf of Y given {X = x}. We sometimes use the notation H(Y|X = x) = H(p(y|x)), x ∈ 𝒳. By the concavity of H(p(y)) in p(y) and Jensen's inequality,

∑_x p(x) H(p(y|x)) ≤ H(∑_x p(x) p(y|x)) = H(p(y)),

where the inequality holds with equality if p(y|x) ≡ p(y), or equivalently, X and Y are independent. We summarize this relationship between the conditional and unconditional entropies as follows.

Conditioning reduces entropy.

H(Y|X) ≤ H(Y)   (2.1)

with equality if X and Y are independent.

Let (X,Y) ~ p(x,y) be a pair of discrete random variables. Their joint entropy is

H(X,Y) = E[log(1/p(X,Y))].

By the chain rule of probability p(x,y) = p(x)p(y|x) = p(y)p(x|y), we have the chain rule of entropy

H(X,Y) = E[log(1/p(X))] + E[log(1/p(Y|X))] = H(X) + H(Y|X) = H(Y) + H(X|Y).
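The following sketch (again ours, with an illustrative joint pmf) checks (2.1) and the chain rule H(X,Y) = H(X) + H(Y|X) on a small example.

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf given as any array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A joint pmf p(x, y) on a 2 x 3 alphabet (rows index x, columns index y).
p_xy = np.array([[0.30, 0.10, 0.10],
                 [0.05, 0.25, 0.20]])
p_x = p_xy.sum(axis=1)      # marginal p(x)
p_y = p_xy.sum(axis=0)      # marginal p(y)

# Conditional entropy H(Y|X) = sum_x p(x) H(p(y|x)).
H_Y_given_X = sum(p_x[i] * H(p_xy[i] / p_x[i]) for i in range(len(p_x)))

print(H_Y_given_X, H(p_y))              # conditioning reduces entropy: H(Y|X) <= H(Y)
print(H(p_xy), H(p_x) + H_Y_given_X)    # chain rule: H(X,Y) = H(X) + H(Y|X)
```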

More generally, for an n-tuple of random variables X^n = (X_1, X_2, ..., X_n), we have the following.

Chain rule of entropy.

H(X^n) = H(X_1) + H(X_2|X_1) + ⋯ + H(X_n|X^{n−1}) = ∑_{i=1}^{n} H(X_i|X^{i−1}),

where X^0 is set to be an unspecified constant by convention.

By the chain rule and (2.1), we can upper bound the joint entropy as

H(X^n) ≤ ∑_{i=1}^{n} H(X_i)

with equality if X_1, ..., X_n are mutually independent.

2.2 RELATIVE ENTROPY

Let p(x) and q(x) be a pair of pmfs on 𝒳. The extent of discrepancy between p(x) and q(x) is measured by their relative entropy (also referred to as Kullback–Leibler divergence)

D(p‖q) = D(p(x)‖q(x)) = E_p[log(p(X)/q(X))] = ∑_{x∈𝒳} p(x) log(p(x)/q(x)),   (2.2)

where the expectation is taken w.r.t. X ~ p(x). Note that this quantity is well defined only when p(x) is absolutely continuous w.r.t. q(x), namely, p(x) = 0 whenever q(x) = 0. Otherwise, we define D(p‖q) = ∞, which follows by adopting the convention 1/0 = ∞ as well.

The relative entropy D(p‖q) satisfies the following properties.

1. D(p‖q) ≥ 0 with equality if and only if (iff) p ≡ q, namely, p(x) = q(x) for every x ∈ 𝒳.
2. D(p‖q) is not symmetric, i.e., D(p‖q) ≠ D(q‖p) in general.
3. D(p‖q) is convex in (p,q), i.e., for any (p_1,q_1), (p_2,q_2), and λ̄ = 1 − λ, λ ∈ [0,1],

λ D(p_1‖q_1) + λ̄ D(p_2‖q_2) ≥ D(λp_1 + λ̄p_2 ‖ λq_1 + λ̄q_2).
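A minimal sketch (ours) of D(p‖q) for pmfs, returning ∞ when absolute continuity fails and illustrating Properties 1 and 2 numerically:

```python
import numpy as np

def kl_divergence(p, q, base=2):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)); +inf if p is not absolutely continuous w.r.t. q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):      # some p(x) > 0 where q(x) = 0
        return np.inf
    nz = p > 0                          # terms with p(x) = 0 contribute 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz])) / np.log(base)

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
print(kl_divergence(p, q), kl_divergence(q, p))          # nonnegative, but not symmetric
print(kl_divergence(p, p))                               # 0 iff p = q
print(kl_divergence([0.5, 0.5, 0.0], [0.5, 0.0, 0.5]))   # inf: absolute continuity fails
```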

4. Chain rule. For any p(x,y) and q(x,y),

D(p(x,y)‖q(x,y)) = D(p(x)‖q(x)) + ∑_x p(x) D(p(y|x)‖q(y|x)) = D(p(x)‖q(x)) + E_p[D(p(y|X)‖q(y|X))].

The proof of the first three properties is left as an exercise. For the fourth property, consider

D(p(x,y)‖q(x,y)) = E_p[log(p(X,Y)/q(X,Y))] = E_p[log(p(X)/q(X))] + E_p[log(p(Y|X)/q(Y|X))].

The notion of relative entropy can be extended to arbitrary probability measures P and Q defined on the same sample space and set of events as

D(P‖Q) = ∫ log(dP/dQ) dP,

where dP/dQ is the Radon–Nikodym derivative of P w.r.t. Q. (If P is not absolutely continuous w.r.t. Q, then D(P‖Q) = ∞.) In particular, if P and Q have respective densities p and q w.r.t. a σ-finite measure μ (such as the Lebesgue and counting measures), then

D(P‖Q) = D(p‖q) = ∫ p(x) log(p(x)/q(x)) dμ(x).   (2.3)

This expression generalizes (2.2) since probability mass functions can be viewed as densities w.r.t. the counting measure. When μ is the Lebesgue measure (or equivalently, p and q are densities of continuous distributions on the Euclidean space), we follow the standard convention of denoting dμ(x) by dx in (2.3).

2.3 f-DIVERGENCE

We digress a bit to generalize the notion of relative entropy. Let f : [0,∞) → ℝ be convex with f(1) = 0. Then the f-divergence between a pair of densities p and q w.r.t. μ is defined as

D_f(p‖q) = ∫ q(x) f(p(x)/q(x)) dμ(x) = E_q[f(p(X)/q(X))].

Example 2.2. Let f(u) = u log u. Then

D_f(p‖q) = ∫ q(x) (p(x)/q(x)) log(p(x)/q(x)) dμ(x) = ∫ p(x) log(p(x)/q(x)) dμ(x) = D(p‖q).
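A sketch (ours) of D_f(p‖q) for pmfs with q(x) > 0 everywhere, verifying Example 2.2 numerically; the function names are illustrative.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(p||q) = sum_x q(x) f(p(x)/q(x)) for pmfs with q(x) > 0 everywhere."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(q * f(p / q))

def kl(p, q):
    """D(p||q) in nats, assuming p is absolutely continuous w.r.t. q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

# Example 2.2: f(u) = u log u recovers the relative entropy D(p||q) (here in nats).
print(f_divergence(p, q, lambda u: u * np.log(u)), kl(p, q))
# f(u) = -log u gives the reverse divergence D(q||p) instead.
print(f_divergence(p, q, lambda u: -np.log(u)), kl(q, p))
```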

Example 2.3. Now let f(u) = −log u. Then

D_f(p‖q) = −∫ q(x) log(p(x)/q(x)) dμ(x) = ∫ q(x) log(q(x)/p(x)) dμ(x) = D(q‖p).

Example 2.4. Combining the above two cases, let f(u) = (u − 1) log u. Then

D_f(p‖q) = D(p‖q) + D(q‖p),

which is symmetric in (p,q).

Many basic distance functions on probability measures can be represented as f-divergences; see, for example, Liese and Vajda (2006).

2.4 MUTUAL INFORMATION

Let (X,Y) be a pair of discrete random variables with joint pmf p(x,y) = p(x)p(y|x). The amount of information about one provided by the other is measured by their mutual information

I(X;Y) = D(p(x,y)‖p(x)p(y))
       = ∑_{x,y} p(x,y) log[ p(x,y) / (p(x)p(y)) ]
       = ∑_x p(x) ∑_y p(y|x) log[ p(y|x)/p(y) ]
       = ∑_x p(x) D(p(y|x)‖p(y)).

The mutual information I(X;Y) satisfies the following properties.

1. I(X;Y) is a nonnegative function of p(x,y).
2. I(X;Y) = 0 iff X and Y are independent, i.e., p(x,y) ≡ p(x)p(y).
3. As a function of (p(x), p(y|x)), I(X;Y) is concave in p(x) for a fixed p(y|x), and convex in p(y|x) for a fixed p(x).
4. Mutual information and entropy.

I(X;X) = H(X)

and

I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X,Y).
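Property 4 can be checked numerically; a sketch (ours), on the same small illustrative joint pmf used earlier:

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.30, 0.10, 0.10],
                 [0.05, 0.25, 0.20]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# I(X;Y) = D(p(x,y) || p(x)p(y)) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ].
I = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))

# Property 4: I(X;Y) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y|X).
print(I)
print(H(p_x) + H(p_y) - H(p_xy))
H_Y_given_X = sum(p_x[i] * H(p_xy[i] / p_x[i]) for i in range(len(p_x)))
print(H(p_y) - H_Y_given_X)
```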

5. Variational characterization.

I(X;Y) = min_{q(y)} ∑_x p(x) D(p(y|x)‖q(y)),   (2.4)

where the minimum is attained by q(y) ≡ p(y).

The proof of the first four properties is left as an exercise. For the fifth property, consider

I(X;Y) = ∑_{x,y} p(x,y) log[ p(y|x)/p(y) ]
       = ∑_{x,y} p(x)p(y|x) log[ (p(y|x)/q(y)) · (q(y)/p(y)) ]
       = ∑_x p(x) D(p(y|x)‖q(y)) − ∑_y p(y) log[ p(y)/q(y) ]
       ≤ ∑_x p(x) D(p(y|x)‖q(y)),

where the last inequality follows since the subtracted term is equal to D(p(y)‖q(y)) ≥ 0, and holds with equality iff p(y) ≡ q(y).
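A numerical check (ours) of the variational characterization (2.4) for an illustrative binary-input channel p(y|x): the objective equals I(X;Y) at q(y) = p(y) and is strictly larger for any other q(y).

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

p_x = np.array([0.4, 0.6])              # input pmf p(x)
p_y_given_x = np.array([[0.9, 0.1],     # rows: p(y|x) for x = 0, 1
                        [0.2, 0.8]])
p_y = p_x @ p_y_given_x                 # output marginal p(y)

def objective(q_y):
    """sum_x p(x) D(p(y|x) || q(y))."""
    return sum(p_x[i] * kl(p_y_given_x[i], q_y) for i in range(len(p_x)))

I = objective(p_y)                      # minimum value = I(X;Y)
print(I)
# Any other q(y) gives a larger value, since the extra term D(p(y) || q(y)) >= 0.
for q in ([0.5, 0.5], [0.3, 0.7], [0.8, 0.2]):
    print(q, objective(np.array(q)) >= I)
```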

Sometimes we are interested in the maximum mutual information max_{p(x)} I(X;Y) of a conditional pmf p(y|x), which is referred to as the information capacity (or the capacity in short). By the variational characterization in (2.4), the information capacity can be expressed as

max_{p(x)} I(X;Y) = max_{p(x)} min_{q(y)} ∑_x p(x) D(p(y|x)‖q(y)),

which can be viewed as a game between two players, one choosing p(x) first and the other choosing q(y) next, with the payoff function

f(p(x), q(y)) = ∑_x p(x) D(p(y|x)‖q(y)).

Using the following fundamental result in game theory, we show that the order of plays can be exchanged without affecting the outcome of the game.

Minimax theorem (Sion 1958). Suppose that U and V are compact convex subsets of the Euclidean space, and that a real-valued continuous function f(u,v) on U × V is concave in u for each v and convex in v for each u. Then

max_{u∈U} min_{v∈V} f(u,v) = min_{v∈V} max_{u∈U} f(u,v).

Since f(p(x), q(y)) is linear (thus concave) in p(x) and convex in q(y) (recall Property 3 of relative entropy), we can apply the minimax theorem and conclude that

max_{p(x)} I(X;Y) = max_{p(x)} min_{q(y)} ∑_x p(x) D(p(y|x)‖q(y))
                  = min_{q(y)} max_{p(x)} ∑_x p(x) D(p(y|x)‖q(y))
                  = min_{q(y)} max_x D(p(y|x)‖q(y)),

where the last equality follows by noting that the maximum expectation is attained by putting all the weight on the value x that maximizes D(p(y|x)‖q(y)). Furthermore, if p*(x) attains the maximum, then by the optimality condition of (2.4),

q*(y) = ∑_x p*(x) p(y|x)

attains the minimum.

2.5 ENTROPY RATE

Let X = (X_n)_{n=1}^∞ be a random process on a finite alphabet 𝒳. The amount of uncertainty per symbol is measured by its entropy rate

H(X) = lim_{n→∞} (1/n) H(X_1, ..., X_n),

if the limit exists.

Example 2.5. If X is stationary, then the limit exists and H(X) = lim_{n→∞} H(X_n | X^{n−1}).

Example 2.6. If X is an aperiodic irreducible Markov chain, then the limit exists and

H(X) = lim_{n→∞} H(X_n | X_{n−1}) = ∑_{x_1} π(x_1) H(p(x_2 | x_1)),

where π is the unique stationary distribution of the chain.

Example 2.7. If X_1, X_2, ... are independent and identically distributed (i.i.d.), then H(X) = H(X_1).

Example 2.8. Let Y = (Y_n)_{n=1}^∞ be a stationary Markov chain and X_n = f(Y_n), n = 1, 2, .... The resulting random process X = (X_n)_{n=1}^∞ is hidden Markov and its entropy rate satisfies

H(X_n | X^{n−1}, Y_1) ≤ H(X) ≤ H(X_n | X^{n−1})

and

H(X) = lim_{n→∞} H(X_n | X^{n−1}, Y_1) = lim_{n→∞} H(X_n | X^{n−1}).
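To illustrate Example 2.6, here is a sketch (ours, with an illustrative transition matrix) that computes the entropy rate of a two-state Markov chain as ∑_x π(x) H(p(·|x)).

```python
import numpy as np

def H(p):
    p = np.asarray(p, float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Transition matrix P[i, j] = p(x_2 = j | x_1 = i) of an aperiodic irreducible chain.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution pi: the left eigenvector of P with eigenvalue 1, normalized.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()

rate = sum(pi[i] * H(P[i]) for i in range(len(pi)))   # H(X) = sum_x pi(x) H(p(.|x))
print(pi)      # [0.8, 0.2] for this chain
print(rate)    # entropy rate in bits per symbol
```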

2.6 RELATIVE ENTROPY RATE

Let P and Q be two probability measures on 𝒳^∞ with n-th order densities p(x^n) and q(x^n), respectively. The normalized discrepancy between them is measured by their relative entropy rate

D(P‖Q) = lim_{n→∞} (1/n) D(p(x^n) ‖ q(x^n)),

if the limit exists.

Example 2.9. If P is stationary, Q is stationary finite-order Markov, and P is absolutely continuous w.r.t. Q, then the limit exists and

D(P‖Q) = lim_{n→∞} ∫ p(x^{n−1}) D(p(x_n | x^{n−1}) ‖ q(x_n | x^{n−1})) dx^{n−1}.   (2.5)

See, for example, Barron (1985) and Gray (2011, Lemma 10.1).

Example 2.10. Similarly, if P and Q are stationary hidden Markov and P is absolutely continuous w.r.t. Q, then the limit exists and (2.5) holds (Juang and Rabiner 1985, Leroux 1992, Ephraim and Merhav 2002).
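For the special case where P itself is a stationary first-order Markov chain and Q is first-order Markov (an assumption of this sketch, not stated above), the limit in (2.5) reduces to ∑_x π_P(x) D(p(·|x) ‖ q(·|x)), with π_P the stationary distribution of P. A minimal sketch (ours) under that assumption:

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):
        return np.inf
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

def stationary(P):
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return pi / pi.sum()

# Transition matrices of two binary Markov chains (rows are conditional pmfs).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
Q = np.array([[0.7, 0.3],
              [0.3, 0.7]])

pi = stationary(P)
rate = sum(pi[i] * kl(P[i], Q[i]) for i in range(len(pi)))
print(rate)    # relative entropy rate D(P||Q) in bits per symbol, under the Markov assumption
```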

PROBLEMS

2.1. Prove Property 2 of entropy in Section 2.1 and find the equality condition for Property 3.

2.2. Prove Properties 1 through 3 of relative entropy in Section 2.2.

2.3. Entropy and relative entropy. Let 𝒳 be a finite alphabet and q(x) be the uniform pmf on 𝒳. Show that for any pmf p(x) on 𝒳, D(p‖q) = log |𝒳| − H(p(x)).

2.4. The total variation distance between two pmfs p(x) and q(x) is defined as

δ(p,q) = (1/2) ∑_{x∈𝒳} |p(x) − q(x)|.

(a) Show that this distance is an f-divergence by finding the corresponding f.
(b) Show that δ(p,q) = max_{A⊆𝒳} |P(A) − Q(A)|, where P and Q are the corresponding probability measures, e.g., P(A) = ∑_{x∈A} p(x).

2.5. Pinsker's inequality. Show that

δ(p,q) ≤ sqrt( D(p‖q) / (2 log e) ),

where the logarithm has the same base as the relative entropy. (Hint: First consider the case that 𝒳 is binary.)

2.6. Let p(x,y) be a joint pmf on 𝒳 × 𝒴. Show that p(x,y) is absolutely continuous w.r.t. p(x)p(y).

2.7. Prove Properties 1 through 4 of mutual information in Section 2.4.

2.8. Let X = (X_n)_{n=1}^∞ be a stationary random process. Show that H(X) = lim_{n→∞} H(X_n | X^{n−1}).

2.9. Let Y = (Y_n)_{n=1}^∞ be a stationary Markov chain and X = {g(Y_n)} be a hidden Markov process. Show that

H(X_n | X^{n−1}, Y_1) ≤ H(X) ≤ H(X_n | X^{n−1})

and conclude that

H(X) = lim_{n→∞} H(X_n | X^{n−1}, Y_1) = lim_{n→∞} H(X_n | X^{n−1}).

2.10. Recurrence time. Let X_0, X_1, X_2, ... be i.i.d. copies of X ~ p(x), and let N = min{n ≥ 1 : X_n = X_0} be the waiting time to the next occurrence of X_0.
(a) Show that E(N) = |𝒳|.
(b) Show that E(log N) ≤ H(X).

2.11. The past and the future. Let X = (X_n)_{n=1}^∞ be stationary. Show that

lim_{n→∞} (1/n) I(X_1, ..., X_n; X_{n+1}, ..., X_{2n}) = 0.

2.12. Variable-duration symbols. A discrete memoryless source has the alphabet {1, 2}, where symbol 1 has duration 1 and symbol 2 has duration 2. Let X = (X_n)_{n=1}^∞ be the resulting random process.
(a) Find its entropy rate H(X) in terms of the probability p of symbol 1.
(b) Find the maximum entropy rate by optimizing over p.


Bibliography

Barron, A. R. (1985). The strong ergodic theorem for densities: Generalized Shannon–McMillan–Breiman theorem. Ann. Probab., 13(4).

Ephraim, Y. and Merhav, N. (2002). Hidden Markov processes. IEEE Trans. Inf. Theory, 48(6).

Gray, R. M. (2011). Entropy and Information Theory. Springer, New York.

Juang, B.-H. F. and Rabiner, L. R. (1985). A probabilistic distance measure for hidden Markov models. AT&T Tech. J., 64(2).

Leroux, B. G. (1992). Maximum-likelihood estimation for hidden Markov models. Stoch. Proc. Appl., 40(1).

Liese, F. and Vajda, I. (2006). On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory, 52(10).

Sion, M. (1958). On general minimax theorems. Pacific J. Math., 8.
