LECTURE 2
Information Measures

2.1 ENTROPY

Let $X$ be a discrete random variable on an alphabet $\mathcal{X}$ drawn according to the probability mass function (pmf) $p(x) = P(X = x)$, $x \in \mathcal{X}$, denoted in short as $X \sim p(x)$. The uncertainty about the outcome of $X$, or equivalently, the amount of information gained by observing $X$, is measured by its entropy
$$H(X) = \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{p(x)} = E\left[\log \frac{1}{p(X)}\right].$$
By continuity, we use the convention $0 \log 0 = 0$ in the above summation. Sometimes we denote $H(X)$ by $H(p(x))$, highlighting the fact that $H(X)$ is a functional of the pmf $p(x)$.

Example 2.1. If $X$ is a Bernoulli random variable with parameter $p = P\{X = 1\} \in [0,1]$ (in short, $X \sim \mathrm{Bern}(p)$), then
$$H(X) = p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p}.$$
With a slight abuse of notation, we denote this quantity by $H(p)$ and refer to it as the binary entropy function.
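As a concrete illustration of these definitions, here is a minimal numerical sketch in Python with NumPy (the helper names `entropy` and `binary_entropy` are our own, not from the notes):

```python
import numpy as np

def entropy(p, base=2):
    """Entropy H(X) = -sum_x p(x) log p(x), using the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                          # drop zero-mass symbols: 0 log 0 = 0
    return float(-np.sum(nz * np.log(nz)) / np.log(base))

def binary_entropy(p):
    """Binary entropy function H(p) for X ~ Bern(p), in bits."""
    return entropy([p, 1.0 - p])

print(binary_entropy(0.5))             # 1.0 bit, the maximum
print(round(binary_entropy(0.11), 3))  # ~0.5 bits
print(entropy([0.25] * 4))             # 2.0 bits = log2(4) for a uniform pmf on 4 symbols
```

The last line previews the third property below: a uniform pmf attains $\log|\mathcal{X}|$.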
The entropy $H(X)$ satisfies the following properties.
1. $H(X) \ge 0$.
2. $H(X)$ is a concave function in $p(x)$.
3. $H(X) \le \log|\mathcal{X}|$.
The first property is trivial. The proof of the second property is left as an exercise. For the proof of the third property, we recall the following.

Lemma 2.1 (Jensen's inequality). If $f(x)$ is convex, then $E(f(X)) \ge f(E(X))$. If $f(x)$ is concave, then $E(f(X)) \le f(E(X))$.

Now by the concavity of the logarithm function and Jensen's inequality,
$$H(X) = E\left[\log \frac{1}{p(X)}\right] \le \log E\left[\frac{1}{p(X)}\right] \le \log|\mathcal{X}|,$$
where the last inequality follows since
$$E\left[\frac{1}{p(X)}\right] = \sum_{x :\, p(x) > 0} p(x) \frac{1}{p(x)} = |\{x : p(x) > 0\}| \le |\mathcal{X}|.$$

Let $(X,Y)$ be a pair of discrete random variables. Then the conditional entropy of $Y$ given $X$ is defined as
$$H(Y|X) = \sum_{x} p(x) H(p(y|x)) = E\left[\log \frac{1}{p(Y|X)}\right],$$
where $p(y|x) = p(x,y)/p(x)$ is the conditional pmf of $Y$ given $\{X = x\}$. We sometimes use the notation $H(Y|X = x) = H(p(y|x))$, $x \in \mathcal{X}$. By the concavity of $H(p(y))$ in $p(y)$ and Jensen's inequality,
$$\sum_{x} p(x) H(p(y|x)) \le H\Big(\sum_{x} p(x) p(y|x)\Big) = H(p(y)),$$
where the inequality holds with equality iff $p(y|x) \equiv p(y)$, or equivalently, $X$ and $Y$ are independent. We summarize this relationship between the conditional and unconditional entropies as follows.

Conditioning reduces entropy.
$$H(Y|X) \le H(Y) \tag{2.1}$$
with equality iff $X$ and $Y$ are independent.

Let $(X,Y) \sim p(x,y)$ be a pair of discrete random variables. Their joint entropy is
$$H(X,Y) = E\left[\log \frac{1}{p(X,Y)}\right].$$
By the chain rule of probability $p(x,y) = p(x)p(y|x) = p(y)p(x|y)$, we have the chain rule of entropy
$$H(X,Y) = E\left[\log \frac{1}{p(X)}\right] + E\left[\log \frac{1}{p(Y|X)}\right] = H(X) + H(Y|X) = H(Y) + H(X|Y).$$
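Before generalizing to $n$-tuples, a quick numerical check of the chain rule and of inequality (2.1) on an arbitrary joint pmf (a sketch reusing the `entropy` helper defined above; the example pmf is our own):

```python
import numpy as np

# An arbitrary joint pmf p(x, y) on a 2 x 3 alphabet (rows: x, columns: y)
p_xy = np.array([[0.10, 0.30, 0.10],
                 [0.20, 0.05, 0.25]])
p_x = p_xy.sum(axis=1)                 # marginal pmf p(x)
p_y = p_xy.sum(axis=0)                 # marginal pmf p(y)

# Conditional entropy H(Y|X) = sum_x p(x) H(p(y|x))
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))

H_XY = entropy(p_xy.ravel())           # joint entropy H(X, Y)
print(np.isclose(H_XY, entropy(p_x) + H_Y_given_X))  # chain rule: True
print(bool(H_Y_given_X <= entropy(p_y)))             # conditioning reduces entropy: True
```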
More generally, for an $n$-tuple of random variables $X^n = (X_1, X_2, \ldots, X_n)$, we have the following.

Chain rule of entropy.
$$H(X^n) = H(X_1) + H(X_2|X_1) + \cdots + H(X_n|X^{n-1}) = \sum_{i=1}^{n} H(X_i|X^{i-1}),$$
where $X^0$ is set to be an unspecified constant by convention.

By the chain rule and (2.1), we can upper bound the joint entropy as
$$H(X^n) \le \sum_{i=1}^{n} H(X_i)$$
with equality iff $X_1, \ldots, X_n$ are mutually independent.

2.2 RELATIVE ENTROPY

Let $p(x)$ and $q(x)$ be a pair of pmfs on $\mathcal{X}$. The extent of discrepancy between $p(x)$ and $q(x)$ is measured by their relative entropy (also referred to as Kullback–Leibler divergence)
$$D(p\|q) = D(p(x)\|q(x)) = E_p\left[\log \frac{p(X)}{q(X)}\right] = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}, \tag{2.2}$$
where the expectation is taken w.r.t. $X \sim p(x)$. Note that this quantity is well defined only when $p(x)$ is absolutely continuous w.r.t. $q(x)$, namely, $p(x) = 0$ whenever $q(x) = 0$. Otherwise, we define $D(p\|q) = \infty$, which follows by adopting the convention $1/0 = \infty$ as well.
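The definition and its conventions translate into the following sketch (the helper name `kl_divergence` is ours):

```python
import numpy as np

def kl_divergence(p, q, base=2):
    """Relative entropy D(p||q), with the conventions 0 log(0/q) = 0 and
    D(p||q) = infinity when p is not absolutely continuous w.r.t. q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.any((q == 0) & (p > 0)):     # support of p must lie inside support of q
        return np.inf
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) / np.log(base))

p, q = [0.5, 0.25, 0.25], [1/3, 1/3, 1/3]
print(kl_divergence(p, q) > 0)                     # True: p != q
print(kl_divergence(p, p))                         # 0.0: equality iff p = q
print(kl_divergence(p, q) == kl_divergence(q, p))  # False: not symmetric
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))       # inf: absolute continuity fails
```

The three printed comparisons preview the first two properties listed next.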
The relative entropy $D(p\|q)$ satisfies the following properties.
1. $D(p\|q) \ge 0$ with equality if and only if (iff) $p \equiv q$, namely, $p(x) = q(x)$ for every $x \in \mathcal{X}$.
2. $D(p\|q)$ is not symmetric, i.e., $D(p\|q) \ne D(q\|p)$ in general.
3. $D(p\|q)$ is convex in $(p,q)$, i.e., for any $(p_1,q_1)$, $(p_2,q_2)$, and $\lambda$, $\bar\lambda = 1 - \lambda$, $\lambda \in [0,1]$,
$$\lambda D(p_1\|q_1) + \bar\lambda D(p_2\|q_2) \ge D(\lambda p_1 + \bar\lambda p_2 \,\|\, \lambda q_1 + \bar\lambda q_2).$$
4. Chain rule. For any $p(x,y)$ and $q(x,y)$,
$$D(p(x,y)\|q(x,y)) = D(p(x)\|q(x)) + \sum_{x} p(x) D(p(y|x)\|q(y|x)) = D(p(x)\|q(x)) + E_p\big[D(p(y|X)\|q(y|X))\big].$$
The proof of the first three properties is left as an exercise. For the fourth property, consider
$$D(p(x,y)\|q(x,y)) = E_p\left[\log \frac{p(X,Y)}{q(X,Y)}\right] = E_p\left[\log \frac{p(X)}{q(X)}\right] + E_p\left[\log \frac{p(Y|X)}{q(Y|X)}\right].$$

The notion of relative entropy can be extended to arbitrary probability measures $P$ and $Q$ defined on the same sample space and set of events as
$$D(P\|Q) = \int \log \frac{dP}{dQ} \, dP,$$
where $dP/dQ$ is the Radon–Nikodym derivative of $P$ w.r.t. $Q$. (If $P$ is not absolutely continuous w.r.t. $Q$, then $D(P\|Q) = \infty$.) In particular, if $P$ and $Q$ have respective densities $p$ and $q$ w.r.t. a $\sigma$-finite measure $\mu$ (such as the Lebesgue and counting measures), then
$$D(P\|Q) = D(p\|q) = \int p(x) \log \frac{p(x)}{q(x)} \, d\mu(x). \tag{2.3}$$
This expression generalizes (2.2) since probability mass functions can be viewed as densities w.r.t. the counting measure. When $\mu$ is the Lebesgue measure (or equivalently, $p$ and $q$ are densities of continuous distributions on the Euclidean space), we follow the standard convention of denoting $d\mu(x)$ by $dx$ in (2.3).

2.3 f-DIVERGENCE

We digress a bit to generalize the notion of relative entropy. Let $f : [0,\infty) \to \mathbb{R}$ be convex with $f(1) = 0$. Then the $f$-divergence between a pair of densities $p$ and $q$ w.r.t. $\mu$ is defined as
$$D_f(p\|q) = \int q(x) f\left(\frac{p(x)}{q(x)}\right) d\mu(x) = E_q\left[f\left(\frac{p(X)}{q(X)}\right)\right].$$
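The definition translates directly into code. The sketch below (our own helper, natural logarithms, full-support pmfs assumed) evaluates $D_f$ for the three choices of $f$ worked out in the examples that follow:

```python
import numpy as np

def f_divergence(p, q, f):
    """f-divergence D_f(p||q) = sum_x q(x) f(p(x)/q(x)), assuming full support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.2, 0.5, 0.3])

kl  = f_divergence(p, q, lambda u: u * np.log(u))        # Example 2.2: D(p||q)
rkl = f_divergence(p, q, lambda u: -np.log(u))           # Example 2.3: D(q||p)
sym = f_divergence(p, q, lambda u: (u - 1) * np.log(u))  # Example 2.4: symmetrized
print(np.isclose(sym, kl + rkl))                         # True: D(p||q) + D(q||p)
```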
Example 2.2. Let $f(u) = u \log u$. Then
$$D_f(p\|q) = \int q(x) \frac{p(x)}{q(x)} \log \frac{p(x)}{q(x)} \, d\mu(x) = \int p(x) \log \frac{p(x)}{q(x)} \, d\mu(x) = D(p\|q).$$

Example 2.3. Now let $f(u) = -\log u$. Then
$$D_f(p\|q) = -\int q(x) \log \frac{p(x)}{q(x)} \, d\mu(x) = \int q(x) \log \frac{q(x)}{p(x)} \, d\mu(x) = D(q\|p).$$

Example 2.4. Combining the above two cases, let $f(u) = (u-1) \log u$. Then
$$D_f(p\|q) = D(p\|q) + D(q\|p),$$
which is symmetric in $(p,q)$.

Many basic distance functions on probability measures can be represented as $f$-divergences; see, for example, Liese and Vajda (2006).

2.4 MUTUAL INFORMATION

Let $(X,Y)$ be a pair of discrete random variables with joint pmf $p(x,y) = p(x)p(y|x)$. The amount of information about one provided by the other is measured by their mutual information
$$I(X;Y) = D(p(x,y)\|p(x)p(y)) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = \sum_{x} p(x) \sum_{y} p(y|x) \log \frac{p(y|x)}{p(y)} = \sum_{x} p(x) D(p(y|x)\|p(y)).$$
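Numerically, these equivalent expressions can be cross-checked on the joint pmf used earlier (a sketch reusing the `entropy` and `kl_divergence` helpers above):

```python
import numpy as np

p_xy = np.array([[0.10, 0.30, 0.10],
                 [0.20, 0.05, 0.25]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# (i) as a relative entropy: I(X;Y) = D(p(x,y) || p(x)p(y))
I_kl = kl_divergence(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# (ii) as a weighted divergence: I(X;Y) = sum_x p(x) D(p(y|x) || p(y))
I_w = sum(p_x[i] * kl_divergence(p_xy[i] / p_x[i], p_y) for i in range(len(p_x)))

# (iii) via entropies: I(X;Y) = H(X) + H(Y) - H(X,Y)
I_H = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

print(np.isclose(I_kl, I_w) and np.isclose(I_kl, I_H))   # True
```

Expression (iii) anticipates Property 4 below.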
The mutual information $I(X;Y)$ satisfies the following properties.
1. $I(X;Y)$ is a nonnegative function of $p(x,y)$.
2. $I(X;Y) = 0$ iff $X$ and $Y$ are independent, i.e., $p(x,y) \equiv p(x)p(y)$.
3. As a function of $(p(x), p(y|x))$, $I(X;Y)$ is concave in $p(x)$ for a fixed $p(y|x)$, and convex in $p(y|x)$ for a fixed $p(x)$.
4. Mutual information and entropy.
$$I(X;X) = H(X)$$
and
$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y).$$
5. Variational characterization.
$$I(X;Y) = \min_{q(y)} \sum_{x} p(x) D(p(y|x)\|q(y)), \tag{2.4}$$
where the minimum is attained by $q(y) \equiv p(y)$.

The proof of the first four properties is left as an exercise. For the fifth property, consider
$$I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(y|x)}{p(y)} = \sum_{x,y} p(x) p(y|x) \log \left(\frac{p(y|x)}{q(y)} \cdot \frac{q(y)}{p(y)}\right) = \sum_{x} p(x) D(p(y|x)\|q(y)) - \sum_{y} p(y) \log \frac{p(y)}{q(y)} \le \sum_{x} p(x) D(p(y|x)\|q(y)),$$
where the last inequality follows since the subtracted term is equal to $D(p(y)\|q(y)) \ge 0$, and holds with equality iff $p(y) \equiv q(y)$.
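A quick numerical sanity check of (2.4), again on the running example (a sketch reusing the helpers and pmfs above; the `payoff` name is ours):

```python
import numpy as np

# Check (2.4): q(y) = p(y) minimizes sum_x p(x) D(p(y|x) || q(y))
p_y_given_x = p_xy / p_xy.sum(axis=1, keepdims=True)   # conditional pmf p(y|x)

def payoff(q_y):
    return sum(p_x[i] * kl_divergence(p_y_given_x[i], q_y) for i in range(len(p_x)))

I_XY = payoff(p_y)                         # the minimum, equal to I(X;Y)
rng = np.random.default_rng(0)
for _ in range(1000):
    q = rng.dirichlet(np.ones(len(p_y)))   # a random pmf on the Y alphabet
    assert payoff(q) >= I_XY - 1e-12       # no q(y) improves on q(y) = p(y)
print("minimum attained at q(y) = p(y):", round(I_XY, 4))
```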
Sometimes we are interested in the maximum mutual information $\max_{p(x)} I(X;Y)$ of a conditional pmf $p(y|x)$, which is referred to as the information capacity (or the capacity in short). By the variational characterization in (2.4), the information capacity can be expressed as
$$\max_{p(x)} I(X;Y) = \max_{p(x)} \min_{q(y)} \sum_{x} p(x) D(p(y|x)\|q(y)),$$
which can be viewed as a game between two players, one choosing $p(x)$ first and the other choosing $q(y)$ next, with the payoff function $f(p(x), q(y)) = \sum_{x} p(x) D(p(y|x)\|q(y))$. Using the following fundamental result in game theory, we show that the order of plays can be exchanged without affecting the outcome of the game.

Minimax theorem (Sion 1958). Suppose that $U$ and $V$ are compact convex subsets of the Euclidean space, and that a real-valued continuous function $f(u,v)$ on $U \times V$ is concave in $u$ for each $v$ and convex in $v$ for each $u$. Then
$$\max_{u \in U} \min_{v \in V} f(u,v) = \min_{v \in V} \max_{u \in U} f(u,v).$$

Since $f(p(x), q(y))$ is linear (thus concave) in $p(x)$ and convex in $q(y)$ (recall Property 3 of relative entropy), we can apply the minimax theorem and conclude that
$$\max_{p(x)} I(X;Y) = \max_{p(x)} \min_{q(y)} \sum_{x} p(x) D(p(y|x)\|q(y)) = \min_{q(y)} \max_{p(x)} \sum_{x} p(x) D(p(y|x)\|q(y)) = \min_{q(y)} \max_{x} D(p(y|x)\|q(y)),$$
where the last equality follows by noting that the maximum expectation is attained by putting all the weights on the value $x$ that maximizes $D(p(y|x)\|q(y))$. Furthermore, if $p^*(x)$ attains the maximum, then by the optimality condition of (2.4),
$$q^*(y) = \sum_{x} p^*(x) p(y|x)$$
attains the minimum.

2.5 ENTROPY RATE

Let $X = (X_n)_{n=1}^{\infty}$ be a random process on a finite alphabet $\mathcal{X}$. The amount of uncertainty per symbol is measured by its entropy rate
$$H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n),$$
if the limit exists.

Example 2.5. If $X$ is stationary, then the limit exists and $H(X) = \lim_{n \to \infty} H(X_n|X^{n-1})$.

Example 2.6. If $X$ is an aperiodic irreducible Markov chain, then the limit exists and
$$H(X) = \lim_{n \to \infty} H(X_n|X_{n-1}) = \sum_{x_1} \pi(x_1) H(p(x_2|x_1)),$$
where $\pi$ is the unique stationary distribution of the chain.

Example 2.7. If $X_1, X_2, \ldots$ are independent and identically distributed (i.i.d.), then $H(X) = H(X_1)$.

Example 2.8. Let $Y = (Y_n)_{n=1}^{\infty}$ be a stationary Markov chain and $X_n = f(Y_n)$, $n = 1, 2, \ldots$. The resulting random process $X = (X_n)_{n=1}^{\infty}$ is hidden Markov and its entropy rate satisfies
$$H(X_n|X^{n-1}, Y_1) \le H(X) \le H(X_n|X^{n-1})$$
and
$$H(X) = \lim_{n \to \infty} H(X_n|X^{n-1}, Y_1) = \lim_{n \to \infty} H(X_n|X^{n-1}).$$
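Example 2.6 is easy to evaluate in code. A sketch for a two-state chain (reusing the `entropy` helper; the transition matrix is our own choice):

```python
import numpy as np

# Example 2.6 in code: a two-state aperiodic irreducible Markov chain
P = np.array([[0.9, 0.1],    # transition matrix p(x2|x1); rows sum to 1
              [0.4, 0.6]])

# Stationary distribution pi: the left eigenvector of P with eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.isclose(evals, 1)].ravel())
pi /= pi.sum()

# H(X) = sum_x pi(x) H(p(.|x))
H_rate = sum(pi[i] * entropy(P[i]) for i in range(len(pi)))
print(pi)                    # [0.8, 0.2]
print(round(H_rate, 4))      # 0.8*H(0.1) + 0.2*H(0.4), about 0.5694 bits per symbol
```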
2.6 RELATIVE ENTROPY RATE

Let $P$ and $Q$ be two probability measures on $\mathcal{X}^{\infty}$ with $n$-th order densities $p(x^n)$ and $q(x^n)$, respectively. The normalized discrepancy between them is measured by their relative entropy rate
$$D(P\|Q) = \lim_{n \to \infty} \frac{1}{n} D(p(x^n)\|q(x^n))$$
if the limit exists.

Example 2.9. If $P$ is stationary, $Q$ is stationary finite-order Markov, and $P$ is absolutely continuous w.r.t. $Q$, then the limit exists and
$$D(P\|Q) = \lim_{n \to \infty} \int p(x^{n-1}) D(p(x_n|x^{n-1})\|q(x_n|x^{n-1})) \, dx^{n-1}. \tag{2.5}$$
See, for example, Barron (1985) and Gray (2011, Lemma 10.1).

Example 2.10. Similarly, if $P$ and $Q$ are stationary hidden Markov and $P$ is absolutely continuous w.r.t. $Q$, then the limit exists and (2.5) holds (Juang and Rabiner 1985, Leroux 1992, Ephraim and Merhav 2002).
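When $P$ and $Q$ are both stationary first-order Markov chains on a common finite alphabet, the limit in (2.5) reduces to the single-letter expression $\sum_{x} \pi_P(x) D(p(\cdot|x)\|q(\cdot|x))$, with $\pi_P$ the stationary distribution of $P$. A sketch of this reduction (reusing `kl_divergence`, and the chain `P` with its stationary pmf `pi` from the entropy rate example; the second kernel `Q` is our own choice):

```python
import numpy as np

# Relative entropy rate between two stationary Markov chains with kernels P and Q
Q = np.array([[0.7, 0.3],
              [0.5, 0.5]])

# D(P||Q) = sum_x pi_P(x) D(p(.|x) || q(.|x)), with pi_P stationary under P
D_rate = sum(pi[i] * kl_divergence(P[i], Q[i]) for i in range(len(pi)))
print(round(D_rate, 4))   # >= 0; equals 0 iff the two transition kernels coincide
```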
PROBLEMS

2.1. Prove Property 2 of entropy in Section 2.1 and find the equality condition for Property 3.

2.2. Prove Properties 1 through 3 of relative entropy in Section 2.2.

2.3. Entropy and relative entropy. Let $\mathcal{X}$ be a finite alphabet and $q(x)$ be the uniform pmf on $\mathcal{X}$. Show that for any pmf $p(x)$ on $\mathcal{X}$,
$$D(p\|q) = \log|\mathcal{X}| - H(p(x)).$$

2.4. The total variation distance between two pmfs $p(x)$ and $q(x)$ is defined as
$$\delta(p,q) = \frac{1}{2} \sum_{x \in \mathcal{X}} |p(x) - q(x)|.$$
(a) Show that this distance is an $f$-divergence by finding the corresponding $f$.
(b) Show that $\delta(p,q) = \max_{A \subseteq \mathcal{X}} |P(A) - Q(A)|$, where $P$ and $Q$ are corresponding probability measures, e.g., $P(A) = \sum_{x \in A} p(x)$.

2.5. Pinsker's inequality. Show that
$$\delta(p,q) \le \sqrt{\frac{D(p\|q)}{2 \log e}},$$
where the logarithm has the same base as the relative entropy. (Hint: First consider the case that $\mathcal{X}$ is binary.) A numerical sanity check follows this problem list.

2.6. Let $p(x,y)$ be a joint pmf on $\mathcal{X} \times \mathcal{Y}$. Show that $p(x,y)$ is absolutely continuous w.r.t. $p(x)p(y)$.

2.7. Prove Properties 1 through 4 of mutual information in Section 2.4.

2.8. Let $X = (X_n)_{n=1}^{\infty}$ be a stationary random process. Show that
$$H(X) = \lim_{n \to \infty} H(X_n|X^{n-1}).$$

2.9. Let $Y = (Y_n)_{n=1}^{\infty}$ be a stationary Markov chain and $X = \{g(Y_n)\}$ be a hidden Markov process. Show that
$$H(X_n|X^{n-1}, Y_1) \le H(X) \le H(X_n|X^{n-1})$$
and conclude that
$$H(X) = \lim_{n \to \infty} H(X_n|X^{n-1}, Y_1) = \lim_{n \to \infty} H(X_n|X^{n-1}).$$

2.10. Recurrence time. Let $X_0, X_1, X_2, \ldots$ be i.i.d. copies of $X \sim p(x)$, and let $N = \min\{n \ge 1 : X_n = X_0\}$ be the waiting time to the next occurrence of $X_0$.
(a) Show that $E(N) = |\mathcal{X}|$.
(b) Show that $E(\log N) \le H(X)$.

2.11. The past and the future. Let $X = (X_n)_{n=1}^{\infty}$ be stationary. Show that
$$\lim_{n \to \infty} \frac{1}{n} I(X_1, \ldots, X_n; X_{n+1}, \ldots, X_{2n}) = 0.$$

2.12. Variable-duration symbols. A discrete memoryless source has the alphabet $\{1, 2\}$, where symbol 1 has duration 1 and symbol 2 has duration 2. Let $X = (X_n)_{n=1}^{\infty}$ be the resulting random process.
(a) Find its entropy rate $H(X)$ in terms of the probability $p$ of symbol 1.
(b) Find the maximum entropy rate by optimizing over $p$.
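As a numerical companion to Problem 2.5, one can spot-check Pinsker's inequality on random pmfs (a sketch, not a proof; the alphabet size and sample count are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(10000):
    p = rng.dirichlet(np.ones(4))                  # random pmfs on a 4-symbol alphabet
    q = rng.dirichlet(np.ones(4))
    tv = 0.5 * np.abs(p - q).sum()                 # total variation distance delta(p, q)
    D = float(np.sum(p * np.log2(p / q)))          # D(p||q) in bits
    assert tv <= np.sqrt(D / (2 * np.log2(np.e)))  # Pinsker's inequality, base-2 form
print("Pinsker's inequality held on all samples")
```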
Bibliography

Barron, A. R. (1985). The strong ergodic theorem for densities: Generalized Shannon–McMillan–Breiman theorem. Ann. Probab., 13(4).

Ephraim, Y. and Merhav, N. (2002). Hidden Markov processes. IEEE Trans. Inf. Theory, 48(6).

Gray, R. M. (2011). Entropy and Information Theory. Springer, New York.

Juang, B.-H. F. and Rabiner, L. R. (1985). A probabilistic distance measure for hidden Markov models. AT&T Tech. J., 64(2).

Leroux, B. G. (1992). Maximum-likelihood estimation for hidden Markov models. Stoch. Proc. Appl., 40(1).

Liese, F. and Vajda, I. (2006). On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory, 52(10).

Sion, M. (1958). On general minimax theorems. Pacific J. Math., 8.