Application of Information Theory, Lecture 7. Relative Entropy. Handout Mode. Iftach Haitner. Tel Aviv University.


1 Application of Information Theory, Lecture 7
Relative Entropy
Handout Mode
Iftach Haitner, Tel Aviv University
December 1, 2015

2 Part I: Statistical Distance

3 Statistical distance
Let p = (p_1, ..., p_m) and q = (q_1, ..., q_m) be distributions over [m]. Their statistical distance (also known as variation distance) is defined by
SD(p, q) := (1/2) Σ_{i∈[m]} |p_i - q_i|
This is half the ℓ_1 distance between the distribution vectors.
We will see another distance measure for distributions next lecture.
For X ∼ p and Y ∼ q, let SD(X, Y) := SD(p, q).
Claim (HW): SD(p, q) = max_{S⊆[m]} ( Σ_{i∈S} p_i - Σ_{i∈S} q_i )
Hence, SD(p, q) = max_D ( Pr_{X∼p}[D(X) = 1] - Pr_{X∼q}[D(X) = 1] ), the maximum taken over all Boolean functions D.
Interpretation
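A small Python sketch (not part of the original slides) of the two equivalent forms above, checking on a concrete pair of distributions, reused from the relative-entropy example later in the lecture, that half the ℓ_1 sum agrees with the brute-force maximum over subsets S ⊆ [m]:

from itertools import combinations

def sd(p, q):
    # statistical (variation) distance: half the L1 distance
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def sd_via_max_set(p, q):
    # max over subsets S of [m] of p(S) - q(S); the HW claim says this equals sd(p, q)
    m = len(p)
    return max(sum(p[i] - q[i] for i in S)
               for r in range(m + 1) for S in combinations(range(m), r))

p = (1/4, 1/2, 1/4, 0)
q = (1/2, 1/4, 1/8, 1/8)
print(sd(p, q))              # 0.375
print(sd_via_max_set(p, q))  # 0.375, matching the claim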

4 Distance from the uniform distribution
Let X be a rv over [m].
H(X) ≤ log m, and H(X) = log m ⟺ X is uniform over [m].
Theorem 1 (this lecture)
Let X be a rv over [m] and assume H(X) ≥ log m - ε. Then SD(X, U_[m]) ≤ √(ε ln 2 / 2) = O(√ε).

5 Part II: Relative Entropy

6 Section 1: Definition and Basic Facts

7 Definition
For p = (p_1, ..., p_m) and q = (q_1, ..., q_m), let
D(p‖q) := Σ_{i=1}^m p_i log(p_i/q_i),
with the conventions 0·log(0/0) = 0 and p·log(p/0) = ∞ for p > 0.
The relative entropy of a pair of rv's is the relative entropy of their distributions.
Names: entropy of p relative to q, relative entropy, information divergence, Kullback-Leibler (KL) divergence/distance.
Many different interpretations.
Main interpretation: the information we gained about X, if we originally thought X ∼ q and now we learned X ∼ p.

8 Numerical example
D(p‖q) = Σ_{i=1}^m p_i log(p_i/q_i)
p = (1/4, 1/2, 1/4, 0), q = (1/2, 1/4, 1/8, 1/8)
D(p‖q) = (1/4)·log((1/4)/(1/2)) + (1/2)·log((1/2)/(1/4)) + (1/4)·log((1/4)/(1/8)) + 0·log(0/(1/8))
       = (1/4)·(-1) + (1/2)·1 + (1/4)·1 + 0 = 1/2
D(q‖p) = (1/2)·log((1/2)/(1/4)) + (1/4)·log((1/4)/(1/2)) + (1/8)·log((1/8)/(1/4)) + (1/8)·log((1/8)/0)
       = (1/2)·1 + (1/4)·(-1) + (1/8)·(-1) + ∞ = ∞
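A quick Python check of this example (an illustrative sketch), with the two conventions from the definition built in:

from math import log2, inf

def kl(p, q):
    # relative entropy in bits, with 0*log(0/0) = 0 and p*log(p/0) = inf for p > 0
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue        # contributes 0
        if qi == 0:
            return inf      # p_i > 0 while q_i = 0
        total += pi * log2(pi / qi)
    return total

p = (1/4, 1/2, 1/4, 0)
q = (1/2, 1/4, 1/8, 1/8)
print(kl(p, q))   # 0.5
print(kl(q, p))   # inf, since q_4 > 0 while p_4 = 0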

9 Supporting the interpretation
X rv over [m].
H(X) is a measure of the amount of information we do not have about X.
log m - H(X) is a measure of the information we do have about X (just by knowing its distribution).
Example: X = (X_1, X_2) ∼ (1/2, 0, 0, 1/2) over {00, 01, 10, 11}.
H(X) = 1 and log m - H(X) = 2 - 1 = 1. Indeed, we know that X_1 = X_2.
H(U_[m]) - H(p_1, ..., p_m) = log m - H(p_1, ..., p_m) = log m + Σ_i p_i log p_i = Σ_i p_i (log p_i - log(1/m)) = Σ_{1≤i≤m} p_i log( p_i / (1/m) ) = D(p‖U_[m])
So D(X‖U_[m]) measures the information we gained about X, if we originally thought it is U_[m] and now we learned it is p.

10 Supporting the interpretation, cont.
In general, D(p‖q) ≠ H(q) - H(p).
H(q) - H(p) is not a good measure for information change.
Example: q = (0.01, 0.99) and p = (0.99, 0.01). We were almost sure that X = 1, but learned that X is almost surely 0. Yet H(q) - H(p) = 0.
Also, H(q) - H(p) might be negative.
We understand D(p‖q) as the information we gained about X, if we originally thought it is q and now we learned it is p.

11 Changing distribution
What does it mean: originally thought X ∼ q and now we learned X ∼ p? How can a distribution change?
Typically, this happens by learning additional information: q_i = Pr[X = i] and p_i = Pr[X = i | E].
Example: X ∼ (1/2, 1/4, 1/4, 0); someone saw X and tells us that X ≤ 2. The distribution changes to X ∼ (2/3, 1/3, 0, 0).
Another example: let X be a uniform bit and Y ∼ (1/2, 1/4, 1/4, 0), where Y ∼ (1/2, 1/2, 0, 0) conditioned on X = 0 and Y ∼ (1/2, 0, 1/2, 0) conditioned on X = 1.
Generally, a distribution can change if we condition on an event E.

12 Additional properties
Recall 0·log(0/0) = 0 and p·log(p/0) = ∞ for p > 0.
If there exists i s.t. p_i > 0 and q_i = 0, then D(p‖q) = ∞.
If originally Pr[X = i] = 0, then it cannot be more than 0 after we learned something. Hence, it makes sense to think of it as an infinite amount of information learnt.
Alternatively, we can define D(p‖q) only for distributions with q_i = 0 ⟹ p_i = 0 (recall that Pr[X = i] = 0 ⟹ Pr[X = i | E] = 0, for any event E).
If p_i is large and q_i is small, then D(p‖q) is large.
D(p‖q) ≥ 0, with equality iff p = q (hw).

13 Example
Let q = (q_1, ..., q_m) with Σ_{i=1}^n q_i = 2^{-k} (for some n < m), and let
p_i = q_i / 2^{-k} for 1 ≤ i ≤ n, and p_i = 0 otherwise.
That is, p = (p_1, ..., p_m) is the distribution of q conditioned on the event i ∈ [n].
D(p‖q) = Σ_{i=1}^n p_i log(p_i/q_i) = Σ_{i=1}^n p_i log 2^k = Σ_{i=1}^n p_i·k = k
We gained k bits of information.
Example: Σ_{i=1}^n q_i = 1/2, and we were told whether i ≤ n or i > n; we got one bit of information.
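A short sketch of this "k bits gained" calculation; the particular q below is an arbitrary choice whose first n entries sum to 2^{-k}:

from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

k, n = 3, 2
q = (1/16, 1/16, 1/2, 1/4, 1/8)                        # q_1 + q_2 = 1/8 = 2^(-k)
p = tuple(qi / 2**-k if i < n else 0.0 for i, qi in enumerate(q))   # q conditioned on the first n outcomes
print(kl(p, q))   # 3.0, i.e., exactly k bits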

14 Section 2: Axiomatic Derivation

15 Axiomatic derivation
Let D' be a continuous and symmetric (wrt each distribution) function such that
1. D'(p‖U_[m]) = log m - H(p)
2. D'((p_1, ..., p_m) ‖ (q_1, ..., q_m)) = D'((p_1, ..., p_{m-1}, αp_m, (1-α)p_m) ‖ (q_1, ..., q_{m-1}, αq_m, (1-α)q_m)), for any α ∈ [0, 1].
Then D' = D.
Interpretation
Proof: Let p and q be distributions over [m], and assume q_i ∈ ℚ \ {0}. Then
D'(p‖q) = D'((α_{1,1}p_1, ..., α_{1,k_1}p_1, ..., α_{m,1}p_m, ..., α_{m,k_m}p_m) ‖ (α_{1,1}q_1, ..., α_{1,k_1}q_1, ..., α_{m,1}q_m, ..., α_{m,k_m}q_m)), for Σ_j α_{i,j} = 1 and α_{i,j} ≥ 0.
Taking the α's s.t. α_{i,1} = α_{i,2} = ... = α_{i,k_i} = α_i and α_i q_i = 1/M, it follows that
D'(p‖q) = log M - H((α_{1,1}p_1, ..., α_{m,k_m}p_m)) = Σ_i p_i log M + Σ_i p_i log(α_i p_i) = Σ_i p_i (log M + log( p_i / (q_i M) )) = Σ_i p_i log(p_i/q_i).
Zeros and non-rational q_i's are dealt with by continuity.

16 Section 3: Relation to Mutual Information

17 Mutual information as expected relative entropy
Claim 2
E_{y∼Y}[ D( X|_{Y=y} ‖ X ) ] = I(X;Y).
Proof: Let X ∼ q = (q_1, ..., q_m) over [m], and let Y be a rv over {0, 1} (the general case is identical).
Let (X|_{Y=j}) ∼ p_j = (p_{j,1}, ..., p_{j,m}), where p_{j,i} = Pr[X = i | Y = j].
E_Y[ D(p_Y ‖ q) ]
= Pr[Y=0]·D((p_{0,1}, ..., p_{0,m}) ‖ (q_1, ..., q_m)) + Pr[Y=1]·D((p_{1,1}, ..., p_{1,m}) ‖ (q_1, ..., q_m))
= Pr[Y=0] Σ_i p_{0,i} log(p_{0,i}/q_i) + Pr[Y=1] Σ_i p_{1,i} log(p_{1,i}/q_i)
= Pr[Y=0] Σ_i p_{0,i} log p_{0,i} + Pr[Y=1] Σ_i p_{1,i} log p_{1,i} - Σ_i (Pr[Y=0] p_{0,i} + Pr[Y=1] p_{1,i}) log q_i
= -H(X|Y) - Σ_i Pr[X=i] log q_i
= -H(X|Y) + H(X) = I(X;Y).

18 Equivalent definition for mutual information
Claim 3
Let (X, Y) ∼ p. Then I(X;Y) = D(p ‖ p_X p_Y).
Proof:
D(p ‖ p_X p_Y) = Σ_{x,y} p(x,y) log( p(x,y) / (p_X(x) p_Y(y)) )
= Σ_{x,y} p(x,y) log( p_{X|Y}(x|y) / p_X(x) )
= - Σ_{x,y} p(x,y) log p_X(x) + Σ_{x,y} p(x,y) log p_{X|Y}(x|y)
= H(X) + Σ_y p_Y(y) Σ_x p_{X|Y}(x|y) log p_{X|Y}(x|y)
= H(X) - H(X|Y) = I(X;Y).
We will later relate the above two claims.
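A numerical sanity check of Claims 2 and 3 on a small, arbitrarily chosen joint distribution (not from the slides): both E_y[D(X|_{Y=y} ‖ X)] and D(p ‖ p_X p_Y) agree with I(X;Y) computed from entropies:

from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)

# joint[(x, y)] = Pr[X = x, Y = y]
joint = {(0, 0): 0.30, (1, 0): 0.10, (2, 0): 0.10,
         (0, 1): 0.05, (1, 1): 0.25, (2, 1): 0.20}
xs, ys = (0, 1, 2), (0, 1)
pX = [sum(joint[(x, y)] for y in ys) for x in xs]
pY = [sum(joint[(x, y)] for x in xs) for y in ys]

claim2 = sum(pY[y] * kl([joint[(x, y)] / pY[y] for x in xs], pX) for y in ys)
claim3 = kl([joint[(x, y)] for x in xs for y in ys],
            [pX[x] * pY[y] for x in xs for y in ys])
mi = entropy(pX) + entropy(pY) - entropy(list(joint.values()))
print(claim2, claim3, mi)   # all three agree up to floating-point error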

19 Section 4: Relation to Data Compression

20 Wrong code
Theorem 4
Let p and q be distributions over [m], and let C be a code with ℓ(i) := |C(i)| = ⌈log(1/q_i)⌉. Then
H(p) + D(p‖q) ≤ E_{i∼p}[ℓ(i)] ≤ H(p) + D(p‖q) + 1.
Recall that H(q) ≤ E_{i∼q}[ℓ(i)] ≤ H(q) + 1.
Proof of the upper bound (the lower bound is proved similarly):
E_{i∼p}[ℓ(i)] = Σ_i p_i ⌈log(1/q_i)⌉ < Σ_i p_i (log(1/q_i) + 1)
= 1 + Σ_i p_i log(p_i/q_i) + Σ_i p_i log(1/p_i)
= 1 + D(p‖q) + H(p).
Can there be a (close to) optimal code for q that is better for p? HW
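A sketch of Theorem 4 on the earlier numerical example: encode symbol i with ⌈log(1/q_i)⌉ bits (a Shannon code tuned for q) and measure the expected length under p. Here every q_i is a power of 2, so the lower bound is met with equality:

from math import log2, ceil

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = (1/4, 1/2, 1/4, 0)
q = (1/2, 1/4, 1/8, 1/8)
lengths = [ceil(log2(1 / qi)) for qi in q]              # the "wrong code" lengths
expected_len = sum(pi * li for pi, li in zip(p, lengths))
lower = entropy(p) + kl(p, q)
print(lower, expected_len, lower + 1)                   # 2.0  2.0  3.0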

21 Section 5: Conditional Relative Entropy

22 Conditional relative entropy
For a dist. p over X × Y, let p_X and p_{Y|X} be its marginal and conditional dist.
Definition 5
For two distributions p and q over X × Y:
D(p_{Y|X} ‖ q_{Y|X}) := Σ_{x∈X} p_X(x) Σ_{y∈Y} p_{Y|X}(y|x) log( p_{Y|X}(y|x) / q_{Y|X}(y|x) )
Equivalently, D(p_{Y|X} ‖ q_{Y|X}) = E_{(X,Y)∼p}[ log( p_{Y|X}(Y|X) / q_{Y|X}(Y|X) ) ].
Let (X_p, Y_p) ∼ p and (X_q, Y_q) ∼ q. Then D(p_{Y|X} ‖ q_{Y|X}) = E_{x∼X_p}[ D( Y_p|_{X_p=x} ‖ Y_q|_{X_q=x} ) ].
Numerical example: let p and q over X × Y have p_X = (1/4, 3/4), conditional rows p_{Y|X} = (1/2, 1/2), (1/3, 2/3) and q_{Y|X} = (1/3, 2/3), (4/5, 1/5). Then
D(p_{Y|X} ‖ q_{Y|X}) = (1/4)·D((1/2, 1/2) ‖ (1/3, 2/3)) + (3/4)·D((1/3, 2/3) ‖ (4/5, 1/5)) = ...
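A small sketch evaluating the example above; the marginal p_X = (1/4, 3/4) is read off the displayed weights and is an assumption of this sketch:

from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pX = (1/4, 3/4)                       # assumed marginal of p on X
p_rows = ((1/2, 1/2), (1/3, 2/3))     # p_{Y|X=x} for the two values of x
q_rows = ((1/3, 2/3), (4/5, 1/5))     # q_{Y|X=x} for the two values of x
print(sum(w * kl(pr, qr) for w, pr, qr in zip(pX, p_rows, q_rows)))   # ~0.574 bits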

23 Chain rule
Claim 6
For any two distributions p and q over X × Y, it holds that D(p‖q) = D(p_X ‖ q_X) + D(p_{Y|X} ‖ q_{Y|X}).
Proof:
D(p‖q) = Σ_{(x,y)∈X×Y} p(x,y) log( p(x,y) / q(x,y) )
= Σ_{(x,y)∈X×Y} p(x,y) log( p_X(x) p_{Y|X}(y|x) / (q_X(x) q_{Y|X}(y|x)) )
= Σ_{(x,y)∈X×Y} p(x,y) log( p_X(x) / q_X(x) ) + Σ_{(x,y)∈X×Y} p(x,y) log( p_{Y|X}(y|x) / q_{Y|X}(y|x) )
= D(p_X ‖ q_X) + D(p_{Y|X} ‖ q_{Y|X}).
Hence, for (X, Y) ∼ p:
I(X;Y) = D(p ‖ p_X p_Y) = D(p_X ‖ p_X) + E_{x∼X}[ D( p_{Y|X=x} ‖ p_Y ) ] = E_{x∼X}[ D( p_{Y|X=x} ‖ p_Y ) ].

24 Section 6: Data-processing inequality

25 Data-processing inequality
Claim 7
For any rv's X and Y and function f, it holds that D(f(X) ‖ f(Y)) ≤ D(X ‖ Y).
Analogous to H(X) ≥ H(f(X)).
Proof:
D(X, f(X) ‖ Y, f(Y)) = D(X ‖ Y), since given X = x (resp. Y = x), the second coordinate is simply f(x).
D(X, f(X) ‖ Y, f(Y)) = D(f(X) ‖ f(Y)) + E_{z∼f(X)}[ D( X|_{f(X)=z} ‖ Y|_{f(Y)=z} ) ] ≥ D(f(X) ‖ f(Y)) (chain rule).
Hence, D(f(X) ‖ f(Y)) ≤ D(X ‖ Y).
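A quick numerical check of Claim 7 (a sketch, not part of the slides): random distribution pairs over [8], pushed forward through an arbitrary function into a 3-element range:

import random
from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def push_forward(p, f, out_size):
    # distribution of f(Z) when Z ~ p over range(len(p))
    out = [0.0] * out_size
    for i, pi in enumerate(p):
        out[f[i]] += pi
    return out

def random_dist(m):
    w = [random.random() for _ in range(m)]
    s = sum(w)
    return [x / s for x in w]

random.seed(0)
m, out_size = 8, 3
f = [random.randrange(out_size) for _ in range(m)]   # an arbitrary function [m] -> [out_size]
for _ in range(5):
    p, q = random_dist(m), random_dist(m)
    assert kl(push_forward(p, f, out_size), push_forward(q, f, out_size)) <= kl(p, q) + 1e-12
print("data-processing inequality held on all trials")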

26 Section 7: Relation to Statistical Distance

27 Relation to statistical distance
D(p‖q) is used many times to measure the distance from p to q.
It is not a distance in the mathematical sense: D(p‖q) ≠ D(q‖p), and there is no triangle inequality.
However,
Theorem 8
SD(p, q) ≤ √( (ln 2 / 2)·D(p‖q) ).
Corollary: For a rv X over [m] with H(X) ≥ log m - ε, it holds that
SD(X, U_[m]) ≤ √( (ln 2 / 2)·(log m - H(X)) ) ≤ √( (ln 2 / 2)·ε ).
The other direction is incorrect: SD(p, q) might be small but D(p‖q) = ∞.
Does SD(p, U_[m]) being small imply D(p‖U_[m]) = log m - H(p) is small? HW
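A numerical check of Theorem 8 (a Pinsker-type inequality with base-2 logarithms) on random distribution pairs, assuming nothing beyond the statement itself:

import random
from math import log, log2, sqrt

def sd(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_dist(m):
    w = [random.random() for _ in range(m)]
    s = sum(w)
    return [x / s for x in w]

random.seed(1)
for _ in range(5):
    p, q = random_dist(6), random_dist(6)
    assert sd(p, q) <= sqrt(log(2) / 2 * kl(p, q))
print("SD(p, q) <= sqrt((ln 2 / 2) * D(p||q)) held on all trials")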

28 Proving Thm 8, Boolean case
Let p = (α, 1-α) and q = (β, 1-β), and assume α ≥ β, so SD(p, q) = α - β.
We will show that
D(p‖q) = α log(α/β) + (1-α) log((1-α)/(1-β)) ≥ (4/(2 ln 2))·(α-β)² = (2/ln 2)·SD(p, q)².
Let g(x, y) = x log(x/y) + (1-x) log((1-x)/(1-y)) - (4/(2 ln 2))·(x-y)².
∂g(x, y)/∂y = -x/(y ln 2) + (1-x)/((1-y) ln 2) - 4·2(y-x)/(2 ln 2) = (y-x)/(y(1-y) ln 2) - 4(y-x)/ln 2.
Since y(1-y) ≤ 1/4, ∂g(x, y)/∂y ≤ 0 for y < x. Since g(x, x) = 0, it follows that g(x, y) ≥ 0 for y ≤ x.

29 Proving Thm 8, general case
Let U = Supp(p) ∪ Supp(q) and let S = {u ∈ U : p(u) > q(u)}, so SD(p, q) = Pr_p[S] - Pr_q[S] (by homework).
Let P ∼ p, and let the indicator P̂ be 1 iff P ∈ S. Let Q ∼ q, and let the indicator Q̂ be 1 iff Q ∈ S.
SD(P̂, Q̂) = Pr[P ∈ S] - Pr[Q ∈ S] = SD(p, q).
D(p‖q) ≥ D(P̂ ‖ Q̂) (data-processing inequality)
≥ (2/ln 2)·SD(P̂, Q̂)² (the Boolean case)
= (2/ln 2)·SD(p, q)².

30 Section 8: Conditioned Distributions

31 Main theorem
Theorem 9
Let X_1, ..., X_k be iid over U, and let Y = (Y_1, ..., Y_k) be a rv over U^k. Then
Σ_{j=1}^k D(Y_j ‖ X_j) ≤ D(Y ‖ (X_1, ..., X_k)).
For a rv Z, let Z(z) := Pr[Z = z]. We prove the case k = 2; the general case follows similar lines. Let X = (X_1, X_2).
D(Y ‖ X) = Σ_{y∈U²} Y(y) log( Y(y)/X(y) )
= Σ_{y=(y_1,y_2)} Y(y) log( Y_1(y_1) Y_2(y_2) / (X_1(y_1) X_2(y_2)) ) + Σ_{y=(y_1,y_2)} Y(y) log( Y(y) / (Y_1(y_1) Y_2(y_2)) )
= Σ_{y=(y_1,y_2)} Y(y) log( Y_1(y_1)/X_1(y_1) ) + Σ_{y=(y_1,y_2)} Y(y) log( Y_2(y_2)/X_2(y_2) ) + Σ_{y=(y_1,y_2)} Y(y) log( Y(y) / (Y_1(y_1) Y_2(y_2)) )
= D(Y_1 ‖ X_1) + D(Y_2 ‖ X_2) + I(Y_1; Y_2)
≥ D(Y_1 ‖ X_1) + D(Y_2 ‖ X_2).

32 Conditioning distributions, relative entropy case
Theorem 10
Let X_1, ..., X_k be iid over X, let X = (X_1, ..., X_k), and let W be an event (i.e., a Boolean rv). Then
Σ_{j=1}^k D( (X_j|_W) ‖ X_j ) ≤ D( (X|_W) ‖ X ) ≤ log(1/Pr[W]).
Proof:
Σ_{j=1}^k D( (X_j|_W) ‖ X_j ) ≤ D( (X|_W) ‖ X ) (Thm 9)
= Σ_{x∈X^k} (X|_W)(x) log( (X|_W)(x) / X(x) )
= Σ_{x∈X^k} (X|_W)(x) log( Pr[W | X = x] / Pr[W] ) (Bayes)
= log(1/Pr[W]) + Σ_{x∈X^k} (X|_W)(x) log Pr[W | X = x]
≤ log(1/Pr[W]).
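A sketch of Theorem 10 with k iid fair bits and one concrete (hypothetical) event W, chosen here as "at least five of the six bits are 1":

from itertools import product
from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

k = 6
cube = list(product((0, 1), repeat=k))
in_W = [x for x in cube if sum(x) >= 5]          # the event W (an arbitrary choice)
prW = len(in_W) / 2**k

total = 0.0
for j in range(k):
    p1 = sum(x[j] for x in in_W) / len(in_W)     # Pr[X_j = 1 | W]
    total += kl((1 - p1, p1), (0.5, 0.5))
print(total, log2(1 / prW))                      # ~2.45 <= ~3.19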

33 Conditioning distributions, statistical distance case
Theorem 11
Let X_1, ..., X_k be iid over X and let W be an event. Then Σ_{j=1}^k SD((X_j|_W), X_j)² ≤ log(1/Pr[W]).
Proof: follows from Thm 8 and Thm 10.
Using (Σ_{j=1}^k a_j)² ≤ k·Σ_{j=1}^k a_j², it follows that
Corollary 12
Σ_{j=1}^k SD((X_j|_W), X_j) ≤ √( k·log(1/Pr[W]) ), and E_{j∼[k]} SD((X_j|_W), X_j) ≤ √( (1/k)·log(1/Pr[W]) ).
Extraction

34 Numerical example
Let X = (X_1, ..., X_40) be uniform over {0,1}^40, and let f be a function on {0,1}^40 with Pr[f(X) = 0] = 2^{-10}. Then
E_{j∼[40]} SD( (X_j|_{f(X)=0}), U_{{0,1}} ) ≤ √( (1/40)·10 ) = 1/2.
Typical bits are not too biased, even when conditioning on a very unlikely event.
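A worked instance with a concrete (hypothetical) choice of f: say f(x) = 0 iff the first 10 bits of x are all zero, so Pr[f(X) = 0] = 2^{-10}. Conditioned on this event, bits 1..10 become constant (per-bit SD 1/2) and bits 11..40 stay uniform (per-bit SD 0), so the average per-bit SD is well below the 1/2 bound:

k, fixed = 40, 10                          # 10 bits forced to 0 by the event f(X) = 0
per_bit_sd = [0.5] * fixed + [0.0] * (k - fixed)
print(sum(per_bit_sd) / k)                 # 0.125 <= 0.5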

35 Extension
Theorem 13
Let X = (X_1, ..., X_k), T and V be rv's over X^k, T and V respectively. Let W be an event, and assume that the X_i's are iid conditioned on T. Then
Σ_{j=1}^k D( (T V X_j)|_W ‖ (T V)|_W X_j(T) ) ≤ log(1/Pr[W]) + log |Supp(V|_W)|,
where X_j(t) is distributed according to X_j|_{T=t}.
Interpretation

36 Proving Thm 13
Let X = (X_1, ..., X_k), T and V be rv's over X^k, T and V respectively, such that the X_i's are iid conditioned on T. Let W be an event, and let X_j(t) be distributed according to the distribution of X_j|_{T=t}.
Σ_{j=1}^k D( (T V X_j)|_W ‖ (T V)|_W X_j(T) )
= E_{(t,v)∼(TV)|_W} [ Σ_{j=1}^k D( X_j|_{W, V=v, T=t} ‖ X_j|_{T=t} ) ] (chain rule)
≤ E_{(t,v)∼(TV)|_W} [ log( 1 / Pr[W ∧ V=v | T=t] ) ] (Thm 10)
≤ log E_{(t,v)∼(TV)|_W} [ 1 / Pr[W ∧ V=v | T=t] ] (Jensen's inequality)
= log Σ_{(t,v)∈Supp((TV)|_W)} Pr[T=t] / Pr[W]
≤ log( |Supp(V|_W)| / Pr[W] ).
