ECE 587 / STA 563: Lecture 2
Measures of Information
Information Theory
Duke University, Fall 2017


Author: Galen Reeves
Last Modified: August 3, 2017

Outline of lecture:

2.1 Quantifying Information
2.2 Entropy and Mutual Information
    2.2.1 Entropy
    2.2.2 Mutual Information
    2.2.3 Example: Testing for a disease
    2.2.4 Relative Entropy
2.3 Convexity & Concavity
2.4 Data Processing Inequality
2.5 Fano's Inequality
2.6 Summary of Basic Inequalities
2.7 Axiomatic Derivation of Mutual Information [Optional]
    2.7.1 Proof of uniqueness of i(x, y)

2.1 Quantifying Information

How much information do the answers to the following questions provide?

(i) Will it rain today in Durham? (two possible answers)
(ii) Will it rain today in Death Valley? (two possible answers)
(iii) What is today's winning lottery number? (for the Mega Millions Jackpot, there are 258,890,850 possible answers)

The amount of information is linked to the number of possible answers. In 1928, Ralph Hartley gave the following definition:

    Hartley Information = \log(\# \text{answers})

Hartley's measure of information is additive. The number of possible answers for two questions corresponds to the product of the number of answers for each question. Taking the logarithm turns the product into a sum.

    What is today's winning lottery number?
        \log_2(258{,}890{,}850) \approx 28 bits
    What are the winning lottery numbers for today and tomorrow?
        \log_2(258{,}890{,}850 \times 258{,}890{,}850) = \log_2(258{,}890{,}850) + \log_2(258{,}890{,}850) \approx 56 bits
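To make the additivity concrete, here is a quick numerical check (a Python sketch; the answer count for the Mega Millions Jackpot is the one quoted above):

    import math

    # Hartley information: log2 of the number of possible answers, in bits.
    def hartley_bits(num_answers):
        return math.log2(num_answers)

    n = 258_890_850  # possible Mega Millions outcomes (from the text)

    print(hartley_bits(2))                    # yes/no rain question: 1.0 bit
    print(hartley_bits(n))                    # one drawing: about 27.9 bits
    print(hartley_bits(n * n))                # two independent drawings: about 55.9 bits
    print(hartley_bits(n) + hartley_bits(n))  # additivity: the same 55.9 bits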

But Hartley's information does not distinguish between likely and unlikely answers (e.g., rain in Durham vs. rain in Death Valley). In 1948, Shannon introduced measures of information which depend on the probabilities of the answers.

2.2 Entropy and Mutual Information

2.2.1 Entropy

Let X be a discrete random variable with pmf p(x) and finite support \mathcal{X}. The entropy of X is

    H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)

The entropy can also be expressed as the expected value of the random variable \log(1/p(X)), i.e.,

    H(X) = E[ \log(1/p(X)) ]

Binary Entropy: If X is a Bernoulli(p) random variable (i.e., P[X = 1] = p and P[X = 0] = 1 - p), then its entropy is given by the binary entropy function

    H_b(p) = -p \log p - (1 - p) \log(1 - p)

H_b(p) is concave, has its maximum at H_b(1/2) = \log(2), and has its minima at H_b(0) = H_b(1) = 0.

Example: Two Questions. For "Will it rain today in Durham?" the rain probability is not far from 1/2, so H_b(p) is close to one bit. For "Will it rain today in Death Valley?" the rain probability is close to 0, so H_b(p) is a small fraction of a bit.
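The binary entropy function is easy to evaluate numerically. A minimal sketch in Python; the two rain probabilities below are illustrative stand-ins, since the numeric values in the original example did not survive transcription:

    import math

    def binary_entropy(p):
        # H_b(p) = -p log2(p) - (1-p) log2(1-p), with H_b(0) = H_b(1) = 0.
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    print(binary_entropy(0.5))   # 1.0 bit: the maximum
    print(binary_entropy(0.4))   # ~0.971 bits: a Durham-like rain probability
    print(binary_entropy(0.01))  # ~0.081 bits: a Death-Valley-like rain probability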

Fundamental Inequality: For any base b > 0 and any x > 0,

    (1 - 1/x) \log_b(e) \le \log_b(x) \le (x - 1) \log_b(e)

with equalities on both sides if, and only if, x = 1.

Proof: For the natural log, this simplifies to

    1 - 1/x \le \ln(x) \le x - 1

To prove the upper bound, note that equality is attained at x = 1. For x > 1,

    (x - 1) - \ln(x) = \int_1^x (1 - 1/t) \, dt > 0

since the integrand is strictly positive, and for x < 1,

    (x - 1) - \ln(x) = \int_x^1 (1/t - 1) \, dt > 0

since the integrand is again strictly positive. To prove the lower bound, let y = 1/x, so that

    \ln(y) \le y - 1  \implies  \ln(1/x) \le 1/x - 1  \implies  \ln(x) \ge 1 - 1/x

Theorem: 0 \le H(X) \le \log |\mathcal{X}|

The left side follows from 0 \le p(x) \le 1. For the right side, observe that

    H(X) = \sum_{x \in \mathcal{X}} p(x) \log(1/p(x))
         = \log(|\mathcal{X}|) + \sum_{x \in \mathcal{X}} p(x) \log( 1/(|\mathcal{X}| p(x)) )
         \le \log(|\mathcal{X}|) + \sum_{x \in \mathcal{X}} p(x) ( 1/(|\mathcal{X}| p(x)) - 1 ) \log(e)     [Fundamental Inq.]
         = \log(|\mathcal{X}|) + \log(e) - \log(e)
         = \log(|\mathcal{X}|)
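The bounds 0 \le H(X) \le \log|\mathcal{X}| are easy to check numerically; here is a small sketch (the three pmfs are arbitrary examples):

    import math

    def entropy(pmf):
        # Shannon entropy in bits; terms with p(x) = 0 contribute nothing.
        return -sum(p * math.log2(p) for p in pmf if p > 0)

    uniform = [0.25, 0.25, 0.25, 0.25]  # attains the maximum: log2(4) = 2 bits
    skewed = [0.7, 0.1, 0.1, 0.1]       # strictly between 0 and 2 bits
    point_mass = [1.0, 0.0, 0.0, 0.0]   # attains the minimum: 0 bits

    for pmf in (uniform, skewed, point_mass):
        print(entropy(pmf), "<=", math.log2(len(pmf)))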

The entropy of an n-dimensional random vector X = (X_1, X_2, ..., X_n) with pmf p(x) is defined as

    H(X) = H(X_1, X_2, ..., X_n) = -\sum_{x \in \mathcal{X}^n} p(x) \log p(x)

The joint entropy of random variables X and Y is simply the entropy of the vector (X, Y):

    H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)

Conditional Entropy: The entropy of a random variable Y conditioned on the event {X = x} is a function of the conditional distribution p_{Y|X}(\cdot | x) and is given by

    H(Y | X = x) = -\sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x)

The conditional entropy of Y given X is a function of the joint distribution p(x, y):

    H(Y|X) = \sum_{x} p(x) H(Y | X = x) = -\sum_{x, y} p(x, y) \log p(y|x)

Warning: Note that H(Y|X) is not a random variable! This differs from the usual convention for conditioning where, for example, E[Y|X] is a random variable.

Chain Rule: The joint entropy of X and Y can be decomposed as

    H(X, Y) = H(X) + H(Y|X)

and more generally,

    H(X_1, X_2, ..., X_n) = \sum_{i=1}^n H(X_i | X_{i-1}, ..., X_1)

Proof of chain rule:

    H(X, Y) = -\sum_{x,y} p(x, y) \log p(x, y)
            = -\sum_{x,y} p(x, y) \log( p(x) p(y|x) )
            = -\sum_{x,y} p(x, y) [ \log p(x) + \log p(y|x) ]
            = H(X) + H(Y|X)
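The chain rule can be verified directly on any small joint pmf. A sketch with an arbitrary 2-by-2 example:

    import math

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # An arbitrary joint pmf p(x, y) on {0, 1} x {0, 1}.
    p_xy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

    # Marginal p(x).
    p_x = {x: p_xy[(x, 0)] + p_xy[(x, 1)] for x in (0, 1)}

    # H(Y|X) = sum_x p(x) H(Y | X = x).
    H_Y_given_X = sum(
        p_x[x] * H([p_xy[(x, y)] / p_x[x] for y in (0, 1)]) for x in (0, 1)
    )

    print(H(p_xy.values()))               # H(X, Y), about 1.846 bits
    print(H(p_x.values()) + H_Y_given_X)  # H(X) + H(Y|X): the same value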

2.2.2 Mutual Information

Mutual information is a measure of the amount of information that one random variable contains about another random variable:

    I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log( p(x, y) / (p(x) p(y)) )

It can also be expressed as an expectation:

    I(X; Y) = E[i(X, Y)],   where   i(x, y) = \log( p(x, y) / (p(x) p(y)) )

Mutual information between X and Y can be expressed as the amount by which knowledge of X reduces the entropy of Y:

    I(X; Y) = H(Y) - H(Y|X)

and by symmetry,

    I(X; Y) = H(X) - H(X|Y)

Proof:

    \sum_{x,y} p(x, y) \log( p(x, y) / (p(x) p(y)) )
        = \sum_{x,y} p(x, y) \log( p(y|x) / p(y) )
        = -\sum_{x,y} p(x, y) \log p(y) + \sum_{x,y} p(x, y) \log p(y|x)
        = H(Y) - H(Y|X)

The mutual information between X and itself is equal to the entropy:

    I(X; X) = H(X) - H(X|X) = H(X)

Thus entropy is sometimes known as self-information.

[Venn diagram relating H(X), H(Y), H(X|Y), H(Y|X), and I(X; Y)]

The conditional mutual information between X and Y given Z is

    I(X; Y | Z) = \sum_{x,y,z} p(x, y, z) \log( p(x, y | z) / (p(x|z) p(y|z)) )

or equivalently,

    I(X; Y | Z) = E[i(X, Y | Z)],   where   i(x, y | z) = \log( p(x, y | z) / (p(x|z) p(y|z)) )

Chain Rule for mutual information:

    I(X; Y_1, Y_2) = I(X; Y_1) + I(X; Y_2 | Y_1)

and more generally,

    I(X; Y_1, Y_2, ..., Y_n) = \sum_{i=1}^n I(X; Y_i | Y_1, Y_2, ..., Y_{i-1})

2.2.3 Example: Testing for a disease

There is a 1% chance I have a certain disease. There exists a test for this disease which is 90% accurate (i.e., P[test is pos | I have disease] = P[test is neg | I don't have disease] = 0.9). Let

    X = 1 if I have the disease, 0 if I don't have the disease
    Y_i = 1 if the ith test is positive, 0 if the ith test is negative

Assume that the test outcomes Y = (Y_1, Y_2) are conditionally independent given X.

The probability mass functions can be computed as

    p(x, y)   y = (0,0)   y = (0,1)   y = (1,0)   y = (1,1)   p(x)
    x = 0     0.8019      0.0891      0.0891      0.0099      0.99
    x = 1     0.0001      0.0009      0.0009      0.0081      0.01

and

    p(y)      y = (0,0)   y = (0,1)   y = (1,0)   y = (1,1)
              0.8020      0.0900      0.0900      0.0180

    p(y_1)    y_1 = 0     y_1 = 1
              0.892       0.108

The individual entropies are

    H(X) = H_b(0.01),   H(Y_1) = H(Y_2) = H_b(0.108)

The conditional entropy of X given Y_1 is computed as follows:

    H(X | Y_1 = 1) = H_b(0.9167)
    H(X | Y_1 = 0) = H_b(0.0011)

and so

    H(X | Y_1) = P[Y_1 = 1] H(X | Y_1 = 1) + P[Y_1 = 0] H(X | Y_1 = 0)

Furthermore,

    H(X | Y_1, Y_2) = H(X, Y_1, Y_2) - H(Y_1, Y_2)
    H(Y_1 | Y_2) = H(Y_1, Y_2) - H(Y_2)

The mutual information is

    I(X; Y_1) = H(X) - H(X | Y_1)
    I(X; Y_1, Y_2) = H(X) - H(X | Y_1, Y_2)

The conditional mutual information is

    I(X; Y_2 | Y_1) = H(X | Y_1) - H(X | Y_1, Y_2)
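The arithmetic in this example is easy to mis-transcribe, so here is a sketch that recomputes the key quantities directly from the model (1% prior, 90% accurate tests, conditionally independent given X):

    import math
    from itertools import product

    def Hb(p):
        return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    prior, acc = 0.01, 0.9

    # Joint pmf p(x, y1, y2): the tests are conditionally independent given X.
    p = {}
    for x, y1, y2 in product((0, 1), repeat=3):
        px = prior if x == 1 else 1 - prior
        p[(x, y1, y2)] = px * (acc if y1 == x else 1 - acc) * (acc if y2 == x else 1 - acc)

    p_y1 = sum(v for (x, y1, _), v in p.items() if y1 == 1)  # P[Y1 = 1] = 0.108
    p_x1_y1 = sum(v for (x, y1, _), v in p.items() if x == 1 and y1 == 1) / p_y1

    print(Hb(prior))    # H(X)  = H_b(0.01)
    print(Hb(p_y1))     # H(Y1) = H_b(0.108)
    print(p_x1_y1)      # P[X = 1 | Y1 = 1] ~ 0.0833, so H(X | Y1 = 1) = H_b(0.9167)

    H_XYY = H(p.values())  # H(X, Y1, Y2)
    H_YY = H([sum(v for (_, a, b), v in p.items() if (a, b) == y)
              for y in product((0, 1), repeat=2)])  # H(Y1, Y2)
    print(Hb(prior) - (H_XYY - H_YY))  # I(X; Y1, Y2) = H(X) - H(X | Y1, Y2)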

2.2.4 Relative Entropy

The relative entropy between distributions p and q is defined by

    D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log( p(x) / q(x) )

This is also known as the Kullback-Leibler divergence. It can be expressed as the expectation of the log likelihood ratio:

    D(p \| q) = E[\Lambda(X)],   X ~ p,   \Lambda(x) = \log( p(x) / q(x) )

Note that if there exists x such that p(x) > 0 and q(x) = 0, then D(p \| q) = \infty.

Warning: D(p \| q) is not a metric, since it is not symmetric and it does not satisfy the triangle inequality.

Mutual information between X and Y is equal to the relative entropy between p_{X,Y}(x, y) and p_X(x) p_Y(y):

    I(X; Y) = D( p_{X,Y}(x, y) \| p_X(x) p_Y(y) )

Theorem: Relative entropy is nonnegative, i.e., D(p \| q) \ge 0. It is equal to zero if and only if p = q. (The proof of the equality condition is left as a problem in C&T.)

Proof:

    -D(p \| q) = \sum_x p(x) \log( q(x) / p(x) )
               \le \sum_x p(x) ( q(x)/p(x) - 1 ) \log(e)     [Fundamental Inq.]
               = ( \sum_x q(x) - \sum_x p(x) ) \log(e)
               = 0

and so D(p \| q) \ge 0.

Important consequences of the non-negativity of relative entropy:

- Mutual information is nonnegative, I(X; Y) \ge 0, with equality if and only if X and Y are independent.
- This means that H(X) - H(X|Y) \ge 0, and thus conditioning cannot increase entropy: H(X|Y) \le H(X).

Warning: Although conditioning cannot increase entropy (in expectation), it is possible that the entropy of X conditioned on a specific event, say {Y = y}, is greater than H(X), i.e., H(X | Y = y) > H(X).
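A short sketch illustrating both the nonnegativity and the asymmetry of relative entropy (the distributions are arbitrary examples):

    import math

    def kl_divergence(p, q):
        # D(p || q) in bits; infinite if q(x) = 0 somewhere that p(x) > 0.
        total = 0.0
        for pi, qi in zip(p, q):
            if pi > 0:
                if qi == 0:
                    return math.inf
                total += pi * math.log2(pi / qi)
        return total

    p = [0.5, 0.25, 0.25]
    q = [1/3, 1/3, 1/3]

    print(kl_divergence(p, q))                # > 0
    print(kl_divergence(q, p))                # > 0 and different: D is not symmetric
    print(kl_divergence(p, p))                # 0, since D(p || q) = 0 iff p = q
    print(kl_divergence(p, [1.0, 0.0, 0.0]))  # inf: q has a zero where p does not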

2.3 Convexity & Concavity

A function f(x) is convex over an interval (a, b) \subseteq R if for every x_1, x_2 \in (a, b) and 0 \le \lambda \le 1,

    f( \lambda x_1 + (1 - \lambda) x_2 ) \le \lambda f(x_1) + (1 - \lambda) f(x_2)

The function is strictly convex if equality holds only if \lambda = 0 or \lambda = 1.

[Illustration of convexity: the chord from (x_1, f(x_1)) to (x_2, f(x_2)) lies above the graph at \bar{x} = \lambda x_1 + (1 - \lambda) x_2]

Theorem: H(X) is a concave function of p(x), i.e.,

    H( \lambda p_1 + (1 - \lambda) p_2 ) \ge \lambda H(p_1) + (1 - \lambda) H(p_2)

This can be proved using the fundamental inequality (try it yourself). Here is an alternative proof which uses the fact that conditioning cannot increase entropy. Let Z be Bernoulli(\lambda) and let

    X ~ p_1 if Z = 1,   X ~ p_2 if Z = 0

Then

    H(X) = H( \lambda p_1 + (1 - \lambda) p_2 )

Since conditioning cannot increase entropy,

    H(X) \ge H(X | Z) = \lambda H(X | Z = 1) + (1 - \lambda) H(X | Z = 0) = \lambda H(p_1) + (1 - \lambda) H(p_2)

Combining the displays completes the proof.

Jensen's Inequality: If f(\cdot) is a convex function over an interval I and X is a random variable with support \mathcal{X} \subseteq I, then

    E[f(X)] \ge f(E[X])

Moreover, if f(\cdot) is strictly convex, equality occurs if and only if X = E[X] is a constant. The proof of Jensen's Inequality is given in [C&T].

Example: For any set of positive numbers {x_i}_{i=1}^n, the geometric mean is no greater than the arithmetic mean:

    ( \prod_{i=1}^n x_i )^{1/n} \le (1/n) \sum_{i=1}^n x_i

Proof: Let Z be uniformly distributed on {x_i} so that P[Z = x_i] = 1/n. By Jensen's inequality applied to the concave function \log,

    \log( ( \prod_{i=1}^n x_i )^{1/n} ) = (1/n) \sum_{i=1}^n \log(x_i) = E[\log(Z)] \le \log(E[Z]) = \log( (1/n) \sum_{i=1}^n x_i )
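A numerical check of this arithmetic-geometric mean consequence of Jensen's inequality (arbitrary positive numbers; math.prod requires Python 3.8+):

    import math

    xs = [1.0, 4.0, 9.0, 16.0]

    geometric_mean = math.prod(xs) ** (1 / len(xs))  # about 4.90
    arithmetic_mean = sum(xs) / len(xs)              # 7.5

    # Jensen with the concave log: E[log Z] <= log E[Z].
    print(geometric_mean, "<=", arithmetic_mean)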

2.4 Data Processing Inequality

Markov Chain: Random variables X, Y, Z form a Markov chain, denoted X → Y → Z, if X and Z are independent conditioned on Y:

    p(x, z | y) = p(x | y) p(z | y)

Alternatively,

    p(x, y, z) = p(x) p(y, z | x)           (always true)
               = p(x) p(y | x) p(z | x, y)  (always true)
               = p(x) p(y | x) p(z | y)     (if Markov chain)

Note that X → Y → Z implies Z → Y → X, and that if Z = f(Y) then X → Y → Z.

Theorem (Data Processing Inequality): If X → Y → Z, then

    I(X; Y) \ge I(X; Z)

In particular, for any function g(\cdot) we have X → Y → g(Y), and so I(X; Y) \ge I(X; g(Y)). No clever manipulation of Y can increase the mutual information!

Proof: By the chain rule, we can expand the mutual information in two different ways:

    I(X; Y, Z) = I(X; Z) + I(X; Y | Z) = I(X; Y) + I(X; Z | Y)

Since X and Z are conditionally independent given Y, we have I(X; Z | Y) = 0. Since I(X; Y | Z) \ge 0, we have

    I(X; Y) \ge I(X; Z)
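The data processing inequality can be observed numerically. Here is a sketch with a hypothetical chain X → Y → Z, where Z = g(Y) is a deterministic coarsening of a noisy observation Y of a fair bit X:

    import math

    def mutual_information(p_xy):
        # I(X; Y) in bits from a joint pmf given as {(x, y): prob}.
        p_x, p_y = {}, {}
        for (x, y), v in p_xy.items():
            p_x[x] = p_x.get(x, 0.0) + v
            p_y[y] = p_y.get(y, 0.0) + v
        return sum(v * math.log2(v / (p_x[x] * p_y[y]))
                   for (x, y), v in p_xy.items() if v > 0)

    p_x = {0: 0.5, 1: 0.5}
    p_y_given_x = {0: {0: 0.8, 1: 0.1, 2: 0.1}, 1: {0: 0.1, 1: 0.1, 2: 0.8}}
    g = {0: 0, 1: 0, 2: 1}  # post-processing: merge two of the three Y symbols

    p_xy = {(x, y): p_x[x] * p_y_given_x[x][y] for x in p_x for y in (0, 1, 2)}
    p_xz = {}
    for (x, y), v in p_xy.items():
        p_xz[(x, g[y])] = p_xz.get((x, g[y]), 0.0) + v

    print(mutual_information(p_xy))  # I(X; Y)
    print(mutual_information(p_xz))  # I(X; Z): never larger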

2.5 Fano's Inequality

Suppose we want to estimate a random variable X from an observation Y. The probability of error for an estimator \hat{X} = \phi(Y) is

    P_e = P[ \hat{X} \ne X ]

Theorem (Fano's Inequality): For any estimator \hat{X} such that X → Y → \hat{X},

    H_b(P_e) + P_e \log(|\mathcal{X}|) \ge H(X | Y)

and thus

    P_e \ge ( H(X | Y) - 1 ) / \log(|\mathcal{X}|)

Remark: Fano's Inequality provides a lower bound on P_e for any possible function of Y!

Proof: Let E be a random variable that indicates whether the estimate is correct:

    E = 1 if \hat{X} = X,   E = 0 if \hat{X} \ne X

By the chain rule, the entropy of (E, X) given \hat{X} can be expanded in two different ways:

    H(E, X | \hat{X}) = H(X | \hat{X}) + H(E | X, \hat{X})
                      = H(E | \hat{X}) + H(X | E, \hat{X})

- H(E | X, \hat{X}) = 0 since E is a deterministic function of X and \hat{X}.
- H(E | \hat{X}) \le H(E) = H_b(P_e) since conditioning cannot increase entropy.
- By the data processing inequality, H(X | \hat{X}) \ge H(X | Y).
- Furthermore,

    H(X | E, \hat{X}) = P[E = 1] H(X | \hat{X}, E = 1) + P[E = 0] H(X | \hat{X}, E = 0) \le P_e \log(|\mathcal{X}|)

  since H(X | \hat{X}, E = 1) = 0, H(X | \hat{X}, E = 0) \le \log(|\mathcal{X}|), and P[E = 0] = P_e.

Putting everything together proves the desired result.

2.6 Summary of Basic Inequalities

Jensen's inequality: If f is a convex function, then

    E[f(X)] \ge f(E[X])

and if f is a concave function, then

    E[f(X)] \le f(E[X])

Data Processing Inequality: If X → Y → Z form a Markov chain, then

    I(X; Y) \ge I(X; Z)

Fano's Inequality: If X → Y → \hat{X} forms a Markov chain, then

    P[ X \ne \hat{X} ] \ge ( H(X | Y) - 1 ) / \log(|\mathcal{X}|)
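As a quick illustration of how the Fano bound is used, here is a sketch with hypothetical numbers (X uniform over 8 values, with 2 bits of residual uncertainty after observing Y):

    import math

    def fano_lower_bound(H_X_given_Y, alphabet_size):
        # P_e >= (H(X|Y) - 1) / log2(|X|); informative only when positive.
        return (H_X_given_Y - 1) / math.log2(alphabet_size)

    # If X is uniform over 8 values and H(X|Y) = 2 bits, then no estimator
    # based on Y can have error probability below 1/3.
    print(fano_lower_bound(2.0, 8))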

2.7 Axiomatic Derivation of Mutual Information [Optional]

This section is based on lecture notes from Toby Berger. Let X, Y denote discrete random variables with respective alphabets \mathcal{X} and \mathcal{Y}. (Assume |\mathcal{X}| < \infty and |\mathcal{Y}| < \infty.)

Let i(x, y) be the amount of information about the event {X = x} conveyed by learning {Y = y}. Let i(x, y | z) be the amount of information about the event {X = x} conveyed by learning {Y = y} conditioned on the event {Z = z}.

Consider the four postulates:

(A) Bayesianness: i(x, y) depends only on p(x) and p(x | y), i.e.,

    i(x, y) = f(\alpha, \beta)   with   \alpha = p(x), \beta = p(x | y)

for some function f : [0, 1]^2 → R.

(B) Smoothness: the partial derivatives of f(\cdot, \cdot) exist:

    f_1(\alpha, \beta) = \partial f(\alpha, \beta) / \partial\alpha,   f_2(\alpha, \beta) = \partial f(\alpha, \beta) / \partial\beta

(C) Successive revelation: Let y = (w, z). Then

    i(x, y) = i(x, w) + i(x, z | w)

where i(x, w) = f(p(x), p(x | w)) and i(x, z | w) = f(p(x | w), p(x | z, w)), and so the function f(\cdot, \cdot) must obey

    f(\alpha, \gamma) = f(\alpha, \beta) + f(\beta, \gamma),   0 \le \alpha, \beta, \gamma \le 1

(D) Additivity: If (X, Y) and (U, V) are independent, i.e., p(x, y, u, v) = p(x, y) p(u, v), then

    i((x, u), (y, v)) = i(x, y) + i(u, v)

where i((x, u), (y, v)) = f(p(x, u), p(x, u | y, v)) = f(p(x) p(u), p(x | y) p(u | v)), and so the function f(\cdot, \cdot) must obey

    f(\alpha\gamma, \beta\delta) = f(\alpha, \beta) + f(\gamma, \delta),   0 \le \alpha, \beta, \gamma, \delta \le 1

Theorem: The function

    i(x, y) = \log( p(x, y) / (p(x) p(y)) )

is the only function which satisfies the four postulates above.

2.7.1 Proof of uniqueness of i(x, y)

Because of (B), we can apply \partial / \partial\beta to the left and right sides of (C):

    0 = f_2(\alpha, \beta) + f_1(\beta, \gamma)  \implies  f_2(\alpha, \beta) = -f_1(\beta, \gamma)

Thus f_2(\alpha, \beta) must be a function only of \beta, say g'(\beta). Integrating w.r.t. \beta gives

    \int f_2(\alpha, \beta) \, d\beta = f(\alpha, \beta) + c(\alpha)

i.e.,

    \int g'(\beta) \, d\beta = g(\beta) = f(\alpha, \beta) + c(\alpha)

and so

    f(\alpha, \beta) = g(\beta) - c(\alpha)

Putting this back into (C),

    f(\alpha, \gamma) = g(\gamma) - c(\alpha) = [ g(\beta) - c(\alpha) ] + [ g(\gamma) - c(\beta) ]  \implies  c(\beta) = g(\beta)

and therefore

    f(\alpha, \beta) = g(\beta) - g(\alpha)

Next, write (D) in terms of g(\cdot):

    g(\beta\delta) - g(\alpha\gamma) = g(\beta) - g(\alpha) + g(\delta) - g(\gamma)

Take the derivative w.r.t. \delta of both sides to get

    \beta g'(\beta\delta) = g'(\delta)

Set \delta = 1/2 (could be \delta = 1, but scared to try):

    \beta g'(\beta/2) = g'(1/2) = K, a constant

and so

    g'(\beta/2) = K / \beta

Take the integral of both sides with respect to \beta to get

    g(\beta/2) = K \ln(\beta) + C

So g(x) = K \ln(2x) + C, or equivalently g(x) = K \ln(x) + C' after absorbing K \ln(2) into the constant. Thus

    f(\alpha, \beta) = g(\beta) - g(\alpha) = K \ln(\beta) - K \ln(\alpha) = K \ln(\beta / \alpha)

By (A),

    i(x, y) = K \ln( p(x | y) / p(x) )

Choosing K is equivalent to choosing the log base:

- K = 1 corresponds to measuring information in nats
- K = \log_2(e) corresponds to measuring information in bits
