Entropy, Relative Entropy and Mutual Information Exercises


Exercise 2.1: Coin flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required.

(a) Find the entropy H(X) in bits. The following expressions may be useful:

    \sum_{n=1}^{\infty} r^n = \frac{r}{1-r}, \qquad \sum_{n=1}^{\infty} n r^n = \frac{r}{(1-r)^2}

(b) A random variable X is drawn according to this distribution. Find an "efficient" sequence of yes-no questions of the form "Is X contained in the set S?" Compare H(X) to the expected number of questions required to determine X.

Solution: The probability mass function of X is P{X = i} = 0.5^i, i = 1, 2, .... Hence,

    H(X) = -\sum_{i=1}^{\infty} p_i \log p_i = -\sum_{i=1}^{\infty} 0.5^i \log(0.5^i)
         = -\log(0.5) \sum_{i=1}^{\infty} i \, 0.5^i = \frac{0.5}{(1-0.5)^2} = 2 bits.

Exercise: Minimum entropy. What is the minimum value of H(p_1, ..., p_n) = H(p) as p ranges over the set of n-dimensional probability vectors? Find all p's which achieve this minimum.

Solution: Since H(p) ≥ 0 and \sum_i p_i = 1, the minimum value of H(p) is 0, which is achieved when p_i = 1 for some i and p_j = 0 for all j ≠ i.
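As a quick numerical sanity check of the coin-flip entropy, the minimal Python sketch below (assuming NumPy is available; the truncation point 60 is an arbitrary choice) reproduces H(X) = 2 bits, and also evaluates E[X] = 2, the average number of questions used by the natural strategy "Is X = 1?", "Is X = 2?", ... referred to in part (b).

    import numpy as np

    # P{X = i} = 0.5**i, i = 1, 2, ...: number of flips until the first head.
    i = np.arange(1, 61)                  # truncate the infinite sum; the tail is negligible
    p = 0.5 ** i

    print(p.sum())                        # ~1.0: (almost) a valid distribution
    print(-(p * np.log2(p)).sum())        # ~2.0 bits, matching the closed form
    print((i * p).sum())                  # E[X] ~ 2: expected number of yes-no questions
                                          # for the strategy "Is X = 1?", "Is X = 2?", ...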

Exercise: Average entropy. Let H(p) = -p \log_2 p - (1-p) \log_2 (1-p) be the binary entropy function.

(a) Evaluate H(1/4).

(b) Calculate the average entropy H(p) when the probability p is chosen uniformly in the range 0 ≤ p ≤ 1.

Solution:

(a) H(1/4) = -\frac{1}{4} \log_2 \frac{1}{4} - \left(1 - \frac{1}{4}\right) \log_2 \left(1 - \frac{1}{4}\right) = 0.8113 bits.

(b) The average entropy is

    \bar{H} = E[H(p)] = \int_0^1 H(p) f(p) \, dp, \qquad
    f(p) = \begin{cases} 1, & 0 \le p \le 1, \\ 0, & \text{otherwise}. \end{cases}

So

    \bar{H} = \int_0^1 H(p) \, dp
            = -\int_0^1 \big( p \log p + (1-p) \log(1-p) \big) \, dp
            = -\left[ \int_0^1 p \log p \, dp + \int_0^1 q \log q \, dq \right]
            = -2 \int_0^1 p \log p \, dp.

Letting u = \ln p and v = p^2/2 and integrating by parts, we have

    \bar{H} = -\frac{2}{\ln 2} \int_0^1 p \ln p \, dp
            = -\frac{2}{\ln 2} \left( \left[ \frac{p^2}{2} \ln p \right]_0^1 - \int_0^1 \frac{p}{2} \, dp \right)
            = -\frac{2}{\ln 2} \left( 0 - \frac{1}{4} \right)
            = \frac{1}{2 \ln 2} \approx 0.7213 bits.
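The closed-form answers in (a) and (b) can be checked numerically. The sketch below (Python with NumPy; the grid size and the clipping constant are arbitrary choices) evaluates H(1/4) and approximates the average of H(p) over a uniform p by a fine Riemann sum.

    import numpy as np

    def h2(p):
        """Binary entropy in bits."""
        p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0) at the endpoints
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    print(h2(0.25))                       # ~0.8113 bits, part (a)

    # Part (b): average of H(p) over p ~ Uniform[0, 1].
    grid = np.linspace(0.0, 1.0, 200001)
    print(h2(grid).mean())                # ~0.7213 bits
    print(1 / (2 * np.log(2)))            # the closed form 1/(2 ln 2)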

Exercise 2.6: Example of joint entropy. Let p(x, y) be given by

    x \ y |   0     1
    ------+------------
      0   |  1/3   1/3
      1   |   0    1/3

Find

(a) H(X), H(Y).
(b) H(X|Y), H(Y|X).
(c) H(X, Y).
(d) H(Y) - H(Y|X).
(e) I(X; Y).
(f) Draw a Venn diagram for the quantities in (a) through (e).

Solution:

(a) The marginals are p(x) = (2/3, 1/3) and p(y) = (1/3, 2/3), so

    H(X) = -\frac{2}{3} \log \frac{2}{3} - \frac{1}{3} \log \frac{1}{3} = \log 3 - \frac{2}{3} = 0.918 bits,

    H(Y) = -\frac{1}{3} \log \frac{1}{3} - \frac{2}{3} \log \frac{2}{3} = 0.918 bits.

(b) H(X|Y) = \sum_{x,y} p(x, y) \log \frac{p(y)}{p(x, y)}
           = \frac{1}{3} \log 1 + \frac{1}{3} \log 2 + 0 + \frac{1}{3} \log 2
           = \frac{2}{3} \log 2 + \frac{1}{3} \log 1 = \frac{2}{3} bits,

    H(Y|X) = \sum_{x,y} p(x, y) \log \frac{p(x)}{p(x, y)}
           = \frac{1}{3} \log 2 + \frac{1}{3} \log 2 + 0 + \frac{1}{3} \log 1
           = \frac{2}{3} \log 2 + \frac{1}{3} \log 1 = \frac{2}{3} bits.

(c) H(X, Y) = -\sum_{x,y} p(x, y) \log p(x, y)
            = -\left[ \frac{1}{3} \log \frac{1}{3} + \frac{1}{3} \log \frac{1}{3} + 0 + \frac{1}{3} \log \frac{1}{3} \right]
            = \log 3 = 1.585 bits.

(d) H(Y) - H(Y|X) = 0.918 - \frac{2}{3} = \log 3 - \frac{4}{3} = 0.2516 bits.

(e) I(X; Y) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}
            = \frac{1}{3} \log \frac{3}{2} + \frac{1}{3} \log \frac{3}{4} + 0 + \frac{1}{3} \log \frac{3}{2}
            = \log 3 - \frac{4}{3} = 0.2516 bits = H(Y) - H(Y|X).
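A short numerical check of this joint-entropy example. The Python sketch below uses the chain-rule identities H(X|Y) = H(X,Y) - H(Y), H(Y|X) = H(X,Y) - H(X) and I(X;Y) = H(X) + H(Y) - H(X,Y) rather than the direct sums worked above; it is only an illustration of the same numbers.

    import numpy as np

    # Joint distribution p(x, y) from the table: rows are x = 0, 1; columns are y = 0, 1.
    P = np.array([[1/3, 1/3],
                  [0.0, 1/3]])

    def H(p):
        """Entropy in bits of a probability array (zero entries are ignored)."""
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    H_X, H_Y, H_XY = H(P.sum(axis=1)), H(P.sum(axis=0)), H(P)
    print(H_X, H_Y)                       # (a) ~0.918 bits each
    print(H_XY - H_Y, H_XY - H_X)         # (b) H(X|Y) = H(Y|X) = 2/3 bits
    print(H_XY)                           # (c) ~1.585 bits = log2(3)
    print(H_X + H_Y - H_XY)               # (d), (e) I(X;Y) ~ 0.2516 bits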

Exercise 2.8: Entropy of a sum. Let X and Y be random variables that take on values x_1, x_2, ..., x_r and y_1, y_2, ..., y_s, respectively. Let Z = X + Y.

(a) Show that H(Z|X) = H(Y|X). Argue that if X, Y are independent then H(Y) ≤ H(Z) and H(X) ≤ H(Z). Thus the addition of independent random variables adds uncertainty.
(b) Give an example (of necessarily dependent random variables) in which H(X) > H(Z) and H(Y) > H(Z).
(c) Under what conditions does H(Z) = H(X) + H(Y)?

Solution:

(a) Since, given X = x, Z = x + Y is a one-to-one function of Y,

    H(Z|X) = -\sum_x \sum_z p(x, z) \log p(z|x)
           = -\sum_x p(x) \sum_z p(z|x) \log p(z|x)
           = -\sum_x p(x) \sum_y p(y|x) \log p(y|x)
           = H(Y|X).

If X, Y are independent, then H(Y|X) = H(Y). Now H(Z) ≥ H(Z|X) = H(Y|X) = H(Y). Similarly, H(X) ≤ H(Z).

(b) If X, Y are dependent such that Pr{Y = -x | X = x} = 1, then Pr{Z = 0} = 1, so that H(X) = H(Y) > 0 but H(Z) = 0. Hence H(X) > H(Z) and H(Y) > H(Z). Another example is the sum of the two opposite faces of a die, which always adds to seven.

(c) H(Z) = H(X) + H(Y) when the random variables X, Y are independent and x_i + y_j ≠ x_m + y_n for all (i, j) ≠ (m, n), i.e. two different pairs of values never sum to the same value; in other words, the alphabet of Z has r · s distinct values. The proof is as follows. Notice that Z = X + Y = φ(X, Y). Now

    H(Z) = H(φ(X, Y)) ≤ H(X, Y) = H(X) + H(Y|X) ≤ H(X) + H(Y).

If φ(·) is a bijection (i.e. only one pair (x, y) maps to each value of z), then H(φ(X, Y)) = H(X, Y), and if X, Y are independent then H(Y|X) = H(Y). Hence, with these two conditions, H(Z) = H(X) + H(Y).

Exercise 2.2: Data processing. Let X_1 → X_2 → X_3 → ... → X_n form a Markov chain in this order; i.e., let

    p(x_1, x_2, ..., x_n) = p(x_1) p(x_2|x_1) \cdots p(x_n|x_{n-1}).

Reduce I(X_1; X_2, ..., X_n) to its simplest form.

Solution:

    I(X_1; X_2, ..., X_n)
      = H(X_1) - H(X_1 | X_2, ..., X_n)
      = H(X_1) - \left[ H(X_1, X_2, ..., X_n) - H(X_2, ..., X_n) \right]
      = H(X_1) - \left[ \sum_{i=1}^{n} H(X_i | X_{i-1}, ..., X_1) - \sum_{i=2}^{n} H(X_i | X_{i-1}, ..., X_2) \right]
      = H(X_1) - \left[ \left( H(X_1) + \sum_{i=2}^{n} H(X_i | X_{i-1}) \right) - \left( H(X_2) + \sum_{i=3}^{n} H(X_i | X_{i-1}) \right) \right]
      = H(X_2) - H(X_2 | X_1)
      = I(X_2; X_1) = I(X_1; X_2),

where the fourth equality uses the Markov property, H(X_i | X_{i-1}, ..., X_1) = H(X_i | X_{i-1}) and H(X_i | X_{i-1}, ..., X_2) = H(X_i | X_{i-1}) for i ≥ 3.
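The identity I(X_1; X_2, ..., X_n) = I(X_1; X_2) just derived can be checked numerically on a small Markov chain. The sketch below (Python with NumPy; the three-state chain and the random transition matrices are arbitrary choices, not part of the exercise) builds the joint law of X_1 → X_2 → X_3 and compares the two mutual informations.

    import numpy as np

    def H(p):
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    rng = np.random.default_rng(0)
    def random_stochastic(*shape):
        m = rng.random(shape)
        return m / m.sum(axis=-1, keepdims=True)

    p1 = random_stochastic(3)             # p(x1)
    A = random_stochastic(3, 3)           # p(x2 | x1)
    B = random_stochastic(3, 3)           # p(x3 | x2)

    # Joint law of the Markov chain X1 -> X2 -> X3.
    joint = p1[:, None, None] * A[:, :, None] * B[None, :, :]
    p12, p23 = joint.sum(axis=2), joint.sum(axis=0)

    I_1_23 = H(p1) + H(p23) - H(joint)            # I(X1; X2, X3)
    I_1_2 = H(p1) + H(p12.sum(axis=0)) - H(p12)   # I(X1; X2)
    print(I_1_23, I_1_2)                  # numerically equal, as derived above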

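Returning to the entropy-of-a-sum exercise, the sketch below illustrates parts (b) and (c) numerically. The specific supports and probabilities are made up for illustration only: the first case is the dependent pair Y = -X, the second an independent pair whose supports never produce the same sum from two different value pairs.

    import numpy as np
    from collections import Counter

    def H(probs):
        p = np.asarray(list(probs), dtype=float)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # Part (b): X uniform on {1,...,6} and Y = -X, so Z = X + Y = 0 with probability 1.
    print(H([1/6] * 6), H([1.0]))         # H(X) = H(Y) ~ 2.585 bits, but H(Z) = 0

    # Part (c): independent X, Y whose supports never produce the same sum twice.
    x_vals, px = [0, 1, 2], [0.2, 0.3, 0.5]
    y_vals, py = [0, 10, 20], [0.6, 0.3, 0.1]
    pz = Counter()
    for xv, pxv in zip(x_vals, px):
        for yv, pyv in zip(y_vals, py):
            pz[xv + yv] += pxv * pyv      # each (x, y) pair yields a distinct z
    print(H(px) + H(py), H(pz.values()))  # equal: H(Z) = H(X) + H(Y)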
Exercise: Fano's inequality. Let Pr(X = i) = p_i, i = 1, 2, ..., m, and let p_1 ≥ p_2 ≥ p_3 ≥ ... ≥ p_m. The minimal-probability-of-error predictor of X is \hat{X} = 1, with resulting probability of error P_e = 1 - p_1. Maximise H(p) subject to the constraint 1 - p_1 = P_e to find a bound on P_e in terms of H. This is Fano's inequality in the absence of conditioning.

Solution: We want to maximise H(p) = -\sum_{i=1}^{m} p_i \log p_i subject to the constraints 1 - p_1 = P_e and \sum_i p_i = 1. Form the Lagrangian

    L = H(p) + \lambda (P_e - 1 + p_1) + \mu \left( \sum_i p_i - 1 \right)

and take the partial derivatives with respect to each p_i and the Lagrange multipliers:

    \frac{\partial L}{\partial p_i} = -(\log p_i + 1) + \mu, \quad i \neq 1
    \frac{\partial L}{\partial p_1} = -(\log p_1 + 1) + \lambda + \mu
    \frac{\partial L}{\partial \lambda} = P_e - 1 + p_1
    \frac{\partial L}{\partial \mu} = \sum_i p_i - 1

Setting these equations to zero, we have

    p_i = 2^{\mu - 1} \ (i \neq 1), \qquad p_1 = 2^{\lambda + \mu - 1}, \qquad p_1 = 1 - P_e, \qquad \sum_i p_i = 1.

We proceed by eliminating \mu. Since the probabilities must sum to one,

    1 - P_e + \sum_{i \neq 1} 2^{\mu - 1} = 1, \qquad \text{so} \qquad 2^{\mu - 1} = \frac{P_e}{m - 1}.

Hence, we have

    p_1 = 1 - P_e, \qquad p_i = \frac{P_e}{m - 1}, \quad i \neq 1.

Since we know that for these probabilities the entropy is maximised,

    H(p) \leq -(1 - P_e) \log(1 - P_e) - \sum_{i \neq 1} \frac{P_e}{m - 1} \log \frac{P_e}{m - 1}
          = -(1 - P_e) \log(1 - P_e) - P_e \log P_e + P_e \log(m - 1)
          = H(P_e) + P_e \log(m - 1),

from which we get Fano's inequality in the absence of conditioning.
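Finally, the unconditioned Fano bound H(p) ≤ H(P_e) + P_e log(m - 1) can be checked numerically. The sketch below (Python with NumPy; m = 5, the seed, and P_e = 0.3 are arbitrary choices) evaluates the bound on a few random distributions and verifies that the maximising distribution found above attains it with equality.

    import numpy as np

    def H(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    m = 5
    rng = np.random.default_rng(1)
    for _ in range(5):
        p = np.sort(rng.dirichlet(np.ones(m)))[::-1]   # p1 >= p2 >= ... >= pm
        Pe = 1 - p[0]                                  # error probability of the guess X_hat = 1
        bound = H([Pe, 1 - Pe]) + Pe * np.log2(m - 1)
        print(round(H(p), 4), "<=", round(bound, 4))   # Fano without conditioning

    # The maximising distribution from the solution attains the bound with equality.
    Pe = 0.3
    p_star = [1 - Pe] + [Pe / (m - 1)] * (m - 1)
    print(H(p_star), H([Pe, 1 - Pe]) + Pe * np.log2(m - 1))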