Lecture 17: Differential Entropy

Outline: differential entropy; the AEP for differential entropy; quantization; maximum differential entropy; the estimation counterpart of Fano's inequality.

Dr. Yao Xie, ECE587, Information Theory, Duke University

From the discrete world to the continuous world

Differential entropy

For a continuous random variable X with probability density function (PDF) f, the differential entropy is

  h(X) = -∫_S f(x) log f(x) dx,

where S is the support of f. It is sometimes denoted h(f).

Uniform distribution

Let f(x) = 1/a for x ∈ [0, a]. Differential entropy:

  h(X) = -∫_0^a (1/a) log(1/a) dx = ∫_0^a (1/a) log a dx = log a bits.

For a < 1, log a < 0, so differential entropy can be negative (unlike entropy in the discrete world). Interpretation: the volume of the support set is 2^{h(X)} = 2^{log a} = a > 0.
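
As a quick numerical illustration (not part of the original slides; a minimal sketch assuming numpy is available, with a = 1/4 chosen only for illustration):

```python
import numpy as np

# Numerical check of h(X) = log2(a) bits for X ~ Uniform[0, a]:
# integrate -f(x) log2 f(x) over the support with a simple Riemann sum.
a = 0.25
x = np.linspace(0.0, a, 100_000, endpoint=False)
f = np.full_like(x, 1.0 / a)                      # density is 1/a on [0, a]
h_numeric = -np.sum(f * np.log2(f)) * (a / len(x))

print(h_numeric)   # approximately -2.0 bits
print(np.log2(a))  # exactly -2.0: negative because a < 1
```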

Normal distribution

Let f(x) = (1/√(2πσ²)) e^{-x²/(2σ²)}, x ∈ R. Differential entropy: h(X) = (1/2) log(2πeσ²) bits.

Calculation:

  h(X) = -∫ f(x) log f(x) dx
       = ∫ f(x) [ (x²/(2σ²)) log e + (1/2) log(2πσ²) ] dx
       = (log e) E[X²]/(2σ²) + (1/2) log(2πσ²)
       = (1/2) log e + (1/2) log(2πσ²)
       = (1/2) log(2πeσ²).
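
A numerical check of this formula (not from the slides; assumes numpy and scipy, and σ = 2 is an arbitrary choice) compares the closed form, scipy's built-in entropy, and a Monte Carlo estimate of -E[log₂ f(X)]:

```python
import numpy as np
from scipy.stats import norm

# Check h(X) = 0.5*log2(2*pi*e*sigma^2) bits for a Gaussian, three ways.
sigma = 2.0
closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, sigma, size=1_000_000)
monte_carlo = -np.mean(np.log2(norm.pdf(samples, scale=sigma)))

print(closed_form)                            # about 3.047 bits
print(norm.entropy(scale=sigma) / np.log(2))  # scipy returns nats; convert to bits
print(monte_carlo)                            # close to the closed form
```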

Some properties

  h(X + c) = h(X)               (translation does not change differential entropy)
  h(aX) = h(X) + log|a|         (scaling by a scalar a ≠ 0)
  h(AX) = h(X) + log|det(A)|    (X a random vector, A a nonsingular matrix)
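
A small sanity check of the scaling property (an illustrative sketch, not from the slides; it assumes numpy and uses the Gaussian closed form, since aX ~ N(0, a²σ²) when X ~ N(0, σ²)):

```python
import numpy as np

# Verify h(aX) = h(X) + log2|a| using the Gaussian closed form in bits.
def gaussian_h_bits(sigma):
    return 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

sigma, a = 1.5, -3.0
lhs = gaussian_h_bits(abs(a) * sigma)           # h(aX), since aX ~ N(0, a^2 sigma^2)
rhs = gaussian_h_bits(sigma) + np.log2(abs(a))  # h(X) + log2|a|
print(lhs, rhs)  # identical up to floating-point error
```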

AEP for continuous random variables

Discrete world: for a sequence of i.i.d. random variables, p(X_1, ..., X_n) ≈ 2^{-nH(X)}.

Continuous world: for a sequence of i.i.d. random variables,

  -(1/n) log f(X_1, ..., X_n) → h(X)   in probability.

The proof follows from the weak law of large numbers.
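
The convergence can be seen empirically; the sketch below (not from the slides; assumes numpy and scipy, standard-normal samples, and illustrative sample sizes) shows -(1/n) log₂ f(X_1, ..., X_n) settling near h(X) ≈ 2.047 bits as n grows:

```python
import numpy as np
from scipy.stats import norm

# AEP check: for i.i.d. X_i ~ N(0,1), -(1/n) log2 f(X_1,...,X_n) concentrates
# around h(X) = 0.5*log2(2*pi*e), roughly 2.047 bits, as n grows.
rng = np.random.default_rng(1)
h = 0.5 * np.log2(2 * np.pi * np.e)

for n in (10, 100, 10_000):
    x = rng.normal(size=n)
    empirical = -np.sum(np.log2(norm.pdf(x))) / n   # i.i.d.: joint density factorizes
    print(n, empirical, h)
```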

Size of the typical set

Discrete world: the number of typical sequences is |A_ϵ^(n)| ≈ 2^{nH(X)}.

Continuous world: we measure the volume of the typical set instead. Volume of a set A:

  Vol(A) = ∫_A dx_1 ... dx_n.

p(a (n) ϵ ) > 1 ϵ for n sufficiently large Vol(A) 2 n(h(x)+ϵ) for n sufficiently large Vol(A) (1 ϵ)2 n(h(x) ϵ) for n sufficiently large Proofs: very similar to the discrete world 1 = f(x 1,, x n )dx 1 dx n A (n) ϵ f(x 1,, x n )dx 1 dx n 2 n(h(x)+ϵ) A (n) ϵ dx 1 dx n Dr. Yao Xie, ECE587, Information Theory, Duke University 8

The volume of the smallest set that contains most of the probability is approximately 2^{nh(X)}; in n-dimensional space, this means that each dimension has measure about 2^{h(X)}.

Differential entropy corresponds to the volume of the typical set; Fisher information corresponds to the surface area of the typical set.

Mean value theorem (MVT)

If a function f is continuous on the closed interval [a, b] and differentiable on (a, b), then there exists a point c ∈ (a, b) such that

  f'(c) = (f(b) - f(a)) / (b - a).

Relation of continuous entropy to discrete entropy

Discretize a continuous pdf f(x): divide the range of X into bins of length Δ. By the MVT, there exists a value x_i within each bin such that

  f(x_i) Δ = ∫_{iΔ}^{(i+1)Δ} f(x) dx.

Define a quantized random variable X^Δ ∈ {x_1, x_2, ...} with probability mass function

  p_i = ∫_{iΔ}^{(i+1)Δ} f(x) dx = f(x_i) Δ.

Entropy of X^Δ:

  H(X^Δ) = -Σ_i p_i log p_i = -Σ_i Δ f(x_i) log f(x_i) - log Δ.

If f(x) is Riemann integrable, then H(X^Δ) + log Δ → h(X) as Δ → 0.
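
A numerical illustration of this limit (not from the slides; assumes numpy and scipy, with X ~ N(0, 1) and a few illustrative bin widths):

```python
import numpy as np
from scipy.stats import norm

# Check H(X^Delta) + log2(Delta) -> h(X) as Delta -> 0 for X ~ N(0,1),
# where X^Delta is X quantized into bins of width Delta.
h = 0.5 * np.log2(2 * np.pi * np.e)   # h(X), approximately 2.047 bits

for delta in (1.0, 0.1, 0.01):
    edges = np.arange(-10, 10 + delta, delta)
    p = np.diff(norm.cdf(edges))          # bin probabilities p_i
    p = p[p > 0]
    H = -np.sum(p * np.log2(p))           # entropy of the quantized variable
    print(delta, H + np.log2(delta), h)   # second column converges to h(X)
```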

Implication for quantization

Let Δ = 2^{-n}, so log Δ = -n. In general, h(X) + n is the average number of bits needed to describe a continuous variable X to n-bit accuracy.

Example: if X is uniform on [0, 1/8], the first 3 bits of its binary expansion must be zero, so describing X to n-bit accuracy needs only n - 3 bits. This agrees with h(X) = log(1/8) = -3.

Example: if X ~ N(0, 100), then n + h(X) = n + 0.5 log(2πe · 100) ≈ n + 5.37 bits.

Joint and conditional differential entropy

Joint differential entropy:

  h(X_1, ..., X_n) = -∫ f(x_1, ..., x_n) log f(x_1, ..., x_n) dx_1 ... dx_n.

Conditional differential entropy:

  h(X | Y) = -∫ f(x, y) log f(x | y) dx dy = h(X, Y) - h(Y).

Relative entropy and mutual information

Relative entropy:

  D(f || g) = ∫ f log (f/g),

which is not necessarily finite; we use the convention 0 log(0/0) = 0.

Mutual information:

  I(X; Y) = ∫ f(x, y) log [ f(x, y) / (f(x) f(y)) ] dx dy.

As in the discrete world, I(X; Y) ≥ 0 and D(f || g) ≥ 0, and the same Venn-diagram relationships hold: chain rule, conditioning reduces entropy, the union bound, and so on.

Entropy of the multivariate Gaussian

Theorem. Let (X_1, ..., X_n) ~ N(µ, K), where µ is the mean vector and K is the covariance matrix. Then

  h(X_1, X_2, ..., X_n) = (1/2) log( (2πe)^n |K| ) bits,

where |K| denotes the determinant of K.

Proof:
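
A quick check of the theorem (an illustrative sketch, not from the slides; assumes numpy and scipy and an arbitrarily chosen 2×2 covariance matrix):

```python
import numpy as np
from scipy.stats import multivariate_normal

# h(X_1,...,X_n) = 0.5*log2((2*pi*e)^n |K|) for a Gaussian vector; compare the
# formula against scipy's entropy for the same distribution (scipy returns nats).
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
n = K.shape[0]

formula_bits = 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))
scipy_bits = multivariate_normal(mean=np.zeros(n), cov=K).entropy() / np.log(2)
print(formula_bits, scipy_bits)  # the two values agree
```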

Mutual information of the multivariate Gaussian

Let (X, Y) ~ N(0, K) with

  K = [ σ²    ρσ² ]
      [ ρσ²   σ²  ].

Then

  h(X) = h(Y) = (1/2) log(2πeσ²),
  h(X, Y) = (1/2) log( (2πe)² |K| ),
  I(X; Y) = h(X) + h(Y) - h(X, Y) = -(1/2) log(1 - ρ²).

For ρ = ±1 (perfectly correlated), the mutual information is infinite.
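
The same identities can be verified numerically; the sketch below (not from the slides; assumes numpy, σ² = 1, and a few illustrative correlation values) recovers I(X; Y) = -(1/2) log₂(1 - ρ²) from the three entropies:

```python
import numpy as np

# I(X;Y) for a bivariate Gaussian with correlation rho equals
# h(X) + h(Y) - h(X,Y) = -0.5*log2(1 - rho^2); it grows without bound as rho -> +/-1.
sigma2 = 1.0
for rho in (0.0, 0.5, 0.9, 0.999):
    K = np.array([[sigma2, rho * sigma2],
                  [rho * sigma2, sigma2]])
    hX = hY = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
    hXY = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(K))
    print(rho, hX + hY - hXY, -0.5 * np.log2(1 - rho**2))  # last two columns match
```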

The multivariate Gaussian is the maximum entropy distribution

Theorem. Let X ∈ R^n be a random vector with zero mean and covariance matrix K. Then

  h(X) ≤ (1/2) log( (2πe)^n |K| ),

with equality if and only if X ~ N(0, K).

Proof:
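
As a simple one-dimensional illustration of the theorem (not from the slides; assumes numpy and compares against a uniform density matched to the same variance):

```python
import numpy as np

# Among zero-mean densities with variance sigma^2, the Gaussian has the largest
# differential entropy. Compare a variance-matched uniform density with the
# Gaussian bound 0.5*log2(2*pi*e*sigma^2).
sigma2 = 1.0
h_gauss = 0.5 * np.log2(2 * np.pi * np.e * sigma2)

# Uniform[-b, b] has variance b^2/3, so choose b = sqrt(3*sigma2);
# its differential entropy is log2(2b).
b = np.sqrt(3 * sigma2)
h_unif = np.log2(2 * b)

print(h_gauss)  # about 2.047 bits
print(h_unif)   # about 1.792 bits, strictly below the Gaussian bound
```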

Estimation counterpart of Fano's inequality

For a random variable X and any estimator X̂,

  E(X - X̂)² ≥ (1/(2πe)) e^{2h(X)},

with equality iff X is Gaussian and X̂ is the mean of X.

Corollary: given side information Y and an estimator X̂(Y),

  E(X - X̂(Y))² ≥ (1/(2πe)) e^{2h(X|Y)}.
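
A numerical check of the bound and its equality case (not from the slides; assumes numpy, with Gaussian X and X̂ = E[X]; note that h must be measured in nats here, since the bound uses e^{2h}):

```python
import numpy as np

# Estimation bound E(X - Xhat)^2 >= exp(2*h(X))/(2*pi*e), with equality when
# X is Gaussian and Xhat = E[X]. Differential entropy is taken in nats.
sigma = 3.0
h_nats = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # h(X) for X ~ N(mu, sigma^2)
bound = np.exp(2 * h_nats) / (2 * np.pi * np.e)

rng = np.random.default_rng(2)
x = rng.normal(5.0, sigma, size=1_000_000)
mse = np.mean((x - 5.0) ** 2)                        # Xhat = E[X] = 5.0

print(bound)   # sigma^2 = 9.0
print(mse)     # close to 9.0: the bound is met with equality for the Gaussian
```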

Summary

  discrete random variable   ↔   continuous random variable
  entropy H(X)               ↔   differential entropy h(X)

Many things are similar: mutual information, the AEP. Some things differ in the continuous world: h(X) can be negative, and the maximum entropy distribution is the Gaussian.

Dr. Yao Xie, ECE587, Information Theory, Duke University