Ch. 8 Math Preliminaries for Lossy Coding. 8.5 Rate-Distortion Theory


Introduction

Rate-distortion theory provides insight into the trade-off between Rate and Distortion. This theory is needed to answer questions like:
- What do typical R-D curves look like?
- What factors impact the R-D trade-off?
- For a given practical case, what is the best R-D curve that I can expect?
- Etc.
It tells designers when to stop trying to improve!

Our Goal Here: express the R-D function in terms of Info Theory and see what it tells us.

Expression for Distortion

We need a mathematical form for the distortion. Recall that D is an expected value taken w.r.t. the joint distribution of X and Y:

D = E\{ d(X, Y) \}

For a continuous source X mapped into M discrete values y_j, this becomes

D = \sum_{j=0}^{M-1} \int d(x, y_j)\, f_X(x)\, P(y_j \mid x)\, dx        <Differs from book>

where f_X(x) P(y_j \mid x) describes the joint probability of the continuous X and the discrete Y.
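As a concrete illustration of this formula, the sketch below numerically evaluates D for a zero-mean, unit-variance Gaussian source and a 4-level uniform quantizer. Neither the source nor the quantizer comes from the slides; they are illustrative choices. For a deterministic quantizer, P(y_j | x) reduces to an indicator of the j-th decision region, so the sum-integral becomes a sum of integrals over those regions.

```python
import numpy as np
from scipy import integrate, stats

# Hypothetical example (not from the slides): zero-mean, unit-variance Gaussian
# source X and a 4-level uniform quantizer with step size delta.
f_X = stats.norm(loc=0.0, scale=1.0).pdf
delta = 1.0
edges = np.array([-np.inf, -delta, 0.0, delta, np.inf])    # decision boundaries
levels = np.array([-1.5, -0.5, 0.5, 1.5]) * delta          # reproduction values y_j

def d(x, y):
    """Squared-error distortion measure d(x, y)."""
    return (x - y) ** 2

# D = sum_j  integral over region_j of d(x, y_j) f_X(x) dx
# (for a deterministic quantizer, P(y_j | x) is 1 on region j and 0 elsewhere)
D = sum(
    integrate.quad(lambda x, yj=yj: d(x, yj) * f_X(x), a, b)[0]
    for a, b, yj in zip(edges[:-1], edges[1:], levels)
)
print(f"Expected distortion D ~= {D:.4f}")
```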

Simple Example for Distortion

In compression we wish to minimize D by designing a mapping from values of X (continuous) into values of Y (discrete).

[Figure: compression from X to Y - the continuous axis x, on which the RV X can take any value, is mapped onto a small set of discrete values y.]

Clearly our map doesn't impact f_X(x), but it does specify P(y_j | x)!! In fact, designing a compression algorithm is equivalent to specifying P(y_j | x), including determining how many y_j values to use.
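To make "designing the algorithm is equivalent to specifying P(y_j | x)" concrete (this specialization is not written out on the slide), a deterministic quantizer with decision regions R_j corresponds to:

```latex
% Deterministic quantizer: P(y_j | x) is an indicator of the decision region R_j,
% so the general distortion expression reduces to integrals over the regions.
P(y_j \mid x) =
  \begin{cases}
    1, & x \in R_j \\
    0, & \text{otherwise}
  \end{cases}
\qquad\Longrightarrow\qquad
D = \sum_{j=0}^{M-1} \int_{R_j} d(x, y_j)\, f_X(x)\, dx
```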

Goal for Picking Compression Algorithm

Thus, D is a function of P(y_j | x):

D = E\{ d(X, Y) \} = D\big( P(y_j \mid x) \big)

When we design a compression algorithm we want one that minimizes the distortion D: pick P(y_j | x) to minimize D( P(y_j | x) ).

But wait!!!... We need to worry about how large or small the Rate is! So we perform a constrained minimization of D( P(y_j | x) ) to find the theoretical lower bound on the R-D curve.

Info Theory View of Rate-Distortion

Recall that we said the Avg. Mutual Info I(X;Y) is the theoretical minimum rate needed to convey the amount of info about X that is in some specified Y.

Info-Theory R-D Goal #1: For a given rate value R, find the P(y_j | x) that minimizes the average distortion E{d(X,Y)} under the rate constraint I(X;Y) ≤ R.

Info-Theory R-D Goal #2: For a given distortion value D, find the P(y_j | x) that minimizes the average rate I(X;Y) under the distortion constraint E{d(X,Y)} ≤ D.

These two complementary goals actually result in the same Info-Theory R-D function. We'll focus on #2:

R(D) = \min_{P(y_j \mid x)\,:\; E\{d(X,Y)\} \le D} I(X;Y)
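This constrained minimization has no closed form for general sources, but it can be computed numerically for a discretized source with the standard Blahut-Arimoto algorithm (not covered on the slide). The sketch below is a minimal illustration under my own assumptions: a discretized zero-mean Gaussian source, squared-error distortion, a grid of candidate reproduction values, and a Lagrange parameter beta that traces out one (D, R) point per value. The closed-form Gaussian R(D) is printed only for comparison; the discretized source tracks it closely but not exactly.

```python
import numpy as np

def blahut_arimoto_rd(p_x, d_matrix, beta, n_iter=200):
    """Return one (D, R) point of the rate-distortion curve for a discrete
    source p_x and distortion matrix d_matrix[i, j] = d(x_i, y_j), using the
    Blahut-Arimoto iteration at Lagrange parameter beta (larger beta -> lower D)."""
    n_y = d_matrix.shape[1]
    q_y = np.full(n_y, 1.0 / n_y)                   # output marginal, start uniform
    for _ in range(n_iter):
        # Optimal test channel P(y_j | x_i) for the current output marginal
        w = q_y[None, :] * np.exp(-beta * d_matrix)
        P = w / w.sum(axis=1, keepdims=True)
        # Updated output marginal q(y_j) = sum_i p(x_i) P(y_j | x_i)
        q_y = p_x @ P
    D = np.sum(p_x[:, None] * P * d_matrix)
    ratio = np.where(P > 0, P / q_y[None, :], 1.0)  # avoid 0*log(0) terms
    R = np.sum(p_x[:, None] * P * np.log2(ratio))   # I(X;Y) in bits
    return D, R

# Illustrative setup (my choice, not from the slides): discretized N(0, 1) source,
# squared-error distortion, reproduction values on the same grid as the source.
x = np.linspace(-4.0, 4.0, 201)
p_x = np.exp(-x**2 / 2.0)
p_x /= p_x.sum()
d_matrix = (x[:, None] - x[None, :]) ** 2

for beta in (1.0, 4.0, 16.0):
    D, R = blahut_arimoto_rd(p_x, d_matrix, beta)
    print(f"beta={beta:5.1f}:  D={D:.3f}  R={R:.3f} bits   "
          f"(Gaussian closed form: {0.5 * np.log2(1.0 / D):.3f} bits)")
```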

Notice that the minimization is over P(y_j | x) and involves E{d(X,Y)} and I(X;Y), so we need both of these as functions of P(y_j | x). We've already seen that for the first one:

E\{ d(X, Y) \} = D\big( P(y_j \mid x) \big)

The second one is a bit harder to see:

I(X;Y) = h(X) - h(X \mid Y)

with

h(X \mid Y) = -\int\!\!\int f_{X|Y}(x \mid y)\, f_Y(y) \log f_{X|Y}(x \mid y)\, dx\, dy

This depends on the reverse of the conditioning we need!! But using Bayes' rule we see

f_{X|Y}(x \mid y) = \frac{ f_{Y|X}(y \mid x)\, f_X(x) }{ \int f_{Y|X}(y \mid x')\, f_X(x')\, dx' }

which has the order of the conditioning we need, but note that f_{Y|X}(y | x) is a PDF. Recall: a probability function such as P(y_j | x) can be written as a PDF that has only deltas in it.
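For reference, combining the pieces above expresses I(X;Y) explicitly as a functional of the design variable P(y_j | x) and the fixed source PDF f_X(x); this explicit form is implied by the slide but not written out there:

```latex
% Mutual information between continuous X and discrete Y as a functional of the
% "test channel" P(y_j | x); the integral in the denominator is the output
% probability P(y_j).
I(X;Y) \;=\; \sum_{j=0}^{M-1} \int f_X(x)\, P(y_j \mid x)\,
       \log \frac{P(y_j \mid x)}{\int f_X(x')\, P(y_j \mid x')\, dx'} \; dx
```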

Example: R-D Function for a Gaussian Source

Approach: find a lower bound for I(X;Y) given the desired distortion level D, then show this bound can be achieved.

Let X be Gaussian with zero mean and variance σ². Distortion constraint: E{(X - Y)²} ≤ D.

First consider the case where D < σ²:

I(X;Y) = h(X) - h(X \mid Y)
       = h(X) - h(X - Y \mid Y)        (don't use formal math to get this; reason it out by definition)
       \ge h(X) - h(X - Y)             (*)

Now we want to minimize this lower bound while meeting the distortion constraint. That means we need to maximize h(X - Y).
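The two steps in the chain above use standard differential-entropy facts; spelled out (my phrasing, not the slide's):

```latex
% Given Y, knowing X is equivalent to knowing X - Y (a shift by a known value),
% and differential entropy is invariant to such a shift:
h(X \mid Y) = h(X - Y \mid Y)

% Conditioning can only reduce differential entropy:
h(X - Y \mid Y) \le h(X - Y)

% Combining the two with I(X;Y) = h(X) - h(X \mid Y) gives the bound (*).
```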

For notational ease let h(Z) ≜ h(X - Y).

We know that the differential entropy h(Z) is maximized, for a given variance, if Z is Gaussian, so assume that. Further, if Z has variance D, then E{(X - Y)²} = D and we meet the distortion goal!

Since X is Gaussian we know that  h(X) = \tfrac{1}{2} \log_2 (2\pi e \sigma^2)
Since Z is Gaussian we know that  h(Z) = \tfrac{1}{2} \log_2 (2\pi e D)

Then (*) gives

I(X;Y) \ge \tfrac{1}{2} \log_2 (2\pi e \sigma^2) - \tfrac{1}{2} \log_2 (2\pi e D)

I(X;Y) \ge \tfrac{1}{2} \log_2 \frac{\sigma^2}{D}        (for D < σ²)
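A quick Monte Carlo sanity check of "this bound can be achieved" (the construction and the numbers σ² = 1, D = 0.25 are my illustrative choices, not from the slides): build X = Y + Z with Y ~ N(0, σ² - D) and an independent Z ~ N(0, D), so that Y acts as the reproduction of X. This is the standard achieving construction for D < σ².

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, D, n = 1.0, 0.25, 1_000_000     # illustrative values (not from the slides)

# Achieving construction for D < sigma^2: build X from the reproduction Y
# plus an independent Gaussian "error" Z with variance D.
Y = rng.normal(0.0, np.sqrt(sigma2 - D), n)
Z = rng.normal(0.0, np.sqrt(D), n)
X = Y + Z

# Distortion check: E{(X - Y)^2} should come out close to D.
print("empirical E{(X-Y)^2}   =", np.mean((X - Y) ** 2))

# For jointly Gaussian X and Y, I(X;Y) = -0.5*log2(1 - rho^2),
# which for this construction equals 0.5*log2(sigma^2 / D).
rho = np.corrcoef(X, Y)[0, 1]
print("empirical  I(X;Y) bits ~=", -0.5 * np.log2(1.0 - rho ** 2))
print("theoretical R(D)  bits  =", 0.5 * np.log2(sigma2 / D))
```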

Now consider the case where D ≥ σ². Note that E{(X - 0)²} = σ² ≤ D, so setting Y = 0 satisfies the distortion goal. You don't even have to send anything and you still meet the distortion goal!!! Then I(X;Y) = I(X;0) = 0.

Combining these two cases (and noting that it is possible to actually achieve this bound - see the textbook) we have

R(D) = \begin{cases} \tfrac{1}{2} \log_2 \dfrac{\sigma^2}{D}, & \text{for } D < \sigma^2 \\[4pt] 0, & \text{for } D \ge \sigma^2 \end{cases}

and, inverting (with the rate in bits),

D(R) = \sigma^2\, 2^{-2R}, \qquad R \ge 0
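A minimal sketch evaluating these closed forms (the variance σ² = 1 is an illustrative choice, not from the slide). It also shows a direct consequence of D(R) = σ² 2^{-2R}: each extra bit divides the distortion by 4, i.e., improves the SNR by about 6.02 dB.

```python
import math

def rate_distortion(D, sigma2=1.0):
    """Gaussian R(D) in bits: 0.5*log2(sigma^2/D) for D < sigma^2, else 0."""
    return 0.5 * math.log2(sigma2 / D) if D < sigma2 else 0.0

def distortion_rate(R, sigma2=1.0):
    """Gaussian D(R): sigma^2 * 2^(-2R) for R >= 0."""
    return sigma2 * 2.0 ** (-2.0 * R)

sigma2 = 1.0                      # illustrative source variance (not from the slide)
for R in range(5):
    D = distortion_rate(R, sigma2)
    snr_db = 10.0 * math.log10(sigma2 / D)
    print(f"R = {R} bits  ->  D = {D:.4f}   (SNR = {snr_db:.2f} dB)")
# Each additional bit divides D by 4, i.e., gains 10*log10(4) ~ 6.02 dB of SNR.
print("round-trip check: R(D(1.5)) =", rate_distortion(distortion_rate(1.5)))
```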


In Practice

The above results guide practical ideas and give bounds for comparing actual rates and distortions. These ideas also motivate Operational Rate-Distortion.

In the above Info Theory view:
- D was a probabilistic average over the ensemble
- R was a lower limit

In the Operational R-D view:
- D is what we actually achieve on this particular signal
- R is the best rate we can get using a specified algorithm

<More On This Later>