Lecture 15: Strong, Conditional, & Joint Typicality


EE376A/STATS376A Information Theory, Lecture 15 (02/27/2018)
Lecturer: Tsachy Weissman    Scribes: Nimit Sohoni, William McCloskey, Halwest Mohammad

In this lecture, we will continue developing tools that will be useful going forward, in particular in the context of lossy compression. We will introduce the notions of strong, conditional, and joint typicality. [Optional reading: Chapter 2 of El Gamal and Kim, Network Information Theory.]

1 Notation

A quick recap of the notation:

1. Random variables: e.g. $X$
2. Alphabet: e.g. $\mathcal{X}$
3. Specific values: e.g. $x$
4. Sequence of values: e.g. $x^n$
5. Set of all probability mass functions on alphabet $\mathcal{X}$: $M(\mathcal{X})$
6. Empirical distribution of a sequence $x^n$: $P_{x^n}(a) := N(a \mid x^n)/n$, where $N(a \mid x^n)$ is the number of times symbol $a$ appears in $x^n$

2 Typicality

2.1 Strong Typicality

Definition 1. A sequence $x^n \in \mathcal{X}^n$ is strongly $\delta$-typical with respect to a probability mass function $P \in M(\mathcal{X})$ if

$$|P_{x^n}(a) - P(a)| \le \delta P(a) \quad \forall a \in \mathcal{X} \qquad (1)$$

In words, a sequence is strongly $\delta$-typical with respect to $P$ if its empirical distribution is close to the probability mass function $P$. [Here $\delta$ is some fixed number, typically small.]

Definition 2. The strongly $\delta$-typical set [or simply strongly typical set] of $P$, denoted $T_\delta(P)$, is the set of all sequences that are strongly $\delta$-typical with respect to $P$, i.e.

$$T_\delta(P) = \{x^n : |P_{x^n}(a) - P(a)| \le \delta P(a), \ \forall a \in \mathcal{X}\} \qquad (2)$$

Recall: the weakly $\epsilon$-typical set of an IID source $P$ is defined as $A_\epsilon(P) := \left\{x^n : \left|-\frac{1}{n} \log P(x^n) - H(P)\right| \le \epsilon\right\}$.

Note: The condition for inclusion in the weakly $\epsilon$-typical set is indeed weaker than the condition to be in the strongly $\delta$-typical set. To see this, write

$$-\frac{1}{n} \log P(x^n) = \frac{1}{n} \sum_{i=1}^n \log \frac{1}{P(x_i)} = \sum_{a \in \mathcal{X}} \frac{N(a \mid x^n)}{n} \log \frac{1}{P(a)} = \sum_{a \in \mathcal{X}} P_{x^n}(a) \log \frac{1}{P(a)}.$$

This is $\approx \sum_{a \in \mathcal{X}} P(a) \log \frac{1}{P(a)} = H(P)$ if $P_{x^n} \approx P$, i.e. if the empirical distribution induced by $x^n$ is close to $P$, i.e. if the sequence is strongly typical. Thus $P_{x^n} \approx P$ implies $-\frac{1}{n} \log P(x^n) \approx H(P)$; that is, strong typicality implies weak typicality.
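To make Definitions 1 and 2 concrete, here is a minimal Python sketch (my own illustration, not part of the original notes; the helper names are hypothetical). It computes the empirical distribution $P_{x^n}$ and tests membership in $T_\delta(P)$:

```python
# A minimal sketch of Definitions 1 and 2 (illustration only): compute the
# empirical distribution P_{x^n} and test |P_{x^n}(a) - P(a)| <= delta * P(a).
from collections import Counter

def empirical_distribution(xn, alphabet):
    """P_{x^n}(a) = N(a | x^n) / n for each symbol a in the alphabet."""
    counts = Counter(xn)
    n = len(xn)
    return {a: counts[a] / n for a in alphabet}

def is_strongly_typical(xn, P, delta):
    """Membership test for T_delta(P): every symbol's empirical frequency
    lies within a multiplicative (1 +/- delta) band around P(a)."""
    P_hat = empirical_distribution(xn, P)
    return all(abs(P_hat[a] - P[a]) <= delta * P[a] for a in P)

# Example: a fair-coin source. This balanced sequence is 0.1-typical.
P = {0: 0.5, 1: 0.5}
print(is_strongly_typical([0, 1, 1, 0, 0, 1, 0, 1, 1, 0], P, delta=0.1))  # True
```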

In the homework, we will show more precisely that $T_\delta(P) \subseteq A_\epsilon(P)$ for $\epsilon = \delta H(P)$.

Example: Here is an example of a sequence that is weakly typical but not strongly typical. Let $P$ be the uniform distribution over $\mathcal{X}$, i.e. $P(a) = \frac{1}{|\mathcal{X}|}$ for all $a \in \mathcal{X}$. Then $P(x^n) = \left(\frac{1}{|\mathcal{X}|}\right)^n$, so $-\frac{1}{n} \log P(x^n) = \log |\mathcal{X}| = H(P)$ for every $x^n \in \mathcal{X}^n$. Thus $A_\epsilon(P) = \mathcal{X}^n$, while $T_\delta(P) = \{x^n : |P_{x^n}(a) - \frac{1}{|\mathcal{X}|}| \le \frac{\delta}{|\mathcal{X}|}, \ \forall a \in \mathcal{X}\}$. In other words, the weakly typical set is the set of all sequences over $\mathcal{X}$, whereas the strongly typical set contains only those sequences in which each symbol appears roughly the same number of times.

We have already shown that the probability of a sequence drawn from the source being in $A_\epsilon(P)$ approaches 1 as $n \to \infty$. In the homework, we will investigate the probability of such a sequence being in $T_\delta(P)$, i.e. $P(T_\delta(P))$. In fact, this also approaches 1:

$$\lim_{n \to \infty} P(T_\delta(P)) = 1$$

This is also a manifestation of the law of large numbers, which tells us that for every symbol $a$, the fraction of times it appears in the sequence approaches its true probability under the source $P$, with probability close to 1.

Finally, we will show that the size of the strongly $\delta$-typical set $T_\delta(P)$ is roughly $2^{nH(P)}$; more precisely, for all sufficiently large $n$:

$$2^{n[H(P) - \epsilon(\delta)]} \le |T_\delta(P)| \le 2^{n[H(P) + \epsilon(\delta)]} \qquad (3)$$

where $\epsilon(\delta) \to 0$ as $\delta \to 0$. The lower bound follows from the previously shown fact that any set of size smaller than $2^{nH(P)}$ has vanishing probability. The upper bound simply follows from the fact that $T_\delta(P) \subseteq A_\epsilon(P)$.

2.2 Joint Typicality

In the following, we refer to the sequences $x^n = (x_1, x_2, \ldots, x_n)$, $x_i \in \mathcal{X}$, and $y^n = (y_1, y_2, \ldots, y_n)$, $y_i \in \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are finite alphabets.

Definition 3. The joint empirical distribution of $(x^n, y^n)$ is:

$$P_{x^n, y^n}(x, y) = \frac{1}{n} N(x, y \mid x^n, y^n) \qquad (4)$$

Definition 4. $(x^n, y^n)$ is jointly $\delta$-typical with respect to $P \in M(\mathcal{X} \times \mathcal{Y})$ if

$$|P_{x^n, y^n}(x, y) - P(x, y)| \le \delta P(x, y) \quad \forall x \in \mathcal{X}, y \in \mathcal{Y} \qquad (5)$$

Definition 5. The jointly $\delta$-typical set with respect to $P \in M(\mathcal{X} \times \mathcal{Y})$ is

$$T_\delta(P) = \{(x^n, y^n) : (x^n, y^n) \text{ is jointly } \delta\text{-typical with respect to } P\} \qquad (6)$$
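Here is a sketch of Definitions 3-5 (again my own illustration, with hypothetical names). The joint empirical distribution is just the empirical distribution of the pair sequence over $\mathcal{X} \times \mathcal{Y}$, so joint $\delta$-typicality is the same multiplicative test applied pair by pair:

```python
# A sketch of Definitions 3-5 (illustration only): joint delta-typicality
# is the strong-typicality test over the product alphabet X x Y.
from collections import Counter

def joint_empirical_distribution(xn, yn, pairs):
    """P_{x^n,y^n}(x, y) = N(x, y | x^n, y^n) / n for each pair (x, y)."""
    counts = Counter(zip(xn, yn))
    n = len(xn)
    return {pq: counts[pq] / n for pq in pairs}

def is_jointly_typical(xn, yn, P_xy, delta):
    """Membership test for T_delta(P) with P a joint PMF on X x Y."""
    P_hat = joint_empirical_distribution(xn, yn, P_xy)
    return all(abs(P_hat[pq] - P_xy[pq]) <= delta * P_xy[pq] for pq in P_xy)

# Example: a symmetric binary joint PMF; this pair's empirical joint
# distribution matches P_xy exactly, so it is delta-typical for any delta.
P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
xn = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
yn = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(is_jointly_typical(xn, yn, P_xy, delta=0.1))  # True
```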

Observe that these definitions are just special cases of the definitions of the empirical distribution, strong $\delta$-typicality, and the strongly $\delta$-typical set, since a pair of a sequence in $\mathcal{X}^n$ and a sequence in $\mathcal{Y}^n$ is simply a sequence over the alphabet of pairs $\mathcal{X} \times \mathcal{Y}$.

Notation: For convenience, we will sometimes write $T_\delta(X)$ in place of $T_\delta(P)$ when $X \sim P$, or $T_\delta(X, Y)$ in place of $T_\delta(P)$ when $(X, Y) \sim P$.

In the homework, we will show that for every nonnegative $g : \mathcal{X} \to \mathbb{R}$ and every $x^n \in T_\delta(X)$,

$$(1 - \delta) E[g(X)] \le \frac{1}{n} \sum_{i=1}^n g(x_i) \le (1 + \delta) E[g(X)]$$

In other words, for strongly typical sequences, the average value of $g$ computed on the components of the sequence is close to the expected value of $g(X)$. Observe that $\frac{1}{n} \sum_{i=1}^n g(x_i) = \sum_{a \in \mathcal{X}} P_{x^n}(a) g(a)$; the latter is the expectation of $g(X)$ when $X$ is distributed according to the empirical distribution $P_{x^n}$. But since $x^n \in T_\delta(X)$, $P_{x^n}$ is close to the true PMF of $X$ [i.e. $P$], which is why this expectation is close to the true expectation $E[g(X)]$. This property will be important for the rate distortion theorem, where $g$ will be replaced by the distortion function. In the homework, you will find cases where this does not hold for weak typicality.

2.3 Conditional Typicality

Definition 6. Fix $x^n$. The conditionally $\delta$-typical set is

$$T_\delta(Y \mid x^n) = \{y^n : (x^n, y^n) \in T_\delta(X, Y)\} \qquad (7)$$

In other words, it is the set of all sequences $y^n$ such that the pair $(x^n, y^n)$ is jointly $\delta$-typical.

Observe that if $x^n \notin T_\delta(X)$, then $T_\delta(Y \mid x^n) = \emptyset$, because for a pair $(x^n, y^n)$ to be jointly typical, each individual sequence must be typical with respect to $P_X$ and $P_Y$, respectively (shown in the homework).

In the homework, we will show that, assuming $x^n \in T_{\delta'}(X)$,

$$(1 - \delta) 2^{n[H(Y|X) - \epsilon(\delta)]} \le |T_\delta(Y \mid x^n)| \le 2^{n[H(Y|X) + \epsilon(\delta)]}$$

for all $0 < \delta' < \delta$ and $n$ sufficiently large, where $\epsilon(\delta) = \delta H(Y|X)$. In short, for a typical sequence $x^n$, the number of sequences $y^n$ that are jointly typical with $x^n$ is approximately $2^{nH(Y|X)}$. A starting point of the proof will be the Conditional Typicality Lemma.

Lemma 7 (Conditional Typicality Lemma). For $0 < \delta' < \delta$, $x^n \in T_{\delta'}(X)$, and $Y^n \sim P(y^n \mid x^n) = \prod_{i=1}^n P_{Y|X}(y_i \mid x_i)$, we have

$$\lim_{n \to \infty} P(Y^n \in T_\delta(Y \mid x^n)) = 1 \qquad (8)$$

In other words, we fix an individual sequence $x^n$ and generate the sequence $Y^n$ stochastically, with independent components, according to the distribution conditioned on $x^n$: we generate $Y_i \sim P_{Y|X=x_i}$ [according to the joint probability mass function $P_{X,Y}$, which gives rise to the conditional probability mass function $P_{Y|X}$]. One can think of this in communication terminology: the sequence $Y^n$ is generated by taking the individual sequence $x^n$ and passing it through the memoryless channel $P_{Y|X}$. The probability that the sequence $Y^n$ thus generated is conditionally typical approaches 1 as $n$ becomes large.

To prove the Conditional Typicality Lemma, we will employ the fact [to be proved earlier in the homework] that $P(T_\delta(P)) \to 1$.
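Lemma 7 is easy to check numerically. Below is a Monte Carlo sketch (my own illustration, not from the notes; it reuses `is_jointly_typical` and the binary joint PMF from the previous sketch): fix a typical $x^n$, pass it through the memoryless channel $P_{Y|X}$, and estimate $P(Y^n \in T_\delta(Y \mid x^n))$ for growing $n$:

```python
# A Monte Carlo sketch of Lemma 7 (illustration only; assumes
# is_jointly_typical and P_xy from the previous sketch are in scope).
import random

P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
P_y_given_x = {0: (0.8, 0.2), 1: (0.2, 0.8)}  # P_{Y|X}(.|x); P_X is uniform

def estimate_lemma7(n, delta=0.25, trials=500):
    xn = [0] * (n // 2) + [1] * (n - n // 2)  # exactly typical for uniform P_X
    hits = 0
    for _ in range(trials):
        # Pass x^n through the memoryless channel: Y_i ~ P_{Y|X=x_i}.
        yn = [random.choices((0, 1), weights=P_y_given_x[x])[0] for x in xn]
        hits += is_jointly_typical(xn, yn, P_xy, delta)
    return hits / trials

for n in (50, 200, 1000):
    print(n, estimate_lemma7(n))
# The estimated probability climbs toward 1, as Lemma 7 predicts.
```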

Fix some $a \in \mathcal{X}$ and consider the subsequence of all components $x_i$ of $x^n$ that are equal to $a$. Consider the subsequence of $y_i$'s corresponding to the same indices. This subsequence is generated IID from the PMF $P_{Y|X=a}$. We will apply the aforementioned result separately to each such subsequence corresponding to a symbol $a \in \mathcal{X}$.

To prove the bounds on the size of $T_\delta(Y \mid x^n)$, we will take a similar approach: we will use Equation (3) [which will also be proved earlier in the homework] and apply it to each subsequence associated with a symbol $a \in \mathcal{X}$.

We can interpret the Conditional Typicality Lemma qualitatively with the help of the following picture:

[Figure 1: Illustration of the relationships between strongly $\delta$-typical and conditionally $\delta$-typical sets. Left: $T_\delta(X) \subseteq \mathcal{X}^n$, of size $\approx 2^{nH(X)}$, containing a point $x^n$. Right: $T_\delta(Y) \subseteq \mathcal{Y}^n$, of size $\approx 2^{nH(Y)}$, containing the subset $T_\delta(Y \mid x^n)$ of size $\approx 2^{nH(Y|X)}$. A dashed line connects $x^n$ to $T_\delta(Y \mid x^n)$.]

The dashed line denotes that, given channel input $x^n$, the channel output will fall within the dark gray set $T_\delta(Y \mid x^n)$ with high probability. $T_\delta(Y \mid x^n)$ can be thought of as the "noise ball" around the particular channel input sequence $x^n$. Recall that in Lecture 11, we used this picture to give intuition for the channel coding converse.

Lemma 8 (Joint Typicality Lemma). For all $0 < \delta' < \delta$, if $\tilde{Y}_i \stackrel{\text{IID}}{\sim} P_Y$, then for all sufficiently large $n$ and all $x^n \in T_{\delta'}(X)$,

$$2^{-n[I(X;Y) + \tilde{\epsilon}(\delta)]} \le P(\tilde{Y}^n \in T_\delta(Y \mid x^n)) \le 2^{-n[I(X;Y) - \tilde{\epsilon}(\delta)]} \qquad (9)$$

where $\tilde{\epsilon}(\delta) \to 0$ as $\delta \to 0$.

The proof of the Joint Typicality Lemma will also be a homework problem. Intuitively speaking, since the sequence $\tilde{Y}^n$ is generated IID with respect to $Y$, on an exponential scale it is roughly uniformly distributed over the set $T_\delta(Y)$. Thus, the probability that it falls within $T_\delta(Y \mid x^n)$ for some particular $x^n$ is, on an exponential scale, roughly the ratio of the size of this set to the size of $T_\delta(Y)$, since $T_\delta(Y \mid x^n) \subseteq T_\delta(Y)$.
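For the running binary example, the probability in Lemma 8 can be computed exactly rather than sampled: with $\tilde{Y}_i \stackrel{\text{IID}}{\sim} \text{Bernoulli}(1/2)$ and $x^n$ half zeros and half ones, the pair counts in the two halves of $x^n$ are independent binomials. The following sketch (my own, not from the notes) compares the exponent $-\frac{1}{n}\log_2 P(\tilde{Y}^n \in T_\delta(Y \mid x^n))$ with $I(X;Y)$:

```python
# Lemma 8 computed exactly for the running binary example (illustration
# only). The probability of joint delta-typicality is a short binomial sum.
from math import comb, log2

P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def mutual_information(P):
    """I(X;Y) in bits for a joint PMF P over pairs (x, y)."""
    Px = {x: sum(p for (a, _), p in P.items() if a == x) for x in (0, 1)}
    Py = {y: sum(p for (_, b), p in P.items() if b == y) for y in (0, 1)}
    return sum(p * log2(p / (Px[x] * Py[y])) for (x, y), p in P.items())

def prob_cond_typical(n, delta=0.05):
    """Exact P(Ytilde^n in T_delta(Y | x^n)) for x^n = 0^(n/2) 1^(n/2)."""
    half = n // 2
    ok = lambda count, target: abs(count / n - target) <= delta * target
    # x = 0 half: k components have y = 1, so N(0,1) = k, N(0,0) = half - k.
    p0 = sum(comb(half, k) * 0.5 ** half
             for k in range(half + 1) if ok(k, 0.1) and ok(half - k, 0.4))
    # x = 1 half: k components have y = 1, so N(1,1) = k, N(1,0) = half - k.
    p1 = sum(comb(half, k) * 0.5 ** half
             for k in range(half + 1) if ok(k, 0.4) and ok(half - k, 0.1))
    return p0 * p1  # the two halves are independent under Ytilde IID

print("I(X;Y) =", round(mutual_information(P_xy), 4))  # ~0.278
for n in (100, 200, 400):
    print(n, round(-log2(prob_cond_typical(n)) / n, 4))
# The exponent -(1/n) log2 P decreases toward I(X;Y) as n grows, within
# the eps(delta) slack allowed by Lemma 8.
```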

Again referring to Figure 1 as a visual aid, we get

$$P(\tilde{Y}^n \in T_\delta(Y \mid x^n)) \approx \frac{2^{nH(Y|X)}}{2^{nH(Y)}} = 2^{-nI(X;Y)}.$$

So the probability that a randomly generated sequence $\tilde{Y}^n$ looks jointly typical with a particular sequence $x^n$ is exponentially small.

In the next lecture, we will see why these notions are significant in the context of lossy compression. We will use them to prove the main achievability result of lossy compression.
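To put numbers on this closing computation, here is a short check (my addition, using the same binary joint PMF as in the earlier sketches) of $H(Y)$, $H(Y|X)$, and the resulting probability scale $2^{-nI(X;Y)}$:

```python
# A short numeric companion (illustration only) for the counting heuristic
# 2^{n H(Y|X)} / 2^{n H(Y)} = 2^{-n I(X;Y)}.
from math import log2

P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
Px = {0: 0.5, 1: 0.5}
Py = {0: 0.5, 1: 0.5}

H_Y = -sum(p * log2(p) for p in Py.values())                         # H(Y)
H_Y_given_X = -sum(p * log2(p / Px[x]) for (x, _), p in P_xy.items())
I = H_Y - H_Y_given_X

print(round(H_Y, 4), round(H_Y_given_X, 4), round(I, 4))  # 1.0 0.7219 0.2781
for n in (100, 200, 400):
    print(n, 2 ** (-n * I))  # the vanishing probability scale in Lemma 8
```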