The method of types. PhD short course Information Theory and Statistics Siena, September, Mauro Barni University of Siena


PhD short course Information Theory and Statistics, Siena, 15-19 September, 2014
The method of types
Mauro Barni, University of Siena

Outline of the course
Part 1: Information theory in a nutshell
Part 2: The method of types and its relationship with statistics
Part 3: Information theory and large deviation theory
Part 4: Information theory and hypothesis testing
Part 5: Application to adversarial signal processing

Outline of Part 2
The method of types: definitions, basic properties with proofs of the theorems
Law of large numbers
Source coding, universal source coding

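Type or empirical probability
The type, or empirical probability, of a sequence $x = (x_1, \dots, x_n) \in \mathcal{X}^n$ is
$$P_x(a) = \frac{N(a|x)}{n}, \quad a \in \mathcal{X},$$
where $N(a|x)$ is the number of occurrences of the symbol $a$ in $x$.
$\mathcal{P}_n$ denotes the set of all types with denominator $n$. For example, if $\mathcal{X} = \{0,1\}$, writing each type as $(P(0), P(1))$:
$$\mathcal{P}_5 = \left\{ (0,1), \left(\tfrac{1}{5},\tfrac{4}{5}\right), \left(\tfrac{2}{5},\tfrac{3}{5}\right), \left(\tfrac{3}{5},\tfrac{2}{5}\right), \left(\tfrac{4}{5},\tfrac{1}{5}\right), (1,0) \right\}$$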
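As an illustration (not part of the original slides), the definition can be checked with a minimal Python sketch. The function names `empirical_type` and `all_types` are hypothetical, and enumerating $\mathcal{P}_n$ by brute force over all $|\mathcal{X}|^n$ sequences is only meant for tiny $n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def empirical_type(x, alphabet):
    """Type P_x(a) = N(a|x) / n of the sequence x, listed in alphabet order."""
    n = len(x)
    counts = Counter(x)  # N(a|x) for every symbol a that occurs in x
    return tuple(Fraction(counts[a], n) for a in alphabet)


def all_types(n, alphabet):
    """P_n, found by brute force over all |X|^n sequences (only for tiny n)."""
    return sorted({empirical_type(x, alphabet) for x in product(alphabet, repeat=n)})


print(empirical_type("01100", "01"))  # (Fraction(3, 5), Fraction(2, 5))
for p in all_types(5, "01"):
    print(tuple(str(v) for v in p))
```

Exact fractions are used so that two sequences with the same counts always map to the same type, with no floating-point ambiguity.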

Type class
The type class $T(P)$ is the set of all sequences having the same type $P$:
$$T(P) = \{ x \in \mathcal{X}^n : P_x = P \}$$
Example: $x^5 = 01100 \Rightarrow P_{x^5} = \left(\tfrac{3}{5}, \tfrac{2}{5}\right)$ and
$$T(P_{x^5}) = \{11000, 10100, 10010, 10001, 01100, 01010, 01001, 00110, 00101, 00011\}$$
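The example type class can be reproduced with the following sketch, assuming a brute-force scan of $\{0,1\}^5$ (fine for this size, exponential in general). The helper names `counts_of` and `type_class` are illustrative; for a fixed length $n$ the count vector $(N(a|x))_a$ identifies the type, so comparing counts is equivalent to comparing types.

```python
from collections import Counter
from itertools import product


def counts_of(x, alphabet):
    """(N(a|x))_a: for a fixed length n this determines the type P_x."""
    c = Counter(x)
    return tuple(c[a] for a in alphabet)


def type_class(x, alphabet):
    """T(P_x): all sequences of length len(x) with the same type as x (brute force)."""
    target = counts_of(x, alphabet)
    return sorted("".join(y) for y in product(alphabet, repeat=len(x))
                  if counts_of(y, alphabet) == target)


T = type_class("01100", "01")
print(len(T))  # 10
print(T)
```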

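Number of types
The number of types grows only polynomially with $n$.
Theorem. The number of types with denominator $n$ is upper bounded by
$$|\mathcal{P}_n| \le (n+1)^{|\mathcal{X}|}$$
Proof. Each of the $|\mathcal{X}|$ components of a type $P \in \mathcal{P}_n$ can take only one of the $n+1$ values $0, \tfrac{1}{n}, \dots, 1$.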
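To see how loose the bound is, a small sketch can compare it with the exact count. It relies on the standard "stars and bars" fact that the number of ways to split $n$ into $|\mathcal{X}| = k$ non-negative counts is $\binom{n+k-1}{k-1}$; the function names are hypothetical.

```python
from math import comb


def num_types(n, k):
    """Exact |P_n| for an alphabet of size k: compositions of n into k non-negative parts."""
    return comb(n + k - 1, k - 1)


def type_count_bound(n, k):
    """The polynomial upper bound (n+1)^k from the theorem."""
    return (n + 1) ** k


for n, k in [(5, 2), (10, 3), (100, 4)]:
    print(n, k, num_types(n, k), type_count_bound(n, k))
```

Both quantities are polynomial in $n$, which is all the later exponential arguments need.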

Probability of a sequence
Theorem. The probability that a sequence $x = x^n$ is emitted by a DMS (discrete memoryless source) with pmf $Q$ is (logarithms are to base 2)
$$Q^n(x) = 2^{-n\left(H(P_x) + D(P_x\|Q)\right)}$$
In particular, if $P_x = Q$, then $Q^n(x) = 2^{-nH(P_x)} = 2^{-nH(Q)}$.
Remember: the larger the KL distance between the type of $x$ and $Q$, the lower the probability of $x$.

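Probability of a sequence
Proof.
$$\begin{aligned} Q^n(x) &= \prod_{i=1}^{n} Q(x_i) = \prod_{a\in\mathcal{X}} Q(a)^{N(a|x)} = \prod_{a\in\mathcal{X}} Q(a)^{nP_x(a)} = \prod_{a\in\mathcal{X}} 2^{nP_x(a)\log Q(a)} \\ &= 2^{n\sum_{a}\left[P_x(a)\log Q(a) - P_x(a)\log P_x(a) + P_x(a)\log P_x(a)\right]} \\ &= 2^{-n\sum_{a}\left[P_x(a)\log\frac{P_x(a)}{Q(a)} - P_x(a)\log P_x(a)\right]} = 2^{-n\left[H(P_x) + D(P_x\|Q)\right]} \end{aligned}$$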
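A quick numerical check of the identity, as a sketch rather than part of the course material: it computes $Q^n(x)$ once as the product $\prod_i Q(x_i)$ and once through the type of $x$. The pmf is passed as a dictionary from symbols to probabilities, an arbitrary choice for this illustration, and $Q(a) > 0$ is assumed for every symbol that occurs in $x$.

```python
import math
from collections import Counter


def entropy(p):
    """H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl(p, q):
    """D(p||q) in bits; assumes q(a) > 0 wherever p(a) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def seq_prob_direct(x, q):
    """Q^n(x) = prod_i Q(x_i), with q a dict symbol -> probability."""
    return math.prod(q[s] for s in x)


def seq_prob_types(x, q):
    """Q^n(x) = 2^{-n (H(P_x) + D(P_x||Q))}."""
    n = len(x)
    alphabet = sorted(q)
    c = Counter(x)
    p = [c[a] / n for a in alphabet]
    qv = [q[a] for a in alphabet]
    return 2 ** (-n * (entropy(p) + kl(p, qv)))


q = {"H": 1 / 3, "T": 2 / 3}
x = "HTTHTTTHTT"
print(seq_prob_direct(x, q), seq_prob_types(x, q))
```

The two printed values agree up to floating-point rounding.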

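Examples
Probability of a specific sequence with $n/2$ heads and $n/2$ tails:
Fair coin: $Q^n(x) = 2^{-n}$.
Biased coin with $P(H) = 1/3$, $P(T) = 2/3$: $Q^n(x) = (1/3)^{n/2}(2/3)^{n/2} = 2^{-n\left(1 + \frac{1}{2}\log\frac{9}{8}\right)} \approx 2^{-1.085n}$.
Same as above with $n/3$ heads:
Fair coin: $Q^n(x) = 2^{-n}$.
Biased coin with $P(H) = 1/3$, $P(T) = 2/3$: $Q^n(x) = (1/3)^{n/3}(2/3)^{2n/3} = 2^{-nH(1/3, 2/3)} \approx 2^{-0.918n}$, since here $P_x = Q$.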
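The exponents of the four examples can be obtained from the theorem with a short sketch. The helper `exponent` is hypothetical; it returns $-\frac{1}{n}\log_2 Q^n(x) = H(P_x) + D(P_x\|Q)$, which depends only on the type, so no particular $n$ has to be chosen.

```python
import math


def exponent(p, q):
    """-(1/n) log2 Q^n(x) = H(P_x) + D(P_x||Q) for a sequence of type p."""
    h = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    d = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return h + d


fair, biased = (1 / 2, 1 / 2), (1 / 3, 2 / 3)  # (P(H), P(T))
half, third = (1 / 2, 1 / 2), (1 / 3, 2 / 3)   # types with n/2 and n/3 heads
for name, p in [("n/2 heads", half), ("n/3 heads", third)]:
    for qname, q in [("fair", fair), ("biased", biased)]:
        print(f"{name}, {qname} coin: Q^n(x) = 2^(-{exponent(p, q):.3f} n)")
```

The printed exponents are 1.000, 1.085, 1.000 and 0.918. For the fair coin the divergence exactly compensates the entropy of the type, so every sequence has probability $2^{-n}$.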

Size of a type class
Theorem. The size of a type class $T(P)$, $P \in \mathcal{P}_n$, can be bounded as follows:
$$\frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{nH(P)} \le |T(P)| \le 2^{nH(P)}$$
Remember: the size of a type class grows exponentially with $n$, with a growth rate equal to the entropy of the type.

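Size of a type class
Proof (upper bound). Given $P \in \mathcal{P}_n$, consider the probability that a source with pmf $P$ emits a sequence in $T(P)$. Since every $x \in T(P)$ has probability $P^n(x) = 2^{-nH(P)}$, we have
$$1 \ge P^n(T(P)) = \sum_{x\in T(P)} P^n(x) = \sum_{x\in T(P)} 2^{-nH(P)} = |T(P)|\,2^{-nH(P)} \quad\Rightarrow\quad |T(P)| \le 2^{nH(P)}$$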

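Size of a type class
Proof (lower bound). $|T(P)|$ is a multinomial coefficient:
$$|T(P)| = \binom{n}{nP(a_1), \dots, nP(a_{|\mathcal{X}|})} = \frac{n!}{(nP(a_1))! \cdots (nP(a_{|\mathcal{X}|}))!}$$
By the Stirling approximation $m! \approx (m/e)^m$, to first order in the exponent
$$|T(P)| \approx \frac{(n/e)^n}{\prod_{a}\left(nP(a)/e\right)^{nP(a)}} = \prod_{a} P(a)^{-nP(a)} = 2^{nH(P)},$$
and keeping track of the polynomial factors neglected by the approximation gives, after some algebra,
$$|T(P)| \ge \frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{nH(P)}$$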
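Both bounds can be checked numerically with exact integer arithmetic for $|T(P)|$. In this sketch (names illustrative) a type is described by its count vector $(nP(a))_a$; the examples include the slide's type class of size 10.

```python
import math


def type_class_size(counts):
    """|T(P)| = n! / prod_a (nP(a))!, the multinomial coefficient (exact integer)."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)  # every partial quotient is still an integer
    return size


def entropy_of_counts(counts):
    """H(P) in bits for the type with the given counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def check_bounds(counts):
    """Return (bounds hold?, lower bound, |T(P)|, upper bound)."""
    n, k = sum(counts), len(counts)
    upper = 2 ** (n * entropy_of_counts(counts))
    lower = upper / (n + 1) ** k
    size = type_class_size(counts)
    return lower <= size <= upper, lower, size, upper


print(check_bounds((3, 2)))  # the slide's example: |T(P)| = 10
print(check_bounds((20, 30, 50)))
```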

Probability of a type class
Theorem. The probability that a DMS with pmf $Q$ emits a sequence belonging to $T(P)$ can be bounded as follows:
$$\frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P\|Q)} \le Q^n(T(P)) \le 2^{-nD(P\|Q)}$$
Remember: the larger the KL distance between $P$ and $Q$, the smaller the probability. If $P \neq Q$, the probability tends to 0 exponentially fast.

Probability of a type class
Proof.
$$Q^n(T(P)) = \sum_{x\in T(P)} Q^n(x) = \sum_{x\in T(P)} 2^{-n(H(P)+D(P\|Q))} = |T(P)|\,2^{-n(H(P)+D(P\|Q))}$$
Recalling the bounds on the size of $T(P)$:
$$\frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P\|Q)} \le Q^n(T(P)) \le 2^{-nD(P\|Q)}$$
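A numerical check of the theorem for a binary source, assuming $Q = (0.3, 0.7)$ and $n = 50$ purely for illustration. The exact value uses the fact from the proof that all sequences in $T(P)$ are equally likely; the helper names are hypothetical.

```python
import math


def kl(p, q):
    """D(p||q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def type_class_prob(counts, q):
    """Q^n(T(P)) = |T(P)| * prod_a Q(a)^N(a): all sequences in T(P) are equally likely."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)
    return size * math.prod(qa ** c for qa, c in zip(q, counts))


def check(counts, q):
    """Return (lower bound, exact probability, upper bound) from the theorem."""
    n = sum(counts)
    p = [c / n for c in counts]
    upper = 2 ** (-n * kl(p, q))
    return upper / (n + 1) ** len(q), type_class_prob(counts, q), upper


q = (0.3, 0.7)
for k in (15, 25, 40):  # occurrences of the first symbol, n = 50
    print(k, check((k, 50 - k), q))
```

For $k = 15$ the type equals $Q$, the upper bound is 1 and the exact probability is about 0.12: even the most likely type class does not carry most of the probability, only a polynomially small fraction of it.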

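In summary
$$|\mathcal{P}_n| \le (n+1)^{|\mathcal{X}|}$$
$$Q^n(x) = 2^{-n[D(P_x\|Q) + H(P_x)]}$$
$$|T(P)| \doteq 2^{nH(P)}$$
$$Q^n(T(P)) \doteq 2^{-nD(P\|Q)}$$
where $a_n \doteq b_n$ means $\frac{1}{n}\log\frac{a_n}{b_n} \to 0$ (equality to first order in the exponent).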

Information Theory and Statistics

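Law of large numbers
The law of large numbers (LLN) provides the link between information theory and statistics. The weak form of the LLN states that, given a sequence of i.i.d. random variables $X_i$ with mean $\mu_X$ and sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$,
$$\forall \varepsilon > 0: \quad \lim_{n\to\infty} \Pr\{|\bar{X}_n - \mu_X| > \varepsilon\} = 0$$
The standard proof (for finite variance) is based on Chebyshev's inequality. The LLN can easily be extended to relative frequencies and probabilities (for discrete random variables).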
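For Bernoulli variables the probability in the weak LLN can be computed exactly and compared with the Chebyshev bound $\mathrm{Var}(\bar{X}_n)/\varepsilon^2 = p(1-p)/(n\varepsilon^2)$. This is a sketch with arbitrary parameters ($p = 0.3$, $\varepsilon = 0.05$); the binomial pmf is evaluated through `math.lgamma` to avoid overflowing floats for large $n$.

```python
import math


def binom_pmf(n, k, p):
    """C(n,k) p^k (1-p)^(n-k), computed in the log domain; assumes 0 < p < 1."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_p)


def tail_prob(n, p, eps):
    """Exact Pr{|Xbar_n - p| > eps} for i.i.d. Bernoulli(p) variables."""
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if abs(k / n - p) > eps)


def chebyshev_bound(n, p, eps):
    """Var(Xbar_n) / eps^2 = p(1-p) / (n eps^2)."""
    return p * (1 - p) / (n * eps ** 2)


for n in (10, 100, 1000):
    print(n, tail_prob(n, 0.3, 0.05), chebyshev_bound(n, 0.3, 0.05))
```

Chebyshev only guarantees a $1/n$ decay, while the exact probability decays much faster; the method of types on the next slide explains why the decay is in fact exponential.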

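Law of large numbers (IT perspective)
Since $Q^n(T(P)) \le 2^{-nD(P\|Q)}$, when $n$ grows only the type classes whose type is close to $Q$ (in KL distance) keep a non-negligible probability.
Theorem (law of large numbers). Let $T_Q^\varepsilon = \{x : D(P_x\|Q) \le \varepsilon\}$. Then
$$Q^n(x \notin T_Q^\varepsilon) = \sum_{P: D(P\|Q)>\varepsilon} Q^n(T(P)) \le \sum_{P: D(P\|Q)>\varepsilon} 2^{-nD(P\|Q)} \le \sum_{P: D(P\|Q)>\varepsilon} 2^{-n\varepsilon} \le (n+1)^{|\mathcal{X}|}\, 2^{-n\varepsilon} = 2^{-n\left(\varepsilon - |\mathcal{X}|\frac{\log(n+1)}{n}\right)}$$
which tends to 0 as $n$ tends to infinity.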
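For a binary source the probability of leaving $T_Q^\varepsilon$ can be computed exactly by summing $Q^n(T(P))$ over the types with $D(P\|Q) > \varepsilon$, and compared with the bound $(n+1)^{|\mathcal{X}|}2^{-n\varepsilon}$. The sketch assumes $Q(1) = 0.3$ and $\varepsilon = 0.05$ only as an example; function names are illustrative.

```python
import math


def kl2(p1, q1):
    """Binary divergence D((p1, 1-p1) || (q1, 1-q1)) in bits."""
    return sum(a * math.log2(a / b) for a, b in ((p1, q1), (1 - p1, 1 - q1)) if a > 0)


def binom_pmf(n, k, q1):
    """Q^n(T(P)) for the binary type with k ones, via lgamma to avoid overflow."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(q1) + (n - k) * math.log(1 - q1))
    return math.exp(log_p)


def prob_outside(n, q1, eps):
    """Exact Q^n(x not in T_Q^eps): sum of Q^n(T(P)) over types with D(P||Q) > eps."""
    return sum(binom_pmf(n, k, q1) for k in range(n + 1) if kl2(k / n, q1) > eps)


def type_bound(n, eps, alphabet_size=2):
    """(n+1)^|X| 2^{-n eps} = 2^{-n (eps - |X| log2(n+1)/n)}."""
    return 2 ** (-n * (eps - alphabet_size * math.log2(n + 1) / n))


for n in (10, 100, 1000, 2000):
    print(n, prob_outside(n, 0.3, 0.05), type_bound(n, 0.05))
```

The bound exceeds 1 (and is therefore vacuous) for small $n$, because the polynomial factor $(n+1)^{|\mathcal{X}|}$ dominates; once $n\varepsilon$ outgrows $|\mathcal{X}|\log(n+1)$ it decays exponentially.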

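Source coding (achievability)
Source coding theorem (Shannon, 1948). Given a DMS with pmf $Q$, any rate $R = H(Q) + \varepsilon$ is achievable, for any $\varepsilon > 0$.
Idea: code sequences of increasing length $n$, and code efficiently only the sequences in (or near) $T(Q)$, since the others will (almost) never occur. Since there are about $2^{nH(Q)}$ such sequences, only about $nH(Q)$ bits, i.e. $H(Q)$ bits per symbol, are needed.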

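Source coding: rigorous proof
Choose a small $\varepsilon$ and define $T_Q^\varepsilon = \{x : D(P_x\|Q) \le \varepsilon\}$.
Small divergence implies small distance: for $x \in T_Q^\varepsilon$, $d(P_x, Q) \le \varepsilon'$, with $\varepsilon' \to 0$ as $\varepsilon \to 0$ (for the $L_1$ distance this follows from Pinsker's inequality).
By the continuity of $H$: $H(P_x) \le H(Q) + \varepsilon''$, with $\varepsilon'' \to 0$ as $\varepsilon' \to 0$.
1. Code the sequences in $T_Q^\varepsilon$ by enumerating them within $T_Q^\varepsilon$.
2. Code the sequences not in $T_Q^\varepsilon$ by enumerating them within $\mathcal{X}^n$.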

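Source coding: rigorous proof
Since $T_Q^\varepsilon$ is a union of at most $(n+1)^{|\mathcal{X}|}$ type classes, each containing at most $2^{n(H(Q)+\varepsilon'')}$ sequences, the average number of bits is (neglecting $O(1)$ bits for integer rounding)
$$\bar{L} \le \Pr\{T_Q^\varepsilon\}\left[n(H(Q)+\varepsilon'') + |\mathcal{X}|\log(n+1)\right] + \left(1 - \Pr\{T_Q^\varepsilon\}\right) n\log|\mathcal{X}|$$
Hence, for $n$ large enough that $1 - \Pr\{T_Q^\varepsilon\} \le \delta$,
$$\frac{\bar{L}}{n} \le H(Q) + \varepsilon'' + |\mathcal{X}|\frac{\log(n+1)}{n} + \delta\log|\mathcal{X}|$$
The excess over $H(Q)$ can be made arbitrarily small by increasing $n$ and by properly choosing $\varepsilon$ and $\delta$.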
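The two-case scheme can be evaluated exactly for a binary source. This is a sketch under stated assumptions, not the slides' construction verbatim: it adds one flag bit (hypothetical, to tell the decoder which case applies), uses fixed-length indices with integer rounding, and assumes $Q(1) = 0.3$. The function names are illustrative.

```python
import math


def kl2(p1, q1):
    """Binary divergence D((p1, 1-p1) || (q1, 1-q1)) in bits."""
    return sum(a * math.log2(a / b) for a, b in ((p1, q1), (1 - p1, 1 - q1)) if a > 0)


def binom_pmf(n, k, q1):
    """Q^n(T(P)) for the binary type with k ones, via lgamma to avoid overflow."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(q1) + (n - k) * math.log(1 - q1))
    return math.exp(log_p)


def two_part_rate(n, q1, eps):
    """Expected bits per symbol: 1 flag bit, then a fixed-length index into
    T_Q^eps if x is in it, or into all of {0,1}^n otherwise."""
    inside = [k for k in range(n + 1) if kl2(k / n, q1) <= eps]
    set_size = sum(math.comb(n, k) for k in inside)   # |T_Q^eps|, exact integer
    p_in = sum(binom_pmf(n, k, q1) for k in inside)   # Q^n(T_Q^eps)
    bits_in = math.ceil(math.log2(set_size))
    bits_out = n                                      # n log2|X| with |X| = 2
    return (1 + p_in * bits_in + (1 - p_in) * bits_out) / n


h = -(0.3 * math.log2(0.3) + 0.7 * math.log2(0.7))
print(f"H(Q) = {h:.4f}")
for n in (100, 1000, 4000):
    print(n, [round(two_part_rate(n, 0.3, eps), 4) for eps in (0.05, 0.01)])
```

For large $n$ the rate settles near $\max_{x \in T_Q^\varepsilon} H(P_x)$, so a smaller $\varepsilon$ brings it closer to $H(Q)$, at the price of a larger $n$ before sequences outside $T_Q^\varepsilon$ become rare.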

Universal source coding
What if $Q$ is not known? The surprising result is that we can still code at any rate larger than the entropy. Observe the sequence of emitted symbols, compute its type (an estimate of $Q$), then transmit information about the type and the index of the sequence within its type class.
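One concrete way to compute "the index of the sequence within its type class" for binary sequences is enumerative coding: the lexicographic rank of $x$ among all strings with the same length and number of ones. The sketch below is an illustrative choice of indexing, not necessarily the one intended in the course. For a binary alphabet the type is fully described by the number of ones $k \in \{0, \dots, n\}$, so $\lceil\log(n+1)\rceil$ bits suffice for it.

```python
import math


def rank_in_type_class(x):
    """Lexicographic index of the binary string x among all strings of the same
    length with the same number of ones, i.e. its index within T(P_x)."""
    n, ones_left, rank = len(x), x.count("1"), 0
    for i, bit in enumerate(x):
        if bit == "1":
            # strings sharing x[:i] but with '0' at position i come first
            rank += math.comb(n - i - 1, ones_left)
            ones_left -= 1
    return rank


def unrank(n, ones, rank):
    """Inverse of rank_in_type_class: rebuild x from (n, number of ones, index)."""
    out = []
    for i in range(n):
        zeros_first = math.comb(n - i - 1, ones)  # completions with '0' at position i
        if rank < zeros_first:
            out.append("0")
        else:
            out.append("1")
            rank -= zeros_first
            ones -= 1
    return "".join(out)


def encode(x):
    """(type, index, total bits): ceil(log2(n+1)) bits for k, ceil(log2 C(n,k)) for the index."""
    n, k = len(x), x.count("1")
    type_bits = math.ceil(math.log2(n + 1))
    index_bits = math.ceil(math.log2(math.comb(n, k)))  # C(n, k) >= 1
    return k, rank_in_type_class(x), type_bits + index_bits


print(encode("01100"))  # (2, 5, 7): k = 2, index 5 of C(5,2) = 10, 3 + 4 bits
print(unrank(5, 2, 5))  # 01100
```

The encoder never uses $Q$: it only looks at the observed sequence, which is what makes the scheme universal.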

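Universal source coding (rigorous proof)
Choose an arbitrarily small $\varepsilon$ and let $T_Q^\varepsilon = \{x : D(P_x\|Q) \le \varepsilon\}$; by the continuity of $H$, $H(P_x) \le H(Q) + \delta$ for every $x \in T_Q^\varepsilon$, with $\delta \to 0$ as $\varepsilon \to 0$. Given a sequence $x$, use $|\mathcal{X}|\log(n+1)$ bits to indicate its type and $nH(P_x)$ bits to index $x$ within its type class (enough, since $|T(P_x)| \le 2^{nH(P_x)}$). The average number of bits per symbol used by the code is
$$\begin{aligned} \frac{\bar{L}}{n} &\le |\mathcal{X}|\frac{\log(n+1)}{n} + \sum_{x\notin T_Q^\varepsilon} Q^n(x)H(P_x) + \sum_{x\in T_Q^\varepsilon} Q^n(x)H(P_x) \\ &\le |\mathcal{X}|\frac{\log(n+1)}{n} + Q^n(x\notin T_Q^\varepsilon)\log|\mathcal{X}| + Q^n(x\in T_Q^\varepsilon)\left[H(Q)+\delta\right] \le H(Q) + \delta' \end{aligned}$$
where the last inequality holds for $n$ large enough. Since $\varepsilon$ and $\delta$ (and hence $\delta'$) are arbitrarily small, any rate larger than $H(Q)$ can be achieved. Note that $Q$ appears only in the analysis, not in the code.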
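The expected rate of the binary version of this code (type sent as the number of ones, then the index within the type class) can be computed exactly. In the sketch below $Q(1)$ is used only to evaluate the expectation, never by the encoder; the values $Q(1) \in \{0.1, 0.3\}$ and the function names are arbitrary illustrations.

```python
import math


def expected_universal_rate(n, q1):
    """E[L]/n for the two-part code: ceil(log2(n+1)) bits for the number of ones k,
    then ceil(log2 C(n,k)) bits for the index of x inside its type class.
    q1 = Q(1) is used only to evaluate the expectation; assumes 0 < q1 < 1."""
    type_bits = math.ceil(math.log2(n + 1))
    total = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(q1) + (n - k) * math.log(1 - q1))
        index_bits = math.ceil(math.log2(math.comb(n, k)))
        total += math.exp(log_pmf) * (type_bits + index_bits)
    return total / n


def h2(q1):
    """Binary entropy in bits."""
    return -q1 * math.log2(q1) - (1 - q1) * math.log2(1 - q1)


for q1 in (0.1, 0.3):
    rates = [round(expected_universal_rate(n, q1), 4) for n in (10, 100, 1000)]
    print(q1, round(h2(q1), 4), rates)
```

The rate stays above $H(Q)$, as it must for a prefix-free code, and the overhead shrinks roughly like $\log(n)/n$, matching the $|\mathcal{X}|\frac{\log(n+1)}{n}$ term in the bound.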

Channel coding
The method of types can be used to prove many other results in IT, including the channel coding theorem. This is outside the scope of this course.
