Coding on Countably Infinite Alphabets


1 Coding on Countably Infinite Alphabets: Non-parametric Information Theory

2 Outline
Lossless coding on infinite alphabets:
- Source Coding
- Universal Coding
- Infinite Alphabets
- Envelope Classes: Theoretical Properties
- Envelope Classes: Algorithms

3-4 Data Compression: Shannon's Model (two figure slides; only the title is recoverable)

5 Outline, next section: Source Coding

6 Source Entropy
Shannon ('48): for any uniquely decodable code $\phi$,
  $E_P\!\left[\,|\phi_n(X_1^n)|\,\right] \ge H_n(P) = E_P\!\left[-\log P^n(X_1^n)\right]$   (the n-block entropy)
and there is a code reaching this bound within 1 bit. Moreover, if $P$ is stationary and ergodic,
  $\frac{1}{n} H_n(P) \to H(P)$,
the entropy rate of the source $P$: the minimal number of bits needed per symbol.
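The bound above is easy to check numerically. The sketch below is not part of the slides (the alphabet, probabilities and block length are made up); it compares the n-block entropy of a small memoryless source with the expected length of a Shannon code on n-blocks, which must lie within one bit of it.

```python
import itertools
import math

# Hypothetical memoryless source: first marginal P over a 3-symbol alphabet.
P = {"a": 0.5, "b": 0.3, "c": 0.2}
n = 4  # block length

H1 = -sum(p * math.log2(p) for p in P.values())
Hn = n * H1  # n-block entropy of a memoryless source: H_n(P) = n * H_1(P)

# Shannon code on n-blocks: codeword length ceil(-log2 P^n(x)) for each block x.
expected_length = 0.0
for block in itertools.product(P, repeat=n):
    prob = math.prod(P[s] for s in block)          # P^n(x) factorizes
    expected_length += prob * math.ceil(-math.log2(prob))

assert Hn <= expected_length < Hn + 1              # Shannon '48: within 1 bit
print(f"H_n(P) = {Hn:.4f} bits, E[|phi_n(X)|] = {expected_length:.4f} bits")
```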

7 Coding Distributions
Kraft inequality: $\sum_{x_1^n} 2^{-|\phi_n(x_1^n)|} \le 1$, and conversely any lengths satisfying it are achievable.
Code $\phi$  <->  coding distribution $Q^n_\phi(x) = 2^{-|\phi_n(x)|}$.
The Shannon '48 theorem says that, on average, the best coding distribution is $P$ itself! Otherwise, the coding distribution $Q^n$ suffers the (pointwise) regret
  $-\log Q^n(X_1^n) - \big(-\log P^n(X_1^n)\big) = \log \frac{P^n(X_1^n)}{Q^n(X_1^n)}$.
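As a quick illustration (a toy alphabet and hand-picked P and Q, not taken from the slides), the snippet below checks the Kraft inequality for the lengths derived from a coding distribution Q and measures the pointwise regret incurred when a string drawn from P is coded with Q.

```python
import math
import random

P = {"a": 0.6, "b": 0.3, "c": 0.1}        # true first marginal (hypothetical)
Q = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}  # coding distribution actually used

# Lengths l(x) = ceil(-log2 Q(x)) always satisfy the Kraft inequality.
lengths = {s: math.ceil(-math.log2(q)) for s, q in Q.items()}
assert sum(2.0 ** -l for l in lengths.values()) <= 1.0

random.seed(0)
x = random.choices(list(P), weights=list(P.values()), k=1000)   # x_1^n drawn from P
regret = sum(math.log2(P[s] / Q[s]) for s in x)                 # log P^n(x)/Q^n(x)
print(f"regret of Q on this sample: {regret:.1f} bits over {len(x)} symbols")
```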

8 Outline, next section: Universal Coding

9 Universal Coders
What if the source statistics are unknown? What if we need a versatile code?
=> We need a single coding distribution $Q^n$ for a whole class of sources $\Lambda = \{P_\theta,\ \theta \in \Theta\}$ (e.g. memoryless processes, Markov chains, HMMs, ...).
=> Unavoidable redundancy:
  $E_{P_\theta}\!\left[\,|\phi(X_1^n)|\,\right] - H(X_1^n) = E_{P_\theta}\!\left[-\log Q^n(X_1^n) + \log P_\theta(X_1^n)\right] = \mathrm{KL}(P_\theta^n, Q^n)$,
the Kullback-Leibler divergence between $P_\theta^n$ and $Q^n$.

10 Coders: Measures of Universality
1. Maximal regret: $R^*(Q^n, \Lambda) = \sup_{\theta\in\Theta}\ \sup_{x_1^n \in A^n} \log \frac{P^n_\theta(x_1^n)}{Q^n(x_1^n)}$
2. Worst-case redundancy: $R^+(Q^n, \Lambda) = \sup_{\theta\in\Theta} E_{P_\theta}\!\left[\log \frac{P^n_\theta(X_1^n)}{Q^n(X_1^n)}\right] = \sup_{\theta\in\Theta} \mathrm{KL}(P^n_\theta, Q^n)$
3. Expected redundancy with respect to a prior $\pi$: $R_\pi(Q^n, \Lambda) = E_\pi\!\left[E_{P_\theta}\!\left[\log \frac{P^n_\theta(X_1^n)}{Q^n(X_1^n)}\right]\right] = E_\pi\!\left[\mathrm{KL}(P^n_\theta, Q^n)\right]$
=> $R_\pi(Q^n, \Lambda) \le R^+(Q^n, \Lambda) \le R^*(Q^n, \Lambda)$.
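These three quantities can be computed exactly for a toy class. The sketch below uses a hypothetical setup (two Bernoulli sources, n = 8, and Q taken as their uniform mixture), evaluates R_pi, R^+ and R^* by exhaustive enumeration, and checks the ordering stated on the slide.

```python
import itertools
import math

n = 8
thetas = [0.2, 0.7]                      # a two-element class of Bernoulli sources

def p_theta(theta, x):
    """Probability of the binary string x under Bernoulli(theta)."""
    return math.prod(theta if b else 1 - theta for b in x)

xs = list(itertools.product([0, 1], repeat=n))
Q = {x: 0.5 * p_theta(thetas[0], x) + 0.5 * p_theta(thetas[1], x) for x in xs}

R_star = max(math.log2(p_theta(t, x) / Q[x]) for t in thetas for x in xs)
kl = [sum(p_theta(t, x) * math.log2(p_theta(t, x) / Q[x]) for x in xs) for t in thetas]
R_plus = max(kl)                          # worst-case redundancy
R_pi = sum(kl) / len(kl)                  # redundancy under the uniform prior

assert R_pi <= R_plus <= R_star
print(f"R_pi = {R_pi:.3f}, R^+ = {R_plus:.3f}, R^* = {R_star:.3f} bits")
```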

11 Classes: Measures of Complexity
1. Minimax regret: $R^*_n(\Lambda) = \inf_{Q^n} R^*(Q^n, \Lambda) = \min_{Q^n}\ \max_{x_1^n,\,\theta} \log \frac{P^n_\theta(x_1^n)}{Q^n(x_1^n)}$
2. Minimax redundancy: $R^+_n(\Lambda) = \inf_{Q^n} R^+(Q^n, \Lambda) = \min_{Q^n}\ \max_{\theta} \mathrm{KL}(P^n_\theta, Q^n)$
3. Maximin redundancy: $R^-_n(\Lambda) = \sup_{\pi}\ \inf_{Q^n} R_\pi(Q^n, \Lambda) = \max_{\pi}\ \min_{Q^n} E_\pi\!\left[\mathrm{KL}(P^n_\theta, Q^n)\right]$
=> $R^-_n(\Lambda) \le R^+_n(\Lambda) \le R^*_n(\Lambda)$.

12 General Standard Results
Minimax theorem (Haussler '97, Sion): $R^-_n(\Lambda) = R^+_n(\Lambda)$.
Shtarkov's NML: $R^*_n(\Lambda) = R^*(Q^n_{\mathrm{NML}}, \Lambda)$ with $Q^n_{\mathrm{NML}}(x) = \frac{\hat p(x)}{\sum_{y\in\mathcal{X}^n} \hat p(y)}$ and $\hat p(x) = \sup_{P\in\Lambda} P^n(x)$, and hence $R^*_n(\Lambda) = \log \sum_{y\in\mathcal{X}^n} \hat p(y)$.
Channel-capacity method: if $d\mu(\theta, x) = d\pi(\theta)\,dP_\theta(x)$ then
  $\inf_{Q^n} R_\pi(Q^n, \Lambda) = H_\mu(X_1^n) - H_\mu(X_1^n \mid \theta) = H_\mu(\theta) - H_\mu(\theta \mid X_1^n)$.

13 Finite Alphabets: Standard Results
Theorem (Shtarkov, Barron, Szpankowski, ...). If $\mathcal{I}_m$ denotes the class of memoryless processes over the alphabet $\{1, \dots, m\}$, then
  $R^+_n(\mathcal{I}_m) = \frac{m-1}{2}\log\frac{n}{2\pi e} + \log\frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1)$
  $R^*_n(\mathcal{I}_m) = \frac{m-1}{2}\log\frac{n}{2\pi} + \log\frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1)$,
so both grow like $\frac{m-1}{2}\log n$.
Theorem (Rissanen '84). If $\dim\Theta = k$ and there exists a $\sqrt{n}$-consistent estimator of $\theta$ given $X_1^n$, then $\liminf_n \frac{R^+_n(\Lambda)}{\frac{k}{2}\log n} \ge 1$.
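On a finite alphabet the Shtarkov sum, and hence the exact minimax regret, can be computed by grouping sequences by type. The sketch below (an illustration assuming binary logarithms, with small m and n picked arbitrarily) compares the exact value with the asymptotic expansion of the theorem.

```python
import math

def compositions(n, m):
    """All ways of writing n as an ordered sum of m non-negative integers."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, m - 1):
            yield (first,) + rest

def exact_minimax_regret(m, n):
    """log2 of the Shtarkov sum for memoryless sources over m symbols."""
    total = 0.0
    for counts in compositions(n, m):
        multinom = math.factorial(n)
        for c in counts:
            multinom //= math.factorial(c)
        max_lik = math.prod((c / n) ** c for c in counts if c > 0)  # sup_P P^n(x)
        total += multinom * max_lik
    return math.log2(total)

m, n = 3, 200
exact = exact_minimax_regret(m, n)
asym = (m - 1) / 2 * math.log2(n / (2 * math.pi)) \
    + math.log2(math.pi ** (m / 2) / math.gamma(m / 2))
print(f"exact R*_n = {exact:.3f} bits, asymptotic expansion = {asym:.3f} bits")
```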

14 Outline, next section: Infinite Alphabets

15 Infinite Alphabets: Issues
Motivations: integer coding, lossless image coding, text coding on words, mathematical curiosity.
1. Understand general structural properties of the minimax redundancy and the minimax regret.
2. Characterize those source classes that have finite minimax regret.
3. Establish quantitative relations between minimax redundancy or regret and the integrability of the envelope function.
4. Develop effective coding techniques for source classes with known non-trivial minimax redundancy rate.
5. Develop adaptive coding schemes for collections of source classes that are too large to have non-trivial redundancy rates.

16 Sanity-check Properties
1. $R^*(\Lambda^n) < +\infty$ if and only if the NML (Shtarkov) distribution $Q^n_{\mathrm{NML}}$ is well defined; it is then given by
  $Q^n_{\mathrm{NML}}(x) = \frac{\hat p(x)}{\sum_{y\in\mathcal{X}^n} \hat p(y)}$ for $x \in \mathcal{X}^n$, where $\hat p(x) = \sup_{P\in\Lambda} P^n(x)$.
2. $R^+(\Lambda^n)$ and $R^*(\Lambda^n)$ are non-decreasing, sub-additive (or infinite) functions of $n$. Thus
  $\lim_n \frac{R^+(\Lambda^n)}{n} = \inf_{n\in\mathbb{N}_+} \frac{R^+(\Lambda^n)}{n} \le R^+(\Lambda^1)$   and   $\lim_n \frac{R^*(\Lambda^n)}{n} = \inf_{n\in\mathbb{N}_+} \frac{R^*(\Lambda^n)}{n} \le R^*(\Lambda^1)$.

17 Memoryless Sources
1. Let $\Lambda$ be a class of stationary memoryless sources over a countably infinite alphabet, and let $\hat p$ be defined by $\hat p(x) = \sup_{P\in\Lambda} P\{x\}$. The minimax regret with respect to $\Lambda^n$ is finite if and only if the normalized maximum likelihood (Shtarkov) coding probability is well defined, and
  $R^*(\Lambda^n) < \infty \iff \sum_{x\in\mathbb{N}_+} \hat p(x) < \infty$.
2. We give an example of a memoryless class $\Lambda$ such that $R^+(\Lambda^n) < \infty$ but $R^*(\Lambda^n) = \infty$.

18 Outline, next section: Envelope Classes, Theoretical Properties

19 Redundancy and Regret: Equivalence Property
Definition. $\Lambda_f = \{P :\ \forall x \in \mathbb{N}_+,\ P_1\{x\} \le f(x)$ and $P$ is stationary memoryless$\}$.
Theorem. Let $f$ be a non-negative function from $\mathbb{N}_+$ to $[0, 1]$ and let $\Lambda_f$ be the class of stationary memoryless sources defined by the envelope $f$. Then
  $R^+(\Lambda^n_f) < \infty \iff R^*(\Lambda^n_f) < \infty \iff \sum_{k\in\mathbb{N}_+} f(k) < \infty$.
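The summability condition is easy to certify numerically for a given envelope. Below is a small check (the envelope f(k) = min(1, C k^(-alpha)) and its parameters are made-up examples): for alpha > 1 the tail is controlled by an integral bound, so the total mass, and hence the finiteness of the minimax quantities, can be bracketed; for alpha <= 1 the series diverges.

```python
# Bracket sum_k f(k) for the power-law envelope f(k) = min(1, C * k**-alpha).
def envelope_mass(C, alpha, u=100_000):
    head = sum(min(1.0, C * k ** -alpha) for k in range(1, u + 1))
    tail_bound = C * u ** (1 - alpha) / (alpha - 1)   # sum_{k>u} C k^-alpha <= integral bound
    return head, head + tail_bound

lo, hi = envelope_mass(C=2.0, alpha=1.5)
print(f"alpha = 1.5: sum_k f(k) lies in [{lo:.4f}, {hi:.4f}]  (finite, so R^+ and R^* are finite)")
# For alpha <= 1 the partial sums grow without bound (harmonic-like),
# so sum_k f(k) = infinity and the envelope class has infinite minimax redundancy and regret.
```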

20 Idea of the Proof
The only thing to prove is that $\sum_{k\in\mathbb{N}_+} f(k) = \infty \Rightarrow R^+(\Lambda^n_f) = \infty$.
Let the sequence of integers $(h_i)_{i\in\mathbb{N}}$ be defined recursively by $h_0 = 0$ and
  $h_{i+1} = \min\Big\{h : \sum_{k=h_i+1}^{h} f(k) > 1\Big\}$.
Consider the class $\Lambda = \{P_\theta,\ \theta\in\Theta\}$ with $\Theta = \mathbb{N}$, where the memoryless source $P_i$ is defined by its first marginal
  $P^1_i(m) = \frac{f(m)}{\sum_{k=h_i+1}^{h_{i+1}} f(k)}$ for $m \in \{h_i+1, \dots, h_{i+1}\}$.
The supports are disjoint, so $X_1$ determines $\theta$ and $H(\theta \mid X_1) = 0$. Taking any infinite-entropy prior $\pi$ over $\Theta$, the class $\Lambda^1 = \{P^1_i;\ i\in\mathbb{N}\}$ shows that
  $R^+(\Lambda^1) \ge R_\pi(\Lambda^1) = H(\theta) - H(\theta \mid X_1) = \infty$.

21 Upper-bounding the Regret
Theorem. If $\Lambda$ is a class of memoryless sources and the tail function $\bar F_{\Lambda_1}$ is defined by $\bar F_{\Lambda_1}(u) = \sum_{k>u} \hat p(k)$, then
  $R^*(\Lambda^n) \le \inf_{u:\,u\le n}\left[ n\,\bar F_{\Lambda_1}(u)\,\log e + \frac{u-1}{2}\log n\right] + 2$.
Corollary. Let $\Lambda$ denote a class of memoryless sources. Then
  $R^*(\Lambda^n) < \infty \Rightarrow R^*(\Lambda^n) = o(n)$ and $R^+(\Lambda^n) = o(n)$.
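For a concrete envelope the bound can simply be minimized over the cutoff u. The sketch below (a hypothetical power-law envelope f(k) = C k^(-alpha), binary logarithms) evaluates the right-hand side using the integral bound on the tail function and prints it next to the n^(1/alpha) (log n)^(1-1/alpha) scaling announced for power-law classes on a later slide.

```python
import math

def regret_upper_bound(C, alpha, n):
    """Evaluate inf_{u<=n} [ n*F(u)*log2(e) + (u-1)/2*log2(n) ] + 2 for f(k) = C k^-alpha."""
    def tail(u):                 # F(u) = sum_{k>u} p_hat(k) <= C u^(1-alpha)/(alpha-1)
        return C * u ** (1 - alpha) / (alpha - 1)
    best = min(n * tail(u) * math.log2(math.e) + (u - 1) / 2 * math.log2(n)
               for u in range(1, n + 1))
    return best + 2

C, alpha = 2.0, 1.5
for n in (10**3, 10**4, 10**5):
    bound = regret_upper_bound(C, alpha, n)
    scaling = (2 * C * n / (alpha - 1)) ** (1 / alpha) * math.log2(n) ** (1 - 1 / alpha)
    print(f"n = {n:>6}: bound = {bound:9.1f} bits   (2Cn/(a-1))^(1/a)(log n)^(1-1/a) = {scaling:9.1f}")
```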

22 Lower-bounding the Redundancy
Theorem. Let $f$ denote a non-increasing, summable envelope function. For any integer $p$, let $c(p) = \sum_{k=1}^{p} f(2k)$ and $c(\infty) = \sum_{k\ge 1} f(2k)$; assume $c(\infty) > 1$. Let $p \in \mathbb{N}_+$ be such that $c(p) > 1$, and let $n \in \mathbb{N}_+$, $\epsilon > 0$ and $\lambda \in (0, 1)$ be such that $n > \frac{10\,c(p)}{f(2p)\,\epsilon\,(1-\lambda)}$. Then
  $R^+(\Lambda^n_f) \ge C(p, n, \lambda, \epsilon)\ \sum_{i=1}^{p} \frac{1}{2}\log\frac{(1-\lambda)\,\epsilon\, n\, f(2i)}{2\pi e\, c(p)}$,
where
  $C(p, n, \lambda, \epsilon) = \frac{1}{1 + \sqrt{\frac{c(p)}{\lambda^2\, n f(2p)}}}\left(1 - \frac{4}{\pi}\sqrt{\frac{5\,c(p)}{(1-\lambda)\,\epsilon\, n f(2p)}}\right)$.

23 Power-law Classes
Theorem. Let $\alpha$ denote a real number larger than 1, and let $C$ be such that $C\zeta(\alpha) \ge 2^\alpha$. The source class $\Lambda_{C\cdot^{-\alpha}}$ is the envelope class associated with the decreasing function $f_{\alpha,C}: x \mapsto 1 \wedge \frac{C}{x^\alpha}$, for $C > 1$ and $\alpha > 1$. Then
  $\frac{n^{1/\alpha}\, A(\alpha)}{\left(\log(C\zeta(\alpha))\right)^{1/\alpha}} \le R^+(\Lambda^n_{C\cdot^{-\alpha}})$, where $A(\alpha) = \frac{1}{\alpha-1}\int_1^\infty \frac{1 - e^{-1/(\zeta(\alpha)u)}}{u^{1-1/\alpha}}\,\mathrm{d}u$,
and
  $R^*(\Lambda^n_{C\cdot^{-\alpha}}) \le \left(\frac{2Cn}{\alpha-1}\right)^{1/\alpha} (\log n)^{1-1/\alpha} + O(1)$.

24 Exponential Classes
Theorem. Let $C$ and $\alpha$ denote positive real numbers satisfying $C > e^{2\alpha}$. The class $\Lambda_{Ce^{-\alpha\cdot}}$ is the envelope class associated with the function $f_\alpha: x \mapsto 1 \wedge Ce^{-\alpha x}$. Then
  $\frac{1}{8\alpha}\log^2 n\,\big(1 - o(1)\big) \le R^+(\Lambda^n_{Ce^{-\alpha\cdot}}) \le R^*(\Lambda^n_{Ce^{-\alpha\cdot}}) \le \frac{1}{2\alpha}\log^2 n + O(1)$.
Cf. Dominique Bontemps: $R^+(\Lambda^n_{Ce^{-\alpha\cdot}}) \sim \frac{1}{4\alpha}\log^2 n$.

25 Outline, next section: Envelope Classes, Algorithms

26 The CensoringCode Algorithm

Algorithm CensoringCode:
  K <- 0
  C1 <- empty string
  counts <- [1/2, 1/2, ...]
  for i from 1 to n do
    cutoff <- ceil((4Ci/(alpha-1))^(1/alpha))
    if cutoff > K then
      for j <- K+1 to cutoff do
        counts[0] <- counts[0] - counts[j] + 1/2
      end for
      K <- cutoff
    end if
    if x[i] <= cutoff then
      ArithCode(x[i], counts[0 : cutoff])
    else
      ArithCode(0, counts[0 : cutoff])
      C1 <- C1 . EliasCode(x[i])
      counts[0] <- counts[0] + 1
    end if
    counts[x[i]] <- counts[x[i]] + 1
  end for
  C2 <- ArithCode()   (flush the arithmetic coder)
  output C1, C2

Theorem. Let $C$ and $\alpha$ be positive reals and let the sequence of cutoffs $(K_i)_{i\le n}$ be given by $K_i = \left(\frac{4Ci}{\alpha-1}\right)^{1/\alpha}$. Then
  $R^+(\text{CensoringCode}, \Lambda^n_{C\cdot^{-\alpha}}) \le \left(\frac{4Cn}{\alpha-1}\right)^{1/\alpha} \log n\ (1 + o(1))$.
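Below is a small runnable sketch of the censoring idea, not the authors' reference implementation: the arithmetic coder is replaced by an ideal codelength accounting of -log2 of the adaptive (1/2 pseudo-count) conditional probabilities, censored symbols are charged an Elias gamma codelength, and the growing cutoff follows ceil((4 C i / (alpha - 1))^(1/alpha)). Parameter values and the sample source are made up.

```python
import math
import random

def elias_gamma_length(m):
    """Length in bits of the Elias gamma code of the positive integer m."""
    return 2 * int(math.log2(m)) + 1

def censoring_code_length(x, C=2.0, alpha=1.5):
    """Ideal number of bits spent by the censoring scheme on the sequence x of positive integers."""
    counts = {0: 0.5}        # pseudo-counts; index 0 is the escape symbol
    K = 0
    bits = 0.0
    for i, sym in enumerate(x, start=1):
        cutoff = math.ceil((4 * C * i / (alpha - 1)) ** (1 / alpha))
        if cutoff > K:
            for j in range(K + 1, cutoff + 1):
                counts.setdefault(j, 0.5)
                counts[0] -= counts[j] - 0.5     # j is no longer censored
            K = cutoff
        mass = sum(counts[j] for j in range(cutoff + 1))
        if sym <= cutoff:
            bits += -math.log2(counts[sym] / mass)
        else:
            bits += -math.log2(counts[0] / mass)  # escape symbol ...
            bits += elias_gamma_length(sym)       # ... then the symbol itself
            counts[0] += 1
        counts[sym] = counts.get(sym, 0.5) + 1
    return bits

random.seed(1)
sample = [int(random.paretovariate(1.5)) for _ in range(2000)]   # heavy-tailed toy source
print(f"{censoring_code_length(sample):.0f} bits for {len(sample)} symbols")
```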

27 Approximately Adaptive Algorithms
A sequence $(Q^n)_n$ of coding probabilities is said to be approximately asymptotically adaptive with respect to a collection $(\Lambda_m)_{m\in M}$ of source classes if, for each $P \in \bigcup_{m\in M}\Lambda_m$ and for each $\Lambda_m$ such that $P \in \Lambda_m$:
  $D(P^n, Q^n) / R^+(\Lambda^n_m) = O(\log n)$.
A modification of CensoringCode that chooses the $(n+1)$-th cutoff $K_{n+1}$ according to the number of distinct symbols in $x$ is approximately adaptive on
  $W_\alpha = \{P : P \in \Lambda_\alpha,\ 0 < \liminf_k k^\alpha P_1(k) \le \limsup_k k^\alpha P_1(k) < \infty\}$.
Pattern coding is approximately adaptive if $1 < \alpha \le 5/2$.

28 Patterns
The information conveyed in a message $x$ can be separated into
1. a dictionary $\Delta = \Delta(x)$: the sequence of distinct symbols occurring in $x$, in order of appearance;
2. a pattern $\psi = \psi(x)$, where $\psi_i$ is the rank of $x_i$ in the dictionary.
Example:
  Message     x    = a b r a c a d a b r a
  Pattern     ψ(x) = 1 2 3 1 4 1 5 1 2 3 1
  Dictionary  Δ(x) = (a, b, r, c, d)
A random process $(X_n)_n$ with distribution $P$ induces a random pattern process $(\Psi_n)_n$ on $\mathbb{N}_+$ with distribution
  $P^\Psi(\Psi_1^n = \psi_1^n) = \sum_{x_1^n :\ \psi(x_1^n) = \psi_1^n} P(X_1^n = x_1^n)$.
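A few lines suffice to compute the two parts; the helper below (a hypothetical implementation, not from the slides) reproduces the abracadabra example.

```python
def pattern_and_dictionary(msg):
    """Split a message into its pattern (ranks of first appearance) and its dictionary."""
    dictionary, rank, pattern = [], {}, []
    for symbol in msg:
        if symbol not in rank:
            dictionary.append(symbol)
            rank[symbol] = len(dictionary)
        pattern.append(rank[symbol])
    return pattern, dictionary

psi, delta = pattern_and_dictionary("abracadabra")
print(psi)    # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
print(delta)  # ['a', 'b', 'r', 'c', 'd']
```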

29 Pattern Coding
The pattern entropy rate exists and coincides with the process entropy rate:
  $\frac{1}{n} H(\Psi_1^n) = \frac{1}{n} E_{P^\Psi}\!\left[-\log P^\Psi(\Psi_1^n)\right] \to H(\Psi) = H(X)$.
Pattern redundancy satisfies
  $\left(\frac{n}{\log n}\right)^{1/3} \lesssim R^-_{\Psi,n}(\mathcal{I}_\infty) \le R^*_{\Psi,n}(\mathcal{I}_\infty) \le \pi\sqrt{\frac{2n}{3}}\,\log e$.
The proof of the lower bound uses fine combinatorics on integer partitions with small summands (see Garivier '09). Upper bounds in $O(n^{2/5})$ have been given (see Shamir '07), but there is still a gap between the lower and upper bounds.
For memoryless coding, the pattern redundancy is negligible with respect to the codelength of the dictionary whenever $\alpha \le 5/2$.
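The constant in the sqrt(n) upper bound matches the Hardy-Ramanujan growth rate of the number p(n) of integer partitions of n, which is exp(pi sqrt(2n/3)) up to polynomial factors. The sketch below (an illustration only; the choice of n values is arbitrary) computes p(n) exactly by dynamic programming and compares log2 p(n) with pi sqrt(2n/3) log2(e).

```python
import math

def partition_count(n):
    """Exact number of integer partitions of n, computed by dynamic programming."""
    p = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            p[total] += p[total - part]
    return p[n]

for n in (100, 500, 2000):
    exact_bits = math.log2(partition_count(n))
    leading_term = math.pi * math.sqrt(2 * n / 3) * math.log2(math.e)
    print(f"n = {n:>4}: log2 p(n) = {exact_bits:7.1f}   pi*sqrt(2n/3)*log2(e) = {leading_term:7.1f}")
```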
