Coding on Countably Infinite Alphabets
1 Coding on Countably Infinite Alphabets: Non-parametric Information Theory
2 Outline: Lossless Coding on Infinite Alphabets
Source Coding; Universal Coding; Infinite Alphabets; Envelope Classes: Theoretical Properties, Algorithms
3-4 Data Compression: the Shannon Model
(Two diagram slides; the figures are not reproduced in this transcription.)
5 Outline: Source Coding
6 Source Entropy
Shannon ('48): for any lossless code phi_n,
E_P[ |phi_n(X_1^n)| ] >= H_n(P) = E_P[ -log P^n(X_1^n) ]   (the n-block entropy),
and there is a code reaching the bound (within 1 bit). Moreover, if P is stationary and ergodic,
(1/n) H_n(P) -> H(P),
the entropy rate of the source P = the minimal number of bits necessary per symbol.
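As a quick illustration (a Python sketch with made-up numbers, not part of the slides), the n-block entropy of a memoryless source can be computed by brute force and compared with the per-symbol entropy:

import math
import itertools

# Toy memoryless source over {1, 2, 3}; the probabilities are arbitrary illustration values.
p = {1: 0.5, 2: 0.3, 3: 0.2}

def block_entropy(p, n):
    # H_n(P) = E_P[-log2 P^n(X_1^n)], computed by brute force over all n-blocks
    h = 0.0
    for block in itertools.product(p, repeat=n):
        q = math.prod(p[s] for s in block)   # P^n(block) = product of the marginals
        h -= q * math.log2(q)
    return h

h1 = -sum(q * math.log2(q) for q in p.values())
for n in (1, 2, 3):
    print(n, block_entropy(p, n) / n, h1)    # for a memoryless source, H_n(P)/n = H_1(P)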
7 Coding Distributions
Kraft inequality: sum_x 2^(-|phi_n(x)|) <= 1, and conversely (any lengths satisfying it are achievable by a prefix code).
Code phi <-> coding distribution Q^n_phi(x) = 2^(-|phi_n(x)|).
Shannon's '48 theorem expresses that, on average, the best coding distribution is P itself! Otherwise the coding distribution Q^n suffers the regret
(-log Q^n(x_1^n)) - (-log P^n(x_1^n)) = log [ P^n(x_1^n) / Q^n(x_1^n) ].
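A minimal sketch of these two facts (illustrative values, not from the slides): the lengths of a prefix code satisfy Kraft's inequality, the induced coding distribution is Q(x) = 2^(-length(x)), and the expected regret of coding P with Q is the Kullback-Leibler divergence:

import math

# Hypothetical 3-symbol source and the prefix code {0, 10, 11} (illustration values only).
P = {'a': 0.5, 'b': 0.3, 'c': 0.2}
length = {'a': 1, 'b': 2, 'c': 2}

print("Kraft sum:", sum(2 ** -l for l in length.values()))   # <= 1: a prefix code with these lengths exists

Q = {x: 2 ** -l for x, l in length.items()}                   # coding distribution induced by the code
regret = {x: math.log2(P[x] / Q[x]) for x in P}               # pointwise regret log P(x)/Q(x)
redundancy = sum(P[x] * regret[x] for x in P)                 # expected regret = KL(P, Q) >= 0
expected_len = sum(P[x] * length[x] for x in P)
entropy = -sum(q * math.log2(q) for q in P.values())
print(expected_len, entropy + redundancy)                     # expected length = H(P) + KL(P, Q)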
8 Outline: Universal Coding
9 Universal Coders
What if the source statistics are unknown? What if we need a versatile code?
=> We need a single coding distribution Q^n for a whole class of sources Lambda = {P_theta, theta in Theta}. Examples: memoryless processes, Markov chains, HMMs...
=> Unavoidable redundancy:
E_{P_theta}[ |phi(X_1^n)| ] - H(X_1^n) = E_{P_theta}[ -log Q^n(X_1^n) + log P_theta(X_1^n) ] = KL(P_theta^n, Q^n),
the Kullback-Leibler divergence between P_theta^n and Q^n.
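To make the redundancy term concrete, the following sketch (toy values, not from the slides) checks that for a memoryless source and a product coding distribution the redundancy KL(P_theta^n, Q^n) grows linearly in n:

import math
import itertools

# Toy memoryless source P and a mismatched product coding distribution Q (arbitrary illustration values).
P1 = {0: 0.7, 1: 0.3}
Q1 = {0: 0.5, 1: 0.5}

def kl_n_blocks(n):
    # KL(P^n, Q^n), computed by brute force over all n-blocks
    total = 0.0
    for block in itertools.product(P1, repeat=n):
        p = math.prod(P1[s] for s in block)
        q = math.prod(Q1[s] for s in block)
        total += p * math.log2(p / q)
    return total

kl1 = sum(P1[s] * math.log2(P1[s] / Q1[s]) for s in P1)
for n in (1, 2, 5):
    print(n, kl_n_blocks(n), n * kl1)   # redundancy grows linearly: KL(P^n, Q^n) = n * KL(P^1, Q^1)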
10 Coders: Measures of Universality
1. Maximal regret: R*(Q^n, Lambda) = sup_{theta in Theta} sup_{x_1^n in A^n} log [ P_theta^n(x_1^n) / Q^n(x_1^n) ]
2. Worst-case redundancy: R^+(Q^n, Lambda) = sup_{theta in Theta} E_{P_theta}[ log P_theta^n(X_1^n) / Q^n(X_1^n) ] = sup_{theta in Theta} KL(P_theta^n, Q^n)
3. Expected redundancy with respect to a prior pi: R_pi(Q^n, Lambda) = E_pi[ E_{P_theta}[ log P_theta^n(X_1^n) / Q^n(X_1^n) ] ] = E_pi[ KL(P_theta^n, Q^n) ]
These satisfy R_pi(Q^n, Lambda) <= R^+(Q^n, Lambda) <= R*(Q^n, Lambda).
11 Classes: Measures of Complexity
1. Minimax regret: R*_n(Lambda) = inf_{Q^n} R*(Q^n, Lambda) = min_{Q^n} max_{x_1^n, theta} log [ P_theta^n(x_1^n) / Q^n(x_1^n) ]
2. Minimax redundancy: R^+_n(Lambda) = inf_{Q^n} R^+(Q^n, Lambda) = min_{Q^n} max_theta KL(P_theta^n, Q^n)
3. Maximin redundancy: R_n(Lambda) = sup_pi inf_{Q^n} R_pi(Q^n, Lambda) = max_pi min_{Q^n} E_pi[ KL(P_theta^n, Q^n) ]
These satisfy R_n(Lambda) <= R^+_n(Lambda) <= R*_n(Lambda).
12 General Standard Results
Minimax result (Haussler 97, Sion): R_n(Lambda) = R^+_n(Lambda).
Shtarkov's NML: R*_n(Lambda) = R*(Q^n_NML, Lambda) with Q^n_NML(x) = p_hat(x) / sum_{y in X^n} p_hat(y) and p_hat(x) = sup_{P in Lambda} P^n(x), and hence R*_n(Lambda) = log sum_{y in X^n} p_hat(y).
Channel capacity method: if d mu(theta, x) = d pi(theta) dP_theta(x), then
inf_{Q^n} R_pi(Q^n, Lambda) = H_mu(X_1^n) - H_mu(X_1^n | theta) = H_mu(theta) - H_mu(theta | X_1^n).
13 Finite Alphabets: Standard Results
Theorem (Shtarkov, Barron, Szpankowski, ...). If I_m denotes the class of memoryless processes over the alphabet {1, ..., m}, then
R^+_n(I_m) = ((m-1)/2) log( n / (2 pi e) ) + log( Gamma(1/2)^m / Gamma(m/2) ) + o_m(1),
R*_n(I_m) = ((m-1)/2) log( n / (2 pi) ) + log( Gamma(1/2)^m / Gamma(m/2) ) + o_m(1).
Theorem (Rissanen 84). If dim Theta = k, and if there exists a sqrt(n)-consistent estimator of theta given X_1^n, then
lim inf_n R^+_n(Lambda) / ( (k/2) log n ) >= 1.
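As a sanity check on these formulas (a sketch, not from the slides; the asymptotic expression is the one stated in the theorem above), the Shtarkov sum of the binary memoryless class I_2 can be computed exactly and compared with the approximation:

import math

def shtarkov_regret_binary(n):
    # R*_n(I_2) = log2 of the Shtarkov sum; binary strings are grouped by their number k of ones
    total = 0.0
    for k in range(n + 1):
        # sup over p of p^k (1-p)^(n-k) is attained at p = k/n
        phat = 1.0 if k in (0, n) else (k / n) ** k * ((n - k) / n) ** (n - k)
        total += math.comb(n, k) * phat
    return math.log2(total)

def asymptotic_regret(n, m=2):
    # (m-1)/2 * log2(n / (2*pi)) + log2(Gamma(1/2)^m / Gamma(m/2)), as in the theorem above
    return (m - 1) / 2 * math.log2(n / (2 * math.pi)) + math.log2(math.gamma(0.5) ** m / math.gamma(m / 2))

for n in (10, 100, 1000):
    print(n, shtarkov_regret_binary(n), asymptotic_regret(n))   # the two values get close as n grows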
14 Outline: Infinite Alphabets
15 Infinite Alphabets: Issues
Motivations: integer coding, lossless image coding, text coding on words, mathematical curiosity.
1. Understand general structural properties of minimax redundancy and minimax regret.
2. Characterize those source classes that have finite minimax regret.
3. Obtain quantitative relations between minimax redundancy or regret and integrability of the envelope function.
4. Develop effective coding techniques for source classes with known non-trivial minimax redundancy rate.
5. Develop adaptive coding schemes for collections of source classes that are too large to have non-trivial redundancy rates.
16 Sanity-check Properties
1. R*(Lambda^n) < +infinity if and only if Q^n_NML is well-defined, and it is then given by Q^n_NML(x) = p_hat(x) / sum_{y in X^n} p_hat(y) for x in X^n, where p_hat(x) = sup_{P in Lambda} P^n(x).
2. R^+(Lambda^n) and R*(Lambda^n) are non-decreasing, sub-additive (or infinite) functions of n. Thus
lim_n R^+(Lambda^n)/n = inf_{n in N+} R^+(Lambda^n)/n   and   lim_n R*(Lambda^n)/n = inf_{n in N+} R*(Lambda^n)/n.
17 Memoryless Sources
1. Let Lambda be a class of stationary memoryless sources over a countably infinite alphabet, and let p_hat be defined by p_hat(x) = sup_{P in Lambda} P{x}. The minimax regret with respect to Lambda^n is finite if and only if the normalized maximum likelihood (Shtarkov) coding probability is well-defined, and
R*(Lambda^n) < infinity   if and only if   sum_{x in N+} p_hat(x) < infinity.
2. We give an example of a memoryless class Lambda such that R^+(Lambda^n) < infinity but R*(Lambda^n) = infinity.
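For instance (a hypothetical class chosen for illustration, not taken from the slides), the class of all Poisson sources with mean in [1, 10] satisfies the criterion; the sketch below evaluates sum_x p_hat(x) numerically:

import math

# Hypothetical class: all Poisson sources with mean lambda in [1, 10].
LO, HI = 1.0, 10.0

def p_hat(x):
    # sup over lam in [LO, HI] of the Poisson pmf at x; the pmf is log-concave in lam,
    # so the sup is attained at lam = clip(x, LO, HI)
    lam = min(max(float(x), LO), HI)
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

total = sum(p_hat(x) for x in range(0, 200))   # the tail beyond x = 200 is negligible here
print(total)   # finite, so by the criterion above the minimax regret of this class is finite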
18 Outline: Envelope Classes (Theoretical Properties)
19 Redundancy and Regret: Equivalence Property
Definition. Lambda_f = { P : for all x in N, P^1{x} <= f(x), and P is stationary memoryless }.
Theorem. Let f be a non-negative function from N+ to [0, 1], and let Lambda_f be the class of stationary memoryless sources defined by the envelope f. Then
R^+(Lambda_f^n) < infinity   iff   R*(Lambda_f^n) < infinity   iff   sum_{k in N+} f(k) < infinity.
20 Idea of the Proof
The only thing to prove is that sum_{k in N+} f(k) = infinity implies R^+(Lambda_f^n) = infinity. Let the sequence of integers (h_i)_{i in N} be defined recursively by h_0 = 0 and
h_{i+1} = min{ h : sum_{k = h_i + 1}^{h} f(k) > 1 }.
We consider the class Lambda = {P_theta, theta in Theta} where Theta = N. The memoryless source P_i is defined by its first marginal P_i^1, which is given by
P_i^1(m) = f(m) / sum_{k = h_i + 1}^{h_{i+1}} f(k)   for m in {h_i + 1, ..., h_{i+1}}.
The supports of the P_i^1 are disjoint, so X_1 determines theta. Taking any infinite-entropy prior pi over Theta, the class Lambda^1 = {P_i^1 ; i in N} shows that
R^+(Lambda^1) >= inf_{Q^1} R_pi(Q^1, Lambda^1) = H(theta) - H(theta | X_1) = infinity.
21 Upper-bounding the Regret
Theorem. If Lambda is a class of memoryless sources, let the tail function F_bar_{Lambda^1} be defined by F_bar_{Lambda^1}(u) = sum_{k > u} p_hat(k). Then
R*(Lambda^n) <= inf_{u <= n} [ n F_bar_{Lambda^1}(u) log e + ((u - 1)/2) log n ] + 2.
Corollary. Let Lambda denote a class of memoryless sources; then
R*(Lambda^n) < infinity   implies   R*(Lambda^n) = o(n) and R^+(Lambda^n) = o(n).
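A small numerical sketch of this bound (not from the slides) for a power-law envelope f(k) = min(1, C k^(-alpha)), with illustrative constants C = 2, alpha = 2, using the envelope itself in place of p_hat and a truncated tail sum as an implementation shortcut:

import math

C, alpha = 2.0, 2.0
KMAX = 10 ** 6                                   # truncation point for the tail sums (ample for alpha = 2)

f = [0.0] * (KMAX + 1)
for k in range(1, KMAX + 1):
    f[k] = min(1.0, C * k ** -alpha)

tail = [0.0] * (KMAX + 1)                        # tail[u] = F_bar(u) = sum_{k > u} f(k)
for k in range(KMAX, 0, -1):
    tail[k - 1] = tail[k] + f[k]

def regret_upper_bound(n):
    # inf over u <= n of  n * F_bar(u) * log2(e) + (u - 1)/2 * log2(n), plus 2
    return min(n * tail[u] * math.log2(math.e) + (u - 1) / 2 * math.log2(n)
               for u in range(1, n + 1)) + 2

for n in (100, 1000, 10000):
    print(n, regret_upper_bound(n))              # grows roughly like sqrt(n log n) for alpha = 2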
22 Lower-bounding the Redundancy
Theorem. Let f denote a non-increasing, summable envelope function. For any integer p, let c(p) = sum_{k=1}^{p} f(2k), and let c(infinity) = sum_{k >= 1} f(2k). Assume furthermore that c(infinity) > 1. Let p in N+ be such that c(p) > 1, and let n in N+, epsilon > 0 and lambda in (0, 1) be such that n > 10 c(p) / ( epsilon (1 - lambda) f(2p) ). Then
R^+(Lambda_f^n) >= C(p, n, lambda, epsilon) sum_{i=1}^{p} (1/2) log( (1 - lambda) epsilon n f(2i) / (2 c(p) e) ),
where C(p, n, lambda, epsilon) = [ 1 + c(p) / (lambda^2 n f(2p)) ]^(-1) ( 1 - 4 pi sqrt( 5 c(p) / ((1 - lambda) epsilon n f(2p)) ) ).
23 Power-law Classes
Theorem. Let alpha denote a real number larger than 1, and let C be such that C zeta(alpha) >= 2^alpha. The source class Lambda_{C, alpha} is the envelope class associated with the decreasing function f_{alpha,C} : x -> min(1, C x^(-alpha)), for C > 1 and alpha > 1. Then
n^(1/alpha) A(alpha) log( (C zeta(alpha))^(1/alpha) ) <= R^+(Lambda_{C, alpha}^n),
where A(alpha) = (1/(alpha - 1)) integral_1^infinity (1 / u^(1 - 1/alpha)) (1 - e^(-1/(zeta(alpha) u))) du, and
R*(Lambda_{C, alpha}^n) <= ( 2 C n / (alpha - 1) )^(1/alpha) (log n)^(1 - 1/alpha) + O(1).
24 Exponential Classes
Theorem. Let C and alpha denote positive real numbers satisfying C > e^(2 alpha). The class Lambda_{C e^(-alpha .)} is the envelope class associated with the function f_alpha : x -> min(1, C e^(-alpha x)). Then
(1/(8 alpha)) log^2 n (1 - o(1)) <= R^+(Lambda_{C e^(-alpha .)}^n) <= R*(Lambda_{C e^(-alpha .)}^n) <= (1/(2 alpha)) log^2 n + O(1).
cf. Dominique Bontemps: R^+(Lambda_{C e^(-alpha .)}^n) ~ (1/(4 alpha)) log^2 n.
25 Outline: Algorithms
26 The CensoringCode Algorithm
Algorithm CensoringCode:
    K <- 0
    counts <- [1/2, 1/2, ...]
    for i from 1 to n do
        cutoff <- (4 C i / (alpha - 1))^(1/alpha)
        if cutoff > K then
            for j <- K+1 to cutoff do
                counts[0] <- counts[0] - counts[j] + 1/2
            end for
            K <- cutoff
        end if
        if x[i] <= cutoff then
            ArithCode(x[i], counts[0 : cutoff])
        else
            ArithCode(0, counts[0 : cutoff])
            C1 <- C1 . EliasCode(x[i])
            counts[0] <- counts[0] + 1
        end if
        counts[x[i]] <- counts[x[i]] + 1
    end for
    C2 <- ArithCode()
    output C1 . C2
Theorem. Let C and alpha be positive reals with alpha > 1, and let the sequence of cutoffs (K_i)_{i <= n} be given by K_i = (4 C i / (alpha - 1))^(1/alpha). Then
R^+(CensoringCode, Lambda_{C, alpha}^n) <= ( 4 C n / (alpha - 1) )^(1/alpha) log n (1 + o(1)).
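Below is a hedged Python sketch of the same idea (not the reference implementation): the arithmetic coder is replaced by the ideal codelengths -log2 of the coding probabilities it would use, and an Elias gamma code stands in for EliasCode, so the function only measures codelength rather than producing a bitstream:

import math

def elias_gamma_length(j):
    # length in bits of the Elias gamma code of the positive integer j
    return 2 * int(math.log2(j)) + 1

def censoring_code_length(x, C=2.0, alpha=2.0):
    # Ideal codelength (in bits) of a CensoringCode-style encoder on x, alphabet {1, 2, ...}.
    K = 0                          # current cutoff
    counts = {0: 0.5}              # symbol 0 is the escape symbol; 1/2 is the KT-style initialization
    bits = 0.0
    for i, xi in enumerate(x, start=1):
        cutoff = int((4 * C * i / (alpha - 1)) ** (1 / alpha))
        if cutoff > K:
            for j in range(K + 1, cutoff + 1):
                counts.setdefault(j, 0.5)
                counts[0] -= counts[j] - 0.5      # remove j's past escape contributions
            K = cutoff
        total = sum(counts.get(s, 0.5) for s in range(0, K + 1))
        if xi <= K:
            bits -= math.log2(counts.get(xi, 0.5) / total)
        else:
            bits -= math.log2(counts[0] / total)  # send an escape ...
            bits += elias_gamma_length(xi)        # ... followed by the symbol itself, Elias-coded
            counts[0] += 1
        counts[xi] = counts.get(xi, 0.5) + 1
    return bits

# A short demo sequence (values picked arbitrarily, heavier on small symbols)
print(censoring_code_length([1, 12, 1, 2, 1, 1, 3, 1, 2, 1, 4, 1, 1, 2, 1]))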
27 Approximately Adaptive Algorithms
A sequence (Q^n)_n of coding probabilities is said to be approximately asymptotically adaptive with respect to a collection (Lambda_m)_{m in M} of source classes if, for each P in the union of the Lambda_m and for each Lambda_m such that P belongs to Lambda_m,
D(P^n, Q^n) / R^+(Lambda_m^n) = O(log n).
A modification of CensoringCode that chooses the (n+1)-th cutoff K_{n+1} according to the number of distinct symbols in x is approximately adaptive on
W_alpha = { P : P in Lambda_alpha, 0 < lim inf_k k^alpha P^1(k) <= lim sup_k k^alpha P^1(k) < infinity }.
Pattern coding is approximately adaptive if 1 < alpha <= 5/2.
28 Patterns
The information conveyed in a message x can be separated into:
1. a dictionary Delta = Delta(x): the sequence of distinct symbols occurring in x, in order of appearance;
2. a pattern psi = psi(x), where psi_i is the rank of x_i in the dictionary.
Example:
Message x = a b r a c a d a b r a
Pattern psi(x) = 1 2 3 1 4 1 5 1 2 3 1
Dictionary Delta(x) = a b r c d
A random process (X_n)_n with distribution P induces a random pattern process (Psi_n)_n on N+ with distribution
P^Psi(Psi_1^n = psi_1^n) = sum_{x_1^n : psi(x_1^n) = psi_1^n} P(X_1^n = x_1^n).
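A small sketch of this decomposition (the function name is chosen for the example):

def pattern_and_dictionary(x):
    # split a message into its pattern (ranks of first appearance) and its dictionary
    dictionary, rank, pattern = [], {}, []
    for symbol in x:
        if symbol not in rank:
            dictionary.append(symbol)
            rank[symbol] = len(dictionary)   # ranks start at 1
        pattern.append(rank[symbol])
    return pattern, dictionary

print(pattern_and_dictionary("abracadabra"))
# ([1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1], ['a', 'b', 'r', 'c', 'd'])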
29 Pattern Coding
The pattern entropy rate exists and coincides with the process entropy rate:
(1/n) H(Psi_1^n) = (1/n) E_{P^Psi}[ -log P^Psi(Psi_1^n) ] -> H(Psi) = H(X).
Pattern redundancy satisfies
(n / log n)^(1/3) <= R^+_{Psi,n}(I_infinity) <= R*_{Psi,n}(I_infinity) <= pi sqrt(2n/3) log e.
The proof of the lower bound uses fine combinatorics on integer partitions with small summands (see Garivier 09). Upper bounds in O(n^(2/5)) have been given (see Shamir 07), but there is still a gap between the lower and upper bounds.
For memoryless coding, the pattern redundancy is negligible with respect to the codelength of the dictionary whenever alpha < 5/2.