Context tree models for source coding


1 Context tree models for source coding: Toward Non-parametric Information Theory

2 Outline
Lossless Source Coding = density estimation with log-loss: Source Coding and Universal Coding; Bayesian coding
Context-Tree Models and Double Mixture: Rissanen's model; Coding in Context Tree models; The Context-Tree Weighting Method
Context Tree Weighting on Renewal Processes: CTW on Renewal Processes; Perspectives

3-4 Data Compression: Shannon's modeling of a communication system (two figure slides; figures not reproduced here)

5 Outline (next section: Source Coding and Universal Coding)

6 The Problem
Finite alphabet $A$; e.g. $A = \{a, c, t, g\}$
Set of all finite binary sequences: $\{0,1\}^* = \cup_{n \geq 0} \{0,1\}^n$
If $x \in A^n$, its length is $|x| = n$; $\log$ = base-2 logarithm
Let $X$ be a stochastic process on the finite alphabet $A$, with stationary ergodic distribution $P$
The marginal distribution of $X_1^n$ is denoted by $P^n$
For $n \in \mathbb{N}$, the coding function is $\phi_n : A^n \to \{0,1\}^*$
Expected code length: $E_P[|\phi_n(X_1^n)|]$

7 Example: DNA Codes
$A = \{a, c, t, g\}$
Uniform concatenation code: a fixed 2-bit codeword per letter (e.g. $a \mapsto 00$, $c \mapsto 01$, $t \mapsto 10$, $g \mapsto 11$), and $\phi_n(x_1^n) = \phi_1(x_1)\dots\phi_1(x_n)$, so that $\forall x_1^n \in A^n$, $|\phi_n(x_1^n)| = 2n$
Improvement? Try shorter variable-length codewords $\psi_1$ (e.g. $a \mapsto 0$, $c \mapsto 1$, $t \mapsto 01$, $g \mapsto 10$). Problem: what is the decoding of $101$: $ct$ or $gc$?
Uniquely decodable codes: $\phi_n(x_1^n) = \phi_m(y_1^m) \Rightarrow m = n$ and $\forall i \in \{1,\dots,n\},\ x_i = y_i$
Instantaneous codes = prefix codes: $\forall (a, b) \in A^2$, $\forall j \in \mathbb{N}$, $\phi_1(a)_{1:j} \neq \phi_1(b)$ (no codeword is a prefix of another)

8 Kraft's Inequality
Theorem (Kraft 49, McMillan 56): if $\phi_n$ is a uniquely decodable code, then $\sum_{x \in A^n} 2^{-|\phi_n(x)|} \leq 1$
Proof for prefix codes: let $D = \max_{x \in A^n} |\phi_n(x)|$ and, for $x \in A^n$, let $Z(x) = \{w \in \{0,1\}^D : w_{1:|\phi_n(x)|} = \phi_n(x)\}$.
Then $\#Z(x) = 2^{D - |\phi_n(x)|}$, and since $\phi_n$ is a prefix code, $x \neq y \Rightarrow Z(x) \cap Z(y) = \emptyset$.
Hence $\sum_{x \in A^n} \#Z(x) = \sum_{x \in A^n} 2^{D - |\phi_n(x)|} \leq \#\{0,1\}^D = 2^D$.
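
A minimal Python sketch (not from the slides) of the two directions of Kraft's inequality: checking $\sum_x 2^{-l(x)} \leq 1$ for a list of codeword lengths, and the canonical construction of a prefix code achieving admissible lengths.

```python
def satisfies_kraft(lengths):
    """Kraft's inequality: a uniquely decodable code with these codeword
    lengths can exist only if sum 2^(-l) <= 1."""
    return sum(2.0 ** -l for l in lengths) <= 1.0 + 1e-12

def prefix_code_from_lengths(lengths):
    """Canonical construction: if Kraft's inequality holds, assign codewords
    in order of increasing length; no codeword is a prefix of another."""
    assert satisfies_kraft(lengths)
    codewords, value, prev = [], 0, 0
    for l in sorted(lengths):
        value <<= (l - prev)              # extend the current codeword to length l
        codewords.append(format(value, "0{}b".format(l)))
        value += 1                        # next codeword starts just above this one
        prev = l
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```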

9 Shannon's First Theorem
If $l(x) = |\phi_n(x)|$, we look for $\min_l \sum_{x \in A^n} l(x)\, P^n(x)$ under the constraint $\sum_{x \in A^n} 2^{-l(x)} \leq 1$
Theorem (Shannon 48): $E_P[|\phi_n(X_1^n)|] \geq H_n(X) = E_P[-\log P^n(X_1^n)]$
Misleading notation: $H_n(X)$ is actually a function of $P^n$ only, the marginal distribution of $X_1^n$
Tight bound: there exists a code $\phi_n$ such that $E_P[|\phi_n(X_1^n)|] \leq H_n(X) + 1$

10 Example: binary entropy
$A = \{0,1\}$, $X_i$ i.i.d. $\mathcal{B}(p)$, $0 < p < 1$.
$\frac{1}{n} H_n(X) = H_1(X) = h(p) = -p \log p - (1-p) \log(1-p)$

11 Example: a 2-block code
$A = \{1, 2, 3\}$, $n = 2$, $P = 0.3\,\delta_1 + 0.6\,\delta_2 + 0.1\,\delta_3$ i.i.d.
$H(X) = -0.3 \log(0.3) - 0.6 \log(0.6) - 0.1 \log(0.1) \approx 1.295$ bits
A prefix code $\phi_2$ on the 9 pairs $X_1^2$ achieves $E[|\phi_2(X_1^2)|]/2 \approx 1.335$ bits per symbol (codeword table not reproduced here)
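
A short Python check of the figures on this slide, assuming (as the 1.335 value suggests) that $\phi_2$ behaves like a Huffman code on pairs; `huffman_expected_length` only computes the expected length, not the codewords.

```python
import heapq, math

def entropy_bits(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def huffman_expected_length(p):
    """Expected codeword length of a binary Huffman code for distribution p
    (sum of the weights of all merged nodes)."""
    heap = sorted(p)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

p = [0.3, 0.6, 0.1]
pairs = [pi * pj for pi in p for pj in p]        # i.i.d. blocks of length 2
print(entropy_bits(p))                           # about 1.295 bits/symbol
print(huffman_expected_length(pairs) / 2)        # about 1.335 bits/symbol
```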

12 Entropy Rate
Theorem (Shannon-Breiman-McMillan): if $P$ is stationary and ergodic, then $\frac{1}{n} H_n(X) \to H(X)$ as $n \to \infty$
Interpretation: $H(X)$ = minimal number of bits necessary to code each symbol of $X$
Kraft's inequality: for each code $\phi_n$ there exists a (sub-)probability distribution $Q_n$ such that $Q_n(x) = 2^{-|\phi_n(x)|}$, i.e. $-\log Q_n(x) = |\phi_n(x)|$

13 Arithmetic Coding: a constructive reciprocal
$\phi_n(x_1^n)$ = the $\lceil -\log Q_n(x_1^n) \rceil + 1$ first bits of the interval midpoint $\frac{Q_n(X_1^n < x_1^n) + Q_n(X_1^n \leq x_1^n)}{2}$
Ex: $A = \{1, 2, 3\}$, $n = 2$, $Q_2 = [0.3, 0.6, 0.1]^{\otimes 2}$
1. $I(11) = [0, 0.09[$: $m(11) = 0.045 = 0.00001\ldots$ in binary. As $\lceil -\log 0.09 \rceil + 1 = 5$, $\phi_2(11) = 00001$
2. $I(22) = [0.48, 0.84[$: $m(22) = 0.66 = 0.10101\ldots$ in binary. As $\lceil -\log 0.36 \rceil + 1 = 3$, $\phi_2(22) = 101$
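
A Python sketch of the Shannon-Fano-Elias construction used on this slide; it enumerates all blocks to compute the interval, which is only meant to reproduce the two worked examples (real arithmetic coders update the interval sequentially).

```python
import math
from itertools import product

def sfe_codeword(q_marginal, block):
    """Shannon-Fano-Elias codeword of a block under an i.i.d. coding
    distribution: the first ceil(-log2 Q(block)) + 1 bits of the midpoint
    of the block's interval (blocks ordered lexicographically)."""
    symbols = sorted(q_marginal)
    prob = lambda b: math.prod(q_marginal[s] for s in b)
    below = sum(prob(b) for b in product(symbols, repeat=len(block)) if b < tuple(block))
    p = prob(tuple(block))
    midpoint, length, bits = below + p / 2, math.ceil(-math.log2(p)) + 1, ""
    for _ in range(length):              # binary expansion of the midpoint
        midpoint *= 2
        bits += str(int(midpoint))
        midpoint -= int(midpoint)
    return bits

q = {1: 0.3, 2: 0.6, 3: 0.1}
print(sfe_codeword(q, (1, 1)))   # 00001
print(sfe_codeword(q, (2, 2)))   # 101
```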

14 Coding distributions
$-\log Q_n(x)$ = code length for $x$ with coding distribution $Q_n$.
Shannon's theorem: the best coding distribution is $Q_n = P^n$.
Using another distribution causes an expected overhead called redundancy:
$E_P[|\phi_n(X_1^n)|] - H_n(X) = E_P[-\log Q_n(X_1^n) + \log P^n(X_1^n)] = KL(P^n, Q_n)$, the Kullback-Leibler divergence of $P^n$ from $Q_n$.
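
A one-function sketch of the redundancy formula above: the expected overhead of coding with $Q$ instead of $P$ is the Kullback-Leibler divergence, here in bits.

```python
import math

def redundancy_bits(p, q):
    """KL(p || q) in bits: expected extra code length paid for using the
    coding distribution q when the data actually follow p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Coding the 3-symbol source of slide 11 with the uniform distribution
# costs about 0.29 extra bits per symbol (log2(3) - H(P)).
print(redundancy_bits([0.3, 0.6, 0.1], [1/3, 1/3, 1/3]))
```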

15 Universal coding
What if the source statistics are unknown? What if we need a versatile code?
$\Rightarrow$ Need a single coding distribution $Q_n$ for a class of sources $\Lambda = \{P_\theta, \theta \in \Theta\}$. Ex: memoryless processes, Markov chains, HMM...
$\Rightarrow$ unavoidable redundancy, in the worst case: $\sup_{\theta \in \Theta} E_{P_\theta}[|\phi_n(X_1^n)|] - H_n(X) = \sup_{\theta \in \Theta} KL(P_\theta^n, Q_n)$
$\Rightarrow$ $Q_n$ must be close to all the $\{P_\theta, \theta \in \Theta\}$ in KL divergence

16 Predictive Coding
What if we want to code messages with different or large lengths?
Consistent coding family: $\forall x_1^n \in A^n$, $Q_n(x_1^n) = \sum_{a \in A} Q_{n+1}(x_1^n a)$
Define $Q(a \mid x_1^n) = \frac{Q(x_1^n a)}{\sum_{b \in A} Q(x_1^n b)}$; then $Q_n(x_1^n) = \prod_{j=1}^{n} Q(x_j \mid x_1^{j-1})$
Predictive coding scheme: update $Q(\cdot \mid x_1^n)$ as the symbols arrive
Example: Laplace code $Q(a \mid x_1^n) = \frac{n(a, x_1^n) + 1}{n + |A|}$, where $n(a, x_1^n) = \sum_{j=1}^{n} \mathbb{1}_{x_j = a}$
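
A sketch of the predictive scheme with the Laplace code $Q(a \mid x_1^n) = (n(a, x_1^n) + 1)/(n + |A|)$; the coder returns the code-length contribution $-\log Q(x_{n+1} \mid x_1^n)$ of each new symbol.

```python
import math

def laplace_coder(alphabet):
    """Sequential Laplace predictive coder: add-one smoothing of the counts."""
    counts = {a: 0 for a in alphabet}
    n = 0
    def code(symbol):
        nonlocal n
        q = (counts[symbol] + 1) / (n + len(alphabet))   # Q(symbol | past)
        counts[symbol] += 1
        n += 1
        return -math.log2(q)                             # bits for this symbol
    return code

code = laplace_coder("acgt")
print(sum(code(s) for s in "acgtacca"))   # total code length in bits
```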

17 Density estimation with Logarithmic Loss
Feature space: the context $x_1^{k-1} \in A^{k-1}$
Instant loss: number of bits used, $\ell(Q, x_1^k) = -\log Q(x_k \mid x_1^{k-1})$
Best regressor in average: $Q^*(\cdot \mid x_1^{k-1}) = P_\theta(X_k = \cdot \mid X_1^{k-1} = x_1^{k-1})$
Excess loss: $\ell(Q, x_1^k) - \ell(Q^*, x_1^k) = \log \frac{P_\theta(X_k = x_k \mid X_1^{k-1} = x_1^{k-1})}{Q(x_k \mid x_1^{k-1})}$
$E_{P_\theta}\big[\ell(Q, x_1^k) - \ell(Q^*, x_1^k)\big] = KL\big(P_\theta(X_k = \cdot \mid X_1^{k-1} = x_1^{k-1}),\ Q(\cdot \mid x_1^{k-1})\big)$
Total excess loss: $L_n(Q) = \sum_{k=1}^{n} \ell(Q, x_1^k) - \ell(Q^*, x_1^k) = \log \frac{P_\theta^n(x_1^n)}{Q_n(x_1^n)}$
Total expected excess loss: $E_{P_\theta} L_n(Q) = KL(P_\theta^n, Q_n)$

18 Outline (next section: Bayesian coding)

19 Bayesian predictive coding
Let $\pi$ be any prior on $\Theta$, take $Q_n(x_1^n) = \int P_\theta^n(x_1^n)\, \pi(d\theta)$
Predictive coding distribution: $Q(a \mid x_1^n) = \int P_\theta(X_{n+1} = a \mid X_1^n = x_1^n)\, \pi(d\theta \mid x_1^n)$
Conjugate priors $\Rightarrow$ easy update rules for $Q(\cdot \mid x_1^n)$
Idea: $\forall \theta \in \Theta$, $Q_n(x_1^n) \geq P_{\theta \pm \delta}^n(x_1^n)\, \pi(\theta \pm \delta)$ (the mixture is never much smaller than any single $P_\theta^n$)

20 Example: i.i.d. classes
A stationary memoryless source is parameterized by $\theta \in S_A = \big\{(\theta_1, \dots, \theta_{|A|}) : \theta_j \geq 0,\ \sum_{j=1}^{|A|} \theta_j = 1\big\}$
Conjugate Dirichlet prior $\pi = \mathcal{D}(\alpha_1, \dots, \alpha_{|A|})$. Posterior: $\Pi(d\theta \mid x_1^n) = \mathcal{D}(\alpha_1 + n_1, \dots, \alpha_{|A|} + n_{|A|})$
$\Rightarrow Q(a \mid x_1^n) = \frac{n_a + \alpha_a}{n + \sum_{j=1}^{|A|} \alpha_j}$
Laplace code = special case $\alpha_1 = \dots = \alpha_{|A|} = 1$

21 The Krichevsky-Trofimov Mixture
Dirichlet prior with $\alpha_1 = \dots = \alpha_{|A|} = 1/2$:
$Q_{KT}(x_1^n) = \int_{\theta \in S_A} P_\theta(x_1^n)\, \frac{\Gamma(|A|/2)}{\Gamma(1/2)^{|A|}} \prod_{a \in A} \theta_a^{-1/2}\, d\theta_a$
Jeffreys prior: non-informative, peaked on extreme parameter values
Example, $|A| = 2$: $Q_{KT}(x_1 x_2 x_3) = Q_{KT}(x_1)\, Q_{KT}(x_2 \mid x_1)\, Q_{KT}(x_3 \mid x_1 x_2)$, with $Q_{KT}(a \mid x_1^n) = \frac{n_a + 1/2}{n + 1}$
(figure: density of the Jeffreys prior $\nu(\theta_1)$ on $[0,1]$)
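
A sequential sketch of the KT mixture, using the closed-form predictive probabilities $Q_{KT}(a \mid x_1^n) = (n_a + 1/2)/(n + m/2)$ rather than the integral.

```python
def kt_predictor(m):
    """Sequential Krichevsky-Trofimov predictor over an alphabet of size m
    (Dirichlet(1/2,...,1/2) prior): returns Q(.|past) and updates the counts."""
    counts = [0] * m
    def step(symbol):
        n = sum(counts)
        probs = [(c + 0.5) / (n + m / 2) for c in counts]
        counts[symbol] += 1
        return probs
    return step

step = kt_predictor(2)
for s in [0, 1, 1, 0]:
    print(step(s))   # predictive distribution just before observing s
```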

22 Optimality
Theorem (Krichevsky-Trofimov 81): if $Q_{KT}^n$ is the KT-mixture code and $m = |A|$, then
$\max_{\theta \in \Theta} KL(P_\theta^n, Q_{KT}^n) = \frac{m-1}{2} \log \frac{n}{2} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1)$
Theorem (Rissanen 84): if $\dim \Theta = k$, and if there exists a $\sqrt{n}$-consistent estimator of $\theta$ given $X_1^n$, then
$\liminf_n \min_{Q_n} \max_{\theta \in \Theta} \frac{KL(P_\theta^n, Q_n)}{\frac{k}{2} \log n} \geq 1$
Theorem (Haussler 96): there exists a Bayesian code $Q_n$ reaching the minimax redundancy

23 Optimality: more precisely
Theorem (Krichevsky-Trofimov 81): if $Q_{KT}^n$ is the KT-mixture code and $m = |A|$, then
$\max_{\theta \in \Theta} KL(P_\theta^n, Q_{KT}^n) = \frac{m-1}{2} \log \frac{n}{2} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1)$
Theorem (Xie-Barron 97): $\min_{Q_n} \max_{\theta \in \Theta} KL(P_\theta^n, Q_n) = \frac{m-1}{2} \log \frac{n}{2e} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1)$
Actually, a pointwise approximation holds: $-\log Q_{KT}^n(x_1^n) \leq \inf_\theta -\log P_\theta^n(x_1^n) + \frac{|A|-1}{2} \log n + C$

24 Sources with memory
i.i.d. sources on finite alphabets are very well treated... but they provide poor models for most applications!
Example: Shannon considered Markov models for natural languages
Problem: a Markov chain of order 3 on an alphabet of size 26 has $k = 26^3 \times 25 = 439400$ degrees of freedom!
$\min_{Q_n} \max_{\theta \in \Theta} KL(P_\theta^n, Q_n) \approx \frac{k}{2} \log n$
$\Rightarrow$ need for more parsimonious classes

25 Outline (next section: Context-Tree Models and Double Mixture)

26 Context Tree Sources
Formal definition:
Context tree: $T \subset A^*$ such that every past $x_{-\infty}^{-1}$ admits a unique suffix $s \in T$; this unique suffix of $x_{-\infty}^{-1}$ in $T$ is denoted by $T(x_{-\infty}^{-1})$
A string $s$ is a context of a stationary ergodic process $P$ if $P(X_{-|s|}^{-1} = s) > 0$ and if, for all pasts $x_{-\infty}^{-1}$ ending with $s$: $P(X_0 = a \mid X_{-\infty}^{-1} = x_{-\infty}^{-1}) = P(X_0 = a \mid X_{-|s|}^{-1} = s)$
$T$ is the context tree of $P$ if $T$ is a context tree, contains all contexts of $P$, and if none of its elements can be replaced by a proper suffix without violating one of these properties

27-32 Context Tree Sources: Informal Definition
A context tree source is a Markov chain whose order is allowed to depend on the past data.
Example, worked out over six animation slides: a binary context tree $T$ with four leaves carrying the conditional distributions $(2/3, 1/3)$, $(3/4, 1/4)$, $(1/5, 4/5)$ and $(1/3, 2/3)$; each new symbol $X_t$ is drawn from the distribution attached to the unique leaf of $T$ that is a suffix of the past, as illustrated on the successive transitions $X_1, \dots, X_5$ (tree figures not reproduced here). A prediction sketch in code follows slide 33.

33 Context Trees versus Markov Chains
Markov chain of order $r$ = context tree source whose context tree is the complete tree of depth $r$: one conditional distribution $p(\cdot \mid s)$ for each of the $|A|^r$ strings $s$
Conversely, a finite context tree source of depth $d$ is a Markov chain of order $d$ whose transition matrix has many repeated rows: the conditional distribution $p(\cdot \mid s)$ only depends on the leaf of the tree that $s$ falls into (transition-matrix figures not reproduced here)
$\Rightarrow$ much more flexibility: large number of models per dimension.
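
A sketch of prediction with a context tree source, as described on the previous slides. The tree and the assignment of parameters to leaves below are hypothetical, not those of the slides; contexts are written with the most recent symbol last.

```python
def context_of(past, tree):
    """Unique element of the context tree that is a suffix of the past
    (assumes a complete tree and a long enough past)."""
    for s in tree:
        if past.endswith(s):
            return s
    raise ValueError("past too short for this tree")

def predict_next(past, tree, theta):
    """Next-symbol distribution of a context tree source."""
    return theta[context_of(past, tree)]

# Hypothetical example: a four-leaf binary tree and parameters (P(0), P(1)).
tree = {"1", "10", "100", "000"}
theta = {"1": (2/3, 1/3), "10": (3/4, 1/4), "100": (1/5, 4/5), "000": (1/3, 2/3)}
print(predict_next("0100", tree, theta))   # context "100" -> (1/5, 4/5)
```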

34 Infinite trees, unbounded memory
There exist (useful and simple) non-Markovian context tree sources, see Galves, Ferrari, Comets, ...
Ex: renewal processes, $X = \dots 1\, \underbrace{0 \cdots 0}_{N_i - 1}\, 1\, \underbrace{0 \cdots 0}_{N_{i+1} - 1}\, 1 \dots$ with i.i.d. gaps $N_i \sim \mu \in \mathcal{M}_1(\mathbb{N}^+)$:
$P\big(X_0 = 1 \mid X_{-\infty}^{-1} = \dots 1\, 0^{j-1}\big) = \frac{\mu(j)}{\mu([j, \infty[)}$

35 Outline (next section: Coding in Context Tree models)

36 Bayes Codes in a Context Tree Model
For a given context tree $T$, let $\Lambda_T = \{P \text{ stationary ergodic on } A : T(P) = T\}$
$P \in \Lambda_T$ is fully characterized by $\theta = (\theta_s)_{s \in T} \in \Theta_T = S_A^T \subset \mathbb{R}^{T \times (|A|-1)}$:
$P\big(X_0 = \cdot \mid X_{-\infty}^{-1} = x_{-\infty}^{-1}\big) = \theta_{T(x_{-\infty}^{-1})}$
Bayesian coding: prior $\pi_T$ on $\Theta_T$, $Q_{\pi_T}(x_1^n) = \int P_\theta(x_1^n)\, d\pi_T(\theta)$

37 Krichevsky-Trofimov Mixtures for Context Trees
Prior: the $\theta_s$ are independent, $\mathcal{D}(\frac{1}{2}, \dots, \frac{1}{2})$-distributed: $\pi_T(d\theta) = \prod_{s \in T} \pi_{KT}(d\theta_s)$
For $s \in T$, let $T(x_1^n, s)$ = subsequence of $x_1^n$ made of the symbols that occur in context $s$ (each symbol is assigned to the unique leaf of $T$ that is a suffix of its past)
The KT-mixture on model $T$ is: $Q_T^n(x_1^n) = \prod_{s \in T} Q_{KT}^{|T(x_1^n, s)|}\big(T(x_1^n, s)\big)$
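
A direct Python sketch of $Q_T^n$ for a fixed (hypothetical) context tree: split the symbols of $x_1^n$ according to their context and multiply the KT probabilities of the resulting subsequences.

```python
from math import lgamma

def kt_log_prob(counts):
    """log of the KT probability (Dirichlet(1/2,...,1/2) mixture) of a count vector."""
    m, n = len(counts), sum(counts)
    return (sum(lgamma(c + 0.5) for c in counts) - m * lgamma(0.5)
            + lgamma(m / 2) - lgamma(n + m / 2))

def log_Q_T(x, tree, alphabet):
    """log Q_T(x) = sum over contexts s of log Q_KT(subsequence of x seen in context s).
    Symbols whose past matches no context (start-up) are simply skipped here."""
    idx = {a: i for i, a in enumerate(alphabet)}
    counts = {s: [0] * len(alphabet) for s in tree}
    for t, symbol in enumerate(x):
        for s in tree:
            if x[:t].endswith(s):
                counts[s][idx[symbol]] += 1
                break
    return sum(kt_log_prob(c) for c in counts.values())

# Hypothetical tree {0, 01, 11} on the binary alphabet (most recent symbol last).
print(log_Q_T("01101001110", {"0", "01", "11"}, "01"))
```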

38 Optimality
Theorem (Willems, Shtarkov, Tjalkens 93): there is a constant $C$ such that
$-\log Q_T^n(x_1^n) \leq \inf_{\theta \in S_A^T} -\log P_\theta(x_1^n) + \frac{|T|\,(|A|-1)}{2} \log \frac{n}{|T|} + C\,|T|$
$\Rightarrow$ first-order optimal
What if we don't know which model $T$ to use? $\Rightarrow$ model selection

39 Minimum Description Length Principle
Occam's razor (Jorma Rissanen 78): choose the model that gives the shortest description of the data
Information theory: objective codelength = codelength of a minimax coder.
Estimator associated with the minimax mixtures $(\pi_T)_T$: $\hat{T}_{KT} = \arg\min_T -\log \int_{\Theta_T} P_\theta^n(x_1^n)\, \pi_T(d\theta)$
Looking directly at the bounds: $\hat{T}_{BIC} = \arg\min_T \inf_{\theta \in \Theta_T} -\log P_\theta^n(x_1^n) + \frac{|T|\,(|A|-1)}{2} \log n$
= penalized maximum likelihood estimator with BIC penalty.

40 Consistency Results
Theorems (Csiszár & Shields 96, Csiszár & Talata 04, G. 05, Galves & Leonardi 07, Leonardi 09):
the BIC estimator is strongly consistent; the Krichevsky-Trofimov estimator is not consistent
Many rates of convergence, minimal penalties, ...
The Krichevsky-Trofimov estimator fails to identify Bernoulli-1/2 i.i.d. sources!
One can derive from this bounds on $\sup_T \sup_{\theta \in \Theta_T} KL\big(P_\theta^n, Q_{\hat{T}}^n\big)$

41 Outline (next section: The Context-Tree Weighting Method)

42 Double mixture
$\nu$ = measure on the set $\mathcal{T}$ of all context trees defined by $\nu(T) = 2^{-2|T|+1}$: it is a probability distribution.
Context Tree Weighting coding distribution: $Q_{CTW}^n(x_1^n) = \sum_{T \in \mathcal{T}} \nu(T)\, Q_T^n(x_1^n)$
Theorem (Shtarkov et al. 93), oracle inequality:
$-\log Q_{CTW}^n(x_1^n \mid x_{-\infty}^0) \leq \inf_{T \in \mathcal{T}} \inf_{\theta \in \Theta_T} -\log p_\theta(x_1^n \mid x_{-\infty}^0) + \frac{(|A|-1)\,|T|}{2} \log \frac{n}{|T|} + |T|\,(2 + \log m) + \frac{m}{2}$

43-46 CTW: Efficient Algorithm
Idea: in each node $s$, maintain the counts of symbols seen in context $s$ and compute the arithmetical mean of the node's own KT probability and of the product of its children's weighted probabilities:
$P_w^s = \frac{1}{2} P_{KT}^s + \frac{1}{2} \prod_{a \in A} P_w^{as}$ (at maximal depth, $P_w^s = P_{KT}^s$)
The four slides fill in a small binary example bottom-up, from the leaf counts to the root; the root value $P_w^{\text{root}}$ is exactly the double mixture $Q_{CTW}^n(x_1^n)$ (tree figures not reproduced here).
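
A compact Python sketch of the binary CTW recursion described on these slides: one pass collects the counts of every suffix of the bounded-depth context, then each node mixes its own KT estimate with the product of its children's weighted probabilities, with weights 1/2, 1/2.

```python
from math import lgamma, log, exp, pi

def kt_log_prob(a, b):
    """log KT probability of a binary string containing a zeros and b ones."""
    return lgamma(a + 0.5) + lgamma(b + 0.5) - lgamma(a + b + 1) - log(pi)

def ctw_log_prob(x, depth):
    """log of the CTW coding probability of the 0/1 list x, with context trees
    truncated at `depth`; the first `depth` symbols serve as initial context only."""
    counts = {}                                   # node (tuple, oldest first) -> [n0, n1]
    for t in range(depth, len(x)):
        ctx = tuple(x[t - depth:t])
        for d in range(depth + 1):                # all suffixes of the context
            counts.setdefault(ctx[d:], [0, 0])[x[t]] += 1

    def weighted(node):
        a, b = counts.get(node, (0, 0))
        est = kt_log_prob(a, b)                   # the node's own KT estimate
        if len(node) == depth or (a, b) == (0, 0):
            return est
        split = weighted((0,) + node) + weighted((1,) + node)
        m = max(est, split)                       # log(0.5*exp(est) + 0.5*exp(split))
        return m + log(0.5 * exp(est - m) + 0.5 * exp(split - m))

    return weighted(())

x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
print(-ctw_log_prob(x, depth=3) / log(2), "bits")   # ideal CTW code length
```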

47 Outline (next section: Context Tree Weighting on Renewal Processes)

48 Renewal Processes
Let $\mathcal{R}$ be the (non-parametric) class of renewal processes on the binary alphabet $\{0,1\}$: $X = \dots 1\, \underbrace{0 \cdots 0}_{N_i - 1}\, 1\, \underbrace{0 \cdots 0}_{N_{i+1} - 1}\, 1 \dots$, with the gaps $N_i$ i.i.d. $\sim \mu \in \mathcal{M}_1(\mathbb{N}^+)$
Theorem (Csiszár and Shields 96): there exist two constants $c$ and $C$ such that $c\,\sqrt{n} \leq \inf_{Q_n} \sup_{P \in \mathcal{R}} KL(P^n, Q_n) \leq C\,\sqrt{n}$
$\Rightarrow$ first example of an intermediate complexity class. Problem: non-constructive.
Does a general-purpose coder behave well on renewal processes?
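
A tiny generator for binary renewal sequences, useful for trying the CTW sketch above on this class; the bounded gap distribution (uniform on $\{1,\dots,5\}$) is an arbitrary choice for illustration.

```python
import random

def renewal_sequence(n, sample_gap):
    """Binary renewal process: blocks 0^(N_i - 1) 1 with i.i.d. gaps N_i = sample_gap()."""
    out = []
    while len(out) < n:
        gap = sample_gap()
        out.extend([0] * (gap - 1) + [1])
    return out[:n]

x = renewal_sequence(200, lambda: random.randint(1, 5))   # mu = Uniform{1,...,5}
print("".join(map(str, x[:60])))
```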

49 Redundancy of the CTW method on Renewal Processes
Theorem (G. 04): there exist constants $c$ and $C$ such that $c\,\sqrt{n}\,\log n \leq \sup_{P \in \mathcal{R}} KL(P^n, Q_{CTW}^n)$ and, for all $P \in \mathcal{R}$, $KL(P^n, Q_{CTW}^n) \leq C\,\sqrt{n}\,\log n$
CTW is efficient on long-memory processes
Adaptivity: if the renewal distribution is bounded, CTW achieves regret $O(\log n)$ (contrary to ad-hoc coders)
Requires deep contexts in the double mixture ($\Rightarrow$ the tree should not be cut off at depth $\log n$)
A kind of non-parametric estimation: need for a balance between approximation and estimation

50 Why it works
(figure: a context tree of depth $k$, as for renewal sources)
$P^n$ is approximated by $Q_T^n$ with $T$ as above (depth $k$). Two losses:
estimating the transition parameters: $k \log(n)/2$
approximation due to the bounded depth: $\approx n \log(k)/k$
$\Rightarrow$ for $k = c\,\sqrt{n}$, both are $O\big(\sqrt{n} \log(n)\big)$

51 Extension: Markovian Renewal Processes
A Markovian renewal process is such that the successive distances between occurrences of the symbol 1 form a Markov chain on $\mathbb{N}^+$: $X = \dots 1\, \underbrace{0 \cdots 0}_{N_i - 1}\, 1\, \underbrace{0 \cdots 0}_{N_{i+1} - 1}\, 1 \dots$, with $(N_i)_i$ a Markov chain
Theorem (Csiszár and Shields 96): there exist two constants $c$ and $C$ such that $c\, n^{2/3} \leq \inf_{Q_n} \sup_{P \in \mathcal{MR}} KL(P^n, Q_n) \leq C\, n^{2/3}$
Theorem (G. 04): CTW is almost adaptive on the class $\mathcal{MR}$ of Markovian renewal processes: $\exists C \in \mathbb{R}$, $\forall P \in \mathcal{MR}$, $KL(P^n, Q_{CTW}^n) \leq C\, n^{2/3} \log n$

52 Markovian Renewal Processes are Context Tree Sources
And so are $r$-order Markov renewal processes.
(figure: the associated context tree, with one branch per past gap pattern $s_1, s_2, \dots, s_k$)

53 Outline (next section: Perspectives)

54 Perspectives: oracle inequalities on more general classes?
CTW is probably more useful than expected: it obtains good results on infinite-memory processes
We want to obtain oracle inequalities for other non-Markovian classes
But the setting is more difficult than usual, and approximation theory is difficult!
Ex: let $P_H$ be an HMM with 3 hidden states; what is $\inf_{T : |T| \leq k}\ \inf_{\theta \in S_A^T} KL(P_H^n, P_\theta^n)$?

55 The end... Thank you for your attention!
