Context tree models for source coding
1 Context tree models for source coding: Toward Non-parametric Information Theory. (Licence de droits d'usage)
2 Outline
- Lossless Source Coding = density estimation with log-loss: Source Coding and Universal Coding; Bayesian coding
- Context-Tree Models and Double Mixture: Rissanen's model; Coding in Context Tree models; The Context-Tree Weighting Method
- Context Tree Weighting on Renewal Processes: CTW on Renewal Processes; Perspectives
3 Data Compression: Shannon's Model. [Figure lost in transcription.]
6 The Problem. Finite alphabet $A$; ex: $A = \{a, c, t, g\}$. Set of all finite binary sequences: $\{0,1\}^* = \cup_n \{0,1\}^n$. If $x \in A^n$, its length is $|x| = n$. $\log$ = base-2 logarithm. Let $X$ be a stochastic process on the finite alphabet $A$, with stationary ergodic distribution $P$; the marginal distribution of $X_1^n$ is denoted by $P^n$. For $n \in \mathbb{N}$, the coding function is $\varphi_n : A^n \to \{0,1\}^*$. Expected code length: $E_P[|\varphi_n(X_1^n)|]$.
7 Example: DNA Codes. $A = \{a, c, t, g\}$. Uniform concatenation code: $\varphi_1(a) = 00$, $\varphi_1(c) = 01$, $\varphi_1(t) = 10$, $\varphi_1(g) = 11$, with $\varphi_n(x_1^n) = \varphi_1(x_1)\dots\varphi_1(x_n)$, so $\forall x_1^n \in A^n$, $|\varphi_n(x_1^n)| = 2n$. Improvement? $\psi_1(a) = 0$, $\psi_1(c) = 1$, $\psi_1(t) = 01$, $\psi_1(g) = 10$. Problem: what is $\psi^{-1}(101)$: $ct$ or $gc$? Uniquely decodable codes: $\varphi_n(x_1^n) = \varphi_m(y_1^m) \implies m = n$ and $\forall i \in \{1, \dots, n\},\ x_i = y_i$. Instantaneous codes = prefix codes: $\forall (a,b) \in A^2$ with $a \neq b$, $\forall j$, $\varphi_1(a)_{1:j} \neq \varphi_1(b)$ (no codeword is a prefix of another).
8 Kraft's Inequality. Theorem (Kraft '49, McMillan '56): if $\varphi_n$ is a uniquely decodable code, then $\sum_{x \in A^n} 2^{-|\varphi_n(x)|} \leq 1$. Proof for prefix codes: let $D = \max_{x \in A^n} |\varphi_n(x)|$ and, for $x \in A^n$, let $Z(x) = \{w \in \{0,1\}^D : w_{1:|\varphi_n(x)|} = \varphi_n(x)\}$. Then $\#Z(x) = 2^{D - |\varphi_n(x)|}$, and $\varphi_n$ prefix $\implies \forall x \neq y \in A^n,\ Z(x) \cap Z(y) = \emptyset$. Hence $\sum_{x \in A^n} 2^{D - |\varphi_n(x)|} \leq \#\{0,1\}^D = 2^D$.
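Kraft's sum can be checked directly on small codes; a minimal sketch (the function names are illustrative, not from the slides), using the length-2 DNA code of the previous slide:

```python
def kraft_sum(codewords):
    """Sum of 2^{-|c|} over the codewords; at most 1 for any uniquely decodable code."""
    return sum(2.0 ** -len(c) for c in codewords)

def is_prefix_free(codewords):
    """True if no codeword is a proper or full prefix of another (instantaneous code)."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

# The uniform DNA code: four codewords of length 2 -> Kraft sum exactly 1.
dna = ["00", "01", "10", "11"]
assert is_prefix_free(dna) and kraft_sum(dna) == 1.0

# A prefix code with unequal lengths also saturates the inequality.
code = ["0", "10", "110", "111"]
assert is_prefix_free(code) and kraft_sum(code) == 1.0
```

A code whose Kraft sum exceeds 1 cannot be uniquely decodable, which is the contrapositive used on the next slide to turn codes into (sub-)probabilities.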
9 Shannon's First Theorem. With $l(x) = |\varphi_n(x)|$, we look for $\min_l \sum_{x \in A^n} l(x) P^n(x)$ under the constraint $\sum_{x \in A^n} 2^{-l(x)} \leq 1$. Theorem (Shannon '48): $E_P[|\varphi_n(X_1^n)|] \geq H_n(X) = E_P[-\log P^n(X_1^n)]$. Misleading notation: $H_n(X)$ is actually a function of $P^n$ only, the marginal distribution of $X_1^n$. Tight bound: there exists a code $\varphi_n$ such that $E_P[|\varphi_n(X_1^n)|] \leq H_n(X) + 1$.
10 Example: binary entropy. $A = \{0,1\}$, $X_i$ i.i.d. $\mathcal{B}(p)$, $0 < p < 1$. $\frac{1}{n} H_n(X) = H_1(X) = h(p) = -p \log p - (1-p) \log(1-p)$. [Plot of $h(p)$ lost in transcription.]
11 Example: a 2-block code. $A = \{1,2,3\}$, $n = 2$, $P = 0.3\,\delta_1 + 0.6\,\delta_2 + 0.1\,\delta_3$ i.i.d. $H(X) = -0.3 \log(0.3) - 0.6 \log(0.6) - 0.1 \log(0.1) \approx 1.295$ bits. [Table of codewords $\varphi_2(x_1^2)$ lost in transcription.] $\frac12 E[|\varphi_2(X_1^2)|] \approx 1.335$ bits per symbol.
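The entropy value quoted on this slide is easy to reproduce; a throwaway sketch (the function name is mine):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# The slide's distribution: P = 0.3*delta_1 + 0.6*delta_2 + 0.1*delta_3.
H = entropy([0.3, 0.6, 0.1])
assert abs(H - 1.295) < 1e-3
```

Any 2-block code thus needs at least 1.295 bits per symbol on average, so the quoted 1.335 bits/symbol is within the +1/n overhead allowed by Shannon's tight bound.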
12 Entropy Rate. Theorem (Shannon-McMillan-Breiman): if $P$ is stationary and ergodic, then $\frac{1}{n} H_n(X) \to H(X)$. Interpretation: $H(X)$ = minimal number of bits necessary to code each symbol of $X$. Kraft's inequality: for each code $\varphi_n$ there exists a (sub-)probability distribution $Q^n$ such that $Q^n(x) = 2^{-|\varphi_n(x)|}$, i.e. $-\log Q^n(x) = |\varphi_n(x)|$.
13 Arithmetic Coding: a constructive reciprocal. $\varphi_n(x_1^n)$ = the $\lceil -\log Q^n(x_1^n) \rceil + 1$ first bits of the interval midpoint $\frac{Q^n(X_1^n < x_1^n) + Q^n(X_1^n \leq x_1^n)}{2}$. Ex: $A = \{1,2,3\}$, $n = 2$, $Q_1 = [0.3, 0.6, 0.1]$ i.i.d.: $I(11) = [0, 0.09)$, $m(11) = 0.045 = 0.0000101\dots$; as $\lceil -\log 0.09 \rceil + 1 = 5$, $\varphi_2(11) = 00001$. $I(22) = [0.48, 0.84)$, $m(22) = 0.66 = 0.101\dots$; as $\lceil -\log 0.36 \rceil + 1 = 3$, $\varphi_2(22) = 101$.
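The two codewords of the example can be recomputed with a minimal Shannon-Fano-Elias-style coder (a sketch for the i.i.d. case only; real arithmetic coders work incrementally with integer arithmetic, and the function name is mine). Symbols 1, 2, 3 of the slide are represented by indices 0, 1, 2:

```python
import math

def sfe_code(probs, x):
    """Codeword for the symbol-index sequence x under the i.i.d. distribution `probs`:
    the ceil(-log2 Q(x)) + 1 first bits of the midpoint of x's interval."""
    lo, width = 0.0, 1.0
    for s in x:
        lo += width * sum(probs[:s])   # cumulative mass of smaller sequences
        width *= probs[s]              # Q(x_1..x_k)
    mid = lo + width / 2
    length = math.ceil(-math.log2(width)) + 1
    bits = ""
    for _ in range(length):            # binary expansion of the midpoint
        mid *= 2
        bits += str(int(mid >= 1))
        mid -= int(mid)
    return bits

Q = [0.3, 0.6, 0.1]
assert sfe_code(Q, [0, 0]) == "00001"  # phi_2(11)
assert sfe_code(Q, [1, 1]) == "101"    # phi_2(22)
```

By Kraft's inequality these midpoint-truncation codewords form a prefix code, at the price of at most two extra bits over $-\log Q^n(x_1^n)$.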
14 Coding distributions. $-\log Q^n(x)$ = code length for $x$ with coding distribution $Q^n$. Shannon's theorem: the best coding distribution is $Q^n = P^n$. Using another distribution causes an expected overhead called redundancy: $E_P[|\varphi_n(X_1^n)|] - H_n(X) = E_P[-\log Q^n(X_1^n) + \log P^n(X_1^n)] = KL(P^n, Q^n)$, the Kullback-Leibler divergence of $P^n$ to $Q^n$.
15 Universal coding. What if the source statistics are unknown? What if we need a versatile code? $\implies$ Need a single coding distribution $Q^n$ for a class of sources $\Lambda = \{P_\theta, \theta \in \Theta\}$. Ex: memoryless processes, Markov chains, HMM... $\implies$ unavoidable redundancy, in the worst case: $\sup_{\theta \in \Theta} E_{P_\theta}[|\varphi_n(X_1^n)|] - H_n(X) = \sup_{\theta \in \Theta} KL(P_\theta^n, Q^n)$. $\implies$ $Q^n$ must be close to all the $\{P_\theta, \theta \in \Theta\}$ in KL divergence.
16 Predictive Coding. What if we want to code messages with different or large lengths? Consistent coding family: $\forall x_1^n \in A^n$, $Q^n(x_1^n) = \sum_{a \in A} Q^{n+1}(x_1^n a)$. Define $Q(a \mid x_1^n) = \frac{Q(x_1^n a)}{\sum_{b \in A} Q(x_1^n b)}$; then $Q^n(x_1^n) = \prod_{j=1}^n Q(x_j \mid x_1^{j-1})$. Predictive coding scheme: update $Q(\cdot \mid x_1^n)$. Example: Laplace code $Q(a \mid x_1^n) = \frac{n(a, x_1^n) + 1}{n + |A|}$, where $n(a, x_1^n) = \sum_{j=1}^n \mathbf{1}_{x_j = a}$.
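The Laplace rule gives a complete sequential coding distribution; a minimal sketch (function name mine, not from the slides):

```python
def laplace_prob(x, alphabet):
    """Sequential Laplace (add-one) probability Q^n(x) = prod_j Q(x_j | x_1^{j-1})."""
    counts = {a: 0 for a in alphabet}
    p = 1.0
    for n, symbol in enumerate(x):
        p *= (counts[symbol] + 1) / (n + len(alphabet))
        counts[symbol] += 1
    return p

# Over {a, b}: Q(a | empty) = 1/2, then Q(a | a) = 2/3, so Q(aa) = 1/3.
assert abs(laplace_prob("aa", "ab") - 1/3) < 1e-12
```

The update costs O(1) per symbol, which is what makes predictive (arithmetic) coding with such mixtures practical.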
17 Density Estimation with Logarithmic Loss. Feature space: the context $x_1^{k-1} \in A^{k-1}$. Instant loss: number of bits used, $l(Q, x_1^k) = -\log Q(x_k \mid x_1^{k-1})$. Best regressor in average: $Q^*(\cdot \mid x_1^{k-1}) = P_\theta(X_k = \cdot \mid X_1^{k-1} = x_1^{k-1})$. Excess loss: $l(Q, x_1^k) - l(Q^*, x_1^k) = \log \frac{P_\theta(X_k = x_k \mid X_1^{k-1} = x_1^{k-1})}{Q(x_k \mid x_1^{k-1})}$, and $E_{P_\theta}[l(Q, x_1^k) - l(Q^*, x_1^k)] = KL\big(P_\theta(X_k = \cdot \mid X_1^{k-1} = x_1^{k-1}),\ Q(\cdot \mid x_1^{k-1})\big)$. Total excess loss: $L_n(Q) = \sum_{k=1}^n l(Q, x_1^k) - l(Q^*, x_1^k) = \log \frac{P_\theta^n(x_1^n)}{Q^n(x_1^n)}$. Total expected excess loss: $E_{P_\theta} L_n(Q) = KL(P_\theta^n, Q^n)$.
19 Bayesian Predictive Coding. Let $\pi$ be any prior on $\Theta$; take $Q^n(x_1^n) = \int P_\theta^n(x_1^n)\, \pi(d\theta)$. Predictive coding distribution: $Q(a \mid x_1^n) = \int P_\theta(X_{n+1} = a \mid X_1^n = x_1^n)\, \pi(d\theta \mid x_1^n)$. Conjugate priors $\implies$ easy update rules for $Q(\cdot \mid x_1^n)$. Idea: $\forall \theta \in \Theta$, $Q^n(x_1^n) \gtrsim P^n_{\theta \pm \delta}(x_1^n)\, \pi(\theta \pm \delta)$: the mixture is never much worse than the best nearby parameter, weighted by its prior mass.
20 Example: i.i.d. Classes. A stationary memoryless source is parameterized by $\theta \in S_A = \{(\theta_1, \dots, \theta_{|A|}) : \sum_{j=1}^{|A|} \theta_j = 1\}$. Conjugate Dirichlet prior $\pi = \mathcal{D}(\alpha_1, \dots, \alpha_{|A|})$. Posterior: $\pi(d\theta \mid x_1^n) = \mathcal{D}(\alpha_1 + n_1, \dots, \alpha_{|A|} + n_{|A|})$ $\implies$ $Q(a \mid x_1^n) = \frac{n_a + \alpha_a}{n + \sum_j \alpha_j}$. Laplace code = special case $\alpha_1 = \dots = \alpha_{|A|} = 1$.
21 The Krichevsky-Trofimov Mixture. Dirichlet prior with $\alpha_1 = \dots = \alpha_{|A|} = 1/2$: $Q_{KT}(x_1^n) = \int_{S_A} P_\theta(x_1^n)\, \frac{\Gamma(|A|/2)}{\Gamma(1/2)^{|A|}} \prod_{a \in A} \theta_a^{-1/2}\, d\theta$. Jeffreys' prior = non-informative, peaked on extreme parameter values. Example $|A| = 2$: the predictive rule is $Q_{KT}(a \mid x_1^n) = \frac{n(a, x_1^n) + 1/2}{n + 1}$. [Plot of the prior density $\nu(\theta_1)$ and the worked product lost in transcription.]
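The KT mixture is computed exactly by chaining the predictive rule above; a minimal sketch for a binary alphabet (function name mine), using exact rationals:

```python
from fractions import Fraction

def kt_prob(bits):
    """KT mixture probability of a 0/1 sequence, via Q(a | x_1^n) = (n_a + 1/2)/(n + 1)."""
    p, counts = Fraction(1), [0, 0]
    for n, b in enumerate(bits):
        p *= Fraction(2 * counts[b] + 1, 2 * (n + 1))  # (n_b + 1/2) / (n + 1)
        counts[b] += 1
    return p

# Q_KT(0) = 1/2 and Q_KT(1 | 0) = 1/4, so Q_KT(01) = 1/8.
assert kt_prob([0, 1]) == Fraction(1, 8)
```

Note the probability depends only on the counts $(n_0, n_1)$, not on the order of the symbols: the mixture is exchangeable, as any i.i.d. mixture must be.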
22 Optimality. Theorem (Krichevsky-Trofimov '81): if $Q_{KT}^n$ is the KT-mixture code and $m = |A|$, then $\max_{\theta \in \Theta} KL(P_\theta^n, Q_{KT}^n) = \frac{m-1}{2} \log \frac{n}{2} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1)$. Theorem (Rissanen '84): if $\dim \Theta = k$, and if there exists a $\sqrt{n}$-consistent estimator of $\theta$ given $X_1^n$, then $\liminf_n \min_{Q^n} \max_{\theta \in \Theta} \frac{KL(P_\theta^n, Q^n)}{\frac{k}{2} \log n} \geq 1$. Theorem (Haussler '96): there exists a Bayesian code $Q^n$ reaching the minimax redundancy.
23 Optimality: more precisely. Theorem (Krichevsky-Trofimov '81): if $Q_{KT}^n$ is the KT-mixture code and $m = |A|$, then $\max_{\theta \in \Theta} KL(P_\theta^n, Q_{KT}^n) = \frac{m-1}{2} \log \frac{n}{2} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1)$. Theorem (Xie-Barron '97): $\min_{Q^n} \max_{\theta \in \Theta} KL(P_\theta^n, Q^n) = \frac{m-1}{2} \log \frac{n}{2e} + \log \frac{\Gamma(1/2)^m}{\Gamma(m/2)} + o_m(1)$. Actually, a pointwise approximation holds: $-\log Q_{KT}^n(x_1^n) \leq \inf_\theta -\log P_\theta^n(x_1^n) + \frac{|A|-1}{2} \log n + C$.
24 Sources with memory. i.i.d. sources on finite alphabets are very well treated... but they provide poor models for most applications! Example: Shannon considered Markov models for natural languages. Pb: a Markov chain of order 3 on an alphabet of size 26 has $k = 26^3 \times 25 = 439400$ degrees of freedom! $\min_{Q^n} \max_{\theta \in \Theta} KL(P_\theta^n, Q^n) \approx \frac{k}{2} \log n$ $\implies$ need for more parsimonious classes.
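The degrees-of-freedom count above can be checked in one line (a throwaway sketch; `markov_dof` is not a name from the slides):

```python
def markov_dof(alphabet_size, order):
    """Free parameters of a Markov chain: one row of size |A| (summing to 1)
    per state, and there are |A|^order states."""
    return alphabet_size ** order * (alphabet_size - 1)

assert markov_dof(26, 3) == 439400
```

With k this large, the k/2 log n redundancy term swamps any realistic message length, which is exactly the motivation for the context-tree models that follow.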
26 Context Tree Sources: Formal Definition. Context tree: $T \subset A^*$ such that every left-infinite sequence $x$ has a unique suffix in $T$: $\exists!\, s \in T$ with $x_{-|s|}^{-1} = s$. The unique suffix of $x$ in $T$ is denoted by $T(x)$. A string $s$ is a context of a stationary ergodic process $P$ if $P(X_{-|s|}^{-1} = s) > 0$ and if for all $x$ admitting $s$ as a suffix: $P(X_0 = a \mid X_{-\infty}^{-1} = x) = P(X_0 = a \mid X_{-|s|}^{-1} = s)$. $T$ is the context tree of $P$ if $T$ is a context tree that contains all contexts of $P$, and if none of its elements can be replaced by a proper suffix without violating one of these properties.
27 Context Tree Sources: Informal Definition. A context tree source is a Markov chain whose order is allowed to depend on the past data. [Animated example, repeated on slides 27-32: a four-leaf binary context tree $T$ with leaf parameters $(2/3, 1/3)$, $(3/4, 1/4)$, $(1/5, 4/5)$ and $(1/3, 2/3)$; the successive symbols of a sample path are predicted with the conditional probabilities $3/4$, $1/3$, $4/5$, $1/3$, $2/3$ read off at the leaf matching the current context. The binary labels of the tree and of the sample path were lost in transcription.]
33 Context Trees versus Markov Chains. A Markov chain of order $r$ = a context tree source whose tree is the complete tree of depth $r$. A finite context tree source of depth $d$ = a Markov chain of order $d$ with tied parameters. [Transition-matrix figures lost in transcription: the matrix of an order-3 Markov chain has distinct rows $p_i(\cdot)$, while that of a context tree source of depth $d$ repeats the same row $p_s(\cdot)$ for every state sharing the context $s$.] $\implies$ much more flexibility: a large number of models per dimension.
34 Infinite trees, unbounded memory. There exist (useful and simple) non-Markovian context tree sources; see Galves, Ferrari, Comets, ... Ex: renewal processes: $X = \dots 1\underbrace{0 \cdots 0}_{N_i} 1 \underbrace{0 \cdots 0}_{N_{i+1}} 1 \dots$ with i.i.d. gaps $N_i \sim \mu \in \mathcal{M}_1(\mathbb{N}_+)$, so that $P\big(X_0 = 1 \mid X_{-j}^{-1} = 1\,0^{j-1}\big) = \frac{\mu(j)}{\mu([j, \infty))}$.
36 Bayes Codes in a Context Tree Model. For a given context tree $T$, let $\Lambda_T = \{P$ stationary ergodic on $A : T(P) = T\}$. $P \in \Lambda_T$ is fully characterized by $\theta = (\theta_s)_{s \in T} \in \Theta_T = S_A^T \simeq \mathbb{R}^{|T|(|A|-1)}$: $P(X_0 = \cdot \mid X_{-\infty}^{-1} = x) = \theta_{T(x)}(\cdot)$. Bayesian coding: prior $\pi_T$ on $\Theta_T$, $Q_{\pi_T}(x_1^n) = \int P_\theta(x_1^n)\, d\pi_T(\theta)$.
37 Krichevsky-Trofimov Mixtures for Context Trees. Prior: the $\theta_s$ are independent, $\mathcal{D}(1/2, \dots, 1/2)$-distributed: $\pi_T(d\theta) = \prod_{s \in T} \pi_{KT}(d\theta_s)$. For $s \in T$, let $T(x_1^n, s)$ = the subsequence of $x_1^n$ formed by the symbols occurring in context $s$. [The worked example for a three-leaf tree $T$ and a string $x_1^8$ was lost in transcription.] The KT-mixture on model $T$ is: $Q_T^n(x_1^n) = \prod_{s \in T} Q_{KT}(T(x_1^n, s))$.
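The context-wise product is easy to compute: bucket each symbol by the unique tree suffix of its past, then multiply the per-context KT probabilities. A minimal sketch for the binary alphabet (the slide's example tree and string were lost, so the tree {0, 1} below is an assumed stand-in; function names are mine):

```python
import math
from fractions import Fraction

def kt_prob(bits):
    """KT mixture probability of a 0/1 sequence."""
    p, counts = Fraction(1), [0, 0]
    for n, b in enumerate(bits):
        p *= Fraction(2 * counts[b] + 1, 2 * (n + 1))
        counts[b] += 1
    return p

def q_tree(x, past, tree):
    """Q_T^n(x | past) for a context tree given as a set of suffix strings.
    `past` must be long enough that every position of x has a context in `tree`."""
    full = past + x
    subs = {s: [] for s in tree}
    for i in range(len(past), len(full)):
        # the unique element of the tree that is a suffix of the past at position i
        s = next(c for c in tree if full[:i].endswith(c))
        subs[s].append(int(full[i]))
    return math.prod(kt_prob(b) for b in subs.values())

# With tree {0, 1} and x = 0110 after past 0: contexts bucket x into "01" and "10",
# each with KT probability 1/8, hence Q_T = 1/64.
assert q_tree("0110", "0", {"0", "1"}) == Fraction(1, 64)
```

With the trivial tree containing only the empty context, `q_tree` degenerates to the plain KT mixture, which is a useful sanity check.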
38 Optimality. Theorem (Willems, Shtarkov, Tjalkens '93): there is a constant $C$ such that $-\log Q_T^n(x_1^n) \leq \inf_{\theta \in S_A^T} -\log P_\theta(x_1^n) + \frac{|T|(|A|-1)}{2} \log \frac{n}{|T|} + C|T|$ $\implies$ first-order optimal. What if we don't know which model $T$ to use? $\implies$ model selection.
39 Minimum Description Length Principle. Occam's razor (Jorma Rissanen '78): choose the model that gives the shortest description of the data. Information theory: objective codelength = codelength of a minimax coder. Estimator associated with the minimax mixtures $(\pi_T)_T$: $\hat{T}_{KT} = \arg\min_T -\log \int_{\Theta_T} P_\theta^n(x_1^n)\, \pi_T(d\theta)$. Looking directly at the bounds: $\hat{T}_{BIC} = \arg\min_T \inf_{\theta \in \Theta_T} -\log P_\theta^n(x_1^n) + \frac{|T|(|A|-1)}{2} \log n$ = penalized maximum-likelihood estimator with BIC penalty.
40 Consistency Results. Theorems (Csiszár & Shields '96, Csiszár & Talata '04, G. '05, Galves & Leonardi '07, Leonardi '09): the BIC estimator is strongly consistent; the Krichevsky-Trofimov estimator is not consistent. Many rates of convergence, minimal penalties, ... The Krichevsky-Trofimov estimator fails to identify Bernoulli(1/2) i.i.d. sources! One can derive from this bounds on $\sup_T \sup_{\theta \in \Theta_T} KL\big(P_\theta^n, Q_{\hat{T}}^n\big)$.
42 Double mixture. $\nu$ = measure on the set $\mathcal{T}$ of all context trees defined by $\nu(T) = 2^{-(2|T|-1)}$; one checks that $\nu$ is a probability distribution. Context Tree Weighting coding distribution: $Q_{CTW}^n(x_1^n) = \sum_T \nu(T)\, Q_T^n(x_1^n)$. Theorem (Willems, Shtarkov, Tjalkens '93), oracle inequality: with $m = |A|$, $-\log Q_{CTW}^n(x_1^n) \leq \inf_T \inf_{\theta \in \Theta_T} -\log P_\theta(x_1^n) + \frac{(m-1)|T|}{2} \log \frac{n}{|T|} + |T|(2 + \log m) + \frac{m}{2}$.
43 CTW: Efficient Algorithm. Idea: at each node $s$, compute the arithmetic mean of the node's own KT probability and the product of its children's weighted probabilities: $P_w^s = Q_{KT}(T(x_1^n, s))$ at maximal-depth nodes, and $P_w^s = \frac12 Q_{KT}(T(x_1^n, s)) + \frac12 P_w^{0s} P_w^{1s}$ otherwise. [Animated example, repeated on slides 43-46 and lost in transcription: a binary tree with symbol counts $(4,3)$ at the root and $(2,2)$, $(2,1)$, $(2,1)$, $(1,2)$, $(1,1)$, $(1,1)$ below; the weighted probabilities $3/8$, $3/8$, $1/2$, $1/8$, ... are filled in frame by frame up to the root.]
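The recursion above can be sketched naively for the binary alphabet (this version recomputes each context's subsequence for clarity; the real algorithm updates counts along one root-to-leaf path per symbol; all names are mine, and exact rationals are used so the examples check exactly):

```python
from fractions import Fraction

def kt_prob(bits):
    """KT mixture probability of a 0/1 sequence."""
    p, counts = Fraction(1), [0, 0]
    for n, b in enumerate(bits):
        p *= Fraction(2 * counts[b] + 1, 2 * (n + 1))
        counts[b] += 1
    return p

def ctw_prob(x, past, depth):
    """Weighted probability of x given `past`, mixing all context trees of depth <= depth:
    P_w(s) = KT at maximal depth, else (1/2) KT + (1/2) P_w(0s) P_w(1s)."""
    assert len(past) >= depth, "need enough initial context"
    full = past + x

    def subseq(s):
        # symbols of x emitted while the last len(s) symbols equal s
        k = len(s)
        return [int(full[i]) for i in range(len(past), len(full)) if full[i - k:i] == s]

    def weighted(s):
        e = kt_prob(subseq(s))
        if len(s) == depth:
            return e
        return Fraction(1, 2) * e + Fraction(1, 2) * weighted("0" + s) * weighted("1" + s)

    return weighted("")

# Depth 0 recovers the plain KT mixture; depth 1 averages it with the one-level tree:
# (1/2) * 3/128 + (1/2) * (1/8) * (1/8) = 5/256.
assert ctw_prob("0110", "0", 0) == Fraction(3, 128)
assert ctw_prob("0110", "0", 1) == Fraction(5, 256)
```

The halving at each internal node is exactly the prior weight $\nu(T) = 2^{-(2|T|-1)}$ of the double mixture, distributed over the recursion.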
48 Renewal Processes. Let $\mathcal{R}$ be the (non-parametric) class of renewal processes on the binary alphabet $\{0,1\}$: $X = \dots 1\underbrace{0 \cdots 0}_{N_i} 1 \underbrace{0 \cdots 0}_{N_{i+1}} 1 \dots$, $N_i$ i.i.d. $\sim \mu \in \mathcal{M}_1(\mathbb{N}_+)$. Theorem (Csiszár and Shields '96): there exist two constants $c$ and $C$ such that $c\sqrt{n} \leq \inf_{Q^n} \sup_{P \in \mathcal{R}} KL(P^n, Q^n) \leq C\sqrt{n}$ $\implies$ first example of an intermediate-complexity class. Problem: the proof is non-constructive. Does a general-purpose coder behave well on renewal processes?
49 Redundancy of the CTW Method on Renewal Processes. Theorem (G. '04): there exist constants $c$ and $C$ such that $c\sqrt{n}\log n \leq \sup_{P \in \mathcal{R}} KL(P^n, Q_{CTW}^n) \leq C\sqrt{n}\log n$. $\implies$ CTW is efficient on long-memory processes. Adaptivity: if the renewal distribution is bounded, CTW achieves regret $O(\log n)$ (contrary to ad-hoc coders). Requires deep contexts in the double mixture ($\implies$ the tree should not be cut off at depth $\log n$). A kind of non-parametric estimation: need for a balance between approximation and estimation.
50 Why it works. $P^n$ is approximated by $Q_T^n$ with $T$ a comb-shaped context tree of depth $k$ [figure lost in transcription]. Two losses: estimating the transition parameters, $k \log(n)/2$; approximation due to boundedness, $n \log(k)/k$. $\implies$ for $k = c\sqrt{n}$, both are $O(\sqrt{n}\log n)$.
51 Extension: Markovian Renewal Processes. A Markovian renewal process is one in which the successive distances between occurrences of the symbol 1 form a Markov chain on $\mathbb{N}_+$: $X = \dots 1\underbrace{0 \cdots 0}_{N_i} 1 \underbrace{0 \cdots 0}_{N_{i+1}} 1 \dots$, $(N_i)_i$ a Markov chain. Theorem (Csiszár and Shields '96): there exist two constants $c$ and $C$ such that $c\, n^{2/3} \leq \inf_{Q^n} \sup_{P \in \mathcal{MR}} KL(P^n, Q^n) \leq C\, n^{2/3}$. Theorem (G. '04): CTW is almost adaptive on the class $\mathcal{MR}$ of Markovian renewal processes: $\exists C$, $\forall P \in \mathcal{MR}$, $KL(P^n, Q_{CTW}^n) \leq C\, n^{2/3} \log n$.
52 Markovian Renewal Processes are Context Tree Sources, and so are $r$-order Markov renewal processes. [Figure lost in transcription: a context tree with leaves labelled $s_1, s_2, \dots, s_k$.]
54 Perspectives: oracle inequalities on more general classes? CTW is probably more useful than expected: it obtains good results on infinite-memory processes. We want to obtain oracle inequalities for different non-Markovian classes. But the setting is more difficult than usual, and the approximation theory is hard! Ex: let $P_H$ be an HMM with 3 hidden states; what is $\inf_{T : |T| \leq k}\ \inf_{\theta \in S_A^T} KL(P_H^n, P_\theta^n)$?
55 the end... Thank you for your attention! Licence de droits d usage Page 5 / 5
Introduction to Statistical Learning Theory In the last unit we looked at regularization - adding a w 2 penalty. We add a bias - we prefer classifiers with low norm. How to incorporate more complicated
More informationarxiv: v1 [cs.it] 29 Feb 2008
arxiv:0802.4363v [cs.it] 29 Feb 2008 Estimating the Entropy of Binary Time Series: Methodology, Some Theory and a Simulation Study Yun Gao Ioannis Kontoyiannis Elie Bienenstock February 7, 203 Abstract
More information4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information
4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk
More informationInferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling
Santa Fe Institute Working Paper 7-4-5 arxiv.org/math.st/7375 Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling Christopher C. Strelioff,,2, James
More informationInformation Theory and Hypothesis Testing
Summer School on Game Theory and Telecommunications Campione, 7-12 September, 2014 Information Theory and Hypothesis Testing Mauro Barni University of Siena September 8 Review of some basic results linking
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationSIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding
SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.
More information1 Introduction to information theory
1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through
More informationExercises with solutions (Set B)
Exercises with solutions (Set B) 3. A fair coin is tossed an infinite number of times. Let Y n be a random variable, with n Z, that describes the outcome of the n-th coin toss. If the outcome of the n-th
More information10-704: Information Processing and Learning Fall Lecture 9: Sept 28
10-704: Information Processing and Learning Fall 2016 Lecturer: Siheng Chen Lecture 9: Sept 28 Note: These notes are based on scribed notes from Spring15 offering of this course. LaTeX template courtesy
More informationInformation Theory, Statistics, and Decision Trees
Information Theory, Statistics, and Decision Trees Léon Bottou COS 424 4/6/2010 Summary 1. Basic information theory. 2. Decision trees. 3. Information theory and statistics. Léon Bottou 2/31 COS 424 4/6/2010
More informationInformation Theory. M1 Informatique (parcours recherche et innovation) Aline Roumy. January INRIA Rennes 1/ 73
1/ 73 Information Theory M1 Informatique (parcours recherche et innovation) Aline Roumy INRIA Rennes January 2018 Outline 2/ 73 1 Non mathematical introduction 2 Mathematical introduction: definitions
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationarxiv: v3 [stat.me] 11 Feb 2018
arxiv:1708.02742v3 [stat.me] 11 Feb 2018 Minimum message length inference of the Poisson and geometric models using heavy-tailed prior distributions Chi Kuen Wong, Enes Makalic, Daniel F. Schmidt February
More informationECE 587 / STA 563: Lecture 5 Lossless Compression
ECE 587 / STA 563: Lecture 5 Lossless Compression Information Theory Duke University, Fall 2017 Author: Galen Reeves Last Modified: October 18, 2017 Outline of lecture: 5.1 Introduction to Lossless Source
More informationIntroduction to information theory and coding
Introduction to information theory and coding Louis WEHENKEL Set of slides No 5 State of the art in data compression Stochastic processes and models for information sources First Shannon theorem : data
More informationInformation & Correlation
Information & Correlation Jilles Vreeken 11 June 2014 (TADA) Questions of the day What is information? How can we measure correlation? and what do talking drums have to do with this? Bits and Pieces What
More informationTight Bounds for Symmetric Divergence Measures and a Refined Bound for Lossless Source Coding
APPEARS IN THE IEEE TRANSACTIONS ON INFORMATION THEORY, FEBRUARY 015 1 Tight Bounds for Symmetric Divergence Measures and a Refined Bound for Lossless Source Coding Igal Sason Abstract Tight bounds for
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationInformation and Statistics
Information and Statistics Andrew Barron Department of Statistics Yale University IMA Workshop On Information Theory and Concentration Phenomena Minneapolis, April 13, 2015 Barron Information and Statistics
More informationELEMENT OF INFORMATION THEORY
History Table of Content ELEMENT OF INFORMATION THEORY O. Le Meur olemeur@irisa.fr Univ. of Rennes 1 http://www.irisa.fr/temics/staff/lemeur/ October 2010 1 History Table of Content VERSION: 2009-2010:
More informationLecture 5 Channel Coding over Continuous Channels
Lecture 5 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 14, 2014 1 / 34 I-Hsiang Wang NIT Lecture 5 From
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationPiecewise Constant Prediction
Piecewise Constant Prediction Erik Ordentlich Information heory Research Hewlett-Packard Laboratories Palo Alto, CA 94304 Email: erik.ordentlich@hp.com Marcelo J. Weinberger Information heory Research
More informationExact Minimax Strategies for Predictive Density Estimation, Data Compression, and Model Selection
2708 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 11, NOVEMBER 2004 Exact Minimax Strategies for Predictive Density Estimation, Data Compression, Model Selection Feng Liang Andrew Barron, Senior
More informationIterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk
Iterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk John Lafferty School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 lafferty@cs.cmu.edu Abstract
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationAchievability of Asymptotic Minimax Regret by Horizon-Dependent and Horizon-Independent Strategies
Journal of Machine Learning Research 16 015) 1-48 Submitted 7/14; Revised 1/14; Published 8/15 Achievability of Asymptotic Minimax Regret by Horizon-Dependent and Horizon-Independent Strategies Kazuho
More informationPredictive MDL & Bayes
Predictive MDL & Bayes Marcus Hutter Canberra, ACT, 0200, Australia http://www.hutter1.net/ ANU RSISE NICTA Marcus Hutter - 2 - Predictive MDL & Bayes Contents PART I: SETUP AND MAIN RESULTS PART II: FACTS,
More informationPriors for the frequentist, consistency beyond Schwartz
Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist
More informationAsymptotically Minimax Regret for Models with Hidden Variables
Asymptotically Minimax Regret for Models with Hidden Variables Jun ichi Takeuchi Department of Informatics, Kyushu University, Fukuoka, Fukuoka, Japan Andrew R. Barron Department of Statistics, Yale University,
More informationInformation and Entropy
Information and Entropy Shannon s Separation Principle Source Coding Principles Entropy Variable Length Codes Huffman Codes Joint Sources Arithmetic Codes Adaptive Codes Thomas Wiegand: Digital Image Communication
More informationMinimax lower bounds I
Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field
More informationCAN the future of a sequence be predicted based on its
2124 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998 Universal Prediction Neri Merhav, Senior Member, IEEE, and Meir Feder, Senior Member, IEEE (Invited Paper) Abstract This paper
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationECE 587 / STA 563: Lecture 5 Lossless Compression
ECE 587 / STA 563: Lecture 5 Lossless Compression Information Theory Duke University, Fall 28 Author: Galen Reeves Last Modified: September 27, 28 Outline of lecture: 5. Introduction to Lossless Source
More informationKolmogorov complexity ; induction, prediction and compression
Kolmogorov complexity ; induction, prediction and compression Contents 1 Motivation for Kolmogorov complexity 1 2 Formal Definition 2 3 Trying to compute Kolmogorov complexity 3 4 Standard upper bounds
More informationOn Universal Types. Gadiel Seroussi Hewlett-Packard Laboratories Palo Alto, California, USA. University of Minnesota, September 14, 2004
On Universal Types Gadiel Seroussi Hewlett-Packard Laboratories Palo Alto, California, USA University of Minnesota, September 14, 2004 Types for Parametric Probability Distributions A = finite alphabet,
More informationInformational Confidence Bounds for Self-Normalized Averages and Applications
Informational Confidence Bounds for Self-Normalized Averages and Applications Aurélien Garivier Institut de Mathématiques de Toulouse - Université Paul Sabatier Thursday, September 12th 2013 Context Tree
More information6.1 Variational representation of f-divergences
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 6: Variational representation, HCR and CR lower bounds Lecturer: Yihong Wu Scribe: Georgios Rovatsos, Feb 11, 2016
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationInformation Theory. Week 4 Compressing streams. Iain Murray,
Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 4 Compressing streams Iain Murray, 2014 School of Informatics, University of Edinburgh Jensen s inequality For convex functions: E[f(x)]
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationLec 03 Entropy and Coding II Hoffman and Golomb Coding
CS/EE 5590 / ENG 40 Special Topics Multimedia Communication, Spring 207 Lec 03 Entropy and Coding II Hoffman and Golomb Coding Zhu Li Z. Li Multimedia Communciation, 207 Spring p. Outline Lecture 02 ReCap
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationLecture 3. Mathematical methods in communication I. REMINDER. A. Convex Set. A set R is a convex set iff, x 1,x 2 R, θ, 0 θ 1, θx 1 + θx 2 R, (1)
3- Mathematical methods in communication Lecture 3 Lecturer: Haim Permuter Scribe: Yuval Carmel, Dima Khaykin, Ziv Goldfeld I. REMINDER A. Convex Set A set R is a convex set iff, x,x 2 R, θ, θ, θx + θx
More information