Lec 03 Entropy and Coding II: Huffman and Golomb Coding


CS/EE 5590 / ENG 40 Special Topics Multimedia Communication, Spring 2017
Lec 03 Entropy and Coding II: Huffman and Golomb Coding
Zhu Li

Outline
- Lecture 02 Recap
- Huffman Coding
- Golomb Coding and JPEG 2000 Lossless Coding

Entropy (Lecture 02 Recap)
- Self-information of an event: i(X = k) = -log Pr(X = k) = -log(p_k)
- Entropy of a source: H(X) = -Σ_k p_k log(p_k)
- Conditional entropy and mutual information:
  H(X1 | X2) = H(X1, X2) - H(X2)
  I(X1; X2) = H(X1) + H(X2) - H(X1, X2)
- Relative entropy: D(p || q) = Σ_k p_k log(p_k / q_k)
- [Venn diagram: total area H(X, Y), with regions H(X|Y), I(X; Y), H(Y|X) inside the circles H(X) and H(Y).]
- Main application: context modeling, e.g., for a symbol stream a b c b c a b c b a b c b a
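As a quick numerical companion to these definitions, here is a minimal C sketch (the 2x2 joint pmf is a made-up example, not from the lecture) that computes H(X), H(Y), H(X,Y), H(X|Y) and I(X;Y) and confirms the identities above.

/* Minimal sketch: entropy, conditional entropy and mutual information
 * for a small joint pmf. The 2x2 joint distribution below is a made-up
 * example, not taken from the lecture. */
#include <math.h>
#include <stdio.h>

static double h(double p) { return p > 0.0 ? -p * log2(p) : 0.0; }

int main(void) {
    double pxy[2][2] = { {0.4, 0.1},
                         {0.1, 0.4} };   /* joint pmf P(X,Y) */
    double px[2] = {0}, py[2] = {0};
    double Hxy = 0, Hx = 0, Hy = 0;

    for (int x = 0; x < 2; x++)
        for (int y = 0; y < 2; y++) {
            px[x] += pxy[x][y];
            py[y] += pxy[x][y];
            Hxy   += h(pxy[x][y]);       /* H(X,Y) = -sum p log p */
        }
    for (int x = 0; x < 2; x++) Hx += h(px[x]);
    for (int y = 0; y < 2; y++) Hy += h(py[y]);

    double Hx_given_y = Hxy - Hy;        /* H(X|Y) = H(X,Y) - H(Y) */
    double Ixy = Hx + Hy - Hxy;          /* I(X;Y) = H(X)+H(Y)-H(X,Y) */
    printf("H(X)=%.3f H(Y)=%.3f H(X,Y)=%.3f H(X|Y)=%.3f I(X;Y)=%.3f\n",
           Hx, Hy, Hxy, Hx_given_y, Ixy);
    return 0;
}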

Context Reduces Entropy: Example
- Conditioning reduces entropy:
  H(x5) > H(x5 | x4, x3, x2, x1)
  H(x5) > H(x5 | f(x4, x3, x2, x1))
- Context: the neighborhood x1, x2, x3, x4 of the current sample x5, split into two classes by the context function value, e.g., f(x4, x3, x2, x1) == 100 vs. f(x4, x3, x2, x1) < 100 (see getentropy.m, lossless_coding.m)
- Context function: f(x4, x3, x2, x1) = sum(x4, x3, x2, x1)

Lossless Coding: Prefix Coding
- Codewords sit on the leaves of a binary code tree (root node, internal nodes, leaf nodes); no code is a prefix of any other code; simple encoding/decoding.
- Kraft-McMillan Inequality: for a prefix coding scheme with code lengths l_1, l_2, ..., l_n:
  Σ_k 2^(-l_k) <= 1
- Conversely, given a set of integer lengths {l_1, l_2, ..., l_n} that satisfies the above inequality, we can always find a prefix code with code lengths l_1, l_2, ..., l_n (a small check routine is sketched below).
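A minimal sketch of this check (the helper name is mine): given proposed codeword lengths, verify that Σ_k 2^(-l_k) <= 1 before trying to build a prefix code. The length set {2, 2, 3, 3, 3, 4, 4} used in the canonical Huffman example later sums to exactly 1.

/* Kraft-McMillan check: returns 1 if a prefix code with these
 * codeword lengths can exist, 0 otherwise. */
#include <stdio.h>

int kraft_ok(const int *len, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += 1.0 / (double)(1u << len[i]);   /* 2^(-l_i) */
    return sum <= 1.0 + 1e-12;
}

int main(void) {
    int lens[] = {2, 2, 3, 3, 3, 4, 4};        /* sums to exactly 1 */
    printf("prefix code possible: %s\n",
           kraft_ok(lens, 7) ? "yes" : "no");
    return 0;
}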

Outline
- Lecture 02 Recap
- Huffman Coding
- Golomb Coding and JPEG 2000 Lossless Coding

Huffman Coding
- A procedure to construct an optimal prefix code.
- Result of David Huffman's term paper in 1952, when he was a PhD student at MIT.
- [Photos: Shannon, Fano, Huffman (1925-1999).]

Huffman Code Design
- Requirement: the source probability distribution (not available in many cases).
- Procedure (a code-length construction sketch follows below):
  1. Sort the probabilities of all source symbols in descending order.
  2. Merge the last two into a new symbol; add their probabilities.
  3. Repeat Steps 1 and 2 until only one symbol (the root) is left.
  4. Code assignment: traverse the tree from the root to each leaf node, assigning 0 to the top branch and 1 to the bottom branch.
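A minimal C sketch of this procedure (not the instructor's code): it computes codeword lengths for the example on the next slide by repeated merging. Tie-breaking among equal probabilities may differ from the tree drawn on the slide, so the individual lengths can differ (Huffman codes are not unique), but the average length comes out the same, 2.2 bits/symbol.

/* Sketch: Huffman codeword lengths by repeatedly merging the two
 * smallest probabilities (O(n^2) selection; fine for small alphabets). */
#include <stdio.h>

#define N 5

int main(void) {
    double p[2 * N]      = {0.2, 0.4, 0.2, 0.1, 0.1};   /* a1..a5 */
    int    parent[2 * N] = {0};
    int    alive[2 * N]  = {1, 1, 1, 1, 1};
    int    nodes = N;
    int    next  = N;                     /* index for merged nodes */

    while (nodes > 1) {
        int lo1 = -1, lo2 = -1;           /* two smallest alive nodes */
        for (int i = 0; i < next; i++) {
            if (!alive[i]) continue;
            if (lo1 < 0 || p[i] < p[lo1]) { lo2 = lo1; lo1 = i; }
            else if (lo2 < 0 || p[i] < p[lo2]) { lo2 = i; }
        }
        p[next] = p[lo1] + p[lo2];        /* merge them into a new node */
        alive[next] = 1;
        alive[lo1] = alive[lo2] = 0;
        parent[lo1] = parent[lo2] = next;
        next++;
        nodes--;
    }

    int root = next - 1;
    double avg = 0.0;
    for (int i = 0; i < N; i++) {         /* codeword length = leaf depth */
        int len = 0;
        for (int j = i; j != root; j = parent[j]) len++;
        avg += p[i] * len;
        printf("a%d: p = %.1f, length = %d\n", i + 1, p[i], len);
    }
    printf("average length = %.2f bits/symbol\n", avg);  /* 2.20 */
    return 0;
}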

Example
- Source alphabet A = {a1, a2, a3, a4, a5}
- Probability distribution: {0.2, 0.4, 0.2, 0.1, 0.1}
- [Figure: repeated sort and merge steps. Merge a4 (0.1) and a5 (0.1) into 0.2; merge a3 (0.2) with that 0.2 into 0.4; merge a1 (0.2) with that 0.4 into 0.6; merge 0.6 with a2 (0.4) into 1.0; then assign codes from the root.]
- Resulting code: a2 = 1, a1 = 01, a3 = 000, a4 = 0010, a5 = 0011

Huffman code is prefix-free
- Codewords: 1 (a2), 01 (a1), 000 (a3), 0010 (a4), 0011 (a5)
- All codewords are leaf nodes.
- No code is a prefix of any other code (prefix-free).

Average Codeword Length vs Entropy
- Source alphabet A = {a, b, c, d, e}
- Probability distribution: {0.2, 0.4, 0.2, 0.1, 0.1}
- Code: {01, 1, 000, 0010, 0011}
- Entropy: H(S) = -(0.2*log2(0.2)*2 + 0.4*log2(0.4) + 0.1*log2(0.1)*2) = 2.122 bits/symbol
- Average Huffman codeword length: L = 0.2*2 + 0.4*1 + 0.2*3 + 0.1*4 + 0.1*4 = 2.2 bits/symbol
- This verifies H(S) <= L < H(S) + 1 (a short verification snippet is shown below).
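A few lines to reproduce the two numbers above from the distribution and the codeword lengths (a sketch; log2 is from C99 math.h, link with -lm).

/* Verify H(S) and the average Huffman codeword length for the example. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double p[5]   = {0.2, 0.4, 0.2, 0.1, 0.1};
    int    len[5] = {2,   1,   3,   4,   4};   /* lengths of 01, 1, 000, 0010, 0011 */
    double H = 0.0, L = 0.0;
    for (int i = 0; i < 5; i++) {
        H += -p[i] * log2(p[i]);
        L += p[i] * len[i];
    }
    printf("H(S) = %.3f bits/symbol, L = %.2f bits/symbol\n", H, L);
    /* Prints H(S) = 2.122, L = 2.20, consistent with H(S) <= L < H(S)+1 */
    return 0;
}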

Huffman Code is not unique
- Two choices for each split: {0, 1} or {1, 0}.
- Multiple ordering choices for tied probabilities.
- [Figures: equivalent Huffman trees for the same source, obtained by swapping the 0/1 labels and by breaking ties in different orders; all have the same average length.]

Huffman Coding is Optimal
- Assume the probabilities are ordered: p1 >= p2 >= ... >= pm.
- Lemma: for any distribution, there exists an optimal prefix code that satisfies:
  - If p_j >= p_k, then l_j <= l_k; otherwise we could swap codewords to reduce the average length.
  - The two least probable letters have codewords of the same length; otherwise we could truncate the longer one without violating the prefix-free condition.
  - The two longest codewords differ only in the last bit and correspond to the two least probable symbols; otherwise we could rearrange the codewords to achieve this.
- Proof skipped.

Canonical Huffman Code
- The Huffman algorithm is needed only to compute the optimal codeword lengths.
- The optimal codewords for a given data set are not unique.
- The canonical Huffman code is well structured: given the codeword lengths, we can find a canonical Huffman code.
- Also known as slice code or alphabetic code.

Canonical Huffman Code
- Rules:
  - Assign 0 to the left branch and 1 to the right branch.
  - Build the tree from left to right in increasing order of depth.
  - Each leaf is placed at the first available position.
- Example: codeword lengths 2, 2, 3, 3, 3, 4, 4.
- Verify that they satisfy the Kraft-McMillan inequality: Σ_i 2^(-l_i) = 2/4 + 3/8 + 2/16 = 1 <= 1.
- [Figure: a non-canonical example tree vs. the canonical tree.]
- Canonical codewords: 00, 01, 100, 101, 110, 1110, 1111.

Canonical Huffman Properties
- The first code is a series of 0s.
- Codes of the same length are consecutive: 100, 101, 110.
- If we pad zeros to the right so that all codewords have the same length, a shorter code has a lower value than any longer code: 0000 < 0100 < 1000 < 1010 < 1100 < 1110 < 1111.
- Coding from length level n to level n+1 (see the sketch below):
  C(n+1, 1) = 2 (C(n, last) + 1): append a 0 to the next available level-n code.
  (C(n+1, 1) is the first code of length n+1; C(n, last) is the last code of length n.)
- If we go from length n to n+2 directly, e.g., lengths 1, 3, 3, 3, 4, 4:
  C(n+2, 1) = 4 (C(n, last) + 1).
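A minimal sketch of this assignment rule (my own helper, not from the slides): starting from sorted codeword lengths, the next canonical code is (previous + 1) shifted left by the length difference, which covers both the n to n+1 and the n to n+2 cases above. For lengths {2, 2, 3, 3, 3, 4, 4} it prints 00, 01, 100, 101, 110, 1110, 1111.

/* Canonical Huffman code assignment from sorted codeword lengths:
 * next_code = (prev_code + 1) << (len - prev_len). */
#include <stdio.h>

int main(void) {
    int len[] = {2, 2, 3, 3, 3, 4, 4};          /* must be non-decreasing */
    int n = 7;
    unsigned code = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            code = (code + 1) << (len[i] - len[i - 1]);
        /* print the code as a binary string of len[i] bits */
        for (int b = len[i] - 1; b >= 0; b--)
            putchar((code >> b) & 1 ? '1' : '0');
        printf("  (length %d)\n", len[i]);
    }
    return 0;
}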

Advantages of Canonical Huffman
1. Reduced memory requirement.
- A non-canonical tree needs all codewords and the lengths of all codewords: a lot of space for a large table.
- A canonical tree only needs (a decoding sketch follows below):
  - Min: shortest codeword length
  - Max: longest codeword length
  - Distribution: number of codewords at each level
- For the example above: Min = 2, Max = 4, number per level: 2, 3, 2.
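To illustrate the memory saving, here is a sketch of a canonical decoder that stores only Min/Max, the per-length counts {2, 3, 2} and the symbols in canonical order; no codeword table at all. The bit source and the symbol names are hypothetical.

/* Sketch: canonical Huffman decoding from per-length counts only.
 * count[l] = number of codewords of length l (lengths 2..4: 2, 3, 2).
 * Symbols are stored in canonical order; codewords themselves are not stored. */
#include <stdio.h>

#define MIN_LEN 2
#define MAX_LEN 4

static const int  count[MAX_LEN + 1] = {0, 0, 2, 3, 2};
static const char symbol[]           = "ABCDEFG";       /* canonical order */

/* hypothetical bit source: returns one bit at a time from a test buffer */
static const char *bits = "110" "00" "1111";            /* codes for E, A, G */
static int read_bit(void) { return *bits ? *bits++ - '0' : -1; }

static int decode_symbol(void) {
    int code = 0, first = 0, index = 0;
    for (int l = 1; l <= MAX_LEN; l++) {
        code = (code << 1) | read_bit();
        if (l >= MIN_LEN) {
            if (code - first < count[l])           /* falls in this length? */
                return symbol[index + (code - first)];
            index += count[l];
            first = (first + count[l]) << 1;       /* first code of length l+1 */
        } else {
            first <<= 1;
        }
    }
    return -1;                                     /* invalid stream */
}

int main(void) {
    for (int i = 0; i < 3; i++) putchar(decode_symbol());
    putchar('\n');                                 /* prints "EAG" */
    return 0;
}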

Outline
- Lecture 02 Recap
- Huffman Coding
- Golomb Coding

Unary Code (Comma Code)
- Encode a nonnegative integer n by n 1s followed by a 0 (or n 0s followed by a 1).
- No need to store a codeword table; very simple.
- Is this code prefix-free?
  n:        0  1   2    3     4      5
  Codeword: 0, 10, 110, 1110, 11110, 111110
- When is this code optimal? When the probabilities are 1/2, 1/4, 1/8, 1/16, 1/32, ...: the dyadic Huffman code becomes the unary code in this case.

Implementation
- Very efficient.
Encoding:
UnaryEncode(n) {
  while (n > 0) {
    WriteBit(1);
    n--;
  }
  WriteBit(0);
}
Decoding:
UnaryDecode() {
  n = 0;
  while (ReadBit() == 1) {
    n++;
  }
  return n;
}

Golomb Code [Golomb, 1966]
- A multi-resolution approach: divide all numbers into groups of equal size m (denote as Golomb(m) or Golomb-m).
- Groups with smaller symbol values have shorter codes; symbols in the same group have codewords of similar lengths.
- The codeword length grows much more slowly than in the unary code.
- [Figure: the range 0..max partitioned into consecutive groups of size m.]
- Codeword = (unary, fixed-length): the group ID is coded with a unary code, the index within the group with a (nearly) fixed-length code.

Golomb Code
- q: quotient, q = floor(n / m), coded with the unary code:
  q:        0  1   2    3     4      5       6
  Codeword: 0, 10, 110, 1110, 11110, 111110, 1111110
- r: remainder, r = n - qm, coded with a fixed-length code:
  - k bits if m = 2^k. m = 8: 000, 001, ..., 111.
  - If m != 2^k (not desired): floor(log2 m) bits for the smaller values of r, ceil(log2 m) bits for the larger values of r (truncated binary).
    m = 5: 00, 01, 10, 110, 111.
A general Golomb(m) encoding sketch follows below.
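A sketch of Golomb(m) encoding for general m (my own helper names): unary code for the quotient, truncated binary for the remainder, with floor(log2 m) bits for the first 2^ceil(log2 m) - m remainders and ceil(log2 m) bits for the rest. Bits are printed as characters rather than packed into a bitstream. Running it with m = 5 reproduces the Golomb-5 table on the next slide.

/* Golomb(m) encoding sketch: unary quotient + truncated binary remainder.
 * Emits bits as '0'/'1' characters for illustration. */
#include <stdio.h>

static void put_bit(int b) { putchar(b ? '1' : '0'); }

static void golomb_encode(unsigned n, unsigned m) {
    unsigned q = n / m, r = n % m;

    /* quotient: q ones followed by a zero (unary) */
    for (unsigned i = 0; i < q; i++) put_bit(1);
    put_bit(0);

    /* remainder: truncated binary code */
    unsigned b = 0;
    while ((1u << b) < m) b++;          /* b = ceil(log2 m) */
    unsigned cutoff = (1u << b) - m;    /* first 'cutoff' remainders use b-1 bits */
    if (r < cutoff) {
        for (int i = (int)b - 2; i >= 0; i--) put_bit((r >> i) & 1);
    } else {
        unsigned v = r + cutoff;        /* shift into the b-bit range */
        for (int i = (int)b - 1; i >= 0; i--) put_bit((v >> i) & 1);
    }
}

int main(void) {
    for (unsigned n = 0; n < 10; n++) { /* reproduces the Golomb-5 table */
        printf("n=%u: ", n);
        golomb_encode(n, 5);
        putchar('\n');
    }
    return 0;
}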

Golomb Code with m = 5 (Golomb-5)
n  q r code      n  q r code      n   q r code
0  0 0 000       5  1 0 1000      10  2 0 11000
1  0 1 001       6  1 1 1001      11  2 1 11001
2  0 2 010       7  1 2 1010      12  2 2 11010
3  0 3 0110      8  1 3 10110     13  2 3 110110
4  0 4 0111      9  1 4 10111     14  2 4 110111

Golomb vs Canonical Huffman
- Codewords (Golomb-5, n = 0..9): 000, 001, 010, 0110, 0111, 1000, 1001, 1010, 10110, 10111
- Canonical form: from left to right, from short to long, take the first valid spot.
- The Golomb code is a canonical Huffman code, with more properties.

Golomb-Rice Code
- A special Golomb code with m = 2^k.
- The remainder r is simply the k LSBs of n.
- m = 8:
  n  q r code      n   q r code
  0  0 0 0000      8   1 0 10000
  1  0 1 0001      9   1 1 10001
  2  0 2 0010      10  1 2 10010
  3  0 3 0011      11  1 3 10011
  4  0 4 0100      12  1 4 10100
  5  0 5 0101      13  1 5 10101
  6  0 6 0110      14  1 6 10110
  7  0 7 0111      15  1 7 10111

Implementation
- Remainder bits: RBits = 3 for m = 8. WriteBits(n, RBits) outputs the lower RBits bits of n.
Encoding:
GolombEncode(n, RBits) {
  q = n >> RBits;
  UnaryCode(q);
  WriteBits(n, RBits);
}
Decoding:
GolombDecode(RBits) {
  q = UnaryDecode();
  n = (q << RBits) + ReadBits(RBits);
  return n;
}

Exponential Golomb Code (Exp-Golomb)
- The Golomb code divides the alphabet into groups of equal size m; in the Exp-Golomb code the group size increases exponentially: 1, 2, 4, 8, 16, ...
- Codes still contain two parts: a unary code followed by a fixed-length code.
- Proposed by Teuhola in 1978.
  n  code       n   code
  0  0          8   1110001
  1  100        9   1110010
  2  101        10  1110011
  3  11000      11  1110100
  4  11001      12  1110101
  5  11010      13  1110110
  6  11011      14  1110111
  7  1110000    15  111100000
An encoder sketch matching this table follows below.
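The decoder on the next slide inverts the following mapping; here is a matching encoder sketch (order-0 Exp-Golomb, as tabulated above): find the group ID g such that 2^g - 1 <= n < 2^(g+1) - 1, send g in unary, then send the g-bit index n - (2^g - 1).

/* Order-0 Exp-Golomb encoder sketch, matching the table above.
 * Group g holds the 2^g values from 2^g - 1 to 2^(g+1) - 2. */
#include <stdio.h>

static void put_bit(int b) { putchar(b ? '1' : '0'); }

static void exp_golomb_encode(unsigned n) {
    unsigned g = 0;
    while (((1u << (g + 1)) - 1) <= n) g++;   /* group ID */
    for (unsigned i = 0; i < g; i++) put_bit(1);
    put_bit(0);                               /* unary group ID */
    unsigned index = n - ((1u << g) - 1);     /* offset within the group */
    for (int i = (int)g - 1; i >= 0; i--) put_bit((index >> i) & 1);
}

int main(void) {
    for (unsigned n = 0; n <= 15; n++) {
        printf("n=%2u: ", n);
        exp_golomb_encode(n);
        putchar('\n');
    }
    return 0;
}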

Decoding Implementation
ExpGolombDecode() {
  GroupID = UnaryDecode();
  if (GroupID == 0) {
    return 0;
  } else {
    Base = (1 << GroupID) - 1;
    Index = ReadBits(GroupID);
    return (Base + Index);
  }
}
(Group IDs for the table above: n = 0 has ID 0; n = 1..2 have ID 1; n = 3..6 have ID 2; n = 7..14 have ID 3.)

Outline
- Golomb Code Family: Unary Code, Golomb Code, Golomb-Rice Code, Exponential Golomb Code
- Why Golomb code?

Geometric Distribution (GD)
- Geometric distribution with parameter ρ: P(x) = ρ^x (1 - ρ), x >= 0, integer.
- Probability of the number of failures before the first success in a series of independent Yes/No experiments (Bernoulli trials).
- The unary code is the optimal prefix code for a geometric distribution with ρ <= 1/2:
  - ρ = 1/4: P(x): 0.75, 0.19, 0.05, 0.01, 0.003, ... Huffman coding never needs to re-order, so it is equivalent to the unary code. The unary code is the optimal prefix code, but not efficient (average length >> entropy).
  - ρ = 3/4: P(x): 0.25, 0.19, 0.14, 0.11, 0.08, ... Reordering is needed for the Huffman code; the unary code is not the optimal prefix code.
  - ρ = 1/2: expected length = entropy. The unary code is not only the optimal prefix code, but also optimal among all entropy coders (including arithmetic coding).

Geometric Distribution (GD)
- The geometric distribution is very useful for image/video compression.
- Example 1: run-length coding. Binary sequence with i.i.d. distribution and P(0) = ρ close to 1, e.g.:
  0000010000000010000110000001
- Entropy << 1 bit/symbol, so a per-symbol prefix code has poor performance.
- Run-length coding is efficient for compressing such data: code the number of consecutive 0s between two 1s.
  Run-length representation of the sequence: 5, 8, 4, 0, 6.
- Probability distribution of the run length r: P(r = n) = ρ^n (1 - ρ): n 0s followed by a 1.
- The run has a one-sided geometric distribution with parameter ρ. [Figure: P(r) decaying in r.]
A run-length extraction sketch follows below.
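A small sketch of the run-length extraction step on the example sequence above: each run is the number of consecutive 0s before the next 1; the resulting runs, which are geometrically distributed, would then be handed to a Golomb coder.

/* Sketch: run-length extraction for a sparse binary source.
 * Each run = number of consecutive 0s before the next 1; the runs
 * (geometrically distributed) would then be fed to a Golomb coder. */
#include <stdio.h>

int main(void) {
    const char *seq = "0000010000000010000110000001";  /* example sequence */
    int run = 0;
    printf("run-lengths: ");
    for (const char *p = seq; *p; p++) {
        if (*p == '0') {
            run++;
        } else {                 /* a 1 terminates the current run */
            printf("%d ", run);
            run = 0;
        }
    }
    putchar('\n');               /* prints: 5 8 4 0 6 */
    return 0;
}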

Geometric Distribution
- GD is the discrete analog of the exponential distribution: f(x) = λ e^(-λx), x >= 0.
- The two-sided geometric distribution is the discrete analog of the Laplacian distribution (also called the double exponential distribution): f(x) = (λ/2) e^(-λ|x|).
- [Figures: the exponential and Laplacian densities.]

Why Golomb Code?
- Significance of the Golomb code: for any geometric distribution (GD), a Golomb code is the optimal prefix code and gets as close to the entropy as possible (among all prefix codes).
- How to determine the Golomb parameter?
- How to apply it in a practical codec?

Geometric Distribution
- Example 2: GD is also a good model for the prediction error e(n) = x(n) - pred(x(1), ..., x(n-1)).
- Most e(n) values are small and concentrated around 0, so they can be modeled by a (two-sided) geometric distribution: p(n) proportional to ρ^|n|, 0 < ρ < 1.
- [Figure: histogram of p(n) vs. n.]

Optimal Code for Geometric Distribution
- Geometric distribution with parameter ρ: P(X = n) = ρ^n (1 - ρ).
- The unary code is the optimal prefix code when ρ <= 1/2; it is also optimal among all entropy coders for ρ = 1/2.
- How to design the optimal code when ρ > 1/2? Transform into a GD with parameter <= 1/2 (as close as possible). How? By grouping m events together.
- Each x can be written as x = qm + r. Then
  P_q(q) = Σ_{r=0..m-1} P_X(qm + r) = Σ_{r=0..m-1} ρ^(qm+r) (1 - ρ) = (ρ^m)^q (1 - ρ^m),
  i.e., q has a geometric distribution with parameter ρ^m.
- The unary code is optimal for q if ρ^m <= 1/2; m is the minimal such integer:
  m = ceil( -log 2 / log ρ ) = ceil( log(1/2) / log ρ ).
A sketch of this rule follows below.
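A small sketch of this rule (the helper name is mine): the optimal Golomb parameter is the smallest integer m with ρ^m <= 1/2, falling back to m = 1 (plain unary) when ρ <= 1/2.

/* Optimal Golomb parameter for a geometric source:
 * smallest m with rho^m <= 1/2, i.e. m = ceil(log(1/2) / log(rho)). */
#include <math.h>
#include <stdio.h>

static unsigned golomb_m(double rho) {
    if (rho <= 0.5) return 1;                 /* unary code is already optimal */
    return (unsigned)ceil(log(0.5) / log(rho));
}

int main(void) {
    double rhos[] = {0.5, 0.75, 0.9, 0.95, 0.99};
    for (int i = 0; i < 5; i++)
        printf("rho = %.2f -> m = %u\n", rhos[i], golomb_m(rhos[i]));
    /* e.g. rho = 0.75 -> m = 3, rho = 0.9 -> m = 7, rho = 0.99 -> m = 69 */
    return 0;
}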

Golomb Parameter Estimation (J2K book: pp. 55)
- Goal of the adaptive Golomb code: for the given data, find the best m such that ρ^m <= 1/2.
- How to find ρ from the statistics of past data? With P(x) = ρ^x (1 - ρ):
  E(x) = Σ_{x>=0} x ρ^x (1 - ρ) = ρ / (1 - ρ), so ρ = E(x) / (1 + E(x)).
- ρ^m <= 1/2 gives m >= log(1/2) / log( E(x) / (1 + E(x)) ).
- Let m = 2^k: k = ceil( log2( log(1/2) / log( E(x) / (1 + E(x)) ) ) ).
- Too costly to compute.

Golomb Parameter Estimation (J2K book: pp. 55)
- A faster method: assume ρ is close to 1, so 1 - ρ is close to 0 and E(x) = ρ / (1 - ρ) is approximately 1 / (1 - ρ).
- Then ρ^m = (1 - (1 - ρ))^m is approximately 1 - m(1 - ρ), i.e., approximately 1 - m / E(x).
- Setting ρ^m = 1/2 gives m approximately E(x) / 2, i.e., 2^k is approximately E(x) / 2:
  k = max( 0, ceil( log2( E(x) / 2 ) ) ).
An adaptive estimation sketch follows below.
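A sketch of the fast rule k = max(0, ceil(log2(E(x)/2))), with E(x) estimated by the running mean of the values coded so far. The variable names and the integer-shift search are my own; JPEG-LS and JPEG 2000 use closely related counter-based updates rather than exactly this code.

/* Adaptive Rice parameter sketch: k = max(0, ceil(log2(mean/2))),
 * with the mean of past samples as the estimate of E(x). */
#include <stdio.h>

static unsigned rice_k(unsigned long sum, unsigned long count) {
    if (count == 0) return 0;
    /* smallest k with count * 2^(k+1) >= sum, i.e. 2^k >= mean/2 */
    unsigned k = 0;
    while ((count << (k + 1)) < sum) k++;
    return k;
}

int main(void) {
    /* hypothetical prediction residual magnitudes */
    unsigned x[] = {3, 0, 5, 2, 9, 1, 4, 7, 2, 3};
    unsigned long sum = 0, count = 0;
    for (int i = 0; i < 10; i++) {
        unsigned k = rice_k(sum, count);     /* parameter used to code x[i] */
        printf("x=%u coded with Golomb-Rice k=%u\n", x[i], k);
        sum += x[i];
        count++;                             /* update statistics afterwards */
    }
    return 0;
}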

Summary
- Huffman Coding
  - A prefix code that is optimal in (average) code length.
  - Canonical form to reduce the variation of the code length and the table storage.
  - Widely used.
- Golomb Coding
  - Suitable for coding prediction errors in images.
  - Optimal prefix code for geometrically distributed sources (reduces to the unary code for ρ = 0.5).
  - Simple to encode and decode.
  - Many practical applications, e.g., JPEG 2000 lossless coding.

Q&A