Huffman Coding. C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University


1 Huffman Coding C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University Office: EC538 (03)

2 Overview
- Huffman Coding Algorithm
  - The procedure to build Huffman codes
- Extended Huffman Codes
- Adaptive Huffman Coding
  - Update Procedure
  - Decoding Procedure
- Golomb Codes

3 Shannon-Fano Coding
- The first code based on Shannon's theory
- Suboptimal (it took a graduate student to fix it!)
- Algorithm
  1. Start with empty codes.
  2. Compute frequency statistics for all symbols.
  3. Order the symbols in the set by frequency.
  4. Split the set into two subsets so that the difference between their total frequencies is minimized.
  5. Add 0 to the codes in the first set and 1 to the codes in the rest.
  6. Recursively assign the rest of the code bits for the two subsets, until the sets cannot be split.

4-12 Shannon-Fano Coding (2)-(10)
[Figure: animation of the recursive splitting of the frequency-sorted symbol set {a, b, c, d, e, f}: first into {a, b} and {c, d, e, f}, then into smaller subsets, with one code bit assigned at each split. The symbol frequencies are only partly legible in this copy.]

13 Shannon-Fano Coding: Remarks
- Shannon-Fano does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example for which it assigns a non-optimal code.
- Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected codeword length, under the constraint that each symbol is represented by a code formed of an integral number of bits.
- Symbol-by-symbol Huffman coding is only optimal (in the sense of reaching the entropy) if the symbols are independent and their probabilities are powers of one half, i.e. $1/2^n$.
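The remark about the probability set {0.35, 0.17, 0.17, 0.16, 0.15} can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the Shannon-Fano split picks the most balanced cut of the sorted list, and ties in the Huffman merge are broken by insertion order (an assumption, not something the slides specify).

```python
import heapq
from itertools import count

def shannon_fano_lengths(probs):
    """Codeword length of each symbol under recursive Shannon-Fano splitting."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    lengths = [0] * len(probs)

    def split(group):
        if len(group) < 2:
            return
        total = sum(probs[i] for i in group)
        # Pick the cut that makes the two halves as balanced as possible.
        best_cut, best_diff, running = 1, float("inf"), 0.0
        for cut in range(1, len(group)):
            running += probs[group[cut - 1]]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_cut, best_diff = cut, diff
        for i in group:
            lengths[i] += 1  # every symbol in the group gains one code bit
        split(group[:best_cut])
        split(group[best_cut:])

    split(order)
    return lengths

def huffman_lengths(probs):
    """Codeword length of each symbol under Huffman's merging procedure."""
    tie = count()  # unique tie-breaker so the symbol lists are never compared
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for i in g1 + g2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), g1 + g2))
    return lengths

def expected_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.35, 0.17, 0.17, 0.16, 0.15]
sf = shannon_fano_lengths(probs)
hf = huffman_lengths(probs)
print("Shannon-Fano:", sf, round(expected_length(probs, sf), 2))  # 2.31
print("Huffman:     ", hf, round(expected_length(probs, hf), 2))  # 2.3
```

Under these assumptions Shannon-Fano gives lengths (2, 2, 2, 3, 3) and Huffman gives (1, 3, 3, 3, 3), so Huffman saves 0.01 bits/symbol on this source.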

14 Optimum Prefix Codes
- Key observations on optimal codes
  1. Symbols that occur more frequently will have shorter codewords.
  2. The two least frequent symbols will have codewords of the same length.
- Proofs
  1. Assume the opposite: the code is clearly sub-optimal.
  2. Assume the opposite: let X, Y be the least frequent symbols, with |code(X)| = k and |code(Y)| = k+1.
     - By unique decodability (UD), code(X) cannot be a prefix of code(Y); also, all other codewords are no longer than k bits.
     - Dropping the last bit of code(Y) would therefore generate a new, shorter, uniquely decodable code.
     - This contradicts the optimality assumption.

15 Huffman Coding
- David Huffman (devised 1951, published 1952)
- Grad student of Robert M. Fano (MIT)
- Term paper(!)
- Explained by example

Letter  Code  Probability
a             0.2
b             0.4
c             0.2
d             0.1
e             0.1

16-32 Huffman Coding by Example
Init: create a set out of each letter. Then repeat the following four steps until a single set remains:
1. Sort sets according to probability (lowest first).
2. Insert prefix 1 into the codes of the top set's letters.
3. Insert prefix 0 into the codes of the second set's letters.
4. Merge the top two sets.

Round  Sorted sets (set prob)                Prefix 1  Prefix 0  Merged set
1      d 0.1, e 0.1, a 0.2, c 0.2, b 0.4     d         e         de 0.2
2      de 0.2, a 0.2, c 0.2, b 0.4           de        a         dea 0.4
3      c 0.2, dea 0.4, b 0.4                 c         dea       cdea 0.6
4      b 0.4, cdea 0.6                       b         cdea      (done)

Codes after each round:

Letter  Probability  Round 1  Round 2  Round 3  Round 4 (final)
a       0.2                   0        00       000
b       0.4                                     1
c       0.2                            1        01
d       0.1          1        11       011      0011
e       0.1          0        10       010      0010
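The set-merging procedure walked through on these slides can be sketched in a few lines. This is a minimal illustration, assuming the letter probabilities a = 0.2, b = 0.4, c = 0.2, d = 0.1, e = 0.1 and assuming that a newly merged set is placed ahead of other sets with the same probability; the function name `huffman_by_sets` is hypothetical.

```python
from fractions import Fraction

def huffman_by_sets(probs):
    """Build codes by repeatedly sorting, prefixing and merging sets of letters."""
    codes = {sym: "" for sym in probs}
    sets = [([sym], p) for sym, p in probs.items()]  # Init: one set per letter
    while len(sets) > 1:
        # Step 1: sort lowest first (stable, so ties keep their current order).
        sets.sort(key=lambda item: item[1])
        (top, p_top), (second, p_second) = sets[0], sets[1]
        # Step 2: prefix 1 for the top set's letters.
        for sym in top:
            codes[sym] = "1" + codes[sym]
        # Step 3: prefix 0 for the second set's letters.
        for sym in second:
            codes[sym] = "0" + codes[sym]
        # Step 4: merge the two sets; the merged set goes to the front.
        sets = [(top + second, p_top + p_second)] + sets[2:]
    return codes

# Exact fractions avoid floating-point surprises when comparing equal probabilities.
probs = {s: Fraction(p) for s, p in
         {"a": "0.2", "b": "0.4", "c": "0.2", "d": "0.1", "e": "0.1"}.items()}
codes = huffman_by_sets(probs)
for sym in sorted(codes):
    print(sym, codes[sym])
```

With these assumptions the result is a = 000, b = 1, c = 01, d = 0011, e = 0010. A different tie-breaking rule yields a different but equally short code.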

33 Example Summary
- Average code length: $l = 0.4 \times 1 + 0.2 \times 2 + 0.2 \times 3 + 0.1 \times 4 + 0.1 \times 4 = 2.2$ bits/symbol
- Entropy: $H = -\sum_{s=a}^{e} P(s) \log_2 P(s) = 2.122$ bits/symbol
- Redundancy: $l - H = 0.078$ bits/symbol
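The three figures on this slide can be recomputed directly. The snippet below is a small check, assuming the example probabilities 0.2, 0.4, 0.2, 0.1, 0.1 for a to e and codeword lengths of 3, 1, 2, 4, 4 bits respectively; the variable names are illustrative.

```python
import math

# Assumed example source and codeword lengths (a..e).
probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}
lengths = {"a": 3, "b": 1, "c": 2, "d": 4, "e": 4}

avg_length = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
redundancy = avg_length - entropy

print(f"average length = {avg_length:.3f} bits/symbol")  # 2.200
print(f"entropy        = {entropy:.3f} bits/symbol")     # 2.122
print(f"redundancy     = {redundancy:.3f} bits/symbol")  # 0.078
```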

34 Huffman Tree
35-38 Building a Huffman Tree
[Figure: the binary code tree for the example source, with leaves a, b, c, d, e labelled by their probabilities; the slides fill in the Letter/Code table one step at a time by reading the branch labels on the path from the root to each leaf. The individual codewords are not legible in this copy.]

39-42 An Alternative Huffman Tree
[Figure: a different Huffman tree for the same source, in which b has a 1-bit codeword and a, c, d, e have 3-bit codewords. The individual codewords are not legible in this copy.]
Average code length: $l = 0.4 \times 1 + (0.2 + 0.2 + 0.1 + 0.1) \times 3 = 2.2$ bits/symbol

43 Yet Another Tree
[Figure: a third Huffman tree, in which a, b, c have 2-bit codewords and d, e have 3-bit codewords.]
Average code length: $l = 0.4 \times 2 + (0.2 + 0.2) \times 2 + (0.1 + 0.1) \times 3 = 2.2$ bits/symbol

44 Design Examples

45 Min Variance Huffman Trees
- Huffman codes are not unique
- All versions yield the same average length
- Which one should we choose?
  - The one with the minimum variance in codeword lengths, i.e. with the minimum height tree
- Why?
  - It will ensure the least amount of variability in the encoded stream
- How to achieve it?
  - During sorting, break ties by placing smaller sets higher
  - Alternatively, place newly merged sets as low as possible
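To make the choice concrete, the variance of the codeword length can be compared across the three trees of the example. This is a sketch under the assumption that the three trees assign the lengths listed below (probabilities a = 0.2, b = 0.4, c = 0.2, d = 0.1, e = 0.1); the dictionary keys are just labels.

```python
probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}

# Assumed codeword lengths for the three Huffman trees of the example.
trees = {
    "first tree":       {"a": 3, "b": 1, "c": 2, "d": 4, "e": 4},
    "alternative tree": {"a": 3, "b": 1, "c": 3, "d": 3, "e": 3},
    "yet another tree": {"a": 2, "b": 2, "c": 2, "d": 3, "e": 3},
}

def mean_and_variance(probs, lengths):
    """Probability-weighted mean and variance of the codeword length."""
    mean = sum(probs[s] * lengths[s] for s in probs)
    var = sum(probs[s] * (lengths[s] - mean) ** 2 for s in probs)
    return mean, var

stats = {name: mean_and_variance(probs, lengths) for name, lengths in trees.items()}
for name, (mean, var) in stats.items():
    print(f"{name:18s} mean = {mean:.2f}  variance = {var:.2f}")
```

All three trees have the same mean of 2.2 bits/symbol, but the variances are 1.36, 0.96 and 0.16, so the third (shallowest) tree is the minimum-variance choice.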

46 Extended Huffman Codes
- Consider the source: A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18
- H = 0.816 bits/symbol
- Huffman code: a → 0, b → 11, c → 10
- l = 1.2 bits/symbol
- Redundancy = 0.384 bits/symbol (47%!)
- Q: Could we do better?

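47 Extended Huffman Codes (2)
- Idea: consider encoding sequences of two letters as opposed to single letters

Letter  Probability
aa      0.6400
ab      0.0160
ac      0.1440
ba      0.0160
bb      0.0004
bc      0.0036
ca      0.1440
cb      0.0036
cc      0.0324
[The Code column of this table is not legible in this copy.]

- l = 1.7228/2 = 0.8614 bits/symbol
- Redundancy ≈ 0.045 bits/symbol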

48 Extended Huffman Codes (3)
- The idea can be extended further
  - Consider all possible $n^m$ sequences (we did $3^2$)
- In theory, by considering more sequences we can improve the coding
- In reality, the exponential growth of the alphabet makes this impractical
  - E.g., for length-3 ASCII sequences: $256^3 = 2^{24} = 16$M
  - Most sequences would have zero frequency
- Other methods are needed
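The effect of blocking symbols can be reproduced with a short computation. The sketch below assumes the source P(a) = 0.8, P(b) = 0.02, P(c) = 0.18 with independent symbols, and uses the fact that the average Huffman codeword length equals the sum of the weights of all merged (internal) nodes; the function names are hypothetical.

```python
import heapq
import math
from itertools import product

def huffman_avg_length(probs):
    """Average codeword length of a Huffman code for the given probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, merged)
    return total

def bits_per_symbol(source, n):
    """Huffman-code blocks of n independent symbols; return bits per source symbol."""
    block_probs = [math.prod(block) for block in product(source.values(), repeat=n)]
    return huffman_avg_length(block_probs) / n

source = {"a": 0.8, "b": 0.02, "c": 0.18}
entropy = -sum(p * math.log2(p) for p in source.values())
print(f"entropy: {entropy:.4f} bits/symbol")
for n in (1, 2, 3):
    rate = bits_per_symbol(source, n)
    print(f"n = {n}: alphabet size {len(source) ** n:2d}, {rate:.4f} bits/symbol")
```

For n = 1 and n = 2 this gives 1.2 and 0.8614 bits/symbol. The alphabet size grows as 3^n, which is the exponential growth that makes long blocks impractical.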

49 Adaptive Huffman Coding
- Problem
  - Huffman requires probability estimates
  - This could turn it into a two-pass procedure:
    1. Collect statistics, generate codewords
    2. Perform actual encoding
  - Not practical in many situations, e.g. compressing network transmissions
- Theoretical solution
  - Start with equal probabilities
  - Based on the statistics of the first k symbols (k = 1, 2, …), regenerate the codewords and encode the (k+1)-st symbol
  - Too expensive in practice
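The "theoretical solution" (rebuild the whole code before every symbol) is easy to sketch, which also shows why it is expensive: one full Huffman construction per symbol. The code below is a simplified illustration, not the tree-update algorithm of the following slides. It assumes a known alphabet of at least two symbols, initial counts of 1 for every symbol, a deterministic tie-break by symbol order, and that the decoder is told how many symbols to expect.

```python
import heapq

def build_code(counts):
    """Deterministic Huffman code from symbol counts (ties broken by symbol order)."""
    heap = [(c, i, [s]) for i, (s, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    codes = {s: "" for s in counts}
    tie = len(heap)
    while len(heap) > 1:
        c1, _, g1 = heapq.heappop(heap)
        c2, _, g2 = heapq.heappop(heap)
        for s in g1:
            codes[s] = "0" + codes[s]
        for s in g2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (c1 + c2, tie, g1 + g2))
        tie += 1
    return codes

def encode(text, alphabet):
    counts = {s: 1 for s in alphabet}  # start with equal probabilities
    out = []
    for ch in text:
        out.append(build_code(counts)[ch])  # code built from the symbols seen so far
        counts[ch] += 1
    return "".join(out)

def decode(bits, alphabet, n_symbols):
    counts = {s: 1 for s in alphabet}
    out, pos = [], 0
    for _ in range(n_symbols):
        inverse = {code: s for s, code in build_code(counts).items()}
        end = pos + 1
        while bits[pos:end] not in inverse:  # extend until a codeword matches
            end += 1
            if end > len(bits):
                raise ValueError("bit stream ended inside a codeword")
        sym = inverse[bits[pos:end]]
        out.append(sym)
        counts[sym] += 1  # mirror the encoder's update to stay in sync
        pos = end
    return "".join(out)

alphabet = "adkrv"
bits = encode("aardvark", alphabet)
print(bits, len(bits), "bits")
print(decode(bits, alphabet, len("aardvark")))
```

Because both sides rebuild the same code from the same counts, no code table has to be transmitted.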

50 Adaptive Huffman Coding (2)
- Basic idea

  Alphabet A = {a_1, …, a_n}
  Pick fixed default binary codes for all symbols
  Start with an empty Huffman tree
  Repeat
    Read symbol s from source
    If NYT(s)                 // Not Yet Transmitted
      Send NYT, default(s)
      Update tree (and keep it Huffman)
    Else
      Send codeword for s
      Update tree
  Until done

- Notes
  - Codewords will change as a function of symbol frequencies
  - Encoder & decoder follow the same procedure, so they stay in sync

51 Adaptive Huffman Tree
- The tree has at most $2n - 1$ nodes
- Node attributes
  - symbol, left, right, parent, siblings, leaf
  - weight
    - If $x_k$ is a leaf, then weight($x_k$) = frequency of symbol($x_k$)
    - Else weight($x_k$) = weight(left($x_k$)) + weight(right($x_k$))
  - id, assigned as follows:
    - If weight($x_1$) ≤ weight($x_2$) ≤ … ≤ weight($x_{2n-1}$), then id($x_1$) ≤ id($x_2$) ≤ … ≤ id($x_{2n-1}$)
    - Also, parent($x_{2k-1}$) = parent($x_{2k}$), for $1 \le k < n$
    - This is the sibling property

52 Updating the Tree
- Assign id(root) = $2n - 1$, weight(NYT) = 0
- Start with an NYT node
  - Whenever a new symbol is seen, a new node is formed by splitting the NYT node
- Maintaining the sibling property: whenever node x is updated

  Repeat
    If weight(x) < weight(y), for all y ∈ siblings(x)
      weight(x)++
      exit
    Else
      swap(x, z), where z is the rightmost sibling with weight(x) == weight(z)
      weight(x)++
      x = parent(x)
  Until x == root

53-67 Adaptive Huffman Encoding
Input: aardvark
[Figure: animation of the encoder. For each input symbol the slides show the bits sent (the NYT codeword followed by the default code when the symbol is new, the current codeword otherwise), the updated Symbol/Code table for NYT, a, r, d, v, k, and the updated tree with its node ids and weights. The intermediate codewords and trees are not legible in this copy.]

Default codes (slightly more efficient default codes are possible, using a 4-/5-bit combination):

a 00000   f 00101   k 01010   p 01111   u 10100
b 00001   g 00110   l 01011   q 10000   v 10101
c 00010   h 00111   m 01100   r 10001   w 10110
d 00011   i 01000   n 01101   s 10010   x 10111
e 00100   j 01001   o 01110   t 10011   y 11000

68-82 Adaptive Huffman Decoding
[Figure: animation of the decoder processing the encoder's bit stream. It maintains the same default code table, Symbol/Code table and tree as the encoder, and its output grows one symbol at a time: a, aa, aar, aard, aardv, aardva, aardvar, aardvark. The intermediate codewords and trees are not legible in this copy.]

83 Dealing with Counter Overflow
- Over time counters can overflow
  - E.g., a 32-bit counter holds ~4 billion
  - BIG, but still finite, and it can overflow on long network connections
- Solution?
  - Rescale all frequency counts (of leaf nodes) when the limit is reached, e.g. divide all of them by two
  - Re-compute the rest of the tree (keep it Huffman!)
- Note: after rescaling, new symbols will count twice as much as old ones!
  - This is mostly a feature, not a bug: data tends to have strong local correlation
  - I.e., what happened a long time ago is not as important as what happened more recently
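A minimal sketch of the rescaling step, working on a plain dictionary of leaf counts rather than on a tree. The function name `bump`, the `limit` parameter and the choice to keep every seen symbol at a count of at least 1 are assumptions made for illustration; rebuilding the tree from the rescaled counts is left out.

```python
def bump(counts, symbol, limit):
    """Increment a symbol's count; halve all counts once the limit is reached."""
    counts[symbol] = counts.get(symbol, 0) + 1
    if counts[symbol] >= limit:
        for s in counts:
            # Integer halving; keep at least 1 so seen symbols are not forgotten.
            counts[s] = max(1, counts[s] // 2)
    return counts

counts = {"a": 6, "b": 3, "c": 1}
bump(counts, "a", limit=8)   # a becomes 7, below the limit: no rescaling
print(counts)
bump(counts, "a", limit=8)   # a reaches 8: every count is halved
print(counts)
```

After the halving, each new occurrence adds a full count while old occurrences are worth half, which is the "new symbols count twice as much" effect noted above.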

84 Huffman Image Compression
- Example images: 256x256 pixels, 8 bits/pixel, 65,536 bytes
  - Sena, Sensin, Earth, Omaha
- Huffman coding of pixel values
[Table: Image | Bits/pixel | Size (bytes) | Compression Ratio, with one row each for Sena, Sensin, Earth and Omaha. Most numeric entries are not legible in this copy.]

85 Huffman Image Compression (2)
- Basic observations
  - Plain Huffman yields modest gains, except in the Earth case
    - Lots of black skews the pixel distribution nicely
  - We are not taking into account obvious correlations of pixel values
- Huffman coding of pixel differences
[Table: Image | Bits/pixel | Size (bytes) | Compression Ratio, with one row each for Sena, Sensin, Earth and Omaha. Most numeric entries are not legible in this copy.]
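Why differences help can be seen on a toy signal. The sketch below uses a synthetic 256-sample ramp as a stand-in for one image row (an assumption; it is not one of the test images) and compares the first-order entropy of the raw values with that of the neighbour differences, which lower-bounds what a symbol-by-symbol Huffman code can achieve.

```python
import math
from collections import Counter

def entropy(values):
    """First-order entropy in bits per sample."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Synthetic smooth "row": the value rises by 1 every 4 pixels.
row = [i // 4 for i in range(256)]
# Keep the first pixel as is, then store each pixel minus its left neighbour.
diffs = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

h_values = entropy(row)
h_diffs = entropy(diffs)
print(f"pixel values:      {h_values:.3f} bits/pixel")
print(f"pixel differences: {h_diffs:.3f} bits/pixel")
```

The raw values are spread evenly over 64 levels (6 bits/pixel), while the differences take only the values 0 and 1 and need well under 1 bit/pixel.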

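86 Two-pass Huffman vs. Adaptive Huffman
- Two-pass
[Table: Image | Bits/pixel | Size (bytes) | Compression Ratio, with one row each for Sena, Sensin, Earth and Omaha. Most numeric entries are not legible in this copy.]
- Adaptive
[Table: Image | Bits/pixel | Size (bytes) | Compression Ratio, with one row each for Sena, Sensin, Earth and Omaha. Most numeric entries are not legible in this copy.]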

87 Huffman Text Compression
[Figure: PDF(letters): US Constitution vs. Chapter 3. Probability (vertical axis) of each letter A-Z (horizontal axis), with one series for P(Constitution) and one for P(Chapter).]

88 Huffman Audio Compression
- Huffman coding: 16-bit CD audio (44,100 Hz) x 2 channels
[Table: File Name | Original File Size (bytes) | Entropy (bits) | Est. Compressed File Size (bytes) | Compression Ratio, with one row each for Mozart, Cohn and Mir. Most numeric entries are not legible in this copy.]
- Difference Huffman coding
[Table: File Name | Original File Size (bytes) | Entropy of Diff. (bits) | Est. Compressed File Size (bytes) | Compression Ratio, with one row each for Mozart, Cohn and Mir. Most numeric entries are not legible in this copy.]

89 Golomb Codes
- Invented by Solomon W. Golomb in the 1960s
- Golomb coding is optimal for the geometric distribution
- Rice coding
  - A Golomb code has a tunable parameter that can be any positive integer; Rice codes are those in which the tunable parameter is a power of two
- Unary code
  - The unary representation of the number followed by a terminating bit
  - Identical to the Huffman code for {1, 2, 3, …} with $P(k) = 1/2^k$
  - Optimal for that probability model

90 Golomb Codes (2)
- Uses a tunable parameter m to divide an input value into a quotient and a remainder
- To represent n, we compute
  - $q = \lfloor n/m \rfloor$ (quotient)
  - $r = n - qm$ (remainder)
- Represent q in unary code, followed by r in $\log_2 m$ bits
- If m is not a power of 2, then some remainders can use $\lfloor \log_2 m \rfloor$ bits (truncated binary encoding):
  - the $\lfloor \log_2 m \rfloor$-bit representation of r for $0 \le r \le 2^{\lceil \log_2 m \rceil} - m - 1$
  - the $\lceil \log_2 m \rceil$-bit representation of $r + 2^{\lceil \log_2 m \rceil} - m$ for the rest

91 Golomb Codes: Truncated binary coding
- An entropy encoding typically used for uniform probability distributions with a finite alphabet
- A more general form of binary encoding, for when n is not a power of two
- Coding (a prefix code)
  - For $2^k \le n < 2^{k+1}$, there are $u = 2^{k+1} - n$ unused entries
  - k-bit codes for $0 \le r \le u - 1$
  - (k+1)-bit codes for the rest, obtained by encoding $r + u$
[Figure: for n = 5, the standard 3-bit binary codewords set against the truncated binary codewords; the first u values take k-bit codes, the remaining n − u values take (k+1)-bit codes, and the leftover entries are marked UNUSED.]
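The coding rule above translates almost directly into code. This is a small sketch with a hypothetical function name, following the slide's definitions of k and u; it returns the codeword as a string of '0'/'1' characters.

```python
def truncated_binary(r, n):
    """Truncated binary codeword of r, for an alphabet {0, ..., n-1}."""
    if not 0 <= r < n:
        raise ValueError("r must satisfy 0 <= r < n")
    k = n.bit_length() - 1        # k = floor(log2 n), so 2^k <= n < 2^(k+1)
    u = (1 << (k + 1)) - n        # number of unused (k+1)-bit entries
    if r < u:
        # The first u values get the short k-bit code (empty when k == 0).
        return format(r, "b").zfill(k) if k > 0 else ""
    # The remaining values get the (k+1)-bit code of r + u.
    return format(r + u, "b").zfill(k + 1)

for r in range(5):
    print(r, truncated_binary(r, 5))
```

For n = 5 this yields 00, 01, 10, 110, 111: three 2-bit codewords and two 3-bit codewords, none a prefix of another. When n is a power of two, every value falls in the first branch and the result is ordinary fixed-length binary.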

92 Golomb Codes: Truncated binary coding
[Tables: Input value | Offset value | Standard Binary | Truncated Binary, one table for n = 7 and one for n = 10. The entries are not legible in this copy.]

93-104 Golomb Code Example (1)-(12)
- m = 6, $\lfloor \log_2 m \rfloor = 2$, $\lceil \log_2 m \rceil = 3$
- 2-bit codes of r for $0 \le r \le 2^{\lceil \log_2 6 \rceil} - 6 - 1 = 1$
- 3-bit codes of $r + 2^{\lceil \log_2 6 \rceil} - 6 = r + 2$ for the rest
[Table, filled in one row per slide: n | q | r | Codeword. The entries are not legible in this copy.]
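The table for m = 6 can be regenerated with a short encoder. The sketch below is illustrative: the function names are hypothetical, it requires m ≥ 2, and it assumes the unary part is written as q ones followed by a zero, a convention that the legible part of the slides does not confirm.

```python
def to_bits(value, width):
    """Fixed-width binary string (empty when width is 0)."""
    return format(value, "b").zfill(width) if width > 0 else ""

def golomb_encode(n, m):
    """Golomb codeword of a non-negative integer n with parameter m >= 2."""
    q, r = divmod(n, m)
    b = (m - 1).bit_length()      # ceil(log2 m)
    cutoff = (1 << b) - m         # number of remainders that get the short code
    if r < cutoff:
        tail = to_bits(r, b - 1)
    else:
        tail = to_bits(r + cutoff, b)
    return "1" * q + "0" + tail   # assumed unary convention: q ones, then a zero

def golomb_decode(bits, m):
    """Decode a single Golomb codeword (inverse of golomb_encode)."""
    pos = 0
    while bits[pos] == "1":       # count the unary ones
        pos += 1
    q = pos
    pos += 1                      # skip the terminating zero
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    r = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    if r >= cutoff:               # long form: read one more bit and remove the offset
        r = int(bits[pos:pos + b], 2) - cutoff
    return q * m + r

print(" n  q  r  codeword")
for n in range(12):
    q, r = divmod(n, 6)
    print(f"{n:2d} {q:2d} {r:2d}  {golomb_encode(n, 6)}")
```

With m = 6 the remainders 0 and 1 use two bits (00, 01) and the remainders 2 to 5 use three bits (100, 101, 110, 111), matching the rule stated on the slides.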

105 Golomb Codes: Choosing m
- Assume a binary string (zeroes & ones)
- It can be encoded by counting the runs of identical bits (either zeroes or ones), a.k.a. run-length encoding (RLE)
[Example: a binary string containing 35 zeroes, written as the list of its run lengths; the number of ones, the run lengths and the resulting value of P(0) are not legible in this copy.]
- With p = P(0), the parameter is chosen as

  $m = \left\lceil -\dfrac{\log_2 (1 + p)}{\log_2 p} \right\rceil$
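The parameter choice can be tried on a made-up bit string. In the sketch below the sample string is hypothetical (the slide's own example is not legible), runs are taken to be runs of zeroes ended by a one, and p is estimated as the fraction of zeroes; the function names are illustrative.

```python
import math

def zero_run_lengths(bits):
    """Lengths of the runs of zeroes that end at each one.

    A trailing run with no closing one is ignored in this sketch.
    """
    runs, current = [], 0
    for b in bits:
        if b == "0":
            current += 1
        else:
            runs.append(current)
            current = 0
    return runs

def golomb_parameter(p):
    """m = ceil(-log2(1 + p) / log2(p)) for 0 < p < 1."""
    return math.ceil(-math.log2(1 + p) / math.log2(p))

bits = "0001000001001"              # hypothetical sample input
runs = zero_run_lengths(bits)
p_zero = bits.count("0") / len(bits)
m = golomb_parameter(p_zero)
print("runs:", runs)
print(f"P(0) = {p_zero:.3f}, m = {m}")
```

For p = 0.5 the formula gives m = 1, which is the plain unary code; as p approaches 1 the runs get longer and m grows.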

106 Summary
- Early Shannon-Fano code
- Huffman code
  - Original (two-pass) version
    - Collect symbol statistics, assign codes
    - Perform actual encoding of the source
  - Extended version
    - Group multiple symbols to reduce the entropy estimate
  - Adaptive version
    - Most practical: builds the Huffman tree on the fly
    - Single pass
    - Escape codes for NYT symbols
    - Encoder & decoder are synchronized
    - More sensitive to local variation, tends to forget older data
- Homeworks (pp. 78) 4, 5, 6, 0.
