Huffman Coding. C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University

1 Huffman Coding C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University Office: EC538 (03)

2 Overview
- Huffman Coding Algorithm: the procedure to build Huffman codes
- Extended Huffman Codes
- Adaptive Huffman Coding: update procedure, decoding procedure
- Golomb Codes

3 Shannon-Fano Coding
- The first code based on Shannon's theory
- Suboptimal (it took a graduate student to fix it!)
- Algorithm:
  1. Start with empty codes
  2. Compute frequency statistics for all symbols
  3. Order the symbols in the set by frequency
  4. Split the set into two subsets so as to minimize the difference between their total frequencies
  5. Add 0 to the codes in the first set and 1 to the rest
  6. Recursively assign the rest of the code bits for the two subsets, until the sets cannot be split
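The recursive split can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `shannon_fano`, the tie-breaking rule (the earliest best cut wins) and the sample frequencies are assumptions made for the example.

```python
def shannon_fano(freqs):
    """Return {symbol: code} built by recursive frequency-balanced splitting."""
    # Order symbols by decreasing frequency (symbol name breaks ties).
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    codes = {sym: "" for sym, _ in ordered}

    def split(group):
        if len(group) < 2:
            return  # a single symbol cannot be split further
        total = sum(w for _, w in group)
        best_cut, best_diff, running = 1, None, 0
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)  # |first part - second part|
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        first, rest = group[:best_cut], group[best_cut:]
        for sym, _ in first:
            codes[sym] += "0"  # 0 for the first set
        for sym, _ in rest:
            codes[sym] += "1"  # 1 for the rest
        split(first)
        split(rest)

    split(ordered)
    return codes


# Hypothetical frequencies, chosen only for illustration.
example_codes = shannon_fano({"a": 9, "b": 8, "c": 6, "d": 5, "e": 4, "f": 2})
```

Where two cuts balance the subsets equally well, a different tie-breaking rule gives a different (equally valid) Shannon-Fano code.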

4-12 Shannon-Fano Coding (2)-(10)
[Figure sequence: step-by-step Shannon-Fano splitting of the symbols a, b, c, d, e, f, ordered by frequency (9, 8, 6, 5, 4, ...). The first split separates {a, b} from {c, d, e, f}; a later split separates {c, d} from {e, f}. The remaining labels are not recoverable.]

13 Shannon-Fano Coding: Remarks
- Shannon-Fano does not always produce optimal prefix codes; a counterexample is the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15}.
- Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected codeword length, under the constraint that each symbol is represented by a code formed of an integral number of bits.
- Symbol-by-symbol Huffman coding only reaches the entropy if the symbols are independent and each probability is a power of one half, i.e. $1/2^n$.

14 Optimum Prefix Codes
Key observations on optimal codes:
  1. Symbols that occur more frequently will have shorter codewords.
  2. The two least frequent symbols will have codewords of the same length.
Proofs:
  1. Assume the opposite: the code is clearly sub-optimal.
  2. Assume the opposite: let X, Y be the least frequent symbols with |code(X)| = k and |code(Y)| = k+1. By unique decodability (UD), code(X) cannot be a prefix of code(Y); also, all other codes are shorter. Dropping the last bit of code(Y) would therefore generate a new, shorter, uniquely decodable code. This contradicts the optimality assumption.

15 Huffman Coding
- David Huffman (1951)
- Grad student of Robert M. Fano (MIT)
- Term paper(!)
- Explained by example:

  Letter  Probability
  a       0.2
  b       0.4
  c       0.2
  d       0.1
  e       0.1

16-32 Huffman Coding by Example
Init: create a set out of each letter.
Repeat until a single set remains:
  1. Sort sets according to probability (lowest first)
  2. Insert prefix 1 into the codes of top set letters
  3. Insert prefix 0 into the codes of the second set letters
  4. Merge the top two sets

  Round  Sorted sets (Set Prob)                 Codes after steps 2 and 3              Merged set
  1      d 0.1, e 0.1, a 0.2, c 0.2, b 0.4      d = 1, e = 0                           de 0.2
  2      de 0.2, a 0.2, c 0.2, b 0.4            d = 11, e = 10, a = 0                  dea 0.4
  3      c 0.2, dea 0.4, b 0.4                  c = 1, d = 011, e = 010, a = 00        cdea 0.6
  4      b 0.4, cdea 0.6                        b = 1, c = 01, d = 0011, e = 0010, a = 000   bcdea 1.0

Final codes: a 000, b 1, c 01, d 0011, e 0010. The END
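The four-step loop of the example can be sketched directly in Python. This is an illustrative sketch only: the function name `huffman_by_sets` is made up, the weights are the example probabilities expressed in tenths (a = 2, b = 4, c = 2, d = 1, e = 1, assumed from the slides), and the tie rule (a newly merged set goes ahead of sets of equal weight) is an assumption about how the sorting step behaves.

```python
def huffman_by_sets(weights):
    """Build Huffman codes by repeatedly merging the two lowest-weight sets."""
    codes = {s: "" for s in weights}
    # One set per letter, sorted lowest weight first (letter breaks ties).
    sets = sorted(((w, [s]) for s, w in weights.items()),
                  key=lambda t: (t[0], t[1]))
    while len(sets) > 1:
        (w1, top), (w2, second) = sets[0], sets[1]
        for s in top:
            codes[s] = "1" + codes[s]   # step 2: prefix 1 for the top set
        for s in second:
            codes[s] = "0" + codes[s]   # step 3: prefix 0 for the second set
        merged = (w1 + w2, top + second)  # step 4: merge the top two sets
        rest = sets[2:]
        # Step 1 for the next round: re-insert the merged set, placing it
        # ahead of any set with the same weight (assumed tie rule).
        pos = 0
        while pos < len(rest) and rest[pos][0] < merged[0]:
            pos += 1
        sets = rest[:pos] + [merged] + rest[pos:]
    return codes


# Probabilities in tenths, so that ties are compared exactly.
example = huffman_by_sets({"a": 2, "b": 4, "c": 2, "d": 1, "e": 1})
```

Integer weights are used instead of floating-point probabilities so that equal-weight ties are detected reliably.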

33 Example Summary
- Average code length: $\bar{l} = 0.4 \times 1 + 0.2 \times 2 + 0.2 \times 3 + 0.1 \times 4 + 0.1 \times 4 = 2.2$ bits/symbol
- Entropy: $H = -\sum_{s=a}^{e} P(s) \log_2 P(s) = 2.122$ bits/symbol
- Redundancy: $\bar{l} - H = 0.078$ bits/symbol
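The three quantities can be checked numerically. The sketch below assumes the example's probabilities are a = 0.2, b = 0.4, c = 0.2, d = 0.1, e = 0.1 and that the codeword lengths are 3, 1, 2, 4, 4; the function name `summary` is chosen for illustration.

```python
import math


def summary(probs, lengths):
    """Return (average code length, entropy, redundancy) in bits/symbol."""
    avg_len = sum(probs[s] * lengths[s] for s in probs)
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return avg_len, entropy, avg_len - entropy


probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}   # assumed example values
lengths = {"a": 3, "b": 1, "c": 2, "d": 4, "e": 4}           # assumed codeword lengths
avg_len, entropy, redundancy = summary(probs, lengths)
print(f"l = {avg_len:.3f}, H = {entropy:.3f}, l - H = {redundancy:.3f}")
```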

34-38 Huffman Tree / Building a Huffman Tree
[Figure sequence: the Huffman tree for the example, with leaves b 0.4, c 0.2, a 0.2, e 0.1, d 0.1; the Letter/Code table is filled in step by step as the branches are labelled.]

39-42 An Alternative Huffman Tree
[Figure sequence: a second Huffman tree for the same five letters, with its Letter/Code table filled in step by step.]
Average code length: $\bar{l} = 0.4 \times 1 + (0.2 + 0.2 + 0.1 + 0.1) \times 3 = 2.2$ bits/symbol

43 Yet Another Tree
[Figure: a third Huffman tree for the same five letters, with its Letter/Code table.]
Average code length: $\bar{l} = 0.4 \times 2 + (0.2 + 0.2) \times 2 + (0.1 + 0.1) \times 3 = 2.2$ bits/symbol

44 Design Examples

45 Min Variance Huffman Trees
- Huffman codes are not unique
- All versions yield the same average length
- Which one should we choose?
  - The one with the minimum variance in codeword lengths, i.e. with the minimum height tree
- Why?
  - It will ensure the least amount of variability in the encoded stream
- How to achieve it?
  - During sorting, break ties by placing smaller sets higher
  - Alternatively, place newly merged sets as low as possible
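To make the selection criterion concrete, the variance of the codeword length can be computed for each candidate tree. The sketch below assumes the example probabilities (a = 0.2, b = 0.4, c = 0.2, d = 0.1, e = 0.1) and three sets of codeword lengths inferred from the average-length formulas of the three trees; the function name and the dictionary keys are illustrative.

```python
def length_stats(probs, lengths):
    """Return (mean, variance) of the codeword length under probs."""
    mean = sum(probs[s] * lengths[s] for s in probs)
    var = sum(probs[s] * (lengths[s] - mean) ** 2 for s in probs)
    return mean, var


probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}  # assumed example values
trees = {  # codeword lengths per tree, assumed from the slides' formulas
    "first tree": {"a": 3, "b": 1, "c": 2, "d": 4, "e": 4},
    "alternative tree": {"a": 3, "b": 1, "c": 3, "d": 3, "e": 3},
    "yet another tree": {"a": 2, "b": 2, "c": 2, "d": 3, "e": 3},
}
for name, lengths in trees.items():
    mean, var = length_stats(probs, lengths)
    print(f"{name}: mean = {mean:.2f}, variance = {var:.2f}")
```

All three trees share the same mean; under these assumed lengths the shallowest tree has the smallest variance.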

46 Extended Huffman Codes
- Consider the source: A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18
- H = 0.816 bits/symbol
- Huffman code: a 0, b 11, c 10
- $\bar{l} = 1.2$ bits/symbol
- Redundancy = 0.384 b/sym (47%!)
- Q: Could we do better?

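47 Extended Huffman Codes (2)
Idea: consider encoding sequences of two letters as opposed to single letters.

  Letter  Probability
  aa      0.6400
  ab      0.0160
  ac      0.1440
  ba      0.0160
  bb      0.0004
  bc      0.0036
  ca      0.1440
  cb      0.0036
  cc      0.0324

[The Code column of the table did not survive transcription.]
$\bar{l} = 1.7228 / 2 = 0.8614$ bits/symbol
Red. = 0.045 bits/symbol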
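The gain from coding pairs can be reproduced with a short calculation. The sketch assumes an independent source with P(a) = 0.8, P(b) = 0.02, P(c) = 0.18, and uses the fact that the expected Huffman codeword length equals the sum of the weights of all merged (internal) nodes; `huffman_average_length` is an illustrative name.

```python
import heapq
from itertools import product


def huffman_average_length(probs):
    """Expected codeword length of a Huffman code for the given probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # every merge adds one bit to each symbol beneath it
        heapq.heappush(heap, merged)
    return total


single = {"a": 0.8, "b": 0.02, "c": 0.18}  # assumed source probabilities
# Pair probabilities, assuming successive letters are independent.
pairs = {x + y: single[x] * single[y] for x, y in product(single, repeat=2)}

per_symbol_single = huffman_average_length(single.values())
per_symbol_pairs = huffman_average_length(pairs.values()) / 2
print(f"single letters: {per_symbol_single:.4f} bits/symbol")
print(f"letter pairs:   {per_symbol_pairs:.4f} bits/symbol")
```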

48 Extended Huffman Codes (3)
- The idea can be extended further
  - Consider all possible $n^m$ sequences (we did $3^2$)
- In theory, by considering more sequences we can improve the coding
- In reality, the exponential growth of the alphabet makes this impractical
  - E.g., for length-3 ASCII sequences: $256^3 = 2^{24} = 16$M
  - Most sequences would have zero frequency
- Other methods are needed

49 Adaptive Huffman Coding
- Problem
  - Huffman requires probability estimates
  - This could turn it into a two-pass procedure:
    1. Collect statistics, generate codewords
    2. Perform actual encoding
  - Not practical in many situations, e.g. compressing network transmissions
- Theoretical solution
  - Start with equal probabilities
  - Based on the statistics of the first k symbols (k = 1, 2, ...), regenerate the codewords and encode the (k+1)-st symbol
  - Too expensive in practice
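The "theoretical solution" can be sketched as follows: both sides start from equal counts, rebuild the Huffman code before every symbol, and update the counts afterwards. This is a deliberately naive illustration of why the approach is expensive, not the tree-update algorithm of the following slides; the function names and the five-letter alphabet are assumptions.

```python
import heapq


def build_codes(counts):
    """Deterministic Huffman code for the current counts (needs >= 2 symbols)."""
    # Heap entries: (weight, unique tiebreak, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def adaptive_encode(text, alphabet):
    counts = {s: 1 for s in alphabet}       # start with equal probabilities
    out = []
    for ch in text:
        out.append(build_codes(counts)[ch])  # regenerate codewords every symbol
        counts[ch] += 1
    return "".join(out)


def adaptive_decode(bits, alphabet, n_symbols):
    counts = {s: 1 for s in alphabet}
    out, pos = [], 0
    for _ in range(n_symbols):
        inverse = {code: s for s, code in build_codes(counts).items()}
        word = ""
        while word not in inverse:           # prefix code: first match is the symbol
            word += bits[pos]
            pos += 1
        out.append(inverse[word])
        counts[inverse[word]] += 1           # same update as the encoder
    return "".join(out)


bits = adaptive_encode("aardvark", "adkrv")
print(bits, adaptive_decode(bits, "adkrv", 8))
```

Because the decoder repeats exactly the same count updates as the encoder, the two stay in sync without any side information.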

50 Adaptive Huffman Coding (2)
Basic idea, for an alphabet A = {a_1, ..., a_n}:

  Pick a fixed default binary code for every symbol
  Start with an empty Huffman tree
  Repeat
    Read symbol s from source
    If NYT(s)            // Not Yet Transmitted
      Send NYT, default(s)
      Update tree (and keep it Huffman)
    Else
      Send codeword for s
      Update tree
  Until done

Notes:
- Codewords will change as a function of symbol frequencies
- Encoder & decoder follow the same procedure so they stay in sync

51 Adaptive Huffman Tree
- Tree has at most 2n - 1 nodes
- Node attributes:
  - symbol, left, right, parent, siblings, leaf
  - weight
    - If x_k is a leaf then weight(x_k) = frequency of symbol(x_k)
    - Else weight(x_k) = weight(left(x_k)) + weight(right(x_k))
  - id, assigned as follows:
    - If weight(x_1) ≤ weight(x_2) ≤ ... ≤ weight(x_{2n-1}) then id(x_1) < id(x_2) < ... < id(x_{2n-1})
    - Also, parent(x_{2k-1}) = parent(x_{2k}), for 1 ≤ k < n
- Together these conditions are the sibling property

52 Updating the Tree
- Assign id(root) = 2n - 1, weight(NYT) = 0
- Start with an NYT node
- Whenever a new symbol is seen, a new node is formed by splitting the NYT node
- Maintaining the sibling property, whenever node x is updated:

  Repeat
    If weight(x) < weight(y), for all y ∈ siblings(x)
      weight(x)++
      exit
    Else
      swap(x, z), where z is the rightmost sibling with weight(x) == weight(z)
      weight(x)++
    x = parent(x)
  Until x == root

53-67 Adaptive Huffman Encoding
Input: aardvark
[Figure sequence: the adaptive Huffman tree and the Symbol/Code table (NYT, a, r, d, v, k) after each input symbol, with the encoder output growing step by step. Tree weights, node ids and output bits are not recoverable.]
Default codes: fixed 5-bit binary index of each letter (a = 00000, b = 00001, c = 00010, ..., y = 11000). Slightly more efficient default codes are possible (4-/5-bit combination).

68-82 Adaptive Huffman Decoding
[Figure sequence: the decoder rebuilds the same tree and Symbol/Code table (NYT, a, r, d, v, k) while consuming the input bits, using the same 5-bit default codes as the encoder. The output grows as a, aa, aar, aard, aardv, aardva, aardvar, aardvark. Input bits, tree weights and node ids are not recoverable.]

83 Dealing with Counter Overflow
- Over time counters can overflow
  - E.g., a 32-bit counter holds ~4 billion: BIG but still finite, and it can overflow on long network connections
- Solution?
  - Rescale all frequency counts (of leaf nodes) when the limit is reached, e.g. divide all of them by two
  - Re-compute the rest of the tree (keep it Huffman!)
- Note: after rescaling, new symbols will count twice as much as old ones!
  - This is mostly a feature, not a bug: data tends to have strong local correlation, i.e. what happened a long time ago is not as important as what happened more recently
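A minimal sketch of the rescaling step, under stated assumptions: the leaf counts live in a dictionary, `limit` is a hypothetical threshold, and counts are kept at a minimum of 1 so that no seen symbol drops back to zero (the slides do not say how zeros are handled). Rebuilding the tree from the new counts is left out.

```python
def rescale_counts(counts, limit):
    """Halve all leaf counts once any of them reaches the limit."""
    if max(counts.values()) < limit:
        return dict(counts)  # nothing to do yet
    # Integer halving; floor at 1 is an assumption, not from the slides.
    return {sym: max(1, c // 2) for sym, c in counts.items()}


counts = {"a": 8, "b": 3, "c": 1}
print(rescale_counts(counts, limit=8))
```

After rescaling, each new occurrence adds 1 to a count that was just halved, which is why recent symbols weigh twice as much as older ones.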

84 Huffman Image Compression
- Example images: 256x256 pixels, 8 bits/pixel, 65,536 bytes each: Sena, Sensin, Earth, Omaha
- Huffman coding of pixel values
[Table: Image, Bits/pixel, Size (bytes), Compression Ratio for Sena, Sensin, Earth and Omaha. Most entries did not survive transcription; the Omaha row gives a size of 58,374 bytes.]

85 Huffman Image Compression (2)
- Basic observations
  - Plain Huffman coding yields modest gains, except in the Earth case, where lots of black skews the pixel distribution nicely
  - We are not taking into account obvious correlations of pixel values
- Huffman coding of pixel differences
[Table: Image, Bits/pixel, Size (bytes), Compression Ratio for Sena, Sensin, Earth and Omaha. The entries are not fully recoverable from the transcription.]
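Why coding differences helps can be seen on a toy signal. The sketch below compares the first-order entropy of a slowly varying row of pixel values with the entropy of its neighbour differences; the synthetic ramp and the function names are made up for illustration and are not taken from the test images.

```python
import math
from collections import Counter


def entropy(values):
    """First-order entropy of a sequence, in bits per value."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def differences(row):
    """Keep the first value, then store each value minus its left neighbour."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]


row = [10 + i // 2 for i in range(64)]  # synthetic, slowly increasing "pixels"
print(f"values:      {entropy(row):.2f} bits/pixel")
print(f"differences: {entropy(differences(row)):.2f} bits/pixel")
```

The values are spread evenly over many levels, while the differences concentrate on a few small numbers, which is the kind of skewed distribution that Huffman coding rewards.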

86 Two-pass Huffman vs. Adaptive Huffman
[Two tables, each with columns Image, Bits/pixel, Size (bytes), Compression Ratio for Sena, Sensin, Earth and Omaha: "Two-pass" (repeating the pixel-difference results of the previous slide) and "Adaptive". The entries are not fully recoverable from the transcription.]

87 Huffman Text Compression
[Figure: PDF(letters), US Constitution vs. Chapter 3. Probability of each letter A to Z, plotted as P(Constitution) and P(Chapter).]

88 Huffman Audio Compression
- Huffman coding: 16-bit CD audio (44,100 Hz) x 2 channels
[Table: File Name, Original File Size (bytes), Entropy (bits), Est. Compressed File Size (bytes), Compression Ratio for the files Mozart, Cohn and Mir. The entries are not fully recoverable from the transcription.]
- Difference Huffman coding
[Table: File Name, Original File Size (bytes), Entropy of Diff. (bits), Est. Compressed File Size (bytes), Compression Ratio for the same three files. The entries are not fully recoverable from the transcription.]

89 Golomb Codes
- Invented by Solomon W. Golomb in the 1960s
- Golomb coding is optimal for the geometric distribution
- Rice coding
  - A Golomb code has a tunable parameter that can be any positive value; Rice codes are those in which the tunable parameter is a power of two
- Unary code
  - The unary representation of the number, followed by a single terminating bit
  - Identical to the Huffman code for {1, 2, 3, ...} and $P(k) = 1/2^k$
  - Optimal for that probability model

90 Golomb Codes (2)
- Uses a tunable parameter m to divide an input value into the quotient and the remainder
- To represent n, we compute
  - $q = \lfloor n/m \rfloor$ (quotient)
  - $r = n - qm$ (remainder)
- Represent q in unary code, followed by r in $\lceil \log_2 m \rceil$ bits
- If m is not a power of 2, then some remainders can use $\lfloor \log_2 m \rfloor$ bits: truncated binary encoding
  - $\lfloor \log_2 m \rfloor$-bit representation of r for $0 \le r \le 2^{\lceil \log_2 m \rceil} - m - 1$
  - $\lceil \log_2 m \rceil$-bit representation of $r + 2^{\lceil \log_2 m \rceil} - m$ for the rest

91 Golomb Codes: Truncated binary coding
- An entropy encoding typically used for uniform probability distributions with a finite alphabet
- A more general form of binary encoding when n is not a power of two
- Coding (a prefix code):
  - For $2^k \le n < 2^{k+1}$, there are $u = 2^{k+1} - n$ unused entries
  - k-bit codes for $0 \le r \le u - 1$
  - (k+1)-bit codes for the rest, obtained by encoding r + u

Example, n = 5 (k = 2, u = 3; the standard 3-bit codes 5, 6, 7 are unused):

  Input value  Offset value  Standard binary  Truncated binary
  0            0             000              00
  1            1             001              01
  2            2             010              10
  3            6             011              110
  4            7             100              111

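92 Golomb Codes: Truncated binary coding
Tables rebuilt from the rule on the previous slide (offset value = input value for the short codes, input value + u for the long ones).

n = 7 (k = 2, u = 1):

  Input value  Offset value  Standard binary  Truncated binary
  0            0             000              00
  1            2             001              010
  2            3             010              011
  3            4             011              100
  4            5             100              101
  5            6             101              110
  6            7             110              111

n = 10 (k = 3, u = 6):

  Input value  Offset value  Standard binary  Truncated binary
  0            0             0000             000
  1            1             0001             001
  2            2             0010             010
  3            3             0011             011
  4            4             0100             100
  5            5             0101             101
  6            12            0110             1100
  7            13            0111             1101
  8            14            1000             1110
  9            15            1001             1111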
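The truncated binary rule translates into a few lines of Python. This is a sketch under the definitions above (k = floor(log2 n), u = 2^(k+1) - n); the function name `truncated_binary` is an assumption.

```python
def truncated_binary(r, n):
    """Truncated binary codeword for r in 0..n-1, alphabet size n."""
    if not 0 <= r < n:
        raise ValueError("r must satisfy 0 <= r < n")
    k = n.bit_length() - 1      # k = floor(log2 n)
    u = (1 << (k + 1)) - n      # number of unused (k+1)-bit codewords
    if r < u:
        # Short codeword: r itself in k bits (empty string when k == 0).
        return format(r, f"0{k}b") if k > 0 else ""
    # Long codeword: r + u in k+1 bits.
    return format(r + u, f"0{k + 1}b")


for n in (5, 7, 10):
    print(n, [truncated_binary(r, n) for r in range(n)])
```

When n is a power of two, u equals n, so every value takes the short branch and the result is the ordinary k-bit binary code.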

93-104 Golomb Code Example
m = 6: $\lfloor \log_2 m \rfloor = 2$, $\lceil \log_2 m \rceil = 3$
- 2-bit codes for $0 \le r < 2^3 - 6 = 2$: the binary code of r
- 3-bit codes of $r + 2^3 - 6$ for the rest: the binary code of r + 2
[Table, built up over the slide sequence: n, q, r, Codeword. The entries did not survive transcription.]
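A Golomb encoder for this example can be sketched as below. The unary convention (q ones followed by a terminating zero) is an assumption, since the slides' own codewords are not available; the function names are illustrative.

```python
def truncated_binary(r, m):
    """Remainder code: floor(log2 m) bits for small r, one more bit otherwise."""
    k = m.bit_length() - 1
    u = (1 << (k + 1)) - m
    if r < u:
        return format(r, f"0{k}b") if k > 0 else ""
    return format(r + u, f"0{k + 1}b")


def golomb_encode(n, m):
    """Golomb codeword of the non-negative integer n with parameter m."""
    q, r = divmod(n, m)            # q = floor(n / m), r = n - q * m
    unary = "1" * q + "0"          # assumed convention: q ones, then a zero
    return unary + truncated_binary(r, m)


for n in range(12):
    q, r = divmod(n, 6)
    print(n, q, r, golomb_encode(n, 6))
```

With the opposite unary convention (zeros terminated by a one) the codeword lengths are the same; only the leading bits are inverted.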

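105 Golomb Codes: Choosing m
- Assume a binary string (zeroes & ones)
- It can be encoded by counting the runs of identical bits (either zeroes or ones), a.k.a. run-length encoding (RLE)
- E.g., for a string containing 35 zeroes, P(0) = 35 / (35 + number of ones) [the bit string, its run lengths and the number of ones did not survive transcription]
- With p = P(0): $m = \left\lceil -\dfrac{\log_2(1 + p)}{\log_2 p} \right\rceil$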
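The parameter choice can be sketched as a one-line function, assuming the rule m = ceil(-log2(1 + p) / log2 p) with p the probability of the repeated bit. The function name and the sample values of p are illustrative, not taken from the slide's example.

```python
import math


def golomb_parameter(p):
    """Golomb parameter m for runs whose continuation probability is p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return max(1, math.ceil(-math.log2(1 + p) / math.log2(p)))


for p in (0.5, 0.75, 0.9):
    print(p, golomb_parameter(p))
```

Longer expected runs (p closer to 1) give a larger m, so that the unary part of the codeword stays short.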

106 Summary
- Early: Shannon-Fano code
- Huffman code
  - Original (two-pass) version
    - Collect symbol statistics, assign codes
    - Perform actual encoding of the source
  - Extended version
    - Group multiple symbols to reduce the entropy estimate
  - Adaptive version
    - Most practical: build the Huffman tree on the fly
    - Single pass
    - Escape codes for NYT symbols
    - Encoder & decoder are synchronized
    - More sensitive to local variation, tends to forget older data
- Homeworks (pp. 78) 4, 5, 6, 0.

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE General e Image Coder Structure Motion Video x(s 1,s 2,t) or x(s 1,s 2 ) Natural Image Sampling A form of data compression; usually lossless, but can be lossy Redundancy Removal Lossless compression: predictive

More information

CSEP 521 Applied Algorithms Spring Statistical Lossless Data Compression

CSEP 521 Applied Algorithms Spring Statistical Lossless Data Compression CSEP 52 Applied Algorithms Spring 25 Statistical Lossless Data Compression Outline for Tonight Basic Concepts in Data Compression Entropy Prefix codes Huffman Coding Arithmetic Coding Run Length Coding

More information

Shannon-Fano-Elias coding

Shannon-Fano-Elias coding Shannon-Fano-Elias coding Suppose that we have a memoryless source X t taking values in the alphabet {1, 2,..., L}. Suppose that the probabilities for all symbols are strictly positive: p(i) > 0, i. The

More information

Data Compression Techniques (Spring 2012) Model Solutions for Exercise 2

Data Compression Techniques (Spring 2012) Model Solutions for Exercise 2 582487 Data Compression Techniques (Spring 22) Model Solutions for Exercise 2 If you have any feedback or corrections, please contact nvalimak at cs.helsinki.fi.. Problem: Construct a canonical prefix

More information

CMPT 365 Multimedia Systems. Lossless Compression

CMPT 365 Multimedia Systems. Lossless Compression CMPT 365 Multimedia Systems Lossless Compression Spring 2017 Edited from slides by Dr. Jiangchuan Liu CMPT365 Multimedia Systems 1 Outline Why compression? Entropy Variable Length Coding Shannon-Fano Coding

More information

Lec 03 Entropy and Coding II Hoffman and Golomb Coding

Lec 03 Entropy and Coding II Hoffman and Golomb Coding CS/EE 5590 / ENG 40 Special Topics Multimedia Communication, Spring 207 Lec 03 Entropy and Coding II Hoffman and Golomb Coding Zhu Li Z. Li Multimedia Communciation, 207 Spring p. Outline Lecture 02 ReCap

More information

Lecture 3 : Algorithms for source coding. September 30, 2016

Lecture 3 : Algorithms for source coding. September 30, 2016 Lecture 3 : Algorithms for source coding September 30, 2016 Outline 1. Huffman code ; proof of optimality ; 2. Coding with intervals : Shannon-Fano-Elias code and Shannon code ; 3. Arithmetic coding. 1/39

More information

Chapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code

Chapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code Chapter 3 Source Coding 3. An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code 3. An Introduction to Source Coding Entropy (in bits per symbol) implies in average

More information

Multimedia. Multimedia Data Compression (Lossless Compression Algorithms)

Multimedia. Multimedia Data Compression (Lossless Compression Algorithms) Course Code 005636 (Fall 2017) Multimedia Multimedia Data Compression (Lossless Compression Algorithms) Prof. S. M. Riazul Islam, Dept. of Computer Engineering, Sejong University, Korea E-mail: riaz@sejong.ac.kr

More information

Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression

Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression Kirkpatrick (984) Analogy from thermodynamics. The best crystals are found by annealing. First heat up the material to let

More information

Entropy as a measure of surprise

Entropy as a measure of surprise Entropy as a measure of surprise Lecture 5: Sam Roweis September 26, 25 What does information do? It removes uncertainty. Information Conveyed = Uncertainty Removed = Surprise Yielded. How should we quantify

More information

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols.

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols. Universal Lossless coding Lempel-Ziv Coding Basic principles of lossless compression Historical review Variable-length-to-block coding Lempel-Ziv coding 1 Basic Principles of Lossless Coding 1. Exploit

More information

Compression and Coding

Compression and Coding Compression and Coding Theory and Applications Part 1: Fundamentals Gloria Menegaz 1 Transmitter (Encoder) What is the problem? Receiver (Decoder) Transformation information unit Channel Ordering (significance)

More information

CSEP 590 Data Compression Autumn Arithmetic Coding

CSEP 590 Data Compression Autumn Arithmetic Coding CSEP 590 Data Compression Autumn 2007 Arithmetic Coding Reals in Binary Any real number x in the interval [0,1) can be represented in binary as.b 1 b 2... where b i is a bit. x 0 0 1 0 1... binary representation

More information

Information Theory and Statistics Lecture 2: Source coding

Information Theory and Statistics Lecture 2: Source coding Information Theory and Statistics Lecture 2: Source coding Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Injections and codes Definition (injection) Function f is called an injection

More information

CS4800: Algorithms & Data Jonathan Ullman

CS4800: Algorithms & Data Jonathan Ullman CS4800: Algorithms & Data Jonathan Ullman Lecture 22: Greedy Algorithms: Huffman Codes Data Compression and Entropy Apr 5, 2018 Data Compression How do we store strings of text compactly? A (binary) code

More information

Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet)

Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet) Compression Motivation Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet) Storage: Store large & complex 3D models (e.g. 3D scanner

More information

Lecture 4 : Adaptive source coding algorithms

Lecture 4 : Adaptive source coding algorithms Lecture 4 : Adaptive source coding algorithms February 2, 28 Information Theory Outline 1. Motivation ; 2. adaptive Huffman encoding ; 3. Gallager and Knuth s method ; 4. Dictionary methods : Lempel-Ziv

More information

3F1 Information Theory, Lecture 3

3F1 Information Theory, Lecture 3 3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2013, 29 November 2013 Memoryless Sources Arithmetic Coding Sources with Memory Markov Example 2 / 21 Encoding the output

More information

Digital communication system. Shannon s separation principle

Digital communication system. Shannon s separation principle Digital communication system Representation of the source signal by a stream of (binary) symbols Adaptation to the properties of the transmission channel information source source coder channel coder modulation

More information

SIGNAL COMPRESSION Lecture Shannon-Fano-Elias Codes and Arithmetic Coding

SIGNAL COMPRESSION Lecture Shannon-Fano-Elias Codes and Arithmetic Coding SIGNAL COMPRESSION Lecture 3 4.9.2007 Shannon-Fano-Elias Codes and Arithmetic Coding 1 Shannon-Fano-Elias Coding We discuss how to encode the symbols {a 1, a 2,..., a m }, knowing their probabilities,

More information

UNIT I INFORMATION THEORY. I k log 2

UNIT I INFORMATION THEORY. I k log 2 UNIT I INFORMATION THEORY Claude Shannon 1916-2001 Creator of Information Theory, lays the foundation for implementing logic in digital circuits as part of his Masters Thesis! (1939) and published a paper

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2006 jzhang@cse.unsw.edu.au
