Entropy Coding. Connectivity coding. Entropy coding. Definitions. Lossless coder. Input: a set of symbols. Output: bitstream. Idea


Griselda Farmer
5 years ago

Entropy Coding

Digital Geometry Processing - Spring 8, Technion

Connectivity coding
Connectivity data -> TG output or Edgebreaker output -> entropy coder -> output
- TG output: Add 7, Add 6, Add 7, Add 5, ...
- Edgebreaker output: CRRRLSLECRRE...

Entropy coding
Lossless coder
- Input: a set of symbols
- Output: bitstream
Idea
- Assign each symbol a series of bits
- Use fewer bits for common symbols

Definitions
- Alphabet: finite set containing at least one element, e.g. $A = \{a, b, c, d, e\}$
- Symbol: element of the alphabet, $s_i \in A$
- A string over the alphabet: sequence of symbols from the alphabet, e.g. $S = ccdabdcaad$
- Codeword: bits representing a coded symbol or string
- $p_i$: occurrence probability of $s_i$ in the input string, $p_i = P(s_i \in S)$, $\sum_i p_i = 1$
- $L_i$: length of the codeword of $s_i$, in bits
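As a small illustration of the definitions above, the occurrence probabilities $p_i$ can be estimated by counting symbols in the example string S = ccdabdcaad. This is only a sketch; the function name `symbol_probabilities` is made up for this example and does not come from the slides.

```python
from collections import Counter

def symbol_probabilities(s):
    """Return {symbol: relative frequency} for the symbols that occur in s."""
    counts = Counter(s)
    total = len(s)
    return {sym: n / total for sym, n in counts.items()}

# Example string S from the Definitions slide.
probs = symbol_probabilities("ccdabdcaad")
for sym in sorted(probs):
    print(sym, probs[sym])
```

The symbol e belongs to the alphabet but never occurs in S, so it gets no entry here; its estimated probability is 0.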

Entropy
Entropy of the set $\{e_1, \ldots, e_n\}$ with probabilities $\{p_1, \ldots, p_n\}$:
$$H(p_1, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
- $-\log_2 p_i$ is the uncertainty in symbol $e_i$: the surprise when we see this symbol
- Entropy is the average surprise over all symbols
In our context
- Minimal number of bits, on average, needed to represent a symbol
- Average over all symbols' code lengths
- Assumes no dependencies between symbol appearances

Entropy example
Entropy calculation for a two-symbol alphabet. Example 1: $p_1 = 0.5$, $p_2 = 0.5$
$$H(p_1, p_2) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 0.5 + 0.5 = 1$$
We need 1 bit per symbol on average to represent the data.

Entropy example
Entropy calculation for a two-symbol alphabet. Example 2: $p_1 = 0.8$, $p_2 = 0.2$
$$H(p_1, p_2) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 = -0.8 \log_2 0.8 - 0.2 \log_2 0.2 \approx 0.7219$$
We need LESS than 1 bit per symbol on average.

Entropy examples
- Entropy of $\{e_1, \ldots, e_n\}$ is maximized when $p_1 = p_2 = \ldots = p_n = 1/n$: $H(e_1, \ldots, e_n) = \log_2 n$
  - No symbol is better than the other or contains more information
  - $2^k$ symbols must be represented by $k$ bits
- Entropy of $\{e_1, \ldots, e_n\}$ is minimized when $p_1 = 1$, $p_2 = \ldots = p_n = 0$: $H(e_1, \ldots, e_n) = 0$
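The entropy examples above can be checked numerically. The following is a minimal sketch; the function name `entropy` and the variable names are chosen for this illustration, and terms with zero probability are skipped on the usual convention that they contribute nothing.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; terms with p == 0 are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

h_even = entropy([0.5, 0.5])      # two equally likely symbols
h_skewed = entropy([0.8, 0.2])    # one symbol dominates
h_uniform = entropy([1 / 8] * 8)  # uniform over 2^3 symbols
h_certain = entropy([1.0, 0.0])   # one symbol is certain

print(round(h_even, 4), round(h_skewed, 4), round(h_uniform, 4))
```

The uniform case over 8 symbols gives 3 bits, matching the remark that $2^k$ equally likely symbols need $k$ bits each.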

Entropy coding
Entropy
- Lower bound on the average number of bits needed for the alphabet
- Data compression limit
Coding efficiency = Bits Per Symbol (BPS)
$$\mathrm{BPS} = \frac{\text{length(encoded message)}}{\text{length(original message)}}$$
with the encoded length measured in bits and the original length in symbols.
Entropy coding methods
- Try to achieve the entropy of the alphabet: bring BPS as close to the entropy as possible
- If BPS = Entropy, the code is optimal

Code types
- Fixed-length codes: all codewords have the same length (number of bits)
- Variable-length codes: codewords can have different lengths
[Example codeword tables for the symbols A, B, C, D, E, F; the bit patterns are not recoverable]

Code types
- Prefix code: no codeword is a prefix of any other codeword
  [Example codewords for A, B, C, D; the bit patterns are not recoverable]
- Uniquely decodable code: has only one possible source string producing it, so it is unambiguously decoded
  - Examples: prefix code (end of codeword recognized without ambiguity), fixed-length code

Huffman code
- Variable-length prefix code
- Codeword chosen by probability of appearance: high probability, short codeword
- Integral number of bits per codeword
- Optimal variable-length prefix code for known probabilities
- Encoding/decoding done using the Huffman tree
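The prefix property and the bits-per-symbol measure defined above are easy to check in code. The sketch below is illustrative only: the three code tables are hypothetical examples (the codewords on the original slides are not available), and the helper names `is_prefix_code` and `bits_per_symbol` are made up here.

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another (codewords are bit strings)."""
    words = sorted(codewords)
    # After lexicographic sorting, any prefix relation appears between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def bits_per_symbol(message, table):
    """Encoded length in bits divided by message length in symbols."""
    return sum(len(table[sym]) for sym in message) / len(message)

# Hypothetical code tables, not taken from the slides.
fixed = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable = {"A": "0", "B": "10", "C": "110", "D": "111"}
ambiguous = {"A": "0", "B": "01", "C": "11"}  # "0" is a prefix of "01"

message = "AAAABBCD"
bps_fixed = bits_per_symbol(message, fixed)
bps_variable = bits_per_symbol(message, variable)
```

For this message the fixed-length table spends 2 bits per symbol, while the variable-length table spends fewer because the frequent symbol A has the shortest codeword.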

Huffman tree example
- Codeword determined according to the path from the root to the symbol
- When decoding, a tree traversal is performed, starting from the root
- Example: decoding an input codeword yields D
[Figure: Huffman tree with the probabilities and codewords of the symbols A, B, C, D, E; the values are not recoverable]

Huffman encoding example
- Use the previous codewords to encode a four-symbol string
- Number of bits used: 9
- The BPS is (9 bits / 4 symbols) = 2.25
- Entropy: $-0.25 \log_2 0.25 - 0.25 \log_2 0.25 - 0.2 \log_2 0.2 - 0.15 \log_2 0.15 - 0.15 \log_2 0.15 \approx 2.285$
- BPS lower than entropy. WHY?

Huffman tree construction
Initialization
- Leaf for each symbol s of the alphabet, with weight $p_s$
- Can work instead with integer weights: number of occurrences
Loop
    while (tree not connected) do
        Y, Z <- lowest_root_weights_tree()
        r <- new_root
        r->attachsons(Y, Z)    // attach one via a 0, the other via a 1; order not significant
        weight(r) = weight(Y) + weight(Z)
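The construction loop above can be sketched with a binary heap as the priority queue. This is one possible implementation under stated assumptions, not the course's own code: the function name `huffman_code`, the tie-breaking counter, and the integer weights in the usage lines are all choices made for this illustration. It assumes a non-empty alphabet whose symbols are strings.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Build a code table {symbol: bit string} from {symbol: weight}."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    # Heap entry: (weight, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-symbol alphabet
        return {heap[0][2]: "0"}
    while len(heap) > 1:  # "tree not connected"
        w_y, _, y = heapq.heappop(heap)  # the two roots of lowest weight
        w_z, _, z = heapq.heappop(heap)
        heapq.heappush(heap, (w_y + w_z, next(tiebreak), (y, z)))
    table = {}

    def walk(tree, prefix):
        # Internal node: one son via a 0, the other via a 1.
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            table[tree] = prefix

    walk(heap[0][2], "")
    return table

# Assumed integer weights (occurrence counts) for five symbols.
table = huffman_code({"A": 25, "B": 25, "C": 20, "D": 15, "E": 15})
lengths = {sym: len(word) for sym, word in table.items()}
```

With these weights the three heaviest symbols get 2-bit codewords and the two lightest get 3-bit codewords. The exact bit patterns depend on how ties are broken, which is why a Huffman code is not unique.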

Huffman encoding
- Build a table of per-symbol encodings, generated from the Huffman tree. The table is either:
  - globally known to both encoder and decoder, or
  - sent by the encoder and read by the decoder
- Encode one symbol after the other, using the encoding table
- Encode the pseudo-EOF symbol

Huffman decoding
- Construct the decoding tree based on the encoding table
- Read the coded message bit by bit
- Traverse the tree from top to bottom accordingly
- When a leaf is reached, a codeword has been found and the corresponding symbol is decoded
- Repeat until the pseudo-EOF symbol is reached
- No ambiguities: prefix code

Symbol probabilities
How are the probabilities known?
- Counting symbols in the input string
  - Data must be given in advance
  - Requires an extra pass over the input string
- The data source's distribution is known
  - Data not known in advance, but distribution is known
  - Example: global English frequencies table
[Table: probability of each letter A to Z in English text, with total; the values are not recoverable]
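The Huffman encoding and decoding steps listed above can be sketched as follows. Everything specific here is an assumption made for illustration: the code table, the `EOF` marker string, and the helper names `encode`, `build_tree` and `decode` do not come from the slides. Symbols are assumed to be strings, and bits are kept as a string of "0"/"1" characters for readability rather than packed into bytes.

```python
EOF = "<eof>"  # hypothetical pseudo-EOF symbol

def encode(message, table):
    """Concatenate the codewords of all symbols, then the pseudo-EOF codeword."""
    return "".join(table[sym] for sym in message) + table[EOF]

def build_tree(table):
    """Rebuild the decoding tree as nested dicts keyed by '0'/'1'; leaves are symbols."""
    root = {}
    for sym, word in table.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = sym
    return root

def decode(bits, table):
    """Walk the tree bit by bit; emit a symbol at each leaf, stop at pseudo-EOF."""
    root = build_tree(table)
    node, out = root, []
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):  # reached a leaf
            if node == EOF:
                break
            out.append(node)
            node = root  # restart from the root for the next codeword
    return "".join(out)

# Assumed prefix-code table shared by encoder and decoder.
table = {"A": "0", "B": "10", "C": "110", EOF: "111"}
bits = encode("ABACA", table)
decoded = decode(bits, table)
```

Because decoding stops at the pseudo-EOF codeword, any padding bits appended after it (for example to fill the last byte) are ignored.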

Huffman entropy analysis
- Best results, entropy-wise: only when occurrence probabilities are negative powers of 2 (i.e. ½, ¼, ...)
- Otherwise, BPS > entropy bound
Example
- Symbols A, B, C, D with probabilities ½, ¼, ⅛, ⅛
- Entropy = 1.75
- An input stream of 8 symbols which represents the probabilities is encoded in 14 bits
- BPS = (14 bits / 8 symbols) = 1.75
[Table: symbol, probability and codeword for A, B, C, D; the codeword bits are not recoverable]

Huffman tree construction complexity
- Simple implementation: $O(n^2)$
- Using a priority queue: $O(n \log n)$
  - Inserting a new node: $O(\log n)$
  - n node insertions: $O(n \log n)$
  - Retrieving the smallest node weights: $O(\log n)$

Huffman summary
- Achieves entropy when occurrence probabilities are negative powers of 2
- Alphabet and distribution must be known in advance
- Given the Huffman tree, very easy (and fast) to encode and decode
- Huffman code is not unique (arbitrary decisions in tree construction)

Better than Huffman?
Huffman is optimal, so how can we improve?
- Use a fractional number of bits per codeword: arithmetic coding
- Learn probabilities from the bit stream: Lempel-Ziv coding
  - Unknown alphabet
  - Unknown probabilities
  - Handles dependencies between symbols
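The claim in the Huffman entropy analysis, that the entropy is reached only when probabilities are negative powers of 2, can be illustrated by comparing the expected codeword length with the entropy. This is a sketch under assumptions: both distributions and their codeword lengths are supplied by hand (the lengths are those a Huffman tree assigns to these distributions), and the helper names are made up for this example.

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Expected codeword length in bits per symbol."""
    return sum(p * n for p, n in zip(probs, lengths))

# Probabilities that are negative powers of 2: each length equals -log2(p).
dyadic_probs = [0.5, 0.25, 0.125, 0.125]
dyadic_lengths = [1, 2, 3, 3]

# Assumed non-dyadic distribution with its Huffman codeword lengths.
other_probs = [0.25, 0.25, 0.2, 0.15, 0.15]
other_lengths = [2, 2, 2, 3, 3]

dyadic_gap = average_length(dyadic_probs, dyadic_lengths) - entropy(dyadic_probs)
other_gap = average_length(other_probs, other_lengths) - entropy(other_probs)
```

The gap is zero in the dyadic case and a small positive number otherwise, because an integral number of bits per codeword cannot match a non-integer $-\log_2 p_i$.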


Compression Motivation
- Bandwidth: communicate large, complex and highly detailed 3D models through a low-bandwidth connection (e.g. VRML over the Internet)
- Storage: store large and complex 3D models (e.g. 3D scanner