Basic Principles of Lossless Coding. Universal Lossless Coding. Lempel-Ziv Coding.

Sheila Mason
6 years ago
Universal Lossless Coding: Lempel-Ziv Coding

Basic principles of lossless compression
Historical review
Variable-length-to-block coding
Lempel-Ziv coding

Basic Principles of Lossless Coding

1. Exploit unequal probabilities.
2. Exploit dependences between successive symbols.

Variable-length codes

1. Exploiting Unequal Probabilities

The code must be uniquely decodable. We can restrict attention to prefix codes (without loss of optimality).

Table (numeric entries lost in extraction): columns are source symbol, probability $p_i$, binary codeword $w_i$, and length $L_i$; the source symbols are a, b, c, d, e, f, g.

$R$ = avg. codeword length = $\sum_{i=1}^{M} p_i L_i$ = 2.7 bits/symbol

Compare to $\lceil \log_2 7 \rceil = 3$ or $\log_2 7 \approx 2.81$.

Entropy Theory

Q. What sets of lengths can a code have?
A. Kraft inequality theorem: $\{L_1, \ldots, L_M\}$ are the lengths of a (prefix) code iff $\sum_{i=1}^{M} 2^{-L_i} \le 1$.

Q. What is the smallest average length of any code?
A. Follows from the Kraft inequality: $H(P) \le \bar{L}^* < H(P) + 1$, where $\bar{L}^*$ = minimum average length.

Key ideas:
$H(P) = -\sum_{i=1}^{M} p_i \log p_i$ = entropy of $P = \{p_1, \ldots, p_M\}$
$L_i \approx -\log p_i$
$H(P)$ measures "inequity" among probabilities.

Previous example: $H(P) = 2.67$, $\bar{L}^* =$ [value lost in extraction]
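The Kraft inequality and the entropy bound above can be checked numerically. The following is a minimal sketch, not the code from the slide: the 7-symbol distribution `probs` is a hypothetical example (the slide's own table values are not available), and the helper names `kraft_sum`, `entropy` and `huffman_lengths` are illustrative. A Huffman code is used here as one standard way to obtain an optimal prefix code.

```python
import heapq
from math import log2

def kraft_sum(lengths):
    """Sum of 2^(-L_i); prefix-code lengths must satisfy sum <= 1."""
    return sum(2.0 ** -L for L in lengths)

def entropy(probs):
    """H(P) = -sum p_i log2 p_i, in bits/symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for the given probabilities."""
    # Heap entries: (probability, unique tie-breaker, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword below them.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

# Hypothetical distribution, assumed for illustration only.
probs = [0.25, 0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625]
lengths = huffman_lengths(probs)
avg_len = sum(p * L for p, L in zip(probs, lengths))
H = entropy(probs)
print(lengths, kraft_sum(lengths), H, avg_len)
```

Because the assumed probabilities are all powers of 1/2, the average length meets the entropy exactly; for other distributions it falls somewhere in the interval from $H(P)$ to $H(P) + 1$.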

2. Exploiting Redundancy (Symbol Dependences)

Assume: the source is a stationary random process $X = \{X_k\}$, and the $X_k$'s are dependent random variables.

Block-to-variable-length prefix code

Table (codewords and lengths lost in extraction; probabilities truncated to "1/"): columns are source symbol pair, probability $p_{i,j}$, binary codeword $w_{i,j}$, and length $L_{i,j}$; the source symbol pairs are aa, ab, ac, ba, bb, bc, ca, cb, cc.

Average length: $\bar{L} = \sum_{i,j=1}^{M} p(i,j)\, L_{i,j} = 3.17$

Rate: $R = \bar{L}/2 = 1.59$ bits/symbol

Comparison: coding one symbol at a time gives $\bar{L}^* = 1.67$, $H(P) = 1.58$ ($P = \{1/3, 1/3, 1/3\}$).

Entropy Theory

Q. What is the smallest rate of a block-to-variable-length code with blocklength K?
A. From the previous result, for blocklength K:

$H_K(X) \le R_K^* < H_K(X) + \frac{1}{K}$

where
$R_K^*$ = minimum average rate,
Rate = (avg. length)/K = $\frac{1}{K} \sum_{x} p(x) L(x)$, with $x = (x_1, \ldots, x_K)$,
$H_K(X) = -\frac{1}{K} \sum_{x} p(x) \log p(x)$ = Kth-order entropy.

$H_K(X) \to H_\infty(X)$ = "entropy-rate" as $K \to \infty$. (Inequity among joint probabilities increases with K.)

Q. What is the smallest rate of a block-to-variable-length code with any blocklength?
A. $R^* = H_\infty(X)$

Example: English text, $H_\infty(X) \approx 1.3$ bits/sample
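To see the Kth-order entropy approach the entropy-rate, the definition above can be evaluated by brute force for a small source. This is a sketch under assumptions: the two-symbol Markov source below (uniform start, probability 0.9 of switching symbols) is chosen for illustration, and `block_entropy_rate` is a hypothetical helper name.

```python
from itertools import product
from math import log2

def block_entropy_rate(K, init, trans):
    """H_K(X) = -(1/K) * sum over all length-K blocks x of p(x) log2 p(x)."""
    symbols = list(init)
    total = 0.0
    for block in product(symbols, repeat=K):
        # Probability of the block under a first-order Markov model.
        p = init[block[0]]
        for prev, cur in zip(block, block[1:]):
            p *= trans[prev][cur]
        if p > 0:
            total -= p * log2(p)
    return total / K

# Assumed source: stationary, symmetric, switches symbol with probability 0.9.
init = {"a": 0.5, "b": 0.5}
trans = {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.9, "b": 0.1}}

rates = [block_entropy_rate(K, init, trans) for K in range(1, 9)]
# Entropy-rate of this Markov source: the conditional entropy of one step.
entropy_rate = -(0.9 * log2(0.9) + 0.1 * log2(0.1))
for K, r in enumerate(rates, start=1):
    print(K, round(r, 4))
print("entropy-rate", round(entropy_rate, 4))
```

The enumeration visits all $M^K$ blocks, so it is only practical for small K, which is the same complexity problem that affects block-to-variable-length codes themselves.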

Shortcomings of Block-to-Variable-Length Codes

1. Complexity
A blocklength-K code for a source with M letters has $M^K$ codewords.
Example: M = 26, K = 5 (= avg. word length): $M^K \approx 12 \times 10^6$

2. Dependence upon statistical model
a. What if source probabilities are not known?
b. What if source probabilities "change" over time?
c. What if there are no probabilities?

Historical Review

Timeline figure (layout lost in extraction). The time axis starts at 1948 and has a marker at 1960; the two columns are Theoretical Developments and Practical Developments. Entries as extracted: Shannon; McMillan; Huffman; Elias; run-length coding; conditional coding; two-pass block universal coding; Ziv-Lempel; data base data compression; arithmetic coding; adaptive Huffman; one-pass universal coding; UNIX Compress.

Two-Pass Universal Coding

Kolmogorov '65; Fitingof '66, '67; Lynch '66; Davisson '66, '67; Shtarkov '7; Rice '71; Babkin '71; Cover '72; Ziv '72; Schalkwijk '72; Davisson '73

Encode a long block (say M = 100 to 1000):
1. Measure frequencies of symbols (or short blocks).
2. Encode these frequencies.
3. Design a code presuming these frequencies.
4. Encode the block using this code.

Example: Lynch-Davisson code

Typical Theorem: Given a source alphabet A, there exists a "universal" fixed-to-variable-length code C such that $R_\theta(C) \approx H_\theta(X)$ for any stationary Markov source $\theta$ with alphabet A.

Complexity: worse than before (for exploiting redundancy).

Lempel-Ziv Coding

Many variations. All are basically variable-length-to-block coding.
LZ-78 and LZ-77 are the two broad categories of LZ codes.
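The four two-pass steps above can be sketched as follows. This is an illustration only, not the Lynch-Davisson code: it assumes a Huffman code as the code designed in step 3, and it hands the symbol counts to the decoder as a Python object instead of encoding them as bits (step 2), so the cost of that header is not modeled. The function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a prefix code {symbol: bitstring} from symbol counts."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Sorting the items makes the construction deterministic, so encoder
    # and decoder derive the same code from the same counts.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

def two_pass_encode(block):
    freqs = Counter(block)                      # step 1: measure frequencies
    code = huffman_code(freqs)                  # step 3: design a code
    payload = "".join(code[s] for s in block)   # step 4: encode the block
    return freqs, payload                       # freqs stands in for step 2

def two_pass_decode(freqs, payload):
    # The decoder rebuilds the same code from the transmitted frequencies.
    inverse = {w: s for s, w in huffman_code(freqs).items()}
    out, cur = [], ""
    for bit in payload:
        cur += bit
        if cur in inverse:   # prefix property: first hit is a full codeword
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

freqs, payload = two_pass_encode("abracadabra")
print(len(payload), two_pass_decode(freqs, payload))
```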

Variable-Length-to-Block Coding (aka Dictionary Coding)

Dictionary table (probabilities, lengths and codewords lost in extraction): columns are source word, probability $p_i$, length $n_i$, and binary codeword $w_i$. Source words: a (aa), b (bb), ab (abb), aba (abaa), ba (baa), bab (bab), abab (ababb), ababa (ababa).

Dictionary tree (figure): nodes a, b, ab, ba, aba, bab, abab, ababa.

"Greedy" encoding: find the longest sourceword w that prefixes the remaining symbols.

Rate: $R = \frac{\text{codeword len. } L}{\text{avg. sourceword len.}} = \frac{L}{\sum_i p_i n_i} = 0.9$ bits/symbol

The above probabilities assume greedy encoding and (approximately) a Markov model with $p(a) = p(b) = 0.5$, $p(b|a) = p(a|b) = 0.9$.

Basic idea: a good code has $p_i \approx 2^{-L}$ for all i.

LZ-78: Lempel-Ziv Coding

An Adaptive/Universal VL-to-Block Code

Source sequence to encode: a b a b a b a b a b a b a b a b a b a b a b a b a b a b

Encoder table (entries lost in extraction): columns are source word, index, and code word; the initial source words are a and b.

1. Find the longest sourceword w that prefixes the remaining symbols.
2. Send the index of w and the next symbol α.
3. Remove wα from the source sequence.
4. If the table is not full, add wα to the table. Go to 1.
5. If the table is full, continue encoding without adding to the table (Steps 1 to 3 only).

Rate: $\frac{L+1}{E|w|}$ bits/symbol

For the above sequence: Rate $\approx \frac{4(L+1)}{n}$ when $n \le 2^L$, and Rate $\approx \frac{L+1}{2^{L-1}}$ when $n > 2^L$, where n = number of sourcewords encoded.
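The LZ-78 encoding steps above can be sketched in a few lines. The sketch assumes the table is seeded with the alphabet (a gets index 1, b gets index 2) and that indices are assigned in order of insertion; the function name, the `max_entries` parameter and the use of `None` for a missing final symbol are illustrative choices, not part of the slides.

```python
def lz78_encode(source, alphabet, max_entries=None):
    """Greedy LZ-78 parse; returns a list of (index, next_symbol) pairs."""
    # Table maps source word -> 1-based index, seeded with single symbols.
    table = {s: i + 1 for i, s in enumerate(alphabet)}
    out = []
    pos = 0
    while pos < len(source):
        # Step 1: longest table entry w that prefixes the remaining symbols.
        # Extending one symbol at a time works because every prefix of a
        # table entry is itself a table entry.
        end = pos + 1
        while end < len(source) and source[pos:end + 1] in table:
            end += 1
        w = source[pos:end]
        # Step 2: send the index of w and the next symbol alpha
        # (None if the input ends right after w).
        alpha = source[end] if end < len(source) else None
        out.append((table[w], alpha))
        # Step 4: add w + alpha while the table has room.
        if alpha is not None and (max_entries is None or len(table) < max_entries):
            table[w + alpha] = len(table) + 1
        # Step 3: remove w + alpha from the source sequence.
        pos = end + (1 if alpha is not None else 0)
    return out

pairs = lz78_encode("ab" * 11 + "a", "ab")
print(pairs)
```

On the alternating sequence the parsed words grow steadily (ab, aba, ba, bab, abab, ...), which is why the rate for this sequence falls as more sourcewords are encoded.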

LZ-78 Decoding

Encoded symbols: 1 b, 3 a, 2 a, 5 b, 4 b, 7 a, 6 a, ...

Decoder table (entries lost in extraction): columns are index, source word, and code word; the initial source words are a (index 1) and b.

LZW: A Variation of LZ-78

Source sequence: a b a b a b a b a b a b a b a b a b a b a b a b a b a b ...

Encoder table (entries lost in extraction): columns are index, source word, and code word; the initial source words are a (index 1) and b.

1. Find the longest sourceword w that prefixes the remaining symbols.
2. Send the index of w.
3. Remove w from the source sequence.
4. If the table is not full, add wα to the table, where α is the next source symbol. Go to 1.

Rate: $\frac{L}{E|w|}$ bits/symbol

For this sequence: Rate $\approx \frac{4L}{n}$ when $n \le 2^L$, and Rate $\approx \frac{L}{2^{L-1}}$ when $n > 2^L$, where n = number of sourcewords encoded.
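Both procedures on this page can be sketched together: an LZ-78 decoder that rebuilds the table from the received (index, symbol) pairs, and an LZW encoder that sends indices only. The sketch assumes the table is seeded with the alphabet (a = 1, b = 2), that new entries take the next free index, and that the LZ-78 decoder's table is unlimited; the function names and `max_entries` are hypothetical.

```python
def lz78_decode(pairs, alphabet):
    """Invert an LZ-78 parse given as (index, next_symbol) pairs."""
    table = {i + 1: s for i, s in enumerate(alphabet)}
    out = []
    for index, alpha in pairs:
        # Each pair stands for the table word followed by one new symbol.
        word = table[index] + (alpha or "")
        out.append(word)
        if alpha is not None:
            table[len(table) + 1] = word   # same update as the encoder
    return "".join(out)

def lzw_encode(source, alphabet, max_entries=None):
    """Greedy LZW parse; returns a list of table indices."""
    table = {s: i + 1 for i, s in enumerate(alphabet)}
    out = []
    pos = 0
    while pos < len(source):
        # Step 1: longest table entry w that prefixes the remaining symbols.
        end = pos + 1
        while end < len(source) and source[pos:end + 1] in table:
            end += 1
        w = source[pos:end]
        out.append(table[w])                 # step 2: send index of w only
        # Step 4: add w + alpha, where alpha is the next (unsent) symbol.
        if end < len(source) and (max_entries is None or len(table) < max_entries):
            table[w + source[end]] = len(table) + 1
        pos = end                            # step 3: remove w only
    return out

decoded = lz78_decode(
    [(1, "b"), (3, "a"), (2, "a"), (5, "b"), (4, "b"), (7, "a"), (6, "a")], "ab")
codes = lzw_encode("ab" * 8, "ab")
print(decoded)
print(codes)
```

Unlike LZ-78, LZW does not consume α when it adds wα to the table, so the next sourceword starts with α and no explicit symbol needs to be sent.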

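Decoding LZW

Encoded symbols and decoder table (entries lost in extraction): the table columns are index, source word, and code word; the initial source words are a (index 1) and b.

Other variants: do not limit the size of the table, nor fix the length of the codewords.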
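LZW decoding has one subtlety: the decoder learns each new table entry one step later than the encoder, so it can receive an index it has not finished building yet. A minimal sketch, assuming an unlimited table seeded with the alphabet (a = 1, b = 2) and a hypothetical function name:

```python
def lzw_decode(codes, alphabet):
    """Invert an LZW index sequence, rebuilding the table on the fly."""
    table = {i + 1: s for i, s in enumerate(alphabet)}
    if not codes:
        return ""
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            word = table[code]
        elif code == len(table) + 1:
            # The encoder used the entry it had just created: that entry is
            # prev plus its own first symbol.
            word = prev + prev[0]
        else:
            raise ValueError("invalid LZW index")
        # The entry the encoder added after sending prev: prev + next symbol.
        table[len(table) + 1] = prev + word[0]
        out.append(word)
        prev = word
    return "".join(out)

# Indices that a greedy LZW encoder with the same seeding produces for
# sixteen alternating symbols.
text = lzw_decode([1, 2, 3, 5, 4, 7, 6], "ab")
print(text)
```

In this input the indices 5 and 7 both arrive before the decoder has stored them, which exercises the special case.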

LZ-77

Source sequence: a b a b a b a b a b a b a b a b a b a b a b a b a b a b ...

Fix integers B and C. (Typically, $B = C\,2^{\beta C}$ for some $0 < \beta < 1$.)

Assume we've already encoded $x_1, \ldots, x_k$. To encode $x_{k+1} x_{k+2} \ldots$, consider $x_{k-B+1} \ldots x_k$ to be stored in a "buffer" of length B.

1. Find pointer I and length J ($1 \le I \le B$, $1 \le J \le C$) such that $x_{k-I+1} \ldots x_{k-I+J} = x_{k+1} \ldots x_{k+J}$, with J as large as possible.
2. Send I and J with $\log_2 B$ bits and $\log_2 C$ bits, respectively.

Rate: $R \approx \frac{\log B + \log C}{E[J]}$ bits/symbol

For this sequence: $R = \frac{\log B + \log C}{C}$

This variant continually adapts.

LZ Optimality

Theorem (ZL '78): For any stationary ergodic source, if the LZ parameters are chosen large enough, Rate $\approx$ entropy-rate.

Theorem (ZL '78): For any infinite sequence, if the parameters are chosen large enough, the rate of the LZ codes is as small as that of any finite-state code designed just for that sequence.

Why?

Theorem (Wyner, Ziv '89): Let $J_B(x)$ = length of the longest prefix of $x = x_1, x_2, \ldots$ that is matched by some string in the buffer $x_{-B+1}, \ldots, x_0$. For a stationary ergodic source,

$\lim_{B \to \infty} \frac{\log B}{J_B(x)} = H_\infty(X)$ in probability,

i.e. $J_B(x) \approx \frac{\log B}{H_\infty(X)}$ with high prob.

Consider Variant #3 (LZ 77) with buffer size B and length matching limit $C \approx \frac{\log B}{H_\infty(X)}$. Then J = C with high probability, and

$\text{Rate} = \frac{\log B + \log C}{E[J]} \approx \frac{\log B + \log\log B - \log H_\infty(X)}{(\log B)/H_\infty(X)} = H_\infty(X) \left( 1 + \frac{\log\log B - \log H_\infty(X)}{\log B} \right) \approx H_\infty(X)$
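The LZ-77 matching step above (find pointer I and length J, then send both) can be sketched as follows. The sketch makes several assumptions not stated in the slides: the first `start` symbols are already known to the decoder, a match may run past the end of the buffer into the symbols being encoded, ties are resolved in favor of the smallest I, and a symbol with no match in the buffer raises an error instead of being sent as a literal. The function names are hypothetical.

```python
def lz77_encode(source, start, B, C):
    """Encode source[start:] as (I, J) pairs; source[:start] is assumed known."""
    out = []
    k = start   # number of symbols already encoded
    while k < len(source):
        best_I, best_J = 0, 0
        # Try every pointer I into the buffer of the last B symbols.
        for I in range(1, min(B, k) + 1):
            J = 0
            # Extend the match up to C symbols; source[k - I + J] may lie
            # at or beyond position k, i.e. the match may overlap itself.
            while (J < C and k + J < len(source)
                   and source[k - I + J] == source[k + J]):
                J += 1
            if J > best_J:
                best_I, best_J = I, J
        if best_J == 0:
            raise ValueError("next symbol does not occur in the buffer")
        out.append((best_I, best_J))
        k += best_J
    return out

def lz77_decode(prefix, pairs):
    """Rebuild the sequence by copying J symbols from I positions back."""
    out = list(prefix)
    for I, J in pairs:
        for _ in range(J):
            out.append(out[-I])   # one symbol at a time handles overlap
    return "".join(out)

source = "ab" * 16
pairs = lz77_encode(source, start=2, B=8, C=8)
print(pairs)
print(lz77_decode(source[:2], pairs) == source)
```

On the alternating sequence every match reaches the limit C until the input runs out, which matches the slide's rate of $(\log B + \log C)/C$ for this sequence.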
