Data Compression: LZ77
Jens Müller, Universität Stuttgart, 2008-11-25
Outline
  Introduction
  Principle of dictionary methods
  LZ77
  Sliding window
  Examples
  Optimization
  Performance comparison
  Applications/Patents
Jens Müller - IPVS, Universität Stuttgart
Principle of dictionary methods
  Compressing multiple strings can be more efficient than compressing single symbols only (e.g. Huffman encoding).
  Strings of symbols are added to a dictionary. Later occurrences are referenced.
  Static dictionary: Entries are predefined and constant according to the application of the text.
  Adaptive dictionary: Entries are taken from the text itself and created on-the-fly.
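To make the static case concrete, here is a minimal sketch of a static dictionary coder. The dictionary entries and the function name `encode_static` are hypothetical and chosen only for illustration; a real application would predefine entries that suit its kind of text.

```python
# Hypothetical predefined entries; they never change during encoding.
STATIC_DICTIONARY = ["the ", "and ", "compression "]

def encode_static(text):
    """Replace dictionary strings by their index, keep other symbols as literals."""
    out = []
    i = 0
    while i < len(text):
        for index, entry in enumerate(STATIC_DICTIONARY):
            if text.startswith(entry, i):
                out.append(index)      # reference to a dictionary entry
                i += len(entry)
                break
        else:
            out.append(text[i])        # no entry matches: emit the single symbol
            i += 1
    return out

print(encode_static("the cat and the dog"))
```

An adaptive method such as LZ77 needs no such predefined list, because the already processed text itself serves as the dictionary.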
LZ77
  First paper by Ziv and Lempel in 1977 about lossless compression with an adaptive dictionary.
  Goes through the text in a sliding window consisting of a search buffer and a look-ahead buffer.
    Search buffer:     "this is text that is being"
    Look-ahead buffer: "read through the window"
  The search buffer is used as dictionary.
  Sizes of these buffers are parameters of the implementation.
  Assumption: Patterns in text occur within range of the search buffer.
LZ77 Example (Encoding)
  Encoding of the string: abracadabrad
  Output tuple: (offset, length, symbol)
  Offsets count the search buffer positions 7 6 5 4 3 2 1 back from the boundary to the look-ahead buffer.
  Output: (0,0,a) (0,0,b) (0,0,r) (3,1,c) (2,1,d) (7,4,d)
  12 characters compressed into 6 tuples.
  Compression rate: (12*8)/(6*(5+2+3)) = 96/60 = 1.6, i.e. the output takes 60/96 = 62.5% of the original size.
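The compression rate above can be rechecked with a few lines of arithmetic. This is only a sketch of the calculation on the slide; the variable names are made up here, and the 5+2+3 bits per tuple are taken from the slide as given.

```python
original_bits = 12 * 8                 # 12 characters at 8 bits each
bits_per_tuple = 5 + 2 + 3             # field widths as stated on the slide
compressed_bits = 6 * bits_per_tuple   # 6 output tuples

ratio = original_bits / compressed_bits      # original size / compressed size
fraction = compressed_bits / original_bits   # compressed size / original size

print(ratio, fraction)
```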
Size of output
  Size for each output tuple (offset, length, symbol) when using fixed-length storage:
    $\lceil \log_2 S \rceil + \lceil \log_2 (S+L) \rceil + \lceil \log_2 A \rceil$
  where S is the length of the search buffer, L the length of the look-ahead window, A the size of the alphabet.
  Why S+L and not only S? See next slide.
  Worst case if no symbol repeats in the search buffer, for a text of n symbols: blow-up of
    $n \left( \lceil \log_2 S \rceil + \lceil \log_2 (S+L) \rceil + \lceil \log_2 A \rceil \right)$ instead of $n \lceil \log_2 A \rceil$
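As a rough illustration of the tuple size formula, the sketch below evaluates it for one set of parameters. The values S = 4096, L = 16, A = 256 and the helper names are assumptions picked for this example, not values from the slides.

```python
def bits_needed(values):
    """Width of a fixed-length field for `values` distinct values: ceil(log2(values))."""
    return (values - 1).bit_length()

def tuple_bits(S, L, A):
    """Bits per (offset, length, symbol) tuple with fixed-length fields."""
    return bits_needed(S) + bits_needed(S + L) + bits_needed(A)

S, L, A = 4096, 16, 256             # assumed example parameters
per_tuple = tuple_bits(S, L, A)     # 12 + 13 + 8 bits
per_symbol = bits_needed(A)         # 8 bits for an uncompressed symbol
blow_up = per_tuple / per_symbol    # worst case: one tuple per input symbol

print(per_tuple, per_symbol, blow_up)
```

With these parameters, a text in which no symbol repeats would grow from 8 to 33 bits per symbol.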
Encoding reaches into the look-ahead buffer
  Special case, shown for the text "said: HAHAHAHA!" with search buffer positions 7 6 5 4 3 2 1:
    Look-ahead buffer starts with    Output
    HAHAHAHA!                        (0,0,H)
    AHAHAHA!                         (0,0,A)
    HAHAHA!                          (2,4,H)
    A!                               (2,1,!)
  The match for (2,4,H) starts 2 positions back in the search buffer but is 4 symbols long, so it continues into the look-ahead buffer.
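The following sketch shows why such an overlapping match has to be copied symbol by symbol. The function name `copy_match` is hypothetical; it only models the copy step for a tuple like (2,4,H).

```python
def copy_match(output, offset, length):
    """Append `length` symbols, starting `offset` symbols back in `output`."""
    result = list(output)
    start = len(result) - offset
    for i in range(length):
        # result grows while we copy, so symbols copied a moment ago can be
        # read again when length > offset.
        result.append(result[start + i])
    return "".join(result)

print(copy_match("HA", 2, 4))

# A single slice of the existing output would only deliver 2 of the 4 symbols:
naive = "HA"[-2:][:4]
print(len(naive))
```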
Encoding
  Pseudo code algorithm

    while look-ahead buffer is not empty
        go backwards in search buffer to find longest match of the look-ahead buffer
        if match found
            print: (offset from window boundary, length of match, next symbol in look-ahead buffer);
            shift window by length+1;
        else
            print: (0, 0, first symbol in look-ahead buffer);
            shift window by 1;
        fi
    end while
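The pseudo code can be turned into a short runnable sketch. The function name `lz77_encode` and the default buffer sizes are assumptions for illustration; the sketch also assumes that one symbol of the look-ahead buffer is always reserved for the literal in the tuple, and it uses a plain linear search instead of any faster data structure.

```python
def lz77_encode(text, search_size=7, lookahead_size=5):
    """Encode text into a list of (offset, length, symbol) tuples."""
    tokens = []
    pos = 0  # index of the first symbol in the look-ahead buffer
    while pos < len(text):
        best_offset, best_length = 0, 0
        # Keep one look-ahead symbol free for the literal that ends the tuple.
        max_length = min(lookahead_size, len(text) - pos) - 1
        # Go backwards through the search buffer, nearest position first.
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # The comparison may run past pos, i.e. into the look-ahead buffer.
            while (length < max_length
                   and text[pos - offset + length] == text[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        # Without a match this yields (0, 0, first look-ahead symbol).
        tokens.append((best_offset, best_length, text[pos + best_length]))
        pos += best_length + 1  # shift window by length + 1
    return tokens

print(lz77_encode("HAHAHAHA!"))
```

Because a failed search leaves offset and length at 0, the two branches of the pseudo code collapse into a single append here.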
Example (Decoding)
  Input: (0,0,a) (0,0,b) (0,0,r) (3,1,c) (2,1,d) (7,4,d)
  Offsets refer to the positions 7 6 5 4 3 2 1 of the previous output.
Decoding
  Pseudo code algorithm

    for each token (offset, length, symbol)
        if offset = 0 then
            print symbol;
        else
            go reverse in previous output by offset characters and copy character-wise for length symbols;
            print symbol;
        fi
    next

  LZ77 is asymmetric: encoding is more difficult than decoding, as it needs to find the longest match.
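A minimal runnable version of the decoding pseudo code could look like this. The function name `lz77_decode` is an assumption; the tokens in the usage line are the ones from the "HAHAHAHA!" special case.

```python
def lz77_decode(tokens):
    """Rebuild the text from (offset, length, symbol) tuples."""
    output = []
    for offset, length, symbol in tokens:
        start = len(output) - offset
        # Copy character-wise, so a match may overlap the symbols it produces.
        for i in range(length):
            output.append(output[start + i])
        output.append(symbol)  # the literal that ends every tuple
    return "".join(output)

print(lz77_decode([(0, 0, "H"), (0, 0, "A"), (2, 4, "H"), (2, 1, "!")]))
```

For offset = 0 the length is also 0, so the copy loop does nothing and only the symbol is printed. The decoder performs no search at all, which is where the asymmetry comes from.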
Optimizations
  Successors following LZ77 use different optimizations:
  Use variable size offset and length fields in the tuples instead of fixed-length. Better if small offsets and sizes prevail.
  Don't output a (0,0,x) token when a character is not found, but instead differentiate using a flag bit: flag 0 followed by the symbol x, or flag 1 followed by offset o and length l.
  Use a better suited data structure (e.g. tree, hash set) for the buffers. This allows faster search and/or larger buffers.
  Additional Huffman coding of tuples/references.
  -> LZSS, LZB, LZH, LZR, LZFG, LZMA, Deflate, ...
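The flag-bit idea can be sketched as follows, in the spirit of LZSS. Function names, buffer sizes and the `min_match` threshold are assumptions for illustration; real implementations pack the flags and fields into bits and pick the threshold from the actual field widths.

```python
def flag_encode(text, search_size=7, lookahead_size=5, min_match=2):
    """Emit (0, symbol) for literals and (1, offset, length) for references."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_offset, best_length = 0, 0
        max_length = min(lookahead_size, len(text) - pos)
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and text[pos - offset + length] == text[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        if best_length >= min_match:   # assumed threshold: short matches stay literals
            tokens.append((1, best_offset, best_length))
            pos += best_length
        else:
            tokens.append((0, text[pos]))
            pos += 1
    return tokens

def flag_decode(tokens):
    """Inverse of flag_encode."""
    output = []
    for token in tokens:
        if token[0] == 0:
            output.append(token[1])
        else:
            _, offset, length = token
            start = len(output) - offset
            for i in range(length):
                output.append(output[start + i])
    return "".join(output)

print(flag_encode("HAHAHAHA!"))
```

Compared with the tuple format, a reference no longer carries a trailing symbol, and a literal costs one flag bit plus the symbol instead of a full tuple.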
Performance
  [Figure: compression in bits/symbol (axis from 0 to 8) for LZ77, LZR, LZSS and LZH on the benchmark files bib, book1, book2, geo, news, obj1, obj2, paper1, pic, progc, progl, progp, trans. From Bell/Cleary/Witten: Text Compression]
Applications, Patents
  Unlike LZ78, LZ77 has not been patented. This may be a reason why its successors based on LZ77 are so widely used:
  Deflate is a combination of LZSS together with Huffman encoding and uses a window size of 32 kB. This algorithm is open source and is used in what is widely known as ZIP compression (although the ZIP format itself is only a container format, like AVI, and can be used with several algorithms), and by the formats PNG, TIFF, PDF and many others.
References
  SALOMON, D.: Data Compression: The Complete Reference. Springer, New York, 1998.
  BELL, T. C., CLEARY, J. G., WITTEN, I. H.: Text Compression. Prentice Hall Advanced Reference Series, 1990.
  SAYOOD, K.: Introduction to Data Compression. Academic Press, San Diego, CA, 1996, 2000.
  ZIV, J., LEMPEL, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23 (1977), 337-343.