CSE 390: Introduction to Data Compression, Spring 2003
Lecture 1: Introduction to Data Compression, Entropy, Prefix Codes

Administrivia
- Instructor: Prof. Alexander Mohr, mohr@cs.sunysb.edu, office hours: TBA
- Web: http://mnl.cs.sunysb.edu/class/cse390/2004-fall/
- Mailing list: http://mnl.cs.sunysb.edu/mailman/listinfo/cse390/ -- please subscribe by Monday!

Text Book
- Khalid Sayood, Introduction to Data Compression, Second Edition, Morgan Kaufmann Publishers, 2000, ISBN 1558605584. $79.95 list.

Basic Data Compression Concepts
- original x -> Encoder -> compressed y -> Decoder -> decompressed x^
- Lossless compression: x = x^. Also called entropy coding, reversible coding.
- Lossy compression: x ≈ x^. Also called irreversible coding.
- Compression ratio = |x| / |y|, where |x| is the number of bits in x.

Compression Ratios: Beware!
- Two ways to make the ratio larger: decrease the size of the compressed version, or increase the size of the uncompressed version!

Why Compress
- Conserve storage space.
- Reduce time for transmission: faster to encode, send, and decode than to send the original.
- Progressive transmission: some compression techniques allow us to send the most important bits first, so we can get a low-resolution version of some data before getting the high-fidelity version.
- Reduce computation: use less data to achieve an approximate answer.

Braille
- System to read text by feeling raised dots on paper (or on electronic displays).
- Invented in the 1820s by Louis Braille, a French blind man.
- [Figure: braille cells for z, and, the, with, mother, th, ch, gh]
Braille Example
- Clear text: "Call me Ishmael. Some years ago -- never mind how long precisely -- having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world." (238 characters)
- Grade 2 Braille in ASCII: ,call me ,i%mael4 ,"s ye>s ago -- n"e m9d h[ l;g precisely -- hav+ ll or no m"oy 9 my purse & no?+ "picul> 6 9t]e/ me on %ore ,i "? ,i wd sail ab a ll & see ! wat]y "p (! _w4 (203 characters)
- Compression ratio = 238/203 = 1.17

Lossless Compression
- Data is not lost -- the original is really needed:
  - Text compression.
  - Compression of computer binaries to fit on a floppy.
- Compression ratio typically no better than 4:1 for lossless compression on many kinds of files.
- Major techniques include:
  - Statistical techniques: Huffman coding, arithmetic coding, Golomb coding.
  - Dictionary techniques: LZW, LZ77.
  - Sequitur.
  - Burrows-Wheeler method.
- Standards: Morse code, Braille, Unix compress, gzip, zip, bzip, GIF, PNG, JBIG, Lossless JPEG.

Lossy Compression
- Data is lost, but not too much:
  - Audio.
  - Video.
  - Still images, medical images, photographs.
- Compression ratios of 10:1 often yield quite high fidelity results.
- Major techniques include:
  - Vector quantization.
  - Wavelets.
  - Block transforms.
- Standards: JPEG, JPEG 2000, MPEG (1, 2, 4, 7).
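The ratio arithmetic can be replayed in a few lines. A minimal sketch; the helper name `compression_ratio` is ours, and it counts characters rather than bits, assuming one fixed-size cell per character on both sides:

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Compression ratio |x| / |y|; larger means better compression."""
    return original_size / compressed_size

# Braille example: 238 clear-text characters vs. 203 grade-2 cells.
ratio = compression_ratio(238, 203)
print(round(ratio, 2))  # 1.17
```

The same function shows the "beware" point: shrinking |y| or inflating |x| both raise the ratio.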
Why is Data Compression Possible
- Most data from nature has redundancy: there is more data than the actual information contained in the data.
- Squeezing out the excess data amounts to compression.
- However, unsqueezing is necessary to be able to figure out what the data means.
- Always possible to compress? Consider a two-bit sequence: can you always compress it to one bit? (No: there are four two-bit sequences but only two one-bit sequences, so some inputs must collide.)
- Information theory is needed to understand the limits of compression and give clues on how to compress well.

What is Information?
- Analog data: also called continuous data; represented by real numbers (or complex numbers).
- Digital data: a finite set of symbols {a1, a2, ..., an}; all data represented as sequences (strings) over the symbol set. Example: {a, b, c, d, r}: abracadabra.
- Digital data can be an approximation to analog data.

Symbols
- Roman alphabet plus punctuation.
- ASCII: 256 symbols.
- Binary {0, 1}: 0 and 1 are called bits. All digital information can be represented in binary.
- {a, b, c, d} fixed-length representation: a = 00; b = 01; c = 10; d = 11. 2 bits per symbol.

Exercise: Bits Per Symbol
- Suppose we have n symbols. How many bits (as a function of n) are necessary to represent a symbol in binary?
- Hint: turn the problem around: how many symbols can b bits represent, as a function of b?

Discussion: Non-powers of 2
- Can we do better than a fixed-length representation when the number of symbols is not a power of 2?

Information Theory
- Developed by Shannon in the 1940s and 50s.
- Attempts to explain the limits of communication using probability theory.
- Example: suppose English text is being sent. It is more likely you receive an "e" than a "z". In some sense, "z" has more information than "e", because you expect "e".
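The bits-per-symbol exercise can be checked numerically: b bits distinguish 2^b symbols, so n symbols need ceil(log2 n) bits. A minimal sketch (the function name is ours):

```python
import math

def bits_per_symbol(n: int) -> int:
    """Minimum bits for a fixed-length binary code over n symbols."""
    return math.ceil(math.log2(n))

for n in (2, 3, 4, 5, 256):
    print(n, bits_per_symbol(n))  # 2->1, 3->2, 4->2, 5->3, 256->8
```

Note the jump at non-powers of 2: three symbols already cost 2 bits under a fixed-length code, which is what motivates the variable-length codes discussed next.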
First-order Information
- Suppose we are given symbols {a1, a2, ..., am}.
- P(ai) = probability of symbol ai occurring in the absence of any other information.
- P(a1) + P(a2) + ... + P(am) = 1.
- inf(ai) = -log2 P(ai) is the information of ai, in bits.
- [Figure: plot of -log2(x) for x in (0, 1]]

Example
- {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8.
- inf(a) = -log2(1/8) = 3; inf(b) = -log2(1/4) = 2; inf(c) = -log2(5/8) = 0.678.
- Receiving an "a" has more information than receiving a "b" or "c".

First Order Entropy
- The first order entropy is defined for a probability distribution over symbols {a1, a2, ..., am}:
  H = sum_{i=1}^{m} P(ai) * log2(1 / P(ai)) = - sum_{i=1}^{m} P(ai) * log2(P(ai))
- H is the average number of bits required to code up a symbol, given that all we know is the probability distribution of the symbols.
- H is the Shannon lower bound on the average number of bits to code a symbol in this "source model".
- Stronger models of entropy include context. We'll talk about this later.

Entropy Examples
- {a, b, c} with 1/8, 1/4, 5/8: H = (1/8)*3 + (1/4)*2 + (5/8)*0.678 = 1.3 bits/symbol.
- {a, b, c} with 1/3, 1/3, 1/3 (worst case): H = -3 * (1/3) * log2(1/3) = 1.6 bits/symbol.
- Note that the standard coding of 3 symbols takes 2 bits.

An Extreme Case
- {a, b, c} with p(a) = 1, p(b) = 0, p(c) = 0. H = ? (Taking 0 * log2(0) = 0, H = 0: a certain symbol carries no information.)

Entropy Curve
- Suppose we have two symbols with probabilities x and 1 - x, respectively.
- [Figure: plot of -(x log2 x + (1-x) log2(1-x)) against x; maximum entropy of 1 bit at x = 0.5]
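The definitions above translate directly into code. A sketch reproducing the slide's numbers (the names `information` and `entropy` are ours):

```python
import math

def information(p: float) -> float:
    """inf(a) = -log2 P(a), in bits."""
    return -math.log2(p)

def entropy(probs) -> float:
    """First order entropy H = -sum P(ai) log2 P(ai), with 0*log2(0) := 0."""
    return sum(p * information(p) for p in probs if p > 0)

print(information(1/8))                      # 3.0
print(round(entropy([1/8, 1/4, 5/8]), 2))    # 1.3
print(round(entropy([1/3, 1/3, 1/3]), 2))    # 1.58 (worst case, ~log2 3)
print(entropy([1.0, 0.0, 0.0]))              # 0.0 (the extreme case)
```

Skipping the zero-probability terms implements the 0 * log2(0) = 0 convention, so the extreme case comes out to exactly 0 bits.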
A Simple Prefix Code
- {a, b, c} with 1/8, 1/4, 5/8.
- A prefix code is defined by a binary tree.
- Prefix code property: no output is a prefix of another.
- [Figure: code tree and input/output table; one code consistent with the bit-rate calculation below is a = 00, b = 01, c = 1]
- Decoding:
  repeat
    start at root of tree
    repeat
      if read bit = 1 then go right else go left
    until node is a leaf
    report leaf
  until end of the code
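The decoding loop above can be sketched with the tree stored as nested tuples. The code assignment (a = 00, b = 01, c = 1) is an assumption consistent with the slide's probabilities, since the tree figure itself did not survive extraction:

```python
# Prefix-code tree as nested tuples (left, right); leaves are symbols.
# Assumed code: a -> 00, b -> 01, c -> 1.
TREE = (("a", "b"), "c")
CODES = {"a": "00", "b": "01", "c": "1"}

def encode(text: str) -> str:
    """Concatenate each symbol's codeword."""
    return "".join(CODES[ch] for ch in text)

def decode(bits: str) -> str:
    """Walk the tree: 1 -> right, 0 -> left; report and restart at each leaf."""
    out, node = [], TREE
    for bit in bits:
        node = node[1] if bit == "1" else node[0]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = TREE
    return "".join(out)

msg = "cabcc"
print(encode(msg))          # 1000111
print(decode(encode(msg)))  # cabcc
```

Because no codeword is a prefix of another, the decoder never needs to look ahead: hitting a leaf always means a complete symbol.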
How Good is the Code?
- [Figure: the code tree with leaf probabilities 5/8, 1/8, 1/4]
- Bit rate = (1/8)*2 + (1/4)*2 + (5/8)*1 = 11/8 = 1.375 bps.
- Entropy = 1.3 bps.
- Standard code = 2 bps.
- (bps = bits per symbol)

Exercise 1
- Player 1: pick a string from the alphabet {a, b, c, d} and encode the string using the tree on the board.
- Player 2: decode the string player 1 gives you.
- Check for equality. (While you wait, try Exercise 2.)

Exercise 2
- String: abracadabra. Alphabet: {a, b, c, d, r}.
- Design a prefix code that compresses the string the most!
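A sketch replaying the bit-rate arithmetic, plus a hint for Exercise 2: the empirical entropy of abracadabra is a lower bound on the bits per symbol any prefix code can achieve under this first-order model (variable and function names are ours):

```python
import math
from collections import Counter

def entropy(probs) -> float:
    """First order entropy H = -sum P log2 P, skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Slide arithmetic: code lengths a->2, b->2, c->1, probabilities 1/8, 1/4, 5/8.
probs = {"a": 1/8, "b": 1/4, "c": 5/8}
lengths = {"a": 2, "b": 2, "c": 1}
rate = sum(probs[s] * lengths[s] for s in probs)
print(rate)  # 1.375 bits/symbol, vs. entropy ~1.30

# Exercise 2 hint: empirical symbol distribution of the string.
s = "abracadabra"
counts = Counter(s)  # a:5, b:2, r:2, c:1, d:1
lower_bound = entropy([c / len(s) for c in counts.values()])
print(round(lower_bound, 2))  # ~2.04 bits/symbol
```

So the simple {a, b, c} code pays only 1.375 - 1.30 ≈ 0.08 bits/symbol over the entropy bound, and any prefix code for abracadabra must average at least about 2.04 bits/symbol under this model.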