Lecture 8: Compression
CS2: Great Insights in Computer Science
Michael L. Littman, Spring 2006

Overview
When we decide how to represent something in bits, there are some competing interests: being easily manipulated/processed, and being short. It is common to use two representations: a direct one, to allow for easy processing, and a terse (compressed) one, to save storage and communication costs.
Plan
I'm going to try to describe one neat idea implicit in Chapter 6: Huffman coding. For more information, see Wikipedia: http://en.wikipedia.org/wiki/Huffman_coding

Gettysburg Address
Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we can not dedicate, we can not consecrate, we can not hallow this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us, that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion, that we here highly resolve that these dead shall not have died in vain, that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, for the people, shall not perish from the earth.
Character Counts
For simplicity, let's turn the uppercase letters into lowercase letters. That leaves us with the following counts (<s> is the space character):
2 <s> <> 0? 2 a c 58 d 65 e 27 f g 80 h 68 i 0 j 2 l 77 n 9 o 79 r s 26 t 2 v w 0 x 0 z

Attempt #1: ASCII
The standard format for representing characters uses 8 bits per character. The address is 1,482 characters long, so this representation needs a total of 11,856 bits.
8 bits per character; 11,856 total bits; 100% the size of ASCII representation.
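The counting step and the ASCII baseline can be sketched in a few lines of Python. The helper names `char_counts` and `ascii_bits` are our own and do not come from the lecture. The sample is only the opening phrase. Run on the full address, the same code should give the 1,482-character and 11,856-bit figures, provided the text matches the slide's copy exactly, punctuation included.

```python
from collections import Counter


def char_counts(text: str) -> Counter:
    """Count each character after folding uppercase letters to lowercase."""
    return Counter(text.lower())


def ascii_bits(text: str) -> int:
    """Fixed-width ASCII storage: 8 bits for every character."""
    return 8 * len(text)


opening = "Four score and seven years ago"
counts = char_counts(opening)
print(counts[" "], counts["e"], counts["o"])  # 5 4 3
print(ascii_bits(opening))                    # 240
```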
Attempt #2: Compact
Note that, at least in its lowercase form, the address needs only 32 different characters. Therefore, each can be assigned a 5-bit code (there are 32 different 5-bit patterns).
5 bits per character; 7,410 total bits; 62.5% the size of ASCII representation.
5-bit Patterns: each character gets its own pattern, starting with 00000 for <s>. All 32 patterns from 00000 through 11111 are used.
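The fixed-length arithmetic can be sketched as follows. The 32-symbol alphabet below (space, 26 letters, five punctuation marks) is a hypothetical stand-in for the lecture's character set. Handing out patterns in list order is likewise our assumption. Only the 5-bit width and the resulting totals come from the slide.

```python
def fixed_width(num_symbols: int) -> int:
    """Fewest bits that give each of num_symbols symbols a distinct pattern."""
    if num_symbols < 1:
        raise ValueError("need at least one symbol")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, computed exactly on ints.
    return max(1, (num_symbols - 1).bit_length())


def fixed_code(symbols: list[str]) -> dict[str, str]:
    """Assign patterns 00...0, 00...1, ... in the order the symbols are listed."""
    width = fixed_width(len(symbols))
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}


# Hypothetical 32-symbol alphabet: space, 26 letters, 5 punctuation marks.
alphabet = [" "] + list("abcdefghijklmnopqrstuvwxyz") + list(",.-';")
code = fixed_code(alphabet)
print(fixed_width(len(alphabet)))        # 5
print(code[" "], code["a"], code[";"])   # 00000 00001 11111
print(5 * 1482, 5 * 1482 / (8 * 1482))   # 7410 0.625
```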
Attempt #3: Variable Length
Some characters are much more common than others. Give the 4 most common characters a 3-bit code and the remaining ones a 6-bit code. How many bits do we need now?
Variable Length Patterns: the short codes (0 followed by 2 bits) go to <s>, e, t, and a. Every other character (o, h, r, n, i, d, s, l, c, w, g, f, v, ..., <>, ?, j, x, z) gets a long code (1 followed by 5 bits).
Decodability
Note that the code was chosen so that the first bit of each character's code tells you whether the code is short (0) or long (1). This choice ensures that a message can actually be decoded. For example, in h i <s> t h e r e, the codes for <s>, t, and e are 3 bits long and those for h, i, and r are 6 bits long. That makes 36 bits in all: 4.5 bits per character, not 5.
But it is harder to work with.

What Gives?
We had assigned all 32 characters 5-bit codes. Now we've got 4 that have 3-bit codes and 28 that have 6-bit codes, so more than half of the characters have actually gotten longer. How can that change help? We need to factor in how many of each character there are.
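The two-tier code and its decoder can be sketched as below. The function names and the frequency ranking are our own assumptions; only the placement of <s>, e, t, a in the first four slots follows the slide. The decoder reads the first bit of each codeword to decide whether to take 3 or 6 bits, which is exactly what makes the scheme decodable.

```python
def two_tier_code(ranked: list[str]) -> dict[str, str]:
    """Top 4 symbols get '0' + 2 bits; every other symbol gets '1' + 5 bits."""
    if len(ranked) > 4 + 32:
        raise ValueError("this scheme has room for at most 36 symbols")
    code = {}
    for rank, sym in enumerate(ranked):
        if rank < 4:
            code[sym] = "0" + format(rank, "02b")
        else:
            code[sym] = "1" + format(rank - 4, "05b")
    return code


def encode(message: str, code: dict[str, str]) -> str:
    return "".join(code[ch] for ch in message)


def decode(bits: str, code: dict[str, str]) -> str:
    """The first bit of each codeword says how many bits to read: 0 -> 3, 1 -> 6."""
    reverse = {word: sym for sym, word in code.items()}
    out, i = [], 0
    while i < len(bits):
        width = 3 if bits[i] == "0" else 6
        out.append(reverse[bits[i:i + width]])  # KeyError if the bits are truncated
        i += width
    return "".join(out)


# Hypothetical frequency ranking; the first four entries get the short codes.
top = " etaohrnidslcwgfv"
ranked = list(top) + [c for c in "abcdefghijklmnopqrstuvwxyz" if c not in top]
code = two_tier_code(ranked)
bits = encode("hi there", code)
print(len(bits))            # 36 (versus 8 * 5 = 40 with the 5-bit code)
print(decode(bits, code))   # hi there
```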
Adding Up the Bits
How many bits does it take to write down just one letter? Take a letter that appears 10 times: each occurrence takes 6 bits, so 60 bits (it was 50 before).
How about t? There are 126, and each takes 3 bits. That's 378 (it was 630).
So how do we total them all? Let c be a character, freq(c) the number of times it appears, and len(c) its encoding length:
$$\text{Total bits} = \sum_{c} \text{freq}(c) \times \text{len}(c)$$
Sum It Up
$$\text{freq}(\langle s\rangle)\times 3 + \text{freq}(e)\times 3 + \text{freq}(t)\times 3 + \text{freq}(a)\times 3 + \text{freq}(o)\times 6 + \text{freq}(h)\times 6 + \text{freq}(r)\times 6 + \dots + \text{freq}(z)\times 6 = 6867$$
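The formula for total bits translates directly into code. The name `total_bits` is ours. The second example uses the 3-bit/6-bit lengths from the variable-length attempt on a tiny message rather than the full address.

```python
from collections import Counter


def total_bits(freq: dict[str, int], length: dict[str, int]) -> int:
    """Total bits = sum over characters c of freq(c) * len(c)."""
    return sum(n * length[c] for c, n in freq.items())


# The t line from the slide: 126 occurrences at 3 bits versus 5 bits.
print(total_bits({"t": 126}, {"t": 3}), total_bits({"t": 126}, {"t": 5}))  # 378 630

# A tiny message under the two-tier lengths (3 bits for <s>, e, t; 6 for h, i, r).
freq = Counter("hi there")
length = {" ": 3, "e": 3, "t": 3, "h": 6, "i": 6, "r": 6}
print(total_bits(freq, length))  # 36
```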
Attempt #3: Summary
Total for this example: 4.63 bits per character; 6,867 total bits; 57.9% the size of ASCII representation.

Attempt #4: Sorted
0 <s> e t a o
Total for this example: 7 bits per character; 67 total bits; 88% the size of ASCII representation.
Attempt #5: Your Turn
Make sure it is decodable!
2 <s> 65 e 26 t 2 a 9 o 80 h 79 r 77 n 68 i 58 d s 2 l c w g 27 f 2 v <> 0? 0 j 0 x 0 z

Can We Do Better?
Shannon invented information theory, which talks about bits, randomness, and encodings. Fano and Shannon worked together on finding minimal-size codes, and they found a good heuristic. Fano assigned the problem to his class. Huffman solved it, not knowing his professor had unsuccessfully struggled with it.
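For the "make sure it is decodable" exercise, one quick test is whether the code is prefix-free: no codeword is the start of another. This condition guarantees decodability. It is not the only way to get a decodable code, but every code in this lecture has it. The function below is a sketch of our own, not something from the slides.

```python
def is_prefix_free(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = sorted(code.values())
    # After sorting, a word that is a prefix of some other word is also a
    # prefix of the word that immediately follows it, so neighbours suffice.
    return not any(nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


print(is_prefix_free({"a": "0", "b": "10", "c": "11"}))  # True
print(is_prefix_free({"a": "0", "b": "01", "c": "11"}))  # False: 0 is a prefix of 01
```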
Tree (Prefix) Code
First, notice that a code can be drawn as a tree: left = 0, right = 1. So e = 001. The tree structure ensures the code is decodable: the bits tell you unambiguously which character is meant.
(Figure: the code tree, with one leaf per character: <s>, e, t, a, o, h, r, n, i, d, s, l, c, w, g, f, v, <>, ?, j, x, z, ...)

Huffman Coding
Make each character a subtree ("block") with a count equal to its frequency.
Take the two blocks with the smallest counts and merge them into left and right branches. The count for the new block is the sum of the counts of the blocks it is made out of.
Repeat until all blocks have been merged into one big block (a single tree).
Read the code off the branches in the tree.
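The four Huffman steps map naturally onto a priority queue. This sketch uses Python's `heapq` to find the two smallest blocks. The counter used to break ties is our own choice. A different tie-breaking rule can produce a different code, but the total length comes out the same. The four-letter frequency table is a made-up example, not the address.

```python
import heapq
from itertools import count


def huffman_code(freq: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code (left = 0, right = 1) from character counts."""
    if not freq:
        return {}
    tiebreak = count()  # unique second key, so heap entries never compare subtrees
    # Each heap entry is one "block": (count, tiebreak, subtree).
    heap = [(n, next(tiebreak), sym) for sym, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs a 1-bit codeword
    while len(heap) > 1:
        # Take the two blocks with the smallest counts ...
        n_left, _, left = heapq.heappop(heap)
        n_right, _, right = heapq.heappop(heap)
        # ... and merge them; the new count is the sum of the two.
        heapq.heappush(heap, (n_left + n_right, next(tiebreak), (left, right)))
    tree = heap[0][2]

    # Read the code off the branches: a leaf is a str, an inner node a pair.
    code: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix

    walk(tree, "")
    return code


freq = {"a": 5, "b": 2, "c": 1, "d": 1}
print(huffman_code(freq))  # {'b': '00', 'c': '010', 'd': '011', 'a': '1'}
```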
Partial Example
(Figure: the blocks with the smallest counts, such as <>, v, s, l, g, f, w, d, and c, being merged step by step into larger blocks.)
Completed Code Tree
(Figure: the finished Huffman tree. The root block has count 1,482, the total number of characters, and splits into blocks of 876 and 606.)

Created Code
Reading each root-to-leaf path (left = 0, right = 1) gives every character its codeword. Frequent characters such as <s> and e get short codewords; rare ones such as j, x, and z get long ones.
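Decoding with the tree itself is a walk from the root: each bit picks a branch, and reaching a leaf emits a character and restarts at the root. The small tree below is hypothetical, not the tree from the slide. The nested-tuple representation is also our own choice: a leaf is a one-character string, and an inner node is a (left, right) pair.

```python
def decode_with_tree(bits: str, tree) -> str:
    """Walk from the root: 0 goes left, 1 goes right; a leaf emits its character.

    A leaf is a one-character str and an inner node is a (left, right) tuple;
    the tree is assumed to have at least two leaves.
    """
    out = []
    node = tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = tree            # start over at the root
    if node is not tree:
        raise ValueError("bits end in the middle of a codeword")
    return "".join(out)


# Hypothetical tree: b = 00, c = 010, d = 011, a = 1.
tree = (("b", ("c", "d")), "a")
print(decode_with_tree("100010011", tree))  # abcd
```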
Huffman: Summary
Total for this example: 4.14 bits per character; 6,135 total bits; 51.7% the size of ASCII representation. This is minimal for this type of code.

Other Codes
Error detecting: know if something has been modified (a bit flip).
Error correcting: know which bit has been modified.
Multicharacter: encode sequences (like "the") with their own codes. This can get much closer to the minimum possible code length: Shannon's entropy.
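Shannon's entropy, H = -Σ_c p(c) log2 p(c) with p(c) = freq(c)/N, is the lower bound on average bits per character for codes of this one-character-at-a-time type. A Huffman code's average length L satisfies H ≤ L < H + 1. The sketch below compares the two on a made-up four-letter example. The code table shown is a Huffman code for those counts (merge c and d, then b, then a). Both function names are our own.

```python
import math


def entropy_bits(freq: dict[str, int]) -> float:
    """Shannon entropy, in bits per character, of the empirical distribution."""
    total = sum(freq.values())
    return -sum((n / total) * math.log2(n / total) for n in freq.values() if n > 0)


def average_length(freq: dict[str, int], code: dict[str, str]) -> float:
    """Average codeword length, in bits per character, weighted by frequency."""
    total = sum(freq.values())
    return sum(n * len(code[c]) for c, n in freq.items()) / total


freq = {"a": 5, "b": 2, "c": 1, "d": 1}
huffman = {"a": "1", "b": "00", "c": "010", "d": "011"}
print(round(entropy_bits(freq), 3))             # 1.658
print(round(average_length(freq, huffman), 3))  # 1.667
```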
What To Know
Construct a Huffman code from frequencies.
Decode a message using a Huffman code.
Encode a message using a Huffman code.
(Let's try some examples as time permits.)

Next Time
Hillis, Chapter 8
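For practicing the encode and decode skills, here is a table-driven sketch. Encoding concatenates codewords. Decoding collects bits until they match a codeword, which is safe only because no codeword is a prefix of another. The code table is a hypothetical Huffman code for a four-letter alphabet, and `encode`/`decode` are our own names.

```python
def encode(message: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every character."""
    return "".join(code[ch] for ch in message)


def decode(bits: str, code: dict[str, str]) -> str:
    """Collect bits until they match a codeword; valid because no codeword
    is a prefix of another."""
    reverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a codeword")
    return "".join(out)


code = {"a": "1", "b": "00", "c": "010", "d": "011"}  # hypothetical Huffman code
bits = encode("abacad", code)
print(bits, len(bits))     # 10010101011 11
print(decode(bits, code))  # abacad
```

Under 8-bit ASCII the same six characters would take 48 bits.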