Chapter 6: Memory: Information and Secret Codes
CS5: Great Insights in Computer Science

Overview
When we decide how to represent something in bits, there are some competing interests: the representation should be easily manipulated/processed, and it should be short.
It is common to use two representations: one direct, to allow for easy processing, and one terse (compressed), to save storage and communication costs.
Plan
I'm going to try to describe one neat idea implicit in Chapter 6: Huffman coding.
For more information, see Wikipedia: http://en.wikipedia.org/wiki/Huffman_coding

Gettysburg Address
Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate, we can not consecrate, we can not hallow this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us, that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion, that we here highly resolve that these dead shall not have died in vain, that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, for the people, shall not perish from the earth.
Character Counts
For simplicity, let's turn the uppercase letters into lowercase letters. That leaves us with counts such as these (<s> stands for a space):
<s> 282, ? 0, a 102, d 58, e 165, f 27, g 28, h 80, i 68, j 0, l 42, n 77, o 93, r 79, s 44, t 126, v 24, w 28, x 0, z 0

Attempt #1: ASCII
The standard format for representing characters uses 8 bits per character. The address is 1482 characters long, so a total of 11856 bits is needed using this representation.
8 bits per character; 11856 total bits; 100% the size of the ASCII representation
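The counting step and the ASCII total can be sketched in a few lines of Python. This is a minimal sketch, assuming the text is available as a plain string; the helper names `char_counts` and `ascii_bits` are hypothetical, and unlike the slides it keeps punctuation exactly as it appears rather than making any decision about it.

```python
from collections import Counter

def char_counts(text: str) -> Counter:
    """Count each character after folding uppercase letters to lowercase."""
    return Counter(text.lower())

def ascii_bits(text: str) -> int:
    """Total bits when every character is stored as one 8-bit ASCII byte."""
    return 8 * len(text)

opening = "Four score and seven years ago"
counts = char_counts(opening)
print(counts[" "], counts["o"], counts["f"])  # 5 3 1
print(ascii_bits(opening))                    # 240
```

Run on the full address, the same two functions give the per-character table and the 8 × 1482 = 11856 total.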
Attempt #2: Compact
Note that, at least in its lowercase form, the address needs only 32 different characters. Therefore, each can be assigned a 5-bit code (there are 32 different 5-bit patterns).
5 bits per character; 7410 total bits; 62.5% the size of the ASCII representation
5-bit patterns: each character gets its own pattern, counting up from 00000 for <s> to 11111 for z.
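A fixed-length code like this can be built by numbering the characters and writing each number in binary, using ⌈log₂ k⌉ bits for k characters. The sketch below assumes a hypothetical 32-character alphabet (space, five punctuation marks, 26 letters); the exact punctuation marks and their order are an assumption, not the slide's table.

```python
import math

def fixed_length_code(alphabet: list[str]) -> dict[str, str]:
    """Number the characters and write each number in binary, all the same width."""
    width = max(1, math.ceil(math.log2(len(alphabet))))  # 32 characters -> 5 bits
    return {ch: format(i, f"0{width}b") for i, ch in enumerate(alphabet)}

# Hypothetical 32-character alphabet: space, five punctuation marks, 26 letters.
alphabet = [" ", ".", ",", "-", "?", "'"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
code = fixed_length_code(alphabet)
print(code[" "], code["a"], code["z"])  # 00000 00110 11111
print(5 * 1482)                          # 7410 bits for the 1482-character address
```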
Attempt #3: Variable Length
Some characters are much more common than others. Give the 4 most common characters a 3-bit code and the remaining 28 a 6-bit code. How many bits do we need now?
Variable-length patterns: <s>, e, t and a get the 3-bit codes 000, 001, 010 and 011; the other 28 characters (o, h, r, n, i, d, s, l, c, w, g, f, v, …, j, x, z) get 6-bit codes starting with 1: 100000, 100001, 100010, and so on.
Decodability
Note that the code was chosen so that the first bit of each character tells you whether the code is short (0) or long (1). This choice ensures that a message can actually be decoded: read one bit, then read 2 more bits if it was 0 or 5 more if it was 1, and look up the character.
Example: h i <s> t h e r e …: 42 bits, not 45.
But it is harder to work with.
What Gives?
We had assigned all 32 characters 5-bit codes. Now we've got 4 that have 3-bit codes and 28 that have 6-bit codes. So more than half of the characters have actually gotten longer. How can that change help? We need to factor in how many of each character there are.
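The flag-bit scheme can be sketched as follows. This assumes characters are listed from most to least common and that codes are handed out in counting order, which is one natural choice rather than necessarily the slide's exact table; the part of `ORDER` after "v" is a hypothetical ordering.

```python
# Characters from most to least common; the part after "v" is a hypothetical ordering.
ORDER = " etaohrnidslcwgfv" + "upbmykq.,-?'jxz"

def flagged_code(order: str) -> dict[str, str]:
    """First 4 characters get '0' + 2 bits; the other 28 get '1' + 5 bits."""
    code = {}
    for i, ch in enumerate(order):
        if i < 4:
            code[ch] = "0" + format(i, "02b")
        else:
            code[ch] = "1" + format(i - 4, "05b")
    return code

def encode(message: str, code: dict[str, str]) -> str:
    return "".join(code[ch] for ch in message)

def decode(bits: str, code: dict[str, str]) -> str:
    """The first bit of each codeword says whether to read 3 or 6 bits in total."""
    reverse = {word: ch for ch, word in code.items()}
    out, i = [], 0
    while i < len(bits):
        length = 3 if bits[i] == "0" else 6
        out.append(reverse[bits[i:i + length]])
        i += length
    return "".join(out)

code = flagged_code(ORDER)
bits = encode("hi there", code)
print(len(bits), decode(bits, code))  # 36 hi there
```

For the 8 characters of "hi there" this gives 36 bits, against 40 with a 5-bit code: the common characters pay for the rare ones.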
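Adding Up the Bits
How many bits to write down just the letter y? Well, there are 10 y's, and each takes 6 bits. So 60 bits. (It was 50 before.)
How about t? There are 126, and each takes 3 bits. That's 378 (was 630).
So how do we total them all? Let $c$ be a character, $\text{freq}(c)$ the number of times it appears, and $\text{len}(c)$ its encoding length:
$$\text{Total bits} = \sum_c \text{freq}(c) \times \text{len}(c)$$
Summing It Up
$$282 \times 3 + 165 \times 3 + 126 \times 3 + 102 \times 3 + 93 \times 6 + 80 \times 6 + 79 \times 6 + \cdots + 0 \times 6 + 0 \times 6 = 6867$$
Counts, from most to least common: <s> 282, e 165, t 126, a 102, o 93, h 80, r 79, n 77, i 68, d 58, s 44, l 42, c …, w 28, g 28, f 27, v 24, …, ? 0, j 0, x 0, z 0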
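The total-bits formula translates directly into code. This is a minimal sketch, assuming counts are kept in a dict and codewords are strings of '0' and '1'; `total_bits` is a hypothetical helper name. The second check regroups the slide's sum by code length, using the counts for <s>, e, t and a (282, 165, 126, 102) and the 1482-character total.

```python
def total_bits(freq: dict[str, int], code: dict[str, str]) -> int:
    """Total bits = sum over characters c of freq(c) * len(c)."""
    return sum(n * len(code[ch]) for ch, n in freq.items())

# Tiny check: "a" three times with a 1-bit code, "b" once with a 2-bit code.
print(total_bits({"a": 3, "b": 1}, {"a": "0", "b": "10"}))  # 5

# The slide's total, grouped by code length: the four most common
# characters use 3 bits, everything else in the 1482-character text uses 6.
top_four = 282 + 165 + 126 + 102  # <s>, e, t, a -> 675 characters
print(3 * top_four + 6 * (1482 - top_four))  # 6867
```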
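Attempt #3: Summary
Total for this example: 4.63 bits per character (1482 characters); 6867 total bits; 57.9% the size of the ASCII representation
Attempt #4: Sorted
Give the shortest patterns to the most common characters: 0 for <s>, 1 for e, and the next-shortest patterns to t, a, o, and so on.
Total for this example: … bits per character; … total bits; …% the size of the ASCII representation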
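The sorted attempt saves bits by reusing short patterns, but then some codewords are prefixes of others and one bit string can be split in more than one way. A small checker, assuming a hypothetical shortest-first assignment of 0, 1, 00, 01 to <s>, e, t, a (the function names are illustrative):

```python
def is_prefix_free(code: dict[str, str]) -> bool:
    """True when no codeword is a prefix of another (duplicates also fail)."""
    words = sorted(code.values())
    # In sorted order, a word that is a prefix of any later word is also a
    # prefix of its immediate successor, so checking neighbours is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def count_parses(bits: str, code: dict[str, str]) -> int:
    """Number of ways to split `bits` into codewords (dynamic programming)."""
    words = set(code.values())
    ways = [1] + [0] * len(bits)  # ways[i] = number of splits of the first i bits
    for i in range(1, len(bits) + 1):
        for w in words:
            if len(w) <= i and bits[i - len(w):i] == w:
                ways[i] += ways[i - len(w)]
    return ways[len(bits)]

# Hypothetical shortest-first assignment in the spirit of Attempt #4.
sorted_code = {" ": "0", "e": "1", "t": "00", "a": "01"}
print(is_prefix_free(sorted_code))       # False: "0" starts "00" and "01"
print(count_parses("001", sorted_code))  # 3: <s><s>e, te, or <s>a
```

A prefix-free code always yields exactly one split for any valid message, which is what "decodable" requires.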
Attempt #5: Your Turn
Make sure it is decodable!
Counts, from most to least common: <s> 282, e 165, t 126, a 102, o 93, h 80, r 79, n 77, i 68, d 58, s 44, l 42, c …, w 28, g 28, f 27, v 24, …, ? 0, j 0, x 0, z 0
Can We Do Better?
Shannon invented information theory, which talks about bits and randomness and encodings. Fano and Shannon worked together on finding minimal-size codes. They found a good heuristic but didn't solve the problem. Fano assigned the problem to his class. Huffman solved it, not knowing his professor had unsuccessfully struggled with it.
Tree (Prefix) Code
First, notice that a code can be drawn as a tree: left = 0, right = 1. So e = 001 and w = 101001.
The tree structure ensures the code is decodable: the bits tell you unambiguously which character comes next.
[Figure: the code drawn as a tree, with leaves <s>, e, t, a, o, h, r, n, i, d, s, l, c, w, g, f, v, …, ?, j, x, z]
Huffman Coding
Make each character a subtree ("block") with count equal to its frequency.
Take the two blocks with the smallest counts and merge them into left and right branches. The count for the new block is the sum of the counts of the blocks it is made out of.
Repeat until all blocks have been merged into one big block (a single tree).
Read the code off the branches in the tree.
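The merging procedure above maps naturally onto a priority queue. This sketch uses Python's `heapq`; the tie-breaking counter is an implementation choice, so equal counts may be merged in a different order than in the slides' tree, which can change individual codewords but not the total number of bits. The counts in the usage line are made up for illustration, not the address's.

```python
import heapq
from itertools import count

def huffman_code(freq: dict[str, int]) -> dict[str, str]:
    """Merge the two smallest blocks until one tree remains, then read codes off it."""
    order = count()  # unique tie-breaker so equal counts never compare the trees
    # Heap entries are (block count, tie-breaker, tree); a tree is a character
    # or a (left, right) pair of subtrees.
    heap = [(n, next(order), ch) for ch, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a single character still needs one bit
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(order), (left, right)))

    code: dict[str, str] = {}

    def walk(tree, path: str) -> None:
        if isinstance(tree, tuple):
            walk(tree[0], path + "0")  # left branch = 0
            walk(tree[1], path + "1")  # right branch = 1
        else:
            code[tree] = path

    walk(heap[0][2], "")
    return code

def decode(bits: str, code: dict[str, str]) -> str:
    """Read bits until they spell a whole codeword; prefix-freeness makes this unambiguous."""
    reverse = {word: ch for ch, word in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out)

freq = {"a": 5, "b": 2, "c": 1, "d": 1}
code = huffman_code(freq)
print(sorted(code.items()))  # [('a', '1'), ('b', '00'), ('c', '010'), ('d', '011')]
```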
Partial Example
[Figure: the first merges, starting from the blocks with the smallest counts]
Completed Code Tree
[Figure: the completed Huffman tree; the root block has count 1482, split into blocks of 606 and 876]
Created Code
[Table: the code read off the completed tree, one codeword per character]
Huffman: Summary
Total for this example: 4.14 bits per character; 6135 total bits; 51.7% the size of the ASCII representation
Minimal for a character-by-character code for this passage
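How close is "minimal" to the ideal? For any character-by-character code, Shannon's entropy of the character frequencies is a lower bound on the average bits per character, and a Huffman code always lands within one bit of it. A sketch comparing the two, using small made-up counts (not the address's) and a Huffman code for them; the helper names are hypothetical.

```python
import math

def entropy_bits_per_char(freq: dict[str, int]) -> float:
    """Shannon entropy -sum p(c) log2 p(c), with p(c) = freq(c) / total."""
    total = sum(freq.values())
    return -sum(n / total * math.log2(n / total) for n in freq.values() if n > 0)

def average_bits(freq: dict[str, int], code: dict[str, str]) -> float:
    """Average codeword length, weighted by how often each character appears."""
    total = sum(freq.values())
    return sum(n * len(code[ch]) for ch, n in freq.items()) / total

freq = {"a": 5, "b": 2, "c": 1, "d": 1}
huffman = {"a": "1", "b": "00", "c": "010", "d": "011"}  # a Huffman code for these counts
h, avg = entropy_bits_per_char(freq), average_bits(freq, huffman)
print(round(h, 3), round(avg, 3))  # 1.658 1.667
```

Feeding in the address's full count table would give the entropy bound to compare against its Huffman figure.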
Other Codes
Error detecting: know if something has been modified (a bit flip).
Error correcting: know which bit has been modified. Can you think of a familiar example?
Multi-character: encode sequences (like "the") with their own codes. This can get much closer to the minimum possible code length: Shannon's entropy.
Video Compression
Colors generally change slowly: in space (except at edges) and in time (except at cuts or fast motion). So: encode the colors of regions. This can lead to artifacts like macroblocking:
http://www.youtube.com/watch?v=xaj9tbrc2xm
http://www.youtube.com/watch?v=xcrz4itbf
Encryption
By agreeing on a scheme for transmitting information, computers can send secret messages to each other. Most current schemes depend on facts from number theory, including (often) facts about prime numbers and the difficulty of factoring large numbers into primes.
Demo: Send a secret word. Repeat until it doesn't work anymore.
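RSA is one widely known scheme of this kind (the slide does not name a specific one). Below is a toy sketch with textbook-sized primes, for illustration only: real keys use primes hundreds of digits long plus padding, and this version is not secure. The function names `make_keys` and `apply_key` are hypothetical.

```python
def make_keys(p: int, q: int, e: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return (public key, private key) for textbook RSA with primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)  # easy to compute only if you know the factors p and q
    d = pow(e, -1, phi)      # modular inverse of e (Python 3.8+); ValueError if none exists
    return (e, n), (d, n)

def apply_key(m: int, key: tuple[int, int]) -> int:
    """Encrypt or decrypt a number smaller than n: m ** exponent mod n."""
    exponent, n = key
    return pow(m, exponent, n)

public, private = make_keys(61, 53, 17)  # tiny primes: anyone could factor n = 3233
cipher = apply_key(65, public)
print(cipher, apply_key(cipher, private))  # 2790 65
```

Anyone who can factor n back into 61 and 53 can recompute the private exponent, which is why the security rests on factoring being hard for large numbers.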