COMM1003 Information Theory Dr. Wassim Alexan Spring 2018 Lecture 5
The Baconian Cipher
A monoalphabetic cipher invented by Sir Francis Bacon.
In this cipher, each letter is replaced by a sequence of five characters.
In the original Baconian cipher, the letters were replaced by As and Bs, as follows: a = aaaaa, b = aaaab, c = aaaba, d = aaabb, ...
A nicer modification is to use 0s and 1s, as follows: a = 00000, b = 00001, c = 00010, d = 00011, ...
This is the same as counting from 0 to 25 in binary, using five bits per letter, to accommodate the 26 letters of the English alphabet.
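The 0/1 variant described above can be sketched in a few lines of Python. The function names bacon_encode and bacon_decode are illustrative choices, not part of the lecture, and the sketch assumes the 26-letter English alphabet with a = 0.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: 26 English letters, a = 0 ... z = 25


def bacon_encode(plaintext):
    """Replace every letter by its 5-bit binary index; other characters are dropped."""
    return "".join(
        format(ALPHABET.index(ch), "05b")
        for ch in plaintext.lower()
        if ch in ALPHABET
    )


def bacon_decode(bits):
    """Split the bit string into groups of five and map each group back to a letter."""
    bits = "".join(b for b in bits if b in "01")  # ignore spaces and line wraps
    if len(bits) % 5 != 0:
        raise ValueError("bit string length must be a multiple of 5")
    letters = []
    for i in range(0, len(bits), 5):
        value = int(bits[i:i + 5], 2)
        if value >= len(ALPHABET):
            raise ValueError("group does not correspond to a letter: " + bits[i:i + 5])
        letters.append(ALPHABET[value])
    return "".join(letters)


print(bacon_encode("abcd"))  # 00000000010001000011
```

Because five bits give only 32 distinct groups, bacon_decode rejects any group whose value exceeds 25.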
The Baconian Cipher: A Quick Question!
Is it possible to use the Baconian cipher to encrypt a message written in Russian, knowing that the Russian alphabet consists of 33 letters?
What about a message in Dutch (26 letters), French (31 letters) or Arabic (28 letters)?
Exercise 1
You received the following ciphertext:
010000001100100000100101100000100010010010110000001000101110011010001000111001000000010011010000110100110
Can you identify the cipher used here? Carry out cryptanalysis to reveal the plaintext.
Exercise 1 Solutions
Since the ciphertext is composed of only two characters, 0s and 1s, this might be the Baconian cipher.
Splitting the ciphertext into groups of five bits and decrypting each group, the plaintext reveals itself to be
ideclarewaroncheating
Adding spaces, we get
i declare war on cheating
Breaking Monoalphabetic Substitution Ciphers
Substitution ciphers without a key are very weak and easily broken.
A brute force attack on a monoalphabetic substitution cipher with a key would require a maximum of 26! ≈ 2^88 trials.
Nevertheless, a simple letter frequency analysis would easily reveal the contents of a ciphertext made with a monoalphabetic substitution cipher.
By counting the frequency of letters in a large enough body of text, one can obtain the frequency of each letter in any given language.
Fig. 1 shows the frequency of each letter in the English language.
A third option would be to employ the Hill Climbing Algorithm.
Letter Frequency Analysis
Fig. 1. Frequency of letters in the English language.
Letter Frequency Analysis
A quick examination of Fig. 2 reveals that the letter e is the most frequent in the English language (~0.13), followed by t (~0.09), a (~0.08) and o (~0.075).
Fig. 2. Frequency of letters in the English language, in descending order.
Letter Frequency Analysis
Thus, given a ciphertext of sufficient length, one could count the occurrences of each letter and attempt to map the ciphertext letters to their plaintext substitutes based on the values shown in Fig. 3.
Fig. 3. Frequency of letters in the English language.
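The counting step described above can be sketched as follows. This is a minimal illustration; the function name letter_frequencies and the sample string are assumptions made for this example.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, count, relative frequency) triples, most frequent first."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]  # keep A-Z only
    total = len(letters)
    return [
        (letter, count, count / total)
        for letter, count in Counter(letters).most_common()
    ]


# Hypothetical sample input, only to show the output format.
for letter, count, freq in letter_frequencies("attack the main gate"):
    print(letter, count, round(freq, 4))
```

The resulting table is then compared against the reference frequencies of the assumed language.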
Exercise 2
You received the following ciphertext, which was encoded with a shift cipher:
XULTPAAJCXITLTLXAARPJHTIWTGXKTGHIDHIPXCIWTVGTPILPITGHLXIWIWTXGQADDS
Perform an attack against this cipher based on a letter frequency count. How many letters do you have to identify through a frequency count to recover the key? What is the cleartext?
Exercise 2 Solutions
We carry out a letter frequency analysis, obtaining the following numerical results:
Letter  Count  Frequency
T       10     0.1493
I       9      0.1343
X       7      0.1045
P       5      0.0746
L       5      0.0746
G       5      0.0746
A       5      0.0746
W       4      0.0597
H       4      0.0597
D       3      0.0448
J       2      0.0299
C       2      0.0299
V       1      0.0149
U       1      0.0149
S       1      0.0149
R       1      0.0149
Q       1      0.0149
K       1      0.0149
Since we already know that this is a shift cipher, all we need to do is identify the shift key, and identifying a single letter is enough for that.
This is actually pretty easy, since there is a big enough gap between the frequency of the most frequent letter in English, e, and that of the next most frequent letter.
By mapping the cipherletter T to the plainletter e (a shift of 15), we get the following key:
Ciphertext: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Plaintext:  l m n o p q r s t u v w x y z a b c d e f g h i j k
Exercise 2 Solutions
Decrypting the text, we get
ifweallunitewewillcausetheriverstostainthegreatwaterswiththeirblood
Adding spaces, we get
if we all unite we will cause the rivers to stain the great waters with their blood.
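The attack used in this solution can be sketched as follows. It assumes, as the solution does, that the most frequent ciphertext letter stands for e; the function name crack_shift is an illustrative choice, and on short or unusual texts the assumption may fail.

```python
from collections import Counter


def crack_shift(ciphertext, assumed_plain="e"):
    """Guess the shift by mapping the most frequent cipherletter to assumed_plain."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    most_common = Counter(letters).most_common(1)[0][0]
    shift = (ord(most_common) - ord(assumed_plain.upper())) % 26
    # Undo the shift: plain = (cipher - shift) mod 26, written in lowercase.
    plaintext = "".join(
        chr((ord(c) - ord("A") - shift) % 26 + ord("a")) for c in letters
    )
    return shift, plaintext


ciphertext = "XULTPAAJCXITLTLXAARPJHTIWTGXKTGHIDHIPXCIWTVGTPILPITGHLXIWIWTXGQADDS"
shift, plaintext = crack_shift(ciphertext)
print(shift)      # 15
print(plaintext)  # ifweallunitewewillcausetheriverstostainthegreatwaterswiththeirblood
```

If the first guess yields gibberish, the same function can be retried with the next most likely plainletter, for example assumed_plain="t".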
The Hill Climbing Algorithm
This algorithm searches for the key to the cipher.
An initial key is chosen at random and is used to decipher the ciphertext.
A statistical analysis is carried out on the obtained plaintext, and the result is compared to the statistics of the English language (or whatever language is assumed to be used in this context).
The key is then changed slightly; if the plaintext obtained with the changed key fits the statistical profile of the language better, the changed key is kept.
This process is repeated iteratively until no further change improves the statistical fit.
The Hill Climbing Algorithm in Steps
1. Generate a random key, called the parent, and decipher the ciphertext using this key. Rate the fitness of the deciphered text and store the result.
2. Change the key slightly (swap two characters in the key at random) and measure the fitness of the text deciphered using this new key, called the child.
3. If the fitness is higher with the modified key, discard the old parent key and store the modified key as the new parent.
4. Go back to step 2, unless no improvement in fitness occurred in the last 1000 iterations.
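The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the key is represented as a 26-letter string whose i-th character is the cipherletter for the i-th plainletter, and the fitness function is passed in by the caller (its construction is not part of this sketch). The names decipher, hill_climb and patience are illustrative.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def decipher(ciphertext, key):
    """key[i] is the uppercase cipherletter that stands for plainletter ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(key, ALPHABET))


def hill_climb(ciphertext, fitness, patience=1000, rng=None):
    rng = rng or random.Random()
    # Step 1: random parent key, scored once.
    parent = list(string.ascii_uppercase)
    rng.shuffle(parent)
    best = fitness(decipher(ciphertext, "".join(parent)))
    stale = 0  # iterations since the last improvement
    while stale < patience:  # Step 4: stop after `patience` iterations without gain
        # Step 2: the child is the parent with two random positions swapped.
        child = parent[:]
        i, j = rng.sample(range(26), 2)
        child[i], child[j] = child[j], child[i]
        score = fitness(decipher(ciphertext, "".join(child)))
        # Step 3: keep the child only if it scores strictly higher.
        if score > best:
            parent, best, stale = child, score, 0
        else:
            stale += 1
    return "".join(parent), best
```

Because the search only ever accepts improvements, it can get stuck in a local optimum; restarting from several random parent keys and keeping the best result is a common remedy.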
The Hill Climbing Algorithm Notes
This algorithm depends on the fitness function correctly distinguishing whether the plaintext obtained from one key is better than the plaintext obtained from another key.
This is done by comparing quadgram statistics from the obtained plaintext with those of the target language.
However, this system fails when the true plaintext has an unusual statistical profile.
Consider this sample text from Simon Singh's book The Code Book:
From Zanzibar to Zambia to Zaire, ozone zones make zebras run zany zigzags
This sample text is full of unusual quadgrams, so it is expected to have a very low score!
The Hill Climbing algorithm will most likely find a key that gives a piece of garbled plaintext that scores much higher than the true plaintext.
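One common way to build such a quadgram fitness function is to sum the log-probabilities of all quadgrams in the candidate plaintext. The sketch below assumes this scoring scheme; the lecture does not spell out its exact formula. The counts in the usage example are toy placeholders rather than real language statistics, and the floor value for unseen quadgrams is an arbitrary choice.

```python
import math


def make_quadgram_fitness(quadgram_counts):
    """Build a fitness function from a dict mapping quadgrams to corpus counts."""
    total = sum(quadgram_counts.values())
    log_probs = {q: math.log10(c / total) for q, c in quadgram_counts.items()}
    floor = math.log10(0.01 / total)  # assumed penalty for unseen quadgrams

    def fitness(text):
        # Sum of log10 probabilities over every overlapping quadgram in the text.
        return sum(
            log_probs.get(text[i:i + 4], floor) for i in range(len(text) - 3)
        )

    return fitness


# Toy counts for illustration only; a real table would come from a large corpus.
fitness = make_quadgram_fitness({"then": 3, "that": 1})
print(fitness("then") > fitness("zzzz"))  # True
```

Since every term is the logarithm of a probability, scores are negative, and a text made of rare quadgrams (like the zebra sentence above) scores very low even though it is valid English.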
The Hill Climbing Algorithm: An Example
Consider the following ciphertext:
SOWFBRKAWFCZFSBSCSBQITBKOWLBFXTBKOWLSOXSOXFZWWIBICFWUQLRXINOCIJLWJFQUNWXLFBSZXFBTXAANTQIFBFSFQUFCZFSBSCSBIMWHWLNKAXBISWGSTOXLXTSWLUQLXJBUUWLWISTBKOWLSWGSTOXLXTSWLBSJBUUWLFULQRTXWFXLTBKOWLBISOXSSOWTBKOWLXAKOXZWSBFIQSFBRKANSOWXAKOXZWSFOBUSWJBSBFTQRKAWSWANECRZAWJ
To begin the algorithm, we generate a random key, for example:
Plaintext:  a b c d e f g h i j k l m n o p q r s t u v w x y z
Ciphertext: Y B X O N G S W K C P Z F M T D H R Q U J V E L I A
The Hill Climbing Algorithm: An Example
Then, we decipher the ciphertext using this key, getting
gdhmbrizhmjlmgbgjgbsyobidhxbmcobidhxgdcgdcmlhhybyjmhtsxrcyedjyuxhumstehcxmbglcmboczzeosymbmgmstmjlmgbgjgbynhqhxeizcbyghfgodcxcoghxtsxcubtthxhygobidhxghfgodcxcoghxbgubtthxmtxsrochmcxobidhxbygdcggdhobidhxczidclhgbmysgmbrizegdhczidclhgmdbtghubgbmosrizhghzewjrlzhu
The fitness of our first plaintext attempt is -22304.04.
We now make a random change to the key, for example by swapping the letters y and b in the key, and try again.
This time, we get a fitness of -2200.78. An improvement!
But the text is still not readable, so we keep carrying out more and more iterations, getting better and better fitness values.
The Hill Climbing Algorithm: An Example
After many iterations, the final key is found to be
Plaintext:  a b c d e f g h i j k l m n o p q r s t u v w x y z
Ciphertext: X Z T J W U M O B E P A R I Q K D L F S C H Y G N V
which results in the plaintext
thesimplesubstitutioncipherisacipherthathasbeeninuseformanyhundredsofyearsitbasicallyconsistsofsubstitutingeveryplaintextcharacterforadifferentciphertextcharacteritdiffersfromcaesarcipherinthatthecipheralphabetisnotsimplythealphabetshifteditiscompletelyjumbled
The Hill Climbing Algorithm: An Example
Adding spaces, the plaintext is now quite readable:
the simple substitution cipher is a cipher that has been in use for many hundreds of years it basically consists of substituting every plaintext character for a different ciphertext character it differs from caesar cipher in that the cipher alphabet is not simply the alphabet shifted it is completely jumbled
The Hill Cipher
A polyalphabetic cipher invented by Lester Hill in 1929.
The encryption process is based on the mathematical formula
$$E(l) = K\,l \bmod m \qquad (1)$$
where
l is a vector containing n letters from the plaintext,
K is the n × n key matrix,
m is the length of the alphabet.
The Hill Cipher
The decryption is based on the mathematical formula
$$D(E(l)) = K^{-1}\,E(l) \bmod m \qquad (2)$$
Note that K^{-1} is not the ordinary linear-algebraic inverse of K; it is the inverse of K modulo m.
The full decryption details of this cipher are left as a reading exercise for the students.
The Hill Cipher: An Example
Consider the following plaintext, to which we want to apply the Hill cipher:
attack the main gate of the castle at seven pm
Let the key matrix be
$$K = \begin{pmatrix} 2 & 4 & 5 \\ 9 & 2 & 1 \\ 3 & 17 & 7 \end{pmatrix} \qquad (3)$$
Take the first 3 letters of the plaintext (n = 3) and assign them numbers that refer to their locations in the alphabet to form l, then carry out the encryption as in (1).
a b c d e f g h i j k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
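The Hill Cipher: An Example
att ack the mai nga teo fth eca stl eat sev enp m
For the first group, att, we have l = (0, 19, 19), so
$$E(l) = (K\,l) \bmod m = \begin{pmatrix} 2 & 4 & 5 \\ 9 & 2 & 1 \\ 3 & 17 & 7 \end{pmatrix} \begin{pmatrix} 0 \\ 19 \\ 19 \end{pmatrix} \bmod 26 = \begin{pmatrix} 171 \\ 57 \\ 456 \end{pmatrix} \bmod 26 = \begin{pmatrix} 15 \\ 5 \\ 14 \end{pmatrix} = \begin{pmatrix} P \\ F \\ O \end{pmatrix}$$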
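The block encryption in (1) can be sketched in plain Python as follows. The sketch uses the key matrix (2 4 5; 9 2 1; 3 17 7) from the worked example, numbers the letters a = 0 to z = 25, and pads an incomplete final group with the letter a; the function name hill_encrypt and its parameters are illustrative choices.

```python
KEY = [
    [2, 4, 5],
    [9, 2, 1],
    [3, 17, 7],
]


def hill_encrypt(plaintext, key=KEY, m=26, pad="a"):
    """Encrypt with E(l) = K l mod m, one block of n = len(key) letters at a time."""
    n = len(key)
    numbers = [ord(ch) - ord("a") for ch in plaintext.lower() if "a" <= ch <= "z"]
    while len(numbers) % n != 0:  # pad the last group to a full block
        numbers.append(ord(pad) - ord("a"))
    cipher = []
    for start in range(0, len(numbers), n):
        block = numbers[start:start + n]
        for row in key:  # one matrix row gives one ciphertext letter
            cipher.append(sum(k * x for k, x in zip(row, block)) % m)
    return "".join(chr(value + ord("A")) for value in cipher)


print(hill_encrypt("att"))  # PFO
```

Decryption would apply the same routine with the inverse of the key matrix modulo m, which exists only if the determinant of K is coprime to m.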
Exercise 6
Continue the encryption process using the Hill cipher for the plaintext
attack the main gate of the castle at seven pm
Exercise 6 Solutions
We should start off by dividing the plaintext into groups of 3 letters
att ack the mai nga teo fth eca stl eat sev enp m
and pad the last group with two extra a letters, so that every group is made up of 3 letters
att ack the mai nga teo fth eca stl eat sev enp maa
We would then continue encrypting as before. For example, the = (19, 7, 4) gives (86, 189, 204) mod 26 = (8, 7, 22) = IHW. The full ciphertext would be
PFO GOA IHW MMO YZL ULP RMX QOU LDM ZDP BJJ FZA YEK
Removing the spaces, we get
PFOGOAIHWMMOYZLULPRMXQOULDMZDPBJJFZAYEK
The Homophonic Substitution Cipher
This is a polyalphabetic cipher.
The encryption process is based on a substitution in which one plaintext letter can correspond to multiple ciphertext symbols.
By introducing multiple substitutions for the high-frequency letters, we effectively flatten the frequency distribution of the ciphertext, thus making a letter frequency analysis almost obsolete!
plaintext:  a b c d e f g h i j k l m n o p q r s t u v w x y z
ciphertext: D X S F Z E H C V I T P G A Q L K J R U O W M Y B N
            9       7       3         5 0       4 6
                    2
                    1
That is, the high-frequency letters have extra substitutes: a can also be written as 9; e as 7, 2 or 1; i as 3; n as 5; o as 0; s as 4; t as 6.
(Slide 26)
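A minimal sketch of this cipher follows. It assumes the substitution table of this slide, with the digit homophones assigned as a: 9; e: 7, 2, 1; i: 3; n: 5; o: 0; s: 4; t: 6. The assignment of 3 to i is inferred from the column order of the table and should be treated as an assumption. The function names are illustrative.

```python
import random
import string

BASE = "DXSFZEHCVITPGAQLKJRUOWMYBN"  # first substitute for a..z
EXTRA = {"a": "9", "e": "721", "i": "3", "n": "5", "o": "0", "s": "4", "t": "6"}

# Every plainletter maps to a string of interchangeable ciphertext symbols.
HOMOPHONES = {
    p: c + EXTRA.get(p, "") for p, c in zip(string.ascii_lowercase, BASE)
}
# Decryption is unambiguous: each ciphertext symbol belongs to one plainletter.
REVERSE = {sym: p for p, symbols in HOMOPHONES.items() for sym in symbols}


def homophonic_encrypt(plaintext, rng=None):
    """Replace each letter by one of its homophones, chosen at random."""
    rng = rng or random.Random()
    return "".join(
        rng.choice(HOMOPHONES[ch]) for ch in plaintext.lower() if ch in HOMOPHONES
    )


def homophonic_decrypt(ciphertext):
    """Map every ciphertext symbol back to its plainletter."""
    return "".join(REVERSE[sym] for sym in ciphertext if sym in REVERSE)


print(homophonic_decrypt(homophonic_encrypt("attack the main gate")))
```

Encrypting the same plaintext twice will usually give different ciphertexts, since the homophone for each occurrence of a frequent letter is picked at random.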
Exercise 7
Given the following ciphertext, attempt an attack on it, knowing that its encryption was carried out using the homophonic substitution cipher outlined in the previous slide:
F7EZ5FUC21DR6M9PP0E6CZSD4UP1
Exercise 7 Solutions
Using the table on slide 26 (The Homophonic Substitution Cipher), we decrypt the ciphertext, getting
defendtheeastwallofthecastle
Adding spaces, we get
defend the east wall of the castle
The Rail Fence Cipher
The rail fence cipher is an easy-to-apply transposition cipher that jumbles up the order of the letters of a message in a quick and convenient way.
The key is the number of lines used.
It works by writing the plaintext on alternate lines across the page (in a zigzag when more than two lines are used), then reading off each line in turn.
For example, the plaintext defend the east wall is written as follows, with all spaces removed and a key = 2:
D.F.N.T.E.A.T.A.L
.E.E.D.H.E.S.W.L.
Then, the ciphertext would be
DFNTEATALEEDHESWL
The Rail Fence Cipher
The same plaintext could be encoded with a key = 3, padding the last couple of cells with the letter x:
D...N...E...T...L..
.E.E.D.H.E.S.W.L.X.
..F...T...A...A...X
Then, the ciphertext would be
DNETLEEDHESWLXFTAAX
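The zigzag writing and line-by-line reading can be sketched as follows. The helper name rail_pattern and the function name rail_fence_encrypt are illustrative; padding, where wanted, is assumed to be added by the caller.

```python
def rail_pattern(length, rails):
    """Row index of every position when writing in a zigzag over `rails` lines."""
    if rails == 1:
        return [0] * length
    period = 2 * (rails - 1)  # length of one down-and-up cycle
    return [min(i % period, period - i % period) for i in range(length)]


def rail_fence_encrypt(plaintext, rails):
    """Write the letters in a zigzag, then read off each line in turn."""
    text = "".join(ch for ch in plaintext if ch.isalpha()).upper()
    rows = rail_pattern(len(text), rails)
    return "".join(
        ch for rail in range(rails) for ch, row in zip(text, rows) if row == rail
    )


print(rail_fence_encrypt("defend the east wall", 2))   # DFNTEATALEEDHESWL
print(rail_fence_encrypt("defendtheeastwallxx", 3))    # DNETLEEDHESWLXFTAAX
```

With a key of 2 the zigzag degenerates into simple alternation between two lines, which matches the first example.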
Exercise 8
Given the following ciphertext, attempt an attack on it, knowing that its encryption was carried out using a Rail Fence cipher with a prime number key less than 7:
DAEAISOETRXCUWLUTNHOUAESULBEOCNOMBEOGTDTYSYEX
Exercise 8 Solutions
Since we do not know the key, we will have to attempt decrypting the ciphertext with the keys 2, 3 and 5.
The correct key is 5, which is the number of rows of the matrix. To know the number of columns, we have to count the characters in the ciphertext, which turns out to be 45.
The next step is to write down the matrix, mark the zigzag positions, and fill them with the ciphertext row by row as follows:
D.......A.......E.......A.......I.......S....
.O.....E.T.....R.X.....C.U.....W.L.....U.T...
..N...H...O...U...A...E...S...U...L...B...E..
...O.C.....N.O.....M.B.....E.O.....G.T.....D.
....T.......Y.......S.......Y.......E.......X
Exercise 8 Solutions
Finally, we read off the plaintext from the matrix in a zigzag fashion and drop the padding letter x at the end, getting
donotcheatonyourexamsbecauseyouwillgetbusted
Adding spaces, we get
do not cheat on your exams because you will get busted
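The decryption procedure of this exercise can be sketched as follows: compute the zigzag positions, fill them with the ciphertext row by row, and read the result in position order. The function names are illustrative, and the sketch simply tries the candidate keys 2, 3 and 5 named in the exercise.

```python
def rail_pattern(length, rails):
    """Row index of every position when writing in a zigzag over `rails` lines."""
    if rails == 1:
        return [0] * length
    period = 2 * (rails - 1)
    return [min(i % period, period - i % period) for i in range(length)]


def rail_fence_decrypt(ciphertext, rails):
    """Place the ciphertext on the zigzag row by row, then read along the zigzag."""
    n = len(ciphertext)
    rows = rail_pattern(n, rails)
    # Positions in the order in which encryption read them: by row, then by column.
    order = sorted(range(n), key=lambda i: (rows[i], i))
    plain = [""] * n
    for ch, position in zip(ciphertext, order):
        plain[position] = ch
    return "".join(plain).lower()


ciphertext = "DAEAISOETRXCUWLUTNHOUAESULBEOCNOMBEOGTDTYSYEX"
for key in (2, 3, 5):  # prime keys less than 7
    print(key, rail_fence_decrypt(ciphertext, key))
```

Only the key 5 produces readable text, ending in the padding letter x.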