Recover-plaintext attack to block ciphers

Li An-Ping
Beijing 100085, P.R. China
apl0001@sina.com

Abstract
In this paper, we present an estimate for the upper bound of the number of 16-byte plaintext blocks of English text. The bound is not sufficiently large, which makes clear that block ciphers with a block length of no more than 16 bytes will be subject to recover-plaintext attacks in known-plaintext or chosen-plaintext settings.

Keywords: block cipher, recover-plaintext attack, birthday paradox
1. Introduction

There has been much research on the security of block ciphers, which may be found in most textbooks and papers on cryptography; see, e.g., [1]. It is known that a block cipher encrypts plaintexts block by block with a fixed encryption scheme, so for a given secret key the plaintext blocks are in one-to-one correspondence with the ciphertext blocks. Consequently, a block cipher is easily subject to a recover-plaintext attack if the number of possible plaintext blocks is not sufficiently large. Suppose that the number of all possible plaintext blocks is no more than $2^m$, and that an adversary holds a dictionary of (ciphertext, plaintext) block pairs of size about $2^{m/2+k}$ for a small positive integer $k$. Then, by the generalized birthday paradox, after collecting $2^{m/2}$ ciphertext blocks he will recover some plaintext block with high probability: heuristically, the expected number of collected blocks that appear in the dictionary is about $2^{m/2+k} \cdot 2^{m/2} / 2^m = 2^k$.

Most block ciphers currently in use have an output size, that is, a block length, of at most 16 bytes. In this paper, we show that for English text the number of possible 16-byte plaintext blocks is less than $2^{56}$, so block ciphers with a 16-byte output size will be vulnerable to recover-plaintext attacks in known-plaintext or chosen-plaintext settings.

In the rest of this section, we introduce some notions used in this paper. Denote by $Q$ the vocabulary of the plaintexts, and suppose that its size is $|Q| = N$. For a word $w \in Q$, denote by $|w|$ its length, i.e., the number of letters contained in $w$. An English phrase or plaintext block $\alpha$ is called a $k$-term block if it consists of $k$ words or parts of words. Writing $\sqcup$ for a blank space, there are four possible forms of a $k$-term plaintext block:

$$\mathrm{word}_1 \sqcup \mathrm{word}_2 \sqcup \cdots \sqcup \mathrm{word}_k \tag{1.1}$$

$$\mathrm{word}_1 \sqcup \mathrm{word}_2 \sqcup \cdots \sqcup \mathrm{word}_k \sqcup \tag{1.2}$$

$$\sqcup\, \mathrm{word}_1 \sqcup \mathrm{word}_2 \sqcup \cdots \sqcup \mathrm{word}_k \tag{1.3}$$

$$\sqcup\, \mathrm{word}_1 \sqcup \mathrm{word}_2 \sqcup \cdots \sqcup \mathrm{word}_k \sqcup \tag{1.4}$$

Here $\mathrm{word}_i$ is the $i$-th word of $\alpha$. Note that $\mathrm{word}_1$ in (1.1) and (1.2), and $\mathrm{word}_k$ in (1.1) and (1.3), may be only parts of English words rather than complete words. Besides, some blocks may contain punctuation marks such as ',' or '.'
or ';', which we agree to count as characters rather than terms, except in the special case where $\mathrm{word}_1$ in (1.1) or (1.2) is just a punctuation mark. Only the three most frequently used punctuation marks ',', '.' and ';' are taken into consideration in the following discussion. For simplicity, we assume throughout that the words in the vocabulary $Q$ consist of English letters only, and include no special characters such as @ or #, no Arabic numerals and no abbreviations.

2. Estimates for the number of plaintext blocks
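In this section, we present an estimate for the number of 16-byte plaintext blocks.

Proposition 1. Suppose that $Q$ is a vocabulary consisting of English words that include no special characters or Arabic numerals, and that its size satisfies $|Q| \le 60000$. Let $F$ be the set of all possible 16-byte blocks of English text over $Q$, and denote by $Q_i$, $1 \le i \le 16$, the subset of $Q$ consisting of $i$-letter words. If the distribution of $Q$ satisfies

$$|Q_i| \le \mu\, C_{16}^{\,i-1}, \qquad 1 \le i \le 16, \tag{2.1}$$

where $\mu$ is a constant with $\mu = 2$, then

$$|F| \le 2^{56}. \tag{2.2}$$

Proof. Denote by $F'$, $F''$ and $F'''$ the subsets of $F$ consisting of the 16-byte plaintext blocks whose first letter is a minuscule letter, a capital letter and a punctuation mark, respectively. We will see that $F'$ accounts for the main part of $|F|$. For a positive integer $k$ with $1 \le k \le 8$ (a $k$-term block contains at least $k$ letters and $k-1$ blanks, so $2k-1 \le 16$), let $F_k$ be the subset of $F'$ consisting of $k$-term blocks, and let $F_k^{(1)}$, $F_k^{(2)}$, $F_k^{(3)}$ and $F_k^{(4)}$ be the subsets of $F_k$ with the forms (1.1), (1.2), (1.3) and (1.4), respectively. We first calculate $|F_k^{(1)}|$.

Suppose that $\zeta \in F_k$ is a $k$-term block,

$$\zeta = (\sqcup)\, \mathrm{word}_1 \sqcup \mathrm{word}_2 \sqcup \cdots \sqcup \mathrm{word}_k\, (\sqcup), \tag{2.3}$$

where each bracketed blank may or may not be present. Denote $|\mathrm{word}_i| = c_i$; obviously $c_i \ge 1$ for $1 \le i \le k$. Since a block of form (1.1) contains $k-1$ blanks, forms (1.2) and (1.3) contain $k$ blanks, and form (1.4) contains $k+1$ blanks, we have

$$\sum_{i=1}^{k} c_i = 16 + \delta - k, \tag{2.4}$$

where $\delta = 1, 0, 0, -1$ for $\zeta \in F_k^{(1)}, F_k^{(2)}, F_k^{(3)}, F_k^{(4)}$, respectively.

First we restrict ourselves to the case $\zeta \in F_k^{(1)}$. We say that $\zeta$ is of $(c_1, c_2, \ldots, c_k)$-type, and let $F_{(c_1, c_2, \ldots, c_k)}$ be the set of $(c_1, c_2, \ldots, c_k)$-type blocks. For any $i$-subset $I$ of $\{1, 2, \ldots, k\}$, denote by $F^{(I)}_{(c_1, c_2, \ldots, c_k)}$ the subset of $F_{(c_1, c_2, \ldots, c_k)}$ consisting of the blocks with punctuation marks following the words whose indices are in $I$. Next we calculate the sizes $|F^{(I)}_{(c_1, c_2, \ldots, c_k)}|$, beginning with $|F^{(0)}_{(c_1, c_2, \ldots, c_k)}|$, the case $I = \emptyset$.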
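The bookkeeping just set up (the forms (1.1)-(1.4), the number of terms $k$, the type $(c_1, \ldots, c_k)$, the number $i$ of punctuation marks, and the length relation (2.4) with 16 replaced by $16 - i$) can be made concrete with a short Python sketch. It is an illustration only, not part of the original argument: the helper names `classify_block` and `satisfies_24` are hypothetical, and the sketch assumes blocks made of ASCII letters, single blanks, and the marks ',', '.', ';' written directly after a word, so the special case of a block beginning with a punctuation mark is not handled.

```python
from __future__ import annotations

# Hypothetical sketch (not from the paper): classify a 16-byte block
# according to the block model of Section 2.

DELTA = {1: 1, 2: 0, 3: 0, 4: -1}  # delta in (2.4) for forms (1.1)-(1.4)
PUNCT = {",", ".", ";"}            # the three marks the paper considers


def classify_block(block: str) -> tuple[int, int, tuple[int, ...], int]:
    """Return (form, k, (c_1, ..., c_k), i) for a 16-byte block.

    form is 1..4 as in (1.1)-(1.4), k is the number of terms, c_j is the
    number of letters in word_j, and i is the number of punctuation marks.
    Assumes ASCII letters, single blanks, and marks written right after a word.
    """
    if len(block) != 16:
        raise ValueError("a block must be exactly 16 characters")
    lead, trail = block.startswith(" "), block.endswith(" ")
    # (1.1): no outer blank, (1.2): trailing, (1.3): leading, (1.4): both
    form = {(False, False): 1, (False, True): 2,
            (True, False): 3, (True, True): 4}[(lead, trail)]
    types, marks = [], 0
    for term in block.strip(" ").split(" "):
        # A mark attached to a word counts as a character, not as a term.
        if term and term[-1] in PUNCT:
            marks += 1
            term = term[:-1]
        # Empty terms (double blanks) and non-letters are rejected.
        if not (term.isascii() and term.isalpha()):
            raise ValueError(f"unsupported term: {term!r}")
        types.append(len(term))
    return form, len(types), tuple(types), marks


def satisfies_24(form: int, k: int, types: tuple[int, ...], marks: int) -> bool:
    """Check (2.4), with 16 replaced by 16 - i when i marks are present."""
    return all(c >= 1 for c in types) and sum(types) == 16 - marks + DELTA[form] - k


if __name__ == "__main__":
    for b in ("the cat sat on t", " was, as it is. ", "a b c d e f g h "):
        info = classify_block(b)
        print(repr(b), info, satisfies_24(*info))
```

For instance, `classify_block("the cat sat on t")` returns `(1, 5, (3, 3, 3, 2, 1), 0)`: a block of form (1.1) with five terms whose letter counts sum to $12 = 17 - 5$, as (2.4) requires with $\delta = 1$. The block `"a b c d e f g h "` attains the maximum $k = 8$. For any well-formed block the check is an identity; its only purpose here is to make the blank counting behind $\delta$ explicit.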
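Denote $x = \min(26^{c_1}, N)$ and $y = \min(26^{c_k}, N)$; these bound the numbers of possible values of the partial words $\mathrm{word}_1$ and $\mathrm{word}_k$, respectively. By (2.1), we have

$$|F^{(0)}_{(c_1, \ldots, c_k)}| \le x\, y \prod_{i=2}^{k-1} \mu\, C_{16}^{\,c_i-1} = \mu^{k-2}\, \frac{x}{C_{16}^{\,c_1-1}}\, \frac{y}{C_{16}^{\,c_k-1}} \prod_{i=1}^{k} C_{16}^{\,c_i-1}. \tag{2.5}$$

By elementary combinatorics (Vandermonde's identity), for any positive integer $s$ we have

$$\sum_{c_1 + \cdots + c_k = s}\ \prod_{i=1}^{k} C_{16}^{\,c_i-1} = C_{16k}^{\,s-k}. \tag{2.6}$$

Under the assumption $N \le 60000$, it is easy to see that

$$\min\Bigl\{\frac{26^{c_1}}{C_{16}^{\,c_1-1}}, \frac{N}{C_{16}^{\,c_1-1}}\Bigr\} \le \frac{26^3}{C_{16}^{\,2}} \le 147, \qquad \min\Bigl\{\frac{26^{c_k}}{C_{16}^{\,c_k-1}}, \frac{N}{C_{16}^{\,c_k-1}}\Bigr\} \le \frac{26^3}{C_{16}^{\,2}} \le 147. \tag{2.7}$$

So, with (2.4), (2.5), (2.6) and (2.7), we have

$$\sum_{c_1 + \cdots + c_k = 17-k} |F^{(0)}_{(c_1, \ldots, c_k)}| \le 147^2\, \mu^{k-2}\, C_{16k}^{\,17-2k}. \tag{2.8}$$

To estimate $|F^{(I)}_{(c_1, \ldots, c_k)}|$ with $|I| = i > 0$, we have to replace 16 by $16 - i$ in (2.4). Each of the $i$ punctuation marks can be chosen in 3 ways, and there are $C_k^{\,i}$ $i$-subsets $I$, so

$$\sum_{|I| = i}\ \sum_{c_1 + \cdots + c_k = 17-k-i} |F^{(I)}_{(c_1, \ldots, c_k)}| \le 147^2\, C_k^{\,i}\, 3^i\, \mu^{k-2}\, C_{16k}^{\,17-2k-i}. \tag{2.9}$$

Hence,

$$
\begin{aligned}
|F_k^{(1)}| &\le \Bigl(\frac{147}{\mu}\Bigr)^{2} \mu^{k} \sum_{i=0}^{k} C_k^{\,i}\, 3^i\, C_{16k}^{\,17-2k-i}
\le \Bigl(\frac{147}{\mu}\Bigr)^{2} \mu^{k}\, \frac{(16k)^{17-2k}}{(17-2k)!} \sum_{i=0}^{k} C_k^{\,i} \Bigl(\frac{3(17-2k)}{16k}\Bigr)^{i} \\
&\le \Bigl(\frac{147}{\mu}\Bigr)^{2} \mu^{k}\, \frac{(16ke)^{17-2k}}{\sqrt{2\pi(17-2k)}\,(17-2k)^{17-2k}} \Bigl(1 + \frac{3(17-2k)}{16k}\Bigr)^{k},
\end{aligned}
\tag{2.10}
$$

where Stirling's formula, in the form $n! \ge \sqrt{2\pi n}\,(n/e)^n$, has been applied.

The estimates for $|F_k^{(2)}|$, $|F_k^{(3)}|$ and $|F_k^{(4)}|$ are similar to the one above, but notice that $\mathrm{word}_1$ in (1.3), (1.4) and $\mathrm{word}_k$ in (1.2), (1.4) are complete English words rather than parts, so the corresponding factor 147 is replaced by $\mu$; moreover, now $\delta = 0$ in (2.4) for $F_k^{(2)}$, $F_k^{(3)}$, and $\delta = -1$ for $F_k^{(4)}$. Thus, we have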
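$$
\begin{aligned}
|F_k^{(j)}| &\le \frac{147}{\mu}\, \mu^{k}\, \frac{(16ke)^{16-2k}}{\sqrt{2\pi(16-2k)}\,(16-2k)^{16-2k}} \Bigl(1 + \frac{3(16-2k)}{16k}\Bigr)^{k}, \qquad j = 2, 3, \\
|F_k^{(4)}| &\le \mu^{k}\, \frac{(16ke)^{15-2k}}{\sqrt{2\pi(15-2k)}\,(15-2k)^{15-2k}} \Bigl(1 + \frac{3(15-2k)}{16k}\Bigr)^{k},
\end{aligned}
\qquad 1 \le k \le 7.
\tag{2.11}
$$

So,

$$
\begin{aligned}
|F'| &\le \sum_{1 \le k \le 8} \bigl(|F_k^{(1)}| + |F_k^{(2)}| + |F_k^{(3)}| + |F_k^{(4)}|\bigr) \\
&\le \sum_{1 \le k \le 8} \Bigl(\frac{147}{\mu}\Bigr)^{2} \mu^{k}\, \frac{(16ke)^{17-2k}}{\sqrt{2\pi(17-2k)}\,(17-2k)^{17-2k}} \Bigl(1 + \frac{3(17-2k)}{16k}\Bigr)^{k} \\
&\quad + 2 \sum_{1 \le k < 8} \frac{147}{\mu}\, \mu^{k}\, \frac{(16ke)^{16-2k}}{\sqrt{2\pi(16-2k)}\,(16-2k)^{16-2k}} \Bigl(1 + \frac{3(16-2k)}{16k}\Bigr)^{k} \\
&\quad + \sum_{1 \le k < 8} \mu^{k}\, \frac{(16ke)^{15-2k}}{\sqrt{2\pi(15-2k)}\,(15-2k)^{15-2k}} \Bigl(1 + \frac{3(15-2k)}{16k}\Bigr)^{k} + 2 \cdot 147\, \mu^{7},
\end{aligned}
\tag{2.12}
$$

where the last term bounds $|F_8^{(2)}| + |F_8^{(3)}|$ (by (2.4), these blocks consist of eight one-letter terms), and $F_8^{(4)} = \emptyset$. For $\mu = 2$, this gives

$$|F'| \le 3.73 \times 10^{16}. \tag{2.13}$$

As for the estimate of $|F''|$, we know that whether the first letter of a sentence is capital or minuscule is determined by the punctuation preceding it, and that if the first letter of $\mathrm{word}_1$ in (1.1) or (1.2) is a capital one, then $\mathrm{word}_1$ is a complete English word rather than a part. Denote by $F^{(j)}$, $1 \le j \le 4$, the subsets of $F'$ consisting of the plaintext blocks of the forms (1.1), (1.2), (1.3) and (1.4), respectively, that is, $F^{(j)} = \bigcup_{k=1}^{8} F_k^{(j)}$, $j = 1, 2, 3, 4$. Then

$$|F''| \le \frac{\mu}{147}\bigl(|F^{(1)}| + |F^{(2)}|\bigr) + |F^{(3)}| + |F^{(4)}| \le 4.2 \times 10^{14}. \tag{2.14}$$

For the estimate of $|F'''|$, it suffices to replace 16 by $16 - 1$ in the estimates of $|F^{(3)}|$ and $|F^{(4)}|$, and to multiply by 3, since there are three punctuation marks. So

$$|F'''| \le 4 \times 10^{13}. \tag{2.15}$$

Hence, we have

$$|F| \le |F'| + |F''| + |F'''| \le 3.8 \times 10^{16} < 2^{56}.$$

Remark 1. It is likely that inequality (2.1) holds for the distribution of English words, but we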
have not checked it in full, so we have taken it as a hypothesis; the constant $\mu$ may be modified according to the actual case.

Remark 2. Moreover, for a $k$-letter word $w$ and a positive integer $i \le k$, we call the segment formed by the first $i$ letters of $w$ the $i$-prefix of $w$ and, similarly, the segment formed by the last $i$ letters of $w$ the $i$-suffix of $w$. Denote by $Q^{[i]}$ and $Q_{[i]}$ the sets of all distinct $i$-prefixes and $i$-suffixes of the words in $Q$, respectively. Write

$$|Q^{[i]}| = \lambda_i\, C_{16}^{\,i-1}, \qquad |Q_{[i]}| = \lambda'_i\, C_{16}^{\,i-1}, \tag{2.16}$$

and $\lambda = \max_{1 \le i \le 16} \{\lambda_i, \lambda'_i\}$. In the proof above we have shown that $\lambda \le 147$; however, we guess that for an ordinary vocabulary $\lambda \le 26$ may hold. If so, then

$$|F| \le 1.8 \times 10^{15} < 2^{51}. \tag{2.17}$$

It is easy to see that the conjecture holds for $i = 1$ and for $i > 5$, so the remaining cases to be verified are $2 \le i \le 5$.

3. Conclusion

For simplicity of discussion, we have excluded Arabic numerals, special characters such as @ and $, and less common punctuation marks such as ! and ?; they do occasionally appear in English texts, but rarely. So the estimate above may be viewed as one for the frequently occurring blocks. The calculations in this paper are purely combinatorial, with no consideration of English grammar, logic or semantics, so it is very likely that the actual number of plaintext blocks is much smaller than the bound presented in Proposition 1. In fact, our first idea came from considerations of English grammar, but that approach is somewhat trifling.

The result presented indicates that block ciphers with a 16-byte block length, such as AES, will be subject to recover-plaintext attacks when used to encrypt English texts in known-plaintext or chosen-plaintext settings. From the discussion above, we see that the number of plaintext blocks depends not only on the block length but also on the distribution of words in the language.

References

[1] A. Menezes, P. van Oorschot, S. Vanstone, Handbook of Applied Cryptography, CRC Press, 1997.