New Form of Permutation Bias and Secret Key Leakage in Keystream Bytes of RC4

ew Form of Permutation Bias and Secret Key Leakage in Keystream Bytes of RC4 Subhamoy Maitra Applied Statistics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 08, India, Email: subho@isical.ac.in Goutam Paul Department of Computer Science and Engineering, Jadavpur University, Kolkata 700 032, India, Email: goutam paul@cse.jdvu.ac.in Abstract Consider the permutation S in RC4. Roos pointed out in 995 that after the Key Scheduling Algorithm (KSA) of RC4, each of the initial bytes of the permutation, i.e., S[y] for small values of y, is biased towards some linear combination of the secret key bytes. In this paper, for the first time we show that the bias can be observed in S[S[y]] too. Based on this new form of permutation bias after the KSA and other related results, a complete framework is presented to show that many keystream output bytes of RC4 are significantly biased towards several linear combinations of the secret key bytes. The results do not assume any condition on the secret key. We find new biases in the initial as well as in the 256-th and 257-th keystream output bytes. For the first time biases at such later stages are discovered without any knowledge of the secret key bytes. We also identify that these biases propagate further, once the information for the index j is revealed. Keywords: Bias, Cryptanalysis, Keystream, Key Leakage, RC4, Stream Cipher. This is a revised and extended version of the paper with the same title which has been presented in the 5th Fast Software Encryption (FSE) Workshop, February 0-3, 2008, Lausanne, Switzerland and published in LCS vol. 5086, pages 253-269. In the current version, Theorem 6 is included as a new result and minor revisions have been made in the formula of Theorem 7.

Introduction RC4 is one of the most well known stream ciphers. It has very simple implementation and is used in a number of commercial products till date. Being one of the popular stream ciphers, it has also been subjected to many cryptanalytic attempts for more than a decade. Though lots of weaknesses have already been explored in RC4 [, 2, 3, 4, 5, 6, 7, 8, 0,, 2, 4, 5, 6, 8, 9, 20], it could not be thoroughly cracked yet and proper use of this stream cipher is still believed to be quite secure. This motivates the analysis of RC4. The Key Scheduling Algorithm (KSA) and the Pseudo Random Generation Algorithm (PRGA) of RC4 are presented below. The data structure contains an array S of size (typically, 256), which contains a permutation of the integers {0,..., }, two indices i, j and the secret key array K. Given a secret key k of l bytes (typically 5 to 6), the array K of size is such that K[y] = k[y mod l] for any y, 0 y. Algorithm KSA Initialization: For i = 0,..., S[i] = i; j = 0; Scrambling: For i = 0,..., j = (j + S[i] + K[i]); Swap(S[i], S[j]); Algorithm PRGA Initialization: i = j = 0; Output Keystream Generation Loop: i = i + ; j = j + S[i]; Swap(S[i], S[j]); t = S[i] + S[j]; Output z = S[t]; Apart from some minor details, the KSA and the PRGA are almost the same. In the KSA, the update of the index j depends on the secret key, whereas the key is not used in the PRGA. One may consider the PRGA as the KSA with all zero key. All additions in both the KSA and the PRGA are additions modulo. Initial empirical works based on the weaknesses of the RC4 KSA were explored in [6, 20] and several classes of weak keys had been identified. In [6], experimental evidences of the bias of the initial permutation bytes after the KSA towards the secret key have been reported. It was also observed in [6] that the first keystream output byte of RC4 leaks information about the secret key when the first two secret key bytes add to 0 mod 256. A more general theoretical study has been performed in [, 2] which includes the observations of [6]. These biases do propagate to the keystream output bytes as observed in [5, ]. In [5], the Glimpse theorem [4] is used to show the propagation of biases in the initial keystream output bytes. On the other hand, a bias in the choice of index has been exploited in [] to show a bias in the first keystream output byte. More than a decade ago (995), Roos [6] pointed out that the initial bytes S[y] of the permutation after the KSA are biased towards some function f y (see Section. for the definition of f y ) of the secret key. Since then several works [2, 9, 0,, 2, 3] have considered biases of S[y] either with functions of the secret key bytes or with absolute values and discussed applications of these biases. However, no research has so far been published 2

to study how the bytes S[S[y]] are related to the secret key for different values of y. Here we solve this problem, identifying substantial biases in this direction. It is interesting to note that as the KSA proceeds, the probabilities P (S[y] = f y ) decrease monotonically, whereas the probabilities P (S[S[y]] = f y ) first increases monotonically till the middle of the KSA and then decreases monotonically until the end of the KSA. Using these results and other related techniques, we find new biases in the keystream output bytes towards the secret key. A complete framework is presented towards the leakage of information about the secret key in the keystream output bytes, that not only reveals new biases at a later stage (256, 257-th bytes), but also points out that the biases propagate further, once the information regarding j is known. The works [2, 7] also explain how secret key information is leaked in the keystream output bytes. In [2], it is considered that the first few bytes of the secret key is known and based on that the next byte of the secret key is predicted. The attack is based on how secret key information is leaked in the first keystream output byte of the PRGA. In [7], the same idea of [2] has been exploited with the Glimpse theorem [4] to find the information leakage about the secret key at the 257-th byte of the PRGA. ote that, our result is better than that of [7], as in [7] the bias is observed only when certain conditions on the secret key and IV hold. However, the biases we note at 256, 257-th bytes do not assume any such condition on the secret key.. otations, Contributions and Outline Let S r be the permutation, i r and j r be the values of the indices i and j after r many rounds of the RC4 KSA, r. Hence S is the permutation after the complete key scheduling. By S 0, we denote the initial identity permutation. During round r of the KSA, i r = r, r, and hence the permutation S r after round r can also be denoted by S ir+. Let Sr G be the permutation, ig r and jg r be the values of the indices i and j, and z r be the keystream output byte after r many rounds of the PRGA, r. Clearly, i G r = r mod. We also denote S by S0 G as this is the permutation before the PRGA starts. Further, let y(y + ) y f y = + K[x], 2 for y 0. ote that all the additions and subtractions related to the key bytes, the permutation bytes and the indices are modulo. Our contribution can be summarized as follows. In Section 2, we present the results related to biased association of S [S [y]] towards the linear combination f y of the secret key bytes. In Section 3, we present a framework for identifying biases in RC4 keystream bytes towards several linear combinations of the secret key bytes. x=0 3

In Section 3.2, we show that P (z = f 0 ) is not a random association. This indicates bias at z 256. In Section 3.3, we use the bias of S [S []] (from Section 2) to prove that P (z + = + f ) is not a random association. This indicates bias at z 257. In Section 3.4, we observe new biases in the initial keystream bytes apart from the known ones [5]. It is shown that for r = and 3 r 32, P (z r = f r ) are not random associations. These results are taken together in Section 3.5 to present cryptanalytic applications. In Section 4, considering that the values of index j are leaked at some points during the PRGA, we show that biases of the keystream output bytes towards the secret key are observed at a much later stage. 2 Bias of S[S[y]] to Secret Key We start this section discussing how P (S r [S r []] = f ) varies with round r, r, during the KSA of RC4. Once again, note that f = (K[0]+K[]+) mod. To motivate, we like to refer to Figure that demonstrates the nature of the curve with an experimentation using 0 million randomly chosen secret keys. The probability P (S r [S r []] = f ) increases till around r = where it gets the maximum value around 0.85 and then it decreases to 2 0.36 at r =. ote that this nature is not similar to the nature of P (S r [] = f ) that decreases continuously as r increases during the KSA. Towards the theoretical results, let us first present the base case for r = 2, i.e., after round 2 of the RC4 KSA. Lemma P (S 2 [S 2 []] = K[0] + K[] + ) = 3 4 Further, P (S 2 [S 2 []] = K[0] + K[] + S 2 [] ) 2. Proof: The proof is based on three cases. + 2. 2 3. Let K[0] 0, K[] =. The probability of this event is 2. ow S 2 [] = S [K[0] + K[] + ] = S [K[0]] = S 0 [0] = 0. So, S 2 [S 2 []] = S 2 [0] = S [0] = K[0] = K[0] + K[] +. ote that S 2 [0] = S [0], as K[0] + K[] + 0. Moreover, in this case, S 2 []. 2. Let K[0] + K[] = 0, K[0], i.e., K[]. The probability of this event is 2. ow S 2 [] = S [K[0] + K[] + ] = S [] = S 0 [] =. ote that S [] = S 0 [], as K[0]. So, S 2 [S 2 []] = S 2 [] = = K[0] + K[] +. Also, in this case, S 2 []. 4

0.2 0.8 P(S i+ [S i+ []] = (K[0] + K[] + ) mod 256) > 0.6 0.4 0.2 0. 0.08 0.06 0.04 0.02 0 0 50 00 50 200 250 300 i > Figure : P (S i+ [S i+ []] = f ) versus i (r = i + ) during RC4 KSA. 3. S 2 [S 2 []] could be K[0] + K[] + by random association except the two previous cases. Out of that, S 2 [] will happen in 2 Thus P (S 2 [S 2 []] = K[0] + K[] + ) = 2( ) 2 P (S 2 [S 2 []] = K[0] + K[] + S 2 [] ) = 2( ) 2 proportion of cases. + ( 2( ) 2 ) = 3 4 2 + 2 + 2 ( 2( ) 2. Further 3 ) = 2 4( ) 2. 4 Lemma shows that after the second round (i =, r = 2), the event (S 2 [S 2 []] = K[0] + K[] + ) is not a random association. Lemma 2 Let p r = P (S r [S r []] = K[0] + K[] + S r [] r ) for r 2. Then for r 3, p r = ( 2 )p r + ( 2 ) ( )2(r 2). Proof: After the (r )-th round is over, the permutation is S r. In this case, p r = P (S r [S r []] = K[0]+K[]+ S r [] r 2). The event ( (S r [S r []] = K[0]+K[]+ ) (S r [] r ) ) can occur in two mutually exclusive and exhaustive ways: ( (S r [S r []] = K[0] + K[] + ) (S r [] r 2) ) and ( (S r [S r []] = K[0] + K[] + ) (S r [] = r ) ). We compute the contribution of each separately. In the r-th round, i = r and hence does not touch the indices 0,..., r 2. Thus, the event ( (S r [S r []] = K[0] + K[] + ) (S r [] r 2) ) occurs if we already had ( (Sr [S r []] = K[0] + K[] + ) (S r [] r 2) ) and j r / {, r }. Thus, the contribution of this part is p r ( 2 ). 5

The event ( (S r [S r []] = K[0] + K[] + ) (S r [] = r ) ) occurs if after the (r )-th round, S r [r ] = r, S r [] = K[0] + K[] + and j r = causing a swap involving the indices and r.. We have S r [r ] = r if the location r is not touched during the rounds i = 0,..., r 2. This happens with a probability at least ( )r. 2. The event S r [] = K[0] + K[] + may happen as follows. In the first round (when i = 0), j / {, K[0] + K[] + } so that S [] = and S [K[0] + K[] + ] = K[0]+K[]+ with probability ( 2 ). After this, in the second round (when i = ), we will have j 2 = j + S [] + K[] = K[0] + K[] +, and so after the swap, S 2 [] = K[0] + K[] +. ow, K[0] + K[] + remains in location from the end of round 2 till the end of round (r ) (when i = r 2) with probability ( )r 3. Thus, P (S r [] = K[0] + K[] + ) = ( 2) ( )r 3. 3. In the r-th round (when i = r ), j r becomes with probability. Thus, P ( (S r [S r []] = K[0]+K[]+) (S r [] = r ) ) = ( ( 2) ( )2(r 2). )r ( 2 ) ( )r 3 = Adding the above two contributions, we get p r = ( 2)p r + ( 2) ( )2(r 2). The recurrence in Lemma 2 along with the base case in Lemma completely specify the probabilities p r for all r [2,...,]. Theorem After the complete KSA, P (S [S []] = K[0] + K[] + ) ( )2( ). Proof: Using the approximation 2 ( )2, the recurrence in Lemma 2 can be rewritten as p r = ap r + a r b, where a = ( )2 and b =. The solution of this recurrence is given by p r = a r 2 p 2 +(r 2)a r b, r 2. Substituting the values of p 2 (from Lemma ), a and b, we get p r = 2 ( )2(r 2) +( r 2 )( )2(r ). Substituting r = and noting the fact that P ( (S [S []] = K[0]+K[]+) (S [] ) ) = P (S [S []] = K[0]+K[]+), we get P (S [S []] = K[0] + K[] + ) = 2 ( )2( 2) + ( 2 )( )2( ). ote that the second term ( 0.348 for = 256) dominates over the negligibly small first term ( 0.00 for = 256), and so P (S [S []] = K[0] + K[] + ) ( )2( ) (replacing 2 = 2 by in the second term). ow we like to present a more detailed observation. In [6, 2], the association between S [y] and f y is shown. As we have observed the non-random association between S [S []] and f, it is important to study what is the association between S [S [y]] and f y, and moving further, the association between S [S [S [y]]] and f y, for 0 y and so on. Our experimental observations show that these associations are not random (i.e., much more than ) for initial values of y. The experimental observations (over 0 million runs of randomly chosen keys) are presented in Figure 2. The theoretical analysis of the biases of S r [S r [y]] towards f y for small values of y is presented in Appendix A. The results involved in the process are tedious and we need to approximate certain quantities to get the following closed form formula. 6

0.4 0.35 0.3 Probability > 0.25 0.2 0.5 < A 0. < B 0.05 < C 0 0 50 00 50 200 250 300 y > Figure 2: A: P (S [y] = f y ), B: P (S [S [y]] = f y ), C: P (S [S [S [y]]] = f y ) versus y (0 y 255). Theorem 2 After the complete KSA, P (S [S [y]] = f y ) y ( ) y(y+) 2 +2( 2) + ( ( ) y(y+) 2 +2 3, 0 y 3. ) y(y+) 2 y+2( ) + ( y ) ( y ) Extending similar techniques, the association between S [S... [S [y]]...] and f y can be studied in general. Though the general results are combinatorially interesting, it is not immediate how they will be applicable to find further weaknesses in the RC4 PRGA. In terms of cryptanalytic point of view, we use the non-random association of S [S []] relating f (Theorem ) to obtain the bias at the 257-th keystream output byte in Section 3.3. 3 Biases in RC4 Keystream We present new biases in Sections 3.2, 3.3 and 3.4, which were not known earlier. For presenting these results, we need to build a framework. That is presented in Section 3.. Some biases in initial keystream bytes of RC4 has earlier been pointed out in [5] that has later been discussed in [8] too. Under our framework, the biases of [5] is presented in Theorem 3. 3. Existing Results under Our Framework Let S G r be the permutation, i G r and jg r be the values of the indices i and j, and z r be the keystream output byte after r many rounds of the PRGA, r. Clearly, i G r = r mod. 7

We also denote S by S0 G as this is the permutation before the PRGA starts. Let us consider the existing result that relates each permutation byte after the KSA with certain linear combination of the secret key bytes. Proposition [2, Theorem ] Consider that the index j takes its values uniformly at random during the KSA rounds. Then, P (S r [y] = f y ) ( y ) ( y(y+) 2 +r] +, 0 y r, r. Substituting r = in the statement of the above Proposition, we get the following. Corollary The bias of the final permutation after the KSA towards the secret key is given by P (S [y] = f y ) = ( y ) ( y(y+) 2 +] +, 0 y. As explained in [2], the above result indicates significant biases for small values of y (more precisely, for 0 y 47), that is supported by the experimental result presented in [6]. The Glimpse Main Theorem [4, 7] states that after the r-th round of the PRGA, r, P (Sr G[jG r ] = r z r) = P (Sr G[iG r ] = jg r z r) = 2. We rewrite the first relation between Sr G[jG r ] and r z r as the following proposition. Proposition 2 P (z r = r S G r [ig r ]) = 2, r. Proof: Sr G [jr G ] is assigned the value of Sr [i G G r ]. As the Glimpse Main Theorem gives P (z r = r Sr G[jG r ]) = 2 for r, we get P (z r = r Sr G [ig r ]) = 2 for r. The idea of writing the Glimpse Main Theorem in the form of Proposition 2 is due to the fact that relating z r to Sr [i G G r ] will ultimately relate z r to the secret key bytes, as the permutations in the initial rounds of the PRGA are related to the secret key. The following lemma shows how the permutation bytes at rounds t and r of the PRGA, for t + 2 r, are related. Lemma 3 Let P (St G[iG r ] = X) = q t,r, for some X. Then, for t + 2 r t +, P (Sr G [ig r ] = X) = q [( ] t,r )r t +. Proof: We consider two separate cases.. St G[iG r ] = X and during the next (r t ) rounds of the PRGA, the index ig r is not touched by any of the r t many j values (since t + 2 r t +, the index i G r is not touched by any of the r t many i values anyway). The first event occurs with probability q t,r and the second event occurs with probability ( )r t. Thus the contribution of this case is q t,r ( )r t. 2. St G[iG r ] X and still SG r [ig r ] equals X by random association. The contribution of this case is ( q t,r ). Thus, adding the above two contributions, we get P (Sr G [ig r ] = X) = q t,r ( ( q t,r ) = q [( ] t,r )r t +. 8 )r t +

Remark The above result holds for t + 2 r t +, and not for r = t +. If we take r = t+, then S G r = SG t, which is our starting point, i.e., P (SG r [ig r ] = X) = P (SG t [ig r ] = X) = q t,r. The following is an immediate consequence of Lemma 3. Corollary 2 For 2 r, P (Sr G [r] = f r) = [ [ ] ( r ( )r +. ) ( r(r+) 2 +] + ] Proof: For 2 r, we have i G r = r. Taking X = f r and t = 0 in Lemma 3, we have q 0,r = P (S0 G [r] = f r ) = P (S [r] = f r ) = ( r) ( r(r+) 2 +] + (by Corollary ), and hence P (Sr G [r] = f r) = [ ( r) ( r(r+) 2 +] + ] [ ] ( )r +. ext, we present the bias of each keystream output byte to a combination of the secret key bytes in the following lemma. Lemma 4 Let P (S G r [ig r ] = f i G r ) = w r, for r. Then P (z r = r f i G r ) = ( + w r), r. Proof: We consider two separate cases in which the event (z r = r f i G r ) can occur.. Sr G [ig r ] = f i and z G r r = r Sr G [ig r ]. The contribution of this case is P (SG r [ig r ] = f i G r ) P (z r = r Sr [i G G r ]) = w r 2 (by Proposition 2). 2. Sr G [ig r ] f i, and still z G r r = r f i G r due to random association. So the contribution of this case is P (Sr [i G G r ] f i G r ) = ( w r). Adding the above two contributions, we get the total probability as w r 2 + ( w r) = ( + w r). The following biases were discovered in [5]. Theorem 3 ( () P (z = f ) = (2) For 2 r (, P (z r = r f r ) = + ( )+2 + + [ ( r ) ( r(r+) ). 2 +] + ] [ ( )r ] ) +. Proof: First, we prove part (). In the first round, i.e., when r =, we have i G r = and f i G r = f, and so w = P (S0 G [] = f ) = P (S [] = f ) = ( ) ( (+) 2 +] + = ( ( )+2 + (by Corollary ). ow, using Lemma 4, we get P (z ) = f ) = (+w ) = + ( )+2 +. ext, we prove part (2). From Corollary 2, w r = P (Sr G [r] = f r) = [ ( r) ( r(r+) 2 +] + ] [ ] ( )r + (, 2 r. ow, using Lemma 4, we get P (z r = r f r ) = (+w r) = + [ ( r ) ( r(r+) 2 +] + ] [ ] ) ( )r +. 9

ote that Lemma 3 or Corollary 2 is not used in proving part () of the above theorem. It is proved directly from Corollary. In fact, Lemma 3 can not be used in part (), as here we have r = t + with t = 0 (see Remark ). To have a clear understanding of the quantity of the biases, Table lists the numerical values of the probabilities according to the formula given in Theorem 3. ote that the random association is, which is 0.0039 for = 256. Close to the round 48, the biases tend to disappear. This is indicated by the convergence of the values to the probability = 0.0039. 256 r P (z r = r f r ) -8 0.0053 0.0053 0.0053 0.0053 0.0052 0.0052 0.0052 0.005 9-6 0.005 0.0050 0.0050 0.0049 0.0048 0.0048 0.0047 0.0047 7-24 0.0046 0.0046 0.0045 0.0045 0.0044 0.0044 0.0043 0.0043 25-32 0.0043 0.0042 0.0042 0.0042 0.004 0.004 0.004 0.004 33-40 0.004 0.0040 0.0040 0.0040 0.0040 0.0040 0.0040 0.0040 4-48 0.0040 0.0040 0.0040 0.0040 0.0040 0.0039 0.0039 0.0039 Table : The probabilities computed following Theorem 3. One may check that P (z = f ) = ( + 0.36) and that decreases to P (z 32 = 32 f 32 ) = ( + 0.05), but still then it is 5% more than the random association. 3.2 ew Bias in the 256-th Keystream Output Byte Interestingly, the biases again reappear after rounds 256 and 257. First we present the bias for the 256-th keystream byte. ( ) Theorem 4 P (z = f 0 ) = + ( )2 + ( 2 ) +. 2 Proof: During the -th round of the PRGA, i G = mod = 0. Taking X = f 0, t = 0 and r = in Lemma 3, we have q 0, = P (S0 G[0] = f 0) = P (S [0] = f 0 ) = ( ) + (by Corollary ), and hence w = P (S G [0] = f 0) = [ ] [ ] ( ) + ( ) + = ( )2 + ( 2 ) ( +. Thus, by Lemma 4, the bias is given by 2 ) P (z = f 0 ) = ( + w ) = + ( )2 + ( 2 ) +. 2 For = 256, w = w 256 = 0.392 and the bias turns out to be 0.0045 (i.e., ( + 256 0.392)). Experimental results confirm this bias. 3.3 ew Bias in the 257-th Keystream Output Byte We will now show that the bias in the 257-th keystream output byte follows from Theorem, i.e., P (S [S []] = K[0] + K[] + ) ( )2( ). 0

Theorem( 5 P (z + = + f ) = + ( ( ( )3( ) ( )2( ) + Proof: During the ( +)-th round, we have, i G + = ( +) mod =. Taking X = f, t = and r = + in Lemma 3, we have q,+ = P (S G[] = f ) = P (S [S []] = f ) = )2( ), and hence w + = P (S G[] = f [( ] ) = ( )2( ) ) + = )3( ) ( ( )2( ) +. ow, using Lemma 4, we get P (z ) + = + f ) = ( + w +) = + ( )3( ) ( )2( ) +. For = 256, w + = w 257 = 0.0535 and P (z 257 = 257 f ) = (+0.0535) = 0.004 which also conforms to experimental observation. 3.4 More ew Biases in Initial Bytes of RC4 Keystream The biases of z r with r f r for the initial keystream output bytes have been pointed out in Theorem 3. Interestingly, experimental observation reveals bias of z r with f r too. The results are presented in Table 2 which is experimented over hundred million (0 8 ) randomly chosen keys of 6 bytes. For proper random association, P (z r = f r ) should have been 256, i.e., 0.0039. r P (z r = f r ) -8 0.0043 0.0039 0.0044 0.0044 0.0044 0.0044 0.0043 0.0043 9-6 0.0043 0.0043 0.0043 0.0042 0.0042 0.0042 0.0042 0.0042 7-24 0.004 0.004 0.004 0.004 0.004 0.0040 0.0040 0.0040 25-32 0.0040 0.0040 0.0040 0.0040 0.0040 0.0040 0.0040 0.0040 33-40 0.0039 0.0039 0.0039 0.0039 0.0039 0.0039 0.0039 0.0039 4-48 0.0039 0.0039 0.0039 0.0039 0.0039 0.0039 0.0039 0.0039 ). Table 2: Additional bias of the keystream bytes towards the secret key. Following our experimental observation, the explanation of the fact P (z 3 = f 2 ) > 256 was pointed out in [7]. We present the idea of [7] in this paragraph. Assume that after the third round of the KSA, S 3 [2] takes the value f 2, and is hit by j later in the KSA. Then f 2 is swapped with S k [k] and consider that S k [k] has remained k so far. Further, suppose that S [3] = 0 holds. Thus, S [2] = k, S [k] = f 2 and S [3] = 0 at the end of the KSA. In the second round of the PRGA, S G [2] = k is swapped with a more or less random location S G[l]. Therefore, SG 2 [l] = k and jg 2 = l. In the next round, i = 3 and points to S2 G[3] = 0. So j does not change and hence jg 3 = l = jg 2. Thus, SG 2 [l] = k is swapped with S2 G [3] = 0, and one gets S3 G [l] = 0 and S3 G [3] = k. The output z 3 is now S3 G[SG 3 [i] + SG 3 [jg 3 ]] = SG 3 [k + 0] = SG 3 [k] = f 2. Along the same line of arguments given in [7], we here provide a detailed theoretical analysis of the event z r = f r in general and explicitly derive a formula for P (z r = f r ).

The proof depends on P (S [r] = 0) for different r values. The explicit formula for the probabilities P (S [u] = v) was derived for the first time in [9] and the problem was addressed again in [0, 3]. Proposition 3 [3, Theorem, Item 2] For 0 v, v u, P (S [u] = v) = ( ) u + ( )v+ ( )+v u. Theorem 6 P (z = f 0 ) = ( ( ) ( ). )2 ( 2 ) γ +, where γ = ( ) 2 + Proof: Substituting r =, y = 0 in Proposition, we have P (S [0] = f 0 ) =. After the first round, suppose that the index 0 is touched for the first time by j t+ in round t + of the KSA and due to the swap the value f 0 is moved to the index t, t and also prior to this swap the value at the index t was t itself, which now comes to the index 0. This means that from round to round t (both inclusive), the pseudo-random index j has not taken the values 0 and t. So, after round t +, P ( (S t+ [0] = t) (S t+ [t] = f 0 ) ) = P ( (S t [0] = f 0 ) (S t [t] = t) (j t+ = 0) ) = ( 2 )t. From the end of round t + to the end of the KSA, f 0 remains in index t and t remains in index 0 with probability ( 2 ) t. Thus, P ( (S [0] = t) (S [t] = f 0 ) ) = ( 2 )t ( 2 ) t = ( 2 ) = β (say). In the first round of the PRGA, j G = 0 + SG 0 [] = S []. From Proposition 3, we have P (S [] = 0) = γ, where γ = ( ) 2 + ( ) ( ). If S [] = 0, then j G = 0 and z [ = S G S G [] + S G[jG ]] [ = S G S G 0 [] + S0 G[jG ]] = S G[0 + t] = SG [t]. Since t 0, the swap of the values at the indices 0 and does not move the value at the index t. Thereby S G [t] = S0 G [t] = S [t] = f 0 and z = f 0 with probability β γ = δ (say). Since, t can values, 2, 3,...,, the total probability is δ ( ). Substituting the values of β, γ, δ, we get the probability that the event (z = f 0 ) occurs in the above path is p = ( ) ( 2 ) γ. If the above path is not followed, still there is ( p) probability of occurrence of the event due to random association. Adding these two probabilities, we get the result. ( Theorem 7 For 3 r, P (z r = f r ) = ( ) ( r) ( r+) ( r(r ) 2 +r] ) + ( 2 ) r ( 3 )r 2 γ r +, where γ r = ( ) r + ( ) ( ) r. Proof: Substituting y = r in Proposition, we have P (S r [r ] = f r ) = α r, where α r ( r+) ( r(r ) 2 +r] +, r. After round r, suppose that the index r is touched for the first time by j t+ in round t + of the KSA and due to the swap the value f r is moved to the index t, r t and also prior to this swap the value at 2

the index t was t itself, which now comes to the index r. This means that from round r + to round t (both inclusive), the pseudo-random index j has not taken the values r and t. So, after round t +, P ( (S t+ [r ] = t) (S t+ [t] = f r ) ) = P ( (S t [r ] = f r ) (S t [t] = t) (j t+ = r ) ) = α r ( 2 )t r. From the end of round t+ until the end of the KSA, f r remains in index t and t remains in index r with probability ( 2 ) t. Thus, P ( (S [r ] = t) (S [t] = f r ) ) = α r ( 2 )t r ( 2 ) t = α r ( 2 ) r = β r (say). Also, from Proposition 3, we have P (S [r] = 0) = γ r, where γ r = ( ) r + ( ) ( ) r. Suppose the indices r, t and r are not touched by the pseudo-random index j in the first r 2 rounds of the PRGA. This happens with probability ( 3 )r 2. In round r of the PRGA, due to the swap, the value t at index r moves to the index jr G with probability, and jr G 2 / {t, r} with probability ( ). Further, if [ SG r [r] remains 0, then in round r of the PRGA, jr G = jr G and z r = Sr G S G r [r] + Sr G [jr G ] ] [ = Sr G S G r [r] + Sr [j G r ] ] G = Sr G[0 + t] = SG r [t] = f r with probability β r γ r ( 3 )r 2 ( 2) = δ r (say). Since, t can values r, r +, r + 2,...,, the total probability is δ r ( r). Substituting the values of α r, β r, γ r, δ r (, we get the probability that the ) event (z r = f r ) occurs in the above path is p = ( r) ( r+) ( r(r ) 2 +r] + ( 2 ) r ( 3 )r 2 γ r. If the above path is not followed, still there is ( p) probability of occurrence of the event due to random association. Adding these two probabilities, we get the result. The theoretically computed values of the probabilities according to the above two theorems match with the experimental values provided in Table 2. Theorem 7 does not cover cases r = and r = 2. Case r = was already separately presented in Theorem 6. It is interesting to justify the absence of bias in case r = 2 as observed experimentally in Table 2. 3.5 Cryptanalytic Applications Here we accumulate the results explained above. Consider the first keystream output byte z of the PRGA. We find the results that P (z = f ) = 0.0053 (see Theorem 3 and Table ) and that P (z = f 0 ) = 0.0043 (see Theorem 6 and Table 2). Further, from [], we have the result that P (z = f 2 ) = 0.0053. Taking them together, one can check that the P (z = f 0 z = f z = f 2 ) = ( 0.0043) ( 0.0053) ( 0.0053) = 0.048. (The independence assumption in calculating the probability is supported by detailed experimentation with 00 different runs, each run presenting the average probability considering 0 million randomly chosen secret keys of 6 bytes.) Our result indicates that out of randomly chosen 0000 secret keys, in 48 cases on an average, z reveals f 0 or f or f 2, i.e., K[0] or (K[0]+K[]+) or (K[0]+K[]+K[2]+3). If, however, one tries a random association, considering that z will be among three randomly chosen values v, v 2, v 3 from [0,..., 255], then P (z = v z = v 2 z = v 3 ) = ( 256 )3 = 3

0.07. Thus one can guess z with an additional advantage of 0.048 0.07 00% = 27% 0.07 over the random guess. Looking at z 2, we have P (z 2 = 2 f 2 ) = 0.0053 (see Theorem 3 and Table ), which provides an advantage of 0.0053 0.0039 00% = 36%. 0.0039 Similarly, referring to Theorem 3 and Theorem 7 (and also Table and Table 2), significant biases can be observed in P (z r = f r z r = r f r ) for r = 3 to 32 over random association. ow consider the following scenario with the events E,..., E 32, where E : (z = f 0 z = f z = f 2 ), E 2 : (z 2 = 2 f 2 ), and E r : (z r = f r z r = r f r ) for 3 r 32. Observing the first 32 keystream output bytes z,..., z 32, one may try to guess the secret key assuming that 3 or more of the events E,..., E 32 occur. We experimented with 0 million randomly chosen secret keys of length 6 bytes. We found that 3 or more of the events occur in 0.0028 proportion of cases, which is true for 0.0020 proportion of cases for random association. This demonstrates a substantial advantage (40%) over random guess. 4 Further Biases when j is Known during PRGA In all the currently known biases as well as in all the new biases discussed in this paper so far, it is assumed that the value of the pseudo-random index j is unknown. In this section, we are going to show that the biases in the permutation at some stage of the PRGA propagates to the keystream output bytes at a later stage, if the value of the index j at the earlier stage is known. Suppose that we know the value jt G of j after the round t in the PRGA. With high probability, the value V at the index jt G will remain there until jt G is touched by the deterministic index i for the first time after a few more rounds depending on what was the position (t mod ) of i at the t-th stage. This immediately leaks V in keystream output byte. More importantly, if the value V is biased to the secret key bytes, then that information will be leaked too. Formally, let P (St G[jG t ] = V ) = η t for some V. ote that, jt G will be touched by i in round r, where r = t + (jt G t mod ) or t + ( t mod ) + jt G depending on whether jt G > t mod or jt G t mod respectively. By Lemma 3, we would have [( P (Sr G [jg t ] = V ) = η t ( P (z r = r V ) = + η t )r t ] + [( )r t. ow, Lemma 4 immediately gives ] ) +. For some special V s, the form of η t may be known. In that case, it will be advantageous to probe the values of j at particular rounds. For example, according to Corollary 2, after the (t )-th round of the PRGA, St G [t] is biased to the linear combination f t of the t(t+) ] +. secret key bytes with probability η t = [ ( t ) ( 2 +] + ] [ ( )t ow, at round t, f t would move to the index j t due to the swap, and hence St G[j t] will be biased to f t with the same probability. So, the knowledge of j t will leak information about f t in round r = t + (jt G t mod ) or t + ( t mod ) + jt G depending on whether 4

j G t > t mod or j G t t mod respectively. If we know the values of j at multiple stages of the PRGA (it may be possible to read some values of j through side-channel attacks), then the biases propagate further down the keystream output bytes. The following example illustrates how the biases propagate down the keystream output bytes when single as well as multiple j G values are known. Example Suppose we know the value of j5 G as 8. With probability η 5, S4 G [5] would have remained f 5 which would [( move to index 8 ] due to the swap in round 5, i.e., S5 G [8] = f 5. With approximately η 5 )8 5 + probability, f 5 would remain in index 8 till the end of the round 8- = 7. So, we immediately get a bias of z 8 with 8 f 5. Moreover, in round 8, f 5 would move from index 8 to j8 G. So, if the value of jg 8 is also known, say j8 G = 3, then we have SG 8 [3] = f 5. We can apply the same line of arguments for round 256 + 3 = 259 to get a bias of z 259 with 259 f 5. Experiments with billion random keys demonstrate that in this case the bias of z 8 towards 8 f 5 is 0.0052 and the bias of z 259 towards 259 f 5 is 0.0044. These conform to the theoretical values and show that the knowledge of j during the PRGA helps in finding non-random association (away from = 0.0039) between the keystream output bytes and 256 the secret key. 5 Conclusion In this paper, we present several new observations on weaknesses of RC4. It is shown that biases towards the secret key exists at the permutation bytes S[S[y]] for different y values. To our knowledge, this is the first attempt to formally analyze the biases of S[S[y]] and its implications towards the security of RC4. Moreover, a framework is built to analyze biases of the keystream output bytes towards different linear combinations of the secret key bytes. Subsequently, theoretical results are proved to show that RC4 keystream output bytes at the indices to 32 leak significant information about the secret key bytes. We also discovered and proved new biases towards the secret key at the 256-th and the 257-th keystream output bytes. Moreover, we show that if one knows the value of j during some rounds of the PRGA, then the biases propagate further down the keystream. Acknowledgment: The authors like to thank Mr. Snehasis Mukherjee, Indian Statistical Institute, Kolkata for his support in the preparation of the graphs. References [] S. R. Fluhrer and D. A. McGrew. Statistical Analysis of the Alleged RC4 Keystream Generator. FSE 2000, pages 9-30, vol. 978, Lecture otes in Computer Science, Springer. [2] S. R. Fluhrer, I. Mantin and A. Shamir. Weaknesses in the Key Scheduling Algorithm of RC4. SAC 200, pages -24, vol. 2259, Lecture otes in Computer Science, Springer. 5

[3] J. Golic. Linear statistical weakness of alleged RC4 keystream generator. EURO- CRYPT 997, pages 226-238, vol. 233, Lecture otes in Computer Science, Springer. [4] R. J. Jenkins. ISAAC and RC4. 996. Available at http://burtleburtle.net/bob/rand/isaac.html. [5] A. Klein. Attacks on the RC4 stream cipher. February 27, 2006. Available at http://cage.ugent.be/~ klein/rc4/, [last accessed on June 27, 2007]. [6] I. Mantin and A. Shamir. A Practical Attack on Broadcast RC4. FSE 200, pages 52-64, vol. 2355, Lecture otes in Computer Science, Springer. [7] I. Mantin. A Practical Attack on the Fixed RC4 in the WEP Mode. ASIACRYPT 2005, pages 395-4, vol. 3788, Lecture otes in Computer Science, Springer. [8] I. Mantin. Predicting and Distinguishing Attacks on RC4 Keystream Generator. EU- ROCRYPT 2005, pages 49-506, vol. 3494, Lecture otes in Computer Science, Springer. [9] I. Mantin. Analysis of the stream cipher RC4. Master s Thesis, The Weizmann Institute of Science, Israel, 200. [0] I. Mironov. (ot So) Random Shuffles of RC4. CRYPTO 2002, pages 304-39, vol. 2442, Lecture otes in Computer Science, Springer. [] G. Paul, S. Rathi and S. Maitra. On on-negligible Bias of the First Output Byte of RC4 towards the First Three Bytes of the Secret Key. Proceedings of the International Workshop on Coding and Cryptography 2007, pages 285-294. [2] G. Paul and S. Maitra. Permutation after RC4 Key Scheduling Reveals the Secret Key. SAC 2007, pages 360-377, volume 4876, Lecture otes in Computer Science, Springer. A revised and extended version with the title RC4 State Information at Any Stage Reveals the Secret Key is available at the IACR Eprint Server, eprint.iacr.org, number 2007/208, June, 2007. [3] G. Paul, S. Maitra and R. Srivastava. On on-randomness of the Permutation after RC4 Key Scheduling. Proceedings of the Applied Algebra, Algebraic Algorithms, and Error Correcting Codes (AAECC-7), pages 00-09, volume 485, Lecture otes in Computer Science, Springer. [4] S. Paul and B. Preneel. Analysis of on-fortuitous Predictive States of the RC4 Keystream Generator. IDOCRYPT 2003, pages 52-67, vol. 2904, Lecture otes in Computer Science, Springer. [5] S. Paul and B. Preneel. A ew Weakness in the RC4 Keystream Generator and an Approach to Improve the Security of the Cipher. FSE 2004, pages 245-259, vol. 307, Lecture otes in Computer Science, Springer. 6

[6] A. Roos. A class of weak keys in the RC4 stream cipher. Two posts in sci.crypt, message-id 43ueh$j3@hermes.is.co.za and 44ebge$llf@hermes.is.co.za, 995. Available at http://marcel.wanda.ch/archive/weakkeys. [7] E. Tews. Email Communications, September, 2007. [8] E. Tews, R. P. Weinmann and A. Pyshkin. Breaking 04 bit WEP in less than 60 seconds. IACR Eprint Server, eprint.iacr.org, number 2007/20, April, 2007. [9] S. Vaudenay and M. Vuagnoux. Passive-only key recovery attacks on RC4. SAC 2007, pages 344-359, volume 4876, Lecture otes in Computer Science, Springer. [20] D. Wagner. My RC4 weak keys. Post in sci.crypt, message-id 447ol$cbj@cnn.Princeton.EDU, 26 September, 995. Available at http://www.cs.berkeley.edu/ daw/my-posts/my-rc4-weak-keys. Appendix A Lemma 5 P ( (S y+ [S y+ [y]] = f y ) (S y+ [y] y) ) ( ( ) ) ( y(y+) 2 y( 2 )y + ( )y), 0 y 3. Proof: S y+ [y] y means that it can take y+ many values 0,,..., y. Suppose S y+ [y] = x, 0 x y. Then S y+ [x] can equal f y in the following way.. From round (when i = 0) to x (when i = x ), j does not touch the indices x and f y. Thus, after round x, S x [x] = x and S x [f y ] = f y. This happens with probability ( 2 )x. 2. In round x + (when i = x), j x+ becomes equal to f y, and after the swap, S x+ [x] = f y and S x+ [f y ] = x. The probability of this event is P (j x+ = f y ) =. 3. From round x + 2 (when i = x + ) to y (when i = y ), again j does not touch the indices x and f y. Thus, after round y, S y [x] = f y and S y [f y ] = x. This occurs with probability ( 2 )y x. 4. In round y + (when i = y), j y+ becomes equal to f y, and after the swap, S y+ [y] = S y [f y ] = x and S y+ [S y+ [y]] = S y+ [x] = S y [x] = f y. According to [2, Lemma ], this happens with probability ( 2 +. For small values of y, this is approximately equal to ( ) y(y+) 2, which gives a simpler expression. We consider 0 y 3 for good approximation. )+ y(y+) Considering the above events to be independent, we have P ( (S y+ [S y+ [y]] = f y ) (S y+ [y] = x) ) = ( 2 )x 2 ( )y x ( 2 ) ( )y ( [0,..., y ], we get P ( (S y+ [S y+ [y]] = f y ) (S y+ [y] y ) ) = ( y ) y(y+) 2 = ( 7 ) y(y+) 2. Summing for all x in ) ( 2 )y ( ) y(y+) 2.

If S y+ [y] = y, then S y+ [S y+ [y]] can equal f y in the following ways: (a) f y has to be equal to y; this happens with probability, (b) index y is not touched by j in any of the first y rounds; this happens with probability ( )y, and (c) in the (y + )-th round, j y+ = f y so that there is no swap; this happens approximately with probability ( ) y(y+) 2 (as explained in Item 4 above). Hence, P ( (S y+ [S y+ [y]] = f y ) (S y+ [y] = y) ) = ( ) ( )y ( ) y(y+) 2. Adding the above two contributions (one for 0 S y+ [y] y and the other for S y+ [y] = y), we get P ( (S y+ [S y+ [y]] = f y ) (S y+ [y] y) ) = ( ( ( y( 2 )y + ( )y). ) y(y+) 2 Lemma 6 Let p r (y) = P ( (S r [S r [y]] = f y ) (S r [y] r ) ), 0 y, r. Then p r (y) = ( 2 )p r (y) + ( y ) ( ) y(y+) 2 +2r 3, 0 y 3, y + 2 r. Proof: Then event ( (S r [S r [y]] = f y ) (S r [y] r ) ) (, where r y + 2, can occur in two mutually exclusive and exhaustive ways: (Sr [S r [y]] = f y ) (S r [y] r 2) ) ( and (Sr [S r [y]] = f y ) (S r [y] = r ) ). We compute the contribution of each separately. In the r-th round, i = r / {0,..., r 2}. Hence the event ( (S r [S r [y]] = f y ) (S r [y] r 2) ) occurs if we already had ( (S r [S r [y]] = f y ) (S r [y] r 2) ) and j r / {y, S r [y]}. Thus, the contribution of this part is p r (y) ( 2 ). The event ( (S r [S r [y]] = f y ) (S r [y] = r ) ) occurs if after the (r )-th round, S r [r ] = r, S r [y] = f y and in the r-th round (i.e., when i = r ), j r = y causing a swap involving the indices y and r.. We have S r [r ] = r if the location r is not touched during the rounds i = 0,..., r 2. This happens with probability ( )r. 2. The event S r [y] = f y, according to Proposition, happens with a probability ( y ( y ) ( ) ( y(y+) 2 +r ] + y(y+). For small values of y, this is approximately equal to 2 +r 2] which gives a simpler expression. We consider 0 y 3 for good approximation. 3. In the r-th round (when i = r ), j r becomes y with probability. Thus, P ( (S r [S r [y]] = f y ) (S r [y] = r ) ) = ( )r ( y )( y(y+) 2 ) +r 2] = ( y ) ( ) y(y+) 2 +2r 3. Adding the above two contributions, we get p r (y) = ( 2)p r (y) + ( y ) ( ) y(y+) 2 +2r 3. The recurrence in Lemma 6 and the base case in Lemma 5 completely specify the probabilities p r (y) for all y in [0,..., 3] and r in [y +,..., ]. Theorem 2 (Section 2): After the complete KSA, P (S [S [y]] = f y ) y ( ) y(y+) 2 +2( 2) + ( 8 ) y(y+) 2 y+2( ) + ( y ) ( y )

( ) y(y+) 2 +2 3, 0 y 3. Proof: Using the approximation 2 )2, the recurrence in Lemma 6 can be rewritten as p r (y) = ( where a = ( )2 p r (y) + ( y )2 and b = ( y ( ) ( ) y(y+) ) y(y+) 2 +2r 3, i.e., p r (y) = ap r (y) + a r b, ) ( 2. The solution of this recurrence is p r (y) = a r y p y+ (y) + (r y )a r b, r y +. Substituting the values of p y+ (y) (from Lemma 5), a and b, we get p r (y) = y ( ) y(y+) 2 +2(r 2) + ( ) y(y+) 2 y+2(r ) + ) ( y ) ( ) y(y+) ( r y 2 +2r 3, y + r, for initial values of y (0 y 3). Substituting r = and noting the fact that P ( (S [S [y]] = f y ) (S [y] ) ) = P (S [S [y]] = f y ), we get the result. 9