THE RSA ENCRYPTION SCHEME Contents 1. The RSA Encryption Scheme 2 1.1. Advantages over traditional coding methods 3 1.2. Proof of the decoding procedure 4 1.3. Security of the RSA Scheme 4 1.4. Finding Large Primes 5 2. Some Algorithms 5 2.1. Modular Inversion 5 2.2. The Power Algorithm 8 3. Exercise 10 Date: March 24, 1997; Revised April 17, 2002, and April 15, 2009. 1
2 THE RSA ENCRYPTION SCHEME 1. The RSA Encryption Scheme We describe a scheme, first proposed over 30 years ago, by which any number of people can send each other messages, which cannot be read by anyone other than the intended receiver. This scheme goes back to 1977, when it was publicly described by Ron Rivest, Adi Shamir and Leonard Adleman from MIT. Each individual who wishes to participate in the scheme finds two large prime numbers, p and q. Each person then calculates n = pq. Then an integer r relatively prime to (p 1)(q 1) is chosen. Finally, s is calculated so that rs 1 (mod (p 1)(q 1)). The integers n and r are made public perhaps each person s n and r are published in a directory while keeping p, q and s secret. Now suppose A wants to send a secret message to B, let us say, the message JON SKATES First, A must turn this message into a number. It doesn t matter how, so long as everyone does it the same way; let us simply let A=01, B=02, C=03,..., Z=26, space=00. The message to be sent is now the number M = 10151400191101200519 Now A looks up B s n and r in the directory and calculates E M r and sends E to B as the encoded message. B decodes the message E by calculating using his secret s. As we shall see, E s E s M so that B recovers the message M. The description given here works only if M < n; if M n then M must be broken into pieces M 1, M 2,..., M t, each less than n. Each piece then gets encoded by E j Mj r and decoded by M j Ej s (mod n). EXAMPLE Suppose A finds B s entry in the directory to be n = 11096351737 and r = 684315297. This value of n is too small to use in practice, as we shall see, but it will do to illustrate the scheme. A breaks M up into 10 digit pieces: M 1 = 1015140019, M 2 = 1101200519. In general, if the last piece does not contain 10 digits some random digits may be added. Then using r = 684315297 A calculates E 1 M r 1 1015140019 684315297 9974872058 (mod 11096351737), E 2 M r 2 1101200519 684315297 7148290747 (mod 11096351737)
THE RSA ENCRYPTION SCHEME 3 Then A transmits to B the message E = 99748720587148290747. Now B breaks this up into 10 digit pieces E 1 = 9974872058, E 2 = 7148290747. He then uses his secret s which happens to be s = 657804897 to calculate M 1 E1 s 9974872058 657804897 1015140019 (mod 11096351737) M 2 E2 s 7148290747 657804897 1101200519 (mod 11096351737). Thus B has recovered M = 10151400191101200519 which he easily translates to JON SKATES QUESTIONS: (1) Why does it work? That is, why is it true that E s M (mod n)? (2) How do we calculate s such that rs 1 (mod (p 1)(q 1))? (3) How large do p and q have to be? (4) How do we find large primes? (5) How do we calculate M r especially when M and r are large? (6) Is the scheme secure? That is, is it possible for a third party to figure out the secret s from the known n and r? Or recover M from n, r, and an intercepted E without knowing s? (7) What advantage does this scheme have over traditional coding methods? 1.1. Advantages over traditional coding methods. In the usual coding schemes, A and B have to agree on a secret key before they can start sending messages. If communication of the secret key is intercepted, secrecy is gone. In the RSA scheme, no secret key is communicated. Suppose k people wish to participate in an information network in such a way that no one of them can decode messages sent between any two others. Then each pair needs a secret key; each person needs to keep track of k 1 different secret keys. Thus the total number of keys needed is k(k 1). In the RSA scheme each person has only two keys, his public r and his secret s; in all, only 2k keys.
4 THE RSA ENCRYPTION SCHEME 1.2. Proof of the decoding procedure. We have to prove that E s M (mod n). We have E M r so we need to show that M rs M (mod n). Now since p and q are distinct primes φ(n) = φ(pq) = φ(p)φ(q) = (p 1)(q 1). Since rs 1 (mod (p 1)(q 1)), there is a nonnegative integer d such that rs = d(p 1)(q 1) + 1. We consider two cases. Case 1. (M, n) = 1. By Euler s Theorem we have M φ(n) = M (p 1)(q 1) 1 and M rs (M (p 1)(q 1) ) d M 1 d M M Case 2. (M, n) > 1. We assume without loss of generality that (M, n) = p. Then (1.1) M rs M (mod p), since M 0 (mod p). By Fermat s Little Theorem we have M q 1 1 (mod q), since (M, q) = 1. Hence (1.2) M rs (M (q 1) ) d(p 1) M 1 d(p 1) M M (mod q). The congruences (1.1) and (1.2) imply that M rs M since n = pq and p, q are distinct primes. In both cases we have as required. M rs M 1.3. Security of the RSA Scheme. If a third party C can factor n, that is if he can find p and q, then he can compute (p 1)(q 1) and then he can compute the secret key s and use it to decode messages. Thus the scheme is not secure unless n is so large that C is unlikely to be able to factor it. At present (2009), a well chosen 2048 bit n is unlikely to be factored. This means p and q should each be about 1024 bits, i.e. about 300 digits. Is there a way to break the code, without factoring n? Nobody has found one (or, if they have, they aren t telling). Nobody has ruled out the possibility either.
THE RSA ENCRYPTION SCHEME 5 1.4. Finding Large Primes. How do we find a 300 digit prime? One way is to pick a 300 digit number at random and test it for primality. If it proves composite, then try again. How many numbers can you expect to test before finding a prime? The Prime Number Theorem says among the integers near x roughly one out of every log(x) is prime. Since log(10 300 ) = 300 log 10 691, you would expect to test about 691 numbers to find one that is prime. 2. Some Algorithms To implement the RSA algorithm, we need 1 A method for computing s; 2 An efficient method for computing M r (mod n). 2.1. Modular Inversion. Give r and n relatively prime we want to compute s such that rs 1 (mod n). In a previous class we saw how we could use the Euclidean algorithm to compute the inverse modulo n. We give a formal description of the Euclidean Algorithm. Euclid s Algorithm Input a and b while a > 0 do t b b a a t a [t/a] Output b STOP Below is a maple implementation of this algorithm. euclid:=proc(a,b) local A,B,t: A:=a: B:=b: while A>0 do t:=b: B:=A: A:=t - A*trunc(t/A): od: RETURN(B): end:
6 THE RSA ENCRYPTION SCHEME The Modular Inversion algorithm is a variant of Euclid s algorithm. Modular Inversion Algorithm Input r and n Let s = 0 and s = 1 while r > 0 do t n n r q [t/n] r t rq u s s s qs s u Output s STOP We illustrate the algorithm with r = 13 and n = 47. t = 47 t = 13 t = 8 n = 13 n = 8 n = 5 q = [47/13] = 3 q = [13/8] = 1 q = [8/5] = 1 r = 47 13 3 = 8 r = 13 8 1 = 5 r = 8 5 1 = 3 u = 1 u = 3 u = 4 s = 0 3 1 = 3 s = 1 1 ( 3) = 4 s = 3 1 4 = 7 s = 1 s = 3 s = 4 t = 5 t = 3 t = 2 n = 3 n = 2 n = 1 q = [5/3] = 1 q = [3/2] = 1 q = [2/1] = 2 r = 5 3 1 = 2 r = 3 2 1 = 1 r = 2 1 2 = 0 u = 7 u = 11 u = 18 s = 4 1 ( 7) = 11 s = 7 1 11 = 18 s = 11 2( 18) = 47 s = 7 s = 11 s = 18 We see that s = 18 when r = 0 so that the multiplicative inverse of 13 modulo 47 is 18 29 (mod 47). We will now prove that the Modular Inversion algorithm works. When we apply the Euclidean algorithm to the pair n,r we obtain a sequence of integers r 1 > r 2 > r 3 > > r k > r k+1 = 0,
THE RSA ENCRYPTION SCHEME 7 that satisfies n = rq 1 + r 1 r = r 1 q 2 + r 2 r 1 = r 2 q 3 + r 3 r 2 = r 3 q 4 + r 4. r k 2 = r k 1 q k + r k r k 1 = r k q k+1 + 0. We have (n, r) = r k which in our case is 1. An examination of the Modular Inversion algorithm reveals we may define a sequence {s m } as follows: s 1 := 0, s 0 := 1, s m+1 := s m 1 q m+1 s m, for m 0. The s m correspond to the values of s and s in the algorithm. For instance, s 1 corresponds to the initial s and s 0 to the initial s. We prove that for each m α m n + s m r = r m, for some integer α m. We proceed inductively. We have so that s 1 = s 1 q 1 s 0 = q 1, r 1 = n rq 1, r 1 = 1 n + ( q 1 )r = 1 n + s 1 r, and the result holds for m = 1. We now fix M < k and assume that the result holds for all m M. So there exist integers α M and α M 1 such that We have r M = α M n + s M r, r M 1 = α M 1 n + s M 1 r. r M+1 = r M 1 r M q M+1 (where a M+1 = α M 1 α M q M+1 ) = (α M 1 n + s M 1 r) (α M n + s M r)q M+1 = α M+1 n + (s M 1 s M q M+1 )r = α M+1 n + s M+1 r and the result holds for m = M + 1 and hence for all m k by induction. So we have α k n + s k r = r k = 1,
8 THE RSA ENCRYPTION SCHEME for some integer α k. It follows that s k r 1 (mod n). s k corresponds to the value of s which is the output from the algorithm and is the multiplicative inverse of r modulo n. 2.2. The Power Algorithm. In this section we describe an efficient method for computing M r (mod n). First we describe an efficient method for computing powers. As an example, suppose we wish to compute a 82 for some given a. The naive way would be to do 81 multiplications but this is clearly inefficient. The idea is to first convert 82 to binary: 82 = 0 1 + 1 2 + 0 4 + 0 8 + 1 16 + 0 32 + 1 64, so that 82 = (1010010) 2. We compute powers of a by doing repeated squaring: a 2, a 4, a 8, a 16, a 32, a 64. This takes 6 multiplications. Then we complete the computation of a 82 by doing 3 more multiplications: a 82 = a 2 a 16 a 64. This gives a total of 9 multiplications instead of 81. This leads to following algorithm. The Power Algorithm Input a and N Let j = 1, Q = 1, Y = 1 and n = N. while n > 0 do if n is odd then b 1 if n = N then Q a Y a Q Q 2 Output Y STOP Y Q Y b 0 if n = N then Q a Q Q 2 n [n/2] The input of the Power Algorithm is a number a and a positive integer N. The output is a N.
THE RSA ENCRYPTION SCHEME 9 To compute a N (mod m) requires a simple modification of the Power Algorithm. We need a function that does reduction mod m. Define a function whose input is pair a, m and whose output is the least nonnegative residue of a (mod m): mod (a, m) = a m [a/m]. Now we just adjust the Power Algorithm by doing reduction mod n at each step. The Power Algorithm mod m Input a, N and m Let j = 1, Q = 1, Y = 1 and n = N. while n > 0 do if n is odd then b 1 if n = N then Q a Y a Q mod (Q 2, m) Y mod (Q Y, m) b 0 if n = N then Q a Q mod (Q 2, m) n [n/2] Output Y STOP The input of this version of the Power Algorithm is a number a and a pair of positive integers N and m. The output is the least nonnegative residue of a N (mod m). Below is a maple implementation of this algorithm.
10 THE RSA ENCRYPTION SCHEME POWERMOD:=proc(a,N,M) local j,q,y,n,b: j:=1: Q:=1: Y:=1: n:=n: while n>0 do if modp(n,2)=1 then b:=1: if n=n then Q:=a: Y:=a: Q:=modp(Q^2,M): Y:=modp(Q*Y,M): fi: b:=0: if n=n then Q:=a: Q:=modp(Q^2,M): fi: fi: n:=trunc(n/2): od: RETURN(Y); end: 3. Exercise (a) Using the usual alphabet encoding encode the phrase: SEND MONEY as three 8 digit numbers: M 1 = 19051404, M 2 =, M 3 =. (b) Now use the RSA encryption scheme to encode M 1, M 2, M 3 with n = 536813567, and r = 3602561, to obtain three numbers (< n) E 1 = 463099189, E 2 =, E 3 =.
THE RSA ENCRYPTION SCHEME 11 (c) Break the code by factoring n and obtaining the secret key s. work/computations and check that E 1 decodes as M 1 etc. Explain your