Factoring. there exists some 1 i < j l such that x i x j (mod p). (1) p gcd(x i x j, n).

Similar documents
Factoring Algorithms Pollard s p 1 Method. This method discovers a prime factor p of an integer n whenever p 1 has only small prime factors.

Elementary factoring algorithms

Algorithms. Shanks square forms algorithm Williams p+1 Quadratic Sieve Dixon s Random Squares Algorithm

1 Maintaining a Dictionary

Dixon s Factorization method

21. Factoring Numbers I

The RSA Cryptosystem: Factoring the public modulus. Debdeep Mukhopadhyay

Theoretical Cryptography, Lectures 18-20

Kim Bowman and Zach Cochran Clemson University and University of Georgia

1 The Fundamental Theorem of Arithmetic. A positive integer N has a unique prime power decomposition. Primality Testing. and. Integer Factorisation

What is The Continued Fraction Factoring Method

Basic elements of number theory

Basic elements of number theory

Discrete Math, Fourteenth Problem Set (July 18)

Integer factorization, part 1: the Q sieve. D. J. Bernstein

Q 2.0.2: If it s 5:30pm now, what time will it be in 4753 hours? Q 2.0.3: Today is Wednesday. What day of the week will it be in one year from today?

CPSC 467b: Cryptography and Computer Security

22. The Quadratic Sieve and Elliptic Curves. 22.a The Quadratic Sieve

You separate binary numbers into columns in a similar fashion. 2 5 = 32

RSA Cryptosystem and Factorization

Integer factorization, part 1: the Q sieve. part 2: detecting smoothness. D. J. Bernstein

CPSC 467b: Cryptography and Computer Security

1 Continued Fractions

Proof: Let the check matrix be

Theoretical Cryptography, Lecture 13

Math 131 notes. Jason Riedy. 6 October, Linear Diophantine equations : Likely delayed 6

Notes on arithmetic. 1. Representation in base B

An integer p is prime if p > 1 and p has exactly two positive divisors, 1 and p.

Computing Discrete Logarithms. Many cryptosystems could be broken if we could compute discrete logarithms quickly.

CHAPTER 6. Prime Numbers. Definition and Fundamental Results

Factoring. Number Theory # 2

CPSC 467b: Cryptography and Computer Security

Congruence of Integers

Lecture 6: Cryptanalysis of public-key algorithms.,

AN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS

Elliptic Curves Spring 2013 Lecture #12 03/19/2013

Discrete Logarithm Problem

Cryptography: Joining the RSA Cryptosystem

MA554 Assessment 1 Cosets and Lagrange s theorem

Quick Sort Notes , Spring 2010

Number theory (Chapter 4)

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ).

Cosets and Lagrange s theorem

Lecture 4: Constructing the Integers, Rationals and Reals

Information Security

Lecture 11 - Basic Number Theory.

A Few Primality Testing Algorithms

Definition For a set F, a polynomial over F with variable x is of the form

WORKSHEET ON NUMBERS, MATH 215 FALL. We start our study of numbers with the integers: N = {1, 2, 3,...}

Beautiful Mathematics

CPSC 467b: Cryptography and Computer Security

Q: How can quantum computers break ecryption?

Lecture 10: Powers of Matrices, Difference Equations

Shor Factorization Algorithm

6.080 / Great Ideas in Theoretical Computer Science Spring 2008

Lecture 2: Mutually Orthogonal Latin Squares and Finite Fields

KTH, NADA , and D1449 Kryptografins grunder. Lecture 6: RSA. Johan Håstad, transcribed by Martin Lindkvist

The next sequence of lectures in on the topic of Arithmetic Algorithms. We shall build up to an understanding of the RSA public-key cryptosystem.

YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE

#26: Number Theory, Part I: Divisibility

CPSC 467: Cryptography and Computer Security

CSE 521: Design and Analysis of Algorithms I

Discrete Mathematics and Probability Theory Fall 2018 Alistair Sinclair and Yun Song Note 6

Number Theory Math 420 Silverman Exam #1 February 27, 2018

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 5

Powers in Modular Arithmetic, and RSA Public Key Cryptography

Elementary Properties of the Integers

Cryptography CS 555. Topic 18: RSA Implementation and Security. CS555 Topic 18 1

Instructor: Bobby Kleinberg Lecture Notes, 25 April The Miller-Rabin Randomized Primality Test

Notes: Pythagorean Triples

To factor an expression means to write it as a product of factors instead of a sum of terms. The expression 3x

Math Circle Beginners Group February 28, 2016 Euclid and Prime Numbers Solutions

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper)

Discrete Mathematics and Probability Theory Fall 2014 Anant Sahai Note 7

18.310A Final exam practice questions

Fundamental theorem of modules over a PID and applications

MATH 115, SUMMER 2012 LECTURE 4 THURSDAY, JUNE 21ST

9 Knapsack Cryptography

Discrete Mathematics and Probability Theory Fall 2013 Vazirani Note 3

Chuck Garner, Ph.D. May 25, 2009 / Georgia ARML Practice

Lecture 10 - MAC s continued, hash & MAC

CPSC 536N: Randomized Algorithms Term 2. Lecture 9

COMP4109 : Applied Cryptography

Finding the Median. 1 The problem of finding the median. 2 A good strategy? lecture notes September 2, Lecturer: Michel Goemans

Number Theory in Problem Solving. Konrad Pilch

Where do pseudo-random generators come from?

Math 299 Supplement: Modular Arithmetic Nov 8, 2013

Linear Independence Reading: Lay 1.7

CPSC 467: Cryptography and Computer Security

Senior Math Circles Cryptography and Number Theory Week 2

Modular Arithmetic Instructor: Marizza Bailey Name:

Number theory (Chapter 4)

Lecture 3: Latin Squares and Groups

CPSC 467: Cryptography and Computer Security

Introduction to Algorithms

THE RSA ENCRYPTION SCHEME

FOUNDATIONS OF ALGEBRAIC GEOMETRY CLASS 24

The Elliptic Curve Method and Other Integer Factorization Algorithms. John Wright

OBTAINING SQUARES FROM THE PRODUCTS OF NON-SQUARE INTEGERS

Experience in Factoring Large Integers Using Quadratic Sieve

Transcription:

18.310 lecture notes April 22, 2015 Factoring Lecturer: Michel Goemans We ve seen that it s possible to efficiently check whether an integer n is prime or not. What about factoring a number? If this could be done efficiently (for example, in say D 4 operations, where D log n is the number of digits of n), then the RSA encryption scheme could be broken, just by factoring N = pq in the public key. Probably the first algorithm that comes to mind is the brute-force approach: for each positive integer d greater than 1 and not larger than n, check if d divides n. The reason we go up to n is simply that if n is composite, it must have a divisor no larger than n. This can be done very efficiently for any particular choice of d, using the Euclidean algorithm: compute gcd(d, n), and if it s not 1, the result is a nontrivial 1 factor of n. Unfortunately, this method is extremely slow, just because of the number of potential divisors to check: n. We will see two faster algorithms. The first, Pollard s rho algorithm will require roughly n 1/4 gcd operations (rather than n 1/2 as above). The second, the quadratic sieve, will run roughly in time e log n log log n. This looks a bit complicated, but notice that (log n) C = e C log log n and n ɛ = e ɛ log n. So e log n log log n lies inbetween these two, in terms of its speed of growth as n gets large. Pollard s rho algorithm Suppose n is the number to be factored, and n = pq, where p is a prime factor not larger than n (q need not be prime here, if n has more than 2 prime factors). Note that we do not know p; once we figure out p, we re done! Consider the following thought experiment. Pick some numbers x 1, x 2,..., x l uniformly at random in Z n (the choice of l will be decided later). Assume that all the numbers are distinct (if n is large compared to l, this is very unlikely anyway). Now suppose that there exists some 1 i < j l such that x i x j (mod p). (1) Then p x i x j, and since p n also, we have that p gcd(x i x j, n). Moreover, since n < x i x j < n and x i x j, gcd(x i x j, n) < n. Thus gcd(x i x j, n) provides a nontrivial factor of n, and our job is done. Again, keep in mind that we do not know p. But, the above tells us that if we computed gcd(x i x j, n) for every pair 1 i < j l, we would find the nontrivial factor. But how big does l need to be such that the chance that (1) holds is reasonable? 1 Nontrivial here just means that the factor is neither 1, nor n itself. Factor-1

The answer is roughly p. This is essentially a reformulation of the Birthday paradox : the somewhat surprising fact that in a group of only 30 people, the probability that two people in the group have the same birthday is already quite large more than 60%. Let s calculate the probability that (1) does not hold, i.e., that the residue classes ( birthdays ) of x 1, x 2,..., x l are all different. This is ( P(all different) = 1 1 ) ( 1 2 ) ( 1 l 1 ) n n n e 1/n e 2/n e (l 1)/n = e l(l 1) 2n e l2 /(2n). using 1 y e y So if l = n, the probability is roughly e 1/2 2/3, and so there is a substantial probability that (1) occurs. Since p n, we only need to choose l = n 1/4 (remember we don t know p!). But then the number of pairs i, j is ( ) l 2 which is about 1 2 n. Checking all these pairs takes again about n gcd computations this is no better than brute search! What whas the point? The clever trick here is to not pick the x i s randomly, but instead in a way that looks random. Let f : Z n Z n be defined by f(x) = x 2 + 1 mod n. Fix any x 0 Z n. Now consider the sequence x 1 = f(x 0 ), x 2 = f(x 1 ),... x i = f(x i 1 ),.... Empirical fact : The sequence x 0, x 1,... looks random. This is not a precise statement, and indeed there is no formal proof that the algorithm that we will describe now actually works efficiently. But it is well established empirically that it does, in that for most choices of x 0, there will be a pair i < j < C p (for some small constant C) with x i x j (mod p). The plan now is to look for such a pair i, j in this sequence (rather than a randomly generated one). But we still can t check every pair i, j, so how does this help? The observation is that we re actually looking for cycles in the sequence x 0 mod p, x 1 mod p, x 2 mod p,.... For suppose x i x j (mod p), with i < j and i chosen as small as possible so that this holds. Then f(x i ) f(x j ) (mod p), i.e., x i+1 x j+1 (mod p) (you should check this!). Figure 1 shows the situation schematically: the points in the picture represent the values Z p. Eventually the sequence x 0 mod p, x 1 mod p,... runs into itself, and from then one goes around a cycle. (The name Pollard s rho algorithm comes from this picture...). It s worth recalling again at this point that we don t know p, so we cannot directly see the cycle. Nevertheless, we can find it very efficiently with the following algorithm. Tortoise and hare algorithm: 1. Let y 0 = x 0. Factor-2

x i+1 mod p x i mod p x 1 mod p x 0 mod p Figure 1: A cycle formed in the sequence x 0 mod p, x 1 mod p,.... 2. For i = 1, 2,... (a) x i = f(x i 1 ). (b) y i = f(f(y i 1 )) (c) If gcd(x i y i, n) 1, return this discovered factor. Informally, we start a tortoise (the sequence (x i )) and a hare (the sequence (y i )) at x 0, and the tortoise moves one step each turn, but the hare moves two. Once both the tortoise and the hare are on the cycle, it s just a matter of time until the hare runs into the tortoise. Our empirical fact implies that this algorithm will generally finish within C p Cn 1/4 steps for some small constant C; if we run for much longer than this without finding a cycle, we can give up and start again with a different choice of x 0. The quadratic sieve This algorithm is closely related to the currently fastest known method for factoring. As already mentioned, it takes time roughly e log n log log n. The approach is based on the following idea. If we could find two numbers a, b such that a b (mod n), a b (mod n), but a 2 b 2 (mod n), then we can easily get a factor. For we have that n (a b)(a + b), but n a b and n a + b. Thus gcd(a + b, n) will be a nontrivial factor (as will gcd(a b, n), but we only need one). Let g : Z n Z be the function defined by g(z) = z 2 mod n. Now if we could find some integer z, with n z n + K (K is something big, but much smaller than n) and where g(z) is a perfect square, then we are done. (Note: by perfect square, we really mean a perfect square as an integer, not as an element of Z n. So we mean g(z) = y 2 for some integer y, not g(z) y 2 (mod n).) For let g(z) = y 2. Then y 2 z 2 (mod n). But z y (mod n), since y < n z < n. Moreover, if K is not too large, it can also be shown that z y (mod n). So then we can obtain a nontrivial factor from gcd(y z, n) as already discussed. Example. Suppose n = 1817. Then n = 43. We get lucky: g(51) = 784 = 28 2. So 51 2 28 2 (mod 1817), hence 23 79 0 (mod 1817), and we have a factor. Unfortunately, starting from n and working our way up one at a time looking for a z s.t. g(z) is a perfect square does not work well; the special values of z are rare, and so the number of Factor-3

tries needed is again huge. But there is another hope. Suppose we find distinct z 1, z 2,..., z l Z n, with all z i n, and where l g(z i ) is a perfect square. (2) i=1 (Again, note that we multiply the numbers g(z i ) as normal integers, not modulo n; so this product can be larger than n). Then this is also (probably) good for us: let z = l i=1 z i and y 2 = l i=1 g(z i). Then y 2 z 2 (mod n), and so as long as y ±z (mod n), we again find our nontrivial factor from gcd(y + z, n). It is (vaguely speaking) unlikely that y ±z (mod n), though we won t deal with this formally; if we get unlucky, we have to try for another collection of z i s. Is it any easier to find a collection of z i s satisfying (2)? Notice that given an integer m with prime factorization m = p α 1 1 pα 2 2 pαt t, then m is a perfect square if and only iff α j is even for each 1 j t. The idea will be to work with z i s so that g(z i ) contain only small primes in their prime factorization, which will allow us to exploit this characterization of being a perfect square. Definition 1. For B N, an integer is called B-smooth if all its prime factors are at most B. For example, 15 and 75 are 5-smooth, but 14 is not 5-smooth. If B is not too large, we can i) check if an integer m is B-smooth, and ii) if it is, find its prime factorization, reasonably efficiently. Simply compute gcd(p i, m) for each prime p i B in turn, and if this is larger than 1, divide m by p i (keeping track of the number of factors of p i found). If m eventually reduces down to 1, we have a prime factorization of m in terms of primes not larger than B; otherwise, m is not B-smooth. The plan is: pick some B (which will grow with n unfortunately, but much more slowly than n; roughly, B = e 1 2 log n log log n ), and find B + 1 numbers z 1, z 2, z B+1 Z n, z i n, so that g(z i ) is B-smooth for each i. It turns out that these numbers are not so rare (depending on the choice of B), so we could do this just by trying each value z starting from n, checking if g(z) is B-smooth, as described above. Once we have these numbers z 1,..., z B+1, we will in fact be able to find a subset of these numbers which satisfy (2)! This is a consequence of some linear algebra, and the reason we wanted precisely B+1 numbers. Let p 1,..., p t be the list of primes of size at most B. Let g(z i ) = t j=1 pα i,j j be the prime factorization of g(z i ) (of course, some α i,j s can be zero). Then for a given subset I {1, 2,..., B + 1}, g(z i ) = t j=1 p ( α i,j) j. Thus, applying our condition for a number to be a perfect square, g(z i ) is a perfect square α i,j is even j {1, 2,..., t}. (3) Now the linear algebra connection. (Warning: some knowledge of linear algebra will be assumed here). Consider the (B + 1) t matrix A, with entries A ij = α i,j mod 2. We think of the entries of A as being in Z 2. Let A i denote the i th row of A (so this is a vector). Then what we are looking for is a nonempty collection of rows I {1, 2,..., B + 1} so that A i = 0. The existence of such a subset I follows from a key concept in linear algebra, linear dependence. Since A has only Factor-4

B columns, the rank of A is at most B; since there are more than B rows, this means there must be a linear dependence amongst the rows, i.e., w i Z 2 so that B+1 i=1 w i A i = 0 (and not all the w i are zero.) But that s precisely what we want; just take I = {i w i = 1}. There is also an efficient algorithm to find this linear dependence. Factor-5