Kim Bowman and Zach Cochran, Clemson University and University of Georgia


Clemson University, Conference October 2004

Linear Dependency and the Quadratic Sieve

Kim Bowman and Zach Cochran, Clemson University and University of Georgia, with Neil Calkin, Tim Flowers, Kevin James and Shannon Purvis

Introduction to Factoring

Fermat's method: difference of two squares. Suppose a² ≡ b² (mod n), so that n | (a² − b²) = (a + b)(a − b). If a ≢ ±b (mod n), then n ∤ (a + b) and n ∤ (a − b), so n must share a common factor with each of them. Let g = gcd(a − b, n). Now, g = 1 would force n | (a + b), and g = n would force n | (a − b), both impossible. So we know that 1 < g < n, which means g is a nontrivial factor of n.
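A minimal sketch of this step in Python (the numbers in the example are illustrative, chosen by us, not from the talk):

```python
from math import gcd

def factor_from_congruence(a, b, n):
    """If a^2 = b^2 (mod n) and a is not congruent to +-b (mod n),
    then gcd(a - b, n) is a nontrivial factor of n."""
    g = gcd(a - b, n)
    return g if 1 < g < n else None

# Illustrative example: 10^2 = 100 = 9 = 3^2 (mod 91),
# and 10 is not +-3 (mod 91), so we recover a factor of 91.
print(factor_from_congruence(10, 3, 91))  # -> 7
```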

A Brief Overview of the Quadratic Sieve

The Quadratic Sieve, or QS, is an algorithm used to factor some large integer n. Collect the set of primes P = {p₁, p₂, p₃, ..., p_k} that are ≤ B, for some bound B. One half of these primes can be thrown out: only primes modulo which n is a quadratic residue can divide the sieve values. The remaining subset is called the factor base. Take an interval R = {⌈√n⌉, ..., ⌈√n⌉ + L − 1}, a subinterval of (√n, √(2n)), where n is the number you want to factor.
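A minimal sketch of building such a factor base in Python (function names are ours; Euler's criterion provides the quadratic-residue test):

```python
def primes_up_to(B):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (B + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(B ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p, is_p in enumerate(sieve) if is_p]

def factor_base(n, B):
    """Keep only odd primes p with n a quadratic residue mod p;
    r^2 - n can only be divisible by such p.  (Primes dividing n
    are also dropped here; they would give a factor directly.)"""
    fb = [2]  # for odd n, r^2 - n is even whenever r is odd
    for p in primes_up_to(B)[1:]:
        if pow(n, (p - 1) // 2, p) == 1:  # Euler's criterion
            fb.append(p)
    return fb

print(factor_base(91, 20))  # -> [2, 3, 5, 11]
```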

A Brief Overview of the Quadratic Sieve

Compute f(r_i) = r_i² − n for each r_i in the interval and place these values in an array. Divide each f(r_i) value in the array by each p_j as many times as possible. This is known as the sieving process. Find and collect all the entries in the array that equal one after the sieving process is complete. We'll call the elements of this subset, R = {r₁, r₂, r₃, ..., r_m}, whose f-values factor completely over the primes p_i ≤ B, B-smooth.
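The sieving step can be sketched as plain trial division (a toy version of ours; production sieves instead subtract log p at the roots of r² ≡ n mod p):

```python
from math import isqrt

def sieve_interval(n, factor_base, L):
    """Divide each f(r) = r^2 - n by each factor-base prime as many
    times as possible; entries reduced to 1 are B-smooth."""
    start = isqrt(n) + 1
    smooth = []
    for r in range(start, start + L):
        v = r * r - n
        for p in factor_base:
            while v % p == 0:
                v //= p
        if v == 1:            # fully factored over the factor base
            smooth.append(r)
    return smooth

# Toy run: n = 91, factor base from the previous sketch.
print(sieve_interval(91, [2, 3, 5, 11], 7))  # -> [10, 11, 16]
```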

A Brief Overview of the Quadratic Sieve

After writing out the prime factorizations of these B-smooth numbers, take the exponents of each prime modulo two and put them in a binary matrix A, where each row represents a B-smooth number and each column represents a different prime in the factor base. Perform Gaussian elimination on the matrix to find a linear dependency. We know each r_i² is a square modulo n; let the product of the r_i over the dependent rows be a. The product of the corresponding f(r_i)'s has all even exponents, so it is a square, call it b². Since a² ≡ b² (mod n), we compute gcd(n, a − b) to find a factor.
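The matrix step can be sketched with bitmask Gaussian elimination over F₂ (a toy helper of ours, not the talk's code):

```python
def find_dependency(rows):
    """Return indices of a subset of GF(2) row vectors (ints used as
    bitmasks) whose XOR is zero, or None if the rows are independent."""
    basis = {}                      # pivot bit -> (reduced row, index set)
    for i, row in enumerate(rows):
        combo = {i}
        while row:
            pivot = row & -row      # lowest set bit
            if pivot not in basis:
                basis[pivot] = (row, combo)
                break
            brow, bcombo = basis[pivot]
            row ^= brow
            combo ^= bcombo         # track which original rows were mixed in
        else:
            return sorted(combo)    # row reduced to zero: dependency found
    return None

# Three exponent vectors whose mod-2 sum is zero:
print(find_dependency([0b011, 0b101, 0b110]))  # -> [0, 1, 2]
```

Given the dependent rows, a is the product of the r_i and b is the integer square root of the product of the f(r_i); gcd(n, a − b) then yields a factor as described above.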

Limitations of the QS

As it stands, the quadratic sieve is the second-fastest general-purpose factoring method. Despite this, it is still a time-consuming algorithm. The lengthiest part of the algorithm is the sieving process, but since that part of the code can be parallelized, the running time of the linear algebra becomes more important.

Limitations of the QS

"As the numbers we try to factor get larger, the matrix stage of QS (...) looms larger. The unfavorable bound of Gaussian elimination ruins our overall complexity estimates, which assume that the matrix stage is not a bottleneck. In addition, the awkwardness of dealing with huge matrices seems to require large and expensive computers, computers for which it is not easy to get large blocks of time."

Richard Crandall and Carl Pomerance, Prime Numbers: A Computational Perspective

Limitations of the QS

As it stands now, when there are k primes in the factor base, k + 1 B-smooth numbers are found by the sieving process to ensure a dependency. So, if there are k primes in the factor base and we sieve until we have k + 1 B-smooth numbers, then we have a (k + 1) × k matrix A which is clearly row dependent. Any significant reduction in the size of A would speed up the linear algebra significantly.

Limitations of the QS

How could we do better? Our research focused on how many rows of A are needed to obtain a linear dependency. Such a decrease should allow us to shorten the interval we must sieve. If we do not need as many as k + 1 rows in A, then: (a) we don't need to sieve as long; (b) the linear algebra is faster.

Approaches

Let's consider a probabilistic model. Since the entries of the exponent vectors from the sieve are either one or zero, we wrote a computer program that generates random binary vectors, where each entry is a one with a probability designed to model the probabilities produced by the QS. The program generates these vectors and performs Gaussian elimination until a dependency is found. The question that remains is what model should be used to generate the entries of these vectors.

Constant Weight Vectors

Our first approach: since the number n typically has about log log n prime factors, we might consider vectors in F₂^k of fixed weight w = log log n. We concentrated on weight-three vectors. After running multiple tests in several different dimensions and collecting data, we found that we needed a number of vectors totaling roughly 92 to 93% of the dimension to have a high probability that a dependency would be found.

This is in accordance with a theorem of Calkin that, essentially, k(1 − e^(−w)/log 2) is a lower bound for dependency. We can see easily that k(1 − e^(−w)) is (essentially) an upper bound for dependency; our results suggest that the threshold lies close to the lower bound.
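The weight-w experiment can be sketched as follows (our own toy program, not the talk's code; the exact count varies with the random seed):

```python
import random

def rows_until_dependency(k, w, seed=0):
    """Add random weight-w vectors in F_2^k one at a time and return
    how many rows it takes until one is dependent on the rest."""
    rng = random.Random(seed)
    basis = {}          # pivot bit -> reduced row
    count = 0
    while True:
        count += 1
        row = 0
        for pos in rng.sample(range(k), w):  # w distinct positions
            row |= 1 << pos
        while row:
            pivot = row & -row
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]
        else:
            return count  # row reduced to zero: dependency

# Dimension 500, weight 3, as on the slides; expect roughly 0.92 * 500.
print(rows_until_dependency(500, 3))
```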

What We've Done

Figure 1: Dim 500 Wt 3 Probability Distribution Graph

What We've Done

Figure 2: Dim 500 Wt 3 Probability Distribution Graph

Is there a better model?

The next natural progression of thought leads to the question: are constant weight vectors a realistic model? NO!

Why? Where does this model go wrong? Small primes are too rare and big primes are too common. By uniformly picking vectors of constant weight, you assign the same probability of getting a one in each column. That is equivalent to saying that a number n is equally likely to be divisible by two as it is to be divisible by 7919. How do we fix the model? A better representation of the random vectors would be to have different probabilities for each column.

Independent Probability Vectors

Let p_i represent the prime number associated with the ith column. Let's look at the probability α_i that p_i exactly divides n to an odd power:

α_i = 1/p_i − 1/p_i² + 1/p_i³ − 1/p_i⁴ + ...
    = (1/p_i)(1 − 1/p_i + 1/p_i² − ...)
    = (1/p_i) · 1/(1 + 1/p_i)
    = 1/(p_i + 1)
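The 1/(p + 1) value is easy to check empirically (toy code of ours):

```python
def p_adic_valuation(m, p):
    """Exponent of p in the factorization of m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v

def odd_power_fraction(p, N):
    """Fraction of 1..N whose p-adic valuation is odd."""
    return sum(p_adic_valuation(m, p) % 2 for m in range(1, N + 1)) / N

print(odd_power_fraction(2, 1000))  # close to 1/3
print(odd_power_fraction(3, 1000))  # close to 1/4
```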

Independent Probability Vectors

With this change in our model, dependencies occur much earlier. So far, looking at dimensions up through 25,000, we find dependencies within the first 100 rows instead of at 93% of the dimension.

Figure 3: Dim 500 Wt 3 Probability Distribution Graph

Figure 4: Dim 500 Wt 3 Probability Distribution Graph

Figure 5: Probability of Dependency in Dimension 512

An Even Better Model

Extensive calculations, borne out by extensive computations, suggest that a better model is:

α_i = 1 / (p_i^(1−δ) + 1),  where δ = log u / log k and u = log n / log k.
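A sketch of evaluating this refined model (function names and the sample n are ours):

```python
from math import log

def alpha_refined(p, n, k):
    """Refined model probability for prime p, where n is the number
    being factored and k the factor base size."""
    u = log(n) / log(k)          # u = log n / log k
    delta = log(u) / log(k)      # delta = log u / log k
    return 1.0 / (p ** (1 - delta) + 1)

def alpha_simple(p):
    """The earlier 1/(p + 1) model, for comparison."""
    return 1.0 / (p + 1)

# Since 0 < delta < 1 here, each alpha is nudged upward from 1/(p + 1).
print(alpha_refined(2, 10**30, 2093))  # larger than the simple model's 1/3
```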

When is a set of binary vectors linearly dependent?

1. Do Gaussian elimination: problem, slow.
2. Trivial Dependency (TD): more rows than columns.
3. Almost Trivial Dependency (ATD): more rows than non-empty columns.
4. Nearly Trivial Dependency (NTD): continuing research (columns with one 1).
5. Somewhat Trivial Dependency (STD): research to come (columns with exactly two 1's).
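The TD and ATD checks can be sketched directly (a toy helper of ours, with rows stored as bitmasks):

```python
def dependency_type(rows, k):
    """Classify the easy cases for an l x k binary matrix whose rows
    are k-bit integers: 'TD' if more rows than columns, 'ATD' if more
    rows than non-empty columns, None otherwise."""
    l = len(rows)
    if l > k:
        return "TD"
    used = 0
    for row in rows:
        used |= row                  # OR of all rows marks non-empty columns
    if l > bin(used).count("1"):
        return "ATD"
    return None

# Three rows over 5 columns, but only two non-empty columns: an ATD.
print(dependency_type([0b00011, 0b00001, 0b00010], 5))  # -> ATD
```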

Almost Trivial Dependencies

1 0 0 0 0 1 0 0 1 0 0 1 0
0 0 1 0 0 0 0 0 1 0 1 0 0
1 1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 1 1 0 0 1
0 1 0 1 0 1 0 0 0 1 1 0 1
1 0 0 0 1 0 0 0 0 0 0 1 1

Nearly Trivial Dependencies

1 0 0 0 0 1 0 0 0 1 1 1 0 0 1 1 1
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0
1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
1 1 0 1 0 0 0 0 1 0 1 1 0 0 1 0 0

The Expected Number of Empty Columns

The probability that the ith column has no ones is:

P(no 1's) = (1 − α_i)^l,

where l is the number of vectors, or rows in the matrix. Thus, the expected number of empty columns is:

E(no 1's) = Σ_{i=1..k} (1 − α_i)^l ≈ Σ_{i=1..k} e^(−l·α_i),

where k is the total number of columns. Similarly, the expected number of columns with exactly one 1 is:

E(one 1) ≈ Σ_{i=1..k} l·α_i·e^(−l·α_i).
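These expectations are straightforward to evaluate (toy helpers of ours):

```python
from math import exp

def expected_empty_columns(alphas, l):
    """E[# columns with no ones] = sum_i (1 - alpha_i)^l."""
    return sum((1 - a) ** l for a in alphas)

def expected_single_one_columns(alphas, l):
    """E[# columns with exactly one 1] ~ sum_i l*alpha_i*exp(-l*alpha_i)."""
    return sum(l * a * exp(-l * a) for a in alphas)

# With the alpha_i = 1/(p_i + 1) model over the first few primes:
alphas = [1 / (p + 1) for p in (2, 3, 5, 7, 11, 13)]
print(expected_empty_columns(alphas, 10))
```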

Upper Bound

Now that we have found the expected number of columns with no ones, we still need to come up with an upper bound for dependency. In fact, we need to determine whether there exists some relationship between the dimension of the space (the factor base size) and the number of vectors (the B-smooth rows). If we let the number of vectors l = l(k) be a function of our dimension, then we can try to determine an upper bound for dependency.

Since a trivial dependency occurs when the number of vectors plus the number of columns with no ones exceeds the dimension of the space, we expect an ATD if:

l + Σ_{i=1..k} e^(−l·α_i) ≥ k.

How should l(k) grow for this inequality to hold?

Some Heuristics

For the model in which α_i = 1/(p_i + 1), we showed that this l grows like √k. For various k, the table below shows values of l that satisfy the previously stated inequality.

dimension k    l(k)    √k        difference
100            15      10        5
1000           46      31.623    14.377
10000          124     100       24
100000         317     316.228   .772
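Under the α_i = 1/(p_i + 1) model, the smallest l satisfying this inequality can be found numerically (toy code of ours; it produces values of the same order as the table):

```python
from math import exp

def first_k_primes(k):
    """Trial-division prime generator, fine for small k."""
    primes, cand = [], 2
    while len(primes) < k:
        if all(cand % p for p in primes if p * p <= cand):
            primes.append(cand)
        cand += 1
    return primes

def minimal_l(k):
    """Smallest l >= 1 with l + sum_i exp(-l*alpha_i) >= k,
    for alpha_i = 1/(p_i + 1): the heuristic onset of an ATD."""
    alphas = [1.0 / (p + 1) for p in first_k_primes(k)]
    l = 1
    while l + sum(exp(-l * a) for a in alphas) < k:
        l += 1
    return l

print(minimal_l(100))
```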

Implementation

Currently, we have been able to implement a working sieve and are testing data to see whether our heuristics are correct. We factored the number 89612417583329 with this sieve, with the following results:

- Our factor base size was 2093
- Our smoothness bound was 38480
- L = 1600
- We found l = 429 smooth numbers

Implementation

After removing empty columns, removing solo columns and their rows, and iterating the process, we reduced a 1600 × 2093 matrix to a 65 × 61 matrix!

References

1. Calkin, Neil J. Dependent Sets of Constant Weight Binary Vectors. Combin. Probab. Comput. 6 (1997), no. 3, 263-271.

2. Crandall, R. and C. Pomerance. Prime Numbers: A Computational Perspective. Springer-Verlag, New York, 2001.