Fast inverse for big numbers: Picarte's iteration

Claudio Gutierrez and Mauricio Monsalve
Computer Science Department, Universidad de Chile
cgutierr,mnmonsal@dcc.uchile.cl

Abstract. This paper presents an algorithm to compute the inverse of an n-bit integer with k ≥ n bits of precision running in time O(M(k, n)), where M(k, n) is the time needed to multiply two numbers of k and n bits. This bound improves asymptotically on Newton's iteration in this range, which gives O(M(k, k)), and hence improves asymptotically the best known upper bound for computing the inverse with high precision. Additionally, we show that a simple implementation of the algorithm behaves very well in practice for precision k >> n, largely improving on Newton's iteration and also outperforming the standard multiple-precision arithmetic library package GMP for big n.

Introduction

It is well known that the representation of the inverse of an integer consists of a fixed part plus a period, whose lengths depend on the representation base. For example, in decimal representation, given an integer b odd and not divisible by 5, 1/b is purely recurring and the length of the period is the order of 10 modulo b [3]. This means that the period can be exponentially long in the length of the decimal representation. For example, 23 is represented by 2 decimal digits, while 1/23 = 0.0434782608695652173913... has a period of 22 decimals. A similar analysis holds for any base, in particular for binary representation (cf. Thm. 136, [3]). Thus, for an integer b of n digits, computing its inverse with k digits of precision for k in the range n ≤ k ≤ 2^n is relevant and not trivial.
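The period claim above is easy to check directly. The following small C program (ours, not part of the paper) computes the period length of 1/b as the multiplicative order of 10 modulo b and prints the repeating digits by long division, for b = 23; it assumes a small b coprime to 10.

    /* Period of 1/b in base 10: the multiplicative order of 10 modulo b.
       Assumes a small b > 1 with gcd(b, 10) = 1 (e.g. b = 23). */
    #include <stdio.h>

    static int period_length(int b) {          /* order of 10 modulo b */
        int r = 10 % b, len = 1;
        while (r != 1) { r = (r * 10) % b; len++; }
        return len;
    }

    int main(void) {
        int b = 23, len = period_length(b), r = 1;
        printf("1/%d repeats every %d digits: 0.", b, len);
        for (int i = 0; i < len; i++) {        /* long-division digits */
            r *= 10;
            printf("%d", r / b);
            r %= b;
        }
        printf("...\n");                       /* 0434782608695652173913 for b = 23 */
        return 0;
    }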

Given an integer b (assume its binary representation has n bits), the standard routine to compute its inverse 1/b efficiently is Newton's iteration, which gives n significant bits of the inverse of b in time O(M(n, n)), where M(x, y) is the time taken to multiply an x-bit number by a y-bit number. In fact, O(M(n, n)) is currently the best theoretical asymptotic bound for computing the inverse of an n-bit number [4, 1], and it is also the method suggested for use in practice for large numbers ([4], 4.3.3, Division). Specialized packages for multiprecision arithmetic also consider it (e.g. GMP, the GNU Multiple Precision Arithmetic Library, "the fastest bignum library on the planet!" [8]). Note that this bound depends on the multiplication procedure used. The current best theoretical bound for multiplication, M(n, n), which uses the Schönhage-Strassen procedure [6], gives O(n lg n lg lg n).

When more bits of precision for 1/b are needed, say k > n, there seems to be no way to avoid the O(M(k, k)) time bound obtained by running Newton's iteration. This bound comes essentially from the cost of the squaring in Newton's iteration formula x_{i+1} = 2x_i − b·x_i^2.

In this paper we present an algorithm to compute the inverse 1/b of an n-bit integer with k ≥ n bits of approximation in time O(M(k, n)). Hence, using the current best asymptotic bounds for multiplication plus Schönhage's observation ([4], 4.3.3, Ex. 13) on multiplying a k-bit number by an n-bit number for k > n, it gives O(k lg n lg lg n) time, which improves asymptotically on the current best bound O(k lg k lg lg k) (which uses Newton's iteration). This is particularly useful in the case discussed before, when k is of the order of 2^n.

Additionally, we show experimentally that this algorithm behaves very well in practice for k >> n. In fact, with a naive implementation we are able to: 1) illustrate the behaviour of Picarte's iteration: linear time complexity in the parameter k when the size of b is fixed; 2) show that in practice Picarte's iteration is much faster than Newton's iteration, beyond what the asymptotic bound difference alone suggests; and 3) show that Picarte's iteration is faster than the current best widely used implementation, GMP, for big n and large k.

1 Picarte's iteration

In 1861 Ramón Picarte published a table of inverses [5] with remarkable precision for the time. In order to calculate his table of inverses, Picarte starts from the following identity:

    a^2/b = (2a − b) + (a − b)^2/b.

The key observation he made is that the integer part of a^2/b is 2a − b plus the integer part of (a − b)^2/b, thus:

    ⌊a^2/b⌋ = (2a − b) + ⌊(a − b)^2/b⌋.

We can now use the same formula for (a − b)^2/b. Iterating this process q times, where q is such that a = q·b + r, 0 ≤ r < b, we obtain:

    ⌊a^2/b⌋ = 2aq − b·q^2 + ⌊r^2/b⌋.    (1)
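Identity (1) can be checked mechanically. The following C snippet (ours, not from the paper) verifies ⌊a^2/b⌋ = 2aq − b·q^2 + ⌊r^2/b⌋ over a range of word-sized test values of a and b; the ranges are arbitrary.

    /* Check identity (1): floor(a^2/b) = 2aq - b*q^2 + floor(r^2/b),
       where a = q*b + r and 0 <= r < b.  Test ranges are arbitrary. */
    #include <assert.h>
    #include <stdio.h>

    int main(void) {
        for (long long b = 2; b <= 200; b++)
            for (long long a = b; a <= 5 * b; a++) {
                long long q = a / b, r = a % b;
                long long lhs = (a * a) / b;
                long long rhs = 2 * a * q - b * q * q + (r * r) / b;
                assert(lhs == rhs);
            }
        printf("identity (1) holds for all test pairs\n");
        return 0;
    }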

Note that (1) is a discrete version of Newton's iteration. In fact, if we consider binary representation and put a = 2^i, then x_i = ⌊2^i/b⌋ is the binary representation of 1/b with i bits of precision. Also, if 2^i = x_i·b + r_i, 0 ≤ r_i < b, is the Euclidean decomposition, we get from (1) the identity

    x_{2i} = 2^{i+1}·x_i − b·x_i^2 + ⌊r_i^2/b⌋,    (2)

which is essentially a discrete version of the iteration recommended by Knuth for implementing Newton's iteration for high-precision reciprocals (Algorithm R, [4], 4.3.3).

Improved version based on Picarte. The advantage of Picarte's formula (1) for the inverse over the classical Newton formula x_{i+1} = 2x_i − b·x_i^2 is not only that Picarte's is discrete and exact, but that it allows improving the efficiency of the evaluation by replacing the expression 2aq − b·q^2 with the equivalent, but simpler to evaluate, expression qa + rq (recall that a = q·b + r, hence qa = b·q^2 + rq), getting:

    ⌊a^2/b⌋ = qa + rq + ⌊r^2/b⌋.    (3)

Now, using the same argument as in the derivation of (2) to obtain a binary representation, we get the following formula, which we will call Picarte's iteration:

    x_{2i} = 2^i·x_i + r_i·x_i + ⌊r_i^2/b⌋.    (4)

Note that, compared to Newton's iteration (formula (2) above), the evaluation of this expression requires only a multiplication of x_i by a bounded number (r_i < b), plus a bounded division and a shift.
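Formula (4) can also be checked on machine words before analysing its cost. The sketch below (ours, not the authors' code) doubles the precision of 1/23 from 2 up to 32 bits using (4), together with r_{2i} = r_i^2 mod b, and compares every step against plain integer division; the choice b = 23 is arbitrary.

    /* Picarte's iteration (4) on 64-bit words:
         x_{2i} = (x_i << i) + r_i*x_i + (r_i*r_i)/b,   r_{2i} = (r_i*r_i) % b,
       starting from x_1 = 2/b, r_1 = 2 % b.  For small b and i <= 16 the
       doubled exponent 2i <= 32 still fits, so every step can be checked. */
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t b = 23;                          /* arbitrary small test divisor */
        uint64_t x = 2 / b, r = 2 % b;            /* x_1, r_1 */
        for (unsigned i = 1; i <= 16; i *= 2) {
            x = (x << i) + r * x + (r * r) / b;   /* x_{2i} by formula (4) */
            r = (r * r) % b;                      /* 2^{2i} mod b          */
            assert(x == (1ULL << (2 * i)) / b);   /* agrees with division  */
            assert(r == (1ULL << (2 * i)) % b);
            printf("x_%u = %llu\n", 2 * i, (unsigned long long)x);
        }
        return 0;
    }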

From the iteration formula (4) we obtain the following result:

Theorem 1. For b an n-bit number, the time complexity of computing 1/b with k ≥ n binary digits of precision is O(M(k, n)), where M(k, n) is the time complexity of multiplying a k-bit number by an n-bit number.

Proof. The deduction above proves that the iteration (4) computes what is claimed. To calculate the time complexity T(2i) of computing x_{2i}, observe that on the right-hand side of (4) we have a shift by i, a multiplication of an i-bit number by an n-bit number, M(i, n), a squaring of an n-bit number, M(n, n), a division of a 2n-bit number by an n-bit number, D(2n), plus two sums of at most 2i-bit numbers, S(2i). That is:

    T(2i) = T(i) + M(i, n) + 2·S(2i) + M(n, n) + D(2n) + c(shift, i).    (5)

By standard techniques (cf. Theorem 4.1, [2]), and the fact that for i ≥ n the term M(i, n) asymptotically dominates the terms to its right, it follows that for k ≥ n, T(k) = O(M(k, n)).

It is interesting to note that using the above method to compute 1/b with k ≥ n bits of precision, together with Schönhage's observation on how to multiply an n-bit number by a k-bit number for k > n ([4], 4.3.3, Ex. 13), we get the following result:

Corollary 1. For n-bit integers a, b, the time complexity of computing a/b with k > n binary digits of precision is O(M(k, n)).

2 Illustrations of theoretical results and comparison with practical implementations

The aim of the experiments shown in this section is three-fold:

1. To illustrate the behaviour of Picarte's iteration: linear time complexity (in the parameter k) when the size of b is fixed.
2. To show that in practice Picarte's iteration is much faster than Newton's iteration. We already know from theory that Picarte's is asymptotically faster than Newton's, but the role of the constants is hidden in the asymptotic comparison. In fact, the experiments show this difference.
3. To show that Picarte's iteration is faster than the current best widely used implementations. As a point of illustration, we chose the GMP package and compared our (naive) implementation of Picarte's iteration with GMP's division algorithm for precision k. The results show that Picarte's iteration is more efficient for large k.

Picarte's iteration was implemented in C using the GMP library for big numbers. We also implemented Newton's iteration in a similar way (it differs from Picarte's essentially in two lines). See Algorithms 1 and 2 (code available at http://www.dcc.uchile.cl/~mnmonsal/picarte). All tests were performed on an Intel Core 2 Duo processor at 2 GHz with 1 GB of RAM, running the Fedora Core 6 Linux distribution.

Algorithm 1 Picarte's iteration
Input: j, the number of iterations; b, the number to invert
Output: x, the inverse; k, the precision reached
procedure Picarte(j, b)
    r ← 2, x ← 0, k ← 1
    while j > 0 do
        y1 ← x << k
        y2 ← r · x
        x ← y1 + y2
        rr ← r · r
        y1, r ← ⌊rr/b⌋, rr mod b      // division with remainder
        y2 ← x + y1
        x ← y2
        k ← k + k
        j ← j − 1
    end while
    return x, k

Algorithm 2 Newton's iteration
Input: j, the number of iterations; b, the number to invert
Output: x, the inverse; k, the precision reached
procedure Newton(j, b)
    r ← 2, x ← 0, k ← 1
    while j > 0 do
        y1 ← x << (k + 1)
        xx ← x · x
        y2 ← b · xx
        x ← y1 − y2
        rr ← r · r
        y1, r ← ⌊rr/b⌋, rr mod b      // division with remainder
        y2 ← x + y1
        x ← y2
        k ← k + k
        j ← j − 1
    end while
    return x, k
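For concreteness, here is a minimal C translation of Algorithm 1 using GMP's mpz routines. It is a sketch written for this text: the authors' released code at the URL above may differ, and the function name picarte_inverse is ours. Algorithm 2 is obtained by changing the update lines as in the listing above.

    /* Sketch of Algorithm 1 with GMP.  After j iterations the result is
       x = floor(2^k / b) with k = 2^j, i.e. 1/b with k bits of precision.
       Assumes b > 2, matching the initialisation x_1 = 0, r_1 = 2. */
    #include <gmp.h>
    #include <stdio.h>

    static unsigned long picarte_inverse(mpz_t x, unsigned j, const mpz_t b) {
        unsigned long k = 1;
        mpz_t r, rr, y1, y2;
        mpz_inits(r, rr, y1, y2, NULL);
        mpz_set_ui(x, 0);                 /* x_1 = floor(2/b) = 0          */
        mpz_set_ui(r, 2);                 /* r_1 = 2 mod b                 */
        while (j-- > 0) {
            mpz_mul_2exp(y1, x, k);       /* y1 = x << k                   */
            mpz_mul(y2, r, x);            /* y2 = r * x                    */
            mpz_add(x, y1, y2);           /* x  = y1 + y2                  */
            mpz_mul(rr, r, r);            /* rr = r * r                    */
            mpz_tdiv_qr(y1, r, rr, b);    /* y1 = rr div b, r = rr mod b   */
            mpz_add(x, x, y1);            /* x  = x + y1  (Picarte's (4))  */
            k += k;
        }
        mpz_clears(r, rr, y1, y2, NULL);
        return k;                         /* bits of precision reached     */
    }

    int main(void) {
        mpz_t b, x;
        mpz_inits(b, x, NULL);
        mpz_set_ui(b, 23);
        unsigned long k = picarte_inverse(x, 5, b);   /* 2^5 = 32 bits */
        gmp_printf("floor(2^%lu / %Zd) = %Zd\n", k, b, x);
        mpz_clears(b, x, NULL);
        return 0;
    }

With b = 23 and j = 5 the program prints floor(2^32 / 23) = 186737708, matching the word-size trace of formula (4) given earlier.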

2.1 Picarte's iteration: asymptotic behaviour

GMP uses the currently fastest multiplication algorithms, so Picarte's asymptotic bound should be linear in k and logarithmic in n. The experiments do not deviate from this prediction. Fig. 1 shows Picarte's iteration when n = 2^5, n = 2^10 and n = 2^15, where n is the size of the binary representation of b.

Fig. 1. Picarte's iteration. Graphs on the left-hand side, (a), (c) and (e), show the time used by Picarte's iteration to compute 1/b as the precision k increases, for n = 2^5, n = 2^10 and n = 2^15 respectively, where n is the size of the binary representation of b. In each graph, the X-axis shows the precision k, and the Y-axis shows the time, in microseconds, used to compute 1/b with k bits of precision. Graphs on the right-hand side, (b), (d) and (f), show the same data as (a), (c) and (e), but in log/log scale.

2.2 Picarte's iteration versus Newton's iteration

Picarte's iteration improves the performance of Newton's iteration essentially by replacing the term b·x_i^2 with r_i·x_i (recall equations (2) and (4)). Hence the cost of each iteration in Picarte's algorithm is one multiplication by a bounded number, M(i, n), compared to two multiplications, M(2i, n) + M(i, i) (one unbounded), in Newton's. Also note that r_i < b for all i. Thus, in practice, Picarte's iteration greatly improves on the performance of Newton's, as Figure 2 shows.

Figure 2 shows Picarte's iteration versus Newton's iteration. Clearly, Picarte's iteration outperforms Newton's iteration. When k is big, greater than 2^20 (≈ 10^6), the difference in time becomes of the order of milliseconds for Picarte's versus seconds and even minutes for Newton's. Also, Picarte's iteration uses less memory, so it can compute the inverse for greater numbers and precisions than Newton's iteration (up to k = 2^29 using GMP, unlike Newton's).

Fig. 2. Picarte versus Newton. Graphs show the performance of both algorithms for different sizes of b, ranging from n = 2^10 (graph (a)) to n = 2^16 (graph (f)). Each graph depicts the time in microseconds of computing 1/b for different precisions k. All graphs are in log/log scale.

Note that Picarte's iteration and Newton's iteration perform very similarly when the precision required is no more than the size n of the binary representation of b (see Figure 2, graphs (c), (d), (e) and (f)). Picarte's starts working much better than Newton's after that level of precision, that is, when k ≥ n.

Fig. 3. Picarte versus GMP. Graphs show the performance of Picarte versus GMP's inverse for different sizes of b, ranging from n = 2^5 (graph (a)) to n = 2^16 (graph (f)). Each graph depicts the time in microseconds of computing 1/b for different precisions k. All graphs are in log/log scale.

2.3 Picarte's iteration and GMP division

GMP's current implementation uses base-case division, or divide-and-conquer division when the divisor is big enough. Newton's iteration is not implemented because it seems to surpass divide-and-conquer division only for very large numbers (GMP's documentation says: "Newton's method used for division is asymptotically O(M(N)) and should therefore be superior to divide and conquer, but it's believed this would only be for large to very large N").

As we saw, Picarte's iteration greatly improves on Newton's iteration. In our experiments, Picarte's iteration also surpasses GMP's division when the precision k needed is big enough (Fig. 3). Note that our implementation is not optimized in any way; it is just the theoretical formula of Picarte's iteration, naively implemented.

Figure 3 shows Picarte's iteration versus GMP division. We tested two different divisions for GMP: the integer division ⌊2^k/b⌋ and the floating-point division 1/b with k bits of precision, both yielding similar results. The graphs show the integer division. Figure 3-(a) shows that GMP's division performs better than Picarte's iteration for small b. This is due to the fact that GMP uses a different division algorithm in this range (single-limb division), which takes advantage directly of the arithmetic provided by the hardware. But for large b, Picarte's iteration behaves better than GMP's division for big k. See Fig. 4 for the threshold of the compromise between n and k.

Fig. 4. Threshold where Picarte surpasses GMP. Picarte's iteration is faster than GMP's division for each pair (n, k) above the threshold line. If the pair (n, k) is below the line, GMP's division is faster. Built for n ∈ {64, 96, 128, ..., 1572864}.
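The GMP side of this comparison (the integer division ⌊2^k/b⌋) can be reproduced roughly as follows. This is our own sketch of the measurement, not the authors' benchmark harness; the 4096-bit divisor and the precision k = 2^20 chosen in main are arbitrary.

    /* Baseline: compute floor(2^k / b) directly with GMP's mpz_tdiv_q and
       time it with clock().  Divisor and precision below are arbitrary. */
    #include <gmp.h>
    #include <stdio.h>
    #include <time.h>

    static double gmp_inverse_seconds(mpz_t x, unsigned long k, const mpz_t b) {
        mpz_t pow2;
        mpz_init(pow2);
        mpz_setbit(pow2, k);              /* pow2 = 2^k        */
        clock_t t0 = clock();
        mpz_tdiv_q(x, pow2, b);           /* x = floor(2^k/b)  */
        double s = (double)(clock() - t0) / CLOCKS_PER_SEC;
        mpz_clear(pow2);
        return s;
    }

    int main(void) {
        mpz_t b, x;
        mpz_inits(b, x, NULL);
        mpz_setbit(b, 4096);              /* b = 2^4096 + 3, an arbitrary big divisor */
        mpz_add_ui(b, b, 3);
        printf("GMP division: %.4f s\n", gmp_inverse_seconds(x, 1UL << 20, b));
        mpz_clears(b, x, NULL);
        return 0;
    }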

3 Conclusions

We presented an algorithm to find the inverse of an n-bit integer with k bits of precision, for k ≥ n. Our algorithm improves the asymptotic time bound, in this range, of all previously known algorithms for computing the inverse of integers; in fact the closest one is Algorithm R in [4], as discussed in the introduction. From here it also follows an algorithm for dividing two n-bit integers with k ≥ n bits of precision whose running time is O(M(k, n)). We illustrated these theoretical results with a simple implementation that outperforms the current best implemented algorithms for the inverse, for large n and k.

References

1. C. Burnikel, J. Ziegler. Fast Recursive Division. Max-Planck-Institut für Informatik, Research Report MPI-I-98-1-022. http://www.mpi-sb.mpg.de/~ziegler/techrep.ps.gz
2. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press and McGraw-Hill, second edition, 2001.
3. G. H. Hardy, E. M. Wright. An Introduction to the Theory of Numbers. Third Ed., Oxford, 1953.
4. D. Knuth. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms. Third Ed., Addison-Wesley, 1999.
5. R. Picarte. La division réduite à une addition. Mallet-Bachelier, Paris, 1861.
6. A. Schönhage, V. Strassen. Schnelle Multiplikation großer Zahlen. Computing 7, 1971, 281-292.
7. J. von zur Gathen. Computer Algebra. Cambridge Univ. Press, 1999.
8. GMP, http://www.gmplib.org