The Accelerated Euclidean Algorithm

Sidi Mohamed Sedjelmaci

To cite this version: Sidi Mohamed Sedjelmaci. The Accelerated Euclidean Algorithm. Laureano Gonzales-Vega and Thomas Recio, Eds., 2004, University of Cantabria, Santander, Spain, pp. 283-287, 2004. <hal-00003459>

HAL Id: hal-00003459
https://hal.archives-ouvertes.fr/hal-00003459
Submitted on 2 Dec 2004

Abstract

We propose a new GCD algorithm called the Accelerated Euclidean Algorithm, or AEA for short, which matches the $O(n \log^2 n \log\log n)$ time complexity of Schönhage's algorithm for $n$-bit inputs. This new GCD algorithm is designed for both integers and polynomials. We focus our study on the integer case only; the polynomial case is currently being addressed [3].

1 Introduction

The algorithm is based on a half-gcd like procedure but, unlike Schönhage's algorithm, it is iterative and therefore avoids the penalizing repeated calls of recursive procedures. The new half-gcd procedure reduces the size of the integers by at least half a memory word per iteration, using single-precision computations only. By a dynamic updating process, we obtain the same recurrence and the same time performance as in the Schönhage approach.

Throughout, the following notation is used. $W$ is the size of a memory word, i.e. $W = 16$, $32$ or $64$. Let $u \ge v > 2$ be positive integers, where $u$ has $n$ bits with $n \ge 32$. Given a non-negative integer $x \in \mathbb{N}$, $l(x)$ denotes the number of its significant bits, not counting leading zeros, i.e. $l(x) = \lfloor \log_2 x \rfloor + 1$.

For the sake of readability, the integers $U$ and $V$ are represented as a concatenation of $l$ blocks of $W$-bit integers, except for the last block which may be shorter, with $l = \lceil n/W \rceil$. That is, if

$$U = \sum_{i=0}^{l-1} 2^{iW}\, U_{l-i} \ \text{ with } U_1 \ne 0 \qquad\text{and}\qquad V = \sum_{i=0}^{l-1} 2^{iW}\, V_{l-i},$$

then we write symbolically $U = U_1 \bullet U_2 \bullet \cdots \bullet U_l$ and $V = V_1 \bullet V_2 \bullet \cdots \bullet V_l$.

We use specific vectors which represent interval subsets of $U$ and $V$:

$$X[i..j] = \begin{pmatrix} U_i \bullet U_{i+1} \bullet \cdots \bullet U_j \\ V_i \bullet V_{i+1} \bullet \cdots \bullet V_j \end{pmatrix}, \qquad X[i] = \begin{pmatrix} U_i \\ V_i \end{pmatrix}, \qquad 1 \le i < j \le l.$$

Let $M(x)$ be the cost of a multiplication of two $x$-bit integers. The function $M(x)$ depends on the algorithm used to carry out the multiplications. The fast Schönhage-Strassen algorithm [6] performs these multiplications in $M(x) = O(x \log x \log\log x)$.

2 AEA: The Accelerated Euclidean Algorithm

The following algorithm is based on two main ideas. First, the computations are done in a Most Significant digit First (MSF) way. Second, we update, by multiplying with the current matrix, ONLY twice the number of leading bits we have already chopped, NOT the other leading bits.
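The block notation of Section 1 can be made concrete with a short Python sketch. The function names `l`, `split_words` and `join_words` are illustrative choices, not names from the paper; the sample values with $W = 20$ are the ones used in the worked example of Section 3.

```python
def l(x):
    """Number of significant bits of x, i.e. floor(log2 x) + 1 for x > 0."""
    return x.bit_length()


def split_words(x, W):
    """Return [X_1, ..., X_l], most significant block first,
    so that x = sum(2**(i*W) * X_{l-i} for i in 0..l-1)."""
    count = max(1, -(-l(x) // W))          # l = ceil(n / W)
    mask = (1 << W) - 1
    return [(x >> (W * (count - 1 - k))) & mask for k in range(count)]


def join_words(words, W):
    """Inverse of split_words: X_1 . X_2 . ... . X_l read as one integer."""
    value = 0
    for w in words:
        value = (value << W) + w
    return value


W = 20
U_words = split_words(922375420941, W)     # [879645, 785421]
V_words = split_words(707599307587, W)     # [674819, 299843]
print(U_words, V_words, l(U_words[0]))     # leading block has 20 bits
```

Keeping the most significant block at index 0 mirrors the MSF order in which the algorithm consumes the operands.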

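Algorithm AEA
Input: $u \ge v > 2$ with $u \ge 8W$; $n = l(u)$.
Output: a $2 \times 2$ matrix $M$ and $(U', V')$ such that $M \binom{U}{V} = \binom{U'}{V'}$ and $l(V') \le l(U) - 2^{\lfloor \log_2 (n/W) \rfloor - 1}\, W$.
Begin
    $(U, V) \leftarrow (u, v)$; $s \leftarrow \lfloor \log_2 (n/W) \rfloor$;
    if $l(V) \le l(U)/2 + 1$ return $I$;
    if $l(V) > l(U)/2 + 1$
        For $i = 1$ to $2^{s-1}$
            if ($U_i \ne 0$ or $V_i \ne 0$)    (regular case)
                $h \leftarrow 0$;
                if ($i$ odd) $L_0 \leftarrow \mathrm{ILE}(X[i..i+1])$; update_L($i$, $h$);
                else
                    $R_0 \leftarrow \mathrm{ILE}(X[i..i+1])$; update_R($i$, $h$);
                    $x \leftarrow i/2$; $h \leftarrow h + 1$;
                    While ($x$ even)
                        $R_h \leftarrow R_{h-1} L_{h-1}$; update_R($i$, $h$);
                        $x \leftarrow x/2$; $h \leftarrow h + 1$;
                    Endwhile
                    $L_h \leftarrow R_{h-1} L_{h-1}$; update_L($i$, $h$);
                Endelse
            else Irregular($i$)    ($U_i = 0$ and $V_i = 0$);
        EndFor
    Return $L_h$ and $(U, V)$;
End

The algorithm ILE, borrowed from [7], runs the extended Euclidean algorithm and stops when the remainder has roughly half the size of the inputs.

Algorithm ILE
Input: $u_0 \ge u_1 \ge 0$.
Output: a $2 \times 2$ matrix $M$ and $(u_{i-1}, u_i)$ such that $M \binom{u_0}{u_1} = \binom{u_{i-1}}{u_i}$ and $l(u_i) \approx \tfrac{1}{2}\, l(u_0)$.
Begin
    $n \leftarrow l(u_0)$; $p \leftarrow l(u_1)$;
    if $p < n/2 + 1$ return $M = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$;
    if $p \ge n/2 + 1$
        $m = p - \lceil n/2 \rceil - 1$;
        Apply the Extended Euclid Algorithm until $|a_i| \le 2^m < |a_{i+1}|$;
        return $M = \begin{pmatrix} a_{i-1} & b_{i-1} \\ a_i & b_i \end{pmatrix}$ and $(u_{i-1}, u_i)$;
End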
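The stopping rule of ILE can be illustrated with a minimal Python sketch. This is not the implementation of [7]: the function name `ile`, the optional argument `m`, and the use of a ceiling in the default $m = p - \lceil n/2 \rceil - 1$ are assumptions made here for illustration. The sketch tracks the cofactor rows $(a_i, b_i)$ with $a_i u_0 + b_i u_1 = u_i$ and stops as soon as the next $|a_{i+1}|$ would exceed $2^m$.

```python
def ile(u0, u1, m=None):
    """Sketch of ILE: extended Euclid stopped when |a_i| <= 2**m < |a_{i+1}|.

    Returns (M, (u_prev, u_cur)) with M = ((a_{i-1}, b_{i-1}), (a_i, b_i)).
    The parameter m may be forced by the caller (hypothetical convenience).
    """
    n, p = u0.bit_length(), u1.bit_length()
    if 2 * p < n + 2:                      # p < n/2 + 1: nothing to do
        return ((1, 0), (0, 1)), (u0, u1)
    if m is None:
        m = p - (n + 1) // 2 - 1           # assumes ceil(n/2)
    bound = 1 << m

    a0, b0, r0 = 1, 0, u0                  # row i-1
    a1, b1, r1 = 0, 1, u1                  # row i
    while r1 != 0:
        q = r0 // r1
        a2, b2, r2 = a0 - q * a1, b0 - q * b1, r0 - q * r1
        if abs(a2) > bound:                # next cofactor too large: stop
            break
        a0, b0, r0, a1, b1, r1 = a1, b1, r1, a2, b2, r2
    return ((a0, b0), (a1, b1)), (r0, r1)


# Leading 20-bit blocks of the operands used in Section 3 (m = 9 by default).
print(ile(879645, 674819))
# (((369, -481), (-425, 554)), (1066, 601))
```

Passing `m` explicitly makes it possible to reproduce a step whose bound is given in the text rather than derived from the formula, for example `ile(1263377882, 462503273, m=8)`.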

The functions update_L and update_R do not update all the bits of the operands, but only the next useful small vectors, in order to get the next needed matrix. The irregular case is when $U_i = 0$ and $V_i = 0$, i.e. one or many components are all equal to zero. Roughly speaking, we perform a Euclidean division in order to make an efficient reduction and continue our process. The aim is to preserve, as far as possible, the general scheme of our process. The basic idea is to use fully updated vectors, i.e. vectors updated with all the previous matrices $L_h$.

3 An Example

In order to carry out single-precision computations we take $W = 20$. Recall that, for integers $A$, $B$, $a$, $b$, $c$ and $d$, a sum of the form $\binom{A}{B} 2^W + \binom{a \bullet c}{b \bullet d}$ is to be read with $\binom{a \bullet c}{b \bullet d} = \binom{a}{b} 2^W + \binom{c}{d}$ (cf. the notation in Section 1). Let $u = 922375420941$ and $v = 707599307587$; then, with our notation, we obtain

$$\binom{U}{V} = \binom{u}{v} = \binom{879645}{674819} 2^{20} + \binom{785421}{299843} = \binom{879645 \bullet 785421}{674819 \bullet 299843}.$$

We have $U_1 = 879645$ and $V_1 = 674819$, so $l(U_1) = l(V_1) = 20$, $n = p = 20$ and $m = p - 1 - n/2 = 9$. Thus, in order to compute $\mathrm{ILE}(U_1, V_1)$, we must stop the Extended Euclid Algorithm at $|a_i| \le 2^9$. We obtain the matrix (with $|a_i| < 2^9 = 512$)

$$N_1 = \mathrm{ILE}(U_1, V_1) = \begin{pmatrix} 369 & -481 \\ -425 & 554 \end{pmatrix},$$

then

$$N_1 \binom{U}{V} = N_1 \left[ \binom{U_1}{V_1} 2^{20} + \binom{U_2}{V_2} \right] = N_1 \binom{U_1}{V_1} 2^{20} + N_1 \binom{U_2}{V_2}.$$

So

$$N_1 \binom{U_1}{V_1} 2^{20} + N_1 \binom{U_2}{V_2} = \binom{1066}{601} 2^{20} + \binom{138 \cdot 2^{20} + 892378}{-160 \cdot 2^{20} + 81257} = \binom{1204}{441} 2^{20} + \binom{892378}{81257},$$

hence

$$\binom{U_1 \bullet U_2}{V_1 \bullet V_2} = \binom{1204 \bullet 892378}{441 \bullet 81257}.$$

Now we can apply ILE to the new integers, of fewer than 32 bits, $U = 1263377882$ and $V = 462503273$. We have $l(U) = 31$ and $l(V) = 29$. We obtain $m = 8$ and the second matrix, in 32-bit single precision,

$$N_2 = \mathrm{ILE}(U, V) = \begin{pmatrix} -41 & 112 \\ 231 & -631 \end{pmatrix},$$

and

$$M = N_2 N_1 = \begin{pmatrix} -41 & 112 \\ 231 & -631 \end{pmatrix} \begin{pmatrix} 369 & -481 \\ -425 & 554 \end{pmatrix} = \begin{pmatrix} -62729 & 81769 \\ 353414 & -460685 \end{pmatrix}.$$

Moreover, at this level, we may consider that $U_1$ and $V_1$ are eliminated (chopped), even if $U_2$ has some extra bits, and that $U_2$ and $V_2$ are updated as follows:

$$\binom{U_2}{V_2} \leftarrow \binom{1873414}{725479} = \binom{1 \bullet 824838}{0 \bullet 725479}; \qquad l(U_2) = 21; \quad l(V_2) = 20.$$

On the other hand, it is easy to check that the final matrix $M = \begin{pmatrix} a_{t-1} & b_{t-1} \\ a_t & b_t \end{pmatrix}$ satisfies

$$M \binom{U}{V} = \begin{pmatrix} -62729 & 81769 \\ 353414 & -460685 \end{pmatrix} \binom{922375420941}{707599307587} = \binom{1873414}{725479}.$$

Now, if $u$ and $v$ were larger in size, namely

$$u = 879645 \bullet 785421 \bullet u_3 \bullet u_4 \bullet \cdots \bullet u_k, \qquad v = 674819 \bullet 299843 \bullet v_3 \bullet v_4 \bullet \cdots \bullet v_k,$$

then we would have to continue the half-gcd process. Since $W$ bits have already been chopped, we have to update only the double, i.e. multiply the next $2W$ leading bits of $U$ and $V$ by $M$, namely perform

$$\binom{U_3 \bullet U_4}{V_3 \bullet V_4} \leftarrow M \binom{U_3 \bullet U_4}{V_3 \bullet V_4},$$

and disregard all the other bits of $U$ and $V$. Then we do the same process as before with $\binom{U_2 \bullet U_3}{V_2 \bullet V_3}$ instead of $\binom{U_1 \bullet U_2}{V_1 \bullet V_2}$, to chop the vector $\binom{U_2}{V_2}$, and so on, repeating the process until we reach the middle of the size of $U$.

4 An Example

Let us consider two consecutive Fibonacci numbers $(u, v) = (F_{59}, F_{58})$, written here in decimal blocks, i.e.

$$\binom{F_{59}}{F_{58}} = \binom{956722026041}{591286729879} = \binom{956722}{591286} 10^{6} + \binom{026041}{729879} = \binom{956722 \bullet 026041}{591286 \bullet 729879}.$$

First, we must compute $m_{\max}$, which gives $2^{m_{\max}}$, the maximum size of the output matrix $L_1$:

$$m_{\max} = l(v) - l(u)/2 - 1 = 19.$$

Let $(U_1, V_1) = (956722, 591286)$; then $l(U_1) = l(V_1) = 20$, $n = p = 20$ and $m = p - 1 - n/2 = 9$. We run the Extended Euclid algorithm and stop at $|a_i| \le 2^9$. We obtain the matrix $L_0$ and the remainders $(r_1, r_2)$:

$$L_0 = \mathrm{ILE}(U_1, V_1) = \begin{pmatrix} -144 & 233 \\ 233 & -377 \end{pmatrix} \qquad\text{and}\qquad (r_1, r_2) = (1670, 1404).$$

Hence

$$L_0 \binom{U}{V} = L_0 \left[ \binom{U_1}{V_1} 10^{6} + \binom{U_2}{V_2} \right] = L_0 \binom{U_1}{V_1} 10^{6} + L_0 \binom{U_2}{V_2},$$

so by update(1,0) we compute

$$L_0 \binom{U_1}{V_1} 10^{6} + L_0 \binom{U_2}{V_2} = \binom{1670}{1404} 10^{6} + \binom{166311903}{-269096830} = \binom{1836}{1134} 10^{6} + \binom{311903}{903170},$$

where FIX occurs in the last equality, hence the new values

$$\binom{U}{V} = \binom{1836 \bullet 311903}{1134 \bullet 903170} = \binom{1836311903}{1134903170}.$$

We apply ILE to the new integers $U = 1836311$ and $V = 1134903$. We have $l(U) = 21$ and $l(V) = 21$. Repeating the same process as before, we obtain $m = 9$, the second matrix $R_0$ and the remainders $(r_1, r_2)$:

$$R_0 = \mathrm{ILE}(U, V) = \begin{pmatrix} 233 & -377 \\ -377 & 610 \end{pmatrix} \qquad\text{and}\qquad (r_1, r_2) = (2032, 1583).$$

Since $L_1 = R_0 L_0$, and using update(2,0), we obtain

$$L_1 \binom{U}{V} = R_0 \binom{1836311}{1134903} 10^{3} + R_0 \binom{903}{170} = \binom{2032}{1583} 10^{3} + \binom{146309}{-236731} = \binom{2178309}{1346269} = \binom{F_{32}}{F_{31}}.$$

Thus

$$L_1 = R_0 L_0 = \begin{pmatrix} 233 & -377 \\ -377 & 610 \end{pmatrix} \begin{pmatrix} -144 & 233 \\ 233 & -377 \end{pmatrix} = \begin{pmatrix} -121393 & 196418 \\ 196418 & -317811 \end{pmatrix}.$$

We can apply one more Euclid step because the matrix $Q L_1$ still satisfies $|a_i| \le 2^{m_{\max}} = 2^{19} = 524288$. So after one Euclid step we finally obtain

$$Q L_1 \binom{U}{V} = \binom{1346269}{832040} = \binom{F_{31}}{F_{30}},$$

and after $L_1 \leftarrow Q L_1$ we have

$$L_1 = \begin{pmatrix} 196418 & -317811 \\ -317811 & 514229 \end{pmatrix}.$$

We stress that the main difference between our approach and the other Sorenson-like algorithms lies in the update step: in those algorithms, all the bits of $U$ and $V$ are updated by multiplying them with the matrix $M$, whereas in our approach we only update the double of what we have already chopped.

5 Remarks

Unlike the recursive versions of GCD algorithms [1], [4], our aim is not to balance the computations on each leaf of the binary tree of computations, but to make full use of single precision every time, thereby computing the maximum number of quotients in single precision. This leads to different computations at each step in the algorithm AEA and in the other recursive GCD algorithms. Moreover, the fundamental difference between AEA and Schönhage's approach is that AEA deals directly with the most significant leading bits first (MSF computation). Consequently, we can stop the algorithm AEA at any time and still obtain the leading bits of the result. Thus AEA is a strong MSF algorithm and can be considered for on-line arithmetic, where all the basic operations can be carried out simultaneously as soon as the needed bits are available. This new algorithm should be an alternative to the Schönhage GCD algorithm.

On the other hand, the derived GCD algorithm deals with many applications where long Euclidean divisions are needed. We have identified and started to study many applications such as subresultants and Cauchy index computation, Padé approximants or LLL algorithms.

6 References

[1] J. von zur Gathen, J. Gerhard. Modern Computer Algebra. Cambridge University Press, 1999.
[2] D. H. Lehmer. Euclid's algorithm for large numbers. American Math. Monthly, 45, 1938, 227-233.
[3] M. F. Roy and S. M. Sedjelmaci. The Polynomial Accelerated Euclidean Algorithm, Subresultants and Cauchy index. Work in progress, 2004.
[4] A. Schönhage. Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informatica, 1, 1971, 139-144.
[5] A. Schönhage, A. F. W. Grotefeld and E. Vetter. Fast Algorithms, A Multitape Turing Machine Implementation. BI-Wissenschaftsverlag, Mannheim, Leipzig, Zürich, 1994.

[6] A. Schönhage, V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7, 1971, 281-292.
[7] S. M. Sedjelmaci. On a Parallel Lehmer-Euclid GCD Algorithm. In Proc. of the International Symposium on Symbolic and Algebraic Computation (ISSAC 2001), 2001, 303-308.

Sidi Mohamed Sedjelmaci
LIPN CNRS UMR 7030, Université Paris-Nord
Av. J.-B. Clément, 93430 Villetaneuse, France
E-mail: sms@lipn.univ-paris13.fr
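As a sanity check on the worked example of Section 3, the following Python sketch recomputes the block-wise application of $N_1$, the product $M = N_2 N_1$ and the final reduced pair with exact integer arithmetic. The helper names `apply` and `matmul` are illustrative and do not come from the paper; the signs of the matrix entries are the ones for which the remainders stated in the example come out positive.

```python
W = 20
u, v = 922375420941, 707599307587


def apply(M, x, y):
    """Multiply the 2x2 matrix M by the column vector (x, y)."""
    (a, b), (c, d) = M
    return a * x + b * y, c * x + d * y


def matmul(A, B):
    """Product A * B of two 2x2 integer matrices."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


N1 = ((369, -481), (-425, 554))
N2 = ((-41, 112), (231, -631))

# Split u and v into their two 20-bit blocks U1.U2 and V1.V2.
mask = (1 << W) - 1
U1, U2 = u >> W, u & mask
V1, V2 = v >> W, v & mask

# Apply N1 block by block, then recombine the (possibly overlapping) parts.
hi = apply(N1, U1, V1)                 # (1066, 601)
lo = apply(N1, U2, V2)                 # (145595866, -167690903)
U = (hi[0] << W) + lo[0]               # 1263377882
V = (hi[1] << W) + lo[1]               # 462503273

M = matmul(N2, N1)                     # ((-62729, 81769), (353414, -460685))
result = apply(M, u, v)                # (1873414, 725479)
print(U, V, M, result)
```

Applying `N2` to the intermediate pair `(U, V)` gives the same result as applying `M` to `(u, v)` directly, which is the identity the example relies on.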