The Middle Product Algorithm I
|
|
- Laurel Kelley
- 5 years ago
- Views:
Transcription
1 AAECC 4, (004) Digital Object Identifier (DOI) 0.007/s The Middle Product Algorithm I Speeding up the Division and Square Root of Power Series Guillaume Hanrot, Michel Quercia, Paul Zimmermann INRIA-Lorraine/LORIA, 65, rue du jardin botanique, 5460 Villers-lès-Nancy Cedex, France ( {hanrot, zimmerma}@loria.fr) 3 rue de Montchapet, 000 Dijon, France ( michel.quercia@prepas.org) Received: July 0, 000; revised version: November 4, 003 Published online: February 3, 004 Springer-Verlag 004 Abstract. We present new algorithms for the inverse, division, and square root of power series. The key trick is a new algorithm MiddleProduct or, for short, MP computing the n middle coefficients of a (n ) n full product in the same number of multiplications as a full n n product. This improves previous work of Brent, Mulders, Karp and Markstein, Burnikel and Ziegler. These results apply both to series and polynomials. Keywords: Middle product, Inversion, Division, Square root, Newton s method. Introduction One of the major tools for performing arithmetic or computing special functions is Newton s iteration. Indeed its quadratic rate of convergence (so that the number of correct digits or terms doubles at each step) and its self-correcting character make it especially suitable for this kind of computations. In full generality, Newton s rule for computing a root of a differentiable function f(x)can be written as x k+ = x k f(x k )/f (x k ). As a rule of thumb, the term x k should be seen as the main term, whereas f(x k )/f (x k ) should be seen as the correcting term. In particular, not all the digits of f(x k )/f (x k ) are significant, since the low weight ones will be corrected in the next iteration. Furthermore, since f(x k ) is supposed to tend to zero quickly, one can expect important cancellations in its evaluation. The present paper shows how to evaluate only the meaningful part of f(x k ), i.e., the middle digits, in some situations where the evaluation of f is mainly based on a multiplication. As a consequence, new algorithms for inversion, division and square root are derived. Another application of the middle product is the efficient computation of some transposed multiplications [].
2 46 G. Hanrot et al. We will make use of the following notations: M(n) denotes the complexity of the underlying multiplication algorithm on two polynomials of degree n; K(n) denotes the complexity of Karatsuba s algorithm, given by K() =, K(n) = K( n/ ) + K( n/ ); FFT (n) denotes the complexity of the Schönhage-Strassen multiplication algorithm [0]. The paper is organized as follows: describes the basic trick in the two most usual nontrivial multiplication models, namely Karatsuba and FFT. and 3 show how this trick can be used to compute the square and the inverse of a power series; 4 and 5 study the problem of computing the quotient and square root of power series. In 6, an implementation of these algorithms is presented, which confirms the complexity results in practice. We conclude the paper by a table summarizing the previous and new worstcase complexities under the Karatsuba and FFT models, for the middle product, inverse, quotient, square and square root operations, following work of Brent [], Mulders [9], Burnikel and Ziegler []. Finally, an appendix gives a formal description of how to transform any multiplication algorithm of a certain type into a middle product algorithm with the same complexity; this is linked to the so-called transposition principle. Newton s Inverse Iteration and the Basic Trick Newton s iteration for computing /A where A is a number, a polynomial, or a series follows the recurrence: x k+ = x k + x k ( Ax k ). () Suppose we are looking for an approximation of /A to precision n. In the last step of Newton s iteration, x k is then accurate to precision n/, and we have to compute Ax k to precision n, where the n/ most significant coefficients or bits of Ax k vanish with. To get an approximation x k+ to precision n of /A, we multiply x k by the n/ most significant coefficients of Ax k. The cost of this last step was 3M(n/) so far: M(n/) for the product Ax k which splits into two products of n/ coefficients, and M(n/) for the product of x k by Ax k. The total cost of Newton s iteration to compute /A to precision n is It may seem more natural to express the complexities of operations on power series in term of multiplications on power series (short products) instead of multiplications on polynomials (full products). For example, if M (n) denotes the complexity of a short product, Brent gives in [, Table 7.] upper bounds of 3M (n), 4M (n), and 5.5M (n) for the inverse, quotient, and square root respectively. However all known subquadratic algorithms for short products are based on full products, and in the FFT case, it is not even known whether a short product can be computed faster than a full product! This explains our choice of the full product as basic operation. By most significant, we mean the low order terms in the case of power series, and the leftmost digits in the usual notation for floating-point numbers.
3 The Middle Product Algorithm I 47 thus 3FFT (n) for FFT multiplication and 3 K(n) for Karatsuba multiplication [6]. As we know in advance that the upper n/ bits of Ax k will vanish with, a natural question is the following: is there a faster way to compute directly the n/ required bits from position n/ton of the product Ax k? We give in this section a positive answer under the Karatsuba and FFT models, and show this result applies to many multiplication algorithms. Note that other papers already investigated the computation of only part (usually the most significant bits) of the product of power series or floating-point numbers. See e.g. [8], [9].. Karatsuba Model for n a Power of We first describe our trick and the corresponding algorithm in the simple case when n is a power of, and when the underlying multiplication algorithm is the Karatsuba method. Basic trick. For the sake of clarity, we suppose A is a Taylor series in the variable t (at t = 0). Assume we have A = a 0 + a t + a t and x = x 0 + x t. We have Ax = a 0 x 0 + (a 0 x + a x 0 )t + (a x + a x 0 )t + a x t 3 and we want to compute (a 0 x + a x 0 )t + (a x + a x 0 )t. The algorithm works as follows (see Fig. ): Algorithm MiddleProduct. Input: x = x 0 + x t,a = a 0 + a t + a t Output: h = a 0 x + a x 0 and l = a x + a x 0. α (a 0 + a )x. β a (x x 0 ) 3. γ (a + a )x 0 4. h α β 5. l γ + β. Now we can use this idea recursively (note that here and in the sequel we use MP as a shortcut for MiddleProduct). This yields the following algorithm, described in the case when n is a power of. See. for the general case. Algorithm MP([x 0,...,x n ], [a 0,...,a n ]). 0. If n =, return [a 0 x 0 ]. p n/.. α MP([x p,...,x p ], [a 0 + a p,...,a p + a 3p ]) 3. β MP([x p x 0,...,x p x p ], [a p,...,a 3p ])
4 48 G. Hanrot et al. A 0 A A A 3 X 0 u v X u v u + v = α = MP(X, [A 0,A ] + [A,A ]) v u = β = MP(X X 0, [A,A ]) u + v = γ = MP(X 0, [A,A ] + [A,A 3 ]) Fig.. MiddleProduct recursion. 4. γ MP([x 0,...,x p ], [a p + a p,...,a 3p + a 4p ]) 5. Return [α β,γ + β]. Theorem When n is a power of two, Algorithm MP returns [c n,...,c n ] where c k = i+j=k a ix j, using exactly K(n) = n log 3 ring multiplications.. Karatsuba Model: The Odd Case In this section, we generalize the trick described above to deal with the case when n is not a power of. If R is the coefficient ring of our power series, and y := [y 0,...,y n ] R n, z := [z 0,...,z n ] R n, we define MP(y, z) = [ i+j=k y iz j ] n k n. Theorem Karatsuba s algorithm can be adapted so as to yield an algorithm computing MP(y, z) in K(n) multiplications. Proof. Consider the following algorithm: Algorithm MP-Karatsuba([y 0,...,y n ], [z 0,...,z n ]). 0. If n =, return [y 0 z 0 ].. n 0 n/, n n/.. α MP-Karatsuba([y n0,...,y n ], [z 0 +z n,...,z n +z 3n ]) 3. If n is even 3. β MP-Karatsuba([y n y 0,...,y n y n0 ], [z n,...,z 3n ]) else 3. β MP-Karatsuba([y n0,y n y 0,...,y n y n0 ], [z n,...,z 3n ])
5 The Middle Product Algorithm I γ MP-Karatsuba([y 0,...,y n0 ], [z n +z n,...,z n +n 0 + z n ]) 5. Return [α 0 β 0,...,α n β n,γ 0 + β 0,...,γ n0 + β n0 ]. It is easily checked that the number MP (n) of multiplications on an input of size n satisfies MP() =, MP (n) = MP( n/ ) + MP( n/ ), hence MP (n) = K(n). A straightforward calculation or using the general formulation developed in the appendix then allows one to prove that this algorithm indeed computes MP(y, z)..3 The FFT Model Theorem 3 Schönhage-Strassen s FFT algorithm can be adapted so as to yield an algorithm computing MP(y,z)inFFT (n) multiplications. Proof. We give the algorithm and refer to the general framework described in the appendix for a detailed proof. As for the complexity, the algorithm amounts to three FFTs of size n, which is exactly the same as what is done in a single FFT multiplication. Algorithm MP-FFT([y 0,...,y n ], [z 0,...,z n ]). 0. ω primitive (n)-th root of unity. α FFT(ω, [y n,...,y 0, 0,...,0]) }{{} n. β FFT(ω, [z 0,...,z n, 0]) 3. γ FFT(ω, [α 0 β 0,...,α n β n ]) 4. Return [γ 0 /n,...,γ n /n]. FFT(a,v) computes [ n i=0 ami v i ] 0 m n, where a is a root of unity of order n. At step 3, γ = [c n,...,c n,c n,c n + c 0,...,c 3n + c n ], where c k = i+j=k y iz j..4 General Situation The two algorithms described in that section can be generalized: to any multiplication algorithm of a certain quite general kind, we can associate a middle product algorithm of the same complexity. (See [5, Def. 8] for the definition of a bilinear algorithm.) Theorem 4 From any bilinear (n, n) multiplication algorithm, one can derive an algorithm computing a (n, n ) middle product with exactly the same number of ring multiplications.
6 40 G. Hanrot et al. Since the proof of that theorem is quite technical, and not required in the rest of the paper, we postpone it in an appendix. In the following sections, we consider an algorithm MP(y, z) taking y with n terms, z with n terms, and returning their middle product, with the same complexity K(n) or FFT (n) depending on the computation model than multiplication. We then use this algorithm to build new algorithms for basic operations on power series or polynomials: square, inverse, quotient, square root. Sometimes to simplify the algorithms we allow z to have more terms; in such a case we consider only the first n terms from z, and we write MP(y, z, n) to avoid confusion. We also write mul(a, b) for the full product of two series or polynomials of n terms (giving n terms), mul high(a, b) for the (short) product of two series of n terms, giving the n most significant terms, of order 0 to n, and mul low(a, b) for the short product giving the n least significant terms, of order n to n. Squaring Power Series In this section, we show how to use the MP algorithm to compute the square of a power series, or equivalently the upper (or lower) half part from the square of a polynomial. Theorem 5 Using the MP algorithm, one can compute the square of a power series of order n in R(n) K(n)+ ring multiplications under the Karatsuba model. Proof. We use the following algorithm. Let A = n i=0 a it i = A 0 + t p A,we have A = A 0 + A 0A t p mod t n, plus an extra term ap t p when n is odd. At step3wehaveα = A mod t n p, and at step 4 (where the second argument of MP should read [a ] for n = 3) β gives the coefficients of degree n p to n ofa 0 + A 0A t p, thus α + t n p β, plus ap t p when n is odd, gives A mod t n. Algorithm MP-square([a 0,...,a n ]).. If n =, return [a0 ].. p n/ 3. α MP-square([a 0,...,a n p ]) 4. β MP([a 0,...,a p ], [a n p+,...,a p, a p,...,a n ]) 5. If n mod = then β p β p + ap 6. Return [α 0,...,α n p,β 0,...,β p ]. The number R(n) of ring multiplications satisfies the recurrence R(n) = R( n/ )+M( n/ )+(n mod ), with R() =. In the Karatsuba model, assume that R(k) K(k)+ for k<n.ifn = k,wehaver(n) = R(k)+K(k) 3K(k)+ K(n)+. If n = k +, then R(n) = R(k + ) + K(k) +
7 The Middle Product Algorithm I 4 K(k+)+K(k)+3 K(k+)+K(k)+ K(n)+ K(k + ). 3, using the fact that K(k) + 3 Power Series Inversion In this section, we explain how the MP algorithm can be applied, when plugged into Newton s iteration, to compute inverses. Theorem 6 Using the MP algorithm, one can inverse a power series of order n in I (n) = K(n) ring multiplications under Karatsuba s model, and I (n) FFT (n) using FFT. Remark. The analysis of all the FFT variants from now on is biased by the fact that exact FFT complexity (i.e., exact formulas for FFT (n)) highly depends on the implementation. We shall thus content ourselves with reasonable estimates, assuming mainly that FFT (n) FFT (n) and FFT (n + ) FFT (n). Proof. We describe the algorithm, which simply rests on Newton s iteration, with the usual product replaced by an MP call. Algorithm MP-inv([a 0,...,a n ]). 0. If n =, return [/a 0 ].. p n/.. α MP-inv([a 0,...,a n p ]) 3. β MP([α 0,...,α n p ], [a,...,a n, 0],n p) 4. γ mul high([α 0,...,α p ], [β 0,...,β p ]) 5. Return [α 0,...,α n p, γ 0,..., γ p ]. The algorithm performs just one division. The number I (n) of ring multiplications is given by I() = 0, I (n) = I( n/ )+M( n/ )+M( n/ ). Under Karatsuba s framework, we have I (n) = K(n), since I (n) + satisfies the same recurrence as K(n). In the FFT case, one gets I (n) FFT (n). Remark. The mul high call at step 4 of MP-inv has a better complexity than K(n) for most values of n if Mulders algorithm is used, which results in a better overall complexity for MP-inv. See [9] for more details. 3 It can be shown that K(n + ) K(n) = z(n)+ where z(n) is the number of zeroes in the binary expansion of n.
8 4 G. Hanrot et al. 4 Power Series Division 4. Newton s Method From our inversion algorithm, we can mechanically deduce a division algorithm as follows: Theorem 7 The quotient of two order n power series Mulders short division can be computed in 4 3 K(n) under the Karatsuba model, and 5 FFT (n) under the FFT model. As a consequence, the division with remainder of a polynomial of degree n by a polynomial of degree n can be performed in 7 FFT (n). Proof. We use Karp and Markstein s trick [7] to incorporate the dividend in the last Newton s iteration. The corresponding algorithm is as follows: Algorithm MP-div-KM([b 0,...,b n ], [a 0,...,a n ]). 0. If n =, return [b 0 /a 0 ].. p n/.. α MP-inv([a 0,...,a n p ]); 3. β mul high(α, [b 0,...,b n p ]); 4. γ MP([β 0,...,β n p ], [a,...,a n, 0],n p); 5. δ mul([α 0,...,α p ], [b n p γ 0,...,b n γ p ]); 6. Return [β 0,...,β n p,δ 0,...,δ p ]. The total cost is equivalent to that of one inversion plus one short multiplication of size n/, i.e., I (n)+m(n/),or 4 3 K(n) with Karatsuba, and 5 FFT (n) using the FFT. Remark. In the above algorithm we consider that the mul high call at step 3 costs M(n p) multiplications. Actually it may be less expensive when one uses Mulders short product under the Karatsuba model. Nevertheless, the worst-case complexity 4 K(n) already improves on the previous best-known 3 algorithm of average complexity.397k(n) from Mulders [9, Table 6]. However, see also Direct Division Using MP Algorithm MP may also be used directly i.e., without Newton s iteration to compute the quotient of polynomials or power series. This new algorithm is described and analyzed in this section. Assume that we are dividing B by A, and split B as (B 0,B ), and A as (A 0,A ). Let the quotient be Q = (Q 0,Q ). The algorithm is illustrated by Figure.
9 The Middle Product Algorithm I 43 A 0 A Q 0 Q B r = 0 MP(Q 0,A) B r Fig.. Algorithm MP-divide. Namely, the upper part of the quotient Q 0 is obtained by dividing B 0 by A 0. The lower part is then obtained by dividing B minus the middle terms of the product AQ 0 hence MP(Q 0,A) bya 0. This gives more formally the following algorithm: Algorithm MP-divide ([b 0,...,b n ], [a 0,...,a n ]). 0. If n =, return [b 0 /a 0 ].. p n/.. α MP-divide([b 0,...,b n p ], [a 0,...,a n p ]). 3. r MP(α, [a,...,a n, 0],n p). 4. β MP-divide([b n p r 0,...,b n r p ], [a 0,...,a p ]). 5. Return [α, β]. Theorem 8 Algorithm MP-divide with inputs of n terms correctly computes the first n terms of their quotient, using D(n) = K(n) n ring multiplications under Karatsuba model. Proof. The correctness follows easily from the explanations of the picture above. Let D(n) be the number of multiplications in the base field, D d (n) be the number of divisions. Then one has D d (n) = D d ( n/ ) + D d ( n/ ), and D d () =, hence D d (n) = n. As for D(n),wehaveD(n) = D( n/ ) + D( n/ ) + M( n/ ), D() = 0. Under Karatsuba model, we get D(n) = K(n) n by a simple induction, since the recurrence relations are closely related to those that one obtains for Karatsuba method. In the case of the FFT, we get for n large enough the reasonable estimate D(n) FFT (n) log (n). As a consequence, Newton s method should be preferred in that case. 4.3 Division With Remainder Consider two series A and B of n terms. Since Algorithm MP-divide computes their quotient Q in K(n) operations, and the remainder can be ob-
10 44 G. Hanrot et al. tained from B AQ in K(n) additional operations, this yields a division with remainder in K(n) operations, which equals the best known complexity in the Karatsuba model []. (Considering series at infinity instead of zero, all results transpose to polynomials.) We show in this section how to compute the remainder, i.e., the low part from AQ,in 3 K(n) operations, which gives a division with remainder in 5 3 K(n) operations. In the following algorithm we assume that the high part S of AQ is known (which is the case in division, since this is just the high part of the dividend B). Algorithm ShortRem([a 0,...,a n ], [q 0,...,q n ], [s 0,...,s n ]). 0. p n/.. T mul low([a p,...,a n ], [q p,...,q n ]).. If n is even.. U mul([a 0 + a p,...,a p + a n ], [q 0 + q p,...,q p + q n ]);. Return [u 0 +u p t 0 s 0 s p,...,u p +u p t p s p s n,u p s p s n,t 0,...,t p ] else.3 U mul([a 0 +a p,...,a p +a n,a p ], [q 0 +q p,...,q p + q n,q p ]);.4 Return [u p t 0 s p,u 0 +u p t s 0 s p,...,u p 4 +u p 4 t p 3 s p 4 s p 4,u p 3 +u p 3 s p 3 s p 3,u p +u p s p s n,t 0,...,t p 3 ]. Theorem 9 Algorithm ShortRem correctly computes the low half of AQ, and uses K(n) operations under the Karatsuba model, when A and Q have 3 n terms. Proof. The idea is that the remainder comes from the computation AQ, which Karatsuba decomposes as three multiplications A 0 Q 0, A Q and (A 0 +A )(Q 0 + Q ). But in the present case, the upper half is already known, and we can recover A 0 Q 0 from it and the two other products. More precisely, put A = A 0 + t p A, Q = Q 0 + t p Q, and write A Q = T 0 + t n p T, A 0 Q 0 = Z 0 + t p Z, (A 0 + A )(Q 0 + Q ) = U 0 + t n p U, S = S 0 + t n p S. The low part of the remainder is simply the low part of A Q, i.e., T.By definition, S is the high part of A 0 Q 0 + t p (A 0 Q + A Q 0 ) + t p A Q. () If n is even, S 0 = Z 0 and S = Z + U 0 T 0 Z 0. This implies that Z = S + T 0 U 0 + S 0. The high part of the remainder is the high part of the low part of (), i.e., the low part of (A 0 Q + A Q 0 ) plus the high part of A Q, namely T 0 + U T Z = U 0 + U T S 0 S.
11 The Middle Product Algorithm I 45 If now n is odd, we have S 0 = Z 0, S = Z + t(u 0 T 0 Z 0 ), hence Z = S + ts 0 + tt 0 tu 0, and the high part of the remainder is U T Z + tt 0 = tu 0 + U T ts 0 S, and the result follows. The computation of T is a n n product, which costs K( n/ ); that of U = (A 0 + A ) (Q 0 + Q ) is a full n n product, which costs K( n/ ). Thus ShortRem costs K( n/ ) + K( n/ ) 3 K(n). Remark. Since we only need the sum of U 0 or tu 0 and U, we may replace the full product (A 0 + A ) (Q 0 + Q ) by a product mod t p, which would directly give the sum U 0 + U or tu 0 + U.Ifp is even or, better, divisible by a large power of, the evaluation of such a product can be improved by simple FFT-like techniques, using the fact that (x, y) (x + y)/ + t k (x y)/is an isomorphism from R[t]/(t k ) R[t]/(t k + ) onto R[t]/(t p ), when p = k, if the characteristic of the ring R is odd. Remark. When using FFT multiplication, Algorithm ShortRem is of little interest, since it replaces a multiplication of size n by two multiplications of size n/. However, the fact that the product (A 0 +A )(Q 0 +Q ) is needed only modulo t p (whereas FFT might compute it modulo t p ) can probably be used to speed up things. 5 Square Root of Power Series 5. Newton s Method From our division algorithm, we can derive a square root algorithm by the classical Newton s iteration x k+ = x k + A x k x k. Theorem 0 The square root of a power series of order n can be computed in S(n) K(n) ring multiplications under the Karatsuba model, S(n) FFT (n) under the FFT model. 7 Proof. The corresponding algorithm is as follows: Algorithm MP-sqrt([a 0,...,a n ]). 0. If n =, return [ a 0 ].. p n/.. α MP-sqrt([a 0,...,a n p ]); 3. β mul([α,...,α n p ], [α,...,α n p ]); 4. γ MP-divide([a n p β n p,...,a n β n 3 ], α); 5. Return [α 0,...,α n p,γ 0,...,γ p ].
12 46 G. Hanrot et al. where at step 4 it is assumed that β n p = 0ifn =, β n 3 = 0ifnis even, and α is truncated to p terms. The number S(n) of ring multiplications satisfies S() = 0, S(n) = S( n/ ) + D( n/ ) + M( n/ ). We get S(n) K(n) n under the Karatsuba model and S(n) 7 FFT (n) under the FFT model. Corollary The square root (resp. square root with remainder) of a power series can be computed in 3 4 K(n) (resp. 5 K(n)) multiplications under the 4 Karatsuba model, and 3FFT (n) (resp. 4FFT (n)) multiplications under the FFT model. Proof. If one replaces the full product from step 3 by a short square giving the n p high order terms of α (see ), one gets S(n) 3 K(n) under the 4 Karatsuba model. In practice, one just needs to replace the call to mul by a call to MP-square. For the FFT model, instead of dividing A xk by x k at step 4 of MP-sqrt, we multiply by y k where y k = y k + y k ( x k y k ) is an approximation of / A, computed together with x k. For the last step, this replaces one division of size n/ by one multiplication of size n/ the last y k is not needed and for the preceding steps this replaces one division of size n/ j by three multiplications of size n/ j for j, whence a total gain of M(n/) for D(n) 5 M(n). To get the complexities for the square root with remainder, just add K(n) under the Karatsuba model (using Algorithm MP-square) and FFT (n) under the FFT model. 5. A Variant We can also give a slightly more general algorithm.assume that A is given to precision n, and we have an approximation A = S +t p R to precision p. Write S = S + δt p, with δ of order n p. One has A = S + Rt p δt p (S δt p ). To get the right S, we just need that Rt p δt p (S δt p ) be 0 to precision n, hence δ = R/(S + δt p ) to precision n p. This identity is meaningful as soon as p>0, since the quotient is performed online : the term of degree k of δ only depends on the terms of δ of degree k p. However for our recursive algorithm to work, all the first half bits of (S + δt p ) are needed to compute the first half of δ. Hence we can give an in place formulation of MP-divide finding δ as soon as p >(n p)/, i.e., p > n/3. Algorithm MP-sqrt corresponds to the case p = n/, hence we can ignore the term δt p, and simply compute R/(S) using MP-divide. Precisely, MP-div-inplace(a,q,l,n) computes the quotient of [a 0,...,a n ]by[q 0,...,q n ] and puts the result in [q l,...,q l+n ]. Algorithm MP-div-inplace ([a 0,...,a n ], [q 0,...,q l+n ],l,n). 0. If n =, q l a 0 /q 0. Return.
13 The Middle Product Algorithm I 47. p n/.. MP-div-inplace(a,q,l,n p); 3. γ MP([q l,...,q l+n p ], [q,...,q n, 0]); 4. MP-div-inplace([a n p γ 0,...,a n γ p ],q,l+n p, p). Algorithm MP-sqrt-generic([a 0,...,a n ],p). 0. If n = return [ a 0 ].. α MP-sqrt-generic([a 0,...,a p ]);. β MP-square(α); 3. MP-div-inplace([a p β p,...,a p β 0,a p,...,a n ], α, p +,n p); 4. Return [α 0,...,α p, α 0,...,α n p ]. One can then try to optimize the choice of the splitting point p.a theoretical analysis in the Karatsuba model, taking p = βn, givesβ opt 0.4, with a cost 0.734K(n). Experimental observations yield the following rule: for n between k and k + k, choose p = k, except for n = k + where one should choose p = k + ; for n between k + k and k+, choose p = n k. Using those simple rules, we seem to get the optimal number of multiplications up to a 5% loss; furthermore those rules seem to be asymptotically optimal. The asymptotic behaviour for S(n)/K(n) is then lim sup S(n)/K(n) = 3/4(eg. for n = k where we use in fact the first algorithm) and lim inf S(n)/K(n) seems to be 7/8 (roughly for n up to 6 ), and obtained for n = k + k. On average we get 5 6 S(k) k= 5 K(k) Implementation Results Since the algorithms we propose offer only a gain on the constant factor, it seemed to us that a realistic (by opposition with complexity estimates) implementation was required. We describe it shortly in the present section, and give experimental results on several architectures. We choose to implement the algorithms on power series with coefficients in Z/pZ, with p the largest prime such that p fits into a machine word (p = 655 on 3-bit machines, p = on 64-bit machines). For small sizes of arguments, quadratic algorithms are used. The optimal threshold for each algorithm is determined by a tuning program. Experiments have been done for series from degree 6 to 000. We discuss the results algorithm by algorithm. 6. Middle Product The middle product algorithm has been compared with Karatsuba s algorithm. The thresholds with the quadratic version are very similar on all architectures,
14 48 G. Hanrot et al. though slightly smaller for the middle product. In practice, we recover the expected result MP (n) K(n). We sometimes get (ev6 and k7) a little better, roughly 0.9K(n)..4 MP K(n).4 MP K(n) Middle product on a k7-550mhz Middle product on a ev6-500mhz.4 MP K(n).4 MP K(n) Middle product on a sparcv9-96 MHz Middle product on an R0K-94 MHz 6. Short Square We have compared the MP-square algorithm, of theoretical complexity K(n) (see ), 4 with the following algorithm: Algorithm Kara-square([a 0,...,a n ]). 0. If n = return [a 0 ].. p n/ ; 4 A referee proposed the following Mulders-like trick to improve MP-square: take p = ( β)n instead of p = n/, which leads to the recurrence R(n) = R(βn) + K(( β)n) + R((β )n) under the Karatsuba model, with an optimal β 0.560, and a theoretical average cost of about 0.48K(n). The same idea also applies to Kara-square with p = βn, β opt 0.58, and a cost of about 0.485K(n).
15 The Middle Product Algorithm I 49. α mul([a 0,...,a n p ], [a 0,...,a n p ]); 3. β mul high([a 0,...,a n p ], [a n p,...,a n ]); 4. Return [α 0,...,α n p,α n p + β 0,...,α n + β p ]. The theoretical complexity of this algorithm is M( n ), i.e., K(n) with 3 Karatsuba multiplication. However, a square is usually less expensive to compute than a product, and costs K(n) under reasonable assumptions. With the 3 average gain due to Mulders algorithm, we get an algorithm of practical complexity about 0.5K(n). On the following diagrams, the upper curve is the time for execution of a full square using Karatsuba, the middle one is the time for computing a short square using Kara-square and the lower one the time for computing a short square using MP-square.. square MP-Square Kara-Square K(n)/. square MP-Square Kara-Square K(n)/ Short square on a k7-550mhz Short square on a ev6-500mhz. square MP-Square Kara-Square K(n)/. square MP-Square Kara-Square K(n)/ Short square on a sparcv9-96 MHz Short square on an R0K-94 MHz
16 430 G. Hanrot et al. 6.3 Division We have compared our MP-divide algorithm with Burnikel-Ziegler division [], in which we have used short products to compute the remainder associated to the high part of the quotient. We get the following diagrams (lower curve is MP-divide):.8 MP-Div Kara-Div K(n).8 MP-Div Kara-Div K(n) Division on a k7-550mhz Division on a ev6-500mhz.8 MP-Div Kara-Div K(n).8 MP-Div Kara-Div K(n) Division on a sparcv9-96 MHz Division on an R0K-94 MHz 6.4 Square Root We have compared our MP-Sqrt algorithm (lower curve) with Karatsuba square root [3], in which we have used the Kara-square algorithm. We get the following diagrams:
17 The Middle Product Algorithm I 43. Sqrt MP-Sqrt 3/4 K(n). Sqrt MP-Sqrt 3/4 K(n) Square root on a k7-550mhz Square root on a ev6-500mhz. Sqrt MP-Sqrt 3/4 K(n). Sqrt MP-Sqrt 3/4 K(n) Square root on a sparcv9-96 MHz Square root on an R0K-94 MHz 6.5 Synthesis The complexity results obtained in this paper, which might be subject to caution since they deal with constants, are reliable. Our experiments confirm that we really obtain those constants in practice. We even get slightly better since the middle product seems to perform slightly better than Karatsuba on a variety of machines. 7 Conclusion This paper presents new algorithms for basic operations on polynomials or power series. Those algorithms are based on a new algorithm, called middle product, that computes the n middle coefficients of a (n ) n product as efficiently as a full n n product. Table gives a detailed analysis of the
18 43 G. Hanrot et al. Table. Previous and new best-known complexities for several operations on power series and polynomials, under the Karatsuba and FFT models, within O(n) memory usage. Square Karatsuba model FFT model previous this paper previous this paper K(n) K(n) [Th. 5] M(n) M(n) Middle product K(n) K(n) [Th. ] M(n) M(n) [Th. 3] Inverse Quotient... with remainder K(n) [] Square root K(n) [3]... with remainder 3 K(n) [] K(n) [Th. 6] 3M(n) M(n) [Th. 6] 3 K(n) [] K(n) [Th. 8] 7 M(n) 5 M(n) [Th. 7] 5 K(n) [Th. 9] 9 M(n) 7 M(n) [Th. 7] 3 3 K(n) [Co. ] 7 M(n) 3M(n) [Co. ] 4 3 K(n) [3] 5 K(n) [Co. ] 9 M(n) 4M(n) [Co. ] 4 number of coefficient operations used by those algorithms, compared to previously known algorithms. 5 An implementation in Maple of these algorithms is available at the URL zimmerma/papers/mp.mpl. It is possible to obtain floating point versions of the middle product algorithm using only O() extra memory with respect to the polynomial version at each recursion step. This, and the applications to multiprecision floating point arithmetic, is a work in progress and will be presented in a forthcoming paper [3]. In [], Victor Shoup asks if a matrix/vector product of the following form can be reduced to a single multiplication of polynomials of degree less than n: v 0 v v n v v v n. v n v n v n b 0 b. b n. This is exactly MP([b n,...,b 0 ], [v 0,...,v n ]), whence the middle product algorithm partially answers to that open question: it does not reduce the matrix/vector product to a single n n product, but to the same number of ring multiplications, provided the product algorithm uses linear forms. This is a particular instance of the transposition principle problem [4, Open Problem 6] in the case of Toeplitz matrices: if M is the (n ) n matrix with M i,j = x i j for 0 i j <n, and M i,j = 0 otherwise, y is the vector (a 0,...,a n ) T, and z is the vector (a n,...,a 0 ) T, then M y gives the full product of [x 0,...,x n ] 5 For the sake of simplicity, we consider that a full square costs as much as a full product, and we don t reuse Fourier transforms when a given variable is used in several multiplications.
19 The Middle Product Algorithm I 433 and [a 0,...,a n ], whereas M T z gives MP([x 0,...,x n ], [a 0,...,a n ]), in reversed order. The algorithms of Table need only a O(n) memory space, i.e., proportional to the input size. If one enables a larger memory usage, then other algorithms exist. In particular, the relaxed DAC-multiplication proposed by van der Hoeven in [] can be adapted to yield a division with remainder in exactly K(n) scalar operations, using O(nlog n) memory. However the fast variant from [, 4.3.], which works within O(n) memory, requires more than K(n) operations asymptotically. Acknowledgements. The authors wish to thank one of the anonymous referees for his clever comments that greatly helped to improve the presentation of the results, and Éric Schost who pointed out the link with the transposition principle. References. Brent, R.P.: The complexity of multiple-precision arithmetic. In: The Complexity of Computational Problem Solving, R.S. Anderssen and R.P. Brent, (eds.), University of Queensland Press, 976, pp Burnikel, C., Ziegler, J.: Fast recursive division. Research Report MPI-I-98--0, MPI Saarbrücken, Hanrot, G., Quercia, M., Zimmermann, P.: The Middle Product Algorithm, II. Floating point arithmetic. In preparation 4. Kaltofen, E.: Challenges of symbolic computation: My favorite open problems. J. Symbolic Comput. 9(6), (000) 5. Kaminski, M., Kirpatrick, D.G., Bshouty, N.H.: Addition requirements for matrix and transposed matrix products. J. Algorithms, 9, (988) 6. Karatsuba, A.A., Ofman, Y.P.: Multiplication of multiplace numbers by automata. Dokl. Akad. Nauk SSSR 45(), (96) 7. Karp, A.H., Markstein, P.: High-precision division and square root. ACM Trans. Math. Softw. 3(4), (997) 8. Krandick, W., Johnson, J.R.: Efficient multiprecision floating point multiplication with exact rounding. RISC-Linz Report Series 93-76, RISC-Linz, Johannes Kepler University, Mulders, T.: On short multiplications and divisions. AAECC (), (000) 0. Schönhage, A., Strassen, V.: Schnelle Multiplikation großer Zahlen. Computing 7, 8 9 (97). Shoup, V.: Efficient computation of minimal polynomials in algebraic extension of finite fields. In: Proceedings of the 999 International Symposium on Symbolic and Algebraic Computation (Vancouver, Canada), S. Dooley, (ed.), ACM, pp van der Hoeven, J.: Relax, but don t be too lazy. J. Symbolic Comput. 34(6), (00) 3. Zimmermann, P.: Karatsuba square root. Research Report 3805, INRIA, 999 Appendix. The General Situation In this section, we give a more general and formal presentation of the trick described in. This presentation is suitable for any bilinear multiplication
20 434 G. Hanrot et al. algorithm, and gives, in the case of Karatsuba and FFT multiplication, corresponding algorithms of the same complexity for the middle product (see Theorems and 3). In the sequel, R is any commutative ring. Theorem Let M p,q,n : R p R q R n and n,p,q : R n R p R q be the bilinear forms defined by M p,q,n (y, z) = y j z k n,p,q (x, y) = x i y j. j+k=i+p 0 i<n i+j=k 0 k<q Then, for any (X,Y,Z) R n R p R q, we have (X M p,q,n (Y, Z)) = ( n,p,q (X, Ỹ) Z), (3) where ( ) denotes the canonical inner product of two vectors of the same length and Ỹ = (y p i ) 0 i<p. The n,p,q and M p,q,n bilinear mappings are related to the product of power series in the following way. Let X = (x 0,...,x n ) R n, Y = (y 0,...,y p ) R p and Z = (z 0,...,z q ) R q.wehave ( ) n,p,q (X, Y ) = x i y j,..., x i y j i+j=0 i+j=q ( = coefficients of t 0,...,t q in ( x i t i )( ) y j t j ) ; ( ) M p,q,n (Y, Z) = y j z k,..., y j z k = j+k=p j+k=n+p ( coefficients of t p,...,t n+p in ( y j t j )( ) z k t k ). For p = n and q = n, n,p,q and M p,q,n correspond respectively to the usual multiplication and the middle product. Proof. [Theorem ] We have n (X M p,q,n (Y, Z)) = x i y j z k = x i y j z k i=0 = 0 i<n 0 j<p 0 k<q i+j=k = 0 k<q which concludes the proof. 0 j<p 0 k<q j+k=i+p x i y p j z k z k 0 i<n 0 j<p i+j=k 0 i<n 0 j<p 0 k<q i+p j=k x i y p j = ( n,p,q (X, Ỹ) Z),
21 The Middle Product Algorithm I 435 We are now able to prove our main result, Theorem 4: Proof of Theorem 4. Suppose that we have a formula n,p,q (X, Y ) = l (a m X)(b m Y)c m (4) m= with a m R n, b m R p and c m R q. Such a formula will be called a decomposition of n,p,q into l ring multiplications. Then we have according to Theorem (X M p,q,n (Y, Z)) = ( n,p,q (X, Ỹ) Z) = l (a m X)(b m Ỹ )(c m Z). m= Therefore, X being an arbitrary vector in R n : M p,q,n (Y, Z) = l (b m Ỹ )(c m Z)a m, (5) m= and M p,q,n can also be decomposed into l ring multiplications, with same coefficients. Remark. Formulas (4) (5) show that the complexities of a (n, n) full product and a (n, n ) middle product are equal when one only counts multiplications between linear forms in the operands. However, if one also takes into account the time to compute those linear forms and the time to combine the products, then the complexities of n,n,n and M n,n,n remain comparable according to the transposition principle (see the following paragraph). Matrix multiplication. With the above notations, let us write a m = (a m,,..., a m,n ), b m = (b m,,...,b m,p ), c m = (c m,,...,c m,q ) and let A, B, C be the (l, n), (l, p) and (l, q) corresponding matrices. Then formulas (4) (5) read: n,p,q (X, Y ) = C T ((AX T ) (BY T )), M p,q,n (Y, Z) = A T ((BỸ T ) (CZ T )), where denotes the component-wise product of two vectors of length l. From the transposition principle [5, Def. 8 and Th. 4], we conclude that one can transform a bilinear algorithm P computing n,p,q (X, Y ) into a bilinear algorithm M computing M p,q,n (Y, Z).Ifα(X ), β(x ) and γ(x ) denote the number of additions, multiplications by a known scalar and multiplications between two indeterminates for algorithm X then we have α(m) = α(p) + q p, β(m) = β(p), γ (M) = γ(p).
22 436 G. Hanrot et al. n 0 n 0 n n 0 X = X 0 X X Z = Z 0 Z Y = Y 0 Y Y Ỹ = Ỹ Ỹ Ỹ 0 Z 0 Z n n n n 0 Fig. 3. Karatsuba and MP splitting. This means, in the case p = n, q = n, that from any multiplication bilinear algorithm using M(n) multiplications in R we can derive an algorithm for MP using M(n) multiplications in R, and n more additions. We now illustrate how the algorithms of Theorems and 3 can be obtained from Theorem. The Karatsuba Model Let X = (x i ) 0 i<n R n, Y = (y j ) 0 j<n R n and Z = (z k ) 0 k<n R n. Write n = n 0 + n, with 0 <n 0 n and split X, Y in the following way. X = [X 0,X,X ] with X 0 = (x 0,...,x n0 ), X = (x n0,...,x n ), X = (x n,...,x n ), Y = [Y 0,Y,Y ] with Y 0 = (y 0,...,y n0 ), Y = (y n0,...,y n ), Y = (y n,...,y n ). Note that the subvectors X and Y are empty when n 0 = n. Furthermore, we define Z 0,Z 0,Z and Z as the following subvectors of Z: Z 0 = (z 0,...,z n ), Z 0 = (z n,...,z 3n ), Z = (z n,...,z n +n 0 ), Z = (z n,...,z n ). Karatsuba s method amounts to make a recursive call on low halves and high halves of the operands, and on the differences of those operands, i.e., to compute a = n,n,n ([X 0,X ], [Y 0,Y ]), b = n0,n 0,n 0 (X,Y ), c = n,n,n ([X 0 X,X ], [Y 0 Y,Y ]),
23 The Middle Product Algorithm I 437 and then the high part of the result is b, the low part is a and the middle part is a c + b. In our formalism, we write ( n,n,n (X, Ỹ) Z) = ( n,n,n ([X 0,X ], [Ỹ, Ỹ ]) Z 0 + Z 0 ) ( n,n,n ([X 0 X,X ], [Ỹ Ỹ 0, Ỹ ]) Z 0 ) +( n0,n 0,n 0 (X, Ỹ 0 ) Z + Z ). Then, using equation (3) and Karatsuba s method, we can write (X M n,n,n (Y, Z)) = ( n,n,n (X, Ỹ) Z) = ( n,n,n ([X 0,X ], [Ỹ, Ỹ ]) Z 0 + Z 0 ) ( n,n,n ([X 0 X,X ], [Ỹ Ỹ 0, Ỹ ]) Z 0 ) +( n0,n 0,n 0 (X, Ỹ 0 ) Z + Z ) = ([X 0,X ] M n,n,n ([Y,Y ],Z 0 + Z 0 )) ([X 0 X,X ] M n,n,n ([Y,Y Y 0 ],Z 0 )) +(X M n0,n 0,n 0 (Y 0,Z + Z )) = (X [α β,γ + low(β)]) where α = M n,n,n ([Y,Y ],Z 0 + Z 0 ), β = M n,n,n ([Y,Y Y 0 ],Z 0 ), low(β) = (β 0,...,β n0 ), γ = M n0,n 0,n 0 (Y 0,Z + Z ). X being an arbitrary vector of R n, we infer: M n,n,n (Y, Z) = [α β,γ + low(β)]. (6) This formula is exactly Algorithm MP-Karatsuba from.. The FFT Model Let ω be a primitive (n)-th root of unity, X R n, Y R n and Z R n. The FFT algorithm becomes, in our formalism,
24 438 G. Hanrot et al. n,n,n (X, Y ) = n ((ω mi ) 0 i<n X)((ω mj ) 0 j<n Y )(ω mk ) 0 k<n. n m=0 Hence we obtain from (4) (5) the formula M n,n,n (Y, Z) = n ((ω mj ) 0 j<n Ỹ )((ω mk ) 0 k<n Z)(ω mi ) 0 i<n. n m=0 In more classical notations, if f m (n) is the m-th component of the direct Fourier transform, and g m (n) that of the inverse one, we get y j z k = n ω mi f m (n) n (ỹ,0,...,0)g(n) m (z), 0 i<n, 0 j<n 0 k<n j+k=i+n m=0 which is exactly Algorithm MP-FFT from.3. This means that MP(y, z) can be computed in two Fourier transforms of size n and one inverse Fourier transform of size n, which is exactly the same as for a full n n product. Remark. The preceding formulas are still valid if we suppose X of length n+: n+,n,n (X, Y ) = n ((ω mi ) 0 i n X)((ω mj ) 0 j<n Y )(ω mk ) 0 k<n, n M n,n,n+ (Y, Z) = n m=0 n ((ω mj ) 0 j<n Ỹ )((ω mk ) 0 k<n Z)(ω mi ) 0 i n, m=0 therefore we can compute the n+ middle coefficients of the product ( y j t j ) ( z k t k ) with no additional operation (assuming no particular optimization).
Short Division of Long Integers. (joint work with David Harvey)
Short Division of Long Integers (joint work with David Harvey) Paul Zimmermann October 6, 2011 The problem to be solved Divide efficiently a p-bit floating-point number by another p-bit f-p number in the
More informationFast Polynomial Multiplication
Fast Polynomial Multiplication Marc Moreno Maza CS 9652, October 4, 2017 Plan Primitive roots of unity The discrete Fourier transform Convolution of polynomials The fast Fourier transform Fast convolution
More informationCPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication
CPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform polynomial multiplication
More informationThree Ways to Test Irreducibility
Three Ways to Test Irreducibility Richard P. Brent Australian National University joint work with Paul Zimmermann INRIA, Nancy France 12 Feb 2009 Outline Polynomials over finite fields Irreducibility criteria
More informationSCALED REMAINDER TREES
Draft. Aimed at Math. Comp. SCALED REMAINDER TREES DANIEL J. BERNSTEIN Abstract. It is well known that one can compute U mod p 1, U mod p 2,... in time n(lg n) 2+o(1) where n is the number of bits in U,
More informationCPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication
CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication March, 2006 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform
More informationThree Ways to Test Irreducibility
Outline Three Ways to Test Irreducibility Richard P. Brent Australian National University joint work with Paul Zimmermann INRIA, Nancy France 8 Dec 2008 Polynomials over finite fields Irreducibility criteria
More information2 The Truncated Fourier Transform and Applications The TFT permits to speed up the multiplication of univariate polynomials with a constant factor bet
The Truncated Fourier Transform and Applications Joris van der Hoeven D pt. de Math matiques (B t. 425) Universit Paris-Sud 91405 Orsay Cedex France Email: joris@texmacs.org January 9, 2004 In this paper,
More informationERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION
MATHEMATICS OF COMPUTATION Volume 00, Number 0, Pages 000 000 S 005-5718(XX)0000-0 ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN In memory of
More informationInteger multiplication with generalized Fermat primes
Integer multiplication with generalized Fermat primes CARAMEL Team, LORIA, University of Lorraine Supervised by: Emmanuel Thomé and Jérémie Detrey Journées nationales du Calcul Formel 2015 (Cluny) November
More informationCSE 421 Algorithms. T(n) = at(n/b) + n c. Closest Pair Problem. Divide and Conquer Algorithms. What you really need to know about recurrences
CSE 421 Algorithms Richard Anderson Lecture 13 Divide and Conquer What you really need to know about recurrences Work per level changes geometrically with the level Geometrically increasing (x > 1) The
More informationFast reversion of formal power series
Fast reversion of formal power series Fredrik Johansson LFANT, INRIA Bordeaux RAIM, 2016-06-29, Banyuls-sur-mer 1 / 30 Reversion of power series F = exp(x) 1 = x + x 2 2! + x 3 3! + x 4 G = log(1 + x)
More information= 1 2x. x 2 a ) 0 (mod p n ), (x 2 + 2a + a2. x a ) 2
8. p-adic numbers 8.1. Motivation: Solving x 2 a (mod p n ). Take an odd prime p, and ( an) integer a coprime to p. Then, as we know, x 2 a (mod p) has a solution x Z iff = 1. In this case we can suppose
More informationSubquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation
Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation M A Hasan and C Negre Abstract We study Dickson bases for binary field representation Such representation
More informationFast reversion of power series
Fast reversion of power series Fredrik Johansson November 2011 Overview Fast power series arithmetic Fast composition and reversion (Brent and Kung, 1978) A new algorithm for reversion Implementation results
More informationRemainders. We learned how to multiply and divide in elementary
Remainders We learned how to multiply and divide in elementary school. As adults we perform division mostly by pressing the key on a calculator. This key supplies the quotient. In numerical analysis and
More informationSYMMETRY AND SPECIALIZABILITY IN THE CONTINUED FRACTION EXPANSIONS OF SOME INFINITE PRODUCTS
SYMMETRY AND SPECIALIZABILITY IN THE CONTINUED FRACTION EXPANSIONS OF SOME INFINITE PRODUCTS J MC LAUGHLIN Abstract Let fx Z[x] Set f 0x = x and for n 1 define f nx = ff n 1x We describe several infinite
More informationNotes on generating functions in automata theory
Notes on generating functions in automata theory Benjamin Steinberg December 5, 2009 Contents Introduction: Calculus can count 2 Formal power series 5 3 Rational power series 9 3. Rational power series
More informationLiterature Review: Adaptive Polynomial Multiplication
Literature Review: Adaptive Polynomial Multiplication Daniel S. Roche November 27, 2007 While output-sensitive algorithms have gained a fair amount of popularity in the computer algebra community, adaptive
More informationPolynomial evaluation and interpolation on special sets of points
Polynomial evaluation and interpolation on special sets of points Alin Bostan and Éric Schost Laboratoire STIX, École polytechnique, 91128 Palaiseau, France Abstract We give complexity estimates for the
More informationImplementation of the DKSS Algorithm for Multiplication of Large Numbers
Implementation of the DKSS Algorithm for Multiplication of Large Numbers Christoph Lüders Universität Bonn The International Symposium on Symbolic and Algebraic Computation, July 6 9, 2015, Bath, United
More informationOn the computational complexity of mathematical functions
On the computational complexity of mathematical functions Jean-Pierre Demailly Institut Fourier, Université de Grenoble I & Académie des Sciences, Paris (France) November 26, 2011 KVPY conference at Vijyoshi
More informationElliptic Curves Spring 2013 Lecture #3 02/12/2013
18.783 Elliptic Curves Spring 2013 Lecture #3 02/12/2013 3.1 Arithmetic in finite fields To make explicit computations with elliptic curves over finite fields, we need to know how to perform arithmetic
More informationChapter 2. Matrix Arithmetic. Chapter 2
Matrix Arithmetic Matrix Addition and Subtraction Addition and subtraction act element-wise on matrices. In order for the addition/subtraction (A B) to be possible, the two matrices A and B must have the
More informationCS 4424 Matrix multiplication
CS 4424 Matrix multiplication 1 Reminder: matrix multiplication Matrix-matrix product. Starting from a 1,1 a 1,n A =.. and B = a n,1 a n,n b 1,1 b 1,n.., b n,1 b n,n we get AB by multiplying A by all columns
More informationfeb abhi shelat Matrix, FFT
L7 feb 11 2016 abhi shelat Matrix, FFT userid: = Using the standard method, how many multiplications does it take to multiply two NxN matrices? cos( /4) = cos( /2) = sin( /4) = sin( /2) = Mergesort Karatsuba
More informationMoreover this binary operation satisfies the following properties
Contents 1 Algebraic structures 1 1.1 Group........................................... 1 1.1.1 Definitions and examples............................. 1 1.1.2 Subgroup.....................................
More informationDivide and Conquer: Polynomial Multiplication Version of October 1 / 7, 24201
Divide and Conquer: Polynomial Multiplication Version of October 7, 2014 Divide and Conquer: Polynomial Multiplication Version of October 1 / 7, 24201 Outline Outline: Introduction The polynomial multiplication
More informationCDM. Recurrences and Fibonacci
CDM Recurrences and Fibonacci Klaus Sutner Carnegie Mellon University 20-fibonacci 2017/12/15 23:16 1 Recurrence Equations Second Order The Fibonacci Monoid Recurrence Equations 3 We can define a sequence
More informationCDM. Recurrences and Fibonacci. 20-fibonacci 2017/12/15 23:16. Terminology 4. Recurrence Equations 3. Solution and Asymptotics 6.
CDM Recurrences and Fibonacci 1 Recurrence Equations Klaus Sutner Carnegie Mellon University Second Order 20-fibonacci 2017/12/15 23:16 The Fibonacci Monoid Recurrence Equations 3 Terminology 4 We can
More informationFrequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography
Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography Selçuk Baktır, Berk Sunar {selcuk,sunar}@wpi.edu Department of Electrical & Computer Engineering Worcester Polytechnic Institute
More informationFast algorithms for polynomials and matrices Part 2: polynomial multiplication
Fast algorithms for polynomials and matrices Part 2: polynomial multiplication by Grégoire Lecerf Computer Science Laboratory & CNRS École polytechnique 91128 Palaiseau Cedex France 1 Notation In this
More informationFaster integer multiplication using short lattice vectors
Faster integer multiplication using short lattice vectors David Harvey and Joris van der Hoeven ANTS XIII, University of Wisconsin, Madison, July 2018 University of New South Wales / CNRS, École Polytechnique
More informationSCHOOL OF MATHEMATICS MATHEMATICS FOR PART I ENGINEERING. Self-paced Course
SCHOOL OF MATHEMATICS MATHEMATICS FOR PART I ENGINEERING Self-paced Course MODULE ALGEBRA Module Topics Simplifying expressions and algebraic functions Rearranging formulae Indices 4 Rationalising a denominator
More informationOn Newton-Raphson iteration for multiplicative inverses modulo prime powers
On Newton-Raphson iteration for multiplicative inverses modulo prime powers Jean-Guillaume Dumas To cite this version: Jean-Guillaume Dumas. On Newton-Raphson iteration for multiplicative inverses modulo
More informationFast Matrix Product Algorithms: From Theory To Practice
Introduction and Definitions The τ-theorem Pan s aggregation tables and the τ-theorem Software Implementation Conclusion Fast Matrix Product Algorithms: From Theory To Practice Thomas Sibut-Pinote Inria,
More informationRelaxed Multiplication Using the Middle Product
Relaxed Multiplication Usin the Middle Product Joris van der Hoeven Département de Mathématiques (bât. 425) Université Paris-Sud 91405 Orsay Cedex France joris@texmacs.or ABSTRACT In previous work, we
More informationChapter 1. Sets and Numbers
Chapter 1. Sets and Numbers 1. Sets A set is considered to be a collection of objects (elements). If A is a set and x is an element of the set A, we say x is a member of A or x belongs to A, and we write
More informationDivide and Conquer. Maximum/minimum. Median finding. CS125 Lecture 4 Fall 2016
CS125 Lecture 4 Fall 2016 Divide and Conquer We have seen one general paradigm for finding algorithms: the greedy approach. We now consider another general paradigm, known as divide and conquer. We have
More informationElementary factoring algorithms
Math 5330 Spring 018 Elementary factoring algorithms The RSA cryptosystem is founded on the idea that, in general, factoring is hard. Where as with Fermat s Little Theorem and some related ideas, one can
More informationPage 52. Lecture 3: Inner Product Spaces Dual Spaces, Dirac Notation, and Adjoints Date Revised: 2008/10/03 Date Given: 2008/10/03
Page 5 Lecture : Inner Product Spaces Dual Spaces, Dirac Notation, and Adjoints Date Revised: 008/10/0 Date Given: 008/10/0 Inner Product Spaces: Definitions Section. Mathematical Preliminaries: Inner
More informationFast Multivariate Power Series Multiplication in Characteristic Zero
Fast Multivariate Power Series Multiplication in Characteristic Zero Grégoire Lecerf and Éric Schost Laboratoire GAGE, École polytechnique 91128 Palaiseau, France E-mail: lecerf,schost@gage.polytechnique.fr
More informationSTEP Support Programme. Hints and Partial Solutions for Assignment 17
STEP Support Programme Hints and Partial Solutions for Assignment 7 Warm-up You need to be quite careful with these proofs to ensure that you are not assuming something that should not be assumed. For
More informationChapter 3. Rings. The basic commutative rings in mathematics are the integers Z, the. Examples
Chapter 3 Rings Rings are additive abelian groups with a second operation called multiplication. The connection between the two operations is provided by the distributive law. Assuming the results of Chapter
More informationMATH 117 LECTURE NOTES
MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set
More informationFinite Fields: An introduction through exercises Jonathan Buss Spring 2014
Finite Fields: An introduction through exercises Jonathan Buss Spring 2014 A typical course in abstract algebra starts with groups, and then moves on to rings, vector spaces, fields, etc. This sequence
More informationOriginal citation: Hart, William B. and Novocin, Andrew (2011) Practical divide-and-conquer algorithms for polynomial arithmetic. In: CASC'11 Proceedings of the 13th international conference on Computer
More informationMultiplying huge integers using Fourier transforms
Fourier transforms October 25, 2007 820348901038490238478324 1739423249728934932894??? integers occurs in many fields of Computational Science: Cryptography Number theory... Traditional approaches to
More informationGroup, Rings, and Fields Rahul Pandharipande. I. Sets Let S be a set. The Cartesian product S S is the set of ordered pairs of elements of S,
Group, Rings, and Fields Rahul Pandharipande I. Sets Let S be a set. The Cartesian product S S is the set of ordered pairs of elements of S, A binary operation φ is a function, S S = {(x, y) x, y S}. φ
More informationCOMPUTER ARITHMETIC. 13/05/2010 cryptography - math background pp. 1 / 162
COMPUTER ARITHMETIC 13/05/2010 cryptography - math background pp. 1 / 162 RECALL OF COMPUTER ARITHMETIC computers implement some types of arithmetic for instance, addition, subtratction, multiplication
More informationCourse 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra
Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra D. R. Wilkins Contents 3 Topics in Commutative Algebra 2 3.1 Rings and Fields......................... 2 3.2 Ideals...............................
More informationSubquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach
Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach M A Hasan 1 and C Negre 2 1 ECE Department and CACR, University of Waterloo, Ontario, Canada 2 Team
More informationDefinitions, Theorems and Exercises. Abstract Algebra Math 332. Ethan D. Bloch
Definitions, Theorems and Exercises Abstract Algebra Math 332 Ethan D. Bloch December 26, 2013 ii Contents 1 Binary Operations 3 1.1 Binary Operations............................... 4 1.2 Isomorphic Binary
More informationarxiv: v1 [cs.na] 8 Feb 2016
Toom-Coo Multiplication: Some Theoretical and Practical Aspects arxiv:1602.02740v1 [cs.na] 8 Feb 2016 M.J. Kronenburg Abstract Toom-Coo multiprecision multiplication is a well-nown multiprecision multiplication
More informationAsymptotic Analysis and Recurrences
Appendix A Asymptotic Analysis and Recurrences A.1 Overview We discuss the notion of asymptotic analysis and introduce O, Ω, Θ, and o notation. We then turn to the topic of recurrences, discussing several
More informationCommutative Rings and Fields
Commutative Rings and Fields 1-22-2017 Different algebraic systems are used in linear algebra. The most important are commutative rings with identity and fields. Definition. A ring is a set R with two
More informationChapter 4. Matrices and Matrix Rings
Chapter 4 Matrices and Matrix Rings We first consider matrices in full generality, i.e., over an arbitrary ring R. However, after the first few pages, it will be assumed that R is commutative. The topics,
More informationDiscrete Mathematics U. Waterloo ECE 103, Spring 2010 Ashwin Nayak May 17, 2010 Recursion
Discrete Mathematics U. Waterloo ECE 103, Spring 2010 Ashwin Nayak May 17, 2010 Recursion During the past week, we learnt about inductive reasoning, in which we broke down a problem of size n, into one
More informationIntroduction to Algorithms
Lecture 1 Introduction to Algorithms 1.1 Overview The purpose of this lecture is to give a brief overview of the topic of Algorithms and the kind of thinking it involves: why we focus on the subjects that
More informationTransposition as a permutation: a tale of group actions and modular arithmetic
Transposition as a permutation: a tale of group actions and modular arithmetic Jeff Hooper Franklin Mendivil Department of Mathematics and Statistics Acadia University Abstract Converting a matrix from
More informationPolynomial Multiplication over Finite Fields using Field Extensions and Interpolation
009 19th IEEE International Symposium on Computer Arithmetic Polynomial Multiplication over Finite Fields using Field Extensions and Interpolation Murat Cenk Department of Mathematics and Computer Science
More informationCHAPTER 3: THE INTEGERS Z
CHAPTER 3: THE INTEGERS Z MATH 378, CSUSM. SPRING 2009. AITKEN 1. Introduction The natural numbers are designed for measuring the size of finite sets, but what if you want to compare the sizes of two sets?
More informationChapter 5 Divide and Conquer
CMPT 705: Design and Analysis of Algorithms Spring 008 Chapter 5 Divide and Conquer Lecturer: Binay Bhattacharya Scribe: Chris Nell 5.1 Introduction Given a problem P with input size n, P (n), we define
More informationFall 2017 Test II review problems
Fall 2017 Test II review problems Dr. Holmes October 18, 2017 This is a quite miscellaneous grab bag of relevant problems from old tests. Some are certainly repeated. 1. Give the complete addition and
More informationRepresenting numbers as the sum of squares and powers in the ring Z n
Representing numbers as the sum of squares powers in the ring Z n Rob Burns arxiv:708.03930v2 [math.nt] 23 Sep 207 26th September 207 Abstract We examine the representation of numbers as the sum of two
More informationTESTCRACKER CAT TOPPER s PROGRAM: QUANT. Factorial
TESTCRACKER CAT TOPPER s PROGRAM: QUANT Factorial The product of n consecutive natural numbers starting from 1 to n is called as the factorial n. n! = 1 x 2 x 3 x 4 x 5 x 6 x 7 x x (n 2) x (n 1) x n e.g.
More informationDiscrete Math, Fourteenth Problem Set (July 18)
Discrete Math, Fourteenth Problem Set (July 18) REU 2003 Instructor: László Babai Scribe: Ivona Bezakova 0.1 Repeated Squaring For the primality test we need to compute a X 1 (mod X). There are two problems
More informationSubquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases
1 Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases H. Fan and M. A. Hasan March 31, 2007 Abstract Based on a recently proposed Toeplitz
More informationLecture 1 & 2: Integer and Modular Arithmetic
CS681: Computational Numer Theory and Algera (Fall 009) Lecture 1 & : Integer and Modular Arithmetic July 30, 009 Lecturer: Manindra Agrawal Scrie: Purushottam Kar 1 Integer Arithmetic Efficient recipes
More informationMath 312/ AMS 351 (Fall 17) Sample Questions for Final
Math 312/ AMS 351 (Fall 17) Sample Questions for Final 1. Solve the system of equations 2x 1 mod 3 x 2 mod 7 x 7 mod 8 First note that the inverse of 2 is 2 mod 3. Thus, the first equation becomes (multiply
More informationLecture 6: Introducing Complexity
COMP26120: Algorithms and Imperative Programming Lecture 6: Introducing Complexity Ian Pratt-Hartmann Room KB2.38: email: ipratt@cs.man.ac.uk 2015 16 You need this book: Make sure you use the up-to-date
More informationAN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS
AN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS The integers are the set 1. Groups, Rings, and Fields: Basic Examples Z := {..., 3, 2, 1, 0, 1, 2, 3,...}, and we can add, subtract, and multiply
More informationTewodros Amdeberhan, Dante Manna and Victor H. Moll Department of Mathematics, Tulane University New Orleans, LA 70118
The -adic valuation of Stirling numbers Tewodros Amdeberhan, Dante Manna and Victor H. Moll Department of Mathematics, Tulane University New Orleans, LA 7011 Abstract We analyze properties of the -adic
More informationAlgebraic structures I
MTH5100 Assignment 1-10 Algebraic structures I For handing in on various dates January March 2011 1 FUNCTIONS. Say which of the following rules successfully define functions, giving reasons. For each one
More informationChapter 8. P-adic numbers. 8.1 Absolute values
Chapter 8 P-adic numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics 58, Springer Verlag 1984, corrected 2nd printing 1996, Chap.
More informationExact Arithmetic on a Computer
Exact Arithmetic on a Computer Symbolic Computation and Computer Algebra William J. Turner Department of Mathematics & Computer Science Wabash College Crawfordsville, IN 47933 Tuesday 21 September 2010
More informationFast, Parallel Algorithm for Multiplying Polynomials with Integer Coefficients
, July 4-6, 01, London, UK Fast, Parallel Algorithm for Multiplying Polynomials with Integer Coefficients Andrzej Chmielowiec Abstract This paper aims to develop and analyze an effective parallel algorithm
More informationFast multiplication and its applications
Algorithmic Number Theory MSRI Publications Volume 44, 2008 Fast multiplication and its applications DANIEL J. BERNSTEIN ABSTRACT. This survey explains how some useful arithmetic operations can be sped
More information(1) A frac = b : a, b A, b 0. We can define addition and multiplication of fractions as we normally would. a b + c d
The Algebraic Method 0.1. Integral Domains. Emmy Noether and others quickly realized that the classical algebraic number theory of Dedekind could be abstracted completely. In particular, rings of integers
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 13 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath Parallel Numerical Algorithms
More informationRings If R is a commutative ring, a zero divisor is a nonzero element x such that xy = 0 for some nonzero element y R.
Rings 10-26-2008 A ring is an abelian group R with binary operation + ( addition ), together with a second binary operation ( multiplication ). Multiplication must be associative, and must distribute over
More informationECE133A Applied Numerical Computing Additional Lecture Notes
Winter Quarter 2018 ECE133A Applied Numerical Computing Additional Lecture Notes L. Vandenberghe ii Contents 1 LU factorization 1 1.1 Definition................................. 1 1.2 Nonsingular sets
More informationLecture 7: More Arithmetic and Fun With Primes
IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Advanced Course on Computational Complexity Lecture 7: More Arithmetic and Fun With Primes David Mix Barrington and Alexis Maciel July
More informationELEMENTARY LINEAR ALGEBRA
ELEMENTARY LINEAR ALGEBRA K. R. MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND Corrected Version, 7th April 013 Comments to the author at keithmatt@gmail.com Chapter 1 LINEAR EQUATIONS 1.1
More informationA Generalization of Wilson s Theorem
A Generalization of Wilson s Theorem R. Andrew Ohana June 3, 2009 Contents 1 Introduction 2 2 Background Algebra 2 2.1 Groups................................. 2 2.2 Rings.................................
More informationBasic elements of number theory
Cryptography Basic elements of number theory Marius Zimand 1 Divisibility, prime numbers By default all the variables, such as a, b, k, etc., denote integer numbers. Divisibility a 0 divides b if b = a
More informationBasic elements of number theory
Cryptography Basic elements of number theory Marius Zimand By default all the variables, such as a, b, k, etc., denote integer numbers. Divisibility a 0 divides b if b = a k for some integer k. Notation
More informationDICKSON POLYNOMIALS OVER FINITE FIELDS. n n i. i ( a) i x n 2i. y, a = yn+1 a n+1 /y n+1
DICKSON POLYNOMIALS OVER FINITE FIELDS QIANG WANG AND JOSEPH L. YUCAS Abstract. In this paper we introduce the notion of Dickson polynomials of the k + 1)-th kind over finite fields F p m and study basic
More informationChapter 11 - Sequences and Series
Calculus and Analytic Geometry II Chapter - Sequences and Series. Sequences Definition. A sequence is a list of numbers written in a definite order, We call a n the general term of the sequence. {a, a
More informationThe p-adic Numbers. Akhil Mathew
The p-adic Numbers Akhil Mathew ABSTRACT These are notes for the presentation I am giving today, which itself is intended to conclude the independent study on algebraic number theory I took with Professor
More informationMATH 3030, Abstract Algebra Winter 2012 Toby Kenney Sample Midterm Examination Model Solutions
MATH 3030, Abstract Algebra Winter 2012 Toby Kenney Sample Midterm Examination Model Solutions Basic Questions 1. Give an example of a prime ideal which is not maximal. In the ring Z Z, the ideal {(0,
More informationshelat 16f-4800 sep Matrix Mult, Median, FFT
L5 shelat 16f-4800 sep 23 2016 Matrix Mult, Median, FFT merge-sort (A, p, r) if p
More informationNumbers. 2.1 Integers. P(n) = n(n 4 5n 2 + 4) = n(n 2 1)(n 2 4) = (n 2)(n 1)n(n + 1)(n + 2); 120 =
2 Numbers 2.1 Integers You remember the definition of a prime number. On p. 7, we defined a prime number and formulated the Fundamental Theorem of Arithmetic. Numerous beautiful results can be presented
More informationELEMENTARY LINEAR ALGEBRA
ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,
More informationCourse 2316 Sample Paper 1
Course 2316 Sample Paper 1 Timothy Murphy April 19, 2015 Attempt 5 questions. All carry the same mark. 1. State and prove the Fundamental Theorem of Arithmetic (for N). Prove that there are an infinity
More informationChapter 2. Divide-and-conquer. 2.1 Strassen s algorithm
Chapter 2 Divide-and-conquer This chapter revisits the divide-and-conquer paradigms and explains how to solve recurrences, in particular, with the use of the master theorem. We first illustrate the concept
More informationPartial Fractions. Combining fractions over a common denominator is a familiar operation from algebra: 2 x 3 + 3
Partial Fractions Combining fractions over a common denominator is a familiar operation from algebra: x 3 + 3 x + x + 3x 7 () x 3 3x + x 3 From the standpoint of integration, the left side of Equation
More informationECEN 5022 Cryptography
Elementary Algebra and Number Theory University of Colorado Spring 2008 Divisibility, Primes Definition. N denotes the set {1, 2, 3,...} of natural numbers and Z denotes the set of integers {..., 2, 1,
More informationORDERS OF ELEMENTS IN A GROUP
ORDERS OF ELEMENTS IN A GROUP KEITH CONRAD 1. Introduction Let G be a group and g G. We say g has finite order if g n = e for some positive integer n. For example, 1 and i have finite order in C, since
More informationFACTORING IN QUADRATIC FIELDS. 1. Introduction
FACTORING IN QUADRATIC FIELDS KEITH CONRAD For a squarefree integer d other than 1, let 1. Introduction K = Q[ d] = {x + y d : x, y Q}. This is closed under addition, subtraction, multiplication, and also
More information