Arbitrary-Precision Division

Rod Howell
Kansas State University

September 8, 2000

This paper presents an algorithm for arbitrary-precision division and shows its worst-case time complexity to be related by a constant factor to that of arbitrary-precision multiplication. The material is adapted from [1], pp. 264, 295-297, where S. A. Cook is credited with suggesting the basic idea.

We assume a smooth bound $g(n) \in \Omega(n)$ on the worst-case time complexity of $n$-bit fixed-point multiplication. Furthermore, we assume that for some $n_0 \in \mathbb{N}$ and some real $c > 1$, $g(2n) \geq cg(n)$ for all $n \geq n_0$. Intuitively, this condition ensures that $g(n)$ eventually maintains a growth rate of at least $n^\epsilon$ for some $\epsilon \in \mathbb{R}^+$ (i.e., it does not grow more slowly than this for arbitrarily long periods of time).

In order to simplify the problem, we will restrict the input to positive integers $u \geq v$. In particular, we wish to find $\lfloor u/v \rfloor$. Suppose $v$ is an $m$-bit integer; i.e., $2^{m-1} \leq v < 2^m$. Then

$$\left\lfloor \frac{u}{v} \right\rfloor = \left\lfloor u 2^{-m} \cdot \frac{1}{v 2^{-m}} \right\rfloor,$$

and $1/2 \leq v 2^{-m} < 1$. We therefore begin by presenting an algorithm to find a high-precision approximation of $1/x$, where $x$ is a fixed-point rational number, $1/2 \leq x < 1$. The idea is based on Newton's method, which generates successive approximations according to the following rule:

$$z_{k+1} = 2z_k - x z_k^2.$$

This method converges very quickly: if $z_k = (1 - \epsilon)/x$, then

$$z_{k+1} = \frac{2(1 - \epsilon)}{x} - x\left(\frac{1 - \epsilon}{x}\right)^2 = \frac{2 - 2\epsilon - 1 + 2\epsilon - \epsilon^2}{x} = \frac{1 - \epsilon^2}{x}.$$

Copyright © 2000, Rod Howell. This paper may be copied or printed in its entirety for use in conjunction with CIS 775, Analysis of Algorithms, at Kansas State University. Otherwise, no portion of this paper may be reproduced in any form or by any electronic or mechanical means without permission in writing from Rod Howell.
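The quadratic convergence of Newton's rule can be observed directly with exact rational arithmetic. The following Python sketch (our own illustration; the name `newton_reciprocal` is not part of the algorithm developed below) verifies that each iteration exactly squares the relative error $\epsilon$:

```python
from fractions import Fraction

def newton_reciprocal(x: Fraction, iterations: int) -> Fraction:
    """Approximate 1/x by the Newton rule z_{k+1} = 2*z_k - x*z_k**2,
    starting from z_0 = 1 (adequate when 1/2 <= x < 1)."""
    z = Fraction(1)
    for _ in range(iterations):
        z = 2 * z - x * z * z
    return z

x = Fraction(2, 3)                  # satisfies 1/2 <= x < 1
# If z_k = (1 - eps)/x, then eps = 1 - x*z_k; record eps for each k.
errors = [1 - x * newton_reciprocal(x, k) for k in range(6)]
# Each iteration squares the relative error: eps_{k+1} = eps_k**2.
for k in range(5):
    assert errors[k + 1] == errors[k] ** 2
```

With exact `Fraction` arithmetic the equality $\epsilon_{k+1} = \epsilon_k^2$ holds exactly, matching the derivation above; in fixed-point arithmetic it holds only up to rounding.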
However, the time for convergence depends upon the accuracy required; thus, simply iterating Newton's method from a fixed starting point does not yield a total time within a constant factor of the time to multiply. In order to accomplish this goal, we use roughly the high-order half of the bits of $x$ to obtain recursively an approximation with roughly half the needed accuracy, then apply a single iteration of Newton's method to obtain the desired result.

We assume the following functions:

trunc(x, p): returns the fixed-point number $x$ truncated to $p$ bits to the right of the radix point. Thus, $0 \leq x - \mathrm{trunc}(x, p) < 2^{-p}$.

roundup(x, p): returns the fixed-point number $x$ rounded up to $p$ bits to the right of the radix point. Thus, $0 \leq \mathrm{roundup}(x, p) - x < 2^{-p}$.

We now define reciprocal(x, p) as follows:

  function reciprocal(x, p)
  begin
    if p ≤ 2 then
      return trunc(3/2, p)
    else
      z ← reciprocal(x, ⌊p/2⌋ + 1)
      return roundup(2z − trunc(x, p + 2)z², p)
    fi
  end

We will first show the correctness of the algorithm. The following theorem follows from the definitions of trunc and roundup:

Theorem 1 reciprocal(x, p) returns a value with at most $p$ bits to the right of the radix point.

We need the following lemma in order to bound the error incurred by reciprocal.

Lemma 1 The value returned by reciprocal(x, p) is at most 2.

Proof: The lemma clearly holds when $p \leq 2$. Suppose $p > 2$. Consider the expression $2z - \mathrm{trunc}(x, p + 2)z^2$. Because $z \geq 0$, the value of this expression is maximized when $\mathrm{trunc}(x, p + 2)$ is minimized; because $x \geq 1/2$, we have $\mathrm{trunc}(x, p + 2) \geq 1/2$. Thus, it suffices to show

$$2z - \frac{z^2}{2} \leq 2. \qquad (1)$$

Rearranging terms, we find that (1) holds iff $0 \leq z^2 - 4z + 4 = (z - 2)^2$, which holds for all $z \in \mathbb{R}$. Finally, because 2 is a multiple of $2^{-p}$, rounding a value of at most 2 up to $p$ bits yields a value of at most 2. □
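A direct Python transcription of reciprocal may clarify the definitions. The sketch below models fixed-point values with exact rationals (`Fraction` is our own modeling choice; trunc, roundup, and reciprocal mirror the pseudocode above), and empirically checks the accuracy bound proved in Theorem 2 below:

```python
from fractions import Fraction

def trunc(x: Fraction, p: int) -> Fraction:
    """Truncate x >= 0 to p bits right of the radix point: 0 <= x - trunc(x,p) < 2**-p."""
    return Fraction(int(x * 2**p), 2**p)

def roundup(x: Fraction, p: int) -> Fraction:
    """Round x up to p bits right of the radix point: 0 <= roundup(x,p) - x < 2**-p."""
    n = x * 2**p
    return Fraction(-(-n.numerator // n.denominator), 2**p)   # ceiling of n

def reciprocal(x: Fraction, p: int) -> Fraction:
    if p <= 2:
        return trunc(Fraction(3, 2), p)
    z = reciprocal(x, p // 2 + 1)                  # floor(p/2) + 1
    return roundup(2*z - trunc(x, p + 2)*z*z, p)

# Check |1/x - reciprocal(x, p)| <= 2**(1-p) for 1/2 <= x < 1,
# and that the result never exceeds 2 (Lemma 1).
x = Fraction(3, 5)
for p in range(40):
    y = reciprocal(x, p)
    assert abs(1/x - y) <= Fraction(2) ** (1 - p)
    assert y <= 2
```

For example, `reciprocal(Fraction(3, 5), 3)` returns 7/4, within $2^{-2}$ of $5/3$.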
The following theorem shows the accuracy of the value returned by reciprocal:

Theorem 2 reciprocal(x, p) returns a value $y$ such that

$$\left| \frac{1}{x} - y \right| \leq 2^{1-p}.$$

Proof: By generalized induction on $p$.

Base Case 1: $p = 0$. The value returned is 1. Because $1/2 \leq x < 1$, we have $1 < 1/x \leq 2$, so $|1/x - 1| \leq 1 \leq 2^{1-p}$.

Base Case 2: $1 \leq p \leq 2$. Because 3/2 requires only 1 bit to the right of the radix point, the value returned is 3/2. Because $1 < 1/x \leq 2$,

$$\left| \frac{1}{x} - \frac{3}{2} \right| \leq \frac{1}{2} \leq 2^{1-p}.$$

Induction Step: Let $p > 2$. Let $1/x + \alpha$ be the value returned by reciprocal(x, ⌊p/2⌋ + 1). By Lemma 1,

$$\frac{1}{x} + \alpha \leq 2.$$

By the Induction Hypothesis,

$$|\alpha| \leq 2^{-\lfloor p/2 \rfloor}.$$

Let $\beta$ be the value truncated by the call to trunc; i.e., $\beta = x - \mathrm{trunc}(x, p + 2)$. Then $0 \leq \beta < 2^{-(p+2)}$. Let $\gamma$ be the value added by the call to roundup; i.e., writing $z = 1/x + \alpha$,

$$\gamma = \mathrm{roundup}(2z - \mathrm{trunc}(x, p + 2)z^2, p) - (2z - \mathrm{trunc}(x, p + 2)z^2).$$

Then $0 \leq \gamma < 2^{-p}$.
Then the value $y$ returned is given by

$$y = 2\left(\frac{1}{x} + \alpha\right) - (x - \beta)\left(\frac{1}{x} + \alpha\right)^2 + \gamma
    = \frac{2}{x} + 2\alpha - \frac{1}{x} - 2\alpha - \alpha^2 x + \beta\left(\frac{1}{x} + \alpha\right)^2 + \gamma
    = \frac{1}{x} + \beta\left(\frac{1}{x} + \alpha\right)^2 + \gamma - \alpha^2 x.$$

We need to derive a bound on $\beta(1/x + \alpha)^2 + \gamma - \alpha^2 x$. First, we have

$$0 \leq \beta\left(\frac{1}{x} + \alpha\right)^2 + \gamma < 2^{-(p+2)} \cdot 2^2 + 2^{-p} = 2^{-p} + 2^{-p} = 2^{1-p}.$$

Furthermore,

$$0 \leq \alpha^2 x \leq \alpha^2 \leq 2^{-2\lfloor p/2 \rfloor} \leq 2^{-(p-1)} = 2^{1-p}.$$

Therefore,

$$\left| \beta\left(\frac{1}{x} + \alpha\right)^2 + \gamma - \alpha^2 x \right| \leq 2^{1-p},$$

so $|1/x - y| \leq 2^{1-p}$. □

We are now ready to show the worst-case time complexity of reciprocal. Recall that $g(n)$ is a bound on the worst-case time complexity of multiplying two $n$-bit fixed-point numbers. We will show that the time complexity of reciprocal satisfies a recurrence of the form $t(n) = t(n/2) + cg(n)$, where $n$ is a sufficiently large power of 2. We therefore need the following lemma.

Lemma 2 Let $f : \mathbb{N} \to \mathbb{R}^{\geq 0}$ be a smooth function such that $f(2n) \geq cf(n)$ for some $c > 1$ whenever $n \geq n_0 \in \mathbb{N}$. Let $t : \mathbb{N} \to \mathbb{R}^{\geq 0}$ be an eventually nondecreasing function satisfying $t(n) = t(n/2) + f(n)$ when $n = 2^k n_0$ for some $k \geq 1$. Then $t(n) \in \Theta(f(n))$.
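The algebraic identity at the heart of this derivation can be checked mechanically with exact rational arithmetic. The following sketch (our own check, not part of the proof) verifies, for random exact rationals, that one Newton step on $z = 1/x + \alpha$ with truncation error $\beta$ and rounding error $\gamma$ lands exactly at $1/x + \beta z^2 + \gamma - \alpha^2 x$:

```python
from fractions import Fraction
import random

# Verify: 2z - (x - beta)*z**2 + gamma == 1/x + beta*z**2 + gamma - alpha**2 * x,
# where z = 1/x + alpha, for arbitrary exact rationals.
rng = random.Random(0)
for _ in range(100):
    x = Fraction(rng.randint(1, 100), rng.randint(1, 100))
    alpha = Fraction(rng.randint(-100, 100), rng.randint(1, 100))
    beta = Fraction(rng.randint(0, 100), rng.randint(1, 100))
    gamma = Fraction(rng.randint(0, 100), rng.randint(1, 100))
    z = 1 / x + alpha
    lhs = 2*z - (x - beta) * z**2 + gamma
    rhs = 1/x + beta * z**2 + gamma - alpha**2 * x
    assert lhs == rhs
```

The error analysis then consists entirely of bounding the three correction terms $\beta z^2$, $\gamma$, and $\alpha^2 x$.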
Proof: Because $f$ is smooth, it is eventually positive. Without loss of generality, we may assume that $f(n) > 0$ for $n \geq n_0$. Because $f$ is smooth, it suffices to show that

$$t(n) \in \Theta(f(n) \mid n = 2^k n_0 \text{ for some } k \geq 1).$$

Because $t(n) \geq 0$ for all $n$, $t(n) = t(n/2) + f(n) \geq f(n)$ whenever $n = 2^k n_0$; hence, clearly,

$$t(n) \in \Omega(f(n) \mid n = 2^k n_0 \text{ for some } k \geq 1).$$

We will show by induction on $k$ that for $n = 2^k n_0$, $t(n) \leq df(n)$, where

$$d = \max\left\{ 1 + \frac{t(n_0)}{f(2n_0)}, \frac{c}{c-1} \right\}.$$

Base: $k = 1$. Then $n = 2n_0$, and $t(n) = t(n_0) + f(n)$. Because $d \geq 1 + t(n_0)/f(n)$, we have

$$df(n) \geq \left(1 + \frac{t(n_0)}{f(n)}\right) f(n) = f(n) + t(n_0) = t(n).$$

Induction Hypothesis: Assume that for some $k \geq 1$, $t(2^k n_0) \leq df(2^k n_0)$.

Induction Step: Let $n = 2^{k+1} n_0$. Then

$$t(n) = t(2^k n_0) + f(n) \leq df(2^k n_0) + f(n) \quad \text{(from the IH)}$$
$$= df(n/2) + f(n) \leq \left(\frac{d}{c} + 1\right) f(n),$$

because $n/2 = 2^k n_0 \geq n_0$ implies $f(n) \geq cf(n/2)$. Because $d \geq c/(c-1)$, we have

$$d \geq \frac{c}{c-1} \iff dc - d \geq c \iff dc \geq c + d \iff d \geq 1 + \frac{d}{c}.$$
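A small numerical instance may make Lemma 2 concrete. The choices below are hypothetical: $f(n) = n^2$ is smooth and satisfies $f(2n) = 4f(n)$, so $c = 4$ works; we take $n_0 = 1$ and an arbitrary seed value $t(n_0) = 5$, build $t$ from the recurrence, and check the bound $f(n) \leq t(n) \leq df(n)$ from the proof:

```python
# Hypothetical instance of Lemma 2: f(n) = n**2, c = 4, n0 = 1, t(n0) = 5.
def f(n: int) -> int:
    return n * n

n0, c = 1, 4
t = {n0: 5.0}
n = n0
for _ in range(20):                 # build t(n) for n = 2**k * n0 via the recurrence
    t[2 * n] = t[n] + f(2 * n)
    n *= 2

d = max(1 + t[n0] / f(2 * n0), c / (c - 1))
# The induction gives t(n) <= d*f(n); the recurrence itself gives t(n) >= f(n).
n = 2 * n0
while n in t:
    assert f(n) <= t[n] <= d * f(n)
    n *= 2
```

Here $t(n)/f(n)$ approaches $c/(c-1) = 4/3$, the second term in the definition of $d$, which is why that term is needed.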
Therefore, $t(n) \leq df(n)$. □

Theorem 3 reciprocal(x, p) operates in a time in $O(g(p))$.

Proof: Suppose $p > 2$. By Theorem 1, the value $z$ contains at most $\lfloor p/2 \rfloor + 1$ bits to the right of the radix point. From Lemma 1, the value of $z$ is at most 2. $z$ can therefore be stored in $\lfloor p/2 \rfloor + 2$ bits, so $z^2$ contains at most $p + 4$ bits. Because $\mathrm{trunc}(x, p + 2)$ contains at most $p + 2$ bits, the multiplication $\mathrm{trunc}(x, p + 2)z^2$ takes a time in $O(g(p + 4))$. Because $g(n) \in \Omega(n)$, this operation dominates the remainder of the work done outside the recursive call. We can therefore bound the total time with the following recurrence:

$$t(p) = t(\lfloor p/2 \rfloor + 1) + cg(p + 4)$$

for some $c \in \mathbb{R}$ and all $p$ greater than some $n_0 \in \mathbb{N}$. Let $p = 2^k$, and define $T(p) = t(p + 1)$. Then

$$T(p) = t(p + 1) = t\left(\left\lfloor \frac{p+1}{2} \right\rfloor + 1\right) + cg(p + 5) = t\left(\frac{p}{2} + 1\right) + cg(p + 5) = T\left(\frac{p}{2}\right) + cg(p + 5)$$

for $p > n_0$. From Lemma 2, $T(p) \in \Theta(cg(p + 5)) = \Theta(g(p))$, because $g$ is smooth. Then

$$t(p) = T(p - 1) \in \Theta(g(p - 1)) \subseteq \Theta(g(p)).$$

Therefore, the time complexity of reciprocal(x, p) is in $O(g(p))$. □

We can now use reciprocal to construct an integer division algorithm. We assume the existence of a function numbits, which takes a natural number and returns the number of bits in its representation. Thus, for $2^{n-1} \leq u < 2^n$, numbits(u) returns $n$. The algorithm is as follows:

  procedure divide(u, v, var q, r)
  begin
    n ← numbits(u); m ← numbits(v)
    x ← reciprocal(v·2^(−m), n − m + 1)
    q ← ⌊u·x·2^(−m)⌋
    r ← u − qv
    if r < 0 then r ← r + v; q ← q − 1
    elsif r ≥ v then r ← r − v; q ← q + 1
    fi
  end
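The division procedure can likewise be sketched in Python. The sketch below (our own transcription, reusing the rational-arithmetic model of reciprocal from earlier; Python's `int.bit_length` plays the role of numbits) checks the result against Python's built-in integer division:

```python
from fractions import Fraction
import random

def trunc(x: Fraction, p: int) -> Fraction:
    return Fraction(int(x * 2**p), 2**p)

def roundup(x: Fraction, p: int) -> Fraction:
    n = x * 2**p
    return Fraction(-(-n.numerator // n.denominator), 2**p)

def reciprocal(x: Fraction, p: int) -> Fraction:
    if p <= 2:
        return trunc(Fraction(3, 2), p)
    z = reciprocal(x, p // 2 + 1)
    return roundup(2*z - trunc(x, p + 2)*z*z, p)

def divide(u: int, v: int) -> tuple:
    """Return (u // v, u % v) for positive integers u >= v via reciprocal."""
    n, m = u.bit_length(), v.bit_length()          # numbits(u), numbits(v)
    x = reciprocal(Fraction(v, 2**m), n - m + 1)
    q = int(u * x / 2**m)      # u*x*2^-m = u/v + e with |e| < 1, so q is off by at most 1
    r = u - q * v
    if r < 0:
        r += v; q -= 1
    elif r >= v:
        r -= v; q += 1
    return q, r

rng = random.Random(42)
for _ in range(200):
    v = rng.randint(1, 10**12)
    u = rng.randint(v, 10**15)
    assert divide(u, v) == (u // v, u % v)
```

Note that a production implementation would use scaled integers rather than `Fraction`, so that the multiplications are genuine fixed-point multiplications of the sizes counted in Theorem 5.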
The following theorem shows the correctness of divide.

Theorem 4 Let $u \geq v$ be positive integers. Upon execution of divide(u, v, q, r), $q$ contains the value $\lfloor u/v \rfloor$, and $r$ contains the value $u \bmod v$.

Proof: From the definition of numbits, we have $2^{n-1} \leq u < 2^n$ and $2^{m-1} \leq v < 2^m$. From Theorem 2,

$$x = \frac{2^m}{v} + \alpha, \quad \text{where } |\alpha| \leq 2^{1 - (n - m + 1)} = 2^{m-n}.$$

It then follows that after the first assignment to $q$,

$$q = \left\lfloor u\left(\frac{2^m}{v} + \alpha\right)2^{-m} \right\rfloor = \left\lfloor \frac{u}{v} + u\alpha 2^{-m} \right\rfloor, \quad \text{where } |u\alpha 2^{-m}| \leq u \cdot 2^{m-n} \cdot 2^{-m} = u2^{-n} < 1.$$

Hence,

$$\left\lfloor \frac{u}{v} \right\rfloor - 1 \leq q \leq \left\lfloor \frac{u}{v} \right\rfloor + 1.$$

Clearly, the remaining statements set $q$ and $r$ to $\lfloor u/v \rfloor$ and $u \bmod v$, respectively. □

We conclude by showing the time complexity of divide.

Theorem 5 Let $u \geq v$ be positive integers containing at most $n$ bits. The worst-case time complexity of divide(u, v, q, r) is in $O(g(n))$.

Proof: From Theorem 3, the call to reciprocal takes $O(g(n))$ time. From Theorem 1, $x$ contains at most $n - m + 1$ bits to the right of the radix point, so the product $ux$ can be computed in $O(g(n))$ time. Finally, $q$ has at most $n - m + 1$ bits, so $qv$ can be computed in $O(g(n))$ time. Because everything else can be done in at most linear time and $g(n) \in \Omega(n)$, the total time is in $O(g(n))$. □

References

[1] Donald Knuth. The Art of Computer Programming, volume 2: Seminumerical Algorithms. Addison-Wesley, 2nd edition, 1981.