On-line Algorithms for Computing Exponentials and Logarithms. Asger Munk Nielsen. Dept. of Mathematics and Computer Science

Size: px

Start display at page:

Download "On-line Algorithms for Computing Exponentials and Logarithms. Asger Munk Nielsen. Dept. of Mathematics and Computer Science"

Abel Joseph
5 years ago
Views:

1 On-line Algorithms for Computing Exponentials and Logarithms Asger Munk Nielsen Dept. of Mathematics and Computer Science Odense University, Denmark Jean-Michel Muller Laboratoire LIP and CNRS Ecole Normale Superieure de Lyon 46 Allee d'italie, Lyon Cedex 07, France Abstract We propose a new on-line algorithm for fast evaluation of logarithms and exponentials. This algorithm is derived from the widely studied Briggs-De Lugish iteration. We examine various compromises between the on-line delay and the size of the required comparison constants. Index terms Computer Arithmetic, Elementary functions, on-line arithmetic, Redundant number systems. Introduction The point at stake in this paper is the design of algorithms that rapidly evaluate exponentials and logarithms. We focus on algorithms that belong to the shift-and-add class. Those algorithm use very simple elementary steps: additions, and shifts (i.e. multiplications by a power of the radix of the number system being used), and they go back to the 7th century: Briggs, a contemporary of Neper, invented an algorithm that made it possible to build the rst tables of logarithms. For instance, to compute ln x in radix- arithmetic, numerous methods (including that of Briggs, adapted to this radix) [3, 6, 4, P Q n 6, 4] consist of nding a sequence d k =?; 0;, such that x ( + d k?k ). n Then ln(x)? ln( + d k?k ). Another method belonging to the shift-and-add class is the CORDIC algorithm, introduced in 959 by J. Volder [8] and then generalized by J. Walther [9, ]. Such algorithms have been implemented in many pocket calculators or math co-processors (such as the Intel 8087 and the following Intel chips until the 486 family). In his Ph.D. dissertation [7], N. Takagi suggests an algorithm for computing logarithms and exponentials that is adapted to the use of redundant number systems (redundant number systems [, ] allow very fast carry-free additions, but they may complicate the comparisons that are required to perform the iterations: this point will be made clearer in what follows). We propose here a variant of that algorithm that allow on-line computation of logarithms and exponentials. On-line arithmetic was introduced by Ercegovac and Trivedi []. In on-line arithmetic, the digits ow through the operators in a digit-serial fashion, most signicant digit rst. This makes it possible to get a high throughput by pipelining the arithmetic operators at a digit-level. On line algorithms

2 have been proposed for the arithmetic operations [, 8, 5, 3] as well as for computing elementary functions [5, 7, 9, 0] Our algorithm is based on the following iteration : En+ = E n? ln( + d n?n ) L n+ = L n ( + d n?n ) () if we Q P n take E = X and L = then () satises, E n+ = X? ln( + d k?k ) and L n+ = n ( + d k?k ). If we are able to nd a sequence of digits d k f?; 0; g such that E n goes to zero, then we will have L n! e X. Thus we have computed the product of the number and the exponential of the number X. This iteration mode will be referred to as mode zero. If we are able to nd a sequence of digits d k f?; 0; g such that L n goes to one, then we will have E n! X + ln( ). Thus we have computed the logarithm of the number added to the number X. This iteration mode will be referred to as mode one. In the next section we will deduce valid procedures for nding sequences of digits d k such that the iteration variables converge towards the desired values, if X and are chosen such that they belong to some predened convergence domain. Selection of Digits and Domain of Convergence. Computing Exponentials In the zero mode the digits d k must be chosen such that the iteration variable E n goes to zero. By plotting E n+ versus E n for all values of d n f?; 0; g, as according to E n+ = E n?ln(+d n?n ), we get a diagram as depicted in [Figure ]. This diagram is similar to the Robertson Diagrams frequently found in studies of division algorithms. If we assume that E = X belongs to some E n+ u n+ d n = - d n = 0 d n = l n u n E n A n B n C n l n+ D n Figure : Robertson diagram for the zero mode (exponentials).

3 interval [l ; u ] and choose the digits d k such that E k+ belongs to some interval [l k+ ; u k+ ], where the sequence of intervals is such that: [l ; u ] [l ; u ] [l n ; u n ]; () where each of these intervals contain zero and lim n! u n? l n = 0, then E n will exhibit the desired convergence towards zero. From [Figure ] we may deduce a proper convergence domain and digit selection procedure for the zero mode. By noting that u n+ = u n?ln(+?n ) and l n+ = l n?ln(??n ), and P P since u n and l n must go to zero as n goes to innity, we deduce that u n = k=n ln(+?k ) and l n = k=n ln(??k ). From [Figure ] we may further deduce a valid selection procedure for the digits d k as: 8 <? if E n B n d k = 0 if A : n E n D n (3) if C n E n Notice that the selection procedure is nondeterministic in the sense that for some values of E n several choices of d n may be valid. The constants A n ; B n ; C n and D n are equal to (again these values are deduceable from the Robertson Diagram): A n = l n+ B n = u n+ + ln(??n ) C n = l n+ + ln( +?n ) D n = u n+ It can be shown that A n < B n < C n < D n for all n. Refer to [Table ] for the 0 rst values and limit of these constants. The convergence domain of the algorithm is given by the requirement that E = X [l ; u ], from which we deduce that?:406::: = l X u = 0:86887:::. n n l n n u n n A n n B n n C n n D n Table : First 0 values and limits of n l n ; n u n ; n A n ; n B n ; n C n and n D n (for zero mode). The relative error is bounded by:??n < k=n (??k )? = e ln? ex? L n e un? = L n k=n ( +?k )? < :8?n (4) 3

4 . Computing Logarithms In the one mode the digits d k must be chosen such that the iteration variable L n goes to one. By plotting n+ = L n+? versus n = L n? for all values of d n f?; 0; g, according to n+ = L n+? = n ( + d n?n ) + d n?n, we get a diagram as depicted in [Figure ]. If λ n+ r n+ s n d n = d n = 0 d n = - r n λ n s n+ F n G n H n I n Figure : Robertson diagram for the one mode (logarithms, n = ). we assume that = L? =? belongs to some interval [s ; r ], and that the digits d k are chosen such that k+ = L k+? belongs to some interval [s k+ ; r k+ ], where these intervals include zero and are contracting in the sense that the width of the interval goes to zero, then L n will converge towards one, as n approaches innity. From [Figure ] we may deduce a proper convergence domain and selection procedure for the one mode. By noting that r n+ = (??n )r n??n and s n+ = ( +?n )s n +?n, and since r n and s n must go to zero as n goes to innity, we deduce that j=n ( +?j )?. From [Figure ] we may r n = P k=n?k Q k j=n (??j )? and s n =? P k=n?k Q k further deduce a selection procedure for the digits d k as: d k = 8 < : if L n? G n 0 if F n L n? I n? if H n L n? (5) to: Again this selection procedure is nondeterministic. The constants F n ; G n ; H n and I n are equal F n = s n+ G n = r n+??n +?n H n = s n+ +?n??n I n = r n+ It can be shown that F n < G n < H n < I n for all n. Refer to [Table ] for the rst 0 values and limit of these constants. The convergence domain of the algorithm is given by the requirement 4

5 n n s n n r n n F n n G n n H n n I n Table : First 0 values and limits of n s n ; n r n ; n F n ; n G n ; n H n and n I n (for one mode). that = L? =? [s ; r ], from which we deduce that 0:494::: = s + u + = 3:467:::. The absolute error is given by:??n ln(s n + ) (X + ln( ))? E n ln(r n + ) r?n < :5?n (6) 3 On-Line Algorithms By denition, an on-line algorithm should be able to deduce the nth digit of the result, based on + n digits of the operands, where the integer is referred to as the on-line delay []. Thus in order to compute the exponential and logarithm of a number, in an on-line manner, we must develop algorithms that compute the desired value based on limited knowledge of the input operands X and. We will devise such an algorithm, and prove that it emulates a run of the nondeterministic algorithms described in the previous section. The algorithm is based on two scaled iteration variables ^E n = n (X n+? P n? ln( + ^d k?k )) and ^L n = n ( n+ Q n? ( + ^d k?k )? m), where m is set to either zero or one, corresponding to the desired mode of computation. Based on these iteration variables the proper digits ^d k can be selected, as specied by the following algorithm. Algorithm On-line Exponentials and Logarithms. Stimulus: m : Boolean (zero or one mode), X [l ; u ] = [?:406:::; 0:86887:::] (if m = 0), [s + ; r + ] = [0494:::; 3:467:::] (if m = ). Response: If m = 0 then lim n!?n ^Ln = e X. If m = then lim n!?n ^En = X + ln( ). Method:. Initialize ^E 0 = x 0 :x x x, ^L 0 = y?y 0 :y y y? m, P 0 = and ^d 0 = 0: (7) 5

6 . iterate: (for n = ; ; :::) ^E n = ^E n? + x n+?? n ln( + ^d n??n+ ) P n = P n?( + ^d n??n+ ) (8) ^L n = (^L n? + y n+ P n?? )( + ^d n??n+ ) + md n? If m = 0 then compute ~ E n = trunc( ^E n ; p) and ^d k = 8 < : If m = then compute ~ Ln = trunc(^ln ; q) and ^d k = 8 < :? if En ~ A 0 if A E ~ n B (9) if B E ~ n if Ln ~ C 0 if C L ~ n D (0)? if D L ~ n The function trunc(x; p) referred to in the algorithm, truncates the value x to p fractional digits. Rather than actually computing the term n ln( + ^d n??n+ ), this term should be accessible by means of a table lookup. The following lemmas give bounds on the comparison constants A; B; C and D, for which [Algorithm ] selects the digits ^d k correctly. Lemma Correctness of digit selection for zero mode. If A and B are p fractional digit constants satisfying for all n : and n A n +?p +? A n B n??p?? () n C n +?p +? B n D n??p?? () then given the input X and, [Algorithm ] (with m = 0) will emulate a run of the nondeterministic parallel algorithm (in the sense that the nondeterministic algorithm may choose the same digits as the ones chosen by the on-line algorithm, i.e. d k = ^d k ), when supplied with the same input, for the zero mode. Proof: (By induction) Let E n and ^En denote the iteration variables used by the nondeterministic respectively on-line algorithms. Since E ~ n is dened as ^E n truncated to p fractional digits, we have ^E n? ~ E n <?p, thus E? ~ E = ^E? ~ E + E? ^E ^E? ~ E + jx? X+j <?p +? ; (3) from which we deduce ~ E??p???? < E < ~ E +?p? +??. If ~ E A then E < ~ E +?p? +?? A +?p? +?? B thus according to (3) ^d =? is a valid choice. If A ~ E B then E < ~ E +?p? +?? B +?p? +?? D and E > ~ E??p???? A??p???? A thus ^d = 0 is a valid choice. 6

7 If ~ E B then E > ~ E +?p? +?? A +?p? +?? C thus ^d = is a valid choice. Now, assume that for k n? that ^d ; ^d ; : : :; ^d k? have been successfully chosen (i.e. that ^d k = d k where d k are the digits chosen by the nondeterministic algorithm), then by noting that n E n? ~ En = ^En? En ~ + n E n? ^En ^E n? En ~ + n X n? (X? ( + d k?k ))? (X +n? n? X ( + ^d k?k )) <?p +? ; and by using an argument similar to the one used for the base case, we conclude that ^d n is chosen correctly. Lemma Correctness of digit selection for one mode. If C and D are q fractional digit constants satisfying for all n : and n? n? n F n +?q +? ( +?k ) C n G n??q?? ( +?k ) (4) n? n? n H n +?q +? ( +?k ) D n I n??q?? ( +?k ) (5) then given the input X and, [Algorithm ] (with m = ) will emulate a run of the nondeterministic parallel algorithm, when supplied with the same input, for the one mode. Proof: The proof is similar to the one given for [Lemma ]. Corollary Under assumption of the hypothesis of [Lemma ] and [Lemma ], we have: L n??n ^Ln n? < ( for mode zero, and for mode one we have: ( +?k ))?n? ; (6) E n??n ^En <?n? : (7) Proof: and From [Lemma ] and [Lemma ] we have d k = ^d k thus: L n??n ^Ln n? = E n??n ^En n? X = X? n? n? ( + d k?k )? n+ ( + ^d k?k ) < ( ( +?k ))?n? ; (8) ln( + d k?k )? X n+ + 7 n? X ln( + ^d k?k ) <?n? : (9)

8 3. Hardware Requirements The algorithm requires the storage of the constants j ln( + ^d j??j+ ) for ^d j? f?; 0; g, thus to obtain an accuracy of the result of approximately n binary digits, we need to store 3n constants. This result can be improved by using the fact that for j n=, ln( + d j?j ) can be replaced by d j?j with accuracy?n. Thus we only need to store 3 n constants. The updating of the iteration variables, as specied by (9), can be done eciently using only shifts and additions. If the step time of each iteration of the algorithm is to be independent of the desired accuracy, e.g. the number of binary digits, the additions must be performed using redundant arithmetic, and the variables ^En ; P n and ^Ln should be represented in a redundant representation, like borrow-save. One inherent drawback of redundant arithmetic is that when comparing two redundant numbers, we are forced to examine all digits of both numbers. By designing the digit selection of [Algorithm ], such that the comparisons are based on truncated redundant numbers, this drawback has been eliminated, in the sense that we only need to examine a constant number of digits of both operands. 4 Choice of Constants and On-line Delay In this section we will discuss how to choose the comparison constants A; B; C and D, used when selecting the digits ^d k in the on-line algorithm. The lower and upper bounds for these constants, as given by [Lemma ] and [Lemma ], are depicted in [Figure 3] respectively [Figure 4]. n Dn n Dn - -p - δ B n C n n B n A n C n + -p + δ n n B n - -p - δ n A n + -p + δ n A n Figure 3: Lower and upper bound for constants A and B. 8

9 n I n n Hn n Gn D n I n - -q - δ n- -k Π(+ ) n- n -q Hn + + δ -k Π(+ ) n C n -q δ n- -k Gn - - Π (+ ) n F n n- n F n + -q + δ -k Π(+ ) Figure 4: Lower and upper bound for constants C and D. By inspection of [Figure 3] we note that: n A n +?p +? lim n! n A n +?p +? A B??p?? n B n??p?? ; (0) thus the p fractional digit constant A, must be chosen such that it belongs to the interval [ lim n! n A n +?p +? ; B??p?? ]: The best choice of A, will be the midpoint of this interval truncated to p fractional digits, e.g. A = trunc( limn! n A n+?p +? +B??p?? ; p) = trunc(b? ; p). In this way we have: A? (B? ) <?p : () From (0) and () we get the requirement: A <?p + B? B??p??, from which we deduce: log = A : () B + 0:5??p+ 9

10 Similarly we choose the constant B as the midpoint of the interval [lim n! n C n +?p +? ; D??p?? ], e.g. B = trunc( D +lim n! n C n ; p) = trunc(d ; p), and deduce that: log D??p+ = B : (3) Since A B the lower bound on-line delay when computing in mode zero is given by (), e.g. A. For mode one, we get by inspection of [Figure 4], that C and D must be chosen such that they belong to the intervals [ F +?q +? ; lim Q Q n! n G n??q?? ( +?k )] respectively [ H +?q +? ; lim n! n I n??q?? ( +?k )]. As with the zero mode, we choose the constants C and D as the midpoints of their respective Q intervals, truncated to q fractional digits, e.g. C = trunc( F+?q +? +lim n! n G n??q?? (+?k ) ; q) = trunc(f +?? (? Q Q ( +?k )); Q q) and D = trunc( H+?q +? +lim n! n I n??q?? (+?k ) ; q) = trunc(h + +?? (? ( +?k )); q). Using similar reasoning as for the zero mode we may deduce the following lower bounds on the on-line delay for mode one: log + Q ( +?k )?F??q+ log + Q ( +?k )? H??q+ = C : (4) = D : (5) Since D C the lower bound on the on-line delay for mode one is given by (5), e.g. D. Examples of values of A; B; C; D and the on-line delays A and D, are given in [Table 3] for various values of p and q. p; q A A B D C D Table 3: Truncated comparison comparison constants (in binary), and lower bound on on-line delay, for various values of p and q. 5 Digitization The output of [Algorithm ] is a sequence of approximations ^Ln or ^En, to the exact result e X respectively X + ln( ). If our algorithm for computing elementary functions is to be an on-line algorithm, we must produce the output as well as consume the input in an on-line manner. In order to produce the output in an on-line fashion, we will use a technique known as digitization. 0

11 Assume that we have computed a sequence of approximations < ^Zn > to a number Z, such that j?n ^Zn? Zj?n+p, and that the numerical value of the exact result is bounded by jzj q. We will then perform an on-line digitization of the value Z based on the sequence of approximations < ^Z n > producing the result S n in on-line mode. In each iteration of the algorithm, the (output) digit s n is chosen such that the residual error j ^Z n? S n j = j ^Z n? S n? + s n c j is minimized. Algorithm On-line Digitization. Stimulus: A sequence of approximations < ^Zn > satisfying: Response: S n = c++n P n s k?k. Method:?n ^Zn? Z?n+p.. Initialize S 0 = 0 (6). iterate: (for n = ; ; :::) T n = trunc(( ^Zn? S n?)?c ; k) (7) s n = 8 < : The following lemma gives a bound on the residual error. Lemma 3 If the scaling constant c is chosen such that c max ( p + &? if T n? 0 if? < T n (8) if < T n S n = S n? + s n c (9) log + q?p+ 3 +?k ' ; p + then the value S j computed by [Algorithm ] satises: 8j : Proof: (by induction) ^Z? S 0?c = & log 3??k ') ^Z j? S j c ( +?k ). (30) ^Z? Z + Z?c (? ^Z? Z+ jzj)?c ( p + q+ )?c 3 +?k (3) thus by virtue of the selection procedure (8) we have: thus Assume ^Z? S ( ^Z j? S j ( ^Z j+? S j?c +?k ) c. +?k ) c, thus = ( ^Z? S 0 )?c? s +?k ; (3) ( ^Zj? S j?)?c? s j +?k. Now ^Z j+? ^Z j + ^Z j? 4S j?? s j c+?c ^Z j+? j+ Z? ( ^Zj? j Z)?c + ( ^Zj? S j?)?c? s j ^Z j+? j+ Z?c + ^Z j? j Z?c + ( +?k ) p?c + p+?c + +?k+ 3 +?k ;

12 ( ^Z j+? S j )?c? s j+ +?k, and consequently: ^Zj+? Sj+ ( +?k ) c : (33) thus by virtue of the selection procedure (8) we have From [Lemma 3] we may deduce the following bound on the absolute error, made by approximating Z by?n S n. Z??n S n = Z??n ^Zn +?n ^Zn??n S n Z??n ^Zn +?n ^Z n? S n?n+p + c ( +?k )?n < c+?n Since the least signicant digit of?n S n is of weight c+?n, we conclude that?n S n is within one ulp (unit last place) of the exact result Z. Since the selection procedure is based on a k fractional digit estimation of ^Zn? S n?, it is not necessary to perform the full precision subtraction. Furthermore we only need to store S n to k fractional digits. Consequently [Algorithm ] can be implemented eciently, with a hardware cost that is independent on the desired precision (the number of binary digits of the result). 5. Applying Digitization to the Output of On-line Exp/Log Algorithm. In what follows we will briey argue that [Algorithm ] can be used to digitize the output of [Algorithm ]. For mode zero, by assuming j j, we deduce from (6) and [Corollary ] that:?n L n? e X <?n+3 =?n+p : (34) The absolute value of the exact result can be estimated to: e X j j e u < = q : (35) Thus with p = 3 and q =, we get from [Lemma 3] that the scaling constant c must satisfy, c 7 for k = and c 6 for k >. For mode one, we get from (4) and [Corollary ] that and by assuming that jxj we get?n ^En? (X + ln( )) <?n+ =?n+p ; (36) jx + ln( )j jxj + jln(u + )j < = q : (37) Thus with p = and q =, we get from [Lemma 3] that the scaling constant c must satisfy, c 6 for k = and c 5 for k >.

13 6 Conclusion We have proposed a new on-line algorithm for evaluating logarithms and exponentials. This algorithm is suitable for fast VLSI implementation. One should notice that in our presentation we rarely use the fact that the exponentials and logarithms are base e exponentials and logarithms, thus if one replaces the constants ln( + d k?k ) by log a ( + d k?k ) then (by modifying the comparison constants), one will obtain an algorithm for computing X + log a and a X. References [] T. Asada, N. Takagi, and S. ajima. Redundant CORDIC methods with a constant scale factor. IEEE Transactions on Computers, 40(9):989{995, September 99. [] A. Avizienis. Signed-digit number representations for fast parallel arithmetic. IRE Transactions on electronic computers, 0:pp 389{400, 96. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol., IEEE Computer Society Press Tutorial, 990. [3] J.C. Bajard, J. Duprat, S. Kla, and J.M. Muller. Some operators for on-line radix computations. Journal of Parallel and Distributed Computing, ():336{345, August 994. re. report of LIP ENSLyon n o 9-4. [4] J.C. Bajard, S. Kla, and J.M. Muller. Bkm: A new hardware algorithm for complex elementary functions. IEEE Transactions on Computers, 43(8):955{963, August 994. [5] R.H. Brackert, M.D. Ercegovac, and A.N. Willson. Design of an on-line multiply-add module for recursive digital lters. In M. D. Ercegovac and E. Swartzlander, editors, 9th Symposium on Computer Arithmetic, Santa Monica, USA, pages 34{4. IEEE Computer Society Press, sep 989. [6] T.C. Chen. Automatic computation of logarithms, exponentials, ratios and square roots. IBM J. of Res. and Dev., 6:380{388, 97. [7] M.D. Ercegovac and A.L. Grnarov. On-line multiplicative normalization. In 6th Symposium on Computer Arithmetic, pages 5{55. IEEE Computer Society Press, 983. [8] M.D. Ercegovac and T. lang. On-line arithmetic: a design methodology and applications in digital signal processing. In VLSI Signal Processing III, pages 5{63, 988. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol., IEEE Computer Society Press Tutorial, 990. [9] M.D. Ercegovac and T. Lang. On-line scheme for computing rotation factors. Journal of Parallel and Distributed Computing, Special Issue on Parallelism in Computer Arithmetic(5), 988. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol., IEEE Computer Society Press Tutorial, 990. [0] M.D. Ercegovac and T. Lang. Redundant and on-line cordic: Application to matrix triangularization and svd. IEEE Transaction on Computers, 39(6):75{740, June 990. [] M.D. Ercegovac and K.S. Trivedi. On-line algorithms for division and multiplication. IEEE Transactions on Computers, C-6(7):68{687, 977. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol., IEEE Computer Society Press Tutorial,

14 [] I. Koren. Computer arithmetic algorithms. Prentice-Hall, 993. [3] B. De Lugish. A Class of Algorithms for Automatic Evaluation of Functions and Computations in a Digital Computer. PhD thesis, Dept. of Computer Science, University of Illinois, Urbana, 970. [4] J.M. Muller. Discrete basis and computation of elementary functions. IEEE Transactions on Computers, C-34(9), September 985. [5] V.G. Oklobdzija and M.D. Ercegovac. An on-line square root algorithm. IEEE Transactions on Computers, C-3:70{75, 98. [6] W.H. Specker. A class of algorithms for ln(x), exp(x), sin(x), cos(x), tan? (x) and cot? (x). IEEE Transactions on Electronic Computers, EC-4, 965. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol., IEEE Computer Society Press Tutorial, 990. [7] N. Takagi. Studies on hardware algorithms for arithmetic operations with a redundant binary representation. PhD thesis, Dept. Info. Sci., Kyoto Univ., 987. [8] J. Volder. The cordic computing technique. IRE Transactions on Electronic Computers, 959. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol., IEEE Computer Society Press Tutorial, 990. [9] J. Walther. A unied algorithm for elementary functions. In Joint Computer Conference Proceedings, 97. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol., IEEE Computer Society Press Tutorial,

On-Line Hardware Implementation for Complex Exponential and Logarithm

On-Line Hardware Implementation for Complex Exponential and Logarithm Ali SKAF, Jean-Michel MULLER * and Alain GUYOT Laboratoire TIMA / INPG - 46, Av. Félix Viallet, 3831 Grenoble Cedex * Laboratoire LIP