On-Line Hardware Implementation for Complex Exponential and Logarithm

Ali SKAF, Jean-Michel MULLER* and Alain GUYOT

Laboratoire TIMA / INPG - 46, Av. Félix Viallet, 3831 Grenoble Cedex
* Laboratoire LIP / ENSL - 46, Allée d'Italie, 69364 Lyon Cedex (FRANCE)
Phone: (+33) 76 57 47 7 - Fax: (+33) 76 47 38 14
E-mail: Ali.Skaf@.imag.fr

ABSTRACT

This work reports on an on-line arithmetic co-processor that implements BKM, a novel algorithm derived from CORDIC, adapted here for on-line arithmetic. A 16-digit SBD (Signed Binary Digit) VLSI implementation is also discussed; the resulting circuit might be considered the first on-line arithmetic co-processor. Depending on its functioning mode, the BKM algorithm computes either the complex exponential or the complex logarithm, from which all basic mathematical operations can be derived. The chip was designed using a cell library specific to on-line arithmetic, some full-custom parts and a generated decision and control part.

I- INTRODUCTION

The CORDIC algorithm (COordinate Rotation on a DIgital Computer), discovered by Volder in 1959 [1] and generalised by Walther in 1971 [2], is widely used in classical arithmetic co-processors (such as the I887, HP35, M68881, M68882...). When adapted to redundant number systems, however, it leads to a more complex and less efficient architecture. In 1993, Bajard, Kla and Muller proposed another algorithm, named BKM [3], in which computation is performed in the complex plane. In this paper, BKM is adapted to on-line architectures.

II- FROM CORDIC TO BKM

The CORDIC algorithm is based on the iteration:

$$x_{n+1} = x_n - d_n\, y_n\, 2^{-n}, \qquad y_{n+1} = y_n + d_n\, x_n\, 2^{-n}, \qquad z_{n+1} = z_n - d_n \arctan 2^{-n}, \qquad d_n = \pm 1.$$

The values of $d_n$ depend on the sign of the operands in two different ways, giving two computation modes (rotation and vectoring). The limits reached as $n \to \infty$ are summarised in Table 1.
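Mode        $d_n$                    $x_n \to$                  $y_n \to$                  $z_n \to$
Rotation    $\mathrm{sign}(z_n)$     $K(x\cos z - y\sin z)$     $K(y\cos z + x\sin z)$     $0$
Vectoring   $\mathrm{sign}(-y_n)$    $K\sqrt{x^2 + y^2}$        $0$                        $z + \arctan(y/x)$

Table 1: The CORDIC functioning modes, where $x$, $y$, $z$ are the initial values (vectoring mode with $x > 0$) and $K = \prod_{n=0}^{\infty} \frac{1}{\cos(\arctan 2^{-n})} = \prod_{n=0}^{\infty} \sqrt{1 + 2^{-2n}} = 1.64676\ldots$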
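Table 1 can be checked numerically with a minimal floating-point sketch of the CORDIC iteration in both modes. It is only an illustration, not a hardware description: it uses Python floats rather than a redundant or fixed-point format, the names `cordic` and `K` and the choice of 40 iterations are illustrative assumptions, and the index starts at n = 0, which is the convention under which K = 1.64676...

```python
import math

def cordic(x, y, z, mode, iterations=40):
    """Floating-point model of the CORDIC iteration (index n starts at 0):
    x_{n+1} = x_n - d_n*y_n*2^-n
    y_{n+1} = y_n + d_n*x_n*2^-n
    z_{n+1} = z_n - d_n*arctan(2^-n),  with d_n = +1 or -1."""
    for n in range(iterations):
        if mode == "rotation":
            d = 1 if z >= 0 else -1   # d_n = sign(z_n): drive z_n to 0 (sign(0) taken as +1)
        elif mode == "vectoring":
            d = 1 if y < 0 else -1    # d_n = sign(-y_n): drive y_n to 0 (y = 0 gives -1)
        else:
            raise ValueError("mode must be 'rotation' or 'vectoring'")
        t = 2.0 ** -n
        # Simultaneous update: the right-hand side uses the old x, y, z.
        x, y, z = x - d * y * t, y + d * x * t, z - d * math.atan(t)
    return x, y, z

# Each step scales the vector by sqrt(1 + 2^-2n) whatever d_n is, so the
# total gain is K = prod sqrt(1 + 2^-2n) over the same n (about 1.64676).
K = math.prod(math.sqrt(1.0 + 4.0 ** -n) for n in range(40))

x, y, z = cordic(0.5, 0.2, 0.7, "rotation")
print(x / K, 0.5 * math.cos(0.7) - 0.2 * math.sin(0.7))  # the two values should agree closely
```

If `iterations` is changed, K should be recomputed over the same range of n, since the gain depends only on how many steps are performed.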
This algorithm was later generalised [2] to perform most of the basic mathematical functions: hyperbolic, logarithmic, exponential and square-root functions, as well as trigonometric functions, addition, subtraction, multiplication and division. Since the basic iteration is a shift-and-add operation, redundant number systems, which allow carry-propagation-free additions, speed up its execution. In our case, numbers are represented in the Signed Binary Digit (SBD) system, with digits in $\{-1, 0, 1\}$, which is an extension of Avizienis' redundant representation systems [4]. Each digit $c$ is represented by two bits $c^+$ and $c^-$ such that $c = c^+ - c^-$.

Unfortunately, the sign of a redundant operand is given by that of its most significant non-zero digit, which might be any of the operand digits. Getting the sign is thus equivalent to a carry propagation, and the sign test spoils the advantage of redundant systems. Examining only a few Most Significant Digits (MSDs) might be a solution if we can afford to ignore the sign of small operands and accept that $d_n$ is sometimes zero, knowing that $K$ is then no longer a constant. To overcome the problem of the variable $K$, several solutions have been proposed, based on repeating the basic iteration in time [5] or in space [6]. Still, these solutions do not lead to an efficient architecture.

The BKM algorithm is based on the iteration:

$$L_{n+1} = L_n \left(1 + (d^x_n + i\, d^y_n)\, 2^{-n}\right), \qquad E_{n+1} = E_n - \ln\left(1 + (d^x_n + i\, d^y_n)\, 2^{-n}\right), \qquad d^x_n, d^y_n \in \{-1, 0, 1\} \qquad \text{(I)}$$

The values $d_n = d^x_n + i\, d^y_n$ are chosen either to drive $L_n$ to 1, and consequently $E_n$ to $E_1 + \ln(L_1)$ (L-mode), or to drive $E_n$ to 0, and thus $L_n$ to $L_1 \exp(E_1)$ (E-mode). Both results follow from the fact that (I) leaves the product $L_n \exp(E_n)$ unchanged. It can be shown that $n$ iterations are enough to obtain $n$ binary digits of accuracy [3]. In its original form, BKM receives its operands and delivers its results in parallel.

III- ON-LINE ARCHITECTURE FOR BKM

In on-line operators, operands as well as results are transported digit by digit through the different operators, starting from the MSD [7].
Consequently, the MSDs of a result are obtained first and can be fed to the next operator while computation is still going on. To obtain an on-line version of BKM, we first have to feed in the number of operand digits necessary to compute $d_1$. Five fractional digits are enough to guarantee the correctness of the algorithm; this also has an impact on the choice of the $d_n$ values in both modes. The on-line BKM algorithm is as follows:

a- E-mode:
1. Start with $E_1 \in \mathcal{E} = [-0.829823738, 0.8688766517] + i\,[-0.7497832, 0.7497832]$. Let $\hat{E}_n[\gamma]$ be the value of $2^n E_n$ truncated after its $\gamma$-th fractional digit.
2. Initialise $\hat{E}_1 = E_1[\gamma]$ ($\gamma = 5$ is a convenient choice).
3. Iterate (I), with $d^x_n, d^y_n \in \{-1, 0, 1\}$ determined as follows:
   if $\hat{E}^x_n[\gamma] < -1/4$ then $d^x_n = -1$, else if $\hat{E}^x_n[\gamma] \le 1/4$ then $d^x_n = 0$, else $d^x_n = 1$;
   if $\hat{E}^y_n[\gamma] < -3/8$ then $d^y_n = -1$, else if $\hat{E}^y_n[\gamma] \le 3/8$ then $d^y_n = 0$, else $d^y_n = 1$.
4. Result: $L_n \to L_1 e^{E_1}$; $E_n \to 0$.

b- L-mode:
1. Start with $L_1 \in \mathcal{L} = [0.5, 1.3] + i\,[-0.5, 0.5]$. Let $\hat{L}_n[\gamma]$ be the value of $2^n (L_n - 1)$ truncated after its $\gamma$-th fractional digit.
2. Initialise $\hat{L}_1 = L_1[\gamma]$.
3. Iterate (I), with $d^x_n, d^y_n \in \{-1, 0, 1\}$ determined as follows:
   - At step 1: if $\hat{L}^x_1 < -1/4$ then $d^x_1 = 1$, else $d^x_1 = 0$;
     if $\hat{L}^y_1 < -1/4$ then $d^y_1 = 1$, else if $\hat{L}^y_1 \le 1/4$ then $d^y_1 = 0$, else $d^y_1 = -1$.
   - At step $n > 1$: if $\hat{L}^x_n[\gamma] < -1/4$ then $d^x_n = 1$, else if $\hat{L}^x_n[\gamma] \le 1/4$ then $d^x_n = 0$, else $d^x_n = -1$;
     if $\hat{L}^y_n[\gamma] < -1/4$ then $d^y_n = 1$, else if $\hat{L}^y_n[\gamma] \le 1/4$ then $d^y_n = 0$, else $d^y_n = -1$.
4. Result: $L_n \to 1$; $E_n \to E_1 + \ln(L_1)$.

We obtain the on-line BKM architecture given in Fig. 2.

[Fig. 2: On-line implementation of BKM. Blocks: ROM X and ROM Y; the exponential loop (digit inputs $e^x_{5+n}$, $e^y_{5+n}$; outputs EXP x, EXP y); the decision block and its multiplexers, producing $d^x$ and $d^y$; the logarithm loop (digit inputs $l^x_{5+n}$, $l^y_{5+n}$, shifts by $2^{-n+1}$; outputs LN x, LN y); the on-line adders OL1 and OL2.]

The decision block contains PLAs corresponding to the different ways of determining $d_n$ according to the functioning mode; this is done by examining the six MSDs of either $\hat{E}_n$ or $\hat{L}_n$. The PLAs were reduced by 8% simply by eliminating the code 11 for SBD digits (since $c = c^+ - c^-$, the codes 11 and 00 both represent 0, so 00 alone suffices). The ROM tables contain the constants for the real and imaginary parts of the exponential loop, i.e. the real and imaginary parts of $\ln(1 + (d^x_n + i\, d^y_n) 2^{-n})$ scaled by $2^{n-1}$:

$$2^{n-2} \ln\left[1 + d^x_n\, 2^{-n+1} + \left((d^x_n)^2 + (d^y_n)^2\right) 2^{-2n}\right] \quad \text{and} \quad 2^{n-1} \arctan \frac{d^y_n\, 2^{-n}}{1 + d^x_n\, 2^{-n}}$$

The exponential-loop adders are redundant hybrid adders, and the logarithm-loop adders are four-input parallel adders. The OL1 and OL2 blocks are on-line adders [8].

IV- OPERATOR DESIGN STRATEGY

We first built a library of on-line elementary operators, which we integrated into the existing standard-cell library from the ES2 company. We also built a modular full-custom barrel shifter well suited to our architecture. The target technology was ES2's 1.2 µm double-metal, single-polysilicon CMOS process. We generated the ROM tables and PLAs used in the decision and control parts. Special attention was paid to making the design as observable and controllable as possible, in order to facilitate testing and debugging of the circuit.
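Testing the circuit requires reference values, and a software model of the on-line BKM decision rules of Section III is one way to produce them. The sketch below is a floating-point behavioural model only, under assumptions that are not from the design itself: the names `truncate`, `select` and `bkm` are hypothetical; the truncation of $2^n E_n$ and $2^n(L_n - 1)$ after $\gamma$ fractional digits is modelled by rounding down to a multiple of $2^{-\gamma}$, whereas with SBD digits the truncation error can have either sign; the middle branch of each rule is taken on the closed interval between the two thresholds; and 40 iterations are used. It says nothing about the digit-serial datapath or the on-line delay.

```python
import cmath
import math

def truncate(v, gamma):
    """Hypothetical model of truncation after gamma fractional digits:
    round v down to a multiple of 2**-gamma. (With SBD digits the
    truncation error may have either sign; this sketch ignores that.)"""
    return math.floor(v * 2**gamma) / 2**gamma

def select(v, lo, hi, below, above):
    """Return `below` if v < lo, 0 if lo <= v <= hi, and `above` otherwise."""
    if v < lo:
        return below
    if v <= hi:
        return 0
    return above

def bkm(L, E, mode, iterations=40, gamma=5):
    """Floating-point model of iteration (I), n = 1, 2, ...:
    L_{n+1} = L_n * (1 + d_n 2^-n),  E_{n+1} = E_n - ln(1 + d_n 2^-n),
    with d_n = dx + i*dy and dx, dy in {-1, 0, 1}."""
    for n in range(1, iterations + 1):
        if mode == "E":
            # E-mode: drive E_n to 0, looking at 2^n E_n truncated.
            e = 2**n * E
            dx = select(truncate(e.real, gamma), -1/4, 1/4, -1, 1)
            dy = select(truncate(e.imag, gamma), -3/8, 3/8, -1, 1)
        elif mode == "L":
            # L-mode: drive L_n to 1, looking at 2^n (L_n - 1) truncated.
            lam = 2**n * (L - 1)
            lx = truncate(lam.real, gamma)
            ly = truncate(lam.imag, gamma)
            if n == 1:
                dx = 1 if lx < -1/4 else 0   # d^x_1 is restricted to {0, 1}
            else:
                dx = select(lx, -1/4, 1/4, 1, -1)
            dy = select(ly, -1/4, 1/4, 1, -1)
        else:
            raise ValueError("mode must be 'E' or 'L'")
        factor = 1 + complex(dx, dy) * 2.0**-n
        L = L * factor
        E = E - cmath.log(factor)   # principal complex logarithm
    return L, E

# E-mode with L_1 = 1 yields exp(E_1); L-mode with E_1 = 0 yields ln(L_1).
print(bkm(1 + 0j, 0.1 + 0.05j, "E")[0], cmath.exp(0.1 + 0.05j))
print(bkm(1.2 + 0.3j, 0j, "L")[1], cmath.log(1.2 + 0.3j))
```

The inputs used here lie inside the domains $\mathcal{E}$ and $\mathcal{L}$; each printed pair should agree to roughly $2^{-40}$, consistent with one bit of accuracy per iteration.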
A 16-digit SBD prototype was designed, occupying 19 mm². The same circuit was also laid out using only ES2 standard cells, in order to evaluate the efficiency of our library and the impact of the full-custom shifter; this resulted in a 35 mm² chip. The area saving is hence about 45%. Electrical simulation showed that the chip should work at up to 25 MHz. The on-line delay is 3 clock cycles for the exponential loop and 4 for the logarithm loop, to which the initialisation time of 6 clock cycles must be added. The circuit floorplan and plot are given in Fig. 3.

[Fig. 3: Floorplan and plot of the on-line co-processor (the plot is not included in the source file). Floorplan blocks: logarithm-loop adders X and Y, shifters X and Y, decision, control, exponential-loop adders, ROM X, ROM Y, PLAs and clock, I/O ring.]

V- CONCLUSION

We have presented an on-line VLSI operator able to compute either the complex exponential or the complex logarithm, depending on the selected functioning mode. Almost all elementary functions can thus be computed (sine, cosine, arctan, complex exponential, logarithm and multiplication). Furthermore, two operators can be cascaded to compute other functions (2D rotations, square roots...). The developed on-line library seems to offer a good compromise between full-custom solutions (very time-consuming and not reusable) and standard-cell solutions (area-consuming and less efficient): using our library, we saved 45% of the area compared to the standard library.

REFERENCES

[1] J. Volder, "The CORDIC trigonometric computing technique", IRE Transactions on Electronic Computers, Sept. 1959.
[2] J. Walther, "A unified algorithm for elementary functions", Spring Joint Computer Conference, AFIPS Proc. Vol. 38, 1971.
[3] J. C. Bajard, S. Kla and J. M. Muller, "BKM: a new hardware algorithm for complex elementary functions", 11th Symp. on Computer Arithmetic, Windsor, Canada, June 1993.
[4] A. Avizienis, "Signed-digit number representations for fast parallel arithmetic", IRE Transactions on Electronic Computers, vol. EC-1, September 1961.
[5] N. Takagi, T. Asada and S. Yajima, "Redundant CORDIC methods with a constant scale factor", IEEE Transactions on Computers, Vol. 4, No. 9, September 1991.
[6] J. Duprat and J. M. Muller, "The CORDIC algorithm: new results for fast VLSI implementation", Res. Rep. N 9-4, Lab. LIP / ENSL, Lyon, France, 199.
[7] M. D. Ercegovac, "A general hardware-oriented method for evaluation of functions and computations in a digital computer", IEEE Transactions on Computers, Vol. C-26, No. 7, July 1977.
[8] A. Skaf and A. Guyot, "VLSI design of on-line add/multiply algorithms", Proc. International Conference on Computer Design (ICCD'93), Cambridge, USA, October 1993.