On-Line Hardware Implementation for Complex Exponential and Logarithm

Ali SKAF, Jean-Michel MULLER * and Alain GUYOT
Laboratoire TIMA / INPG - 46, Av. Félix Viallet, 3831 Grenoble Cedex
* Laboratoire LIP / ENSL - 46, Allée d'Italie, 69364 Lyon Cedex (FRANCE)
Phone : (+33) 76 57 47 7 - Fax : (+33) 76 47 38 14
E-mail : Ali.Skaf@.imag.fr

ABSTRACT

This work reports on an on-line arithmetic co-processor that implements BKM, a novel algorithm derived from CORDIC, adapted here to on-line arithmetic. A 16 SBD VLSI implementation is also discussed. The resulting circuit may be regarded as the first on-line arithmetic co-processor: depending on its operating mode, the BKM algorithm delivers either the complex exponential or the complex logarithm, from which all basic mathematical operations can be computed. The chip was designed using a cell library dedicated to on-line arithmetic, some full-custom parts, and generated decision and control parts.

I- INTRODUCTION

The CORDIC algorithm (COordinate Rotation DIgital Computer), introduced by Volder in 1959 [1] and generalised by Walther in 1971 [2], is widely used in classical arithmetic co-processors (such as I8087, HP35, M68881, M68882...). When adapted to redundant number systems, the algorithm leads to a more complex and less efficient architecture. In 1993 Bajard, Kla and Muller proposed another algorithm, named BKM [3], in which computation is performed in the complex plane. In this paper, BKM is adapted to on-line architectures.

II- FROM CORDIC TO BKM

The CORDIC algorithm is based on the iteration:

$$x_{n+1} = x_n - d_n y_n 2^{-n}; \quad y_{n+1} = y_n + d_n x_n 2^{-n}; \quad z_{n+1} = z_n - d_n \arctan 2^{-n}; \quad d_n = \pm 1.$$

The $d_n$ values depend on the sign of the operands in two different ways, giving two different computation modes (rotation and vectoring). The results for $n \to \infty$ are summarised in Table 1.
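| | Rotation mode | Vectoring mode |
|---|---|---|
| $d_n$ | $\mathrm{sign}(z_n)$ | $\mathrm{sign}(-y_n)$ |
| $x_n \to$ | $K (x_0 \cos z_0 - y_0 \sin z_0)$ | $K \sqrt{x_0^2 + y_0^2}$ |
| $y_n \to$ | $K (y_0 \cos z_0 + x_0 \sin z_0)$ | $0$ |
| $z_n \to$ | $0$ | $z_0 + \arctan(y_0 / x_0)$ |

Table 1: The CORDIC functioning modes, with $K = \prod_{n=0}^{\infty} \frac{1}{\cos(\arctan 2^{-n})} = 1.64676\ldots$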
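The CORDIC iteration and the limits of Table 1 can be checked numerically with a short floating-point sketch. This is only an illustration under stated assumptions: the function names `cordic` and `cordic_gain`, the `mode` and `iterations` parameters, and the use of Python floats instead of fixed-point shift-and-add hardware are all choices made for this example, not part of the original design. The tie-breaking for a zero operand (treated as positive) is likewise an assumption.

```python
import math


def cordic_gain(iterations=32):
    """Scale factor K = product over n >= 0 of 1 / cos(arctan(2^-n))."""
    k = 1.0
    for n in range(iterations):
        k /= math.cos(math.atan(2.0 ** -n))
    return k


def cordic(x, y, z, mode="rotation", iterations=32):
    """Run the basic CORDIC iteration with d_n = +1 or -1.

    rotation : d_n = sign(z_n),  drives z_n to 0
    vectoring: d_n = sign(-y_n), drives y_n to 0
    A zero operand is treated as positive (assumption of this sketch).
    """
    for n in range(iterations):
        if mode == "rotation":
            d = 1.0 if z >= 0 else -1.0
        else:
            d = 1.0 if y < 0 else -1.0
        shift = 2.0 ** -n
        # Both updates use the old x and y, as in the iteration above.
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
    return x, y, z


if __name__ == "__main__":
    print("K =", cordic_gain())
    print("rotation :", cordic(1.0, 0.0, 0.5, mode="rotation"))
    print("vectoring:", cordic(1.0, 1.0, 0.0, mode="vectoring"))
```

In rotation mode the first two outputs approach K cos z_0 and K sin z_0 for the input (1, 0, z_0); in vectoring mode the first output approaches K times the modulus of (x_0, y_0) and the third accumulates the angle.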

This algorithm was later generalised [2] to compute most elementary functions: hyperbolic, logarithmic, exponential, square root and trigonometric functions, as well as addition, subtraction, multiplication and division. Since the basic iteration is a shift-and-add operation, redundant number systems, which allow carry-propagation-free additions, speed up the execution. In our case, numbers are represented in the Signed Binary Digit (SBD) system, with digits $\{-1, 0, 1\}$, which is an extension of Avizienis' redundant representation systems [4]. Each digit $c$ is represented by two bits $c^+$ and $c^-$ such that $c = c^+ - c^-$.

Unfortunately, the sign of a redundant operand is that of its most significant non-zero digit, which may be any of the operand digits. Determining the sign is therefore equivalent to a carry propagation, and the sign test spoils the advantage of redundant systems. Examining only a few Most Significant Digits (MSDs) is a possible solution, provided we can afford to ignore the sign of small operands and accept that $d_n$ is sometimes zero; $K$ is then no longer a constant. To overcome the problem of a variable $K$, many solutions have been proposed, based on repeating the basic iteration in time [5] or in space [6]. Still, these solutions do not lead to an efficient architecture.

The BKM algorithm is based on the iteration:

$$L_{n+1} = L_n \left(1 + (d_n^x + i\, d_n^y)\, 2^{-n}\right); \quad E_{n+1} = E_n - \ln\left(1 + (d_n^x + i\, d_n^y)\, 2^{-n}\right) \quad \text{with } d_n^x, d_n^y \in \{-1, 0, 1\} \qquad \text{(I)}$$

The $d_n$ values are chosen either to drive $L_n$ to 1 and consequently $E_n$ to $E_1 + \ln(L_1)$ (L-mode), or to drive $E_n$ to 0 and thus $L_n$ to $L_1 \exp(E_1)$ (E-mode). It can be shown that $n$ iterations are enough to obtain $n$ binary digits of accuracy [3]. In its original form, BKM receives its operands and delivers its results in parallel.

III- ON-LINE ARCHITECTURE FOR BKM

In on-line operators, operands as well as results are transported digit by digit through the different operators, starting from the MSD [7].
Consequently, the MSDs of the result are obtained first and can be fed to the next operator while computation is still going on. To obtain an on-line version of BKM, we first have to feed in enough operand digits to be able to compute $d_1$. Five fractional digits are enough to guarantee the correctness of the algorithm. This also has an impact on the choice of the $d_n$ values in both modes. The on-line BKM is given by the following algorithm:

a- E-mode:

1. Start with $E_1 \in \mathbf{E}$ = [-.829823738, .8688766517] + i [-.7497832, .7497832]. Let $\hat{E}_n[\gamma]$ be the value of $2^n E_n$ truncated after its $\gamma$ fractional digits.
2. Initialise $\hat{E}_1 = E_1[\gamma]$ ($\gamma = 5$ is a convenient choice).
3. Iterate (I), with $d_n^x, d_n^y \in \{-1, 0, 1\}$ determined as follows:
   if $\hat{E}_n^x[\gamma] < -\frac{1}{4}$ then $d_n^x = -1$, else if $\hat{E}_n^x[\gamma] \le \frac{1}{4}$ then $d_n^x = 0$, else $d_n^x = 1$;
   if $\hat{E}_n^y[\gamma] < -\frac{3}{8}$ then $d_n^y = -1$, else if $\hat{E}_n^y[\gamma] \le \frac{3}{8}$ then $d_n^y = 0$, else $d_n^y = 1$.
4. Result: $L_n \to L_1 e^{E_1}$; $E_n \to 0$.

b- L-mode:

1. Start with $L_1 \in \mathbf{L}$ = [.5, 1.3] + i x [-.5, .5]. Let $\hat{L}_n[\gamma]$ be the value of $2^n (L_n - 1)$ truncated after its $\gamma$ fractional digits.
2. Initialise $\hat{L}_1 = L_1[\gamma]$.
3. Iterate (I), with $d_n^x, d_n^y \in \{-1, 0, 1\}$ determined as follows:
   - At step 1:
     if $\hat{L}_1^x < -\frac{1}{4}$ then $d_1^x = 1$, else $d_1^x = 0$;
     if $\hat{L}_1^y < -\frac{1}{4}$ then $d_1^y = 1$, else if $\hat{L}_1^y \le \frac{1}{4}$ then $d_1^y = 0$, else $d_1^y = -1$.
   - At step n > 1:
     if $\hat{L}_n^x[\gamma] < -\frac{1}{4}$ then $d_n^x = 1$, else if $\hat{L}_n^x[\gamma] \le \frac{1}{4}$ then $d_n^x = 0$, else $d_n^x = -1$;
     if $\hat{L}_n^y[\gamma] < -\frac{1}{4}$ then $d_n^y = 1$, else if $\hat{L}_n^y[\gamma] \le \frac{1}{4}$ then $d_n^y = 0$, else $d_n^y = -1$.
4. Result: $L_n \to 1$; $E_n \to E_1 + \ln(L_1)$.

We obtain the architecture for on-line BKM given in Fig. 2.

[Fig. 2: On-line implementation for BKM. Block labels in the figure: ROM X, ROM Y, exponential loop (EXP x, EXP y), logarithm loop (LN x, LN y), on-line adders OL1 and OL2, MUX, Decision.]

The decision block contains PLAs that correspond to the different ways of determining $d_n$ according to the operating mode. This is done by examining the six MSDs of either $\hat{E}_n$ or $\hat{L}_n$. The PLAs have been reduced by 8% by simply eliminating the code 11 for the SBD. The ROM tables contain the constants for the real and imaginary parts of the exponential loop:

$$2^{n-2} \ln\left[1 + d_n^x 2^{-n+1} + \left((d_n^x)^2 + (d_n^y)^2\right) 2^{-2n}\right] \quad \text{and} \quad 2^{n-1}\, d_n^y \arctan\frac{2^{-n}}{1 + d_n^x 2^{-n}}$$

The exponential-loop and logarithm-loop adders are redundant hybrid adders and four-input parallel adders respectively. The OL1 and OL2 blocks are on-line adders [8].

IV- OPERATOR DESIGN STRATEGY

We began by building a library of elementary on-line operators, which we integrated into the existing standard cell library from ES2. We also built a modular full-custom barrel shifter well adapted to our architecture. The target technology was the ES2 1.2 µm double-metal, single-polysilicon CMOS process. We generated the ROM tables and the PLAs used in the decision and control parts. Special attention was paid to making the design as observable and controllable as possible, in order to facilitate testing and debugging of the circuit.
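Returning to iteration (I) and the ROM constants of Section III, the following floating-point sketch may help when regenerating or checking the tables. It assumes that the two constants are the real and imaginary parts of $2^{n-1} \ln(1 + (d_n^x + i\, d_n^y) 2^{-n})$, which is how the closed forms read; the names `rom_entry` and `bkm_step` are hypothetical, and `cmath.log` stands in for the hardware tables. It does not implement the digit-selection rules, only the property that $L_n \exp(E_n)$ is unchanged by any step of (I).

```python
import cmath
import math

DIGITS = (-1, 0, 1)


def rom_entry(n, dx, dy):
    """Closed-form real and imaginary parts of 2^(n-1) * ln(1 + (dx + i*dy) * 2^-n).

    Valid for n >= 1, where the argument of the logarithm has a positive real part.
    """
    re = 2.0 ** (n - 2) * math.log(
        1.0 + dx * 2.0 ** (-n + 1) + (dx * dx + dy * dy) * 2.0 ** (-2 * n)
    )
    im = 2.0 ** (n - 1) * dy * math.atan(2.0 ** -n / (1.0 + dx * 2.0 ** -n))
    return re, im


def bkm_step(L, E, n, dx, dy):
    """One step of iteration (I) on complex floats, for given digits dx, dy."""
    factor = 1.0 + complex(dx, dy) * 2.0 ** -n
    return L * factor, E - cmath.log(factor)


if __name__ == "__main__":
    # Print the nine table entries for the first iteration.
    for dx in DIGITS:
        for dy in DIGITS:
            print(dx, dy, rom_entry(1, dx, dy))

    # L * exp(E) is preserved by every step, whatever digits are chosen.
    L, E = complex(0.8, 0.3), complex(0.2, -0.1)
    start = L * cmath.exp(E)
    for n, (dx, dy) in enumerate([(1, -1), (0, 1), (-1, 0), (1, 1)], start=1):
        L, E = bkm_step(L, E, n, dx, dy)
    print(abs(L * cmath.exp(E) - start))
```

Because the product $L_n \exp(E_n)$ is invariant, driving $E_n$ to 0 leaves $L_n$ equal to $L_1 \exp(E_1)$, and driving $L_n$ to 1 leaves $E_n$ equal to $E_1 + \ln(L_1)$, which is the behaviour stated for the two modes.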

A 16 SBD prototype was designed, occupying 19 mm². The same circuit was also laid out using only ES2 standard cells, in order to evaluate the efficiency of our library and the impact of the full-custom shifter; this resulted in a 35 mm² chip. The area saving is hence 45%. Electrical simulation showed that the chip would work at up to 25 MHz. The on-line delay is 3 and 4 clock cycles for the exponential and logarithm loops respectively, to which the initialisation time of 6 clock cycles has to be added. The circuit floorplan and plot are given in Fig. 3.

[Fig. 3: Obtained on-line co-processor floorplan and plot (figure not included in the PostScript file). Floorplan labels: logarithm loop adders X and Y, shifters X and Y, decision, control, exponential loop adders, ROM X, ROM Y, PLAs & clock, I/O ring.]

V- CONCLUSION

We presented in this work an on-line VLSI operator that delivers either the complex exponential or the complex logarithm, depending on the selected operating mode. We can thus compute almost all elementary functions (sine, cosine, arctan, complex exponential, logarithm and multiplication). Furthermore, two operators can be cascaded to compute other functions (2D rotations, square roots...). The developed on-line library seems to offer a good compromise between full-custom solutions (very time consuming and not reusable) and standard cell solutions (area consuming and less efficient): using our library, we saved 45% of the area compared to the standard library.

REFERENCES

[1] J. Volder, "The CORDIC computing technique", IRE Transactions on Computers, Sept. 1959.
[2] J. Walther, "A unified algorithm for elementary functions", Joint Computer Conference, Vol. 38, 1971.
[3] J. C. Bajard, S. Kla and J. M. Muller, "BKM: a new hardware algorithm for complex elementary functions", 11th Symp. on Computer Arithmetic, Windsor, Canada, June 1993.
[4] A. Avizienis, "Signed-digit number representation for fast parallel arithmetic", IRE Transactions on Electronic Computers, Vol. EC-10, September 1961.
[5] N. Takagi, T. Asada and S. Yajima, "Redundant CORDIC methods with a constant scale factor", IEEE Transactions on Computers, Vol. 40, N 9, September 1991.
[6] J. Duprat and J. M. Muller, "The CORDIC algorithm: new results for fast VLSI implementation", Res. Rep. N 9-4, Lab. LIP / ENSL, Lyon, France, 199.
[7] M. D. Ercegovac, "A general hardware-oriented method for evaluation of functions and computation in digital computer", IEEE Transactions on Computers, Vol. C-26, N 7, July 1977.
[8] A. Skaf and A. Guyot, "VLSI design of on-line add/multiply algorithms", Proc. International Conference on Computer Design (ICCD'93), Cambridge, USA, October 1993.