Design and Implementation of Efficient Modulo 2 n +1 Adder

Similar documents
Novel Modulo 2 n +1Multipliers

Low Power, High Speed Parallel Architecture For Cyclic Convolution Based On Fermat Number Transform (FNT)

Design of Low Power, High Speed Parallel Architecture of Cyclic Convolution Based on Fermat Number Transform (FNT)

AREA EFFICIENT MODULAR ADDER/SUBTRACTOR FOR RESIDUE MODULI

A High-Speed Realization of Chinese Remainder Theorem

KEYWORDS: Multiple Valued Logic (MVL), Residue Number System (RNS), Quinary Logic (Q uin), Quinary Full Adder, QFA, Quinary Half Adder, QHA.

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder

Design and Implementation of Carry Tree Adders using Low Power FPGAs

A Low-Error Statistical Fixed-Width Multiplier and Its Applications

DESIGN OF LOW POWER-DELAY PRODUCT CARRY LOOK AHEAD ADDER USING MANCHESTER CARRY CHAIN

The equivalence of twos-complement addition and the conversion of redundant-binary to twos-complement numbers

EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS

An Area Efficient Enhanced Carry Select Adder

Novel Bit Adder Using Arithmetic Logic Unit of QCA Technology

Comparative analysis of QCA adders

Volume 3, No. 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at

An Effective New CRT Based Reverse Converter for a Novel Moduli Set { 2 2n+1 1, 2 2n+1, 2 2n 1 }

On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System

Performance Evaluation of Signed-Digit Architecture for Weighted-to-Residue and Residue-to-Weighted Number Converters with Moduli Set (2 n 1, 2 n,

A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1))

Design and Comparison of Wallace Multiplier Based on Symmetric Stacking and High speed counters

Area Efficient Sparse Modulo 2 n 3 Adder

International Journal of Advanced Research in Computer Science and Software Engineering

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Reduced-Error Constant Correction Truncated Multiplier

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

Design of A Efficient Hybrid Adder Using Qca

DESİGN AND ANALYSİS OF FULL ADDER CİRCUİT USİNG NANOTECHNOLOGY BASED QUANTUM DOT CELLULAR AUTOMATA (QCA)

Area-Time Optimal Adder with Relative Placement Generator

ECE380 Digital Logic. Positional representation

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

Arithmetic in Integer Rings and Prime Fields

FIXED WIDTH BOOTH MULTIPLIER BASED ON PEB CIRCUIT

CMPUT 329. Circuits for binary addition

I. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya

Carry Look Ahead Adders

Part II Addition / Subtraction

Synthesis of Saturating Counters Using Traditional and Non-traditional Basic Counters

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

THE discrete sine transform (DST) and the discrete cosine

Forward and Reverse Converters and Moduli Set Selection in Signed-Digit Residue Number Systems

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

Hardware Design I Chap. 4 Representative combinational logic

Global Optimization of Common Subexpressions for Multiplierless Synthesis of Multiple Constant Multiplications

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

Lecture 8: Sequential Multipliers

HARDWARE IMPLEMENTATION OF FIR/IIR DIGITAL FILTERS USING INTEGRAL STOCHASTIC COMPUTATION. Arash Ardakani, François Leduc-Primeau and Warren J.

ECE 407 Computer Aided Design for Electronic Systems. Simulation. Instructor: Maria K. Michael. Overview

Implementation of Reversible Control and Full Adder Unit Using HNG Reversible Logic Gate

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples

Resource Efficient Design of Quantum Circuits for Quantum Algorithms

Adders, subtractors comparators, multipliers and other ALU elements

14:332:231 DIGITAL LOGIC DESIGN

EECS150 - Digital Design Lecture 21 - Design Blocks

ALUs and Data Paths. Subtitle: How to design the data path of a processor. 1/8/ L3 Data Path Design Copyright Joanne DeGroat, ECE, OSU 1

Part II Addition / Subtraction

ISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7,

On A Large-scale Multiplier for Public Key Cryptographic Hardware

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

DESIGN OF AREA DELAY EFFICIENT BINARY ADDERS IN QUANTUM-DOT CELLULAR AUTOMATA

Adders, subtractors comparators, multipliers and other ALU elements

Variable Latency Speculative Addition: A New Paradigm for Arithmetic Circuit Design

Design and Study of Enhanced Parallel FIR Filter Using Various Adders for 16 Bit Length

Research Article Implementation of Special Function Unit for Vertex Shader Processor Using Hybrid Number System

EECS150. Arithmetic Circuits

Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) *

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing

Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two.

EXPLOITING RESIDUE NUMBER SYSTEM FOR POWER-EFFICIENT DIGITAL SIGNAL PROCESSING IN EMBEDDED PROCESSORS

A VLSI Algorithm for Modular Multiplication/Division

Chapter 4. Combinational: Circuits with logic gates whose outputs depend on the present combination of the inputs. elements. Dr.

A Novel Low Power 1-bit Full Adder with CMOS Transmission-gate Architecture for Portable Applications

BINARY TO GRAY CODE CONVERTER IMPLEMENTATION USING QCA

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

CMP 334: Seventh Class

ECE 645: Lecture 3. Conditional-Sum Adders and Parallel Prefix Network Adders. FPGA Optimized Adders

Implementation of Carry Look-Ahead in Domino Logic

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

SUFFIX PROPERTY OF INVERSE MOD

COMBINATIONAL LOGIC FUNCTIONS

An Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2)

Logic. Combinational. inputs. outputs. the result. system can

FAST FIR ALGORITHM BASED AREA-EFFICIENT PARALLEL FIR DIGITAL FILTER STRUCTURES

FPGA IMPLEMENTATION OF 4-BIT AND 8-BIT SQUARE CIRCUIT USING REVERSIBLE LOGIC

Optimization of new Chinese Remainder theorems using special moduli sets

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

On the Complexity of Error Detection Functions for Redundant Residue Number Systems

Residue Number Systems Ivor Page 1

Lecture 4. Adders. Computer Systems Laboratory Stanford University

CSEE 3827: Fundamentals of Computer Systems. Combinational Circuits

Power Minimization of Full Adder Using Reversible Logic

Transformation Techniques for Real Time High Speed Implementation of Nonlinear Algorithms

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

Class Website:

Median architecture by accumulative parallel counters

Transcription:

www..org 18 Design and Implementation of Efficient Modulo 2 n +1 Adder V. Jagadheesh 1, Y. Swetha 2 1,2 Research Scholar(INDIA) Abstract In this brief, we proposed an efficient weighted modulo (2 n +1) adders. This is achieved by modifying existing diminished-1 modulo (2 n +1) adders to incorporate simple correction schemes. Our proposed adders can produce modulo sums within the range {0, 2 n }, which is more than the range {0, 2 n 1} produced by existing diminished-1 modulo (2 n +1) adders. And our proposed adder gives better performance in terms of area and power when compared with previously existing architectures. We have implemented the proposed adders using 0.18-µm technology, and also compared the performance parameters of those adders. Keywords - Modulo arithmetic, residue number system (RNS), parallel-prefix carry computation, computer arithmetic, VLSI. I. INTRODUCTION The Residue Number System (RNS) [1] has been employed for efficient parallel carry-free arithmetic computations (addition, subtraction, and multiplication) in DSP applications as the computations for each residue channel can independently be done without carry propagation. Since RNS based computations can achieve significant speedup over the binary-system-based computation, they are widely used in DSP processors, cryptography algorithms, FIR filters, and communication components [2]-[3]. Arithmetic modulo 2 n +1 computation is one of the most common RNS operations that are used in pseudorandom number generation and cryptography. The modulo 2 n +1 addition is the most crucial step among the commonly used moduli sets, such as {2 n 1, 2 n, 2 n + 1}, {2 n 1, 2 n, 2 n + 1, 2 2n +1}, and {2 n 1, 2 n, 2 n + 1, 2 n+1 + 1}. There are many previously reported methods to speed up the modulo 2 n +1 addition. Depending on the input/output data representations, these methods can be classified into two categories, namely, diminished-1 [4]-[10],[12] and weighted [11] respectively. In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo 2 n +1 addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from/to the binary weighted system. On the other hand, the weighted-1 representation uses (n+1)-bit operands for computations, avoiding the overhead of translators, but requires larger area compared with the diminished-1 representations. But in the diminished-1 representation, the hardware required for the decrementing and incrementing the input and output respectively, are will take so much area and also it requires zero-detection circuit. In this brief, we propose efficient weighted modulo 2 n +1 adder design using diminished-1 adders with simple correction schemes. This is achieved by subtracting the sum of two (n + 1)-bit input numbers by the constant 2 n + 1 and producing carry and sum vectors. The modulo 2 n +1 addition can then be performed using parallel-prefix structure diminished- 1 adders by taking in the sum and carry vectors plus the inverted end-around carry with simple correction schemes. In addition, our proposed adders do not require the hardware for zero detection that is needed in diminished-1 modulo 2 n +1 addition. The rest of this brief is organized as follows. In Section II, we will discuss basics of parallel prefix adders, and the basics of modulo 2 n +1 adders in Section III. Our proposed efficient weighted modulo 2 n +1 adder is presented in Section IV. The synthesis results and comparisons are given in Section V, and Section VI concludes this brief. II. PARALLEL PREFIX ADDITION BASICS Suppose that A=A n-1 A n-2..a 1 A 0 and B=B n- 1B n-2..b 1 B 0 represent the two numbers to be added and S=S n-1 S n-2..s 1 S 0 denotes their sum. An adder can be considered as a three-stage circuit. These are called as pre-processing stage, carry calculation stage and post-processing stage. The pre-processing stage computes the carry-generate bits G i and carrypropagate bits P i, for every i, where 0 i (n-1), according to Gi=Ai Bi and Pi=Ai^Bi where and ^ denote logical AND and X-OR, respectively. The second stage of the adder, hereafter

www..org 19 called the carry calculation unit, computes the carry signals Ci, for 0 i (n-1) using the carry generate and propagate bits Gi and Pi. The third stage computes the sum bits according to Si=Pi^Ci-1 where C i-1 represent previous carry generate signals. Carry computation is transformed into a parallel prefix problem using the operator, which associates pairs of generate and propagate signals and it is defined as (G,P) (G,P )=(G P G, P P ) where denote logical OR. In a series of associations of consecutive generate/propagate pairs (G,P), the notation (G k:j,p k:j ), with k>j, is used to denote the group generate/propagate term produced out of bits k,k-1, j, that is, (Gk:j,Pk:j)=(Gk,Pk) (Gk-1,Pk-1) (Gj,Pj) Since every carry C i =G i:0, a number of algorithms have been introduced for computing all the carries using only operators. Fig.1 presents the most wellknown approaches for the design of an 8-bit adder[12], while Fig.2 depicts the logic-level implementation of the basic cells used throughout the paper. III. MODULO (2 n +1) ADDITION BASICS Modulo-m addition of mod-m residues A and B (0 A,B<m) is defined as[1]: Replacing m with 2 n +1 yields the equation for mod- (2 n +1) addition[7] and is given as follows: The efficient way to implement modulo 2 n +1 adder is by using diminished-1 adder with correction unit. Consider two (n+1)-bit numbers A and B, where 0 A,B 2 n, the values of diminished-1 of A and B are denoted by A* = A-1 and B* = B-1, respectively. The diminished-1 sum, S* can be computed by, S* = S-1 2 n +1 = A+B-1 2 n +1 = A*+B*+ out 2 n Fig. 1 Parallel prefix 8-bit integer adders (a)sklansky (b)koggestone (c)brent-kung (d)knowles Here out is denoted as the inverted end-around carry of the diminished-1 modulo 2 n sum of n-bit A and B. So, the diminished-1 modulo 2 n +1 addition can be represented as follows: Normally we preferred parallel prefix structures for designing diminished-1 modulo (2 n +1) adders due to its high speed carry generation. The diminished-1 modulo 2 8 +1 adder using Sklansky parallel prefix structure is shown in Fig.3 and the flowchart process of modulo 2 n +1 addition is shown in Fig.4. Fig. 2 Logic level implementation of Basic cells used in parallel prefix adders

www..org 20 This can then be expressed as, Fig. 3 Diminished-1 Modulo 2 8 +1 adder using Sklansky prefix structure From this, it can easily be seen that the value of the modulo 2 n + 1 addition can be obtained by first subtracting the value of the sum of A and B by (2 n +1) (i.e., 0111,..., 1) and then using the diminished-1 adder to get the final modulo sum by making the inverted end-around carry as the carry-in. Now, we present the method of weighted modulo 2 n +1 addition of A and B as follows. Denoting Y and U as the carry and sum vectors of the summation of A, B and (2 n +1), where Y = y n 2 y n 3,..., y 0Ӯ n 1 and U = u n 1 u n 2,..., u 0, the modulo addition can be expressed as follows: A+B-(2 n +1) 2 n = = = Fig. 4 Modulo (2 n +1) addition Flowchart process IV. PROPOSED EFFICIENT WEIGHTED MODULO (2 n +1) ADDERS Given two (n + 1)-bit inputs A = a n a n 1,..., a 0 and B = b n b n 1,..., b 0, where 0 A,B 2 n. The modulo 2 n + 1 of A + B can be represented as follows: This equation can be stated as, For i=0 to (n-2), the values of y i and u i can be expressed as y i =a i b i and u i =~(a i^b i ), respectively (, ~ and ^ are denoted as logical OR, NOT and X- OR operation respectively). Since the bit widths of Y and U are only n-bits, the values of y n-1 and u n-1 are required to be computed taking the values of a n, b n, a n-1, and b n-1 into consideration (i.e., y n-1 and u n-1 are the values of the carry and sum produced by (2a n +2b n +a n-1 +b n-1 +1, respectively). It should be noted that 0 A,B 2 n, which means a n =a n-1 =1 or b n =b n-1 =1 will cause the value of A and B to exceed

www..org 21 the range of {0,2 n }. Thus, these input combinations (i.e. a n =a n-1 =1 or b n =b n-1 =1) are not allowed and can be viewed as don t care conditions, which can help us simplify the circuits for generating y n-1 and u n-1. That is, the maximum value of (2a n +2b n +a n-1 +b n-1 +1) is 5, which occurs at a n =a n-1 =1(i.e., the maximum value of y n-1 is 2). The truth table for generating y n- 1,u n-1 and FIX is given in Table 1, where x is denoted as don t care. The reason for FIX is that, under some conditions, y n-1 =2(e.g., a n =b n =1 and a n-1 =b n-1 =0), which cannot be represented by 1-bit line (marked as * in Table 1); therefore, the value of y n-1 is set to 1, and the remaining value of carry (i.e., 1) is set to FIX. Notice that FIX is wired-or with the carry-out of Y +U (i.e., c out ) to be the inverted end around carry (denoted by ~(cout FIX)) as the carry-in for the diminished-1 addition stage later on. When y n-1 =2, FIX=1; otherwise, FIX=0. Table 1. Truth table for generating y n-1, u n-1 and FIX (*: conditions when y n-1 =2) According to Table 1, we can have equations as follows: y n-1 =(an bn an-1 bn-1) u n-1 =~(an-1^bn-1) and FIX =(anbn anbn-1 an-1bn) Based on the aforementioned, our proposed modulo 2 n + 1 addition of A and B is equivalent to, A+B 2 n +1 2 n = A+B-( +1) 2 n = Y +U 2 n +(~(cout FIX)) Two examples for our proposed addition methods are given as follows: Ex.1: Suppose n=4, A=16 10 =10000 2, and B=15 10 =01111 2, respectively. Step 1) (A+B)-(2 n +1) => Y =1110 2, U =0000 2, FIX=1 Step 2) Y +U =1110 2, c out =0, => Y +U +(~(cout FIX))=11102= 16+15 17=1410 Ex.2: Suppose n=4, A=11 10 =01011 2, and B=5 10 =00101 2, respectively. Step 1) (A+B)-(2 n +1) => Y =1110 2, U =0001 2, FIX=0 Step 2) Y +U =1111 2, c out =0, => Y +U +(~(cout FIX))=100002= 11+5 17=1610 The architecture for our proposed adder is given in Fig. 5 and proposed modulo 2 8 +1 adder using Sklansky parallel prefix structure is given in Fig. 6. V. RESULT ANALYSIS We have chosen Verilog HDL to design our proposed modulo 2 n +1 adder. We have used Cadence Incisive Enterprise simulator for simulation and RTL compiler for synthesis. We have targeted the design to TSMC 180nm CMOS Technology library. We have constrained the design for maximum operating frequency of 400MHz. The table 2&3 shows the comparative analysis of the existing[11] and the proposed modulo 2 n +1 adder without and with speed constraints for different parallel prefix structures like Sklansky, Koggestone, Brent-Kung and Knowles structures. Our proposed architecture gives the better results in terms of area and power when compared with existing architecture[11] according to Brent- Kung prefix structure. Our proposed architecture is efficient by 3% w.r. to cell area and efficient by 8% w.r. to power when compared with previous existing architecture[11]. For the proposed architecture, it is been observed that modulo 2 n +1 Brent-Kung parallel prefix structure is efficient by 10% w.r. to speed and efficient by 55% w.r. to power consumption and 5% efficient w.r. to cell area when compared with other prefix structures without timing constraints. And also, it is been observed that modulo 2 n +1 Sklansky parallel prefix structure is efficient by 10% w.r. to power consumption and 5% efficient w.r. to cell area when compared with other prefix structures with timing constraints.

www..org 22 Fig. 5(a)Architecture of our proposed weighted modulo 2 n +1 adder with the correction scheme (b)architecture of the A+B (2 n +1) Carry save adder (c)architecture of the correction scheme (d)architecture of FA+ (e)architecture of FAF Fig. 6 Proposed modulo 2 8 +1 Adder using Sklansky parallel prefix structure Table 2. Comparative Results of [11] Table 3. Comparative Results of proposed modulo 2 8 +1 adder

www..org 23 Fig.8(b) Results of Proposed architecture Fig. 7 Simulation Results of Modulo 2 8 +1 Adder Fig. 8(c) Comparative Results of Existing[11] and Proposed architectures Fig.8(a) Results of Existing[11] architecture CONCLUSION In this brief, an efficient modulo (2 n +1) adder has been proposed. This has been achieved by modifying the existing diminished-1 modulo adders to incorporate simple correction schemes. The proposed adders can perform modulo 2 n +1 addition and produce sums that are within the range {0,2 n }. Synthesis results show that our proposed architecture gives the better results in terms of area and power when compared with previous existing architectures. And proposed modulo (2 n +1) adder using Brent- Kung parallel prefix structure will gives better performance results in terms of area, power and delay and if we consider with timing constraints then modulo (2 n +1) adder using Sklansky parallel prefix structure will gives better performance results in

www..org 24 terms of area and power when compared with other prefix structures. REFERENCES [1] M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, and F. J. Taylor, Residue Number System Arithmetic: Modern Applications in Digital Signal Processing, New York: IEEE Press, 1986. [2] J. Ramirez, A. Garcia, S. Lopez-Buedo, and A. Lloris, RNS-enabled digital signal processor design, Electron. Lett., vol. 38, no. 6, pp. 266 268, Mar. 2002. [3] S. Madhukumar and F. Chin, Enhanced architecture for residue number system-based CDMA for high-rate data transmission, IEEE Trans. Wireless Commun., vol. 3, no. 5, pp. 1363 1368, Sep. 2004. [4] H. Nozaki, M. Motoyama, A. Shimbo, and S. Kawamura, Implementation of RSA algorithm based on RNS montgomery multiplication, in Proc. 3rd Int. Workshop Cryptographic Hardware Embedded Syst., vol. 2162, Lecture Notes in Computer Science, New York, 2001, pp. 364 376. [5] L. Leibowitz, A simplified binary arithmetic for the Fermat number transform, IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24, no. 5, pp. 356 359, Oct. 1976. [6] R. Zimmermann, Efficient VLSI implementation of modulo 2 n ±1 addition and multiplication, in Proc. 14th IEEE Symp. Comput. Arithmetic, Apr. 1999, pp. 158 167. [7] H. T. Vergos, C. Efstathiou, and D. Nikolos, Diminished-one modulo 2 n +1 adder design, IEEE Trans. Comput., vol. 51, no. 12, pp. 1389 1399, Dec. 2002. [8] C. Efstathiou, H. T. Vergos, and D. Nikolos, Modulo 2 n ±1 adder design using select-prefix blocks, IEEE Trans. Comput., vol. 52, no. 11, pp. 1399 1406, Nov. 2003. [9] S.-H. Lin and M.-H. Sheu, VLSI design of diminished-one modulo 2 n +1 adder using circular carry selection, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 9, pp. 897 901, Sep. 2008. [10] T.-B. Juang, M.-Y. Tsai, and C.-C. Chiu, Corrections on VLSI design of diminishedone modulo 2 n +1 adder using circular carry selection, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 3, pp. 260 261, Mar. 2009. [11] H. T. Vergos and C. Efstathiou, A unifying approach for weighted and diminished-1 modulo 2 n +1 addition, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 10, pp. 1041 1045, Oct. 2008. [12] Haridimos T. Vergos and Giorgos Dimitrakopoulos, On Modulo 2n+1 Adder Design, IEEE Transactions on Computers,vol.61,no.2 February 2012.