Design and Implementation of Carry Tree Adders using Low Power FPGAs

Similar documents
ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Implementation of Carry Look-Ahead in Domino Logic

Comparative analysis of QCA adders

Lecture 4. Adders. Computer Systems Laboratory Stanford University

HIGH SPEED AND INDEPENDENT CARRY CHAIN CARRY LOOK AHEAD ADDER (CLA) IMPLEMENTATION USING CADENCE-EDA K.Krishna Kumar 1, A.

Design of A Efficient Hybrid Adder Using Qca

Switching Activity Calculation of VLSI Adders

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Novel Bit Adder Using Arithmetic Logic Unit of QCA Technology

Design and Implementation of Efficient Modulo 2 n +1 Adder

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Logical Effort Based Design Exploration of 64-bit Adders Using a Mixed Dynamic-CMOS/Threshold-Logic Approach

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder

High Speed Time Efficient Reversible ALU Based Logic Gate Structure on Vertex Family

A Novel Low Power 1-bit Full Adder with CMOS Transmission-gate Architecture for Portable Applications

EECS 427 Lecture 8: Adders Readings: EECS 427 F09 Lecture 8 1. Reminders. HW3 project initial proposal: due Wednesday 10/7

An Area Efficient Enhanced Carry Select Adder

DESIGN OF LOW POWER-DELAY PRODUCT CARRY LOOK AHEAD ADDER USING MANCHESTER CARRY CHAIN

Implementation of Reversible Control and Full Adder Unit Using HNG Reversible Logic Gate

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

DESIGN OF AREA DELAY EFFICIENT BINARY ADDERS IN QUANTUM-DOT CELLULAR AUTOMATA

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

Design and Analysis of Comparator Using Different Logic Style of Full Adder

I. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya

EE241 - Spring 2001 Advanced Digital Integrated Circuits

Novel Modulo 2 n +1Multipliers

Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two.

ECE 645: Lecture 3. Conditional-Sum Adders and Parallel Prefix Network Adders. FPGA Optimized Adders

Area-Time Optimal Adder with Relative Placement Generator

Design of Reversible Code Converters Using Verilog HDL

Binary addition by hand. Adding two bits

Design of High-speed low power Reversible Logic BCD Adder Using HNG gate

Where are we? Data Path Design

Power Minimization of Full Adder Using Reversible Logic

L8/9: Arithmetic Structures

EE141- Spring 2004 Digital Integrated Circuits

Where are we? Data Path Design. Bit Slice Design. Bit Slice Design. Bit Slice Plan

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples

Design of a Novel Reversible ALU using an Enhanced Carry Look-Ahead Adder

Robust Energy-Efficient Adder Topologies

Design and Comparison of Wallace Multiplier Based on Symmetric Stacking and High speed counters

BCD Adder Design using New Reversible Logic for Low Power Applications

PERFORMANCE ANALYSIS OF CLA CIRCUITS USING SAL AND REVERSIBLE LOGIC GATES FOR ULTRA LOW POWER APPLICATIONS

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

COMP 103. Lecture 16. Dynamic Logic

Chapter 4. Combinational: Circuits with logic gates whose outputs depend on the present combination of the inputs. elements. Dr.

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

FPGA IMPLEMENTATION OF BASIC ADDER CIRCUITS USING REVERSIBLE LOGIC GATES

VHDL DESIGN AND IMPLEMENTATION OF C.P.U BY REVERSIBLE LOGIC GATES

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

GALOP : A Generalized VLSI Architecture for Ultrafast Carry Originate-Propagate adders

Low Power, High Speed Parallel Architecture For Cyclic Convolution Based On Fermat Number Transform (FNT)

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

DESIGN AND ANALYSIS OF A FULL ADDER USING VARIOUS REVERSIBLE GATES

Circuit for Revisable Quantum Multiplier Implementation of Adders with Reversible Logic 1 KONDADASULA VEDA NAGA SAI SRI, 2 M.

EE141-Fall 2010 Digital Integrated Circuits. Announcements. An Intel Microprocessor. Bit-Sliced Design. Class Material. Last lecture.

NEW SELF-CHECKING BOOTH MULTIPLIERS

Online Testable Reversible Circuits using reversible gate

Hardware Design I Chap. 4 Representative combinational logic

DESIGN OF AREA-DELAY EFFICIENT ADDER BASED CIRCUITS IN QUANTUM DOT CELLULAR AUTOMATA

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Downloaded from

DESİGN AND ANALYSİS OF FULL ADDER CİRCUİT USİNG NANOTECHNOLOGY BASED QUANTUM DOT CELLULAR AUTOMATA (QCA)

DESIGN OF PARITY PRESERVING LOGIC BASED FAULT TOLERANT REVERSIBLE ARITHMETIC LOGIC UNIT

Design and Implementation of Reversible Binary Comparator N.SATHISH 1, T.GANDA PRASAD 2

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018

Chapter 03: Computer Arithmetic. Lesson 03: Arithmetic Operations Adder and Subtractor circuits Design

A NEW APPROACH TO DESIGN BCD ADDER AND CARRY SKIPBCD ADDER

A High-Speed Realization of Chinese Remainder Theorem

Implementation of Reversible ALU using Kogge-Stone Adder

Section 3: Combinational Logic Design. Department of Electrical Engineering, University of Waterloo. Combinational Logic

Design of Reversible Comparators with Priority Encoding Using Verilog HDL

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

Literature Review on Multiplier Accumulation Unit by Using Hybrid Adder

Area Efficient Sparse Modulo 2 n 3 Adder

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

A Novel Ternary Content-Addressable Memory (TCAM) Design Using Reversible Logic

Design and Implementation of Carry Adders Using Adiabatic and Reversible Logic Gates

A Low-Error Statistical Fixed-Width Multiplier and Its Applications

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

DELAY EFFICIENT BINARY ADDERS IN QCA K. Ayyanna 1, Syed Younus Basha 2, P. Vasanthi 3, A. Sreenivasulu 4

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

Carry Look Ahead Adders

Lecture 11: Adders. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

Energy Delay Optimization

COE 202: Digital Logic Design Combinational Circuits Part 2. Dr. Ahmad Almulhem ahmadsm AT kfupm Phone: Office:

ISSN Vol.03, Issue.03, June-2015, Pages:

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

Reversible ALU Implementation using Kogge-Stone Adder

Integrated Circuits & Systems

DESIGN OF OPTIMAL CARRY SKIP ADDER AND CARRY SKIP BCD ADDER USING REVERSIBLE LOGIC GATES

Fast Ripple-Carry Adders in Standard-Cell CMOS VLSI

On the Analysis of Reversible Booth s Multiplier

A General Method for Energy-Error Tradeoffs in Approximate Adders

Transcription:

1 Design and Implementation of Carry Tree Adders using Low Power FPGAs Sivannarayana G 1, Raveendra babu Maddasani 2 and Padmasri Ch 3. Department of Electronics & Communication Engineering 1,2&3, Al-Ameer college of engineering 1&3, JNT University college of Engineering 2, Jawaharlal Nehru Technological University, Kakinada, India 1&2. Abstract The binary adder is the critical element in most digital circuit designs including digital signal processors (DSP) and microprocessor data path units. As such, extensive research continues to be focused on improving the power delay performance of the adder. To resolve the delay of carry-look ahead adders, the scheme of multilevel-look ahead adders. These adders have tree structures within a carry-computing stage similar to the carry propagate adder. However, the other two stages for these adders are called pre-computation and postcomputation stages. In pre-computation stage, each bit computes its carry generate/propagate and a temporary sum. In the post-computation stage, the sum and carry-out are finally produced. The carry-out can be omitted if only a sum needs to be produced. Index Terms Adders, carry-propagate, carry-generate Power delay. I.INTRODUCTION The high-operation speed and the excessive activity of the adder circuits in modern microprocessors, not only lead to high power consumption, but also create thermal hotspots that can severely affect circuit reliability and increase cooling costs. The presence of multiple execution engines in current processors further aggravates the problem. Therefore, there is a strong need for designing power-efficient adders that would also satisfy the tight constraints of high-speed.the design of high-speed, low-power and area efficient binary adders always receives a great deal of attention. Among the hundreds adder architectures known in the literature, when high performances are mandatory, parallel-prefix trees are generally preferable. Optimizing a parallel-prefix tree architecture and its transistor-level implementation for a specific design is not trivial since the designer has to choose: i) the radix-of the carry tree (i.e. the number of carries grouped in each step of the computation); ii) the tree architecture; iii) the logic style. As is well known, all these choices are crucial for both speed and power. parallel-prefix trees realized using dynamic domino logic achieve higher speed performances at the expense of consumed energy; where as, using static logics lowers power consumption, but sacrifices computational speed. This paper proposes a novel approach to optimize the implementation of the basic logic modules, namely the preprocessing stage and the associative dot operator, typically used within parallelprefix adders. Several optimization approaches have been proposed that try to reduce the power consumption of the circuit, either by trading gate sizing with an increase in the maximum delay of the circuit, or by using lower supply voltages in non-critical paths. The novelty of the proposed approach is that it directly reduces the switching activity of the carry-computation unit via a mathematical reformulation of the problem. Therefore its application is orthogonal to all other power-optimization methodologies. The basic idea exploited in the proposed designs consists of: 1) increasing speed by reducing the complexity of the pulldown networks (PDNs) of each dynamic gate; and 2) saving power by reducing the number of dynamic stages within the overall structure of the generic parallel-prefix tree. The paper is organized as follows: in Section II, a brief background on the parallel-prefix adder trees is provided, the novel circuits are then described in Section III where comparison results are also presented and discussed; finally, conclusions are drawn. II.MODEL: Assume that A = an 1an 2... a0 and B = bn 1bn 2... b0 represent the two numbers to be added, and S = sn 1sn 2... s0 denotes their sum. An adder can be considered as a three-stage circuit. The preprocessing stage computes the carry-generate bits gi, the carry-propagate bits pi, and the half-sum bits hi, for every i, 0 i n 1, according to: g i = a i b i p i = a i + b i h i = a i b i (1) Where, +, and denote the logical AND, OR and exclusive-or operations respectively. The second stage of the adder, hereafter called the carrycomputation unit, computes the carry signals ci, using the carry generate and propagate bits gi and pi, whereas the last stage computes the sum bits according to: s i = h i c i 1. (2)

2 The computation of the carries ci can be performed in parallel for each bit position 0 i n 1 using the formula Where the bit g 1 represents the carry-in signal c in. Based on (3), each carry c i can be written as c i = g i + K i (4) Several solutions have been presented for the carrycomputation problem. Carry computation is transformed to a prefix problem by using the associative operator, which associates pairs of generate and propagate bits and is defined, as follows, (g, p) (g, p ) = (g + p g, p p ). (5) Using consecutive associations of the generate and propagate pairs (g, p), each carry is computed according to c i = (g i, p i ) (g i 1, p i 1 )... (g 1, p 1 ) (g 0, p 0 ) (6) account that in general a XOR gate is of almost equal delay to a multiplexer, and that both h i and (g i 1 h i ) are computed in less logic levels than K i 1, then no extra delay is imposed by the use of the bits Ki for the computation of the sum bits si. The carry-out bit cn 1 is produced almost simultaneously with the sum bits using the relation cn 1 = gn 1+Kn 1. III. DESIGN SUMMARIZATION: The architecture of the proposed adders is shown in Figure1, The computation of K i can be transformed to a parallel-prefix problem by introducing the variable G* i, which is defined as follows: Assuming that G i = g i + g i 1 P i = p i p i 1 (9) G* i = P i G i 1 (10) The bits Ki of the odd and the even-indexed bit positions can be expressed as K 2k = (G* 2k, P 2k ) (G* 2k 2, P 2k 2 )... (G* 0, P 0 ) (11) K 2k+1 = (G* 2k+1, P 2k+1 ) (G* 2k 1, P 2k 1 )... (G* 1, P 1 ) (12) Fig1: The architecture of the proposed Carry Tree adders The computation of the bits K i, instead of the normal carries ci, complicates the derivation of the final sum bits s i since in this case However, using the Shannon expansion theorem on K i 1, the computation of the bits s i can be transformed as follows Equation (8) can be implemented using a multiplexer that selects either h i or (g i 1 h i ) according to the value of K i 1. The notation x denotes the complement of bit x. Taking into (7) (8) At first the number of terms (G* i, P i ) that need to be associated is reduced to half compared to the traditional approach, where the pairs (g, p) are used (Eq. (6)). In this way one less prefix level is required for the computation of the bits Ki, which directly leads to a reduction of 2 logic levels in the final implementation. Taking into account that the new preprocessing stage that computes the pairs (G* i, P i ) requires two additional logic levels compared to the logic-levels needed to derive the bits (g, p) in the traditional case, we conclude that no extra delay is imposed by the proposed adders. Additionally, the terms K i of the odd and the even-indexed bit positions can be computed independently, thus directly reducing the fan out requirements of the parallel-prefix structures. Their design is summarized in the following steps. Calculate the carry generate/propagate pairs (gi, pi) according to (1). Combine the bits gi, pi, gi 1, and pi 1 in order to produce the intermediate pairs (G* i, P i ), based on the definitions (9) & (10). Produce two separate prefix-trees, one for the even and one for the odd indexed bit positions that compute the terms K 2k and K 2k+1 of equations (11) and (12), using the pairs (G* i, P i ). Any of the already known parallel-prefix structures can be

3 employed for the generation of the bits Ki, in log2 n 1 prefix levels. In Figure 2 several architectures for the case of a 16-bit adder are presented, each one having different area, delay and fan out requirements. It can be easily verified that the proposed adders maintain all the benefits of the parallel-prefix structures, while at the same time offer reduced fan out requirements. Figure 2(a) presents a Kogge-Stone-like parallel-prefix structure, while in Figure 2(c) a Han- Carlson-like scheme is shown. It should be noted that in case of the proposed Han-Carlson-like prefix tree only the terms Ki of the odd-indexed bit positions are computed. The remaining terms of the even-indexed bit positions are computed using the bits Ki according to Ki+1 = pi+1 (gi + Ki) (13) Which are implemented by the grey cells of the last prefix level. (c) Fig.2: Novel 16-bit parallel-prefix Carry computation units (d) IV.RESULTS: The proposed adders are compared against the parallelprefix structures proposed by Kogge-Stone and Han-Carlson for the traditional definition of carry equations (Eq. (6)). Each 64-bit adder was at first described and simulated in Verilog HDL. In the following, all adders were mapped on the 0.25μm VST-25 technology library under typical conditions (2.5V, 25oC), using the Synopsys_ Design Compiler. Each design was recursively optimized for speed targeting the minimum possible delay. Also, several other designs were obtained targeting less strict delay constraints. (a) (b) Fig.3: The area and delay estimates for the traditional and the proposed 64-bit adders using a Kogge-Stone and Han-Carlson parallel-prefix structures.

4 Figure 3 presents the area and delay estimates of the proposed and the traditional Parallel-prefix adder architectures. It is noted that the proposed adders in all cases achieve equal delay compared to the already known structures with only a 2.5% in average increase in the implementation area. Based on the data provided by our technology library, it is derived that the delay of 1 fanout-4 (FO4) inverter equals to 120 130ps, under typical conditions. Thus the obtained results fully agree with the results given in (7), where the delay of different adder topologies is calculated using the method of Logical Effort. Fig. 5: Power and delay estimates for the traditional and the proposed 64-bit adders using a Han-Carlson parallel-prefix structure Fig. 4: Power and delay estimates for the traditional and the proposed 64-bit adders using a Kogge-Stone parallel-prefix structure. In Figure 4 the power consumption of the proposed and the traditional Kogge-Stone 64-bit adders are presented versus the different delays of the circuit. It can be easily observed that the proposed architecture achieves power reductions that range between 9% and 11%. All measurements were taken using Prime Power of Synopsys toolset after the application of 5000 random vectors. It should be noted that although the proposed adders require slightly larger implementation area, due to the addition of the extra multiplexers in the sum generation unit, they still offer power efficient designs as a result of the reduced switching activity of the the novel carry-computation units. Similar results are obtained for the case of the 64-bit Han-Carlson adders, as shown in Figure 5. Finally, since the application of the proposed technique is orthogonal to all other power optimization methods, such as gate sizing and the use of multiple supply voltages, their combined use can lead to further reduction in power dissipation. Specifically, the adoption of the proposed technique offers on average an extra 10% gain over the reductions obtained by the use of any other circuit-level optimization. V.CONCLUSIONS AND FUTURE WORK: A systematic methodology for the design of power-efficient parallel-prefix adders has been introduced in this paper. The proposed adders preserve all the benefits of the traditional parallel-prefix carry-computation units, while at the same time offer reduced switching activity and fan out requirements. Hence, high-speed data paths of modern microprocessors can truly benefit from the adoption of the proposed adder architecture. It may desire to extend our fan out to fan out f. VI.REFERENCES: [1] B.R.Zeydel, D.Baran, V.G.Oklobdzija, Energy-Efficient Design Methodologies: High-Performance VLSI Adders, IEEE Journal of Solid-State Circuits, vol.45, n 6, pp.1220-1233, 2010. [2] Simon Knowles, A Family of Adders, in Proceedings of the 14th IEEE Symposium on Computer Arithmetic, April 1999, pp. 30 34. [3] S. Mathew, R. Krishnamurthy, M. Anders, R. Rios, K. Mistry and K. Soumyanath, Sub-500-ps 64-b ALUs in 0.18-μm SOI/Bulk CMOS: Design and Scaling Trends IEEE Journal of Solid-State Circuits, vol. 36, no. 11, Nov. 2001.

5 [4] Y. Shimazaki, R. Zlatanovici, and B. Nikolic, A Shared- Well Dual-Supply-Voltage 64-bit ALU, IEEE Journal of Solid-State Circuits, vol. 39, no. 3, March 2004. [5] D. Harris and I. Sutherland, Logical Effort of Carry Propagate Adders, 37 th Asilomar Conference, Nov. 2003, pp. 873 878. [6] F.Frustaci, M.Lanuzza, P.Zicari, S.Perri, P.Corsonello, Designing High-Speed Adders in Power-Constrained Environments, IEEE Transactions on Circuits and Systems II: Express Briefs, Vol.56, n 2 pp.172-176, 2009. [7] R.Zlatanovici, S.Kao, B.Nikolic, Energy-Delay Optimization of 64- bit Carry-Lookahead Adders with a 240ps 90nm CMOS Design Example, IEEE Journal of Solid-State Circuits, Vol.44, n 2, pp.569-583, 2009.. [8] S.Das, S.P.Khatri, A Novel Hybrid Paralle-Prefix Adderarchitecture With Efficient Timing-Area Characteristic, IEEE Transactions on VLSI Systems, vol.16, n 3, pp.326-331, 2008. [9] F.Liu, F.F.Forouzandeh, O.A.Mohamed, G.Chen, X.Song, Q.Tan, A Comparative Study of Parallel Prefix Adders in FPGA Implementation of EAC, Proc. of the EUROMICRO Conference on Digital System Design, Architetcures, Methods and Tools, August 2009, Patras, Greece, pp.281-286. [10] R. Brodersen, M. Horowitz, D. Markovic, B. Nikolic and V. Stojanovic, Methods for True Power Minimization, International Conference on Computer-Aided Design, Nov. 2002, pp. 35 42.