Area-Time Optimal Adder with Relative Placement Generator


Abstract: This paper presents the design of a generator for the production of area-time-optimal adders. A unique feature of this generator is that it integrates synthesis and layout by providing relative placement information. Relative placement information provides better support for structured layout and an easier integration flow with a datapath placer. Adders generated using the proposed generator are dynamically configured for a given technology library, wire-load model, delay, and area goal. Adders of sizes 1 to 102 bits are produced. The adder architecture used in this generator is a hybrid of Brent & Kung, carry select, and ripple carry adders. Compared with Synopsys fast adders, the generated adders show a 20-50% reduction in area with comparable delays. This generator has been integrated into Module Compiler, the Synopsys high-performance datapath design tool.

1. Introduction

Addition is an important operation affecting the speed and performance of digital signal processors. High-speed adders can be realized using the widely used carry look-ahead (CLA) [1], carry select [2], or binary carry look-ahead [3, 4] techniques. Due to time-to-market constraints in the VLSI industry, designing high-speed, area-efficient circuits in minimal time is a major challenge. There are many full-custom adders [][][][][][] that offer very high speed and low area, but only for a specific process technology and with complex circuit techniques. Due to the high cost of full-custom adders, standard-cell-based designs are widely used [][][][][][], but the performance of these adders is also limited to a specific cell library and set of design constraints. There is another class of adders, which are produced using software programs called generators [][][][]. Using generators, adders can be generated for a variety of design goals, bit-widths, and technologies.
The first attempt to produce area-time optimal adders using software programs was reported in []. The proposed adder architecture was based on Ladner and Fischer's parallel prefix computation model, which is essentially a look-ahead addition. The design is formulated as a dynamic programming problem and optimized for area and delay. In [], a multi-dimensional programming paradigm is proposed to control the block sizes of carry skip adders and CLA in order to minimize the delay. The generator proposed in [] produces conditional sum adders that include testability features.

All of the above generators have a major drawback: they do not consider wiring effects. With shrinking VLSI dimensions, especially in today's deep sub-micron technologies, interconnect delays play a dominant role in the overall performance of the circuit. Moreover, although the digital design paradigm has moved from the logic domain to the physical domain, none of the above generators provides any information for the placement or tiling of adder cells.

In this paper we present an efficient algorithm to generate area-time optimal adders with relative placement (RP) information. Relative placement assigns an instance/gate to a row and column position. RP information is used by the placement tool for fast, efficient, and structured placement of instances in layout. The generated adders are targeted for low area, with speeds comparable to Synopsys fast adders. The proposed generator takes into consideration the wire-load models along with other parameters such as delay, area, and operating conditions (temperature and voltage). The proposed generator, which generates adders up to 102 bits, has been integrated with the Synopsys datapath synthesis tool, which allows further enhancements such as pipelining, retiming, a GUI, and adder instantiation in a Verilog-like language called MCL (Module Compiler Language).

The adder architecture used in this generator is a hybrid of Brent & Kung [i, ii], carry select [], and ripple carry adders. In order to achieve low area with high speed, area-optimized ripple carry adders are used along with carry select adders (CSA) and carry generation logic. The whole adder is composed of 4-bit ripple carry and/or carry select adder blocks. The 4-bit ripple carry adder blocks are used for fast-arriving carry signals (at the beginning of the adder) and CSAs are used for slow-arriving signals. The carry generation logic (carry chain) is implemented using the o operator as described in [i, ii], with the modification that four-bit groups are used instead of two-bit groups. By using four-bit groups, the number of wires and the hardware are reduced by 1/2, while keeping the adder delay proportional to O(log n) [iii] (where n is the number of bits of the adder). In addition, timing analysis is used to equalize the arrival times of all sum bits and to reduce adder hardware by using the maximum possible number of ripple carry adders. The adders produced by the generator are similar in topology to [iii, iv, v], but because they are generated programmatically, area-delay optimal adders can be produced for different technology libraries, wire-load models, and area-delay goals.

The paper is organized as follows. Section 2 describes the architecture of the proposed adder.
Section 3 describes the generator in detail. Section 4 presents the performance comparison of the generated adders and the RP layout of a 64-bit adder produced by the proposed generator. Finally, Section 5 concludes this paper.

2. Adder Architecture:

Let $A = a_{n-1}, a_{n-2}, \ldots, a_1, a_0$ and $B = b_{n-1}, b_{n-2}, \ldots, b_1, b_0$ be the two input operands, with $a_{n-1}$ and $b_{n-1}$ the most significant bits. The generate and propagate signals at bit position $i$ are given by $g_i = a_i \cdot b_i$ and $p_i = a_i \oplus b_i$ (where $\cdot$ is the AND operation and $\oplus$ is the XOR operation). The carry out of bit position $i$ is given by $C_i = g_i + C_{i-1} \cdot p_i$ (where $+$ is the OR operation), assuming a carry-in of 0.

The $\circ$ operator as defined in [3] is given as follows:

$(g, p) \circ (g', p') = (g + (p \cdot g'),\; p \cdot p')$    (1)

The group generate (G) and propagate (P) signals are given by:

$(G_i, P_i) = (g_0, p_0)$ if $i = 0$, and $(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ if $0 < i < n$    (2)

In [3, 4, 5], using (1), the generate and propagate signals for each level $k$ of the adder are generated using the following combination:

$(G_{i+2^k}, P_{i+2^k}) = (g_{i+2^k}, p_{i+2^k}) \circ (g_i, p_i)$ for $0 \le k < \log n$    (3)
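As a minimal sketch of equations (1) and (2), the following C++ fragment evaluates the group generate signal bit by bit and reads the carry out of each position from it. The names GP, combine, and prefix_carries are illustrative and do not come from the generator; a carry-in of 0 is assumed, and the serial loop only demonstrates the operator, not the parallel tree structure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A (generate, propagate) pair for one bit or one group of bits.
struct GP {
    bool g;
    bool p;
};

// The operator of equation (1); `hi` is the more significant operand.
GP combine(GP hi, GP lo) {
    return GP{hi.g || (hi.p && lo.g), hi.p && lo.p};
}

// Carry out of every bit position of a + b for an n-bit adder,
// computed serially from equation (2) with a carry-in of 0.
std::vector<bool> prefix_carries(std::uint64_t a, std::uint64_t b, int n) {
    assert(n >= 1 && n <= 64);
    std::vector<bool> carry(n);
    GP acc{false, false};
    for (int i = 0; i < n; ++i) {
        bool ai = (a >> i) & 1u;
        bool bi = (b >> i) & 1u;
        GP bit{ai && bi, ai != bi};   // g_i = a_i AND b_i, p_i = a_i XOR b_i
        acc = (i == 0) ? bit : combine(bit, acc);
        carry[i] = acc.g;             // the carry out of bit i equals G_i
    }
    return carry;
}
```

For example, prefix_carries(0xB, 0x6, 4) describes 1011 + 0110 and yields no carry out of bit 0 and a carry out of bits 1, 2, and 3.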

In the proposed implementation for n bits, at $k = 0$ (first level) $n/2$ generate and propagate signals are produced using the following combination:

$(G_{2i+1}, P_{2i+1}) = (g_{2i+1}, p_{2i+1}) \circ (g_{2i}, p_{2i})$ for $0 \le i < n/2$    (4)

At the second level, $n/4$ signals are produced (by grouping the signals generated at the first level) using (4) but limiting $i$ to $n/4$. These signals are the four-bit group generate and propagate signals. Their values for the 4-bit case are given below, and their grouping is shown in Fig. 1.

$(g_{1:0}, p_{1:0}) = (g_1, p_1) \circ (g_0, p_0)$ and $(g_{3:2}, p_{3:2}) = (g_3, p_3) \circ (g_2, p_2)$ at $k = 0$ (first level)    (5)

$(G_{3:0}, P_{3:0}) = (g_{3:2}, p_{3:2}) \circ (g_{1:0}, p_{1:0})$ (second level)    (6)

In this realization no $(g_2, p_2)$ or intermediate even carry is generated, because these are generated within the conditional sum adders. Once the 4-bit group carries are available, the carries in multiples of 4 are generated using (2). This technique results in minimum wiring and area: for n bits, approximately $2n/2^k$ signals are generated at each level of the adder, in contrast to [3, 5], which require $2(n - 2^k)$ signals. The wiring and area of Brent's adder [3, 5] increase exponentially with the number of bits, whereas using four-bit groups gives a linear increase in wiring and area while keeping the delay equivalent to Brent's adder [].

Fig. 1 shows the block diagram of a 32-bit adder. As shown in this figure, the carry tree generates the carry inputs in multiples of four. The carry tree is made up of 3-input AND-OR gates and 2-input AND gates, implementing $g_{i:i-1} = g_i + g_{i-1} \cdot p_i$ and $p_{i:i-1} = p_{i-1} \cdot p_i$ (filled circle) respectively, at each level of the tree. Since inverting logic is faster than non-inverting logic, alternate polarities of g and p are produced at each level. The carries generated by the tree are then used to generate the final sum using either a ripple carry adder or a CSA. In the following sections we explain the design of the area-delay-optimized ripple and CSA adders.
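The four-bit grouping of equations (4) to (6) can be sketched in C++ as follows. This is an illustration under stated assumptions rather than the generator's code: group_carries and the other names are hypothetical, n is assumed to be a multiple of 4, the carry-in is 0, and the final scan over the groups is written serially for brevity where the carry tree would combine them in logarithmic depth.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct GP {
    bool g;
    bool p;
};

// The operator of equation (1); `hi` is the more significant operand.
GP combine(GP hi, GP lo) {
    return GP{hi.g || (hi.p && lo.g), hi.p && lo.p};
}

// Returns the carry out of bits 3, 7, 11, ... (one entry per 4-bit group).
std::vector<bool> group_carries(std::uint64_t a, std::uint64_t b, int n) {
    assert(n >= 4 && n <= 64 && n % 4 == 0);

    // Per-bit generate and propagate signals.
    std::vector<GP> bit(n);
    for (int i = 0; i < n; ++i) {
        bool ai = (a >> i) & 1u;
        bool bi = (b >> i) & 1u;
        bit[i] = GP{ai && bi, ai != bi};
    }

    // First level, equation (4): n/2 two-bit groups.
    std::vector<GP> pair(n / 2);
    for (int i = 0; i < n / 2; ++i)
        pair[i] = combine(bit[2 * i + 1], bit[2 * i]);

    // Second level, equation (6): n/4 four-bit groups.
    std::vector<GP> quad(n / 4);
    for (int i = 0; i < n / 4; ++i)
        quad[i] = combine(pair[2 * i + 1], pair[2 * i]);

    // Equation (2) applied to the four-bit group terms.
    std::vector<bool> carry(n / 4);
    GP acc = quad[0];
    carry[0] = acc.g;
    for (int j = 1; j < n / 4; ++j) {
        acc = combine(quad[j], acc);
        carry[j] = acc.g;
    }
    return carry;
}
```

With a = 0x00FF, b = 0x0001, and n = 16, the carries out of bits 3 and 7 are set and those out of bits 11 and 15 are clear, matching 0x00FF + 0x0001 = 0x0100.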

[Fig. 1. 32-bit adder block diagram. A carry tree (Level 0 to Level 4) combines $(g_0, p_0)$ through $(g_{31}, p_{31})$ and produces the carries out of bits 3, 7, 11, ..., 31. The four low-order 4-bit groups are summed by ripple adders and the four high-order groups by carry select adders. Delay = log n.]

Basic Cells

Fig. 2 shows the area-optimized ripple carry adder. The area of the adder is optimized by reusing the generate (g) and propagate (p) signals produced in the carry tree. This adder requires only 9 gates for 4 bits (counting each shaded AND-OR gate as one gate).

[Fig. 2. Four-bit area-optimized ripple carry adder. Inputs $a_0 \ldots a_3$, $b_0 \ldots b_3$, and $C_{in}$, together with g and p signals from the carry tree; outputs Sum 0 to Sum 3.]

Since the four-bit carry select adders are not on the critical path, they could be designed using two sets of four-bit ripple carry adders, with $C_{in} = 0$ for one set and $C_{in} = 1$ for the other. However, we have found that it is possible to reduce the hardware by merging the two adders together.
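For reference, the unmerged arrangement mentioned above (two four-bit ripple adders, one per assumed carry-in, followed by a selection on the actual carry) behaves as in the following C++ sketch. It models logical behavior only; Sum4, ripple4, and carry_select4 are illustrative names, and the sketch does not reproduce the merged circuit.

```cpp
#include <cassert>
#include <cstdint>

// Result of a 4-bit addition: the 4-bit sum and the carry out of bit 3.
struct Sum4 {
    std::uint8_t sum;
    bool carry_out;
};

// Bit-serial model of a 4-bit ripple carry adder.
Sum4 ripple4(std::uint8_t a, std::uint8_t b, bool cin) {
    assert(a < 16 && b < 16);
    std::uint8_t sum = 0;
    bool c = cin;
    for (int i = 0; i < 4; ++i) {
        bool ai = (a >> i) & 1;
        bool bi = (b >> i) & 1;
        bool p = (ai != bi);   // propagate
        bool g = (ai && bi);   // generate
        if (p != c)            // sum_i = p_i XOR c_i
            sum |= static_cast<std::uint8_t>(1u << i);
        c = g || (p && c);     // carry into the next bit
    }
    return Sum4{sum, c};
}

// Conventional carry select: compute both candidate sums, then select.
Sum4 carry_select4(std::uint8_t a, std::uint8_t b, bool cin) {
    Sum4 if_zero = ripple4(a, b, false);
    Sum4 if_one = ripple4(a, b, true);
    return cin ? if_one : if_zero;
}
```

Because both candidate sums are ready before the carry arrives, only the final selection depends on the incoming carry, which is why this block suits late-arriving carries.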

Fig. 3 shows the schematic diagram of the 4-bit merged carry select adder (CSA). The hardware of this merged CSA is approximately 0% smaller than the hardware required by two separate four-bit ripple carry adders. In the VLSI implementation, the 4-bit CSA is combined with the 4-bit carry generate circuit.

[Fig. 3. Four-bit carry select adder. Inputs $a_0 \ldots a_3$, $b_0 \ldots b_3$, and $C_{in}$; outputs Sum 0 to Sum 3. Gate counts annotated in the figure: New_add: 11 XORs, 3 MUXs, 3 AOI, 2 INV; FASTCLA: 8 XORs, 0 MUXs, 12 AOI.]

3. Generator

In this section we present the generator architecture. The generator is written in C++ in a modular fashion and contains the following modules:

1. A logic generator.
2. A delay calculator.
3. An adder optimizer.

Functions such as reading the technology library, parsing the MCL input and adder parameters (bit-width, delay, and area goals), and providing the GUI are performed by the Module Compiler (MC) synthesis engine. Below we explain the design of the individual generator modules.

Logic generator

The main function of the logic generator is to instantiate, place, and interconnect logic cells and to generate relative placement information. The MC synthesis engine parses the input and then passes a data structure to the logic generator. This data structure contains information regarding the adder type, the number of bits, pointers to the input/output operands, and the desired area and delay. The logic generator creates a data structure containing n linked lists (columns), each holding input operand bits and their associated delays in ascending order. After this, the generator performs the operations listed below, in the order given. Each gate instantiation updates the RP data structure with the bit/column number and the level of the adder. The nets produced by each operation are placed back in the linked list, which is then re-sorted by delay.

1. Generate the GP terms for each bit using NAND and NOR gates.
2. Generate a binary tree using equation (6), instantiating an AND-OR gate and an AND gate for each node. Since inverting logic is faster than non-inverting logic, alternate polarities of g and p are produced at each level of the GP tree in order to take full advantage of inverting logic cells.

3. Generate the intermediate GP terms that are not powers of 2.
4. Instantiate the first ripple carry adder, shown in Fig. 2, for the first four bits.
5. Instantiate CSA adders (Fig. 3) for the rest of the bits.

Adder optimizer

The adder optimizer first determines the maximum delay of the adder from the final level of the GP tree. It then visits the remaining bits in 4-bit groups. For each group, it keeps an area-optimized ripple carry adder when the maximum adder delay exceeds the group delay by more than 5 delay units, and a CSA otherwise. The corresponding generator code is:

```cpp
/* Excerpt from the generator: gp_tree, inp, maxlevel, origbits, bigmaxbits,
   groupWidth, reducecol, i, and j are declared elsewhere. */

/*----------------------------------------------------------
  Begin adder optimization.
  Determine the maximum delay of the adder.
-----------------------------------------------------------*/
int delay[bigmaxbits];
int maxdelay = 0;
int maxgroupdelay = 0;
for (i = 4; i < origbits; i++) {
    delay[i] = gp_tree[maxlevel][i].delay;
    if (delay[i] > maxdelay)
        maxdelay = delay[i];
}

/*----------------------------------------------------------
  Now start adder optimization.
  Now replace the CSA adders.
-----------------------------------------------------------*/
// fast carry-in path, area-optimized ripple adder
for (i = 4; i < origbits; i = (i + 4)) {
    // The last group may be narrower than four bits.
    if (origbits - i < 4)
        groupWidth = origbits - i - 1;
    else
        groupWidth = 3;

    reducecol = adddbfalse;
    GenerateSumRip(inp, i, (claLog2(i - 1)), groupWidth, reducecol);

    // Track the largest delay seen in the groups processed so far.
    for (j = i; j < i + 1 + groupWidth; j++) {
        delay[j] = gp_tree[maxlevel][j].delay;
        if (delay[j] > maxgroupdelay)
            maxgroupdelay = delay[j];
    }

    reducecol = adddbtrue;
    // Enough slack: keep the ripple adder; otherwise use a CSA.
    if (maxdelay > (maxgroupdelay + 5))
        GenerateSumRip(inp, i, (claLog2(i - 1)), groupWidth, reducecol);
    else
        GenerateSumCSA(inp, i, (claLog2(i - 1)), groupWidth, reducecol);
}
```
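The selection rule in the optimizer can be isolated in a small, self-contained C++ sketch. Everything here is an assumption made for illustration: choose_groups, GroupKind, and the per-group carry_delay vector are hypothetical, each group is compared on its own delay (the excerpt instead keeps a running maximum), and the margin is passed in rather than fixed at 5.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class GroupKind { Ripple, CarrySelect };

// carry_delay[j] is the estimated arrival time of the carry for 4-bit
// group j. A group whose carry arrives more than `margin` before the
// slowest carry has slack and can use the smaller ripple adder.
std::vector<GroupKind> choose_groups(const std::vector<int>& carry_delay,
                                     int margin) {
    assert(!carry_delay.empty());
    int max_delay = *std::max_element(carry_delay.begin(), carry_delay.end());
    std::vector<GroupKind> kind;
    kind.reserve(carry_delay.size());
    for (int d : carry_delay) {
        kind.push_back(max_delay > d + margin ? GroupKind::Ripple
                                              : GroupKind::CarrySelect);
    }
    return kind;
}
```

With delays {10, 20, 28, 30} and a margin of 5, the first two groups are assigned ripple adders and the last two carry select adders, which mirrors the idea of spending area only where the carry arrives late.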

Delay calculator

The delay calculator module provides detailed timing information for a specified cell or net timing arc. Both the operating conditions and the wire load models are taken into account when making delay calculations. For CMOS models, the delay is broken into four parts: intrinsic delay ($D_i$), transition delay ($D_t$), slope delay ($D_s$), and connect delay ($D_c$). The total delay is therefore given by:

$D_{total} = D_i + D_t + D_s + D_c$

The intrinsic delay $D_i$ is the delay through the element, which is independent of the load (also known as the base delay).

The transition delay $D_t$ is a function of the driver resistance and the capacitance of the wire and the pins: $D_t = R_{driver} (C_{wire} + C_{pins})$, where $R_{driver}$ is the rise or fall resistance (also known as the load factor).

The slope delay $D_s$ is the input-to-output delay due to the transition time on the input: $D_s = S_s \cdot D_t$ (previous stage), where $S_s$ is the rise slope or fall slope.

The connect delay $D_c$ is a rise or fall delay, also known as the time-of-flight delay, that is, the time it takes for a waveform to propagate along a wire. It is based on one of three types of models:

Worst Case RC tree. Generic CMOS: $D_c = R_{wire} (C_{wire} + \sum C_{load\_pins})$. CMOS2: $D_c = (R_{wire}/N) (C_{wire} + \sum C_{load\_pins})$. This models the case where the load pin is at the extreme end of the wire.

Balanced RC tree. $D_c = (R_{wire}/N) (C_{wire}/N + \sum C_{load\_pins})$. This models the case where all load pins are on separate, equal branches of the interconnect wire. Each load pin incurs an equal portion of the wire capacitance.

Best Case RC tree. $D_c = R_{wire} (C_{wire} + C_{pin}) = 0$. This models the case where the load pin is physically adjacent to the driver.

The wire resistance $R_{wire}$ is determined from a wire load model. The length of the net is computed from a global estimation function, which is based on the number of fanouts for that net.
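The delay equations above can be collected into a short C++ sketch. It is an illustration only, not the Module Compiler delay calculator: the type and function names are hypothetical, N is assumed to be the number of load pins on the net, the worst case uses the generic CMOS form, and the CMOS2 variant is omitted.

```cpp
#include <cassert>

enum class TreeModel { WorstCase, Balanced, BestCase };

// Lumped description of one net (names are illustrative).
struct Net {
    double r_wire;       // wire resistance from the wire load model
    double c_wire;       // wire capacitance
    double c_load_pins;  // sum of the load pin capacitances
    int n;               // N in the equations, assumed to be the load pin count
};

// Connect delay D_c for the three tree models.
double connect_delay(const Net& net, TreeModel model) {
    assert(net.n >= 1);
    switch (model) {
    case TreeModel::WorstCase:
        return net.r_wire * (net.c_wire + net.c_load_pins);
    case TreeModel::Balanced:
        return (net.r_wire / net.n) * (net.c_wire / net.n + net.c_load_pins);
    case TreeModel::BestCase:
        return 0.0;  // load pin adjacent to the driver
    }
    return 0.0;
}

// Transition delay D_t = R_driver * (C_wire + C_pins).
double transition_delay(double r_driver, const Net& net) {
    return r_driver * (net.c_wire + net.c_load_pins);
}

// Slope delay D_s = S_s * D_t of the previous stage.
double slope_delay(double slope, double previous_transition_delay) {
    return slope * previous_transition_delay;
}

// Total delay D_total = D_i + D_t + D_s + D_c.
double total_delay(double intrinsic, double transition, double slope,
                   double connect) {
    return intrinsic + transition + slope + connect;
}
```

For a net with r_wire = 2, c_wire = 4, c_load_pins = 1, and n = 2, the worst-case connect delay is 10 and the balanced one is 3, which shows how strongly the assumed tree shape affects the estimate.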
The wire capacitance $C_{wire}$ is determined for the head of the timing arc. The length is computed using the actual number of fanout pins on the net and the fanout_length values defined in the wire_load group in the library. If no wire_load group is defined, then $C_{wire}$ is 0. The pin capacitance $C_{pin}$ is the library capacitance defined in the pin group for the load pin. For a cell arc, the pin names are associated with an input pin and an output pin of the same cell. For a net arc, the pin names are for a driver pin and a load pin on the same net.

Calculating the Driving Cell Delay

Assuming nonlinear delay library modeling for the specified timing arc (that is, the libcell-input-pin to libcell-output-pin timing relationship), the cell delay is a function of both the transition time on the input pin and the capacitive load on the output pin. Delay calculation uses the first timing arc it encounters for the driving cell's specified output pin, and it uses zero for the input transition time. The total capacitive load seen by the driving cell's output pin is the sum of the wire capacitances and the pin capacitances that it drives. These capacitances can exist both externally and internally to an input port. Both external wire capacitance and external pin capacitance contribute to the load seen by the driving cell. Internal capacitances are modeled in four ways:

- The wire load model calculation of the net's wire capacitance, based on the number of loads it drives
- The net's wire capacitance annotated on pins or ports through set_rtl_load (for details, see the man page for set_rtl_load)
- The net's pin capacitance for each leaf cell input pin it drives
- The net's wire capacitance annotated by using set_load

The conventional approach to calculating pre-layout wire load during synthesis in an ASIC environment has been to specify a statistical wire load model based on total die size and fanout capacitance.

For example, assume you have the following values:

$C_{pin}$ - pin capacitance external to the input port
$C_{wire}$ - wire capacitance external to the input port
$C_{wlm}$ - calculated wire capacitance internal to the input port, based on the wire load model
$C_{fan}$ - pin capacitance internal to the input port, based on the individual library cell input pin descriptions

The total capacitive load seen by the output pin of the driving cell would be:

$C_{total} = (C_{pin} + C_{wire}) + (C_{wlm} + C_{fan})$

Calculating the Interconnect Delay External to the Input Port

The wire delay external to an input port is the product of the external wire resistance and all capacitance seen after the external pin capacitance. The external and internal capacitances used here are modeled in the same manner as described above for the driving cell delay. The difference is that the external pin capacitance does not contribute to the external wire delay. Only the external wire capacitance and all internal capacitances contribute to this delay.
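The two load calculations described in this section can be sketched in C++ as follows. The names PortLoad, total_load, and external_wire_delay are illustrative, and r_ext stands for the wire resistance external to the input port. The sketch simply encodes the rule that the external pin capacitance loads the driving cell but is excluded from the external wire delay.

```cpp
#include <cassert>

// Capacitances seen around an input port (names are illustrative).
struct PortLoad {
    double c_pin;   // pin capacitance external to the input port
    double c_wire;  // wire capacitance external to the input port
    double c_wlm;   // internal wire capacitance from the wire load model
    double c_fan;   // internal pin capacitance of the driven library cells
};

// Total capacitive load seen by the output pin of the driving cell.
double total_load(const PortLoad& load) {
    return (load.c_pin + load.c_wire) + (load.c_wlm + load.c_fan);
}

// Wire delay external to the input port: external resistance times all
// capacitance after the external pin capacitance (c_pin is excluded).
double external_wire_delay(double r_ext, const PortLoad& load) {
    assert(r_ext >= 0.0);
    return r_ext * (load.c_wire + load.c_wlm + load.c_fan);
}
```

For c_pin = 1, c_wire = 2, c_wlm = 3, and c_fan = 4, the driving cell sees a load of 10, while an external resistance of 0.5 gives an external wire delay of 0.5 * 9 = 4.5.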

For example, assume we have the following values:

$R_{ip}$ - resistance value external to the input port
$C_{pin}$ - pin capacitance external to the input port
$C_{wire}$ - wire capacitance external to the input port
$R_{wlm}$ - calculated wire resistance internal to the input port, based on the wire load model
$C_{wlm}$ - calculated wire capacitance internal to the input port, based on the wire load model
$C_{fan}$ - pin capacitance internal to the input port, based on the individual library cell input pin descriptions

The wire delay external to an input port would be:

$D_{ext} = R_{ip} (C_{wire} + C_{wlm} + C_{fan})$

The total wire delay, both external and internal to an input port, would be:

$D_{total} = D_{ext} + R_{wlm} (C_{wire} + C_{wlm} + C_{fan})$

Note that $C_{pin}$ does not affect the wire delay. However, it still presents an additional load to the driving cell and therefore affects the driving cell's delay, as seen above.

4. Performance Comparison

The performance of the adders synthesized by the generator is compared with the existing Module Compiler (MC) FASTCLA [] and the DesignWare (DW) BK adder []. All the MC adders are synthesized using MC and then post-processed in Design Compiler (DC), while the DW adders are synthesized using only DC. Each adder is synthesized using ten different libraries, under worst-case operating conditions and wire load models. Fig. 4 and Fig. 5 show the delay and area comparison between the different adders using the IBM_CMOS6S_SC library. The synthesis results show that, in terms of delay, the adders generated by the proposed generator are comparable to the best Synopsys adders (DW_BK and MC_FASTCLA): faster in some cases and slightly slower in others. In terms of area, the adders generated by the proposed generator are always 20-50% smaller.

[Plot: IBM Delay Comparison. Delay versus Bits. Series: Dw_Delay, FCLA_Delay, AOFCLA_Delay.]

Fig. 4. Delay comparison of DW_BK, MC_FCLA, and NEW adder in IBM 0.18u Tech.

[Plot: IBM Area Comparison. Cell Area versus Bits. Series: Dw_area, FCLA_Area, AOFCLA_area.]

Fig. 5. Area comparison of DW_BK, MC_FCLA, and NEW adder in IBM 0.18u Tech.

5. Conclusions

References:

i. R. Brent, "On the addition of binary numbers", IEEE Trans. Computers, C-19, 758 (1970).
ii. R. P. Brent, H. T. Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, vol. C-31, no. 3, March 1982, p. 260-.
iii. A. A. Farooqui, V. G. Oklobdzija, F. Chehrazi, "Multiplexer based adder for media signal processing", Proceedings of the 1999 International Symposium on VLSI Technology, Systems, and Applications, Taipei, Taiwan, R.O.C., pp. 100-103, 8-10 June 1999.
iv. A. A. Farooqui, V. G. Oklobdzija, F. Chehrazi, "64-Bit Media Adder", Proceedings of the 1999 IEEE International Symposium on Circuits and Systems (ISCAS '99), Orlando, Florida, USA, 30 May-2 June 1999.
v. A. A. Farooqui, V. G. Oklobdzija, F. Chehrazi, "A Multiplexer Based Parallel n-bit Adder Circuit for High Speed Processing", SONY-50N3106, filed March 1999.