Logical Effort of Higher Valency Adders


David Harris
Harvey Mudd College
E. Twelfth St., Claremont, CA
David_Harris@hmc.edu

Abstract

Higher valency parallel prefix adders reduce the number of logic levels at the expense of greater fan-in at each level. This paper uses the method of logical effort to evaluate the tradeoffs of higher valency for static and dynamic implementations of various adder architectures.

[Fig.: Adder block diagram (precomputation, prefix network, postcomputation)]

I. INTRODUCTION

Higher valency parallel prefix adders are popular for high performance applications such as microprocessor ALUs [,,, ]. A valency-v N-bit adder requires O(log_v N) logic levels, so a 64-bit addition requires as few as three levels of valency-4 propagate-generate gates as opposed to six levels of valency-2 gates. However, the higher valency gates have greater logical effort and parasitic delay, are more complex to design, and are not always available in standard cell libraries. Is higher valency addition really faster? Domino gates have lower logical efforts than their static counterparts and hence can use greater fan-ins. Does this mean higher valencies are better suited to domino than static logic?

This paper uses the method of logical effort to try to answer these questions. According to the logical effort model, the delays of valency-2, 3, and 4 designs are all approximately the same for a given architecture, circuit family, and wire load model.

This paper closely follows the methodology of []. It first describes the static and domino gates used to compute generate and propagate signals for the various valencies and tabulates the estimated logical effort and parasitic delay of each gate. It then shows the prefix networks and the critical paths that were examined. Finally, it calculates the delays for valency 2, 3, and 4 for each architecture and circuit family using the method of logical effort.

II. LOGICAL EFFORT OF CIRCUIT BUILDING BLOCKS

The three basic building blocks for an adder are the bitwise Propagate/Generate (PG) cells, the group PG cells in the prefix network, and the sum XORs, as shown in the adder block diagram. High performance datapath adders often build these cells from domino gates, while static CMOS is preferable when design simplicity and power consumption take precedence over utmost performance.

The figure of bitwise PG and sum XOR gates shows implementations of these cells using static and domino gates. The static designs use propagate and generate (PG) signals, while the domino designs add kill (K) for monotonic sum computation. The transistor widths are specified in arbitrary units to deliver unit drive. Noninverting static gates add an inverter after each inverting stage. Footed domino gates require an extra clocked evaluation transistor. The logical efforts (LE) and parasitic delays (PD) are given in the table of bitwise PG and sum XOR delay estimates.

[Fig.: Bitwise PG and sum XOR gates (inverting static and footless domino implementations)]

[Table: Bitwise PG and sum XOR delay estimates. Rows: LEbit and PDbit (bitwise cell), LExor and PDxor (sum XOR). Columns: noninverting CMOS, inverting CMOS, footed domino, footless domino.]

Prefix networks consist of black cells, gray cells, and buffers. Black cells compute both propagate and generate signals. Gray cells compute only generate, and buffers reduce the loading presented by noncritical paths. The figure of static and dynamic generate/propagate gates shows circuit implementations of the propagate and generate gates for one valency. Inverting static designs require alternating stages of the gates shown and their DeMorgan complements, which accept inverted inputs and produce true outputs. [] found that the difference in delay of the complementary stages is insignificant, so it will be ignored.
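The black and gray cell functions described above can be written out at the logic level. The following is a minimal sketch, not the paper's circuit implementation: the function names `black_cell` and `gray_cell` and the most-significant-first input ordering are assumptions made for illustration. It uses the standard prefix recurrence, in which a group generates a carry if its top position generates one or if all higher positions propagate one generated lower down.

```python
from typing import Sequence, Tuple


def black_cell(g: Sequence[bool], p: Sequence[bool]) -> Tuple[bool, bool]:
    """Combine v (generate, propagate) pairs, most significant first.

    Returns (group generate, group propagate) for a valency-v cell,
    where v = len(g). Hypothetical helper, for illustration only.
    """
    if len(g) != len(p):
        raise ValueError("need one propagate input per generate input")
    group_g = False
    upper_p = True  # AND of the propagates above the current position
    for gi, pi in zip(g, p):
        # A generate at this position reaches the group output only if
        # every more significant position propagates it.
        group_g = group_g or (upper_p and bool(gi))
        upper_p = upper_p and bool(pi)
    return group_g, upper_p


def gray_cell(g: Sequence[bool], p_upper: Sequence[bool]) -> bool:
    """Group generate only; the least significant propagate is unused,
    so a valency-v gray cell takes v generates and v - 1 propagates."""
    if len(p_upper) != len(g) - 1:
        raise ValueError("gray cell takes one fewer propagate than generate")
    return black_cell(g, list(p_upper) + [False])[0]
```

For example, `black_cell([False, True], [True, False])` models a valency-2 cell whose lower half generates a carry that the upper half propagates. The input counts match the delay table's structure, which lists one fewer propagate input for gray cells than for black cells at each valency.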

[Figs.: Static and dynamic generate/propagate gates, one figure per valency shown]

[Table: Gray and black cell delay estimates. Columns: inverting CMOS, noninverting CMOS, footed domino, footless domino. For each of valency 2, 3, and 4 the rows are PDg, PDp, one LEg per generate input, one LEp per gray cell propagate input, and one LEp per black cell propagate input.]

Similarly, a second figure shows the circuit designs for another valency. The table of gray and black cell delay estimates gives the logical efforts and parasitic delays for the various inputs to black and gray cells in each circuit family.

III. ADDER ARCHITECTURES

Adders are distinguished by the arrangement of cells in the group PG logic. The first architecture figure shows typical parallel prefix architectures for valency-2 gates []. One of several paths may be most critical depending on the cell delays; the black highlighted lines indicate the path that was assumed to be critical in this study. Similarly, the second architecture figure shows the analogous architectures for higher valency. Higher valency adders offer a number of hybrid tree/select architectures, such as the spanning tree and sparse tree, that reduce the number of cells in the parallel prefix network in exchange for adding short ripple networks; these variants are not considered in this study.

[Fig.: Valency-2 adder architectures. Recoverable panels: (c) Brent-Kung, (d) Sklansky, (e) Kogge-Stone, (f) Han-Carlson, (g) Knowles, (h) Ladner-Fischer]
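The O(log_v N) level count that motivates higher valency can be checked with a few lines of integer arithmetic. This is an illustrative sketch only: `prefix_levels` is a hypothetical helper, and it gives the minimum depth of a tree in which every level multiplies the spanned group width by the valency. Architectures that trade depth for fewer cells or less wiring, such as Brent-Kung, use more levels than this bound.

```python
def prefix_levels(n_bits: int, valency: int) -> int:
    """Minimum number of prefix levels needed to span n_bits when each
    level combines `valency` adjacent groups (hypothetical helper)."""
    if n_bits < 1 or valency < 2:
        raise ValueError("need n_bits >= 1 and valency >= 2")
    levels = 0
    span = 1  # bits covered by one group after `levels` levels
    while span < n_bits:
        span *= valency
        levels += 1
    return levels


if __name__ == "__main__":
    for v in (2, 3, 4):
        print(f"valency {v}: {prefix_levels(64, v)} levels for 64 bits")
```

Integer multiplication is used instead of a floating-point logarithm so that exact powers of the valency are not misrounded.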

IV. LOGICAL EFFORT DELAY MODEL

The method of logical effort provides a simple way to determine a lower bound on critical path delay in circuits with negligible wire capacitance. If the path has M stages, a path effort of F, and a parasitic delay of PD, the delay achieved with the best transistor sizes is

$$D = D_F + PD = M F^{1/M} + PD \qquad (1)$$

where D is measured in units of τ, the delay of an ideal inverter with no parasitic capacitance driving an identical inverter. Delay is normalized to that of a fanout-of-4 (FO4) inverter with the conversion 1 FO4 ≈ 5τ.

In general, achieving least delay requires using different transistor sizes in each gate (although this delay model has assumed that all transistors in a branch scale uniformly). A regular layout with consistent transistor sizes in each type of cell is easier to build but may sacrifice performance. Consider designing all cells to have an arbitrary unit drive (i.e. output conductance). Define an inverter with unit drive to have unit input capacitance. For circuits with a single stage per cell (e.g. inverting static CMOS), the path effort delay is simply the sum of the effort delays f_i of each stage:

$$D_F = \sum_{i=1}^{M} f_i \qquad (2)$$

The total delay is still the sum of the path effort and parasitic delays. In a circuit with two stages per cell (e.g. noninverting static or domino), let us design the first stage to have unit drive and choose the size of the second stage for least delay. Each cell is then a two-stage path, so by Eq. (1) with M = 2 its effort delay is twice the square root of its effort. If the path has C = M/2 cells and the effort of the ith cell is F_i, the path effort delay is

$$D_F = \sum_{i=1}^{C} 2\sqrt{F_i} \qquad (3)$$

[] showed that the delay with uniform sizes is only slightly longer than the delay with arbitrary sizes, except on architectures like Sklansky that have unusually large fanouts on certain nodes. The uniform size designs are also easier to lay out and permit closed-form results when wire capacitance is considered, so we focus on them in this paper.

Horizontal wires add capacitance to the load of each stage. Let the wire capacitance be w units per column spanned. The value of w depends on the width of each column, the width and spacing between wires, and the size of a unit transistor; it was estimated from a trial layout.
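The three delay expressions of this section (the best-case bound, the uniform-size sum for one stage per cell, and the uniform-size sum for two stages per cell) can be sketched numerically as follows. This is an illustration under stated assumptions rather than the paper's MATLAB script: all function names are hypothetical, the wire term is modelled simply as a per-stage capacitance added to the stage or cell effort, and the FO4 conversion uses the common rule of thumb of about 5τ per FO4 inverter delay.

```python
import math
from typing import Optional, Sequence

TAU_PER_FO4 = 5.0  # assumed conversion: one FO4 inverter delay is about 5 tau


def best_case_delay(path_effort: float, stages: int, parasitic: float) -> float:
    """Lower bound with ideal sizing: D = M * F**(1/M) + PD, in tau."""
    return stages * path_effort ** (1.0 / stages) + parasitic


def _with_wire(efforts: Sequence[float],
               wire_caps: Optional[Sequence[float]]) -> Sequence[float]:
    # Add the wire capacitance seen by each stage (for example w times the
    # number of columns spanned) to that stage's effort.
    if wire_caps is None:
        return list(efforts)
    if len(wire_caps) != len(efforts):
        raise ValueError("need one wire capacitance per stage or cell")
    return [f + cw for f, cw in zip(efforts, wire_caps)]


def single_stage_delay(stage_efforts: Sequence[float], parasitic: float,
                       wire_caps: Optional[Sequence[float]] = None) -> float:
    """Uniform cells, one stage per cell: sum of stage efforts plus PD."""
    return sum(_with_wire(stage_efforts, wire_caps)) + parasitic


def two_stage_delay(cell_efforts: Sequence[float], parasitic: float,
                    wire_caps: Optional[Sequence[float]] = None) -> float:
    """Uniform cells, two stages per cell: sum of 2*sqrt(F_i) plus PD."""
    loaded = _with_wire(cell_efforts, wire_caps)
    return sum(2.0 * math.sqrt(f) for f in loaded) + parasitic


def to_fo4(delay_tau: float) -> float:
    """Convert a delay in tau to FO4 inverter delays."""
    return delay_tau / TAU_PER_FO4
```

For instance, a two-stage path with path effort 16 and parasitic delay 3 gives `best_case_delay(16, 2, 3)`, which is 2·4 + 3 = 11τ. The same path built from two uniform single-stage cells of effort 4 each gives the same 11τ, because equal stage efforts already meet the bound.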
While there is no closed-form solution for the minimum-delay problem with wire capacitance, the delay assuming fixed cell sizes is readily calculated by adding the wire capacitance to the stage effort f or F in Eq. (2) or (3).

V. RESULTS

The adder delays were evaluated using a MATLAB script. The delay figure plots delay (in FO4 inverter delays) against the number of bits for various adder architectures and circuit families at a fixed wire capacitance w. The three curves on each set of axes indicate valency-2, 3, and 4 delays.

The delay is nearly independent of the valency for both static and domino designs of most architectures. Brent-Kung architectures are an exception: they benefit from higher valency for noninverting circuits because the stage effort is too low with valency 2, but Brent-Kung is not the fastest architecture in any case. Domino gates are consistently faster than static gates, and footless domino is faster than footed. The designs with two gates per stage (all but inverting CMOS) are better at driving the heavy wire loads and hence perform better for wide adders.

VI. CONCLUSIONS

The logical effort model facilitates rapid comparison of a wide variety of adder architectures using multiple circuit families while accounting for the costs of fanout and interconnect. Under the assumptions made in this paper, the delay is nearly independent of the valency for both static and domino designs of most architectures. Brent-Kung architectures are an exception: they benefit from higher valency for noninverting circuits because the stage effort is too low with valency 2, but Brent-Kung is not the fastest architecture in any case. Valency-2 designs are the simplest to implement.

This paper has not considered the area, power, or wiring tradeoffs of higher valency adders. In practice, the logical efforts of gates are likely to be lower on account of velocity saturation, but the parasitic delays are likely to be higher when internal nodes are considered. Simulations of extracted layouts could answer these questions.

REFERENCES

A. Beaumont-Smith and C. Lim, "Parallel prefix adder design," Proc. IEEE Symp. Comp. Arith., June.

D. Harris and I. Sutherland, "Logical effort of carry propagate adders," Proc. Asilomar Conf. Signals, Systems, and Computers.

T. Lynch and E. Swartzlander, "A spanning tree carry lookahead adder," IEEE Trans. Computers, Aug.

S. Mathew, M. Anders, R. Krishnamurthy, and S. Borkar, "A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core," J. Solid-State Circuits, May.

S. Naffziger, "A subnanosecond 0.5 µm 64 b adder design," Intl. Solid-State Circuits Conf.

N. Weste and D. Harris, CMOS VLSI Design, Addison-Wesley.

[Fig.: Adder architectures for higher valency. Panels: (a) Brent-Kung, (b) Sklansky, (c) Kogge-Stone, (d) Han-Carlson, (e) Ladner-Fischer]

[Fig.: Adder delay vs. number of bits (logical effort model results). Architectures: Brent-Kung, Ladner-Fischer, Sklansky, Kogge-Stone, Han-Carlson. Circuit families: inverting CMOS, noninverting CMOS, footed domino, footless domino. Axes: delay (FO4) against number of bits, with one curve per valency.]