Logical Effort of Higher Valency Adders

David Harris
Harvey Mudd College
E. Twelfth St., Claremont, CA
David_Harris@hmc.edu

Abstract

Higher valency parallel prefix adders reduce the number of logic levels at the expense of greater fan-in at each level. This paper uses the method of logical effort to evaluate the tradeoffs of higher valency for static and dynamic implementations of various adder architectures.

Fig. 1. Adder block diagram: precomputation, prefix network, and postcomputation.

I. INTRODUCTION

Higher valency parallel prefix adders are popular for high performance applications such as microprocessor ALUs [1, 3, 4, 5]. A valency-v N-bit adder requires O(log_v N) logic levels, so a 64-bit addition requires as few as three levels of valency-4 propagate-generate gates as opposed to six levels of valency-2 gates. However, the higher valency gates have greater logical effort and parasitic delay, are more complex to design, and are not always available in standard cell libraries. Is higher valency addition really faster? Domino gates have lower logical efforts than their static counterparts and hence can use greater fan-ins. Does this mean higher valencies are better suited to domino than static logic?

This paper uses the method of logical effort to try to answer these questions. According to the logical effort model, the delays of valency-2, 3, and 4 designs are all approximately the same for a given architecture, circuit family, and wire load model.

This paper closely follows the methodology of [2]. It first describes the static and domino gates used to compute generate and propagate signals for the various valencies and tabulates the estimated logical effort and parasitic delay of each gate. It then shows the prefix networks and the critical paths that were examined. Finally, it calculates the delays for valency 2, 3, and 4 for each architecture and circuit family using the method of logical effort.

II. LOGICAL EFFORT OF CIRCUIT BUILDING BLOCKS

The three basic building blocks for an adder are the bitwise Propagate/Generate (PG) cells, the group PG cells in the prefix network, and the sum XORs, as shown in Fig. 1. High performance datapath adders often build these cells from domino gates, while static gates are preferable when design simplicity and power consumption take precedence over utmost performance.

Fig. 2 shows implementations of the bitwise PG cells and the sum XOR gates using static and domino gates. The static designs use propagate and generate (PG) signals, while the domino designs add kill (K) for monotonic sum computation. The transistor widths are specified in arbitrary units to deliver unit drive. Noninverting static gates add an inverter after each inverting stage. Footed domino gates require an extra clocked evaluation transistor. The logical efforts (LE) and parasitic delays (PD) are given in Table 1.

Fig. 2. Bitwise PG and sum XOR gates (noninverting static, inverting static, footed domino, and footless domino versions).

Table 1. Bitwise PG and sum XOR delay estimates. Columns: Noninverting, Inverting, Footed Domino, Footless Domino. Rows: LE_bit and PD_bit for the bitwise cell; LE_xor and PD_xor for the sum XOR. (Numeric entries are not reproduced here.)

Prefix networks consist of black cells, gray cells, and buffers. Black cells compute both propagate and generate signals. Gray cells compute only generate, and buffers reduce the loading presented by noncritical paths. Fig. 3 shows circuit implementations of propagate and generate gates for valency 2. Inverting static designs require alternating stages of the gates shown and their DeMorgan complements that accept inverted inputs and produce true outputs. [2] found that the difference in delay of the complementary stages is insignificant, so it will be ignored.
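As a rough illustration of the logic these building blocks compute, the following Python sketch models the bitwise PG cell, a black or gray cell of arbitrary valency, and the sum XOR at the Boolean level. The function names and the choice P = A XOR B are assumptions made for illustration and are not taken from the circuit figures; the sketch says nothing about transistor sizing or delay.

```python
def bitwise_pg(a, b):
    """Bitwise precomputation for one bit position (inputs are 0 or 1).

    Returns (generate, propagate, kill). Propagate is taken as XOR here
    so that it can be reused by the sum XOR (an assumption of this sketch).
    """
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def black_cell(groups):
    """Combine (G, P) pairs, ordered most significant group first.

    The valency of the cell is len(groups). Returns the group (G, P).
    """
    g_out, p_run = 0, 1
    for g, p in groups:
        # A group contributes a carry if it generates and every more
        # significant group propagates.
        g_out |= p_run & g
        p_run &= p
    return g_out, p_run


def gray_cell(groups):
    """Same as a black cell, but only the group generate is produced."""
    return black_cell(groups)[0]


def sum_bit(p, carry_in):
    """Postcomputation: sum bit from bitwise propagate and incoming carry."""
    return p ^ carry_in


def carry_out(a_bits, b_bits, c_in):
    """Carry out of a short block using a single gray cell.

    a_bits and b_bits are lists of bits, least significant first. The
    carry-in is treated as a generate signal below bit 0.
    """
    pg = [bitwise_pg(a, b)[:2] for a, b in zip(a_bits, b_bits)]
    groups = list(reversed(pg)) + [(c_in, 0)]
    return gray_cell(groups)
```

For example, `carry_out([1, 1, 0], [1, 0, 1], 0)` feeds three bit positions plus the carry-in to one valency-4 gray cell, which is the kind of wide gate whose logical effort the paper tabulates.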
Fig. 3. Valency 2 static and dynamic generate/propagate gates.

Fig. 4. Higher valency static and dynamic generate/propagate gates.

Table 2. Gray and black cell delay estimates. Columns: Inverting, Noninverting, Footed Domino, Footless Domino. Rows, for each of valency 2, 3, and 4: PD_g, PD_p, LE_g for each generate input, LE_p for each propagate input of a gray cell, and LE_p for each propagate input of a black cell. (Numeric entries are not reproduced here.)

Similarly, Fig. 4 shows the corresponding circuit designs for a higher valency. Table 2 gives the logical efforts and parasitic delays for the various inputs to black and gray cells in each circuit family.

III. ADDER ARCHITECTURES

Adders are distinguished by the arrangement of cells in the group PG logic. Fig. 5 shows typical parallel prefix architectures for valency 2 gates [6]. One of several paths may be most critical depending on the cell delays; the black highlighted lines indicate the path that was assumed to be critical in this study. Similarly, Fig. 6 shows the analogous architectures for higher valency. Higher valency adders offer a number of hybrid tree/select architectures, such as the spanning tree and sparse tree, that reduce the number of cells in the parallel prefix network in exchange for adding short ripple networks; these variants are not considered in this study.

Fig. 5. Valency 2 adder architectures. Panels: (c) Brent-Kung, (d) Sklansky, (e) Kogge-Stone, (f) Han-Carlson, (g) Knowles, (h) Ladner-Fischer.
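The dependence of network depth on valency can be sketched in a few lines of Python. The helper name `prefix_levels` is hypothetical, and it returns only the minimum number of prefix levels, ceil(log_v N), for a network built entirely from valency-v cells; architectures such as Brent-Kung use more levels than this minimum.

```python
def prefix_levels(n_bits, valency):
    """Smallest number of levels L such that valency**L >= n_bits.

    Integer arithmetic is used instead of math.log to avoid rounding
    problems when n_bits is an exact power of the valency.
    """
    if n_bits < 1 or valency < 2:
        raise ValueError("need n_bits >= 1 and valency >= 2")
    levels, span = 0, 1
    while span < n_bits:
        span *= valency   # each level widens the group span by the valency
        levels += 1
    return levels


if __name__ == "__main__":
    for v in (2, 3, 4):
        print(f"valency {v}: {prefix_levels(64, v)} levels for 64 bits")
```

For a 64-bit adder this gives six levels at valency 2, four at valency 3, and three at valency 4, which is the reduction in logic depth that higher valency trades against greater fan-in per gate.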
IV. LOGICAL EFFORT DELAY MODEL

The method of Logical Effort provides a simple way to determine a lower bound on critical path delay in circuits with negligible wire capacitance. If the path has M stages, a path effort of F, and a parasitic delay of PD, the delay achieved with the best transistor sizes is

$$D = D_F + PD = M F^{1/M} + PD \qquad (1)$$

where D is measured in units of τ, the delay of an ideal inverter with no parasitic capacitance driving an identical inverter. Delay is normalized to that of a fanout-of-4 (FO4) inverter with the conversion FO4 ≈ 5τ.

In general, achieving least delay requires using different transistor sizes in each gate (although this delay model has assumed that all transistors in a branch scale uniformly). A regular layout with consistent transistor sizes in each type of cell is easier to build but may sacrifice performance. Consider designing all cells to have an arbitrary unit drive (i.e. output conductance). Define an inverter with unit drive to have unit input capacitance. For circuits with a single stage per cell (e.g. inverting static CMOS), the path effort delay is simply the sum of the effort delays f_i of each stage:

$$D_F = \sum_{i=1}^{M} f_i \qquad (2)$$

The total delay is still the sum of the path effort and parasitic delays.

In a circuit with two stages per cell (e.g. noninverting static CMOS or domino), let us design the first stage to have unit drive and choose the size of the second stage for least delay. If the path has C = M/2 cells and the effort of the ith cell is F_i, the path effort delay is

$$D_F = \sum_{i=1}^{C} 2\sqrt{F_i} \qquad (3)$$

[2] showed that the delay with uniform sizes is only slightly longer than the delay with arbitrary sizes, except on architectures like Sklansky that have unusually large fanouts on certain nodes. The uniform size designs are also easier to lay out and permit closed-form results when wire capacitance is considered, so we focus on them in this paper.

Horizontal wires add capacitance to the load of each stage. Let the wire capacitance be w units per column spanned. w depends on the width of each column, the width and spacing between wires, and the size of a unit transistor; in a trial layout in a [...] nm process, w ≈ [...].
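The three delay expressions in this section can be tried out numerically with the short Python sketch below. It covers the best-case path delay, the uniform-size sum for one stage per cell, and the uniform-size sum for two stages per cell, where each cell is assumed to contribute 2·sqrt(F_i) (two stages each carrying effort sqrt(F_i)). The function names are hypothetical. The conversion of 5τ per FO4 follows from the usual logical-effort values for an inverter (g = 1, h = 4, p = 1) and should be treated as an assumption of the sketch. Wire capacitance is not modeled.

```python
import math

# Assumed conversion: one FO4 inverter delay = g*h + p = 1*4 + 1 = 5 tau.
TAU_PER_FO4 = 5.0


def optimal_path_delay(path_effort, n_stages, parasitic):
    """Best-case delay in tau: D = M * F**(1/M) + PD."""
    if n_stages < 1 or path_effort <= 0:
        raise ValueError("need n_stages >= 1 and path_effort > 0")
    return n_stages * path_effort ** (1.0 / n_stages) + parasitic


def uniform_single_stage_delay(stage_efforts, parasitic):
    """Uniform-size cells, one stage per cell: sum of stage efforts + PD."""
    return sum(stage_efforts) + parasitic


def uniform_two_stage_delay(cell_efforts, parasitic):
    """Uniform-size cells, two stages per cell.

    Each cell with effort F_i is assumed to contribute 2*sqrt(F_i),
    the least delay of a two-stage path carrying effort F_i.
    """
    return sum(2.0 * math.sqrt(f) for f in cell_efforts) + parasitic


def to_fo4(delay_tau):
    """Convert a delay in tau to FO4 inverter delays."""
    return delay_tau / TAU_PER_FO4


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper's tables.
    d = uniform_two_stage_delay([16.0, 16.0], parasitic=4.0)
    print(f"{d:.1f} tau = {to_fo4(d):.1f} FO4")
```

When every cell has the same effort, the two-stage uniform result coincides with the best-case expression (two cells of effort 16 give 16τ of effort delay either way), which is consistent with the observation that uniform sizing costs little unless some nodes have unusually large fanout.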
While there is no closed-form solution for the minimum-delay problem with wire capacitance, the delay assuming fixed cell sizes is readily calculated by adding the wire capacitance to the stage effort f or F in Eq. (2) or (3).

V. RESULTS

The adder delays were evaluated using a MATLAB script. Fig. 7 plots delay (in FO4 inverter delays) vs. number of bits for various adder architectures and circuit families assuming w = [...]. The three curves on each set of axes indicate valency 2, 3, and 4 delays.

The delay is nearly independent of the valency for both static and domino designs of most architectures. Brent-Kung architectures are an exception: they benefit from higher valency for noninverting circuits because the stage effort is too low with valency 2, but Brent-Kung is not the fastest architecture in any case. Domino gates are consistently faster than static gates, and footless domino is faster than footed. The designs with two gates per stage (all but inverting CMOS) are better at driving the heavy wire loads and hence perform better for wide adders.

VI. CONCLUSIONS

The logical effort model facilitates rapid comparison of a wide variety of adder architectures using multiple circuit families while accounting for the costs of fanout and interconnect. Under the assumptions made in this paper, the delay is nearly independent of the valency for both static and domino designs of most architectures. Brent-Kung architectures are an exception: they benefit from higher valency for noninverting circuits because the stage effort is too low with valency 2, but Brent-Kung is not the fastest architecture in any case. Valency 2 designs are the simplest to implement.

This paper has not considered the area, power, or wiring tradeoffs of higher valency adders. In practice, the logical efforts of gates are likely to be lower on account of velocity saturation, but the parasitic delays are likely to be higher when internal nodes are considered. Simulations of extracted layouts could answer these questions.

REFERENCES

[1] A. Beaumont-Smith and C. Lim, "Parallel prefix adder design," Proc. IEEE Symp. Comp. Arith., June.
[2] D. Harris and I. Sutherland, "Logical effort of carry propagate adders," Proc. Asilomar Conf. Signals, Systems, and Computers.
[3] T. Lynch and E. Swartzlander, "A spanning tree carry lookahead adder," IEEE Trans. Computers, Aug.
[4] S. Mathew, M. Anders, R. Krishnamurthy, and S. Borkar, "A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core," J. Solid-State Circuits, May.
[5] S. Naffziger, "A subnanosecond 0.5 µm 64 b adder design," Intl. Solid-State Circuits Conf.
[6] N. Weste and D. Harris, CMOS VLSI Design, Addison-Wesley.
Fig. 6. Higher valency adder architectures. Panels: (a) Brent-Kung, (b) Sklansky, (c) Kogge-Stone, (d) Han-Carlson, (e) Ladner-Fischer.
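Fig. 7. Adder delay vs. number of bits (logical effort model results). Panels: Brent-Kung, Sklansky, Kogge-Stone, Han-Carlson, and Ladner-Fischer architectures, each for inverting, noninverting, footed domino, and footless domino circuit families. Horizontal axis: number of bits. Vertical axis: delay (FO4). Curves: valency 2, 3, and 4.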