EE241 - Spring 2010
Advanced Digital Integrated Circuits
Lecture 25: Digital Arithmetic
Adders

Announcements
- Homework 4 due today
- Quiz #4 today
- In-class (80 min) final exam on April 29
- Project reports due on May 4: 6 pages, double column
- Project presentations May 5, 1-4pm: 15 minute talk + 5 minute Q&A

Outline
- Last lecture
  - Domino timing
  - Other dynamic styles
- This lecture
  - Adders

Adders

Arithmetic Circuits
- Chapter 11, Rabaey, 2nd ed.
- Selected journal publications
- Books:
  - Ercegovac and Lang, Digital Arithmetic, Elsevier, 2004
  - V. Oklobdzija, High-Speed VLSI Arithmetic Units: Adders and Multipliers, in Chandrakasan et al.

Adders
- EE141
  - Ripple carry & implementation
  - Carry bypass (skip)
  - Carry select
  - Carry lookahead (basic)
- EE241
  - Conditional sum
  - More carry lookahead

Conditional Sum Adders
- Carry-in of 0: $s_i^0 = x_i \oplus y_i$, $c_{o,i}^0 = x_i y_i$
- Carry-in of 1: $s_i^1 = \overline{x_i \oplus y_i}$, $c_{o,i}^1 = x_i + y_i$
- Sklansky, Trans. on Comp., 6/60

Conditional Sum Adders
[Figure]
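
The conditional cell equations above can be exercised with a small behavioral model. This is a minimal sketch, not the circuit from the slides: the function names (`conditional_cell`, `select`, `merge`, `conditional_sum_add`) and the block representation as a tuple `(sums_if_0, cout_if_0, sums_if_1, cout_if_1)` are assumptions made for illustration. Each bit computes both alternatives, and adjacent blocks are then merged pairwise, with the low block's carry-out choosing between the high block's two alternatives.

```python
def conditional_cell(x, y):
    """Both (sum, carry-out) alternatives for one bit position."""
    s0, c0 = x ^ y, x & y          # assuming carry-in = 0
    s1, c1 = 1 - (x ^ y), x | y    # assuming carry-in = 1
    return (s0, c0), (s1, c1)


def select(block, carry_in):
    """Model of a 2-way mux: pick one alternative of a block."""
    sums0, cout0, sums1, cout1 = block
    return (sums1, cout1) if carry_in else (sums0, cout0)


def merge(low, high):
    """Combine two adjacent blocks into one block of twice the width."""
    low_s0, low_c0, low_s1, low_c1 = low
    high_sa, high_ca = select(high, low_c0)   # low block saw carry-in 0
    high_sb, high_cb = select(high, low_c1)   # low block saw carry-in 1
    return (low_s0 + high_sa, high_ca, low_s1 + high_sb, high_cb)


def conditional_sum_add(a, b, n, cin=0):
    """Add two n-bit integers; returns (sum mod 2**n, carry-out)."""
    blocks = []
    for i in range(n):  # least significant bit first
        (s0, c0), (s1, c1) = conditional_cell((a >> i) & 1, (b >> i) & 1)
        blocks.append(([s0], c0, [s1], c1))
    while len(blocks) > 1:  # about log2(n) merge levels
        merged = [merge(blocks[j], blocks[j + 1])
                  for j in range(0, len(blocks) - 1, 2)]
        if len(blocks) % 2:          # odd block count: pass the top block up
            merged.append(blocks[-1])
        blocks = merged
    bits, cout = select(blocks[0], cin)
    return sum(bit << i for i, bit in enumerate(bits)), cout
```

The real carry-in is only consumed by the final `select`, which is the point of the scheme: all the work before it is independent of the carry.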

TG Conditional Sum
[Figure: conditional sum adder; conditional cell; 2-way MUXes]
- Rothermel, JSSC 89

TG Conditional Sum
- Serial connection of transmission gates
- Chain length = $1 + \log_2 n$
- Signal propagation

DPL Conditional Sum
[Figure: CLA; conditional carry select]

DPL Conditional Sum
[Figure: Block Conditional Sums]

Carry-Lookahead Adders
- Adder trees
  - Radix of a tree
  - Minimum depth trees
  - Sparse trees
- Logic manipulations
  - Conventional vs. Ling
  - Stack height limiting

Lookahead Adder: Basic Idea
[Figure: N bit slices with inputs $A_k, B_k$, carry-ins $C_{i,k}$, propagate signals $P_k$, and sums $S_k$, for $k = 0, \dots, N-1$]
$C_{o,k} = f(A_k, B_k, C_{i,k}) = G_k + P_k C_{i,k}$, where $C_{i,k} = C_{o,k-1}$

Propagate and Generate Signals
Define 2 (or 3) new variables which ONLY depend on inputs $a_k$, $b_k$:
- Generate: $g_k = a_k b_k$
- Propagate: $p_k = a_k + b_k$ (could be XOR as well)
- (Delete: $d_k = \overline{a_k}\,\overline{b_k}$)
$c_{out}(g_k, p_k) = g_k + p_k c_{in}$
$s(g_k, p_k) = a_k \oplus b_k \oplus c_{in}$
Can also derive expressions for $s$ and $c_{out}$ based on $d_k$ and $p_k$.

Lookahead Adder
Lookahead equations
- Position k: $c_k = g_k + p_k c_{k-1}$
- Position k + 1: $c_{k+1} = g_{k+1} + p_{k+1} c_k = g_{k+1} + p_{k+1}(g_k + p_k c_{k-1}) = g_{k+1} + p_{k+1} g_k + p_{k+1} p_k c_{k-1}$
Carry exists if:
- generated in stage k + 1
- generated in stage k and propagated through k + 1
- the incoming carry $c_{k-1}$ is propagated through both k and k + 1
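
The generate/propagate recurrence can be checked against ordinary integer addition. The sketch below is illustrative only: the function names and the list-based bit representation are assumptions, and the carries are evaluated serially here (a lookahead circuit would evaluate the unrolled form in parallel). `unrolled_two_stages` is the three-term expression for position k + 1.

```python
def generate_propagate(a, b, n):
    """Per-bit generate (AND) and propagate (OR) signals, LSB first."""
    g = [((a >> k) & (b >> k)) & 1 for k in range(n)]
    p = [((a >> k) | (b >> k)) & 1 for k in range(n)]
    return g, p


def lookahead_carries(a, b, n, cin=0):
    """Carry out of every bit position via c_k = g_k + p_k * c_(k-1)."""
    g, p = generate_propagate(a, b, n)
    carries, c = [], cin
    for k in range(n):
        c = g[k] | (p[k] & c)
        carries.append(c)
    return carries


def unrolled_two_stages(g, p, k, c_prev):
    """c_(k+1) written directly in terms of c_(k-1)."""
    return g[k + 1] | (p[k + 1] & g[k]) | (p[k + 1] & p[k] & c_prev)


def add_with_gp(a, b, n, cin=0):
    """n-bit addition built from the carries; returns (sum, carry-out)."""
    carries = lookahead_carries(a, b, n, cin)
    carry_in = [cin] + carries[:-1]
    total = 0
    for k in range(n):
        total |= ((((a >> k) ^ (b >> k)) & 1) ^ carry_in[k]) << k
    return total, carries[-1]
```

Note that the sum uses the XOR of the inputs even though the propagate signal is the OR; the OR form is sufficient for carries only.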

Lookahead Adder
- Unrolling of the carry recurrence can be continued
- If unrolled to level k, the result is a two-level AND-OR structure
  - AND fan-in = k + 1, OR fan-in = k + 1
  - k + 1 transistors in the MOS stack
  - Limits k to 2-4
- Later referred to as the radix of an adder

Carry Lookahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$\phantom{C_{o,2}} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0})$
$\phantom{C_{o,2}} = G_{2:1} + P_{2:1} C_{o,0}$
Can continue building the tree hierarchically
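
The regrouping of the carry-out of bit 2 into group signals can be confirmed by brute force. This is a small sketch with hypothetical function names; it enumerates all values of the seven Boolean inputs and compares the flat two-level form with the grouped form $G_{2:1} + P_{2:1} C_{o,0}$.

```python
from itertools import product


def carry_out_2_flat(G, P, ci0):
    """Fully unrolled two-level AND-OR expression for the carry out of bit 2."""
    return (G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
            | (P[2] & P[1] & P[0] & ci0))


def carry_out_2_grouped(G, P, ci0):
    """Same carry, built from the group signals of bits 2:1 and from Co,0."""
    co0 = G[0] | (P[0] & ci0)
    g21 = G[2] | (P[2] & G[1])   # group generate for bits 2:1
    p21 = P[2] & P[1]            # group propagate for bits 2:1
    return g21 | (p21 & co0)


def forms_agree():
    """True if both forms match for every combination of the 7 inputs."""
    for bits in product((0, 1), repeat=7):
        G, P, ci0 = bits[0:3], bits[3:6], bits[6]
        if carry_out_2_flat(G, P, ci0) != carry_out_2_grouped(G, P, ci0):
            return False
    return True
```

The flat form needs a 4-input AND and a 4-input OR, while the grouped form never needs more than two inputs per gate, which is the trade the tree makes: more levels, shorter stacks.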

Tree Adders
- $P = p_m p_l$, $G = g_m + p_m g_l$ (m: more significant, l: less significant)
- Start from the input P, G, and continue up the tree: 2-bit groups, then 4-bit groups, ...
- $(G, P) = (g_m, p_m) \bullet (g_l, p_l) = (g_m + p_m g_l,\; p_m p_l)$
- Kogge, Stone, Trans. on Comp., 73
- Radix 2

Adder Structure
- Carry tree and sum precompute operate in parallel
- Sum select selects the correct precomputed sum based on the final carry
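
The group-combining operator can be written as a two-line function, which makes its key property easy to test: it is associative, so groups may be merged in any tree shape as long as bit order is preserved. The names `dot` and `group_signals` are assumptions for this sketch, and pairs are ordered `(g, p)` to match the slide.

```python
from functools import reduce


def dot(more, less):
    """Combine a more significant (g, p) pair with a less significant one."""
    g_m, p_m = more
    g_l, p_l = less
    return (g_m | (p_m & g_l), p_m & p_l)


def group_signals(a, b, n):
    """Group (G, P) over bits n-1..0, folding from the most significant bit."""
    pairs = [(((a >> k) & (b >> k)) & 1, ((a >> k) | (b >> k)) & 1)
             for k in range(n)]
    return reduce(dot, reversed(pairs))
```

With no carry-in, the group generate over all bits is the adder's carry-out, so `group_signals(a, b, 4)[0]` should equal `(a + b) >> 4` for 4-bit operands.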

Adder Optimization
- If given:
  - Input capacitance
  - Overall fanout (loading capacitance)
  - Wiring structure
  - Adder topology
- Optimization can be performed to:
  - Minimize the delay subject to power
  - Minimize the power for a given delay constraint

Design Considerations for CLA Adders
- Wire capacitance is determined by the microarchitecture
  - Carry signals cross a certain number of bit slices
- The adder topology determines the wire capacitance
  - Weak function of gate sizing
- The capacitance of wires depends on the tree topology and wiring/shielding methodology
[Figure: bit slices 0 to 63 between loopback buses; from register files / cache / bypass through multiplexers, shifter, adder stages 1 to 3 with wiring, and sum select, to register files / cache]

Specifying the Output Capacitance
- Fanout is dictated by the architecture
- In Itanium, each IEU drives 6 other IEUs, register files and the cache, through a long bus
- Thus the fanout is larger than 15-20, but depends on the ratio of the IEU input capacitance to the bus capacitance
- The bus is driven through a buffer, thus reducing the adder fanout to close to 1

Specifying the Input Capacitance
- Larger $C_{in}$:
  - Less impact of internal wires
  - Less fanout (less impact of the bus)
  - Faster adder
  - Power grows linearly with $C_{in}$
- Smaller $C_{in}$:
  - Larger impact of internal wires
  - Larger fanout
  - Slower, lower power adder
- Optimum tradeoff:
  - For desired de/dd (for both adder and 6 IEUs), find optimal Cg/Cw
  - For example, de/dd = 2, Cg/Cw = 2.5-3

Carry Tree Considerations
- Number of signals merging at each stage (radix)
- Uniform vs. non-uniform
- Number of logic levels
- Full vs. sparse trees

Tree Adders: Kogge-Stone
[Figure: 16-bit radix-2 Kogge-Stone tree; inputs $(A_0, B_0)$ to $(A_{15}, B_{15})$, outputs $S_0$ to $S_{15}$]
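
A behavioral model of the radix-2 Kogge-Stone tree helps show where the logic levels come from. This is a sketch under stated assumptions, not a gate-level description: the function name and the returned level count are illustrative, the propagate signal is taken as XOR so it can be reused for the sum, and the carry-in is folded into the bit-0 generate signal.

```python
def kogge_stone_add(a, b, n=16, cin=0):
    """Radix-2 Kogge-Stone addition; returns (sum, carry-out, tree levels)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # XOR propagate

    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)      # fold the carry-in into bit 0

    levels, distance = 0, 1
    while distance < n:
        G_next, P_next = G[:], P[:]
        for i in range(distance, n):
            # Every position merges with the group `distance` bits below it.
            G_next[i] = G[i] | (P[i] & G[i - distance])
            P_next[i] = P[i] & P[i - distance]
        G, P = G_next, P_next
        distance *= 2
        levels += 1

    # G[i] is now the carry out of bit i; the sum needs the carry into bit i.
    carry_in = [cin] + G[:-1]
    total = sum((p[i] ^ carry_in[i]) << i for i in range(n))
    return total, G[-1], levels
```

Each level doubles the span covered at every bit position, so 16 bits take 4 levels, and every node drives the same small number of successors, which is the uniform-fanout property of this tree.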

Tree Adders: Other Trees
- Ladner-Fischer
[Figure: 16-bit Ladner-Fischer tree; inputs $(A_0, B_0)$ to $(A_{15}, B_{15})$, outputs $S_0$ to $S_{15}$]

Kogge-Stone vs. Ladner-Fischer
- Uniform vs. progressively increasing fanouts
- Ladner-Fischer much slower
- Needs internal buffering

Tree Adders: Radix 4
[Figure: 16-bit radix-4 Kogge-Stone tree; inputs $(a_0, b_0)$ to $(a_{15}, b_{15})$, outputs $S_0$ to $S_{15}$]

Radix-2 vs. Radix-4
- More logic stages make it easier to drive fanout
- Fanout is low, radix-4 can be padded with inverters
- Radix-4 has fewer stages and could have a speed advantage when driving low fanouts
- Radix-2 has lower stack heights
- Radix-4 has longer wires (64 bits: crosses 48 bit slices vs. 32 in radix-2). Fewer logic stages precede the large wire load.
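
The stage counts and wire lengths quoted for 64 bits follow from simple arithmetic. The sketch below assumes a full Kogge-Stone-style tree in which a node at the last level reaches back over `radix - 1` groups, each `radix ** (levels - 1)` bits wide; the function names are hypothetical.

```python
def tree_levels(n_bits, radix):
    """Number of merge levels needed to span n_bits with the given radix."""
    levels, span = 0, 1
    while span < n_bits:
        span *= radix
        levels += 1
    return levels


def longest_wire_span(n_bits, radix):
    """Bit slices crossed by the longest wire feeding the last tree level."""
    levels = tree_levels(n_bits, radix)
    if levels == 0:
        return 0
    return (radix - 1) * radix ** (levels - 1)
```

Under these assumptions, 64 bits give 6 levels and a 32-slice span for radix 2, and 3 levels and a 48-slice span for radix 4, matching the figures on the slide.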

Ling Adder
CLA:
- $g_i = a_i b_i$
- $p_i = a_i + b_i$
- $G_{i:0} = g_i + p_i G_{i-1:0}$
- $S_i = a_i \oplus b_i \oplus G_{i-1:0}$
Ling's equations:
- $g_i = a_i b_i$
- $t_i = a_i + b_i$
- $H_{i:0} = g_i + t_{i-1} H_{i-1:0}$
- $S_i = t_i \oplus H_{i:0} + g_i t_{i-1} H_{i-1:0}$
Ling, IBM J. Res. Dev., 5/81

Ling Adder
Conventional radix-4:
$G_{3:0} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
Ling's radix-4:
$H_{3:0} = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$
- Reduces the stack height (or width)
- Reduces input loading
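
Ling's recurrence and sum equation can be verified against integer addition with a short model. This is a sketch with assumed names (`gen_and_transmit`, `ling_add`, `h3_radix4`, `g3_radix4`) and no carry-in, so the terms below bit 0 are taken as 0. It also uses the relation $G_{i:0} = t_i H_{i:0}$, which follows from $t_i g_i = g_i$, to recover the true carry-out from the pseudo-carry.

```python
def gen_and_transmit(a, b, n):
    """Per-bit generate g (AND) and transmit t (OR) signals, LSB first."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    return g, t


def ling_add(a, b, n):
    """n-bit addition from Ling pseudo-carries; returns (sum, carry-out)."""
    g, t = gen_and_transmit(a, b, n)
    h, below = [], []
    for i in range(n):
        # t_(i-1) * H_(i-1:0) is the true carry into bit i (0 for bit 0).
        carry_below = (t[i - 1] & h[i - 1]) if i > 0 else 0
        below.append(carry_below)
        h.append(g[i] | carry_below)          # H_(i:0)
    total = 0
    for i in range(n):
        s_i = (t[i] ^ h[i]) | (g[i] & below[i])
        total |= s_i << i
    return total, t[n - 1] & h[n - 1]          # G = t * H


def h3_radix4(g, t):
    """Ling's radix-4 pseudo-carry in its reduced form."""
    return g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])


def g3_radix4(g, p):
    """Conventional radix-4 group generate."""
    return (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0]))
```

The longest product term in `h3_radix4` has three literals against four in `g3_radix4`, which is the stack-height reduction the slide refers to; the cost is the extra AND with $t_3$ (or the more complex sum) needed to recover the true carry.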

Ling vs. CLA
[Figure: clocked (CK) gate schematics comparing the conventional $G_3$ gate with Ling's $H_3$ gate; inputs $a_3$ to $a_0$ and $b_3$ to $b_0$]