CPE/EE 427, CPE 527 VLSI Design I
L2: Adder Design
Department of Electrical and Computer Engineering
University of Alabama in Huntsville
Aleksandar Milenkovic (www.ece.uah.edu/~milenka)
www.ece.uah.edu/~milenka/cpe527-3f
[Adapted from Rabaey's Digital Integrated Circuits, 2002, J. Rabaey et al. and Mary Jane Irwin (www.cse.psu.edu/~mji)]
/7/23  VLSI Design I; A. Milenkovic

The 1-bit Binary Adder

1-bit Full Adder (FA): inputs A, B, Cin; outputs S, Cout.

A  B  Cin | Cout  S | carry status
0  0  0   |  0    0 | kill
0  0  1   |  0    1 | kill
0  1  0   |  0    1 | propagate
0  1  1   |  1    0 | propagate
1  0  0   |  0    1 | propagate
1  0  1   |  1    0 | propagate
1  1  0   |  1    0 | generate
1  1  1   |  1    1 | generate

$G = A \& B$
$P = A \oplus B$
$K = !A \& !B$
$S = A \oplus B \oplus C_{in} = P \oplus C_{in}$
$C_{out} = A \& B \mid A \& C_{in} \mid B \& C_{in}$ (majority function) $= G \mid P \& C_{in}$

- How can we use it to build a 64-bit adder?
- How can we modify it easily to build an adder/subtractor?
- How can we make it better (faster, lower power, smaller)?

Course Administration
- Instructor: Aleksandar Milenkovic, milenka@ece.uah.edu, www.ece.uah.edu/~milenka, E 27-L, Office Hrs: MW 7:3-8:3
- TA: Fathima Tareen, tareenf@eng.uah.edu
- Project pr.: for schedule see http://www.ece.uah.edu/~milenka/cpe527-3f; follow conventions for ppt file names; timing, content, ...
- HW#3: Due 2//3
- Project: Reports due 2/2/3; design submission due /2/3 (arrange with instructor & lab instructor)

FA Gate Level Implementations
The way you learned to design in EE2 and CPE422.
[Figure: gate-level full adder schematics annotated with gate delays]

Review: Basic Building Blocks
- Datapath
  - Execution units: adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
- Control
  - Finite state machines (PLA, ROM, random logic)
- Interconnect
  - Switches, arbiters, buses
- Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers

Carry-Look-Ahead Adder CLA (1)
Idea: speed up carry computation.
$C_{i+1} = G_i + P_i \cdot C_i$
- Propagate: $P_i = A_i + B_i$. If $P_i = 1$, the carry from the $(i-1)$th stage is propagated.
- Generate: $G_i = A_i \cdot B_i$. If $G_i = 1$, there is a carry out.

$P_i = A_i + B_i$
$G_i = A_i B_i$
$S_i = A_i \oplus B_i \oplus C_i$

$C_{i+1} = G_i + P_i C_i$
$\quad = G_i + P_i (G_{i-1} + P_{i-1} C_{i-1})$
$\quad = G_i + P_i G_{i-1} + P_i P_{i-1} C_{i-1}$
$\quad = G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} C_{i-2})$
$\quad = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + P_i P_{i-1} P_{i-2} C_{i-2}$
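The unrolling of $C_{i+1} = G_i + P_i C_i$ on the CLA (1) slide can be checked numerically. The following is a minimal sketch, not circuit code: the function names `lookahead_carries` and `ripple_carries` are hypothetical, operands are assumed to be plain Python integers, and $P_i$ is taken as $A_i + B_i$ (OR) as on the slide.

```python
def lookahead_carries(a, b, c0, n):
    """Carries [C_0..C_n] from the fully unrolled form of C_{i+1} = G_i + P_i*C_i."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # G_i = A_i * B_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]    # P_i = A_i + B_i
    carries = [c0]
    for i in range(n):
        # Term that passes C_0 through every stage: P_i ... P_0 * C_0
        c = c0
        for j in range(i + 1):
            c &= p[j]
        # One term per generating stage k: G_k * P_{k+1} ... P_i
        for k in range(i + 1):
            term = g[k]
            for j in range(k + 1, i + 1):
                term &= p[j]
            c |= term
        carries.append(c)
    return carries


def ripple_carries(a, b, c0, n):
    """Reference: the same recurrence evaluated one stage at a time."""
    carries = [c0]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carries.append((ai & bi) | ((ai | bi) & carries[-1]))
    return carries
```

In the unrolled form every carry depends only on the $G$, $P$ signals and $C_0$, which is what lets hardware compute all carries in parallel; the price is that the number of product terms grows with the bit position.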
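Carry-Look-Ahead Adder CLA (2)
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$
$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$
[Figure: 4-bit CLA made of a PG generator, a carry generate block, and a sum generator]

Review: XOR FA
[Figure: XOR-based full adder schematic]
6 transistors

Carry-Look-Ahead Adder CLA (4)
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$
$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$

Grouping the terms of $C_4$ gives a block generate $G_{3:0} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$ and a block propagate $P_{3:0} = P_3 P_2 P_1 P_0$, and likewise for the higher 4-bit blocks:

$C_4 = G_{3:0} + P_{3:0} C_0$
$C_8 = G_{7:4} + P_{7:4} C_4 = G_{7:4} + P_{7:4} (G_{3:0} + P_{3:0} C_0)$
$\quad = G_{7:4} + P_{7:4} G_{3:0} + P_{7:4} P_{3:0} C_0$
$C_{12} = G_{11:8} + P_{11:8} C_8 = G_{11:8} + P_{11:8} (G_{7:4} + P_{7:4} G_{3:0} + P_{7:4} P_{3:0} C_0)$
$\quad = G_{11:8} + P_{11:8} G_{7:4} + P_{11:8} P_{7:4} G_{3:0} + P_{11:8} P_{7:4} P_{3:0} C_0$
$C_{16} = G_{15:12} + P_{15:12} C_{12} = G_{15:12} + P_{15:12} (G_{11:8} + P_{11:8} G_{7:4} + P_{11:8} P_{7:4} G_{3:0} + P_{11:8} P_{7:4} P_{3:0} C_0)$
$\quad = G_{15:12} + P_{15:12} G_{11:8} + P_{15:12} P_{11:8} G_{7:4} + P_{15:12} P_{11:8} P_{7:4} G_{3:0} + P_{15:12} P_{11:8} P_{7:4} P_{3:0} C_0$

Review: CPL FA
[Figure: complementary pass-transistor logic (CPL) full adder schematic]
2+8 transistors, dual rail. Beware of threshold drops.

Carry-Look-Ahead Adder CLA (5)
[Figure: 16-bit CLA built from four 4-bit blocks (bits 15-12, 11-8, 7-4, 3-0). Each block supplies its block signals $G_{15:12}, P_{15:12}$, $G_{11:8}, P_{11:8}$, $G_{7:4}, P_{7:4}$, $G_{3:0}, P_{3:0}$ to a carry generator that produces $C_{12}$, $C_8$, and $C_4$ from $C_0$.]

Delay Balanced FA
Identical delays for carry and sum.
[Figure: full adder schematic divided into signal set-up, carry generation, and sum generation stages]
2+2 transistors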
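The block-level equations for $C_4$, $C_8$, $C_{12}$, and $C_{16}$ on the CLA (4) slide can be exercised with a short model. This is a sketch under stated assumptions: the helper names `bit_gp`, `group_gp`, and `block_carries` are hypothetical, the adder is fixed at 16 bits in 4-bit blocks, and the block carries are evaluated with the recursive form $C_{4(k+1)} = G_{blk} + P_{blk} C_{4k}$ rather than the fully expanded sums.

```python
def bit_gp(a, b, n):
    """Per-bit generate and propagate lists (P_i taken as A_i + B_i)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    return g, p


def group_gp(g, p, lo, hi):
    """Block generate/propagate for bits hi..lo, e.g. G_{3:0} and P_{3:0}."""
    big_g, big_p = 0, 1
    for i in range(lo, hi + 1):
        big_g = g[i] | (p[i] & big_g)   # builds G_hi + P_hi G_{hi-1} + ...
        big_p &= p[i]                   # P_hi ... P_lo
    return big_g, big_p


def block_carries(a, b, c0):
    """Return {0: C_0, 4: C_4, 8: C_8, 12: C_12, 16: C_16} for a 16-bit add."""
    g, p = bit_gp(a, b, 16)
    carries = {0: c0}
    c = c0
    for lo in (0, 4, 8, 12):
        big_g, big_p = group_gp(g, p, lo, lo + 3)
        c = big_g | (big_p & c)         # C_{lo+4} = G_blk + P_blk * C_lo
        carries[lo + 4] = c
    return carries
```

For example, `block_carries(0x00FF, 0x0001, 0)` shows the carry generated in bit 0 rippling out of the two low blocks: $C_4 = C_8 = 1$ and $C_{12} = C_{16} = 0$.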
Review: Mirror Adder
24+4 transistors
[Figure: mirror adder schematic with transistor sizes; the branches of the carry circuit are labeled kill, 0-propagate, 1-propagate, and generate]
$C_{out} = A \& B \mid B \& C_{in} \mid A \& C_{in}$
$SUM = A \& B \& C_{in} \mid !C_{out} \& (A \mid B \mid C_{in})$
Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since $!C_{out}$ drives 2 internal and 2 inverter transistor gates (to form $C_{in}$ for the next more significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.

Ripple Carry Adder (RCA)
[Figure: four FAs in a chain with inputs $A_3 B_3 \dots A_0 B_0$, outputs $S_3 \dots S_0$, $C_0 = C_{in}$, and $C_{out} = C_4$]
$T_{adder} \approx T_{FA}(A, B \to C_{out}) + (N-2)\, T_{FA}(C_{in} \to C_{out}) + T_{FA}(C_{in} \to S)$
$T = O(N)$ worst case delay
Real Goal: Make the fastest possible carry path.

Mirror Adder Features
- The NMOS and PMOS chains are completely symmetrical, with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
- When laying out the cell, the most critical issue is the minimization of the capacitances at node $!C_{out}$ (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
- The transistors connected to $C_{in}$ are placed closest to the output.
- Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.

Inversion Property
Inverting all inputs to a FA results in inverted values for all outputs.
$!S(A, B, C_{in}) = S(!A, !B, !C_{in})$
$!C_{out}(A, B, C_{in}) = C_{out}(!A, !B, !C_{in})$

64-bit Adder/Subtractor
- Ripple Carry Adder (RCA) built out of 64 FAs
- Subtraction: complement all subtrahend bits (xor gates) and set the low order carry-in
- RCA
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow ($O(N)$ for $N$ bits) and lots of glitching (so lots of energy consumption)
[Figure: 64 1-bit FAs in a chain with inputs $A_0 \dots A_{63}$ and $B_0 \dots B_{63}$ (B through xor gates controlled by add/subt), $C_0 = C_{in}$, intermediate carries $C_1, C_2, C_3, \dots$, $C_{64} = C_{out}$, and outputs $S_0 \dots S_{63}$]
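The adder/subtractor recipe above (xor every subtrahend bit with the add/subt control and feed the same control into the low-order carry-in) can be modeled bit by bit. This is a minimal behavioral sketch; the names `full_adder` and `add_sub` are hypothetical, and operands are assumed to be non-negative Python integers that fit in `n` bits.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (S, Cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)   # majority function
    return s, cout


def add_sub(a, b, subtract, n=64):
    """n-bit ripple carry adder/subtractor.

    subtract = 0 computes a + b; subtract = 1 computes a - b in two's
    complement by complementing b and setting the low-order carry-in.
    Returns (n-bit result, carry out of the last FA).
    """
    carry = subtract                          # C_0 = add/subt control
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract        # xor gate on each B input
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    return result, carry
```

For instance, `add_sub(5, 3, 1)` yields a result of 2, while `add_sub(3, 5, 1)` yields $2^{64} - 2$, the 64-bit two's complement encoding of -2. The loop visits the FAs strictly in order, which mirrors the $O(N)$ carry path of the RCA.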
Exploiting the Inversion Property
[Figure: 4-bit ripple carry chain from $C_0 = C_{in}$ to $C_{out} = C_4$ in which inverted FA cells alternate with regular FA cells]
- Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).
- Now need two flavors of FAs.
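One way to see why alternating cells removes the inverters from the carry chain is to model a full adder core that, like the mirror adder before its output inverters, produces $!S$ and $!C_{out}$. The sketch below is an assumption-laden illustration, not the slide's circuit: the names `inverting_fa` and `ripple_alternating` are hypothetical, and the choice that even-numbered cells are "regular" and odd-numbered cells are "inverted" is made here only for concreteness.

```python
def inverting_fa(a, b, c):
    """FA core without output inverters: returns (!S, !Cout)."""
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)
    return s ^ 1, cout ^ 1


def ripple_alternating(a, b, cin, n):
    """n-bit ripple adder with no inverter anywhere on the carry chain."""
    result = 0
    carry_wire = cin                      # true polarity into cell 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            # Regular cell: true inputs, so both outputs come out inverted.
            not_s, carry_wire = inverting_fa(ai, bi, carry_wire)
            s = not_s ^ 1                 # sum inverter, off the carry path
        else:
            # Inverted cell: A, B are complemented and the incoming carry
            # wire already holds !C_i, so by the inversion property the
            # outputs come out in true polarity.
            s, carry_wire = inverting_fa(ai ^ 1, bi ^ 1, carry_wire)
        result |= s << i
    # After an odd number of cells the carry wire still holds !Cout.
    cout = carry_wire if n % 2 == 0 else carry_wire ^ 1
    return result, cout
```

The only inversions left are on the A, B inputs of the odd cells and the sum outputs of the even cells, none of which sit on the carry path.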
Fast Carry Chain Design
- The key to fast addition is a low latency carry network.
- What matters is whether in a given position a carry is
  - generated: $G_i = A_i \& B_i = A_i B_i$
  - propagated: $P_i = A_i \oplus B_i$ (sometimes use $A_i \mid B_i$)
  - annihilated (killed): $K_i = !A_i \& !B_i$
- Giving a carry recurrence of $C_{i+1} = G_i \mid P_i C_i$

$C_1 = G_0 \mid P_0 C_0$
$C_2 = G_1 \mid P_1 G_0 \mid P_1 P_0 C_0$
$C_3 = G_2 \mid P_2 G_1 \mid P_2 P_1 G_0 \mid P_2 P_1 P_0 C_0$
$C_4 = G_3 \mid P_3 G_2 \mid P_3 P_2 G_1 \mid P_3 P_2 P_1 G_0 \mid P_3 P_2 P_1 P_0 C_0$

Manchester Carry Chain
- Switches controlled by $G_i$ and $P_i$
- Total delay of
  - time to form the switch control signals $G_i$ and $P_i$
  - setup time for the switches
  - signal propagation delay through N switches in the worst case
[Figure: one clocked chain stage with signals clk, $G_i$, $P_i$, $!C_i$, and $!C_{i+1}$]

4-bit Sliced MCC Adder
[Figure: four bit slices; each slice forms $G_i$ (AND) and $P_i$ (XOR) from $A_i$, $B_i$, drives the clocked chain from $!C_0$ to $!C_4$, and produces the sum outputs $S_0 \dots S_3$]

Domino Manchester Carry Chain Circuit
[Figure: clocked domino chain with inputs $P_0 \dots P_3$, $G_0 \dots G_3$, and $C_{i,0}$, with transistor sizes and numbered nodes]
Chain node outputs:
$!(G_0 \mid P_0 C_{i,0})$
$!(G_1 \mid P_1 G_0 \mid P_1 P_0 C_{i,0})$
$!(G_2 \mid P_2 G_1 \mid P_2 P_1 G_0 \mid P_2 P_1 P_0 C_{i,0})$
$!(G_3 \mid P_3 G_2 \mid P_3 P_2 G_1 \mid P_3 P_2 P_1 G_0 \mid P_3 P_2 P_1 P_0 C_{i,0})$

Binary Adder Landscape
synchronous word parallel adders
- ripple carry adders (RCA): $T = O(N)$, $A = O(N)$
- carry prop min adders: $T = O(1)$, $A = O(N)$
  - signed-digit adders
  - residue adders
- fast carry prop adders
  - Manchester carry chain: $T = O(N)$, $A = O(N)$
  - parallel prefix, conditional sum: $T = O(\log N)$, $A = O(N \log N)$
  - carry select, carry skip: $T = O(\sqrt{N})$, $A = O(N)$
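The remark on the carry chain slides that $P_i$ is sometimes taken as $A_i \mid B_i$ instead of $A_i \oplus B_i$ can be verified exhaustively for a 4-bit chain, along with the fact that every bit position is in exactly one of the generate, propagate, or kill states. The sketch below is illustrative only; `carry_chain`, `xor_p`, `or_p`, and `carry_status` are hypothetical names.

```python
from itertools import product


def carry_chain(a_bits, b_bits, c0, propagate):
    """Evaluate C_{i+1} = G_i | P_i C_i along the chain; returns [C_1..C_n]."""
    c = c0
    carries = []
    for a, b in zip(a_bits, b_bits):
        g = a & b
        p = propagate(a, b)
        c = g | (p & c)
        carries.append(c)
    return carries


def xor_p(a, b):
    return a ^ b          # P_i = A_i xor B_i


def or_p(a, b):
    return a | b          # alternative P_i = A_i | B_i


def carry_status(a, b):
    """Classify one bit position using the XOR definition of propagate."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"         # K_i = !A_i & !B_i


def definitions_agree(n=4):
    """True if both propagate definitions give identical carries for all inputs."""
    for a_bits in product((0, 1), repeat=n):
        for b_bits in product((0, 1), repeat=n):
            for c0 in (0, 1):
                if carry_chain(a_bits, b_bits, c0, xor_p) != carry_chain(a_bits, b_bits, c0, or_p):
                    return False
    return True
```

The two definitions differ only when $A_i = B_i = 1$, and in that case $G_i = 1$ already forces the carry, so the carries are unaffected. The sum bit, however, still needs the XOR form, since $S_i = A_i \oplus B_i \oplus C_i$.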
Carry-Skip (Carry-Bypass) Adder
[Figure: four FAs from $C_{i,0}$ to $C_{o,3}$ with inputs $A_3 B_3 \dots A_0 B_0$, plus a bypass multiplexer controlled by the block propagate signal]
$BP = P_0 P_1 P_2 P_3$ ("Block Propagate")
If ($P_0 \& P_1 \& P_2 \& P_3 = 1$) then $C_{o,3} = C_{i,0}$; otherwise the block itself kills or generates the carry internally.

Carry-Skip Chain Implementation
[Figure: carry chain with signals $P_3 \dots P_0$, $G_3 \dots G_0$, $C_{in}$, BP, $!BP$, block carry-in, block carry-out, and carry-out]

4-bit Block Carry-Skip Adder
[Figure: four 4-bit blocks (bits 12 to 15, bits 8 to 11, bits 4 to 7, bits 0 to 3), each with Setup, Carry Propagation, and Sum stages, chained from $C_{i,0}$]
Worst-case delay: carry from bit 0 to bit 15 = carry generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups (B is the group size in bits), ripples in the last group from bit 12 to bit 15.
$T_{add} = t_{setup} + B\, t_{carry} + ((N/B) - 1)\, t_{skip} + B\, t_{carry} + t_{sum}$

Optimal Block Size and Time
Assuming one stage of ripple ($t_{carry}$) has the same delay as one skip logic stage ($t_{skip}$) and both are 1:
$T_{CSkA} = 1 + B + (N/B - 1) + B + 1 = 2B + N/B + 1$
where the five terms are, in order, $t_{setup}$, ripple in block 0, skips, ripple in last block, and $t_{sum}$.
So the optimal block size, B, is
$dT_{CSkA}/dB = 0 \Rightarrow B_{opt} = \sqrt{N/2}$
And the optimal time is
Optimal $T_{CSkA} = 2\sqrt{2N} + 1$

Carry-Skip Adder Extensions
- Variable block sizes: a carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so we can have bigger blocks for the inner carries without increasing the overall delay.
- Multiple levels of skip logic
[Figure: skip level 1 and skip level 2; the second level is the AND of the first level skip signals (BPs)]

Carry-Skip Adder Comparisons
[Figure: comparison plot over adder widths of 8, 16, 32, 48, and 64 bits for RCA, CSkA, and VSkA, with block sizes B = 2 to B = 6 marked]
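The block-size optimization for the carry-skip adder can be reproduced numerically from the unit-delay model $T_{CSkA} = 2B + N/B + 1$. The sketch below uses hypothetical function names and keeps the slide's assumption that $t_{setup}$, $t_{carry}$, $t_{skip}$, and $t_{sum}$ are all one unit; it says nothing about real gate delays.

```python
import math


def carry_skip_delay(n, b):
    """Unit-delay model: setup + ripple in block 0 + skips + ripple in last block + sum."""
    return 1 + b + (n / b - 1) + b + 1        # = 2B + N/B + 1


def optimal_block_size(n):
    """Root of dT/dB = 2 - N/B^2 = 0."""
    return math.sqrt(n / 2)


def optimal_delay(n):
    """T at B_opt: 2*sqrt(2N) + 1."""
    return 2 * math.sqrt(2 * n) + 1


def best_integer_block(n):
    """Brute-force check over integer block sizes 1..n."""
    return min(range(1, n + 1), key=lambda b: carry_skip_delay(n, b))
```

For $N = 32$ the closed form gives $B_{opt} = 4$ and a delay of 17 units, and the brute-force search over integer block sizes agrees. For widths where $\sqrt{N/2}$ is not an integer, such as $N = 64$, a real design has to round the block size, so the closed-form time is a lower bound within this model.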
Carry Select Adder
Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (can be done for all blocks in parallel) and then select the correct one.
[Figure: 4-b block with Setup (producing the P's and G's), a "0" carry propagation chain, a "1" carry propagation chain, a multiplexer selecting between them under control of $C_{in}$ to give $C_{out}$, and Sum generation producing the S's]

Carry Select Adder: Critical Path
[Figure: four 4-bit blocks (bits 12 to 15, bits 8 to 11, bits 4 to 7, bits 0 to 3), each with Setup, "0" carry and "1" carry chains, a mux, and Sum gen; the critical path runs through one block's carry chain and then through the multiplexers from $C_{in}$ to $C_{out}$]
$T_{add} = t_{setup} + B\, t_{carry} + (N/B)\, t_{mux} + t_{sum}$

Square Root Carry Select Adder
[Figure: blocks of increasing size (bits 0 to 1, bits 2 to 4, bits 5 to 8, bits 9 to 13, bits 14 to 19), each with Setup, "0" carry and "1" carry chains, a mux, and Sum gen; the block carries are annotated with arrival times +2, +3, +4, +5, +6 along the path from $C_{in}$ to $C_{out}$]
$T_{add} = t_{setup} + 2\, t_{carry} + \sqrt{N}\, t_{mux} + t_{sum}$

Parallel Prefix Adders (PPAs)
Define carry operator $\circ$ on (G,P) signal pairs:
$(G'', P'') \circ (G', P') = (G, P)$
where
$G = G'' \mid P'' G'$
$P = P'' P'$
$\circ$ is associative, i.e.,
$[(g''', p''') \circ (g'', p'')] \circ (g', p') = (g''', p''') \circ [(g'', p'') \circ (g', p')]$
[Figure: carry operator cell with inputs $G''$, $P''$, $G'$, $P'$ and outputs G, P]
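The associativity claim for the carry operator is small enough to check by brute force over all 1-bit (G,P) pairs. This is a minimal sketch with hypothetical names (`carry_op`, `is_associative`); the left argument is assumed to be the more significant pair $(G'', P'')$, matching the definition above.

```python
from itertools import product


def carry_op(left, right):
    """(G'', P'') o (G', P') = (G'' | P'' & G', P'' & P')."""
    g2, p2 = left
    g1, p1 = right
    return (g2 | (p2 & g1), p2 & p1)


def is_associative():
    """Check [(x o y) o z] == [x o (y o z)] for every combination of pairs."""
    pairs = list(product((0, 1), repeat=2))
    return all(
        carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))
        for x, y, z in product(pairs, repeat=3)
    )
```

The operator is not commutative: `carry_op((1, 0), (0, 0))` is `(1, 0)`, because the more significant position generates, while `carry_op((0, 0), (1, 0))` is `(0, 0)`, because the generate in the less significant position is killed above it. Associativity without commutativity means the pairs can be grouped in any tree shape as long as their order is preserved.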
PPA General Structure
Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel:
$(G_0, P_0) \circ (G_1, P_1) \circ (G_2, P_2) \circ \dots \circ (G_{N-2}, P_{N-2}) \circ (G_{N-1}, P_{N-1})$
Since $\circ$ is associative, we can group them in any order, but note that it is not commutative.
Structure:
- $P_i$, $G_i$ logic (1 unit delay)
- $C_i$ parallel prefix logic tree (1 unit delay per level)
- $S_i$ logic (1 unit delay)
Measures to consider:
- number of $\circ$ cells
- tree cell depth (time)
- tree cell area
- cell fan-in and fan-out
- max wiring length
- wiring congestion
- delay path variation (glitching)

Brent-Kung PPA
[Figure: 16-bit Brent-Kung prefix tree with inputs $(G_{15}, P_{15}) \dots (G_0, P_0)$ and outputs $C_{16} \dots C_1$, labeled "Parallel Prefix Computation". Annotations: $T = \log_2 N$ and $T = \log_2 N - 2$ for the two parts of the tree; $A = 2\log_2 N$ and $A = N/2$ for its two dimensions.]

Brent-Kung PPA Adder
[Figure: the same 16-bit Brent-Kung tree with the same annotations]

Kogge-Stone PPF Adder
[Figure: 16-bit Kogge-Stone prefix tree with inputs $(G_{15}, P_{15}) \dots (G_0, P_0)$ and outputs $C_{16} \dots C_1$, labeled "Parallel Prefix Computation". Annotations: $T = \log_2 N$; $A = \log_2 N$ and $A = N$ for its two dimensions.]
$T_{add} = t_{setup} + \log_2 N \cdot t_{\circ} + t_{sum}$

More Adder Comparisons
[Figure: comparison plot over adder widths of 8, 16, 32, 48, and 64 bits for RCA, CSkA, VSkA, and KS PPA]

Adder Speed Comparisons
[Figure: speed comparison at 16, 32, and 64 bits for RCA, MCC, CCSkA, VCSkA, CCLA, and B&K]
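The Kogge-Stone structure and its $\log_2 N$ prefix levels can be illustrated with a behavioral model. The sketch below is written under explicit assumptions: the names `carry_op` and `kogge_stone_add` are hypothetical, $P_i$ is taken as $A_i \oplus B_i$ so that it can be reused for the sum, and the carry-in is folded into bit 0's generate signal, which is one common convention and not something the slides specify.

```python
def carry_op(left, right):
    """(G'', P'') o (G', P') = (G'' | P'' & G', P'' & P'); left is more significant."""
    g2, p2 = left
    g1, p1 = right
    return (g2 | (p2 & g1), p2 & p1)


def kogge_stone_add(a, b, cin=0, n=16):
    """Return (n-bit sum, carry out, number of prefix levels used)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]       # G_i = A_i & B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]     # P_i = A_i xor B_i
    pairs = list(zip(g, p))
    # Assumption: fold C_0 into bit 0 so that prefix i yields C_{i+1} directly.
    pairs[0] = (g[0] | (p[0] & cin), p[0])
    levels = 0
    distance = 1
    while distance < n:
        # Every position combines with the prefix `distance` bits below it;
        # the comprehension reads the previous level's pairs throughout.
        pairs = [
            carry_op(pairs[i], pairs[i - distance]) if i >= distance else pairs[i]
            for i in range(n)
        ]
        distance *= 2
        levels += 1
    carries = [cin] + [big_g for big_g, _ in pairs]       # carries[i] = C_i
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i                 # S_i = P_i xor C_i
    return total, carries[n], levels
```

With `n=16` the loop runs for distances 1, 2, 4, and 8, that is $\log_2 16 = 4$ levels, and each level touches close to $N$ cells, in line with the time and area annotations on the Kogge-Stone slide.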
Adder Average Power Comparisons
[Figure: average power at 16, 32, and 64 bits for RCA, MCC, CCSkA, VCSkA, CCLA, and B&K]

PDP of Adder Comparisons
[Figure: power-delay product at 8, 16, 32, 48, and 64 bits for RCA, MCC, CCSkA, VCSkA, CCLA, and BK]
From Nagendra, 1996