EE141-Fall 2010
Digital Integrated Circuits
Lecture 2
Adders

Announcements
  - Midterm 2: Thurs., Nov. 4th, 6:30-8:00pm
      - Exam starts at 6:30pm sharp
      - Review session: Wed., Nov. 3rd, 6pm
  - Project phase 2 out this Thurs., due next Fri.
  - Elad out of the office this afternoon
      - Chintan will hold office hours today from 2-3pm
      - Hanh-Phuc will hold extra office hours today from 4-5pm
      - Elad will hold extra office hours Thurs. -2pm

An Intel Microprocessor
  [Figure: 64-bit integer execution unit. Recoverable labels: 9-1 Mux, 5-1 Mux, 2-1 Mux, CARRYGEN, SUMGEN + LU, SUMSEL, REG, a, b, g64, node, ck, sum, sumb, "to Cache".]
  - LU: Logical Unit
  - Itanium has 6 64-bit integer execution units like this

Class Material
  - Last lecture: Dynamic logic
  - Today's lecture: Adders
  - Reading: Chapter 11

Bit-Sliced Design
  [Figure: bit-sliced datapath. Data-In at the top, Data-Out at the bottom; Register, Adder, and Shifter stages driven by Control; slices Bit 3, Bit 2, Bit 1, Bit 0.]
  - Tile identical processing elements
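Itanium Integer Datapath
  [Figure: Itanium integer datapath.] Fetzer, Orton, ISSCC 2002

The Binary Adder
  $S = A \oplus B \oplus C_i = A\bar{B}\bar{C}_i + \bar{A}B\bar{C}_i + \bar{A}\bar{B}C_i + ABC_i$
  $C_o = AB + BC_i + AC_i$

Data Paths Are Thermal Hogs
  [Figure only.]

Express Sum and Carry as a function of P, G, K
  - Define 3 new variables which ONLY depend on A, B:
      - Generate: $G = AB$
      - Propagate: $P = A \oplus B$
      - Kill: $K = \bar{A}\bar{B}$
  - Can also derive expressions for S and C_o based on K and P
  - Note that we will sometimes use an alternate definition for Propagate: $P = A + B$

Full-Adder
  [Figure: full-adder block with inputs A, B, Cin and outputs Sum, Cout, plus a truth table whose carry-status column marks rows such as "kill".]

Simplest Adder: Ripple-Carry
  [Figure: four full adders (FA) in series. Inputs A0..A3 and B0..B3; carries C_{i,0}, C_{o,0} (= C_{i,1}), C_{o,1}, C_{o,2}, C_{o,3}; outputs S0..S3.]
  - Worst case delay linear with the number of bits: $t_d = O(N)$
  - $t_{adder} = (N-1)\,t_{carry} + t_{sum}$
  - Goal: Make the fastest possible carry path circuit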
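The G, P, K definitions and the ripple-carry structure above can be checked with a small bit-level model. This is only a sketch: the function names (`full_adder_pgk`, `ripple_carry_add`, `ripple_delay`) are made up for illustration, and the carry and sum expressions used, $C_o = G + P\,C_i$ and $S = P \oplus C_i$, are the standard ones that follow from $P = A \oplus B$ (the slide only says such expressions can be derived).

```python
def full_adder_pgk(a, b, c_in):
    """One full-adder bit expressed through generate, propagate, kill."""
    g = a & b              # Generate: carry out is 1 regardless of c_in
    p = a ^ b              # Propagate (XOR definition): carry out equals c_in
    k = (1 - a) & (1 - b)  # Kill: carry out is 0 regardless of c_in
    s = p ^ c_in           # Sum from propagate and carry-in
    c_out = g | (p & c_in) # Carry from generate and propagate
    return s, c_out, (g, p, k)


def ripple_carry_add(a, b, n_bits, c_in=0):
    """Add two n_bits-wide integers by rippling the carry bit by bit."""
    carry = c_in
    total = 0
    for i in range(n_bits):
        s, carry, _ = full_adder_pgk((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def ripple_delay(n_bits, t_carry, t_sum):
    """Worst-case delay from the slide: (N-1)*t_carry + t_sum."""
    return (n_bits - 1) * t_carry + t_sum
```

Exactly one of G, P, K is 1 for any input pair when P is defined as XOR, which is why the three signals fully describe what each bit does to the carry.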
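Complementary Static CMOS Full Adder: Direct Implementation
  [Figure: transistor schematic.] 28 Transistors

Complementary Static CMOS Full Adder
  [Figure: transistor schematic.] 28 Transistors

Inversion Property
  [Figure: a full adder (FA) with inverted inputs and outputs is equivalent to the same full adder without the inversions.]

Minimize Critical Path by Reducing Inverting Stages
  [Figure: ripple chain of four FA cells alternating "Even cell" and "Odd cell"; inputs A0..A3, B0..B3; carries C_{i,0}, C_{o,0}, C_{o,1}, C_{o,2}, C_{o,3}; outputs S0..S3.]
  - Exploit Inversion Property

Better Structure: The Mirror Adder
  [Figure: transistor schematic with branches labeled "0"-Propagate, Kill, "1"-Propagate, Generate.] 24 transistors

Sizing the Mirror Adder: Fanout
  - Since LE of carry gate is 2, want f of 2 to get EF of 4
  - Use min. size sum gates to reduce load on carry
  - Total load on carry gate is:
      $C_{load} = C_i + (6 + 6 + 9)$
      $C_{load} = 2\,C_i$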
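The sizing argument above can be written out step by step. This is a worked sketch using only the numbers on the slide; the capacitance units are whatever units the slide's (6 + 6 + 9) is expressed in, which the extracted text does not state.

```latex
\begin{align*}
EF &= LE \cdot f = 2 \cdot 2 = 4
   && \text{(target effective fanout with } LE = 2\text{)} \\
f &= \frac{C_{load}}{C_i} = 2
   && \text{(electrical fanout of the carry gate)} \\
C_{load} &= C_i + (6 + 6 + 9) = C_i + 21
   && \text{(next carry input plus the other loads on the carry node)} \\
C_i + 21 &= 2\,C_i \;\Longrightarrow\; C_i = 21
\end{align*}
```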
Sizing the Mirror Adder
  [Figure: sized mirror-adder schematic.]
  - $C_{load} = C_i + (6 + 6 + 9) = 2\,C_i \Rightarrow C_i = 21$
  - Minimum size G and K stacks to reduce diffusion loading

Mirror Adder Summary
  - The NMOS and PMOS chains are completely symmetrical. Maximum of two series transistors in the carry-generation gate.
  - When laying out the cell, the most critical issue is the minimization of the capacitance at node C_o. Reduction of the diffusion capacitances is particularly important.
  - Carry signals are critical: transistors connected to C_i are placed closest to the output.
  - Only the transistors in the (propagate) carry chain have to be optimized for speed. All transistors in the sum stage can be minimal size.

Transmission Gate Full Adder
  [Figure only.]

Manchester Carry Chain
  [Figure: static Manchester carry stage with signals C_i, G_i, P_i, K_i and output C_o.]

Dynamic Manchester Carry Chain
  [Figure: dynamic Manchester carry stage with signals C_i, G_i, P_i, output C_o, and clock φ.]

Manchester Carry Chain
  [Figure: four-bit dynamic chain with clock φ, carry-in C_{i,0}, generate signals G0..G3, propagate signals P0..P3, and carry nodes C0..C3.]
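The symmetry noted in the mirror adder summary goes together with the inversion property used earlier: inverting all inputs of a full adder inverts both of its outputs. A short exhaustive check makes this concrete. The function names below are illustrative only, and the sum and carry expressions are the ones from the binary adder slide.

```python
from itertools import product


def full_adder(a, b, c_in):
    """Full adder from the binary adder equations."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (b & c_in) | (a & c_in)
    return s, c_out


def inversion_property_holds():
    """Check S(~A,~B,~Ci) == ~S(A,B,Ci) and the same for Co, for all inputs."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        s_inv, c_out_inv = full_adder(1 - a, 1 - b, 1 - c)
        if s_inv != 1 - s or c_out_inv != 1 - c_out:
            return False
    return True
```

Because the property holds for all eight input combinations, a ripple chain can alternate cells that work on true and inverted carries without an inverter in the carry path.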
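Carry-Bypass Adder
  [Figure: four FA cells with P0,G0 .. P3,G3, carry-in C_{i,0}, carries C_{o,0} .. C_{o,3}; second version adds a bypass multiplexer controlled by BP = P0 P1 P2 P3.]
  - Also called Carry-Skip
  - Idea: If (P0 and P1 and P2 and P3 = 1) then $C_{o,3} = C_{i,0}$, else "kill" or "generate".

Carry-Bypass Adder (cont.)
  [Figure: 16-bit adder in four M-bit blocks (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), each with Setup, Carry propagation, and Sum stages; delays t_setup, t_bypass, t_sum marked.]
  $t_{adder} = t_{setup} + (M-1)\,t_{carry} + (N/M - 1)\,t_{bypass} + (M-1)\,t_{carry} + t_{sum}$

Ripple versus Bypass
  [Figure: t_p versus N for the ripple adder and the bypass adder; the curves cross at roughly N = 4..8.]

Carry-Select Adder
  [Figure: one block with Setup (P, G), "0" Carry Propagation, "1" Carry Propagation, Multiplexer selected by C_{o,k-1} and producing C_{o,k+3}, Carry Vector, and Sum Generation.]

Carry Select Adder: Critical Path
  [Figure: four blocks (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), each with Setup, "0" and "1" carry chains, Multiplexer, and Sum Generation; carry-in C_{i,0}, block carries C_{o,3}, C_{o,7}, C_{o,11}, C_{o,15}; outputs S0-3, S4-7, S8-11, S12-15.]

Linear Carry Select
  [Figure: same four-block structure annotated with arrival times in unit delays: setup (1); "0" and "1" carry chains (5) in every block; multiplexers (6), (7), (8), (9); sum (10).]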
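To get a feel for the ripple-versus-bypass comparison, the two delay expressions from the slides can be evaluated side by side. This is a rough sketch: the function names are illustrative, and the unit delay values (every t equal to 1) are an assumption made here, not numbers from the lecture, so the crossover point it produces need not match the plot.

```python
def ripple_delay(n_bits, t_carry=1, t_sum=1):
    """Ripple-carry worst case: (N-1)*t_carry + t_sum."""
    return (n_bits - 1) * t_carry + t_sum


def bypass_delay(n_bits, m_bits, t_setup=1, t_carry=1, t_bypass=1, t_sum=1):
    """Carry-bypass worst case as written on the slide:
    t_setup + (M-1)*t_carry + (N/M - 1)*t_bypass + (M-1)*t_carry + t_sum.
    Assumes n_bits is a multiple of m_bits.
    """
    if n_bits % m_bits != 0:
        raise ValueError("n_bits must be a multiple of m_bits")
    n_blocks = n_bits // m_bits
    return (t_setup
            + (m_bits - 1) * t_carry      # ripple through the first block
            + (n_blocks - 1) * t_bypass   # skip across the remaining blocks
            + (m_bits - 1) * t_carry      # ripple inside the last block
            + t_sum)


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        print(n, ripple_delay(n), bypass_delay(n, 4))
```

Both expressions are linear in N, but the bypass slope is t_bypass/M per bit instead of t_carry per bit, so the bypass adder wins once N is large enough to amortize its fixed overhead.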
Square Root Carry Select
  [Figure: five blocks of growing width (Bit 0-1, Bit 2-4, Bit 5-8, Bit 9-13, Bit 14-19), each with Setup, "0" and "1" carry chains, Multiplexer, and Sum Generation; outputs S0-1, S2-4, S5-8, S9-13, S14-19. Arrival times in unit delays: setup (1); carry chains (3), (4), (5), (6), (7); multiplexers (4), (5), (6), (7), (8); sum (9).]

Adder Delays - Comparison
  [Figure: t_p (in unit delays) versus N for the Ripple adder, Linear select, and Square root select.]

Logarithmic (Tree) Adders: Basic Idea
  - Look ahead across groups of multiple bits to figure out the carry
  - Example with two bit groups:
      $P_{1:0} = P_1 P_0$
      $G_{1:0} = G_1 + P_1 G_0$
      $C_{out} = G_{1:0} + P_{1:0}\,C_{in}$
  - Combine these groups in a tree structure:
      [Figure: inputs P7,G7 .. P0,G0 and C_in; first level P7:6,G7:6, P5:4,G5:4, P3:2,G3:2, P1:0,G1:0; second level P7:4,G7:4 and P3:0,G3:0; third level P7:0,G7:0.]
  - Delay is now ~log2(N) instead of ~N

Rest of the Tree
  - Previous picture shows only half of the algorithm
  - Need to generate carries at individual bit positions too
  [Figure: inputs P7,G7 .. P0,G0 and C_in; carries produced at bit positions 7 down to 0.]

Many Kinds of Tree Adders
  - Many ways to construct these tree (or "carry lookahead") adders
  - Many of these variations named after the people who first came up with them
  - Most of these vary three basic parameters:
      - Radix: how many bits are combined in each PG gate
          - Previous example was radix 2; often go up to radix 4
      - Tree Depth: how many stages of logic you go through to get the final carry. Must be at least log_Radix(N)
      - Fanout: Maximum logical branching in the tree

Tree Adders
  [Figure: Brent-Kung tree; inputs (A0, B0) through (A15, B15), outputs S0 through S15.]
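The arrival times annotated on the carry-select figures can be reproduced with a very small timing model. This is a sketch under stated assumptions: setup, each carry stage, each multiplexer, and the final sum all take one unit delay, and the function name `carry_select_delay` is made up for illustration.

```python
def carry_select_delay(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Unit-delay estimate for a carry-select adder.

    block_sizes lists the width of each block, least significant first.
    Both speculative carry chains of a block start right after setup, so
    they are ready at t_setup + width * t_carry. Each block's multiplexer
    waits for its own carry chains and for the previous multiplexer.
    """
    mux_out = 0  # the adder's carry-in is assumed available at time 0
    for width in block_sizes:
        carries_ready = t_setup + width * t_carry
        mux_out = max(carries_ready, mux_out) + t_mux
    return mux_out + t_sum


linear_16 = carry_select_delay([4, 4, 4, 4])     # 16 bits, equal blocks
sqrt_20 = carry_select_delay([2, 3, 4, 5, 6])    # 20 bits, growing blocks
```

With equal blocks the later carry chains finish early and then wait on the multiplexer chain; growing each block by one bit lets its carry chain finish just as the select signal arrives, which is why the square-root arrangement covers more bits in fewer unit delays.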
Tree Adders
  [Figure: 16-bit radix-2 Kogge-Stone tree; inputs (A0, B0) through (A15, B15), outputs S0 through S15.]

Tree Adders
  [Figure: 16-bit radix-2 sparse tree with sparseness of 2; inputs (a0, b0) through (a15, b15), outputs S0 through S15.]

Tree Adders
  [Figure: 16-bit radix-4 Kogge-Stone tree; inputs (a0, b0) through (a15, b15), outputs S0 through S15.]

Next Lecture
  - Domino Logic
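A behavioral model of a radix-2 Kogge-Stone adder shows how the group rule $G_{hi:lo} = G_{hi} + P_{hi} G_{lo}$, $P_{hi:lo} = P_{hi} P_{lo}$ yields every carry in log2(N) stages. The wiring in the figure is not recoverable from the text, so this sketch follows the usual Kogge-Stone arrangement (each stage combines a position with the group a doubling distance below it); the function name and the way the carry-in is folded into bit 0 are choices made here for illustration.

```python
def kogge_stone_add(a, b, n_bits=16, c_in=0):
    """Add two n_bits-wide integers with a radix-2 parallel-prefix carry tree."""
    # Bitwise generate and propagate (XOR definition of propagate).
    g = [(a >> i) & (b >> i) & 1 for i in range(n_bits)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n_bits)]

    # Group signals; fold the carry-in into bit 0 so that G[i] ends up
    # being the carry out of bit i.
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & c_in)

    dist = 1
    stages = 0
    while dist < n_bits:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n_bits):
            # Combine group ending at i with the group ending at i - dist.
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
        stages += 1

    # Sum bit i uses the carry into bit i, i.e. the carry out of bit i-1.
    carry_in = [c_in] + G[:-1]
    total = 0
    for i in range(n_bits):
        total |= (p[i] ^ carry_in[i]) << i
    return total, G[-1], stages
```

For 16 bits the loop runs four prefix stages, matching the log2(N) tree depth bound; the price is that every bit position has a PG gate in every stage, which the sparse variants reduce.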