Where are we? Data Path Design Subsystem Design Registers and Register Files dders and LUs Simple ripple carry addition Transistor schematics Faster addition Logic generation How it fits into the datapath lock-diagram style data path description it Slice Design ontrol it Slice Design ontrol it 3 it 3 Data-In Register dder Shifter Multiplexer it 2 it it Data-Out Data-In Register dder Shifter Multiplexer it 2 it it Data-Out Layout Reality Tile identical processing elements Tile identical processing elements Layout Reality it Slice Plan Recall planning a DFF to make a register Inputs on top in M2 Outputs on bottom in M2 lock and lock-bar routed horizontally in M D2 Qb2 Q2 D Qb Q D Qb Vdd b Q Vss it Slice Plan Now extend this to a register file D inputs go to all cells an select one register for writing by controlling the clock Q outputs go all the way through the register file Each cell can drive Q from enabled inverter Now you can select one register for reading by selecting which cell is driving its output D2 D Q2 Q D Q b En b En
it Slice Plan b En b En b En b En it Slice Design ontrol Q D it 3 D Q Data-In Register dder Shifter Multiplexer it 2 it it Data-Out D2 Q2 Tile identical processing elements Multi-Port Register Multi-Port Register Re Re it Slice Design ontrol it Slice Design ontrol it 3 it 3 Data-In Register dder Shifter Multiplexer it 2 it it Data-Out Data-In Register dder Shifter Multiplexer it 2 it it Data-Out Tile identical processing elements Where are power lines? Tile identical processing elements Where are power lines? asic omb scheme 2
hip-wide View of Power Power Routing is a global chipwide issue Here s another approach Note the Vdd and Gnd pads Global rings with combs for regions of the chip hip-wide View of Power Power Routing is a global chipwide issue Here s another approach Note the Vdd and Gnd pads Global rings with combs for regions of the chip ore power routing ore power routing hip-wide View of Power nother view of the same issue Watch out for routing blockages! Tweak on the Scheme Same basic scheme ut with no internal jumpers Jumpers are restricted to outer loops 3
dders Etc. asic ddition: Full dder heck out hapter in your text in Full adder out Sum in oolean Equations Full adder Sum out Direct Implementation Fig.3 in your text 32 transistors Use the Factored Equations VDD V DD i i X V DD i i S i V DD i i, Getting Rid of Inverters Even ell Odd ell 2 2 3 3 o, o, o,2 o,3 F F F F S S S 2 S 3 o Exploit Inversion Property 28 Transistors Fully static, complex gate implementation Note: need 2 different types of cells an improve performance by removing inverters from carry chain 4
etter Static Gate etter Static Gate Sometimes called a mirror adder ombine gates and reuse subterms Mirror dder onsiderations Feed the arry-in to the inner inputs so the internal capacitance is already discharged Make all transistors whose gates are connected to in and carry logic minimum size minimizes branching effort on critical path (carry out) Determine gate widths by Logical Effort reduce effort from to out at the expense of Sum Use relatively large transistors on critical path so that stray wiring cap is a small fraction of overall cap dder Layout Examples from Weste and Eshraghian Standard ell vs. Datapath Definitely worth looking at carefully Datapath Layout Datapath Layout little tricky to figure out You may not want to use this exact layout, but it might give you ideas Start by identifying vdd and gnd paths Think about rotating it ccw Think about a taller circuit that matches the bit-pitch of your register 5
Example Datapath Layout ddition and Subtraction Remember back to your logic design class dd the two s complement to subtract Take two s complement by inverting all the bits and adding one Use the carry-in to add one Out Use an XOR to invert or not Two s omplement dd/sub side: XOR Gates Slightly tricky gate, ~ ~ Lots of different schematics nother XOR gate Not too bad if you already have, ~,, ~ floating around If not, you ll need a couple inverters too Yet nother XOR Gate DVSL (section 6.2.3 in your text) Differential ascode Voltage Switch Logic Make sure that the combinational pull-down networks are complementary ~ ~ ~ ~ ~ XOR ~ ~ ~ XNOR Out Differential Inputs PDN PDN2 ~Out 6
DVSL XOR/XNOR nother DVSL Example Out ~Out Out ~Out ~ ~ ~ D E ~D ~ ~E ~ Generates both XOR/XNOR Still static, but might be slower than others ~ Pull-down stacks must be complementary DVSL Large XOR Out Four-input XOR aka odd parity ~Out DVSL Large XOR Out ~Out Four-input XOR aka odd parity ~D D ~D D ~D D ~D D ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ DVSL Large XOR Out Four-input XOR aka odd parity ~Out Transmission Gate XOR ~D D ~D D ~ ~ ~ ~ ~ Tiny, clever circuit If is high, N, P act like inverter If is low, is passed to the output through transmission gate 7
Transmission Gate dder nother Version P V DD V DD P i P i S Sum Generation V DD P P P V DD o arry Generation i i Setup i P Yet nother Version n Example Layout Not the same style we re used to seeing More Pass Transistors omplementary Pass Transistor Logic (PL) Slightly faster, but more area Speeding Up ddition It all comes back to the carry circuit Ripple carry delay goes from low-order to high-order bit This determines the speed of the addition S S out out Many many ways to speed up the carry calculation Section.2.2 in your text 8
arry Lookahead Sum = P i Key is that the carry depends ONLY on and, not the carry-in atch is that the gates have large fan-in - arry Lookahead Restated: i = Gi Pi (i-) = G P in = G P = G P(G P in) = G P G P P in 2 = G2 P2G2 P2PG P2PPin 3 = G P3G2 P3P2G P3P2PG P3P2PPin Or 3 = G3 P3(G2 P2( G P(G P in))) arry Lookahead,, N-, N-... arry Lookahead Logic i, P i, P i,n - PN- The equations get larger with each stage Usually do lookahead in small blocks (I.e. 4) and the combine in a tree... Fast arry Lookahead Logic nother Version VDD G3 G2 G Pseudo-nMOS Uses lots of current! i, P G o,3 P P2 P3 9
nother View nother View 4 4 3 3 2 2 in G 4 P 4 G 3 P 3 G 2 P 2 G P G P : itwise PG logic 2: Group PG logic G 3: G 2: G : G : 3 2 3: Sum logic 4 out S 4 S 3 S 2 S Ripple arry PG Diagram Notation 4 4 3 3 2 2 in lack cell Gray cell uffer G 4 P 4 G 3 P 3 G 2 P 2 G P G P i:k k-:j i:k k-:j i:j i:j i:j i:j G 3: G 2: G : G : 3 2 G i:k P i:k G k-:j G i:j G i:k P i:k G i:j G k-:j G i:j G i:j 4 P k-:j P i:j P i:j P i:j out S 4 S 3 S 2 S Ripple arry arry-lookahead dder t = t ( N ) t t ripple pg O xor 5 4 3 2 9 it Position 8 7 6 5 4 3 2 arry-lookahead adder computes G i: for many bits in parallel. Uses higher-valency cells with more than two inputs. 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: Delay out G 6:3 2 G 2:9 8 G 8:5 4 G 4: P 6:3 P 2:9 P 8:5 P 4: in S 6:3 S 2:9 S 8:5 S 4: 5: 4: 3: 2: : : 9: 8: 7: 6: 5: 4: 3: 2: : :
L PG Diagram Higher-Valency ells 6 5 4 3 2 9 8 7 6 5 4 3 2 i:k k-:ll-:m m-:j i:j G i:k P i:k G k-:l P k-:l G l-:m P l-:m G m-:j G i:j 6: 5: 4: 3: 2: : : 9: 8: 7: 6: 5: 4: 3: 2: : : P m-:j P i:j arry-select dder arry-select dder arry-select ompute result for a block based on carry-in of and carry-in of, then select the right one Trick for critical paths dependent on late input X Precompute two possible outputs for X =, Select proper output when X arrives arry-select adder precomputes n-bit sums For both possible carries into n-bit group 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: out 2 8 4 in S 6:3 S 2:9 S 8:5 S 4: arry-skip dder arry-skip PG Diagram ompute the P and G for an entire block If the block generates or kills, don t propagate 6 5 4 3 2 9 8 7 6 5 4 3 2 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: P 6:3 P 2:9 P 8:5 P 4: 2 8 4 out in 6: 5: 4: 3: 2: : : 9: 8: 7: 6: 5: 4: 3: 2: : : S 6:3 S 2:9 S 8:5 S 4: tskip = tpg 2( n ) ( k ) to txor For k n-bit groups (N = nk)
Tree dder rent-kung If lookahead is good, lookahead across lookahead! Recursive lookahead gives O(log N) delay Many variations on tree adders 5:4 5:2 5 4 3:2 3 2 : :8 9:8 9 8 7 7:6 7:4 6 5:4 5 4 3 3:2 3: 2 : 5:8 7: : 3: 9: 5: 5:4:3:2::: 9: 8: 7: 6: 5: 4: 3: 2: : : Sklansky Kogge-Stone 5 4 3 2 9 8 7 6 5 4 3 2 5 4 3 2 9 8 7 6 5 4 3 2 5:4 4:3 3:2 2: : :9 9:8 8:7 7:6 6:5 5:4 4:3 3:2 2: : 5:4 3:2 : 9:8 7:6 5:4 3:2 : 5:2 4:2 :8 :8 7:4 6:4 3: 2: 5:2 4: 3: 2:9 :8 :7 9:6 8:5 7:4 6:3 5:2 4: 3: 2: 5:8 4:8 3:8 2:8 5:8 4:7 3:6 2:5 :4 :3 9:2 8: 7: 6: 5: 4: 5:4:3:2::: 9: 8: 7: 6: 5: 4: 3: 2: : : 5:4:3:2::: 9: 8: 7: 6: 5: 4: 3: 2: : : Manchester arry hain Instead of changing the architecture of the adder, use a clever circuit to ripple the carry more effectively lternate Implementation 2
Four it lock Summary dder architectures offer area / power / delay tradeoffs. hoose the best one for your application. rchitecture arry-ripple arry-skip n=4 arry-inc. n=4 rent-kung Sklansky Kogge-Stone lassification Logic Max Tracks Levels Fanout N- N/4 5 2 N/4 2 4 (L-,, ) 2log 2 N 2 (, L-, ) log 2 N N/2 (,, L-) log 2 N 2 N/2 ells N.25N 2N 2N.5 Nlog 2 N Nlog 2 N Design as Trade-Off How well does Synopsys do? t p (nse c) 8. 6. 4. 2. sta tic mirror manchester bypass select look-ahead.2 rea (mm 2 ).4 look-ahead select static bypass mirror manchester. 2 N. 2 N Do you want speed or size? There s always power to consider too Design compiler using a 8nm library What should you use? Ripple if timing allows ompact, easy L or carry-skip work well for 8-6 bits For 32, and especially 64 bits tree adders are faster dders designed and tiled by hand will be much smaller (and probably faster) than synthesized adders Logic Functions Use the features of the full adder cell to generate logic functions Lots of other ideas in your text 3
General Logic Generator One Possible MUX Version Remember the ig Picture ontrol Shifters Right nop Left it 3 i i Data-In Register dder Shifter Multiplexer it 2 it it Data-Out i- i- Tile identical processing elements We want things to stack up nicely in the datapath it-slice i Essentially a muxing... operation select the shift you want (section.8) arrel Shifter arrel Shifter 3 3 3 3 3 2 Sh 2 2 Sh 2 3 Sh2 : Data Wire Sh2 : Data Wire : ontrol Wire 3 : ontrol Wire Sh3 Sh3 2 Sh Sh Sh2 Shift any number of bits in one shot lever layout is possible Lots of wiring Sh3 Sh Sh Sh2 Shift any number of bits in one shot lever layout is possible Lots of wiring Sh3 4
Four by Four arrel Shifter Logarithmic Shifter Sh Sh Sh2 Sh2 Sh4 Sh4 3 2 3 3 2 2 Sh Sh Sh2 Sh3 uffer Note the zig-zag control wire in poly Logarithmic Shifter Layout 5