Where are we? Data Path Design

Where are we? Subsystem Design Registers and Register Files dders and LUs Simple ripple carry addition Transistor schematics Faster addition Logic generation How it fits into the datapath Data Path Design lock-diagram style data path description

it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Layout Reality Tile identical processing elements it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements Layout Reality 2

it Slice Plan Recall planning a DFF to make a register Inputs on top in M2 Outputs on bottom in M2 lock and lock-bar routed horizontally in M D2 Qb2 Q2 D Qb Q D0 Qb0 Q0 Vdd b Vss it Slice Plan Now extend this to a register file D inputs go to all cells an select one register for writing by controlling the clock Q outputs go all the way through the register file Each cell can drive Q from enabled inverter Now you can select one register for reading by selecting D2 which cell is driving its output D D0 b En b En Q2 Q Q0 3

it Slice Plan b En b En b En b En Q0 D0 Q D Q2 D2 it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements 4

Multi-Port Register Re Re0 Multi-Port Register 5

it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements Where are power lines? it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements Where are power lines? asic omb scheme 6

Power Routing is a global chipwide issue Here s another approach Note the Vdd and Gnd pads Global rings with combs for regions of the chip hip-wide View of Power Power Routing is a global chipwide issue Here s another approach Note the Vdd and Gnd pads Global rings with combs for regions of the chip hip-wide View of Power 7

ore power routing ore power routing 8

hip-wide View of Power nother view of the same issue Watch out for routing blockages! Tweak on the Scheme Same basic scheme ut with no internal jumpers Jumpers are restricted to outer loops 9

dders Etc. heck out hapter 0 in your text asic ddition: Full dder in Full adder Sum out 0

oolean Equations in Full adder Sum out Direct Implementation Fig 0.3 in your text 32 transistors

Use the Factored Equations V DD V DD i i V DD X i i S i V DD i o 28 Transistors Fully static, complex gate implementation Getting Rid of Inverters Even ell Odd ell 0 0 2 2 3 3 i,0 o,0 o, o,2 o,3 F F F F S 0 S S 2 S 3 Exploit Inversion Property Note: need 2 different types of cells an improve performance by removing inverters from carry chain 2

etter Static Gate ombine gates and reuse subterms etter Static Gate Sometimes called a mirror adder 3

Mirror dder onsiderations Feed the arry-in to the inner inputs so the internal capacitance is already discharged Make all transistors whose gates are connected to in and carry logic minimum size minimizes branching effort on critical path (carry out) Determine gate widths by Logical Effort reduce effort from to out at the expense of Sum Use relatively large transistors on critical path so that stray wiring cap is a small fraction of overall cap dder Layout Examples from Weste and Eshraghian Standard ell vs. Datapath Definitely worth looking at carefully 4

Datapath Layout little tricky to figure out You may not want to use this exact layout, but it might give you ideas Start by identifying vdd and gnd paths Think about rotating it ccw Think about a taller circuit that matches the bit-pitch of your register Datapath Layout 5

Example Datapath Layout ddition and Subtraction Remember back to your logic design class dd the two s complement to subtract Take two s complement by inverting all the bits and adding one Use the carry-in to add one Use an XOR to invert or not 0 0 Out 0 0 0 0 6

Two s omplement dd/sub side: XOR Gates Slightly tricky gate, ~ + ~ Lots of different schematics 7

nother XOR gate Not too bad if you already have, ~,, ~ floating around If not, you ll need a couple inverters too ~ ~ ~ ~ XOR ~ ~ ~ XNOR ~ Yet nother XOR Gate DVSL (section 6.2.3 in your text) Differential ascode Voltage Switch Logic Make sure that the combinational pull-down networks are complementary Out ~Out Differential Inputs PDN PDN2 8

DVSL XOR/XNOR Out ~Out ~ ~ ~ Generates both XOR/XNOR Still static, but might be slower than others nother DVSL Example Out ~Out D E ~D ~ ~E ~ ~ Pull-down stacks must be complementary 9

DVSL Large XOR Out ~Out Four-input XOR aka odd parity ~D D ~D D ~ ~ ~ ~ ~ DVSL Large XOR Out ~Out Four-input XOR aka odd parity ~D D ~D D ~ ~ ~ ~ ~ 20

DVSL Large XOR Out ~Out Four-input XOR aka odd parity ~D D ~D D ~ ~ ~ ~ ~ Transmission Gate XOR Tiny, clever circuit If is high, N, P act like inverter If is low, is passed to the output through transmission gate 2

Transmission Gate dder V DD V DD nother Version P P i i P P P V DD P S Sum Generation V DD o arry Generation i i Setup i P 22

Yet nother Version n Example Layout Not the same style we re used to seeing 23

More Pass Transistors omplementary Pass Transistor Logic (PL) Slightly faster, but more area S out S out Speeding Up ddition It all comes back to the carry circuit Ripple carry delay goes from low-order to high-order bit This determines the speed of the addition Many many ways to speed up the carry calculation Section 0.2.2 in your text 24

arry Lookahead Sum = P + i Key is that the carry depends ONLY on and, not the carry-in atch is that the gates have large fan-in - arry Lookahead Restated: i = Gi + Pi (i-) 0 = G0 + P0 in = G + P 0 = G + P(G0 + P0 in) = G + P G0 + P P0 in 2 = G2 + P2G2 + P2PG0 + P2PP0in 3 = G + P3G2 + P3P2G + P3P2PG0 + P3P2PP0in Or 3 = G3 + P3(G2 +P2( G + P(G0 + P0 in))) 25

arry Lookahead 0, 0, N-, N-... i,0 P0 i, P i,n - P N- The equations get larger with each stage Usually do lookahead in small blocks (I.e. 4) and the combine in a tree... arry Lookahead Logic 26

Fast arry Lookahead Logic Pseudo-nMOS Uses lots of current! nother Version V DD G 3 G 2 G G 0 i,0 o,3 P 0 P P 2 P 3 27

nother View nother View 4 4 3 3 2 2 in G 4 P 4 G 3 P 3 G 2 P 2 G P G 0 P 0 : itwise PG logic 2: Group PG logic G 3:0 G 2:0 G :0 G 0:0 3 2 0 3: Sum logic 4 out S 4 S 3 S 2 S 28

Ripple arry 4 4 3 3 2 2 in G 4 P 4 G 3 P 3 G 2 P 2 G P G 0 P 0 G 3:0 G 2:0 G :0 G 0:0 3 2 0 4 out S 4 S 3 S 2 S PG Diagram Notation lack cell Gray cell uffer i:k k-:j i:k k-:j i:j i:j i:j i:j G i:k P i:k G k-:j G i:j G i:k P i:k G i:j G k-:j G i:j G i:j P k-:j P i:j P i:j P i:j 29

Ripple arry t = t + ( N ) t + t ripple pg O xor 5 4 3 2 0 9 it Position 8 7 6 5 4 3 2 0 Delay 5:0 4:0 3:0 2:0 :0 0:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 arry-lookahead dder arry-lookahead adder computes G i:0 for many bits in parallel. Uses higher-valency cells with more than two inputs. 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: out G 6:3 P 6:3 2 G 2:9 P 2:9 8 G 8:5 P 8:5 4 G 4: P 4: + + + + in S 6:3 S 2:9 S 8:5 S 4: 30

L PG Diagram 6 5 4 3 2 0 9 8 7 6 5 4 3 2 0 6:0 5:0 4:0 3:0 2:0 :0 0:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 Higher-Valency ells i:k k-:l l-:m m-:j i:j G i:k P i:k G k-:l P k-:l G l-:m P l-:m G m-:j P m-:j G i:j P i:j 3

arry-select dder arry-select ompute result for a block based on carry-in of and carry-in of 0, then select the right one arry-select dder Trick for critical paths dependent on late input X Precompute two possible outputs for X = 0, Select proper output when X arrives arry-select adder precomputes n-bit sums For both possible carries into n-bit group 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: + 0 + 0 + 0 out + 2 + 8 + 4 + in 0 0 0 S 6:3 S 2:9 S 8:5 S 4: 32

arry-skip dder ompute the P and G for an entire block If the block generates or kills, don t propagate 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: P 6:3 P 2:9 P 8:5 2 8 4 out 0 + 0 + 0 + 0 P 4: + in S 6:3 S 2:9 S 8:5 S 4: arry-skip PG Diagram 6 5 4 3 2 0 9 8 7 6 5 4 3 2 0 6:0 5:0 4:0 3:0 2:0 :0 0:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 ( ) tskip = tpg + 2 n + ( k ) to + t For k n-bit groups (N = nk) xor 33

Tree dder If lookahead is good, lookahead across lookahead! Recursive lookahead gives O(log N) delay Many variations on tree adders rent-kung 5 4 3 2 0 9 8 7 6 5 4 3 2 0 5:4 3:2 :0 9:8 7:6 5:4 3:2 :0 5:2 :8 7:4 3:0 5:8 7:0 :0 3:0 9:0 5:0 5:04:03:02:0:00:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 34

Sklansky 5 4 3 2 0 9 8 7 6 5 4 3 2 0 5:4 3:2 :0 9:8 7:6 5:4 3:2 :0 5:2 4:2 :8 0:8 7:4 6:4 3:0 2:0 5:8 4:8 3:8 2:8 5:04:03:02:0:00:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 Kogge-Stone 5 4 3 2 0 9 8 7 6 5 4 3 2 0 5:4 4:3 3:2 2: :0 0:9 9:8 8:7 7:6 6:5 5:4 4:3 3:2 2: :0 5:2 4: 3:0 2:9 :8 0:7 9:6 8:5 7:4 6:3 5:2 4: 3:0 2:0 5:8 4:7 3:6 2:5 :4 0:3 9:2 8: 7:0 6:0 5:0 4:0 5:04:03:02:0:00:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 35

Manchester arry hain Instead of changing the architecture of the adder, use a clever circuit to ripple the carry more effectively lternate Implementation 36

Four it lock Summary dder architectures offer area / power / delay tradeoffs. hoose the best one for your application. rchitecture arry-ripple lassification Logic Levels N- Max Fanout Tracks ells N arry-skip n=4 N/4 + 5 2.25N arry-inc. n=4 N/4 + 2 4 2N rent-kung Sklansky (L-, 0, 0) (0, L-, 0) 2log 2 N log 2 N 2 N/2 + 2N 0.5 Nlog 2 N Kogge-Stone (0, 0, L-) log 2 N 2 N/2 Nlog 2 N 37

Design as Trade-Off t p (nse c) 80.0 60.0 40.0 20.0 static mirror manchester bypass select look-ahead 0.2 rea (mm 2 )0.4 look-ahead select static bypass mirror manchester 0.0 0 0 20 N 0.0 0 0 20 N Do you want speed or size? There s always power to consider too How well does Synopsys do? Design compiler using a 80nm library 38

What should you use? Ripple if timing allows ompact, easy L or carry-skip work well for 8-6 bits For 32, and especially 64 bits tree adders are faster dders designed and tiled by hand will be much smaller (and probably faster) than synthesized adders Logic Functions Use the features of the full adder cell to generate logic functions Lots of other ideas in your text 39

General Logic Generator One Possible MUX Version 40

Remember the ig Picture ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements We want things to stack up nicely in the datapath Shifters Right nop Left i i i- i- it-slice i Essentially a muxing... operation select the shift you want (section 0.8) 4

arrel Shifter 3 3 Sh 2 2 Sh2 : Data Wire : ontrol Wire Sh3 0 0 Sh0 Sh Sh2 Sh3 Shift any number of bits in one shot lever layout is possible Lots of wiring arrel Shifter 3 3 3 Sh 2 2 3 Sh2 : Data Wire 3 : ontrol Wire Sh3 0 0 2 Sh0 Sh Sh2 Sh3 Shift any number of bits in one shot lever layout is possible Lots of wiring 42

Four by Four arrel Shifter 3 2 0 Sh0 Sh Sh2 Sh3 uffer Note the zig-zag control wire in poly Logarithmic Shifter Sh Sh Sh2 Sh2 Sh4 Sh4 3 3 2 2 0 0 43

Logarithmic Shifter Layout 44