Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1
A Generic Digital Processor MEM ORY INPUT-OUTPUT CONTROL DATAPATH 2
Building Blocks for Digital Architectures Arithmetic unit - Bit-sliced datapath (adder, multiplier, shifter, comparator, etc. Memory - RAM, ROM, Buffers, Shift registers Control - Finite state machine (PLA, random logic. - Counters Interconnect - Switches - Arbiters - Bus 3
An Intel Microprocessor 9-1 Mux 5-1 Mux a CARRYGEN g64 node1 ck1 SUMSEL REG sum sumb to Cache 9-1 Mux 2-1 Mux b SUMGEN + LU s0 s1 LU : Logical Unit 1000um Itanium has 6 integer execution units like this 4
Bit-Sliced Design Control Bit 3 Data-In Register Adder Shifter Multiplexer Bit 2 Bit 1 Bit 0 Data-Out Tile identical processing elements 5
Loopback Bus Loopback Bus Loopback Bus Bit slice 0 Bit slice 1 Bit slice 2 Bit slice 63 Bit-Sliced Datapath From register files / Cache / Bypass Multiplexers Shifter Adder stage 1 Wiring Adder stage 2 Wiring Adder stage 3 Sum Select To register files / Cache 6
Itanium Integer Datapath Fetzer, Orton, ISSCC 02 7
Adders 8
Full-Adder A B Cin Full adder Sum Cout 9
The Binary Adder A B Cin Full adder Sum Cout S = A B C i = ABC i + ABC i + ABC i + ABC i C o = AB + BC i + AC i 10
Express Sum and Carry as a function of P, G, D Define 3 new variable which ONLY depend on A, B Generate (G = AB Propagate (P = A B Delete = A B Can also derive expressions for S and C o based on D and P Note that we will be sometimes using an alternate definition for Propagate (P = A + B 11
The Ripple-Carry Adder A 0 B 0 A 1 B 1 A 2 B 2 A 3 B 3 C i,0 C o,0 C o,1 C o,2 C o,3 FA FA FA FA (= C i,1 S 0 S 1 S 2 S 3 Worst case delay linear with the number of bits t d = O(N t adder = (N-1t carry + t sum Goal: Make the fastest possible carry path circuit 12
Complimentary Static CMOS Full Adder V DD V DD C i A B A B A B A C i X B C i V DD C i A S C i A B B V DD A B C i A C o B 28 Transistors 13
Inversion Property A B A B C i FA C o C i FA C o S S SABC i = SABC i C o ABC i = C o ABC i 14
Minimize Critical Path by Reducing Inverting Stages Even cell Odd cell A 0 B 0 A 1 B 1 A 2 B 2 A 3 B 3 C i,0 C o,0 C o,1 C o,2 C o,3 FA FA FA FA S 0 S 1 S 2 S 3 Exploit Inversion Property 15
A Better Structure: The Mirror Adder V DD V DD V DD A "0"-Propagate A C i B B A Kill C o A B C i B C i S "1"-Propagate A Generate C i A B B A B C i A B 24 transistors 16
Mirror Adder Stick Diagram V DD A B C i B A C i C o C i A B C o S GND 17
The Mirror Adder The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carrygeneration circuitry. When laying out the cell, the most critical issue is the minimization of the capacitance at node C o. The reduction of the diffusion capacitances is particularly important. The capacitance at node C o is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell. The transistors connected to C i are placed closest to the output. Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size. 18
Transmission Gate Full Adder P V DD A V DD A A P C i C i P S Sum Generation V DD B A P B A P P V DD C o Carry Generation C i C i Setup A C i P 19
Manchester Carry Chain V DD P i C i C o G i 20
Manchester Carry Chain V DD P 0 P 1 P 2 P 3 C 3 C i,0 G 0 G 1 G 2 G 3 C 0 C 1 C 2 C 3 21
Manchester Carry Chain Stick Diagram Propagate/Generate Row V DD P i G i P i + 1 G i + 1 C i C i - 1 C i + 1 GND Inverter/Sum Row 22
Carry-Bypass Adder C i,0 P 0 G 1 P 0 G 1 P 2 G 2 P 3 G 3 C o,0 C o,1 C o,2 FA FA FA FA C o,3 Also called Carry-Skip P 0 G 1 P 0 G 1 P 2 G 2 P 3 G 3 BP=P o P 1 P 2 P 3 C i,0 C o,0 C o,1 C o,2 FA FA FA FA Multiplexer C o,3 Idea: If (P0 and P1 and P2 and P3 = 1 then C o3 = C 0, else kill or generate. 23
Carry-Bypass Adder (cont. Bit 0 3 Bit 4 7 Bit 8 11 Bit 12 15 Setup t setup Setup t bypass Setup Setup Carry propagation Carry propagation Carry propagation Carry propagation Sum Sum Sum t sum Sum M bits t adder = t setup + M tcarry + (N/M-1t bypass + (M-1t carry + t sum 24
Carry Ripple versus Carry Bypass t p ripple adder bypass adder 4..8 N 25
Carry-Select Adder Setup P,G "0" "0" Carry Propagation "1" "1" Carry Propagation C o,k-1 Multiplexer C o,k+3 Sum Generation Carry Vector 26
Carry Select Adder: Critical Path Bit 0 3 Bit 4 7 Bit 8 11 Bit 12 15 Setup Setup Setup Setup 0 0-Carry 0 0-Carry 0 0-Carry 0 0-Carry 1 1-Carry 1 1-Carry 1 1-Carry 1 1-Carry Multiplexer Multiplexer Multiplexer Multiplexer C i,0 C o,3 C o,7 C o,11 C o,15 Sum Generation Sum Generation Sum Generation Sum Generation S 0 3 S 4 7 S 8 11 S 12 15 27
Linear Carry Select Bit 0-3 Bit 4-7 Bit 8-11 Bit 12-15 Setup Setup Setup Setup (1 "0" (1 "0" Carry "0" "0" Carry "0" "0" Carry "0" "0" Carry C i,0 "1" "1" Carry (5 (5 Multiplexer "1" "1" Carry (5 "1" "1" Carry (5 "1" "1" Carry (5 (6 (7 (8 Multiplexer Multiplexer Multiplexer (9 Sum Generation Sum Generation Sum Generation Sum Generation S 0-3 S 4-7 S 8-11 S 12-15 (10 28
Square Root Carry Select Bit 0-1 Bit 2-4 Bit 5-8 Bit 9-13 Bit 14-19 Setup Setup Setup Setup (1 "0" Carry "0" (1 "0" "0" Carry "0" "0" Carry "0" "0" Carry C i,0 "1" Carry "1" Carry "1" Carry "1" Carry "1" "1" "1" "1" (3 (3 (4 (5 (6 (4 (5 (6 (7 Multiplexer Multiplexer Multiplexer Multiplexer (7 Mux (8 Sum Generation Sum Generation Sum Generation Sum Generation Sum S 0-1 S 2-4 S 5-8 S 9-13 S 14-19 (9 29
t p (in unit delays Adder Delays - Comparison 50 40 Ripple adder 30 20 Linear select 10 Square root select 0 0 20 40 N 60 30
LookAhead - Basic Idea A 0, B 0 A 1, B 1 A N-1, B N-1 C i,0 P 0 C i,1 P 1 C i, N-1 P N-1 S 0 S 1 S N-1 C ok = fa k B k C G P ok 1 = + C k k o k 1 31
Look-Ahead: Topology Expanding Lookahead equations: V DD C ok = G k + P k G k 1 + P k 1 C o k 2 G 3 G 2 All the way: C ok = G k + P k G k 1 + P k 1 + P 1 G 0 + P 0 C i0 C i,0 G 1 G 0 C o,3 P 0 P 1 P 2 P 3 32
Logarithmic Look-Ahead Adder A 0 F A 1 A 2 A 3 A 4 A 5 A 6 A 7 33
Carry Lookahead Trees C o2 C o1 C o0 = G 0 + P 0 C i0 = G 1 + P 1 G 0 + P 1 P 0 C i0 = G 2 + P 2 G 1 + P 2 P 1 G 0 + P 2 P 1 P 0 C i0 = G 2 + P 2 G 1 + P 2 P 1 G 0 + P 0 C i0 = G 2:1 + P 2:1 C o0 Can continue building the tree hierarchically. 34
Tree Adders (A 0, B 0 (A 1, B 1 (A 2, B 2 (A 3, B 3 (A 4, B 4 (A 5, B 5 (A 6, B 6 (A 7, B 7 (A 8, B 8 (A 9, B 9 (A 10, B 10 (A 11, B 11 (A 12, B 12 (A 13, B 13 (A 14, B 14 (A 15, B 15 S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 S 10 S 11 S 12 S 13 S 14 S 15 16-bit radix-2 Kogge-Stone tree 35
Tree Adders (a 0, b 0 (a 1, b 1 (a 2, b 2 (a 3, b 3 (a 4, b 4 (a 5, b 5 (a 6, b 6 (a 7, b 7 (a 8, b 8 (a 9, b 9 (a 10, b 10 (a 11, b 11 (a 12, b 12 (a 13, b 13 (a 14, b 14 (a 15, b 15 36
Sparse Trees (a 0, b 0 (a 1, b 1 (a 2, b 2 (a 3, b 3 (a 4, b 4 (a 5, b 5 (a 6, b 6 (a 7, b 7 (a 8, b 8 (a 9, b 9 (a 10, b 10 (a 11, b 11 (a 12, b 12 (a 13, b 13 (a 14, b 14 (a 15, b 15 S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 S 10 S 11 S 12 S 13 S 14 S 15 16-bit radix-2 sparse tree with sparseness of 2 37
Tree Adders (A 0, B 0 (A 1, B 1 (A 2, B 2 (A 3, B 3 (A 4, B 4 (A 5, B 5 (A 6, B 6 (A 7, B 7 (A 8, B 8 (A 9, B 9 (A 10, B 10 (A 11, B 11 (A 12, B 12 (A 13, B 13 (A 14, B 14 (A 15, B 15 S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 S 10 S 11 S 12 S 13 S 14 S 15 Brent-Kung Tree 38
Example: Domino Adder V DD V DD Clk G i = a i b i Clk P i = a i + b i a i a i b i b i Clk Clk Propagate Generate 39
Example: Domino Adder V DD V DD Clk k P i:i-2k+1 Clk k G i:i-2k+1 P i:i-k+1 P i:i-k+1 G i:i-k+1 P i-k:i-2k+1 G i-k:i-2k+1 Propagate Generate 40
Example: Domino Sum 41
Multipliers 42
The Binary Multiplication Z X Y = = = = M 1 i = 0 M 1 i = 0 X i 2 i N 1 j = 0 M + N 1 k = 0 N 1 j = 0 Z k 2 k Y j 2 j X i Y j 2 i+ j with X Y = = M 1 i = 0 N 1 j = 0 X i 2 i Y j 2 j 43
The Binary Multiplication + x 1 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 0 0 1 1 1 0 Multiplicand Multiplier Partial products Result 44
The Array Multiplier X 3 X 2 X 1 X 0 Y 0 X 3 X 2 X 1 X 0 Y 1 Z 0 HA FA FA HA X 3 X 2 X 1 X 0 Y 2 Z 1 FA FA FA HA X3 X 2 X 1 X 0 Y 3 Z 2 FA FA FA HA Z 7 Z 6 Z 5 Z 4 Z 3 45
The MxN Array Multiplier Critical Path HA FA FA HA FA FA FA HA Critical Path 1 Critical Path 2 FA FA FA HA Critical Path 1 & 2 46
Carry-Save Multiplier HA HA HA HA HA FA FA FA HA FA FA FA HA FA FA HA Vector Merging Adder 47
Multiplier Floorplan 48
Wallace-Tree Multiplier Partial products First stage 6 5 4 3 2 1 0 6 5 4 3 2 1 0 Bit position (a (b Second stage Final adder 6 5 4 3 2 1 0 6 5 4 3 2 1 0 FA (c HA (d 49
Wallace-Tree Multiplier 50
Wallace-Tree Multiplier y 0 y 1 y2 FA C i-1 y 0 y 1 y 2 y 3 y 4 y 5 y 3 FA FA C i FA C i-1 C i C i C i-1 C i-1 y 4 FA C i FA C i-1 C i C i-1 y 5 C i FA FA C S C S 51
Multipliers Summary Optimization Goals Different Vs Binary Adder Once Again: Identify Critical Path Other possible techniques - Logarithmic versus Linear (Wallace Tree Mult - Data encoding (Booth - Pipelining FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION 52
Shifters 53
The Binary Shifter Right nop Left A i B i A i-1 B i-1 Bit-Slice i... 54
The Barrel Shifter A 3 B 3 Sh1 A 2 B 2 Sh2 : Data Wire A 1 B 1 : Control Wire Sh3 A 0 B 0 Sh0 Sh1 Sh2 Sh3 Area Dominated by Wiring 55
4x4 barrel shifter A 3 A 2 A 1 A 0 Sh0 Sh1 Sh2 Sh3 Width barrel ~ 2 p m M Buffer 56
Logarithmic Shifter Sh1 Sh1 Sh2 Sh2 Sh4 Sh4 A 3 B 3 A 2 B 2 A 1 B 1 A 0 B 0 57
0-7 bit Logarithmic Shifter A 3 Out3 A 2 Out2 A 1 Out1 A 0 Out0 58