Part II Addition / Subtraction

1 Part II Addition / Subtraction
Parts and chapters of the textbook:
I. Number Representation: Numbers and Arithmetic; Representing Signed Numbers; Redundant Number Systems; Residue Number Systems
II. Addition / Subtraction: Basic Addition and Counting; Carry-Lookahead Adders; Variations in Fast Adders; Multioperand Addition
III. Multiplication: Basic Multiplication Schemes; High-Radix Multipliers; Tree and Array Multipliers; Variations in Multipliers
IV. Division: Basic Division Schemes; High-Radix Dividers; Variations in Dividers; Division by Convergence
V. Real Arithmetic: Floating-Point Representations; Floating-Point Operations; Errors and Error Control; Precise and Certifiable Arithmetic
VI. Function Evaluation: Square-Rooting Methods; The CORDIC Algorithms; Variations in Function Evaluation; Arithmetic by Table Lookup
VII. Implementation Topics: High-Throughput Arithmetic; Low-Power Arithmetic; Fault-Tolerant Arithmetic; Past, Present, and Future
Elementary Operations: Parts II, III, and IV
Apr Computer Arithmetic, Addition/Subtraction Slide 1

2 About This Presentation This presentation is intended to support the use of the textbook Computer Arithmetic: Algorithms and Hardware Designs (Oxford University Press, 2000, ISBN ). It is updated regularly by the author as part of his teaching of the graduate course ECE 252B, Computer Arithmetic, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Unauthorized uses are strictly prohibited. Behrooz Parhami Edition Released Revised Revised Revised Revised First Jan Sep Sep Oct Apr Apr Computer Arithmetic, Addition/Subtraction Slide 2

3 II Addition / Subtraction
Review addition schemes and various speedup methods
Addition is a key operation (in itself, and as a building block)
Subtraction = negation + addition
Carry propagation speedup: lookahead, skip, select, ...
Two-operand versus multioperand addition
Topics in This Part
Chapter 5 Basic Addition and Counting
Chapter 6 Carry-Lookahead Adders
Chapter 7 Variations in Fast Adders
Chapter 8 Multioperand Addition

4 "You can't add apples and oranges, son; only the government can do that."

5 5 Basic Addition and Counting
Chapter Goals
Study the design of ripple-carry adders, discuss why their latency is unacceptable, and set the foundation for faster adders
Chapter Highlights
Full adders are versatile building blocks
Longest carry chain on average: log_2 k bits
Fast asynchronous adders are simple
Counting is relatively easy to speed up

6 Basic Addition and Counting: Topics Topics in This Chapter 5.1. Bit-Serial and Ripple-Carry Adders 5.2. Conditions and Exceptions 5.3. Analysis of Carry Propagation 5.4. Carry Completion Detection 5.5. Addition of a Constant 5.6. Manchester Carry Chains and Adders Apr Computer Arithmetic, Addition/Subtraction Slide 6

7 5.1 Bit-Serial and Ripple-Carry Adders
Half-adder (HA): truth table and block diagram (inputs x, y; outputs c, s)
Full-adder (FA): truth table and block diagram (inputs x, y, c_in; outputs c_out, s)

8 Half-Adder Implementations
(a) AND/XOR half-adder. (b) NOR-gate half-adder. (c) NAND-gate half-adder with complemented carry.
Fig. 5.1 Three implementations of a half-adder.

9 Full-Adder Implementations
(a) Built of half-adders. (b) Built as an AND-OR circuit. (c) Suitable for CMOS realization.
Fig. 5.2 Possible designs for a full-adder in terms of half-adders, logic gates, and CMOS transmission gates.

10 Full-Adder Implementations
(a) FA built of two HAs. (b) CMOS mux-based FA. (c) Two-level AND-OR FA.
Fig. 5.2 (alternate version) Possible designs for a full-adder in terms of half-adders, logic gates, and CMOS transmission gates.

11 Some Full-Adder Details
Logic equations for a full-adder:
$s = x \oplus y \oplus c_{in}$ (odd parity function)
$\;\;\, = x y c_{in} \lor \bar{x}\,\bar{y}\,c_{in} \lor \bar{x}\,y\,\bar{c}_{in} \lor x\,\bar{y}\,\bar{c}_{in}$
$c_{out} = x y \lor x c_{in} \lor y c_{in}$ (majority function)
(a) CMOS transmission gate: circuit and symbol
(b) Two-input mux built of two transmission gates
CMOS transmission gate and its use in a 2-to-1 mux.
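The two logic equations on this slide can be checked exhaustively. The following is a minimal Python sketch; the function name `full_adder` is an illustrative choice, not something defined in the slides.

```python
from itertools import product

def full_adder(x, y, c_in):
    """Return (s, c_out) for one-bit inputs, using the slide's logic equations."""
    s = x ^ y ^ c_in                            # odd parity of the three inputs
    c_out = (x & y) | (x & c_in) | (y & c_in)   # majority of the three inputs
    return s, c_out

# Compare against ordinary integer addition for all 8 input combinations.
for x, y, c in product((0, 1), repeat=3):
    s, c_out = full_adder(x, y, c)
    assert 2 * c_out + s == x + y + c
```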

12 Simple Adders Built of Full-Adders
(a) Bit-serial adder (shift registers for x and y, one FA, a carry flip-flop, and a clock).
(b) Ripple-carry adder (32-bit example: inputs x_31 ... x_0 and y_31 ... y_0, carry-in c_0, carry-out c_32).
Fig. 5.3 Using full-adders in building bit-serial and ripple-carry adders.
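A software model of the ripple-carry structure can make the bit-by-bit carry flow concrete. This is a sketch only: the name `ripple_carry_add` and the convention of passing the word width `k` explicitly are assumptions made for illustration.

```python
def ripple_carry_add(x, y, k, c_in=0):
    """Add two k-bit unsigned integers one bit at a time.

    Returns (k-bit sum, carry-out). Each loop iteration plays the role of
    one full-adder cell; the carry variable is the wire between cells.
    """
    carry, total = c_in, 0
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s = xi ^ yi ^ carry
        carry = (xi & yi) | (xi & carry) | (yi & carry)
        total |= s << i
    return total, carry
```

The same loop, executed one iteration per clock tick with the carry held in a flip-flop, corresponds to the bit-serial arrangement.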

13 VLSI Layout of a Ripple-Carry Adder
Fig. 5.4 The layout of a 4-bit ripple-carry adder in CMOS implementation [Puck94].
Figure labels: inputs x_3 ... x_0 and y_3 ... y_0; 7 inverters; two 4-to-1 muxes; V_DD, V_SS, Clock; carries c_in, c_1, c_2, c_3, c_out; sums s_3 ... s_0; dimensions 150λ and 760λ.

14 Critical Path Through a Ripple-Carry Adder
$T_{\text{ripple-add}} = T_{FA}(x, y \to c_{out}) + (k - 2)\,T_{FA}(c_{in} \to c_{out}) + T_{FA}(c_{in} \to s)$
Fig. 5.5 Critical path in a k-bit ripple-carry adder.

15 Binary Adders as Versatile Building Blocks
Set one input to 0: c_out = AND of other inputs
Set one input to 1: c_out = OR of other inputs
Set one input to 0 and another to 1: s = NOT of third input
Fig. 5.6 Four-bit binary adder used to realize the logic function f = w + xyz and its complement.

16 5.2 Conditions and Exceptions
Fig. 5.7 Two's-complement adder with provisions for detecting conditions and exceptions (outputs: Overflow, Negative, Zero).
$\text{overflow}_{\text{2's-compl}} = x_{k-1}\,y_{k-1}\,\bar{s}_{k-1} \lor \bar{x}_{k-1}\,\bar{y}_{k-1}\,s_{k-1}$
$\text{overflow}_{\text{2's-compl}} = c_k \oplus c_{k-1} = c_k\,\bar{c}_{k-1} \lor \bar{c}_k\,c_{k-1}$
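The two overflow expressions should agree for every operand pair. A small Python sketch of that check follows, assuming operands are given as k-bit two's-complement bit patterns (integers in the range 0 to 2^k - 1); the function name and the dictionary of flags are illustrative choices.

```python
def add_with_flags(x, y, k):
    """Add two k-bit two's-complement bit patterns and derive condition flags."""
    mask = (1 << k) - 1
    low_mask = mask >> 1                                   # the low k-1 bits
    c_km1 = ((x & low_mask) + (y & low_mask)) >> (k - 1)   # carry into sign position
    total = x + y
    c_k = total >> k                                       # carry out of sign position
    s = total & mask
    xs, ys, ss = x >> (k - 1), y >> (k - 1), s >> (k - 1)  # sign bits
    ovf_from_signs = (xs & ys & (ss ^ 1)) | ((xs ^ 1) & (ys ^ 1) & ss)
    ovf_from_carries = c_k ^ c_km1
    assert ovf_from_signs == ovf_from_carries              # the slide's two forms agree
    return {"sum": s, "overflow": ovf_from_carries,
            "negative": ss, "zero": int(s == 0)}
```

For example, in 4 bits the patterns 0111 (+7) and 0001 (+1) produce 1000 with the overflow and negative flags set.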

17 Saturating Adders
Saturating (saturation) arithmetic: When a result's magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format
Example: In 8-bit 2's-complement format, an out-of-range sum either wraps around or is replaced by the saturation value
Saturating arithmetic is desirable in many DSP applications
Designing saturating adders: Unsigned (quite easy); Signed (only slightly harder)
Figure: the adder's overflow signal controls a mux that selects between the adder output and the saturation value
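To contrast wraparound with saturation, here is a hedged Python sketch. The function names are hypothetical, and the operand values in the usage note are chosen for illustration; they are not the example from the slide.

```python
def wraparound_add_signed(x, y, k):
    """k-bit two's-complement addition that discards the carry-out."""
    s = (x + y) & ((1 << k) - 1)
    return s - (1 << k) if s >> (k - 1) else s

def saturating_add_signed(x, y, k):
    """Clamp the true sum to the representable range [-2^(k-1), 2^(k-1) - 1]."""
    lo, hi = -(1 << (k - 1)), (1 << (k - 1)) - 1
    return max(lo, min(hi, x + y))

def saturating_add_unsigned(x, y, k):
    """Unsigned case: only the upper bound 2^k - 1 can be exceeded."""
    return min(x + y, (1 << k) - 1)
```

With k = 8, adding 100 and 50 wraps around to -106 but saturates to 127.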

18 5.3 Analysis of Carry Propagation
Fig. 5.8 Example addition and its carry propagation chains (bit positions from c_out down to c_in, with the carry chains and their lengths marked).

19 Using Probability to Analyze Carry Propagation
Given binary numbers with random bits, for each position i we have
Probability of carry generation = 1/4 (both 1s)
Probability of carry annihilation = 1/4 (both 0s)
Probability of carry propagation = 1/2 (different)
Probability that a carry generated at position i propagates through position j - 1 and stops at position j (j > i):
$2^{-(j-1-i)} \times 1/2 = 2^{-(j-i)}$
Expected length of the carry chain that starts at position i:
$2 - 2^{-(k-i-1)}$
Average length of the longest carry chain in k-bit addition is strictly less than log_2 k; it is log_2(1.25k) per experimental results
Analogy: Expected number when rolling one die is 3.5; if one rolls many dice, the expected value of the largest number shown grows
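The expected-length expression can be confirmed by brute-force enumeration for small k. The sketch below assumes one particular reading of "chain length": a carry generated at position i that is stopped at position j (or runs off the top of the word at j = k) has length j - i. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def chain_length(bits, i):
    """Length of the carry chain generated at position i.

    bits is a list of (x_j, y_j) pairs, index 0 = least-significant bit.
    The chain continues through every propagate position (x_j != y_j)
    directly above i and stops at the first other position or at the top.
    """
    k = len(bits)
    j = i + 1
    while j < k and bits[j][0] != bits[j][1]:
        j += 1
    return j - i

def expected_chain_length(k, i):
    """Exact average over all equally likely bit patterns above position i."""
    n = k - i - 1
    patterns = list(product(((0, 0), (0, 1), (1, 0), (1, 1)), repeat=n))
    total = Fraction(0)
    for upper in patterns:
        bits = [(0, 0)] * i + [(1, 1)] + list(upper)   # generate at position i
        total += chain_length(bits, i)
    return total / len(patterns)
```

For k = 6 and i = 2 the enumeration gives 15/8, which equals 2 - 2^(-3).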

20 5.4 Carry Completion Detection
Two-rail carry encoding (b_i, c_i):
0 0: Carry not yet known
0 1: Carry known to be 1
1 0: Carry known to be 0
Boundary values: b_0 = complement of c_in, c_0 = c_in, c_k = c_out
The signal d_{i+1} of each position is combined with those from the other bit positions to form alldone
Fig. 5.9 The carry network of an adder with two-rail carries and carry completion detection logic.

21 5.5 Addition of a Constant: Counters
Figure: An up (down) counter built of a register, an incrementer (decrementer), and a multiplexer.
Figure labels: Data in; Count / Initialize; Clock; Reset; Load; Clear; Enable; count register x; incrementer (decrementer) producing x + 1 (x - 1); counter overflow c_out; Data out.

22 Implementing a Simple Up Counter
(Fm arch text) Ripple-carry incrementer for use in an up counter.
Figure: Four-bit asynchronous up counter built only of negative-edge-triggered T flip-flops (Increment input; count output Q_3 Q_2 Q_1 Q_0).

23 Faster and Constant-Time Counters
Any fast adder design can be specialized and optimized to yield a fast counter (carry-lookahead, carry-skip, etc.)
One can use redundant representation to build a constant-time counter, but a conversion penalty must be paid during read-out
Figure: Fast (constant-time) three-stage up counter, with the count register divided into three stages.

24 5.6 Manchester Carry Chains and Adders
Sum digit in radix r: $s_i = (x_i + y_i + c_i) \bmod r$
Special case of radix 2: $s_i = x_i \oplus y_i \oplus c_i$
Computing the carries c_i is thus our central problem
For this, the actual operand digits are not important
What matters is whether in a given position a carry is generated, propagated, or annihilated (absorbed)
For binary addition:
$g_i = x_i y_i \qquad p_i = x_i \oplus y_i \qquad a_i = \bar{x}_i\,\bar{y}_i = \overline{x_i \lor y_i}$
It is also helpful to define a transfer signal:
$t_i = g_i \lor p_i = \bar{a}_i = x_i \lor y_i$
Using these signals, the carry recurrence is written as
$c_{i+1} = g_i \lor c_i p_i = g_i \lor c_i g_i \lor c_i p_i = g_i \lor c_i t_i$
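The identities among the generate, propagate, annihilate, and transfer signals, and the two equivalent forms of the carry recurrence, can be verified over all one-bit cases. A minimal Python sketch (the helper name `signals` is illustrative):

```python
from itertools import product

def signals(x, y):
    """Return (g, p, a, t) for one binary digit position."""
    g = x & y                # generate: both inputs are 1
    p = x ^ y                # propagate: inputs differ
    a = (x ^ 1) & (y ^ 1)    # annihilate: both inputs are 0
    t = x | y                # transfer: generate or propagate
    return g, p, a, t

for x, y, c in product((0, 1), repeat=3):
    g, p, a, t = signals(x, y)
    assert t == (g | p) == 1 - a
    # Both forms of the recurrence give the true carry-out of the position.
    assert (g | (c & p)) == (g | (c & t)) == (x + y + c) >> 1
```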

25 Manchester Carry Network
The worst-case delay of a Manchester carry chain has three components:
1. Latency of forming the switch control signals
2. Set-up time for switches
3. Signal propagation delay through k switches
Figure: One stage in a Manchester carry chain. (a) Conceptual representation. (b) Possible CMOS realization.

26 Carry Network is the Essence of a Fast Adder
g_i  p_i  Carry is:
0    0    annihilated or killed
0    1    propagated
1    0    generated
1    1    (impossible)
$g_i = x_i y_i \qquad p_i = x_i \oplus y_i$
Carry network options: Ripple; Skip; Lookahead; Parallel-prefix
(Fm arch text) The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.

27 Ripple-Carry Adder Revisited
The carry recurrence: $c_{i+1} = g_i \lor p_i c_i$
Latency of k-bit adder is roughly 2k gate delays:
1 gate delay for production of p and g signals, plus
2(k - 1) gate delays for carry propagation, plus
1 XOR gate delay for generation of the sum bits
Figure (arch book): The carry propagation network of a ripple-carry adder.

28 The Complete Design of a Ripple-Carry Adder
Figure 10.6 (ripple-carry network) superimposed on Figure 10.5 (general structure of an adder).

29 6 Carry-Lookahead Adders Chapter Goals Understand the carry-lookahead method and its many variations used in the design of fast adders Chapter Highlights Single- and multilevel carry lookahead Various designs for log-time adders Relating the carry determination problem to parallel prefix computation Implementing fast adders in VLSI Apr Computer Arithmetic, Addition/Subtraction Slide 29

30 Carry-Lookahead Adders: Topics Topics in This Chapter 6.1. Unrolling the Carry Recurrence 6.2. Carry-Lookahead Adder Design 6.3. Ling Adder and Related Designs 6.4. Carry Determination as Prefix Computation 6.5. Alternative Parallel Prefix Networks 6.6. VLSI Implementation Aspects Apr Computer Arithmetic, Addition/Subtraction Slide 30

31 6.1 Unrolling the Carry Recurrence
Recall the generate, propagate, annihilate (absorb), and transfer signals:
Signal  Radix r                           Binary
g_i     is 1 iff x_i + y_i ≥ r            x_i y_i
p_i     is 1 iff x_i + y_i = r - 1        x_i ⊕ y_i
a_i     is 1 iff x_i + y_i < r - 1        x̄_i ȳ_i = complement of (x_i ∨ y_i)
t_i     is 1 iff x_i + y_i ≥ r - 1        x_i ∨ y_i
s_i     (x_i + y_i + c_i) mod r           x_i ⊕ y_i ⊕ c_i
The carry recurrence can be unrolled to obtain each carry signal directly from inputs, rather than through propagation
$c_i = g_{i-1} \lor c_{i-1} p_{i-1}$
$\;\;\, = g_{i-1} \lor (g_{i-2} \lor c_{i-2} p_{i-2})\,p_{i-1}$
$\;\;\, = g_{i-1} \lor g_{i-2} p_{i-1} \lor c_{i-2} p_{i-2} p_{i-1}$
$\;\;\, = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor c_{i-3} p_{i-3} p_{i-2} p_{i-1}$
$\;\;\, = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor g_{i-4} p_{i-3} p_{i-2} p_{i-1} \lor c_{i-4} p_{i-4} p_{i-3} p_{i-2} p_{i-1}$
$\;\;\, = \cdots$
Note: the addition symbol + (as in x_i + y_i) denotes arithmetic addition, as distinct from the logical OR symbol ∨

32 Full Carry Lookahead
Theoretically, it is possible to derive each sum digit directly from the inputs that affect it
Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits
Figure: each of the sum bits s_3, s_2, s_1, s_0 derived directly from x_3 y_3 ... x_0 y_0 and c_in.

33 Four-Bit Carry-Lookahead Adder
Complexity reduced by deriving the carry-out indirectly
Full carry lookahead is quite practical for a 4-bit adder
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
$c_4 = g_3 \lor g_2 p_3 \lor g_1 p_2 p_3 \lor g_0 p_1 p_2 p_3 \lor c_0 p_0 p_1 p_2 p_3$
Fig. 6.1 Four-bit carry network with full lookahead.
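The four lookahead equations translate directly into code. The following Python sketch evaluates them as two-level expressions and forms the sum bits from p_i and c_i; the function names `cla4_carries` and `add4` are assumptions for illustration.

```python
def cla4_carries(g, p, c0):
    """Return [c1, c2, c3, c4] from 4-element lists g, p (index 0 = LSB)."""
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3]) | (c0 & p[0] & p[1] & p[2] & p[3]))
    return [c1, c2, c3, c4]

def add4(x, y, c0=0):
    """4-bit addition using full lookahead; returns (4-bit sum, carry-out)."""
    g = [(x >> i) & (y >> i) & 1 for i in range(4)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]
    c = [c0] + cla4_carries(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(4))   # s_i = p_i XOR c_i
    return s, c[4]
```

Every carry here depends only on the g, p, and c_0 inputs, never on another carry, which is the point of unrolling the recurrence.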

34 Carry Lookahead Beyond 4 Bits
Consider a 32-bit adder
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
...
$c_{31} = g_{30} \lor g_{29} p_{30} \lor g_{28} p_{29} p_{30} \lor g_{27} p_{28} p_{29} p_{30} \lor \cdots \lor c_0 p_0 p_1 p_2 p_3 \cdots p_{29} p_{30}$
The last product term of c_31 needs a 32-input AND, and the expression as a whole needs a 32-input OR
High fan-ins necessitate tree-structured circuits

35 Two Solutions to the Fan-in Problem
High-radix addition (i.e., radix 2^h)
Increases the latency for generating g and p signals and sum digits, but simplifies the carry network (optimal radix?)
Multilevel lookahead
Example: 16-bit addition
Radix-16 (four digits)
Two-level carry lookahead (four 4-bit blocks)
Either way, the carries c_4, c_8, and c_12 are determined first

36 6.2 Carry-Lookahead Adder Design
Block generate and propagate signals
$g_{[i,i+3]} = g_{i+3} \lor g_{i+2} p_{i+3} \lor g_{i+1} p_{i+2} p_{i+3} \lor g_i p_{i+1} p_{i+2} p_{i+3}$
$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$
Fig. 6.3 Schematic diagram of a 4-bit lookahead carry generator (inputs g_i ... g_{i+3}, p_i ... p_{i+3}, and c_i; outputs c_{i+1}, c_{i+2}, c_{i+3}, g_[i,i+3], and p_[i,i+3]).
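A block's generate and propagate signals summarize everything the rest of the adder needs to know about it: the block's carry-out is g_[i,i+3] OR (c_i AND p_[i,i+3]). The Python sketch below checks this for a single 4-bit block; the function names are illustrative.

```python
def block_gp(g, p):
    """Block generate/propagate for a 4-bit block (index 0 = lowest bit)."""
    g_blk = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
             | (g[0] & p[1] & p[2] & p[3]))
    p_blk = p[0] & p[1] & p[2] & p[3]
    return g_blk, p_blk

def block_carry_out(x, y, c_in):
    """Carry-out of a 4-bit block computed only from its block signals."""
    g = [(x >> i) & (y >> i) & 1 for i in range(4)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]
    g_blk, p_blk = block_gp(g, p)
    return g_blk | (c_in & p_blk)
```

Because the block signals do not depend on c_in, they can be formed for all blocks in parallel before any block carry is known.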

37 A Building Block for Carry-Lookahead Addition
Fig. 6.1 Four-bit adder.
Fig. 6.2 Four-bit lookahead carry generator (block signal generation and intermediate carries).

38 Combining Block g and p Signals
Block generate and propagate signals can be combined in the same way as bit g and p signals to form g and p signals for wider blocks
Fig. 6.4 Combining of g and p signals of four (contiguous or overlapping) blocks of arbitrary widths into the g and p signals for the overall block [i_0, j_3].

39 A Two-Level Carry-Lookahead Adder
Fig. 6.5 Building a 64-bit carry-lookahead adder from 16 4-bit adders and 5 lookahead carry generators.
Carry-out: $c_{out} = g_{[0,k-1]} \lor c_0\,p_{[0,k-1]} = x_{k-1} y_{k-1} \lor \bar{s}_{k-1}\,(x_{k-1} \lor y_{k-1})$

40 Latency of a Multilevel Carry-Lookahead Adder
Latency through the 16-bit CLA adder consists of finding:
g and p for individual bit positions: 1 gate level
g and p signals for 4-bit blocks: 2 gate levels
Block carry-in signals c_4, c_8, and c_12: 2 gate levels
Internal carries within 4-bit blocks: 2 gate levels
Sum bits: 2 gate levels
Total latency for the 16-bit adder: 9 gate levels
(compare to 32 gate levels for a 16-bit ripple-carry adder)
Each additional lookahead level adds 4 gate levels of latency
Latency for k-bit CLA adder: $T_{\text{lookahead-add}} = 4 \log_4 k + 1$ gate levels
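The latency formula is easy to tabulate next to the ripple-carry figure of about 2k gate levels. In this Python sketch, the function names are illustrative, and rounding the number of lookahead levels up when k is not a power of 4 is an assumption not stated on the slide.

```python
def cla_latency(k):
    """Gate levels for a k-bit multilevel CLA built from 4-bit blocks."""
    levels, width = 0, 1
    while width < k:        # number of lookahead levels = ceil(log_4 k)
        width *= 4
        levels += 1
    return 4 * levels + 1

def ripple_latency(k):
    """Approximate gate levels for a k-bit ripple-carry adder."""
    return 2 * k

for k in (4, 16, 64, 256):
    print(k, cla_latency(k), ripple_latency(k))
```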

41 6.3 Ling Adder and Related Designs
Consider the carry recurrence and its unrolling by 4 steps:
$c_i = g_{i-1} \lor c_{i-1} t_{i-1}$
$\;\;\, = g_{i-1} \lor g_{i-2} t_{i-1} \lor g_{i-3} t_{i-2} t_{i-1} \lor g_{i-4} t_{i-3} t_{i-2} t_{i-1} \lor c_{i-4} t_{i-4} t_{i-3} t_{i-2} t_{i-1}$
Ling's modification: Propagate $h_i = c_i \lor c_{i-1}$ instead of $c_i$
$h_i = g_{i-1} \lor h_{i-1} t_{i-2}$
$\;\;\, = g_{i-1} \lor g_{i-2} \lor g_{i-3} t_{i-2} \lor g_{i-4} t_{i-3} t_{i-2} \lor h_{i-4} t_{i-4} t_{i-3} t_{i-2}$
CLA: 5 gates, max 5 inputs, 19 gate inputs
Ling: 4 gates, max 5 inputs, 14 gate inputs
The advantage of h_i over c_i is even greater with wired-OR:
CLA: 4 gates, max 5 inputs, 14 gate inputs
Ling: 3 gates, max 4 inputs, 9 gate inputs
Once h_i is known, however, the sum is obtained by a slightly more complex expression compared with $s_i = p_i \oplus c_i$:
$s_i = (t_i \oplus h_{i+1}) \lor h_i\,g_i\,t_{i-1}$
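The one-step recurrence for h_i and the modified sum expression can be checked numerically. The Python sketch below covers only those two relations (not the four-step unrolled form), assumes a carry-in of 0, and uses the illustrative name `ling_check`.

```python
def ling_check(x, y, k):
    """Verify Ling's one-step recurrence and sum expression for k-bit x, y."""
    g = [(x >> i) & (y >> i) & 1 for i in range(k)]
    t = [((x >> i) | (y >> i)) & 1 for i in range(k)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(k)]
    c = [0]                                   # c_0 = 0 (assumed carry-in)
    for i in range(k):
        c.append(g[i] | (c[i] & t[i]))        # ordinary carry recurrence
    # h_i = c_i OR c_{i-1}, defined here for i >= 1 (h[0] is unused)
    h = [None] + [c[i] | c[i - 1] for i in range(1, k + 1)]
    for i in range(2, k + 1):
        assert h[i] == g[i - 1] | (h[i - 1] & t[i - 2])
    for i in range(1, k):
        assert c[i] == h[i] & t[i - 1]        # the carry is recoverable from h
        s_ling = (t[i] ^ h[i + 1]) | (h[i] & g[i] & t[i - 1])
        assert s_ling == p[i] ^ c[i]
    return True
```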

42 6.4 Carry Determination as Prefix Computation
Block B' = [i_0, j_0] has signals (g', p'); block B" = [i_1, j_1] has signals (g", p"); the merged block B = [i_0, j_1] has signals (g, p), where
$g = g'' \lor g' p'' \qquad p = p' p''$
Fig. 6.6 Combining of g and p signals of two (contiguous or overlapping) blocks B' and B" of arbitrary widths into the g and p signals for block B.

43 Formulating the Prefix Computation Problem
The problem of carry determination can be formulated as:
Given (g_0, p_0), (g_1, p_1), ..., (g_{k-2}, p_{k-2}), (g_{k-1}, p_{k-1})
Find (g_[0,0], p_[0,0]), (g_[0,1], p_[0,1]), ..., (g_[0,k-2], p_[0,k-2]), (g_[0,k-1], p_[0,k-1])
which yield the carries c_1, c_2, ..., c_{k-1}, c_k
Carry-in can be viewed as an extra (-1) position: (g_{-1}, p_{-1}) = (c_in, 0)
The desired pairs are found by evaluating all prefixes of
(g_0, p_0) ¢ (g_1, p_1) ¢ ... ¢ (g_{k-2}, p_{k-2}) ¢ (g_{k-1}, p_{k-1})
The carry operator ¢ is associative, but not commutative
[(g_1, p_1) ¢ (g_2, p_2)] ¢ (g_3, p_3) = (g_1, p_1) ¢ [(g_2, p_2) ¢ (g_3, p_3)]
Prefix sums analogy:
Given x_0, x_1, x_2, ..., x_{k-1}
Find x_0, x_0 + x_1, x_0 + x_1 + x_2, ..., x_0 + x_1 + ... + x_{k-1}
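The prefix formulation can be sketched with a running reduction over (g, p) pairs. In this Python example, `carry_op` models the carry operator, with its first argument taken as the less-significant block (an argument order chosen here for illustration); `itertools.accumulate` then computes all prefixes serially.

```python
from itertools import accumulate

def carry_op(low, high):
    """Combine (g, p) of a less-significant block with a more-significant one."""
    g1, p1 = low
    g2, p2 = high
    return (g2 | (g1 & p2), p1 & p2)

def carries(x, y, k, c_in=0):
    """Return [c_0, c_1, ..., c_k] for k-bit operands via prefix computation."""
    pairs = [(c_in, 0)]                            # the extra (-1) position
    pairs += [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1)
              for i in range(k)]
    prefixes = accumulate(pairs, carry_op)         # running prefix results
    return [g for g, _ in prefixes]
```

A serial scan like this corresponds to ripple-carry behavior; because the operator is associative, the same prefixes can be regrouped into a logarithmic-depth network.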

44 Example Prefix-Based Carry Network
Figure: a four-input prefix sums network shown next to a four-bit carry lookahead network with the same structure (scan order marked).
Carry network inputs: (g_3, p_3), (g_2, p_2), (g_1, p_1), (g_0, p_0)
Carry network outputs: g_[0,3], p_[0,3] = (c_4, --); g_[0,2], p_[0,2] = (c_3, --); g_[0,1], p_[0,1] = (c_2, --); g_[0,0], p_[0,0] = (c_1, --)

45 6.5 Alternative Parallel Prefix Networks
Fig. 6.7 Parallel prefix sums network built of two k/2-input networks and k/2 adders. (Ladner-Fischer)
Delay recurrence: $D(k) = D(k/2) + 1 = \log_2 k$
Cost recurrence: $C(k) = 2C(k/2) + k/2 = (k/2) \log_2 k$

46 The Brent-Kung Recursive Construction
Fig. 6.8 Parallel prefix sums network built of one k/2-input network and k - 1 adders.
Delay recurrence: $D(k) = D(k/2) + 2 = 2 \log_2 k - 1$ ($-2$ really)
Cost recurrence: $C(k) = C(k/2) + k - 1 = 2k - 2 - \log_2 k$

47 Brent-Kung Carry Network (8-Bit Adder)
Figure: the inputs [7, 7], [6, 6], ..., [0, 0] are combined into [6, 7], [4, 5], [2, 3], [0, 1], then into [4, 7] and [0, 3], and finally into the outputs [0, 7], [0, 6], ..., [0, 0]. Here [i, j] stands for the pair g_[i,j], p_[i,j].

48 Brent-Kung Carry Network (16-Bit Adder)
Fig. 6.9 Brent-Kung parallel prefix graph for 16 inputs (inputs x_15 ... x_0, outputs s_15 ... s_0).
The six levels of the graph show the reason for the latency being 2 log_2 k - 2

49 Kogge-Stone Carry Network (16-Bit Adder)
Cost formula: $C(k) = (k - 1) + (k - 2) + (k - 4) + \cdots + (k - k/2) = k \log_2 k - k + 1$
log_2 k levels (minimum possible)
Figure: Kogge-Stone parallel prefix graph for 16 inputs (inputs x_15 ... x_0, outputs s_15 ... s_0).

50 Speed-Cost Tradeoffs in Carry Networks
Method           Delay            Cost
Ladner-Fischer   log_2 k          (k/2) log_2 k
Kogge-Stone      log_2 k          k log_2 k - k + 1
Brent-Kung       2 log_2 k - 2    2k - 2 - log_2 k
Improving the Ladner/Fischer design
The outputs marked in the figure can be produced one time unit later without increasing the overall latency
This strategy saves enough to make the overall cost linear (best possible)
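The cost entries for Kogge-Stone and Brent-Kung can be reproduced by simulating the two networks and counting cells. The Python sketch below assumes k is a power of 2 and takes the combining operator as `op(low, high)`; the function names are illustrative. The Brent-Kung routine counts cells only, since a simple sequential up-sweep and down-sweep does not by itself reflect how levels overlap in the graph.

```python
def kogge_stone(xs, op):
    """Return (prefixes, cell count, level count) for a Kogge-Stone network."""
    a, k, cells, levels = list(xs), len(xs), 0, 0
    d = 1
    while d < k:
        prev = a[:]                       # all cells in a level work in parallel
        for i in range(d, k):
            a[i] = op(prev[i - d], prev[i])
            cells += 1
        d *= 2
        levels += 1
    return a, cells, levels

def brent_kung(xs, op):
    """Return (prefixes, cell count) for a Brent-Kung network."""
    a, k, cells = list(xs), len(xs), 0
    d = 1
    while d < k:                          # up-sweep: build a binary tree
        for i in range(2 * d - 1, k, 2 * d):
            a[i] = op(a[i - d], a[i])
            cells += 1
        d *= 2
    d = k // 2
    while d > 1:                          # down-sweep: fill in missing prefixes
        for i in range(d + d // 2 - 1, k, d):
            a[i] = op(a[i - d // 2], a[i])
            cells += 1
        d //= 2
    return a, cells
```

With 16 inputs the counts come out to 49 cells in 4 levels for Kogge-Stone and 26 cells for Brent-Kung, matching the formulas in the table.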

51 Hybrid B-K/K-S Carry Network (16-Bit Adder)
Brent-Kung: 6 levels, 26 cells
Kogge-Stone: 4 levels, 49 cells
Hybrid: 5 levels, 32 cells
Figure: A hybrid Brent-Kung/Kogge-Stone parallel prefix graph for 16 inputs (a Brent-Kung stage, then Kogge-Stone stages, then a Brent-Kung stage).

52 6.6 VLSI Implementation Aspects
Example: Radix-256 addition of 56-bit numbers as implemented in the AMD Am29050 CMOS microprocessor
Our description is based on the 64-bit version of the adder
In radix-256, 64-bit addition, only these carries are needed: c_56, c_48, c_40, c_32, c_24, c_16, c_8
First, 4-bit Manchester carry chains (MCCs) of Fig. 6.12a are used to derive g and p signals for 4-bit blocks
Next, the g and p signals for 4-bit blocks are combined to form the desired carries, using the MCCs in Fig. 6.12b

53 Four-Bit Manchester Carry Chains
Figure: Example four-bit Manchester carry chain designs in CMOS technology [Lync92]. Design (a) produces g_[0,3] and p_[0,3]; design (b) produces g_[0,1], p_[0,1], g_[0,2], p_[0,2], g_[0,3], and p_[0,3]. Both are clocked by PH2.

54 Carry Network for 64-Bit Adder
Legend: [i, j] represents the pair of signals p_[i,j] and g_[i,j]
Level 1: 16 type-a MCC blocks produce [0, 3], [4, 7], ..., [60, 63]
Level 2: type-b MCCs produce [48, 63], [48, 59], [48, 55]; [32, 47], [32, 43], [32, 39]; [16, 31], [16, 27], [16, 23]; a type-b* MCC, which also takes [-1, -1] (c_in), produces [-1, 15], [-1, 11], [-1, 7]
Level 3: type-b MCCs produce [-1, 55], [-1, 47], [-1, 31] and [-1, 39], [-1, 31], [-1, 23]
Carries obtained: c_56, c_48, c_40, c_32, c_24, c_16, c_8, c_0
Figure: Spanning-tree carry-lookahead network [Lync92]. The 16 MCCs at level 1, which produce generate and propagate signals for 4-bit blocks, are not shown.

55 7 Variations in Fast Adders Chapter Goals Study alternatives to the carry-lookahead method for designing fast adders Chapter Highlights Many methods besides CLA are available (both competing and complementary) Best design is technology-dependent (often hybrid rather than pure) Knowledge of timing allows optimizations Apr Computer Arithmetic, Addition/Subtraction Slide 55

56 Variations in Fast Adders: Topics Topics in This Chapter 7.1. Simple Carry-Skip Adders 7.2. Multilevel Carry-Skip Adders 7.3. Carry-Select Adders 7.4. Conditional-Sum Adder 7.5. Hybrid Adder Designs 7.6. Optimizations in Fast Adders Apr Computer Arithmetic, Addition/Subtraction Slide 56

57 7.1 Simple Carry-Skip Adders
(a) Ripple-carry adder. (b) Simple carry-skip adder.
Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks (ripple-carry stages plus skip logic of 2 gates per block, controlled by p_[0,3], p_[4,7], p_[8,11], and p_[12,15]).

58 Another View of Carry-Skip Addition
Street/freeway analogy for carry-skip adder: the ripple path through c_4j, c_4j+1, c_4j+2, c_4j+3, c_4j+4 is the one-way street; the skip path is the freeway.

59 Carry-Skip Adder with Fixed Block Size
Block width b; k/b blocks to form a k-bit adder (assume b divides k)
$T_{\text{fixed-skip-add}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) \approx 2b + k/b - 3.5$ stages
(the four terms are: in block 0; OR gate; skips; in last block)
$dT/db = 2 - k/b^2 = 0 \;\Rightarrow\; b^{opt} = \sqrt{k/2}$
$T^{opt} = 2\sqrt{2k} - 3.5$
Example: k = 32, b^opt = 4, T^opt = 12.5 stages (contrast with 32 stages for a ripple-carry adder)
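The optimum block width can also be found by direct search over the divisors of k, which avoids non-integer block widths. This is a sketch under the slide's delay model; the function names are hypothetical.

```python
import math

def fixed_skip_delay(k, b):
    """Worst-case delay in stages, per the model T = 2b + k/b - 3.5."""
    return 2 * b + k / b - 3.5

def best_block_width(k):
    """Divisor of k that minimizes the modeled delay."""
    divisors = [b for b in range(1, k + 1) if k % b == 0]
    return min(divisors, key=lambda b: fixed_skip_delay(k, b))

k = 32
b = best_block_width(k)
print(b, fixed_skip_delay(k, b), 2 * math.sqrt(2 * k) - 3.5)
```

For k = 32 the search agrees with the closed-form optimum because the square root of k/2 happens to be an integer; for other word widths the best divisor gives a delay at or slightly above the closed-form value.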

60 Carry-Skip Adder with Variable-Width Blocks
Block widths b_{t-1}, b_{t-2}, ..., b_1, b_0
Fig. 7.2 Carry-skip adder with variable-size blocks and three sample carry paths.
The total number of bits in the t blocks is k:
$2[b + (b + 1) + \cdots + (b + t/2 - 1)] = t(b + t/4 - 1/2) = k$
$b = k/t - t/4 + 1/2$
$T_{\text{var-skip-add}} = 2(b - 1) + 0.5 + t - 2 = 2k/t + t/2 - 2.5$
$dT/dt = -2k/t^2 + 1/2 = 0 \;\Rightarrow\; t^{opt} = 2\sqrt{k}$
$T^{opt} = 2\sqrt{k} - 2.5$ (a factor of $\sqrt{2}$ smaller than for fixed-block)

61 7.2 Multilevel Carry-Skip Adders
Fig. 7.3 Schematic diagram of a one-level carry-skip adder.
Fig. 7.4 Example of a two-level carry-skip adder.
Fig. 7.5 Two-level carry-skip adder optimized by removing the short-block skip circuits.

62 Designing a Single-Level Carry-Skip Adder
Example 7.1
Each of the following takes one unit of time: generation of g_i and p_i, generation of level-i skip signal from level-(i - 1) skip signals, ripple, skip, and formation of sum bit once the incoming carry is known
Build the widest possible one-level carry-skip adder with total delay of 8
Fig. 7.6 Timing constraints of a single-level carry-skip adder with a delay of 8 units.
Max adder width = 18
Generalization of Example 7.1 for total time T (even or odd)
Thus, for any T, the total width is $\lfloor (T + 1)^2 / 4 \rfloor - 2$

63 Designing a Two-Level Carry-Skip Adder
Example 7.2
Each of the following takes one unit of time: generation of g_i and p_i, generation of level-i skip signal from level-(i - 1) skip signals, ripple, skip, and formation of sum bit once the incoming carry is known
Build the widest possible two-level carry-skip adder with total delay of 8
{T_produce, T_assimilate} pairs for level-2 blocks F through A: {8, 1}, {7, 2}, {6, 3}, {5, 4}, {4, 5}, {3, 8}
Max adder width = 30
Fig. 7.7 Two-level carry-skip adder with a delay of 8 units: (a) Initial timing constraints, (b) Final design.

64 Elaboration on Two-Level Carry-Skip Adder
Example 7.2
Given the delay pair {β, α} for a level-2 block in Fig. 7.7a, the number of level-1 blocks that can be accommodated is γ = min(β - 1, α)
Figure: single-level carry-skip adder with T_assimilate = α; single-level carry-skip adder with T_produce = β
Width of the ith level-1 block in the level-2 block characterized by {β, α} is b_i = min(β - γ + i + 1, α - i); the total block width is then $\sum_{i=0}^{\gamma-1} b_i$

65 Carry-Skip Adder Optimization Scheme
Fig. 7.8 Generalized delay model for carry-skip adders (a block of b full-adder units with delays I(b), G(b), A(b); a level-h skip with delays S_h(b), E_h(b)).

66 7.3 Carry-Select Adders
$C_{\text{select-add}}(k) = 3\,C_{\text{add}}(k/2) + k/2 + 1$
$T_{\text{select-add}}(k) = T_{\text{add}}(k/2) + 1$
Fig. 7.9 Carry-select adder for k-bit numbers built from three k/2-bit adders (one for the low k/2 bits; two for the high k/2 bits, with carry-in 0 and 1; a mux controlled by c_{k/2} selects the high half and c_out).
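The carry-select structure can be modeled with three half-width additions and a selection step. In this Python sketch, Python's `+` stands in for each k/2-bit adder, k is assumed even, and the function name is illustrative.

```python
def carry_select_add(x, y, k, c_in=0):
    """k-bit addition from three k/2-bit additions; returns (sum, carry-out)."""
    half = k // 2
    mask = (1 << half) - 1
    lo = (x & mask) + (y & mask) + c_in        # low-half adder
    hi0 = (x >> half) + (y >> half)            # high-half adder, carry-in 0
    hi1 = (x >> half) + (y >> half) + 1        # high-half adder, carry-in 1
    c_half = lo >> half                        # c_{k/2} drives the mux
    hi = hi1 if c_half else hi0                # (k/2 + 1)-bit wide selection
    total = ((hi & mask) << half) | (lo & mask)
    return total, hi >> half
```

Both high-half results are ready by the time c_{k/2} arrives, so only the mux delay is added to the half-width adder's latency.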

67 Multilevel Carry-Select Adders
Figure: Two-level carry-select adder built of k/4-bit adders (outputs: c_out and the high k/2 bits; the middle k/4 bits; the low k/4 bits).

68 7.4 Conditional-Sum Adder
Multilevel carry-select idea carried out to the extreme (to 1-bit blocks).
$C(k) \approx 2C(k/2) + k + 2 \approx k(\log_2 k + 2) + k\,C(1)$
$T(k) = T(k/2) + 1 = \log_2 k + T(1)$
where C(1) and T(1) are the cost and delay of the circuit in the figure for deriving the sum and carry bits with a carry-in of 0 and 1
k + 2 is an upper bound on number of single-bit 2-to-1 multiplexers needed for combining two k/2-bit adders into a k-bit adder
Figure: Top-level block for one bit position of a conditional-sum adder (inputs x_i, y_i; outputs s_i and c_{i+1} for c_i = 0 and for c_i = 1).
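The recursive structure behind the cost and delay recurrences can be sketched as follows. This Python model assumes k is a power of 2 and returns both conditional results for a block; the name `cond_sum` is illustrative.

```python
def cond_sum(x, y, k):
    """Return ((s0, c0), (s1, c1)) for a k-bit block.

    (s0, c0) is the block sum and carry-out assuming carry-in 0;
    (s1, c1) is the same assuming carry-in 1.
    """
    if k == 1:
        # One-bit block: both versions come straight from x and y.
        return ((x ^ y, x & y), ((x ^ y) ^ 1, x | y))
    half = k // 2
    mask = (1 << half) - 1
    lo = cond_sum(x & mask, y & mask, half)
    hi = cond_sum(x >> half, y >> half, half)
    result = []
    for c_in in (0, 1):
        s_lo, c_lo = lo[c_in]
        s_hi, c_hi = hi[c_lo]      # low block's carry selects the high version
        result.append(((s_hi << half) | s_lo, c_hi))
    return tuple(result)
```

Each level of recursion doubles the block width with one selection step, which mirrors T(k) = T(k/2) + 1.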

69 Conditional-Sum Addition Example
Table 7.2 Conditional-sum addition of two 16-bit numbers. The width of the block for which the sum and carry bits are known doubles with each additional level, leading to an addition time that grows as the logarithm of the word width k.
Table columns: block width; block carry-in; block sum and block carry-out.

70 Elaboration on Conditional-Sum Addition
Two adjacent 4-bit blocks, forming an 8-bit block.
[Figure: the left 4-bit block and the right 4-bit block each supply two versions of their sum bits and carry-out. These are combined into two versions of the sum bits and carry-out of the 8-bit block starting at position 8j.]

71 7.5 Hybrid Adder Designs
The most popular hybrid addition scheme:
[Figure: a lookahead carry generator takes the g, p signals and $c_{in}$; the carries it produces control the multiplexers of the carry-select blocks, and it also yields $c_{out}$.]
Fig. A hybrid carry-lookahead/carry-select adder.

72 Details of a 64-Bit Hybrid CLA/Select Adder
[Figure, from [Lync92]: a three-level tree of MCC blocks. Level 1 has 16 type-a MCC blocks covering the 4-bit groups [0, 3], [4, 7], ..., [60, 63]. Levels 2 and 3 use type-b MCC blocks (and one type-b* block) that merge these groups into wider spans such as [16, 31], [32, 47], [-1, 15], and [-1, 55]. The tree takes $c_{in} = c_0$ and produces $c_8$, $c_{16}$, $c_{24}$, $c_{32}$, $c_{40}$, $c_{48}$, and $c_{56}$.]
Legend: [i, j] represents the pair of signals $p_{[i,j]}$ and $g_{[i,j]}$.
Each of the carries $c_{8j}$ produced by the tree network above is used to select one of the two versions of the sum in positions 8j to 8j + 7.

73 Any Two Addition Schemes Can Be Combined
[Figure: 16-bit carry-lookahead adders connected in a ripple fashion through $c_{16}$ and $c_{32}$, with $c_{48}$ as the final carry-out. Inside each, a 4-bit lookahead carry generator (with carry-out) receives $g_{[0,3]}$, $p_{[0,3]}$, $g_{[4,7]}$, $p_{[4,7]}$, $g_{[8,11]}$, $p_{[8,11]}$, $g_{[12,15]}$, $p_{[12,15]}$.]
Fig. Example 48-bit adder with hybrid ripple-carry/carry-lookahead design.
Other possibilities: hybrid carry-select/ripple-carry, hybrid ripple-carry/carry-select, ...

74 7.6 Optimizations in Fast Adders
What looks best at the block-diagram or gate level may not be best when a circuit-level design is generated (effects of wire length, signal loading, ...).
Modern practice: optimization at the transistor level
Variable-block carry-lookahead adder
Optimizations for average or peak power consumption
Timing-based optimizations (next slide)

75 Optimizations Based on Signal Timing
So far, we have assumed that all input bits are presented at the same time and all output bits are also needed simultaneously.
[Plot: latency from inputs, in XOR-gate delays, versus bit position.]
Fig. Example arrival times for operand bits in the final fast adder of a tree multiplier [Oklo96].

76 Modern Low-Power Adders Implemented in CMOS
[Figure: comparison of 64-bit adder designs: Cond'l-sum, Ling, and Three-Stage Ling.]
Zeydel, Kluter, Oklobdzija, ARITH-17, 2005

77 Taxonomy of Parallel Prefix Networks
Fanout = $2^f + 1$
Logic levels = $\log_2 k + l$
Wire tracks = $2^t$
From: Harris, David,

78 8 Multioperand Addition
Chapter Goals
Learn methods for speeding up the addition of several numbers (needed for multiplication or inner-product)
Chapter Highlights
Running total kept in redundant form
Current total + Next number → New total
Deferred carry assimilation
Wallace/Dadda trees and parallel counters

79 Multioperand Addition: Topics
Topics in This Chapter
8.1. Using Two-Operand Adders
8.2. Carry-Save Adders
8.3. Wallace and Dadda Trees
8.4. Parallel Counters
8.5. Generalized Parallel Counters
8.6. Adding Multiple Signed Numbers

80 8.1 Using Two-Operand Adders
Some applications of multioperand addition
[Figure: two dot-notation diagrams. In the first, a multiplicand a and multiplier x give the partial products $x_0 a$, $x_1 a$, $x_2 a$, $x_3 a$, which are summed to form p. In the second, the operands $p^{(0)}$ through $p^{(6)}$ are summed to form s.]
Fig. 8.1 Multioperand addition problems for multiplication or inner-product computation in dot notation.

81 Serial Implementation with One Adder
[Figure: the k-bit operand $x^{(i)}$ and the contents of the partial sum register feed an adder, whose output is written back to the register. The register is $k + \log_2 n$ bits wide and holds $\sum_{j=0}^{i-1} x^{(j)}$.]
Fig. 8.2 Serial implementation of multi-operand addition with a single 2-operand adder.
$T_{\text{serial-multi-add}} = O(n \log(k + \log n)) = O(n \log k + n \log\log n)$
Therefore, addition time grows superlinearly with n when k is fixed and logarithmically with k for a given n.
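As a small illustration of why the partial sum register needs about $k + \log_2 n$ bits, the sketch below accumulates n operands of k bits through a register of width $k + \lceil \log_2 n \rceil$. The rounding up and the function names are assumptions made for the example; the slide only states $k + \log_2 n$.

```python
def partial_sum_width(k, n):
    """Register width k + ceil(log2 n), computed with integer arithmetic (n >= 1)."""
    return k + (n - 1).bit_length()


def serial_multi_add(operands, k):
    """Accumulate k-bit operands one per step through a fixed-width register."""
    width = partial_sum_width(k, len(operands))
    register_mask = (1 << width) - 1
    total = 0
    for x in operands:
        # One pass through the single two-operand adder per input operand.
        total = (total + x) & register_mask
    return total
```

Even in the worst case of seven 8-bit operands all equal to 255, the sum 1785 fits in the 11-bit register, so masking never discards a bit.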

82 Pipelined Implementation for Higher Throughput
Problem to think about: Ignoring start-up and other overheads, this scheme achieves a speedup of 4 with 3 adders. How is this possible?
[Figure: three pipelined adders connected through delay elements. The first adder receives $x^{(i)}$ and $x^{(i-1)}$. Intermediate values shown are $x^{(i-4)} + x^{(i-5)}$, $x^{(i-6)} + x^{(i-7)}$, and $x^{(i-8)} + x^{(i-9)} + x^{(i-10)} + x^{(i-11)}$. The running sum at the output is $s^{(i-12)}$.]
Fig. 8.3 Serial multi-operand addition when each adder is a 4-stage pipeline.

83 Parallel Implementation as Tree of Adders
[Figure: a binary tree of n - 1 adders with $\lceil \log_2 n \rceil$ adder levels. The inputs are k bits wide; adder outputs are k + 1, k + 2, and k + 3 bits wide at successive levels.]
Fig. 8.4 Adding 7 numbers in a binary tree of adders.
$T_{\text{tree-fast-multi-add}} = O(\log k + \log(k + 1) + \cdots + \log(k + \lceil \log_2 n \rceil - 1)) = O(\log n \log k + \log n \log\log n)$
$T_{\text{tree-ripple-multi-add}} = O(k + \log n)$ [Justified on the next slide]

84 Elaboration on Tree of Ripple-Carry Adders
[Figure: the adder tree of Fig. 8.4, with a close-up of the full adders (FA) and half adders (HA) at levels i and i + 1. Bits arriving at times t and t + 1 at level i produce outputs at t + 1 and t + 2, which level i + 1 turns into outputs at t + 2 and t + 3, so each extra level adds only a constant delay.]
Fig. 8.5 Ripple-carry adders at levels i and i + 1 in the tree of adders used for multi-operand addition.
$T_{\text{tree-ripple-multi-add}} = O(k + \log n)$
The absolute best latency that we can hope for is $O(\log k + \log n)$.
There are kn data bits to process and, using any set of computation elements with constant fan-in, this requires $O(\log(kn))$ time.
We will see shortly that carry-save adders achieve this optimum time.

85 8.2 Carry-Save Adders
[Figure: a row of full adders (FA) connected as a ripple-carry adder with $c_{in}$ and $c_{out}$, and the same row with the carry chain cut.]
Fig. 8.6 A ripple-carry adder turns into a carry-save adder if the carries are saved (stored) rather than propagated.
[Figure: dot-notation blocks for a carry-propagate adder and for a carry-save adder (CSA), also called a (3; 2)-counter or 3-to-2 reduction circuit.]
Fig. 8.7 Carry-propagate adder (CPA) and carry-save adder (CSA) functions in dot notation.
[Figure: dot-notation symbols for a full-adder and a half-adder.]
Fig. 8.8 Specifying full- and half-adder blocks, with their inputs and outputs, in dot notation.

86 Multioperand Addition Using Carry-Save Adders
$T_{\text{carry-save-multi-add}} = O(\text{tree height} + T_{\text{CPA}}) = O(\log n + \log k)$
$C_{\text{carry-save-multi-add}} = (n - 2)\,C_{\text{CSA}} + C_{\text{CPA}}$
[Figure: five CSAs arranged in a tree, followed by a CPA.]
Fig. 8.9 Tree of carry-save adders reducing seven numbers to two.
[Figure: the input feeds a CSA together with the sum register and carry register contents; a carry-propagate adder forms the output.]
Fig. Serial carry-save addition using a single CSA.
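The sketch below models a CSA tree on nonnegative Python integers. It is an illustration under assumptions: words are grouped in threes from left to right at each level (one of several valid wirings), the final CPA is modeled by ordinary addition, and the names are illustrative.

```python
def carry_save_add(a, b, c):
    """(3; 2) reduction of three words: returns (s, cy) with a + b + c == s + cy."""
    s = a ^ b ^ c                                  # bitwise sum, no propagation
    cy = ((a & b) | (a & c) | (b & c)) << 1        # saved carries, shifted left
    return s, cy


def csa_tree_add(operands):
    """Reduce the operands to two words with CSAs, then add those two.

    Returns (total, number of CSA levels, number of CSAs used).
    """
    words = list(operands)
    levels = csa_count = 0
    while len(words) > 2:
        full = len(words) - len(words) % 3
        next_words = []
        for i in range(0, full, 3):
            next_words.extend(carry_save_add(*words[i:i + 3]))
            csa_count += 1
        next_words.extend(words[full:])            # leftovers pass through
        words = next_words
        levels += 1
    return sum(words), levels, csa_count           # sum() stands in for the CPA
```

With seven operands this uses 5 = n - 2 CSAs in 4 levels, matching the cost formula. Only the final addition propagates carries.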

87 Example Reduction by a CSA Tree
[Figure: dot-notation reduction in stages of 12 FAs, 6 FAs, 6 FAs, and 4 FAs + 1 HA, followed by a 7-bit adder.]
Total cost = 7-bit adder + 28 FAs + 1 HA
Fig. Addition of seven 6-bit numbers in dot notation.
[Table: number of dots at each bit position after each stage of full adders, ending with the carry-propagate adder.]
Fig. Representing a seven-operand addition in tabular form.
A full-adder compacts 3 dots into 2 (compression ratio of 1.5).
A half-adder rearranges 2 dots (no compression, but still useful).

88 Width of Adders in a CSA Tree
[Figure: five k-bit CSAs followed by a k-bit CPA. Each input and intermediate word is labeled with its index pair, such as [0, k - 1], [1, k], and [2, k + 1]. The result occupies positions k + 2, [2, k + 1], 1, and 0.]
Fig. Adding seven k-bit numbers and the CSA/CPA widths required.
The index pair [i, j] means that bit positions from i up to j are involved.
Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels.

89 8.3 Wallace and Dadda Trees
[Figure: an h-level CSA tree reducing n inputs to 2 outputs.]
$h(n) = 1 + h(\lceil 2n/3 \rceil)$
$n(h) = \lfloor 3n(h - 1)/2 \rfloor$
$2 \times 1.5^{h-1} < n(h) \le 2 \times 1.5^{h}$
Table 8.1 The maximum number n(h) of inputs for an h-level CSA tree
n(h): Maximum number of inputs for h levels
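The two recurrences can be evaluated directly, as in the sketch below. The base cases n(0) = 2 and h(2) = 0 are assumptions consistent with a tree that must end with two outputs; the function names are illustrative.

```python
def max_inputs(h):
    """n(h): the largest number of inputs an h-level CSA tree can reduce to 2."""
    n = 2                       # assumed base case: zero levels handle 2 inputs
    for _ in range(h):
        n = (3 * n) // 2        # n(h) = floor(3 n(h - 1) / 2)
    return n


def min_levels(n):
    """h(n): the number of CSA levels needed to reduce n inputs to 2."""
    h = 0
    while n > 2:
        n = (2 * n + 2) // 3    # ceil(2n / 3) words remain after one level
        h += 1
    return h
```

For h = 0 through 10, `max_inputs` gives 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, 94, so seven operands need `min_levels(7) == 4` levels.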

90 Example Wallace and Dadda Reduction Trees
Wallace tree: Reduce the number of operands at the earliest possible opportunity.
[Figure: stages of 12 FAs, 6 FAs, 6 FAs, and 4 FAs + 1 HA, followed by a 7-bit adder.]
Total cost = 7-bit adder + 28 FAs + 1 HA
Fig. Addition of seven 6-bit numbers in dot notation.
Dadda tree: Postpone the reduction to the extent possible without causing added delay.
[Figure: stages of 6 FAs, 11 FAs, 7 FAs, and 4 FAs + 1 HA, followed by a 7-bit adder.]
Total cost = 7-bit adder + 28 FAs + 1 HA
Fig. Adding seven 6-bit numbers using Dadda's strategy.

91 A Small Optimization in Reduction Trees
[Figure: stages of 6 FAs, 11 FAs, 7 FAs, and 4 FAs + 1 HA, followed by a 7-bit adder.]
Total cost = 7-bit adder + 28 FAs + 1 HA
Fig. Adding seven 6-bit numbers using Dadda's strategy.
[Figure: stages of 6 FAs, 11 FAs, 6 FAs + 1 HA, and 3 FAs + 2 HA, followed by a 7-bit adder.]
Total cost = 7-bit adder + 26 FAs + 3 HA
Fig. Adding seven 6-bit numbers by taking advantage of the final adder's carry-in.

92 8.4 Parallel Counters
1-bit full-adder = (3; 2)-counter
Circuit reducing 7 bits to their 3-bit sum = (7; 3)-counter
Circuit reducing n bits to their $\lceil \log_2(n + 1) \rceil$-bit sum = $(n; \lceil \log_2(n + 1) \rceil)$-counter
[Figure: layers of full adders (FA) and half adders (HA) feeding a 3-bit ripple-carry adder, producing output bits 3, 2, 1, 0.]
Fig. A 10-input parallel counter also known as a (10; 4)-counter.
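A (7; 3)-counter can be modeled with four full adders, as sketched below. This particular wiring is a common arrangement chosen for illustration and is not taken from the slide's figure, which shows a 10-input counter; the function names are illustrative.

```python
def full_adder(a, b, c):
    """(3; 2)-counter: returns (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_7_3(bits):
    """Count the 1s among seven input bits; returns (out2, out1, out0)."""
    b0, b1, b2, b3, b4, b5, b6 = bits
    s1, c1 = full_adder(b0, b1, b2)        # first group of three weight-1 bits
    s2, c2 = full_adder(b3, b4, b5)        # second group of three weight-1 bits
    out0, c3 = full_adder(s1, s2, b6)      # remaining weight-1 bits
    out1, out2 = full_adder(c1, c2, c3)    # the three weight-2 carries
    return out2, out1, out0
```

The output value is 4·out2 + 2·out1 + out0, which equals the number of input bits that are 1.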

93 8.5 Generalized Parallel Counters
Multicolumn reduction
[Figure: dot notation for a (5, 5; 4)-counter.]
Fig. Dot notation for a (5, 5; 4)-counter and the use of such counters for reducing five numbers to two numbers.
Unequal columns: (2, 3; 3)-counter
Gen. parallel counter = Parallel compressor

94 A General Strategy for Column Compression
(n; 2)-counters
[Figure: one circuit slice for column i. It takes n inputs plus $\psi_1$ transfers from slice i - 1, $\psi_2$ from slice i - 2, and $\psi_3$ from slice i - 3, and it sends $\psi_1$, $\psi_2$, $\psi_3$ transfers to slices i + 1, i + 2, i + 3.]
$n + \psi_1 + \psi_2 + \psi_3 + \cdots \le 3 + 2\psi_1 + 4\psi_2 + 8\psi_3 + \cdots$
$n - 3 \le \psi_1 + 3\psi_2 + 7\psi_3 + \cdots$
Example: Design a bit-slice of an (11; 2)-counter.
Solution: Let's limit transfers to two stages. Then, $8 \le \psi_1 + 3\psi_2$.
Possible choices include $\psi_1 = 5$, $\psi_2 = 1$ or $\psi_1 = \psi_2 = 2$.
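The design constraint $n - 3 \le \psi_1 + 3\psi_2 + 7\psi_3 + \cdots$ can be explored by brute force. The sketch below lists the choices in which no single transfer count can be lowered without violating the constraint. This "minimal" filter is an assumption made to keep the list short, and the function names are illustrative.

```python
from itertools import product


def satisfies(n, psi):
    """Check n - 3 <= sum of (2**j - 1) * psi_j over stages j = 1, 2, ..."""
    return n - 3 <= sum(((1 << j) - 1) * p for j, p in enumerate(psi, start=1))


def minimal_transfers(n, stages):
    """All (psi_1, ..., psi_stages) that satisfy the bound with nothing to spare."""
    need = max(n - 3, 0)
    ranges = [range(need // ((1 << j) - 1) + 2) for j in range(1, stages + 1)]
    found = []
    for psi in product(*ranges):
        if not satisfies(n, psi):
            continue
        # Reject a choice if lowering any one psi_j by 1 still meets the bound.
        reducible = any(
            p > 0 and satisfies(n, psi[:j] + (p - 1,) + psi[j + 1:])
            for j, p in enumerate(psi)
        )
        if not reducible:
            found.append(psi)
    return found
```

For the (11; 2)-counter with two transfer stages, `minimal_transfers(11, 2)` includes both choices named on the slide, (5, 1) and (2, 2).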

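95 8.6 Adding Multiple Signed Numbers
(a) Using sign extension
Extended positions | Sign | Magnitude positions
$x_{k-1}\ x_{k-1}\ x_{k-1}\ x_{k-1}\ x_{k-1}$ | $x_{k-1}$ | $x_{k-2}\ x_{k-3}\ x_{k-4} \ldots$
$y_{k-1}\ y_{k-1}\ y_{k-1}\ y_{k-1}\ y_{k-1}$ | $y_{k-1}$ | $y_{k-2}\ y_{k-3}\ y_{k-4} \ldots$
$z_{k-1}\ z_{k-1}\ z_{k-1}\ z_{k-1}\ z_{k-1}$ | $z_{k-1}$ | $z_{k-2}\ z_{k-3}\ z_{k-4} \ldots$
(b) Using negatively weighted bits
Sign | Magnitude positions
$x'_{k-1}$ | $x_{k-2}\ x_{k-3}\ x_{k-4} \ldots$
$y'_{k-1}$ | $y_{k-2}\ y_{k-3}\ y_{k-4} \ldots$
$z'_{k-1}$ | $z_{k-2}\ z_{k-3}\ z_{k-4} \ldots$
Here $b' = 1 - b$ is the complement of bit b. Because $-b = (1 - b) - 1$, each negatively weighted sign bit can be replaced by its complement, with the leftover constants collected in the extended positions.
Fig. Adding three 2's-complement numbers.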
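Both methods can be checked against each other with a short model. This is a sketch under assumptions: operands are k-bit 2's-complement words stored as nonnegative integers, the result width is chosen by the caller to be large enough, and the constant for method (b) is written as $-n \cdot 2^{k-1}$ for n operands, which follows from the identity $-b = (1 - b) - 1$ applied to each sign bit. The function names are illustrative.

```python
def to_signed(word, width):
    """Interpret a width-bit word as a 2's-complement value."""
    return word - (1 << width) if word & (1 << (width - 1)) else word


def add_sign_extended(words, k, width):
    """Method (a): replicate each sign bit into the extended positions."""
    mask = (1 << width) - 1
    extension = mask & ~((1 << k) - 1)       # ones in positions k .. width - 1
    total = 0
    for w in words:
        extended = w | extension if (w >> (k - 1)) & 1 else w
        total = (total + extended) & mask
    return to_signed(total, width)


def add_negatively_weighted(words, k, width):
    """Method (b): complement each sign bit and add one constant correction."""
    mask = (1 << width) - 1
    sign = 1 << (k - 1)
    total = sum(w ^ sign for w in words)      # sign bits now weigh +2**(k-1)
    correction = (-len(words) * sign) & mask  # constant -n * 2**(k-1), mod 2**width
    return to_signed((total + correction) & mask, width)
```

For three 4-bit operands, a 6-bit result is wide enough, and the two functions return the same signed sum for every input combination.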
