VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California

VLSI Arithmetic Lecture 9: Carry-Save and Multi-Operand Addition Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Carry-Save Addition* *from Parhami 2 June 18, 2003

Carry-Save Addition* *from Parhami 3 June 18, 2003

Carry-Save Addition* *from Parhami 4 June 18, 2003

Carry-Save Addition* 5 *from Parhami June 18, 2003

Carry-Save Addition * *from Parhami 6 June 18, 2003

Carry-Save Addition * *from Parhami 7 June 18, 2003

Carry-Save Addition * *from Parhami 8 June 18, 2003

Carry-Save Addition * *from Parhami 9 June 18, 2003

Carry-Save Addition * *from Parhami 10 June 18, 2003

Parallel Counters * *from Parhami 11 June 18, 2003

Parallel Counters and Compressors * *from Parhami 12 June 18, 2003

Adding Multiple Signed Numbers * *from Parhami 13 June 18, 2003

Multi-Operand Addition from Digital Arithmetic, Morgan-Kauffman Publishers, 2004 Miloš Ercegovac and Tomàs Lang

MULTI-OPERAND ADDITION 1 Bit-arrays for unsigned and signed operands - simplification of sign extension Reduction by rows and by columns - [p:2] modules and [p:2] adders for reduction by rows - (p:q] counters and multicolumn counters for reduction by columns Sequential implementation Combinational implementation - Reduction by rows: arrays of adders (linear arrays, adder trees) - Reduction by columns: (p:q] counters - systematic design method for reduction by columns with (3:2] and (2:2] counters Pipelined adder arrays Partially combinational implementation

BIT ARRAYS FOR UNSIGNED AND SIGNED OPERANDS 2 a 0 a 0 a 0 a 0 b 0 b 0 b 0 b 0 c 0 c 0 c 0 c 0 d 0 d 0 d 0 e 0 e 0 e 0 e 0... d0.. a 1 a 2 b 1 b 2 c 1 c 2 d 1 d 2 e 1 e 2............... a n b n c n d n e n sign extension Figure 3.1: SIGN-EXTENDED ARRAY FOR m = 5.

Reduced to s y y y y... y (a) s -1 s -1 s -1 s -1. x x x x... x. x x x x... x. x x x x... x. x x x x... x s. x x x x... x s. x x x x... x s. x x x x... x. x x x x... x a 0. a 1 a 2... a n -1 b 0. b 1 b 2... b n -1 c 0. c 1 c 2... c n -1 d 0. d 1 d 2... d n -1 e 0. e 1 e 2... e n -1 Reduced to a 0. a 1 a 2... a n b 0. b 1 b 2... b n c 0. c 1 c 2... c n d 0. d 1 d 2... d n e 0. e 1 e 2... e n 1 0 1 1 Transformed to a 0. a 1 a 2... a n 3 b 0. b 1 b 2... b n (b) c 0. c 1 c 2... c n d 0. d 1 d 2... d n 1e 0 e 0 e 0. e 1 e 2... e n Figure 3.2: SIMPLIFYING SIGN-EXTENSION: (a) GENERAL CASE. (b) EXAMPLE OF SIMPLIFYING ARRAY WITH m = 5.

REDUCTION 4 By rows By columns

[p:2] ADDERS FOR REDUCTION BY ROWS 5 [p:2] adder k [p:2] module p................................................... 2...... (a)......... input carries output carries [p:2] module p p k p p k p p k h out H Z h in 2 2 k (b) z Figure 3.3: A [p:2] adder: (a) Input-output bit-matrix. (b) k-column [p:2] module decomposition.

MODEL OF [p:2] MODULE 6 k p p max value p(2 k - 1) HW w max value W h out max value 2 k H log 2 W h in max value H Z 2 2 k z max value 2(2 k - 1) Figure 3.4: A model of a [p:2] module.

7 inputs of weight 2 i inputs of weight 2 i-1 x i,3 x i,2 x i,1 x i,0 x i-1,3 x i-1,2 x i-1,1 x i-1,0 x i,3 x i,1 x i-1,3 x i-1,1 h i+1, 1 0 1 MUX x i,0 0 1 MUX x i-1,0 a h i-1, 1 carries to next column h i+1, 2 0 1 MUX c 0 1 MUX Z carries from previous column h i-1, 2 denotes the bits representing w such that w = 2c + 2b + a Note that w < 4 since ab = 0 x i-1,0 h i-1, 1 a b MUX Figure 3.5: Gate network implementation of [4:2] module.

8 inputs of weight 2 i inputs of weight 2 i-1 FA c s FA c s FA c s FA c s carries to next column FA c s FA c s carries from previous column denotes the bits representing w (a) inputs of weight 2 i inputs of weight 2 i-1 inputs of weight 2 i-2 FA c s FA c s FA c s FA c s FA c s FA c s FA c s FA c s FA c s carries to next column FA c s FA c s FA c s carries from previous column FA c s FA c s FA c s denotes the bits representing w (b) Figure 3.6: (a) [5:2] module. (b) [7:2] module.

(p:q] COUNTERS FOR REDUCTION BY COLUMNS 9 p 1 x i = q 1 y j2 j i=0 j=0 2 q 1 p, i.e., q = log 2 (p + 1) x 0 + x 1... x p-1 y q-1... y 0 (a)...... q outputs (b) p inputs (same weight) Figure 3.7: (a) (p:q] reduction. (b) Counter representation.

IMPLEMENTATION OF (p:q] COUNTERS 10 all inputs of weight (1) FA FA c s c s (1) (1) (1) FA c s (2) (2) (2) FA (4) (2) (1) Figure 3.8: Implementation of (7:3] counter by an array of full adders.

11 x x x 5 x 3 6 4 x 5 x x x2 0 x x x1 0 6 x 3 x 5 x 4 x 6 x 3 x 4 x 1 x 5 x 2 x 6 x 3 4 x 2 x 1 x 0 (q B2 ) a q A1 q B0 q A0 (q B1 ) q 2 q 1 q 0 Figure 3.9: Gate network of a (7:3] counter.

MULTICOLUMN COUNTER 12 (p k 1, p k 2,..., p 0 : q] v = k 1 i=0 p i j=1 a ij2 i 2 q 1 (a) (b) Figure 3.10: (a) (5,5:4] counter. (b) (1,2,3:4] counter.

13 cycle time dependent on precision X[i] Carry Propagate Adder clk S[i] Register S S[i-1] (a) cycle time not dependent on precision X[(i-1)(p-2)+1] X[i(p-2)] X[i] [p:2] Adder [3:2] Adder clk C[i] Reg. C PS[i] Reg. PS clk C[i] Reg. C PS[i] Reg. PS C[i-1] PS[i-1] C[i-1] PS[i-1] To CPA to get S (b) To CPA to get S (c) Figure 3.11: SEQUENTIAL MULTIOPERAND ADDITION: a) WITH CONVENTIONAL ADDER. b) WITH [p:2] ADDER. c) WITH [3:2] ADDER.

COMBINATIONAL IMPLEMENTATION 14 Reduction by rows: array of adders Linear array Adder tree Reduction by columns with (p:q] counters

15 p operands [p:2] ADDER p-2 operands [p:2] ADDER p-2 operands [p:2] ADDER p-2 operands [p:2] ADDER to CPA Figure 3.12: LINEAR ARRAY OF [p:2] ADDERS FOR MULTIOPERAND ADDITION.

ADDER TREE 16 k - the number of [p:2] CS adders for m operands: k = The number of adder levels m 2 p 2 pk = m + 2(k 1) [p:2] carry-save adders [p:2] tree of l levels m l [p:2] [p:2] m l-1 [p:2] tree of l-1 levels 2 Figure 3.13: Construction of a [p:2] carry-save adder tree. m l = p m l 1 2 + ml 1 mod 2

NUMBER OF LEVELS (cont.) 17 Table 3.1: [3:2] Reduction sequence. l 1 2 3 4 5 6 7 8 9 m l 3 4 6 9 13 19 28 42 63 m l pl 2 l 1 l log p/2 (m l /2)

18 m = 9 a a a a a a a a a b CSA CSA CSA a b a b a a a a a b x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x Level 4 CSAs L= 4 CSA CSA CSA f e CSA d c b c d e Bit-vector types a: (n-1,..., 0) b: (n,..., 1) c: (n,..., 0) d: (n+1,...,2) e: (n+1,..., 0) f: (n+2,..., 1) b a b c d b c d e d x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x a b a c b c e d e f x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x Level 3CSAs Level 2 CSA Level 1 CSA Figure 3.14: [3:2] adder tree for 9 operands (magnitudes with n = 3).

19 [4:2] ADDER [4:2] ADDER [4:2] ADDER [4:2] ADDER [4:2] ADDER [4:2] ADDER [4:2] ADDER Figure 3.15: Tree of [4:2] adders for m = 16.

REDUCTION BY COLUMNS WITH (p:q] COUNTERS 20 1 0 1 1 0 0 1 0 1 0 0 1 0 1 1 0 1 0 1 0 1 1 1 1 0 1 1 0 0 1 0 1 0 1 1 1 1 0 1 0 Figure 3.16: Example of reduction using (7:3] counters.

NUMBER OF COUNTER LEVELS 21 m 1 = p m l = p m l 1 q + ml 1 mod q l log p/q (m l /q) (p:q] tree of l levels p p m l (p:q] (p:q] q q m l-1 mod q m l-1 (p:q] tree of l-1 levels q Figure 3.17: Construction of (p:q] reduction tree.

22 Table 3.2: Sequence for (7:3] counters Number of levels 1 2 3 4... Max. number of rows 7 15 35 79...

23 15 7 3 Figure 3.18: Multilevel reduction with (7:3] counters

SYSTEMATIC DESIGN METHOD 24 Full adder (3-2) 2 i+1 2 i Half adder (2-2) 2 i+1 2 i denotes 0 or 1 or or diagonal outputs when representing separately sum and carry bit-vectors is preferrable horizontal outputs when interleaving sum and carry bits is acceptable Figure 3.19: Full adder and half adder as (3:2] and (2:2] counters.

25 column: 1 0 # of rows reduce 6 transfer 4 Figure 3.20: Reduction process.

RELATION AT LEVEL l 26 e i number of bits in column i f i number of full adders in column i h i number of half adders in column i e i 2f i h i + f i 1 + h i 1 = m l 1 resulting in 2f i + h i = e i m l 1 + f i 1 + h i 1 = p i Solution producing min number of carries: f i = p i /2 h i = p i mod 2

i 6 5 4 3 2 1 0 l = 4 e i 8 8 8 8 8 m 3 6 6 6 6 6 h i 0 0 0 1 0 f i 2 2 2 1 1 l = 3 e i 2 6 6 6 6 6 m 2 4 4 4 4 4 4 h i 0 0 0 0 1 0 f i 0 2 2 2 1 1 l = 2 e i 4 4 4 4 4 4 m 1 3 3 3 3 3 3 h i 0 0 0 0 0 1 f i 1 1 1 1 1 0 l = 1 e i 1 3 3 3 3 3 3 m 0 2 2 2 2 2 2 2 h i 0 0 0 0 0 0 1 f i 0 1 1 1 1 1 0 27

28 Level 4 m 3 =6 (8FAs, 1HA) Level 3 m 2 =4 (8FAs, 1HA) Level 2 Level 1 m 1 =3 (5FAs, 1HA) m 0 =2 (5FAs, 1HA) CPA Figure 3.21: Reduction by columns of 8 5-bit magnitudes. Cost of reduction: 26 FAs and 4 HAs.

EXAMPLE: ARRAY FOR f = a + 3b + 3c + d 29 Operands in [-4,3). Result range: 4 + ( 12) + ( 12) 4 = 32 f 3 + 9 + 9 + 3 = 24

transformed into a a 2 a 2 a 2 a 2 a 1 a 0 b b 2 b 2 b 2 b 2 b 1 b 0 2b b 2 b 2 b 2 b 1 b 0 0 c c 2 c 2 c 2 c 2 c 1 c 0 2c c 2 c 2 c 2 c 1 c 0 0 d d 2 d 2 d 2 d 2 d 1 d 0 a a 2-1 a 1 a 0 b b 2-1 b 1 b 0 2b b 2-1 b 1 b 0 c c 2-1 c 1 c 0 2c c 2-1 c 1 c 0 d d 2-1 d 1 d 0 30

FINAL BIT MATRIX 31 1 0 b 2 a 2 a 1 a 0 c 2 b 2 b 1 b 0 b 1 b 0 c 2 c 1 c 0 c 1 c 0 d 2 d 1 d 0

i 5 4 3 2 1 0 l = 3 e i 1 0 2 6 6 4 m 2 4 4 4 4 4 4 h i 0 0 0 1 0 0 f i 0 0 0 1 1 0 l = 2 e i 1 0 4 4 4 4 m 1 3 3 3 3 3 3 h i 0 0 0 0 0 1 f i 0 0 1 1 1 0 l = 1 e i 1 1 3 3 3 3 m 0 2 2 2 2 2 1* h i 0 0 0 0 0 0 f i 0 0 1 1 1 1 32

2 5 2 4 2 3 2 2 2 1 2 0 33 1 0 b 2 c 2 F2 a 2 b 2 b 1 F1 a 1 b 1 b 0 a 0 b 0 c 0 cf2 H1 c 2 c 1 sf2 cf1 c 1 c 0 sf1 d 0 Level 3 (m 2 =4) d 2 d 1 ch1 sh1 1 0 F5 cf5 b 2 c 2 cf2 sf5 cf4 F4 sh1 c 2 cf1 sf4 cf3 F3 c 1 H2 a 0 c 0 d 1 ch2 sf3 b 0 c 0 sh2 Level 2 (m 1 =3) ch1 sf2 sf1 d 0 1 cf5 F9 sf5 cf4 ch1 F8 sf2 cf3 sf4 F7 sf1 sf3 ch2 F6 sh2 c 0 d 0 Level 1 (m 0 =2) cf9 sf9 cf8 sf8 cf7 sf7 cf6 sf6 F12 cf12 cf5 cf9 cf11 F11 sf9 cf8 F10 cf10 sf8 cf7 ch3 H3 sf7 cf6 Carry-ripple adder f 5 f 4 f 3 f 2 f 1 f 0 Figure 3.22: Reduction array f = a + 3b + 3c + d.

PIPELINED LINEAR ARRAY 34 X[8,j] X[1,j] Stage 1 [4:2] ADDER X[8,j] X[1,j] [4:2] ADDER [4:2] ADDER Stage 2 [4:2] ADDER X[1,j-1] Stage 1 X[1,j-2] Stage 2 [4:2] ADDER Stage 3 [4:2] ADDER Stage 4 Pipelined CPA - latches Stage 3 Pipelined CPA S[j-2] S[j-3] S[j-3] S[j-4] (a) (b) Figure 3.23: Pipelined arrays with [4:2] adders for computing S[j] = 8 i=1 X[i, j], j = 1,..., N: (a) Linear array. (b) Tree array.

PARTIALLY COMBINATIONAL IMPLEMENTATION 35 a b c d a b c d [3:2] [3:2] [4:2] Stage 1 [4:2] [4:2] Stage 2 CPA - latches CPA s (a) s (b) Figure 3.24: Partially combinational scheme for summation of 4 operands per iteration: (a) Nonpipelined. (b) Pipelined.

36 q operands CSA CSA CSA CSA Reduction q-to-2 CSA CSA - latches CSA CSA Accumulation CPA s Figure 3.25: Scheme for summation of q operands per iteration.

VLSI Arithmetic Lecture 9b: Sign-Digit Arithmetic Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Sign-Digit Addition from Digital Arithmetic, Morgan-Kauffman Publishers, 2004 Miloš Ercegovac and Tomàs Lang

SIGNED-DIGIT ADDITION 45 Uses signed-digit representation (redundant) with digit set x = n 1 x ir i 0 D = { a,..., 1, 0, 1,..., a} Limits carry propagation to next position Addition algorithm: No carry produced in Step 2 Step 1: x + y = w + t x i + y i = w i + rt i+1 Step 2: s = w + t s i = w i + t i 2 Fast Two-Operand Adders

SD ADDER 46 X Y n n (a) t n SDA t 0 n S x n-1 y n-1 x i y i x i-1 y i-1 x 0 y 0 TW TW TW TW Step 1 t n w n-1 t n-1 t i+1 w i t i w i-1 t i-1 t 1 w 0 t 0 ADD ADD ADD ADD Step 2 s n s n-1 s i s i-1 s 0 (b) Figure 2.34: Signed-digit addition. 2 Fast Two-Operand Adders

CASES CONSIDERED 47 Case A : two SD operands; result SD Step 1: (t i+1, w i ) = (0, x i + y i ) if a + 1 x i + y i a 1 (1, x i + y i r) if x i + y i a ( 1, x i + y i + r) if x i + y i a - algorithm modified for r = 2 Case B : two conventional operands; result SD Case C : one conventional, one SD; result SD 2 Fast Two-Operand Adders

SIGNED-BINARY ADDITION: METHOD 1 (DOUBLE RECODING) 48 RECODING 1: x i + y i = 2h i+1 + z i { 2, 1, 0, 1, 2} h i {0, 1}, z i { 2, 1, 0} q i = z i + h i { 2, 1, 0, 1} RECODING 2: q i = z i + h i = 2t i+1 + w i { 2, 1, 0, 1} t i { 1, 0}, w i {0, 1} THE RESULT: s i = w i + t i { 1, 0, 1} 2 Fast Two-Operand Adders

METHOD 1 SD ADDER 49 x i y i x i-1 y i-1 x i-2 y i-2 Recoding 1 HZ HZ HZ h z i+1 i h i z i-1 h i-1 ADD ADD q i q i-1 Recoding 2 TW TW t w t i+1 i w t i i-1 i-1 ADD ADD s i s i-1 Figure 2.35: Double recoding method for signed-bit addition 2 Fast Two-Operand Adders

SIGNED BINARY ADDITION: METHOD 2 (Using Previous Digit) 50 P i = 0 if (x i, y i ) both nonnegative (which implies t i+1 0) 1 otherwise (t i+1 0) x i + y i P i 1 t i+1 w i 2-1 0 1 0(t i 0) 1-1 1 1(t i 0) 0 1 0-0 0-1 0(t i 0) 0-1 -1 1(t i 0) -1 1-2 - -1 0 2 Fast Two-Operand Adders

METHOD 2 SD ADDER 51 x i-2 x i y i x i-1 y i-1 Pi-2 y i-2 P i-1 P P TW TW TW t w i+1 i t i w i-1 t i-1 ADD ADD s i s i-1 Figure 2.36: Signed-bit addition using the information from previous digit 2 Fast Two-Operand Adders

Example 52 X 0 1 1 1 1 1 0 1 1 Y 0 1 1 0 1 0 1 0 1 P 0 0 0 0 1 0 0 1 0 W 0 0 0 1 0 1 1 1 0 T 0 1 1 0 1 1 0 0 1 0 S 1 1 0 0 1 1 1 0 0 2 Fast Two-Operand Adders

BIT-LEVEL IMPLEMENTATION OF RADIX-2 ALGORITHMS 53 Case C: x i {0, 1}, y i, s i { 1, 0, 1} Code: borrow-save y i = y + i y i, y + i, y i {0, 1}, sim. for s i x i + y i { 1, 0, 1, 2}: recode to (t i+1, w i ), t i+1 {0, 1}, w i { 1, 0} x i + y + i y i = 2t i+1 + w i x i + y i -1 0 1 2 w i -1 0-1 0 t i+1 0 0 1 1 2 Fast Two-Operand Adders

SWITCHING FUNCTIONS FOR t i+1 AND w i 54 x i y i + yi x i + y i t i+1 w i 0 0 0 0 0 0 0 0 1-1 0 1 0 1 0 1 1 1 0 1 1 0 0 0 1 0 0 1 1 1 1 0 1 0 0 0 1 1 0 2 1 0 1 1 1 1 1 1 w i = (x i y + i (y i ) ) t i+1 = x i y + i x i (y i ) y + i (y i ) = implemented using a full-adder and inverters (for variables subtracted) 2 Fast Two-Operand Adders

55 x i y+ i y ī x i-1 y+ i-1 y ī-1 FA FA t i+1 w i t i w i-1 t i-1 s ī s+ i s+ i-1 s ī-1 Figure 2.37: Redundant adder: one operand conventional, one operand redundant, result redundant. 2 Fast Two-Operand Adders

BOTH OPERANDS REDUNDANT 56 Apply double recoding x+ i x ī y+ i y ī x+ i-1 x ī-1 y+ i-1 y ī-1 h i+1 FA z i FA h i h i-1 {0,1} v i v i-1 {-2,-1,0} z {0,1} i-1 FA FA t i+1 w i t i w i-1 t i-1 {0,1} {-1,0} s+ i s ī s+ i-1 s ī-1 Figure 2.38: Redundant adder: operands and result redundant 2 Fast Two-Operand Adders

VLSI Arithmetic Lecture 9c: Sign-Digit Arithmetic Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Example of Sign-Digit Arithmetic: Viterbi Detector from A. K. Yeung, J. Rabaey, A 210Mb/s Radix-4 Bit-Level Pipelined Viterbi Decoder, Proceedings of the International Solid-State Circuits Conference, San Francisco, California, 1995.

4-bit Add-Compare-Subtract Unit 3 June 18, 2003

Radix-4 ACS (one bit) 4 June 18, 2003