Computer Arithmetic Design


1 Computer Arithmetic Design Instructor: Kuan Jen Lin Web: Dept. of EE, FJU, Taiwan Room: SF 727B Computer Arithmetic 1, Dept. of EE, Fu Jen Catholic University, Taiwan 1

2 SW & HW
SW = Algorithm + Data Structure + Programming techniques
HW = Algorithm + Architecture + Design Method
Keywords: computing, communication, pipeline, systolic array, low power, interface, full custom, cell based, FPGA, system level

3 Course Objectives
Learn computer algorithms for arithmetic operations.
Learn hardware designs for computer arithmetic.
After completing the course, students are able to implement computer arithmetic hardware designs using an HDL and to read research papers about computer arithmetic.

4 Textbook
Textbook: Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press.
Reference books: Ercegovac and Lang, Digital Arithmetic, MKP. Stine, Digital Computer Arithmetic Datapath Design Using Verilog HDL, CAP.

5 Syllabus Number representation Two-operand Addition Multi-operand Addition Multiplication Division Square Root Papers reading and presentation

6 Grading
Mid Exam (30%)
Paper reading and presentation (30%)
Homework (some problems need HDL programming) (30%)
Attendance and others (10%)

7 Number Representation
Instructor: Kuan Jen Lin, Dept. of EE, FJU, Taiwan. Room: SF 727B.
Most slides are revisions of PowerPoint files obtained from the textbook website.

8 Numbers and Arithmetic
Chapter Goals: Define scope and provide motivation. Set the framework for the rest of the book. Review positional fixed-point numbers.
Chapter Highlights: What goes on inside your calculator? Ways of encoding numbers in k bits. Radices and digit sets: conventional, exotic. Conversion from one system to another.

9 What is Computer Arithmetic?
Pentium Division Bug: the Pentium's radix-4 SRT algorithm occasionally gave an incorrect quotient.
First noted in 1994 by T. Nicely, who computed sums of reciprocals of twin primes: 1/5 + 1/7 + 1/11 + 1/13 + ... + 1/p + 1/(p + 2) + ...
Worst-case example of division error in Pentium: c = ... Compared with the correct quotient, the circa-1994 Pentium double FLP value was accurate to only 14 bits (worse than single!).

10 The Scope of Computer Arithmetic
Hardware (our focus in this book): Design of efficient digital circuits for primitive and other arithmetic operations such as +, −, ×, ÷, √, log, sin, cos. Issues: algorithms, error analysis, speed/cost trade-offs, hardware implementation, testing and verification.
General-purpose hardware: flexible data paths; fast primitive operations like +, −, ×, ÷, √; benchmarking.
Special-purpose hardware: tailored to applications like digital filtering, image processing, radar tracking.
Software: Numerical methods for solving systems of linear equations, partial differential equations, etc. Issues: algorithms, error analysis, computational complexity, programming, testing and verification.

11 A Motivating Example
Using a calculator with √, x², and x^y functions, compute:
u = √√...√2 (the 1024th root of 2, by ten successive square roots) and v = 2^(1/1024).
Save u and v; if you can't save, recompute the values when needed.
x = (((u²)²)...)² and x' = u^1024; y = (((v²)²)...)² and y' = v^1024. The four displayed results differ slightly from one another and from 2.
Perhaps v and u are not really the same value: w = v − u is nonzero due to hidden digits.
(u − 1) × 1000 = ... [Hidden ... (0) 68]
(v − 1) × 1000 = ... [Hidden ... (0) 69]

12 Finite Precision Can Lead to Disaster
Example: Failure of Patriot Missile (1991 Feb. 25)
An American Patriot Missile battery in Dharan, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile. The Scud struck an American Army barracks, killing 28.
Cause, per GAO/IMTEC report: software problem (inaccurate calculation of the time since boot).
Problem specifics: Time in tenths of a second, as measured by the system's internal clock, was multiplied by 1/10 to get the time in seconds. Internal registers were 24 bits wide, so 1/10 was chopped to 24 bits, leaving a small error in every tick.
Error in 100-hr operation period ≈ 0.34 s
Distance traveled by Scud = (0.34 s) × (1676 m/s) ≈ 570 m
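The accumulated timing error can be checked with exact rational arithmetic. The sketch below is only an illustration: it assumes that the 24-bit register keeps 23 bits to the right of the radix point (`FRACTION_BITS` is an assumed parameter, not something the slide states), which reproduces the 0.34 s figure.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: fractional bits kept when 1/10 is chopped


def chop(value: Fraction, bits: int) -> Fraction:
    """Truncate a nonnegative value to the given number of fractional bits."""
    scale = 1 << bits
    return Fraction(int(value * scale), scale)


tenth = Fraction(1, 10)
error_per_tick = tenth - chop(tenth, FRACTION_BITS)  # error added per 0.1 s tick
ticks = 100 * 3600 * 10                              # tenths of a second in 100 hours
drift_seconds = float(error_per_tick * ticks)        # accumulated clock error
miss_distance_m = drift_seconds * 1676               # Scud speed from the slide

print(f"drift = {drift_seconds:.4f} s, distance = {miss_distance_m:.0f} m")
```

The slide's 570 m comes from rounding the drift to 0.34 s before multiplying; the unrounded product is slightly larger.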

13 Numbers and Their Encodings
Some 4-bit number representation formats: unsigned integer; signed integer (± sign); fixed point, 3+1 (with radix point); signed fraction; 2's-complement fraction; floating point (exponent e in {−2, −1, 0, 1}, significand s in {0, 1, 2, 3}); logarithmic (base-2 logarithm).

14 Encoding Numbers in 4 Bits
[Figure: number lines showing the values representable by each 4-bit number format: unsigned integers; signed-magnitude; fixed-point; signed fraction; 2's-complement fraction; floating-point, s × 2^e with e in [−2, 1] and s in [0, 3]; logarithmic.]

15 Fixed-Radix Positional Number Systems
$(x_{k-1} x_{k-2} \ldots x_1 x_0 \,.\, x_{-1} x_{-2} \ldots x_{-l})_r = \sum_{i=-l}^{k-1} x_i r^i$
One can generalize to:
Arbitrary radix (not necessarily integer, positive, constant)
Arbitrary digit set, usually {−α, −α+1, ..., β−1, β} = [−α, β]
Example 1.1. Balanced ternary number system: radix r = 3, digit set = [−1, 1]
Example 1.2. Negative-radix number systems: radix −r, r ≥ 2, digit set = [0, r − 1]. The special case with radix −2 and digit set [0, 1] is known as the negabinary number system. Can it represent all integers?
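The question about negabinary can be explored with a small sketch. The function names `to_negabinary` and `from_digits` are illustrative only; the conversion is the usual repeated division by the radix, with the remainder forced into the digit set [0, 1].

```python
def to_negabinary(n: int) -> list[int]:
    """Digits (most significant first) of n in radix -2 with digit set [0, 1]."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, rem = divmod(n, -2)
        if rem < 0:            # Python gives rem in {0, -1}; move it into [0, 1]
            n, rem = n + 1, rem + 2
        digits.append(rem)
    return digits[::-1]


def from_digits(digits: list[int], radix: int) -> int:
    """Evaluate a digit vector (most significant first) in any integer radix."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value


print(to_negabinary(-3))             # digits of -3 in radix -2
print(from_digits([1, -1, 0], 3))    # a balanced-ternary digit vector
```

Every integer, positive or negative, gets a digit string without a separate sign, which the round-trip through `from_digits` confirms for any range one cares to try.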

16 More Examples of Number Systems
Example 1.3. Digit set [−4, 5] for r = 10: (3 −1 5)_ten represents 295 = 300 − 10 + 5
Example 1.4. Digit set [−7, 7] for r = 10: (3 −1 5)_ten = (3 0 −5)_ten = (...)_ten
Example 1.7. Quater-imaginary number system: radix r = 2j, digit set [0, 3]

17 Number Radix Conversion
u = w.v (whole part w, fractional part v)
$u = (x_{k-1} x_{k-2} \ldots x_1 x_0 \,.\, x_{-1} x_{-2} \ldots x_{-l})_r$ (old radix) $= (X_{K-1} X_{K-2} \ldots X_1 X_0 \,.\, X_{-1} X_{-2} \ldots X_{-L})_R$ (new radix)
Example: (31)_eight = (25)_ten, i.e., 31 Oct. = 25 Dec., Halloween = Xmas
Radix conversion using arithmetic in the old radix r: convenient when converting from r = 10
Radix conversion using arithmetic in the new radix R: convenient when converting to R = 10

18 Radix Conversion: Old-Radix Arithmetic
Converting whole part w: (105)_ten = (?)_five
Repeatedly divide by five:
Quotient  Remainder
105
21        0
4         1
0         4
Reading the remainders from last to first gives (105)_ten = (410)_five.
Converting fractional part v: (105. ...)_ten = (410.?)_five
Repeatedly multiply by five; the whole part of each product is the next digit, and the remaining fraction is carried into the next step. The process may not terminate, so the result is in general an approximation.
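Both procedures are short loops. The sketch below uses illustrative function names and exact fractions so that the repeated multiplication is not disturbed by floating-point error; the fraction 1518/3125 is just a sample input chosen because it terminates in radix 5.

```python
from fractions import Fraction


def whole_to_radix(w: int, radix: int) -> list[int]:
    """Repeated division: remainders are the digits, least significant first."""
    digits = []
    while w > 0:
        w, rem = divmod(w, radix)
        digits.append(rem)
    return digits[::-1] or [0]          # most significant digit first


def fraction_to_radix(v: Fraction, radix: int, num_digits: int) -> list[int]:
    """Repeated multiplication: whole parts are the digits after the radix point."""
    digits = []
    for _ in range(num_digits):
        v *= radix
        whole = int(v)                  # whole part becomes the next digit
        digits.append(whole)
        v -= whole                      # keep only the fraction for the next step
    return digits


print(whole_to_radix(105, 5))
print(fraction_to_radix(Fraction(1518, 3125), 5, 5))
```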

19 Radix Conversion: New-Radix Arithmetic
Converting whole part w: (22033)_five = (?)_ten
Horner's rule or formula: ((((2 × 5) + 2) × 5 + 0) × 5 + 3) × 5 + 3
Intermediate values: 10, 12, 60, 60, 300, 303, 1515, 1518
Converting fractional part v: (410.22033)_five = (105.?)_ten
(0.22033)_five × 5^5 = (22033)_five = (1518)_ten
1518 / 5^5 = 1518 / 3125 = 0.48576
Therefore, (410.22033)_five = (105.48576)_ten
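Horner's rule for the whole part is a one-line loop. This is a minimal sketch with an illustrative function name; digits are given most significant first.

```python
def horner_whole(digits: list[int], radix: int) -> int:
    """Evaluate a whole-number digit string using arithmetic in the new radix."""
    value = 0
    for d in digits:
        value = value * radix + d       # multiply by the old radix, add next digit
    return value


print(horner_whole([2, 2, 0, 3, 3], 5))
print(horner_whole([3, 1], 8))
```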

20 Horner's Rule for Fractions
Converting fractional part v: (0.22033)_five = (?)_ten
Horner's rule or formula: (((((3 / 5) + 3) / 5 + 0) / 5 + 2) / 5 + 2) / 5
Intermediate values: 0.6, 3.6, 0.72, 0.72, 0.144, 2.144, 0.4288, 2.4288, 0.48576
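The fractional variant processes the digits from the right and divides instead of multiplying. A minimal sketch, assuming the digits after the radix point are listed left to right and using exact fractions to avoid rounding:

```python
from fractions import Fraction


def horner_fraction(digits: list[int], radix: int) -> Fraction:
    """Evaluate the digits after the radix point, starting from the last one."""
    value = Fraction(0)
    for d in reversed(digits):
        value = (value + d) / radix     # add the digit, then shift right one place
    return value


v = horner_fraction([2, 2, 0, 3, 3], 5)
print(v, float(v))
```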

21 Classes of Number Representations Signed number Redundant number system Residue number system Real number

22 Representing Signed Numbers (Chapter 2)
Chapter Goals: Learn different encodings of the sign info. Discuss implications for arithmetic design.
Chapter Highlights: Using sign bit, biasing, complementation. Properties of 2's-complement numbers. Signed vs unsigned arithmetic. Signed numbers, positions, or digits.

23 Four-bit signed-magnitude number representation system for integers
[Figure: circular chart pairing each 4-bit pattern (representation) with its signed value (signed magnitude), with the increment and decrement directions marked.]

24 Four-bit biased integer number representation system with a bias of 8
[Figure: circular chart pairing each 4-bit pattern (representation) with its signed value (biased by 8), with the increment direction marked.]

25 Arithmetic with Biased Numbers
Addition/subtraction of biased numbers:
x + y + bias = (x + bias) + (y + bias) − bias
x − y + bias = (x + bias) − (y + bias) + bias
A power-of-2 (or 2^a − 1) bias simplifies addition/subtraction.
Comparison of biased numbers: compare like ordinary unsigned numbers; find the true difference by ordinary subtraction.
We seldom perform arbitrary arithmetic on biased numbers. Main application: exponent field of floating-point numbers.
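The two identities can be exercised directly on codes. The sketch below assumes a 4-bit format with a bias of 8; the helper names are illustrative.

```python
BIAS = 8  # assumed 4-bit biased format: values -8..7 stored as codes 0..15


def encode(x: int) -> int:
    return x + BIAS


def decode(code: int) -> int:
    return code - BIAS


def biased_add(a: int, b: int) -> int:
    """Code of x + y from the codes of x and y: subtract one surplus bias."""
    return a + b - BIAS


def biased_sub(a: int, b: int) -> int:
    """Code of x - y from the codes of x and y: the biases cancel, add one back."""
    return a - b + BIAS


print(decode(biased_add(encode(3), encode(-5))))
print(decode(biased_sub(encode(3), encode(-2))))
print(encode(-4) < encode(2))   # unsigned comparison of codes orders the values
```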

26 Example and Two Special Cases
Example (complement system for fixed-point numbers): complementation constant M = ...; fixed-point number range [−6.000, ...]; a negative value −x is represented as M − x.
Auxiliary operations for complement representations: complementation or change of sign (computing M − x); computations of residues mod M. Thus, M must be selected to simplify these operations.
Two choices allow just this for fixed-point radix-r arithmetic with k whole digits and l fractional digits:
Radix complement: M = r^k
Digit complement: M = r^k − ulp (aka diminished radix complement)
ulp (unit in least position) stands for r^−l; it allows us to forget about l, even for nonintegers.

27 Two's-Complement Numbers
[Figure: circular chart pairing unsigned representations with signed values (2's complement).]
Two's complement = radix complement system for r = 2
M = 2^k
2^k − x = [(2^k − ulp) − x] + ulp = x^compl + ulp
Range of representable numbers in a system with k whole bits: from −2^(k−1) to 2^(k−1) − ulp
ulp (unit in least position) stands for r^−l; it allows us to forget about l, even for nonintegers.

28 One's-Complement Number Representation
[Figure: circular chart pairing unsigned representations with signed values (1's complement).]
One's complement = digit complement (diminished radix complement) system for r = 2
M = 2^k − ulp
(2^k − ulp) − x = x^compl
Range of representable numbers in a system with k whole bits: from −2^(k−1) + ulp to 2^(k−1) − ulp

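29 Range/Precision Extension for 2's- and 1's-Complement Numbers
Range/precision extension for 2's-complement numbers:
... x_(k−1) x_(k−1) x_(k−1) x_(k−1) x_(k−2) ... x_1 x_0 . x_(−1) x_(−2) ... x_(−l) 0 0 0 ...
(sign extension to the left of the sign bit; extension with 0s to the right of the LSD)
Range/precision extension for 1's-complement numbers:
... x_(k−1) x_(k−1) x_(k−1) x_(k−1) x_(k−2) ... x_1 x_0 . x_(−1) x_(−2) ... x_(−l) x_(k−1) x_(k−1) x_(k−1) ...
(sign extension to the left of the sign bit; extension with copies of the sign bit to the right of the LSD)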

30 Mod 2^k vs Mod 2^k − 1
The mod-2^k operation needed in 2's-complement arithmetic is trivial: simply drop the carry-out (subtract 2^k if the result is 2^k or greater).
The mod-(2^k − ulp) operation needed in 1's-complement arithmetic is done via end-around carry: (x + y) − (2^k − ulp). Connect c_out to c_in.
Since the dropped carry is worth 2^k units and the inserted carry is worth ulp, the combined effect is to reduce the magnitude by 2^k − ulp.
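The difference between the two reductions is easy to see on integers (ulp = 1). The sketch below assumes a 4-bit word (`K = 4`); the helper names are illustrative.

```python
K = 4                    # assumed word width
MASK = (1 << K) - 1


def ones_encode(x: int) -> int:
    """1's-complement code of x in [-(2^(K-1) - 1), 2^(K-1) - 1]."""
    return x & MASK if x >= 0 else ~(-x) & MASK


def ones_decode(code: int) -> int:
    return code if code < (1 << (K - 1)) else -(~code & MASK)


def ones_add(a: int, b: int) -> int:
    """Mod-(2^K - 1) addition: the carry-out is fed back as carry-in."""
    total = a + b
    carry_out = total >> K
    return (total + carry_out) & MASK    # end-around carry


def twos_add(a: int, b: int) -> int:
    """Mod-2^K addition: the carry-out is simply dropped."""
    return (a + b) & MASK


print(ones_decode(ones_add(ones_encode(5), ones_encode(-3))))
print(twos_add(5, (-3) & MASK))
```

Note that 1's complement has two codes for zero (all 0s and all 1s); `ones_decode` maps both to 0.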

31 Why 2's-Complement Is the Universal Choice
[Figure: adder/subtractor architecture for 2's-complement numbers. Controlled complementation: a mux selects y or its complement under the add/sub control signal (0 for addition, 1 for subtraction), and the same signal drives c_in of the adder, which produces s = x ± y and c_out. The mux can be replaced with k XOR gates.]
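The adder/subtractor architecture can be modeled in a few lines. In this sketch the XOR with an all-ones or all-zeros mask plays the role of the k XOR gates, and the `sub` control doubles as the carry-in; the function name and the 8-bit default width are assumptions.

```python
def add_sub(x: int, y: int, sub: int, k: int = 8) -> tuple[int, int]:
    """Return (k-bit result, carry-out) of x + y (sub=0) or x - y (sub=1)."""
    mask = (1 << k) - 1
    y_eff = (y ^ (mask if sub else 0)) & mask   # controlled complementation
    total = (x & mask) + y_eff + sub            # add/sub signal feeds c_in
    return total & mask, total >> k


print(add_sub(5, 3, 0))
print(add_sub(5, 3, 1))
print(add_sub(3, 5, 1))    # result is the 8-bit 2's-complement code of -2
```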

32 Interpreting a 2's-complement number as having a negatively weighted most-significant digit
x = (...)_two's-compl = −90
Check: −x = (...)_two = 90
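The negative-weight interpretation translates directly into code. The bit strings below are sample inputs chosen for illustration (they are not taken from the slide); the function name is likewise an assumption.

```python
def twos_value(bits: str) -> int:
    """Value of a 2's-complement bit string: the MSB has weight -2^(k-1)."""
    k = len(bits)
    value = -int(bits[0]) * (1 << (k - 1))           # negatively weighted MSB
    for position, bit in enumerate(bits[1:]):
        value += int(bit) * (1 << (k - 2 - position))  # ordinary positive weights
    return value


print(twos_value("10100110"))   # -128 + 32 + 4 + 2
print(twos_value("01011010"))   # 64 + 16 + 8 + 2
```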

33 Redundant Number Systems
Chapter Goals: Explore the advantages and drawbacks of using more than r digit values in radix r.
Chapter Highlights: Redundancy eliminates long carry chains. Redundancy takes many forms: trade-offs. Conversions between redundant and nonredundant representations. Redundancy used for end values too?

34 Coping with the Carry Problem
Ways of dealing with the carry propagation problem:
1. Limit propagation to within a small number of bits (Chapters 3-4)
2. Detect end of propagation; don't wait for worst case (Chapter 5)
3. Speed up propagation via lookahead etc. (Chapters 6-7)
4. Ideal: Eliminate carry propagation altogether! (Chapter 3)

35 Use Redundant Number System (1/2)
Operand digits in [0, 9]; position sums in [0, 18].
The digit values 10 through 18 are redundant. A carry would occur whenever a position sum is 10 or more, and no position sum exceeds 18.
But how can we extend this beyond a single addition? Subsequent additions will cause problems.

36 Use Redundant Number System (2/2)
Is there still a carry propagation problem?
The sum of the digits in each position is in [0, 36]; each such sum can be decomposed into an interim sum in [0, 16] and a transfer digit (i.e., a carry) in [0, 2].

37 Example: Addition of Redundant Numbers
Operand digits in [0, 18]
Position sums in [0, 36]
Position sum decomposition: [0, 36] = 10 × [0, 2] + [0, 16]
Interim sums in [0, 16]; transfer digits in [0, 2]
Absorption of transfer digit: [0, 16] + [0, 2] = [0, 18]
Sum digits in [0, 18]
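The two-stage scheme can be sketched as follows. Each output digit depends only on the operand digits in its own position and the one to its right, so no carry chain forms. The thresholds used to pick the transfer digit (10 and 20) are one valid choice assumed here, not necessarily the one on the slide; digit lists are least significant first.

```python
RADIX = 10


def carry_free_add(x: list[int], y: list[int]) -> list[int]:
    """Add two equal-length radix-10 numbers with digits in [0, 18]."""
    position_sums = [a + b for a, b in zip(x, y)]                 # in [0, 36]
    transfers = [2 if p >= 20 else 1 if p >= 10 else 0 for p in position_sums]
    interim = [p - RADIX * t for p, t in zip(position_sums, transfers)]  # in [0, 16]
    incoming = [0] + transfers[:-1]                               # transfer from the right
    digits = [w + t for w, t in zip(interim, incoming)]           # in [0, 18]
    digits.append(transfers[-1])                                  # most significant transfer
    return digits


def value(digits: list[int]) -> int:
    return sum(d * RADIX ** i for i, d in enumerate(digits))


a, b = [18, 9, 12], [17, 10, 3]
s = carry_free_add(a, b)
print(s, value(s) == value(a) + value(b))
```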

38 Carry-Free Addition Schemes
[Figure: three ways of producing sum digits s_(i+1), s_i, s_(i−1) from operand digits (x_(i+1), y_(i+1)), (x_i, y_i), (x_(i−1), y_(i−1)).
(a) Ideal single-stage carry-free (impossible for a positional system with fixed digit set).
(b) Two-stage carry-free: an interim sum at position i is combined with the transfer digit t_i into position i.
(c) Single-stage with lookahead.]

39 Redundancy Index
So, redundancy helps us achieve carry-free addition. But how much redundancy is actually needed? Is [0, 11] enough for r = 10?
Redundancy index: ρ = α + β + 1 − r. For example, 0 + 11 + 1 − 10 = 2.
Operand digits in [0, 11]; position sums in [0, 22]; interim sums in [0, 9]; transfer digits in [0, 2]; sum digits in [0, 11].

40 Digit Sets and Digit-Set Conversions
Example 3.1: Convert from digit set [0, 18] to [0, 9] in radix 10.
Proceeding from the least-significant position, each digit d ≥ 10 (including any incoming carry) is rewritten as 10 (carry 1) plus d − 10; the slide works through six such steps until all digits are in [0, 9].
Note: Conversion from redundant to nonredundant representation always involves carry propagation. Thus, the process is sequential and slow.
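The conversion is an ordinary right-to-left carry pass, as this sketch shows. The input digit vector is a made-up sample, and digit lists are least significant first.

```python
def to_conventional(digits: list[int], radix: int = 10) -> list[int]:
    """Rewrite digits from a redundant set such as [0, 18] into [0, radix - 1]."""
    result, carry = [], 0
    for d in digits:                      # sequential: each step needs the previous carry
        carry, digit = divmod(d + carry, radix)
        result.append(digit)
    while carry:                          # a final carry adds extra positions
        carry, digit = divmod(carry, radix)
        result.append(digit)
    return result


def value(digits: list[int], radix: int = 10) -> int:
    return sum(d * radix ** i for i, d in enumerate(digits))


redundant = [18, 12, 10, 17, 9, 11]       # hypothetical input, digits in [0, 18]
plain = to_conventional(redundant)
print(plain[::-1], value(plain) == value(redundant))
```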

41 Generalized Signed-Digit Numbers
Radix r; digit set [−α, β]; requirement α + β + 1 ≥ r; redundancy index ρ = α + β + 1 − r
[Figure: taxonomy of positional number systems.
ρ = 0: nonredundant (α = 0: conventional radix-r positional; α ≥ 1: nonredundant signed-digit).
ρ ≥ 1: generalized signed-digit (GSD).
ρ = 1: minimal GSD, either symmetric (α = β, even r; r = 2 gives BSD or BSB) or asymmetric (α ≠ β; α = 0 gives stored-carry (SC), with BSC for r = 2; α = 1 with r ≠ 2 gives nonbinary SB).
ρ ≥ 2: nonminimal GSD, either symmetric (α = β; α < r gives ordinary signed-digit (OSD), ranging from minimally redundant OSD to maximally redundant OSD) or asymmetric (α ≠ β; α = 0 gives unsigned-digit redundant (UDR); other leaves include SCB and BSCB).]

42 Binary Signed Digit (BSD)
[Table: a BSD representation of +6 (digits x_i) shown under four encodings: (s, v) sign and value encoding; 2-bit 2's-complement encoding; (n, p) negative and positive flags; (n, z, p) 1-out-of-3 encoding.]

43 Carry-Free Addition Algorithms
Carry-free addition of GSD numbers:
Compute the position sums p_i = x_i + y_i
Divide p_i into a transfer t_(i+1) and an interim sum w_i = p_i − r t_(i+1)
Add incoming transfers to get the sum digits s_i = w_i + t_i
If the transfer digits t_i are in [−λ, μ], we must have:
−α + λ ≤ p_i − r t_(i+1) ≤ β − μ
The left bound is the smallest interim sum if a transfer of −λ is to be absorbable; the right bound is the largest interim sum if a transfer of μ is to be absorbable.
These constraints lead to: λ ≥ α / (r − 1) and μ ≥ β / (r − 1)

44 Is Carry-Free Addition Always Applicable?
No: It requires one of the following two conditions [Parh 90]:
a. r > 2, ρ ≥ 3
b. r > 2, ρ = 2, α ≠ 1, β ≠ 1 (e.g., not [−1, 10] in radix 10)
In other words, it is inapplicable for:
r = 2 (perhaps the most useful case)
ρ = 1 (e.g., carry-save)
ρ = 2 with α = 1 or β = 1 (e.g., carry/borrow-save)
BSD is not two-stage carry-free.

45 Use Carry-Estimate in [−1, 1]
x_i, y_i in [−1, 1]
p_i in [−2, 2]
e_i in {low: [−1, 0], high: [0, 1]}
w_i in [−1, 1]
t_(i+1) in [−1, 1]
s_i in [−1, 1]
A position sum of −1 is kept intact when the incoming transfer is in [0, 1], whereas it is rewritten as 1 with a carry of −1 for an incoming transfer in [−1, 0]. This guarantees that t_i and w_i are never both 1 or both −1, and thus −1 ≤ s_i ≤ 1.

46 Residue Number Systems Chapter Goals Study a way of encoding large numbers as a collection of smaller numbers to simplify and speed up some operations Chapter Highlights Moduli, range, arithmetic operations Many sets of moduli possible: tradeoffs Conversions between RNS and binary The Chinese remainder theorem Why are RNS applications limited?

47 RNS Representations and Arithmetic
Chinese puzzle, 1500 years ago: What number has the remainders of 2, 3, and 2 when divided by 7, 5, and 3, respectively?
Residues uniquely identify the number, hence they constitute a representation.
Pairwise relatively prime moduli: m_(k−1) > ... > m_1 > m_0
The residue x_i of x with respect to the ith modulus m_i (similar to a digit): $x_i = x \bmod m_i = \langle x \rangle_{m_i}$
An RNS representation contains a list of k residues or digits: x = (2 | 3 | 2)_RNS(7|5|3)
Default RNS for this chapter: RNS(8 | 7 | 5 | 3)

48 RNS Dynamic Range
The product M of the k pairwise relatively prime moduli is the dynamic range: M = m_(k−1) × ... × m_1 × m_0
For RNS(8 | 7 | 5 | 3), M = 8 × 7 × 5 × 3 = 840
We can take the range of RNS(8 | 7 | 5 | 3) to be [−420, 419] or any other set of 840 consecutive integers.
Negative numbers: complement relative to M, i.e., $\langle -x \rangle_{m_i} = \langle M - x \rangle_{m_i}$
21 = (5 | 0 | 1 | 0)_RNS
−21 = (8 − 5 | 0 | 5 − 1 | 0)_RNS = (3 | 0 | 4 | 0)_RNS
Here are some example numbers in our default RNS(8 | 7 | 5 | 3):
(0 | 0 | 0 | 0)_RNS represents 0 or 840 or ...
(1 | 1 | 1 | 1)_RNS represents 1 or 841 or ...
(2 | 2 | 2 | 2)_RNS represents 2 or 842 or ...
...
(0 | 1 | 4 | 1)_RNS represents 64 or 904 or ...
(2 | 0 | 0 | 2)_RNS represents −70 or 770 or ...
(7 | 6 | 4 | 2)_RNS represents −1 or 839 or ...

49 RNS as Weighted Representation
For RNS(8 | 7 | 5 | 3), the weights of the 4 positions are: 105, 120, 336, 280
Example: (1 | 2 | 4 | 0)_RNS represents the number $\langle 105 \times 1 + 120 \times 2 + 336 \times 4 + 280 \times 0 \rangle_{840} = \langle 1689 \rangle_{840} = 9$
For RNS(7 | 5 | 3), the weights of the 3 positions are: 15, 21, 70
Example (Chinese puzzle): (2 | 3 | 2)_RNS(7|5|3) represents the number $\langle 15 \times 2 + 21 \times 3 + 70 \times 2 \rangle_{105} = \langle 233 \rangle_{105} = 23$
We will see later how the weights can be determined for a given RNS.

50 RNS Encoding and Arithmetic Operations
Binary-coded format for RNS(8 | 7 | 5 | 3): one field per residue (mod 8, mod 7, mod 5, mod 3).
[Figure: Operand 1 and Operand 2 feed four independent units (Mod-8 Unit, Mod-7 Unit, Mod-5 Unit, Mod-3 Unit), whose outputs form the Result.]
Arithmetic in RNS(8 | 7 | 5 | 3):
(5 | 5 | 0 | 2)_RNS represents x = +5
(7 | 6 | 4 | 2)_RNS represents y = −1
(4 | 4 | 4 | 1)_RNS is x + y: ⟨5 + 7⟩_8 = 4, ⟨5 + 6⟩_7 = 4, etc.
(6 | 6 | 1 | 0)_RNS is x − y: ⟨5 − 7⟩_8 = 6, ⟨5 − 6⟩_7 = 6, etc. (alternatively, find −y and add to x)
(3 | 2 | 0 | 1)_RNS is x × y: ⟨5 × 7⟩_8 = 3, ⟨5 × 6⟩_7 = 2, etc.
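Because each residue position is processed independently, RNS addition, subtraction, and multiplication reduce to a per-modulus loop. A minimal sketch for the moduli 8, 7, 5, 3 (function names are illustrative):

```python
import operator

MODULI = (8, 7, 5, 3)


def to_rns(x: int) -> tuple[int, ...]:
    """Residues of x with respect to each modulus (negative x wraps around M)."""
    return tuple(x % m for m in MODULI)


def rns_op(a, b, op) -> tuple[int, ...]:
    """Apply op independently in every residue position; no carries cross positions."""
    return tuple(op(p, q) % m for p, q, m in zip(a, b, MODULI))


x, y = to_rns(5), to_rns(-1)
print(x, y)
print(rns_op(x, y, operator.add))
print(rns_op(x, y, operator.sub))
print(rns_op(x, y, operator.mul))
```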

51 Choosing the RNS Moduli
Target range for our RNS: decimal values [0, ...]
Strategy 1: To minimize the largest modulus, and thus ensure high-speed arithmetic, pick prime numbers in sequence.
Pick m_0 = 2, m_1 = 3, m_2 = 5, etc. After adding m_5 = 13:
RNS(...) M = ... Inadequate
RNS(...) M = ... Too large
RNS(...) M = ... Just right! ... = 19 bits
Fine tuning: Combine pairs of moduli 2 & 13 (26) and 3 & 7 (21)
RNS(...) M = ...

52 An Improved Strategy
Target range for our RNS: decimal values [0, ...]
Strategy 2: Improve strategy 1 by including powers of smaller primes before proceeding to the next larger prime.
RNS(2^2 | 3) M = 12
RNS(...) M = 2520
RNS(...) M = ...
RNS(...) M = ... (remove one 3, combine 3 & 5)
RNS(...) M = ... ... = 18 bits
Fine tuning: Maximize the size of the even modulus within the 4-bit limit
RNS(...) M = ... Too large
We can now remove 5 or 7; not an improvement in this example.

53 Low-Cost RNS Moduli
Target range for our RNS: decimal values [0, ...]
Strategy 3: To simplify the modular reduction (mod m_i) operations, choose only moduli of the forms 2^a or 2^a − 1, aka low-cost moduli.
$\mathrm{RNS}(2^{a_{k-1}} \mid 2^{a_{k-2}} - 1 \mid \ldots \mid 2^{a_1} - 1 \mid 2^{a_0} - 1)$
We can have only one even modulus.
2^(a_i) − 1 and 2^(a_j) − 1 are relatively prime iff a_i and a_j are relatively prime.
RNS(2^3 | 2^3 − 1 | 2^2 − 1), basis: 3, 2, M = 168
RNS(2^4 | 2^4 − 1 | 2^3 − 1), basis: 4, 3, M = 1680
RNS(...), basis: 5, 3, 2, M = ...
RNS(...), basis: 5, 4, 3, M = ...
Comparison (it is easy to reduce mod 2^k and mod 2^k − 1):
RNS(...) 18 bits M = ...
RNS(...) 17 bits M = ...

54 Encoding and Decoding of Numbers
Conversion from binary/decimal to RNS
Example 4.1: Represent the number y = (1010 0100)_two = (164)_ten in RNS(8 | 7 | 5 | 3)
The mod-8 residue is easy to find: x_3 = ⟨y⟩_8 = (100)_two = 4
We have y = 2^7 + 2^5 + 2^2; thus
x_2 = ⟨y⟩_7 = ⟨2 + 4 + 4⟩_7 = 3
x_1 = ⟨y⟩_5 = ⟨3 + 2 + 4⟩_5 = 4
x_0 = ⟨y⟩_3 = ⟨2 + 2 + 1⟩_3 = 2
Table 4.1 Residues of the first 10 powers of 2
i   2^i   ⟨2^i⟩_7   ⟨2^i⟩_5   ⟨2^i⟩_3
0   1     1         1         1
1   2     2         2         2
2   4     4         4         1
3   8     1         3         2
4   16    2         1         1
5   32    4         2         2
6   64    1         4         1
7   128   2         3         2
8   256   4         1         1
9   512   1         2         2

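55 Conversion from RNS to Binary/Decimal
Theorem 4.1 (The Chinese remainder theorem)
$x = (x_{k-1} \mid \ldots \mid x_2 \mid x_1 \mid x_0)_{\mathrm{RNS}} = \left\langle \sum_i M_i \langle \alpha_i x_i \rangle_{m_i} \right\rangle_M$
where $M_i = M / m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ (the multiplicative inverse of M_i with respect to m_i)
Implementing CRT-based RNS-to-binary conversion:
$x = \left\langle \sum_i M_i \langle \alpha_i x_i \rangle_{m_i} \right\rangle_M = \left\langle \sum_i f_i(x_i) \right\rangle_M$
We can use a table to store the f_i values: $\sum_i m_i$ entries
Table 4.2 Values needed in applying the Chinese remainder theorem to RNS(8 | 7 | 5 | 3); columns: i, m_i, x_i, $\langle M_i \langle \alpha_i x_i \rangle_{m_i} \rangle_M$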

56 Intuitive Justification for CRT
Puzzle: What number has the remainders of 2, 3, and 2 when divided by the numbers 7, 5, and 3, respectively?
x = (2 | 3 | 2)_RNS(7|5|3) = (?)_ten
(1 | 0 | 0)_RNS(7|5|3) = multiple of 15 that is 1 mod 7 = 15
(0 | 1 | 0)_RNS(7|5|3) = multiple of 21 that is 1 mod 5 = 21
(0 | 0 | 1)_RNS(7|5|3) = multiple of 35 that is 1 mod 3 = 70
(2 | 3 | 2)_RNS(7|5|3) = (2 | 0 | 0) + (0 | 3 | 0) + (0 | 0 | 2)
= 2 × (1 | 0 | 0) + 3 × (0 | 1 | 0) + 2 × (0 | 0 | 1)
= 2 × 15 + 3 × 21 + 2 × 70 = 30 + 63 + 140 = 233 = 23 mod 105
Therefore, x = (23)_ten
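The weights of the residue positions can be computed mechanically. The sketch below is one straightforward way to do it, using Python's built-in modular inverse `pow(a, -1, m)` (available from Python 3.8); the function names are illustrative.

```python
from math import prod


def crt_weights(moduli: tuple[int, ...]) -> list[int]:
    """Weight of position i: the multiple of M/m_i that is 1 mod m_i."""
    M = prod(moduli)
    weights = []
    for m in moduli:
        M_i = M // m
        alpha = pow(M_i, -1, m)        # multiplicative inverse of M_i modulo m
        weights.append(M_i * alpha % M)
    return weights


def from_rns(residues: tuple[int, ...], moduli: tuple[int, ...]) -> int:
    """Decode residues to the unique value in [0, M - 1]."""
    M = prod(moduli)
    return sum(w * r for w, r in zip(crt_weights(moduli), residues)) % M


print(crt_weights((7, 5, 3)))
print(from_rns((2, 3, 2), (7, 5, 3)))
print(crt_weights((8, 7, 5, 3)))
```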

57 Difficult RNS Arithmetic Operations
Sign test
Magnitude comparison
Division
One could convert back and forth to/from binary. Another approach: convert to a mixed-radix system, because numbers in a mixed-radix system can be compared digit by digit.

58 Difficult RNS Arithmetic Operations
Example: Of the following RNS(8 | 7 | 5 | 3) numbers, which, if any, are negative? Which is the largest? Which is the smallest? Assume a range of [−420, 419].
a = (0 | 1 | 3 | 2)_RNS
b = (0 | 1 | 4 | 1)_RNS
c = (0 | 6 | 2 | 1)_RNS
d = (2 | 0 | 0 | 2)_RNS
e = (5 | 0 | 1 | 0)_RNS
f = (7 | 6 | 4 | 2)_RNS
Answers: d < c < f < a < e < b, i.e., −70 < −8 < −1 < 8 < 21 < 64

59 General RNS Division
General RNS division, as opposed to division by one of the moduli (aka scaling), is difficult; hence, use of RNS is unlikely to be effective when an application requires many divisions.
Scheme proposed in the 1994 PhD thesis of Ching-Yu Hung (UCSB): Use an algorithm that has built-in tolerance to imprecision, and apply approximate CRT decoding to choose quotient digits.
Example: SRT algorithm (s is the partial remainder)
s < 0: quotient digit = −1
s ≅ 0: quotient digit = 0
s > 0: quotient digit = 1
The BSD quotient can be converted to RNS on the fly.

60 Limits of Fast Arithmetic in RNS
Known results from number theory:
Theorem 4.2: The ith prime p_i is asymptotically i ln i
Theorem 4.3: The number of primes in [1, n] is asymptotically n / ln n
Theorem 4.4: The product of all primes in [1, n] is asymptotically e^n
Implications for the speed of arithmetic in RNS:
Theorem 4.5: It is possible to represent all k-bit binary numbers in RNS with O(k / log k) moduli such that the largest modulus has O(log k) bits
That is, with fast log-time adders, addition needs O(log log k) time.

61 Hardware Implementation for RNS Representations
[Figure: Operand 1 and Operand 2, each held as mod-8, mod-7, mod-5, and mod-3 fields, feed a Mod-8 Unit, Mod-7 Unit, Mod-5 Unit, and Mod-3 Unit operating in parallel to produce the Result.]

62 Addition/Subtraction Instructor: Kuan Jen Lin Dept. of EE, FJU, Taiwan Room: SF 727B Most slides originate from the textbook author's PowerPoint presentation files. Computer Arithmetic 2, Dept. of EE, Fu Jen Catholic University, Taiwan 1

63 Part II: Addition / Subtraction
Review addition schemes and various speedup methods.
Addition is a key op (in itself, and as a building block).
Subtraction = negation + addition
Carry propagation speedup: lookahead, skip, select, ...
Two-operand versus multioperand addition
Topics in This Part:
Chapter 5 Basic Addition and Counting
Chapter 6 Carry-Lookahead Adders
Chapter 7 Variations in Fast Adders
Chapter 8 Multioperand Addition

64 Basic Addition and Counting
Chapter Goals: Study the design of ripple-carry adders, discuss why their latency is unacceptable, and set the foundation for faster adders.
Chapter Highlights: Full adders are versatile building blocks. Longest carry chain on average: log_2 k bits. Fast asynchronous adders are simple. Counting is relatively easy to speed up.

65 HA and FA Adders
[Figure: Half-adder (HA): truth table (inputs x, y; outputs c, s) and block diagram.]
[Figure: Full-adder (FA): truth table (inputs x, y, c_in; outputs c_out, s) and block diagram.]

66 Half-Adder Implementations
[Figure: (a) AND/XOR half-adder. (b) NOR-gate half-adder. (c) NAND-gate half-adder with complemented carry.]

67 Some Full-Adder Details
Logic equations for a full-adder:
$s = x \oplus y \oplus c_{in}$ (odd parity function) $= x y c_{in} \lor \bar{x}\,\bar{y}\,c_{in} \lor \bar{x}\,y\,\bar{c}_{in} \lor x\,\bar{y}\,\bar{c}_{in}$
$c_{out} = x y \lor x c_{in} \lor y c_{in}$ (majority function)

68 Full-Adder Implementations
[Figure: (a) Full-adder built of half-adders. (b) Built as an AND-OR circuit. (c) Mux-based design suitable for CMOS realization.]

69 Bit-Serial Adder and Ripple-Carry Adder
[Figure: (a) Bit-serial adder: shift registers supply x_i and y_i to a single FA, a carry flip-flop holds c_(i+1) for the next clock cycle, and the sum bits s_i are shifted out. (b) Ripple-carry adder: a chain of 32 FAs with c_0 = c_in, c_32 = c_out, and sum bits s_0 through s_31 (s_32 = c_32).]

70 Critical Path Through a Ripple-Carry Adder
$T_{\text{ripple-add}} = T_{FA}(x, y \to c_{out}) + (k - 2) \times T_{FA}(c_{in} \to c_{out}) + T_{FA}(c_{in} \to s)$
[Figure: critical path in a k-bit ripple-carry adder, running from x_0, y_0 through the carries c_1, c_2, ..., c_(k−1) to s_(k−1).]

71 Conditions and Exceptions
[Figure: k-bit ripple-carry adder with additional logic producing the Overflow, Negative, and Zero flags.]
Overflow occurs when two numbers of like sign are added and a result of the opposite sign is produced.
$\text{overflow}_{2's\text{-compl}} = x_{k-1}\, y_{k-1}\, \bar{s}_{k-1} \lor \bar{x}_{k-1}\, \bar{y}_{k-1}\, s_{k-1}$
$\text{overflow}_{2's\text{-compl}} = c_k \oplus c_{k-1} = c_k\, \bar{c}_{k-1} \lor \bar{c}_k\, c_{k-1}$
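The two overflow formulas can be checked against each other exhaustively for a small word width. The sketch assumes a 4-bit adder by default; the function and variable names are illustrative.

```python
def add_with_overflow(x: int, y: int, k: int = 4) -> tuple[int, int, int]:
    """Return (sum bits, overflow from sign bits, overflow from carries c_k xor c_(k-1))."""
    mask = (1 << k) - 1
    x, y = x & mask, y & mask
    low_mask = mask >> 1
    c_km1 = ((x & low_mask) + (y & low_mask)) >> (k - 1)   # carry into the sign position
    total = x + y
    c_k = total >> k                                       # carry out of the sign position
    s = total & mask
    xs, ys, ss = x >> (k - 1), y >> (k - 1), s >> (k - 1)  # sign bits
    from_signs = (xs & ys & (ss ^ 1)) | ((xs ^ 1) & (ys ^ 1) & ss)
    from_carries = c_k ^ c_km1
    return s, from_signs, from_carries


print(add_with_overflow(7, 1))      # 7 + 1 does not fit in 4-bit 2's complement
print(add_with_overflow(3, 0b1110)) # 3 + (-2) fits
```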

72 Binary Adders as Versatile Building Blocks (1/2)
Set one input of an FA to 0: c_out = AND of the other inputs
Set one input to 1: c_out = OR of the other inputs
Set one input to 0 and another to 1: s = NOT of the third input
[Figure: Fig. 5.6 Four-bit binary adder used to realize the logic function f = w + xyz and its complement.]

73 Binary Adders as Versatile Building Blocks (2/2)
[Figure: full-adder truth table (inputs x, y, c_in; outputs c_out, s) and block diagram.]

74 Example of Carry Propagation
[Figure: two binary operands aligned by bit position, from c_out down to c_in, with the carry chains and their lengths marked.]

75 Using Probability to Analyze Carry Propagation
Given binary numbers with random bits, for each position i we have:
Probability of carry generation = ¼ (both 1s)
Probability of carry annihilation = ¼ (both 0s)
Probability of carry propagation = ½ (different)
Probability that a carry generated at position i propagates through position j − 1 and stops at position j (j > i): $2^{-(j-1-i)} \times \tfrac{1}{2} = 2^{-(j-i)}$
Expected length of the carry chain that starts at position i:
$\sum_{j=i+1}^{k-1} (j-i)\,2^{-(j-i)} + (k-i)\,2^{-(k-1-i)} = \sum_{l=1}^{k-1-i} l\,2^{-l} + (k-i)\,2^{-(k-1-i)} = 2 - (k-i+1)\,2^{-(k-1-i)} + (k-i)\,2^{-(k-1-i)} = 2 - 2^{-(k-i-1)}$
Because the carry definitely stops at position k, the term for k is not multiplied by ½.

76 Carry Completion Detection
[Figure: carry-completion sensing network. Each position i receives x_i, y_i and two carry rails b_i and c_i, with b_0 and c_0 derived from c_in and c_k = c_out; per-position done signals d_(i+1), together with those from other bit positions, are combined into alldone.]
Dual-rail coding of the carry (b_i, c_i):
0 0: carry not yet known
0 1: carry known to be 1
1 0: carry known to be 0

77 Self-Timed Adder

78 Self-Timed Adder with Parallel Carry Completion Sensing

79 Addition of a Constant: Counters
[Figure: counter organization. A mux, controlled by Count / Initialize, selects between Data in and the output of an incrementer (decrementer) computing x + 1 (x − 1); the selected value is loaded into the count register (Clock, Reset, Load, Clear, Enable inputs), which drives Data out; c_out of the incrementer signals counter overflow.]

80 Implementing a Simple Up Counter
[Figure: ripple-carry incrementer for use in an up counter, with inputs x_(k−1) ... x_0, carries c_k ... c_1, and outputs s_(k−1) ... s_0.]
[Figure: four-bit asynchronous up counter built only of negative-edge-triggered T flip-flops, with an Increment input and count outputs Q3 Q2 Q1 Q0.]

81 Manchester Carry Chains and Adders
Sum digit in radix r: $s_i = (x_i + y_i + c_i) \bmod r$
Special case of radix 2: $s_i = x_i \oplus y_i \oplus c_i$
Computing the carries c_i is thus our central problem. For this, the actual operand digits are not important. What matters is whether in a given position a carry is generated, propagated, or annihilated (absorbed).
For binary addition: $g_i = x_i y_i$, $p_i = x_i \oplus y_i$, $a_i = \bar{x}_i \bar{y}_i = \overline{x_i \lor y_i}$
It is also helpful to define a transfer signal: $t_i = g_i \lor p_i = \bar{a}_i = x_i \lor y_i$
Using these signals, the carry recurrence is written as
$c_{i+1} = g_i \lor c_i p_i = g_i \lor c_i g_i \lor c_i p_i = g_i \lor c_i t_i$

82 Manchester Carry Network
The worst-case delay of a Manchester carry chain has three components:
1. Latency of forming the switch control signals
2. Set-up time for switches
3. Signal propagation delay through k switches
Signals: $g_i = x_i y_i$, $p_i = x_i \oplus y_i$, $c_{i+1} = g_i \lor c_i p_i$
[Figure: one stage of a Manchester carry chain. (a) Conceptual representation: switches controlled by a_i, g_i, and p_i connect c_{i+1} to logic 0, logic 1, or c_i. (b) Possible CMOS realization.]

83 Carry Network is the Essence of a Fast Adder
$g_i = x_i y_i$, $p_i = x_i \oplus y_i$
g_i p_i  Carry is:
0   0    annihilated or killed
0   1    propagated
1   0    generated
1   1    (impossible)
The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.
Carry network alternatives: ripple, skip, lookahead, parallel-prefix.
[Figure: the g_i and p_i signals of all positions and c_0 feed the carry network, which produces c_1 through c_k.]

84 Carry Propagation Network of a Ripple-Carry Adder
The carry recurrence: $c_{i+1} = g_i \lor p_i c_i$
Latency of a k-bit adder is roughly 2k gate delays:
1 gate delay for production of the p and g signals, plus
2(k - 1) gate delays for carry propagation, plus
1 XOR gate delay for generation of the sum bits
[Figure: ripple-carry chain from c_0 through c_k, driven by (g_0, p_0) through (g_{k-1}, p_{k-1}).]
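The ripple-carry recurrence can be sketched in a few lines of Python. This is an illustrative bit-level model, not a hardware description; the function name and the integer encoding of operands are assumptions made for the example.

```python
def ripple_carry_add(x: int, y: int, k: int, c_in: int = 0):
    """Add two k-bit unsigned integers using c_{i+1} = g_i OR (p_i AND c_i)."""
    carry = c_in
    total = 0
    for i in range(k):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        g = xi & yi              # generate signal g_i
        p = xi ^ yi              # propagate signal p_i
        total |= (p ^ carry) << i    # sum bit s_i = p_i XOR c_i
        carry = g | (p & carry)      # carry into position i + 1
    return total, carry          # k-bit sum and c_out
```

The loop body depends on the carry from the previous iteration, which mirrors the 2(k - 1) gate delays of carry propagation in the latency estimate.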

85 Carry-Lookahead Adders
Chapter Goals
Understand the carry-lookahead method and its many variations used in the design of fast adders
Chapter Highlights
Single- and multilevel carry lookahead
Various designs for log-time adders
Relating the carry determination problem to parallel prefix computation
Implementing fast adders in VLSI

86 Unrolling the Carry Recurrence
Recall the generate, propagate, annihilate (absorb), and transfer signals:
Signal   Radix r                            Binary
g_i      is 1 iff x_i + y_i >= r            x_i y_i
p_i      is 1 iff x_i + y_i = r - 1         x_i XOR y_i
a_i      is 1 iff x_i + y_i < r - 1         NOT(x_i OR y_i)
t_i      is 1 iff x_i + y_i >= r - 1        x_i OR y_i
s_i      (x_i + y_i + c_i) mod r            x_i XOR y_i XOR c_i
The carry recurrence can be unrolled to obtain each carry signal directly from inputs, rather than through propagation (p_j can be replaced with t_j throughout):
$c_i = g_{i-1} \lor c_{i-1} p_{i-1}$
$= g_{i-1} \lor (g_{i-2} \lor c_{i-2} p_{i-2})\, p_{i-1}$
$= g_{i-1} \lor g_{i-2} p_{i-1} \lor c_{i-2} p_{i-2} p_{i-1}$
$= g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor c_{i-3} p_{i-3} p_{i-2} p_{i-1}$
$= g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor g_{i-4} p_{i-3} p_{i-2} p_{i-1} \lor c_{i-4} p_{i-4} p_{i-3} p_{i-2} p_{i-1}$
$= \ldots$

87 Four-Bit Carry-Lookahead Adder (1/2)
Full carry lookahead is quite practical for a 4-bit adder:
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
$c_4 = g_3 \lor g_2 p_3 \lor g_1 p_2 p_3 \lor g_0 p_1 p_2 p_3 \lor c_0 p_0 p_1 p_2 p_3$
Complexity is reduced by deriving the carry-out indirectly: $c_4 = g_3 \lor c_3 p_3$
[Figure: four-bit carry network producing c_1 through c_4 from (g_0, p_0) through (g_3, p_3) and c_0.]
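The unrolled equations can be written out directly and checked against ordinary addition. The sketch below is illustrative only; `cla4_carries` and `cla4_add` are hypothetical names, and each carry is computed from the g and p signals and c_0 alone, with no carry depending on another.

```python
def cla4_carries(g, p, c0):
    """Return [c0, c1, c2, c3, c4] from 4-element bit lists g and p."""
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3]) | (c0 & p[0] & p[1] & p[2] & p[3]))
    return [c0, c1, c2, c3, c4]

def cla4_add(x, y, c0=0):
    """Add two 4-bit unsigned integers with full carry lookahead."""
    g = [(x >> i) & (y >> i) & 1 for i in range(4)]      # generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]    # propagate
    c = cla4_carries(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(4))        # s_i = p_i XOR c_i
    return s, c[4]
```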

88 Four-Bit Carry-Lookahead Adder (2/2)
Source: Ercegovac and Lang, Digital Arithmetic, MKP

89 Carry Lookahead Beyond 4 Bits
Consider a 32-bit adder:
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
...
$c_{31} = g_{30} \lor g_{29} p_{30} \lor g_{28} p_{29} p_{30} \lor g_{27} p_{28} p_{29} p_{30} \lor \ldots \lor c_0 p_0 p_1 p_2 p_3 \ldots p_{29} p_{30}$
The last expression needs a 32-input AND and a 32-input OR. Such high fan-ins necessitate tree-structured circuits.
For wide words, full carry lookahead is impractical.

90 Two Schemes to Manage the Complexity
High-radix addition (i.e., radix $2^h$): increases the latency for generating g and p signals and sum digits, but simplifies the carry network (optimal radix?)
Multilevel lookahead
Example: 16-bit addition
Radix-16 (four digits)
Two-level carry lookahead (four 4-bit blocks)
Either way, the carries $c_4$, $c_8$, and $c_{12}$ are determined first.
[Figure: carries c_16 (c_out) down to c_0 (c_in), with c_12, c_8, and c_4 marked as the carries to be determined first.]

91 One-Level Carry-Lookahead Adder
Source: Ercegovac and Lang, Digital Arithmetic, p. 72.

92 Block Generate and Propagate Signals
$g_{[i,i+3]} = g_{i+3} \lor g_{i+2} p_{i+3} \lor g_{i+1} p_{i+2} p_{i+3} \lor g_i p_{i+1} p_{i+2} p_{i+3}$
$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$
Note: the block signals are unrelated to $c_i$.
$c_k = g_{[0,k-1]} \lor c_0 p_{[0,k-1]}$
$c_{i+4} = g_{[i,i+3]} \lor c_i p_{[i,i+3]}$
[Figure: 4-bit lookahead carry generator with inputs (g_i, p_i) through (g_{i+3}, p_{i+3}) and c_i, and outputs c_{i+1}, c_{i+2}, c_{i+3}, g_{[i,i+3]}, and p_{[i,i+3]}.]
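As a rough illustration of how block signals are used, the sketch below computes the block carries c_4, c_8, c_12, and c_16 of a 16-bit adder from 4-bit block generate and propagate signals. The helper names are hypothetical. For brevity the block carries are chained in software; a real second-level lookahead generator would produce them in parallel.

```python
def bit_signals(x, y, k):
    """Per-bit generate and propagate signals of two k-bit operands."""
    g = [(x >> i) & (y >> i) & 1 for i in range(k)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(k)]
    return g, p

def block_gp(g, p, i):
    """Block signals g[i,i+3] and p[i,i+3] for the 4-bit block starting at bit i."""
    bg = (g[i + 3] | (g[i + 2] & p[i + 3]) | (g[i + 1] & p[i + 2] & p[i + 3])
          | (g[i] & p[i + 1] & p[i + 2] & p[i + 3]))
    bp = p[i] & p[i + 1] & p[i + 2] & p[i + 3]
    return bg, bp

def block_carries_16(x, y, c0=0):
    """Return {0: c0, 4: c4, 8: c8, 12: c12, 16: c16} for a 16-bit addition."""
    g, p = bit_signals(x, y, 16)
    carries = {0: c0}
    for i in (0, 4, 8, 12):
        bg, bp = block_gp(g, p, i)
        carries[i + 4] = bg | (carries[i] & bp)   # c_{i+4} = g[i,i+3] OR c_i p[i,i+3]
    return carries
```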

93 4-bit Lookahead Carry Generator
[Figure: gate-level 4-bit lookahead carry generator. One part generates the block signals p_{[i,i+3]} and g_{[i,i+3]}; the other generates the intermediate carries c_{i+1}, c_{i+2}, and c_{i+3} from c_i and the g and p inputs.]

94 A Two-Level Carry-Lookahead Adder (64 bits)
[Figure: four 16-bit carry-lookahead adders, each built around a 4-bit lookahead carry generator fed by g_{[0,3]}, p_{[0,3]} through g_{[12,15]}, p_{[12,15]}. A second-level 4-bit lookahead carry generator combines g_{[0,15]}, p_{[0,15]} through g_{[48,63]}, p_{[48,63]} into g_{[0,63]} and p_{[0,63]}.]
In a 16-bit CLA, $c_4$, $c_8$, and $c_{12}$ are the $c_{i+1}$, $c_{i+2}$, and $c_{i+3}$, respectively, of the previous slide.
$c_k = g_{[0,k-1]} \lor c_0 p_{[0,k-1]}$

95 Latency of a 16-bit 2-Level Carry-Lookahead Adder (1/2)
(Level 1) g and p for individual bit positions: 1 gate level
(Level 1) g and p signals for 4-bit blocks, i.e. g[0,3], p[0,3], ..., g[12,15], p[12,15]: 2 gate levels
(Level 2) Block carry-in signals c_4, c_8, and c_12, together with g[0,15], p[0,15]: 2 gate levels
(Level 1) Internal carries within 4-bit blocks (c_1, c_2, c_3, c_5, ...), and (Level 2) C 15 if required: 2 gate levels
(Level 1) Sum bits (XOR): 2 gate levels ???

96 Latency of a 16-bit 2-Level Carry-Lookahead Adder (2/2)
Total latency for the 16-bit adder is 9 gate levels.
Each additional lookahead level adds 4 gate levels of latency (the yellow block in the previous slide).
Latency for a k-bit CLA: $4 \log_4 k + 1$ gate levels

97 Combining of g and p Signals
$g = g'' \lor g' p''$
$p = p' p''$
Combining of g and p signals of two (contiguous or overlapping) blocks B' and B'' of arbitrary widths into the g and p signals for block B.
[Figure: block B'' spans positions j_0 to j_1 with signals (g'', p''); block B' spans positions i_0 to i_1 with signals (g', p'); block B covers both and has signals (g, p).]

98 Formulating the Prefix Computation Problem
The problem of carry determination can be formulated as:
Given (g_0, p_0) (g_1, p_1) ... (g_{k-2}, p_{k-2}) (g_{k-1}, p_{k-1})
Find (g_{[0,0]}, p_{[0,0]}) (g_{[0,1]}, p_{[0,1]}) ... (g_{[0,k-2]}, p_{[0,k-2]}) (g_{[0,k-1]}, p_{[0,k-1]}), whose g components are the carries c_1, c_2, ..., c_{k-1}, c_k
Carry-in can be viewed as an extra (-1) position: (g_{-1}, p_{-1}) = (c_in, 0)
The desired pairs are found by evaluating all prefixes of (g_0, p_0) ¢ (g_1, p_1) ¢ ... ¢ (g_{k-2}, p_{k-2}) ¢ (g_{k-1}, p_{k-1}), where ¢ denotes the carry operator.
The carry operator is associative, but not commutative:
[(g_1, p_1) ¢ (g_2, p_2)] ¢ (g_3, p_3) = (g_1, p_1) ¢ [(g_2, p_2) ¢ (g_3, p_3)]
Prefix sums analogy:
Given x_0, x_1, x_2, ..., x_{k-1}
Find x_0, x_0 + x_1, x_0 + x_1 + x_2, ..., x_0 + x_1 + ... + x_{k-1}
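The prefix formulation maps directly onto a scan. The sketch below is an illustration under stated assumptions: `carry_op` and `prefix_carries` are hypothetical names, and the sequential `itertools.accumulate` stands in for whatever parallel prefix network a hardware design would use. It applies the carry operator $(g, p) = (g'' \lor g' p'',\ p' p'')$ with the carry-in treated as position -1.

```python
from itertools import accumulate

def carry_op(low, high):
    """Combine (g', p') of the less significant block with (g'', p'') of the more significant one."""
    g1, p1 = low
    g2, p2 = high
    return (g2 | (g1 & p2), p1 & p2)

def prefix_carries(x, y, k, c_in=0):
    """Return [c_0, c_1, ..., c_k] for the addition of two k-bit operands."""
    pairs = [(c_in, 0)]   # carry-in as the extra position -1: (g, p) = (c_in, 0)
    pairs += [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(k)]
    prefixes = accumulate(pairs, carry_op)   # all prefixes, least significant first
    return [g for g, _ in prefixes]
```

Because the operator is associative, the prefixes may be grouped in any order, which is what the parallel prefix networks on the following slides exploit. Because it is not commutative, the operand order (less significant block first) must be preserved.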

99 Prefix-Based Carry Network
[Figure: a four-input prefix sums network (scan order) beside a four-bit carry lookahead network. The lookahead network takes (g_0, p_0) through (g_3, p_3) and produces:
g_{[0,3]}, p_{[0,3]} = (c_4, --)
g_{[0,2]}, p_{[0,2]} = (c_3, --)
g_{[0,1]}, p_{[0,1]} = (c_2, --)
g_{[0,0]}, p_{[0,0]} = (c_1, --)]

100 Parallel Prefix Sums Network Built of Two k/2-Input Networks and k/2 Adders (Ladner-Fischer)
Recursive dividing; incurs large fan-out.
Delay recurrence: $D(k) = D(k/2) + 1 = \log_2 k$
Cost recurrence: $C(k) = 2C(k/2) + k/2 = (k/2) \log_2 k$
[Figure: inputs x_{k-1} ... x_{k/2} and x_{k/2-1} ... x_0 feed two k/2-input prefix sums networks; k/2 adders combine s_{k/2-1} with the upper outputs to form s_{k-1} ... s_{k/2}.]

101 Note: the signal a in this figure is called t in the textbook.
Source: Ercegovac and Lang, Digital Arithmetic, p. 81

102 Eliminate Large Fan-out
Increase the number of levels
Increase the number of cells

103 The Brent-Kung Recursive Construction
Parallel prefix sums network built of one k/2-input network and k - 1 adders.
Delay recurrence: $D(k) = D(k/2) + 2 = 2 \log_2 k - 1$ ($-2$ really)
Cost recurrence: $C(k) = C(k/2) + k - 1 = 2k - 2 - \log_2 k$
[Figure: inputs x_{k-1} ... x_0 are combined in pairs, fed to a k/2-input prefix sums network, and recombined to form s_{k-1} ... s_0.]

104 Brent-Kung Carry Network (8-Bit Adder)
[Figure: Brent-Kung network taking the single-position signals [7,7] ... [0,0], forming [6,7], [4,5], [2,3], [0,1], then [4,7] and [0,3], and finally producing the prefixes [0,7], [0,6], ..., [0,0].]

105 Source: Ercegovac and Lang, Digital Arithmetic, p. 83

106 Brent-Kung Carry Network (16-Bit Adder)
Reason for latency being $2 \log_2 k - 2$
[Figure: 16-input Brent-Kung parallel prefix graph, drawn level by level, producing s_15 ... s_0.]

107 Kogge-Stone Carry Network (16-Bit Adder)
Cost formula: $C(k) = (k - 1) + (k - 2) + (k - 4) + \ldots + (k - k/2) = k \log_2 k - k + 1$
$\log_2 k$ levels (minimum possible)
[Figure: 16-input Kogge-Stone parallel prefix graph producing s_15 ... s_0.]
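A software model can confirm the level and cell counts quoted for the Kogge-Stone network. This is a sketch under stated assumptions: the function name is hypothetical, the carry-in is folded into position 0 before the prefix computation (one of several ways to handle it), and each combination of two (g, p) pairs is counted as one cell.

```python
def kogge_stone_carries(x, y, k, c_in=0):
    """Return ([c_0..c_k], levels, cells) for a k-bit Kogge-Stone carry network."""
    g = [(x >> i) & (y >> i) & 1 for i in range(k)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(k)]
    g[0] |= p[0] & c_in          # fold the carry-in into position 0
    levels = cells = 0
    d = 1
    while d < k:
        new_g, new_p = g[:], p[:]
        for i in range(d, k):    # every position i >= d combines with position i - d
            new_g[i] = g[i] | (g[i - d] & p[i])
            new_p[i] = p[i] & p[i - d]
            cells += 1
        g, p = new_g, new_p
        d *= 2                   # the span doubles at each level
        levels += 1
    return [c_in] + g, levels, cells
```

For k = 16 the model reports 4 levels and 49 cells, in agreement with $\log_2 k$ levels and $k \log_2 k - k + 1$ cells.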

108 Source: Ercegovac and Lang, Digital Arithmetic, p. 84

109 Speed-Cost Tradeoffs in Carry Networks
Method           Delay              Cost
Ladner-Fischer   log_2 k            (k/2) log_2 k
Kogge-Stone      log_2 k            k log_2 k - k + 1
Brent-Kung       2 log_2 k - 2      2k - 2 - log_2 k
Improving the Ladner-Fischer design: the outputs of the lower k/2-input prefix sums network that feed the upper half can be produced one time unit later without increasing the overall latency. This strategy saves enough to make the overall cost linear (best possible).
[Figure: Ladner-Fischer construction with two k/2-input prefix sums networks, with the late-tolerant outputs marked.]
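The delay and cost formulas are easy to tabulate for a given width. The helper below is a small illustrative sketch; the function name is hypothetical and k is assumed to be a power of 2.

```python
def carry_network_tradeoffs(k):
    """Map each method to (delay in levels, cost in cells) for k a power of 2."""
    lg = k.bit_length() - 1      # log2(k) for a power of 2
    return {
        "Ladner-Fischer": (lg, (k // 2) * lg),
        "Kogge-Stone": (lg, k * lg - k + 1),
        "Brent-Kung": (2 * lg - 2, 2 * k - 2 - lg),
    }
```

For k = 16 this gives 4 levels and 49 cells for Kogge-Stone and 6 levels and 26 cells for Brent-Kung.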

110 Hybrid B-K/K-S Carry Network (16-Bit Adder)
Brent-Kung: 6 levels, 26 cells
Kogge-Stone: 4 levels, 49 cells
Hybrid: 5 levels, 32 cells
[Figure: 16-input Brent-Kung, Kogge-Stone, and hybrid networks. The hybrid uses a Brent-Kung level at the top and bottom with Kogge-Stone levels in between.]

111 Four-Bit Manchester Carry Chains (Transistor Level)
[Figure: two transistor-level four-bit Manchester carry chains, (a) and (b), clocked by PH2, taking g_0 ... g_3 and p_0 ... p_3 and producing the block signals g_{[0,1]}, g_{[0,2]}, g_{[0,3]} and p_{[0,1]}, p_{[0,2]}, p_{[0,3]}.]

112 Variations in Fast Adders
Chapter Goals
Study alternatives to the carry-lookahead method for designing fast adders
Chapter Highlights
Many methods besides CLA are available (both competing and complementary)
Best design is technology-dependent (often hybrid rather than pure)
Knowledge of timing allows optimizations

113 Simple Carry-Skip Adders
[Figure: (a) Ripple-carry adder built of four 4-bit blocks with carries c_16, c_12, c_8, c_4, c_0. (b) Simple carry-skip adder: the same ripple-carry stages, with skip logic (2 gates) driven by the block propagate signals p_{[12,15]}, p_{[8,11]}, p_{[4,7]}, and p_{[0,3]}.]

114 Carry-Skip Adder Using MUX
Source: Ercegovac and Lang, Digital Arithmetic, p. 66.

115 Another View of Carry-Skip Addition
Street/freeway analogy for carry-skip adder: the ripple path from c_{4j} through c_{4j+1}, c_{4j+2}, c_{4j+3} to c_{4j+4} is the one-way street, and the skip path is the freeway.

116 Carry-Skip Adder with Fixed Block Size
Block width b; k/b blocks form a k-bit adder (assume b divides k).
$T_{\text{fixed-skip-add}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) \approx 2b + k/b - 3.5$ stages
The four terms are, in order: ripple in block 0, the OR gate, the skips, and ripple in the last block.
$dT/db = 2 - k/b^2 = 0 \Rightarrow b_{opt} = \sqrt{k/2}$
$T_{opt} = 2\sqrt{2k} - 3.5$
1 stage = 2 gate levels
Example: k = 32, $b_{opt}$ = 4, $T_{opt}$ = 12.5 stages (contrast with 32 stages for a ripple-carry adder)
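The delay formula can be evaluated for every legal block width to confirm the stated optimum. This is a minimal sketch; the function names are hypothetical, and the 0.5-stage OR-gate term follows from the simplified total $2b + k/b - 3.5$.

```python
import math

def fixed_skip_delay(k, b):
    """Worst-case delay, in stages, of a k-bit carry-skip adder with fixed block width b."""
    ripple_in_block_0 = b - 1
    or_gate = 0.5
    skips = k / b - 2
    ripple_in_last_block = b - 1
    return ripple_in_block_0 + or_gate + skips + ripple_in_last_block

def optimal_block_width(k):
    """Block width that minimizes the delay: sqrt(k / 2)."""
    return math.sqrt(k / 2)
```

For k = 32 the candidate widths 1, 2, 4, 8, and 16 give 30.5, 16.5, 12.5, 16.5, and 30.5 stages, so b = 4 is the best choice among the divisors of k.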

117 Worst Case Delay
Source: Ercegovac and Lang, Digital Arithmetic, pp

118 Worst case in block: C_0 = 0
Worst case in last block: C_12 = 1

119 Carry-Skip Adder with Variable-Width Blocks (1/2)
[Figure: carry-skip adder with block widths b_{t-1}, b_{t-2}, ..., b_1, b_0, showing carry paths (1), (2), and (3) through ripple and skip stages.]
Carry path (2) goes through one fewer skip than (1), so block t - 2 can be one bit wider than block t - 1 without increasing the total delay.
Carry path (3) goes through one fewer skip than (1), so block 1 can be one bit wider than block 0 without increasing the total delay.

120 Carry-Skip Adder with Variable-Width Blocks (2/2)
The total number of bits in the t blocks is k:
$2[b + (b + 1) + \ldots + (b + t/2 - 1)] = t(b + t/4 - 1/2) = k$
$b = k/t - t/4 + 1/2$
$T_{\text{var-skip-add}} = 2(b - 1) + 0.5 + t - 2 = 2k/t + t/2 - 2.5$
$dT/dt = -2k/t^2 + 1/2 = 0 \Rightarrow t_{opt} = 2\sqrt{k}$
$T_{opt} = 2\sqrt{k} - 2.5$ (a factor of $\sqrt{2}$ smaller than for fixed-width blocks)
Note: let b = 1.

121 Multilevel Carry-Skip Adders
[Figure: three carry-skip adders between c_in and c_out: one with level-1 skip logic (S_1) only, one with an added level-2 skip (S_2) over several S_1 blocks, and a simplified version with fewer S_1 blocks under the S_2 skip.]

122 Single-Level Carry-Skip Adder (Example 7.1)
Assumptions: each of the following takes one unit of time: generation of g_i and p_i, generation of a level-i skip signal from level-(i - 1) skip signals, ripple, skip, and formation of a sum bit once the incoming carry is known.
Build the widest possible one-level carry-skip adder with a total delay of 8.
At the right end, block width is limited by the output timing requirement. Stage b_0 takes 2 time units: one for generating g and p, and the other for generating the carry. Stage b_1 cannot be more than 3 bits, because its output is available at time 3, so it can take one time unit for generating g and p and two for propagation across 2 bits.
[Figure: one-level carry-skip adder with blocks b_6 ... b_0 and skip logic S_1, annotated with signal arrival times from c_in (time 0) to c_out (time 8).]

123 At the left end, block width is limited by input timing. Stage b_4 cannot be more than 3 bits, because its input becomes available at time 5 and the total adder delay is to be 8 units.
Max adder width = 18
Generalization of Example 7.1 for total time T (even or odd): the block widths rise to a peak of T/2 (T even) or (T + 1)/2 (T odd) and then fall again.
Thus, for any T, the total width is $\lfloor (T + 1)^2/4 \rfloor - 2$

124 Two-Level Carry-Skip Adder (1/2)
Example 7.2
Given the delay pair {β, α} for a level-2 block in Fig. 7.7a, the number of level-1 blocks that can be accommodated is γ = min(β - 1, α).
Width of the ith level-1 block in the level-2 block characterized by {β, α} is $b_i = \min(\beta - \gamma + i + 1,\ \alpha - i)$; the total block width is then $\sum_{i=0}^{\gamma-1} b_i$
[Figure: single-level carry-skip adder with T_assimilate = α, blocks b_{α-1} ... b_0; single-level carry-skip adder with T_produce = β, blocks b_{β-2} ... b_0.]

125 Two-Level Carry-Skip Adder (2/2)
[Figure: two-level carry-skip adder with total delay 8, built of level-2 blocks F, E, D, C, B, A with {T_produce, T_assimilate} pairs {8, 1}, {7, 2}, {6, 3}, {5, 4}, {4, 5}, {3, 8} and level-2 skip logic S_2.]
Max adder width = 30

126 Carry-Skip Adder Optimization Scheme
[Figure: block of b full-adder units with inputs, timing parameters I(b), G(b), A(b), S_h(b), and E_h(b), and a level-h skip.]

127 Carry-Select Adders
Carry-select adder for k-bit numbers built from three k/2-bit adders.
$C_{\text{select-add}}(k) = 3C_{\text{add}}(k/2) + k/2 + 1$
$T_{\text{select-add}}(k) = T_{\text{add}}(k/2) + 1$
[Figure: one k/2-bit adder computes the low k/2 bits from c_in and produces c_{k/2}; two k/2-bit adders compute the high half for carry-in 0 and 1; a (k/2 + 1)-bit mux controlled by c_{k/2} selects the high k/2 bits and c_out.]
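A behavioral sketch of the carry-select idea follows. It is illustrative only: the function name is hypothetical, Python integer addition stands in for each k/2-bit adder, and k is assumed to be even with both operands less than $2^k$.

```python
def carry_select_add(x, y, k, c_in=0):
    """Add two k-bit unsigned integers using three k/2-bit additions and a selection."""
    half = k // 2
    mask = (1 << half) - 1
    # Low half: an ordinary k/2-bit addition with the real carry-in.
    low = (x & mask) + (y & mask) + c_in
    c_half = low >> half
    # High half: computed twice, once for each possible incoming carry.
    high_if_0 = (x >> half) + (y >> half)
    high_if_1 = (x >> half) + (y >> half) + 1
    # The mux: c_{k/2} selects the (k/2 + 1)-bit result (sum bits plus carry-out).
    high = high_if_1 if c_half else high_if_0
    total = (low & mask) | ((high & mask) << half)
    return total, high >> half
```

The two high-half additions do not wait for c_{k/2}, which is why the delay is that of one k/2-bit adder plus one mux level.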

128 Two-Level Carry-Select Adder Built of k/4-bit Adders
[Figure: the low k/4 bits come from one k/4-bit adder with carry-in c_in; the middle k/4 bits and the high k/2 bits come from pairs of k/4-bit adders with carry-in 0 and 1, selected by muxes controlled by c_{k/4} and c_{k/2}. The high half is a k/2-bit conditional-sum structure.]

129 Conditional Adder
Source: Ercegovac and Lang, Digital Arithmetic, p. 86

130 Carry-Select Adder
Source: Ercegovac and Lang, Digital Arithmetic, p. 87

131 Conditional-Sum Adder
Source: Ercegovac and Lang, Digital Arithmetic, p. 87

132 16-Bit Conditional-Sum Adder
The same as the corresponding figure in the textbook.
Source: Ercegovac and Lang, Digital Arithmetic, p. 89

133 Conditional-Sum Adder
Multilevel carry-select idea carried out to the extreme (to 1-bit blocks).
$C(k) \approx 2C(k/2) + k + 2 \approx k(\log_2 k + 2) + kC(1)$
$T(k) = T(k/2) + 1 = \log_2 k + T(1)$
where C(1) and T(1) are the cost and delay of the circuit for deriving the sum and carry bits with a carry-in of 0 and 1.
k + 2 is an upper bound on the number of single-bit 2-to-1 multiplexers needed for combining two k/2-bit adders into a k-bit adder.
[Figure: one-bit circuit taking x_i and y_i and producing s_i and c_{i+1} for c_i = 1 and for c_i = 0.]

134 A Hybrid Carry-Lookahead/Carry-Select Adder
The most popular hybrid addition scheme:
[Figure: a lookahead carry generator, fed by block g and p signals and c_in, drives the muxes of several carry-select blocks and produces c_out.]

135 Summary
Source: Ercegovac and Lang, Digital Arithmetic, p. 114.

136 A Hybrid Ripple-Carry/Carry-Lookahead Design
[Figure: 16-bit carry-lookahead adders, each built around a 4-bit lookahead carry generator (with carry-out) fed by g_{[0,3]}, p_{[0,3]} through g_{[12,15]}, p_{[12,15]}, chained in ripple fashion through c_16, c_32, and c_48.]
Any two addition schemes can be combined.
Other possibilities:
hybrid carry-select/ripple-carry
hybrid ripple-carry/carry-select
...

137 Optimizations in Fast Adders
What looks best at the block diagram or gate level may not be best when a circuit-level design is generated (effects of wire length, signal loading, ...)
Modern practice: optimization at the transistor level
Variable-block carry-lookahead adder
Optimizations for average or peak power consumption
Timing-based optimizations (next slide)

138 Multioperand Addition
Chapter Goals
Learn methods for speeding up the addition of several numbers (needed for multiplication or inner-product)
Chapter Highlights
Running total kept in redundant form
Current total + Next number → New total
Deferred carry assimilation
Wallace/Dadda trees and parallel counters

139 Some Applications of Multioperand Addition
[Figure: multioperand addition problems for multiplication or inner-product computation in dot notation.]

140 Serial Implementation with One Adder
[Figure: the k-bit operand x^{(i)} and the partial sum register (k + log_2 n bits, holding $\sum_{j=0}^{i-1} x^{(j)}$) feed a single adder.]
$T_{\text{serial-multi-add}} = O(n \log(k + \log n)) = O(n \log k + n \log\log n)$
Therefore, addition time grows superlinearly with n when k is fixed and logarithmically with k for a given n.

141 Pipelined Adder
Source: Ercegovac and Lang, Digital Arithmetic, p. 166.

142 Parallel Implementation as Tree of Adders
Adding 7 numbers in a binary tree of adders: n - 1 adders in $\lceil \log_2 n \rceil$ adder levels, with operand widths growing from k to k + 3.
$T_{\text{tree-fast-multi-add}} = O(\log k + \log(k + 1) + \ldots + \log(k + \lceil \log_2 n \rceil - 1)) = O(\log n \log k + \log n \log\log n)$
$T_{\text{tree-ripple-multi-add}} = O(k + \log n)$ [Justified on the next slide]

143 Elaboration on Tree of Ripple-Carry Adders
$T_{\text{tree-ripple-multi-add}} = O(k + \log n)$
Fig. 8.5 Ripple-carry adders at levels i and i + 1 in the tree of adders used for multi-operand addition.
The absolute best latency that we can hope for is O(log k + log n).
There are kn data bits to process, and using any set of computation elements with constant fan-in, this requires O(log(kn)) time.
We will see shortly that carry-save adders achieve this optimum time.

144 Carry-Save Adders
[Figure: a ripple-carry adder (a chain of full adders) becomes a carry-save adder when the carry chain is cut, so that each full adder works independently.]
Carry-propagate adder: takes c_in and produces c_out.
Carry-save adder (CSA), also called a (3; 2)-counter or 3-to-2 reduction circuit.
Specifying full- and half-adder blocks, with their inputs and outputs, in dot notation.

145 Example of CSA
Reduction by row: (3:2) counter.
Also considered as reduction by column: [3:2].
[p:q] counter: takes p bits of the same weight and produces q bits of adjacent weights.

146 Use Dot Notation
[Figure: carry-propagate adder and carry-save adder ((3; 2)-counter or 3-to-2 reduction circuit) in dot notation.]

147 Multioperand Addition Using Carry-Save Adders
$T_{\text{carry-save-multi-add}} = O(\text{tree height} + T_{CPA}) = O(\log n + \log k)$
$C_{\text{carry-save-multi-add}} = (n - 2)\,C_{CSA} + C_{CPA}$
[Figure: serial carry-save addition using a single CSA, with sum and carry registers followed by a carry-propagate adder.]
[Figure: tree of carry-save adders reducing seven numbers to two, followed by a CPA.]
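The reduction of several operands to two by carry-save adders can be modeled on integers. The sketch below is an assumption-laden illustration: the function names are hypothetical, operands are non-negative Python integers of unbounded width, operands are grouped in threes at each level, and ordinary addition stands in for the final carry-propagate adder.

```python
def carry_save_add(a, b, c):
    """3-to-2 reduction: return (sum word, carry word) with a + b + c == sum + carry."""
    s = a ^ b ^ c                                    # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1       # majority bits, moved one position left
    return s, carry

def csa_tree_add(operands):
    """Return (total, CSA levels, CSA count) for a list of non-negative integers."""
    ops = list(operands)
    levels = csa_count = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        next_ops = []
        for i in range(0, full, 3):                  # one CSA per group of three operands
            next_ops.extend(carry_save_add(*ops[i:i + 3]))
            csa_count += 1
        next_ops.extend(ops[full:])                  # leftover operands pass to the next level
        ops = next_ops
        levels += 1
    return sum(ops), levels, csa_count               # sum() plays the role of the CPA
```

Each CSA reduces the operand count by one, so n operands always need n - 2 CSAs; for seven operands this grouping uses 5 CSAs in 4 levels.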

148 Reduction by a CSA Tree
Addition of seven 6-bit numbers in dot notation, and the same seven-operand addition in tabular form:
12 FAs, then 6 FAs, then 6 FAs, then 4 FAs + 1 HA, then a 7-bit carry-propagate adder.
Total cost = 7-bit adder + 28 FAs + 1 HA
A full-adder compacts 3 dots into 2 (compression ratio of 1.5).
A half-adder rearranges 2 dots (no compression, but still useful).

149 Width of Adders in a CSA Tree
Adding seven k-bit numbers and the CSA/CPA widths required.
The index pair [i, j] means that bit positions from i up to j are involved.
Bit k + 1 does not involve addition.
Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels.
[Figure: tree of k-bit CSAs with input and output ranges such as [0, k-1], [1, k], and [2, k+1], ending in a k-bit CPA and a (k + 3)-bit result.]

150 Wallace and Dadda Trees
An h-level CSA tree reduces n inputs to 2 outputs.
$h(n) = 1 + h(\lceil 2n/3 \rceil)$
$n(h) = \lfloor 3n(h - 1)/2 \rfloor$
$2 \times 1.5^{h-1} < n(h) \le 2 \times 1.5^{h}$
Table 8.1 The maximum number n(h) of inputs for an h-level CSA tree
n(h): maximum number of inputs for h levels
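Both recurrences are simple to evaluate. The following sketch uses hypothetical function names and assumes the base case n(0) = 2, i.e. a zero-level tree passes two operands straight to the final adder.

```python
def max_inputs(h):
    """n(h): maximum number of inputs an h-level CSA tree can reduce to two."""
    n = 2                       # assumed base case n(0) = 2
    for _ in range(h):
        n = (3 * n) // 2        # n(h) = floor(3 n(h-1) / 2)
    return n

def tree_height(n):
    """h(n): number of CSA levels needed to reduce n inputs to two."""
    h = 0
    while n > 2:
        n = (2 * n + 2) // 3    # ceil(2n/3) operands remain after one level
        h += 1
    return h
```

Under that base case, the first few values of n(h) for h = 0, 1, 2, ... are 2, 3, 4, 6, 9, 13, 19, 28, and seven operands need a 4-level tree.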

151 Wallace and Dadda Reduction Trees
Wallace tree: reduce the number of operands at the earliest possible opportunity.
Addition of seven 6-bit numbers using Wallace's strategy: 12 FAs, 6 FAs, 6 FAs, 4 FAs + 1 HA, 7-bit adder. Total cost = 7-bit adder + 28 FAs + 1 HA
Dadda tree: postpone the reduction to the extent possible without causing added delay.
Adding seven 6-bit numbers using Dadda's strategy: 6 FAs, 11 FAs, 7 FAs, 4 FAs + 1 HA, 7-bit adder. Total cost = 7-bit adder + 28 FAs + 1 HA

152 A Small Optimization in Reduction Trees
Adding seven 6-bit numbers using Dadda's strategy: 6 FAs, 11 FAs, 7 FAs, 4 FAs + 1 HA, 7-bit adder. Total cost = 7-bit adder + 28 FAs + 1 HA
Taking advantage of the final adder's carry-in: 6 FAs, 11 FAs, 6 FAs + 1 HA, 3 FAs + 2 HAs, 7-bit adder. Total cost = 7-bit adder + 26 FAs + 3 HAs

153 Parallel Counters
1-bit full-adder = (3; 2)-counter
Circuit reducing 7 bits to their 3-bit sum = (7; 3)-counter
Circuit reducing n bits to their $\lceil \log_2(n + 1) \rceil$-bit sum = $(n;\ \lceil \log_2(n + 1) \rceil)$-counter
[Figure: a 10-input parallel counter, also known as a (10; 4)-counter, built of full adders, half adders, and a 3-bit ripple-carry adder.]

154 Implementation of [4:2] Counter
Source: Ercegovac and Lang, Digital Arithmetic, p. 145.

155 Implementation of [5:2] Counter
Source: Ercegovac and Lang, Digital Arithmetic, p. 146.

156 Implementation of [7:2] Counter
Source: Ercegovac and Lang, Digital Arithmetic, p. 146.

157 Generalized Parallel Counters
Multicolumn reduction
Dot notation for a (5, 5; 4)-counter and the use of such counters for reducing five numbers to two numbers.
Unequal columns: (2, 3; 3)-counter
Generalized parallel counter = parallel compressor

158 A General Strategy for Column Compression
(n; 2)-counters
[Figure: one circuit slice for column i, with n inputs, transfers ψ_1, ψ_2, ψ_3 going to columns i + 1, i + 2, i + 3, and ψ_1, ψ_2, ψ_3 arriving from columns i - 1, i - 2, i - 3.]
$n + \psi_1 + \psi_2 + \psi_3 + \ldots \le 3 + 2\psi_1 + 4\psi_2 + 8\psi_3 + \ldots$
$n - 3 \le \psi_1 + 3\psi_2 + 7\psi_3 + \ldots$
Example: design a bit-slice of an (11; 2)-counter.
Solution: let's limit transfers to two stages. Then $8 \le \psi_1 + 3\psi_2$.
Possible choices include $\psi_1 = 5$, $\psi_2 = 1$ or $\psi_1 = \psi_2 = 2$.

159 Multiplication Instructor: Kuan Jen Lin Dept. of EE, FJU, Taiwan Room: SF 727B Most slides originate from the textbook author's PowerPoint presentation files. Computer Arithmetic 3, Dept. of EE, Fu Jen Catholic University, Taiwan 1

160 III Multiplication
Review multiplication schemes and various speedup methods
Multiplication is heavily used (in arithmetic and array indexing)
Division = reciprocation + multiplication
Multiplication speedup: high-radix, tree, ...
Bit-serial, modular, and array multipliers
Topics in This Part
Chapter 9 Basic Multiplication Schemes
Chapter 10 High-Radix Multipliers
Chapter 11 Tree and Array Multipliers
Chapter 12 Variations in Multipliers

161 9 Basic Multiplication Schemes
Chapter Goals
Study shift/add or bit-at-a-time multipliers and set the stage for faster methods and variations to be covered in later chapters
Chapter Highlights
Multiplication = multioperand addition
Hardware, firmware, software algorithms
Multiplying 2's-complement numbers
The special case of one constant operand

162 Shift/Add Multiplication Algorithms
Notation for our discussion of multiplication algorithms:
a   Multiplicand        a_{k-1} a_{k-2} ... a_1 a_0
x   Multiplier          x_{k-1} x_{k-2} ... x_1 x_0
p   Product (a × x)     p_{2k-1} p_{2k-2} ... p_3 p_2 p_1 p_0
Initially, we assume unsigned operands.
[Figure: multiplication of two 4-bit unsigned binary numbers in dot notation: multiplicand a, multiplier x, the partial products bit-matrix x_0 a 2^0, x_1 a 2^1, x_2 a 2^2, x_3 a 2^3, and the product p.]

163 Multiplication Recurrence
[Figure: multiplicand a, multiplier x, the partial products bit-matrix x_0 a 2^0 through x_3 a 2^3, and the product p.]
Multiplication with right shifts (preferred): top-to-bottom accumulation
$p^{(j+1)} = (p^{(j)} + x_j\, a\, 2^k)\, 2^{-1}$ with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} 2^{-k}$
(add, then shift right)
Multiplication with left shifts: bottom-to-top accumulation
$p^{(j+1)} = 2p^{(j)} + x_{k-j-1}\, a$ with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} 2^{k}$
(shift, then add)
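Both recurrences can be traced in a few lines of Python. This is a sketch for unsigned k-bit operands, with hypothetical function names; the partial product is held in a single unbounded integer rather than in a double-width register.

```python
def multiply_right_shift(a, x, k):
    """p(j+1) = (p(j) + x_j * a * 2^k) / 2, starting from p(0) = 0."""
    p = 0
    for j in range(k):
        x_j = (x >> j) & 1               # multiplier bits, least significant first
        p = (p + x_j * (a << k)) >> 1    # add a aligned to the upper half, then shift right
    return p

def multiply_left_shift(a, x, k):
    """p(j+1) = 2 * p(j) + x_(k-j-1) * a, starting from p(0) = 0."""
    p = 0
    for j in range(k):
        x_bit = (x >> (k - j - 1)) & 1   # multiplier bits, most significant first
        p = 2 * p + x_bit * a            # shift left, then add
    return p
```

In the right-shift version the addition always lands in the upper k bits of the partial product, so a k-bit adder suffices in hardware, whereas the left-shift version needs a 2k-bit adder.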

164 Examples of Basic Multiplication
[Table: step-by-step traces of the right-shift algorithm and the left-shift algorithm for 4-bit operands a and x, listing p(0), the added terms x_j a, and the partial products p(1) through p(4).]

165 Programmed Multiplication Using Right-Shift Algorithm
{Using right shifts, multiply unsigned m_cand and m_ier,
 storing the resultant 2k-bit product in p_high and p_low.
 Registers: R0 holds 0       Rc for counter
            Ra for m_cand    Rx for m_ier
            Rp for p_high    Rq for p_low}
{Load operands into registers Ra and Rx}
mult:    load   Ra with m_cand
         load   Rx with m_ier
{Initialize partial product and counter}
         copy   R0 into Rp
         copy   R0 into Rq
         load   k into Rc
{Begin multiplication loop}
m_loop:  shift  Rx right 1       {LSB moves to carry flag}
         branch no_add if carry = 0
         add    Ra to Rp         {carry flag is set to cout}
no_add:  rotate Rp right 1       {carry to MSB, LSB to carry}
         rotate Rq right 1       {carry to MSB, LSB to carry}
         decr   Rc               {decrement counter by 1}
         branch m_loop if Rc ≠ 0
{Store the product}
         store  Rp into p_high
         store  Rq into p_low
m_done:  ...
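The register-level program can be mirrored in Python to see how the carry flag links the two rotate instructions. This is an illustrative model only: the function name is hypothetical, registers are modeled as k-bit integers, and the carry flag is an ordinary variable.

```python
def programmed_multiply(m_cand, m_ier, k):
    """Return (p_high, p_low), the two k-bit halves of the 2k-bit product."""
    mask = (1 << k) - 1
    ra, rx = m_cand & mask, m_ier & mask
    rp = rq = 0
    for _ in range(k):                      # Rc counts k iterations
        carry = rx & 1                      # shift Rx right 1: LSB moves to carry flag
        rx >>= 1
        if carry:                           # otherwise branch to no_add with carry = 0
            total = rp + ra                 # add Ra to Rp
            rp, carry = total & mask, total >> k    # carry flag is set to cout
        # rotate Rp right 1: carry to MSB, LSB to carry
        rp, carry = (carry << (k - 1)) | (rp >> 1), rp & 1
        # rotate Rq right 1: carry to MSB (the LSB that moves to carry is unused)
        rq = (carry << (k - 1)) | (rq >> 1)
    return rp, rq
```

For example, with k = 4, multiplying 10 by 11 leaves 6 in the high register and 14 in the low register, i.e. 6 × 16 + 14 = 110.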

166 Time Complexity of Programmed Multiplication
Assume k-bit words
k iterations of the main loop
6-7 instructions per iteration, depending on the multiplier bit
Thus, 6k + 3 to 7k + 3 machine instructions, ignoring operand loads and result store
k = 32 implies between 195 and 227 instructions (from 6k + 3 and 7k + 3)
This is too slow for many modern applications!
Microprogrammed multiply would be somewhat better

167 Sequential Multiplication with Right Shifts Shift Multiplier Doublewidth partial product p Shift (j) Hardware realization Multiplicand a Mu k j Clock? a j k Control path? c out Adder k Computer Arithmetic 3, Dept. of EE, Fu Jen Catholic University, Taiwan 9

168 Sequential Multiplication with Left Shifts [Figure: hardware realization. The multiplier x and the double-width partial product p(j) are held in shift registers. Multiplier bit x_{k−j−1} controls a mux that passes 0 or the multiplicand a to a 2k-bit adder, whose output and c_out go back into the partial product.]

169 Multiplication of Signed Numbers Negative multiplicand, positive multiplier: no change, other than looking out for proper sign extension. [Worked example: partial products p(0) through p(5), with x_j a added at each step. Bit values not recoverable.]

170 Multiplication with a Negative Multiplier Negative multiplicand, negative multiplier: in the last step (the sign bit), subtract rather than add. [Worked example, in which the bit pattern 10101 appears among the operands: partial products p(0) through p(5), with x_j a added at each step and (−x_4 a) applied in the last step. Remaining bit values not recoverable.]
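
The rule "subtract in the sign-bit step" follows from the negative weight of the sign bit in 2's complement. A minimal sketch, assuming a hypothetical helper multiply_twos_complement that takes ordinary Python integers in the k-bit 2's-complement range (Python integers sign-extend automatically, so no explicit sign extension is needed here):

```python
def multiply_twos_complement(a: int, x: int, k: int) -> int:
    """Multiply signed a by signed x, scanning the k-bit 2's-complement pattern of x."""
    lo, hi = -(1 << (k - 1)), (1 << (k - 1)) - 1
    if not (lo <= a <= hi and lo <= x <= hi):
        raise ValueError("operands must fit in k-bit 2's complement")
    x_bits = x & ((1 << k) - 1)           # k-bit pattern of the multiplier
    p = 0
    for j in range(k):
        if (x_bits >> j) & 1:
            term = a << j                 # x_j * a * 2^j
            if j == k - 1:
                p -= term                 # sign bit has weight -2^(k-1): subtract
            else:
                p += term
    return p


print(multiply_twos_complement(-10, -11, 5))
```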

171 Booth's Recoding
x_i   x_{i−1}   y_i   Explanation
0     0         0     No string of 1s in sight
0     1         1     End of string of 1s in x
1     0         −1    Beginning of string of 1s in x
1     1         0     Continuation of string of 1s in x
[Example: an operand x and its recoded version y. Digit values not recoverable.]
Justification: 2^j + 2^(j−1) + ... + 2^(i+1) + 2^i = 2^(j+1) − 2^i
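
The table reduces to the single rule y_i = x_{i−1} − x_i, with x_{−1} = 0. The sketch below applies that rule; the name booth_recode and the least-significant-first digit list are illustrative choices, not part of the slides.

```python
def booth_recode(x: int, k: int) -> list[int]:
    """Return Booth digits y_0 .. y_{k-1}, each in {-1, 0, 1}, for a k-bit pattern x."""
    bits = [(x >> i) & 1 for i in range(k)]
    digits = []
    for i in range(k):
        x_prev = bits[i - 1] if i > 0 else 0   # x_{-1} is taken as 0
        digits.append(x_prev - bits[i])        # y_i = x_{i-1} - x_i
    return digits


def digits_value(digits: list[int]) -> int:
    """Value of a least-significant-first radix-2 signed-digit list."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


# The recoded digits represent the 2's-complement value of the pattern:
# 0b10101 in 5 bits is -11.
print(booth_recode(0b10101, 5), digits_value(booth_recode(0b10101, 5)))
```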

172 Example Multiplication with Booth's Recoding [Worked example: the multiplier x is Booth-recoded into y, and partial products p(0) through p(5) are formed by adding y_j a at each step; the slide also lists the 2's complement of an operand for the subtraction steps. Bit values not recoverable.]

173 Multiplication by Constants Explicit, e.g. y := ... (the constant expression is missing). Implicit, e.g. A[i, j] := A[i, j] + B[i, j], where the address of A[i, j] = base + n × i + j for an m × n array (row i, column j). Software aspects: Optimizing compilers replace multiplications by shifts/adds/subs. Produce efficient code using as few registers as possible. Find the best code by a time/space-efficient algorithm. Hardware aspects: Synthesize special-purpose units such as filters, e.g. y[t] = a_0 x[t] + a_1 x[t−1] + a_2 x[t−2] + b_1 y[t−1] + b_2 y[t−2].

174 Multiplication Using Binary Expansion Example: Multiply R1 by the constant 113 = (1 1 1 0 0 0 1)two
R2   ← R1 shift-left 1
R3   ← R2 + R1
R6   ← R3 shift-left 1
R7   ← R6 + R1
R112 ← R7 shift-left 4
R113 ← R112 + R1
Ri: register that contains i times (R1). This notation is for clarity; only one register other than R1 is needed.
Shorter sequence using shift-and-add instructions:
R3   ← R1 shift-left 1 + R1
R7   ← R3 shift-left 1 + R1
R113 ← R7 shift-left 4 + R1
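
The shift/add sequence generalizes to any positive constant by scanning its binary expansion from the most significant bit. A small sketch, assuming a hypothetical function multiply_by_constant; as on the slide, a single accumulator besides R1 is enough.

```python
def multiply_by_constant(r1: int, c: int) -> int:
    """Compute c * r1 with shifts and adds, scanning the bits of c from the MSB down."""
    if c <= 0:
        raise ValueError("c must be a positive constant")
    acc = r1                              # accounts for the leading 1 of c
    for bit in bin(c)[3:]:                # remaining bits after the leading 1
        acc <<= 1                         # shift-left 1
        if bit == "1":
            acc += r1                     # add R1
    return acc


print(multiply_by_constant(5, 113))
```

For c = 113 the loop performs the same six shifts (1 + 1 + 4) and three adds as the register sequence above; consecutive shifts with no intervening add can be merged into one multi-bit shift.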

175 Multiplication via Recoding Example: Multiply R1 by 113 = (1 1 1 0 0 0 1)two = (1 0 0 −1 0 0 0 1)two
R8   ← R1 shift-left 3
R7   ← R8 − R1
R112 ← R7 shift-left 4
R113 ← R112 + R1
Shorter sequence using shift-and-add/subtract instructions:
R7   ← R1 shift-left 3 − R1
R113 ← R7 shift-left 4 + R1
6 shift or add (3 shift-and-add) instructions are needed without recoding.

176 Multiplication via Factorization Example: Multiply R1 by 119 = 7 × 17 = (8 − 1) × (16 + 1)
R8   ← R1 shift-left 3
R7   ← R8 − R1
R112 ← R7 shift-left 4
R119 ← R112 + R7
Shorter sequence using shift-and-add/subtract instructions:
R7   ← R1 shift-left 3 − R1
R119 ← R7 shift-left 4 + R7
Requires a scratch register for holding the multiple 7 × R1.
119 = (1 1 1 0 1 1 1)two = (1 0 0 0 −1 0 0 −1)two. More instructions may be needed without factorization.
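
The two short sequences, for 113 via recoding and for 119 via factorization, translate directly into code. The function names below are illustrative only; each line corresponds to one shift-and-add/subtract instruction.

```python
def times_113_recoded(r1: int) -> int:
    """113 = 128 - 16 + 1, evaluated as ((r1 << 3) - r1) << 4, plus r1."""
    r7 = (r1 << 3) - r1                   # R7   <- R1 shift-left 3 - R1
    return (r7 << 4) + r1                 # R113 <- R7 shift-left 4 + R1


def times_119_factored(r1: int) -> int:
    """119 = 7 * 17 = (8 - 1) * (16 + 1); r7 plays the role of the scratch register."""
    r7 = (r1 << 3) - r1                   # R7   <- R1 shift-left 3 - R1
    return (r7 << 4) + r7                 # R119 <- R7 shift-left 4 + R7


print(times_113_recoded(3), times_119_factored(3))
```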

177 High-Radix Multipliers Chapter Goals: Study techniques that allow us to handle more than one multiplier bit in each cycle (two bits in radix 4, three in radix 8, ...). Chapter Highlights: High radix gives rise to difficult multiples. Recoding (change of digit-set) as remedy. Carry-save addition reduces cycle time. Implementation and optimization methods.

178 Radix-4 Multiplication in Dot Notation [Figure: radix-2 dot diagram with multiplicand a, multiplier x, the partial-products bit-matrix x_0 a 2^0, x_1 a 2^1, x_2 a 2^2, x_3 a 2^3, and product p; next to it, the radix-4 diagram with partial products (x_1 x_0)two a 4^0 and (x_3 x_2)two a 4^1.] Radix-4, or two-bit-at-a-time, multiplication in dot notation. The number of cycles is halved, but now the difficult multiple 3a must be dealt with.

179 A Possible Design for a Radix-4 Multiplier [Figure: the multiplier register shifts 2 bits per cycle; bits x_{i+1} x_i control a mux that selects 0, a, 2a, or 3a to send to the adder. The multiple 3a is precomputed via shift-and-add (3a = 2a + a).]

180 Example Radix-4 Multiplication Using 3a [Worked example: p(0) + (x_1 x_0)two a gives p(1), and p(1) + (x_3 x_2)two a gives p(2). Bit values not recoverable.]

181 A Second Design for a Radix-4 Multiplier [Figure: a mux selects 0, a, 2a, or −a for the adder; the multiplier register shifts 2 bits per cycle, and a carry flip-flop holds c.] The design replaces 3a with 4a (a carry into the next higher radix-4 multiplier digit) and −a. The mux control is (x_{i+1} x_i)two + c mod 4. The carry is set if x_{i+1} = x_i = 1 or x_{i+1} = c = 1, that is, if x_{i+1}(x_i ∨ c) = 1.
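
The carry rule can be checked with a short sketch that recodes a multiplier into radix-4 digits from {−1, 0, 1, 2} plus a final carry. The function name and the returned (digits, carry) pair are assumptions made for illustration; k is assumed even.

```python
def recode_radix4_with_carry(x: int, k: int) -> tuple[list[int], int]:
    """Recode unsigned k-bit x (k even) into digits in {-1, 0, 1, 2}, LS digit first."""
    digits, c = [], 0
    for i in range(0, k, 2):
        x_i = (x >> i) & 1
        x_i1 = (x >> (i + 1)) & 1
        value = 2 * x_i1 + x_i + c        # 0 .. 4
        c = x_i1 & (x_i | c)              # carry set iff value >= 3
        digits.append(value - 4 * c)      # 3 -> -1 and 4 -> 0, each with a carry out
    return digits, c


digits, carry = recode_radix4_with_carry(0b111011, 6)
print(digits, carry)
```

A final carry of 1 stands for one extra digit of weight 4^(k/2), so a multiplier built this way needs one more add step when it occurs.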

182 Radix-4 Booth's Recoding
x_{i+1}  x_i  x_{i−1}   y_{i+1}  y_i   z_{i/2}   Explanation
0        0    0         0        0     0         No string of 1s in sight
0        0    1         0        1     1         End of string of 1s
0        1    0         1        −1    1         Isolated 1
0        1    1         1        0     2         End of string of 1s
1        0    0         −1       0     −2        Beginning of string of 1s
1        0    1         −1       1     −1        End a string, begin new one
1        1    0         0        −1    −1        Beginning of string of 1s
1        1    1         0        0     0         Continuation of string of 1s
Columns: context (x_{i+1}, x_i, x_{i−1}); recoded radix-2 digits (y_{i+1}, y_i); radix-4 digit (z_{i/2}). Only shifting and complementation are required to form the multiples.
[Example: an operand x, its recoded version y, and its radix-4 version z. Digit values not recoverable.]
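
Every row of the table satisfies z_{i/2} = −2 x_{i+1} + x_i + x_{i−1}. The sketch below uses that closed form; the name radix4_booth_digits is illustrative, k is assumed even, and x is the k-bit 2's-complement pattern of the multiplier.

```python
def radix4_booth_digits(x: int, k: int) -> list[int]:
    """Radix-4 Booth digits z in [-2, 2], least-significant digit first (k even)."""
    bits = [(x >> i) & 1 for i in range(k)]
    digits = []
    for i in range(0, k, 2):
        x_prev = bits[i - 1] if i > 0 else 0      # x_{-1} is taken as 0
        digits.append(-2 * bits[i + 1] + bits[i] + x_prev)
    return digits


z = radix4_booth_digits(0b100110, 6)              # 6-bit pattern of -26
print(z, sum(d * 4**j for j, d in enumerate(z)))
```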

183 Example Multiplication via Modified Booth's Recoding [Worked example: the multiplier is recoded into radix-4 digits z_1 z_0; p(0) + z_0 a gives p(1), and p(1) + z_1 a gives p(2). Bit values not recoverable.]

184 Multiple Generation with Radix-4 Booth's Recoding [Figure: the multiplier register, with an extra bit initialized to 0, shifts 2 bits per cycle. Recoding logic examines x_{i+1}, x_i, x_{i−1} and produces the signals neg, two, and non0 (the signal two could have been named one/two). non0 enables and two selects a mux fed with a and 2a, giving 0, a, or 2a on k + 1 bits (the top bit is a sign extension, not 0). neg is the add/subtract control, so that z_{i/2} a reaches the adder input.]

185 Using Carry-Save Adders [Figure: multiplier bits x_{i+1} and x_i control two muxes that select 0 or 2a and 0 or a. A CSA combines the two selected multiples with the old cumulative partial product, and an adder produces the new cumulative partial product.]

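186 Keeping the Partial Product in Carry-Save Form [Figure: a mux controlled by the multiplier selects 0 or the multiplicand as the next multiple. A k-bit CSA adds it to the old partial product, kept as separate sum and carry words, to give the new partial product, which is then shifted. A k-bit adder completes the result. Dot-notation inset: old PP plus next multiple gives the CS sum, which becomes the new PP.]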

187 Carry-Save Multiplier with Radix-4 Booth's Recoding (1/2) [Figure: a Booth recoder and selector examines multiplier bits x_{i+1}, x_i, x_{i−1} and supplies z_{i/2} a, together with an extra dot, to a CSA that also receives the old cumulative partial product. An adder produces the new cumulative partial product, while a 2-bit adder with a carry flip-flop sends two bits to the lower half of the partial product.]

188 Carry-Save Multiplier with Radix-4 Booth's Recoding (2/2) [Figure: recoding logic examines x_{i+1}, x_i, x_{i−1} and produces neg, two, and non0. A mux with enable and select inputs yields 0, a, or 2a on k + 1 bits; a selective-complement stage then yields 0, ±a, or ±2a on k + 2 bits as z_{i/2} a, plus an extra "dot" for column i.]

189 Another Design for Radix-4 Multiplication [Figure: multiplier bits x_{i+1} and x_i control two muxes that select 0 or 2a and 0 or a. Two CSA levels combine these multiples with the old cumulative partial product to give the new cumulative partial product, while a 2-bit adder with a carry flip-flop sends two bits to the lower half of the partial product.]

190 Radix-8 and Radix-16 Multipliers [Figure: radix-16 design. The multiplier register shifts right 4 bits per cycle; bits x_{i+3}, x_{i+2}, x_{i+1}, x_i control four muxes that select 0 or 8a, 0 or 4a, 0 or 2a, and 0 or a. Four CSA levels accumulate them into the sum and carry words of the partial product (upper half), while a 4-bit adder with a carry flip-flop sends 4 bits to the lower half of the partial product.]

191 A Spectrum of Multiplier Design Choices [Figure: three design points. Basic binary: the next multiple and the partial product feed an adder. High-radix or partial tree: several multiples and the partial product feed a small CSA tree followed by an adder. Full tree: all multiples feed a full CSA tree followed by an adder. Moving toward the full tree speeds up; moving toward basic binary economizes.]

192 VLSI Complexity Issues A radix-2^b multiplier requires: bk two-input AND gates to form the partial-products bit-matrix; O(bk) area for the CSA tree; at least Θ(k) area for the final carry-propagate adder. Total area: A = O(bk). Latency: T = O((k/b) log b + log k). Any VLSI circuit computing the product of two k-bit integers must satisfy the following constraints: AT grows at least as fast as k^(3/2); AT^2 is at least proportional to k^2. The preceding radix-2^b implementations are suboptimal, because: AT = O(k^2 log b + bk log k) and AT^2 = O((k^3/b) log^2 b).

193 Comparing High- and Low-Radix Multipliers AT = O(k^2 log b + bk log k), AT^2 = O((k^3/b) log^2 b)
        Low-cost, b = O(1)   High-speed, b = O(k)   AT- or AT^2-optimal
AT      O(k^2)               O(k^2 log k)           O(k^(3/2))
AT^2    O(k^3)               O(k^2 log^2 k)         O(k^2)
Intermediate designs do not yield better AT or AT^2 values; the multipliers remain asymptotically suboptimal for any b. By the AT measure (an indicator of cost-effectiveness), slower radix-2 multipliers are better than high-radix or tree multipliers. Thus, when an application requires many independent multiplications, it is more cost-effective to use a large number of slower multipliers. High-radix multiplier latency can be reduced from O((k/b) log b + log k) to O(k/b + log k) through more effective pipelining (Chapter 11).

194 Tree and Array Multipliers Chapter Goals: Study the design of multipliers for highest possible performance (speed, throughput). Chapter Highlights: Tree multiplier = reduction tree + redundant-to-binary converter. Avoiding full sign extension in multiplying signed numbers. Array multiplier = one-sided reduction tree + ripple-carry adder.

195 Full-Tree Multipliers [Figure: multiple-forming circuits take the multiplicand a and the multiplier and feed a partial-products reduction tree (multi-operand addition tree). Its redundant result goes to a redundant-to-binary converter, which yields the higher-order product bits. Some lower-order product bits are generated directly.]

196 Full-Tree versus Partial-Tree Multiplier [Figure: in the full-tree design, all partial products enter a large tree of carry-save adders (log depth), followed by an adder (log depth) that yields the product. In the partial-tree design, several partial products at a time enter a small tree of carry-save adders, followed by an adder that yields the product.]

197 Variations in Full-Tree Multiplier Design Designs are distinguished by variations in three elements: 1. Multiple-forming circuits. 2. Partial-products reduction tree (multi-operand addition tree). 3. Redundant-to-binary converter. [Figure: the full-tree multiplier block diagram with these three elements labeled; the converter yields the higher-order product bits, and some lower-order product bits are generated directly.]

198 Example of Variations in CSA Tree Design [Figure: two different binary 4 × 4 tree multipliers. Wallace tree: 5 FAs + 3 HAs + 4-bit adder. Dadda tree: 4 FAs + 2 HAs + 6-bit adder. Slide annotation: "Latency!!"]

199 A 7 × 7 Tree Multiplier [Figure: seven partial products occupying bit positions [0, 6], [1, 7], [2, 8], [3, 9], [4, 10], [5, 11], and [6, 12] are reduced by four 7-bit CSAs and one 10-bit CSA, followed by a 10-bit CPA; some output positions are marked "Ignore".] The index pair [i, j] means that bit positions from i up to j are involved.

200 Balanced-Delay Tree for 11 Inputs [Figure: a one-column tree of full adders that reduces 11 inputs, with level-1, level-2, and level-3 carries and a level-4 carry entering at the matching depths.] 11 + ψ_1 = 2ψ_1 + 3. Therefore, ψ_1 = 8 carries are needed.

201 Binary Tree of 4-to-2 Reduction Modules [Figure: a binary tree of 4-to-2 modules, and a 4-to-2 reduction module implemented with two levels of (3; 2)-counters, that is, two CSAs.] Due to its recursive structure, a binary tree is more regular than a 3-to-2 reduction tree when laid out in VLSI.

202 Tree Multipliers for Signed Numbers [Figure: sign extension in multioperand addition. Operands x, y, z have magnitude positions k−2, k−3, k−4, ..., a sign position k−1, and extended positions that repeat x_{k−1}, y_{k−1}, z_{k−1}.] The difference in multiplication is the shifting sign positions. [Figure: sharing of full adders to reduce the CSA width in a signed tree multiplier. The sign extensions of three operands with signs α, β, γ repeat the same triple α β γ in every extended column, so one FA can serve them all; five redundant copies are removed.]

203 Using the Negative-Weight Property of the Sign Bit Sign extension is a way of converting negatively weighted bits (negabits) to positively weighted bits (posibits) to facilitate reduction, but there are other methods of accomplishing the same without introducing a lot of extra bits. Baugh and Wooley have contributed two such methods. [Figure: 5 × 5 partial-products bit-matrices a_i x_j with product bits p_9 ... p_0, in four versions: a. Unsigned. b. 2's-complement, in which the terms a_4 x_j and a_i x_4 with i, j < 4 carry negative weight. c. Baugh-Wooley, in which the negatively weighted terms are replaced by terms with one complemented factor plus correction entries. d. Modified B-W, in which they are replaced by complemented products plus constant entries.]

204 The Baugh-Wooley Method and Its Modified Form Baugh-Wooley: −a_4 x_0 = a_4 (1 − x_0) − a_4 = a_4 x̄_0 − a_4. The leftover −a_4 is entered as a_4 in the same column and −a_4 in the next column. Modified form: −a_4 x_0 = (1 − a_4 x_0) − 1, where 1 − a_4 x_0 is the complement of the product a_4 x_0. The leftover −1 is entered as 1 in the same column and −1 in the next column. [Figure: the 2's-complement bit-matrix with product bits p_9 ... p_0, and the resulting bit-matrices c. Baugh-Wooley and d. Modified B-W.]

205 Alternate Views of the Baugh-Wooley Methods [Figure: step-by-step rearrangement of the negatively weighted terms a_4 x_3, a_4 x_2, a_4 x_1, a_4 x_0 and a_3 x_4, a_2 x_4, a_1 x_4, a_0 x_4, shown next to the four 5 × 5 bit-matrices a. Unsigned, b. 2's-complement, c. Baugh-Wooley, and d. Modified B-W.]

206 Partial-Tree Multipliers High-radix versus partial-tree multipliers: the difference is quantitative, not qualitative. For small h, say 8 bits, we view the multiplier in the figure as high-radix. When h is a significant fraction of k, say k/2 or k/4, we tend to view it as a partial-tree multiplier. A better design through pipelining is covered in Section 11.6. [Figure: general structure of a partial-tree multiplier. h inputs enter a CSA tree together with the upper part of the cumulative partial product (stored-carry). The sum and carry outputs feed back, and an h-bit adder with a carry flip-flop produces the lower part of the cumulative partial product.]

207 Truncated Multipliers [Figure: dot diagram of a k-by-k fractional multiplication with k = 8, in which the dots to the right of the ulp position are removed.] Max error = 8/2 + 7/4 + 6/8 + 5/16 + 4/32 + 3/64 + 2/128 + 1/256 ≈ 7.004 ulp. The slide also reports a mean error in ulp. Removing the dots at the right does not lead to much loss of precision.
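
The error sum can be evaluated exactly. The sketch below assumes k = 8, as implied by the series, and, for the mean, assumes independent and uniformly distributed operand bits, so that each removed AND-gate dot equals 1 with probability 1/4. That probability model is an assumption added here, not something stated on the slide.

```python
from fractions import Fraction

K = 8  # operand width implied by the series 8/2 + 7/4 + ... + 1/256

# Column i to the right of the ulp position has weight 2**-i ulp and holds K + 1 - i dots.
max_error = sum(Fraction(K + 1 - i, 2**i) for i in range(1, K + 1))

# Assumed model: each dot is the AND of two independent uniform bits, so it is 1 a quarter of the time.
mean_error = max_error / 4

print(float(max_error), float(mean_error))
```

Under these assumptions the worst case is about 7.004 ulp and the mean about 1.751 ulp.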

208 Truncated Multipliers with Error Compensation We can introduce additional dots on the left-hand side to compensate for the removal of dots from the right-hand side. [Figure: constant and variable error compensation for truncated multipliers. Constant compensation adds a constant 1 to the truncated dot diagram; variable compensation adds the bits x_−1 and y_−1.] Constant compensation: max error = +4 ulp, max error ≈ −3 ulp, mean error = ? ulp. Variable compensation: max error = +? ulp, max error ≈ −? ulp, mean error = ? ulp.

209 Array Multipliers [Figure: a basic array multiplier uses a one-sided CSA tree and a ripple-carry adder. Each CSA cell is a [3:2] adder, i.e. a full adder.] [Figure: details of a 5 × 5 array multiplier using FA blocks, with partial-product terms a_i x_j entering row by row and product bits p_0 through p_9 leaving at the right and bottom.]

210 Signed (2's-complement) Array Multiplier [Figure: a 5 × 5 array multiplier modified to handle 2's-complement inputs using the Baugh-Wooley method, or to shorten the critical path. Some partial-product terms enter complemented, and a_4 and x_4 are added as extra inputs; outputs p_0 through p_9.]

211 Array Multiplier Built of Modified Full-Adder Cells [Figure: design of a 5 × 5 array multiplier with two additive inputs and full-adder blocks that include AND gates; outputs p_0 through p_9.]

212 Array Multiplier without a Final Carry-Propagate Adder [Figure: conditional results B_i and B_{i+1} are formed at levels i and i+1 and chosen by muxes; a final k-bit mux delivers product bits [k, 2k−1]. Details of one level are on the next slide.] All remaining bits of the final product are produced only 2 gate levels after p_{k−1}.

213 Extended Bits in the Less-Significant Part of a Conditional Adder The circuit on the right is regarded as a conditional adder, like the circuit on the left. Source: Ercegovac and Lang, Digital Arithmetic.

214 Pipelined Tree and Array Multipliers [Figure: general structure of a partial-tree multiplier. h inputs and the upper part of the cumulative partial product (stored-carry) enter a CSA tree, an (h + 2)-input CSA tree overall; an h-bit adder with a carry flip-flop produces the lower part of the cumulative partial product.] [Figure: efficiently pipelined partial-tree multiplier. The h inputs enter a pipelined CSA tree with latches between levels; two further CSAs with a latch merge its output with the stored sum and carry, and an h-bit adder with a carry flip-flop produces the lower part of the cumulative partial product.]

215 Pipelined Array Multipliers With latches after every FA level, the maximum throughput is achieved. Latches may be inserted after every h FA levels for an intermediate design. Example: 3-stage pipeline. [Figure: pipelined 5 × 5 array multiplier using latched FA blocks (each a latched FA with an AND gate); the small shaded boxes are latches. Outputs p_0 through p_9.]

216 Variations in Multipliers Chapter Goals: Learn additional methods for synthesizing fast multipliers as well as other types of multipliers (bit-serial, modular, etc.). Chapter Highlights: Building a multiplier from smaller units. Performing multiply-add as one operation. Bit-serial and (semi)systolic multipliers. Using a multiplier for squaring is wasteful.

217 Divide-and-Conquer Designs Building a wide multiplier from narrower ones. [Figure: divide-and-conquer (recursive) strategy for synthesizing a 2b × 2b multiplier from b × b multipliers. The operands are split as a = (a_H, a_L) and x = (x_H, x_L), giving the four partial products a_L x_L, a_L x_H, a_H x_L, and a_H x_H, each 2b bits wide. After rearrangement in the 2b-by-2b multiplication, a_H x_H and a_L x_L line up in one row, and a_H x_L and a_L x_H overlap it in the middle.]
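
The decomposition can be written out as a short sketch. The names multiply_2b and mul_b are illustrative; mul_b stands for whatever b × b multiplier is available and defaults here to Python's own multiplication.

```python
def multiply_2b(a: int, x: int, b: int, mul_b=lambda u, v: u * v) -> int:
    """Build a 2b x 2b unsigned product from four b x b products."""
    mask = (1 << b) - 1
    a_h, a_l = a >> b, a & mask
    x_h, x_l = x >> b, x & mask
    p_hh = mul_b(a_h, x_h)                # weight 2^(2b)
    p_hl = mul_b(a_h, x_l)                # weight 2^b
    p_lh = mul_b(a_l, x_h)                # weight 2^b
    p_ll = mul_b(a_l, x_l)                # weight 2^0
    return (p_hh << (2 * b)) + ((p_hl + p_lh) << b) + p_ll


print(multiply_2b(0xB7, 0x5C, 4))
```

In hardware the three aligned rows (a_H x_H with a_L x_L, then a_H x_L, then a_L x_H) are summed by a row of (3; 2)-counters followed by an adder; the additions above only model the arithmetic.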

218 General Structure of a Recursive Multiplier 2b × 2b: use (3; 2)-counters. 3b × 3b: use (5; 2)-counters. 4b × 4b: use (7; 2)-counters. [Figure: using b × b multipliers to synthesize 2b × 2b, 3b × 3b, and 4b × 4b multipliers.]

219 An 8 × 8 Multiplier Using 4 × 4 Multipliers [Figure: four 4 × 4 multiply blocks form a_H x_H, a_H x_L, a_L x_H, and a_L x_L from the bit ranges [4, 7] and [0, 3] of a and x. Their outputs, in the ranges [0, 3] through [12, 15], are combined by adders for positions [4, 7], [8, 11], and [12, 15] to give p[12, 15], p[8, 11], p[4, 7], and p[0, 3].]

220 Additive Multiply Modules p = a x + y + z. [Figure: (a) block diagram, with a 4-bit adder and its c_in; (b) dot notation. Additive multiply module with a 2 × 4 multiplier (a x) plus 4-bit and 2-bit additive inputs (y and z).] A b × c AMM has b-bit and c-bit multiplicative inputs, b-bit and c-bit additive inputs, and a (b + c)-bit output. The output never overflows b + c bits, because the largest possible value is (2^b − 1)(2^c − 1) + (2^b − 1) + (2^c − 1) = 2^(b+c) − 2^b − 2^c + 1 + 2^b − 1 + 2^c − 1 = 2^(b+c) − 1.

221 Multiplier Built of AMMs [Figure: an 8 × 8 multiplier built of 4 × 2 AMMs. Inputs marked with an asterisk carry 0s. The legend distinguishes 2-bit and 4-bit signals; each AMM multiplies a 4-bit slice of a, a[0, 3] or a[4, 7], by a 2-bit slice of x, and the product emerges as p[0, 1], p[2, 3], ..., p[12, 15].] [Figure: understanding an 8 × 8 multiplier built of 4 × 2 AMMs using dot notation.]

222 Bit-Serial Multipliers [Figure: bit-serial adder (LSB first), with serial inputs x_0 x_1 x_2 ... and y_0 y_1 y_2 ..., a full adder with a carry flip-flop, and serial output s_0 s_1 s_2 ...] [Figure: bit-serial multiplier, shown as an unspecified block with serial inputs a_0 a_1 a_2 ... and x_0 x_1 x_2 ... and serial output p_0 p_1 p_2 ...] Systolic arrays: synchronous arrays of processing elements that are interconnected by only short, local wires, thus allowing very high clock rates.

223 Semisystolic Serial-Parallel Multiplier [Figure: semi-systolic circuit for 4 × 4 multiplication in 8 clock cycles. The multiplicand a_3 a_2 a_1 a_0 enters in parallel; the multiplier enters serially, LSB first; four full adders with sum and carry latches form the product, which leaves serially.] This is called semisystolic because it has a large signal fan-out of k (k-way broadcasting) and a long wire spanning all k positions.

224 Systolic Retiming as a Design Tool A semisystolic circuit can be converted to a systolic circuit via retiming, which involves advancing and retarding signals by means of delay removal and delay insertion in such a way that the relative timings of various parts are unaffected. [Figure: example of retiming by delaying the inputs to C_L and advancing the outputs from C_L by d units. A cut separates C_L from C_R; the original delays e, f, g, h on the edges crossing the cut are adjusted by d.]

225 A First Attempt at Retiming [Figure: the semisystolic multiplier (multiplicand a_3 a_2 a_1 a_0 in parallel, multiplier serial in LSB-first, four FAs with sum and carry latches, product serial out) shown with Cut 1, Cut 2, and Cut 3, and below it a retimed version of our semisystolic multiplier.]

226 Deriving a Fully Systolic Multiplier [Figure: the semisystolic multiplier (multiplicand a_3 a_2 a_1 a_0 in parallel, multiplier serial in LSB-first, four FAs with sum and carry latches, product serial out) and a retimed version of it.]

227 A Direct Design for a Bit-Serial Multiplier [Figure: building block for a latency-free bit-serial multiplier, built around a (5; 3)-counter and a mux, with inputs a_i, x_i, t_in, c_in, s_in and outputs t_out, c_out, s_out.] [Figure: the cellular structure of the bit-serial multiplier based on this cell; the LSB cell receives 0s and delivers p_i.] [Figure: bit-serial multiplier design in dot notation. (a) Structure of the bit-matrix: at step i, the new terms are a_i x_i, a_i x^(i−1), and x_i a^(i−1); earlier terms are already accumulated into three numbers, and lower product bits are already output. (b) Reduction after each input bit: the new terms are combined with p^(i−1) to give 2p^(i), which is shifted right to obtain p^(i).]

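228 Modular Multipliers [Figure: modulo-(2^b − 1) carry-save adder, a row of FAs in which the carry out of the leftmost position wraps around to the rightmost position.] [Figure: design of a 4 × 4 modulo-15 multiplier using mod-15 CSAs and a mod-15 CPA. Partial-product bits of weight 16 or more are divided by 16, i.e. moved down four positions, since 16 mod 15 = 1; the result is 4 bits wide.]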
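
The wraparound idea can be sketched in software. The code below is an assumption-laden illustration rather than the slide's circuit: it uses end-around-carry addition in place of the mod-15 CSAs and CPA, and a b-bit rotation in place of "divide by 16". All function names are hypothetical.

```python
def add_mod_2b_minus_1(u: int, v: int, b: int) -> int:
    """Add two values in [0, 2^b - 1] modulo 2^b - 1 using an end-around carry."""
    m = (1 << b) - 1
    s = u + v
    s = (s & m) + (s >> b)                # carry out of position b-1 re-enters at position 0
    return 0 if s == m else s             # the all-ones pattern is a second encoding of 0


def rotate_left(v: int, i: int, b: int) -> int:
    """Rotate the b-bit value v left by i positions; multiplies by 2^i modulo 2^b - 1."""
    m = (1 << b) - 1
    i %= b
    return ((v << i) | (v >> (b - i))) & m


def multiply_mod_2b_minus_1(a: int, x: int, b: int) -> int:
    """Multiply b-bit unsigned a and x modulo 2^b - 1 from rotated partial products."""
    acc = 0
    for i in range(b):
        if (x >> i) & 1:
            acc = add_mod_2b_minus_1(acc, rotate_left(a, i, b), b)
    return acc


print(multiply_mod_2b_minus_1(13, 11, 4))
```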

229 Other Examples of Modular Multiplication [Figure: one way to design a 4 × 4 modulo-13 multiplier, based on 16 mod 13 = 3.]

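230 Squaring [Figure: design of a 5-bit squarer. Multiplying x by itself gives a 5 × 5 partial-products bit-matrix with product bits p_9 ... p_0, which is then simplified to a smaller matrix.] The simplification rests on x_i x_i = x_i and x_i x_j + x_j x_i = 2 x_i x_j, so each pair of equal terms becomes a single term in the next higher column.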

231 Constant Multiplier [Figure.] Source: Ercegovac and Lang, Digital Arithmetic, p. 224.

232 Multiple Constant Multiplier [Figure.] Source: Ercegovac and Lang, Digital Arithmetic, p. 225.

233 Division Instructor: Kuan Jen Lin Dept. of EE, FJU, Taiwan Room: SF 727B Most slides are revisions of PowerPoint files obtained from the textbook website. Computer Arithmetic 4, Dept. of EE, Fu Jen Catholic University, Taiwan 1

234 Division Review division schemes and various speedup methods. Hardest basic operation (fortunately, also the rarest). Division speedup methods: high-radix, array, ... Combined multiplication/division hardware. Digit-recurrence vs convergence division schemes. Topics in This Part: Chapter 13 Basic Division Schemes. Chapter 14 High-Radix Dividers. Chapter 15 Variations in Dividers. Chapter 16 Division by Convergence.

235 13 Basic Division Schemes Chapter Goals: Study shift/subtract or bit-at-a-time dividers and set the stage for faster methods and variations to be covered in Chapters 14-16. Chapter Highlights: Shift/subtract divide vs shift/add multiply. Hardware, firmware, software algorithms. Dividing 2's-complement numbers. The special case of a constant divisor.

236 Shift/Subtract Division Algorithms Notation for our discussion of division algorithms:
z   Dividend               z_{2k−1} z_{2k−2} ... z_3 z_2 z_1 z_0
d   Divisor                d_{k−1} d_{k−2} ... d_1 d_0
q   Quotient               q_{k−1} q_{k−2} ... q_1 q_0
s   Remainder, z − (d × q)   s_{k−1} s_{k−2} ... s_1 s_0
Initially, we assume unsigned operands. [Figure: division of an 8-bit number by a 4-bit number in dot notation. The subtracted bit-matrix q_3 d 2^3, q_2 d 2^2, q_1 d 2^1, q_0 d 2^0 is taken from the dividend z, leaving the remainder s; d is the divisor and q the quotient.]

237 Division versus Multiplication (1/2) Division is more complex than multiplication: Need for quotient digit selection or estimation. Overflow possibility: the high-order k bits of z must be strictly less than d; otherwise, the quotient of a 2k-bit number divided by a k-bit number may have a width of more than k bits. [Figure: division in dot notation, with divisor d, quotient q, dividend z, the subtracted bit-matrix q_3 d 2^3 through q_0 d 2^0, and remainder s.]

238 Division versus Multiplication (2/2) Pentium III latencies
Instruction                  Latency   Cycles/Issue
Load / Store                 3         1
Integer Multiply             4         1
Integer Divide               (values missing)
Double/Single FP Multiply    5         2
Double/Single FP Add         3         1
Double/Single FP Divide      (values missing)

239 Division Recurrence Division with left shifts (there is no corresponding right-shift algorithm): s(j) = 2 s(j−1) − q_{k−j} (2^k d), with s(0) = z and s(k) = 2^k s. The factor 2 is the shift and the second term is the subtraction. Integer division is characterized by z = d × q + s. Multiplying both sides by 2^(−2k) gives 2^(−2k) z = (2^(−k) d) × (2^(−k) q) + 2^(−2k) s, that is, z_frac = d_frac × q_frac + 2^(−k) s_frac. Divide fractions like integers; adjust the remainder. The no-overflow condition for fractions is z_frac < d_frac. [Figure: division in dot notation, with the k-bit halves of 2z aligned against 2^k d.]
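
The recurrence can be run directly on Python integers. This is a minimal sketch of the restoring variant for unsigned operands; the function name divide_shift_subtract and the exception choices are illustrative.

```python
def divide_shift_subtract(z: int, d: int, k: int) -> tuple[int, int]:
    """Divide a 2k-bit unsigned z by a k-bit unsigned d; return (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("divisor is zero")
    if (z >> k) >= d:
        raise OverflowError("high-order k bits of z must be strictly less than d")
    s = z                                  # s(0) = z
    q = 0
    d_aligned = d << k                     # 2^k d
    for _ in range(k):
        s <<= 1                            # 2 s(j-1)
        if s >= d_aligned:                 # quotient digit q_{k-j} = 1: subtract
            s -= d_aligned
            q = (q << 1) | 1
        else:                              # quotient digit q_{k-j} = 0: keep 2 s(j-1)
            q <<= 1
    return q, s >> k                       # s(k) = 2^k s


print(divide_shift_subtract(117, 10, 4))
```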

240 Division Recurrence Steps Initialization. Iterations: (1) one-digit arithmetic left shift of s(j) to produce r s(j); (2) determination of the quotient digit q_{j+1} by the quotient-digit selection function (the index of q could be different); (3) generation of the divisor multiple d × q_{j+1}; (4) subtraction of d q_{j+1} from r s(j). On-the-fly conversion of the quotient (or done in the termination step). Termination: make sign(s) = sign(d), conversion.

241 Examples of Basic Division [Worked examples of integer division (z, 2^4 d, quotient digits q_3 = 1, q_2 = 0, q_1 = 1, q_0 = 1) and fractional division (z_frac, d_frac, quotient digits q_−1 = 1, q_−2 = 0, q_−3 = 1, q_−4 = 1). Each lists the partial remainders s(0) through s(4), the final remainder, and the quotient. Bit values not recoverable.] Notice the index of q. What is the residual of ... / 0.1? (The dividend in this question is missing.)

242 Main Factors Affecting the Overall Execution Time and Cost Radix r. Quotient-digit set (redundant signed digit?). Representation of the residual (CSA?). Quotient-digit selection.

243 Programmed Division [Figure: register usage for programmed division. Rs holds the shifted partial remainder and Rq the shifted partial quotient; together they contain the partial remainder (2k − j bits) followed by the partial quotient (j bits), with the next quotient digit inserted at the right end of Rq. The carry flag sits to the left of Rs. Rd holds the divisor d, aligned as 2^k d.]

244 Assembly Language Program for Division
{Using left shifts, divide unsigned 2k-bit dividend,
 z_high|z_low, storing the k-bit quotient and remainder.
 Registers: R0 holds 0        Rc for counter
            Rd for divisor    Rs for z_high & remainder
            Rq for z_low & quotient}
{Load operands into registers Rd, Rs, and Rq}
div:     load   Rd with divisor
         load   Rs with z_high
         load   Rq with z_low
{Check for exceptions}
         branch d_by_0 if Rd = R0
         branch d_ovfl if Rs ≥ Rd
{Initialize counter}
         load   k into Rc
{Begin division loop}
d_loop:  shift  Rq left 1      {zero to LSB, MSB to carry}
         rotate Rs left 1      {carry to LSB, MSB to carry}
         skip   if carry = 1
         branch no_sub if Rs < Rd
         sub    Rd from Rs
         incr   Rq             {set quotient digit to 1}
no_sub:  decr   Rc             {decrement counter by 1}
         branch d_loop if Rc ≠ 0
{Store the quotient and remainder}
         store  Rq into quotient
         store  Rs into remainder
d_by_0:  ...
d_ovfl:  ...
d_done:  ...
Programmed division using left shifts. [Figure: register usage for programmed division, as on the previous slide.]

245 Time Complexity of Programmed Division
Assume k-bit words
k iterations of the main loop
6 or 8 instructions per iteration, depending on the quotient bit
Thus, 6k + 3 to 8k + 3 machine instructions, ignoring operand loads and result store
k = 32 implies 195 to 259 instructions, i.e., more than 200 on average
This is too slow for many modern applications!
Microprogrammed division would be somewhat better
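As a quick check of the instruction count, the formula can be evaluated directly. The helper name `instruction_count` and its `quotient_ones` parameter (the number of 1 bits in the quotient) are hypothetical names introduced for this sketch.

```python
def instruction_count(k, quotient_ones):
    """Instructions executed by the programmed division loop:
    3 setup instructions, 6 per iteration producing a 0 bit,
    8 per iteration producing a 1 bit."""
    zeros = k - quotient_ones
    return 3 + 6 * zeros + 8 * quotient_ones


print(instruction_count(32, 0), instruction_count(32, 32))   # 195 259
```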

246 Restoring Hardware Dividers
Figure: Shift/subtract sequential restoring divider. Blocks shown: quotient register $q$ (shifted, receives $q_{k-j}$), partial remainder register $s^{(j)}$ (initial value $z$, with load and shift controls), divisor register $d$, a two-input mux, a k-bit adder with $c_{\text{in}} = 1$ producing the trial difference, and a quotient digit selector driven by $c_{\text{out}}$ and the MSB of $2s^{(j-1)}$.

247 Indirect Signed Division
In division with signed operands, q and s are defined by
$z = d \times q + s$, $\text{sign}(s) = \text{sign}(z)$, $|s| < |d|$
Examples of division with signed operands
$z = 5$, $d = 3$ ⇒ $q = 1$, $s = 2$
$z = 5$, $d = -3$ ⇒ $q = -1$, $s = 2$
$z = -5$, $d = 3$ ⇒ $q = -1$, $s = -2$
$z = -5$, $d = -3$ ⇒ $q = 1$, $s = -2$ (not $q = 2$, $s = 1$)
Magnitudes of q and s are unaffected by input signs
Signs of q and s are derivable from signs of z and d
Will discuss direct signed division later
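A minimal sketch of the indirect method: divide the magnitudes, then attach the signs. The function name `indirect_signed_divide` is an assumption made for illustration. Note that Python's built-in `divmod` floors toward negative infinity (for example `divmod(-5, 3)` gives `(-2, 1)`), so it does not follow the sign convention above and is applied here only to magnitudes.

```python
def indirect_signed_divide(z, d):
    """Signed division via unsigned division of magnitudes.
    The remainder takes the sign of the dividend z; the quotient is
    negative exactly when z and d have different signs."""
    q_mag, s_mag = divmod(abs(z), abs(d))
    q = q_mag if (z >= 0) == (d >= 0) else -q_mag
    s = s_mag if z >= 0 else -s_mag
    return q, s


for z, d in [(5, 3), (5, -3), (-5, 3), (-5, -3)]:
    print(z, d, indirect_signed_divide(z, d))
```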

248 Example of Restoring Unsigned Division
Operands: $z = (0111\,0101)_{\text{two}} = 117$, $d = (1010)_{\text{two}} = 10$, $k = 4$
$s^{(0)} = z$
$s^{(1)} = 2s^{(0)} + (-2^4 d)$: positive, so set $q_3 = 1$
$2s^{(1)} + (-2^4 d)$: negative, so set $q_2 = 0$ and restore, $s^{(2)} = 2s^{(1)}$
$s^{(3)} = 2s^{(2)} + (-2^4 d)$: positive, so set $q_1 = 1$
$s^{(4)} = 2s^{(3)} + (-2^4 d)$: positive, so set $q_0 = 1$
Result: $q = (1011)_{\text{two}} = 11$, $s = (0111)_{\text{two}} = 7$
No overflow, because $(0111)_{\text{two}} < (1010)_{\text{two}}$
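The restoring recurrence can be traced with a short Python sketch, here run on the operands 117 and 10 that the graphical-depiction slide names for this example. The function name `restoring_divide` and the returned trace list are illustrative assumptions.

```python
def restoring_divide(z, d, k):
    """Restoring division of an unsigned 2k-bit z by a k-bit d.
    Returns (q, s, trace), where trace lists s(0)..s(k)."""
    if d == 0 or (z >> k) >= d:
        raise ValueError("divide by zero or quotient overflow")
    s, q = z, 0
    trace = [s]
    for _ in range(k):
        trial = 2 * s - (d << k)       # trial difference 2s - 2^k d
        if trial >= 0:
            s, bit = trial, 1          # keep the difference
        else:
            s, bit = 2 * s, 0          # restore: keep the shifted remainder
        q = (q << 1) | bit
        trace.append(s)
    return q, s >> k, trace            # s(k) = 2^k * s


print(restoring_divide(117, 10, 4))    # (11, 7, [117, 74, 148, 136, 112])
```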

249 Nonrestoring and Signed Division
The cycle time in restoring division must be long enough to allow:
Shifting the registers
Allowing signals to propagate through the adder
Determining and storing the next quotient digit
Storing the trial difference, if required
Nonrestoring division to the rescue!
Assume $q_{k-j} = 1$ and subtract
Store the result as the new PR (the partial remainder can become incorrect, hence the name "nonrestoring")
Figure: Shift/subtract sequential restoring divider (quotient register, partial remainder register, divisor, mux, adder, quotient digit selector).

250 Justification for Nonrestoring Division
Why is it acceptable to store an incorrect value in the partial-remainder register?
Shifted partial remainder at start of the cycle is $u$
Suppose subtraction yields the negative result $u - 2^k d$
Option 1: Restore the partial remainder to the correct value $u$, shift left, and subtract to get $2u - 2^k d$
Option 2: Keep the incorrect partial remainder $u - 2^k d$, shift left, and add to get $2(u - 2^k d) + 2^k d = 2u - 2^k d$

251 Example of Nonrestoring Unsigned Division
Operands: $z = (0111\,0101)_{\text{two}} = 117$, $d = (1010)_{\text{two}} = 10$, $k = 4$
$s^{(0)} = z$: positive, so subtract
$s^{(1)} = 2s^{(0)} + (-2^4 d)$: positive, so set $q_3 = 1$ and subtract
$s^{(2)} = 2s^{(1)} + (-2^4 d)$: negative, so set $q_2 = 0$ and add
$s^{(3)} = 2s^{(2)} + 2^4 d$: positive, so set $q_1 = 1$ and subtract
$s^{(4)} = 2s^{(3)} + (-2^4 d)$: positive, so set $q_0 = 1$
Result: $q = (1011)_{\text{two}} = 11$, $s = (0111)_{\text{two}} = 7$
No overflow: $(0111)_{\text{two}} < (1010)_{\text{two}}$
Applying the rule "if $\text{sign}(s) = \text{sign}(d)$ then $q_{k-j} = 1$ else $q_{k-j} = -1$" gives the digits 1 1 -1 1, which equals 1011
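A sketch of the nonrestoring recurrence with quotient digits from {-1, 1}, again using the operands 117 and 10 named on the graphical-depiction slide. The function name `nonrestoring_divide`, the digit list, and the final correction placed at the end of the function are assumptions made for illustration.

```python
def nonrestoring_divide(z, d, k):
    """Nonrestoring division of an unsigned 2k-bit z by a positive k-bit d.
    Returns (q, s, digits, trace); digits are the +1/-1 quotient digits."""
    s = z
    digits, trace = [], [s]
    for _ in range(k):
        if s >= 0:                     # sign(s) = sign(d): digit 1, subtract
            digits.append(1)
            s = 2 * s - (d << k)
        else:                          # sign(s) != sign(d): digit -1, add
            digits.append(-1)
            s = 2 * s + (d << k)
        trace.append(s)
    q = sum(dig * 2 ** (k - 1 - i) for i, dig in enumerate(digits))
    if s < 0:                          # final correction for a negative remainder
        s += d << k
        q -= 1
    return q, s >> k, digits, trace


print(nonrestoring_divide(117, 10, 4))
# (11, 7, [1, 1, -1, 1], [117, 74, -12, 136, 112])
```

The partial remainder is allowed to go negative (here -12) and is never restored; the following addition compensates for the skipped restoration.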

252 Graphical Depiction of Nonrestoring Division Example
Figure: Partial remainder $s^{(0)}, s^{(1)}, \dots, s^{(4)} = 16s$ plotted step by step for $(0111\,0101)_{\text{two}} / (1010)_{\text{two}}$, i.e. $(117)_{\text{ten}} / (10)_{\text{ten}}$. (a) Restoring. (b) Nonrestoring.

253 Nonrestoring Division with Signed Operands
Restoring division
$q_{k-j} = 0$ means no subtraction (or subtraction of 0)
$q_{k-j} = 1$ means subtraction of d
Nonrestoring division
Example: q = …
We always subtract or add
It is as if quotient digits are selected from the set $\{1, -1\}$:
1 corresponds to subtraction
-1 corresponds to addition
Our goal is to end up with a remainder that matches the sign of the dividend
This idea of trying to match the sign of s with the sign of z leads to a direct signed division algorithm:
if $\text{sign}(s) = \text{sign}(d)$ then $q_{k-j} = 1$ else $q_{k-j} = -1$

254 Quotient Conversion and Final Correction
Figure: Partial remainder variation and selected quotient digits during nonrestoring division with $d > 0$.
Quotient with digits -1 and 1:
Replace -1s with 0s
Shift left, complement MSB, and set LSB to 1 to get the 2's-complement quotient
Check: … = 25 = …
Final correction step if $\text{sign}(s) \ne \text{sign}(z)$:
Add d to, or subtract d from, s; subtract 1 from, or add 1 to, q
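The conversion rule can be sketched as follows. The function name `convert_quotient` and the sample digit strings are illustrative assumptions, not values taken from the slide. The rule works because a digit string with $p_i \in \{0, 1\}$ standing for $\{-1, 1\}$ has the value $2p - (2^k - 1) = (2p + 1) - 2^k$.

```python
def convert_quotient(digits):
    """Convert a quotient written with digits -1 and 1 into a
    (k+1)-bit two's-complement word. Returns (bit_string, value)."""
    k = len(digits)
    p = 0
    for dig in digits:
        p = (p << 1) | (1 if dig == 1 else 0)    # replace -1s with 0s
    word = (p << 1) | 1                          # shift left, set LSB to 1
    word ^= 1 << k                               # complement the MSB
    value = word - (1 << (k + 1)) if word >> k else word
    return format(word, "0{}b".format(k + 1)), value


print(convert_quotient([1, 1, -1, 1]))           # ('01011', 11)
print(convert_quotient([-1, 1, -1, -1, 1, 1]))   # ('1100111', -25)
```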

255 Example of Nonrestoring Signed Division
Operands: $z = 33$, $d = -7$, $k = 4$
$s^{(0)} = z$: $\text{sign}(s^{(0)}) \ne \text{sign}(d)$, so set $q_3 = -1$ and add
$s^{(1)} = 2s^{(0)} + 2^4 d$: $\text{sign}(s^{(1)}) = \text{sign}(d)$, so set $q_2 = 1$ and subtract
$s^{(2)} = 2s^{(1)} + (-2^4 d)$: $\text{sign}(s^{(2)}) \ne \text{sign}(d)$, so set $q_1 = -1$ and add
$s^{(3)} = 2s^{(2)} + 2^4 d$: $\text{sign}(s^{(3)}) = \text{sign}(d)$, so set $q_0 = 1$ and subtract
$s^{(4)} = 2s^{(3)} + (-2^4 d)$: $\text{sign}(s^{(4)}) \ne \text{sign}(z)$, so perform corrective subtraction, $s^{(4)} + (-2^4 d)$
Quotient digits -1 1 -1 1 give $p = 0101$
Shift, complement MSB: $1\,1011$ $(= -5)$
Add 1 to correct: $1\,1100$ $(= -4)$
Check: $33 / (-7) = -4$
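A sketch of direct signed nonrestoring division with the final correction step, checked against the slide's case 33 / (-7). The function name `signed_nonrestoring_divide` is hypothetical, and the sketch makes one simplifying assumption: a zero partial remainder is treated as positive, so operand pairs in which an intermediate remainder becomes exactly zero (for example z = -28, d = 7) may finish with a remainder of magnitude |d| and would need extra handling that is not shown here.

```python
def signed_nonrestoring_divide(z, d, k):
    """Direct signed nonrestoring division, returning (q, s) with
    z = d*q + s. Zero partial remainders are treated as positive."""
    def same_sign(a, b):
        return (a >= 0) == (b >= 0)

    s, q = z, 0
    for j in range(1, k + 1):
        if same_sign(s, d):            # digit 1: subtract 2^k d
            digit = 1
            s = 2 * s - d * 2 ** k
        else:                          # digit -1: add 2^k d
            digit = -1
            s = 2 * s + d * 2 ** k
        q += digit * 2 ** (k - j)
    if s != 0 and not same_sign(s, z): # final correction step
        if same_sign(s, d):
            s -= d * 2 ** k            # corrective subtraction
            q += 1
        else:
            s += d * 2 ** k            # corrective addition
            q -= 1
    return q, s // 2 ** k              # s(k) is an exact multiple of 2^k


print(signed_nonrestoring_divide(33, -7, 4))     # (-4, 5)
```

For 33 / (-7) the uncorrected digits are -1 1 -1 1 (value -5) and the final partial remainder is -32, whose sign differs from that of z, so the correction yields q = -4 and s = 5.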

256 On-The-Fly Conversion
Source: Ercegovac and Lang, Digital Arithmetic, pp. 257

Part II Addition / Subtraction

Part II Addition / Subtraction Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

Part II Addition / Subtraction

Part II Addition / Subtraction Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

Computer Architecture 10. Fast Adders

Computer Architecture 10. Fast Adders Computer Architecture 10 Fast s Ma d e wi t h Op e n Of f i c e. o r g 1 Carry Problem Addition is primary mechanism in implementing arithmetic operations Slow addition directly affects the total performance

More information

Lecture 11. Advanced Dividers

Lecture 11. Advanced Dividers Lecture 11 Advanced Dividers Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 15 Variation in Dividers 15.3, Combinational and Array Dividers Chapter 16, Division

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 19: Adder Design [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L19

More information

Residue Number Systems Ivor Page 1

Residue Number Systems Ivor Page 1 Residue Number Systems 1 Residue Number Systems Ivor Page 1 7.1 Arithmetic in a modulus system The great speed of arithmetic in Residue Number Systems (RNS) comes from a simple theorem from number theory:

More information

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 9. Datapath Design Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October 2, 2017 ECE Department, University of Texas at Austin

More information

ELEN Electronique numérique

ELEN Electronique numérique ELEN0040 - Electronique numérique Patricia ROUSSEAUX Année académique 2014-2015 CHAPITRE 3 Combinational Logic Circuits ELEN0040 3-4 1 Combinational Functional Blocks 1.1 Rudimentary Functions 1.2 Functions

More information

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 VLSI Design Adder Design [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 Major Components of a Computer Processor Devices Control Memory Input Datapath

More information

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS ARITHMETIC COMBINATIONAL MODULES AND NETWORKS 1 SPECIFICATION OF ADDER MODULES FOR POSITIVE INTEGERS HALF-ADDER AND FULL-ADDER MODULES CARRY-RIPPLE AND CARRY-LOOKAHEAD ADDER MODULES NETWORKS OF ADDER MODULES

More information

ECE 645: Lecture 3. Conditional-Sum Adders and Parallel Prefix Network Adders. FPGA Optimized Adders

ECE 645: Lecture 3. Conditional-Sum Adders and Parallel Prefix Network Adders. FPGA Optimized Adders ECE 645: Lecture 3 Conditional-Sum Adders and Parallel Prefix Network Adders FPGA Optimized Adders Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 7.4, Conditional-Sum

More information

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1> Chapter 5 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 5 Chapter 5 :: Topics Introduction Arithmetic Circuits umber Systems Sequential Building

More information

Residue Number Systems. Alternative number representations. TSTE 8 Digital Arithmetic Seminar 2. Residue Number Systems.

Residue Number Systems. Alternative number representations. TSTE 8 Digital Arithmetic Seminar 2. Residue Number Systems. TSTE8 Digital Arithmetic Seminar Oscar Gustafsson The idea is to use the residues of the numbers and perform operations on the residues Also called modular arithmetic since the residues are computed using

More information

CSE477 VLSI Digital Circuits Fall Lecture 20: Adder Design

CSE477 VLSI Digital Circuits Fall Lecture 20: Adder Design CSE477 VLSI Digital Circuits Fall 22 Lecture 2: Adder Design Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477 [Adapted from Rabaey s Digital Integrated Circuits, 22, J. Rabaey et al.] CSE477

More information

Computer Architecture, IFE CS and T&CS, 4 th sem. Representation of Integer Numbers in Computer Systems

Computer Architecture, IFE CS and T&CS, 4 th sem. Representation of Integer Numbers in Computer Systems Representation of Integer Numbers in Computer Systems Positional Numbering System Additive Systems history but... Roman numerals Positional Systems: r system base (radix) A number value a - digit i digit

More information

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10, A NOVEL DOMINO LOGIC DESIGN FOR EMBEDDED APPLICATION Dr.K.Sujatha Associate Professor, Department of Computer science and Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, Tamilnadu,

More information

Adders, subtractors comparators, multipliers and other ALU elements

Adders, subtractors comparators, multipliers and other ALU elements CSE4: Components and Design Techniques for Digital Systems Adders, subtractors comparators, multipliers and other ALU elements Instructor: Mohsen Imani UC San Diego Slides from: Prof.Tajana Simunic Rosing

More information

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering TIMING ANALYSIS Overview Circuits do not respond instantaneously to input changes

More information

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D. Basic Experiment and Design of Electronics LOGIC CIRCUITS Ho Kyung Kim, Ph.D. hokyung@pusan.ac.kr School of Mechanical Engineering Pusan National University Digital IC packages TTL (transistor-transistor

More information

Arithmetic in Integer Rings and Prime Fields

Arithmetic in Integer Rings and Prime Fields Arithmetic in Integer Rings and Prime Fields A 3 B 3 A 2 B 2 A 1 B 1 A 0 B 0 FA C 3 FA C 2 FA C 1 FA C 0 C 4 S 3 S 2 S 1 S 0 http://koclab.org Çetin Kaya Koç Spring 2018 1 / 71 Contents Arithmetic in Integer

More information

A High-Speed Realization of Chinese Remainder Theorem

A High-Speed Realization of Chinese Remainder Theorem Proceedings of the 2007 WSEAS Int. Conference on Circuits, Systems, Signal and Telecommunications, Gold Coast, Australia, January 17-19, 2007 97 A High-Speed Realization of Chinese Remainder Theorem Shuangching

More information

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm EE241 - Spring 2010 Advanced Digital Integrated Circuits Lecture 25: Digital Arithmetic Adders Announcements Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due

More information

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits Logic and Computer Design Fundamentals Chapter 5 Arithmetic Functions and Circuits Arithmetic functions Operate on binary vectors Use the same subfunction in each bit position Can design functional block

More information

Hardware Design I Chap. 4 Representative combinational logic

Hardware Design I Chap. 4 Representative combinational logic Hardware Design I Chap. 4 Representative combinational logic E-mail: shimada@is.naist.jp Already optimized circuits There are many optimized circuits which are well used You can reduce your design workload

More information

ECE 545 Digital System Design with VHDL Lecture 1. Digital Logic Refresher Part A Combinational Logic Building Blocks

ECE 545 Digital System Design with VHDL Lecture 1. Digital Logic Refresher Part A Combinational Logic Building Blocks ECE 545 Digital System Design with VHDL Lecture Digital Logic Refresher Part A Combinational Logic Building Blocks Lecture Roadmap Combinational Logic Basic Logic Review Basic Gates De Morgan s Law Combinational

More information

Chapter 5 Arithmetic Circuits

Chapter 5 Arithmetic Circuits Chapter 5 Arithmetic Circuits SKEE2263 Digital Systems Mun im/ismahani/izam {munim@utm.my,e-izam@utm.my,ismahani@fke.utm.my} February 11, 2016 Table of Contents 1 Iterative Designs 2 Adders 3 High-Speed

More information

Sample Test Paper - I

Sample Test Paper - I Scheme G Sample Test Paper - I Course Name : Computer Engineering Group Marks : 25 Hours: 1 Hrs. Q.1) Attempt any THREE: 09 Marks a) Define i) Propagation delay ii) Fan-in iii) Fan-out b) Convert the following:

More information

Lecture 4. Adders. Computer Systems Laboratory Stanford University

Lecture 4. Adders. Computer Systems Laboratory Stanford University Lecture 4 Adders Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 2006 Mark Horowitz Some figures from High-Performance Microprocessor Design IEEE 1 Overview Readings Today

More information

EECS150 - Digital Design Lecture 21 - Design Blocks

EECS150 - Digital Design Lecture 21 - Design Blocks EECS150 - Digital Design Lecture 21 - Design Blocks April 3, 2012 John Wawrzynek Spring 2012 EECS150 - Lec21-db3 Page 1 Fixed Shifters / Rotators fixed shifters hardwire the shift amount into the circuit.

More information

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

LOGIC CIRCUITS. Basic Experiment and Design of Electronics Basic Experiment and Design of Electronics LOGIC CIRCUITS Ho Kyung Kim, Ph.D. hokyung@pusan.ac.kr School of Mechanical Engineering Pusan National University Outline Combinational logic circuits Output

More information

Tree and Array Multipliers Ivor Page 1

Tree and Array Multipliers Ivor Page 1 Tree and Array Multipliers 1 Tree and Array Multipliers Ivor Page 1 11.1 Tree Multipliers In Figure 1 seven input operands are combined by a tree of CSAs. The final level of the tree is a carry-completion

More information

Adders, subtractors comparators, multipliers and other ALU elements

Adders, subtractors comparators, multipliers and other ALU elements CSE4: Components and Design Techniques for Digital Systems Adders, subtractors comparators, multipliers and other ALU elements Adders 2 Circuit Delay Transistors have instrinsic resistance and capacitance

More information

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California VLSI Arithmetic Lecture 9: Carry-Save and Multi-Operand Addition Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel Carry-Save Addition* *from Parhami 2 June 18, 2003 Carry-Save

More information

Number representation

Number representation Number representation A number can be represented in binary in many ways. The most common number types to be represented are: Integers, positive integers one-complement, two-complement, sign-magnitude

More information

ECE 341. Lecture # 3

ECE 341. Lecture # 3 ECE 341 Lecture # 3 Instructor: Zeshan Chishti zeshan@ece.pdx.edu October 7, 2013 Portland State University Lecture Topics Counters Finite State Machines Decoders Multiplexers Reference: Appendix A of

More information

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH

More information

How fast can we calculate?

How fast can we calculate? November 30, 2013 A touch of History The Colossus Computers developed at Bletchley Park in England during WW2 were probably the first programmable computers. Information about these machines has only been

More information

Numbers and Arithmetic

Numbers and Arithmetic Numbers and Arithmetic See: P&H Chapter 2.4 2.6, 3.2, C.5 C.6 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register file alu

More information

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3 Digital Logic: Boolean Algebra and Gates Textbook Chapter 3 Basic Logic Gates XOR CMPE12 Summer 2009 02-2 Truth Table The most basic representation of a logic function Lists the output for all possible

More information

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEM ORY INPUT-OUTPUT CONTROL DATAPATH

More information

Part VI Function Evaluation

Part VI Function Evaluation Part VI Function Evaluation Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

Lecture 8: Sequential Multipliers

Lecture 8: Sequential Multipliers Lecture 8: Sequential Multipliers ECE 645 Computer Arithmetic 3/25/08 ECE 645 Computer Arithmetic Lecture Roadmap Sequential Multipliers Unsigned Signed Radix-2 Booth Recoding High-Radix Multiplication

More information

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Hakim Weatherspoon CS 3410 Computer Science Cornell University Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. memory inst 32 register

More information

CS 140 Lecture 14 Standard Combinational Modules

CS 140 Lecture 14 Standard Combinational Modules CS 14 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris 1 Part III. Standard Modules A. Interconnect B. Operators. Adders Multiplier

More information

Numbers and Arithmetic

Numbers and Arithmetic Numbers and Arithmetic See: P&H Chapter 2.4 2.6, 3.2, C.5 C.6 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register file alu

More information

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs April 16, 2009 John Wawrzynek Spring 2009 EECS150 - Lec24-blocks Page 1 Cross-coupled NOR gates remember, If both R=0 & S=0, then

More information

Design of Sequential Circuits

Design of Sequential Circuits Design of Sequential Circuits Seven Steps: Construct a state diagram (showing contents of flip flop and inputs with next state) Assign letter variables to each flip flop and each input and output variable

More information

GF(2 m ) arithmetic: summary

GF(2 m ) arithmetic: summary GF(2 m ) arithmetic: summary EE 387, Notes 18, Handout #32 Addition/subtraction: bitwise XOR (m gates/ops) Multiplication: bit serial (shift and add) bit parallel (combinational) subfield representation

More information

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER Jesus Garcia and Michael J. Schulte Lehigh University Department of Computer Science and Engineering Bethlehem, PA 15 ABSTRACT Galois field arithmetic

More information

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic Computer Science 324 Computer Architecture Mount Holyoke College Fall 2007 Topic Notes: Digital Logic Our goal for the next few weeks is to paint a a reasonably complete picture of how we can go from transistor

More information

Linear Feedback Shift Registers (LFSRs) 4-bit LFSR

Linear Feedback Shift Registers (LFSRs) 4-bit LFSR Linear Feedback Shift Registers (LFSRs) These are n-bit counters exhibiting pseudo-random behavior. Built from simple shift-registers with a small number of xor gates. Used for: random number generation

More information

Area-Time Optimal Adder with Relative Placement Generator

Area-Time Optimal Adder with Relative Placement Generator Area-Time Optimal Adder with Relative Placement Generator Abstract: This paper presents the design of a generator, for the production of area-time-optimal adders. A unique feature of this generator is

More information

Numbers & Arithmetic. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. See: P&H Chapter , 3.2, C.5 C.

Numbers & Arithmetic. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. See: P&H Chapter , 3.2, C.5 C. Numbers & Arithmetic Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See: P&H Chapter 2.4-2.6, 3.2, C.5 C.6 Example: Big Picture Computer System Organization and Programming

More information

Cost/Performance Tradeoff of n-select Square Root Implementations

Cost/Performance Tradeoff of n-select Square Root Implementations Australian Computer Science Communications, Vol.22, No.4, 2, pp.9 6, IEEE Comp. Society Press Cost/Performance Tradeoff of n-select Square Root Implementations Wanming Chu and Yamin Li Computer Architecture

More information

Combinational Logic Design Arithmetic Functions and Circuits

Combinational Logic Design Arithmetic Functions and Circuits Combinational Logic Design Arithmetic Functions and Circuits Overview Binary Addition Half Adder Full Adder Ripple Carry Adder Carry Look-ahead Adder Binary Subtraction Binary Subtractor Binary Adder-Subtractor

More information

Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two.

Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two. Announcements Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two. Chapter 5 1 Chapter 3: Part 3 Arithmetic Functions Iterative combinational circuits

More information

ECE 645: Lecture 2. Carry-Lookahead, Carry-Select, & Hybrid Adders

ECE 645: Lecture 2. Carry-Lookahead, Carry-Select, & Hybrid Adders ECE 645: Lecture 2 Carry-Lookahead, Carry-Select, & Hybrid Adders Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 6, Carry-Lookahead Adders Sections 6.1-6.2.

More information

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute DIGITAL TECHNICS Dr. Bálint Pődör Óbuda University, Microelectronics and Technology Institute 4. LECTURE: COMBINATIONAL LOGIC DESIGN: ARITHMETICS (THROUGH EXAMPLES) 2016/2017 COMBINATIONAL LOGIC DESIGN:

More information

Parallelism in Computer Arithmetic: A Historical Perspective

Parallelism in Computer Arithmetic: A Historical Perspective Parallelism in Computer Arithmetic: A Historical Perspective 21s 2s 199s 198s 197s 196s 195s Behrooz Parhami Aug. 218 Parallelism in Computer Arithmetic Slide 1 University of California, Santa Barbara

More information

On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli

On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli Behrooz Parhami Department of Electrical and Computer Engineering University of California Santa Barbara, CA 93106-9560,

More information

Latches. October 13, 2003 Latches 1

Latches. October 13, 2003 Latches 1 Latches The second part of CS231 focuses on sequential circuits, where we add memory to the hardware that we ve already seen. Our schedule will be very similar to before: We first show how primitive memory

More information

Section 3: Combinational Logic Design. Department of Electrical Engineering, University of Waterloo. Combinational Logic

Section 3: Combinational Logic Design. Department of Electrical Engineering, University of Waterloo. Combinational Logic Section 3: Combinational Logic Design Major Topics Design Procedure Multilevel circuits Design with XOR gates Adders and Subtractors Binary parallel adder Decoders Encoders Multiplexers Programmed Logic

More information

VLSI Design I; A. Milenkovic 1

VLSI Design I; A. Milenkovic 1 The -bit inary dder CPE/EE 427, CPE 527 VLI Design I L2: dder Design Department of Electrical and Computer Engineering University of labama in Huntsville leksandar Milenkovic ( www. ece.uah.edu/~milenka

More information

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors CSC258 Week 3 1 Logistics If you cannot login to MarkUs, email me your UTORID and name. Check lab marks on MarkUs, if it s recorded wrong, contact Larry within a week after the lab. Quiz 1 average: 86%

More information

EECS150. Arithmetic Circuits

EECS150. Arithmetic Circuits EE5 ection 8 Arithmetic ircuits Fall 2 Arithmetic ircuits Excellent Examples of ombinational Logic Design Time vs. pace Trade-offs Doing things fast may require more logic and thus more space Example:

More information

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters April 15, 2010 John Wawrzynek 1 Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3 b 0 a 2 b 0 a 1 b

More information

ECE380 Digital Logic. Positional representation

ECE380 Digital Logic. Positional representation ECE380 Digital Logic Number Representation and Arithmetic Circuits: Number Representation and Unsigned Addition Dr. D. J. Jackson Lecture 16-1 Positional representation First consider integers Begin with

More information

Logic Design II (17.342) Spring Lecture Outline

Logic Design II (17.342) Spring Lecture Outline Logic Design II (17.342) Spring 2012 Lecture Outline Class # 10 April 12, 2012 Dohn Bowden 1 Today s Lecture First half of the class Circuits for Arithmetic Operations Chapter 18 Should finish at least

More information
