Multiplication Ivor Page 1

Similar documents
Tree and Array Multipliers Ivor Page 1

Lecture 8: Sequential Multipliers

Chapter 5 Arithmetic Circuits

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

Lecture 8. Sequential Multipliers

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

What s the Deal? MULTIPLICATION. Time to multiply

Multiplication of signed-operands

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

Hardware Design I Chap. 4 Representative combinational logic

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

Carry Look Ahead Adders

Computer Architecture 10. Fast Adders

Residue Number Systems Ivor Page 1

1 Short adders. t total_ripple8 = t first + 6*t middle + t last = 4t p + 6*2t p + 2t p = 18t p

Parallel Multipliers. Dr. Shoab Khan

CS 140 Lecture 14 Standard Combinational Modules

Chapter 6: Solutions to Exercises

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Part II Addition / Subtraction

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Lecture 18: Datapath Functional Units

Complement Arithmetic

Lecture 12: Datapath Functional Units

Part II Addition / Subtraction

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two.

Chapter 1: Solutions to Exercises

Chapter 03: Computer Arithmetic. Lesson 03: Arithmetic Operations Adder and Subtractor circuits Design

Class Website:

Combinational Logic. By : Ali Mustafa

Logic. Combinational. inputs. outputs. the result. system can

14:332:231 DIGITAL LOGIC DESIGN. Why Binary Number System?

Arithmetic Circuits-2

ECE 645: Lecture 2. Carry-Lookahead, Carry-Select, & Hybrid Adders

UNSIGNED BINARY NUMBERS DIGITAL ELECTRONICS SYSTEM DESIGN WHAT ABOUT NEGATIVE NUMBERS? BINARY ADDITION 11/9/2018

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

ECE380 Digital Logic. Positional representation

NEW SELF-CHECKING BOOTH MULTIPLIERS

Cost/Performance Tradeoff of n-select Square Root Implementations

Fundamentals of Digital Design

Integer Multipliers 1

ELEN Electronique numérique

UNIT II COMBINATIONAL CIRCUITS:

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

Digital System Design Combinational Logic. Assoc. Prof. Pradondet Nilagupta

CMP 334: Seventh Class

Chapter 5: Solutions to Exercises

Lecture 12: Datapath Functional Units

COMPUTERS ORGANIZATION 2ND YEAR COMPUTE SCIENCE MANAGEMENT ENGINEERING UNIT 3 - ARITMETHIC-LOGIC UNIT JOSÉ GARCÍA RODRÍGUEZ JOSÉ ANTONIO SERRA PÉREZ

Adders, subtractors comparators, multipliers and other ALU elements

Combinational Logic Design Arithmetic Functions and Circuits

Digital Design for Multiplication

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

課程名稱 : 數位邏輯設計 P-1/ /6/11

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

Graduate Institute of Electronics Engineering, NTU Basic Division Scheme

BOOLEAN ALGEBRA. Introduction. 1854: Logical algebra was published by George Boole known today as Boolean Algebra

Arithmetic in Integer Rings and Prime Fields

DIGIT-SERIAL ARITHMETIC

Cost/Performance Tradeoffs:

Number representation

COE 202: Digital Logic Design Sequential Circuits Part 4. Dr. Ahmad Almulhem ahmadsm AT kfupm Phone: Office:

DIVISION BY DIGIT RECURRENCE

CprE 281: Digital Logic

14:332:231 DIGITAL LOGIC DESIGN. 2 s-complement Representation

Design of Sequential Circuits

Area-Time Optimal Adder with Relative Placement Generator

A VLSI Algorithm for Modular Multiplication/Division

GALOP : A Generalized VLSI Architecture for Ultrafast Carry Originate-Propagate adders

Solutions for Appendix C Exercises

14:332:231 DIGITAL LOGIC DESIGN

Chapter 2 Basic Arithmetic Circuits

Arithmetic Circuits-2

Sample Test Paper - I

L8/9: Arithmetic Structures

IT T35 Digital system desigm y - ii /s - iii

Numbers and Arithmetic

CSE 140 Spring 2017: Final Solutions (Total 50 Points)

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors

Combinational Logic. Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C.

EE141- Spring 2004 Digital Integrated Circuits

EECS150. Arithmetic Circuits

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples

Tunable Floating-Point for Energy Efficient Accelerators

Logic Design II (17.342) Spring Lecture Outline

Boolean Algebra and Digital Logic 2009, University of Colombo School of Computing

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Chapter 5. Digital systems. 5.1 Boolean algebra Negation, conjunction and disjunction

EECS150 - Digital Design Lecture 10 - Combinational Logic Circuits Part 1

7 Multipliers and their VHDL representation

A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1))

Review for Final Exam

Numbers and Arithmetic

ECE 545 Digital System Design with VHDL Lecture 1. Digital Logic Refresher Part A Combinational Logic Building Blocks

EXPERIMENT Bit Binary Sequential Multiplier

Transcription:

Multiplication 1 Multiplication Ivor Page 1 10.1 Shift/Add Multiplication Algorithms We will adopt the notation, a Multiplicand a k 1 a k 2 a 1 a 0 b Multiplier x k 1 x k 2 x 1 x 0 p Product p 2k 1 p 2k 2 p 1 p 0 Figure 1 shows the arrangement of partial products produced by a row of AND gates, p i = x i. a k 1 a 1,a 0 2 i. 1 University of Texas at Dallas

Multiplication 2 Multiplicand a Multiplier x a x 0 2 0 a x 1 2 1 a x 2 2 2 a x 3 2 3 Figure 1: Four by Four Bit Multiply The shift/add multiplier is a time-iterative system based on a k bit adder and a 2k bit result register that is shifted right one place on each iteration.figure 2 shows the schematic:

Multiplication 3 m'cand a mux 0 k-bit adder u k k v lsb Figure 2: Shift-Right Shift/Add Multiplier

Multiplication 4 Initially the multiplier is loaded into the k bit register v.the operand y is loaded into u, the upper half of that double register.at the end of the multiply operation, the double length result register contains y+xa. Conceptually each iteration comprises an add cycle and a shift cycle. Following the logic of Figure 2, either the multiplicand or zero is added to the upper half of the result register, depending on the lsb of the multiplier.the result is stored in the upper half of the double register, then the double register us shifted right one place.for this scheme to work, the register u would have to be k + 1 bits in length. In practice, the add and shift are usually combined into one operation, avoiding the need for the extra bit in the register u.if the lsb of the multiplier is 1, a combined add/shift take place.if it is zero, only a shift of the double length result register is needed.a fast right-justify-shift would shift the double register until its lsb is one, or until an associated iteration counter becomes zero.the counter is required in the control logic of all time-iterative implementations.

Multiplication 5 The shifts cause the bits of the multiplier to be presented to the control input of the mux in sequence and cause the places vacated by the multiplier to be taken up by the result as it grows to the right. Multipliers based on left-shifts are also possible, but are not favored because the right-shift version enables sign extension of the result register with minimal additional hardware costs. 10.1.1 Negative Operands If we multiply A B, where the multiplicand A is negative and the multiplier B is positive, the result is (2 k A) B = B2 k AB.The required result is 2 2k AB.Sign extension of all the partial products simulates the effect of making the multiplicand 2k bits long, (2 2k A) B = B2 2k AB.In the right-shift multiplier above, sign extension is simply performed by making the right-shift a signed-shift.during every right shift, the sign bit of the result (left-most bit of register u)iscopied to the right, rather than shifted.

Multiplication 6 Example, 5 7+0: 0 0 0 0 0 1 1 1 m pr lsb=1 so add/shift: + 1 0 1 1 1 1 0 1 1 0 1 1 m pr lsb=1 so add/shift: + 1 0 1 1 1 1 0 0 0 1 0 1 m pr lsb=1 so add/shift: + 1 0 1 1 1 0 1 1 1 0 1 0 m pr lsb=0 so just shift: 1 1 0 1 1 1 0 1 = -35

Multiplication 7 Now consider a positive multiplicand and a negative multiplier: (A) (2 k B) =A2 k AB.If we sign-extend the multiplier, there will be 2k partial products and the time for a multiply will double.there are two other ways to solve this problem.the first is to subtract A2 k from the result.this requires an additional add cycle.the second is to treat the sign bit of the multiplier differently and to subtract the multiplicand in the cycle where that bit is being examined and it is a 1. First, let s see how the incorrect result is formed and use the additional subtraction cycle to get the correct result.

Multiplication 8 Example, 5 ( 7) + 0: 0 0 0 0 1 0 0 1 m pr lsb=1 so add/shift: + 0 1 0 1 0 0 1 0 1 1 0 0 m pr lsb= so just shift: 0 0 0 1 0 1 1 0 m pr lsb=0 so just shift: 0 0 0 0 1 0 1 1 m pr lsb=1 so add/shift: + 0 1 0 1 0 0 1 0 1 1 0 1 = 45. subtract the m cand from upper half: - 0 1 0 1 1 1 0 1 1 1 0 1 = -35 Next we will avoid the need for a correction cycle by subtracting the multiplicand when we get to the sign bit of the multiplier and it is 1.

Multiplication 9 Example, 5 ( 7) + 0: 0 0 0 0 1 0 0 1 m pr lsb=1 so add/shift: + 0 1 0 1 0 0 1 0 1 1 0 0 m pr lsb= so just shift: 0 0 0 1 0 1 1 0 m pr lsb=0 so just shift: 0 0 0 0 1 0 1 1 m pr lsb=1, subtract/shift: - 0 1 0 1 1 1 0 1 1 1 0 1 = -35

Multiplication 10 10.1.2 Booth Recoding In the final example above, there is a string of 2 zeros in the multiplier. The corresponding cycles of the multiplication are just shifts.in Booth radix-2 recoding, the cycles of the multiplication algorithm corresponding to strings of consecutive zeros or ones in the multiplier require only shifts.the simplest way to understand the technique is to consider recoding of the bits of the multiplier into a redundant number system with digit range [-1,1] (that is digit set { 1, 0, 1}).Here is an example: 0 0 0 1 1 1 1 1 1 1 0 0 1 1 1 m pr x in radix 2 0 0 1 0 0 0 0 0 0-1 0 1 0 0-1 recoded m per A zero followed by a string of w ones is replaced by 1, w 1 zeros, then -1.The complete algorithm for conversion is to follow. 1 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0 m pr x in radix 2-1 0 1 0 0-1 1 0-1 1-1 1 0 0-1 0 recoded m per

Multiplication 11 In the first example, the number of add cycles is reduced from 10 to 4 but, if each bit of the multiplier has an even chance of being a one or a zero, the average number of adds remains the same. The modification to the hardware suggests that both the multiplicand and its negative value be available in registers.alternatively, logic could be provided to generate the 1 s complement of the multiplicand and, when the complement is used, the carry-in to the adder would be set to 1. The multiplier register is extended by one bit to include x 1.The operation in each cycle depends upon the two least significant bits of the multiplier register, x 0 and x 1.Initially x 1 =0.

Multiplication 12 Booth Radix-2 Recoding x i x i 1 Operation Explanation 0 0 shift within sequence of zeros 0 1 +a/shift end of a sequence of 1 s 1 0 a/shift beginning of a sequence of 1 s 1 1 shift within sequence of 1 s

Multiplication 13 Consider some values in 2 s complement notation and their Booth recoding: 2 s complement Booth recoding Value ------------- -------------- ----- 1 0 0 0 0-1 0 0 0 0-16 1 1 0 0 0 0-1 0 0 0-8 1 1 1 0 0 0 0-1 0 0-4 1 1 1 1 1 0 0 0 0-1 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1-1 1 0 0 0 1 1 0 0 1 0-1 3 0 0 1 1 1 0 1 0 0-1 7 0 1 1 1 1 1 0 0 0-1 15 0 1 1 1 0 1 0 0-1 0 14 0 1 1 0 0 1 0-1 0 0 12

Multiplication 14 The recoded version is not a complement system, but a redundant positional system with radix 2 and digit set [ 1, 1].This implies that the Booth multiplier correctly handles negative multiplier values. An example follows in which the multiplier is negative.the result register s sign bit propagates to the right during shifts.note the extra bit position to the right of the lsb of the multiplier, following the point in the example.

Multiplication 15 Example, 5 ( 7) + 0: 0 0 0 0 1 0 0 1.0 m pr lsbs=10 so subtract/shift: - 0 1 0 1 1 1 0 1 1 1 0 0.1 m pr lsbs=01 so add/shift: + 0 1 0 1 0 0 0 1 0 1 1 0.0 m pr lsbs=00 so just shift: 0 0 0 0 1 0 1 1.0 m pr lsbs=10, subtract/shift: - 0 1 0 1 1 1 0 1 1 1 0 1 = -35

Multiplication 16 Radix-2 Booth recoding will not, on average, reduce the number of additions.radix-4 Booth recoding, however, is widely used to reduce the number of partial products to be added.in this system, the digit set is [ 2, +2]. 21 31 22 32 radix 4 1 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0.0 radix 2-2 2-1 2-1 -1 0-2 Booth radix-4 recoded Each recoded digit comes from the three binary digits above it.note the extra bit to the right of the lsb. Conversion requires examination of three consecutive bits of the multiplier.after each recoded bit has been generated, and used in the multiplication operation, the result register is right shifted two places. Only k/2 partial products are generated.

Multiplication 17 This multiplier is typically implemented with two registers for the multiplicand, one for its value and the other for the negation of its value, since 2 multiplicand and -2 multiplicand can be got by shifts. Booth Radix-4 Recoding x i+1 x i x i 1 Operation Explanation 0 0 0 shift within sequence of zeros 0 0 1 +a/shift end of a sequence of 1 s 0 1 0 +a/shift isolated 1 0 1 1 +2a/shift end of a sequence of 1 s 1 0 0 2a/shift beginning of a sequence of 1 s 1 0 1 a/shift end of a sequence of 1 s,beginning of another 1 1 0 a/shift beginning of sequence of 1 s 1 1 1 shift within sequence of 1 s

Multiplication 18 Example, 22 ( 6) + 0, k =6bits: 0 0 0 0 0 0 1 1 1 0 1 0.0 m pr lsbs=100 so -2a/shift: - 1 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 1 0.1 m pr lsbs=101 so subtract/shift: - 0 1 0 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1.1 m pr lsbs=00 so just shift: 1 1 1 1 0 1 1 1 1 1 0 0.0 = -132 The adder must generate k + 1 output bits, the left-most being the sign bit.the right-shifts of the 2k +1bitresultregisterare2-place arithmetic shifts (the sign bit is propagated to the right).

Multiplication 19 Similar schemes using radix-8 and radix-16 recoding are also used. The radix-16 multiplier generates k/4 partial products (or requires k/4 add/shift cycles).five bits of the multiplier are examined at a time and a shift of four places takes place in each cycle. 10.1.3 Using a Carry-Save Adder In the above implementations the nature of the k bit adder was left unspecified.if a non-redundant carry-propagate adder is used, the best time for an add cycle is O(log k).since repeated additions are required, a carry-save adder is appropriate.in this design, the upper half of the result register comprises two parts, k bits for the incomplete sum, and k bits for the carries from the previous add cycle. Figure 3 illustrates the design:

Multiplication 20 products of multiplicand......... lsb's of multiplier F A F A F A c s incomplete sum register carry register Figure 3: Single CSA Adder in Booth Multiplier

Multiplication 21 In the diagram, the products of the multiplicand are a, a,.the outputs of the full-adders are skewed to the right to represent the combination of an add and a single bit shift.if Booth radix-4 or greater recoding is used, the skew would be over more bits. In each iteration a CSA addition takes constant time.after the final iteration, a carry-propagate addition is required to combine the incomplete sums with the carries.this can be done in T RC = O(k)timeusing a ripple-carry adder, or T CLA = O(log k) time using a CLA adder.the overall time is then c 1 k/b + T RC or c 1 k/b + T CLA where c 1 is the delay through the CSA and b =1, 2, 3, 4 corresponds to the advantage of using Booth recoding.since the first term is linear in k, it seems sensible to use a ripple-carry adder for the final stage.multiplexers can be used between the full-adders to switch the adder function from CSA to ripple-carry for the last iteration of the multiplication.

Multiplication 22 10.1.4 Using Multiple Carry-Save Adders Consider a design in which three digits of the multiplier are examined at a time and a shift of three places takes place on each iteration.we call this a radix-8 multiplier.say we examine x i+2,x i+1,x i, beginning with i = 0.We would add ax i+2 2 i+2 + ax i+1 2 i+1 + ax i 2 i to the result register.since scaling of the partial products takes place automatically, we would actually add one of {a, 2a,, 7a} to the upper half of the result register.a shift network would generate 2a and 4a and a CSA tree with five input operands would generate the next partial result. The five operands would be 4ax i+2,2ax x+1, ax 1, and the incomplete sum and carry outputs from the previous iteration. For a radix 8 system, three CSAs are needed and the depth of the CSA tree is three.

Multiplication 23 The use of a CSA tree significantly adds to the area of the multiplier in comparison with the equivalent Booth Radix-8 scheme.the delay through the CSA tree is O(log b) forradixb.this must be compared with the delay through the recoding unit in the Booth design.that unit examines 4 bits of the multiplier and generates signals for the multiplexer control inputs.the CSA tree design uses three bits of the multiplier directly as the multiplexer inputs. Radix 16 CSA tree-multipliers are also possible, providing an almost 4 speedup over the single CSA design. The optimal area time product for any k bit multiplier is proportional to k and the optimal area time 2 product is proportional to k 2.The overall time complexity of a radix b CSA tree multiplier is O((k/b)log b + log k) if a CLA adder is used for the final iteration. The area of the array of AND gates to produce the b partial products is O(kb).

Multiplication 24 So, for a radix b CSA tree multiplier, the two products are: AT = O(k 2 log b + bk log k) AT 2 = O((k 3 /b)log 2 b) These designs are sub-optimal for any radix b. Pipelining of the CSA tree can be used to enable a further speedup.for example, while the first level of the tree is adding values for iteration I, the second level is adding values for iteration I 1andsoon.This improvement reduces the time complexity to O((k/b)+ log k) Radix b Booth multipliers follow the same asymptotic results for area since we have not included the area of the CSA units in our estimates. The time complexity of the Booth radix b scheme is O((k/b)+log k), assuming that the recoding logic adds insignificantly to the delay. The pipelining idea is taken to the limit in the next module where we consider array multipliers.