Tree and Array Multipliers Ivor Page 1

Similar documents
Lecture 8: Sequential Multipliers

Multiplication Ivor Page 1

What s the Deal? MULTIPLICATION. Time to multiply

Chapter 5 Arithmetic Circuits

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

Lecture 8. Sequential Multipliers

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California

Residue Number Systems Ivor Page 1

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

Cost/Performance Tradeoffs:

Hardware Design I Chap. 4 Representative combinational logic

Carry Look Ahead Adders

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

1 Short adders. t total_ripple8 = t first + 6*t middle + t last = 4t p + 6*2t p + 2t p = 18t p

Part II Addition / Subtraction

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

Part II Addition / Subtraction

Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two.

Combinational Logic. By : Ali Mustafa

Arithmetic Circuits-2

Integer Multipliers 1

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

Multiplication of signed-operands

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

Cost/Performance Tradeoff of n-select Square Root Implementations

Forward and Reverse Converters and Moduli Set Selection in Signed-Digit Residue Number Systems

Complement Arithmetic

Arithmetic Circuits-2

Novel Modulo 2 n +1Multipliers

14:332:231 DIGITAL LOGIC DESIGN. 2 s-complement Representation

A Low-Error Statistical Fixed-Width Multiplier and Its Applications

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

A VLSI Algorithm for Modular Multiplication/Division

CS 140 Lecture 14 Standard Combinational Modules

The equivalence of twos-complement addition and the conversion of redundant-binary to twos-complement numbers

Computer Architecture 10. Fast Adders

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Numbers and Arithmetic

Z = F(X) Combinational circuit. A combinational circuit can be specified either by a truth table. Truth Table

Class Website:

On the Analysis of Reversible Booth s Multiplier

I. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya

ELEN Electronique numérique

Hakim Weatherspoon CS 3410 Computer Science Cornell University

14:332:231 DIGITAL LOGIC DESIGN

Lecture 18: Datapath Functional Units

NEW SELF-CHECKING BOOTH MULTIPLIERS

EE141- Spring 2004 Digital Integrated Circuits

Chapter 1: Solutions to Exercises

Lecture 12: Datapath Functional Units

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Numbers & Arithmetic. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. See: P&H Chapter , 3.2, C.5 C.

CHAPTER 2 NUMBER SYSTEMS

Literature Review on Multiplier Accumulation Unit by Using Hybrid Adder

Systems I: Computer Organization and Architecture

National Taiwan University Taipei, 106 Taiwan 2 Department of Computer Science and Information Engineering

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Total Time = 90 Minutes, Total Marks = 100. Total /10 /25 /20 /10 /15 /20

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors

Number representation

Power Optimization using Reversible Gates for Booth s Multiplier

UNSIGNED BINARY NUMBERS DIGITAL ELECTRONICS SYSTEM DESIGN WHAT ABOUT NEGATIVE NUMBERS? BINARY ADDITION 11/9/2018

Design and Comparison of Wallace Multiplier Based on Symmetric Stacking and High speed counters

Serial Parallel Multiplier Design in Quantum-dot Cellular Automata

University of Toronto Faculty of Applied Science and Engineering Final Examination

COMPUTERS ORGANIZATION 2ND YEAR COMPUTE SCIENCE MANAGEMENT ENGINEERING UNIT 3 - ARITMETHIC-LOGIC UNIT JOSÉ GARCÍA RODRÍGUEZ JOSÉ ANTONIO SERRA PÉREZ

COMBINATIONAL LOGIC FUNCTIONS

Quiz 2 Solutions Room 10 Evans Hall, 2:10pm Tuesday April 2 (Open Katz only, Calculators OK, 1hr 20mins)

Binary addition (1-bit) P Q Y = P + Q Comments Carry = Carry = Carry = Carry = 1 P Q

Arithmetic Circuits-2

Area-Time Optimal Adder with Relative Placement Generator

Numbers and Arithmetic

UNIT II COMBINATIONAL CIRCUITS:

ENGIN 112 Intro to Electrical and Computer Engineering

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

UNIT III Design of Combinational Logic Circuits. Department of Computer Science SRM UNIVERSITY

14:332:231 DIGITAL LOGIC DESIGN. Why Binary Number System?

EE40 Lec 15. Logic Synthesis and Sequential Logic Circuits

7 Multipliers and their VHDL representation

1 Boolean Algebra Simplification

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Quiz 2 Room 10 Evans Hall, 2:10pm Tuesday April 2 (Open Katz only, Calculators OK, 1hr 20mins)

ECE380 Digital Logic. Positional representation

Ex code

Floating Point Representation and Digital Logic. Lecture 11 CS301

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Additional Gates COE 202. Digital Logic Design. Dr. Muhamed Mudawar King Fahd University of Petroleum and Minerals

Logic. Combinational. inputs. outputs. the result. system can

Sample Test Paper - I

A High-Speed Realization of Chinese Remainder Theorem

Solutions for Appendix C Exercises

Combinational Logic Design Arithmetic Functions and Circuits

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

Design of Sequential Circuits

Lecture 12: Datapath Functional Units

Chapter 03: Computer Arithmetic. Lesson 03: Arithmetic Operations Adder and Subtractor circuits Design

WORKBOOK. Try Yourself Questions. Electrical Engineering Digital Electronics. Detailed Explanations of

Transcription:

Tree and Array Multipliers 1 Tree and Array Multipliers Ivor Page 1 11.1 Tree Multipliers In Figure 1 seven input operands are combined by a tree of CSAs. The final level of the tree is a carry-completion adder, probably a CLA-based design. In the tree multiplier k/b input operands are combined, where b =2,4,or is the advantage gained by Booth recoding of the multiplier. All k bits wide k-bit CSA k-bit CSA All k bits wide k-bit CSA k-bit CSA All k bits wide k k k-1 k-bit CSA k-bit adder bit k+2 k bits bit 1 bit 0 Figure 1: Tree Multiplier Wallace or Dadda tree designs can be used, but the resulting layouts are irregular with varying signal path lengths and delays. An alternative design makes use of basic building blocks that comprise two CSAs to reduce 4-inputs to 2-outputs. These 4-to-2 blocks are used to form a simple binary tree that has a regular structure that is easy to layout. See Figure 2. The resulting binary tree has more CSA units than a Wallace or Dadda tree design, but its regular layout makes it attractive. 1 University of Texas at Dallas

Tree and Array Multipliers 2 CSA 4-to-2 4-to-2 4-to-2 4-to-2 CSA 4-to-2 4-to-2 4-to-2 Figure 2: 4-to-2 Building Block and Associated Binary Tree Multiplier The partial products to be added are progressively left shifted as illustrated in Figure 3: Multiplicand a Multiplier x a x 0 2 0 a x 1 2 1 a x 2 2 2 a x 3 2 3 Figure 3: Arrangement of partial products It would be wasteful to use CSAs of k +3 and more bits in a tree to add these partial products. Instead, each column of the array of partial products can be added with a separate tree of single-bit CSAs, or full-adders. Beginning in the 2 2 column and moving to the left, columns of 3, 4, k, k 1,,4, 3, 2, 1 bits must be reduced to just two bits in each column. As we saw in the last module, carry-in signals to each column add to these inputs. Wallace or Dadda trees can be used in these columns, with the carry signals propagating to the left. The full-adders of these designs each reduce 3 input bits to two outputs, providing a log-depth multiplier, but an irregular layout with signal paths of

Tree and Array Multipliers 3 varying lengths and varying delays. Figure 4 shows a balanced-delay 11 input tree. Each output is valid after the same number of full-adder delays. This circuit must be replicated in each column in order to equalize the delay paths throughout the entire multiplier. The number of inputs can easily be expanded to 1 by adding a column of seven full-adders to the right of the diagram. These structures have good layouts, but only for specific numbers of inputs. Level 1 carries Level 2 carries Level 3 carries Level 4 carry Outputs Figure 4: 11 Input Balanced Delay Counter For comparison purposes, Figure 5 shows an -bit multiplier based on BSD redundant binary adders. In this layout the result is produced in the middle. As can be seen, approximately 3k/2 connections pass vertically through the 3rd adder block. The 2k output connections and the k multiplicand inputs must also must be routed vertically in between the cells of the adder blocks. The layout is normally arranged so that the result is produced at the bottom (see figure in the text). This increases the amount of pass-through connections. There are k 2 partial product digits to be added and the depth of the multiplier is log k. The area of the multiplier is O(k 2 log k).

Tree and Array Multipliers 4 Adder sizes 11 7 10 12 11 12 Figure 5: Multiplier Based on Redundant BSD numbers Tagaki et.al. 2 gives the following table comparing the BSD multiplier with the standard array multiplier and one based on Wallace trees: k 16 32 64 12 Array 29/52 61/2336 125/9792 253/40064 509/16204 Wallace 22/15 24/2939 30/9965 34/37423 40/142335 BSD 21/550 27/2213 31/796 37/343 41/135324 The elements of the table, are Depth/Gate Count. The design is based on a fan-in limit of 4. As can be seen, the differences between the BSD and Wallace tree versions are minimal. The only advantage that can be claimed for the BSD version is that the layout is more regular than in the Wallace Tree version. This may lead to better actual area and delay for the BSD version. A layout similar to that of Figure 5 can be used for a multiplier based on the 4:2 units of Figure 2. Indeed, the 4:2 unit can be viewed as a GSD radix-2 adder for digit set [0, 2] where the two outputs of each 4:2 unit have encoding: zero=(0,0), one=(0,1) or (1,0), two=(11). This observation makes the two designs very simple to compare. The area and delay of the units depends 2 Naofumi Tagaki, Hiroto Yashuura, Shuzo Yajima, High Speed VLSI Multiplication Algorithm with a Redundant Binary Addition Tree, IEEE Transactions on Computers, C-34,9 Sept. 195

Tree and Array Multipliers 5 on the area and delay of the building blocks: BSD adder blocks vs. 4:2 reduction blocks. According to Takagi, a BSD adder block for k bits requires 13k or/nor gates. The technology used was emitted coupled logic, which naturally provides two outputs, the positive output and its complement. If we are restricted to Nand gates, then several inverters would have to be added to this estimate. The k bit 4:2 adder block would require 1k 2-input gates. On this basis, it appears that the multiplier based on BSD blocks would have a slight area advantage, but the comparison is really too close to call. We therefore have at least three designs for multipliers that have log depth and comparable gate counts. The Wallace tree has the most complex and least efficient layout. All three designs can benefit from high radix Booth recoding, and pipeline registers can be inserted in all of the designs to speed throughput (without reducing delay time). 11.2 Tree Multipliers for Signed Numbers Any form of Booth recoding, or the use of a redundant number system, will naturally take care of negative multiplier values. Negative multiplicands are also naturally catered for in a redundant system with symmetrical digit set (such as BSD). In any non-redundant tree (or array) multiplier it is still necessary to simulate sign extension of any negative partial products. The Baugh Wooley multiplier naturally takes care of both negative multipliers and multiplicands. Here are some tables illustrating the development of the Baugh Wooley multiplier. The first is an array multiplier for unsigned values: a 4 a 3 a 2 a 1 a 0 x 4 x 3 x 2 x 1 x 0 a 4 x 0 a 3 x 0 a 2 x 0 a 1 x 0 a 0 x 0 a 4 x 1 a 3 x 1 a 2 x 1 a 1 x 1 a 0 x 1 a 4 x 2 a 3 x 2 a 2 x 2 a 1 x 2 a 0 x 2 a 4 x 3 a 3 x 3 a 2 x 3 a 1 x 3 a 0 x 3 a 4 x 4 a 3 x 4 a 2 x 4 a 1 x 4 a 0 x 4 p 9 p p 7 p 6 p 5 p 4 p 3 p 2 p 1 p 0 In the next example we make use of the fact that the value of a k bit 2 s

Tree and Array Multipliers 6 complement number is: x k 1 2 k 1 + If the sign bit of the multiplier, x 4, is 1, the multiplicand is subtracted in the final row. x i 2 i a 4 a 3 a 2 a 1 a 0 x 4 x 3 x 2 x 1 x 0 a 4 x 0 a 3 x 0 a 2 x 0 a 1 x 0 a 0 x 0 a 4 x 1 a 3 x 1 a 2 x 1 a 1 x 1 a 0 x 1 a 4 x 2 a 3 x 2 a 2 x 2 a 1 x 2 a 0 x 2 a 4 x 3 a 3 x 3 a 2 x 3 a 1 x 3 a 0 x 3 a 4 x 4 a 3 x 4 a 2 x 4 a 1 x 4 a 0 x 4 p 9 p p 7 p 6 p 5 p 4 p 3 p 2 p 1 p 0 This table is impractical since it implies subtracting individual bits and adding others. The Baugh Wooley Multiplier uses only additions. Here is its table: a 4 a 3 a 2 a 1 a 0 x 4 x 3 x 2 x 1 x 0 a 4 x 0 a 3 x 0 a 2 x 0 a 1 x 0 a 0 x 0 a 4 x 1 a 3 x 1 a 2 x 1 a 1 x 1 a 0 x 1 a 4 x 2 a 3 x 2 a 2 x 2 a 1 x 2 a 0 x 2 a 4 x 3 a 3 x 3 a 2 x 3 a 1 x 3 a 0 x 3 a 4 x 4 a 3 x 4 a 2 x 4 a 1 x 4 a 0 x 4 a 4 a 4 1 x 4 x 4 p 9 p p 7 p 6 p 5 p 4 p 3 p 2 p 1 p 0 We now derive the Bough Wooley Multiplier equations. intend to produce can be expressed as follows: The product we P = A X =(p 2k 1 p 2k 2 p 1 p 0 )= p 2k 1 2 2k 1 + 2 p i 2 i

Tree and Array Multipliers 7 = ( a k 1 2 k 1 + ) a i 2 i x k 1 2 k 1 + k 2 k 2 = a k 1 x k 1 2 2 + a i x j 2 i+j k 2 2 k 1 j=0 k 2 a k 1 x i 2 i 2 k 1 j=0 x j 2 j a i x k 1 2 i (1) Equation 1 must be manipulated so that there are no negative terms. Here is the third term without its negative sign: k 2 2 k 1 a k 1 x i 2 i = 2 k 1 ( 0 2 k +0 2 k 1 + The inclusion of the two zeroes does not change the value. The expression in parens can be thought of as a k + 1 bit value with the sign bit in the 2 k position. We now negate that expression by complementing every bit and adding 1: = 2 k 1 ( 2 k +2 k 1 +1+ Equation 2 has zero value for a k 1 = 0 and for for a k 1 = 1 it has value: 2 k 1 ( 2 k +2 k 1 +1+ Equation 2 can therefore be rewritten: 2 k 1 ( 2 k +2 k 1 + a k 1 2 k 1 + a k 1 + x i 2 i ) The fourth term of Equation 1 can similarly be rewritten. (2)

Tree and Array Multipliers The product becomes: P = a k 1 x k 1 2 2k 2 + j=0 a i x j 2 i+j +2 k 1 ( 2 k +2 k 1 + a k 1 2 k 1 + a k 1 + +2 k 1 ( 2 k +2 k 1 + x k 1 2 k 1 + x k 1 + Equation 3 provides the basis for the Baugh Wooley multiplier. Here again is the 5 5 Baugh Wooley multiplier table: (3) a 4 a 3 a 2 a 1 a 0 x 4 x 3 x 2 x 1 x 0 a 4 x 0 a 3 x 0 a 2 x 0 a 1 x 0 a 0 x 0 a 4 x 1 a 3 x 1 a 2 x 1 a 1 x 1 a 0 x 1 a 4 x 2 a 3 x 2 a 2 x 2 a 1 x 2 a 0 x 2 a 4 x 3 a 3 x 3 a 2 x 3 a 1 x 3 a 0 x 3 a 4 x 4 a 3 x 4 a 2 x 4 a 1 x 4 a 0 x 4 a 4 a 4 1 x 4 x 4 p 9 p p 7 p 6 p 5 p 4 p 3 p 2 p 1 p 0 The first term of Equation 3 is a 4 x 4 2. The second term is a i x j 2 i+j j=0 It gives all the entries without complemented variables: a 3 x 0 a 2 x 0 a 1 x 0 a 0 x 0 a 3 x 1 a 2 x 1 a 1 x 1 a 0 x 1 a 3 x 2 a 2 x 2 a 1 x 2 a 0 x 2 a 3 x 3 a 2 x 3 a 1 x 3 a 0 x 3

Tree and Array Multipliers 9 The third term is: It gives 2 9 +2 plus: 2 k 1 ( 2 k +2 k 1 + a k 1 2 k 1 + a k 1 + a 4 x 3 a 4 x 2 a 4 x 1 a 4 x 0 a 4 a 4 And the fourth term is: It gives 2 9 +2 plus: 2 k 1 ( 2 k +2 k 1 + x k 1 2 k 1 + x k 1 + a 3 x 4 a 2 x 4 a 1 x 4 a 0 x 4 x 4 x 4 The two terms 2 9 +2 sum to form 2 10 +2 9. The negative term is outside the range of the product. Note that the table for Baugh Wooley in the text is incorrect. Here is that table:

Tree and Array Multipliers 10 a 4 a 3 a 2 a 1 a 0 x 4 x 3 x 2 x 1 x 0 a 4 x 0 a 3 x 0 a 2 x 0 a 1 x 0 a 0 x 0 a 4 x 1 a 3 x 1 a 2 x 1 a 1 x 1 a 0 x 1 a 4 x 2 a 3 x 2 a 2 x 2 a 1 x 2 a 0 x 2 a 4 x 3 a 3 x 3 a 2 x 3 a 1 x 3 a 0 x 3 a 4 x 4 a 3 x 4 a 2 x 4 a 1 x 4 a 0 x 4 a 4 a 4 1 x 4 x 4 p 9 p p 7 p 6 p 5 p 4 p 3 p 2 p 1 p 0 Try the example ( 1) ( 1): 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 0 0 0 1 The Baugh-Wooley multiplier deals with both negative multiplicands and negative multipliers without sign extension. The two additional levels add to the delay, although it is possible to remove one of these levels. A CLA adder can used to sum the two values emerging at the bottom of the array. The table can be implemented with Wallace trees in each column. Alternatively pipeline registers can be inserted between the rows of the table, enabling higher throughput, but not reducing delay.