EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

Similar documents
EECS150 - Digital Design Lecture 25 Shifters and Counters. Recap

EECS150 - Digital Design Lecture 21 - Design Blocks

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

EECS150 - Digital Design Lecture 18 - Counters

EECS150 - Digital Design Lecture 18 - Counters

EECS150 - Digital Design Lecture 17 - Sequential Circuits 3 (Counters)

Chapter 5 Arithmetic Circuits

EECS150 - Digital Design Lecture 23 - FSMs & Counters

EECS150 - Digital Design Lecture 16 Counters. Announcements

Lecture 8: Sequential Multipliers

Carry Look-ahead Adders. EECS150 - Digital Design Lecture 12 - Combinational Logic & Arithmetic Circuits Part 2. Carry Look-ahead Adders

Carry Look Ahead Adders

Review. EECS Components and Design Techniques for Digital Systems. Lec 18 Arithmetic II (Multiplication) Computer Number Systems

Combinational Logic Design Arithmetic Functions and Circuits

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Arithmetic Circuits-2

Combinational Logic. By : Ali Mustafa

EECS150 - Digital Design Lecture 22 - Arithmetic Blocks, Part 1

CS 140 Lecture 14 Standard Combinational Modules

Adders, subtractors comparators, multipliers and other ALU elements

What s the Deal? MULTIPLICATION. Time to multiply

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec 09 Counters Outline.

Chapter 03: Computer Arithmetic. Lesson 03: Arithmetic Operations Adder and Subtractor circuits Design

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec09 Counters Outline.

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

Lecture 8. Sequential Multipliers

EECS150 - Digital Design Lecture 10 - Combinational Logic Circuits Part 1

Hardware Design I Chap. 4 Representative combinational logic

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

Arithmetic Circuits-2

Design of Sequential Circuits

Adders, subtractors comparators, multipliers and other ALU elements

CS61C : Machine Structures

ELEN Electronique numérique

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing

Computer Architecture 10. Fast Adders

Tree and Array Multipliers Ivor Page 1

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

Multiplication Ivor Page 1

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

Numbers and Arithmetic

CS61C : Machine Structures

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Binary addition example worked out

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

DE58/DC58 LOGIC DESIGN DEC 2014

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Professor Fearing EECS150/Problem Set Solution Fall 2013 Due at 10 am, Thu. Oct. 3 (homework box under stairs)

ECE/CS 552: Introduction To Computer Architecture 1. Instructor:Mikko H Lipasti. Fall 2010 University i of Wisconsin-Madison

Numbers and Arithmetic

Lecture 18: Datapath Functional Units

Combinational Logic. Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C.

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Lecture 12: Datapath Functional Units

CprE 281: Digital Logic

CSEE 3827: Fundamentals of Computer Systems. Combinational Circuits

Floating Point Representation and Digital Logic. Lecture 11 CS301

Multiplication of signed-operands

ECE380 Digital Logic. Positional representation

Combinational Logic. Mantıksal Tasarım BBM231. section instructor: Ufuk Çelikcan

Numbers & Arithmetic. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. See: P&H Chapter , 3.2, C.5 C.

Arithmetic in Integer Rings and Prime Fields

Introduction to Digital Logic

Solutions for Appendix C Exercises

Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two.

14:332:231 DIGITAL LOGIC DESIGN. 2 s-complement Representation

Digital Logic. CS211 Computer Architecture. l Topics. l Transistors (Design & Types) l Logic Gates. l Combinational Circuits.

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

Cost/Performance Tradeoff of n-select Square Root Implementations

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Logic Design II (17.342) Spring Lecture Outline

CSE 140 Lecture 11 Standard Combinational Modules. CK Cheng and Diba Mirza CSE Dept. UC San Diego

ELCT201: DIGITAL LOGIC DESIGN

L8/9: Arithmetic Structures

UNSIGNED BINARY NUMBERS DIGITAL ELECTRONICS SYSTEM DESIGN WHAT ABOUT NEGATIVE NUMBERS? BINARY ADDITION 11/9/2018

CMP 334: Seventh Class

CSE 140 Spring 2017: Final Solutions (Total 50 Points)

Arithmetic Circuits-2

14:332:231 DIGITAL LOGIC DESIGN. Why Binary Number System?

3. Combinational Circuit Design

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

Digital System Design Combinational Logic. Assoc. Prof. Pradondet Nilagupta

Parallel Multipliers. Dr. Shoab Khan

211: Computer Architecture Summer 2016

14:332:231 DIGITAL LOGIC DESIGN

Logic. Combinational. inputs. outputs. the result. system can

CMSC 313 Lecture 18 Midterm Exam returned Assign Homework 3 Circuits for Addition Digital Logic Components Programmable Logic Arrays

Fundamentals of Digital Design

Sample Test Paper - I

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors

UNIT II COMBINATIONAL CIRCUITS:

Chapter 4. Combinational: Circuits with logic gates whose outputs depend on the present combination of the inputs. elements. Dr.

Boolean Algebra and Digital Logic 2009, University of Colombo School of Computing

Transcription:

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters April 15, 2010 John Wawrzynek 1 Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3 b 0 a 2 b 0 a 1 b 0 a 0 b 0 a 3 b 1 a 2 b 1 a 1 b 1 a 0 b 1 Partial a 3 b 2 a 2 b 2 a 1 b 2 a 0 b 2 products a 3 b 3 a 2 b 3 a 1 b 3 a 0 b 3... a 1 b 0 +a 0 b 1 a 0 b 0 Product Many different circuits exist for multiplication. Each one has a different balance between speed (performance) and amount of logic (cost). 2

Shift and Add Multiplier Sums each partial product, one at a time. In binary, each partial product is shifted versions of A or 0. Cost α n, Τ = n clock cycles. What is the critical path for determining the min clock period? Control Algorithm: 1. P 0, A multiplicand, B multiplier 2. If LSB of B==1 then add A to P else add 0 3. Shift [P][B] right 1 4. Repeat steps 2 and 3 n-1 times. 5. [P][B] has product. 3 Shift and Add Multiplier Signed Multiplication: Remember for 2 s complement numbers MSB has negative weight: ex: -6 = 11010 2 = 0 2 0 + 1 2 1 + 0 2 2 + 1 2 3-1 2 4 = 0 + 2 + 0 + 8-16 = -6 Therefore for multiplication: a) subtract final partial product b) sign-extend partial products Modifications to shift & add circuit: a) adder/subtractor b) sign-extender on P shifter register 4

Bit-serial Multiplier Bit-serial multiplier (n 2 cycles, one bit of result per n cycles): Control Algorithm: repeat n cycles { // outer (i) loop repeat n cycles{ // inner (j) loop shifta, selectsum, shifthi } Note: The occurrence of a control shiftb, shifthi, shiftlow, reset signal x means x=1. The absence } of x means x=0. 5 Array Multiplier Single cycle multiply: Generates all n partial products simultaneously. Each row: n-bit adder with AND gates What is the critical path? 6

Carry-Save Addition Speeding up multiplication is a matter of speeding up the summing of the partial products. Carry-save addition can help. Carry-save addition passes (saves) the carries to the output, rather than propagating them. Example: sum three numbers, 3 10 = 0011, 2 10 = 0010, 3 10 = 0011 3 10 0011 + 2 10 0010 c 0100 = 4 10 s 0001 = 1 10 carry-save add carry-save add carry-propagate add 3 10 0011 c 0010 = 2 10 s 0110 = 6 10 1000 = 8 10 In general, carry-save addition takes in 3 numbers and produces 2. Whereas, carry-propagate takes 2 and produces 1. With this technique, we can avoid carry propagation until final addition 7 Carry-save Circuits When adding sets of numbers, carry-save can be used on all but the final sum. Standard adder (carry propagate) is used for final sum. Carry-save is fast (no carry propagation) and cheap (same cost as ripple adder) 8

Array Multiplier using Carry-save Addition Fast carrypropagate adder 9 Carry-save Addition CSA is associative and communitive. For example: (((X 0 + X 1 ) + X 2 ) + X 3 ) = ((X 0 + X 1 ) +( X 2 + X 3 )) A balanced tree can be used to reduce the logic delay. This structure is the basis of the Wallace Tree Multiplier. Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum. Multiplier delay α log 3/2 N + log 2 N 10

Constant Multiplication Our discussion so far has assumed both the multiplicand (A) and the multiplier (B) can vary at runtime. What if one of the two is a constant? Y = C * X Constant Coefficient multiplication comes up often in signal processing and other hardware. Ex: y i = αy i-1 + x i x i y i where α is an application dependent constant that is hard-wired into the circuit. How do we build and array style (combinational) multiplier that takes advantage of the constancy of one of the operands? 11 Multiplication by a Constant If the constant C in C*X is a power of 2, then the multiplication is simply a shift of X. Ex: 4*X What about division? What about multiplication by non- powers of 2? 12

Multiplication by a Constant In general, a combination of fixed shifts and addition: Ex: 6*X = 0110 * X = (2 2 + 2 1 )*X Details: 13 Multiplication by a Constant Another example: C = 23 10 = 010111 In general, the number of additions equals the number of 1 s in the constant minus one. Using carry-save adders (for all but one of these) helps reduce the delay and cost, but the number of adders is still the number of 1 s in C minus 2. Is there a way to further reduce the number of adders (and thus the cost and delay)? 14

Multiplication using Subtraction Subtraction is ~ the same cost and delay as addition. Consider C*X where C is the constant value 15 10 = 01111. C*X requires 3 additions. We can recode 15 from 01111 = (2 3 + 2 2 + 2 1 + 2 0 ) to 10001 = (2 4-2 0 ) where 1 means negative weight. Therefore, 15*X can be implemented with only one subtractor. 15 Canonic Signed Digit Representation CSD represents numbers using 1, 1, & 0 with the least possible number of non-zero digits. Strings of 2 or more non-zero digits are replaced. Leads to a unique representation. To form CSD representation might take 2 passes: First pass: replace all occurrences of 2 or more 1 s: 01..10 by 10..10 Second pass: same as a above, plus replace 0110 by 0010 Examples: 011101 = 29 100101 = 32-4 + 1 0010111 = 23 0011001 0101001 = 32-8 - 1 Can we further simplify the multiplier circuits? 0110110 = 54 1011010 1001010 = 64-8 - 2 16

Constant Coefficient Multiplication (KCM) Binary multiplier: Y = 231*X = (2 7 + 2 6 + 2 5 + 2 2 + 2 1 +2 0 )*X CSD helps, but the multipliers are limited to shifts followed by adds. CSD multiplier: Y = 231*X = (2 8-2 5 + 2 3-2 0 )*X How about shift/add/shift/add? KCM multiplier: Y = 231*X = 7*33*X = (2 3-2 0 )*(2 5 + 2 0 )*X No simple algorithm exists to determine the optimal KCM representation. Most use exhaustive search method. 17 Fixed Shifters / Rotators fixed shifters hardwire the shift amount into the circuit. Ex: verilog: X >> 2 (right shift X by 2 places) Fixed shift/rotator is nothing but wires! So what? Logical Shift Rotate Arithmetic Shift 18

Variable Shifters / Rotators Example: X >> S, where S is unknown when we synthesize the circuit. Uses: shift instruction in processors (ARM includes a shift on every instruction), floating-point arithmetic, division/multiplication by powers of 2, etc. One way to build this is a simple shift-register: a) Load word, b) shift enable for S cycles, c) read word. Worst case delay O(N), not good for processor design. Can we do it in O(logN) time and fit it in one cycle? 19 Log Shifter / Rotator Log(N) stages, each shifts (or not) by a power of 2 places, S=[s 2 ;s 1 ;s 0 ]: Shift by N/2 Shift by 2 Shift by 1 20

LUT Mapping of Log shifter Efficient with 2to1 multiplexors, for instance, 3LUTs. Virtex5 has 6LUTs. Naturally makes 4to1 muxes: Reorganize shifter to use 4to1 muxes. Final stage uses F7 mux 21 Improved Shifter / Rotator How about this approach? Could it lead to even less delay? What is the delay of these big muxes? Look a transistor-level implementation? 22

Barrel Shifter Cost/delay? (don t forget the decoder) 23 Connection Matrix Generally useful structure: N 2 control points. What other interesting functions can it do? 24

Cross-bar Switch Nlog(N) control signals. Supports all interesting permutations All one-to-one and one-to-many connections. Commonly used in communication hardware (switches, routers). 25