Lecture 11. Advanced Dividers

Required Reading
Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design
Chapter 15, Variations in Dividers: Section 15.3, Combinational and Array Dividers
Chapter 16, Division by Convergence

Division versus Multiplication
Division is more complex than multiplication:
Need for quotient digit selection or estimation
Overflow possibility: the high-order k bits of z must be strictly less than d; this overflow check also detects the divide-by-zero condition.

Pentium III latencies
Instruction                   Latency   Cycles/Issue
Load / Store                  3         1
Integer Multiply              4         1
Integer Divide                36        36
Double/Single FP Multiply     5         2
Double/Single FP Add          3         1
Double/Single FP Divide       38        38

The ratios haven't changed much in later Pentiums, Atom, or AMD products*
*Source: T. Granlund, Instruction Latencies and Throughput for AMD and Intel x86 Processors, Feb. 2012
May 2012 Computer Arithmetic, Division Slide 3

Classification of Dividers
Sequential: radix-2 and high-radix
Array Dividers
Dividers by Convergence
Sequential variants shown: restoring, non-restoring, regular SRT, SRT using carry-save adders

Fractional Division

Unsigned Fractional Division
$z_{frac}$ Dividend: $.z_{-1} z_{-2} \ldots z_{-(2k-1)} z_{-2k}$
$d_{frac}$ Divisor: $.d_{-1} d_{-2} \ldots d_{-(k-1)} d_{-k}$
$q_{frac}$ Quotient: $.q_{-1} q_{-2} \ldots q_{-(k-1)} q_{-k}$
$s_{frac}$ Remainder: $.000 \ldots 0\, s_{-(k+1)} \ldots s_{-(2k-1)} s_{-2k}$ (k leading zero bits)

Integer vs. Fractional Division
For integers: $z = q\,d + s$
Multiplying both sides by $2^{-2k}$ gives, for fractions: $z\,2^{-2k} = (q\,2^{-k})(d\,2^{-k}) + s\,2^{-2k}$
that is, $z_{frac} = q_{frac}\,d_{frac} + s_{frac}$
where $z_{frac} = z\,2^{-2k}$, $d_{frac} = d\,2^{-k}$, $q_{frac} = q\,2^{-k}$, $s_{frac} = s\,2^{-2k}$
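The scaling between the integer and fractional views can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper name `to_fractional` and the operands z = 117, d = 10, k = 4 are illustrative assumptions, and exact rational arithmetic stands in for fixed-point hardware.

```python
from fractions import Fraction


def to_fractional(z, d, k):
    """Scale an integer division z = q*d + s into its fractional form.

    Hypothetical helper: z is a 2k-bit integer dividend and d a k-bit
    integer divisor. Returns (z_frac, d_frac, q_frac, s_frac).
    """
    q, s = divmod(z, d)                   # integer identity z = q*d + s
    scale_k = Fraction(1, 2 ** k)         # 2^-k
    scale_2k = Fraction(1, 2 ** (2 * k))  # 2^-2k
    return z * scale_2k, d * scale_k, q * scale_k, s * scale_2k


# Illustrative operands (an assumption): z = 117, d = 10, k = 4
z_frac, d_frac, q_frac, s_frac = to_fractional(117, 10, 4)
print(z_frac, d_frac, q_frac, s_frac)  # 117/256 5/8 11/16 7/256
```

Because both sides of the integer identity are multiplied by the same factor, the fractional identity holds exactly, and the no-overflow condition becomes a simple comparison of the two fractions.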

Unsigned Fractional Division Overflow
Condition for no overflow: $z_{frac} < d_{frac}$

Sequential Fractional Division: Basic Equations
$s^{(0)} = z_{frac}$
$s^{(j)} = 2\,s^{(j-1)} - q_{-j}\,d_{frac}$ for $j = 1, \ldots, k$
$2^{k} s_{frac} = s^{(k)}$, so $s_{frac} = 2^{-k} s^{(k)}$

Fig. 13.2 Examples of sequential division with integer and fractional operands.

Array Dividers

Sequential Fractional Division: Basic Equations
$s^{(0)} = z_{frac}$
$s^{(j)} = 2\,s^{(j-1)} - q_{-j}\,d_{frac}$
$s^{(k)} = 2^{k} s_{frac}$

Restoring Unsigned Fractional Division
$s^{(0)} = z$
for j = 1 to k
    if $2\,s^{(j-1)} - d \ge 0$
        $q_{-j} = 1$
        $s^{(j)} = 2\,s^{(j-1)} - d$
    else
        $q_{-j} = 0$
        $s^{(j)} = 2\,s^{(j-1)}$
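The restoring loop can be sketched in software as follows. This is a minimal illustration under stated assumptions: the function name `restoring_divide` is hypothetical, operands are exact `Fraction` values rather than k-bit registers, and a trial difference of exactly zero is treated as a successful subtraction.

```python
from fractions import Fraction


def restoring_divide(z, d, k):
    """Radix-2 restoring division of fractions, producing k quotient bits.

    Requires 0 <= z < d (no overflow). Returns (bits, q_frac, s_frac),
    where bits = [q_-1, ..., q_-k] and z = q_frac * d + s_frac.
    """
    z, d = Fraction(z), Fraction(d)
    if not 0 <= z < d:
        raise ValueError("overflow: need 0 <= z < d")
    s = z                               # s(0) = z
    bits = []
    for _ in range(k):
        trial = 2 * s - d               # shift left, then trial subtraction
        if trial >= 0:
            bits.append(1)
            s = trial                   # keep the difference
        else:
            bits.append(0)
            s = 2 * s                   # "restore": keep the shifted value
    q_frac = sum(Fraction(b, 2 ** (j + 1)) for j, b in enumerate(bits))
    s_frac = s / 2 ** k                 # s_frac = 2^-k * s(k)
    return bits, q_frac, s_frac


# Illustrative operands (an assumption): z = 117/256, d = 10/16, k = 4
bits, q_frac, s_frac = restoring_divide(Fraction(117, 256), Fraction(10, 16), 4)
print(bits, q_frac, s_frac)  # [1, 0, 1, 1] 11/16 7/256
```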

Restoring Array Divider
[Figure: restoring array divider built from full-subtractor (FS) cells, producing quotient bits $q_1, q_2, q_3$ and remainder bits $s_4, s_5, s_6$.]
Dividend $z = .z_1 z_2 z_3 z_4 z_5 z_6$
Divisor $d = .d_1 d_2 d_3$
Quotient $q = .q_1 q_2 q_3$
Remainder $s = .0\,0\,0\,s_4 s_5 s_6$

Non-Restoring Unsigned Fractional Division
$s^{(-1)} = z - d$
for j = 0 to k-1
    if $s^{(j-1)} \ge 0$
        $q_{-j} = 1$
        $s^{(j)} = 2\,s^{(j-1)} - d$
    else
        $q_{-j} = 0$
        $s^{(j)} = 2\,s^{(j-1)} + d$
end for
if $s^{(k-1)} \ge 0$
    $q_{-k} = 1$
else
    $q_{-k} = 0$
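A software sketch of the non-restoring loop follows. The function name `nonrestoring_divide` is hypothetical and exact `Fraction` arithmetic is an assumption. The final remainder correction (adding d back when the last partial remainder is negative) is not shown on the slide; it is included here as the usual way to recover the true remainder.

```python
from fractions import Fraction


def nonrestoring_divide(z, d, k):
    """Radix-2 non-restoring division of fractions.

    Requires 0 <= z < d. Returns (bits, q_frac, s_frac), where
    bits = [q_0, q_-1, ..., q_-k]; q_0 is 0 when there is no overflow.
    """
    z, d = Fraction(z), Fraction(d)
    if not 0 <= z < d:
        raise ValueError("overflow: need 0 <= z < d")
    s = z - d                           # s(-1) = z - d
    bits = []
    for _ in range(k):                  # j = 0 .. k-1
        if s >= 0:
            bits.append(1)
            s = 2 * s - d               # previous subtraction succeeded
        else:
            bits.append(0)
            s = 2 * s + d               # compensate by adding instead
    bits.append(1 if s >= 0 else 0)     # q_-k from the sign of s(k-1)
    if s < 0:
        s += d                          # assumed final correction step
    q_frac = sum(Fraction(b, 2 ** j) for j, b in enumerate(bits))
    s_frac = s / 2 ** k
    return bits, q_frac, s_frac


# Illustrative operands (an assumption): z = 117/256, d = 10/16, k = 4
nr_bits, nr_q, nr_s = nonrestoring_divide(Fraction(117, 256), Fraction(10, 16), 4)
print(nr_bits, nr_q, nr_s)  # [0, 1, 0, 1, 1] 11/16 7/256
```

Unlike the restoring scheme, each step performs exactly one addition or subtraction, which is why the array version needs only controlled add/subtract cells.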

Nonrestoring Array Divider
[Figure: nonrestoring array divider built from cells containing an XOR gate and a full adder (FA), with the critical path marked.]
Similarity to array multiplier is deceiving
Dividend $z = z_0.z_1 z_2 z_3 z_4 z_5 z_6$
Divisor $d = d_0.d_1 d_2 d_3$
Quotient $q = q_0.q_1 q_2 q_3$
Remainder $s = 0.0\,0\,s_3 s_4 s_5 s_6$

Division by Convergence

Division by Convergence
Chapter Goals
Show how, by using multiplication as the basic operation in each division step, the number of iterations can be reduced
Chapter Highlights
Digit-recurrence as convergence method
Convergence by Newton-Raphson iteration
Computing the reciprocal of a number
Hardware implementation and fine tuning

16.1 General Convergence Methods
Sequential digit-at-a-time (binary or high-radix) division can be viewed as a convergence scheme
As each new digit of $q = z / d$ is determined, the quotient value is refined, until it reaches the final correct value
Convergence is from below in restoring division and oscillating in nonrestoring division
Meanwhile, the remainder $s = z - q\,d$ approaches 0; the scaled remainder is kept in a certain range, such as $[-d, d)$
[Figure: quotient value q converging to its final value, digit by digit.]

Elaboration on Scaled Remainder in Division
The partial remainder $s^{(j)}$ in the division recurrence isn't the true remainder but a version scaled by $2^{j}$
Division with left shifts: $s^{(j)} = 2\,s^{(j-1)} - q_{k-j}\,(2^{k} d)$ with $s^{(0)} = z$ and $s^{(k)} = 2^{k} s$
(the factor 2 is the shift; the $q_{k-j}\,(2^{k} d)$ term is the subtraction)
Quotient digit selection keeps the scaled remainder bounded (say, in the range $-d$ to $d$) to ensure the convergence of the true remainder to 0

Recurrence Formulas for Convergence Methods
Two-variable form: $u^{(i+1)} = f(u^{(i)}, v^{(i)})$, $v^{(i+1)} = g(u^{(i)}, v^{(i)})$
Three-variable form: $u^{(i+1)} = f(u^{(i)}, v^{(i)}, w^{(i)})$, $v^{(i+1)} = g(u^{(i)}, v^{(i)}, w^{(i)})$, $w^{(i+1)} = h(u^{(i)}, v^{(i)}, w^{(i)})$
Guide the iteration such that one of the values converges to a constant (usually 0 or 1)
The other value then converges to the desired function
The complexity of this method depends on two factors:
a. Ease of evaluating f and g (and h)
b. Rate of convergence (number of iterations needed)

16.2 Division by Repeated Multiplications
Motivation: Suppose add takes 1 clock and multiply takes 3 clocks
64-bit divide takes 64 clocks in radix 2, 32 in radix 4
Division via multiplications is therefore faster if 10 or fewer multiplications are needed
Idea: $q = \dfrac{z}{d} = \dfrac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
The numerator converges to q; the denominator is forced to 1
Remainder often not needed, but can be obtained by another multiplication if desired: $s = z - q\,d$
To turn the identity into a division algorithm, we face three questions:
1. How to select the multipliers $x^{(i)}$?
2. How many iterations (pairs of multiplications)?
3. How to implement in hardware?

Formulation as a Convergence Computation
Idea: $q = \dfrac{z}{d} = \dfrac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$ (numerator converges to q; denominator is forced to 1)
$d^{(i+1)} = d^{(i)} x^{(i)}$; set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
$z^{(i+1)} = z^{(i)} x^{(i)}$; set $z^{(0)} = z$; obtain $z/d = q \approx z^{(m)}$
Question 1: How to select the multipliers $x^{(i)}$?
$x^{(i)} = 2 - d^{(i)}$
This choice transforms the recurrence equations into:
$d^{(i+1)} = d^{(i)} (2 - d^{(i)})$; set $d^{(0)} = d$; iterate until $d^{(m)} \approx 1$
$z^{(i+1)} = z^{(i)} (2 - d^{(i)})$; set $z^{(0)} = z$; obtain $z/d = q \approx z^{(m)}$
This fits the general form $u^{(i+1)} = f(u^{(i)}, v^{(i)})$, $v^{(i+1)} = g(u^{(i)}, v^{(i)})$
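The pair of recurrences can be tried out directly. The sketch below is an illustration only: the function name `divide_by_repeated_multiplication` is hypothetical, the divisor is assumed to be normalized to [1/2, 1), and exact `Fraction` arithmetic replaces the truncated fixed-point products a real multiplier would produce.

```python
from fractions import Fraction


def divide_by_repeated_multiplication(z, d, k):
    """Approximate q = z/d with m = ceil(log2 k) iterations.

    Each iteration multiplies both z and d by x = 2 - d, so d is driven
    toward 1 while z is driven toward q. Assumes 1/2 <= d < 1.
    Returns (z_m, d_m).
    """
    z, d = Fraction(z), Fraction(d)
    if not Fraction(1, 2) <= d < 1:
        raise ValueError("divisor must be normalized to [1/2, 1)")
    m = (k - 1).bit_length()            # ceil(log2 k) for k >= 1
    for _ in range(m):
        x = 2 - d                       # multiplier x(i) = 2 - d(i)
        d = d * x                       # d(i+1) = d(i) * x(i)
        z = z * x                       # z(i+1) = z(i) * x(i)
    return z, d


# Illustrative operands (an assumption): z = 117/256, d = 5/8, k = 16 bits
q_approx, d_final = divide_by_repeated_multiplication(Fraction(117, 256), Fraction(5, 8), 16)
print(float(q_approx), float(d_final))
```

With exact arithmetic the final denominator equals 1 - y^(2^m) where y = 1 - d, so the quotient error is below 2^-k for k-bit operands.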

Determining the Rate of Convergence
$d^{(i+1)} = d^{(i)} x^{(i)}$; set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
$z^{(i+1)} = z^{(i)} x^{(i)}$; set $z^{(0)} = z$; obtain $z/d = q \approx z^{(m)}$
Question 2: How quickly does $d^{(i)}$ converge to 1?
We can relate the error in step i + 1 to the error in step i:
$d^{(i+1)} = d^{(i)} (2 - d^{(i)}) = 1 - (1 - d^{(i)})^{2}$
$1 - d^{(i+1)} = (1 - d^{(i)})^{2}$
For $1 - d^{(i)} \le \varepsilon$, we get $1 - d^{(i+1)} \le \varepsilon^{2}$: quadratic convergence
In general, for k-bit operands, we need $2m - 1$ multiplications and $m$ 2's complementations, where $m = \lceil \log_2 k \rceil$

Quadratic Convergence
Table 16.1 Quadratic convergence in computing z/d by repeated multiplications, where $1/2 \le d = 1 - y < 1$

i   $d^{(i)} = d^{(i-1)} x^{(i-1)}$, with $d^{(0)} = d$                      $x^{(i)} = 2 - d^{(i)}$
0   $1 - y = (.1xxx\ xxxx\ xxxx\ xxxx)_{two} \ge 1/2$                 $1 + y$
1   $1 - y^{2} = (.11xx\ xxxx\ xxxx\ xxxx)_{two} \ge 3/4$             $1 + y^{2}$
2   $1 - y^{4} = (.1111\ xxxx\ xxxx\ xxxx)_{two} \ge 15/16$           $1 + y^{4}$
3   $1 - y^{8} = (.1111\ 1111\ xxxx\ xxxx)_{two} \ge 255/256$         $1 + y^{8}$
4   $1 - y^{16} = (.1111\ 1111\ 1111\ 1111)_{two} = 1 - ulp$

Each iteration doubles the number of guaranteed leading 1s (convergence to 1 is from below)
Beginning with a single 1 ($d \ge 1/2$), after $\log_2 k$ iterations we get as close to 1 as is possible in a fractional representation
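The doubling of leading 1s in Table 16.1 can be reproduced for the worst case d = 1/2. This is a small sketch under assumptions: the helper `leading_ones` and the 16-bit width are illustrative choices, and exact `Fraction` arithmetic is used in place of truncated products.

```python
from fractions import Fraction


def leading_ones(d, width=16):
    """Count leading 1s among the first `width` fraction bits of 0 <= d < 1."""
    count = 0
    for _ in range(width):
        d *= 2                          # expose the next fraction bit
        if d < 1:                       # next bit is 0: stop counting
            break
        d -= 1                          # next bit is 1: strip it
        count += 1
    return count


d = Fraction(1, 2)                      # worst case: a single leading 1
history = []
for i in range(5):
    history.append(leading_ones(d))
    d = d * (2 - d)                     # d(i+1) = d(i) * (2 - d(i))

print(history)  # [1, 2, 4, 8, 16]
```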

Graphical Depiction of Convergence to q
[Figure: $d^{(i)}$ rising toward $1 - ulp$ and $z^{(i)}$ rising toward $q - \varepsilon$, plotted against iteration i.]
Fig. 16.1 Graphical representation of convergence in division by repeated multiplications.

16.5 Hardware Implementation
Repeated multiplications: each pair of operations involves the same multiplier
$d^{(i+1)} = d^{(i)} (2 - d^{(i)})$; set $d^{(0)} = d$; iterate until $d^{(m)} \approx 1$
$z^{(i+1)} = z^{(i)} (2 - d^{(i)})$; set $z^{(0)} = z$; obtain $z/d = q \approx z^{(m)}$
[Figure: the products $d^{(i)} x^{(i)}$ and $z^{(i)} x^{(i)}$ overlapped in the multiplier, with a 2's-complement unit forming $x^{(i+1)}$ from $d^{(i+1)}$.]
Fig. 16.6 Two multiplications fully overlapped in a 2-stage pipelined multiplier.

16.3 Division by Reciprocation
The Newton-Raphson method can be used for finding a root of $f(x) = 0$
Start with an initial estimate $x^{(0)}$ for the root
Iteratively refine the estimate via the recurrence
$x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)})$
Justification: $\tan \alpha^{(i)} = f'(x^{(i)}) = f(x^{(i)}) / (x^{(i)} - x^{(i+1)})$
[Figure: tangent to f(x) at $x^{(i)}$ crossing the x axis at $x^{(i+1)}$, with angle $\alpha^{(i)}$.]
Fig. 16.2 Convergence to a root of f(x) = 0 in the Newton-Raphson method.

Computing 1/d by Convergence
1/d is the root of $f(x) = 1/x - d$, with $f'(x) = -1/x^{2}$
Substitute in the Newton-Raphson recurrence $x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)})$ to get:
$x^{(i+1)} = x^{(i)} (2 - x^{(i)} d)$
One iteration = two multiplications + one 2's complementation
Error analysis: Let $\delta^{(i)} = 1/d - x^{(i)}$ be the error at the ith iteration
$\delta^{(i+1)} = 1/d - x^{(i+1)} = 1/d - x^{(i)} (2 - x^{(i)} d) = d\,(1/d - x^{(i)})^{2} = d\,(\delta^{(i)})^{2}$
Because $d < 1$, we have $\delta^{(i+1)} < (\delta^{(i)})^{2}$
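The error recurrence above can be confirmed with exact arithmetic. The following is a minimal sketch: the function name `reciprocal_newton`, the divisor d = 5/8, and the starting estimate 3/2 are illustrative assumptions.

```python
from fractions import Fraction


def reciprocal_newton(d, x0, iterations):
    """Newton-Raphson reciprocal: x(i+1) = x(i) * (2 - x(i) * d).

    Returns the final estimate and the list of errors 1/d - x(i).
    """
    d, x = Fraction(d), Fraction(x0)
    errors = [1 / d - x]                # delta(0)
    for _ in range(iterations):
        x = x * (2 - x * d)             # two multiplications, one complement
        errors.append(1 / d - x)        # delta(i+1)
    return x, errors


# Illustrative values (an assumption): d = 5/8, x(0) = 3/2
x_final, errors = reciprocal_newton(Fraction(5, 8), Fraction(3, 2), 4)
print([float(e) for e in errors])
```

Each error in the printed list equals d times the square of the previous one, which is the quadratic convergence stated in the error analysis.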

Choosing the Initial Approximation to 1/d
With $x^{(0)}$ in the range $0 < x^{(0)} < 2/d$, convergence is guaranteed
Justification: $|\delta^{(0)}| = |x^{(0)} - 1/d| < 1/d$
$|\delta^{(1)}| = |x^{(1)} - 1/d| = d\,(\delta^{(0)})^{2} = (d\,|\delta^{(0)}|)\,|\delta^{(0)}| < |\delta^{(0)}|$
For d in [1/2, 1):
Simple choice: $x^{(0)} = 1.5$; max error = 0.5 < 1/d
Better approximation: $x^{(0)} = 4(\sqrt{3} - 1) - 2d = 2.9282 - 2d$; max error $\approx 0.1$
[Figure: the curve 1/x and its approximations for x between 0 and 1.]
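The two maximum-error figures can be checked by sampling d over [1/2, 1). This is a rough numerical sketch: the helper `max_abs_error` and the sample count are assumptions, and floating-point sampling only approximates the true maximum.

```python
import math


def max_abs_error(approx, samples=10_000):
    """Largest |1/d - approx(d)| over evenly spaced d in [1/2, 1)."""
    worst = 0.0
    for n in range(samples):
        d = 0.5 + 0.5 * n / samples     # d = 0.5, ..., just below 1
        worst = max(worst, abs(1.0 / d - approx(d)))
    return worst


simple_error = max_abs_error(lambda d: 1.5)
linear_error = max_abs_error(lambda d: 4 * (math.sqrt(3) - 1) - 2 * d)
print(simple_error, linear_error)
```

The constant estimate is worst at d = 1/2, while the linear estimate is worst near d = 1/sqrt(2), where its error comes out just under 0.1.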

16.4 Speedup of Convergence Division
Two approaches:
Repeated multiplications: $q = \dfrac{z}{d} = \dfrac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
Reciprocation: compute $y = 1/d$, then do the multiplication $yz$
Division can be performed via $2 \lceil \log_2 k \rceil - 1$ multiplications
This is not yet very impressive: 64-bit numbers, 3-ns multiplier, 33-ns division
Three types of speedup are possible:
Fewer multiplications (reduce m)
Narrower multiplications (reduce the width of some $x^{(i)}$s)
Faster multiplications

Initial Approximation via Table Lookup
Convergence is slow in the beginning: it takes 6 multiplications to get 8 bits of convergence and another 5 to go from 8 bits to 64 bits
$d\,x^{(0)} x^{(1)} x^{(2)} = (0.1111\ 1111 \ldots)_{two}$, where $x^{(0)}$ is an approximation to 1/d and the product $x^{(0)} x^{(1)} x^{(2)}$ is a better approximation
Read this value, $x^{(0+)}$, directly from a table, thereby reducing 6 multiplications to 2
A $2^{w} \times w$ lookup table is necessary and sufficient for w bits of convergence after 2 multiplications
Example with 4-bit lookup: $d = 0.1011\ xxxx \ldots$ ($11/16 \le d < 12/16$)
Inverses of the two extremes are $16/11 \approx 1.0111$ and $16/12 \approx 1.0101$
So, 1.0110 is a good estimate for 1/d
$1.0110 \times 0.1011 = (11/8) \times (11/16) = 121/128 = 0.1111001$
$1.0110 \times 0.1100 = (11/8) \times (3/4) = 33/32 = 1.000010$
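The 4-bit example can be reproduced with a small table builder. This is a sketch under assumptions: the function `build_reciprocal_table` is hypothetical, and the rule used to fill each entry (midpoint of the reciprocals of the interval ends, rounded to w fraction bits) is one plausible choice, not necessarily the rule used in the source.

```python
from fractions import Fraction


def build_reciprocal_table(w=4):
    """Map each w-bit prefix of d in [1/2, 1) to an estimate of 1/d.

    Assumed rule: take the midpoint of 1/lo and 1/hi for the interval
    [lo, hi) covered by the prefix, rounded to w fraction bits.
    """
    table = {}
    for prefix in range(2 ** (w - 1), 2 ** w):
        lo = Fraction(prefix, 2 ** w)           # smallest d with this prefix
        hi = Fraction(prefix + 1, 2 ** w)       # upper bound of the interval
        mid = (1 / lo + 1 / hi) / 2
        table[prefix] = Fraction(round(mid * 2 ** w), 2 ** w)
    return table


table = build_reciprocal_table(4)
estimate = table[0b1011]                         # d = 0.1011 xxxx...
print(estimate)                                  # 11/8, i.e. (1.0110) two
print(estimate * Fraction(11, 16))               # 121/128
print(estimate * Fraction(12, 16))               # 33/32
```

The two printed products bracket 1 from either side, matching the worked example: one multiplication by the table value already brings d to within a few bits of 1.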

Visualizing the Convergence with Table Lookup
[Figure: d approaching $1 - ulp$ and z approaching $q - \varepsilon$ over the iterations; the first point follows the table lookup and 1st pair of multiplications, replacing several iterations, and the next point follows the 2nd pair of multiplications.]
Fig. 16.3 Convergence in division by repeated multiplications with initial table lookup.

Convergence Does Not Have to Be from Below
[Figure: d approaching $1 \pm ulp$ and z approaching $q \pm \varepsilon$ over the iterations.]
Fig. 16.4 Convergence in division by repeated multiplications with initial table lookup and the use of truncated multiplicative factors.

Sequential Dividers with Carry-Save Adders

Block diagram of a radix-2 SRT divider with partial remainder in stored-carry form

Pentium bug (1)
October 1994: Thomas Nicely, Lynchburg College, Virginia, finds an error in his computer calculations and traces it back to the Pentium processor
November 7, 1994: First press announcement, Electronic Engineering Times
Late 1994: Tim Coe, Vitesse Semiconductor, presents an example with the worst-case error
c = 4 195 835 / 3 145 727
Pentium = 1.333 739 06...
Correct result = 1.333 820 44...

Pentium bug (2)
Intel admits "subtle flaw"
November 30, 1994: Intel's white paper about the bug and its possible consequences
Intel: average spreadsheet user affected once in 27,000 years
IBM: average spreadsheet user affected once every 24 days
Replacements based on customer needs
December 20, 1994: Announcement of no-questions-asked replacements

Pentium bug (3)
Error traced back to the look-up table used by the radix-4 SRT division algorithm
2048 cells, 1066 non-zero values {-2, -1, 1, 2}
5 non-zero values not downloaded correctly to the lookup table due to an error in the C script

Follow-up Courses

DIGITAL SYSTEMS DESIGN
1. ECE 681 VLSI Design for ASICs (Fall semesters): H. Homayoun, project/lab, front-end and back-end ASIC design with Synopsys tools
2. ECE 699 Digital Signal Processing Hardware Architectures (Spring semesters): A. Cohen, project, FPGA design for DSP
3. ECE 682 VLSI Test Concepts (Spring semesters): T. Storey, homework

NETWORK AND SYSTEM SECURITY
1. ECE 646 Cryptography and Computer Network Security (Fall semesters): K. Gaj, hardware, software, or analytical project
2. ECE 746 Advanced Applied Cryptography (Spring semesters): J.-P. Kaps, hardware, software, or analytical project
3. ECE 899 Cryptographic Engineering (Spring semesters): J.-P. Kaps, research-oriented project