SRT Division and the Pentium FDIV Bug (draft lecture notes, CSCI P415)

Similar documents
EECS150 - Digital Design Lecture 27 - misc2

Graduate Institute of Electronics Engineering, NTU Basic Division Scheme

Lecture 11. Advanced Dividers

Verifying the SRT Division Algorithm Using Theorem Proving Techniques

EDULABZ INTERNATIONAL NUMBER SYSTEM

Divisibility of Natural Numbers

Some Facts from Number Theory

( 3) ( ) ( ) ( ) ( ) ( )

14:332:231 DIGITAL LOGIC DESIGN. 2 s-complement Representation

Notes on arithmetic. 1. Representation in base B

C241 Homework Assignment 7

Part I, Number Systems. CS131 Mathematics for Computer Scientists II Note 1 INTEGERS

Hardware Design I Chap. 4 Representative combinational logic

The Euclidean Algorithm and Multiplicative Inverses

CPSC 467: Cryptography and Computer Security

DIVISION BY DIGIT RECURRENCE

Chapter 5: Solutions to Exercises

Long division for integers

CHAPTER 7: RATIONAL AND IRRATIONAL NUMBERS (3 WEEKS)...

The goal differs from prime factorization. Prime factorization would initialize all divisors to be prime numbers instead of integers*

NUMBERS AND CODES CHAPTER Numbers

Linear Feedback Shift Registers (LFSRs) 4-bit LFSR

Dividing Polynomials

CHAPTER 2 NUMBER SYSTEMS

Downloaded from

The minimal polynomial

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing

Cost/Performance Tradeoff of n-select Square Root Implementations

A Simple Left-to-Right Algorithm for Minimal Weight Signed Radix-r Representations

Top Down Design. Gunnar Gotshalks 03-1

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives

A Simple Left-to-Right Algorithm for Minimal Weight Signed Radix-r Representations

Faster arithmetic for number-theoretic transforms

MATH Dr. Halimah Alshehri Dr. Halimah Alshehri

NUMERICAL MATHEMATICS & COMPUTING 6th Edition

How fast can we add (or subtract) two numbers n and m?

3.3 Dividing Polynomials. Copyright Cengage Learning. All rights reserved.

Svoboda-Tung Division With No Compensation

L1 2.1 Long Division of Polynomials and The Remainder Theorem Lesson MHF4U Jensen

Solutions to Tutorial 4 (Week 5)

Computer Architecture 10. Residue Number Systems

An Informal introduction to Formal Verification

Adders, subtractors comparators, multipliers and other ALU elements

Chapter 1: Solutions to Exercises

Mathematical Induction

COMPUTERS ORGANIZATION 2ND YEAR COMPUTE SCIENCE MANAGEMENT ENGINEERING UNIT 3 - ARITMETHIC-LOGIC UNIT JOSÉ GARCÍA RODRÍGUEZ JOSÉ ANTONIO SERRA PÉREZ

, a 1. , a 2. ,..., a n

Lecture 6: Introducing Complexity

New Minimal Weight Representations for Left-to-Right Window Methods

L1 2.1 Long Division of Polynomials and The Remainder Theorem Lesson MHF4U Jensen

A number that can be written as, where p and q are integers and q Number.

6.5 Dividing Polynomials

A HIGH-SPEED PROCESSOR FOR RECTANGULAR-TO-POLAR CONVERSION WITH APPLICATIONS IN DIGITAL COMMUNICATIONS *

Logical Abduction and its Applications in Program Verification. Isil Dillig MSR Cambridge

Modular numbers and Error Correcting Codes. Introduction. Modular Arithmetic.

Warm-Up. Use long division to divide 5 into

ALU (3) - Division Algorithms

History & Binary Representation

Integers and Division

Lecture 8. Sequential Multipliers

CS 124 Math Review Section January 29, 2018

8 Primes and Modular Arithmetic

2 Arithmetic. 2.1 Greatest common divisors. This chapter is about properties of the integers Z = {..., 2, 1, 0, 1, 2,...}.

CSCI 255. S i g n e d N u m s / S h i f t i n g / A r i t h m e t i c O p s.

Numbers and Arithmetic

CSE 241 Digital Systems Spring 2013

Conversions between Decimal and Binary

Handout 2 Gaussian elimination: Special cases

Radix-4 Vectoring CORDIC Algorithm and Architectures. July 1998 Technical Report No: UMA-DAC-98/20

Proof: If (a, a, b) is a Pythagorean triple, 2a 2 = b 2 b / a = 2, which is impossible.

SECTION 2.3: LONG AND SYNTHETIC POLYNOMIAL DIVISION

Lecture 12 : Recurrences DRAFT

Lecture 6: Finite Fields (PART 3) PART 3: Polynomial Arithmetic. Theoretical Underpinnings of Modern Cryptography

3 The fundamentals: Algorithms, the integers, and matrices

This is a recursive algorithm. The procedure is guaranteed to terminate, since the second argument decreases each time.

Mat Week 8. Week 8. gcd() Mat Bases. Integers & Computers. Linear Combos. Week 8. Induction Proofs. Fall 2013

Residue Number Systems Ivor Page 1

Student Responsibilities Week 8. Mat Section 3.6 Integers and Algorithms. Algorithm to Find gcd()

Numbers & Arithmetic. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. See: P&H Chapter , 3.2, C.5 C.

QUADRATIC PROGRAMMING?

4.5 Rational functions.

MA 1B PRACTICAL - HOMEWORK SET 3 SOLUTIONS. Solution. (d) We have matrix form Ax = b and vector equation 4

Introduction to Number Theory. The study of the integers

CHAPTER 6. Prime Numbers. Definition and Fundamental Results

Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane.

Long and Synthetic Division of Polynomials

Lesson 7.1 Polynomial Degree and Finite Differences

Algorithms CMSC Basic algorithms in Number Theory: Euclid s algorithm and multiplicative inverse

This Unit: Arithmetic. CIS 371 Computer Organization and Design. Pre-Class Exercise. Readings

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 8

Answers (1) A) 36 = - - = Now, we can divide the numbers as shown below. For example : 4 = 2, 2 4 = -2, -2-4 = -2, 2-4 = 2.

Simple Iteration, cont d

Definition 6.1 (p.277) A positive integer n is prime when n > 1 and the only positive divisors are 1 and n. Alternatively

CHAPTER 1 NUMBER SYSTEMS. 1.1 Introduction

Appendix: Synthetic Division

Checkpoint Questions Due Monday, October 1 at 2:15 PM Remaining Questions Due Friday, October 5 at 2:15 PM

The Impossibility of Certain Types of Carmichael Numbers

Lecture 7 Feb 4, 14. Sections 1.7 and 1.8 Some problems from Sec 1.8

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

Transcription:

SRT Division and the Pentium FDIV Bug (draft lecture notes, CSCI P4) Steven D. Johnson September 0, 000 Abstract This talk explains the widely publicized design error in a 994 issue of the Intel Corp. Pentium microprocessor. I present the SRT division algorithm on which the hardware is based. Then I show where the error occurs, and why it wasn t caught in verification. Finally, I speculation about whether the source of the but was a design error or a process error. For a sequence, {q i } [n] of digits, let [Q] m denote m k=0 q k 0 k We abuse notation above by interpreting digit q k as a number, and will omit the subscript m unless the context demands it. Here is an algorithm for long division. Dividend P and divisor D have been normalized to be less than 0. The partial values {p i } [n] are there only for discussion purposes. { P, D < 0} p 0, d, i := P, D, 0; while i < n do {inv inv} b choose q i {0... 9} such that... {p i q i d < d} p i+, i = 0(p i q i d), i + e { P [Q] d < d } 0 n Hence, on termination P/D [Q] n D < D 0 n

To put this in the usual way, the remainder (difference between computed an actual quotients) is smaller than D. Note that Thus, the loop invariants are, p = 0(p q d) = 0(0(p q d) q d) = 0(0(0(p 0 q 0 d) q d) q d) = 0 p 0 ( 0 q 0 d 0 q d 0 q 0 d ) = 0 P 0 k q k d d k=0 = 0 P [Q] d inv p i = 0 i P [Q] i d inv p i < 0 i D None of the arguments above are affected by the actual value of the radix (base). 0 can be replaced by any b and the program is still correct. In computer division the radix is a power of two; in the Pentium series, the radix is 4.

0 0 4 0 R 8 ) 4 0 - - - - - - - - 0 - - - - - - - - - - 0 - - 4-0 6 8 0 0 0 0 R 8 ) 4 0 - - - - - - - - 0 - - - - - - 9 - - 4 - - 0-4 - 0 6 8

- 0 0 0 9 7 + 0 0 0 0 R - {00-97 R- = 40 R- ) 4 = 40 R+8} - 6-8 = -0 + - - 4 7 7-4 0 6 = -40 + 4 - - 7-4

The examples illustrate that by maintaining two partial quotients, one of positive quotient digits and the other of negative ones, we can sometimes recover from an invalid guess for the next digit. It is important to understand that this refinement is not error correcting. If the selected digit is too remote from the correct one, there is no chance of recovering. Within those limits, however, we can perform nonrestoring long division: making a guess based on the leading digits of the dividend and divisor, which is close enough to be corrected. Hence, we do not have to backtrack if the guess is wrong. SRT hardware exploits this in two ways. The first is the nonrestoring quality, just described. In radix 4, this would mean a choice of quotient digits from the set {,,, 0,,, }. However, the actual selection set used in hardware is {,, 0,, }. This reduces the multiplications to shifts. We always avoid selecting ± as a quotient digit. The geometric sum, + 4 + 8 + + 4 i + = 8 Recall from (probably) your first homework assignment in mathematical induction that a + ar + ar + + ar n = a(i rn+ ) r In the algorithm, we need to keep p k+ within the range [ 8 d, 8 d]. To summarize, our division algorithm now looks like { P, D < } p 0, d, i := P, D, 0; while i < n do {inv inv} b choose q i {... + } such that... { 4(p i q i d) 8 d} p i+, i = 4(p i q i d), i + { e P [Q] d < d } 4 n where [Q] m = m k=0 q k 4 k inv p i = 4 i P [Q] d inv p i < 4 i D To maintain p i within range, each quotient digit must map the interval [ 8 d, 8 d]

into the interval [ d, d]. This is how we arrive at the staggered selections: 8 4 - - 0 4 8 Overlapping ranges permit a choice of two possible quotient digits. At any point along this interval, if the wrong digit a digit other than the one/two possibilities is selected, the algorithm cannot recover. Now, just like people doing long division, the algorithm must guess the next quotient digit, based on the most significant digits of the dividend and divisor. This guessing is represented by a two dimensional look-up table, indexed by the 7 leading bits of p + i, including the sign bit, and the five leading bits of d. Edleman cites Coe and Tang as stating that table entries may be determined by the following thresholds. In this table, we take P to be 8p k 8 and D to be 6d 6. q P min P max 8 D 4 4 D 8 4 D 4 D 0 8 D 8 + D D 8 + 4 D 4 D 8 + 8 D 6

-8/D -/D -4/D -/D -/D /D /D 4/D /D 8/D - - 0 D - -4 - - - 0 P 4 The lookup table is a discrete version of this table, with boundaries lying within the overlapping regions (inexact rendering!): -8/D -/D -4/D -/D -/D /D /D 4/D /D 8/D - - 0 D - -4 - - - 0 P 4 There were five errors in the lookup table, all along the boundary between the region and the thought-to-be-inaccessible region above it. Actually, an adjustment must be made in the table to account for the fact that P is represented in carry-save form. That is, the value of P is maintained 7

in two registers, S and C under the invariant that P = S + C. To add D and P, the actual computation performs S, C := (S C D), shiftleft((s C) (C D) (D S)) bitwise in parallel. Subtraction is similar; D is complemented bit-wise, giving D, and the is compensated by shifting a into the vacated 0-bit of C. In this way, carry propagation is eliminated until a single final addition is performed to recover P. The carry-save representation weakens the bounds on the digit selection regions. It can be shown that this may be compensated by lowering all upper bounds by 8. It would appear that this lowering exposed the five errors in the Pentium s lookup table. -8/D -/D -4/D -/D -/D /D /D 4/D /D 8/D - - 0 D - -4 - - - 0 P 4 There remains a question of why the bug wasn t discovered in testing. Here are some relevant points: The lookup table is not symmetric. Random dividends and divisors do not result in random table accesses. In fact table access is far from uniform (!?). In fact, the algorithm accesses a bad entry to produce a bad p k+ only if p k indexes the table just below it. Note that since D never changes, the algorithm is always operating on a single column of the table. Theorem (Coe, Tang [Edelman]). a flawed table entry can only be reached if the divisor has the form d = d d d d 4 d d... and.d d d d 4 addresses one of the buggy columns 8

[Edelman] the error is bounded by e- when dividing two numbers in the standard interval [, ); and, no matter what happens the first eight quotient digits will be correct. References. Pratt. Anatomy of the Pentium Bug.. Edelman, Mathematics of the Pentium division bug.. Sharangpani and Barton. Statistical analysis of floating point flaw in the Pentium processor (994) 4. Clarke, German, and Zhao. Verifying the SRT division algorithm using theorem proving techniques.. Ruess, Shankar, and Srivas. Modular verification of SRT division. 6. Miner. Chapter six of Hardware Verification using Coinductive Assertions (PhD dissertation) 9