Formal verification of IA-64 division algorithms

Similar documents
Formal Verification of Mathematical Algorithms

Isolating critical cases for reciprocals using integer factorization. John Harrison Intel Corporation ARITH-16 Santiago de Compostela 17th June 2003

Optimizing Scientific Libraries for the Itanium

The LCF Approach to Theorem Proving

Newton-Raphson Algorithms for Floating-Point Division Using an FMA

Intel s Successes with Formal Methods

Theorem Proving for Verification

Chapter 1 Mathematical Preliminaries and Error Analysis

Lecture 11. Advanced Dividers

Fast and accurate Bessel function computation

Lecture Notes: Axiomatic Semantics and Hoare-style Verification

Reasoning About Imperative Programs. COS 441 Slides 10b

ALU (3) - Division Algorithms

What happens to the value of the expression x + y every time we execute this loop? while x>0 do ( y := y+z ; x := x:= x z )

Accelerating Correctly Rounded Floating-Point. Division When the Divisor is Known in Advance

Last Time. Inference Rules

Introduction. Pedro Cabalar. Department of Computer Science University of Corunna, SPAIN 2013/2014

Agent-Based HOL Reasoning 1

Floating-Point Verification by Theorem Proving

Mechanizing Elliptic Curve Associativity

ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION

ECS 231 Computer Arithmetic 1 / 27

Classical Program Logics: Hoare Logic, Weakest Liberal Preconditions

Number Representation and Waveform Quantization

Chapter 1: Preliminaries and Error Analysis

The Formal Proof Susan Gillmor and Samantha Rabinowicz Project for MA341: Appreciation of Number Theory Boston University Summer Term

Direct Proof MAT231. Fall Transition to Higher Mathematics. MAT231 (Transition to Higher Math) Direct Proof Fall / 24

Formal Verification Methods 1: Propositional Logic

Notes for Chapter 1 of. Scientific Computing with Case Studies

Representable correcting terms for possibly underflowing floating point operations

Arithmetic and Error. How does error arise? How does error arise? Notes for Part 1 of CMSC 460

Automating the Verification of Floating-point Algorithms

Chapter 1 Computer Arithmetic

Formalizing Mathematics

ECE260: Fundamentals of Computer Engineering

Accurate polynomial evaluation in floating point arithmetic

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

The Euclidean Division Implemented with a Floating-Point Multiplication and a Floor

1 Floating point arithmetic

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

Reasoning with Higher-Order Abstract Syntax and Contexts: A Comparison

An Informal introduction to Formal Verification

Getting tight error bounds in floating-point arithmetic: illustration with complex functions, and the real x n function

Notes on Sets for Math 10850, fall 2017

Slope Fields: Graphing Solutions Without the Solutions

Number Systems III MA1S1. Tristan McLoughlin. December 4, 2013

Fast and Accurate Bessel Function Computation

Binary floating point

NICTA Advanced Course. Theorem Proving Principles, Techniques, Applications. Gerwin Klein Formal Methods

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives

What Every Programmer Should Know About Floating-Point Arithmetic DRAFT. Last updated: November 3, Abstract

4ccurate 4lgorithms in Floating Point 4rithmetic

1. Introduction to commutative rings and fields

Binary addition example worked out

CHAPTER 1: Functions

Reliable real arithmetic. one-dimensional dynamical systems

Ex code

8/13/16. Data analysis and modeling: the tools of the trade. Ø Set of numbers. Ø Binary representation of numbers. Ø Floating points.

Tunable Floating-Point for Energy Efficient Accelerators

Discrete Mathematics for CS Fall 2003 Wagner Lecture 3. Strong induction

Dynamic Semantics. Dynamic Semantics. Operational Semantics Axiomatic Semantics Denotational Semantic. Operational Semantics

Interactive Theorem Proving in Industry

Chapter 4 Number Representations

Towards a Mechanised Denotational Semantics for Modelica

CSE 1400 Applied Discrete Mathematics Proofs

How to Ensure a Faithful Polynomial Evaluation with the Compensated Horner Algorithm

COP4020 Programming Languages. Introduction to Axiomatic Semantics Prof. Robert van Engelen

Proof Techniques (Review of Math 271)

QUADRATIC PROGRAMMING?

How do computers represent numbers?

Elements of Floating-point Arithmetic

Chapter 1 Error Analysis

Probabilistic Guarded Commands Mechanized in HOL

Hoare Logic I. Introduction to Deductive Program Verification. Simple Imperative Programming Language. Hoare Logic. Meaning of Hoare Triples

The Curry-Howard Isomorphism

On various ways to split a floating-point number

Computing Machine-Efficient Polynomial Approximations

Elements of Floating-point Arithmetic

Today s Lecture. Lecture 4: Formal SE. Some Important Points. Formal Software Engineering. Introduction to Formal Software Engineering

ELEMENTARY NUMBER THEORY AND METHODS OF PROOF

Optimizing the Representation of Intervals

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Software implementation of ECC

Towards Lightweight Integration of SMT Solvers

Relational Interfaces and Refinement Calculus for Compositional System Reasoning

Numbers 1. 1 Overview. 2 The Integers, Z. John Nachbar Washington University in St. Louis September 22, 2017

A Short Introduction to Hoare Logic

Validating QBF Invalidity in HOL4

What are the recursion theoretic properties of a set of axioms? Understanding a paper by William Craig Armando B. Matos

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Lecture 7. Floating point arithmetic and stability

Liquid-in-glass thermometer

Chapter 2. Review of Mathematics. 2.1 Exponents

Exercise on Continued Fractions

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

LCF + Logical Frameworks = Isabelle (25 Years Later)

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Binary Floating-Point Numbers

ABE Math Review Package

Transcription:

Formal verification of IA-64 division algorithms 1 Formal verification of IA-64 division algorithms John Harrison Intel Corporation IA-64 overview HOL Light overview IEEE correctness Division on IA-64 Theory of division algorithms Improved theorems and faster algorithm Conclusions

Formal verification of IA-64 division algorithms 2 IA-64 overview IA-64 is a new 64-bit computer architecture jointly developed by Hewlett-Packard and Intel, and the Itanium TM chip from Intel will be its first silicon implementation. Among the special features of IA-64 are: An instruction format encoding parallelism explicitly Instruction predication Speculative and advanced loads Upward compatibility with IA-32 (x86). The IA-64 Applications Developer s Architecture Guide is now available from Intel in printed form and online: http://developer.intel.com/design/ia64/downloads/adag.htm

Formal verification of IA-64 division algorithms 3 Quick introduction to HOL Light HOL Light is a member of the family of HOL theorem provers. An LCF-style programmable proof checker written in CAML Light, which also serves as the interaction language. Supports classical higher order logic based on polymorphic simply typed lambda-calculus. Extremely simple logical core: 10 basic logical inference rules plus 2 definition mechanisms. More powerful proof procedures programmed on top, inheriting their reliability from the logical core. Fully programmable by the user. Well-developed mathematical theories including basic real analysis. HOL Light is available for download from: http://www.cl.cam.ac.uk/users/jrh/hol-light

Formal verification of IA-64 division algorithms 4 IEEE correctness The IEEE standard states that all the algebraic operations, including division, should give the closest floating point number to the true answer, or the closest number up, down, or towards zero in other rounding modes. ulp(a/b) a b In addition, all the flags need to be set correctly, e.g. inexact, underflow,.... IA-64 features an IEEE-correct fused multiple add, which can compute xy + z with a single rounding error. However it has no instruction for division.

Formal verification of IA-64 division algorithms 5 Division on IA-64 Instead, approximation instructions are provided, e.g. the floating point reciprocal approximation instruction. frcpa.sf f 1, p 2 = f 3 In normal cases, this returns in f 1 an approximation to 1 f 3. The approximation has a worst-case relative error of about 2 8.86. The particular approximation is specified in the IA-64 architecture. Software is intended to start from this approximation and refine it to an IEEE-correct quotient. Surprisingly, quite short sequences of straight-line code suffice to do so. We will concentrate on round-to-nearest mode, since the other modes are much easier.

Formal verification of IA-64 division algorithms 6 Markstein s main theorem Markstein (IBM Journal of Research and Development, vol. 34, 1990) proves the following general theorem. Suppose we have a quotient approximation q 0 a b and a reciprocal approximation y 0 1 b. Provided: The approximation q 0 is within 1 ulp of a b. The reciprocal approximation y 0 is 1 b rounded to the nearest floating point number then if we execute the following two fma (fused multiply add) operations: r = a bq 0 q = q 0 + ry 0 the value r is calculated exactly and q is the correctly rounded quotient, whatever the current rounding mode.

Formal verification of IA-64 division algorithms 7 Markstein s reciprocal theorem The problem is that we need a perfectly rounded y 0 first, for which Markstein proves the following variant theorem. If y 0 is within 1ulp of the exact 1 b, then if we execute the following fma operations in round-to-nearest mode: e = 1 by 0 y = y 0 + ey 0 then e is calculated exactly and y is the correctly rounded reciprocal, except possibly when the mantissa of b is all 1s.

Formal verification of IA-64 division algorithms 8 Using the theorems Using these two theorems together, we can obtain an IEEE-correct division algorithm as follows: Calculate approximations y 0 and q 0 accurate to 1 ulp (straightforward). [N fma latencies] Refine y 0 to a perfectly rounded y 1 by two fma operations, and in parallel calculate the remainder r = a bq 0. [2 fma latencies] Obtain the final quotient by q = q 0 + ry 0. [1 fma latency]. There remains the task of ensuring that the algorithm works correctly in the special case where b has a mantissa consisting of all 1s. One can prove this simply by testing whether the final quotient is in fact perfectly rounded. If it isn t, one needs a slightly more complicated proof. Markstein shows that things will still work provided q 0 overestimates the true quotient.

Formal verification of IA-64 division algorithms 9 Initial algorithm example Our example is an algorithm for quotients using only single precision computations (hence suitable for SIMD). It is built using the frcpa instruction and the (negated) fma (fused-multiply-add): 1. y 0 = 1 b (1 + ǫ) [frcpa] 2. e 0 = 1 by 0 3. y 1 = y 0 + e 0 y 0 4. e 1 = 1 by 1 q 0 = ay 0 5. y 2 = y 1 + e 1 y 1 r 0 = a bq 0 6. e 2 = 1 by 2 q 1 = q 0 + r 0 y 2 7. y 3 = y 2 + e 2 y 2 r 1 = a bq 0 8. q = q 1 + r 1 y 3 This algorithm needs 8 times the basic fma latency, i.e. 8 5 = 40 cycles. For extreme inputs, underflow and overflow can occur, and the formal proof needs to take account of this.

Formal verification of IA-64 division algorithms 10 Improved theorems In proving Markstein s theorems formally in HOL, we noticed a way to strengthen them. For the main theorem, instead of requiring y 0 to be perfectly rounded, we can require only a relative error: y 0 1 b < 1 b /2p where p is the floating point precision. Actually Markstein s original proof only relied on this property, but merely used it as an intermediate consequence of perfect rounding. The altered precondition looks only trivially different, and in the worst case it is. However it is in general much easier to achieve.

Formal verification of IA-64 division algorithms 11 Achieving the relative error bound Suppose y 0 results from rounding a value y 0. The rounding can contribute as much as 1 2 ulp(y 0), which in all significant cases is the same as 1 2 ulp(1 b ). Thus the relative error condition after rounding is achieved provided y 0 is in error by no more than 1 b /2p 1 2 ulp(1 b ) In the worst case, when b s mantissa is all 1s, these two terms are almost identical so extremely high accuracy is needed. However at the other end of the scale, when b s mantissa is all 0s, they differ by a factor of two. Thus we can generalize the way Markstein s reciprocal theorem isolates a single special case.

Formal verification of IA-64 division algorithms 12 Stronger reciprocal theorem We have the following generalization: if y 0 results from rounding a value y 0 with relative error better than d 2 2p : y 0 1 b d 2 2p 1 b then y 0 meets the relative error condition for the main theorem, except possibly when the mantissa of b is one of the d largest, i.e. when considered as an integer is 2 p d m 2 p 1. Hence, we can compute y 0 more sloppily, and hence perhaps more efficiently, at the cost of explicitly checking more special cases.

Formal verification of IA-64 division algorithms 13 An improved algorithm The following algorithm can be justified by applying the theorem with d = 165, explicitly checking 165 special cases. 1. y 0 = 1 b (1 + ǫ) [frcpa] 2. d = 1 by 0 q 0 = ay 0 3. y 1 = y 0 + dy 0 d = d + dd r 0 = a bq 0 4. e = 1 by 1 y 2 = y 0 + d y 0 q 1 = q 0 + r 0 y 1 5. y 3 = y 1 + ey 2 r 1 = a bq 1 6. q = q 1 + r 1 y 3 On a machine capable of issuing three FP operations per cycle, this can be run in 6 FP latencies. Itanium TM can only issue two FP instructions per cycle, but since it is fully pipelined, this only increases the overall latency by one cycle, not a full FP latency. Thus the whole algorithm runs in 31 cycles.

Formal verification of IA-64 division algorithms 14 Conclusions Because of HOL s mathematical generality, all the reasoning needed is done in a unified way with the customary HOL guarantee of soundness: Underlying pure mathematics Formalization of floating point operations Proof of the special Markstein-type theorems Routine relative error computation for the final result before rounding Explicit computation with the special cases isolated. Moreover, because HOL is programmable, many of these parts can be, and have been, automated. Finally, the detailed examination of the proofs that formal verification requires threw up significant improvements that have led to some faster algorithms.