Implementation of float float operators on graphic hardware


1 Implementation of float float operators on graphic hardware
  Guillaume Da Graça, David Defour
  DALI, Université de Perpignan

2 Outline
  Why GPU?
  Floating point arithmetic on GPU
  Float float format
  Problem & solution

3 Why do we want to use GPU?
  [Figure: GFLOP (axis mark: 300) over the years for Intel Pentium, ATI [R300 R360 R420] and NVIDIA [NV30 NV35 NV40 G70]]


7 Floating point format on GPU
  Table columns: Reference | Number of bits (Total, Sign, Exponent, Mantissa) | Special values
  Nvidia: (+1), NaN, Inf
  ATI: No, (+1), No documentation
  IEEE-754: ANSI-ISO, (+1), NaN, Inf

8 Paranoia on GPU*
  Adaptation for GPU of Kahan's Paranoia test set
  Tests arithmetic properties of the system
  Some results:
    Addition is done with one guard bit
    Addition and multiplication are truncated
    Multiplication is faithfully rounded
  * Karl E. Hillesland, Anselmo Lastra, GPU Floating Point Paranoia (ATI R300, Nvidia NV35,

9 Accuracy and GPU
  GPUs are driven by games and video: single precision should suffice.
  The computational horsepower of GPUs is interesting for scientific use; however, 24 bits is not enough.
  We need precise operators.
  Our contribution:
    We developed float float operators to reach 45 bits.
    We used algorithms usually used for expansions.

10 An example: Add12
  Proposition Add12:* Let a and b be floating point numbers. If the floating point arithmetic is faithful and satisfies properties A1 and A2, then the numbers s and r computed by the following algorithm satisfy s + r = a + b and either r = s = 0 or |r| < ulp(s).
  Add12(a, b)
    s = a + b
    v = s - a
    r = (a - (s - v)) + (b - v)
    Return (s, r)
  The 2 requirements:
    A1: The roundoff error of a floating point sum is itself a floating point number
    A2: if |b| <= |a| then |fl(a + b)| <= 2|a|
  * D. M. Priest, On properties of floating point arithmetics: Numerical stability and the cost of accurate computations. PhD Thesis, 1992
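The Add12 steps listed on this slide can be tried out on a CPU. The following is a minimal sketch, assuming Python floats (IEEE-754 binary64 with round-to-nearest) rather than the GPU arithmetic discussed in the talk; the function name `add12` and the exactness check with `fractions.Fraction` are illustrative choices, not part of the slides.

```python
from fractions import Fraction

def add12(a, b):
    """Add12 as listed on the slide: returns (s, r) with s + r = a + b."""
    s = a + b                      # rounded sum
    v = s - a                      # part of b that made it into s
    r = (a - (s - v)) + (b - v)    # rounding error of the sum
    return s, r

if __name__ == "__main__":
    a, b = 0.1, 0.2
    s, r = add12(a, b)
    # Compare against the exact rational sum of the two inputs.
    print(Fraction(s) + Fraction(r) == Fraction(a) + Fraction(b))
```

On hardware whose addition is truncated or uses few guard bits, the same steps are not guaranteed to give an exact result, which is the issue the following slides address.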

11 Accuracy
  Operator: Add12, Mul12, Add22, Mul22
  Theoretical accuracy: Exact, Exact
  Measured accuracy: 48, Exact
  PROBLEM: We fail to obtain the expected accuracy.

12 The truth
  In fact, the information provided by Paranoia is not completely true.
  We conducted some tests: addition is done internally with 26 bits (we have more than 1 guard bit).
  However, Sterbenz's lemma remains valid!
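For readers unfamiliar with it, Sterbenz's lemma states that if x and y are floating point numbers with y/2 <= x <= 2y, then x - y is computed without rounding error. A small sketch of an empirical check follows, assuming Python floats (binary64) on a CPU rather than GPU hardware; the helper names `sterbenz_holds` and `check_sterbenz` are hypothetical.

```python
from fractions import Fraction
import random

def sterbenz_holds(x, y):
    """True if the computed x - y equals the exact rational difference."""
    return Fraction(x - y) == Fraction(x) - Fraction(y)

def check_sterbenz(trials=10000, seed=0):
    """Sample pairs with y/2 <= x <= 2y and check that x - y is exact."""
    rng = random.Random(seed)
    for _ in range(trials):
        y = rng.uniform(1.0, 1000.0)
        x = rng.uniform(y / 2, 2 * y)
        if not (y / 2 <= x <= 2 * y):
            continue  # skip samples pushed outside the range by rounding
        if not sterbenz_holds(x, y):
            return False
    return True

if __name__ == "__main__":
    print(check_sterbenz())
```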

13 The problem
  Let us consider the Fast2Sum algorithm and the addition on 26 bits with truncation rounding mode.
  Worked example (binary operands): A, B, S = A + B, V = S - A, T = B - V
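The operand values of the slide's example are not available here, so the following is a toy illustration rather than a reproduction of it. It is a sketch under stated assumptions: a simplified model in which every operation is truncated (rounded toward zero) to p significant bits, with no guard bits and an unbounded exponent. The helpers `trunc_round` and `fast2sum_truncated` are hypothetical names. It shows that Fast2Sum (S = A + B, V = S - A, T = B - V) can lose part of the error term under truncation.

```python
from fractions import Fraction

def trunc_round(x, p):
    """Round x toward zero to p significant bits (toy model)."""
    x = Fraction(x)
    if x == 0:
        return x
    mag = abs(x)
    # Find e with 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    q = mag // ulp                 # integer significand, truncated
    return q * ulp if x > 0 else -(q * ulp)

def fast2sum_truncated(a, b, p):
    """Fast2Sum where each operation is truncated to p bits."""
    s = trunc_round(a + b, p)
    v = trunc_round(s - a, p)
    t = trunc_round(b - v, p)
    return s, t

if __name__ == "__main__":
    a, b = Fraction(1), Fraction(-1, 1024)   # both fit in 4 bits
    s, t = fast2sum_truncated(a, b, 4)
    print(s, t, s + t == a + b)
```

With these inputs the exact error of the truncated sum needs more than 4 bits, so T cannot hold it and S + T differs from A + B.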

14 Addition with 25 bits (1 guard bit)
  Previous example: A, B, S = A + B, V = S - A, T = B - V
  With 1 digit difference: A, B, S = A + B, V = S - A, T = B - V

15 Solution
  Proposition Add12: Let a and b be floating point numbers. If the floating point arithmetic is faithful, then the numbers s and r computed by the following algorithm satisfy s + r = a + b and either s = r = 0 or |r| < ulp(s).
  Add12(a, b)
    If |a| < |b| swap(a, b)
    s = a + b
    v = s - a
    f = b - ((s - v) - a)
    r = f - v
    If (r + v ≠ f) s = a, r = b
    Return (s, r)
  Overhead: 1 addition and 2 tests
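The guarded Add12 can be sketched as follows. This is one reading of the slide's listing (the intermediate steps follow the ADD12 listing given on the next slide), written for Python floats (binary64, round-to-nearest) and not for the GPU arithmetic the talk targets; the name `add12_guarded` is illustrative.

```python
from fractions import Fraction

def add12_guarded(a, b):
    """Add12 with a swap test and a fallback test, as read from the slide."""
    if abs(a) < abs(b):            # test 1: make a the larger operand
        a, b = b, a
    s = a + b
    v = s - a
    f = b - ((s - v) - a)
    r = f - v
    if r + v != f:                 # test 2: the error term was not captured
        s, r = a, b                # fall back to the unevaluated sum a + b
    return s, r

if __name__ == "__main__":
    s, r = add12_guarded(0.1, 0.2)
    print(Fraction(s) + Fraction(r) == Fraction(0.1) + Fraction(0.2))
```

Under round-to-nearest the fallback branch should not trigger; it is there for arithmetic where the subtraction chain is not exact. In both branches the returned pair sums to a + b.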

16 Algorithms for 45 bits
  ADD12(a, b)
    if |a| < |b| then
      swap(a, b)
    end if
    s ← a + b, d ← s - a
    g ← s - d, h ← g - a, f ← b - h
    e ← f - d
    if e + d ≠ f then
      s ← a, e ← b
    end if
    return (s, e)
  MUL12(a, b)
    (a_hi, a_lo) ← SPLIT(a)
    (b_hi, b_lo) ← SPLIT(b)
    p ← a * b
    e ← ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return (p, e)
  SPLIT(a)
    t ← (…) * a
    a_hi ← t - (t - a)
    a_lo ← a - a_hi
    return (a_hi, a_lo)
  D. M. Priest, Algorithms for arbitrary precision floating point arithmetic, Proceedings of the 10th IEEE Symposium on Computer Arithmetic (Arith 10), 1991
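SPLIT and MUL12 can be illustrated on a CPU as below. This is a sketch under assumptions: it uses Python floats (binary64), for which the usual splitting constant is 2^27 + 1. That value is an assumption tied to the 53-bit format of this example, not the constant used on the slide for GPU formats, which is missing from the listing. The names `SPLITTER`, `split` and `mul12` are illustrative.

```python
from fractions import Fraction

# Assumption: splitting constant for binary64 (53-bit significand).
SPLITTER = 2.0 ** 27 + 1.0

def split(a):
    """Split a into a_hi + a_lo, each with about half the significand bits."""
    t = SPLITTER * a
    a_hi = t - (t - a)
    a_lo = a - a_hi
    return a_hi, a_lo

def mul12(a, b):
    """Return (p, e) with p = fl(a * b) and e the rounding error of p."""
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    p = a * b
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e

if __name__ == "__main__":
    p, e = mul12(0.1, 0.3)
    print(Fraction(p) + Fraction(e) == Fraction(0.1) * Fraction(0.3))
```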

17 Algorithms for 45 bits *
  ADD22(a_hi, a_lo, b_hi, b_lo)
    (u_hi, u_lo) ← ADD12(a_hi, b_hi)
    (v_hi, v_lo) ← ADD12(a_lo, b_lo)
    (w_hi, w_lo) ← ADD12(u_lo, v_hi)
    (p_hi, p_lo) ← ADD12(v_lo, w_lo)
    (r_3, r_2) ← ADD12(w_hi, p_hi)
    (s, r_1) ← ADD12(u_hi, r_3)
    (e, v_lo) ← ADD12(r_2, r_1)
    return (s, e)
  MUL22(a_hi, a_lo, b_hi, b_lo)
    (u_hi, u_lo) ← SPLIT(a_hi)
    (v_hi, v_lo) ← SPLIT(b_hi)
    m_hi ← a_hi * b_hi
    m_lo ← (((u_hi * v_hi - m_hi) + (u_hi * v_lo)) + (u_lo * v_hi)) + (u_lo * v_lo)
    m_lo ← m_lo + a_hi * b_lo + a_lo * b_hi
    s ← m_hi + m_lo
    e ← (m_hi - s) + m_lo
    return (s, e)
  * Yozo Hida, Xiaoye S. Li and David H. Bailey, Algorithms for quad double precision floating point arithmetic, Proceedings of the 15th Symposium on Computer Arithmetic, 2001
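The ADD22 and MUL22 listings can be exercised with pairs of Python floats. The following is a sketch, assuming binary64 with round-to-nearest (so each pair carries roughly twice 53 bits, not the 45 bits of the GPU setting) and the binary64 splitting constant 2^27 + 1; both are assumptions of this example, and the function names are illustrative.

```python
from fractions import Fraction

SPLITTER = 2.0 ** 27 + 1.0   # assumption: splitting constant for binary64

def add12(a, b):
    """Guarded ADD12 from the previous slide."""
    if abs(a) < abs(b):
        a, b = b, a
    s = a + b
    d = s - a
    g = s - d
    h = g - a
    f = b - h
    e = f - d
    if e + d != f:
        s, e = a, b
    return s, e

def split(a):
    t = SPLITTER * a
    a_hi = t - (t - a)
    return a_hi, a - a_hi

def add22(a_hi, a_lo, b_hi, b_lo):
    """Sum of two float float numbers, following the ADD22 listing."""
    u_hi, u_lo = add12(a_hi, b_hi)
    v_hi, v_lo = add12(a_lo, b_lo)
    w_hi, w_lo = add12(u_lo, v_hi)
    p_hi, _ = add12(v_lo, w_lo)
    r3, r2 = add12(w_hi, p_hi)
    s, r1 = add12(u_hi, r3)
    e, _ = add12(r2, r1)
    return s, e

def mul22(a_hi, a_lo, b_hi, b_lo):
    """Product of two float float numbers, following the MUL22 listing."""
    u_hi, u_lo = split(a_hi)
    v_hi, v_lo = split(b_hi)
    m_hi = a_hi * b_hi
    m_lo = (((u_hi * v_hi - m_hi) + u_hi * v_lo) + u_lo * v_hi) + u_lo * v_lo
    m_lo = m_lo + a_hi * b_lo + a_lo * b_hi
    s = m_hi + m_lo
    e = (m_hi - s) + m_lo
    return s, e

if __name__ == "__main__":
    a, b = (1.0, 1e-20), (3.0, -2e-21)
    print(add22(*a, *b), mul22(*a, *b))
```

Unlike ADD12 and MUL12, these two operators are not exact: low-order terms (for example p_lo in ADD22 and a_lo * b_lo in MUL22) are dropped, so the result is an approximation at roughly twice the working precision.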

18 CPU Timing
  [Figure: time versus size]

19 GPU Timing
  [Figure: time versus size]

20 Conclusion
  Creating multiprecision operators on GPUs is not as easy as it is on CPUs.
  Floating point arithmetic on GPUs is not standardized.
  The GPU's computational horsepower makes it interesting as a numerical coprocessor.
  Future:
    Find optimizations using the fact that addition is done with 26 bits
    Provide multiprecision operators (GMP, MPFR)
    Find evaluation schemes for special functions using available hardware
