Basic building blocks for a triple-double intermediate format

Size: px
Start display at page:

Download "Basic building blocks for a triple-double intermediate format"

Transcription

1 Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Basic building blocks for a triple-double intermediate format Christoph Quirin Lauter Septembre 2005 Research Report N o RR École Normale Supérieure de Lyon 46 Allée d Italie, Lyon Cedex 07, France Téléphone : +33(0) Télécopieur : +33(0) Adresse électronique : lip@ens-lyon.fr

2 Basic building blocks for a triple-double intermediate format Christoph Quirin Lauter Septembre 2005 Abstract The implementation of correctly rounded elementary functions needs high intermediate accuracy before final rounding. This accuracy can be provided by (pseudo-) expansions of size three, i.e. a triple-double format. The report presents all basic operators for such a format. Triple-double numbers can be redundant. A renormalization procedure is presented and proven. Elementary functions implementations need addition and multiplication sequences. These operators must take operands in double, double-double and triple-double format. The results must be accordingly in one of the formats. Several procedures are presented. s are given for their accuracy bounds. Intermediate triple-double results must finally be correctly rounded to double precision. Two effective rounding sequences are presented, one for round-tonearest mode, one for the directed rounding modes. Their complete proofs constitute half of the report. Keywords: elementary functions, multiple precision, expansions, correct rounding Résumé La mise en œuvre de fonctions élémentaires correctement arrondies nécessite l utilisation d un format intermédiaire de haute précision avant l arrondi final. Cette précision peut être pourvue par des (pseudo-)expansions de taille trois, c est-à-dire par un format triple-double. Ce rapport présente tous les opérateurs de base d un tel format. Le format des nombres triple double est redondant, aussi une procédure de renormalisation est-elle présentée et prouvée. La mise en œuvre de fonctions élémentaires a besoin de séquences d addition et de multiplication. Ces opérateurs doivent être capables de prendre en argument des opérandes de format double, double-double ou triple-double. Leurs résultats doivent être dans un des formats correspondants. Un certain nombre de procédures sont présentées pour ces opérations avec des bornes prouvées pour leur précision. Les résultats intermédiaires en triple-double doivent finalement être arrondis correctement vers la double précision. Deux séquences d arrondi final efficaces sont présentées, une pour l arrondi au plus près, une autre pour les modes d arrondi dirigés. Leur preuve complète constitue la moitié du rapport. Mots-clés: fonctions élémentaires, précision multiple, expansions, arrondi correct

3 Basic building blocks for a triple-double intermediate format 1 1 Introduction The implementation of correctly rounded double precision elementary functions needs high accuracy intermediate formats [7, 4, 1, 3]. To give an order of magnitude, in most cases 120 bits of intermediate accuracy are needed for assuring the correct rounding property [3]. In contrast, the native IEEE 754 double precision format offers 53 bits of accuracy. Well-know techniques allow to double this accuracy [5]. Nevertheless the resulting accuracy, 106 bits, is not sufficient. Tripling it would be enough. Using expansions of floating point numbers allows to expand still more the accuracy of a native floating point type. An expansion is a non-evaluated sum of some floating point numbers in a floating point format [6, 9, 10]. However the techniques for manipulating general expansions presented in literature are too costly for the implementation of elementary functions. In this report, we are going to consider expansions of three double precision floating point numbers, i.e. we are going to investigate on a triple double format. In our case, we will hence manipulate floating point numbers (x h + x m + x l ) with x h, x m, x l F. After making some definitions and an analysis of a procedure that we call renormalization, we will consider addition and multiplication procedures for triple-double numbers. These will also comprise operations linking the triple double format with the native double and a double-double format. In the end, we will present and prove in particular two final rounding sequences for correctly rounding a triple-double number to the base double format. 2 Definitions It will be necessary to define a normal form of a triple-double number because it is clear that the triple-double floating point format - or that of (pseudo-)expansions in general - is redundant: there exist numbers ˆx such that there are different triplets (x h, x m, x l ) F 3 such that ˆx = x h + x m + x l. It is easy to see that a representation in such a format is unique if the numbers forming the expansion are ordered by decreasing 0 and if the latter are such that there is no (binary) digit represented by two bits of the mantissas of two different numbers of the expansion. The sign bit has in this case the same role as an additional bit of the significant [6]. As we will see, the need for a normal form for each triple-double number is not directly motivated by needs of the addition and multiplication operators we will have to define. As long as the latter operate correctly, i.e. between known error bounds, on numbers whose representation in a triple-double format is not unique, we are not obliged to recompute normal forms. On the other hand, when we want to round down such a higher precision number to a native double (in one of the different rounding modes), a normal form will be needed. If we could not provide one, we would assist to a explosion of different cases to be handled by the rounding sequence. We shall define therefore: Definition 2.1 (Overlap) Let x h, x l F be two non-subnormal double precision numbers. We will say that x h et x l overlap iff x l 2 52 x h Let be x h, x m, x l F the components of a triple-double number. We will suppose that they are not subnormals. So, we will say that there is overlap iff x h and x m or x m and x l or x h and x l overlap. Definition 2.2 (Normal form) Let be x h, x m, x l F three non-subnormal floating point numbers forming the triple-double number x h + x m + x l.

4 2 Christoph Quirin Lauter We will say that x h + x m + x l are in normal form iff there is no overlap between its components. Further, we will say that x h + x m + x l is normalised. Having made this definition, let us remark that it is deeply inspired by our direct needs and not by abstract analysises on how to represent real numbers. Anyway, the fact that an expansion is not overlapping is not a sufficient condition for its representing an unique floating point number. The implementation of addition and multiplication operators will be finally be based on computations on the components of the operands (or partial products in the case of a multiplication) followed by a final summing up for regaining the triple-double base format. As we do not statically know the magnitudes of the triple-double operands and its components, we will not be able to guarantee that in any case there will not be any overlap in the result. On the other hand, we strive to develop a renormalization sequence for recomputing normal forms. These will be handy, as we already said, for the final rounding and, if needed, i.e. if sufficiently good static bounds can not be given, before handing over a result as an operand to a following operation. Such a renormalization sequence must guarantee that for each triple-double in argument (if needed verifying some preconditions on an initial overlap), the triple-double number returned will be normalised and that the sum of the components for the first number is exactly the same as the one of the components of the latter. Definition 2.3 (The Add12 algorithm) Let A be an algorithm taking as arguments two double precision numbers x, y F and returning two double precision floating point numbers r h, r l F. We will call A the Add12 algorithm iff it verifies that x, y F.r h + r l = x + y x, y F. r l 2 53 r h r h = (x + y) r l = x + y r h i.e. iff it makes an exact addition of two floating point numbers such that the components of the double-double expansion in result are non-overlapping and if the most significant one is the floating point number nearest to the sum of the numbers in argument. Definition 2.4 (The Mul12 algorithm) Let A be an algorithm taking as arguments two double precision numbers x, y F and returning two double precision floating point numbers r h, r l F. We will call A the Mul12 algorithm iff it verifies that x, y F.r h + r l = x y x, y F. r l 2 53 r h r h = (x y) r l = x y r h i.e. iff it makes an exact multiplication of two floating point numbers such that the components of the double-double expansion in result are non-overlapping and if the most significant one is the floating point number nearest to the sum of the numbers in argument. We will pass over the existence proof of such algorithms. Consult [5] on this subject. Let us still define some notations that are needed for the analysis of numerical algorithms like the ones that we are going to consider.

5 Basic building blocks for a triple-double intermediate format 3 Definition 2.5 (Predecessor and successor of a floating point number) Let be x F a floating point number. Let be < the total ordering on F. If x is positive or zero we will design by x + the direct successor of x in F with regard to < and we will notate x its predecessor. If x is negative we will design by x the successor and by x + the predecessor of x. In any case, we will design by succ (x) the successor of x in F with regard to < and pred (x) its predecessor. This definition 2.5 is inspired by [4]. Definition 2.6 (Unit on the last place the ulp function) Let be x F a double precision number and let be x + its successor (resp. negative). So x + x if x 0 and x + + ulp (x) = 2 ulp ( ) x 2 if x + but x + = + ulp ( x) if x < 0 predecessor if x is This definition is inspired by [4], too. ulp function. Compare [8] for further research on the subject of the Let us give already some main lemmas that can be deduced from the definition 2.6 of the ulp function. Lemma 2.7 (The ulp functions with regard to upper bounds) Let be x h and x l two non-subnormal floating point numbers. So x l < ulp (x h ) x l 2 52 x h : Let us suppose that x l < ulp (x h ) and that x l > 2 52 x h. So we get the following inequality 2 52 x h < ulp (x h ) Without loss of generality, let us suppose now that x h > 0 and that x + h + where x+ h is the successor of x h in the ordered set of floating point numbers. So we know by the definition of non-subnormal floating point number in double precision that there exists m N et e Z such that x h = 2 e m with 2 52 m < Anyway, one can check that { x + h = 2e (m + 1) if m + 1 < e otherwise So 2 cases must be treated separately: 1st case: x + h = 2e (m + 1) So we get 2 e (m + 1) 2 e m > e m 1 > 2 52 m In contrast, m 2 52, so we obtain the strict inequality 1 > 1 which contradicts the hypotheses. 2nd case: x + h = 2e

6 4 Christoph Quirin Lauter In this case, we know that m = We can deduce that 2 e e ( ) > e ( ) 1 > This last inequality is a direct contradiction with the hypotheses. : Let us suppose now that x l 2 52 x h and that x l > ulp (x h ). So we get ulp (x h ) < 2 52 x h Without loss of generality, let us suppose that x h 0 and that x + h + where x+ h of x h in the ordered set of double precision floating point numbers. Thus is the successor x + h x h < 2 52 x h x + h < x h ( ) Thus, we get or a floating point number equal to 0 whose successor is equal to 0, too, or a double precision floating point number whose mantissa contains more than 53 bits because the inequality is strict. It is clear that we have got a contradiction in both cases. Lemma 2.8 (Commutativity of the x + and x operators with unary ) Let be x F a positive floating point number. So, ( x) + = ( x +) and ( x) = ( x ) Since the set of the floating point numbers is symmetrical around 0, we get ( x) + = pred ( x) = succ (x) = ( x +) and ( x) = succ ( x) = pred (x) = ( x ) Lemma 2.9 (x + and x for an integer power of 2) Let be x F a non-subnormal floating point number such that it exists e Z such that where p 2 is the format s precision. So, If x > 0, we get and If x is negative it suffices to apply lemma 2.8. x = ±2 e 2 p x x = 1 2 (x + x ) x x = 2 e 2 p 2 e 1 (2 p+1 1 ) = 2 e 1 x + x = 2 e (2 p + 1) 2 e 2 p = 2 e

7 Basic building blocks for a triple-double intermediate format 5 Lemma 2.10 (x + and x for a float different from an integer power of 2) Let be x F a non-subnormal floating point number such that it does not exist any e Z such that x = ±2 e 2 p where p 2 is the format s precision. So, x x = x + x If x > 0 we know that there exist e Z and m N such that with x = 2 e m 2 p < m < 2 p 1 because x is not exactly an integer power of 2. Further, one checks that x + = 2 e (m + 1) even if m = 2 p 1 1 and that because the lower bound given for m is strict. So one gets If x is negative it suffices to apply lemma 2.8. x = 2 e (m 1) x x = 2 e m 2 e (m 1) = 2 e = 2 e (m + 1) 2 e m = x + x Lemma 2.11 (Factorized integer powers of 2 and the operators x + and x ) Let be x F a non-subnormal floating point number such that 1 2 x is still not subnormal. So, ( ) x = 1 2 x+ and ( ) 1 2 x = 1 2 x Without loss of generality let us suppose that x is positive. Otherwise we easily apply lemma 2.8. So, if x can be written x = 2 e m with m p+1 where p is the precision then Otherwise, ( ) x = ( 2 e 1 m ) + = 2 e 1 (m + 1) = 1 2 x+ ( ) x = ( 2 e 1 (2 p+1 1 )) + = 2 e p = 1 2 x+ One can check that one obtains a completely analogous result for x.

8 6 Christoph Quirin Lauter Lemma 2.12 (Factor 3 of an integer power of 2 in argument of the x + operator) Let be x F a positive floating point number such that x is not subnormal, x + and (3 x) + are different from +, and that e Z. x = 2 e 2 p where p 3 is the precision of the format F. So the following equation holds (3 x) + + ulp (x) = 3 x + We can easily check the following (3 x) + + ulp (x) = (3 x) + ( x + x ) = (3 2 e 2 p ) + + (2 e 2 p ) + 2 e 2 p = ((2 + 1) 2 e 2 p ) e (2 p + 1) 2 e 2 p ( = 2 2 e 2 p ) + 2 2e 2 p + 2 e = ( 2 e+1 (2 p + 2 p 1)) e = 2 e+1 (2 p + 2 p ) + 2 e = 2 e+1 (2 p + 2 p 1) + 2 e e = 2 e+1 (2 p + 2 p 1) e = 3 2 e 2 p e = 3 2 e (2 p + 1) = 3 (2 e 2 p ) + = 3 x + Lemma 2.13 (Monotony of the ulp function) The ulp function is monotonic for non-subnormal positive floating point numbers and it is monotonic for non-subnormal negative floating point numbers, i.e. x, y F. denorm < x y ulp (x) ulp (y) x, y F. x y < denorm ulp (x) ulp (y) where denorm is the greatest positive subnormal. As a matter of fact it suffices to show that the ulp function is monotonic for non-subnormal floating point numbers and to apply its definition 2.6 for the negative case. Let us suppose so that we have two floating point numbers x, y F such that denorm < x < y. Without loss of generality we suppose that x + + and that y + +. Otherwise we apply definition 2.6 of the ulp function and lemma So we get ulp (y) ulp (x) = y + y x + + x It suffices now to show that We can suppose that we would have y + x + y + x 0 x = 2 ex m x y = 2 ey m y

9 Basic building blocks for a triple-double intermediate format 7 with e x, e y Z, 2 p m x, m y < 2 p+1 1. Since y > x, we clearly see that (e y, m y ) lex (e x, m x ) So four different cases are possible: 1. x + = 2 ex (m x + 1) y + = 2 ey (m y + 1) Hence y + x + y + x = 2 ey (m y + 1) 2 ex (m x + 1) 2 ey m y + 2 ex m x = 2 ey 2 ex 0 2. which yields to x + = 2 ex (m x + 1) y + = 2 ey+1 2 p y + x + y + x = 2 ey+1 2 p 2 ex (m x + 1) 2 ey (2 p+1 1 ) + 2 ex m x = 2 ey 2 ex 0 3. x + = 2 ex+1 2 p y + = 2 ey (m y + 1) One checks that y + x + y + x = 2 ey (m y + 1) 2 ex+1 2 p 2 ey m y + 2 ex (2 p+1 1 ) = 2 ey 2 ex 0 4. x + = 2 ex+1 2 p y + = 2 ey+1 2 p Thus y + x + y + x = 2 ey+1 2 p 2 ex+1 2 p 2 ey (2 p+1 1 ) + 2 ex (2 p 1 1 ) = 2 ey 2 ex 0 This finishes the proof.

10 8 Christoph Quirin Lauter 3 A normal form and renormalization procedures 3.1 A proposition for a renormalization sequence Let us now consider the following algorithm that, at first sight, seems to implement a renormalization of a triple-double number a h + a m + a l : Algorithm 3.1 (A possible renormalization) In: a h, a m, a l F Out: r h, r m, r l F (t 1h, t 1l ) Add12 (a h, a m ) (t 2, r l ) Add12 (t 1l, a l ) (r h, r m ) Add12 (t 1h, t 2 ) Let us now show that the first idea that algorithm 3.1 returns always a non-overlapping tripledouble number is not correct. We will do this by the following theorem. Theorem 3.2 There exist double precision numbers a h, a m, a l F such that algorithm 3.1 returns a triple-double number r h + r m + r l which is not normalised. It suffices to consider the following double precision numbers: a h = 1.0 a m = 2 54 a l = During the computations by algorithm 3.1, we will observe the following intermediate values: t 1h = a h a m = = 1.0 because is at the exact middle of two floating point numbers for an exponent corresponding to 1.0 and because the mantissa of 1.0 finishes by a 0 [2]. Thus, So we get t 1l = a h + a m t 1h = = 2 54 t 2 = t 1l ( ) = because with a 53 bit mantissa, is no longer representable with an exponent corresponding to and because is exactly at half of an ulp of So, clearly, we get r l = t 1l + a l t 2 = = 2 107

11 Basic building blocks for a triple-double intermediate format 9 So finally, we get r h = t 1h t 2 = 1.0 ( ) = because is greater than a quarter of an ulp of 1.0 and 1.0 is an exact power of 2. In consequence, r m = t 1h + t 2 r h = = So the exponent of r m is 55 which means that its ulp is at In contrast, the exponent of r l is 107 which means that there is one bit of overlap. 3.2 A second renormalization Let us now analyse a second renormalization algorithm. We will prove its correctness be a series of lemmas and a final theorem. Let be the following procedure: Algorithm 3.3 (Renormalization) In: a h, a m, a l F verifying the following preconditions: Preconditions: None of the numbers a h, a m, a l is subnormal a h et a m do not overlap in more than 51 bits a m et a l do not overlap in more than 51 bits which means formally: Out: r h, r m, r l F a m 2 2 a h a l 2 2 a m a l 2 4 a h (t 1h, t 1l ) Add12 (a m, a l ) (r h, t 2l ) Add12 (a h, t 1h ) (r m, r l ) Add12 (t 2l, t 1l ) Consult also [6] on the subject of this algorithm. Let us give now some lemmas on the properties of the values returned by algorithm 3.3 and on the intermediate ones. Lemma 3.4 (Exact sum) For each triple-double number a h +a m +a l, algorithm 3.3 returns a triple-double number r h +r m +r l such that a h + a m + a l = r h + r m + r l This fact is a trivial consequence of the properties of the Add12 algorithm.

12 10 Christoph Quirin Lauter Lemma 3.5 (Rounding of the middle component) For each triple-double number a h +a m +a l, algorithm 3.3 returns a triple-double number r h +r m +r l such that r m = (r m + r l ) The same way, the intermediate and final value will verify the following properties: t 1h = (a m + a l ) r h = (a h + t 1h ) t 1l 2 53 t 1h t 2l 2 53 r h In particular, r m will not be equal to 0 if r l is not equal to 0. This fact is a trivial consequence of the properties of the Add12 algorithm. Lemma 3.6 (Upper bounds) For all arguments of algorithm 3.3, the intermediate and final values t 1h, t 1l, t 2l et r m can be bounded upwards as follows: t 1h 2 1 a h t 1l 2 54 a h t 2l 2 52 a h t m 2 51 a h 1. Upper bound for t 1h : We have supposed that So we can check that a m 2 2 a h a l 2 4 a h t 1h a m + a l a h 2 2 a h a h a h 2 1 a h 2. Upper bound for t 1l : Using the properties of the Add12 algorithm we can get to know that t 1l 2 53 t 1h which yields finally to t 1l 2 54 a h 3. Upper bound for t 2l : t 2l 2 53 r h 2 53 ( a h + t 1h ) 2 53 a h t 1h a h t 1h 2 53 a h a h a h a h 2 52 a h

13 Basic building blocks for a triple-double intermediate format Upper bound for r m : r m t 2l t 1l t 2l + t 1l t 2l + t 1l 2 52 a h a h a h a h 2 51 a h Lemma 3.7 (Special case for r h = 0) For all arguments verifying the preconditions of algorithm 3.3, r h will not be equal to 0 if r m is not equal to 0. Formally: r h = 0 r m = 0 Let us suppose that r h = 0 and that r m 0. So we get because we have already shown that r h = (a h + t 1h ) ( ) 2 1 a h = 2 1 a h t 1h 2 1 a h So for r h being equal to 0, a h must be equal to 0. In contrast, this yields to t 1h = 0 because r h = (0 + t 1h ) = t 1h This implies that t 1l = 0 because of the properties of the Add12 procedure. The same way, we get t 2l = 0. We can deduce from this that 0 r m = (0 + 0) = 0 which is a contradiction. Lemma 3.8 (Lower bound for r h ) For all arguments verifying the preconditions of algorithm 3.3, the final result r h can be bounded in magnitude as follows: r h 2 1 a h We have that r h = a h t 1h Clearly, if a h and t 1h have the same sign, we get r h a h 2 1 a h Otherwise - a h and t 1h are now of the opposed sign - we have already seen that t 1h 2 1 a h So, in this case, too, we get r h 2 1 a h

14 12 Christoph Quirin Lauter Corollary 3.9 (Additional property on the Add12 procedure) Let be r h and r l two double precision floating point numbers returned by the Add12 procedure. So r l 1 2 ulp (r h) Using lemma 2.7 and the definition 2.3 of the Add12 procedure. Theorem 3.10 (Correctness of the renormalization algorithm 3.3) For all arguments verifying the preconditions of procedure 3.3, the values returned r h, r m and r l will not overlap unless they are all equal to 0 and their sum will be exactly the sum of the values in argument a h, a m et a l. The fact that the sum of the values returned is exactly equal to the sum of the values in argument has already been proven by lemma 3.4. Without loss of generality, we will now suppose that neither r h nor r m will be 0 in which case all values returned would be equal to 0 as we have shown it by lemmas 3.7 and 3.5. Using lemma 3.5, we know already that r m and r l do not overlap. Let us show now that r h and r m do not overlap by proving that the following inequality is true We will than use lemma 2.7 for concluding. There are two different cases to be treated. 1st case: t 2l = 0 We know that r m 3 4 ulp (r h) < ulp (r h ) r m = (t 2l + t 1l ) = (0 + t 1l ) = t 1l When showing lemma 3.6, we have already proven that Using lemma 3.8, we therefore know that which is the result we wanted to prove. 2nd case: t 2l 0 Still using lemma 3.6, we have shown that t 1l 2 54 a h r m 2 53 r h t 1h 2 1 a h In consequence, when the IEEE 754 [2] addition r h = a h t 1h is ported out, the rounding will be done at a bit of weight heigher than one ulp (t 1h ) because t 2l is strictly greater than 0 and because a h and t 1h do not completely overlap. Therefore we can check that t 2l ulp (t 1h ) With the result of lemma 3.6 we already mentioned, we can deduce that t 2l 1 2 ulp (t 1h) 1 2 t 2l

15 Basic building blocks for a triple-double intermediate format 13 So one can verify the following upper bound using among others lemma 3.9: r m = (t 2l + t 1l ) ( ) 3 2 t 2l ( ) 3 4 ulp (r h) = 3 4 ulp (r h) One remarks that the last simplification is correct here because ulp is always equal to an integer power of 2 and because the precision of a double is greater than 4 bits. 4 Operators on double-double numbers Since we dispose now of a renormalization procedure which is effective and proven, we can now consider the different addition and multiplication operators we need. They will surely work finally on expansions of size 3, but the double-double format [5] must be analysed, too, because it is at the base of the triple-double format. We already mentioned that on definition and analysis of this operators, we need not care such a lot of the overlap in the components of a triple-double number any more: as long as the overlap does not make us loose a too much of the final accuracy because several bits of the mantissa are represented twice, overlap is not of an issue for intermediate values. At the end of triple-double computations, it will be sufficient to apply once the renormalization procedure. In order to measure the consequences of an overlap in the operands on final accuracy and in order to be able to follow the increase of the overlap during computations in triple-double, we will indicate for each operator which produces a triple double result or which takes a triple-double operand not only a bound for relative and absolute rounding errors but also a bound for the maximal overlap of the values returned. All this bounds will be parameterised by a variable representing the maximal overlap of the triple-double arguments. 4.1 The addition operator Add22 Let us analyse first the following addition procedure: Algorithm 4.1 (Add22) In: two double-double numbers, a h + a l et b h + b l Out: a double-double number r h + r l Preconditions on the arguments: a l 2 53 a h b l 2 53 b h Algorithm:

16 14 Christoph Quirin Lauter t 1 a h b h if a h b h then else Compare [1] concerning algorithm 4.1. t 2 a h t 1 t 3 t 2 b h t 4 t 3 b l t 5 t 4 a l t 2 b h t 1 t 3 t 2 a h t 4 t 3 a l t 5 t 4 b l end if (r h, r l ) Add12 (t 1, t 5 ) Theorem 4.2 (Relative error of algorithm 4.1 Add22 without occurring of cancellation) Let be a h + a l and b h + b l the double-double arguments of algorithm 4.1 Add22. If a h and b h have the same sign, so we know that r h + r l = ((a h + a l ) + (b h + b l )) (1 + ε) where ε is bounded as follows: ε 2 103,5 Since the algorithm Add22 ends by a call to the Add12 procedure, it suffices to show that t 1 + t 5 = ((a h + a l ) + (b h + b l )) (1 + ε) Further, since the two branches of the algorithm are symmetrical, we can suppose that a h b h and consider only one branch without loss of generality. Finally, we remark that the following lines of the Add22 procedure constitute a non-conditional Add12 with arguments a h and b h and the result t 1 + t 3 : Thus, we get with t 1 = a h b h t 2 = a h t 1 t 3 = t 2 b h t 5 = t 4 a l = (t 4 + a l ) (1 + ε 1 ) = ((t 3 + bl) (1 + ε 2 ) + a l ) (1 + ε 1 ) = t 3 + a l + b l + δ δ = t 3 ε 2 + b l ε 2 + t 3 ε 1 + b l ε 1 + t 3 ε 2 ε 2 + b l ε 2 ε 2 + a l ε 1 For giving an upper bound for δ, let us first give an upper bound for t 3, a l and b l as function of a h + b h using the following bounds that we know already: a l 2 53 a h b l 2 53 b h t t 1

17 Basic building blocks for a triple-double intermediate format 15 We get therefore and than t t 1 = 2 53 a h b h 2 53 a h + b h a h + b l a l 2 53 a h 2 53 a h + b h The last bound is verified because we suppose that a h and b h have the same sign. Finally, since a h b h, Thus we get for δ : b l 2 53 b h 2 53 a h 2 53 a h + b h δ a h + b h ( ) a h + b h ( ) Let us now give a lower bound for a h + a l + b h + b l as a function of a h + b h in order to be able to give a relative error bound for the procedure Add22. We have that So we can check that Concerning δ, this yields to δ a h + a l + b h + b l One easily checks that a l + b l a l + b l 2 53 a h b h 2 52 a h 2 52 a h + b h a h + a l + b h + b l ( ) a h + b h ( ) ( ) 2 103,5 from which one trivially deduces the affirmation. Theorem 4.3 (Relative error of algorithm 4.1 Add22 with a bounded cancellation) Let be a h + a l and b h + b l the double-double arguments of 4.1 Add22. If a h and b h are of different sign and if one can check that for µ 1 so the returned result will verify where ε is bounded as follows: b h 2 µ a h r h + r l = ((a h + a l ) + (b h + b l )) (1 + ε) ε µ µ

18 16 Christoph Quirin Lauter Let us reuse the results obtained at the proof 4.1 and let us start by giving an upper bound for b l as a function of a h : b l 2 53 b h 2 53 µ a h Let us now continue with a lower bound for a h + b h still as a function of a h : We get in consequence and Thus we can check that a h + b h a h (1 2 µ) a l µ a h + b h b l 2 53 µ 1 2 µ a h + b h δ a h + b h µ µ Once again, we must give a lower bound for a h + a l + b h + b l with regard to a h + b h : We know that So Thus we get for δ a l + b l a l + b l 2 52 a h µ a h + b h a h + a l + b h + b l a h + b h 1 2 µ µ δ 1 2 µ a h + a l + b h + b l 1 2 µ µ µ = a h + a l + b h + b l µ µ 2 52 So finally the following inequality is verified for the relative error ε: ε µ µ 2 52 We can still give a less exact upper bound for this term by one that does not depend on µ because µ 1: 1 2 µ µ so ε Theorem 4.4 (Absolute error of algorithm 4.1 Add22 (general case)) Let be a h + a l and b h + b l the double-double arguments of algorithm 4.1 Add22. The result r h + r l returned by the algorithm verifies where δ is bounded as follows: r h + r l = (a h + a l ) + (b h + b l ) + δ δ max ( 2 53 a l + b l, a h + a l + b h + b l )

19 Basic building blocks for a triple-double intermediate format 17 Without loss of generality, we can now suppose that 1 2 a h b h a h and that a h and b h have different signs because for all other cases, the properties we have to show are a direct consequence of theorems 4.2 and 4.3. So we have 1 2 a h b h 2 a h and So the floating point operation 1 2 b h a h 2 b h t 1 = a h b h is exact by Sterbenz lemma [11]. In consequence, t 3 will be equal to 0 because, as we have already mentioned, the operations computing t 1 and t 3 out of a h and b h constitute a Add12 whose properties assure that We can deduce that t 3 = a h + b h t 1 t 4 = t 3 b l = b l and can finally check that with So we get with t 5 = t 4 a l = (t 4 + a l ) (1 + ε ) ε 2 53 r h + r l = t 1 + t 5 = (a h + a l + b h + b l ) + δ δ 2 53 a l + b l which yields to the bound to be proven δ = max ( 2 53 a l + b l, a h + a l + b h + b l ) Theorem 4.5 (Output overlap of algorithm 4.1 Add22) Let be a h + a l and b h + b l the double-double arguments of algorithm 4.1 Add22. So the values r h and r l returned by the algorithm will not overlap at all and will verify r l 2 53 r h The proof of the affirmed property is trivial because the procedure Add22 sequence Add12 which assures it. ends by a call to

20 18 Christoph Quirin Lauter 4.2 The multiplication operator Mul22 Let us now consider the multiplication operator Mul22: Algorithm 4.6 (Mul22) In: two double-double numbers, a h + a l et b h + b l Out: a double-double number r h + r l Preconditions on the arguments: Algorithm: a l 2 53 a h b l 2 53 b h (t 1, t 2 ) Mul12 (a h, b h ) t 3 a h b l t 4 a l b h t 5 t 3 t 4 t 6 t 2 t 5 (r h, r l ) Add12 (t 1, t 6 ) Compare also to [1] concerning algorithm 4.6. Theorem 4.7 (Relative error of algorithm 4.6 Mul22) Let be a h + a l and b h + b l the double-double arguments of algorithm 4.6 Mul22. So the values returned r h et r l verify r h + r l = ((a h + a l ) (b h + b l )) (1 + ε) where ε is bounded as follows: ε Further r h and r l will not overlap at all and verify r l 2 53 r h Since algorithm 4.6 ends by a call to the Add12 procedure, the properties of the latter yield to the fact that r h and r l do not overlap at all and that r l 2 53 r h. In order to give upper bounds for the relative and absolute error of the algorithm, let us express t 6 as a function of t 2, a h, a l, b h and b l joined by the error term δ. We get t 6 = t 2 (a h b l a l b h ) = (t 2 + (a h b l (1 + ε 1 ) + a l b h (1 + ε 2 )) (1 + ε 3 )) (1 + ε 4 ) where ε i 2 53, i = 1, 2, 3, 4. Simplifying this expression, we van verify that with t 6 = t 2 + a h b l + a l b h + δ δ a l b l + a h b l ε 1 + a l b h ε 2 + a h b l ε 3 + a h b l ε 1 ε 3 + a l b h ε 3 + a l b h ε 1 ε 3 +a h b l ε 4 + a l b h ε 4 + a h b l ε 1 ε 4 + a l b h ε 2 ε 4 + a h b l ε 3 ε 4 +a h b l ε 1 ε 3 ε 4 + a l b h ε 3 ε 4 + a l b h ε 2 ε 3 ε 4 a h b h ( ) a h b h 2 103

21 Basic building blocks for a triple-double intermediate format 19 For checking the given bound, we have supposed the following inequalities: and a l 2 53 a h a l 2 53 a h Let us now give a lower bound for (a h + a l ) (b h + b l ) as a function of a h b h. For doing so, we give an upper bound for a h b l + a l b h + a l b l. We verify since: a h b l + a l b h + a l b l a h b l + a l b h + a l b l 2 53 a h b h a h b h a h b h 2 51 a h b h This yields to (a h + a l ) (b h + b l ) a h b h (1 2 51) 1 2 a h b h from which we deduce that δ (a h + a l ) (b h + b l ) which gives us as an bound for the relative error r h + r l = (a h + a l ) (b h + b l ) (1 + ε) with ε Addition operators for triple-double numbers 5.1 The addition operator Add33 We are going to consider now the addition operator Add33. We will only analyse a simplified case where the arguments values verify some bounds statically known. Algorithm 5.1 (Add33) In: two triple-double numbers, a h + a m + a l et b h + b m + b l Out: a triple-double number r h + r m + r l Preconditions on the arguments: b h 3 4 a h a m 2 αo a h a l 2 αu a m b m 2 βo b h b l 2 βu b m α o 4 α u 1 β o 4 β u 1 Algorithm:

22 20 Christoph Quirin Lauter (r h, t 1 ) Add12 (a h, b h ) (t 2, t 3 ) Add12 (a m, b m ) (t 7, t 4 ) Add12 (t 1, t 2 ) t 6 a l b l t 5 t 3 t 4 t 8 t 5 t 6 (r m, r l ) Add12 (t 7, t 8 ) Theorem 5.2 (Relative error of algorithm 5.1 Add33) Let be a h +a m +a l and b h +b m +b l the triple-double arguments of algorithm 5.1 Add33 verifying the given preconditions. So the following egality will hold for the returned values r h, r m et r l where ε is bounded by: r h + r m + r l = ((a h + a m + a l ) + (b h + b m + b l )) (1 + ε) ε 2 min(αo+αu,βo+βu) min(αo,βo) 98 The returned values r m and r l will not overlap at all and the overlap of r h and r m will be bounded by the following expression: r m 2 min(αo,βo)+5 r h The procedure 5.1 ends by a call to the Add12 sequence. One can trivially deduce that r m and r l do not overlap at all and verify r l 2 53 r m Further, it suffices that the bounds given at theorem 5.2 hold for t 7 and t 8 because the last addition computing r m et r l will be exact. The same way, one can deduce the following inequalities out of the properties of the Add12 procedure. They will become useful during this proof. t r h t t 2 t t 7 Let us start the proof by giving bounds for the magnitude of r h with regard to a h : We have on the one hand and on the other r h = (a h + b h ) = ( a h + b h ) ( a h + b h ) ( a h + 3 ) 4 a h (2 a h ) = 2 a h r h = ( a h + b h ) ( ) 1 4 a h = 1 4 a h

23 Basic building blocks for a triple-double intermediate format 21 So we know that 1 4 a h r h 2 a h. It is now possible to give the following upper bounds for t 1, t 2, t 3, t 7, t 4, t 6 and t 6 : t r h a h = 2 51 a h t 2 ( a m + b m ) ( a m + b m ) ( 2 αo a h + 2 βo b h ) ( 2 αo a h + 2 βo 3 ) 4 a h ( ) 2 min(αo,βo)+1 a h = 2 min(αo,βo)+1 a h t t min(αo,βo)+1 a h = 2 min(αo,βo) 52 a h t 7 ( t 1 + t 2 ) ( t 1 + t 2 ) ( ) 2 51 a h + 2 min(αo,βo)+1 a h ( ) 2 min(αo,βo)+2 a h = 2 min(αo,βo)+2 a h t t min(αo,βo)+2 a h = 2 min(αo,βo) 51 a h and finally t 6 ( a l + b l ) ( 2 αo 2 αu a h + 2 βo 2 βu 3 ) 4 a h ( ) 2 min(αo+αu,βo+βu)+1 a h = 2 min(αo+αu,βo+βu)+1 a h t 5 ( t 3 + t 4 ) ( ) 2 min(αo,βo) 52 a h + 2 min(αo,βo) 51 a h ( ) 2 min(αo,βo) 50 a h = 2 min(αo,βo) 50 a h

24 22 Christoph Quirin Lauter Using the fact that the addition Add12 is exact, it is easy to show that we have exactly r h + t 7 + t 3 + t 4 = a h + a m + b h + b m Further, we can check t 8 = (t 3 2 t 4 ) 1 (a l 3 b l ) = t 3 + t 4 + a l + b l + δ with δ t 3 ε 3 + t 4 ε 2 + a l ε 3 + b l ε 3 + t 3 ε 1 + t 4 ε 1 + t 3 ε 1 ε 2 + t 4 ε 1 ε 2 + a l ε 1 + b l ε 2 + a l ε 1 ε 3 + b l ε 1 ε 3 where for i {1, 2, 3}, ε i is the relative error bound of the floating point addition i and verifies So we get immediately ε i 2 53 r h + r m + r l = r h + t 7 + t 8 = (a h + a m + a l ) + (b h + b m + b l ) + δ Let us now express (a h + a m + a l ) + (b h + b m + b l ) as a function of a h : and, the same way round, which allows for noting a h + a m + a l a h + a m + a l a h + 2 αo a h + 2 αo αu a h 2 a h b h + b m + b l 2 b h 3 2 a h (a h + a m + a l ) + (b h + b m + b l ) 2 2 a h In order to give a lower bound for this term, let us prove an upper bound for b h + a m + b m + a l + b l as follows So we get b h + a m + b m + a l + b l b h + a m + b m + a l + b l 3 4 a h + 2 αo a h + 2 βo 3 4 a h + 2 αo αu a h +2 βo βu 3 4 a h 7 8 a h (a h + a m + a l ) + (b h + b m + b l ) 1 8 a h Using this bounds, we can give upper bounds for the absolute error δ first as a function of a h and than as a function of (a h + a m + a l ) + (b h + b m + b l ) for deducing finally a bound for the relative error. So we get δ t 3 ε 3 + t 4 ε 2 + a l ε 3 + b l ε 3 + t 3 ε 1 + t 4 ε 1 + t 3 ε 1 ε 2 + t 4 ε 1 ε 2 + a l ε 1 + b l ε 2 + a l ε 1 ε 3 + b l ε 1 ε 3 a h ε

25 Basic building blocks for a triple-double intermediate format 23 with This yields to with ε min(αo,βo) min(αo,βo) αo αu βo βu min(αo,βo) min(αo,βo) min(αo,βo) min(αo,βo) αo αu βo βu αo αu βo βu 2 min(αo,βo) min(αo+αu,βo+βu) 50 r h + r m + r l = ((a h + a m + a l ) + (b h + b m + b l )) (1 + ε) ε 2 min(αo,βo) min(αo+αu,βo+βu) 47 In order to finish the prove, it suffices now to give an upper bound for the maximal overlap between r h and r m because we have already shown that r m and r l do not overlap at all. So we can check r 8 ( t 5 + t 6 ) ( ) 2 min(αo,βo) 50 a h + 2 min(αo+αu,βo+βu)+1 a h ( ) 2 min(αo,βo) 48 r h + 2 min(αo+αu,βo+βu)+3 r h and continue by giving the following upper bound r m = ( t 7 + t 8 ) ( t 7 + t 8 ) ( 2 min(αo,βo)+4 r h + ( r h ( )) 2 min(αo,βo) 48 r h + 2 min(αo+αu,βo+βu)+3 r h ( 2 min(αo,βu) min(αo,βo) min(α+αu,βo+βu) min(αo,βo) min(αo+αu,βo+βu) 50)) 2 min(αo,βo)+5 r h This is the maximal overlap bound we were looking for; the proof is therefore finished. Theorem 5.3 (Special case of algorithm 5.1 Add33) Let be a h + a m + a l and b h + b m + b l the triple-double arguments of algorithm 5.1 Add33 such that a h = a m = a l = 0 So the values r h, r m and r l returned will be exactly equal to r h + r m + r l = b h + b m + b l The values r m and r l will not overlap at all. The overlap of r h and r m must still be evaluated.

26 24 Christoph Quirin Lauter We will suppose that the Add12 procedure is exact for a h = a m = a l = 0 if even we are using its unconditional version. Under this hypothesis, we get thus: r h = (0 + b h ) = b h t 1 = 0 + b h b h = 0 t 2 = b m t 3 = 0 t 7 = (0 + b m ) = b m t 4 = 0 t 6 = 0 b l = b l t 5 = 0 0 = 0 t 8 = 0 b l = b l r m + r l = b m + b l In consequence, the following holds for the values returned: r h + r m + r l = b h + b m + b l Clearly r m et r l do not overlap because the Add12 procedure the algorithm calls at its last line assures this property. 5.2 The addition operator Add233 Let us consider now the addition operator Add233. We will only analyse a simplified case where the arguments of the algorithm verify statically known bounds. Algorithm 5.4 (Add233) In: a double-double number a h + a l and a triple-double number b h + b m + b l Out: a triple-double number r h + r m + r l Preconditions on the arguments: Algorithm: b h 2 2 a h a l 2 53 a h b m 2 βo b h b l 2 βu b m (r h, t 1 ) Add12 (a h, b h ) (t 2, t 3 ) Add12 (a l, b m ) (t 4, t 5 ) Add12 (t 1, t 2 ) t 6 t 3 b l t 7 t 6 t 5 (r m, r l ) Add12 (t 4, t 7 ) Theorem 5.5 (Relative error of algorithm 5.4 Add233) Let be a h + a l and b h + b m + b l the values taken in argument of algorithm 5.4 Add233. Let the preconditions hold for this values. So the following holds for the values returned by the algorithm r h, r m et r l r h + r m + r l = ((a h + a m + a l ) + (b h + b m + b l )) (1 + ε)

27 Basic building blocks for a triple-double intermediate format 25 where ε is bounded by ε 2 βo βu βo The values r m and r l will not overlap at all and the overlap of r h and r m will be bounded by: r m 2 γ r h with γ min (45, β o 4, β o + β u 2) We know using the properties of the Add12 procedure that r h + t 1 = a h + b h t 2 + t 3 = a l + b m t 4 + t 5 = t 1 + t 2 r m + r l = t 4 + t 7 Supposing that we dispose already of a term of the following form t 7 = t 5 + t 3 + b l + δ with a bounded δ, we can note that r h + r m + r l = (a h + a l ) + (b h + b m + b l ) + δ Let us now express t 7 by t 5, t 3 and b l : with ε and ε We get in consequence t 7 = t 5 t 6 = t 5 (t 3 b l ) = (t 5 + (t 3 + b l ) (1 + ε 1 )) (1 + ε 2 ) t 7 = t 5 + t 3 + b l + t 3 ε 1 + b l ε 1 + t 5 ε 2 + t 3 ε 2 + b l ε 2 + t 3 ε 1 ε 2 + b l ε 1 ε 2 and we can verify that the following upper bound holds for the absolute error δ: δ = t 3 ε 1 + b l ε 1 + t 5 ε 2 + t 3 ε 2 + b l ε 2 + t 3 ε 1 ε 2 + b l ε 1 ε t b l t b l t b l 2 52 t b l t 5 Let us get now some bounds for t 3, b l and t 5, all as a function of a h : b l 2 βo βu b h 2 βo βu 2 a h which can be obtained using the preconditions hypotheses. Further t t 2 = 2 53 (a l + b m ) 2 52 a l + b m 2 52 a l b m a h + 2 βo 52 b h a h + 2 βo 54 a h

28 26 Christoph Quirin Lauter and finally t t 4 = 2 53 (t 1 + t 2 ) 2 52 t 1 + t t t r h (a l + b m ) (a h + b h ) a l + b m a h + b h a l b m a h a h a h + 2 βo 53 a h a h ( βo 53) So we have δ a h ( βo βo βu βo 106) a h (2 βo βu βo ) Let us now give a lower bound for (a h + a l ) + (b h + b m + b l ) as a function of a h by getting out an upper bound for a l + b h + b m + b l as such a function: a l + b h + b m + b l a l + b h + b m + b l a h ( βo βo βu 2) Since β o 1, β u 1 we can check that a l + b h + b m + b l 2 1 a h In consequence a h + (a l + b h + b m + b l ) 1 2 a h Using this lower bound, we can finally give an upper bound for the relative error ε of the considered procedure: r h + r m + r l = ((a h + a l ) + (b h + b m + b l )) (1 + ε) with ε 2 βo βu βo Last but not least, let us now analyse the additional overlaps generated by the procedure. It is clear that r m and r l do not overlap at all because they are computed by the Add12 procedure. Let us merely examine now the overlap of r h and r m. We begin by giving a lower bound for r h as a function of a h : r h = (a h + b h ) ( a h + b h ) ( ) 3 4 a h ( ) 1 2 a h = 1 2 a h Let us then find an upper bound for r m using also here a term which is a function of a h : r m = (r m + r l )

29 Basic building blocks for a triple-double intermediate format 27 2 r m + r l = 2 t 4 + t 7 2 t t 5 + t 3 + b l + δ 2 t t t b l + 2 δ 2 t 4 + a h ( βo 52) + a h ( βo 53) + a h 2 βo βu 1 + a h (2 βo βu βo ) By bounding finally still t 4 by a term that is function of a h we obtain We finally check that we have t 4 = (t 1 + t 2 ) 2 t t r h + 2 (a l + b m ) 2 51 a h + b h + 4 a l + b m a h (2 βo ) (2 48 r h a h +2 βo βo βo βo βu 1 +2 βo βu βo ) from which we can deduce the following bound a h ( βo βo βu) r m r h ( βo βo βu+1) r m 2 γ r h with This finishes the proof. γ min (45, β o 4, β o + β u 2) 6 Triple-double multiplication operators 6.1 The multiplication procedure Mul23 Let us go on with an analysis of the multiplication procedure Mul23. Algorithm 6.1 (Mul23) In: two double-double numbers a h + a l and b h + b l

30 28 Christoph Quirin Lauter Out: a triple-double number r h + r m + r l Preconditions on the arguments: a l 2 53 a h b l 2 53 b h Algorithm: (r h, t 1 ) Mul12 (a h, b h ) (t 2, t 3 ) Mul12 (a h, b l ) (t 4, t 5 ) Mul12 (a l, b h ) t 6 a l b l (t 7, t 8 ) Add22 (t 2, t 3, t 4, t 5 ) (t 9, t 10 ) Add12 (t 1, t 6 ) (r m, r l ) Add22 (t 7, t 8, t 9, t 10 ) Theorem 6.2 (Relative error of algorithm 6.1 Mul23) Let be a h + a l and b h + b l the values taken by arguments of algorithm 6.1 Mul23 So the following holds for the values returned r h, r m and r l : r h + r m + r l = ((a h + a l ) (b h + b l )) (1 + ε) where ε is bounded as follows: ε The values returned r m and r l will not overlap at all and the overlap of r h et r m will be bounded as follows: r m 2 48 r h Since algorithm 6.1 is relatively long, we will proceed by analysing sub-sequences. So let us consider first the following sequence: (t 2, t 3 ) Mul12 (a h, b l ) (t 4, t 5 ) Mul12 (a l, b h ) (t 7, t 8 ) Add22 (t 2, t 3, t 4, t 5 ) Clearly t 7 and t 8 will not overlap. The same way t 2 and t 3 and t 4 and t 5 will not overlap and we know that we have exactly the following egalities t 2 + t 3 = a h b l t 4 + t 5 = b h a l Further we can check that t (a h b l ) 2 52 a h b l and similarly t b h a l So we have on the one side 2 53 t 3 + t t t a h b l b h a l

31 Basic building blocks for a triple-double intermediate format 29 and on the other t 2 + t 3 + t 4 + t b h a l + a h b l Using theorem 4.4 it is possible to note with b h a l a h b l t 7 + t 8 = a h b l + a l b h + δ 1 δ a h b l a l b h Let us now consider the following sub-sequence of algorithm 6.1: (r h, t 1 ) Mul12 (a h, b h ) t 6 a l b l (t 9, t 10 ) Add12 (t 1, t 6 ) Trivially t 9 and t 10 do not overlap. Additionally, one sees that we have exactly r h + t 1 = a h b h and, exactly too, So using where ε 2 53 we get t 9 + t 10 = t 1 + t 6 t 6 = a l b l = a l b l (1 + ε) with r h + t 9 + t 10 = r h + t 1 + t 6 Let us now bound t 9 with regard to a h b h : We have t 9 ( t 1 + t 6 ) = a h b h + t 6 = a h b h + a l b l + δ 2 δ a l b l ( t 1 + ( a l b l )) ( t 1 + ( a h b h )) ( 2 53 a h b h a h b h ) 2 51 a h b h With inequalities given, we can bound now the absolute and relative error of algorithm Mul We know already that t 7 + t 8 = a h b l + a l b h + δ 1 where and where δ a h b l b h a l r h + t 9 + t 10 = a h b h + a l b l + δ 2 δ a l b l

32 30 Christoph Quirin Lauter One remarks that t t 7 t t 9 and easily checks that and that 2 53 t 10 + t t t t 9 + t 10 + t 7 + t t t 7 So by means of the theorem 4.4, we obtain that r m + r l = t 9 + t 10 + t 7 + t 8 + δ 3 where So finally we get δ t t 7 r h + r m + r l = a h b h + a h b l + a l b h + a l b l + δ where δ = δ 1 + δ 2 + δ 3 And for t 7 we obtain the following inequalities t 7 ( t 7 + t 8 ) δ 1 + δ 2 + δ a l b h a h b l a l b l a h b h t 7 2 t 7 + t 8 2 ( a h b l + a l b h a l b h a h b l ) 8 a h b l In consequence we can check δ a h b h Let us give now an upper bound for a h b l + a l b h + a l b l as a function of a h b h : We have from which we deduce that a h b l + a l b h + a l b l a h b l + a l b h + a l b l 2 51 a h b h a h b h + (a h b l + a l b h + a l b l ) a h b h (1 2 51) 1 2 a h b h Thus δ a h b h + a h b l + a l b h + a l b l

33 Basic building blocks for a triple-double intermediate format 31 So we can finally give an upper bound for the relative error ε of the multiplication procedure Mul23 defined by algorithm 6.1: r h + r m + r l = (a h + a l ) (b h + b l ) (1 + ε) with ε Before concluding, we must still analyse the overlap of the different components of the tripledouble number returned by the algorithm. It is clear that r m and r l do not overlap because the Add22 brick ensures this. Let us now consider the magnitude of r m with regard to the one of r h. We give first a lower bound for r h : and then an upper bound for r m : r m ( r m + r l ) ( t 7 + t 8 + t 9 + t 10 + δ 3 ) r h = (a h b h ) ( a h b h ) 1 2 a h b h ( a h b l + a l b h + δ 1 + t 9 + t 10 + δ 3 ) ( a h b h ( )) 2 49 a h b h From this we can deduce the final bound r m 2 48 r h 6.2 The multiplication procedure Mul233 Let us concentrate now on the multiplication sequence Mul233. Algorithm 6.3 (Mul233) In: a double-double number a h + a l and a triple-double number b h + b m + b l Out: a triple-double number r h + r m + r l Preconditions on the arguments: with Algorithm: a l 2 53 a h b m 2 βo b h b l 2 βu b m β o 2 β u 1

34 32 Christoph Quirin Lauter (r h, t 1 ) Mul12 (a h, b h ) (t 2, t 3 ) Mul12 (a h, b m ) (t 4, t 5 ) Mul12 (a h, b l ) (t 6, t 7 ) Mul12 (a l, b h ) (t 8, t 9 ) Mul12 (a l, b m ) t 10 a l b l (t 11, t 12 ) Add22 (t 2, t 3, t 4, t 5 ) (t 13, t 14 ) Add22 (t 6, t 7, t 8, t 9 ) (t 15, t 16 ) Add22 (t 11, t 12, t 13, t 14 ) (t 17, t 18 ) Add12 (t 1, t 10 ) (r m, r l ) Add22 (t 17, t 18, t 15, t 16 ) Theorem 6.4 (Relative error of algorithm 6.3 Mul233) Let be a h + a l and b h + b m + b l the values in argument of algorithm 6.3 Mul233 given preconditions hold. So the following will hold for the values r h, r m and r l returned such that the where ε is bounded as follows: r h + r m + r l = ((a h + a l ) (b h + b m + b l )) (1 + ε) ε 2 99 βo βo βu βo+1 2 βo βu βo βo βu The values r m and r l will not overlap at all and the following bound will be verified for the overlap of r h and r m : r m 2 γ r h where γ min (48, β o 4, β o + β u 4) During this proof we will once again proceed by basic bricks that we will assemble in the end. Let us therefore start by the following one: Since we have the exact egalities and (t 2, t 3 ) Mul12 (a h, b m ) (t 4, t 5 ) Mul12 (a h, b l ) (t 11, t 12 ) Add22 (t 2, t 3, t 4, t 5 ) t 2 + t 3 = a h b m t 4 + t 5 = a h b l and since we know that t 2 and t 3 and t 4 and 5 do not overlap, it suffices to apply the bound proven at theorem 4.4. So we can check on the one hand and on the other 2 53 t 3 + t t t t t a h b m a h b l βo a h b m βo βu a h b l t 2 + t 3 + t 4 + t 5 = a h b m + a h b l a h b m a h b l βo a h b h βo βu a h b l

RN-coding of numbers: definition and some properties

RN-coding of numbers: definition and some properties Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 RN-coding of numbers: definition and some properties Peter Kornerup,

More information

Laboratoire de l Informatique du Parallélisme. École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON n o 8512

Laboratoire de l Informatique du Parallélisme. École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON n o 8512 Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON n o 8512 SPI A few results on table-based methods Jean-Michel Muller October

More information

Laboratoire de l Informatique du Parallélisme

Laboratoire de l Informatique du Parallélisme Laboratoire de l Informatique du Parallélisme Ecole Normale Supérieure de Lyon Unité de recherche associée au CNRS n 1398 An Algorithm that Computes a Lower Bound on the Distance Between a Segment and

More information

Multiplication by an Integer Constant: Lower Bounds on the Code Length

Multiplication by an Integer Constant: Lower Bounds on the Code Length INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Multiplication by an Integer Constant: Lower Bounds on the Code Length Vincent Lefèvre N 4493 Juillet 2002 THÈME 2 ISSN 0249-6399 ISRN INRIA/RR--4493--FR+ENG

More information

Bound on Run of Zeros and Ones for Images of Floating-Point Numbers by Algebraic Functions

Bound on Run of Zeros and Ones for Images of Floating-Point Numbers by Algebraic Functions Bound on Run of Zeros and Ones for Images of Floating-Point Numbers by Algebraic Functions Tomas Lang, Jean-Michel Muller To cite this version: Tomas Lang, Jean-Michel Muller Bound on Run of Zeros and

More information

The impact of heterogeneity on master-slave on-line scheduling

The impact of heterogeneity on master-slave on-line scheduling Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 The impact of heterogeneity on master-slave on-line scheduling

More information

On the coverage probability of the Clopper-Pearson confidence interval

On the coverage probability of the Clopper-Pearson confidence interval On the coverage probability of the Clopper-Pearson confidence interval Sur la probabilité de couverture de l intervalle de confiance de Clopper-Pearson Rapport Interne GET / ENST Bretagne Dominique Pastor

More information

The Euclidean Division Implemented with a Floating-Point Multiplication and a Floor

The Euclidean Division Implemented with a Floating-Point Multiplication and a Floor The Euclidean Division Implemented with a Floating-Point Multiplication and a Floor Vincent Lefèvre 11th July 2005 Abstract This paper is a complement of the research report The Euclidean division implemented

More information

A set of formulas for primes

A set of formulas for primes A set of formulas for primes by Simon Plouffe December 31, 2018 Abstract In 1947, W. H. Mills published a paper describing a formula that gives primes : if A 1.3063778838630806904686144926 then A is always

More information

A set of formulas for primes

A set of formulas for primes A set of formulas for primes by Simon Plouffe December 31, 2018 Abstract In 1947, W. H. Mills published a paper describing a formula that gives primes : if A 1.3063778838630806904686144926 then A is always

More information

A set of formulas for primes

A set of formulas for primes A set of formulas for primes by Simon Plouffe December 31, 2018 Abstract In 1947, W. H. Mills published a paper describing a formula that gives primes : if A 1.3063778838630806904686144926 then A is always

More information

A set of formulas for primes

A set of formulas for primes A set of formulas for primes by Simon Plouffe December 31, 2018 Abstract In 1947, W. H. Mills published a paper describing a formula that gives primes : if A 1.3063778838630806904686144926 then A is always

More information

Odd perfect polynomials over F 2

Odd perfect polynomials over F 2 Journal de Théorie des Nombres de Bordeaux 19 (2007), 165 174 Odd perfect polynomials over F 2 par Luis H. GALLARDO et Olivier RAHAVANDRAINY Résumé. Un polynôme A F 2 [x] est dit parfait s il est égal

More information

Error Bounds on Complex Floating-Point Multiplication

Error Bounds on Complex Floating-Point Multiplication Error Bounds on Complex Floating-Point Multiplication Richard Brent, Colin Percival, Paul Zimmermann To cite this version: Richard Brent, Colin Percival, Paul Zimmermann. Error Bounds on Complex Floating-Point

More information

Up-to Techniques for Weak Bisimulation

Up-to Techniques for Weak Bisimulation Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Up-to Techniques for Weak Bisimulation Damien Pous Avril 2005 Research

More information

Hierarchy among Automata on Linear Orderings

Hierarchy among Automata on Linear Orderings Hierarchy among Automata on Linear Orderings Véronique Bruyère Institut d Informatique Université de Mons-Hainaut Olivier Carton LIAFA Université Paris 7 Abstract In a preceding paper, automata and rational

More information

The natural numbers. The natural numbers come with an addition +, a multiplication and an order < p < q, q < p, p = q.

The natural numbers. The natural numbers come with an addition +, a multiplication and an order < p < q, q < p, p = q. The natural numbers N = {0, 1,, 3, } The natural numbers come with an addition +, a multiplication and an order < p, q N, p + q N. p, q N, p q N. p, q N, exactly one of the following holds: p < q, q

More information

Discrete Dynamical Systems: Necessary Divergence Conditions for Synchronous Iterations

Discrete Dynamical Systems: Necessary Divergence Conditions for Synchronous Iterations LABORATOIRE D INFORMATIQUE DE L UNIVERSITE DE FRANCHE-COMTE EA 4269 Discrete Dynamical Systems: Necessary Divergence Conditions for Synchronous Iterations Jacques M. Bahi Jean-François Couchot Olivier

More information

ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION

ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION MATHEMATICS OF COMPUTATION Volume 00, Number 0, Pages 000 000 S 005-5718(XX)0000-0 ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN In memory of

More information

Efficient and accurate computation of upper bounds of approximation errors

Efficient and accurate computation of upper bounds of approximation errors Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Efficient and accurate computation of upper bounds of approximation

More information

A general scheme for deciding the branchwidth

A general scheme for deciding the branchwidth Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 A general scheme for deciding the branchwidth Frédéric Mazoit Juin

More information

Asymptotically Fast Polynomial Matrix Algorithms for Multivariable Systems

Asymptotically Fast Polynomial Matrix Algorithms for Multivariable Systems Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Asymptotically Fast Polynomial Matrix Algorithms for Multivariable

More information

Permuting the partitions of a prime

Permuting the partitions of a prime Journal de Théorie des Nombres de Bordeaux 00 (XXXX), 000 000 Permuting the partitions of a prime par Stéphane VINATIER Résumé. Étant donné un nombre premier p impair, on caractérise les partitions l de

More information

ANNALES. FLORENT BALACHEFF, ERAN MAKOVER, HUGO PARLIER Systole growth for finite area hyperbolic surfaces

ANNALES. FLORENT BALACHEFF, ERAN MAKOVER, HUGO PARLIER Systole growth for finite area hyperbolic surfaces ANNALES DE LA FACULTÉ DES SCIENCES Mathématiques FLORENT BALACHEFF, ERAN MAKOVER, HUGO PARLIER Systole growth for finite area hyperbolic surfaces Tome XXIII, n o 1 (2014), p. 175-180.

More information

Maximizing the Cohesion is NP-hard

Maximizing the Cohesion is NP-hard INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Maximizing the Cohesion is NP-hard arxiv:1109.1994v2 [cs.ni] 9 Oct 2011 Adrien Friggeri Eric Fleury N 774 version 2 initial version 8 September

More information

Acyclicity and Finite Linear Extendability: a Formal and Constructive Equivalence

Acyclicity and Finite Linear Extendability: a Formal and Constructive Equivalence Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Acyclicity and Finite Linear Extendability: a Formal and Constructive

More information

Table-based polynomials for fast hardware function evaluation

Table-based polynomials for fast hardware function evaluation Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Table-based polynomials for fast hardware function evaluation Jérémie

More information

Innocuous Double Rounding of Basic Arithmetic Operations

Innocuous Double Rounding of Basic Arithmetic Operations Innocuous Double Rounding of Basic Arithmetic Operations Pierre Roux ISAE, ONERA Double rounding occurs when a floating-point value is first rounded to an intermediate precision before being rounded to

More information

Séminaire d analyse fonctionnelle École Polytechnique

Séminaire d analyse fonctionnelle École Polytechnique Séminaire d analyse fonctionnelle École Polytechnique P. ENFLO On infinite-dimensional topological groups Séminaire d analyse fonctionnelle (Polytechnique) (1977-1978), exp. n o 10 et 11, p. 1-11

More information

ALGEBRA. 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers

ALGEBRA. 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers ALGEBRA CHRISTIAN REMLING 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers by Z = {..., 2, 1, 0, 1,...}. Given a, b Z, we write a b if b = ac for some

More information

On various ways to split a floating-point number

On various ways to split a floating-point number On various ways to split a floating-point number Claude-Pierre Jeannerod Jean-Michel Muller Paul Zimmermann Inria, CNRS, ENS Lyon, Université de Lyon, Université de Lorraine France ARITH-25 June 2018 -2-

More information

Computation of the error functions erf and erfc in arbitrary precision with correct rounding

Computation of the error functions erf and erfc in arbitrary precision with correct rounding Computation of the error functions erf and erfc in arbitrary precision with correct rounding Sylvain Chevillard Arenaire, LIP, ENS-Lyon, France Sylvain.Chevillard@ens-lyon.fr Nathalie Revol INRIA, Arenaire,

More information

Laboratoire de l Informatique du Parallélisme

Laboratoire de l Informatique du Parallélisme Laboratoire de l Informatique du Parallélisme Ecole Normale Supérieure de Lyon Unité de recherche associée au CNRS n 1398 Reciprocation, Square root, Inverse Square Root, and some Elementary Functions

More information

Computing Machine-Efficient Polynomial Approximations

Computing Machine-Efficient Polynomial Approximations Computing Machine-Efficient Polynomial Approximations N. Brisebarre, S. Chevillard, G. Hanrot, J.-M. Muller, D. Stehlé, A. Tisserand and S. Torres Arénaire, LIP, É.N.S. Lyon Journées du GDR et du réseau

More information

How do computers represent numbers?

How do computers represent numbers? How do computers represent numbers? Tips & Tricks Week 1 Topics in Scientific Computing QMUL Semester A 2017/18 1/10 What does digital mean? The term DIGITAL refers to any device that operates on discrete

More information

Replicator Dynamics and Correlated Equilibrium

Replicator Dynamics and Correlated Equilibrium Replicator Dynamics and Correlated Equilibrium Yannick Viossat To cite this version: Yannick Viossat. Replicator Dynamics and Correlated Equilibrium. CECO-208. 2004. HAL Id: hal-00242953

More information

LIP. Laboratoire de l Informatique du Parallélisme. Ecole Normale Supérieure de Lyon

LIP. Laboratoire de l Informatique du Parallélisme. Ecole Normale Supérieure de Lyon LIP Laboratoire de l Informatique du Parallélisme Ecole Normale Supérieure de Lyon Institut IMAG Unité de recherche associée au CNRS n 1398 Inversion of 2D cellular automata: some complexity results runo

More information

Apprentissage automatique Méthodes à noyaux - motivation

Apprentissage automatique Méthodes à noyaux - motivation Apprentissage automatique Méthodes à noyaux - motivation MODÉLISATION NON-LINÉAIRE prédicteur non-linéaire On a vu plusieurs algorithmes qui produisent des modèles linéaires (régression ou classification)

More information

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms Anne Benoit, Fanny Dufossé and Yves Robert LIP, École Normale Supérieure de Lyon, France {Anne.Benoit Fanny.Dufosse

More information

Standard forms for writing numbers

Standard forms for writing numbers Standard forms for writing numbers In order to relate the abstract mathematical descriptions of familiar number systems to the everyday descriptions of numbers by decimal expansions and similar means,

More information

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Solution. If we does not need the pointwise limit of

More information

2. Two binary operations (addition, denoted + and multiplication, denoted

2. Two binary operations (addition, denoted + and multiplication, denoted Chapter 2 The Structure of R The purpose of this chapter is to explain to the reader why the set of real numbers is so special. By the end of this chapter, the reader should understand the difference between

More information

A Proof of the Focusing Property of Linear Logic

A Proof of the Focusing Property of Linear Logic A Proof of the Focusing Property of Linear Logic Olivier LAURENT Laboratoire de l Informatique du Parallélisme (UMR 5668) Université de Lyon, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1 46, allée

More information

Centre d Economie de la Sorbonne UMR 8174

Centre d Economie de la Sorbonne UMR 8174 Centre d Economie de la Sorbonne UMR 8174 A Generalization of Fan s Matching Theorem Souhail CHEBBI Pascal GOURDEL Hakim HAMMAMI 2006.60 Maison des Sciences Économiques, 106-112 boulevard de L'Hôpital,

More information

3. Abstract Boolean Algebras

3. Abstract Boolean Algebras 3. ABSTRACT BOOLEAN ALGEBRAS 123 3. Abstract Boolean Algebras 3.1. Abstract Boolean Algebra. Definition 3.1.1. An abstract Boolean algebra is defined as a set B containing two distinct elements 0 and 1,

More information

Binary Floating-Point Numbers

Binary Floating-Point Numbers Binary Floating-Point Numbers S exponent E significand M F=(-1) s M β E Significand M pure fraction [0, 1-ulp] or [1, 2) for β=2 Normalized form significand has no leading zeros maximum # of significant

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

Propagation of Roundoff Errors in Finite Precision Computations: a Semantics Approach 1

Propagation of Roundoff Errors in Finite Precision Computations: a Semantics Approach 1 Propagation of Roundoff Errors in Finite Precision Computations: a Semantics Approach 1 Matthieu Martel CEA - Recherche Technologique LIST-DTSI-SLA CEA F91191 Gif-Sur-Yvette Cedex, France e-mail : mmartel@cea.fr

More information

Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur. Lecture 1 Real Numbers

Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur. Lecture 1 Real Numbers Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur Lecture 1 Real Numbers In these lectures, we are going to study a branch of mathematics called

More information

On the Computation of Correctly-Rounded Sums

On the Computation of Correctly-Rounded Sums INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE On the Computation of Correctly-Rounded Sums Peter Kornerup Vincent Lefèvre Nicolas Louvet Jean-Michel Muller N 7262 April 2010 Algorithms,

More information

ANNALES SCIENTIFIQUES L ÉCOLE NORMALE SUPÉRIEURE. Cluster ensembles, quantization and the dilogarithm. Vladimir V. FOCK & Alexander B.

ANNALES SCIENTIFIQUES L ÉCOLE NORMALE SUPÉRIEURE. Cluster ensembles, quantization and the dilogarithm. Vladimir V. FOCK & Alexander B. ISSN 0012-9593 ASENAH quatrième série - tome 42 fascicule 6 novembre-décembre 2009 ANNALES SCIENTIFIQUES de L ÉCOLE NORMALE SUPÉRIEURE Vladimir V. FOCK & Alexander B. GONCHAROV Cluster ensembles, quantization

More information

A note on parallel and alternating time

A note on parallel and alternating time Journal of Complexity 23 (2007) 594 602 www.elsevier.com/locate/jco A note on parallel and alternating time Felipe Cucker a,,1, Irénée Briquel b a Department of Mathematics, City University of Hong Kong,

More information

Best linear unbiased prediction when error vector is correlated with other random vectors in the model

Best linear unbiased prediction when error vector is correlated with other random vectors in the model Best linear unbiased prediction when error vector is correlated with other random vectors in the model L.R. Schaeffer, C.R. Henderson To cite this version: L.R. Schaeffer, C.R. Henderson. Best linear unbiased

More information

Adsorption of chain molecules with a polar head a scaling description

Adsorption of chain molecules with a polar head a scaling description Adsorption of chain molecules with a polar head a scaling description S. Alexander To cite this version: S. Alexander. Adsorption of chain molecules with a polar head a scaling description. Journal de

More information

Characterization of Semantics for Argument Systems

Characterization of Semantics for Argument Systems Characterization of Semantics for Argument Systems Philippe Besnard and Sylvie Doutre IRIT Université Paul Sabatier 118, route de Narbonne 31062 Toulouse Cedex 4 France besnard, doutre}@irit.fr Abstract

More information

Representable correcting terms for possibly underflowing floating point operations

Representable correcting terms for possibly underflowing floating point operations Representable correcting terms for possibly underflowing floating point operations Sylvie Boldo & Marc Daumas E-mail: Sylvie.Boldo@ENS-Lyon.Fr & Marc.Daumas@ENS-Lyon.Fr Laboratoire de l Informatique du

More information

Some Applications of MATH 1090 Techniques

Some Applications of MATH 1090 Techniques 1 Some Applications of MATH 1090 Techniques 0.1 A simple problem about Nand gates in relation to And gates and inversion (Not) Gates. In this subsection the notation A means that A is proved from the axioms

More information

Laver Tables A Direct Approach

Laver Tables A Direct Approach Laver Tables A Direct Approach Aurel Tell Adler June 6, 016 Contents 1 Introduction 3 Introduction to Laver Tables 4.1 Basic Definitions............................... 4. Simple Facts.................................

More information

A note on the moving hyperplane method

A note on the moving hyperplane method 001-Luminy conference on Quasilinear Elliptic and Parabolic Equations and Systems, Electronic Journal of Differential Equations, Conference 08, 00, pp 1 6. http://ejde.math.swt.edu or http://ejde.math.unt.edu

More information

It s a Small World After All Calculus without s and s

It s a Small World After All Calculus without s and s It s a Small World After All Calculus without s and s Dan Sloughter Department of Mathematics Furman University November 18, 2004 Smallworld p1/39 L Hôpital s axiom Guillaume François Antoine Marquis de

More information

Kato s inequality when u is a measure. L inégalité de Kato lorsque u est une mesure

Kato s inequality when u is a measure. L inégalité de Kato lorsque u est une mesure Kato s inequality when u is a measure L inégalité de Kato lorsque u est une mesure Haïm Brezis a,b, Augusto C. Ponce a,b, a Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie, BC 187, 4

More information

Outils de Recherche Opérationnelle en Génie MTH Astuce de modélisation en Programmation Linéaire

Outils de Recherche Opérationnelle en Génie MTH Astuce de modélisation en Programmation Linéaire Outils de Recherche Opérationnelle en Génie MTH 8414 Astuce de modélisation en Programmation Linéaire Résumé Les problèmes ne se présentent pas toujours sous une forme qui soit naturellement linéaire.

More information

Minimum sizes of identifying codes in graphs differing by one edge or one vertex

Minimum sizes of identifying codes in graphs differing by one edge or one vertex Minimum sizes of identifying codes in graphs differing by one edge or one vertex Taille minimum des codes identifiants dans les graphes différant par un sommet ou une arête Irène Charon Olivier Hudry Antoine

More information

A note on top-k lists: average distance between two top-k lists

A note on top-k lists: average distance between two top-k lists A note on top-k lists: average distance between two top-k lists Antoine Rolland Univ Lyon, laboratoire ERIC, université Lyon, 69676 BRON, antoine.rolland@univ-lyon.fr Résumé. Ce papier présente un complément

More information

REVUE FRANÇAISE D INFORMATIQUE ET DE

REVUE FRANÇAISE D INFORMATIQUE ET DE REVUE FRANÇAISE D INFORMATIQUE ET DE RECHERCHE OPÉRATIONNELLE, SÉRIE ROUGE SURESH CHANDRA Decomposition principle for linear fractional functional programs Revue française d informatique et de recherche

More information

How to Ensure a Faithful Polynomial Evaluation with the Compensated Horner Algorithm

How to Ensure a Faithful Polynomial Evaluation with the Compensated Horner Algorithm How to Ensure a Faithful Polynomial Evaluation with the Compensated Horner Algorithm Philippe Langlois, Nicolas Louvet Université de Perpignan, DALI Research Team {langlois, nicolas.louvet}@univ-perp.fr

More information

Formal verification of IA-64 division algorithms

Formal verification of IA-64 division algorithms Formal verification of IA-64 division algorithms 1 Formal verification of IA-64 division algorithms John Harrison Intel Corporation IA-64 overview HOL Light overview IEEE correctness Division on IA-64

More information

APPROXIMATE EXPRESSION OF THE "THREE HALVES LAW" FOR A BOUNDED CATHODE IN A UNIFORM FIELD

APPROXIMATE EXPRESSION OF THE THREE HALVES LAW FOR A BOUNDED CATHODE IN A UNIFORM FIELD A translation from Atomic Energy of Canada Limited APPROXIMATE EXPRESSION OF THE "THREE HALVES LAW" FOR A BOUNDED CATHODE IN A UNIFORM FIELD (INTRODUCED BY ACADEMICIAN N.N. SEMENOV 3 JULY 1952) by I.I.

More information

Linear Algebra II. 2 Matrices. Notes 2 21st October Matrix algebra

Linear Algebra II. 2 Matrices. Notes 2 21st October Matrix algebra MTH6140 Linear Algebra II Notes 2 21st October 2010 2 Matrices You have certainly seen matrices before; indeed, we met some in the first chapter of the notes Here we revise matrix algebra, consider row

More information

ADVANCED CALCULUS - MTH433 LECTURE 4 - FINITE AND INFINITE SETS

ADVANCED CALCULUS - MTH433 LECTURE 4 - FINITE AND INFINITE SETS ADVANCED CALCULUS - MTH433 LECTURE 4 - FINITE AND INFINITE SETS 1. Cardinal number of a set The cardinal number (or simply cardinal) of a set is a generalization of the concept of the number of elements

More information

Toward Correctly Rounded Transcendentals

Toward Correctly Rounded Transcendentals IEEE TRANSACTIONS ON COMPUTERS, VOL. 47, NO. 11, NOVEMBER 1998 135 Toward Correctly Rounded Transcendentals Vincent Lefèvre, Jean-Michel Muller, Member, IEEE, and Arnaud Tisserand Abstract The Table Maker

More information

cedram Article mis en ligne dans le cadre du Centre de diffusion des revues académiques de mathématiques

cedram Article mis en ligne dans le cadre du Centre de diffusion des revues académiques de mathématiques Paul FILI On the heights of totally p-adic numbers Tome 26, n o 1 (2014), p. 103-109. Société Arithmétique de Bordeaux, 2014, tous droits réservés.

More information

THE LEBESGUE DIFFERENTIATION THEOREM VIA THE RISING SUN LEMMA

THE LEBESGUE DIFFERENTIATION THEOREM VIA THE RISING SUN LEMMA Real Analysis Exchange Vol. 29(2), 2003/2004, pp. 947 951 Claude-Alain Faure, Gymnase de la Cité, Case postale 329, CH-1000 Lausanne 17, Switzerland. email: cafaure@bluemail.ch THE LEBESGUE DIFFERENTIATION

More information

HW Graph Theory SOLUTIONS (hbovik) - Q

HW Graph Theory SOLUTIONS (hbovik) - Q 1, Diestel 3.5: Deduce the k = 2 case of Menger s theorem (3.3.1) from Proposition 3.1.1. Let G be 2-connected, and let A and B be 2-sets. We handle some special cases (thus later in the induction if these

More information

Stickelberger s congruences for absolute norms of relative discriminants

Stickelberger s congruences for absolute norms of relative discriminants Stickelberger s congruences for absolute norms of relative discriminants Georges Gras To cite this version: Georges Gras. Stickelberger s congruences for absolute norms of relative discriminants. Journal

More information

Discrete Mathematics and Probability Theory Fall 2018 Alistair Sinclair and Yun Song Note 6

Discrete Mathematics and Probability Theory Fall 2018 Alistair Sinclair and Yun Song Note 6 CS 70 Discrete Mathematics and Probability Theory Fall 2018 Alistair Sinclair and Yun Song Note 6 1 Modular Arithmetic In several settings, such as error-correcting codes and cryptography, we sometimes

More information

Computing the throughput of replicated workflows on heterogeneous platforms

Computing the throughput of replicated workflows on heterogeneous platforms Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Computing the throughput of replicated workflows on heterogeneous

More information

Introduction CSE 541

Introduction CSE 541 Introduction CSE 541 1 Numerical methods Solving scientific/engineering problems using computers. Root finding, Chapter 3 Polynomial Interpolation, Chapter 4 Differentiation, Chapter 4 Integration, Chapters

More information

arxiv:cs/ v1 [cs.dm] 21 Apr 2005

arxiv:cs/ v1 [cs.dm] 21 Apr 2005 arxiv:cs/0504090v1 [cs.dm] 21 Apr 2005 Abstract Discrete Morse Theory for free chain complexes Théorie de Morse pour des complexes de chaines libres Dmitry N. Kozlov Eidgenössische Technische Hochschule,

More information

Elements of Floating-point Arithmetic

Elements of Floating-point Arithmetic Elements of Floating-point Arithmetic Sanzheng Qiao Department of Computing and Software McMaster University July, 2012 Outline 1 Floating-point Numbers Representations IEEE Floating-point Standards Underflow

More information

Chapter 3 Deterministic planning

Chapter 3 Deterministic planning Chapter 3 Deterministic planning In this chapter we describe a number of algorithms for solving the historically most important and most basic type of planning problem. Two rather strong simplifying assumptions

More information

Accurate Solution of Triangular Linear System

Accurate Solution of Triangular Linear System SCAN 2008 Sep. 29 - Oct. 3, 2008, at El Paso, TX, USA Accurate Solution of Triangular Linear System Philippe Langlois DALI at University of Perpignan France Nicolas Louvet INRIA and LIP at ENS Lyon France

More information

Proofs: A General How To II. Rules of Inference. Rules of Inference Modus Ponens. Rules of Inference Addition. Rules of Inference Conjunction

Proofs: A General How To II. Rules of Inference. Rules of Inference Modus Ponens. Rules of Inference Addition. Rules of Inference Conjunction Introduction I Proofs Computer Science & Engineering 235 Discrete Mathematics Christopher M. Bourke cbourke@cse.unl.edu A proof is a proof. What kind of a proof? It s a proof. A proof is a proof. And when

More information

Additive symmetries: the non-negative case

Additive symmetries: the non-negative case Theoretical Computer Science 91 (003) 143 157 www.elsevier.com/locate/tcs Additive symmetries: the non-negative case Marc Daumas, Philippe Langlois Laboratoire de l Informatique du Parallelisme, UMR 5668

More information

,... We would like to compare this with the sequence y n = 1 n

,... We would like to compare this with the sequence y n = 1 n Example 2.0 Let (x n ) n= be the sequence given by x n = 2, i.e. n 2, 4, 8, 6,.... We would like to compare this with the sequence = n (which we know converges to zero). We claim that 2 n n, n N. Proof.

More information

Valiant s model: from exponential sums to exponential products

Valiant s model: from exponential sums to exponential products Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Valiant s model: from exponential sums to exponential products

More information

PUTNAM TRAINING NUMBER THEORY. Exercises 1. Show that the sum of two consecutive primes is never twice a prime.

PUTNAM TRAINING NUMBER THEORY. Exercises 1. Show that the sum of two consecutive primes is never twice a prime. PUTNAM TRAINING NUMBER THEORY (Last updated: December 11, 2017) Remark. This is a list of exercises on Number Theory. Miguel A. Lerma Exercises 1. Show that the sum of two consecutive primes is never twice

More information

Products of ratios of consecutive integers

Products of ratios of consecutive integers Products of ratios of consecutive integers Régis de la Bretèche, Carl Pomerance & Gérald Tenenbaum 27/1/23, 9h26 For Jean-Louis Nicolas, on his sixtieth birthday 1. Introduction Let {ε n } 1 n

More information

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding Tim Roughgarden October 29, 2014 1 Preamble This lecture covers our final subtopic within the exact and approximate recovery part of the course.

More information

DR.RUPNATHJI( DR.RUPAK NATH )

DR.RUPNATHJI( DR.RUPAK NATH ) Contents 1 Sets 1 2 The Real Numbers 9 3 Sequences 29 4 Series 59 5 Functions 81 6 Power Series 105 7 The elementary functions 111 Chapter 1 Sets It is very convenient to introduce some notation and terminology

More information

Some Nordhaus-Gaddum-type Results

Some Nordhaus-Gaddum-type Results Some Nordhaus-Gaddum-type Results Wayne Goddard Department of Mathematics Massachusetts Institute of Technology Cambridge, USA Michael A. Henning Department of Mathematics University of Natal Pietermaritzburg,

More information

On NP-Completeness for Linear Machines

On NP-Completeness for Linear Machines JOURNAL OF COMPLEXITY 13, 259 271 (1997) ARTICLE NO. CM970444 On NP-Completeness for Linear Machines Christine Gaßner* Institut für Mathematik und Informatik, Ernst-Moritz-Arndt-Universität, F.-L.-Jahn-Strasse

More information

Decompositions of graphs into cycles with chords

Decompositions of graphs into cycles with chords Decompositions of graphs into cycles with chords Paul Balister Hao Li Richard Schelp May 22, 2017 In memory of Dick Schelp, who passed away shortly after the submission of this paper Abstract We show that

More information

Homological Dimension

Homological Dimension Homological Dimension David E V Rose April 17, 29 1 Introduction In this note, we explore the notion of homological dimension After introducing the basic concepts, our two main goals are to give a proof

More information

COMPOSITIO MATHEMATICA

COMPOSITIO MATHEMATICA COMPOSITIO MATHEMATICA F. LOONSTRA The classes of partially ordered groups Compositio Mathematica, tome 9 (1951), p. 130-140 Foundation Compositio Mathematica,

More information

Improving the Compensated Horner Scheme with a Fused Multiply and Add

Improving the Compensated Horner Scheme with a Fused Multiply and Add Équipe de Recherche DALI Laboratoire LP2A, EA 3679 Université de Perpignan Via Domitia Improving the Compensated Horner Scheme with a Fused Multiply and Add S. Graillat, Ph. Langlois, N. Louvet DALI-LP2A

More information

Quadratic Equations Part I

Quadratic Equations Part I Quadratic Equations Part I Before proceeding with this section we should note that the topic of solving quadratic equations will be covered in two sections. This is done for the benefit of those viewing

More information

Basic Algebra. Final Version, August, 2006 For Publication by Birkhäuser Boston Along with a Companion Volume Advanced Algebra In the Series

Basic Algebra. Final Version, August, 2006 For Publication by Birkhäuser Boston Along with a Companion Volume Advanced Algebra In the Series Basic Algebra Final Version, August, 2006 For Publication by Birkhäuser Boston Along with a Companion Volume Advanced Algebra In the Series Cornerstones Selected Pages from Chapter I: pp. 1 15 Anthony

More information

Numerics and Error Analysis

Numerics and Error Analysis Numerics and Error Analysis CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Numerics and Error Analysis 1 / 30 A Puzzle What

More information