Floating Point Arithmetic and Rounding Error Analysis


1 Floating Point Arithmetic and Rounding Error Analysis. Claude-Pierre Jeannerod, Nathalie Revol. Inria, LIP, ENS de Lyon.

2 Context
Starting point: how do numerical algorithms behave in finite precision arithmetic?
Typically, basic matrix computations (Ax = b, ...) in floating-point arithmetic as specified by IEEE 754.
Finite precision ⇒ most operations (+, −, ×, /, √, ...) must round their result.
What is the effect of all such rounding errors on the computed solution x̂?

3 Rounding error analysis
Old and nontrivial question [von Neumann, Turing, Wilkinson, ...]
In this lecture, two approaches:
A priori analysis (this lecture):
- Goal: a bound on ‖x̂ − x‖/‖x‖ for any input and format
- Tool: the many nice properties of floating-point
- Ideal: readable, provably tight bound + short proof
A posteriori, automatic analysis (Nathalie's lecture):
- Goal: x̂ and an enclosure of x̂ − x for a given input and format
- Tool: interval arithmetic based on floating-point
- Ideal: a narrow interval, computed fast

5 Outline: Context · Floating-point arithmetic · A priori analysis · Conclusion

6 Floating-point arithmetic
An approximation of arithmetic over ℝ.
1940's: first implementations [Zuse's computers]. 1985: full specification [IEEE 754 standard]. Today: IEEE arithmetic everywhere!
Although often considered as fuzzy, it is highly structured and has many nice mathematical properties.
How to exploit these properties for rigorous analyses?

7 Floating-point numbers
Rational numbers of the form M · β^E, where M, E ∈ ℤ, |M| < β^p, E + p − 1 ∈ [e_min, e_max]:
base β, precision p, exponent range defined by e_min and e_max.
Floats in C have β = 2, p = 24, and [e_min, e_max] = [−126, 127].

8 Floating-point numbers
We assume e_min = −∞ and e_max = +∞ (unbounded exponent range), β even, and p ≥ 2.
Definition: the set F of floating-point numbers in base β and precision p is
F := {0} ∪ {M · β^E : M, E ∈ ℤ, β^(p−1) ≤ |M| < β^p}.

9 Floating-point numbers: other representations
x ∈ F\{0} ⇒ x = ±m · β^e, with significand m = (d_0.d_1 ... d_(p−1))_β ∈ [1, β).
Three useful units:
- unit in the first place: ufp(x) = β^e,
- unit in the last place: ulp(x) = β^(e−p+1),
- unit roundoff: u = (1/2) β^(1−p).
Alternative views, which display the structure of F very well:
x ∈ ulp(x)·ℤ,   |x| = (1 + 2ku)·ufp(x) with k ∈ ℕ,   F ∩ [1, β) = {1, 1 + 2u, 1 + 4u, ...}.

10 Floating-point numbers: some properties
F can be seen as a structured grid with many nice properties:
- symmetry: f ∈ F ⇒ −f ∈ F;
- auto-similarity: f ∈ F, e ∈ ℤ ⇒ f·β^e ∈ F;
- F ∩ [1, β) = {1, 1 + 2u, 1 + 4u, ...};
- F ∩ [β^e, β^(e+1)) has (β − 1)β^(p−1) equally spaced elements, with spacing equal to 2uβ^e;
- neighborhood of 1 ∈ F:  ..., 1 − 4u/β, 1 − 2u/β, 1, 1 + 2u, 1 + 4u, ...
  Hence 1 is β times closer to its predecessor than to its successor.
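A minimal numerical sketch of these units in binary64 (β = 2, p = 53), assuming Python 3.9+ for math.ulp and math.nextafter:

    import math

    p = 53
    u = 0.5 * 2.0 ** (1 - p)                       # unit roundoff u = (1/2) * beta**(1-p) = 2**-53

    print(math.ulp(1.0) == 2 * u)                  # True: ulp(1) = 2u
    print(math.nextafter(1.0, 2.0) == 1 + 2 * u)   # True: successor of 1 is 1 + 2u
    print(math.nextafter(1.0, 0.0) == 1 - u)       # True: predecessor of 1 is 1 - 2u/beta = 1 - u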

11 Rounding functions
Round-to-nearest function RN: ℝ → F such that, for all t ∈ ℝ,
|RN(t) − t| = min_{f ∈ F} |f − t|, with a given tie-breaking rule.
Properties:
- t ∈ F ⇒ RN(t) = t;
- RN is nondecreasing;
- for a reasonable tie-breaking rule: RN(−t) = −RN(t) and RN(tβ^e) = RN(t)β^e for all e ∈ ℤ.

12 Rounding functions
More generally, a rounding function is any map ○: ℝ → F such that
t ∈ F ⇒ ○(t) = t;   t ≤ t′ ⇒ ○(t) ≤ ○(t′).
Adding just one extra constraint gives the usual directed roundings:
- rounding down: RD(t) ≤ t;
- rounding up: t ≤ RU(t);
- rounding to zero: |RZ(t)| ≤ |t|.
Key property for interval arithmetic: for all t ∈ ℝ, t ∈ [RD(t), RU(t)].
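Standard Python only rounds to nearest, but the enclosure property can still be illustrated by stepping from RN(t) to a neighboring float; a small sketch assuming Python 3.9+ (the helper enclose below is my own illustration, not from the slides):

    import math
    from fractions import Fraction

    def enclose(t: Fraction):
        """Return floats (lo, hi) with lo <= t <= hi, i.e. [RD(t), RU(t)]."""
        rn = float(t)                                       # round to nearest
        lo = rn if Fraction(rn) <= t else math.nextafter(rn, -math.inf)
        hi = rn if Fraction(rn) >= t else math.nextafter(rn, math.inf)
        return lo, hi

    lo, hi = enclose(Fraction(1, 10))                       # 1/10 is not a binary float
    print(Fraction(lo) <= Fraction(1, 10) <= Fraction(hi))  # True
    print(lo.hex(), hi.hex())                               # two consecutive binary64 numbers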

13 Error bounds for real numbers
For t ∈ ℝ, t ≠ 0:
E₁(t) := |RN(t) − t| / |t| ≤ u/(1+u),   E₂(t) := |RN(t) − t| / |RN(t)| ≤ u.
Proof: assume 1 ≤ t < β, so that RN(t) ∈ {1, 1 + 2u, 1 + 4u, ..., β}.
Then |RN(t) − t| ≤ (1/2) · 2u = u.
Dividing by |RN(t)| ≥ 1 gives directly the bound on E₂.
If t ≥ 1 + u then the bound E₁(t) ≤ u/(1+u) follows.
Else 1 ≤ t < 1 + u ⇒ RN(t) = 1 ⇒ E₁(t) = (t − 1)/t < u/(1+u).
The bound u/(1+u) is sharp and well known [Dekker'71, Holm'80, Knuth'81-98], but the simpler bound u is almost always used in practice.

14 Error bounds for real numbers
Example for the usual binary formats (u = 2^(−p); u/(1+u) is barely smaller):
- p = 11 (half):   u ≈ 4.9·10⁻⁴
- p = 24 (float):  u ≈ 6.0·10⁻⁸
- p = 53 (double): u ≈ 1.1·10⁻¹⁶
- p = 113 (quad):  u ≈ 9.6·10⁻³⁵
For directed roundings, replace these bounds by 2u.
Conclusion: in all cases, the relative errors due to rounding can be bounded by a tiny quantity which depends only on the format.
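A quick empirical check of these bounds in binary64, measuring the rounding error exactly with rational arithmetic (a sketch, not from the slides):

    import random
    from fractions import Fraction

    u = Fraction(1, 2**53)                       # unit roundoff of binary64
    for _ in range(10_000):
        t = Fraction(random.randint(1, 10**18), random.randint(1, 10**18))
        rn = Fraction(float(t))                  # float() rounds to nearest, ties to even
        assert abs(rn - t) <= u / (1 + u) * t    # E1(t) <= u/(1+u)
        assert abs(rn - t) <= u * rn             # E2(t) <= u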

15 Correct rounding
The result is the composition of two functions: the basic operation is performed exactly, and the exact result is then rounded:
x, y ∈ F, op ∈ {+, −, ×, /} ⇒ return r̂ := RN(x op y).
Correct rounding extends to square root and to the FMA (fused multiply-add: xy + z).
The error bounds on E₁ and E₂ yield two standard models:
r̂ = (x op y)(1 + δ₁),   |δ₁| ≤ u/(1+u) =: u₁,
r̂ = (x op y)/(1 + δ₂),  |δ₂| ≤ u.

16 Example
Let r = (x + y)/2 be evaluated naively as r̂ = RN(RN(x + y)/2).
High relative accuracy is ensured:
r̂ = (RN(x + y)/2)(1 + δ₂),  |δ₂| ≤ u₁
  = ((x + y)/2)(1 + δ₁)(1 + δ₂),  |δ₁|, |δ₂| ≤ u₁
  =: r(1 + ε),  |ε| ≤ 2u.
We'd also like to have min(x, y) ≤ r̂ ≤ max(x, y)...

17 Example
Not always true: already for β = 10 and p = 3 the computed midpoint can land strictly outside [min(x, y), max(x, y)] (a concrete decimal instance is checked in the sketch below).
True if β = 2 or sign(x) ≠ sign(y).
Proof for base two (with x ≤ y): r̂ := RN(RN(x + y)/2) = RN((x + y)/2), since halving is exact.
x ≤ (x + y)/2 ≤ y ⇒ RN(x) ≤ RN((x + y)/2) ≤ RN(y) ⇒ x ≤ r̂ ≤ y.
Repair the other cases using r = x + (y − x)/2. [Sterbenz'74, Boldo'15]
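A concrete instance of the decimal failure, emulated with Python's decimal module set to a 3-digit, round-to-nearest format (the particular values 5.03 and 5.04 are my own illustration, not necessarily the ones used on the slide):

    from decimal import Decimal, getcontext, ROUND_HALF_EVEN

    getcontext().prec = 3                       # beta = 10, p = 3
    getcontext().rounding = ROUND_HALF_EVEN     # round to nearest, ties to even

    x, y = Decimal("5.03"), Decimal("5.04")
    s = x + y                                   # RN(10.07) = 10.1
    r = s / 2                                   # RN(10.1 / 2) = 5.05
    print(r, r > max(x, y))                     # 5.05 True: the computed midpoint escapes [x, y]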

18 Other typical floating-point surprises
β = 2, any precision p ⇒ 1/10 ∉ F.
Loss of algebraic properties: commutativity of + and × is preserved by correct rounding, but associativity and distributivity are lost: in general, RN(RN(x + y) + z) ≠ RN(x + RN(y + z)).
Catastrophic cancellation: 2 floating-point operations are enough to produce a result with relative error 1.
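Both surprises are one-liners in binary64 (a sketch):

    from fractions import Fraction

    print(Fraction(0.1) == Fraction(1, 10))            # False: 1/10 is not a binary float
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))      # False: rounded addition is not associative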

19 Catastrophic cancellation
For example, if x = 1, y = u/β, and z = −1, then r̂ := RN(RN(x + y) + z) = 0,
and, since r := x + y + z = u/β is nonzero, we obtain |r̂ − r| / |r| = 1.
Possible workarounds:
- sorting the input (if possible);
- rewriting: a² − b² = (a + b)(a − b);
- compensation: compute the rounding errors, and use them later in the algorithm in order to compensate for their effect. [Kahan, Rump, ...]
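The same two-operation example, run in binary64 (β = 2, u = 2⁻⁵³); a minimal sketch:

    u = 2.0 ** -53
    x, y, z = 1.0, u / 2, -1.0              # y = u / beta

    r_hat = (x + y) + z                     # RN(RN(1 + u/2) - 1) = RN(1 - 1) = 0
    r = y                                   # exact value of x + y + z
    print(r_hat, abs(r_hat - r) / abs(r))   # 0.0 1.0: all relative accuracy is lost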

20 Conditions for exact subtraction
Sterbenz' lemma [Sterbenz'74]: x, y ∈ F, y/2 ≤ x ≤ 2y ⇒ x − y ∈ F.
Valid for any base β.
Applications: Cody and Waite's range reduction, Kahan's accurate algorithms (discriminants, triangle area), ...
Proof [Hauser'96]: assume 0 < y ≤ x ≤ 2y. Then ulp(y) ≤ ulp(x), so x, y ∈ β^e·ℤ with β^e = ulp(y).
Hence (x − y)/β^e is an integer such that 0 ≤ (x − y)/β^e ≤ y/β^e = y/ulp(y) < β^p.
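A randomized check of the lemma in binary64, with exactness verified in rational arithmetic (a sketch):

    import random
    from fractions import Fraction

    for _ in range(10_000):
        y = random.uniform(1.0, 2.0) * 2.0 ** random.randint(-30, 30)
        x = y * random.uniform(0.5, 2.0)          # a float in [y/2, 2y], since RN is monotone
        assert Fraction(x - y) == Fraction(x) - Fraction(y)   # x - y is computed exactly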

21 Representable error terms
Addition and multiplication: x, y ∈ F, op ∈ {+, ×} ⇒ x op y − RN(x op y) ∈ F.
Division and square root: x − y·RN(x/y) ∈ F,  x − RN(√x)² ∈ F.
Noted quite early. [Dekker'71, Pichat'76, Bohlender et al.'91]
RN required only for ADD and SQRT. [Boldo & Daumas'03]
FMA: its error is the sum of two floats. [Boldo & Muller'11]

22 Error-free transformations (EFT)
Floating-point algorithms for computing such error terms exactly:
- x + y − RN(x + y): in 6 additions [Møller'65, Knuth], and not less [Kornerup, Lefèvre, Louvet, Muller'12];
- xy − RN(xy): can be obtained in 17 additions and multiplications [Dekker'71, Boldo'06],
  and in only 2 ops if an FMA is available: ẑ := RN(xy) ⇒ xy − ẑ = FMA(x, y, −ẑ).
Similar FMA-based EFTs exist for DIV, SQRT... and FMA itself.
EFTs are key for extended precision algorithms: error compensation [Kahan'65, ..., Higham'96, Ogita, Rump, Oishi'04+, Graillat, Langlois, Louvet'05+, ...], floating-point expansions [Priest'91, Shewchuk'97, Joldes, Muller, Popescu'14+].
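The 6-addition EFT for the sum (Knuth's TwoSum), with its exactness checked in rational arithmetic (a sketch):

    from fractions import Fraction

    def two_sum(x, y):
        """Return (s, e) with s = RN(x + y) and s + e = x + y exactly (6 additions)."""
        s = x + y
        xp = s - y
        yp = s - xp
        e = (x - xp) + (y - yp)
        return s, e

    x, y = 1.0, 2.0 ** -60
    s, e = two_sum(x, y)
    print(s, e)                                                    # 1.0 8.673617379884035e-19
    print(Fraction(s) + Fraction(e) == Fraction(x) + Fraction(y))  # True: the error term is exact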

23 Optimal relative error bounds
When t can be any real number, E₁(t) ≤ u/(1+u) and E₂(t) ≤ u are best possible:
for t := 1 + u, RN(t) is 1 or 1 + 2u, so |t − RN(t)| = u.
Hence E₁(t) = u/(1 + u) and, if rounding ties to even, RN(t) = 1 and thus E₂(t) = u.
These are examples of optimal bounds:
- valid for all (t, RN) with t of a certain type;
- attained for some (t, RN) with t parametrized by β and p.
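This worst case is easy to reproduce in binary64 (ties to even), using exact rationals (a sketch):

    from fractions import Fraction

    u = Fraction(1, 2**53)
    t = 1 + u                              # halfway between the floats 1 and 1 + 2u
    rn = Fraction(float(t))                # ties-to-even picks 1.0

    print(abs(rn - t) / t == u / (1 + u))  # True: E1(t) attains u/(1+u)
    print(abs(rn - t) / rn == u)           # True: E2(t) attains u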

24 Can we do better when t = x op y and x, y ∈ F?
This depends on op and, sometimes, on β and p. [J. & Rump'14]

t      | optimal bound on E₁(t)                   | optimal bound on E₂(t)
x ± y  | u/(1+u)                                  | u
xy     | u/(1+u) (*)                              | u (*)
x/y    | u/(1+u) if β > 2,  u − 2u² if β = 2      | u if β > 2,  (u − 2u²)/(1 + u − 2u²) if β = 2
√x     | 1 − 1/√(1+2u)                            | √(1+2u) − 1

(*) if β > 2 or 2^p + 1 is not a Fermat prime.
⇒ two standard models for each arithmetic operation.
Application: sharper bounds and/or much simpler proofs.

25 Outline: Context · Floating-point arithmetic · A priori analysis · Conclusion

26 Classical approach: Wilkinson's analysis
This is the most common way to guarantee a priori that the computed solution x̂ has some kind of numerical quality:
- the forward error ‖x̂ − x‖ is 'small',
- the backward error, i.e. a perturbation ΔA such that (A + ΔA)x̂ = b, is 'small'.
Developed by Wilkinson in the 1950s and 1960s.
Relies almost exclusively on the first standard model: r̂ = (x op y)(1 + δ), |δ| ≤ u.
Eminently powerful: see Higham's book Accuracy and Stability of Numerical Algorithms (SIAM).

27 Example of analysis: a² − b²
Applying the standard model to each operation gives:
r̂ = (RN(a²) − RN(b²))(1 + δ₃) = (a²(1 + δ₁) − b²(1 + δ₂))(1 + δ₃),  |δᵢ| ≤ u₁.
⇒ r̂ − r = a²(δ₁ + δ₃ + δ₁δ₃) − b²(δ₂ + δ₃ + δ₂δ₃)
⇒ |r̂ − r| ≤ 2u (a² + b²)
⇒ |r̂ − r| / |r| ≤ 2u·C,  with condition number C := (a² + b²)/|a² − b²|.
Bound easy to derive and to interpret:
- if C = O(1) then the relative error is in O(u): highly accurate!
- if C ≳ 1/u then the relative error is only bounded by ≈ 1: catastrophic cancellation may occur.

28 Example of analysis: (a + b)(a − b)
r̂ := RN(RN(a + b) · RN(a − b)) = (a + b)(a − b)(1 + δ₁)(1 + δ₂)(1 + δ₃),  |δᵢ| ≤ u₁.
⇒ |r̂ − r| / |r| ≤ (1 + u)³ − 1 ≈ 3u. Always highly accurate!
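A small binary64 experiment contrasting the two formulas when a ≈ b, with exact errors measured in rational arithmetic (a sketch):

    import random
    from fractions import Fraction

    b = 1.0
    a = 1.0 + random.uniform(1e-9, 2e-9)             # severe cancellation: C ~ 1e9
    r = Fraction(a) ** 2 - Fraction(b) ** 2           # exact reference value

    r1 = a * a - b * b                                # naive: error bounded by ~2uC
    r2 = (a + b) * (a - b)                            # rewritten: error bounded by ~3u, whatever C
    rel = lambda x: float(abs(Fraction(x) - r) / r)
    print(f"naive: {rel(r1):.1e}   rewritten: {rel(r2):.1e}")   # e.g. ~1e-08 versus ~1e-16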

29 Floating-point summation
Given x₁, ..., x_n ∈ F, evaluate their sum in any order.
Classical analysis [Wilkinson'60]: apply the standard model n − 1 times, and deduce that the computed value ŝ ∈ F satisfies
|ŝ − Σᵢ xᵢ| ≤ α Σᵢ |xᵢ|,  α = (1 + u)^(n−1) − 1.
Easy to derive, valid for any order, asymptotically optimal: error / error bound → 1 as u → 0.
But, even with u replaced by u/(1+u), α = (n − 1)u + O(u²), which hides a constant.
So α is classically bounded as α ≤ γ_(n−1), where γ_ℓ := ℓu/(1 − ℓu), ℓu < 1. [Higham'96]

30 A simpler, O(u²)-free bound
Theorem [Rump'12]: for recursive summation, one can take α = (n − 1)u.
To prove this (a numerical check is sketched below):
- don't use just the (refined) standard model
  |RN(x + y) − (x + y)| ≤ (u/(1+u)) |x + y|;   (1)
- but combine it with the lower-level property
  |RN(x + y) − (x + y)| ≤ |f − (x + y)| for all f ∈ F, hence ≤ min{|x|, |y|};   (2)
- conclude by induction on n, with a clever case distinction comparing |x_n| to u·Σ_{i<n} |x_i| and using either (1) or (2).
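A numerical check of the O(u²)-free bound for recursive summation in binary64, with the exact sum computed in rational arithmetic (a sketch):

    import random
    from fractions import Fraction

    u = Fraction(1, 2**53)
    xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

    s_hat = 0.0
    for x in xs:
        s_hat += x                                   # one rounding per addition

    exact = sum(Fraction(x) for x in xs)
    bound = (len(xs) - 1) * u * sum(abs(Fraction(x)) for x in xs)
    print(abs(Fraction(s_hat) - exact) <= bound)     # True: |s_hat - sum| <= (n-1) u sum|x_i|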

31 Wilkinson's bounds revisited

Problem                        | Classical α         | New α         | Ref.
summation                      | (n − 1)u + O(u²)    | (n − 1)u      | [1]
dot product, matrix product    | nu + O(u²)          | nu            | [1]
Euclidean norm                 | (n/2 + 1)u + O(u²)  | (n/2 + 1)u    | [2]
Tx = b, A = LU                 | nu + O(u²)          | nu            | [2]
A = RᵀR                        | (n + 1)u + O(u²)    | (n + 1)u      | [2]
xⁿ (recursive, β = 2)          | (n − 1)u + O(u²)    | (n − 1)u (*)  | [3]
product x₁x₂···x_n             | (n − 1)u + O(u²)    | (n − 1)u (*)  | [4]
polynomial evaluation (Horner) | 2nu + O(u²)         | 2nu (*)       | [4]

(*) if n < c·u^(−1/2).
[1]: with Rump'13;  [2]: with Rump'14;  [3]: Graillat, Lefèvre, Muller'14;  [4]: with Bünger and Rump'14.

32 Kahan's algorithm for ad − bc
Kahan's algorithm uses the FMA to evaluate det [a b; c d] = ad − bc:
ŵ := RN(bc);  e := RN(ŵ − bc);  f̂ := RN(ad − ŵ);  r̂ := RN(f̂ + e).
The operation ad − bc is not in IEEE 754, but very common: complex arithmetic, discriminant of a quadratic equation, robust orientation predicates using tests like 'ad − bc > ε?'.
If evaluated naively, ad − bc can lead to highly inaccurate results: |r̂ − r| / |r| can be of the order of u⁻¹ ≫ 1.
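A sketch of the algorithm in Python, assuming Python 3.13+ where math.fma is available (on older versions one would call C's fma instead); the inputs below are my own illustration of a case where the naive formula loses most of its accuracy:

    import math
    from fractions import Fraction

    def kahan_det(a, b, c, d):
        w = b * c                     # w = RN(bc)
        e = math.fma(-b, c, w)        # e = RN(w - bc) = w - bc exactly (FMA-based product EFT)
        f = math.fma(a, d, -w)        # f = RN(ad - w)
        return f + e                  # r = RN(f + e)

    a = d = 1.0 + 2.0 ** -27
    b, c = 1.0 + 2.0 ** -26 + 2.0 ** -50, 1.0
    exact = Fraction(a) * Fraction(d) - Fraction(b) * Fraction(c)

    rel = lambda x: float(abs(Fraction(x) - exact) / abs(exact))
    print(rel(a * d - b * c), rel(kahan_det(a, b, c, d)))   # ~6.7e-02 versus 0.0 here (always <= 2u)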

33 Kahan's algorithm for ad − bc
Analysis in the standard model [Higham'96]:
|r̂ − r| / |r| ≤ 2u (1 + u|bc| / (2|r|)),
⇒ high relative accuracy as long as u|bc| ≲ 2|r|.
When u|bc| ≳ 2|r|, the error bound can be > 1 and does not even allow us to conclude that sign(r̂) = sign(r).
In fact, Kahan's algorithm is always highly accurate:
- the standard model alone fails to predict this;
- misinterpreting bounds ⇒ dismissing good algorithms.

34 Further analysis [J., Louvet, Muller'13]
The key is an ulp-analysis of the error terms ε₁ and ε₂ defined by:
ŵ := RN(bc);  e := RN(ŵ − bc);  f̂ := RN(ad − ŵ) = ad − ŵ + ε₁;  r̂ := RN(f̂ + e) = f̂ + e + ε₂.
Since e is exactly ŵ − bc, we have r̂ − r = ε₁ + ε₂.
Furthermore, we can prove that |εᵢ| ≤ (β/2) ulp(r) for i = 1, 2.
Proposition: |r̂ − r| ≤ β ulp(r) ≤ 2βu |r|.
These bounds already mean that Kahan's algorithm is always highly accurate.

35 Further analysis [J., Louvet, Muller'13]
We can do better via a case analysis comparing |ε₂| to (1/2) ulp(r):
Theorem: the relative error satisfies |r̂ − r| / |r| ≤ 2u, and the leading constant 2 is best possible.

36 Certificate of optimality
This is an explicit input set, parametrized by β and p, such that error / error bound → 1 as u → 0.
Example, for Kahan's algorithm for r = ad − bc: explicit values of a, b, c, d (each a power of β plus at most a few low-order digits) give a relative error |r̂ − r| / |r| whose ratio to the bound 2u equals 1/(1 + 2u) = 1 − 2u + O(u²).
Optimality is asymptotic, but often OK in practice: for β = 2 and p = 11, such an example already has relative error very close to 2u.
The certificate consists of sparse, symbolic floating-point data, which we can handle automatically. [J., Louvet, Muller, Plet]

37 Outline: Context · Floating-point arithmetic · A priori analysis · Conclusion

38 Summary
Floating-point arithmetic is specified rigorously by IEEE 754, highly structured, and much richer than the standard model.
Exploiting this structure leads to enhanced a priori analysis:
- optimal standard models for the basic arithmetic operations,
- simpler and sharper Wilkinson-like bounds,
- proofs of nice behavior of some numerical kernels.
On-going research:
- consider directed roundings as well;
- take underflow and overflow into account.
