ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION

Size: px
Start display at page:

Download "ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION"

Transcription

1 MATHEMATICS OF COMPUTATION Volume 00, Number 0, Pages S (XX) ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN In memory of Erin Brent ( ) Abstract. Given floating-point arithmetic with t-digit base-β significands in which all arithmetic operations are performed as if calculated to infinite precision and rounded to a nearest representable value, we prove that the product of complex values z 0 and z 1 can be computed with maximum absolute error z 0 z 1 1 β1 t 5. In particular, this provides relative error bounds of 4 5 and 53 5 for IEEE 754 single and double precision arithmetic respectively, provided that overflow, underflow, and denormals do not occur. We also provide the numerical worst cases for IEEE 754 single and double precision arithmetic. 1. Introduction In an earlier paper [], the second author made the claim that the maximum relative error which can occur when computing the product z 0 z 1 of two complex values using floating-point arithmetic is ɛ 5, where ɛ is the maximum relative error which can result from rounded floating-point addition, subtraction, or multiplication. While reviewing that paper a few years later, the other two authors noted that the proof given was incorrect, although the result claimed was true. Since the bound of ɛ 8 which is commonly used [1] is suboptimal, we present here a corrected proof of the tighter bound. Interestingly, by explicitly finding worst-case inputs, we can demonstrate that our error bound is effectively optimal. Throughout this paper, we concern ourselves with floating-point arithmetic with t-digit base-β significands, denote by ulp(x) for x 0 the (unique) power of β such that β t 1 x /ulp(x) < β t, and write ɛ = 1 ulp(1) = 1 β1 t ; we also define ulp(0) = 0. The notations x y, x y, and x y represent rounded-to-nearest floating-point addition, subtraction, and multiplication of the values x and y.. An Error Bound Theorem 1. Let z 0 = a 0 + b 0 i and z 1 = a 1 + b 1 i, with a 0, b 0, a 1, b 1 floatingpoint values with t-digit base-β significands, and let z = ((a 0 a 1 ) (b 0 b 1 )) + ((a 0 b 1 ) (b 0 a 1 ))i be computed. Providing that no overflow or underflow occur, no denormal values are produced, arithmetic results are correctly rounded to a nearest representable value, z 0 z 1 0, and ɛ 5, the relative error z (z 0 z 1 ) 1 1 is less than ɛ 5 = 1 β1 t 5. 1 c 1997 American Mathematical Society

2 RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN Proof. Let a 0, b 0, a 1, and b 1 be chosen such that the relative error is maximized. By multiplying z 0 and z 1 by powers of i and/or taking complex conjugates, we can assume without loss of generality that (1) () 0 a 0, b 0, a 1, b 1 b 0 b 1 a 0 a 1 and given our assumptions that overflow, underflow, and denormals do not occur, and that rounding is performed to a nearest representable value, we can conclude that for any x occurring in the computation, the error introduced when rounding x is at most 1 ulp(x) and is strictly less than ɛ x. We note that the error I(z z 0 z 1 ) in the imaginary part of z is bounded as follows: I(z z 0 z 1 ) = ((a 0 b 1 ) (b 0 a 1 )) (a 0 b 1 + b 0 a 1 ) and consider two cases: a 0 b 1 a 0 b 1 + b 0 a 1 b 0 a 1 + ((a 0 b 1 ) (b 0 a 1 )) (a 0 b 1 + b 0 a 1 ) Case I1: ulp(a 0 b 1 + b 0 a 1 ) < ulp(a 0 b 1 + b 0 a 1 ) Using first the definition of ulp and second the assumption above, we must have a 0 b 1 + b 0 a 1 < β t ulp(a 0 b 1 + b 0 a 1 ) a 0 b 1 + b 0 a 1 and therefore (a0 b 1 + b 0 a 1 ) β t ulp(a 0 b 1 + b 0 a 1 ) < (a0 b 1 + b 0 a 1 ) (a 0 b 1 + b 0 a 1 ) a 0 b 1 a 0 b 1 + b 0 a 1 b 0 a 1 ɛ (a 0 b 1 + b 0 a 1 ). However, β t ulp(a 0 b 1 + b 0 a 1 ) is a representable floating-point value; so given our assumption that rounding is performed to a nearest representable value, we must now have ((a 0 b 1 ) (b 0 a 1 )) (a 0 b 1 + b 0 a 1 ) < ɛ (a 0 b 1 + b 0 a 1 ). Case I: ulp(a 0 b 1 + b 0 a 1 ) ulp(a 0 b 1 + b 0 a 1 ) From our assumption that the results of arithmetic operations are correctly rounded, we obtain ((a 0 b 1 ) (b 0 a 1 )) (a 0 b 1 + b 0 a 1 ) 1 ulp(a 0 b 1 + b 0 a 1 ) 1 ulp(a 0b 1 + b 0 a 1 ) ɛ (a 0 b 1 + b 0 a 1 ). Combining these two cases with the earlier-stated bound, we obtain I(z z 0 z 1 ) a 0 b 1 a 0 b 1 + b 0 a 1 b 0 a 1 + ((a 0 b 1 ) (b 0 a 1 )) (a 0 b 1 + b 0 a 1 ) < ɛ (a 0 b 1 ) + ɛ (b 0 a 1 ) + ɛ (a 0 b 1 + b 0 a 1 ) = ɛ (a 0 b 1 + b 0 a 1 ).

3 ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION 3 Now that we have a bound on the imaginary part of the error, we turn our attention to the real part, and consider the following four cases (where the examples given apply to β = ): ulp(b 0 b 1 ) ulp(a 0 a 1 ) ulp(a 0 a 1 b 0 b 1 ) e.g., z 0 = z 1 = i ulp(b 0 b 1 ) < ulp(a 0 a 1 b 0 b 1 ) < ulp(a 0 a 1 ) e.g., z 0 = z 1 = i ulp(a 0 a 1 b 0 b 1 ) ulp(b 0 b 1 ) < ulp(a 0 a 1 ) e.g., z 0 = z 1 = i ulp(a 0 a 1 b 0 b 1 ) < ulp(b 0 b 1 ) = ulp(a 0 a 1 ) e.g., z 0 = z 1 = i. Since we have assumed that b 0 b 1 a 0 a 1, we know that ulp(b 0 b 1 ) ulp(a 0 a 1 ), and thus these four cases cover all possible inputs. Consequently, it suffices to prove the required bound for each of these four cases. Case R1: ulp(b 0 b 1 ) ulp(a 0 a 1 ) ulp(a 0 a 1 b 0 b 1 ) Note that the right inequality can only be strict if a 0 a 1 rounds up to a power of β and b 0 b 1 = 0. We observe that a 0 a 1 b 0 b 1 < a 0 a 1 b 0 b 1 + ɛ (a 0 a 1 + b 0 b 1 ) and bound the real part of the complex error as follows: R(z z 0 z 1 ) a 0 a 1 a 0 a 1 + b 0 b 1 b 0 b 1 + ((a 0 a 1 ) (b 0 b 1 )) (a 0 a 1 b 0 b 1 ) 1 ulp(a 0a 1 ) + 1 ulp(b 0b 1 ) + 1 ulp(a 0 a 1 b 0 b 1 ) 1 ulp(a 0 a 1 b 0 b 1 ) + 1 ulp(b 0b 1 ) + 1 ulp(a 0 a 1 b 0 b 1 ) < ɛ (a 0 a 1 b 0 b 1 ) + ɛ (b 0 b 1 ) < ɛ (a 0 a 1 b 0 b 1 ) + ɛ (a 0 a 1 + b 0 b 1 ). Applying the triangle inequality, we now observe that z z 0 z 1 = R(z z 0 z 1 ) + I(z z 0 z 1 ) as required. < ɛ (a 0 a 1 b 0 b 1 ) + (a 0 b 1 + b 0 a 1 ) + ɛ (a 0 a 1 + b 0 b 1 ) 3 ɛ 7 z 0z (a 0b 1 b 0 a 1 ) 1 7 (a 0a 1 5b 0 b 1 ) + ɛ z 0 z 1 ( ) ɛ 3/7 + ɛ z 0 z 1 < ɛ 5 z 0 z 1 Case R: ulp(b 0 b 1 ) < ulp(a 0 a 1 b 0 b 1 ) < ulp(a 0 a 1 ) Noting that ulp(x) < ulp(y) implies ulp(x) β 1 ulp(y) 1 ulp(y), we obtain R(z z 0 z 1 ) 1 ulp(a 0a 1 ) + 1 ulp(b 0b 1 ) + 1 ulp(a 0 a 1 b 0 b 1 ) 7 8 ulp(a 0a 1 ) ( ) 7 ɛ 4 a 0a 1

4 4 RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN and therefore z z 0 z 1 = R(z z 0 z 1 ) + I(z z 0 z 1 ) (7 ) < ɛ 4 a 0a 1 + (a 0 b 1 + b 0 a 1 ) 104 = ɛ 07 z 0z (a 0b 1 b 0 a 1 ) (79a 0a 1 18b 0 b 1 ) as required. ɛ 104/07 z 0 z 1 < ɛ 5 z 0 z 1 Case R3: ulp(a 0 a 1 b 0 b 1 ) ulp(b 0 b 1 ) < ulp(a 0 a 1 ) In this case, there is no rounding error introduced in computing the difference between a 0 a 1 and b 0 b 1 since ulp(a 0 a 1 b 0 b 1 ) ulp(b 0 b 1 ) ulp(b 0 b 1 ) and ulp(a 0 a 1 b 0 b 1 ) < ulp(a 0 a 1 ) ulp(a 0 a 1 ). Also, so we have ulp(b 0 b 1 ) 1 β ulp(a 0a 1 ) 1 ulp(a 0a 1 ) R(z z 0 z 1 ) 1 ulp(a 0a 1 ) + 1 ulp(b 0b 1 ) 3 4 ulp(a 0a 1 ) ( ) 3 ɛ a 0a 1 and consequently z z 0 z 1 = R(z z 0 z 1 ) + I(z z 0 z 1 ) (3 ) < ɛ a 0a 1 + (a 0 b 1 + b 0 a 1 ) 56 = ɛ 55 z 0z (a 0b 1 b 0 a 1 ) 1 0 (3a 0a 1 3b 0 b 1 ) as required. ɛ 56/55 z 0 z 1 < ɛ 5 z 0 z 1 Case R4: ulp(a 0 a 1 b 0 b 1 ) < ulp(b 0 b 1 ) = ulp(a 0 a 1 ) In this case, there is again no rounding error introduced in computing the difference between a 0 a 1 and b 0 b 1, so we obtain R(z z 0 z 1 ) a 0 a 1 a 0 a 1 + b 0 b 1 b 0 b 1 < ɛ (a 0 a 1 + b 0 b 1 )

5 ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION 5 and consequently z z 0 z 1 = R(z z 0 z 1 ) + I(z z 0 z 1 ) < ɛ (a 0 a 1 + b 0 b 1 ) + (a 0 b 1 + b 0 a 1 ) = ɛ 5 z 0 z 1 (a 0 b 1 b 0 a 1 ) 4(a 0 a 1 b 0 b 1 ) ɛ 5 z 0 z 1 as required. 3. Worst-case multiplicands for β = Having proved an upper bound on the relative error which can result from floating-point rounding when computing the product of complex values, we now turn to a more number-theoretic problem: finding precise worst-case inputs for β =. Starting with the assumption that some inputs produce errors very close to the proven upper bound, we will repeatedly reduce the set of possible inputs until an exhaustive search becomes feasible. Theorem. Let β = and assume that z 0 = a 0 + b 0 i 0 and z 1 = a 1 + b 1 i 0, where a 0, b 0, a 1, b 1 are floating-point values with t-digit base-β significands, and z = ((a 0 a 1 ) (b 0 b 1 )) + ((a 0 b 1 ) (b 0 a 1 ))i are such that (1) () (3) (4) 0 a 0, b 0, a 1, b 1 b 0 b 1 a 0 a 1 b 0 a 1 a 0 b 1 1/ a 0 a 1 < 1 and no overflow, underflow, or denormal values occur during the computation of z. Assume further that the results of arithmetic operations are correctly rounded to a nearest representable value and that (5) z z 0 z 1 z 0 z 1 for some positive integer n. Then for some integers j xy, k xy satisfying > ɛ ( ) 5 nɛ > ɛ max 104/07, 3/7 + ɛ a 0 a 1 = 1/ + (j aa + 1/)ɛ + k aa ɛ a 0 b 1 = 1/ + (j ab + 1/)ɛ + k ab ɛ b 0 a 1 = 1/ + (j ba + 1/)ɛ + k ba ɛ b 0 b 1 = 1/ + (j bb + 1/)ɛ + k bb ɛ 0 j aa, j ab, j ba, j bb < n 4 k aa, k bb < n and a 0 b 0, a 1 b 1. k ab, k ba < n

6 6 RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN Proof. From equation (5), we note that ɛ nɛ < 11/07 < 4 ; we will use this trivial bound later without explicit comment. From the proof of Theorem 1, we know that Case R4 must hold, i.e., there is no error introduced in the computation of the difference between a 0 a 1 and b 0 b 1, and ulp(b 0 b 1 ) = ulp(a 0 a 1 ). From inequalities () and (4) above, this implies that 1/ b 0 b 1 a 0 a 1 < 1 R(z z 0 z 1 ) a 0 a 1 a 0 a 1 + b 0 b 1 b 0 b 1 ɛ. We can now obtain lower bounds on z 0 z 1 and z z 0 z 1, using the fact that (a 0 a 1 )(b 0 b 1 ) = (a 0 b 1 )(b 0 a 1 ): z 0 z 1 = ( a 0 + b ( 0) a 1 + b ) 1 = (a 0 a 1 ) + (a 0 b 1 ) + (b 0 a 1 ) + (b 0 b 1 ) (1/) + (a 0 b 1 ) + (1/)4 (a 0 b 1 ) + (1/) 1 z z 0 z 1 > z 0 z 1 ɛ (5 nɛ) ɛ (5 nɛ) as well as an upper bound on z 0 z 1 : We now note that z 0 z 1 104ɛ 07 z 0 z 1 < < z z 0 z 1 = R(z z 0 z 1 ) + I(z z 0 z 1 ) < ɛ + (ɛ (a 0 b 1 + b 0 a 1 )) ɛ + 4ɛ z 0 z 1 (a 0 b 1 ) z 0 z 1 (a 0 a 1 ) (b 0 b 1 ) = so b 0 a 1 a 0 b 1 109/196 < 1 and a 0 b 1 + b 0 a 1 109/49 (1 + ɛ) < ; this implies that ulp(b 0 a 1 ) ulp(a 0 b 1 ) ulp(1/) and ulp(a 0 b 1 + b 0 a 1 ) ulp(1), and therefore a 0 b 1 a 0 b 1 ɛ/ b 0 a 1 b 0 a 1 ɛ/ ((a 0 b 1 ) (b 0 a 1 )) (a 0 b 1 + b 0 a 1 ) ɛ I(z z 0 z 1 ) ɛ/ + ɛ/ + ɛ = ɛ which allows us to place upper bounds on z z 0 z 1 and z 0 z 1 : z z 0 z 1 = R(z z 0 z 1 ) + I(z z 0 z 1 ) (ɛ) + (ɛ) = 5ɛ z 0 z 1 < z z 0 z 1 ɛ (5 nɛ) 5 5 nɛ. Combining the known lower bound ɛ (5 nɛ) for z z 0 z 1 with the upper bounds on the error contributed by each individual rounding step, we find that ɛ/ (1 1 nɛ)ɛ < a 0 a 1 a 0 a 1 ɛ/

7 ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION 7 ɛ/ (1 1 nɛ)ɛ < b 0 b 1 b 0 b 1 ɛ/ ɛ/ ( 4 nɛ)ɛ < a 0 b 1 a 0 b 1 ɛ/ ɛ/ ( 4 nɛ)ɛ < b 0 a 1 b 0 a 1 ɛ/ and similarly, by combining the upper bound on z 0 z 1 with lower bound of 1/ for each pairwise product, we obtain 5 1/ b 0 b 1 a 0 a 1 5 nɛ nɛ 4 = 0 4nɛ 5 1/ b 0 a 1 a 0 b 1 5 nɛ nɛ 4 = 0 4nɛ. Now consider the possible values for a 0 a 1 which satisfy these restrictions. Since it is the product of two values which are expressible using t digits of significand, a 0 a 1 can be exactly represented using t digits of significand; but since 1/ a 0 a 1 < 1, this implies that a 0 a 1 is an integer multiple of ɛ. There is therefore at least one pair of integers j aa, k aa with 0 j aa < ɛ 1 /, k aa ɛ 1 / for which a 0 a 1 = 1/ + (j aa + 1/)ɛ + k aa ɛ. Since a 0 a 1 is the closest multiple of ɛ to a 0 a 1, this implies that ɛ/ (1 1 nɛ)ɛ < a 0 a 1 a 0 a 1 = ɛ/ k aa ɛ i.e., k aa < n, and similarly k aa ɛ < 1 1 nɛ < 1 (1 nɛ) = nɛ 1/ + j aa ɛ a 0 a nɛ 0 4nɛ < 1/4 + nɛ/4 < 1/ + nɛ 4 i.e., 0 j aa < n/4. Applying the same argument to a 0 b 1, b 0 a 1, and b 0 b 1 allows us to infer that they possess the same structure, as required. To complete the proof, we note that the rounding errors from the products a 0 a 1 and b 0 b 1 must be in opposite directions (in order that they accumulate when subtracted), while the rounding errors from the products a 0 b 1 and b 0 a 1 must be in the same direction (in order that they accumulate when added); consequently, we must have a 0 b 0 and a 1 b 1. Corollary 1. Assume that the preconditions of Theorem are satisfied, and assume further that 1 (6) a 0 < 1 and n ɛ 1/. Then 1 < a 0, b 0, a 1, b 1 < 1. Proof. Assume that a 1 1. Then we can write a 0 = 1/ + Aɛ a 1 = 1 + Bɛ for some 0 A, B < (ɛ) 1. From Theorem, we have 1/ + (A + B)ɛ + ABɛ = a 0 a 1 = 1/ + (j aa + 1/)ɛ + k aa ɛ for some 0 j aa < n/4, k aa < n.

8 8 RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN As a result, we must have A + B n/4 1/ ɛ 1/, and since 0 A, B this implies 0 ABɛ ɛ/8. However, by reducing the equation above modulo ɛ, we find that ABɛ ɛ/ + k aa ɛ, which contradicts our bounds on ABɛ. Consequently, we can conclude that a 1 < 1. Now we note that a 0 a 1 > 1/ and a 0 < 1, so a 1 > 1/, and we have both of the bounds required for a 1. Applying the same argument to the other products provides the same bounds for a 0, b 0, and b 1. Corollary. Assume that the preconditions of Corollary 1 are satisfied, and assume further that n ɛ 1/ and ɛ 6. Then Proof. From Theorem, we obtain that j aa j ab j ba + j bb = 0, a 0 b 0 a 1 b 1 < 3nɛ. a 0 (a 1 b 1 ) = a 0 a 1 a 0 b 1 = (j aa j ab )ɛ + (k aa k ab )ɛ where j aa j ab < n 4, k aa k ab < 3n, and since a 0 > 1 (from Corollary 1), we can conclude that a 1 b 1 < n ɛ + 3nɛ. Since a 1 and b 1 are integer multiples of ɛ and 3nɛ < ɛ/, we conclude that a 1 b 1 n ɛ. Applying the same argument to the product a 1 (a 0 b 0 ) provides the same bound for a 0 b 0. We now note that (jaa j ab j ba + j bb )ɛ + (k aa k ab k ba + k bb )ɛ = a0 b 0 a 1 b 1 ( n ) ɛ < ɛ 4 from our assumed upper bound on n, and consequently we can conclude that j aa j ab j ba + j bb = 0. Finally, this allows us to write as required. a 0 b 0 a 1 b 1 = k aa k ab k ba + k bb ɛ < 3nɛ Corollary 3. Assume that the preconditions of Corollary 1 are satisfied, and assume further that n 1 4 ɛ 1/. Then (a 0 b 0 )(a 1 b 1 ) = (j aa j ab )(j aa j ba )ɛ (a 0 b 0 )(a 1 b 1 )k aa = (k aa k ab )(k aa k ba )ɛ Proof. For brevity and clarity, we will write (a 0 b 0 )(a 1 b 1 ) = xɛ and note that x is an integer between 3n and 3n, from Corollary. Then xa 0 a 1 = x ( + x j aa + 1 ) ɛ + xk aa ɛ, xa 0 a 1 = a 0(a 1 b 1 ) a1(a 0 b 0 ) ɛ ɛ = ((j aa j ab ) + (k aa k ab )ɛ) ((j aa j ba ) + (k aa k ba )ɛ) = (j aa j ab )(j aa j ba ) + ((j aa j ab )(k aa k ba ) + (j aa j ba )(k aa k ab )) ɛ + (k aa k ab )(k aa k ba )ɛ.

9 ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION 9 Consequently, x (j aa j ab )(j aa j ba ) = ((j aa j ab )(k aa k ba ) + (j aa j ba )(k aa k ab ) (j aa + 1)x) ɛ + ((k aa k ab )(k aa k ba ) k aa x) ɛ, ( x (j aa j ab )(j aa j ba ) n 3n 4 + n 3n ( n ) ) n ɛ ( + 3n ) 3n + 6n ɛ = ( 3n + 3n ) ɛ + 1 n ɛ ɛ ɛ < 1, and since the only integer with absolute value less than one is zero, we can conclude that x = (j aa j ab )(j aa j ba ) as required. We now consider xa 0 a 1 ɛ modulo 1 ɛ 1, and note that and further that xk aa xa 0 a 1 ɛ (k aa k ab )(k aa k ba ) xk aa (k aa k ab )(k aa k ba ) 3n n + 3n 3n = 1n 4 1ɛ 1 64 < 1 ɛ 1 and therefore xk aa = (k aa k ab )(k aa k ba ). Theorem 3. Let β = and assume that z 0 = a 0 + b 0 i, z 1 = a 1 + b 1 i, and z = ((a 0 a 1 ) (b 0 b 1 )) + ((a 0 b 1 ) (b 0 a 1 ))i are such that (1) () (3) (4) (6) 0 a 0, b 0, a 1, b 1 b 0 b 1 a 0 a 1 b 0 a 1 a 0 b 1 1/ a 0 a 1 < 1 1/ a 0 < 1 and no overflow, underflow, or denormal values occur during the computation of z. Assume further that the results of arithmetic operations are correctly rounded to the nearest representable value, and that (5) z z 0 z 1 z 0 z 1 > ɛ ( ) 5 nɛ > ɛ max 104/07, 3/7 + ɛ

10 10 RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN for some n < 1 4 ɛ 1/ and ɛ 6. Then there exist integers c 0, d 0, α 0, β 0, c 1, d 1, α 1, β 1 satisfying a 0 = c 0 d 0 (1 + α 0 ɛ) b 0 = c 0 d 0 (1 + β 0 ɛ) a 1 = c 1 d 1 (1 + α 1 ɛ) b 1 = c 1 d 1 (1 + β 1 ɛ) gcd(c 0, d 0 ) = 1 gcd(c 1, d 1 ) = 1 c 0 c 1 = d 0 d 1 < 3n d 0 c 0 d 0 d 1 c 1 d 1 1 < a 0,b 0, a 1, b 1 < 1 α 0 β 0 ɛ 1 (mod d 0 ) α 0 β 0 α 1 β 1 ɛ 1 (mod d 1 ) α 1 β 1 min (α 0, β 0 ) + min (α 1, β 1 ) 0 max ( α 0, β 0 ) max ( α 1, β 1 ) < n Proof. Let the values j aa, j ab, j ba, j bb, k aa, k ab, k ba, and k bb be as constructed in Theorem, and further let g 0 = gcd(j aa j ab, (a 1 b 1 )/ɛ). From Corollary 1 we know that 1/ < a 1, b 1 < 1, so a 1 and b 1 are multiples of ɛ; consequently g 0 must be an integer. By the same argument, g 1 = gcd(j aa j ba, (a 0 b 0 )/ɛ) is an integer. Now note that g 0 (a 1 b 1 )ɛ 1 (a 1 b 1 )a 0 ɛ = (j aa j ab )ɛ 1 + (k aa k ab ) and since g 0 (j aa j ab ), we can conclude that g 0 (k aa k ab ). By the same argument, g 1 (k aa k ba ). We now write c 0 = j aa j ab d 0 = a 1 b 1 g 0 g 0 ɛ c 1 = j aa j ba d 1 = a 0 b 0 g 1 g 1 ɛ e 0 = k aa k ab g 0 e 1 = k aa k ba g 1 and note that these values are all integers; further, from Corollary 3 we have d 0 d 1 k aa = e 0 e 1 and d 0 d 1 = c 0 c 1, and since gcd(c 0, d 0 ) = gcd(c 1, d 1 ) = 1 by construction, this implies gcd(c 0, c 1 ) = 1. We now observe that and therefore 1 + a 0 = a 0(a 1 b 1 ) = c 0g 0 ɛ + e 0 g 0 ɛ = c 0 + e 0 ɛ a 1 b 1 d 0 g 0 ɛ d 0 a 1 = a 1(a 0 b 0 ) = c 1g 1 ɛ + e 1 g 1 ɛ = c 1 + e 1 ɛ a 0 b 0 d 1 g 1 ɛ d 1 ( j aa + 1 ) ɛ + k aa ɛ = a 0 a 1 = c 0c 1 d 0 d 1 + c 0e 1 + e 0 c 1 d 0 d 1 ɛ + e 0e 1 d 0 d 1 ɛ = 1 + c 0e 1 + e 0 c 1 d 0 d 1 ɛ + k aa ɛ

11 ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION 11 and thus (using d 0 d 1 = c 0 c 1 ) c 0 c 1 (j aa + 1) = c 0 e 1 + e 0 c 1. Consequently c 0 e 0 c 1 and c 1 c 0 e 1, and since gcd(c 0, c 1 ) = 1 it follows that c 0 e 0 and c 1 e 1. Writing e 0 = c 0 α 0, e 1 = c 1 α 1 for integers α 0, α 1, we now have a 0 = c 0 (1 + α 0 ɛ) a 1 = c 1 (1 + α 1 ɛ) d 0 d 1 and taking β 0 = α 0 + c 1 g 1, β 1 = α 1 + c 0 g 0, we have b 0 = c 0 (1 + β 0 ɛ) d 0 b 1 = c 1 (1 + β 1 ɛ) d 1 as required. The remaining conditions can be obtained by remembering that a 0, b 0, a 1, and b 1 are integer multiples of ɛ, and by using the bounds on j xy and k xy given in Theorem. Corollary 4. In IEEE 754 single-precision arithmetic (β =, t = 4, ɛ = 4 ), using nearest even rounding mode, the values 1 a 0 = 3 b 0 = (1 4ɛ) a 1 = 3 (1 + 11ɛ) b 1 = (1 + 5ɛ) 3 result in a relative error δ ɛ 5 168ɛ ɛ in z, and δ is the largest possible relative error for for IEEE 754 single-precision inputs provided that overflow, underflow, and denormals do not occur. Proof. Straightforward computation for the values given establishes that a 0 a 1 = 1 (1 + 11ɛ) a 0 a 1 = 1 (1 + 1ɛ) b 0 b 1 = 1 (1 + ɛ 0ɛ ) b 0 b 1 = 1 R(z 0 z 1 ) = 5ɛ + 10ɛ R(z ) = 6ɛ a 0 b 1 = 1 (1 + 5ɛ) a 0 b 1 = 1 (1 + 4ɛ) b 0 a 1 = 1 (1 + 7ɛ 44ɛ ) b 0 a 1 = 1 (1 + 6ɛ) I(z 0 z 1 ) = 1 + 6ɛ ɛ I(z ) = 1 + 4ɛ z z 0 z 1 = ɛ (5 108ɛ + O(ɛ )) z 0 z 1 = 1 + 1ɛ + O(ɛ ) and the ratio of these provides the error as stated. To prove that this is the largest possible relative error for IEEE single-precision inputs, we note that the mappings z 0 z 0 i, z 1 z 1 i, (z 0, z 1 ) ( z 0, z 1 ), (z 0, z 1 ) (z 1, z 0 ), z 0 z 0 j, and z 1 z 1 k do not affect the relative error in z ; consequently, this allows us to assume without loss of generality that conditions (1-4) and (6) are satisfied by the worst-case inputs. Using the results of Theorem 3, an exhaustive computer search (taking about five minutes in MAPLE on the second author s 1.4 GHz laptop) completes the proof. 1 Note that while 3 is not an IEEE 754 single-precision value, 3 (1 + 5ɛ) and 3 (1 + 11ɛ) are, since ɛ ɛ (mod 3).

12 1 RICHARD BRENT, COLIN PERCIVAL, AND PAUL ZIMMERMANN Corollary 5. In IEEE 754 double-precision arithmetic (β =, t = 53, ɛ = 53 ), using nearest even rounding mode, the values a 0 = 3 4 (1 + 4ɛ) b 0 = 3 4 a 1 = 3 (1 + 7ɛ) b 1 = (1 + 1ɛ) 3 result in a relative error in z of approximately ɛ 5 96ɛ ɛ , and this is the worst possible provided that overflow, underflow, and denormals do not occur. Proof. Straightforward computation for the values given establishes that z z 0 z 1 = ɛ (5 36ɛ + O(ɛ )) z 0 z 1 = 1 + 1ɛ + O(ɛ ) and the ratio of these provides the error as stated. As in Corollary 4, an exhaustive search using the results of Theorem 3 (again, taking just a few minutes) completes the proof. For β = and t > 6, the constructions given in Corollaries 4 and 5 for a 0, b 0, a 1, b 1 provide for even and odd t respectively relative errors of ɛ 5 168ɛ + O(ɛ ) and ɛ 5 96ɛ + O(ɛ ). We believe that these are the worst-case inputs for all sufficiently large t when β =. 4. A note on methods The existence of this paper serves a strong demonstration of the power of experimental mathematics. The initial result the upper bound of 5ɛ was discovered experimentally seven years ago, on the basis of testing a few million random single-precision products. Experimental methods became even more important when it came to the results concerning worst-case inputs. Here the approach taken was to perform an exhaustive search, taking several hours on the second author s laptop, of IEEE single-precision inputs, using only a few arguments from Theorem 1 to prune the search. Once the worst few sets of inputs had been enumerated, it became clear that they possessed the structure described in Theorem 3, and it was natural to conjecture that this structure would be satisfied by the worst-case inputs in any precision. As is common with such problems, once the required result was known, constructing a proof was fairly straightforward. References 1. N.J. Higham, Accuracy and Stability of Numerical Algorithms, Second Edition, SIAM, 00.. C. Percival, Rapid multiplication modulo the sum and difference of highly composite numbers, Math. Comp. 7 (00), Mathematical Sciences Institute, Australian National University, Canberra, ACT 000, Australia address: complex@rpbrent.com IRMACS Centre, Simon Fraser University, Burnaby, BC, Canada address: cperciva@irmacs.sfu.ca INRIA Lorraine/LORIA, 615 rue du Jardin Botanique, F-5460 Villers-lès-Nancy Cedex, France address: zimmerma@loria.fr

Error Bounds on Complex Floating-Point Multiplication

Error Bounds on Complex Floating-Point Multiplication Error Bounds on Complex Floating-Point Multiplication Richard Brent, Colin Percival, Paul Zimmermann To cite this version: Richard Brent, Colin Percival, Paul Zimmermann. Error Bounds on Complex Floating-Point

More information

Math 324 Summer 2012 Elementary Number Theory Notes on Mathematical Induction

Math 324 Summer 2012 Elementary Number Theory Notes on Mathematical Induction Math 4 Summer 01 Elementary Number Theory Notes on Mathematical Induction Principle of Mathematical Induction Recall the following axiom for the set of integers. Well-Ordering Axiom for the Integers If

More information

Getting tight error bounds in floating-point arithmetic: illustration with complex functions, and the real x n function

Getting tight error bounds in floating-point arithmetic: illustration with complex functions, and the real x n function Getting tight error bounds in floating-point arithmetic: illustration with complex functions, and the real x n function Jean-Michel Muller collaborators: S. Graillat, C.-P. Jeannerod, N. Louvet V. Lefèvre

More information

ECS 231 Computer Arithmetic 1 / 27

ECS 231 Computer Arithmetic 1 / 27 ECS 231 Computer Arithmetic 1 / 27 Outline 1 Floating-point numbers and representations 2 Floating-point arithmetic 3 Floating-point error analysis 4 Further reading 2 / 27 Outline 1 Floating-point numbers

More information

Homework 2 Foundations of Computational Math 1 Fall 2018

Homework 2 Foundations of Computational Math 1 Fall 2018 Homework 2 Foundations of Computational Math 1 Fall 2018 Note that Problems 2 and 8 have coding in them. Problem 2 is a simple task while Problem 8 is very involved (and has in fact been given as a programming

More information

Formal verification of IA-64 division algorithms

Formal verification of IA-64 division algorithms Formal verification of IA-64 division algorithms 1 Formal verification of IA-64 division algorithms John Harrison Intel Corporation IA-64 overview HOL Light overview IEEE correctness Division on IA-64

More information

Lecture 7. Floating point arithmetic and stability

Lecture 7. Floating point arithmetic and stability Lecture 7 Floating point arithmetic and stability 2.5 Machine representation of numbers Scientific notation: 23 }{{} }{{} } 3.14159265 {{} }{{} 10 sign mantissa base exponent (significand) s m β e A floating

More information

On various ways to split a floating-point number

On various ways to split a floating-point number On various ways to split a floating-point number Claude-Pierre Jeannerod Jean-Michel Muller Paul Zimmermann Inria, CNRS, ENS Lyon, Université de Lyon, Université de Lorraine France ARITH-25 June 2018 -2-

More information

MATH 324 Summer 2011 Elementary Number Theory. Notes on Mathematical Induction. Recall the following axiom for the set of integers.

MATH 324 Summer 2011 Elementary Number Theory. Notes on Mathematical Induction. Recall the following axiom for the set of integers. MATH 4 Summer 011 Elementary Number Theory Notes on Mathematical Induction Principle of Mathematical Induction Recall the following axiom for the set of integers. Well-Ordering Axiom for the Integers If

More information

Binary Floating-Point Numbers

Binary Floating-Point Numbers Binary Floating-Point Numbers S exponent E significand M F=(-1) s M β E Significand M pure fraction [0, 1-ulp] or [1, 2) for β=2 Normalized form significand has no leading zeros maximum # of significant

More information

The Classical Relative Error Bounds for Computing a2 + b 2 and c/ a 2 + b 2 in Binary Floating-Point Arithmetic are Asymptotically Optimal

The Classical Relative Error Bounds for Computing a2 + b 2 and c/ a 2 + b 2 in Binary Floating-Point Arithmetic are Asymptotically Optimal -1- The Classical Relative Error Bounds for Computing a2 + b 2 and c/ a 2 + b 2 in Binary Floating-Point Arithmetic are Asymptotically Optimal Claude-Pierre Jeannerod Jean-Michel Muller Antoine Plet Inria,

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 5. Ax = b.

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 5. Ax = b. CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 5 GENE H GOLUB Suppose we want to solve We actually have an approximation ξ such that 1 Perturbation Theory Ax = b x = ξ + e The question is, how

More information

Math 410 Homework 6 Due Monday, October 26

Math 410 Homework 6 Due Monday, October 26 Math 40 Homework 6 Due Monday, October 26. Let c be any constant and assume that lim s n = s and lim t n = t. Prove that: a) lim c s n = c s We talked about these in class: We want to show that for all

More information

Accurate Multiple-Precision Gauss-Legendre Quadrature

Accurate Multiple-Precision Gauss-Legendre Quadrature Accurate Multiple-Precision Gauss-Legendre Quadrature Laurent Fousse Université Henri-Poincaré Nancy 1 laurent@komite.net Abstract Numerical integration is an operation that is frequently available in

More information

Efficient Reproducible Floating Point Summation and BLAS

Efficient Reproducible Floating Point Summation and BLAS Efficient Reproducible Floating Point Summation and BLAS Peter Ahrens Hong Diep Nguyen James Demmel Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No.

More information

Ch 3.2: Direct proofs

Ch 3.2: Direct proofs Math 299 Lectures 8 and 9: Chapter 3 0. Ch3.1 A trivial proof and a vacuous proof (Reading assignment) 1. Ch3.2 Direct proofs 2. Ch3.3 Proof by contrapositive 3. Ch3.4 Proof by cases 4. Ch3.5 Proof evaluations

More information

Proof 1: Using only ch. 6 results. Since gcd(a, b) = 1, we have

Proof 1: Using only ch. 6 results. Since gcd(a, b) = 1, we have Exercise 13. Consider positive integers a, b, and c. (a) Suppose gcd(a, b) = 1. (i) Show that if a divides the product bc, then a must divide c. I give two proofs here, to illustrate the different methods.

More information

Elements of Floating-point Arithmetic

Elements of Floating-point Arithmetic Elements of Floating-point Arithmetic Sanzheng Qiao Department of Computing and Software McMaster University July, 2012 Outline 1 Floating-point Numbers Representations IEEE Floating-point Standards Underflow

More information

There are no Barker arrays having more than two dimensions

There are no Barker arrays having more than two dimensions There are no Barker arrays having more than two dimensions Jonathan Jedwab Matthew G. Parker 5 June 2006 (revised 7 December 2006) Abstract Davis, Jedwab and Smith recently proved that there are no 2-dimensional

More information

Solutions to Exercises Chapter 10: Ramsey s Theorem

Solutions to Exercises Chapter 10: Ramsey s Theorem Solutions to Exercises Chapter 10: Ramsey s Theorem 1 A platoon of soldiers (all of different heights) is in rectangular formation on a parade ground. The sergeant rearranges the soldiers in each row of

More information

27 th Annual ARML Scrimmage

27 th Annual ARML Scrimmage 27 th Annual ARML Scrimmage Featuring: Howard County ARML Team (host) Baltimore County ARML Team ARML Team Alumni Citizens By Raymond Cheong May 23, 2012 Reservoir HS Individual Round (10 min. per pair

More information

9/5/17. Fermat s last theorem. CS 220: Discrete Structures and their Applications. Proofs sections in zybooks. Proofs.

9/5/17. Fermat s last theorem. CS 220: Discrete Structures and their Applications. Proofs sections in zybooks. Proofs. Fermat s last theorem CS 220: Discrete Structures and their Applications Theorem: For every integer n > 2 there is no solution to the equation a n + b n = c n where a,b, and c are positive integers Proofs

More information

Elements of Floating-point Arithmetic

Elements of Floating-point Arithmetic Elements of Floating-point Arithmetic Sanzheng Qiao Department of Computing and Software McMaster University July, 2012 Outline 1 Floating-point Numbers Representations IEEE Floating-point Standards Underflow

More information

Reliable real arithmetic. one-dimensional dynamical systems

Reliable real arithmetic. one-dimensional dynamical systems Reliable real arithmetic and one-dimensional dynamical systems Keith Briggs, BT Research Keith.Briggs@bt.com http://keithbriggs.info QMUL Mathematics Department 2007 March 27 1600 typeset 2007 March 26

More information

MATH FINAL EXAM REVIEW HINTS

MATH FINAL EXAM REVIEW HINTS MATH 109 - FINAL EXAM REVIEW HINTS Answer: Answer: 1. Cardinality (1) Let a < b be two real numbers and define f : (0, 1) (a, b) by f(t) = (1 t)a + tb. (a) Prove that f is a bijection. (b) Prove that any

More information

Notes for Chapter 1 of. Scientific Computing with Case Studies

Notes for Chapter 1 of. Scientific Computing with Case Studies Notes for Chapter 1 of Scientific Computing with Case Studies Dianne P. O Leary SIAM Press, 2008 Mathematical modeling Computer arithmetic Errors 1999-2008 Dianne P. O'Leary 1 Arithmetic and Error What

More information

= 1 2x. x 2 a ) 0 (mod p n ), (x 2 + 2a + a2. x a ) 2

= 1 2x. x 2 a ) 0 (mod p n ), (x 2 + 2a + a2. x a ) 2 8. p-adic numbers 8.1. Motivation: Solving x 2 a (mod p n ). Take an odd prime p, and ( an) integer a coprime to p. Then, as we know, x 2 a (mod p) has a solution x Z iff = 1. In this case we can suppose

More information

EGMO 2016, Day 1 Solutions. Problem 1. Let n be an odd positive integer, and let x1, x2,..., xn be non-negative real numbers.

EGMO 2016, Day 1 Solutions. Problem 1. Let n be an odd positive integer, and let x1, x2,..., xn be non-negative real numbers. EGMO 2016, Day 1 Solutions Problem 1. Let n be an odd positive integer, and let x1, x2,..., xn be non-negative real numbers. Show that min (x2i + x2i+1 ) max (2xj xj+1 ), j=1,...,n i=1,...,n where xn+1

More information

An unusual cubic representation problem

An unusual cubic representation problem Annales Mathematicae et Informaticae 43 (014) pp. 9 41 http://ami.ektf.hu An unusual cubic representation problem Andrew Bremner a, Allan Macleod b a School of Mathematics and Mathematical Statistics,

More information

NORTHWESTERN UNIVERSITY Thrusday, Oct 6th, 2011 ANSWERS FALL 2011 NU PUTNAM SELECTION TEST

NORTHWESTERN UNIVERSITY Thrusday, Oct 6th, 2011 ANSWERS FALL 2011 NU PUTNAM SELECTION TEST Problem A1. Let a 1, a 2,..., a n be n not necessarily distinct integers. exist a subset of these numbers whose sum is divisible by n. Prove that there - Answer: Consider the numbers s 1 = a 1, s 2 = a

More information

Math Homework # 4

Math Homework # 4 Math 446 - Homework # 4 1. Are the following statements true or false? (a) 3 5(mod 2) Solution: 3 5 = 2 = 2 ( 1) is divisible by 2. Hence 2 5(mod 2). (b) 11 5(mod 5) Solution: 11 ( 5) = 16 is NOT divisible

More information

Precalculus Chapter P.1 Part 2 of 3. Mr. Chapman Manchester High School

Precalculus Chapter P.1 Part 2 of 3. Mr. Chapman Manchester High School Precalculus Chapter P.1 Part of 3 Mr. Chapman Manchester High School Algebraic Expressions Evaluating Algebraic Expressions Using the Basic Rules and Properties of Algebra Definition of an Algebraic Expression:

More information

A. Incorrect! Replacing is not a method for solving systems of equations.

A. Incorrect! Replacing is not a method for solving systems of equations. ACT Math and Science - Problem Drill 20: Systems of Equations No. 1 of 10 1. What methods were presented to solve systems of equations? (A) Graphing, replacing, and substitution. (B) Solving, replacing,

More information

2005 Euclid Contest. Solutions

2005 Euclid Contest. Solutions Canadian Mathematics Competition An activity of the Centre for Education in Mathematics and Computing, University of Waterloo, Waterloo, Ontario 2005 Euclid Contest Tuesday, April 19, 2005 Solutions c

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

Champernowne s Number, Strong Normality, and the X Chromosome. by Adrian Belshaw and Peter Borwein

Champernowne s Number, Strong Normality, and the X Chromosome. by Adrian Belshaw and Peter Borwein Champernowne s Number, Strong Normality, and the X Chromosome by Adrian Belshaw and Peter Borwein ABSTRACT. Champernowne s number is the best-known example of a normal number, but its digits are far from

More information

Jim Lambers MAT 610 Summer Session Lecture 2 Notes

Jim Lambers MAT 610 Summer Session Lecture 2 Notes Jim Lambers MAT 610 Summer Session 2009-10 Lecture 2 Notes These notes correspond to Sections 2.2-2.4 in the text. Vector Norms Given vectors x and y of length one, which are simply scalars x and y, the

More information

Fermat s Last Theorem for Regular Primes

Fermat s Last Theorem for Regular Primes Fermat s Last Theorem for Regular Primes S. M.-C. 22 September 2015 Abstract Fermat famously claimed in the margin of a book that a certain family of Diophantine equations have no solutions in integers.

More information

On the sum of a prime and a square-free number

On the sum of a prime and a square-free number Ramanujan J 207 42:233 240 DOI 0.007/s39-05-9736-2 On the sum of a prime and a square-free number Adrian W. Dudek Dedicated to the memory of Theodor Ray Dudek Received: 0 March 205 / Accepted: 8 August

More information

3 The fundamentals: Algorithms, the integers, and matrices

3 The fundamentals: Algorithms, the integers, and matrices 3 The fundamentals: Algorithms, the integers, and matrices 3.4 The integers and division This section introduces the basics of number theory number theory is the part of mathematics involving integers

More information

Determining elements of minimal index in an infinite family of totally real bicyclic biquadratic number fields

Determining elements of minimal index in an infinite family of totally real bicyclic biquadratic number fields Determining elements of minimal index in an infinite family of totally real bicyclic biquadratic number fields István Gaál, University of Debrecen, Mathematical Institute H 4010 Debrecen Pf.12., Hungary

More information

MATH Fundamental Concepts of Algebra

MATH Fundamental Concepts of Algebra MATH 4001 Fundamental Concepts of Algebra Instructor: Darci L. Kracht Kent State University April, 015 0 Introduction We will begin our study of mathematics this semester with the familiar notion of even

More information

Introduction to Cryptology. Lecture 19

Introduction to Cryptology. Lecture 19 Introduction to Cryptology Lecture 19 Announcements HW6 due today HW7 due Thursday 4/20 Remember to sign up for Extra Credit Agenda Last time More details on AES/DES (K/L 6.2) Practical Constructions of

More information

NOTES ON SIMPLE NUMBER THEORY

NOTES ON SIMPLE NUMBER THEORY NOTES ON SIMPLE NUMBER THEORY DAMIEN PITMAN 1. Definitions & Theorems Definition: We say d divides m iff d is positive integer and m is an integer and there is an integer q such that m = dq. In this case,

More information

p-adic Analysis Compared to Real Lecture 1

p-adic Analysis Compared to Real Lecture 1 p-adic Analysis Compared to Real Lecture 1 Felix Hensel, Waltraud Lederle, Simone Montemezzani October 12, 2011 1 Normed Fields & non-archimedean Norms Definition 1.1. A metric on a non-empty set X is

More information

Limits and Continuity

Limits and Continuity Chapter Limits and Continuity. Limits of Sequences.. The Concept of Limit and Its Properties A sequence { } is an ordered infinite list x,x,...,,... The n-th term of the sequence is, and n is the index

More information

PUTNAM TRAINING NUMBER THEORY. Exercises 1. Show that the sum of two consecutive primes is never twice a prime.

PUTNAM TRAINING NUMBER THEORY. Exercises 1. Show that the sum of two consecutive primes is never twice a prime. PUTNAM TRAINING NUMBER THEORY (Last updated: December 11, 2017) Remark. This is a list of exercises on Number Theory. Miguel A. Lerma Exercises 1. Show that the sum of two consecutive primes is never twice

More information

A misère-play -operator

A misère-play -operator A misère-play -operator Matthieu Dufour Silvia Heubach Urban Larsson arxiv:1608.06996v1 [math.co] 25 Aug 2016 July 31, 2018 Abstract We study the -operator (Larsson et al, 2011) of impartial vector subtraction

More information

Chapter 1 Error Analysis

Chapter 1 Error Analysis Chapter 1 Error Analysis Several sources of errors are important for numerical data processing: Experimental uncertainty: Input data from an experiment have a limited precision. Instead of the vector of

More information

arxiv: v1 [math.co] 27 Aug 2015

arxiv: v1 [math.co] 27 Aug 2015 P-positions in Modular Extensions to Nim arxiv:1508.07054v1 [math.co] 7 Aug 015 Tanya Khovanova August 31, 015 Abstract Karan Sarkar In this paper, we consider a modular extension to the game of Nim, which

More information

1 Floating point arithmetic

1 Floating point arithmetic Introduction to Floating Point Arithmetic Floating point arithmetic Floating point representation (scientific notation) of numbers, for example, takes the following form.346 0 sign fraction base exponent

More information

MATH 426, TOPOLOGY. p 1.

MATH 426, TOPOLOGY. p 1. MATH 426, TOPOLOGY THE p-norms In this document we assume an extended real line, where is an element greater than all real numbers; the interval notation [1, ] will be used to mean [1, ) { }. 1. THE p

More information

A relative of the Thue-Morse Sequence

A relative of the Thue-Morse Sequence A relative of the Thue-Morse Sequence Jean-Paul Allouche C.N.R.S. U.R.A. 0226, Math., Université Bordeaux I, F-33405 Talence cedex, FRANCE André Arnold LaBRI, Université Bordeaux I, F-33405 Talence cedex,

More information

MATH 31BH Homework 1 Solutions

MATH 31BH Homework 1 Solutions MATH 3BH Homework Solutions January 0, 04 Problem.5. (a) (x, y)-plane in R 3 is closed and not open. To see that this plane is not open, notice that any ball around the origin (0, 0, 0) will contain points

More information

Recognising nilpotent groups

Recognising nilpotent groups Recognising nilpotent groups A. R. Camina and R. D. Camina School of Mathematics, University of East Anglia, Norwich, NR4 7TJ, UK; a.camina@uea.ac.uk Fitzwilliam College, Cambridge, CB3 0DG, UK; R.D.Camina@dpmms.cam.ac.uk

More information

Arithmetic and Error. How does error arise? How does error arise? Notes for Part 1 of CMSC 460

Arithmetic and Error. How does error arise? How does error arise? Notes for Part 1 of CMSC 460 Notes for Part 1 of CMSC 460 Dianne P. O Leary Preliminaries: Mathematical modeling Computer arithmetic Errors 1999-2006 Dianne P. O'Leary 1 Arithmetic and Error What we need to know about error: -- how

More information

Floating-point Computation

Floating-point Computation Chapter 2 Floating-point Computation 21 Positional Number System An integer N in a number system of base (or radix) β may be written as N = a n β n + a n 1 β n 1 + + a 1 β + a 0 = P n (β) where a i are

More information

MA257: INTRODUCTION TO NUMBER THEORY LECTURE NOTES

MA257: INTRODUCTION TO NUMBER THEORY LECTURE NOTES MA257: INTRODUCTION TO NUMBER THEORY LECTURE NOTES 2018 57 5. p-adic Numbers 5.1. Motivating examples. We all know that 2 is irrational, so that 2 is not a square in the rational field Q, but that we can

More information

Introduction CSE 541

Introduction CSE 541 Introduction CSE 541 1 Numerical methods Solving scientific/engineering problems using computers. Root finding, Chapter 3 Polynomial Interpolation, Chapter 4 Differentiation, Chapter 4 Integration, Chapters

More information

Algebraic function fields

Algebraic function fields Algebraic function fields 1 Places Definition An algebraic function field F/K of one variable over K is an extension field F K such that F is a finite algebraic extension of K(x) for some element x F which

More information

Number Theory Solutions Packet

Number Theory Solutions Packet Number Theory Solutions Pacet 1 There exist two distinct positive integers, both of which are divisors of 10 10, with sum equal to 157 What are they? Solution Suppose 157 = x + y for x and y divisors of

More information

Computer Arithmetic. MATH 375 Numerical Analysis. J. Robert Buchanan. Fall Department of Mathematics. J. Robert Buchanan Computer Arithmetic

Computer Arithmetic. MATH 375 Numerical Analysis. J. Robert Buchanan. Fall Department of Mathematics. J. Robert Buchanan Computer Arithmetic Computer Arithmetic MATH 375 Numerical Analysis J. Robert Buchanan Department of Mathematics Fall 2013 Machine Numbers When performing arithmetic on a computer (laptop, desktop, mainframe, cell phone,

More information

1. Suppose that a, b, c and d are four different integers. Explain why. (a b)(a c)(a d)(b c)(b d)(c d) a 2 + ab b = 2018.

1. Suppose that a, b, c and d are four different integers. Explain why. (a b)(a c)(a d)(b c)(b d)(c d) a 2 + ab b = 2018. New Zealand Mathematical Olympiad Committee Camp Selection Problems 2018 Solutions Due: 28th September 2018 1. Suppose that a, b, c and d are four different integers. Explain why must be a multiple of

More information

Chapter 5. Number Theory. 5.1 Base b representations

Chapter 5. Number Theory. 5.1 Base b representations Chapter 5 Number Theory The material in this chapter offers a small glimpse of why a lot of facts that you ve probably nown and used for a long time are true. It also offers some exposure to generalization,

More information

A CONSTRUCTION OF ARITHMETIC PROGRESSION-FREE SEQUENCES AND ITS ANALYSIS

A CONSTRUCTION OF ARITHMETIC PROGRESSION-FREE SEQUENCES AND ITS ANALYSIS A CONSTRUCTION OF ARITHMETIC PROGRESSION-FREE SEQUENCES AND ITS ANALYSIS BRIAN L MILLER & CHRIS MONICO TEXAS TECH UNIVERSITY Abstract We describe a particular greedy construction of an arithmetic progression-free

More information

THE SUM OF DIGITS OF n AND n 2

THE SUM OF DIGITS OF n AND n 2 THE SUM OF DIGITS OF n AND n 2 KEVIN G. HARE, SHANTA LAISHRAM, AND THOMAS STOLL Abstract. Let s q (n) denote the sum of the digits in the q-ary expansion of an integer n. In 2005, Melfi examined the structure

More information

4 Number Theory and Cryptography

4 Number Theory and Cryptography 4 Number Theory and Cryptography 4.1 Divisibility and Modular Arithmetic This section introduces the basics of number theory number theory is the part of mathematics involving integers and their properties.

More information

Basic building blocks for a triple-double intermediate format

Basic building blocks for a triple-double intermediate format Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Basic building blocks for a triple-double intermediate format Christoph

More information

Solutions for Homework Assignment 2

Solutions for Homework Assignment 2 Solutions for Homework Assignment 2 Problem 1. If a,b R, then a+b a + b. This fact is called the Triangle Inequality. By using the Triangle Inequality, prove that a b a b for all a,b R. Solution. To prove

More information

1. Prove that for every positive integer n there exists an n-digit number divisible by 5 n all of whose digits are odd.

1. Prove that for every positive integer n there exists an n-digit number divisible by 5 n all of whose digits are odd. 32 nd United States of America Mathematical Olympiad Proposed Solutions May, 23 Remark: The general philosophy of this marking scheme follows that of IMO 22. This scheme encourages complete solutions.

More information

Gal s Accurate Tables Method Revisited

Gal s Accurate Tables Method Revisited Gal s Accurate Tables Method Revisited Damien Stehlé UHP/LORIA 615 rue du jardin botanique F-5460 Villers-lès-Nancy Cedex stehle@loria.fr Paul Zimmermann INRIA Lorraine/LORIA 615 rue du jardin botanique

More information

THE MILLER RABIN TEST

THE MILLER RABIN TEST THE MILLER RABIN TEST KEITH CONRAD 1. Introduction The Miller Rabin test is the most widely used probabilistic primality test. For odd composite n > 1 at least 75% of numbers from to 1 to n 1 are witnesses

More information

1. multiplication is commutative and associative;

1. multiplication is commutative and associative; Chapter 4 The Arithmetic of Z In this chapter, we start by introducing the concept of congruences; these are used in our proof (going back to Gauss 1 ) that every integer has a unique prime factorization.

More information

Park Forest Math Team. Meet #3. Algebra. Self-study Packet

Park Forest Math Team. Meet #3. Algebra. Self-study Packet Park Forest Math Team Meet #3 Self-study Packet Problem Categories for this Meet: 1. Mystery: Problem solving 2. Geometry: Angle measures in plane figures including supplements and complements 3. Number

More information

Grade 11/12 Math Circles Rational Points on an Elliptic Curves Dr. Carmen Bruni November 11, Lest We Forget

Grade 11/12 Math Circles Rational Points on an Elliptic Curves Dr. Carmen Bruni November 11, Lest We Forget Faculty of Mathematics Waterloo, Ontario N2L 3G1 Centre for Education in Mathematics and Computing Grade 11/12 Math Circles Rational Points on an Elliptic Curves Dr. Carmen Bruni November 11, 2015 - Lest

More information

The numerical stability of barycentric Lagrange interpolation

The numerical stability of barycentric Lagrange interpolation IMA Journal of Numerical Analysis (2004) 24, 547 556 The numerical stability of barycentric Lagrange interpolation NICHOLAS J. HIGHAM Department of Mathematics, University of Manchester, Manchester M13

More information

CS 577 Introduction to Algorithms: Strassen s Algorithm and the Master Theorem

CS 577 Introduction to Algorithms: Strassen s Algorithm and the Master Theorem CS 577 Introduction to Algorithms: Jin-Yi Cai University of Wisconsin Madison In the last class, we described InsertionSort and showed that its worst-case running time is Θ(n 2 ). Check Figure 2.2 for

More information

Chapter Six. Polynomials. Properties of Exponents Algebraic Expressions Addition, Subtraction, and Multiplication Factoring Solving by Factoring

Chapter Six. Polynomials. Properties of Exponents Algebraic Expressions Addition, Subtraction, and Multiplication Factoring Solving by Factoring Chapter Six Polynomials Properties of Exponents Algebraic Expressions Addition, Subtraction, and Multiplication Factoring Solving by Factoring Properties of Exponents The properties below form the basis

More information

Math 101 Study Session Spring 2016 Test 4 Chapter 10, Chapter 11 Chapter 12 Section 1, and Chapter 12 Section 2

Math 101 Study Session Spring 2016 Test 4 Chapter 10, Chapter 11 Chapter 12 Section 1, and Chapter 12 Section 2 Math 101 Study Session Spring 2016 Test 4 Chapter 10, Chapter 11 Chapter 12 Section 1, and Chapter 12 Section 2 April 11, 2016 Chapter 10 Section 1: Addition and Subtraction of Polynomials A monomial is

More information

15 th Annual Harvard-MIT Mathematics Tournament Saturday 11 February 2012

15 th Annual Harvard-MIT Mathematics Tournament Saturday 11 February 2012 1 th Annual Harvard-MIT Mathematics Tournament Saturday 11 February 01 1. Let f be the function such that f(x) = { x if x 1 x if x > 1 What is the total length of the graph of f(f(...f(x)...)) from x =

More information

CSE 20 DISCRETE MATH. Fall

CSE 20 DISCRETE MATH. Fall CSE 20 DISCRETE MATH Fall 2017 http://cseweb.ucsd.edu/classes/fa17/cse20-ab/ Today's learning goals Distinguish between a theorem, an axiom, lemma, a corollary, and a conjecture. Recognize direct proofs

More information

Chapter VI. Relations. Assumptions are the termites of relationships. Henry Winkler

Chapter VI. Relations. Assumptions are the termites of relationships. Henry Winkler Chapter VI Relations Assumptions are the termites of relationships. Henry Winkler Studying relationships between objects can yield important information about the objects themselves. In the real numbers,

More information

φ(xy) = (xy) n = x n y n = φ(x)φ(y)

φ(xy) = (xy) n = x n y n = φ(x)φ(y) Groups 1. (Algebra Comp S03) Let A, B and C be normal subgroups of a group G with A B. If A C = B C and AC = BC then prove that A = B. Let b B. Since b = b1 BC = AC, there are a A and c C such that b =

More information

ON A THEOREM OF TARTAKOWSKY

ON A THEOREM OF TARTAKOWSKY ON A THEOREM OF TARTAKOWSKY MICHAEL A. BENNETT Dedicated to the memory of Béla Brindza Abstract. Binomial Thue equations of the shape Aa n Bb n = 1 possess, for A and B positive integers and n 3, at most

More information

A PASCAL-LIKE BOUND FOR THE NUMBER OF NECKLACES WITH FIXED DENSITY

A PASCAL-LIKE BOUND FOR THE NUMBER OF NECKLACES WITH FIXED DENSITY A PASCAL-LIKE BOUND FOR THE NUMBER OF NECKLACES WITH FIXED DENSITY I. HECKENBERGER AND J. SAWADA Abstract. A bound resembling Pascal s identity is presented for binary necklaces with fixed density using

More information

COMP239: Mathematics for Computer Science II. Prof. Chadi Assi EV7.635

COMP239: Mathematics for Computer Science II. Prof. Chadi Assi EV7.635 COMP239: Mathematics for Computer Science II Prof. Chadi Assi assi@ciise.concordia.ca EV7.635 The Euclidean Algorithm The Euclidean Algorithm Finding the GCD of two numbers using prime factorization is

More information

PUTNAM TRAINING PROBLEMS

PUTNAM TRAINING PROBLEMS PUTNAM TRAINING PROBLEMS (Last updated: December 3, 2003) Remark This is a list of Math problems for the NU Putnam team to be discussed during the training sessions Miguel A Lerma 1 Bag of candies In a

More information

Chapter 8. P-adic numbers. 8.1 Absolute values

Chapter 8. P-adic numbers. 8.1 Absolute values Chapter 8 P-adic numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics 58, Springer Verlag 1984, corrected 2nd printing 1996, Chap.

More information

On the Diagonal Approximation of Full Matrices

On the Diagonal Approximation of Full Matrices On the Diagonal Approximation of Full Matrices Walter M. Lioen CWI P.O. Box 94079, 090 GB Amsterdam, The Netherlands ABSTRACT In this paper the construction of diagonal matrices, in some

More information

Modular Arithmetic Instructor: Marizza Bailey Name:

Modular Arithmetic Instructor: Marizza Bailey Name: Modular Arithmetic Instructor: Marizza Bailey Name: 1. Introduction to Modular Arithmetic If someone asks you what day it is 145 days from now, what would you answer? Would you count 145 days, or find

More information

SECTION Types of Real Numbers The natural numbers, positive integers, or counting numbers, are

SECTION Types of Real Numbers The natural numbers, positive integers, or counting numbers, are SECTION.-.3. Types of Real Numbers The natural numbers, positive integers, or counting numbers, are The negative integers are N = {, 2, 3,...}. {..., 4, 3, 2, } The integers are the positive integers,

More information

(b) g(x) = 4 + 6(x 3) (x 3) 2 (= x x 2 ) M1A1 Note: Accept any alternative form that is correct. Award M1A0 for a substitution of (x + 3).

(b) g(x) = 4 + 6(x 3) (x 3) 2 (= x x 2 ) M1A1 Note: Accept any alternative form that is correct. Award M1A0 for a substitution of (x + 3). Paper. Answers. (a) METHOD f (x) q x f () q 6 q 6 f() p + 8 9 5 p METHOD f(x) (x ) + 5 x + 6x q 6, p (b) g(x) + 6(x ) (x ) ( + x x ) Note: Accept any alternative form that is correct. Award A for a substitution

More information

Lecture Notes 7, Math/Comp 128, Math 250

Lecture Notes 7, Math/Comp 128, Math 250 Lecture Notes 7, Math/Comp 128, Math 250 Misha Kilmer Tufts University October 23, 2005 Floating Point Arithmetic We talked last time about how the computer represents floating point numbers. In a floating

More information

Computing a Lower Bound for the Canonical Height on Elliptic Curves over Q

Computing a Lower Bound for the Canonical Height on Elliptic Curves over Q Computing a Lower Bound for the Canonical Height on Elliptic Curves over Q John Cremona 1 and Samir Siksek 2 1 School of Mathematical Sciences, University of Nottingham, University Park, Nottingham NG7

More information

THE MAHLER MEASURE OF POLYNOMIALS WITH ODD COEFFICIENTS

THE MAHLER MEASURE OF POLYNOMIALS WITH ODD COEFFICIENTS Bull. London Math. Soc. 36 (2004) 332 338 C 2004 London Mathematical Society DOI: 10.1112/S002460930300287X THE MAHLER MEASURE OF POLYNOMIALS WITH ODD COEFFICIENTS PETER BORWEIN, KEVIN G. HARE and MICHAEL

More information

LEGENDRE S THEOREM, LEGRANGE S DESCENT

LEGENDRE S THEOREM, LEGRANGE S DESCENT LEGENDRE S THEOREM, LEGRANGE S DESCENT SUPPLEMENT FOR MATH 370: NUMBER THEORY Abstract. Legendre gave simple necessary and sufficient conditions for the solvablility of the diophantine equation ax 2 +

More information

AP PHYSICS 2011 SCORING GUIDELINES

AP PHYSICS 2011 SCORING GUIDELINES AP PHYSICS 011 SCORING GUIDELINES General Notes About 011 AP Physics Scoring Guidelines 1. The solutions contain the most common method of solving the free-response questions and the allocation of points

More information

Introduction to Arithmetic Geometry Fall 2013 Lecture #5 09/19/2013

Introduction to Arithmetic Geometry Fall 2013 Lecture #5 09/19/2013 18.782 Introduction to Arithmetic Geometry Fall 2013 Lecture #5 09/19/2013 5.1 The field of p-adic numbers Definition 5.1. The field of p-adic numbers Q p is the fraction field of Z p. As a fraction field,

More information

CONTINUED FRACTIONS, PELL S EQUATION, AND TRANSCENDENTAL NUMBERS

CONTINUED FRACTIONS, PELL S EQUATION, AND TRANSCENDENTAL NUMBERS CONTINUED FRACTIONS, PELL S EQUATION, AND TRANSCENDENTAL NUMBERS JEREMY BOOHER Continued fractions usually get short-changed at PROMYS, but they are interesting in their own right and useful in other areas

More information