SECTION 2.5  ROUNDOFF ERRORS; BACKWARD STABILITY

ROUNDOFF ERROR -- error due to the finite representation (usually in floating-point form) of real (and complex) numbers in digital computers.

FLOATING-POINT NUMBER SYSTEM -- a finite approximation to the real number system, in which numbers are represented in a form

    ± 0.d_1 d_2 ... d_k × β^e,    where d_1 ≠ 0

(that is, the floating-point numbers are normalized);
    k is called the precision (that is, the number of significant digits),
    β is the base,
    e is the exponent and lies in some range m ≤ e ≤ M.

Two ways to represent a real number in floating-point: for example, suppose that x = π = 3.14159..., k = 4 and β = 10. Then

    with ROUNDING:  fl(x) = +0.3142 × 10^1
    with CHOPPING:  fl(x) = +0.3141 × 10^1

Definition of IDEALIZED FLOATING-POINT ARITHMETIC: If x and y denote floating-point numbers, then fl(x + y), fl(x − y), fl(x × y), fl(x / y) are computed by performing exact arithmetic on the values x and y, and then rounding or chopping that result to k digits. Also, for example, fl(cos(x)) and fl(√x) are defined similarly.

BASIC BOUNDS ON ROUNDOFF ERROR

If x is a real number, then

    fl(x) = x(1 + ε),    (*)

where
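As an illustration (not from the textbook), the rounding/chopping of π above can be reproduced with a short Python sketch; the helper fl and its parameters are my own names, and the decimal module is used here only to simulate a base-10, k-digit system.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(x, k=4, mode="round"):
    """Represent nonzero x with k significant decimal digits (base beta = 10).
    mode="chop" truncates; mode="round" rounds (half-up here, for simplicity)."""
    d = Decimal(repr(x))
    rounding = ROUND_DOWN if mode == "chop" else ROUND_HALF_UP
    # adjusted() gives the exponent e with 1 <= |d| / 10**e < 10,
    # so quantizing to 10**(e - k + 1) keeps exactly k significant digits.
    q = Decimal(1).scaleb(d.adjusted() - k + 1)
    return float(d.quantize(q, rounding=rounding))

pi = 3.14159265
print(fl(pi, k=4, mode="round"))  # 0.3142 x 10^1 -> 3.142
print(fl(pi, k=4, mode="chop"))   # 0.3141 x 10^1 -> 3.141
```

Note that rounding and chopping differ already in the last kept digit of π, which is exactly the distinction the definition above draws.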
|ε| ≤ u, and

    u = (1/2) β^(1−k)   using rounding
    u = β^(1−k)         using chopping

The number u is called the UNIT ROUNDOFF (see page 42 of the textbook).

NOTE. From (*) above,

    ε = (fl(x) − x) / x,

that is, ε is the relative error in the approximation fl(x) to x.

PROOF that (*) is correct: suppose that β^t ≤ x < β^(t+1) for some integer t. Then it can be shown that the distance between every two adjacent floating-point numbers in the interval [β^t, β^(t+1)] is β^(t+1−k). Thus, with chopping, the relative error is

    |fl(x) − x| / |x| ≤ β^(t+1−k) / β^t = β^(1−k)

(and with rounding, a factor of 1/2 is included).

A BOUND ON THE ROUNDOFF ERROR OF IDEALIZED FLOATING-POINT ARITHMETIC

Let o denote any one of +, −, ×, /. Then if x and y are floating-point numbers, by the above result and using idealized floating-point arithmetic,

    fl(x o y) = (x o y)(1 + ε),  where |ε| ≤ u.

(See (2.5.3) on page 42.) Thus, the relative error in doing one floating-point arithmetic operation is small.

NOTE. If x̂ and ŷ are real numbers but are not floating-point numbers, then the relative error in the computation of fl(x̂ + ŷ) = fl(fl(x̂) + fl(ŷ)) may not be small (although this involves only 3 roundoff errors). See the bottom of page 43 of the textbook. The reason for this is possible numeric cancellation.
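For concreteness, here is a small Python check (my own illustration, not part of the notes): IEEE double precision is a binary system with β = 2 and k = 53 significant bits with rounding, so its unit roundoff is u = 2^−53, and the bound (*) can be verified exactly for x = 1/10 using rational arithmetic.

```python
import sys
from fractions import Fraction

u = Fraction(1, 2**53)   # unit roundoff: (1/2) * beta**(1-k) with beta = 2, k = 53

x = Fraction(1, 10)      # the real number 0.1 (not a binary floating-point number)
fl_x = Fraction(0.1)     # exact rational value of the stored double fl(0.1)

eps = (fl_x - x) / x     # the relative error epsilon in (*), computed exactly
print(abs(eps) <= u)     # True: the bound |eps| <= u holds

# machine epsilon (the gap from 1.0 to the next double) is beta**(1-k) = 2u
print(Fraction(sys.float_info.epsilon) == 2 * u)  # True
```

The point of Fraction is that both the real number 1/10 and the stored double are represented exactly, so ε is computed with no further roundoff.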
ANOTHER WAY OF VIEWING THE ABOVE POINT: if x, y and z are all floating-point numbers, then the relative error of fl(fl(x + y) + z) may be large (as this involves only 2 roundoff errors, and there may be cancellation).

EXAMPLE. Let

    x = +0.1324 × 10^0
    y = +0.5600 × 10^−4
    z = −0.1323 × 10^0

and use idealized chopping floating-point arithmetic with β = 10 and k = 4 to evaluate fl(fl(x + y) + z). The result obtained is 0.0001, whereas the exact value is 0.000156. Thus the relative error is

    (0.000156 − 0.0001) / 0.000156 = 0.359,  or 35.9%.

There are two TYPES OF ANALYSES of roundoff error:

(i) forward (or direct) analysis
    -- determine a bound on |exact solution − computed solution|
    -- requires that one determine a bound on the maximum error in every calculation of the computation
    -- this is difficult to do, and the results are often overly pessimistic (for example, the early work of von Neumann and Goldstine)

Example of such an analysis: using the result fl(x + y) = (x + y)(1 + ε_1), one obtains

    fl(fl(x + y) + z) = ((x + y)(1 + ε_1) + z)(1 + ε_2)
                      = (x + y)(1 + ε_1 + ε_2 + ε_1 ε_2) + z(1 + ε_2).

Therefore

    fl(fl(x + y) + z) − (x + y + z) = (x + y)(ε_1 + ε_2 + ε_1 ε_2) + z ε_2,

and dividing both sides by x + y + z gives the relative error.
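The cancellation example can be replayed mechanically. This Python sketch (my own illustration) uses the decimal module to simulate idealized β = 10, k = 4 chopping arithmetic; the data values are those of the example.

```python
from decimal import Decimal, ROUND_DOWN

def chop4(d):
    """Truncate the Decimal d to k = 4 significant digits (base beta = 10)."""
    q = Decimal(1).scaleb(d.adjusted() - 3)
    return d.quantize(q, rounding=ROUND_DOWN)

x = Decimal("0.1324")      # +0.1324 x 10^0
y = Decimal("0.5600e-4")   # +0.5600 x 10^-4
z = Decimal("-0.1323")     # -0.1323 x 10^0

s1 = chop4(x + y)          # fl(x + y): 0.132456 chops to 0.1324 -- digits of y lost
computed = chop4(s1 + z)   # fl(fl(x + y) + z) = 0.0001
exact = x + y + z          # 0.000156 in exact arithmetic

rel_err = (exact - computed) / exact
print(float(computed))          # 0.0001
print(round(float(rel_err), 3))  # 0.359, i.e. 35.9%
```

The large relative error comes entirely from the first chop: the contribution of y is discarded in fl(x + y), and the subsequent cancellation with z promotes that tiny absolute error into a large relative one.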
(ii) backward (or inverse) analysis

Given a set of data z_1, z_2, ..., z_m, denote some computation on this data by C(z_1, z_2, ..., z_m). A backward error analysis requires that one prove that there exist values ẑ_1, ẑ_2, ..., ẑ_m (small perturbations of the values z_1, z_2, ..., z_m) for which

    fl(C(z_1, z_2, ..., z_m)) = C(ẑ_1, ẑ_2, ..., ẑ_m).

That is, the result of the floating-point computation with the values z_1, z_2, ..., z_m is equal to the exact value of the computation of C using the perturbed values ẑ_1, ẑ_2, ..., ẑ_m.

Example: proving a result of the form

    fl(fl(x + y) + z) = x̂ + ŷ + ẑ,  where x̂ ≈ x, ŷ ≈ y and ẑ ≈ z,

would be a backward error analysis result.

A backward error analysis has a close relationship to stability: if the perturbations are small, then the computation of C is stable. Sometimes such a computation is said to be backward stable.

NOTE. To prove that an algorithm is stable, one needs to do a backward error analysis. But this does not guarantee that a computed solution, say x̂, is accurate (close to the exact solution). To determine this, you need to determine the condition of the problem. If the algorithm is stable and if the problem is well-conditioned, then the computed solution is accurate. So in this case, one doesn't need a forward error analysis result. See page 46 for a discussion of this in terms of solving Ax = b.

*************************************************

Small residual implies backward stability (pages 46-47)

The development of backward error analyses is due to James Wilkinson, in the 1950's and 1960's. Usually if you do a backward error analysis of an algorithm (such as Gaussian elimination) it will show that the algorithm is backward stable only for certain sets of input data. That is, the perturbations are small only for certain sets of input data, and not for all possible sets of input data.
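To make the backward viewpoint concrete, here is an illustrative Python sketch (my own, with made-up data): working in IEEE double arithmetic, it recovers the exact rounding errors incurred by fl(fl(x + y) + z) using rational arithmetic, and then exhibits perturbed data x̂, ŷ, ẑ whose EXACT sum equals the computed result.

```python
from fractions import Fraction

# made-up floating-point data; the computed sum is fl(fl(x+y) + z) in doubles
x, y, z = 0.1324, 5.600e-5, -0.1323
s1 = x + y          # fl(x + y)  = (x + y)(1 + e1)
s2 = s1 + z         # fl(s1 + z) = (s1 + z)(1 + e2)

# recover e1, e2 exactly with rational arithmetic (no further roundoff)
e1 = Fraction(s1) / (Fraction(x) + Fraction(y)) - 1
e2 = Fraction(s2) / (Fraction(s1) + Fraction(z)) - 1

# backward view: the computed s2 is the exact sum of slightly perturbed data
xh = Fraction(x) * (1 + e1) * (1 + e2)
yh = Fraction(y) * (1 + e1) * (1 + e2)
zh = Fraction(z) * (1 + e2)
print(xh + yh + zh == Fraction(s2))   # True: exact equality

u = Fraction(1, 2**53)                # unit roundoff for IEEE doubles
print(abs(e1) <= u and abs(e2) <= u)  # True: perturbations are tiny
```

So this particular computation is backward stable for this data: the perturbations are bounded by the unit roundoff even though (as the example above showed) the forward relative error of such a sum can be large.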
For the problem of solving a fixed linear system Ax = b, there is a simple a posteriori method of checking the backward stability of the computation of a computed solution: use the residual vector. If x̂ denotes any computed solution to Ax = b, let

    r̂ = b − Ax̂.

As noted in Section 2.4, x̂ is the exact solution of the linear system

    Az = b + δb,  where δb = −r̂.

Thus, if ||δb|| = ||r̂|| is small, then the algorithm that was used to compute x̂ is backward stable (for this particular input data). Exercise 2.5.6 on page 47 gives a similar result for a perturbation in A rather than b.
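The residual check can be sketched in a few lines of plain Python (an illustration with made-up 2×2 data, solved here by Cramer's rule rather than any particular textbook algorithm):

```python
# made-up, well-conditioned 2x2 system A x = b
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]

# computed solution xhat via Cramer's rule in double precision
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
xhat = [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
        (A[0][0] * b[1] - A[1][0] * b[0]) / det]

# residual rhat = b - A xhat; xhat solves Az = b + db exactly, with db = -rhat
rhat = [b[i] - (A[i][0] * xhat[0] + A[i][1] * xhat[1]) for i in range(2)]

# small relative residual => the computation was backward stable for this data
norm_r = max(abs(t) for t in rhat)
norm_b = max(abs(t) for t in b)
print(norm_r / norm_b < 1e-12)   # True: residual is at roundoff level
```

Note this is exactly the a posteriori test described above: nothing about the exact solution is needed, only the cheap product Ax̂.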