Numerical Analysis (Math 3313) 2019-2018

Chapter 1: Mathematical Preliminaries and Error Analysis

Intended learning outcomes: Upon successful completion of this chapter, a student will be able to (1) list the basic principles of numerical analysis, (2) identify the different possible types of errors in numerical computation, and (3) describe how numbers are stored in a computer and how this relates to numerical analysis.
1.1 Introduction

Numerical analysis is a way of solving mathematical problems by special procedures (algorithms) that use arithmetic operations only. A major advantage of numerical analysis is that a numerical answer can be obtained even when a problem is very complicated and has no analytical solution.

Modern numerical analysis can be credibly said to begin with the 1947 paper by John von Neumann and Herman Goldstine, Numerical Inverting of Matrices of High Order (Bulletin of the AMS, Nov. 1947). It is one of the first papers to study rounding error and to include a discussion of what today is called scientific computing. Although numerical analysis has a longer and richer history, modern numerical analysis, as the term is used here, is characterized by the synergy of the programmable electronic computer, mathematical analysis, and the opportunity and need to solve large and complex problems in applications. Modern numerical analysis and scientific computing developed quickly and on many fronts. In this course, our focus is on the solution of nonlinear equations, numerical linear algebra, numerical methods for differential and integral equations, methods of approximation of functions, and the impact of these developments on science and technology. One of our interests is also the impact of mathematical software packages. In contrast to more classical fields of mathematics, like Analysis, Number Theory, or Algebraic Geometry, Numerical Analysis became an independent mathematical discipline only in the course of the 20th century.

Problems solvable by numerical analysis

The following are some of the mathematical problems that can be solved using numerical analysis:

(1) Solving nonlinear equations f(x) = 0.
(2) Solving large systems of linear equations.
(3) Solving systems of nonlinear equations.
(4) Interpolating to find intermediate values within a table of data.
(5) Fitting curves to data by a variety of methods.
(6) Finding efficient and effective approximations of functions.
(7) Finding derivatives of any order of a function, even when the function is known only as a table of values.
(8) Integrating any function, even when it is known only as a table of values.
(9) Solving differential equations.

Example 1. The equation e^x + sin x = 0 has no analytic solution, but it can be solved numerically.

Example 2. The integral ∫_a^b e^{x^2} dx has no closed-form solution, but it can be evaluated numerically.

2019-2018 Page 2 of 7
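As a first taste of what such procedures look like, Example 1 can be solved with the bisection method (studied later in the course). The sketch below is illustrative: the bracket [-1, 0] and the tolerance are our own choices, not taken from the text.

```python
import math

def bisect(f, a, b, tol=1e-10):
    """Halve the bracket [a, b] until it is shorter than tol.

    Assumes f is continuous and f(a), f(b) have opposite signs.
    """
    fa = f(a)
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fa * fm <= 0:   # sign change in the left half: root is there
            b = m
        else:              # otherwise the root is in the right half
            a, fa = m, fm
    return (a + b) / 2

# Example 1: e^x + sin x = 0 changes sign on [-1, 0], so a root lies there
root = bisect(lambda x: math.exp(x) + math.sin(x), -1.0, 0.0)
print(root)  # roughly -0.5885
```

Note that the method uses only arithmetic operations and sign comparisons, exactly in the spirit of the definition of numerical analysis given above.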
Differences between numerical and analytic methods

(1) A numerical solution is always numerical, while an analytic solution is given in terms of functions. An analytic solution therefore has the advantage that it can be evaluated for specific instances, and the behavior and properties of the solution are often apparent.
(2) Analytic methods give exact solutions, while numerical methods give approximate solutions.

1.2 Computer and Numerical Analysis

Numerical methods require such tedious and repetitive arithmetic operations that it is practical to solve problems by numerical methods only when we use a computer. A human would make so many mistakes that there would be little confidence in the result. In addition, the manpower cost would be more than could normally be afforded. Thus the computer and numerical analysis make a perfect combination.

To use computers in numerical analysis we must write computer programs. The computer language is not important; one can use any language. Actually, writing programs is not always necessary, since numerical analysis is so important that extensive commercial software packages are available. An alternative to using a program written in one of the computer languages is to use a kind of software sometimes called a symbolic algebra program. These programs mimic the way humans solve mathematical problems. Many such symbolic programs are available, including Sage (System for Algebra and Geometry Experimentation), Freemat, Mathematica, Derive, MAPLE, MathCad, MATLAB, and MacSyma.

Computer Arithmetic and Errors

The analysis of computer errors and the other sources of error in numerical methods is a critically important part of the study of numerical analysis. There are several possible sources of errors in addition to those due to the inexact arithmetic of the computer.
Inexactness of the mathematical model and measurements

Real-world problems, in which an existing or proposed physical situation is modeled by a mathematical equation, will nearly always have coefficients that are not known exactly. The reason is that the problems often depend on measurements of doubtful accuracy. Further, the model itself may not reflect the behavior of the situation perfectly. These sources of error are the domain of mathematical modeling and of the experimentalists, and will not be our subject in this course.

Truncation error

Truncation errors are errors caused by the method itself. The term originates from the fact that numerical methods can usually be compared to a truncated Taylor series. For example, we know that

e^x = Σ_{n=0}^∞ x^n / n!.
We may approximate e^x by the cubic

p_3(x) = 1 + x + (1/2) x^2 + (1/6) x^3.

The error in this approximation is due to truncating the series and has nothing to do with the computer or calculator. Many numerical methods are iterative, and we would reach the exact answer only if we applied the method infinitely many times. But life is finite and computer time is costly, so we must be satisfied with an approximation to the exact analytic answer. The error in this approximation is a truncation error.

Round-off error

The error that is produced when a calculator or computer is used to perform real-number calculations is called round-off error. It occurs because the arithmetic performed in a machine involves numbers with only a finite number of digits, with the result that calculations are performed with only approximate representations of the actual numbers. The arithmetic performed by a calculator or computer is different from the arithmetic that we use in our algebra and calculus courses. From your past experience you might expect that we always have as true statements such things as 2 + 2 = 4, 4 · 8 = 32, and (√3)^2 = 3. In standard computational arithmetic we do get exact results for 2 + 2 = 4 and 4 · 8 = 32, but we will not get precisely (√3)^2 = 3. To understand why this is true we must explore the world of finite-digit arithmetic.

In our traditional mathematical world we permit numbers with an infinite number of digits. The arithmetic we use in this world defines √3 as the unique positive number that, when multiplied by itself, produces the integer 3. In the computational world, however, each representable number has only a fixed and finite number of digits. This means, for example, that only rational numbers, and not even all of these, can be represented exactly.
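Both kinds of error described above are easy to reproduce in a few lines of Python; this is a minimal sketch, assuming the usual IEEE double-precision arithmetic, with x = 0.5 chosen only for illustration.

```python
import math

# Truncation error: approximate e^x by the cubic p_3(x) = 1 + x + x^2/2 + x^3/6
x = 0.5
p3 = 1 + x + x**2 / 2 + x**3 / 6
print(math.exp(x) - p3)   # about 0.0029: the error from truncating the series

# Round-off error: sqrt(3) is stored inexactly, so its square misses 3
s = math.sqrt(3)
print(s * s == 3)         # False on typical hardware
print(s * s - 3)          # a tiny residual, on the order of 1e-16
```

The first error would persist even with exact arithmetic; the second exists only because the machine stores finitely many digits.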
Since √3 is not rational, it is given an approximate representation within the machine, a representation whose square will not be precisely 3, although it will likely be sufficiently close to 3 to be acceptable in most situations. In most cases, then, this machine representation and arithmetic are satisfactory and pass without notice or concern, but at times problems arise because of this discrepancy.

Blunders

Since humans are involved in programming, operation, input preparation, and output interpretation, blunders or gross errors occur more frequently than we like to admit. The remedy here is care, coupled with a careful examination of the results for reasonableness. Sometimes a test run with known results is worthwhile, but it is no guarantee of freedom from foolish error.

Propagated error

By propagated error we mean an error in the succeeding steps of a process due to the occurrence of an earlier error; such error is in addition to the local errors. It is somewhat analogous to error in the initial conditions. Propagated error is of critical importance. If errors are magnified continuously as the method continues, eventually they will overshadow the true value, destroying its validity; we call such a
method unstable. For a stable method, the desirable kind, errors made at early points die out as the method continues.

Remark. Each of these types of error, while interacting to a degree, may occur even in the absence of the other kinds.

Floating-point arithmetic

In order to examine round-off error in detail, we need to understand how numeric quantities are represented in a computer. An unfortunate fact of life is that any digital computer can only store finitely many quantities. Thus, a computer cannot represent the infinite set of integers, the set of rational numbers, the set of real numbers, or the set of complex numbers. So deciding how to deal with more general numbers using only the finitely many that the computer can store becomes an important issue. In nearly all cases, numbers are stored as floating-point quantities.

Example 1.

Fixed-point number    Floating-point number
13.524                .13524 × 10^2 = .13524 E2
0.0005                .5 × 10^-3 = .5 E-3

In a computer, floating-point numbers have the general form

±.d_1 d_2 … d_p × B^e,

where the d_i are digits or bits with 0 ≤ d_i ≤ B − 1, and

B = the number base that is used, usually 2, 16, or 10;
p = the number of significant bits (digits), that is, the precision;
e = an integer exponent, ranging from E_min to E_max (with E_min negative and E_max positive).

In almost all cases, numbers are normalized so that d_1 ≠ 0. Floating-point numbers thus have three parts:

sign: ±,
mantissa (fractional part): d_1 d_2 … d_p,
exponent part (characteristic): B^e.

The three parts have a fixed total length of 32 or 64 bits. The mantissa uses most of these bits (23 to 52). The number of bits used by the mantissa determines the precision of the representation, and the bits of the exponent determine the range of the values. Most computers permit two or even three types of numbers:

(1) Single precision: 6 to 7 significant decimal digits (±.d_1 … d_7 × B^e).
(2) Double precision: 13 to 14 significant decimal digits (±.d_1 … d_14 × B^e).
(3) Extended precision: 19 to 20 significant decimal digits (±.d_1 … d_20 × B^e).
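These precision levels can be observed from within Python, whose float is a 64-bit double; the round-trip through `struct` below simulates storing the same value in a 32-bit single. This is a hedged sketch of the idea, not a description of any particular machine.

```python
import struct
import sys

x = 1 / 3  # stored as a 64-bit double, good for roughly 15-16 decimal digits
print(f"double: {x:.20f}")

# Round-trip through a 32-bit float to see single precision (about 7 digits)
single = struct.unpack('>f', struct.pack('>f', x))[0]
print(f"single: {single:.20f}")

print(sys.float_info.dig)  # decimal digits a double can reliably hold, typically 15
print(sys.float_info.max)  # largest representable double, about 1.8e308 (overflow threshold)
```

Comparing the two printed values digit by digit shows where single precision starts to lose information.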
Any positive real number within the numerical range of the machine can be normalized to the form

x = 0.d_1 d_2 … d_k d_{k+1} d_{k+2} … × 10^n.

The floating-point form of x is obtained by terminating the mantissa of x at k decimal digits. There are two ways of performing the termination. One method, called chopping, is simply to chop off the digits d_{k+1} d_{k+2} …. The other method of terminating the mantissa of x at k decimal digits is called rounding: if the (k+1)st digit is smaller than 5, the result is the same as chopping; if the (k+1)st digit is 5 or greater, then 1 is added to the kth digit and the resulting number is chopped.

Overflow and underflow

Numbers occurring in calculations that have too small a magnitude to be represented result in underflow; they are generally set to 0, with computations continuing. However, numbers occurring in calculations that have too large a magnitude to be represented result in overflow and typically cause the computations to stop (unless the program has been designed to detect this occurrence).

Absolute and relative errors

As we said before, numerical methods give approximate solutions. In order to control how good these approximations are, we need to control the error in them. There are two common ways to express the size of the error in a computed result: absolute error and relative error.

Definition (Absolute and relative errors). Let x* be an approximation to x.

1. The absolute error is E_a = |x − x*|.
2. The relative error is E_r = |x − x*| / |x|.

Remark. As a measure of accuracy, the absolute error can be misleading; the relative error is more meaningful, since it takes into consideration the size of the true value.

Example 1. Let x = 2.71828182 and x* = 2.7182. Then E_a = 0.00008182 and E_r ≈ 0.0000301.

Example 2. Let x = 98350 and x* = 98000. Then E_a = 350 and E_r ≈ 0.00355871.

Significant digits

Significant digits are another way to express accuracy (how good an approximation is).
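Chopping, rounding, and the two error measures defined above can be sketched in a few lines of Python. This is a toy model with decimal mantissas for illustration, not how hardware actually terminates digits; the helper names and the choice of π are ours.

```python
import math

def chop(x, k):
    """Keep k significant decimal digits of x > 0 by chopping."""
    n = math.floor(math.log10(x)) + 1       # exponent that puts the mantissa in [0.1, 1)
    return math.floor(x * 10 ** (k - n)) / 10 ** (k - n)

def round_fl(x, k):
    """Keep k significant decimal digits of x > 0 by rounding."""
    n = math.floor(math.log10(x)) + 1
    return math.floor(x * 10 ** (k - n) + 0.5) / 10 ** (k - n)

x = math.pi                     # 3.14159265...
xc, xr = chop(x, 5), round_fl(x, 5)
print(xc, xr)                   # 3.1415 3.1416
print(abs(x - xc))              # absolute error E_a of the chopped value
print(abs(x - xc) / abs(x))     # relative error E_r of the chopped value
```

Since the digit after the fifth one is 9, rounding bumps the last kept digit up while chopping simply discards it, which is exactly the distinction made in the text.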
They indicate how many digits in the number have meaning after normalization.

Definition. Let x = ±.d_1 d_2 … d_n d_{n+1} … d_p × B^e and x* = ±.d_1 d_2 … d_n e_{n+1} … e_p × B^e, where d_1 ≠ 0. Then

(1) if |d_{n+1} − e_{n+1}| < 5, we say that x and x* agree to n significant digits;
(2) if |d_{n+1} − e_{n+1}| ≥ 5, we say that x and x* agree to n − 1 significant digits.

Remark. Two numbers x and x* agree to n significant digits if n is the largest positive integer such that E_r ≤ 0.5 × 10^{−n}, or equivalently E_a < 0.5 × 10^{e−n}, where e is the exponent of x.

Example 1. Let x = 15.4321 and x* = 15.4318. Then x = .154321 × 10^2 and x* = .154318 × 10^2. Here n = 4 and |d_5 − e_5| = |2 − 1| = 1 < 5, so x and x* agree to 4 significant digits. Note that E_r ≈ 0.0000194 < 0.5 × 10^{−4} and E_a = 0.0003 < 0.5 × 10^{−2}.

Example 2. Let x = 0.0123 and x* = 0.0129. Then x = .123 × 10^{−1} and x* = .129 × 10^{−1}. Here n = 2 and |d_3 − e_3| = |3 − 9| = 6 ≥ 5, so x and x* agree to 1 significant digit. Note that E_r ≈ 0.0488 < 0.5 × 10^{−1} and E_a = 0.0006 < 0.5 × 10^{−2}.

Remark. In general, the true value will not be known, so we cannot use E_a and E_r to control numerical methods. We will use approximate errors to overcome this problem.

Taylor's Theorem

Taylor's theorem and its associated formula, the Taylor series, are of great value in the study of numerical methods. In essence, Taylor's theorem states that any smooth function can be approximated by a polynomial. The Taylor series then provides a means to express this idea mathematically in a form that can be used to generate practical results.

Theorem 1 (Taylor's Theorem). Suppose that f is of class C^n (n ≥ 0) on an interval I ⊆ R, that f^{(n+1)} exists on I, and that a ∈ I. Then for each x ∈ I there exists a number ξ between a and x such that

f(x) = P_n(x) + R_n(x),

where

P_n(x) = Σ_{j=0}^n f^{(j)}(a) (x − a)^j / j!

is the nth-order Taylor polynomial for f based at a, and

R_n(x) = f^{(n+1)}(ξ) (x − a)^{n+1} / (n + 1)!

is the remainder term (or truncation error) associated with P_n(x).

The number ξ in the truncation error R_n(x) depends on the value of x at which the polynomial P_n(x) is being evaluated. However, we should not expect to be able to determine ξ explicitly. Taylor's Theorem simply ensures that such a value exists, and that it lies between x and a.
In fact, one of the common problems in numerical methods is to determine a realistic bound for the value of f^{(n+1)}(ξ) when x lies in some specified interval. The infinite series obtained by taking the limit of P_n(x) as n → ∞ is called the Taylor series for f about a.
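The theorem can be checked numerically. The sketch below takes f(x) = e^x about a = 0 (our choice, for illustration); every derivative of e^x is e^x, and on [0, 1] the unknown factor e^ξ is bounded by e, which gives a computable bound on the remainder.

```python
import math

def taylor_poly(x, a, n):
    """n-th order Taylor polynomial of e^x based at a (every derivative of e^x is e^x)."""
    return sum(math.exp(a) * (x - a) ** j / math.factorial(j) for j in range(n + 1))

a, x = 0.0, 1.0
for n in range(1, 6):
    err = abs(math.exp(x) - taylor_poly(x, a, n))
    # Remainder bound: |R_n(x)| <= e^xi * |x - a|^(n+1) / (n+1)!  with e^xi <= e on [0, 1]
    bound = math.e / math.factorial(n + 1)
    print(n, err, bound)   # the actual error stays below the theoretical bound
```

Running this shows the error shrinking roughly like 1/(n+1)!, with the bound from the remainder term always valid, exactly as Taylor's Theorem promises.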