Notes on floating point numbers, numerical computations and pitfalls

November 6, 2012

1 Floating point numbers

An n-digit floating point number in base β has the form

    x = ±(.d_1 d_2 ... d_n)_β × β^e

where .d_1 d_2 ... d_n is a β-fraction called the mantissa and e is an integer called the exponent. Such a floating point number is called normalised if d_1 ≠ 0, or else d_1 = d_2 = ... = d_n = 0. The exponent e is limited to a range

    m < e < M.

Usually, m = −M.

1.1 Rounding, chopping, overflow and underflow

There are two common ways of translating a given real number x into an n-digit base-β floating point number fl(x): rounding and chopping. For example, if two-decimal-digit floating point numbers are used, then

    fl(2/3) = (.67) × 10^0  if rounded,    (.66) × 10^0  if chopped,

and

    fl(−838) = −(.84) × 10^3  if rounded,    −(.83) × 10^3  if chopped.

The conversion is undefined if |x| ≥ β^M (overflow) or 0 < |x| < β^{m−n} (underflow), where m and M are the bounds on the exponent. The difference between x and fl(x) is called the round-off error.
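As an illustration of fl(·), here is a small Python sketch that rounds or chops a real number to n significant decimal digits and reproduces the two-digit examples above. The helper names fl_round and fl_chop are made up for this illustration; they model the definitions rather than any particular machine arithmetic.

    import math

    def fl_round(x, n):
        """Round x to n significant decimal digits (base beta = 10)."""
        if x == 0.0:
            return 0.0
        e = math.floor(math.log10(abs(x))) + 1      # exponent so that |x| = (.d1 d2 ...) * 10**e
        return round(x / 10**e, n) * 10**e

    def fl_chop(x, n):
        """Chop (truncate towards zero) x to n significant decimal digits."""
        if x == 0.0:
            return 0.0
        e = math.floor(math.log10(abs(x))) + 1
        m = x / 10**e                                # mantissa, roughly in [0.1, 1)
        return math.trunc(m * 10**n) / 10**n * 10**e

    print(fl_round(2/3, 2), fl_chop(2/3, 2))        # ~0.67 and ~0.66, i.e. (.67)x10^0 and (.66)x10^0
    print(fl_round(-838, 2), fl_chop(-838, 2))      # ~-840 and ~-830, i.e. -(.84)x10^3 and -(.83)x10^3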

If we write fl(x) = x(1 + δ), then it is possible to bound δ independently of x. It is not difficult to see that

    |δ| ≤ (1/2) β^{1−n}      in rounding,
    −β^{1−n} < δ ≤ 0         in chopping.

The maximum possible value of |δ| is often called the unit roundoff or machine epsilon and is denoted by u. It is the smallest machine number such that fl(1 + u) > 1.

1.2 Floating point operations

If ∘ denotes one of the arithmetic operations (addition, subtraction, multiplication or division) and ⊛ represents the floating point version of the same operation, then usually

    x ⊛ y ≠ x ∘ y.

However, it is reasonable to assume (actually the computer arithmetic is so constructed) that

    x ⊛ y = fl(x ∘ y) = (x ∘ y)(1 + δ)

for some δ with |δ| ≤ u.

1.3 Error analysis

Consider the computation of f(x) = x^{2^n} at a point x_0 by n squarings:

    x_1 = x_0^2,  x_2 = x_1^2,  ...,  x_n = x_{n−1}^2,

with fl(x_0) = x_0. In floating point arithmetic we compute

    x̂_1 = x_0^2 (1 + δ_1),  x̂_2 = x̂_1^2 (1 + δ_2),  ...,  x̂_n = x̂_{n−1}^2 (1 + δ_n)

with |δ_i| ≤ u for all i. The computed answer is, therefore,

    x̂_n = x_0^{2^n} (1 + δ_1)^{2^{n−1}} (1 + δ_2)^{2^{n−2}} ··· (1 + δ_n).

Now, if |δ_1|, |δ_2|, ..., |δ_r| ≤ u, then there exist δ and η with |δ|, |η| ≤ u such that

    (1 + δ_1)(1 + δ_2) ··· (1 + δ_r) = (1 + δ)^r = (1 + η)^{r+1}.

Thus,

    x̂_n = x_0^{2^n} (1 + δ)^{2^n} = f(x_0(1 + δ))

for some δ with |δ| ≤ u. In other words, the computed value x̂_n is the exact answer for a perturbed input x_0(1 + δ). The above is an example of backward error analysis.
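The backward-error statement can be checked numerically. The following Python sketch carries out the repeated squaring in double precision and repeats it exactly in rational arithmetic; the starting value x0 and the number of squarings n are arbitrary choices for this illustration, and u is the unit roundoff of IEEE double precision. The observed relative error can then be compared with the first-order forward bound 2^n · u implied by x̂_n = f(x_0(1 + δ)) with |δ| ≤ u.

    from fractions import Fraction

    x0 = 1.000000123        # arbitrary test input
    n = 10                  # number of squarings, so f(x0) = x0**(2**n)
    u = 2.0**-53            # unit roundoff of IEEE double precision (round to nearest)

    # Repeated squaring in floating point arithmetic.
    xf = x0
    for _ in range(n):
        xf = xf * xf

    # Exact reference: the same squarings carried out in rational arithmetic.
    xr = Fraction(x0)       # Fraction(float) converts the double exactly
    for _ in range(n):
        xr = xr * xr

    rel_err = abs(Fraction(xf) - xr) / xr
    print(f"observed relative error = {float(rel_err):.3e}")
    print(f"first-order bound 2^n*u = {2**n * u:.3e}")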

1.4 Loss of significance

If x* is an approximation to x, then the error in x* is x − x*. The relative error in x*, as an approximation to x, is the number (x − x*)/x.

Every floating point operation in a computational process may give rise to an error which, once generated, may then be amplified or reduced in subsequent computations. One of the most common (and often avoidable) ways of increasing the importance of an error is commonly called loss of significant digits.

We say that x* approximates x to r significant β-digits provided the absolute error |x − x*| is at most 1/2 unit in the r-th significant β-digit of x, i.e.,

    |x − x*| ≤ (1/2) β^{s−r+1},

with s the largest integer such that β^s ≤ |x|. For example, x* = 3 agrees with x = π to one significant decimal digit, whereas x* = 22/7 = 3.1428... is correct to three significant digits.

Suppose we are to calculate z = x − y and we have approximations x* and y* for x and y, respectively, each of which is good to r digits. Then z* = x* − y* may not be good to r digits. For example, if

    x* = (.76545421) × 10^1
    y* = (.76544200) × 10^1

are each correct to seven decimal digits, then their exact difference

    z* = x* − y* = (.1221) × 10^{−3}

is good only to three digits as an approximation to z = x − y, since the fourth digit of z* is derived from the eighth digits of x* and y*, both possibly in error. Hence, the relative error in z* is possibly 10,000 times the relative error in x* or y*.

Such errors can be avoided by anticipating their occurrence. For example, the calculation of f(x) = 1 − cos x is prone to loss of significant digits for x near 0; however, the alternative representation

    f(x) = sin²x / (1 + cos x)

can be evaluated quite accurately for small values of x and is, on the other hand, problematic near x = π.
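Both expressions can be compared against a Taylor-series reference, 1 − cos x = x²/2 − x⁴/24 + ..., which is essentially exact for a small argument. The Python sketch below (the choice x = 10^−6 is arbitrary) shows the naive form losing several digits while the rewritten form stays accurate.

    import math

    x = 1.0e-6   # a small argument where 1 - cos(x) is about 5e-13

    naive  = 1.0 - math.cos(x)                       # cancellation: 1 and cos(x) agree to ~12 digits
    stable = math.sin(x)**2 / (1.0 + math.cos(x))    # algebraically identical, no cancellation near 0

    # Taylor reference; the omitted terms are far below double precision at x = 1e-6.
    reference = x*x/2.0 - x**4/24.0

    print(f"naive  : {naive:.16e}   rel. error {abs(naive - reference)/reference:.1e}")
    print(f"stable : {stable:.16e}   rel. error {abs(stable - reference)/reference:.1e}")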

1.5 Condition and instability

2 Problems

2.1 Discretization

1. Derive the trapezoidal rule for numerical integration, which goes somewhat like

       ∫_a^b f(x) dx ≈ Σ_{k=0}^{n−1} (1/2) h [f(x_k) + f(x_{k+1})],

   for discrete values x_k in the interval [a, b]. Compute ∫_0^π sin(x) dx using the trapezoidal rule (use h = 0.1 and h = 0.01) and compare with the exact result (a sketch in Python follows Problem 2 below).

2. Consider the differential equation

       y′(x) = 2x y(x) − 2x² + 1,   0 ≤ x ≤ 1,
       y(0) = 1.

   (a) Show/verify that the exact solution is the function y(x) = e^{x²} + x.

   (b) If we approximate the derivative operation with the divided difference

           y′(x_k) ≈ (y_{k+1} − y_k)/(x_{k+1} − x_k),

       then show that the solution can be approximated by the iteration

           y_{k+1} = y_k + h(2 x_k y_k − 2 x_k² + 1),   k = 0, 1, ..., n − 1,
           y_0 = 1,

       where x_k = kh, k = 0, 1, ..., n, and h = 1/n.

   (c) Use h = 0.1 to solve the differential equation numerically and compare (plot) your answers with the exact solution.
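A minimal Python sketch for Problems 1 and 2 (the implementation details below are one possible choice, not part of the exercise statements): since h = 0.1 and h = 0.01 do not divide [0, π] evenly, the step is adjusted to the nearest count of equal subintervals, and the exact values used for comparison are ∫_0^π sin x dx = 2 and y(1) = e + 1.

    import math

    def trapezoid(f, a, b, n):
        """Composite trapezoidal rule with n equal subintervals (Problem 1)."""
        h = (b - a) / n
        return sum(0.5*h*(f(a + k*h) + f(a + (k + 1)*h)) for k in range(n))

    for h_target in (0.1, 0.01):
        n = round(math.pi / h_target)          # number of subintervals closest to the suggested h
        approx = trapezoid(math.sin, 0.0, math.pi, n)
        print(f"h ~ {h_target}: integral ~ {approx:.8f}, error = {abs(approx - 2.0):.2e}")

    def euler(h):
        """The iteration y_{k+1} = y_k + h (2 x_k y_k - 2 x_k^2 + 1) from Problem 2(b)."""
        n = int(round(1.0 / h))
        y = 1.0
        for k in range(n):
            x = k * h
            y = y + h*(2*x*y - 2*x*x + 1)
        return y

    print(f"Euler, h = 0.1: y(1) ~ {euler(0.1):.6f}   exact: e + 1 = {math.e + 1.0:.6f}")

Halving h should roughly quarter the trapezoidal error, consistent with its O(h²) accuracy, while the Euler error decreases only linearly in h.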

3. (a) Derive Newton's iteration for computing √2, given by

           x_{k+1} = (1/2)[x_k + (2/x_k)],   k = 0, 1, ...,
           x_0 = 1.

   (b) Show that Newton's iteration takes O(log n) steps to obtain n decimal digits of accuracy.

   (c) Numerically compute √2 using Newton's iteration and verify the rate of convergence.

4. Which of the following are rounding errors and which are truncation errors (please check out the definitions from Wikipedia)?

   (a) Replace sin(x) by x − x³/3! + x⁵/5! − ...

   (b) Use 3.1415926536 for π.

   (c) Use the value x_10 for √2, where x_k is given by Newton's iteration above.

   (d) Divide 1.0 by 3.0 and call the result 0.3333.

2.2 Unstable and Ill-conditioned problems

1. Consider the differential equation

       y′(x) = (2/π) x y (y − π),   0 ≤ x ≤ 1,
       y(0) = y_0.

   (a) Show/verify that the exact solution to this equation is

           y(x) = π y_0 / [y_0 + (π − y_0) e^{x²}].

   (b) Taking y_0 = π, compute the solution for

       i. an 8 digit rounded approximation for π,
       ii. a 9 digit rounded approximation for π.

       What can you say about the results?

2. Solve the system

       2x − 4y = 1
       −2.998x + 6.01y = 2

   using any method you know. Compare the solution with the solution to the system obtained by changing the last equation to −2.998x + 6y = 2. Is this problem stable?

3. Examine the stability of the equation

       x³ − 12x² + 21x − 10 = 0,

   which has a solution x = 1. Change one of the coefficients (say 21 to 20) and show that x = 1 is no longer even close to a solution.
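Problem 3 can be checked in a few lines of Python (the bisection bracket [10, 11] is simply an interval on which the perturbed polynomial happens to change sign):

    def p(x): return x**3 - 12*x**2 + 21*x - 10    # original polynomial, p(x) = (x - 1)^2 (x - 10)
    def q(x): return x**3 - 12*x**2 + 20*x - 10    # one coefficient changed: 21 -> 20

    print("p(1) =", p(1.0), "  q(1) =", q(1.0))    # 0.0 versus -1.0: x = 1 is far from a root of q

    # The perturbed polynomial has a single real root, nowhere near 1; locate it by bisection.
    lo, hi = 10.0, 11.0                            # q(10) < 0 < q(11)
    for _ in range(60):
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if q(mid) < 0 else (lo, mid)
    print("real root of the perturbed polynomial ~", 0.5*(lo + hi))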

2.3 Unstable methods

1. Consider the quadratic

       ax² + bx + c = 0,   a ≠ 0,

   with roots given by the usual formula x = [−b ± √(b² − 4ac)] / (2a). Consider a = 1, b = 1.1, c = 2.5245315. Suppose that b² − 4ac is computed correctly to 8 digits; what is the number of digits of accuracy in x? What is the source of the error? (A Python sketch follows Problem 2 below.)

2. Show that a solution of the quadratic can be rewritten as

       x = −2c / (b + √(b² − 4ac)).

   Write a program to evaluate x for several values of a, b and c (with b large and positive and a, c of moderate size). Compare the results obtained with the usual formula and with the formula above.
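For Problems 1 and 2, the sketch below compares the usual formula with the rewritten one; the coefficient values in the loop are arbitrary choices with b large and positive and a, c of moderate size, and the residual ax² + bx + c shows how well each computed root satisfies the equation.

    import math

    def roots_usual(a, b, c):
        """Both roots from the usual formula x = (-b ± sqrt(b^2 - 4ac)) / (2a)."""
        d = math.sqrt(b*b - 4*a*c)
        return (-b + d) / (2*a), (-b - d) / (2*a)

    def root_stable(a, b, c):
        """The small-magnitude root (for b > 0) rewritten as x = -2c / (b + sqrt(b^2 - 4ac))."""
        return -2*c / (b + math.sqrt(b*b - 4*a*c))

    a, c = 1.0, 2.0
    for b in (1e4, 1e6, 1e8):
        x_usual, _ = roots_usual(a, b, c)
        x_stable = root_stable(a, b, c)
        print(f"b = {b:.0e}: usual {x_usual:.16e} (residual {a*x_usual**2 + b*x_usual + c:.1e}), "
              f"stable {x_stable:.16e} (residual {a*x_stable**2 + b*x_stable + c:.1e})")

As b grows, the subtraction −b + √(b² − 4ac) cancels more and more leading digits, and the residual of the root from the usual formula deteriorates accordingly, while the rewritten formula involves no such cancellation.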

Compare the results with that obtained using the following recursion (which you can easily! derive by integrating by parts twice). I k = (1/π) [k(k 1)/π 2 ]I k 2, k = 2, 4, 6,... 4. The standard deviation of a set of numbers x 1, x 2,..., x n is defined as s = (1/n) n (x i x) 2 where x is the average. An alternative formula that is often used is s = (1/n) i=1 n x 2 i x 2 i=1 (a) Discuss the instability of the second formula for the case where the x i are all very close to each other. (b) Observe that s should always be positive. Write a small program to evaluate the two formulas and find values of x 1,..., x 1 for which the second one gives negative results. 7