Chapter 1: Introduction and mathematical preliminaries

Evy Kersalé, September 26, 2011

Motivation

Most of the mathematical problems you have encountered so far can be solved analytically. However, in real life, analytic solutions are rather rare, and therefore we must devise a way of approximating the solutions. For example, while $\int_1^2 e^x \, dx$ has a well-known analytic solution, $\int_1^2 e^{x^2} \, dx$ can only be solved in terms of special functions, and $\int_1^2 e^{x^3} \, dx$ has no analytic solution. These integrals nevertheless exist, as the areas under the curves $y = e^{x^2}$ and $y = e^{x^3}$. We can obtain a numerical approximation by estimating this area, e.g. by dividing it into strips and using the trapezium rule.
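To make the idea concrete, here is a minimal Python sketch (not part of the original notes) applying the trapezium rule to the third integral; the function name and the number of strips are illustrative choices.

import math

def trapezium(f, a, b, n):
    """Composite trapezium rule: integrate f over [a, b] using n strips."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))       # end points carry weight 1/2
    for i in range(1, n):
        total += f(a + i * h)         # interior points carry weight 1
    return h * total

# No analytic antiderivative exists for exp(x**3), but the area under
# the curve is easy to estimate; the result is roughly 275.5.
print(trapezium(lambda x: math.exp(x**3), 1.0, 2.0, 1000))

Doubling n and comparing successive estimates gives a practical check on the accuracy.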

Definition

Numerical analysis is a part of mathematics concerned with (i) devising methods, called numerical algorithms, for obtaining approximate numerical solutions to mathematical problems, and (ii) being able to estimate the error involved. Traditionally, numerical algorithms are built upon the simplest arithmetic operations (+, −, × and ÷). Interestingly, digital computers can only do these very basic operations. However, they are very fast and hence have led to major advances in applied mathematics in the last 60 years.

Computer arithmetic: Floating point numbers

Computers can store integers exactly, but not real numbers in general. Instead, they approximate them as floating point numbers. A decimal floating point (or machine number) is a number of the form

$\pm\, 0.d_1 d_2 \ldots d_k \times 10^{\pm n}, \quad 0 \le d_i \le 9, \quad d_1 \ne 0,$

where the significand or mantissa $m = d_1 d_2 \ldots d_k$ (i.e. the fractional part) and the exponent $n$ are fixed-length integers. ($m$ cannot start with a zero.) In fact, computers use binary numbers (base 2) rather than decimal numbers (base 10), but the same principle applies (see handout).
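A short Python sketch (illustrative; the helper name normalise is hypothetical) of writing a positive real number in the normalised form of the slide, $0.d_1 d_2 \ldots d_k \times 10^n$ with $d_1 \ne 0$:

def normalise(x, k):
    """Write x > 0 as m * 10**n with 0.1 <= m < 1, keeping k digits of m."""
    n = 0
    while x >= 1.0:        # shift the decimal point left
        x /= 10.0
        n += 1
    while x < 0.1:         # shift the decimal point right, so that d1 != 0
        x *= 10.0
        n -= 1
    return round(x, k), n  # k-digit mantissa (rounded) and exponent

print(normalise(3.14159265, 5))   # (0.31416, 1), i.e. 0.31416 * 10**1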

Computer arithmetic: Machine ε

Consider a simple computer where m is 3 digits long and n is one digit long. The smallest positive number this computer can store is 0.1 × 10⁻⁹ and the largest is 0.999 × 10⁹. Thus, the length of the exponent determines the range of numbers that can be stored. However, not all values in the range can be distinguished: numbers can only be recorded to a certain relative accuracy ε. For example, on our simple computer, the next floating point number after 1 = 0.1 × 10¹ is 0.101 × 10¹ = 1.01. The quantity ε_machine = 0.01 (machine ε) is the worst relative uncertainty in the floating point representation of a number.
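The same "next number after 1" idea locates the machine ε of Python's built-in floats (IEEE 754 double precision); a minimal sketch:

import sys

eps = 1.0
while 1.0 + eps / 2 > 1.0:       # halve until 1 + eps/2 is indistinguishable from 1
    eps /= 2
print(eps)                       # 2.220446049250313e-16
print(sys.float_info.epsilon)    # the same value, as reported by the runtime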

Chopping and rounding

There are two ways of terminating the mantissa of the k-digit decimal machine number approximating

$0.d_1 d_2 \ldots d_k d_{k+1} d_{k+2} \ldots \times 10^n, \quad 0 \le d_i \le 9, \quad d_1 \ne 0.$

Chopping: chop off the digits $d_{k+1}, d_{k+2}, \ldots$ to get $0.d_1 d_2 \ldots d_k \times 10^n$.

Rounding: add $5 \times 10^{n-(k+1)}$ and chop off the digits $d_{k+1}, d_{k+2}, \ldots$ (i.e. if $d_{k+1} \ge 5$ we add 1 to $d_k$ before chopping). Rounding is more accurate than chopping.

Example: The five-digit floating-point form of π = 3.14159265359… is 0.31415 × 10 using chopping and 0.31416 × 10 using rounding. Similarly, the five-digit floating-point form of 2/3 = 0.6666666… is 0.66666 using chopping and 0.66667 using rounding, but that of 1/3 = 0.3333333… is 0.33333 using either chopping or rounding.
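A small Python sketch of both schemes for k significant decimal digits, ignoring the subtlety that Python floats are themselves binary approximations (the helper names are illustrative):

import math

def chop(x, k):
    """Keep the first k significant decimal digits of x > 0 (chopping)."""
    n = math.floor(math.log10(x)) + 1   # exponent n in 0.d1...dk * 10**n
    scale = 10 ** (k - n)
    return math.floor(x * scale) / scale

def round_to(x, k):
    """Round x > 0 to k significant digits: add 5 * 10**(n-(k+1)), then chop."""
    n = math.floor(math.log10(x)) + 1
    return chop(x + 5 * 10 ** (n - (k + 1)), k)

print(chop(math.pi, 5), round_to(math.pi, 5))   # 3.1415 3.1416
print(chop(2/3, 5), round_to(2/3, 5))           # 0.66666 0.66667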

Measure of the error

Much of numerical analysis is concerned with controlling the size of errors in calculations. These errors, quantified in two different ways, arise from two distinct sources. Let p* be the result of a numerical calculation and p the exact answer (i.e. p* is an approximation to p). We define two measures of the error:

Absolute error: E = |p* − p|.

Relative error: E_r = |p* − p| / |p| (provided p ≠ 0), which takes into consideration the size of the value.

Example: If p = 2 and p* = 2.1, the absolute error is E = 10⁻¹; if p = 2 × 10⁻³ and p* = 2.1 × 10⁻³, E = 10⁻⁴ is smaller; and if p = 2 × 10³ and p* = 2.1 × 10³, E = 10² is larger; but in all three cases the relative error remains the same, E_r = 5 × 10⁻².
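Both measures are one-liners in any language; a quick Python check of the three cases in the example:

def abs_err(p_star, p):
    return abs(p_star - p)

def rel_err(p_star, p):
    return abs(p_star - p) / abs(p)      # assumes p != 0

for p, p_star in [(2.0, 2.1), (2e-3, 2.1e-3), (2e3, 2.1e3)]:
    print(abs_err(p_star, p), rel_err(p_star, p))
# absolute errors 1e-1, 1e-4, 1e2; relative error 5e-2 every time
# (up to tiny floating point noise in the printed digits)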

Round-off errors

Caused by the imprecision of using finite-digit arithmetic in practical calculations (e.g. floating point numbers).

Example: The 4-digit representation of x = √2 = 1.4142136… is x* = 1.414 = 0.1414 × 10. Using 4-digit arithmetic, we evaluate x*² = 1.999 ≠ 2, due to round-off errors.

Round-off errors can be minimised by reducing the number of arithmetic operations, particularly those that magnify errors.
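Python's decimal module can emulate the 4-digit arithmetic of the example directly; a minimal sketch:

from decimal import Decimal, getcontext

getcontext().prec = 4      # every operation now rounds to 4 significant digits

x = Decimal(2).sqrt()      # stored as 1.414, not 1.4142136...
print(x * x)               # 1.999, not 2: a round-off error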

Magnification of the error

Computers store numbers to a relative accuracy ε. Thus, the true value of a floating point number x could be anywhere between x(1 − ε) and x(1 + ε). Now, if we add two numbers together, x + y, the true value lies in the interval

(x + y − ε(|x| + |y|), x + y + ε(|x| + |y|)).

Thus, the absolute error is the sum of the errors in x and y, E = ε(|x| + |y|), but the relative error of the answer is

$E_r = \varepsilon \, \frac{|x| + |y|}{|x + y|}.$

If x and y have the same sign, the relative accuracy remains equal to ε, but if they have opposite signs the relative error will be larger. This magnification becomes particularly significant when two very close numbers are subtracted.

Magnification of the error: Example

Recall: the exact solutions of the quadratic equation ax² + bx + c = 0 are

$x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}.$

Using 4-digit rounding arithmetic, solve the quadratic equation x² + 62x + 1 = 0, whose roots are x₁ ≈ −61.9838670 and x₂ ≈ −0.016133230. The discriminant $\sqrt{b^2 - 4ac} = \sqrt{3840} = 61.97$ is close to b = 62. Thus, x₁ = (−62 − 61.97)/2 = −124.0/2 = −62.00, with a relative error E_r = 2.6 × 10⁻⁴, but x₂ = (−62 + 61.97)/2 = −0.0150, with a much larger relative error E_r = 7.0 × 10⁻², caused by the subtraction of two nearly equal numbers. Similarly, division by small numbers (or, equivalently, multiplication by large numbers) magnifies the absolute error while leaving the relative error unchanged.
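The same 4-digit experiment in Python's decimal module, together with a standard remedy that the notes do not cover: recover the small root from the large one through the product of the roots, x₁x₂ = c/a, which avoids the subtraction entirely.

from decimal import Decimal, getcontext

getcontext().prec = 4                # 4-digit rounding arithmetic

a, b, c = Decimal(1), Decimal(62), Decimal(1)
d = (b * b - 4 * a * c).sqrt()       # sqrt(3840) -> 61.97

x1 = (-b - d) / (2 * a)              # -62.0: no cancellation, small error
x2 = (-b + d) / (2 * a)              # -0.015: catastrophic cancellation
x2_better = c / (a * x1)             # -0.01613: accurate to all 4 digits
print(x1, x2, x2_better)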

Truncation errors

Caused by the approximations in the computational algorithm itself. (An algorithm only gives an approximate solution to a mathematical problem, even if the arithmetic is exact.)

Example: Calculate the derivative of a function f(x) at the point x₀. Recall the definition of the derivative,

$\frac{df}{dx}(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}.$

However, on a computer we cannot take h → 0 (there exists a smallest positive floating point number), so h must take a finite value. Using Taylor's theorem, f(x₀ + h) = f(x₀) + h f′(x₀) + (h²/2) f″(ξ), where x₀ < ξ < x₀ + h. Therefore,

$\frac{f(x_0 + h) - f(x_0)}{h} = \frac{h f'(x_0) + \frac{h^2}{2} f''(\xi)}{h} = f'(x_0) + \frac{h}{2} f''(\xi) \approx f'(x_0) + \frac{h}{2} f''(x_0).$

Using a finite value of h leads to a truncation error of size (h/2)|f″(x₀)|.
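A quick numerical check (a sketch, using f = exp at x₀ = 1, so that f′ = f″ = e): the observed error of the one-sided difference tracks the predicted truncation error (h/2)|f″(x₀)|.

import math

def forward_diff(f, x0, h):
    """One-sided approximation to f'(x0) with a finite step h."""
    return (f(x0 + h) - f(x0)) / h

x0 = 1.0
for h in [0.1, 0.01, 0.001]:
    observed = abs(forward_diff(math.exp, x0, h) - math.e)
    predicted = (h / 2) * math.e     # truncation error h/2 |f''(x0)|
    print(h, observed, predicted)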

Truncation errors: Example continued

Clearly, the truncation error (h/2)|f″(x₀)| decreases with decreasing h. The absolute round-off error in f(x₀ + h) − f(x₀) is 2ε|f(x₀)|, and that in the derivative (f(x₀ + h) − f(x₀))/h is (2ε/h)|f(x₀)|. So, the round-off error increases with decreasing h. The relative accuracy of the calculation of f′(x₀) (i.e. the sum of the relative truncation and round-off errors) is

$E_r = \frac{h}{2}\left|\frac{f''}{f'}\right| + \frac{2\varepsilon}{h}\left|\frac{f}{f'}\right|,$

which has a minimum, $\min(E_r) = 2\sqrt{\varepsilon\,|f f''|}\,/\,|f'|$, at $h_m = 2\sqrt{\varepsilon\,|f / f''|}$.
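The trade-off is easy to see numerically; a sketch for f = exp at x₀ = 1, where |f/f″| = 1 and hence h_m = 2√ε ≈ 3 × 10⁻⁸ in double precision:

import math

eps = 2.2e-16                        # machine epsilon of Python floats
h_m = 2 * math.sqrt(eps)

for h in [1e-4, 1e-6, h_m, 1e-10, 1e-12]:
    approx = (math.exp(1.0 + h) - math.exp(1.0)) / h
    print(f"h = {h:.1e}   error = {abs(approx - math.e):.2e}")
# the error falls as h shrinks toward h_m, then grows again as
# round-off takes over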