MATH ASSIGNMENT 03 SOLUTIONS


MATH444.0 ASSIGNMENT 03 SOLUTIONS

4.3 Newton's method can be used to compute reciprocals, without division. To compute $1/R$, let $f(x) = x^{-1} - R$ so that $f(x) = 0$ when $x = 1/R$. Write down the Newton iteration for this problem and compute (by hand or with a calculator) the first few Newton iterates for approximating $1/3$ starting with $x_0 = 0.5$ and not using any division. What happens if you start with $x_0 = 1$? For positive $R$, use the theory of fixed point iteration to determine an interval about $1/R$ from which Newton's method will converge to $1/R$.

Solution. Given $f(x) = x^{-1} - R$, we have $f'(x) = -x^{-2}$. The Newton iteration for this problem is then

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} = x_k - \frac{x_k^{-1} - R}{-x_k^{-2}} = 2x_k - R x_k^2.$$

To approximate $1/3$, we use the above with $R = 3$. With $x_0 = 0.5$ we get

$$x_1 = 2(0.5) - 3(0.5)^2 = 0.25$$
$$x_2 = 2(0.25) - 3(0.25)^2 = 0.3125$$
$$x_3 = 2(0.3125) - 3(0.3125)^2 = 0.33203125$$
$$x_4 = 2(0.33203125) - 3(0.33203125)^2 = 0.3333282470703125$$

which demonstrates rapid convergence to $0.\overline{3}$. On the other hand, with $x_0 = 1$, we see

$$x_1 = 2(1) - 3(1)^2 = -1$$
$$x_2 = 2(-1) - 3(-1)^2 = -5$$
$$x_3 = 2(-5) - 3(-5)^2 = -85$$
$$x_4 = 2(-85) - 3(-85)^2 = -21845$$

which demonstrates rapid divergence.

Considering Newton's method as a fixed-point iteration, we see that it will converge for $x_0$ chosen in an interval $[a, b]$ which the function $g(x) = 2x - Rx^2$ maps into itself and on which $g'$ exists and $|g'|$ is strictly bounded above by 1. Since $g$ is a polynomial and therefore differentiable, we have $g'(x) = 2 - 2Rx$, and so

$$|g'(x)| < 1 \iff \frac{1}{2R} < x < \frac{3}{2R}.$$

At the two endpoints $|g'| = 1$, but $g$ sends both endpoints to $3/(4R)$, which lies strictly inside the interval, so the closed interval can still be used. All that remains is to show that $g$ maps this interval into itself. To do this, maximize and minimize $g$ on the interval $[1/(2R), 3/(2R)]$. The maximum occurs at $x = 1/R$ and is $1/R$, while the minimum occurs at either endpoint and takes the value $3/(4R)$. Clearly $g$ maps the interval into itself, and therefore any choice of initial guess $x_0$ in $[1/(2R), 3/(2R)]$ gives convergence.
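The iterates for the reciprocal problem above can be reproduced with a short script. This is a minimal sketch in Python rather than the Matlab used elsewhere in these solutions; the function name `newton_reciprocal` is hypothetical and chosen only for illustration.

```python
def newton_reciprocal(R, x0, steps):
    """Iterate x_{k+1} = 2*x_k - R*x_k**2 (no division); return all iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(2 * x - R * x * x)
    return xs


# Starting inside [1/(2R), 3/(2R)] = [1/6, 1/2] for R = 3: converges to 1/3.
good = newton_reciprocal(3, 0.5, 4)
# Starting at x0 = 1, outside that interval: the iterates run away.
bad = newton_reciprocal(3, 1.0, 4)

print(good)
print(bad)
```

Every intermediate value here is a dyadic rational with few significant bits, so the floating-point iterates agree exactly with the hand computation.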

4.2 Let $\varphi(x) = (x^2 + 4)/5$.

a) Find the fixed point(s) of $\varphi(x)$.

Solution. The fixed points of $\varphi$ satisfy $x = \frac{x^2 + 4}{5}$ and are the roots of $f(x) = x^2 - 5x + 4$. The fixed points of $\varphi$ are $x = 1, 4$.

b) Would the fixed point iteration $x_{k+1} = \varphi(x_k)$ converge to a fixed point in the interval $[0, 2]$ for all initial guesses $x_0 \in [0, 2]$?

Solution. In order for fixed-point iteration to converge for all initial guesses $x_0 \in [0, 2]$, it suffices that $\varphi$ map $[0, 2]$ into itself and that $|\varphi'(x)| < 1$ for all $x \in [0, 2]$. Note that $\varphi'(x) = \frac{2x}{5}$. On $[0, 2]$, $0 \le \varphi'(x) \le \frac{4}{5}$. Also, the minimum value of $\varphi(x)$ on this interval is $\frac{4}{5} \ge 0$ and the maximum value is $\frac{8}{5} \le 2$. Thus $\varphi$ satisfies the hypotheses of the fixed-point theorem, and so fixed-point iteration converges to $x = 1$ for all initial guesses in $[0, 2]$.

4.6 Steffensen's method for solving $f(x) = 0$ is defined by

$$x_{k+1} = x_k - \frac{f(x_k)}{g_k}, \qquad \text{where} \qquad g_k = \frac{f(x_k + f(x_k)) - f(x_k)}{f(x_k)}.$$

Show that this is quadratically convergent, under suitable hypotheses.

Solution. Let

$$g(x) = \frac{f(x + f(x)) - f(x)}{f(x)}.$$

Supposing that $x^*$ is a simple root of $f$ (that is, $f'(x^*) \ne 0$), then $g(x)$, while not defined at $x^*$, satisfies, by L'Hôpital's rule,

$$\lim_{x \to x^*} g(x) = \lim_{x \to x^*} \frac{f(x + f(x)) - f(x)}{f(x)} = \lim_{x \to x^*} \frac{f'(x + f(x))\,(1 + f'(x)) - f'(x)}{f'(x)} = f'(x^*).$$

Similarly, $g$ is differentiable in a neighborhood of $x^*$, again presuming that $f$ is sufficiently smooth. Given

$$\phi(x) = x - \frac{f(x)}{g(x)},$$

and extending $g$ by continuity so that $g(x^*) = f'(x^*)$, we get $\phi'(x^*) = 1 - f'(x^*)/g(x^*) = 0$, that is,

$$\lim_{x \to x^*} \frac{\phi(x) - \phi(x^*)}{x - x^*} = 0,$$

and therefore Steffensen's method converges quadratically to the root $x^*$.

5.2 Write down the IEEE double-precision representation for the decimal number 50.2, using round to nearest.

Solution. Since 50.2 is positive, the sign bit here is $s = 0$. Also, $2^5 = 32 \le 50.2 < 2^6 = 64$, so the exponent is 5. Adding the bias, then, the exponent bits will be the bits that represent $1028 = 10000000100_2 = \mathtt{0x404}$, the last number being the hexadecimal representation.

All that remains is to determine the mantissa. To do so, we need to note that $50 = 32 + 16 + 2 = 110010_2$. Also, we need to determine the binary representation of 0.2. Note that

$$0.2 = \frac{1}{5} = \frac{1}{8} + \frac{3}{40}.$$

Here

$$\frac{3}{40} = \frac{1}{16} + \frac{1}{80}.$$

So far, then, we have $\frac{1}{5} = \frac{1}{8} + \frac{1}{16} + \frac{1}{80}$, so

$$\frac{1}{80} = \frac{1}{16}\cdot\frac{1}{5} = \frac{1}{16}\left(\frac{1}{8} + \frac{1}{16} + \frac{1}{80}\right) = 2^{-7} + 2^{-8} + 2^{-8}\cdot\frac{1}{5}$$

$$0.2 = 2^{-3} + 2^{-4} + 2^{-7} + 2^{-8} + 2^{-8}(0.2).$$

Continuing, then it is easy to see that

$$0.2 = 2^{-3} + 2^{-4} + 2^{-7} + 2^{-8} + 2^{-11} + 2^{-12} + 2^{-12}(0.2)$$
$$\phantom{0.2} = 2^{-3} + 2^{-4} + 2^{-7} + \cdots + 2^{-44} + 2^{-47} + 2^{-48} + 2^{-48}(0.2).$$

And so

$$50.2 = 2^5 + 2^4 + 2^1 + 2^{-3} + 2^{-4} + 2^{-7} + \cdots + 2^{-44} + 2^{-47} + 2^{-48} + 2^{-48}(0.2)$$
$$\phantom{50.2} = 2^5\left(1 + 2^{-1} + 2^{-4} + 2^{-8} + 2^{-9} + 2^{-12} + \cdots + 2^{-49} + 2^{-52} + 2^{-53} + 2^{-53}(0.2)\right).$$

Represented in binary, then, we have

$$50.2 = 2^5\left(1.1001\,0001\,1001\,1001 \cdots 1001_2 + 2^{-53} + 2^{-53}(0.2)\right).$$

Rounding to nearest causes the last four bits to change from $1001_2$ to $1010_2$ because the bit after the fifty-second bit is 1 (that is, $2^{-53}$ appears there). Putting all of the above together, we see that the double-precision representation of 50.2 in round to nearest is

    0 | 10000000100 | 1001 0001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010

where the bars show the separation between the sign bit, the exponent and the mantissa. In hexadecimal this is written as 0x404919999999999a.

5.3 What is the gap between 2 and the next larger double-precision number?

Solution. To determine the gap between 2 and the next larger number, determine the exponent for 2, and multiply by $\varepsilon = 2^{-52}$. Here $2^1 \le 2 < 2^2$, therefore the gap between 2 and the next larger double-precision number is $2^{-52+1} = 2^{-51}$. This can be verified using Matlab:

    >> log2(eps(2))
       -51

5.4 What is the gap between 201 and the next larger double-precision number?

Solution. To determine the gap between 201 and the next larger number, determine the exponent for 201, and multiply by $\varepsilon = 2^{-52}$. Here $2^7 = 128 \le 201 < 2^8$, therefore the gap between 201 and the next larger double-precision number is $2^{-52+7} = 2^{-45}$. This can be verified using Matlab:

    >> log2(eps(201))
       -45

3) a) Show that

$$\ln\left(x - \sqrt{x^2 - 1}\right) = -\ln\left(x + \sqrt{x^2 - 1}\right).$$

Solution. Note that

$$\ln\left(x - \sqrt{x^2 - 1}\right) + \ln\left(x + \sqrt{x^2 - 1}\right) = \ln\left(\left(x - \sqrt{x^2 - 1}\right)\left(x + \sqrt{x^2 - 1}\right)\right) = \ln\left(x^2 - (x^2 - 1)\right) = \ln(1) = 0.$$

Therefore

$$\ln\left(x - \sqrt{x^2 - 1}\right) = -\ln\left(x + \sqrt{x^2 - 1}\right).$$

b) Which of the two formulas is more suitable for numerical computation? Explain why and provide a numerical example in which the difference in accuracy is evident.

Solution. The second formula, that is $-\ln\left(x + \sqrt{x^2 - 1}\right)$, is more suitable for numerical computing because it doesn't suffer the cancellation that the first formula does. The following Matlab script demonstrates that the first formula is less accurate in the worst case than the second formula:

    f = @(x) log(x-sqrt(x.^2-1));   % First formula
    g = @(x) -log(x+sqrt(x.^2-1));  % Second formula
    % Note that with x = sqrt(y^2 + 1), then exp(f(x)) == (x-y) should
    % be the case. We can use this to evaluate the accuracy in general
    % of the two formulae. Let's look at
    err = @(f, x, y) (abs(exp(f(x))-(x-y))./abs(x))./eps;
    % which gives the relative error in terms of unit round-off.
    y = 0:10000;
    x = sqrt(y.^2+1);
    figure;
    plot(x, err(f, x, y));
    hold on;
    plot(x, err(g, x, y));
    set(gca, 'XLim', [0 ceil(max(x))]);
    legend('Error in First formula', 'Error in Second formula');

The generated plot looks like this:

[Figure: error in the first formula and error in the second formula, in units of round-off, plotted against x.]

The other trouble is that the first formula results in $-\infty$ for all double-precision numbers larger than $2^{53}$, because $x^2 - 1$ rounds to $x^2$ and the argument of the logarithm becomes 0, while the second formula delays overflow until after $\sqrt{\text{realmax}} \approx 2^{512}$. Here is an example, in which both tests return true:

    >> x = 2.^(27:511);
    >> all(isinf(f(x)))
    >> all(isfinite(g(x)))
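The hand computations in 5.2 through 5.4 can be cross-checked by inspecting the stored bits directly. The following is a minimal sketch in Python, not part of the original solutions; the helper name `double_bits` is hypothetical, `math.ulp` requires Python 3.9 or later, and the number in 5.4 is taken to be 201, an assumption consistent with $2^7 \le x < 2^8$ and the stated gap $2^{-45}$.

```python
import math
import struct


def double_bits(x):
    """Split the IEEE double x into (sign, exponent, fraction) bit strings."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(word, "064b")
    return bits[0], bits[1:12], bits[12:], format(word, "016x")


sign, exponent, fraction, hex_word = double_bits(50.2)
print(sign, exponent, fraction)
print("0x" + hex_word)

# math.ulp(x) is the gap between x and the next larger double (for x > 0).
gap_2 = math.log2(math.ulp(2.0))
gap_201 = math.log2(math.ulp(201.0))
print(gap_2, gap_201)
```

Here `math.ulp` plays the role of Matlab's `eps(x)` for positive finite arguments.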
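For problem 3 b), a single concrete value already makes the loss of accuracy visible. The sketch below is written in Python rather than Matlab and is only illustrative: the test point $x = 10^6$ is an assumed choice, and `math.acosh` is used as the reference because $\operatorname{acosh}(x) = \ln\left(x + \sqrt{x^2 - 1}\right)$.

```python
import math


def first(x):
    """ln(x - sqrt(x^2 - 1)): subtracts two nearly equal numbers."""
    return math.log(x - math.sqrt(x * x - 1.0))


def second(x):
    """-ln(x + sqrt(x^2 - 1)): adds two positive numbers, no cancellation."""
    return -math.log(x + math.sqrt(x * x - 1.0))


x = 1.0e6  # assumed test point, large enough for cancellation to show
reference = -math.acosh(x)

err_first = abs(first(x) - reference) / abs(reference)
err_second = abs(second(x) - reference) / abs(reference)
print(f"first formula:  relative error {err_first:.2e}")
print(f"second formula: relative error {err_second:.2e}")
```

At this point the first formula loses roughly half of its significant digits, while the second stays at the level of round-off.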

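4) For the following expressions, state the numerical difficulties that may occur, and rewrite the formulas in a way that is more suitable for numerical computation:

a) $\sqrt{x + \frac{1}{x}} - \sqrt{x - \frac{1}{x}}$, where $x \gg 1$.

Solution. When $x$ is much larger than 1, $\frac{1}{x} \approx 0$ and so $\sqrt{x + \frac{1}{x}} - \sqrt{x - \frac{1}{x}} \approx \sqrt{x} - \sqrt{x}$, demonstrating catastrophic cancellation. This should be rewritten as

$$\sqrt{x + \tfrac{1}{x}} - \sqrt{x - \tfrac{1}{x}} = \left(\sqrt{x + \tfrac{1}{x}} - \sqrt{x - \tfrac{1}{x}}\right) \frac{\sqrt{x + \tfrac{1}{x}} + \sqrt{x - \tfrac{1}{x}}}{\sqrt{x + \tfrac{1}{x}} + \sqrt{x - \tfrac{1}{x}}} = \frac{\left(x + \tfrac{1}{x}\right) - \left(x - \tfrac{1}{x}\right)}{\sqrt{x + \tfrac{1}{x}} + \sqrt{x - \tfrac{1}{x}}} = \frac{2}{x\left(\sqrt{x + \tfrac{1}{x}} + \sqrt{x - \tfrac{1}{x}}\right)}.$$

This formulation no longer suffers cancellation when $x \gg 1$ and will underflow more gradually as $x \to \infty$.

b) $\sqrt{\frac{1}{a^2} + \frac{1}{b^2}}$, where $a \approx 0$ and $b \approx 1$.

Solution. For $a \approx 0$ the value will overflow whenever $\frac{1}{a^2} > r_{\max}$, that is, whenever $|a| < \frac{1}{\sqrt{r_{\max}}}$, where $r_{\max}$ = realmax, the largest finite floating point number in a given floating-point system. As a result, getting $a^2$ out of the denominator is desired. One possible way to do this is

$$\sqrt{\frac{1}{a^2} + \frac{1}{b^2}} = \sqrt{\frac{1}{a^2}\left(1 + \frac{a^2}{b^2}\right)} = \frac{1}{|a|}\sqrt{1 + \left(\frac{a}{b}\right)^2}.$$

In the case that $|b| < 1$ the above should be preferred, and it won't suffer overflow until $a$ becomes subnormal. If $|b| > 1$, $\frac{\sqrt{b^2 + a^2}}{|ab|}$ might be slightly preferred since that will delay overflow.

c) $a^2 + b^2 - 2ab\sin(\theta)$, where $a \approx b$ and $\theta \approx \frac{\pi}{2}$.

Solution. When $a \approx b$ and $\theta \approx \frac{\pi}{2}$, $a^2 + b^2 - 2ab\sin(\theta) \approx 0$ and, in floating point, may in fact turn out to be negative, which will never occur in exact arithmetic. For example,

    >> a = 0.3; b = a-eps(a)/2; theta = (pi/2);
    >> a.^2 + b.^2-2*a.*b.*sin(theta)
      -2.7756e-17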

To maintain the property that this expression is never negative, rewrite it as

$$a^2 + b^2 - 2ab\sin(\theta) = a^2\left(\sin^2(\theta) + \cos^2(\theta)\right) + b^2 - 2ab\sin(\theta)$$
$$= a^2\sin^2(\theta) - 2ab\sin(\theta) + b^2 + a^2\cos^2(\theta)$$
$$= (a\sin(\theta) - b)^2 + (a\cos(\theta))^2.$$

This is the sum of non-negative numbers and it will always be non-negative.

5) Consider the linear system

$$\begin{pmatrix} a & b \\ b & a \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

with $a, b > 0$; $a \ne b$.

a) If $a \approx b$, what is the numerical difficulty in solving this linear system?

Solution. It is easy to see that the correct solution to this linear system is

$$x = \frac{a}{a^2 - b^2}, \qquad y = \frac{-b}{a^2 - b^2}.$$

When $a \approx b$ the denominator suffers cancellation and becomes very small, resulting in inaccuracies when computing $x, y$. Having $a \approx b$ results in a nearly singular system, and in fact when $a = b$ the system is singular and there is no solution.

b) Suggest a numerically stable formula for computing $z = x + y$ given $a$ and $b$.

Solution. Adding the two equations gives $(a + b)x + (a + b)y = 1$, so the sum $z = x + y$ can be computed simply as $z = \frac{1}{a+b}$. As long as $a, b$ are both positive and not subnormal, $z$ will be easily computed as there is no cancellation.

c) Determine whether the following statement is true or false, and explain why: When $a \approx b$, the problem of solving the linear system is ill-conditioned but the problem of computing $x + y$ is not ill-conditioned.

Solution. This statement is true. When $a \approx b$, it is hard to compute $x, y$ accurately because of cancellation in forming $a^2 - b^2$, and the closer $a$ is to $b$, the worse it gets. As demonstrated in part b), however, when both $a, b$ are positive, computing $x + y$ can be done easily and with good relative accuracy, regardless of the values of $a, b$, provided they are not too small.
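The cancellation in problem 4 a) and the effect of the rewritten formula can be seen at a single point. This is a small illustrative sketch in Python; the value $x = 10^8$ and the function names `naive` and `stable` are assumptions made for the example, and $x^{-3/2}$ is used as the reference because the exact value differs from it only by a relative amount of order $x^{-4}$.

```python
import math


def naive(x):
    """sqrt(x + 1/x) - sqrt(x - 1/x), evaluated as written."""
    return math.sqrt(x + 1.0 / x) - math.sqrt(x - 1.0 / x)


def stable(x):
    """Equivalent form 2 / (x * (sqrt(x + 1/x) + sqrt(x - 1/x)))."""
    return 2.0 / (x * (math.sqrt(x + 1.0 / x) + math.sqrt(x - 1.0 / x)))


x = 1.0e8              # assumed test point with x >> 1
reference = x ** -1.5  # exact value up to a relative error of order x**-4

naive_value = naive(x)
stable_value = stable(x)
print("naive: ", naive_value)
print("stable:", stable_value)
print("ref:   ", reference)
```

The two square roots in `naive` round to the same double at this point, so all significant digits of the difference are lost, while `stable` only adds and divides positive quantities.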
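The contrast in problem 5 between solving for $x$ and $y$ separately and computing $z = 1/(a+b)$ directly can be checked against exact rational arithmetic. The sketch below is in Python and is only an illustration; the values $a = 1 + 2^{-30}$, $b = 1$ are assumed test data chosen so that every intermediate step is easy to follow by hand.

```python
from fractions import Fraction

a = 1.0 + 2.0 ** -30  # assumed test values with a close to b
b = 1.0

# Route 1: solve for x and y with the closed-form solution, then add.
den = a * a - b * b   # cancellation: the 2**-60 term of a*a is rounded away
x = a / den
y = -b / den
z_via_xy = x + y

# Route 2: the stable formula from part b).
z_direct = 1.0 / (a + b)

# Exact value of 1/(a+b), using the exact rational values of the doubles.
z_exact = 1 / (Fraction(a) + Fraction(b))

err_via_xy = float(abs(Fraction(z_via_xy) - z_exact) / z_exact)
err_direct = float(abs(Fraction(z_direct) - z_exact) / z_exact)
print(f"x + y from the solved system: relative error {err_via_xy:.2e}")
print(f"1/(a+b):                      relative error {err_direct:.2e}")
```

With these inputs the first route is wrong from about the tenth significant digit on, while the direct formula is accurate to round-off.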