AMSC/CMSC 460 Computational Methods, Fall 2007
UNIT 5: Nonlinear Equations
Dianne P. O'Leary, (c) 2001, 2002, 2007

Solving Nonlinear Equations and Optimization Problems

Read Chapter 8. Skip Section 8.1.1. Skim Sections 8.2.1 and 8.2.2.

Motivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p. 8 of the integration notes):

    I(1)   = 2   = ω_1 + ω_2 + ω_3
    I(t)   = 0   = ω_1 t_1 + ω_2 t_2 + ω_3 t_3
    I(t^2) = 2/3 = ω_1 t_1^2 + ω_2 t_2^2 + ω_3 t_3^2
    I(t^3) = 0   = ω_1 t_1^3 + ω_2 t_2^3 + ω_3 t_3^3
    I(t^4) = 2/5 = ω_1 t_1^4 + ω_2 t_2^4 + ω_3 t_3^4
    I(t^5) = 0   = ω_1 t_1^5 + ω_2 t_2^5 + ω_3 t_3^5

Suppose we want to find the zeros (or roots) of a polynomial; that is, we want to find values x for which p(x) = 0, where p might be the polynomial

    p(x) = x^5 + 3x^3 + 2x + 4.

Galois proved (circa 1830) that there is no finite sequence of rational operations and root extractions that can solve this problem for every polynomial of degree 5 or higher, so we'll need a numerical algorithm.

Suppose we have a nonlinear least squares problem: for example, we believe our data should fit the model

    f(t) = a_1 e^{b_1 t} + a_2 e^{b_2 t},

where a_1, a_2, b_1, b_2 are unknown. If we measure 20 data points (t_i, f_i), i = 1, ..., 20, we can try to minimize the sum of squares of the distances between the data points and the model's predictions:

    min_{a_1, a_2, b_1, b_2}  Σ_{i=1}^{20} (f(t_i) - f_i)^2.
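As a concrete illustration of that last objective, here is a minimal MATLAB sketch of the sum-of-squares function. The name expmodel_ss, the packing x = [a_1; a_2; b_1; b_2], and the data vectors t and fdata are my own choices, not part of the notes:

    % Sum of squared residuals for the two-exponential model.
    % x = [a1; a2; b1; b2]; t and fdata are column vectors of measured data.
    function S = expmodel_ss(x, t, fdata)
        model = x(1)*exp(x(3)*t) + x(2)*exp(x(4)*t);   % model values f(t_i)
        S = sum((model - fdata).^2);                   % sum of squared distances
    end

A general-purpose minimizer such as fminsearch could be handed @(x) expmodel_ss(x, t, fdata), although the Gauss-Newton idea discussed near the end of these notes exploits the least-squares structure better.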

The plan

  - Methods for solving one nonlinear equation in one variable (8.1): the bisection method, Newton's method, Newton variants, and the practical algorithm fzero.
  - Methods for solving systems of nonlinear equations (8.4).
  - Methods for minimizing a function of a single variable (8.2).
  - Methods for minimizing multivariate functions (8.3).

Note 1: If we try to minimize a differentiable function F(x), then calculus tells us that a necessary condition for x* to be a minimizer is that F'(x*) = 0. Thus, if we can solve nonlinear equations, then we can solve minimization problems, although each candidate solution must be checked to see that it is a minimizer rather than a maximizer or an inflection point.

Note 2: A solution x* of a system of equations f(x) = 0 is called a zero of the system.

Note 3: Van Loan's Section 2.4.2 discusses inverse interpolation, one way to get an approximate zero of a function.

One Nonlinear Equation, One Variable

Problem: Given a continuous function f(x), with as many continuous derivatives as we need for a particular method, find a point x* so that f(x*) = 0.

Bisection Method

Suppose we are given two points a and b so that f(a)f(b) < 0. Then, since f has opposite signs at the endpoints of the interval [a, b], and f is continuous, there must be a point x* in [a, b] so that f(x*) = 0. Let's guess that that point is the midpoint m = (a + b)/2. Then one of three things must be true:

  1. f(m) = 0, so x* = m and we are finished.
  2. f(m) has the same sign as f(a). Then there must be a root of f(x) in the interval [m, b], which has half the length of our previous interval.
  3. f(m) has the same sign as f(b), so there is a root in [a, m].

After 1 iteration, our interval has been reduced by a factor of 1/2. After k iterations, it has been reduced by a factor of 1/2^k, so we terminate when the interval is small enough. See the code on p. 280 of Van Loan; a compact sketch of the same loop appears below.

Advantages:
  - This is a foolproof algorithm, simple to implement, with a guaranteed convergence rate.
  - It doesn't require derivative evaluations.
  - Cost per iteration = 1 function value: very cheap.

Disadvantages:
  - Slow convergence. We would like to do better.
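Here is a minimal MATLAB sketch of the bisection loop, assuming f is a function handle and [a, b] already brackets a sign change; the interval tolerance is an arbitrary choice:

    % Bisection: repeatedly halve a bracketing interval [a, b] with f(a)f(b) < 0.
    fa = f(a);
    while (b - a) > 1e-10
        m  = (a + b)/2;          % midpoint guess
        fm = f(m);
        if fm == 0
            a = m; b = m;        % exact root found
        elseif sign(fm) == sign(fa)
            a = m; fa = fm;      % root lies in [m, b]
        else
            b = m;               % root lies in [a, m]
        end
    end
    root = (a + b)/2;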

Newton and Its Variants

Let's use our polynomial approximation machinery. Suppose we have the same information as in bisection: (a, f(a)) and (b, f(b)). Bisection predicts the root to be the midpoint (a + b)/2. Is there a better prediction? Suppose we form the linear polynomial that interpolates this data:

    p(x) = f(a) + (f(b) - f(a))/(b - a) * (x - a).

Then a prediction of the point where f(x) = 0 is the point where p(x) = 0, and this is

    x = a - (b - a)/(f(b) - f(a)) * f(a).

This gives us the finite difference Newton algorithm: choose an initial guess a, find a nearby point b, and use the formula to predict a new point x; then replace a by x and repeat. This algorithm requires 2 function values per iteration.

In the secant algorithm, the nearby point is chosen to be the next-newest guess. This reduces the cost to 1 function value per iteration.

In the Newton method, we replace the formula by its limit as b approaches a:

    x = a - lim_{b -> a} (b - a)/(f(b) - f(a)) * f(a) = a - f(a)/f'(a).

The cost is 1 function value and 1 derivative value per iteration.

How fast is Newton? Your book on p. 285 presents a theorem showing that if f' doesn't change sign in a neighborhood of x*, f is not too nonlinear, and we start the iteration close enough to x*, then the convergence rate is quadratic:

    |x - x*| <= c |a - x*|^2,

where c is a constant. This is very fast convergence: if our initial error is 1/2 (and c = 1), then after 1 iteration the error is at most 1/4, after 2 iterations at most 1/16, after 3 iterations at most 1/256, etc.
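A minimal MATLAB sketch of the scalar Newton iteration, applied to the motivating polynomial p(x) = x^5 + 3x^3 + 2x + 4; the starting guess, tolerance, and iteration cap are arbitrary choices:

    % Newton's method for one equation: x <- x - f(x)/f'(x).
    f  = @(x) x.^5 + 3*x.^3 + 2*x + 4;
    fp = @(x) 5*x.^4 + 9*x.^2 + 2;      % derivative f'(x)

    x = -1;                             % initial guess
    for k = 1:50
        x = x - f(x)/fp(x);
        if abs(f(x)) < 1e-12, break; end
    end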

Advantages:
  - When Newton works, it converges very quickly.

Disadvantages:
  - It requires evaluation of the derivative, and this is sometimes expensive.
  - It is notoriously unreliable.

How fast is the Secant Method?

If the method converges, then

    |x - x*| <= c |a - x*|^α,   where   α = (1 + sqrt(5))/2 ≈ 1.6.

This intriguing number is related to the Fibonacci series, and the proof of the result is interesting, but we'll skip it.

Fzero

We want to combine the reliability of bisection with the speed of the Newton variants. Idea: we first determine an interval [a, b] with f(a)f(b) < 0. At each iteration, we reduce the length of this interval in one of two ways:

  - Compute the secant guess s. If this gives us a new interval [a, s] or [s, b] that has length not much longer than half the old length, then we accept it.
  - Otherwise, we use bisection at this iteration.

This algorithm is a variant of one by Dutch authors including Dekker, improved by Richard Brent and others.
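In MATLAB this strategy is available as fzero. As a quick illustration, applied to the motivating polynomial from the start of these notes (the bracket [-2, 0] is my own choice; p(-2) < 0 < p(0)):

    % Find a real root of p(x) = x^5 + 3x^3 + 2x + 4 inside a bracketing interval.
    p = @(x) x.^5 + 3*x.^3 + 2*x + 4;
    x = fzero(p, [-2, 0]);    % Brent/Dekker-style mix of bisection and interpolation steps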

Systems of Nonlinear Equations

First let's use an example to illustrate some of the basics.

Example: Let the function f(x) be defined by

    f_1(x_1, x_2) = x_1^3 + cos(x_2)
    f_2(x_1, x_2) = x_1 x_2^2 - x_2^3.

Then in solving f(x) = 0, we look for a point (x_1, x_2) where both f_1 and f_2 are zero.

The derivative of the function is called the Jacobian matrix. It is the matrix J(x) defined by

    J(x) = [ ∂f_1/∂x_1   ∂f_1/∂x_2 ]  =  [ 3x_1^2     -sin(x_2)          ]
           [ ∂f_2/∂x_1   ∂f_2/∂x_2 ]     [ x_2^2      2x_1 x_2 - 3x_2^2  ]

Example: See nonlindemo.m.

Newton's Method

For a single variable, Newton's method looked like x = a - f(a)/f'(a). Its geometric interpretation is that we follow the tangent line at a until it hits zero. For the multivariate case, we follow the tangent plane. The computation involves the inverse of the Jacobian matrix instead of the inverse of the derivative:

    x = a - J(a)^{-1} f(a).

To implement this, we need Matlab functions to evaluate both the function and the Jacobian matrix. Since we know that we should (almost) never invert a matrix, we actually implement the algorithm by solving the linear system

    J(a) p = -f(a)

for p and then forming x = a + p.
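A minimal MATLAB sketch of this iteration on the example system above; the starting guess, tolerance, and iteration cap are arbitrary choices, and nonlindemo.m remains the fuller demonstration:

    % Newton's method for the example system f(x) = 0.
    f = @(x) [ x(1)^3 + cos(x(2));
               x(1)*x(2)^2 - x(2)^3 ];
    J = @(x) [ 3*x(1)^2,  -sin(x(2));
               x(2)^2,     2*x(1)*x(2) - 3*x(2)^2 ];

    x = [-1; -1];                 % initial guess
    for k = 1:20
        p = -J(x) \ f(x);         % solve J(x) p = -f(x); do not form inv(J)
        x = x + p;
        if norm(f(x)) < 1e-12, break; end
    end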

If the Jacobian is too hard or too expensive...

...then we can replace it by a finite difference approximation:

    ∂f_i(x)/∂x_j ≈ (f_i(x + h e_j) - f_i(x)) / h,

where h is a small scalar and e_j is the jth column of the identity matrix, j = 1, ..., n. Note that this requires an extra n function evaluations.

Minimizing a Function of a Single Variable

Caution: We look for local minimizers, the best in their neighborhood, rather than global minimizers, the best in the world.

For nonlinear equations we had bisection; for function minimization we have Golden Section Search. We won't discuss the details; just know that it involves evaluating the function at a pattern of points and keeping the best. The interval containing the minimizer is reduced by a factor of approximately 0.618 each iteration, a linear rate of convergence.

For nonlinear equations we had Newton's method; for function minimization we have Parabolic Fit. This involves fitting a quadratic function (a parabola) to three points and approximating the minimizer of the function by the minimizer of the parabola. The convergence rate is superlinear, better than linear, but the method is not as reliable as Golden Section.

fminbnd

This is the Matlab function that minimizes a function of a single variable. Just like fzero, it combines a slow, reliable algorithm (Golden Section) with a fast but unreliable one (Parabolic Fit) to obtain a fast, reliable algorithm. (The book may talk about an older version of the function, called fmin.)
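For example, the calling pattern looks like this (the function and bracket are my own choices, just for illustration):

    % Minimize cos(x) on the interval [3, 4]; the minimizer is pi.
    [xmin, fval] = fminbnd(@cos, 3, 4);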

Minimizing Multivariate Functions

First let's use an example to illustrate some of the basics. Suppose we want to solve

    min_{x in R^2} F(x),   where   F(x) = x_1 x_2 + e^{x_1 x_2}.

Then the gradient (derivative) of the function F(x) is

    g(x) = ∇F(x) = [ ∂F(x)/∂x_1 ]  =  [ x_2 + x_2 e^{x_1 x_2} ]
                   [ ∂F(x)/∂x_2 ]     [ x_1 + x_1 e^{x_1 x_2} ]

and the second-derivative matrix, the Hessian of the function, is also the Jacobian of the gradient:

    H(x) = [ ∂^2F/∂x_1^2     ∂^2F/∂x_1∂x_2 ]  =  [ x_2^2 e^{x_1 x_2}                        1 + x_1 x_2 e^{x_1 x_2} + e^{x_1 x_2} ]
           [ ∂^2F/∂x_2∂x_1   ∂^2F/∂x_2^2   ]     [ 1 + x_1 x_2 e^{x_1 x_2} + e^{x_1 x_2}    x_1^2 e^{x_1 x_2}                     ]

Important note: Programming derivatives, gradients, Jacobians, and Hessians is tedious and error-prone. There are automatic systems to derive these functions from your code for function evaluation. One of the best is called Adifor, and there are also packages for use with Matlab.

The Steepest Descent Algorithm

The gradient is the direction of steepest ascent; i.e., if you are standing on a mountain in the fog, then the steepest uphill direction is the direction of the gradient g(x), and the steepest downhill direction is the opposite direction, -g(x). This insight leads to the steepest descent algorithm: take a step in the direction of the negative gradient, and either step that way until the function stops decreasing (you start going uphill on the mountain) or take a fixed-length step, as long as it does not bring you too far uphill.
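A minimal MATLAB sketch of fixed-step steepest descent; the quadratic test function, starting point, step length, and tolerance below are my own choices for illustration:

    % Fixed-step steepest descent on a simple quadratic test function.
    F = @(x) (x(1) - 1)^2 + 10*(x(2) + 2)^2;   % objective (shown for reference)
    g = @(x) [ 2*(x(1) - 1);                   % its gradient
               20*(x(2) + 2) ];

    x = [5; 5];          % starting point
    alpha = 0.05;        % fixed step length
    for k = 1:500
        x = x - alpha*g(x);               % step along the negative gradient
        if norm(g(x)) < 1e-8, break; end  % stop when the gradient is tiny
    end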

Advantages:
  - The algorithm is relatively easy to program.
  - It requires no second derivatives.

Disadvantages:
  - The algorithm is slow!

Alternatives to Steepest Descent

Your book really does not discuss any good algorithms for minimizing general functions of several variables. Here is some advice:

  1. If gradients are available, the best algorithms are conjugate gradients or, if n^2 storage is also available, quasi-Newton algorithms.
  2. If Hessians are available, a quality implementation of Newton's method is a good alternative.
  3. If no derivatives are available, a pattern search method is usually used. The Nelder-Mead simplex algorithm (different from the simplex algorithm for linear programming) is popular and is available in Matlab as fminsearch, but more recent pattern search methods are better.

If the Minimization Problem is Nonlinear Least Squares...

...then a very good alternative is the Gauss-Newton algorithm, which applies a Newton-like method to finding a zero of the gradient. Instead of computing the Hessian matrix exactly, we approximate it by J^T J, where J is the Jacobian matrix of the residual vector with components f(t_i) - f_i, i = 1, ..., m. This method works well as long as the model is a good fit to the data.
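A minimal MATLAB sketch of Gauss-Newton for the two-exponential model from the motivation section. The residual and Jacobian handles, the initial guess, and the tolerances are my own choices; it assumes the data vectors t and fdata are in the workspace, and omits the damping and line-search safeguards that production codes add:

    % Gauss-Newton for f(t) = a1*exp(b1*t) + a2*exp(b2*t); x = [a1; a2; b1; b2].
    r  = @(x) x(1)*exp(x(3)*t) + x(2)*exp(x(4)*t) - fdata;      % residual vector
    Jr = @(x) [ exp(x(3)*t), exp(x(4)*t), ...                   % Jacobian of the residuals
                x(1)*t.*exp(x(3)*t), x(2)*t.*exp(x(4)*t) ];

    x = [1; 1; -1; -2];                    % initial guess (arbitrary)
    for k = 1:100
        J = Jr(x);
        p = -(J'*J) \ (J'*r(x));           % Gauss-Newton step: (J'J) p = -J' r
        x = x + p;
        if norm(J'*r(x)) < 1e-8, break; end
    end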

Final Words

  - f(a)f(b) < 0 is not a foolproof test. Why? Hint: underflow/overflow.
  - To find complex roots, we can use Newton or secant with complex initial guesses.
  - If you are finding the roots of a polynomial, make sure that it is not an eigenvalue problem det(A - λI) = 0 in disguise. If it is, you will almost surely get better results by working with the original matrix and using eig to find the eigenvalues.
  - The multivariate methods discussed in the book are line search methods. Trust region methods are newer and equally useful.
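For instance, for the eigenvalue point above (the matrix here is an arbitrary example of my own):

    % Characteristic-polynomial roots computed two ways.
    A = [2 1 0; 1 3 1; 0 1 4];
    lambda = eig(A);          % work with the matrix directly (preferred)
    mu = roots(poly(A));      % form det(A - lambda*I) and find its roots: typically less accurate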