Lecture 8: Optimization

This lecture describes methods for the optimization of a real-valued function f(x) on a bounded real interval [a, b]. We will describe methods for determining the maximum of f(x) on [a, b], i.e., we solve

    max_{a ≤ x ≤ b} f(x).    (1)

The same methods applied to −f(x) determine the minimum of f(x) on [a, b].

We first note that if the derivative f'(x) exists, is continuous on [a, b], and is not too difficult to evaluate, then it can be attractive to solve (1) by first computing all distinct zeros of f'(x), say z_1 < z_2 < ... < z_l, in the interior of the interval [a, b], and then evaluating f(x) at these zeros and at the endpoints a and b. The z_j can be minima, maxima, or inflection points of f(x). The function f(x) achieves its maximum at one of these zeros or at the endpoints. The zeros of f'(x) can be computed by one of the methods of Lectures 6-7.

The remainder of this lecture describes methods that do not require evaluation of the derivative. These methods are attractive to use when f'(x) is either not available or very complicated to compute. The first method, Golden Section Search (GSS), is analogous to bisection. The second method applies interpolation by a quadratic polynomial.

Let N(x) denote an open real interval that contains x. The function f(x) is said to have a local maximum at x* if there is an open interval N(x*) such that

    f(x*) ≥ f(x)  for all  x ∈ N(x*) ∩ [a, b].

A solution of (1) is referred to as a global maximum of f(x) on [a, b]. There may be several global maxima. Every global maximum is a local maximum. The converse is, of course, not true. The optimization methods to be described determine a local maximum of f(x) in [a, b]. Sometimes it is known from the background of the problem that there is at most one local maximum, which then necessarily also is the global maximum. In general, however, a function can have several local maxima, which may not be global maxima. A function, of course, may have several global maxima.

Example 1. The function f(x) = sin(x) has global maxima at π/2 and 5π/2 in the interval [0, 3π]; see Figure 1.

Example 2. The function f(x) = e^{−x} sin(x) has local maxima at π/4 and 9π/4 in the interval [0, 3π]; see Figure 2. The local maximum at π/4 also is the global maximum.

Many optimization methods are myopic; they only know the function in a small neighborhood of the current approximation of the maximum. There are engineering techniques for steering myopic optimization methods towards a global maximum. These techniques are referred to as evolutionary methods, genetic algorithms, or simulated annealing. They were discussed by Gwenn Volkert in the fall.
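To illustrate the derivative-based approach described above, the following is a minimal sketch (not part of the original notes; the helper names, grid size, and tolerance are illustrative choices). It brackets the zeros of f'(x) on a coarse grid, refines each bracket by bisection as in Lectures 6-7, and then compares f(x) at the computed zeros and at the endpoints. Applied to Example 1, it locates the maximum of sin(x) on [0, 3π].

```python
import math

def bisect(g, a, b, tol=1e-10):
    """Bisection for a zero of g in [a, b]; assumes g(a) and g(b) differ in sign."""
    ga = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if ga * gm <= 0:
            b = m
        else:
            a, ga = m, gm
    return 0.5 * (a + b)

def maximize_via_derivative(f, df, a, b, n=200):
    """Bracket the zeros of df on a grid of n subintervals, refine them by
    bisection, and return the candidate (zero or endpoint) with largest f."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    candidates = [a, b]
    for x0, x1 in zip(xs, xs[1:]):
        if df(x0) * df(x1) <= 0:      # sign change: a zero of f' lies in [x0, x1]
            candidates.append(bisect(df, x0, x1))
    return max(candidates, key=f)

# Example 1: f(x) = sin(x), f'(x) = cos(x) on [0, 3*pi]; the maximum value 1
# is attained at x = pi/2 (and at x = 5*pi/2).
x_star = maximize_via_derivative(math.sin, math.cos, 0.0, 3 * math.pi)
print(x_star, math.sin(x_star))
```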

Golden Section Search

This method is analogous to bisection in the sense that the original interval [a_1, b_1] = [a, b] is replaced by a sequence of intervals [a_j, b_j], j = 2, 3, ..., of decreasing lengths, so that each interval [a_j, b_j] contains at least one local maximum of f(x). GSS divides each one of the intervals [a_j, b_j], j = 1, 2, ..., into three subintervals [a_j, x_j^left], [x_j^left, x_j^right], and [x_j^right, b_j], where x_j^left and x_j^right are cleverly chosen points with a_j < x_j^left < x_j^right < b_j. The method then discards the right-most or the left-most subinterval, so that the remaining interval is guaranteed to contain a local maximum of f(x).

Figure 1: The function f(x) = sin(x) for 0 ≤ x ≤ 3π. The function has two local maxima (at x = π/2 and x = 5π/2). The corresponding function values are marked by * (in red).

Thus, let a_1 < x_1^left < x_1^right < b_1 and evaluate f(x) at these points. If f(x_1^left) ≥ f(x_1^right), then f(x) has a local maximum in the interval [a_1, x_1^right] and we set [a_2, b_2] := [a_1, x_1^right]; otherwise f(x) has a local maximum in [x_1^left, b_1] and we set [a_2, b_2] := [x_1^left, b_1]. We now repeat the process for the interval [a_2, b_2], i.e., we select interior points x_2^left and x_2^right, such that a_2 < x_2^left < x_2^right < b_2, and evaluate f(x) at these points. Similarly as above we obtain the new interval [a_3, b_3], which is guaranteed to contain a local maximum of f(x). The computations proceed in this manner until a sufficiently small interval, say [a_k, b_k], which contains a local maximum of f(x), has been determined.

We would like to choose the interior points x_j^left and x_j^right so that the interval [a_{j+1}, b_{j+1}] is substantially shorter than the interval [a_j, b_j].

Example 3. The choices

    x_j^left = a_j + (1/3)(b_j − a_j),   x_j^right = a_j + (2/3)(b_j − a_j),    (2)

secure that the length of the new interval [a_{j+1}, b_{j+1}] is 2/3 of the length of the previous interval [a_j, b_j]. This subdivision requires that f(x) be evaluated at both x_j^left and x_j^right for every j.

There is a better choice of interior points than in Example 3. The following selection reduces the length of the intervals [a_j, b_j] faster than in Example 3 and requires only one function evaluation for every j ≥ 2. The trick is to choose the interior points x_j^left and x_j^right in [a_j, b_j] so that one of them becomes an endpoint of the next interval [a_{j+1}, b_{j+1}] and the other one becomes one of the interior points x_{j+1}^left or x_{j+1}^right of this interval. We then only have to evaluate f(x) at one new interior point. Let the interior points be given by

    x_j^left = a_j + ρ(b_j − a_j),   x_j^right = a_j + (1 − ρ)(b_j − a_j)    (3)

for some constant 0 < ρ < 1 to be determined.
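To make the compare-and-discard rule concrete before the parameter ρ is fixed, here is a minimal sketch (the helper name is illustrative, not from the notes) of one step with the choice (2) of Example 3, i.e., ρ = 1/3. It evaluates f(x) at both interior points and keeps the subinterval that is guaranteed to contain a local maximum.

```python
import math

def trisection_step(f, a, b):
    """One interval-reduction step with the interior points (2) of Example 3
    (rho = 1/3). Both interior points are evaluated; the retained interval
    has 2/3 of the length of [a, b] and contains a local maximum of f."""
    x_left = a + (b - a) / 3.0
    x_right = a + 2.0 * (b - a) / 3.0
    if f(x_left) >= f(x_right):
        return a, x_right          # keep [a, x_right]
    return x_left, b               # keep [x_left, b]

# Repeated steps shrink a bracket around a local maximum of sin(x) on [0, 3*pi].
a, b = 0.0, 3 * math.pi
for _ in range(10):
    a, b = trisection_step(math.sin, a, b)
print(a, b)   # a short interval containing the local maximum at x = pi/2
```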

Figure 2: The function f(x) = e^{−x} sin(x) for 0 ≤ x ≤ 3π. The function has two local maxima (at x = π/4 and x = 9π/4). The corresponding function values are marked by * (in red).

The choice ρ = 1/3 yields (2). Assume that f(x_j^left) ≥ f(x_j^right). Then f(x) has a local maximum in [a_j, x_j^right]. We set

    a_{j+1} := a_j,   b_{j+1} := x_j^right    (4)

and would like

    x_{j+1}^right = x_j^left.    (5)

The latter requirement determines the parameter ρ.

Figure 3: The bottom line depicts the interval [a_j, b_j] with the endpoints a_j and b_j marked by o (in red), and the interior points x_j^left and x_j^right marked by * (in red). The top line shows the interval [a_{j+1}, b_{j+1}] = [a_j, x_j^right] that may be determined in step j of GSS. The endpoints are marked by o (in red) and the interior points x_{j+1}^left and x_{j+1}^right by * (in red).

There are many ways to determine ρ. One of the simplest is based on the observation that

    (b_{j+1} − x_{j+1}^right) / (b_{j+1} − a_{j+1}) = (b_j − x_j^right) / (b_j − a_j),    (6)

cf. Figure 3. Using first (4) and (5), and subsequently (3), the left-hand side simplifies to

    (b_{j+1} − x_{j+1}^right) / (b_{j+1} − a_{j+1}) = (x_j^right − x_j^left) / (x_j^right − a_j) = (1 − 2ρ)(b_j − a_j) / ((1 − ρ)(b_j − a_j)) = (1 − 2ρ) / (1 − ρ).

The right-hand side of (6) can be simplified similarly. It follows from the right-hand side equation of (3) that x_j^right = b_j − ρ(b_j − a_j) and, therefore,

    (b_j − x_j^right) / (b_j − a_j) = ρ(b_j − a_j) / (b_j − a_j) = ρ.

Thus, equation (6) is equivalent to

    (1 − 2ρ) / (1 − ρ) = ρ,

which can be expressed as

    ρ^2 − 3ρ + 1 = 0.

The only root smaller than unity is given by

    ρ = (3 − √5) / 2 ≈ 0.382.

It follows that the lengths of the intervals [a_j, b_j] decrease as j increases according to

    b_{j+1} − a_{j+1} = x_j^right − a_j = (1 − ρ)(b_j − a_j) ≈ 0.618 (b_j − a_j).

In particular, the length decreases less in each step than for the bisection method of Lectures 6-7. We conclude that GSS is useful for determining a local maximum to low accuracy. If high accuracy is desired, then faster methods should be employed, such as the method of the following section.

Example 4. We apply GSS to the function of Example 1 with [a_1, b_1] = [0, 3π]. We obtain x_1^left = 3.60, x_1^right = 5.82, and f(x_1^left) ≈ f(x_1^right) ≈ −0.44. We therefore choose the next interval to be [a_2, b_2] = [0, x_1^right] and obtain x_2^left = 2.22 and x_2^right = x_1^left = 3.60. We have f(x_2^left) = 0.79, and it is clear how the method will determine the local maximum at x = π/2. The progress of GSS is illustrated by Figure 3. The reason that this particular local maximum is found is that we discard the subinterval [x_1^right, b_1] when f(x_1^left) ≥ f(x_1^right). Had the inequality instead been f(x_1^left) < f(x_1^right), then the other local maximum would have been found. This example illustrates that it generally is a good idea to graph the function first, and then choose [a_1, b_1] so that it contains the desired maximum.

Example 5. We apply GSS to the function of Example 2 with [a_1, b_1] = [0, 3π]. Similarly as in Example 4, we obtain x_1^left = 3.60, x_1^right = 5.82, f(x_1^left) ≈ −1.2·10^{−2}, and f(x_1^right) ≈ −1.2·10^{−3}. We therefore choose the next interval to be [a_2, b_2] = [x_1^left, b_1], and obtain x_2^left = x_1^right = 5.82 and x_2^right = 7.20. We evaluate f(x_2^right) ≈ 5.9·10^{−4}, and it is clear that the method will determine the local maximum at x = 9π/4. If we would have liked to determine the global maximum, then a smaller initial interval [a_1, b_1] containing x = π/4 should be used.
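The complete method can be summarized by the following minimal sketch (the name golden_section_search and the number of steps are illustrative choices, not prescribed by the notes). As derived above, every step after the first reuses one interior point and its function value, so only one new evaluation of f(x) is needed per step.

```python
import math

def golden_section_search(f, a, b, steps=20):
    """Golden section search for a local maximum of f on [a, b].
    Each step after the first requires only one new evaluation of f."""
    rho = 0.5 * (3.0 - math.sqrt(5.0))       # rho = (3 - sqrt(5))/2 ~ 0.382
    x_left = a + rho * (b - a)
    x_right = a + (1.0 - rho) * (b - a)
    f_left, f_right = f(x_left), f(x_right)
    for _ in range(steps):
        if f_left >= f_right:                # keep [a, x_right]
            b = x_right
            x_right, f_right = x_left, f_left
            x_left = a + rho * (b - a)
            f_left = f(x_left)
        else:                                # keep [x_left, b]
            a = x_left
            x_left, f_left = x_right, f_right
            x_right = a + (1.0 - rho) * (b - a)
            f_right = f(x_right)
    return 0.5 * (a + b)

# Example 4: for f(x) = sin(x) on [0, 3*pi] the first interior points are
# x_left ~ 3.60 and x_right ~ 5.82, and the method converges to the local
# maximum at x = pi/2.
print(golden_section_search(math.sin, 0.0, 3 * math.pi))
```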

Quadratic Interpolation

We found in Lectures 6-7 that the secant method gives faster convergence than bisection. Analogously, we will see that interpolation by a sequence of quadratic polynomials yields faster convergence towards a local maximum than GSS. The rationale is that GSS only uses the function values to determine whether the function at a new node is larger or smaller than at an adjacent node. We describe in this section a method based on interpolating the function by a sequence of quadratic polynomials and determining the maxima of the latter.

We proceed as follows: Let the interval [a_j, b_j] be determined in step j − 1 of GSS, and let x̂_j denote the point inside this interval at which f(x̂_j) is known, i.e., x̂_j is either x_j^left or x_j^right. Instead of evaluating f(x) at a new point determined by GSS, we fit a quadratic polynomial q_j(x) to the data

    (a_j, f(a_j)),   (b_j, f(b_j)),   (x̂_j, f(x̂_j)).

If this polynomial has a maximum in the interior of [a_j, b_j], then this will be our new interior point. Denote this point by x_j^polmax and evaluate f(x_j^polmax). We remark that it is easy to determine x_j^polmax. If the leading coefficient of q_j(x) is negative, then q_j(x) is concave (−q_j(x) is convex) and has a maximum. The point x_j^polmax is the zero of the linear polynomial q_j'(x).

Thus, assume that a_j < x_j^polmax < b_j and that q_j(x) achieves its maximum at x_j^polmax. We then discard the point in the set {a_j, x̂_j, b_j} furthest away from x_j^polmax. This gives us three nodes at which the function f(x) is known, and we can compute a new quadratic polynomial, whose maximum we determine, and so on.

Interpolation by a sequence of quadratic polynomials usually gives much faster convergence than GSS; however, it is not guaranteed to converge. Thus, quadratic interpolation has to be safeguarded. For instance, one may have to apply GSS when quadratic interpolation fails to converge to a maximum in the last interval determined by GSS.

Example 6. The quadratic polynomial determined by interpolating f(x) at the nodes {a_1, x_1^left, b_1} or at the nodes {a_1, x_1^right, b_1} in Example 5 is not concave, and therefore has no maximum in the interior of [a_1, b_1]. Hence, we have to reduce the interval by GSS before applying polynomial interpolation.

We finally comment on the accuracy of the local maximum computed by any optimization method. Assume that we have determined an approximation x̂ of the local maximum x*. Let f(x) have a continuous second derivative f''(x) in a neighborhood of x*. Then, in view of the fact that f'(x*) vanishes, Taylor expansion of f(x) at x* yields

    f(x̂) − f(x*) = (1/2)(x̂ − x*)^2 f''(x̃),

where x̃ lives in the interval with endpoints x* and x̂. This formula implies that the error in the computed maximum x̂ of the function f(x) can be expected to be proportional to the square root of the error in the function value f(x̂). In particular, we cannot expect to be able to determine a local maximum with a relative error smaller than the square root of machine epsilon.

Exercises

1. Apply GSS with initial interval [0, π] to determine the global maximum of the function in Example 2. Carry out 4 steps of GSS.

2. Apply quadratic interpolation with initial interval [0, π] to determine the global maximum of the function in Example 2. Carry out 4 steps. How does the accuracy of the computed approximate maximum compare with the accuracy of the approximate maximum determined in Exercise 1? Estimate empirically

the rate of convergence of the quadratic interpolation method. For instance, study the quotients

    s_j = |x_{j+1}^polmax − x*| / |x_j^polmax − x*|,   j = 1, 2, 3, ...,

or their logarithms.

3. Compute the minimum of the function in Example 2 by carrying out 4 GSS steps.

4. GSS is analogous to bisection, and the quadratic interpolation method is analogous to the secant method. The quadratic interpolation method applies local quadratic interpolation to determine a maximum, while the secant method applies local linear interpolation to determine a zero. Describe an analogue of regula falsi for optimization.
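As a possible starting point for Exercise 2, here is a minimal, unsafeguarded sketch of one quadratic interpolation step (the helper name quadratic_max_step and the particular starting nodes are illustrative assumptions, not part of the notes). It fits q_j(x) through three nodes in Newton form, maximizes it when the parabola is concave, and discards the node furthest from the new point, as described above; when the parabola is not concave it returns None, signalling that a safeguard such as a GSS step is needed.

```python
import math

def quadratic_max_step(f, nodes):
    """One quadratic interpolation step for a maximum. nodes is a list of three
    (x, f(x)) pairs; returns the updated nodes and the maximizer of the
    interpolating parabola, or None if the parabola is not concave."""
    (x0, f0), (x1, f1), (x2, f2) = sorted(nodes)
    # Newton form q(x) = f0 + c1*(x - x0) + c2*(x - x0)*(x - x1).
    c1 = (f1 - f0) / (x1 - x0)
    c2 = ((f2 - f1) / (x2 - x1) - c1) / (x2 - x0)
    if c2 >= 0:                          # leading coefficient not negative: no interior maximum
        return nodes, None               # a safeguard (e.g., a GSS step) would be applied here
    x_new = 0.5 * (x0 + x1 - c1 / c2)    # zero of q'(x)
    furthest = max(nodes, key=lambda p: abs(p[0] - x_new))
    new_nodes = [p for p in nodes if p is not furthest] + [(x_new, f(x_new))]
    return new_nodes, x_new

# Illustrative run for f(x) = exp(-x)*sin(x), started from three nodes in [0, pi].
f = lambda x: math.exp(-x) * math.sin(x)
nodes = [(x, f(x)) for x in (0.0, 0.5, 1.5)]
for _ in range(4):
    nodes, x_new = quadratic_max_step(f, nodes)
    if x_new is None:
        break
print(x_new)   # approaches the global maximum at x = pi/4 ~ 0.785
```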