THE UNIVERSITY OF WESTERN ONTARIO, LONDON, ONTARIO
Paul Klein
Office: SSC 408
Phone: 661-111 ext. 857
Email: paul.klein@uwo.ca
URL: www.ssc.uwo.ca/economics/faculty/klein/

Numerical optimization

In these notes we consider some methods of numerical minimization, since that is what engineers are mostly up to. To maximize f, minimize -f.

Broadly speaking, optimization methods can be divided into gradient-based methods and non-gradient-based methods. Gradient-based methods are typically faster but less robust. Which is better depends on the problem at hand. If the minimand is smooth and you have a good initial guess, gradient-based methods are superior.

1 Optimization in one dimension

1.1 Golden section search

Golden section search finds a minimizer in one dimension of a single-troughed function.

1.2 The golden section

The golden ratio φ is defined by

    (a + b)/a = a/b = φ.

Since a/b = φ, we have b = a/φ, so that

    (1 + φ)/φ = φ,

i.e.

    φ² - φ - 1 = 0,

whose positive root is

    φ = (1 + √5)/2 ≈ 1.61803.

1.3 Getting started

Golden section search starts with three points x_1 < x_2 < x_3 such that f(x_1) > f(x_2) and f(x_3) > f(x_2). If f is single-troughed, we can then be sure that the minimizer lies between x_1 and x_3. But how do we find three such points? Provided that the minimum is known to be between x_1 and x_3, we say that we have bracketed the minimum. As explained below, it makes sense to choose the third point x_2 such that the three points have the golden section property

    (x_3 - x_2)/(x_2 - x_1) = φ.

This choice doesn't guarantee f(x_1) > f(x_2) and f(x_3) > f(x_2), but that doesn't matter so long as we have bracketed the minimum.
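To make the golden section property concrete, here is a small Python sketch (not part of the original notes) that places the interior point x_2 between two given endpoints so that (x_3 - x_2)/(x_2 - x_1) = φ and reports whether the resulting triplet brackets a minimum; the test function and the endpoints are arbitrary choices for illustration.

import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio, ~1.61803

def golden_triplet(f, x1, x3):
    """Place x2 in (x1, x3) with (x3 - x2)/(x2 - x1) = PHI and
    report whether (x1, x2, x3) brackets a minimum of f."""
    x2 = x1 + (x3 - x1) / (1 + PHI)   # solves (x3 - x2)/(x2 - x1) = PHI
    brackets = f(x2) < f(x1) and f(x2) < f(x3)
    return x1, x2, x3, brackets

# Example with a single-troughed function (chosen only for illustration).
f = lambda x: (x - 1.0)**2 + 0.5
print(golden_triplet(f, -2.0, 4.0))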

1.3.1 Updating: case 1

We now evaluate the function at a new point, x_4. It should be chosen such that

    (x_3 - x_4)/(x_4 - x_2) = φ.

(This is illustrated in Figure 1.) Why is this desirable? In the first place, the next point where we evaluate f should be in the larger subinterval (x_2, x_3); otherwise the amount by which our interval shrinks could be very small indeed, and we definitely want to avoid that. But we can say more than that. If f(x_4) < f(x_2), then the new points are x_2, x_4 and x_3. Otherwise, the new points are x_1, x_2 and x_4. Thus our new interval has a width of either x_4 - x_1 or x_3 - x_2. To minimize the significance of bad luck, these lengths should be equal. In the notation of Figure 1, we should have

    a + c = b.

Also, the spacing between the points should be the same before and after the revising of the triplet of points, i.e.

    c/a = a/b.

These two equations together imply

    b/a = a/b + 1,

which leads to the conclusion that

    b/a = φ.

1.3.2 Updating: case 2

What if it turns out that the new interval is in fact [x_1, x_4]? Then we will have the bigger subinterval first and the smaller subinterval second. The code has to allow for that possibility.

Figure 1: Golden section search: case 1.

In fact we should draw a picture for that case, too; see Figure 2. In this case, we get

    a = b + c

and

    a/b = (a - c)/c,

and the conclusion is that

    a/b = φ.

Figure 2: Golden section search: case 2.

1.3.3 When to stop

It is tempting to think that you can bracket the solution x* in a range as small as (1 - ϵ)x* < x < (1 + ϵ)x*, where ϵ is machine precision. However, that is not so! By Taylor expansion,

    f(x) ≈ f(x*) + f'(x*)(x - x*) + (1/2) f''(x*)(x - x*)².

The second term is zero, and the third term will be negligible compared to the first (that is, will be a factor ϵ smaller and so will be an additive zero in finite precision) whenever

    (1/2) f''(x*)(x - x*)² < ϵ f(x*),

or

    |x - x*| / |x*| < √ϵ · √( 2 f(x*) / ( (x*)² f''(x*) ) ).

Therefore, as a rule of thumb, it is hopeless to ask for bracketing with a fractional width of less than √ϵ!

1.4 Parabolic search

If the function to be minimized is smooth (continuously differentiable), then it makes sense to approximate it by a parabola and to take, as the new approximation of the minimizer, the minimizer of the parabola. Let the minimizer be bracketed by the three points a, b and c. Then the following number minimizes the parabola that goes through the points (a, f(a)), (b, f(b)) and (c, f(c)):

    x = b - (1/2) · [ (b - a)² (f(b) - f(c)) - (b - c)² (f(b) - f(a)) ] / [ (b - a)(f(b) - f(c)) - (b - c)(f(b) - f(a)) ].

A pure parabolic interpolation method proceeds as follows. If a < x < b, then the new three points are a, x and b. If on the other hand b < x < c, then the new three points are b, x and c. This method is fast but could easily become numerically unstable, so a good algorithm cannot be based just on this. A hybrid method switches between parabolic interpolation as described above and golden section search. Parabolic interpolation is used to update a, b and c if it is acceptable (in the current interval and representing a noticeable improvement over the previous guess); if not, then the algorithm falls back to an ordinary golden section step. Notice that golden section search doesn't have to start with two intervals with the right ratio of lengths. All you need to know is which is the bigger interval, and then you choose your new point there.
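As an illustration of Sections 1.3.1-1.3.3 (a minimal sketch, not the code behind these notes), the following Python routine carries out golden section search, maintaining the golden-ratio spacing at every step and stopping at a fractional width of roughly √ϵ; the default tolerance and the test function are arbitrary choices. A hybrid with parabolic steps as in Section 1.4 is noted below.

import math

PHI = (1 + math.sqrt(5)) / 2      # the golden ratio
INVPHI = 1 / PHI                  # ~0.618

def golden_section(f, a, b, tol=1.5e-8, maxit=200):
    """Minimize a single-troughed f on [a, b] by golden section search.
    tol ~ sqrt(machine epsilon): asking for more is hopeless (Section 1.3.3)."""
    for _ in range(maxit):
        c = b - INVPHI * (b - a)  # interior points with golden spacing
        d = a + INVPHI * (b - a)
        if f(c) < f(d):
            b = d                 # the minimum lies in [a, d]
        else:
            a = c                 # the minimum lies in [c, b]
        # Mixed relative/absolute stopping rule (the +1.0 guards near zero).
        if b - a <= tol * (abs(a) + abs(b) + 1.0):
            break
    return (a + b) / 2

# Example: minimize x**2 + 2*cos(x), which has a single trough at 0.
print(golden_section(lambda x: x**2 + 2*math.cos(x), -2.0, 2.0))

In practice one would usually call a library routine; for example, SciPy's scipy.optimize.minimize_scalar provides method='golden' as well as the hybrid method='brent', which combines parabolic interpolation with golden section steps much as described in Section 1.4.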

1.5 Newton's method

Newton's method of optimization is to apply Newton's method of root finding to the equation f'(x) = 0. The first-order approximation of this function around the nth approximation x_n of the true solution x* is

    f'(x*) ≈ f'(x_n) + f''(x_n)(x* - x_n),

where we interpret f'(x_n) as the (1 × n) gradient of f at x_n and f''(x_n) as the (n × n) Hessian at x_n. Evidently f'(x*) = 0, so we can solve for Δx_n = x* - x_n by solving

    f''(x_n) Δx_n = -f'(x_n)

and then defining x_{n+1} = x_n + γ Δx_n, where 0 < γ ≤ 1. Newton's method can of course fairly easily be extended to the multidimensional case, which brings us to our next topic.
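As a sketch of Section 1.5 (illustrative rather than the notes' own code), here is a damped Newton iteration in Python for the n-dimensional case, with user-supplied gradient and Hessian; the step length γ, the tolerance and the quadratic test problem are arbitrary choices.

import numpy as np

def newton_minimize(grad, hess, x0, gamma=1.0, tol=1e-10, maxit=100):
    """Damped Newton's method: solve f''(x_n) dx = -f'(x_n),
    then set x_{n+1} = x_n + gamma * dx, with 0 < gamma <= 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess(x), -g)  # Newton step
        x = x + gamma * dx
    return x

# Example: f(x, y) = (x - 1)^2 + 10*(y + 2)^2 (arbitrary quadratic).
grad = lambda x: np.array([2*(x[0] - 1), 20*(x[1] + 2)])
hess = lambda x: np.diag([2.0, 20.0])
print(newton_minimize(grad, hess, np.zeros(2)))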

2 Multi-dimensional optimization

2.1 The method of steepest descent

A nice fact about the gradient ∇f(x) of a function is that it points in the direction of steepest ascent at x. To descend, we of course want to move in the opposite direction. The only question is how far. Let that be an open question for now, denoting the distance travelled in each iteration by α_k:

    x_{k+1} = x_k - α_k ∇f(x_k).

But how to choose α_k? A natural option would seem to be to minimize f along the line x = x_k - α ∇f(x_k), and that is precisely what is typically done, i.e. α_k solves

    min_α f(x_k - α ∇f(x_k)).

This is a one-dimensional minimization problem that can be solved using, for example, the method of golden section or the hybrid method described above.

2.2 The Gauss-Newton method

If you are minimizing a sum of squares, the Gauss-Newton algorithm is the way to go. It is a generalization of Newton's method. The approach goes as follows. Suppose you want to minimize

    S(x) := Σ_{k=1}^{m} r_k(x)²,

where r maps R^n into R^m. Start with an initial guess x_0. Then compute the Jacobian

    J(x_0) := ∂r(x_0)/∂x^T.

The updated approximate minimizer x_1 solves

    J(x_0)^T J(x_0) [x_1 - x_0] = -J(x_0)^T r(x_0).

Notice that if m = n, then the method reduces to Newton-Raphson for finding a solution to r(x) = 0.
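Here is a minimal Python sketch of the Gauss-Newton update just described, applied to a small made-up exponential-fitting problem; the residual function, the data and the starting value are arbitrary choices for illustration.

import numpy as np

def gauss_newton(r, jac, x0, tol=1e-10, maxit=50):
    """Minimize S(x) = sum_k r_k(x)^2 by Gauss-Newton:
    solve J^T J (x_new - x) = -J^T r at each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        J, res = jac(x), r(x)
        step = np.linalg.solve(J.T @ J, -J.T @ res)
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x

# Example: fit y = exp(b*t) to made-up data by least squares.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 0.7, 0.5, 0.35])
r   = lambda b: np.exp(b[0]*t) - y                  # residuals r_k(b)
jac = lambda b: (t*np.exp(b[0]*t)).reshape(-1, 1)   # m x 1 Jacobian
print(gauss_newton(r, jac, np.array([0.0])))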

2.3 The Nelder-Mead method

Golden section is fine in one dimension, and steepest descent is good in arbitrarily many dimensions provided we are comfortable with computing gradients. But what to do in higher dimensions with functions that are not necessarily differentiable? One popular alternative is the Nelder-Mead algorithm. We will illustrate it in two dimensions, but the algorithm can fairly easily be extended to arbitrarily many dimensions.

The method starts with three (or n + 1 in n dimensions) points B, G and W that do not lie on a line (or a hyperplane in more than two dimensions). The notation is there to capture the assumption that f(B) < f(G) < f(W). Think Best, Good, Worst. We will now describe the set of points constructed from B, G and W that might be computed at some step along the decision tree. We then complete our description of the algorithm by describing the decision tree.

2.3.1 Definition and illustration of new points

The points R (reflection point) and M (midpoint of the good side) are defined as follows and are illustrated in Figure 3:

    M = (B + G)/2,
    R = 2M - W.

Figure 3: The triangle BGW and the points R and M.

The extension point E is defined as follows and is illustrated in Figure 4:

    E = 2R - M.

The contraction points C_1 and C_2 are defined as follows and are illustrated in Figure 5:

    C_1 = (M + W)/2,
    C_2 = (M + R)/2.

Whichever of the points C_1 and C_2 delivers the lower value of f is called C.

The shrinkage point S is defined as follows and is illustrated in Figure 6:

    S = (B + W)/2.
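To summarize these definitions in code (a sketch, not the notes' implementation), the candidate points can be computed from B, G and W as follows; the example triangle is arbitrary.

import numpy as np

def candidate_points(B, G, W):
    """Candidate points of the Nelder-Mead triangle step (Section 2.3.1)."""
    M  = (B + G) / 2        # midpoint of the good side
    R  = 2*M - W            # reflection point
    E  = 2*R - M            # extension point
    C1 = (M + W) / 2        # contraction point on the W side
    C2 = (M + R) / 2        # contraction point on the R side
    S  = (B + W) / 2        # shrinkage point
    return M, R, E, C1, C2, S

# Example with an arbitrary triangle in the plane.
B, G, W = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.5, 1.0])
for name, p in zip("M R E C1 C2 S".split(), candidate_points(B, G, W)):
    print(name, p)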

Figure 4: The triangle BGW and the points R, M and E.

2.4 Decision tree

Compute R and f(R).

Case I: If f(R) < f(G), then either reflect or extend.

Case II: If f(R) ≥ f(G), then reflect, contract or shrink.

Figure 5: The triangle BGW and the points R, M, C_1 and C_2.

Case I:
if f(B) < f(R)
    replace W with R
else
    compute E and f(E)
    if f(E) < f(B)
        replace W with E
    else
        replace W with R
    endif
endif

Figure 6: The triangle BGW and the points M and S.

Case II:
if f(R) < f(W)
    replace W with R
else
    compute C and f(C)
    if f(C) < f(W)
        replace W with C
    else
        compute S and f(S)
        replace W with S
        replace G with M
    endif
endif

Notice that when we say "replace W with R", that doesn't mean that R is the new W; it just means that the new points are B, G and R. They may need to be renamed appropriately.

Incidentally, the Nelder-Mead algorithm is implemented by Matlab's fminsearch.
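For completeness, here is a compact Python sketch of the triangle version of the algorithm, following the Case I / Case II logic above; the test function and starting triangle are arbitrary, and for real work one would rather call a library routine such as Matlab's fminsearch or SciPy's scipy.optimize.minimize with method='Nelder-Mead'.

import numpy as np

def nelder_mead_triangle(f, vertices, maxit=200, tol=1e-8):
    """Nelder-Mead in 2-D following the decision tree of Section 2.4.
    vertices: three starting points not lying on a line."""
    V = np.asarray(vertices, dtype=float)
    for _ in range(maxit):
        # Relabel so that V[0] = B (best), V[1] = G, V[2] = W (worst).
        V = V[np.argsort([f(v) for v in V])]
        B, G, W = V
        if np.linalg.norm(W - B) < tol:
            break
        M = (B + G) / 2
        R = 2*M - W
        if f(R) < f(G):                      # Case I: reflect or extend
            if f(B) < f(R):
                W = R
            else:
                E = 2*R - M
                W = E if f(E) < f(B) else R
        else:                                # Case II: reflect, contract or shrink
            if f(R) < f(W):
                W = R
            else:
                C1, C2 = (M + W) / 2, (M + R) / 2
                C = C1 if f(C1) < f(C2) else C2
                if f(C) < f(W):
                    W = C
                else:
                    W = (B + W) / 2          # shrinkage point S
                    G = M
        V = np.array([B, G, W])
    return V[np.argmin([f(v) for v in V])]

# Example: minimize f(x, y) = (x - 3)^2 + (y + 1)^2 (arbitrary test).
f = lambda v: (v[0] - 3)**2 + (v[1] + 1)**2
start = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(nelder_mead_triangle(f, start))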