EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science


Multidimensional Unconstrained Optimization
Suppose we have a function f() of more than one variable, f(x_1, x_2, ..., x_n). We want to find the values of x_1, x_2, ..., x_n that give f() the largest (or smallest) possible value. Graphical solution is not possible, but a graphical picture helps understanding: hilltops and contour maps.

Methods of solution
Direct or non-gradient methods do not require derivatives:
Grid search
Random search
One variable at a time
Line searches and Powell's method
Simplex optimization

Gradient methods use first and possibly second derivatives. The gradient is the vector of first partials; the Hessian is the matrix of second partials.
Steepest ascent/descent
Conjugate gradient
Newton's method
Quasi-Newton methods

Grid and Random Search
Given a function and limits on each variable, generate a set of random points in the domain, and eventually choose the one with the largest function value. Alternatively, divide the interval on each variable into small segments and check the function for all possible combinations.

Example:
f(x_1, x_2) = x_2 - x_1 - 2x_1^2 - 2x_1x_2 - x_2^2,  with -2 ≤ x_1 ≤ 2 and 1 ≤ x_2 ≤ 3
The maximum is f(-1, 1.5) = 1.25.

Direct Search with 10,000 Points

Method     x_1      x_2      f        E
Random    -0.985    1.486    1.2498   0.0199
Random    -0.989    1.493    1.2499   0.0131
Random    -1.003    1.490    1.2498   0.0107
Random    -0.992    1.486    1.2499   0.0157
Random    -1.002    1.498    1.2500   0.0027
Random    -0.998    1.499    1.2500   0.0024
Random    -1.015    1.520    1.2497   0.0255
Random    -0.999    1.493    1.2500   0.0070
Grid      -0.990    1.505    1.2497   0.0113
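
A minimal Matlab sketch of the random-search rows of this table, using the bounds and 10,000 points stated above (the variable names are illustrative, not from the original slides):

% Random search for the maximum of f(x1,x2) = x2 - x1 - 2*x1^2 - 2*x1*x2 - x2^2
% over -2 <= x1 <= 2, 1 <= x2 <= 3, using 10,000 random points.
f = @(x1,x2) x2 - x1 - 2*x1.^2 - 2*x1.*x2 - x2.^2;
n = 10000;
x1 = -2 + 4*rand(n,1);          % uniform on [-2, 2]
x2 =  1 + 2*rand(n,1);          % uniform on [1, 3]
fv = f(x1, x2);
[fbest, k] = max(fv);           % best of the random points
fprintf('x1 = %.3f, x2 = %.3f, f = %.4f\n', x1(k), x2(k), fbest);
% The true maximum is f(-1, 1.5) = 1.25.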

Features of Random and Grid Search
Slow and inefficient.
Requires knowledge of the domain.
Works even for discontinuous functions.
Poor in high dimension.
Grid search can be used iteratively, with progressively narrowing domains.

Line searches
Given a starting point and a direction, search for the maximum, or for a good next point, in that direction. This is equivalent to one-dimensional optimization, so we can use Newton's method or another method from the previous chapter. Different methods use different directions.

x = (x_1, x_2, ..., x_n)
v = (v_1, v_2, ..., v_n)
f(x) = f(x_1, x_2, ..., x_n)
g(λ) = f(x + λv)
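
As a sketch (not from the slides), one way to carry out such a line search in Matlab is to build the one-dimensional function g(λ) = f(x + λv) and hand it to a one-dimensional optimizer such as fminbnd. The point x, the direction v, and the bracket [0, 2] below are illustrative assumptions; the function is the earlier example.

% Line search: maximize f along direction v from point x by minimizing -g(lambda).
f = @(x) x(2) - x(1) - 2*x(1)^2 - 2*x(1)*x(2) - x(2)^2;   % example function
x = [0; 2];                      % current point (illustrative)
v = [-1; -0.5]; v = v/norm(v);   % unit search direction (illustrative)
g = @(lambda) -f(x + lambda*v);  % negate so fminbnd's minimum is f's maximum
lambda = fminbnd(g, 0, 2);       % search over an assumed bracket [0, 2]
xnew = x + lambda*v;             % with this v, xnew lands near the maximizer (-1, 1.5)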

One-Variable-at-a-Time Search
Given a function f() of n variables, search in the direction in which only variable 1 changes. Then search in the direction from that point in which only variable 2 changes, etc. This is slow and inefficient in general. It can be sped up by searching in a direction after n changes (a pattern direction).

Powell's Method
If f() is quadratic, and if two points are found by line searches in the same direction from two different starting points, then the line joining the two ending points (a conjugate direction) heads toward the optimum. Since many functions we encounter are approximately quadratic near the optimum, this can be effective.

Start with a point x_0 and two random directions h_1 and h_2.
Search in the direction of h_1 from x_0 to find a new point x_1.
Search in the direction of h_2 from x_1 to find a new point x_2. Let h_3 be the direction joining x_0 to x_2.
Search in the direction of h_3 from x_2 to find a new point x_3.
Search in the direction of h_2 from x_3 to find a new point x_4.
Search in the direction of h_3 from x_4 to find a new point x_5.

Points x_3 and x_5 have been found by searching in the direction of h_3 from two starting points, x_2 and x_4.
Call the direction joining x_3 and x_5 h_4.
Search in the direction of h_4 from x_5 to find a new point x_6.
The new point x_6 will be exactly the optimum if f() is quadratic.
The iterations can then be repeated. Errors are estimated by the change in x or in f().
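
The sequence above can be transcribed almost directly into code. The sketch below reuses the earlier quadratic example; the starting point x0, the directions h1 and h2, the search bracket, and the use of fminbnd for each line search are illustrative assumptions, not values from the slides.

f = @(x) x(2) - x(1) - 2*x(1)^2 - 2*x(1)*x(2) - x(2)^2;    % example quadratic
linemax = @(x,h) x + fminbnd(@(t) -f(x + t*h), -3, 3)*h;   % line search along h
x0 = [0; 1];  h1 = [1; 0];  h2 = [0; 1];                   % start and initial directions
x1 = linemax(x0, h1);
x2 = linemax(x1, h2);
h3 = x2 - x0;                    % conjugate direction joining x0 and x2
x3 = linemax(x2, h3);
x4 = linemax(x3, h2);
x5 = linemax(x4, h3);
h4 = x5 - x3;                    % direction joining the two h3 search results
x6 = linemax(x5, h4);            % exact optimum if f is quadratic
% For this quadratic, x6 is (very nearly) the true maximizer (-1, 1.5).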

Nelder-Mead Simplex Algorithm
A direct search method that uses simplices, which are triangles in dimension 2, pyramids in dimension 3, etc. At each iteration a new point is added, usually in the direction of the face of the simplex with the largest function values.


Gradient Methods
The gradient of f() at a point x is the vector of partial derivatives of the function f() at x. For smooth functions, the gradient is zero at an optimum, but it may also be zero at a non-optimum. The gradient points uphill. The gradient is orthogonal to the contour lines of a function at a point.

Directional Derivatives
Given a point x in R^n, a unit direction v, and a function f() of n variables, we can define a new function g() of one variable by g(λ) = f(x + λv). The derivative g'(0) is the directional derivative of f() at x in the direction of v. This is greatest when v is in the gradient direction.

x = (x_1, x_2, ..., x_n),  v = (v_1, v_2, ..., v_n),  v^T v = Σ v_i^2 = 1
f(x) = f(x_1, x_2, ..., x_n),  ∇f = (∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n)
g(λ) = f(x + λv)
g'(0) = (∇f)^T v = (∂f/∂x_1) v_1 + (∂f/∂x_2) v_2 + ... + (∂f/∂x_n) v_n

Steepest Ascent
The gradient direction is the direction of steepest ascent, but not necessarily the direction leading directly to the summit. We can search along the direction of steepest ascent until a maximum is reached. Then we can search again from a new steepest ascent direction.

f(x_1, x_2) = x_1 x_2^2 at (2, 2):  f(2, 2) = 8
f_1(x_1, x_2) = ∂f/∂x_1 = x_2^2,  so f_1(2, 2) = 4
f_2(x_1, x_2) = ∂f/∂x_2 = 2 x_1 x_2,  so f_2(2, 2) = 8
∇f(2, 2) = (4, 8), so (2 + 4λ, 2 + 8λ) is the gradient line
g(λ) = f(2 + 4λ, 2 + 8λ) = (2 + 4λ)(2 + 8λ)^2
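
A quick numerical check of this example (a sketch added here, not from the slides): along the unit vector in the gradient direction, g'(0) equals the length of the gradient, here √(4^2 + 8^2) = √80 ≈ 8.944.

f = @(x) x(1)*x(2)^2;            % f(x1, x2) = x1*x2^2
x = [2; 2];
gradf = [x(2)^2; 2*x(1)*x(2)];   % analytic gradient, equal to (4, 8) at (2, 2)
v = gradf / norm(gradf);         % unit vector in the gradient direction
g = @(lambda) f(x + lambda*v);
h = 1e-6;
dg0 = (g(h) - g(-h)) / (2*h);    % centered-difference estimate of g'(0)
% dg0 is approximately norm(gradf) = sqrt(80) = 8.944.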

The Hessian
The Hessian of a function f() is the matrix of second partial derivatives. The gradient is always 0 at a maximum (for smooth functions). The gradient is also 0 at a minimum. The gradient is also 0 at a saddle point, which is neither a maximum nor a minimum. A saddle point is a max in at least one direction and a min in at least one direction.

Max, Min, and Saddle Point
For one-variable functions, the second derivative is negative at a maximum and positive at a minimum. For functions of more than one variable, a zero of the gradient is a max if the second directional derivative is negative for every direction, and is a min if the second directional derivative is positive for every direction.

Positive Definiteness
A matrix H is positive definite if x^T H x > 0 for every nonzero vector x. Equivalently, every eigenvalue of H is positive. (λ is an eigenvalue of H with eigenvector x if Hx = λx.) -H is positive definite if every eigenvalue of H is negative.

Max, Min, and Saddle Point
If the gradient ∇f of a function f is zero at a point x and the Hessian H is positive definite at that point, then x is a local min. If ∇f is zero at a point x and -H is positive definite at that point, then x is a local max. If ∇f is zero at a point x and neither H nor -H is positive definite at that point, then x is a saddle point. The determinant |H| helps only in dimension 1 or 2.

Finite-Difference Approximations
If analytical derivatives cannot be evaluated, one can use finite-difference approximations. Centered difference approximations are in general more accurate, though they require extra function evaluations. The increment is often macheps^(1/2), about 1e-8 in double precision. This can be problematic for large problems.

Complexity of Finite-Difference Derivatives
In an n-variable problem, the function value is one function evaluation (FE). A finite-difference gradient is n FEs if forward or backward, and 2n FEs if centered. A finite-difference Hessian is O(n^2) FEs. With a thousand-variable problem, this can be huge.
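
A sketch of forward and centered finite-difference gradients, using the macheps^(1/2) increment mentioned above; the example function and starting point are assumptions for illustration.

f = @(x) 2*x(1)*x(2) + 2*x(1) - x(1)^2 - 2*x(2)^2;   % example function
x = [-1; 1];
n = numel(x);
h = sqrt(eps);                        % about 1.5e-8 in double precision
gfwd = zeros(n,1); gctr = zeros(n,1);
fx = f(x);                            % the one FE shared by all forward differences
for i = 1:n
    e = zeros(n,1); e(i) = 1;
    gfwd(i) = (f(x + h*e) - fx) / h;              % forward difference: n FEs per gradient
    gctr(i) = (f(x + h*e) - f(x - h*e)) / (2*h);  % centered difference: 2n FEs per gradient
end
% The analytic gradient at (-1, 1) is (6, -6); gctr matches it more closely.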

Steepest Ascent/Descent
This is the simplest of the gradient-based methods. From the current guess, compute the gradient. Search along the gradient direction until a local maximum of this one-dimensional function is reached. Repeat until convergence.

Example:
f(x_1, x_2) = 2 x_1 x_2 + 2 x_1 - x_1^2 - 2 x_2^2
f_1(x_1, x_2) = ∂f/∂x_1 = 2 x_2 + 2 - 2 x_1
f_2(x_1, x_2) = ∂f/∂x_2 = 2 x_1 - 4 x_2
True optimum: 0 = 2 x_2 + 2 - 2 x_1 and 0 = 2 x_1 - 4 x_2, which give (x_1, x_2) = (2, 1)
H = [ -2   2
       2  -4 ]

Eigenvalues
If H is a matrix, we can find the eigenvalues in a number of ways. We will examine numerical methods for this later, but there is an algebraic method for small matrices. We illustrate this for the Hessian in this example.

Hx = λx, so (H - λI)x = 0
det(H - λI) = det [ -2-λ    2
                      2   -4-λ ] = 0
(-2 - λ)(-4 - λ) - 4 = 0
λ^2 + 6λ + 4 = 0
λ = (-6 ± √(36 - 16)) / 2 = (-6 ± √20) / 2
Both eigenvalues are negative (λ < 0), so the solution (2, 1) is a maximum.
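
For matrices that are not tiny, Matlab's eig gives the same information; a quick check of the Hessian above (a sketch, not from the slides):

H = [-2 2; 2 -4];
lambda = eig(H)    % both eigenvalues are negative (about -5.24 and -0.76),
                   % so -H is positive definite and (2, 1) is a maximum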

f(x_1, x_2) = 2 x_1 x_2 + 2 x_1 - x_1^2 - 2 x_2^2
f_1(x_1, x_2) = 2 x_2 + 2 - 2 x_1,  f_2(x_1, x_2) = 2 x_1 - 4 x_2
Start at (-1, 1):  f(-1, 1) = -7
f_1(-1, 1) = 2(1) + 2 - 2(-1) = 6,  f_2(-1, 1) = 2(-1) - 4(1) = -6
g(λ) = f(-1 + 6λ, 1 - 6λ)

g(λ) = f(-1 + 6λ, 1 - 6λ)
     = 2(-1 + 6λ)(1 - 6λ) + 2(-1 + 6λ) - (-1 + 6λ)^2 - 2(1 - 6λ)^2
     = -180λ^2 + 72λ - 7
g'(λ) = -360λ + 72 = 0, so λ = 0.2
New point: (-1 + 6(0.2), 1 - 6(0.2)) = (0.2, -0.2)

For comparison, the true maximum is f(2, 1) = 2, and the starting value was f(-1, 1) = -7.
f(0.2, -0.2) = 0.2
f_1(0.2, -0.2) = 1.2,  f_2(0.2, -0.2) = 1.2
g(λ) = f(0.2 + 1.2λ, -0.2 + 1.2λ) = -1.44λ^2 + 2.88λ + 0.2
g'(λ) = -2.88λ + 2.88 = 0, so λ = 1
New point: (1.4, 1), with f(1.4, 1) = 1.64
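
The iterations above can be reproduced with a short steepest-ascent loop. Because this f is quadratic with constant Hessian H, the best step along the gradient g has the closed form λ = -(g^T g)/(g^T H g); that closed form is a convenience for this sketch, not part of the slides.

f    = @(x) 2*x(1)*x(2) + 2*x(1) - x(1)^2 - 2*x(2)^2;
grad = @(x) [2*x(2) + 2 - 2*x(1); 2*x(1) - 4*x(2)];
H    = [-2 2; 2 -4];                 % constant Hessian of this quadratic
x = [-1; 1];                         % starting point from the example
for iter = 1:10
    g = grad(x);
    if norm(g) < 1e-8, break; end
    lambda = -(g.'*g) / (g.'*H*g);   % exact maximizer of f along the gradient
    x = x + lambda*g;
    fprintf('iter %d: x = (%.4f, %.4f), f = %.4f\n', iter, x(1), x(2), f(x));
end
% The first two iterates are (0.2, -0.2) and (1.4, 1.0), converging to (2, 1).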

Practical Steepest Ascent
In real examples, the maximum in the gradient direction cannot be calculated analytically. The problem reduces to one-dimensional optimization, a line search. One can also use more primitive line searches that are fast but do not try to find the absolute optimum.

Newton's Method
Steepest ascent can be quite slow. Newton's method is faster, though it requires evaluation of the Hessian. The function is modeled by a quadratic at a point using first and second derivatives. The quadratic is solved exactly, and this solution is used as the next iterate.

A second-order multivariate Taylor series expansion at the current iterate is
f(x) ≈ f(x_i) + ∇f(x_i)^T (x - x_i) + 0.5 (x - x_i)^T H_i (x - x_i)
At the optimum the gradient is 0, so
∇f(x) ≈ ∇f(x_i) + H_i (x - x_i) = 0
If H_i is invertible, then
x_{i+1} = x_i - H_i^{-1} ∇f(x_i)
In practice, solve the linear problem
H_i x_{i+1} = H_i x_i - ∇f(x_i)
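
A sketch of this update for the same example function (not from the slides); the backslash solve corresponds to the "solve the linear problem" line above, and since f is exactly quadratic, one step from (-1, 1) lands on the optimum (2, 1).

grad = @(x) [2*x(2) + 2 - 2*x(1); 2*x(1) - 4*x(2)];   % gradient of the example f
hess = @(x) [-2 2; 2 -4];                             % Hessian (constant here)
x = [-1; 1];
for iter = 1:5
    g = grad(x);
    if norm(g) < 1e-10, break; end
    x = x - hess(x) \ g;     % Newton step: solve H*dx = -grad rather than inverting H
end
% x is now (2, 1); a quadratic is solved in a single Newton step.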

Variations on Newton's Method
Quasi-Newton methods use approximate Hessians that are built up as the iterations progress. There are several methods of doing this; the best is probably BFGS (Broyden-Fletcher-Goldfarb-Shanno). These methods do not require analytical or numerical Hessians to be calculated at each step.
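
For reference (an addition, not from the original slides), one standard form of the BFGS update of the approximate Hessian B_i uses only gradient differences: with s_i = x_{i+1} - x_i and y_i = ∇f(x_{i+1}) - ∇f(x_i),

B_{i+1} = B_i + (y_i y_i^T) / (y_i^T s_i) - (B_i s_i s_i^T B_i) / (s_i^T B_i s_i)

so no analytical or numerical Hessian is ever formed.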

The Marquardt algorithm uses a compromise between steepest ascent and the Newton solution. The steepest ascent direction is equivalent to using H = I. Thus, if we use a Hessian of H + αI and gradually reduce α from a large value, we get steepest ascent at first, followed by more and more of the Newton direction.

The trust-region approach is an alternative to line searches. Rather than searching along the gradient, or moving directly to the Newton solution (which may diverge), the trust-region approach finds the maximum/minimum of the quadratic model subject to a constraint on the step size. This also results in a mixture of the gradient and Newton directions until the Newton step lies within the trust region.

Using Matlab to Find Optima
fminbnd finds the minimum of a one-variable function.
fminsearch finds the minimum of a multivariable function.
fminunc (in the Optimization Toolbox) also finds minima of unconstrained functions.

fminbnd
Finds the minimum of a function of one variable on a closed interval. Assumes the function is continuous. Uses a combination of golden section search and quadratic interpolation. Exhibits slow convergence when the minimum is near the boundary; fmincon (in the Optimization Toolbox) is better in that case.

fminsearch
A direct search method for functions of several variables. Does not assume differentiability and can handle discontinuities. Uses the Nelder-Mead simplex algorithm. Tends to be reliable but slow.

fminunc
Finds minima of unconstrained problems in several to many dimensions. For medium-scale optimization it uses a quasi-Newton method with BFGS updates and a mixed quadratic-cubic line search. For large-scale optimization it uses a subspace trust-region method based on the interior-reflective Newton method with preconditioned conjugate gradients.

function f=fx(x)
f=-(2*sin(x)-x^2/10)

>> [x, fval] = fminbnd('fx', 0, 4)
>> x
x =
    1.4275
>> fval
fval =
   -1.7757

function f=fxy(x)
f=-(2*x(1)*x(2)+2*x(1)-x(1)^2-2*x(2)^2)

>> [x, fval] = fminsearch('fxy', [-1, 1])
>> x
x =
    1.9999    1.0000
>> fval
fval =
   -2.0000

>> [x, fval] = fminunc('fxy', [-1, 1])
>> x
x =
    2.0000    1.0000
>> fval
fval =
   -2.0000
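
As a usage note (an addition, not from the original slides), newer Matlab releases also accept function handles directly, so the same minimization can be written without a separate function file:

>> [x, fval] = fminsearch(@(x) -(2*x(1)*x(2)+2*x(1)-x(1)^2-2*x(2)^2), [-1, 1])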