Nonlinear Optimization

1/35 Nonlinear Optimization. Pavel Kordík, Department of Computer Systems, Faculty of Information Technology, Czech Technical University in Prague. Based on: Jiří Kašpar, Pavel Tvrdík, 2011, Unconstrained nonlinear optimization, Quasi-Newton method, Advanced Computer System Architectures (MI-POA), 09/2011, Lecture 12. European Social Fund Prague & EU: Investing in your future.

Outline: Zero order methods (random search, Powell's method); first order methods (steepest descent, conjugate gradient); second order methods. 2/35

Powell's method. Minimizes a function of multiple variables by proceeding from a starting point in some vector direction n and minimizing f(P) along the line n using one-dimensional methods. The critical part is choosing the next direction n. The function's gradient is never computed. 3/35

4/35 Powell's method: algorithm. Start from x_0 and define a set of n search directions S_q, q = 1, ..., n, initialized to the coordinate unit vectors. Set x = x_0, y = x, q = 0, then repeat:
1. Set q = q + 1, find α* that minimizes F(x_{q-1} + α S_q), and set x_q = x_{q-1} + α* S_q. If q < n, repeat this step.
2. Form the conjugate direction S_{q+1} = x_q - y, find α* that minimizes F(x_q + α S_{q+1}), and set x_{q+1} = x_q + α* S_{q+1}. One iteration therefore costs n + 1 one-dimensional searches.
3. If converged, exit. Otherwise update the search directions (S_q = S_{q+1} for q = 1, ..., n), set y = x_{q+1}, reset q = 0, and start the next iteration.
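
For illustration, here is a minimal Python sketch of the scheme above. It is not part of the original slides: the helper names (line_min, powell) are my own, and SciPy's minimize_scalar is assumed as the one-dimensional minimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_min(f, x, s):
    """Minimize f along the ray x + alpha * s and return the new point."""
    res = minimize_scalar(lambda a: f(x + a * s))
    return x + res.x * s

def powell(f, x0, tol=1e-8, max_iter=100):
    n = len(x0)
    S = [np.eye(n)[q] for q in range(n)]    # start with coordinate unit vectors
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = x.copy()                        # remember the iteration's starting point
        for q in range(n):                  # n one-dimensional searches
            x = line_min(f, x, S[q])
        s_new = x - y                       # conjugate direction S_{n+1} = x_n - y
        if np.linalg.norm(s_new) < tol:     # converged? (no progress this iteration)
            break
        x = line_min(f, x, s_new)           # the (n+1)-th one-dimensional search
        S = S[1:] + [s_new]                 # update the set of search directions
    return x

# Example: a simple quadratic bowl with minimum at (1, -2)
f = lambda v: (v[0] - 1) ** 2 + 10 * (v[1] + 2) ** 2
print(powell(f, [5.0, 5.0]))
```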

5/35 Figure: construction of a new search direction in Powell's method, showing the starting point P_0, intermediate points P_1 and P_2 from successive line minimizations, the extrapolated point P_E, and the direction u_2.

6/35 Polynomial interpolation. Bracket the minimum. Fit a quadratic or cubic polynomial which interpolates f(x) at some points in the interval. Jump to the (easily obtained) minimum of the polynomial. Throw away the worst point and repeat the process.

Polynomial interpolation. Quadratic interpolation uses 3 points (the example shown converges in 2 iterations). Other methods to interpolate: 2 points and one gradient, or cubic interpolation. 7/35
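
As a toy illustration of the bracket-and-interpolate idea (my own sketch, not from the slides), the following fits an exact quadratic through three bracketing points, jumps to its vertex, and keeps the best three points:

```python
import numpy as np

def quadratic_interp_min(f, a, b, c, iters=2):
    """1D minimization by repeated quadratic interpolation.

    Assumes a < b < c brackets a minimum: f(b) < f(a) and f(b) < f(c).
    """
    xs = np.array([a, b, c], dtype=float)
    for _ in range(iters):
        fs = np.array([f(x) for x in xs])
        p = np.polyfit(xs, fs, 2)            # exact parabola through the 3 points
        x_new = -p[1] / (2 * p[0])           # vertex = minimum of the parabola
        xs = np.append(xs, x_new)            # add the new point ...
        fs = np.append(fs, f(x_new))
        keep = np.argsort(fs)[:3]            # ... and throw away the worst one
        xs = np.sort(xs[keep])
    return xs[np.argmin([f(x) for x in xs])]

print(quadratic_interp_min(np.cos, 2.0, 3.0, 4.0))   # close to pi
```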

Examples of quadratic functions. Case 1: both eigenvalues are positive, H is positive definite, and there is a unique minimum. 8/35

Examples of quadratic functions. Case 2: the eigenvalues have different signs, H is indefinite, and the stationary point is a saddle point. 9/35

Examples of quadratic functions. Case 3: one eigenvalue is zero, H is positive semidefinite, and the surface is a parabolic cylinder. 10/35

11/35 Optimization for quadratic functions. Write f(x) = c - b^T x + (1/2) x^T H x and assume that H is positive definite. There is a unique minimum at x* = H^{-1} b, where the gradient Hx - b vanishes. If N is large, it is not feasible to perform this matrix inversion directly.
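
A small numerical aside (my own example): in practice the minimizer is obtained by solving the linear system H x* = b, e.g. via a factorization, rather than by forming H^{-1} explicitly.

```python
import numpy as np

# Quadratic f(x) = c - b^T x + 1/2 x^T H x with a made-up positive definite H
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_star = np.linalg.solve(H, b)   # solves H x* = b without computing H^{-1}
print(x_star)                    # unique minimum
print(H @ x_star - b)            # gradient at x*, essentially [0, 0]
```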

Steepest descent. The basic principle is to minimize the N-dimensional function by a series of 1D line minimizations: x_{k+1} = x_k + α_k p_k. The steepest descent method chooses p_k to be along the negative gradient, p_k = -∇f(x_k). The step size α_k is chosen to minimize f(x_k + α_k p_k). For quadratic forms there is a closed-form solution: α_k = (g_k^T g_k) / (g_k^T H g_k), where g_k = ∇f(x_k). 12/35
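
The exact step size for the quadratic case can be checked with a short script; this is a sketch for the quadratic model above (H positive definite), with names of my own choosing.

```python
import numpy as np

def steepest_descent_quadratic(H, b, x0, iters=100, tol=1e-10):
    """Steepest descent on f(x) = c - b^T x + 1/2 x^T H x with the exact step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = H @ x - b                      # gradient g_k
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (g @ (H @ g))    # closed-form minimizer along -g
        x = x - alpha * g                  # x_{k+1} = x_k + alpha_k p_k with p_k = -g_k
    return x

H = np.array([[10.0, 0.0], [0.0, 1.0]])    # ill-conditioned H => pronounced zig-zag
b = np.array([0.0, 0.0])
print(steepest_descent_quadratic(H, b, [1.0, 10.0]))   # converges to [0, 0]
```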

13/35 Steepest descent. The gradient is everywhere perpendicular to the contour lines. After each line minimization the new gradient is always orthogonal to the previous step direction (true of any exact line minimization). Consequently, the iterates tend to zig-zag down the valley in a very inefficient manner.

14/35 Conjugate gradient. Each p_k is chosen to be conjugate to all previous search directions with respect to the Hessian H: p_k^T H p_j = 0 for j < k. The resulting search directions are mutually linearly independent. Remarkably, p_k can be chosen using only knowledge of p_{k-1}, ∇f(x_{k-1}) and ∇f(x_k): p_k = -∇f(x_k) + β_k p_{k-1}, with the Fletcher-Reeves choice β_k = (∇f(x_k)^T ∇f(x_k)) / (∇f(x_{k-1})^T ∇f(x_{k-1})).

Conjugate gradient. An N-dimensional quadratic form can be minimized in at most N conjugate descent steps. The figure shows 3 different starting points; in each case the minimum of the 2D quadratic is reached in exactly 2 steps. 15/35
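
A compact sketch of the conjugate gradient iteration for the quadratic case, illustrating the "at most N steps" property (my own code, using the Fletcher-Reeves update of the search direction):

```python
import numpy as np

def conjugate_gradient(H, b, x0, tol=1e-10):
    """Minimize f(x) = c - b^T x + 1/2 x^T H x for symmetric positive definite H."""
    x = np.asarray(x0, dtype=float)
    g = H @ x - b                            # gradient
    p = -g                                   # first direction: steepest descent
    for _ in range(len(b)):                  # at most N steps for an N-dim quadratic
        alpha = (g @ g) / (p @ (H @ p))      # exact line minimization along p
        x = x + alpha * p
        g_new = H @ x - b
        if np.linalg.norm(g_new) < tol:
            break
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves coefficient
        p = -g_new + beta * p                # new direction, H-conjugate to the old ones
        g = g_new
    return x

H = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(conjugate_gradient(H, b, [10.0, -10.0]))   # reaches the minimum in 2 steps
```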

16/35 Optimization for general functions. Apply the methods developed for quadratic functions, using a quadratic Taylor series expansion of f(x) about the current point.

17/35 Rosenbrock's function: f(x, y) = 100 (y - x^2)^2 + (1 - x)^2, with minimum at [1, 1].

18/35 Steepest descent. The 1D line minimization must be performed using one of the earlier methods (usually cubic polynomial interpolation). The zig-zag behaviour is clear in the zoomed view. The algorithm crawls down the valley.

Conjugate gradient. Again, an explicit line minimization must be used at every step. The algorithm converges in 98 iterations, far superior to steepest descent. 19/35

20/35 Newton method. Expand f(x) by its Taylor series about the point x_k: f(x_k + δx) ≈ f(x_k) + ∇f(x_k)^T δx + (1/2) δx^T H(x_k) δx, where the gradient is the vector ∇f(x_k) = (∂f/∂x_1, ..., ∂f/∂x_N)^T and the Hessian is the symmetric matrix H(x_k) with entries H_ij = ∂^2 f / ∂x_i ∂x_j.

Hessian matrix of f(x). Let f(x) be a C^2 function of n variables. The Hessian of f is the n × n matrix of second partial derivatives, H(x) = [∂^2 f(x) / ∂x_i ∂x_j], with ∂^2 f/∂x_1^2, ..., ∂^2 f/∂x_n^2 on the diagonal. Since cross-partials are equal for a C^2 function, H(x) is a symmetric matrix. 21/35

Conditions for a minimum or a maximum value of a function of several variables (cont.). Let f(x) be a C^2 function on R^n and suppose that x* is a critical point of f(x), i.e., ∇f(x*) = 0.
1. If the Hessian H(x*) is a positive definite matrix, then x* is a local minimum of f(x).
2. If the Hessian H(x*) is a negative definite matrix, then x* is a local maximum of f(x).
3. If the Hessian H(x*) is an indefinite matrix, then x* is neither a local maximum nor a local minimum of f(x).
22/35
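
An equivalent numerical test (my own illustration, not from the slides) classifies a critical point by the signs of the Hessian's eigenvalues:

```python
import numpy as np

def classify_critical_point(H, tol=1e-12):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    eig = np.linalg.eigvalsh(H)               # real eigenvalues of symmetric H
    if np.all(eig > tol):
        return "local minimum (H positive definite)"
    if np.all(eig < -tol):
        return "local maximum (H negative definite)"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point (H indefinite)"
    return "test inconclusive (H is semidefinite/singular)"

print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 3.0]])))   # local minimum
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -3.0]])))  # saddle point
```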

23/35 Example. f(x, y) = x^3 - y^3 + 9xy. Find the local maxima and minima of f(x, y). First, compute the first-order partial derivatives (i.e., the gradient of f(x, y)) and set them to zero: ∇f(x, y) = (∂f/∂x, ∂f/∂y) = (3x^2 + 9y, -3y^2 + 9x) = (0, 0). The critical points are (0, 0) and (3, -3).

24/35 Example (cont.). We now compute the Hessian of f(x, y): H(x, y) = [[6x, 9], [9, -6y]]. The first-order leading principal minor is 6x and the second-order leading principal minor is det H = -36xy - 81. At (0, 0), these two minors are 0 and -81, respectively. Since the second-order leading principal minor is negative, (0, 0) is a saddle point of f(x, y), i.e., neither a max nor a min. At (3, -3), these two minors are 18 and 243, so the Hessian is positive definite and (3, -3) is a local min of f(x, y). Is (3, -3) a global min?
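
The derivation on the last two slides can be checked symbolically. The sketch below uses SymPy and assumes the reconstructed function f(x, y) = x^3 - y^3 + 9xy; it is an illustration, not part of the original slides.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - y**3 + 9*x*y

grad = [sp.diff(f, v) for v in (x, y)]        # (3x^2 + 9y, -3y^2 + 9x)
critical = sp.solve(grad, (x, y), dict=True)  # includes (0, 0) and (3, -3)
H = sp.hessian(f, (x, y))                     # [[6x, 9], [9, -6y]]

for pt in critical:
    if all(val.is_real for val in pt.values()):   # keep only the real critical points
        Hp = H.subs(pt)
        # leading principal minors 6x and det H = -36xy - 81, plus the definiteness test
        print(pt, Hp[0, 0], Hp.det(), Hp.is_positive_definite)
```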

Newton method. For a minimum we require ∇f(x) = 0, and so ∇f(x_k) + H δx = 0, with solution δx = -H^{-1} ∇f(x_k). This gives the iterative update x_{k+1} = x_k - H_k^{-1} ∇f(x_k). If f(x) is quadratic, then the solution is found in one step. The method has quadratic convergence (as in the 1D case). Provided H is positive definite, the Newton step is a downhill direction. Rather than jump straight to the predicted minimum, it is better to perform a line minimization along the Newton step, which ensures global convergence. If H = I then this reduces to steepest descent. 25/35
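
A damped-Newton sketch corresponding to this slide (with a simple backtracking line search instead of an exact line minimization; all names are illustrative), applied to Rosenbrock's function:

```python
import numpy as np

def newton_min(f, grad, hess, x0, iters=100, tol=1e-8):
    """Newton's method with a basic backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess(x), -g)     # Newton step: H dx = -grad f
        t = 1.0                               # backtrack until sufficient decrease
        while t > 1e-10 and f(x + t * dx) > f(x) + 1e-4 * t * (g @ dx):
            t *= 0.5
        x = x + t * dx
    return x

# Rosenbrock's function, its gradient and Hessian
f    = lambda v: 100 * (v[1] - v[0]**2)**2 + (1 - v[0])**2
grad = lambda v: np.array([-400 * v[0] * (v[1] - v[0]**2) - 2 * (1 - v[0]),
                           200 * (v[1] - v[0]**2)])
hess = lambda v: np.array([[1200 * v[0]**2 - 400 * v[1] + 2, -400 * v[0]],
                           [-400 * v[0], 200.0]])

print(newton_min(f, grad, hess, [-1.2, 1.0]))   # converges to [1, 1]
```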

26/35 Newton method: example. The algorithm converges in only 18 iterations, compared to the 98 for conjugate gradients. However, the method requires computing the Hessian matrix at each iteration, which is not always feasible.

Quasi-Newton methods. If the problem size is large and the Hessian matrix is dense, it may be infeasible or inconvenient to compute it directly. Quasi-Newton methods avoid this problem by keeping a rolling estimate of H(x), updated at each iteration using new gradient information. Common schemes are due to Broyden, Fletcher, Goldfarb and Shanno (BFGS), and also Davidon, Fletcher and Powell (DFP). The idea is based on the fact that for quadratic functions H (x_{k+1} - x_k) = ∇f(x_{k+1}) - ∇f(x_k), so by accumulating the gradients g_k and points x_k we can estimate H. 27/35

Quasi-Newton BFGS method. Set H_0 = I and update according to H_{k+1} = H_k + (γ_k γ_k^T)/(γ_k^T δ_k) - (H_k δ_k δ_k^T H_k)/(δ_k^T H_k δ_k), where δ_k = x_{k+1} - x_k and γ_k = ∇f(x_{k+1}) - ∇f(x_k). The matrix inverse can also be updated in this way. The directions δ_k form a conjugate set. H_{k+1} is positive definite if H_k is positive definite. The estimate H_k is used to form a local quadratic approximation as before. 28/35
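
In practice BFGS is rarely coded by hand. A minimal sketch using SciPy's built-in implementation on Rosenbrock's function (SciPy updates an estimate of the inverse Hessian internally, but the principle is the one above):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# BFGS needs only gradient evaluations; the curvature estimate is built up internally.
res = minimize(rosen, x0, jac=rosen_der, method='BFGS')
print(res.x, res.nit)   # close to [1, 1]; the iteration count depends on the line search
```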

BFGS example. The method converges in 34 iterations, compared to 18 for the full Newton method. 29/35

30/35 Non-linear least squares. It is very common in applications for a cost function f(x) to be the sum of a large number of squared residuals: f(x) = Σ_{i=1}^{M} r_i^2(x). If each residual r_i depends non-linearly on the parameters x, then the minimization of f(x) is a non-linear least squares problem.

31/35 Non-linear least squares. The M × N Jacobian of the vector of residuals r is defined as J(x) with entries J_ij = ∂r_i/∂x_j. Consider the gradient of f(x) = Σ_i r_i^2(x): ∂f/∂x_j = 2 Σ_i r_i ∂r_i/∂x_j. Hence ∇f(x) = 2 J^T r.

Non-linear least squares. For the Hessian we have ∇^2 f(x) = 2 J^T J + 2 Σ_i r_i ∇^2 r_i. The Gauss-Newton approximation keeps only the first term: H ≈ 2 J^T J. Note that the second-order term in the Hessian is multiplied by the residuals r_i. In most problems, the residuals will typically be small. Also, at the minimum, the residuals will typically be distributed with mean 0. For these reasons, the second-order term is often ignored. Hence, explicit computation of the full Hessian can again be avoided. 32/35

33/35 Gauss-Newton example. The minimization of the Rosenbrock function can be written as a least-squares problem with residual vector r(x) = (10 (x_2 - x_1^2), 1 - x_1)^T.
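
A sketch of the plain Gauss-Newton iteration for this residual vector (my own minimal implementation, without the line search mentioned on the next slide):

```python
import numpy as np

def residuals(v):
    """Rosenbrock as least squares: f(x) = ||r(x)||^2 with these residuals."""
    return np.array([10 * (v[1] - v[0]**2), 1 - v[0]])

def jacobian(v):
    """Jacobian J_ij = dr_i/dx_j of the residual vector."""
    return np.array([[-20 * v[0], 10.0],
                     [-1.0,        0.0]])

def gauss_newton(x0, iters=20, tol=1e-10):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r, J = residuals(x), jacobian(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)   # Gauss-Newton step: (J^T J) dx = -J^T r
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

print(gauss_newton([-1.2, 1.0]))   # converges to [1, 1]
```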

34/35 Gauss-Newton example. Minimization with the Gauss-Newton approximation and line search takes only 11 iterations.

35/35 Comparison. Figure: convergence paths of the Newton, conjugate gradient (CG), quasi-Newton and Gauss-Newton methods. Pavel Kordík (ČVUT FIT), MI-NON, 2011.