
Nonlinear Optimization: What's important? Julian Hall, 10th May 2012

Convexity: convex problems
- A local minimizer is a global minimizer
- A solution of $\nabla f(x) = 0$ (a stationary point) is a minimizer
- A global minimizer can be found by searching downhill
- For a convex quadratic function $q(x) = g^T x + \frac{1}{2} x^T H x$:
  - the matrix $H$ is positive semi-definite
  - the minimizer is $x^* = -H^{-1} g$ (when $H$ is nonsingular)
- For a convex nonlinear function $f(x)$, at any point $x$ the Hessian matrix $H(x) = \nabla^2 f(x)$ is positive semi-definite
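
As a small numerical illustration of this slide (a sketch with data of my own choosing, not from the lecture), the minimizer of a convex quadratic can be computed directly from $H$ and $g$:

```python
import numpy as np

# Hypothetical convex quadratic q(x) = g^T x + 0.5 x^T H x
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # symmetric positive definite
g = np.array([1.0, -1.0])

assert np.all(np.linalg.eigvalsh(H) > 0)   # convexity: H is positive definite here

x_star = -np.linalg.solve(H, g)            # minimizer x* = -H^{-1} g
print(x_star, g + H @ x_star)              # gradient g + H x* is (numerically) zero
```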

Convexity: non-convex problems
- A local minimizer may not be a global minimizer
- A solution of $\nabla f(x) = 0$ (a stationary point) may be a maximizer, a saddle point, or a non-global minimizer
- Finding a global minimizer can be very hard
- For a non-convex quadratic function $q(x) = g^T x + \frac{1}{2} x^T H x$:
  - the matrix $H$ is indefinite or negative (semi-)definite
  - the unconstrained minimization of $q(x)$ is unbounded
  - $q(x)$ on $[0, 1]^n$ may have $2^n$ local minimizers

Matrix definiteness
A (square) symmetric matrix $A$ is one of the following:
- positive definite: $x^T A x > 0$ for all $x \neq 0$; all eigenvalues of $A$ are positive
- positive semi-definite: $x^T A x \geq 0$ for all $x$; all eigenvalues of $A$ are non-negative
- indefinite: $x^T A x$ takes both signs; eigenvalues of $A$ are both positive and negative
- negative semi-definite: $x^T A x \leq 0$ for all $x$; all eigenvalues of $A$ are non-positive
- negative definite: $x^T A x < 0$ for all $x \neq 0$; all eigenvalues of $A$ are negative
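
A minimal sketch (in Python with NumPy, my choice of language) of classifying definiteness from the eigenvalue signs, as on this slide:

```python
import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(A)            # eigenvalues of a symmetric matrix
    if np.all(eigvals > tol):
        return "positive definite"
    if np.all(eigvals >= -tol):
        return "positive semi-definite"
    if np.all(eigvals < -tol):
        return "negative definite"
    if np.all(eigvals <= tol):
        return "negative semi-definite"
    return "indefinite"

print(definiteness(np.array([[2.0, 0.0], [0.0, 3.0]])))    # positive definite
print(definiteness(np.array([[1.0, 0.0], [0.0, -1.0]])))   # indefinite
```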

Results
- For any matrix $A$, $A^T A$ is symmetric and positive semi-definite
- For any nonsingular matrix $A$, $A^T A$ is symmetric and positive definite
- For any positive definite matrix $A$: $A^{-1}$ is positive definite and $A^2$ is positive definite
- For any positive definite matrices $A$ and $B$: $A + B$ is positive definite

Unconstrained optimization
The problem: minimize $f(x)$ subject to $x \in \mathbb{R}^n$
- Necessary conditions at a minimizer $x^*$: first order, $\nabla f(x^*) = 0$; second order, $\nabla^2 f(x^*)$ positive semi-definite
- Sufficient conditions for $x^*$ to be a minimizer: first order, $\nabla f(x^*) = 0$; second order, $\nabla^2 f(x^*)$ positive definite
- Proofs not examinable

Unconstrained optimization
The problem: minimize $f(x)$ subject to $x \in \mathbb{R}^n$
Framework for a line search method (a code sketch of this loop follows this slide). At iterate $x_k$:
- Find a search direction $s_k$
- Perform a line search of $f(x_k + \alpha s_k)$ to identify a step length $\alpha_k$
- Move to the new iterate $x_{k+1} = x_k + \alpha_k s_k$
An alternative approach is the trust region method:
- Find a satisfactory (approximate) solution $s$ of: minimize $f(x_k + s)$ subject to $\|s\| \leq \Delta_k$
- Move to the new iterate $x_{k+1} = x_k + s$
- Worth studying as a tricky constrained optimization problem
- Valuable practical technique
- Not examinable
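
The line search framework above can be expressed compactly in code. The following is a minimal sketch (function names and signatures are my own, not from the lecture):

```python
import numpy as np

def line_search_method(f, grad, x0, direction, step_length, tol=1e-8, max_iter=200):
    """Generic line search framework: at each iterate x_k, pick a search
    direction s_k, choose a step length alpha_k, and set x_{k+1} = x_k + alpha_k s_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:          # stop at a stationary point
            break
        s = direction(x, g)                   # e.g. steepest descent: lambda x, g: -g
        alpha = step_length(f, grad, x, s)    # e.g. an exact or Armijo line search
        x = x + alpha * s
    return x

# Usage sketch: steepest descent with a fixed step on f(x) = ||x||^2 (purely illustrative)
x_min = line_search_method(lambda x: x @ x, lambda x: 2 * x, np.array([1.0, -2.0]),
                           direction=lambda x, g: -g,
                           step_length=lambda f, grad, x, s: 0.5)
```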

Line search methods
Find an (approximate) minimizer $\alpha_k$ of $f(x_k + \alpha s_k)$
- For a quadratic function $q(x) = g^T x + \frac{1}{2} x^T H x$, at $x_k$ with search direction $s_k$ the exact solution is $\alpha_k = -\frac{g_k^T s_k}{s_k^T H s_k}$
- For a nonlinear function, techniques to find $\alpha_k$ are:
  - Interpolation of $f(x_k + \alpha s_k)$ and its slope $f'(x_k + \alpha s_k)$ by quadratic or cubic polynomials. Valuable practical technique. Examinable
  - Armijo line search, considering values of $f(x_k + \alpha^j s_k)$ for values of $\alpha^j$ in a geometric sequence (sketched in code after this slide). Valuable for theoretical purposes. Not examinable
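
A sketch of two step-length rules from this slide: the exact step for a quadratic and an Armijo backtracking search (the constants alpha0, rho and c are conventional choices of mine, not from the slides):

```python
import numpy as np

def exact_step_quadratic(g, H, s):
    """Exact minimizer over alpha of q(x_k + alpha*s) for a quadratic q,
    where g is the gradient of q at x_k and H its Hessian."""
    return -(g @ s) / (s @ H @ s)

def armijo_step(f, x, g, s, alpha0=1.0, rho=0.5, c=1e-4):
    """Armijo backtracking: shrink alpha geometrically until the sufficient
    decrease condition f(x + alpha*s) <= f(x) + c*alpha*g^T s holds."""
    alpha, fx, slope = alpha0, f(x), g @ s
    while f(x + alpha * s) > fx + c * alpha * slope:
        alpha *= rho
    return alpha
```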

Search directions
- Steepest descent: $s_k = -g_k$
- Newton: $s_k = -H_k^{-1} g_k$
- Quasi-Newton: $s_k = -B_k g_k$
- Conjugate gradient: $s_k = -g_k + \beta_{k-1} s_{k-1}$

Steepest descent vs Newton's method
Steepest descent (SD):
- First order: needs only $\nabla f(x)$, so inexpensive
- Global convergence under weak assumptions
- Scale-dependent
- Rapid reduction in $f$ when far from $x^*$
- Linear (slow) local convergence to $x^*$
Newton's method (N):
- Second order: also needs $\nabla^2 f(x)$, so expensive
- Global convergence only under strong assumptions
- Scale-independent
- Can be unreliable when far from $x^*$
- Quadratic (fast) local convergence to $x^*$
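
To make the contrast concrete, here is a sketch computing the two directions at one point of the standard Rosenbrock test function (the test function and starting point are my choices, not from the lecture):

```python
import numpy as np

def grad_rosenbrock(x):
    """Gradient of f(x) = 100*(x2 - x1^2)^2 + (1 - x1)^2."""
    return np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                     200 * (x[1] - x[0]**2)])

def hess_rosenbrock(x):
    """Hessian of the same function."""
    return np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                     [-400 * x[0], 200.0]])

x = np.array([-1.2, 1.0])                  # a point far from the minimizer (1, 1)
g, H = grad_rosenbrock(x), hess_rosenbrock(x)
s_sd = -g                                  # steepest descent: first-order information only
s_newton = -np.linalg.solve(H, g)          # Newton: also needs the Hessian
print(s_sd, s_newton)
```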

Quasi-Newton methods (QN) $s_k = -B_k g_k$
Desired properties of $B_k$:
- Nonsingular
- Symmetric (since it is the inverse of a Hessian)
- Positive definite (so $s_k = -B_k g_k$ is a descent direction)
- Like $I$ when far from $x^*$ (so behaves like SD)
- Like $(H_k)^{-1}$ when close to $x^*$ (so behaves like N)

Quasi-Newton methods $s_k = -B_k g_k$
Methods:
- Start with $B_0 = I$ so $s_0 = -g_0$ (SD)
- $B_{k+1}$ relates the change $\delta_k$ in $x_k$ and the change $\gamma_k$ in $\nabla f(x_k)$ according to the quasi-Newton condition $B_{k+1} \gamma_k = \delta_k$
- The symmetric rank-one formula uses $B_{k+1} = B_k + a u u^T$
- Rank-two formulae (DFP, BFGS and the rest of the Broyden family) use $B_{k+1} = B_k + a u u^T + b v v^T$
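
The slide gives only the general rank-two form; as one concrete instance, here is a sketch of the BFGS update of the inverse-Hessian approximation, which satisfies the quasi-Newton condition $B_{k+1} \gamma_k = \delta_k$ (the code and naming are mine):

```python
import numpy as np

def bfgs_update(B, delta, gamma):
    """BFGS update of the inverse-Hessian approximation B so that
    B_new @ gamma == delta (the quasi-Newton condition).

    delta = x_{k+1} - x_k,  gamma = grad f(x_{k+1}) - grad f(x_k)."""
    rho = 1.0 / (gamma @ delta)
    I = np.eye(len(delta))
    V = I - rho * np.outer(delta, gamma)
    return V @ B @ V.T + rho * np.outer(delta, delta)
```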

Quasi-Newton methods
Properties:
- First order and $O(n^2)$ cost per iteration (cheaper than N)
- Rapid reduction in $f$ when far from $x^*$ (like SD)
- Super-linear (fast) local convergence to $x^*$ (similar to N)

The conjugate gradient method (CG)
For quadratic functions with $H$ positive definite:
- Let $s_1 = -g_1$
- Repeat, for $k = 1, \ldots, n$:
  - $\alpha_k = -\dfrac{g_k^T s_k}{s_k^T H s_k}$
  - $x_{k+1} = x_k + \alpha_k s_k$
  - $g_{k+1} = g + H x_{k+1}$
  - $\beta_k = \dfrac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$
  - $s_{k+1} = -g_{k+1} + \beta_k s_k$
- Until $g_{m+1} = 0$ for some $m \leq n$
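
A direct transcription of this recurrence into code (a sketch; variable names are mine):

```python
import numpy as np

def cg_quadratic(g0, H, x0, tol=1e-10):
    """Conjugate gradient for q(x) = g0^T x + 0.5 x^T H x with H positive
    definite, following the slide's recurrence; terminates in at most
    n steps in exact arithmetic."""
    x = np.asarray(x0, dtype=float)
    g = g0 + H @ x                 # gradient of q at x
    s = -g
    for _ in range(len(x)):
        if np.linalg.norm(g) <= tol:
            break
        alpha = -(g @ s) / (s @ H @ s)
        x = x + alpha * s
        g_new = g0 + H @ x
        beta = (g_new @ g_new) / (g @ g)
        s = -g_new + beta * s
        g = g_new
    return x
```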

CG for nonlinear $f(x)$ and linear systems $Ax = b$
- For nonlinear $f(x)$: perform an approximate line search and use $g_k = \nabla f(x_k)$
- For $Ax = b$:
  - If $A$ is symmetric and positive definite: the solution of $Ax = b$ is the minimizer of $q(x) = -b^T x + \frac{1}{2} x^T A x$
  - If $A$ is unsymmetric and nonsingular: the solution of $Ax = b$ is the solution of $A^T A x = A^T b$
  - Can solve $A^T A x = A^T b$ using CG by minimizing $q(x) = -(A^T b)^T x + \frac{1}{2} x^T A^T A x$, since $A^T A$ is symmetric and positive definite
- CG is now more useful for symmetric linear systems than for optimization

Least-squares problems
- Linear least squares: find the best solution of $Ax = b$ when $A \in \mathbb{R}^{m \times n}$, $m > n$
  - Minimize $q(x) = \|Ax - b\|^2$
  - Form and solve $A^T A x = A^T b$
- Nonlinear least squares: find the best solution of $r(x) = 0$ when $r : \mathbb{R}^n \to \mathbb{R}^m$, $m > n$
  - Minimize $f(x) = \|r(x)\|^2$
  - Use $\nabla f(x) = 2 A(x)^T r(x)$ and $\nabla^2 f(x) = 2 A(x)^T A(x) + 2 \sum_{i=1}^{m} r_i(x) \nabla^2 r_i(x)$, where $A(x)$ is the Jacobian matrix of $r(x)$
  - Using the full Hessian is Newton's method; using $2 A(x)^T A(x) \approx \nabla^2 f(x)$ is Gauss-Newton
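
A sketch of a Gauss-Newton iteration for nonlinear least squares; it solves a linear least-squares subproblem at each step rather than forming $A(x)^T A(x)$ explicitly (the routine and its interface are my own illustration, not from the lecture):

```python
import numpy as np

def gauss_newton(r, J, x0, max_iter=50, tol=1e-10):
    """Gauss-Newton for minimizing f(x) = ||r(x)||^2: drop the second-order
    term in the Hessian and take the step s solving J(x) s ~= -r(x),
    which is equivalent to the normal equations (J^T J) s = -J^T r."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        rx, Jx = r(x), J(x)
        g = 2.0 * Jx.T @ rx                            # gradient of f
        if np.linalg.norm(g) <= tol:
            break
        s, *_ = np.linalg.lstsq(Jx, -rx, rcond=None)   # linear least-squares step
        x = x + s
    return x
```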

Constrained optimization
The problem: minimize $f(x)$ subject to $x \in S \subseteq \mathbb{R}^n$
- Convexity of the feasible region $S$ is important
- If $S$ is convex and $f$ is convex on $S$ then it is a convex programming problem (CPP)
- If $S$ is non-convex then there may be multiple local minimizers
- The most general nonlinear optimization problem in this course is: minimize $f(x)$ subject to $c_i(x) = 0$, $i \in E$, and $c_i(x) \geq 0$, $i \in I$

Kuhn-Tucker (KT) conditions
For the nonlinear optimization problem: minimize $f(x)$ subject to $c_i(x) = 0$, $i \in E$, and $c_i(x) \geq 0$, $i \in I$
The Kuhn-Tucker (KT) conditions at a point $x$ are that there exist Lagrange multipliers $\lambda$ such that
- (KT1) $\sum_i \lambda_i \nabla c_i(x) = \nabla f(x)$
- (KT2) $c_i(x) = 0$, $i \in E$
- (KT3) $c_i(x) \geq 0$, $i \in I$
- (KT4) $\lambda_i \geq 0$, $i \in I$
- (KT5) $\lambda_i = 0$, $i \notin A$
where the active set $A$ contains the indices of the active constraints at $x$
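
A small hypothetical example of checking the KT conditions numerically: minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 \geq 1$, with candidate $x^* = (0.5, 0.5)$ and multiplier $\lambda = 1$ (this example is mine, not from the lecture):

```python
import numpy as np

# Hypothetical problem: minimize x1^2 + x2^2  subject to  c(x) = x1 + x2 - 1 >= 0
x = np.array([0.5, 0.5])
lam = 1.0

grad_f = 2 * x                       # gradient of the objective
grad_c = np.array([1.0, 1.0])        # gradient of the single inequality constraint
c = x.sum() - 1.0

print(np.allclose(lam * grad_c, grad_f))   # KT1: sum_i lambda_i grad c_i = grad f
print(c >= 0)                              # KT3: feasibility of the inequality
print(lam >= 0)                            # KT4: multiplier sign
print(np.isclose(lam * c, 0.0))            # KT5: lambda_i = 0 off the active set
```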

Convex programming problems (CPP)
Definition (CPP): minimize $f(x)$ subject to $c_i(x) \geq 0$, $i = 1, 2, \ldots, m$, where each $c_i$ is concave and $f$ is convex on $S$
Theorems For a CPP:
- Every local minimizer $x^*$ is a global minimizer
- The set of global minimizers is convex and, if $f(x)$ is strictly convex, there is a unique global minimizer
- If the KT conditions hold at $x^*$ then it is a global minimizer
- Proofs examinable

Convex duality The Wolfe dual
For the primal CPP (P), the Wolfe dual problem is
(D): maximize $L(x, \lambda)$ subject to $\nabla_x L(x, \lambda) = 0$, $\lambda \geq 0$, where $L(x, \lambda) = f(x) - \sum_{i=1}^{m} \lambda_i c_i(x)$
Theorem If $x^*$ is a minimizer of (P) then $x^*$ and $\lambda^*$ are a maximizer of (D), and the optimal objective values of (P) and (D) are equal
Proof examinable

Convex duality: examples LP problems
- For the primal LP problem (P): minimize $c^T x$ subject to $Ax \geq b$, $x \geq 0$,
- the dual LP problem is (D): maximize $b^T \lambda$ subject to $A^T \lambda \leq c$, $\lambda \geq 0$
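
A sketch verifying LP duality numerically with scipy.optimize.linprog on hypothetical data (the matrices below are mine; linprog takes constraints in the form $A_{ub} x \leq b_{ub}$ and treats variables as non-negative by default):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: primal (P)  min c^T x  s.t.  A x >= b, x >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([3.0, 4.0])

# Primal: rewrite A x >= b as -A x <= -b
primal = linprog(c, A_ub=-A, b_ub=-b)

# Dual (D): max b^T lam  s.t.  A^T lam <= c, lam >= 0  (minimize -b^T lam)
dual = linprog(-b, A_ub=A.T, b_ub=c)

print(primal.fun, -dual.fun)   # equal optimal values (both 7.0 for this data)
```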

Linearly constrained optimization
For the linearly constrained optimization problem: minimize $f(x)$ subject to $a_i^T x = b_i$, $i \in E$, and $a_i^T x \geq b_i$, $i \in I$
- The feasible region $S$ is convex
- If $f$ is convex on $S$ then it is a CPP
- The KT conditions at a point $x$ are:
  - (KT1) $\sum_i \lambda_i a_i = \nabla f(x)$
  - (KT2) $a_i^T x = b_i$, $i \in E$
  - (KT3) $a_i^T x \geq b_i$, $i \in I$
  - (KT4) $\lambda_i \geq 0$, $i \in I$
  - (KT5) $\lambda_i = 0$, $i \notin A$

Necessary and sufficient conditions for a minimizer
- Necessary conditions at a minimizer $x^*$: first order, the KT conditions are satisfied; second order, the curvature of $f$ is non-negative in directions of zero slope
- Sufficient conditions for $x^*$ to be a minimizer: first order, the KT conditions are satisfied; second order, the curvature of $f$ is positive in directions of zero slope
- Proofs examinable

Nonlinear constrained optimization
For the nonlinear constrained optimization problem: minimize $f(x)$ subject to $c_i(x) = 0$, $i \in E$, and $c_i(x) \geq 0$, $i \in I$
- Necessary and sufficient conditions for a minimizer are as for linearly constrained optimization problems, but the curvature requirement applies to the Lagrangian $L(x, \lambda) = f(x) - \sum_i \lambda_i c_i(x)$
- Proofs not examinable