Scientific Computing: Optimization


Scientific Computing: Optimization
Aleksandar Donev, Courant Institute, NYU (donev@courant.nyu.edu)
Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012, March 8th, 2011
A. Donev (Courant Institute) Lecture VII 3/8/2011 1 / 26

Outline
1 Mathematical Background
2 Smooth Unconstrained Optimization
3 Constrained Optimization
4 Conclusions

Mathematical Background: Formulation

Optimization problems are among the most important in engineering and finance, e.g., minimizing production cost, maximizing profits, etc.:

min_{x∈ℝⁿ} f(x)

where x are some variable parameters and f: ℝⁿ → ℝ is a scalar objective function. Observe that one need only consider minimization, since

max_{x∈ℝⁿ} f(x) = −min_{x∈ℝⁿ} [−f(x)].

A local minimum x* is optimal in some neighborhood:

f(x*) ≤ f(x) for all x such that ‖x − x*‖ ≤ R, for some R > 0

(think of finding the bottom of a valley). Finding the global minimum is generally not possible for arbitrary functions (think of finding Mt. Everest without a satellite).

Mathematical Background: Connection to Nonlinear Systems

Assume that the objective function is differentiable (i.e., the first-order Taylor series converges, or the gradient exists). Then a necessary condition for a local minimizer is that x* be a critical point:

g(x*) = ∇ₓf(x*) = { ∂f/∂xᵢ (x*) }ᵢ = 0,

which is a system of nonlinear equations! In fact, similar methods, such as Newton or quasi-Newton, apply to both problems. Vice versa, observe that solving f(x) = 0 is equivalent to an optimization problem,

minₓ [ f(x)ᵀ f(x) ],

although this is only recommended under special circumstances.
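As a sketch of this equivalence (a hypothetical example, not from the lecture): the root x = √2 of f(x) = x² − 2 is also the global minimizer of F(x) = f(x)², so even plain gradient descent on F recovers the root:

```python
import math

# Hypothetical illustration: find the root of f(x) = x^2 - 2 by
# minimizing F(x) = f(x)^2, whose gradient is F'(x) = 4x(x^2 - 2).
def F_grad(x):
    return 4.0 * x * (x * x - 2.0)

x = 1.5           # initial guess
alpha = 0.05      # fixed step length, small enough for this F
for _ in range(200):
    x -= alpha * F_grad(x)

print(x)  # close to sqrt(2) ~ 1.41421
```

The step length and starting point are illustration choices; as the slide warns, this reformulation is only advisable in special circumstances (e.g., F can have spurious local minima where f ≠ 0).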

Mathematical Background: Sufficient Conditions

Assume now that the objective function is twice differentiable (i.e., the Hessian exists). A critical point x* is a local minimum if the Hessian is positive definite,

H(x*) = ∇ₓ²f(x*) ≻ 0,

which means that the minimum really looks like a valley or a convex bowl. At any local minimum the Hessian is positive semi-definite, ∇ₓ²f(x*) ⪰ 0. Methods that require Hessian information converge fast but are expensive (next class).

Mathematical Background: Mathematical Programming

The general term used is mathematical programming. The simplest case is unconstrained optimization,

min_{x∈ℝⁿ} f(x),

where x are some variable parameters and f: ℝⁿ → ℝ is a scalar objective function.

Find a local minimum x*: f(x*) ≤ f(x) for all x such that ‖x − x*‖ ≤ R, for some R > 0 (think of finding the bottom of a valley).

Find the best local minimum, i.e., the global minimum x*: this is virtually impossible in general, and there are many specialized techniques such as genetic programming, simulated annealing, branch-and-bound (e.g., using interval arithmetic), etc.

Special case: a strictly convex objective function has a unique local minimum, which is thus also the global minimum.

Mathematical Background: Constrained Programming

The most general form of constrained optimization is

min_{x∈X} f(x),

where X ⊂ ℝⁿ is a set of feasible solutions. The feasible set is usually expressed in terms of equality and inequality constraints:

h(x) = 0
g(x) ≤ 0

The only generally solvable case is convex programming: minimizing a convex function f(x) over a convex set X, where every local minimum is global. If f(x) is strictly convex, then there is a unique local and global minimum.

Mathematical Background: Special Cases

A special case is linear programming:

min_{x∈ℝⁿ} { cᵀx }  s.t.  Ax ≤ b.

An example of equality-constrained quadratic programming:

min_{x∈ℝ²} { x₁² + x₂² }  s.t.  x₁² + 2x₁x₂ + 3x₂² = 1,

generalized to arbitrary ellipsoids as

min_{x∈ℝⁿ} f(x) = ‖x‖₂² = xᵀx = Σᵢ₌₁ⁿ xᵢ²  s.t.  (x − x₀)ᵀ A (x − x₀) = 1.

Smooth Unconstrained Optimization: Necessary and Sufficient Conditions

A necessary condition for a local minimizer: the optimum x* must be a critical point (maximum, minimum or saddle point),

g(x*) = ∇ₓf(x*) = { ∂f/∂xᵢ (x*) }ᵢ = 0,

and an additional sufficient condition for a critical point x* to be a local minimum: the Hessian at the optimal point must be positive definite,

H(x*) = ∇ₓ²f(x*) = { ∂²f/∂xᵢ∂xⱼ (x*) }ᵢⱼ ≻ 0,

which means that the minimum really looks like a valley or a convex bowl.
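These two conditions can be checked mechanically on a small example (a hypothetical function, not from the slides); for a symmetric 2×2 Hessian, Sylvester's criterion reduces positive definiteness to two sign checks:

```python
# Hypothetical example: f(x, y) = x^2 + x*y + y^2 has the critical point
# (0, 0); its (constant) Hessian is [[2, 1], [1, 2]].
def grad(x, y):
    return (2*x + y, x + 2*y)

H = [[2.0, 1.0], [1.0, 2.0]]

gx, gy = grad(0.0, 0.0)
is_critical = (gx == 0.0 and gy == 0.0)   # necessary condition: grad = 0

# Sylvester's criterion for a symmetric 2x2 matrix: positive definite
# iff both leading principal minors are positive.
det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
is_pos_def = H[0][0] > 0 and det > 0      # sufficient condition

print(is_critical, is_pos_def)  # both True, so (0, 0) is a local minimum
```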

Smooth Unconstrained Optimization: Direct-Search Methods

A direct search method only requires f(x) to be continuous but not necessarily differentiable, and requires only function evaluations. Methods that do a search similar to that in bisection can be devised in higher dimensions also, but they may fail to converge and are usually slow. The MATLAB function fminsearch uses the Nelder-Mead or simplex-search method, which can be thought of as rolling a simplex downhill to find the bottom of a valley. But there are many others, and this is an active research area.

Curse of dimensionality: as the number of variables (dimensionality) n becomes larger, direct search becomes hopeless, since the number of samples needed grows as 2ⁿ!
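To make the idea concrete, here is a minimal direct-search sketch in Python (a simple compass/coordinate search, not the Nelder-Mead algorithm that fminsearch uses; the quadratic test function is a hypothetical example): probe a step in both directions along each coordinate, move whenever the objective improves, and halve the step otherwise.

```python
def compass_search(f, x, step=1.0, tol=1e-8, max_iter=1000):
    # Probe +/- step along each coordinate axis; take any improving
    # move, otherwise shrink the step. Needs only function values.
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for s in (+step, -step):
                y = list(x)
                y[i] += s
                fy = f(y)
                if fy < fx:
                    x, fx = y, fy
                    improved = True
        if not improved:
            step *= 0.5
    return x, fx

# Hypothetical smooth test function with its minimum at (1, 2):
f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2
x, fx = compass_search(f, [0.0, 0.0])
print(x)  # near [1.0, 2.0]
```

Note the cost: each iteration already uses 2n function evaluations, which is the seed of the curse of dimensionality mentioned above.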

Smooth Unconstrained Optimization: Minimum of 100(x₂ − x₁²)² + (a − x₁)² in MATLAB

% Rosenbrock or banana function:
a = 1;
banana = @(x) 100*(x(2) - x(1)^2)^2 + (a - x(1))^2;

% This function must accept array arguments!
banana_xy = @(x1, x2) 100*(x2 - x1.^2).^2 + (a - x1).^2;

figure(1); ezsurf(banana_xy, [0, 2, 0, 2])
[x, y] = meshgrid(linspace(0, 2, 100));
figure(2); contourf(x, y, banana_xy(x, y), 100)

% Correct answers are x = [1, 1] and f(x) = 0
[x, fval] = fminsearch(banana, [-1.2, 1], optimset('TolX', 1e-8))
x = 0.999999999187814  0.999999998441919
fval = 1.099088951919573e-18

Smooth Unconstrained Optimization: Figure of the Rosenbrock function

[Figure: surface and contour plots of f(x) = 100(x₂ − x₁²)² + (a − x₁)² over [0, 2] × [0, 2].]

Smooth Unconstrained Optimization: Descent Methods

Finding a local minimum is generally easier than the general problem of solving the nonlinear equations

g(x*) = ∇ₓf(x*) = 0,

because we can evaluate f in addition to ∇ₓf, and the Hessian is positive-(semi)definite near the solution (enabling simpler linear algebra such as Cholesky). If we have a current guess for the solution xᵏ, and a descent direction (i.e., downhill direction) dᵏ:

f(xᵏ + α dᵏ) < f(xᵏ) for all 0 < α ≤ α_max,

then we can move downhill and get closer to the minimum (valley):

xᵏ⁺¹ = xᵏ + αₖ dᵏ,

where αₖ > 0 is a step length.

Smooth Unconstrained Optimization: Gradient Descent Methods

For a differentiable function we can use the Taylor series:

f(xᵏ + α dᵏ) ≈ f(xᵏ) + α [ (∇f)ᵀ dᵏ ].

This means that the fastest local decrease in the objective is achieved when we move opposite of the gradient: steepest or gradient descent,

dᵏ = −∇f(xᵏ) = −gᵏ.

One option is to choose the step length using a line search, the one-dimensional minimization

αₖ = arg minₐ f(xᵏ + α dᵏ),

which needs to be solved only approximately.
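A minimal Python sketch of gradient descent, using a backtracking (Armijo) line search as a practical stand-in for the exact one-dimensional minimization; the quadratic objective and the constants are hypothetical illustration choices:

```python
def grad_descent(f, grad, x, c=1e-4, tol=1e-8, max_iter=1000):
    # Steepest descent d = -grad(x) with a backtracking (Armijo)
    # line search: halve alpha until f decreases sufficiently.
    for _ in range(max_iter):
        g = grad(x)
        gnorm2 = sum(gi*gi for gi in g)
        if gnorm2 < tol*tol:          # gradient ~ 0: critical point
            break
        alpha = 1.0
        while True:
            y = [xi - alpha*gi for xi, gi in zip(x, g)]
            if f(y) <= f(x) - c*alpha*gnorm2:   # sufficient decrease
                break
            alpha *= 0.5
        x = y
    return x

# Hypothetical quadratic f(x) = 1/2 x^T A x - b^T x with
# A = [[3, 0], [0, 1]], b = (3, 1); the minimizer solves Ax = b: x* = (1, 1).
f = lambda x: 0.5*(3*x[0]**2 + x[1]**2) - 3*x[0] - x[1]
grad = lambda x: [3*x[0] - 3, x[1] - 1]
x = grad_descent(f, grad, [0.0, 0.0])
print(x)  # near [1.0, 1.0]
```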

Smooth Unconstrained Optimization: Steepest Descent

Assume an exact line search was used, i.e., αₖ = arg minₐ φ(α) where φ(α) = f(xᵏ + α dᵏ). Then

φ′(α) = 0 = [ ∇f(xᵏ + α dᵏ) ]ᵀ dᵏ,

so consecutive search directions are orthogonal. This means that steepest descent takes a zig-zag path down to the minimum. Second-order analysis shows that steepest descent has linear convergence with convergence coefficient

C ≈ (1 − r) / (1 + r), where r = λ_min(H) / λ_max(H) = 1 / κ₂(H),

inversely proportional to the condition number of the Hessian. Steepest descent can be very slow for ill-conditioned Hessians: one improvement is to use the conjugate-gradient method instead (see book).

Smooth Unconstrained Optimization: Newton's Method

Making a second-order or quadratic model of the function,

f(xᵏ + Δx) ≈ f(xᵏ) + [ g(xᵏ) ]ᵀ (Δx) + ½ (Δx)ᵀ [ H(xᵏ) ] (Δx),

we obtain Newton's method:

∇f(xᵏ + Δx) = 0  ⇒  g + H (Δx) = 0  ⇒  Δx = −H⁻¹ g,

xᵏ⁺¹ = xᵏ − [ H(xᵏ) ]⁻¹ g(xᵏ).

Note that this is exact for quadratic objective functions, where H ≡ H(xᵏ) = const. Also note that this is identical to using the Newton-Raphson method for solving the nonlinear system ∇ₓf(x*) = 0.
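A one-dimensional sketch of the iteration (hypothetical example function, not from the slides; in 1D the update x ← x − f′(x)/f″(x) is the scalar analogue of −H⁻¹g):

```python
import math

# Hypothetical 1D example: minimize f(x) = x^2 + exp(x), which is
# strictly convex since f''(x) = 2 + exp(x) > 0 everywhere.
fp  = lambda x: 2*x + math.exp(x)   # f'
fpp = lambda x: 2 + math.exp(x)     # f''

x = 0.0
for _ in range(10):
    x -= fp(x) / fpp(x)             # Newton update

print(x, fp(x))  # f'(x) ~ 0 at the minimizer, roughly x ~ -0.3517
```

On a quadratic f the same loop would terminate exactly after the first update, matching the "exact for quadratic objectives" remark above; the quadratic convergence here means the number of correct digits roughly doubles each iteration.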

Smooth Unconstrained Optimization: Problems with Newton's Method

Newton's method is exact for a quadratic function and converges in one step! For non-linear objective functions, however, Newton's method requires solving a linear system every step: expensive. It may not converge at all if the initial guess is not very good, or may converge to a saddle point or maximum: unreliable. All of these are addressed by using variants of quasi-Newton methods:

xᵏ⁺¹ = xᵏ − αₖ Hₖ⁻¹ g(xᵏ),

where 0 < αₖ < 1 and Hₖ is an approximation to the true Hessian.

Constrained Optimization: General Formulation

Consider the constrained optimization problem:

min_{x∈ℝⁿ} f(x)  s.t.  h(x) = 0 (equality constraints), g(x) ≤ 0 (inequality constraints).

Note that in principle only inequality constraints need to be considered, since

h(x) = 0  ⟺  { h(x) ≤ 0 and h(x) ≥ 0 },

but this is not usually a good idea. We focus here on non-degenerate cases, without considering various complications that may arise in practice.

Constrained Optimization: Illustration of Lagrange Multipliers

[Figure: level sets of the objective together with the constraint surface.]

Constrained Optimization: Lagrange Multipliers, Single Equality

An equality constraint h(x) = 0 corresponds to an (n − 1)-dimensional constraint surface whose normal vector is ∇h. The illustration on the previous slide shows that, for a single smooth equality constraint, the gradient of the objective function must be parallel to the normal vector of the constraint surface:

∇f ∥ ∇h  ⇒  ∃λ s.t. ∇f + λ ∇h = 0,

where λ is the Lagrange multiplier corresponding to the constraint h(x) = 0. Note that the equation ∇f + λ∇h = 0 is in addition to the constraint h(x) = 0 itself.
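A worked sketch of this condition (a hypothetical example, not from the slides): minimize f(x, y) = x² + y² subject to h(x, y) = x + y − 1 = 0. Stationarity ∇f + λ∇h = 0 gives 2x + λ = 0 and 2y + λ = 0, and the constraint then pins down λ:

```python
# Hypothetical worked example: minimize f(x, y) = x^2 + y^2
# subject to h(x, y) = x + y - 1 = 0.
# Stationarity: 2x + lam = 0 and 2y + lam = 0, so x = y = -lam/2;
# substituting into x + y = 1 forces lam = -1, hence x = y = 1/2.
lam = -1.0
x = y = -lam / 2.0

# Verify the stationarity conditions and the constraint:
assert abs(2*x + lam) < 1e-12
assert abs(2*y + lam) < 1e-12
assert abs(x + y - 1.0) < 1e-12
print(x, y, lam)  # 0.5 0.5 -1.0
```

Geometrically, ∇f = (2x, 2y) = (1, 1) at the solution is indeed parallel to the constraint normal ∇h = (1, 1).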

Constrained Optimization: Lagrange Multipliers, m Equalities

When m equalities are present,

h₁(x) = h₂(x) = ⋯ = h_m(x) = 0,

the generalization is that the gradient ∇f must be in the span of the normal vectors of the constraints:

∇f + Σᵢ₌₁ᵐ λᵢ ∇hᵢ = ∇f + (∇h)ᵀ λ = 0,

where the Jacobian has the normal vectors as rows:

∇h = { ∂hᵢ/∂xⱼ }ᵢⱼ.

This is a first-order necessary optimality condition.

Constrained Optimization: Lagrange Multipliers, Single Inequality

At the solution, a given inequality constraint gᵢ(x) ≤ 0 can be

active if gᵢ(x*) = 0,
inactive if gᵢ(x*) < 0.

For inequalities, there is a definite sign (direction) for the constraint normal vectors: for an active constraint, you can move freely along −∇g but not along +∇g. This means that for a single active constraint

∇f = −μ ∇g, where μ > 0.

Constrained Optimization: Lagrange Multipliers, r Inequalities

The generalization is the same as for equalities:

∇f + Σᵢ₌₁ʳ μᵢ ∇gᵢ = ∇f + (∇g)ᵀ μ = 0.

But now there is an inequality condition on the Lagrange multipliers,

μᵢ > 0 if gᵢ = 0 (active),
μᵢ = 0 if gᵢ < 0 (inactive),

which can also be written as μ ≥ 0 and μᵀ g(x) = 0.

Constrained Optimization: The KKT Conditions

Putting equalities and inequalities together, we get the first-order Karush-Kuhn-Tucker (KKT) necessary condition: there exist Lagrange multipliers λ ∈ ℝᵐ and μ ∈ ℝʳ such that

∇f + (∇h)ᵀ λ + (∇g)ᵀ μ = 0,  μ ≥ 0  and  μᵀ g(x) = 0.

This is now a system of equations, similar to what we had for unconstrained optimization, but now also involving the (constrained) Lagrange multipliers. Note that there are also second-order necessary and sufficient conditions, similar to unconstrained optimization. Many numerical methods are based on Lagrange multipliers (see books on Optimization).
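A tiny check of the KKT conditions on a hypothetical one-variable problem: minimize f(x) = x² subject to g(x) = 1 − x ≤ 0 (i.e., x ≥ 1). The unconstrained minimizer x = 0 is infeasible, so the constraint is active at the solution x* = 1:

```python
# Hypothetical KKT check: minimize f(x) = x^2 s.t. g(x) = 1 - x <= 0.
# At x* = 1 the constraint is active; stationarity
# f'(x*) + mu*g'(x*) = 2 - mu = 0 gives mu = 2.
x_star = 1.0
g = 1.0 - x_star          # active: g(x*) = 0
mu = 2.0                  # from stationarity

assert abs(2*x_star + mu*(-1.0)) < 1e-12   # stationarity
assert mu >= 0.0                            # dual feasibility
assert abs(mu * g) < 1e-12                  # complementary slackness
print(x_star, mu)  # 1.0 2.0
```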

Constrained Optimization: Linear Programming

Consider linear programming (see illustration):

min_{x∈ℝⁿ} { cᵀx }  s.t.  Ax ≤ b.

The feasible set here is a polytope (polygon, polyhedron) in ℝⁿ; consider for now the case when it is bounded, meaning there are at least n + 1 constraints. The optimal point is a vertex of the polyhedron, meaning a point where (generically) n constraints are active:

A_act x* = b_act.

Solving the problem therefore means finding the subset of active constraints: a combinatorial search problem, solved using the simplex algorithm (search along the edges of the polytope). Lately, interior-point methods have become increasingly popular (they move inside the polytope).
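In very small dimensions the geometric picture can be verified by brute force (a hypothetical toy LP; real solvers use the simplex or interior-point methods): enumerate all candidate vertices, i.e., intersections of n constraints, keep the feasible ones, and take the one with the smallest objective.

```python
from itertools import combinations

# Toy 2D LP solved by brute-force vertex enumeration (hypothetical
# example; exponential cost, for illustration only).
def lp_by_vertices(c, A, b):
    best = None
    for i, j in combinations(range(len(A)), 2):
        # Candidate vertex: solve the 2x2 system A[i] x = b[i], A[j] x = b[j].
        det = A[i][0]*A[j][1] - A[i][1]*A[j][0]
        if abs(det) < 1e-12:
            continue  # parallel constraints, no vertex
        v = ((b[i]*A[j][1] - A[i][1]*b[j]) / det,
             (A[i][0]*b[j] - b[i]*A[j][0]) / det)
        # Keep only feasible vertices: A v <= b (up to roundoff).
        if all(row[0]*v[0] + row[1]*v[1] <= bk + 1e-9 for row, bk in zip(A, b)):
            val = c[0]*v[0] + c[1]*v[1]
            if best is None or val < best[0]:
                best = (val, v)
    return best

# min -x - 2y  s.t.  x >= 0, y >= 0, x + y <= 1
val, v = lp_by_vertices([-1, -2], [[-1, 0], [0, -1], [1, 1]], [0, 0, 1])
print(val, v)  # optimal value -2 at the vertex (0, 1)
```

The winning vertex (0, 1) has exactly n = 2 active constraints (x = 0 and x + y = 1), as the slide predicts; the simplex method finds the same vertex without enumerating all of them.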

Conclusions/Summary

Optimization, or mathematical programming, is one of the most important numerical problems in practice.

Optimization problems can be constrained or unconstrained, and the nature (linear, convex, quadratic, algebraic, etc.) of the functions involved matters.

Finding a global minimum of a general function is virtually impossible in high dimensions, but very important in practice.

An unconstrained local minimum can be found using direct search, gradient descent, or Newton-like methods.

Equality-constrained optimization is tractable, but the best method depends on the specifics. We looked at penalty methods only as an illustration, not because they are good in practice!

Constrained optimization is tractable for the convex case, otherwise often hard, and even NP-complete for integer programming.