MVE165/MMG631 Linear and integer optimization with applications. Lecture 13: Overview of nonlinear programming. Ann-Brith Strömberg

MVE165/MMG631 Overview of nonlinear programming. Ann-Brith Strömberg, 2015-05-21

Areas of application, examples (Ch. 9.1)
Structural optimization: Design of aircraft, ships, bridges, etc. Decide on the material and on the topology and thickness of a mechanical structure. Minimize weight or maximize stiffness, with constraints on deformation at certain loads, strength, fatigue limit, etc.
Analysis and design of traffic networks: Estimate traffic flows and discharges, detect bottlenecks, analyze the effects of traffic signals, tolls, etc.

Areas of application, more examples (Ch. 9.1)
Least-squares adaptation of data: Engine development, design of antennas or tyres, etc. For each function evaluation a computationally expensive (time-consuming) simulation may be needed.
Maximize the volume of a cylinder while keeping the surface area constant.
Wind power generation: The energy content in the wind is proportional to v^3 (in Ass3a discretized measured data is used).

An overview of nonlinear optimization
General notation for nonlinear programs:
minimize_{x ∈ R^n} f(x)
subject to g_i(x) ≤ 0, i ∈ L,
           h_i(x) = 0, i ∈ E.
Some special cases:
Unconstrained problems (L = E = ∅): minimize f(x) subject to x ∈ R^n
Convex programming: f convex, g_i convex, i ∈ L, h_i linear, i ∈ E
Linearly constrained problems: g_i, i ∈ L, and h_i, i ∈ E, are linear
Quadratic programming: f(x) = c^T x + (1/2) x^T Q x
Linear programming: f(x) = c^T x
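As a concrete illustration of this general notation, the following minimal sketch (not part of the course material) poses a small nonlinear program in Python using scipy.optimize.minimize, assuming SciPy is available. The test problem is the example solved by hand later in this lecture. Note that SciPy's "ineq" convention requires constraints of the form c(x) ≥ 0, so g_i(x) ≤ 0 is passed as −g_i(x) ≥ 0.

    # Sketch: minimize f(x) = 2x1^2 + 2x1x2 + x2^2 - 10x1 - 10x2
    #         subject to g1(x) = x1^2 + x2^2 - 5 <= 0, g2(x) = 3x1 + x2 - 6 <= 0
    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2 - 10*x[0] - 10*x[1]
    cons = [
        {"type": "ineq", "fun": lambda x: -(x[0]**2 + x[1]**2 - 5)},  # -g1(x) >= 0
        {"type": "ineq", "fun": lambda x: -(3*x[0] + x[1] - 6)},      # -g2(x) >= 0
    ]
    res = minimize(f, x0=np.zeros(2), constraints=cons)
    print(res.x, res.fun)  # expected: approximately (1, 2) and -20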

Properties of nonlinear programs
The mathematical properties of nonlinear optimization problems can be very different
No algorithm exists that solves all nonlinear optimization problems
An optimal solution does not have to be located at an extreme point
Nonlinear programs can be unconstrained (what happens if a linear program has no constraints?)
f may be differentiable or non-differentiable, e.g., the Lagrangean dual objective function (Ass3b)
For convex problems, algorithms (typically) converge to an optimal solution
Nonlinear problems can have local optima that are not global optima

Consider the problem to minimize f(x) subject to x ∈ S
Figure: a one-dimensional function f over the interval S = [x_1, x_7], with candidate points x_1, ..., x_7 marked.
Possible extremal points are
boundary points of S = [x_1, x_7] (i.e., {x_1, x_7})
stationary points, where f'(x) = 0 (i.e., {x_2, ..., x_6})
discontinuities in f or f'
Draw!

Boundary and stationary points (Ch. 10.0)
Boundary points: x is a boundary point of the feasible set S = {x ∈ R^n | g_i(x) ≤ 0, i ∈ L} if g_i(x) ≤ 0, i ∈ L, and g_i(x) = 0 for at least one index i ∈ L
Stationary points: x is a stationary point of f if ∇f(x) = 0_n (for n = 1: if f'(x) = 0)

Local and global minima (maxima) (Ch. 2.4)
Consider the nonlinear optimization problem to minimize f(x) subject to x ∈ S
Local minimum
In words: A solution is a local minimum if it is feasible and no other feasible solution in a sufficiently small neighbourhood has a lower objective value
Formally: x̄ is a local minimum if x̄ ∈ S and there exists ε > 0 such that f(x̄) ≤ f(x) for all x ∈ {y ∈ S : ‖y − x̄‖ ≤ ε}
Draw!!
Global minimum
In words: A solution is a global minimum if it is feasible and no other feasible solution has a lower objective value
Formally: x̄ is a global minimum if x̄ ∈ S and f(x̄) ≤ f(x) for all x ∈ S

When is a local optimum also a global optimum? (Ch. 9.3)
The concept of convexity is essential
Functions: convex (minimization), concave (maximization)
Sets: convex (minimization and maximization)
The minimization (maximization) of a convex (concave) function over a convex set is referred to as a convex optimization problem
Definition 9.5 (Convex optimization problem): If f and g_i, i ∈ L, are convex functions, then
minimize f(x) subject to g_i(x) ≤ 0, i ∈ L
is said to be a convex optimization problem
Theorem 9.1 (Global optimum): Let x̄ be a local optimum of a convex optimization problem. Then x̄ is also a global optimum

Convex functions
A function f is convex on S if, for any x, y ∈ S, it holds that
f(αx + (1−α)y) ≤ αf(x) + (1−α)f(y) for all 0 ≤ α ≤ 1
Figure: a convex function, whose graph lies on or below the chord between (x, f(x)) and (y, f(y)), and a non-convex function, for which f(αx + (1−α)y) > αf(x) + (1−α)f(y) for some x, y and α.
The function f is strictly convex on S if, for any x, y ∈ S such that x ≠ y, it holds that
f(αx + (1−α)y) < αf(x) + (1−α)f(y) for all 0 < α < 1

Convex sets
A set S is convex if, for any x, y ∈ S, it holds that αx + (1−α)y ∈ S for all 0 ≤ α ≤ 1
Figure: examples of convex sets (every line segment between two points of the set stays in the set) and non-convex sets (some line segment leaves the set).
Theorems 9.2 & 9.3: Consider a set S defined by the intersection of m = |L| inequalities, where g_i : R^n → R, i ∈ L:
S = {x ∈ R^n | g_i(x) ≤ 0, i ∈ L}
If all the functions g_i, i ∈ L, are convex on R^n, then S is a convex set

The Karush-Kuhn-Tucker conditions: necessary conditions for optimality
Let S := {x ∈ R^n | g_i(x) ≤ 0, i ∈ L}
Assume that the function f : R^n → R is differentiable, the functions g_i : R^n → R, i ∈ L, are convex and differentiable, and there exists a point x ∈ S such that g_i(x) < 0, i ∈ L
If x* ∈ S is a local minimum of f over S, then there exists a vector µ ∈ R^m (where m = |L|) such that
∇f(x*) + Σ_{i ∈ L} µ_i ∇g_i(x*) = 0_n
µ_i g_i(x*) = 0, i ∈ L
g_i(x*) ≤ 0, i ∈ L
µ ≥ 0_m

Geometry of the Karush-Kuhn-Tucker conditions
Figure: Geometric interpretation of the Karush-Kuhn-Tucker conditions, showing a feasible set S bounded by the constraints g_1 = 0, g_2 = 0, g_3 = 0 and the gradients ∇f(x̄), ∇g_1(x̄), ∇g_2(x̄) at a point x̄ where g_1 and g_2 are active. At a local minimum, the negative gradient of the objective function can be expressed as a non-negative linear combination of the gradients of the constraints that are active at this point.

The Karush-Kuhn-Tucker conditions: sufficient for optimality under convexity
Assume that the functions f, g_i : R^n → R, i ∈ L, are convex and differentiable, and let S = {x ∈ R^n | g_i(x) ≤ 0, i ∈ L}
If the conditions (where m = |L|)
∇f(x*) + Σ_{i ∈ L} µ_i ∇g_i(x*) = 0_n
µ_i g_i(x*) = 0, i ∈ L
µ ≥ 0_m
hold, then x* ∈ S is a global minimum of f over S
The Karush-Kuhn-Tucker conditions can also be stated for optimization problems with equality constraints
For unconstrained optimization the KKT conditions read: ∇f(x*) = 0_n
For a quadratic program the KKT conditions form a system of linear (in)equalities plus the complementarity constraints
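To make the last remark concrete, here is a sketch in standard notation (not taken from these slides) of the KKT system for a quadratic program with linear inequality constraints, minimize c^T x + (1/2) x^T Q x subject to Ax ≤ b, where Q is symmetric and A ∈ R^{m×n}:
Qx* + c + A^T µ = 0_n
µ_i (Ax* − b)_i = 0, i = 1, ..., m
Ax* ≤ b, µ ≥ 0_m
Apart from the complementarity conditions in the second row, all relations are linear in (x*, µ), which is what specialized quadratic programming methods exploit.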

The optimality conditions can be used to
verify a (local) optimal solution
solve certain special cases of nonlinear programs (e.g., quadratic programs)
construct algorithms
derive properties of a solution to a nonlinear program

Example
minimize f(x) := 2x_1^2 + 2x_1 x_2 + x_2^2 − 10x_1 − 10x_2
subject to x_1^2 + x_2^2 ≤ 5
           3x_1 + x_2 ≤ 6
Is x^0 = (1, 2)^T a Karush-Kuhn-Tucker point? Is it an optimal solution?
Derive: ∇f(x) = (4x_1 + 2x_2 − 10, 2x_1 + 2x_2 − 10)^T, ∇g_1(x) = (2x_1, 2x_2)^T, and ∇g_2(x) = (3, 1)^T
The KKT conditions at x^0:
4x_1^0 + 2x_2^0 − 10 + 2x_1^0 µ_1 + 3µ_2 = 0
2x_1^0 + 2x_2^0 − 10 + 2x_2^0 µ_1 + µ_2 = 0
µ_1((x_1^0)^2 + (x_2^0)^2 − 5) = 0, µ_2(3x_1^0 + x_2^0 − 6) = 0
µ_1, µ_2 ≥ 0
Inserting x^0 = (1, 2)^T gives
2µ_1 + 3µ_2 = 2
4µ_1 + µ_2 = 4
0 · µ_1 = 0, (−1) · µ_2 = 0
µ_1, µ_2 ≥ 0
which yields µ_2 = 0 and µ_1 = 1 ≥ 0

Example, continued
OK, the Karush-Kuhn-Tucker conditions hold. Is the solution optimal? Check convexity!
∇²f(x) = [4 2; 2 2], ∇²g_1(x) = [2 0; 0 2], ∇²g_2(x) = 0_{2×2}
All three Hessians are positive semidefinite, so f, g_1, and g_2 are convex
⇒ x^0 = (1, 2)^T is an optimal solution and f(x^0) = −20
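As a sanity check, the hand calculation above can be reproduced numerically. The following sketch (plain NumPy, my own illustration) verifies stationarity, complementarity, primal feasibility, and dual feasibility at x^0 = (1, 2) with the multipliers µ = (1, 0) found above.

    # Numerical check of the KKT conditions at x0 = (1, 2) with mu = (1, 0)
    import numpy as np

    x = np.array([1.0, 2.0])
    mu = np.array([1.0, 0.0])

    grad_f = np.array([4*x[0] + 2*x[1] - 10, 2*x[0] + 2*x[1] - 10])
    grad_g = np.array([[2*x[0], 2*x[1]],   # gradient of g1(x) = x1^2 + x2^2 - 5
                       [3.0,    1.0]])     # gradient of g2(x) = 3x1 + x2 - 6
    g = np.array([x[0]**2 + x[1]**2 - 5, 3*x[0] + x[1] - 6])

    stationarity = grad_f + grad_g.T @ mu  # should be the zero vector
    complementarity = mu * g               # should be zero componentwise
    print(stationarity, complementarity, np.all(g <= 0), np.all(mu >= 0))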

General iterative search method for unconstrained optimization (Ch. 2.5.1)
1. Choose a starting solution x_0 ∈ R^n. Let k = 0
2. Determine a search direction d_k
3. If a termination criterion is fulfilled: Stop!
4. Determine a step length t_k by solving minimize_{t ≥ 0} ϕ(t) := f(x_k + t d_k)
5. New iteration point: x_{k+1} = x_k + t_k d_k
6. Let k := k + 1 and return to step 2
How to choose search directions d_k, step lengths t_k, and termination criteria?

Improving search directions (Ch. 10)
Goal: f(x_{k+1}) < f(x_k) (minimization)
How does f change locally in a direction d_k at x_k?
Taylor expansion (Ch. 9.2): f(x_k + t d_k) = f(x_k) + t ∇f(x_k)^T d_k + O(t²)
For sufficiently small t > 0: f(x_k + t d_k) < f(x_k) ⟺ ∇f(x_k)^T d_k < 0
Definition
If ∇f(x_k)^T d_k < 0 then d_k is a descent direction for f at x_k
If ∇f(x_k)^T d_k > 0 then d_k is an ascent direction for f at x_k
We wish to minimize (maximize) f over R^n ⇒ choose d_k as a descent (an ascent) direction from x_k

An improving step
Figure: At x_k, the descent direction d_k is generated. A step of length t_k is taken in this direction, producing x_{k+1}. At this point, a new descent direction d_{k+1} is generated, etc.

General iterative search method for unconstrained optimization (Ch. 2.5.1)
1. Choose a starting solution x_0 ∈ R^n. Let k = 0
2. Determine a search direction d_k
3. If a termination criterion is fulfilled: Stop!
4. Determine a step length t_k by solving minimize_{t ≥ 0} ϕ(t) := f(x_k + t d_k)
5. New iteration point: x_{k+1} = x_k + t_k d_k
6. Let k := k + 1 and return to step 2

Step length: line search (minimization) (Ch. 10.4)
Solve minimize_{t ≥ 0} ϕ(t) := f(x_k + t d_k), where d_k is a descent direction from x_k
A minimization problem in one variable, with solution t_k
Analytic solution: ϕ'(t_k) = 0 (seldom possible to derive)
Numerical solution methods:
The golden section method (reduces the interval of uncertainty)
The bisection method (reduces the interval of uncertainty)
Newton-Raphson's method
Armijo's method
In practice: do not solve the problem exactly, but only to a sufficient improvement of the function value, e.g. f(x_k + t_k d_k) ≤ f(x_k) − ε for some ε > 0 (an Armijo-type rule is sketched below)
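A minimal sketch of a backtracking line search of Armijo type (my own illustration, not code from the course; the parameter names sigma and beta and their default values are assumptions). It accepts the first step that gives a sufficient decrease relative to the predicted decrease t ∇f(x_k)^T d_k.

    # Backtracking line search with an Armijo-type sufficient-decrease condition
    import numpy as np

    def armijo_step(f, grad_f, x, d, t0=1.0, sigma=1e-4, beta=0.5):
        """Return t such that f(x + t*d) <= f(x) + sigma*t*grad_f(x)^T d."""
        fx = f(x)
        slope = grad_f(x) @ d          # directional derivative; must be < 0 (descent direction)
        t = t0
        while f(x + t * d) > fx + sigma * t * slope:
            t *= beta                  # shrink the step until sufficient decrease holds
        return t

Typically d is taken as the steepest-descent direction −∇f(x_k), in which case the slope is −‖∇f(x_k)‖² < 0 and the loop terminates for smooth f.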

Line search
Figure: A line search from x_k in a descent direction d_k, where t_k solves minimize_{t ≥ 0} ϕ(t) := f(x_k + t d_k).

General iterative search method for unconstrained optimization
1. Choose a starting solution x_0 ∈ R^n. Let k = 0
2. Determine a search direction d_k
3. If a termination criterion is fulfilled: Stop!
4. Determine a step length t_k by solving minimize_{t ≥ 0} ϕ(t) := f(x_k + t d_k)
5. New iteration point: x_{k+1} = x_k + t_k d_k
6. Let k := k + 1 and return to step 2

Termination criteria
Needed since ∇f(x_k) = 0_n will never be fulfilled exactly
Typical choices (ε_j > 0, j = 1, ..., 4):
(a) ‖∇f(x_k)‖ < ε_1
(b) |f(x_{k+1}) − f(x_k)| < ε_2
(c) ‖x_{k+1} − x_k‖ < ε_3
(d) t_k < ε_4
These are often combined (criteria (a) and (c) are used in the sketch below)
The search method only guarantees a stationary solution, whose properties are determined by the properties of f (convexity, ...)
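Putting the pieces together, the following sketch (my own illustration, not from the course material) instantiates the general iterative search method with the steepest-descent direction d_k = −∇f(x_k), the armijo_step function sketched above, and termination criteria (a) and (c).

    # Steepest descent with Armijo backtracking and termination criteria (a) and (c)
    import numpy as np

    def steepest_descent(f, grad_f, x0, eps1=1e-6, eps3=1e-9, max_iter=1000):
        x = np.asarray(x0, dtype=float)
        for k in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < eps1:          # criterion (a): small gradient norm
                break
            d = -g                                # steepest-descent direction
            t = armijo_step(f, grad_f, x, d)      # step length from the line-search sketch
            x_new = x + t * d
            if np.linalg.norm(x_new - x) < eps3:  # criterion (c): small step
                x = x_new
                break
            x = x_new
        return x

    # Example: minimize f(x) = x1^2 + 10*x2^2 (minimum at the origin)
    f = lambda x: x[0]**2 + 10*x[1]**2
    grad_f = lambda x: np.array([2*x[0], 20*x[1]])
    print(steepest_descent(f, grad_f, [5.0, -3.0]))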

Constrained optimization: penalty methods
Consider both inequality and equality constraints:
minimize_{x ∈ R^n} f(x) subject to g_i(x) ≤ 0, i ∈ L, h_i(x) = 0, i ∈ E   (1)
Drop the constraints and add terms to the objective that penalize infeasible solutions:
minimize_{x ∈ R^n} F_µ(x) := f(x) + µ Σ_{i ∈ L∪E} α_i(x)   (2)
where µ > 0 and α_i(x) = 0 if x satisfies constraint i, α_i(x) > 0 otherwise
Common penalty functions (which of these are differentiable?):
i ∈ L: α_i(x) = max{0, g_i(x)} or α_i(x) = (max{0, g_i(x)})²
i ∈ E: α_i(x) = |h_i(x)| or α_i(x) = h_i(x)²

Squared and non-squared penalty functions
Example: minimize (x² − 20 ln x) subject to x ≥ 5
Figure: the objective x² − 20 ln x together with the non-squared penalized function x² − 20 ln x + max{0, 5 − x} and the squared penalized function x² − 20 ln x + (max{0, 5 − x})².
g_i differentiable ⇒ the squared penalty function is differentiable

Squared penalty functions
In practice: start with a low value of µ > 0 and increase it as the computations proceed
Example: minimize (x² − 20 ln x) subject to x ≥ 5   (∗)
minimize (x² − 20 ln x + µ(max{0, 5 − x})²)   (∗∗)
Figure: the squared penalty function for µ = 0.3, 0.6, 1.2, 2.4, 4.8, and 19.2. There is no finite µ such that an optimal solution for (∗∗) is optimal (feasible) for (∗).
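A minimal numerical sketch of this scheme on the example above (my own illustration, assuming SciPy is available; the increase factor 10 for µ is an assumption). The squared-penalty function is minimized for a sequence of increasing µ values, and the constraint violation shrinks towards zero.

    # Squared (quadratic) penalty method for: minimize x^2 - 20*ln(x) subject to x >= 5
    import numpy as np
    from scipy.optimize import minimize_scalar

    def F(x, mu):
        return x**2 - 20*np.log(x) + mu*max(0.0, 5.0 - x)**2   # penalized objective

    mu = 0.3
    for _ in range(8):
        res = minimize_scalar(lambda x: F(x, mu), bounds=(1e-6, 10.0), method="bounded")
        print(f"mu = {mu:7.1f}  x = {res.x:.4f}  violation = {max(0.0, 5.0 - res.x):.4f}")
        mu *= 10.0   # increase the penalty parameter and re-solve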

Non-squared penalty functions
In practice: start with a low value of µ > 0 and increase it as the computations proceed
Example: minimize (x² − 20 ln x) subject to x ≥ 5   (+)
minimize (x² − 20 ln x + µ max{0, 5 − x})   (++)
Figure: the non-squared penalty function for µ = 1, 2, 4, 8, and 16. For µ ≥ 6 the optimal solution for (++) is optimal (and feasible) for (+).

Constrained optimization: barrier methods
Consider only inequality constraints:
minimize_{x ∈ R^n} f(x) subject to g_i(x) ≤ 0, i ∈ L   (3)
Drop the constraints and add terms to the objective that prevent the iterates from approaching the boundary of the feasible set:
minimize_{x ∈ R^n} F_µ(x) := f(x) + µ Σ_{i ∈ L} α_i(x)   (4)
where µ > 0 and α_i(x) → +∞ as g_i(x) → 0 (i.e., as constraint i approaches being active)
Common barrier functions:
α_i(x) = −ln[−g_i(x)] or α_i(x) = −1/g_i(x)

Logarithmic barrier functions
Choose µ > 0 and decrease it as the computations proceed
Example: minimize (x² − 20 ln x) subject to x ≥ 5
minimize_{x > 5} (x² − 20 ln x − µ ln(x − 5))
Figure: the logarithmic barrier function for a sequence of decreasing values of µ.
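A minimal sketch of the logarithmic-barrier scheme on the same example (my own illustration, assuming SciPy is available; the decrease factor 0.5 for µ and the starting value µ = 8 are assumptions). The barrier subproblem is solved over the interior x > 5 and the iterates approach x* = 5 from inside the feasible set as µ decreases.

    # Logarithmic barrier method for: minimize x^2 - 20*ln(x) subject to x >= 5
    import numpy as np
    from scipy.optimize import minimize_scalar

    def F(x, mu):
        return x**2 - 20*np.log(x) - mu*np.log(x - 5.0)   # defined only for x > 5

    mu = 8.0
    for _ in range(10):
        res = minimize_scalar(lambda x: F(x, mu), bounds=(5.0 + 1e-9, 10.0), method="bounded")
        print(f"mu = {mu:8.4f}  x = {res.x:.5f}")
        mu *= 0.5   # decrease the barrier parameter; iterates approach the boundary x = 5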

Fractional barrier functions
Choose µ > 0 and decrease it as the computations proceed
Example: minimize (x² − 20 ln x) subject to x ≥ 5
minimize_{x > 5} (x² − 20 ln x + µ/(x − 5))
Figure: the fractional barrier function for a sequence of decreasing values of µ.