ISM206 Lecture: Optimization of Nonlinear Objective with Linear Constraints


Instructor: Prof. Kevin Ross
Scribe: Nitish John
October 18, 2011

1 The Basic Goal

The main idea is to transform a given constrained problem into an equivalent unconstrained problem. The theory and methods for unconstrained optimization can then be applied to the new problem.

2 Overview of methods for constrained NLP

2.1 Quadratic Programs
2.2 Separable Programs
2.3 Gradient Descent Style Methods
2.4 Newton Style Methods
2.5 Penalty Function
2.6 Barrier Function
2.7 Sequential Approximation

All of these methods take a constrained-to-unconstrained approach to solving the problem. For example, the penalty method adds a penalty term for constraint violation to the objective and then treats the problem as unconstrained. Most of these methods do not guarantee an exact solution, but they allow us to approximate one.
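As a concrete illustration of the penalty idea, here is a minimal sketch on a toy problem of my own (not from the lecture): minimize x² + y² subject to x + y = 1, whose exact solution is (0.5, 0.5). The quadratic penalty replaces the constraint with a term ρ(x + y − 1)², and the unconstrained minimizers approach the constrained solution as ρ grows.

```python
import numpy as np

# Toy problem (not from the lecture): minimize x^2 + y^2
# subject to x + y = 1; the exact solution is (0.5, 0.5).
# Quadratic penalty: F_rho(x, y) = x^2 + y^2 + rho * (x + y - 1)^2.
# F_rho is an unconstrained quadratic, so its minimizer solves the
# 2x2 linear system obtained by setting grad F_rho to zero.

def penalty_minimizer(rho):
    H = np.array([[2 + 2 * rho, 2 * rho],
                  [2 * rho, 2 + 2 * rho]])   # Hessian of F_rho
    b = np.array([2 * rho, 2 * rho])         # constant term moved to the RHS
    return np.linalg.solve(H, b)

# As rho grows, the unconstrained minimizers approach (0.5, 0.5).
sols = {rho: penalty_minimizer(rho) for rho in (1.0, 100.0, 1e6)}
```

For ρ = 1 the minimizer is (1/3, 1/3); for ρ = 10⁶ it is within about 10⁻⁶ of the constrained solution, illustrating why penalty methods approximate rather than exactly solve the problem for finite ρ.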

3 General form of constrained NLP

min (or max) f(x)
subject to g_i(x) = 0  (i ∈ E)
           g_i(x) ≤ 0  (i ∈ I)

Here E is an index set for the equality constraints and I is an index set for the inequality constraints; f and the g_i are twice continuously differentiable. For example:

min x_1^2 - 2x_1 + x_2^2 - x_3^2 + 4x_3    (1)
s.t. x_1 - x_2 + 2x_3 = 2                  (2)

We can substitute x_1 = 2 + x_2 - 2x_3 into the objective, and now the problem is unconstrained:

min 2x_2^2 + 3x_3^2 - 4x_2x_3 + 2x_2       (3)

A general iterative method for this is: x_{k+1} = x_k + p. Then x_{k+1} must also satisfy Ax = b, and A(x_k + p) = b implies Ap = 0. The problem now becomes min f(x + Zv), where Z is a null-space matrix for A and p = Zv for an arbitrary vector v. Now the problem is unconstrained.

4 A Matrix Approach

Recall that the solutions x of Ax = 0 form the null space of A, and Z is a basis for that null space. Geometrically, for a single constraint in three dimensions, Ax = 0 defines a plane, and the columns of Z span that plane (everything orthogonal to the row of A). A is of size m x n with m ≤ n, and if A has rank m, then Z has rank n - m.

Back to the example, repeated from the prior section:

min x_1^2 - 2x_1 + x_2^2 - x_3^2 + 4x_3    (4)
s.t. x_1 - x_2 + 2x_3 = 2                  (5)
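The substitution above can be checked numerically: parametrizing the feasible set by (x_2, x_3) and eliminating x_1 reproduces the reduced objective at every feasible point. A small sketch:

```python
import numpy as np

# Check the variable elimination above: any point of the form
# x = (2 + x2 - 2*x3, x2, x3) is feasible, and the original
# objective agrees with the reduced one there.

def f(x):
    return x[0]**2 - 2*x[0] + x[1]**2 - x[2]**2 + 4*x[2]

def f_reduced(x2, x3):
    return 2*x2**2 + 3*x3**2 - 4*x2*x3 + 2*x2

A = np.array([1.0, -1.0, 2.0])   # the single constraint row
rng = np.random.default_rng(0)
ok = True
for _ in range(10):
    x2, x3 = rng.standard_normal(2)
    x = np.array([2 + x2 - 2*x3, x2, x3])      # eliminate x1
    ok = ok and np.isclose(A @ x, 2.0)          # constraint holds
    ok = ok and np.isclose(f(x), f_reduced(x2, x3))
```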

Assume we have a way to find the null-space basis Z. One such choice is

Z = [ 1  -2 ]
    [ 1   0 ]
    [ 0   1 ]

Then any feasible x_{k+1} can be written as

x_{k+1} = x̄ + Zv, where x̄ = (2, 0, 0)^T is a particular feasible point.    (6)

We seek to optimize over the term Zv. The optimality conditions involve derivatives of a reduced function:

φ(v) = f(x̄ + Zv)                                    (7)
∇φ(v) = Z^T ∇f(x̄ + Zv) = Z^T ∇f(x)                  (8)
∇²φ(v) = Z^T ∇²f(x̄ + Zv) Z = Z^T ∇²f(x) Z           (9)

The term ∇φ(v) = Z^T ∇f(x) is called the reduced gradient. The term ∇²φ(v) = Z^T ∇²f(x) Z is called the reduced Hessian.

5 Necessary Conditions, Linear Equality Constraints

Z^T ∇f(x) = 0                                       (10)
Z^T ∇²f(x) Z positive semidefinite.                 (11)

6 Sufficient Conditions, Linear Equality Constraints

Ax = b                                              (12)
Z^T ∇f(x) = 0                                       (13)
Z^T ∇²f(x) Z positive definite.                     (14)

Applying these to our previous example:

x* = (2.5, -1.5, -1)^T                                          (15)

∇f(x) = (2x_1 - 2, 2x_2, -2x_3 + 4)^T                           (16)

∇f(x*) = (3, -3, 6)^T                                           (17)

Z^T ∇f(x*) = [ 1  1  0 ] (3, -3, 6)^T = (0, 0)^T
             [-2  0  1 ]

Reduced Hessian:

∇²f(x) = [ 2  0   0 ]                                           (18)
         [ 0  2   0 ]
         [ 0  0  -2 ]

Z^T ∇²f(x) Z = [ 4  -4 ]   ...which is positive definite at x*.  (19)
               [-4   6 ]

The reduced Hessian is positive definite at x*, so we know that x* is optimal.

7 Lagrange Multipliers

Lagrange multipliers express the gradient at the optimum as a linear combination of the rows of the constraint matrix A. They indicate the sensitivity of the optimal objective value to changes in the problem data. In most applications the data are approximate, and the only choice is to solve the problem using the best available estimates. Once a solution is obtained, the next step is to assess the quality of the resulting solution. Question: how sensitive is the solution to variation in the data?

7.1 Derivation of λ

Take a look at the necessary conditions again. We can represent the gradient as ∇f(x*) = A^T λ, for some m-vector λ, by breaking it into its null-space and range-space components. Note that Z^T ∇f(x*) = 0 at x*.

Z^T ∇f(x*) = 0                                                   (21)
∇f(x*) = Zv + A^T λ   (null-space plus range-space components)   (22)
Z^T ∇f(x*) = Z^T Z v + Z^T A^T λ = Z^T Z v   (since AZ = 0)      (23)
∇f(x*) = A^T λ   ...since Z^T Z is nonsingular, v = 0, so Zv = 0. (24)

At a local minimum, the gradient of the objective function is a linear combination of the gradients of the constraints. We call these λ_i values Lagrange multipliers.

Why do we care about λ? Say we perturb the right-hand side of the constraints, b, to a new value b + δ. Using a Taylor series, we find that:

f(x_new) ≈ f(x) + (x_new - x)^T ∇f(x)                            (25)
         = f(x) + (x_new - x)^T A^T λ                            (26)
         = f(x) + δ^T λ                                          (27)
         = f(x) + Σ_{i=1}^m δ_i λ_i                              (28)

In other words, f changes at the rate given by the Lagrange multipliers. So the Lagrange multipliers represent shadow prices, or dual variables.

Now we can develop a Lagrangian function. The optimality conditions are:

∇f(x) - A^T λ = 0                                                (29)
Ax = b                                                           (30)

This is a set of (n + m) equations in (n + m) unknowns x and λ. We can consider the dual variables as unknowns. Other representations are:

L(x, λ) = f(x) - λ^T (Ax - b)                                    (31)
L(x, λ) = f(x) - Σ_i λ_i (Ax - b)_i                              (32)

Setting the gradient with respect to both x and λ to zero gives our optimality conditions, ∇L(x, λ) = 0: the local minimizer must be a stationary point of the Lagrangian L.
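The shadow-price interpretation can be verified on the running example. Here λ is recovered from ∇f(x*) = A^T λ (it comes out to 3), and a central difference of the optimal value with respect to b recovers the same rate; the single reduced-Newton solve below gives the exact minimizer because f is quadratic. A sketch:

```python
import numpy as np

# Shadow-price check on the lecture's example: compute lambda from
# grad f(x*) = A^T lambda, then verify that the optimal value changes
# at rate lambda when the right-hand side b = 2 is perturbed.

Z = np.array([[1.0, -2.0], [1.0, 0.0], [0.0, 1.0]])  # null-space basis

def f(x):
    return x[0]**2 - 2*x[0] + x[1]**2 - x[2]**2 + 4*x[2]

def grad_f(x):
    return np.array([2*x[0] - 2, 2*x[1], -2*x[2] + 4])

def solve_for_b(b):
    """Minimize f subject to x1 - x2 + 2*x3 = b via the reduced problem."""
    xbar = np.array([b, 0.0, 0.0])                # a feasible point
    H = Z.T @ np.diag([2.0, 2.0, -2.0]) @ Z       # reduced Hessian (constant)
    v = np.linalg.solve(H, -Z.T @ grad_f(xbar))   # one exact Newton step
    return xbar + Z @ v

x_star = solve_for_b(2.0)                          # (2.5, -1.5, -1)
A = np.array([1.0, -1.0, 2.0])
lam = grad_f(x_star) @ A / (A @ A)                 # least-squares lambda

# f*(b) is quadratic in b, so a central difference recovers df*/db exactly
d = 0.01
rate = (f(solve_for_b(2 + d)) - f(solve_for_b(2 - d))) / (2 * d)
```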

7.2 Lagrangian optimality conditions

In the case of linear inequality constraints, we have the following conditions for optimality. Let x be a local minimum of f over {x : Ax ≥ b}, and let λ be a vector of Lagrange multipliers.

Necessary conditions:

∇f(x) = A^T λ   (equivalently, Z^T ∇f(x) = 0)       (34, 35)
λ ≥ 0                                               (36)
λ^T (Ax - b) = 0                                    (37)

Sufficient conditions:

Ax ≥ b                                              (39)
∇f(x) = A^T λ   (equivalently, Z^T ∇f(x) = 0)       (40, 41)
λ ≥ 0                                               (42)
λ_i = 0 or (Ax - b)_i = 0, but not both  (strict complementarity)  (43)
Z^T ∇²f(x) Z positive definite                      (44)

7.3 KKT Conditions for maximization

The optimality conditions are often referred to as the Karush-Kuhn-Tucker, or KKT, conditions, named after the authors who proved them. For a problem of the following form:

max f(x)
s.t. g_i(x) ≤ b_i,  i = 1, 2, ..., m
     x ≥ 0

where f and the g_i are twice continuously differentiable and satisfy certain regularity conditions, the solution x* can be optimal only if there exist u_1, u_2, ..., u_m satisfying the following:

1. ∂f/∂x_j - Σ_{i=1}^m u_i ∂g_i/∂x_j ≤ 0

2. x_j* (∂f/∂x_j - Σ_{i=1}^m u_i ∂g_i/∂x_j) = 0

3. g_i(x*) - b_i ≤ 0

4. u_i (g_i(x*) - b_i) = 0

5. x_j* ≥ 0

6. u_i ≥ 0

8 Descent methods to find the optimum

8.1 Gradient descent

Consider again the equality constraints Ax = b. Any vector can be written x = Â x_A + Z x_Z, where Â is a basis for the range space and Z for the null space. Then

Ax = b                                                            (45)
A(Â x_A + Z x_Z) = b                                              (46)
A Â x_A = b   (since AZ = 0)                                      (47)

We bypass technical problems by assuming that A has full rank. Then x + p satisfies the constraints exactly when Ap = 0. We want to search the feasible region for the optimal x in a step-by-step manner; this is known as a line search method. The idea is to move from place to place in the feasible region along directions p = Z p_z, the set of directions in which the equality constraints are not violated. Here p is a descent direction if p^T ∇f(x) = p_z^T Z^T ∇f(x) < 0, i.e. if p_z^T ∇φ(v) < 0.

Previously our expression for steepest descent minimized y^T ∇f(x) over directions y. Now we take

p = Z p_z = -Z (Z^T Z)^{-1} Z^T ∇f(x)                             (48)
p = -Z Z^T ∇f(x)   ...the reduced-gradient direction.             (49)

The (Z^T Z)^{-1} term disappears if you find an orthonormal null-space basis. This method is like steepest descent, but restricted to the feasible subspace. It is still a valid way to search the space: all you need to know is a local gradient and a basis Z for the null space. The up-front work is finding that Z.

8.2 Newton-type methods

Newton-type methods use the Hessian ∇²f = G as well. The step p = Z p_z turns out to be the solution of

Z^T G(x) Z p_z = -Z^T ∇f(x)                                       (50)
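Both descent schemes can be sketched on the running example (the null-space basis Z and the derivatives are those computed in earlier sections). Because f is quadratic, a single projected Newton step from a feasible point lands exactly on x* = (2.5, -1.5, -1), while the reduced-gradient iteration with an orthonormal basis approaches it geometrically:

```python
import numpy as np

# Minimal sketch of both descent schemes on the running example.

def grad_f(x):
    return np.array([2*x[0] - 2, 2*x[1], -2*x[2] + 4])

G = np.diag([2.0, 2.0, -2.0])                  # constant Hessian of f
Z = np.array([[1.0, -2.0], [1.0, 0.0], [0.0, 1.0]])
x0 = np.array([2.0, 0.0, 0.0])                 # feasible starting point

# Projected Newton step: solve Z^T G Z p_z = -Z^T grad f(x), p = Z p_z.
# f is quadratic, so one step reaches the constrained minimizer exactly.
p_z = np.linalg.solve(Z.T @ G @ Z, -Z.T @ grad_f(x0))
x_newton = x0 + Z @ p_z

# Reduced-gradient (projected steepest descent) iteration with an
# orthonormal null-space basis Q, so the direction is -Q Q^T grad f(x).
Q, _ = np.linalg.qr(Z)
x = x0.copy()
for _ in range(300):
    x = x - 0.3 * Q @ (Q.T @ grad_f(x))        # fixed step size, small enough here
x_gd = x
```

Every iterate of either method stays feasible, since each step lies in the null space of A.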

This is the same as before, but everything has to be projected into the null-space direction. Quasi-Newton methods approximate G at each step using various tricks, because G and G^{-1} are difficult to calculate.

8.3 Active set methods

We can convert the inequality-constrained problem into an equality-constrained one built from the binding (active) constraints:

min f(x)               min f(x)
s.t. Ax ≥ b     <=>    s.t. Â x = b̂                    (51)

where Â is the correct active set of the solution (the set of constraints binding at the final solution). Active set methods iterate on both the current point x and the set of constraints treated as binding. The concepts for active set methods are:

1. Vector x is the current point, and Â holds the constraints binding at x.

2. Check whether x is a solution to the corresponding equality-constrained problem: ∇f(x) = Â^T λ for some vector λ.

3. If the answer to 2 is yes, and λ ≥ 0, then x is a solution.

4. If the answer to 2 is yes, but λ_j < 0 for some j, then find a direction p such that

   ∇f(x)^T p < 0        (52)
   a_j^T p > 0          (53)
   â_i^T p = 0, i ≠ j   (54)

   (a_j is dropped from the working set).

5. If ∇f(x) ≠ Â^T λ, then construct a search direction p with Â p = 0 and ∇f(x)^T p < 0.

8.4 Quadratic Objective

(See the full worked example in the textbook under Quadratic Programming.)

A more special case: the constraints are linear and the objective is quadratic.

max f(x) = cx - 0.5 x^T Q x
s.t. Ax ≤ b, x ≥ 0

where Q is the matrix in the objective. For example:

c = [15  30],  x = (x_1, x_2)^T,  Q = [ 4  -4 ],  A = [1  2],  b = [30]
                                      [-4   8 ]

x^T Q x = (x_1, x_2) [ 4  -4 ] (x_1) = 4x_1^2 - 4x_2x_1 - 4x_2x_1 + 8x_2^2
                     [-4   8 ] (x_2)

This is the special case where Q is positive semidefinite. It leads to a modified simplex method. Consider the KKT conditions:

15 + 4x_2 - 4x_1 - u_1 ≤ 0          x_1 (15 + 4x_2 - 4x_1 - u_1) = 0
30 + 4x_1 - 8x_2 - 2u_1 ≤ 0         x_2 (30 + 4x_1 - 8x_2 - 2u_1) = 0
x_1 + 2x_2 - 30 ≤ 0                 u_1 (x_1 + 2x_2 - 30) = 0

Introduce slack variables to get a set of equality constraints. Some variables are complementary:

x_1 y_1 + x_2 y_2 + u_1 v_1 = 0,  with x_1, x_2, y_1, y_2, u_1, v_1 ≥ 0
=> either x_1 = 0 or y_1 = 0, etc.

This leads to the set of equations:

4x_1 - 4x_2 + u_1 - y_1 = 15
-4x_1 + 8x_2 + 2u_1 - y_2 = 30
x_1 + 2x_2 + v_1 = 30
x_1 ≥ 0, x_2 ≥ 0, u_1 ≥ 0, y_1 ≥ 0, y_2 ≥ 0, v_1 ≥ 0
x_1 y_1 + x_2 y_2 + u_1 v_1 = 0
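Because each complementary pair forces one member to zero, this small system can be solved by brute force over the 2³ complementarity patterns. This is only a sketch of the search that the modified simplex method performs implicitly, not the simplex method itself:

```python
import numpy as np
from itertools import product

# Enumerate complementarity patterns for the KKT system above: from
# each pair (x1, y1), (x2, y2), (u1, v1), one member is set to zero.
# Variable order: (x1, x2, u1, y1, y2, v1).

M = np.array([[ 4.0, -4.0, 1.0, -1.0,  0.0, 0.0],   # 4x1 - 4x2 + u1 - y1 = 15
              [-4.0,  8.0, 2.0,  0.0, -1.0, 0.0],   # -4x1 + 8x2 + 2u1 - y2 = 30
              [ 1.0,  2.0, 0.0,  0.0,  0.0, 1.0]])  # x1 + 2x2 + v1 = 30
rhs = np.array([15.0, 30.0, 30.0])
pairs = [(0, 3), (1, 4), (2, 5)]

solutions = []
for pick in product(*pairs):          # indices forced to zero, one per pair
    free = [i for i in range(6) if i not in pick]
    w, _, _, _ = np.linalg.lstsq(M[:, free], rhs, rcond=None)
    full = np.zeros(6)
    full[free] = w
    # keep only exact, nonnegative solutions of the system
    if np.allclose(M @ full, rhs) and (full >= -1e-9).all():
        solutions.append(full)
```

Only one pattern survives: all three slacks zero, giving x_1 = 12, x_2 = 9, u_1 = 3.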

That is, we have almost an LP problem: a point is optimal for the original QP exactly when it is feasible for this set of linear equations together with the complementarity constraints. The modified simplex method uses the phase-1 idea to find such a solution: minimize Z = Σ_j z_j, where the z_j are artificial variables appended to the equations, starting from the solution x = 0, u = 0, y = c^T, v = b.

Use the simplex method with a restricted entry rule: exclude any entering variable whose complementary variable is already basic (nonzero). A phase-1 solution is then a feasible solution to the KKT system, and hence an optimal solution to the real problem.

Summary: quadratic programming uses the simplex method, and the objective is to find a solution to the KKT conditions. These are linear because the objective is quadratic and the constraints are linear.

- Set up the problem: find a feasible solution to the linear equations while keeping the complementarity condition.
- Set this up just as we would to find an initial solution to start the simplex method, i.e. minimize the sum of the artificial (dummy) variables.
- Apply the simplex method, keeping the complementarity condition satisfied by not allowing two complementary variables to both be in the basis.
- If the final solution has objective value zero, we have found a solution to the KKT conditions, and have therefore solved the original optimization problem.
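As a sanity check on the whole construction, the KKT system above is satisfied by x_1 = 12, x_2 = 9, u_1 = 3 with all slacks zero (obtained by substitution; this value is not stated in the notes), and random feasible sampling suggests no feasible point does better. The objective here is f(x) = cx - ½ xᵀQx with the data above:

```python
import numpy as np

# Substitute the candidate KKT solution x1 = 12, x2 = 9, u1 = 3
# (slacks y1 = y2 = v1 = 0) into the system above, then compare
# the objective against randomly sampled feasible points.

def f(x1, x2):
    # f(x) = c x - 0.5 x^T Q x with c = (15, 30), Q = [[4, -4], [-4, 8]]
    return 15*x1 + 30*x2 - 0.5*(4*x1**2 - 8*x1*x2 + 8*x2**2)

x1, x2, u1 = 12.0, 9.0, 3.0
y1 = 4*x1 - 4*x2 + u1 - 15            # should be 0
y2 = -4*x1 + 8*x2 + 2*u1 - 30         # should be 0
v1 = 30 - x1 - 2*x2                   # should be 0 (constraint binding)
complementary = x1*y1 + x2*y2 + u1*v1  # should be 0

rng = np.random.default_rng(2)
best_sampled = -np.inf
for _ in range(2000):
    a = rng.uniform(0, 30)            # sample x1 in [0, 30]
    b = rng.uniform(0, (30 - a) / 2)  # then x2 so that x1 + 2*x2 <= 30
    best_sampled = max(best_sampled, f(a, b))
```

The objective value at the KKT point is 270, and since the objective is concave (Q positive semidefinite), the KKT point is the global maximizer.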