Constrained Optimization


ME598/494 Lecture. Max Yi Ren, Department of Mechanical Engineering, Arizona State University. March 30, 2015.

Outline

1. Equality constraints only
   1.1 Reduced gradient
   1.2 Lagrange multiplier and Lagrangian
   1.3 Examples
2. KKT conditions
   2.1 With inequality constraints
   2.2 Non-negative Lagrange multipliers
   2.3 Regularity
   2.4 KKT conditions
   2.5 Geometric interpretation of KKT conditions
   2.6 Examples
3. Sensitivity analysis
4. Generalized reduced gradient (GRG)

From unconstrained to constrained

Optimization with equality constraints (1/3)

A general optimization problem with only equality constraints is:

$$\min_x f(x) \quad \text{subject to} \quad h_j(x) = 0, \quad j = 1, 2, \dots, m.$$

Let there be $n$ variables and $m$ equality constraints. Ideally, we could use the equalities to eliminate $m$ variables and solve an unconstrained problem in the remaining $n - m$ variables. However, such elimination may not be feasible in practice. Consider instead a feasible perturbation $\Delta x$ from a feasible point $x$: the perturbation must keep the equality constraints satisfied. To first order, this requires

$$\Delta h_j = \sum_{i=1}^{n} \frac{\partial h_j}{\partial x_i}\,\Delta x_i = 0, \quad j = 1, 2, \dots, m. \tag{1}$$

Optimization with equality constraints (2/3)

Equation (1) is a system of $m$ linear equations in $\Delta x$ with $n - m$ degrees of freedom. Define the state variables as $s_i := x_i$, $i = 1, \dots, m$, and the decision variables as $d_i := x_i$, $i = m + 1, \dots, n$. The number of decision variables equals the number of degrees of freedom. Equation (1) can then be rewritten as

$$\left(\frac{\partial h}{\partial s}\right)\Delta s = -\left(\frac{\partial h}{\partial d}\right)\Delta d, \tag{2}$$

where

$$\frac{\partial h}{\partial s} = \begin{pmatrix} \partial h_1/\partial s_1 & \partial h_1/\partial s_2 & \cdots & \partial h_1/\partial s_m \\ \partial h_2/\partial s_1 & \partial h_2/\partial s_2 & \cdots & \partial h_2/\partial s_m \\ \vdots & \vdots & & \vdots \\ \partial h_m/\partial s_1 & \partial h_m/\partial s_2 & \cdots & \partial h_m/\partial s_m \end{pmatrix}$$

is the Jacobian matrix with respect to the state variables; $\partial h/\partial d$ is then the Jacobian with respect to the decision variables.

Optimization with equality constraints (3/3)

From Equation (2), we further have

$$\Delta s = -\left(\frac{\partial h}{\partial s}\right)^{-1}\left(\frac{\partial h}{\partial d}\right)\Delta d. \tag{3}$$

Equation (3) shows that for any perturbation of the decision variables, we can derive the corresponding perturbation of the state variables so that $\Delta h(x) = 0$ to first order. Note that Equation (3) requires the Jacobian $\partial h/\partial s$ to be invertible, i.e., the gradients of the equality constraints must be linearly independent. Since $s$ can be considered a function of $d$, the original constrained problem can be treated as the unconstrained problem

$$\min_d z(d) := f(s(d), d).$$

The gradient of this new objective is

$$\frac{\partial z}{\partial d} = \frac{\partial f}{\partial d} + \frac{\partial f}{\partial s}\frac{\partial s}{\partial d}.$$

Plug in Equation (3) to obtain

$$\frac{\partial z}{\partial d} = \frac{\partial f}{\partial d} - \frac{\partial f}{\partial s}\left(\frac{\partial h}{\partial s}\right)^{-1}\left(\frac{\partial h}{\partial d}\right). \tag{4}$$
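As a sanity check on Equation (4), the sketch below uses the circle-constraint instance of Exercise 5.2 (with $s = x_1$ as state and $d = x_2$ as decision, an assumed partition), where $s(d)$ has a closed form, and compares the analytic reduced gradient against a finite difference of $z(d)$:

```python
import math

# Instance: f(x) = (x1-2)^2 + (x2-2)^2 with h(x) = x1^2 + x2^2 - 1 = 0.
# State variable s = x1, decision variable d = x2 (an assumed partition).

def reduced_gradient(s, d):
    """Equation (4): dz/dd = df/dd - (df/ds)(dh/ds)^{-1}(dh/dd)."""
    df_dd = 2.0 * (d - 2.0)
    df_ds = 2.0 * (s - 2.0)
    dh_ds = 2.0 * s
    dh_dd = 2.0 * d
    return df_dd - df_ds * dh_dd / dh_ds

def z(d):
    """Objective along the constraint: s(d) = sqrt(1 - d^2) (upper branch)."""
    s = math.sqrt(1.0 - d * d)
    return (s - 2.0) ** 2 + (d - 2.0) ** 2

# Compare Equation (4) with a central finite difference of z at d = 0.6 (s = 0.8).
d0 = 0.6
s0 = math.sqrt(1.0 - d0 * d0)          # 0.8, so h(s0, d0) = 0 holds exactly
analytic = reduced_gradient(s0, d0)    # -1.0 by Equation (4)
eps = 1e-6
numeric = (z(d0 + eps) - z(d0 - eps)) / (2.0 * eps)
print(analytic, numeric)
```

The two values agree to finite-difference accuracy, confirming that eliminating $s$ through the constraint reproduces the reduced gradient of Equation (4).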

Lagrange multiplier

From Equation (4), a stationary point $x^* = (s^*, d^*)^T$ satisfies

$$\frac{\partial f}{\partial d} - \frac{\partial f}{\partial s}\left(\frac{\partial h}{\partial s}\right)^{-1}\left(\frac{\partial h}{\partial d}\right) = \mathbf{0}^T, \tag{5}$$

evaluated at $x^*$. Equation (5) together with $h = \mathbf{0}$ gives $n$ equations in $n$ variables. The stationary point $x^*$ can be found when $\partial h/\partial s$ is invertible for some choice of $s$. Now introduce the Lagrange multiplier

$$\lambda^T := -\frac{\partial f}{\partial s}\left(\frac{\partial h}{\partial s}\right)^{-1}. \tag{6}$$

From Equations (5) and (6),

$$\frac{\partial f}{\partial d} + \lambda^T\frac{\partial h}{\partial d} = \mathbf{0}^T \quad \text{and} \quad \frac{\partial f}{\partial s} + \lambda^T\frac{\partial h}{\partial s} = \mathbf{0}^T.$$

Recall that $x = (s, d)^T$; a stationary point therefore satisfies

$$\frac{\partial f}{\partial x} + \lambda^T\frac{\partial h}{\partial x} = \mathbf{0}^T. \tag{7}$$

Lagrangian function

Introduce the Lagrangian function for the original equality-constrained problem:

$$L(x, \lambda) := f(x) + \lambda^T h(x).$$

First-order necessary condition: $x$ is a (constrained) stationary point if $\partial L/\partial x = \mathbf{0}$ and $\partial L/\partial \lambda = \mathbf{0}$. This condition yields Equation (7) and $h = \mathbf{0}$, which together form $m + n$ equations in $m + n$ variables ($x$ and $\lambda$). The stationary point found via the Lagrangian is the same as that from the reduced-gradient condition in Equation (5).

Define $L_{xx}$ as the Hessian of the Lagrangian with respect to $x$. Second-order sufficiency condition: if $x$ together with some $\lambda$ satisfies $\partial L/\partial x = \mathbf{0}$ and $h = \mathbf{0}$, and $\Delta x^T L_{xx} \Delta x > 0$ for every $\Delta x \neq \mathbf{0}$ with $(\partial h/\partial x)\Delta x = \mathbf{0}$, then $x$ is a local (constrained) minimum.

Examples (1/4)

Exercise 5.2. For the problem

$$\min_{x_1, x_2} (x_1 - 2)^2 + (x_2 - 2)^2 \quad \text{subject to} \quad x_1^2 + x_2^2 - 1 = 0,$$

find the optimal solution using constrained derivatives (reduced gradient) and Lagrange multipliers.

Exercise 5.3. For the problem

$$\min_{x_1, x_2, x_3} x_1^2 + x_2^2 - x_3^2 \quad \text{subject to} \quad 5x_1^2 + 4x_2^2 + x_3^2 - 20 = 0, \quad x_1 + x_2 - x_3 = 0,$$

find the optimal solution using constrained derivatives and Lagrange multipliers.
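A minimal numerical sketch of Exercise 5.2, applying Newton's method to the stationarity system $\partial L/\partial x = \mathbf{0}$, $h = 0$ (the starting point and iteration count are ad hoc choices, not from the notes):

```python
import math

# Exercise 5.2 via Newton's method on the Lagrangian stationarity system
# dL/dx = 0, h = 0, with L = f + lam*h.

def F(v):
    x1, x2, lam = v
    return [2*(x1 - 2) + 2*lam*x1,      # dL/dx1
            2*(x2 - 2) + 2*lam*x2,      # dL/dx2
            x1*x1 + x2*x2 - 1.0]        # h(x)

def J(v):
    """Jacobian of F (exact, since F is polynomial)."""
    x1, x2, lam = v
    return [[2 + 2*lam, 0.0, 2*x1],
            [0.0, 2 + 2*lam, 2*x2],
            [2*x1, 2*x2, 0.0]]

def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            r = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= r * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

v = [1.0, 1.0, 1.0]                     # ad hoc initial guess
for _ in range(50):
    step = solve3(J(v), [-fi for fi in F(v)])
    v = [vi + si for vi, si in zip(v, step)]

x1, x2, lam = v
print(x1, x2, lam)   # expect x1 = x2 = 1/sqrt(2), lam = 2*sqrt(2) - 1
```

The iteration converges to $x^* = (1/\sqrt{2}, 1/\sqrt{2})$ with $\lambda = 2\sqrt{2} - 1$, the point on the unit circle closest to $(2, 2)$.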

Examples (2/4)

(A problem where Lagrange multipliers cannot be found.) For the problem

$$\min_{x_1, x_2} x_1 + x_2 \quad \text{subject to} \quad (x_1 - 1)^2 + x_2^2 - 1 = 0, \quad (x_1 - 2)^2 + x_2^2 - 4 = 0,$$

find the optimal solution and Lagrange multipliers. (Source: Fig. 3.1.2, D. P. Bertsekas, Nonlinear Programming.)
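Why the multipliers fail to exist can be checked directly: the only feasible point is $(0, 0)$, where the two constraint gradients are collinear, so the stationarity equation is unsatisfiable. A small sketch:

```python
# Two-circle example at its only feasible point x* = (0, 0): the stationarity
# condition grad f + lam1*grad h1 + lam2*grad h2 = 0 has no solution.

grad_f = (1.0, 1.0)
grad_h1 = (2.0 * (0.0 - 1.0), 2.0 * 0.0)   # (-2, 0): gradient of (x1-1)^2 + x2^2 - 1
grad_h2 = (2.0 * (0.0 - 2.0), 2.0 * 0.0)   # (-4, 0): gradient of (x1-2)^2 + x2^2 - 4

# Both constraint gradients are multiples of (1, 0), so x* is not regular, and
# the x2 component of stationarity reads 1 + lam1*0 + lam2*0 = 0, which no
# choice of (lam1, lam2) can satisfy:
residual_x2 = grad_f[1]   # leftover x2 component for ANY multipliers
print(residual_x2)        # 1.0, so no Lagrange multipliers exist at x*
```

This is exactly the failure of regularity the next slide warns about: the theory assumes linearly independent constraint gradients.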

Examples (3/4)

Important: in all development of the theory hereafter, we assume that stationary points are regular. We will discuss the regularity condition in more detail in the next section on KKT.

(A problem where the Lagrange multipliers are zero.) For the problem

$$\min_x x^2 \quad \text{subject to} \quad x = 0,$$

find the optimal solution and Lagrange multipliers.

Important: in all development of the theory hereafter, we assume that all equality constraints are active.

Examples (4/4)

Example 5.6. Consider the problem, with $x_i > 0$:

$$\min_{x_1, x_2, x_3} x_1^2 x_2 + x_2^2 x_3 + x_1 x_3^2 \quad \text{subject to} \quad x_1^2 + x_2^2 + x_3^2 - 3 = 0.$$

Find the optimal solution using the Lagrangian.

With inequality constraints

Now consider the constrained optimization problem with both equality and inequality constraints:

$$\min_x f(x) \quad \text{subject to} \quad g(x) \leq \mathbf{0}, \quad h(x) = \mathbf{0}.$$

Denote by $\hat{g}$ the set of inequality constraints that are active at a stationary point. Following the discussion of the optimality conditions for equality-constrained problems, we have

$$\frac{\partial f}{\partial x} + \lambda^T\frac{\partial h}{\partial x} + \hat{\mu}^T\frac{\partial \hat{g}}{\partial x} = \mathbf{0}^T, \tag{8}$$

where $\lambda$ and $\hat{\mu}$ are the Lagrange multipliers for $h$ and $\hat{g}$.

Nonnegative Lagrange multipliers

The Lagrange multipliers $\mu$ for the inequality constraints are nonnegative at a local minimum. This can be shown by examining the first-order perturbations of $f$, $\hat{g}$, and $h$ at a local minimum for feasible nonzero perturbations $\Delta x$:

$$\Delta f = \frac{\partial f}{\partial x}\Delta x \geq 0, \quad \Delta \hat{g} = \frac{\partial \hat{g}}{\partial x}\Delta x \leq \mathbf{0}, \quad \Delta h = \frac{\partial h}{\partial x}\Delta x = \mathbf{0}. \tag{9}$$

Combining Equations (8) and (9) gives $\hat{\mu}^T \Delta\hat{g} \leq 0$. Since $\Delta\hat{g} \leq \mathbf{0}$ for feasibility, we have $\hat{\mu} \geq \mathbf{0}$.
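The sign intuition is easiest to see in one dimension. The following hypothetical example (not from the notes) has its unconstrained minimizer cut off by the constraint, so at the boundary the objective still "pushes" against the constraint and the multiplier comes out positive:

```python
# Hypothetical 1-D example: min f(x) = (x-2)^2 subject to g(x) = x - 1 <= 0.
# The unconstrained minimizer x = 2 is infeasible, so the constraint is
# active at x* = 1.
x_star = 1.0
fprime = 2.0 * (x_star - 2.0)   # f'(x*) = -2: f still decreases toward the boundary
gprime = 1.0                    # g'(x*) = 1
mu = -fprime / gprime           # stationarity f' + mu*g' = 0 gives mu = 2 >= 0
print(mu)
```

A negative $\mu$ would mean that tightening the constraint decreases $f$, contradicting optimality of $x^*$; this is the content of Equation (9).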

Regularity

A regular point $x$ is one at which the gradients of the active inequality constraints and of all equality constraints are linearly independent, i.e., $\left((\partial \hat{g}/\partial x)^T, (\partial h/\partial x)^T\right)$ has linearly independent columns. Active constraints with zero multipliers are possible when $x$ is not a regular point; this situation is usually referred to as degeneracy.

The Karush-Kuhn-Tucker (KKT) conditions

For the optimization problem

$$\min_x f(x) \quad \text{subject to} \quad g(x) \leq \mathbf{0}, \quad h(x) = \mathbf{0},$$

an optimal solution $x^*$ (assumed to be regular) must satisfy

$$g(x^*) \leq \mathbf{0}; \quad h(x^*) = \mathbf{0}; \quad \frac{\partial f}{\partial x^*} + \lambda^T\frac{\partial h}{\partial x^*} + \mu^T\frac{\partial g}{\partial x^*} = \mathbf{0}^T, \tag{10}$$

where $\lambda$ is unrestricted in sign, $\mu \geq \mathbf{0}$, and $\mu^T g(x^*) = 0$ (complementary slackness). A point that satisfies the KKT conditions is called a KKT point; it may not be a minimum, since the conditions are not sufficient.

Second-order sufficiency conditions: if a KKT point $x^*$ exists such that the Hessian of the Lagrangian is positive definite on feasible perturbations, i.e., $\Delta x^T L_{xx} \Delta x > 0$ for every nonzero $\Delta x$ with $(\partial h/\partial x)\Delta x = \mathbf{0}$ and $(\partial \hat{g}/\partial x)\Delta x = \mathbf{0}$, then $x^*$ is a local constrained minimum.

Geometric interpretation of KKT conditions

The (necessary) KKT conditions state that $-\partial f/\partial x$ must belong to the cone spanned by the gradients of the active constraints at $x^*$. The second-order sufficiency conditions require that both the objective function and the feasible space be locally convex at the solution. Further, if a KKT point exists for a convex objective subject to a convex constraint set, then this point is a global minimizer.

Example

Example 5.10. Solve the following problem using the KKT conditions:

$$\min_{x_1, x_2} 8x_1^2 - 8x_1 x_2 + 3x_2^2 \quad \text{subject to} \quad x_1 - 4x_2 + 3 \leq 0, \quad x_1 + 2x_2 \geq 0.$$

Example with an irregular solution: solve the following problem

$$\min_{x_1, x_2} -x_1 \quad \text{subject to} \quad x_2 - (1 - x_1)^3 \leq 0, \quad x_1 \geq 0, \quad x_2 \geq 0.$$
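Example 5.10 can be attacked by brute-force active-set enumeration: for each subset of the inequalities assumed active, solve the (here linear) stationarity system, then keep the candidate that is feasible with nonnegative multipliers. In the sketch below the constraint directions ($g_1 = x_1 - 4x_2 + 3 \leq 0$ and $x_1 + 2x_2 \geq 0$, rewritten as $g_2 = -x_1 - 2x_2 \leq 0$) are assumptions about the problem statement, so the printed KKT point should be checked against the lecture's own solution:

```python
from itertools import combinations

# Example 5.10 via brute-force active-set enumeration (assumed constraint signs).
# f(x) = 8 x1^2 - 8 x1 x2 + 3 x2^2; grad f = (16 x1 - 8 x2, -8 x1 + 6 x2).
g = [lambda x: x[0] - 4*x[1] + 3, lambda x: -x[0] - 2*x[1]]
grad_g = [[1.0, -4.0], [-1.0, -2.0]]

def solve(A, b):
    """Gaussian elimination with partial pivoting; returns None if singular."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if abs(M[p][k]) < 1e-12:
            return None
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            r = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= r * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

kkt_point = None
for active in [c for r in range(3) for c in combinations(range(2), r)]:
    n = 2 + len(active)               # unknowns: x1, x2, one mu per active constraint
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    # Stationarity rows (linear because f is quadratic and g linear):
    A[0][0], A[0][1] = 16.0, -8.0
    A[1][0], A[1][1] = -8.0, 6.0
    for k, j in enumerate(active):
        A[0][2 + k] = grad_g[j][0]
        A[1][2 + k] = grad_g[j][1]
        A[2 + k][0], A[2 + k][1] = grad_g[j][0], grad_g[j][1]
        b[2 + k] = -(3.0 if j == 0 else 0.0)   # g_j(x) = 0, constant moved right
    sol = solve(A, b)
    if sol is None:
        continue
    x, mus = sol[:2], sol[2:]
    feasible = all(g[j](x) <= 1e-9 for j in range(2))
    if feasible and all(m >= -1e-9 for m in mus):
        kkt_point = (x, dict(zip(active, mus)))

print(kkt_point)
```

Under the assumed signs, only the active set $\{g_1\}$ survives, giving $x^* = (13/33, 28/33)$ with $\mu_1 = 16/33 \geq 0$.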

Sensitivity analysis (1/2)

Consider the constrained problem with local minimum $x^*$, where $h(x^*) = \mathbf{0}$ collects the equality constraints and the active inequality constraints. What happens to the optimal objective value $f(x^*)$ when we make a small perturbation $\Delta h$, e.g., slightly relax (or tighten) the constraints?

Use the partition $x = (d, s)^T$. We have

$$\Delta h = \frac{\partial h}{\partial d}\Delta d + \frac{\partial h}{\partial s}\Delta s.$$

Assuming $x^*$ is regular, so that $(\partial h/\partial s)^{-1}$ exists, we further have

$$\Delta s = \left(\frac{\partial h}{\partial s}\right)^{-1}\Delta h - \left(\frac{\partial h}{\partial s}\right)^{-1}\left(\frac{\partial h}{\partial d}\right)\Delta d. \tag{11}$$

Recall that the perturbation of the objective function is

$$\Delta f = \frac{\partial f}{\partial d}\Delta d + \frac{\partial f}{\partial s}\Delta s. \tag{12}$$

Sensitivity analysis (2/2)

Substituting Equation (11) into Equation (12) gives

$$\Delta f = \frac{\partial f}{\partial s}\left(\frac{\partial h}{\partial s}\right)^{-1}\Delta h + \frac{\partial z}{\partial d}\Delta d. \tag{13}$$

Since the reduced gradient $\partial z/\partial d$ is zero at $x^*$,

$$\Delta f(x^*) = \frac{\partial f}{\partial s}\left(\frac{\partial h}{\partial s}\right)^{-1}\Delta h = -\lambda^T \Delta h. \tag{14}$$

To conclude: a unit perturbation of an active (equality or inequality) constraint $h_j$ changes the optimal objective value by $-\lambda_j$. Note that this analysis is based on a first-order approximation and is only valid for small changes in the constraints.
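Equation (14) can be checked numerically on Exercise 5.2. Relax the constraint to $x_1^2 + x_2^2 - 1 = \delta$; by symmetry the perturbed optimum stays on the ray from the origin toward $(2, 2)$, so the optimal value has the closed form $f^*(\delta) = (2\sqrt{2} - \sqrt{1+\delta})^2$ (a geometric shortcut specific to this instance):

```python
import math

# Sensitivity check on Exercise 5.2: constraint relaxed to x1^2 + x2^2 - 1 = delta.
lam = 2.0 * math.sqrt(2.0) - 1.0   # multiplier at the unperturbed optimum

def f_star(delta):
    """Optimal value: squared distance from (2,2) to the circle of radius sqrt(1+delta)."""
    return (2.0 * math.sqrt(2.0) - math.sqrt(1.0 + delta)) ** 2

delta = 1e-6
predicted = -lam * delta                  # Equation (14): delta_f ~ -lam * delta_h
actual = f_star(delta) - f_star(0.0)
print(predicted, actual)
```

The first-order prediction matches the exact change up to $O(\delta^2)$: enlarging the circle ($\delta > 0$) lets the optimum move closer to $(2, 2)$, so $f^*$ decreases at rate $\lambda$.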

Generalized reduced gradient (1/2)

We have discussed optimality conditions for constrained problems. The generalized reduced gradient (GRG) method is an iterative algorithm for finding solutions of $(\partial z/\partial d) = \mathbf{0}^T$. Similar to gradient descent for unconstrained problems, we update the decision variables by

$$d_{k+1} = d_k - \alpha\left(\frac{\partial z}{\partial d}\right)_k^T.$$

The corresponding state variables can be found from

$$s_{k+1} = s_k - \left(\frac{\partial h}{\partial s}\right)_k^{-1}\left(\frac{\partial h}{\partial d}\right)_k\Delta d_k = s_k + \alpha\left(\frac{\partial h}{\partial s}\right)_k^{-1}\left(\frac{\partial h}{\partial d}\right)_k\left(\frac{\partial z}{\partial d}\right)_k^T.$$

Generalized reduced gradient (2/2)

Note that the above calculation is based on a linearization of the constraints and will not satisfy them exactly unless they are all linear. However, given $d_{k+1}$, a solution of the nonlinear system $h(d_{k+1}, s_{k+1}) = \mathbf{0}$ can be found iteratively, using $s_{k+1}$ as the initial guess, with the iteration

$$s_{k+1}^{(j+1)} = s_{k+1}^{(j)} - \left(\frac{\partial h}{\partial s}\right)_{k+1}^{-1} h\!\left(d_{k+1}, s_{k+1}^{(j)}\right).$$

The iteration on the decision variables may also be performed with Newton's method:

$$d_{k+1} = d_k - \alpha\left(\frac{\partial^2 z}{\partial d^2}\right)_k^{-1}\left(\frac{\partial z}{\partial d}\right)_k^T.$$

The state variables can then be adjusted by the quadratic approximation

$$s_{k+1} = s_k + \left(\frac{\partial s}{\partial d}\right)_k\Delta d_k + \frac{1}{2}\Delta d_k^T\left(\frac{\partial^2 s}{\partial d^2}\right)_k\Delta d_k.$$

The GRG algorithm can also handle inequality constraints when combined with an active-set strategy; this will be discussed in Chapter 7.
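The two loops above (a gradient step on $d$, then Newton restoration of $h = 0$ in $s$) can be sketched on Exercise 5.2, with $s = x_1$, $d = x_2$; the fixed step size $\alpha$ and the iteration counts are ad hoc assumptions of this sketch, not values from the notes:

```python
import math

# GRG sketch on Exercise 5.2: f = (x1-2)^2 + (x2-2)^2, h = x1^2 + x2^2 - 1 = 0,
# with state s = x1 and decision d = x2.

def dz_dd(s, d):
    """Reduced gradient, Equation (4), for this instance."""
    return 2.0*(d - 2.0) - 2.0*(s - 2.0) * (2.0*d) / (2.0*s)

def restore(s, d, iters=20):
    """Inner loop: Newton iteration on h(s, d) = 0 in s alone."""
    for _ in range(iters):
        h = s*s + d*d - 1.0
        s -= h / (2.0*s)           # s^{(j+1)} = s^{(j)} - (dh/ds)^{-1} h
    return s

s, d = 1.0, 0.0                    # feasible starting point
alpha = 0.1                        # fixed step size (ad hoc)
for _ in range(200):
    rg = dz_dd(s, d)
    d_new = d - alpha * rg                        # gradient step on decision variable
    s_lin = s - (2.0*d) / (2.0*s) * (d_new - d)   # linearized state update, Eq. (3)
    s, d = restore(s_lin, d_new), d_new           # restore feasibility exactly

print(s, d)   # both approach 1/sqrt(2)
```

The iterates stay (numerically) on the unit circle and converge to the constrained minimum $(1/\sqrt{2}, 1/\sqrt{2})$ found earlier with the Lagrangian.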