KKT Examples
Stanley B. Gershwin, Massachusetts Institute of Technology


The purpose of this note is to supplement the slides that describe the Karush-Kuhn-Tucker (KKT) conditions. Neither these notes nor the slides are a complete description of these conditions; they are only intended to provide some intuition about how the conditions are sometimes used and what they mean. The KKT conditions are usually not solved directly in the analysis of practical large nonlinear programming problems by software packages; iterative successive approximation methods are most often used. The results, however they are obtained, must satisfy these conditions.

Example 0. No constraints:

min J = x_1^2 + x_2^2 + x_3^2 + x_4^2

The solution is x_1 = x_2 = x_3 = x_4 = 0 and J* = 0.

Example 1. One equality constraint:

min J = x_1^2 + x_2^2 + x_3^2 + x_4^2
subject to
x_1 + x_2 + x_3 + x_4 = 1    (1)

Solution: Adjoin the constraint:

min J̄ = x_1^2 + x_2^2 + x_3^2 + x_4^2 + λ(1 − x_1 − x_2 − x_3 − x_4)
subject to
x_1 + x_2 + x_3 + x_4 = 1

In this context, λ is called a Lagrange multiplier. The KKT conditions reduce, in this case, to setting ∂J̄/∂x to zero:

∂J̄/∂x = (2x_1 − λ, 2x_2 − λ, 2x_3 − λ, 2x_4 − λ) = 0    (2)

so

x_1 = x_2 = x_3 = x_4 = λ/2.

Therefore

x_1 + x_2 + x_3 + x_4 = 4(λ/2) = 2λ = 1, or λ = 1/2,

so

x_1 = x_2 = x_3 = x_4 = 1/4 and J* = 4(1/4)^2 = 1/4.

Figure 1: Example 1, represented in two dimensions. (The figure shows the circles J = constant, the equality constraint line, and the solution x* = (1/4, 1/4, 1/4, 1/4).)

Comments

Why should we adjoin the constraints? The answer to this question has two parts:
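As a quick numerical check of Example 1 (this sketch is an addition to these notes, not part of the original derivation; it assumes the SciPy library is available, and the choice of the SLSQP solver is an assumption):

```python
# Sketch: verify Example 1, min sum(x_i^2) subject to sum(x_i) = 1.
import numpy as np
from scipy.optimize import minimize

J = lambda x: np.sum(x**2)                       # objective
grad_J = lambda x: 2 * x                         # its gradient
constraint = {"type": "eq", "fun": lambda x: np.sum(x) - 1.0}

res = minimize(J, x0=np.zeros(4), jac=grad_J,
               constraints=[constraint], method="SLSQP")
print(res.x)         # expected: approximately [0.25, 0.25, 0.25, 0.25]
print(res.fun)       # expected: approximately 0.25

# KKT stationarity 2*x_i - lambda = 0 gives lambda = 2*x_i = 0.5 at the optimum.
print(2 * res.x[0])  # expected: approximately 0.5
```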

- First, it does no harm. There is no difference between J and J̄ for any set of x_i that satisfies the problem, since any such set must satisfy (1). Furthermore, it works. This may not feel like a very satisfying answer, but it is also why we plug x = Ce^{λt} into a linear differential equation with constant coefficients. We could have done a zillion other things that would have done no harm, but they would not have done any good either. There is not much point in studying them; we only study what works.

- In this case, it provides a relationship that we can use among the components of x (equation (2)). In general, it replaces the minimization requirement with a set of equations and inequalities, and there are just enough of them to determine a unique solution (when the problem has a unique solution).

The solution is illustrated in Figure 1. We are seeking the smallest 4-dimensional sphere that intersects the equality constraint (a 3-dimensional plane in 4-dimensional space). Equation (2) essentially tells us that the solution point is on a line that intersects that plane.

If we asked for the maximum rather than the minimum of J, the same necessary conditions would have applied, and we would have gotten the same answer following the steps shown here. However, that answer would be wrong. After all, if it is a minimum, it cannot also be a maximum unless the function is a constant, which it certainly is not. We know that we have found a minimum because of the second-order conditions: the second derivative matrix ∂²J̄/∂x² is positive definite.

It is a coincidence that for both Example 0 and Example 1, J* = x_i*, i = 1, ..., 4, at the optimum. What is not a coincidence is that J* for Example 1 is greater than J* for Example 0. If you change an optimization problem by adding a constraint, you make the optimum worse or, at best, you leave it unchanged.

Example 2. One equality constraint and one inequality constraint:

min J = x_1^2 + x_2^2 + x_3^2 + x_4^2
subject to
x_1 + x_2 + x_3 + x_4 = 1    (3)
x_4 ≤ A    (4)

in which A is a parameter that we will play with. Figures 2 and 3 illustrate two possible versions of this problem, depending on the value of A. (The shaded regions are the forbidden values of x, the places where x_4 > A.)

Figure 2: Example 2, A large. Figure 3: Example 2, A small. (Each figure shows the circles J = constant, the equality constraint, the inequality constraint x_4 ≤ A, and the solution x*.)

Digression: The inequality constraint requires a new Lagrange multiplier. To understand it, let us temporarily ignore the equality constraint and consider the following scalar problem, in which J and g are arbitrary differentiable functions with continuous derivatives, and J has a minimum:

min_x J(x)    (5)
subject to
g(x) ≤ 0

There are two possibilities: the solution x* satisfies g(x*) < 0 (that is, the solution is strictly in the interior of the region allowed by the inequality constraint), or it satisfies g(x*) = 0 (the solution is on the boundary of that region). Figures 4 and 5 illustrate these two possibilities.

Possibility 1, the interior solution: The necessary condition is that x* satisfies

dJ/dx (x*) = 0    (6)

That is, the solution is the same as that of the problem without the inequality constraint.

Possibility 2, the boundary solution: Let x = x* + δx. For x* to be the optimal solution to (5), δx = 0 must be the solution to

min_{δx} J(x* + δx)    (7)
subject to
g(x* + δx) ≤ 0

This is certainly true if we restrict our attention to small δx. In that case, we can expand J and g as first-order Taylor approximations, and (7) becomes, approximately,

min_{δx} J(x*) + dJ/dx (x*) δx
subject to
g(x*) + dg/dx (x*) δx ≤ 0

Figure 4: Case 1, interior solution (g(x*) < 0). Figure 5: Case 2, boundary solution (g(x*) = 0).

Since J(x*) is independent of δx and g(x*) = 0, this can be written

min_{δx} dJ/dx (x*) δx    (8)
subject to
dg/dx (x*) δx ≤ 0

We now seek conditions on dJ/dx (x*) and dg/dx (x*) such that the solution to (8) is δx = 0. Note that (8) is a linear programming problem. To be really exhaustive about it, we must now consider the four cases in the table below. Case 1, for example, reduces to

min_{δx} δx
subject to
δx ≤ 0

(Actually, there are nine cases, since we could also have dJ/dx = 0 or dg/dx = 0. However, if dJ/dx = 0 at x = x*, then we really have a situation like the interior solution. That is, the constraint is still ineffective, since we would have the same x* even if we ignored the g(x) ≤ 0 condition. If dg/dx = 0 at x = x*, then the constraint qualification is violated. This is discussed below.)

since the magnitudes do not matter, only the signs. The solution is clearly δx = −∞. Similarly, Case 2 becomes

min_{δx} δx
subject to
δx ≥ 0

and it is clear that the solution is δx = 0. The other cases are similar.

Case    dJ/dx (x*)    dg/dx (x*)    Solution
1       > 0           > 0           δx = −∞
2       > 0           < 0           δx = 0
3       < 0           > 0           δx = 0
4       < 0           < 0           δx = +∞

Therefore, the only cases in which the solution is δx = 0 are Cases 2 and 3, the cases in which the signs of dJ/dx (x*) and dg/dx (x*) are opposite. Therefore, there is some positive number µ such that

dJ/dx (x*) = −µ dg/dx (x*)

or

dJ/dx (x*) + µ dg/dx (x*) = 0

Equation (6) is implied by this if we require µ = 0 when g(x*) < 0.
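The sign argument can also be checked by brute force. The sketch below (an illustration added to these notes, not part of the original argument) minimizes c·δx subject to a·δx ≤ 0 over a large bounded interval for each sign combination of c = dJ/dx (x*) and a = dg/dx (x*); the "unbounded" cases show up as the optimum running to the edge of the interval.

```python
# Brute-force check of the four sign cases for the LP in (8):
#   minimize  c * dx   subject to  a * dx <= 0,
# where c stands for dJ/dx(x*) and a for dg/dx(x*).
import numpy as np

BOUND = 1e6
dx_grid = np.linspace(-BOUND, BOUND, 2_000_001)   # step 1.0, includes dx = 0 exactly

for c, a in [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]:
    feasible = dx_grid[a * dx_grid <= 0]          # the constraint a*dx <= 0
    best = feasible[np.argmin(c * feasible)]      # minimizer of c*dx over the grid
    tag = "dx = 0 optimal" if best == 0 else "unbounded (runs to the edge)"
    print(f"sign(dJ/dx) = {c:+d}, sign(dg/dx) = {a:+d}: {tag}")
# Only the opposite-sign combinations report "dx = 0 optimal", matching Cases 2 and 3.
```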

Solution: For the problem of Example 2, define

J̄ = x_1^2 + x_2^2 + x_3^2 + x_4^2 + λ(1 − x_1 − x_2 − x_3 − x_4) + µ(x_4 − A)

Then, the KKT conditions are

∂J̄/∂x = 0    (9)
x_1 + x_2 + x_3 + x_4 = 1    (10)
x_4 ≤ A    (11)
µ ≥ 0    (12)
µ(x_4 − A) = 0    (13)

From (9):

(2x_1 − λ, 2x_2 − λ, 2x_3 − λ, 2x_4 − λ + µ) = 0

Therefore

x_1 = x_2 = x_3 = λ/2,  x_4 = (λ − µ)/2

so, from (10),

x_1 + x_2 + x_3 + x_4 = 4(λ/2) − µ/2 = 2λ − µ/2 = 1, or λ = 1/2 + µ/4.

Therefore

x_1 = x_2 = x_3 = 1/4 + µ/8,  x_4 = 1/4 + µ/8 − µ/2 = 1/4 − 3µ/8    (14)

From (11),

1/4 − 3µ/8 ≤ A, or 3µ/8 ≥ 1/4 − A    (15)

Case 1: A > 1/4. This is the interior case illustrated in Figures 2 and 4. Since 1/4 − A < 0, (12) implies that (15), and therefore (11), is automatically satisfied. From (14), x_1 = x_2 = x_3 = 1/4 + µ/8 and x_4 = 1/4 − 3µ/8 ≤ 1/4 < A. But we have more from (14): since x_4 < A, (13) implies that µ = 0. Therefore

x_1 = x_2 = x_3 = x_4 = 1/4,

which is consistent with the answer to Example 1 and common sense. Example 1 says that this point is optimal; if we also require that x_4 be less than A and A is greater than 1/4, we haven't changed anything. Note that the optimal J is again 1/4.

Case 2: A = 1/4. This behaves like Case 1. The unconstrained optimum lies exactly on the boundary. Therefore, if we ignored the inequality constraint, we would get the same x*.

Case 3: A < 1/4. If x_4 were strictly less than A, then (13) would require that µ = 0. But then (14) would imply x_4 = 1/4 > A, which violates (11). Therefore

x_4 = A and x_1 = x_2 = x_3 = (1 − A)/3.

Also

J* = 3((1 − A)/3)^2 + A^2 = (1 − A)^2/3 + A^2 = (1 − 2A + 4A^2)/3.

Note that J* ≥ 1/4 and that J* = 1/4 when A = 1/4.

Comments

The comments after Example 1 hold here. In summary,

J* = 1/4                    if A ≥ 1/4
J* = (1 − 2A + 4A^2)/3      otherwise
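To tie the cases together, the sketch below (an illustration added to these notes; SciPy, the SLSQP solver, and the particular values of A are assumptions) solves Example 2 numerically for one value of A above 1/4 and one below, and compares the result with the closed-form answer derived above.

```python
# Sketch: Example 2 for A on either side of 1/4.
# Closed form: if A >= 1/4, x = (1/4, ..., 1/4) and J* = 1/4;
#              if A <  1/4, x4 = A, x1 = x2 = x3 = (1-A)/3, J* = (1 - 2A + 4A^2)/3.
import numpy as np
from scipy.optimize import minimize

def solve_example2(A):
    J = lambda x: np.sum(x**2)
    cons = [
        {"type": "eq",   "fun": lambda x: np.sum(x) - 1.0},  # constraint (3)
        {"type": "ineq", "fun": lambda x: A - x[3]},          # constraint (4): x4 <= A, i.e. A - x4 >= 0
    ]
    res = minimize(J, x0=np.full(4, 0.25), constraints=cons, method="SLSQP")
    return res.x, res.fun

for A in (0.5, 0.1):
    x, J_num = solve_example2(A)
    J_formula = 0.25 if A >= 0.25 else (1 - 2*A + 4*A**2) / 3
    print(f"A = {A}: x = {np.round(x, 4)}, "
          f"J* (numerical) = {J_num:.4f}, J* (formula) = {J_formula:.4f}")
```

For A = 0.5 the inequality is inactive and the Example 1 answer is recovered; for A = 0.1 the solver should return x_4 = 0.1, the other components equal to 0.3, and J* = 0.28, matching the formula.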

The graph of J* as a function of A is shown below. (The figure plots J* versus A: for A < 1/4, J* decreases as A increases, reaching its minimum value of 1/4 at A = 1/4, and it stays constant at 1/4 for all A ≥ 1/4.)

The extension of the KKT conditions to more than one equality constraint and/or more than one inequality constraint is straightforward, but there is one more condition to be applied in general, the so-called constraint qualification. This says that the gradients of the equality constraints and of the effective inequality constraints must be linearly independent at the solution x*.

The informal discussion of the KKT conditions was modeled on that of Bryson and Ho (1975). There are plenty of other references, but this discussion is especially intuitive.

References

Bryson, A. E. and Y.-C. Ho (1975). Applied Optimal Control: Optimization, Estimation, and Control. Hemisphere Publishing Corporation.

MIT OpenCourseWare
http://ocw.mit.edu

2.854 / 2.853 Introduction to Manufacturing Systems
Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms