IE417: Nonlinear Programming: Lecture 12
Jeff Linderoth
Department of Industrial and Systems Engineering, Lehigh University
16th March 2006

Quiz Discussion

Motivation: Why do we care?

We are interested in determining conditions under which we can verify that a solution to a constrained problem is optimal. For a very simple starting point, let's assume we are minimizing functions that are
- one-dimensional,
- continuous, and
- differentiable.

Recall: a function f(x) is convex on a set S if for all a ∈ S, b ∈ S, and λ ∈ [0,1],
  f(λa + (1 − λ)b) ≤ λf(a) + (1 − λ)f(b).

Algorithms for nonlinear programming work to find points that satisfy optimality conditions. When faced with a problem that you don't know how to handle, write down the optimality conditions: often you can learn a lot about a problem by examining the properties of its optimal solutions.
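As a quick sanity check of the definition (a worked example not on the original slide), take f(x) = x². For any a, b and λ ∈ [0,1],

\[
\lambda a^{2} + (1-\lambda) b^{2} - \bigl(\lambda a + (1-\lambda) b\bigr)^{2}
  = \lambda (1-\lambda)(a-b)^{2} \;\ge\; 0,
\]

so f(λa + (1 − λ)b) ≤ λf(a) + (1 − λ)f(b) holds, and x² is convex on all of R.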
(1-D) Constrained Optimization: Breaking It Down

Now we consider the following problem for a scalar variable x ∈ R:
  z* = min { f(x) : 0 ≤ x ≤ u }.

There are three cases for where an optimal solution x* might be:
- x* = 0
- 0 < x* < u
- x* = u

If 0 < x* < u, then the necessary and sufficient conditions for optimality are the same as in the unconstrained case. (You should know these all too well!) Namely, a necessary condition is that f′(x*) = 0.

What if NOT 0 < x* < u?
- If x* = 0, then we need f′(x*) ≥ 0 (necessary); f′(x*) > 0 is sufficient.
- If x* = u, then we need f′(x*) ≤ 0 (necessary); f′(x*) < 0 is sufficient.
(A numerical sketch of these one-dimensional tests appears at the end of this section.)

KKT Conditions

How do these conditions generalize to optimization problems with more than one variable? The intuition: if a constraint holds with equality (is binding), then the gradient of the objective function must point in a direction that could improve the objective only by leaving the feasible region. Formally: the gradient of the objective function must be a linear combination of the gradients of the binding constraints.

Example (x ∈ R²):
  min f(x) = x₁ + x₂
  s.t. c(x) := x₁² + x₂² − 2 = 0

[Figure: contours of f and the circle c(x) = 0, with the vectors ∇f(x) and ∇c(x) drawn at the optimal solution.]

The Key: at the optimal solution x*, ∇c is parallel to ∇f:
  ∇f(x*) = λ* ∇c(x*).
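Here is the promised sketch of the one-dimensional tests (illustrative code, not from the lecture; classify is a hypothetical helper, and the derivative is approximated by a central difference):

    # First-order tests for min f(x) subject to 0 <= x <= u.
    def classify(f, x, u, tol=1e-6, h=1e-6):
        df = (f(x + h) - f(x - h)) / (2 * h)           # approximates f'(x)
        if tol < x < u - tol:                          # interior: need f'(x) = 0
            return "candidate" if abs(df) < tol else "not optimal"
        if x <= tol:                                   # lower bound: need f'(x) >= 0
            return "candidate" if df >= -tol else "not optimal"
        return "candidate" if df <= tol else "not optimal"   # upper bound: f'(x) <= 0

    f = lambda x: (x - 3) ** 2        # over [0, 2], the minimizer is x* = u = 2
    print(classify(f, 2.0, 2.0))      # "candidate":   f'(2) = -2 <= 0
    print(classify(f, 0.0, 2.0))      # "not optimal": f'(0) = -6 < 0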
Why Parallel?

We need that any better point remains feasible, which to first order implies
  0 = c(x + d) ≈ c(x) + ∇c(x)ᵀd = ∇c(x)ᵀd.
Any better point must also yield a descent direction:
  ∇f(x)ᵀd < 0.

Example (x ∈ R²):
  minimize x₁ + x₂
  s.t. 2 − x₁² − x₂² ≥ 0
       x₂ ≥ 0

You can see that the optimal solution is x* = (−√2, 0).

The Canonical Problem: Recall!

  min f(x), x ∈ Rⁿ
  s.t. cᵢ(x) = 0, i ∈ E
       cᵢ(x) ≥ 0, i ∈ I

Or, if Ω = {x ∈ Rⁿ : cᵢ(x) = 0, i ∈ E; cᵢ(x) ≥ 0, i ∈ I}, then
  min f(x), x ∈ Ω.   (NLP)

Local Solution: x̂ is a local solution of (NLP) if x̂ ∈ Ω and there exists a neighborhood N(x̂) such that f(x) ≥ f(x̂) for all x ∈ N(x̂) ∩ Ω.

Strict Local Solution: x̂ is a strict local solution of (NLP) if x̂ ∈ Ω and there exists a neighborhood N(x̂) such that f(x) > f(x̂) for all x ∈ N(x̂) ∩ Ω with x ≠ x̂.
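To make the "linear combination of binding-constraint gradients" picture concrete, here is a worked check (not on the original slide) at x* = (−√2, 0) for the example above, writing c₁(x) = 2 − x₁² − x₂² and c₂(x) = x₂, both active at x*:

\[
\nabla f(x^*) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\nabla c_1(x^*) = \begin{pmatrix} -2x_1^* \\ -2x_2^* \end{pmatrix}
               = \begin{pmatrix} 2\sqrt{2} \\ 0 \end{pmatrix}, \qquad
\nabla c_2(x^*) = \begin{pmatrix} 0 \\ 1 \end{pmatrix},
\]
\[
\begin{pmatrix} 1 \\ 1 \end{pmatrix}
  = \frac{1}{2\sqrt{2}} \begin{pmatrix} 2\sqrt{2} \\ 0 \end{pmatrix}
  + 1 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix},
\]

so ∇f(x*) is a nonnegative combination of the active-constraint gradients, with weights 1/(2√2) and 1. These are exactly the multipliers λ₁*, λ₂* verified on the KKT slides below.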
KKT Conditions: Lagrangians

Lagrangian:
  L(x, λ) = f(x) − Σ_{i ∈ E ∪ I} λᵢ cᵢ(x)

Geometrically, if x̂ is an optimal solution, then we must be able to write ∇f(x̂) as an appropriate linear combination of the gradients of the binding constraints. If a constraint is not binding, its weight must be 0.

Active Set:
  A(x) = E ∪ {i ∈ I : cᵢ(x) = 0}

LICQ: Given x* and the active set A(x*), the linear independence constraint qualification (LICQ) holds if the set of vectors {∇cᵢ(x*) : i ∈ A(x*)} is linearly independent.

First-Order Necessary (KKT) Conditions

If x* is a local solution of (NLP), and LICQ holds at x*, then there exist multipliers λᵢ*, i ∈ E ∪ I, such that

  ∇ₓ L(x*, λ*) = 0            (1)
  cᵢ(x*) = 0,  i ∈ E          (2)
  cᵢ(x*) ≥ 0,  i ∈ I          (3)
  λᵢ* ≥ 0,  i ∈ I             (4)
  λᵢ* cᵢ(x*) = 0,  i ∈ I ∪ E  (5)

Returning to the example:
  minimize x₁ + x₂
  s.t. 2 − x₁² − x₂² ≥ 0   (λ₁)
       x₂ ≥ 0              (λ₂)

We can write (1) as
  0 = ∇ₓ L(x*, λ*) = ∇f(x*) − Σ_{i ∈ A(x*)} λᵢ* ∇cᵢ(x*).
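The KKT system (1)–(5) for this small example can also be solved symbolically. A minimal sketch, assuming SymPy is available (illustrative code, not part of the lecture):

    import sympy as sp

    x1, x2, lam1, lam2 = sp.symbols('x1 x2 lam1 lam2', real=True)
    f  = x1 + x2
    c1 = 2 - x1**2 - x2**2            # inequality constraint c1(x) >= 0
    c2 = x2                           # inequality constraint c2(x) >= 0
    L  = f - lam1*c1 - lam2*c2        # Lagrangian

    # Stationarity (1) and complementary slackness (5):
    kkt = [sp.diff(L, x1), sp.diff(L, x2), lam1*c1, lam2*c2]
    for sol in sp.solve(kkt, [x1, x2, lam1, lam2], dict=True):
        # keep only solutions satisfying primal (3) and dual (4) feasibility
        if c1.subs(sol) >= 0 and c2.subs(sol) >= 0 and sol[lam1] >= 0 and sol[lam2] >= 0:
            print(sol)   # prints x = (-sqrt(2), 0) with lam1 = sqrt(2)/4, lam2 = 1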
KKT Conditions for the Example

Primal feasibility:
  2 − x₁² − x₂² ≥ 0   (λ₁)
  x₂ ≥ 0              (λ₂)

Dual feasibility:
  λ₁ ≥ 0, λ₂ ≥ 0
  1 + 2λ₁x₁ = 0
  1 + 2λ₁x₂ − λ₂ = 0

Complementary slackness:
  λ₁(2 − x₁² − x₂²) = 0
  λ₂ x₂ = 0

Checking Optimal Solutions

Let's check whether or not the necessary (KKT) conditions for optimality are satisfied by x* = (−√2, 0):
- Primal feasibility: 2 − (−√2)² − 0² = 0 ≥ 0 and x₂ = 0 ≥ 0. (OK)
- Dual feasibility: stationarity gives λ₁ = 1/(2√2) ≥ 0 and λ₂ = 1 ≥ 0. (OK)
- Complementary slackness: λ₁(2 − x₁² − x₂²) = λ₁ · 0 = 0 and λ₂ x₂ = λ₂ · 0 = 0. (OK)

So x* = (−√2, 0) is potentially an optimal solution.

Another Example

  maximize f(x) = 2x₁ + 3x₂ + 4x₁² + 2x₁x₂ + x₂²
  s.t. x₁ − x₂ ≥ 0
       x₁ + x₂ ≤ 4
       x₁ ≤ 3

KKT Conditions

Primal feasibility:
  x₁ − x₂ ≥ 0
  x₁ + x₂ ≤ 4
  x₁ ≤ 3

Dual feasibility:
  2 + 8x₁ + 2x₂ = −λ₁ + λ₂ + λ₃
  3 + 2x₁ + 2x₂ = λ₁ + λ₂
  λ₁, λ₂, λ₃ ≥ 0

Complementary slackness:
  λ₁(x₁ − x₂) = 0
  λ₂(4 − x₁ − x₂) = 0
  λ₃(3 − x₁) = 0
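The quiz points on the next slide can be tested mechanically. A minimal numerical sketch (illustrative, not from the lecture; kkt_check is a hypothetical helper):

    import numpy as np

    def kkt_check(x, tol=1e-8):
        """Test the KKT system above at x for the maximization example."""
        x1, x2 = x
        c = np.array([x1 - x2, 4.0 - x1 - x2, 3.0 - x1])   # constraints c_i(x) >= 0
        if c.min() < -tol:
            return "primal infeasible"
        grad_f = np.array([2 + 8*x1 + 2*x2, 3 + 2*x1 + 2*x2])
        grad_c = np.array([[1.0, -1.0], [-1.0, -1.0], [-1.0, 0.0]])  # rows: grad c_i
        active = np.where(np.abs(c) <= tol)[0]
        # Stationarity for a max problem: -grad f = sum over active i of lam_i grad c_i
        lam_act, *_ = np.linalg.lstsq(grad_c[active].T, -grad_f, rcond=None)
        lam = np.zeros(3)
        lam[active] = lam_act
        ok = np.allclose(grad_c.T @ lam, -grad_f) and lam.min() >= -tol
        return f"KKT {'satisfied' if ok else 'violated'}, lambda = {lam}"

    print(kkt_check((2.0, 2.0)))   # violated: stationarity forces lam_1 = -5.5 < 0
    print(kkt_check((3.0, 1.0)))   # satisfied: lam = (0, 11, 17)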
Checking Some Points

- Is x₁ = 2, x₂ = 2 an optimal point?
- Is x₁ = 3, x₂ = 1 an optimal point?

What are Multipliers? Active Constraints

Suppose I change the right-hand side of active inequality constraint i, requiring
  cᵢ(x) ≥ ε‖∇cᵢ(x*)‖
instead of cᵢ(x) ≥ 0. Suppose the new solution is x*(ε). Suppose ε is so small that the active set of constraints does not change, and the effect on all multipliers is so small that it can be ignored.

Constraints:
  ε‖∇cᵢ(x*)‖ = cᵢ(x*(ε)) − cᵢ(x*) ≈ (x*(ε) − x*)ᵀ ∇cᵢ(x*)
  0 = cⱼ(x*(ε)) − cⱼ(x*) ≈ (x*(ε) − x*)ᵀ ∇cⱼ(x*)   for all j ≠ i, j ∈ A(x*)

Objective:
  f(x*(ε)) − f(x*) ≈ (x*(ε) − x*)ᵀ ∇f(x*)
                   = Σ_{j ∈ A(x*)} λⱼ* (x*(ε) − x*)ᵀ ∇cⱼ(x*)
                   = ε‖∇cᵢ(x*)‖ λᵢ*

In the limit:
  df(x*(ε))/dε = λᵢ* ‖∇cᵢ(x*)‖

The Upshot:
- If λᵢ* ‖∇cᵢ(x*)‖ is large, then the objective function value is very sensitive to changes in constraint i.
- If λᵢ* = 0, the constraint is inactive, and small perturbations to cᵢ will not affect the objective function value (to first order). (A numerical illustration follows the homework below.)

Homework

Due: 3/27
Turn in: 12.13, 12.18
Try/Do: 12.2, 12.3, 12.4, 12.6, 12.7, 12.13, 12.15, 12.16, 12.21
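As a closing illustration of the sensitivity result (not from the lecture; a sketch assuming SciPy's SLSQP solver), we can re-solve the first example with constraint c₁ perturbed as above. At x* = (−√2, 0) we have λ₁* = 1/(2√2) and ‖∇c₁(x*)‖ = 2√2, so the predicted derivative is λ₁* ‖∇c₁(x*)‖ = 1:

    import numpy as np
    from scipy.optimize import minimize

    norm_gc1 = 2 * np.sqrt(2)      # ||grad c1(x*)|| at x* = (-sqrt(2), 0)

    def solve(eps):
        """min x1 + x2  s.t.  2 - x1^2 - x2^2 >= eps*||grad c1(x*)||,  x2 >= 0."""
        cons = [{'type': 'ineq',
                 'fun': lambda x: 2 - x[0]**2 - x[1]**2 - eps * norm_gc1},
                {'type': 'ineq', 'fun': lambda x: x[1]}]
        return minimize(lambda x: x[0] + x[1], x0=[-1.0, 0.5],
                        method='SLSQP', constraints=cons).fun

    eps = 1e-4
    print((solve(eps) - solve(0.0)) / eps)   # approximately 1 = lam1*||grad c1(x*)||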