Convex Optimization Lecture 6: KKT Conditions, and applications

Dr. Michel Baes, IFOR / ETH Zürich

Quick recall of last week's lecture. Various aspects of convexity: The set of minimizers is convex. Convex functions are line-differentiable (i.e. the limit $\lim_{t \downarrow 0} [f(x+td) - f(x)]/t$ always exists). Differentiable convex functions: equivalent definitions, easier optimality conditions. Subdifferential: a generalization of the gradient. New optimality conditions. Deducing differentiability by looking at $\partial f(x)$. Conjugate functions arise naturally from duality: $g \in \partial f(x)$ iff $x \in \partial f^*(g)$. An easy tool: support functions. Support function of subdifferentials.

Combining subdifferentials: Subdifferential of a maximum. Let $f_1, \dots, f_m : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex, such that $D := \bigcap_{i=1}^m \operatorname{int}\operatorname{dom}(f_i) \neq \emptyset$. Let $f(x) := \max_i f_i(x)$ and $I(x) := \{i : f_i(x) = f(x)\}$ for $x \in D$. Then $\partial f(x) = C := \operatorname{conv}\{\partial f_i(x) : i \in I(x)\}$. Proof: (see blackboard). Key steps: We just need to check $\sigma_C \equiv \sigma_{\partial f(x)}$, as $\partial f(x)$ and $C$ are closed and convex. Let $d \in \mathbb{R}^n$. Then $\lim_{t \downarrow 0} I(x + td) \subseteq I(x)$. $\sigma_{\partial f(x)}(d) = f'(x)[d] = \max_{i \in I(x)} f_i'(x)[d]$. $f_i'(x)[d] = \sigma_{\partial f_i(x)}(d) = \max\{\langle g_i, d \rangle : g_i \in \partial f_i(x)\}$. Remember the support function of a $k$-simplex. Adapting it slightly, $\sigma_C(d) = \max_{i \in I(x)} \max\{\langle g_i, d \rangle : g_i \in \partial f_i(x)\}$.

Some examples. Let $f(t) := |t| = \max\{t, -t\}$. Then $\partial f(t) = \{\operatorname{sign}(t)\}$ for $t \neq 0$. Also, $\partial f(0) = \operatorname{conv}\{-1, 1\} = [-1, 1]$. Let $f(x) := \max_{1 \le i \le n} x_i$, and $I(x) := \{i : x_i = f(x)\}$. Then $\partial f(x) = \operatorname{conv}\{e_i : i \in I(x)\}$. In particular, $\partial f(0) = \Delta_n := \{g \ge 0 : \sum_i g_i = 1\}$, the standard simplex. Observe that $g \in \partial f(0)$ iff $0 \in \partial f^*(g)$ iff $g$ minimizes $f^*$. Now, $f$ is the support function of $\Delta_n$. Thus $f^* = \chi_{\Delta_n}$, the indicator function of $\Delta_n$, which is indeed minimized on $\Delta_n$. Generalizable to every support function.
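The max-rule is easy to check numerically: any convex combination of the active coordinate vectors $e_i$, $i \in I(x)$, must satisfy the subgradient inequality everywhere. A minimal sketch in Python (NumPy only; the helper `a_subgradient` and the test points are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """f(x) = max_i x_i, a convex piecewise-linear function."""
    return np.max(x)

def a_subgradient(x, tol=1e-9):
    """One element of the subdifferential: the average of e_i over the active set I(x)."""
    active = np.flatnonzero(x >= f(x) - tol)   # I(x) = {i : x_i = f(x)}
    g = np.zeros_like(x, dtype=float)
    g[active] = 1.0 / active.size              # a point of conv{e_i : i in I(x)}
    return g

x = np.array([1.0, 1.0, -2.0])                 # two active coordinates: I(x) = {0, 1}
g = a_subgradient(x)

# The subgradient inequality f(y) >= f(x) + <g, y - x> must hold for every y.
for _ in range(1000):
    y = rng.normal(size=x.size)
    assert f(y) >= f(x) + g @ (y - x) - 1e-12
print("subgradient inequality verified; g =", g)
```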

Combining subdifferentials: Subdifferential of a sum. Let $f_1, f_2 : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex, such that $D := \bigcap_i \operatorname{relint}(\operatorname{dom}(f_i)) \neq \emptyset$, and $s := f_1 + f_2$. Then $\partial s(x) = \partial f_1(x) + \partial f_2(x)$ for all $x \in D$. The proof, due to Rockafellar, is far from trivial. The direction $\supseteq$ is easy: if $g_i \in \partial f_i(x)$, then $f_i(y) \ge f_i(x) + \langle g_i, y - x \rangle$ for all $y$ and $i = 1, 2$. Summing up both sides, we get that $g_1 + g_2 \in \partial s(x)$. Sketch for $\subseteq$: We use $g \in \partial s(x)$ iff $s(x) + s^*(g) = \langle g, x \rangle$. It can be proven that $s^*(g) = \inf\{f_1^*(u) + f_2^*(v) : u + v = g\}$ when $D \neq \emptyset$, with the infimum attained, say at $(u^*, v^*)$. Now: $g \in \partial s(x)$ iff $\langle g, x \rangle = f_1(x) + f_2(x) + f_1^*(u^*) + f_2^*(v^*)$ iff $u^* \in \partial f_1(x)$, $v^* \in \partial f_2(x)$, and $u^* + v^* = g$.

Subdifferential of a sum, the missing part: the conjugate of a sum [Rockafellar, Th. 16.4]. Let $g_1, g_2 : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex. Then $g_1^*(x) + g_2^*(x) = \sup_{y,z} \left[\langle y + z, x \rangle - g_1(y) - g_2(z)\right] = \sup_d \sup_{y+z=d} \left[\langle d, x \rangle - g_1(y) - g_2(z)\right] = \sup_d \left[\langle d, x \rangle - \inf_{y+z=d} \left(g_1(y) + g_2(z)\right)\right] = \varphi^*(x)$, where $\varphi(d) := \inf\{g_1(y) + g_2(z) : y + z = d\}$ is the inf-convolution of $g_1$ and $g_2$. We let $g_1 := f_1^*$, $g_2 := f_2^*$, so that $g_i^* = f_i^{**} = f_i$ and the identity reads $f_1 + f_2 = \varphi^*$. Since $\varphi^{**} = \varphi$ when $\bigcap_i \operatorname{relint}(\operatorname{dom}(f_i)) \neq \emptyset$, taking conjugates gives $(f_1 + f_2)^* = \varphi$, the needed result.
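The identity $g_1^* + g_2^* = \varphi^*$ holds for any pair of functions and can be sanity-checked on a grid. A minimal numerical sketch (not from the slides) with $g_1(t) = t^2/2$ and $g_2(t) = t^2$, whose inf-convolution is $\varphi(d) = d^2/3$:

```python
import numpy as np

# Check g1*(x) + g2*(x) = (g1 box g2)*(x) on a 1-D grid for g1(t)=t^2/2, g2(t)=t^2.
t = np.linspace(-10.0, 10.0, 4001)          # grid for the primal variable
xs = np.linspace(-2.0, 2.0, 9)              # a few dual points, small enough that
                                            # every sup below is attained inside the grid
g1 = 0.5 * t**2
g2 = t**2

def conjugate(vals, x):
    """Discrete Legendre-Fenchel conjugate sup_t { x*t - vals(t) } over the grid t."""
    return np.max(x * t - vals)

# Inf-convolution on the grid: (g1 box g2)(t_k) = min_y { g1(y) + g2(t_k - y) }.
infconv = np.array([np.min(g1 + (tk - t)**2) for tk in t])

for x in xs:
    lhs = conjugate(g1, x) + conjugate(g2, x)   # analytically x^2/2 + x^2/4
    rhs = conjugate(infconv, x)                 # analytically (d^2/3)^* = 3x^2/4
    assert abs(lhs - rhs) < 1e-2, (x, lhs, rhs)
print("sum of conjugates = conjugate of the inf-convolution (up to grid error)")
```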

The Karush-Kuhn-Tucker Theorem. The expression "Kuhn-Tucker" has 185,000 hits on Google. Needless to say, it is a cornerstone of Optimization. Proved in 1939 in the master's thesis of Karush, and rediscovered in 1951 by Kuhn and Tucker.

The Karush-Kuhn-Tucker Theorem. Theorem 1 (KKT Conditions for Convex Optimization). Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a convex function, $g_1, \dots, g_m$ be concave functions, and $b \in \mathbb{R}^m$ such that Slater's condition holds: $\exists \bar{x} : g_i(\bar{x}) > b_i$ for $1 \le i \le m$. A point $x^*$ is a solution to $f^* = \min\{f(x) : g(x) \ge b\}$ iff $g(x^*) \ge b$ (Feasibility), and there exist $h_0 \in \partial f(x^*)$, $h_i \in \partial(-g_i)(x^*)$, $\lambda_i^* \ge 0$ for $1 \le i \le m$ such that $h_0 + \sum_{i \in I(x^*)} \lambda_i^* h_i = 0$ (Original KKT Conditions), where $I(x^*) := \{i : g_i(x^*) = b_i\}$. Note: The minus sign ensures that $\partial(-g_i)(x^*) \neq \emptyset$.

The Karush-Kuhn-Tucker Theorem: the proof is simple with subdifferentials. $f^* = \min\{f(x) : g(x) \ge b\}$ (P). Let $\varphi(x) := \max\{f(x) - f^*, b_1 - g_1(x), \dots, b_m - g_m(x)\}$, which is convex. $x^*$ is an optimum of (P) iff $x^* \in \operatorname{argmin}_x \varphi(x)$ iff $0 \in \partial\varphi(x^*)$ iff $0 \in \operatorname{conv}\{\partial f(x^*), \partial(-g_i)(x^*) : i \in I(x^*)\}$ (obviously $f(x^*) = f^*$) iff there exist $h_0 \in \partial f(x^*)$, $h_i \in \partial(-g_i)(x^*)$, $\alpha_i \ge 0$ with $\alpha_0 + \sum_{i \in I(x^*)} \alpha_i = 1$ such that $0 = \alpha_0 h_0 + \sum_{i \in I(x^*)} \alpha_i h_i$. Claim: $\alpha_0 \neq 0$. First, $\langle h_i, y - x^* \rangle \le g_i(x^*) - g_i(y) = b_i - g_i(y)$ for all $y$ and all $i \in I(x^*)$. If $\alpha_0 = 0$, then $0 = \sum_{i \in I(x^*)} \alpha_i \langle h_i, \bar{x} - x^* \rangle \le \sum_{i \in I(x^*)} \alpha_i (b_i - g_i(\bar{x})) < 0$, contradicting Slater's condition, satisfied by $\bar{x}$. It remains to let $\lambda_i^* := \alpha_i/\alpha_0$.

This theorem cannot be used as is: you need to know $I(x^*)$ in advance! Easy way out: set $\lambda_i^* := 0$ when $i \notin I(x^*)$. Theorem 2 (KKT Conditions for Convex Optimization II). Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a convex function, $g_1, \dots, g_m$ be concave functions, and $b \in \mathbb{R}^m$ such that Slater's condition holds: $\exists \bar{x} : g_i(\bar{x}) > b_i$ for $1 \le i \le m$. A point $x^*$ is a solution to $f^* = \min\{f(x) : g(x) \ge b\}$ iff $g(x^*) \ge b$ (Feasibility), and there exist $h_0 \in \partial f(x^*)$, $h_i \in \partial(-g_i)(x^*)$, $\lambda_i^* \ge 0$ for $1 \le i \le m$ such that $h_0 + \sum_{i=1}^m \lambda_i^* h_i = 0$ (Usable KKT Conditions), and $\lambda_i^*(b_i - g_i(x^*)) = 0$ for all $i$ (Complementarity).
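As a quick illustration of Theorem 2, the usable KKT conditions can be verified numerically on a tiny convex program. The example below is not from the slides: $\min (x_1-2)^2 + (x_2-1)^2$ subject to $x_1 + x_2 \ge 4$, whose optimum $x^* = (2.5, 1.5)$ and multiplier $\lambda^* = 1$ are easy to derive by hand; the script only checks feasibility, stationarity, and complementarity with NumPy:

```python
import numpy as np

# Toy problem: minimize f(x) = (x1-2)^2 + (x2-1)^2  s.t.  g(x) = x1 + x2 >= b = 4
# (g is linear, hence concave, and Slater's condition clearly holds).
b = 4.0
x_star = np.array([2.5, 1.5])    # candidate optimum: projection of (2, 1) onto {x1 + x2 = 4}
lam = 1.0                        # candidate multiplier

h0 = 2.0 * (x_star - np.array([2.0, 1.0]))   # gradient of f at x_star (f is differentiable)
h1 = np.array([-1.0, -1.0])                  # gradient of -g, the convex function

feasibility     = x_star.sum() >= b - 1e-12
stationarity    = np.allclose(h0 + lam * h1, 0.0)
complementarity = abs(lam * (b - x_star.sum())) < 1e-12

print(feasibility, stationarity, complementarity, lam >= 0.0)   # all True
```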

When you have a slightly different problem. Equality constraints (necessarily affine constraints): the same statement holds, but with no sign constraint for the corresponding $\lambda_i^*$'s, and an extra condition on linear independence of the $h_i$'s. A version of the KKT Theorem exists for differentiable non-convex problems. The conditions read the same but are not sufficient: first find all the KKT points $(x^*, \lambda^*)$, then test them all to find the global optimum. Interesting exercise: what happens for general conic inequalities?

KKT and duality: $\lambda^*$ is the dual optimum. Recall: Theorem 3 (Complementarity conditions). Suppose that $x^*$ and $F^*$ are feasible for their respective problems, and that $f(x^*) = F^*(b)$. Then $p^* = f(x^*) = F^*(g(x^*)) = F^*(b) = d^*$ (so $F^*$ is a dual optimum). We take as candidates $x^*$ and $F^*(y) = \langle u, y \rangle + u_0$, with $u := \lambda^*$ and $u_0 := f(x^*) - \langle \lambda^*, b \rangle$. 1. By direct substitution, $F^*(b) = f(x^*)$. 2. $F^*$ is feasible, that is, $F^*(g(x)) \le f(x)$ for all $x$. Fix $x \in \mathbb{R}^n$. First, $f(x^*) \le f(x) - \langle h_0, x - x^* \rangle = f(x) + \sum_{i \in I(x^*)} \lambda_i^* \langle h_i, x - x^* \rangle \le f(x) + \sum_{i \in I(x^*)} \lambda_i^* (g_i(x^*) - g_i(x)) = f(x) + \sum_{i \in I(x^*)} \lambda_i^* (b_i - g_i(x))$, which is equivalent to $F^*(g(x)) \le f(x)$. Thus $\lambda^*$ is the dual optimum, and can be interpreted as the constraints' prices. The KKT Conditions are nothing but $\partial L(x^*, \lambda^*)/\partial x = 0$.

A geometric view of KKT. For unconstrained problems, we recover the optimality condition $0 \in \partial f(x^*)$. When $f$ is differentiable, and $Q := \{x : g(x) \ge b\}$ has a nonempty interior, we have $x^* \in \operatorname{argmin}\{f(x) : x \in Q\}$ iff $\langle f'(x^*), y - x^* \rangle \ge 0$ for all $y \in Q$ and $x^* \in Q$. KKT says $f'(x^*) = -\sum_{i \in I(x^*)} \lambda_i^* h_i$, with $\langle h_i, y - x^* \rangle \le g_i(x^*) - g_i(y) = b_i - g_i(y)$ and $\lambda_i^* \ge 0$ for $i \in I(x^*)$. Thus: $\langle f'(x^*), y - x^* \rangle = -\sum_{i \in I(x^*)} \lambda_i^* \langle h_i, y - x^* \rangle \ge -\sum_{i \in I(x^*)} \lambda_i^* (b_i - g_i(y)) \ge 0$ for all feasible $y$.

Application: Projecting onto a subspace. One of the most solved optimization problems in the world (also known as the Least-Squares Problem). Direct applications in meteorology, genomics, statistics, control, signal processing, ... Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, with $n \ge m$ and $A$ of full row rank. Find the shortest solution of $Ax = b$: $\min\{\|x\|_2^2/2 : Ax = b\}$. KKT conditions: $Ax = b$, $x - A^T\lambda = 0$; these imply $AA^T\lambda = b$, and $x^* = A^T(AA^T)^{-1}b$. $A^\dagger := A^T(AA^T)^{-1}$ is the Moore-Penrose inverse of $A$.
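A minimal NumPy sketch of this computation (the random data and the comparison with `np.linalg.pinv` are illustrative, not from the lecture):

```python
import numpy as np

# Minimum-norm solution of an underdetermined system Ax = b via the KKT conditions:
# x = A^T lambda with A A^T lambda = b (A is assumed to have full row rank).
rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

lam = np.linalg.solve(A @ A.T, b)      # A A^T lambda = b, from the KKT system
x = A.T @ lam                          # x = A^T (A A^T)^{-1} b

assert np.allclose(A @ x, b)                       # feasibility
assert np.allclose(x, np.linalg.pinv(A) @ b)       # agrees with the Moore-Penrose solution
print("minimum-norm solution:", x)
```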

A historical application: a simple mechanical system. We have, on a straight segment between two walls: two masses, each of width $w$; three springs of very short rest length ($\approx 0$) attached between the walls and the centers of the masses, of rigidity $k_1$, $k_2$, $k_3$ respectively. What is the equilibrium configuration? What are the forces on the walls? (Figure: two masses of width $w$ between two walls a distance $L$ apart.)

A historical application: modeling as an optimization problem. Let $x_1$ and $x_2$ denote the positions of the centers of the two masses. Potential energy of a spring: rigidity $\times$ length$^2/2$. Force exerted by a spring: rigidity $\times$ length. $\min \frac{1}{2}\left(k_1 x_1^2 + k_2(x_2 - x_1)^2 + k_3(L - x_2)^2\right)$ s.t. $x_1 \ge w/2$, $x_2 - x_1 \ge w$, $L - x_2 \ge w/2$.

A historical application: the optimality conditions. $\min \frac{1}{2}\left(k_1 x_1^2 + k_2(x_2 - x_1)^2 + k_3(L - x_2)^2\right)$ s.t. $x_1 \ge w/2$, $x_2 - x_1 \ge w$, $L - x_2 \ge w/2$. Complementarity and KKT Conditions: $\lambda_1(x_1 - w/2) = 0$, $\lambda_2(x_2 - x_1 - w) = 0$, $\lambda_3(L - x_2 - w/2) = 0$, $k_1 x_1 - k_2(x_2 - x_1) - \lambda_1 + \lambda_2 = 0$, $k_2(x_2 - x_1) - k_3(L - x_2) - \lambda_2 + \lambda_3 = 0$, $\lambda_i \ge 0$, $x$ feasible.
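These conditions are easy to solve numerically. A sketch (not the lecture's code) using the cvxpy modeling library, assuming it is installed; the stiffness and geometry values are made up, and the contact forces $\lambda_i$ are read off the constraints' dual variables:

```python
import cvxpy as cp

k1, k2, k3 = 1.0, 2.0, 3.0     # spring stiffnesses (assumed values)
L, w = 1.0, 0.4                # wall distance and mass width (assumed values)

x = cp.Variable(2)             # centre positions x1, x2 of the two masses
energy = 0.5 * (k1 * cp.square(x[0])
                + k2 * cp.square(x[1] - x[0])
                + k3 * cp.square(L - x[1]))
constraints = [x[0] >= w / 2,            # left mass clear of the left wall
               x[1] - x[0] >= w,         # masses do not overlap
               L - x[1] >= w / 2]        # right mass clear of the right wall
prob = cp.Problem(cp.Minimize(energy), constraints)
prob.solve()

print("positions:", x.value)
print("contact forces lambda_1..3:", [float(c.dual_value) for c in constraints])
```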

A historical application: the physical interpretation of dual variables. Complementarity and KKT Conditions: $\lambda_1^*(x_1^* - w/2) = 0$, $\lambda_2^*(x_2^* - x_1^* - w) = 0$, $\lambda_3^*(L - x_2^* - w/2) = 0$, $k_1 x_1^* - k_2(x_2^* - x_1^*) - \lambda_1^* + \lambda_2^* = 0$, $k_2(x_2^* - x_1^*) - k_3(L - x_2^*) - \lambda_2^* + \lambda_3^* = 0$, $\lambda_i^* \ge 0$, $x^*$ feasible. The KKT Conditions can be interpreted as a force balance equation on both masses. (Figure: force diagram with the spring forces $k_1 x_1^*$, $k_2(x_2^* - x_1^*)$, $k_3(L - x_2^*)$ and the contact forces $\lambda_1^*$, $\lambda_2^*$, $\lambda_3^*$.) $\lambda_1^*$ [$\lambda_3^*$] is the force exerted on the left [right] wall; $\lambda_2^*$ is the force exerted on each block.

Applications of the KKT Theorem are countless. I am sure that each of you will have to use them some day (if you stay in engineering).

For next week: Making convex optimization work for you: modeling and solving linear, second-order, and semidefinite optimization problems.