Convex Optimization M2


Convex Optimization M2, Lecture 3
A. d'Aspremont. Convex Optimization M2. 1/49

Duality

Homework (DMs): submit by email to dm.daspremont@gmail.com

Outline

- Lagrange dual problem
- weak and strong duality
- geometric interpretation
- optimality conditions
- perturbation and sensitivity analysis
- examples
- theorems of alternatives
- generalized inequalities

Lagrangian

standard form problem (not necessarily convex)

    minimize f_0(x)
    subject to f_i(x) ≤ 0, i = 1,...,m
               h_i(x) = 0, i = 1,...,p

variable x ∈ R^n, domain D, optimal value p⋆

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

    L(x, λ, ν) = f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x)

- weighted sum of objective and constraint functions
- λ_i is the Lagrange multiplier associated with f_i(x) ≤ 0
- ν_i is the Lagrange multiplier associated with h_i(x) = 0

Lagrange dual function

Lagrange dual function: g : R^m × R^p → R,

    g(λ, ν) = inf_{x∈D} L(x, λ, ν) = inf_{x∈D} ( f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) )

g is concave, and can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p⋆

proof: if x̃ is feasible and λ ⪰ 0, then

    f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x∈D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p⋆ ≥ g(λ, ν)
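The lower bound property can be checked numerically. The sketch below (a toy instance, not from the slides, assuming NumPy) uses minimize x² subject to 1 − x ≤ 0, so p⋆ = 1; the dual function works out to g(λ) = λ − λ²/4.

```python
import numpy as np

# Toy instance: minimize f_0(x) = x^2 subject to f_1(x) = 1 - x <= 0.
# Optimum: x = 1, p* = 1. Dual function (infimum attained at x = lambda/2):
#   g(lambda) = inf_x { x^2 + lambda*(1 - x) } = lambda - lambda^2 / 4
p_star = 1.0
lams = np.linspace(0.0, 5.0, 501)
g = lams - lams**2 / 4.0

assert np.all(g <= p_star + 1e-12)   # lower bound property: g(lambda) <= p*
assert np.isclose(g.max(), p_star)   # the bound is tight at lambda = 2
```

Here the best lower bound equals p⋆, i.e. strong duality holds, as expected for this convex problem.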

Least-norm solution of linear equations

    minimize x^T x
    subject to Ax = b

dual function

- the Lagrangian is L(x, ν) = x^T x + ν^T(Ax − b)
- to minimize L over x, set the gradient equal to zero:

    ∇_x L(x, ν) = 2x + A^T ν = 0  ⟹  x = −(1/2) A^T ν

- plug this into L to obtain g, a concave function of ν:

    g(ν) = L(−(1/2) A^T ν, ν) = −(1/4) ν^T A A^T ν − b^T ν

lower bound property: p⋆ ≥ −(1/4) ν^T A A^T ν − b^T ν for all ν

Standard form LP

    minimize c^T x
    subject to Ax = b, x ⪰ 0

dual function

- the Lagrangian is

    L(x, λ, ν) = c^T x + ν^T(Ax − b) − λ^T x = −b^T ν + (c + A^T ν − λ)^T x

- L is affine in x, hence

    g(λ, ν) = inf_x L(x, λ, ν) = { −b^T ν  if A^T ν − λ + c = 0,  −∞ otherwise }

g is linear on the affine domain {(λ, ν) | A^T ν − λ + c = 0}, hence concave

lower bound property: p⋆ ≥ −b^T ν if A^T ν + c ⪰ 0

Equality constrained norm minimization

    minimize ‖x‖
    subject to Ax = b

dual function

    g(ν) = inf_x ( ‖x‖ − ν^T A x + b^T ν ) = { b^T ν  if ‖A^T ν‖_* ≤ 1,  −∞ otherwise }

where ‖v‖_* = sup_{‖u‖≤1} u^T v is the dual norm of ‖·‖

proof: follows from inf_x ( ‖x‖ − y^T x ) = 0 if ‖y‖_* ≤ 1, −∞ otherwise

- if ‖y‖_* ≤ 1, then ‖x‖ − y^T x ≥ 0 for all x, with equality if x = 0
- if ‖y‖_* > 1, choose x = tu where ‖u‖ ≤ 1, u^T y = ‖y‖_* > 1: then ‖x‖ − y^T x ≤ t(‖u‖ − ‖y‖_*) → −∞ as t → ∞

lower bound property: p⋆ ≥ b^T ν if ‖A^T ν‖_* ≤ 1

Two-way partitioning

    minimize x^T W x
    subject to x_i^2 = 1, i = 1,...,n

- a nonconvex problem; the feasible set contains 2^n discrete points
- interpretation: partition {1,...,n} into two sets; W_ij is the cost of assigning i, j to the same set, −W_ij the cost of assigning them to different sets

dual function

    g(ν) = inf_x ( x^T W x + Σ_i ν_i (x_i^2 − 1) ) = inf_x x^T (W + diag(ν)) x − 1^T ν
         = { −1^T ν  if W + diag(ν) ⪰ 0,  −∞ otherwise }

lower bound property: p⋆ ≥ −1^T ν if W + diag(ν) ⪰ 0

example: ν = −λ_min(W) 1 gives the bound p⋆ ≥ n λ_min(W)
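For small n the eigenvalue bound can be compared against brute-force enumeration of all 2^n feasible points. A sketch on a random symmetric W, assuming NumPy:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                          # symmetric cost matrix

# Dual bound from nu = -lambda_min(W) * 1:  p* >= n * lambda_min(W)
bound = n * np.linalg.eigvalsh(W).min()

# Brute force over all 2^n points x in {-1, +1}^n
xs = np.array(list(product([-1.0, 1.0], repeat=n)))
p_star = ((xs @ W) * xs).sum(axis=1).min()

assert bound <= p_star + 1e-9              # lower bound property holds
```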

The dual problem

Lagrange dual problem

    maximize g(λ, ν)
    subject to λ ⪰ 0

- finds the best lower bound on p⋆ obtainable from the Lagrange dual function
- a convex optimization problem; optimal value denoted d⋆
- λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
- often simplified by making the implicit constraint (λ, ν) ∈ dom g explicit

example: standard form LP and its dual (page 8)

    minimize c^T x                maximize −b^T ν
    subject to Ax = b, x ⪰ 0      subject to A^T ν + c ⪰ 0

Weak and strong duality

weak duality: d⋆ ≤ p⋆

- always holds (for convex and nonconvex problems)
- can be used to find nontrivial lower bounds for difficult problems; for example, solving the SDP

    maximize −1^T ν
    subject to W + diag(ν) ⪰ 0

  gives a lower bound for the two-way partitioning problem on page 10

strong duality: d⋆ = p⋆

- does not hold in general
- (usually) holds for convex problems
- conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater's constraint qualification

strong duality holds for a convex problem

    minimize f_0(x)
    subject to f_i(x) ≤ 0, i = 1,...,m
               Ax = b

if it is strictly feasible, i.e.,

    ∃ x ∈ int D : f_i(x) < 0, i = 1,...,m, Ax = b

- also guarantees that the dual optimum is attained (if p⋆ > −∞)
- can be sharpened: e.g., int D can be replaced with relint D (interior relative to the affine hull); linear inequalities do not need to hold with strict inequality, ...
- there exist many other types of constraint qualifications

Feasibility problems

feasibility problem A, in x ∈ R^n:

    f_i(x) < 0, i = 1,...,m,  h_i(x) = 0, i = 1,...,p

feasibility problem B, in λ ∈ R^m, ν ∈ R^p:

    λ ⪰ 0,  λ ≠ 0,  g(λ, ν) ≥ 0

where g(λ, ν) = inf_x ( Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) )

- feasibility problem B is convex (g is concave), even if problem A is not
- A and B are always weak alternatives: at most one is feasible

proof: assume x̃ satisfies A and λ, ν satisfy B; then

    0 ≤ g(λ, ν) ≤ Σ_{i=1}^m λ_i f_i(x̃) + Σ_{i=1}^p ν_i h_i(x̃) < 0

a contradiction

- A and B are strong alternatives if exactly one of the two is feasible (one can prove infeasibility of A by producing a solution of B, and vice versa)

Inequality form LP

primal problem

    minimize c^T x
    subject to Ax ⪯ b

dual function

    g(λ) = inf_x ( (c + A^T λ)^T x − b^T λ ) = { −b^T λ  if A^T λ + c = 0,  −∞ otherwise }

dual problem

    maximize −b^T λ
    subject to A^T λ + c = 0, λ ⪰ 0

- from Slater's condition: p⋆ = d⋆ if Ax̃ ≺ b for some x̃
- in fact, p⋆ = d⋆ except when primal and dual are both infeasible
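This primal/dual pair can be solved side by side with an off-the-shelf LP solver. A sketch (assuming SciPy is available) on a box-constrained instance −1 ⪯ x ⪯ 1, whose optimal value is −‖c‖_1:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n = 5
c = rng.standard_normal(n)
A = np.vstack([np.eye(n), -np.eye(n)])   # Ax <= b encodes -1 <= x <= 1
b = np.ones(2 * n)

# Primal: minimize c^T x s.t. Ax <= b (x free, so override linprog's default x >= 0)
p_star = linprog(c, A_ub=A, b_ub=b, bounds=(None, None)).fun

# Dual: maximize -b^T lam s.t. A^T lam + c = 0, lam >= 0
# (solved as: minimize b^T lam s.t. A^T lam = -c, lam >= 0, then negate)
d_star = -linprog(b, A_eq=A.T, b_eq=-c).fun

assert np.isclose(p_star, -np.abs(c).sum())   # known optimum for the box LP
assert np.isclose(p_star, d_star)             # strong duality: p* = d*
```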

Quadratic program

primal problem (assume P ∈ S^n_{++})

    minimize x^T P x
    subject to Ax ⪯ b

dual function

    g(λ) = inf_x ( x^T P x + λ^T (Ax − b) ) = −(1/4) λ^T A P^{−1} A^T λ − b^T λ

dual problem

    maximize −(1/4) λ^T A P^{−1} A^T λ − b^T λ
    subject to λ ⪰ 0

- from Slater's condition: p⋆ = d⋆ if Ax̃ ≺ b for some x̃
- in fact, p⋆ = d⋆ always
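The weak duality bound is easy to exercise numerically: on an instance with b ≻ 0 the point x = 0 is feasible, so p⋆ ≤ 0, and every λ ⪰ 0 must give g(λ) ≤ 0. A sketch on random data, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 6
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)                    # P positive definite
A = rng.standard_normal((m, n))
b = np.abs(rng.standard_normal(m)) + 0.1   # b > 0, so x = 0 is feasible and p* <= 0

Pinv = np.linalg.inv(P)
for _ in range(50):
    lam = rng.random(m)                    # any lam >= 0 is dual feasible
    g = -0.25 * lam @ A @ Pinv @ A.T @ lam - b @ lam
    assert g <= 1e-12                      # g(lam) <= p* <= f(0) = 0
```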

A nonconvex problem with strong duality

    minimize x^T A x + 2 b^T x
    subject to x^T x ≤ 1

nonconvex if A ⋡ 0

dual function: g(λ) = inf_x ( x^T (A + λI) x + 2 b^T x − λ )

- unbounded below if A + λI ⋡ 0, or if A + λI ⪰ 0 and b ∉ R(A + λI)
- minimized by x = −(A + λI)† b otherwise: g(λ) = −b^T (A + λI)† b − λ

dual problem and equivalent SDP:

    maximize −b^T (A + λI)† b − λ        maximize −t − λ
    subject to A + λI ⪰ 0                subject to [ A + λI  b ]
               b ∈ R(A + λI)                        [ b^T     t ] ⪰ 0

strong duality holds although the primal problem is not convex (not easy to show)
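Strong duality can be observed numerically on a scalar instance (a hypothetical example, assuming NumPy; both problems are solved by dense grid search rather than a solver):

```python
import numpy as np

# Scalar instance: minimize a*x^2 + 2*b*x subject to x^2 <= 1, with a = -1 < 0
# (nonconvex). Dual: maximize g(lam) = -b^2/(a + lam) - lam over a + lam > 0.
a, b = -1.0, 0.5

xs = np.linspace(-1.0, 1.0, 200001)
p_star = (a * xs**2 + 2 * b * xs).min()        # grid-search primal optimum

lams = np.linspace(1.0 + 1e-6, 10.0, 200001)   # dual feasible: a + lam > 0
d_star = (-(b**2) / (a + lams) - lams).max()   # grid-search dual optimum

assert np.isclose(p_star, d_star, atol=1e-3)   # strong duality despite nonconvexity
```

For these data the primal optimum is p⋆ = −2 at x = −1, matched by the dual at λ = 3/2.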

Geometric interpretation

for simplicity, consider a problem with one constraint f_1(x) ≤ 0

interpretation of dual function:

    g(λ) = inf_{(u,t)∈G} (t + λu),  where G = {(f_1(x), f_0(x)) | x ∈ D}

[figure: the set G in the (u, t)-plane, with the supporting line λu + t = g(λ) and the values p⋆, d⋆ and g(λ) marked on the t-axis]

- λu + t = g(λ) is a (non-vertical) supporting hyperplane to G
- the hyperplane intersects the t-axis at t = g(λ)

epigraph variation: same interpretation if G is replaced with

    A = {(u, t) | f_1(x) ≤ u, f_0(x) ≤ t for some x ∈ D}

[figure: the epigraph-type set A with the supporting line λu + t = g(λ) through (0, p⋆)]

- strong duality holds if there is a non-vertical supporting hyperplane to A at (0, p⋆)
- for a convex problem, A is convex, hence it has a supporting hyperplane at (0, p⋆)
- Slater's condition: if there exists (ũ, t̃) ∈ A with ũ < 0, then supporting hyperplanes at (0, p⋆) must be non-vertical

Complementary slackness

assume strong duality holds, x⋆ is primal optimal, and (λ⋆, ν⋆) is dual optimal:

    f_0(x⋆) = g(λ⋆, ν⋆) = inf_x ( f_0(x) + Σ_{i=1}^m λ⋆_i f_i(x) + Σ_{i=1}^p ν⋆_i h_i(x) )
            ≤ f_0(x⋆) + Σ_{i=1}^m λ⋆_i f_i(x⋆) + Σ_{i=1}^p ν⋆_i h_i(x⋆)
            ≤ f_0(x⋆)

hence, the two inequalities hold with equality

- x⋆ minimizes L(x, λ⋆, ν⋆)
- λ⋆_i f_i(x⋆) = 0 for i = 1,...,m (known as complementary slackness):

    λ⋆_i > 0 ⟹ f_i(x⋆) = 0,    f_i(x⋆) < 0 ⟹ λ⋆_i = 0

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called the KKT conditions (for a problem with differentiable f_i, h_i):

1. primal feasibility: f_i(x) ≤ 0, i = 1,...,m, h_i(x) = 0, i = 1,...,p
2. dual feasibility: λ ⪰ 0
3. complementary slackness: λ_i f_i(x) = 0, i = 1,...,m
4. the gradient of the Lagrangian with respect to x vanishes (first-order condition):

    ∇f_0(x) + Σ_{i=1}^m λ_i ∇f_i(x) + Σ_{i=1}^p ν_i ∇h_i(x) = 0

if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions

KKT conditions for convex problems

if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

- from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)
- from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f_0(x̃) = g(λ̃, ν̃)

if Slater's condition is satisfied, x is optimal if and only if there exist λ, ν that satisfy the KKT conditions

- recall that Slater implies strong duality, and that the dual optimum is attained
- generalizes the optimality condition ∇f_0(x) = 0 for unconstrained problems

summary:

- when strong duality holds, the KKT conditions are necessary conditions for optimality
- if the problem is convex, they are also sufficient

example: water-filling (assume α_i > 0)

    minimize −Σ_{i=1}^n log(x_i + α_i)
    subject to x ⪰ 0, 1^T x = 1

x is optimal iff x ⪰ 0, 1^T x = 1, and there exist λ ∈ R^n, ν ∈ R such that

    λ ⪰ 0,  λ_i x_i = 0,  1/(x_i + α_i) + λ_i = ν

- if ν < 1/α_i: λ_i = 0 and x_i = 1/ν − α_i
- if ν ≥ 1/α_i: λ_i = ν − 1/α_i and x_i = 0
- determine ν from 1^T x = Σ_{i=1}^n max{0, 1/ν − α_i} = 1

interpretation:

- n patches; the level of patch i is at height α_i
- flood the area with a unit amount of water; the resulting water level is 1/ν⋆
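The condition Σ_i max{0, 1/ν − α_i} = 1 is monotone in the water level t = 1/ν, so it can be solved by bisection. A minimal sketch, assuming NumPy; the patch heights in `alpha` are illustrative data:

```python
import numpy as np

alpha = np.array([0.3, 0.8, 1.2, 2.0])   # patch heights (illustrative)

def waterfill(alpha, total=1.0, iters=100):
    """Bisect on the water level t = 1/nu so that sum_i max(0, t - alpha_i) = total."""
    lo, hi = alpha.min(), alpha.max() + total   # brackets: water used is 0 resp. >= total
    for _ in range(iters):
        t = (lo + hi) / 2.0
        if np.maximum(0.0, t - alpha).sum() > total:
            hi = t                               # too much water used: lower the level
        else:
            lo = t                               # budget not exhausted: raise the level
    return np.maximum(0.0, t - alpha)

x = waterfill(alpha)
assert np.isclose(x.sum(), 1.0)   # the unit water budget is used exactly
assert np.all(x >= -1e-12)        # allocation is nonnegative
```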

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

    minimize f_0(x)                        maximize g(λ, ν)
    subject to f_i(x) ≤ 0, i = 1,...,m     subject to λ ⪰ 0
               h_i(x) = 0, i = 1,...,p

perturbed problem and its dual

    minimize f_0(x)                        maximize g(λ, ν) − u^T λ − v^T ν
    subject to f_i(x) ≤ u_i, i = 1,...,m   subject to λ ⪰ 0
               h_i(x) = v_i, i = 1,...,p

- x is the primal variable; u, v are parameters
- p⋆(u, v) is the optimal value as a function of u, v
- we are interested in information about p⋆(u, v) that we can obtain from the solution of the unperturbed problem and its dual

Perturbation and sensitivity analysis

global sensitivity result: assume strong duality holds for the unperturbed problem, and that λ⋆, ν⋆ are dual optimal for the unperturbed problem. applying weak duality to the perturbed problem:

    p⋆(u, v) ≥ g(λ⋆, ν⋆) − u^T λ⋆ − v^T ν⋆ = p⋆(0, 0) − u^T λ⋆ − v^T ν⋆

local sensitivity: if (in addition) p⋆(u, v) is differentiable at (0, 0), then

    λ⋆_i = −∂p⋆(0, 0)/∂u_i,    ν⋆_i = −∂p⋆(0, 0)/∂v_i
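Both results can be checked on a problem whose perturbed optimal value is known in closed form (a toy instance, not from the slides, assuming NumPy): minimize x² subject to 1 − x ≤ u, so p⋆(u) = max(0, 1 − u)² and the unperturbed dual optimal multiplier is λ⋆ = 2.

```python
import numpy as np

def p_star(u):
    # minimize x^2 s.t. 1 - x <= u  =>  x = max(0, 1 - u), p*(u) = max(0, 1 - u)^2
    return max(0.0, 1.0 - u) ** 2

lam_star = 2.0   # dual optimal multiplier of the unperturbed (u = 0) problem

# Local sensitivity: lam* = -dp*/du at u = 0, via a central difference
h = 1e-6
deriv = (p_star(h) - p_star(-h)) / (2 * h)
assert np.isclose(deriv, -lam_star, atol=1e-4)

# Global sensitivity bound: p*(u) >= p*(0) - u * lam* for all u
for u in np.linspace(-2.0, 2.0, 81):
    assert p_star(u) >= p_star(0.0) - u * lam_star - 1e-9
```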

Duality and problem reformulations

- equivalent formulations of a problem can lead to very different duals
- reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations

- introduce new variables and equality constraints
- make explicit constraints implicit, or vice versa
- transform objective or constraint functions, e.g., replace f_0(x) by φ(f_0(x)) with φ convex and increasing

Introducing new variables and equality constraints

    minimize f_0(Ax + b)

- the dual function is constant: g = inf_x L(x) = inf_x f_0(Ax + b) = p⋆
- we have strong duality, but the dual is quite useless

reformulated problem and its dual

    minimize f_0(y)              maximize b^T ν − f_0^*(ν)
    subject to Ax + b − y = 0    subject to A^T ν = 0

the dual function follows from

    g(ν) = inf_{x,y} ( f_0(y) − ν^T y + ν^T A x + b^T ν )
         = { −f_0^*(ν) + b^T ν  if A^T ν = 0,  −∞ otherwise }

norm approximation problem: minimize ‖Ax − b‖

    minimize ‖y‖
    subject to y = Ax − b

can look up the conjugate of ‖·‖, or derive the dual directly:

    g(ν) = inf_{x,y} ( ‖y‖ + ν^T y − ν^T A x + b^T ν )
         = { b^T ν + inf_y ( ‖y‖ + ν^T y )  if A^T ν = 0,  −∞ otherwise }
         = { b^T ν  if A^T ν = 0, ‖ν‖_* ≤ 1,  −∞ otherwise }

(see the equality constrained norm minimization example above)

dual of the norm approximation problem

    maximize b^T ν
    subject to A^T ν = 0, ‖ν‖_* ≤ 1
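For the Euclidean norm (which is self-dual, so ‖·‖_* = ‖·‖_2), the least-squares residual r = b − Ax⋆ produces a dual optimal point ν = r/‖r‖_2: the normal equations give A^Tν = 0, and b^Tν = ‖r‖_2 = p⋆. A numerical sketch on random data, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||Ax - b||_2
r = b - A @ x_star                               # residual: p* = ||r||_2
nu = r / np.linalg.norm(r)                       # candidate dual point

assert np.allclose(A.T @ nu, 0.0, atol=1e-8)     # dual feasible: A^T nu = 0
assert np.linalg.norm(nu) <= 1.0 + 1e-12         # dual feasible: ||nu||_2 <= 1
assert np.isclose(b @ nu, np.linalg.norm(r))     # dual value attains p*
```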

Implicit constraints

LP with box constraints: primal and dual problem

    minimize c^T x            maximize −b^T ν − 1^T λ_1 − 1^T λ_2
    subject to Ax = b         subject to c + A^T ν + λ_1 − λ_2 = 0
               −1 ⪯ x ⪯ 1                λ_1 ⪰ 0, λ_2 ⪰ 0

reformulation with the box constraints made implicit

    minimize f_0(x) = { c^T x  if −1 ⪯ x ⪯ 1,  ∞ otherwise }
    subject to Ax = b

dual function

    g(ν) = inf_{−1⪯x⪯1} ( c^T x + ν^T (Ax − b) ) = −b^T ν − ‖A^T ν + c‖_1

dual problem: maximize −b^T ν − ‖A^T ν + c‖_1

Problems with generalized inequalities

⪯_{K_i} is a generalized inequality on R^{k_i}; the definitions are parallel to the scalar case:

    minimize f_0(x)
    subject to f_i(x) ⪯_{K_i} 0, i = 1,...,m
               h_i(x) = 0, i = 1,...,p

- the Lagrange multiplier for f_i(x) ⪯_{K_i} 0 is a vector λ_i ∈ R^{k_i}
- the Lagrangian L : R^n × R^{k_1} × ··· × R^{k_m} × R^p → R is defined as

    L(x, λ_1, ..., λ_m, ν) = f_0(x) + Σ_{i=1}^m λ_i^T f_i(x) + Σ_{i=1}^p ν_i h_i(x)

- the dual function g : R^{k_1} × ··· × R^{k_m} × R^p → R is defined as

    g(λ_1, ..., λ_m, ν) = inf_{x∈D} L(x, λ_1, ..., λ_m, ν)

lower bound property: if λ_i ⪰_{K_i^*} 0, then g(λ_1, ..., λ_m, ν) ≤ p⋆

proof: if x̃ is feasible and λ_i ⪰_{K_i^*} 0, then

    f_0(x̃) ≥ f_0(x̃) + Σ_{i=1}^m λ_i^T f_i(x̃) + Σ_{i=1}^p ν_i h_i(x̃)
           ≥ inf_{x∈D} L(x, λ_1, ..., λ_m, ν) = g(λ_1, ..., λ_m, ν)

minimizing over all feasible x̃ gives p⋆ ≥ g(λ_1, ..., λ_m, ν)

dual problem

    maximize g(λ_1, ..., λ_m, ν)
    subject to λ_i ⪰_{K_i^*} 0, i = 1,...,m

- weak duality: p⋆ ≥ d⋆ always
- strong duality: p⋆ = d⋆ for a convex problem with a constraint qualification (for example, Slater's: the primal problem is strictly feasible)

Semidefinite program

primal SDP (F_i, G ∈ S^k)

    minimize c^T x
    subject to x_1 F_1 + ··· + x_n F_n ⪯ G

- the Lagrange multiplier is a matrix Z ∈ S^k
- Lagrangian: L(x, Z) = c^T x + Tr(Z(x_1 F_1 + ··· + x_n F_n − G))
- dual function

    g(Z) = inf_x L(x, Z) = { −Tr(GZ)  if Tr(F_i Z) + c_i = 0, i = 1,...,n,  −∞ otherwise }

dual SDP

    maximize −Tr(GZ)
    subject to Z ⪰ 0, Tr(F_i Z) + c_i = 0, i = 1,...,n

p⋆ = d⋆ if the primal SDP is strictly feasible (∃ x with x_1 F_1 + ··· + x_n F_n ≺ G)

Duality: SOCP example

consider the following second-order cone program (SOCP):

    minimize f^T x
    subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i, i = 1,...,m,

with variable x ∈ R^n. we will show that the dual can be expressed as

    maximize −Σ_{i=1}^m ( b_i^T u_i + d_i v_i )
    subject to Σ_{i=1}^m ( A_i^T u_i + c_i v_i ) = f
               ‖u_i‖_2 ≤ v_i, i = 1,...,m,

with variables u_i ∈ R^{n_i}, v_i ∈ R, i = 1,...,m, and problem data f ∈ R^n, A_i ∈ R^{n_i×n}, b_i ∈ R^{n_i}, c_i ∈ R^n and d_i ∈ R.

Duality: SOCP

we can derive the dual in the following two ways:

1. introduce new variables y_i ∈ R^{n_i} and t_i ∈ R with equalities y_i = A_i x + b_i, t_i = c_i^T x + d_i, and derive the Lagrange dual
2. start from the conic formulation of the SOCP and use the conic dual

use the fact that the second-order cone is self-dual:

    ‖x‖_2 ≤ t  ⟺  tv + x^T y ≥ 0 for all (v, y) such that ‖y‖_2 ≤ v

the condition x^T y ≥ −tv is a simple Cauchy-Schwarz inequality

Duality: SOCP

we introduce the new variables and write the problem as

    minimize f^T x
    subject to ‖y_i‖_2 ≤ t_i, i = 1,...,m
               y_i = A_i x + b_i, t_i = c_i^T x + d_i, i = 1,...,m

the Lagrangian is

    L(x, y, t, λ, ν, µ)
    = f^T x + Σ_{i=1}^m λ_i ( ‖y_i‖_2 − t_i ) + Σ_{i=1}^m ν_i^T ( y_i − A_i x − b_i ) + Σ_{i=1}^m µ_i ( t_i − c_i^T x − d_i )
    = ( f − Σ_{i=1}^m A_i^T ν_i − Σ_{i=1}^m µ_i c_i )^T x + Σ_{i=1}^m ( λ_i ‖y_i‖_2 + ν_i^T y_i )
      + Σ_{i=1}^m ( −λ_i + µ_i ) t_i − Σ_{i=1}^m ( b_i^T ν_i + d_i µ_i )

Duality: SOCP

the minimum over x is bounded below if and only if

    Σ_{i=1}^m ( A_i^T ν_i + µ_i c_i ) = f

to minimize over y_i, we note that

    inf_{y_i} ( λ_i ‖y_i‖_2 + ν_i^T y_i ) = { 0  if ‖ν_i‖_2 ≤ λ_i,  −∞ otherwise }

the minimum over t_i is bounded below if and only if λ_i = µ_i

Duality: SOCP

the Lagrange dual function is

    g(λ, ν, µ) = { −Σ_{i=1}^m ( b_i^T ν_i + d_i µ_i )  if Σ_{i=1}^m ( A_i^T ν_i + µ_i c_i ) = f, ‖ν_i‖_2 ≤ λ_i, µ = λ
                 { −∞  otherwise

which leads to the dual problem

    maximize −Σ_{i=1}^m ( b_i^T ν_i + d_i λ_i )
    subject to Σ_{i=1}^m ( A_i^T ν_i + λ_i c_i ) = f
               ‖ν_i‖_2 ≤ λ_i, i = 1,...,m,

which is again an SOCP

Duality: SOCP

we can also express the SOCP as a conic form problem

    minimize f^T x
    subject to ( c_i^T x + d_i, A_i x + b_i ) ⪰_{K_i} 0, i = 1,...,m

the Lagrangian is given by

    L(x, u_i, v_i) = f^T x − Σ_i ( A_i x + b_i )^T u_i − Σ_i ( c_i^T x + d_i ) v_i
                   = ( f − Σ_i ( A_i^T u_i + c_i v_i ) )^T x − Σ_i ( b_i^T u_i + d_i v_i )

for (v_i, u_i) ⪰_{K_i^*} 0 (which, by self-duality, is ‖u_i‖_2 ≤ v_i)

Duality: SOCP

with L(x, u_i, v_i) as above, the dual function is given by

    g(u, v) = { −Σ_{i=1}^m ( b_i^T u_i + d_i v_i )  if Σ_{i=1}^m ( A_i^T u_i + c_i v_i ) = f,  −∞ otherwise }

the conic dual is then

    maximize −Σ_{i=1}^m ( b_i^T u_i + d_i v_i )
    subject to Σ_{i=1}^m ( A_i^T u_i + v_i c_i ) = f
               (v_i, u_i) ⪰_{K_i^*} 0, i = 1,...,m

Proof

convex problem & constraint qualification ⟹ strong duality

Slater's constraint qualification

convex problem

    minimize f_0(x)
    subject to f_i(x) ≤ 0, i = 1,...,m
               Ax = b

the problem satisfies Slater's condition if it is strictly feasible, i.e.,

    ∃ x ∈ int D : f_i(x) < 0, i = 1,...,m, Ax = b

- also guarantees that the dual optimum is attained (if p⋆ > −∞)
- there exist many other types of constraint qualifications

KKT conditions for convex problems

if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

- from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)
- from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f_0(x̃) = g(λ̃, ν̃), with (x̃, λ̃, ν̃) feasible

if Slater's condition is satisfied, x is optimal if and only if there exist λ, ν that satisfy the KKT conditions

- Slater implies strong duality (more on this now), and the dual optimum is attained
- generalizes the optimality condition ∇f_0(x) = 0 for unconstrained problems

summary: for a convex problem satisfying a constraint qualification, the KKT conditions are necessary and sufficient conditions for optimality.

Proof

to simplify the analysis, we make two additional technical assumptions:

- the domain D has nonempty interior (hence, relint D = int D)
- A has full row rank, i.e. Rank A = p

Proof

we define the set A as

    A = {(u, v, t) | ∃ x ∈ D : f_i(x) ≤ u_i, i = 1,...,m, h_i(x) = v_i, i = 1,...,p, f_0(x) ≤ t},

which is the set of values taken by the constraint and objective functions. if the problem is convex, A is defined by a list of convex constraints, hence is convex.

we define a second convex set B as

    B = {(0, 0, s) ∈ R^m × R^p × R | s < p⋆}

the sets A and B do not intersect (otherwise p⋆ could not be the optimal value of the problem).

first step: the hyperplane separating A and B defines a supporting hyperplane to A at (0, p⋆).

Geometric proof

[figure: the sets A and B in the (u, t)-plane, with a point (ũ, t̃) ∈ A to the left of the t-axis]

illustration of the strong duality proof, for a convex problem that satisfies Slater's constraint qualification. the two sets A and B are convex and do not intersect, so they can be separated by a hyperplane. Slater's constraint qualification guarantees that any separating hyperplane must be non-vertical.

Proof

by the separating hyperplane theorem there exist (λ̃, ν̃, µ) ≠ 0 and α such that

    (u, v, t) ∈ A ⟹ λ̃^T u + ν̃^T v + µt ≥ α,    (1)

and

    (u, v, t) ∈ B ⟹ λ̃^T u + ν̃^T v + µt ≤ α.    (2)

- from (1) we conclude that λ̃ ⪰ 0 and µ ≥ 0 (otherwise λ̃^T u + µt is unbounded below over A, contradicting (1))
- condition (2) simply means that µt ≤ α for all t < p⋆, and hence µp⋆ ≤ α

together with (1) we conclude that, for any x ∈ D,

    µp⋆ ≤ α ≤ µ f_0(x) + Σ_{i=1}^m λ̃_i f_i(x) + ν̃^T (Ax − b)    (3)

Proof

let us assume that µ > 0 (the separating hyperplane is non-vertical)

- we can divide inequality (3) by µ to get L(x, λ̃/µ, ν̃/µ) ≥ p⋆ for all x ∈ D
- minimizing this inequality over x produces p⋆ ≤ g(λ, ν), where λ = λ̃/µ, ν = ν̃/µ
- by weak duality we have g(λ, ν) ≤ p⋆, so in fact g(λ, ν) = p⋆

this shows that strong duality holds, and that the dual optimum is attained, whenever µ > 0. the normal vector has the form (λ, 1) and produces the Lagrange multipliers.

Proof

second step: Slater's constraint qualification is used to establish that the hyperplane must be non-vertical, i.e. µ > 0.

by contradiction, assume that µ = 0. from (3), we conclude that for all x ∈ D,

    Σ_{i=1}^m λ̃_i f_i(x) + ν̃^T (Ax − b) ≥ 0.    (4)

applying this to the point x̃ that satisfies the Slater condition, we have

    Σ_{i=1}^m λ̃_i f_i(x̃) ≥ 0.

since f_i(x̃) < 0 and λ̃_i ≥ 0, we conclude that λ̃ = 0.

Proof

this is where we use the two technical assumptions. with λ̃ = 0, (4) implies that for all x ∈ D,

    ν̃^T (Ax − b) ≥ 0.

but x̃ satisfies ν̃^T (Ax̃ − b) = 0, and since x̃ ∈ int D, there are points in D with ν̃^T (Ax − b) < 0 unless A^T ν̃ = 0. since (λ̃, ν̃, µ) ≠ 0 with λ̃ = 0 and µ = 0, we must have ν̃ ≠ 0, and A^T ν̃ = 0 then contradicts our assumption that Rank A = p. this means that we cannot have µ = 0, which ends the proof.