Lecture: Duality of LP, SOCP and SDP

1/33 Lecture: Duality of LP, SOCP and SDP

Zaiwen Wen
Beijing International Center For Mathematical Research, Peking University
http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html
wenzw@pku.edu.cn

Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes

2/33 Lagrangian

standard form problem (not necessarily convex)

  min  f_0(x)
  s.t. f_i(x) ≤ 0,  i = 1, ..., m
       h_i(x) = 0,  i = 1, ..., p

variable x ∈ R^n, domain D, optimal value p*

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

  L(x, λ, ν) = f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x)

- weighted sum of objective and constraint functions
- λ_i is the Lagrange multiplier associated with f_i(x) ≤ 0
- ν_i is the Lagrange multiplier associated with h_i(x) = 0

3/33 Lagrange dual function

Lagrange dual function: g : R^m × R^p → R,

  g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
          = inf_{x ∈ D} ( f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x) )

g is concave; it can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

Proof: if x̃ is feasible and λ ⪰ 0, then

  f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν).

Minimizing over all feasible x̃ gives p* ≥ g(λ, ν).
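The lower bound property is easy to check numerically. A minimal sketch on a toy problem not taken from the slides — min x² s.t. 1 − x ≤ 0, where p* = 1 at x = 1 and g(λ) = λ − λ²/4 in closed form — using a crude grid search for the infimum:

```python
# Numerical check of the lower-bound property g(λ) ≤ p* on a toy problem
# (illustrative, not from the slides): min x², s.t. 1 − x ≤ 0.
# Here p* = 1 (at x = 1) and g(λ) = inf_x (x² + λ(1 − x)) = λ − λ²/4.

def g(lam, lo=-10.0, hi=10.0, steps=200001):
    # crude grid minimization of the Lagrangian over x
    best = float("inf")
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        best = min(best, x * x + lam * (1 - x))
    return best

p_star = 1.0
for lam in [0.0, 0.5, 1.0, 2.0, 3.0, 5.0]:
    assert g(lam) <= p_star + 1e-9                   # weak duality
    assert abs(g(lam) - (lam - lam ** 2 / 4)) < 1e-3  # matches closed form

# the best lower bound is attained at λ = 2, where g(2) = 1 = p*
assert abs(g(2.0) - p_star) < 1e-3
```

Every dual-feasible λ certifies a lower bound on p*; maximizing g over λ ⪰ 0 (here, λ = 2) recovers p* exactly, which anticipates the strong-duality discussion below.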

4/33 The dual problem

Lagrange dual problem

  max  g(λ, ν)
  s.t. λ ⪰ 0

- finds the best lower bound on p* obtainable from the Lagrange dual function
- a convex optimization problem; optimal value denoted d*
- λ, ν are dual feasible if λ ⪰ 0 and (λ, ν) ∈ dom g
- often simplified by making the implicit constraint (λ, ν) ∈ dom g explicit

5/33 Standard form LP

  (P) min  c^T x
      s.t. Ax = b,  x ≥ 0

  (D) max  b^T y
      s.t. A^T y + s = c,  s ≥ 0

Lagrangian is

  L(x, λ, ν) = c^T x + ν^T (Ax − b) − λ^T x
             = −b^T ν + (c + A^T ν − λ)^T x

L is affine in x, hence

  g(λ, ν) = inf_x L(x, λ, ν) = { −b^T ν,  A^T ν − λ + c = 0
                               { −∞,      otherwise

lower bound property: p* ≥ −b^T ν if A^T ν + c ≥ 0

Exercise: check the dual of (D).
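A tiny numeric illustration of the bound p* ≥ −b^T ν whenever A^T ν + c ≥ 0, on illustrative data (not from the slides): min x₁ + 2x₂ s.t. x₁ + x₂ = 2, x ≥ 0, whose optimum is p* = 2 at the vertex x = (2, 0).

```python
# Standard-form LP lower bound: p* >= -b^T ν whenever A^T ν + c >= 0.
# Illustrative data: min x1 + 2*x2, s.t. x1 + x2 = 2, x >= 0; p* = 2 at (2, 0).
c = [1.0, 2.0]
A = [[1.0, 1.0]]   # one equality constraint
b = [2.0]
p_star = 2.0       # optimal value, attained at x = (2, 0)

def dual_feasible(nu):
    # componentwise check of A^T ν + c >= 0
    return all(sum(A[i][j] * nu[i] for i in range(len(nu))) + c[j] >= 0
               for j in range(len(c)))

def dual_bound(nu):
    return -sum(b[i] * nu[i] for i in range(len(nu)))

for nu in [[-1.0], [-0.5], [0.0], [1.0]]:
    assert dual_feasible(nu)
    assert dual_bound(nu) <= p_star + 1e-12   # each ν certifies a lower bound

# the bound is tight at ν = -1: -b^T ν = 2 = p*
assert dual_bound([-1.0]) == p_star
```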

6/33 Inequality form LP

  min  c^T x
  s.t. Ax ≤ b

Lagrangian is

  L(x, λ) = c^T x + λ^T (Ax − b) = −b^T λ + (A^T λ + c)^T x

L is affine in x, hence

  g(λ) = inf_x L(x, λ) = { −b^T λ,  A^T λ + c = 0
                         { −∞,      otherwise

Dual problem

  max  −b^T λ
  s.t. A^T λ + c = 0,  λ ≥ 0

7/33 Equality constrained norm minimization

  (P) min  ‖x‖
      s.t. Ax = b

  (D) max  b^T ν
      s.t. ‖A^T ν‖_* ≤ 1

Dual function

  g(ν) = inf_x ( ‖x‖ − ν^T Ax + b^T ν ) = { b^T ν,  ‖A^T ν‖_* ≤ 1
                                          { −∞,     otherwise

where ‖v‖_* = sup_{‖u‖ ≤ 1} u^T v is the dual norm of ‖·‖

Proof: follows from inf_x ( ‖x‖ − y^T x ) = 0 if ‖y‖_* ≤ 1, −∞ otherwise

- if ‖y‖_* ≤ 1, then ‖x‖ − y^T x ≥ 0 for all x, with equality if x = 0
- if ‖y‖_* > 1, choose x = tu where ‖u‖ ≤ 1 and u^T y = ‖y‖_* > 1: then
    ‖x‖ − y^T x = t( ‖u‖ − ‖y‖_* ) → −∞  as t → ∞
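The two cases of the proof can be observed numerically for the ℓ₁ norm, whose dual norm is ℓ∞. A grid-search sketch in two dimensions (illustrative, not from the slides):

```python
# Check inf_x (‖x‖₁ − yᵀx): it equals 0 when ‖y‖_∞ ≤ 1, and is unbounded
# below when ‖y‖_∞ > 1 (the minimum keeps dropping as the search box grows).
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def l1(x):
    return sum(abs(a) for a in x)

def grid_inf(y, radius, n=81):
    # minimize ‖x‖₁ − yᵀx over an n×n grid in the box [-radius, radius]²
    pts = [radius * (2 * k / (n - 1) - 1) for k in range(n)]
    return min(l1([x1, x2]) - inner(y, [x1, x2]) for x1 in pts for x2 in pts)

# ‖y‖_∞ ≤ 1: the infimum is 0, attained at x = 0
assert grid_inf([0.5, -1.0], radius=10.0) == 0.0
# ‖y‖_∞ > 1: the objective decreases without bound as the box grows
assert grid_inf([1.5, 0.0], radius=100.0) < grid_inf([1.5, 0.0], radius=10.0) < 0.0
```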

8/33 LP with box constraints

  (P) min  c^T x
      s.t. Ax = b,  −1 ≤ x ≤ 1

  (D) max  −b^T ν − 1^T λ_1 − 1^T λ_2
      s.t. c + A^T ν + λ_1 − λ_2 = 0
           λ_1 ≥ 0,  λ_2 ≥ 0

reformulation with box constraints made implicit

  min  f_0(x) = { c^T x,  −1 ≤ x ≤ 1
               { ∞,       otherwise
  s.t. Ax = b

Dual function

  g(ν) = inf_{−1 ≤ x ≤ 1} ( c^T x + ν^T (Ax − b) ) = −b^T ν − ‖A^T ν + c‖_1

Dual problem: max_ν  −b^T ν − ‖A^T ν + c‖_1

9/33 Lagrange dual and conjugate function

  min  f_0(x)
  s.t. Ax ≤ b,  Cx = d

dual function

  g(λ, ν) = inf_{x ∈ dom f_0} ( f_0(x) + (A^T λ + C^T ν)^T x − b^T λ − d^T ν )
          = −f_0^*( −A^T λ − C^T ν ) − b^T λ − d^T ν

- recall the definition of the conjugate f^*(y) = sup_{x ∈ dom f} ( y^T x − f(x) )
- simplifies the derivation of the dual if the conjugate of f_0 is known

10/33 Conjugate function

the conjugate of a function f is

  f^*(y) = sup_{x ∈ dom f} ( y^T x − f(x) )

f^* is closed and convex even if f is not

Fenchel's inequality

  f(x) + f^*(y) ≥ x^T y   for all x, y

(extends the inequality x^T x/2 + y^T y/2 ≥ x^T y to non-quadratic convex f)

11/33 Quadratic function

  f(x) = (1/2) x^T A x + b^T x + c

strictly convex case (A ≻ 0):

  f^*(y) = (1/2) (y − b)^T A^{−1} (y − b) − c

general convex case (A ⪰ 0, with A^† the pseudo-inverse of A):

  f^*(y) = (1/2) (y − b)^T A^† (y − b) − c,    dom f^* = range(A) + b
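The strictly convex formula can be verified by brute force in one dimension. A sketch with illustrative coefficients (not from the slides), comparing a grid-search supremum against the closed form f*(y) = (y − b)²/(2a) − c:

```python
# Verify the conjugate of a strictly convex 1-D quadratic:
# f(x) = (a/2) x² + b x + c with a > 0 has f*(y) = (y − b)² / (2a) − c.
a, b, c = 2.0, -1.0, 3.0

def f(x):
    return 0.5 * a * x * x + b * x + c

def conj_numeric(y, lo=-100.0, hi=100.0, steps=400001):
    # crude grid maximization of y*x − f(x)
    best = -float("inf")
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        best = max(best, y * x - f(x))
    return best

for y in [-2.0, 0.0, 1.0, 3.5]:
    closed_form = (y - b) ** 2 / (2 * a) - c
    assert abs(conj_numeric(y) - closed_form) < 1e-4
```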

12/33 Negative entropy and negative logarithm

negative entropy

  f(x) = ∑_{i=1}^n x_i log x_i          f^*(y) = ∑_{i=1}^n e^{y_i − 1}

negative logarithm

  f(x) = −∑_{i=1}^n log x_i             f^*(y) = −∑_{i=1}^n log(−y_i) − n

matrix logarithm

  f(X) = −log det X  (dom f = S^n_{++})      f^*(Y) = −log det(−Y) − n
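A quick sanity check of the negative-entropy conjugate for n = 1: f(x) = x log x on x > 0 has f*(y) = sup_{x>0} (yx − x log x) = e^{y−1} (the supremum is at x = e^{y−1}). A grid-search sketch:

```python
import math

# Verify the 1-D negative-entropy conjugate: sup_{x>0} (y*x − x*log(x)) = e^(y−1).
def conj_numeric(y, hi=50.0, steps=500001):
    best = -float("inf")
    for k in range(1, steps):        # start at k = 1 to keep x > 0
        x = hi * k / (steps - 1)
        best = max(best, y * x - x * math.log(x))
    return best

for y in [-1.0, 0.0, 1.0, 2.0]:
    assert abs(conj_numeric(y) - math.exp(y - 1)) < 1e-3
```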

13/33 Indicator function and norm

indicator of a convex set C: the conjugate is the support function of C

  f(x) = { 0,   x ∈ C         f^*(y) = sup_{x ∈ C} y^T x
         { +∞,  x ∉ C

norm: the conjugate is the indicator of the unit dual norm ball

  f(x) = ‖x‖        f^*(y) = { 0,   ‖y‖_* ≤ 1
                             { +∞,  ‖y‖_* > 1

(see next page)

14/33 proof: recall the definition of the dual norm:

  ‖y‖_* = sup_{‖x‖ ≤ 1} x^T y

to evaluate f^*(y) = sup_x ( y^T x − ‖x‖ ) we distinguish two cases

- if ‖y‖_* ≤ 1, then (by definition of the dual norm) y^T x ≤ ‖x‖ for all x,
  with equality if x = 0; therefore sup_x ( y^T x − ‖x‖ ) = 0
- if ‖y‖_* > 1, there exists an x with ‖x‖ ≤ 1 and x^T y > 1; then
    f^*(y) ≥ y^T (tx) − ‖tx‖ = t( y^T x − ‖x‖ )
  and the right-hand side goes to infinity as t → ∞

15/33 The second conjugate

  f^{**}(x) = sup_{y ∈ dom f^*} ( x^T y − f^*(y) )

- f^{**} is closed and convex
- from Fenchel's inequality (x^T y − f^*(y) ≤ f(x) for all y and x):
    f^{**}(x) ≤ f(x)  for all x
  equivalently, epi f ⊆ epi f^{**} (for any f)
- if f is closed and convex, then
    f^{**}(x) = f(x)  for all x
  equivalently, epi f = epi f^{**} (if f is closed and convex); proof on next page

16/33 Conjugates and subgradients

if f is closed and convex, then

  y ∈ ∂f(x)  ⟺  x ∈ ∂f^*(y)  ⟺  x^T y = f(x) + f^*(y)

proof: if y ∈ ∂f(x), then f^*(y) = sup_u ( y^T u − f(u) ) = y^T x − f(x); hence

  f^*(v) = sup_u ( v^T u − f(u) )
         ≥ v^T x − f(x)
         = x^T (v − y) − f(x) + y^T x
         = f^*(y) + x^T (v − y)                                    (1)

for all v; therefore, x is a subgradient of f^* at y, i.e., x ∈ ∂f^*(y)

the reverse implication x ∈ ∂f^*(y) ⟹ y ∈ ∂f(x) follows from f^{**} = f
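For a smooth f the equivalence is easy to spot-check. A sketch with f(x) = x² (illustrative, not from the slides), where ∂f(x) = {2x} and f*(y) = y²/4: equality in Fenchel's inequality holds exactly at y = 2x and is strict elsewhere.

```python
# Check y ∈ ∂f(x) ⟺ xᵀy = f(x) + f*(y) for the 1-D case f(x) = x²,
# where the subdifferential is the singleton {2x} and f*(y) = y²/4.
f = lambda x: x * x
f_star = lambda y: y * y / 4.0

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    y = 2 * x                                # the unique subgradient at x
    assert x * y == f(x) + f_star(y)         # equality in Fenchel's inequality
    for y_bad in [y + 1.0, y - 0.7]:         # any other y is strictly loose
        assert x * y_bad < f(x) + f_star(y_bad)
```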

17/33 Some calculus rules

separable sum

  f(x_1, x_2) = g(x_1) + h(x_2)          f^*(y_1, y_2) = g^*(y_1) + h^*(y_2)

scalar multiplication (for α > 0)

  f(x) = α g(x)                          f^*(y) = α g^*(y/α)

addition to affine function

  f(x) = g(x) + a^T x + b                f^*(y) = g^*(y − a) − b

infimal convolution

  f(x) = inf_{u+v=x} ( g(u) + h(v) )     f^*(y) = g^*(y) + h^*(y)
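The addition-to-affine rule can be spot-checked numerically in one dimension. A sketch with g(x) = x² (so g*(y) = y²/4) and illustrative coefficients a, b not taken from the slides:

```python
# Spot-check the rule: f(x) = g(x) + a*x + b  ⟹  f*(y) = g*(y − a) − b,
# with g(x) = x², whose conjugate is g*(y) = y²/4.
a, b = 3.0, -2.0

def conj_numeric(h, y, lo=-100.0, hi=100.0, steps=400001):
    # crude grid maximization of y*x − h(x)
    best = -float("inf")
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        best = max(best, y * x - h(x))
    return best

g = lambda x: x * x
f = lambda x: g(x) + a * x + b
g_star = lambda y: y * y / 4.0   # closed-form conjugate of x²

for y in [-1.0, 0.0, 2.0, 5.0]:
    assert abs(conj_numeric(f, y) - (g_star(y - a) - b)) < 1e-4
```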

18/33 Duality and problem reformulations

- equivalent formulations of a problem can lead to very different duals
- reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

Common reformulations

- introduce new variables and equality constraints
- make explicit constraints implicit, or vice versa
- transform objective or constraint functions, e.g., replace f_0(x) by φ(f_0(x)) with φ convex and increasing

19/33 Introducing new variables and equality constraints

  min f_0(Ax + b)

- dual function is constant: g = inf_x L(x) = inf_x f_0(Ax + b) = p*
- we have strong duality, but the dual is quite useless

reformulated problem and its dual

  (P) min  f_0(y)
      s.t. Ax + b = y

  (D) max  b^T ν − f_0^*(ν)
      s.t. A^T ν = 0

the dual function follows from

  g(ν) = inf_{x,y} ( f_0(y) − ν^T y + ν^T Ax + b^T ν )
       = { −f_0^*(ν) + b^T ν,  A^T ν = 0
         { −∞,                 otherwise

20/33 Norm approximation problem

  min ‖Ax − b‖      ⟺      min  ‖y‖
                           s.t. Ax − b = y

can look up the conjugate of ‖·‖, or derive the dual directly

  g(ν) = inf_{x,y} ( ‖y‖ + ν^T y − ν^T Ax + b^T ν )
       = { b^T ν + inf_y ( ‖y‖ + ν^T y ),  A^T ν = 0
         { −∞,                             otherwise
       = { b^T ν,  A^T ν = 0, ‖ν‖_* ≤ 1
         { −∞,     otherwise

Dual problem

  max  b^T ν
  s.t. A^T ν = 0,  ‖ν‖_* ≤ 1

21/33 Weak and strong duality

weak duality: d* ≤ p*

- always holds (for convex and nonconvex problems)
- can be used to find nontrivial lower bounds for difficult problems, for example, by solving the maxcut SDP relaxation

strong duality: d* = p*

- does not hold in general
- (usually) holds for convex problems
- conditions that guarantee strong duality in convex problems are called constraint qualifications

22/33 Slater's constraint qualification

strong duality holds for a convex problem

  min  f_0(x)
  s.t. f_i(x) ≤ 0,  i = 1, ..., m
       Ax = b

if it is strictly feasible, i.e.,

  ∃ x ∈ int D :  f_i(x) < 0, i = 1, ..., m,  Ax = b

- also guarantees that the dual optimum is attained (if p* > −∞)
- can be sharpened: e.g., int D can be replaced with relint D (interior relative to the affine hull); linear inequalities do not need to hold with strict inequality, ...
- there exist many other types of constraint qualifications

23/33 Complementary slackness

assume strong duality holds, x* is primal optimal, and (λ*, ν*) is dual optimal:

  f_0(x*) = g(λ*, ν*) = inf_{x ∈ D} ( f_0(x) + ∑_{i=1}^m λ_i* f_i(x) + ∑_{i=1}^p ν_i* h_i(x) )
          ≤ f_0(x*) + ∑_{i=1}^m λ_i* f_i(x*) + ∑_{i=1}^p ν_i* h_i(x*)
          ≤ f_0(x*)

hence, the two inequalities hold with equality

- x* minimizes L(x, λ*, ν*)
- λ_i* f_i(x*) = 0 for i = 1, ..., m (known as complementary slackness):
    λ_i* > 0 ⟹ f_i(x*) = 0,      f_i(x*) < 0 ⟹ λ_i* = 0

24/33 Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable f_i, h_i):

1. primal constraints: f_i(x) ≤ 0, i = 1, ..., m, and h_i(x) = 0, i = 1, ..., p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λ_i f_i(x) = 0, i = 1, ..., m
4. gradient of the Lagrangian with respect to x vanishes:

     ∇f_0(x) + ∑_{i=1}^m λ_i ∇f_i(x) + ∑_{i=1}^p ν_i ∇h_i(x) = 0

if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
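The four conditions can be checked mechanically at a candidate point. A sketch on a toy problem not taken from the slides — min x² s.t. 1 − x ≤ 0, whose solution is x* = 1 with multiplier λ* = 2:

```python
# Check the four KKT conditions at (x*, λ*) = (1, 2) for
# min x², s.t. 1 − x ≤ 0 (illustrative toy problem).
x_star, lam_star = 1.0, 2.0
f1 = 1.0 - x_star                  # inequality constraint value at x*

assert f1 <= 0.0                   # 1. primal feasibility
assert lam_star >= 0.0             # 2. dual feasibility
assert lam_star * f1 == 0.0        # 3. complementary slackness
# 4. stationarity: d/dx [x² + λ(1 − x)] = 2x − λ vanishes at (x*, λ*)
assert 2 * x_star - lam_star == 0.0
```

Since the problem is convex and the constraint is active with λ* > 0, these checks certify optimality (per the next slide).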

25/33 KKT conditions for convex problem

if x̃, λ̃, ν̃ satisfy the KKT conditions for a convex problem, then they are optimal:

- from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)
- from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f_0(x̃) = g(λ̃, ν̃)

if Slater's condition is satisfied:

  x is optimal if and only if there exist λ, ν that satisfy the KKT conditions

- recall that Slater's condition implies strong duality, and that the dual optimum is attained
- generalizes the optimality condition ∇f_0(x) = 0 for unconstrained problems

26/33 LP Duality

Strong duality: if an LP has an optimal solution, so does its dual, and their optimal objective values are equal.

possible primal/dual status combinations (finite / unbounded / infeasible):

- if p* = −∞ (primal unbounded), then d* ≤ p* = −∞, hence the dual is infeasible
- if d* = +∞ (dual unbounded), then +∞ = d* ≤ p*, hence the primal is infeasible
- both the primal and the dual can be infeasible, as in the example:

  (P) min  x_1 + 2 x_2         (D) max  p_1 + 3 p_2
      s.t. x_1 + x_2 = 1           s.t. p_1 + 2 p_2 = 1
           2 x_1 + 2 x_2 = 3            p_1 + 2 p_2 = 2
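The mutual infeasibility in the slide's example can be seen directly: the primal equalities are inconsistent (the second is twice the first, but 3 ≠ 2), and the dual equalities contradict each other. A brute-force sketch confirming that no grid point comes close to satisfying either system:

```python
# The slide's example: both primal and dual are infeasible.
# Primal: x1 + x2 = 1 and 2x1 + 2x2 = 3 are inconsistent.
# Dual:   p1 + 2p2 = 1 and p1 + 2p2 = 2 contradict each other.
def primal_residual(x1, x2):
    return abs(x1 + x2 - 1) + abs(2 * x1 + 2 * x2 - 3)

def dual_residual(p1, p2):
    return abs(p1 + 2 * p2 - 1) + abs(p1 + 2 * p2 - 2)

grid = [k / 10.0 for k in range(-50, 51)]   # coarse grid over [-5, 5]²
# the residuals are bounded away from zero everywhere
assert min(primal_residual(a, b) for a in grid for b in grid) >= 0.5
assert min(dual_residual(a, b) for a in grid for b in grid) >= 1.0
```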

27/33 Problems with generalized inequalities

  min  f_0(x)
  s.t. f_i(x) ⪯_{K_i} 0,  i = 1, ..., m
       h_i(x) = 0,        i = 1, ..., p

f_i(x) ⪯_{K_i} 0 means −f_i(x) ∈ K_i; ⟨·,·⟩_{K_i} denotes the inner product on the space containing K_i

Lagrangian:

  L(x, λ_1, ..., λ_m, ν) = f_0(x) + ∑_{i=1}^m ⟨λ_i, f_i(x)⟩_{K_i} + ∑_{i=1}^p ν_i h_i(x)

dual function

  g(λ_1, ..., λ_m, ν) = inf_{x ∈ D} L(x, λ_1, ..., λ_m, ν)

28/33

lower bound property: if λ_i ⪰_{K_i^*} 0 (i.e., λ_i is in the dual cone of K_i), then g(λ_1, ..., λ_m, ν) ≤ p*

proof: if x̃ is feasible and λ_i ⪰_{K_i^*} 0, then

  f_0(x̃) ≥ f_0(x̃) + ∑_{i=1}^m ⟨λ_i, f_i(x̃)⟩_{K_i} + ∑_{i=1}^p ν_i h_i(x̃)
         ≥ inf_{x ∈ D} L(x, λ_1, ..., λ_m, ν)
         = g(λ_1, ..., λ_m, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ_1, ..., λ_m, ν)

Dual problem

  max  g(λ_1, ..., λ_m, ν)
  s.t. λ_i ⪰_{K_i^*} 0,  i = 1, ..., m

- weak duality: p* ≥ d*
- strong duality: p* = d* for a convex problem with a constraint qualification (for example, Slater's: the primal problem is strictly feasible)

29/33 Semidefinite program

  min  c^T x
  s.t. x_1 F_1 + ... + x_n F_n ⪯ G

with F_i, G ∈ S^k; the multiplier is a matrix Z ∈ S^k

Lagrangian

  L(x, Z) = c^T x + ⟨Z, x_1 F_1 + ... + x_n F_n − G⟩

dual function

  g(Z) = inf_x L(x, Z) = { −⟨G, Z⟩,  ⟨F_i, Z⟩ + c_i = 0, i = 1, ..., n
                         { −∞,       otherwise

Dual SDP

  max  −⟨G, Z⟩
  s.t. ⟨F_i, Z⟩ + c_i = 0,  i = 1, ..., n
       Z ⪰ 0

p* = d* if the primal SDP is strictly feasible.

30/33 SDP Relaxation of Maxcut

  (P) min  x^T W x         dual:  max  −1^T ν
      s.t. x_i^2 = 1,             s.t. W + diag(ν) ⪰ 0
           i = 1, ..., n

SDP relaxation:

  min  Tr(WX)
  s.t. X_ii = 1,  i = 1, ..., n
       X ⪰ 0

- a nonconvex problem; the feasible set contains 2^n discrete points
- interpretation: partition {1, ..., n} into two sets; W_ij is the cost of assigning i, j to the same set; −W_ij is the cost of assigning them to different sets

dual function

  g(ν) = inf_x ( x^T W x + ∑_i ν_i (x_i^2 − 1) )
       = inf_x x^T (W + diag(ν)) x − 1^T ν
       = { −1^T ν,  W + diag(ν) ⪰ 0
         { −∞,      otherwise
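On a tiny instance the dual bound can be verified by hand. A sketch on an illustrative 2-node graph (not from the slides) with W = [[0,1],[1,0]]: enumerating x ∈ {−1, 1}² gives p* = −2, and the certificate ν = (1, 1) makes W + diag(ν) PSD with g(ν) = −2, so the bound is tight here.

```python
# Dual lower bound for the maxcut problem on a 2-node instance.
W = [[0.0, 1.0], [1.0, 0.0]]

# exact primal optimum by enumerating the 2^2 feasible sign vectors
p_star = min(sum(x[i] * W[i][j] * x[j] for i in range(2) for j in range(2))
             for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)])
assert p_star == -2.0

def psd2(M):
    # a symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0
    return M[0][0] >= 0 and M[1][1] >= 0 and \
        M[0][0] * M[1][1] - M[0][1] * M[1][0] >= 0

nu = (1.0, 1.0)
M = [[W[0][0] + nu[0], W[0][1]],
     [W[1][0], W[1][1] + nu[1]]]
assert psd2(M)                      # ν is dual feasible: W + diag(ν) ⪰ 0
assert -(nu[0] + nu[1]) == p_star   # g(ν) = −1ᵀν = −2 matches p* here
```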

31/33 SOCP/SDP Duality

SOCP:

  (P) min  c^T x
      s.t. Ax = b,  x ⪰_Q 0

  (D) max  b^T y
      s.t. A^T y + s = c,  s ⪰_Q 0

SDP:

  (P) min  ⟨C, X⟩
      s.t. ⟨A_i, X⟩ = b_i,  i = 1, ..., m
           X ⪰ 0

  (D) max  b^T y
      s.t. ∑_{i=1}^m y_i A_i + S = C,  S ⪰ 0

Strong duality

- if p* > −∞ and (P) is strictly feasible, then (D) is feasible and p* = d*
- if d* < +∞ and (D) is strictly feasible, then (P) is feasible and p* = d*
- if (P) and (D) both have strictly feasible solutions, then both have optimal solutions

32/33 Failure of SOCP Duality

  (P) inf  (1, −1, 0) x        (D) sup  y
      s.t. (0, 0, 1) x = 1         s.t. (0, 0, 1)^T y + z = (1, −1, 0)^T
           x ⪰_Q 0                      z ⪰_Q 0

- primal: min x_0 − x_1 s.t. x_0 ≥ √(x_1^2 + 1); along this boundary, x_0 − x_1 > 0 and x_0 − x_1 → 0 as x_1 → ∞. Hence p* = 0, but no feasible point attains it.
- dual: z = (1, −1, −y) ⪰_Q 0 requires 1 ≥ √(1 + y^2), hence y = 0 and d* = 0.

p* = d*, but the primal optimum is not attained.
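The non-attainment is easy to observe numerically: along the feasible boundary x₀ = √(x₁² + 1) the objective stays strictly positive while tending to 0.

```python
import math

# Objective x0 − x1 along the feasible boundary x0 = sqrt(x1² + 1):
# strictly positive, strictly decreasing, and arbitrarily close to 0.
vals = [math.sqrt(t * t + 1) - t for t in [1.0, 10.0, 100.0, 1000.0]]

assert all(v > 0 for v in vals)                                   # never reaches 0
assert all(vals[i] > vals[i + 1] for i in range(len(vals) - 1))   # decreasing
assert vals[-1] < 1e-3                                            # → 0 as x1 → ∞
```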

33/33 Failure of SDP Duality

Consider

  (P) min  ⟨C, X⟩              (D) max  2 y_2
      s.t. ⟨A_1, X⟩ = 0            s.t. y_1 A_1 + y_2 A_2 ⪯ C
           ⟨A_2, X⟩ = 2
           X ⪰ 0

with

        [ 0 0 0 ]          [ 1 0 0 ]          [ 0 1 0 ]
  C  =  [ 0 0 0 ],   A_1 = [ 0 0 0 ],   A_2 = [ 1 0 0 ]
        [ 0 0 1 ]          [ 0 0 0 ]          [ 0 0 2 ]

- primal: X_11 = 0 together with X ⪰ 0 forces X_12 = 0, so the second constraint gives 2 X_33 = 2; e.g.

    X = diag(0, 0, 1),   p* = 1

- dual: the (2,2) entry of C − y_1 A_1 − y_2 A_2 is 0, so ⪰ 0 forces y_2 = 0 (and y_1 ≤ 0); hence y = (0, 0) is optimal and d* = 0

Both problems have finite optimal values, but p* ≠ d*.
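The two sides of the gap can be verified numerically: the primal point X = diag(0, 0, 1) is feasible with objective 1, and y = (0, 0) is dual feasible (its slack is C itself, a PSD diagonal matrix) with objective 0. A sketch checking both certificates:

```python
# Feasibility certificates for the duality-gap example: p* = 1 > d* = 0.
C  = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
A1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
A2 = [[0, 1, 0], [1, 0, 0], [0, 0, 2]]

def ip(M, N):
    # trace inner product <M, N> = sum_ij M_ij N_ij
    return sum(M[i][j] * N[i][j] for i in range(3) for j in range(3))

# primal point X = diag(0, 0, 1): feasible with objective 1
X = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
assert ip(A1, X) == 0 and ip(A2, X) == 2
assert ip(C, X) == 1

# dual point y = (0, 0): slack S = C − y1*A1 − y2*A2 = C, diagonal with
# entries (0, 0, 1), hence PSD; dual objective 2*y2 = 0
y = (0.0, 0.0)
S = [[C[i][j] - y[0] * A1[i][j] - y[1] * A2[i][j] for j in range(3)]
     for i in range(3)]
assert all(S[i][j] == 0 for i in range(3) for j in range(3) if i != j)
assert all(S[i][i] >= 0 for i in range(3))
assert 2 * y[1] == 0 < ip(C, X)    # finite gap between the two values
```

The slide's argument shows these two points are in fact optimal, so the gap p* − d* = 1 is real, not an artifact of suboptimal certificates.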