1/33 Lecture: Duality of LP, SOCP and SDP
Zaiwen Wen
Beijing International Center For Mathematical Research, Peking University
http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html
wenzw@pku.edu.cn
Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes
2/33 Lagrangian
standard form problem (not necessarily convex)
$$\min\ f_0(x) \quad \text{s.t.}\ f_i(x) \le 0,\ i = 1,\dots,m, \qquad h_i(x) = 0,\ i = 1,\dots,p$$
variable $x \in \mathbb{R}^n$, domain $\mathcal{D}$, optimal value $p^\star$.
Lagrangian: $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$, with $\operatorname{dom} L = \mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$,
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$
- weighted sum of objective and constraint functions
- $\lambda_i$ is the Lagrange multiplier associated with $f_i(x) \le 0$
- $\nu_i$ is the Lagrange multiplier associated with $h_i(x) = 0$
3/33 Lagrange dual function
Lagrange dual function: $g : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$,
$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = \inf_{x \in \mathcal{D}} \Big( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \Big)$$
- $g$ is concave, and can be $-\infty$ for some $\lambda, \nu$
- lower bound property: if $\lambda \succeq 0$, then $g(\lambda, \nu) \le p^\star$
Proof: if $\tilde{x}$ is feasible and $\lambda \succeq 0$, then
$$f_0(\tilde{x}) \ge L(\tilde{x}, \lambda, \nu) \ge \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = g(\lambda, \nu);$$
minimizing over all feasible $\tilde{x}$ gives $p^\star \ge g(\lambda, \nu)$.
4/33 The dual problem
Lagrange dual problem
$$\max\ g(\lambda, \nu) \quad \text{s.t.}\ \lambda \succeq 0$$
- finds the best lower bound on $p^\star$ obtainable from the Lagrange dual function
- a convex optimization problem; optimal value denoted $d^\star$
- $\lambda, \nu$ are dual feasible if $\lambda \succeq 0$ and $(\lambda, \nu) \in \operatorname{dom} g$
- often simplified by making the implicit constraint $(\lambda, \nu) \in \operatorname{dom} g$ explicit
5/33 Standard form LP
$$\text{(P)}\quad \min\ c^\top x \quad \text{s.t.}\ Ax = b,\ x \ge 0 \qquad\qquad \text{(D)}\quad \max\ b^\top y \quad \text{s.t.}\ A^\top y + s = c,\ s \ge 0$$
The Lagrangian is
$$L(x, \lambda, \nu) = c^\top x + \nu^\top (Ax - b) - \lambda^\top x = -b^\top \nu + (c + A^\top \nu - \lambda)^\top x$$
$L$ is affine in $x$, hence
$$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) = \begin{cases} -b^\top \nu, & A^\top \nu - \lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}$$
- lower bound property: $p^\star \ge -b^\top \nu$ if $A^\top \nu + c \ge 0$
- Exercise: check that the dual of (D) recovers (P)
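As a numerical illustration (not part of the original slides), the sketch below solves a tiny standard-form LP and its dual with SciPy and compares the two optimal values; the data `c`, `A`, `b` are made up for this example.

```python
import numpy as np
from scipy.optimize import linprog

# Small standard-form LP: min c^T x  s.t. Ax = b, x >= 0
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Primal (x >= 0 enforced via bounds)
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2)

# Dual: max b^T y  s.t. A^T y <= c  (equivalent to A^T y + s = c, s >= 0; y free)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)])

p_star = primal.fun     # x* = (1, 0), value 1.0
d_star = -dual.fun      # y* = 1, value 1.0: strong duality holds
```

Here the dual equality-plus-slack form is rewritten as the inequality $A^\top y \le c$ so that both problems fit `linprog` directly.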
6/33 Inequality form LP
$$\min\ c^\top x \quad \text{s.t.}\ Ax \le b$$
The Lagrangian is
$$L(x, \lambda) = c^\top x + \lambda^\top (Ax - b) = -b^\top \lambda + (c + A^\top \lambda)^\top x$$
$L$ is affine in $x$, hence
$$g(\lambda) = \inf_x L(x, \lambda) = \begin{cases} -b^\top \lambda, & A^\top \lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}$$
Dual problem
$$\max\ -b^\top \lambda \quad \text{s.t.}\ A^\top \lambda + c = 0,\ \lambda \ge 0$$
7/33 Equality constrained norm minimization
$$\min\ \|x\| \quad \text{s.t.}\ Ax = b \qquad\qquad \max\ b^\top \nu \quad \text{s.t.}\ \|A^\top \nu\|_* \le 1$$
Dual function
$$g(\nu) = \inf_x \big( \|x\| - \nu^\top A x + b^\top \nu \big) = \begin{cases} b^\top \nu, & \|A^\top \nu\|_* \le 1 \\ -\infty & \text{otherwise} \end{cases}$$
where $\|v\|_* = \sup_{\|u\| \le 1} u^\top v$ is the dual norm of $\|\cdot\|$.
Proof: follows from $\inf_x (\|x\| - y^\top x) = 0$ if $\|y\|_* \le 1$, $-\infty$ otherwise:
- if $\|y\|_* \le 1$, then $\|x\| - y^\top x \ge 0$ for all $x$, with equality if $x = 0$
- if $\|y\|_* > 1$, choose $x = tu$, where $\|u\| \le 1$ and $u^\top y = \|y\|_* > 1$: then $\|x\| - y^\top x = t(\|u\| - \|y\|_*) \to -\infty$ as $t \to \infty$
8/33 LP with box constraints
$$\min\ c^\top x \quad \text{s.t.}\ Ax = b,\ -1 \le x \le 1$$
$$\max\ -b^\top \nu - \mathbf{1}^\top \lambda_1 - \mathbf{1}^\top \lambda_2 \quad \text{s.t.}\ c + A^\top \nu + \lambda_1 - \lambda_2 = 0,\ \lambda_1 \ge 0,\ \lambda_2 \ge 0$$
Reformulation with the box constraints made implicit:
$$\min\ f_0(x) = \begin{cases} c^\top x, & -1 \le x \le 1 \\ \infty & \text{otherwise} \end{cases} \quad \text{s.t.}\ Ax = b$$
Dual function
$$g(\nu) = \inf_{-1 \le x \le 1} \big( c^\top x + \nu^\top (Ax - b) \big) = -b^\top \nu - \|A^\top \nu + c\|_1$$
Dual problem: $\max_\nu\ -b^\top \nu - \|A^\top \nu + c\|_1$
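The unconstrained dual above can be sanity-checked numerically (a sketch with made-up random data, not from the slides): for every $\nu$, $-b^\top \nu - \|A^\top \nu + c\|_1$ must lie below the primal optimum.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data; choosing b = A x0 with x0 in the box guarantees feasibility
rng = np.random.default_rng(0)
n, m = 5, 2
A = rng.standard_normal((m, n))
b = A @ rng.uniform(-1, 1, n)
c = rng.standard_normal(n)

# Primal: min c^T x  s.t. Ax = b, -1 <= x <= 1
p_star = linprog(c, A_eq=A, b_eq=b, bounds=[(-1, 1)] * n).fun

# g(nu) = -b^T nu - ||A^T nu + c||_1 is a lower bound for every nu
g = lambda nu: -b @ nu - np.abs(A.T @ nu + c).sum()
g_best = max(g(rng.standard_normal(m)) for _ in range(200))
```

The inner infimum over the box is taken coordinate-wise, which is where the $\ell_1$ norm comes from: each $x_i$ picks the sign opposite to $(c + A^\top \nu)_i$.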
9/33 Lagrange dual and conjugate function
$$\min\ f_0(x) \quad \text{s.t.}\ Ax \le b,\ Cx = d$$
dual function
$$g(\lambda, \nu) = \inf_{x \in \operatorname{dom} f_0} \big( f_0(x) + (A^\top \lambda + C^\top \nu)^\top x - b^\top \lambda - d^\top \nu \big) = -f_0^*(-A^\top \lambda - C^\top \nu) - b^\top \lambda - d^\top \nu$$
- recall the definition of the conjugate $f^*(y) = \sup_{x \in \operatorname{dom} f} (y^\top x - f(x))$
- simplifies the derivation of the dual if the conjugate of $f_0$ is known
10/33 Conjugate function
the conjugate of a function $f$ is
$$f^*(y) = \sup_{x \in \operatorname{dom} f} (y^\top x - f(x))$$
- $f^*$ is closed and convex even if $f$ is not
- Fenchel's inequality:
$$f(x) + f^*(y) \ge x^\top y \quad \text{for all } x, y$$
(extends the inequality $x^\top x / 2 + y^\top y / 2 \ge x^\top y$ to non-quadratic convex $f$)
11/33 Quadratic function
$$f(x) = \tfrac{1}{2} x^\top A x + b^\top x + c$$
strictly convex case ($A \succ 0$):
$$f^*(y) = \tfrac{1}{2} (y - b)^\top A^{-1} (y - b) - c$$
general convex case ($A \succeq 0$, with $A^\dagger$ the pseudo-inverse):
$$f^*(y) = \tfrac{1}{2} (y - b)^\top A^\dagger (y - b) - c, \quad \operatorname{dom} f^* = \operatorname{range}(A) + b$$
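The strictly convex formula can be verified by computing the supremum $\sup_x (y^\top x - f(x))$ numerically (a sketch with made-up data, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical strictly convex quadratic: A = M M^T + n I  ensures A > 0
rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
c = 0.7
y = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x + c

# Closed-form conjugate: f*(y) = 1/2 (y-b)^T A^{-1} (y-b) - c
fstar = 0.5 * (y - b) @ np.linalg.solve(A, y - b) - c

# Numerical conjugate: sup_x (y^T x - f(x)) = -min_x (f(x) - y^T x)
fstar_num = -minimize(lambda x: f(x) - y @ x, np.zeros(n)).fun
```

The maximizer is $x = A^{-1}(y - b)$, obtained by setting $\nabla f(x) = y$; substituting it back yields the closed form.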
12/33 Negative entropy and negative logarithm
negative entropy:
$$f(x) = \sum_{i=1}^n x_i \log x_i \qquad f^*(y) = \sum_{i=1}^n e^{y_i - 1}$$
negative logarithm:
$$f(x) = -\sum_{i=1}^n \log x_i \qquad f^*(y) = -\sum_{i=1}^n \log(-y_i) - n$$
matrix logarithm:
$$f(X) = -\log\det X \quad (\operatorname{dom} f = S^n_{++}) \qquad f^*(Y) = -\log\det(-Y) - n$$
13/33 Indicator function and norm
indicator of a convex set $C$: the conjugate is the support function of $C$
$$f(x) = \begin{cases} 0, & x \in C \\ +\infty, & x \notin C \end{cases} \qquad f^*(y) = \sup_{x \in C} y^\top x$$
norm: the conjugate is the indicator of the unit dual norm ball
$$f(x) = \|x\| \qquad f^*(y) = \begin{cases} 0, & \|y\|_* \le 1 \\ +\infty, & \|y\|_* > 1 \end{cases}$$
(see next page)
14/33
proof: recall the definition of the dual norm: $\|y\|_* = \sup_{\|x\| \le 1} x^\top y$. To evaluate $f^*(y) = \sup_x (y^\top x - \|x\|)$ we distinguish two cases:
- if $\|y\|_* \le 1$, then (by definition of the dual norm) $y^\top x \le \|x\|$ for all $x$, and equality holds if $x = 0$; therefore $\sup_x (y^\top x - \|x\|) = 0$
- if $\|y\|_* > 1$, there exists an $x$ with $\|x\| \le 1$, $x^\top y > 1$; then
$$f^*(y) \ge y^\top (tx) - \|tx\| = t(y^\top x - \|x\|)$$
and the right-hand side goes to infinity as $t \to \infty$
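The two cases of this proof can be observed concretely for the $\ell_1$ norm, whose dual norm is $\ell_\infty$ (a sketch with made-up vectors, not from the slides):

```python
import numpy as np

def gap(y, x):
    # objective inside the conjugate: y^T x - ||x||_1
    return y @ x - np.abs(x).sum()

# Case ||y||_inf <= 1: y^T x <= ||y||_inf ||x||_1 <= ||x||_1, so the gap is <= 0
y = np.array([0.5, -1.0, 0.3])
rng = np.random.default_rng(2)
worst = max(gap(y, 10 * rng.standard_normal(3)) for _ in range(1000))

# Case ||y||_inf > 1: move along x = t * sign(y_i) e_i at the violating coordinate
y2 = np.array([0.5, 1.5, 0.3])
i = int(np.argmax(np.abs(y2)))
e = np.zeros(3); e[i] = np.sign(y2[i])
vals = [gap(y2, t * e) for t in (1, 10, 100)]   # grows like t * (|y2_i| - 1)
```

In the second case the gap equals $0.5\,t$, so the supremum is $+\infty$, matching the indicator-of-the-dual-ball formula.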
15/33 The second conjugate
$$f^{**}(x) = \sup_{y \in \operatorname{dom} f^*} (x^\top y - f^*(y))$$
- $f^{**}$ is closed and convex
- from Fenchel's inequality ($x^\top y - f^*(y) \le f(x)$ for all $y$ and $x$):
$$f^{**}(x) \le f(x) \quad \text{for all } x$$
equivalently, $\operatorname{epi} f \subseteq \operatorname{epi} f^{**}$ (for any $f$)
- if $f$ is closed and convex, then
$$f^{**}(x) = f(x) \quad \text{for all } x$$
equivalently, $\operatorname{epi} f = \operatorname{epi} f^{**}$ (if $f$ is closed and convex); proof on next page
16/33 Conjugates and subgradients
if $f$ is closed and convex, then
$$y \in \partial f(x) \iff x \in \partial f^*(y) \iff x^\top y = f(x) + f^*(y)$$
proof: if $y \in \partial f(x)$, then $f^*(y) = \sup_u (y^\top u - f(u)) = y^\top x - f(x)$; hence
$$f^*(v) = \sup_u (v^\top u - f(u)) \ge v^\top x - f(x) = x^\top (v - y) + y^\top x - f(x) = f^*(y) + x^\top (v - y) \tag{1}$$
for all $v$; therefore, $x$ is a subgradient of $f^*$ at $y$ ($x \in \partial f^*(y)$)
the reverse implication $x \in \partial f^*(y) \Rightarrow y \in \partial f(x)$ follows from $f^{**} = f$
17/33 Some calculus rules
separable sum:
$$f(x_1, x_2) = g(x_1) + h(x_2) \qquad f^*(y_1, y_2) = g^*(y_1) + h^*(y_2)$$
scalar multiplication (for $\alpha > 0$):
$$f(x) = \alpha g(x) \qquad f^*(y) = \alpha g^*(y/\alpha)$$
addition to an affine function:
$$f(x) = g(x) + a^\top x + b \qquad f^*(y) = g^*(y - a) - b$$
infimal convolution:
$$f(x) = \inf_{u + v = x} (g(u) + h(v)) \qquad f^*(y) = g^*(y) + h^*(y)$$
18/33 Duality and problem reformulations
- equivalent formulations of a problem can lead to very different duals
- reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting
Common reformulations:
- introduce new variables and equality constraints
- make explicit constraints implicit, or vice versa
- transform objective or constraint functions, e.g., replace $f_0(x)$ by $\varphi(f_0(x))$ with $\varphi$ convex and increasing
19/33 Introducing new variables and equality constraints
$$\min\ f_0(Ax + b)$$
- dual function is constant: $g = \inf_x L(x) = \inf_x f_0(Ax + b) = p^\star$
- we have strong duality, but the dual is quite useless
reformulated problem and its dual:
$$\text{(P)}\quad \min\ f_0(y) \quad \text{s.t.}\ Ax + b = y \qquad\qquad \text{(D)}\quad \max\ b^\top \nu - f_0^*(\nu) \quad \text{s.t.}\ A^\top \nu = 0$$
the dual function follows from
$$g(\nu) = \inf_{x, y} \big( f_0(y) - \nu^\top y + \nu^\top A x + b^\top \nu \big) = \begin{cases} -f_0^*(\nu) + b^\top \nu, & A^\top \nu = 0 \\ -\infty & \text{otherwise} \end{cases}$$
20/33 Norm approximation problem
$$\min\ \|Ax - b\| \qquad \Longrightarrow \qquad \min\ \|y\| \quad \text{s.t.}\ y = Ax - b$$
can look up the conjugate of $\|\cdot\|$, or derive the dual directly:
$$g(\nu) = \inf_{x, y} \big( \|y\| + \nu^\top y - \nu^\top A x + b^\top \nu \big) = \begin{cases} b^\top \nu + \inf_y (\|y\| + \nu^\top y), & A^\top \nu = 0 \\ -\infty, & \text{otherwise} \end{cases} = \begin{cases} b^\top \nu, & A^\top \nu = 0,\ \|\nu\|_* \le 1 \\ -\infty, & \text{otherwise} \end{cases}$$
Dual problem:
$$\max\ b^\top \nu \quad \text{s.t.}\ A^\top \nu = 0,\ \|\nu\|_* \le 1$$
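For the $\ell_\infty$ norm (dual norm $\ell_1$) this dual can be checked numerically: the primal is an LP, and any $\nu$ with $A^\top \nu = 0$ and $\|\nu\|_1 \le 1$ gives a lower bound $b^\top \nu \le p^\star$. A sketch with made-up random data, not from the slides:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
m, n = 6, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Primal min ||Ax - b||_inf as an LP in z = (x, t):
# min t  s.t.  (Ax - b)_i <= t  and  -(Ax - b)_i <= t
c = np.r_[np.zeros(n), 1.0]
A_ub = np.block([[A, -np.ones((m, 1))], [-A, -np.ones((m, 1))]])
b_ub = np.r_[b, -b]
p_star = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1)).fun

# Dual feasible nu: a null-space vector of A^T, scaled so ||nu||_1 = 1
_, _, Vt = np.linalg.svd(A.T)
nu = Vt[-1]                    # A.T @ nu ~ 0 (zero singular value)
nu = nu / np.abs(nu).sum()
bound = abs(b @ nu)            # the sign of nu is free, so take the better sign
```

By weak duality `bound <= p_star`; equality would require maximizing $b^\top \nu$ over the whole dual feasible set rather than one null-space vector.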
21/33 Weak and strong duality
weak duality: $d^\star \le p^\star$
- always holds (for convex and nonconvex problems)
- can be used to find nontrivial lower bounds for difficult problems, for example by solving the maxcut SDP
strong duality: $d^\star = p^\star$
- does not hold in general
- (usually) holds for convex problems
- conditions that guarantee strong duality in convex problems are called constraint qualifications
22/33 Slater's constraint qualification
strong duality holds for a convex problem
$$\min\ f_0(x) \quad \text{s.t.}\ f_i(x) \le 0,\ i = 1,\dots,m, \qquad Ax = b$$
if it is strictly feasible, i.e.,
$$\exists\, x \in \operatorname{int} \mathcal{D}: \quad f_i(x) < 0,\ i = 1,\dots,m, \quad Ax = b$$
- also guarantees that the dual optimum is attained (if $p^\star > -\infty$)
- can be sharpened: e.g., $\operatorname{int} \mathcal{D}$ can be replaced with $\operatorname{relint} \mathcal{D}$ (interior relative to the affine hull); linear inequalities do not need to hold with strict inequality, ...
- there exist many other types of constraint qualifications
23/33 Complementary slackness
assume strong duality holds, $x^\star$ is primal optimal, and $(\lambda^\star, \nu^\star)$ is dual optimal; then
$$p^\star = f_0(x^\star) = g(\lambda^\star, \nu^\star) = \inf_{x \in \mathcal{D}} \Big( f_0(x) + \sum_{i=1}^m \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i(x) \Big) \le f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i(x^\star) \le f_0(x^\star)$$
hence, the two inequalities hold with equality:
- $x^\star$ minimizes $L(x, \lambda^\star, \nu^\star)$
- $\lambda_i^\star f_i(x^\star) = 0$ for $i = 1,\dots,m$ (known as complementary slackness):
$$\lambda_i^\star > 0 \Rightarrow f_i(x^\star) = 0, \qquad f_i(x^\star) < 0 \Rightarrow \lambda_i^\star = 0$$
24/33 Karush-Kuhn-Tucker (KKT) conditions
the following four conditions are called the KKT conditions (for a problem with differentiable $f_i$, $h_i$):
1. primal constraints: $f_i(x) \le 0$, $i = 1,\dots,m$, $h_i(x) = 0$, $i = 1,\dots,p$
2. dual constraints: $\lambda \succeq 0$
3. complementary slackness: $\lambda_i f_i(x) = 0$, $i = 1,\dots,m$
4. gradient of the Lagrangian with respect to $x$ vanishes:
$$\nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0$$
if strong duality holds and $x$, $\lambda$, $\nu$ are optimal, then they must satisfy the KKT conditions
25/33 KKT conditions for convex problems
if $\tilde{x}$, $\tilde{\lambda}$, $\tilde{\nu}$ satisfy the KKT conditions for a convex problem, then they are optimal:
- from complementary slackness: $f_0(\tilde{x}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$
- from the 4th condition (and convexity): $g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$
hence, $f_0(\tilde{x}) = g(\tilde{\lambda}, \tilde{\nu})$
if Slater's condition is satisfied: $x$ is optimal if and only if there exist $\lambda$, $\nu$ that satisfy the KKT conditions
- recall that Slater implies strong duality, and that the dual optimum is attained
- generalizes the optimality condition $\nabla f_0(x) = 0$ for unconstrained problems
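A concrete check of the four conditions on a small convex problem (an example I made up, not from the slides): project the point $(2, 1)$ onto the half-plane $x_1 + x_2 \le 1$. The solution is $x^\star = (1, 0)$ with multiplier $\lambda^\star = 2$.

```python
import numpy as np

# min (x1-2)^2 + (x2-1)^2  s.t.  x1 + x2 <= 1   (convex; Slater holds)
x = np.array([1.0, 0.0])    # candidate primal point
lam = 2.0                   # candidate multiplier

grad_f0 = 2 * (x - np.array([2.0, 1.0]))   # gradient of the objective
grad_f1 = np.array([1.0, 1.0])             # gradient of the constraint
f1 = x.sum() - 1.0                         # constraint value (feasible iff <= 0)

stationarity = grad_f0 + lam * grad_f1     # KKT condition 4: should be zero
comp_slack = lam * f1                      # KKT condition 3: should be zero
# conditions 1 and 2: f1 <= 0 and lam >= 0 hold by inspection,
# so x is optimal by convexity
```

All four conditions hold, so by the slide's argument $x = (1, 0)$ is globally optimal without any further search.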
26/33 LP Duality
Strong duality: if an LP has an optimal solution, then so does its dual, and their optimal objective values are equal.
Possible primal/dual status combinations:

primal \ dual   finite      unbounded   infeasible
finite          possible    -           -
unbounded       -           -           possible
infeasible      -           possible    possible

- if $p^\star = -\infty$ (primal unbounded), then $d^\star \le p^\star = -\infty$, hence the dual is infeasible
- if $d^\star = +\infty$ (dual unbounded), then $+\infty = d^\star \le p^\star$, hence the primal is infeasible
- both problems can be infeasible simultaneously:
$$\min\ x_1 + 2x_2 \quad \text{s.t.}\ x_1 + x_2 = 1,\ 2x_1 + 2x_2 = 3 \qquad\qquad \max\ p_1 + 3p_2 \quad \text{s.t.}\ p_1 + 2p_2 = 1,\ p_1 + 2p_2 = 2$$
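The final example on this slide, where both primal and dual are infeasible, can be confirmed with SciPy (a sketch, not from the slides; it relies on `linprog` reporting status code 2 for infeasibility):

```python
import numpy as np
from scipy.optimize import linprog

# Primal: min x1 + 2 x2  s.t. x1 + x2 = 1, 2 x1 + 2 x2 = 3  (x free)
A = np.array([[1.0, 1.0], [2.0, 2.0]])
b = np.array([1.0, 3.0])
c = np.array([1.0, 2.0])
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(None, None)] * 2)

# Dual: max b^T p  s.t. A^T p = c, i.e. p1 + 2 p2 = 1 and p1 + 2 p2 = 2
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(None, None)] * 2)

# status == 2 means "problem appears infeasible" in scipy.optimize.linprog
statuses = (primal.status, dual.status)
```

The primal constraints are inconsistent (the second row is twice the first but $3 \ne 2$), and the dual constraints demand $p_1 + 2p_2$ equal both 1 and 2.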
27/33 Problems with generalized inequalities
$$\min\ f_0(x) \quad \text{s.t.}\ f_i(x) \preceq_{K_i} 0,\ i = 1,\dots,m, \qquad h_i(x) = 0,\ i = 1,\dots,p$$
$f_i(x) \preceq_{K_i} 0$ means $-f_i(x) \in K_i$.
Lagrangian ($\langle \cdot, \cdot \rangle_{K_i}$ denotes the inner product on the space containing $K_i$):
$$L(x, \lambda_1, \dots, \lambda_m, \nu) = f_0(x) + \sum_{i=1}^m \langle \lambda_i, f_i(x) \rangle_{K_i} + \sum_{i=1}^p \nu_i h_i(x)$$
dual function
$$g(\lambda_1, \dots, \lambda_m, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda_1, \dots, \lambda_m, \nu)$$
28/33
lower bound property: if $\lambda_i \succeq_{K_i^*} 0$ (i.e., $\lambda_i$ lies in the dual cone $K_i^*$), then $g(\lambda_1, \dots, \lambda_m, \nu) \le p^\star$
proof: if $\tilde{x}$ is feasible and $\lambda_i \succeq_{K_i^*} 0$, then
$$f_0(\tilde{x}) \ge f_0(\tilde{x}) + \sum_{i=1}^m \langle \lambda_i, f_i(\tilde{x}) \rangle_{K_i} + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \ge \inf_{x \in \mathcal{D}} L(x, \lambda_1, \dots, \lambda_m, \nu) = g(\lambda_1, \dots, \lambda_m, \nu);$$
minimizing over all feasible $\tilde{x}$ gives $g(\lambda_1, \dots, \lambda_m, \nu) \le p^\star$.
Dual problem
$$\max\ g(\lambda_1, \dots, \lambda_m, \nu) \quad \text{s.t.}\ \lambda_i \succeq_{K_i^*} 0,\ i = 1,\dots,m$$
- weak duality: $p^\star \ge d^\star$ always
- strong duality: $p^\star = d^\star$ for a convex problem with a constraint qualification (for example, Slater's: the primal problem is strictly feasible)
29/33 Semidefinite program
$$\min\ c^\top x \quad \text{s.t.}\ x_1 F_1 + \cdots + x_n F_n \preceq G$$
$F_i, G \in S^k$; the multiplier is a matrix $Z \in S^k$
Lagrangian:
$$L(x, Z) = c^\top x + \langle Z,\ x_1 F_1 + \cdots + x_n F_n - G \rangle$$
dual function
$$g(Z) = \inf_x L(x, Z) = \begin{cases} -\langle G, Z \rangle, & \langle F_i, Z \rangle + c_i = 0,\ i = 1,\dots,n \\ -\infty & \text{otherwise} \end{cases}$$
Dual SDP
$$\max\ -\langle G, Z \rangle \quad \text{s.t.}\ \langle F_i, Z \rangle + c_i = 0,\ i = 1,\dots,n, \quad Z \succeq 0$$
$p^\star = d^\star$ if the primal SDP is strictly feasible.
30/33 SDP Relaxation of Maxcut
$$\min\ x^\top W x \quad \text{s.t.}\ x_i^2 = 1 \qquad \Longrightarrow \qquad \max\ -\mathbf{1}^\top \nu \quad \text{s.t.}\ W + \operatorname{diag}(\nu) \succeq 0$$
SDP relaxation:
$$\min\ \operatorname{Tr}(W X) \quad \text{s.t.}\ X_{ii} = 1,\ X \succeq 0$$
- a nonconvex problem; the feasible set contains $2^n$ discrete points
- interpretation: partition $\{1,\dots,n\}$ into two sets; $W_{ij}$ is the cost of assigning $i, j$ to the same set; $-W_{ij}$ is the cost of assigning them to different sets
dual function
$$g(\nu) = \inf_x \Big( x^\top W x + \sum_i \nu_i (x_i^2 - 1) \Big) = \inf_x\ x^\top (W + \operatorname{diag}(\nu)) x - \mathbf{1}^\top \nu = \begin{cases} -\mathbf{1}^\top \nu, & W + \operatorname{diag}(\nu) \succeq 0 \\ -\infty & \text{otherwise} \end{cases}$$
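The "nontrivial lower bound" claim can be demonstrated for a small instance (a sketch with made-up random data, not from the slides): the dual feasible point $\nu = -\lambda_{\min}(W)\,\mathbf{1}$ makes $W + \operatorname{diag}(\nu) = W - \lambda_{\min}(W) I \succeq 0$, giving the bound $n\,\lambda_{\min}(W)$, which we compare against brute force.

```python
import numpy as np
from itertools import product

# Hypothetical symmetric cost matrix (n small enough to brute-force)
rng = np.random.default_rng(4)
n = 8
M = rng.standard_normal((n, n))
W = (M + M.T) / 2

# Exact optimum over all 2^n points x in {-1, 1}^n
p_star = min(np.array(s) @ W @ np.array(s)
             for s in product([-1.0, 1.0], repeat=n))

# Dual feasible point: nu = -lambda_min(W) * 1  =>  W + diag(nu) >= 0
lam_min = np.linalg.eigvalsh(W)[0]
bound = n * lam_min          # = -1^T nu <= p_star by weak duality
```

This particular $\nu$ is just one feasible dual point; maximizing $-\mathbf{1}^\top \nu$ over all of them (the SDP dual) gives a tighter bound.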
31/33 SOCP/SDP Duality
SOCP:
$$\text{(P)}\quad \min\ c^\top x \quad \text{s.t.}\ Ax = b,\ x \succeq_Q 0 \qquad\qquad \text{(D)}\quad \max\ b^\top y \quad \text{s.t.}\ A^\top y + s = c,\ s \succeq_Q 0$$
SDP:
$$\text{(P)}\quad \min\ \langle C, X \rangle \quad \text{s.t.}\ \langle A_i, X \rangle = b_i,\ i = 1,\dots,m,\ X \succeq 0 \qquad\qquad \text{(D)}\quad \max\ b^\top y \quad \text{s.t.}\ \sum_{i=1}^m y_i A_i + S = C,\ S \succeq 0$$
Strong duality:
- if $p^\star > -\infty$ and (P) is strictly feasible, then (D) is feasible and $p^\star = d^\star$
- if $d^\star < +\infty$ and (D) is strictly feasible, then (P) is feasible and $p^\star = d^\star$
- if (P) and (D) both have strictly feasible solutions, then both have optimal solutions
32/33 Failure of SOCP Duality
$$\inf\ (1, -1, 0)^\top x \quad \text{s.t.}\ (0, 0, 1)^\top x = 1,\ x \succeq_Q 0 \qquad\qquad \sup\ y \quad \text{s.t.}\ (0, 0, 1)^\top y + z = (1, -1, 0)^\top,\ z \succeq_Q 0$$
- primal: $\min\ x_0 - x_1$ s.t. $x_0 \ge \sqrt{x_1^2 + 1}$; it holds that $x_0 - x_1 > 0$, and $x_0 - x_1 \to 0$ along $x_0 = \sqrt{x_1^2 + 1}$ as $x_1 \to \infty$. Hence $p^\star = 0$, but no feasible point attains it
- dual: $z = (1, -1, -y)$, and $z \succeq_Q 0$ requires $1 \ge \sqrt{1 + y^2}$. Hence $y = 0$ and $d^\star = 0$
$p^\star = d^\star$, but the primal optimum is not attained.
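The non-attainment is easy to see numerically (a small sketch, not from the slides): along the cone boundary $x_0 = \sqrt{x_1^2 + 1}$ the objective stays strictly positive while shrinking toward 0.

```python
import numpy as np

# Feasible boundary points: x1 = t, x0 = sqrt(t^2 + 1), x2 = 1
t = np.array([1.0, 10.0, 100.0, 1000.0])
obj = np.sqrt(t**2 + 1) - t    # objective x0 - x1 along the boundary
# obj ~ 1/(2t): strictly positive, decreasing, with infimum 0 never attained
```

Since $\sqrt{t^2 + 1} - t = 1/(\sqrt{t^2+1} + t)$, the objective behaves like $1/(2t)$, confirming $p^\star = 0$ with no minimizer.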
33/33 Failure of SDP Duality
Consider
$$\min\ \left\langle \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, X \right\rangle \quad \text{s.t.}\ \left\langle \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, X \right\rangle = 0,\ \left\langle \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}, X \right\rangle = 2,\ X \succeq 0$$
and its dual
$$\max\ 2 y_2 \quad \text{s.t.}\ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} y_1 + \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix} y_2 \preceq \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
- primal: the constraints force $X_{11} = 0$, hence $X_{12} = 0$ (by $X \succeq 0$) and $X_{33} = 1$; e.g. $X^\star = \operatorname{diag}(0, 0, 1)$, so $p^\star = 1$
- dual: feasibility forces $y_2 = 0$, so $y^\star = (0, 0)$ and $d^\star = 0$
Both problems have finite optimal values, but $p^\star \ne d^\star$.
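The duality gap can be verified numerically (a sketch, not from the slides): check that $X^\star$ is feasible with value 1, and that the dual slack matrix loses positive semidefiniteness as soon as $y_2 \ne 0$.

```python
import numpy as np

C  = np.diag([0.0, 0.0, 1.0])
A1 = np.diag([1.0, 0.0, 0.0])
A2 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 2.0]])

# Primal point X = diag(0, 0, 1): satisfies both constraints, objective 1
X = np.diag([0.0, 0.0, 1.0])
feasible = (np.trace(A1 @ X) == 0.0) and (np.trace(A2 @ X) == 2.0)
p_val = np.trace(C @ X)

# Dual slack S(y) = C - y1*A1 - y2*A2 must be PSD; its 2x2 upper-left block
# [[-y1, -y2], [-y2, 0]] is PSD only if y2 = 0, pinning the dual value at 0
def min_eig(y1, y2):
    return np.linalg.eigvalsh(C - y1 * A1 - y2 * A2)[0]

bad = min_eig(-1.0, 0.1)    # negative: this y is dual infeasible
ok = min_eig(0.0, 0.0)      # nonnegative: y = (0, 0) is dual feasible
```

So the best dual value is $2 y_2 = 0$ while the primal optimum is 1, exhibiting the gap; neither problem here is strictly feasible, so the strong duality conditions of the previous slides do not apply.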