Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes
Introduction 2/35 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized inequalities
Lagrangian standard form problem (not necessarily convex) min f 0 (x) s.t. f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p variable x R n, domain D, optimal value p Lagrangian: L : R n R m R p R, with dom L = D R m R p, m p L(x, λ, ν) = f 0 (x) + λ i f i (x) + ν i h i (x) i=1 i=1 weighted sum of objective and constraint functions λ i is Lagrange multiplier associated with f i (x) 0 ν i is Lagrange multiplier associated with h i (x) = 0 3/35
Lagrange dual function 4/35 Lagrange dual function: g : R m R p R, g(λ, ν) = inf L(x, λ, ν) x D ( m = inf f 0 (x) + λ i f i (x) + x D i=1 g is concave, can be for some λ, ν lower bound property: if λ 0, then g(λ, ν) p proof: if x is feasible and λ 0, then ) p ν i h i (x) i=1 f 0 ( x) L( x, λ, ν) inf L(x, λ, ν) = g(λ, ν) x D minimizing over all feasible x gives p g(λ, ν)
Least-norm solution of linear equations dual function min s.t. x T x Ax = b Lagrangian is L(x, ν) = x T x + ν T (Ax b) to minimize L over x, set gradient equal to zero: x L(x, ν) = 2x + A T ν = 0 = x = (1/2)A T ν plug in L to obtain g: g(ν) = L(( 1/2)A T ν, ν) = 1 4 νt AA T ν b T ν a concave function of ν lower bound property: p (1/4)ν T AA T ν b T ν for all ν 5/35
6/35 Standard form LP dual function min c T x s.t. Ax = b, x 0 Lagrangian is L(x, λ, ν) = c T x + ν T (Ax b) λ T x = b T ν + (c + A T ν λ) T x L is affine in x, hence g(λ, ν) = inf x L(x, λ, ν) = { b T ν A T ν λ + c = 0 otherwise g is linear on affine domain {(λ, ν) A T ν λ + c = 0}, hence concave lower bound property: p b T ν if A T ν + c 0
7/35 Equality constrained norm minimization dual function min s.t. x Ax = b g(ν) = inf x ( x νt Ax + b T ν) = { b T ν A T ν 1 where v = sup u 1 u T v is dual norm of otherwise proof: follows from inf x ( x y T x) = 0 if y 1, otherwise if y 1, then x y T x 0 for all x, with equality if x = 0 if y > 1, choose x = tu where u 1, u T y = y > 1: x y T x = t( u y ) as t lower bound property: p b T ν if A T ν 1
Two-way partitioning min x T Wx s.t. x 2 i = 1, i = 1,..., n a nonconvex problem; feasible set contains 2 n discrete points interpretation: partition {1,..., n} in two sets; W ij is cost of assigning i, j to the same set; W ij is cost of assigning to different sets dual function g(ν) = inf x (xt Wx + i ν i (x 2 i 1)) = inf x xt (W + diag(ν))x 1 T ν { 1 T ν W + diag(ν) 0 = otherwise lower bound property: p 1 T ν if W + diag(ν) 0 example: ν = λ min (W)1 gives bound p nλ min (W) 8/35
Lagrange dual and conjugate function 9/35 dual function g(λ, ν) = min f 0 (x) s.t. Ax b, Cx = d ( inf f0 (x) + (A T λ + C T ν) T x b T λ d T ν ) x dom f 0 = f 0 ( A T λ C T ν) b T λ d T ν recall definition of conjugate f (y) = sup x dom f (y T x f (x)) simplifies derivation of dual if conjugate of f 0 is known example: entropy maximization f 0 (x) = n x i log x i, f0 (y) = i=1 n i=1 e y i 1
The dual problem Lagrange dual problem max g(λ, ν) s.t. λ 0 finds best lower bound on p, obtained from Lagrange dual function a convex optimization problem; optimal value denoted d λ, ν are dual feasible if λ 0, (λ, ν) dom g often simplified by making implicit constraint (λ, ν) dom g explicit example: standard form LP and its dual (page 5-5) min c T x max b T ν s.t. Ax = b x 0 s.t. A T ν + c 0 10/35
Weak and strong duality weak duality: d p always holds (for convex and nonconvex problems) can be used to find nontrivial lower bounds for difficult problems for example, solving the SDP max 1 T ν s.t. W + diag(ν) 0 gives a lower bound for the two-way partitioning problem on page 5-7 strong duality: d = p does not hold in general (usually) holds for convex problems conditions that guarantee strong duality in convex problems are called constraint qualifications 11/35
12/35 Slater s constraint qualification strong duality holds for a convex problem min f 0 (x) s.t. f i (x) 0, i = 1,..., m Ax = b if it is strictly feasible, i.e., x int D : f i (x) < 0, i = 1,..., m, Ax = b also guarantees that the dual optimum is attained (if p > ) can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality,... there exist many other types of constraint qualifications
Inequality form LP primal problem min s.t. c T x Ax b dual function g(λ) = inf x ( (c + A T λ) T x b T λ ) = { b T λ A T λ + c = 0 otherwise dual problem max b T λ s.t. A T λ + c = 0, λ 0 from Slater s condition: p = d if A x < b for some x in fact, p = d except when primal and dual are infeasible 13/35
Quadratic program primal problem (assume P S n ++) min s.t. x T Px Ax b dual function g(λ) = inf x ( x T Px + λ T (Ax b) ) = 1 4 λt AP 1 A T λ b T λ dual problem max (1/4)λ T AP 1 A T λ b T λ s.t. λ 0 from Slater s condition: p = d if A x < b for some x in fact, p = d always 14/35
A nonconvex problem with strong duality A 0, hence nonconvex min x T Ax + 2b T x s.t. x T x 1 dual function: g(λ) = inf x (x T (A + λi)x + 2b T x λ) unbounded below if A + λi 0 or if A + λi 0 and b / R(A + λi) minimized by x = (A + λi) b otherwise: g(λ) = b T (A + λi) b λ dual problem and equivalent SDP: max b T (A + λi) b λ s.t. A + λi 0 b R(A + λi) max s.t. t λ [ A + λi b b T t ] 0 strong duality although primal problem is not convex (not easy to show) 15/35
Geometric interpretation Geometric interpretation for simplicity, consider problem with one constraint 1 (x) 0 for simplicity, consider problem with one constraint f 1 (x) 0 interpretation interpretation of of dual dual function: function: g(λ) = inf (t + λu), where G = {(f 1(x), f 0 (x)) x D} g(λ) = (u,t) G inf (t + λu), where G = {(f 1(x), f 0 (x)) x D} (u,t) G t t G G λu + t = g(λ) p g(λ) u p d u λu λu + + t = t = g(λ) g(λ) is is (non-vertical) (non-vertical) supporting supporting hyperplane hyperplane to to G G hyperplane intersects t-axis at t g(λ) hyperplane intersects t-axis at t = g(λ) 16/35
epigraph epigraph variation: variation: same same interpretation if GifisG is replaced with with A = {(u, A = t) {(u, f 1 (x) t) f u, f 0 (x) t for some x D} 1 (x) u, f 0 (x) t for some x D} t A λu + t = g(λ) p g(λ) u strong strong duality duality holds holds if there if there is a non-vertical is a non-vertical supporting supporting hyperplane hyperplanetotoa Aat at (0, (0, p p) ) for convex for convex problem, problem, A isaconvex, convex, hence hence has has supp. supp. hyperplane at at (0, p (0, p ) Slater s condition: if there exist (ũ, t) A with ũ < 0, then supportin Slater s hyperplanes condition: at if(0, there p ) exist must (ũ, t) be non-vertical A with ũ < 0, then supporting hyperplanes at (0, p ) must be non-vertical 17/35
Complementary slackness 18/35 assume strong duality holds, x is primal optimal, (λ, ν ) is dual optimal ( ) m p f 0 (x ) = g(λ, ν ) = inf f 0 (x) + λ x i f i (x) + νi h i (x) f 0 (x ) + f 0 (x ) i=1 m λ i f i (x ) + i=1 i=1 p νi h i (x ) i=1 hence, the two inequalities hold with equality x minimizes L(x, λ, ν ) λ i f i(x ) = 0 for i = 1,..., m (known as complementary slackness): λ i > 0 = f i (x ) = 0, f i (x ) < 0 = λ i = 0
Karush-Kuhn-Tucker (KKT) conditions 19/35 the following four conditions are called KKT conditions (for a problem with differentiable f i, h i ): 1 primal constraints: f i (x) 0, i = 1,..., m, h i (x) = 0, i = 1,..., p 2 dual constraints: λ 0 3 complementary slackness: λ i f i (x) = 0, i = 1,..., m 4 gradient of Lagrangian with respect to x vanishes: f 0 (x) + m λ i f i (x) + i=1 p ν i h i (x) = 0 from page 5-17: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions i=1
KKT conditions for convex problem 20/35 if x, λ, ν satisfy KKT for a convex problem, then they are optimal: from complementary slackness: f 0 ( x) = L( x, λ, ν) from 4th condition (and convexity): g( λ, ν) = L( x, λ, ν) hence, f 0 ( x) = g( λ, ν) if Slater s condition is satisfied: x is optimal if and only if there exist λ, ν that satisfy KKT conditions recall that Slater implies strong duality, and dual optimum is attained generalizes optimality condition f 0 (x) = 0 for unconstrained problem
example:water-filling (assume α i > 0) n min log(x i + α i ) x i + α i ) i=1 x = 1 s.t. x 0, 1 T x = 1 ist xλ is optimal R n, ν iff Rxsuch 0, that 1 T x = 1, and there exist λ R n, ν R such that 1 1 λ 0, λ i x i = 0, + λ i = ν + λ i = ν x i + α α i i if ν < 1/α i : λ i = 0 and x i = 1/ν α i if ν 1/α i : λ i = ν 1/α i and x i = 0 determine ν from 1 T x = n α i=1 max{0, 1/ν α i } = 1 i} = 1 interpretation 1/ν x i i α i n patches; level of patch i is at height α i flood area with unit amount of water resulting level is 1/ν 21/35
Perturbation and sensitivity analysis (unperturbed) optimization problem and its dual min f 0 (x) max g(λ, ν) s.t. f i (x) 0, i = 1,..., m s.t. λ 0 h i (x) = 0, i = 1,..., p perturbed problem and its dual min f 0 (x) s.t. f i (x) u i, i = 1,..., m h i (x) = v i, i = 1,..., p max s.t. λ 0 g(λ, ν) u T λ v T ν x is primal variable; u, v are parameters p (u, v) is optimal value as a function of u, v we are interested in information about p (u, v) that we can obtain from the solution of the unperturbed problem and its dual 22/35
global sensitivity result assume strong duality holds for unperturbed problem, and that λ, ν are dual optimal for unperturbed problem apply weak duality to perturbed problem: p (u, v) g(λ, ν ) u T λ v T ν = p (0, 0) u T λ v T ν sensitivity interpretation if λ large: p increases greatly if we tighten constraint i (u i < 0) if λ small: p does not decrease much if we loosen constraint i (u i > 0) if ν large and positive: p increases greatly if we take v i < 0; if ν large and negative: p increases greatly if we take v i > 0 if ν small and positive: p does not decrease much if we take v i > 0; if ν small and negative: p does not decrease much if we take v i < 0 23/35
local sensitivity: if (in addition) p (u, v) is differentiable at (0, 0), then λ i = p (0, 0), νi = p (0, 0) : if (in addition) p (u, v) is differentiable u at (0, 0), then i v i λ i proof = p (0, 0) (for λ i ):, fromν i global = p (0, 0) sensitivity result, u i v i p (0, 0) p (te i, 0) p (0, 0) rom global sensitivity result, = lim λ i u i t 0 t p (0, 0) p (te p (0, 0) p (te i, 0) p i, 0) p (0, 0) (0, 0) = lim = lim λ i u λ i tց0 i ut i t 0 t p hence, (0, 0) equality p (te i, 0) p (0, 0) = lim λ u p i i tր0 t (u) for a problem with one (inequality) constraint: lem with one (inequality) u = 0 u p (u) p (0) λ u 24/35
Duality and problem reformulations 25/35 equivalent formulations of a problem can lead to very different duals reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting common reformulations introduce new variables and equality constraints make explicit constraints implicit or vice-versa transform objective or constraint functions e.g., replace f 0 (x) by φ(f 0 (x)) with φ convex, increasing
Introducing new variables and equality constraints 26/35 min f 0 (Ax + b) dual function is constant: g = inf x L(x) = inf x f 0 (Ax + b) = p we have strong duality, but dual is quite useless reformulated problem and its dual min f 0 (y) s.t. Ax + b y = 0 max b T ν f 0 (ν) s.t. A T ν = 0 dual function follows from g(ν) = inf (f 0(y) ν T y + ν T Ax + b T ν) x,y { f 0 (ν) + b T ν A T ν = 0 = otherwise
norm approximation problem: min Ax b 27/35 min s.t. y y = Ax b can look up conjugate of, or derive dual directly g(ν) = inf ( y + x,y νt y ν T Ax + b T ν) { b T ν + inf y ( y + ν T y) A T ν = 0 = otherwise { b T ν A T ν = 0, ν 1 = otherwise dual of norm approximation problem max b T ν s.t. A T ν = 0, ν 1
Implicit constraints LP with box constraints: primal and dual problem min c T x s.t. Ax = b 1 x 1 max b T ν 1 T λ 1 1 T λ 2 s.t. c + A T ν + λ 1 λ 2 = 0 λ 1 0, λ 2 0 reformulation with box constraints made implicit min { c T x 1 x 1 f 0 (x) = otherwise s.t. Ax = b dual function g(ν) = inf 1 x 1 (ct x + ν T (Ax b)) = b T ν A T ν + c 1 dual problem: max b T ν A T ν + c 1 28/35
Problems with generalized inequalities min f 0 (x) s.t. f i (x) Ki 0, i = 1,..., m h i (x) = 0, Ki is generalized inequality on R k i definitions are parallel to scalar case: i = 1,..., p Lagrange multiplier for f i (x) Ki 0 is vector λ i R k i Lagrangian L : R n R k 1 R km R p R, is defined as m p L(x, λ 1,..., λ m, ν) = f 0 (x) + λ T i f i (x) + ν i h i (x) dual function g : R k 1 R km R p R, is defined as i=1 i=1 g(λ 1,..., λ m, ν) = inf x D L(x, λ 1,..., λ m, ν) 29/35
lower bound property: if λ i K i 0, then g(λ 1,..., λ m, ν) p proof: if x is feasible and λ K i 0, then f 0 ( x) f 0 ( x) + m λ T i f i ( x) + i=1 inf x D L(x, λ 1,..., λ m, ν) = g(λ 1,..., λ m, ν) p ν i h i ( x) minimizing over all feasible x gives p g(λ 1,..., λ m, ν) i=1 dual problem max g(λ 1,..., λ m, ν) s.t. λ i K i 0, i = 1,..., m weak duality: p d always strong duality: p = d for convex problem with constraint qualification (for example, Slater s: primal problem is strictly feasible) 30/35
Semidefinite program primal SDP (A i, C S n ) min s.t. b T y y 1 A 1 + + y m A m C Lagrange multiplier is matrix Z S n Lagrangian L(y, Z) = b T y + tr(z(y 1 A 1 + + y m A m C)) dual function g(z) = inf y L(y, Z) = { tr(cz) tr(ai Z) + b i = 0, i = 1,..., m otherwise dual SDP max tr(cz) s.t. Z 0, tr(a i Z) + b i = 0, i = 1,..., m p = d if primal SDP is strictly feasible ( y with y 1 A 1 + + y m A m C) 31/35
LP Duality 32/35 Strong duality: If a LP has an optimal solution, so does its dual, and their objective fun. are equal. primal finite unbounded infeasible dual finite unbounded infeasible If p =, then d p =, hence dual is infeasible If d = +, then + = d p, hence primal is infeasible min x 1 + 2x 2 s.t. x 1 + x 2 = 1 2x 1 + 2x 2 = 3 max p 1 + 3p 2 s.t. p 1 + 2p 2 = 1 p 1 + 2p 2 = 2
SOCP/SDP Duality (P) min c x s.t. Ax = b, x Q 0 (D) max b y s.t. A y + s = c, s Q 0 (P) min C, X s.t. A 1, X = b 1... A m, X = b m X 0 (D) max b y s.t. y i A i + S = C i S 0 Strong duality If p >, (P) is strictly feasible, then (D) is feasible and p = d If d < +, (D) is strictly feasible, then (P) is feasible and p = d If (P) and (D) has strictly feasible solutions, then both have optimal solutions. 33/35
Failure of SOCP Duality 34/35 inf (1, 1, 0)x s.t. (0, 0, 1)x = 1 x Q 0 sup s.t. y (0, 0, 1) y + z = (1, 1, 0) z Q 0 primal: min x 0 x 1, s.t. x 0 x1 2 + 1; It holds x 0 x 1 > 0 and x 0 x 1 0 if x 0 = x1 2 + 1. Hence, p = 0, no finite solution dual: sup y s.t. 1 1 + y 2. Hence, y = 0 p = d but primal is not attainable.
Failure of SDP Duality 35/35 Consider min 0 0 0 0 0 0, X 0 0 1 s.t. 1 0 0 0 0 0, X = 0 0 0 0 0 1 0 0 0 0, X = 2 1 0 2 X 0 max 2y 2 s.t. 1 0 0 0 0 0 y 1 + 0 1 0 0 0 0 y 2 0 0 0 0 0 0 0 0 0 1 0 2 0 0 1 primal: X = 0 0 0 0 0 0 0 0 1, p = 1 dual: y = (0, 0). Hence, d = 0 Both problems have finite optimal values, but p d