Lecture: Duality.


Lecture: Duality
http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html
Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes

Introduction

Lagrange dual problem
weak and strong duality
geometric interpretation
optimality conditions
perturbation and sensitivity analysis
examples
generalized inequalities

Lagrangian

standard form problem (not necessarily convex)
$$\min\ f_0(x) \quad \text{s.t.}\ f_i(x) \le 0,\ i=1,\dots,m,\qquad h_i(x) = 0,\ i=1,\dots,p$$
variable $x \in R^n$, domain $D$, optimal value $p^\star$

Lagrangian: $L : R^n \times R^m \times R^p \to R$, with $\operatorname{dom} L = D \times R^m \times R^p$,
$$L(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$
weighted sum of objective and constraint functions
$\lambda_i$ is the Lagrange multiplier associated with $f_i(x) \le 0$
$\nu_i$ is the Lagrange multiplier associated with $h_i(x) = 0$

Lagrange dual function

Lagrange dual function: $g : R^m \times R^p \to R$,
$$g(\lambda,\nu) = \inf_{x \in D} L(x,\lambda,\nu) = \inf_{x \in D}\Big( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \Big)$$
$g$ is concave, and can be $-\infty$ for some $\lambda,\nu$

lower bound property: if $\lambda \succeq 0$, then $g(\lambda,\nu) \le p^\star$
proof: if $\tilde{x}$ is feasible and $\lambda \succeq 0$, then
$$f_0(\tilde{x}) \ge L(\tilde{x},\lambda,\nu) \ge \inf_{x \in D} L(x,\lambda,\nu) = g(\lambda,\nu)$$
minimizing over all feasible $\tilde{x}$ gives $p^\star \ge g(\lambda,\nu)$
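
A minimal numerical sketch of the lower bound property (not part of the original slides), on the toy problem $\min x^2$ s.t. $1 - x \le 0$, assuming NumPy is available; the infimum over $x$ is approximated by a fine grid.

```python
# Check g(lambda) <= p* on  min x^2  s.t.  1 - x <= 0  (p* = 1 at x* = 1).
import numpy as np

xs = np.linspace(-5, 5, 100001)          # grid standing in for "inf over x"
f0 = xs**2
f1 = 1 - xs                              # constraint function f1(x) <= 0

p_star = f0[f1 <= 0].min()               # primal optimal value on the grid

for lam in [0.0, 0.5, 1.0, 2.0, 4.0]:    # any lambda >= 0
    g = (f0 + lam * f1).min()            # g(lambda) ~ inf_x L(x, lambda)
    assert g <= p_star + 1e-9
    print(f"lambda={lam:4.1f}  g(lambda)={g:7.4f}  <=  p*={p_star:.4f}")
# g(lambda) = lambda - lambda^2/4 is maximized at lambda = 2, where g(2) = 1 = p*.
```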

Least-norm solution of linear equations

$$\min\ x^T x \quad \text{s.t.}\ Ax = b$$
dual function
the Lagrangian is $L(x,\nu) = x^T x + \nu^T(Ax - b)$
to minimize $L$ over $x$, set the gradient equal to zero:
$$\nabla_x L(x,\nu) = 2x + A^T\nu = 0 \;\Longrightarrow\; x = -(1/2)A^T\nu$$
plug into $L$ to obtain $g$:
$$g(\nu) = L\big(-(1/2)A^T\nu,\ \nu\big) = -\tfrac{1}{4}\nu^T A A^T\nu - b^T\nu$$
a concave function of $\nu$

lower bound property: $p^\star \ge -\tfrac{1}{4}\nu^T A A^T\nu - b^T\nu$ for all $\nu$
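
A small NumPy sketch (not from the slides) that verifies this lower bound on random data, assuming $A$ has full row rank so the least-norm solution is $A^T(AA^T)^{-1}b$.

```python
# Verify p* >= -(1/4) nu^T A A^T nu - b^T nu for the least-norm problem.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))              # fat matrix, assumed full row rank
b = rng.standard_normal(3)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # least-norm solution A^T (A A^T)^{-1} b
p_star = x_star @ x_star

def g(nu):
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

for _ in range(5):                           # arbitrary nu: always a lower bound
    nu = rng.standard_normal(3)
    assert g(nu) <= p_star + 1e-9

nu_star = -2 * np.linalg.solve(A @ A.T, b)   # maximizer of g: the bound is tight
print(p_star, g(nu_star))                    # the two values agree (strong duality)
```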

Standard form LP

$$\min\ c^T x \quad \text{s.t.}\ Ax = b,\ x \succeq 0$$
dual function
the Lagrangian is
$$L(x,\lambda,\nu) = c^T x + \nu^T(Ax-b) - \lambda^T x = -b^T\nu + (c + A^T\nu - \lambda)^T x$$
$L$ is affine in $x$, hence
$$g(\lambda,\nu) = \inf_x L(x,\lambda,\nu) = \begin{cases} -b^T\nu & A^T\nu - \lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}$$
$g$ is linear on the affine domain $\{(\lambda,\nu) \mid A^T\nu - \lambda + c = 0\}$, hence concave

lower bound property: $p^\star \ge -b^T\nu$ if $A^T\nu + c \succeq 0$
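
A sketch (not from the slides) that solves a random standard form LP and its dual with scipy.optimize.linprog (assumed available) and compares the two optimal values; $c \succeq 0$ and $b = Ax_0$ with $x_0 \succ 0$ keep the primal feasible and bounded.

```python
# Check p* = d* for  min c^T x  s.t.  Ax = b, x >= 0  and its dual.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))
x0 = rng.random(6)                 # a strictly positive point ...
b = A @ x0                         # ... so the primal is strictly feasible
c = rng.random(6)                  # c >= 0 keeps the primal bounded below

# primal: min c^T x  s.t.  Ax = b, x >= 0   (linprog's default bounds are x >= 0)
primal = linprog(c, A_eq=A, b_eq=b)

# dual: max -b^T nu  s.t.  A^T nu + c >= 0, written as a minimization over free nu
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=(None, None))

p_star = primal.fun
d_star = -dual.fun
print(p_star, d_star)              # equal up to solver tolerance (strong duality)
```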

Equality constrained norm minimization

$$\min\ \|x\| \quad \text{s.t.}\ Ax = b$$
dual function
$$g(\nu) = \inf_x \big(\|x\| - \nu^T Ax + b^T\nu\big) = \begin{cases} b^T\nu & \|A^T\nu\|_* \le 1 \\ -\infty & \text{otherwise} \end{cases}$$
where $\|v\|_* = \sup_{\|u\| \le 1} u^T v$ is the dual norm of $\|\cdot\|$
proof: follows from $\inf_x (\|x\| - y^T x) = 0$ if $\|y\|_* \le 1$, $-\infty$ otherwise
if $\|y\|_* \le 1$, then $\|x\| - y^T x \ge 0$ for all $x$, with equality if $x = 0$
if $\|y\|_* > 1$, choose $x = tu$ where $\|u\| \le 1$, $u^T y = \|y\|_* > 1$: then
$$\|x\| - y^T x = t(\|u\| - \|y\|_*) \to -\infty \quad \text{as } t \to \infty$$

lower bound property: $p^\star \ge b^T\nu$ if $\|A^T\nu\|_* \le 1$

Two-way partitioning

$$\min\ x^T W x \quad \text{s.t.}\ x_i^2 = 1,\ i=1,\dots,n$$
a nonconvex problem; the feasible set contains $2^n$ discrete points
interpretation: partition $\{1,\dots,n\}$ into two sets; $W_{ij}$ is the cost of assigning $i,j$ to the same set, $-W_{ij}$ the cost of assigning them to different sets

dual function
$$g(\nu) = \inf_x \Big(x^T W x + \sum_i \nu_i (x_i^2 - 1)\Big) = \inf_x\ x^T\big(W + \operatorname{diag}(\nu)\big)x - \mathbf{1}^T\nu = \begin{cases} -\mathbf{1}^T\nu & W + \operatorname{diag}(\nu) \succeq 0 \\ -\infty & \text{otherwise} \end{cases}$$
lower bound property: $p^\star \ge -\mathbf{1}^T\nu$ if $W + \operatorname{diag}(\nu) \succeq 0$
example: $\nu = -\lambda_{\min}(W)\,\mathbf{1}$ gives the bound $p^\star \ge n\,\lambda_{\min}(W)$
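
A NumPy sketch (not from the slides) comparing the eigenvalue bound $n\lambda_{\min}(W)$ with the true partitioning optimum, found by brute force over all $2^n$ sign vectors for a small $n$.

```python
# Compare the dual bound n*lambda_min(W) with the exact two-way partitioning value.
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 10
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                              # symmetric cost matrix

bound = n * np.linalg.eigvalsh(W)[0]           # bound from nu = -lambda_min(W) * 1

p_star = min(np.array(s) @ W @ np.array(s)     # enumerate all 2^n sign vectors
             for s in itertools.product([-1, 1], repeat=n))

print(bound, p_star)                           # bound <= p_star always holds
```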

Lagrange dual and conjugate function

$$\min\ f_0(x) \quad \text{s.t.}\ Ax \preceq b,\ Cx = d$$
dual function
$$g(\lambda,\nu) = \inf_{x \in \operatorname{dom} f_0}\big(f_0(x) + (A^T\lambda + C^T\nu)^T x - b^T\lambda - d^T\nu\big) = -f_0^*(-A^T\lambda - C^T\nu) - b^T\lambda - d^T\nu$$
recall the definition of the conjugate $f^*(y) = \sup_{x \in \operatorname{dom} f}(y^T x - f(x))$
simplifies the derivation of the dual if the conjugate of $f_0$ is known
example: entropy maximization
$$f_0(x) = \sum_{i=1}^n x_i \log x_i, \qquad f_0^*(y) = \sum_{i=1}^n e^{y_i - 1}$$

The dual problem

Lagrange dual problem
$$\max\ g(\lambda,\nu) \quad \text{s.t.}\ \lambda \succeq 0$$
finds the best lower bound on $p^\star$ obtainable from the Lagrange dual function
a convex optimization problem; optimal value denoted $d^\star$
$\lambda,\nu$ are dual feasible if $\lambda \succeq 0$ and $(\lambda,\nu) \in \operatorname{dom} g$
often simplified by making the implicit constraint $(\lambda,\nu) \in \operatorname{dom} g$ explicit
example: standard form LP and its dual (using the dual function derived above)
$$\min\ c^T x \ \ \text{s.t.}\ Ax = b,\ x \succeq 0 \qquad\qquad \max\ -b^T\nu \ \ \text{s.t.}\ A^T\nu + c \succeq 0$$

Weak and strong duality

weak duality: $d^\star \le p^\star$
always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds for difficult problems; for example, solving the SDP
$$\max\ -\mathbf{1}^T\nu \quad \text{s.t.}\ W + \operatorname{diag}(\nu) \succeq 0$$
gives a lower bound for the two-way partitioning problem above

strong duality: $d^\star = p^\star$
does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater's constraint qualification

strong duality holds for a convex problem
$$\min\ f_0(x) \quad \text{s.t.}\ f_i(x) \le 0,\ i=1,\dots,m,\qquad Ax = b$$
if it is strictly feasible, i.e.,
$$\exists x \in \operatorname{int} D: \quad f_i(x) < 0,\ i=1,\dots,m, \quad Ax = b$$
also guarantees that the dual optimum is attained (if $p^\star > -\infty$)
can be sharpened: e.g., $\operatorname{int} D$ can be replaced with $\operatorname{relint} D$ (interior relative to the affine hull); linear inequalities do not need to hold with strict inequality, ...
there exist many other types of constraint qualifications

Inequality form LP

primal problem
$$\min\ c^T x \quad \text{s.t.}\ Ax \preceq b$$
dual function
$$g(\lambda) = \inf_x\big((c + A^T\lambda)^T x - b^T\lambda\big) = \begin{cases} -b^T\lambda & A^T\lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}$$
dual problem
$$\max\ -b^T\lambda \quad \text{s.t.}\ A^T\lambda + c = 0,\ \lambda \succeq 0$$
from Slater's condition: $p^\star = d^\star$ if $A\tilde{x} \prec b$ for some $\tilde{x}$
in fact, $p^\star = d^\star$ except when primal and dual are both infeasible

Quadratic program

primal problem (assume $P \in S^n_{++}$)
$$\min\ x^T P x \quad \text{s.t.}\ Ax \preceq b$$
dual function
$$g(\lambda) = \inf_x\big(x^T P x + \lambda^T(Ax - b)\big) = -\tfrac{1}{4}\lambda^T A P^{-1} A^T\lambda - b^T\lambda$$
dual problem
$$\max\ -\tfrac{1}{4}\lambda^T A P^{-1} A^T\lambda - b^T\lambda \quad \text{s.t.}\ \lambda \succeq 0$$
from Slater's condition: $p^\star = d^\star$ if $A\tilde{x} \prec b$ for some $\tilde{x}$
in fact, $p^\star = d^\star$ always
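
A NumPy sketch (not from the slides) of weak duality for this QP: for any feasible $x$ and any $\lambda \succeq 0$, the primal objective is at least $g(\lambda)$; the data are random and $x_0$ is strictly feasible by construction.

```python
# Weak duality for the QP: x^T P x >= -(1/4) lam^T A P^{-1} A^T lam - b^T lam.
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 7
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)                    # P positive definite
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
b = A @ x0 + rng.random(m)                 # so x0 is strictly feasible

Pinv = np.linalg.inv(P)

def g(lam):
    return -0.25 * lam @ (A @ Pinv @ A.T) @ lam - b @ lam

for _ in range(1000):                      # random multipliers and candidate points
    lam = rng.random(m)                    # any lambda >= 0
    x = x0 + 0.1 * rng.standard_normal(n)
    if np.all(A @ x <= b):                 # only test feasible x
        assert x @ P @ x >= g(lam) - 1e-9
print("weak duality held on all sampled feasible pairs")
```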

A nonconvex problem with strong duality

$$\min\ x^T A x + 2b^T x \quad \text{s.t.}\ x^T x \le 1$$
$A \not\succeq 0$, hence nonconvex
dual function: $g(\lambda) = \inf_x\big(x^T(A + \lambda I)x + 2b^T x - \lambda\big)$
unbounded below if $A + \lambda I \not\succeq 0$, or if $A + \lambda I \succeq 0$ and $b \notin \mathcal{R}(A + \lambda I)$
minimized by $x = -(A + \lambda I)^\dagger b$ otherwise: $g(\lambda) = -b^T(A + \lambda I)^\dagger b - \lambda$
dual problem and equivalent SDP:
$$\max\ -b^T(A+\lambda I)^\dagger b - \lambda \quad \text{s.t.}\ A + \lambda I \succeq 0,\ b \in \mathcal{R}(A+\lambda I)$$
$$\max\ -t - \lambda \quad \text{s.t.}\ \begin{bmatrix} A + \lambda I & b \\ b^T & t \end{bmatrix} \succeq 0$$
strong duality holds although the primal problem is not convex (not easy to show)

Geometric interpretation

for simplicity, consider a problem with one constraint $f_1(x) \le 0$
interpretation of the dual function:
$$g(\lambda) = \inf_{(u,t) \in G}(t + \lambda u), \qquad \text{where } G = \{(f_1(x), f_0(x)) \mid x \in D\}$$
(figure: the set $G$ in the $(u,t)$-plane, with the supporting line $\lambda u + t = g(\lambda)$ and the values $p^\star$, $d^\star$, $g(\lambda)$ marked on the $t$-axis)
$\lambda u + t = g(\lambda)$ is a (non-vertical) supporting hyperplane to $G$
the hyperplane intersects the $t$-axis at $t = g(\lambda)$

epigraph variation: same interpretation if $G$ is replaced with
$$A = \{(u,t) \mid f_1(x) \le u,\ f_0(x) \le t \text{ for some } x \in D\}$$
(figure: the epigraph-type set $A$ with the supporting line $\lambda u + t = g(\lambda)$ passing through $(0, p^\star)$)
strong duality holds if there is a non-vertical supporting hyperplane to $A$ at $(0, p^\star)$
for a convex problem, $A$ is convex, hence it has a supporting hyperplane at $(0, p^\star)$
Slater's condition: if there exists $(\tilde{u},\tilde{t}) \in A$ with $\tilde{u} < 0$, then supporting hyperplanes at $(0, p^\star)$ must be non-vertical

Complementary slackness

assume strong duality holds, $x^\star$ is primal optimal, and $(\lambda^\star,\nu^\star)$ is dual optimal
$$p^\star = f_0(x^\star) = g(\lambda^\star,\nu^\star) = \inf_x\Big(f_0(x) + \sum_{i=1}^m \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i(x)\Big) \le f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i(x^\star) \le f_0(x^\star)$$
hence, the two inequalities hold with equality
$x^\star$ minimizes $L(x,\lambda^\star,\nu^\star)$
$\lambda_i^\star f_i(x^\star) = 0$ for $i=1,\dots,m$ (known as complementary slackness):
$$\lambda_i^\star > 0 \Longrightarrow f_i(x^\star) = 0, \qquad f_i(x^\star) < 0 \Longrightarrow \lambda_i^\star = 0$$

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable $f_i$, $h_i$):
1. primal constraints: $f_i(x) \le 0$, $i=1,\dots,m$, $h_i(x) = 0$, $i=1,\dots,p$
2. dual constraints: $\lambda \succeq 0$
3. complementary slackness: $\lambda_i f_i(x) = 0$, $i=1,\dots,m$
4. the gradient of the Lagrangian with respect to $x$ vanishes:
$$\nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0$$
from the complementary slackness argument above: if strong duality holds and $x$, $\lambda$, $\nu$ are optimal, then they must satisfy the KKT conditions

KKT conditions for convex problem

if $\tilde{x}, \tilde{\lambda}, \tilde{\nu}$ satisfy KKT for a convex problem, then they are optimal:
from complementary slackness: $f_0(\tilde{x}) = L(\tilde{x},\tilde{\lambda},\tilde{\nu})$
from the 4th condition (and convexity): $g(\tilde{\lambda},\tilde{\nu}) = L(\tilde{x},\tilde{\lambda},\tilde{\nu})$
hence, $f_0(\tilde{x}) = g(\tilde{\lambda},\tilde{\nu})$

if Slater's condition is satisfied: $x$ is optimal if and only if there exist $\lambda,\nu$ that satisfy the KKT conditions
recall that Slater implies strong duality, and that the dual optimum is attained
generalizes the optimality condition $\nabla f_0(x) = 0$ for unconstrained problems

example: water-filling (assume $\alpha_i > 0$)

$$\min\ -\sum_{i=1}^n \log(x_i + \alpha_i) \quad \text{s.t.}\ x \succeq 0,\ \mathbf{1}^T x = 1$$
$x$ is optimal iff $x \succeq 0$, $\mathbf{1}^T x = 1$, and there exist $\lambda \in R^n$, $\nu \in R$ such that
$$\lambda \succeq 0, \qquad \lambda_i x_i = 0, \qquad \frac{1}{x_i + \alpha_i} + \lambda_i = \nu$$
if $\nu < 1/\alpha_i$: $\lambda_i = 0$ and $x_i = 1/\nu - \alpha_i$
if $\nu \ge 1/\alpha_i$: $\lambda_i = \nu - 1/\alpha_i$ and $x_i = 0$
determine $\nu$ from $\mathbf{1}^T x = \sum_{i=1}^n \max\{0,\ 1/\nu - \alpha_i\} = 1$

interpretation (figure: patches of height $\alpha_i$ flooded with water up to level $1/\nu$)
$n$ patches; the level of patch $i$ is at height $\alpha_i$
flood the area with a unit amount of water
the resulting water level is $1/\nu$
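
A NumPy sketch (not from the slides) that solves the KKT system above by bisection on the water level $1/\nu$, using randomly generated $\alpha_i > 0$.

```python
# Water-filling: find the level 1/nu such that sum_i max(0, 1/nu - alpha_i) = 1.
import numpy as np

rng = np.random.default_rng(4)
alpha = rng.random(5) + 0.1                # alpha_i > 0

def total_water(level):
    return np.maximum(0.0, level - alpha).sum()

lo, hi = alpha.min(), alpha.min() + 1.0    # total_water(lo) = 0, total_water(hi) >= 1
for _ in range(100):                       # bisection on the level 1/nu
    mid = 0.5 * (lo + hi)
    if total_water(mid) < 1.0:
        lo = mid
    else:
        hi = mid

level = 0.5 * (lo + hi)                    # = 1/nu
x = np.maximum(0.0, level - alpha)         # optimal allocation
print(level, x, x.sum())                   # x sums to 1 (up to bisection accuracy)
```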

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual
$$\min\ f_0(x) \ \ \text{s.t.}\ f_i(x) \le 0,\ i=1,\dots,m,\ \ h_i(x) = 0,\ i=1,\dots,p \qquad\qquad \max\ g(\lambda,\nu) \ \ \text{s.t.}\ \lambda \succeq 0$$
perturbed problem and its dual
$$\min\ f_0(x) \ \ \text{s.t.}\ f_i(x) \le u_i,\ i=1,\dots,m,\ \ h_i(x) = v_i,\ i=1,\dots,p \qquad\qquad \max\ g(\lambda,\nu) - u^T\lambda - v^T\nu \ \ \text{s.t.}\ \lambda \succeq 0$$
$x$ is the primal variable; $u$, $v$ are parameters
$p^\star(u,v)$ is the optimal value as a function of $u$, $v$
we are interested in information about $p^\star(u,v)$ that we can obtain from the solution of the unperturbed problem and its dual

global sensitivity result
assume strong duality holds for the unperturbed problem, and that $\lambda^\star,\nu^\star$ are dual optimal for the unperturbed problem
apply weak duality to the perturbed problem:
$$p^\star(u,v) \ge g(\lambda^\star,\nu^\star) - u^T\lambda^\star - v^T\nu^\star = p^\star(0,0) - u^T\lambda^\star - v^T\nu^\star$$
sensitivity interpretation
if $\lambda_i^\star$ is large: $p^\star$ increases greatly if we tighten constraint $i$ ($u_i < 0$)
if $\lambda_i^\star$ is small: $p^\star$ does not decrease much if we loosen constraint $i$ ($u_i > 0$)
if $\nu_i^\star$ is large and positive: $p^\star$ increases greatly if we take $v_i < 0$; if $\nu_i^\star$ is large and negative: $p^\star$ increases greatly if we take $v_i > 0$
if $\nu_i^\star$ is small and positive: $p^\star$ does not decrease much if we take $v_i > 0$; if $\nu_i^\star$ is small and negative: $p^\star$ does not decrease much if we take $v_i < 0$

local sensitivity: if (in addition) $p^\star(u,v)$ is differentiable at $(0,0)$, then
$$\lambda_i^\star = -\frac{\partial p^\star(0,0)}{\partial u_i}, \qquad \nu_i^\star = -\frac{\partial p^\star(0,0)}{\partial v_i}$$
proof (for $\lambda_i^\star$): from the global sensitivity result,
$$\frac{\partial p^\star(0,0)}{\partial u_i} = \lim_{t \searrow 0} \frac{p^\star(te_i,0) - p^\star(0,0)}{t} \ge -\lambda_i^\star$$
$$\frac{\partial p^\star(0,0)}{\partial u_i} = \lim_{t \nearrow 0} \frac{p^\star(te_i,0) - p^\star(0,0)}{t} \le -\lambda_i^\star$$
hence, equality
(figure: $p^\star(u)$ for a problem with one (inequality) constraint, lying above the line $p^\star(0) - \lambda^\star u$ and tangent to it at $u = 0$)
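
A tiny numerical sketch (not from the slides) of the local sensitivity formula on the toy problem $\min x^2$ s.t. $1 - x \le u$, whose perturbed optimal value $p^\star(u) = (1-u)^2$ (for $u \le 1$) and optimal multiplier $\lambda^\star = 2$ are known in closed form.

```python
# Local sensitivity: -lambda* should match dp*/du at u = 0.
def p_star(u):
    # perturbed optimal value of  min x^2  s.t.  1 - x <= u
    return (1.0 - u) ** 2 if u <= 1.0 else 0.0

lam_star = 2.0                                    # dual optimal multiplier at u = 0
h = 1e-6
derivative = (p_star(h) - p_star(-h)) / (2 * h)   # central finite difference
print(derivative, -lam_star)                      # both approximately -2
```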

Duality and problem reformulations

equivalent formulations of a problem can lead to very different duals
reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations
introduce new variables and equality constraints
make explicit constraints implicit or vice versa
transform objective or constraint functions, e.g., replace $f_0(x)$ by $\phi(f_0(x))$ with $\phi$ convex, increasing

Introducing new variables and equality constraints

$$\min\ f_0(Ax + b)$$
dual function is constant: $g = \inf_x L(x) = \inf_x f_0(Ax+b) = p^\star$
we have strong duality, but the dual is quite useless

reformulated problem and its dual
$$\min\ f_0(y) \ \ \text{s.t.}\ Ax + b - y = 0 \qquad\qquad \max\ b^T\nu - f_0^*(\nu) \ \ \text{s.t.}\ A^T\nu = 0$$
the dual function follows from
$$g(\nu) = \inf_{x,y}\big(f_0(y) - \nu^T y + \nu^T Ax + b^T\nu\big) = \begin{cases} -f_0^*(\nu) + b^T\nu & A^T\nu = 0 \\ -\infty & \text{otherwise} \end{cases}$$

norm approximation problem: $\min\ \|Ax - b\|$
$$\min\ \|y\| \quad \text{s.t.}\ y = Ax - b$$
can look up the conjugate of $\|\cdot\|$, or derive the dual directly
$$g(\nu) = \inf_{x,y}\big(\|y\| + \nu^T y - \nu^T Ax + b^T\nu\big) = \begin{cases} b^T\nu + \inf_y\big(\|y\| + \nu^T y\big) & A^T\nu = 0 \\ -\infty & \text{otherwise} \end{cases} = \begin{cases} b^T\nu & A^T\nu = 0,\ \|\nu\|_* \le 1 \\ -\infty & \text{otherwise} \end{cases}$$
dual of the norm approximation problem
$$\max\ b^T\nu \quad \text{s.t.}\ A^T\nu = 0,\ \|\nu\|_* \le 1$$
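
A sketch (not from the slides) of this primal/dual pair for the Euclidean norm, which is its own dual norm, using cvxpy (assumed installed) on random data.

```python
# Norm approximation and its dual for the 2-norm: the optimal values coincide.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x = cp.Variable(3)
primal = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2)))
p_star = primal.solve()

nu = cp.Variable(8)
dual = cp.Problem(cp.Maximize(b @ nu),
                  [A.T @ nu == 0, cp.norm(nu, 2) <= 1])
d_star = dual.solve()

print(p_star, d_star)       # equal up to solver tolerance (strong duality)
```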

Implicit constraints

LP with box constraints: primal and dual problem
$$\min\ c^T x \ \ \text{s.t.}\ Ax = b,\ -\mathbf{1} \preceq x \preceq \mathbf{1} \qquad\qquad \max\ -b^T\nu - \mathbf{1}^T\lambda_1 - \mathbf{1}^T\lambda_2 \ \ \text{s.t.}\ c + A^T\nu + \lambda_1 - \lambda_2 = 0,\ \lambda_1 \succeq 0,\ \lambda_2 \succeq 0$$
reformulation with the box constraints made implicit
$$\min\ f_0(x) = \begin{cases} c^T x & -\mathbf{1} \preceq x \preceq \mathbf{1} \\ \infty & \text{otherwise} \end{cases} \quad \text{s.t.}\ Ax = b$$
dual function
$$g(\nu) = \inf_{-\mathbf{1} \preceq x \preceq \mathbf{1}}\big(c^T x + \nu^T(Ax - b)\big) = -b^T\nu - \|A^T\nu + c\|_1$$
dual problem: $\max\ -b^T\nu - \|A^T\nu + c\|_1$

Problems with generalized inequalities

$$\min\ f_0(x) \quad \text{s.t.}\ f_i(x) \preceq_{K_i} 0,\ i=1,\dots,m,\qquad h_i(x) = 0,\ i=1,\dots,p$$
$\preceq_{K_i}$ is a generalized inequality on $R^{k_i}$

definitions are parallel to the scalar case:
the Lagrange multiplier for $f_i(x) \preceq_{K_i} 0$ is a vector $\lambda_i \in R^{k_i}$
the Lagrangian $L : R^n \times R^{k_1} \times \dots \times R^{k_m} \times R^p \to R$ is defined as
$$L(x,\lambda_1,\dots,\lambda_m,\nu) = f_0(x) + \sum_{i=1}^m \lambda_i^T f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$
the dual function $g : R^{k_1} \times \dots \times R^{k_m} \times R^p \to R$ is defined as
$$g(\lambda_1,\dots,\lambda_m,\nu) = \inf_{x \in D} L(x,\lambda_1,\dots,\lambda_m,\nu)$$

lower bound property: if $\lambda_i \succeq_{K_i^*} 0$, then $g(\lambda_1,\dots,\lambda_m,\nu) \le p^\star$
proof: if $\tilde{x}$ is feasible and $\lambda_i \succeq_{K_i^*} 0$, then
$$f_0(\tilde{x}) \ge f_0(\tilde{x}) + \sum_{i=1}^m \lambda_i^T f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \ge \inf_{x \in D} L(x,\lambda_1,\dots,\lambda_m,\nu) = g(\lambda_1,\dots,\lambda_m,\nu)$$
minimizing over all feasible $\tilde{x}$ gives $p^\star \ge g(\lambda_1,\dots,\lambda_m,\nu)$

dual problem
$$\max\ g(\lambda_1,\dots,\lambda_m,\nu) \quad \text{s.t.}\ \lambda_i \succeq_{K_i^*} 0,\ i=1,\dots,m$$
weak duality: $p^\star \ge d^\star$ always
strong duality: $p^\star = d^\star$ for convex problems with a constraint qualification (for example, Slater's: the primal problem is strictly feasible)

Semidefinite program

primal SDP ($A_i, C \in S^n$)
$$\min\ b^T y \quad \text{s.t.}\ y_1 A_1 + \dots + y_m A_m \preceq C$$
the Lagrange multiplier is a matrix $Z \in S^n$
Lagrangian $L(y,Z) = b^T y + \operatorname{tr}\big(Z(y_1 A_1 + \dots + y_m A_m - C)\big)$
dual function
$$g(Z) = \inf_y L(y,Z) = \begin{cases} -\operatorname{tr}(CZ) & \operatorname{tr}(A_i Z) + b_i = 0,\ i=1,\dots,m \\ -\infty & \text{otherwise} \end{cases}$$
dual SDP
$$\max\ -\operatorname{tr}(CZ) \quad \text{s.t.}\ Z \succeq 0,\ \operatorname{tr}(A_i Z) + b_i = 0,\ i=1,\dots,m$$
$p^\star = d^\star$ if the primal SDP is strictly feasible ($\exists y$ with $y_1 A_1 + \dots + y_m A_m \prec C$)
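
A sketch (not from the slides) of this primal/dual SDP pair in the special case $m = 1$, $A_1 = I$, $b_1 = -1$, for which both optimal values equal $-\lambda_{\min}(C)$; it uses cvxpy (assumed installed) and checks against NumPy's eigenvalue computation.

```python
# SDP pair with m = 1, A_1 = I, b_1 = -1:  p* = d* = -lambda_min(C).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
n = 4
M = rng.standard_normal((n, n))
C = (M + M.T) / 2                                  # symmetric data matrix

# primal: min -y  s.t.  y*I <= C, written with a symmetric slack S = C - y*I
y = cp.Variable()
S = cp.Variable((n, n), symmetric=True)
primal = cp.Problem(cp.Minimize(-y), [S == C - y * np.eye(n), S >> 0])
p_star = primal.solve()

# dual: max -tr(CZ)  s.t.  Z >= 0, tr(I Z) + b_1 = tr(Z) - 1 = 0
Z = cp.Variable((n, n), symmetric=True)
dual = cp.Problem(cp.Maximize(-cp.trace(C @ Z)), [Z >> 0, cp.trace(Z) == 1])
d_star = dual.solve()

print(p_star, d_star, -np.linalg.eigvalsh(C)[0])   # all three agree
```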

LP Duality

Strong duality: if an LP has an optimal solution, then so does its dual, and their optimal objective values are equal.

possible combinations of primal and dual status:

                    dual finite   dual unbounded   dual infeasible
primal finite       possible      impossible       impossible
primal unbounded    impossible    impossible       possible
primal infeasible   impossible    possible         possible

If $p^\star = -\infty$ (primal unbounded), then $d^\star \le p^\star = -\infty$, hence the dual is infeasible
If $d^\star = +\infty$ (dual unbounded), then $+\infty = d^\star \le p^\star$, hence the primal is infeasible

example with both primal and dual infeasible:
$$\min\ x_1 + 2x_2 \ \ \text{s.t.}\ x_1 + x_2 = 1,\ 2x_1 + 2x_2 = 3 \qquad\qquad \max\ p_1 + 3p_2 \ \ \text{s.t.}\ p_1 + 2p_2 = 1,\ p_1 + 2p_2 = 2$$
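
A sketch (not from the slides) that feeds the example above to scipy.optimize.linprog (assumed available) and checks that the solver reports both problems as infeasible.

```python
# Both the primal and the dual of this LP are infeasible.
from scipy.optimize import linprog

# primal: min x1 + 2*x2  s.t.  x1 + x2 = 1, 2*x1 + 2*x2 = 3  (x free)
primal = linprog(c=[1, 2],
                 A_eq=[[1, 1], [2, 2]], b_eq=[1, 3],
                 bounds=(None, None))

# dual: max p1 + 3*p2  s.t.  p1 + 2*p2 = 1, p1 + 2*p2 = 2, written as a min
dual = linprog(c=[-1, -3],
               A_eq=[[1, 2], [1, 2]], b_eq=[1, 2],
               bounds=(None, None))

print(primal.status, primal.message)   # status 2: problem is infeasible
print(dual.status, dual.message)       # status 2: problem is infeasible
```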

SOCP/SDP Duality

SOCP pair
$$\text{(P)}\quad \min\ c^T x \ \ \text{s.t.}\ Ax = b,\ x \succeq_Q 0 \qquad\qquad \text{(D)}\quad \max\ b^T y \ \ \text{s.t.}\ A^T y + s = c,\ s \succeq_Q 0$$
SDP pair
$$\text{(P)}\quad \min\ \langle C, X\rangle \ \ \text{s.t.}\ \langle A_i, X\rangle = b_i,\ i=1,\dots,m,\ \ X \succeq 0 \qquad\qquad \text{(D)}\quad \max\ b^T y \ \ \text{s.t.}\ \sum_i y_i A_i + S = C,\ S \succeq 0$$
Strong duality
if $p^\star > -\infty$ and (P) is strictly feasible, then (D) is feasible and $p^\star = d^\star$
if $d^\star < +\infty$ and (D) is strictly feasible, then (P) is feasible and $p^\star = d^\star$
if (P) and (D) both have strictly feasible solutions, then both have optimal solutions

Failure of SOCP Duality

$$\inf\ (1,-1,0)\,x \ \ \text{s.t.}\ (0,0,1)\,x = 1,\ x \succeq_Q 0 \qquad\qquad \sup\ y \ \ \text{s.t.}\ (0,0,1)^T y + z = (1,-1,0)^T,\ z \succeq_Q 0$$
primal: $\min\ x_0 - x_1$ s.t. $x_0 \ge \sqrt{x_1^2 + 1}$; it holds that $x_0 - x_1 > 0$, and $x_0 - x_1 \to 0$ along $x_0 = \sqrt{x_1^2 + 1}$ as $x_1 \to \infty$. Hence $p^\star = 0$, with no finite solution attaining it
dual: $\sup\ y$ s.t. $1 \ge \sqrt{1 + y^2}$. Hence $y = 0$ and $d^\star = 0$
$p^\star = d^\star$, but the primal optimum is not attained.

Failure of SDP Duality

Consider
$$\min\ \left\langle \begin{bmatrix} 0&0&0\\0&0&0\\0&0&1 \end{bmatrix}, X\right\rangle \quad \text{s.t.}\ \left\langle \begin{bmatrix} 1&0&0\\0&0&0\\0&0&0 \end{bmatrix}, X\right\rangle = 0,\ \ \left\langle \begin{bmatrix} 0&1&0\\1&0&0\\0&0&2 \end{bmatrix}, X\right\rangle = 2,\ \ X \succeq 0$$
and its dual
$$\max\ 2y_2 \quad \text{s.t.}\ \begin{bmatrix} 1&0&0\\0&0&0\\0&0&0 \end{bmatrix} y_1 + \begin{bmatrix} 0&1&0\\1&0&0\\0&0&2 \end{bmatrix} y_2 \preceq \begin{bmatrix} 0&0&0\\0&0&0\\0&0&1 \end{bmatrix}$$
primal: $X_{11} = 0$ together with $X \succeq 0$ forces $X_{12} = 0$, so $X_{33} = 1$; e.g.
$$X = \begin{bmatrix} 0&0&0\\0&0&0\\0&0&1 \end{bmatrix}$$
is optimal and $p^\star = 1$
dual: feasibility forces $y_2 = 0$, so $y = (0,0)$ is optimal. Hence, $d^\star = 0$
Both problems have finite optimal values, but $p^\star \ne d^\star$.