EE/AA 578, Univ of Washington, Fall 2016: Duality

7. Duality
EE/AA 578, Univ of Washington, Fall 2016

- Lagrange dual problem
- weak and strong duality
- geometric interpretation
- optimality conditions
- perturbation and sensitivity analysis
- examples
- generalized inequalities

7-1

Lagrangian

standard form problem (not necessarily convex)

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1,...,m
                h_i(x) = 0,  i = 1,...,p

variable x ∈ R^n, domain D, optimal value p*

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

    L(x, λ, ν) = f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x)

- weighted sum of objective and constraint functions
- λ_i is Lagrange multiplier associated with f_i(x) ≤ 0
- ν_i is Lagrange multiplier associated with h_i(x) = 0

7-2

Lagrange dual function

Lagrange dual function: g : R^m × R^p → R,

    g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
            = inf_{x ∈ D} ( f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

proof: if x̃ is feasible and λ ⪰ 0, then

    f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ, ν)

7-3
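
To make the lower bound property concrete, here is a tiny numeric check (a sketch using numpy, which is not part of the slides): for minimize x^2 subject to x ≥ 1, i.e. f_1(x) = 1 − x ≤ 0, the dual function is g(λ) = λ − λ²/4 and g(λ) ≤ p* = 1 for every λ ≥ 0.

```python
# Numeric check of the lower bound property g(lambda) <= p* on a toy problem:
# minimize x^2 subject to x >= 1, written as f1(x) = 1 - x <= 0.
# Here p* = 1, and g(lam) = inf_x (x^2 + lam*(1 - x)) = lam - lam^2/4 (minimizer x = lam/2).
import numpy as np

p_star = 1.0
for lam in np.linspace(0.0, 4.0, 9):          # a few multipliers lambda >= 0
    g = lam - lam**2 / 4.0                    # dual function in closed form
    assert g <= p_star + 1e-12                # lower bound property
    print(f"lambda = {lam:4.1f}   g(lambda) = {g:6.3f}  <=  p* = {p_star}")
# the bound is tight at lambda = 2, where g(2) = 1 = p*
```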

Least-norm solution of linear equations

    minimize    x^T x
    subject to  Ax = b

dual function

- Lagrangian is L(x, ν) = x^T x + ν^T (Ax − b)
- to minimize L over x, set gradient equal to zero:

      ∇_x L(x, ν) = 2x + A^T ν = 0   ⟹   x = −(1/2) A^T ν

- plug this into L to obtain g:

      g(ν) = L(−(1/2) A^T ν, ν) = −(1/4) ν^T A A^T ν − b^T ν

  a concave function of ν

lower bound property: p* ≥ −(1/4) ν^T A A^T ν − b^T ν for all ν

7-4
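
A quick numeric check of this slide (a sketch, assumes numpy): the lower bound is tight at ν* = −2(AA^T)^{-1} b, where it equals p* = b^T (AA^T)^{-1} b.

```python
# Numeric check: -(1/4) nu^T A A^T nu - b^T nu <= p* for any nu, with equality
# at nu* = -2 (A A^T)^{-1} b.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))               # fat matrix: Ax = b is underdetermined
b = rng.standard_normal(3)
M = A @ A.T

x_star = A.T @ np.linalg.solve(M, b)          # least-norm solution
p_star = x_star @ x_star

nu = rng.standard_normal(3)                   # an arbitrary nu gives a lower bound
g_nu = -0.25 * nu @ M @ nu - b @ nu
nu_star = -2.0 * np.linalg.solve(M, b)        # dual optimum
g_star = -0.25 * nu_star @ M @ nu_star - b @ nu_star

print(p_star, g_nu, g_star)                   # g_nu <= p_star, g_star == p_star
assert g_nu <= p_star + 1e-9 and abs(g_star - p_star) < 1e-9
```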

Standard form LP

    minimize    c^T x
    subject to  Ax = b,  x ⪰ 0

dual function

- Lagrangian is

      L(x, λ, ν) = c^T x + ν^T (Ax − b) − λ^T x
                 = −b^T ν + (c + A^T ν − λ)^T x

- L is linear in x, hence

      g(λ, ν) = inf_x L(x, λ, ν) = { −b^T ν   if A^T ν − λ + c = 0
                                   { −∞       otherwise

  g is linear on affine domain {(λ, ν) | A^T ν − λ + c = 0}, hence concave

lower bound property: p* ≥ −b^T ν if A^T ν + c ⪰ 0

7-5
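
As a sanity check, the sketch below (assumes scipy, not part of the slides) solves a random standard form LP and the dual derived above with linprog and compares optimal values.

```python
# Sketch: random standard form LP, constructed to be primal and dual feasible,
# solved together with its dual  maximize -b^T nu  s.t.  A^T nu + c >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 4, 8
A = rng.standard_normal((m, n))
b = A @ rng.uniform(0.1, 1.0, n)              # b = A x0 with x0 > 0: primal strictly feasible
c = A.T @ rng.standard_normal(m) + rng.uniform(0.1, 1.0, n)
# with c = A^T nu0 + s, s > 0, the choice nu = -nu0 gives A^T nu + c = s > 0: dual feasible

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * n)
# dual: maximize -b^T nu  s.t.  A^T nu + c >= 0  <=>  minimize b^T nu  s.t.  -A^T nu <= c
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)] * m)

p_star, d_star = primal.fun, -dual.fun
print(p_star, d_star)                          # equal up to solver tolerance
assert abs(p_star - d_star) < 1e-5
```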

Equality constrained norm minimization

    minimize    ‖x‖
    subject to  Ax = b

dual function

    g(ν) = inf_x ( ‖x‖ − ν^T Ax + b^T ν ) = { b^T ν   if ‖A^T ν‖_* ≤ 1
                                            { −∞      otherwise

where ‖v‖_* = sup_{‖u‖ ≤ 1} u^T v is dual norm of ‖·‖

proof: follows from inf_x ( ‖x‖ − y^T x ) = 0 if ‖y‖_* ≤ 1, −∞ otherwise

- if ‖y‖_* ≤ 1, then ‖x‖ − y^T x ≥ 0 for all x, with equality if x = 0
- if ‖y‖_* > 1, choose x = tu where ‖u‖ ≤ 1, u^T y = ‖y‖_* > 1:

      ‖x‖ − y^T x = t( ‖u‖ − ‖y‖_* ) → −∞   as t → ∞

lower bound property: p* ≥ b^T ν if ‖A^T ν‖_* ≤ 1

7-6

Example: two-way partitioning

    minimize    x^T W x
    subject to  x_i^2 = 1,  i = 1,...,n

- a nonconvex problem; feasible set contains 2^n discrete points
- interpretation: partition {1,...,n} in two sets; W_ij is cost of assigning i, j
  to the same set; −W_ij is cost of assigning them to different sets

dual function

    g(ν) = inf_x ( x^T W x + Σ_i ν_i (x_i^2 − 1) )
         = inf_x ( x^T (W + diag(ν)) x − 1^T ν )
         = { −1^T ν   if W + diag(ν) ⪰ 0
           { −∞       otherwise

lower bound property: p* ≥ −1^T ν if W + diag(ν) ⪰ 0

example: ν = −λ_min(W) 1 gives bound p* ≥ n λ_min(W)

7-7
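
The spectral bound p* ≥ n λ_min(W) can be checked by brute force on a small instance (a sketch, assumes numpy):

```python
# Brute-force check (small n, so the 2^n feasible points can be enumerated) of the
# bound p* >= n * lambda_min(W) obtained from the choice nu = -lambda_min(W) * 1.
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 10
W = rng.standard_normal((n, n)); W = (W + W.T) / 2        # symmetric cost matrix

p_star = min(np.array(s) @ W @ np.array(s)
             for s in itertools.product([-1.0, 1.0], repeat=n))
bound = n * np.linalg.eigvalsh(W).min()

print(p_star, bound)
assert bound <= p_star + 1e-9
```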

The dual problem

Lagrange dual problem

    maximize    g(λ, ν)
    subject to  λ ⪰ 0

- finds best lower bound on p*, obtained from Lagrange dual function
- a convex optimization problem; optimal value denoted d*
- λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
- often simplified by making implicit constraint (λ, ν) ∈ dom g explicit

example: standard form LP and its dual (page 7-5)

    minimize    c^T x                  maximize    −b^T ν
    subject to  Ax = b,  x ⪰ 0         subject to  A^T ν + c ⪰ 0

7-8

Weak and strong duality

weak duality: d* ≤ p*

- always holds (for convex and nonconvex problems)
- can be used to find nontrivial lower bounds for difficult problems

  for example, solving the SDP

      maximize    −1^T ν
      subject to  W + diag(ν) ⪰ 0

  gives a lower bound for the two-way partitioning problem on page 7-7

strong duality: d* = p*

- does not hold in general
- (usually) holds for convex problems
- conditions that guarantee strong duality in convex problems are called
  constraint qualifications

7-9
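
The partitioning SDP bound can be computed with an off-the-shelf solver; the sketch below (assumes cvxpy with an SDP-capable solver such as SCS, none of which appear in the slides) compares it with brute-force enumeration on a small instance.

```python
# Sketch: the SDP lower bound  max{ -1^T nu : W + diag(nu) >= 0 }  for a small
# two-way partitioning instance, compared against brute-force enumeration.
import itertools
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n = 8
W = rng.standard_normal((n, n)); W = (W + W.T) / 2

p_star = min(np.array(s) @ W @ np.array(s)
             for s in itertools.product([-1.0, 1.0], repeat=n))

nu = cp.Variable(n)
sdp = cp.Problem(cp.Maximize(-cp.sum(nu)), [W + cp.diag(nu) >> 0])
sdp.solve()

print(f"brute force p* = {p_star:.4f}   SDP bound d* = {sdp.value:.4f}")
# weak duality guarantees sdp.value <= p_star (up to solver tolerance)
```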

Slater's constraint qualification

strong duality holds for a convex problem

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1,...,m
                Ax = b

if it is strictly feasible, i.e.,

    ∃ x ∈ int D :  f_i(x) < 0,  i = 1,...,m,   Ax = b

- also guarantees that the dual optimum is attained (if p* > −∞)
- can be sharpened: e.g., can replace int D with relative interior; linear
  inequalities do not need to hold with strict inequality, etc.
- there exist many other types of constraint qualifications

7-10

Inequality form LP

primal problem

    minimize    c^T x
    subject to  Ax ⪯ b

dual function

    g(λ) = inf_x ( (c + A^T λ)^T x − b^T λ ) = { −b^T λ   if A^T λ + c = 0
                                               { −∞       otherwise

dual problem

    maximize    −b^T λ
    subject to  A^T λ + c = 0,  λ ⪰ 0

- from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
- in fact, p* = d* except when primal and dual are infeasible

7-11

Quadratic program

primal problem (assume P ∈ S^n_++)

    minimize    x^T P x
    subject to  Ax ⪯ b

dual function

    g(λ) = inf_x ( x^T P x + λ^T (Ax − b) ) = −(1/4) λ^T A P^{−1} A^T λ − b^T λ

dual problem

    maximize    −(1/4) λ^T A P^{−1} A^T λ − b^T λ
    subject to  λ ⪰ 0

- from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
- in fact, p* = d* always

7-12
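
A sketch (assumes cvxpy; random illustrative data) that solves an instance of this QP and its explicit dual and confirms p* = d*. The dual quadratic term −(1/4) λ^T A P^{−1} A^T λ is written as −(1/4)‖R^T A^T λ‖² with P^{−1} = R R^T so the model is easy to express in DCP form.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, m = 5, 8
P = rng.standard_normal((n, n)); P = P @ P.T + np.eye(n); P = (P + P.T) / 2  # P in S^n_++
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)                  # guarantees a feasible point

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [A @ x <= b])
primal.solve()

R = np.linalg.cholesky(np.linalg.inv(P))        # P^{-1} = R R^T
lam = cp.Variable(m, nonneg=True)
dual = cp.Problem(cp.Maximize(-0.25 * cp.sum_squares(R.T @ (A.T @ lam)) - b @ lam))
dual.solve()

print(primal.value, dual.value)                 # p* = d* always for this QP (slide 7-12)
```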

Economic (price) interpretation

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1,...,m

f_0(x) is cost of operating firm at operating condition x; constraints give
resource limits

suppose the firm:

- can violate f_i(x) ≤ 0 by paying additional cost of λ_i (in dollars per unit
  violation), i.e., it incurs cost λ_i f_i(x) if f_i(x) > 0
- can sell unused portion of ith constraint at same price, i.e., it gains profit
  −λ_i f_i(x) if f_i(x) < 0

total cost to firm to operate at x at constraint prices λ_i:

    L(x, λ) = f_0(x) + Σ_{i=1}^m λ_i f_i(x)

7-13

interpretations:

- dual function: g(λ) = inf_x L(x, λ) is optimal cost to firm at constraint
  prices λ_i
- weak duality: cost can be lowered if firm allowed to pay for violated
  constraints (and get paid for non-tight ones)
- duality gap: advantage to firm under this scenario
- strong duality: λ* give prices for which firm has no advantage in being allowed
  to violate constraints

7-14

Min-max & saddle-point interpretation

can express primal and dual problems in a more symmetric form:

    sup_{λ ⪰ 0} ( f_0(x) + Σ_{i=1}^m λ_i f_i(x) ) = { f_0(x)   if f_i(x) ≤ 0 for all i
                                                    { +∞       otherwise

so p* = inf_x sup_{λ ⪰ 0} L(x, λ); also, by definition, d* = sup_{λ ⪰ 0} inf_x L(x, λ)

- weak duality can be expressed as

      sup_{λ ⪰ 0} inf_x L(x, λ) ≤ inf_x sup_{λ ⪰ 0} L(x, λ)

- strong duality when equality holds: means we can switch the order of
  minimization over x and maximization over λ ⪰ 0
- if x* and λ* are primal and dual optimal and strong duality holds, they form a
  saddle point for the Lagrangian (converse also true)

7-15

Geometric interpretation

for simplicity, consider problem with one constraint f_1(x) ≤ 0

interpretation of dual function: g(λ) = inf_{(u,t) ∈ G} (t + λu), where

    G = {(f_1(x), f_0(x)) | x ∈ D}

[figure: two sketches of the set G in the (u, t)-plane; the line λu + t = g(λ)
supports G from below, its t-intercept is g(λ), and p* and d* are marked on the
t-axis]

- λu + t = g(λ) is (non-vertical) supporting hyperplane to G
- hyperplane intersects t-axis at t = g(λ)

7-16

epigraph variation: same interpretation if G is replaced with

    A = {(u, t) | f_1(x) ≤ u, f_0(x) ≤ t for some x ∈ D}

[figure: the set A in the (u, t)-plane with the supporting line λu + t = g(λ);
p* and g(λ) are marked on the t-axis]

strong duality

- holds if there is a non-vertical supporting hyperplane to A at (0, p*)
- for convex problem, A is convex, hence has supporting hyperplane at (0, p*)
- Slater's condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then supporting
  hyperplanes at (0, p*) must be non-vertical

7-17

Complementary slackness

assume strong duality holds, x* is primal optimal, (λ*, ν*) is dual optimal

    f_0(x*) = g(λ*, ν*) = inf_x ( f_0(x) + Σ_{i=1}^m λ*_i f_i(x) + Σ_{i=1}^p ν*_i h_i(x) )
                        ≤ f_0(x*) + Σ_{i=1}^m λ*_i f_i(x*) + Σ_{i=1}^p ν*_i h_i(x*)
                        ≤ f_0(x*)

hence, the two inequalities hold with equality

- x* minimizes L(x, λ*, ν*)
- λ*_i f_i(x*) = 0 for i = 1,...,m (known as complementary slackness):

      λ*_i > 0 ⟹ f_i(x*) = 0,        f_i(x*) < 0 ⟹ λ*_i = 0

7-18

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with
differentiable f_i, h_i):

1. primal constraints: f_i(x) ≤ 0, i = 1,...,m,  h_i(x) = 0, i = 1,...,p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λ_i f_i(x) = 0, i = 1,...,m
4. gradient of Lagrangian with respect to x vanishes:

       ∇f_0(x) + Σ_{i=1}^m λ_i ∇f_i(x) + Σ_{i=1}^p ν_i ∇h_i(x) = 0

from page 7-18: if strong duality holds and x, λ, ν are optimal, then they must
satisfy the KKT conditions

7-19
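
The four conditions are easy to check numerically; the sketch below (assumes cvxpy; the QP is an illustrative stand-in for a generic convex problem) solves a small problem, reads off the multipliers, and evaluates each condition.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n, m = 4, 6
a = rng.standard_normal(n)
G = rng.standard_normal((m, n))
h = rng.uniform(0.1, 1.0, m)

x = cp.Variable(n)
constr = [G @ x <= h]                           # f_i(x) = g_i^T x - h_i <= 0
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - a)), constr)
prob.solve()

x_opt = x.value
lam = constr[0].dual_value                      # Lagrange multipliers lambda
f = G @ x_opt - h
grad = 2 * (x_opt - a) + G.T @ lam              # gradient of the Lagrangian at x_opt

print("max constraint violation :", max(0.0, f.max()))      # 1. primal feasibility (~0)
print("most negative multiplier :", lam.min())               # 2. dual feasibility (>= 0)
print("max |lambda_i * f_i|     :", np.abs(lam * f).max())   # 3. complementary slackness (~0)
print("max |grad_x Lagrangian|  :", np.abs(grad).max())      # 4. stationarity (~0)
# all residuals are zero up to solver tolerance
```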

KKT conditions for convex problem

if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

- from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)
- from 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f_0(x̃) = g(λ̃, ν̃)

if Slater's condition is satisfied:

x is optimal if and only if there exist λ, ν that satisfy the KKT conditions

- recall that Slater implies strong duality, and dual optimum is attained
- generalizes optimality condition ∇f_0(x) = 0 for unconstrained problem

7-20

example: water-filling (assume α_i > 0)

    minimize    −Σ_{i=1}^n log(x_i + α_i)
    subject to  x ⪰ 0,  1^T x = 1

x is optimal iff x ⪰ 0, 1^T x = 1, and there exist λ ∈ R^n, ν ∈ R such that

    λ ⪰ 0,     λ_i x_i = 0,     1/(x_i + α_i) + λ_i = ν

- if ν < 1/α_i: λ_i = 0 and x_i = 1/ν − α_i
- if ν ≥ 1/α_i: λ_i = ν − 1/α_i and x_i = 0
- determine ν from 1^T x = Σ_{i=1}^n max{0, 1/ν − α_i} = 1

interpretation

- n patches; level of patch i is at height α_i
- flood area with unit amount of water; resulting level is 1/ν

[figure: patches of height α_i flooded to the common level 1/ν; the depth of
water above patch i is x_i]

7-21
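
The last bullet is a one-dimensional root-finding problem in ν; a minimal water-filling sketch (assumes numpy, not part of the slides) solves it by bisection on the water level t = 1/ν.

```python
# Water-filling sketch: bisect on the water level t = 1/nu until
# sum_i max(0, t - alpha_i) = 1, then read off x_i = max(0, t - alpha_i).
import numpy as np

def water_fill(alpha, total=1.0, iters=100):
    """Solve: minimize -sum log(x_i + alpha_i) s.t. x >= 0, 1^T x = total."""
    lo, hi = alpha.min(), alpha.max() + total   # the water level lies in this bracket
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        if np.maximum(0.0, t - alpha).sum() > total:
            hi = t
        else:
            lo = t
    x = np.maximum(0.0, t - alpha)
    return x, 1.0 / t                           # optimal x and multiplier nu

alpha = np.array([0.3, 0.8, 1.5, 0.2, 2.0])
x, nu = water_fill(alpha)
print(x, x.sum(), nu)                           # x >= 0, sums to 1, nu = 1/level
```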

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

    minimize    f_0(x)                        maximize    g(λ, ν)
    subject to  f_i(x) ≤ 0,  i = 1,...,m      subject to  λ ⪰ 0
                h_i(x) = 0,  i = 1,...,p

perturbed problem and its dual

    minimize    f_0(x)                        maximize    g(λ, ν) − u^T λ − v^T ν
    subject to  f_i(x) ≤ u_i,  i = 1,...,m    subject to  λ ⪰ 0
                h_i(x) = v_i,  i = 1,...,p

- x is primal variable; u, v are parameters
- p*(u, v) is optimal value as a function of u, v
- we are interested in information about p*(u, v) that we can obtain from the
  solution of the unperturbed problem and its dual

7-22

global sensitivity result

assume strong duality holds for unperturbed problem, and that λ*, ν* are dual
optimal for unperturbed problem

apply weak duality to perturbed problem:

    p*(u, v) ≥ g(λ*, ν*) − u^T λ* − v^T ν* = p*(0, 0) − u^T λ* − v^T ν*

sensitivity interpretation

- if λ*_i large: p* increases greatly if we tighten constraint i (u_i < 0)
- if λ*_i small: p* does not decrease much if we loosen constraint i (u_i > 0)
- if ν*_i large and positive: p* increases greatly if we take v_i < 0;
  if ν*_i large and negative: p* increases greatly if we take v_i > 0
- if ν*_i small and positive: p* does not decrease much if we take v_i > 0;
  if ν*_i small and negative: p* does not decrease much if we take v_i < 0

7-23

local sensitivity: if (in addition) p*(u, v) is differentiable at (0, 0), then

    λ*_i = −∂p*(0, 0)/∂u_i,        ν*_i = −∂p*(0, 0)/∂v_i

proof (for λ*_i): from global sensitivity result,

    ∂p*(0, 0)/∂u_i = lim_{t↘0} ( p*(t e_i, 0) − p*(0, 0) ) / t ≥ −λ*_i
    ∂p*(0, 0)/∂u_i = lim_{t↗0} ( p*(t e_i, 0) − p*(0, 0) ) / t ≤ −λ*_i

hence, equality

[figure: p*(u) for a problem with one (inequality) constraint; the line
p*(0) − λ*u lies below the graph of p*(u) and touches it at u = 0]

7-24
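
A sketch (assumes cvxpy; the perturbed QP minimize ‖x − a‖² subject to Gx ⪯ h + u and the step size are illustrative choices, not from the slides) that checks λ*_i = −∂p*(0)/∂u_i by finite differences.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
n, m = 4, 6
a = 2.0 * rng.standard_normal(n)
G = rng.standard_normal((m, n))
h = rng.uniform(0.1, 1.0, m)

def p_star(u):
    x = cp.Variable(n)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(x - a)), [G @ x <= h + u])
    prob.solve()
    return prob.value, prob.constraints[0].dual_value

p0, lam = p_star(np.zeros(m))
i, t = 0, 1e-4                                   # perturb constraint i by +/- t
e_i = np.eye(m)[i]
p_plus, _ = p_star(t * e_i)
p_minus, _ = p_star(-t * e_i)

print("finite difference -dp*/du_i :", -(p_plus - p_minus) / (2 * t))
print("lambda_i*                   :", lam[i])
# the two agree when p*(u) is differentiable at u = 0 (generic for random data)
```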

Duality and problem reformulations

- equivalent formulations of a problem can lead to very different duals
- reformulating the primal problem can be useful when the dual is difficult to
  derive, or uninteresting

common reformulations

- introduce new variables and equality constraints
- make explicit constraints implicit or vice versa
- transform objective or constraint functions,
  e.g., replace f_0(x) by φ(f_0(x)) with φ convex, increasing

7-25

Introducing new variables and equality constraints

    minimize    f_0(Ax + b)

- dual function is constant: g = inf_x L(x) = inf_x f_0(Ax + b) = p*
- we have strong duality, but dual is quite useless

reformulated problem and its dual

    minimize    f_0(y)                   maximize    b^T ν − f_0*(ν)
    subject to  Ax + b − y = 0           subject to  A^T ν = 0

dual function follows from

    g(ν) = inf_{x,y} ( f_0(y) − ν^T y + ν^T Ax + b^T ν )
         = { −f_0*(ν) + b^T ν   if A^T ν = 0
           { −∞                 otherwise

7-26

norm approximation problem: minimize ‖Ax − b‖

    minimize    ‖y‖
    subject to  y = Ax − b

can look up conjugate of ‖·‖, or derive dual directly (see page 7-4)

    g(ν) = inf_{x,y} ( ‖y‖ + ν^T y − ν^T Ax + b^T ν )
         = { b^T ν + inf_y ( ‖y‖ + ν^T y )   if A^T ν = 0
           { −∞                              otherwise
         = { b^T ν   if A^T ν = 0,  ‖ν‖_* ≤ 1
           { −∞      otherwise

dual of norm approximation problem

    maximize    b^T ν
    subject to  A^T ν = 0,  ‖ν‖_* ≤ 1

7-27

Implicit constraints

LP with box constraints: primal and dual problem

    minimize    c^T x                 maximize    −b^T ν − 1^T λ_1 − 1^T λ_2
    subject to  Ax = b                subject to  c + A^T ν + λ_1 − λ_2 = 0
                −1 ⪯ x ⪯ 1                        λ_1 ⪰ 0,  λ_2 ⪰ 0

reformulation with box constraints made implicit

    minimize    f_0(x) = { c^T x   if −1 ⪯ x ⪯ 1
                         { ∞       otherwise
    subject to  Ax = b

dual function

    g(ν) = inf_{−1 ⪯ x ⪯ 1} ( c^T x + ν^T (Ax − b) ) = −b^T ν − ‖A^T ν + c‖_1

dual problem: maximize −b^T ν − ‖A^T ν + c‖_1

7-28
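
A sketch (assumes cvxpy; illustrative data) comparing the box-constrained LP with the implicit-constraint dual maximize −b^T ν − ‖A^T ν + c‖_1; their optimal values should coincide.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
m, n = 3, 7
A = rng.standard_normal((m, n))
b = A @ rng.uniform(-0.9, 0.9, n)               # a strictly feasible point exists
c = rng.standard_normal(n)

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(c @ x), [A @ x == b, cp.abs(x) <= 1])
primal.solve()

nu = cp.Variable(m)
dual = cp.Problem(cp.Maximize(-b @ nu - cp.norm(A.T @ nu + c, 1)))
dual.solve()

print(primal.value, dual.value)                 # equal by strong duality (Slater holds)
```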

Problems with generalized inequalities

    minimize    f_0(x)
    subject to  f_i(x) ⪯_{K_i} 0,  i = 1,...,m
                h_i(x) = 0,  i = 1,...,p

⪯_{K_i} is generalized inequality on R^{k_i}

definitions are parallel to scalar case:

- Lagrange multiplier for f_i(x) ⪯_{K_i} 0 is vector λ_i ∈ R^{k_i}
- Lagrangian L : R^n × R^{k_1} × ··· × R^{k_m} × R^p → R, is defined as

      L(x, λ_1, ..., λ_m, ν) = f_0(x) + Σ_{i=1}^m λ_i^T f_i(x) + Σ_{i=1}^p ν_i h_i(x)

- dual function g : R^{k_1} × ··· × R^{k_m} × R^p → R, is defined as

      g(λ_1, ..., λ_m, ν) = inf_{x ∈ D} L(x, λ_1, ..., λ_m, ν)

7-29

lower bound property: if λ_i ⪰_{K_i^*} 0, then g(λ_1, ..., λ_m, ν) ≤ p*

proof: if x̃ is feasible and λ_i ⪰_{K_i^*} 0, then

    f_0(x̃) ≥ f_0(x̃) + Σ_{i=1}^m λ_i^T f_i(x̃) + Σ_{i=1}^p ν_i h_i(x̃)
            ≥ inf_{x ∈ D} L(x, λ_1, ..., λ_m, ν) = g(λ_1, ..., λ_m, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ_1, ..., λ_m, ν)

dual problem

    maximize    g(λ_1, ..., λ_m, ν)
    subject to  λ_i ⪰_{K_i^*} 0,  i = 1,...,m

- weak duality: p* ≥ d* always
- strong duality: p* = d* for convex problem with constraint qualification
  (for example, Slater's: primal problem is strictly feasible)

7-30

Semidefinite program

primal SDP (F_i, G ∈ S^k)

    minimize    c^T x
    subject to  x_1 F_1 + ··· + x_n F_n ⪯ G

- Lagrange multiplier is matrix Z ∈ S^k
- Lagrangian: L(x, Z) = c^T x + tr( Z (x_1 F_1 + ··· + x_n F_n − G) )
- dual function

      g(Z) = inf_x L(x, Z) = { −tr(GZ)   if tr(F_i Z) + c_i = 0, i = 1,...,n
                             { −∞        otherwise

dual SDP

    maximize    −tr(GZ)
    subject to  Z ⪰ 0,  tr(F_i Z) + c_i = 0,  i = 1,...,n

p* = d* if primal SDP is strictly feasible (∃ x with x_1 F_1 + ··· + x_n F_n ≺ G)

7-31
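
A sketch (assumes cvxpy with an SDP-capable solver, e.g. SCS; the data are illustrative) of a small primal/dual SDP pair constructed to be strictly feasible on both sides, so p* = d*: G ≻ 0 makes x = 0 strictly feasible, and c_i = −tr(F_i Z_0) with Z_0 ≻ 0 makes the dual strictly feasible.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
k, n = 4, 3
F = [rng.standard_normal((k, k)) for _ in range(n)]
F = [(Fi + Fi.T) / 2 for Fi in F]                         # F_i in S^k
G = np.eye(k)
Z0 = rng.standard_normal((k, k)); Z0 = Z0 @ Z0.T + np.eye(k)
c = np.array([-np.trace(Fi @ Z0) for Fi in F])

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(c @ x),
                    [sum(x[i] * F[i] for i in range(n)) << G])
primal.solve()

Z = cp.Variable((k, k), PSD=True)
dual = cp.Problem(cp.Maximize(-cp.trace(G @ Z)),
                  [cp.trace(F[i] @ Z) + c[i] == 0 for i in range(n)])
dual.solve()

print(primal.value, dual.value)                           # equal up to solver tolerance
```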

A nonconvex problem with strong duality

    minimize    x^T A x + 2 b^T x
    subject to  x^T x ≤ 1

A ⋡ 0, hence nonconvex

dual function: g(λ) = inf_x ( x^T (A + λI) x + 2 b^T x − λ )

- unbounded below if A + λI ⋡ 0, or if A + λI ⪰ 0 and b ∉ R(A + λI)
- minimized by x = −(A + λI)^† b otherwise: g(λ) = −b^T (A + λI)^† b − λ

dual problem and equivalent SDP:

    maximize    −b^T (A + λI)^† b − λ        maximize    −t − λ
    subject to  A + λI ⪰ 0                   subject to  [ A + λI   b ] ⪰ 0
                b ∈ R(A + λI)                            [ b^T      t ]

strong duality although primal problem is not convex (not easy to show; if
interested, see appendix B in the book)

7-32
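
For a 2-D instance, strong duality can be verified directly (a sketch, assumes numpy): since the Hessian 2A is indefinite, no interior point can be a minimum, so the primal minimum lies on the unit circle and a fine grid over the circle approximates p*; a grid over λ > −λ_min(A) approximates the dual value.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, -2.0]])                     # A not PSD: nonconvex objective
b = np.array([0.5, 1.0])

# primal: minimum of x^T A x + 2 b^T x over the unit circle
theta = np.linspace(0.0, 2.0 * np.pi, 200001)
X = np.stack([np.cos(theta), np.sin(theta)])
p_star = (np.einsum('ij,ij->j', X, A @ X) + 2.0 * b @ X).min()

# dual: g(lam) = -b^T (A + lam I)^{-1} b - lam, which needs lam > -lambda_min(A) = 2
lams = np.linspace(2.0 + 1e-6, 10.0, 20001)
g = np.array([-b @ np.linalg.solve(A + l * np.eye(2), b) - l for l in lams])
d_star = g.max()

print(p_star, d_star)                           # equal up to grid resolution
```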