Duality. Geoff Gordon & Ryan Tibshirani, Optimization 10-725 / 36-725

Duality in linear programs

Suppose we want to find a lower bound B on the optimal value of a convex problem: B ≤ min_{x ∈ C} f(x). E.g., consider the following simple LP:

    min_{x,y}  x + y
    subject to x + y ≥ 2, x ≥ 0, y ≥ 0

What's a lower bound? Easy: take B = 2. But didn't we just get lucky?

Try again:

    min_{x,y}  x + 3y
    subject to x + y ≥ 2, x ≥ 0, y ≥ 0

Adding the valid inequalities x + y ≥ 2 and 2y ≥ 0 gives x + 3y ≥ 2, so B = 2 is a lower bound.

More generally:

    min_{x,y}  px + qy
    subject to x + y ≥ 2, x ≥ 0, y ≥ 0

For any a, b, c ≥ 0 with a + b = p and a + c = q, we have a(x + y) + bx + cy = px + qy ≥ 2a, so B = 2a is a lower bound.

What's the best we can do? Maximize our lower bound over all valid a, b, c:

Primal LP:
    min_{x,y}  px + qy  subject to  x + y ≥ 2, x ≥ 0, y ≥ 0
Dual LP:
    max_{a,b,c}  2a  subject to  a + b = p, a + c = q, a, b, c ≥ 0

Note: the number of dual variables equals the number of primal constraints.
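To make the primal/dual pair above concrete, here is a small numerical sketch (assuming SciPy is available; the choice p = 1, q = 3 is mine, not from the slides) solving both LPs and confirming they attain the same optimal value:

```python
# Solve the primal and dual LPs above for p = 1, q = 3 with scipy's linprog.
from scipy.optimize import linprog

p, q = 1.0, 3.0

# Primal: min px + qy  s.t.  x + y >= 2, x, y >= 0
# linprog uses <= inequalities, so rewrite x + y >= 2 as -x - y <= -2.
primal = linprog(c=[p, q], A_ub=[[-1, -1]], b_ub=[-2],
                 bounds=[(0, None), (0, None)])

# Dual: max 2a  s.t.  a + b = p, a + c = q, a, b, c >= 0
# (maximizing 2a == minimizing -2a)
dual = linprog(c=[-2, 0, 0],
               A_eq=[[1, 1, 0], [1, 0, 1]], b_eq=[p, q],
               bounds=[(0, None)] * 3)

print(primal.fun)    # primal optimal value: 2.0
print(-dual.fun)     # dual optimal value:   2.0 (they match: strong duality)
```

The dual optimum is a = 1 (forced by a ≤ p = 1 and a ≤ q = 3 with b, c ≥ 0), giving the bound 2a = 2, which the primal attains at (x, y) = (2, 0).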

Try another one:

Primal LP:
    min_{x,y}  px + qy  subject to  x ≥ 0, y ≤ 1, 3x + y = 2
Dual LP:
    max_{a,b,c}  2c − b  subject to  a + 3c = p, −b + c = q, a, b ≥ 0

Note: in the dual problem, c (the multiplier on the equality constraint) is unconstrained in sign.

General form LP

Given c ∈ R^n, A ∈ R^{m×n}, b ∈ R^m, G ∈ R^{r×n}, h ∈ R^r:

Primal LP:
    min_{x ∈ R^n}  c^T x  subject to  Ax = b, Gx ≤ h
Dual LP:
    max_{u ∈ R^m, v ∈ R^r}  −b^T u − h^T v  subject to  −A^T u − G^T v = c, v ≥ 0

Explanation: for any u and any v ≥ 0, and any primal feasible x,

    u^T (Ax − b) + v^T (Gx − h) ≤ 0,  i.e.,  (−A^T u − G^T v)^T x ≥ −b^T u − h^T v

So if c = −A^T u − G^T v, we get a lower bound on the primal optimal value.
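The explanation above can be spot-checked numerically. A minimal NumPy sketch (the random problem data are illustrative assumptions, not from the slides): pick any u and v ≥ 0, set c = −A^T u − G^T v so the pair is dual feasible, construct a primal-feasible x, and verify the bound c^T x ≥ −b^T u − h^T v:

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n = 3, 4, 5
A, G = rng.standard_normal((m, n)), rng.standard_normal((r, n))

u = rng.standard_normal(m)            # u is free (equality multiplier)
v = rng.uniform(0.1, 1.0, r)          # v >= 0 (inequality multiplier)
c = -A.T @ u - G.T @ v                # makes (u, v) dual feasible

x0 = rng.standard_normal(n)           # make x0 primal feasible:
b = A @ x0                            # A x0 = b exactly
h = G @ x0 + rng.uniform(0, 1, r)     # G x0 <= h, with positive slack

bound = -b @ u - h @ v
print(c @ x0, ">=", bound)            # weak duality bound holds
assert c @ x0 >= bound - 1e-9
```

Since u^T(Ax0 − b) = 0 and v^T(Gx0 − h) ≤ 0 by construction, the inequality holds for every such draw, not just this seed.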

Max flow and min cut

[Figure: map of the Soviet railway network, from Schrijver (2002), On the history of the transportation and maximum flow problems.]

[Figure: a directed graph from source s to sink t; each edge (i, j) carries flow f_{ij} with capacity c_{ij}.]

Given a graph G = (V, E), define a flow f_{ij}, (i, j) ∈ E, to satisfy:

- f_{ij} ≥ 0 for all (i, j) ∈ E
- f_{ij} ≤ c_{ij} for all (i, j) ∈ E
- ∑_{(i,k) ∈ E} f_{ik} = ∑_{(k,j) ∈ E} f_{kj} for all k ∈ V \ {s, t}  (flow conservation)

Max flow problem: find the flow that maximizes the total value of flow from s to t. I.e., as an LP:

    max_{f ∈ R^{|E|}}  ∑_{(s,j) ∈ E} f_{sj}
    subject to f_{ij} ≥ 0, f_{ij} ≤ c_{ij} for all (i, j) ∈ E,
               ∑_{(i,k) ∈ E} f_{ik} = ∑_{(k,j) ∈ E} f_{kj} for all k ∈ V \ {s, t}

Derive the dual, in steps. Note that

    ∑_{(i,j) ∈ E} ( −a_{ij} f_{ij} + b_{ij} (f_{ij} − c_{ij}) )
      + ∑_{k ∈ V \ {s,t}} x_k ( ∑_{(i,k) ∈ E} f_{ik} − ∑_{(k,j) ∈ E} f_{kj} ) ≤ 0

for any a_{ij}, b_{ij} ≥ 0, (i, j) ∈ E, and any x_k, k ∈ V \ {s, t}. Rearrange as

    ∑_{(i,j) ∈ E} M_{ij}(a, b, x) f_{ij} ≤ ∑_{(i,j) ∈ E} b_{ij} c_{ij}

where M_{ij}(a, b, x) collects the terms multiplying f_{ij}.

We want to make the left-hand side of the previous inequality equal to the primal objective, i.e.:

    M_{sj} = −a_{sj} + b_{sj} + x_j        (want this = 1)
    M_{it} = −a_{it} + b_{it} − x_i        (want this = 0)
    M_{ij} = −a_{ij} + b_{ij} + x_j − x_i  (want this = 0)

(The first two cases are the general one under the conventions x_s = 1, x_t = 0; the multiplier a_{ij} ≥ 0 acts as a slack variable and can be eliminated.) We've shown that the primal optimal value is ≤ ∑_{(i,j) ∈ E} b_{ij} c_{ij}, subject to a, b, x satisfying these constraints. Hence the dual problem is (minimize over a, b, x to get the best upper bound):

    min_{b ∈ R^{|E|}, x ∈ R^{|V|}}  ∑_{(i,j) ∈ E} b_{ij} c_{ij}
    subject to b_{ij} + x_j − x_i ≥ 0 for all (i, j) ∈ E,
               b ≥ 0, x_s = 1, x_t = 0

Suppose that at the solution it just so happened that x_i ∈ {0, 1} for all i ∈ V. Let A = {i : x_i = 1} and B = {i : x_i = 0}; note that s ∈ A and t ∈ B. Then the constraints

    b_{ij} ≥ x_i − x_j for (i, j) ∈ E,  b ≥ 0

imply that, at the optimum, b_{ij} = 1 if i ∈ A and j ∈ B, and b_{ij} = 0 otherwise. Moreover, the objective ∑_{(i,j) ∈ E} b_{ij} c_{ij} is the capacity of the cut defined by A, B. I.e., we've argued that the dual is the LP relaxation of the min cut problem:

    min_{b ∈ R^{|E|}, x ∈ R^{|V|}}  ∑_{(i,j) ∈ E} b_{ij} c_{ij}
    subject to b_{ij} ≥ x_i − x_j, b_{ij}, x_i, x_j ∈ {0, 1} for all i, j

Therefore, from what we know so far:

    value of max flow ≤ optimal value of LP-relaxed min cut ≤ capacity of min cut

A famous result, the max flow min cut theorem, says: the value of the max flow through a network is exactly the capacity of the min cut. Hence in the chain above we get all equalities. In particular, the primal LP and the dual LP have exactly the same optimal value, a phenomenon called strong duality. How often does this happen? More on this later.
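The theorem can be seen in miniature on a toy network. The sketch below is pure Python (the node names and capacities are made up for illustration): it computes the max flow with Edmonds-Karp augmenting paths and the min cut by brute force over s-t partitions, and the two values agree:

```python
from collections import deque
from itertools import combinations

cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
       ("a", "t"): 2, ("b", "t"): 3}
nodes = {"s", "a", "b", "t"}

def max_flow(cap, s, t):
    # residual capacities, including reverse edges at 0
    res = dict(cap)
    for (i, j) in cap:
        res.setdefault((j, i), 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            i = q.popleft()
            for (a, b), r in res.items():
                if a == i and r > 0 and b not in parent:
                    parent[b] = i
                    q.append(b)
        if t not in parent:
            return flow
        # reconstruct the path, push the bottleneck amount along it
        path, j = [], t
        while parent[j] is not None:
            path.append((parent[j], j))
            j = parent[j]
        push = min(res[e] for e in path)
        for (i, j) in path:
            res[(i, j)] -= push
            res[(j, i)] += push
        flow += push

def min_cut(cap, s, t):
    # brute force: try every partition with s on one side, t on the other
    inner = nodes - {s, t}
    best = float("inf")
    for k in range(len(inner) + 1):
        for extra in combinations(inner, k):
            A = {s} | set(extra)
            val = sum(c for (i, j), c in cap.items()
                      if i in A and j not in A)
            best = min(best, val)
    return best

print(max_flow(cap, "s", "t"), min_cut(cap, "s", "t"))  # equal values
```

On this network both values come out to 5: the flow routes 3 units through a (2 direct to t, 1 via b) and 2 units through b, saturating the cut around s.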

[Figure: image segmentation via min cut, from F. Estrada et al. (2004), Spectral embedding and min-cut for image segmentation.]

Another perspective on LP duality

Primal LP:
    min_{x ∈ R^n}  c^T x  subject to  Ax = b, Gx ≤ h
Dual LP:
    max_{u ∈ R^m, v ∈ R^r}  −b^T u − h^T v  subject to  −A^T u − G^T v = c, v ≥ 0

Explanation #2: for any u and any v ≥ 0, and any primal feasible x,

    c^T x ≥ c^T x + u^T (Ax − b) + v^T (Gx − h) := L(x, u, v)

So if C denotes the primal feasible set and f^* the primal optimal value, then for any u and v ≥ 0,

    f^* ≥ min_{x ∈ C} L(x, u, v) ≥ min_{x ∈ R^n} L(x, u, v) := g(u, v)

In other words, g(u, v) is a lower bound on f^* for any u and any v ≥ 0. Note that

    g(u, v) = −b^T u − h^T v  if c = −A^T u − G^T v,  and −∞ otherwise

Now we can maximize g(u, v) over u and v ≥ 0 to get the tightest bound, and this gives exactly the dual LP as before.

This last perspective is completely general, and applies to arbitrary optimization problems (even nonconvex ones).

Outline

Rest of today:
- Lagrange dual function
- Lagrange dual problem
- Examples
- Weak and strong duality

Lagrangian

Consider the general minimization problem

    min_{x ∈ R^n}  f(x)
    subject to h_i(x) ≤ 0, i = 1, …, m
               l_j(x) = 0, j = 1, …, r

It need not be convex, but of course we will pay special attention to the convex case. We define the Lagrangian as

    L(x, u, v) = f(x) + ∑_{i=1}^m u_i h_i(x) + ∑_{j=1}^r v_j l_j(x)

with new variables u ∈ R^m, v ∈ R^r, and u ≥ 0 (implicitly, we define L(x, u, v) = −∞ for u < 0).

Important property: for any u ≥ 0 and any v,

    f(x) ≥ L(x, u, v) at each feasible x

Why? For feasible x,

    L(x, u, v) = f(x) + ∑_{i=1}^m u_i h_i(x) + ∑_{j=1}^r v_j l_j(x) ≤ f(x)

since each term u_i h_i(x) ≤ 0 and each l_j(x) = 0.

[Figure, from B&V page 217: the solid line is f; the dashed line is h, hence the feasible set is roughly [−0.46, 0.46]; each dotted line shows L(x, u, v) for a different choice of u ≥ 0 and v.]

Lagrange dual function

Let C denote the primal feasible set, and f^* the primal optimal value. Minimizing L(x, u, v) over all x ∈ R^n gives a lower bound:

    f^* ≥ min_{x ∈ C} L(x, u, v) ≥ min_{x ∈ R^n} L(x, u, v) := g(u, v)

We call g(u, v) the Lagrange dual function; it gives a lower bound on f^* for any u ≥ 0 and any v, called dual feasible u, v.

[Figure, from B&V page 217: the dashed horizontal line is f^*; the dual variable is λ (our u); the solid line shows g(λ).]

Quadratic program

Consider a quadratic program (QP, a step up from an LP!):

    min_{x ∈ R^n}  (1/2) x^T Q x + c^T x  subject to  Ax = b, x ≥ 0

where Q ≻ 0 (positive definite). Lagrangian:

    L(x, u, v) = (1/2) x^T Q x + c^T x − u^T x + v^T (Ax − b)

Lagrange dual function:

    g(u, v) = min_{x ∈ R^n} L(x, u, v)
            = −(1/2) (c − u + A^T v)^T Q^{−1} (c − u + A^T v) − b^T v

For any u ≥ 0 and any v, this is a lower bound on the primal optimal value f^*.
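As a sanity check on the closed-form dual function, here is a sketch (assuming NumPy/SciPy; the random problem data and the SLSQP solver are my illustrative choices, not from the slides) that solves a small instance and verifies g(u, v) stays below the primal optimum at many random dual-feasible points:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, m = 4, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # Q positive definite
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
x_feas = rng.uniform(0.5, 1.0, n)      # ensures the primal is feasible
b = A @ x_feas

f = lambda x: 0.5 * x @ Q @ x + c @ x
primal = minimize(f, x_feas, method="SLSQP",
                  bounds=[(0, None)] * n,
                  constraints=[{"type": "eq", "fun": lambda x: A @ x - b}])

def g(u, v):
    # closed-form dual function from the slide
    w = c - u + A.T @ v
    return -0.5 * w @ np.linalg.solve(Q, w) - b @ v

for _ in range(100):                   # g(u, v) <= f* at random (u >= 0, v)
    u = rng.uniform(0, 2, n)
    v = rng.standard_normal(m)
    assert g(u, v) <= primal.fun + 1e-5
print("all dual values below primal optimum:", primal.fun)
```

Every draw satisfies the bound, as weak duality guarantees; the small tolerance only absorbs solver inaccuracy in the primal optimum.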

Same problem, but now Q ⪰ 0 (positive semidefinite only):

    min_{x ∈ R^n}  (1/2) x^T Q x + c^T x  subject to  Ax = b, x ≥ 0

Lagrangian:

    L(x, u, v) = (1/2) x^T Q x + c^T x − u^T x + v^T (Ax − b)

Lagrange dual function:

    g(u, v) = −(1/2) (c − u + A^T v)^T Q^+ (c − u + A^T v) − b^T v   if c − u + A^T v ⊥ null(Q)
    g(u, v) = −∞                                                    otherwise

where Q^+ denotes the generalized inverse (pseudoinverse) of Q. For any u ≥ 0 and v with c − u + A^T v ⊥ null(Q), g(u, v) is a nontrivial lower bound on f^*.
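The semidefinite case can be checked the same way. In this NumPy sketch (the random data are illustrative assumptions), we build a rank-deficient Q, force c − u + A^T v into range(Q) = null(Q)^⊥, and verify that the pseudoinverse formula for g(u, v) lower-bounds f at a primal-feasible point:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 5, 2, 3
M = rng.standard_normal((n, k))
Q = M @ M.T                            # PSD with rank k = 3 < n

A = rng.standard_normal((m, n))
u = rng.uniform(0, 1, n)               # u >= 0
v = rng.standard_normal(m)
w = Q @ rng.standard_normal(n)         # w in range(Q), i.e. w ⊥ null(Q)
c = w + u - A.T @ v                    # so that c - u + A^T v = w

x0 = rng.uniform(0, 1, n)              # a feasible point: x0 >= 0 ...
b = A @ x0                             # ... and A x0 = b

f = lambda x: 0.5 * x @ Q @ x + c @ x
g = -0.5 * w @ np.linalg.pinv(Q) @ w - b @ v

print(g, "<=", f(x0))                  # dual value below primal value
assert g <= f(x0) + 1e-8
```

Since g is the minimum of L(x, u, v) over all x, and L(x0, u, v) ≤ f(x0) at the feasible x0, this inequality holds for every draw.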

Quadratic program in 2D

We choose f(x) to be quadratic in 2 variables, subject to x ≥ 0. The dual function g(u) is then also quadratic in 2 variables, also subject to u ≥ 0.

[Figure: surface plots of the primal objective f(x_1, x_2) and the dual function g(u_1, u_2).]

The dual function g(u) provides a bound on f^* for every u ≥ 0. The largest bound this gives turns out to be exactly f^*... coincidence? More on this later.

Lagrange dual problem

Given the primal problem

    min_{x ∈ R^n}  f(x)
    subject to h_i(x) ≤ 0, i = 1, …, m
               l_j(x) = 0, j = 1, …, r

our constructed dual function g(u, v) satisfies f^* ≥ g(u, v) for all u ≥ 0 and v. Hence the best lower bound is obtained by maximizing g(u, v) over all dual feasible u, v, yielding the Lagrange dual problem:

    max_{u ∈ R^m, v ∈ R^r}  g(u, v)  subject to  u ≥ 0

Key property, called weak duality: if the dual optimal value is g^*, then f^* ≥ g^*. Note that this always holds (even if the primal problem is nonconvex).

Another key property: the dual problem is a convex optimization problem (as written, it is a concave maximization problem). Again, this is always true (even when the primal problem is not convex). By definition:

    g(u, v) = min_{x ∈ R^n} { f(x) + ∑_{i=1}^m u_i h_i(x) + ∑_{j=1}^r v_j l_j(x) }
            = −max_{x ∈ R^n} { −f(x) − ∑_{i=1}^m u_i h_i(x) − ∑_{j=1}^r v_j l_j(x) }

and the inner maximum is a pointwise maximum of convex (indeed affine) functions of (u, v). I.e., g is concave in (u, v), and u ≥ 0 is a convex constraint, hence the dual problem is a concave maximization problem.

Nonconvex quartic minimization

Define f(x) = x^4 − 50x^2 + 100x (nonconvex); minimize it subject to the constraint x ≥ −4.5.

[Figure: left panel, the primal objective f(x) over x ∈ [−10, 10]; right panel, the dual function g(v) over v ∈ [0, 100].]

The dual function g can be derived explicitly here (via the closed-form equation for the roots of a cubic). Its form is quite complicated, and it would be hard to tell by inspection whether or not g is concave... but it must be!
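The dual function can also be evaluated numerically instead of in closed form. A sketch (assuming the constraint reads x ≥ −4.5, and using np.roots on the cubic stationarity condition rather than the explicit root formula):

```python
import numpy as np

f = lambda x: x**4 - 50 * x**2 + 100 * x

def g(u):
    # g(u) = min_x [ f(x) + u * (-x - 4.5) ]; the minimizer is a
    # stationary point of the Lagrangian: 4x^3 - 100x + (100 - u) = 0
    roots = np.roots([4, 0, -100, 100 - u])
    xs = roots.real[np.abs(roots.imag) < 1e-6]
    return min(f(x) + u * (-x - 4.5) for x in xs)

f_star = f(-4.5)   # the constrained minimum sits at the boundary x = -4.5
g_star = max(g(u) for u in np.linspace(0, 200, 2001))

print(g_star, "<=", f_star)   # weak duality holds at every dual point
assert g_star <= f_star + 1e-9
```

Here the best dual value stays strictly below f^* = f(−4.5): the problem is nonconvex, so weak duality holds but strong duality need not.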

Strong duality

Recall that we always have f^* ≥ g^* (weak duality). On the other hand, in some problems we have observed that actually

    f^* = g^*

which is called strong duality.

Slater's condition: if the primal is a convex problem (i.e., f and h_1, …, h_m are convex, and l_1, …, l_r are affine), and there exists at least one strictly feasible x ∈ R^n, meaning

    h_1(x) < 0, …, h_m(x) < 0 and l_1(x) = 0, …, l_r(x) = 0

then strong duality holds.

This is a pretty weak condition. (And it can be further refined: we need strict inequalities only over the functions h_i that are not affine.)

Back to where we started

For linear programs:
- It is easy to check that the dual of the dual LP is the primal LP.
- A refined version of Slater's condition says that strong duality holds for an LP if it is feasible.
- Applying the same logic to its dual LP: strong duality holds if the dual is feasible.

Hence strong duality holds for LPs, except when both the primal and the dual are infeasible. In other words, we pretty much always have strong duality for LPs.

Mixed strategies for matrix games

Setup: two players, G vs. R, and a payout matrix P ∈ R^{m×n}:

                R: 1     2    ...   n
    G: 1       P_11  P_12  ...  P_1n
       2       P_21  P_22  ...  P_2n
       ...
       m       P_m1  P_m2  ...  P_mn

Game: if G chooses i and R chooses j, then G must pay R the amount P_ij (don't feel bad for G: this can be positive or negative).

They use mixed strategies, i.e., each will first specify a probability distribution:

    x:  P(G chooses i) = x_i, i = 1, …, m
    y:  P(R chooses j) = y_j, j = 1, …, n

The expected payout from G to R is then

    ∑_{i=1}^m ∑_{j=1}^n x_i y_j P_ij = x^T P y

Now suppose that, because G is older and wiser, he will allow R to know his strategy x ahead of time. In this case, R will certainly choose y to maximize x^T P y, which results in G paying off

    max { x^T P y : y ≥ 0, 1^T y = 1 } = max_{i=1,…,n} (P^T x)_i

G's best strategy is then to choose his distribution x according to

    min_{x ∈ R^m}  max_{i=1,…,n} (P^T x)_i  subject to  x ≥ 0, 1^T x = 1

In an alternate universe, if R were somehow older and wiser than G, then he might allow G to know his strategy y beforehand. By the same logic, R's best strategy is to choose his distribution y according to

    max_{y ∈ R^n}  min_{i=1,…,m} (P y)_i  subject to  y ≥ 0, 1^T y = 1

Call G's expected payout in the first scenario f_1^*, and his expected payout in the second scenario f_2^*. Because it is clearly advantageous to know the other player's strategy, f_1^* ≥ f_2^*. We can show using strong duality that f_1^* = f_2^* ... which may come as a surprise!

Recast the first problem as an LP:

    min_{x ∈ R^m, t ∈ R}  t  subject to  x ≥ 0, 1^T x = 1, P^T x ≤ t·1

Lagrangian and Lagrange dual function:

    L(x, t, u, v, y) = t − u^T x + v (1 − 1^T x) + y^T (P^T x − t·1)

    g(u, v, y) = v   if 1 − 1^T y = 0 and P y − u − v·1 = 0
    g(u, v, y) = −∞  otherwise

Hence the dual problem, after eliminating the slack variable u ≥ 0, is

    max_{y ∈ R^n, v ∈ R}  v  subject to  y ≥ 0, 1^T y = 1, P y ≥ v·1

This is exactly the second problem, and we have strong LP duality.
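Strong duality here is easy to confirm numerically. The sketch below (assuming SciPy; the random payout matrix is an illustrative choice) solves both players' LPs with `linprog` and finds the same game value:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 3, 4
P = rng.standard_normal((m, n))

# G's problem:  min t  s.t.  P^T x <= t*1, 1^T x = 1, x >= 0
res1 = linprog(c=np.r_[np.zeros(m), 1],
               A_ub=np.c_[P.T, -np.ones(n)], b_ub=np.zeros(n),
               A_eq=[np.r_[np.ones(m), 0]], b_eq=[1],
               bounds=[(0, None)] * m + [(None, None)])

# R's problem:  max v  s.t.  P y >= v*1, 1^T y = 1, y >= 0
res2 = linprog(c=np.r_[np.zeros(n), -1],
               A_ub=np.c_[-P, np.ones(m)], b_ub=np.zeros(m),
               A_eq=[np.r_[np.ones(n), 0]], b_eq=[1],
               bounds=[(0, None)] * n + [(None, None)])

f1, f2 = res1.fun, -res2.fun
print(f1, f2)                   # equal: the value of the game
assert abs(f1 - f2) < 1e-7
```

This is von Neumann's minimax theorem in action, obtained purely from LP duality.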

References

- S. Boyd and L. Vandenberghe (2004), Convex Optimization, Cambridge University Press, Chapter 5
- R. T. Rockafellar (1970), Convex Analysis, Princeton University Press, Chapters 28–30