Duality in Linear Programs. Lecturer: Ryan Tibshirani. Convex Optimization 10-725/36-725

Last time: proximal gradient descent

Consider the problem
    min_x g(x) + h(x)
with g, h convex, g differentiable, and h "simple" in so much as
    prox_t(x) = argmin_z (1/(2t)) ||x − z||_2^2 + h(z)
is computable. Proximal gradient descent: choose an initial x^(0) ∈ R^n, and repeat:
    x^(k) = prox_{t_k}( x^(k−1) − t_k ∇g(x^(k−1)) ),   k = 1, 2, 3, ...
Step sizes t_k are chosen to be fixed and small, or via backtracking. If ∇g is Lipschitz with constant L, then this has convergence rate O(1/ε). Lastly, we can accelerate this, to the optimal rate O(1/√ε).
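The update above can be sketched in a few lines of NumPy. As an illustrative choice (not specific to this lecture), take g to be a least squares loss and h the ℓ1 norm, so the prox is elementwise soft-thresholding:

```python
import numpy as np

def soft_threshold(z, tau):
    # Prox of tau * ||.||_1: elementwise soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_gradient(grad_g, prox, x0, t, iters=1000):
    # x^(k) = prox_t( x^(k-1) - t * grad_g(x^(k-1)) ), fixed step size t
    x = x0.copy()
    for _ in range(iters):
        x = prox(x - t * grad_g(x), t)
    return x

# Lasso instance: g(x) = (1/2)||Ax - b||_2^2, h(x) = lam * ||x||_1
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
lam = 5.0
grad_g = lambda x: A.T @ (A @ x - b)
prox = lambda z, t: soft_threshold(z, t * lam)
t = 1.0 / np.linalg.norm(A, 2) ** 2      # t = 1/L, L = Lipschitz constant of grad_g
x_hat = proximal_gradient(grad_g, prox, np.zeros(10), t)
# At a solution, x is a fixed point of the prox-gradient update
print(np.linalg.norm(prox(x_hat - t * grad_g(x_hat), t) - x_hat))
```

The fixed-point residual printed at the end is a standard convergence check: it is zero exactly at a minimizer of g + h.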

Lower bounds in linear programs

Suppose we want to find a lower bound on the optimal value in our convex problem,
    B ≤ min_x f(x)
E.g., consider the following simple LP:
    min_{x,y} x + y
    subject to x + y ≥ 2
               x, y ≥ 0
What's a lower bound? Easy: take B = 2. But didn't we get lucky?

Try again:
    min_{x,y} x + 3y
    subject to x + y ≥ 2
               x, y ≥ 0
Adding the constraints:
    x + y ≥ 2
    +  2y ≥ 0
    ⟹ x + 3y ≥ 2
Lower bound B = 2.

More generally:
    min_{x,y} px + qy
    subject to x + y ≥ 2
               x, y ≥ 0
Take a, b, c ≥ 0 with
    a + b = p
    a + c = q
Then px + qy = a(x + y) + bx + cy ≥ 2a. Lower bound B = 2a, for any a, b, c satisfying the above.

What's the best we can do? Maximize our lower bound over all possible a, b, c:

Primal LP:
    min_{x,y} px + qy
    subject to x + y ≥ 2
               x, y ≥ 0
Dual LP:
    max_{a,b,c} 2a
    subject to a + b = p
               a + c = q
               a, b, c ≥ 0
Note: the number of dual variables is the number of primal constraints.
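As a numerical sanity check (an illustration, not part of the slides), we can hand both of these LPs to scipy.optimize.linprog, say with p = 1, q = 3, and confirm the optimal values coincide:

```python
from scipy.optimize import linprog

p, q = 1.0, 3.0
# Primal: min px + qy  s.t.  x + y >= 2  (i.e. -x - y <= -2),  x, y >= 0
primal = linprog(c=[p, q], A_ub=[[-1.0, -1.0]], b_ub=[-2.0],
                 bounds=[(0, None)] * 2)
# Dual: max 2a  s.t.  a + b = p,  a + c = q,  a, b, c >= 0
# (linprog minimizes, so we negate the objective)
dual = linprog(c=[-2.0, 0.0, 0.0],
               A_eq=[[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]], b_eq=[p, q],
               bounds=[(0, None)] * 3)
print(primal.fun, -dual.fun)   # both 2.0: the bound is tight
```

Here the dual forces a ≤ min(p, q) = 1, so the best lower bound is 2a = 2, matching the primal optimum at (x, y) = (2, 0).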

Try another one:

Primal LP:
    min_{x,y} px + qy
    subject to x ≥ 0
               y ≤ 1
               3x + y = 2
Dual LP:
    max_{a,b,c} 2c − b
    subject to a + 3c = p
               −b + c = q
               a, b ≥ 0
Note: in the dual problem, c is unconstrained, since it corresponds to the primal equality constraint.

Outline

Today:
- Duality in general LPs
- Max flow and min cut
- Second take on duality
- Matrix games

Duality for general form LP

Given c ∈ R^n, A ∈ R^{m×n}, b ∈ R^m, G ∈ R^{r×n}, h ∈ R^r:

Primal LP:
    min_x c^T x
    subject to Ax = b
               Gx ≤ h
Dual LP:
    max_{u,v} −b^T u − h^T v
    subject to −A^T u − G^T v = c
               v ≥ 0
Explanation: for any u and any v ≥ 0, and x primal feasible,
    u^T(Ax − b) + v^T(Gx − h) ≤ 0,  i.e.,  (−A^T u − G^T v)^T x ≥ −b^T u − h^T v
So if c = −A^T u − G^T v, we get a lower bound on the primal optimal value.
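To see the bound numerically, the following sketch (dimensions and data are arbitrary choices, not from the lecture) constructs a feasible general-form LP, picks any u and any v ≥ 0, sets c = −A^T u − G^T v so the dual constraint holds, and checks that −b^T u − h^T v lower-bounds the primal optimum:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, r = 5, 2, 4
A = rng.standard_normal((m, n))
G = rng.standard_normal((r, n))
x0 = rng.standard_normal(n)              # make the primal feasible by construction
b = A @ x0
h = G @ x0 + rng.uniform(0.5, 1.0, r)    # x0 strictly satisfies Gx <= h
u = rng.standard_normal(m)               # any u, and any v >= 0, become dual feasible...
v = rng.uniform(0.0, 1.0, r)
c = -A.T @ u - G.T @ v                   # ...once c is chosen to satisfy the dual constraint
res = linprog(c, A_ub=G, b_ub=h, A_eq=A, b_eq=b, bounds=[(None, None)] * n)
lower = -b @ u - h @ v
print(lower <= res.fun + 1e-8)           # weak duality holds
```

Since c^T x = −u^T b − v^T Gx ≥ −u^T b − v^T h for every feasible x, the objective is bounded below by `lower`, so the LP has a finite optimum.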

Example: max flow and min cut

Soviet railway network (from Schrijver (2002), "On the history of transportation and maximum flow problems").

[Figure: a directed network with source s and sink t; each edge (i, j) carries flow f_ij, subject to capacity c_ij.]

Given a graph G = (V, E), define a flow f_ij, (i, j) ∈ E, to satisfy:
- f_ij ≥ 0 for all (i, j) ∈ E
- f_ij ≤ c_ij for all (i, j) ∈ E
- Σ_{(i,k)∈E} f_ik = Σ_{(k,j)∈E} f_kj for all k ∈ V \ {s, t}

Max flow problem: find the flow that maximizes the total value of the flow from s to t. I.e., as an LP:
    max_{f ∈ R^{|E|}} Σ_{(s,j)∈E} f_sj
    subject to f_ij ≥ 0, f_ij ≤ c_ij for all (i, j) ∈ E
               Σ_{(i,k)∈E} f_ik = Σ_{(k,j)∈E} f_kj for all k ∈ V \ {s, t}

Derive the dual, in steps. Note that
    Σ_{(i,j)∈E} ( −a_ij f_ij + b_ij (f_ij − c_ij) ) + Σ_{k∈V\{s,t}} x_k ( Σ_{(i,k)∈E} f_ik − Σ_{(k,j)∈E} f_kj ) ≤ 0
for any a_ij, b_ij ≥ 0, (i, j) ∈ E, and any x_k, k ∈ V \ {s, t}. Rearrange this as
    Σ_{(i,j)∈E} M_ij(a, b, x) f_ij ≤ Σ_{(i,j)∈E} b_ij c_ij
where M_ij(a, b, x) collects the terms multiplying f_ij.

We want to make the LHS of the previous inequality equal to the primal objective, i.e.:
    M_sj = b_sj − a_sj + x_j,        want this = 1
    M_it = b_it − a_it − x_i,        want this = 0
    M_ij = b_ij − a_ij + x_j − x_i,  want this = 0
We've shown that the primal optimal value ≤ Σ_{(i,j)∈E} b_ij c_ij, subject to a, b, x satisfying these constraints. Hence the dual problem is (minimize over a, b, x to get the best upper bound):
    min_{b ∈ R^{|E|}, x ∈ R^{|V|}} Σ_{(i,j)∈E} b_ij c_ij
    subject to b_ij + x_j − x_i ≥ 0 for all (i, j) ∈ E
               b ≥ 0, x_s = 1, x_t = 0

Suppose that at the solution it just so happened that x_i ∈ {0, 1} for all i ∈ V. Let A = {i : x_i = 1}, B = {i : x_i = 0}; note that s ∈ A, t ∈ B. Then b_ij ≥ x_i − x_j for (i, j) ∈ E, together with b ≥ 0, imply that b_ij = 1 if i ∈ A and j ∈ B, and 0 otherwise. Moreover, the objective Σ_{(i,j)∈E} b_ij c_ij is the capacity of the cut defined by A, B.

I.e., we've argued that the dual is the LP relaxation of the min cut problem:
    min_{b ∈ R^{|E|}, x ∈ R^{|V|}} Σ_{(i,j)∈E} b_ij c_ij
    subject to b_ij ≥ x_i − x_j
               b_ij, x_i, x_j ∈ {0, 1} for all i, j

Therefore, from what we know so far:
    value of max flow ≤ optimal value of LP relaxed min cut ≤ capacity of min cut
A famous result, called the max flow min cut theorem, says that the value of the max flow through a network is exactly the capacity of the min cut. Hence in the above, all the inequalities are equalities. In particular, the primal LP and the dual LP have exactly the same optimal values, a phenomenon called strong duality. How often does this happen? More on this soon.
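We can check the theorem on a small example. The graph below is a toy instance of my own choosing (not from the slides), with the max flow LP solved via scipy.optimize.linprog and compared against a cut capacity computed by hand:

```python
import numpy as np
from scipy.optimize import linprog

# Edges (tail, head, capacity) on a tiny network; s = 0, t = 3
edges = [(0, 1, 3.0), (0, 2, 2.0), (1, 2, 1.0), (1, 3, 2.0), (2, 3, 3.0)]
caps = [cap for _, _, cap in edges]
# Maximize total flow out of s  <=>  minimize its negation
c_obj = [-1.0 if tail == 0 else 0.0 for tail, _, _ in edges]
# Flow conservation at internal nodes 1 and 2
A_eq = np.zeros((2, len(edges)))
for e, (tail, head, _) in enumerate(edges):
    for row, k in enumerate([1, 2]):
        if head == k:
            A_eq[row, e] += 1.0   # flow into node k
        if tail == k:
            A_eq[row, e] -= 1.0   # flow out of node k
res = linprog(c_obj, A_eq=A_eq, b_eq=[0.0, 0.0],
              bounds=[(0.0, cap) for cap in caps])
max_flow = -res.fun
cut_capacity = 3.0 + 2.0    # cut A = {s}: edges (0,1) and (0,2) cross it
print(max_flow, cut_capacity)   # both 5.0, as max flow min cut predicts
```

The capacity constraints appear as variable bounds, and conservation as equality constraints, exactly mirroring the LP on the earlier slide.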

Another perspective on LP duality

Primal LP:
    min_x c^T x
    subject to Ax = b
               Gx ≤ h
Dual LP:
    max_{u,v} −b^T u − h^T v
    subject to −A^T u − G^T v = c
               v ≥ 0
Explanation #2: for any u and any v ≥ 0, and x primal feasible,
    c^T x ≥ c^T x + u^T(Ax − b) + v^T(Gx − h) := L(x, u, v)
So if C denotes the primal feasible set, and f* the primal optimal value, then for any u and v ≥ 0,
    f* ≥ min_{x∈C} L(x, u, v) ≥ min_x L(x, u, v) := g(u, v)

In other words, g(u, v) is a lower bound on f* for any u and v ≥ 0. Note that
    g(u, v) = −b^T u − h^T v   if c = −A^T u − G^T v
            = −∞               otherwise
Now we can maximize g(u, v) over u and v ≥ 0 to get the tightest bound, and this gives exactly the dual LP as before. This last perspective is actually completely general, and applies to arbitrary optimization problems (even nonconvex ones).

Example: mixed strategies for matrix games

Setup: two players, J vs. R, and a payout matrix P ∈ R^{m×n}:

            R: 1     2    ...    n
    J: 1   P_11   P_12   ...   P_1n
       2   P_21   P_22   ...   P_2n
       ...
       m   P_m1   P_m2   ...   P_mn

Game: if J chooses i and R chooses j, then J must pay R the amount P_ij (don't feel bad for J: this can be positive or negative). They use mixed strategies, i.e., each will first specify a probability distribution, and then
    x : P(J chooses i) = x_i,  i = 1, ..., m
    y : P(R chooses j) = y_j,  j = 1, ..., n

The expected payout, from J to R, is then
    Σ_{i=1}^m Σ_{j=1}^n x_i y_j P_ij = x^T P y
Now suppose that, because J is wiser, he will allow R to know his strategy x ahead of time. In this case, R will choose y to maximize x^T P y, which results in J paying off
    max { x^T P y : y ≥ 0, 1^T y = 1 } = max_{i=1,...,n} (P^T x)_i
J's best strategy is then to choose his distribution x according to
    min_x max_{i=1,...,n} (P^T x)_i
    subject to x ≥ 0, 1^T x = 1

In an alternate universe, if R were somehow wiser than J, then he might allow J to know his strategy y beforehand. By the same logic, R's best strategy is to choose his distribution y according to
    max_y min_{i=1,...,m} (P y)_i
    subject to y ≥ 0, 1^T y = 1
Call R's expected payout in the first scenario f1*, and his expected payout in the second scenario f2*. Because it is clearly advantageous to know the other player's strategy, f1* ≥ f2*. But von Neumann's minimax theorem tells us that f1* = f2* ... which may come as a surprise!

Recast the first problem as an LP:
    min_{x,t} t
    subject to x ≥ 0, 1^T x = 1
               P^T x ≤ t·1
Now form what we call the Lagrangian:
    L(x, t, u, v, y) = t − u^T x + v(1 − 1^T x) + y^T (P^T x − t·1)
and what we call the Lagrange dual function:
    g(u, v, y) = min_{x,t} L(x, t, u, v, y)
               = v    if 1 − 1^T y = 0 and P y − u − v·1 = 0
               = −∞   otherwise

Hence the dual problem, after eliminating the slack variable u, is
    max_{y,v} v
    subject to y ≥ 0, 1^T y = 1
               P y ≥ v·1
This is exactly the second problem, and therefore again we see that strong duality holds. So how often does strong duality hold? In LPs, as we'll see, strong duality holds unless both the primal and the dual are infeasible.
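Both players' problems are easy to solve numerically. The sketch below (my choice of example, not from the slides) uses scipy.optimize.linprog on the matching pennies payoff matrix and recovers equal optimal values, the value of the game:

```python
import numpy as np
from scipy.optimize import linprog

P = np.array([[1.0, -1.0],
              [-1.0, 1.0]])    # matching pennies: amount J pays R
m, n = P.shape

# J: min_{x,t} t  s.t.  P^T x <= t*1,  1^T x = 1,  x >= 0
c1 = np.r_[np.zeros(m), 1.0]
A1 = np.c_[P.T, -np.ones(n)]              # rewrite P^T x - t*1 <= 0
res1 = linprog(c1, A_ub=A1, b_ub=np.zeros(n),
               A_eq=[np.r_[np.ones(m), 0.0]], b_eq=[1.0],
               bounds=[(0, None)] * m + [(None, None)])

# R: max_{y,v} v  s.t.  P y >= v*1,  1^T y = 1,  y >= 0  (negate to minimize)
c2 = np.r_[np.zeros(n), -1.0]
A2 = np.c_[-P, np.ones(m)]                # rewrite v*1 - P y <= 0
res2 = linprog(c2, A_ub=A2, b_ub=np.zeros(m),
               A_eq=[np.r_[np.ones(n), 0.0]], b_eq=[1.0],
               bounds=[(0, None)] * n + [(None, None)])

f1, f2 = res1.fun, -res2.fun
print(f1, f2)   # equal, by von Neumann's minimax theorem
```

For matching pennies the optimal mixed strategies are uniform, x* = y* = (1/2, 1/2), and the game value f1* = f2* is 0.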

References

- S. Boyd and L. Vandenberghe (2004), "Convex Optimization", Chapter 5
- R. T. Rockafellar (1970), "Convex Analysis", Chapters 28-30