Duality in Linear Programs
Lecturer: Ryan Tibshirani
Convex Optimization 10-725/36-725
Last time: proximal gradient descent

Consider the problem

    min_x  g(x) + h(x)

with g, h convex, g differentiable, and h "simple" in so much as

    prox_t(x) = argmin_z  (1/(2t)) ||x - z||_2^2 + h(z)

is computable. Proximal gradient descent: let x^(0) ∈ R^n, repeat:

    x^(k) = prox_{t_k} ( x^(k-1) - t_k ∇g(x^(k-1)) ),   k = 1, 2, 3, ...

Step sizes t_k chosen to be fixed and small, or via backtracking. If ∇g is Lipschitz with constant L, then this has convergence rate O(1/ε). Lastly, we can accelerate this, to optimal rate O(1/√ε).
Lower bounds in linear programs

Suppose we want to find a lower bound B on the optimal value in our convex problem,

    B ≤ min_x f(x)

E.g., consider the following simple LP:

    min_{x,y}   x + y
    subject to  x + y ≥ 2
                x, y ≥ 0

What's a lower bound? Easy: take B = 2, directly from the first constraint. But didn't we just get lucky?
Try again:

    min_{x,y}   x + 3y
    subject to  x + y ≥ 2
                x, y ≥ 0

Adding the valid inequalities x + y ≥ 2 and 2y ≥ 0 gives x + 3y ≥ 2, so we get the lower bound B = 2. More generally:

    min_{x,y}   px + qy
    subject to  x + y ≥ 2
                x, y ≥ 0

For any a, b, c ≥ 0 with a + b = p and a + c = q, we have

    px + qy = a(x + y) + bx + cy ≥ 2a

i.e., B = 2a is a lower bound, for any a, b, c satisfying the above.
What's the best we can do? Maximize our lower bound over all possible a, b, c:

    min_{x,y}   px + qy          max_{a,b,c}  2a
    subject to  x + y ≥ 2        subject to   a + b = p
                x, y ≥ 0                      a + c = q
                                              a, b, c ≥ 0

    Primal LP                    Dual LP

Note: the number of dual variables equals the number of primal constraints.
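As a quick numerical sanity check (a minimal dependency-free sketch, exploiting the fact that this toy pair has closed-form solutions rather than calling an LP solver), we can verify both weak duality (every dual-feasible a gives a valid lower bound) and strong duality (the best bound equals the primal optimum, here 2·min(p, q) for p, q ≥ 0):

```python
import random

def primal_value(p, q):
    # min px + qy s.t. x + y >= 2, x, y >= 0 (assuming p, q >= 0):
    # the optimum sits at a vertex, either (2, 0) or (0, 2)
    return min(2 * p, 2 * q)

def dual_value(p, q):
    # max 2a s.t. a + b = p, a + c = q, a, b, c >= 0:
    # b, c >= 0 force a <= min(p, q), so the best choice is a = min(p, q)
    return 2 * min(p, q)

random.seed(0)
p, q = 1.0, 3.0

# Weak duality: every dual-feasible (a, b, c) bounds every feasible (x, y)
for _ in range(1000):
    a = random.uniform(0, min(p, q))          # then b = p - a >= 0, c = q - a >= 0
    x = random.uniform(0, 5)
    y = max(2 - x, 0) + random.uniform(0, 5)  # ensures x + y >= 2
    assert 2 * a <= p * x + q * y + 1e-9

# Strong duality: the best lower bound matches the primal optimum
assert primal_value(p, q) == dual_value(p, q) == 2.0
print("weak and strong duality verified, optimal value =", primal_value(p, q))
```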
Try another one:

    min_{x,y}   px + qy          max_{a,b,c}  2c - b
    subject to  x ≥ 0            subject to   a + 3c = p
                y ≤ 1                         -b + c = q
                3x + y = 2                    a, b ≥ 0

    Primal LP                    Dual LP

Note: in the dual problem, c is unconstrained in sign, because it corresponds to the primal equality constraint.
Outline

Today:
- Duality in general LPs
- Max flow and min cut
- Second take on duality
- Matrix games
Duality for general form LP

Given c ∈ R^n, A ∈ R^{m×n}, b ∈ R^m, G ∈ R^{r×n}, h ∈ R^r:

    min_x       c^T x            max_{u,v}   -b^T u - h^T v
    subject to  Ax = b           subject to  -A^T u - G^T v = c
                Gx ≤ h                       v ≥ 0

    Primal LP                    Dual LP

Explanation: for any u and any v ≥ 0, and any x primal feasible,

    u^T (Ax - b) + v^T (Gx - h) ≤ 0,   i.e.,   (-A^T u - G^T v)^T x ≥ -b^T u - h^T v

So if c = -A^T u - G^T v, the quantity -b^T u - h^T v is a bound on the primal optimal value.
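The weak duality argument above can be checked numerically. The sketch below uses a tiny made-up instance (A, b, G, h, u, v are hypothetical numbers chosen for illustration): pick any u and v ≥ 0, construct c = -A^T u - G^T v so the bound applies, then sample primal-feasible points and confirm they never beat the bound:

```python
import random

# Hypothetical small instance:  min c^T x  s.t.  Ax = b, Gx <= h
A, b = [[1.0, 1.0]], [2.0]     # equality:   x1 + x2 = 2
G, h = [[1.0, 0.0]], [1.0]     # inequality: x1 <= 1

# Pick any u (free) and v >= 0, then set c = -A^T u - G^T v
u, v = [0.5], [0.3]
c = [-(A[0][0] * u[0] + G[0][0] * v[0]),
     -(A[0][1] * u[0] + G[0][1] * v[0])]      # c = (-0.8, -0.5)

bound = -(b[0] * u[0]) - (h[0] * v[0])        # -b^T u - h^T v = -1.3

random.seed(1)
for _ in range(1000):
    x1 = random.uniform(-5.0, 1.0)            # respects x1 <= 1
    x2 = 2.0 - x1                             # respects x1 + x2 = 2
    # Weak duality: objective at any feasible x is at least the dual bound
    assert c[0] * x1 + c[1] * x2 >= bound - 1e-9
print("dual bound", bound, "holds for all sampled feasible x")
```

(In fact, on this instance c^T x = -1 - 0.3 x1, which is minimized at x1 = 1 with value exactly -1.3, so this particular (u, v) happens to be dual optimal.)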
Example: max flow and min cut

[Figure: Soviet railway network, from Schrijver (2002), "On the history of transportation and maximum flow problems"]
[Figure: a network with source s, sink t, and an edge (i, j) carrying flow f_ij with capacity c_ij]

Given a graph G = (V, E) with source s and sink t, define a flow f_ij, (i, j) ∈ E, to satisfy:

- f_ij ≥ 0 for all (i, j) ∈ E
- f_ij ≤ c_ij for all (i, j) ∈ E
- Σ_{(i,k) ∈ E} f_ik = Σ_{(k,j) ∈ E} f_kj for all k ∈ V \ {s, t}

Max flow problem: find the flow that maximizes the total value of the flow from s to t. I.e., as an LP:

    max_{f ∈ R^{|E|}}  Σ_{(s,j) ∈ E} f_sj
    subject to         f_ij ≥ 0, f_ij ≤ c_ij for all (i, j) ∈ E
                       Σ_{(i,k) ∈ E} f_ik = Σ_{(k,j) ∈ E} f_kj for all k ∈ V \ {s, t}
Derive the dual, in steps. Note that

    Σ_{(i,j) ∈ E} ( -a_ij f_ij + b_ij (f_ij - c_ij) )
      + Σ_{k ∈ V \ {s,t}} x_k ( Σ_{(i,k) ∈ E} f_ik - Σ_{(k,j) ∈ E} f_kj )  ≤  0

for any a_ij, b_ij ≥ 0, (i, j) ∈ E, and any x_k, k ∈ V \ {s, t}. Rearrange as

    Σ_{(i,j) ∈ E} M_ij(a, b, x) f_ij  ≤  Σ_{(i,j) ∈ E} b_ij c_ij

where M_ij(a, b, x) collects the terms multiplying f_ij.
Want to make the LHS in the previous inequality match the primal objective Σ_{(s,j) ∈ E} f_sj, i.e.:

    M_sj = -a_sj + b_sj + x_j            want this = 1
    M_it = -a_it + b_it - x_i            want this = 0
    M_ij = -a_ij + b_ij + x_j - x_i      want this = 0

We've shown that the primal optimal value is ≤ Σ_{(i,j) ∈ E} b_ij c_ij, subject to a, b, x satisfying these constraints. Hence the dual problem is (minimize over a, b, x to get the best upper bound):

    min_{b ∈ R^{|E|}, x ∈ R^{|V|}}  Σ_{(i,j) ∈ E} b_ij c_ij
    subject to  b_ij + x_j - x_i ≥ 0 for all (i, j) ∈ E
                b ≥ 0, x_s = 1, x_t = 0

(here a has been eliminated as a slack variable: a_ij = b_ij + x_j - x_i ≥ 0, with the convention x_s = 1, x_t = 0).
Suppose that at the solution it just so happened that x_i ∈ {0, 1} for all i ∈ V. Let A = {i : x_i = 1}, B = {i : x_i = 0}; note s ∈ A, t ∈ B. Then b_ij ≥ x_i - x_j for (i, j) ∈ E and b ≥ 0 imply that, at the optimum, b_ij = 1 if i ∈ A and j ∈ B, and 0 otherwise. Moreover, the objective Σ_{(i,j) ∈ E} b_ij c_ij is the capacity of the cut defined by A, B.

I.e., we've argued that the dual is the LP relaxation of the min cut problem:

    min_{b ∈ R^{|E|}, x ∈ R^{|V|}}  Σ_{(i,j) ∈ E} b_ij c_ij
    subject to  b_ij ≥ x_i - x_j
                b_ij, x_i, x_j ∈ {0, 1} for all i, j
Therefore, from what we know so far:

    value of max flow  ≤  optimal value for LP relaxed min cut  ≤  capacity of min cut

Famous result, called the max flow min cut theorem: the value of the max flow through a network is exactly the capacity of the min cut.

Hence in the above, we get equalities throughout. In particular, the primal LP and dual LP have exactly the same optimal values, a phenomenon called strong duality. How often does this happen? More on this soon.
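The theorem is easy to check numerically on a tiny example. The sketch below uses a made-up 4-node network; it computes the max flow by Ford-Fulkerson (with BFS augmenting paths) and the min cut by brute-force enumeration of s-t partitions, rather than by solving the two LPs:

```python
from itertools import combinations

# Hypothetical toy network: node 0 = s, node 3 = t; cap[(i, j)] is c_ij
cap = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
nodes = {0, 1, 2, 3}
s, t = 0, 3

def max_flow(cap, s, t):
    # Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp)
    residual = dict(cap)
    for (i, j) in cap:
        residual.setdefault((j, i), 0)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent, queue = {s: None}, [s]
        while queue and t not in parent:
            u = queue.pop(0)
            for (a, b), r in residual.items():
                if a == u and r > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            return flow
        # Reconstruct the path, push the bottleneck amount along it
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for (a, b) in path:
            residual[(a, b)] -= push
            residual[(b, a)] += push
        flow += push

def min_cut(cap, nodes, s, t):
    # Brute force: every partition (A, B) with s in A, t in B
    internal = sorted(nodes - {s, t})
    best = float("inf")
    for r in range(len(internal) + 1):
        for extra in combinations(internal, r):
            A = {s} | set(extra)
            cut = sum(c for (i, j), c in cap.items() if i in A and j not in A)
            best = min(best, cut)
    return best

f, c = max_flow(cap, s, t), min_cut(cap, nodes, s, t)
print("max flow =", f, " min cut =", c)
assert f == c  # max flow min cut theorem
```

Brute-force min cut is exponential in |V|, of course; the point is only to confirm the equality of the two optimal values on a small instance.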
Another perspective on LP duality

    min_x       c^T x            max_{u,v}   -b^T u - h^T v
    subject to  Ax = b           subject to  -A^T u - G^T v = c
                Gx ≤ h                       v ≥ 0

    Primal LP                    Dual LP

Explanation # 2: for any u and any v ≥ 0, and any x primal feasible,

    c^T x ≥ c^T x + u^T (Ax - b) + v^T (Gx - h) := L(x, u, v)

So if C denotes the primal feasible set and f* the primal optimal value, then for any u and v ≥ 0,

    f* ≥ min_{x ∈ C} L(x, u, v) ≥ min_x L(x, u, v) := g(u, v)
In other words, g(u, v) is a lower bound on f* for any u and v ≥ 0. Note that

    g(u, v) = { -b^T u - h^T v   if c = -A^T u - G^T v
              { -∞               otherwise

Now we can maximize g(u, v) over u and v ≥ 0 to get the tightest bound, and this gives exactly the dual LP as before.

This last perspective is actually completely general and applies to arbitrary optimization problems (even nonconvex ones).
Example: mixed strategies for matrix games

Setup: two players, J vs. R, and a payout matrix P ∈ R^{m×n}:

                  R
           1     2    ...  n
        1  P_11  P_12 ...  P_1n
    J   2  P_21  P_22 ...  P_2n
       ...
        m  P_m1  P_m2 ...  P_mn

Game: if J chooses i and R chooses j, then J must pay R the amount P_ij (don't feel bad for J; this can be positive or negative).

They use mixed strategies, i.e., each will first specify a probability distribution, and then:

    x :  P(J chooses i) = x_i,  i = 1, ..., m
    y :  P(R chooses j) = y_j,  j = 1, ..., n
The expected payout, from J to R, is then

    Σ_{i=1}^m Σ_{j=1}^n x_i y_j P_ij = x^T P y

Now suppose that, because J is wiser, he will allow R to know his strategy x ahead of time. In this case, R will choose y to maximize x^T P y, which results in J paying off

    max { x^T P y : y ≥ 0, 1^T y = 1 } = max_{i=1,...,n} (P^T x)_i

J's best strategy is then to choose his distribution x according to

    min_x       max_{i=1,...,n} (P^T x)_i
    subject to  x ≥ 0, 1^T x = 1
In an alternate universe, if R were somehow wiser than J, then he might allow J to know his strategy y beforehand. By the same logic, R's best strategy is to choose his distribution y according to

    max_y       min_{j=1,...,m} (P y)_j
    subject to  y ≥ 0, 1^T y = 1

Call R's expected payout in the first scenario f1*, and his expected payout in the second scenario f2*. Because it is clearly advantageous to know the other player's strategy,

    f1* ≥ f2*

But by von Neumann's minimax theorem, we know that f1* = f2* ... which may come as a surprise!
Recast the first problem as an LP:

    min_{x,t}   t
    subject to  x ≥ 0, 1^T x = 1
                P^T x ≤ t·1

Now form what we call the Lagrangian:

    L(x, t, u, v, y) = t - u^T x + v(1 - 1^T x) + y^T (P^T x - t·1)

and what we call the Lagrange dual function:

    g(u, v, y) = min_{x,t} L(x, t, u, v, y)
               = { v     if 1 - 1^T y = 0 and P y - u - v·1 = 0
                 { -∞    otherwise
Hence the dual problem, after eliminating the slack variable u (recall u ≥ 0), is

    max_{y,v}   v
    subject to  y ≥ 0, 1^T y = 1
                P y ≥ v·1

This is exactly the second problem, and therefore again we see that strong duality holds.

So how often does strong duality hold? In LPs, as we'll see, strong duality holds unless both the primal and dual are infeasible.
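For intuition, the two values f1* and f2* can be compared numerically on a small game. The sketch below uses matching pennies (a standard 2x2 game whose minimax value is 0) and a crude grid search over mixed strategies, as a stand-in for actually solving the two LPs above:

```python
# Matching pennies: P_ij is what J pays R; value of the game is 0
P = [[ 1.0, -1.0],
     [-1.0,  1.0]]

grid = [k / 100 for k in range(101)]  # probabilities on a grid (includes 0.5)

def payout_row(t):
    # (P^T x)_j for x = (t, 1-t): R's expected payoff if R plays column j
    return [P[0][j] * t + P[1][j] * (1 - t) for j in range(2)]

def payout_col(s):
    # (P y)_i for y = (s, 1-s): R's expected payoff if J plays row i
    return [P[i][0] * s + P[i][1] * (1 - s) for i in range(2)]

# Scenario 1: R sees x and best-responds; J minimizes the worst case
f1 = min(max(payout_row(t)) for t in grid)

# Scenario 2: J sees y and best-responds; R maximizes the worst case
f2 = max(min(payout_col(s)) for s in grid)

print("f1* =", f1, " f2* =", f2)
assert f1 >= f2 - 1e-9        # knowing the opponent's strategy can only help R
assert abs(f1 - f2) < 1e-9    # minimax theorem: the two values coincide
```

Here both optima land exactly on the grid point t = s = 0.5; for general payoff matrices a grid search only approximates f1* = f2*, and one would solve the two LPs instead.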
References

- S. Boyd and L. Vandenberghe (2004), Convex Optimization, Chapter 5
- R. T. Rockafellar (1970), Convex Analysis, Chapters 28-30