Pricing Communication Networks: Economics, Technology and Modelling. Costas Courcoubetis and Richard Weber. Copyright 2003 John Wiley & Sons, Ltd. ISBN: 0-470-85130-9

Appendix A

Lagrangian Methods for Constrained Optimization

A.1 Regional and functional constraints

Throughout this book we have considered optimization problems that are subject to constraints. These include the problem of allocating a finite amount of bandwidth to maximize total user benefit, the social welfare maximization problem, and the time-of-day pricing problem. We make frequent use of the Lagrangian method to solve these problems. This appendix provides a tutorial on the method. Take, for example, the problem

$$\text{NETWORK}: \quad \underset{x \ge 0}{\text{maximize}} \; \sum_{r=1}^n w_r \log x_r, \quad \text{subject to } Ax \le C,$$

posed in Chapter 12. This is an example of the generic constrained optimization problem

$$P: \quad \text{maximize } f(x), \quad \text{subject to } x \in X, \; g(x) = b.$$

Here $f$ is to be maximized subject to constraints that are of two types. The constraint $x \in X$ is a regional constraint. For example, it might be $x \ge 0$. The constraint $g(x) = b$ is a functional constraint. Sometimes the functional constraint is an inequality constraint, like $g(x) \le b$. But if it is, we can always add a slack variable, $z$, and rewrite it as the equality constraint $g(x) + z = b$, redefining the regional constraint as $x \in X$ and $z \ge 0$. To illustrate, we shall use the NETWORK problem with just one resource constraint:

$$P_1: \quad \underset{x \ge 0}{\text{maximize}} \; \sum_i w_i \log x_i, \quad \text{subject to } \sum_i x_i = b,$$

where $b$ is a positive number.

A.2 The Lagrangian method

The solution of a constrained optimization problem can often be found by using the so-called Lagrangian method. We define the Lagrangian as

$$L(x, \lambda) = f(x) + \lambda (b - g(x)).$$
For $P_1$ it is

$$L_1(x, \lambda) = \sum_i w_i \log x_i + \lambda \Big( b - \sum_i x_i \Big).$$

In general, the Lagrangian is the sum of the original objective function and a term that involves the functional constraint and a Lagrange multiplier $\lambda$. Suppose we ignore the functional constraint and consider the problem of maximizing the Lagrangian, subject only to the regional constraint. This is often an easier problem than the original one. The value of $x$ that maximizes $L(x, \lambda)$ depends upon the value of $\lambda$. Let us denote this optimizing value of $x$ by $x(\lambda)$. For example, since $L_1(x, \lambda)$ is a concave function of $x$, it has a unique maximum at a point where it is stationary with respect to changes in $x$, i.e. where

$$\partial L_1 / \partial x_i = w_i / x_i - \lambda = 0 \quad \text{for all } i.$$

Thus $x_i(\lambda) = w_i / \lambda$. Note that $x_i(\lambda) > 0$ for $\lambda > 0$, and so the solution lies in the interior of the feasible set.

Think of $\lambda$ as a knob that we can turn to adjust the value of $x$. Imagine turning this knob until we find a value of $\lambda$, say $\lambda = \lambda^*$, such that the functional constraint is satisfied, i.e. $g(x(\lambda^*)) = b$. Let $x^* = x(\lambda^*)$. Our claim is that $x^*$ solves $P$. This is the so-called Lagrangian Sufficiency Theorem, which we state and prove shortly. First, note that, in our example, $g(x(\lambda)) = \sum_i w_i / \lambda$. Thus, choosing $\lambda^* = \sum_i w_i / b$, we have $g(x(\lambda^*)) = b$. The next theorem shows that $x^* = x(\lambda^*)$, where $x_i^* = w_i b / \sum_j w_j$, is optimal for $P_1$.

Theorem 5 (Lagrangian Sufficiency Theorem) Suppose there exist $x^* \in X$ and $\lambda^*$, such that $x^*$ maximizes $L(x, \lambda^*)$ over all $x \in X$, and $g(x^*) = b$. Then $x^*$ solves $P$.

Proof.

$$\begin{aligned}
\max_{x \in X :\, g(x) = b} f(x) &= \max_{x \in X :\, g(x) = b} \big[ f(x) + \lambda^* (b - g(x)) \big] \\
&\le \max_{x \in X} \big[ f(x) + \lambda^* (b - g(x)) \big] \\
&= f(x^*) + \lambda^* (b - g(x^*)) \\
&= f(x^*).
\end{aligned}$$

Equality in the first line holds because we have simply added 0 on the right-hand side. The inequality in the second line holds because we have enlarged the set over which maximization takes place. In the third line, we use the fact that $x^*$ maximizes $L(x, \lambda^*)$, and in the fourth line we use $g(x^*) = b$.
But $x^*$ is feasible for $P$, in that it satisfies the regional and functional constraints. Hence $x^*$ is optimal. $\square$

Multiple constraints

If $g$ and $b$ are vectors, so that $g(x) = b$ expresses more than one constraint, then we would write

$$L(x, \lambda) = f(x) + \lambda^\top (b - g(x)),$$
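As a numerical sanity check on the single-constraint case, the recipe that produced $x_i^* = w_i b / \sum_j w_j$ for $P_1$ can be carried out and verified directly. The following Python sketch is our illustration, not part of the original development; the function names are ours. It computes $\lambda^*$ and $x^*$ in closed form, then spot-checks optimality against random points on the constraint set.

```python
import math
import random

def solve_p1(w, b):
    """Solve P1: maximize sum_i w_i*log(x_i) s.t. x >= 0, sum_i x_i = b,
    by the Lagrangian method: x_i(lam) = w_i/lam, then choose lam so the
    constraint holds, giving lam* = sum(w)/b and x_i* = w_i*b/sum(w)."""
    lam_star = sum(w) / b
    x_star = [wi / lam_star for wi in w]
    return x_star, lam_star

w, b = [1.0, 2.0, 3.0], 6.0
x_star, lam_star = solve_p1(w, b)
assert abs(sum(x_star) - b) < 1e-9          # the constraint g(x*) = b holds

def f(x):
    return sum(wi * math.log(xi) for wi, xi in zip(w, x))

# Spot-check optimality: random feasible points should not beat x*.
random.seed(0)
for _ in range(1000):
    y = [random.random() for _ in w]
    s = sum(y)
    y = [b * yi / s for yi in y]            # rescale onto the constraint set
    assert f(y) <= f(x_star) + 1e-9

print(x_star)   # [1.0, 2.0, 3.0]
```

With these weights $\lambda^* = 1$, so each $x_i^*$ equals $w_i$; the random sampling is only a crude check, but no feasible point it generates improves on $x^*$.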
where the vector $\lambda$ now has one component for each constraint. For example, the Lagrangian for NETWORK is

$$L(x, \lambda) = \sum_{r=1}^n w_r \log x_r + \sum_j \lambda_j \Big( C_j - \sum_r A_{jr} x_r - z_j \Big),$$

where $z_j$ is the slack variable for the $j$th constraint.

A.3 When does the method work?

The Lagrangian method is based on a sufficiency theorem. This means that the method can work, but need not work. Our approach is to write down the Lagrangian, maximize it, and then see if we can choose $\lambda$ and a maximizing $x$ so that the conditions of the Lagrangian Sufficiency Theorem are satisfied. If this works, then we are happy. If it does not work, then too bad. We must try to solve our problem some other way.

The method worked for $P_1$ because we could find an appropriate $\lambda^*$. To see that this is so, note that as $\lambda$ increases from $0$ to $\infty$, $g(x(\lambda))$ decreases from $\infty$ to $0$. Moreover, $g(x(\lambda))$ is continuous in $\lambda$. Therefore, given positive $b$, there must exist a $\lambda$ for which $g(x(\lambda)) = b$. For this value of $\lambda$, which we denote $\lambda^*$, and for $x^* = x(\lambda^*)$, the conditions of the Lagrangian Sufficiency Theorem are satisfied.

To see that the Lagrangian method does not always work, consider the problem

$$P_2: \quad \text{minimize } -x, \quad \text{subject to } x \ge 0 \text{ and } \sqrt{x} = 2.$$

This cannot be solved by the Lagrangian method. If we minimize $L(x, \lambda) = -x + \lambda(2 - \sqrt{x})$ over $x \ge 0$, we get a minimum value of $-\infty$, no matter what we take as the value of $\lambda$. This is clearly not the right answer. So the Lagrangian method fails to work. However, the method does work for

$$P_2': \quad \text{minimize } -x, \quad \text{subject to } x \ge 0 \text{ and } x^2 = 16.$$

Now

$$L(x, \lambda) = -x + \lambda(16 - x^2).$$

If we take $\lambda^* = -1/8$, then $\partial L / \partial x = -1 + x/4$, and so $x^* = 4$. Note that $P_2$ and $P_2'$ are really the same problem, except that the functional constraint is expressed differently. Thus, whether or not the Lagrangian method will work can depend upon how we formulate the problem.

We can say something more about when the Lagrangian method will work. Let $P(b)$ be the problem: minimize $f(x)$, such that $x \in X$ and $g(x) = b$.
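The contrast between the two formulations can be seen numerically. The sketch below is our illustration (a crude grid search stands in for exact minimization), with both problems written as minimizing $-x$ so that the objective is convex: for $P_2$ the unconstrained Lagrangian minimum runs off to $-\infty$ for every $\lambda$, while for $P_2'$ the multiplier $\lambda^* = -1/8$ places the minimum exactly on the constraint.

```python
import math

# P2:  minimize -x  s.t. x >= 0, sqrt(x) = 2   (constrained optimum x = 4)
# P2': minimize -x  s.t. x >= 0, x^2 = 16      (same problem, rewritten)

def L2(x, lam):          # Lagrangian for P2
    return -x + lam * (2 - math.sqrt(x))

def L2p(x, lam):         # Lagrangian for P2'
    return -x + lam * (16 - x * x)

xs = [0.1 * k for k in range(1, 100_001)]     # grid on (0, 10000]

# For P2, min_x L2(x, lam) is -infinity for every lam: the -x term dominates,
# so on any finite grid the minimum just tracks the largest x available.
for lam in (-10.0, -1.0, 0.0, 1.0, 10.0):
    assert min(L2(x, lam) for x in xs) < -9000

# For P2' with lam* = -1/8, L2p(., lam*) is convex with its minimum at x = 4,
# which satisfies x^2 = 16; so x* = 4 solves P2' by the sufficiency theorem.
lam_star = -1 / 8
best = min(xs, key=lambda x: L2p(x, lam_star))
assert abs(best - 4.0) < 1e-6
```

The grid bound $-9000$ is arbitrary; extending the grid drives the $P_2$ minima down without limit, which is exactly the failure described above.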
Define $\phi(b)$ as $\min f(x)$, subject to $x \in X$ and $g(x) = b$. Then the Lagrangian method works for $P(b^*)$ if and only if there is a line that is tangent to $\phi(b)$ at $b^*$ and lies completely below $\phi(b)$. This happens if $\phi(b)$ is a convex function of $b$, but this is a difficult condition to check. A set of sufficient conditions that is easier to check is provided in the following theorem. These conditions do not hold in $P_2$, as $g(x) = \sqrt{x}$ is not a convex function of $x$. In $P_2'$ the sufficient conditions are met.
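The tangent-line criterion can be tested directly on the two formulations above, viewed as families in $b$: for $P_2(b)$ (minimize $-x$ subject to $\sqrt{x} = b$) the optimal value is $\phi(b) = -b^2$, while for $P_2'(b)$ (minimize $-x$ subject to $x^2 = b$) it is $\phi(b) = -\sqrt{b}$. The short sketch below (our illustration) applies a midpoint convexity test on a grid to each.

```python
# phi(b) for the two formulations of the same underlying problem:
#   P2 (b):  minimize -x s.t. sqrt(x) = b  ->  x = b**2,    phi(b) = -b**2
#   P2'(b):  minimize -x s.t. x**2 = b     ->  x = sqrt(b), phi(b) = -sqrt(b)
phi_p2 = lambda b: -(b ** 2)       # concave in b: no supporting line below
phi_p2p = lambda b: -(b ** 0.5)    # convex in b: tangent lines lie below

# Midpoint test of convexity on a grid: phi is convex only if
# phi((b1+b2)/2) <= (phi(b1) + phi(b2)) / 2 for all pairs.
bs = [0.1 * k for k in range(1, 100)]
p2_convex = all(
    phi_p2((b1 + b2) / 2) <= (phi_p2(b1) + phi_p2(b2)) / 2 + 1e-12
    for b1 in bs for b2 in bs)
p2p_convex = all(
    phi_p2p((b1 + b2) / 2) <= (phi_p2p(b1) + phi_p2p(b2)) / 2 + 1e-12
    for b1 in bs for b2 in bs)
print(p2_convex, p2p_convex)   # False True
```

Only in the second formulation is $\phi$ convex, so only there does a tangent line lie below $\phi$ and the Lagrangian method succeed.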
Figure A.1 Lagrange multipliers. [The figure shows the set of feasible pairs $(g(x), f(x))$, the curve $\phi(b)$, and the supporting hyperplane $y = \phi(b^*) + \lambda^\top (b - b^*)$ at $b^*$.]

Theorem 6 If $f$ and $g$ are convex functions, $X$ is a convex set, and $x^*$ is an optimal solution to $P$, then there exist Lagrange multipliers $\lambda \in \mathbb{R}^m$ such that $L(x^*, \lambda) \le L(x, \lambda)$ for all $x \in X$.

Remark. Recall that $f$ is a convex function if for all $x_1, x_2 \in X$ and $\delta \in [0, 1]$, we have $f(\delta x_1 + (1 - \delta) x_2) \le \delta f(x_1) + (1 - \delta) f(x_2)$. $X$ is a convex set if for all $x_1, x_2 \in X$ and $\delta \in [0, 1]$, we have $\delta x_1 + (1 - \delta) x_2 \in X$. Furthermore, $f$ is a concave function if $-f$ is convex.

The proof of the theorem proceeds by showing that $\phi(b)$ is convex. This implies that for each $b^*$ there is a tangent hyperplane to $\phi(b)$ at $b^*$ with the graph of $\phi(b)$ lying entirely above it. This uses the so-called supporting hyperplane theorem for convex sets, which is geometrically obvious, but takes some work to prove. Note that the equation of the hyperplane is $y = \phi(b^*) + \lambda^\top (b - b^*)$ for some multipliers $\lambda$. This $\lambda$ can be shown to be the required vector of Lagrange multipliers, and Figure A.1 gives some geometric intuition as to why the Lagrange multipliers $\lambda$ exist and why these $\lambda$s give the rate of change of the optimum $\phi(b)$ with $b$.

A.4 Shadow prices

The maximizing $x$, the appropriate value of $\lambda$, and the maximal value of $f$ all depend on $b$. What happens if $b$ changes by a small amount? Let the maximizing $x$ be $x^*(b)$. Suppose the Lagrangian method works and let $\lambda^*(b)$ denote the appropriate value of $\lambda$. As above, let $\phi(b)$ denote the maximal value of $f$. We have

$$\phi(b) = f(x^*(b)) + \lambda^*(b) \big[ b - g(x^*(b)) \big].$$

So, simply differentiating with respect to $b$, we have

$$\frac{d\phi}{db} = \sum_i \frac{\partial}{\partial x_i} \Big\{ f(x^*(b)) + \lambda^* \big[ b - g(x^*(b)) \big] \Big\} \frac{\partial x_i^*}{\partial b} \;+\; \lambda^* \;+\; \big[ b - g(x^*(b)) \big] \frac{\partial \lambda^*}{\partial b},$$
where the first term on the right-hand side is 0, because $L(x, \lambda^*)$ is stationary with respect to $x_i$ at $x = x^*$, and the third term is zero because $b - g(x^*(b)) = 0$. Thus

$$\frac{d\phi(b)}{db} = \lambda^*,$$

and $\lambda^*$ can be interpreted as the rate at which the maximized value of $f$ increases with $b$, for small increases around $b$. For this reason, the Lagrange multiplier $\lambda^*$ is also called a shadow price, the idea being that if $b$ increases to $b + \delta$ then we should be prepared to pay $\lambda^* \delta$ for the increase we receive in $f$.

It can happen that at the optimum none, one, or several constraints are active. For example, with constraints $g_1(x) \le b_1$ and $g_2(x) \le b_2$, it can happen that at the optimum $g_1(x^*) = b_1$ and $g_2(x^*) < b_2$. In this case we will find $\lambda_2^* = 0$. This makes sense. The second constraint is not limiting the maximization of $f$, and so the shadow price of $b_2$ is zero.

A.5 The dual problem

By similar reasoning to that we used in the proof of the Lagrangian Sufficiency Theorem, we have that for any $\lambda$

$$\phi(b) = \max_{x \in X :\, g(x) = b} f(x) = \max_{x \in X :\, g(x) = b} \big[ f(x) + \lambda (b - g(x)) \big] \le \max_{x \in X} \big[ f(x) + \lambda (b - g(x)) \big].$$

The right-hand side provides an upper bound on $\phi(b)$. We make this upper bound as tight as possible by minimizing over $\lambda$, so that we have

$$\phi(b) \le \min_{\lambda} \max_{x \in X} \big[ f(x) + \lambda (b - g(x)) \big].$$

The right-hand side above defines an optimization problem, called the dual problem. The original problem is called the primal problem. If the primal can be solved by the Lagrangian method then the inequality above is an equality, and the solution to the dual problem is just $\lambda^*(b)$. If the primal cannot be solved by the Lagrangian method we will have a strict inequality, the so-called duality gap. The dual problem is interesting because it can sometimes be easier to solve, or because it formulates the problem in an illuminating way. The dual of $P_1$ is

$$\underset{\lambda}{\text{minimize}} \; \sum_i w_i \log(w_i / \lambda) + \lambda \Big( b - \sum_i w_i / \lambda \Big),$$

where we have inserted $x_i = w_i / \lambda$ after carrying out the inner maximization over $x$. This is a convex function of $\lambda$.
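The shadow-price interpretation $d\phi/db = \lambda^*$ is easy to verify numerically for $P_1$, where $\phi(b) = \sum_i w_i \log(w_i b / \sum_j w_j)$ and $\lambda^*(b) = \sum_i w_i / b$. The sketch below (our illustration) compares a finite-difference derivative of $\phi$ with the multiplier.

```python
import math

w = [1.0, 2.0, 3.0]

def phi(b):
    """Maximal value of P1: sum_i w_i log(w_i b / sum_j w_j)."""
    return sum(wi * math.log(wi * b / sum(w)) for wi in w)

def lam_star(b):
    """Shadow price for P1: lam* = sum_i w_i / b."""
    return sum(w) / b

b, delta = 6.0, 1e-6
fd = (phi(b + delta) - phi(b - delta)) / (2 * delta)   # central difference
assert abs(fd - lam_star(b)) < 1e-5
print(fd, lam_star(b))   # both approximately 1.0
```

A unit increase in the bandwidth $b$ is thus worth about $\lambda^*$ units of aggregate user benefit, which is what a buyer should be prepared to pay for it.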
Differentiating with respect to $\lambda$, one can check that the stationary point is the minimum, attained where $\sum_i w_i / \lambda = b$. This gives $\lambda^*$, and finally, as before,

$$\phi(b) = \sum_i w_i \log \left( \frac{w_i b}{\sum_j w_j} \right).$$
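Since $P_1$ is solvable by the Lagrangian method, there is no duality gap: the minimized dual objective equals the primal optimum. A quick numerical confirmation (our illustration) evaluates the dual function at $\lambda^*$ and at a few other multipliers.

```python
import math

w, b = [1.0, 2.0, 3.0], 6.0

def dual(lam):
    """Dual objective of P1, after the inner maximization sets x_i = w_i/lam."""
    return sum(wi * math.log(wi / lam) for wi in w) + lam * (b - sum(w) / lam)

lam_star = sum(w) / b                # stationary point: sum_i w_i/lam = b
primal_opt = sum(wi * math.log(wi * b / sum(w)) for wi in w)

assert abs(dual(lam_star) - primal_opt) < 1e-9    # no duality gap
# lam_star minimizes the (convex) dual function:
for lam in (0.5, 0.9, 1.1, 2.0):
    assert dual(lam) >= dual(lam_star) - 1e-12
```

Any other $\lambda$ gives a strictly larger dual value, i.e. a looser upper bound on $\phi(b)$.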
The dual plays a particularly important role in the theory of linear programming. A linear program, such as

$$P: \quad \text{maximize } c^\top x, \quad \text{subject to } x \ge 0 \text{ and } Ax \le b,$$

is one in which both the objective function and the constraints are linear. Here $x, c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and $A$ is an $m \times n$ matrix. The dual of $P$ is $D$:

$$D: \quad \text{minimize } \lambda^\top b, \quad \text{subject to } \lambda \ge 0 \text{ and } \lambda^\top A \ge c^\top.$$

$D$ is another linear program, and the dual of $D$ is $P$. The decision variables in $D$, i.e. $\lambda_1, \dots, \lambda_m$, are the Lagrange multipliers for the $m$ constraints expressed by $Ax \le b$ in $P$.

A.6 Further reading

For further details on Lagrangian methods of constrained optimization, see the course notes of Weber (1998).
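The key structural fact behind this pairing is weak duality: for any $P$-feasible $x$ and $D$-feasible $\lambda$, $c^\top x \le \lambda^\top A x \le \lambda^\top b$, so every dual-feasible point bounds the primal from above. The sketch below (our illustration; the particular $A$, $b$, $c$ and the two feasible points are made up for the example) checks this on a small LP where the two objective values in fact coincide, certifying that both points are optimal.

```python
# Weak duality for the LP pair
#   P: max c'x   s.t. x >= 0,  Ax <= b
#   D: min lam'b s.t. lam >= 0, lam'A >= c'
# For any feasible x and lam:  c'x <= lam'(Ax) <= lam'b.
A = [[1.0, 2.0],
     [3.0, 1.0]]
b = [4.0, 6.0]
c = [3.0, 2.0]

def feasible_primal(x):
    return all(xi >= 0 for xi in x) and all(
        sum(A[i][j] * x[j] for j in range(2)) <= b[i] + 1e-9 for i in range(2))

def feasible_dual(lam):
    return all(li >= 0 for li in lam) and all(
        sum(lam[i] * A[i][j] for i in range(2)) >= c[j] - 1e-9 for j in range(2))

x = [1.6, 1.2]       # primal-feasible: Ax = (4.0, 6.0) <= b
lam = [0.6, 0.8]     # dual-feasible: lam'A = (3.0, 2.0) >= c
assert feasible_primal(x) and feasible_dual(lam)

primal_val = sum(ci * xi for ci, xi in zip(c, x))    # c'x
dual_val = sum(li * bi for li, bi in zip(lam, b))    # lam'b
assert primal_val <= dual_val + 1e-9                 # weak duality
assert abs(primal_val - dual_val) < 1e-9             # equal: both optimal
```

Both values come to 7.2; because a feasible primal value can never exceed a feasible dual value, matching values prove optimality of both $x$ and $\lambda$, with $\lambda$ playing exactly the multiplier role described above.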