Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem


Michael Patriksson

The Relaxation Theorem

Problem: find

f* := infimum f(x), subject to x ∈ S, (1)

where f: R^n → R is a given function and S ⊆ R^n.

A relaxation to (1) has the following form: find

f*_R := infimum f_R(x), subject to x ∈ S_R, (2)

where f_R: R^n → R is a function with f_R ≤ f on S, and S_R ⊇ S.

Relaxation Theorem:

(a) [relaxation] f*_R ≤ f*.
(b) [infeasibility] If (2) is infeasible, then so is (1).
(c) [optimal relaxation] If the problem (2) has an optimal solution x*_R for which it holds that

x*_R ∈ S and f_R(x*_R) = f(x*_R), (3)

then x*_R is an optimal solution to (1) as well.

Proof of (c): f(x*_R) = f_R(x*_R) ≤ f_R(x) ≤ f(x) for all x ∈ S.

Applications: exterior penalty methods yield lower bounds on f*; Lagrangian relaxation yields a lower bound on f*.

Lagrangian relaxation

Consider the optimization problem to find

f* := infimum f(x), subject to x ∈ X, g_i(x) ≤ 0, i = 1, ..., m, (4)

where f: R^n → R and g_i: R^n → R (i = 1, 2, ..., m) are given functions, and X ⊆ R^n.

For this problem, we assume that

−∞ < f* < ∞, (5)

that is, that the problem is bounded from below and has at least one feasible solution.

For a vector µ ∈ R^m, we define the Lagrange function

L(x, µ) := f(x) + Σ_{i=1}^m µ_i g_i(x) = f(x) + µ^T g(x). (6)

We call a vector µ* ∈ R^m a Lagrange multiplier if it is non-negative and if f* = inf_{x∈X} L(x, µ*) holds.

Lagrange multipliers and global optima

Let µ* be a Lagrange multiplier. Then, x* is an optimal solution to (4) if and only if x* is feasible in (4) and

x* ∈ arg min_{x∈X} L(x, µ*), and µ*_i g_i(x*) = 0, i = 1, ..., m. (7)

Notice the resemblance to the KKT conditions! If X = R^n and all functions are in C^1, then x* ∈ arg min_{x∈X} L(x, µ*) is the same as the force equilibrium condition, the first row of the KKT conditions. The second item, µ*_i g_i(x*) = 0 for all i, is the complementarity conditions.

This seems to imply that there is a hidden convexity assumption here. Yes, there is: we show a Strong Duality Theorem later.

The Lagrangian dual problem

Associated with the Lagrangian relaxation,

q(µ) := infimum_{x∈X} L(x, µ) (8)

is the Lagrangian dual function. The Lagrangian dual problem is to

maximize q(µ), subject to µ ≥ 0^m. (9)

For some µ, q(µ) = −∞ is possible; if this is true for all µ ≥ 0^m, then

q* := supremum_{µ ≥ 0^m} q(µ) = −∞.

The effective domain of q is D_q := { µ ∈ R^m : q(µ) > −∞ }. The effective domain D_q of q is convex, and q is concave on D_q.

That the Lagrangian dual problem always is convex (we indeed maximize a concave function!) is very good news! But we still need to show how a Lagrangian dual optimal solution can be used to generate a primal optimal solution.

Weak Duality Theorem

Let x and µ be feasible in (4) and (9), respectively. Then,

q(µ) ≤ f(x); in particular, q* ≤ f*.

If q(µ) = f(x), then the pair (x, µ) is optimal in its respective problem.
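Weak duality is easy to check numerically. The sketch below uses a toy problem that is not from the slides (minimize x² subject to 1 − x ≤ 0, so f* = 1, and the dual function works out to q(µ) = µ − µ²/4):

```python
# Toy problem (an illustration, not from the lecture):
# minimize x^2 subject to 1 - x <= 0.
# L(x, mu) = x^2 + mu*(1 - x) is minimized at x = mu/2, so q(mu) = mu - mu^2/4.

def f(x):
    return x * x

def q(mu):
    x = mu / 2.0                      # unconstrained minimizer of L(., mu)
    return x * x + mu * (1.0 - x)     # equals mu - mu^2/4

# Weak duality: q(mu) <= f(x) for every feasible x (x >= 1) and every mu >= 0.
for mu in [0.0, 0.5, 1.0, 2.0, 5.0]:
    for x in [1.0, 1.5, 2.0, 3.0]:
        assert q(mu) <= f(x) + 1e-12

# Here the dual optimum mu* = 2 closes the gap: q(2) = 1 = f(1) = f*.
assert abs(q(2.0) - 1.0) < 1e-12
```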

Weak duality is also a consequence of the Relaxation Theorem: for any µ ≥ 0^m, let

S := X ∩ { x ∈ R^n : g(x) ≤ 0^m }, S_R := X, f_R := L(·, µ), (10)

and apply the Relaxation Theorem.

If q* = f*, there is no duality gap. If there exists a Lagrange multiplier vector, then by the weak duality theorem there is no duality gap. There may be cases where no Lagrange multiplier exists even when there is no duality gap; in that case, the Lagrangian dual problem cannot have an optimal solution.

Global optimality conditions

The vector (x*, µ*) is a pair of optimal primal solution and Lagrange multiplier if and only if

µ* ≥ 0^m, (Dual feasibility) (11a)
x* ∈ arg min_{x∈X} L(x, µ*), (Lagrangian optimality) (11b)
x* ∈ X, g(x*) ≤ 0^m, (Primal feasibility) (11c)
µ*_i g_i(x*) = 0, i = 1, ..., m. (Complementary slackness) (11d)

The existence of such a pair (x*, µ*) is equivalent to a zero duality gap and the existence of Lagrange multipliers.

Saddle points

The vector (x*, µ*) is a pair of optimal primal solution and Lagrange multiplier if and only if x* ∈ X, µ* ≥ 0^m, and (x*, µ*) is a saddle point of the Lagrangian function on X × R^m_+, that is,

L(x*, µ) ≤ L(x*, µ*) ≤ L(x, µ*), (x, µ) ∈ X × R^m_+, (12)

holds.

The existence of such a pair (x*, µ*) is equivalent to the global optimality conditions (11), the existence of Lagrange multipliers, and a zero duality gap.

Strong duality for convex programs, introduction

The results so far have been rather non-technical to achieve: convexity of the dual problem comes with very few assumptions on the original, primal problem, and the characterization of the primal-dual set of optimal solutions is simple and quite easily established. Establishing strong duality, that is, sufficient conditions under which there is no duality gap, takes much more. In particular, as is the case with the KKT conditions, we need regularity conditions (that is, constraint qualifications), and we also need to utilize separation theorems.

Strong Duality Theorem

Consider problem (4), where f: R^n → R and g_i (i = 1, ..., m) are convex and X ⊆ R^n is a convex set. Introduce the following Slater condition:

∃x ∈ X with g(x) < 0^m. (13)

Suppose that (5) and Slater's CQ (13) hold for the (convex) problem (4).

(a) There is no duality gap and there exists at least one Lagrange multiplier µ*. Moreover, the set of Lagrange multipliers is bounded and convex.

(b) If the infimum in (4) is attained at some x*, then the pair (x*, µ*) satisfies the global optimality conditions (11).

(c) If the functions f and g_i are in C^1, then the condition (11b) can be written as a variational inequality. If, further, X is open (for example, X = R^n), then the conditions (11) are the same as the KKT conditions.

Similar statements hold for the case of also having linear equality constraints. If all constraints are linear, we can remove the Slater condition.

Examples, I: an explicit, differentiable dual problem

Consider the problem to

minimize f(x) := x_1² + x_2², subject to x_1 + x_2 ≥ 4, x_j ≥ 0, j = 1, 2.

Let g(x) := −x_1 − x_2 + 4 and X := { (x_1, x_2) : x_j ≥ 0, j = 1, 2 }.

The Lagrangian dual function is

q(µ) = min_{x∈X} L(x, µ) := f(x) + µ(−x_1 − x_2 + 4)
     = 4µ + min_{x∈X} { x_1² + x_2² − µx_1 − µx_2 }
     = 4µ + min_{x_1 ≥ 0} { x_1² − µx_1 } + min_{x_2 ≥ 0} { x_2² − µx_2 }, µ ≥ 0.

For a fixed µ ≥ 0, the minimum is attained at x_1(µ) = µ/2, x_2(µ) = µ/2. Substituting this expression into q(µ), we obtain

q(µ) = f(x(µ)) + µ(−x_1(µ) − x_2(µ) + 4) = 4µ − µ²/2.

Note that q is strictly concave, and it is differentiable everywhere (due to the fact that f, g are differentiable and x(µ) is unique).

We then have q′(µ) = 4 − µ = 0 ⟺ µ = 4. Since µ = 4 ≥ 0, it is the optimum in the dual problem:

µ* = 4; x* = (x_1(µ*), x_2(µ*))^T = (2, 2)^T.

Also, f(x*) = q(µ*) = 8.

This is an example where the dual function is differentiable. In this particular case, the optimum x* is also unique, and is automatically given by x* = x(µ*).
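As a numerical sanity check (a sketch, not part of the lecture), the closed-form dual q(µ) = 4µ − µ²/2 can be compared against a brute-force grid minimization of the Lagrangian, and the optimal values confirmed:

```python
# Example I: minimize x1^2 + x2^2 subject to x1 + x2 >= 4, x1, x2 >= 0.
# g(x) = -x1 - x2 + 4, X = {x >= 0}; the slide derives q(mu) = 4*mu - mu^2/2.

def q_closed(mu):
    return 4.0 * mu - mu * mu / 2.0

def q_brute(mu, grid=2000, hi=10.0):
    # crude grid minimization of t^2 - mu*t over t in [0, hi]; the Lagrangian
    # subproblem is separable and identical in x1 and x2
    inner = min((i * hi / grid) ** 2 - mu * (i * hi / grid)
                for i in range(grid + 1))
    return 4.0 * mu + 2.0 * inner

for mu in [0.0, 1.0, 2.0, 4.0, 6.0]:
    assert abs(q_closed(mu) - q_brute(mu)) < 1e-3

mu_star = 4.0                            # from q'(mu) = 4 - mu = 0
x_star = (mu_star / 2.0, mu_star / 2.0)  # x(mu*) = (2, 2)
f_star = x_star[0] ** 2 + x_star[1] ** 2
assert f_star == q_closed(mu_star) == 8.0   # no duality gap
```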

Examples, II: an implicit, non-differentiable dual problem

Consider the linear programming problem to

minimize f(x) := −x_1 − x_2, subject to 2x_1 + 4x_2 ≤ 3, 0 ≤ x_1 ≤ 2, 0 ≤ x_2 ≤ 1.

The optimal solution is x* = (3/2, 0)^T, f(x*) = −3/2.

Consider Lagrangian relaxing the first constraint,

obtaining

L(x, µ) = −x_1 − x_2 + µ(2x_1 + 4x_2 − 3);

q(µ) = −3µ + min_{0 ≤ x_1 ≤ 2} { (−1 + 2µ)x_1 } + min_{0 ≤ x_2 ≤ 1} { (−1 + 4µ)x_2 }

     = −3 + 5µ, 0 ≤ µ ≤ 1/4,
     = −2 + µ,  1/4 ≤ µ ≤ 1/2,
     = −3µ,     1/2 ≤ µ.

We have that µ* = 1/2, and hence q(µ*) = −3/2. For linear programs we have strong duality, but how do we obtain the optimal primal solution from µ*? q is non-differentiable at µ*; we utilize the characterization given in (11).

First, at µ*, it is clear that the subproblem solution set is X(µ*) = { (2α, 0)^T : 0 ≤ α ≤ 1 }. Among the subproblem solutions, we next have to find one that is primal feasible as well as complementary.

Primal feasibility means that 2 · 2α + 4 · 0 ≤ 3 ⟺ α ≤ 3/4.

Further, complementarity means that µ*(2x_1 + 4x_2 − 3) = 0 ⟺ α = 3/4, since µ* > 0. We conclude that the only primal vector x that satisfies the system (11) together with the dual optimal solution µ* = 1/2 is x* = (3/2, 0)^T.

Observe, finally, that f* = q*.
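This recovery of the primal solution can be checked in a few lines (a numerical sketch, not from the lecture): the piecewise dual function, the dual optimum µ* = 1/2, and the unique complementary subproblem solution α = 3/4.

```python
# Example II: minimize -x1 - x2 s.t. 2*x1 + 4*x2 <= 3, 0 <= x1 <= 2, 0 <= x2 <= 1.

def q(mu):
    # q(mu) = -3*mu + box-minimum of (-1+2*mu)*x1 + (-1+4*mu)*x2;
    # a linear function on a box attains its minimum at a vertex
    return (-3.0 * mu
            + min((-1.0 + 2.0 * mu) * x1 for x1 in (0.0, 2.0))
            + min((-1.0 + 4.0 * mu) * x2 for x2 in (0.0, 1.0)))

# the three pieces from the slide
assert abs(q(0.1) - (-3.0 + 5.0 * 0.1)) < 1e-12   # 0 <= mu <= 1/4
assert abs(q(0.3) - (-2.0 + 0.3)) < 1e-12         # 1/4 <= mu <= 1/2
assert abs(q(0.8) - (-3.0 * 0.8)) < 1e-12         # 1/2 <= mu

mu_star = 0.5
assert abs(q(mu_star) - (-1.5)) < 1e-12           # q* = -3/2 = f*

# Among subproblem solutions x(alpha) = (2*alpha, 0), complementarity
# mu*(2*x1 + 4*x2 - 3) = 0 forces alpha = 3/4, i.e. x* = (3/2, 0).
alpha = 3.0 / 4.0
x_star = (2.0 * alpha, 0.0)
assert 2.0 * x_star[0] + 4.0 * x_star[1] - 3.0 == 0.0  # feasible and complementary
assert -x_star[0] - x_star[1] == -1.5                  # f(x*) = q(mu*)
```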

Why must µ* = 1/2? According to the global optimality conditions, the optimal solution must, in this convex case, be among the subproblem solutions. Since x*_1 is not at one of its bounds (it lies strictly between 0 and 2), the value of µ* has to be such that the cost term for x_1 in L(x, µ*) is identically zero! That is, −1 + 2µ* = 0 implies that µ* = 1/2.

A non-coordinability phenomenon: when the subproblem solution is non-unique, the optimal primal solution is not obtained automatically. In non-convex cases, the optimal solution may not even be among the points in X(µ*). What do we do then?

Subgradients of convex functions

Let f: R^n → R be a convex function. We say that a vector p ∈ R^n is a subgradient of f at x ∈ R^n if

f(y) ≥ f(x) + p^T(y − x), y ∈ R^n. (14)

The set of such vectors p defines the subdifferential of f at x, denoted ∂f(x). This set is the collection of slopes of the function f at x. For every x ∈ R^n, ∂f(x) is a non-empty, convex, and compact set.
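The defining inequality (14) can be tested directly; a small sketch for the standard example f(x) = |x| (not from the slides), where ∂f(0) = [−1, 1]:

```python
# p is a subgradient of f at x if f(y) >= f(x) + p*(y - x) for all y;
# we test the inequality on a grid of y-values.

def is_subgradient(f, p, x, ys):
    return all(f(y) >= f(x) + p * (y - x) - 1e-12 for y in ys)

f = abs
ys = [i / 10.0 for i in range(-50, 51)]   # grid on [-5, 5]

# At x = 0, every p in [-1, 1] satisfies the inequality...
for p in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    assert is_subgradient(f, p, 0.0, ys)

# ...but p outside [-1, 1] fails (e.g. at y = 1: |1| < 1.5 * 1).
assert not is_subgradient(f, 1.5, 0.0, ys)

# At the differentiable point x = 2, only p = f'(2) = 1 works.
assert is_subgradient(f, 1.0, 2.0, ys)
assert not is_subgradient(f, 0.5, 2.0, ys)
```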

Figure 1: Four possible slopes of the convex function f at x.

Figure 2: The subdifferential of a convex function f at x.

The convex function f is differentiable at x exactly when there exists one and only one subgradient of f at x, which then is the gradient of f at x, ∇f(x).

Differentiability of the Lagrangian dual function: introduction

Consider the problem (4), under the assumption that

f, g_i (i = 1, ..., m) are continuous; X is nonempty and compact. (15)

Then, the set of solutions to the Lagrangian subproblem,

X(µ) := arg min_{x∈X} L(x, µ), µ ∈ R^m, (16)

is non-empty and compact for every µ. We develop the subdifferentiability properties of the function q.

Subgradients and gradients of q

Suppose that, in the problem (4), (15) holds. Then the dual function q is finite, continuous and concave on R^m. If its supremum over R^m_+ is attained, the optimal solution set therefore is closed and convex.

Let µ ∈ R^m. If x ∈ X(µ), then g(x) is a subgradient to q at µ, that is, g(x) ∈ ∂q(µ).

Proof: Let µ̄ ∈ R^m be arbitrary. We have that

q(µ̄) = inf_{y∈X} L(y, µ̄) ≤ f(x) + µ̄^T g(x) = f(x) + µ^T g(x) + (µ̄ − µ)^T g(x) = q(µ) + (µ̄ − µ)^T g(x).
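For Example I, where q(µ) = 4µ − µ²/2 and x(µ) = (µ/2, µ/2), this result says that g(x(µ)) = 4 − µ satisfies the concave subgradient inequality; a quick numerical sketch over a grid:

```python
# Example I: q(mu) = 4*mu - mu^2/2, unique subproblem solution x(mu) = (mu/2, mu/2),
# constraint value g(x) = -x1 - x2 + 4. Check q(nu) <= q(mu) + g(x(mu))*(nu - mu).

def q(mu):
    return 4.0 * mu - mu * mu / 2.0

def g_sub(mu):
    x1 = x2 = mu / 2.0            # unique Lagrangian subproblem solution
    return -x1 - x2 + 4.0         # the constraint value, a subgradient of q at mu

mus = [i / 4.0 for i in range(0, 33)]   # grid on [0, 8]
for mu in mus:
    for nu in mus:
        # concave subgradient inequality (the concave analogue of (14))
        assert q(nu) <= q(mu) + g_sub(mu) * (nu - mu) + 1e-12
```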

Example

Let h(x) = min{ h_1(x), h_2(x) }, where h_1(x) = 4 − x and h_2(x) = 4 − (x − 2)². Then,

h(x) = 4 − x,         1 ≤ x ≤ 4,
     = 4 − (x − 2)²,  x ≤ 1 or x ≥ 4.

The function h is non-differentiable at x = 1 and x = 4, since its graph has non-unique supporting hyperplanes there:

∂h(x) = {−1},      1 < x < 4,
      = {4 − 2x},  x < 1 or x > 4,
      = [−1, 2],   x = 1,
      = [−4, −1],  x = 4.

The subdifferential is here either a singleton (at differentiable points) or an interval (at non-differentiable points).
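The endpoints of these intervals are the one-sided derivatives of h at the kinks, which can be confirmed numerically (a small sketch):

```python
# h(x) = min(4 - x, 4 - (x - 2)^2); at a kink, the interval in the
# subdifferential is spanned by the one-sided derivatives.

def h(x):
    return min(4.0 - x, 4.0 - (x - 2.0) ** 2)

def one_sided(x, side, eps=1e-6):
    # side = +1 for the right derivative, -1 for the left derivative
    return (h(x + side * eps) - h(x)) / (side * eps)

# At x = 1: right slope -1 (from 4 - x), left slope 4 - 2*1 = 2 -> [-1, 2].
assert abs(one_sided(1.0, +1) - (-1.0)) < 1e-4
assert abs(one_sided(1.0, -1) - 2.0) < 1e-4

# At x = 4: left slope -1, right slope 4 - 2*4 = -4 -> [-4, -1].
assert abs(one_sided(4.0, -1) - (-1.0)) < 1e-4
assert abs(one_sided(4.0, +1) - (-4.0)) < 1e-4
```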

The Lagrangian dual problem

Let µ ∈ R^m. Then, ∂q(µ) = conv { g(x) : x ∈ X(µ) }.

Let µ ∈ R^m. The dual function q is differentiable at µ if and only if { g(x) : x ∈ X(µ) } is a singleton set [the vector of constraint functions is invariant over X(µ)]. Then,

∇q(µ) = g(x), for every x ∈ X(µ).

This holds in particular if the Lagrangian subproblem has a unique solution [X(µ) is a singleton set]. It is, in particular, satisfied if X is convex, f is strictly convex on X, and g_i (i = 1, ..., m) are convex; q is then even in C^1.

How do we write the subdifferential of h?

Theorem: If h(x) = min_{i=1,...,m} h_i(x), where each function h_i is concave and differentiable on R^n, then h is a concave function on R^n.

Let I(x̄) ⊆ {1, ..., m} be defined by h(x̄) = h_i(x̄) for i ∈ I(x̄) and h(x̄) < h_i(x̄) for i ∉ I(x̄) (the active segments at x̄). Then, the subdifferential ∂h(x̄) is the convex hull of { ∇h_i(x̄) : i ∈ I(x̄) }, that is,

∂h(x̄) = { ξ = Σ_{i∈I(x̄)} λ_i ∇h_i(x̄) : Σ_{i∈I(x̄)} λ_i = 1; λ_i ≥ 0, i ∈ I(x̄) }.

Optimality conditions for the dual problem

For a differentiable, concave function h it holds that

x* ∈ arg max_{x∈R^n} h(x) ⟺ ∇h(x*) = 0^n.

Theorem: Assume that h is concave on R^n. Then,

x* ∈ arg max_{x∈R^n} h(x) ⟺ 0^n ∈ ∂h(x*).

Proof: Suppose that 0^n ∈ ∂h(x*). Then h(x) ≤ h(x*) + (0^n)^T(x − x*) for all x ∈ R^n, that is, h(x) ≤ h(x*) for all x ∈ R^n. Conversely, suppose that x* ∈ arg max_{x∈R^n} h(x). Then h(x) ≤ h(x*) = h(x*) + (0^n)^T(x − x*) for all x ∈ R^n, that is, 0^n ∈ ∂h(x*).

In the example above: 0 ∈ ∂h(1) ⟹ x* = 1.

For optimization with constraints, the condition is generalized thus:

x* ∈ arg max_{x∈X} h(x) ⟺ ∂h(x*) ∩ N_X(x*) ≠ ∅,

where N_X(x*) is the normal cone to X at x*, that is, the conical hull of the active constraints' normals at x*.

In the case of the dual problem we have only sign conditions. Consider the dual problem (9), and let µ* ≥ 0^m. It is optimal in (9) if and only if there exists a subgradient g ∈ ∂q(µ*) for which the following holds:

g ≤ 0^m; µ*_i g_i = 0, i = 1, ..., m.

Compare with a one-dimensional max-problem (h concave): x* is optimal if and only if h′(x*) ≤ 0; x* h′(x*) = 0; x* ≥ 0.
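For Example II these sign conditions can be checked directly (a numerical sketch): at µ* = 1/2 the subproblem solutions are x(α) = (2α, 0), giving subgradients g = 2 · 2α + 4 · 0 − 3 = 4α − 3, and the conditions single out α = 3/4:

```python
# Example II at mu* = 1/2: X(mu*) = {(2a, 0) : 0 <= a <= 1}, and the
# subgradients of q at mu* are g(x) = 2*x1 + 4*x2 - 3 = 4*a - 3.
mu_star = 0.5
alphas = [i / 100.0 for i in range(101)]
subgrads = [4.0 * a - 3.0 for a in alphas]

# Optimality of mu* requires some subgradient g with g <= 0 and mu* * g = 0;
# since mu* > 0 this forces g = 0, which happens exactly at a = 3/4.
satisfying = [(a, g) for a, g in zip(alphas, subgrads)
              if g <= 0.0 and abs(mu_star * g) < 1e-12]
assert satisfying == [(0.75, 0.0)]
```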

A subgradient method for the dual problem

Subgradient methods extend gradient projection methods from the C^1 case to general convex (or concave) functions, generating a sequence of dual vectors in R^m_+ using a single subgradient in each iteration. The simplest type of iteration has the form

µ^{k+1} = Proj_{R^m_+} [µ^k + α_k g^k] = [µ^k + α_k g^k]_+ = ( max{ 0, (µ^k)_i + α_k (g^k)_i } )_{i=1}^m, (17)

where g^k ∈ ∂q(µ^k) is arbitrarily chosen.

We often use g^k = g(x^k), where x^k ∈ arg min_{x∈X} L(x, µ^k).

Main difference to the C^1 case: an arbitrary subgradient g^k may not be an ascent direction! We cannot do line searches; we must use predetermined step lengths α_k.

Suppose that µ^k ∈ R^m_+ is not optimal in (9). Then, for every optimal solution µ* ∈ U in (9),

‖µ^{k+1} − µ*‖ < ‖µ^k − µ*‖

holds for every step length α_k in the interval

α_k ∈ ( 0, 2[q* − q(µ^k)] / ‖g^k‖² ). (18)
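The distance-decrease property is easy to observe on Example I's one-dimensional dual (a sketch, not from the lecture): any step length inside the interval (18) moves strictly closer to µ* = 4.

```python
# Example I's dual: q(mu) = 4*mu - mu^2/2, mu* = 4, q* = 8. Steps inside the
# interval (18) strictly decrease the distance to the dual optimum.

def q(mu):
    return 4.0 * mu - mu * mu / 2.0

mu_star, q_star = 4.0, 8.0
mu_k = 1.0
g_k = 4.0 - mu_k                                 # subgradient of q at mu_k
limit = 2.0 * (q_star - q(mu_k)) / g_k ** 2      # right end of interval (18)

for frac in [0.1, 0.5, 0.9]:
    alpha = frac * limit
    mu_next = max(0.0, mu_k + alpha * g_k)       # one iteration of (17)
    assert abs(mu_next - mu_star) < abs(mu_k - mu_star)
```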

Why? Let ḡ ∈ ∂q(µ̄), and let U be the set of optimal solutions to (9). Then,

U ⊆ { µ ∈ R^m : ḡ^T(µ − µ̄) ≥ 0 }.

In other words, ḡ defines a half-space that contains the set of optimal solutions. Good news: if the step length is small enough, we get closer to the set of optimal solutions!

Figure 3: The half-space defined by the subgradient g of q at µ. Note that the subgradient is not an ascent direction.

Polyak step length rule:

σ ≤ α_k ≤ 2[q* − q(µ^k)] / ‖g^k‖² − σ, k = 1, 2, ..., (19)

where σ > 0 makes sure that we do not allow the step lengths to converge to zero or to a too large value. Bad news: the rule utilizes knowledge of the optimal value q*! (It can be replaced by an upper bound on q* which is updated.)

The divergent series step length rule:

α_k > 0, k = 1, 2, ...; lim_{k→∞} α_k = 0; Σ_{s=1}^∞ α_s = +∞. (20)

An additional condition is often added:

Σ_{s=1}^∞ α_s² < +∞. (21)

Suppose that the problem (4) is feasible, and that (15) and (13) hold.

(a) Let {µ^k} be generated by the method (17) under the Polyak step length rule (19), where σ is a small positive number. Then, {µ^k} converges to an optimal solution to (9).

(b) Let {µ^k} be generated by the method (17) under the divergent series step length rule (20). Then, {q(µ^k)} → q*, and {dist_U(µ^k)} → 0.

(c) Let {µ^k} be generated by the method (17) under the divergent series step length rule (20), (21). Then, {µ^k} converges to an optimal solution to (9).

Application to the Lagrangian dual problem

Given µ^k ≥ 0^m:

1. Solve the Lagrangian subproblem to minimize L(x, µ^k) over x ∈ X; let an optimal solution be x^k.
2. Calculate g(x^k) ∈ ∂q(µ^k).
3. Take a (positive) step in the direction of g(x^k) from µ^k, according to a step length rule.
4. Set any negative components of this vector to zero. We have obtained µ^{k+1}.
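The steps above can be sketched in a few lines (an illustration, not from the lecture), applied to the dual of Example I with the divergent-series steps α_k = 1/(k+1), which satisfy (20) and (21):

```python
# Projected subgradient method (17) for the dual of Example I:
# maximize q(mu) = 4*mu - mu^2/2 over mu >= 0, with mu* = 4, x* = (2, 2).

def subproblem(mu):
    # minimize x1^2 + x2^2 + mu*(-x1 - x2 + 4) over x >= 0: separable, x_j = mu/2
    x = (max(0.0, mu / 2.0), max(0.0, mu / 2.0))
    g = -x[0] - x[1] + 4.0           # g(x^k), a subgradient of q at mu
    return x, g

mu = 0.0
for k in range(1, 2001):
    x, g = subproblem(mu)            # steps 1 and 2
    mu = max(0.0, mu + g / (k + 1))  # steps 3 and 4: step, then project onto R_+

assert abs(mu - 4.0) < 1e-2          # mu^k -> mu* = 4
assert abs(x[0] - 2.0) < 1e-2        # x(mu^k) -> x* = (2, 2)
```

With this step rule the error after k iterations is exactly 4/(k + 1) here, illustrating the slow but guaranteed convergence of rule (20).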

Additional algorithms

We can choose the subgradient more carefully, so as to obtain ascent directions. This amounts to gathering several subgradients at nearby points and solving quadratic programming problems to find the best convex combination of them (bundle methods).

We can pre-multiply the subgradient by some positive definite matrix, giving methods similar to Newton methods (space dilation methods).

We can pre-project the subgradient vector (onto the tangent cone of R^m_+) so that the direction taken is a feasible direction (subgradient-projection methods).

More to come

Discrete optimization: the size of the duality gap, and the relation to the continuous relaxation; convexification. Primal feasibility heuristics. Global optimality conditions for discrete optimization (and general problems).