On the Method of Lagrange Multipliers


Reza Nasiri Mahalati
November 6, 2016

Most of what is in this note is taken from the book Convex Optimization by Stephen Boyd and Lieven Vandenberghe. It should demystify the method of Lagrange multipliers to some extent, and help you understand why and when the method works.

The Lagrange dual function

Generally, an optimization problem in standard form is given by

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, ..., m                   (1)
                h_i(x) = 0,  i = 1, ..., p,

with variable x ∈ Rⁿ. We assume its domain D = ∩_{i=0}^m dom f_i ∩ ∩_{i=1}^p dom h_i is nonempty (dom denotes the domain of a function), and we denote the optimal value of (1) by p*.

As an example, the general norm minimization with equality constraints that we discussed in class is the special case of (1) with

    f_0(x) = (1/2)‖Ax − b‖²,    h_i(x) = c_iᵀx − d_i,  i = 1, ..., p,    (2)

and no inequality constraints (there are no f_i(x), i = 1, ..., m). We simply write the p equality constraints in matrix form as Cx − d = 0.

The basic idea in Lagrangian duality is to take the constraints in (1) into account by augmenting the objective function with a weighted sum of the constraint functions. We define the Lagrangian L : Rⁿ × Rᵐ × Rᵖ → R associated with problem (1) as

    L(x, λ, ν) = f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x),

with dom L = D × Rᵐ × Rᵖ. We refer to λ_i as the Lagrange multiplier associated with the ith inequality constraint f_i(x) ≤ 0; similarly, we refer to ν_i as the Lagrange multiplier associated with the ith equality constraint h_i(x) = 0. The vectors λ and ν are called the dual variables or Lagrange multiplier vectors associated with problem (1).
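To make the definition concrete, here is a minimal numerical sketch. The problem data are made up for illustration (they are not from the note): a toy instance of (1) with one inequality constraint and no equality constraints.

```python
# Toy instance of (1): minimize f0(x) = x^2 subject to f1(x) = 1 - x <= 0.
# (Made-up example; no equality constraints, so there is no nu term.)

def f0(x):
    return x * x                 # objective

def f1(x):
    return 1.0 - x               # inequality constraint: f1(x) <= 0 means x >= 1

def lagrangian(x, lam):
    # L(x, lam) = f0(x) + lam * f1(x)
    return f0(x) + lam * f1(x)

# At a feasible point (x = 2, so f1(x) = -1 <= 0) and any lam >= 0,
# the penalty term is nonpositive, so L(x, lam) <= f0(x):
print(lagrangian(2.0, 0.5), f0(2.0))   # 3.5 4.0
```

Note how the Lagrangian rewards constraint margin: at feasible points the weighted penalty can only pull the value below the objective, which is exactly the mechanism behind the lower-bound property derived next.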

We define the Lagrange dual function (or just dual function) g : Rᵐ × Rᵖ → R as the minimum value of the Lagrangian over x: for λ ∈ Rᵐ, ν ∈ Rᵖ,

    g(λ, ν) = min_{x ∈ D} L(x, λ, ν) = min_{x ∈ D} ( f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) ).

When the Lagrangian is unbounded below in x, the dual function takes on the value −∞. Since the dual function is the pointwise minimum of a family of affine functions of (λ, ν), it is always concave, and hence we can always find its maximum.

Lower bounds on optimal value

The dual function yields lower bounds on the optimal value p* of problem (1): for any λ ≥ 0 and any ν we have

    g(λ, ν) ≤ p*.                                            (3)

This important property is easily verified. Suppose x̃ is a feasible point for problem (1), i.e., f_i(x̃) ≤ 0 and h_i(x̃) = 0, and λ ≥ 0. Then

    Σ_{i=1}^m λ_i f_i(x̃) + Σ_{i=1}^p ν_i h_i(x̃) ≤ 0,

since each term in the first sum is nonpositive and each term in the second sum is zero, and therefore

    L(x̃, λ, ν) = f_0(x̃) + Σ_{i=1}^m λ_i f_i(x̃) + Σ_{i=1}^p ν_i h_i(x̃) ≤ f_0(x̃).

Hence

    g(λ, ν) = min_{x ∈ D} L(x, λ, ν) ≤ L(x̃, λ, ν) ≤ f_0(x̃).

Since g(λ, ν) ≤ f_0(x̃) holds for every feasible point x̃, the inequality (3) follows. The lower bound (3) is illustrated in figure 1 for a simple problem with x ∈ R and one inequality constraint.

The inequality (3) holds, but is vacuous, when g(λ, ν) = −∞. The dual function gives a nontrivial lower bound on p* only when λ ≥ 0 and (λ, ν) ∈ dom g, i.e., g(λ, ν) > −∞. We refer to a pair (λ, ν) with λ ≥ 0 and (λ, ν) ∈ dom g as dual feasible.

Linear approximation interpretation

The Lagrangian and the lower bound property can be given a simple interpretation, based on a linear approximation of the indicator functions of the sets {0} and the nonpositive reals. We first rewrite the original problem (1) as the unconstrained problem

    minimize  f_0(x) + Σ_{i=1}^m I₋(f_i(x)) + Σ_{i=1}^p I₀(h_i(x)),    (4)

where I₋ : R → R is the indicator function for the nonpositive reals,

    I₋(u) = 0 if u ≤ 0,  ∞ if u > 0,

Figure 1: Lower bound from a dual feasible point. The solid curve shows the objective function f_0, and the dashed curve shows the constraint function f_1. The feasible set is the interval [−0.46, 0.46], indicated by the two dotted vertical lines. The optimal point and value are x* = −0.46, p* = 1.54 (shown as a circle). The dotted curves show L(x, λ) for λ = 0.1, 0.2, ..., 1.0. Each of these has a minimum value smaller than p*, since on the feasible set (and for λ ≥ 0) we have L(x, λ) ≤ f_0(x).

and similarly, I₀ is the indicator function of {0}. In formulation (4), the function I₋(u) can be interpreted as expressing our irritation or displeasure with a constraint function value u = f_i(x): it is zero if f_i(x) ≤ 0, and infinite if f_i(x) > 0. In a similar way, I₀(u) gives our displeasure for an equality constraint value u = h_i(x). We can think of I₋ as a "brick wall" or infinitely hard displeasure function: our displeasure jumps from zero to infinite as f_i(x) transitions from nonpositive to positive.

Now suppose that in formulation (4) we replace the function I₋(u) with the linear function λ_i u, where λ_i ≥ 0, and the function I₀(u) with ν_i u. The objective becomes the Lagrangian L(x, λ, ν), and the dual function value g(λ, ν) is the optimal value of the problem

    minimize  L(x, λ, ν) = f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x).    (5)

In this formulation we use a linear or "soft" displeasure function in place of I₋ and I₀. For an inequality constraint, our displeasure is zero when f_i(x) = 0 and positive when f_i(x) > 0 (assuming λ_i > 0); it grows as the constraint becomes more violated. Unlike the original formulation, in which any nonpositive value of f_i(x) is acceptable, in the soft formulation we actually derive pleasure from constraints that have margin, i.e., from f_i(x) < 0.

Clearly, approximating the indicator function I₋(u) by the linear function λ_i u is rather poor. But the linear function is at least an underestimator of the indicator function: since λ_i u ≤ I₋(u) and ν_i u ≤ I₀(u) for all u, we see immediately that the dual function yields a lower bound on the optimal value of the original problem.

The Lagrange dual problem

For each pair (λ, ν) with λ ≥ 0, the Lagrange dual function gives us a lower bound on the optimal value p* of the optimization problem (1). Thus we have a lower bound that depends on the parameters λ and ν.
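As a sketch of such a parametrized lower bound, consider a toy problem chosen here for illustration (it is not from the note): minimize x² subject to 1 − x ≤ 0, whose optimal value is p* = 1, attained at x = 1. Its dual function can be worked out in closed form and checked numerically against (3):

```python
# Toy problem: minimize x^2 subject to 1 - x <= 0, so p* = 1 (attained at x = 1).
# L(x, lam) = x^2 + lam*(1 - x) is minimized over x at x = lam/2, which gives
# g(lam) = lam - lam^2 / 4.

def g(lam):
    x = lam / 2.0                        # unconstrained minimizer of L(., lam)
    return x * x + lam * (1.0 - x)       # equals lam - lam**2 / 4

p_star = 1.0

# Every lam >= 0 yields a lower bound on p*, as in (3):
print(all(g(0.05 * k) <= p_star + 1e-12 for k in range(200)))   # True
```

Different choices of λ give bounds of different quality (g(0) = 0 is weak, g(2) = 1 is tight), which motivates asking for the best one.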
A natural question is: what is the best lower bound that can be obtained from the Lagrange dual function? This leads to the optimization problem

    maximize   g(λ, ν)
    subject to λ ≥ 0.                                        (6)

This problem is called the Lagrange dual problem associated with problem (1); in this context the original problem (1) is sometimes called the primal problem. The term dual feasible, describing a pair (λ, ν) with λ ≥ 0 and g(λ, ν) > −∞, now makes sense: it means, as the name implies, that (λ, ν) is feasible for the dual problem (6). We refer to (λ*, ν*) as dual optimal or optimal Lagrange multipliers if they are optimal for problem (6).

The Lagrange dual problem (6) is always a convex optimization problem, since the objective to be maximized is concave and the constraint is convex. Therefore, we can always solve this problem, whether or not the primal problem is convex.
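Because g is concave, solving (6) is easy in practice. The following sketch solves the dual of the same made-up toy primal, minimize x² subject to 1 − x ≤ 0 (with p* = 1), by a crude grid search over λ ≥ 0:

```python
# Dual problem (6) for the toy primal: minimize x^2 s.t. 1 - x <= 0.
# The dual function is g(lam) = lam - lam^2/4 (obtained by minimizing the
# Lagrangian over x by hand), and it is concave in lam.

def g(lam):
    return lam - lam * lam / 4.0

grid = [0.01 * k for k in range(1001)]    # lam in [0, 10]
lam_star = max(grid, key=g)               # crude grid search over lam >= 0
d_star = g(lam_star)

# Best lower bound: d* = g(2) = 1, which matches p* = 1 for this convex problem.
print(lam_star, d_star)   # 2.0 1.0
```

A grid search is of course only viable for this one-dimensional sketch; the point is that concavity of g makes the dual problem tractable regardless of how hard the primal is.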

Weak duality

The optimal value of the Lagrange dual problem, which we denote d*, is by definition the best lower bound on p* that can be obtained from the Lagrange dual function. In particular, we have the simple but important inequality

    d* ≤ p*,                                                 (7)

which holds for any optimization problem. This property is called weak duality.

The weak duality inequality (7) holds even when d* and p* are infinite. For example, if the primal problem is unbounded below, so that p* = −∞, we must have d* = −∞, i.e., the Lagrange dual problem is infeasible. Conversely, if the dual problem is unbounded above, so that d* = ∞, we must have p* = ∞, i.e., the primal problem is infeasible.

We refer to the difference p* − d* as the optimal duality gap of the original problem, since it gives the gap between the optimal value of the primal problem and the best (i.e., greatest) lower bound on it that can be obtained from the Lagrange dual function. The optimal duality gap is always nonnegative.

The bound (7) can sometimes be used to find a lower bound on the optimal value of a problem that is difficult to solve, since the dual problem is always convex and can be solved efficiently to find d*.

Strong duality

If the equality

    d* = p*                                                  (8)

holds, i.e., the optimal duality gap is zero, then we say that strong duality holds. This means that the best bound obtainable from the Lagrange dual function is tight.

Strong duality does not hold in general. But if the primal problem (1) is convex, i.e., of the form

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, ..., m                   (9)
                Cx = d,

with f_0, ..., f_m convex functions, we usually (but not always) have strong duality. There are many results that establish conditions on the problem, beyond convexity, under which strong duality holds; these conditions are called constraint qualifications. The optimization problems we will be solving in EE263 are always convex, and since we do not work with inequality constraints in this course, we need not worry about constraint qualifications.
In other words, strong duality always holds in EE263, except when the constraint Cx = d cannot be satisfied by any x ∈ D, in which case the problem is infeasible and cannot be solved.

Now suppose that the primal and dual optimal values are attained and equal (so, in particular, strong duality holds). Let x* be a primal optimal point and (λ*, ν*) a dual optimal

point. This means that

    f_0(x*) = g(λ*, ν*)
            = inf_x ( f_0(x) + Σ_{i=1}^m λ*_i f_i(x) + Σ_{i=1}^p ν*_i h_i(x) )
            ≤ f_0(x*) + Σ_{i=1}^m λ*_i f_i(x*) + Σ_{i=1}^p ν*_i h_i(x*)
            ≤ f_0(x*).

The first line states that the optimal duality gap is zero, and the second line is the definition of the dual function. The third line follows since the infimum of the Lagrangian over x is less than or equal to its value at x = x*. The last inequality follows from λ*_i ≥ 0 and f_i(x*) ≤ 0 for i = 1, ..., m, and h_i(x*) = 0 for i = 1, ..., p. We conclude that the two inequalities in this chain hold with equality.

We can draw several interesting conclusions from this. For example, since the inequality in the third line is an equality, x* minimizes L(x, λ*, ν*) over x. (The Lagrangian L(x, λ*, ν*) can have other minimizers; x* is simply a minimizer.)

Another important conclusion is that Σ_{i=1}^m λ*_i f_i(x*) = 0. Since each term in this sum is nonpositive, we conclude that

    λ*_i f_i(x*) = 0,  i = 1, ..., m.                        (10)

This condition is known as complementary slackness; it holds for any primal optimal x* and any dual optimal (λ*, ν*) (when strong duality holds). We can express the complementary slackness condition as

    λ*_i > 0  ⟹  f_i(x*) = 0,    or, equivalently,    f_i(x*) < 0  ⟹  λ*_i = 0.

Roughly speaking, this means the ith optimal Lagrange multiplier is zero unless the ith constraint is active at the optimum.

KKT conditions

We now assume that the functions f_0, ..., f_m, h_1, ..., h_p are differentiable (and therefore have open domains), but we make no assumptions yet about convexity. As above, let x* and (λ*, ν*) be any primal and dual optimal points with zero duality gap. Since x* minimizes L(x, λ*, ν*) over x, its gradient must vanish at x*, i.e.,

    ∇f_0(x*) + Σ_{i=1}^m λ*_i ∇f_i(x*) + Σ_{i=1}^p ν*_i ∇h_i(x*) = 0.

Thus we have

    f_i(x*) ≤ 0,       i = 1, ..., m
    h_i(x*) = 0,       i = 1, ..., p
    λ*_i ≥ 0,          i = 1, ..., m                         (11)
    λ*_i f_i(x*) = 0,  i = 1, ..., m
    ∇f_0(x*) + Σ_{i=1}^m λ*_i ∇f_i(x*) + Σ_{i=1}^p ν*_i ∇h_i(x*) = 0,

which are called the Karush-Kuhn-Tucker (KKT) conditions.

To summarize: for any optimization problem with differentiable objective and constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions (11). When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. In other words, if the f_i are convex, the h_i are affine, and x̃, λ̃, ν̃ are any points that satisfy the KKT conditions

    f_i(x̃) ≤ 0,       i = 1, ..., m
    h_i(x̃) = 0,       i = 1, ..., p
    λ̃_i ≥ 0,          i = 1, ..., m
    λ̃_i f_i(x̃) = 0,  i = 1, ..., m
    ∇f_0(x̃) + Σ_{i=1}^m λ̃_i ∇f_i(x̃) + Σ_{i=1}^p ν̃_i ∇h_i(x̃) = 0,

then x̃ and (λ̃, ν̃) are primal and dual optimal, with zero duality gap.

What we did with the method of Lagrange multipliers in class was precisely to form and solve the KKT system. To see this, note that the KKT conditions for our general norm minimization problem (2) are

    ∇_ν L(x̃, ν̃) = Cx̃ − d = 0,
    ∇_x L(x̃, ν̃) = ∇f_0(x̃) + Σ_{i=1}^p ν̃_i ∇h_i(x̃) = 0,

which is exactly the system of equations we obtained from the method of Lagrange multipliers.
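As a closing sketch, the KKT system can be solved directly for a tiny instance of the norm-minimization example (2). All numbers here are made up: A = I (so f_0(x) = ½‖x − b‖²), b = (1, 1), and a single equality constraint cᵀx = d with c = (1, 1), d = 1. Stationarity then reads (x − b) + νc = 0, and together with cᵀx = d it pins down x and ν in closed form:

```python
# KKT system for (2) with A = I, one equality constraint c^T x = d:
#   stationarity: (x - b) + nu * c = 0
#   feasibility:  c^T x = d
# Substituting x = b - nu*c into c^T x = d gives nu = (c^T b - d) / (c^T c).
# (All data made up for illustration.)

b = [1.0, 1.0]
c = [1.0, 1.0]
d = 1.0

ctb = sum(ci * bi for ci, bi in zip(c, b))      # c^T b = 2
ctc = sum(ci * ci for ci in c)                  # c^T c = 2
nu = (ctb - d) / ctc                            # nu = 0.5
x = [bi - nu * ci for bi, ci in zip(b, c)]      # x = [0.5, 0.5]

print(x, nu)                                    # [0.5, 0.5] 0.5
# Verify both KKT equations hold:
print(sum(ci * xi for ci, xi in zip(c, x)) == d)           # True (feasibility)
print([xi - bi + nu * ci for xi, bi, ci in zip(x, b, c)])  # [0.0, 0.0] (stationarity)
```

Geometrically, x is the projection of b onto the constraint set, and ν measures how hard the constraint pulls the solution away from the unconstrained minimizer b.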