Lagrange duality


Another way to arrive at the KKT conditions, and one which gives us some insight into solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization program in (λ, ν); it is always concave (even when the original program is not convex), and it gives us a systematic way to lower bound the optimal value.

The Lagrangian

We consider an optimization program of the form

    minimize_{x ∈ R^N}  f_0(x)
    subject to  f_m(x) ≤ 0,  m = 1, ..., M,        (1)
                h_p(x) = 0,  p = 1, ..., P.

Much of what we say below applies equally well to nonconvex programs as to convex programs, so we will make it clear when we are taking the f_m to be convex and the h_p to be affine. We take the domain of all of the f_m and h_p to be all of R^N; this just simplifies the exposition, and we can easily replace R^N with the intersection of the dom f_m and dom h_p. We assume that the feasible set

    C = { x : f_m(x) ≤ 0, m = 1, ..., M,  h_p(x) = 0, p = 1, ..., P }

is a non-empty subset of R^N.

The Lagrangian takes the constraints in the program above and integrates them into the objective function. The Lagrangian L : R^N × R^M × R^P → R associated with this optimization program is

    L(x, λ, ν) = f_0(x) + Σ_{m=1}^M λ_m f_m(x) + Σ_{p=1}^P ν_p h_p(x).

The x above are referred to as primal variables, and the λ, ν as either dual variables or Lagrange multipliers.

The Lagrange dual function g : R^M × R^P → R is the infimum of the Lagrangian over all values of x:

    g(λ, ν) = inf_{x ∈ R^N} ( f_0(x) + Σ_{m=1}^M λ_m f_m(x) + Σ_{p=1}^P ν_p h_p(x) ).

Since g is a pointwise infimum of a family of affine functions of (λ, ν), it is concave regardless of whether or not the f_m, h_p are convex.

The key fact about the dual function is that it is everywhere a lower bound on the optimal value of the original program: if p* is the optimal value of (1), then

    g(λ, ν) ≤ p*,  for all λ ≥ 0, ν ∈ R^P.
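As a quick sanity check, consider a toy instance (a hypothetical example, not one from the notes): minimize x² subject to 1 − x ≤ 0. The Lagrangian L(x, λ) = x² + λ(1 − x) is minimized over x at x = λ/2, giving the closed form g(λ) = λ − λ²/4. The sketch below verifies concavity of g and the lower-bound property against p* = 1.

```python
# Toy problem (hypothetical example): minimize x**2 subject to 1 - x <= 0.
# Primal optimum: x* = 1, so p* = 1.
# Lagrangian: L(x, lam) = x**2 + lam*(1 - x), minimized over x at x = lam/2,
# so the dual function has the closed form g(lam) = lam - lam**2 / 4.

def g(lam):
    """Lagrange dual function of the toy problem (closed form)."""
    return lam - lam**2 / 4.0

p_star = 1.0

# g lower-bounds p* for every lam >= 0 ...
lams = [0.1 * k for k in range(100)]
assert all(g(lam) <= p_star + 1e-12 for lam in lams)

# ... and is concave: each midpoint value dominates the endpoint average.
for a, b in zip(lams, lams[1:]):
    assert g((a + b) / 2) >= (g(a) + g(b)) / 2 - 1e-12

# The bound is tight at lam = 2: g(2) = 1 = p*.
print(g(2.0))  # -> 1.0
```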

This is (almost too) easy to see. For any feasible point x_0 and any λ ≥ 0,

    Σ_{m=1}^M λ_m f_m(x_0) + Σ_{p=1}^P ν_p h_p(x_0) ≤ 0,

since each f_m(x_0) ≤ 0, λ_m ≥ 0, and h_p(x_0) = 0. Hence

    L(x_0, λ, ν) ≤ f_0(x_0),  for all λ ≥ 0, ν ∈ R^P,

meaning

    g(λ, ν) = inf_{x ∈ R^N} L(x, λ, ν) ≤ L(x_0, λ, ν) ≤ f_0(x_0).

Since this holds for all feasible x_0,

    g(λ, ν) ≤ inf_{x ∈ C} f_0(x) = p*.

The (Lagrange) dual to the optimization program (1) is

    maximize_{λ ∈ R^M, ν ∈ R^P}  g(λ, ν)  subject to  λ ≥ 0.        (2)

The dual optimal value d* is

    d* = sup_{λ ≥ 0, ν} g(λ, ν) = sup_{λ ≥ 0, ν} inf_{x ∈ R^N} L(x, λ, ν).

Since g(λ, ν) ≤ p* everywhere, we know that d* ≤ p*. The quantity p* − d* is called the duality gap. If p* = d*, then we say that (1) and (2) exhibit strong duality.
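To see the bound d* ≤ p* play out numerically, here is a worked instance (a hypothetical example chosen for its simple closed-form dual): minimize x² + 1 subject to (x − 2)(x − 4) ≤ 0. The feasible set is [2, 4], so p* = 5, attained at x = 2.

```python
# Hypothetical worked instance: minimize x**2 + 1 subject to (x-2)*(x-4) <= 0.
# Feasible set is [2, 4], so p* = 2**2 + 1 = 5 (attained at x = 2).
# L(x, lam) = (1 + lam)*x**2 - 6*lam*x + 1 + 8*lam, minimized over x
# at x = 3*lam/(1 + lam), which gives the closed-form dual function:

def g(lam):
    """Dual function of the instance above, valid for lam >= 0."""
    return 1 + 8*lam - 9*lam**2 / (1 + lam)

p_star = 5.0

# Weak duality: every dual value is a lower bound on p*.
assert all(g(0.01 * k) <= p_star + 1e-9 for k in range(1000))

# Strong duality holds here: the bound is attained at lam = 2.
print(g(2.0))  # -> 5.0
```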

Certificates of (sub)optimality

Any dual feasible¹ (λ, ν) gives us a lower bound on p*, since g(λ, ν) ≤ p*. If we also have a primal feasible x, then we know that

    f_0(x) ≥ p* ≥ g(λ, ν).

We will refer to f_0(x) − g(λ, ν) as the duality gap for the primal/dual (feasible) pair x, (λ, ν). We know that

    p* ∈ [g(λ, ν), f_0(x)],  and likewise  d* ∈ [g(λ, ν), f_0(x)].

If we are ever able to reduce this gap to zero, then we know that x is primal optimal and (λ, ν) is dual optimal.

Certain kinds of primal-dual algorithms produce a sequence of (feasible) points x^(k), λ^(k), ν^(k) at every iteration. We can then use

    f_0(x^(k)) − g(λ^(k), ν^(k)) ≤ ε

as a stopping criterion, and know that our answer yields an objective value no further than ε from optimal.

Strong duality and the KKT conditions

Suppose that for a convex program, the primal optimal value p* and the dual optimal value d* are equal:

    p* = d*.

¹ We simply need λ ≥ 0 for (λ, ν) to be dual feasible.
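The stopping criterion can be sketched as follows, again on the hypothetical toy instance minimize x² subject to 1 − x ≤ 0 (where g(λ) = λ − λ²/4 and p* = 1); the iterate values are made up for illustration.

```python
# Hypothetical toy instance: minimize x**2 subject to 1 - x <= 0.
# Dual function: g(lam) = lam - lam**2/4; optimum p* = 1 at x* = 1, lam* = 2.

def f0(x):
    return x**2

def g(lam):
    return lam - lam**2 / 4.0

# Suppose an algorithm has produced the feasible primal/dual pair below.
x_k, lam_k = 1.05, 1.9          # x_k >= 1 and lam_k >= 0, so both are feasible
gap = f0(x_k) - g(lam_k)        # certified suboptimality of x_k

# p* is trapped in [g(lam_k), f0(x_k)], so x_k is within `gap` of optimal.
assert g(lam_k) <= 1.0 <= f0(x_k)

eps = 0.2
if gap <= eps:
    print("stop: x_k certified within", eps, "of optimal")
```

Note that the certificate requires no knowledge of p* itself: the dual value alone traps the optimal value from below.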

If x* is a primal optimal point and (λ*, ν*) is a dual optimal point, then we must have

    f_0(x*) = g(λ*, ν*)
            = inf_{x ∈ R^N} ( f_0(x) + Σ_m λ*_m f_m(x) + Σ_p ν*_p h_p(x) )
            ≤ f_0(x*) + Σ_m λ*_m f_m(x*) + Σ_p ν*_p h_p(x*)
            ≤ f_0(x*).

The last inequality follows from the facts that λ*_m ≥ 0 (dual feasibility), f_m(x*) ≤ 0, and h_p(x*) = 0 (primal feasibility). Since we started out and ended up with the same thing, everything above must hold with equality, and so

    λ*_m f_m(x*) = 0,  m = 1, ..., M.

Also, since x* is a minimizer of L(x, λ*, ν*) (by the second equality above), which is an unconstrained convex function of x (with λ*, ν* fixed), its gradient with respect to x must be zero there:

    ∇_x L(x*, λ*, ν*) = ∇f_0(x*) + Σ_m λ*_m ∇f_m(x*) + Σ_p ν*_p ∇h_p(x*) = 0.

Thus strong duality immediately leads to the KKT conditions holding at the solution. Conversely, if you can find x*, λ*, ν* that obey the KKT conditions, not only do you know that you have a primal optimal point on your hands, but strong duality also holds (and λ*, ν* are dual optimal). For if KKT holds, then

    ∇_x L(x*, λ*, ν*) = 0,
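The four conditions can be checked mechanically. For the hypothetical toy problem minimize x² subject to 1 − x ≤ 0, the candidate pair x* = 1, λ* = 2 satisfies all of them:

```python
# Checking the KKT conditions for the hypothetical toy problem
# minimize x**2 subject to 1 - x <= 0, at x* = 1 with lam* = 2.

x_star, lam_star = 1.0, 2.0

# Primal feasibility: f_1(x*) = 1 - x* <= 0.
assert 1 - x_star <= 0
# Dual feasibility: lam* >= 0.
assert lam_star >= 0
# Complementary slackness: lam* * f_1(x*) = 0.
assert lam_star * (1 - x_star) == 0
# Stationarity: d/dx [x**2 + lam*(1 - x)] = 2*x* - lam* = 0.
assert 2 * x_star - lam_star == 0

print("KKT conditions hold")
```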

meaning that x* is a minimizer of L(x, λ*, ν*), i.e.

    L(x*, λ*, ν*) ≤ L(x, λ*, ν*)  for all x,

and thus

    g(λ*, ν*) = L(x*, λ*, ν*)
              = f_0(x*) + Σ_m λ*_m f_m(x*) + Σ_p ν*_p h_p(x*)
              = f_0(x*)  (by complementary slackness),

and we have strong duality. The upshot of this is that the conditions for strong duality are essentially the same as those under which the KKT conditions are necessary. In particular, the program (1) and its dual (2) exhibit strong duality if (1) is convex and either all of the inequality constraints f_m are affine, or there is a feasible x ∈ R^N such that f_i(x) < 0 for all the f_i which are not affine (Slater's condition).
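Verifying Slater's condition amounts to exhibiting one strictly feasible point. For the hypothetical instance minimize x² + 1 subject to (x − 2)(x − 4) ≤ 0 (whose constraint is not affine), any interior point of [2, 4] works:

```python
# Slater's condition for the hypothetical instance
# minimize x**2 + 1 subject to (x - 2)*(x - 4) <= 0.
# The constraint is quadratic (not affine), so we need strict feasibility.

def f1(x):
    return (x - 2) * (x - 4)

x_slater = 3.0
# Strictly feasible point found => Slater holds => strong duality.
assert f1(x_slater) < 0
print(f1(x_slater))  # -> -1.0
```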

Examples

1. Inequality-constrained LP. Calculate the dual of

    minimize_{x ∈ R^N}  ⟨x, c⟩  subject to  Ax ≤ b.

Answer: The Lagrangian is

    L(x, λ) = ⟨x, c⟩ + Σ_{m=1}^M λ_m (⟨x, a_m⟩ − b_m) = c^T x − λ^T b + λ^T A x,

where a_m is the m-th row of A. This is a linear functional of x, so it is unbounded below unless c + A^T λ = 0. Thus

    g(λ) = inf_x ( c^T x − λ^T b + λ^T A x )
         = { −⟨λ, b⟩,  if c + A^T λ = 0;
             −∞,       otherwise.

So the Lagrange dual program is

    maximize_{λ ∈ R^M}  −⟨λ, b⟩  subject to  A^T λ = −c,  λ ≥ 0.
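A one-dimensional sanity check (hypothetical numbers, not from the notes): take c = −1, A = [1], b = [2], i.e. minimize −x subject to x ≤ 2, whose optimal value is p* = −2.

```python
# Tiny inequality-form LP (hypothetical numbers): minimize -x subject to x <= 2.
# Here c = -1, A = [1], b = [2]; p* = -2, attained at x = 2.
c, b = -1.0, 2.0

# Dual feasibility requires A^T lam = -c and lam >= 0, so lam = 1 here.
lam = 1.0
assert lam >= 0 and 1.0 * lam == -c

g_lam = -b * lam                 # dual objective -<lam, b>

# Weak duality: g(lam) <= c*x for every primal-feasible x (i.e. x <= 2).
assert all(g_lam <= c * x for x in [2.0, 1.5, 0.0, -3.0])

# This LP exhibits strong duality: g(lam) equals p*.
print(g_lam)  # -> -2.0
```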

2. Standard-form LP. Calculate the dual of

    minimize_{x ∈ R^N}  ⟨x, c⟩  subject to  Ax = b,  x ≥ 0.

Least squares. Calculate the dual of

    minimize_{x ∈ R^N}  ‖x‖₂²  subject to  Ax = b.

Check that the duality gap is zero.

Answer:

    maximize_{ν ∈ R^M}  −(1/4) ν^T A A^T ν − b^T ν.
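The zero gap can be checked by hand on a tiny instance (hypothetical numbers): A = [[1, 1]], b = [2], i.e. minimize x₁² + x₂² subject to x₁ + x₂ = 2. By symmetry x* = (1, 1) and p* = 2; here A Aᵀ = [[2]], so the dual objective is −ν²/2 − 2ν, maximized at ν* = −2.

```python
# Hypothetical instance: minimize x1**2 + x2**2 subject to x1 + x2 = 2.
# A = [[1, 1]], b = [2]; by symmetry x* = (1, 1), so p* = 2.
p_star = 1.0**2 + 1.0**2

# Dual: maximize -(1/4) nu^T A A^T nu - b^T nu.  Here A A^T = [[2]], so:
def g(nu):
    return -0.25 * 2 * nu**2 - 2 * nu

# g is a downward parabola with maximizer nu* = -2.
nu_star = -2.0
assert all(g(nu_star) >= g(nu_star + d) for d in (-0.5, -0.1, 0.1, 0.5))

# Zero duality gap: d* = g(nu*) = 2 = p*.
print(g(nu_star), p_star)  # -> 2.0 2.0
```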

3. Minimum norm. Calculate the dual of

    minimize_{x ∈ R^N}  ‖x‖  subject to  Ax = b,

where ‖·‖ is a general valid norm.

Answer: Write f_0(x) = ‖x‖ to ease notation below. We start with the Lagrangian:

    L(x, ν) = f_0(x) + Σ_p ν_p (⟨x, a_p⟩ − b_p) = f_0(x) − ⟨ν, b⟩ + (A^T ν)^T x,

where a_p is the p-th row of A, and so

    g(ν) = −⟨ν, b⟩ + inf_x ( f_0(x) + (A^T ν)^T x )
         = −⟨ν, b⟩ − sup_x ( (−A^T ν)^T x − f_0(x) )
         = −⟨ν, b⟩ − f_0*(−A^T ν),

where f_0* is the Fenchel conjugate of f_0:

    f_0*(y) = sup_x ( ⟨x, y⟩ − f_0(x) ).

With f_0 = ‖·‖, we know already that

    f_0*(y) = { 0,   if ‖y‖* ≤ 1;
                +∞,  otherwise,

where ‖·‖* is the dual norm of ‖·‖, and so

    g(ν) = { −⟨ν, b⟩,  if ‖A^T ν‖* ≤ 1;
             −∞,       otherwise.

Thus the dual program is

    maximize_{ν ∈ R^P}  −⟨ν, b⟩  subject to  ‖A^T ν‖* ≤ 1,

where ‖·‖* is the dual norm of ‖·‖.
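A concrete check with the ℓ₁ norm (a hypothetical instance; signs follow the convention L(x, ν) = f_0(x) + νᵀ(Ax − b), so the dual objective is −⟨ν, b⟩): minimize |x₁| + |x₂| subject to x₁ + x₂ = 2. The dual norm of ℓ₁ is ℓ∞, so with A = [[1, 1]], b = [2] the dual constraint ‖Aᵀν‖∞ ≤ 1 reduces to |ν| ≤ 1.

```python
# Hypothetical instance with the l1 norm: minimize |x1| + |x2|
# subject to x1 + x2 = 2.  Any feasible x has |x1| + |x2| >= 2, so p* = 2.
p_star = 2.0

# Dual: maximize -<nu, b> subject to ||A^T nu||_inf <= 1 (dual norm of l1).
# Here A = [[1, 1]], b = [2]: A^T nu = (nu, nu), so the constraint is |nu| <= 1.
best = max(-2.0 * (0.01 * k) for k in range(-100, 101))  # nu on a grid in [-1, 1]
print(best)  # maximum attained at nu = -1

# Zero duality gap for this instance: d* = p* = 2.
assert abs(best - p_star) < 1e-9
```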