Lagrangian Duality for Dummies


David Knowles

November 13, 2010

We want to solve the following optimisation problem:

    min_x  f_0(x)                                 (1)
    such that  f_i(x) ≤ 0,  i = 1, ..., m         (2)

For now we do not need to assume convexity. For simplicity we assume no equality constraints, but all these results extend straightforwardly to that case. An obvious (but foolish) approach would be to achieve this by minimising the following function:

    J(x) = { f_0(x),  if f_i(x) ≤ 0 ∀i            (3)
           { ∞,       otherwise
          = f_0(x) + ∑_i I[f_i(x)]                (4)

where I[u] is an infinite step function:

    I[u] = { 0,  if u ≤ 0                         (5)
           { ∞,  otherwise

This function I gives an infinite penalty to any violated constraint. Now if we were able to minimise J(x) then we would have a way of solving our optimisation problem. Unfortunately, J(x) is a pretty horrible function to optimise, because I[u] is both non-differentiable and discontinuous. What about replacing I[u] with something nicer? A straight line, λu, is certainly easier to handle. This might seem like a pretty dumb choice, but for λ ≥ 0 the penalty is at least in the correct direction (we are penalised for violating constraints), and λu is a lower bound on I[u] (see Figure 1).
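As a concrete illustration, here is a minimal sketch of J(x) for a hypothetical one-dimensional problem (not from the original note): minimise f_0(x) = x² subject to f_1(x) = 1 − x ≤ 0, i.e. x ≥ 1.

```python
import math

def f0(x):
    return x ** 2          # objective

def f1(x):
    return 1.0 - x         # constraint f1(x) <= 0, i.e. x >= 1

def I(u):
    # infinite step function: 0 if the constraint holds, +inf otherwise
    return 0.0 if u <= 0 else math.inf

def J(x):
    # objective plus infinite penalty for any violated constraint
    return f0(x) + I(f1(x))

print(J(2.0))  # constraint satisfied: J(2) = f0(2) = 4.0
print(J(0.5))  # constraint violated:  J(0.5) = inf
```

For any feasible x the penalty term vanishes and J(x) = f_0(x); for any infeasible x, J(x) = ∞.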

Figure 1: The infinite step function I(u) and the linear relaxation λu (here λ = 0.6). For λ ≥ 0, λu is a lower bound on I(u).

If we replace I[u] by λu in the function J(x), we get a function of x and λ known as the Lagrangian:

    L(x, λ) = f_0(x) + ∑_i λ_i f_i(x)    (6)

Note that if we take the maximum of this function with respect to λ ≥ 0, we recover J(x). For a particular value of x, if the constraints are all satisfied (i.e. f_i(x) ≤ 0 ∀i), then the best we can do is to set λ_i = 0 ∀i, so L(x, 0) = f_0(x). If any of the constraints is violated, then f_i(x) > 0 for some i, and we can make L(x, λ) infinite by taking λ_i → ∞. So we have:

    max_{λ ≥ 0} L(x, λ) = J(x)    (7)

That's all well and good, but how does it help us solve our original optimisation problem? Remember that we want to minimise J(x).
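Equation (7) can be checked numerically on the same kind of toy problem (f_0(x) = x², f_1(x) = 1 − x, a hypothetical example): for a feasible x, L(x, λ) is maximised over λ ≥ 0 at λ = 0, giving f_0(x); for an infeasible x, L(x, λ) grows without bound as λ increases.

```python
def f0(x): return x ** 2
def f1(x): return 1.0 - x      # constraint: f1(x) <= 0

def L(x, lam):
    # Lagrangian for a single inequality constraint
    return f0(x) + lam * f1(x)

lams = [0.0, 1.0, 10.0, 100.0, 1000.0]

# Feasible point x = 2 (f1 = -1 < 0): L decreases in lambda, max at lam = 0.
print([L(2.0, lam) for lam in lams])   # 4.0, 3.0, -6.0, ...

# Infeasible point x = 0.5 (f1 = 0.5 > 0): L increases without bound.
print([L(0.5, lam) for lam in lams])   # 0.25, 0.75, 5.25, ...
```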

We now know that minimising J(x) means finding:

    min_x max_{λ ≥ 0} L(x, λ)    (8)

This is a hard problem. But what would happen if we reversed the order of the maximisation over λ and the minimisation over x? Then we would be finding:

    max_{λ ≥ 0} min_x L(x, λ) = max_{λ ≥ 0} g(λ)    (9)

where g(λ) = min_x L(x, λ) is known as the dual function. Maximising the dual function g(λ) is known as the dual problem, in contrast to the original primal problem. Since g(λ) is a pointwise minimum of affine functions (L(x, λ) is affine, i.e. linear, in λ), it is a concave function. The minimisation of L(x, λ) over x might be hard. However, since g(λ) is concave and λ_i ≥ 0 ∀i are linear constraints, maximising g(λ) over λ is a convex optimisation problem, i.e. an easy problem. But how does solving this problem relate to the original one? Recall that λu is a lower bound on I(u). As a result, L(x, λ) is a lower bound on J(x) for all λ ≥ 0. Thus

    L(x, λ) ≤ J(x)  ∀x, ∀λ ≥ 0                       (10)
    g(λ) = min_x L(x, λ) ≤ min_x J(x) =: p*          (11)
    d* := max_{λ ≥ 0} g(λ) ≤ p*                      (12)

where p* and d* are the optima of the primal and dual problems respectively. We see that for any λ ≥ 0 the dual function g(λ) gives a lower bound on the primal optimum. We can then interpret the dual problem of maximising over λ as finding the tightest possible lower bound on p*:

    max_{λ ≥ 0} min_x L(x, λ) ≤ min_x max_{λ ≥ 0} L(x, λ)    (13)

This property is known as weak duality, and in fact it holds for any function L, convex or not. The difference p* − d* is known as the optimal duality gap. Strong duality means that we have equality, i.e. the optimal duality gap is zero. Strong duality holds if our optimisation problem is convex and a strictly feasible point exists, i.e. a point where all inequality constraints are strictly satisfied (this is Slater's condition). In that case solving the primal and dual problems is equivalent: the optimal x* is given by the minimiser of L(x, λ*), where λ* is the maximiser of g(λ).
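For the toy problem minimise x² subject to 1 − x ≤ 0 (a hypothetical example, chosen because its dual has a closed form), the inner minimisation gives x = λ/2, so g(λ) = λ − λ²/4. This sketch checks weak duality (g(λ) ≤ p* = 1 everywhere) and strong duality (equality at λ* = 2; the problem is convex and x = 2 is strictly feasible).

```python
def g(lam):
    # dual function of: minimise x^2 subject to 1 - x <= 0
    # the inner problem min_x x^2 + lam*(1 - x) is solved by x = lam/2
    x = lam / 2.0
    return x ** 2 + lam * (1.0 - x)    # simplifies to lam - lam**2/4

p_star = 1.0                           # primal optimum, attained at x* = 1

# Weak duality: every g(lam) with lam >= 0 is a lower bound on p_star.
assert all(g(lam) <= p_star for lam in [0.0, 0.5, 1.0, 2.0, 5.0])

# Strong duality: the bound is tight at the dual maximiser lam* = 2.
print(g(2.0))   # 1.0, equal to p_star
```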

Duality gives us the option of trying to solve our original (potentially non-convex) constrained optimisation problem in another way. If minimising the Lagrangian over x happens to be easy for our problem, then maximising the resulting dual function over λ is also easy. If strong duality holds we have found an easier approach to our original problem; if not, we still have a lower bound which may be of use. Duality also lets us formulate optimality conditions for constrained optimisation problems.

1 KKT Conditions

For an unconstrained convex optimisation problem we know we are at the global minimum if the gradient is zero. The KKT conditions are the equivalent conditions for the global minimum of a constrained convex optimisation problem. If strong duality holds and (x*, λ*) is optimal, then x* minimises L(x, λ*), giving us the first KKT condition:

    ∇_x L(x*, λ*) = ∇f_0(x*) + ∑_i λ_i* ∇f_i(x*) = 0    (14)

We can interpret this condition as saying that the gradients of the objective function and the constraint functions must be parallel (and opposing). This means that moving along the constraint surface cannot improve the objective function. This concept is illustrated for a simple 2D optimisation problem with one inequality constraint in Figure 2. Also, by definition we have:

    f_0(x*) = g(λ*) = min_x L(x, λ*)
            ≤ f_0(x*) + ∑_i λ_i* f_i(x*)
            ≤ f_0(x*)    (15)

where the last inequality holds because λ_i* f_i(x*) ≤ 0 for each i. Since this chain starts and ends with f_0(x*), both inequalities must hold with equality, so

    ∑_i λ_i* f_i(x*) = 0    (16)

Since λ_i* ≥ 0 and f_i(x*) ≤ 0 for all i, every term in this sum is non-positive, and hence each term must be zero individually. This gives the second KKT condition:

    λ_i* f_i(x*) = 0  ∀i    (17)

This condition is known as complementary slackness: either λ_i* or f_i(x*) must be zero for each i. If constraint i is inactive then λ_i* = 0, and f_i(x*) is strictly negative.
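Continuing the hypothetical toy problem (minimise x² subject to 1 − x ≤ 0, with x* = 1 and λ* = 2), the KKT conditions can be verified directly:

```python
x_star, lam_star = 1.0, 2.0

grad_f0 = 2.0 * x_star      # gradient of x^2 at x*
grad_f1 = -1.0              # gradient of 1 - x
f1 = 1.0 - x_star           # constraint value at x*

# (14) stationarity: grad f0 + lam* * grad f1 = 0
assert grad_f0 + lam_star * grad_f1 == 0.0

# (17) complementary slackness: lam* * f1(x*) = 0
assert lam_star * f1 == 0.0

# remaining KKT conditions: primal and dual feasibility
assert f1 <= 0.0 and lam_star >= 0.0
print("KKT conditions hold")
```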

Figure 2: The first KKT condition, ∇f_0(x*) + λ_1 ∇f_1(x*) = 0. The black contours are the objective function; the red line is the constraint boundary. At the optimal solution the gradients of the objective and constraint must be parallel and opposing, so that no direction along the constraint boundary can give an improved objective value.

We can interpret λ_i* = 0 as meaning that we can disregard constraint i in the first KKT condition, since that constraint is not active. If the constraint is active then f_i(x*) = 0 and λ_i* can take a positive value. Note that this condition means that the lower bound λu on I(u) shown in Figure 1 is tight at the optimum: if λ is zero and u < 0 the two functions match, and if u = 0 they also match. The remaining KKT conditions are simply the primal constraints (f_i(x*) ≤ 0) and the dual constraints (λ_i* ≥ 0).
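The inactive-constraint case can be illustrated with a small variation of the toy problem (hypothetical: minimise (x − 2)² subject to 1 − x ≤ 0). The unconstrained minimum x* = 2 is already feasible, so the constraint is slack, λ* = 0, and the constraint drops out of the stationarity condition.

```python
x_star, lam_star = 2.0, 0.0     # unconstrained optimum is already feasible

f1 = 1.0 - x_star               # f1(x*) = -1 < 0: constraint inactive
grad_f0 = 2.0 * (x_star - 2.0)  # gradient of (x - 2)^2, zero at x* = 2
grad_f1 = -1.0

# Complementary slackness: a slack constraint forces lam* = 0 ...
assert lam_star * f1 == 0.0 and f1 < 0.0

# ... so it contributes nothing to the stationarity condition (14).
assert grad_f0 + lam_star * grad_f1 == 0.0
print("inactive constraint: lam* = 0, f1(x*) =", f1)
```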