Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Instructor: Moritz Hardt (hardt+ee227c@berkeley.edu)
Graduate Instructor: Max Simchowitz (msimchow+ee227c@berkeley.edu)

October 15, 2018

13 Duality theory

These notes are based on earlier lecture notes by Benjamin Recht and Ashia Wilson.

13.1 Optimality conditions for equality constrained optimization

Recall that $x^*$ minimizes a smooth, convex function $f$ over a closed convex set $\Omega$ if and only if

$\langle \nabla f(x^*),\, x - x^* \rangle \geq 0 \quad \forall x \in \Omega.$ (1)

Let's specialize this to the case where $\Omega$ is an affine set. Let $A$ be an $n \times d$ matrix with rank $n$ such that $\Omega = \{x : Ax = b\}$ for some $b \in \mathbb{R}^n$. Note that we can always assume that $\mathrm{rank}(A) = n$, or else we would have redundant constraints. We could also parameterize $\Omega$ as $\Omega = \{x_0 + v : Av = 0\}$ for any $x_0 \in \Omega$. Then, using (1), we have

$\langle \nabla f(x^*),\, x - x^* \rangle \geq 0 \quad \forall x \in \Omega$

if and only if $\langle \nabla f(x^*), u \rangle \geq 0$ for all $u \in \ker(A)$. But since $\ker(A)$ is a subspace, this can hold if and only if $\langle \nabla f(x^*), u \rangle = 0$ for all $u \in \ker(A)$. In particular, this means $\nabla f(x^*)$ must lie in $\ker(A)^\perp$. Since $\mathbb{R}^d = \ker(A) \oplus \mathrm{Im}(A^T)$, this means that $\nabla f(x^*) = A^T \lambda$ for some $\lambda \in \mathbb{R}^n$.

To summarize, $x^*$ is optimal for $f$ over $\Omega$ if and only if there exists $\lambda \in \mathbb{R}^n$ such that

$\nabla f(x^*) + A^T \lambda = 0, \qquad Ax^* = b.$

These optimality conditions are known as the Karush-Kuhn-Tucker conditions, or KKT conditions. As an example, consider the equality constrained quadratic optimization problem

minimize $\frac{1}{2} x^T Q x + c^T x$
subject to $Ax = b$.

The KKT conditions can be expressed in matrix form as

$\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -c \\ b \end{bmatrix}.$
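As a quick numerical sanity check of the KKT system above, here is a minimal sketch (not part of the original notes); the data $Q, c, A, b$ are arbitrary illustrative values with $Q$ positive definite and $A$ full row rank, and we simply solve the block linear system and verify stationarity and feasibility:

import numpy as np

# Toy equality constrained QP: minimize (1/2) x^T Q x + c^T x  subject to  Ax = b.
# Q, c, A, b below are made-up illustrative data (Q positive definite, A full row rank).
rng = np.random.default_rng(0)
d, n = 5, 2
M = rng.standard_normal((d, d))
Q = M @ M.T + np.eye(d)          # positive definite
c = rng.standard_normal(d)
A = rng.standard_normal((n, d))  # rank n with probability 1
b = rng.standard_normal(n)

# Solve the KKT system  [Q A^T; A 0] [x; lambda] = [-c; b].
K = np.block([[Q, A.T], [A, np.zeros((n, n))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x_star, lam_star = sol[:d], sol[d:]

# Check the optimality conditions: stationarity and primal feasibility.
print(np.linalg.norm(Q @ x_star + c + A.T @ lam_star))  # ~0
print(np.linalg.norm(A @ x_star - b))                    # ~0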

13.2 Nonlinear constraints

Let $\Omega$ be a closed convex set. Let's define the tangent cone of $\Omega$ at $x$ as

$T_\Omega(x) = \mathrm{cone}\{z - x : z \in \Omega\}.$

The tangent cone is the set of all directions in which we can move from $x$ and remain in $\Omega$. We can also define the normal cone of $\Omega$ at $x$ as the polar of the tangent cone,

$N_\Omega(x) = T_\Omega(x)^\circ = \{u : \langle u, v \rangle \leq 0 \ \forall v \in T_\Omega(x)\}.$

Suppose we want to minimize a continuously differentiable function $f$ over the intersection of a closed convex set $\Omega$ and an affine set $\mathcal{A} = \{x : Ax = b\}$:

minimize $f(x)$
subject to $x \in \Omega$, $Ax = b$, (2)

where $A$ is again a full rank $n \times d$ matrix. In this section, we will generalize (1) to show

Proposition 13.1. $x^*$ is optimal for (2) if and only if $x^* \in \Omega \cap \mathcal{A}$ and there exists $\lambda \in \mathbb{R}^n$ such that

$-[\nabla f(x^*) + A^T \lambda] \in N_\Omega(x^*).$

The key to our analysis here will be to rely on convex analytic arguments. Note that when there is no equality constraint $Ax = b$, our constrained optimality condition is completely equivalent to the assertion

$-\nabla f(x^*) \in N_\Omega(x^*).$ (3)
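To make condition (3) concrete before turning to the proof of Proposition 13.1, here is a small numerical sketch (not from the original notes) for the box $\Omega = [0,1]^d$, whose normal cone has an explicit description: $u \in N_\Omega(x)$ iff $u_i \leq 0$ where $x_i = 0$, $u_i \geq 0$ where $x_i = 1$, and $u_i = 0$ at interior coordinates. The data $z$ is made up for illustration.

import numpy as np

# Minimize f(x) = 0.5 * ||x - z||^2 over the box Omega = [0, 1]^d and check
# condition (3): -grad f(x*) must lie in the normal cone N_Omega(x*).
# The minimizer is simply the projection of z onto the box.
z = np.array([1.7, -0.3, 0.4])
x_star = np.clip(z, 0.0, 1.0)   # projection onto the box
u = -(x_star - z)               # -grad f(x*) = -(x* - z)

# Membership test for u in N_Omega(x*) for the box [0,1]^d.
tol = 1e-12
ok = all(
    (x_star[i] <= tol and u[i] <= tol)
    or (x_star[i] >= 1 - tol and u[i] >= -tol)
    or (tol < x_star[i] < 1 - tol and abs(u[i]) <= tol)
    for i in range(len(z))
)
print(ok)  # True: the projection satisfies -grad f(x*) in N_Omega(x*)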

[Figure 1: the black set is $C$, the red set is $T_C(x)$, and the blue set is $N_C(x)$.]

Thus, to prove Proposition 13.1, it will suffice for us to understand the normal cone of the set $\Omega \cap \mathcal{A}$ at the point $x^*$. To obtain a reasonable characterization, we begin by proving a general fact.

Proposition 13.2. Let $\Omega \subseteq \mathbb{R}^d$ be a closed convex set. Let $\mathcal{A}$ denote the affine set $\{x : Ax = b\}$ for some $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$. Suppose that the set $\mathrm{ri}(\Omega) \cap \mathcal{A}$ is non-empty. Then for any $x \in \Omega \cap \mathcal{A}$,

$N_{\Omega \cap \mathcal{A}}(x) = N_\Omega(x) + \{A^T \lambda : \lambda \in \mathbb{R}^n\}.$

Proof. The inclusion $N_\Omega(x) + \{A^T \lambda : \lambda \in \mathbb{R}^n\} \subseteq N_{\Omega \cap \mathcal{A}}(x)$ is straightforward. To see this, suppose $z \in \Omega \cap \mathcal{A}$ and note that $z - x \in \ker(A)$, so that $(z - x)^T A^T \lambda = \lambda^T A(z - x) = 0$ for all $\lambda \in \mathbb{R}^n$. If $u \in N_\Omega(x)$, then $(z - x)^T u \leq 0$, so for any $\lambda \in \mathbb{R}^n$ we have

$\langle z - x,\, u + A^T \lambda \rangle = \langle z - x,\, u \rangle \leq 0,$

implying that $u + A^T \lambda \in N_{\Omega \cap \mathcal{A}}(x)$.

For the reverse inclusion, let $v \in N_{\Omega \cap \mathcal{A}}(x)$. Then we have

$v^T (z - x) \leq 0 \quad \text{for all } z \in \Omega \cap \mathcal{A}.$

Now define the sets

$C_1 = \{(y, \mu) \in \mathbb{R}^{d+1} : y = z - x,\ z \in \Omega,\ \mu \leq v^T y\},$
$C_2 = \{(y, \mu) \in \mathbb{R}^{d+1} : y \in \ker(A),\ \mu = 0\}.$

Note that $\mathrm{ri}(C_1) \cap C_2 = \emptyset$, because otherwise there would exist a $(\hat{y}, \mu)$ such that

$v^T \hat{y} > \mu = 0$

and $\hat{y} \in T_{\Omega \cap \mathcal{A}}(x)$. This would contradict our assumption that $v \in N_{\Omega \cap \mathcal{A}}(x)$. Since their intersection is empty, we can properly separate $\mathrm{ri}(C_1)$ from $C_2$. Indeed, since $C_2$ is a subspace and $C_1$ has nonempty relative interior, there must be a $(w, \gamma)$ such that

$\inf_{(y,\mu) \in C_1} \{w^T y + \gamma \mu\} < \sup_{(y,\mu) \in C_1} \{w^T y + \gamma \mu\} \leq 0,$

while $w^T u = 0$ for all $u \in \ker(A)$. In particular, this means that $w = A^T \lambda$ for some $\lambda \in \mathbb{R}^n$. Now, $\gamma$ must be nonnegative, as otherwise

$\sup_{(y,\mu) \in C_1} \{w^T y + \gamma \mu\} = \infty$

(which can be seen by letting $\mu$ tend to negative infinity). If $\gamma = 0$, then

$\sup_{(y,\mu) \in C_1} w^T y \leq 0,$

but the hyperplane $\{y : w^T y = 0\}$ does not contain the set $\{z - x : z \in \Omega\}$, as the infimum is strictly negative. Hence this hyperplane properly separates $\Omega - \{x\}$ from $\ker(A)$, which means that the relative interior of $\Omega - \{x\}$ cannot intersect the kernel of $A$. This contradicts our assumptions. Thus, we can conclude that $\gamma$ is strictly positive, and by homogeneity we may as well assume that $\gamma = 1$. To complete the argument, note that we now have (taking $\mu = v^T(z - x)$ in the bound above)

$(w + v)^T (z - x) \leq 0 \quad \text{for all } z \in \Omega.$

This means that $v + w \in N_\Omega(x)$, and we have already shown that $w = A^T \lambda$. Thus,

$v = (v + w) - w \in N_\Omega(x) + \{A^T \lambda : \lambda \in \mathbb{R}^n\}.$

Let's now translate the consequence of this proposition for our problem. Using (3) and Proposition 13.2, we have that $x^*$ is optimal for

$\min f(x) \quad \text{s.t.} \quad x \in \Omega,\ Ax = b$

if and only if $Ax^* = b$, $x^* \in \Omega$, and there exists a $\lambda \in \mathbb{R}^n$ such that $-[\nabla f(x^*) + A^T \lambda] \in N_\Omega(x^*)$. This reduction is not immediately useful to us, as it doesn't provide an algorithmic approach to solving the constrained optimization problem. However, it will form the basis of our dive into duality.

13.3 Duality

Duality lets us associate to any constrained optimization problem a concave maximization problem whose solutions lower bound the optimal value of the original problem. In particular, under mild assumptions, we will show that one can solve the primal problem by first solving the dual problem. We'll continue to focus on the standard primal problem for an equality constrained optimization problem:

minimize $f(x)$
subject to $x \in \Omega$, $Ax = b$. (4)

Here, assume that $\Omega$ is a closed convex set, $f$ is differentiable, and $A$ is full rank. The key behind duality (here, Lagrangian duality) is that problem (4) is equivalent to

$\min_{x \in \Omega} \max_{\lambda \in \mathbb{R}^n} f(x) + \lambda^T (Ax - b).$

To see this, just note that if $Ax \neq b$, then the max with respect to $\lambda$ is infinite. On the other hand, if $Ax = b$, then the max with respect to $\lambda$ is equal to $f(x)$. The dual problem associated with (4) is

$\max_{\lambda \in \mathbb{R}^n} \min_{x \in \Omega} f(x) + \lambda^T (Ax - b).$

Note that the function

$q(\lambda) := \min_{x \in \Omega} f(x) + \lambda^T (Ax - b)$

is always concave, as it is a pointwise minimum of functions that are affine in $\lambda$. Hence the dual problem is a concave maximization problem, regardless of what form $f$ and $\Omega$ take. We now show that it always lower bounds the primal problem.

13.4 Weak duality

Proposition 13.3. For any function $\varphi(x, z)$,

$\min_x \max_z \varphi(x, z) \geq \max_z \min_x \varphi(x, z).$

Proof. The proof is essentially tautological. Note that we always have

$\varphi(x, z) \geq \min_x \varphi(x, z).$

Taking the maximum with respect to the second argument verifies that

$\max_z \varphi(x, z) \geq \max_z \min_x \varphi(x, z) \quad \text{for all } x.$

Now, minimizing the left-hand side with respect to $x$ proves

$\min_x \max_z \varphi(x, z) \geq \max_z \min_x \varphi(x, z),$

which is precisely our assertion.
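As an illustration of weak duality, here is a minimal numerical sketch (not part of the original notes; the problem data are made up and we take $\Omega = \mathbb{R}^d$ for convenience so that the inner minimization defining $q(\lambda)$ has a closed form). Every choice of $\lambda$ yields a lower bound on the primal optimal value.

import numpy as np

# Weak duality for the equality constrained QP with Omega = R^d (illustrative data).
# Primal:  min 0.5 x^T Q x + c^T x  s.t.  Ax = b.
# Dual function: q(lambda) = min_x L(x, lambda), in closed form since Q > 0.
rng = np.random.default_rng(1)
d, n = 6, 2
M = rng.standard_normal((d, d))
Q = M @ M.T + np.eye(d)
c = rng.standard_normal(d)
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

f = lambda x: 0.5 * x @ Q @ x + c @ x

def q(lam):
    # Unconstrained minimizer of the Lagrangian: Q x + c + A^T lambda = 0.
    x = np.linalg.solve(Q, -(c + A.T @ lam))
    return f(x) + lam @ (A @ x - b)

# Primal optimal value via the KKT system from Section 13.1.
K = np.block([[Q, A.T], [A, np.zeros((n, n))]])
x_star = np.linalg.solve(K, np.concatenate([-c, b]))[:d]
p_star = f(x_star)

# Every lambda gives a lower bound on the primal optimum.
for lam in rng.standard_normal((5, n)):
    assert q(lam) <= p_star + 1e-9
print("all sampled dual values lower bound p* =", p_star)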

13.5 Strong duality

For convex optimization problems, we can prove a considerably stronger result: the primal and dual problems attain the same optimal value. Moreover, if we know a dual optimal solution, we can extract a primal optimal solution from a simpler optimization problem.

Theorem 13.4 (Strong Duality).

1. If there exists a $z \in \mathrm{relint}(\Omega)$ that also satisfies our equality constraint, and the primal problem has an optimal solution, then the dual has an optimal solution and the primal and dual optimal values are equal.

2. In order for $x^*$ to be optimal for the primal and $\lambda^*$ optimal for the dual, it is necessary and sufficient that $Ax^* = b$, $x^* \in \Omega$, and

$x^* \in \arg\min_{x \in \Omega} L(x, \lambda^*), \quad \text{where } L(x, \lambda) := f(x) + \lambda^T (Ax - b).$

Proof. For all $\lambda$ and all feasible $x$,

$q(\lambda) \leq f(x) + \lambda^T (Ax - b) = f(x),$

where the second equality holds because $Ax = b$. Now by Proposition 13.1, $x^*$ is optimal if and only if there exists a $\lambda^*$ such that

$\langle \nabla f(x^*) + A^T \lambda^*,\, x - x^* \rangle \geq 0 \quad \forall x \in \Omega$

and $Ax^* = b$. Note that this condition means that $x^*$ minimizes $L(x, \lambda^*)$ over $\Omega$. By the preceding argument, it now follows that

$q(\lambda^*) = \inf_{x \in \Omega} L(x, \lambda^*) = L(x^*, \lambda^*) = f(x^*) + \lambda^{*T} (Ax^* - b) = f(x^*).$

Since $q(\lambda) \leq f(x^*)$ for every $\lambda$, this shows that $\lambda^*$ is dual optimal and that the primal and dual optimal values coincide, which completes the proof.
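To illustrate the second part of the theorem, here is a minimal numerical sketch (not part of the original notes; the data are made up and $\Omega = \mathbb{R}^d$ so the inner minimization is unconstrained): we maximize the concave dual function $q(\lambda)$ of an equality constrained QP in closed form, recover a primal point as the minimizer of $L(x, \lambda^*)$, and check that it matches the solution of the KKT system.

import numpy as np

# Solving the primal via the dual for the equality constrained QP (illustrative data,
# Omega = R^d, Q positive definite so the inner minimization is unconstrained):
#   q(lambda) = min_x 0.5 x^T Q x + c^T x + lambda^T (Ax - b)
# is a concave quadratic; we maximize it by setting grad q(lambda) = A x(lambda) - b = 0,
# then recover the primal solution x* = x(lambda*) = argmin_x L(x, lambda*).
rng = np.random.default_rng(2)
d, n = 6, 2
M = rng.standard_normal((d, d))
Q = M @ M.T + np.eye(d)
c = rng.standard_normal(d)
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

x_of = lambda lam: np.linalg.solve(Q, -(c + A.T @ lam))   # argmin_x L(x, lambda)

# grad q(lambda) = A x(lambda) - b = 0  <=>  (A Q^{-1} A^T) lambda = -(A Q^{-1} c + b)
AQinv = A @ np.linalg.inv(Q)
lam_star = np.linalg.solve(AQinv @ A.T, -(AQinv @ c + b))
x_star = x_of(lam_star)

print(np.linalg.norm(A @ x_star - b))   # ~0: the recovered point is primal feasible

# Independent check: the KKT system from Section 13.1 yields the same primal solution.
K = np.block([[Q, A.T], [A, np.zeros((n, n))]])
x_kkt = np.linalg.solve(K, np.concatenate([-c, b]))[:d]
print(np.linalg.norm(x_star - x_kkt))   # ~0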