Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients

- subgradients and quasigradients
- subgradient calculus
- optimality conditions via subgradients
- directional derivatives

Prof. S. Boyd, EE392o, Stanford University

Basic inequality

recall the basic inequality for convex differentiable f:

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)

- the first-order approximation of f at x is a global underestimator
- (∇f(x), −1) supports epi f at (x, f(x))

what if f is not differentiable?

Subgradient of a function

g is a subgradient of f (not necessarily convex) at x if

    f(y) ≥ f(x) + gᵀ(y − x)   for all y

(i.e., (g, −1) supports epi f at (x, f(x)))

[figure: f with affine underestimators f(x₁) + g₁ᵀ(x − x₁), f(x₂) + g₂ᵀ(x − x₂), and f(x₂) + g₃ᵀ(x − x₂)]

g₂, g₃ are subgradients at x₂; g₁ is a subgradient at x₁
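
A quick way to get a feel for the definition is to test it numerically. The sketch below (Python, not part of the original slides) brute-force checks the inequality f(y) ≥ f(x) + g(y − x) over a grid of points for an assumed one-dimensional example, f(x) = max(x, 2x − 1), whose subdifferential at the kink x = 1 is the interval [1, 2].

    import numpy as np

    # assumed illustrative function: f(x) = max(x, 2x - 1), with a kink at x = 1
    f = lambda x: np.maximum(x, 2 * x - 1)

    def is_subgradient(g, x, ys):
        """True if f(y) >= f(x) + g*(y - x) holds at every test point y."""
        return bool(np.all(f(ys) >= f(x) + g * (ys - x) - 1e-12))

    ys = np.linspace(-5, 5, 2001)
    print(is_subgradient(1.0, 1.0, ys))   # True: slope of the left piece
    print(is_subgradient(1.5, 1.0, ys))   # True: any g in [1, 2] works at the kink
    print(is_subgradient(2.5, 1.0, ys))   # False: too steep to be a global underestimator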

- a subgradient gives an affine global underestimator of f
- if f is convex, it has at least one subgradient at every point in relint dom f
- if f is convex and differentiable, ∇f(x) is a subgradient of f at x

Example

f = max{f₁, f₂}, with f₁, f₂ convex and differentiable

[figure: f(x) = max{f₁(x), f₂(x)}, with the two curves crossing at x₀]

- f₁(x₀) > f₂(x₀): unique subgradient g = ∇f₁(x₀)
- f₂(x₀) > f₁(x₀): unique subgradient g = ∇f₂(x₀)
- f₁(x₀) = f₂(x₀): the subgradients form a line segment [∇f₁(x₀), ∇f₂(x₀)]
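
The three cases above translate directly into a rule for producing one subgradient of max{f₁, f₂}. The sketch below implements that rule for assumed illustrative functions f₁(x) = x² and f₂(x) = 1; at a tie, any convex combination of the two gradients may be returned.

    # assumed illustrative functions: f1(x) = x**2, f2(x) = 1 (both convex, differentiable)
    f1,  f2  = lambda x: x**2,   lambda x: 1.0
    df1, df2 = lambda x: 2 * x,  lambda x: 0.0

    def subgrad_max(x, theta=0.5):
        """One subgradient of max(f1, f2) at x; theta picks a point on the segment at a tie."""
        if f1(x) > f2(x):
            return df1(x)
        if f2(x) > f1(x):
            return df2(x)
        # f1(x) == f2(x): any convex combination of the two gradients is a subgradient
        return theta * df1(x) + (1 - theta) * df2(x)

    print(subgrad_max(2.0))         # 4.0 (f1 active)
    print(subgrad_max(0.0))         # 0.0 (f2 active)
    print(subgrad_max(1.0))         # 1.0 (tie: any point of the segment [0, 2] works)
    print(subgrad_max(1.0, 0.25))   # 0.5 (another point of the same segment)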

Subdifferential

- the set of all subgradients of f at x is called the subdifferential of f at x, written ∂f(x)
- ∂f(x) is a closed convex set
- ∂f(x) is nonempty (if f is convex and finite near x)
- ∂f(x) = {∇f(x)} if f is differentiable at x
- if ∂f(x) = {g}, then f is differentiable at x and g = ∇f(x)
- in many applications we don't need the complete ∂f(x); it is sufficient to find one g ∈ ∂f(x)

example: f(x) = |x|

[figure: the graph of f(x) = |x| and of its subdifferential ∂f(x), which is {−1} for x < 0, the interval [−1, 1] at x = 0, and {1} for x > 0]

Calculus of subgradients

assumption: all functions are finite near x

- ∂f(x) = {∇f(x)} if f is differentiable at x
- scaling: ∂(αf) = α ∂f (if α > 0)
- addition: ∂(f₁ + f₂) = ∂f₁ + ∂f₂ (RHS is addition of sets)
- affine transformation of variables: if g(x) = f(Ax + b), then ∂g(x) = Aᵀ ∂f(Ax + b)
- pointwise maximum: if f = max_{i=1,...,m} fᵢ, then

      ∂f(x) = Co ∪ { ∂fᵢ(x) | fᵢ(x) = f(x) },

  i.e., the convex hull of the union of the subdifferentials of the active functions at x

special case: if the fᵢ are differentiable,

    ∂f(x) = Co{ ∇fᵢ(x) | fᵢ(x) = f(x) }

example: f(x) = ‖x‖₁ = max{ sᵀx | sᵢ ∈ {−1, 1} }

[figure: ∂f(x) for x ∈ R²: the square [−1, 1]² at x = (0, 0), the segment {1} × [−1, 1] at x = (1, 0), and the single point (1, 1) at x = (1, 1)]
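
Combining the pointwise-maximum rule with the affine-transformation rule gives a cheap subgradient of f(x) = ‖Ax + b‖₁: take s with sᵢ = sign(zᵢ) at z = Ax + b (choosing sᵢ = 0 where zᵢ = 0) and return Aᵀs. The sketch below, with assumed random data, does exactly that and spot-checks the subgradient inequality.

    import numpy as np

    def subgrad_l1_affine(A, b, x):
        """One subgradient of f(x) = ||A x + b||_1 via the calculus rules:
        s = sign(A x + b) is a subgradient of ||.||_1 at z = A x + b
        (sign(0) = 0 lies in [-1, 1], so it is always a valid choice),
        and A^T s is then a subgradient of f at x by the affine rule."""
        s = np.sign(A @ x + b)
        return A.T @ s

    # assumed illustrative data
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
    x = rng.standard_normal(3)
    g = subgrad_l1_affine(A, b, x)

    # spot-check the subgradient inequality f(y) >= f(x) + g^T (y - x)
    f = lambda v: np.linalg.norm(A @ v + b, 1)
    ys = rng.standard_normal((1000, 3))
    print(all(f(y) >= f(x) + g @ (y - x) - 1e-9 for y in ys))   # True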

Pointwise supremum

if f = sup_{α ∈ A} f_α, then

    cl Co ∪ { ∂f_β(x) | f_β(x) = f(x) } ⊆ ∂f(x)

(usually we get equality, but some technical conditions must hold, e.g., A compact, f_α continuous in x and α)

roughly speaking, ∂f(x) is the closure of the convex hull of the union of the subdifferentials of the active functions

in any case, if f_β(x) = f(x), then ∂f_β(x) ⊆ ∂f(x)

example

    f(x) = λ_max(A(x)) = sup_{‖y‖₂=1} yᵀA(x)y

where A(x) = A₀ + x₁A₁ + ··· + xₙAₙ, with Aᵢ ∈ Sᵏ

- f is the pointwise supremum of g_y(x) = yᵀA(x)y over ‖y‖₂ = 1
- g_y is affine in x, with ∇g_y(x) = (yᵀA₁y, ..., yᵀAₙy)
- hence ∂f(x) = Co{ ∇g_y | A(x)y = λ_max(A(x))y, ‖y‖₂ = 1 } (not hard to verify)
- to find one subgradient at x, we can choose any unit eigenvector y associated with λ_max(A(x)); then (yᵀA₁y, ..., yᵀAₙy) ∈ ∂f(x)
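
The eigenvector recipe above is straightforward to implement. The sketch below (numpy only, with assumed random symmetric matrices) evaluates f(x) = λ_max(A(x)), returns the subgradient (yᵀA₁y, ..., yᵀAₙy) built from a unit eigenvector y for the largest eigenvalue, and spot-checks the subgradient inequality.

    import numpy as np

    def lambda_max_and_subgrad(A_list, x):
        """Return f(x) = lambda_max(A0 + x1*A1 + ... + xn*An) and one subgradient,
        computed from a unit eigenvector for the largest eigenvalue."""
        A0, As = A_list[0], A_list[1:]
        A = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
        w, V = np.linalg.eigh(A)      # symmetric eigendecomposition, eigenvalues ascending
        y = V[:, -1]                  # unit eigenvector for lambda_max
        return w[-1], np.array([y @ Ai @ y for Ai in As])

    # assumed data: random symmetric A0, A1, A2 in S^4
    rng = np.random.default_rng(1)
    sym = lambda M: (M + M.T) / 2
    A_list = [sym(rng.standard_normal((4, 4))) for _ in range(3)]
    x = np.array([0.3, -1.2])
    fx, g = lambda_max_and_subgrad(A_list, x)

    # spot-check f(y) >= f(x) + g^T (y - x) on random points
    f = lambda v: lambda_max_and_subgrad(A_list, v)[0]
    print(all(f(y) >= fx + g @ (y - x) - 1e-9 for y in rng.standard_normal((200, 2))))   # True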

Minimization

define g(y) as the optimal value of

    minimize    f₀(x)
    subject to  fᵢ(x) ≤ yᵢ,  i = 1, ..., m

(fᵢ convex; variable x)

with λ an optimal dual variable, we have

    g(z) ≥ g(y) − ∑ᵢ λᵢ(zᵢ − yᵢ)

i.e., −λ is a subgradient of g at y
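
To see this concretely, the sketch below uses an assumed example that is small enough to solve in closed form: minimize (x − 2)² subject to x ≤ y. The optimal value is g(y) = max(2 − y, 0)² and an optimal dual variable is λ(y) = 2 max(2 − y, 0), so −λ(y) should satisfy the subgradient inequality for g at every y; the code checks this on a grid.

    import numpy as np

    # assumed example: minimize (x - 2)^2 subject to x <= y
    g   = lambda y: np.maximum(2 - y, 0) ** 2      # optimal value as a function of y
    lam = lambda y: 2 * np.maximum(2 - y, 0)       # optimal dual variable for that y

    ys = np.linspace(-3, 5, 81)
    zs = np.linspace(-3, 5, 81)
    ok = all(np.all(g(zs) >= g(y) - lam(y) * (zs - y) - 1e-12) for y in ys)
    print(ok)   # True: -lam(y) is a subgradient of g at every y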

Subgradients and sublevel sets

g is a subgradient at x means f(y) ≥ f(x) + gᵀ(y − x)

hence f(y) ≤ f(x) ⇒ gᵀ(y − x) ≤ 0

[figure: the sublevel set {y | f(y) ≤ f(x₀)}, with a subgradient g ∈ ∂f(x₀) at x₀ and the gradient ∇f(x₁) at a differentiable point x₁]

- f differentiable at x₀: ∇f(x₀) is normal to the sublevel set {x | f(x) ≤ f(x₀)}
- f nondifferentiable at x₀: a subgradient defines a supporting hyperplane to the sublevel set through x₀

Quasigradients

g ≠ 0 is a quasigradient of f at x if

    gᵀ(y − x) ≥ 0  ⇒  f(y) ≥ f(x)

holds for all y

[figure: the halfspace {y | gᵀ(y − x) ≥ 0} is contained in the superlevel set {y | f(y) ≥ f(x)}]

quasigradients at x form a cone

example: f(x) = (aᵀx + b)/(cᵀx + d), with dom f = {x | cᵀx + d > 0}

g = a − f(x₀)c is a quasigradient at x₀

proof: for cᵀx + d > 0,

    aᵀ(x − x₀) ≥ f(x₀) cᵀ(x − x₀)  ⇒  f(x) ≥ f(x₀)
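
The claim is easy to spot-check numerically. The sketch below draws an assumed random linear-fractional function, forms g = a − f(x₀)c at x₀ = 0, and verifies over many sample points inside dom f that gᵀ(y − x₀) ≥ 0 never occurs together with f(y) < f(x₀).

    import numpy as np

    # assumed illustrative data: f(x) = (a^T x + b) / (c^T x + d) on {x | c^T x + d > 0}
    rng = np.random.default_rng(2)
    a, b = rng.standard_normal(3), 0.7
    c, d = rng.standard_normal(3), 4.0
    f = lambda x: (a @ x + b) / (c @ x + d)

    x0 = np.zeros(3)             # c^T x0 + d = 4 > 0, so x0 is in dom f
    g = a - f(x0) * c            # claimed quasigradient at x0

    violations = 0
    for _ in range(20000):
        y = rng.uniform(-1, 1, 3)
        if c @ y + d <= 0:
            continue             # outside dom f
        if g @ (y - x0) >= 0 and f(y) < f(x0) - 1e-12:
            violations += 1
    print(violations)            # 0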

example: degree of a₁ + a₂t + ··· + aₙtⁿ⁻¹, i.e.,

    f(a) = min{ i | a_{i+2} = ··· = aₙ = 0 }

g = sign(a_{k+1}) e_{k+1} (with k = f(a)) is a quasigradient at a ≠ 0

proof:

    gᵀ(b − a) = sign(a_{k+1}) b_{k+1} − |a_{k+1}| ≥ 0

implies b_{k+1} ≠ 0, hence f(b) ≥ f(a)

Optimality conditions: unconstrained

recall that for f convex and differentiable,

    f(x*) = inf_x f(x)  ⟺  0 = ∇f(x*)

generalization to nondifferentiable convex f:

    f(x*) = inf_x f(x)  ⟺  0 ∈ ∂f(x*)

[figure: a nondifferentiable convex f with 0 ∈ ∂f(x₀) at the minimizer x₀]

proof: by definition (!)

    f(y) ≥ f(x*) + 0ᵀ(y − x*) for all y  ⟺  0 ∈ ∂f(x*)

... seems trivial but isn't

Example: piecewise linear minimization

    f(x) = max_{i=1,...,m} (aᵢᵀx + bᵢ)

x* minimizes f

⟺ 0 ∈ ∂f(x*) = Co{ aᵢ | aᵢᵀx* + bᵢ = f(x*) }

⟺ there is a λ with

    λ ≥ 0,  1ᵀλ = 1,  ∑ᵢ λᵢaᵢ = 0

where λᵢ = 0 if aᵢᵀx* + bᵢ < f(x*)
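
This condition is directly testable: collect the active aᵢ and ask whether 0 lies in their convex hull, which is a small feasibility LP in λ. The sketch below does this with scipy's linprog for an assumed example, f(x) = max(x₁ + x₂, −x₁, −x₂), whose minimizer is x* = (0, 0).

    import numpy as np
    from scipy.optimize import linprog

    # assumed example: f(x) = max(x1 + x2, -x1, -x2), minimized at x* = (0, 0)
    A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    b = np.zeros(3)

    def is_minimizer(x, tol=1e-9):
        """Check 0 in Co{a_i | a_i^T x + b_i = f(x)} via a feasibility LP in lambda."""
        vals = A @ x + b
        act = A[vals >= vals.max() - tol]            # active rows a_i
        m, n = act.shape
        # find lambda >= 0 with act^T lambda = 0 and 1^T lambda = 1 (zero objective)
        A_eq = np.vstack([act.T, np.ones((1, m))])
        b_eq = np.concatenate([np.zeros(n), [1.0]])
        res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
        return res.success

    print(is_minimizer(np.array([0.0, 0.0])))   # True:  lambda = (1/3, 1/3, 1/3) works
    print(is_minimizer(np.array([0.5, 0.0])))   # False: only a_1 is active and it is nonzero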

... but these are the KKT conditions for the epigraph form

    minimize    t
    subject to  aᵢᵀx + bᵢ ≤ t,  i = 1, ..., m

with dual

    maximize    bᵀλ
    subject to  λ ≥ 0,  Aᵀλ = 0,  1ᵀλ = 1

Optimality conditions: constrained

    minimize    f₀(x)
    subject to  fᵢ(x) ≤ 0,  i = 1, ..., m

we assume

- fᵢ convex, defined on Rⁿ (hence subdifferentiable)
- strict feasibility (Slater's condition)

x* is primal optimal (λ is dual optimal) iff

    fᵢ(x*) ≤ 0,  λᵢ ≥ 0
    0 ∈ ∂f₀(x*) + ∑ᵢ λᵢ ∂fᵢ(x*)
    λᵢ fᵢ(x*) = 0

... generalizes KKT to nondifferentiable fᵢ

Directional derivative

the directional derivative of f at x in the direction δx is

    f′(x; δx) = lim_{h→0⁺} (f(x + hδx) − f(x)) / h

(can be +∞ or −∞)

- f convex, finite near x ⇒ f′(x; δx) exists
- f is differentiable at x if and only if, for some g (= ∇f(x)) and all δx, f′(x; δx) = gᵀδx (i.e., f′(x; δx) is a linear function of δx)

Directional derivative and subdifferential

general formula for convex f:

    f′(x; δx) = sup{ gᵀδx | g ∈ ∂f(x) }

[figure: the set ∂f(x) and the direction δx]
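
For the ℓ₁ norm the subdifferential is a box, so the supremum has a closed form, which makes the formula easy to verify. The sketch below (assumed example) compares that closed form with a one-sided difference quotient.

    import numpy as np

    # f(x) = ||x||_1; its subdifferential at x is a box: g_i = sign(x_i) where x_i != 0,
    # g_i in [-1, 1] where x_i = 0, so sup{g^T dx} has the closed form below
    f = lambda v: np.linalg.norm(v, 1)

    def dir_deriv_l1(x, dx):
        fixed = x != 0
        return np.sign(x[fixed]) @ dx[fixed] + np.abs(dx[~fixed]).sum()

    x  = np.array([1.0, 0.0, -2.0])
    dx = np.array([0.5, -1.0, 1.0])
    h = 1e-7
    print(dir_deriv_l1(x, dx))            # 0.5 + 1.0 - 1.0 = 0.5
    print((f(x + h * dx) - f(x)) / h)     # approximately 0.5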

Descent directions

δx is a descent direction for f at x if f′(x; δx) < 0

for differentiable f, δx = −∇f(x) is always a descent direction (except when it is zero)

warning: for nondifferentiable (convex) functions, δx = −g, with g ∈ ∂f(x), need not be a descent direction

example: f(x) = |x₁| + 2|x₂|

[figure: level curves of f in the (x₁, x₂) plane, with a subgradient g at a point on the x₁-axis]
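
The sketch below makes the warning concrete for the slide's example f(x) = |x₁| + 2|x₂|, at an assumed point x = (1, 0): g = (1, 2) is a valid subgradient there (the second component may be anything in [−2, 2]), yet a small step along −g increases f, while a step along −(1, 0) decreases it.

    import numpy as np

    f = lambda v: abs(v[0]) + 2 * abs(v[1])

    x = np.array([1.0, 0.0])
    g = np.array([1.0, 2.0])      # a subgradient of f at x
    t = 1e-3
    print(f(x - t * g) - f(x))                        # +3e-3 > 0: -g is not a descent direction
    print(f(x - t * np.array([1.0, 0.0])) - f(x))     # -1e-3 < 0: a descent direction does exist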

Subgradients and distance to sublevel sets

if f is convex, f(z) < f(x), and g ∈ ∂f(x), then for small t > 0,

    ‖x − tg − z‖₂ < ‖x − z‖₂

- thus −g is a descent direction for ‖x − z‖₂, for any z with f(z) < f(x) (e.g., x*)
- the negative subgradient is a descent direction for the distance to the optimal point

proof:

    ‖x − tg − z‖₂² = ‖x − z‖₂² − 2t gᵀ(x − z) + t²‖g‖₂²
                   ≤ ‖x − z‖₂² − 2t (f(x) − f(z)) + t²‖g‖₂²
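
Continuing the same assumed example, the code below confirms that a step along −g does shrink the distance to the minimizer z = (0, 0) for sufficiently small t, and that it stops doing so beyond the threshold 2(f(x) − f(z))/‖g‖₂² = 0.4 suggested by the proof (which happens to be tight here).

    import numpy as np

    f = lambda v: abs(v[0]) + 2 * abs(v[1])

    x, g = np.array([1.0, 0.0]), np.array([1.0, 2.0])   # g is a subgradient of f at x
    z = np.zeros(2)                                     # f(z) = 0 < f(x) = 1
    for t in [0.01, 0.1, 0.3, 0.5]:
        closer = np.linalg.norm(x - t * g - z) < np.linalg.norm(x - z)
        print(t, closer)   # True for t < 0.4, False at t = 0.5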

Descent directions and optimality

fact: for f convex, finite near x, either

- 0 ∈ ∂f(x) (in which case x minimizes f), or
- there is a descent direction for f at x

i.e., x is optimal (minimizes f) iff there is no descent direction for f at x

proof: define δx_sd = −argmin_{z ∈ ∂f(x)} ‖z‖

if δx_sd = 0, then 0 ∈ ∂f(x), so x is optimal; otherwise

    f′(x; δx_sd) = −( inf_{z ∈ ∂f(x)} ‖z‖ )² < 0,

so δx_sd is a descent direction
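
For the same assumed example at x = (1, 0), the subdifferential is the box {1} × [−2, 2], so the minimum-norm subgradient is obtained by projecting the origin onto that box (coordinate-wise clipping). The sketch below computes it, sees that it is nonzero (so x is not optimal), and confirms that its negative is a descent direction.

    import numpy as np

    f = lambda v: abs(v[0]) + 2 * abs(v[1])

    # subdiff f((1, 0)) is the box [1, 1] x [-2, 2]
    lo, hi = np.array([1.0, -2.0]), np.array([1.0, 2.0])
    z_min = np.clip(0.0, lo, hi)      # projection of the origin onto the box
    dx_sd = -z_min
    print(z_min)                      # [1. 0.]: nonzero, so x = (1, 0) is not optimal

    x, t = np.array([1.0, 0.0]), 1e-3
    print(f(x + t * dx_sd) - f(x))    # -1e-3 < 0: dx_sd is a descent direction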

[figure: the set ∂f(x) and the steepest-descent direction δx_sd, the negative of the minimum-norm element of ∂f(x)]

the idea extends to the constrained case (feasible descent direction)