Subgradient. Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes. Topics: definition; subgradient calculus; duality and optimality conditions; directional derivative.


1/41 Subgradient
Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes.
Outline:
- definition
- subgradient calculus
- duality and optimality conditions
- directional derivative

2/41 Basic inequality
recall the basic inequality for convex differentiable f:
  f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)
- the first-order approximation of f at x is a global lower bound
- ∇f(x) defines a non-vertical supporting hyperplane to epi f at (x, f(x)):
  [∇f(x); −1]ᵀ([y; t] − [x; f(x)]) ≤ 0 for all (y, t) ∈ epi f
- what if f is not differentiable?

3/41 Subgradient
g is a subgradient of a convex function f at x ∈ dom f if
  f(y) ≥ f(x) + gᵀ(y − x) for all y ∈ dom f
(in the figure, g₂, g₃ are subgradients at x₂; g₁ is a subgradient at x₁)

4/41 Properties
- f(x) + gᵀ(y − x) is a global lower bound on f(y)
- g defines a non-vertical supporting hyperplane to epi f at (x, f(x)):
  [g; −1]ᵀ([y; t] − [x; f(x)]) ≤ 0 for all (y, t) ∈ epi f
- if f is convex and differentiable, then ∇f(x) is a subgradient of f at x
applications:
- algorithms for nondifferentiable convex optimization
- unconstrained optimality: x minimizes f(x) if and only if 0 ∈ ∂f(x)
- KKT conditions with nondifferentiable functions

5/41 Example
f(x) = max{f₁(x), f₂(x)}, with f₁, f₂ convex and differentiable
- if f₁(x̂) > f₂(x̂), the unique subgradient of f at x̂ is ∇f₁(x̂)
- if f₁(x̂) < f₂(x̂), the unique subgradient of f at x̂ is ∇f₂(x̂)
- if f₁(x₀) = f₂(x₀), the subgradients at x₀ form the line segment [∇f₁(x₀), ∇f₂(x₀)]
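A numeric sanity check of the weak rule on this slide: take the gradient of any active function as a subgradient. The concrete f₁, f₂ below are assumed examples, not from the slides.

```python
import numpy as np

# f(x) = max{f1(x), f2(x)}; f1, f2 are illustrative convex differentiable functions
f1 = lambda x: (x - 1.0) ** 2
f2 = lambda x: (x + 1.0) ** 2
df1 = lambda x: 2.0 * (x - 1.0)
df2 = lambda x: 2.0 * (x + 1.0)
f = lambda x: max(f1(x), f2(x))

def subgrad_max(x):
    """Return one subgradient of max{f1, f2}: gradient of an active function."""
    return df1(x) if f1(x) >= f2(x) else df2(x)

# verify the subgradient inequality f(y) >= f(x) + g*(y - x) on a grid
for x in np.linspace(-2, 2, 9):
    g = subgrad_max(x)
    assert all(f(y) >= f(x) + g * (y - x) - 1e-9 for y in np.linspace(-3, 3, 25))
```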

6/41 Subdifferential
the subdifferential ∂f(x) of f at x is the set of all subgradients:
  ∂f(x) = {g | gᵀ(y − x) ≤ f(y) − f(x) for all y ∈ dom f}
- ∂f(x) is a closed convex set (possibly empty), since it is an intersection of halfspaces
- if x ∈ int dom f, then ∂f(x) is nonempty and bounded (proof on the next two pages)

7/41 proof: we show that ∂f(x) is nonempty when x ∈ int dom f
- (x, f(x)) is in the boundary of the convex set epi f
- therefore there exists a supporting hyperplane to epi f at (x, f(x)): some (a, b) ≠ 0 with
  [a; b]ᵀ([y; t] − [x; f(x)]) ≤ 0 for all (y, t) ∈ epi f
- b > 0 gives a contradiction as t → ∞
- b = 0 gives a contradiction for y = x + εa with small ε > 0
- therefore b < 0, and g = a/|b| is a subgradient of f at x

8/41 proof: we show that ∂f(x) is bounded when x ∈ int dom f
- for small r > 0, define a set of 2n points B = {x ± r eₖ | k = 1, …, n} ⊆ dom f, and define M = max_{y∈B} f(y) < ∞
- for every nonzero g ∈ ∂f(x), there is a point y ∈ B with
  f(y) ≥ f(x) + gᵀ(y − x) = f(x) + r‖g‖∞
  (choose an index k with |gₖ| = ‖g‖∞, and take y = x + r sign(gₖ)eₖ)
- therefore ∂f(x) is bounded:
  sup_{g∈∂f(x)} ‖g‖∞ ≤ (M − f(x)) / r

9/41 Examples
- absolute value f(x) = |x|: ∂f(x) = {sign(x)} for x ≠ 0, ∂f(0) = [−1, 1]
- Euclidean norm f(x) = ‖x‖₂: ∂f(x) = {x/‖x‖₂} if x ≠ 0, ∂f(0) = {g | ‖g‖₂ ≤ 1}
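The Euclidean-norm formula above can be checked numerically: a minimal sketch that picks one valid subgradient (at x = 0 we take g = 0, one of the admissible choices) and verifies the subgradient inequality at random points.

```python
import numpy as np

def subgrad_norm2(x):
    """One subgradient of f(x) = ||x||_2: x/||x||_2 if x != 0, else g = 0."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=3)
    g = subgrad_norm2(x)
    y = rng.normal(size=3)
    # subgradient inequality: f(y) >= f(x) + g^T (y - x)
    assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-9
```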

10/41 Monotonicity
the subdifferential of a convex function is a monotone operator:
  (u − v)ᵀ(x − y) ≥ 0 for all x, y and all u ∈ ∂f(x), v ∈ ∂f(y)
proof: by definition,
  f(y) ≥ f(x) + uᵀ(y − x),  f(x) ≥ f(y) + vᵀ(x − y)
combining the two inequalities shows monotonicity

11/41 Examples of non-subdifferentiable functions
the following functions are not subdifferentiable at x = 0
- f: R → R, dom f = R₊, f(x) = 1 if x = 0, f(x) = 0 if x > 0
- f: R → R, dom f = R₊, f(x) = −√x
in both cases, the only supporting hyperplane to epi f at (0, f(0)) is vertical

12/41 Subgradients and sublevel sets
if g is a subgradient of f at x, then
  f(y) ≤ f(x)  ⟹  gᵀ(y − x) ≤ 0
hence nonzero subgradients at x define supporting hyperplanes to the sublevel set {y | f(y) ≤ f(x)}

13/41 Outline
- definition
- subgradient calculus
- duality and optimality conditions
- directional derivative

14/41 Subgradient calculus
weak subgradient calculus: rules for finding one subgradient
- sufficient for most nondifferentiable convex optimization algorithms
- if you can evaluate f(x), you can usually compute a subgradient
strong subgradient calculus: rules for finding ∂f(x) (all subgradients)
- some algorithms, optimality conditions, etc., need the entire subdifferential
- can be quite complicated
we will assume that x ∈ int dom f

15/41 Basic rules
- differentiable functions: ∂f(x) = {∇f(x)} if f is differentiable at x
- nonnegative combination: if h(x) = α₁f₁(x) + α₂f₂(x) with α₁, α₂ ≥ 0, then
  ∂h(x) = α₁∂f₁(x) + α₂∂f₂(x)
  (the right-hand side is an addition of sets)
- affine transformation of variables: if h(x) = f(Ax + b), then ∂h(x) = Aᵀ∂f(Ax + b)

16/41 Pointwise maximum
f(x) = max{f₁(x), …, f_m(x)}
define I(x) = {i | fᵢ(x) = f(x)}, the set of active functions at x
- weak result: to compute a subgradient at x, choose any k ∈ I(x) and any subgradient of fₖ at x
- strong result:
  ∂f(x) = conv ∪_{i∈I(x)} ∂fᵢ(x)
  the convex hull of the union of subdifferentials of active functions at x
- if the fᵢ are differentiable: ∂f(x) = conv{∇fᵢ(x) | i ∈ I(x)}

17/41 Example: piecewise-linear function
f(x) = max_{i=1,…,m} (aᵢᵀx + bᵢ)
the subdifferential at x is a polyhedron:
  ∂f(x) = conv{aᵢ | i ∈ I(x)}, with I(x) = {i | aᵢᵀx + bᵢ = f(x)}
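A short sketch of the strong result above: any convex combination of the active aᵢ is a subgradient. The data A, b below is a made-up example, chosen so that 0 lies in the subdifferential at the origin.

```python
import numpy as np

# f(x) = max_i (a_i^T x + b_i); rows of A are the a_i (assumed example data)
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, -1.0])

def pwl_subgrad(x, tol=1e-9):
    """Return (one subgradient, f(x)) using a uniform combination of active a_i."""
    vals = A @ x + b
    fval = vals.max()
    active = np.flatnonzero(vals >= fval - tol)      # I(x)
    w = np.ones(len(active)) / len(active)           # uniform convex combination
    return w @ A[active], fval

g, fval = pwl_subgrad(np.array([0.0, 0.0]))
# at x = 0: values are (0, 0, -1); rows 0 and 1 are active, so g = (a_0 + a_1)/2 = 0
assert np.allclose(g, 0.0) and fval == 0.0
```

Since 0 ∈ ∂f(0), the origin is in fact a minimizer of this example f.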

18/41 Example: ℓ₁-norm
f(x) = ‖x‖₁ = max_{s∈{−1,1}ⁿ} sᵀx
the subdifferential is a product of intervals:
  ∂f(x) = J₁ × ⋯ × Jₙ,  Jₖ = [−1, 1] if xₖ = 0, {1} if xₖ > 0, {−1} if xₖ < 0
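One convenient selection from this interval product is the coordinatewise sign (choosing 0 from [−1, 1] in coordinates where xₖ = 0); a quick numeric check:

```python
import numpy as np

def subgrad_l1(x):
    """One subgradient of ||x||_1: sign(x_k) per coordinate (sign(0) = 0 is valid)."""
    return np.sign(x)

x = np.array([1.5, 0.0, -2.0])
g = subgrad_l1(x)
assert np.array_equal(g, [1.0, 0.0, -1.0])

# subgradient inequality against random y
rng = np.random.default_rng(1)
for _ in range(100):
    y = rng.normal(size=3)
    assert np.abs(y).sum() >= np.abs(x).sum() + g @ (y - x) - 1e-9
```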

19/41 Pointwise supremum
f(x) = sup_{α∈A} f_α(x), with f_α(x) convex in x for every α
- weak result: to find a subgradient at x̂, find any β for which f(x̂) = f_β(x̂) (assuming the supremum is attained), and choose any g ∈ ∂f_β(x̂)
- strong result: define I(x) = {α ∈ A | f_α(x) = f(x)}; then
  conv ∪_{α∈I(x)} ∂f_α(x) ⊆ ∂f(x)
  equality requires extra conditions (e.g., A compact, f_α continuous in α)

20/41 Exercise: maximum eigenvalue
problem: explain how to find a subgradient of
  f(x) = λ_max(A(x)) = sup_{‖y‖₂=1} yᵀA(x)y
where A(x) = A₀ + x₁A₁ + ⋯ + xₙAₙ with symmetric coefficients Aᵢ
solution: to find a subgradient at x̂, choose any unit eigenvector y with eigenvalue λ_max(A(x̂)); the gradient of yᵀA(x)y at x̂ is a subgradient of f:
  (yᵀA₁y, …, yᵀAₙy) ∈ ∂f(x̂)
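This exercise translates directly into code: take a top unit eigenvector of A(x̂) and form (yᵀA₁y, …, yᵀAₙy). The random symmetric matrices below are assumed test data; the loop verifies the subgradient inequality.

```python
import numpy as np

rng = np.random.default_rng(0)
def sym(n):
    M = rng.normal(size=(n, n))
    return (M + M.T) / 2

A0, A1, A2 = sym(4), sym(4), sym(4)   # assumed example coefficients

def A(x):
    return A0 + x[0] * A1 + x[1] * A2

def f(x):
    return np.linalg.eigvalsh(A(x))[-1]   # eigvalsh returns ascending eigenvalues

def subgrad(x):
    w, V = np.linalg.eigh(A(x))
    y = V[:, -1]                          # unit eigenvector for lambda_max
    return np.array([y @ A1 @ y, y @ A2 @ y])

xh = np.array([0.3, -0.7])
g = subgrad(xh)
for _ in range(50):
    x = rng.normal(size=2)
    assert f(x) >= f(xh) + g @ (x - xh) - 1e-8
```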

21/41 Minimization
f(x) = inf_y h(x, y), with h jointly convex in (x, y)
weak result: to find a subgradient at x̂,
- find ŷ that minimizes h(x̂, y) (assuming the minimum is attained)
- find a subgradient (g, 0) ∈ ∂h(x̂, ŷ)
proof: for all x, y,
  h(x, y) ≥ h(x̂, ŷ) + gᵀ(x − x̂) + 0ᵀ(y − ŷ) = f(x̂) + gᵀ(x − x̂)
therefore f(x) = inf_y h(x, y) ≥ f(x̂) + gᵀ(x − x̂)

22/41 Exercise: Euclidean distance to convex set
problem: explain how to find a subgradient of
  f(x) = inf_{y∈C} ‖x − y‖₂
where C is a closed convex set
solution: to find a subgradient at x̂,
- if f(x̂) = 0 (that is, x̂ ∈ C), take g = 0
- if f(x̂) > 0, find the projection ŷ = P(x̂) on C and take
  g = (x̂ − ŷ)/‖x̂ − ŷ‖₂ = (x̂ − P(x̂))/‖x̂ − P(x̂)‖₂
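A concrete instance of this exercise, with the assumed choice C = [0, 1]ⁿ so that the projection is just a clip:

```python
import numpy as np

def proj_box(x):
    """Projection onto C = [0, 1]^n (assumed example set)."""
    return np.clip(x, 0.0, 1.0)

def dist(x):
    return np.linalg.norm(x - proj_box(x))

def subgrad_dist(x):
    """Subgradient of dist(x, C) via the projection formula on the slide."""
    d = dist(x)
    if d == 0.0:
        return np.zeros_like(x)           # x in C
    return (x - proj_box(x)) / d          # (x - P(x)) / ||x - P(x)||_2

xh = np.array([2.0, -1.0, 0.5])
g = subgrad_dist(xh)
rng = np.random.default_rng(2)
for _ in range(100):
    y = 2 * rng.normal(size=3)
    assert dist(y) >= dist(xh) + g @ (y - xh) - 1e-9
```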

23/41 Composition
f(x) = h(f₁(x), …, fₖ(x)), with h convex nondecreasing and fᵢ convex
weak result: to find a subgradient at x̂,
- find z ∈ ∂h(f₁(x̂), …, fₖ(x̂)) and gᵢ ∈ ∂fᵢ(x̂)
- then g = z₁g₁ + ⋯ + zₖgₖ ∈ ∂f(x̂)
this reduces to the standard formula for differentiable h, fᵢ
proof:
  f(x) ≥ h(f₁(x̂) + g₁ᵀ(x − x̂), …, fₖ(x̂) + gₖᵀ(x − x̂))
       ≥ h(f₁(x̂), …, fₖ(x̂)) + zᵀ(g₁ᵀ(x − x̂), …, gₖᵀ(x − x̂))
       = f(x̂) + gᵀ(x − x̂)

24/41 Optimal value function
define h(u, v) as the optimal value of the convex problem
  minimize f₀(x)  subject to fᵢ(x) ≤ uᵢ, i = 1, …, m,  Ax = b + v
(the functions fᵢ are convex; the optimization variable is x)
weak result: suppose h(û, v̂) is finite and strong duality holds with the dual
  maximize over λ ≥ 0, ν:  inf_x ( f₀(x) + Σᵢ λᵢ(fᵢ(x) − ûᵢ) + νᵀ(Ax − b − v̂) )
if λ̂, ν̂ are optimal dual variables (for right-hand sides û, v̂), then
  (−λ̂, −ν̂) ∈ ∂h(û, v̂)

25/41 proof: by weak duality for the problem with right-hand sides u, v:
  h(u, v) ≥ inf_x ( f₀(x) + Σᵢ λ̂ᵢ(fᵢ(x) − uᵢ) + ν̂ᵀ(Ax − b − v) )
         = inf_x ( f₀(x) + Σᵢ λ̂ᵢ(fᵢ(x) − ûᵢ) + ν̂ᵀ(Ax − b − v̂) ) − λ̂ᵀ(u − û) − ν̂ᵀ(v − v̂)
         = h(û, v̂) − λ̂ᵀ(u − û) − ν̂ᵀ(v − v̂)

26/41 Expectation
f(x) = E h(x, u), with u random and h convex in x for every u
weak result: to find a subgradient at x̂,
- choose a function g(u) with g(u) ∈ ∂ₓh(x̂, u)
- then g = Eᵤ g(u) ∈ ∂f(x̂)
proof: by convexity of h and the definition of g(u),
  f(x) = E h(x, u) ≥ E ( h(x̂, u) + g(u)ᵀ(x − x̂) ) = f(x̂) + gᵀ(x − x̂)
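A stochastic illustration of this rule, with the assumed example f(x) = E|x − u| for u uniform on [0, 1]: averaging the per-sample subgradients sign(x − u) by Monte Carlo approximates a subgradient, and for x ∈ (0, 1) the exact derivative is f′(x) = 2x − 1 to compare against.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)        # samples of the random variable u

def mc_subgrad(x):
    """Monte Carlo estimate of E_u g(u), with g(u) = sign(x - u) in d/dx |x - u|."""
    return np.mean(np.sign(x - u))

x = 0.7
# compare with the exact derivative f'(x) = P(u < x) - P(u > x) = 2x - 1
assert abs(mc_subgrad(x) - (2 * x - 1)) < 0.01
```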

27/41 Outline
- definition
- subgradient calculus
- duality and optimality conditions
- directional derivative

28/41 Optimality conditions - unconstrained
x⋆ minimizes f(x) if and only if 0 ∈ ∂f(x⋆)
proof: by definition,
  f(y) ≥ f(x⋆) + 0ᵀ(y − x⋆) for all y  ⟺  0 ∈ ∂f(x⋆)

29/41 Example: piecewise-linear minimization
f(x) = max_{i=1,…,m} (aᵢᵀx + bᵢ)
optimality condition:
  0 ∈ conv{aᵢ | i ∈ I(x⋆)}, where I(x) = {i | aᵢᵀx + bᵢ = f(x)}
in other words, x⋆ is optimal if and only if there is a λ with
  λ ≥ 0,  1ᵀλ = 1,  Σ_{i=1}^m λᵢaᵢ = 0,  λᵢ = 0 for i ∉ I(x⋆)
these are the optimality conditions for the equivalent linear program
  (primal)  minimize t  subject to Ax + b ≤ t·1
  (dual)    maximize bᵀλ  subject to Aᵀλ = 0, λ ≥ 0, 1ᵀλ = 1

30/41 Optimality conditions - constrained
  minimize f₀(x)  subject to fᵢ(x) ≤ 0, i = 1, …, m
from Lagrange duality: if strong duality holds, then x⋆, λ⋆ are primal, dual optimal if and only if
1. x⋆ is primal feasible
2. λ⋆ ≥ 0
3. λᵢ⋆ fᵢ(x⋆) = 0 for i = 1, …, m
4. x⋆ is a minimizer of L(x, λ⋆) = f₀(x) + Σ_{i=1}^m λᵢ⋆ fᵢ(x)

31/41 Karush-Kuhn-Tucker conditions (if dom fᵢ = Rⁿ)
conditions 1, 2, 3, and
  0 ∈ ∂ₓL(x⋆, λ⋆) = ∂f₀(x⋆) + Σ_{i=1}^m λᵢ⋆ ∂fᵢ(x⋆)
this generalizes the condition
  0 = ∇f₀(x⋆) + Σ_{i=1}^m λᵢ⋆ ∇fᵢ(x⋆)
for differentiable fᵢ

32/41 Outline
- definition
- subgradient calculus
- duality and optimality conditions
- directional derivative

33/41 Directional derivative
definition (general f): the directional derivative of f at x in the direction y is
  f′(x; y) = lim_{α↓0} ( f(x + αy) − f(x) ) / α = lim_{t→∞} ( t f(x + (1/t)y) − t f(x) )
(if the limit exists)
- f′(x; y) is the right derivative of g(α) = f(x + αy) at α = 0
- f′(x; y) is homogeneous in y: f′(x; λy) = λ f′(x; y) for λ ≥ 0

34/41 Directional derivative of a convex function
equivalent definition (convex f): replace lim with inf:
  f′(x; y) = inf_{α>0} ( f(x + αy) − f(x) ) / α = inf_{t>0} ( t f(x + (1/t)y) − t f(x) )
proof:
- the function h(y) = f(x + y) − f(x) is convex in y, with h(0) = 0
- its perspective t h(y/t) is nonincreasing in t (EE236B ex. A2.5); hence
  f′(x; y) = lim_{t→∞} t h(y/t) = inf_{t>0} t h(y/t)
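The "replace lim with inf" claim can be seen numerically: for convex f, the difference quotient (f(x + αy) − f(x))/α is nondecreasing in α, so its infimum over α > 0 equals its limit as α ↓ 0. A tiny check with the assumed example f(x) = x², where f′(0; y) = 0:

```python
# difference quotients of f(x) = x^2 at x0 = 0 in direction y, for decreasing alpha
f = lambda x: x * x
x0, y = 0.0, 3.0
alphas = [1.0, 0.5, 0.1, 0.01]
q = [(f(x0 + a * y) - f(x0)) / a for a in alphas]

# quotient is nonincreasing as alpha decreases, so the smallest alpha attains the inf
assert all(q[i] >= q[i + 1] for i in range(len(q) - 1))
assert min(q) == q[-1]
```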

35/41 Properties
consequences of the expressions (for convex f)
  f′(x; y) = inf_{α>0} ( f(x + αy) − f(x) ) / α = inf_{t>0} ( t f(x + (1/t)y) − t f(x) )
- f′(x; y) is convex in y (partial minimization over t of a function jointly convex in (y, t))
- f′(x; y) defines a lower bound on f in the direction y:
  f(x + αy) ≥ f(x) + α f′(x; y) for all α ≥ 0

36/41 Directional derivative and subgradients
for convex f and x ∈ int dom f,
  f′(x; y) = sup_{g∈∂f(x)} gᵀy
- f′(x; y) is the support function of ∂f(x)
- generalizes f′(x; y) = ∇f(x)ᵀy for differentiable functions
- implies that f′(x; y) exists for all x ∈ int dom f and all y (see page 6)

37/41 proof: if g ∈ ∂f(x), then from page 35,
  f′(x; y) ≥ inf_{α>0} ( f(x) + αgᵀy − f(x) ) / α = gᵀy
it remains to show that f′(x; y) = ĝᵀy for at least one ĝ ∈ ∂f(x)
- f′(x; y) is convex in y with domain Rⁿ, hence subdifferentiable at all y
- let ĝ be a subgradient of f′(x; ·) at y: for all v and all λ ≥ 0,
  λ f′(x; v) = f′(x; λv) ≥ f′(x; y) + ĝᵀ(λv − y)
- taking λ → ∞ shows f′(x; v) ≥ ĝᵀv for all v; from the lower bound on page 34,
  f(x + v) ≥ f(x) + f′(x; v) ≥ f(x) + ĝᵀv for all v
  hence ĝ ∈ ∂f(x)
- taking λ = 0 we see that f′(x; y) ≤ ĝᵀy

38/41 Descent directions and subgradients
y is a descent direction of f at x if f′(x; y) < 0
- the negative gradient of a differentiable f is a descent direction (if ∇f(x) ≠ 0)
- a negative subgradient is not always a descent direction
example: f(x₁, x₂) = |x₁| + 2|x₂|
  g = (1, 2) ∈ ∂f(1, 0), but y = (−1, −2) is not a descent direction at (1, 0)
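The example on this slide is easy to verify numerically: g = (1, 2) satisfies the subgradient inequality at (1, 0), yet every small step along −g increases f.

```python
import numpy as np

f = lambda x: abs(x[0]) + 2 * abs(x[1])   # the slide's f(x1, x2) = |x1| + 2|x2|
x = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])

# g is a subgradient at x: f(z) >= f(x) + g^T (z - x) for all z
rng = np.random.default_rng(3)
for _ in range(200):
    z = rng.normal(size=2)
    assert f(z) >= f(x) + g @ (z - x) - 1e-9

# but moving along y = -g increases f for every small step size
for a in [1e-4, 1e-3, 1e-2]:
    assert f(x - a * g) > f(x)
```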

39/41 Steepest descent direction
definition: the (normalized) steepest descent direction at x ∈ int dom f is
  Δx_nsd = argmin_{‖y‖₂≤1} f′(x; y)
Δx_nsd is the primal solution y of the pair of dual problems (BV §8.1.3):
  (primal)  minimize (over y) f′(x; y)  subject to ‖y‖₂ ≤ 1
  (dual)    maximize (over g) −‖g‖₂  subject to g ∈ ∂f(x)
- the optimal g⋆ is the subgradient with least norm; f′(x; Δx_nsd) = −‖g⋆‖₂
- if 0 ∉ ∂f(x), then Δx_nsd = −g⋆/‖g⋆‖₂

40/41 Subgradients and distance to sublevel sets
if f is convex, f(y) < f(x), and g ∈ ∂f(x), then for small t > 0:
  ‖x − tg − y‖₂² = ‖x − y‖₂² − 2t gᵀ(x − y) + t²‖g‖₂²
                 ≤ ‖x − y‖₂² − 2t (f(x) − f(y)) + t²‖g‖₂²
                 < ‖x − y‖₂²
- −g is a descent direction for ‖x − y‖₂, for any y with f(y) < f(x)
- in particular, −g is a descent direction for the distance to any minimizer of f

41/41 References
- J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algorithms (1993), chapter VI
- Yu. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (2004), section 3.1
- B. T. Polyak, Introduction to Optimization (1987), section 5.1