Subgradients

- subgradients and quasigradients
- subgradient calculus
- optimality conditions via subgradients
- directional derivatives

Prof. S. Boyd, EE392o, Stanford University
Basic inequality

recall the basic inequality for convex differentiable f:

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)

- the first-order approximation of f at x is a global underestimator
- (∇f(x), −1) supports epi f at (x, f(x))

what if f is not differentiable?
Subgradient of a function

g is a subgradient of f (not necessarily convex) at x if

    f(y) ≥ f(x) + gᵀ(y − x)  for all y

(equivalently, (g, −1) supports epi f at (x, f(x)))

[figure: f with affine underestimators f(x₁) + g₁ᵀ(x − x₁), f(x₂) + g₂ᵀ(x − x₂), f(x₂) + g₃ᵀ(x − x₂); g₂, g₃ are subgradients at x₂; g₁ is a subgradient at x₁]
- a subgradient gives an affine global underestimator of f
- if f is convex, it has at least one subgradient at every point in relint dom f
- if f is convex and differentiable, ∇f(x) is a subgradient of f at x
Example

f = max{f₁, f₂}, with f₁, f₂ convex and differentiable

[figure: f₁, f₂, and f = max{f₁, f₂} near a crossing point x₀]

- f₁(x₀) > f₂(x₀): unique subgradient g = ∇f₁(x₀)
- f₂(x₀) > f₁(x₀): unique subgradient g = ∇f₂(x₀)
- f₁(x₀) = f₂(x₀): the subgradients form the line segment [∇f₁(x₀), ∇f₂(x₀)]
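The active-function rule above is easy to implement: return the gradient of whichever function attains the max. A minimal sketch (the particular f₁, f₂, and test point are illustrative choices, not from the slides):

```python
import numpy as np

def subgrad_max(x, f1, g1, f2, g2):
    """Return one subgradient of f = max(f1, f2) at x.

    f1, f2 are assumed convex and differentiable with gradients g1, g2.
    At a tie any convex combination of g1(x) and g2(x) is valid; here
    we simply return g1(x).
    """
    if f1(x) >= f2(x):
        return g1(x)
    return g2(x)

# illustrative data (not from the slides): f1(x) = ||x||^2, f2(x) = 1'x
f1 = lambda x: float(np.dot(x, x))
g1 = lambda x: 2 * x
f2 = lambda x: float(np.sum(x))
g2 = lambda x: np.ones_like(x)

x = np.array([2.0, 0.0])
g = subgrad_max(x, f1, g1, f2, g2)   # f1(x) = 4 > f2(x) = 2, so g = 2x
```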
Subdifferential

the set of all subgradients of f at x is called the subdifferential of f at x, written ∂f(x)

- ∂f(x) is a closed convex set
- ∂f(x) is nonempty (if f is convex and finite near x)
- ∂f(x) = {∇f(x)} if f is differentiable at x
- if ∂f(x) = {g}, then f is differentiable at x and g = ∇f(x)
- in many applications we don't need the complete ∂f(x); it is sufficient to find one g ∈ ∂f(x)
example: f(x) = |x|

[figure: f(x) = |x| and its subdifferential ∂f(x): {−1} for x < 0, the interval [−1, 1] at x = 0, {1} for x > 0]
Calculus of subgradients

assumption: all functions are finite near x

- ∂f(x) = {∇f(x)} if f is differentiable at x
- scaling: ∂(αf) = α ∂f (if α > 0)
- addition: ∂(f₁ + f₂) = ∂f₁ + ∂f₂ (RHS is addition of sets)
- affine transformation of variables: if g(x) = f(Ax + b), then ∂g(x) = Aᵀ ∂f(Ax + b)
- pointwise maximum: if f = max_{i=1,...,m} fᵢ, then

      ∂f(x) = Co ∪ {∂fᵢ(x) | fᵢ(x) = f(x)},

  i.e., the convex hull of the union of the subdifferentials of the active functions at x
special case: if the fᵢ are differentiable,

    ∂f(x) = Co {∇fᵢ(x) | fᵢ(x) = f(x)}

example: f(x) = ‖x‖₁ = max{sᵀx | sᵢ ∈ {−1, 1}}

[figure: ∂f(x) in R²: the square [−1, 1] × [−1, 1] at x = (0, 0); the segment {1} × [−1, 1] at x = (1, 0); the single point (1, 1) at x = (1, 1)]
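For the ℓ₁-norm example, one subgradient is always available in closed form: take sign(xᵢ) at nonzero coordinates and any value in [−1, 1] (here 0) at zero coordinates. A small sketch verifying the subgradient inequality at a sample point (the points are illustrative):

```python
import numpy as np

def subgrad_l1(x):
    """One subgradient of f(x) = ||x||_1, from the max-of-linear
    representation f(x) = max{s'x : s_i in {-1, 1}}: sign(x), with
    0 chosen at zero coordinates (any value in [-1, 1] would do)."""
    return np.sign(x)

x = np.array([1.0, 0.0])
g = subgrad_l1(x)                 # (1, 0), an element of {1} x [-1, 1]

# check the subgradient inequality  ||y||_1 >= ||x||_1 + g'(y - x)
y = np.array([-0.5, 2.0])
lhs = np.abs(y).sum()
rhs = np.abs(x).sum() + g @ (y - x)
```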
Pointwise supremum

if f = sup_{α ∈ A} f_α, then

    cl Co ∪ {∂f_β(x) | f_β(x) = f(x)} ⊆ ∂f(x)

(usually equality holds, but this requires some technical conditions, e.g., A compact, f_α continuous in x and α)

- roughly speaking, ∂f(x) is the closure of the convex hull of the union of the subdifferentials of the active functions
- in any case, if f_β(x) = f(x), then ∂f_β(x) ⊆ ∂f(x)
example: f(x) = λ_max(A(x)) = sup_{‖y‖₂=1} yᵀA(x)y, where A(x) = A₀ + x₁A₁ + ··· + xₙAₙ, Aᵢ ∈ Sᵏ

- f is the pointwise supremum of g_y(x) = yᵀA(x)y over ‖y‖₂ = 1
- g_y is affine in x, with ∇g_y(x) = (yᵀA₁y, ..., yᵀAₙy)
- hence ∂f(x) = Co {∇g_y | A(x)y = λ_max(A(x))y, ‖y‖₂ = 1} (not hard to verify)
- to find one subgradient at x, choose any unit eigenvector y associated with λ_max(A(x)); then (yᵀA₁y, ..., yᵀAₙy) ∈ ∂f(x)
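The eigenvector recipe translates directly into code: form A(x), take a unit eigenvector for the largest eigenvalue, and evaluate the quadratic forms. A sketch with illustrative 2×2 data (the matrices A₀, A₁ are not from the slides):

```python
import numpy as np

def subgrad_lambda_max(A_list, x):
    """One subgradient of f(x) = lambda_max(A0 + x1 A1 + ... + xn An):
    (y' A1 y, ..., y' An y) for a unit eigenvector y of lambda_max."""
    A0, As = A_list[0], A_list[1:]
    A = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    w, V = np.linalg.eigh(A)          # symmetric eigendecomposition, ascending
    y = V[:, -1]                      # unit eigenvector for the largest eigenvalue
    return np.array([y @ Ai @ y for Ai in As])

# illustrative instance: A(x) = diag(1, x1), so f(x) = max(1, x1)
A0 = np.diag([1.0, 0.0])
A1 = np.diag([0.0, 1.0])
g = subgrad_lambda_max([A0, A1], [2.0])   # at x1 = 2 the eigenvector is e2
```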
Minimization

define g(y) as the optimal value of

    minimize    f₀(x)
    subject to  fᵢ(x) ≤ yᵢ, i = 1, ..., m

(fᵢ convex; variable x)

with λ* an optimal dual variable, we have

    g(z) ≥ g(y) − Σᵢ₌₁ᵐ λᵢ*(zᵢ − yᵢ)

i.e., −λ* is a subgradient of g at y
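This can be checked on an instance small enough to solve by hand. A sketch with an illustrative problem (not from the slides): minimize x² subject to −x ≤ y; for y < 0 the constraint is active, x*(y) = −y, g(y) = y², and the optimal dual variable is λ* = −2y, so −λ* = 2y = g′(y):

```python
import numpy as np

# perturbed problem: minimize x^2 subject to -x <= y (illustrative instance)
def g(y):
    """Optimal value: x* = -y when y < 0 (constraint active), else x* = 0."""
    return y**2 if y < 0 else 0.0

y = -1.0
lam = -2 * y                  # optimal dual variable at this y (here lam = 2)

# subgradient inequality  g(z) >= g(y) - lam * (z - y)  for all z
zs = np.linspace(-3.0, 3.0, 101)
ok = all(g(z) >= g(y) - lam * (z - y) - 1e-12 for z in zs)
```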
Subgradients and sublevel sets

g is a subgradient at x means f(y) ≥ f(x) + gᵀ(y − x)

hence f(y) ≤ f(x) ⇒ gᵀ(y − x) ≤ 0

[figure: the sublevel set {y | f(y) ≤ f(x₀)}, with g ∈ ∂f(x₀) defining a supporting hyperplane at x₀, and ∇f(x₁) at a point x₁ of differentiability]
- f differentiable at x₀: ∇f(x₀) is normal to the sublevel set {x | f(x) ≤ f(x₀)}
- f nondifferentiable at x₀: a subgradient defines a supporting hyperplane to the sublevel set through x₀
Quasigradients

g ≠ 0 is a quasigradient of f at x if

    gᵀ(y − x) ≥ 0  ⇒  f(y) ≥ f(x)

holds for all y

[figure: the halfspace {y | gᵀ(y − x) ≥ 0} contained in {y | f(y) ≥ f(x)}]

quasigradients at x form a cone
example: f(x) = (aᵀx + b)/(cᵀx + d), with dom f = {x | cᵀx + d > 0}

g = a − f(x₀)c is a quasigradient at x₀

proof: for cᵀx + d > 0,

    aᵀ(x − x₀) ≥ f(x₀) cᵀ(x − x₀)  ⇒  f(x) ≥ f(x₀)
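The quasigradient property is easy to stress-test numerically: sample points in the domain and check that gᵀ(x − x₀) ≥ 0 forces f(x) ≥ f(x₀). A sketch with illustrative data a, b, c, d, x₀ (not from the slides):

```python
import numpy as np

# illustrative linear-fractional data (not from the slides)
a, b = np.array([1.0, -2.0]), 0.5
c, d = np.array([0.5, 1.0]), 2.0

def f(x):
    """f(x) = (a'x + b) / (c'x + d), defined where c'x + d > 0."""
    return (a @ x + b) / (c @ x + d)

x0 = np.array([0.3, 0.1])
g = a - f(x0) * c                    # quasigradient at x0

# sample the domain: g'(x - x0) >= 0 should imply f(x) >= f(x0)
rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x = rng.uniform(-1.0, 1.0, 2)
    if c @ x + d > 0 and g @ (x - x0) >= 0:
        ok = ok and (f(x) >= f(x0) - 1e-12)
```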
example: degree of a₁ + a₂t + ··· + aₙtⁿ⁻¹:

    f(a) = min{i | a_{i+2} = ··· = aₙ = 0}

g = sign(a_{k+1}) e_{k+1} (with k = f(a)) is a quasigradient at a ≠ 0

proof: gᵀ(b − a) ≥ 0 implies

    sign(a_{k+1}) b_{k+1} ≥ sign(a_{k+1}) a_{k+1} = |a_{k+1}| > 0,

so b_{k+1} ≠ 0, hence f(b) ≥ f(a)
Optimality conditions — unconstrained

recall that for f convex and differentiable,

    f(x*) = inf_x f(x)  ⇔  0 = ∇f(x*)

generalization to nondifferentiable convex f:

    f(x*) = inf_x f(x)  ⇔  0 ∈ ∂f(x*)
[figure: f with a horizontal supporting line at x₀, i.e., 0 ∈ ∂f(x₀)]

proof: by definition (!),

    f(y) ≥ f(x*) + 0ᵀ(y − x*) for all y  ⇔  0 ∈ ∂f(x*)

... seems trivial but isn't
Example: piecewise linear minimization

f(x) = max_{i=1,...,m} (aᵢᵀx + bᵢ)

x* minimizes f

⇔  0 ∈ ∂f(x*) = Co {aᵢ | aᵢᵀx* + bᵢ = f(x*)}

⇔  there is a λ with

    λ ⪰ 0,  1ᵀλ = 1,  Σᵢ₌₁ᵐ λᵢaᵢ = 0

where λᵢ = 0 if aᵢᵀx* + bᵢ < f(x*)
... but these are the KKT conditions for the epigraph form

    minimize    t
    subject to  aᵢᵀx + bᵢ ≤ t, i = 1, ..., m

with dual

    maximize    bᵀλ
    subject to  λ ⪰ 0, Aᵀλ = 0, 1ᵀλ = 1
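The optimality certificate (λ ⪰ 0, 1ᵀλ = 1, Σλᵢaᵢ = 0 over the active set) can be checked mechanically. A sketch with an illustrative instance where f(x) = ‖x‖_∞ and x* = 0 is optimal (the data are not from the slides; here uniform weights on the active set happen to work):

```python
import numpy as np

# f(x) = max_i (a_i'x + b_i); with this data f(x) = ||x||_inf, minimized at 0
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.zeros(4)
x_star = np.zeros(2)

fval = (A @ x_star + b).max()
active = np.isclose(A @ x_star + b, fval)    # all four pieces active at x* = 0

# certificate: lam >= 0, 1'lam = 1, sum_i lam_i a_i = 0, lam_i = 0 off the active set
lam = np.where(active, 1.0 / active.sum(), 0.0)
residual = A.T @ lam                          # should be the zero vector
```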
Optimality conditions — constrained

    minimize    f₀(x)
    subject to  fᵢ(x) ≤ 0, i = 1, ..., m

we assume
- fᵢ convex, defined on Rⁿ (hence subdifferentiable)
- strict feasibility (Slater's condition)

x* is primal optimal (λ* is dual optimal) iff

    fᵢ(x*) ≤ 0,  λᵢ* ≥ 0
    0 ∈ ∂f₀(x*) + Σᵢ₌₁ᵐ λᵢ* ∂fᵢ(x*)
    λᵢ* fᵢ(x*) = 0

... generalizes KKT for nondifferentiable fᵢ
Directional derivative

the directional derivative of f at x in the direction δx is

    f′(x; δx) = lim_{h ↘ 0} (f(x + h δx) − f(x))/h

(can be +∞ or −∞)

- f convex, finite near x ⇒ f′(x; δx) exists
- f is differentiable at x if and only if, for some g (= ∇f(x)) and all δx, f′(x; δx) = gᵀδx (i.e., f′(x; δx) is a linear function of δx)
Directional derivative and subdifferential

general formula for convex f:

    f′(x; δx) = sup_{g ∈ ∂f(x)} gᵀδx

[figure: ∂f(x) with the maximizing subgradient in direction δx]
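The sup formula can be verified numerically for f(x) = ‖x‖₁ at x = (1, 0), where ∂f(x) = {1} × [−1, 1], so sup_{g ∈ ∂f(x)} gᵀδx = δx₁ + |δx₂|. A sketch comparing this with a one-sided difference quotient (the point and direction are illustrative):

```python
import numpy as np

f = lambda x: np.abs(x).sum()          # f(x) = ||x||_1
x = np.array([1.0, 0.0])               # ∂f(x) = {1} x [-1, 1]
dx = np.array([0.5, -2.0])

# numerical one-sided limit  (f(x + h dx) - f(x)) / h  for small h > 0
h = 1e-8
num = (f(x + h * dx) - f(x)) / h

# sup over the subdifferential: 1*dx1 + sup_{|g2|<=1} g2*dx2 = dx1 + |dx2|
sup_val = dx[0] + abs(dx[1])           # = 2.5 here
```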
Descent directions

δx is a descent direction for f at x if f′(x; δx) < 0

for differentiable f, δx = −∇f(x) is always a descent direction (except when it is zero)

warning: for nondifferentiable (convex) functions, δx = −g, with g ∈ ∂f(x), need not be a descent direction

example: f(x) = |x₁| + 2|x₂|

[figure: level curves of f with a subgradient g at a point on the x₁-axis; −g points out of the sublevel set]
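The warning can be made concrete for f(x) = |x₁| + 2|x₂| at x = (1, 0), where ∂f(x) = {1} × [−2, 2]. Taking the subgradient g = (1, 2) and stepping along −g increases f for every step size t > 0 (a sketch; the point and subgradient are one choice among many):

```python
import numpy as np

f = lambda x: abs(x[0]) + 2 * abs(x[1])
x = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])       # valid subgradient: ∂f(x) = {1} x [-2, 2]

# moving along -g increases f: f(x - t g) = |1 - t| + 4t > 1 = f(x) for t > 0
t = 0.1
step_val = f(x - t * g)        # f(0.9, -0.2) = 0.9 + 0.4 = 1.3
```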
Subgradients and distance to sublevel sets

if f is convex, f(z) < f(x), g ∈ ∂f(x), then for small t > 0,

    ‖x − tg − z‖₂ < ‖x − z‖₂

- thus −g is a descent direction for ‖x − z‖₂, for any z with f(z) < f(x) (e.g., x*)
- the negative subgradient is a descent direction for the distance to the optimal point

proof:

    ‖x − tg − z‖₂² = ‖x − z‖₂² − 2t gᵀ(x − z) + t²‖g‖₂²
                   ≤ ‖x − z‖₂² − 2t (f(x) − f(z)) + t²‖g‖₂²
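Continuing the same f(x) = |x₁| + 2|x₂| example: although −g is not a descent direction for f at x = (1, 0), a small step along −g does shrink the distance to the minimizer z = 0. Per the proof, any t < 2(f(x) − f(z))/‖g‖₂² works (here 2/5); a sketch with t = 0.1:

```python
import numpy as np

f = lambda x: abs(x[0]) + 2 * abs(x[1])   # minimized at z = 0
x = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])                  # g in ∂f(x); -g does NOT decrease f
z = np.zeros(2)                           # but f(z) = 0 < f(x) = 1

# for small t the step x - t g moves strictly closer to z
t = 0.1                                   # below the bound 2(f(x)-f(z))/||g||^2 = 0.4
d_before = np.linalg.norm(x - z)          # = 1
d_after = np.linalg.norm(x - t * g - z)   # = ||(0.9, -0.2)|| ~ 0.922
```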
Descent directions and optimality

fact: for f convex, finite near x, either
- 0 ∈ ∂f(x) (in which case x minimizes f), or
- there is a descent direction for f at x

i.e., x is optimal (minimizes f) iff there is no descent direction for f at x

proof: define δx_sd = −argmin_{z ∈ ∂f(x)} ‖z‖

if δx_sd = 0, then 0 ∈ ∂f(x), so x is optimal; otherwise

    f′(x; δx_sd) = −(inf_{z ∈ ∂f(x)} ‖z‖)² < 0,

so δx_sd is a descent direction
[figure: ∂f(x) with the minimum-norm subgradient defining δx_sd]

the idea extends to the constrained case (feasible descent direction)
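When the subdifferential is the segment between two active gradients (as in the max-of-two-functions example), the minimum-norm subgradient δx_sd is the projection of 0 onto that segment, which has a closed form. A sketch with illustrative gradients (not from the slides):

```python
import numpy as np

def min_norm_on_segment(g1, g2):
    """Minimum-norm point on the segment [g1, g2], i.e., the projection
    of 0 onto Co{g1, g2}; its negative is the steepest descent direction
    when the subdifferential is this segment."""
    d = g2 - g1
    theta = np.clip(-(g1 @ d) / (d @ d), 0.0, 1.0)   # clamp to the segment
    return g1 + theta * d

g1 = np.array([1.0, 1.0])            # illustrative active gradients
g2 = np.array([1.0, -1.0])
g_sd = min_norm_on_segment(g1, g2)   # midpoint (1, 0); dx_sd = -g_sd
```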