Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives


1 Subgradients
subgradients
strong and weak subgradient calculus
optimality conditions via subgradients
directional derivatives
Prof. S. Boyd, EE364b, Stanford University

2 Basic inequality
recall basic inequality for convex differentiable $f$:
$f(y) \geq f(x) + \nabla f(x)^T (y - x)$
first-order approximation of $f$ at $x$ is global underestimator
$(\nabla f(x), -1)$ supports $\operatorname{epi} f$ at $(x, f(x))$
what if $f$ is not differentiable?

3 Subgradient of a function
$g$ is a subgradient of $f$ (not necessarily convex) at $x$ if
$f(y) \geq f(x) + g^T (y - x)$ for all $y$
[figure: graph of $f(x)$ with the affine underestimators $f(x_1) + g_1^T (x - x_1)$, $f(x_2) + g_2^T (x - x_2)$, and $f(x_2) + g_3^T (x - x_2)$ at the points $x_1$, $x_2$]
$g_2$, $g_3$ are subgradients at $x_2$; $g_1$ is a subgradient at $x_1$

4 $g$ is a subgradient of $f$ at $x$ iff $(g, -1)$ supports $\operatorname{epi} f$ at $(x, f(x))$
$g$ is a subgradient iff $f(x) + g^T (y - x)$ is a global (affine) underestimator of $f$
if $f$ is convex and differentiable, $\nabla f(x)$ is a subgradient of $f$ at $x$
subgradients come up in several contexts:
  algorithms for nondifferentiable convex optimization
  convex analysis, e.g., optimality conditions, duality for nondifferentiable problems
(if $f(y) \leq f(x) + g^T (y - x)$ for all $y$, then $g$ is a supergradient)

5 Example
$f = \max\{f_1, f_2\}$, with $f_1$, $f_2$ convex and differentiable
[figure: graphs of $f_1(x)$, $f_2(x)$, and $f(x)$ around the point $x_0$]
$f_1(x_0) > f_2(x_0)$: unique subgradient $g = \nabla f_1(x_0)$
$f_2(x_0) > f_1(x_0)$: unique subgradient $g = \nabla f_2(x_0)$
$f_1(x_0) = f_2(x_0)$: subgradients form a line segment $[\nabla f_1(x_0), \nabla f_2(x_0)]$

6 Subdifferential
set of all subgradients of $f$ at $x$ is called the subdifferential of $f$ at $x$, denoted $\partial f(x)$
$\partial f(x)$ is a closed convex set (can be empty)
if $f$ is convex,
  $\partial f(x)$ is nonempty, for $x \in \operatorname{relint} \operatorname{dom} f$
  $\partial f(x) = \{\nabla f(x)\}$, if $f$ is differentiable at $x$
  if $\partial f(x) = \{g\}$, then $f$ is differentiable at $x$ and $g = \nabla f(x)$

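7 Example
$f(x) = |x|$
[figure: left plot, graph of $f(x) = |x|$; right plot, graph of $\partial f(x)$ against $x$, with vertical axis marked at $1$ and $-1$]
righthand plot shows $\{(x, g) \mid x \in \mathbf{R},\ g \in \partial f(x)\}$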
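
The example above can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names `abs_subdifferential` and `is_subgradient` are hypothetical, and the subgradient inequality is only tested on a finite grid of points, so it is a sanity check rather than a proof.

```python
def abs_subdifferential(x):
    """Return the subdifferential of f(x) = |x| as a closed interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)  # at the kink, every slope between -1 and 1 works


def is_subgradient(g, x, test_points):
    """Check f(y) >= f(x) + g*(y - x) for f = |.| on a finite set of points y."""
    return all(abs(y) >= abs(x) + g * (y - x) - 1e-12 for y in test_points)


ys = [k / 10.0 for k in range(-50, 51)]
lo, hi = abs_subdifferential(0.0)

# Slopes inside [-1, 1] pass the inequality at x = 0; a slope outside fails.
inside_ok = all(is_subgradient(g, 0.0, ys) for g in (lo, 0.0, 0.5, hi))
outside_fails = not is_subgradient(1.5, 0.0, ys)
```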

8 Subgradient calculus
weak subgradient calculus: formulas for finding one subgradient $g \in \partial f(x)$
strong subgradient calculus: formulas for finding the whole subdifferential $\partial f(x)$, i.e., all subgradients of $f$ at $x$
many algorithms for nondifferentiable convex optimization require only one subgradient at each step, so weak calculus suffices
some algorithms, optimality conditions, etc., need whole subdifferential
roughly speaking: if you can compute $f(x)$, you can usually compute a $g \in \partial f(x)$
we'll assume that $f$ is convex, and $x \in \operatorname{relint} \operatorname{dom} f$

9 Some basic rules
$\partial f(x) = \{\nabla f(x)\}$ if $f$ is differentiable at $x$
scaling: $\partial(\alpha f) = \alpha\, \partial f$ (if $\alpha > 0$)
addition: $\partial(f_1 + f_2) = \partial f_1 + \partial f_2$ (RHS is addition of sets)
affine transformation of variables: if $g(x) = f(Ax + b)$, then $\partial g(x) = A^T \partial f(Ax + b)$
finite pointwise maximum: if $f = \max_{i=1,\ldots,m} f_i$, then
$\partial f(x) = \operatorname{Co} \bigcup \{\partial f_i(x) \mid f_i(x) = f(x)\}$,
i.e., convex hull of union of subdifferentials of active functions at $x$

10 $f(x) = \max\{f_1(x), \ldots, f_m(x)\}$, with $f_1, \ldots, f_m$ differentiable
$\partial f(x) = \operatorname{Co}\{\nabla f_i(x) \mid f_i(x) = f(x)\}$
example: $f(x) = \|x\|_1 = \max\{s^T x \mid s_i \in \{-1, 1\}\}$
[figure: three panels showing $\partial f(x)$ at $x = (0, 0)$, at $x = (1, 0)$, and at $x = (1, 1)$]
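
For the $\ell_1$-norm example, the active functions can be enumerated directly. The sketch below is illustrative only (the function name `active_sign_vectors` and the tolerance are assumptions): it lists the sign vectors $s$ with $s^T x = \|x\|_1$, whose convex hull is $\partial f(x)$ by the rule above. It enumerates all $2^n$ sign vectors, so it is only sensible for small $n$.

```python
from itertools import product


def active_sign_vectors(x, tol=1e-12):
    """List the sign vectors s in {-1, 1}^n that attain s^T x = ||x||_1."""
    l1 = sum(abs(xi) for xi in x)
    active = []
    for s in product((-1, 1), repeat=len(x)):
        if abs(sum(si * xi for si, xi in zip(s, x)) - l1) <= tol:
            active.append(s)
    return active


# The three points named on the slide.
at_origin = active_sign_vectors([0.0, 0.0])   # all four sign vectors are active
at_axis = active_sign_vectors([1.0, 0.0])     # two active vectors: a segment
at_corner = active_sign_vectors([1.0, 1.0])   # one active vector: a single point
```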

11 Pointwise supremum
if $f = \sup_{\alpha \in \mathcal{A}} f_\alpha$,
$\operatorname{cl} \operatorname{Co} \bigcup \{\partial f_\beta(x) \mid f_\beta(x) = f(x)\} \subseteq \partial f(x)$
(usually get equality, but requires some technical conditions to hold, e.g., $\mathcal{A}$ compact, $f_\alpha$ continuous in $x$ and $\alpha$)
roughly speaking, $\partial f(x)$ is closure of convex hull of union of subdifferentials of active functions

12 Weak rule for pointwise supremum
$f = \sup_{\alpha \in \mathcal{A}} f_\alpha$
find any $\beta$ for which $f_\beta(x) = f(x)$ (assuming supremum is achieved)
choose any $g \in \partial f_\beta(x)$
then, $g \in \partial f(x)$
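
A minimal sketch of the weak rule, assuming the special case of a finite maximum of affine functions $f(x) = \max_i (a_i^T x + b_i)$: pick one active index $j$ and return $a_j$. The function name and the data `A`, `b` are made up for illustration.

```python
def max_affine_value_and_subgradient(A, b, x):
    """For f(x) = max_i (a_i^T x + b_i), return (f(x), a_j) for one active index j."""
    values = [sum(aij * xj for aij, xj in zip(ai, x)) + bi for ai, bi in zip(A, b)]
    j = max(range(len(values)), key=values.__getitem__)  # any maximizing index works
    return values[j], list(A[j])


# Hypothetical data: three affine functions of a 2-vector.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0, -1.0]
x = [2.0, 1.0]
fx, g = max_affine_value_and_subgradient(A, b, x)
```

Ties are broken by taking the first maximizing index; any other active index would give an equally valid subgradient.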

13 example
$f(x) = \lambda_{\max}(A(x)) = \sup_{\|y\|_2 = 1} y^T A(x) y$
where $A(x) = A_0 + x_1 A_1 + \cdots + x_n A_n$, $A_i \in \mathbf{S}^k$
$f$ is pointwise supremum of $g_y(x) = y^T A(x) y$ over $\|y\|_2 = 1$
$g_y$ is affine in $x$, with $\nabla g_y(x) = (y^T A_1 y, \ldots, y^T A_n y)$
hence, $\partial f(x) \supseteq \operatorname{Co}\{\nabla g_y \mid A(x) y = \lambda_{\max}(A(x))\, y,\ \|y\|_2 = 1\}$ (in fact equality holds here)
to find one subgradient at $x$, can choose any unit eigenvector $y$ associated with $\lambda_{\max}(A(x))$; then
$(y^T A_1 y, \ldots, y^T A_n y) \in \partial f(x)$

14 Expectation
$f(x) = \mathbf{E}\, f(x, u)$, with $f$ convex in $x$ for each $u$, $u$ a random variable
for each $u$, choose any $g_u \in \partial_x f(x, u)$ (so $u \mapsto g_u$ is a function)
then, $g = \mathbf{E}\, g_u \in \partial f(x)$
Monte Carlo method for (approximately) computing $f(x)$ and a $g \in \partial f(x)$:
  generate independent samples $u_1, \ldots, u_K$ from distribution of $u$
  $f(x) \approx (1/K) \sum_{i=1}^K f(x, u_i)$
  for each $i$ choose $g_i \in \partial_x f(x, u_i)$
  $g = (1/K) \sum_{i=1}^K g_i$ is an (approximate) subgradient (more on this later)
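
The Monte Carlo recipe can be sketched as follows. Everything specific here is an assumption chosen for illustration: the integrand $f(x, u) = |x - u|$, the distribution $u \sim \mathrm{Uniform}(0, 1)$, the sample size, and the seed. For this choice the exact values are known in closed form for $0 \leq x \leq 1$, namely $(x^2 + (1 - x)^2)/2$ for the expectation and $2x - 1$ for its derivative, which gives something to compare against.

```python
import random


def mc_value_and_subgradient(x, K=10000, seed=0):
    """Estimate E|x - u| and one (approximate) subgradient, with u ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(K)]
    # Sample average of f(x, u_i).
    f_hat = sum(abs(x - u) for u in samples) / K
    # g_i is a subgradient of |x - u_i| in x: the sign of x - u_i (0 is valid at a tie).
    g_hat = sum((1.0 if x > u else -1.0 if x < u else 0.0) for u in samples) / K
    return f_hat, g_hat


f_hat, g_hat = mc_value_and_subgradient(0.75)
f_exact = (0.75 ** 2 + 0.25 ** 2) / 2.0   # closed form for this particular example
g_exact = 2 * 0.75 - 1.0
```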

15 Minimization
define $g(y)$ as the optimal value of
  minimize $f_0(x)$
  subject to $f_i(x) \leq y_i$, $i = 1, \ldots, m$
($f_i$ convex; variable $x$)
with $\lambda^\star$ an optimal dual variable, we have
$g(z) \geq g(y) - \sum_{i=1}^m \lambda_i^\star (z_i - y_i)$
i.e., $-\lambda^\star$ is a subgradient of $g$ at $y$

16 Composition
$f(x) = h(f_1(x), \ldots, f_k(x))$, with $h$ convex nondecreasing, $f_i$ convex
find $q \in \partial h(f_1(x), \ldots, f_k(x))$, $g_i \in \partial f_i(x)$
then, $g = q_1 g_1 + \cdots + q_k g_k \in \partial f(x)$
reduces to standard formula for differentiable $h$, $f_i$
proof:
$f(y) = h(f_1(y), \ldots, f_k(y))$
$\geq h(f_1(x) + g_1^T (y - x), \ldots, f_k(x) + g_k^T (y - x))$
$\geq h(f_1(x), \ldots, f_k(x)) + q^T (g_1^T (y - x), \ldots, g_k^T (y - x))$
$= f(x) + g^T (y - x)$

17 Subgradients and sublevel sets
$g$ is a subgradient at $x$ means $f(y) \geq f(x) + g^T (y - x)$
hence $f(y) \leq f(x) \implies g^T (y - x) \leq 0$
[figure: sublevel set $\{x \mid f(x) \leq f(x_0)\}$, with $g \in \partial f(x_0)$ drawn at $x_0$ and $\nabla f(x_1)$ drawn at $x_1$]

18 $f$ differentiable at $x_0$: $\nabla f(x_0)$ is normal to the sublevel set $\{x \mid f(x) \leq f(x_0)\}$
$f$ nondifferentiable at $x_0$: subgradient defines a supporting hyperplane to sublevel set through $x_0$

19 Quasigradients
$g \neq 0$ is a quasigradient of $f$ at $x$ if
$g^T (y - x) \geq 0 \implies f(y) \geq f(x)$
holds for all $y$
[figure: the region $f(y) \leq f(x)$, with a quasigradient $g$ drawn at $x$]
quasigradients at $x$ form a cone

20 example:
$f(x) = \dfrac{a^T x + b}{c^T x + d}$, ($\operatorname{dom} f = \{x \mid c^T x + d > 0\}$)
$g = a - f(x_0)\, c$ is a quasigradient at $x_0$
proof: for $c^T x + d > 0$:
$a^T (x - x_0) \geq f(x_0)\, c^T (x - x_0) \implies f(x) \geq f(x_0)$
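
The quasigradient property of $g = a - f(x_0)\, c$ can be spot-checked on a grid. This is a sketch under assumed data: the vectors `a`, `c`, the scalars `b`, `d`, the point `x0`, and the grid are all invented for illustration, and a grid check is evidence, not a proof.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_fractional(a, b, c, d, x):
    """f(x) = (a^T x + b) / (c^T x + d) on the domain c^T x + d > 0."""
    den = dot(c, x) + d
    if den <= 0:
        raise ValueError("x is outside dom f (need c^T x + d > 0)")
    return (dot(a, x) + b) / den


def quasigradient(a, b, c, d, x0):
    """g = a - f(x0) c, the quasigradient formula from the slide."""
    f0 = linear_fractional(a, b, c, d, x0)
    return [ai - f0 * ci for ai, ci in zip(a, c)]


# Hypothetical problem data; every grid point below satisfies c^T y + d > 0.
a, b, c, d = [1.0, 2.0], 0.0, [1.0, 0.0], 3.0
x0 = [1.0, 1.0]
g = quasigradient(a, b, c, d, x0)
f0 = linear_fractional(a, b, c, d, x0)

grid = [[i / 2.0, j / 2.0] for i in range(-4, 11) for j in range(-4, 11)]
# Points with g^T (y - x0) >= 0 but f(y) < f(x0) would contradict the claim.
violations = [
    y for y in grid
    if dot(g, [yi - xi for yi, xi in zip(y, x0)]) >= 0
    and linear_fractional(a, b, c, d, y) < f0 - 1e-12
]
```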

21 example: degree of $a_1 + a_2 t + \cdots + a_n t^{n-1}$
$f(a) = \min\{i \mid a_{i+2} = \cdots = a_n = 0\}$
$g = \operatorname{sign}(a_{k+1})\, e_{k+1}$ (with $k = f(a)$) is a quasigradient at $a \neq 0$
proof:
$g^T (b - a) = \operatorname{sign}(a_{k+1})\, b_{k+1} - |a_{k+1}| \geq 0$
implies $b_{k+1} \neq 0$

22 Optimality conditions: unconstrained
recall for $f$ convex, differentiable,
$f(x^\star) = \inf_x f(x) \iff 0 = \nabla f(x^\star)$
generalization to nondifferentiable convex $f$:
$f(x^\star) = \inf_x f(x) \iff 0 \in \partial f(x^\star)$

23 [figure: graph of $f(x)$ against $x$, marking the point $x_0$ where $0 \in \partial f(x_0)$]
proof. by definition (!)
$f(y) \geq f(x^\star) + 0^T (y - x^\star)$ for all $y \iff 0 \in \partial f(x^\star)$
... seems trivial but isn't

24 Example: piecewise linear minimization
$f(x) = \max_{i=1,\ldots,m} (a_i^T x + b_i)$
$x^\star$ minimizes $f$
$\iff 0 \in \partial f(x^\star) = \operatorname{Co}\{a_i \mid a_i^T x^\star + b_i = f(x^\star)\}$
$\iff$ there is a $\lambda$ with
$\lambda \succeq 0$, $\mathbf{1}^T \lambda = 1$, $\sum_{i=1}^m \lambda_i a_i = 0$
where $\lambda_i = 0$ if $a_i^T x^\star + b_i < f(x^\star)$

25 ... but these are the KKT conditions for the epigraph form
  minimize $t$
  subject to $a_i^T x + b_i \leq t$, $i = 1, \ldots, m$
with dual
  maximize $b^T \lambda$
  subject to $\lambda \succeq 0$, $A^T \lambda = 0$, $\mathbf{1}^T \lambda = 1$
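
The optimality condition for piecewise linear minimization can be turned into a certificate check: given a candidate $x$ and weights $\lambda$, test nonnegativity, $\mathbf{1}^T \lambda = 1$, $\sum_i \lambda_i a_i = 0$, and $\lambda_i = 0$ on inactive pieces. The sketch below assumes a hypothetical function name, tolerance, and one-dimensional data; it verifies a supplied $\lambda$ and does not search for one.

```python
def certifies_minimizer(A, b, x, lam, tol=1e-9):
    """Check whether lam certifies that x minimizes f(x) = max_i (a_i^T x + b_i)."""
    values = [sum(aij * xj for aij, xj in zip(ai, x)) + bi for ai, bi in zip(A, b)]
    fx = max(values)
    nonnegative = all(l >= -tol for l in lam)
    sums_to_one = abs(sum(lam) - 1.0) <= tol
    # sum_i lam_i a_i must vanish in every coordinate.
    combo = [sum(l * ai[j] for l, ai in zip(lam, A)) for j in range(len(x))]
    zero_combination = all(abs(cj) <= tol for cj in combo)
    # lam_i may be positive only on active pieces, where a_i^T x + b_i = f(x).
    only_active = all(l <= tol or fx - v <= tol for l, v in zip(lam, values))
    return nonnegative and sums_to_one and zero_combination and only_active


# Hypothetical data: f(x) = max(x, -x, 0.5 x - 1), which is minimized at x = 0.
A = [[1.0], [-1.0], [0.5]]
b = [0.0, 0.0, -1.0]
good = certifies_minimizer(A, b, [0.0], [0.5, 0.5, 0.0])
bad_weights = certifies_minimizer(A, b, [0.0], [0.5, 0.0, 0.5])
bad_point = certifies_minimizer(A, b, [1.0], [0.5, 0.5, 0.0])
```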

26 Optimality conditions: constrained
  minimize $f_0(x)$
  subject to $f_i(x) \leq 0$, $i = 1, \ldots, m$
we assume
  $f_i$ convex, defined on $\mathbf{R}^n$ (hence subdifferentiable)
  strict feasibility (Slater's condition)
$x^\star$ is primal optimal ($\lambda^\star$ is dual optimal) iff
$f_i(x^\star) \leq 0$, $\lambda_i^\star \geq 0$
$0 \in \partial f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star\, \partial f_i(x^\star)$
$\lambda_i^\star f_i(x^\star) = 0$
... generalizes KKT for nondifferentiable $f_i$

27 Directional derivative
directional derivative of $f$ at $x$ in the direction $\delta x$ is
$f'(x; \delta x) = \lim_{h \searrow 0} \dfrac{f(x + h\, \delta x) - f(x)}{h}$
can be $+\infty$ or $-\infty$
$f$ convex, finite near $x$ $\implies$ $f'(x; \delta x)$ exists
$f$ differentiable at $x$ if and only if, for some $g$ ($= \nabla f(x)$) and all $\delta x$, $f'(x; \delta x) = g^T \delta x$ (i.e., $f'(x; \delta x)$ is a linear function of $\delta x$)

28 Directional derivative and subdifferential
general formula for convex $f$:
$f'(x; \delta x) = \sup_{g \in \partial f(x)} g^T \delta x$
[figure: the set $\partial f(x)$ and a direction $\delta x$]
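
As an illustration of the sup formula, take $f(x) = \|x\|_1$ (a choice made here, not on this slide). Its subdifferential is a product of intervals, so the supremum separates by coordinate: $\operatorname{sign}(x_i)\, \delta x_i$ where $x_i \neq 0$ and $|\delta x_i|$ where $x_i = 0$. The sketch compares this with a one-sided difference quotient; the helper names and the step size `h` are assumptions.

```python
def l1_norm(x):
    return sum(abs(xi) for xi in x)


def l1_directional_derivative(x, dx):
    """sup of g^T dx over g in the subdifferential of the l1 norm at x."""
    total = 0.0
    for xi, di in zip(x, dx):
        if xi > 0:
            total += di        # g_i = 1 is forced
        elif xi < 0:
            total -= di        # g_i = -1 is forced
        else:
            total += abs(di)   # g_i ranges over [-1, 1]; the sup is |dx_i|
    return total


def one_sided_difference(f, x, dx, h=1e-6):
    """Approximate the limit defining f'(x; dx) with a small positive step h."""
    shifted = [xi + h * di for xi, di in zip(x, dx)]
    return (f(shifted) - f(x)) / h


x = [1.0, 0.0, -2.0]
dx = [-1.0, 3.0, 0.5]
from_formula = l1_directional_derivative(x, dx)
from_limit = one_sided_difference(l1_norm, x, dx)
```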

29 Descent directions
$\delta x$ is a descent direction for $f$ at $x$ if $f'(x; \delta x) < 0$
for differentiable $f$, $\delta x = -\nabla f(x)$ is always a descent direction (except when it is zero)
warning: for nondifferentiable (convex) functions, $\delta x = -g$, with $g \in \partial f(x)$, need not be descent direction
example: $f(x) = |x_1| + 2|x_2|$
[figure: a level curve of $f$ in the $(x_1, x_2)$ plane, with a subgradient $g$ drawn at a point on the $x_1$ axis]
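
The warning can be seen numerically. The sketch below takes $f(x) = |x_1| + 2|x_2|$ as the illustrative function, with the point $x = (1, 0)$ and the subgradient $g = (1, 2)$ chosen here as assumptions. Stepping along $-g$ gives $f(x - tg) = |1 - t| + 4t$, which exceeds $f(x) = 1$ for every small $t > 0$.

```python
def f(x):
    """Illustrative nondifferentiable convex function |x_1| + 2 |x_2|."""
    return abs(x[0]) + 2.0 * abs(x[1])


x = [1.0, 0.0]
g = [1.0, 2.0]  # lies in the subdifferential {1} x [-2, 2] of f at x

steps = [0.001, 0.01, 0.1]
values_along_minus_g = [f([x[0] - t * g[0], x[1] - t * g[1]]) for t in steps]

# Every step along -g increases f, so -g is not a descent direction at x.
increases = all(v > f(x) for v in values_along_minus_g)
```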

30 Subgradients and distance to sublevel sets
if $f$ is convex, $f(z) < f(x)$, $g \in \partial f(x)$, then for small $t > 0$,
$\|x - tg - z\|_2 < \|x - z\|_2$
thus $-g$ is descent direction for $\|x - z\|_2$, for any $z$ with $f(z) < f(x)$ (e.g., $x^\star$)
negative subgradient is descent direction for distance to optimal point
proof:
$\|x - tg - z\|_2^2 = \|x - z\|_2^2 - 2t\, g^T (x - z) + t^2 \|g\|_2^2$
$\leq \|x - z\|_2^2 - 2t\,(f(x) - f(z)) + t^2 \|g\|_2^2$
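
A small numeric illustration of this fact, with assumed data: $f(x) = |x_1| + 2|x_2|$, $x = (1, 0)$, $g = (1, 2)$, and $z = (0, 0)$, the minimizer. From the bound in the proof, the distance is guaranteed to shrink for $0 < t < 2\,(f(x) - f(z)) / \|g\|_2^2$; the variable name `t_max` for that threshold is introduced here.

```python
import math


def f(x):
    """Illustrative function |x_1| + 2 |x_2|, minimized at the origin."""
    return abs(x[0]) + 2.0 * abs(x[1])


def dist(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


x = [1.0, 0.0]
g = [1.0, 2.0]   # a subgradient of f at x
z = [0.0, 0.0]   # any point with f(z) < f(x); here the minimizer

# Step sizes below this threshold provably reduce the distance to z.
t_max = 2.0 * (f(x) - f(z)) / sum(gi * gi for gi in g)


def step(t):
    return [xi - t * gi for xi, gi in zip(x, g)]


closer_for_small_t = all(dist(step(t), z) < dist(x, z) for t in (0.05, 0.2, 0.39))
farther_for_large_t = dist(step(0.5), z) > dist(x, z)
```

With this data the function value goes up along $-g$ while the distance to the minimizer goes down, which is the property subgradient methods rely on.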

31 Descent directions and optimality
fact: for $f$ convex, finite near $x$, either
  $0 \in \partial f(x)$ (in which case $x$ minimizes $f$), or
  there is a descent direction for $f$ at $x$
i.e., $x$ is optimal (minimizes $f$) iff there is no descent direction for $f$ at $x$
proof: define $\delta x_{\mathrm{sd}} = -\operatorname{argmin}_{z \in \partial f(x)} \|z\|_2$
if $\delta x_{\mathrm{sd}} = 0$, then $0 \in \partial f(x)$, so $x$ is optimal; otherwise
$f'(x; \delta x_{\mathrm{sd}}) = -\left(\inf_{z \in \partial f(x)} \|z\|_2\right)^2 < 0$,
so $\delta x_{\mathrm{sd}}$ is a descent direction
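
The construction in the proof can be sketched for a weighted $\ell_1$ function $f(x) = \sum_i w_i |x_i|$ with $w_i > 0$, an assumption made here because its subdifferential is a box and the minimum-norm element is found by clipping zero to each interval. The function names and weights are hypothetical.

```python
def weighted_l1_subdifferential(x, w):
    """Per-coordinate intervals whose product is the subdifferential of sum_i w_i |x_i|."""
    box = []
    for xi, wi in zip(x, w):
        if xi > 0:
            box.append((wi, wi))
        elif xi < 0:
            box.append((-wi, -wi))
        else:
            box.append((-wi, wi))
    return box


def min_norm_point(box):
    """Project 0 onto the box: clip 0 to each interval independently."""
    return [min(max(0.0, lo), hi) for lo, hi in box]


def steepest_descent_direction(x, w):
    """Negative of the minimum-norm subgradient; all zeros means x is optimal."""
    return [-zi for zi in min_norm_point(weighted_l1_subdifferential(x, w))]


w = [1.0, 2.0]
z = min_norm_point(weighted_l1_subdifferential([1.0, 0.0], w))
dx_sd = steepest_descent_direction([1.0, 0.0], w)
predicted_slope = -sum(zi * zi for zi in z)       # f'(x; dx_sd) from the proof
at_minimizer = steepest_descent_direction([0.0, 0.0], w)
```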

32 [figure: the set $\partial f(x)$ and the direction $\delta x_{\mathrm{sd}}$]
idea extends to constrained case (feasible descent direction)
