5. Subgradient method

Size: px

Start display at page:

Download "5. Subgradient method"

Harvey Hodge
6 years ago
Views:

1 L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1

2 Subgradient method to minimize a nondifferentiable convex function f: choose x (0) and repeat g (k 1) is any subgradient of f at x (k 1) x (k) = x (k 1) t k g (k 1), k = 1, 2,... Step size rules fixed step: t k constant fixed length: t k g (k 1) 2 = x (k) x (k 1) 2 is constant diminishing: t k 0, t k = k=1 Subgradient method 5-2

3 Assumptions f has finite optimal value f, minimizer x f is convex, dom f = R n f is Lipschitz continuous with constant G > 0: f(x) f(y) G x y 2 x, y this is equivalent to g 2 G for all x and g f(x) (see next page) Subgradient method 5-3

4 Proof. assume g 2 G for all subgradients; choose g y f(y), g x f(x): g T x (x y) f(x) f(y) g T y (x y) by the Cauchy-Schwarz inequality G x y 2 f(x) f(y) G x y 2 assume g 2 > G for some g f(x); take y = x + g/ g 2 : f(y) f(x) + g T (y x) = f(x) + g 2 > f(x) + G Subgradient method 5-4

5 Analysis the subgradient method is not a descent method the key quantity in the analysis is the distance to the optimal set with x + = x (i), x = x (i 1), g = g (i 1), t = t i : x + x 2 2 = x tg x 2 2 = x x 2 2 2tg T (x x ) + t 2 g 2 2 x x 2 2 2t (f(x) f ) + t 2 g 2 2 combine inequalities for i = 1,..., k, and define f (k) best = min 0 i<k f(x(i) ): k 2( t i )(f (k) best f ) x (0) x 2 2 x (k) x k x (0) x t 2 i g (i 1) 2 2 k t 2 i g (i 1) 2 2 Subgradient method 5-5

6 Fixed step size: t i = t f (k) best f x(0) x 2 2 2kt + G2 t 2 does not guarantee convergence of f (k) best for large k, f (k) best is approximately G2 t/2-suboptimal Fixed step length: t i = s/ g (i 1) 2 f (k) best f G x(0) x 2 2 2ks + Gs 2 does not guarantee convergence of f (k) best for large k, f (k) best is approximately Gs/2-suboptimal Subgradient method 5-6

7 Diminishing step size: t i 0, t i = f (k) best f x (0) x G 2 2 k t i k t 2 i can show that ( k t 2 i )/( k t i ) 0; hence, f (k) best converges to f Subgradient method 5-7

8 Example: 1-norm minimization minimize Ax b 1 subgradient is given by A T sign(ax b) example with A R , b R 500 Fixed steplength t k = s/ g (k 1) 2 for s = 0.1, 0.01, (f(x (k) ) f )/f (f best (x (k) ) f )/f k k Subgradient method 5-8

9 Diminishing step size: t k = 0.01/ k and t k = 0.01/k (f best (x (k) ) f )/f 0.01/ k 0.01/k k Subgradient method 5-9

10 Optimal step size for fixed number of iterations from page 5-5: if s i = t i g (i 1) 2 and x (0) x 2 R: f (k) best f k R s 2 i k s i /G for given k, bound is minimized by fixed step length s i = s = R/ k resulting bound after k steps is f (k) best f GR k guarantees accuracy f (k) best f ɛ in k = O(1/ɛ 2 ) iterations Subgradient method 5-10

11 Optimal step size when f is known right-hand side in first inequality of page 5-5 is minimized by t i = f(x(i 1) ) f g (i 1) 2 2 optimized bound is ( f(x (i 1) ) f ) 2 g (i 1) 2 2 x (i 1) x 2 2 x (i) x 2 2 applying recursively (with x (0) x 2 R and g (i) 2 G) gives f (k) best f GR k Subgradient method 5-11

12 Exercise: find point in intersection of convex sets find a point in the intersection of m closed convex sets C 1,..., C m : minimize f(x) = max {f 1 (x),..., f m (x)} where f j (x) = inf y C j x y 2 is Euclidean distance of x to C j f = 0 if the intersection is nonempty (from p. 4-14): g f(ˆx) if g f j (ˆx) and C j is farthest set from ˆx (from p. 4-20) subgradient g f j (ˆx) follows from projection P j (ˆx) on C j : g = 0 (if ˆx C j ), g = 1 ˆx P j (ˆx) 2 (ˆx P j (ˆx)) (if ˆx C j ) note that g 2 = 1 if ˆx C j Subgradient method 5-12

13 Subgradient method optimal step size (page 5-11) for f = 0 and g (i 1) 2 = 1 is t i = f(x (i 1) ) at iteration k, find farthest set C j (with f(x (k 1) ) = f j (x (k 1) )), and take x (k) = x (k 1) f(x(k 1) ) f j (x (k 1) ) (x(k 1) P j (x (k 1) )) = P j (x (k 1) ) at each step, we project the current point onto the farthest set a version of the alternating projections algorithm for m = 2, projections alternate onto one set, then the other later, we will see faster versions of this that are almost as simple Subgradient method 5-13

14 Optimality of the subgradient method can the f (k) best f GR/ k bound on page 5-10 be improved? Problem class f is convex, with a minimizer x we know a starting point x (0) with x (0) x 2 R we know the Lipschitz constant G of f on {x x x (0) 2 R} f is defined by an oracle: given x, oracle returns f(x) and a subgradient Algorithm class: k iterations of any method that chooses x (i) in x (0) + span{g (0), g (1),..., g (i 1) } Subgradient method 5-14

15 Test problem and oracle f(x) = max x i + 1,...,k 2 x 2 2, x (0) = 0 solution: x = 1 k (1, }. {{.., 1 }, 0, }. {{.., 0 } ) and f = 1 2k k n k R = x (0) x 2 = 1/ k and G = 1 + 1/ k oracle returns subgradient eĵ + x where ĵ = min{j x j = Iteration: for i = 0,..., k 1, entries x (i) i+1,..., x(i) k max x i},...,k are zero; therefore f (k) best f = min i<k f(x(i) ) f f = GR 2(1 + k) Conclusion: O(1/ k) bound cannot be improved Subgradient method 5-15

16 Summary: subgradient method handles general nondifferentiable convex problem often leads to very simple algorithms convergence can be very slow no good stopping criterion theoretical complexity: O(1/ɛ 2 ) iterations to find ɛ-suboptimal point an optimal 1st-order method: O(1/ɛ 2 ) bound cannot be improved Subgradient method 5-16

17 References S. Boyd, lecture notes and slides for EE364b, Convex Optimization II Yu. Nesterov, Introductory Lectures on Convex Optimization. A Basic Course (2004) with the example on page 5-15 of this lecture Subgradient method 5-17

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial