Algorithms for Nonsmooth Optimization


Algorithms for Nonsmooth Optimization
Frank E. Curtis, Lehigh University
Presented at the Center for Optimization and Statistical Learning, Northwestern University, 2 March 2018

Outline: Motivating Examples; Subdifferential Theory; Fundamental Algorithms; Nonconvex Nonsmooth Functions; General Framework

Nonsmooth optimization
In mathematical optimization, one wants to solve min_{x ∈ X} f(x), i.e., minimize an objective subject to constraints.
Why nonsmooth optimization? Nonsmoothness can arise for different reasons:
physical (phenomena can be nonsmooth): phase changes in materials
technological (constraints impose nonsmoothness): obstacles in shape design
methodological (nonsmoothness introduced by solution method): decompositions; penalty formulations
numerical (analytically smooth, but practically nonsmooth): stiff problems
(Bagirov, Karmitsa, Mäkelä (2014))

Data fitting
min_{x ∈ R^n} θ(x) + ψ(x), where, e.g., θ(x) = ‖Ax − b‖₂² and ψ(x) = Σ_{i=1}^n φ(x_i) with
φ₁(t) = α|t|/(1 + α|t|),  φ₂(t) = log(α|t| + 1),  φ₃(t) = |t|^q,  or  φ₄(t) = (α − (α − |t|)₊²)/α
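The nonsmoothness here comes from the kink of each φ at t = 0. Below is a small sketch (mine, not the talk's) that evaluates the four penalties with numpy; the exact form of φ₄ is my reading of the slide, and the parameter values are arbitrary.

```python
import numpy as np

# Sketch (not from the talk): the separable penalties from the data-fitting slide,
# evaluated elementwise; alpha > 0 and 0 < q <= 1 are penalty parameters.
def phi1(t, alpha=2.0):   # fraction penalty: alpha|t| / (1 + alpha|t|)
    return alpha * np.abs(t) / (1.0 + alpha * np.abs(t))

def phi2(t, alpha=2.0):   # log penalty: log(alpha|t| + 1)
    return np.log(alpha * np.abs(t) + 1.0)

def phi3(t, q=0.5):       # lq penalty: |t|^q, nonsmooth (and nonconvex for q < 1) at t = 0
    return np.abs(t) ** q

def phi4(t, alpha=2.0):   # MCP-like penalty; this form is my reconstruction of the slide's phi_4
    return (alpha - np.maximum(alpha - np.abs(t), 0.0) ** 2) / alpha

t = np.linspace(-3.0, 3.0, 7)
for phi in (phi1, phi2, phi3, phi4):
    print(phi.__name__, np.round(phi(t), 3))   # every phi has a kink at t = 0
```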

Clusterwise linear regression (CLR)
Given a dataset of pairs A := {(a_i, b_i)}_{i=1}^ℓ, the goal of CLR is to simultaneously partition the dataset into k disjoint clusters and find regression coefficients {(x_j, y_j)}_{j=1}^k for each cluster in order to minimize the overall error in the fit; e.g.,
min_{ {(x_j, y_j)} } f_k({x_j, y_j}), where f_k({x_j, y_j}) = Σ_{i=1}^ℓ min_{j ∈ {1,...,k}} |x_jᵀ a_i + y_j − b_i|^p.
This objective is nonconvex (though it is a difference of convex functions).
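The pointwise min over clusters is what makes f_k nonsmooth. A minimal sketch (mine) of evaluating this objective for given coefficients follows; the function name, the random data, and the choice p = 1 are illustrative.

```python
import numpy as np

def clr_objective(X, Y, A, b, p=1):
    """f_k({x_j, y_j}) = sum_i min_j |x_j^T a_i + y_j - b_i|^p.

    X : (k, n) slopes x_j;  Y : (k,) intercepts y_j
    A : (l, n) data a_i;    b : (l,) targets b_i
    """
    residuals = A @ X.T + Y[None, :] - b[:, None]   # (l, k): residual of point i under model j
    return np.sum(np.min(np.abs(residuals) ** p, axis=1))

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)
X = rng.standard_normal((2, 3))    # k = 2 candidate linear models
Y = rng.standard_normal(2)
print(clr_objective(X, Y, A, b))   # the pointwise min over models makes this nonsmooth and nonconvex
```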

Decomposition
Various types of decomposition strategies introduce nonsmoothness.
Primal decomposition can be used for min_{(x_1,x_2,y)} f_1(x_1, y) + f_2(x_2, y), where y is the complicating/linking variable; this is equivalent to
min_y φ_1(y) + φ_2(y), where φ_1(y) := min_{x_1} f_1(x_1, y) and φ_2(y) := min_{x_2} f_2(x_2, y).
This master problem may be nonsmooth in y.
Dual decomposition can be used for the same problem, reformulating it as
min_{(x_1,y_1,x_2,y_2)} f_1(x_1, y_1) + f_2(x_2, y_2) s.t. y_1 = y_2.
The Lagrangian is separable, meaning the dual function decomposes:
g_1(λ) = inf_{(x_1,y_1)} (f_1(x_1, y_1) + λᵀy_1) and g_2(λ) = inf_{(x_2,y_2)} (f_2(x_2, y_2) − λᵀy_2).
The dual problem, to maximize g(λ) = g_1(λ) + g_2(λ), may be nonsmooth in λ.

Dual decomposition with constraints
Consider the nearly separable problem
min_{(x_1,...,x_m)} Σ_{i=1}^m f_i(x_i)
s.t. x_i ∈ X_i for all i ∈ {1,...,m} and Σ_{i=1}^m A_i x_i ≤ b (e.g., a shared resource constraint),
where the last are the complicating/linking constraints; dualizing leads to
g(λ) := min_{(x_1,...,x_m)} Σ_{i=1}^m f_i(x_i) + λᵀ(Σ_{i=1}^m A_i x_i − b)
s.t. x_i ∈ X_i for all i ∈ {1,...,m}.
Given λ ≥ 0, the value g(λ) comes from solving separable problems; the dual max_{λ ≥ 0} g(λ) is typically nonsmooth (and people often use poor algorithms!).
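To make "g(λ) comes from solving separable problems" concrete, here is a toy sketch (mine, not from the talk): quadratic block objectives, box sets X_i, and A_i = I, so each block minimization has a closed-form solution and a subgradient of the concave dual is available for free.

```python
import numpy as np

# Toy instance (assumed, not the talk's): f_i(x_i) = 0.5*||x_i - c_i||^2, X_i = [0, 1]^n,
# A_i = I, and a shared-resource constraint sum_i x_i <= b.
def dual_function(lam, C, b):
    X = np.clip(C - lam[None, :], 0.0, 1.0)   # blockwise argmin of 0.5||x_i - c_i||^2 + lam^T x_i over the box
    g = sum(0.5 * np.sum((x - c) ** 2) + lam @ x for x, c in zip(X, C)) - lam @ b
    subgrad = X.sum(axis=0) - b               # a subgradient of the (concave, nonsmooth) dual at lam
    return g, subgrad

rng = np.random.default_rng(1)
C = rng.uniform(0.0, 2.0, size=(3, 4))        # m = 3 blocks, n = 4 variables each
b = np.full(4, 1.5)
val, sg = dual_function(np.ones(4), C, b)
print(val, sg)   # maximizing g over lam >= 0 (e.g., by projected subgradient) solves the dual
```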

Control of dynamical systems
Consider the discrete-time linear dynamical system
y_{k+1} = A y_k + B u_k (state equation),
z_k = C y_k (observation equation).
Supposing we want to design a control such that u_k = X C y_k (where X is our variable), consider the closed-loop system given by
y_{k+1} = A y_k + B u_k = A y_k + B X C y_k = (A + B X C) y_k.
Common objectives are to minimize stability measures ρ(A + B X C), which are often functions of the eigenvalues of A + B X C.

Eigenvalue optimization
(Figure: plots of ordered eigenvalues as the matrix is perturbed along a given direction.)

Other sources of nonsmooth optimization problems
Lagrangian relaxation
Composite optimization (e.g., penalty methods for "soft constraints")
Parametric optimization (e.g., for model predictive control)
Multilevel optimization

Outline: Motivating Examples; Subdifferential Theory; Fundamental Algorithms; Nonconvex Nonsmooth Functions; General Framework

Derivatives
When I teach an optimization class, I always start with the same question: What is a derivative? (f : R → R)
Answer I get: the slope of the tangent line.
(Figure: the tangent line to the graph of f at x has slope f′(x).)

Gradients
Then I ask: What is a gradient? (f : R^n → R)
Answer I get: the direction along which the function increases at the fastest rate.

Derivative vs. gradient
So if a derivative is a magnitude (here, a slope), then why does it generalize in multiple dimensions to something that is a direction?
(n = 1)  f′(x) = df/dx (x)
(n ≥ 1)  ∇f(x) = [∂f/∂x_1 (x), ..., ∂f/∂x_n (x)]ᵀ
What's important? Magnitude? Direction?
Answer: The gradient is a vector in R^n, which
has magnitude (e.g., its 2-norm),
can be viewed as a direction,
and gives us a way to compute directional derivatives.

Differentiable f
How should we think about the gradient? If f is continuously differentiable (i.e., f ∈ C¹), then ∇f(x̄) is the unique vector in the linear (Taylor) approximation of f at x̄, namely f(x̄) + ∇f(x̄)ᵀ(x − x̄).
(Figure: the graph of f and the graph of x ↦ f(x̄) + ∇f(x̄)ᵀ(x − x̄); both are graphs of functions of x!)

Differentiable and convex f
If f ∈ C¹ is convex, then f(x) ≥ f(x̄) + ∇f(x̄)ᵀ(x − x̄) for all (x, x̄) ∈ R^n × R^n.
(Figure: the graph of a convex f lies above its linearization at x̄.)

Graphs and epigraphs
There is another interpretation of a gradient that is also useful. First... What is a graph? A set of points in R^{n+1}, namely, {(x, z) : f(x) = z}.
A related quantity, another set, is the epigraph: {(x, z) : f(x) ≤ z}.
(Figure: the graph {(x, f(x))} and, above it, the epigraph of f.)

Differentiable and convex f
If f ∈ C¹ is convex, then, for all (x, x̄) ∈ R^n × R^n,
f(x) ≥ f(x̄) + ∇f(x̄)ᵀ(x − x̄)
⟺ f(x) − ∇f(x̄)ᵀx ≥ f(x̄) − ∇f(x̄)ᵀx̄
⟺ [−∇f(x̄); 1]ᵀ [x; f(x)] ≥ [−∇f(x̄); 1]ᵀ [x̄; f(x̄)].
Note: Given x̄, the vector [−∇f(x̄); 1] is given, so the inequality above involves a linear function over R^{n+1} and says that the value at any point [x; f(x)] in the graph is at least the value at [x̄; f(x̄)].

Linearization
(Figure: the graph of f and its linearization f(x̄) + ∇f(x̄)ᵀ(x − x̄) at x̄.)

Linearization and supporting hyperplane for epigraph
(Figure: the vector [−∇f(x̄); 1] attached at the point [x̄; f(x̄)] of the graph {(x, f(x))} is normal to a supporting hyperplane of the epigraph, whose boundary is the graph of the linearization f(x̄) + ∇f(x̄)ᵀ(x − x̄).)

Subgradients (convex f)
Why was that useful? We can generalize this idea when the function is not differentiable somewhere.
A vector g ∈ R^n is a subgradient of a convex f : R^n → R at x̄ ∈ R^n if
f(x) ≥ f(x̄) + gᵀ(x − x̄) for all x ∈ R^n
⟺ [−g; 1]ᵀ [x; f(x)] ≥ [−g; 1]ᵀ [x̄; f(x̄)].
(Figure: the vector [−g; 1] attached at [x̄; f(x̄)] is normal to a supporting hyperplane of the epigraph, even at a kink of f.)

Subdifferentials
Theorem. If f is convex and differentiable at x̄, then ∇f(x̄) is its unique subgradient at x̄.
But in general, the set of all subgradients for a convex f at x̄ is the subdifferential of f at x̄:
∂f(x̄) := {g ∈ R^n : g is a subgradient of f at x̄}.
From the definition, it is easily seen that x* is a minimizer of f if and only if 0 ∈ ∂f(x*).

What about nonconvex, nonsmooth?
We need to generalize the idea of a subgradient further: directional derivatives, subgradients, subdifferentials.
Let's return to this after we discuss some algorithms...

Outline: Motivating Examples; Subdifferential Theory; Fundamental Algorithms; Nonconvex Nonsmooth Functions; General Framework

A fundamental iteration
Thinking of −∇f(x_k), we have a vector that directs us in a direction of descent and vanishes as we approach a minimizer.
Algorithm: Gradient Descent
1: Choose an initial point x_0 ∈ R^n and stepsize α ∈ (0, 1/L].
2: for k = 0, 1, 2, ... do
3: if ∇f(x_k) = 0, then return x_k
4: else set x_{k+1} ← x_k − α∇f(x_k)
I call this a fundamental iteration. Here, we suppose ∇f is Lipschitz continuous, i.e., there exists L ≥ 0 such that
‖∇f(x) − ∇f(x̄)‖₂ ≤ L‖x − x̄‖₂ for all (x, x̄) ∈ R^n × R^n
⟹ f(x) ≤ f(x̄) + ∇f(x̄)ᵀ(x − x̄) + (L/2)‖x − x̄‖₂² for all (x, x̄) ∈ R^n × R^n.
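A minimal sketch (mine, not the talk's code) of this iteration on a smooth quadratic, with the fixed stepsize α = 1/L taken from the Lipschitz constant of the gradient:

```python
import numpy as np

def gradient_descent(grad, x0, L, tol=1e-8, max_iter=10_000):
    """The fundamental iteration above with fixed stepsize alpha = 1/L in (0, 1/L]."""
    x, alpha = np.asarray(x0, dtype=float), 1.0 / L
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # stand-in for "if grad f(x_k) = 0, then return x_k"
            break
        x = x - alpha * g              # x_{k+1} <- x_k - alpha * grad f(x_k)
    return x

# Example: f(x) = 0.5 x^T A x - b^T x with A positive definite, so L = lambda_max(A).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = gradient_descent(lambda x: A @ x - b, x0=np.zeros(2), L=np.linalg.eigvalsh(A).max())
print(x_star, np.linalg.solve(A, b))   # the two should agree
```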

Convergence of gradient descent
(Figure: the graph of f near x_k together with the quadratic upper bound f(x_k) + ∇f(x_k)ᵀ(x − x_k) + (L/2)‖x − x_k‖₂²; minimizing this upper bound over x gives the gradient descent step with α = 1/L, which therefore decreases f.)

Gradient descent for f
Theorem. If ∇f is Lipschitz continuous with constant L > 0 and α ∈ (0, 1/L], then
Σ_{j=0}^∞ ‖∇f(x_j)‖₂² < ∞, which implies {∇f(x_j)} → 0.
Proof. Let k ∈ N and recall that x_{k+1} − x_k = −α∇f(x_k). Then, since α ∈ (0, 1/L],
f(x_{k+1}) ≤ f(x_k) + ∇f(x_k)ᵀ(x_{k+1} − x_k) + (L/2)‖x_{k+1} − x_k‖₂²
= f(x_k) − α‖∇f(x_k)‖₂² + (α²L/2)‖∇f(x_k)‖₂²
= f(x_k) − α(1 − αL/2)‖∇f(x_k)‖₂²
≤ f(x_k) − (α/2)‖∇f(x_k)‖₂².
Thus, summing over j ∈ {0,...,k}, one finds
∞ > f(x_0) − f_inf ≥ f(x_0) − f(x_{k+1}) ≥ (α/2) Σ_{j=0}^k ‖∇f(x_j)‖₂².

Strong convexity
Now suppose that f is c-strongly convex, which means that
f(x) ≥ f(x̄) + ∇f(x̄)ᵀ(x − x̄) + (c/2)‖x − x̄‖₂² for all (x, x̄) ∈ R^n × R^n.
Important consequences of this are that f has a unique global minimizer, call it x* with f* := f(x*), and the gradient norm grows with the optimality error in that
2c(f(x) − f*) ≤ ‖∇f(x)‖₂² for all x ∈ R^n.

Strong convexity, lower bound
(Figure: the graph of f near x_k lies above the quadratic lower bound f(x_k) + ∇f(x_k)ᵀ(x − x_k) + (c/2)‖x − x_k‖₂² and below the quadratic upper bound f(x_k) + ∇f(x_k)ᵀ(x − x_k) + (L/2)‖x − x_k‖₂².)

Gradient descent for strongly convex f
Theorem. If ∇f is Lipschitz with constant L > 0, f is c-strongly convex, and α ∈ (0, 1/L], then
f(x_{j+1}) − f* ≤ (1 − αc)^{j+1} (f(x_0) − f*) for all j ∈ N.
Proof. Let k ∈ N. Following the previous proof, one finds
f(x_{k+1}) ≤ f(x_k) − (α/2)‖∇f(x_k)‖₂² ≤ f(x_k) − αc(f(x_k) − f*).
Subtracting f* from both sides, one finds
f(x_{k+1}) − f* ≤ (1 − αc)(f(x_k) − f*).
Applying the result repeatedly over j ∈ {0,...,k} yields the result.
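A quick numerical check (mine) of this rate on a strongly convex quadratic, where c and L are the extreme eigenvalues of the Hessian:

```python
import numpy as np

# On f(x) = 0.5 x^T A x (so f* = 0 at x* = 0), gradient descent with alpha = 1/L satisfies
# f(x_k) - f* <= (1 - alpha*c)^k (f(x_0) - f*), with c, L the smallest/largest eigenvalues of A.
A = np.diag([1.0, 4.0, 10.0])
c, L = 1.0, 10.0
alpha = 1.0 / L
f = lambda x: 0.5 * x @ A @ x
x = np.array([1.0, 1.0, 1.0])
gap0 = f(x)
for k in range(1, 31):
    x = x - alpha * (A @ x)
    assert f(x) <= (1.0 - alpha * c) ** k * gap0 + 1e-12   # the guaranteed linear rate holds
print(f(x))
```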

A fundamental iteration when f is nonsmooth?
What is a fundamental iteration for nonsmooth optimization? Steepest descent! For convex f, the directional derivative of f at x along s is
f′(x; s) = max_{g ∈ ∂f(x)} gᵀs.
Along which direction is f decreasing at the fastest rate? The solution of an optimization problem!
min_{‖s‖₂ ≤ 1} f′(x; s) = min_{‖s‖₂ ≤ 1} max_{g ∈ ∂f(x)} gᵀs
= max_{g ∈ ∂f(x)} min_{‖s‖₂ ≤ 1} gᵀs (von Neumann minimax theorem)
= max_{g ∈ ∂f(x)} (−‖g‖₂)
= −min_{g ∈ ∂f(x)} ‖g‖₂
⟹ need the minimum norm subgradient.

Main challenge
But, typically, we can only access some g ∈ ∂f(x), not all of ∂f(x).
I would argue: there is no practical fundamental iteration for general nonsmooth optimization (no computable descent direction that vanishes near a minimizer). What are our options?
There are a few ways to design a convergent algorithm:
algorithmically (e.g., subgradient method)
iteratively (e.g., cutting plane / bundle methods)
randomly (e.g., gradient sampling)

Subgradient method
Algorithm: Subgradient method (not descent)
1: Choose an initial point x_0 ∈ R^n.
2: for k = 0, 1, 2, ... do
3: if a termination condition is satisfied, then return x_k
4: else compute g_k ∈ ∂f(x_k), choose α_k ∈ R_{>0}, and set x_{k+1} ← x_k − α_k g_k
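A minimal sketch (mine) of this method on f(x) = ‖x‖_1, for which sign(x) is a subgradient; the diminishing stepsize α_k = α/k anticipates the convergence condition on the next slides.

```python
import numpy as np

def subgradient_method(subgrad, f, x0, alpha=1.0, iters=5000):
    """Sketch of the (non-descent) subgradient method with alpha_k = alpha / k."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(1, iters + 1):
        x = x - (alpha / k) * subgrad(x)   # x_{k+1} <- x_k - alpha_k g_k with g_k in subdiff f(x_k)
        if f(x) < best_f:                  # track the best iterate; individual steps may increase f
            best_x, best_f = x.copy(), f(x)
    return best_x, best_f

# Example: f(x) = ||x||_1, with subgradient sign(x) (taking 0 at the kinks).
x, fx = subgradient_method(np.sign, lambda z: np.abs(z).sum(), x0=np.array([2.0, -3.0, 0.5]))
print(x, fx)   # approaches the minimizer x* = 0, though not monotonically
```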

Why not subgradient descent?
Consider min_{x ∈ R²} f(x), where f(x_1, x_2) := x_1 + x_2 + max{0, x_1² + x_2² − 4}.
At x̄ = (0, −2), we have ∂f(x̄) = conv{[1, 1]ᵀ, [1, −3]ᵀ}, but −[1, 1]ᵀ and −[1, −3]ᵀ are both directions of ascent for f from x̄!
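The point x̄ = (0, −2) and the signs above are my reading of the slide; the following quick check (mine) confirms the claim numerically: stepping along the negative of either extreme subgradient increases f.

```python
import numpy as np

# f(x) = x1 + x2 + max(0, x1^2 + x2^2 - 4); at xbar = (0, -2) the subdifferential is
# conv{(1, 1), (1, -3)}, yet moving along minus either extreme subgradient increases f.
f = lambda x: x[0] + x[1] + max(0.0, x[0] ** 2 + x[1] ** 2 - 4.0)
xbar = np.array([0.0, -2.0])
t = 1e-3
for g in (np.array([1.0, 1.0]), np.array([1.0, -3.0])):
    print(f(xbar - t * g) - f(xbar))   # both differences are positive: -g is an ascent direction
```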

Decreasing the distance to a solution
The objective f is not the only measure of progress. Given an arbitrary subgradient g_k for f at x_k, we have
f(x) ≥ f(x_k) + g_kᵀ(x − x_k) for all x ∈ R^n,  (1)
which means that all points with an objective value lower than f(x_k) lie in
H_k := {x ∈ R^n : g_kᵀ(x − x_k) ≤ 0}.
Thus, a small step along −g_k should decrease the distance to a solution. (Convexity is crucial for this idea.)

Algorithmic convergence
Theorem. If f has a minimizer, ‖g_k‖₂ ≤ G ∈ R_{>0} for all k ∈ N, and the stepsizes satisfy
Σ_{k=1}^∞ α_k = ∞ and Σ_{k=1}^∞ α_k² < ∞,  (2)
then lim_{k→∞} { min_{j ∈ {0,...,k}} f_j } = f*.
An example sequence satisfying (2) is α_k = α/k for k = 1, 2, ...

Proof that lim_{k→∞} { min_{j ∈ {0,...,k}} f_j } = f*, part 1.
Let k ∈ N. By (1), the iterates satisfy
‖x_{k+1} − x*‖₂² = ‖x_k − α_k g_k − x*‖₂²
= ‖x_k − x*‖₂² − 2α_k g_kᵀ(x_k − x*) + α_k²‖g_k‖₂²
≤ ‖x_k − x*‖₂² − 2α_k(f_k − f*) + α_k²‖g_k‖₂².
Applying this inequality recursively, we have
0 ≤ ‖x_{k+1} − x*‖₂² ≤ ‖x_0 − x*‖₂² − 2 Σ_{j=0}^k α_j(f_j − f*) + Σ_{j=0}^k α_j²‖g_j‖₂²,
which implies that
2 Σ_{j=0}^k α_j(f_j − f*) ≤ ‖x_0 − x*‖₂² + Σ_{j=0}^k α_j²‖g_j‖₂²
⟹ min_{j ∈ {0,...,k}} f_j − f* ≤ (‖x_0 − x*‖₂² + G² Σ_{j=0}^k α_j²) / (2 Σ_{j=0}^k α_j).  (3)

Proof, part 2.
Now consider an arbitrary scalar ε > 0. By (2), there exists a nonnegative integer K such that, for all k > K,
α_k ≤ ε/G² and Σ_{j=0}^k α_j ≥ (1/ε)(‖x_0 − x*‖₂² + G² Σ_{j=0}^K α_j²).
Then, by (3), it follows that for all k > K we have
min_{j ∈ {0,...,k}} f_j − f* ≤ (‖x_0 − x*‖₂² + G² Σ_{j=0}^K α_j²) / (2 Σ_{j=0}^k α_j) + (G² Σ_{j=K+1}^k α_j²) / (2 Σ_{j=0}^k α_j)
≤ ε/2 + (ε Σ_{j=K+1}^k α_j) / (2 Σ_{j=0}^k α_j)
≤ ε/2 + ε/2 = ε,
where the first bound uses the lower bound on Σ_{j=0}^k α_j and the second uses α_j² ≤ (ε/G²)α_j for j > K.
The result follows since ε > 0 was chosen arbitrarily.

Cutting plane method
Subgradient methods lose previously computed information in every iteration. Suppose, after a sequence of iterates, we have the affine underestimators
f_i(x) = f(x_i) + g_iᵀ(x − x_i) for all i ∈ {0,...,k}.
(Figure: the cuts f(x_0) + g_0ᵀ(x − x_0) and f(x_1) + g_1ᵀ(x − x_1) underestimate f; their pointwise max is minimized at x_2.)
At iteration k, we can compute the next iterate by solving the master problem
x_{k+1} ∈ arg min_{x ∈ X} f̂_k(x), where f̂_k(x) := max_{i ∈ {0,...,k}} (f(x_i) + g_iᵀ(x − x_i)).
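A sketch (mine, not the talk's) of this iteration in one dimension, with X an interval and the master problem solved as a small linear program via scipy:

```python
import numpy as np
from scipy.optimize import linprog

# Cutting planes for f(x) = |x - 1| + 0.2 x^2 on X = [-4, 4] (an illustrative choice).
f = lambda x: abs(x - 1.0) + 0.2 * x * x
g = lambda x: np.sign(x - 1.0) + 0.4 * x       # a subgradient of f at x

lo, hi = -4.0, 4.0
xs = [lo]                                      # x_0
for k in range(20):
    # Master problem over z = (x, v): min v  s.t.  f(x_i) + g(x_i)(x - x_i) <= v for each cut, x in X.
    A_ub = [[g(xi), -1.0] for xi in xs]
    b_ub = [g(xi) * xi - f(xi) for xi in xs]
    res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub, bounds=[(lo, hi), (None, None)])
    x_next, v_next = res.x
    if f(x_next) - v_next <= 1e-8:             # lower bound v_{k+1} meets f(x_{k+1}): optimal
        break
    xs.append(x_next)

print(x_next, f(x_next))                       # approaches the minimizer at x = 1
```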

Cutting plane method convergence
The iterates of the cutting plane method yield lower bounds on the optimal value:
v_{k+1} := min_{x ∈ X} f̂_k(x) ≤ min_{x ∈ X} f(x) =: f*.
Therefore, if v_{k+1} = f(x_{k+1}), then we terminate since f(x_{k+1}) = f*.
If f is piecewise linear, then convergence occurs in finitely many iterations!
(Figure: cuts for a piecewise linear f.)
However, in general, we have the following theorem.
Theorem. The cutting plane method yields {x_k} satisfying {f(x_k)} → f*.

Bundle method
A bundle method attempts to combine the practical advantages of a cutting plane method with the theoretical strengths of a proximal point method. Given x_k, consider the regularized master problem
min_{x ∈ R^n} f̂_k(x) + (γ/2)‖x − x_k‖₂², where f̂_k(x) := max_{i ∈ I_k} (f(x_i) + g_iᵀ(x − x_i)).
Here, I_k ⊆ {1,...,k−1} indicates a subset of previous iterations. This problem is equivalent to the quadratic optimization problem
min_{(x,v) ∈ R^n × R} v + (γ/2)‖x − x_k‖₂²
s.t. f(x_i) + g_iᵀ(x − x_i) ≤ v for all i ∈ I_k.
Only move to a new point when a sufficient decrease is obtained.
Convergence rate analyses are limited; O((1/ε) log(1/ε)) for strongly convex f.

Bundle method convergence
The analysis makes use of the Moreau–Yosida regularization
f_γ(x̄) = min_{x ∈ R^n} ( f(x) + (γ/2)‖x − x̄‖₂² ).
Theorem. If x_k is not a minimizer, then f_γ(x_k) < f(x_k).

Bundle method convergence
Theorem. For all (k, j) ∈ N × N in a bundle method,
v_{k,j} + γ‖x_{k,j} − x_k‖₂² ≤ f_γ(x_k) < f(x_k).

Outline: Motivating Examples; Subdifferential Theory; Fundamental Algorithms; Nonconvex Nonsmooth Functions; General Framework

Clarke subdifferential
What if f is nonconvex and nonsmooth? What are subgradients? We still need some structure; we assume f is locally Lipschitz and f is differentiable on a full measure set D.
The Clarke subdifferential of f at x is
∂f(x) = conv{ lim_{j→∞} ∇f(x_j) : x_j → x and x_j ∈ D },
i.e., the convex hull of limits of gradients of f at points in D converging to x.
Theorem. If f is continuously differentiable at x, then ∂f(x) = {∇f(x)}.

Differentiable, but nonsmooth
Theorem. If f is differentiable at x, then {∇f(x)} ⊆ ∂f(x) (not necessarily equal).
Considering
f(x) = x² cos(1/x) if x ≠ 0, f(x) = 0 if x = 0,
one finds that f′(0) = 0 yet [−1, 1] ⊆ ∂f(0).
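A quick numerical illustration (mine) of why [−1, 1] ⊆ ∂f(0) here: away from zero, f′(x) = 2x cos(1/x) + sin(1/x), and its values approach +1 and −1 along points converging to 0.

```python
import numpy as np

# f(x) = x^2 cos(1/x), f(0) = 0: the difference quotients at 0 vanish (f'(0) = 0), but the
# derivative f'(x) = 2x cos(1/x) + sin(1/x) at nearby points has limit points filling [-1, 1].
fprime = lambda x: 2.0 * x * np.cos(1.0 / x) + np.sin(1.0 / x)
k = np.arange(1, 6)
print(fprime(1.0 / (2.0 * np.pi * k + np.pi / 2)))   # values tend to +1
print(fprime(1.0 / (2.0 * np.pi * k - np.pi / 2)))   # values tend to -1
```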

Clarke ε-subdifferential and gradient sampling
As before, we typically cannot compute ∂f(x). It is approximated by the Clarke ε-subdifferential, namely,
∂_ε f(x) = conv{ ∂f(B(x, ε)) },
which in turn can be approximated as in
∂_ε f(x) ≈ conv{ ∇f(x_k), ∇f(x_{k,1}), ..., ∇f(x_{k,m}) }, where {x_{k,1},...,x_{k,m}} ⊂ B(x_k, ε).
In gradient sampling, we compute the minimum norm element in
conv{ ∇f(x_k), ∇f(x_{k,1}), ..., ∇f(x_{k,m}) },
which is equivalent to solving
min_{(x,v) ∈ R^n × R} v + (1/2)‖x − x_k‖₂²
s.t. f(x_k) + ∇f(x_{k,i})ᵀ(x − x_k) ≤ v for all i ∈ {1,...,m}.
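A sketch (mine, not the talk's) of the minimum norm element computation: sample gradients in B(x_k, ε) and minimize ‖Gω‖₂² over the unit simplex, here with scipy's SLSQP; −Gω* then serves as the gradient-sampling search direction.

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_element(G):
    """Min-norm element of conv{columns of G}: minimize ||G w||^2 over the unit simplex."""
    m = G.shape[1]
    obj = lambda w: float(np.sum((G @ w) ** 2))
    res = minimize(obj, np.full(m, 1.0 / m), method="SLSQP",
                   bounds=[(0.0, None)] * m,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return G @ res.x

# Example (illustrative): gradients of f(x) = ||x||_1 + 0.5*||x||_2^2 sampled near a kink.
grad = lambda x: np.sign(x) + x
rng = np.random.default_rng(0)
x_k, eps, m = np.array([1e-3, 1.0]), 0.1, 10
points = [x_k] + [x_k + eps * rng.uniform(-1.0, 1.0, size=2) for _ in range(m)]
G = np.column_stack([grad(z) for z in points])
g = min_norm_element(G)
print(g)   # -g is the search direction; its small first component reflects the nearby kink
```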

Outline: Motivating Examples; Subdifferential Theory; Fundamental Algorithms; Nonconvex Nonsmooth Functions; General Framework

Popular and effective method
Despite all I've talked about, a very effective method: BFGS.
Approximate second-order information with gradient displacements (figure: consecutive iterates x_k and x_{k+1} on the graph of f).
Secant equation H_{k+1} s_k = y_k (equivalently W_{k+1} y_k = s_k) to match the gradient of f at x_k, where s_k := x_{k+1} − x_k and y_k := ∇f(x_{k+1}) − ∇f(x_k).

BFGS-type updates
Inverse Hessian and Hessian approximation updating formulas (requiring s_kᵀv_k > 0):
W_{k+1} ← (I − (v_k s_kᵀ)/(s_kᵀv_k))ᵀ W_k (I − (v_k s_kᵀ)/(s_kᵀv_k)) + (s_k s_kᵀ)/(s_kᵀv_k)
H_{k+1} ← (I − (s_k s_kᵀ H_k)/(s_kᵀH_k s_k))ᵀ H_k (I − (s_k s_kᵀ H_k)/(s_kᵀH_k s_k)) + (v_k v_kᵀ)/(s_kᵀv_k)
With an appropriate technique for choosing v_k, we attain self-correcting properties for {H_k} and {W_k}: (inverse) Hessian approximations that can be used in other algorithms.
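A sketch (mine) of the inverse update above as a function; here v_k is simply taken to be y_k, the standard BFGS choice, whereas the self-correcting variant chooses v_k differently (details not given on the slide). The check confirms the secant equation W_{k+1} v_k = s_k.

```python
import numpy as np

def bfgs_inverse_update(W, s, v):
    """W_{k+1} = (I - v s^T/(s^T v))^T W (I - v s^T/(s^T v)) + s s^T/(s^T v), valid when s^T v > 0."""
    rho = 1.0 / (s @ v)
    E = np.eye(len(s)) - rho * np.outer(v, s)
    return E.T @ W @ E + rho * np.outer(s, s)

rng = np.random.default_rng(0)
W = np.eye(3)
s = rng.standard_normal(3)
v = s + 0.1 * rng.standard_normal(3)   # plays the role of y_k = grad f(x_{k+1}) - grad f(x_k)
W_new = bfgs_inverse_update(W, s, v)
print(np.allclose(W_new @ v, s))       # True: the updated inverse satisfies the secant equation
```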

Subproblems in nonsmooth optimization algorithms
With sets of points, scalars, and (sub)gradients {x_{k,j}}_{j=1}^m, {f_{k,j}}_{j=1}^m, {g_{k,j}}_{j=1}^m, nonsmooth optimization methods involve the primal subproblem
min_{x ∈ R^n} max_{j ∈ {1,...,m}} {f_{k,j} + g_{k,j}ᵀ(x − x_{k,j})} + (1/2)(x − x_k)ᵀ H_k (x − x_k)
s.t. ‖x − x_k‖ ≤ δ_k,  (P)
but, with G_k ← [g_{k,1} ··· g_{k,m}], it is typically more efficient to solve the dual
sup_{(ω,γ) ∈ R^m_+ × R^n} −(1/2)(G_k ω + γ)ᵀ W_k (G_k ω + γ) + b_kᵀω − δ_k‖γ‖
s.t. 1_mᵀω = 1.  (D)
The primal solution can then be recovered by x_k − W_k g_k, where g_k := G_k ω_k + γ_k.

Algorithm: Self-Correcting Variable-Metric Algorithm for Nonsmooth Optimization
1: Choose x_1 ∈ R^n.
2: Choose a symmetric positive definite W_1 ∈ R^{n×n}.
3: Choose α ∈ (0, 1).
4: for k = 1, 2, ... do
5: Solve (P)–(D) such that setting G_k ← [g_{k,1} ··· g_{k,m}], s_k ← −W_k(G_k ω_k + γ_k), and x_{k+1} ← x_k + s_k
6: yields f(x_{k+1}) ≤ f(x_k) − (1/2)α(G_k ω_k + γ_k)ᵀ W_k (G_k ω_k + γ_k).
7: Choose v_k (details omitted, but very simple).
8: Set W_{k+1} ← (I − (v_k s_kᵀ)/(s_kᵀv_k))ᵀ W_k (I − (v_k s_kᵀ)/(s_kᵀv_k)) + (s_k s_kᵀ)/(s_kᵀv_k).

Instances of the framework
Cutting plane / bundle methods: points added incrementally until sufficient decrease obtained; finite number of additions until an accepted step.
Gradient sampling methods: points added randomly / incrementally until sufficient decrease obtained; sufficient number of iterations with good steps.
In any case: convergence guarantees require {W_k} to be uniformly positive definite and bounded on a sufficient number of accepted steps.

C++ implementation: NonOpt
Results on the test problems maxq, mxhilb, chained lq, chained cb3 1, chained cb3 2, active faces, brown function 2, chained mifflin 2, chained crescent 1, and chained crescent 2, reporting exit status, ε_end, f(x_end), #iter, #func, #grad, and #subs (numerical columns not reproduced here).
BFGS with weak Wolfe line search: reaches the Stationary exit on maxq; on the remaining problems it terminates with the Stepsize exit.
Bundle method with self-correcting properties: reaches the Stationary exit on all of the problems.

Thanks!
NonOpt coming soon... Andreas could finish in a day what has taken me 6 months on sabbatical, so it'll be done when he has a free day ;-)
Thanks for listening!
