Algorithms for Nonsmooth Optimization
1 Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at the Center for Optimization and Statistical Learning, Northwestern University, 2 March 2018 Algorithms for Nonsmooth Optimization 1 of 55
2 Outline Motivating Examples Subdifferential Theory Fundamental Algorithms Nonconvex Nonsmooth Functions General Framework Algorithms for Nonsmooth Optimization 2 of 55
3 Outline Motivating Examples Subdifferential Theory Fundamental Algorithms Nonconvex Nonsmooth Functions General Framework Algorithms for Nonsmooth Optimization 3 of 55
4 Nonsmooth optimization In mathematical optimization, one wants to solve $\min_{x \in X} f(x)$, i.e., minimize an objective subject to constraints. Why nonsmooth optimization? Algorithms for Nonsmooth Optimization 4 of 55
5 Nonsmooth optimization In mathematical optimization, one wants to solve $\min_{x \in X} f(x)$, i.e., minimize an objective subject to constraints. Why nonsmooth optimization? Nonsmoothness can arise for different reasons: physical, technological, methodological, numerical (Bagirov, Karmitsa, Mäkelä (2014)) Algorithms for Nonsmooth Optimization 4 of 55
6 Nonsmooth optimization In mathematical optimization, one wants to solve $\min_{x \in X} f(x)$, i.e., minimize an objective subject to constraints. Why nonsmooth optimization? Nonsmoothness can arise for different reasons: physical (phenomena can be nonsmooth), e.g., phase changes in materials; technological; methodological; numerical (Bagirov, Karmitsa, Mäkelä (2014)) Algorithms for Nonsmooth Optimization 4 of 55
7 Nonsmooth optimization In mathematical optimization, one wants to solve $\min_{x \in X} f(x)$, i.e., minimize an objective subject to constraints. Why nonsmooth optimization? Nonsmoothness can arise for different reasons: physical (phenomena can be nonsmooth), e.g., phase changes in materials; technological (constraints impose nonsmoothness), e.g., obstacles in shape design; methodological; numerical (Bagirov, Karmitsa, Mäkelä (2014)) Algorithms for Nonsmooth Optimization 4 of 55
8 Nonsmooth optimization In mathematical optimization, one wants to solve $\min_{x \in X} f(x)$, i.e., minimize an objective subject to constraints. Why nonsmooth optimization? Nonsmoothness can arise for different reasons: physical (phenomena can be nonsmooth), e.g., phase changes in materials; technological (constraints impose nonsmoothness), e.g., obstacles in shape design; methodological (nonsmoothness introduced by the solution method), e.g., decompositions, penalty formulations; numerical (Bagirov, Karmitsa, Mäkelä (2014)) Algorithms for Nonsmooth Optimization 4 of 55
9 Nonsmooth optimization In mathematical optimization, one wants to solve $\min_{x \in X} f(x)$, i.e., minimize an objective subject to constraints. Why nonsmooth optimization? Nonsmoothness can arise for different reasons: physical (phenomena can be nonsmooth), e.g., phase changes in materials; technological (constraints impose nonsmoothness), e.g., obstacles in shape design; methodological (nonsmoothness introduced by the solution method), e.g., decompositions, penalty formulations; numerical (analytically smooth, but practically nonsmooth), e.g., stiff problems (Bagirov, Karmitsa, Mäkelä (2014)) Algorithms for Nonsmooth Optimization 4 of 55
10 Data fitting $\min_{x \in \mathbb{R}^n} \theta(x) + \psi(x)$ where, e.g., $\theta(x) = \|Ax - b\|_2^2$ and $\psi(x) = \sum_{i=1}^n \varphi(x_i)$ with $\varphi_1(t) = \frac{\alpha|t|}{1 + \alpha|t|}$, $\varphi_2(t) = \log(\alpha|t| + 1)$, $\varphi_3(t) = |t|^q$, or $\varphi_4(t) = \alpha - (\alpha - |t|)_+^2/\alpha$ Algorithms for Nonsmooth Optimization 5 of 55
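To make the regularizers above concrete, here is a minimal Python sketch of a few of these penalty functions and the resulting data-fitting objective (not from the talk; NumPy is assumed, and the values of the parameters alpha and q are illustrative):

```python
import numpy as np

def phi1(t, alpha=2.0):
    # rational penalty: alpha*|t| / (1 + alpha*|t|)
    return alpha * np.abs(t) / (1.0 + alpha * np.abs(t))

def phi2(t, alpha=2.0):
    # logarithmic penalty: log(alpha*|t| + 1)
    return np.log(alpha * np.abs(t) + 1.0)

def phi3(t, q=0.5):
    # l_q penalty: |t|^q with 0 < q < 1
    return np.abs(t) ** q

def data_fitting_objective(x, A, b, phi=phi1):
    # theta(x) + psi(x): least-squares misfit plus a separable nonsmooth regularizer
    return np.linalg.norm(A @ x - b) ** 2 + np.sum(phi(x))
```

All of these regularizers are nonsmooth at zero, which is exactly what promotes sparse solutions.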
11 Clusterwise linear regression (CLR) Given a dataset of pairs $A := \{(a_i, b_i)\}_{i=1}^l$, the goal of CLR is to simultaneously partition the dataset into $k$ disjoint clusters and find regression coefficients $\{(x_j, y_j)\}_{j=1}^k$ for each cluster in order to minimize the overall error in the fit; e.g., $\min_{\{(x_j, y_j)\}} f_k(\{x_j, y_j\})$, where $f_k(\{x_j, y_j\}) = \sum_{i=1}^l \min_{j \in \{1,\dots,k\}} |x_j^T a_i + y_j - b_i|^p$. This objective is nonconvex (though it is a difference of convex functions). Algorithms for Nonsmooth Optimization 6 of 55
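For illustration, a minimal sketch of evaluating the CLR objective $f_k$ (not from the talk; NumPy assumed, array shapes are illustrative):

```python
import numpy as np

def clr_objective(X, y, a, b, p=2):
    # X: (k, n) slopes, y: (k,) intercepts, a: (l, n) features, b: (l,) targets
    # residual of every data point against every cluster's regression function
    residuals = np.abs(a @ X.T + y[None, :] - b[:, None]) ** p   # shape (l, k)
    # each point is charged only to its best-fitting cluster (the inner min)
    return residuals.min(axis=1).sum()
```

The inner minimum over clusters is what makes the objective nonsmooth and nonconvex, even though each residual term is convex.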
12 Decomposition Various types of decomposition strategies introduce nonsmoothness. Primal decomposition can be used for $\min_{(x_1, x_2, y)} f_1(x_1, y) + f_2(x_2, y)$, where $y$ is the complicating/linking variable; equivalent to $\min_y \varphi_1(y) + \varphi_2(y)$ where $\varphi_1(y) := \min_{x_1} f_1(x_1, y)$ and $\varphi_2(y) := \min_{x_2} f_2(x_2, y)$. This master problem may be nonsmooth in $y$. Algorithms for Nonsmooth Optimization 7 of 55
13 Decomposition Various types of decomposition strategies introduce nonsmoothness. Primal decomposition can be used for $\min_{(x_1, x_2, y)} f_1(x_1, y) + f_2(x_2, y)$, where $y$ is the complicating/linking variable; equivalent to $\min_y \varphi_1(y) + \varphi_2(y)$ where $\varphi_1(y) := \min_{x_1} f_1(x_1, y)$ and $\varphi_2(y) := \min_{x_2} f_2(x_2, y)$. This master problem may be nonsmooth in $y$. Dual decomposition can be used for the same problem, reformulated as $\min_{(x_1, x_2, y_1, y_2)} f_1(x_1, y_1) + f_2(x_2, y_2)$ s.t. $y_1 = y_2$. The Lagrangian is separable, meaning the dual function decomposes: $g_1(\lambda) = \inf_{(x_1, y_1)} (f_1(x_1, y_1) + \lambda^T y_1)$ and $g_2(\lambda) = \inf_{(x_2, y_2)} (f_2(x_2, y_2) - \lambda^T y_2)$. The dual problem to maximize $g(\lambda) = g_1(\lambda) + g_2(\lambda)$ may be nonsmooth in $\lambda$. Algorithms for Nonsmooth Optimization 7 of 55
14 Dual decomposition with constraints Consider the nearly separable problem $\min_{(x_1,\dots,x_m)} \sum_{i=1}^m f_i(x_i)$ s.t. $x_i \in X_i$ for all $i \in \{1,\dots,m\}$ and $\sum_{i=1}^m A_i x_i \leq b$ (e.g., a shared resource constraint), where the last are complicating/linking constraints; dualizing leads to $g(\lambda) := \min_{(x_1,\dots,x_m)} \sum_{i=1}^m f_i(x_i) + \lambda^T \left( \sum_{i=1}^m A_i x_i - b \right)$ s.t. $x_i \in X_i$ for all $i \in \{1,\dots,m\}$. Given $\lambda \in \mathbb{R}^m$, the value $g(\lambda)$ comes from solving separable problems; the dual $\max_{\lambda \geq 0} g(\lambda)$ is typically nonsmooth (and people often use poor algorithms!). Algorithms for Nonsmooth Optimization 8 of 55
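As a sketch of how $g(\lambda)$ is evaluated block by block (not from the talk; SciPy assumed, and for illustration each $X_i$ is taken to be a simple box handled via bounds):

```python
import numpy as np
from scipy.optimize import minimize

def dual_function(lam, fs, As, b, bounds):
    # fs[i]: callable f_i; As[i]: matrix A_i; bounds[i]: list of (lo, hi) pairs describing X_i
    g_val = -lam @ b
    Ax_sum = np.zeros_like(b)
    for f_i, A_i, bnd_i in zip(fs, As, bounds):
        n_i = A_i.shape[1]
        # each block is an independent subproblem: min f_i(x_i) + lam^T A_i x_i over X_i
        res = minimize(lambda x: f_i(x) + lam @ (A_i @ x), np.zeros(n_i), bounds=bnd_i)
        g_val += res.fun
        Ax_sum += A_i @ res.x
    # sum_i A_i x_i(lambda) - b is a (super)gradient of the concave dual g at lambda
    return g_val, Ax_sum - b
```

Only one vector of multipliers couples the blocks, so evaluating g and a subgradient of it is embarrassingly parallel.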
15 Control of dynamical systems Consider the discrete-time linear dynamical system: $y_{k+1} = A y_k + B u_k$ (state equation), $z_k = C y_k$ (observation equation). Supposing we want to design a control such that $u_k = X C y_k$ (where $X$ is our variable), consider the closed-loop system given by $y_{k+1} = A y_k + B u_k = A y_k + B X C y_k = (A + BXC) y_k$. Common objectives are to minimize a stability measure $\rho(A + BXC)$, which is often a function of the eigenvalues of $A + BXC$. Algorithms for Nonsmooth Optimization 9 of 55
16 Eigenvalue optimization Plots of ordered eigenvalues as a matrix is perturbed along a given direction. Algorithms for Nonsmooth Optimization 10 of 55
17 Other sources of nonsmooth optimization problems Lagrangian relaxation Composite optimization (e.g., penalty methods for "soft constraints") Parametric optimization (e.g., for model predictive control) Multilevel optimization Algorithms for Nonsmooth Optimization 11 of 55
18 Outline Motivating Examples Subdifferential Theory Fundamental Algorithms Nonconvex Nonsmooth Functions General Framework Algorithms for Nonsmooth Optimization 12 of 55
19 Derivatives When I teach an optimization class, I always start with the same question: What is a derivative? ($f : \mathbb{R} \to \mathbb{R}$) Algorithms for Nonsmooth Optimization 13 of 55
20 Derivatives When I teach an optimization class, I always start with the same question: What is a derivative? ($f : \mathbb{R} \to \mathbb{R}$) Answer I get: the slope of the tangent line. (Figure: graph of f with a tangent line of slope $f'(\bar{x})$ at $\bar{x}$.) Algorithms for Nonsmooth Optimization 13 of 55
21 Gradients Then I ask: What is a gradient? ($f : \mathbb{R}^n \to \mathbb{R}$) Algorithms for Nonsmooth Optimization 14 of 55
22 Gradients Then I ask: What is a gradient? ($f : \mathbb{R}^n \to \mathbb{R}$) Answer I get: the direction along which the function increases at the fastest rate Algorithms for Nonsmooth Optimization 14 of 55
23 Derivative vs. gradient So if a derivative is a magnitude (here, a slope), then why does it generalize in multiple dimensions to something that is a direction? $(n = 1)$: $f'(x) = \frac{df}{dx}(x)$; $(n \geq 1)$: $\nabla f(x) = \left[ \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right]^T$. What's important? Magnitude? Direction? Algorithms for Nonsmooth Optimization 15 of 55
24 Derivative vs. gradient So if a derivative is a magnitude (here, a slope), then why does it generalize in multiple dimensions to something that is a direction? $(n = 1)$: $f'(x) = \frac{df}{dx}(x)$; $(n \geq 1)$: $\nabla f(x) = \left[ \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right]^T$. What's important? Magnitude? Direction? Answer: The gradient is a vector in $\mathbb{R}^n$, which has a magnitude (e.g., its 2-norm), can be viewed as a direction, and gives us a way to compute directional derivatives. Algorithms for Nonsmooth Optimization 15 of 55
25 Differentiable f How should we think about the gradient? If f is continuously differentiable (i.e., $f \in C^1$), then $\nabla f(\bar{x})$ is the unique vector in the linear (Taylor) approximation $f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$ of f at $\bar{x}$. (Figure: graph of $f(x)$ and of its linearization at $\bar{x}$; both are graphs of functions of x!) Algorithms for Nonsmooth Optimization 16 of 55
26 Differentiable and convex f If $f \in C^1$ is convex, then $f(x) \geq f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$ for all $(x, \bar{x}) \in \mathbb{R}^n \times \mathbb{R}^n$. (Figure: graph of f lying above its linearization at $\bar{x}$.) Algorithms for Nonsmooth Optimization 17 of 55
27 Graphs and epigraphs There is another interpretation of a gradient that is also useful. First... What is a graph? Algorithms for Nonsmooth Optimization 18 of 55
28 Graphs and epigraphs There is another interpretation of a gradient that is also useful. First... What is a graph? A set of points in $\mathbb{R}^{n+1}$, namely, $\{(x, z) : f(x) = z\}$. (Figure: the graph $\{(x, f(x))\}$.) Algorithms for Nonsmooth Optimization 18 of 55
29 Graphs and epigraphs There is another interpretation of a gradient that is also useful. First... What is a graph? A set of points in $\mathbb{R}^{n+1}$, namely, $\{(x, z) : f(x) = z\}$. A related quantity, another set, is the epigraph: $\{(x, z) : f(x) \leq z\}$. (Figure: the graph $\{(x, f(x))\}$ and the region above it.) Algorithms for Nonsmooth Optimization 18 of 55
30 Differentiable and convex f If $f \in C^1$ is convex, then, for all $(x, \bar{x}) \in \mathbb{R}^n \times \mathbb{R}^n$, $f(x) \geq f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$, i.e., $f(x) - \nabla f(\bar{x})^T x \geq f(\bar{x}) - \nabla f(\bar{x})^T \bar{x}$, i.e., $(-\nabla f(\bar{x}), 1)^T (x, f(x)) \geq (-\nabla f(\bar{x}), 1)^T (\bar{x}, f(\bar{x}))$ Algorithms for Nonsmooth Optimization 19 of 55
31 Differentiable and convex f If $f \in C^1$ is convex, then, for all $(x, \bar{x}) \in \mathbb{R}^n \times \mathbb{R}^n$, $f(x) \geq f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$, i.e., $f(x) - \nabla f(\bar{x})^T x \geq f(\bar{x}) - \nabla f(\bar{x})^T \bar{x}$, i.e., $(-\nabla f(\bar{x}), 1)^T (x, f(x)) \geq (-\nabla f(\bar{x}), 1)^T (\bar{x}, f(\bar{x}))$ Note: Given $\bar{x}$, the vector $(-\nabla f(\bar{x}), 1)$ is given, so the inequality above involves a linear function over $\mathbb{R}^{n+1}$ and says that its value at any point $(x, f(x))$ in the graph is at least its value at $(\bar{x}, f(\bar{x}))$ Algorithms for Nonsmooth Optimization 19 of 55
32 Linearization (Figure: graph of f and of its linearization $f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$ at $\bar{x}$.) Algorithms for Nonsmooth Optimization 20 of 55
33 Linearization and supporting hyperplane for epigraph (Figure: the linearization $f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$ defines a supporting hyperplane of the epigraph at the point $(\bar{x}, f(\bar{x}))$ on the graph $\{(x, f(x))\}$.) Algorithms for Nonsmooth Optimization 20 of 55
34 Subgradients (convex f) Why was that useful? We can generalize this idea when the function is not differentiable somewhere. A vector $g \in \mathbb{R}^n$ is a subgradient of a convex $f : \mathbb{R}^n \to \mathbb{R}$ at $\bar{x} \in \mathbb{R}^n$ if $f(x) \geq f(\bar{x}) + g^T (x - \bar{x})$ for all $x$, i.e., $(-g, 1)^T (x, f(x)) \geq (-g, 1)^T (\bar{x}, f(\bar{x}))$ (Figure: a nonsmooth convex f with a supporting hyperplane, with normal $(-g, 1)$, at the kink $(\bar{x}, f(\bar{x}))$.) Algorithms for Nonsmooth Optimization 21 of 55
35 Subdifferentials Theorem: If f is convex and differentiable at $\bar{x}$, then $\nabla f(\bar{x})$ is its unique subgradient at $\bar{x}$. But in general, the set of all subgradients for a convex f at $\bar{x}$ is the subdifferential of f at $\bar{x}$: $\partial f(\bar{x}) := \{g \in \mathbb{R}^n : g \text{ is a subgradient of } f \text{ at } \bar{x}\}$. From the definition, it is easily seen that $x^*$ is a minimizer of f if and only if $0 \in \partial f(x^*)$ Algorithms for Nonsmooth Optimization 22 of 55
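As a standard one-dimensional illustration (not from the slides): for $f(x) = |x|$ one has $\partial f(\bar{x}) = \{-1\}$ if $\bar{x} < 0$, $\partial f(\bar{x}) = \{1\}$ if $\bar{x} > 0$, and $\partial f(0) = [-1, 1]$; since $0 \in \partial f(0)$, the point $x^* = 0$ is the minimizer even though f is not differentiable there.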
36 What about nonconvex, nonsmooth? We need to generalize the idea of a subgradient further. Directional derivatives Subgradients Subdifferentials Let's return to this after we discuss some algorithms... Algorithms for Nonsmooth Optimization 23 of 55
37 Outline Motivating Examples Subdifferential Theory Fundamental Algorithms Nonconvex Nonsmooth Functions General Framework Algorithms for Nonsmooth Optimization 24 of 55
38 A fundamental iteration Thinking of $-\nabla f(x_k)$, we have a vector that directs us in a direction of descent, and vanishes as we approach a minimizer Algorithms for Nonsmooth Optimization 25 of 55
39 A fundamental iteration Thinking of $-\nabla f(x_k)$, we have a vector that directs us in a direction of descent, and vanishes as we approach a minimizer Algorithm: Gradient Descent 1: Choose an initial point $x_0 \in \mathbb{R}^n$ and stepsize $\alpha \in (0, 1/L]$ 2: for $k = 0, 1, 2, \dots$ do 3: if $\nabla f(x_k) = 0$, then return $x_k$ 4: else set $x_{k+1} \leftarrow x_k - \alpha \nabla f(x_k)$ I call this a fundamental iteration. Algorithms for Nonsmooth Optimization 25 of 55
40 A fundamental iteration Thinking of $-\nabla f(x_k)$, we have a vector that directs us in a direction of descent, and vanishes as we approach a minimizer Algorithm: Gradient Descent 1: Choose an initial point $x_0 \in \mathbb{R}^n$ and stepsize $\alpha \in (0, 1/L]$ 2: for $k = 0, 1, 2, \dots$ do 3: if $\nabla f(x_k) = 0$, then return $x_k$ 4: else set $x_{k+1} \leftarrow x_k - \alpha \nabla f(x_k)$ I call this a fundamental iteration. Here, we suppose $\nabla f$ is Lipschitz continuous, i.e., there exists $L \geq 0$ such that $\|\nabla f(x) - \nabla f(\bar{x})\|_2 \leq L \|x - \bar{x}\|_2$ for all $(x, \bar{x}) \in \mathbb{R}^n \times \mathbb{R}^n$, which implies $f(x) \leq f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + \frac{L}{2} \|x - \bar{x}\|_2^2$ for all $(x, \bar{x}) \in \mathbb{R}^n \times \mathbb{R}^n$. Algorithms for Nonsmooth Optimization 25 of 55
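A minimal Python sketch of this fundamental iteration with the fixed stepsize $\alpha = 1/L$ (not from the talk; NumPy assumed, and grad_f and L are supplied by the user):

```python
import numpy as np

def gradient_descent(grad_f, x0, L, max_iter=1000, tol=1e-8):
    # fixed-stepsize gradient descent with alpha in (0, 1/L]; here alpha = 1/L
    x = np.asarray(x0, dtype=float)
    alpha = 1.0 / L
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:      # approximate stationarity test
            break
        x = x - alpha * g                 # the fundamental iteration
    return x
```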
41 Convergence of gradient descent (Figure: graph of f near the iterate $x_k$.) Algorithms for Nonsmooth Optimization 26 of 55
42 Convergence of gradient descent (Figure: graph of f near $x_k$; what can be said about f at other points?) Algorithms for Nonsmooth Optimization 26 of 55
43 Convergence of gradient descent (Figure: graph of f near $x_k$ together with the upper quadratic model $f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{L}{2} \|x - x_k\|_2^2$.) Algorithms for Nonsmooth Optimization 26 of 55
44 Gradient descent for f Theorem: If $\nabla f$ is Lipschitz continuous with constant $L > 0$ and $\alpha \in (0, 1/L]$, then $\sum_{j=0}^{\infty} \|\nabla f(x_j)\|_2^2 < \infty$, which implies $\{\nabla f(x_j)\} \to 0$. Algorithms for Nonsmooth Optimization 27 of 55
45 Gradient descent for f Theorem: If $\nabla f$ is Lipschitz continuous with constant $L > 0$ and $\alpha \in (0, 1/L]$, then $\sum_{j=0}^{\infty} \|\nabla f(x_j)\|_2^2 < \infty$, which implies $\{\nabla f(x_j)\} \to 0$. Proof. Let $k \in \mathbb{N}$ and recall that $x_{k+1} - x_k = -\alpha \nabla f(x_k)$. Then, since $\alpha \in (0, 1/L]$, $f(x_{k+1}) \leq f(x_k) + \nabla f(x_k)^T (x_{k+1} - x_k) + \frac{L}{2} \|x_{k+1} - x_k\|_2^2 = f(x_k) - \alpha \|\nabla f(x_k)\|_2^2 + \frac{1}{2} \alpha^2 L \|\nabla f(x_k)\|_2^2 = f(x_k) - \alpha (1 - \frac{1}{2} \alpha L) \|\nabla f(x_k)\|_2^2 \leq f(x_k) - \frac{1}{2} \alpha \|\nabla f(x_k)\|_2^2$. Thus, summing over $j \in \{0, \dots, k\}$, one finds $\infty > f(x_0) - f_{\inf} \geq f(x_0) - f(x_{k+1}) \geq \frac{1}{2} \alpha \sum_{j=0}^{k} \|\nabla f(x_j)\|_2^2$. Algorithms for Nonsmooth Optimization 27 of 55
46 Strong convexity Now suppose that f is c-strongly convex, which means that $f(x) \geq f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + \frac{c}{2} \|x - \bar{x}\|_2^2$ for all $(x, \bar{x}) \in \mathbb{R}^n \times \mathbb{R}^n$. Important consequences of this are that f has a unique global minimizer, call it $x^*$ with $f^* := f(x^*)$, and the gradient norm grows with the optimality error in that $2c(f(x) - f^*) \leq \|\nabla f(x)\|_2^2$ for all $x \in \mathbb{R}^n$. Algorithms for Nonsmooth Optimization 28 of 55
47 Strong convexity, lower bound (Figure: graph of f near $x_k$ with the upper quadratic model $f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{L}{2} \|x - x_k\|_2^2$.) Algorithms for Nonsmooth Optimization 29 of 55
48 Strong convexity, lower bound (Figure: graph of f near $x_k$ between the upper quadratic model $f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{L}{2} \|x - x_k\|_2^2$ and the lower quadratic model $f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{c}{2} \|x - x_k\|_2^2$.) Algorithms for Nonsmooth Optimization 29 of 55
49 Gradient descent for strongly convex f Theorem: If $\nabla f$ is Lipschitz with $L > 0$, f is c-strongly convex, and $\alpha \in (0, 1/L]$, then $f(x_{j+1}) - f^* \leq (1 - \alpha c)^{j+1} (f(x_0) - f^*)$ for all $j \in \mathbb{N}$. Algorithms for Nonsmooth Optimization 30 of 55
50 Gradient descent for strongly convex f Theorem: If $\nabla f$ is Lipschitz with $L > 0$, f is c-strongly convex, and $\alpha \in (0, 1/L]$, then $f(x_{j+1}) - f^* \leq (1 - \alpha c)^{j+1} (f(x_0) - f^*)$ for all $j \in \mathbb{N}$. Proof. Let $k \in \mathbb{N}$. Following the previous proof, one finds $f(x_{k+1}) \leq f(x_k) - \frac{1}{2} \alpha \|\nabla f(x_k)\|_2^2 \leq f(x_k) - \alpha c (f(x_k) - f^*)$. Subtracting $f^*$ from both sides, one finds $f(x_{k+1}) - f^* \leq (1 - \alpha c)(f(x_k) - f^*)$. Applying the result repeatedly over $j \in \{0, \dots, k\}$ yields the result. Algorithms for Nonsmooth Optimization 30 of 55
51 A fundamental iteration when f is nonsmooth? What is a fundamental iteration for nonsmooth optimization? Algorithms for Nonsmooth Optimization 31 of 55
52 A fundamental iteration when f is nonsmooth? What is a fundamental iteration for nonsmooth optimization? Steepest descent! For convex f, the directional derivative of f at x along s is $f'(x; s) = \max_{g \in \partial f(x)} g^T s$. Along which direction is f decreasing at the fastest rate? Algorithms for Nonsmooth Optimization 31 of 55
53 A fundamental iteration when f is nonsmooth? What is a fundamental iteration for nonsmooth optimization? Steepest descent! For convex f, the directional derivative of f at x along s is $f'(x; s) = \max_{g \in \partial f(x)} g^T s$. Along which direction is f decreasing at the fastest rate? The solution of an optimization problem! $\min_{\|s\|_2 \leq 1} f'(x; s) = \min_{\|s\|_2 \leq 1} \max_{g \in \partial f(x)} g^T s = \max_{g \in \partial f(x)} \min_{\|s\|_2 \leq 1} g^T s$ (von Neumann minimax theorem) $= \max_{g \in \partial f(x)} (-\|g\|_2) = -\min_{g \in \partial f(x)} \|g\|_2$ (need the minimum norm subgradient) Algorithms for Nonsmooth Optimization 31 of 55
54 Main challenge But, typically, we can only access some $g \in \partial f(x)$, not all of $\partial f(x)$. I would argue: there is no practical fundamental iteration for general nonsmooth optimization (no computable descent direction that vanishes near a minimizer). What are our options? Algorithms for Nonsmooth Optimization 32 of 55
55 Main challenge But, typically, we can only access some $g \in \partial f(x)$, not all of $\partial f(x)$. I would argue: there is no practical fundamental iteration for general nonsmooth optimization (no computable descent direction that vanishes near a minimizer). What are our options? There are a few ways to design a convergent algorithm: algorithmically (e.g., subgradient method), iteratively (e.g., cutting plane / bundle methods), randomly (e.g., gradient sampling) Algorithms for Nonsmooth Optimization 32 of 55
56 Subgradient method Algorithm: Subgradient method (not descent) 1: Choose an initial point $x_0 \in \mathbb{R}^n$. 2: for $k = 0, 1, 2, \dots$ do 3: if a termination condition is satisfied, then return $x_k$ 4: else compute $g_k \in \partial f(x_k)$, choose $\alpha_k \in \mathbb{R}_{>0}$, and set $x_{k+1} \leftarrow x_k - \alpha_k g_k$ Algorithms for Nonsmooth Optimization 33 of 55
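A minimal Python sketch of the subgradient method with the diminishing stepsizes $\alpha_k = \alpha/k$ discussed below (not from the talk; NumPy assumed, and subgrad_f returns any one subgradient):

```python
import numpy as np

def subgradient_method(f, subgrad_f, x0, alpha=1.0, max_iter=10000):
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(1, max_iter + 1):
        g = subgrad_f(x)
        x = x - (alpha / k) * g       # not a descent method: f(x) may increase
        if f(x) < best_f:             # so track the best objective value seen
            best_f, best_x = f(x), x.copy()
    return best_x, best_f
```

Tracking the best iterate matches the convergence guarantee stated later, which is in terms of $\min_{j \leq k} f_j$ rather than $f_k$.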
57 Why not subgradient descent? Consider $\min_{x \in \mathbb{R}^2} f(x)$, where $f(x_1, x_2) := x_1 + x_2 + \max\{0, x_1^2 + x_2^2 - 4\}$. At $\bar{x} = (0, -2)$, we have $\partial f(\bar{x}) = \mathrm{conv}\{(1, 1), (1, -3)\}$, but $-(1, 1)$ and $-(1, -3)$ are both directions of ascent for f from $\bar{x}$! Algorithms for Nonsmooth Optimization 34 of 55
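A quick numerical check that both negative subgradients are ascent directions at this point (one-sided finite-difference estimates of the directional derivatives; NumPy assumed):

```python
import numpy as np

f = lambda x: x[0] + x[1] + max(0.0, x[0]**2 + x[1]**2 - 4.0)

def directional_derivative(f, x, d, t=1e-6):
    # one-sided finite-difference estimate of f'(x; d)
    return (f(x + t * d) - f(x)) / t

x_bar = np.array([0.0, -2.0])
for g in (np.array([1.0, 1.0]), np.array([1.0, -3.0])):
    print(-g, directional_derivative(f, x_bar, -g))   # both estimates are positive: ascent
```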
58 Decreasing the distance to a solution The objective f is not the only measure of progress. Given an arbitrary subgradient $g_k$ for f at $x_k$, we have $f(x) \geq f(x_k) + g_k^T (x - x_k)$ for all $x \in \mathbb{R}^n$, (1) which means that all points with an objective value lower than $f(x_k)$ lie in $H_k := \{x \in \mathbb{R}^n : g_k^T (x - x_k) \leq 0\}$. Thus, a small step along $-g_k$ should decrease the distance to a solution. (Convexity is crucial for this idea.) Algorithms for Nonsmooth Optimization 35 of 55
59 Algorithmic convergence Theorem: If f has a minimizer, $\|g_k\|_2 \leq G \in \mathbb{R}_{>0}$ for all $k \in \mathbb{N}$, and the stepsizes satisfy $\sum_{k=1}^{\infty} \alpha_k = \infty$ and $\sum_{k=1}^{\infty} \alpha_k^2 < \infty$, (2) then $\lim_{k \to \infty} \{\min_{j \in \{0,\dots,k\}} f_j\} = f^*$. An example sequence satisfying (2) is $\alpha_k = \alpha/k$ for $k = 1, 2, \dots$ Algorithms for Nonsmooth Optimization 36 of 55
60 Proof of $\lim_{k \to \infty} \{\min_{j \in \{0,\dots,k\}} f_j\} = f^*$, part 1. Let $k \in \mathbb{N}$. By (1), the iterates satisfy $\|x_{k+1} - x^*\|_2^2 = \|x_k - \alpha_k g_k - x^*\|_2^2 = \|x_k - x^*\|_2^2 - 2 \alpha_k g_k^T (x_k - x^*) + \alpha_k^2 \|g_k\|_2^2 \leq \|x_k - x^*\|_2^2 - 2 \alpha_k (f_k - f^*) + \alpha_k^2 \|g_k\|_2^2$. Applying this inequality recursively, we have $0 \leq \|x_{k+1} - x^*\|_2^2 \leq \|x_0 - x^*\|_2^2 - 2 \sum_{j=0}^{k} \alpha_j (f_j - f^*) + \sum_{j=0}^{k} \alpha_j^2 \|g_j\|_2^2$, which implies that $2 \sum_{j=0}^{k} \alpha_j (f_j - f^*) \leq \|x_0 - x^*\|_2^2 + \sum_{j=0}^{k} \alpha_j^2 \|g_j\|_2^2$, so $\min_{j \in \{0,\dots,k\}} f_j - f^* \leq \frac{\|x_0 - x^*\|_2^2 + G^2 \sum_{j=0}^{k} \alpha_j^2}{2 \sum_{j=0}^{k} \alpha_j}$. (3) Algorithms for Nonsmooth Optimization 37 of 55
61 Proof of $\lim_{k \to \infty} \{\min_{j \in \{0,\dots,k\}} f_j\} = f^*$, part 2. Now consider an arbitrary scalar $\epsilon > 0$. By (2), there exists a nonnegative integer $K$ such that, for all $k > K$, $\alpha_k \leq \frac{\epsilon}{G^2}$ and $\sum_{j=0}^{k} \alpha_j \geq \frac{1}{\epsilon} \left( \|x_0 - x^*\|_2^2 + G^2 \sum_{j=0}^{K} \alpha_j^2 \right)$. Then, by (3), it follows that for all $k > K$ we have $\min_{j \in \{0,\dots,k\}} f_j - f^* \leq \frac{\|x_0 - x^*\|_2^2 + G^2 \sum_{j=0}^{K} \alpha_j^2}{2 \sum_{j=0}^{k} \alpha_j} + \frac{G^2 \sum_{j=K+1}^{k} \alpha_j^2}{2 \sum_{j=K+1}^{k} \alpha_j} \leq \frac{\|x_0 - x^*\|_2^2 + G^2 \sum_{j=0}^{K} \alpha_j^2}{\frac{2}{\epsilon} \left( \|x_0 - x^*\|_2^2 + G^2 \sum_{j=0}^{K} \alpha_j^2 \right)} + \frac{G^2 \sum_{j=K+1}^{k} (\epsilon/G^2) \alpha_j}{2 \sum_{j=K+1}^{k} \alpha_j} = \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$. The result follows since $\epsilon > 0$ was chosen arbitrarily. Algorithms for Nonsmooth Optimization 38 of 55
62 Cutting plane method Subgradient methods lose previously computed information in every iteration. Algorithms for Nonsmooth Optimization 39 of 55
63 Cutting plane method Subgradient methods lose previously computed information in every iteration. Suppose, after a sequence of iterates, we have the affine underestimators $f_i(x) = f(x_i) + g_i^T (x - x_i)$ for all $i \in \{0, \dots, k\}$. (Figure: f with the cutting planes $f(x_0) + g_0^T (x - x_0)$ and $f(x_1) + g_1^T (x - x_1)$, and the next iterate $x_2$.) At iteration k, we can compute the next iterate by solving the master problem $x_{k+1} \in \arg\min_{x \in X} \hat{f}_k(x)$, where $\hat{f}_k(x) := \max_{i \in \{0,\dots,k\}} (f(x_i) + g_i^T (x - x_i))$. Algorithms for Nonsmooth Optimization 39 of 55
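Since the master problem is a piecewise-linear minimization, it can be posed as a linear program. A minimal sketch (not from the talk; SciPy assumed, and X is taken, for illustration, to be a box):

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane_step(points, fvals, subgrads, lower, upper):
    # min_{x in X} max_i { f(x_i) + g_i^T (x - x_i) }  posed as the LP
    # min v  s.t.  g_i^T x - v <= g_i^T x_i - f(x_i),  lower <= x <= upper
    n = len(points[0])
    c = np.concatenate([np.zeros(n), [1.0]])                          # variables (x, v)
    A_ub = np.hstack([np.array(subgrads), -np.ones((len(points), 1))])
    b_ub = np.array([g @ x_i - f_i for x_i, f_i, g in zip(points, fvals, subgrads)])
    bounds = [(lo, hi) for lo, hi in zip(lower, upper)] + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[n]    # next iterate x_{k+1} and lower bound v_{k+1}
```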
64 Cutting plane method convergence The iterates of the cutting plane method yield lower bounds on the optimal value: $v_{k+1} := \min_{x \in X} \hat{f}_k(x) \leq \min_{x \in X} f(x) =: f^*$. Therefore, if $v_{k+1} = f(x_{k+1})$, then we terminate since $f(x_{k+1}) = f^*$. Algorithms for Nonsmooth Optimization 40 of 55
65 Cutting plane method convergence The iterates of the cutting plane method yield lower bounds on the optimal value: $v_{k+1} := \min_{x \in X} \hat{f}_k(x) \leq \min_{x \in X} f(x) =: f^*$. Therefore, if $v_{k+1} = f(x_{k+1})$, then we terminate since $f(x_{k+1}) = f^*$. If f is piecewise linear, then convergence occurs in finitely many iterations! (Figure: a piecewise-linear f with the cutting planes $f(x_0) + g_0^T (x - x_0)$ and $f(x_1) + g_1^T (x - x_1)$.) However, in general, we have the following theorem. Theorem: The cutting plane method yields $\{x_k\}$ satisfying $\{f(x_k)\} \to f^*$. Algorithms for Nonsmooth Optimization 40 of 55
66 Bundle method A bundle method attempts to combine the practical advantages of a cutting plane method with the theoretical strengths of a proximal point method. Given $x_k$, consider the regularized master problem $\min_{x \in \mathbb{R}^n} \hat{f}_k(x) + \frac{\gamma}{2} \|x - x_k\|_2^2$, where $\hat{f}_k(x) := \max_{i \in I_k} (f(x_i) + g_i^T (x - x_i))$. Here, $I_k \subseteq \{1, \dots, k-1\}$ indicates a subset of previous iterations. This problem is equivalent to the quadratic optimization problem $\min_{(x,v) \in \mathbb{R}^n \times \mathbb{R}} v + \frac{\gamma}{2} \|x - x_k\|_2^2$ s.t. $f(x_i) + g_i^T (x - x_i) \leq v$ for all $i \in I_k$. Only move to a new point when a sufficient decrease is obtained. Convergence rate analyses are limited; $O(\frac{1}{\epsilon} \log(\frac{1}{\epsilon}))$ for strongly convex f Algorithms for Nonsmooth Optimization 41 of 55
67 Bundle method convergence Analysis makes use of the Moreau-Yosida regularization function $f_\gamma(\bar{x}) = \min_{x \in \mathbb{R}^n} \left( f(x) + \frac{\gamma}{2} \|x - \bar{x}\|_2^2 \right)$. Theorem: If $x_k$ is not a minimizer, then $f_\gamma(x_k) < f(x_k)$. Algorithms for Nonsmooth Optimization 42 of 55
68 Bundle method convergence Theorem: For all $(k, j) \in \mathbb{N} \times \mathbb{N}$ in a bundle method, $v_{k,j} + \frac{\gamma}{2} \|x_{k,j} - x_k\|_2^2 \leq f_\gamma(x_k) < f(x_k)$. Algorithms for Nonsmooth Optimization 43 of 55
69 Outline Motivating Examples Subdifferential Theory Fundamental Algorithms Nonconvex Nonsmooth Functions General Framework Algorithms for Nonsmooth Optimization 44 of 55
70 Clarke subdifferential What if f is nonconvex and nonsmooth? What are subgradients? We still need some structure; we assume f is locally Lipschitz and f is differentiable on a full measure set D Algorithms for Nonsmooth Optimization 45 of 55
71 Clarke subdifferential What if f is nonconvex and nonsmooth? What are subgradients? We still need some structure; we assume f is locally Lipschitz and f is differentiable on a full measure set D The Clarke subdifferential of f at x is $\partial f(x) = \mathrm{conv}\left\{ \lim_{j \to \infty} \nabla f(x_j) : x_j \to x \text{ and } x_j \in D \right\}$, i.e., the convex hull of limits of gradients of f at points in D converging to x Algorithms for Nonsmooth Optimization 45 of 55
72 Clarke subdifferential What if f is nonconvex and nonsmooth? What are subgradients? We still need some structure; we assume f is locally Lipschitz and f is differentiable on a full measure set D The Clarke subdifferential of f at x is $\partial f(x) = \mathrm{conv}\left\{ \lim_{j \to \infty} \nabla f(x_j) : x_j \to x \text{ and } x_j \in D \right\}$, i.e., the convex hull of limits of gradients of f at points in D converging to x Theorem: If f is continuously differentiable at x, then $\partial f(x) = \{\nabla f(x)\}$ Algorithms for Nonsmooth Optimization 45 of 55
73 Differentiable, but nonsmooth Theorem: If f is differentiable at x, then $\{\nabla f(x)\} \subseteq \partial f(x)$ (not necessarily equal). Considering $f(x) = x^2 \cos(1/x)$ if $x \neq 0$ and $f(x) = 0$ if $x = 0$, one finds that $f'(0) = 0$ yet $[-1, 1] \subseteq \partial f(0)$ Algorithms for Nonsmooth Optimization 46 of 55
74 Clarke ɛ-subdifferential As before, we typically cannot compute $\partial f(x)$. It is approximated by the Clarke $\epsilon$-subdifferential, namely, $\partial_\epsilon f(x) = \mathrm{conv}\{\partial f(B(x, \epsilon))\}$, which in turn can be approximated as in $\partial_\epsilon f(x) \approx \mathrm{conv}\{\nabla f(x_k), \nabla f(x_{k,1}), \dots, \nabla f(x_{k,m})\}$, where $\{x_{k,1}, \dots, x_{k,m}\} \subset B(x_k, \epsilon)$. Algorithms for Nonsmooth Optimization 47 of 55
75 Clarke ɛ-subdifferential and gradient sampling As before, we typically cannot compute $\partial f(x)$. It is approximated by the Clarke $\epsilon$-subdifferential, namely, $\partial_\epsilon f(x) = \mathrm{conv}\{\partial f(B(x, \epsilon))\}$, which in turn can be approximated as in $\partial_\epsilon f(x) \approx \mathrm{conv}\{\nabla f(x_k), \nabla f(x_{k,1}), \dots, \nabla f(x_{k,m})\}$, where $\{x_{k,1}, \dots, x_{k,m}\} \subset B(x_k, \epsilon)$. In gradient sampling, we compute the minimum norm element in $\mathrm{conv}\{\nabla f(x_k), \nabla f(x_{k,1}), \dots, \nabla f(x_{k,m})\}$, which is equivalent to solving $\min_{(x,v) \in \mathbb{R}^n \times \mathbb{R}} v + \frac{1}{2} \|x - x_k\|_2^2$ s.t. $f(x_k) + \nabla f(x_{k,i})^T (x - x_k) \leq v$ for all $i \in \{1, \dots, m\}$ Algorithms for Nonsmooth Optimization 47 of 55
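A minimal sketch of the gradient-sampling computation, i.e., the minimum-norm element of the convex hull of sampled gradients (not the NonOpt implementation; SciPy assumed, and the simple box sampling of points near x is an illustrative simplification of sampling from the ball):

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_element(G):
    # minimum-norm element of conv{g_1, ..., g_m}, the columns of G:
    # solve min_w ||G w||^2  s.t.  w >= 0, sum(w) = 1  (a small QP, here via SLSQP)
    m = G.shape[1]
    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    res = minimize(lambda w: np.dot(G @ w, G @ w), np.full(m, 1.0 / m),
                   bounds=[(0.0, None)] * m, constraints=cons)
    return G @ res.x   # minus this vector is the search direction

def sample_gradients(grad_f, x, eps, m, rng=np.random.default_rng(0)):
    # gradients at x and at m randomly sampled points near x (within eps per coordinate)
    pts = [x] + [x + eps * rng.uniform(-1.0, 1.0, size=x.shape) for _ in range(m)]
    return np.column_stack([grad_f(p) for p in pts])
```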
76 Outline Motivating Examples Subdifferential Theory Fundamental Algorithms Nonconvex Nonsmooth Functions General Framework Algorithms for Nonsmooth Optimization 48 of 55
77 Popular and effective method Despite all I've talked about, a very effective method: BFGS Algorithms for Nonsmooth Optimization 49 of 55
78 Popular and effective method Despite all I've talked about, a very effective method: BFGS Approximate second-order information with gradient displacements (figure: points $x_k$, $x_{k+1}$, and $\bar{x}$). Secant equation $H_k y_k = s_k$ to match the gradient of f at $x_k$, where $s_k := x_{k+1} - x_k$ and $y_k := \nabla f(x_{k+1}) - \nabla f(x_k)$ Algorithms for Nonsmooth Optimization 49 of 55
79 BFGS-type updates Inverse Hessian and Hessian approximation updating formulas (with $s_k^T v_k > 0$): $W_{k+1} \leftarrow \left( I - \frac{v_k s_k^T}{s_k^T v_k} \right)^T W_k \left( I - \frac{v_k s_k^T}{s_k^T v_k} \right) + \frac{s_k s_k^T}{s_k^T v_k}$ and $H_{k+1} \leftarrow \left( I - \frac{s_k s_k^T H_k}{s_k^T H_k s_k} \right)^T H_k \left( I - \frac{s_k s_k^T H_k}{s_k^T H_k s_k} \right) + \frac{v_k v_k^T}{s_k^T v_k}$. With an appropriate technique for choosing $v_k$, we attain self-correcting properties for $\{H_k\}$ and $\{W_k\}$, (inverse) Hessian approximations that can be used in other algorithms Algorithms for Nonsmooth Optimization 50 of 55
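A minimal NumPy sketch of the inverse-Hessian update above (the choice of $v_k$, which gives the self-correcting properties, is omitted here; with $v_k = y_k$ this reduces to the standard BFGS update):

```python
import numpy as np

def bfgs_inverse_update(W, s, v):
    # W_{k+1} = (I - v s^T / (s^T v))^T W (I - v s^T / (s^T v)) + s s^T / (s^T v),
    # valid whenever s^T v > 0
    rho = 1.0 / (s @ v)
    V = np.eye(len(s)) - rho * np.outer(v, s)
    return V.T @ W @ V + rho * np.outer(s, s)
```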
80 Subproblems in nonsmooth optimization algorithms With sets of points, scalars, and (sub)gradients $\{x_{k,j}\}_{j=1}^m$, $\{f_{k,j}\}_{j=1}^m$, $\{g_{k,j}\}_{j=1}^m$, nonsmooth optimization methods involve the primal subproblem $\min_{x \in \mathbb{R}^n} \max_{j \in \{1,\dots,m\}} \{f_{k,j} + g_{k,j}^T (x - x_{k,j})\} + \frac{1}{2} (x - x_k)^T H_k (x - x_k)$ s.t. $\|x - x_k\| \leq \delta_k$, (P) but, with $G_k \leftarrow [g_{k,1} \cdots g_{k,m}]$, it is typically more efficient to solve the dual $\sup_{(\omega,\gamma) \in \mathbb{R}^m_+ \times \mathbb{R}^n} -\frac{1}{2} (G_k \omega + \gamma)^T W_k (G_k \omega + \gamma) + b_k^T \omega - \delta_k \|\gamma\|_*$ s.t. $1_m^T \omega = 1$. (D) The primal solution can then be recovered by $x_k^* \leftarrow x_k - W_k (G_k \omega_k + \gamma_k)$, where $\bar{g}_k := G_k \omega_k + \gamma_k$. Algorithms for Nonsmooth Optimization 51 of 55
81 Algorithm Self-Correcting Variable-Metric Alg. for Nonsmooth Opt. 1: Choose $x_1 \in \mathbb{R}^n$. 2: Choose a symmetric positive definite $W_1 \in \mathbb{R}^{n \times n}$. 3: Choose $\alpha \in (0, 1)$ 4: for $k = 1, 2, \dots$ do 5: Solve (P)/(D) such that setting $G_k \leftarrow [g_{k,1} \cdots g_{k,m}]$, $s_k \leftarrow -W_k (G_k \omega_k + \gamma_k)$, and $x_{k+1} \leftarrow x_k + s_k$ 6: yields $f(x_{k+1}) \leq f(x_k) - \frac{1}{2} \alpha (G_k \omega_k + \gamma_k)^T W_k (G_k \omega_k + \gamma_k)$. 7: Choose $v_k$ (details omitted, but very simple) 8: Set $W_{k+1} \leftarrow \left( I - \frac{v_k s_k^T}{s_k^T v_k} \right)^T W_k \left( I - \frac{v_k s_k^T}{s_k^T v_k} \right) + \frac{s_k s_k^T}{s_k^T v_k}$. Algorithms for Nonsmooth Optimization 52 of 55
82 Instances of the framework Cutting plane / bundle methods Points added incrementally until sufficient decrease obtained Finite number of additions until accepted step Gradient sampling methods Points added randomly / incrementally until sufficient decrease obtained Sufficient number of iterations with good steps In any case: convergence guarantees require {W k } to be uniformly positive definite and bounded on a sufficient number of accepted steps Algorithms for Nonsmooth Optimization 53 of 55
83 C++ implementation: NonOpt BFGS w/ weak Wolfe line search, exit status by test problem: maxq: Stationary; mxhilb: Stepsize; chained lq: Stepsize; chained cb3 1: Stepsize; chained cb3 2: Stepsize; active faces: Stepsize; brown function 2: Stepsize; chained mifflin 2: Stepsize; chained crescent 1: Stepsize; chained crescent 2: Stepsize. Bundle method with self-correcting properties, exit status by test problem: maxq: Stationary; mxhilb: Stationary; chained lq: Stationary; chained cb3 1: Stationary; chained cb3 2: Stationary; active faces: Stationary; brown function 2: Stationary; chained mifflin 2: Stationary; chained crescent 1: Stationary; chained crescent 2: Stationary. Algorithms for Nonsmooth Optimization 54 of 55
84 Thanks! NonOpt coming soon... Andreas could finish in a day what has taken me 6 months on sabbatical, so it'll be done when he has a free day ;-) Thanks for listening! Algorithms for Nonsmooth Optimization 55 of 55