Nonsmooth optimization: beyond first-order methods. A tutorial focusing on bundle methods
1 Nonsmooth optimization: beyond first-order methods. A tutorial focusing on bundle methods. Claudia Sagastizábal (IMECC-UniCamp, Campinas, Brazil; adjunct researcher). SESO 2018, Paris, May 23 and 25, 2018
2-5 Computational NSO: what do we mean? For the unconstrained problem min f(x), where f : IR^n → IR is convex but not differentiable at some points, algorithms are defined according to how much information is provided by a certain oracle:
* an informative oracle: given x, it returns f(x) and the full subdifferential ∂f(x);
* a black box: given x, it returns f(x) and ONE subgradient g(x) ∈ ∂f(x).
How common are nonsmooth objective functions in optimization?
6 When does nonsmoothness appear?
* if the nature of the problem imposes a nonsmooth model; or
* if sparsity of the solution is a concern; or
* in problems difficult to solve, because they are large-scale or because they are heterogeneous; sometimes the solution method itself induces nonsmoothness.
7 Example of NS model Recovery of blocky images (l 1 -regularization of TV)
8-9 Example of sparse optimization: min{ ‖x‖₁ : Ax = b }
Basis pursuit: find the least 1-norm point on the affine plane. Tends to return a sparse point (sometimes, the sparsest).
LASSO denoises basis pursuit:
min{ ‖Ax − b‖₂² : ‖x‖₁ ≤ τ }  or  min{ ‖x‖₁ + (µ/2)‖Ax − b‖₂² }  or  min{ ‖x‖₁ : ‖Ax − b‖₂² ≤ σ }
10 Example of sparse optimization: min{ ‖x‖₁ : h(x) = b }
Basis pursuit: find the least 1-norm point on a nonlinear set. Tends to return a sparse point (sometimes, the sparsest).
LASSO denoises basis pursuit:
min{ ‖h(x) − b‖₂² : ‖x‖₁ ≤ τ }  or  min{ ‖x‖₁ + (µ/2)‖h(x) − b‖₂² }  or  min{ ‖x‖₁ : ‖h(x) − b‖₂² ≤ σ }
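All of these formulations lean on the fact that the l1-norm has an explicit proximal operator, the coordinatewise soft-threshold. A minimal sketch (the function name is ours), using the convention prox_t^f(x) = argmin f(y) + (1/2t)‖y − x‖₂²:

```python
# Soft-thresholding: the prox of the l1-norm, the building block of
# LASSO-type solvers.  soft_threshold(x, t) computes, coordinatewise,
# argmin_y ||y||_1 + (1/(2t)) ||y - x||_2^2.

def soft_threshold(x, t):
    out = []
    for v in x:
        m = abs(v) - t           # shrink the magnitude by t ...
        out.append(0.0 if m <= 0 else (m if v > 0 else -m))  # ... or clip to 0
    return out

print(soft_threshold([3.0, -0.5, 0.2], 1.0))  # -> [2.0, 0.0, 0.0]
```

Small entries are set exactly to zero, which is the mechanism behind the sparsity of the solutions mentioned above.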
11-15 Lagrangian Relaxation Example. Real-life optimization problems

(primal)  max Σ_{j∈J} C_j(p_j)  s.t., for j ∈ J: p_j ∈ P_j,  Σ_{j∈J} g_j(p_j) = Dem

often exhibit separable structure after dualization. Passing to the (dual):

min_x f(x) := f_0(x) + Σ_{j∈J} f_j(x),  with f_0(x) = −⟨x, Dem⟩ and f_j(x) = max{ C_j(p_j) + ⟨x, g_j(p_j)⟩ : p_j ∈ P_j }
16-18 Benders Decomposition Example. Similar situation, but now the uncoupling is done at the primal level:

(primal)  min Σ_{j∈J} [ I_j(p̄_j) + C_j(p_j) ]  s.t., for j ∈ J: p_j ∈ P_j with p_j coupled to the design variable p̄_j, and p̄ ∈ D.

Fixing the coupling variables p̄ gives the two-level form

min_{p̄∈D} Σ_{j∈J} [ I_j(p̄_j) + V_j(p̄_j) ],  where V_j(p̄_j) := min{ C_j(p_j) : p_j ∈ P_j, p_j coupled to p̄_j },

that is, min f(p̄) := Σ_{j∈J} f_j(p̄_j) for f_j(p̄_j) := I_j(p̄_j) + V_j(p̄_j)
19 Computing f(x^k): how difficult is it?
1. f(x) = |x|, for n = 1
2. A linear LASSO function, f(x) = ‖x‖₁ + (µ/2)‖Ax − b‖₂²
3. A nonlinear LASSO function, h ∈ C¹, f(x) = ‖x‖₁ + (µ/2)‖h(x) − b‖₂²
4. One of the local subproblems in the Lagrangian example, f_j(x^k) := max{ C_j(p_j) + ⟨x^k, g_j(p_j)⟩ : p_j ∈ P_j }
5. One of the local subproblems in the Benders example, f_j(x^{k,j}) = I_j(p̄_j) + V_j(p̄_j), with V_j(p̄_j) = min{ C_j(p_j) : p_j ∈ P_j, p_j coupled to x^{k,j} }
20-25 But why would one want ALL of ∂f(x^k)?
It is indispensable to calculate the proximal point p = prox_t^f(x):
p = argmin_y f(y) + (1/2t)‖y − x‖₂²  ⟺  0 ∈ ∂f(p) + (1/t)(p − x)  ⟺  (1/t)(x − p) ∈ ∂f(p)
Without full knowledge of the subdifferential, the implicit inclusion cannot be solved!
Note: p = x − tG for some G ∈ ∂f(p), akin to a subgradient step, but with the subgradient taken at the point p being computed.
26-28 Proximal point algorithms (Accel. Nesterov, FISTA, AugLag)
x^{k+1} = prox_{t_k}^f(x^k), i.e., x^{k+1} = argmin_y f(y) + (1/(2t_k))‖y − x^k‖₂²
- of interest if computing prox_{t_k}^f(x^k) is much easier than minimizing f
- the stepsize t_k > 0 impacts the number of iterations
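When the prox is available in closed form, the proximal point iteration is a one-liner. A toy illustration for f(x) = |x| in one dimension (minimizer at 0), whose prox is the scalar soft-threshold; the stepsize choice is ours:

```python
# Proximal point iteration x_{k+1} = prox_{t_k}^f(x_k) for f(x) = |x|,
# whose prox is available in closed form (soft-threshold).

def prox_abs(x, t):
    # argmin_y |y| + (1/(2t)) * (y - x)^2
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

x = 5.0                      # x^1
for k in range(10):
    x = prox_abs(x, t=1.0)   # constant stepsize t_k = 1
print(x)                     # -> 0.0, the minimizer, reached after 5 steps
```

Each iteration moves a full step of length t towards 0 and then stays there, illustrating why the sum of the stepsizes governs convergence.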
29 Proximal point: calculus rules
- separable sum: f(x,y) = g(x) + h(y)  ⟹  prox_t^f(x,y) = ( prox_t^g(x), prox_t^h(y) )
- scalar factor (α ≠ 0) and translation (v ≠ 0): f(x) = g(αx + v)  ⟹  prox_t^f(x) = α⁻¹ [ prox_{α²t}^g(αx + v) − v ]
- "perspective" (α > 0): f(x) = α g(α⁻¹x)  ⟹  prox_t^f(x) = α prox_t^{g/α}(α⁻¹x)
30 Proximal point: special functions
- linear term (v ≠ 0): f(x) = g(x) + ⟨v, x⟩  ⟹  prox_t^f(x) = prox_t^g(x − tv)
- convex quadratic term: f(x) = g(x) + (1/2)‖x − v‖²  ⟹  prox_t^f(x) = prox_t^{λg}(λx + (1 − λ)v) for λ = 1/(t + 1)
- composition with a linear map A such that AAᵀ = α⁻¹I (α ≠ 0): f(x) = g(Ax + v)  ⟹  prox_t^f(x) = (I − αAᵀA)x + αAᵀ[ prox_t^{g/α}(Ax + v) − v ]
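Rules like these are easy to sanity-check numerically. The sketch below verifies, for g(y) = |y| in one dimension, the linear-term rule prox_t^{g+⟨v,·⟩}(x) = prox_t^g(x − tv) and the quadratic-term rule prox_t^{g+½‖·−v‖²}(x) = prox_{λt}^g(λx + (1−λ)v) with λ = 1/(t+1); the exact prox is replaced by a brute-force grid minimizer, and the grid bounds and tolerances are ours:

```python
# Brute-force check of two prox calculus rules for g(y) = |y| in 1-D.
# prox_num approximates argmin_y f(y) + (1/(2t))(y - x)^2 on a fine grid.

def prox_num(f, x, t, lo=-10.0, hi=10.0, n=200001):
    best, arg = float("inf"), None
    for i in range(n):
        y = lo + (hi - lo) * i / (n - 1)
        val = f(y) + (y - x) ** 2 / (2 * t)
        if val < best:
            best, arg = val, y
    return arg

g = abs
t, v, x = 0.5, 2.0, 1.3

# linear term: f = g + <v, .>  =>  prox_t^f(x) = prox_t^g(x - t*v)
lhs1 = prox_num(lambda y: g(y) + v * y, x, t)
rhs1 = prox_num(g, x - t * v, t)

# quadratic term: f = g + 0.5*(. - v)^2
#   =>  prox_t^f(x) = prox_{lam*t}^g(lam*x + (1 - lam)*v), lam = 1/(t + 1)
lam = 1.0 / (t + 1.0)
lhs2 = prox_num(lambda y: g(y) + 0.5 * (y - v) ** 2, x, t)
rhs2 = prox_num(g, lam * x + (1 - lam) * v, lam * t)

print(abs(lhs1 - rhs1) < 1e-3, abs(lhs2 - rhs2) < 1e-3)  # True True
```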
31-32 Proximal point algorithm: convergence. If argmin f ≠ ∅, then
f(x^k) − f(x̄) ≤ ‖x⁰ − x̄‖₂² / (2 Σ_{i=1}^k t_i)
⟹ convergence if Σ t_i = +∞;  ⟹ rate 1/k if {t_k} is bounded away from zero.
33-34 Proximal point algorithm: acceleration
x^{k+1} = prox_{t_k}^f( x^k + θ_{k+1}(θ_k⁻¹ − 1)(x^k − x^{k−1}) ), with θ_{k+1} chosen so that θ²_{k+1} t_{k+1} = (1 − θ_{k+1}) θ²_k t_k
⟹ convergence if Σ t_i = +∞;  ⟹ rate 1/k² if {t_k} is bounded away from zero.
35-38 What if prox_t^f is not computable? Use bundle methods!
When do bundle methods prove most useful? In situations when the objective function is not available explicitly, and/or when we do not have access to the full subdifferential, and/or when calculations need to be done with high precision.
39-42 Bundling to approximate the prox
WANT: p = prox_t^f(x):  p = argmin_y f(y) + (1/2t)‖y − x‖₂²  ⟺  (1/t)(x − p) ∈ ∂f(p)
HAVE: q = prox_t^M(x):  q = argmin_y M(y) + (1/2t)‖y − x‖₂²  ⟺  (1/t)(x − q) ∈ ∂M(q)
M is a model of f for which we DO have full knowledge of the subdifferential: the implicit inclusion can be solved!
43 How is the model built?
44 Model built with the black box
45-54 A quick overview of Convex Analysis. An example of a convex nonsmooth function. [Figures]
At a point x where f is differentiable, {∇f(x)} = {slope of the linearization supporting f, tangent at x}; by convexity, f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ for all y.
At a kink, all supporting slopes are collected in the subdifferential:
∂f(x) = {g ∈ IR^n : f(y) ≥ f(x) + ⟨g, y − x⟩ for all y} = {slopes of linearizations supporting f, tangent at x}
55-56 What can be done with the oracle output?
For a convex nonsmooth function, with ∂f(x) = {g ∈ IR^n : f(y) ≥ f(x) + ⟨g, y − x⟩ for all y}:
1 oracle call at x^k, returning f(x^k) and g(x^k) ∈ ∂f(x^k), gives 1 linearization.
57 BEWARE: if the oracle output at x^k is not accurate, the linearization can be wrong! A wrong g(x^k) ∉ ∂f(x^k) gives a bad linearization at x^k, one that may fail to support the graph of f. (Similarly if f(x^k) is wrong; more on this later.)
58-62 How is the oracle information used? Putting together linearizations creates a cutting-plane model M for f: with f_i = f(x^i) and g^i = g(x^i) ∈ ∂f(x^i),
M_k(y) = max_{i ≤ k} { f_i + ⟨g^i, y − x^i⟩ }
(just one type of model; many others are possible)
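A cutting-plane model is a few lines of code. The sketch below builds M_k for f(x) = |x| from three black-box calls (sample points are ours) and checks the two defining properties: M_k never overestimates f, and it interpolates f at the sampled points:

```python
# Cutting-plane model M_k(y) = max_i { f_i + g_i*(y - x_i) } for f(x) = |x|,
# built from black-box calls.

def oracle(x):                       # black box: f(x) and ONE subgradient
    return abs(x), (1.0 if x >= 0 else -1.0)

bundle = []                          # stores (x_i, f_i, g_i)
for xi in [2.0, -1.5, 0.5]:
    fi, gi = oracle(xi)
    bundle.append((xi, fi, gi))

def M(y):
    return max(fi + gi * (y - xi) for (xi, fi, gi) in bundle)

ys = [i / 10 - 3 for i in range(61)]
print(all(M(y) <= abs(y) + 1e-12 for y in ys))                 # True: M <= f
print(all(abs(M(xi) - fi) < 1e-12 for (xi, fi, _) in bundle))  # True: exact at x_i
```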
63-64 Infinite bundling yields prox_t^f
WANT: p = prox_t^f(x):  p = argmin_y f(y) + (1/2t)‖y − x‖₂²
HAVE: q^k = prox_{t_k}^{M_k}(x):  q^k = argmin_y M_k(y) + (1/(2t_k))‖y − x‖₂², i.e., 0 = G^k + t_k⁻¹(q^k − x) for some G^k ∈ ∂M_k(q^k)
Theorem [CL93]. Suppose the models satisfy, for all k and y,
- M_k(y) ≤ f(y)
- M_{k+1}(y) ≥ f(q^k) + ⟨g(q^k), y − q^k⟩
- M_{k+1}(y) ≥ M_k(q^k) + ⟨G^k, y − q^k⟩
If 0 < t_min ≤ t_{k+1} ≤ t_k, then lim_k q^k = p and lim_k M_k(q^k) = f(p).
65-69 Models for the half-and-half function f(x) = √(xᵀAx) + xᵀBx
STRUCTURE and corresponding oracle output:
- none: f as is; oracle returns f_k := f(x^k) and g^k ∈ ∂f(x^k)
- sum: f(x) = f₁(x) + f₂(x), with f₁(x) = √(xᵀAx) and f₂(x) = xᵀBx (f₂ is smooth); oracle returns f₁^k, g₁^k, f₂^k, ∇f₂(x^k)
- composition: f(x) = (h ∘ c)(x), with c(x) = (x, xᵀBx) ∈ IR^{n+1} (c is smooth) and h(C) = √(C_{1:n}ᵀ A C_{1:n}) + C_{n+1} (h is sublinear); oracle returns c^k = c(x^k), c′(x^k), h^k, g^k ∈ ∂h(c^k)
70 NSO pitfalls
71-72 Stopping test in smooth optimization. Algorithms for unconstrained smooth optimization use as optimality certificate Fermat's rule, 0 = ∇f(x̄), and generate a minimizing sequence {x^k} → x̄ such that ∇f(x^k) → 0: if f ∈ C¹, then ∇f(x̄) = 0.
Things are less straightforward if f is nonsmooth...
73-75 What happens with the stopping test in NSO? Algorithms for unconstrained NSO use as optimality certificate the inclusion 0 ∈ ∂f(x̄).
As a set-valued mapping, ∂f(x) is outer semicontinuous (osc):
( x^k → x̄, g(x^k) ∈ ∂f(x^k), g(x^k) → ḡ )  ⟹  ḡ ∈ ∂f(x̄)
But as a set-valued mapping, ∂f(x) is NOT inner semicontinuous (isc): given ḡ ∈ ∂f(x̄), there may exist NO sequence ( x^k → x̄, g(x^k) ∈ ∂f(x^k) ) with g(x^k) → ḡ.
76 The subdifferential. For the absolute value function f(x) = |x|,
∂f(x) = {−1} if x < 0;  [−1, 1] if x = 0;  {+1} if x > 0.
77-78 What happens with the stopping test in NSO? We need to design a sound stopping test that does not rely on the straightforward extension of Fermat's rule. We use instead
ḡ ∈ ∂_ε f(x̄) for ḡ and ε small,
where the ε-subdifferential contains the slopes of linearizations supporting f up to ε, tangent at x:
∂_ε f(x) = {g ∈ IR^n : f(y) ≥ f(x) + ⟨g, y − x⟩ − ε for all y}
79-82 The ε-subdifferential
∂_ε f(x) = {g ∈ IR^n : f(y) ≥ f(x) + ⟨g, y − x⟩ − ε for all y}
[Figures: the band between f and f − ε; a subgradient linearization is tangent to the graph of f, while an ε-subgradient linearization, lowered by ε at x, stays below f everywhere.]
83-84 The ε-subdifferential. For the absolute value function f(x) = |x|,
∂_ε f(x) = [−1, −1 − ε/x] if x < −ε/2;  [−1, 1] if −ε/2 ≤ x ≤ ε/2;  [1 − ε/x, 1] if x > ε/2,
to be compared with
∂f(x) = {−1} if x < 0;  [−1, 1] if x = 0;  {+1} if x > 0.
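The formula can be verified against the definition by brute force. Below, for x = 1 and ε = 0.5 (values chosen arbitrarily), both endpoints of [1 − ε/x, 1] pass the defining inequality on a grid, while a slope slightly below the lower endpoint fails:

```python
# Check the eps-subdifferential formula for f(x) = |x|: for x > eps/2,
# d_eps f(x) = [1 - eps/x, 1].  Brute-force test of the defining inequality.

def is_eps_subgrad(g, x, eps, ys):
    # g in d_eps f(x)  iff  |y| >= |x| + g*(y - x) - eps for all y
    # (small tolerance added for floating-point roundoff)
    return all(abs(y) + 1e-12 >= abs(x) + g * (y - x) - eps for y in ys)

x, eps = 1.0, 0.5
ys = [i / 100 - 20 for i in range(4001)]                # grid on [-20, 20]

print(is_eps_subgrad(1 - eps / x, x, eps, ys))          # True: lower endpoint
print(is_eps_subgrad(1.0, x, eps, ys))                  # True: upper endpoint
print(is_eps_subgrad(1 - eps / x - 0.05, x, eps, ys))   # False: just below
```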
85 The ε-subdifferential
As a set-valued mapping, ∂_ε f(x) is osc:
( ε_k → ε, x^k → x̄, g(x^k) ∈ ∂_{ε_k} f(x^k), g(x^k) → ḡ )  ⟹  ḡ ∈ ∂_ε f(x̄)
As a set-valued mapping, ∂_ε f(x) is ALSO isc: given ḡ ∈ ∂_ε f(x̄), there exist
( ε_k → ε, x^k → x̄, g(x^k) ∈ ∂_{ε_k} f(x^k) ) with g(x^k) → ḡ.
86 The ε-subdifferential and bundle methods. Generate iterates so that, for a subsequence {x̂^k}, the triples ( ε_k, x̂^k, G(x̂^k) ∈ ∂_{ε_k} f(x̂^k) ) satisfy ε_k → ε and G(x̂^k) → ḡ with ε = 0 and ḡ = 0: by outer semicontinuity, any limit point x̄ of {x̂^k} satisfies 0 ∈ ∂f(x̄).
87 The ε-subdifferential and bundle methods You told us we were going to use subgradient information provided by an oracle or a black box, and now you want to use ε-subgradients!
88-95 The transportation formula. How to express subgradients at x^i as ε-subgradients at x̂^k? g^i ∈ ∂f(x^i) if and only if, for all y ∈ IR^n,
f(y) ≥ f(x^i) + ⟨g^i, y − x^i⟩
     = f(x̂^k) + ⟨g^i, y − x^i⟩ − ( f(x̂^k) − f(x^i) )                        [± f(x̂^k)]
     = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − ( f(x̂^k) − f(x^i) − ⟨g^i, x̂^k − x^i⟩ )     [± x̂^k]
     = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − e_i(x̂^k)
⟹  g^i ∈ ∂_{e_i(x̂^k)} f(x̂^k), for the linearization error e_i(x̂^k) := f(x̂^k) − f(x^i) − ⟨g^i, x̂^k − x^i⟩ ≥ 0.
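A numeric check of the transportation formula for f(x) = |x|, with the center and sample points chosen arbitrarily: each linearization error is nonnegative, and each transported subgradient passes the defining inequality of ∂_{e_i} f(x̂):

```python
# Transportation formula for f(x) = |x|: a subgradient g_i taken at x_i is an
# e_i-subgradient at the center xhat, with
# e_i = f(xhat) - f(x_i) - g_i*(xhat - x_i) >= 0.

f = abs
def oracle(x):
    return f(x), (1.0 if x >= 0 else -1.0)

xhat = 0.25
ys = [j / 50 - 10 for j in range(1001)]
for xi in [2.0, -1.0, 0.1]:
    fi, gi = oracle(xi)
    ei = f(xhat) - fi - gi * (xhat - xi)       # linearization error
    assert ei >= 0
    # defining inequality of the e_i-subdifferential at xhat (with tolerance):
    assert all(f(y) + 1e-12 >= f(xhat) + gi * (y - xhat) - ei for y in ys)
print("all subgradients transported to the center")
```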
96 Linearization errors. [Figure: the graphs of f and of the model M_k; the linearization f(x^j) + ⟨g^j, · − x^j⟩ supports f at x^j, and e_j(x̂^k) is the vertical gap between f(x̂^k) and this linearization at the center x̂^k.]
97-98 The ε-subdifferential and bundle methods. We collect the black-box output at x^i, i = 1, 2, ..., k, so that at iteration k we can define a bundle of information, centered at a special iterate x̂^k ∈ {x^i}:
B_k := { ( e_i(x̂^k), g^i ) : e_i(x̂^k) = f(x̂^k) − f(x^i) − ⟨g^i, x̂^k − x^i⟩, g^i ∈ ∂_{e_i(x̂^k)} f(x̂^k) }
A suitable convex combination, ε_k := Σ_{i∈B_k} α_i e_i(x̂^k) and G^k := Σ_{i∈B_k} α_i g^i, will eventually satisfy the optimality condition!
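The aggregation step is one line of arithmetic. Continuing the |x| example (the weights α_i are our arbitrary choice), the aggregate pair (ε_k, G^k) passes the defining inequality of ∂_{ε_k} f(x̂^k):

```python
# Aggregation: a convex combination of transported subgradients,
# G = sum(a_i g_i) with eps = sum(a_i e_i), is an eps-subgradient at the center.

f = abs
xhat = 0.25
pts  = [(2.0, 1.0), (-1.0, -1.0)]     # (x_i, g_i) with g_i in df(x_i)
alph = [0.5, 0.5]                     # convex weights (our choice)

errs = [f(xhat) - f(xi) - gi * (xhat - xi) for (xi, gi) in pts]
G    = sum(a * gi for a, (_, gi) in zip(alph, pts))
eps  = sum(a * ei for a, ei in zip(alph, errs))

ys = [j / 50 - 10 for j in range(1001)]
ok = all(f(y) + 1e-12 >= f(xhat) + G * (y - xhat) - eps for y in ys)
print(G, eps, ok)   # -> 0.0 0.25 True
```

Note that G = 0 here even though xhat = 0.25 is not the minimizer: the price is paid in eps, exactly the trade-off that the bundle stopping test monitors.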
99 Why special NSO methods? Smooth optimization techniques do not work. For f(x) = |x|: ∇f(x^k) = ±1 whenever x^k ≠ 0, while ∂f(0) = [−1, 1]. The smooth stopping test fails: ‖∇f(x^k)‖ ≤ TOL never holds, since ‖g(x^k)‖ = 1 > TOL at every iterate x^k ≠ 0.
100-101 Why special NSO methods? Smooth approximations of derivatives by finite differences fail. For f : IR³ → IR defined by f(x) = max(x₁, x₂, x₃), what is an element of ∂f(0)?
- Forward finite differences: ( (f(x + δe_i) − f(x))/δ )_{i=1,2,3} = (1, 1, 1)
- Central finite differences: ( (f(x + δe_i) − f(x − δe_i))/(2δ) )_{i=1,2,3} = (1/2, 1/2, 1/2)
None of them is in the subdifferential! (∂f(0) is the unit simplex: entries nonnegative and summing to 1.)
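This failure is easy to reproduce. A sketch (the step size δ and the helper names are ours):

```python
# Finite differences at a kink: for f(x) = max(x1, x2, x3) at x = 0, neither
# forward nor central differences land in df(0) (the unit simplex).

def f(x):
    return max(x)

def fd(x, h, central=False):
    g = []
    for i in range(len(x)):
        e = [h if j == i else 0.0 for j in range(len(x))]
        xp = [a + b for a, b in zip(x, e)]
        xm = [a - b for a, b in zip(x, e)]
        g.append((f(xp) - f(xm)) / (2 * h) if central else (f(xp) - f(x)) / h)
    return g

x0 = [0.0, 0.0, 0.0]
fwd = fd(x0, 1e-6)                  # -> [1.0, 1.0, 1.0]
cen = fd(x0, 1e-6, central=True)    # -> [0.5, 0.5, 0.5]
print(fwd, cen)
# a subgradient of max at 0 must have nonnegative entries summing to 1:
print(abs(sum(fwd) - 1) < 1e-9, abs(sum(cen) - 1) < 1e-9)  # False False
```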
102-103 Why special NSO methods? Smooth optimization techniques do not work: linesearches get trapped in kinks and fail (cf. Example 9.1, "instability of steepest descent").
104-105 Why special NSO methods? Smooth optimization techniques do not work: the direction −g(x^k), opposite to an arbitrary subgradient, may not provide descent.
106 Why special NSO methods? Smooth optimization techniques do not work:
- the smooth stopping test fails;
- finite-difference approximations fail;
- linesearches get trapped in kinks and fail;
- the direction opposite to a subgradient may increase the function values.
110-112 Bundle Methods
WANT: p = prox_t^f(x):  p = argmin_y f(y) + (1/2t)‖y − x‖₂²
HAVE: q^k = prox_{t_k}^{M_k}(x) = x − t_k G^k for G^k ∈ ∂M_k(q^k); moreover,
G^k ∈ ∂_{ε_k} f(x) for ε_k = f(x) − M_k(q^k) − t_k ‖G^k‖₂²
Two subsequences:
- iterates giving sufficiently good approximate proximal points, moving towards a minimum in a manner that drives δ_k := ε_k + t_k ‖G^k‖² to zero (serious);
- iterates just helping the optimization process; [CL93] eventually applies (null).
113-117 Bundle Methods. [Figures: animation frames of the iteration; each candidate minimizes M_k(·) + (1/(2t_k))‖· − x̂^k‖₂², producing the iterate x² and successive serious steps x̂³, x̂⁴ from the initial center x̂¹.]
118 Bundle Methods
0. Choose x¹, set k = 1, and let x̂¹ = x¹.
1. Compute x^{k+1} = argmin_x M_k(x) + (1/(2t_k))‖x − x̂^k‖₂².
2. If δ_k := f(x̂^k) − M_k(x^{k+1}) ≤ tol, STOP.
3. Call the oracle at x^{k+1}. If f(x^{k+1}) ≤ f(x̂^k) − m δ_k, set x̂^{k+1} = x^{k+1} (Serious Step); otherwise, maintain x̂^{k+1} = x̂^k (Null Step).
4. Define M_{k+1} and t_{k+1}, set k = k + 1, and loop to 1.
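The loop above fits in a page of code. A self-contained 1-D sketch for f(x) = |x − 1| + 0.5|x + 1| (minimizer at x = 1), where the QP of step 1 is solved by ternary search on the strictly convex subproblem; all parameter values (t, m, tol) are our choices:

```python
# Minimal proximal bundle method in 1-D for f(x) = |x - 1| + 0.5*|x + 1|.

def f(x):
    return abs(x - 1) + 0.5 * abs(x + 1)

def g(x):  # ONE subgradient, as a black box would return
    return (1.0 if x >= 1 else -1.0) + 0.5 * (1.0 if x >= -1 else -1.0)

def prox_model(bundle, center, t):
    # step 1: argmin M_k(y) + (1/(2t))(y - center)^2, by ternary search
    def phi(y):
        return max(fi + gi * (y - xi) for (xi, fi, gi) in bundle) \
               + (y - center) ** 2 / (2 * t)
    lo, hi = center - 100.0, center + 100.0
    for _ in range(200):             # phi is strictly convex in 1-D
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

x = 5.0                              # x^1
xhat, t, m, tol = x, 1.0, 0.1, 1e-6
bundle = [(x, f(x), g(x))]
for k in range(200):
    xk1 = prox_model(bundle, xhat, t)
    Mk1 = max(fi + gi * (xk1 - xi) for (xi, fi, gi) in bundle)
    delta = f(xhat) - Mk1            # step 2: optimality gap
    if delta <= tol:
        break
    bundle.append((xk1, f(xk1), g(xk1)))     # step 3: oracle call
    if f(xk1) <= f(xhat) - m * delta:        # serious step test
        xhat = xk1                           # otherwise: null step
print(round(xhat, 4))                # -> 1.0
```

The model here keeps every cut; the selection mechanisms of the next slides trim it without changing the computed iterates.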
119-123 Bundle Methods: selection mechanism
M_{k+1}(·) = max( M_k(·), f_{k+1} + ⟨g^{k+1}, · − x^{k+1}⟩ ); now the choice of the new model is more flexible. The subproblem
x^{k+1} = argmin_x M_k(x) + (1/(2t_k))‖x − x̂^k‖², with M_k(x) = max_{i≤k}{ f_i + ⟨g^i, x − x^i⟩ },
is equivalent to a QP:
min_{r∈IR, x∈IR^n} r + (1/(2t_k))‖x − x̂^k‖²  s.t.  r ≥ f_i + ⟨g^i, x − x^i⟩ for i ≤ k.
A posteriori, the solution remains the same if all, or only the active constraints, or only the aggregate constraint given by the optimal convex combination,
r ≥ Σ_i ᾱ_i ( f_i + ⟨g^i, x − x^i⟩ ),
are kept.
124 Bundle Methods: next model options
M_{k+1}(·) = max( M_k(·), f_{k+1} + ⟨g^{k+1}, · − x^{k+1}⟩ ), or
M_{k+1}(·) = max( active linearizations, f_{k+1} + ⟨g^{k+1}, · − x^{k+1}⟩ ), or
M_{k+1}(·) = max( aggregate linearization, f_{k+1} + ⟨g^{k+1}, · − x^{k+1}⟩ ).
Same QP solution if all, or the active, or the optimal convex combination are kept.
aggregate = full Bundle Compression: a QP with only 2 constraints (but this slows down the overall process).
125 The cutting-plane model. You told us we were going to use a bundle B_k composed of linearization errors and ε-subgradients at x̂^k, but the model uses f_i and g^i ∈ ∂f(x^i)!
126-127 Rewriting the cutting-plane model. The transportation formula centers the i-th linearization at the serious iterate:
f(y) ≥ f(x^i) + ⟨g^i, y − x^i⟩ = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − e_i(x̂^k)
This translates into the model as follows:
M_k(y) = max{ f(x^i) + ⟨g^i, y − x^i⟩ : i ∈ B_k }
       = max{ f(x̂^k) + ⟨g^i, y − x̂^k⟩ − e_i(x̂^k) : i ∈ B_k }
       = f(x̂^k) + max{ ⟨g^i, y − x̂^k⟩ − e_i(x̂^k) : i ∈ B_k }
128-129 Bundle Method
0. Choose x¹, set k = 1, and let x̂¹ = x¹.
1. Compute x^{k+1} = argmin_x M_k(x) + (1/(2t_k))‖x − x̂^k‖₂².
2. If δ_k := f(x̂^k) − M_k(x^{k+1}) ≤ tol, STOP.
3. Call the oracle at x^{k+1}. If f(x^{k+1}) ≤ f(x̂^k) − m δ_k, set x̂^{k+1} = x^{k+1} (Serious Step, k ∈ K_S); otherwise, maintain x̂^{k+1} = x̂^k (Null Step, k ∈ K_N).
4. Define M_{k+1} and t_{k+1}, set k = k + 1, and loop to 1.
130-132 Bundle Method. When k → ∞, the algorithm generates two subsequences. Convergence analysis addresses the mutually exclusive situations:
- either the SS subsequence is infinite, K := {k ∈ K_S} (any limit point minimizes f);
- or there is a last SS, followed by infinitely many null steps, K := {k ∈ K_N : k ≥ last SS} (the last SS minimizes f, and the null iterates converge to the last SS).
133-134 Equivalent QPs
1. Given t_k, the stepsize parameter of the proximal bundle method, with QP subproblem
(PB)_k: min_x M_k(x) + (1/(2t_k))‖x − x̂^k‖²
2. Given Δ_k, the radius parameter of the trust-region bundle method, with QP subproblem
(TRB)_k: min_x M_k(x)  s.t.  ‖x − x̂^k‖² ≤ Δ_k
3. Given l_k, the level parameter of the level bundle method, with QP subproblem
(LB)_k: min_x (1/2)‖x − x̂^k‖²  s.t.  M_k(x) ≤ l_k
Show that:
1. given t_k, there exists Δ_k such that if x^{k+1} solves (PB)_k, then x^{k+1} solves (TRB)_k;
2. given Δ_k, there exists l_k such that if x^{k+1} solves (TRB)_k, then x^{k+1} solves (LB)_k;
3. given l_k, there exists t_k such that if x^{k+1} solves (LB)_k, then x^{k+1} solves (PB)_k.
135 Theorem (K := {k ∈ K_S}). Suppose the bundle method loops forever and there are infinitely many serious steps. Either the solution set of min f is empty and f(x̂^k) → −∞, or the following holds:
(i) lim_{k∈K_S} δ_k = 0 and lim_{k∈K_S} ε_k = 0.
(ii) If the stepsizes are chosen so that Σ_{k∈K_S} t_k = +∞, then {x̂^k} is a minimizing sequence.
(iii) If, in addition, t_k ≤ t_up for all k ∈ K_S, then the subsequence {x̂^k} is bounded. In this case, any limit point x̄ minimizes f and the whole sequence converges to x̄.
136 Theorem (K := {k ∈ K_N : k ≥ k̂}). Suppose the bundle method loops forever and there are infinitely many null steps after a last serious one, denoted by x̂ and generated at iteration k̂. Suppose the stepsizes are chosen so that t_lo ≤ t_{k+1} ≤ t_k for all k ∈ K. The following holds:
1. The sequence {x^{k+1}} is bounded.
2. lim_{k∈K} M_k(x^{k+1}) = f(x̂).
3. x̂ minimizes f.
4. lim_{k∈K} x^{k+1} = x̂.
137-138 Model requirements
1. M_k ≤ f
2. If k was declared a null step:
a) M_{k+1}(x) ≥ f_{k+1} + ⟨g^{k+1}, x − x^{k+1}⟩
b) M_{k+1}(x) ≥ A_k(x) := M_k(x^{k+1}) + ⟨G^k, x − x^{k+1}⟩
Any model satisfying these conditions that is used in the QP maintains the convergence results.
139 Comparing the methods: bundle and SG Typical performance on a battery of Unit Commitment problems
140 Bundle Methods with on-demand accuracy: the new generation
141 Oracle types: exact and upper
f_1(x)/g_1(x): f_1 is easy, so the oracle is exact.
f_2^x/g_2^x: f_2 is difficult, so the oracle is inexact.
[figure: f_2 ≥ f_1, with the exact linearization f_1(x) + ⟨g_1(x), · - x⟩ and the inexact linearization f_2^x + ⟨g_2^x, · - x⟩]
The oracle f_2^x/g_2^x is NOT of lower type.
142 Oracle types: exact and lower
f_1(x)/g_1(x): f_1 is easy, so the oracle is exact.
f_2^x/g_2^x: f_2 is less difficult, so the oracle is inexact.
[figure: f_2 ≥ f_1, with the exact linearization f_1(x) + ⟨g_1(x), · - x⟩ and the inexact linearization f_2^x + ⟨g_2^x, · - x⟩]
The oracle f_2^x/g_2^x IS of lower type.
143 For the EM problem, f_j(x) = max{ -C_j(q_j) + ⟨x, g_j(q_j)⟩ : q_j ∈ P_j }.
By computing f^{x^k} and g^{x^k} satisfying
   f^{x^k} = f(x^k) - η_k  and  g^{x^k} ∈ ∂_{η_k} f(x^k),
we can build:
- a lower oracle;
- an asymptotically exact oracle: η_k → 0 as k → ∞;
- a partly asymptotically exact oracle: η_k → 0 as K_s ∋ k → ∞;
- an on-demand accuracy oracle: η_k ≤ η̄ when f^{x^k} ≤ f^{x̂^k} - m δ_k.
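A lower oracle is easy to simulate on a toy function. The sketch below (my example, not the EM problem) perturbs an exact oracle for f(x) = |x| downward by η: the returned value f(x) - η together with the exact subgradient yields a linearization that stays below f everywhere, which is the defining lower-type property.

```python
def f(x):
    return abs(x)

def lower_oracle(x, eta):
    # returns f(x) - eta and an exact subgradient of f at x
    g = 1.0 if x >= 0 else -1.0
    return f(x) - eta, g

grid = [i / 100.0 for i in range(-300, 301)]
x, eta = 0.7, 0.3
fx, g = lower_oracle(x, eta)

# largest amount by which the inexact cut exceeds f on the grid
overshoot = max(fx + g * (y - x) - f(y) for y in grid)
print(overshoot)  # negative: the shifted cut never overshoots f
```

Shifting the exact tangent down by η ≥ 0 keeps it below f, so the cut can safely enter the model M_k of a bundle method with lower oracles.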
144 BM with lower inexact oracles
Model: M_k(x) = max{ f^{x^i} + ⟨g^{x^i}, x - x^i⟩ : i ∈ B_k },  δ_k = ε_k + t_k ||G_k||².
SS test: f^{x^{k+1}} ≤ f̂^k - m δ_k, where f̂^k := max{ f^{x̂^k}, max( M_j(x̂^k), j ≤ k̂ ) }.
If the oracle inaccuracy is locally bounded (for all R ≥ 0 there exists η(R) ≥ 0 such that ||x|| ≤ R implies η ≤ η(R)), convergence holds as before, up to the accuracy reached at serious steps.
Convex proximal bundle methods in depth: a unified analysis for inexact oracles. W. de Oliveira, C. Sagastizábal, C. Lemaréchal, MathProg 148 (2014).
146 General comments Bundle methods are:
- robust (they do not oscillate, as cutting-plane methods do);
- reliable (they have a stopping test, unlike subgradient methods);
- able to deal with inaccuracy in a reasonable manner.
147 Extending bundle methods
148 Constrained NSO problems: an example
149 Optimal management of the hydrovalley
   max λ E_η(ρ(u))  s.t.  (x,u) ∈ P,  ℙ_η( Au + a^min ≤ Mη ≤ Au + a^max ) ≥ p
Is this a convex program? YES: the function
   u ↦ -log ℙ_η( Au + a^min ≤ Mη ≤ Au + a^max )
is convex. We need to solve
   min f(u)  s.t.  (x,u) ∈ P,  c(u) ≤ 0
for linear f and with
   c(u) := log p - log ℙ_η( Au + a^min ≤ Mη ≤ Au + a^max ),
which is difficult to compute!
153 Need to solve the constrained problem
   (P):  min f(u)  s.t.  (x,u) ∈ P,  c(u) ≤ 0
for linear f and with inexact evaluation of c and its gradient, via a black box with controllable inaccuracy (bounded by a given tolerance ε, with confidence level 99%; evaluation errors can be positive or negative).
154 Handling constraints in NSO For nonsmooth constrained problems
   min f(u)  s.t.  c(u) ≤ 0
use the Improvement Function:
   min_u max{ f(u) - f(û), c(u) }
(it changes with each serious point û and supposes exact f/c values are available) [SagSol SiOPT, 2005 and KarasRibSagSol MPB, 2009]
155 Improvement function Let (x̄, ū) be a solution to (P). The function
   H_ū(u) := max{ f(u) - f(ū), c(u) }
has perfect theoretical properties. If the Slater condition (∃ (x,u) ∈ P s.t. c(u) < 0) holds, then
   ū solves (P): min f(u) s.t. (x,u) ∈ P, c(u) ≤ 0
   ⇔ min_{(x,u) ∈ P} H_ū(u) = H_ū(ū) = 0
   ⇔ 0 ∈ ∂H(ū) for H(·) := H_ū(·).
156 Improvement function Let (x̄, ū) be a solution to (P). Without the Slater condition, if ū solves (P) then still
   min_{(x,u) ∈ P} H_ū(u) = H_ū(ū) = 0  and  0 ∈ ∂H(ū) for H(·) := H_ū(·),
BUT the converse may fail: a minimizer ū of H_ū with c(ū) ≤ 0 solves (P); otherwise it minimizes infeasibility over P.
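The zero-minimum property can be observed on a toy instance. In this hypothetical example (mine, not from the slides), min u subject to u² - 1 ≤ 0 on P = [-2, 2] has solution ū = -1, and a grid search confirms that the improvement function H_ū attains its minimum value 0 exactly at ū.

```python
ubar = -1.0                              # known solution of the toy problem

def H(u):
    # improvement function: max(f(u) - f(ubar), c(u))
    # with f(u) = u and c(u) = u**2 - 1
    return max(u - ubar, u * u - 1.0)

grid = [i / 1000.0 for i in range(-2000, 2001)]   # P = [-2, 2]
u_star = min(grid, key=H)
print(u_star, H(u_star))                 # -1.0 and 0.0
```

Away from ū either the objective term or the constraint term of H is positive, so the minimum value 0 pins down the solution, as the slide states.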
158 Handling nonconvex problems
159 Nonconvex proximal point mapping [PR96]
   p_R f(x) := argmin_{y ∈ IR^N} { f(y) + (R/2) ||y - x||² }
x is the prox-center and R > R_x is the prox-parameter.
Theorem. If f is convex:
- p_R f is well defined for any R > 0; it is single-valued and locally Lipschitz;
- p = p_R f(x) ⇔ R(x - p) ∈ ∂f(p);
- x* minimizes f ⇔ x* = p_R f(x*) for any R > 0;
- the iteration x^{k+1} = p_R f(x^k) converges to a minimizer x*.
What if f is nonconvex?
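For a concrete instance of the theorem's convex case, f(y) = |y| gives the classical closed-form proximal point (soft-thresholding); the snippet below checks the fixed-point characterization and the convergence of the proximal point iteration. The choices R = 2 and starting point 5 are arbitrary.

```python
import math

def prox_abs(x, R):
    # p_R f(x) for f(y) = |y|: argmin_y |y| + (R/2)(y - x)^2,
    # the soft-thresholding operator
    return math.copysign(max(abs(x) - 1.0 / R, 0.0), x)

R, x = 2.0, 5.0
for _ in range(20):
    x = prox_abs(x, R)          # proximal point iteration
print(x)                        # reaches the minimizer y* = 0

fixed_point = prox_abs(0.0, R)  # y* = 0 is a fixed point of p_R f
print(fixed_point)
```

Each step shrinks |x| by 1/R until the minimizer 0 is reached and then stays there, illustrating both bullets of the theorem.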
160 Nonconvex difficulties Proximal Bundle Methods are the most robust and reliable (oracle) methods for convex minimization. Their success relies heavily on convexity. If f is convex:
- x^{k+1} = p_R f(x^k) converges to a minimizer x*;
- the cutting-plane model f̌_k lies entirely below f.
Both properties may no longer hold for nonconvex f. How has this difficulty been addressed?
165 Take each plane in the model, f_i + ⟨g_i, · - y^i⟩, and rewrite it centered at x^k:
   f_i + ⟨g_i, x^k - y^i⟩ + ⟨g_i, · - x^k⟩ = f(x^k) - e_{k,i} + ⟨g_i, · - x^k⟩,
where e_{k,i} := f(x^k) - f_i - ⟨g_i, x^k - y^i⟩ is the linearization error, so that
   f̌_k(y) = max_i { f(x^k) - e_{k,i} + ⟨g_i, y - x^k⟩ }.
Good: if f is convex, e_{k,i} is positive, which gives convergence. BAD: if f is nonconvex, e_{k,i} may be negative.
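The sign behaviour of the linearization errors is easy to verify numerically. In this sketch (toy functions of my choosing), e_{k,i} = f(x^k) - f(y^i) - g_i (x^k - y^i) is computed for the convex f(x) = x² and the concave f(x) = -x², with g_i the derivative at y^i.

```python
def lin_err(f, df, xk, yi):
    # linearization error of the cut built at yi, centered at xk
    return f(xk) - f(yi) - df(yi) * (xk - yi)

# convex case: error is nonnegative (tangent lies below the graph)
e_convex = lin_err(lambda x: x * x, lambda x: 2 * x, xk=1.0, yi=3.0)

# nonconvex case: error can be negative (tangent lies above the graph)
e_nonconvex = lin_err(lambda x: -x * x, lambda x: -2 * x, xk=1.0, yi=3.0)

print(e_convex, e_nonconvex)  # 4.0 and -4.0
```

A negative error means the cut lies above the function value at x^k, which is exactly the difficulty the nonconvex fixes address.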
166 Nonconvex bundle methods fix negative linearization errors, replacing f̌_k by
   f̌_k^FIX(y) = max_i { f(x^k) - ẽ_{k,i} + ⟨g_i, y - x^k⟩ },
where each ẽ_{k,i} is a nonnegative correction of e_{k,i}. [Mif77, Lem80, Kiw85, Luk98]
172 A new method A different approach (ours) is based on the following trick. Take η, µ > 0 with R = η + µ and, writing F_η(w) := f(w) + (η/2) ||w - x^k||², note
   p_R f(x^k) = argmin_w { f(w) + (R/2) ||w - x^k||² }
              = argmin_w { f(w) + ((η + µ)/2) ||w - x^k||² }
              = argmin_w { f(w) + (η/2) ||w - x^k||² + (µ/2) ||w - x^k||² }
              = argmin_w { F_η(w) + (µ/2) ||w - x^k||² }
              = p_µ(F_η)(x^k)
that is, p_R f = p_µ(F_η).
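The identity p_R f = p_µ(F_η) can be checked by brute force. The sketch below uses a toy mildly nonconvex f and parameter values of my choosing, with a grid search standing in for the exact prox computation; R = η + µ is large enough to convexify the subproblem.

```python
def f(w):
    return abs(w) - 0.3 * w * w          # toy nonconvex objective

xk, eta, mu = 0.8, 1.0, 2.0
R = eta + mu                             # split R = eta + mu

grid = [i / 10000.0 for i in range(-20000, 20001)]

# direct prox of f with parameter R
p_R = min(grid, key=lambda w: f(w) + (R / 2.0) * (w - xk) ** 2)

# prox of the shifted function F_eta with parameter mu
def F_eta(w):
    return f(w) + (eta / 2.0) * (w - xk) ** 2

p_mu = min(grid, key=lambda w: F_eta(w) + (mu / 2.0) * (w - xk) ** 2)

print(p_R, p_mu)  # the two argmins coincide
```

The two subproblems have identical objectives, so the grid argmins agree; splitting R lets the η-part convexify the function while the µ-part plays the usual prox role.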
173 Redistributed Proximal Bundle Method At the ℓ-th iteration, for k = k(ℓ), given R_k, x^k and a bundle B = { (y^i, f_i, g_i) : i ∈ I_ℓ }:
0. Split R_k into η_ℓ and µ_ℓ.
1. Model F_{η_ℓ}:  F̌_{η_ℓ,ℓ}(y) = max_{i ∈ B} { F_{η_ℓ,i} + ⟨g_i^{η_ℓ}, y - y^i⟩ }.
2. Minimize the penalized model:  y^{ℓ+1} = argmin { F̌_{η_ℓ,ℓ}(y) + (µ_ℓ/2) ||y - x^k||² }.
3. Descent test. If y^{ℓ+1} is good: x^{k+1} := y^{ℓ+1}, define R_{k+1} (serious step). If y^{ℓ+1} is bad: null step.
4. Update the bundle: B := B ∪ { (y^{ℓ+1}, f_{ℓ+1}, g_{ℓ+1}) }.
174 VU quasi-newton bundle For x IR n, given matrices A 0, B 0, f(x) = x Ax + x Bx has a unique minimizer at x = 0. On N (A) the function is not differentiable, and the first term vanishes: f N (A) looks smooth. R(A) N (A) V U parallel to f( x) U perpendicular to V V is parallel to N (A), the ridge" of nonsmoothness
176 VU-Algorithm (Mifflin & Sagastizábal, MathProg 05) Recall that f restricted to N(A) is nice; the key is two QP solves per iteration. Two successive bundle QPs identify the ridge of nonsmoothness; a second QP then creates an approximation of V based on ∂f̌(p̂^k), and a Newton move is taken along the U-subspace.
177 VU-Algorithm: superlinear convergence on the serious-step subsequence (Mifflin & Sagastizábal, MathProg 05) [convergence plot not reproduced]
178 To learn more
Bundle methods history: R. MIFFLIN, C. SAGASTIZÁBAL, A Science Fiction Story in Nonsmooth Optimization Originating at IIASA, Documenta Math (vol-ismp/44_mifflin-robert.pdf).
(exact) Bundle books:
- J.F. BONNANS, J.C. GILBERT, C. LEMARÉCHAL, AND C. SAGASTIZÁBAL, Numerical Optimization: Theoretical and Practical Aspects, Springer, 2nd ed.
- J.B. HIRIART-URRUTY AND C. LEMARÉCHAL, Convex Analysis and Minimization Algorithms II, no. 306 in Grund. der math. Wissenschaften, Springer, 2nd ed.
Inexact Bundle theory (next page)
179 Inexact Bundle theory
- M. HINTERMÜLLER, A proximal bundle method based on approximate subgradients, COAp 20 (2001).
- M.V. SOLODOV, On Approximations with Finite Precision in Bundle Methods for Nonsmooth Optimization, JOTA (2003).
- K.C. KIWIEL, A proximal bundle method with approximate subgradient linearizations, SiOpt 16 (2006).
- W. DE OLIVEIRA, C. SAGASTIZÁBAL, AND C. LEMARÉCHAL, Convex proximal bundle methods in depth: a unified analysis for inexact oracles, MathProg 148 (2014).
Inexact Bundle variants with applications:
- G. EMIEL AND C. SAGASTIZÁBAL, Incremental-like bundle methods with application to energy planning, COAp 46 (2010).
- W. DE OLIVEIRA, C. SAGASTIZÁBAL, AND S. SCHEIMBERG, Inexact bundle methods for two-stage stochastic programming, SiOpt 21 (2011).
(next page)
180
- W. VAN ACKOOIJ AND C. SAGASTIZÁBAL, Constrained bundle methods for upper inexact oracles with application to joint chance constrained energy problems, SiOpt 24 (2014).
- W. DE OLIVEIRA AND C. SAGASTIZÁBAL, Level bundle methods for oracles with on-demand accuracy, OMS 29 (2014).
- W. DE OLIVEIRA AND C. SAGASTIZÁBAL, Bundle methods in the XXI century: a bird's-eye view, Pesquisa Operacional 34 (2014).
- W. DE OLIVEIRA AND M. SOLODOV, A doubly stabilized bundle method for nonsmooth convex optimization, MathProg 156(1) (2016).
181 Any doubts or questions? Just ask me.
More informationLecture 14: October 17
1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationSubgradient Method. Ryan Tibshirani Convex Optimization
Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationKaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization
Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationThe Proximal Gradient Method
Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,
More informationA STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE
A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30, 2014 Abstract Regularized
More information10 Numerical methods for constrained problems
10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside
More informationLinear Convergence under the Polyak-Łojasiewicz Inequality
Linear Convergence under the Polyak-Łojasiewicz Inequality Hamed Karimi, Julie Nutini and Mark Schmidt The University of British Columbia LCI Forum February 28 th, 2017 1 / 17 Linear Convergence of Gradient-Based
More informationWorst-Case Complexity Guarantees and Nonconvex Smooth Optimization
Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex
More informationConvex Analysis Background
Convex Analysis Background John C. Duchi Stanford University Park City Mathematics Institute 206 Abstract In this set of notes, we will outline several standard facts from convex analysis, the study of
More informationStochastic and online algorithms
Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationProximal splitting methods on convex problems with a quadratic term: Relax!
Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.
More informationProximal methods. S. Villa. October 7, 2014
Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More information