Nonsmooth optimization: beyond first-order methods. A tutorial focusing on bundle methods


1 Nonsmooth optimization: beyond first-order methods. A tutorial focusing on bundle methods. Claudia Sagastizábal (IMECC-UniCamp, Campinas, Brazil, adjunct researcher). SESO 2018, Paris, May 23 and 25, 2018

2 Computational NSO: what do we mean? For the unconstrained problem min f(x), where f : IRⁿ → IR is convex but not differentiable at some points, algorithms are defined according to how much information is provided by a certain oracle

3 Computational NSO: what do we mean? For the unconstrained problem min f(x), where f : IRⁿ → IR is convex but not differentiable at some points, algorithms are defined according to how much information is provided by a certain oracle: an informative oracle, x ↦ f(x) and the full ∂f(x)

4 Computational NSO: what do we mean? For the unconstrained problem min f(x), where f : IRⁿ → IR is convex but not differentiable at some points, algorithms are defined according to how much information is provided by a certain oracle: a black box, x ↦ f(x) and ONE g(x) ∈ ∂f(x)

5 Computational NSO: what do we mean? For the unconstrained problem min f(x), where f : IRⁿ → IR is convex but not differentiable at some points, algorithms are defined according to how much information is provided by a certain oracle: a black box, x ↦ f(x) and ONE g(x) ∈ ∂f(x). How common are nonsmooth objective functions in optimization?
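
To make the black-box setting concrete, here is a minimal Python sketch (my own illustration, not part of the tutorial) of such an oracle for f(x) = ‖x‖₁: it returns the function value and ONE subgradient, not the whole subdifferential.

```python
import numpy as np

def l1_oracle(x):
    """Black-box oracle for f(x) = ||x||_1.

    Returns f(x) and ONE subgradient g(x): g_i = sign(x_i) for x_i != 0,
    and g_i = 0 at the kinks x_i = 0 (any value in [-1, 1] would do).
    """
    x = np.asarray(x, dtype=float)
    f_val = np.abs(x).sum()
    g = np.sign(x)          # one valid subgradient, not the whole set
    return f_val, g

# At a kink the oracle still answers, but it cannot reveal the full
# subdifferential [-1, 1] of |.| at 0.
f0, g0 = l1_oracle([0.0, -2.0, 3.0])   # f0 = 5.0, g0 = [0., -1., 1.]
```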

6 When does nonsmoothness appear? * if the nature of the problem imposes a nonsmooth model; or * if sparsity of the solution is a concern; or * in problems that are difficult to solve, because they are large scale or because they are heterogeneous; sometimes the solution method itself induces nonsmoothness

7 Example of NS model: recovery of blocky images (ℓ₁-regularization of TV)

8 Example of sparse optimization min{ ‖x‖₁ : Ax = b }. Basis pursuit: find the least 1-norm point on the affine plane. Tends to return a sparse point (sometimes, the sparsest)

9 Example of sparse optimization min{ ‖x‖₁ : Ax = b }. Basis pursuit: find the least 1-norm point on the affine plane. Tends to return a sparse point (sometimes, the sparsest). LASSO denoises basis pursuit: min{ ‖Ax − b‖₂² : ‖x‖₁ ≤ τ }  or  min{ ‖x‖₁ + (µ/2)‖Ax − b‖₂² }  or  min{ ‖x‖₁ : ‖Ax − b‖₂² ≤ σ }

10 Example of sparse optimization min{ ‖x‖₁ : h(x) ≤ b }. Basis pursuit: find the least 1-norm point on a nonlinear set. Tends to return a sparse point (sometimes, the sparsest). LASSO denoises basis pursuit: min{ ‖(h(x) − b)₊‖₂² : ‖x‖₁ ≤ τ }  or  min{ ‖x‖₁ + (µ/2)‖(h(x) − b)₊‖₂² }  or  min{ ‖x‖₁ : ‖(h(x) − b)₊‖₂² ≤ σ }

11 Lagrangian Relaxation Example. Real-life optimization problems (primal): min ∑_{j∈J} C_j(p_j) s.t. for j ∈ J: p_j ∈ P_j, and ∑_{j∈J} g_j(p_j) = Dem (coupling constraint, with multiplier x)

12 Lagrangian Relaxation Example. Real-life optimization problems (primal): max ∑_{j∈J} C_j(p_j) s.t. for j ∈ J: p_j ∈ P_j, and ∑_{j∈J} g_j(p_j) = Dem (coupling constraint, with multiplier x) often exhibit separable structure after dualization

13 Lagrangian Relaxation Example. Real-life optimization problems (primal): max ∑_{j∈J} C_j(p_j) s.t. for j ∈ J: p_j ∈ P_j, and ∑_{j∈J} g_j(p_j) = Dem (coupling constraint, with multiplier x) often exhibit separable structure passing to the (dual):

14 Lagrangian Relaxation Example. Real-life optimization problems (primal): max ∑_{j∈J} C_j(p_j) s.t. for j ∈ J: p_j ∈ P_j, and ∑_{j∈J} g_j(p_j) = Dem (coupling constraint, with multiplier x) often exhibit separable structure passing to the (dual): min_x f(x) := f₀(x) + ∑_{j∈J} f_j(x), with f₀(x) := −⟨x, Dem⟩ and f_j(x) := max_{p_j∈P_j} { C_j(p_j) + ⟨x, g_j(p_j)⟩ }

16 Benders Decomposition Example. Similar situation, but now the uncoupling is done at the primal level. (primal) min ∑_{j∈J} [ I_j(p̄_j) + C_j(p_j) ] s.t. for j ∈ J: p_j ∈ P_j, p_j ≤ p̲_j + p̄_j, and p̄ ∈ D

17 Benders Decomposition Example. Similar situation, but now the uncoupling is done at the primal level. (primal) min_{p̄∈D} ∑_{j∈J} [ I_j(p̄_j) + V_j(p̄_j) ], with V_j(p̄_j) := min { C_j(p_j) : p_j ∈ P_j, p_j ≤ p̲_j + p̄_j }, i.e. min f(x) := ∑_{j∈J} f_j(p̄_j) for f_j(p̄_j) :=

18 Benders Decomposition Example. Similar situation, but now the uncoupling is done at the primal level. (primal) min_{p̄∈D} ∑_{j∈J} [ I_j(p̄_j) + V_j(p̄_j) ], with V_j(p̄_j) := min { C_j(p_j) : p_j ∈ P_j, p_j ≤ p̲_j + p̄_j }, i.e. min f(x) := ∑_{j∈J} f_j(p̄_j) for f_j(p̄_j) := I_j(p̄_j) + V_j(p̄_j)

19 Computing f(x^k): how difficult is it? 1. f(x) = |x|, for n = 1. 2. A linear Lasso function, f(x) = ‖x‖₁ + (µ/2)‖Ax − b‖₂². 3. A nonlinear Lasso function, h ∈ C¹, f(x) = ‖x‖₁ + (µ/2)‖(h(x) − b)₊‖₂². 4. One of the local subproblems in the Lagrangian example, f_j(x^k) := max_{p_j∈P_j} { C_j(p_j) + ⟨x^k, g_j(p_j)⟩ }. 5. One of the local subproblems in the Benders example, f_j(x_{k,j}) = I_j(x_{k,j}) + V_j(x_{k,j}), with V_j(x_{k,j}) = min { C_j(p_j) : p_j ≤ p̲_j + x_{k,j} }

20 But why would one want ALL of ∂f(x^k)?

21 But why would one want ALL of ∂f(x^k)? Indispensable to calculate the proximal point p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂² ⇔ 0 ∈ ∂f(p) + (1/t)(p − x) ⇔ (1/t)(x − p) ∈ ∂f(p)

22 But why would one want ALL of ∂f(x^k)? Indispensable to calculate the proximal point p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂² ⇔ 0 ∈ ∂f(p) + (1/t)(p − x) ⇔ (1/t)(x − p) ∈ ∂f(p)

23 But why would one want ALL of ∂f(x^k)? Indispensable to calculate the proximal point p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂² ⇔ 0 ∈ ∂f(p) + (1/t)(p − x) ⇔ (1/t)(x − p) ∈ ∂f(p)

24 But why would one want ALL of ∂f(x^k)? Indispensable to calculate the proximal point p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂² ⇔ 0 ∈ ∂f(p) + (1/t)(p − x) ⇔ (1/t)(x − p) ∈ ∂f(p). Without full knowledge of the subdifferential, the implicit inclusion cannot be solved!

25 But why would one want ALL of ∂f(x^k)? Indispensable to calculate the proximal point p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂² ⇔ 0 ∈ ∂f(p) + (1/t)(p − x) ⇔ (1/t)(x − p) ∈ ∂f(p). Without full knowledge of the subdifferential, the implicit inclusion cannot be solved! Note: p ∈ x − t∂f(p), akin to a subgradient method
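
As a concrete illustration (mine, not from the slides): for f = ‖·‖₁ the proximal point has the well-known soft-thresholding closed form, and the inclusion (1/t)(x − p) ∈ ∂f(p) can be checked componentwise.

```python
import numpy as np

def prox_l1(x, t):
    """Proximal point of f = ||.||_1: argmin_y ||y||_1 + (1/2t)||y - x||^2.
    Closed form: componentwise soft-thresholding with threshold t."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.array([2.5, -0.3, 0.0, 1.0])
t = 0.7
p = prox_l1(x, t)
g = (x - p) / t          # should be a subgradient of ||.||_1 at p

# check the inclusion (1/t)(x - p) in the subdifferential of ||.||_1 at p:
# |g_i| <= 1 everywhere, and g_i = sign(p_i) wherever p_i != 0
assert np.all(np.abs(g) <= 1 + 1e-12)
assert np.allclose(g[p != 0], np.sign(p[p != 0]))
print(p, g)   # p = [1.8, 0., 0., 0.3], g ~ [1., -0.43, 0., 1.]
```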

26 Proximal point algorithms (Accel. Nesterov, FISTA, AugLag): x^{k+1} = prox_{t_k}^f(x^k), i.e. x^{k+1} = argmin_y f(y) + (1/2t_k)‖y − x^k‖₂²

27 Proximal point algorithms (Accel. Nesterov, FISTA, AugLag): x^{k+1} = prox_{t_k}^f(x^k), i.e. x^{k+1} = argmin_y f(y) + (1/2t_k)‖y − x^k‖₂². Of interest if computing prox_{t_k}^f(x^k) is much easier than minimizing f. The stepsize t_k > 0 impacts the number of iterations

29 Proximal point: calculus rules. Separable sum: f(x,y) = g(x) + h(y) ⇒ prox_t^f(x,y) = ( prox_t^g(x), prox_t^h(y) ). Scalar factor (α ≠ 0) and translation (v ≠ 0): f(x) = g(αx + v) ⇒ prox_t^f(x) = α⁻¹( prox_t^{α²g}(αx + v) − v ). "Perspective" (α > 0): f(x) = α g(α⁻¹x) ⇒ prox_t^f(x) = α prox_t^{g/α}(α⁻¹x)

30 Proximal point: special functions. Linear term (v ≠ 0): f(x) = g(x) + ⟨v, x⟩ ⇒ prox_t^f(x) = prox_t^g(x − tv). Convex quadratic term: f(x) = g(x) + (1/2)‖x − v‖² ⇒ prox_t^f(x) = prox_t^{λg}(λx + (1 − λ)v) for λ = 1/(t + 1). Composition with a linear map such that AAᵀ = α⁻¹I (α ≠ 0): f(x) = g(Ax + v) ⇒ prox_t^f(x) = (I − αAᵀA)x + αAᵀ[ prox_t^{g/α}(Ax + v) − v ]
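
A quick numerical sanity check (my own) of the linear-term rule above for g = |·| in one dimension, with the prox defined as argmin_y f(y) + (1/2t)(y − x)²; the grid-search minimizer and the rule should agree.

```python
import numpy as np

# Sketch (mine, not from the slides): check numerically the rule
#   f(x) = g(x) + <v, x>  =>  prox_t^f(x) = prox_t^g(x - t v)
# for g = |.| in one dimension.

def prox_abs(x, t):
    return np.sign(x) * max(abs(x) - t, 0.0)

x, v, t = 1.3, 0.4, 0.5

# brute-force minimization of |y| + v*y + (1/2t)(y - x)^2 on a fine grid
ys = np.linspace(-5, 5, 2_000_001)
vals = np.abs(ys) + v * ys + (ys - x) ** 2 / (2 * t)
p_grid = ys[np.argmin(vals)]

p_rule = prox_abs(x - t * v, t)          # the calculus rule
print(p_grid, p_rule)                    # both ~ 0.6
assert abs(p_grid - p_rule) < 1e-4
```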

31 Proximal point algorithm: convergence. If argmin f ≠ ∅, then f(x^k) − f(x̄) ≤ ‖x⁰ − x̄‖₂² / ∑_{i=1}^k t_i

32 Proximal point algorithm: convergence. If argmin f ≠ ∅, then f(x^k) − f(x̄) ≤ ‖x⁰ − x̄‖₂² / ∑_{i=1}^k t_i ⇒ convergence if ∑ t_i = +∞ ⇒ rate 1/k if {t_k} is bounded away from zero

33 Proximal point algorithm: acceleration. x^{k+1} = prox_{t_k}^f( x^k + θ_{k+1}(θ_k⁻¹ − 1)(x^k − x^{k−1}) ) for θ²_{k+1} t_{k+1} = (1 − θ_{k+1}) θ²_k t_k

34 Proximal point algorithm: acceleration. x^{k+1} = prox_{t_k}^f( x^k + θ_{k+1}(θ_k⁻¹ − 1)(x^k − x^{k−1}) ) for θ²_{k+1} t_{k+1} = (1 − θ_{k+1}) θ²_k t_k ⇒ convergence if ∑ t_i = +∞ ⇒ rate 1/k² if {t_k} is bounded away from zero

35 What if prox_t^f is not computable?

36 What if prox_t^f is not computable? Use bundle methods!

37 What if prox_t^f is not computable? Use bundle methods! When do bundle methods prove most useful?

38 What if prox_t^f is not computable? Use bundle methods! When do bundle methods prove most useful? In situations when the objective function is not available explicitly, and/or when we do not have access to the full subdifferential, and/or when calculations need to be done with high precision

39 Bundling to approximate the prox. WANT: p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂² ⇔ 0 ∈ ∂f(p) + (1/t)(p − x) ⇔ (1/t)(x − p) ∈ ∂f(p)

40 Bundling to approximate the prox. WANT: p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂² ⇔ 0 ∈ ∂f(p) + (1/t)(p − x) ⇔ (1/t)(x − p) ∈ ∂f(p). HAVE: q = prox_t^M(x): q = argmin_y M(y) + (1/2t)‖y − x‖₂² ⇔ 0 ∈ ∂M(q) + (1/t)(q − x) ⇔ (1/t)(x − q) ∈ ∂M(q)

41 Bundling to approximate the prox. WANT: p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂² ⇔ (1/t)(x − p) ∈ ∂f(p). HAVE: q = prox_t^M(x): q = argmin_y M(y) + (1/2t)‖y − x‖₂² ⇔ (1/t)(x − q) ∈ ∂M(q). M is a model of f for which we do have full knowledge of the subdifferential: the implicit inclusion can be solved!

43 How is the model built?

44 Model built with the black box

45 A quick overview of Convex Analysis An example of a convex nonsmooth function

46 A quick overview of Convex Analysis An example of a convex nonsmooth function

47 A quick overview of Convex Analysis. An example of a convex nonsmooth function. {∇f(x)} = {slope of the linearization supporting f, tangent at x}

48 A quick overview of Convex Analysis. An example of a convex nonsmooth function. {∇f(x)} = {slope of the linearization supporting f, tangent at x}. By convexity, f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ for all y

53 A quick overview of Convex Analysis. An example of a convex nonsmooth function. ∂f(x) = {g ∈ IRⁿ : f(y) ≥ f(x) + ⟨g, y − x⟩ for all y} = {slopes of linearizations supporting f, tangent at x}

54 A quick overview of Convex Analysis. An example of a convex nonsmooth function. ∂f(x) = {g ∈ IRⁿ : f(y) ≥ f(x) + ⟨g, y − x⟩ for all y} = {slopes of linearizations supporting f, tangent at x}

55 What can be done with the oracle output? An example of a convex nonsmooth function. ∂f(x) = {g ∈ IRⁿ : f(y) ≥ f(x) + ⟨g, y − x⟩ for all y} = {slopes of linearizations supporting f, tangent at x}

56 What can be done with the oracle output? An example of a convex nonsmooth function. ∂f(x) = {g ∈ IRⁿ : f(y) ≥ f(x) + ⟨g, y − x⟩ for all y} = {slopes of linearizations supporting f, tangent at x}. 1 oracle call, x^k ↦ f(x^k), g(x^k) ∈ ∂f(x^k), = 1 linearization

57 BEWARE: x^k ↦ f(x^k), g(x^k) ∈ ∂f(x^k): if the oracle output is not accurate, the linearization can be wrong! A wrong g(x^k) gives a bad linearization at x^k, since ∂f(x) = {g ∈ IRⁿ : f(y) ≥ f(x) + ⟨g, y − x⟩ for all y} (similarly if f(x^k) is wrong, more on this later)

58 How is the oracle information used? Putting together linearizations creates a cutting-plane model M for f: x^i ↦ f_i = f(x^i), g_i = g(x^i) ∈ ∂f(x^i) ⇒ M(y) = max_i { f_i + ⟨g_i, y − x^i⟩ }

59 How is the oracle information used? Putting together linearizations creates a cutting-plane model M for f: x^i ↦ f_i = f(x^i), g_i = g(x^i) ∈ ∂f(x^i) ⇒ M(y) = max_i { f_i + ⟨g_i, y − x^i⟩ }

60 How is the oracle information used? Putting together linearizations creates a cutting-plane model M for f: x^i ↦ f_i = f(x^i), g_i = g(x^i) ∈ ∂f(x^i) ⇒ M(y) = max_i { f_i + ⟨g_i, y − x^i⟩ }

61 How is the oracle information used? Putting together linearizations creates a cutting-plane model M for f: x^i ↦ f_i = f(x^i), g_i = g(x^i) ∈ ∂f(x^i) ⇒ M(y) = max_i { f_i + ⟨g_i, y − x^i⟩ } (just one type of model, many others are possible)

62 How is the oracle information used? Putting together linearizations creates a cutting-plane model M for f: x^i ↦ f_i = f(x^i), g_i = g(x^i) ∈ ∂f(x^i) ⇒ M_k(y) = max_{i≤k} { f_i + ⟨g_i, y − x^i⟩ } (just one type of model, many others are possible)
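
As an illustration (my own sketch, not the tutorial's code), building and evaluating such a cutting-plane model from black-box calls, here for f(x) = ‖x‖₁; the class name CuttingPlaneModel is mine.

```python
import numpy as np

def l1_oracle(x):
    x = np.asarray(x, dtype=float)
    return np.abs(x).sum(), np.sign(x)

class CuttingPlaneModel:
    """M_k(y) = max_i { f_i + <g_i, y - x_i> }, built from oracle calls."""
    def __init__(self):
        self.cuts = []                      # list of (f_i, g_i, x_i)

    def add_point(self, x, oracle):
        f_i, g_i = oracle(x)
        self.cuts.append((f_i, np.asarray(g_i, float), np.asarray(x, float)))

    def __call__(self, y):
        y = np.asarray(y, dtype=float)
        return max(f_i + g_i @ (y - x_i) for f_i, g_i, x_i in self.cuts)

M = CuttingPlaneModel()
for pt in [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]]:
    M.add_point(pt, l1_oracle)

y = np.array([0.2, -0.3])
print(M(y), l1_oracle(y)[0])   # M(y) <= f(y): the model lies below f (convexity)
```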

63 Infinite bundling yields prox_t^f. WANT: p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂². HAVE: q^k = prox_{t_k}^{M_k}(x): q^k = argmin_y M_k(y) + (1/2t_k)‖y − x‖₂², i.e. 0 = G^k + (1/t_k)(q^k − x) for G^k ∈ ∂M_k(q^k). Theorem [CL93]: suppose the models satisfy M_k(y) ≤ f(y) for all k and y, M_{k+1}(y) ≥ f(q^k) + ⟨g(q^k), y − q^k⟩, and M_{k+1}(y) ≥ M_k(q^k) + ⟨G^k, y − q^k⟩. If 0 < t_min ≤ t_{k+1} ≤ t_k, then lim_k q^k = p and lim_k M_k(q^k) = f(p)

65 Models for the half-and-half function. STRUCTURE none: f(x) = √(xᵀAx) + xᵀBx; sum: f₁(x) + f₂(x), with f₁(x) = √(xᵀAx) and f₂(x) = xᵀBx (f₂ is smooth); composition: (h∘c)(x), with c(x) = (x, xᵀBx) ∈ IRⁿ⁺¹ (c is smooth) and h(C) = √(C_{1:n}ᵀ A C_{1:n}) + C_{n+1} (h is sublinear)

67 Models for the half-and-half function. STRUCTURE and oracle information. None: f(x) = √(xᵀAx) + xᵀBx, the oracle returns f^k := f(x^k), g^k ∈ ∂f(x^k). Sum: f₁(x) + f₂(x), f₁(x) = √(xᵀAx), f₂(x) = xᵀBx (f₂ smooth), the oracle returns f₁^k, g₁^k, f₂^k, ∇f₂(x^k). Composition: (h∘c)(x), c(x) = (x, xᵀBx) ∈ IRⁿ⁺¹ (c smooth), h(C) = √(C_{1:n}ᵀ A C_{1:n}) + C_{n+1} (h sublinear), the oracle returns c^k = c(x^k), c′(x^k), h^k, g^k ∈ ∂h(c^k)

70 NSO pitfalls

71 Stopping test in smooth optimization. Algorithms for unconstrained smooth optimization use as optimality certificate Fermat's rule, 0 = ∇f(x̄), and generate a minimizing sequence {x^k} → x̄ such that ∇f(x^k) → 0. If f ∈ C¹, then ∇f(x̄) = 0

72 Stopping test in smooth optimization. Algorithms for unconstrained smooth optimization use as optimality certificate Fermat's rule, 0 = ∇f(x̄), and generate a minimizing sequence {x^k} → x̄ such that ∇f(x^k) → 0. If f ∈ C¹, then ∇f(x̄) = 0. Things are less straightforward if f is nonsmooth...

73 What happens with the stopping test in NSO? Algorithms for unconstrained NSO use as optimality certificate the inclusion 0 ∈ ∂f(x̄). As a set-valued mapping, ∂f(x) is osc: ( x^k → x̄, g(x^k) ∈ ∂f(x^k), g(x^k) → ḡ ) ⇒ ḡ ∈ ∂f(x̄)

74 What happens with the stopping test in NSO? Algorithms for unconstrained NSO use as optimality certificate the inclusion 0 ∈ ∂f(x̄). As a set-valued mapping, ∂f(x) is osc: ( x^k → x̄, g(x^k) ∈ ∂f(x^k), g(x^k) → ḡ ) ⇒ ḡ ∈ ∂f(x̄). As a set-valued mapping, ∂f(x) is not isc: given ḡ ∈ ∂f(x̄), there may exist NO sequence ( x^k → x̄, g(x^k) ∈ ∂f(x^k) ) with g(x^k) → ḡ

75 What happens with the stopping test in NSO? Algorithms for unconstrained NSO use as optimality certificate the inclusion 0 ∈ ∂f(x̄). As a set-valued mapping, ∂f(x) is osc: ( x^k → x̄, g(x^k) ∈ ∂f(x^k), g(x^k) → ḡ ) ⇒ ḡ ∈ ∂f(x̄). As a set-valued mapping, ∂f(x) is not isc: given ḡ ∈ ∂f(x̄), there may exist NO sequence ( x^k → x̄, g(x^k) ∈ ∂f(x^k) ) with g(x^k) → ḡ

76 The subdifferential. For the absolute value function f(x) = |x|: ∂f(x) = {−1} if x < 0, [−1, 1] if x = 0, {1} if x > 0; and ∂_ε f(x) = [−1, −1 − ε/x] if x < −ε/2, [−1, 1] if −ε/2 ≤ x ≤ ε/2, [1 − ε/x, 1] if x > ε/2

77 What happens with the stopping test in NSO? We need to design a sound stopping test that does not rely on the straightforward extension of Fermat's rule.

78 What happens with the stopping test in NSO? We need to design a sound stopping test that does not rely on the straightforward extension of Fermat's rule. We use instead ḡ ∈ ∂_ε f(x̄) for ‖ḡ‖ and ε small, where the ε-subdifferential contains the slopes of linearizations supporting f up to ε, tangent at x: ∂_ε f(x) = {g ∈ IRⁿ : f(y) ≥ f(x) + ⟨g, y − x⟩ − ε for all y}

79 The ε-subdifferential. ∂_ε f(x) = {g ∈ IRⁿ : f(y) ≥ f(x) + ⟨g, y − x⟩ − ε for all y} [figure: f, f + ε, f − ε, and a subgradient linearization]

80 The ε-subdifferential. ∂_ε f(x) = {g ∈ IRⁿ : f(y) ≥ f(x) + ⟨g, y − x⟩ − ε for all y} [figure: f, f + ε, f − ε, and an ε-subgradient linearization]

83 The ε-subdifferential. For the absolute value function f(x) = |x|: ∂_ε f(x) = [−1, −1 − ε/x] if x < −ε/2, [−1, 1] if −ε/2 ≤ x ≤ ε/2, [1 − ε/x, 1] if x > ε/2; while ∂f(x) = {−1} if x < 0, [−1, 1] if x = 0, {1} if x > 0

84 The ε-subdifferential. For the absolute value function f(x) = |x|: ∂_ε f(x) = [−1, −1 − ε/x] if x < −ε/2, [−1, 1] if −ε/2 ≤ x ≤ ε/2, [1 − ε/x, 1] if x > ε/2; while ∂f(x) = {−1} if x < 0, [−1, 1] if x = 0, {1} if x > 0 [figure: graphs of ∂f and ∂_ε f, with breakpoints at −ε/2 and ε/2]
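
A small numerical check (mine) of this formula: for x > ε/2 the extreme slope g = 1 − ε/x should satisfy the ε-subgradient inequality |y| ≥ |x| + g(y − x) − ε for every y.

```python
import numpy as np

# check that g = 1 - eps/x is an eps-subgradient of |.| at x (with x > eps/2)
x, eps = 0.8, 0.5
g = 1 - eps / x

ys = np.linspace(-10, 10, 100_001)
violation = (np.abs(x) + g * (ys - x) - eps) - np.abs(ys)
print(violation.max())      # <= 0: the inequality holds for every sampled y
assert violation.max() <= 1e-12
```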

85 The ε-subdifferential. As a set-valued mapping, ∂_ε f(x) is osc: ( ε_k → ε, x^k → x̄, g(x^k) ∈ ∂_{ε_k} f(x^k), g(x^k) → ḡ ) ⇒ ḡ ∈ ∂_ε f(x̄). As a set-valued mapping, ∂_ε f(x) is also isc: given ḡ ∈ ∂_ε f(x̄), there exist ( ε_k → ε, x^k → x̄, g(x^k) ∈ ∂_{ε_k} f(x^k) ) such that g(x^k) → ḡ

86 The ε-subdifferential and bundle methods. Generate iterates so that, for a subsequence {x̂^k}: by osc, ( ε_k → ε, x̂^k → x̄, g(x̂^k) ∈ ∂_{ε_k} f(x̂^k), g(x̂^k) → ḡ ) ⇒ ḡ ∈ ∂_ε f(x̄), with ε = 0 and ḡ = 0; by isc, given ḡ ∈ ∂_ε f(x̄), there exist ( ε_k → ε, x̂^k → x̄, g(x̂^k) ∈ ∂_{ε_k} f(x̂^k) ) with g(x̂^k) → ḡ

87 The ε-subdifferential and bundle methods You told us we were going to use subgradient information provided by an oracle or a black box, and now you want to use ε-subgradients!

88 The transportation formula. How to express subgradients at x^i as ε-subgradients at x̂^k? g^i ∈ ∂f(x^i) if and only if, for all y ∈ IRⁿ, f(y) ≥ f(x^i) + ⟨g^i, y − x^i⟩ = f(x^i) + ⟨g^i, y − x^i⟩ ± f(x̂^k) = f(x̂^k) + ⟨g^i, y − x^i⟩ − ( f(x̂^k) − f(x^i) ) = f(x̂^k) + ⟨g^i, y − x^i ± x̂^k⟩ − ( f(x̂^k) − f(x^i) ) = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − ( f(x̂^k) − f(x^i) − ⟨g^i, x̂^k − x^i⟩ ) = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − e_i(x̂^k)

89 The transportation formula. How to express subgradients at x^i as ε-subgradients at x̂^k? g^i ∈ ∂f(x^i) if and only if, for all y ∈ IRⁿ, f(y) ≥ f(x^i) + ⟨g^i, y − x^i⟩ = f(x^i) + ⟨g^i, y − x^i⟩ ± f(x̂^k) = f(x̂^k) + ⟨g^i, y − x^i⟩ − ( f(x̂^k) − f(x^i) ) = f(x̂^k) + ⟨g^i, y − x^i ± x̂^k⟩ − ( f(x̂^k) − f(x^i) ) = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − ( f(x̂^k) − f(x^i) − ⟨g^i, x̂^k − x^i⟩ ) = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − e_i(x̂^k), that is, g^i ∈ ∂_{e_i(x̂^k)} f(x̂^k), where e_i(x̂^k) := f(x̂^k) − f(x^i) − ⟨g^i, x̂^k − x^i⟩ ≥ 0

96 Linearization errors [figure: f, the cutting-plane model M_k(·), the linearization f(x^j) + ⟨g^j, · − x^j⟩, and the linearization error e_j(x̂^k) measured at x̂^k]

97 The ε-subdifferential and bundle methods. We collect the black-box output at x^i, i = 1, 2, ..., k, so that at iteration k we can define a bundle of information, centered at a special iterate x̂^k ∈ {x^i}: B_k := { ( e_i(x̂^k) = f(x̂^k) − f(x^i) − ⟨g^i, x̂^k − x^i⟩, g^i ∈ ∂_{e_i(x̂^k)} f(x̂^k) ) }

98 The ε-subdifferential and bundle methods. We collect the black-box output at x^i, i = 1, 2, ..., k, so that at iteration k we can define a bundle of information, centered at a special iterate x̂^k ∈ {x^i}: B_k := { ( e_i(x̂^k) = f(x̂^k) − f(x^i) − ⟨g^i, x̂^k − x^i⟩, g^i ∈ ∂_{e_i(x̂^k)} f(x̂^k) ) }. A suitable convex combination ε_k := ∑_{i∈B_k} α_i e_i(x̂^k) and G_k := ∑_{i∈B_k} α_i g^i will eventually satisfy the optimality condition!
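
To make this concrete, a small numerical check (my own) for f = ‖·‖₁: the transported linearization errors are nonnegative, and the aggregate pair (ε_k, G_k) built from any convex weights α is indeed an ε_k-subgradient of f at the center x̂.

```python
import numpy as np
rng = np.random.default_rng(0)

f = lambda x: np.abs(x).sum()
subg = lambda x: np.sign(x)

# bundle points and the current center (stability center) x_hat
pts = [np.array([1.0, -2.0]), np.array([0.5, 0.5]), np.array([-1.0, 1.0])]
x_hat = np.array([0.3, -0.4])

# transportation formula: e_i = f(x_hat) - f(x_i) - <g_i, x_hat - x_i> >= 0
g = [subg(x) for x in pts]
e = [f(x_hat) - f(x) - gi @ (x_hat - x) for x, gi in zip(pts, g)]
assert all(ei >= -1e-12 for ei in e)

# aggregate with arbitrary convex weights alpha
alpha = np.array([0.2, 0.5, 0.3])
eps_k = alpha @ e
G_k = sum(a * gi for a, gi in zip(alpha, g))

# check G_k is an eps_k-subgradient at x_hat:
# f(y) >= f(x_hat) + <G_k, y - x_hat> - eps_k for all y
for _ in range(1000):
    y = rng.normal(size=2) * 5
    assert f(y) >= f(x_hat) + G_k @ (y - x_hat) - eps_k - 1e-12
```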

99 Why special NSO methods? Smooth optimization techniques do not work. For f(x) = |x|: |f′(x^k)| = 1 whenever x^k ≠ 0, while ∂f(0) = [−1, 1]. The smooth stopping test fails: ‖∇f(x^k)‖ ≤ TOL never triggers (nor does ‖g(x^k)‖ ≤ TOL)

100 Why special NSO methods? Smooth optimization techniques do not work. Smooth approximations of derivatives by finite differences fail. For f : IR³ → IR defined by f(x) = max(x₁, x₂, x₃), ∂f(0) = ? Forward finite differences [f(x + Δe_i) − f(x)]/Δ give (1, 1, 1); central finite differences [f(x + Δe_i) − f(x − Δe_i)]/(2Δ) give (1/2, 1/2, 1/2): none of them is in the subdifferential, which is the unit simplex {g ≥ 0 : ∑ g_i = 1}!

101 Why special NSO methods? Smooth optimization techniques do not work. Smooth approximations of derivatives by finite differences fail. For f : IR³ → IR defined by f(x) = max(x₁, x₂, x₃), ∂f(0) = ? Forward finite differences [f(x + Δe_i) − f(x)]/Δ give (1, 1, 1); central finite differences [f(x + Δe_i) − f(x − Δe_i)]/(2Δ) give (1/2, 1/2, 1/2): none of them is in the subdifferential, which is the unit simplex {g ≥ 0 : ∑ g_i = 1}!
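
A short sketch (mine) reproducing this finite-difference computation numerically:

```python
import numpy as np

f = lambda x: np.max(x)           # f(x) = max(x1, x2, x3), nonsmooth at 0
x = np.zeros(3)
d = 1e-6
I = np.eye(3)

forward = np.array([(f(x + d*e) - f(x)) / d for e in I])
central = np.array([(f(x + d*e) - f(x - d*e)) / (2*d) for e in I])
print(forward)   # [1. 1. 1.]
print(central)   # [0.5 0.5 0.5]

# Neither vector lies in the subdifferential of max at 0, which is the
# unit simplex {g >= 0, sum(g) = 1}: their components sum to 3 and 1.5.
```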

102 Why special NSO methods? Smooth optimization techniques do not work Linesearches get trapped in kinks and fail

103 Why special NSO methods? Smooth optimization techniques do not work Linesearches get trapped in kinks and fail Example 9.1 Instability of steepest descent"

104 Why special NSO methods? Smooth optimization techniques do not work. −g(x^k) may not provide descent

105 Why special NSO methods? Smooth optimization techniques do not work. −g(x^k) may not provide descent

106 Why special NSO methods? Smooth optimization techniques do not work Smooth stopping test fails Finite difference approximations fail Linesearches get trapped in kinks and fail Direction opposite to a subgradient may increase the functional values

107

108

109

110 Bundle Methods. WANT: p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂². HAVE: q^k = prox_{t_k}^{M_k}(x): q^k = argmin_y M_k(y) + (1/2t_k)‖y − x‖₂², i.e. x = q^k + t_k G^k with G^k ∈ ∂M_k(q^k). Then G^k ∈ ∂_{ε_k} f(x), for ε_k = f(x) − M_k(q^k) − t_k‖G^k‖₂²

111 Bundle Methods. WANT: p = prox_t^f(x): p = argmin_y f(y) + (1/2t)‖y − x‖₂². HAVE: q^k = prox_{t_k}^{M_k}(x): q^k = argmin_y M_k(y) + (1/2t_k)‖y − x‖₂², i.e. x = q^k + t_k G^k with G^k ∈ ∂M_k(q^k). Then G^k ∈ ∂_{ε_k} f(x), for ε_k = f(x) − M_k(q^k) − t_k‖G^k‖₂². Two subsequences: iterates giving sufficiently good approximal points (serious), and iterates just helping the optimization process, for which [CL93] eventually applies (null)

112 Bundle Methods. HAVE: q^k = prox_{t_k}^{M_k}(x) = x − t_k G^k, with G^k ∈ ∂_{ε_k} f(x) for ε_k = f(x) − M_k(q^k) − t_k‖G^k‖₂². Two subsequences: iterates giving sufficiently good approximal points, moving towards a minimum in a manner that drives δ_k := ε_k + t_k‖G^k‖² to zero (serious), and iterates just helping the optimization process, for which [CL93] eventually applies (null)

113 Bundle Methods

114 Bundle Methods [figure: minimizing M₁(·) + (1/2t₁)‖· − x̂¹‖²]

115 Bundle Methods [figure: minimizing M₂(·) + (1/2t₂)‖· − x̂²‖²; iterates x̂¹, x²]

116 Bundle Methods [figure: iterates x̂¹, x̂³, x²]

117 Bundle Methods [figure: iterates x̂¹, x̂⁴, x̂³, x²]

118 Bundle Methods. 0. Choose x¹, set k = 1, and let x̂¹ = x¹. 1. Compute x^{k+1} = argmin_x M_k(x) + (1/2t_k)‖x − x̂^k‖₂². 2. If δ_k := f(x̂^k) − M_k(x^{k+1}) ≤ tol, STOP. 3. Call the oracle at x^{k+1}. If f(x^{k+1}) ≤ f(x̂^k) − mδ_k, set x̂^{k+1} = x^{k+1} (Serious Step); otherwise keep x̂^{k+1} = x̂^k (Null Step). 4. Define M_{k+1}, t_{k+1}, set k = k + 1, and loop to 1.
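
To fix ideas, here is a self-contained Python sketch (my own, not the tutorial's implementation) of these steps, with the QP of step 1 solved in its (x, r) form by scipy's SLSQP. The test problem f(x) = ‖x‖₁ + ½‖x − a‖² with a = (1, −2) and minimizer (0, −1), the function names proximal_bundle and solve_subproblem, and the parameter values t, m, tol are all my choices.

```python
import numpy as np
from scipy.optimize import minimize

# Test problem (mine, not from the slides): f(x) = ||x||_1 + 0.5*||x - a||^2,
# whose minimizer is the soft-thresholding of a; here x* = (0, -1), f(x*) = 2.
a = np.array([1.0, -2.0])
def oracle(x):
    x = np.asarray(x, dtype=float)
    return np.abs(x).sum() + 0.5 * np.sum((x - a) ** 2), np.sign(x) + (x - a)

def solve_subproblem(cuts, x_hat, t):
    """Step 1 as a QP in (x, r): min r + (1/2t)||x - x_hat||^2
       s.t. r >= f_i + <g_i, x - x_i> for every cut in the bundle."""
    n = len(x_hat)
    def obj(z):                      # z = (x, r)
        return z[n] + np.sum((z[:n] - x_hat) ** 2) / (2 * t)
    cons = [{'type': 'ineq',
             'fun': lambda z, fi=fi, gi=gi, xi=xi: z[n] - fi - gi @ (z[:n] - xi)}
            for fi, gi, xi in cuts]
    r0 = max(fi + gi @ (x_hat - xi) for fi, gi, xi in cuts)   # M_k(x_hat)
    res = minimize(obj, np.concatenate([x_hat, [r0]]),
                   constraints=cons, method='SLSQP')
    return res.x[:n], res.x[n]       # candidate x^{k+1} and M_k(x^{k+1})

def proximal_bundle(x1, t=1.0, m=0.1, tol=1e-5, max_iter=100):
    x_hat = np.asarray(x1, dtype=float)
    f_hat, g_hat = oracle(x_hat)
    cuts = [(f_hat, g_hat, x_hat.copy())]
    for _ in range(max_iter):
        x_new, model_val = solve_subproblem(cuts, x_hat, t)
        delta = f_hat - model_val              # delta_k = f(x_hat) - M_k(x^{k+1})
        if delta <= tol:
            break
        f_new, g_new = oracle(x_new)
        cuts.append((f_new, g_new, x_new))
        if f_new <= f_hat - m * delta:         # descent test: Serious Step
            x_hat, f_hat = x_new, f_new
        # otherwise: Null Step, keep the stability center x_hat
    return x_hat, f_hat

print(proximal_bundle([3.0, 3.0]))   # approx. (array([0., -1.]), 2.0)
```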

119 Bundle Methods: selection mechanism. M_{k+1}(·) = max( M_k(·), f_{k+1} + ⟨g_{k+1}, · − x^{k+1}⟩ ); now the choice of the new model is more flexible: x^{k+1} ∈ argmin_x M_k(x) + (1/2t_k)‖x − x̂^k‖², with M_k(x) = max_{i≤k} { f_i + ⟨g_i, x − x^i⟩ }, is equivalent to a QP: min_{r∈IR, x∈IRⁿ} r + (1/2t_k)‖x − x̂^k‖² s.t. r ≥ f_i + ⟨g_i, x − x^i⟩ for i ≤ k. A posteriori, the solution remains the same if...

120 Bundle Methods: selection mechanism. M_{k+1}(·) = max( M_k(·), f_{k+1} + ⟨g_{k+1}, · − x^{k+1}⟩ ); now the choice of the new model is more flexible: x^{k+1} ∈ argmin_x M_k(x) + (1/2t_k)‖x − x̂^k‖², with M_k(x) = max_{i≤k} { f_i + ⟨g_i, x − x^i⟩ }, is equivalent to a QP: min_{r∈IR, x∈IRⁿ} r + (1/2t_k)‖x − x̂^k‖² s.t. r ≥ f_i + ⟨g_i, x − x^i⟩ for i ≤ k. A posteriori, the solution remains the same if all, or...

121 Bundle Methods: selection mechanism. M_{k+1}(·) = max( M_k(·), f_{k+1} + ⟨g_{k+1}, · − x^{k+1}⟩ ); now the choice of the new model is more flexible: x^{k+1} ∈ argmin_x M_k(x) + (1/2t_k)‖x − x̂^k‖², with M_k(x) = max_{i≤k} { f_i + ⟨g_i, x − x^i⟩ }, is equivalent to a QP: min_{r∈IR, x∈IRⁿ} r + (1/2t_k)‖x − x̂^k‖² s.t. r ≥ f_i + ⟨g_i, x − x^i⟩ for the active i's. A posteriori, the solution remains the same if all, or active, or...

122 Bundle Methods: selection mechanism. M_{k+1}(·) = max( M_k(·), f_{k+1} + ⟨g_{k+1}, · − x^{k+1}⟩ ); now the choice of the new model is more flexible: x^{k+1} ∈ argmin_x M_k(x) + (1/2t_k)‖x − x̂^k‖², with M_k(x) = max_{i≤k} { f_i + ⟨g_i, x − x^i⟩ }, is equivalent to a QP: min_{r∈IR, x∈IRⁿ} r + (1/2t_k)‖x − x̂^k‖² s.t. r ≥ ∑_i ᾱ_i ( f_i + ⟨g_i, x − x^i⟩ ). A posteriori, the solution remains the same if all, or active, or the optimal convex combination

123 Bundle Methods: selection mechanism. M_{k+1}(·) = max( M_k(·), f_{k+1} + ⟨g_{k+1}, · − x^{k+1}⟩ ); now the choice of the new model is more flexible: x^{k+1} ∈ argmin_x M_k(x) + (1/2t_k)‖x − x̂^k‖², with M_k(x) = max_{i≤k} { f_i + ⟨g_i, x − x^i⟩ }, is equivalent to a QP: min_{r∈IR, x∈IRⁿ} r + (1/2t_k)‖x − x̂^k‖² s.t. r ≥ ∑_i ᾱ_i ( f_i + ⟨g_i, x − x^i⟩ ). A posteriori, the solution remains the same if all, or active, or the optimal convex combination are kept

124 Bundle Methods: next model options. M_{k+1}(·) = max( M_k(·), f_{k+1} + ⟨g_{k+1}, · − x^{k+1}⟩ ), or M_{k+1}(·) = max( max over the active cuts, f_{k+1} + ⟨g_{k+1}, · − x^{k+1}⟩ ), or M_{k+1}(·) = max( aggregate, f_{k+1} + ⟨g_{k+1}, · − x^{k+1}⟩ ). Same QP solution if all, or the active, or the optimal convex combination are kept. aggregate = full Bundle Compression: QP with only 2 constraints (but it slows down the overall process)

125 The cutting-plane model. You told us we were going to use a bundle B_k composed of linearization errors and ε-subgradients at x̂^k, but the model uses f_i and g_i ∈ ∂f(x^i)

126 Rewriting the cutting-plane model. The transportation formula centers the i-th linearization at the serious iterate: f(y) ≥ f(x^i) + ⟨g^i, y − x^i⟩ = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − e_i(x̂^k) [figure: f, the model M_k(·), the linearization f(x^j) + ⟨g^j, · − x^j⟩ and the error e_j(x̂^k) at x̂^k]

127 Rewriting the cutting-plane model. The transportation formula centers the i-th linearization at the serious iterate: f(y) ≥ f(x^i) + ⟨g^i, y − x^i⟩ = f(x̂^k) + ⟨g^i, y − x̂^k⟩ − e_i(x̂^k). This translates into the model as follows: M_k(y) = max{ f(x^i) + ⟨g^i, y − x^i⟩ : i ∈ B_k } = max{ f(x̂^k) + ⟨g^i, y − x̂^k⟩ − e_i(x̂^k) : i ∈ B_k } = f(x̂^k) + max{ ⟨g^i, y − x̂^k⟩ − e_i(x̂^k) : i ∈ B_k }
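
A two-line numerical check (mine) that both ways of writing the cutting-plane model coincide, reusing the kind of bundle data used in the earlier sketches.

```python
import numpy as np

f = lambda x: np.abs(x).sum()
pts = [np.array([1.0, -2.0]), np.array([0.5, 0.5]), np.array([-1.0, 1.0])]
gs = [np.sign(x) for x in pts]
fs = [f(x) for x in pts]
x_hat = np.array([0.3, -0.4])
e = [f(x_hat) - fi - gi @ (x_hat - xi) for fi, gi, xi in zip(fs, gs, pts)]

y = np.array([1.1, 0.7])
M1 = max(fi + gi @ (y - xi) for fi, gi, xi in zip(fs, gs, pts))
M2 = f(x_hat) + max(gi @ (y - x_hat) - ei for gi, ei in zip(gs, e))
assert abs(M1 - M2) < 1e-12    # the two forms of M_k coincide
```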

128 Bundle Method. 0. Choose x¹, set k = 1, and let x̂¹ = x¹. 1. Compute x^{k+1} = argmin_x M_k(x) + (1/2t_k)‖x − x̂^k‖₂². 2. If δ_k := f(x̂^k) − M_k(x^{k+1}) ≤ tol, STOP. 3. Call the oracle at x^{k+1}. If f(x^{k+1}) ≤ f(x̂^k) − mδ_k, set x̂^{k+1} = x^{k+1}; otherwise keep x̂^{k+1} = x̂^k. 4. Define M_{k+1}, t_{k+1}, set k = k + 1, and loop to 1.

129 Bundle Method. 0. Choose x¹, set k = 1, and let x̂¹ = x¹. 1. Compute x^{k+1} = argmin_x M_k(x) + (1/2t_k)‖x − x̂^k‖₂². 2. If δ_k := f(x̂^k) − M_k(x^{k+1}) ≤ tol, STOP. 3. Call the oracle at x^{k+1}. If f(x^{k+1}) ≤ f(x̂^k) − mδ_k, set x̂^{k+1} = x^{k+1} (Serious Step), k ∈ K_S; otherwise keep x̂^{k+1} = x̂^k (Null Step), k ∈ K_N. 4. Define M_{k+1}, t_{k+1}, set k = k + 1, and loop to 1.

130 Bundle Method. When k → ∞, the algorithm generates two subsequences. Convergence analysis addresses the mutually exclusive situations: either the SS subsequence is infinite, K := {k ∈ K_S} (a limit point minimizes f); or there is a last SS, followed by infinitely many null steps, K := {k ∈ K_N : k ≥ last SS} (the last SS minimizes f and the null steps converge to the last SS)

133 Equivalent QPs. 1. Given t_k, the stepsize parameter of the proximal bundle method, with QP subproblem (PB)_k: min_x M_k(x) + (1/2t_k)‖x − x̂^k‖². 2. Given Δ_k, the radius parameter of the trust-region bundle method, with QP subproblem (TRB)_k: min_x M_k(x) s.t. ‖x − x̂^k‖² ≤ Δ_k. 3. Given l_k, the level parameter of the level bundle method,

134 with QP subproblem (LB)_k: min_x (1/2)‖x − x̂^k‖² s.t. M_k(x) ≤ l_k. Show that: 1. given t_k, there exists Δ_k such that if x^{k+1} solves (PB)_k, then x^{k+1} solves (TRB)_k; 2. given Δ_k, there exists l_k such that if x^{k+1} solves (TRB)_k, then x^{k+1} solves (LB)_k; 3. given l_k, there exists t_k such that if x^{k+1} solves (LB)_k, then x^{k+1} solves (PB)_k.

135 Theorem (K := {k ∈ K_S}). Suppose the bundle method loops forever and there are infinitely many serious steps. Either the solution set of min f is empty and f(x̂^k) → −∞, or the following holds: (i) lim_{k∈K_S} δ_k = 0 and lim_{k∈K_S} ε_k = 0; (ii) if the stepsizes are chosen so that ∑_{k∈K_S} t_k = +∞, then {x̂^k} is a minimizing sequence; (iii) if, in addition, t_k ≤ t_up for all k ∈ K_S, then the subsequence {x̂^k} is bounded. In this case, any limit point x̄ minimizes f and the whole sequence converges to x̄

136 Theorem (K := {k ∈ K_N : k ≥ k̂}). Suppose the bundle method loops forever and there are infinitely many null steps after a last serious one, denoted by x̂ and generated at iteration k̂. Suppose the stepsizes are chosen so that t_lo ≤ t_{k+1} ≤ t_k for all k ∈ K. The following holds: 1. the sequence {x^{k+1}} is bounded; 2. lim_{k∈K} M_k(x^{k+1}) = f(x̂); 3. x̂ minimizes f; 4. lim_{k∈K} x^{k+1} = x̂

137 Model requirements. 1. M_k ≤ f. 2. If k was declared a null step: a) M_{k+1}(x) ≥ f_{k+1} + ⟨g_{k+1}, x − x^{k+1}⟩; b) M_{k+1}(x) ≥ A_k(x) := M_k(x^{k+1}) + ⟨G^k, x − x^{k+1}⟩

138 Model requirements. 1. M_k ≤ f. 2. If k was declared a null step: a) M_{k+1}(x) ≥ f_{k+1} + ⟨g_{k+1}, x − x^{k+1}⟩; b) M_{k+1}(x) ≥ A_k(x) := M_k(x^{k+1}) + ⟨G^k, x − x^{k+1}⟩. Any model satisfying these conditions that is used in the QP maintains the convergence results

139 Comparing the methods: bundle and SG Typical performance on a battery of Unit Commitment problems

140 Bundle Methods with on-demand accuracy the new generation

141 Oracle types: exact and upper. f₁(x) is easy: exact oracle f₁(x)/g₁(x). f₂(x) is difficult: inexact oracle f_{2x}/g_{2x}. [figure: f₁, f₂ and the linearizations f₁(x) + ⟨g₁(x), · − x⟩ and f_{2x} + ⟨g_{2x}, · − x⟩] The oracle f_{2x}/g_{2x} is NOT of lower type

142 Oracle types: exact and lower. f₁(x) is easy: exact oracle f₁(x)/g₁(x). f₂(x) is less difficult: inexact oracle f_{2x}/g_{2x}. [figure: f₁, f₂ and the linearizations f₁(x) + ⟨g₁(x), · − x⟩ and f_{2x} + ⟨g_{2x}, · − x⟩] The oracle f_{2x}/g_{2x} is of lower type

143 For the EM problem, f_j(x) = max{ C_j(q_j) + ⟨x, g_j(q_j)⟩ : q_j ∈ P_j }. By computing f_{x^k} and g_{x^k} satisfying f_{x^k} = f(x^k) − η_k and g_{x^k} ∈ ∂_{η_k} f(x^k), we can build: a lower oracle; an asymptotically exact oracle (η_k → 0 as k → ∞); a partly asymptotically exact oracle (η_k → 0 as K_s ∋ k → ∞); an on-demand accuracy oracle (η_k ≤ η̄_k when f_{x^{k}} ≤ f_{x̂^k} − mδ_k)
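
As an illustration of the lower-oracle concept (my own construction, not the tutorial's), an inexact oracle for f = ‖·‖₁ can be obtained by linearizing f at a perturbed point z: the returned value f_x = f(z) + ⟨g_z, x − z⟩ never exceeds f(x), and by the transportation formula g_z is an η-subgradient at x with η = f(x) − f_x.

```python
import numpy as np
rng = np.random.default_rng(1)

f = lambda x: np.abs(x).sum()

def lower_oracle(x, noise=0.3):
    """Inexact oracle of lower type: returns f_x <= f(x) and an eta-subgradient."""
    x = np.asarray(x, dtype=float)
    z = x + noise * rng.normal(size=x.size)   # evaluation done "somewhere nearby"
    g_z = np.sign(z)                          # exact subgradient at z
    f_x = f(z) + g_z @ (x - z)                # lower bound on f(x) (convexity)
    return f_x, g_z

x = np.array([0.4, -1.0, 0.2])
f_x, g_x = lower_oracle(x)
eta = f(x) - f_x                              # the (unknown to the user) inaccuracy
assert eta >= -1e-12

# g_x is an eta-subgradient of f at x: f(y) >= f(x) + <g_x, y - x> - eta
for _ in range(1000):
    y = rng.normal(size=3) * 4
    assert f(y) >= f(x) + g_x @ (y - x) - eta - 1e-12
```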

144 BM with lower inexact oracles. M_k(x) = max{ f_{x^i} + ⟨g_{x^i}, x − x^i⟩ : i ∈ B_k }, δ_k = ε_k + t_k‖G^k‖². SS test: f_{x^{k+1}} ≤ f̂_k − mδ_k, with f̂_k := max{ f_{x̂^k}, max_j M_j(x̂^k) }. Oracle inaccuracy is locally bounded: for all R ≥ 0 there is η(R) ≥ 0 such that ‖x‖ ≤ R ⇒ η ≤ η(R). ⇒ convergence as before, up to the accuracy on SS

145 BM with lower inexact oracles. M_k(x) = max{ f_{x^i} + ⟨g_{x^i}, x − x^i⟩ : i ∈ B_k }, δ_k = ε_k + t_k‖G^k‖². SS test: f_{x^{k+1}} ≤ f̂_k − mδ_k, with f̂_k := max{ f_{x̂^k}, max_j M_j(x̂^k) }. Oracle inaccuracy is locally bounded: for all R ≥ 0 there is η(R) ≥ 0 such that ‖x‖ ≤ R ⇒ η ≤ η(R). ⇒ convergence as before, up to the accuracy on SS. Convex proximal bundle methods in depth: a unified analysis for inexact oracles, W. de Oliveira, C. Sagastizábal, C. Lemaréchal, MathProg 148, 2014

146 General comments Bundle methods are robust (do not oscillate, as CP methods do) reliable (have a stopping test, unlike SG methods) can deal with inaccuracy in a reasonable manner

147 Extending bundle methods

148 Constrained NSO problems: an example

149 Optimal management of the hydrovalley. max λ E_η(ρ(u)) s.t. (x,u) ∈ P, P_η( Au + a^min ≤ Mη ≤ Au + a^max ) ≥ p

150 Optimal management of the hydrovalley. max λ E_η(ρ(u)) s.t. (x,u) ∈ P, P_η( Au + a^min ≤ Mη ≤ Au + a^max ) ≥ p. Is this a convex program?

151 Optimal management of the hydrovalley. max λ E_η(ρ(u)) s.t. (x,u) ∈ P, P_η( Au + a^min ≤ Mη ≤ Au + a^max ) ≥ p. Is this a convex program? YES: the function u ↦ −log( P_η( Au + a^min ≤ Mη ≤ Au + a^max ) ) is convex. We need to solve min f(u) s.t. (x,u) ∈ P, c(u) ≤ 0, for linear f and with c(u) := −log( P_η( Au + a^min ≤ Mη ≤ Au + a^max ) ) + log p

152 Optimal management of the hydrovalley. max λ E_η(ρ(u)) s.t. (x,u) ∈ P, P_η( Au + a^min ≤ Mη ≤ Au + a^max ) ≥ p. Is this a convex program? YES: the function u ↦ −log( P_η( Au + a^min ≤ Mη ≤ Au + a^max ) ) is convex. We need to solve min f(u) s.t. (x,u) ∈ P, c(u) ≤ 0, for linear f and with c(u) := −log( P_η( Au + a^min ≤ Mη ≤ Au + a^max ) ) + log p, which is difficult to compute!

153 Need to solve the constrained problem (P): min f(u) s.t. (x,u) ∈ P, c(u) ≤ 0, for linear f and with inexact evaluation of c and its gradient, via a black box with controllable inaccuracy (bounded by a given tolerance ε, with confidence level 99%, noting that evaluation errors can be positive or negative)

154 Handling constraints in NSO. For nonsmooth constrained problems, use the Improvement Function: replace min_u { f(u) : c(u) ≤ 0 } by min_u max{ f(u) − f(û), c(u) } (it changes with each serious point û and supposes exact f/c values available) [SagSol SiOPT, 2005 and KarasRibSagSol MPB, 2009]

155 Improvement function. Let (x̄, ū) be a solution to (P). The function H_ū(u) := max{ f(u) − f(ū), c(u) } has perfect theoretical properties: if the Slater condition (∃ (x,u) ∈ P s.t. c(u) < 0) holds, then ū solves (P) min { f(u) : c(u) ≤ 0, (x,u) ∈ P } ⇔ min_{(x,u)∈P} H_ū(u) = H_ū(ū) = 0 ⇔ 0 ∈ ∂H(ū) for H(·) := H_ū(·)

156 Improvement function. Let (x̄, ū) be a solution to (P). The function H_ū(u) := max{ f(u) − f(ū), c(u) } has perfect theoretical properties: without the Slater condition, ū need not solve (P) min { f(u) : c(u) ≤ 0, (x,u) ∈ P }, BUT still min_{(x,u)∈P} H_ū(u) = H_ū(ū) = 0 and also 0 ∈ ∂H(ū) for H(·) := H_ū(·)

157 Improvement function. Let (x̄, ū) be a solution to (P). The function H_ū(u) := max{ f(u) − f(ū), c(u) } has perfect theoretical properties: without the Slater condition, when c(ū) ≤ 0, ū solves (P) min { f(u) : c(u) ≤ 0, (x,u) ∈ P }, otherwise it minimizes infeasibility over P; and still min_{(x,u)∈P} H_ū(u) = H_ū(ū) = 0 and 0 ∈ ∂H(ū) for H(·) := H_ū(·)

158 Handling nonconvex problems

159 Nonconvex proximal point mapping [PR96]. p_R f(x) := argmin_{y∈IR^N} { f(y) + (R/2)‖y − x‖² }; x is the prox-center and R > R_x is the prox-parameter. Theorem: if f is convex, then p_R f is well defined for any R > 0; p_R f is single-valued and locally Lipschitz; p = p_R f(x) ⇔ R(x − p) ∈ ∂f(p); x* minimizes f ⇔ x* = p_R f(x*) for any R > 0; and x^{k+1} = p_R f(x^k) converges to a minimizer x*. What if f is nonconvex?

160 Nonconvex difficulties. Proximal Bundle Methods are the most robust and reliable (oracle) methods for convex minimization. Their success relies heavily on convexity. If f is convex: x^{k+1} = p_R f(x^k) converges to a minimizer x*; the cutting-plane model f̌_k lies entirely below f. This may no longer be true for nonconvex f

164 Nonconvex difficulties. Proximal Bundle Methods are the most robust and reliable (oracle) methods for convex minimization. Their success relies heavily on convexity. If f is convex: x^{k+1} = p_R f(x^k) converges to a minimizer x*; the cutting-plane model f̌_k lies entirely below f. This may no longer be true for nonconvex f. How has this difficulty been addressed?

165 Take each plane in the model, f_i + ⟨g_i, · − y^i⟩, and rewrite it, centered at x^k: f_i + ⟨g_i, · − y^i⟩ = f(x^k) − [ f(x^k) − f_i − ⟨g_i, x^k − y^i⟩ ] + ⟨g_i, · − x^k⟩ = f(x^k) − e^f_{k,i} + ⟨g_i, · − x^k⟩, so that f̌_k(y) = max_i { f(x^k) − e^f_{k,i} + ⟨g_i, y − x^k⟩ }. Good: e^f_{k,i} positive ⇒ convergence. Good: if f is convex, e^f_{k,i} is positive. BAD: if f is nonconvex, e^f_{k,i} may be negative

166 Nonconvex bundle methods fix negative linearization errors, replacing f̌_k by: f̌_k^FIX(y) = max_i { f(x^k) − e^f_{k,i} + ⟨g_i, y − x^k⟩ } (with the negative errors suitably modified) [Mif77, Lem80, Kiw85, Luk98]

172 A new method. A different approach (ours) is based on the following trick. Take η, µ > 0 with R = η + µ and note: p_R f(x^k) = argmin_w { f(w) + (R/2)‖w − x^k‖² } = argmin_w { f(w) + ((η + µ)/2)‖w − x^k‖² } = argmin_w { f(w) + (η/2)‖w − x^k‖² + (µ/2)‖w − x^k‖² } = argmin_w { F_η(w) + (µ/2)‖w − x^k‖² } = p_µ(F_η)(x^k), that is, p_R f = p_µ(F_η)
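
A one-dimensional numerical check (mine) of this identity for f(w) = |w|, by brute-force grid minimization: splitting R = η + µ and absorbing the η-part into F_η leaves the proximal point unchanged.

```python
import numpy as np

f = lambda w: np.abs(w)
x_k, R = 1.5, 2.0
eta, mu = 1.2, 0.8                      # any split with eta + mu = R

ws = np.linspace(-5, 5, 2_000_001)
p_R  = ws[np.argmin(f(ws) + (R / 2) * (ws - x_k) ** 2)]

F_eta = lambda w: f(w) + (eta / 2) * (w - x_k) ** 2
p_mu  = ws[np.argmin(F_eta(ws) + (mu / 2) * (ws - x_k) ** 2)]

print(p_R, p_mu)                        # both ~ 1.0 (= x_k - 1/R here)
assert abs(p_R - p_mu) < 1e-5
```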

173 Redistributed Proximal Bundle Method. At the l-th iteration, for k = k(l), given R_k, x^k and a bundle B = {y^i, f_i, g_i, i ∈ I_l}: 0. Split R_k into η_l and µ_l. 1. Model F_{η_l} by F̌_{η_l,l}(y) = max_{i∈B} { F^{η_l}_i + ⟨g^{η_l}_i, y − y^i⟩ }. 2. Minimize the penalized model: y^{l+1} = argmin{ F̌_{η_l,l}(y) + (µ_l/2)‖y − x^k‖² }. 3. Descent test: if y^{l+1} is good, set x^{k+1} ← y^{l+1} and define R_{k+1} (serious step); if y^{l+1} is bad, null step. 4. Update the bundle: B ← B ∪ {y^{l+1}, f_{l+1}, g_{l+1}}.

174 VU quasi-newton bundle For x IR n, given matrices A 0, B 0, f(x) = x Ax + x Bx has a unique minimizer at x = 0. On N (A) the function is not differentiable, and the first term vanishes: f N (A) looks smooth. R(A) N (A) V U parallel to f( x) U perpendicular to V V is parallel to N (A), the ridge" of nonsmoothness

175

176 VU-Algorithm (Mifflin & Sagastizábal, MathProg 05). Recall that f restricted to N(A) is nice: the key is the two QP solves. [figure: R(A), N(A), V, U; 2 bundle QPs; Newton move] Two successive bundle QPs identify the ridge of nonsmoothness; a 2nd QP is solved to create an approximation of V based on ∂f̌(p̂^k)

177 VU-Algorithm: superlinear convergence on the serious subsequence (Mifflin & Sagastizábal, MathProg 05)

178 To learn more. Bundle methods history: R. MIFFLIN, C. SAGASTIZÁBAL, A Science Fiction Story in Nonsmooth Optimization Originating at IIASA, Documenta Math., vol-ismp/44_mifflin-robert.pdf (exact). Bundle books: J.F. BONNANS, J.C. GILBERT, C. LEMARÉCHAL, AND C. SAGASTIZÁBAL, Numerical Optimization: Theoretical and Practical Aspects, Springer, 2nd ed.; J.B. HIRIART-URRUTY AND C. LEMARÉCHAL, Convex Analysis and Minimization Algorithms II, no. 306 in Grund. der math. Wissenschaften, Springer, 2nd ed. Inexact Bundle theory (next page)

179 Inexact Bundle theory. M. HINTERMÜLLER, A proximal bundle method based on approximate subgradients, COAp 20 (2001). M.V. SOLODOV, On Approximations with Finite Precision in Bundle Methods for Nonsmooth Optimization, JOTA (2003). K.C. KIWIEL, A proximal bundle method with approximate subgradient linearizations, SiOpt 16 (2006). W. DE OLIVEIRA, C. SAGASTIZÁBAL, AND C. LEMARÉCHAL, Convex proximal bundle methods in depth: a unified analysis for inexact oracles, MathProg 148 (2014). Inexact Bundle variants with applications: G. EMIEL AND C. SAGASTIZÁBAL, Incremental-like bundle methods with application to energy planning, COAp 46 (2010). W. DE OLIVEIRA, C. SAGASTIZÁBAL, AND S. SCHEIMBERG, Inexact bundle methods for two-stage stochastic programming, SiOpt 21 (2011). (next page)

180 W. VAN ACKOOIJ AND C. SAGASTIZÁBAL, Constrained bundle methods for upper inexact oracles with application to joint chance constrained energy problems, SiOpt 24 (2014). W. DE OLIVEIRA AND C. SAGASTIZÁBAL, Level bundle methods for oracles with on-demand accuracy, OMS 29 (2014). W. DE OLIVEIRA AND C. SAGASTIZÁBAL, Bundle methods in the XXI century: a bird's-eye view, Pesquisa Operacional 34 (2014). W. DE OLIVEIRA AND M. SOLODOV, A doubly stabilized bundle method for nonsmooth convex optimization, MathProg 156(1) (2016)

181 Any doubts or questions? Just me


More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

Lecture 7: September 17

Lecture 7: September 17 10-725: Optimization Fall 2013 Lecture 7: September 17 Lecturer: Ryan Tibshirani Scribes: Serim Park,Yiming Gu 7.1 Recap. The drawbacks of Gradient Methods are: (1) requires f is differentiable; (2) relatively

More information

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725 Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

The Newton Bracketing method for the minimization of convex functions subject to affine constraints

The Newton Bracketing method for the minimization of convex functions subject to affine constraints Discrete Applied Mathematics 156 (2008) 1977 1987 www.elsevier.com/locate/dam The Newton Bracketing method for the minimization of convex functions subject to affine constraints Adi Ben-Israel a, Yuri

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

A Tutorial on Primal-Dual Algorithm

A Tutorial on Primal-Dual Algorithm A Tutorial on Primal-Dual Algorithm Shenlong Wang University of Toronto March 31, 2016 1 / 34 Energy minimization MAP Inference for MRFs Typical energies consist of a regularization term and a data term.

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Bundle methods for inexact data

Bundle methods for inexact data Bundle methods for inexact data Welington de Oliveira and Mihail Solodov Abstract Many applications of optimization to real-life problems lead to nonsmooth objective and/or constraint functions that are

More information

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS Napsu Karmitsa 1 Marko M. Mäkelä 2 Department of Mathematics, University of Turku, FI-20014 Turku,

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 3, No., pp. 109 115 c 013 Society for Industrial and Applied Mathematics AN ACCELERATED HYBRID PROXIMAL EXTRAGRADIENT METHOD FOR CONVEX OPTIMIZATION AND ITS IMPLICATIONS TO SECOND-ORDER

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

10. Unconstrained minimization

10. Unconstrained minimization Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Lecture 6: September 12

Lecture 6: September 12 10-725: Optimization Fall 2013 Lecture 6: September 12 Lecturer: Ryan Tibshirani Scribes: Micol Marchetti-Bowick Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not

More information

ε-enlargements of maximal monotone operators: theory and applications

ε-enlargements of maximal monotone operators: theory and applications ε-enlargements of maximal monotone operators: theory and applications Regina S. Burachik, Claudia A. Sagastizábal and B. F. Svaiter October 14, 2004 Abstract Given a maximal monotone operator T, we consider

More information

SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University

SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

THE NEWTON BRACKETING METHOD FOR THE MINIMIZATION OF CONVEX FUNCTIONS SUBJECT TO AFFINE CONSTRAINTS

THE NEWTON BRACKETING METHOD FOR THE MINIMIZATION OF CONVEX FUNCTIONS SUBJECT TO AFFINE CONSTRAINTS THE NEWTON BRACKETING METHOD FOR THE MINIMIZATION OF CONVEX FUNCTIONS SUBJECT TO AFFINE CONSTRAINTS ADI BEN-ISRAEL AND YURI LEVIN Abstract. The Newton Bracketing method [9] for the minimization of convex

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

More information

Benchmark of Some Nonsmooth Optimization Solvers for Computing Nonconvex Proximal Points

Benchmark of Some Nonsmooth Optimization Solvers for Computing Nonconvex Proximal Points Benchmark of Some Nonsmooth Optimization Solvers for Computing Nonconvex Proximal Points Warren Hare Claudia Sagastizábal February 28, 2006 Key words: nonconvex optimization, proximal point, algorithm

More information

ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS

ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS Mau Nam Nguyen (joint work with D. Giles and R. B. Rector) Fariborz Maseeh Department of Mathematics and Statistics Portland State

More information

Parallel Coordinate Optimization

Parallel Coordinate Optimization 1 / 38 Parallel Coordinate Optimization Julie Nutini MLRG - Spring Term March 6 th, 2018 2 / 38 Contours of a function F : IR 2 IR. Goal: Find the minimizer of F. Coordinate Descent in 2D Contours of a

More information

Subgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725

Subgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725 Subgradient Method Guest Lecturer: Fatma Kilinc-Karzan Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization 10-725/36-725 Adapted from slides from Ryan Tibshirani Consider the problem Recall:

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

The Proximal Gradient Method

The Proximal Gradient Method Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,

More information

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30, 2014 Abstract Regularized

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

Linear Convergence under the Polyak-Łojasiewicz Inequality

Linear Convergence under the Polyak-Łojasiewicz Inequality Linear Convergence under the Polyak-Łojasiewicz Inequality Hamed Karimi, Julie Nutini and Mark Schmidt The University of British Columbia LCI Forum February 28 th, 2017 1 / 17 Linear Convergence of Gradient-Based

More information

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex

More information

Convex Analysis Background

Convex Analysis Background Convex Analysis Background John C. Duchi Stanford University Park City Mathematics Institute 206 Abstract In this set of notes, we will outline several standard facts from convex analysis, the study of

More information

Stochastic and online algorithms

Stochastic and online algorithms Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

Proximal splitting methods on convex problems with a quadratic term: Relax!

Proximal splitting methods on convex problems with a quadratic term: Relax! Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.

More information

Proximal methods. S. Villa. October 7, 2014

Proximal methods. S. Villa. October 7, 2014 Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information