A multistart multisplit direct search methodology for global optimization

Ismael Vaz (Univ. Minho) and Luis Nunes Vicente (Univ. Coimbra)
IPAM, Optimization and Optimal Control for Complex Energy and Property Landscapes, October 2017

Outline
1. LOCAL: Probabilistic direct search. Deterministic case first. Motivation, performance, complexity.
2. GLOBAL: By multistarting and multisplitting probabilistic DS. How to split and merge runs. Use of non-convex modelling.

Direct-search methods
Definition: sample the objective function at a finite number of points at each iteration, and achieve descent by moving in the direction of potentially better points. In the smooth and deterministic case, these points are defined by directions in positive spanning sets (PSS).
[Figure: an example PSS $D$.]

Coordinate search
[Figure: a sequence of coordinate-search iterations.]

A class of DS methods
Choose $x_0$ and $\alpha_0$. For $k = 0, 1, 2, \dots$ (until $\alpha_k$ is sufficiently small):
- Search step (optional).
- Poll step: select a PSS $D_k$ and find $x_k + \alpha_k d_k$ ($d_k \in D_k$) such that $f(x_k + \alpha_k d_k) < f(x_k) - \rho(\alpha_k)$, with a forcing function like $\rho(\alpha) = \alpha^2/2$.
- SUCCESS: move, $x_{k+1} = x_k + \alpha_k d_k$, and possibly increase the step size, $\alpha_{k+1} = \gamma\,\alpha_k$ ($\gamma = 1$ or $2$).
- UNSUCCESS: stay, $x_{k+1} = x_k$, and decrease the step size, $\alpha_{k+1} = \theta\,\alpha_k$ ($\theta = 1/2$).
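A minimal sketch of this class of methods (poll step only, no search step), assuming NumPy, the coordinate PSS $D_\oplus = [I\ \ -I]$, and $\rho(\alpha)=\alpha^2/2$; the function name and defaults are illustrative, not taken from the talk.

```python
import numpy as np

def direct_search(f, x0, alpha0=1.0, gamma=2.0, theta=0.5, alpha_tol=1e-8, max_iter=10_000):
    """Basic direct search with sufficient decrease, polling along D_plus = [I -I]."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    alpha = alpha0
    fx = f(x)
    D = np.vstack([np.eye(n), -np.eye(n)])       # PSS: coordinate directions and negatives
    for _ in range(max_iter):
        if alpha < alpha_tol:                    # stop when the step size is sufficiently small
            break
        rho = 0.5 * alpha**2                     # sufficient-decrease threshold
        success = False
        for d in D:                              # poll step
            trial = x + alpha * d
            f_trial = f(trial)
            if f_trial < fx - rho:               # sufficient decrease found
                x, fx = trial, f_trial
                alpha *= gamma                   # successful iteration: expand the step size
                success = True
                break
        if not success:
            alpha *= theta                       # unsuccessful iteration: shrink the step size
    return x, fx

if __name__ == "__main__":
    # Example: minimize a smooth quadratic from the origin.
    x_best, f_best = direct_search(lambda x: np.sum((x - 1.0) ** 2), np.zeros(5))
    print(x_best, f_best)
```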

Deterministic approach
Positive spanning set (PSS).
[Figure: two example PSSs, each with cosine measure $1/\sqrt{n}$, together with the gradient $\nabla f(x_k)$.]
Cosine measure of a PSS $D$:
$$\mathrm{cm}(D) \;=\; \min_{0 \neq v \in \mathbb{R}^n}\; \max_{d \in D}\; \frac{d^\top v}{\|d\|\,\|v\|} \;>\; 0.$$
Thus there exists $d \in D$ which is a descent direction whenever $\nabla f(x_k) \neq 0$, and hence a sufficiently small $\alpha_k$ leads to success.

WCC of DS (smooth case)
Insight [Kolda, Lewis, Torczon, 2003 SIREV]:
$$\text{(decrease in } f) \;\geq\; \underbrace{\mathcal{O}(\alpha_k^2)}_{\text{success}}, \qquad
\underbrace{\mathcal{O}(\alpha_{k_u}^2) \;\geq\; \mathcal{O}\!\left(\|\nabla f(x_{k_u})\|^2\right)}_{\text{unsuccess}}.$$
Theorem (LNV, 2013 EURO J. Comput. Optim.): Any such DS method generates a sequence $\{x_k\}_{k \ge 0}$ such that
$$\min_{0 \le j \le k} \|\nabla f(x_j)\| = \mathcal{O}(1/\sqrt{k}),$$
and takes at most $k_\epsilon \le \mathcal{O}(n\,\epsilon^{-2})$ iterations to reduce the gradient norm below $\epsilon \in (0,1)$.
The number of function evaluations must be multiplied by $n$: $\mathcal{O}(n^2\epsilon^{-2})$.
The bounds depend on $L_f^2$ (instead of $L_f$, as in the gradient method).

WCC of DS (smooth, convex case)
Ruling out cases where the supremum of the distance from the initial level set $L_f(x_0)$ to the solution set $X_f$ is infinite...
Theorem (M. Dodangeh and LNV, 2016 Math. Program.): Any such DS method generates a sequence $\{x_k\}_{k \ge 0}$ such that
$$f(x_k) - f_* = \mathcal{O}(1/k), \qquad k_\epsilon \le \mathcal{O}(n\,\epsilon^{-1}).$$
Again, the number of function evaluations must be multiplied by $n$: $\mathcal{O}(n^2\epsilon^{-1})$.
The $n^2$ factor comes from $|D|\,\mathrm{cm}(D)^{-2}$. For $D = D_\oplus$ one obtains $2n \cdot (1/\sqrt{n})^{-2} = 2n^2$. Is this optimal?

Optimality of the $n^2$ factor
Theorem (M. Dodangeh, LNV, and Z. Zhang, 2016 Optim. Lett.): The factor $n^2$ is optimal, since any PSS $D$ in $\mathbb{R}^n$ satisfies
$$|D|\,\mathrm{cm}(D)^{-2} \;\ge\; \frac{1}{\zeta^2}\, n^2.$$
[Figure: plot of $|D|\,\mathrm{cm}(D)^{-2} \ge n^2$ as a function of $m = |D|$, for the case $n = 2$ and PSSs $D$ with uniform angles; $|D_\oplus|\,\mathrm{cm}(D_\oplus)^{-2} = 8$ and $n^2 = 4$.]

Global rate of DS (smooth, strongly convex case)
Theorem (M. Dodangeh and LNV, 2014 Math. Program.): Any such DS method generates a sequence $\{x_k\}_{k \ge 0}$ such that
$$f(x_k) - f_* < C\, r^k, \qquad r \in (0,1),\ C > 0.$$
When $f$ is strongly convex (constant $\mu > 0$), one has ($k_0$ the first unsuccessful iteration)
$$\|x_k - x_*\| \;\le\; \sqrt{L_f/\mu}\;\|x_{k_0} - x_*\|.$$
A linear rate for the iterates can be derived from $\tfrac{1}{2}\mu\,\|x - x_*\|^2 \le f(x) - f_*$.

Difficulties in the nonsmooth case
[Figure: the cone of descent directions at the poll center is shaded.]

One possible fix: combine DS with smoothing
Essentially, the WCC cost increases from $\mathcal{O}(\epsilon^{-2})$ to $\mathcal{O}(\epsilon^{-3})$ [R. Garmanjani and LNV, 2013 IMA J. Numer. Anal.].
The number of function evaluations increases from $\mathcal{O}(n^2\epsilon^{-2})$ to $\mathcal{O}(n^{5/2}\epsilon^{-3})$.

Summary of DS global rates
Imposing sufficient decrease to accept new iterates:
- $\mathcal{O}(-\log\epsilon)$, strongly convex: linear global rate for $f$, $\nabla f$, and the absolute error in the iterates.
- $\mathcal{O}(\epsilon^{-1})$, convex: global rate $1/k$ for $f$ and $\nabla f$.
- $\mathcal{O}(\epsilon^{-2})$, non-convex: global rate $1/\sqrt{k}$ for $\nabla f$.
- In terms of function evaluations: $\mathcal{O}(n^2\epsilon^{-1})$, $\mathcal{O}(n^2\epsilon^{-2})$. The factor $n^2$ is proved approximately optimal.
- $\mathcal{O}(\epsilon^{-3})$, non-smooth, non-convex (using smoothing techniques).

Descent condition and successful iterations
Assume the polling directions are normalized.
Lemma: If $\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa$ and
$$\alpha_k \;<\; \frac{2\kappa\,\|\nabla f(x_k)\|}{L_f + 1},$$
then the $k$-th iteration is successful. Here $\mathrm{cm}(D, v)$ is the cosine measure of $D$ given $v$, defined by
$$\mathrm{cm}(D, v) \;=\; \max_{d \in D} \frac{d^\top v}{\|d\|\,\|v\|},$$
and $L_f$ is a Lipschitz constant of $\nabla f$.
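A small helper, assuming NumPy, that evaluates the cosine measure given $v$ and the step-size threshold of the lemma; names and signatures are illustrative only.

```python
import numpy as np

def cosine_measure_given_v(D, v):
    """cm(D, v) = max over d in D of (d^T v) / (||d|| ||v||)."""
    D = np.atleast_2d(np.asarray(D, dtype=float))   # one polling direction per row
    v = np.asarray(v, dtype=float)
    return np.max(D @ v / (np.linalg.norm(D, axis=1) * np.linalg.norm(v)))

def success_step_threshold(kappa, grad_norm, L_f):
    """Step sizes below this value guarantee a successful iteration
    (per the lemma, with normalized directions and rho(alpha) = alpha^2/2)."""
    return 2.0 * kappa * grad_norm / (L_f + 1.0)
```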

Randomly generating positive spanning sets...
[Figure: $n + 1$ random polling directions, in this case not a PSS; $n$ random polling directions, certainly not a PSS.]
But $\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa$ can be satisfied probabilistically...

Numerical illustration
[Figure: relative performance for different sets of polling directions ($n = 40$): $[I\ -I]$, $[Q\ -Q]$, and $2n$, $n+1$, $n/4$, $2$, $1$ random directions, on the problems arglina, arglinb, broydn3d, dqrtic, engval, freuroth, integreq, nondquar, sinquad, vardim. Averages were taken over 10 independent runs.]

Probabilistic descent
From the definition of probabilistic models (Bandeira, LNV, Scheinberg, 2014 SIOPT):
Definition: the sequence $\{D_k\}$ is $(p, \kappa)$-probabilistically descent if, for each $k \ge 0$,
$$\Pr\big( \mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa \;\big|\; D_0, \dots, D_{k-1} \big) \;\ge\; p.$$
Let $Z_k$ be the indicator function of $\{\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa\}$, and
$$p_0 \;=\; \frac{\ln\theta}{\ln(\gamma^{-1}\theta)} \;=\; \frac{1}{2} \quad \text{for } \theta = 1/2,\ \gamma = 2.$$

Global rate: counting descent
For each realization of the DS algorithm, define
- $g_k$: the gradient with minimum norm among $\nabla f(x_0), \dots, \nabla f(x_k)$;
- $k_\epsilon$: the smallest integer $k$ such that $\|\nabla f(x_k)\| \le \epsilon$.
Denote the corresponding random variables by $G_k$ and $K_\epsilon$. Let $z_l$ denote the realization of $Z_l = \mathbb{1}\{\mathrm{cm}(D_l, -\nabla f(x_l)) \ge \kappa\}$ ($l \ge 0$).
Intuition: if $\|g_k\|$ is big, then $\sum_{l=0}^{k-1} z_l$ is probably small.

In fact, one can prove
$$\sum_{l=0}^{k-1} z_l \;\le\; \mathcal{O}\!\left(\frac{1}{\|g_k\|^2}\right) + p_0\, k.$$
It then results that
$$\{\|G_k\| > \epsilon\} \;\subseteq\; \left\{ \sum_{l=0}^{k-1} Z_l \;\le\; \underbrace{\left[\mathcal{O}\!\left(\frac{1}{k\epsilon^2}\right) + p_0\right]}_{\lambda} k \right\}.$$
Hence
$$\Pr(\|G_k\| \le \epsilon) \;=\; 1 - \Pr(\|G_k\| > \epsilon) \;\ge\; 1 - \Pr\!\left( \sum_{l=0}^{k-1} Z_l \le \lambda k \right),$$
and the last probability is bounded by applying Chernoff-type bounds and submartingale theory.

Global rate and WCC bound
Theorem (Gratton, Royer, LNV, and Zhang, 2015 SIOPT): Suppose that $\{D_k\}$ is $(p, \kappa)$-probabilistically descent with $p > p_0$. Then
$$\Pr\!\left( \|G_k\| \le \mathcal{O}\!\left(\frac{1}{\kappa\sqrt{k}}\right) \right) \;\ge\; 1 - \exp[-\mathcal{O}(k)].$$
This is an $\mathcal{O}(1/\sqrt{k})$ sublinear rate with overwhelmingly high probability.
Since $\Pr(K_\epsilon \le k) = \Pr(\|G_k\| \le \epsilon)$, we also get:
Theorem (Gratton, Royer, LNV, and Zhang, 2015 SIOPT): Suppose that $\{D_k\}$ is $(p, \kappa)$-probabilistically descent with $p > p_0$. Then
$$\Pr\!\left( K_\epsilon \le \mathcal{O}\!\left(\frac{\epsilon^{-2}}{\kappa^2}\right) \right) \;\ge\; 1 - \exp\!\left[-\mathcal{O}(\epsilon^{-2})\right].$$
This is an $\mathcal{O}(\epsilon^{-2})$ bound on the number of iterations, with overwhelmingly high probability.

Two uniform directions are enough, one is not
[Figure: a gradient $g$ and random directions $d_1$, $d_2$ on the unit circle.]
For $d_1$ uniformly distributed on the unit sphere $\mathbb{S}^1$: for all $\kappa \in (0,1)$,
$$\Pr\big( \mathrm{cm}(\{d_1\}, g) = d_1^\top g \ge \kappa \big) \;<\; 1/2.$$
For $d_1, d_2$ uniformly distributed on $\mathbb{S}^1$: there exists $\kappa \in (0,1)$ such that
$$\Pr\big( \mathrm{cm}(\{d_1, d_2\}, g) \ge \kappa \big) \;>\; 1/2.$$

Worst case complexity: dependence on the dimension
Then (for polling directions drawn uniformly on the unit sphere), when $r = |D_k| > 1$, $\{D_k\}$ is $p$-probabilistically $(1/\sqrt{n})$-descent for some $p > p_0 = 1/2$ independent of $n$.
Plugging $\kappa = 1/\sqrt{n}$ into the WCC bound, one obtains, in terms of the number of function evaluations,
$$\Pr\big( K^f_\epsilon \le \mathcal{O}(n\,\epsilon^{-2})\, r \big) \;\ge\; 1 - \exp\!\left[-\mathcal{O}(\epsilon^{-2})\right].$$
The WCC bound is $\mathcal{O}(r\,n\,\epsilon^{-2})$, better than $\mathcal{O}(n^2\epsilon^{-2})$ when $r \ll n$.
The theory admits $r = 2$, leading to $\mathcal{O}(n\,\epsilon^{-2})$!

A more detailed look at the numerical experiments
[Figure: relative performance for different sets of polling directions ($n = 40$): $[I\ -I]$, $[Q\ -Q]$, 2 random directions ($\gamma = 2$), and 4 random directions ($\gamma = 1.1$), on the problems arglina, arglinb, broydn3d, dqrtic, engval, freuroth, integreq, nondquar, sinquad, vardim. Now $\gamma = 1$ for $[I\ -I]$ and $[Q\ -Q]$.]

A more detailed look at the numerical experiments
[Figure: the same comparison for $n = 100$; again $\gamma = 1$ for $[I\ -I]$ and $[Q\ -Q]$.]

Even better than 2 directions
Two opposite random vectors, $D_k = \{d, -d\}$ with $d$ uniformly distributed on the unit sphere of $\mathbb{R}^n$, work even better, and this is an optimal choice:
Given $v \in \mathbb{R}^n$ with $\|v\| = 1$ and $\kappa \in [0,1]$,
$$\Pr\big( \mathrm{cm}(D_k, v) \ge \kappa \big) \;=\; \Pr\big( d^\top v \ge \kappa \big) + \Pr\big( -d^\top v \ge \kappa \big) \;=\; 2\varrho,
\qquad \text{with } \varrho = \Pr\big( \{d^\top v \ge \kappa\} \big).$$
Given any $\gamma, \theta$ satisfying $0 < \theta < 1 < \gamma$, we can pick $\kappa > 0$ sufficiently small so that $2\varrho > p_0 = (\ln\theta)/[\ln(\gamma^{-1}\theta)]$.
In addition, for any $D = \{d_1, d_2\}$ uniformly distributed on the unit sphere,
$$\Pr\big( \mathrm{cm}(D, v) \ge \kappa \big) \;=\; 2\varrho - \Pr\big( \{d_1^\top v \ge \kappa\} \cap \{d_2^\top v \ge \kappa\} \big) \;\le\; 2\varrho.$$
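A quick Monte Carlo check of this, assuming NumPy (function names are illustrative): for $D_k = \{d, -d\}$ the cosine measure given $v$ equals $|d^\top v|$, and for a small enough $\kappa$ the estimated probability of exceeding $\kappa$ lies above $p_0 = 1/2$. Such a polling-set generator can be plugged into the earlier direct-search sketch in place of $D_\oplus$.

```python
import numpy as np

def random_opposite_directions(n, rng):
    """Polling set D_k = {d, -d} with d uniformly distributed on the unit sphere."""
    d = rng.standard_normal(n)
    d /= np.linalg.norm(d)
    return np.vstack([d, -d])

def estimate_descent_probability(n, kappa, num_samples=20_000, seed=0):
    """Monte Carlo estimate of Pr(cm({d, -d}, v) >= kappa) = Pr(|d^T v| >= kappa)
    for a fixed unit vector v (by symmetry the value does not depend on v)."""
    rng = np.random.default_rng(seed)
    v = np.zeros(n); v[0] = 1.0
    hits = 0
    for _ in range(num_samples):
        D = random_opposite_directions(n, rng)
        hits += np.max(D @ v) >= kappa          # cm(D, v) = |d^T v|
    return hits / num_samples

if __name__ == "__main__":
    n = 40
    # For a small kappa (here 0.1/sqrt(n)) the estimate exceeds p_0 = 1/2.
    print(estimate_descent_probability(n, kappa=0.1 / np.sqrt(n)))
```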

So, a new approach and proof technique
A new proof technique for establishing global rates and WCC bounds for randomized algorithms in which:
- the new iterate depends on some object (directions, models);
- the quality of the object is favorable with a certain probability.
The technique is based on:
- counting the number of iterations for which the quality is favorable;
- examining the probabilistic behavior of this number.
What we obtain is a global rate of $\mathcal{O}(1/\sqrt{k})$, with overwhelmingly high probability.

Examples of application
This technique has also been applied to:
- Trust-region methods based on probabilistic models [Gratton, Royer, LNV, and Zhang, 2017 IMAJNA], where models are built iteratively in some random fashion and exhibit good accuracy with sufficiently high probability.
- Problems with linear constraints [Gratton, Royer, LNV, and Zhang, 2017], where random generation can be efficiently done in subspaces identified within approximate tangent cones.

Global Optimization part of the talk
Our goal is now to develop a solver for a global optimization problem of the type
$$\min_{x \in \Omega} f(x),$$
where $\Omega \subset \mathbb{R}^n$ is a box. We want to do this:
- relying on probabilistic direct search, which is an efficient, simple, and rigorous local solver;
- taking advantage of parallel computing;
- having an approach capable of identifying several global or local minimizers of interest.

REMEMBER: Probabilistic DS (polling with 2 directions)
Choose $x_0$ and $\alpha_0$. For $k = 0, 1, 2, \dots$ (until $\alpha_k$ is sufficiently small):
- Search step (optional).
- Poll step: select $D_k = [d\ \ -d]$ RANDOMLY and find $x_k + \alpha_k d_k$ ($d_k \in D_k$) such that $f(x_k + \alpha_k d_k) < f(x_k) - \rho(\alpha_k)$, with $\rho(\alpha) = \alpha^2/2$.
- SUCCESS: move, $x_{k+1} = x_k + \alpha_k d_k$, and increase $\alpha_{k+1} = \gamma\,\alpha_k$ ($\gamma = 2$).
- UNSUCCESS: stay, $x_{k+1} = x_k$, and decrease $\alpha_{k+1} = \theta\,\alpha_k$ ($\theta = 1/2$).

A multistart multisplit DS framework (main idea)
MAIN IDEA:
- Consider a certain number $R$ of DS runs, guided by their poll centers (run in parallel).
- Create $R$ clusters $C_j$, $j = 1, \dots, R$, of points, out of all the points where $f$ has been evaluated.
- Split or merge the DS runs based on how the poll centers fit into the clusters.
IN THIS WAY:
- We parallelize DS runs (where poll steps are typically limited to $2n$ evaluations).
- We allow MULTISTART runs and SPLITTING/MERGING of runs.

A multistart multisplit DS framework (scheme)
MULTISTART: select initial points for $R_0$ DS runs, $x^0_j$, $j = 1, \dots, R_0$ (and initial step sizes $\alpha^0_j$, $j = 1, \dots, R_0$).
At each OUTER iteration $\ell$:
- Create $R_\ell$ clusters $C^\ell_j$, $j = 1, \dots, R_\ell$ (using all the history of evaluated points).
- Decide whether any DS run is split or merged. Let $x^{\ell+1}_j$, $j = 1, \dots, R_{\ell+1}$, be the new poll centers.
INNER ITERATIONS: perform (in parallel) a certain number of iterations of all the $R_{\ell+1}$ DS runs, starting from $x^{\ell+1}_j$, $j = 1, \dots, R_{\ell+1}$.
(A structural sketch of this outer/inner organization is given below.)
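A structural sketch of this scheme (not the authors' code), in Python with NumPy. The splitting/merging decision is left as a hook (`adjust_runs`), which can be instantiated with the clustering or modeling rules described in the next sections; parallelism over the inner iterations is only indicated by a comment, and all names and defaults are illustrative.

```python
import numpy as np

def poll_once(f, x, fx, alpha, rng, gamma=2.0, theta=0.5):
    """One probabilistic DS iteration with D_k = {d, -d} and rho(alpha) = alpha^2/2."""
    d = rng.standard_normal(x.size)
    d /= np.linalg.norm(d)
    for direction in (d, -d):
        trial = x + alpha * direction
        f_trial = f(trial)
        if f_trial < fx - 0.5 * alpha**2:
            return trial, f_trial, gamma * alpha
    return x, fx, theta * alpha

def multistart_multisplit(f, starts, alpha0=1.0, outer_iters=20, inner_iters=10,
                          seed=0, adjust_runs=None):
    """Skeleton of the outer/inner structure: DS runs advance for a few inner
    iterations, then a user-supplied hook may split, merge, or restart runs."""
    rng = np.random.default_rng(seed)
    runs = [{"x": np.array(x0, dtype=float), "f": f(np.array(x0, dtype=float)),
             "alpha": alpha0} for x0 in starts]
    history = []                                     # evaluated points kept for clustering
    for _ in range(outer_iters):
        for run in runs:                             # inner iterations (parallelizable)
            for _ in range(inner_iters):
                run["x"], run["f"], run["alpha"] = poll_once(f, run["x"], run["f"],
                                                             run["alpha"], rng)
                history.append((run["x"].copy(), run["f"]))
        if adjust_runs is not None:                  # clustering + split/merge decisions
            runs = adjust_runs(runs, history)
    return min(runs, key=lambda r: r["f"])
```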

Requirements and approaches
To align with typical global convergence requirements:
- The clustering step must be finite.
- The DS runs must comply with the local rules.
- When merging two DS runs, the new poll center must be the one with the lowest objective value.
We consider two approaches for the OUTER iterations:
- using space clustering;
- using nonconvex models built from previous function values.

The space clustering approach
- Create $R_\ell$ clusters using k-means clustering (using all previously evaluated points).
- If a cluster contains more than two poll centers, remove the DS run corresponding to the poll center with the worst objective value.
- If a cluster contains no poll center, then attempt to start a new DS run from the centroid of the cluster.
POSSIBLE DRAWBACK: the k-means approach does not take the previous objective function values into account (clustering is done in the variables space).
(A small sketch of these decisions follows.)
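A sketch of this clustering hook, assuming scikit-learn and NumPy are available; the function name and defaults are illustrative. To plug it into the earlier skeleton, bind `f` and `alpha0` (e.g., with `functools.partial`) so the signature matches `adjust_runs(runs, history)`.

```python
import numpy as np
from sklearn.cluster import KMeans

def adjust_runs_kmeans(runs, history, f, alpha0=1.0, n_clusters=None, seed=0):
    """Space-clustering rules: cluster all evaluated points; drop the worst run when a
    cluster holds more than two poll centers; start a new run from the centroid of any
    cluster that contains no poll center."""
    points = np.array([p for p, _ in history])      # assumes len(points) >= n_clusters
    R = n_clusters or len(runs)
    km = KMeans(n_clusters=R, n_init=10, random_state=seed).fit(points)
    center_labels = km.predict(np.array([run["x"] for run in runs]))

    keep = []
    for label in range(R):
        members = [run for run, lab in zip(runs, center_labels) if lab == label]
        if members:
            if len(members) > 2:
                members.sort(key=lambda r: r["f"])
                members = members[:-1]               # drop the run with the worst value
            keep.extend(members)
        else:
            centroid = km.cluster_centers_[label]    # cluster with no poll center
            keep.append({"x": centroid, "f": f(centroid), "alpha": alpha0})
    return keep
```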

The space clustering approach (example)
[Figure: k-means clustering on the 2D Ackley problem, showing two clusters, the poll centers, and cluster centroids; two poll centers lie in the same cluster.]

A nonconvex piecewise quadratic model
Modeling can instead identify valleys of convexity. (Nonconvex) piecewise quadratic models were suggested by Mangasarian et al., JOGO, 2006.
Given a sample set $Y = \{y_1, \dots, y_{n_p}\}$, a nonconvex piecewise quadratic underestimate of $f$ (with $n_q$ quadratics) can be obtained by solving
$$\max_{p} \;\sum_{j=1}^{n_p} q(p; y_j) \quad \text{s.t.} \quad q(p; y_j) \le f(y_j), \;\; j = 1, \dots, n_p,$$
with $p$ containing all the quadratics' coefficients $p_i = (a_i, c_i, H_i)$ in
$$q(p; y) \;=\; \min_{1 \le i \le n_q} \left\{ a_i + c_i^\top y + \tfrac{1}{2}\, y^\top H_i\, y \right\} \;=\; \min_{1 \le i \le n_q} q_i(y),$$
and where the $H_i$'s must be assured symmetric and positive definite.

A nonconvex piecewise quadratic model (example)
[Figure: a nonconvex piecewise quadratic underestimate of a sine-type function, built from the quadratics $q_1, \dots, q_5$.]

A nonconvex piecewise quadratic model (drawbacks)
One can introduce auxiliary variables (linear objective):
$$\max_{p,\gamma} \;\sum_{j=1}^{n_p} \gamma_j \quad \text{s.t.} \quad q(p; y_j) \le f(y_j), \;\; j = 1, \dots, n_p, \qquad
\gamma_j \le q_i(y_j), \;\; i = 1, \dots, n_q, \;\; j = 1, \dots, n_p.$$
Still, this optimization problem has several drawbacks:
- It is nonconvex, and it has to be solved to global optimality itself to ensure effective underestimation.
- The number of variables is excessive: $n_p$ (auxiliary variables) $+\ (n+1)(n+2)/2 \cdot n_q$ (quadratic coefficients times number of quadratics).
- The number of constraints is also high when imposing positive definiteness on all quadratics.

A nonconvex piecewise quadratic model (simplified)
But still, one could think of computing just a feasible point for the model.
Or one can think of taking a subset of $\bar n_p \le n_p$ points and considering $n_q = 1$, i.e., fitting only one quadratic $q_i$, resulting in the following LP:
$$\max_{p_i,\gamma} \;\sum_{j=1}^{\bar n_p} \gamma_j \quad \text{s.t.} \quad q_i(y_j) \le f(y_j), \;\; j = 1, \dots, \bar n_p, \qquad
\gamma_j \le q_i(y_j), \;\; j = 1, \dots, \bar n_p.$$
The number of variables reduces to $\bar n_p$ (auxiliary variables) $+\ (n+1)(n+2)/2$ (one quadratic).
(A sketch of this LP is given below.)
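A sketch of setting up this single-quadratic LP with SciPy's `linprog`, assuming NumPy/SciPy; names are illustrative, and positive definiteness of $H$ is not enforced here (only the underestimation constraints are modeled).

```python
import numpy as np
from scipy.optimize import linprog

def quad_features(y):
    """Feature vector phi(y) such that q(y) = p^T phi(y), with p = (a, c, upper
    triangle of symmetric H); the 0.5 factor of the quadratic term is folded in."""
    n = len(y)
    outer = np.outer(y, y)
    iu = np.triu_indices(n)
    quad = outer[iu] * np.where(iu[0] == iu[1], 0.5, 1.0)
    return np.concatenate(([1.0], y, quad))

def fit_underestimating_quadratic(Y, f_vals):
    """Fit one quadratic with q(y_j) <= f(y_j) on all samples, maximizing sum_j gamma_j
    with gamma_j <= q(y_j) (so gamma_j = q(y_j) at the optimum)."""
    Y = np.asarray(Y, dtype=float)
    f_vals = np.asarray(f_vals, dtype=float)
    n_p = Y.shape[0]
    Phi = np.array([quad_features(y) for y in Y])                 # n_p x n_coef
    n_coef = Phi.shape[1]
    # Decision vector: [p (n_coef coefficients), gamma (n_p auxiliary variables)].
    c_obj = np.concatenate([np.zeros(n_coef), -np.ones(n_p)])     # maximize sum of gamma
    A1 = np.hstack([Phi, np.zeros((n_p, n_p))])                   # Phi p <= f (underestimation)
    A2 = np.hstack([-Phi, np.eye(n_p)])                           # gamma_j - Phi_j p <= 0
    res = linprog(c_obj, A_ub=np.vstack([A1, A2]),
                  b_ub=np.concatenate([f_vals, np.zeros(n_p)]),
                  bounds=[(None, None)] * (n_coef + n_p), method="highs")
    return res.x[:n_coef] if res.success else None
```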

The nonconvex modeling approach
- Consider again $R$ runs of DS and all previously evaluated points.
- Form $R$ clusters of points around the $R$ poll centers (using distances to them).
- Compute the quadratic underestimating models for each cluster of points.
- Assess how well the quadratic models fit the whole data:
$$\theta_{i,j} \;=\; \sum_{l=1}^{(n_p)_j} \left( \frac{f(y^j_l) - q_i(y^j_l)}{f(y^j_l) + 1} \right)^{\!2}, \qquad i, j = 1, \dots, R.$$
- If $\theta_{i,i}$ is large, then split the DS run $i$ (the quadratic $i$ does not fit cluster $i$ well).
- In addition, if $\theta_{i,j}$, $i \neq j$, is small, then merge DS runs $i$ and $j$, using the poll center with the lowest objective value.
(A sketch of the $\theta$ computation and the split/merge decisions follows.)
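A sketch of the fit measure and the resulting decisions, assuming NumPy. The split/merge thresholds are illustrative placeholders, not values from the talk, and the actual splitting and merging of runs (restarting, picking the better poll center) is left to the surrounding framework.

```python
import numpy as np

def fit_measure(clusters, quadratics):
    """theta[i, j] = sum over points y in cluster j of ((f(y) - q_i(y)) / (f(y) + 1))^2.

    `clusters` is a list of (Y_j, f_j) pairs (points and their f values) and
    `quadratics` is a list of callables q_i(y)."""
    R = len(clusters)
    theta = np.zeros((R, R))
    for i, q in enumerate(quadratics):
        for j, (Yj, fj) in enumerate(clusters):
            resid = (np.asarray(fj) - np.array([q(y) for y in Yj])) / (np.asarray(fj) + 1.0)
            theta[i, j] = np.sum(resid ** 2)
    return theta

def split_merge_decisions(theta, split_tol=1.0, merge_tol=1e-2):
    """Split run i when theta[i, i] is large; merge runs i and j when theta[i, j]
    (i != j) is small.  Thresholds here are assumptions for illustration only."""
    R = theta.shape[0]
    to_split = [i for i in range(R) if theta[i, i] > split_tol]
    to_merge = [(i, j) for i in range(R) for j in range(R)
                if i != j and theta[i, j] < merge_tol]
    return to_split, to_merge
```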

The nonconvex modeling approach (example)
[Figure: the nonconvex piecewise quadratic underestimate of the sine-type example, with quadratics $q_1, \dots, q_5$, together with a table of the $\theta_{i,j}$ values over the clusters/runs.]

Numerical results
We provide some numerical results for a set of 102 global optimization problems. Our goal here is:
- to understand whether our approach is sound, and to see if it is capable of identifying global minima;
- to show that it performs efficiently on a wide and well representative set of problems;
- to develop a solver/code that is reasonably tested and tuned.
The results are presented using performance profiles. The solver starts with 10 poll centers in serial, and a maximum of 10 simultaneous runs is imposed. A comparison is made with PSwarm (a direct search solver that uses a particle swarm heuristic as a search step).

Performance profiles (number of function evaluations)
[Figure: performance profiles on the number of function evaluations for MMDS (2n dirs, kmeans), MMDS (2n dirs, nonconvex), MMDS (2 opposite, kmeans), MMDS (2 opposite, nonconvex), and PSwarm.]

Performance profiles (quality of the final f)
[Figure: performance profiles on the final objective function values for MMDS (2n dirs, kmeans), MMDS (2n dirs, nonconvex), MMDS (2 opposite, kmeans), MMDS (2 opposite, nonconvex), and PSwarm.]

Numerical results (application problem)
Is this approach capable of computing different global minima for an application problem?
Consider optimizing the orientation of a part to be built by additive manufacturing (3D printing), where the staircase effect is to be minimized:
$$\min_{(\theta_x,\theta_y)} \;\sum_j
\begin{cases}
h^2\, A_j\, (d \cdot n_j)^2, & \text{if } |d \cdot n_j| \neq 1,\\[2pt]
0, & \text{otherwise,}
\end{cases}$$
where $h$ is the slicing thickness, $A_j$ is the area of the $j$-th triangle, $d$ is the slicing direction, and $n_j$ is the $j$-th triangle's normal.
This is a 2D problem where $(\theta_x, \theta_y) \in [0, 180]^2$ are the rotation angles (in degrees) about the $x$-axis and $y$-axis, respectively, in three-dimensional space.
(A sketch of this objective is given below.)
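A hedged sketch of this objective, assuming NumPy, triangle areas and unit normals given as arrays, and the convention that the part is rotated by $\theta_x$ about $x$ and then $\theta_y$ about $y$ while slicing stays along $z$; the rotation convention, the tolerance, and all names are assumptions for illustration, not taken from the talk.

```python
import numpy as np

def rotation(theta_x_deg, theta_y_deg):
    """Rotation about the x-axis followed by rotation about the y-axis (degrees)."""
    tx, ty = np.radians([theta_x_deg, theta_y_deg])
    Rx = np.array([[1, 0, 0], [0, np.cos(tx), -np.sin(tx)], [0, np.sin(tx), np.cos(tx)]])
    Ry = np.array([[np.cos(ty), 0, np.sin(ty)], [0, 1, 0], [-np.sin(ty), 0, np.cos(ty)]])
    return Ry @ Rx

def staircase_objective(theta_x, theta_y, areas, normals, h=5.0, tol=1e-12):
    """Staircase measure for a given orientation, following the (reconstructed) formula
    on the slide: triangles whose rotated normal is (anti)parallel to the slicing
    direction contribute nothing; the others contribute h^2 * A_j * (d . n_j)^2."""
    d = np.array([0.0, 0.0, 1.0])                     # slicing direction (z-axis)
    rotated = normals @ rotation(theta_x, theta_y).T  # rotate each triangle normal
    dots = rotated @ d
    mask = np.abs(np.abs(dots) - 1.0) > tol           # exclude faces with |d . n_j| = 1
    return np.sum(h**2 * np.asarray(areas)[mask] * dots[mask]**2)
```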

A used part
We consider a complex aerospace part for numerical testing; the part is defined by a triangulated surface mesh.

The part/object to be considered
Slices 5 mm thick are taken along the $z$-axis, resulting in 64 slices.
[Figure: the part before optimization.]

Objective function landscape
[Figure: the objective function landscape over $(\theta_x, \theta_y) \in [0, 180]^2$.]

Numerical results (space clustering)
[Figures: successive outer iterations of the space clustering approach over the objective landscape.]

The solver stops at $(\theta_x, \theta_y) = (90, 180)$, with an objective value on the order of $10^5$, taking 124 function evaluations.
The space clustering approach determined only one of the global minimizers!

Numerical results (nonconvex modeling)
[Figures: successive outer iterations of the nonconvex modeling approach over the objective landscape, showing the quadratic models $q_i$ and the minimizers of the $q$'s.]

The solver now stops at $(\theta_x, \theta_y) = (90, 180)$ and $(\theta_x, \theta_y) = (90, 0)$, both with an objective value on the order of $10^5$, taking 2152 function evaluations.
The nonconvex modeling approach determined both global minimizers of the problem!

The result of optimization
Slices 5 mm thick are taken along the $z$-axis.
[Figure: the part after optimization.]


Empirical Risk Minimization Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space

More information

CSC 576: Gradient Descent Algorithms

CSC 576: Gradient Descent Algorithms CSC 576: Gradient Descent Algorithms Ji Liu Department of Computer Sciences, University of Rochester December 22, 205 Introduction The gradient descent algorithm is one of the most popular optimization

More information

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

EE364 Review Session 4

EE364 Review Session 4 EE364 Review Session 4 EE364 Review Outline: convex optimization examples solving quasiconvex problems by bisection exercise 4.47 1 Convex optimization problems we have seen linear programming (LP) quadratic

More information

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Mengdi Wang Ethan X. Fang Han Liu Abstract Classical stochastic gradient methods are well suited

More information

Global convergence of trust-region algorithms for constrained minimization without derivatives

Global convergence of trust-region algorithms for constrained minimization without derivatives Global convergence of trust-region algorithms for constrained minimization without derivatives P.D. Conejo E.W. Karas A.A. Ribeiro L.G. Pedroso M. Sachine September 27, 2012 Abstract In this work we propose

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

6-3 Solving Systems by Elimination

6-3 Solving Systems by Elimination Another method for solving systems of equations is elimination. Like substitution, the goal of elimination is to get one equation that has only one variable. To do this by elimination, you add the two

More information

1 Randomized Computation

1 Randomized Computation CS 6743 Lecture 17 1 Fall 2007 1 Randomized Computation Why is randomness useful? Imagine you have a stack of bank notes, with very few counterfeit ones. You want to choose a genuine bank note to pay at

More information

A user s guide to Lojasiewicz/KL inequalities

A user s guide to Lojasiewicz/KL inequalities Other A user s guide to Lojasiewicz/KL inequalities Toulouse School of Economics, Université Toulouse I SLRA, Grenoble, 2015 Motivations behind KL f : R n R smooth ẋ(t) = f (x(t)) or x k+1 = x k λ k f

More information

Descent methods. min x. f(x)

Descent methods. min x. f(x) Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

More information

CS 372: Computational Geometry Lecture 14 Geometric Approximation Algorithms

CS 372: Computational Geometry Lecture 14 Geometric Approximation Algorithms CS 372: Computational Geometry Lecture 14 Geometric Approximation Algorithms Antoine Vigneron King Abdullah University of Science and Technology December 5, 2012 Antoine Vigneron (KAUST) CS 372 Lecture

More information

Approximation algorithms for nonnegative polynomial optimization problems over unit spheres

Approximation algorithms for nonnegative polynomial optimization problems over unit spheres Front. Math. China 2017, 12(6): 1409 1426 https://doi.org/10.1007/s11464-017-0644-1 Approximation algorithms for nonnegative polynomial optimization problems over unit spheres Xinzhen ZHANG 1, Guanglu

More information

Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve

More information

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

More information

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex

More information

Lecture 5: September 15

Lecture 5: September 15 10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS

More information

SOME PROPERTIES OF INTEGRAL APOLLONIAN PACKINGS

SOME PROPERTIES OF INTEGRAL APOLLONIAN PACKINGS SOME PROPERTIES OF INTEGRAL APOLLONIAN PACKINGS HENRY LI Abstract. Within the study of fractals, some very interesting number theoretical properties can arise from unexpected places. The integral Apollonian

More information

A New Trust Region Algorithm Using Radial Basis Function Models

A New Trust Region Algorithm Using Radial Basis Function Models A New Trust Region Algorithm Using Radial Basis Function Models Seppo Pulkkinen University of Turku Department of Mathematics July 14, 2010 Outline 1 Introduction 2 Background Taylor series approximations

More information

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization Noname manuscript No. (will be inserted by the editor) Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization Saeed Ghadimi Guanghui Lan Hongchao Zhang the date of

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Development of an algorithm for solving mixed integer and nonconvex problems arising in electrical supply networks

Development of an algorithm for solving mixed integer and nonconvex problems arising in electrical supply networks Development of an algorithm for solving mixed integer and nonconvex problems arising in electrical supply networks E. Wanufelle 1 S. Leyffer 2 A. Sartenaer 1 Ph. Toint 1 1 FUNDP, University of Namur 2

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns

More information

New class of limited-memory variationally-derived variable metric methods 1

New class of limited-memory variationally-derived variable metric methods 1 New class of limited-memory variationally-derived variable metric methods 1 Jan Vlček, Ladislav Lukšan Institute of Computer Science, Academy of Sciences of the Czech Republic, L. Lukšan is also from Technical

More information

Copositive Programming and Combinatorial Optimization

Copositive Programming and Combinatorial Optimization Copositive Programming and Combinatorial Optimization Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria joint work with I.M. Bomze (Wien) and F. Jarre (Düsseldorf) IMA

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

Bubble Clustering: Set Covering via Union of Ellipses

Bubble Clustering: Set Covering via Union of Ellipses Bubble Clustering: Set Covering via Union of Ellipses Matt Kraning, Arezou Keshavarz, Lei Zhao CS229, Stanford University Email: {mkraning,arezou,leiz}@stanford.edu Abstract We develop an algorithm called

More information

Lecture Examples of problems which have randomized algorithms

Lecture Examples of problems which have randomized algorithms 6.841 Advanced Complexity Theory March 9, 2009 Lecture 10 Lecturer: Madhu Sudan Scribe: Asilata Bapat Meeting to talk about final projects on Wednesday, 11 March 2009, from 5pm to 7pm. Location: TBA. Includes

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

More information

LAGRANGE MULTIPLIERS

LAGRANGE MULTIPLIERS LAGRANGE MULTIPLIERS MATH 195, SECTION 59 (VIPUL NAIK) Corresponding material in the book: Section 14.8 What students should definitely get: The Lagrange multiplier condition (one constraint, two constraints

More information

A glimpse into convex geometry. A glimpse into convex geometry

A glimpse into convex geometry. A glimpse into convex geometry A glimpse into convex geometry 5 \ þ ÏŒÆ Two basis reference: 1. Keith Ball, An elementary introduction to modern convex geometry 2. Chuanming Zong, What is known about unit cubes Convex geometry lies

More information

The simplex algorithm

The simplex algorithm The simplex algorithm The simplex algorithm is the classical method for solving linear programs. Its running time is not polynomial in the worst case. It does yield insight into linear programs, however,

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Convex Optimization in Classification Problems

Convex Optimization in Classification Problems New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization

Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization Instructor: Farid Alizadeh Author: Ai Kagawa 12/12/2012

More information

Towards stability and optimality in stochastic gradient descent

Towards stability and optimality in stochastic gradient descent Towards stability and optimality in stochastic gradient descent Panos Toulis, Dustin Tran and Edoardo M. Airoldi August 26, 2016 Discussion by Ikenna Odinaka Duke University Outline Introduction 1 Introduction

More information

STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION

STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION TAMARA G. KOLDA, ROBERT MICHAEL LEWIS, AND VIRGINIA TORCZON Abstract. We present a new generating set search (GSS) approach

More information

Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control

Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control 19/4/2012 Lecture content Problem formulation and sample examples (ch 13.1) Theoretical background Graphical

More information

Lecture: Local Spectral Methods (3 of 4) 20 An optimization perspective on local spectral methods

Lecture: Local Spectral Methods (3 of 4) 20 An optimization perspective on local spectral methods Stat260/CS294: Spectral Graph Methods Lecture 20-04/07/205 Lecture: Local Spectral Methods (3 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

Introduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems

Introduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3 1 Department of Mathematics, K.N. Toosi University of Technology,

More information

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Zhaosong Lu Lin Xiao June 25, 2013 Abstract In this paper we propose a randomized block coordinate non-monotone

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

Overparametrization for Landscape Design in Non-convex Optimization

Overparametrization for Landscape Design in Non-convex Optimization Overparametrization for Landscape Design in Non-convex Optimization Jason D. Lee University of Southern California September 19, 2018 The State of Non-Convex Optimization Practical observation: Empirically,

More information