Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization


1 Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization. Frank E. Curtis, Lehigh University. Beyond Convexity Workshop, Oaxaca, Mexico, 26 October 2017.

2-3 Outline: Motivation; Trust Region vs. Cubic Regularization; Complexity Bounds; Summary/Questions.

4-6 Unconstrained nonconvex optimization. Given $f : \mathbb{R}^n \to \mathbb{R}$, consider the unconstrained optimization problem $\min_{x \in \mathbb{R}^n} f(x)$. What type of method to use? First-order or second-order? One motivation for second-order: improved complexity bounds. Main message of this talk: for nonconvex optimization, complexity bounds should be taken with a grain of salt (for now). Parting words: there are other, better motivations for second-order methods.

7-9 Methods of interest.
Trust region methods: decades of algorithmic development; Levenberg (1944), Marquardt (1963), Powell (1970), many more! Global convergence, local quadratic rate when $\nabla^2 f(x_*) \succ 0$; $O(\epsilon^{-2})$ complexity to first-order $\epsilon$-criticality, $O(\epsilon^{-3})$ to second-order.
Cubic regularization methods: relatively recent algorithmic development, fewer variants; Griewank (1981), Nesterov & Polyak (2006), Cartis, Gould, & Toint (2011). Global convergence, local quadratic rate when $\nabla^2 f(x_*) \succ 0$; $O(\epsilon^{-3/2})$ complexity to first-order $\epsilon$-criticality, $O(\epsilon^{-3})$ to second-order.
Theoretical guarantees used to assess a nonconvex optimization algorithm:
- Global convergence, i.e., $\nabla f(x_k) \to 0$ and maybe $\min(\mathrm{eig}(\nabla^2 f(x_k))) \to 0$
- Local convergence rate, i.e., $\|\nabla f(x_{k+1})\|_2 / \|\nabla f(x_k)\|_2 \to 0$ (or more)
- Worst-case complexity, i.e., an upper bound on the number of iterations (or function evaluations, subproblem solves, etc.) to achieve $\|\nabla f(x_k)\|_2 \le \epsilon$ and perhaps $\min(\mathrm{eig}(\nabla^2 f(x_k))) \ge -\epsilon$ for some $\epsilon > 0$

10 Theory vs. practice. Researchers have been gravitating to adopt and build on cubic regularization: Agarwal, Allen-Zhu, Bullins, Hazan, Ma (2017); Carmon, Duchi (2017); Kohler, Lucchi (2017); Peng, Roosta-Khorasani, Mahoney (2017). However, there remains a large gap between theory and practice! There is little evidence that cubic regularization methods offer improved performance: trust region (TR) methods remain the state of the art, and TR-like methods can achieve the same complexity guarantees.

11 Trust region methods with optimal complexity.

12 Closer look. Let's understand these complexity bounds, then look closely at them. I'll refer to three algorithms: ttr, the Traditional Trust Region algorithm; arc, the Adaptive Regularisation algorithm using Cubics, Cartis, Gould, & Toint (2011); and trace, the Trust Region Algorithm with Contractions and Expansions, Curtis, Robinson, & Samadi (2017).

13 Outline: Motivation; Trust Region vs. Cubic Regularization; Complexity Bounds; Summary/Questions.

14-15 Algorithm basics: subproblem solution correspondence.
ttr. 1: Solve to compute $s_k$: $\min_{s \in \mathbb{R}^n} q_k(s) := f_k + g_k^T s + \tfrac{1}{2} s^T H_k s$ s.t. $\|s\|_2 \le \delta_k$ (dual: $\lambda_k$). 2: Compute ratio: $\rho_k^q \leftarrow \frac{f_k - f(x_k + s_k)}{f_k - q_k(s_k)}$. 3: Update radius: if $\rho_k^q \ge \eta$, accept and increase $\delta_k$; if $\rho_k^q < \eta$, reject and decrease $\delta_k$.
arc. 1: Solve to compute $s_k$: $\min_{s \in \mathbb{R}^n} c_k(s) := f_k + g_k^T s + \tfrac{1}{2} s^T H_k s + \tfrac{1}{3} \sigma_k \|s\|_2^3$. 2: Compute ratio: $\rho_k^c \leftarrow \frac{f_k - f(x_k + s_k)}{f_k - c_k(s_k)}$. 3: Update regularization: if $\rho_k^c \ge \eta$, accept and decrease $\sigma_k$; if $\rho_k^c < \eta$, reject and increase $\sigma_k$.
Correspondence between subproblem solutions: $\sigma_k = \lambda_k / \delta_k$ with $\delta_k = \|s_k\|_2$.
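
To make the subproblem-solution correspondence concrete, here is a minimal numerical sketch (an illustration, not code from the talk; it treats only the "easy case" of the TR subproblem and solves both subproblems by bisection on the multiplier): it solves the TR subproblem for a radius $\delta_k$, reads off the dual $\lambda_k$, and checks that the cubic-regularized subproblem with $\sigma_k = \lambda_k / \delta_k$ returns the same step. The helper names (s_of_lam, tr_step, arc_step) are invented for this sketch.

```python
import numpy as np

def s_of_lam(g, H, lam):
    """Regularized step s(lam) = -(H + lam*I)^{-1} g (assumes H + lam*I positive definite)."""
    return np.linalg.solve(H + lam * np.eye(len(g)), -g)

def bisect_root(phi, lo, hi, tol=1e-12):
    """Bisection for a root of an increasing function phi with phi(lo) <= 0 <= phi(hi)."""
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if phi(mid) > 0.0 else (mid, hi)
    return hi

def tr_step(g, H, delta):
    """'Easy case' TR subproblem: return (s_k, lambda_k) with ||s_k||_2 = delta,
    or the unregularized step if it already lies inside the region."""
    lam0 = max(0.0, -np.linalg.eigvalsh(H)[0]) + 1e-10
    if lam0 <= 1e-8 and np.linalg.norm(s_of_lam(g, H, 0.0)) <= delta:
        return s_of_lam(g, H, 0.0), 0.0
    hi = lam0 + 1.0
    while np.linalg.norm(s_of_lam(g, H, hi)) > delta:   # bracket the multiplier
        hi *= 2.0
    lam = bisect_root(lambda l: delta - np.linalg.norm(s_of_lam(g, H, l)), lam0, hi)
    return s_of_lam(g, H, lam), lam

def arc_step(g, H, sigma):
    """Cubic-regularized subproblem via its optimality condition lambda = sigma * ||s(lambda)||_2."""
    lam0 = max(0.0, -np.linalg.eigvalsh(H)[0]) + 1e-10
    hi = lam0 + 1.0
    while hi - sigma * np.linalg.norm(s_of_lam(g, H, hi)) < 0.0:   # bracket the root
        hi *= 2.0
    lam = bisect_root(lambda l: l - sigma * np.linalg.norm(s_of_lam(g, H, l)), lam0, hi)
    return s_of_lam(g, H, lam), lam

# Check the correspondence sigma_k = lambda_k / delta_k on a random indefinite model:
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = 0.5 * (A + A.T)                           # symmetric, typically indefinite
g = rng.standard_normal(4)
delta = 0.5
s_tr, lam = tr_step(g, H, delta)
s_arc, _ = arc_step(g, H, sigma=lam / delta)  # sigma_k = lambda_k / delta_k
print(np.linalg.norm(s_tr - s_arc))           # ~0: both subproblems give the same step
```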

16-17 Discussion. What are the similarities? The algorithmic frameworks are almost identical, and there is a one-to-one correspondence (except when $\lambda_k = 0$) between subproblem solutions. What are the key differences? The step acceptance criteria, and the trust region vs. regularization coefficient updates. Recall that a solution $s_k$ of the TR subproblem is also a solution of $\min_{s \in \mathbb{R}^n} f_k + g_k^T s + \tfrac{1}{2} s^T (H_k + \lambda_k I) s$, so the dual variable $\lambda_k$ can be viewed as a quadratic regularization coefficient.

18 Regularization/stepsize trade-off: ttr. [Figure: $\|s_k(\delta)\|_2$ ($= \delta$) plotted against $\lambda_k(\delta)$, with ticks at $0$ and $\max\{0, -\min(\mathrm{eig}(H_k))\}$ on the horizontal axis; the point $(\lambda_k(\delta_k), \|s_k(\delta_k)\|_2)$ is marked.] At a given iterate $x_k$, the curve illustrates the dual variable (i.e., quadratic regularization) and the norm of the corresponding step as a function of the TR radius.

19 Regularization/stepsize trade-off: ttr. [Figure: same curve, now also marking $(\lambda_{k+1}(\delta_{k+1}), \|s_{k+1}(\delta_{k+1})\|_2)$.] After a rejected step (i.e., with $x_{k+1} = x_k$) we set $\delta_{k+1} \leftarrow \gamma \delta_k$ (linear rate of decrease) while $\lambda_{k+1} > \lambda_k$.

20 Regularization/stepsize trade-off: ttr. [Figure: same curve and marked points.] In fact, the increase in the dual can be quite severe in some cases! (We have no direct control over this.)
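
To see how severe the dual increase can be, consider a one-dimensional illustration (not one from the slides): for the model $q(s) = g s + \tfrac{1}{2} H s^2$ with the trust-region constraint $|s| \le \delta$ active (which requires $\delta \le |g|/H$ when $H > 0$), the solution and its multiplier are

```latex
\[
s(\delta) = -\operatorname{sign}(g)\,\delta,
\qquad
\lambda(\delta) = \frac{|g|}{\delta} - H ,
\]
```

so the linear decrease $\delta_{k+1} = \gamma \delta_k$ after rejected steps drives $\lambda$ up like $1/\delta$.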

21 Intuition, please! Intuitively, what is so important about $\lambda_k / \|s_k\|_2 = \lambda_k / \delta_k$?
- Large $\delta_k$ implies $s_k$ may not yield objective decrease.
- Small $\delta_k$ prohibits long steps.
- Small $\lambda_k$ suggests the TR is not restricting us too much.
- Large $\lambda_k$ suggests more objective decrease is possible.
So what is so bad (for complexity's sake) with the following?
$\lambda_k / \delta_k \approx 0$ and $\lambda_{k+1} / \delta_{k+1} \gg 0$.
It's that we may go from a large, but unproductive, step to a productive, but (too) short, step!

22-23 arc magic. So what's the magic of arc? It's not the types of steps you compute (since the TR subproblem gives the same). It's that a simple update for $\sigma_k$ gives a good regularization/stepsize balance. In arc, restricting $\sigma_k \ge \sigma_{\min}$ for all $k$ and proving that $\sigma_k \le \sigma_{\max}$ for all $k$ ensures that all accepted steps satisfy
$f_k - f_{k+1} \ge c_1 \sigma_{\min} \|s_k\|_2^3$ and $\|s_k\|_2 \ge \left(\frac{c_2}{\sigma_{\max} + c_3}\right)^{1/2} \|g_{k+1}\|_2^{1/2}$.
One can also show that, at any point, the number of rejected steps that can occur consecutively is bounded above by a constant (independent of $k$ and $\epsilon$). It is important to note that arc always has the regularization on.

24 Regularization/stepsize trade-off: arc. [Figure: $\|s_k(\sigma)\|_2$ plotted against $\lambda_k(\sigma)$, with the horizontal axis starting at $\max\{0, -\min(\mathrm{eig}(H_k))\}$, and a dashed line of slope $1/\sigma_k$.] All points on the dashed line yield the same ratio $\sigma = \lambda / \|s\|_2$; so, given $\sigma_k$, the properties of $s_k$ are determined by the intersection of the dashed line and the curve.

25 Regularization/stepsize trade-off: arc. [Figure: same curve with dashed lines of slopes $1/\sigma_{k+1}, 1/\sigma_{k+2}, \ldots, 1/\sigma_{k+7}$.] A sequence of rejected steps follows the curve much differently than in ttr.

26 Regularization/stepsize trade-off: arc. [Figure: same curve.] A sequence of rejected steps follows the curve much differently than in ttr; in particular, for sufficiently large $\sigma$, the rate of decrease in $\|s\|$ is sublinear.
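
In the same one-dimensional illustration as before (again, an example added here, not one from the slides): for the cubic model $c(s) = g s + \tfrac{1}{2} H s^2 + \tfrac{1}{3} \sigma |s|^3$, the stationarity condition $(H + \sigma |s|) s = -g$ gives

```latex
\[
|s(\sigma)| = \frac{-H + \sqrt{H^2 + 4 \sigma |g|}}{2 \sigma}
\;\approx\; \sqrt{\frac{|g|}{\sigma}} \quad \text{for large } \sigma ,
\]
```

so even a geometric increase $\sigma_{k+1} = \bar{\gamma} \sigma_k$ shrinks the steplength only by roughly the factor $\bar{\gamma}^{-1/2}$; the step cannot collapse as quickly as under ttr's direct radius contraction.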

27 From ttr to trace. trace involves three key modifications of ttr. 1: A different step acceptance ratio. 2: A new expansion step: a step may be rejected while the TR radius is increased. 3: A new contraction procedure: explicit or implicit (through an update of $\lambda$). With these, we obtain the same complexity properties as arc. We also recently proposed and analyzed a framework that generalizes both... and allows for inexact subproblem solutions (maintaining complexity):
$\min_{(s,\lambda) \in \mathbb{R}^n \times \mathbb{R}} f_k + g_k^T s + \tfrac{1}{2} s^T (H_k + \lambda I) s$ s.t. $\sigma_k^L \|s\|_2 \le \lambda \le \sigma_k^U \|s\|_2$.

28 Outline: Motivation; Trust Region vs. Cubic Regularization; Complexity Bounds; Summary/Questions.

29-30 Complexity for arc and trace. Suppose that $f$ is twice continuously differentiable, bounded below by $f_{\min}$, with $g$ and $H$ both Lipschitz continuous (constants $L_g$ and $L_H$). Combine the inequalities
$f_k - f_{k+1} \ge \eta \|s_k\|_2^3$ and $\|s_k\|_2 \ge (L_H + \sigma_{\max})^{-1/2} \|g_{k+1}\|_2^{1/2}$.
Then the cardinality of the set $\{k \in \mathbb{N} : \|g_k\|_2 > \epsilon\}$ is bounded above by
$\left(\frac{f_0 - f_{\min}}{\eta}\right) (L_H + \sigma_{\max})^{3/2} \epsilon^{-3/2}$.
But these bounds are very pessimistic...
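
For the record, the counting argument that combines these two inequalities is short; here is a sketch (restricted to accepted steps, with consecutive rejections handled by the uniform bound mentioned earlier):

```latex
\[
f_k - f_{k+1}
\;\ge\; \eta \|s_k\|_2^3
\;\ge\; \eta (L_H + \sigma_{\max})^{-3/2} \|g_{k+1}\|_2^{3/2}
\;>\;   \eta (L_H + \sigma_{\max})^{-3/2} \epsilon^{3/2}
\qquad \text{whenever } \|g_{k+1}\|_2 > \epsilon ,
\]
```

and summing these decreases against the total budget $f_0 - f_{\min}$ yields the stated $\left(\tfrac{f_0 - f_{\min}}{\eta}\right) (L_H + \sigma_{\max})^{3/2} \epsilon^{-3/2}$ count.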

31 Simpler setting. Consider the iteration $x_{k+1} \leftarrow x_k - \frac{1}{L_g} g_k$. This type of complexity analysis considers the set $G(\epsilon_g) := \{x \in \mathbb{R}^n : \|g(x)\|_2 > \epsilon_g\}$ and aims to find an upper bound on the cardinality of $K_g(\epsilon_g) := \{k \in \mathbb{N} : x_k \in G(\epsilon_g)\}$.

32 Upper bound on $K_g(\epsilon_g)$. Using $s_k = -\frac{1}{L_g} g_k$ and the upper bound $f_{k+1} \le f_k + g_k^T s_k + \frac{L_g}{2} \|s_k\|_2^2$, one finds
$f_k - f_{k+1} \ge \frac{1}{2 L_g} \|g_k\|_2^2$
$\implies f_0 - f_* \ge \frac{1}{2 L_g} |K_g(\epsilon_g)|\, \epsilon_g^2$
$\implies |K_g(\epsilon_g)| \le 2 L_g (f_0 - f_*)\, \epsilon_g^{-2}$.
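
As a small illustration of this counting analysis in code (not from the talk; the test function, constants, and tolerance are arbitrary choices): run the fixed-stepsize iteration, count the iterates with $\|g_k\|_2 > \epsilon_g$, and compare the count with the bound $2 L_g (f_0 - f_*)\, \epsilon_g^{-2}$.

```python
import numpy as np

# Nonconvex test function (an arbitrary choice): f(x) = sum(x^2 / (1 + x^2)),
# with gradient g(x) = 2x / (1 + x^2)^2. Its gradient is Lipschitz with
# constant L_g = 2 (attained at x = 0), and f is bounded below by f_* = 0.
f = lambda x: np.sum(x**2 / (1.0 + x**2))
g = lambda x: 2.0 * x / (1.0 + x**2)**2

L_g, eps_g = 2.0, 1e-2
x = np.full(5, 3.0)                       # starting point x_0
f0, f_star = f(x), 0.0

count = 0                                 # |K_g(eps_g)|: iterates with ||g||_2 > eps_g
while np.linalg.norm(g(x)) > eps_g:
    count += 1
    x = x - (1.0 / L_g) * g(x)            # x_{k+1} <- x_k - (1/L_g) g_k

bound = 2.0 * L_g * (f0 - f_star) / eps_g**2
print(f"iterations with ||g||_2 > eps_g: {count}")
print(f"worst-case bound 2 L_g (f_0 - f_*) eps_g^-2: {bound:.0f}")
```

On this example the actual count is a few dozen while the bound is in the hundreds of thousands, which previews the pessimism discussed on the next slides.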

33 Nice f. But what if $f$ is nice?... e.g., satisfying the Polyak-Łojasiewicz (PL) condition (for some $c \in \mathbb{R}_{++}$), i.e.,
$f(x) - f_* \le \frac{1}{2c} \|g(x)\|_2^2$ for all $x \in \mathbb{R}^n$.
Now consider the set $F(\epsilon_f) := \{x \in \mathbb{R}^n : f(x) - f_* > \epsilon_f\}$ and consider an upper bound on the cardinality of $K_f(\epsilon_f) := \{k \in \mathbb{N} : x_k \in F(\epsilon_f)\}$.

34 Upper bound on $K_f(\epsilon_f)$. Using $s_k = -\frac{1}{L_g} g_k$ and the upper bound $f_{k+1} \le f_k + g_k^T s_k + \frac{L_g}{2} \|s_k\|_2^2$, one finds
$f_k - f_{k+1} \ge \frac{1}{2 L_g} \|g_k\|_2^2 \ge \frac{c}{L_g} (f_k - f_*)$
$\implies f_{k+1} - f_* \le \left(1 - \frac{c}{L_g}\right)(f_k - f_*)$
$\implies f_k - f_* \le \left(1 - \frac{c}{L_g}\right)^k (f_0 - f_*)$
$\implies |K_f(\epsilon_f)| \le \log\!\left(\frac{f_0 - f_*}{\epsilon_f}\right) \Big/ \log\!\left(\frac{L_g}{L_g - c}\right)$.
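
Unrolling the contraction makes the form of this bound transparent (a standard simplification, added here for completeness): since $\log\big(L_g/(L_g - c)\big) = \log\big(1/(1 - c/L_g)\big) \ge c/L_g$,

```latex
\[
|K_f(\epsilon_f)|
\;\le\; \frac{\log\!\big((f_0 - f_*)/\epsilon_f\big)}{\log\!\big(L_g/(L_g - c)\big)}
\;\le\; \frac{L_g}{c}\,\log\!\Big(\frac{f_0 - f_*}{\epsilon_f}\Big) ,
\]
```

i.e., a logarithmic dependence on the tolerance, versus the $\epsilon_g^{-2}$ dependence of the general nonconvex bound.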

35 For the first step... In the general nonconvex analysis, the guaranteed decrease for the first step is much more pessimistic:
general nonconvex: $f_0 - f_1 \ge \frac{1}{2 L_g} \epsilon_g^2$;
PL condition: $f_1 - f_* \le \left(1 - \frac{c}{L_g}\right)(f_0 - f_*)$.
... and it remains more pessimistic throughout!

36 Upper bounds on $K_f(\epsilon_f)$ versus $K_g(\epsilon_g)$. Let $f(x) = \frac{1}{2} x^2$, meaning that $g(x) = x$. Let $\epsilon_f = \frac{1}{2} \epsilon_g^2$, meaning that $F(\epsilon_f) = G(\epsilon_g)$. Let $x_0 = 10$, $c = 1$, and $L = 2$. (Similar pictures for any $L > 1$.) [Figure: comparison of the two upper bounds.]

37 Upper bounds on $K_f(\epsilon_f)$ versus $\{k \in \mathbb{N} : \frac{1}{2} \|g_k\|_2^2 > \epsilon_g\}$. Let $f(x) = \frac{1}{2} x^2$, meaning that $\frac{1}{2} g(x)^2 = \frac{1}{2} x^2$. Let $\epsilon_f = \epsilon_g$, meaning that $F(\epsilon_f) = G(\epsilon_g)$. Let $x_0 = 10$, $c = 1$, and $L = 2$. (Similar pictures for any $L > 1$.) [Figure: comparison of the two upper bounds.]
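
For this toy setting the bounds can simply be evaluated; here is a short sketch in code (not from the talk) using slide 36's setup, together with the exact iteration count of the iteration $x_{k+1} = x_k - \tfrac{1}{L_g} g_k = \tfrac{1}{2} x_k$:

```python
import math

# Toy setting from the slides: f(x) = x^2/2, g(x) = x, x_0 = 10, c = 1, L_g = 2,
# and eps_f = eps_g^2 / 2 so that F(eps_f) = G(eps_g).
x0, c, L_g = 10.0, 1.0, 2.0
f0 = 0.5 * x0**2                       # f(x_0), with f_* = 0

for eps_g in (1e-1, 1e-2, 1e-3, 1e-4):
    eps_f = 0.5 * eps_g**2
    bound_g = 2.0 * L_g * f0 / eps_g**2                          # general nonconvex bound
    bound_f = math.log(f0 / eps_f) / math.log(L_g / (L_g - c))   # PL-based bound

    # Actual iteration count; note |x| > eps_g iff f(x) - f_* > eps_f here,
    # so the same count serves both K_g(eps_g) and K_f(eps_f).
    x, actual = x0, 0
    while abs(x) > eps_g:
        x, actual = x - (1.0 / L_g) * x, actual + 1

    print(f"eps_g={eps_g:.0e}  nonconvex bound={bound_g:12.0f}  "
          f"PL bound={bound_f:6.1f}  actual={actual}")
```

For $\epsilon_g = 10^{-4}$ this prints a nonconvex bound of about $2 \times 10^{10}$, a PL-based bound of about $33$, and an actual count of $17$, which is the gap the slides are pointing at.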

38 Bad worst-case! Worst-case complexity bounds in the general nonconvex case are very pessimistic. The analysis immediately admits a large gap when the function is nice. The essentially tight examples for the worst-case bounds are... weird.² (² Cartis, Gould, Toint (2010).)

39 Outline: Motivation; Trust Region vs. Cubic Regularization; Complexity Bounds; Summary/Questions.

40 Summary. We have seen that cubic regularization admits good complexity properties. But we have also seen that tweaking trust region methods yields the same. And the empirical performance of trust region remains better (I claim!). These complexity bounds are extremely pessimistic; if your function is nice, they are way off. How can we close the gap between theory and practice? Nesterov & Polyak (2006) consider different phases of convergence, but can this be generalized (for non-strongly convex problems)? Complexity properties should probably not (yet) drive algorithm selection. Then why second-order? To mitigate the effects of ill-conditioning, to be easier to tune, to enable parallel and distributed implementations, etc. Going back to the machine learning (ML) connection...

41-42 Optimization for ML: Beyond SG. [Schematic: directions of improvement from the stochastic gradient method, labeled "better rate", "better constant", and "better rate and better constant".] Bottou et al., Optimization Methods for Large-Scale Machine Learning, SIAM Review (to appear).

43 Two-dimensional schematic of methods. [Schematic: stochastic gradient, batch gradient, stochastic Newton, and batch Newton placed on two axes labeled "noise reduction" and "second-order".] Bottou et al., Optimization Methods for Large-Scale Machine Learning, SIAM Review (to appear).

44 2D schematic: Noise reduction methods. [Schematic: along the noise-reduction axis between stochastic gradient and batch gradient: dynamic sampling, gradient aggregation, and iterate averaging; stochastic Newton shown on the second-order axis.] Bottou et al., Optimization Methods for Large-Scale Machine Learning, SIAM Review (to appear).

45 2D schematic: Second-order methods. [Schematic: along the second-order axis between stochastic gradient and stochastic Newton: diagonal scaling, natural gradient, Gauss-Newton, quasi-Newton, and Hessian-free Newton; batch gradient also shown.] Bottou et al., Optimization Methods for Large-Scale Machine Learning, SIAM Review (to appear).

46 Fairly comparing algorithms. How should we compare optimization algorithms for machine learning? A fair comparison would...
- not ignore time spent tuning parameters
- demonstrate speed and reliability
- involve many problems (test sets?)
- involve many runs of each algorithm
What about testing accuracy?
