Mesures de criticalité d'ordres 1 et 2 en recherche directe (From first- to second-order criticality measures in direct search)


Slide 1: Title
From first- to second-order criticality measures in direct search
Clément Royer, ENSEEIHT-IRIT, Toulouse, France
Co-authors: S. Gratton, L. N. Vicente
Journées du GDR MOA, 02/12/15

Slide 2: Outline
1. A problem: solving nonconvex problems via second-order methods
2. A context: direct-search methods
3. From first- to second-order polling
4. Second-order analysis and numerical behaviour

Slide 3: Introduction
We are interested in solving an unconstrained optimization problem:
    $\min_{x \in \mathbb{R}^n} f(x)$.
The objective function f is:
- bounded from below and $C^2$, with $\nabla f$ and $\nabla^2 f$ Lipschitz continuous;
- nonconvex: the Hessian matrix is not always positive semidefinite.

Slide 4: Caring about second order
Our definition of a second-order method: an optimization algorithm that exploits the (negative) curvature information contained in the Hessian matrix, to ensure second-order convergence.
Second-order tools for the analysis:
- Taylor expansion:
    $f(x+s) \le f(x) + \nabla f(x)^\top s + \tfrac{1}{2}\, s^\top \nabla^2 f(x)\, s + \tfrac{L_{\nabla^2 f}}{6} \|s\|^3$;
- directional derivative estimate:
    $f(x+s) - 2 f(x) + f(x-s) = s^\top \nabla^2 f(x)\, s + O(\|s\|^3)$.
(A small numerical illustration of the second estimate follows.)
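The directional derivative estimate above can be checked numerically: the sketch below (not from the talk; the helper name curvature_estimate is ours) recovers the Rayleigh quotient of the Hessian along s from three function values only.

    import numpy as np

    def curvature_estimate(f, x, s):
        # Central second difference: f(x+s) - 2 f(x) + f(x-s) equals
        # s^T (Hessian at x) s + O(||s||^3); dividing by ||s||^2 gives
        # the Rayleigh quotient of the Hessian along s.
        return (f(x + s) - 2.0 * f(x) + f(x - s)) / np.dot(s, s)

    # Example: f(x) = x1^2 - x2^2 has curvature -2 along the second axis.
    f = lambda x: x[0]**2 - x[1]**2
    s = 1e-3 * np.array([0.0, 1.0])
    print(curvature_estimate(f, np.zeros(2), s))  # approximately -2.0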

Slide 5: Second-order derivative-based optimization
- Early treatment in trust-region and (curvilinear) line-search methods;
- negative curvature is seldom handled so as to provide second-order convergence guarantees;
- renewed interest with the rise of cubic models: Curtis et al. '13, '14, '15; Wong, ISMP '15.
Main issues:
- the cost of computing negative curvature directions;
- dissociating the contributions from orders 1 and 2;
- no natural scaling between $\|\nabla f(x)\|$ and $\lambda_{\min}(\nabla^2 f(x))$.

Slide 6: Outline
1. A problem: solving nonconvex problems via second-order methods
2. A context: direct-search methods
3. From first- to second-order polling
4. Second-order analysis and numerical behaviour

Slides 7-8: Solving the problem without using the derivatives
We consider a setting in which derivatives of f are unavailable or too expensive to compute.
Derivative-Free Optimization (DFO) methods:
- do not use derivatives within the algorithm;
- fall into two main classes: model-based methods and direct-search methods.
Reference: Introduction to Derivative-Free Optimization, A. R. Conn, K. Scheinberg, L. N. Vicente (2009).

Slides 9-10: A simple direct-search framework
1. Initialization: set $x_0$, $\alpha_0 > 0$, $0 < \theta < 1 \le \gamma$. Set $k = 0$.
2. Poll step: choose a polling set of unitary direction vectors. If there exists $d_k$ within the set such that
       $f(x_k + \alpha_k d_k) - f(x_k) < -\alpha_k^3$,
   then set $x_{k+1} := x_k + \alpha_k d_k$ and $\alpha_{k+1} := \gamma \alpha_k$. Otherwise, set $x_{k+1} := x_k$ and $\alpha_{k+1} := \theta \alpha_k$.
3. Set $k = k + 1$ and go back to the poll step.
Remarks:
- the performance criterion is the number of evaluations of f;
- the theoretical properties mainly depend on the polling choices.
(A sketch of this loop in code follows.)
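As a concrete illustration, here is a minimal, hypothetical Python sketch of this polling loop, using the coordinate positive spanning set [I, -I] and illustrative parameter values; any implementation accompanying the talk may differ.

    import numpy as np

    def direct_search(f, x0, alpha0=1.0, theta=0.5, gamma=2.0, max_iter=500):
        # Minimal sketch: sufficient decrease test f(x+alpha d) - f(x) < -alpha^3,
        # step-size expansion on success, contraction otherwise.
        x = np.asarray(x0, dtype=float)
        n, alpha = x.size, alpha0
        D = np.vstack([np.eye(n), -np.eye(n)])  # unitary polling directions
        for _ in range(max_iter):
            fx = f(x)
            for d in D:
                if f(x + alpha * d) - fx < -alpha**3:  # successful poll
                    x, alpha = x + alpha * d, gamma * alpha
                    break
            else:  # no direction gave sufficient decrease
                alpha = theta * alpha
        return x

    # Example on a smooth nonconvex function:
    f = lambda x: (x[0]**2 - 1.0)**2 + x[1]**2
    print(direct_search(f, np.array([0.2, 0.7])))  # approaches (+-1, 0)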

Slide 11: Order 2 in derivative-free methods
- Few practical methods explicitly deal with nonconvexity;
- for direct search, most results are due to Abramson et al. ('05, '06, '14).
Issues with the existing direct-search approaches:
- they study properties of (unknown) convergent subsequences;
- they rely on density assumptions and on direction sets that change from one iteration to another.
Our objective is to develop a method that exploits second-order properties at the iteration level.

Slide 12: Outline
1. A problem: solving nonconvex problems via second-order methods
2. A context: direct-search methods
3. From first- to second-order polling
4. Second-order analysis and numerical behaviour

Slide 13: Back to the direct-search method
(The framework of slides 9-10 is recalled.) How can we define rules to choose the polling sets?

Slides 14-16: First-order polling quality
- Typical direct-search methods ensure first-order convergence;
- the polling sets must provide good approximations of the negative gradient.
A measure of first-order quality: let D be a set of unitary vectors and $v \in \mathbb{R}^n \setminus \{0\}$. Then
    $\mathrm{cm}(D, v) = \max_{d \in D} \frac{d^\top v}{\|d\|\,\|v\|}$
is called the cosine measure of D at v. If $\mathrm{cm}(D, -\nabla f(x)) > 0$, then D contains a descent direction of f at x. (A direct implementation follows.)
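A direct implementation of the cosine measure as just defined (the helper name cosine_measure_at is ours):

    import numpy as np

    def cosine_measure_at(D, v):
        # cm(D, v) = max over d in D of d^T v / (||d|| ||v||); positive iff
        # some direction in D makes an acute angle with v.
        v = np.asarray(v, dtype=float)
        return max(np.dot(d, v) / (np.linalg.norm(d) * np.linalg.norm(v))
                   for d in D)

    # {e1, e2} is not a PSS in R^2: no descent direction for v = (-1, -1).
    D = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    print(cosine_measure_at(D, np.array([-1.0, -1.0])))  # negative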

Slides 17-18: Usual polling choice
Positive spanning sets (PSS): D is a PSS if it generates $\mathbb{R}^n$ by nonnegative linear combinations.
- D is a PSS iff $\mathrm{cm}(D, v) > 0$ for every $v \ne 0$;
- a PSS contains at least $n + 1$ vectors;
- example: the coordinate set $D = [I\ {-I}]$.
PSS and first-order convergence rest on two main ideas:
- use the Taylor expansion $f(x + \alpha d) \le f(x) + \alpha \nabla f(x)^\top d + \tfrac{L_{\nabla f}}{2} \alpha^2$;
- assume that for every iteration, $\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa$ with $\kappa \in (0, 1)$.
(An empirical check of the coordinate set's cosine measure follows.)
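A quick empirical check of the known value $\kappa = 1/\sqrt{n}$ for the coordinate set $[I\ {-I}]$ (this value reappears in the corollary of slide 30); the experiment below is ours, not from the talk:

    import numpy as np

    # For the coordinate PSS [I, -I] and any v != 0,
    # cm([I,-I], v) = max_i |v_i| / ||v||, which is always >= 1/sqrt(n).
    n = 5
    rng = np.random.default_rng(0)
    samples = rng.standard_normal((100000, n))
    cms = np.max(np.abs(samples), axis=1) / np.linalg.norm(samples, axis=1)
    print(cms.min(), 1 / np.sqrt(n))  # observed minimum stays >= 1/sqrt(n)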

Slides 19-20: First-order results
First-order polling strategy: poll along a positive spanning set $D_k$.
Convergence arguments:
- independently of $D_k$, $\alpha_k \to 0$;
- on unsuccessful iterations, $\alpha_k \ge O(\kappa\, \|\nabla f(x_k)\|)$.
Theorem (first-order convergence): $\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0$.

Slide 21: A second-order criticality measure
Definition: given a set of unitary vectors D and a symmetric matrix A, the Rayleigh measure of D with respect to A is defined by
    $\mathrm{rm}(D, A) = \min_{d \in V(D)} d^\top A\, d$,
where $V(D) = \{d \in D : -d \in D\}$ is the symmetric part of D.
- The Rayleigh measure is an approximation of the minimum eigenvalue of A;
- we want this approximation to be sufficiently good. (A direct implementation follows.)
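A direct implementation of the Rayleigh measure as just defined (helper name ours); the example shows that for $D = [I\ {-I}]$ it returns the smallest diagonal entry of A, an upper approximation of $\lambda_{\min}(A)$:

    import numpy as np

    def rayleigh_measure(D, A):
        # rm(D, A) = min over the symmetric part V(D) = {d in D : -d in D}
        # of the Rayleigh quotients d^T A d (unitary directions assumed).
        D = [np.asarray(d, dtype=float) for d in D]
        V = [d for d in D if any(np.allclose(-d, e) for e in D)]
        return min(d @ A @ d for d in V)

    A = np.array([[2.0, 1.0], [1.0, -1.0]])
    D = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
         np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
    # rm = -1.0 (smallest diagonal entry), while lambda_min(A) ~ -1.303.
    print(rayleigh_measure(D, A), np.linalg.eigvalsh(A).min())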

Slide 22: Rayleigh measure and negative curvature
In derivative-based methods, if $\lambda_{\min}(\nabla^2 f(x_k)) < 0$, one uses a direction of sufficient negative curvature:
    $d^\top \nabla^2 f(x_k)\, d \le \beta\, \lambda_{\min}(\nabla^2 f(x_k))$, with $\beta \in (0, 1]$.
In a direct-search environment:
- derivative-free: Hessian eigenvalues cannot be computed;
- direct search: the step size goes to zero.
We will instead be ensuring
    $\mathrm{rm}(D_k, \nabla^2 f(x_k)) \le \beta\, \lambda_{\min}(\nabla^2 f(x_k)) + O(\alpha_k)$.

Slides 23-25: A second-order polling strategy for direct search
Second-order polling rules:
1. Poll along a PSS $D_k$ (first-order rule);
2. poll along $-D_k$;
3. select a basis $B_k \subset D_k$ and build an approximated Hessian $H_k \approx B_k^\top \nabla^2 f(x_k)\, B_k$ using function values;
4. compute a unitary vector $v_k$ such that $H_k v_k = \lambda_{\min}(H_k)\, v_k$; poll along $v_k$ and $-v_k$.
The cost of an iteration is at most $O(n^2)$ evaluations. The polling stops as soon as it encounters a direction d such that $f(x_k + \alpha_k d) - f(x_k) < -\alpha_k^3$. (A finite-difference sketch of steps 3-4 follows.)
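A hypothetical finite-difference sketch of steps 3-4, assuming a standard central/forward stencil to build $H_k$ from $O(n^2)$ function values; the exact construction in the paper may differ:

    import numpy as np

    def min_curvature_direction(f, x, B, alpha):
        # Build H, an approximation of B^T Hessian(x) B, from function
        # values only, then return a unit eigenvector for its smallest
        # eigenvalue. Columns of B form the basis B_k.
        n = B.shape[1]
        fx = f(x)
        fp = [f(x + alpha * B[:, i]) for i in range(n)]
        H = np.empty((n, n))
        for i in range(n):
            H[i, i] = (fp[i] - 2.0 * fx + f(x - alpha * B[:, i])) / alpha**2
            for j in range(i):
                fij = f(x + alpha * (B[:, i] + B[:, j]))
                H[i, j] = H[j, i] = (fij - fp[i] - fp[j] + fx) / alpha**2
        _, V = np.linalg.eigh(H)  # ascending eigenvalues, orthonormal columns
        return V[:, 0]            # unit v_k with H v_k = lambda_min(H) v_k

    # Example: f has negative curvature along the second coordinate at 0.
    f = lambda x: x[0]**2 - 0.5 * x[1]**2
    print(min_curvature_direction(f, np.zeros(2), np.eye(2), 1e-3))  # ~(0, +-1)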

Slide 26: Outline
1. A problem: solving nonconvex problems via second-order methods
2. A context: direct-search methods
3. From first- to second-order polling
4. Second-order analysis and numerical behaviour

Slide 27: Second-order convergence
Assumptions:
- the $D_k$'s are PSS with $\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa > 0$ for all k;
- there exists $\sigma \in (0, 1]$ such that $\sigma_{\min}(B_k)^2 \ge \sigma > 0$ for all k.
Minimum eigenvalue estimate: let k be an unsuccessful iteration and $P_k$ the corresponding polling set; then
    $\mathrm{rm}(P_k, \nabla^2 f(x_k)) \le v_k^\top \nabla^2 f(x_k)\, v_k \le \sigma\, \lambda_{\min}(\nabla^2 f(x_k)) + O(n\, \alpha_k)$.
The factors $\sigma$ and n are due to the approximation error.

Slides 28-29: Second-order convergence (2)
Convergence arguments:
- as before, $\alpha_k \to 0$;
- on an unsuccessful iteration k, one has
    $\alpha_k \ge \max\{\, O(\kappa\, \|\nabla f(x_k)\|),\; O(\sigma\, n^{-1}\, |\lambda_{\min}(\nabla^2 f(x_k))|)\,\}$.
Theorem (second-order convergence):
    $\liminf_{k \to \infty} \max\{\, \|\nabla f(x_k)\|,\; -\lambda_{\min}(\nabla^2 f(x_k))\,\} = 0$.

Slide 30: Second-order worst-case complexity
We aim to reach an $(\epsilon_g, \epsilon_H)$-second-order critical point, i.e., a point where
    $\|\nabla f(x_k)\| < \epsilon_g$ and $\lambda_{\min}(\nabla^2 f(x_k)) > -\epsilon_H$.
Theorem: let $N_{\epsilon_g \epsilon_H}$ be the number of evaluations of f needed to reach an $(\epsilon_g, \epsilon_H)$-second-order critical point; then
    $N_{\epsilon_g \epsilon_H} \le O\!\left( n^2 \max\{\, \kappa^{-3} \epsilon_g^{-3},\; \sigma^{-3} n^{3} \epsilon_H^{-3} \,\} \right)$.
Corollary: choosing $D_k = [I\ {-I}]$ yields $\kappa = 1/\sqrt{n}$ and $\sigma = 1$, and the complexity bound becomes
    $O\!\left( n^{5} \max\{\, \epsilon_g^{-3},\; \epsilon_H^{-3} \,\} \right)$.
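As a sanity check of the corollary (our computation, not on the slide), substitute $\kappa = 1/\sqrt{n}$ and $\sigma = 1$ into the theorem's bound:

    \[
      N_{\epsilon_g \epsilon_H}
        \le O\!\left( n^2 \max\{\, \kappa^{-3}\epsilon_g^{-3},\; \sigma^{-3} n^{3}\epsilon_H^{-3} \,\} \right)
        =   O\!\left( \max\{\, n^{7/2}\,\epsilon_g^{-3},\; n^{5}\,\epsilon_H^{-3} \,\} \right)
        \le O\!\left( n^{5} \max\{\, \epsilon_g^{-3},\; \epsilon_H^{-3} \,\} \right),
    \]

since $\kappa^{-3} = (1/\sqrt{n})^{-3} = n^{3/2}$ and $n^2 \cdot n^{3/2} = n^{7/2} \le n^{5}$.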

Slide 31: Practical insights
On 60 CUTEst problems with negative curvature:
- using symmetric sets generally improves the performance;
- the second-order rules (solid lines in the profiles shown on the slide) allow more problems to be solved.

Slides 32-33: Conclusion
Our contributions:
- the definition of a second-order criticality measure;
- a second-order direct-search method that converges with respect to this measure, together with its associated complexity bound;
- numerical confirmation of the theoretical findings.
For more information: A second-order globally convergent direct-search method and its worst-case complexity. S. Gratton, C. W. Royer, L. N. Vicente. To appear in Optimization.

Slide 34: Towards randomization
- Guaranteeing $P(\mathrm{cm}(D_k, -\nabla f(x_k)) > \kappa) \ge p > 0$ is sufficient for first-order convergence, and we can achieve it in practice (Gratton, Royer, Vicente and Zhang '14);
- can we do the same with second-order properties? (A small sampling sketch follows.)
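A small, hypothetical sampling experiment in the spirit of that result (the values of m and kappa below are illustrative, not from the reference): estimate $p = P(\mathrm{cm}(D_k, v) \ge \kappa)$ when the polling set consists of m independent uniform directions on the unit sphere.

    import numpy as np

    n, m, kappa, trials = 10, 4, 0.1, 10000
    rng = np.random.default_rng(1)
    v = rng.standard_normal(n); v /= np.linalg.norm(v)  # stand-in for -grad f
    hits = 0
    for _ in range(trials):
        D = rng.standard_normal((m, n))
        D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit directions
        hits += (D @ v).max() >= kappa                 # cm(D, v) >= kappa?
    print(hits / trials)  # empirical estimate of p > 0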

Slide 35: Thank you!
