Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization
1 Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization
Clément Royer - University of Wisconsin-Madison
Joint work with Stephen J. Wright
MOPTA, Bethlehem, Pennsylvania, USA - August 17, 2017
Complexity of second order line search 1
2 Smooth nonconvex optimization
We consider an unconstrained smooth problem:
    min_{x ∈ ℝ^n} f(x).
Assumptions on f:
- f bounded from below.
- f twice continuously differentiable.
- f is not convex.
3 Optimality conditions
Second-order necessary point: x* satisfies the second-order necessary conditions if
    ∇f(x*) = 0,  ∇²f(x*) ⪰ 0.
Basic paradigm: if x is not a second-order necessary point, there exists d such that
1. dᵀ∇f(x) < 0: gradient-type direction, and/or
2. dᵀ∇²f(x)d < 0: negative curvature direction, specific to nonconvex problems.
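The paradigm above is easy to check numerically. Here is a small illustrative sketch (not from the talk; the function names and tolerance are mine) that, given gradient and Hessian callables, either accepts x as an approximate second-order point or returns one of the two descent-direction types, demonstrated on the saddle point of f(x) = x₁² − x₂²:

```python
# Illustrative check of the second-order necessary conditions; names and
# the tolerance are assumptions, not from the talk.
import numpy as np

def check_second_order(grad, hess, x, tol=1e-8):
    """Return (is_second_order_point, descent_direction_or_None)."""
    g = grad(x)
    H = hess(x)
    if np.linalg.norm(g) > tol:
        return False, -g                      # gradient-type direction
    eigvals, eigvecs = np.linalg.eigh(H)
    lam_min, v = eigvals[0], eigvecs[:, 0]
    if lam_min < -tol:
        return False, v                       # negative curvature direction
    return True, None

# Saddle of f(x) = x0^2 - x1^2: the gradient vanishes at the origin,
# but the Hessian has a negative eigenvalue.
grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])
ok, d = check_second_order(grad, hess, np.zeros(2))
```

On the saddle, `ok` is False and `d` is a direction of negative curvature, exactly the situation the slide identifies as specific to nonconvex problems.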
5 Motivation
Example: nonconvex formulation of low-rank matrix problems. For common classes of problems:
    min_{U ∈ ℝ^{n×r}, V ∈ ℝ^{m×r}} f(UVᵀ).
- Second-order necessary points are global minimizers (or close).
- Saddle points have negative curvature.
Renewed interest: second-order necessary points of nonconvex problems.
Needed: efficient algorithms.
7 Second-order complexity
Principle: for a given method and two tolerances ε_g, ε_H ∈ (0, 1):
Objective: bound the worst-case cost of reaching x_k such that
    ∥∇f(x_k)∥ ≤ ε_g,  λ_k := λ_min(∇²f(x_k)) ≥ −ε_H.
Focus: bound the dependencies on ε_g, ε_H.
Definition of cost? Best rates?
9 Existing complexity results
Nonconvex optimization literature:
- Classical cost: number of (expensive) iterations.
- Best methods: Newton-type frameworks.
Algorithms and iteration bounds:
- Classical trust region: O(max{ε_g^{-2} ε_H^{-1}, ε_H^{-3}}).
- Cubic regularization, TRACE trust region: O(max{ε_g^{-3/2}, ε_H^{-3}}).
11 Existing complexity results (2)
Learning/statistics community, specific setting ε_g = ε, ε_H = O(√ε):
- Best Newton-type bound: O(ε^{-3/2}).
- Gradient-based methods have cheaper iterations.
- Cost measure: Hessian-vector products / gradient evaluations.
Algorithms and bounds:
- Gradient descent methods with random noise: Õ(ε^{-2}).
- Accelerated gradient methods for nonconvex problems: Õ(ε^{-7/4}).
Õ(·) hides logarithmic factors; results hold with high probability.
13 Our objective
Illustrate all the possible complexities...
- In terms of iterations, evaluations, etc.
- For arbitrary ε_g, ε_H.
- Deterministic and high-probability results.
...in a single framework:
- Based on line search.
- Matrix-free: only requires Hessian-vector products.
- Good complexity guarantees.
14 Outline
1 Our algorithm
2 Complexity analysis
3 Inexact variants
16 Basic framework
Parameters: x_0 ∈ ℝ^n, θ ∈ (0, 1), η > 0, ε_g ∈ (0, 1), ε_H ∈ (0, 1).
For k = 0, 1, 2, ...
1. Compute a search direction d_k.
2. Perform a backtracking line search to compute α_k = θ^{j_k} such that
       f(x_k + α_k d_k) < f(x_k) − (η/6) α_k³ ∥d_k∥³.
3. Set x_{k+1} = x_k + α_k d_k.
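The backtracking step can be written in a few lines. This is an illustrative sketch (variable names and the demo function are mine, not the authors' code); `theta` and `eta` play the roles of θ and η above:

```python
# A minimal sketch of the backtracking line search with the cubic decrease
# condition f(x + alpha*d) < f(x) - (eta/6) * alpha^3 * ||d||^3.
import numpy as np

def backtrack(f, x, d, theta=0.5, eta=0.1, max_backtracks=50):
    fx = f(x)
    nd3 = np.linalg.norm(d) ** 3
    alpha = 1.0
    for _ in range(max_backtracks):
        if f(x + alpha * d) < fx - (eta / 6.0) * alpha ** 3 * nd3:
            return alpha  # first accepted step size theta^j
        alpha *= theta
    raise RuntimeError("line search did not terminate")

# Demo: one step on f(x) = 0.5 ||x||^2 along d = -grad f(x) = -x.
f = lambda x: 0.5 * float(x @ x)
x = np.array([1.0, -2.0])
alpha = backtrack(f, x, -x)
```

Note the cubic (rather than the usual Armijo) decrease condition: it is what ties the accepted step size to the third-order Taylor model used throughout the complexity analysis.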
17 Selecting the search direction d_k
Step 1: use gradient-related information. Compute
    g_k = ∇f(x_k),  R_k = (g_kᵀ ∇²f(x_k) g_k) / ∥g_k∥².
- If R_k < −ε_H, set d_k = (R_k / ∥g_k∥) g_k.
- Else if R_k ∈ [−ε_H, ε_H] and ∥g_k∥ > ε_g, set d_k = −g_k / ∥g_k∥^{1/2}.
- Otherwise perform Step 2.
18 Selecting the search direction d_k (2)
Step 2: use eigenvalue information. Compute an eigenpair (v_k, λ_k) such that
    λ_k = λ_min(∇²f(x_k)),  ∇²f(x_k) v_k = λ_k v_k,  v_kᵀ g_k ≤ 0,  ∥v_k∥ = 1.
- Case λ_k < −ε_H: d_k = −λ_k v_k;
- Case λ_k > ε_H (Newton step): d_k = d_k^n, where ∇²f(x_k) d_k^n = −g_k;
- Case λ_k ∈ [−ε_H, ε_H] (regularized Newton step): d_k = d_k^r, where (∇²f(x_k) + 2ε_H I) d_k^r = −g_k.
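The five cases above (two gradient scalings, negative eigenvector, Newton, regularized Newton) can be sketched as a single selector. This is an illustrative reconstruction using dense linear algebra for clarity, not the matrix-free implementation from the paper:

```python
# Illustrative direction selector for Steps 1 and 2; case order and
# scalings follow the slides, implementation details are assumptions.
import numpy as np

def choose_direction(g, H, eps_g, eps_H):
    ng = np.linalg.norm(g)
    if ng > 0:
        R = (g @ H @ g) / ng ** 2          # curvature along the gradient
        if R < -eps_H:
            return (R / ng) * g            # gradient scaled by its negative curvature
        if -eps_H <= R <= eps_H and ng > eps_g:
            return -g / np.sqrt(ng)        # gradient scaled by its norm
    lam, V = np.linalg.eigh(H)
    lam_min, v = lam[0], V[:, 0]
    if v @ g > 0:
        v = -v                             # enforce v^T g <= 0
    if lam_min < -eps_H:
        return -lam_min * v                # negative eigenvector step, ||d|| = |lam_min|
    if lam_min > eps_H:
        return np.linalg.solve(H, -g)      # Newton step
    return np.linalg.solve(H + 2.0 * eps_H * np.eye(len(g)), -g)  # regularized Newton
```

The scalings matter: each direction has a norm matched to the curvature information it exploits, which is what makes the single cubic decrease condition cover all five cases in the analysis.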
20 Assumptions and notations
Assumptions:
- L_f(x_0) = {x : f(x) ≤ f(x_0)} is compact.
- f twice continuously differentiable on an open set containing L_f(x_0), with Lipschitz continuous Hessian.
Notations:
- L_H: Lipschitz constant for ∇²f.
- f_low: lower bound on {f(x_k)}.
- U_H: upper bound on ∥∇²f(x_k)∥.
22 Criterion
Approximate solution: x_k is an (ε_g, ε_H)-point if
    min{∥g_k∥, ∥g_{k+1}∥} ≤ ε_g,  λ_k ≥ −ε_H.
Other possibilities:
- Remove the gradient directions and use ∥g_{k+1}∥ only: no cheap gradient steps.
- Add a stopping criterion and use ∥g_k∥ only: no global/local convergence theory.
25 Analysis of the method
Key principle: bound the decrease produced at every step while an (ε_g, ε_H)-point has not been reached.
Five possible directions:
- Gradient scaled by its (negative) curvature;
- Gradient scaled by its norm;
- Negative eigenvector;
- Newton step;
- Regularized Newton step.
One proof technique, typical of backtracking line search:
- If the unit step is accepted, guaranteed decrease;
- Otherwise, lower bound on the accepted step size.
27 Example: when d_k = −g_k / ∥g_k∥^{1/2}
In that case:
    (g_kᵀ ∇²f(x_k) g_k) / ∥g_k∥² ∈ [−ε_H, ε_H],  ∥g_k∥ > ε_g.
Unit step accepted:
    f(x_k) − f(x_{k+1}) ≥ (η/6) ∥d_k∥³ ≥ (η/6) ε_g^{3/2}.
Unit step rejected: by Taylor expansion, there exists an accepted step α_k = θ^{j_k} with
    θ^{j_k} ≥ θ min{ (3/(L_H + η))^{1/2}, ε_g^{1/2} ε_H^{-1} }.
So the line search terminates and
    f(x_k) − f(x_{k+1}) ≥ (η/6) α_k³ ∥d_k∥³ ≥ c ε_g³ ε_H^{-3}.
Final decrease:
    f(x_k) − f(x_{k+1}) ≥ c_g min{ ε_g³ ε_H^{-3}, ε_g^{3/2} }.
28 Decrease bound
General decrease lemma: if at the k-th iteration an (ε_g, ε_H)-point has not been reached, then
    f(x_k) − f(x_{k+1}) ≥ c min{ ε_g^{3/2}, ε_H³, ε_g³ ε_H^{-3}, φ(ε_g, ε_H)³ },
where
    φ(ε_g, ε_H) = L_H^{-1} ε_H ( −2 + (4 + 2 L_H ε_g / ε_H²)^{1/2} ),
and c depends on L_H, η, θ.
29 Iteration complexity
Iteration complexity bound: the method reaches an (ε_g, ε_H)-point in at most
    ((f_0 − f_low)/c) max{ ε_g^{-3/2}, ε_H^{-3}, ε_g^{-3} ε_H³, φ(ε_g, ε_H)^{-3} }
iterations. Specific rates:
- ε_g = ε, ε_H = √ε: O(ε^{-3/2}).
- ε_g = ε_H = ε: O(ε^{-3}).
These are the optimal bounds for Newton-type methods.
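As a sanity check on the first specific rate (a small numerical experiment of mine, using a reconstruction of the slide's φ and an arbitrary L_H = 1): φ(ε, √ε) is a constant multiple of √ε, so φ(ε, √ε)^{-3} = O(ε^{-3/2}), matching the other dominant terms in that regime.

```python
# Illustrative check that phi(eps, sqrt(eps)) scales like sqrt(eps), using
# the reconstructed formula for phi and an assumed L_H = 1.0.
import math

L_H = 1.0

def phi(eps_g, eps_H):
    return (eps_H / L_H) * (-2.0 + math.sqrt(4.0 + 2.0 * L_H * eps_g / eps_H ** 2))

# The ratio phi(eps, sqrt(eps)) / sqrt(eps) is the same constant for all eps,
# here sqrt(6) - 2, so phi^3 scales exactly like eps^(3/2).
ratios = [phi(eps, math.sqrt(eps)) / math.sqrt(eps) for eps in (1e-2, 1e-4, 1e-6)]
```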
31 Function evaluation complexity
- #Iterations = #gradient evaluations = #Hessian evaluations.
- #Iterations ≤ #function evaluations: the line search may evaluate f several times per iteration.
Line-search iterations: if x_k is not an (ε_g, ε_H)-point, the line search takes at most
    O( | log_θ( min{ ε_g^{1/2} ε_H^{-1}, ε_H² } ) | )
iterations.
Evaluation complexity bound: the method reaches an (ε_g, ε_H)-point in at most
    Õ( max{ ε_g^{-3/2}, ε_H^{-3}, ε_g^{-3} ε_H³, φ(ε_g, ε_H)^{-3} } )
function evaluations.
33 Motivation
Algorithmic cost: the method should be matrix-free, yet we use matrix-related operations:
- Linear system solves;
- Eigenvalue/eigenvector computations.
Inexactness: perform these matrix operations inexactly.
Main cost unit: matrix-vector product / gradient evaluation.
35 Conjugate gradient for linear systems
We solve systems of the form Hd = −g, with H ⪰ ε_H I.
Conjugate Gradient (CG): we apply the conjugate gradient algorithm with stopping criterion
    ∥Hd + g∥ ≤ (ξ/2) min{ ∥g∥, ε_H ∥d∥ },  ξ ∈ (0, 1).
If κ = λ_max(H)/λ_min(H), the CG method finds such a vector in at most
    min{ n, (1/2) √κ log(4 κ^{5/2}/ξ) } = min{ n, O(√κ log(κ/ξ)) }
matrix-vector products.
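A plain textbook CG with the stated stopping test might look as follows. This is an illustrative sketch under the slide's assumption H ⪰ ε_H I (which guarantees p'Hp > 0 in the step-size computation), not the paper's implementation:

```python
# Textbook conjugate gradient for H d = -g with the relative stopping test
# ||H d + g|| <= (xi/2) * min(||g||, eps_H * ||d||); names are illustrative.
import numpy as np

def cg_inexact(H, g, eps_H, xi=0.5, max_iter=None):
    n = len(g)
    max_iter = max_iter or 10 * n
    d = np.zeros(n)
    r = -g - H @ d                      # residual of H d = -g
    p = r.copy()
    ng = np.linalg.norm(g)
    for _ in range(max_iter):
        if np.linalg.norm(H @ d + g) <= 0.5 * xi * min(ng, eps_H * np.linalg.norm(d)):
            break
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)      # p'Hp > 0 since H is positive definite
        d = d + alpha * p
        r_new = r - alpha * Hp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return d
```

Because the test is relative to min{∥g∥, ε_H∥d∥}, the returned d only needs a residual small compared to the gradient and the step, which is what keeps the per-iteration cost at O(√κ log(κ/ξ)) matrix-vector products.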
37 Lanczos for eigenvalue computation
Lanczos method to compute a minimum eigenvector.
- Can fail if deterministic: use a random start.
- Results available for matrices A ⪰ 0: shift the Hessian.
Lanczos iterations: let H ∈ ℝ^{n×n} be symmetric with ∥H∥ ≤ U_H, and let ε > 0, δ ∈ (0, 1). With probability at least 1 − δ, the Lanczos procedure applied to U_H I − H outputs a unit vector v such that
    vᵀ H v ≤ λ_min(H) + ε
in at most
    min{ n, (ln(n/δ²)/(2√2)) (U_H/ε)^{1/2} }
iterations/matrix-vector products.
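The two tricks above (random start, shifting so the target becomes the largest eigenvalue of a positive semidefinite matrix) can be seen in a minimal Lanczos sketch. This is an illustrative implementation of mine, with no reorthogonalization, not the one analyzed in the paper:

```python
# Minimal Lanczos estimate of a minimum eigenpair of H via the shifted
# matrix M = U_H*I - H, whose largest eigenvalue is U_H - lambda_min(H).
# Random start as the slides suggest; names are illustrative.
import numpy as np

def lanczos_min_eig(H, U_H, m=30, seed=0):
    n = H.shape[0]
    rng = np.random.default_rng(seed)
    M = U_H * np.eye(n) - H                # shift: lambda_min(H) -> lambda_max(M)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                 # random unit start vector
    q_prev, b_prev = np.zeros(n), 0.0
    k = m
    for j in range(m):
        Q[:, j] = q
        w = M @ q - b_prev * q_prev        # three-term Lanczos recurrence
        alpha[j] = q @ w
        w -= alpha[j] * q
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-8:                 # Krylov subspace exhausted
            k = j + 1
            break
        q_prev, q, b_prev = q, w / beta[j], beta[j]
    T = np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    theta, Y = np.linalg.eigh(T)           # Ritz values of the tridiagonal T
    v = Q[:, :k] @ Y[:, -1]                # Ritz vector for the largest Ritz value
    return U_H - theta[-1], v / np.linalg.norm(v)
```

The random start is what makes the failure probability δ appear in the iteration bound: a deterministic start can be orthogonal to the eigenvector sought, while a random one is not, with high probability.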
38 Selecting the search direction d_k - Inexact version
Step 1: use gradient-related information. Compute
    g_k = ∇f(x_k),  R_k = (g_kᵀ ∇²f(x_k) g_k) / ∥g_k∥².
- If R_k < −ε_H, set d_k = (R_k / ∥g_k∥) g_k.
- Else if R_k ∈ [−ε_H, ε_H] and ∥g_k∥ > ε_g, set d_k = −g_k / ∥g_k∥^{1/2}.
- Otherwise perform the Inexact Step 2.
39 Selecting the direction d_k - Inexact version (2)
Inexact Step 2: use (inexact) eigenvalue information. Compute an approximate eigenpair (v_k^i, λ_k^i) such that, with probability 1 − δ,
    λ_k^i = [v_k^i]ᵀ ∇²f(x_k) v_k^i ≤ λ_k + ε_H/2,  [v_k^i]ᵀ g_k ≤ 0,  ∥v_k^i∥ = 1.
- Case λ_k^i < −(1/2) ε_H: d_k = −λ_k^i v_k^i;
- Case λ_k^i > (3/2) ε_H (inexact Newton): use CG to obtain d_k = d_k^{in} with
      ∥∇²f(x_k) d_k^{in} + g_k∥ ≤ (ξ/2) min{ ∥g_k∥, ε_H ∥d_k^{in}∥ };
- Case λ_k^i ∈ [−(1/2) ε_H, (3/2) ε_H] (inexact regularized Newton): use CG to obtain d_k = d_k^{ir} with
      ∥(∇²f(x_k) + 2ε_H I) d_k^{ir} + g_k∥ ≤ (ξ/2) min{ ∥g_k∥, ε_H ∥d_k^{ir}∥ }.
40 Complexity analysis of the inexact method
Identical reasoning: 5 steps, 1 proof.
- Using Lanczos with a random start, the negative curvature decrease only holds with probability 1 − δ.
- With CG, the inexact Newton and regularized Newton steps give slightly different formulas.
Decrease lemma: for any iteration k, if x_k is not an (ε_g, ε_H)-point,
    f(x_k) − f(x_{k+1}) ≥ ĉ min{ ε_g³ ε_H^{-3}, ε_g^{3/2}, ε_H³, φ(ε_g, (ξ/2) ε_H)³, φ(ε_g, ((4+ξ)/2) ε_H)³ }
with probability at least 1 − δ, where ĉ only depends on L_H, η, θ.
41 Complexity results
Iteration complexity: an (ε_g, ε_H)-point is reached in at most
    K̂ := ((f_0 − f_low)/ĉ) max{ ε_g^{-3} ε_H³, ε_g^{-3/2}, ε_H^{-3}, φ(ε_g, (ξ/2) ε_H)^{-3}, φ(ε_g, ((4+ξ)/2) ε_H)^{-3} }
iterations, with probability at least 1 − K̂ δ.
Cost complexity: the number of Hessian-vector products or gradient evaluations needed to reach an (ε_g, ε_H)-point is at most
    min{ n, O( U_H^{1/2} ε_H^{-1/2} log(ε_H^{-1}/ξ) ), O( U_H^{1/2} ε_H^{-1/2} log(n/δ²) ) } × K̂,
with probability at least 1 − K̂ δ.
43 Complexity results (simplified)
Setting ε_g = ε, ε_H = √ε: an (ε, √ε)-point is reached in at most
- O(ε^{-3/2}) iterations,
- Õ(ε^{-7/4}) Hessian-vector products / gradient evaluations,
with probability 1 − O(ε^{-3/2} δ).
Setting δ = 0 gives results with probability 1:
- Iterations: O(ε^{-3/2}).
- Hessian-vector products / gradients: O(n ε^{-3/2}).
44 Summary
Our proposal: a class of second-order line-search methods.
- Best known complexity guarantees.
- Features gradient steps and inexactness.
- Can be implemented matrix-free.
For more details: "Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization", C. W. Royer and S. J. Wright, arXiv preprint. Also contains local convergence results.
46 Follow-up
Perspectives:
- Numerical testing of our class of methods.
- Extension to constrained problems.
Thank you for your attention!
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationMath 164: Optimization Barzilai-Borwein Method
Math 164: Optimization Barzilai-Borwein Method Instructor: Wotao Yin Department of Mathematics, UCLA Spring 2015 online discussions on piazza.com Main features of the Barzilai-Borwein (BB) method The BB
More informationTowards stability and optimality in stochastic gradient descent
Towards stability and optimality in stochastic gradient descent Panos Toulis, Dustin Tran and Edoardo M. Airoldi August 26, 2016 Discussion by Ikenna Odinaka Duke University Outline Introduction 1 Introduction
More information1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:
Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion
More informationNonlinear Optimization Methods for Machine Learning
Nonlinear Optimization Methods for Machine Learning Jorge Nocedal Northwestern University University of California, Davis, Sept 2018 1 Introduction We don t really know, do we? a) Deep neural networks
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationNon-convex optimization. Issam Laradji
Non-convex optimization Issam Laradji Strongly Convex Objective function f(x) x Strongly Convex Objective function Assumptions Gradient Lipschitz continuous f(x) Strongly convex x Strongly Convex Objective
More informationAccelerating Nesterov s Method for Strongly Convex Functions
Accelerating Nesterov s Method for Strongly Convex Functions Hao Chen Xiangrui Meng MATH301, 2011 Outline The Gap 1 The Gap 2 3 Outline The Gap 1 The Gap 2 3 Our talk begins with a tiny gap For any x 0
More informationAdaptive Negative Curvature Descent with Applications in Non-convex Optimization
Adaptive Negative Curvature Descent with Applications in Non-convex Optimization Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang Department of Computer Science, The University of Iowa, Iowa
More informationNumerical optimization
Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal
More informationAn Inexact Newton Method for Nonlinear Constrained Optimization
An Inexact Newton Method for Nonlinear Constrained Optimization Frank E. Curtis Numerical Analysis Seminar, January 23, 2009 Outline Motivation and background Algorithm development and theoretical results
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationMini-Course 1: SGD Escapes Saddle Points
Mini-Course 1: SGD Escapes Saddle Points Yang Yuan Computer Science Department Cornell University Gradient Descent (GD) Task: min x f (x) GD does iterative updates x t+1 = x t η t f (x t ) Gradient Descent
More informationSuppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.
Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of
More informationApplied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic
Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization
E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationThe Randomized Newton Method for Convex Optimization
The Randomized Newton Method for Convex Optimization Vaden Masrani UBC MLRG April 3rd, 2018 Introduction We have some unconstrained, twice-differentiable convex function f : R d R that we want to minimize:
More informationNewton-MR: Newton s Method Without Smoothness or Convexity
Newton-MR: Newton s Method Without Smoothness or Convexity arxiv:1810.00303v1 [math.oc] 30 Sep 018 Fred (Farbod) Roosta Yang Liu Peng Xu Michael W. Mahoney October, 018 Abstract Establishing global convergence
More informationNumerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems
1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationORIE 6326: Convex Optimization. Quasi-Newton Methods
ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in
More informationOn the complexity of an Inexact Restoration method for constrained optimization
On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization
More informationEvaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015
More informationarxiv: v2 [math.oc] 1 Nov 2017
Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence arxiv:1710.09447v [math.oc] 1 Nov 017 Mingrui Liu, Tianbao Yang Department of Computer Science The University of
More informationOn Lagrange multipliers of trust-region subproblems
On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008
More informationarxiv: v1 [math.oc] 9 Oct 2018
Cubic Regularization with Momentum for Nonconvex Optimization Zhe Wang Yi Zhou Yingbin Liang Guanghui Lan Ohio State University Ohio State University zhou.117@osu.edu liang.889@osu.edu Ohio State University
More informationComplexity of gradient descent for multiobjective optimization
Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization
More informationGeometry optimization
Geometry optimization Trygve Helgaker Centre for Theoretical and Computational Chemistry Department of Chemistry, University of Oslo, Norway European Summer School in Quantum Chemistry (ESQC) 211 Torre
More information